v3 Development: Improve reliability of decoder by treating only trial decodes and text validation as authoritative. by emcd · Pull Request #3 · emcd/python-detextive

emcd · 2026-02-13T04:46:21Z

No description provided.


Add comprehensive documentation for confidence scoring approach:
- Size-based scaling rationale and formula
- Detector-specific strategies (intrinsic vs constant confidence)
- Base confidence values for magic (0.95/0.75) and charset-normalizer (0.85)
- Examples and interaction with behavior thresholds

Add analysis of text validation and confidence threshold:
- text_validate_confidence is effectively unused (always 0.0 in main path)
- Validation checks textuality, not detection confidence (orthogonal concerns)
- Recommend removing confidence threshold, keeping tristate control

Fix docstring in is_permissive_charset() to correctly reflect that CP1252 is not permissive (has 5 undefined bytes).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
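The CP1252 claim above can be checked directly against Python's built-in codecs. The following is a minimal sketch, not code from detextive: `undefined_bytes` is a hypothetical helper introduced here for illustration, and it only makes sense for single-byte charsets, where each byte value decodes independently.

```python
def undefined_bytes(charset: str) -> list[int]:
    """Return the byte values that a single-byte codec refuses to decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(charset)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


# CP1252 leaves a handful of positions unassigned, so some inputs fail.
CP1252_UNDEFINED = undefined_bytes("cp1252")
# Latin-1 maps every byte to a code point, so it accepts any input.
LATIN1_UNDEFINED = undefined_bytes("latin-1")

print([hex(value) for value in CP1252_UNDEFINED])
print(len(LATIN1_UNDEFINED))
```

A charset with no undefined bytes can never fail a trial decode, which is why it says nothing about whether the content was really written in that charset. CP1252 can fail, so it does not belong in that group.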

Refactor decoders to use charset trial decoding with validator hooks. Update default trial codec order to prefer UTF-8 before OS defaults and keep inference confidence gating. Adjust docs and tests for BOM-aware charset normalization and decode behavior.

Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.openai.com>
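As a rough illustration of trial decoding with a validator hook, here is a minimal sketch. It is not the detextive implementation: `trial_decode`, `looks_textual`, and `default_trial_charsets` are hypothetical names, and the ordering shown (UTF-8 first, then the OS default) is only one reading of the commit message.

```python
import codecs
import locale
from typing import Callable, Iterable, Optional


def default_trial_charsets() -> list[str]:
    """Assumed default order: UTF-8 first, then the OS preferred encoding."""
    ordered = ["utf-8"]
    os_default = locale.getpreferredencoding(False)
    try:
        if codecs.lookup(os_default).name != "utf-8":
            ordered.append(os_default)
    except LookupError:
        pass  # Unknown OS codec name: fall back to UTF-8 only.
    return ordered


def looks_textual(text: str) -> bool:
    """Example validator hook: reject control characters other than whitespace."""
    return all(ch.isprintable() or ch in "\t\n\r" for ch in text)


def trial_decode(
    content: bytes,
    charsets: Iterable[str],
    validator: Optional[Callable[[str], bool]] = None,
) -> tuple[str, str]:
    """Return (text, charset) for the first candidate that decodes and validates."""
    for charset in charsets:
        try:
            text = content.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # Decode failed or codec unknown: try the next candidate.
        if validator is not None and not validator(text):
            continue  # Decoded, but the result does not look like text.
        return text, charset
    raise ValueError("no candidate charset produced valid text")
```

For example, `trial_decode(b"caf\xe9", ("utf-8", "cp1252"))` fails the UTF-8 attempt and falls through to CP1252. Only the decode result and the validator decide the outcome, not a detector's guess.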

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 452af933e1


sources/detextive/decoders.py

Remove resolved Windows encoding investigation notes and keep active research notes focused on current v3 decisions. Update ideas scope to post-v3.0+ and retain CP1252 historical finding in decode refactor notes.

Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.openai.com>

Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.openai.com>

Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.github.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2fbad9168d


sources/detextive/decoders.py

Propagate decode-attempt confidence into text validation gating. Add tests for both above-threshold skip and below-threshold validation behavior.

Co-Authored-By: Codex <codex@users.noreply.openai.com>
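The gating described in this commit can be sketched as follows. This is an assumption-laden illustration rather than the project's code: `decode_attempt` and its signature are hypothetical, only the `text_validate_confidence` name comes from the PR, and skipping validation when confidence exactly equals the threshold is a guess. A later commit in this PR restores conservative handling and removes the threshold-gating tests.

```python
from typing import Callable


def decode_attempt(
    content: bytes,
    charset: str,
    confidence: float,
    text_validate_confidence: float,
    validator: Callable[[str], bool],
) -> str:
    """Decode, then validate the text only when confidence is below the threshold."""
    text = content.decode(charset)
    if confidence >= text_validate_confidence:
        return text  # Above threshold: validation is skipped.
    if not validator(text):
        raise ValueError(f"decoded text failed validation for {charset!r}")
    return text


# Record validator calls to show when the gate lets validation run.
calls: list[str] = []


def recording_validator(text: str) -> bool:
    calls.append(text)
    return True


high = decode_attempt(b"hi", "utf-8", 0.9, 0.8, recording_validator)
calls_after_high = list(calls)
low = decode_attempt(b"hi", "utf-8", 0.5, 0.8, recording_validator)
calls_after_low = list(calls)
```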

Update architecture summary and validation decision documentation for current v3 behavior. Restore conservative decode-attempt text validation confidence handling and remove threshold-gating tests.

Co-Authored-By: Codex <codex@users.noreply.openai.com>

Treat supplied HTTP Content-Type as authoritative parse input in inference paths. Convert charset and MIME detection toggles to booleans and validate them via BehaviorsInvalidity. Update tests and architecture notes for the v3 behavior model.

Co-Authored-By: Codex <codex@users.noreply.openai.com>
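A minimal sketch of both ideas in this commit, using only the standard library. `parse_content_type` and `require_bool` are hypothetical helpers, and the `BehaviorsInvalidity` class below is a stand-in with an assumed base class and message. Only the exception's name appears in the PR.

```python
from email.message import Message
from typing import Optional


class BehaviorsInvalidity(ValueError):
    """Stand-in for the exception named in the commit (definition assumed)."""


def parse_content_type(header: str) -> tuple[str, Optional[str]]:
    """Split a Content-Type header value into (mimetype, charset or None)."""
    message = Message()
    message["Content-Type"] = header
    return message.get_content_type(), message.get_content_charset()


def require_bool(name: str, value: object) -> bool:
    """Reject non-boolean toggle values instead of coercing them."""
    if not isinstance(value, bool):
        raise BehaviorsInvalidity(f"{name} must be a bool, got {value!r}")
    return value
```

Here `parse_content_type("text/html; charset=UTF-8")` yields the MIME type and a lowercased charset, which an inference path can then take as given instead of re-detecting them. A header without a `charset` parameter yields `None` for the charset.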

emcd and others added 4 commits February 12, 2026 03:59

[WIP] Refactor 'decode' logic to improve accuracy of character set selection.

a56a9a5

Update notes. (Coauthor: Anthropic Claude Sonnet 4.5)

3b2ba09

chatgpt-codex-connector bot reviewed Feb 13, 2026

sources/detextive/decoders.py (outdated, resolved)

emcd and others added 5 commits February 12, 2026 21:09

Remove charset promotions behavior and preserve detection confidence.

f2fe5dd

Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.openai.com>

Clarify default versus supplement semantics for decoding and inference.

8712978

Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.openai.com>

Add decode_inform API with textual MIME metadata.

f2ae383

Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.github.com>

Improve v3 coverage plan and decoding test coverage.

2fbad91

Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.github.com>

chatgpt-codex-connector bot reviewed Feb 14, 2026

sources/detextive/decoders.py (resolved)

emcd and others added 3 commits February 13, 2026 18:46

Honor decode-attempt validation confidence threshold.

4636cb9


emcd merged commit 9f490ae into master Feb 14, 2026
22 checks passed

emcd merged 12 commits into master from decode-refactor
