Add comprehensive documentation for confidence scoring approach:
- Size-based scaling rationale and formula
- Detector-specific strategies (intrinsic vs constant confidence)
- Base confidence values for magic (0.95/0.75) and charset-normalizer (0.85)
- Examples and interaction with behavior thresholds

Add analysis of text validation and confidence threshold:
- text_validate_confidence is effectively unused (always 0.0 in main path)
- Validation checks textuality, not detection confidence (orthogonal concerns)
- Recommend removing confidence threshold, keeping tristate control

Fix docstring in is_permissive_charset() to correctly reflect that CP1252 is not permissive (has 5 undefined bytes).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
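The claim that CP1252 has 5 undefined bytes can be checked against Python's built-in codec tables. The sketch below is not the project's `is_permissive_charset()`; `undefined_bytes` is a hypothetical helper, and it only makes sense for single-byte codecs.

```python
def undefined_bytes(codec: str) -> list[int]:
    """Return the single-byte values that `codec` refuses to decode.

    Hypothetical helper for illustration; meaningful only for
    single-byte codecs such as cp1252 or latin-1.
    """
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


# CP1252 leaves five positions unassigned in Python's codec table,
# so it cannot decode arbitrary input.
print([hex(b) for b in undefined_bytes("cp1252")])

# Latin-1 maps every byte to a code point, so nothing is rejected.
print(undefined_bytes("latin-1"))
```

A codec with an empty result decodes any byte sequence without error, which is why a successful decode under such a codec says little about whether the guess was right.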
Refactor decoders to use charset trial decoding with validator hooks. Update default trial codec order to prefer UTF-8 before OS defaults and keep inference confidence gating. Adjust docs and tests for BOM-aware charset normalization and decode behavior. Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.openai.com>
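A minimal sketch of trial decoding with a validator hook, as described in the commit above. It is an illustration under assumptions, not the code in this PR: `trial_decode`, `mostly_printable`, the default codec tuple (with `cp1252` standing in for an OS default), and the 0.95 ratio are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def trial_decode(
    content: bytes,
    codecs: Iterable[str] = ("utf-8", "cp1252"),  # assumed order: UTF-8 first
    validator: Optional[Callable[[str], bool]] = None,
) -> Optional[Tuple[str, str]]:
    """Try each codec in order; return (codec, text) for the first acceptable decode."""
    for codec in codecs:
        try:
            text = content.decode(codec)
        except (UnicodeDecodeError, LookupError):
            continue  # undecodable bytes or unknown codec: try the next one
        if validator is None or validator(text):
            return codec, text
    return None


def mostly_printable(text: str, minimum: float = 0.95) -> bool:
    """Hypothetical validator hook: accept text whose characters are mostly printable."""
    if not text:
        return True
    printable = sum(1 for ch in text if ch.isprintable() or ch in "\t\n\r")
    return printable / len(text) >= minimum


print(trial_decode("café".encode("utf-8")))           # decodes on the first trial
print(trial_decode(b"caf\xe9"))                       # falls through to cp1252
print(trial_decode(b"\x00\x01\x02\x03", validator=mostly_printable))  # rejected by the hook
```

Putting UTF-8 first works because it is strict: most byte sequences produced by other encodings fail to decode as UTF-8, so a success there is strong evidence, whereas permissive single-byte codecs accept almost anything.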
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 452af933e1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Remove resolved Windows encoding investigation notes and keep active research notes focused on current v3 decisions. Update ideas scope to post-v3.0+ and retain CP1252 historical finding in decode refactor notes. Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.openai.com>
Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.openai.com>
Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.openai.com>
Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.github.com>
Co-Authored-By: GPT-5 Codex <gpt-5-codex@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2fbad9168d
Propagate decode-attempt confidence into text validation gating. Add tests for both above-threshold skip and below-threshold validation behavior. Co-Authored-By: Codex <codex@users.noreply.openai.com>
Update architecture summary and validation decision documentation for current v3 behavior. Restore conservative decode-attempt text validation confidence handling and remove threshold-gating tests. Co-Authored-By: Codex <codex@users.noreply.openai.com>
Treat supplied HTTP Content-Type as authoritative parse input in inference paths. Convert charset and MIME detection toggles to booleans and validate them via BehaviorsInvalidity. Update tests and architecture notes for the v3 behavior model. Co-Authored-By: Codex <codex@users.noreply.openai.com>
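For context on treating a supplied Content-Type as parse input, the sketch below shows one way to split a header value into MIME type and charset using only the standard library. `parse_content_type` is a hypothetical name, and the project may parse the header differently.

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type header value into (mimetype, charset).

    Hypothetical helper; both parts are lowercased by the stdlib parser,
    and charset is None when the header carries no charset parameter.
    """
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_type(), message.get_content_charset()


print(parse_content_type("text/html; charset=UTF-8"))
print(parse_content_type("application/json"))
```

When the caller supplies the header, values parsed this way would be used directly, and detection would only need to run for whichever part the header leaves unspecified.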