Skip to content

fix: handle non-UTF-8 bytes in stdio server stdin#2302

Merged
maxisbey merged 3 commits intomainfrom
fix/stdio-invalid-utf8
Mar 17, 2026
Merged

fix: handle non-UTF-8 bytes in stdio server stdin#2302
maxisbey merged 3 commits intomainfrom
fix/stdio-invalid-utf8

Conversation

@maxisbey
Copy link
Contributor

TextIOWrapper(sys.stdin.buffer, encoding="utf-8") defaults to errors="strict", which raises UnicodeDecodeError when stdin contains bytes that are not valid UTF-8. The exception occurs during async for line in stdin: iteration — outside the JSON-parsing try/except — so it propagates through the task group and terminates the server process.

With errors="replace", invalid bytes become U+FFFD. The resulting line then fails JSON validation, which is already caught and sent into the read stream as an exception for the session layer to handle — the same path as any other malformed input.

Motivation and Context

The MCP spec requires clients to send valid UTF-8 JSON-RPC, and in stdio transport the client spawns the server as its own subprocess — so in practice this only triggers when a client sends protocol-violating input to its own child. But the server process dying on a single bad byte is worse than surfacing a parse error, and the other MCP SDKs (TypeScript, C#, Go, Rust) all handle this gracefully.

How Has This Been Tested?

Regression test test_stdio_server_invalid_utf8 monkeypatches sys.stdin with a binary buffer containing \xff\xfe\n followed by a valid JSON-RPC message. It verifies the invalid line surfaces as an in-stream exception and the valid message still comes through afterward. Fails on main with ExceptionGroup(UnicodeDecodeError), passes with this fix.

Breaking Changes

None.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Reported by @manthanghasadiya.

src/mcp/client/stdio.py has a related issue — StdioServerParameters.encoding_error_handler defaults to "strict", so a buggy server can crash the client the same way. It's configurable so lower priority; worth a follow-up to change the default.

TextIOWrapper defaults to errors='strict', which raises
UnicodeDecodeError when stdin contains bytes that are not valid UTF-8.
This exception occurs during 'async for line in stdin' iteration, which
is outside the JSON-parsing try/except block, so it propagates through
the task group and terminates the server process.

With errors='replace', invalid bytes are converted to U+FFFD. The
resulting line then fails JSON validation, which is already caught and
sent into the read stream as an exception for the session layer to
handle - the same path as any other malformed input.
Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — clean one-line fix with a solid regression test.

Extended reasoning...

Overview

This PR adds a single parameter (errors="replace") to the TextIOWrapper call in src/mcp/server/stdio.py that wraps sys.stdin.buffer. It also adds a well-structured regression test in tests/server/test_stdio.py that verifies invalid UTF-8 bytes surface as in-stream exceptions rather than crashing the server.

Security risks

None. The change makes the server more resilient — instead of dying on a single bad byte (a potential denial-of-service vector from a misbehaving client), it gracefully degrades to a JSON parse error through the existing error-handling path. This aligns with how other MCP SDKs (TypeScript, C#, Go, Rust) handle the same scenario.

Level of scrutiny

This is a minimal, mechanical fix — one keyword argument added to a stdlib call. The behavior change is well-understood: errors="replace" is a standard Python TextIOWrapper option. The test is thorough, covering both the error case and verifying subsequent valid messages still process correctly. Low scrutiny is appropriate.

Other factors

The only bug found by the automated review is a minor nit about a stale # pragma: no cover annotation — the new test now exercises that except handler, so the pragma is no longer accurate. This is cosmetic and does not affect correctness or CI. The PR description is clear, the motivation is sound, and the change is non-breaking.

coverage.py misreports the ->exit arc on nested async with on Python
3.14 Windows. Also simplify from __aiter__/__anext__ to receive().
Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward robustness fix with good test coverage.

Extended reasoning...

Overview

This PR touches two files: src/mcp/server/stdio.py (adding errors="replace" to the TextIOWrapper wrapping sys.stdin.buffer, and removing a stale # pragma: no cover) and tests/server/test_stdio.py (adding a regression test for invalid UTF-8 input). The total diff is very small — one parameter added, one comment removed, and ~30 lines of test code.

Security risks

None. The change makes the server more robust by converting invalid bytes to U+FFFD replacement characters instead of crashing. The invalid input then follows the existing error-handling path (JSON parse failure → exception sent into the read stream). No new attack surface is introduced.

Level of scrutiny

This warrants minimal scrutiny. It is a single-parameter change to a standard Python API (TextIOWrapper), using a well-documented error mode (errors="replace"). The fix aligns with how the other MCP SDKs (TypeScript, C#, Go, Rust) handle the same scenario, as noted in the PR description. The test clearly demonstrates the before/after behavior.

Other factors

  • The regression test is well-structured: it monkeypatches sys.stdin with a BytesIO containing invalid UTF-8 followed by a valid JSON-RPC message, verifying that the invalid line produces an in-stream exception and the valid message still processes correctly.
  • The PR description is thorough, explaining motivation, testing, and noting a related follow-up for the client side.
  • No outstanding reviewer comments remain unaddressed.
  • The change is consistent with the project's existing patterns for error handling in the stdio transport.

maxisbey added a commit that referenced this pull request Mar 17, 2026
PR #2302 passed ./scripts/test (100% coverage) but failed CI when
strict-no-cover found lines marked '# pragma: no cover' that the new
test actually executed. The claim in CLAUDE.md that ./scripts/test
'matches CI exactly' was false.

Adds strict-no-cover as a final step in scripts/test. The tool
internally spawns 'uv run coverage json' without --frozen, which
rewrites uv.lock on machines with registry overrides; UV_FROZEN=1
propagates to that subprocess.

Also converts session.py:426 from 'no cover' to 'lax no cover'. The
except-handler during connection cleanup is a genuine race (stream may
or may not already be closed depending on timing), so it is
nondeterministically covered. strict-no-cover would have flagged it
intermittently on high-core machines.

CLAUDE.md now documents the fast targeted path (single test file +
strict-no-cover, ~4s, no false positives on partial runs), the three
pragma types, and the known coverage.py ->exit arc quirk with nested
async-with that only the CI matrix catches.
Copy link
Member

@jerome3o-anthropic jerome3o-anthropic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved

@maxisbey maxisbey merged commit 1a2244f into main Mar 17, 2026
33 checks passed
@maxisbey maxisbey deleted the fix/stdio-invalid-utf8 branch March 17, 2026 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants