[v1.x] fix: handle non-UTF-8 bytes in stdio server stdin#2303

Merged
maxisbey merged 3 commits into v1.x from fix/stdio-invalid-utf8-v1.x on Mar 17, 2026

@maxisbey (Contributor)

Backport of #2302 to v1.x.

Cherry-pick of the main commit with one adaptation: the regression test's final assertion uses JSONRPCMessage(root=valid) instead of valid directly, because on v1.x SessionMessage.message is still a JSONRPCMessage RootModel wrapper (pre-#1908).

See #2302 for the full description and test verification.

TextIOWrapper defaults to errors='strict', which raises
UnicodeDecodeError when stdin contains bytes that are not valid UTF-8.
This exception occurs during 'async for line in stdin' iteration, which
is outside the JSON-parsing try/except block, so it propagates through
the task group and terminates the server process.

With errors='replace', invalid bytes are converted to U+FFFD. The
resulting line then fails JSON validation, which is already caught and
sent into the read stream as an exception for the session layer to
handle - the same path as any other malformed input.
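The two behaviours described in this commit message can be reproduced with a small synchronous sketch. It is a simplified stand-in, not the SDK's code: the real reader is async and validates each line as a JSON-RPC message, whereas here the hypothetical helper read_stdin reads from an in-memory io.BytesIO, parses with json.loads, and appends either the parsed dict or the caught exception to a plain list that plays the role of the read stream.

```python
import io
import json

# Hypothetical stdin contents: a line of bytes that is not valid UTF-8,
# followed by a well-formed JSON-RPC request.
RAW = b'\xff\xfe\n{"jsonrpc": "2.0", "method": "ping", "id": 1}\n'


def read_stdin(data: bytes, errors: str) -> list[object]:
    """Simplified, synchronous stand-in for the stdio reader loop (assumption).

    Each line is parsed with json.loads; on failure the exception itself is
    appended to `stream`, standing in for sending it into the read stream.
    """
    stdin = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    stream: list[object] = []
    for line in stdin:  # decoding happens here, outside the try/except below
        try:
            stream.append(json.loads(line))
        except Exception as exc:  # malformed input: forward the error
            stream.append(exc)
    return stream


# errors="strict" (the TextIOWrapper default): the decode error escapes the loop.
try:
    read_stdin(RAW, errors="strict")
except UnicodeDecodeError as exc:
    print("strict:", type(exc).__name__)  # prints: strict: UnicodeDecodeError

# errors="replace": the bad line decodes to "\ufffd\ufffd\n", fails json.loads,
# and is forwarded as an exception; the following request still parses.
items = read_stdin(RAW, errors="replace")
print("replace:", type(items[0]).__name__, items[1]["method"])  # prints: replace: JSONDecodeError ping
```

With errors="strict" the UnicodeDecodeError is raised by the for loop itself, so the try/except around parsing never sees it. With errors="replace" the same bytes arrive as U+FFFD, fail parsing like any other malformed line, and the valid request that follows is still read.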
@claude (bot) left a comment

LGTM — straightforward one-parameter backport with good test coverage.

Overview

This PR backports a fix from main (#2302) to the v1.x branch. It adds errors="replace" to the TextIOWrapper wrapping sys.stdin.buffer in src/mcp/server/stdio.py, so that non-UTF-8 bytes on stdin are replaced with U+FFFD instead of crashing the server. A regression test in tests/server/test_stdio.py exercises the new behavior.

Security risks

None. The change makes the server more robust by gracefully handling malformed input rather than crashing. Using errors="replace" is the standard Python approach for this scenario.

Level of scrutiny

Low scrutiny warranted. This is a single-parameter addition to an existing stdlib call, following well-established Python conventions. The test is clear and directly exercises the new code path. No CODEOWNERS file exists.

Other factors

The only bug finding is a minor nit: the # pragma: no cover on the except clause is now stale since the new test exercises that path. This is a valid observation but purely cosmetic — it affects coverage reporting accuracy, not functionality. The core change is correct and well-tested, so this nit does not block approval.

coverage.py misreports the ->exit arc on nested async with on Python
3.14 Windows. Also simplify from __aiter__/__anext__ to receive().
@claude (bot) left a comment

LGTM — straightforward backport adding errors="replace" to handle non-UTF-8 stdin bytes gracefully, with a solid regression test.

Overview

This PR backports #2302 to the v1.x branch. It touches two files: src/mcp/server/stdio.py (adding errors="replace" to the TextIOWrapper for stdin, and removing a stale # pragma: no cover) and tests/server/test_stdio.py (adding a regression test for invalid UTF-8 input).

Security risks

None. The change is purely defensive — it prevents a crash on malformed input by replacing invalid bytes with U+FFFD, which then fails JSON validation and is delivered as an in-stream exception. This is the standard Python approach for handling encoding errors gracefully.

Level of scrutiny

This is a small, focused bug fix with a clear intent and a well-written test. The code change is a single parameter addition (errors="replace") to an existing TextIOWrapper call, plus a pragma removal. The test covers both the error path and recovery. This warrants minimal scrutiny — it follows established patterns and has no ambiguity.

Other factors

The original fix was already merged to main (#2302), so this is a cherry-pick with a minor adaptation for the v1.x SessionMessage.message type (using JSONRPCMessage(root=valid) instead of valid directly). My previous inline comment about the stale pragma was already addressed by the PR itself — the diff removes the pragma on line 66. No outstanding reviewer comments need addressing.

@maxisbey merged commit 2e9897e into v1.x on Mar 17, 2026
22 checks passed
@maxisbey deleted the fix/stdio-invalid-utf8-v1.x branch on March 17, 2026 at 18:40