[v1.x] fix: handle non-UTF-8 bytes in stdio server stdin #2303
TextIOWrapper defaults to errors='strict', which raises UnicodeDecodeError when stdin contains bytes that are not valid UTF-8. This exception occurs during 'async for line in stdin' iteration, which is outside the JSON-parsing try/except block, so it propagates through the task group and terminates the server process. With errors='replace', invalid bytes are converted to U+FFFD. The resulting line then fails JSON validation, which is already caught and sent into the read stream as an exception for the session layer to handle - the same path as any other malformed input.
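The difference between the two error modes can be reproduced with the standard library alone. The following is a minimal sketch, not the server code: `read_lines` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for `sys.stdin.buffer`.

```python
import io
import json


def read_lines(raw: bytes, errors: str) -> list[str]:
    """Decode raw bytes line by line through a TextIOWrapper.

    Hypothetical helper: BytesIO stands in for sys.stdin.buffer.
    """
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors=errors)
    return [line for line in stream]


# One line of invalid UTF-8 followed by one valid JSON-RPC line.
raw = b'\xff\xfe\n{"jsonrpc": "2.0", "id": 1, "method": "ping"}\n'

# errors="strict": decoding fails while iterating, before any JSON parsing.
try:
    read_lines(raw, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace": each invalid byte becomes U+FFFD and iteration continues.
lines = read_lines(raw, "replace")

# The replaced line is not valid JSON, so it fails in the parsing step instead.
try:
    json.loads(lines[0])
    bad_line_parsed = True
except ValueError:
    bad_line_parsed = False
```

With `errors="replace"` the failure moves from the decoding step, which nothing guards, to the parsing step, which already has error handling.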
LGTM — straightforward one-parameter backport with good test coverage.
Extended reasoning...
Overview
This PR backports a fix from main (#2302) to the v1.x branch. It adds errors="replace" to the TextIOWrapper wrapping sys.stdin.buffer in src/mcp/server/stdio.py, so that non-UTF-8 bytes on stdin are replaced with U+FFFD instead of crashing the server. A regression test in tests/server/test_stdio.py exercises the new behavior.
Security risks
None. The change makes the server more robust by gracefully handling malformed input rather than crashing. Using errors="replace" is the standard Python approach for this scenario.
Level of scrutiny
Low scrutiny warranted. This is a single-parameter addition to an existing stdlib call, following well-established Python conventions. The test is clear and directly exercises the new code path. No CODEOWNERS file exists.
Other factors
The only bug finding is a minor nit: the # pragma: no cover on the except clause is now stale since the new test exercises that path. This is a valid observation but purely cosmetic — it affects coverage reporting accuracy, not functionality. The core change is correct and well-tested, so this nit does not block approval.
coverage.py misreports the ->exit arc on nested async with on Python 3.14 Windows. Also simplify from __aiter__/__anext__ to receive().
LGTM — straightforward backport adding errors="replace" to handle non-UTF-8 stdin bytes gracefully, with a solid regression test.
Extended reasoning...
Overview
This PR backports #2302 to the v1.x branch. It touches two files: src/mcp/server/stdio.py (adding errors="replace" to the TextIOWrapper for stdin, and removing a stale # pragma: no cover) and tests/server/test_stdio.py (adding a regression test for invalid UTF-8 input).
Security risks
None. The change is purely defensive — it prevents a crash on malformed input by replacing invalid bytes with U+FFFD, which then fails JSON validation and is delivered as an in-stream exception. This is the standard Python approach for handling encoding errors gracefully.
Level of scrutiny
This is a small, focused bug fix with a clear intent and a well-written test. The code change is a single parameter addition (errors="replace") to an existing TextIOWrapper call, plus a pragma removal. The test covers both the error path and recovery. This warrants minimal scrutiny — it follows established patterns and has no ambiguity.
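The "error path and recovery" behavior can be illustrated with a small stand-alone sketch. It is not the test in tests/server/test_stdio.py: `parse_stream` is a hypothetical function, plain `json.loads` stands in for the SDK's message validation, and a list stands in for the read stream.

```python
import io
import json


def parse_stream(raw: bytes) -> list[object]:
    """Parse newline-delimited JSON, collecting failures instead of raising.

    Hypothetical stand-in for the stdin reader loop: a parse error is
    delivered as an item in the results, and the loop moves to the next line.
    """
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors="replace")
    results: list[object] = []
    for line in stream:
        try:
            results.append(json.loads(line))
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            results.append(exc)
    return results


valid = {"jsonrpc": "2.0", "id": 1, "method": "ping"}
raw = b"\xff\xfe\n" + json.dumps(valid).encode("utf-8") + b"\n"

results = parse_stream(raw)
```

The first item is the exception produced by the undecodable line. The second is the valid message, which shows that the reader keeps going after bad input.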
Other factors
The original fix was already merged to main (#2302), so this is a cherry-pick with a minor adaptation for the v1.x SessionMessage.message type (using JSONRPCMessage(root=valid) instead of valid directly). My previous inline comment about the stale pragma was already addressed by the PR itself — the diff removes the pragma on line 66. No outstanding reviewer comments need addressing.
Backport of #2302 to v1.x.
Cherry-pick of the main commit with one adaptation: the regression test's final assertion uses JSONRPCMessage(root=valid) instead of valid directly, because on v1.x SessionMessage.message is still a JSONRPCMessage RootModel wrapper (pre-#1908).
See #2302 for the full description and test verification.