fix: Handle connection drops (UND_ERR_SOCKET) and prevent process crash #377

Open
harshitha-cstk wants to merge 4 commits into development from fix/dx-5444-und-err-socket-422-errors

@harshitha-cstk

Problem

When the remote server closes the TLS connection during a CDA request, Node 22's fetch (undici) can reject with TypeError: terminated, where cause.code === 'UND_ERR_SOCKET'. In the SDK this caused two problems:

  • Unhandled rejection: In the 200 path, the promise from response.json() had no .catch(). If the connection closed during the body read, the rejection went unhandled and could crash the Node process.
  • No retries: Socket/abort errors were never retried; retries were triggered only by HTTP status (e.g. 408, 429).
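
The first failure mode can be reproduced without a network. The sketch below is illustrative only: makeDroppedResponse and unguardedRequest are hypothetical stand-ins, not SDK code, and the inner error message is made up. It shows that a body promise with no .catch() surfaces as an unhandledRejection while the outer promise never settles.

```javascript
// Hypothetical stand-in for a fetch Response whose body read fails mid-stream.
function makeDroppedResponse() {
  const cause = Object.assign(new Error('socket closed (illustrative)'), {
    code: 'UND_ERR_SOCKET',
  });
  return {
    status: 200,
    json: () => Promise.reject(new TypeError('terminated', { cause })),
  };
}

// Simplified version of the unguarded pattern: no .catch() on the body promise.
function unguardedRequest(response) {
  return new Promise((resolve) => {
    response.json().then((body) => resolve(body));
    // Nothing settles the outer promise on failure, and the rejection
    // from json() has no handler attached.
  });
}

// Record unhandled rejections instead of letting them terminate the process.
const seen = [];
process.on('unhandledRejection', (err) => {
  seen.push(err && err.cause && err.cause.code);
});

unguardedRequest(makeDroppedResponse());
```

Without the process.on('unhandledRejection') listener, current Node versions terminate the process on such a rejection by default, which matches the crash described above.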

Solution

  • Catch body-read failures: Add .catch() on the response.json() promise in both the 200 and non-200 branches so body-read errors are handled and no longer cause unhandled rejections.
  • Retry on socket/abort: Treat an error as retriable when its message is 'terminated' or its error.cause.code is 'UND_ERR_SOCKET' or 'UND_ERR_ABORTED'; when retryLimit > 0, send it through the existing onError() retry path.
  • Consistent handling: Apply the same catch-and-retry logic to fetch-level rejections and to body-read rejections in both the 200 and non-200 branches. When not retrying, reject with the actual error so that API errors (e.g. 422) remain identifiable.
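
The detection rule above can be sketched as a small predicate. The names isSocketOrAbortError and nextAction are hypothetical, and the sketch assumes the three conditions are combined with OR; the actual check in request.js may be structured differently.

```javascript
// Hypothetical helper: true when the error looks like a dropped or aborted connection.
// Assumption: any one of the three signals is enough to qualify.
function isSocketOrAbortError(err) {
  if (!err) return false;
  const code = err.cause && err.cause.code;
  return (
    err.message === 'terminated' ||
    code === 'UND_ERR_SOCKET' ||
    code === 'UND_ERR_ABORTED'
  );
}

// Hypothetical decision step: retry only for socket/abort errors with retries left;
// everything else (for example a 422 API error) is rejected as-is.
function nextAction(err, retryLimit) {
  return isSocketOrAbortError(err) && retryLimit > 0 ? 'retry' : 'reject';
}
```

Keeping the decision separate from the retry mechanism means the same rule can be applied at each place an error can surface.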

Changes

  • src/core/lib/request.js
    • 200 branch: .catch() on data.then(...) to handle body-read errors, detect socket/abort, retry via onError(err) when possible, otherwise reject(err).
    • Non-200 branch: .catch((err) => ...) on data.then(...) with the same socket/abort detection and retry; reject with err or { status, statusText } when not retrying.
    • Outer fetch(...).catch: detect socket/abort and call onError(error) when retryLimit > 0, else reject(error).
  • docs/SDK-Engineering-Investigation-UND_ERR_SOCKET.md: Investigation doc for SDK engineering.
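
As a rough picture of where the three catch points sit, here is a minimal sketch. It is not the code in src/core/lib/request.js: requestWithRetry, fetchImpl, and onFailure are hypothetical names, the retry is a plain recursive call standing in for onError(), and rejecting with the parsed body in the non-200 branch is an assumption.

```javascript
// Hypothetical predicate for dropped or aborted connections (OR semantics assumed).
function isSocketOrAbortError(err) {
  const code = err && err.cause && err.cause.code;
  return Boolean(err) && (err.message === 'terminated' ||
    code === 'UND_ERR_SOCKET' || code === 'UND_ERR_ABORTED');
}

// fetchImpl is injected so the sketch can run without a network.
function requestWithRetry(fetchImpl, url, { retryLimit = 0, retryDelay = 0 } = {}) {
  return new Promise((resolve, reject) => {
    // Shared handler: retry when allowed, otherwise surface the original error.
    const onFailure = (err) => {
      if (isSocketOrAbortError(err) && retryLimit > 0) {
        setTimeout(() => {
          requestWithRetry(fetchImpl, url, { retryLimit: retryLimit - 1, retryDelay })
            .then(resolve, reject);
        }, retryDelay);
      } else {
        reject(err);
      }
    };

    fetchImpl(url)
      .then((response) => {
        const data = response.json();
        if (response.status === 200) {
          // 200 branch: a failed body read is caught instead of left unhandled.
          data.then(resolve).catch(onFailure);
        } else {
          // Non-200 branch: reject with the parsed API error (assumed shape);
          // if the body cannot be read, fall back to the error or status info.
          const { status, statusText } = response;
          data
            .then((body) => reject(body))
            .catch((err) => onFailure(err || { status, statusText }));
        }
      })
      .catch(onFailure); // fetch-level rejection
  });
}
```

Because every path ends in resolve, reject, or a retry, the returned promise always settles and no rejection is left without a handler.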

Result

  • Connection drops no longer cause unhandled rejections; the request is either retried or its promise is rejected.
  • Socket/abort errors are retried using existing retryLimit / retryDelay / retryDelayOptions.
  • Callers receive the same error (including error.cause.code) when retries are exhausted, so they can handle or log failures without crashing the process.
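
On the caller side, the preserved error shape makes it possible to tell a dropped connection from an API error. The helper below is a hypothetical sketch: classifyFailure and its return labels are invented for illustration, and the API error shape is assumed to be the { status, statusText } object mentioned in the Changes section.

```javascript
// Hypothetical caller-side helper; the 'kind' labels are illustrative only.
function classifyFailure(err) {
  const code = err && err.cause && err.cause.code;
  if (code === 'UND_ERR_SOCKET' || code === 'UND_ERR_ABORTED') {
    // Connection-level failure that survived all retries.
    return { kind: 'connection', detail: code };
  }
  if (err && typeof err.status === 'number') {
    // Assumed API error shape: { status, statusText }.
    return { kind: 'api', detail: `${err.status} ${err.statusText || ''}`.trim() };
  }
  return { kind: 'unknown', detail: String(err && err.message ? err.message : err) };
}

// Example: log the failure instead of letting it crash the process.
Promise.reject({ status: 422, statusText: 'Unprocessable Entity' })
  .catch((err) => console.error('request failed:', classifyFailure(err)));
```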

Made with Cursor

dhavaljain999 and others added 2 commits March 3, 2026 13:30
- Add .catch() on response.json() in 200 and non-200 branches to handle body-read failures
- Retry on socket/abort errors (terminated, UND_ERR_SOCKET, UND_ERR_ABORTED) via onError()
- Treat fetch-level and body-read socket errors consistently; reject with actual error when not retrying
- Add SDK engineering investigation doc for UND_ERR_SOCKET handling

Made-with: Cursor
@harshitha-cstk harshitha-cstk requested review from a team as code owners March 16, 2026 11:22
@github-actions

🔒 Security Scan Results

ℹ️ Note: Only vulnerabilities with available fixes (upgrades or patches) are counted toward thresholds.

| Check Type | Count (with fixes) | Without fixes | Threshold | Result |
| --- | --- | --- | --- | --- |
| 🔴 Critical Severity | 0 | 0 | 10 | ✅ Passed |
| 🟠 High Severity | 0 | 0 | 25 | ✅ Passed |
| 🟡 Medium Severity | 0 | 0 | 500 | ✅ Passed |
| 🔵 Low Severity | 0 | 0 | 1000 | ✅ Passed |

⏱️ SLA Breach Summary

✅ No SLA breaches detected. All vulnerabilities are within acceptable time thresholds.

| Severity | Breaches (with fixes) | Breaches (no fixes) | SLA Threshold (with/no fixes) | Status |
| --- | --- | --- | --- | --- |
| 🔴 Critical | 0 | 0 | 15 / 30 days | ✅ Passed |
| 🟠 High | 0 | 0 | 30 / 120 days | ✅ Passed |
| 🟡 Medium | 0 | 0 | 90 / 365 days | ✅ Passed |
| 🔵 Low | 0 | 0 | 180 / 365 days | ✅ Passed |

✅ BUILD PASSED - All security checks passed
