fix: add MAX_WALK_DEPTH guard to native engine AST walkers#484
fix: add MAX_WALK_DEPTH guard to native engine AST walkers#484carlos-alm merged 4 commits intomainfrom
Conversation
All recursive walk_node, walk_ast_nodes, walk_ast_nodes_with_config, scan_import_names, extract_implements_from_node (extractors), walk/ walk_children (complexity), and process_if (CFG) functions now bail out at depth 200, matching the existing MAX_VISIT_DEPTH pattern in dataflow.rs. Prevents stack overflow segfaults on deeply nested ASTs.
|
Claude finished @carlos-alm's task —— View job PR Review: MAX_WALK_DEPTH Guard ImplementationReviewing PR #484 that adds depth guards to prevent stack overflow segfaults in native engine AST walkers. Phase 1: Root Cause Analysis ✅Problem Being Solved: The PR addresses issue #481 - stack overflow segfaults occurring when the native Rust engine processes deeply nested syntax trees. This is a real infrastructure issue that can crash the entire process, not a symptom being masked. Approach Justification: The solution follows established patterns already used in the codebase:
Risk Assessment: ✅ LOW RISK - This is clearly fixing a real underlying issue (segfaults) rather than hiding symptoms. The depth limit is reasonable and follows existing precedent. Phase 2: Configuration Change Detection ✅Analysis: ✅ NO HIGH-RISK CHANGES DETECTED
Phase 3: Test Integrity ✅Analysis: ✅ NO TEST INTEGRITY CONCERNS
Phase 4: Code Quality Review ✅Implementation Pattern: ✅ EXCELLENT CONSISTENCY
Affected Components (12 files):
Security Assessment: ✅ NO CONCERNS
Performance Impact: ✅ MINIMAL
Phase 5: Backlog Compliance Assessment ✅Zero-dep: ✓ — No new runtime dependencies added. Uses existing tree-sitter/SQLite/AST infrastructure. Foundation-aligned: ✓ — Fully compliant with core principles:
Problem-fit: 5/5 — Directly addresses core problem: AI agents waste tokens re-orienting when tools crash on pathological inputs. Prevents segfaults that would completely break AI agent workflows. Breaking: No — All public APIs maintained via wrapper pattern. Purely additive safety improvement. Tier: Tier 1 (zero-dep ✓ + foundation-aligned ✓) Final AssessmentRoot Cause AnalysisProblem Being Solved: Stack overflow segfaults in native Rust engine AST walkers on deeply nested syntax trees (issue #481). Backlog Compliance
Critical ConcernsNone. This is a high-quality defensive fix with excellent implementation consistency. Final Recommendation
|
Greptile SummaryThis PR adds
Confidence Score: 5/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Public API: walk_node(node, src, syms)"] --> B["walk_node_depth(node, src, syms, depth=0)"]
B --> C{"depth >= MAX_WALK_DEPTH?"}
C -- Yes --> D["return — guard fires"]
C -- No --> E["Process current node"]
E --> F["Iterate node.child_count()"]
F --> G["walk_node_depth(child, src, syms, depth+1)"]
G --> C
subgraph constants_module ["constants.rs — single source of truth"]
P["MAX_WALK_DEPTH = 200"] --> Q["cfg.rs, complexity.rs, dataflow.rs"]
P --> R["helpers.rs re-exports for extractors"]
end
subgraph complexity_mutual ["complexity.rs — mutual recursion pair"]
H["walk(node, depth)"] --> I{"depth >= MAX_WALK_DEPTH?"}
I -- Yes --> J["return"]
I -- No --> K["walk_children(node, same depth)"]
K --> L{"depth >= MAX_WALK_DEPTH?"}
L -- Yes --> M["return"]
L -- No --> N["walk(child, depth+1) per child"]
N --> H
end
Last reviewed commit: 2b8c458 |
| use crate::types::ComplexityMetrics; | ||
|
|
||
| /// Maximum recursion depth for AST traversal to prevent stack overflow. | ||
| const MAX_WALK_DEPTH: usize = 200; |
There was a problem hiding this comment.
Duplicate constant, consider a shared source of truth
complexity.rs (and cfg.rs) each define their own private const MAX_WALK_DEPTH: usize = 200, while helpers.rs already exports pub const MAX_WALK_DEPTH: usize = 200 (used by all the extractor files). This creates three independent definitions of the same value. If the guard threshold ever needs to be tuned, all three sites must be updated in sync.
Since complexity.rs and cfg.rs live outside the extractors module, importing from helpers.rs would be an awkward cross-module dependency. A cleaner long-term option would be a top-level shared constants module (e.g., crates/codegraph-core/src/constants.rs) so the single canonical value can be re-exported to all consumers. For now, at minimum a code comment pointing to the other two definitions would reduce the risk of drift.
This also applies to crates/codegraph-core/src/cfg.rs:5.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed — created a shared \ module with the single canonical \ definition. All four consumers (complexity.rs, cfg.rs, dataflow.rs, helpers.rs) now import from it. Also unified dataflow's \ name to \ for consistency. helpers.rs re-exports via \ so extractors' wildcard imports still work.
| @@ -397,6 +402,7 @@ fn walk_children( | |||
| cognitive, | |||
| cyclomatic, | |||
| max_nesting, | |||
| depth + 1, | |||
| ); | |||
| } | |||
| } | |||
There was a problem hiding this comment.
Depth guard belongs in
walk_children, not only in walk
The depth guard lives entirely in walk, but walk_children is the true entry point — compute_function_complexity calls walk_children(…, 0) directly, never walk. Because walk_children itself contains no guard, a future caller that invokes walk_children at an already-high depth (e.g. from a wrapper that passes a pre-existing depth) could skip the guard on the first traversal step.
Adding a symmetric guard to walk_children would make the invariant self-contained:
fn walk_children(
node: &Node,
nesting_level: u32,
is_top_function: bool,
rules: &LangRules,
cognitive: &mut u32,
cyclomatic: &mut u32,
max_nesting: &mut u32,
depth: usize,
) {
if depth >= MAX_WALK_DEPTH {
return;
}
for i in 0..node.child_count() {This is a minor defensive hardening, not an active bug given the current call sites.
There was a problem hiding this comment.
Fixed — added a symmetric depth guard at the top of walk_children. Now both walk and walk_children independently check depth >= MAX_WALK_DEPTH, making the invariant self-contained regardless of which function is called first.
Extract the duplicated MAX_WALK_DEPTH constant from complexity.rs, cfg.rs, dataflow.rs, and helpers.rs into a single constants.rs module. Also unifies dataflow's MAX_VISIT_DEPTH name to MAX_WALK_DEPTH. helpers.rs re-exports the constant for extractor wildcard imports.
| fn process_if_depth(&mut self, if_stmt: &Node, current: u32, depth: usize) -> Option<u32> { | ||
| if depth >= MAX_WALK_DEPTH { | ||
| return Some(current); | ||
| } |
There was a problem hiding this comment.
Early-return leaves pre-created
false_block as an empty pass-through
When the depth guard fires here, current is a freshly-created false_block that the caller already wired up with cond_block → false_block ("branch_false"). Returning Some(current) causes the caller to immediately add false_block → join_block ("fallthrough"), leaving false_block as an empty stub labeled "else-if" that bypasses the entire else-if chain.
The resulting CFG is still structurally valid (no orphaned nodes, no crash), but CFG consumers will observe a branch_false → join edge with no body, silently truncating the else-if ladder. A brief comment here would help future maintainers understand the intent:
| fn process_if_depth(&mut self, if_stmt: &Node, current: u32, depth: usize) -> Option<u32> { | |
| if depth >= MAX_WALK_DEPTH { | |
| return Some(current); | |
| } | |
| fn process_if_depth(&mut self, if_stmt: &Node, current: u32, depth: usize) -> Option<u32> { | |
| if depth >= MAX_WALK_DEPTH { | |
| // Depth limit reached: return `current` so the caller can still | |
| // wire up a fallthrough edge to its join block. The else-if chain | |
| // will be silently truncated — the resulting CFG is structurally | |
| // valid but incomplete for very deeply nested if-else ladders. | |
| return Some(current); | |
| } |
There was a problem hiding this comment.
Fixed — added the suggested clarifying comment explaining that the early return produces a structurally valid but truncated CFG for deeply nested if-else ladders.
Document that the early return on depth limit produces a structurally valid but truncated CFG for very deeply nested if-else ladders.
* docs: add plan to centralize hardcoded configuration values Inventory 78+ hardcoded magic numbers across the codebase and design a 6-phase plan to route them through the existing .codegraphrc.json config system. Includes PR #480 brief command thresholds. * fix: address Greptile review feedback on config centralization plan (#482) - Fix magic-number count: ~70 individual values (34 entries), add derivation - Make recursive mergeConfig a hard Phase 1 prerequisite (not optional) - Move MCP_MAX_LIMIT to Category F (keep hardcoded — security boundary) - Merge Phase 7 into Phase 6 to fix phase count mismatch * fix: address round-2 Greptile review feedback (#482) - Keep build.driftThreshold in `build` namespace (not community) - Add `structure.cohesionThreshold` for A8 instead - Fix stale line numbers for A9-A12 (brief.js) - Split briefBfsDepth into briefCallerDepth + briefImporterDepth - Clarify check.js uses config.check.depth as sole authoritative key * fix: address round-3 Greptile review feedback (#482) - Add C5 (similarityWarnThreshold) to DEFAULTS schema and Phase 4 wiring - Split D4 into D4a (jsdocEndScanLines: 10) and D4b (jsdocOpenScanLines: 20) - Update Phase 5 file-utils.js wiring to reference both D4 keys * fix: address round-4 Greptile review feedback (#482) - Move C6 (embedding batch sizes) to Category F with rationale: model-specific implementation details rarely tuned by end-users - Rename analysis.defaultDepth to analysis.impactDepth for consistency with other per-feature depth keys (auditDepth, sequenceDepth, etc.) - Remove ambiguous Phase 4 wording about models.js configurability * docs: add native engine depth constants to config inventory (F7-F11) Reflects MAX_WALK_DEPTH constants introduced by #484 (fix for #481) and the pre-existing MAX_VISIT_DEPTH in dataflow.rs plus MCP_MAX_LIMIT. All are safety boundaries that should stay hardcoded. * fix: address round-5 Greptile review feedback (#482) - Add config.search.topK (C3) to semantic.js wiring in Phase 4 - Add structure to the Phase 6 CLAUDE.md configurable sections list
Summary
Closes #481.
MAX_WALK_DEPTH = 200depth guard to all recursive AST traversal functions in the native Rust engine, preventing stack overflow segfaults on deeply nested syntax treesMAX_VISIT_DEPTH = 200pattern already used indataflow.rswalk_node→walk_node_depth) to keep public APIs stableAffected components
walk_nodejavascript.rs,go.rs,python.rs,java.rs,csharp.rs,ruby.rs,php.rs,rust_lang.rs,hcl.rswalk_node_depthwalk_ast_nodes(JS)javascript.rswalk_ast_nodes_depthwalk_ast_nodes_with_confighelpers.rswalk_ast_nodes_with_config_depthscan_import_names(JS)javascript.rsscan_import_names_depthextract_implements_from_node(JS)javascript.rsextract_implements_depthwalk/walk_children(complexity)complexity.rsdepth: usizeparameter threaded throughprocess_if(CFG)cfg.rsprocess_if_depthTest plan
cargo build)