fix: add MAX_WALK_DEPTH guard to native engine AST walkers by carlos-alm · Pull Request #484 · optave/codegraph

carlos-alm · 2026-03-17T07:24:21Z

Summary

Closes #481.

Adds MAX_WALK_DEPTH = 200 depth guard to all recursive AST traversal functions in the native Rust engine, preventing stack overflow segfaults on deeply nested syntax trees
Follows the existing MAX_VISIT_DEPTH = 200 pattern already used in dataflow.rs
Uses a wrapper function pattern (walk_node → walk_node_depth) to keep public APIs stable

Affected components

Component	File(s)	Change
All extractors `walk_node`	`javascript.rs`, `go.rs`, `python.rs`, `java.rs`, `csharp.rs`, `ruby.rs`, `php.rs`, `rust_lang.rs`, `hcl.rs`	Depth-limited via `walk_node_depth`
`walk_ast_nodes` (JS)	`javascript.rs`	Depth-limited via `walk_ast_nodes_depth`
`walk_ast_nodes_with_config`	`helpers.rs`	Depth-limited via `walk_ast_nodes_with_config_depth`
`scan_import_names` (JS)	`javascript.rs`	Depth-limited via `scan_import_names_depth`
`extract_implements_from_node` (JS)	`javascript.rs`	Depth-limited via `extract_implements_depth`
`walk` / `walk_children` (complexity)	`complexity.rs`	`depth: usize` parameter threaded through
`process_if` (CFG)	`cfg.rs`	Depth-limited via `process_if_depth`

Test plan

Verify native engine builds cleanly (cargo build)
Run parser extraction tests to confirm no regressions
Run integration tests to verify graph output unchanged
Test with deeply nested fixture to confirm depth guard activates gracefully

All recursive walk_node, walk_ast_nodes, walk_ast_nodes_with_config, scan_import_names, extract_implements_from_node (extractors), walk/ walk_children (complexity), and process_if (CFG) functions now bail out at depth 200, matching the existing MAX_VISIT_DEPTH pattern in dataflow.rs. Prevents stack overflow segfaults on deeply nested ASTs.

claude · 2026-03-17T07:24:33Z

Claude finished @carlos-alm's task —— View job

PR Review: MAX_WALK_DEPTH Guard Implementation

Reviewing PR #484 that adds depth guards to prevent stack overflow segfaults in native engine AST walkers.

Phase 1: Root Cause Analysis ✅

Problem Being Solved: The PR addresses issue #481 - stack overflow segfaults occurring when the native Rust engine processes deeply nested syntax trees. This is a real infrastructure issue that can crash the entire process, not a symptom being masked.

Approach Justification: The solution follows established patterns already used in the codebase:

dataflow.rs already uses MAX_VISIT_DEPTH = 200 for similar protection
complexity.rs and cfg.rs both define MAX_WALK_DEPTH: usize = 200
Uses wrapper function pattern (walk_node → walk_node_depth) to maintain stable public APIs

Risk Assessment: ✅ LOW RISK - This is clearly fixing a real underlying issue (segfaults) rather than hiding symptoms. The depth limit is reasonable and follows existing precedent.

Phase 2: Configuration Change Detection ✅

Analysis: ✅ NO HIGH-RISK CHANGES DETECTED

No configuration files (.json, .toml, .yml, .config, etc.) were modified
No linting rules, test timeouts, or coverage thresholds were relaxed
No TypeScript strict mode or build configuration changes
All changes are pure code implementations maintaining existing behavior

Phase 3: Test Integrity ✅

Analysis: ✅ NO TEST INTEGRITY CONCERNS

Zero test files were modified in this PR
No test assertions were removed or weakened
No coverage thresholds lowered or test timeouts increased
No eslint-disable or workaround comments added
Changes are purely defensive guards with graceful degradation

Phase 4: Code Quality Review ✅

Implementation Pattern: ✅ EXCELLENT CONSISTENCY

Consistent depth constant: MAX_WALK_DEPTH: usize = 200 across all files
Proper wrapper pattern: Public APIs remain unchanged (walk_node → walk_node_depth)
Graceful degradation: Depth exceeded → early return (no panic, no error)
Threading pattern: Depth parameter consistently passed as depth + 1 to children

Affected Components (12 files):

Language extractors (9): javascript.rs, python.rs, go.rs, java.rs, csharp.rs, rust_lang.rs, ruby.rs, php.rs, hcl.rs
Generic helpers: helpers.rs - shared walk_ast_nodes_with_config_depth
Complexity analysis: complexity.rs - threads depth through walk() function
CFG construction: cfg.rs - adds depth guard to process_if_depth

Security Assessment: ✅ NO CONCERNS

No SQL injection, command injection, or secret exposure vectors
Depth limit prevents DoS via deeply nested input
Fail-safe behavior (early return) instead of panic/crash

Performance Impact: ✅ MINIMAL

Single integer parameter addition per recursive call
Comparison operation depth >= MAX_WALK_DEPTH is O(1)
Early termination actually improves performance on pathological inputs

Phase 5: Backlog Compliance Assessment ✅

Zero-dep: ✓ — No new runtime dependencies added. Uses existing tree-sitter/SQLite/AST infrastructure.

Foundation-aligned: ✓ — Fully compliant with core principles:

P1 (always-current): Prevents crashes that would break real-time analysis
P4 (zero-cost core): No external dependencies, minimal overhead
P5 (embeddable-first): Maintains stable programmatic APIs
P8 (we are not an agent): Pure defensive infrastructure, no decisions/generation

Problem-fit: 5/5 — Directly addresses core problem: AI agents waste tokens re-orienting when tools crash on pathological inputs. Prevents segfaults that would completely break AI agent workflows.

Breaking: No — All public APIs maintained via wrapper pattern. Purely additive safety improvement.

Tier: Tier 1 (zero-dep ✓ + foundation-aligned ✓)

Final Assessment

Root Cause Analysis

Problem Being Solved: Stack overflow segfaults in native Rust engine AST walkers on deeply nested syntax trees (issue #481).
Why This Approach: Follows established MAX_WALK_DEPTH = 200 pattern already used in dataflow.rs, complexity.rs, cfg.rs. Graceful degradation via early return.
Risk Assessment: LOW — Fixes real infrastructure failure, not masking symptoms.

Backlog Compliance

Zero-dep: ✓ — builds on existing tree-sitter/SQLite
Foundation-aligned: ✓ — maintains zero-infrastructure, embeddable-first principles
Problem-fit: 5 — directly prevents AI workflow breakage from tool crashes
Breaking: No — stable API via wrapper pattern
Tier: 1

Critical Concerns

None. This is a high-quality defensive fix with excellent implementation consistency.

Final Recommendation

Rating: ⭐⭐⭐⭐⭐ (5/5)
Action: APPROVE
Reasoning: Textbook example of defensive programming. Fixes real segfault issue using established patterns with zero API breakage. Essential infrastructure reliability improvement.

greptile-apps · 2026-03-17T07:28:41Z

Greptile Summary

This PR adds MAX_WALK_DEPTH = 200 depth guards to all recursive AST traversal functions in the native Rust engine (crates/codegraph-core/), preventing stack-overflow segfaults when parsing deeply nested syntax trees. It also consolidates the previously scattered depth-guard constants (a local MAX_VISIT_DEPTH in dataflow.rs and per-file private consts in complexity.rs/cfg.rs) into a single canonical crates/codegraph-core/src/constants.rs, with helpers.rs re-exporting it so existing extractor wildcard imports continue to work without modification.

New constants.rs declares the single pub const MAX_WALK_DEPTH: usize = 200; lib.rs exposes it as pub mod constants.
Wrapper pattern (walk_node → walk_node_depth, etc.) is applied uniformly across all 9 extractor files, helpers.rs, javascript.rs (4 functions), complexity.rs, and cfg.rs, keeping all public and package-internal APIs stable.
complexity.rs threads depth through both walk and walk_children with symmetric guards; walk correctly passes depth (not depth + 1) when delegating child iteration to walk_children on the same node, while walk_children increments to depth + 1 when calling walk on each child — so the counter tracks AST depth, not raw call-frame depth, and the maximum real call-stack growth is ~2× MAX_WALK_DEPTH.
dataflow.rs replaces its local MAX_VISIT_DEPTH with the shared constant and unifies the naming.
All previous review feedback (shared constant module, symmetric guard in walk_children, clarifying comment in cfg.rs) has been addressed.

Confidence Score: 5/5

This PR is safe to merge — it is a pure defensive hardening with no changes to observable behaviour on well-formed inputs.
The wrapper pattern is applied consistently and correctly across all 15 files. Depth increments at the right call sites in every case. The shared constant module cleanly replaces three prior independent definitions. All feedback from the previous review round has been addressed. No regressions are expected for normal AST depths; deeply nested inputs are now handled gracefully with a silent truncation (documented in cfg.rs) rather than a segfault.
No files require special attention.

Important Files Changed

Filename	Overview
crates/codegraph-core/src/constants.rs	New shared constants module introducing the canonical `MAX_WALK_DEPTH = 200` used by all consumers (complexity, cfg, dataflow, helpers/extractors).
crates/codegraph-core/src/dataflow.rs	Removes local `MAX_VISIT_DEPTH` constant, imports canonical `MAX_WALK_DEPTH` from `constants`, renames usages — behaviour is unchanged.
crates/codegraph-core/src/extractors/helpers.rs	Re-exports `MAX_WALK_DEPTH` from `constants` for wildcard-importing extractors; wraps `walk_ast_nodes_with_config` with a depth-limited `_depth` variant correctly incrementing depth at both recursion sites.
crates/codegraph-core/src/extractors/javascript.rs	Most extensive extractor change: wraps `walk_node`, `walk_ast_nodes`, `extract_implements_from_node`, and `scan_import_names` with depth-limited `_depth` variants; all recursion sites correctly pass `depth + 1`.
crates/codegraph-core/src/complexity.rs	Threads `depth: usize` through both `walk` and `walk_children`; both functions have symmetric guards. `walk` passes `depth` (not `depth + 1`) to `walk_children` on the same node, which then passes `depth + 1` to child `walk` calls — this correctly counts AST levels rather than call-stack frames.
crates/codegraph-core/src/cfg.rs	Wraps `process_if` with `process_if_depth`; both else-if recursion sites pass `depth + 1`; the depth-limit early-return is annotated with a clarifying comment explaining the truncated-but-valid CFG behaviour.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Public API: walk_node(node, src, syms)"] --> B["walk_node_depth(node, src, syms, depth=0)"]
    B --> C{"depth >= MAX_WALK_DEPTH?"}
    C -- Yes --> D["return — guard fires"]
    C -- No --> E["Process current node"]
    E --> F["Iterate node.child_count()"]
    F --> G["walk_node_depth(child, src, syms, depth+1)"]
    G --> C

    subgraph constants_module ["constants.rs — single source of truth"]
        P["MAX_WALK_DEPTH = 200"] --> Q["cfg.rs, complexity.rs, dataflow.rs"]
        P --> R["helpers.rs re-exports for extractors"]
    end

    subgraph complexity_mutual ["complexity.rs — mutual recursion pair"]
        H["walk(node, depth)"] --> I{"depth >= MAX_WALK_DEPTH?"}
        I -- Yes --> J["return"]
        I -- No --> K["walk_children(node, same depth)"]
        K --> L{"depth >= MAX_WALK_DEPTH?"}
        L -- Yes --> M["return"]
        L -- No --> N["walk(child, depth+1) per child"]
        N --> H
    end

_{Last reviewed commit: 2b8c458}

greptile-apps · 2026-03-17T07:28:45Z

crates/codegraph-core/src/complexity.rs

 use crate::types::ComplexityMetrics;

+/// Maximum recursion depth for AST traversal to prevent stack overflow.
+const MAX_WALK_DEPTH: usize = 200;


Duplicate constant, consider a shared source of truth

complexity.rs (and cfg.rs) each define their own private const MAX_WALK_DEPTH: usize = 200, while helpers.rs already exports pub const MAX_WALK_DEPTH: usize = 200 (used by all the extractor files). This creates three independent definitions of the same value. If the guard threshold ever needs to be tuned, all three sites must be updated in sync.

Since complexity.rs and cfg.rs live outside the extractors module, importing from helpers.rs would be an awkward cross-module dependency. A cleaner long-term option would be a top-level shared constants module (e.g., crates/codegraph-core/src/constants.rs) so the single canonical value can be re-exported to all consumers. For now, at minimum a code comment pointing to the other two definitions would reduce the risk of drift.

This also applies to crates/codegraph-core/src/cfg.rs:5.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Fixed — created a shared \ module with the single canonical \ definition. All four consumers (complexity.rs, cfg.rs, dataflow.rs, helpers.rs) now import from it. Also unified dataflow's \ name to \ for consistency. helpers.rs re-exports via \ so extractors' wildcard imports still work.

greptile-apps · 2026-03-17T07:28:46Z

crates/codegraph-core/src/complexity.rs

@@ -397,6 +402,7 @@ fn walk_children(
                cognitive,
                cyclomatic,
                max_nesting,
+                depth + 1,
            );
        }
    }


Depth guard belongs in walk_children, not only in walk

The depth guard lives entirely in walk, but walk_children is the true entry point — compute_function_complexity calls walk_children(…, 0) directly, never walk. Because walk_children itself contains no guard, a future caller that invokes walk_children at an already-high depth (e.g. from a wrapper that passes a pre-existing depth) could skip the guard on the first traversal step.

Adding a symmetric guard to walk_children would make the invariant self-contained:

fn walk_children( node: &Node, nesting_level: u32, is_top_function: bool, rules: &LangRules, cognitive: &mut u32, cyclomatic: &mut u32, max_nesting: &mut u32, depth: usize, ) { if depth >= MAX_WALK_DEPTH { return; } for i in 0..node.child_count() {

This is a minor defensive hardening, not an active bug given the current call sites.

Fixed — added a symmetric depth guard at the top of walk_children. Now both walk and walk_children independently check depth >= MAX_WALK_DEPTH, making the invariant self-contained regardless of which function is called first.

Reflects MAX_WALK_DEPTH constants introduced by #484 (fix for #481) and the pre-existing MAX_VISIT_DEPTH in dataflow.rs plus MCP_MAX_LIMIT. All are safety boundaries that should stay hardcoded.

Extract the duplicated MAX_WALK_DEPTH constant from complexity.rs, cfg.rs, dataflow.rs, and helpers.rs into a single constants.rs module. Also unifies dataflow's MAX_VISIT_DEPTH name to MAX_WALK_DEPTH. helpers.rs re-exports the constant for extractor wildcard imports.

carlos-alm · 2026-03-17T07:36:49Z

@greptileai

greptile-apps · 2026-03-17T07:46:46Z

crates/codegraph-core/src/cfg.rs

+    fn process_if_depth(&mut self, if_stmt: &Node, current: u32, depth: usize) -> Option<u32> {
+        if depth >= MAX_WALK_DEPTH {
+            return Some(current);
+        }


Early-return leaves pre-created false_block as an empty pass-through

When the depth guard fires here, current is a freshly-created false_block that the caller already wired up with cond_block → false_block ("branch_false"). Returning Some(current) causes the caller to immediately add false_block → join_block ("fallthrough"), leaving false_block as an empty stub labeled "else-if" that bypasses the entire else-if chain.

The resulting CFG is still structurally valid (no orphaned nodes, no crash), but CFG consumers will observe a branch_false → join edge with no body, silently truncating the else-if ladder. A brief comment here would help future maintainers understand the intent:

Suggested change

fn process_if_depth(&mut self, if_stmt: &Node, current: u32, depth: usize) -> Option<u32> {

if depth >= MAX_WALK_DEPTH {

return Some(current);

}

fn process_if_depth(&mut self, if_stmt: &Node, current: u32, depth: usize) -> Option<u32> {

if depth >= MAX_WALK_DEPTH {

// Depth limit reached: return `current` so the caller can still

// wire up a fallthrough edge to its join block. The else-if chain

// will be silently truncated — the resulting CFG is structurally

// valid but incomplete for very deeply nested if-else ladders.

return Some(current);

}

Fixed — added the suggested clarifying comment explaining that the early return produces a structurally valid but truncated CFG for deeply nested if-else ladders.

Document that the early return on depth limit produces a structurally valid but truncated CFG for very deeply nested if-else ladders.

carlos-alm · 2026-03-17T07:50:14Z

@greptileai

* docs: add plan to centralize hardcoded configuration values Inventory 78+ hardcoded magic numbers across the codebase and design a 6-phase plan to route them through the existing .codegraphrc.json config system. Includes PR #480 brief command thresholds. * fix: address Greptile review feedback on config centralization plan (#482) - Fix magic-number count: ~70 individual values (34 entries), add derivation - Make recursive mergeConfig a hard Phase 1 prerequisite (not optional) - Move MCP_MAX_LIMIT to Category F (keep hardcoded — security boundary) - Merge Phase 7 into Phase 6 to fix phase count mismatch * fix: address round-2 Greptile review feedback (#482) - Keep build.driftThreshold in `build` namespace (not community) - Add `structure.cohesionThreshold` for A8 instead - Fix stale line numbers for A9-A12 (brief.js) - Split briefBfsDepth into briefCallerDepth + briefImporterDepth - Clarify check.js uses config.check.depth as sole authoritative key * fix: address round-3 Greptile review feedback (#482) - Add C5 (similarityWarnThreshold) to DEFAULTS schema and Phase 4 wiring - Split D4 into D4a (jsdocEndScanLines: 10) and D4b (jsdocOpenScanLines: 20) - Update Phase 5 file-utils.js wiring to reference both D4 keys * fix: address round-4 Greptile review feedback (#482) - Move C6 (embedding batch sizes) to Category F with rationale: model-specific implementation details rarely tuned by end-users - Rename analysis.defaultDepth to analysis.impactDepth for consistency with other per-feature depth keys (auditDepth, sequenceDepth, etc.) - Remove ambiguous Phase 4 wording about models.js configurability * docs: add native engine depth constants to config inventory (F7-F11) Reflects MAX_WALK_DEPTH constants introduced by #484 (fix for #481) and the pre-existing MAX_VISIT_DEPTH in dataflow.rs plus MCP_MAX_LIMIT. All are safety boundaries that should stay hardcoded. * fix: address round-5 Greptile review feedback (#482) - Add config.search.topK (C3) to semantic.js wiring in Phase 4 - Add structure to the Phase 6 CLAUDE.md configurable sections list

carlos-alm mentioned this pull request Mar 17, 2026

docs: plan to centralize hardcoded configuration #482

Merged

2 tasks

greptile-apps bot reviewed Mar 17, 2026

View reviewed changes

fix: add clarifying comment on cfg.rs depth-guard early return (#484)

2b8c458

Document that the early return on depth limit produces a structurally valid but truncated CFG for very deeply nested if-else ladders.

Merge branch 'main' into fix/481-walk-depth-limit

01ae7fd

carlos-alm merged commit 0a91ba7 into main Mar 17, 2026
19 checks passed

carlos-alm deleted the fix/481-walk-depth-limit branch March 17, 2026 07:58

github-actions bot locked and limited conversation to collaborators Mar 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: add MAX_WALK_DEPTH guard to native engine AST walkers#484

fix: add MAX_WALK_DEPTH guard to native engine AST walkers#484
carlos-alm merged 4 commits intomainfrom
fix/481-walk-depth-limit

carlos-alm commented Mar 17, 2026

Uh oh!

claude bot commented Mar 17, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 17, 2026 •

edited

Loading

Uh oh!

greptile-apps bot Mar 17, 2026

Uh oh!

carlos-alm Mar 17, 2026

Uh oh!

greptile-apps bot Mar 17, 2026

Uh oh!

carlos-alm Mar 17, 2026

Uh oh!

carlos-alm commented Mar 17, 2026

Uh oh!

greptile-apps bot Mar 17, 2026

Uh oh!

carlos-alm Mar 17, 2026

Uh oh!

carlos-alm commented Mar 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented Mar 17, 2026

Summary

Affected components

Test plan

Uh oh!

claude bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: MAX_WALK_DEPTH Guard Implementation

Phase 1: Root Cause Analysis ✅

Phase 2: Configuration Change Detection ✅

Phase 3: Test Integrity ✅

Phase 4: Code Quality Review ✅

Phase 5: Backlog Compliance Assessment ✅

Final Assessment

Root Cause Analysis

Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

greptile-apps bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 17, 2026

Uh oh!

greptile-apps bot Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 17, 2026 •

edited

Loading

greptile-apps bot commented Mar 17, 2026 •

edited

Loading