Skip to content

fix: restore complexity/CFG/dataflow analysis in builds#469

Merged
carlos-alm merged 7 commits intomainfrom
fix/incremental-analysis-468
Mar 16, 2026
Merged

fix: restore complexity/CFG/dataflow analysis in builds#469
carlos-alm merged 7 commits intomainfrom
fix/incremental-analysis-468

Conversation

@carlos-alm
Copy link
Contributor

Summary

Fixes #468. Two bugs were causing all analysis tables (function_complexity, cfg_blocks, cfg_edges, dataflow) to be empty on every build — full or incremental, native or WASM.

  • Broken import paths in ast-analysis/engine.js: The buildXxx functions were dynamically imported as ../complexity.js, ../cfg.js, ../ast.js, ../dataflow.js — which resolved to non-existent src/complexity.js etc. instead of src/features/complexity.js. The imports failed silently inside try/catch blocks, so analysis data was never written to the DB. Fixed to ../features/complexity.js etc.

  • Missing WASM tree pre-parse for native incremental rebuilds: ensureWasmTrees was only triggered when dataflow analysis needed it, but complexity and CFG visitors also require a WASM tree-sitter tree for their visitor walks. On native incremental rebuilds, changed files had their analysis data purged by purgeFilesFromGraph but never re-computed. Broadened the ensureWasmTrees gate to trigger when any of complexity/CFG/dataflow need it.

Test plan

  • New tests in incremental-parity.test.js verify complexity, CFG, and dataflow rows survive incremental rebuilds
  • Full test suite passes (662 tests, 42 files — failures are only pre-existing missing WASM grammars for Rust/Java/C#/Ruby/PHP in worktree)
  • Manual verification: WASM full build, native full build, and native incremental rebuild all produce non-zero analysis rows

Two bugs fixed:

1. Broken import paths in ast-analysis/engine.js — the buildXxx
   functions (buildComplexityMetrics, buildCFGData, buildDataflowEdges,
   buildAstNodes) were imported as ../complexity.js etc. which resolved
   to non-existent src/complexity.js instead of src/features/complexity.js.
   The dynamic imports failed silently inside try/catch, so all analysis
   tables (function_complexity, cfg_blocks, dataflow) were always empty
   on every build — both full and incremental, both engines.

2. Missing WASM tree pre-parse for native engine incremental rebuilds —
   ensureWasmTrees was only called when dataflow needed it, but
   complexity and CFG visitors also require a WASM tree. On native
   incremental rebuilds, changed files had their analysis data purged
   but never re-computed because no WASM tree was available for the
   visitor walk.

Impact: 1 functions changed, 0 affected
@claude
Copy link

claude bot commented Mar 16, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Fix complexity/CFG/dataflow analysis in builds

Conducting comprehensive review following 5-phase process:

Todo List:

  • Phase 1: Root cause analysis - understand the real problem
  • Phase 2: Configuration change detection - check for risky changes
  • Phase 3: Test integrity verification
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance assessment
  • Final: Provide structured assessment and rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Two critical bugs causing all analysis tables to be empty on every build:

  1. Broken import paths in src/ast-analysis/engine.js:340-373 - Dynamic imports were using ../complexity.js instead of ../features/complexity.js, causing silent failures inside try/catch blocks
  2. Missing WASM tree pre-parse for native incremental rebuilds - Only dataflow analysis triggered ensureWasmTrees(), but complexity and CFG visitors also require WASM trees

Why This Approach: This fixes the root cause, not symptoms. The import paths were genuinely wrong after the codebase reorganization (commit 68bd052 moved files from src/ to src/features/), and the WASM tree logic was incomplete.

Risk Assessment: Low risk - surgical fixes to import paths + broadened condition for WASM tree generation.

Phase 2: Configuration Change Detection ✅

No configuration, linting rules, timeouts, or module resolution changes detected. This is pure bug fixing.

Phase 3: Test Integrity ✅

Strong test coverage added:

  • 3 new tests in incremental-parity.test.js:147-164 specifically verify complexity, CFG, and dataflow analysis survive incremental rebuilds
  • Tests use readAnalysisTables() function to query the actual DB tables: function_complexity, cfg_blocks, dataflow
  • Tests verify row counts match between full vs incremental builds
  • Proper assertions: expect(incrAnalysis.complexity.length).toBeGreaterThan(0) ensures data exists

No red flags - tests strengthen verification without relaxing standards.

Phase 4: Code Quality ✅

Import path fixes (lines 340, 351, 362, 373):

- const { buildAstNodes } = await import('../ast.js');
+ const { buildAstNodes } = await import('../features/ast.js');

✅ Correct - the files are actually in src/features/, verified via ls src/features/

WASM tree logic enhancement (lines 81-116):

  • Broadened condition from if (doDataflow) to if (doComplexity || doCfg || doDataflow)
  • Added proper per-analysis checking: needsComplexity, needsCfg, needsDataflow
  • Logic is sound: complexity/CFG visitors need WASM trees for AST walking, not just dataflow

Code quality: Clean, defensive, follows existing patterns. No performance issues introduced.

Phase 5: Backlog Compliance ✅

Root Cause Analysis

Problem Being Solved: Critical infrastructure bug - all analysis tables empty due to broken import paths and incomplete WASM tree logic
Why This Approach: Fixes actual import resolution failures and missing dependencies, not masking symptoms
Risk Assessment: Low risk surgical fixes to core functionality

Backlog Compliance

  • Zero-dep: ✓ - No new runtime dependencies added
  • Foundation-aligned: ✓ - Fixes existing functionality, maintains offline-first architecture
  • Problem-fit: 4/5 - Directly prevents analysis data loss that would confuse AI agents about code complexity/structure
  • Breaking: No - Pure bug fixes, no API changes
  • Tier: 1

Critical Concerns

None. This is a well-executed bug fix with:

  • Precise root cause identification
  • Minimal, surgical changes
  • Strong test coverage for regression prevention
  • No relaxed standards or workarounds

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent bug fix - addresses real infrastructure issues with proper testing. Restores critical analysis functionality that would otherwise leave AI agents with incomplete code structure data. Only missing 5th star due to lack of documentation about why these imports failed originally (though git history provides context).

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR correctly fixes two independent root causes that were silently emptying all analysis tables (function_complexity, cfg_blocks, cfg_edges, dataflow) on every build. The changes are surgical and well-reasoned.

Root cause fixes:

  • Wrong dynamic import paths (../complexity.js../features/complexity.js etc.) — all four buildXxx imports were resolving to non-existent paths; because they were inside try/catch blocks the failures were swallowed silently. Fixed across all four import sites.
  • Too-narrow WASM pre-parse gate — the ensureWasmTrees call was gated only on doDataflow, so native incremental rebuilds would purge complexity and CFG data for changed files and never re-compute it. The gate is now broadened to any of doComplexity || doCfg || doDataflow, with fine-grained per-file guards (needsComplexity, needsCfg, needsDataflow) that match the extension sets and state checks used by the unified walk.

Regression test quality:

  • All three new analysis-parity tests assert toBeGreaterThan(0) before comparing counts, preventing vacuous passes.
  • Correct column names (function_node_id, source_id) are used in the SQL helpers.
  • db.close() in readAnalysisTables is guarded by try/finally.

Minor observation: The existing readGraph helper (not introduced by this PR) still calls db.close() inline without try/finally, inconsistent with the pattern correctly applied in the new readAnalysisTables.

Confidence Score: 5/5

  • This PR is safe to merge — it fixes two clearly identified silent failures with minimal, targeted changes and adds non-vacuous regression tests for all three affected analysis types.
  • Both fixes are straightforward corrections (wrong path strings, missing gate condition) with no behavioural changes beyond restoring previously broken functionality. The only outstanding item is a pre-existing try/finally omission in readGraph which is a test-helper resource-leak risk, not a production correctness issue.
  • No files require special attention; the one minor style note is in the test helper readGraph at line 28 of tests/integration/incremental-parity.test.js.

Important Files Changed

Filename Overview
src/ast-analysis/engine.js Fixes two root causes of empty analysis tables: broken ../features/* import paths and a too-narrow WASM pre-parse gate. The new gate correctly checks all three analysis types (needsComplexity, needsCfg, needsDataflow) with consistent extension and state guards. The COMPLEXITY_EXTENSIONS constant and d.cfg !== null guard align this module with the patterns already used by the unified walk. No new issues introduced.
tests/integration/incremental-parity.test.js Adds readAnalysisTables helper and three new regression tests for #468. Column names are correct (function_node_id, source_id), toBeGreaterThan(0) guards prevent vacuous passes, and db.close() in readAnalysisTables is properly guarded by try/finally. Minor: the pre-existing readGraph function was not updated with the same try/finally pattern, leaving a small resource-leak inconsistency.

Sequence Diagram

sequenceDiagram
    participant Builder as builder.js
    participant Engine as engine.js (runAnalyses)
    participant Parser as domain/parser.js
    participant Features as features/*.js

    Builder->>Engine: runAnalyses(db, fileSymbols, rootDir, opts)

    Note over Engine: WASM pre-parse gate (NEW: doComplexity || doCfg || doDataflow)
    loop For each file without _tree
        Engine->>Engine: needsComplexity = doComplexity && COMPLEXITY_EXTENSIONS && !d.complexity
        Engine->>Engine: needsCfg = doCfg && CFG_EXTENSIONS && d.cfg !== null && !blocks
        Engine->>Engine: needsDataflow = doDataflow && !symbols.dataflow && DATAFLOW_EXTENSIONS
        alt Any need is true
            Engine->>Parser: ensureWasmTrees(fileSymbols, rootDir)
            Parser-->>Engine: fileSymbols updated with _tree
            Note over Engine: break — no need to check further files
        end
    end

    Note over Engine: Unified walk (files with _tree)
    loop For each file with _tree
        Engine->>Engine: createComplexityVisitor / createCfgVisitor / createDataflowVisitor
        Engine->>Engine: walkWithVisitors → store results on symbols/defs
    end

    Note over Engine: Delegate to buildXxx (FIXED import paths)
    Engine->>Features: import('../features/ast.js') → buildAstNodes
    Engine->>Features: import('../features/complexity.js') → buildComplexityMetrics
    Engine->>Features: import('../features/cfg.js') → buildCFGData
    Engine->>Features: import('../features/dataflow.js') → buildDataflowEdges
    Features-->>Engine: DB rows written
    Engine-->>Builder: timing object
Loading

Last reviewed commit: 93ec404

Comment on lines +88 to +99
const needsComplexity =
doComplexity &&
defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity);
const needsCfg =
doCfg &&
CFG_EXTENSIONS.has(ext) &&
defs.some(
(d) =>
(d.kind === 'function' || d.kind === 'method') &&
d.line &&
!Array.isArray(d.cfg?.blocks),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needsCfg gate is looser than needsWasmCfg

The WASM pre-parse gate here uses !Array.isArray(d.cfg?.blocks) which is true when d.cfg === null. However, the unified walk gate on line 197 uses d.cfg !== null && !Array.isArray(d.cfg?.blocks), which skips null entries.

In cfg.js, d.cfg === null means "the native engine explicitly decided CFG can't/needn't be computed for this definition" and is treated as satisfied. The mismatch means this gate may trigger WASM tree loading for files where the visitor will then skip all definitions.

This isn't a correctness bug (CFG results are still correct), but it's a minor inefficiency — and a consistency concern. Consider aligning the check:

Suggested change
const needsComplexity =
doComplexity &&
defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity);
const needsCfg =
doCfg &&
CFG_EXTENSIONS.has(ext) &&
defs.some(
(d) =>
(d.kind === 'function' || d.kind === 'method') &&
d.line &&
!Array.isArray(d.cfg?.blocks),
);
const needsComplexity =
doComplexity &&
defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity);
const needsCfg =
doCfg &&
CFG_EXTENSIONS.has(ext) &&
defs.some(
(d) =>
(d.kind === 'function' || d.kind === 'method') &&
d.line &&
d.cfg !== null &&
!Array.isArray(d.cfg?.blocks),
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added d.cfg !== null to the needsCfg gate, aligning it with the unified walk guard on line 197 and avoiding unnecessary WASM loading for null-CFG definitions.

Comment on lines +154 to +158
it('preserves CFG blocks for changed file (#468)', () => {
const fullAnalysis = readAnalysisTables(path.join(fullDir, '.codegraph', 'graph.db'));
const incrAnalysis = readAnalysisTables(path.join(incrDir, '.codegraph', 'graph.db'));
expect(incrAnalysis.cfgBlocks.length).toBe(fullAnalysis.cfgBlocks.length);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFG and dataflow tests may pass vacuously

The complexity test on line 150 asserts toBeGreaterThan(0) to ensure rows actually exist before comparing counts. The CFG and dataflow tests only compare counts between full and incremental — if both return 0 (e.g., because the fixture is too simple or a grammar is missing), the test passes without actually verifying analysis data survives incremental rebuilds.

Consider adding expect(incrAnalysis.cfgBlocks.length).toBeGreaterThan(0) and similar for dataflow, matching the complexity test's approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added toBeGreaterThan(0) assertions for CFG blocks and dataflow statements, matching the complexity test pattern. Also fixed wrong column names in readAnalysisTables (node_idfunction_node_id, source_node_idsource_id) that were causing the queries to silently return empty arrays via the try/catch fallback — confirming the tests were indeed passing vacuously.

Add d.cfg !== null check to needsCfg predicate, matching the unified
walk guard and avoiding unnecessary WASM loading for null-CFG defs.

Fix wrong column names in readAnalysisTables (node_id -> function_node_id,
source_node_id -> source_id) that caused CFG/dataflow queries to silently
return empty arrays. Add toBeGreaterThan(0) assertions so these tests
fail fast if analysis data is missing.

Impact: 1 functions changed, 0 affected
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +88 to +90
const needsComplexity =
doComplexity &&
defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needsComplexity missing language/extension filter

Unlike needsCfg (which guards with CFG_EXTENSIONS.has(ext)) and needsDataflow (which guards with DATAFLOW_EXTENSIONS.has(ext)), needsComplexity has no extension filter. If a file has function definitions but belongs to a language with no COMPLEXITY_RULES entry (e.g. a language the native parser produces definitions for but the WASM complexity visitor doesn't support), this gate can fire and trigger ensureWasmTrees for all files in the batch — even though the unified walk will later skip complexity for that language because cRules is undefined.

The check should be consistent with the other gates:

Suggested change
const needsComplexity =
doComplexity &&
defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity);
const needsComplexity =
doComplexity &&
COMPLEXITY_RULES.has(extToLang.get(ext)) &&
defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity);

This prevents unnecessary WASM loading and brings the three gate conditions in line with each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 93ec404. Added COMPLEXITY_EXTENSIONS = buildExtensionSet(COMPLEXITY_RULES) at module level (matching CFG_EXTENSIONS and DATAFLOW_EXTENSIONS), and added COMPLEXITY_EXTENSIONS.has(ext) to the needsComplexity check so it now filters by language/extension like the other two guards.

Comment on lines +47 to +84
const db = new Database(dbPath, { readonly: true });
const result = {};
try {
result.complexity = db
.prepare(
`SELECT fc.node_id, fc.cognitive, fc.cyclomatic, n.name, n.file
FROM function_complexity fc JOIN nodes n ON fc.node_id = n.id
ORDER BY n.name, n.file`,
)
.all();
} catch {
result.complexity = [];
}
try {
result.cfgBlocks = db
.prepare(
`SELECT cb.function_node_id, cb.block_index, cb.block_type, n.name, n.file
FROM cfg_blocks cb JOIN nodes n ON cb.function_node_id = n.id
ORDER BY n.name, n.file, cb.block_index`,
)
.all();
} catch {
result.cfgBlocks = [];
}
try {
result.dataflow = db
.prepare(
`SELECT d.source_id, d.kind, n.name, n.file
FROM dataflow d JOIN nodes n ON d.source_id = n.id
ORDER BY n.name, n.file, d.kind`,
)
.all();
} catch {
result.dataflow = [];
}
db.close();
return result;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db.close() not guarded by finally

db.close() at line 82 is called after three separate try/catch blocks. In the current implementation this is always reachable — the inner catch blocks swallow all exceptions — but the pattern is fragile: a future catch that rethrows, or a change that adds code between the last catch and db.close(), would silently leak the database handle. A try/finally makes the intent explicit and is idiomatic for resource cleanup:

Suggested change
const db = new Database(dbPath, { readonly: true });
const result = {};
try {
result.complexity = db
.prepare(
`SELECT fc.node_id, fc.cognitive, fc.cyclomatic, n.name, n.file
FROM function_complexity fc JOIN nodes n ON fc.node_id = n.id
ORDER BY n.name, n.file`,
)
.all();
} catch {
result.complexity = [];
}
try {
result.cfgBlocks = db
.prepare(
`SELECT cb.function_node_id, cb.block_index, cb.block_type, n.name, n.file
FROM cfg_blocks cb JOIN nodes n ON cb.function_node_id = n.id
ORDER BY n.name, n.file, cb.block_index`,
)
.all();
} catch {
result.cfgBlocks = [];
}
try {
result.dataflow = db
.prepare(
`SELECT d.source_id, d.kind, n.name, n.file
FROM dataflow d JOIN nodes n ON d.source_id = n.id
ORDER BY n.name, n.file, d.kind`,
)
.all();
} catch {
result.dataflow = [];
}
db.close();
return result;
}
function readAnalysisTables(dbPath) {
const db = new Database(dbPath, { readonly: true });
const result = {};
try {
try {
result.complexity = db
.prepare(
`SELECT fc.node_id, fc.cognitive, fc.cyclomatic, n.name, n.file
FROM function_complexity fc JOIN nodes n ON fc.node_id = n.id
ORDER BY n.name, n.file`,
)
.all();
} catch {
result.complexity = [];
}
try {
result.cfgBlocks = db
.prepare(
`SELECT cb.function_node_id, cb.block_index, cb.block_type, n.name, n.file
FROM cfg_blocks cb JOIN nodes n ON cb.function_node_id = n.id
ORDER BY n.name, n.file, cb.block_index`,
)
.all();
} catch {
result.cfgBlocks = [];
}
try {
result.dataflow = db
.prepare(
`SELECT d.source_id, d.kind, n.name, n.file
FROM dataflow d JOIN nodes n ON d.source_id = n.id
ORDER BY n.name, n.file, d.kind`,
)
.all();
} catch {
result.dataflow = [];
}
} finally {
db.close();
}
return result;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 93ec404. Wrapped the three inner try/catch blocks in an outer try/finally so db.close() is guaranteed to run even if an unexpected error propagates past the individual catch blocks.

@carlos-alm
Copy link
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 2cebba1 into main Mar 16, 2026
13 checks passed
@carlos-alm carlos-alm deleted the fix/incremental-analysis-468 branch March 16, 2026 11:22
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incremental rebuild loses complexity/CFG/dataflow on native engine

1 participant