fix: restore complexity/CFG/dataflow analysis in builds#469
Conversation
Two bugs fixed: 1. Broken import paths in ast-analysis/engine.js — the buildXxx functions (buildComplexityMetrics, buildCFGData, buildDataflowEdges, buildAstNodes) were imported as ../complexity.js etc. which resolved to non-existent src/complexity.js instead of src/features/complexity.js. The dynamic imports failed silently inside try/catch, so all analysis tables (function_complexity, cfg_blocks, dataflow) were always empty on every build — both full and incremental, both engines. 2. Missing WASM tree pre-parse for native engine incremental rebuilds — ensureWasmTrees was only called when dataflow needed it, but complexity and CFG visitors also require a WASM tree. On native incremental rebuilds, changed files had their analysis data purged but never re-computed because no WASM tree was available for the visitor walk. Impact: 1 functions changed, 0 affected
|
Claude finished @carlos-alm's task —— View job PR Review: Fix complexity/CFG/dataflow analysis in buildsConducting comprehensive review following 5-phase process: Todo List:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Two critical bugs causing all analysis tables to be empty on every build:
Why This Approach: This fixes the root cause, not symptoms. The import paths were genuinely wrong after the codebase reorganization (commit 68bd052 moved files from Risk Assessment: Low risk - surgical fixes to import paths + broadened condition for WASM tree generation. Phase 2: Configuration Change Detection ✅No configuration, linting rules, timeouts, or module resolution changes detected. This is pure bug fixing. Phase 3: Test Integrity ✅Strong test coverage added:
No red flags - tests strengthen verification without relaxing standards. Phase 4: Code Quality ✅Import path fixes (lines 340, 351, 362, 373): - const { buildAstNodes } = await import('../ast.js');
+ const { buildAstNodes } = await import('../features/ast.js');✅ Correct - the files are actually in WASM tree logic enhancement (lines 81-116):
Code quality: Clean, defensive, follows existing patterns. No performance issues introduced. Phase 5: Backlog Compliance ✅Root Cause AnalysisProblem Being Solved: Critical infrastructure bug - all analysis tables empty due to broken import paths and incomplete WASM tree logic Backlog Compliance
Critical ConcernsNone. This is a well-executed bug fix with:
Final Recommendation
|
Greptile SummaryThis PR correctly fixes two independent root causes that were silently emptying all analysis tables ( Root cause fixes:
Regression test quality:
Minor observation: The existing Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Builder as builder.js
participant Engine as engine.js (runAnalyses)
participant Parser as domain/parser.js
participant Features as features/*.js
Builder->>Engine: runAnalyses(db, fileSymbols, rootDir, opts)
Note over Engine: WASM pre-parse gate (NEW: doComplexity || doCfg || doDataflow)
loop For each file without _tree
Engine->>Engine: needsComplexity = doComplexity && COMPLEXITY_EXTENSIONS && !d.complexity
Engine->>Engine: needsCfg = doCfg && CFG_EXTENSIONS && d.cfg !== null && !blocks
Engine->>Engine: needsDataflow = doDataflow && !symbols.dataflow && DATAFLOW_EXTENSIONS
alt Any need is true
Engine->>Parser: ensureWasmTrees(fileSymbols, rootDir)
Parser-->>Engine: fileSymbols updated with _tree
Note over Engine: break — no need to check further files
end
end
Note over Engine: Unified walk (files with _tree)
loop For each file with _tree
Engine->>Engine: createComplexityVisitor / createCfgVisitor / createDataflowVisitor
Engine->>Engine: walkWithVisitors → store results on symbols/defs
end
Note over Engine: Delegate to buildXxx (FIXED import paths)
Engine->>Features: import('../features/ast.js') → buildAstNodes
Engine->>Features: import('../features/complexity.js') → buildComplexityMetrics
Engine->>Features: import('../features/cfg.js') → buildCFGData
Engine->>Features: import('../features/dataflow.js') → buildDataflowEdges
Features-->>Engine: DB rows written
Engine-->>Builder: timing object
Last reviewed commit: 93ec404 |
| const needsComplexity = | ||
| doComplexity && | ||
| defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity); | ||
| const needsCfg = | ||
| doCfg && | ||
| CFG_EXTENSIONS.has(ext) && | ||
| defs.some( | ||
| (d) => | ||
| (d.kind === 'function' || d.kind === 'method') && | ||
| d.line && | ||
| !Array.isArray(d.cfg?.blocks), | ||
| ); |
There was a problem hiding this comment.
needsCfg gate is looser than needsWasmCfg
The WASM pre-parse gate here uses !Array.isArray(d.cfg?.blocks) which is true when d.cfg === null. However, the unified walk gate on line 197 uses d.cfg !== null && !Array.isArray(d.cfg?.blocks), which skips null entries.
In cfg.js, d.cfg === null means "the native engine explicitly decided CFG can't/needn't be computed for this definition" and is treated as satisfied. The mismatch means this gate may trigger WASM tree loading for files where the visitor will then skip all definitions.
This isn't a correctness bug (CFG results are still correct), but it's a minor inefficiency — and a consistency concern. Consider aligning the check:
| const needsComplexity = | |
| doComplexity && | |
| defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity); | |
| const needsCfg = | |
| doCfg && | |
| CFG_EXTENSIONS.has(ext) && | |
| defs.some( | |
| (d) => | |
| (d.kind === 'function' || d.kind === 'method') && | |
| d.line && | |
| !Array.isArray(d.cfg?.blocks), | |
| ); | |
| const needsComplexity = | |
| doComplexity && | |
| defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity); | |
| const needsCfg = | |
| doCfg && | |
| CFG_EXTENSIONS.has(ext) && | |
| defs.some( | |
| (d) => | |
| (d.kind === 'function' || d.kind === 'method') && | |
| d.line && | |
| d.cfg !== null && | |
| !Array.isArray(d.cfg?.blocks), | |
| ); |
There was a problem hiding this comment.
Fixed — added d.cfg !== null to the needsCfg gate, aligning it with the unified walk guard on line 197 and avoiding unnecessary WASM loading for null-CFG definitions.
| it('preserves CFG blocks for changed file (#468)', () => { | ||
| const fullAnalysis = readAnalysisTables(path.join(fullDir, '.codegraph', 'graph.db')); | ||
| const incrAnalysis = readAnalysisTables(path.join(incrDir, '.codegraph', 'graph.db')); | ||
| expect(incrAnalysis.cfgBlocks.length).toBe(fullAnalysis.cfgBlocks.length); | ||
| }); |
There was a problem hiding this comment.
CFG and dataflow tests may pass vacuously
The complexity test on line 150 asserts toBeGreaterThan(0) to ensure rows actually exist before comparing counts. The CFG and dataflow tests only compare counts between full and incremental — if both return 0 (e.g., because the fixture is too simple or a grammar is missing), the test passes without actually verifying analysis data survives incremental rebuilds.
Consider adding expect(incrAnalysis.cfgBlocks.length).toBeGreaterThan(0) and similar for dataflow, matching the complexity test's approach.
There was a problem hiding this comment.
Fixed — added toBeGreaterThan(0) assertions for CFG blocks and dataflow statements, matching the complexity test pattern. Also fixed wrong column names in readAnalysisTables (node_id → function_node_id, source_node_id → source_id) that were causing the queries to silently return empty arrays via the try/catch fallback — confirming the tests were indeed passing vacuously.
Add d.cfg !== null check to needsCfg predicate, matching the unified walk guard and avoiding unnecessary WASM loading for null-CFG defs. Fix wrong column names in readAnalysisTables (node_id -> function_node_id, source_node_id -> source_id) that caused CFG/dataflow queries to silently return empty arrays. Add toBeGreaterThan(0) assertions so these tests fail fast if analysis data is missing. Impact: 1 functions changed, 0 affected
| const needsComplexity = | ||
| doComplexity && | ||
| defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity); |
There was a problem hiding this comment.
needsComplexity missing language/extension filter
Unlike needsCfg (which guards with CFG_EXTENSIONS.has(ext)) and needsDataflow (which guards with DATAFLOW_EXTENSIONS.has(ext)), needsComplexity has no extension filter. If a file has function definitions but belongs to a language with no COMPLEXITY_RULES entry (e.g. a language the native parser produces definitions for but the WASM complexity visitor doesn't support), this gate can fire and trigger ensureWasmTrees for all files in the batch — even though the unified walk will later skip complexity for that language because cRules is undefined.
The check should be consistent with the other gates:
| const needsComplexity = | |
| doComplexity && | |
| defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity); | |
| const needsComplexity = | |
| doComplexity && | |
| COMPLEXITY_RULES.has(extToLang.get(ext)) && | |
| defs.some((d) => (d.kind === 'function' || d.kind === 'method') && d.line && !d.complexity); |
This prevents unnecessary WASM loading and brings the three gate conditions in line with each other.
There was a problem hiding this comment.
Fixed in 93ec404. Added COMPLEXITY_EXTENSIONS = buildExtensionSet(COMPLEXITY_RULES) at module level (matching CFG_EXTENSIONS and DATAFLOW_EXTENSIONS), and added COMPLEXITY_EXTENSIONS.has(ext) to the needsComplexity check so it now filters by language/extension like the other two guards.
| const db = new Database(dbPath, { readonly: true }); | ||
| const result = {}; | ||
| try { | ||
| result.complexity = db | ||
| .prepare( | ||
| `SELECT fc.node_id, fc.cognitive, fc.cyclomatic, n.name, n.file | ||
| FROM function_complexity fc JOIN nodes n ON fc.node_id = n.id | ||
| ORDER BY n.name, n.file`, | ||
| ) | ||
| .all(); | ||
| } catch { | ||
| result.complexity = []; | ||
| } | ||
| try { | ||
| result.cfgBlocks = db | ||
| .prepare( | ||
| `SELECT cb.function_node_id, cb.block_index, cb.block_type, n.name, n.file | ||
| FROM cfg_blocks cb JOIN nodes n ON cb.function_node_id = n.id | ||
| ORDER BY n.name, n.file, cb.block_index`, | ||
| ) | ||
| .all(); | ||
| } catch { | ||
| result.cfgBlocks = []; | ||
| } | ||
| try { | ||
| result.dataflow = db | ||
| .prepare( | ||
| `SELECT d.source_id, d.kind, n.name, n.file | ||
| FROM dataflow d JOIN nodes n ON d.source_id = n.id | ||
| ORDER BY n.name, n.file, d.kind`, | ||
| ) | ||
| .all(); | ||
| } catch { | ||
| result.dataflow = []; | ||
| } | ||
| db.close(); | ||
| return result; | ||
| } |
There was a problem hiding this comment.
db.close() not guarded by finally
db.close() at line 82 is called after three separate try/catch blocks. In the current implementation this is always reachable — the inner catch blocks swallow all exceptions — but the pattern is fragile: a future catch that rethrows, or a change that adds code between the last catch and db.close(), would silently leak the database handle. A try/finally makes the intent explicit and is idiomatic for resource cleanup:
| const db = new Database(dbPath, { readonly: true }); | |
| const result = {}; | |
| try { | |
| result.complexity = db | |
| .prepare( | |
| `SELECT fc.node_id, fc.cognitive, fc.cyclomatic, n.name, n.file | |
| FROM function_complexity fc JOIN nodes n ON fc.node_id = n.id | |
| ORDER BY n.name, n.file`, | |
| ) | |
| .all(); | |
| } catch { | |
| result.complexity = []; | |
| } | |
| try { | |
| result.cfgBlocks = db | |
| .prepare( | |
| `SELECT cb.function_node_id, cb.block_index, cb.block_type, n.name, n.file | |
| FROM cfg_blocks cb JOIN nodes n ON cb.function_node_id = n.id | |
| ORDER BY n.name, n.file, cb.block_index`, | |
| ) | |
| .all(); | |
| } catch { | |
| result.cfgBlocks = []; | |
| } | |
| try { | |
| result.dataflow = db | |
| .prepare( | |
| `SELECT d.source_id, d.kind, n.name, n.file | |
| FROM dataflow d JOIN nodes n ON d.source_id = n.id | |
| ORDER BY n.name, n.file, d.kind`, | |
| ) | |
| .all(); | |
| } catch { | |
| result.dataflow = []; | |
| } | |
| db.close(); | |
| return result; | |
| } | |
| function readAnalysisTables(dbPath) { | |
| const db = new Database(dbPath, { readonly: true }); | |
| const result = {}; | |
| try { | |
| try { | |
| result.complexity = db | |
| .prepare( | |
| `SELECT fc.node_id, fc.cognitive, fc.cyclomatic, n.name, n.file | |
| FROM function_complexity fc JOIN nodes n ON fc.node_id = n.id | |
| ORDER BY n.name, n.file`, | |
| ) | |
| .all(); | |
| } catch { | |
| result.complexity = []; | |
| } | |
| try { | |
| result.cfgBlocks = db | |
| .prepare( | |
| `SELECT cb.function_node_id, cb.block_index, cb.block_type, n.name, n.file | |
| FROM cfg_blocks cb JOIN nodes n ON cb.function_node_id = n.id | |
| ORDER BY n.name, n.file, cb.block_index`, | |
| ) | |
| .all(); | |
| } catch { | |
| result.cfgBlocks = []; | |
| } | |
| try { | |
| result.dataflow = db | |
| .prepare( | |
| `SELECT d.source_id, d.kind, n.name, n.file | |
| FROM dataflow d JOIN nodes n ON d.source_id = n.id | |
| ORDER BY n.name, n.file, d.kind`, | |
| ) | |
| .all(); | |
| } catch { | |
| result.dataflow = []; | |
| } | |
| } finally { | |
| db.close(); | |
| } | |
| return result; | |
| } |
There was a problem hiding this comment.
Fixed in 93ec404. Wrapped the three inner try/catch blocks in an outer try/finally so db.close() is guaranteed to run even if an unexpected error propagates past the individual catch blocks.
Impact: 1 functions changed, 0 affected
Summary
Fixes #468. Two bugs were causing all analysis tables (
function_complexity,cfg_blocks,cfg_edges,dataflow) to be empty on every build — full or incremental, native or WASM.Broken import paths in
ast-analysis/engine.js: ThebuildXxxfunctions were dynamically imported as../complexity.js,../cfg.js,../ast.js,../dataflow.js— which resolved to non-existentsrc/complexity.jsetc. instead ofsrc/features/complexity.js. The imports failed silently inside try/catch blocks, so analysis data was never written to the DB. Fixed to../features/complexity.jsetc.Missing WASM tree pre-parse for native incremental rebuilds:
ensureWasmTreeswas only triggered when dataflow analysis needed it, but complexity and CFG visitors also require a WASM tree-sitter tree for their visitor walks. On native incremental rebuilds, changed files had their analysis data purged bypurgeFilesFromGraphbut never re-computed. Broadened theensureWasmTreesgate to trigger when any of complexity/CFG/dataflow need it.Test plan
incremental-parity.test.jsverify complexity, CFG, and dataflow rows survive incremental rebuilds