refactor: decompose AST analysis visitors and domain builder stages#491
refactor: decompose AST analysis visitors and domain builder stages#491carlos-alm merged 18 commits intomainfrom
Conversation
- Remove dead `truncate` function from ast-analysis/shared.js (0 consumers) - Remove dead `truncStart` function from presentation/table.js (0 consumers) - Un-export `BATCH_CHUNK` in builder/helpers.js (only used internally) Skipped sync.json targets that were false positives: - BUILTIN_RECEIVERS: used by incremental.js + build-edges.js - TRANSIENT_CODES/RETRY_DELAY_MS: internal to readFileSafe - MAX_COL_WIDTH: internal to printAutoTable - findFunctionNode: re-exported from index.js, used in tests Impact: 1 functions changed, 32 affected
Impact: 29 functions changed, 105 affected
…ures Impact: 5 functions changed, 7 affected
connection.js: add debug() logging to all 8 catch-with-fallback blocks so failures are observable without changing behavior. migrations.js: replace 14 try/catch blocks in initSchema with hasColumn() and hasTable() guards. CREATE INDEX calls use IF NOT EXISTS directly. getBuildMeta uses hasTable() check instead of try/catch. Impact: 10 functions changed, 19 affected
Add debug() logging to 10 empty catch blocks across context.js, symbol-lookup.js, exports.js, impact.js, and module-map.js. All catches retain their fallback behavior but failures are now observable via debug logging. Impact: 6 functions changed, 18 affected
Add debug() logging to 6 empty catch blocks: 3 in disposeParsers() for WASM resource cleanup, 2 in ensureWasmTrees() for file read and parse failures, and 1 in getActiveEngine() for version lookup. Impact: 3 functions changed, 0 affected
Add debug() logging to 9 empty catch blocks across complexity.js (5), cfg.js (2), and dataflow.js (2). All catches for file read and parse failures now log the error message before continuing. Impact: 4 functions changed, 2 affected
Split the monolithic walkJavaScriptNode switch (13 cases, cognitive 228) into 11 focused handler functions. The dispatcher is now a thin switch that delegates to handleFunctionDecl, handleClassDecl, handleMethodDef, handleInterfaceDecl, handleTypeAliasDecl, handleVariableDecl, handleEnumDecl, handleCallExpr, handleImportStmt, handleExportStmt, and handleExpressionStmt. The expression_statement case now reuses the existing handleCommonJSAssignment helper, eliminating ~50 lines of duplication. Worst handler complexity: handleVariableDecl (cognitive 20), down from the original monolithic function (cognitive 279). Impact: 13 functions changed, 3 affected
Split walkPythonNode switch into 7 focused handlers: handlePyFunctionDef, handlePyClassDef, handlePyCall, handlePyImport, handlePyExpressionStmt, handlePyImportFrom, plus the decorated_definition inline dispatch. Moved extractPythonParameters, extractPythonClassProperties, walkInitBody, and findPythonParentClass from closures to module-scope functions. Impact: 12 functions changed, 5 affected
Split walkJavaNode switch into 8 focused handlers plus an extractJavaInterfaces helper. Moved findJavaParentClass to module scope. The class_declaration case (deepest nesting in the file) is now split between handleJavaClassDecl and extractJavaInterfaces. Impact: 12 functions changed, 5 affected
Apply the same per-category handler decomposition to all remaining language extractors: Go (6 handlers), Ruby (8 handlers), PHP (11 handlers), C# (11 handlers), Rust (9 handlers), HCL (4 handlers). Each extractor now follows the template established by the JS extractor: - Thin entry function creates ctx, delegates to walkXNode - walkXNode is a thin dispatcher switch - Each case is a named handler function at module scope - Helper functions (findParentClass, etc.) moved to module scope Impact: 66 functions changed, 23 affected
…pers Move nested handler functions to module level in cfg-visitor.js, dataflow-visitor.js, and complexity-visitor.js — reducing cognitive complexity of each factory function from 100-337 down to thin coordinators. Extract WASM pre-parse, visitor setup, result storage, and build delegation from runAnalyses into focused helper functions. Impact: 66 functions changed, 43 affected
Extract edge-building by type (import, call-native, call-JS, class hierarchy) from buildEdges. Extract per-phase insertion logic from insertNodes. Extract scoped/incremental/full-build paths and reverse-dep cascade from detectChanges. Extract setup, engine init, alias loading from pipeline.js. Extract node/edge-building helpers from incremental.js rebuildFile. Impact: 44 functions changed, 19 affected
|
Claude finished @carlos-alm's task —— View job PR Review: AST Analysis Visitors and Domain Builder DecompositionReviewing this refactoring PR that claims to decompose high-complexity subsystems. Review Checklist
Root Cause AnalysisProblem Being Solved: This PR addresses genuine complexity issues in the worst subsystems in the codebase:
Why This Approach: The decomposition strategy is sound:
Risk Assessment: LOW. This is pure refactoring with no functional changes. Configuration & Test Analysis✅ No configuration relaxation - biome.json unchanged Code Quality AnalysisStrengths:
Concerns:
Backlog Compliance
Tier: 1 (zero-dep ✓ + foundation-aligned ✓) Critical ConcernsNone identified. This is high-quality refactoring work that:
Final Recommendation
This is the kind of disciplined refactoring work that makes codebases more maintainable without introducing risk. |
Greptile SummaryThis PR is a large structural refactoring — Titan quality sweep phase 5a — that decomposes two high-complexity subsystems (AST analysis visitors and domain builder stages) into focused, independently testable helpers. No new features are introduced; the public API surface ( Key changes:
Issues found:
Confidence Score: 3/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[buildGraph] --> B[setupPipeline]
B --> C[runPipelineStages]
C --> D[collectFiles]
D --> E[detectChanges]
E -->|earlyExit| Z[return]
E -->|scoped| ES[handleScopedBuild]
E -->|full| EF[handleFullBuild]
E -->|incremental| EI[handleIncrementalBuild]
ES & EF & EI --> F[parseFiles]
F --> G[insertNodes]
G --> G1[insertDefinitionsAndExports]
G1 --> G2[insertChildren]
G2 --> G3[insertContainmentEdges]
G3 --> G4[updateFileHashes]
G4 --> H[resolveImports]
H --> I[buildEdges]
I --> I1[buildImportEdges]
I1 --> I2{native engine?}
I2 -->|yes| I3[buildCallEdgesNative]
I2 -->|no| I4[buildCallEdgesJS]
I3 & I4 --> J[buildStructure]
J --> K[runAnalyses / engine.js]
K --> K1[ensureWasmTreesIfNeeded]
K1 --> K2[setupVisitors per file]
K2 --> K3[walkWithVisitors]
K3 --> K4[storeComplexityResults]
K4 --> K5[storeCfgResults]
K5 --> K6[delegateToBuildFunctions]
K6 --> L[finalize]
|
src/ast-analysis/engine.js
Outdated
| if (needsWasmTrees) { | ||
| try { | ||
| const { ensureWasmTrees } = await getParserModule(); | ||
| await ensureWasmTrees(fileSymbols); |
There was a problem hiding this comment.
Missing
rootDir argument dropped during extraction
ensureWasmTrees requires two arguments — fileSymbols and rootDir — as confirmed by its signature in src/domain/parser.js:
export async function ensureWasmTrees(fileSymbols, rootDir) {
// ...
const absPath = path.join(rootDir, relPath); // ← rootDir used hereWhen rootDir is undefined, path.join(undefined, relPath) will throw a TypeError (Node ≥ 20 is strict about this), meaning WASM pre-parsing silently fails on the native engine path. This directly reverts the fix for bug #468 ("incremental rebuilds on the native engine silently lose these analyses for changed files").
The ensureWasmTreesIfNeeded helper needs to accept rootDir and pass it through:
| await ensureWasmTrees(fileSymbols); | |
| await ensureWasmTrees(fileSymbols, rootDir); |
The helper's signature should also be updated:
async function ensureWasmTreesIfNeeded(fileSymbols, opts, rootDir) {And the call in runAnalyses updated:
await ensureWasmTreesIfNeeded(fileSymbols, opts, rootDir);| } | ||
| } | ||
| return reverseDeps; | ||
| } | ||
|
|
There was a problem hiding this comment.
Scoped build log silently dropped for zero-reverse-dep case
In the original detectChanges, the info(...) rebuild log was always emitted for scoped builds:
// original — unconditional
info(`Scoped rebuild: ${changePaths.length} changed, ${ctx.removed.length} removed, ${reverseDeps.size} reverse-deps`);The refactoring moved the log inside if (reverseDeps.size > 0), so it is now silently skipped when there are no reverse-dependency files. A scoped rebuild with only direct changes (the common case) produces no log output at all, making builds harder to diagnose.
| } | |
| } | |
| return reverseDeps; | |
| } | |
| if (reverseDeps.size > 0) { | |
| info( | |
| `Scoped rebuild: ${changePaths.length} changed, ${ctx.removed.length} removed, ${reverseDeps.size} reverse-deps`, | |
| ); | |
| } else { | |
| info( | |
| `Scoped rebuild: ${changePaths.length} changed, ${ctx.removed.length} removed, 0 reverse-deps`, | |
| ); | |
| } |
There was a problem hiding this comment.
Fixed in f6488f1 — moved the info() log outside the reverseDeps.size > 0 check so scoped builds always emit a log line.
| if (changePaths.length > 0 || ctx.removed.length > 0) { | ||
| purgeFilesFromGraph(db, [...ctx.removed, ...changePaths], { purgeHashes: false }); | ||
| } | ||
|
|
||
| if (reverseDeps.size > 0) { | ||
| const deleteOutgoingEdgesForFile = db.prepare( | ||
| 'DELETE FROM edges WHERE source_id IN (SELECT id FROM nodes WHERE file = ?)', | ||
| ); | ||
| for (const relPath of reverseDeps) { | ||
| deleteOutgoingEdgesForFile.run(relPath); | ||
| } | ||
| for (const relPath of reverseDeps) { | ||
| const absPath = path.join(rootDir, relPath); | ||
| ctx.parseChanges.push({ file: absPath, relPath, _reverseDepOnly: true }); | ||
| } | ||
| return; | ||
| } | ||
| } | ||
|
|
||
| const increResult = | ||
| incremental && !forceFullRebuild | ||
| ? getChangedFiles(db, allFiles, rootDir) | ||
| : { changed: allFiles.map((f) => ({ file: f })), removed: [], isFullBuild: true }; | ||
| // ── Scoped build path ─────────────────────────────────────────────────── | ||
|
|
||
| ctx.removed = increResult.removed; | ||
| ctx.isFullBuild = increResult.isFullBuild; | ||
| ctx.parseChanges = increResult.changed.filter((c) => !c.metadataOnly); | ||
| ctx.metadataUpdates = increResult.changed.filter((c) => c.metadataOnly); | ||
| function handleScopedBuild(ctx) { | ||
| const { db, rootDir, opts } = ctx; | ||
|
|
||
| // Early exit: no changes detected | ||
| if (!ctx.isFullBuild && ctx.parseChanges.length === 0 && ctx.removed.length === 0) { | ||
| const ranAnalysis = await runPendingAnalysis(ctx); | ||
| if (ranAnalysis) { | ||
| closeDb(db); | ||
| writeJournalHeader(rootDir, Date.now()); | ||
| ctx.earlyExit = true; | ||
| return; | ||
| } | ||
| let hasEmbeddings = false; | ||
| try { | ||
| db.prepare('SELECT 1 FROM embeddings LIMIT 1').get(); | ||
| hasEmbeddings = true; | ||
| } catch { | ||
| /* table doesn't exist */ | ||
| } | ||
| ctx.hasEmbeddings = hasEmbeddings; | ||
|
|
||
| healMetadata(ctx); | ||
| info('No changes detected. Graph is up to date.'); | ||
| closeDb(db); | ||
| writeJournalHeader(rootDir, Date.now()); | ||
| ctx.earlyExit = true; | ||
| return; | ||
| const changePaths = ctx.parseChanges.map( | ||
| (item) => item.relPath || normalizePath(path.relative(rootDir, item.file)), | ||
| ); | ||
|
|
||
| let reverseDeps = new Set(); | ||
| if (!opts.noReverseDeps) { | ||
| const changedRelPaths = new Set([...changePaths, ...ctx.removed]); | ||
| reverseDeps = findReverseDependencies(db, changedRelPaths, rootDir); | ||
| } | ||
|
|
||
| // Purge changed + removed files, then add reverse-deps | ||
| purgeAndAddReverseDeps(ctx, changePaths, reverseDeps); | ||
|
|
There was a problem hiding this comment.
hasEmbeddings detection duplicated across three extracted helpers
handleScopedBuild, handleFullBuild, and handleIncrementalBuild all contain an identical try/catch block that detects whether the embeddings table exists and sets ctx.hasEmbeddings. The original code had this logic once. Consider extracting it:
function detectHasEmbeddings(db) {
try {
db.prepare('SELECT 1 FROM embeddings LIMIT 1').get();
return true;
} catch {
return false;
}
}Then call ctx.hasEmbeddings = detectHasEmbeddings(ctx.db); once before dispatching to the three helpers, instead of repeating the block in each.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed in f6488f1 — extracted a shared detectHasEmbeddings(db) helper and replaced the duplicated try-catch blocks in all three build handlers.
Without rootDir, path.join(undefined, relPath) throws a TypeError, silently breaking WASM pre-parsing on the native engine path.
…dings helper The scoped rebuild info log was only emitted when reverse-deps existed, hiding output for the common zero-reverse-dep case. Also deduplicate the hasEmbeddings try-catch across three build handlers into a shared detectHasEmbeddings helper.
# Conflicts: # src/db/migrations.js # src/features/shared/find-nodes.js Impact: 3 functions changed, 7 affected
|
Resolved merge conflicts with main — same pattern as #490 (hasTable guards in migrations.js, findNodes validation in find-nodes.js). All 1883 tests pass. |
Impact: 16 functions changed, 68 affected
…e/codegraph into refactor/titan-ast-builder
Summary
Titan quality sweep — phase 5a: decompose the two highest-complexity subsystems.
AST analysis visitors
createCfgVisitor(worst in codebase: cog 337, 9.39 estimated bugs) → extract per-statement handlers (processIf,processSwitch,processFor, etc.) — cog 337→18createDataflowVisitor→ extract per-expression handlers — cog 215→12createComplexityVisitor→ extract per-metric handlers — cog 156→8runAnalyses→ decompose engine orchestrator — cog 212→15Domain builder stages
buildEdges(cog 351, 3.84 bugs) → split by edge type (import, call, hierarchy)insertNodes→ split by node processing phasedetectChanges→ separate discovery from hashingStacked on
Files changed (9)
src/ast-analysis/visitors/cfg-visitor.jssrc/ast-analysis/visitors/dataflow-visitor.jssrc/ast-analysis/visitors/complexity-visitor.jssrc/ast-analysis/engine.jssrc/domain/graph/builder/stages/build-edges.jssrc/domain/graph/builder/stages/insert-nodes.jssrc/domain/graph/builder/stages/detect-changes.jssrc/domain/graph/builder/incremental.jssrc/domain/graph/builder/pipeline.jsTest plan