refactor: decompose monolithic language extractors into per-category handlers#490
refactor: decompose monolithic language extractors into per-category handlers#490carlos-alm merged 16 commits intomainfrom
Conversation
- Remove dead `truncate` function from ast-analysis/shared.js (0 consumers) - Remove dead `truncStart` function from presentation/table.js (0 consumers) - Un-export `BATCH_CHUNK` in builder/helpers.js (only used internally) Skipped sync.json targets that were false positives: - BUILTIN_RECEIVERS: used by incremental.js + build-edges.js - TRANSIENT_CODES/RETRY_DELAY_MS: internal to readFileSafe - MAX_COL_WIDTH: internal to printAutoTable - findFunctionNode: re-exported from index.js, used in tests Impact: 1 functions changed, 32 affected
Impact: 29 functions changed, 105 affected
…ures Impact: 5 functions changed, 7 affected
connection.js: add debug() logging to all 8 catch-with-fallback blocks so failures are observable without changing behavior. migrations.js: replace 14 try/catch blocks in initSchema with hasColumn() and hasTable() guards. CREATE INDEX calls use IF NOT EXISTS directly. getBuildMeta uses hasTable() check instead of try/catch. Impact: 10 functions changed, 19 affected
Add debug() logging to 10 empty catch blocks across context.js, symbol-lookup.js, exports.js, impact.js, and module-map.js. All catches retain their fallback behavior but failures are now observable via debug logging. Impact: 6 functions changed, 18 affected
Add debug() logging to 6 empty catch blocks: 3 in disposeParsers() for WASM resource cleanup, 2 in ensureWasmTrees() for file read and parse failures, and 1 in getActiveEngine() for version lookup. Impact: 3 functions changed, 0 affected
Add debug() logging to 9 empty catch blocks across complexity.js (5), cfg.js (2), and dataflow.js (2). All catches for file read and parse failures now log the error message before continuing. Impact: 4 functions changed, 2 affected
Split the monolithic walkJavaScriptNode switch (13 cases, cognitive 228) into 11 focused handler functions. The dispatcher is now a thin switch that delegates to handleFunctionDecl, handleClassDecl, handleMethodDef, handleInterfaceDecl, handleTypeAliasDecl, handleVariableDecl, handleEnumDecl, handleCallExpr, handleImportStmt, handleExportStmt, and handleExpressionStmt. The expression_statement case now reuses the existing handleCommonJSAssignment helper, eliminating ~50 lines of duplication. Worst handler complexity: handleVariableDecl (cognitive 20), down from the original monolithic function (cognitive 279). Impact: 13 functions changed, 3 affected
Split walkPythonNode switch into 7 focused handlers: handlePyFunctionDef, handlePyClassDef, handlePyCall, handlePyImport, handlePyExpressionStmt, handlePyImportFrom, plus the decorated_definition inline dispatch. Moved extractPythonParameters, extractPythonClassProperties, walkInitBody, and findPythonParentClass from closures to module-scope functions. Impact: 12 functions changed, 5 affected
Split walkJavaNode switch into 8 focused handlers plus an extractJavaInterfaces helper. Moved findJavaParentClass to module scope. The class_declaration case (deepest nesting in the file) is now split between handleJavaClassDecl and extractJavaInterfaces. Impact: 12 functions changed, 5 affected
Apply the same per-category handler decomposition to all remaining language extractors: Go (6 handlers), Ruby (8 handlers), PHP (11 handlers), C# (11 handlers), Rust (9 handlers), HCL (4 handlers). Each extractor now follows the template established by the JS extractor: - Thin entry function creates ctx, delegates to walkXNode - walkXNode is a thin dispatcher switch - Each case is a named handler function at module scope - Helper functions (findParentClass, etc.) moved to module scope Impact: 66 functions changed, 23 affected
|
Claude finished @carlos-alm's task —— View job PR Review Complete ✅Reviewing PR #490 - refactor: decompose monolithic language extractors into per-category handlers Todo List:
Root Cause AnalysisProblem Being Solved: Monolithic Why This Approach: Decomposition into per-category handler functions follows proven refactoring patterns and maintains identical functionality while dramatically reducing cognitive load. Risk Assessment: LOW RISK — purely structural refactoring with no behavioral changes, extensive test coverage exists. Backlog Compliance
Code Quality AssessmentExcellent execution ⭐⭐⭐⭐ Pattern Consistency: All 9 language extractors refactored identically:
Functional Preservation: Same input/output contracts, existing test suite validates behavior unchanged. Maintainability Gains:
Final Recommendation
Minor note: This is internal quality improvement work (Problem-fit: 3) rather than user-facing functionality, but the execution is exemplary and the benefits to future development are clear. |
Greptile SummaryThis PR decomposes the monolithic Key points:
Confidence Score: 3/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["extractXSymbols(tree, filePath)"] --> B["ctx = { definitions, calls,\nimports, classes, exports }"]
B --> C["walkXNode(tree.rootNode, ctx)"]
C --> D{"node.type switch"}
D -->|"class / struct / record"| E["handleXClassDecl(node, ctx)"]
D -->|"method / constructor"| F["handleXMethodDecl(node, ctx)"]
D -->|"interface / trait"| G["handleXInterfaceDecl(node, ctx)\n— emits interface + body methods"]
D -->|"import / use / using"| H["handleXImportDecl(node, ctx)"]
D -->|"call / invocation"| I["handleXCallExpr(node, ctx)"]
D -->|"export"| J["handleXExportStmt(node, ctx)"]
D -->|"(no match)"| K["recurse only"]
E --> L["recurse into children"]
F --> M{"parent is interface/trait?"}
M -->|"yes → skip (guard)"| N["return early"]
M -->|"no → emit"| L
G --> L
H --> L
I --> L
J --> L
K --> L
L --> C
C --> O["return ctx"]
Last reviewed commit: 10d08a9 |
src/extractors/go.js
Outdated
| function handleGoFuncDecl(node, ctx) { | ||
| const nameNode = node.childForFieldName('name'); | ||
| if (nameNode) { | ||
| const params = extractGoParameters(node.childForFieldName('parameters')); | ||
| ctx.definitions.push({ | ||
| name: nameNode.text, | ||
| kind: 'function', | ||
| line: node.startPosition.row + 1, | ||
| endLine: nodeEndLine(node), | ||
| children: params.length > 0 ? params : undefined, | ||
| visibility: goVisibility(nameNode.text), | ||
| }); | ||
| } |
There was a problem hiding this comment.
Inconsistent guard clause style in
handleGoFuncDecl
handleGoFuncDecl uses a positive guard (if (nameNode) { … }) while every other handler in this file — handleGoMethodDecl, handleGoTypeDecl, handleGoConstDecl, handleGoCallExpr, and the handlers in all sibling extractors — use an early-return guard (if (!nameNode) return;). This creates a style inconsistency within the otherwise uniform refactoring, and the indentation nesting is unnecessary.
| function handleGoFuncDecl(node, ctx) { | |
| const nameNode = node.childForFieldName('name'); | |
| if (nameNode) { | |
| const params = extractGoParameters(node.childForFieldName('parameters')); | |
| ctx.definitions.push({ | |
| name: nameNode.text, | |
| kind: 'function', | |
| line: node.startPosition.row + 1, | |
| endLine: nodeEndLine(node), | |
| children: params.length > 0 ? params : undefined, | |
| visibility: goVisibility(nameNode.text), | |
| }); | |
| } | |
| function handleGoFuncDecl(node, ctx) { | |
| const nameNode = node.childForFieldName('name'); | |
| if (!nameNode) return; | |
| const params = extractGoParameters(node.childForFieldName('parameters')); | |
| ctx.definitions.push({ | |
| name: nameNode.text, | |
| kind: 'function', | |
| line: node.startPosition.row + 1, | |
| endLine: nodeEndLine(node), | |
| children: params.length > 0 ? params : undefined, | |
| visibility: goVisibility(nameNode.text), | |
| }); | |
| } |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed in 18a236c — switched to early-return guard style to match all other handlers in the file.
| function handleRustTraitItem(node, ctx) { | ||
| const nameNode = node.childForFieldName('name'); | ||
| if (!nameNode) return; | ||
| ctx.definitions.push({ | ||
| name: nameNode.text, | ||
| kind: 'trait', | ||
| line: node.startPosition.row + 1, | ||
| endLine: nodeEndLine(node), | ||
| }); | ||
| const body = node.childForFieldName('body'); | ||
| if (body) { | ||
| for (let i = 0; i < body.childCount; i++) { | ||
| const child = body.child(i); | ||
| if (child && (child.type === 'function_signature_item' || child.type === 'function_item')) { | ||
| const methName = child.childForFieldName('name'); | ||
| if (methName) { | ||
| ctx.definitions.push({ | ||
| name: `${nameNode.text}.${methName.text}`, | ||
| kind: 'method', | ||
| line: child.startPosition.row + 1, | ||
| endLine: child.endPosition.row + 1, | ||
| }); | ||
| } | ||
| break; | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
handleRustTraitItem + walker recursion creates duplicate definitions for trait methods with default implementations
handleRustTraitItem manually iterates the trait body and adds TraitName.methodName definitions for any function_item children (default method implementations). However, walkRustNode then unconditionally recurses into all children of the trait_item, which includes those same function_item nodes. When the walker visits them, handleRustFuncItem fires — and since findCurrentImpl only looks for impl_item ancestors (not trait_item), it returns null. The result is a second, bare-name entry pushed to ctx.definitions:
// From handleRustTraitItem:
{ name: 'MyTrait.default_method', kind: 'method', … }
// From handleRustFuncItem (via recursive walk):
{ name: 'default_method', kind: 'function', … }
This behaviour was already present before this PR (it existed in the monolithic switch), so no regression is introduced here. However, since the handlers are now standalone functions, this is the right moment to decide whether the duplication is intentional (e.g., the bare-name entry assists resolution) or an oversight worth fixing. If unintentional, one fix is to skip function_item children inside a trait_item in handleRustFuncItem:
function handleRustFuncItem(node, ctx) {
// Skip default-impl functions already emitted by handleRustTraitItem
if (node.parent && node.parent.type === 'trait_item') return;
const nameNode = node.childForFieldName('name');
…
}The same pattern applies to handleCsInterfaceDecl / handleCsMethodDecl in csharp.js, and handlePhpInterfaceDecl / handlePhpMethodDecl in php.js.
There was a problem hiding this comment.
Fixed in 484a6d2 — added early-return guards in handleRustFuncItem, handleCsMethodDecl, and handlePhpMethodDecl to skip methods whose parent is a trait_item/interface_declaration body, preventing duplicate definitions.
All other handlers use `if (!nameNode) return;` style. Align handleGoFuncDecl to match.
…C#, and PHP extractors Trait/interface handlers already emit qualified method definitions. Without this guard, the recursive walker also fires the method handler, producing bare-name duplicates. Skip methods whose parent is a trait_item/interface_declaration body.
…plicate definitions
|
Addressed the Java interface duplicate-emit issue from the review summary: fixed in 888eb5a — added early-return guard in handleJavaMethodDecl to skip methods whose parent is an interface_declaration body, matching the pattern already applied to C#, PHP, and Rust extractors. Also added deduplication in the parity test normalize() to handle the native engine still producing duplicates for interface methods. |
Keep main's improvements: hasTable guards in legacy compat, try-catch in getBuildMeta, default parameter + validation in findNodes.
|
Resolved merge conflicts with main — kept main's improvements (hasTable guards, getBuildMeta try-catch, findNodes validation). All 1883 tests pass. |
Summary
Titan quality sweep — phase 4: decompose all 9 JS language extractors.
Each extractor had a single monolithic
walkXNodefunction with a giant switch/case over AST node types (cognitive complexity 137–279). Split into per-category handler functions: declarations, imports, exports, calls.javascript.jsdone first (cog 279→15), then replicatedAll extractors are dynamically dispatched via
LANGUAGE_REGISTRY— verified through integration tests that exercise each parser.Stacked on
Files changed (9)
src/extractors/javascript.jssrc/extractors/python.jssrc/extractors/java.jssrc/extractors/go.js,ruby.js,php.js,csharp.js,rust.js,hcl.jsTest plan