Skip to content

refactor: migrate integration tests to InMemoryRepository#462

Merged
carlos-alm merged 7 commits intomainfrom
refactor/inmemory-integration-tests
Mar 16, 2026
Merged

refactor: migrate integration tests to InMemoryRepository#462
carlos-alm merged 7 commits intomainfrom
refactor/inmemory-integration-tests

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Add openRepo(dbPath, opts) utility in src/db/connection.js that accepts an injected Repository instance via opts.repo or falls back to opening a readonly SQLite DB wrapped in SqliteRepository
  • Make findMatchingNodes and buildDependencyGraph polymorphic — accept either a raw db or a Repository via instanceof check
  • Refactor triageData, sequenceData, and communitiesData to use openRepo(), enabling Repository injection
  • Convert triage, sequence, and communities integration test fixtures from manual SQLite setup to createTestRepo() fluent builder (sequence dataflow tests remain on SQLite since InMemoryRepository has no dataflow table)
  • Mark ROADMAP 3.13 item complete

Test plan

  • npx vitest run tests/integration/triage.test.js — 20 tests pass
  • npx vitest run tests/integration/sequence.test.js — 16 tests pass
  • npx vitest run tests/integration/communities.test.js — 13 tests pass
  • npm test — full suite 1853 tests pass, 0 regressions
  • npm run lint — clean

Expand the deferred Phase 3 items into a dedicated phase after
TypeScript Migration with detailed descriptions for each sub-item:
event-driven pipeline, unified engine strategy, subgraph export
filtering, transitive confidence, query caching, config profiles,
pagination standardization, and plugin system.

Renumber subsequent phases 5-9 → 6-10 with all cross-references
updated.
Add openRepo() utility that accepts an injected Repository instance or
falls back to SQLite, enabling tests to bypass the filesystem entirely.

- Add openRepo(dbPath, opts) to src/db/connection.js
- Make findMatchingNodes and buildDependencyGraph polymorphic (accept
  db or Repository via instanceof check)
- Refactor triageData, sequenceData, communitiesData to use openRepo
- Convert triage, sequence, communities test fixtures to createTestRepo()
  fluent builder (sequence dataflow tests stay on SQLite)
- Mark ROADMAP 3.13 InMemoryRepository migration item complete

Impact: 10 functions changed, 11 affected
@claude
Copy link

claude bot commented Mar 16, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Migrate Integration Tests to InMemoryRepository

Review Checklist:

  • Phase 1: Root Cause Analysis
  • Phase 2: Configuration Change Detection
  • Phase 3: Test Integrity Analysis
  • Phase 4: Code Quality Review
  • Phase 5: Backlog Compliance Assessment
  • Final Assessment and Recommendation

Root Cause Analysis

Problem Being Solved: This PR addresses ROADMAP 3.13 — migrating integration tests from manual SQLite setup to InMemoryRepository for faster, SQLite-free testing. The refactor introduces a Repository abstraction layer that enables dependency injection for testing while maintaining backward compatibility.

Why This Approach: The approach is well-justified. It introduces a clean openRepo(dbPath, opts) utility that accepts either an injected Repository or falls back to SQLite, making functions polymorphic without breaking changes. The migration preserves all test assertions while eliminating SQLite dependency, significantly improving test speed and reducing setup complexity.

Risk Assessment: Low risk. The changes are purely additive with fallback behavior. No breaking changes to existing APIs. The instanceof Repository checks ensure compatibility with both old and new calling patterns.

Configuration Change Detection ✅

No high-risk configuration changes detected. No changes to biome.json, package.json, vitest config, or other build/lint configurations. This is a pure refactor without any rule relaxation or threshold lowering.

Test Integrity Analysis ✅

Test rigor is maintained or improved:

  • All test assertions remain identical — no weakened expectations
  • Test coverage actually increases due to more comprehensive fixture setup via createTestRepo() fluent builder
  • Sequence dataflow tests appropriately remain on SQLite since InMemoryRepository lacks dataflow table support
  • Tests are now faster and more reliable (no filesystem dependency)
  • Added comprehensive test coverage for edge cases (missing complexity data, etc.)

Code Quality Review ✅

Architecture & Design:

  • Clean dependency injection pattern with fallback behavior
  • Proper polymorphic design using instanceof checks (src/graph/builders/dependency.js:24, src/domain/analysis/symbol-lookup.js:30)
  • Good separation of concerns — Repository abstraction cleanly separates test doubles from production SQLite
  • Maintains backward compatibility perfectly

Implementation Quality:

  • openRepo() utility is well-designed with clear documentation (src/db/connection.js:103-114)
  • Proper resource cleanup with close() functions
  • Intelligent fallback handling for minConfidence filtering (src/graph/builders/dependency.js:82-88)
  • Test fixtures are much cleaner and more maintainable using fluent builder pattern

No Security/Performance Concerns:

  • No credential handling changes
  • No new runtime dependencies
  • Tests run faster (in-memory vs SQLite I/O)
  • No breaking changes to existing behavior

Backlog Compliance Assessment

Zero-dep: ✅ — No new runtime dependencies added. Pure refactoring of existing code.

Foundation-aligned: ✅ — Fully compliant with all core principles:

  • P1 (always-current): Doesn't affect rebuild speed, improves test speed
  • P4 (zero-cost core): No cloud APIs, purely local refactoring
  • P5 (embeddable-first): Maintains programmatic API contract
  • P8 (not an agent): Pure code intelligence refactor, no behavior changes

Problem-fit: 4/5 — This directly improves the codebase's maintainability and test reliability, which reduces development friction. While not directly addressing the "lost AI" problem, it enables faster iteration on features that do address it. Strong foundation work that supports the primary mission.

Breaking: No — Purely additive changes. All existing call sites continue to work identically. New opts.repo parameter is optional with sensible fallbacks.

Tier: 1 (Zero-dep ✅ + Foundation-aligned ✅)

Critical Concerns

None. This is a well-executed refactor that improves code quality without introducing risks.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality refactor that addresses an explicit roadmap item (3.13), improves test reliability and speed, maintains perfect backward compatibility, and follows established architectural patterns. The implementation is clean, well-documented, and properly tested. This is exactly the kind of foundation work that enables faster feature development.

The one point deducted is for being "good foundation work" rather than "exceptional innovation" — this is solid engineering that moves the project forward systematically.


@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR completes a repository-pattern migration by introducing openRepo() as a unified entry point for all read features. triageData, sequenceData, and communitiesData now accept either an injected Repository (for tests) or fall back to a readonly SQLite-backed SqliteRepository. Three integration test suites are converted from manual SQLite fixture setup to the lightweight createTestRepo() / InMemoryRepository builder, while the dataflow annotation tests in sequence.test.js are intentionally kept on SQLite since InMemoryRepository has no dataflow table.

Key design decisions land cleanly:

  • openRepo throws a TypeError if opts.repo is provided but isn't a Repository instance, preventing silent fallthrough.
  • The instanceof SqliteRepository guard in sequence.js replaces the previous fragile repo.db truthiness check for the dataflow block.
  • communities.js intentionally closes the DB early (before the Louvain algorithm runs) since the full graph is loaded into memory — this is correct and more resource-efficient than holding the connection open during graph analysis.
  • findMatchingNodes and buildDependencyGraph are made polymorphic via instanceof Repository checks, keeping both the old raw-db call path and the new Repository path in sync.

One minor observation: the minConfidence branch in buildFunctionLevelGraph (line 82 of dependency.js) still documents that confidence filtering is unsupported for the Repository path. No current callers pass minConfidence with a Repository, so this is a latent gap rather than an active defect, but it's worth confirming the comment aligns with the intended long-term contract for this function.

Confidence Score: 5/5

  • This PR is safe to merge — it is a clean refactoring with no behavioral changes to production logic, full test coverage, and all three previously-raised review concerns addressed.
  • All production call paths are preserved (raw-db callers are unchanged; the new Repository path is additive). The openRepo TypeError guard prevents silent misuse. instanceof SqliteRepository replaces the fragile repo.db guard. 1853 tests pass. The only residual gap is the undocumented minConfidence no-op for Repository in dependency.js, but no current caller exercises that combination.
  • src/graph/builders/dependency.js — the minConfidence branch for Repository paths is a documented no-op; worth a follow-up if Repository callers with minConfidence are added in future.

Important Files Changed

Filename Overview
src/db/connection.js Adds openRepo() utility with proper TypeError guard when opts.repo is provided but is not a Repository instance. Falls back cleanly to openReadonlyOrFail + SqliteRepository wrapper otherwise. No issues.
src/features/communities.js Migrated to openRepo(). Intentionally closes the DB early (after graph loading) before running the Louvain algorithm, since the graph is fully in-memory at that point. All Louvain and drift post-processing use only the in-memory graph. Correct and resource-efficient pattern.
src/features/sequence.js Migrated to openRepo(). Dataflow block correctly guarded with instanceof SqliteRepository (per previous review fix). close() in a wrapping finally block ensures no DB leak.
src/graph/builders/dependency.js Made polymorphic for Repository. The minConfidence path for Repository falls back to unfiltered getCallEdges() (documented in a comment). Current callers that pass minConfidence (viewer.js, export.js) still use raw db, so this code path is not actively triggered in production.
tests/integration/sequence.test.js Core sequence tests migrated to InMemoryRepository. Dataflow annotation tests correctly remain on SQLite (InMemoryRepository has no dataflow table). Both paths clearly separated and well-documented.

Sequence Diagram

sequenceDiagram
    participant Caller as CLI / MCP
    participant Feature as Feature (triage/sequence/communities)
    participant openRepo as openRepo()
    participant Repo as Repository (InMemory or Sqlite)
    participant DB as SQLite DB

    Caller->>Feature: featureData(dbPath, opts)
    Feature->>openRepo: openRepo(dbPath, opts)

    alt opts.repo is Repository instance
        openRepo-->>Feature: { repo: opts.repo, close: noop }
    else opts.repo is non-null but not Repository
        openRepo-->>Caller: TypeError
    else no opts.repo
        openRepo->>DB: openReadonlyOrFail(dbPath)
        openRepo-->>Feature: { repo: new SqliteRepository(db), close: db.close }
    end

    Feature->>Repo: findNodes / getEdges / etc.
    Repo-->>Feature: in-memory data
    Feature->>Feature: close()
    Feature-->>Caller: result
Loading

Last reviewed commit: ff74ffb

Comment on lines 80 to +88
if (minConfidence != null) {
edges = db
.prepare("SELECT source_id, target_id FROM edges WHERE kind = 'calls' AND confidence >= ?")
.all(minConfidence);
if (isRepo) {
// minConfidence filtering not supported by Repository — fall back to getCallEdges
edges = dbOrRepo.getCallEdges();
} else {
edges = dbOrRepo
.prepare("SELECT source_id, target_id FROM edges WHERE kind = 'calls' AND confidence >= ?")
.all(minConfidence);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minConfidence silently ignored for Repository

When a Repository is passed and minConfidence is set, the filter is silently dropped — getCallEdges() returns all call edges regardless of confidence. This means callers like viewer.js and export.js that pass minConfidence would get incorrect (unfiltered) graphs if they were ever migrated to use the Repository path.

At minimum, a warn() log would make this visible. Better would be to add a getCallEdgesAboveConfidence(min) method to the Repository contract, or post-filter the results:

Suggested change
if (minConfidence != null) {
edges = db
.prepare("SELECT source_id, target_id FROM edges WHERE kind = 'calls' AND confidence >= ?")
.all(minConfidence);
if (isRepo) {
// minConfidence filtering not supported by Repository — fall back to getCallEdges
edges = dbOrRepo.getCallEdges();
} else {
edges = dbOrRepo
.prepare("SELECT source_id, target_id FROM edges WHERE kind = 'calls' AND confidence >= ?")
.all(minConfidence);
}
if (minConfidence != null) {
if (isRepo) {
// minConfidence filtering not yet in Repository — filter in JS
edges = dbOrRepo.getCallEdges().filter((e) => e.confidence == null || e.confidence >= minConfidence);
} else {
edges = dbOrRepo
.prepare("SELECT source_id, target_id FROM edges WHERE kind = 'calls' AND confidence >= ?")
.all(minConfidence);
}

Note: this suggestion requires getCallEdges() to include a confidence field in its return shape, which would need a small change to InMemoryRepository.getCallEdges() and SqliteRepository.getCallEdges().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — when a Repository is passed and minConfidence is set, call edges are now post-filtered in JS: .filter(e => e.confidence != null && e.confidence >= minConfidence). Updated getCallEdges() in all three layers (SqliteRepository, InMemoryRepository, base JSDoc) to include the confidence field in the return shape.

@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +175 to +176
if (hasTable && repo.db) {
const db = repo.db;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dataflow path bypasses Repository abstraction

The dataflow annotation block relies on repo.db to access raw SQLite, falling back to skip the block for InMemoryRepository (where repo.db is undefined). While SqliteRepository intentionally exposes get db() as an escape hatch, this creates a fragile implicit contract: any future Repository subclass that happens to define a .db property for unrelated reasons would accidentally enter the raw-SQL dataflow block.

A more robust guard would be an explicit type-check or a dedicated Repository method, for example:

// Instead of:
if (hasTable && repo.db) {

// Consider:
if (hasTable && repo instanceof SqliteRepository) {

Or, alternatively, expose a getDataflowStatements() method on SqliteRepository that returns the prepared statements (returning null from the base class), so no raw db access is needed at the call site. This is low priority given the existing JSDoc comment on get db(), but worth tracking as dataflow support expands to other Repository implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — replaced the fragile repo.db check with an explicit instanceof SqliteRepository guard.

Comment on lines +103 to +106
export function openRepo(customDbPath, opts = {}) {
if (opts.repo instanceof Repository) {
return { repo: opts.repo, close() {} };
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent fallthrough for non-Repository opts.repo

If opts.repo is provided but is not an instanceof Repository (e.g. a plain mock object, a wrong class, or an accidentally undefined value from a destructuring default), the function silently falls through to openReadonlyOrFail(customDbPath). The caller may not realize the injected repo was ignored.

Consider adding an explicit guard to surface the mismatch early:

Suggested change
export function openRepo(customDbPath, opts = {}) {
if (opts.repo instanceof Repository) {
return { repo: opts.repo, close() {} };
}
export function openRepo(customDbPath, opts = {}) {
if (opts.repo != null) {
if (!(opts.repo instanceof Repository)) {
throw new TypeError(
`openRepo: opts.repo must be a Repository instance, got ${Object.prototype.toString.call(opts.repo)}`,
);
}
return { repo: opts.repo, close() {} };
}
const db = openReadonlyOrFail(customDbPath);
return {
repo: new SqliteRepository(db),
close() {
db.close();
},
};
}

This makes the contract explicit and prevents hard-to-debug test failures where a repo is passed but ignored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — openRepo now throws a TypeError if opts.repo is provided but is not a Repository instance.

@carlos-alm
Copy link
Contributor Author

@greptileai

Add new Phase 4 covering the port of JS-only build phases to Rust:
- 4.1-4.3: AST nodes, CFG, dataflow visitor ports (~587ms savings)
- 4.4: Batch SQLite inserts (~143ms)
- 4.5: Role classification & structure (~42ms)
- 4.6: Complete complexity pre-computation
- 4.7: Fix incremental rebuild data loss on native engine
- 4.8: Incremental rebuild performance (target sub-100ms)

Bump old Phases 4-10 to 5-11 with all cross-references updated.
Benchmark evidence shows ~50% of native build time is spent in
JS visitors that run identically on both engines.
Keep Phase 4 (Native Analysis Acceleration) numbering from PR branch,
incorporate 'DX & onboarding' addition and improved deliverable wording
from main.
@carlos-alm carlos-alm merged commit 2bd997a into main Mar 16, 2026
14 checks passed
@carlos-alm carlos-alm deleted the refactor/inmemory-integration-tests branch March 16, 2026 10:59
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant