Commit f40bb91

fix: change default embedding model from jina-code to nomic-v1.5
jina-code is a gated model that requires HuggingFace authentication, so `codegraph embed` crashed for users without HF_TOKEN. nomic-v1.5 is public, has the same 768 dimensions, and offers improved quality with an 8192-token context.
1 parent e97df1b commit f40bb91

3 files changed: +43 -19 lines changed

README.md

Lines changed: 39 additions & 15 deletions
@@ -93,7 +93,7 @@ Most code graph tools make you choose: **fast local analysis with no AI, or powe
 | **** | **Always-fresh graph** | Three-tier change detection: journal (O(changed)) → mtime+size (O(n) stats) → hash (O(changed) reads). Sub-second rebuilds even on large codebases. Competitors re-index everything from scratch; Merkle-tree approaches still require O(n) filesystem scanning |
 | **🔓** | **Zero-cost core, LLM-enhanced when you want** | Full graph analysis with no API keys, no accounts, no cost. Optionally bring your own LLM provider for richer embeddings and AI-powered search — your code only goes to the provider you already chose |
 | **🔬** | **Function-level, not just files** | Traces `handleAuth()` → `validateToken()` → `decryptJWT()` and shows 14 callers across 9 files break if `decryptJWT` changes |
-| **🤖** | **Built for AI agents** | 17-tool [MCP server](https://modelcontextprotocol.io/) — AI assistants query your graph directly. Single-repo by default, your code doesn't leak to other projects |
+| **🤖** | **Built for AI agents** | 17-tool [MCP server](https://modelcontextprotocol.io/) with `context` and `explain` compound commands — AI assistants get full function context in one call. Single-repo by default, your code doesn't leak to other projects |
 | **🌐** | **Multi-language, one CLI** | JS/TS + Python + Go + Rust + Java + C# + PHP + Ruby + HCL in a single graph — no juggling Madge, pyan, and cflow |
 | **💥** | **Git diff impact** | `codegraph diff-impact` shows changed functions, their callers, and full blast radius — ships with a GitHub Actions workflow |
 | **🧠** | **Semantic search** | Local embeddings by default, LLM-powered embeddings when opted in — multi-query with RRF ranking via `"auth; token; JWT"` |
@@ -180,12 +180,15 @@ codegraph deps src/index.ts # file-level import/export map

 | | Feature | Description |
 |---|---|---|
-| 🔍 | **Symbol search** | Find any function, class, or method by name with callers/callees |
+| 🔍 | **Symbol search** | Find any function, class, or method by name — exact match priority, relevance scoring, `--file` and `--kind` filters |
 | 📁 | **File dependencies** | See what a file imports and what imports it |
 | 💥 | **Impact analysis** | Trace every file affected by a change (transitive) |
-| 🧬 | **Function-level tracing** | Call chains, caller trees, and function-level impact |
+| 🧬 | **Function-level tracing** | Call chains, caller trees, and function-level impact with qualified call resolution |
+| 🎯 | **Deep context** | `context` gives AI agents source, deps, callers, signature, and tests for a function in one call; `explain` gives structural summaries of files or functions |
+| 📍 | **Fast lookup** | `where` shows exactly where a symbol is defined and used — minimal, fast |
 | 📊 | **Diff impact** | Parse `git diff`, find overlapping functions, trace their callers |
 | 🗺️ | **Module map** | Bird's-eye view of your most-connected files |
+| 🏗️ | **Structure & hotspots** | Directory cohesion scores, fan-in/fan-out hotspot detection, module boundaries |
 | 🔄 | **Cycle detection** | Find circular dependencies at file or function level |
 | 📤 | **Export** | DOT (Graphviz), Mermaid, and JSON graph export |
 | 🧠 | **Semantic search** | Embeddings-powered natural language search with multi-query RRF ranking |
@@ -210,7 +213,19 @@ codegraph watch [dir] # Watch for changes, update graph incrementally
 codegraph query <name> # Find a symbol — shows callers and callees
 codegraph deps <file> # File imports/exports
 codegraph map # Top 20 most-connected files
-codegraph map -n 50 # Top 50
+codegraph map -n 50 --no-tests # Top 50, excluding test files
+codegraph where <name> # Where is a symbol defined and used?
+codegraph where --file src/db.js # List symbols, imports, exports for a file
+codegraph stats # Graph health: nodes, edges, languages, quality score
+```
+
+### Deep Context (AI-Optimized)
+
+```bash
+codegraph context <name> # Full context: source, deps, callers, signature, tests
+codegraph context <name> --depth 2 --no-tests # Include callee source 2 levels deep
+codegraph explain <file> # Structural summary: public API, internals, data flow
+codegraph explain <function> # Function summary: signature, calls, callers, tests
 ```

 ### Impact Analysis
@@ -225,6 +240,14 @@ codegraph diff-impact --staged # Impact of staged changes
 codegraph diff-impact HEAD~3 # Impact vs a specific ref
 ```

+### Structure & Hotspots
+
+```bash
+codegraph structure # Directory overview with cohesion scores
+codegraph hotspots # Files with extreme fan-in, fan-out, or density
+codegraph hotspots --metric coupling --level directory --no-tests
+```
+
 ### Export & Visualization

 ```bash
@@ -268,9 +291,9 @@ A single trailing semicolon is ignored (falls back to single-query mode). The `-
 | `minilm` | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |
 | `jina-small` | jina-embeddings-v2-small-en | 512 | ~33 MB | Apache-2.0 | Better quality, still small |
 | `jina-base` | jina-embeddings-v2-base-en | 768 | ~137 MB | Apache-2.0 | High quality, 8192 token context |
-| `jina-code` (default) | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | **Best for code search**, trained on code+text |
+| `jina-code` | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | Best for code search, trained on code+text (requires HF token) |
 | `nomic` | nomic-embed-text-v1 | 768 | ~137 MB | Apache-2.0 | Good quality, 8192 context |
-| `nomic-v1.5` | nomic-embed-text-v1.5 | 768 | ~137 MB | Apache-2.0 | Improved nomic, Matryoshka dimensions |
+| `nomic-v1.5` (default) | nomic-embed-text-v1.5 | 768 | ~137 MB | Apache-2.0 | **Improved nomic, Matryoshka dimensions** |
 | `bge-large` | bge-large-en-v1.5 | 1024 | ~335 MB | MIT | Best general retrieval, top MTEB scores |

 The model used during `embed` is stored in the database, so `search` auto-detects it — no need to pass `--model` when searching.
@@ -304,13 +327,13 @@ By default, the MCP server only exposes the local project's graph. AI agents can
 | Flag | Description |
 |---|---|
 | `-d, --db <path>` | Custom path to `graph.db` |
-| `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files |
+| `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files (available on `fn`, `fn-impact`, `context`, `explain`, `where`, `diff-impact`, `search`, `map`, `hotspots`, `deps`, `impact`) |
 | `--depth <n>` | Transitive trace depth (default varies by command) |
 | `-j, --json` | Output as JSON |
 | `-v, --verbose` | Enable debug output |
 | `--engine <engine>` | Parser engine: `native`, `wasm`, or `auto` (default: `auto`) |
-| `-k, --kind <kind>` | Filter by kind: `function`, `method`, `class`, `struct`, `enum`, `trait`, `record`, `module` (search) |
-| `--file <pattern>` | Filter by file path pattern (search) |
+| `-k, --kind <kind>` | Filter by kind: `function`, `method`, `class`, `struct`, `enum`, `trait`, `record`, `module` (`fn`, `context`, `search`) |
+| `-f, --file <path>` | Scope to a specific file (`fn`, `context`, `where`) |
 | `--rrf-k <n>` | RRF smoothing constant for multi-query search (default 60) |

 ## 🌐 Language Support
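
Note on the `--rrf-k` flag in the hunk above: multi-query search is described as using RRF (Reciprocal Rank Fusion) ranking. The following is a minimal sketch of the standard RRF formula, not codegraph's actual implementation. `splitQueries`, `rrfFuse`, and the sample result lists are hypothetical; only the `;` separator, the ignored trailing semicolon, and the default constant of 60 come from the README.

```javascript
// Split a multi-query string on ";". Empty segments are dropped, so a single
// trailing semicolon yields one query (single-query mode).
function splitQueries(input) {
  return input
    .split(';')
    .map((part) => part.trim())
    .filter(Boolean);
}

// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank),
// with 1-based ranks. k = 60 mirrors the documented --rrf-k default.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Hypothetical per-query rankings for "auth; token; JWT" (made-up data).
const queries = splitQueries('auth; token; JWT');
const fused = rrfFuse([
  ['validateToken', 'handleAuth', 'decryptJWT'],
  ['validateToken', 'decryptJWT'],
  ['decryptJWT', 'validateToken'],
]);
console.log(queries.length, fused.map((r) => r.id));
```

A larger `k` flattens the difference between adjacent ranks, so symbols that appear in many result lists gain relative to symbols that rank first in only one.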
@@ -361,18 +384,19 @@ Both engines produce identical output. Use `--engine native|wasm|auto` to contro

 ### Call Resolution

-Calls are resolved with priority and confidence scoring:
+Calls are resolved with **qualified resolution** — method calls (`obj.method()`) are distinguished from standalone function calls, and built-in receivers (`console`, `Math`, `JSON`, `Array`, `Promise`, etc.) are filtered out automatically. Import scope is respected: a call to `foo()` only resolves to functions that are actually imported or defined in the same file, eliminating false positives from name collisions.

 | Priority | Source | Confidence |
 |---|---|---|
 | 1 | **Import-aware** — `import { foo } from './bar'` → link to `bar` | `1.0` |
 | 2 | **Same-file** — definitions in the current file | `1.0` |
-| 3 | **Same directory** — definitions in sibling files | `0.7` |
-| 4 | **Same parent directory** — definitions in sibling dirs | `0.5` |
-| 5 | **Global fallback** — match by name across codebase | `0.3` |
-| 6 | **Method hierarchy** — resolved through `extends`/`implements` ||
+| 3 | **Same directory** — definitions in sibling files (standalone calls only) | `0.7` |
+| 4 | **Same parent directory** — definitions in sibling dirs (standalone calls only) | `0.5` |
+| 5 | **Method hierarchy** — resolved through `extends`/`implements` | varies |
+
+Method calls on unknown receivers skip global fallback entirely — `stmt.run()` will never resolve to a standalone `run` function in another file. Duplicate caller/callee edges are deduplicated automatically. Dynamic patterns like `fn.call()`, `fn.apply()`, `fn.bind()`, and `obj["method"]()` are also detected on a best-effort basis.

-Dynamic patterns like `fn.call()`, `fn.apply()`, `fn.bind()`, and `obj["method"]()` are also detected on a best-effort basis.
+Codegraph also extracts symbols from common callback patterns: Commander `.command().action()` callbacks (as `command:build`), Express route handlers (as `route:GET /api/users`), and event emitter listeners (as `event:data`).

 ## 📊 Performance

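
Note on the Call Resolution table in the hunk above: the priority tiers can be read as a first-match cascade. The sketch below illustrates that reading under stated assumptions and is not codegraph's resolver. The shapes of `call` and `definitions`, and the names `resolveCall`, `dirOf`, and `CONFIDENCE`, are hypothetical; the confidence values and the "standalone calls only" restriction come from the table. The method-hierarchy tier is omitted because its confidence is listed only as "varies".

```javascript
// Confidence values copied from the Call Resolution table.
const CONFIDENCE = { imported: 1.0, sameFile: 1.0, sameDir: 0.7, parentDir: 0.5 };

// Directory part of a "/"-separated path ('' for a top-level file).
const dirOf = (file) => {
  const i = file.lastIndexOf('/');
  return i === -1 ? '' : file.slice(0, i);
};

// call: { name, file, isMethod, imports: Map<name, sourceFile> }  (assumed shape)
// definitions: [{ name, file }]                                   (assumed shape)
function resolveCall(call, definitions) {
  const candidates = definitions.filter((d) => d.name === call.name);

  // Tier 1: the name is imported from a specific file.
  const importedFrom = call.imports.get(call.name);
  const imported = candidates.find((d) => d.file === importedFrom);
  if (imported) return { target: imported.file, confidence: CONFIDENCE.imported };

  // Tier 2: defined in the calling file.
  const local = candidates.find((d) => d.file === call.file);
  if (local) return { target: local.file, confidence: CONFIDENCE.sameFile };

  // Tiers 3 and 4 apply to standalone calls only, so method calls stop here.
  if (call.isMethod) return null;

  // Tier 3: defined in a sibling file.
  const dir = dirOf(call.file);
  const sibling = candidates.find((d) => dirOf(d.file) === dir);
  if (sibling) return { target: sibling.file, confidence: CONFIDENCE.sameDir };

  // Tier 4: defined in a sibling directory (same parent directory).
  const cousin = candidates.find((d) => {
    const defDir = dirOf(d.file);
    return defDir !== '' && dir !== '' && dirOf(defDir) === dirOf(dir);
  });
  if (cousin) return { target: cousin.file, confidence: CONFIDENCE.parentDir };

  return null; // no global fallback in this sketch
}

// Made-up definitions and calls for illustration.
const definitions = [
  { name: 'run', file: 'src/db/stmt.js' },
  { name: 'parse', file: 'src/parser/parse.js' },
];
const standalone = { name: 'run', file: 'src/db/index.js', isMethod: false, imports: new Map() };
const method = { ...standalone, isMethod: true };
console.log(resolveCall(standalone, definitions), resolveCall(method, definitions));
```

With this data the standalone `run()` call resolves to the sibling file, while the method-style call returns `null`, matching the README's statement that `stmt.run()` never resolves to a standalone `run` in another file.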
src/cli.js

Lines changed: 3 additions & 3 deletions
@@ -374,7 +374,7 @@ program
   .action(() => {
     console.log('\nAvailable embedding models:\n');
     for (const [key, config] of Object.entries(MODELS)) {
-      const def = key === 'jina-code' ? ' (default)' : '';
+      const def = key === 'nomic-v1.5' ? ' (default)' : '';
       console.log(` ${key.padEnd(12)} ${String(config.dim).padStart(4)}d ${config.desc}${def}`);
     }
     console.log('\nUsage: codegraph embed --model <name>');
@@ -388,8 +388,8 @@ program
   )
   .option(
     '-m, --model <name>',
-    'Embedding model: minilm, jina-small, jina-base, jina-code (default), nomic, nomic-v1.5, bge-large. Run `codegraph models` for details',
-    'jina-code',
+    'Embedding model: minilm, jina-small, jina-base, jina-code, nomic, nomic-v1.5 (default), bge-large. Run `codegraph models` for details',
+    'nomic-v1.5',
   )
   .action(async (dir, opts) => {
     const root = path.resolve(dir || '.');
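
Note on the `src/cli.js` hunks above: the default model name appears as a string literal in three places (the `models` listing, the option help text, and the option default), and each had to be edited by hand in this commit. One possible way to avoid that, sketched below, is to derive the marker from the `DEFAULT_MODEL` constant that `src/embedder.js` exports. `formatModelList` is a hypothetical helper, and the `MODELS` object here is an abbreviated stand-in whose dimensions and descriptions are taken from the README table rather than from the real registry.

```javascript
// Abbreviated stand-in for the MODELS registry (assumed shape: { dim, desc }).
const MODELS = {
  minilm: { dim: 384, desc: 'Fastest, good for quick iteration' },
  'jina-code': { dim: 768, desc: 'Best for code search (requires HF token)' },
  'nomic-v1.5': { dim: 768, desc: 'Improved nomic, Matryoshka dimensions' },
};
const DEFAULT_MODEL = 'nomic-v1.5';

// Build one display line per model, marking whichever key equals defaultModel.
function formatModelList(models, defaultModel) {
  return Object.entries(models).map(([key, config]) => {
    const def = key === defaultModel ? ' (default)' : '';
    return `  ${key.padEnd(12)} ${String(config.dim).padStart(4)}d  ${config.desc}${def}`;
  });
}

const modelLines = formatModelList(MODELS, DEFAULT_MODEL);
console.log(modelLines.join('\n'));
```

With this shape, a future default change would touch only the constant, and the listing would follow automatically.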

src/embedder.js

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ export const MODELS = {
   },
 };

-export const DEFAULT_MODEL = 'jina-code';
+export const DEFAULT_MODEL = 'nomic-v1.5';
 const BATCH_SIZE_MAP = {
   minilm: 32,
   'jina-small': 16,
