Full-text and structural code search for a large monorepo. Runs a Typesense search server in WSL and exposes results as MCP tools so Claude can query the codebase directly without copy-pasting.
- Windows 11 with WSL2
- Python 3.10+ available in WSL (`python3 --version`)
From a Windows command prompt, run:
setup_mcp.cmd <path-to-your-source-root>
Example:
setup_mcp.cmd C:\myrepo\src
This will:
- Write `config.json` with your source root path and API key
- Create the Windows MCP venv at `.venv\`
- Create the WSL indexserver venv at `~/.local/indexserver-venv/`
- Register `mcp.cmd` with Claude Code
Reload VS Code after running (Ctrl+Shift+P → Developer: Reload Window).
ts start # starts Typesense (WSL), watcher, and heartbeat
On first start, ts start automatically detects the missing collection and kicks off the indexer. You can also trigger it manually:
ts index --reset # drop + recreate collection, then re-index
Initial indexing of a large repo (~100k files) takes 30–40 minutes.
Run codesearch as a Docker container instead of installing locally. The container includes Typesense, the file watcher, and the MCP server.
- Docker installed and running
- Source code directory to index
docker build -t codesearch-mcp -f docker/Dockerfile .

docker run -d --name codesearch \
  -p 3000:3000 \
  -p 8108:8108 \
  -v /path/to/your/source:/source:ro \
  -v codesearch_data:/typesensedata \
  codesearch-mcp

Replace /path/to/your/source with the path to your source code directory.
- Port `3000` exposes the MCP SSE endpoint
- `/source` is where your code is mounted (read-only)
- The `codesearch_data` volume persists the Typesense index between container restarts
On first start, the container will automatically index all files in /source.
claude mcp add codesearch-docker --transport sse http://localhost:3000/sse

Alternatively, use docker-compose for easier management:
cd docker
# Set your source directory
export SOURCE_DIR=/path/to/your/source
# Start the container
docker-compose up -d
# View logs
docker-compose logs -f
# Stop
docker-compose down

| Variable | Default | Description |
|---|---|---|
| `CODESEARCH_PORT` | `8108` | Typesense server port (internal) |
| `CODESEARCH_ROOT_NAME` | `default` | Name for the source root in config |
| `CODESEARCH_API_KEY` | (auto-generated) | Typesense API key |
| `MCP_PORT` | `3000` | MCP SSE server port |
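As a rough illustration of how the defaults in the table could be resolved, the sketch below reads the four variables from an environment mapping and falls back to the documented defaults. The helper name `load_settings` and the use of `secrets.token_hex` for the auto-generated key are assumptions made for this example; the container's actual startup code may work differently.

```python
import os
import secrets


def load_settings(env=None):
    """Resolve container settings from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "typesense_port": int(env.get("CODESEARCH_PORT", "8108")),
        "root_name": env.get("CODESEARCH_ROOT_NAME", "default"),
        # Assumption: an auto-generated key is simply a random hex token.
        "api_key": env.get("CODESEARCH_API_KEY") or secrets.token_hex(16),
        "mcp_port": int(env.get("MCP_PORT", "3000")),
    }


# Only MCP_PORT is overridden here; everything else uses the table defaults.
settings = load_settings({"MCP_PORT": "4000"})
print(settings["typesense_port"], settings["root_name"], settings["mcp_port"])
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to try without touching the real environment.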
All service commands go through ts.cmd (Windows CMD/PowerShell) or ts.sh (Git Bash / WSL):
ts status                      # show server health, doc count, watcher/heartbeat state
ts start                       # start Typesense + watcher + heartbeat (auto-indexes if needed)
ts stop                        # stop everything
ts restart                     # stop then start
ts index                       # re-index in background (incremental, keeps existing collection)
ts index --reset               # drop + recreate collection, then re-index
ts index --root <name>         # index a specific named root (multi-root setups)
ts verify                      # scan FS + repair index: add missing, re-index stale, remove orphans
ts verify --root <name>        # verify a specific named root
ts verify --no-delete-orphans  # repair without removing deleted-file entries
ts log                         # tail the Typesense server log
ts log --indexer [-n N]        # tail the indexer/verifier log (default: last 40 lines)
ts log --heartbeat             # tail the heartbeat log
ts watcher                     # start the file watcher standalone
ts heartbeat                   # start the heartbeat watchdog standalone
To index multiple source trees, edit config.json:
{
  "api_key": "codesearch-local",
  "roots": {
    "default": "X:/path/to/first/src",
    "other": "Y:/path/to/second/src"
  }
}

Each root gets its own Typesense collection (`codesearch_default`, `codesearch_other`). Index each one:
ts index --root default --reset
ts index --root other --reset
ts restart
Use the MCP root= parameter to search a specific collection:
search_code("ItemProcessor", root="other")
query_cs("implements", "IFoo", root="other")
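The mapping from root names to collections can be sketched as follows. This is a minimal illustration that assumes the `codesearch_<root>` naming pattern shown above holds for every root; the helper names `collection_for` and `collections_from_config` are hypothetical and not part of the project's `config.py`.

```python
import json

# Same shape as the multi-root config.json example.
CONFIG_TEXT = """
{
  "api_key": "codesearch-local",
  "roots": {
    "default": "X:/path/to/first/src",
    "other": "Y:/path/to/second/src"
  }
}
"""


def collection_for(root_name):
    """Assumed naming rule: one collection per root, prefixed with 'codesearch_'."""
    return f"codesearch_{root_name}"


def collections_from_config(config_text):
    """Return {root name: collection name} for every configured root."""
    config = json.loads(config_text)
    return {name: collection_for(name) for name in config["roots"]}


collections = collections_from_config(CONFIG_TEXT)
print(collections)
```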
The watcher picks up changes automatically within ~12 seconds (10 s poll interval + 2 s debounce). For large repos, or after bulk operations like a git pull or branch switch, use the MCP tools or ts verify to confirm everything is in sync.
ready() # poll FS + check index is in sync (synchronous, ~30 s for 200k files)
verify_index(action="start") # launch background repair scan
verify_index(action="status") # monitor repair progress
verify_index(action="stop") # cancel a running scan
ready() returns a summary with poll_ok (FS walk completed), index_ok (zero missing/stale/orphaned), and timing. If not ready, verify_index(action="start") repairs the index without resetting it.
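To make the missing/stale/orphaned distinction concrete, here is a small sketch of the kind of comparison a readiness check could perform. It assumes both sides are available as `{path: mtime}` dictionaries and that "stale" means the file on disk is newer than its indexed copy; the function name `diff_index` is hypothetical and the real verifier may use different rules.

```python
def diff_index(fs_mtimes, index_mtimes):
    """Compare filesystem mtimes with indexed mtimes (illustrative only)."""
    # On disk but not in the index.
    missing = sorted(p for p in fs_mtimes if p not in index_mtimes)
    # In both, but the file changed after it was indexed (assumed definition).
    stale = sorted(
        p for p, mtime in fs_mtimes.items()
        if p in index_mtimes and mtime > index_mtimes[p]
    )
    # In the index but no longer on disk.
    orphaned = sorted(p for p in index_mtimes if p not in fs_mtimes)
    return {
        "missing": missing,
        "stale": stale,
        "orphaned": orphaned,
        "index_ok": not (missing or stale or orphaned),
    }


fs = {"a.cs": 100, "b.cs": 250, "c.cs": 300}
index = {"b.cs": 200, "c.cs": 300, "d.cs": 50}
report = diff_index(fs, index)
print(report)
```

In this example `a.cs` is missing, `b.cs` is stale, and `d.cs` is orphaned, so `index_ok` is false and a repair scan would be warranted.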
ts verify # foreground repair scan (missing + stale + orphans)
ts verify --no-delete-orphans # repair without removing deleted-file entries
ts verify --root other # verify a specific named root
# From WSL — uses a lightweight venv in /tmp:
bash test-query.sh

Or directly:

/tmp/ts-test-venv/bin/pytest tests/test_query_cs.py -v

The suite runs 74 tests covering all 15 `query_cs` modes against a synthetic C# fixture (`tests/query_fixture.cs`).
Tests are split into thematic files under tests/. Tests that don't need a running server (unit tests, mock-based) run anywhere; integration tests auto-skip with a clear message if Typesense is not running.
run-server-tests.cmd # all tests
run-server-tests.cmd TestSearchFieldModes # specific class
run-server-tests.cmd test_method_sigs # specific method
Or directly from WSL:
~/.local/indexserver-venv/bin/pytest tests/ -v

| File | What it tests |
|---|---|
| `test_indexer.py` | Indexer, semantic fields, multi-root, `extract_cs_metadata`, `index_file_list` pipeline |
| `test_indexer_query_consistency.py` | Cross-checks that indexer and query extract the same values from identical source; catches drift between the two |
| `test_watcher.py` | File watcher event handler (unit + integration) |
| `test_process_cs.py` | `process_file()` C# structural query API |
| `test_python.py` | Python metadata extraction (`extract_py_metadata`), `process_py_file()`, Python semantic fields |
| `test_verifier.py` | `_export_index()` (mock HTTP), `run_verify()` diff logic, full verify integration |
# Using the Windows venv:
.venv\Scripts\python.exe search.py "MyInterface"
.venv\Scripts\python.exe search.py "MyMethod" --ext cs --sub mysubsystem
.venv\Scripts\python.exe search.py "MyInterface" --mode implements
.venv\Scripts\python.exe search.py "MyMethod" --mode callers
.venv\Scripts\python.exe search.py "Obsolete" --mode attr
.venv\Scripts\python.exe search.py "MyType" --mode uses

.venv\Scripts\python.exe query.py --methods MyClass.cs
.venv\Scripts\python.exe query.py --calls MyMethod "src/mysubsystem/**/*.cs"
.venv\Scripts\python.exe query.py --calls MyClass.MyMethod "src/mysubsystem/**/*.cs"
.venv\Scripts\python.exe query.py --implements IMyInterface --search "IMyInterface"
.venv\Scripts\python.exe query.py --field-type MyType --search "MyType"
.venv\Scripts\python.exe query.py --param-type MyType --search "MyType"
.venv\Scripts\python.exe query.py --uses MyType --search "MyType"
.venv\Scripts\python.exe query.py --find MyMethod MyClass.cs
.venv\Scripts\python.exe query.py --attrs TestMethod "src/**/*.cs"
.venv\Scripts\python.exe query.py --member-accesses MyType MyClass.cs

- Typesense: fast keyword/semantic search over pre-indexed metadata (class names, method names, base types, call sites, signatures, attributes, etc.). Runs in WSL; data stored at `~/.local/typesense/`.
- tree-sitter: precise C# AST queries on the file set returned by Typesense. Skips comments and string literals, understands syntax.
Typical flow: Typesense narrows the haystack to ~50 candidate files → tree-sitter parses each one and applies the structural query.
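The two-stage flow can be illustrated with a self-contained stand-in. Note the substitutions: an in-memory dictionary with substring matching plays the role of Typesense, and Python's standard `ast` module plays the role of tree-sitter (parsing Python rather than C#). None of the names below come from `search.py` or `query.py`; this is only a sketch of the narrowing-then-parsing idea.

```python
import ast

# Tiny in-memory "repo"; b.py mentions the name only in a comment and a string.
FILES = {
    "a.py": "def run():\n    process_item(1)\n",
    "b.py": "# process_item is mentioned only here\nNOTE = 'process_item'\n",
    "c.py": "def other():\n    return 42\n",
}


def keyword_candidates(files, term, limit=50):
    """Stage 1: cheap text match, standing in for the Typesense lookup."""
    return [path for path, text in sorted(files.items()) if term in text][:limit]


def calls_function(source, name):
    """Stage 2: parse the file and look for real call sites of `name`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if called == name:
                return True
    return False


def find_callers(files, name):
    """Run the structural check only on the stage 1 candidates."""
    return [p for p in keyword_candidates(files, name) if calls_function(files[p], name)]


candidates = keyword_candidates(FILES, "process_item")
callers = find_callers(FILES, "process_item")
print(candidates, callers)
```

The text match returns both `a.py` and `b.py`, while the structural pass keeps only `a.py`, because the mentions in `b.py` sit in a comment and a string literal rather than in a call.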
┌─────────────────────────────────────────────────┐
│ MCP CLIENT (Claude ↔ tools) │
│ mcp_server.py search.py query.py │
│ Claude Code VSCode ext → mcp.sh (WSL) ← actual
│ Manual/CI alternative → mcp.cmd (Windows) │
│ Venv (WSL): ~/.local/mcp-venv/ │
│ Venv (Windows): .venv/ │
└───────────────────┬─────────────────────────────┘
│ HTTP localhost:8108
┌───────────────────▼─────────────────────────────┐
│ INDEXSERVER (WSL only) │
│ indexserver/service.py indexer.py │
│ indexserver/watcher.py heartbeat.py │
│ Venv: ~/.local/indexserver-venv/ │
│ Entry: ts.cmd (Windows) / ts.sh (WSL) │
└─────────────────────────────────────────────────┘
│ data
Typesense server
~/.local/typesense/
MCP runs in WSL. The Claude Code VSCode extension launches the MCP server via `mcp.sh`, so `mcp_server.py` runs under the WSL Python (`~/.local/mcp-venv`). This means file paths inside the MCP process must be `/mnt/x/...` style, even though `config.json` stores them as Windows `X:/...` paths. `config.to_native_path()` converts automatically based on `sys.platform`.

Direct CLI usage (`query.py`, `search.py` invoked by hand) can run under either Windows or WSL depending on which Python you call; both are supported.
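The path conversion described above might look roughly like the sketch below. This is not the project's `config.to_native_path()`; it is a hypothetical reimplementation that assumes drive-letter paths map to `/mnt/<lowercase drive>/...` under WSL, and it takes the platform as a parameter (an addition for this example) so the behaviour can be tried on any machine.

```python
import re
import sys

# Matches "X:/rest" or "X:\rest" and captures the drive letter and remainder.
_DRIVE_PATH = re.compile(r"^([A-Za-z]):[\\/](.*)$")


def to_native_path(path, platform=sys.platform):
    """Convert a Windows-style path to a WSL mount path when not on Windows."""
    if platform.startswith("win"):
        return path  # Already native on Windows.
    match = _DRIVE_PATH.match(path)
    if not match:
        return path  # Not a drive-letter path; leave it untouched.
    drive, rest = match.groups()
    return f"/mnt/{drive.lower()}/" + rest.replace("\\", "/")


print(to_native_path("X:/path/to/src", platform="linux"))
print(to_native_path("X:/path/to/src", platform="win32"))
```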
When running in Docker, all components run in a single container:
┌─────────────────────────────────────────────────┐
│ DOCKER CONTAINER │
│ │
│ MCP Server (SSE) ──────────── port 3000 │
│ Typesense Server ──────────── port 8108 │
│ File Watcher (background) │
│ │
│ /source (volume) ──── your source code │
│ /typesensedata (volume) ── persisted index │
└─────────────────────────────────────────────────┘
The MCP server uses SSE (Server-Sent Events) transport instead of stdio, allowing Claude Code to connect via HTTP.
Client-side (repo root)
| File | Purpose |
|---|---|
| `config.py` | Shared constants: `HOST`, `PORT`, `API_KEY`, `ROOTS`, collection names. Reads `config.json`. |
| `search.py` | Typesense HTTP search; `search()` + `format_results()` |
| `query.py` | tree-sitter AST query functions + `process_file()` + `files_from_search()` |
| `mcp_server.py` | FastMCP server: `search_code`, `query_cs`, `query_py`, `ready`, `verify_index`, `service_status` tools |
| `mcp.cmd` | Windows launcher: `.venv\Scripts\python.exe mcp_server.py` |
| `mcp.sh` | WSL launcher: `~/.local/mcp-venv/bin/python mcp_server.py` |
| `setup_mcp.cmd` | One-time setup: writes `config.json`, creates venvs, registers MCP |
Server-side (indexserver/)
| File | Purpose |
|---|---|
| `config.py` | Same constants as client `config.py`; also has `INCLUDE_EXTENSIONS`, `EXCLUDE_DIRS`, `MAX_FILE_BYTES` |
| `indexer.py` | Full re-index via `os.walk` + `.gitignore` parsing + tree-sitter C#/Python metadata extraction. Shared `index_file_list()` pipeline used by both full indexer and verifier. |
| `verifier.py` | Index repair: compares FS mtimes against the index, re-indexes missing/stale files, removes orphaned entries. `check_ready()` for synchronous readiness checks. |
| `watcher.py` | Incremental updates: `PollingObserver` (10 s interval) monitors source root and upserts changes. Uses polling because inotify doesn't fire for Windows-backed `/mnt/` paths in WSL. |
| `heartbeat.py` | Health loop: checks server every 30 s, restarts watcher or server on failure |
| `start_server.py` | Downloads Typesense Linux binary; starts server process in WSL |
| `service.py` | CLI dispatcher for all `ts` subcommands including `ts verify` |
| `smoke_test.py` | Quick sanity check that the server is up and basic queries work |
Entry points
| File | Purpose |
|---|---|
| `ts.cmd` | Windows CMD/PowerShell → WSL bridge for all service commands |
| `ts.sh` | WSL / Git Bash entry point for service commands |
| `smoke-test.cmd` | Run `smoke_test.py` via WSL |
| `run-server-tests.cmd` | Run pytest test suite via WSL |
The collection uses tiered semantic fields extracted by tree-sitter at index time:
| Tier | Fields | Used by MCP mode |
|---|---|---|
| T1 | `base_types` | `implements` |
| T1 | `call_sites` | `callers` |
| T1 | `method_sigs` | `sig` |
| T2 | `type_refs` | `uses` |
| T2 | `attributes` | `attr` |
| T2 | `usings` | - |
| - | `class_names`, `method_names`, `symbols` | `text`, `symbols` |
| - | `content` | `text` |
Search ranking by file type: .cs (priority 3) → .h/.cpp/.c (2) → scripts/.py/.ts (1) → config/docs (0).
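The priority tiers can be expressed as a simple lookup. The sketch below is an assumption about how such a ranking might be encoded: it covers only the extensions named above, and treating every unlisted extension as priority 0 is a simplification for this example rather than documented behaviour.

```python
import os

# Priorities taken from the ranking described above; other script
# extensions are not listed in the text, so they are omitted here.
PRIORITY_BY_EXT = {
    ".cs": 3,
    ".h": 2, ".cpp": 2, ".c": 2,
    ".py": 1, ".ts": 1,
}


def file_priority(path):
    """Return the ranking priority for a path (unlisted extensions get 0)."""
    ext = os.path.splitext(path)[1].lower()
    return PRIORITY_BY_EXT.get(ext, 0)


hits = ["docs/readme.md", "tools/build.py", "native/io.cpp", "src/Foo.cs"]
ranked = sorted(hits, key=file_priority, reverse=True)
print(ranked)
```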
The subsystem field is the first path component under the source root. Use sub= to scope searches to a subsystem.
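Deriving the subsystem from a path can be sketched like this. The function name `subsystem_of` is hypothetical, and returning an empty string for files that sit directly in the source root is an assumption; the indexer may handle that case differently.

```python
from pathlib import PurePosixPath


def subsystem_of(file_path, source_root):
    """Return the first directory under source_root that contains file_path."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(source_root))
    parts = relative.parts
    # A file directly in the root has no subsystem directory (assumed: "").
    return parts[0] if len(parts) > 1 else ""


print(subsystem_of("/mnt/x/src/mysubsystem/Foo/Bar.cs", "/mnt/x/src"))
print(repr(subsystem_of("/mnt/x/src/README.md", "/mnt/x/src")))
```

A search scoped with `sub="mysubsystem"` would then match only documents whose derived value equals `mysubsystem`.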
{
  "api_key": "codesearch-local",
  "roots": {
    "default": "X:/path/to/your/src"
  }
}

`config.json` is not checked in (it is listed in `.gitignore`) because it contains your local source root path. Run `setup_mcp.cmd <src-root>` to generate it.