RAG Without Vectors: Emergent Retrieval
🔍 The Heretical Question
What if you didn't need embeddings, vector databases, or chunking strategies? What if reasoning itself could be the similarity function?
The Standard RAG Pipeline
Everyone knows how Retrieval-Augmented Generation works. You take your documents,
you chunk them, you embed them with something like text-embedding-3-small,
you store them in Pinecone or Chroma, and at query time you embed the query,
find similar vectors, pull the chunks, and stuff them into context.
It works. It's battle-tested. But it's also expensive, brittle, and requires constant tuning of chunk sizes, overlap, retrieval thresholds, and reranking strategies.
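For contrast, the whole vector pipeline can be sketched in a few lines. The embed() function and vectorStore object below are hypothetical stand-ins for the embedding API and the vector database client, not any specific library's interface.
// A minimal sketch of the standard pipeline. embed() and vectorStore are
// hypothetical placeholders, not a real SDK.
declare function embed(text: string): Promise<number[]>;
declare const vectorStore: {
  upsert(id: string, vector: number[], chunk: string): Promise<void>;
  query(vector: number[], topK: number): Promise<string[]>;
};

async function indexDocument(id: string, text: string, chunkSize = 800) {
  // Naive fixed-size chunking, one of the knobs that needs constant tuning
  for (let i = 0; i * chunkSize < text.length; i++) {
    const chunk = text.slice(i * chunkSize, (i + 1) * chunkSize);
    await vectorStore.upsert(`${id}-${i}`, await embed(chunk), chunk);
  }
}

async function retrieve(query: string): Promise<string[]> {
  // One embedding call plus one nearest-neighbor search, then stuff into context
  return vectorStore.query(await embed(query), 5);
}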
The Heresy: Let the LLM Be the Retriever
What if instead of pre-computing similarity through embeddings, you gave the LLM tools to explore and let it figure out what's relevant?
❌ Vector RAG
Similarity is pre-computed by a frozen embedding model. It can't reason. "Authentication" and "login" might be far apart in embedding space despite being conceptually identical.
✓ File System RAG
Similarity is computed at query time by the LLM's reasoning.
It can understand that "I need authentication code" means searching for
auth, login, session, and jwt.
// The LLM's retrieval "algorithm"
User: "How does authentication work?"
LLM thinks:
├── Auth code is probably in /src/auth/ or /lib/auth/
├── Maybe there's a middleware folder
└── JWT handling might be in utils/
LLM acts:
├── search_files("*.auth.ts", "src/")
├── search_in_file("jwt", "src/middleware/index.ts")
└── read_file_lines("src/auth/session.ts", 1, 50)
// The LLM IS the similarity function

The Tool Stack That Enables This
Building the OODA MCP server taught me what tools an LLM needs to perform emergent retrieval. It's not one tool—it's a progression from broad discovery to surgical extraction.
"What's in this project?" → Browse folder structure, find files by pattern. Like scanning a library's catalog.
"Where is this concept?" → Search for patterns across files with context lines. Supports regex and fuzzy matching with configurable thresholds.
"Show me this specific part." → Read line ranges, tail files, get exactly what's needed.
offset: -50 reads last 50 lines like Unix tail.
"Search all of these at once." → Parallel operations reduce round-trips. One tool call can search 20 files simultaneously.
The magic happens in the progression. The LLM doesn't read everything—it narrows down iteratively, just like a human developer would.
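That progression can be sketched as a simple agent loop. Here callModel() and runTool() are hypothetical placeholders for the LLM API and the MCP tool dispatcher; the point is only that retrieval becomes an iterative conversation rather than a single lookup.
// Sketch of emergent retrieval as an agent loop. callModel() and runTool()
// are hypothetical placeholders, not the actual OODA MCP interfaces.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { toolCall?: ToolCall; answer?: string };

declare function callModel(history: string[]): Promise<ModelTurn>; // hypothetical LLM call
declare function runTool(call: ToolCall): Promise<string>;         // hypothetical tool dispatch

async function emergentRetrieve(question: string): Promise<string> {
  const history = [question];
  for (let hop = 0; hop < 10; hop++) {                  // cap the number of tool round-trips
    const turn = await callModel(history);
    if (turn.answer) return turn.answer;                // model decided it has enough context
    const observation = await runTool(turn.toolCall!);  // e.g. search_files, read_file_lines
    history.push(JSON.stringify(turn.toolCall), observation); // feed results back in
  }
  return "tool budget exhausted";
}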
The Secret Sauce: Fuzzy Matching
Vector embeddings capture "semantic similarity" through high-dimensional geometry. But you can get 80% of the benefit with simple Levenshtein distance.
batch_search_in_files({
searches: [
{ path: "src/auth/session.ts", pattern: "autentication" }
],
isFuzzy: true,
fuzzyThreshold: 0.7, // 70% similarity
contextLines: 3
})
// Returns matches for "authentication" despite typo
// similarity: 0.85 → "authentication"
This is critical for LLM tolerance. The model might misspell, use synonyms, or guess at naming conventions. Fuzzy matching catches these without the overhead of embedding computation.
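To make "70% similarity" concrete, here is one way a threshold like fuzzyThreshold: 0.7 can be interpreted: normalized Levenshtein similarity between the query term and each candidate token. This is an illustrative sketch, not the server's actual scoring function.
// Normalized Levenshtein similarity: 1 - (edit distance / longer length).
// Illustrative only; the real matcher may score matches differently.
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                  // deletion
        dp[i][j - 1] + 1,                                  // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length) || 1;
  return 1 - levenshtein(a.toLowerCase(), b.toLowerCase()) / maxLen;
}

similarity("autentication", "authentication"); // ≈ 0.93 with this formula, comfortably above 0.7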
"autentication" "authentication" When to Use Which Approach
Emergent RAG isn't a replacement—it's an alternative for specific use cases.
| Factor | Vector RAG | File System RAG |
|---|---|---|
| Setup Cost | High (embeddings, vector DB, chunking) | Zero (files already exist) |
| Query Cost | Low (one embedding + vector search) | Higher (multiple tool calls) |
| Reasoning | None (geometric similarity only) | Full LLM reasoning at retrieval time |
| Data Freshness | Stale (must re-embed on changes) | Always current (reads live files) |
| Best For | Large static corpora, semantic search | Codebases, live documents, exploration |
| Structure Awareness | Lost (flat chunks) | Preserved (folder hierarchy, file names) |
The Real Insight: Context Is Navigation
Traditional RAG treats retrieval as a lookup problem: given a query, return the most similar chunks. But when you give an LLM file system tools, it treats retrieval as a navigation problem: given a goal, explore until you find what you need.
This is closer to how humans actually work. When I need to understand authentication in a codebase, I don't compute embeddings—I:
- Look at the folder structure (list_directory)
- Find likely candidates (search_files("*auth*"))
- Search for key terms (search_in_file("jwt"))
- Read the relevant sections (read_file_lines)
The LLM does exactly this—but faster, in parallel, and with perfect recall of everything it's seen in the conversation.
Implementation: The Tool Stack
1. Discovery Layer
list_directory + search_files
// Broad discovery
search_files({
directory: "src/",
pattern: "*.auth.ts",
recursive: true,
maxResults: 100
})
// Returns: ["src/auth/session.auth.ts", "src/middleware/jwt.auth.ts", ...] 2. Location Layer
search_in_file + batch_search_in_files
// Find specific patterns with context
search_in_file({
path: "src/auth/session.ts",
pattern: "validateToken",
contextLines: 5, // 5 lines before and after
maxMatches: 10
})
// Returns matches with surrounding context
// Line 47: "export function validateToken(token: string) {" 3. Extraction Layer
read_file_lines with surgical precision
// Read just the function, not the whole file
read_file_lines({
path: "src/auth/session.ts",
startLine: 47,
endLine: 72,
includeLineNumbers: true
})
// Or read the last 50 lines of a log file
read_file_lines({
path: "logs/auth.log",
offset: -50 // Negative = from end, like tail
})

The Emergent Property
When you give an LLM these tools, something interesting emerges: it teaches itself retrieval strategies. Not through training, but through in-context reasoning about what it needs.
Ask it to find authentication code and it will search for auth, login,
session, jwt, and token—without being told that these
concepts are related. The semantic knowledge is already in the model; you just
need to give it tools to act on that knowledge.
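In practice that expansion shows up as a single batched call. The sketch below reuses the call shape from the fuzzy-matching example above; the specific files and terms are just illustrative of what the model might choose.
// The "query expansion" terms come from the model's own reasoning, not from config
batch_search_in_files({
  searches: [
    { path: "src/auth/session.ts", pattern: "login" },
    { path: "src/auth/session.ts", pattern: "token" },
    { path: "src/middleware/jwt.auth.ts", pattern: "session" },
    { path: "src/middleware/jwt.auth.ts", pattern: "jwt" }
  ],
  isFuzzy: true,
  fuzzyThreshold: 0.7,
  contextLines: 3
})
// One round-trip, several synonym probes across the files found during discovery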
The Punchline
Vector RAG precomputes similarity and throws away reasoning.
File System RAG computes similarity through reasoning.
The LLM is the embedding model.
Is this always better than vector RAG? No. Is it simpler, cheaper, and more transparent for many use cases? Absolutely. And for codebases—where structure, naming conventions, and conceptual organization matter—it's often superior.