Architecture
Layer diagram
```mermaid
graph TB
    subgraph Consumers
        CLI["CLI (Typer)"]
        MCP["MCP Server (FastMCP)"]
        REST["REST API (FastAPI)"]
    end
    subgraph Core
        Engine["LocalLens Engine<br/>(Python Library)"]
    end
    subgraph Acceleration["Rust Extensions (optional)"]
        RustBM25["BM25 Index"]
        RustChunker["Chunker (27 langs)"]
        RustWalker["File Walker"]
        RustWatcher["File Watcher"]
    end
    subgraph Storage
        Edge["Qdrant Edge<br/>(Embedded)"]
        Server["Qdrant Server<br/>(Docker)"]
    end
    subgraph External
        Ollama["Ollama<br/>(Local LLM)"]
        ST["Sentence Transformers<br/>(Embeddings)"]
    end
    Dashboard["Web Dashboard<br/>(React + Vite)"] --> REST
    CLI --> Engine
    MCP --> Engine
    REST --> Engine
    Engine --> Edge
    Engine --> ST
    Engine --> Ollama
    Engine -.->|optional| RustBM25
    Engine -.->|optional| RustChunker
    Engine -.->|optional| RustWalker
    Engine -.->|optional| RustWatcher
    REST -.-> Server
```

The LocalLens Python library (`locallens.engine.LocalLens`) is the center of everything. The CLI, MCP server, and REST API are all thin wrappers that call the same engine methods.
Pipeline
Every file goes through the same 5-step pipeline:
- Extract: Pull text from the file using the appropriate extractor (PDF, DOCX, code, etc.)
- Chunk: Split text into ~500-character chunks with a 50-character overlap. Chunking is structure-aware: it respects heading boundaries in Markdown and function boundaries in code
- Embed: Convert each chunk to a 384-dimensional vector using `all-MiniLM-L6-v2` (sentence-transformers)
- Store: Upsert into Qdrant with deterministic point IDs (`uuid5` of `file_path:chunk_index`). Payload includes `file_path`, `file_name`, `file_type`, `chunk_text`, `chunk_index`, `file_hash`, `indexed_at`
- Search / RAG: Query the vector store by semantic similarity, BM25 keywords, or hybrid (RRF fusion). For RAG, retrieved chunks become context for Ollama
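
The chunking step can be illustrated with a minimal sketch. It assumes plain fixed-size character windows and ignores the structure-aware behaviour (heading and function boundaries) described above; the function name `chunk_text` and its parameters are hypothetical and are not part of the LocalLens API.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`.

    Hypothetical helper for illustration only; the real chunker is
    structure-aware and may behave differently.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")

    step = size - overlap  # each new window starts `step` characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(1000))
chunks = chunk_text(sample)
# The tail of one chunk is repeated at the head of the next one.
shared = chunks[0][-50:] == chunks[1][:50]
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some text twice.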
Two vector DB modes
Qdrant Edge (embedded)
Used by the Python library and CLI. No Docker, no server process.
- Storage: `~/.locallens/qdrant_data`
- SDK: `qdrant-edge-py` (`EdgeShard`)
- Features: named vectors, keyword payload indexes, filtered search, facets
Qdrant Server (Docker)
Used by the web dashboard's FastAPI backend.
- Storage: Docker volume at `./data/qdrant`
- SDK: `qdrant-client` over HTTP (port 6333)
- Start: `docker compose up -d qdrant`
Both modes use the same schema, so data can sync between them via `locallens sync push/pull`.
Hybrid search
LocalLens supports three search modes:
| Mode | How it works |
|---|---|
| `semantic` | Cosine similarity between query embedding and stored chunk embeddings |
| `keyword` | BM25 scoring over tokenized chunk text |
| `hybrid` | Both semantic and BM25 results combined via Reciprocal Rank Fusion (RRF, k=60) |
Hybrid is the default and generally gives the best results — semantic similarity catches meaning while BM25 catches exact terms.
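
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion with k=60. It assumes each search mode returns an ordered list of chunk IDs, best first; the function name `rrf_fuse` and the sample IDs are hypothetical, and the actual LocalLens implementation may differ in details such as tie-breaking.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. Hypothetical helper for illustration only.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic_hits = ["a", "b", "c"]  # ordered by cosine similarity
keyword_hits = ["b", "d", "a"]   # ordered by BM25 score
fused = rrf_fuse([semantic_hits, keyword_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF works on ranks and not raw scores, cosine similarities and BM25 scores never need to be normalised against each other. Chunks that appear in both lists (`a` and `b` here) end up above chunks found by only one mode.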
Optional components
| Component | Purpose | Required? |
|---|---|---|
| Ollama | Local LLM for RAG (ask command) | Only for ask |
| Moonshine | Speech-to-text (voice input) | Only for voice |
| Piper TTS | Text-to-speech (voice output) | Only for voice |
| Docker | Qdrant Server for web dashboard | Only for web dashboard |
Shared schema
Both Qdrant Edge and Qdrant Server use an identical schema:
- Collection name: `locallens`
- Named vector key: `"text"`
- Vector dimensions: 384 (cosine distance)
- Payload fields: `file_path`, `file_name`, `file_type`, `chunk_text`, `chunk_index`, `file_hash`, `indexed_at`
- Keyword indexes on: `file_hash`, `file_path`, `file_type`
- Point IDs: `uuid5(namespace, f"{abs_file_path}:{chunk_index}")`
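
The point ID scheme can be sketched with the standard library alone. The namespace LocalLens actually uses is not stated here, so `uuid.NAMESPACE_URL` below is a placeholder assumption, and the helper name `point_id` is hypothetical.

```python
import uuid
from pathlib import Path

# Assumption: placeholder namespace; the real one is not documented here.
NAMESPACE = uuid.NAMESPACE_URL


def point_id(file_path: str, chunk_index: int) -> str:
    """Derive a deterministic UUIDv5 from an absolute path and a chunk index."""
    abs_file_path = Path(file_path).resolve()
    return str(uuid.uuid5(NAMESPACE, f"{abs_file_path}:{chunk_index}"))


first = point_id("notes.md", 0)
again = point_id("notes.md", 0)   # same inputs give the same ID
second = point_id("notes.md", 1)  # a different chunk gives a different ID
```

Since the same file path and chunk index always map to the same ID, re-indexing a file upserts over its existing points and does not create duplicates.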
