Retrieval & reranking
When you start a new AI session, XTrace retrieves the most relevant beliefs, episodes, and artifacts from your memory — and assembles them into a context payload that the AI can use. This is what eliminates cold starts.
Retrieval pipeline
1. Classifier gate
An optional first stage that determines whether a query needs memory at all. Simple greetings or off-topic messages skip retrieval entirely, saving latency and cost.
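As a minimal sketch of what such a gate can look like, here is a heuristic stand-in for the classifier. The function name `needs_memory` and the `GREETINGS` set are illustrative, not XTrace's actual API; a production gate would typically be a small classifier model.

```python
# Illustrative gate: decide cheaply whether a query warrants retrieval.
GREETINGS = {"hi", "hello", "hey", "thanks", "thank you"}

def needs_memory(query: str) -> bool:
    """Return False for messages that can be answered without memory."""
    normalized = query.strip().lower().rstrip("!.?")
    if normalized in GREETINGS:
        return False
    # Very short messages rarely reference stored knowledge.
    if len(normalized.split()) < 2:
        return False
    return True
```

Skipping retrieval for `needs_memory("Hello!")`-style inputs avoids an embedding call and a rerank pass on every trivial turn.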
2. Embedding search
The query is embedded and matched against stored facts using vector similarity. Only ACTIVE facts are searched — superseded and retracted beliefs are excluded automatically.
Results come back with a search_score that reflects semantic similarity to the query.
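The core of this stage can be sketched as follows. The fact store layout, the `status` field values, and the `embedding_search` function are assumptions for illustration; only the ACTIVE-only filter and the search_score field mirror the behavior described above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical fact store: each fact carries a lifecycle status and an embedding.
facts = [
    {"id": 1, "text": "We deploy on Fridays",  "status": "SUPERSEDED", "emb": [0.9, 0.1]},
    {"id": 2, "text": "We deploy on Tuesdays", "status": "ACTIVE",     "emb": [0.9, 0.2]},
    {"id": 3, "text": "Logo is blue",          "status": "ACTIVE",     "emb": [0.1, 0.9]},
]

def embedding_search(query_emb, top_k=2):
    # Only ACTIVE facts are candidates; superseded/retracted are excluded.
    active = [f for f in facts if f["status"] == "ACTIVE"]
    for f in active:
        f["search_score"] = cosine(query_emb, f["emb"])
    return sorted(active, key=lambda f: f["search_score"], reverse=True)[:top_k]

results = embedding_search([1.0, 0.1])
```

Note that the superseded fact never enters scoring at all, so a retracted belief cannot leak into context even if it is semantically the closest match.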
3. Reranking
Raw embedding scores are noisy. The reranker takes the top candidates (widened by a rerank_multiplier) and re-scores them against the original query using a cross-encoder model.
XTrace supports multiple reranker backends:
After reranking, only the top_k_facts highest-scoring candidates survive.
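The widen-then-cut shape of this stage can be sketched as below. The `cross_encoder_score` function is a toy token-overlap stand-in for a real cross-encoder model call, and the function signatures are assumptions, not XTrace's API; the rerank_multiplier and top_k_facts parameters mirror the ones described above.

```python
def cross_encoder_score(query: str, fact: str) -> float:
    # Toy stand-in: token overlap. A real backend scores (query, fact)
    # pairs jointly with a cross-encoder model.
    q, f = set(query.lower().split()), set(fact.lower().split())
    return len(q & f) / max(len(q), 1)

def rerank(query, candidates, top_k_facts=5, rerank_multiplier=3):
    # Widen the pool: re-score more candidates than we ultimately keep.
    pool = candidates[: top_k_facts * rerank_multiplier]
    scored = [(cross_encoder_score(query, c), c) for c in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Only the top_k_facts best-scoring candidates survive.
    return [c for _, c in scored[:top_k_facts]]
```

The multiplier exists because embedding search often buries the best answer just below the cutoff; re-scoring a 3x wider pool lets the cross-encoder rescue it.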
4. Iterative retrieval
For complex queries, a single retrieval pass may not be enough. The answer agent supports multi-iteration retrieval:
- A first batch of facts is retrieved and reranked
- An LLM evaluates sufficiency: does the retrieved context answer the query?
- If not, the LLM decomposes the query into sub-queries
- Each sub-query triggers another retrieval + rerank pass
- Results are accumulated and cross-reranked at the end
This is controlled by max_iterations and sufficiency_threshold.
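The control flow above can be sketched as a loop. Everything here is an assumption for illustration: `retrieve`, `is_sufficient`, and `decompose` are hypothetical stand-ins for the reranked search, the LLM sufficiency check, and the LLM query decomposition, and only max_iterations and sufficiency_threshold correspond to the parameters named above.

```python
def iterative_retrieve(query, retrieve, is_sufficient, decompose,
                       max_iterations=3, sufficiency_threshold=0.8):
    """Retrieve in passes until the context looks sufficient or we run out."""
    facts = retrieve(query)
    for _ in range(max_iterations - 1):
        # LLM judges whether what we have already answers the query.
        if is_sufficient(query, facts) >= sufficiency_threshold:
            break
        # Otherwise decompose into sub-queries and retrieve for each.
        for sub_query in decompose(query, facts):
            facts.extend(retrieve(sub_query))
    # Deduplicate while preserving order; a final cross-rerank would follow.
    seen, unique = set(), []
    for f in facts:
        if f not in seen:
            seen.add(f)
            unique.append(f)
    return unique
```

The threshold trades latency for recall: a high sufficiency_threshold triggers more sub-query passes on multi-hop questions, while max_iterations caps the worst case.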
5. Episode co-occurrence expansion
After the core retrieval, the system expands context by finding sibling facts — other beliefs extracted from the same episodes as the retrieved facts. If multiple retrieved facts share an episode, the other facts from that episode are likely relevant too.
Sibling scores are discounted by score_propagation_alpha to rank below directly-matched facts.
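The expansion can be sketched like this. The data shapes and the `expand_siblings` function are assumptions for illustration; the discount by score_propagation_alpha mirrors the behavior described above.

```python
def expand_siblings(retrieved, episode_facts, score_propagation_alpha=0.5):
    """retrieved: {fact_id: score}; episode_facts: {episode_id: [fact_ids]}."""
    expanded = dict(retrieved)
    for episode, members in episode_facts.items():
        hits = [f for f in members if f in retrieved]
        if not hits:
            continue  # no retrieved fact came from this episode
        best = max(retrieved[f] for f in hits)
        for sibling in members:
            if sibling not in expanded:
                # Discount so siblings rank below directly-matched facts.
                expanded[sibling] = best * score_propagation_alpha
    return expanded
```

With alpha below 1.0, a sibling can never outrank the directly-matched fact whose episode pulled it in, which keeps the expansion from drowning out the actual query matches.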
Context assembly
Retrieved facts, episodes, and artifacts are assembled into a structured context string that fits within a character budget (default 16,000 characters).
Assembly strategy
- Episode groups first: Facts are grouped by episode. Each group includes the episode header, associated artifacts, and fact bullets — ranked by match count and score
- Residual episodes: Episodes without matched facts but still relevant
- Greedy fill: Content is added until the character budget is exhausted
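The greedy fill step can be sketched as follows, assuming pre-ranked content blocks; the `assemble_context` function and the `(score, text)` tuple shape are illustrative, while the 16,000-character default matches the budget described above.

```python
def assemble_context(blocks, char_budget=16_000):
    """blocks: list of (score, text) pairs; highest-scoring content wins."""
    parts, used = [], 0
    for _score, text in sorted(blocks, key=lambda b: b[0], reverse=True):
        if used + len(text) + 1 > char_budget:
            continue  # this block would overflow; try smaller ones
        parts.append(text)
        used += len(text) + 1  # +1 for the joining newline
    return "\n".join(parts)
```

Skipping oversized blocks rather than stopping outright lets a short, lower-ranked fact still use budget that a long episode group could not fit into.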
Optional LLM filtering
Three post-retrieval LLM passes can refine the assembled context:
These are optional and configurable. For most use cases, the default assembly without LLM filtering is sufficient.