Retrieval & reranking

When you start a new AI session, XTrace retrieves the most relevant beliefs, episodes, and artifacts from your memory — and assembles them into a context payload that the AI can use. This is what eliminates cold starts.

Retrieval pipeline

Query → Gate → Embedding search → Rerank → Episode expansion → Context assembly

1. Classifier gate

An optional first stage that determines whether a query needs memory at all. Simple greetings or off-topic messages skip retrieval entirely, saving latency and cost.
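A minimal sketch of what such a gate could look like. A real gate would typically be a small classifier model; here a keyword heuristic stands in for it, and the greeting list and function name are illustrative, not part of XTrace's API.

```python
# Hypothetical gate: a cheap check deciding whether a query warrants
# memory retrieval at all. A trained classifier would replace this heuristic.
GREETINGS = {"hi", "hello", "hey", "thanks", "thank you", "bye"}

def needs_memory(query: str) -> bool:
    normalized = query.strip().lower().rstrip("!.?")
    # Bare greetings and trivially short messages skip retrieval entirely.
    if normalized in GREETINGS or len(normalized.split()) < 2:
        return False
    return True

needs_memory("hello!")                       # greeting: skip retrieval
needs_memory("what did we decide about auth?")  # real query: retrieve
```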

2. Embedding search

The query is embedded and matched against stored facts using vector similarity. Only ACTIVE facts are searched; superseded and retracted beliefs are excluded automatically.

Results come back with a search_score that reflects semantic similarity to the query.
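The stage can be sketched as follows, with toy 3-d vectors in place of real embeddings. The fact data and statuses are invented for illustration; only the ACTIVE-only filter and the search_score field mirror the description above.

```python
import math

# Toy fact store: real embeddings would be high-dimensional model outputs.
facts = [
    {"text": "User prefers PostgreSQL", "status": "ACTIVE",     "vec": [0.9, 0.1, 0.0]},
    {"text": "User prefers MySQL",      "status": "SUPERSEDED", "vec": [0.8, 0.2, 0.0]},
    {"text": "User likes dark mode",    "status": "ACTIVE",     "vec": [0.0, 0.2, 0.9]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def embedding_search(query_vec, facts):
    # Only ACTIVE facts are candidates; superseded/retracted beliefs are skipped.
    hits = [
        {**f, "search_score": cosine(query_vec, f["vec"])}
        for f in facts if f["status"] == "ACTIVE"
    ]
    return sorted(hits, key=lambda h: h["search_score"], reverse=True)

results = embedding_search([1.0, 0.0, 0.0], facts)
```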

3. Reranking

Raw embedding scores are noisy. The reranker takes the top candidates (widened by a rerank_multiplier) and re-scores them against the original query using a cross-encoder model.

XTrace supports multiple reranker backends:

| Backend | Model | Notes |
| --- | --- | --- |
| Local | BAAI/bge-reranker-v2-m3 | Default, runs in-process |
| Cohere | Cohere Rerank API | External service |

After reranking, only the top top_k_facts candidates survive.
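The widen-then-cut flow can be sketched as below. A real cross-encoder (such as the BAAI/bge-reranker-v2-m3 default) scores each query–candidate pair jointly; a token-overlap score stands in for it here, and the function names are illustrative.

```python
def cross_encoder_score(query: str, candidate: str) -> float:
    # Stand-in for the model: fraction of query tokens present in the candidate.
    q_tokens = set(query.lower().split())
    return len(q_tokens & set(candidate.lower().split())) / max(len(q_tokens), 1)

def retrieve_and_rerank(query, candidates, top_k_facts=2, rerank_multiplier=3):
    # Widen the candidate pool, re-score each pair, keep only the top k.
    pool = candidates[: top_k_facts * rerank_multiplier]
    reranked = sorted(pool, key=lambda c: cross_encoder_score(query, c), reverse=True)
    return reranked[:top_k_facts]

candidates = [
    "docker compose file for the deploy service",
    "favorite lunch spot is the taco truck",
    "deploy runs through the docker pipeline",
]
top = retrieve_and_rerank("docker deploy", candidates, top_k_facts=2)
```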

4. Iterative retrieval

For complex queries, a single retrieval pass may not be enough. The answer agent supports multi-iteration retrieval:

  1. Retrieve and rerank a first batch of facts
  2. An LLM evaluates sufficiency — does the retrieved context answer the query?
  3. If not, the LLM decomposes the query into sub-queries
  4. Each sub-query triggers another retrieval + rerank pass
  5. Results are accumulated and cross-reranked at the end

This is controlled by max_iterations and sufficiency_threshold.
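The loop above can be sketched as follows. The LLM sufficiency judge and query decomposition are stubbed out as plain callables; max_iterations and sufficiency_threshold follow the text, while everything else is an illustrative assumption.

```python
def iterative_retrieve(query, retrieve, is_sufficient, decompose,
                       max_iterations=3, sufficiency_threshold=0.7):
    collected = []
    queries = [query]
    for _ in range(max_iterations):
        for q in queries:
            collected.extend(retrieve(q))        # retrieve + rerank pass per query
        score = is_sufficient(query, collected)  # LLM judges coverage in [0, 1]
        if score >= sufficiency_threshold:
            break
        queries = decompose(query, collected)    # LLM splits query into sub-queries
    return collected  # in the real pipeline, cross-reranked at the end

# Stubbed usage: "sufficient" once three facts are collected.
facts = iterative_retrieve(
    "project status",
    retrieve=lambda q: [f"fact:{q}"],
    is_sufficient=lambda query, collected: 1.0 if len(collected) >= 3 else 0.0,
    decompose=lambda query, collected: ["status:frontend", "status:backend"],
)
```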

5. Episode co-occurrence expansion

After the core retrieval, the system expands context by finding sibling facts — other beliefs extracted from the same episodes as the retrieved facts. If multiple retrieved facts share an episode, the other facts from that episode are likely relevant too.

Sibling scores are discounted by score_propagation_alpha to rank below directly-matched facts.
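A sketch of the expansion step, assuming each fact records the episode it was extracted from. The fact data and field names are invented; only the discount by score_propagation_alpha comes from the text above.

```python
def expand_siblings(retrieved, all_facts, score_propagation_alpha=0.5):
    retrieved_ids = {f["id"] for f in retrieved}
    # Best directly-matched score per episode drives the sibling discount.
    episode_best = {}
    for f in retrieved:
        episode_best[f["episode"]] = max(episode_best.get(f["episode"], 0.0), f["score"])
    siblings = []
    for f in all_facts:
        if f["id"] not in retrieved_ids and f["episode"] in episode_best:
            # Siblings inherit a discounted score, ranking below direct matches.
            siblings.append({**f, "score": episode_best[f["episode"]] * score_propagation_alpha})
    return siblings

all_facts = [
    {"id": 1, "episode": "e1", "text": "team chose Postgres"},
    {"id": 2, "episode": "e1", "text": "migration planned for May"},
    {"id": 3, "episode": "e2", "text": "office moved floors"},
]
retrieved = [{"id": 1, "episode": "e1", "score": 0.9, "text": "team chose Postgres"}]
siblings = expand_siblings(retrieved, all_facts)
```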

Context assembly

Retrieved facts, episodes, and artifacts are assembled into a structured context string that fits within a character budget (default 16,000 characters).

Assembly strategy

  1. Episode groups first: Facts are grouped by episode. Each group includes the episode header, associated artifacts, and fact bullets — ranked by match count and score
  2. Residual episodes: Episodes without matched facts but still relevant
  3. Greedy fill: Content is added until the character budget is exhausted
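The greedy fill can be sketched as below, assuming content arrives as pre-ranked text blocks (episode groups first, then residual episodes). The default budget matches the configuration table; the function itself is an illustration, not XTrace's implementation.

```python
def assemble_context(blocks, default_char_budget=16000):
    parts, used = [], 0
    for block in blocks:           # blocks come pre-ranked; fill greedily
        cost = len(block) + 1      # +1 for the joining newline
        if used + cost > default_char_budget:
            break                  # budget exhausted: stop adding content
        parts.append(block)
        used += cost
    return "\n".join(parts)

context = assemble_context(["a" * 10, "b" * 10, "c" * 10], default_char_budget=25)
```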

Optional LLM filtering

Three post-retrieval LLM passes can refine the assembled context:

| Pass | What it does |
| --- | --- |
| Context selection | Pre-assembly filter: removes irrelevant facts, artifacts, and episodes before assembly |
| Fact selection | Post-assembly: re-evaluates and trims facts in the assembled context |
| Context cleanup | Post-assembly: rewrites the assembled string for clarity and relevance |

These are optional and configurable. For most use cases, the default assembly without LLM filtering is sufficient.

Configuration

| Parameter | Default | What it controls |
| --- | --- | --- |
| top_k_facts | | Maximum facts returned |
| top_k_episodes | | Maximum episodes returned |
| enable_reranking | true | Whether to rerank after embedding search |
| rerank_multiplier | | How many extra candidates to fetch for reranking |
| max_iterations | 1 | Iterative retrieval depth |
| min_fact_score | | Minimum score threshold for facts |
| default_char_budget | 16000 | Character limit for assembled context |
| max_episodes | | Episode cap in assembly |
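The parameters above could be gathered into a single configuration object, sketched here. Field names and documented defaults follow the table; values the table leaves unspecified are marked None rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RetrievalConfig:
    # None marks defaults the table leaves unspecified; set them explicitly.
    top_k_facts: Optional[int] = None
    top_k_episodes: Optional[int] = None
    enable_reranking: bool = True
    rerank_multiplier: Optional[int] = None
    max_iterations: int = 1
    min_fact_score: Optional[float] = None
    default_char_budget: int = 16000
    max_episodes: Optional[int] = None

cfg = RetrievalConfig(top_k_facts=10, rerank_multiplier=4)
```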