Retrieval & reranking

When you start a new AI session, XTrace retrieves the most relevant beliefs, episodes, and artifacts from your memory — and assembles them into a context payload that the AI can use. This is what eliminates cold starts.

Retrieval pipeline

Query → Gate → Embedding search → Rerank → Episode expansion → Context assembly

1. Classifier gate

An optional first stage that determines whether a query needs memory at all. Simple greetings or off-topic messages skip retrieval entirely, saving latency and cost.
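A minimal sketch of what such a gate could look like. A real gate would typically be a small classifier model; here a keyword heuristic stands in for it, and the greeting list and function name are illustrative, not part of XTrace's API.

```python
# Hypothetical gate: a cheap check deciding whether a query warrants
# memory retrieval at all. A trained classifier would replace this heuristic.
GREETINGS = {"hi", "hello", "hey", "thanks", "thank you", "bye"}

def needs_memory(query: str) -> bool:
    normalized = query.strip().lower().rstrip("!.?")
    # Bare greetings and trivially short messages skip retrieval entirely.
    if normalized in GREETINGS or len(normalized.split()) < 2:
        return False
    return True

needs_memory("hello!")                       # greeting: skip retrieval
needs_memory("what did we decide about auth?")  # real query: retrieve
```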

2. Embedding search

The query is embedded and matched against stored facts using vector similarity. Only ACTIVE facts are searched; superseded and retracted beliefs are excluded automatically.

Results come back with a search_score that reflects semantic similarity to the query.
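The stage can be sketched as follows, with toy 3-d vectors in place of real embeddings. The fact data and statuses are invented for illustration; only the ACTIVE-only filter and the search_score field mirror the description above.

```python
import math

# Toy fact store: real embeddings would be high-dimensional model outputs.
facts = [
    {"text": "User prefers PostgreSQL", "status": "ACTIVE",     "vec": [0.9, 0.1, 0.0]},
    {"text": "User prefers MySQL",      "status": "SUPERSEDED", "vec": [0.8, 0.2, 0.0]},
    {"text": "User likes dark mode",    "status": "ACTIVE",     "vec": [0.0, 0.2, 0.9]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def embedding_search(query_vec, facts):
    # Only ACTIVE facts are candidates; superseded/retracted beliefs are skipped.
    hits = [
        {**f, "search_score": cosine(query_vec, f["vec"])}
        for f in facts if f["status"] == "ACTIVE"
    ]
    return sorted(hits, key=lambda h: h["search_score"], reverse=True)

results = embedding_search([1.0, 0.0, 0.0], facts)
```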

3. Reranking

Raw embedding scores are noisy. The reranker takes the top candidates (widened by a rerank_multiplier) and re-scores them against the original query using a cross-encoder model.

XTrace supports multiple reranker backends:

| Backend | Model | Notes |
| --- | --- | --- |
| Local | BAAI/bge-reranker-v2-m3 | Default, runs in-process |
| Cohere | Cohere Rerank API | External service |

After reranking, only the top top_k_facts candidates survive.
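The widen-then-cut flow can be sketched as below. A real cross-encoder (such as the BAAI/bge-reranker-v2-m3 default) scores each query–candidate pair jointly; a token-overlap score stands in for it here, and the function names are illustrative.

```python
def cross_encoder_score(query: str, candidate: str) -> float:
    # Stand-in for the model: fraction of query tokens present in the candidate.
    q_tokens = set(query.lower().split())
    return len(q_tokens & set(candidate.lower().split())) / max(len(q_tokens), 1)

def retrieve_and_rerank(query, candidates, top_k_facts=2, rerank_multiplier=3):
    # Widen the candidate pool, re-score each pair, keep only the top k.
    pool = candidates[: top_k_facts * rerank_multiplier]
    reranked = sorted(pool, key=lambda c: cross_encoder_score(query, c), reverse=True)
    return reranked[:top_k_facts]

candidates = [
    "docker compose file for the deploy service",
    "favorite lunch spot is the taco truck",
    "deploy runs through the docker pipeline",
]
top = retrieve_and_rerank("docker deploy", candidates, top_k_facts=2)
```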

4. Iterative retrieval

For complex queries, a single retrieval pass may not be enough. The answer agent supports multi-iteration retrieval:

  1. Retrieve and rerank a first batch of facts
  2. An LLM evaluates sufficiency — does the retrieved context answer the query?
  3. If not, the LLM decomposes the query into sub-queries
  4. Each sub-query triggers another retrieval + rerank pass
  5. Results are accumulated and cross-reranked at the end

This is controlled by max_iterations and sufficiency_threshold.
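The loop above can be sketched as follows. The LLM sufficiency judge and query decomposition are stubbed out as plain callables; max_iterations and sufficiency_threshold follow the text, while everything else is an illustrative assumption.

```python
def iterative_retrieve(query, retrieve, is_sufficient, decompose,
                       max_iterations=3, sufficiency_threshold=0.7):
    collected = []
    queries = [query]
    for _ in range(max_iterations):
        for q in queries:
            collected.extend(retrieve(q))        # retrieve + rerank pass per query
        score = is_sufficient(query, collected)  # LLM judges coverage in [0, 1]
        if score >= sufficiency_threshold:
            break
        queries = decompose(query, collected)    # LLM splits query into sub-queries
    return collected  # in the real pipeline, cross-reranked at the end

# Stubbed usage: "sufficient" once three facts are collected.
facts = iterative_retrieve(
    "project status",
    retrieve=lambda q: [f"fact:{q}"],
    is_sufficient=lambda query, collected: 1.0 if len(collected) >= 3 else 0.0,
    decompose=lambda query, collected: ["status:frontend", "status:backend"],
)
```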

5. Episode co-occurrence expansion

After the core retrieval, the system expands context by finding sibling facts — other beliefs extracted from the same episodes as the retrieved facts. If multiple retrieved facts share an episode, the other facts from that episode are likely relevant too.

Sibling scores are discounted by score_propagation_alpha to rank below directly-matched facts.
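A sketch of the expansion step, assuming each fact records the episode it was extracted from. The fact data and field names are invented; only the discount by score_propagation_alpha comes from the text above.

```python
def expand_siblings(retrieved, all_facts, score_propagation_alpha=0.5):
    retrieved_ids = {f["id"] for f in retrieved}
    # Best directly-matched score per episode drives the sibling discount.
    episode_best = {}
    for f in retrieved:
        episode_best[f["episode"]] = max(episode_best.get(f["episode"], 0.0), f["score"])
    siblings = []
    for f in all_facts:
        if f["id"] not in retrieved_ids and f["episode"] in episode_best:
            # Siblings inherit a discounted score, ranking below direct matches.
            siblings.append({**f, "score": episode_best[f["episode"]] * score_propagation_alpha})
    return siblings

all_facts = [
    {"id": 1, "episode": "e1", "text": "team chose Postgres"},
    {"id": 2, "episode": "e1", "text": "migration planned for May"},
    {"id": 3, "episode": "e2", "text": "office moved floors"},
]
retrieved = [{"id": 1, "episode": "e1", "score": 0.9, "text": "team chose Postgres"}]
siblings = expand_siblings(retrieved, all_facts)
```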

Context assembly

Retrieved facts, episodes, and artifacts are assembled into a structured context string that fits within a character budget (default 16,000 characters).

Assembly strategy

  1. Episode groups first: Facts are grouped by episode. Each group includes the episode header, associated artifacts, and fact bullets — ranked by match count and score
  2. Residual episodes: Episodes without matched facts but still relevant
  3. Greedy fill: Content is added until the character budget is exhausted
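The greedy fill can be sketched as below, assuming content arrives as pre-ranked text blocks (episode groups first, then residual episodes). The default budget matches the configuration table; the function itself is an illustration, not XTrace's implementation.

```python
def assemble_context(blocks, default_char_budget=16000):
    parts, used = [], 0
    for block in blocks:           # blocks come pre-ranked; fill greedily
        cost = len(block) + 1      # +1 for the joining newline
        if used + cost > default_char_budget:
            break                  # budget exhausted: stop adding content
        parts.append(block)
        used += cost
    return "\n".join(parts)

context = assemble_context(["a" * 10, "b" * 10, "c" * 10], default_char_budget=25)
```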

Optional LLM filtering

Three post-retrieval LLM passes can refine the assembled context:

| Pass | What it does |
| --- | --- |
| Context selection | Pre-assembly filter: removes irrelevant facts, artifacts, and episodes before assembly |
| Fact selection | Post-assembly: re-evaluates and trims facts in the assembled context |
| Context cleanup | Post-assembly: rewrites the assembled string for clarity and relevance |

These are optional and configurable. For most use cases, the default assembly without LLM filtering is sufficient.

Configuration

| Parameter | Default | What it controls |
| --- | --- | --- |
| top_k_facts | | Maximum facts returned |
| top_k_episodes | | Maximum episodes returned |
| enable_reranking | true | Whether to rerank after embedding search |
| rerank_multiplier | | How many extra candidates to fetch for reranking |
| max_iterations | 1 | Iterative retrieval depth |
| min_fact_score | | Minimum score threshold for facts |
| default_char_budget | 16000 | Character limit for assembled context |
| max_episodes | | Episode cap in assembly |
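The parameters above could be gathered into a single configuration object, sketched here. Field names and documented defaults follow the table; values the table leaves unspecified are marked None rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RetrievalConfig:
    # None marks defaults the table leaves unspecified; set them explicitly.
    top_k_facts: Optional[int] = None
    top_k_episodes: Optional[int] = None
    enable_reranking: bool = True
    rerank_multiplier: Optional[int] = None
    max_iterations: int = 1
    min_fact_score: Optional[float] = None
    default_char_budget: int = 16000
    max_episodes: Optional[int] = None

cfg = RetrievalConfig(top_k_facts=10, rerank_multiplier=4)
```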