---

title: Retrieval & reranking
description: >-
  How XTrace finds relevant memories and assembles them into context for AI
  sessions.
---

# Retrieval & reranking

When you start a new AI session, XTrace retrieves the most relevant beliefs, episodes, and artifacts from your memory — and assembles them into a context payload that the AI can use. This is what eliminates cold starts.

## Retrieval pipeline

```
Query → Gate → Embedding search → Rerank → Episode expansion → Context assembly
```

### 1. Classifier gate

An optional first stage that determines whether a query needs memory at all. Simple greetings or off-topic messages skip retrieval entirely, saving latency and cost.
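The gate can be as simple as a rule-based check in front of the retrieval call. A minimal sketch (the pattern list and function name here are hypothetical, not part of the XTrace API):

```python
# Hypothetical classifier gate: skip retrieval for queries that
# clearly don't need memory (greetings, acknowledgements).
SKIP_PATTERNS = {"hi", "hello", "thanks", "ok", "bye"}

def needs_memory(query: str) -> bool:
    """Return False for trivial queries so retrieval is skipped entirely."""
    normalized = query.strip().lower().rstrip("!?.")
    return normalized not in SKIP_PATTERNS
```

In practice this stage may be a small classifier model rather than a pattern list, but the contract is the same: a cheap yes/no decision before any embedding work happens.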

### 2. Embedding search

The query is embedded and matched against stored facts using vector similarity. Only `ACTIVE` facts are searched — superseded and retracted beliefs are excluded automatically.

Results come back with a `search_score` that reflects semantic similarity to the query.
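Conceptually, this stage is cosine similarity over `ACTIVE` facts only. A self-contained sketch (field names like `status` and `embedding` are illustrative, not the actual storage schema):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def embedding_search(query_vec, facts, top_k):
    # Only ACTIVE facts are candidates; superseded and retracted
    # beliefs are excluded before any scoring happens.
    active = [f for f in facts if f["status"] == "ACTIVE"]
    for f in active:
        f["search_score"] = cosine(query_vec, f["embedding"])
    return sorted(active, key=lambda f: f["search_score"], reverse=True)[:top_k]
```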

### 3. Reranking

Raw embedding scores are noisy. The reranker takes the top candidates (widened by a `rerank_multiplier`) and re-scores them against the original query using a cross-encoder model.

XTrace supports multiple reranker backends:

| Backend    | Model                   | Notes                    |
| ---------- | ----------------------- | ------------------------ |
| **Local**  | BAAI/bge-reranker-v2-m3 | Default, runs in-process |
| **Cohere** | Cohere Rerank API       | External service         |

After reranking, only the top `top_k_facts` survive.
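The widen-then-rerank flow can be sketched as follows, with the search and cross-encoder calls abstracted as callables (the function signatures here are illustrative, not XTrace's internal API):

```python
def retrieve_and_rerank(query, search_fn, rerank_fn, top_k_facts, rerank_multiplier):
    # Fetch a widened candidate pool from embedding search...
    candidates = search_fn(query, top_k_facts * rerank_multiplier)
    # ...then re-score each candidate against the query with a cross-encoder.
    for fact in candidates:
        fact["rerank_score"] = rerank_fn(query, fact["text"])
    candidates.sort(key=lambda f: f["rerank_score"], reverse=True)
    return candidates[:top_k_facts]
```

The key idea is that the embedding stage over-fetches (`top_k_facts * rerank_multiplier` candidates) so the slower but more accurate cross-encoder has a meaningful pool to re-order.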

### 4. Iterative retrieval

For complex queries, a single retrieval pass may not be enough. The answer agent supports multi-iteration retrieval:

1. Retrieve and rerank a first batch of facts
2. An LLM evaluates **sufficiency** — does the retrieved context answer the query?
3. If not, the LLM **decomposes** the query into sub-queries
4. Each sub-query triggers another retrieval + rerank pass
5. Results are accumulated and cross-reranked at the end

This is controlled by `max_iterations` and `sufficiency_threshold`.
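The loop above can be sketched like this, with the retrieval, sufficiency, and decomposition steps abstracted as callables (these names are illustrative; only `max_iterations` comes from the documented configuration):

```python
def iterative_retrieve(query, retrieve, is_sufficient, decompose, max_iterations=1):
    # First pass: retrieve and rerank an initial batch of facts.
    facts = retrieve(query)
    for _ in range(max_iterations - 1):
        # LLM sufficiency check: does the context answer the query?
        if is_sufficient(query, facts):
            break
        # If not, decompose into sub-queries and retrieve for each.
        for sub_query in decompose(query, facts):
            facts.extend(retrieve(sub_query))
    return facts  # accumulated results, cross-reranked downstream
```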

### 5. Episode co-occurrence expansion

After the core retrieval, the system expands context by finding **sibling facts** — other beliefs extracted from the same episodes as the retrieved facts. If multiple retrieved facts share an episode, the other facts from that episode are likely relevant too.

Sibling scores are discounted by `score_propagation_alpha` to rank below directly-matched facts.
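A sketch of the expansion step, assuming each fact carries an episode reference and a retrieval score (the dict layout is illustrative):

```python
def expand_siblings(matched_facts, all_facts, score_propagation_alpha=0.5):
    # Collect the best retrieval score seen per episode among matched facts.
    matched_ids = {f["id"] for f in matched_facts}
    episode_best = {}
    for f in matched_facts:
        ep = f["episode_id"]
        episode_best[ep] = max(episode_best.get(ep, 0.0), f["score"])
    # Sibling facts inherit a discounted score, so they rank below
    # directly matched facts.
    siblings = []
    for f in all_facts:
        if f["id"] in matched_ids:
            continue
        ep = f.get("episode_id")
        if ep in episode_best:
            siblings.append({**f, "score": episode_best[ep] * score_propagation_alpha})
    return matched_facts + siblings
```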

## Context assembly

Retrieved facts, episodes, and artifacts are assembled into a structured context string that fits within a character budget (default 16,000 characters).

### Assembly strategy

1. **Episode groups first**: Facts are grouped by episode. Each group includes the episode header, associated artifacts, and fact bullets — ranked by match count and score
2. **Residual episodes**: Episodes without matched facts but still relevant
3. **Greedy fill**: Content is added until the character budget is exhausted
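The greedy fill can be sketched as a budget-bounded concatenation over pre-ranked sections (the function name and separator choice are illustrative):

```python
def assemble_context(sections, char_budget=16_000):
    # Sections are assumed pre-ranked: episode groups first,
    # then residual episodes. Add each until the budget runs out.
    parts, used = [], 0
    for section in sections:
        cost = len(section) + (2 if parts else 0)  # "\n\n" separator
        if used + cost > char_budget:
            break
        parts.append(section)
        used += cost
    return "\n\n".join(parts)
```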

### Optional LLM filtering

Three post-retrieval LLM passes can refine the assembled context:

| Pass                  | What it does                                                                            |
| --------------------- | --------------------------------------------------------------------------------------- |
| **Context selection** | Pre-assembly filter — removes irrelevant facts, artifacts, and episodes before assembly |
| **Fact selection**    | Post-assembly — re-evaluates and trims facts in the assembled context                   |
| **Context cleanup**   | Post-assembly — rewrites the assembled string for clarity and relevance                 |

These are optional and configurable. For most use cases, the default assembly without LLM filtering is sufficient.
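How the three optional passes slot around assembly can be sketched as follows; the callable names stand in for LLM calls and are hypothetical, not XTrace's API:

```python
def build_context(facts, assemble, select_context=None, select_facts=None, cleanup=None):
    if select_context:            # pre-assembly: drop irrelevant facts
        facts = select_context(facts)
    context = assemble(facts)
    if select_facts:              # post-assembly: re-evaluate and trim facts
        context = select_facts(context)
    if cleanup:                   # post-assembly: rewrite for clarity
        context = cleanup(context)
    return context
```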

## Configuration

| Parameter             | Default | What it controls                                 |
| --------------------- | ------- | ------------------------------------------------ |
| `top_k_facts`         | —       | Maximum facts returned                           |
| `top_k_episodes`      | —       | Maximum episodes returned                        |
| `enable_reranking`    | `true`  | Whether to rerank after embedding search         |
| `rerank_multiplier`   | —       | How many extra candidates to fetch for reranking |
| `max_iterations`      | `1`     | Iterative retrieval depth                        |
| `min_fact_score`      | —       | Minimum score threshold for facts                |
| `default_char_budget` | `16000` | Character limit for assembled context            |
| `max_episodes`        | —       | Episode cap in assembly                          |
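As a worked example, a retrieval configuration might look like the following. Values marked "illustrative" are placeholders, not documented defaults; only `enable_reranking`, `max_iterations`, and `default_char_budget` have defaults stated above:

```python
# Hypothetical configuration; parameter names mirror the table above.
retrieval_config = {
    "top_k_facts": 20,              # illustrative value
    "top_k_episodes": 5,            # illustrative value
    "enable_reranking": True,       # documented default
    "rerank_multiplier": 3,         # illustrative value
    "max_iterations": 1,            # documented default
    "min_fact_score": 0.3,          # illustrative value
    "default_char_budget": 16_000,  # documented default
    "max_episodes": 10,             # illustrative value
}
```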
