---

title: Consolidation
description: >-
  How XTrace merges overlapping beliefs and deduplicates knowledge
  automatically.
---

# Consolidation

Over time, a memory store accumulates overlapping and redundant beliefs. You might say "I prefer TypeScript" in three different conversations. The extraction pipeline captures each instance — consolidation merges them.

## Why consolidation matters

Without consolidation, retrieval returns near-duplicates that waste context budget and confuse downstream models. Consolidation keeps the knowledge base tight: one authoritative belief per concept, with lineage back to the originals.

## How it works

### Cluster-based consolidation

After a batch of new facts is ingested, consolidation runs as a second phase:

1. **Seed selection**: New fact IDs from the current ingestion are the seeds
2. **Neighbor expansion**: For each seed, find similar existing facts via embedding similarity
3. **Clustering**: A union-find algorithm groups facts with overlapping neighbors into clusters (using token Jaccard similarity and embedding scores)
4. **LLM resolution**: Each multi-fact cluster is sent to an LLM that decides the outcome:

| Outcome       | What happens                                                                                                                                                    |
| ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Supersede** | One fact absorbs the others. The survivor may be reworded to capture nuance from the group. Superseded facts get `status: SUPERSEDED` with `replaced_by` links. |
| **Keep all**  | The facts are genuinely distinct (e.g., different scopes). They're marked as consolidated so they won't be re-evaluated.                                        |

5. **Marking**: All processed facts get a `consolidated_at` timestamp so they're skipped in future sweeps
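The clustering in steps 1–3 can be sketched with a union-find over pairwise token Jaccard scores. This is a minimal illustration, not XTrace's implementation: the function names are hypothetical, and the real pipeline also factors in embedding similarity before handing multi-fact clusters to the LLM.

```python
def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


class UnionFind:
    """Groups fact IDs into clusters by merging overlapping pairs."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_facts(facts: dict[str, str], threshold: float = 0.6) -> list[set[str]]:
    """Cluster fact IDs whose pairwise token Jaccard clears the threshold."""
    uf = UnionFind()
    ids = list(facts)
    for i, a in enumerate(ids):
        uf.find(a)  # register singletons so lone facts still form clusters
        for b in ids[i + 1:]:
            if token_jaccard(facts[a], facts[b]) >= threshold:
                uf.union(a, b)
    clusters: dict[str, set[str]] = {}
    for fid in ids:
        clusters.setdefault(uf.find(fid), set()).add(fid)
    return list(clusters.values())
```

Each multi-member cluster would then be passed to the LLM resolution step; singleton clusters need no resolution.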

### Pairwise consolidation

An incremental path for smaller batches:

1. **Intra-batch dedup**: Token Jaccard catches near-identical facts within the same batch
2. **Vector search**: Each new fact is compared against existing facts via embedding similarity
3. **Scope filtering**: Candidates are filtered by context (e.g., same artifact scope)
4. **Optional reranking**: A reranker widens the candidate pool, then re-scores candidates for precision
5. **LLM pairwise resolution**: Each candidate pair → `supersede` or `keep_all`

## Artifact consolidation

Artifacts go through a separate consolidation pipeline:

1. Similar artifacts are found via embedding search
2. Text deduplication (Jaccard) catches near-copies
3. Above a confidence threshold, artifacts are linked as versions or derivatives
4. Version chains are maintained so the full evolution history is preserved

## Configuration

| Parameter                                     | Default | What it controls                                    |
| --------------------------------------------- | ------- | --------------------------------------------------- |
| `enable_consolidation`                        | —       | Master toggle for fact consolidation                |
| `consolidation_resolve_threshold`             | `0.60`  | Minimum similarity to consider a pair               |
| `consolidation_text_dedup_threshold`          | —       | Jaccard threshold for text-level dedup              |
| `consolidation_cluster_concurrency`           | —       | Parallel cluster resolution calls                   |
| `consolidation_max_existing_matches`          | —       | How many existing facts to compare against per seed |
| `enable_artifact_consolidation`               | —       | Toggle for artifact consolidation                   |
| `artifact_consolidation_confidence_threshold` | —       | Minimum confidence for artifact linking             |
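For orientation, the parameters above might be assembled like this. Only the `0.60` resolve threshold is a documented default; every other value below is a made-up example, and your XTrace release's actual defaults and configuration surface may differ.

```python
# Illustrative settings only — key names come from the table above;
# all values except consolidation_resolve_threshold are example numbers.
consolidation_config = {
    "enable_consolidation": True,
    "consolidation_resolve_threshold": 0.60,              # documented default
    "consolidation_text_dedup_threshold": 0.85,           # example value
    "consolidation_cluster_concurrency": 4,               # example value
    "consolidation_max_existing_matches": 20,             # example value
    "enable_artifact_consolidation": True,
    "artifact_consolidation_confidence_threshold": 0.80,  # example value
}
```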

## Consolidation vs belief revision

These are complementary systems:

* **Belief revision** handles explicit contradictions detected during extraction ("we switched from Postgres to MySQL")
* **Consolidation** handles implicit redundancy detected after extraction ("user prefers TypeScript" appearing three times)

Both result in `SUPERSEDED` facts with lineage links, but they trigger at different stages and for different reasons.
