Consolidation


Over time, a memory store accumulates overlapping and redundant beliefs. You might say “I prefer TypeScript” in three different conversations. The extraction pipeline captures each instance — consolidation merges them.

Why consolidation matters

Without consolidation, retrieval returns near-duplicates that waste context budget and confuse downstream models. Consolidation keeps the knowledge base tight: one authoritative belief per concept, with lineage back to the originals.

How it works

Cluster-based consolidation

After a batch of new facts is ingested, consolidation runs as a second phase:

  1. Seed selection: New fact IDs from the current ingestion are the seeds
  2. Neighbor expansion: For each seed, find similar existing facts via embedding similarity
  3. Clustering: A union-find algorithm groups facts with overlapping neighbors into clusters (using token Jaccard similarity and embedding scores)
  4. LLM resolution: Each multi-fact cluster is sent to an LLM that decides the outcome:
| Outcome | What happens |
| --- | --- |
| Supersede | One fact absorbs the others. The survivor may be reworded to capture nuance from the group. Superseded facts get `status: SUPERSEDED` with `replaced_by` links. |
| Keep all | The facts are genuinely distinct (e.g., different scopes). They're marked as consolidated so they won't be re-evaluated. |

  5. Marking: All processed facts get a `consolidated_at` timestamp so they're skipped in future sweeps
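The clustering step (3) can be sketched with plain union-find over token Jaccard scores. This is a minimal illustration, not the actual implementation: the real pipeline also folds in embedding similarity, and the function names here (`token_jaccard`, `cluster_facts`) are hypothetical.

```python
from itertools import combinations

def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def cluster_facts(facts: dict[str, str], threshold: float = 0.6) -> list[set[str]]:
    """Group fact IDs whose texts overlap above `threshold` via union-find."""
    parent = {fid: fid for fid in facts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Merge every pair that clears the similarity threshold.
    for a, b in combinations(facts, 2):
        if token_jaccard(facts[a], facts[b]) >= threshold:
            union(a, b)

    # Collect connected components into clusters.
    clusters: dict[str, set[str]] = {}
    for fid in facts:
        clusters.setdefault(find(fid), set()).add(fid)
    return list(clusters.values())
```

Each multi-member cluster returned here would then go to the LLM resolution step; singleton clusters need no resolution.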

Pairwise consolidation

An incremental path for smaller batches:

  1. Intra-batch dedup: Token Jaccard catches near-identical facts within the same batch
  2. Vector search: Each new fact is compared against existing facts via embedding similarity
  3. Scope filtering: Candidates are filtered by context (e.g., same artifact scope)
  4. Optional reranking: Widens the candidate pool and reranks for precision
  5. LLM pairwise resolution: Each candidate pair → supersede or keep_all
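Step 1 of the pairwise path, intra-batch dedup, can be sketched as a greedy filter: keep a fact only if it isn't near-identical to one already kept. A minimal sketch under assumed names (`dedup_batch` and the high 0.9 threshold are illustrative, not the library's API):

```python
def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def dedup_batch(facts: list[str], threshold: float = 0.9) -> list[str]:
    """Drop near-identical facts within a single ingestion batch (step 1)."""
    kept: list[str] = []
    for fact in facts:
        # Keep the fact only if it is not a near-copy of an earlier keeper.
        if all(token_jaccard(fact, k) < threshold for k in kept):
            kept.append(fact)
    return kept
```

Survivors of this filter proceed to vector search against existing facts (step 2) and, ultimately, pairwise LLM resolution.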

Artifact consolidation

Artifacts go through a separate consolidation pipeline:

  1. Similar artifacts are found via embedding search
  2. Text deduplication (Jaccard) catches near-copies
  3. Above a confidence threshold, artifacts are linked as versions or derivatives
  4. Version chains are maintained so the full evolution history is preserved

Configuration

| Parameter | Default | What it controls |
| --- | --- | --- |
| `enable_consolidation` | | Master toggle for fact consolidation |
| `consolidation_resolve_threshold` | 0.60 | Minimum similarity to consider a pair |
| `consolidation_text_dedup_threshold` | | Jaccard threshold for text-level dedup |
| `consolidation_cluster_concurrency` | | Parallel cluster resolution calls |
| `consolidation_max_existing_matches` | | How many existing facts to compare against per seed |
| `enable_artifact_consolidation` | | Toggle for artifact consolidation |
| `artifact_consolidation_confidence_threshold` | | Minimum confidence for artifact linking |
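These parameters could be grouped into a config object along these lines. Only `consolidation_resolve_threshold = 0.60` is documented above; every other default below is an illustrative placeholder, not the real value.

```python
from dataclasses import dataclass

@dataclass
class ConsolidationConfig:
    # Placeholder defaults except where noted; consult the actual defaults.
    enable_consolidation: bool = True                  # placeholder
    consolidation_resolve_threshold: float = 0.60      # documented default
    consolidation_text_dedup_threshold: float = 0.9    # placeholder
    consolidation_cluster_concurrency: int = 4         # placeholder
    consolidation_max_existing_matches: int = 20       # placeholder
    enable_artifact_consolidation: bool = True         # placeholder
    artifact_consolidation_confidence_threshold: float = 0.8  # placeholder
```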

Consolidation vs belief revision

These are complementary systems:

  • Belief revision handles explicit contradictions detected during extraction (“we switched from Postgres to MySQL”)
  • Consolidation handles implicit redundancy detected after extraction (“user prefers TypeScript” appearing three times)

Both result in SUPERSEDED facts with lineage links, but they trigger at different stages and for different reasons.