Consolidation
Over time, a memory store accumulates overlapping and redundant beliefs. You might say “I prefer TypeScript” in three different conversations. The extraction pipeline captures each instance — consolidation merges them.
Why consolidation matters
Without consolidation, retrieval returns near-duplicates that waste context budget and confuse downstream models. Consolidation keeps the knowledge base tight: one authoritative belief per concept, with lineage back to the originals.
How it works
Cluster-based consolidation
After a batch of new facts is ingested, consolidation runs as a second phase:
- Seed selection: New fact IDs from the current ingestion are the seeds
- Neighbor expansion: For each seed, find similar existing facts via embedding similarity
- Clustering: A union-find algorithm groups facts with overlapping neighbors into clusters (using token Jaccard similarity and embedding scores)
- LLM resolution: Each multi-fact cluster is sent to an LLM that decides the outcome for the cluster
- Marking: All processed facts get a consolidated_at timestamp so they're skipped in future sweeps
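The clustering step above can be sketched with token Jaccard similarity and union-find. This is a minimal illustration (the real pipeline also uses embedding scores, omitted here), and the fact texts are hypothetical:

```python
from collections import defaultdict

def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def find(parent: dict, x: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster_facts(facts: dict, threshold: float = 0.5) -> list:
    """Group fact IDs whose texts overlap above `threshold` via union-find."""
    parent = {fid: fid for fid in facts}
    ids = list(facts)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if token_jaccard(facts[a], facts[b]) >= threshold:
                ra, rb = find(parent, a), find(parent, b)
                if ra != rb:
                    parent[ra] = rb  # merge the two clusters
    clusters = defaultdict(set)
    for fid in ids:
        clusters[find(parent, fid)].add(fid)
    return list(clusters.values())

facts = {
    "f1": "user prefers TypeScript for new projects",
    "f2": "user prefers TypeScript",
    "f3": "the service runs on port 8080",
}
clusters = cluster_facts(facts)  # f1 and f2 merge; f3 stays alone
```

Only multi-fact clusters (here, the one containing f1 and f2) would be sent on to LLM resolution; singleton clusters need no merging.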
Pairwise consolidation
An incremental path for smaller batches:
- Intra-batch dedup: Token Jaccard catches near-identical facts within the same batch
- Vector search: Each new fact is compared against existing facts via embedding similarity
- Scope filtering: Candidates are filtered by context (e.g., same artifact scope)
- Optional reranking: Widens the candidate pool and reranks for precision
- LLM pairwise resolution: Each candidate pair is resolved as supersede or keep_all
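The pairwise path can be sketched as follows. The Fact record and its scope field are illustrative assumptions, and token Jaccard stands in for both the embedding search and the LLM resolver:

```python
from dataclasses import dataclass

@dataclass
class Fact:
    id: str
    text: str
    scope: str  # hypothetical artifact-scope tag used for filtering

def token_jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def dedup_batch(batch: list, threshold: float = 0.9) -> list:
    """Intra-batch dedup: drop facts near-identical to an earlier batch member."""
    kept = []
    for f in batch:
        if all(token_jaccard(f.text, k.text) < threshold for k in kept):
            kept.append(f)
    return kept

def candidate_pairs(new_facts: list, existing: list, min_sim: float = 0.3):
    """Scope-filtered candidates; Jaccard stands in for embedding search here."""
    for nf in new_facts:
        for ef in existing:
            if ef.scope == nf.scope and token_jaccard(nf.text, ef.text) >= min_sim:
                yield nf, ef

def resolve(new: Fact, old: Fact) -> str:
    """Placeholder for LLM pairwise resolution: supersede vs keep_all."""
    return "supersede" if token_jaccard(new.text, old.text) >= 0.5 else "keep_all"

existing = [Fact("f1", "user prefers TypeScript", "profile")]
batch = [
    Fact("f2", "user prefers TypeScript for new projects", "profile"),
    Fact("f3", "user prefers TypeScript for new projects", "profile"),
]
new_facts = dedup_batch(batch)  # f3 dropped as near-identical to f2
```

The real system replaces the `resolve` heuristic with an LLM call, and the candidate step with vector search plus optional reranking.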
Artifact consolidation
Artifacts go through a separate consolidation pipeline:
- Similar artifacts are found via embedding search
- Text deduplication (Jaccard) catches near-copies
- Above a confidence threshold, artifacts are linked as versions or derivatives
- Version chains are maintained so the full evolution history is preserved
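Version chains could be maintained along these lines. The Artifact fields and helper names are assumptions for illustration, not the system's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Artifact:
    id: str
    text: str
    previous_version: Optional[str] = None  # link to the artifact this one revises

def link_version(store: dict, new: Artifact, old_id: str) -> None:
    """Record `new` as the next version of `old_id`, preserving the chain."""
    new.previous_version = old_id
    store[new.id] = new

def version_chain(store: dict, artifact_id: str) -> list:
    """Walk previous_version links back to the original artifact."""
    chain = []
    cur = artifact_id
    while cur is not None:
        chain.append(cur)
        cur = store[cur].previous_version
    return chain

store = {"a1": Artifact("a1", "design doc v1")}
link_version(store, Artifact("a2", "design doc v2"), "a1")
```

Because each version keeps a link to its predecessor, the full evolution history stays reachable from the newest artifact.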
Configuration
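The pipeline above implies several tunable thresholds. A hypothetical configuration sketch follows; these key names and values are illustrative, not the system's actual settings:

```python
# Illustrative knobs implied by the consolidation pipeline.
# Names and defaults are assumptions, not documented configuration keys.
CONSOLIDATION_CONFIG = {
    "embedding_similarity_min": 0.80,      # neighbor-expansion cutoff
    "token_jaccard_min": 0.50,             # near-duplicate detection
    "rerank_candidates": True,             # widen pool, then rerank for precision
    "artifact_link_confidence_min": 0.70,  # version/derivative linking threshold
}
```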
Consolidation vs belief revision
These are complementary systems:
- Belief revision handles explicit contradictions detected during extraction (“we switched from Postgres to MySQL”)
- Consolidation handles implicit redundancy detected after extraction (“user prefers TypeScript” appearing three times)
Both result in SUPERSEDED facts with lineage links, but they trigger at different stages and for different reasons.
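A minimal sketch of what a SUPERSEDED fact with a lineage link might look like; the field names are assumptions:

```python
# Hypothetical record shape: whichever system retires a fact, the result is a
# SUPERSEDED status plus a lineage link to the fact that replaced it.
superseded = {
    "id": "f2",
    "text": "user prefers TypeScript",
    "status": "SUPERSEDED",
    "superseded_by": "f9",      # the authoritative consolidated fact
    "reason": "consolidation",  # vs "belief_revision" for explicit contradictions
}
```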