Consolidation


Over time, a memory store accumulates overlapping and redundant beliefs. You might say “I prefer TypeScript” in three different conversations. The extraction pipeline captures each instance — consolidation merges them.

Why consolidation matters

Without consolidation, retrieval returns near-duplicates that waste context budget and confuse downstream models. Consolidation keeps the knowledge base tight: one authoritative belief per concept, with lineage back to the originals.

How it works

Cluster-based consolidation

After a batch of new facts is ingested, consolidation runs as a second phase:

  1. Seed selection: New fact IDs from the current ingestion are the seeds
  2. Neighbor expansion: For each seed, find similar existing facts via embedding similarity
  3. Clustering: A union-find algorithm groups facts with overlapping neighbors into clusters (using token Jaccard similarity and embedding scores)
  4. LLM resolution: Each multi-fact cluster is sent to an LLM that decides the outcome:
| Outcome | What happens |
| --- | --- |
| Supersede | One fact absorbs the others. The survivor may be reworded to capture nuance from the group. Superseded facts get `status: SUPERSEDED` with `replaced_by` links. |
| Keep all | The facts are genuinely distinct (e.g., different scopes). They're marked as consolidated so they won't be re-evaluated. |

  5. Marking: All processed facts get a `consolidated_at` timestamp so they're skipped in future sweeps
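The clustering step (3) can be sketched with plain union-find over token Jaccard scores. This is a minimal illustration, not the actual implementation: the real pipeline also folds in embedding similarity, and the function names here (`token_jaccard`, `cluster_facts`) are hypothetical.

```python
from itertools import combinations

def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def cluster_facts(facts: dict[str, str], threshold: float = 0.6) -> list[set[str]]:
    """Group fact IDs whose texts overlap above `threshold` via union-find."""
    parent = {fid: fid for fid in facts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Merge every pair that clears the similarity threshold.
    for a, b in combinations(facts, 2):
        if token_jaccard(facts[a], facts[b]) >= threshold:
            union(a, b)

    # Collect connected components into clusters.
    clusters: dict[str, set[str]] = {}
    for fid in facts:
        clusters.setdefault(find(fid), set()).add(fid)
    return list(clusters.values())
```

Each multi-member cluster returned here would then go to the LLM resolution step; singleton clusters need no resolution.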

Pairwise consolidation

An incremental path for smaller batches:

  1. Intra-batch dedup: Token Jaccard catches near-identical facts within the same batch
  2. Vector search: Each new fact is compared against existing facts via embedding similarity
  3. Scope filtering: Candidates are filtered by context (e.g., same artifact scope)
  4. Optional reranking: Widens the candidate pool and reranks for precision
  5. LLM pairwise resolution: Each candidate pair → supersede or keep_all
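Step 1 of the pairwise path, intra-batch dedup, can be sketched as a greedy filter: keep a fact only if it isn't near-identical to one already kept. A minimal sketch under assumed names (`dedup_batch` and the high 0.9 threshold are illustrative, not the library's API):

```python
def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def dedup_batch(facts: list[str], threshold: float = 0.9) -> list[str]:
    """Drop near-identical facts within a single ingestion batch (step 1)."""
    kept: list[str] = []
    for fact in facts:
        # Keep the fact only if it is not a near-copy of an earlier keeper.
        if all(token_jaccard(fact, k) < threshold for k in kept):
            kept.append(fact)
    return kept
```

Survivors of this filter proceed to vector search against existing facts (step 2) and, ultimately, pairwise LLM resolution.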

Artifact consolidation

Artifacts go through a separate consolidation pipeline:

  1. Similar artifacts are found via embedding search
  2. Text deduplication (Jaccard) catches near-copies
  3. Above a confidence threshold, artifacts are linked as versions or derivatives
  4. Version chains are maintained so the full evolution history is preserved

Configuration

| Parameter | Default | What it controls |
| --- | --- | --- |
| `enable_consolidation` | | Master toggle for fact consolidation |
| `consolidation_resolve_threshold` | 0.60 | Minimum similarity to consider a pair |
| `consolidation_text_dedup_threshold` | | Jaccard threshold for text-level dedup |
| `consolidation_cluster_concurrency` | | Parallel cluster resolution calls |
| `consolidation_max_existing_matches` | | How many existing facts to compare against per seed |
| `enable_artifact_consolidation` | | Toggle for artifact consolidation |
| `artifact_consolidation_confidence_threshold` | | Minimum confidence for artifact linking |
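These parameters could be grouped into a config object along these lines. Only `consolidation_resolve_threshold = 0.60` is documented above; every other default below is an illustrative placeholder, not the real value.

```python
from dataclasses import dataclass

@dataclass
class ConsolidationConfig:
    # Placeholder defaults except where noted; consult the actual defaults.
    enable_consolidation: bool = True                  # placeholder
    consolidation_resolve_threshold: float = 0.60      # documented default
    consolidation_text_dedup_threshold: float = 0.9    # placeholder
    consolidation_cluster_concurrency: int = 4         # placeholder
    consolidation_max_existing_matches: int = 20       # placeholder
    enable_artifact_consolidation: bool = True         # placeholder
    artifact_consolidation_confidence_threshold: float = 0.8  # placeholder
```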

Consolidation vs belief revision

These are complementary systems:

  • Belief revision handles explicit contradictions detected during extraction (“we switched from Postgres to MySQL”)
  • Consolidation handles implicit redundancy detected after extraction (“user prefers TypeScript” appearing three times)

Both result in SUPERSEDED facts with lineage links, but they trigger at different stages and for different reasons.