How Ferrosa Memory Works — Architecture, Retrieval & Auditable Forgetting

The architecture

Four layers, one clear contract.

An agent never talks to storage directly. It calls MCP tools; Ferrosa Memory owns the semantics of memory; and Ferrosa DB provides the durable, indexed substrate underneath. Each layer has a single job, so the layer above can stay simple.

At the top, an agent, IDE, or runtime issues a small, stable set of MCP calls — ingest, retrieve, link, explain, forget. It does not know how a fact is stored, indexed, or ranked; it only knows the tool contract.

The memory service translates those calls into memory semantics: entities and typed edges, bi-temporal facts, trajectory folds, context segments, intentions, and the forget journal. This is where dedup, supersession, ranking, and consolidation policy live.

Underneath, Ferrosa DB does the heavy lifting: CQL storage for the wide-column tables, HNSW vector indexes for semantic recall, a property graph for traversal, and S3-backed durability with hot/warm/cold tiering so memory survives restarts and scales past a single process.

Caller

Agent · IDE · Runtime

Issues a stable MCP tool contract; carries no storage knowledge.

▼ MCP tools

Interface

MCP tool surface

ingest · retrieve · link · explain · forget

▼

Memory semantics

Ferrosa Memory service

entities + graph · bi-temporal facts · folds + segments · hybrid retrieval · consolidation · forget journal

▼

Substrate

Ferrosa Database

CQL storage · HNSW vectors · property graph · S3 tiering

Agent / IDE / runtime
        │  MCP tools  (ingest · retrieve · link · explain · forget)
        ▼
Ferrosa Memory service        ← owns memory *semantics*
  ├─ entities + typed graph
  ├─ bi-temporal facts (supersession)
  ├─ trajectory folds + context segments
  ├─ hybrid retrieval (Reciprocal Rank Fusion)
  ├─ dream-cycle consolidation
  └─ auditable forget journal
        │
        ▼
Ferrosa Database              ← durable, indexed substrate
  CQL storage · HNSW vectors · property graph · S3 hot/warm/cold tiering

The memory primitives

Memory as structured data, not opaque text.

A vector store gives you one shape: a blob with an embedding. Ferrosa Memory gives you the shapes agents actually reason over — and stores each one natively so it can be queried, linked, and explained.

entities · graph

Entities & typed graph

Named entities are deduplicated with phonetic matching (Double Metaphone) and connected by typed edges — depends_on, contains, calls, uses, references, related_to — alongside legacy bidirectional links like CO_OCCURS_WITH and SUPERSEDES. An agent can follow relationships, not just match strings.

bi-temporal

Bi-temporal facts

Facts are timestamped and superseded rather than overwritten. Retrieval returns the most-recent-valid value by default, while the full supersession chain stays inspectable — so you can audit what was known, and when, and time-travel back through it.
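
As a rough illustration of supersession, the sketch below keeps every version of a fact in a chain and answers both "what is true now" and "what did we know at time T". The class and field names (FactChain, FactVersion, recorded_at, and so on) are hypothetical and are not Ferrosa Memory's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FactVersion:
    value: str
    valid_from: float                      # when the value became true
    recorded_at: float                     # when the store learned it
    superseded_at: Optional[float] = None  # set when a newer version arrives


class FactChain:
    """Hypothetical supersession chain for one (entity, attribute) pair."""

    def __init__(self) -> None:
        self.versions: List[FactVersion] = []

    def assert_value(self, value: str, valid_from: float, recorded_at: float) -> None:
        # Supersede rather than overwrite: close the current head but keep it.
        if self.versions:
            self.versions[-1].superseded_at = recorded_at
        self.versions.append(FactVersion(value, valid_from, recorded_at))

    def current(self) -> Optional[str]:
        # Default retrieval: the most recent value.
        return self.versions[-1].value if self.versions else None

    def as_of(self, recorded_time: float) -> Optional[str]:
        # Time travel: the last version the store knew about at recorded_time.
        known = [v for v in self.versions if v.recorded_at <= recorded_time]
        return known[-1].value if known else None
```

Because older versions are only closed, never removed, the whole chain stays available for audit while `current()` keeps default reads simple.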

memoization

Memoization

A content-hash-keyed cache of completed sub-call results. When a deterministic sub-task recurs, the prior result is returned from cache instead of paying for a redundant LLM call — memory that saves work, not just stores it.
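
A minimal sketch of a content-hash-keyed cache follows, assuming sub-call inputs can be serialised to JSON. SubcallCache and its methods are illustrative names, not the service's API.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class SubcallCache:
    """Hypothetical cache of completed sub-call results, keyed by content hash."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self.hits = 0

    @staticmethod
    def key_for(task: str, inputs: dict) -> str:
        # Canonical JSON (sorted keys) so key order does not change the hash.
        payload = json.dumps({"task": task, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, task: str, inputs: dict, compute: Callable[[], Any]) -> Any:
        key = self.key_for(task, inputs)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        result = compute()  # the expensive step, for example an LLM call
        self._results[key] = result
        return result
```

This only pays off for deterministic sub-tasks: if the same inputs can legitimately produce different answers, a cached result would be stale rather than redundant.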

plan trees

Plan trees

Hierarchical, depth-nested plan and task nodes capture how a complex job decomposes. When a sub-task returns, parent context is re-injected, so a deep agent keeps its place instead of losing the thread.
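
The idea of re-injecting parent context can be sketched with a small tree, assuming each node simply records a goal and a result. PlanNode and complete are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class PlanNode:
    goal: str
    parent: Optional["PlanNode"] = field(default=None, repr=False)
    children: List["PlanNode"] = field(default_factory=list)
    result: Optional[str] = None

    def add_child(self, goal: str) -> "PlanNode":
        child = PlanNode(goal, parent=self)
        self.children.append(child)
        return child

    @property
    def depth(self) -> int:
        return 0 if self.parent is None else self.parent.depth + 1


def complete(node: PlanNode, result: str) -> List[str]:
    """Record a sub-task result and return ancestor goals to re-inject, root first."""
    node.result = result
    context: List[str] = []
    cursor = node.parent
    while cursor is not None:
        context.append(cursor.goal)
        cursor = cursor.parent
    return list(reversed(context))
```

When a deeply nested sub-task finishes, the returned list gives the agent the chain of goals it was working under, which is the "keeps its place" behaviour described above.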

trajectory folds

Trajectory folds

Open a fold for a sub-task, append turns as work proceeds, then seal it with a summary and an embedding, recorded with a FOLDED_INTO edge. Long trajectories compress into recallable units without throwing away the detail underneath.
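
The open, append, seal lifecycle might look like the sketch below. The Fold class is hypothetical, and the embedding function is passed in by the caller because the source does not say which embedding model is used.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Fold:
    task: str
    turns: List[str] = field(default_factory=list)
    summary: Optional[str] = None
    embedding: Optional[Sequence[float]] = None

    @property
    def sealed(self) -> bool:
        return self.summary is not None

    def append(self, turn: str) -> None:
        if self.sealed:
            raise ValueError("fold is sealed; open a new fold instead")
        self.turns.append(turn)

    def seal(self, summary: str, embed: Callable[[str], Sequence[float]]) -> None:
        # Sealing adds a summary and its embedding; the raw turns are kept.
        self.summary = summary
        self.embedding = embed(summary)
```

Recall can then match on the compact summary embedding while the original turns remain available for anyone who needs the detail.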

document chunks

Document chunks

Documents are stored as a hierarchy — document → section → chunk — with semantic previous/next links. When a search hits one chunk, get_chunk_context expands around it to recover the surrounding evidence a single hit would miss.
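
To illustrate expansion around a hit, the sketch below walks previous/next links outward from one chunk. It is a simplified stand-in, not the real get_chunk_context signature; Chunk, expand_context, and the window parameter are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def expand_context(chunks: Dict[str, Chunk], hit_id: str, window: int = 1) -> List[Chunk]:
    """Return the hit plus up to `window` neighbours on each side, in reading order."""
    hit = chunks[hit_id]

    before: List[Chunk] = []
    cursor = hit
    for _ in range(window):
        if cursor.prev_id is None:
            break
        cursor = chunks[cursor.prev_id]
        before.append(cursor)

    after: List[Chunk] = []
    cursor = hit
    for _ in range(window):
        if cursor.next_id is None:
            break
        cursor = chunks[cursor.next_id]
        after.append(cursor)

    return list(reversed(before)) + [hit] + after
```

Bounding the walk with a window keeps the expansion cheap and predictable, even when the hit sits in a very long document.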

context segments

Context segments

Raw conversation turns are persisted as deterministic semantic segments, dual-indexed for both lexical (BM25) and vector recall, with temporal previous/next links for bounded expansion windows. The transcript becomes searchable structure.
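
For readers unfamiliar with the lexical half of that dual index, here is a compact sketch of textbook Okapi BM25 scoring over a handful of segments. The whitespace tokeniser and the k1 and b defaults are assumptions chosen for brevity; the source does not state how Ferrosa Memory tokenises or tunes BM25.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, segments: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each segment against the query with Okapi BM25."""
    docs = [segment.lower().split() for segment in segments]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = (sum(len(doc) for doc in docs) / n_docs) or 1.0

    # Document frequency: in how many segments does each term appear?
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores: List[float] = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + length_norm)
        scores.append(score)
    return scores
```

A segment that shares no terms with the query scores exactly zero here, which is the gap the vector index is meant to cover for paraphrased matches.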

Hybrid retrieval

One ranked answer, fused from many kinds of evidence.

No single signal is enough — vectors miss exact strings, keywords miss paraphrase, and neither knows what matters in this workspace. Ferrosa Memory runs many retrievers and combines them with Reciprocal Rank Fusion, so a result that several independent signals agree on rises to the top.

The signals fused by RRF

Phonetic entity match — fuzzy name recall via Double Metaphone.
Entity ANN — approximate-nearest-neighbour over entity embeddings (HNSW).
Fold-summary ANN — semantic recall over sealed trajectory summaries.
Context-segment BM25 — lexical match over raw conversation segments.
Context-segment ANN — semantic match over the same segments.
Document-chunk BM25 — lexical match over document chunks.
Document-chunk ANN — semantic match over document chunks.
Document phonetic — phonetic match against document content.
Warmth / recency — session-aware decay favouring recently relevant memory.
PageRank centrality — graph-structural authority of an entity.
Feedback reputation — accumulated user-feedback signal on past results.
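
Reciprocal Rank Fusion itself is simple: each retriever contributes 1 / (k + rank) for every item it returns, and the contributions are summed. The sketch below shows the standard formula; the constant k = 60 is the value commonly used in the RRF literature and is an assumption here, as are the signal names in the usage example.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of ids into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


fused = reciprocal_rank_fusion({
    "entity_ann": ["a", "b", "c"],
    "segment_bm25": ["b", "a", "d"],
    "pagerank": ["b", "c", "a"],
})
print(fused)  # ['b', 'a', 'c', 'd']
```

Because only ranks are used, signals with incomparable score scales (BM25 scores, cosine similarities, centrality values) can be combined without normalisation.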

⚖️

Fusion, then judgment

Optional query decomposition expands a request into lexical and LLM-generated variants before retrieval. Reciprocal Rank Fusion merges every signal's ranking into one list. Workspace-aware reranking then boosts or demotes results by the current working directory, so local context wins. An optional LLM judge / reranker can take a final pass — but it is off by default.
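
One plausible shape for the workspace-aware step is to scale fused scores by whether a result's source path sits under the current working directory. The function name, the tuple layout, and the boost and penalty factors below are all assumptions made for illustration; the source does not specify how the boost is computed.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def workspace_rerank(
    results: List[Tuple[str, float, str]],
    cwd: str,
    boost: float = 1.5,
    penalty: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rerank (memory_id, fused_score, source_path) triples by workspace locality."""
    root = PurePosixPath(cwd)
    adjusted: List[Tuple[str, float]] = []
    for memory_id, score, source_path in results:
        path = PurePosixPath(source_path)
        inside = path == root or root in path.parents
        adjusted.append((memory_id, score * (boost if inside else penalty)))
    return sorted(adjusted, key=lambda pair: -pair[1])
```

With these factors, a slightly weaker result from the current project can overtake a stronger one from an unrelated directory, which matches the "local context wins" intent.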

🔁

A retrieval loop that learns

Relevance feedback on a result set is recorded and folded back into the feedback reputation signal, so future rankings reflect what actually helped. Recall is treated as a system that improves with use — not a fixed similarity threshold frozen at index time.

Retrieval is capability-tuned, not magic. The point of fusing eleven signals is robustness: a result wins because phonetic, lexical, semantic, structural, and behavioural evidence agree — not because one embedding happened to land close.

Dream-cycle consolidation

Memory that gets better while the agent is idle.

Human memory consolidates during rest. Ferrosa Memory does the same: a background "dream cycle" runs during idle periods — and after enough new ingests — turning a pile of fresh facts into linked, ranked, durable knowledge. You can also force a pass with run_consolidation at wrap-up.

triage

Triage

Newly ingested memories are sorted and prepared for deeper processing, separating signal worth linking from noise that can rest.

connect

Connection discovery

Within-fold and pairwise analysis surfaces relationships between memories that arrived separately, proposing new edges across the graph.

insight

Insight generation

Clusters of co-occurring entities are summarized into higher-level insights, so recurring patterns become first-class, retrievable memory.

rank

PageRank recompute

Graph centrality is recomputed as the structure grows, keeping the authority signal that feeds retrieval current with the latest links.
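
For reference, a recompute can be as small as the power-iteration sketch below. It is the textbook PageRank algorithm over an adjacency dict; the damping factor and iteration count are conventional defaults, not values taken from Ferrosa Memory.

```python
from typing import Dict, List


def pagerank(edges: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over {source: [targets]}."""
    nodes = set(edges) | {target for targets in edges.values() for target in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not edges.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in edges.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

An entity that many others point at (a shared database, a core module) ends up with a higher score, which is the structural authority the retrieval signal draws on.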

decay

Ebbinghaus warmth decay

An Ebbinghaus-style forgetting curve cools memory that hasn't been touched, so stale context fades in ranking instead of crowding fresh recall.
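
The forgetting curve is commonly modelled as exponential decay, R = e^(-t/S), where t is the time since the memory was last touched and S is a stability constant. The sketch below applies that form; the function name and the 72-hour stability value are assumptions, not the service's actual parameters.

```python
import math


def warmth(initial: float, hours_since_touch: float, stability_hours: float = 72.0) -> float:
    """Exponential forgetting curve: warmth falls toward zero as a memory goes untouched."""
    if hours_since_touch < 0:
        raise ValueError("hours_since_touch must be non-negative")
    return initial * math.exp(-hours_since_touch / stability_hours)
```

A larger stability value makes memory cool more slowly, so it is the natural knob for how aggressively stale context should fade from ranking.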

materialize

Materialization

High-confidence derived predicates are materialized for fast reuse, so repeatedly inferred facts don't have to be recomputed on every query.

Auditable forgetting

The headline of version 0.15: memory you can safely remove.

Deleting a memory in a connected graph is dangerous — it can orphan edges, break temporal chains, and invalidate derived facts. So forget is two-phase: look before you leap, then leap with a receipt.

Propose — read-only

A hybrid search finds forget candidates and computes their blast radius: direct edges, temporal supersession chains, derived facts that reference the entity, and inbound references. Nothing is mutated. The phase returns a signed token with a time-to-live (TTL) that describes exactly what would change.

Confirm — write

Confirmation replays the token under a TOCTOU (time-of-check to time-of-use) guard: a content_hash check rejects the operation if the underlying memory changed since the proposal, so you never delete something other than what you reviewed. The change is recorded in an append-only ForgetJournal before any state is touched.
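
The two phases can be sketched end to end with standard-library primitives. Everything below is illustrative: the HMAC signing scheme, the token layout, the in-memory store, and the names propose, confirm, and ForgetError are assumptions, and the blast-radius computation is left out entirely.

```python
import hashlib
import hmac
import json
import time
from typing import Dict, List, Optional

SECRET = b"demo-only-secret"  # hypothetical key; a real service would load one securely


class ForgetError(Exception):
    """Raised when a confirm step must be rejected."""


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def propose(store: Dict[str, dict], memory_id: str,
            ttl_seconds: float = 300.0, now: Optional[float] = None) -> dict:
    """Phase 1, read-only: describe the change and sign it. Nothing is mutated."""
    now = time.time() if now is None else now
    payload = json.dumps({
        "memory_id": memory_id,
        "content_hash": content_hash(store[memory_id]["text"]),
        "expires_at": now + ttl_seconds,
    }, sort_keys=True)
    return {"payload": payload, "signature": sign(payload)}


def confirm(store: Dict[str, dict], journal: List[dict], token: dict,
            now: Optional[float] = None) -> None:
    """Phase 2, write: verify the token, guard against change, journal, then retract."""
    now = time.time() if now is None else now
    if not hmac.compare_digest(sign(token["payload"]), token["signature"]):
        raise ForgetError("token signature does not verify")
    claim = json.loads(token["payload"])
    if now > claim["expires_at"]:
        raise ForgetError("token expired")
    memory = store.get(claim["memory_id"])
    # TOCTOU guard: reject if the memory is not what was reviewed.
    if memory is None or content_hash(memory["text"]) != claim["content_hash"]:
        raise ForgetError("memory changed since proposal")
    journal.append({"memory_id": claim["memory_id"], "status": "pending"})  # journal first
    memory["state"] = "unavailable"  # reversible retraction, not a hard delete
    journal.append({"memory_id": claim["memory_id"], "status": "done"})
```

The ordering inside confirm is the important part: verification, then the hash guard, then the journal entry, and only then the state change.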

↩️

Reversible retract vs. hard delete

The default mode is a reversible retraction — a soft-delete to state=unavailable that hides the memory from recall but keeps it restorable via restore_forgotten. When you genuinely need it gone, a hard delete performs a graph DETACH DELETE. Both are journaled.

🛟

Crash recovery

Because every forget is journaled before it executes, a crash mid-operation can't leave memory half-deleted. A startup recovery sweep reprocesses incomplete forgets to a clean state — the append-only journal is the source of truth.
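
A recovery sweep over such a journal could look like the sketch below. It assumes each journal entry carries an op_id, a memory_id, and a status of "pending" or "done", and it assumes recovery rolls incomplete retractions forward; the source says only that incomplete forgets are reprocessed to a clean state, so both the entry layout and the roll-forward choice are hypothetical.

```python
from typing import Dict, List


def recover(store: Dict[str, dict], journal: List[dict]) -> List[str]:
    """Finish any forget that was journaled as pending but never marked done."""
    finished = {entry["op_id"] for entry in journal if entry["status"] == "done"}
    incomplete = [
        entry for entry in journal
        if entry["status"] == "pending" and entry["op_id"] not in finished
    ]
    recovered: List[str] = []
    for entry in incomplete:
        memory = store.get(entry["memory_id"])
        if memory is not None:
            memory["state"] = "unavailable"  # idempotent: safe to apply twice
        # Append-only: completion is a new entry, never an edit of the old one.
        journal.append({"op_id": entry["op_id"], "memory_id": entry["memory_id"], "status": "done"})
        recovered.append(entry["op_id"])
    return recovered
```

Because the retraction is idempotent and completion is recorded as a new entry, running the sweep twice has the same effect as running it once.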

Memory you can remove — and prove you removed correctly — is memory you can trust. The blast-radius preview, signed token, TOCTOU guard, and journal exist so that forgetting is a deliberate, auditable operation rather than a destructive guess.

Hooks & session lifecycle

Memory that shows up at the right moment — and stays quiet otherwise.

The best memory layer is one an agent doesn't have to think about. Ferrosa Memory wires into the session lifecycle through hooks, so recall and capture happen automatically, scoped to the workspace, and biased toward silence over noise.

🌅

SessionStart recall

When a session begins, a hook automatically runs check_intentions and a hybrid search for the current context — so the agent opens with what it already knows, before it reads a single file.

✍️

Turn capture

A turn-capture hook persists conversation turns into context segments and queues consolidation, so today's work becomes tomorrow's recallable memory without an explicit save step.

🧭

Workspace isolation

Version 0.15 derives stable, per-workspace hook sessions instead of leaning on process-global state. Fallback recall stays semantic and procedural, so a raw episodic transcript never leaks across sessions.

🤫

Prefer silence over noise

Recall-relevance guards, lexical-overlap checks, and workspace filtering mean a hook would rather surface nothing than inject low-confidence context. Quiet recall keeps the agent's window clean.

⏳

Intentions — prospective memory

set_intention defers an action to fire later on a Topic, FilePattern, Duration, or Context trigger; check_intentions fires the matching ones at session start. Memory remembers not just the past, but what to do next.
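
Trigger matching at session start might be sketched as below. This is not the real set_intention or check_intentions contract: the Intention and SessionContext shapes are hypothetical, only the Topic, FilePattern, and Duration triggers are modelled, and the Context trigger is omitted because the source does not describe how it matches.

```python
import fnmatch
from dataclasses import dataclass
from typing import List


@dataclass
class Intention:
    action: str
    trigger_kind: str   # "topic", "file_pattern", or "duration"
    trigger_value: str  # topic name, glob pattern, or seconds to wait
    created_at: float = 0.0


@dataclass
class SessionContext:
    topics: List[str]
    open_files: List[str]
    now: float


def matching_intentions(intentions: List[Intention], ctx: SessionContext) -> List[Intention]:
    """Return the intentions whose trigger matches the current session."""
    fired: List[Intention] = []
    for intention in intentions:
        kind, value = intention.trigger_kind, intention.trigger_value
        if kind == "topic":
            hit = value.lower() in {topic.lower() for topic in ctx.topics}
        elif kind == "file_pattern":
            hit = any(fnmatch.fnmatchcase(path, value) for path in ctx.open_files)
        elif kind == "duration":
            hit = ctx.now - intention.created_at >= float(value)
        else:
            hit = False  # unknown trigger kinds never fire in this sketch
        if hit:
            fired.append(intention)
    return fired
```

Keeping the matcher a pure function of the stored intentions and the session context makes it straightforward to run once per session start and to test in isolation.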

📌

Durable session tasks

A focus stack, working set, and recovery hints survive restarts and support multi-agent handoff — so an interrupted agent can pick up exactly where the last one left off.
