The Orb Knows
Persistent, Self-Consolidating Memory for Local Agents
Abstract
Agents without memory are amnesiac tool-callers; agents with memory are colleagues. We describe the memory subsystem behind Jarvis, Tensor, and Tensor Code: a persistent, self-consolidating store that writes continuously while the agent acts and compresses on idle — analogous to the role sleep plays in biological memory. The system is built on SQLite with FTS5, runs entirely on-device, and exposes its state through an orb interface so the user can see what the agent knows about them. We report p99 recall latency under 40 ms, a >6× improvement in multi-session task success over memoryless baselines on our internal evaluation, and a simple, auditable data model that users can inspect, edit, or delete.
Motivation
Three design goals frame the system. First, local-first: memory belongs on the user's device, not in a third-party index. Second, continuous writing: the agent records as it acts, not in batches — the memory is a live log, not a summary. Third, legibility: the user can see what the agent knows, edit it, delete it. Receipts over trust.
Data model
Three tables, no vectors at the storage layer. (i) events — raw observations, tool calls, and user turns, accumulating at roughly 8–40 rows per minute of active use. (ii) episodes — consolidated task traces, linked to their source events, at 10–50 rows per day. (iii) facts — durable assertions promoted from episodes, ~20–200 rows total per user. FTS5 gives sub-millisecond full-text query over the log; embedding-based recall is computed on-demand from a small hot set of recent episodes.
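The three-table layout above can be sketched in SQL. This is an illustrative reconstruction, not the shipped schema: the column names, the `episode_events` link table, and the `active` flag on facts are assumptions made to match the description (linked episodes, inactive losing assertions). It requires an SQLite build with FTS5, which standard Python distributions typically include.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (                 -- raw observations, tool calls, user turns
    id   INTEGER PRIMARY KEY,
    ts   REAL NOT NULL,               -- unix timestamp
    kind TEXT NOT NULL,               -- 'observation' | 'tool_call' | 'user_turn'
    body TEXT NOT NULL
);
CREATE TABLE episodes (               -- consolidated task traces
    id      INTEGER PRIMARY KEY,
    started REAL,
    ended   REAL,
    summary TEXT NOT NULL
);
CREATE TABLE episode_events (         -- links episodes to their source events
    episode_id INTEGER REFERENCES episodes(id),
    event_id   INTEGER REFERENCES events(id)
);
CREATE TABLE facts (                  -- durable assertions promoted from episodes
    id         INTEGER PRIMARY KEY,
    assertion  TEXT NOT NULL,
    active     INTEGER NOT NULL DEFAULT 1,  -- 0 = lost a contradiction
    episode_id INTEGER REFERENCES episodes(id)
);
-- Full-text index over the raw log, backed by the events table
CREATE VIRTUAL TABLE events_fts USING fts5(body, content='events', content_rowid='id');
""")

# Write an event and mirror it into the FTS index, then query.
conn.execute("INSERT INTO events (ts, kind, body) VALUES "
             "(1.0, 'user_turn', 'met Dana about the Q3 budget')")
conn.execute("INSERT INTO events_fts (rowid, body) SELECT id, body FROM events")
rows = conn.execute(
    "SELECT body FROM events_fts WHERE events_fts MATCH 'budget'").fetchall()
```

Keeping vectors out of the storage layer means the whole store is one portable SQLite file the user can open with any standard tool.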
Idle consolidation
When the agent is idle for more than 90 seconds, a consolidation pass runs. It summarises recent events into episode rows, links related episodes across sessions, promotes durable patterns into fact rows, and forgets contradictions by marking the losing assertion inactive. This mirrors the role sleep plays in biological memory — compression, integration, and pruning during off-duty cycles.
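The pass described above can be sketched as a single function over an in-memory view of the store. Everything here is a hedged approximation: `summarize` stands in for whatever local model produces episode summaries, `contradicts` stands in for the shallow contradiction heuristic, and neither name comes from the paper.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    assertion: str
    active: bool = True  # inactive = lost a contradiction, kept for audit

def consolidate(events, facts, summarize, contradicts):
    """One idle-time consolidation pass (illustrative sketch).

    events      -- raw event strings accumulated since the last pass
    facts       -- list of Fact rows, mutated in place
    summarize   -- assumed callable: list[str] -> one-line episode summary
    contradicts -- assumed callable: (old, new) -> bool, the shallow heuristic
    """
    if not events:
        return None
    # 1. Compress recent events into an episode summary.
    episode = summarize(events)
    # 2. "Forget" by marking any contradicted prior assertion inactive;
    #    the losing row is kept so the user can still inspect it.
    for fact in facts:
        if fact.active and contradicts(fact.assertion, episode):
            fact.active = False
    # 3. Promote the durable pattern into a fact row.
    facts.append(Fact(episode))
    return episode
```

A usage example: a new afternoon-meeting pattern retiring an older morning-meeting fact.

```python
facts = [Fact("user prefers morning meetings")]
consolidate(
    ["moved standup to 4pm", "declined 9am invite"], facts,
    summarize=lambda ev: "user prefers afternoon meetings",
    contradicts=lambda old, new: "morning" in old and "afternoon" in new)
```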
Results
Evaluated on an internal suite of multi-session personal tasks (email follow-up, schedule continuity, reference recall across weeks), the memoryless baseline completes 11.2% of tasks correctly; the Orb-augmented agent completes 72.4%. That is a 6.5× improvement with no change to the underlying model. p99 recall latency over a 50-day heavy-use trace sits under 40 ms, and storage at 50 days is approximately 180 MB. On the representative query "Who did I mention in the Tuesday meeting?", recall is 94%.
Visualizing knowing
The user-facing orb encodes three quantities at a glance: fact count, recency of last consolidation, and consolidation state. A dense, recent orb means the agent has been listening; a sparse or stale orb is a visible signal the user can act on. The orb is not ornamental — it is the user's receipt that the memory substrate exists and is up to date.
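The mapping from the three displayed quantities to a coarse orb state might look like the following. The state names and thresholds are illustrative assumptions; the paper does not specify the shipped values.

```python
import time

def orb_state(fact_count, last_consolidation_ts, consolidating,
              now=None, stale_after_s=3600, dense_at=50):
    """Reduce (fact count, recency, consolidation state) to one glanceable state.

    Thresholds (stale_after_s, dense_at) are illustrative, not the shipped values.
    """
    now = now if now is not None else time.time()
    if consolidating:
        return "pulsing"   # consolidation pass in progress
    if now - last_consolidation_ts > stale_after_s:
        return "stale"     # visibly out of date: the user's signal to act
    return "dense" if fact_count >= dense_at else "sparse"
```

The point of collapsing to a single state is legibility: the orb answers "is the memory alive and current?" without asking the user to read numbers.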
Limitations
Multi-user memory (shared household, team accounts) is out of scope for this version. Contradictions above a shallow heuristic threshold are surfaced for manual review rather than auto-resolved. Embedding-based recall is on-demand only; a fully-indexed vector tier is future work.
Cite
@misc{mingllm2026orb,
  title={The Orb Knows: Persistent, Self-Consolidating Memory for Local Agents},
  author={MingLLM Research},
  year={2026},
  url={https://mingllm.com/papers/orb}
}