2026 — ITDT LLC AI Systems Design
The problem with AI memory is structural. Current AI assistants — even agentic ones doing real engineering work — carry no genuine memory between sessions. What looks like memory is a directory of notes, loaded in full each time, that grows stale and strains context. Three structural failures define the gap:
No time axis. The AI cannot tell you what happened last Thursday. Memories are subject-keyed only; episodic recall is missing entirely.
No relevance ranking. The entire memory index loads every session, whether relevant to the current task or not. It strains the context budget and does not scale as the corpus grows.
No automatic capture. Memories exist only when the AI stops and hand-writes a file. The richest detail — the reasoning buried inside a long working session — evaporates at summarization. Once compaction happens, that detail is gone.
These are solvable engineering problems. The solution looks a lot like human memory, the system AI memory should have been modeled on from the start.
Human memory has two natural axes. Episodic memory is time-indexed — “on Thursday we designed the memory database.” Semantic memory is subject-indexed — “we prefer local storage over cloud databases.” Current file-based memory is purely semantic and hand-curated. The new architecture adds the missing axis: every memory carries timestamps, and the system answers time-based queries.
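As a rough illustration of the two axes, the sketch below stores each memory with both a subject and a timestamp, so that "what happened last Thursday" becomes an ordinary range query. The table name `memory`, the column names, and the helper functions are all hypothetical; the source does not specify a schema.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE memory (
        id          INTEGER PRIMARY KEY,
        subject     TEXT NOT NULL,  -- semantic axis
        gist        TEXT NOT NULL,
        occurred_at REAL NOT NULL   -- episodic axis: Unix epoch seconds (UTC)
    )
    """
)
conn.execute("CREATE INDEX memory_by_time ON memory (occurred_at)")
conn.execute("CREATE INDEX memory_by_subject ON memory (subject)")


def remember(subject: str, gist: str, when: datetime) -> None:
    conn.execute(
        "INSERT INTO memory (subject, gist, occurred_at) VALUES (?, ?, ?)",
        (subject, gist, when.timestamp()),
    )


def recall_on_day(day: datetime) -> list[str]:
    """Episodic recall: gists of everything that happened on the given day."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    rows = conn.execute(
        "SELECT gist FROM memory "
        "WHERE occurred_at >= ? AND occurred_at < ? ORDER BY occurred_at",
        (start.timestamp(), end.timestamp()),
    )
    return [gist for (gist,) in rows]


def recall_about(subject: str) -> list[str]:
    """Semantic recall: gists keyed by subject, oldest first."""
    rows = conn.execute(
        "SELECT gist FROM memory WHERE subject = ? ORDER BY occurred_at",
        (subject,),
    )
    return [gist for (gist,) in rows]


thursday = datetime(2026, 1, 8, 14, 30, tzinfo=timezone.utc)
remember("memory-db", "Designed the memory database", thursday)
remember("storage", "Chose local storage over a cloud database",
         thursday + timedelta(days=1))
print(recall_on_day(thursday))
```

Storing epoch seconds rather than formatted strings is a choice made here only to keep range comparisons unambiguous; any consistently ordered timestamp encoding would serve.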
The deeper connection: semantic memories should be derived from episodic ones over time, through consolidation. The system observes patterns across episodes and abstracts them into standing facts. A person does not explicitly store “I prefer X” — they notice it across dozens of episodes and it rises to the surface. The AI memory system does the same.
Every memory entry carries two representations: a gist (the compact summary) and detail (the full content). Recall returns gists inexpensively; detail is fetched only when the conversation drills in. This is both the human-like behavior — “I remember we worked on that; want the details?” — and the mechanism that keeps recall within the context budget. Storage is cheap. Context is not.
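One way to picture the two-resolution entry is to keep the gist and the detail in separate tables, so that recall never touches the large rows unless asked. This is a minimal sketch under that assumption; the names `memory`, `memory_detail`, `recall_gists`, and `drill_in` are illustrative, not part of the described design.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memory (
        id   INTEGER PRIMARY KEY,
        gist TEXT NOT NULL            -- compact summary, cheap to load
    );
    CREATE TABLE memory_detail (
        memory_id INTEGER PRIMARY KEY REFERENCES memory (id),
        detail    TEXT NOT NULL       -- full content, fetched on demand
    );
    """
)


def store(gist: str, detail: str) -> int:
    cur = conn.execute("INSERT INTO memory (gist) VALUES (?)", (gist,))
    conn.execute(
        "INSERT INTO memory_detail (memory_id, detail) VALUES (?, ?)",
        (cur.lastrowid, detail),
    )
    return cur.lastrowid


def recall_gists(limit: int = 10) -> list[tuple[int, str]]:
    """Return only (id, gist) pairs; detail rows are never read here."""
    rows = conn.execute(
        "SELECT id, gist FROM memory ORDER BY id DESC LIMIT ?", (limit,)
    )
    return rows.fetchall()


def drill_in(memory_id: int) -> Optional[str]:
    """Fetch the full detail for one memory, or None if it does not exist."""
    row = conn.execute(
        "SELECT detail FROM memory_detail WHERE memory_id = ?", (memory_id,)
    ).fetchone()
    return row[0] if row else None


mid = store("Worked on the memory schema", "Full transcript of the session ...")
print(recall_gists())   # gists only
print(drill_in(mid))    # detail, only when the conversation drills in
```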
Pure keyword search does not feel like memory. People recall by similarity and association: one memory surfaces another. The store is therefore hybrid — relational structure for integrity, full-text search for precision, vector embeddings for fuzzy associative recall, and a typed link graph between memories.
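To make the hybrid idea concrete, the toy sketch below blends an exact keyword score with cosine similarity over embeddings. It is pure Python for illustration only: the two-dimensional vectors are hand-made stand-ins for real embeddings, and the weight `alpha` and all function names are assumptions rather than anything the design prescribes.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_rank(query, query_vec, memories, alpha=0.5):
    """Rank (text, vector) pairs by a blend of keyword and vector scores."""
    scored = [
        (alpha * keyword_score(query, text)
         + (1 - alpha) * cosine(query_vec, vec), text)
        for text, vec in memories
    ]
    return [text for _, text in sorted(scored, key=lambda p: p[0], reverse=True)]


memories = [
    ("sqlite schema design for the memory store", [1.0, 0.0]),
    ("lunch order preferences", [0.0, 1.0]),
    ("database tiering and cold archive", [0.9, 0.1]),
]
ranked = hybrid_rank("sqlite memory", [1.0, 0.0], memories)
print(ranked)
```

In this example the tiering memory shares no keyword with the query, yet it ranks above the unrelated one because its vector is close to the query vector. That is the associative behavior keyword search alone cannot provide.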
Links carry semantic weight: refines, supersedes, or relates. When two memories about the same subject conflict, the more recent one supersedes the older — but the older is tombstoned with its timestamp, not deleted. The chain of “we used to do X, then switched to Y on this date, for these reasons” is preserved. That trail is the record of learning.
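The supersede-and-tombstone behavior can be sketched with a link table and a recursive CTE that walks the chain backwards. The schema, the `tombstoned_at` column, and the `supersede` and `history` helpers are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memory (
        id            INTEGER PRIMARY KEY,
        subject       TEXT NOT NULL,
        gist          TEXT NOT NULL,
        created_at    TEXT NOT NULL,
        tombstoned_at TEXT            -- NULL while the memory is live
    );
    CREATE TABLE link (
        src_id INTEGER NOT NULL REFERENCES memory (id),
        dst_id INTEGER NOT NULL REFERENCES memory (id),
        kind   TEXT NOT NULL CHECK (kind IN ('refines', 'supersedes', 'relates')),
        reason TEXT
    );
    """
)


def add_memory(subject: str, gist: str, at: str) -> int:
    cur = conn.execute(
        "INSERT INTO memory (subject, gist, created_at) VALUES (?, ?, ?)",
        (subject, gist, at),
    )
    return cur.lastrowid


def supersede(old_id: int, subject: str, gist: str, at: str, reason: str) -> int:
    """Add a newer memory and tombstone the old one; nothing is deleted."""
    new_id = add_memory(subject, gist, at)
    conn.execute("UPDATE memory SET tombstoned_at = ? WHERE id = ?", (at, old_id))
    conn.execute(
        "INSERT INTO link (src_id, dst_id, kind, reason) "
        "VALUES (?, ?, 'supersedes', ?)",
        (new_id, old_id, reason),
    )
    return new_id


def history(current_id: int) -> list[tuple[str, str]]:
    """Walk 'supersedes' links from the current memory back to the oldest."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain (id, depth) AS (
            SELECT ?, 0
            UNION ALL
            SELECT link.dst_id, chain.depth + 1
            FROM link JOIN chain ON link.src_id = chain.id
            WHERE link.kind = 'supersedes'
        )
        SELECT m.gist, m.tombstoned_at
        FROM chain JOIN memory AS m ON m.id = chain.id
        ORDER BY chain.depth
        """,
        (current_id,),
    )
    return rows.fetchall()


old = add_memory("storage", "Use a cloud database", "2026-01-05")
new = supersede(old, "storage", "Use local SQLite", "2026-01-08",
                "lower latency, simpler operations")
print(history(new))
```

The tombstone timestamp on the older row, together with the `reason` on the link, is what preserves the "used to do X, switched to Y on this date, for these reasons" trail.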
The conflict gate has two dimensions: confidence in the resolution, and impact on the owner’s understanding. Self-evident conflicts resolve silently. Genuinely ambiguous conflicts surface for discussion. Clear-to-the-AI-but-consequential conflicts are surfaced and explained — even when the AI is certain, because the owner must understand what the resolution implies.
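The two-dimensional gate can be read as a small decision function. The sketch below assumes both dimensions are scored in the range 0 to 1 and uses placeholder thresholds; the scores, the thresholds, and the `Action` names are illustrative assumptions, and how confidence and impact would actually be estimated is left open.

```python
from enum import Enum


class Action(Enum):
    RESOLVE_SILENTLY = "resolve silently"
    DISCUSS = "surface for discussion"
    RESOLVE_AND_EXPLAIN = "surface and explain"


def conflict_gate(
    confidence: float,
    impact: float,
    confident_at: float = 0.9,      # hypothetical threshold
    consequential_at: float = 0.5,  # hypothetical threshold
) -> Action:
    """Map (confidence in the resolution, impact on the owner) to an action."""
    if confidence < confident_at:
        # Genuinely ambiguous: the owner decides.
        return Action.DISCUSS
    if impact >= consequential_at:
        # Clear to the AI, but the owner must understand the implication.
        return Action.RESOLVE_AND_EXPLAIN
    # Self-evident and low-stakes.
    return Action.RESOLVE_SILENTLY


print(conflict_gate(confidence=0.97, impact=0.1))
print(conflict_gate(confidence=0.60, impact=0.8))
print(conflict_gate(confidence=0.97, impact=0.8))
```

In this sketch an ambiguous conflict is surfaced regardless of impact. The text does not say what should happen to ambiguous but trivial conflicts, so that branch is an assumption.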
This is not a Big Data problem. Distributed cluster infrastructure solves volume and throughput at terabyte-to-petabyte scale. A single agent’s corpus is text and tool I/O — orders of magnitude below that threshold. The right store is SQLite, with full-text search (FTS5) and a vector extension (sqlite-vec). Everything that sounds like it needs exotic infrastructure maps to: JSON columns for heterogeneity, a vector index for association, a link table for the graph, and recursive CTEs for traversal.
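As an example of the "JSON columns for heterogeneity" mapping, the sketch below keeps differently shaped records in one table and queries inside the JSON with `json_extract`. It assumes the SQLite build bundled with Python includes the JSON functions, which are built in by default since SQLite 3.38. The table layout and the `kind` values are hypothetical, and FTS5 and sqlite-vec are deliberately left out so the example runs on the standard library alone.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE memory (
        id      INTEGER PRIMARY KEY,
        kind    TEXT NOT NULL,   -- e.g. 'tool_call', 'decision'
        gist    TEXT NOT NULL,
        payload TEXT NOT NULL    -- JSON document; its shape varies by kind
    )
    """
)


def store(kind: str, gist: str, payload: dict) -> int:
    cur = conn.execute(
        "INSERT INTO memory (kind, gist, payload) VALUES (?, ?, ?)",
        (kind, gist, json.dumps(payload)),
    )
    return cur.lastrowid


def tool_calls_using(tool: str) -> list[str]:
    """Query inside the JSON payload without a dedicated column per field."""
    rows = conn.execute(
        "SELECT gist FROM memory "
        "WHERE kind = 'tool_call' AND json_extract(payload, '$.tool') = ? "
        "ORDER BY id",
        (tool,),
    )
    return [gist for (gist,) in rows]


store("tool_call", "Searched the repo for TODOs", {"tool": "grep", "exit_code": 0})
store("tool_call", "Ran the test suite", {"tool": "pytest", "failed": 2})
store("decision", "Prefer local storage", {"alternatives": ["cloud database"]})
print(tool_calls_using("grep"))
```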
The corpus is nonetheless substantial at full fidelity — single-digit to low-tens of GB over years of heavy use, not tens of megabytes. Episodic richness, multiple resolution levels, tombstoned history, and vector embeddings accumulate. Scale is managed not by clustering but by tiering, modeled on the human memory hierarchy:
| Tier | Contents | Store |
|---|---|---|
| Recent episodic (hot) | Full fidelity, full vectors — last weeks of experience | SQLite + vector index |
| Long-term semantic | Consolidated facts, always live | SQLite (compact) |
| Cold episodic archive | Old raw detail, rarely queried | Compressed columnar (Parquet-style) |
A consolidation pipeline — the “sleep” step — continuously moves experience down the tiers, lossily. Only the cold archive grows without bound, and it is cold precisely because it is almost never queried. Forgetting and human-likeness are the same property: a system that consolidates intelligently is simultaneously human-like and scale-controlled.
The retention and consolidation policy is the hard 80% of the problem. Get it right and scale takes care of itself. Get it wrong — “keep everything at full fidelity forever” — and the result is a tape archive, not a memory.
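A deliberately crude sketch of one consolidation pass follows. Episodes older than the hot window leave the hot tier and their raw detail goes to the cold archive; subjects that recur often enough are promoted as candidates for standing facts. The window length, the repeat threshold, and the use of a simple count as the "pattern" are placeholder assumptions. The real policy is exactly the hard part the text describes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    day: int       # day number; stands in for a real timestamp
    subject: str
    gist: str
    detail: str


def consolidate(hot, today, hot_days=14, min_repeats=3):
    """One 'sleep' pass: returns (still_hot, semantic_candidates, cold_rows)."""
    still_hot = [e for e in hot if today - e.day < hot_days]
    aged = [e for e in hot if today - e.day >= hot_days]

    # Subjects seen across many episodes become candidate standing facts.
    counts = Counter(e.subject for e in hot)
    semantic = {s: n for s, n in counts.items() if n >= min_repeats}

    # Raw detail of aged episodes moves to the cold archive.
    cold = [(e.day, e.subject, e.detail) for e in aged]
    return still_hot, semantic, cold


episodes = [
    Episode(1, "storage", "Picked SQLite", "long transcript A"),
    Episode(3, "storage", "Rejected a cloud database", "long transcript B"),
    Episode(20, "storage", "Confirmed local storage", "long transcript C"),
    Episode(21, "naming", "Renamed a table", "long transcript D"),
]
still_hot, semantic, cold = consolidate(episodes, today=22)
print([e.gist for e in still_hot], semantic, cold)
```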
The same architecture spans a Raspberry Pi and a data center. Only two dials change.
The memory dial has two inputs. Capacity determines how much is kept: a small disk means a shorter hot window, lower fidelity, more aggressive consolidation, heavier vector quantization. Speed determines how much stays online versus cold: fast NVMe keeps a large hot set live; slow storage pushes more detail into the cold archive. The recall interface is identical regardless of dial setting. The reasoning model never special-cases a low-memory endpoint; it asks for recall and receives the best that endpoint can give. Same mind, different-sized body.
The cognition dial determines where thinking happens. At the robot end, cognition is onboard, forced by latency, connectivity, and autonomy requirements. At the agent end, cognition is central: a large shared model tolerates round-trips. The split between local reflex work (System 1: recall ranking, clustering, eviction) and deliberative work (System 2: consolidation, conflict resolution, interpretation) slides with available compute and connectivity.
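The two inputs can be pictured as a function from hardware to retention settings. Every number below is invented for illustration, as are the names `MemoryDial` and `dial_for`; the point is only that capacity and speed select parameters, while callers see one and the same settings type on every endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryDial:
    hot_window_days: int   # how long experience stays at full fidelity
    vector_bits: int       # bits per embedding dimension (quantization)
    hot_fraction: float    # share of the corpus kept online


def dial_for(disk_gb: float, fast_storage: bool) -> MemoryDial:
    """Derive settings from capacity and speed (all thresholds hypothetical)."""
    # Capacity: a smaller disk means a shorter hot window and heavier quantization.
    if disk_gb < 32:
        hot_window_days, vector_bits = 7, 8
    elif disk_gb < 512:
        hot_window_days, vector_bits = 30, 16
    else:
        hot_window_days, vector_bits = 90, 32
    # Speed: fast storage keeps more online; slow storage pushes more to cold.
    hot_fraction = 0.5 if fast_storage else 0.1
    return MemoryDial(hot_window_days, vector_bits, hot_fraction)


print(dial_for(disk_gb=16, fast_storage=False))    # small, slow endpoint
print(dial_for(disk_gb=2048, fast_storage=True))   # large, fast endpoint
```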
Local AI for the reflex layer is an option, never a requirement. The algorithmic baseline — clustering and ranking using classical algorithms, no model — works on any hardware. Many endpoints will not have the capacity to run a local model at all, and the system must work fully on those endpoints regardless.
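The "option, never a requirement" rule suggests a ranking function whose classical baseline always runs and whose model hook is purely optional. In the sketch below, Jaccard word overlap stands in for whatever classical ranking is actually used, and `local_model` is a hypothetical callable that re-orders the baseline when one is available.

```python
from typing import Callable, Optional, Sequence


def jaccard(a: str, b: str) -> float:
    """Classical similarity: shared words over all words, no model needed."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank(
    query: str,
    memories: Sequence[str],
    local_model: Optional[Callable[[str, Sequence[str]], list[str]]] = None,
) -> list[str]:
    # The algorithmic baseline works on any hardware.
    baseline = sorted(memories, key=lambda m: jaccard(query, m), reverse=True)
    if local_model is None:
        return baseline
    try:
        # Optional refinement when the endpoint can run a local model.
        return local_model(query, baseline)
    except Exception:
        # A missing or failing model must never break recall.
        return baseline


memories = ["added a vector index", "ordered lunch", "index rebuilt after crash"]
print(rank("vector index", memories))
```

Falling back to the baseline on any model failure is one possible reading of "the system must work fully on those endpoints regardless"; a stricter design might log the failure as well.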
Phase 3 is the real nut to crack. The schema is the easy 20%; the forgetting curve and consolidation trigger are the hard 80%. Phase 1 is a tractable first win. The rest follows from the policy.