5 Comments
Mitchell Kosowski

The Memento framing nails it and I'd like to build on your point about auditability. In enterprise contexts, the training/deployment boundary isn't just engineering convenience. It's what makes AI legally shippable. Once models compress user experience into weights, every update becomes simultaneously an ML experiment and a data lineage event.

That's why the middle ground feels like where near-term wins live: frozen bases with auditable module layers. Parametric learning isn't blocked by technical limits so much as by the fact that procurement won't sign off on a model whose weights drift every Tuesday or full moon.

The filing cabinet stays locked. You just get differentiable drawer keys.

Shanka Jayasinha

Great summary of the problem at hand. We believe it can be solved with the right architecture.

Timmy Ghiurau

Best piece we've read on this. The spectrum is right: context, modules, weights. The filing cabinet framing is the one the memory category needed.

We've been building the weight-level end for two years.

www.midbrain.ai

Kevin Keller

We're building the Artificial Doubt Engine (ADE, pronounced "Ah-Dee") at Tucuxi: https://tucuxi.ai/

We've applied to A16Z Speedrun, so reach out if you're really interested in funding the radical bet: novel architectures that learn continuously. That said, we fundamentally disagree with your premise that "you should build the learning mechanism into the substrate." The substrate, the model, is a flat neural substrate; reasoning, and learning more advanced reasoning, should live in a separate harness that can wrap around any model. There's no filing cabinet, just right-sized recurrent neural networks that compress tacit knowledge about the right depth and scope of reasoning, plus feedback signals that span the time frame over which outcomes can actually be determined. Many of the decisions we learn from as humans have outcomes that arrive much further down the road; if we neglect those outcomes, the learning signal that could have made us, or a system, better is simply thrown away.

Everyone else is building AGI — Artificial General Intelligence. Systems that know. We're building ADE — a system that doubts. Because we believe that doubt is more fundamental than knowledge. Knowledge is what doubt produces when it operates long enough across enough dimensions. Intelligence is what you get when doubt calibration is sufficiently compiled. And the only system you can trust is one that doesn't trust itself.

If you want to learn more: https://medium.com/@hikevin/how-we-named-ade-8f0e4116a450

Scott

Really good piece.

We've spent the last year at Memco pretty deep in this question, and my main reaction is that people still make the category too model-centric. "Continual learning" gets treated as if it only counts when weights change. That's too narrow.

In practice, learning happens across at least three layers: the model, the harness, and the memory/context layer. Long term, I agree the deeper answer probably does involve stronger parametric learning. But enterprises absolutely want agents that get smarter every day, and the near-term question is where that learning shows up first: in opaque weight updates, or in a memory layer that can retain, validate, scope, and reuse what the organization has actually learned.

That matters because most enterprise intelligence is local. Which fix actually worked. Which workflow changed. Which runbook is stale. Which exception matters. Who else should inherit that lesson. There's also a big gap between individual continual learning and organizational continual learning. A lot of the real pain is not "the model can't learn." It's "agent A is re-learning what agent B already discovered last week."

So I'd frame external memory less as a dodge around continual learning, and more as the practical bridge to it. But the bar is high. A bigger filing cabinet is still a filing cabinet. Traces are raw material, not the product. The hard part is deciding what gets admitted, validated, scoped, decayed, and reused.

And there's another reason this matters: if you run those experiential loops well, you generate an incredibly valuable corpus for parametric memory later. Not generic trace sludge — validated experience. The kind of thing you could plausibly compress into LoRA-style updates, adapters, or other scoped weight updates for frontier or open models. So active memory isn't an alternative to parametric learning. It's one of the cleanest ways to prepare for it.
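As a rough mechanical illustration of "LoRA-style updates, adapters, or other scoped weight updates", here is a minimal low-rank-adapter sketch with NumPy. Shapes, names, and initialization are assumptions for illustration, not a production recipe:

```python
import numpy as np

# LoRA-style idea: the base weight W stays frozen; learning is captured in a
# low-rank pair (A, B) that can be audited, scoped to a team, or rolled back.
rng = np.random.default_rng(0)

d_in, d_out, rank = 16, 16, 4
W = rng.standard_normal((d_out, d_in))         # frozen base weights
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, rank))                    # trainable up-projection, zero-init
scale = 1.0 / rank

def forward(x):
    # Base path is untouched; the adapter path carries the learned delta.
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapter contributes nothing yet,
# so the adapted model starts out exactly equal to the frozen base:
assert np.allclose(forward(x), W @ x)
```

This is also why the "filing cabinet stays locked" framing above fits: the frozen base is auditable precisely because the drift lives in a small, separable delta.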