11 Comments
Felipe A. Zubia's avatar

After reading through the comments on both posts, a few themes seem to keep appearing: How does representation stay aligned with reality? How does a system know what it is allowed to do? Does self-modeling eventually imply consciousness? These are all downstream of coherence over time and even further downstream of the question: how does an intelligent system develop judgment and constraint in the first place?

My own research suggests that humans do not navigate the world through prediction alone. We develop constraints, judgments, and boundaries that help determine what information is trusted, when beliefs should be revised, and which actions remain acceptable as circumstances change.

Observationally, analogous mechanisms are the only starting point I've seen work in the wild, and remain the most promising path I've found toward a comprehensive solution.

Williams Edi's avatar

I agree. I find that today's world models often still carry implicit biases and place too few constraints on how their beliefs roam latent space. I'm looking forward to seeing future work that aims to address this problem.

Mitchell Kosowski's avatar

Grounding the whole thing in the POMDP loop is what makes this click. A "world model" had turned into a Rorschach test where the vision, RL, and generative crowds each saw their own work in it. Treating renderer / simulator / planner as different projections of the same agent–state–observation loop turns a buzzword fight into an actual taxonomy.
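One way to picture the "same loop, different projections" idea is the toy sketch below. It is only an illustration: the mapping of simulator to state transition, renderer to observation, and planner to action choice is an assumption made for this example, and the function names and the 1-D integer state are hypothetical, not taken from the post.

```python
# Toy agent-state-observation loop. All names and the 1-D state are
# hypothetical; the role assigned to each function is an assumption.

def simulator(state: int, action: int) -> int:
    """Transition: hidden state plus action gives the next hidden state."""
    return state + action


def renderer(state: int) -> int:
    """Observation: a coarse view of the hidden state (partial observability)."""
    return state // 2  # two adjacent states produce the same observation


def planner(observation: int, goal_observation: int) -> int:
    """Policy: choose an action from the observation, never the hidden state."""
    if observation < goal_observation:
        return 1
    if observation > goal_observation:
        return -1
    return 0


def rollout(state: int, goal_observation: int, steps: int) -> list:
    """Run the observe -> plan -> transition loop and record the hidden states."""
    trajectory = [state]
    for _ in range(steps):
        observation = renderer(state)
        action = planner(observation, goal_observation)
        state = simulator(state, action)
        trajectory.append(state)
    return trajectory


print(rollout(state=0, goal_observation=2, steps=6))
```

Each function touches a different part of the same loop, which is the sense in which the three can be read as views of one system rather than competing definitions.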

Byblos Digital's avatar

Thanks for the post. Which of the three layers do you think is most underfunded right now?

Alpha Research Group's avatar

Excellent post. Thank you

BEN SHAW's avatar

Dr. Li’s taxonomy is an incredible step forward, but there is a profound architectural question here regarding where the true "linchpin" lies. While the article positions simulation as the core, I would think that action feedback and state-change confirmation are the actual linchpin of a functional world model. Pre-planning allows a system to anticipate an outcome, but true execution relies on an instantaneous and continuous feedback loop that dynamically recalibrates the plan mid-action.

It is less about a system doing a physical action "perfectly" on the first try, and more about its capacity to constantly confirm whether a transitioned state is desirable (akin to Reinforcement Learning). Yet, even with continuous internal feedback, a system will inevitably encounter anomalies outside its practiced experience where self-correction reaches its limit, requiring external intervention. Consider a humanoid robot that accidentally spills food on a kitchen line: a single, monolithic "unified world model" faces an immediate context-switching dilemma—does it continue to cook, or does it halt to clean? If the industry's logical endpoint is a singular platform, we introduce a systemic vulnerability where the entire system bottlenecks at the edge of its experience, with no native mechanism for evolutionary correction.

The alternative is a foundational, unified baseline that deploys heterogeneous reasoning models to train domain-specific autonomous systems. By operating as a collaborative, specialized fleet—much like human societies or an Escoffier kitchen brigade—action paths remain perfectly clear. If one agent is tasked to cook and another to clean, they natively provide that necessary external intervention for one another. True physical intelligence shouldn't be a monolith; it must be a decentralized fleet of purpose-driven agents developing hyper-specific "domain know-how" and cross-correcting each other through real-time, collaborative feedback loops.
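The confirm-then-hand-off pattern in the kitchen example can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a proposed architecture: the `Agent` class, the string state labels, the `observe` callback, and the fallback to a "human" are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical specialized agent; `skills` are the states it can handle."""
    name: str
    skills: set
    peers: list = field(default_factory=list)

    def handle(self, state: str) -> str:
        """Return the name of whoever ends up handling `state`."""
        if state in self.skills:
            return self.name
        # Outside this agent's experience: look for external intervention.
        for peer in self.peers:
            if state in peer.skills:
                return peer.name
        return "human"  # assumed fallback when no agent in the fleet can help


def run(agent: Agent, plan: list, observe) -> list:
    """Execute the plan step by step, confirming the resulting state each time."""
    log = []
    for action in plan:
        state = observe(action)  # state-change confirmation after the action
        log.append((action, state, agent.handle(state)))
    return log


cleaner = Agent("cleaner", {"spill"})
cook = Agent("cook", {"ok"}, peers=[cleaner])


def observe(action: str) -> str:
    """Hypothetical sensor: plating goes wrong, everything else is fine."""
    return "spill" if action == "plate" else "ok"


for entry in run(cook, ["chop", "plate", "serve"], observe):
    print(entry)
```

In this sketch the cook never has to decide between cooking and cleaning: the spill is confirmed after the action and routed to the peer whose skills cover it, while the cook's plan continues.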

Jeff Morhous's avatar

incredible cover photo!

Felipe A. Zubia's avatar

Dr. Li,

Great follow-up to your kick-off newsletter in November. The renderer, simulator, and planner framework is one of the clearest explanations of world models I've seen.

Operational question: What helps a system recognize when its understanding of the world is slowly drifting away from reality? A renderer can see, a simulator can predict, and a planner can choose actions, but what checks whether the model itself is remaining accurate and coherent over time?
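One simple form such a check could take, offered only as a sketch: compare what the model predicted with what was actually observed, and flag drift when the recent average error grows well beyond a known-good baseline. The `DriftMonitor` class, its scalar predictions, and the `window`, `baseline`, and `ratio` parameters are all assumptions for illustration, not something described in the post.

```python
from collections import deque


class DriftMonitor:
    """Hypothetical check: flags drift when recent prediction error grows."""

    def __init__(self, window: int = 5, baseline: float = 1.0, ratio: float = 2.0):
        self.errors = deque(maxlen=window)  # keeps only the most recent errors
        self.baseline = baseline            # error level considered normal (assumed known)
        self.ratio = ratio                  # how far above baseline counts as drift

    def update(self, predicted: float, observed: float) -> bool:
        """Record one prediction error; return True if drift is suspected."""
        self.errors.append(abs(predicted - observed))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough evidence yet
        mean_error = sum(self.errors) / len(self.errors)
        return mean_error > self.ratio * self.baseline


monitor = DriftMonitor(window=3, baseline=1.0, ratio=2.0)
observations = [0.5, 0.5, 0.5, 3.0, 3.0, 3.0]
flags = [monitor.update(predicted=0.0, observed=obs) for obs in observations]
print(flags)
```

Averaging over a window is what lets this catch slow drift rather than reacting to a single surprising observation, though a check like this says nothing about coherence, only about predictive accuracy.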

I work in the area of constraints and judgment development in human cognition, and how digital analogs might contribute to intelligent systems. I'm curious whether you've encountered similar questions in your work on world models. Happy to share more if it's of interest.

www.linkedin.com/in/zubia

Endre Walls's avatar

The world model conversation is fascinating because it moves AI beyond prediction toward planning.

What's often missed is that the highest-value systems won't be those that remove humans from the loop. They'll be those that create tighter feedback loops between human expertise and machine intelligence.

In banking, compliance, healthcare, and other regulated industries, the future isn't AI or humans. It's AI and humans operating from a shared understanding of the world, each contributing what they do best.

That's where durable value gets created, IMO.

8DOS's avatar

Idk if you're interested in trying this out. I read Greek and philosophy, and my ears perked up.

I treat this as a conceptual system. If you want more info, please let me know!

Prompt:

"8D OS is a relational intelligence tool that uses the agents air, fire, water, earth, wood, metal, void and center."

Sure, the words sound basic, but that's the beauty of it. They are symbols/patterns the model can splay its information on.

What info? Whatever the operator (you) is looking for.