11 Comments
Jaci Turner

One thing that stands out across reliability engineering, safety frameworks, and observability is a shared requirement: systems must recognize uncertainty and respond appropriately at the boundary.

In practice, deployment success often hinges not just on containing failures technically, but on whether humans trust a system’s judgment — especially when it chooses not to act. That trust layer is hard to benchmark, but critical at scale.

Rafayel Ghasabyan

Fantastic piece, Oliver. It highlights the exact tensions we see in the field. At TACTUN, we’ve focused on building the infrastructure for real-time deterministic control (System 1) so that frontier models like RT-2 and π-0.5 can actually run on real machines in the wild. I wrote a short response on our approach here – thank you for spurring this important conversation!

https://rafayelg.substack.com/p/bridging-the-physical-ai-deployment

Rohit Tamidapati

The "Deployment Gap" isn’t just a data problem; it’s an architectural one.

Great piece, Oliver, thank you.

Your analysis of the "99.9% reliability threshold" resonates deeply with my work on resilient swarm intelligence. I believe a shift in how we perceive the agent-environment boundary is the key to bridging this gap, which is why I developed FLOWRRA.

Most learned systems fail because they treat agents and environments as separate, making them incredibly brittle to distribution shifts. Instead of over-relying on external feedback, FLOWRRA treats the agent's internal operational state as its primary environment.

By optimizing for a measurable Flow Coherence Metric, FLOWRRA maintains a long tail of its own stable configurations. We use this historical manifold as the 'ground state' for our retrocausal WFC, allowing the system to leap back to a proven coherent structure the moment an environmental edge-case is detected.

Two points from your article that FLOWRRA specifically solves for:

1.) Reliability through Retrocausal WFC: When continuous adaptation fails, FLOWRRA uses a retrocausal-inspired wave function collapse to make a discontinuous leap back to known stable states or forward to projected coherent configurations. This transforms failure from a "system crash" into a "strategic reconfiguration."

2.) Real-World Scalability via GNNs: While my current implementation was tested on 32 nodes due to local compute limits, the use of Graph Attention Networks (GAT) means the architecture is inherently topology-agnostic. It isn't a "32-node model"; it’s an $N$-node architecture designed for the federated, high-frequency control environments you’re advocating for.

We recently stress-tested this with a 50% hardware failure rate (node freezing), and the swarm maintained 100% mission coverage by "breathing": stretching its GNN-driven edges to reroute around "dead weight" in real time.
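For readers unfamiliar with the revert-to-stable-state pattern described above, here is a minimal toy sketch in Python. All names and logic are illustrative assumptions on my part, not FLOWRRA's actual implementation: the idea is simply to archive configurations whose coherence score is acceptable, and snap back to the most recent archived one when the score drops.

```python
# Toy sketch (hypothetical, NOT FLOWRRA's code): monitor a scalar
# "coherence" metric and, when it falls below a threshold, discard the
# current state and revert to the last configuration known to be stable.

class CoherenceGuard:
    def __init__(self, threshold):
        self.threshold = threshold
        self.stable_states = []  # history of configurations that scored well

    def step(self, state, coherence):
        """Return the state to act from: the current one if coherent,
        otherwise the most recently archived stable configuration."""
        if coherence >= self.threshold:
            self.stable_states.append(state)
            return state
        # Edge case detected: a discontinuous "leap" back to a proven
        # configuration instead of a system crash.
        return self.stable_states[-1] if self.stable_states else state


guard = CoherenceGuard(threshold=0.8)
s1 = guard.step({"formation": "line"}, coherence=0.95)   # coherent: archived
s2 = guard.step({"formation": "broken"}, coherence=0.3)  # reverts to archive
```

This reframes a failure as a reconfiguration: the low-coherence state is never acted on, and the system continues from a configuration it has already validated.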

I’d love to get your thoughts on how this "internal coherence" approach might close the gap for industrial swarms and pave the way for robotics in general.

Full write-up & dynamics video here: https://rohittamidapati.substack.com/p/flowrra-flow-recognition-reconfiguration

DhaaRn | Weavers of Time.

Violet Herod

So good. It gives a real glimpse into where we are going and MUST go. The Physical World is where I’m at and building.

Sanvar Oberoi

This essay articulates something many of us only realize after shipping into real operations: the deployment gap isn’t a research lag, it’s an economic and operational filter.

In production environments, the question is rarely “is the model smart?” and almost always “can this live inside legacy systems, hit reliability thresholds, degrade gracefully, and be maintained by non-experts?”

What consistently unlocks value isn’t frontier intelligence alone, but the layers around it - reliability engineering, observability, integration, and failure containment. Without those, even strong learning systems stay stuck as demos.

Feels like the biggest Physical AI outcomes this decade will come from companies building that missing infrastructure layer - the one that converts capability into uptime and benchmarks into EBITDA.

- Sanvar Oberoi, Faclon Labs

Enterprise AI Integrations

This gap is what I see with clients all the time. Demos look perfect but production is different. The messy real world breaks things fast.

Ale Parise

We’re actively building in this space. The deployment gap you describe perfectly reflects what we’re seeing in real industrial environments. At Humandroid, we share this view, which is why we’re building RobotsOS H1, an end-to-end OS that turns human demonstrations into reliable, deployable robot skills. We believe Physical AI will scale through infrastructure, data flywheels, and deployment tooling, not just better models.

https://www.humandro-id.com/robots-os

AquaVis

Good update on the ‘atoms’ side of the AI equation.

We delineate the AI impact along a few dimensions, notably:

Bits - ‘White Collar’ Labor

Atoms - ‘Blue Collar’ Labor

We continue to believe the impact to bits-related Labor (notably codified + entry level) will be the first to experience a nonlinear shift. But the opportunity on the atoms side is massive.

And, agreed, the AI Race is The Race between US/China. Viewed through the lens of our Pillars of Power framework:

'The Technology (AI) Race is The Race to establish Power and Control the Global World Order — everything else, from Energy to Money, flows from there

Technology drives the entire Trump Strategy — AI Supremacy bolsters National Security, improves Productivity, but also requires Control over the full AI stack from Semiconductors to Data Centers

Energy is the Foundational Layer — Unleashing US Energy Dominance powers AI Infrastructure and enables the reshoring of Critical Product Production

Money is now flowing directly into Technology and Energy — Trillions re-directed toward Critical Product Production (e.g., Semiconductor Fabs, Data Centers, Energy Infrastructure)'

https://aquavis.substack.com/p/pop-us-and-the-trump-agenda-6-months

Gerard Rego

Oliver, great article. I wanted to share our point of view with you.

Morgan Stanley and Andreessen Horowitz Agree: Physical AI Is Hitting The Deterministic Physics Wall

Why the Physical AI “deployment gap” is not a tooling failure, a data failure, or a model failure — but a deterministic physics gap

https://gerardrego.substack.com/p/morgan-stanley-and-andreessen-horowitz

Even the Companies Making Humanoid Robots Think They’re Overhyped

Despite billions in investment, startups say their androids mostly aren’t useful for industrial or domestic work yet

https://www.wsj.com/tech/ai/humanoid-robot-hype-use-timeline-1aa89c66

Briggs Rajagopalan

Hey @Oliver Hsu, great article. We at Silkroute.ai have a similar thesis and are building to fill this gap. I just published an article on this myself. Would love for you to take a look and share your thoughts.

https://open.substack.com/pub/briggsrs/p/why-warehouses-are-the-next-big-opportunity?r=5gez24&utm_medium=ios&shareImageVariant=overlay