4 Comments
Ed McPadden

This makes so much more sense to me because the AI can produce something that is much more semantic than raw pixels. In fact, I think having some intermediate language, like USD, would be good. If a semantic format gets generated, then other AI stages in a workflow can go from that to almost anything.

It would seem that almost anything would be better than going direct to pixels, given the hardware and AI advances of late.

Xeun Badejo

We’ve been building in this space for about a year now and we’re gearing up for our launch in a few weeks. We will be launching a design review agent and have been building an editor/visual orchestration platform from the ground up for a few months now. Glad to see the rest of the industry catching up.

Hannes Täyrönen

I've been subconsciously thinking about these things, without even noticing it, while working in Claude recently. The framing makes sense, and it will be interesting to see how it evolves.

Abhishek

Great piece, Yoko. The Code → Render → Inspect → Revise loop you describe has massive implications for autonomous driving and robotics that I think are underappreciated.

Right now, generating diverse, edge-case scenarios for AV or robot testing is expensive and time-consuming. If a model can generate simulation environments as executable code (USD scene graphs, game engine scenes, physics simulator scripts), teams can spin up thousands of scenario variants — rain, occlusions, unusual obstacles — without manual authoring. And crucially, each iteration improves the underlying artifact, not just the rendered output. That’s the key shift.
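To make the "thousands of scenario variants" idea concrete, here is a rough sketch of what enumerating variants as structured scene descriptions might look like. Everything in it is an assumption for illustration: the parameter axes, the `Scenario` class, and the JSON output are hypothetical stand-ins for whatever a real pipeline would emit (a USD scene graph, a game engine scene, or a simulator script).

```python
from dataclasses import dataclass, asdict
from itertools import product
import json

# Hypothetical parameter axes; names and values are illustrative only.
WEATHER = ["clear", "rain", "fog"]
OCCLUSION = ["none", "parked_truck"]
OBSTACLE = ["none", "fallen_branch", "shopping_cart"]


@dataclass(frozen=True)
class Scenario:
    weather: str
    occlusion: str
    obstacle: str

    def to_scene(self) -> str:
        """Serialize to a JSON scene description.

        JSON is only a stand-in here for USD or engine-specific scene code.
        """
        return json.dumps({"scenario": asdict(self)}, sort_keys=True)


def generate_variants() -> list[Scenario]:
    """Enumerate every combination of the parameter axes."""
    return [
        Scenario(weather, occlusion, obstacle)
        for weather, occlusion, obstacle in product(WEATHER, OCCLUSION, OBSTACLE)
    ]


variants = generate_variants()
print(len(variants))           # 3 * 2 * 3 = 18 combinations
print(variants[0].to_scene())  # first variant as a scene description
```

Because each variant is a structured artifact rather than a rendered image, adding a new axis (say, time of day) multiplies the scenario count without any manual authoring, and a revision step can edit the description itself.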

Beyond testing, this unlocks a tighter training loop:

Model generates scenario → Model generates simulation environment via code → Agent/robot system acts in that environment → Outcome feeds back into model

3D assets do need to behave correctly — wheels spin, joints rotate — not just look right. That’s exactly what robotics simulators demand: physically plausible, structurally valid environments where cause and effect are consistent. Visual code generation could make synthetic data generation for robot learning far more scalable.

The renderer-as-feedback-environment framing is the unlock.