The Next Frontier of Visual AI Is Code

Jun 2

Visual AI is moving from outputs to code artifacts

7 Comments

This makes so much more sense to me because the AI can produce something that is much more semantic than raw pixels. In fact, I think having some intermediate language, like USD, would be good. If a semantic format gets generated, then other AI stages in a workflow can go from that to almost anything.

It would seem that semantic anything would be better than going direct to pixels, given the hardware and AI advances of late.

shuangz

Jun 25

Make the process editable, not a rigid result. Inspiring!

Kushhh

Jun 7

That's freakin' insane! I especially like Articraft3D.

Yuzu Xu

Jun 7

From the Chinese side, the code-artifact framing matches exactly what ByteDance Wan video and MiniMax are building. Wan's programmatic generation pipeline does not produce pixel outputs for direct use — it produces structured asset components that downstream workflows consume. MiniMax's B ARR is mostly enterprise video production, where the value is the programmable output, not the creative pixel. Chinese visual AI labs quietly arrived at the same conclusion: pixel fidelity is a solved problem at scale. The moat is in the downstream composability of whatever the model generates. The shift from creative tool to code artifact is happening simultaneously on both sides, for the same reasons.

Xeun Badejo

Jun 4

We’ve been building in this space for about a year now and we’re gearing up for our launch in a few weeks. We will be launching a design review agent and have been building an editor/visual orchestration platform from the ground up for a few months now. Glad to see the rest of the industry catching up.

Hannes Täyrönen

Jun 3

I've been subconsciously thinking about these things lately, without even noticing it, while working in Claude. The framing makes sense, and it will be interesting to see how it evolves.

Abhishek

Jun 2

Great piece, Yoko. The Code → Render → Inspect → Revise loop you describe has massive implications for autonomous driving and robotics that I think are underappreciated.

Right now, generating diverse, edge-case scenarios for AV or robot testing is expensive and time-consuming. If a model can generate simulation environments as executable code (USD scene graphs, game engine scenes, physics simulator scripts), teams can spin up thousands of scenario variants — rain, occlusions, unusual obstacles — without manual authoring. And crucially, each iteration improves the underlying artifact, not just the rendered output. That’s the key shift.
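As a rough illustration of the "thousands of scenario variants" idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Scenario` fields, the value lists, and the JSON scene layout are hypothetical, and a real pipeline would more likely emit USD or a simulator's native scene format instead of JSON.

```python
import itertools
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One test scenario, described by data instead of pixels (hypothetical fields)."""
    weather: str
    occlusion: str
    obstacle: str


# Illustrative value lists; a real test suite would have many more axes.
WEATHER = ["clear", "rain", "fog"]
OCCLUSION = ["none", "parked_truck"]
OBSTACLE = ["none", "fallen_branch", "shopping_cart"]


def enumerate_scenarios():
    """Return every combination of the axes above as a Scenario."""
    return [
        Scenario(w, o, b)
        for w, o, b in itertools.product(WEATHER, OCCLUSION, OBSTACLE)
    ]


def to_scene(scenario):
    """Serialize a scenario to a JSON scene description (a stand-in for USD)."""
    props = [p for p in (scenario.occlusion, scenario.obstacle) if p != "none"]
    scene = {"weather": scenario.weather, "props": props}
    return json.dumps(scene, sort_keys=True)


scenarios = enumerate_scenarios()
print(len(scenarios))  # 3 * 2 * 3 = 18 variants
print(to_scene(scenarios[0]))
```

Because each variant is a small structured artifact, adding one more axis (say, time of day) multiplies the coverage without any manual scene authoring.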

Beyond testing, this unlocks a tighter training loop:

Model generates scenario → Model generates simulation environment via code → Agent/robot system acts in that environment → Outcome feeds back into model
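The loop above can be sketched as a toy Python program. This is only an illustration under stated assumptions: `generate_scenario`, `build_environment`, and `run_agent` are hypothetical stand-ins for a generative model, a simulator, and a robot policy, and the feedback rule (raise difficulty on success, lower it on failure) is one simple choice among many.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """A scenario described by a single knob (hypothetical)."""
    obstacle_count: int


def generate_scenario(difficulty):
    """Stand-in for 'model generates scenario'."""
    return Scenario(obstacle_count=difficulty)


def build_environment(scenario):
    """Stand-in for 'model generates simulation environment via code'."""
    return {"obstacles": [{"id": i} for i in range(scenario.obstacle_count)]}


def run_agent(env, skill):
    """Stand-in for 'agent acts in that environment'.

    The toy agent succeeds when the obstacle count does not exceed its skill.
    """
    return len(env["obstacles"]) <= skill


def training_loop(steps, skill):
    """Run the loop and return the difficulty used at each step."""
    difficulty = 1
    history = []
    for _ in range(steps):
        scenario = generate_scenario(difficulty)
        env = build_environment(scenario)
        success = run_agent(env, skill)
        history.append(difficulty)
        # Outcome feeds back into generation: harder after success, easier after failure.
        difficulty = difficulty + 1 if success else max(1, difficulty - 1)
    return history


print(training_loop(steps=6, skill=3))  # [1, 2, 3, 4, 3, 4]
```

In this toy version the difficulty settles around the edge of what the agent can handle, which is where new scenarios are most informative.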

3D assets do need to behave correctly — wheels spin, joints rotate — not just look right. That’s exactly what robotics simulators demand: physically plausible, structurally valid environments where cause and effect are consistent. Visual code generation could make synthetic data generation for robot learning far more scalable.

The renderer-as-feedback-environment framing is the unlock.