Runway’s CEO thinks AI video is just the warm-up for world models

AI-generated video has gone from party trick to legitimate creative tool almost overnight, and Runway has had a front-row seat the whole time. The New York company has raised close to $860 million and is valued at $5.3 billion, putting it toe-to-toe with Google and OpenAI in the generative video arms race. But CEO Cristóbal Valenzuela doesn’t think the endgame is just making better cat videos.

In a recent interview, Valenzuela laid out his vision: AI video is a prequel. The real prize is what he calls “world models”—systems that don’t just predict the next pixel, but understand physics, causality, and 3D environments well enough to simulate entire scenes from scratch. Think less “text-to-video” and more “text-to-reality.”

It’s a bold claim, and one that’s been floated before by researchers at DeepMind and MIT. But Runway is actually shipping products that inch toward it. Their latest model, Gen-3 Alpha, already shows surprisingly coherent object persistence and basic physical interactions—a cup falls and shatters, water flows around obstacles. It’s not photorealistic every time, but the improvement over Gen-2 is dramatic.

Valenzuela argues that video generation forces models to learn implicit rules about how the world works. A model that can’t keep a ball bouncing consistently, or stop a walking person from morphing into a blob, isn’t really understanding anything; it’s just memorizing patterns. True world models would need to internalize gravity, occlusion, material properties, and even simple cause and effect.

I’m cautiously optimistic about this direction, but I’ve also seen this hype cycle before. Every few years someone declares that generative models are on the verge of simulating reality, and then we get another round of impressive-but-brittle demos. Runway’s advantage is that they’re actually iterating in public—their tools are used by real filmmakers and designers, which forces them to handle edge cases rather than just cherry-picking outputs for a paper.

The bigger question is whether world models will ever be computationally practical. Simulating physics at scale is brutally expensive, and current transformer architectures aren’t designed for it. Valenzuela acknowledged this, hinting that Runway is exploring hybrid approaches that combine neural networks with traditional game-engine-style simulators. That feels more realistic than pure end-to-end learning.

If they pull it off, the implications go far beyond entertainment. World models could power autonomous driving simulations, robotics training environments, architectural visualization, and even scientific modeling. Imagine asking an AI to simulate how a new bridge design handles an earthquake, or how a drug molecule interacts with a protein—without needing a physics engine for every scenario.

For now, Runway is still mostly known for its video tools, and that’s fine. The company has a solid business serving creators who want to generate B-roll, animate storyboards, or add visual effects without a full VFX pipeline. But Valenzuela is clearly playing a longer game. He’s positioning Runway not just as a media company, but as an infrastructure bet on how AI learns to understand space and time.

I’ll believe world models are truly here when I can drop a virtual bowling ball into a generated scene and watch it knock over pins with consistent physics—not just once, but a hundred times in a row without glitches. Runway isn’t there yet, but they’re closer than most. And that’s worth paying attention to.
