AI Inference is Surprisingly Cheap
Reality may have a less surprising amount of detail than we previously thought.
People often make the observation that "reality has a surprising amount of detail." AI might provide a counter-example.
Veo 3, the latest Google video model, can't be that big. It is, at bottom, a set of matrices. The full weights would fit on a couple of hard drives, max. Yet it seems close to capturing a full world model within its parameters, one that grows more consistent with every model generation.
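As a back-of-envelope check (Google hasn't published Veo 3's size, so the parameter counts below are illustrative guesses, not real figures):

```python
# Back-of-envelope: storage needed for model weights at half precision.
# Parameter counts here are hypothetical, chosen only to bound the claim.
def weights_size_tb(num_params: float, bytes_per_param: int = 2) -> float:
    """Terabytes to store weights (2 bytes/param = fp16/bf16)."""
    return num_params * bytes_per_param / 1e12

for params in [1e12, 5e12, 10e12]:  # 1T, 5T, 10T parameters
    print(f"{params / 1e12:.0f}T params -> {weights_size_tb(params):.1f} TB")
# 1T params  ->  2.0 TB
# 5T params  -> 10.0 TB
# 10T params -> 20.0 TB
```

Even a ten-trillion-parameter checkpoint is ~20 TB, which indeed fits on a couple of modern hard drives.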
Sure, those parameters have been laboriously calculated. And the amount of training data is insane.
But once trained, inference is so cheap in the grand scheme of things that perhaps reality is not as difficult to simulate as we first expected. For our hypothetical simulators, the training run may have taken enormous time and effort, but once the models are trained and the parameters tuned, running inference is just not that computationally expensive.
We can ground ourselves using the text models. They, too, contain a surprisingly rich world model: a factual understanding much broader than Wikipedia's, the ability to do mathematics and coding, many millions of personalities, endless poetry and art. Setting aside consciousness, most people could probably be represented by a couple-trillion-parameter model.
That's not even to mention the marvel of the multimodal models. Intelligence is a great compression algorithm. Double-click on the personalities point as well: language models can simulate reasonably self-consistent agents, including interactions between them. It's remarkable when you consider that all that's happening is a vector being multiplied by a matrix, again and again. And it's not even that big a matrix (or set of matrices).
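The "vector times matrix, again and again" picture can be sketched in a few lines. This is a toy with random weights, omitting attention and everything else a real transformer adds, just to show the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden dimension (toy scale; real models use thousands)

# A stack of layers: each is just a d-by-d matrix, scaled for stability.
layers = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(12)]

x = rng.standard_normal(d)  # the running activation vector
for W in layers:            # inference: the same vector, matrix after matrix
    x = np.tanh(W @ x)      # a nonlinearity keeps values bounded

print(x.shape)  # (64,) — still a d-dimensional vector after every layer
```

The entire forward pass is repeated matrix-vector products with pointwise nonlinearities in between; the richness comes from what's stored in the matrices, not from any exotic operation.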
This might suggest the following implications:
- Running us as a simulation might be cheap once the initial training cost has been absorbed.
- You may only have to pay that cost once, and you can amortize it across all simulations, provided the initial conditions are similar enough.
- Chain of thought, and other test-time compute, suggests there is flexibility in trading off inference compute against training compute for additional power, perfect for the occasional more demanding simulation.
- The world is actually not that complex.
- Many of the world's salient features are highly compressible.
- AI should be studied as a tool for metaphysical progress.
Thanks to Adam Khoja for helpful conversations and feedback.
Re video models: the standard by which they're very good is that humans can't distinguish their outputs from videos of the world, which is quite different from simulating the world.
Also note that we know from physics that the fundamental laws seem simple. But video models don't work by simulating physics; inference over the world itself is clearly not cheap. Rather, the world seems to contain so much detail that a data-heavy, heuristic-capturing approach beats dealing with the object-level detail directly.
Re a few terabytes: that's a lot! Genomes are tiny by comparison, around 1 GB, and most of that is irrelevant to the actual body, yet our bodies, and those of many other animals, are certainly very complex.
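The genome comparison checks out arithmetically (the 2 TB weights figure below is an illustrative stand-in for "a few terabytes", not a published model size):

```python
# Genome vs. model weights, back of the envelope.
# Human genome: ~3.2e9 base pairs; each base is one of 4, i.e. 2 bits.
genome_gb = 3.2e9 * 2 / 8 / 1e9   # bits -> bytes -> gigabytes
weights_tb = 2.0                   # stand-in for a few-terabyte checkpoint
ratio = weights_tb * 1000 / genome_gb

print(f"genome ~ {genome_gb:.1f} GB; weights ~ {ratio:.0f}x larger")
# genome ~ 0.8 GB; weights ~ 2500x larger
```

So a few-terabyte model is three to four orders of magnitude larger than the specification that builds a body, which is the asymmetry the footnote is pointing at.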