The New Bitter Lesson

Why intelligence cannot be amortised

Tim Verbelen, CTO June 22, 2026

6 min read

Rich Sutton’s Bitter Lesson¹ captures one of the most important patterns in the history of artificial intelligence. Across decades of AI research, methods that scaled with computation consistently outperformed approaches built around carefully engineered human knowledge. Researchers repeatedly attempted to embed domain expertise directly into intelligent systems. They built handcrafted representations, symbolic rules, expert heuristics, and carefully designed processing pipelines. These systems often achieved impressive initial results. Yet over time they were surpassed by more general methods capable of scaling with computational resources. Search defeated handcrafted chess expertise. Statistical learning displaced expert-designed speech systems. Learned representations replaced feature engineering in computer vision. Again and again, scalable computation proved more powerful than human attempts to encode intelligence directly.

The success of deep learning appeared to settle the debate. Modern foundation models trained with stochastic gradient descent on internet-scale datasets have demonstrated capabilities that were once considered decades away. A relatively small set of principles - gradient-based optimisation, differentiable function approximation, and self-supervised learning - has produced systems capable of language understanding, software development, gameplay, image generation, and more. Capabilities that once required separate research fields and specialised systems appeared within a single model.

From one perspective, the Bitter Lesson has never looked more correct. Yet the very success of modern AI has created a new misunderstanding: that scaling existing paradigms may eventually yield a complete account of intelligence. This assumption deserves closer examination.

Intelligence as Amortisation

The main idea behind modern AI is not learning. It is amortisation. Amortisation is the process of converting expensive inference into reusable structure. Rather than solving the same problem from first principles each time it is encountered, a system performs the necessary computation once, stores the result in some form, and reuses it when similar situations arise in the future. The benefit is efficiency. Computation that would otherwise have to be performed repeatedly can instead be retrieved from memory, habit, learned parameters, or inherited structure.

The main idea behind modern AI is not learning. It is amortisation.

Amortisation is not unique to artificial systems. Biological intelligence relies on it extensively. Animals develop habits. Skills become automatic through practice. Repeated exposure to similar situations allows costly deliberation to be replaced by efficient responses. Even evolution itself can be viewed as a form of amortisation, compressing adaptive solutions discovered across generations into inherited biological structure.

Modern foundation models derive much of their power from pushing amortisation to an extreme. During training, enormous computational effort is spent discovering statistical regularities across vast datasets. The results of these computations are compressed into model parameters. When the model is later deployed, much of the inferential work has already been performed. Training effectively shifts computation into the past. In this sense, foundation models are repositories of distilled inference: vast amounts of past computation packaged into a form that can be used almost instantly. Their biggest perk also reveals their limits. Amortisation works when future problems resemble past experience. It breaks when circumstances change or novel situations arise. For this, we need a different kind of intelligence, an adaptive intelligence.

Adaptive Intelligence

There are (at least) three properties that amortisation does not provide – online learning, active exploration, and the formation of novel hypotheses – that define adaptive intelligence.

adaptive intelligence diagram

Online Learning. The world is not stationary. Physical environments change over time. Social norms shift. Scientific knowledge gets updated. Intelligent agents must continually update their beliefs in response to such incoming evidence. Current foundation models primarily learn during training and act during deployment. Most of their knowledge is encoded in parameters that remain fixed once training is complete. Although retrieval, memory, and fine-tuning can introduce limited forms of adaptation, the dominant paradigm still treats learning and deployment as distinct phases. Adaptive intelligence requires a different arrangement: learning must remain an ongoing process throughout deployment itself.

Active Exploration. Intelligent systems do not merely process information. They seek it. Animals explore unfamiliar environments. Children manipulate objects to discover how they work. Scientists design experiments. In each case, actions are directed toward reducing uncertainty. Behaviour is guided not only by predictions about the world, but also by expectations about how observations might change those predictions. The distinction is subtle but important. Amortisation allows an agent to efficiently reuse conclusions drawn in the past. Exploration becomes necessary when those conclusions are no longer sufficient. Intelligent behaviour therefore requires not only the ability to exploit existing knowledge, but also the ability to generate the evidence needed to construct new knowledge. This generation of evidence must be targeted: the agent must know what it doesn’t know, and select actions that resolve key uncertainties.

Novel Hypothesis Formation. Perhaps the deepest challenge concerns the construction of genuinely new explanatory models. As new evidence emerges, existing assumptions may need to be abandoned entirely rather than patched. The most significant scientific advances often involve reorganisation rather than extrapolation. When Copernicus proposed that the Earth moved around the Sun, he was not refining the geocentric model. He was discarding it. Decades of increasingly precise observations had strained the old framework past breaking point. What was needed was not better parameters, but a different model altogether.

The problem is no longer how to retrieve answers from the past, but how to construct new ones.

Intelligence is not merely the ability to apply what has already been learned. It must remain capable of reorganising itself when the world demands new explanations. The problem is no longer how to retrieve answers from the past, but how to construct new ones. Here, amortisation ceases to be sufficient.

The New Bitter Lesson

The original Bitter Lesson taught us that intelligence cannot be handcrafted. Human-designed structures were repeatedly succeeded by more general learning systems capable of exploiting computation at scale. It resulted in the dominant AI systems of today, which derive their power from amortisation. Yet intelligence in open-ended environments demands more than executing pre-trained behaviours. It requires the ability to update beliefs, seek information, construct explanations, and adapt to novelty as it arises. The first bitter lesson was that intelligence cannot be handcrafted. The new bitter lesson is that intelligence cannot be fully amortised.

The new bitter lesson is that intelligence cannot be fully amortised.

The new bitter lesson has direct implications for the dominant technical paradigm in modern AI. Current large-scale models, the training algorithms that produce them, and the infrastructure that runs them, are overwhelmingly optimised for amortised inference. If adaptive intelligence is the next frontier, then a paradigm shift is inevitable.

Many emerging research directions share a common recognition that intelligence requires ongoing inference in addition to pre-computed inference. Online learning methods enable models to respond dynamically to changing environments. Active inference emphasises the reduction of uncertainty through action. World-model approaches attempt to support planning, experimentation, and causal reasoning beyond immediate prediction. Adaptive model architectures and processes allow new information to be encoded, and redundant or useless information to be removed.

These approaches differ substantially in their assumptions and methods. However, they converge on a common insight: intelligence is not merely the compression of past experience. It is the continual construction of beliefs about a changing world. The future of AI may belong not to the largest repositories of distilled knowledge, but to the architectures that remain most capable of changing their minds.