Superintelligence is scary.
Governments may, as the paper "Superintelligence Strategy" suggests, want to deter other countries from building it. This could be because superintelligence may grant a rival a "decisive strategic advantage" (winning both economic and military competition), or because of the inability to control these AIs.
The paper suggests a potential triggering point as "any state's aggressive bid for unilateral AI dominance." This, however, is a difficult definition for nation-states to base (potentially hostile) preventative action upon: the paper suggests an escalation ladder that ranges from espionage to cyberattacks to kinetic strikes. Indeed, this seems to be the central point of some critiques of the paper.
Deterrence requires not just the will to retaliate, but the ability to recognize when retaliation is justified.
One way to develop superintelligence quickly is to start the "intelligence explosion" by allowing AIs to improve themselves. This could provide one such "red line": a process that could produce a superintelligence with "highly sophisticated cyberattack, expert-level virology, and fully autonomous AI R&D" capabilities (as described in the Appendix of the paper). However, even this is a fuzzy process.
The most effective operationalization of such a threshold should have the following properties:
Observability (detectability by external actors),
Precision (clear threshold with minimal ambiguity),
Relevance (direct connection to dangerous recursive improvement),
Timeliness (sufficient warning without false alarms),
Gaming resistance (difficult to circumvent while maintaining effectiveness).
With these criteria in mind, I've made one brief attempt to collect and rank 10 potential operationalizations of "recursive self-improvement", alongside brief discussion of their advantages and disadvantages.
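Purely as an illustration of how such a ranking could be mechanized, here is a toy scoring rubric. The class, the equal weights, and the example scores are my own assumptions for the sketch, not the rankings argued for in this post.

```python
from dataclasses import dataclass

@dataclass
class Operationalization:
    """Hypothetical 1-5 scores against the five criteria above (illustrative only)."""
    name: str
    observability: int
    precision: int
    relevance: int
    timeliness: int
    gaming_resistance: int

    def score(self, weights=(1.0, 1.0, 1.0, 1.0, 1.0)) -> float:
        values = (self.observability, self.precision, self.relevance,
                  self.timeliness, self.gaming_resistance)
        return sum(w * v for w, v in zip(weights, values))

# Made-up example scores, not a real assessment.
candidates = [
    Operationalization("Concentrated compute", 5, 4, 2, 4, 3),
    Operationalization("Explicit declaration", 2, 5, 5, 1, 1),
]
for c in sorted(candidates, key=lambda c: c.score(), reverse=True):
    print(f"{c.name}: {c.score():.1f}")
```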
Concentrated Compute Resources. Total compute resources (measured in FLOPS or specialized AI chips) controlled by a single entity or closely networked cluster exceeds threshold θ.
The nice thing: we'll know about this one. It's pretty hard to hide a data center, especially from a combination of satellite imagery, supply chain tracking (and ideally location trackers on chips, too), energy monitoring, and other methods. This would also offer a leading indicator before dangerous capabilities actually emerge.
The central drawback is a shift towards a regime where deployment matters: where the total capacity to do AI inference (for reinforcement learning and large numbers of virtual environments) matters more than any single concentrated cluster, and that capacity is inherently easier to spread across many data centers. In general, this threshold may also be unstable in the face of increasing algorithmic efficiency (though that is true of most compute governance), and there is some risk of false alarms from large non-AI applications.
Overall, though, if we see superintelligence soon, I expect this to be the main practical threshold that is adopted, or at least the central part of a broader holistic evaluation.
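As a very rough sketch of what monitoring this kind of threshold could look like in practice (the chip names, per-chip figures, counts, and the threshold itself are illustrative assumptions, not real estimates):

```python
# Illustrative sketch: estimating an entity's concentrated AI compute from
# tracked accelerator counts. All figures below are made-up assumptions.
PEAK_FLOPS_PER_CHIP = {
    "frontier_accelerator": 1.0e15,       # hypothetical peak FLOP/s per chip
    "previous_gen_accelerator": 3.0e14,
}
THRESHOLD_FLOPS = 1.0e20                  # hypothetical red-line theta for one networked cluster

def concentrated_compute(chip_inventory: dict[str, int]) -> float:
    """Sum peak FLOP/s across an entity's tracked accelerators."""
    return sum(PEAK_FLOPS_PER_CHIP[chip] * count
               for chip, count in chip_inventory.items())

# e.g., counts inferred from supply-chain tracking and satellite imagery
tracked = {"frontier_accelerator": 80_000, "previous_gen_accelerator": 50_000}
total = concentrated_compute(tracked)
print(f"{total:.2e} FLOP/s, exceeds threshold: {total > THRESHOLD_FLOPS}")
```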
Deployed System R&D Acceleration. The measured productivity multiplier that engineers achieve when using currently deployed frontier AI systems for AI development work exceeds threshold θ (e.g., >10x speedup).
This measures actual recursive improvement through observable economic effects that are difficult to fake systematically. This is the mechanism outlined in AI-2027 that kicks off the intelligence explosion.
You might detect this through productivity studies, economic intelligence, or leaked internal metrics about how much faster AI development gets when using current AI tools. But you need access to competitor workflows or detailed internal metrics that companies keep secret. It may also lag behind cutting-edge capabilities that aren't deployed yet, and it's hard to tell if speedups come from AI versus just better development tools in general.
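For concreteness, a minimal sketch of how such a multiplier might be estimated, assuming (hypothetically) that you had paired task-completion times with and without AI assistance; the numbers and the 10x threshold are illustrative:

```python
# Illustrative sketch: estimating the R&D productivity multiplier from paired
# task-completion times (hours), with and without frontier AI assistance.
baseline_hours = [40, 25, 60, 15]   # the same tasks done without AI tools (made up)
assisted_hours = [5, 3, 8, 2]       # ...and with currently deployed AI tools (made up)

multiplier = sum(baseline_hours) / sum(assisted_hours)
THETA = 10.0
print(f"measured speedup: {multiplier:.1f}x, exceeds threshold: {multiplier > THETA}")
```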
Autonomous R&D Acceleration. The proportion of total cognitive work performed by autonomous AI agents (rather than humans) in developing generation n+1 AI systems exceeds threshold θ, where cognitive work includes research, design, implementation, and testing.
This does also capture the essential dynamic behind a recursively self-improving AI system more precisely than, say, just compute, but also more narrowly.
However, this is difficult to observe without knowledge of the internal development process (though, e.g., some have suggested espionage is likely to remain effective), and it requires subjective judgements about what counts as an individual autonomous agent versus human augmentation.
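A toy sketch of computing this fraction, assuming you somehow had an honest, categorized log of who did what; the log entries and weighting by hours are assumptions for illustration:

```python
# Illustrative sketch: fraction of cognitive work on the next-generation system
# performed autonomously by AI agents, weighted by (hypothetical) task hours.
work_log = [
    # (category, hours, performed_by) -- made-up entries
    ("research",       1200, "agent"),
    ("design",          400, "human"),
    ("implementation", 3000, "agent"),
    ("testing",         900, "agent"),
]

agent_hours = sum(h for _, h, who in work_log if who == "agent")
total_hours = sum(h for _, h, _ in work_log)
print(f"autonomous fraction: {agent_hours / total_hours:.0%}")   # ~93% in this toy log
```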
Code-based. The fraction of AI-generated code (versus human-authored code) in the development of successive AI systems exceeds threshold θ (e.g., >90% AI-authored).1
The nice thing about this is that it's simple! What fraction of the total codebase was typed by humans, versus by an LLM? Like the others, it's pretty difficult to verify externally without access to both the codebase and full, verifiable logs of its history, though verification might be possible with autonomous AI agents. Another problem is that this could be gamed through token human involvement that maintains the appearance of human control while development essentially remains autonomous.
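A minimal sketch, assuming some per-file attribution source existed (editor telemetry, commit metadata, or similar); the paths, line counts, and the 90% line are all hypothetical:

```python
# Illustrative sketch: estimating the AI-authored fraction of a codebase from
# per-file line attribution. Attribution source and numbers are made up.
attribution = {
    # path: (lines_typed_by_humans, lines_generated_by_AI)
    "trainer/loop.py":  (120, 2400),
    "models/arch.py":   (300, 5100),
    "evals/harness.py": ( 80, 1900),
}

human = sum(h for h, _ in attribution.values())
ai = sum(a for _, a in attribution.values())
fraction_ai = ai / (human + ai)
print(f"AI-authored: {fraction_ai:.0%}, over the 90% line: {fraction_ai > 0.90}")
```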
Theoretical R&D. The theoretical maximum R&D acceleration achievable using the most advanced available AI system exceeds threshold θ, regardless of current deployment status.
This is somewhat like autonomous R&D acceleration, and it captures human augmentation too, but it requires either benchmarking one's own systems or speculating about (or gaining access to) the capabilities of a rival's, in order to estimate the theoretical maximum pace at which that rival could pursue an intelligence explosion if they tried. This measures true recursive potential rather than current deployment status, potentially providing earlier warning of dangerous developments than metrics focused on deployed systems.
This may still be possible in a world where the US has responsibly and perfectly paused right on the precipice, and doesn't trust China to do the same.
General Capability Acceleration. The rate of improvement in broad AI capabilities (measured across standardized evaluation suites) exceeds θ over time period Δt.
This provides measurable thresholds through benchmarks that could be observable if the AIs undergo public testing or evaluation. Would you launch a nuclear weapon over HLE? Also, general capabilities may not translate directly to dangerous specific capabilities, and evaluations remain highly vulnerable to sandbagging and other forms of gaming by sophisticated actors who can deliberately underperform.
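A sketch of what checking such a rate against θ over a window Δt might look like, with entirely made-up benchmark names, scores, and threshold:

```python
# Illustrative sketch: average rate of improvement across a benchmark suite
# over a window delta_t. All names and numbers are assumptions.
snapshots = {
    "2026-01": {"suite_a": 0.41, "suite_b": 0.35, "suite_c": 0.52},
    "2026-07": {"suite_a": 0.68, "suite_b": 0.61, "suite_c": 0.74},
}

THETA_PER_YEAR = 0.30        # hypothetical red line: +30 points/year on average
delta_t_years = 0.5

old, new = snapshots["2026-01"], snapshots["2026-07"]
avg_gain = sum(new[b] - old[b] for b in old) / len(old)
rate = avg_gain / delta_t_years
print(f"average improvement rate: {rate:.2f}/year, exceeds theta: {rate > THETA_PER_YEAR}")
```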
Dangerous Capability Acceleration. AI systems demonstrate rapid improvement (exceeding rate θ over period Δt) in specifically dangerous capabilities: autonomous R&D, advanced cyber operations, biological design, strategic planning, or deception.
Unlike general capability acceleration, this does directly track the capabilities that create existential risk. However, it suffers from the further problem that there is not only an incentive for the actor to deliberately underperform, but also for the AIs themselves to "sandbag" if misaligned (or just not perfectly aligned). It also does not capture the broader capabilities that might grant a decisive strategic advantage.
Holistic Assessment. Some holistic judgement by the intelligence community, considering the total amount of compute and data, a (rough) estimate of the talent available, proxies from deployed models, and knowledge of our own technical progress.
Chiefly, this is tractable because it essentially boils down to "our best effort." Unfortunately, it is difficult to verifiably communicate to the opposing side, and it is liable to the whims and limited technical understanding of the government.
Derivative-based. Let X(t) be the agent's proficiency with task X at time t. Then self-improvement can be measured as X'(t), meta-improvement as X''(t), and so on. A potential threshold for RSI, then, is when, for every n, X^(n)(t) grows at least as fast as X^(n-1)(t).
This is probably the most theoretically precise mathematical definition, with grounding in what recursive improvement actually is. But you need incredibly precise measurements of how capabilities change over time, which is probably impossible with messy real-world data. It's also too abstract for government officials to actually use, and there will be fights about what the measurements mean, how to parameterize them, and whether they're accurate.
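To make that fragility concrete, here is a minimal finite-difference sketch of the check on synthetic data; the function, sampling interval, and toy curves are all assumptions, and real capability measurements would be far too noisy and irregular for this to work cleanly:

```python
import numpy as np

def rsi_condition(scores, dt=1.0, max_order=2):
    """Toy check of the derivative-based RSI threshold on synthetic data.

    scores: hypothetical proficiency measurements X(t) at regular intervals dt.
    Returns True if, at an interior time point, X^(n) is growing at least as
    fast as X^(n-1) (i.e. X^(n+1) >= X^(n)) for n = 1 .. max_order.
    """
    derivs = [np.asarray(scores, dtype=float)]
    for _ in range(max_order + 1):
        derivs.append(np.gradient(derivs[-1], dt))   # finite-difference derivative

    mid = len(scores) // 2   # interior point, where central differences apply
    return all(derivs[n + 1][mid] >= derivs[n][mid] for n in range(1, max_order + 1))

t = np.arange(10.0)
print(rsi_condition(np.exp(1.2 * t)))    # True: rapidly compounding toy curve
print(rsi_condition(10 * np.log1p(t)))   # False: decelerating improvement
```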
Explicit Declaration. A state or organization has formally announced, ordered, or documented the initiation of recursive self-improvement protocols for AI systems.
If you caught this, it would be the clearest red line. Spies might be able to find these kinds of orders or documents. But smart actors won't announce they're trying to build superintelligence. Recursive improvement will probably happen gradually without anyone deciding to "start recursion." Only incompetent actors would leave this kind of paper trail. (e.g., I would not be surprised if all the AI companies had a data retention policy that deletes everything after a certain period by default, for other reasons.)
The core problem is that the easiest things to detect (compute, economic changes) don't directly measure the danger, while the most dangerous things (actual capabilities, secret plans) are exactly what sophisticated enemies will try their best to conceal, or fake.
We can take some comfort, though, in the fact that it is hard to hide data centers and hard to weed out spies. Likewise, it is hard to follow through on deterrence; but perhaps some amount of strategic ambiguity is okay.
Overall, I'm optimistic. My best guess is that, if there is enough political willpower, such potential escalation will only ever have to remain a threat. Better a well-defined one, though.
An adaptation of a tweet suggested by Dan Hendrycks, the lead author of the paper (though I can't find the tweet for the life of me, and neither can Grok).