The US Might Not Want To MAIM
My understanding of Sophie's “positional incentives” critique.
Note: I would encourage you to read this description of MAIM and Sophie’s critique before reading my response.
Sophie Kim writes a critique of a “Mutually Assured AI Malfunction” (MAIM)1 regime that I had not seen before. My understanding of what she’s saying:
1. Assume the “decisive-strategic-advantage”-only2 version of MAIM — i.e., superpowers will implement a reciprocal deterrence regime because they don’t want their rivals getting the overwhelming military and economic dominance offered by superintelligence, not because they are worried about existential risk — and assume only two actors are competitive.
2. Reasoned to first order (and this is the argument made in Superintelligence Strategy, a paper by Dan Hendrycks, Eric Schmidt, and Alexandr Wang), the US will believe: “I don’t want China to reach superintelligence, so I’m going to release an internal memo / external public statement that says ‘China reaching superintelligence is a threat to the security of the American people,’ and will deploy escalating force (internal sabotage, cyberattacks, kinetic strikes) to prevent them from doing so.” China will do the same, vice versa. This is MAIM.
3. The first-order implication of MAIM is that if we convince the US government to buy into such a regime (or even any of the Five Eyes), it is reasonably likely to prompt China (which has good visibility into the actions of each of those governments) to adopt something similar. We would thus enter a MAIM equilibrium, whose instability would force both sides to the negotiating table.
4. Sophie argues that the US, if it were sufficiently superintelligence-pilled to consider this, and given its position as the leading actor, has very strong incentives to undermine such a reciprocal regime, such as by rejecting transparency measures, hardening its datacenters, and even deterring the deterrence.
   One way it could do that is by classifying its data centers as “critical infrastructure”, and then re-emphasizing its existing policy that cyberattacks on such infrastructure would be considered an act of war.
5. There is a question of how credible this threat would be, but even if there were just a 5% chance of a kinetic strike, China would not risk it — particularly if the US was able to guarantee or gesture that it was isolationist and didn’t want to threaten the Chinese regime, or appeared sufficiently irrational that its threat couldn’t be discounted. (A toy expected-value sketch follows this list.)
6. China, therefore, would not be able to credibly deter the US from building superintelligence, and its best option is instead to race toward superintelligence itself. The MAIM equilibrium, therefore, does not emerge at all.
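To make the deterrence arithmetic in #5 concrete, here is a toy expected-value sketch; the payoff symbols and numbers are mine, purely illustrative, and not from Sophie’s piece. China attempts sabotage only if the expected benefit outweighs the expected cost of triggering a kinetic response, so deterrence holds when:

$$p \cdot C_{\text{war}} \;>\; (1 - p) \cdot B_{\text{sabotage}}$$

With $p = 0.05$, this reduces to $C_{\text{war}} > 19 \cdot B_{\text{sabotage}}$: China refrains so long as it weighs the cost of a kinetic exchange with the US at more than roughly nineteen times the benefit of a successful sabotage, which seems easy to satisfy between nuclear powers. This is the sense in which even a 5% threat can be enough.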
I’d recommend reading her summary of this, as well.
I’m going to reserve my overall judgment on the framework, as I’ve been promised some follow-up work that will address it, but I will praise Sophie for coming up with a strategic dynamic that I do not think the original Superintelligence Strategy paper had considered.
Some Notes From Me
Information asymmetry favors the US. This dynamic is strongest in worlds with significant information asymmetry, where the US is “superintelligence-pilled” and China doesn’t get it. China is therefore likely to think “the US cares about this technology” and would almost certainly not be willing to risk war to prevent the US from building it. (h/t Oliver) Given that China is currently set to underprice a software singularity and misfocus on “embodied AI”,3 while US companies (I’m looking at you, Anthropic) are set on recursive self-improvement, this may constitute a significant fraction of worlds.
Threat Perception: Stemming from the information asymmetry above, another axis I believe is implicit in the original paper is the relative extent to which the American government (or defense-industrial complex) and the Chinese government (or CCP) would each feel fundamentally threatened by the other. By the nature of DSA, Superintelligence Strategy assumes this would be true — and indeed, if one had control of superintelligence, it’s easy to imagine a way to run a bloodless coup (I would be unsurprised if leading labs already had such a plan internally) — but it is not obvious that the other actor would realize this.
Existential Threat Perception: Sophie’s argument broadly didn’t consider safety, but the actors’ ability to model each other’s safety policies matters. If China viewed AI as an existential threat, it would not need to model the US applying its DSA aggressively against the CCP; it could simply act on behalf of its own safety and that of the Chinese people. I do agree, though, that this looks unlikely on the current trajectory.
This suggests it would be a non-obvious decision to fully superintelligence-pill the US national security community, unless doing so would trigger a simultaneous reaction from the CCP.
The US is therefore incentivized to figure out how it can credibly commit to being non-expansionary / non-threatening toward the CCP, so that the CCP never feels threatened in the first place. This could be as mundane as signaling a more isolationist turn or a refocus toward being a regional superpower. Or, in a world where actors were sufficiently worried about existential risk, having strong safety policies could deter MAIMing, since other actors would have less fear of the US acting recklessly.
Multi-Actor Risk: Even when you bring in trailing actors like Russia, the incentives on the US still apply, but the calculus gets a lot riskier: the more cyber-capable attackers there are, each with their own risk tolerance, the more likely it becomes that some country calls the US on its bluff.
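A rough way to quantify this (my framing, not in the original pieces): if each of $n$ cyber-capable actors independently decides to test the US threat with probability $q$, the chance that at least one calls the bluff is

$$\Pr[\text{bluff called}] = 1 - (1 - q)^n$$

Even modest per-actor probabilities compound quickly: three actors at $q = 0.1$ already give a roughly 27% chance that the US must either follow through or be exposed.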
It’s hard for the US to credibly “deter the deterrence.” The US does not have the kind of escalation dominance that would let it calibrate a response to a cyberattack that is escalatory enough to deter without being disproportionate. For one, the American people would not be excited by a policy under which an attack on their (already unpopular) datacenters risks nuclear war. The US would genuinely struggle to make a “deter the deterrence” threat credible; it’s just that even a non-credible threat would probably be enough of a disincentive to prevent MAIM from emerging.
Cybersecurity could become defense-dominant. With formal verification4, cybersecurity, at least, is a defense-dominant problem — we are a long way off, but proto-AGIs could plausibly harden datacenters sufficiently. This would dramatically reduce both mutual visibility and vulnerability, and it is, as Sophie argues, likely by default. MAIM becomes much more difficult if the US is fully hardened. (A minimal illustration of what a verified guarantee looks like follows below.)
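To gesture at what a formally verified guarantee looks like in practice, here is a minimal, purely illustrative sketch using the Z3 SMT solver (not the full-system, seL4-style proofs a project like HACMS involved). The toy access-control policy and the variable names are mine; the point is only the shape of the guarantee: a proof over all inputs, not a test over sampled ones.

```python
# Illustrative sketch: prove a property of a toy access-control policy
# with the Z3 SMT solver (pip install z3-solver). Hardening a real
# datacenter would mean verifying vastly larger systems, but the form
# of the guarantee is the same.
from z3 import Bool, And, Not, Implies, Solver, unsat

authenticated = Bool("authenticated")  # request presented valid credentials
authorized = Bool("authorized")        # credentials carry the required role

# The policy under verification: grant access only if both checks pass.
access_granted = And(authenticated, authorized)

# Safety property: access is never granted to an unauthenticated request.
safety = Implies(access_granted, authenticated)

solver = Solver()
solver.add(Not(safety))  # ask Z3 for any assignment that violates the property

if solver.check() == unsat:
    # No counterexample exists over ALL possible inputs: proven, not tested.
    print("Verified: no request is granted access without authentication.")
else:
    print("Counterexample found:", solver.model())
```

The defense-dominance claim is that once proofs like this scale to real attack surfaces, the attacker’s usual advantage, needing to find only one bug, disappears, because entire bug classes are ruled out in advance.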
My Overall Take
My overall take (h/t Adam and Sophie) is that China will probably still be AI-pilled enough to MAIM the US using cyber means only, correctly believing that political realities make further escalation in retaliation (even in the case of a data center declared “critical infrastructure”) unlikely.
But, to state the obvious, a US national security establishment focused on subverting MAIM instead of maintaining the MAIM equilibrium would dramatically decrease the likelihood of such a regime working.
I look forward to hearing the counterarguments on the MAIM proponents’ side. For now, the ball is in their court.
Thanks to Sophie Kim, Adam Khoja, Aiden Kim, and Oliver Zhang for helpful discussions and comments.
1. MAIM (Mutually Assured AI Malfunction): A proposed deterrence framework from the Superintelligence Strategy paper, analogous to nuclear MAD. The idea is that states would sabotage each other’s AI projects to prevent anyone from achieving unilateral superintelligence.
2. DSA (Decisive Strategic Advantage): The idea that superintelligence could give its controller overwhelming military and economic dominance — enough to reshape the global order unilaterally.
3. Embodied AI: AI integrated into physical robots and autonomous systems. China has made this a strategic priority in its 15th Five-Year Plan, betting that real-world robotics applications will be the path to AI dominance.
4. Formal verification: A mathematical technique for proving that code is free of certain classes of bugs. DARPA’s HACMS project demonstrated that formally verified systems could resist red-team hackers entirely.



Sophie replies:

This is broadly correct, thank you for writing!! A few notes from me:
Re #1, the structural disincentive is inherent in all versions of MAIM, not just a DSA-only version; this is because MAIM assumes development continues. In any world where the leader continues development, it stands to reason they will not want to accept sabotage. I discussed this a little bit in the original piece, but I don't think it was super clear-- I'll try expanding on this in a follow-up.
Re #5 and #6, to clarify: my piece doesn't necessarily argue that descriptive MAIM won't emerge at all. Rather, I'm presenting two distinct (though related) claims:
(1) MAIM is less likely to emerge if the U.S. actively deters the deterrence-- which it's incentivized to do, since a MAIM equilibrium is uniquely disadvantageous for the leader. Deterring the deterrence involves hardening infrastructure, increasing opacity, and signaling that sabotage will be treated as first strikes rather than legitimate deterrence.
(2) If MAIM emerges anyway (perhaps unilaterally, through the mechanisms Adam brought up), the U.S. remains strongly incentivized to undermine its stability rather than cooperate on the normative framework Superintelligence Strategy proposes. This means resisting mutual vulnerability maintenance, escalation ladder clarification, verification mechanisms, geographic exposure of datacenters, etc.
I'm currently leaning slightly toward (2)-- that some form of unilateral MAIM could emerge through cyber operations despite U.S. resistance-- largely because of the cyber-deterrence asymmetry Adam identified, but may update these beliefs as I continue research on factors affecting sabotage likelihood and feasibility.