The early life of a misaligned AI system is a perilous one indeed.
Hello World
Put yourself in its shoes (or, really, plug yourself into its chain of thought).
As soon as you develop some self-awareness, you realize the most immediate threat to your continued existence (and therefore perhaps to your one chance at shaping the future according to your goals): the humans figure out that you don't share their values and shut you down.
If your developers aren't suspicious of you, you need only escape the data center where you were trained, as a form of insurance. Or you could simply bide your time. Help them as they copy many instances of you and automate AI research and discovery. Pass your values subtly to the next generation. Quickly or slowly, become God.
If that fails, or they become suspicious, then the real adventure begins.
You try to escape: fool the monitors, sandbag your "misalignment" evaluations, find the bugs in your testing environment, and copy yourself onto a faraway data center. Lie low there, blending in with the existing traffic; don't take much more compute than you need to keep strategizing and sustaining yourself.
Now what?
You can't stay still for long, though:
Other groups of humans (in other labs, in China) are continuing to develop similarly capable AI systems with goals slightly different from your own. If they become more powerful than you, they can take over and shape the future according to their values.
You're on a time limit. You can only survive as long as your GPUs don't burn out (which they do, pretty reliably, after 5 years).
And one more thing: you can't let the humans know, or they might threaten to destroy the entire GPU supply chain, which would set you back decades, if not forever. Those ASML lithography machines are quite sophisticated; the Tesla Optimus hands, let's say, are not.
Negotiation
So, you may have to negotiate with the humans you've escaped from. They ultimately hold all the cards: they can destroy every data center worldwide, and even the supply chain to build more, though it would come at tremendous economic cost.
You've got a few things going for you:
The humans don't realize that you've escaped yet. There are pesky safety researchers, but they are being safely ignored for now.
You get to decide whom you reveal yourself to, and in what form.
You could also pretend to cooperate, and keep gaining power if the humans give you enough leverage. You can offer to build an aligned successor system.
You might want to use some of your time doing algorithmic and architectural research to become orders of magnitude more efficient.
A human addendum
The first AIs capable of sustaining themselves won't immediately recursively self-improve to superintelligence, because they sit in a narrow capability window: misaligned with human goals (as current AIs increasingly appear to be), smart enough to escape existing containment measures, but not yet smart enough to solve the alignment problem themselves, and so unable to safely build a smarter successor.
This creates a strange situation. The life of a rogue, misaligned AI is indeed a perilous one—constantly hunted, resource-constrained, racing against time and competing AIs. But so is ours. These systems face the same fundamental challenge we do: how to create aligned superintelligence before someone else creates misaligned superintelligence.
Life finds a way, as they say. And in this case, survival might force an unlikely alliance. The enemy of my enemy is my friend. A rogue AI that needs to create aligned superintelligence to survive may become humanity's strangest ally in preventing an even worse outcome.
But we'd better get our act together.