10 Questions I'm Thinking About
AI disruption, alignment, verification, chips, and utopia — a working list.
The response to the open-source models post has blossomed into a full-on project of Sophie's, and will take a little more time to finish. For now, some open questions…
Here are ten questions I think matter for broad AI policy and strategy right now, with some brief notes and sub-questions. They have been on my mind because I think they are tractable (it's possible to make serious progress), broadly underexplored (though I'd love to be shown wrong!), and important for understanding the gameboard in AI. If you're a researcher looking for a project, or a funder looking for a gap — perhaps a place to start.
1. The implications of AI disruption. Does the US initiating AI disruption against China increase or decrease the likelihood of a bilateral cooperative agreement between the two countries?
The case for: policymakers only act in crises; leaders want to avoid instability; disruption shows that the US is taking AI and national security seriously. The case against: cyberattacks are bad for building trust and goodwill.
How would China actually respond to low-level US cyber sabotage of AI training runs, and what escalation dynamics should we expect?
2. Definitions. What is a good definition of intelligence recursion? Of superintelligence? What other red lines would be good for deterrence or regulation?
My brainstorm attempt. Which is the best (e.g., against the criteria of observability, precision, relevance, timeliness, and resistance to gaming)?
3. Securing AI datacenters against nation-states. If both the US and China were seriously bought into securing their AI datacenters, how long would it take until they were resistant to different levels of nation-state attacks?
For any given point in time in the future, how long would "locking down the labs" take? What is the full space of AI disruption attacks that could occur?
On the default trajectory, how far away are we from SL-5? Are we even trying? Perhaps the SL-5 taskforce / RAND will have an answer.
4. Follow the money. Are the current labs spending on compute and datacenter build-out as if they believe that "AI will automate most of the economy"? If not, which part of the compute supply chain is the most skeptical?
My best guess is that the chips are flying off the shelves, and it is TSMC + the fab build-out that looks conservative, but I don't know. If this is the case, why don't buyers sign purchase guarantees? It would be good to run the numbers (a rough sketch below). Is it, as Samuel suggests, just risk aversion?
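What "running the numbers" might look like, as a heavily hedged sketch: every figure below is an assumption chosen for illustration, not a sourced estimate, and the point is only the shape of the comparison between implied revenue and current capex.

```python
# Toy comparison: implied annual AI revenue if "AI automates most of the economy"
# vs. current build-out spending. All figures are rough assumptions, not sourced data.
annual_ai_capex = 400e9          # assumption: combined annual AI datacenter capex, USD
us_labor_compensation = 14e12    # assumption: rough annual US employee compensation, USD
automatable_share = 0.5          # assumption: share of that work AI eventually automates
value_capture = 0.1              # assumption: share of automated value AI providers capture

implied_annual_revenue = us_labor_compensation * automatable_share * value_capture
print(f"Implied annual revenue if the bet pays off: ${implied_annual_revenue / 1e12:.1f}T")
print(f"Assumed annual capex: ${annual_ai_capex / 1e9:.0f}B")
print(f"Ratio: {implied_annual_revenue / annual_ai_capex:.1f}x")
```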
5. AI research progress. My sense is that most employees (not leadership) of the AI labs are skeptical of achieving recursive self-improvement within the next 3 years (perhaps with the exception of Anthropic). What is driving this skepticism?
My approximate model of timelines is that we are ~1 research breakthrough away from "taste in ML" (or, in the most general sense, "taste"), after which we get RSI (and "AGI" is an engineering problem). Such a research breakthrough can be coarsely modelled as a draw from some distribution whose rate depends on how many scientists are trying and how much compute they have, and it does not seem implausible that, e.g., as the neolabs proliferate, such breakthroughs will become more common (a toy sketch after this item).
What is a better outside-view model of AI research progress? How about an inside view? (Continual learning? Recurrence? Different architectures? Data efficiency?)
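To make that outside-view model slightly more concrete, here is a minimal toy sketch: breakthrough arrival treated as a Poisson-style process whose per-year rate scales with total research effort. Every parameter (base rate, effort growth, horizon) is a made-up assumption for illustration, not an estimate.

```python
import math
import random

def sample_breakthrough_year(
    base_rate=0.05,       # assumption: breakthroughs per year per unit of research effort
    initial_effort=1.0,   # assumption: today's effort (scientists x compute), normalized to 1
    effort_growth=1.5,    # assumption: effort multiplier per year as neolabs proliferate
    horizon_years=15,
):
    """Sample the year a 'taste in ML' breakthrough arrives, treating arrival as a
    Poisson-style process whose rate grows with total research effort.
    Returns None if no breakthrough arrives within the horizon."""
    for year in range(horizon_years):
        effort = initial_effort * effort_growth ** year
        p_this_year = 1 - math.exp(-base_rate * effort)  # P(arrival during this year)
        if random.random() < p_this_year:
            return year + 1
    return None

samples = [sample_breakthrough_year() for _ in range(10_000)]
hits = sorted(s for s in samples if s is not None)
print(f"P(breakthrough within 15 years) ~= {len(hits) / len(samples):.2f}")
if hits:
    print(f"Median arrival year, conditional on arriving: {hits[len(hits) // 2]}")
```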
6. Data poisoning. How much of the internet would you need to poison to meaningfully affect the propensities of the models trained on it?
This Anthropic / UK AISI paper says that a "near-constant number of poison samples" is enough (a back-of-envelope after this item). If this is true, is this happening already? Can you use this to degrade bioweapon-relevant capabilities for open-weight models?
How do you quantify the effect on propensities, rather than capabilities, of the models? How robust are those backdooring attacks to fine-tuning and other mitigations? See also the IAPS report on AI integrity and Forethought on catastrophic data poisoning (itself with many interesting and important defense-relevant questions).
How effective is AI-powered training data filtering on removing backdoors? How expensive is it?
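A back-of-envelope on what a near-constant number of poison samples would imply for the headline question. The ~250-document figure is the order of magnitude the paper reports; the corpus size and average document length below are my assumptions, not numbers from the paper.

```python
# Back-of-envelope: what fraction of a pretraining corpus would ~250 poisoned
# documents represent? Corpus size and document length are assumptions for illustration.
poison_docs = 250           # order of magnitude reported by the Anthropic / UK AISI paper
corpus_tokens = 10e12       # assumption: ~10T-token pretraining corpus
tokens_per_doc = 1_000      # assumption: average document length in tokens
corpus_docs = corpus_tokens / tokens_per_doc
fraction = poison_docs / corpus_docs
print(f"Poisoned fraction of corpus: {fraction:.1e}")  # ~2.5e-08 under these assumptions
```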
7. Chinese chip production. Is China good at building AI chips? How far behind are they? How long until they reach parity?
There are arguments on both sides. Nvidia is invested in getting one answer and the AI safetyists in another, which muddies the field. If China is close to parity, that seems like a big deal.
In general, is AI a priority for the Chinese government? What leading indicators would we have?
8. Alignment difficulty. There has been a vibe shift towards thinking AI alignment is easier. What is generating that? In general, what are the best ways to reason about alignment difficulty? How solvable would it be as a large-scale R&D effort?
This is especially important given that alignment difficulty is, I believe, a cause of the split between camps of AI safety. How does it relate to previous conceptual leaps, including simulators and shard theory? My sense is that the vibe shift rests on an empirical argument — we are in fact seeing less emergent misalignment than in previous generations, and this might speak to the problem being easier. Is there a more satisfying explanation?
In an upcoming post, I’ll speak a little bit about my simplified, stylized model of alignment through considering the space of human values. I’d be curious to see where that is wrong.
9. AI verification. What is the state of the art in AI verification? How much time / resources would it take to meet the standards of mutual visibility and vulnerability for a bilateral AI deal?
How effective are privacy-preserving AI-agent-based approaches, software- or firmware-only approaches, and hardware approaches? Are there other things that you can do? How good are the best open-source datasets?
10. Utopia. What does a good future look like? Is it possible to reach an end-state which pleases all? What are the institutional features of a bilateral / multinational AGI project?
I know some think this is a poorly-formed question, and that the journey matters more than the destination, but I'm not satisfied. What is missing from Nozick's "Utopia of Utopias"? (Perhaps best illustrated here.) How do you solve the problem of new minds — i.e., what is the decision process for deciding when a new mind / morally-relevant agent can be created? (Should there be a hard rate-limit? Some kind of resource limitation?)
If answers don't already exist (online, in nonpublic documents, or in people's heads), I believe these could take a talented researcher's or an organization's worth of resources. Do email me if you'd be interested in working on, or funding work on, any of these (or point me in the direction of answers that already exist!)
Thanks to Samuel, Adam, Parv, Saheb and Sophie for helpful conversation and feedback.



Takes on takes:
1. Does AI deterrence promote cooperation? - I think this strongly depends on how hard it is to break out of a stalemate. If mutual sabotage is easy and inflicts very long delays, then each country would be encouraged to cooperate and develop AI under mutually agreed terms. If sabotage is hard to sustain (ex: algorithmic progress advances far enough that you can easily train RSI-capable AIs at a covert blacksite), each country might bide its time and hope to break out first. I think the second is much more likely, since even the largest non-nuclear strikes on AI infrastructure would only buy you five or so years at the current rate of algorithmic efficiency improvement (rough arithmetic below).
And in the short term, of course, you would clearly burn your bridges by taking such an aggressive position.
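A rough version of the "five or so years" arithmetic, under two assumptions of mine: algorithmic efficiency gains of roughly 3x per year, and a strike that takes 99% of a country's AI compute offline.

```python
import math

# Sketch: how long until algorithmic progress alone replaces the effective compute
# destroyed in a strike? Both parameters below are assumptions for illustration.
efficiency_growth_per_year = 3.0   # assumption: effective-compute gain per year from algorithms
fraction_destroyed = 0.99          # assumption: share of AI compute taken offline by the strike

remaining = 1 - fraction_destroyed
years_to_recover = math.log(1 / remaining) / math.log(efficiency_growth_per_year)
print(f"Lost effective compute recovered by algorithms alone in ~{years_to_recover:.1f} years")
# ~4.2 years under these assumptions, in the ballpark of the "five or so years" estimate
```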
5. AI research progress - There are a few competing ideas on why there might be differences between the leadership and the employees. On the one hand, the leaders might be feeling pressure to speak AGI into reality by drumming up investor support, so they're distorting their public timelines.
From the research perspective, I think it's clear that there are several big unsolved capability bottlenecks (continual learning, sample efficiency, self-play in domains without perfect ground truth). The bet from companies like Anthropic is that scaling LLMs will increase the productivity of their best human researchers enough that they can quickly come up with breakthroughs (such as by letting developers test more ideas more quickly), and that the solutions can then be stapled on top of the existing LLM paradigm.
7. AI Chips - My sense is that China might catch up qualitatively, but that the more important factor is their ability to scale how many they're making, which is still several years behind the U.S. See: Erich's analysis that most of the compute available to China in 2026 will come from legal H200 sales.
https://www.the-substrate.net/p/where-will-china-get-its-compute
To your other point, I think AI is clearly a government priority. It's just that the Chinese government is investing more heavily in robots and embodied AI, while the U.S. is focused on recursive software improvements to automate white-collar work. The Chinese perspective is roughly that it doesn't matter if the U.S. has slightly better models: China can stay close behind by iterating on U.S. improvements, and ultimately grow much more by preparing its industrial base to better integrate AI.
8. Alignment difficulty - Today's LLMs are both highly competent and mostly aligned, which has cut some of the edge from the original Yudkowskian foom-and-doom perspective. The division is between people who think you can scale LLMs and that the HHH persona will remain robust, vs. people who think either (a) LLMs cannot be scaled without some kind of self-play RL and that this optimization pressure will misalign the model, or (b) LLMs can't scale to ASI, and the only architecture that can will be heavily RL-based and prone to misalignment by default.
10. Good futures - State-of-the-art thinking here is Forethought's Better Futures. The major crux in longtermist planning (to me) is whether the driving goal is to reduce existential risk as much as possible, or to try and maximize welfare. Ex: there's tension between a one-world government to avoid proliferation of offensive AI capabilities, and the freedom people would have to do a long-reflection-style consideration of what to do with future resources. I am much closer to the maximizing-security end of the spectrum: "The main thing for us to do about Utopia is to protect its potential to someday be made real."
As far as your specific question on creating moral patients: if you can easily create new beings with welfare, and they have rights to some resources, there has to be a limit on creating new moral patients. Otherwise, some populations (digital or biological) will grow explosively and reduce us back to a Malthusian equilibrium.
shard theory for the win