10 Questions I'm Thinking About
AI disruption, alignment, verification, chips, and utopia — a working list.
The response to open-source models post has blossomed into Sophie’s full-on project, and will require a little bit of time to finish. For now, some open questions…
Here are ten questions I think matter for broad AI policy and strategy right now, with brief notes and sub-questions. They have been on my mind because I think they are tractable (serious progress is possible), broadly underexplored (though I’d love to be shown wrong!), and important for understanding the gameboard in AI. If you’re a researcher looking for a project, or a funder looking for a gap, perhaps this is a place to start.
The implications of AI disruption. Does the US initiating AI disruption against China increase or decrease the likelihood of a bilateral cooperative agreement between the two countries?
The case for: policymakers only act in crises, leaders want to avoid instability, and disruption shows that the US is taking AI and national security seriously. The case against: cyberattacks are bad for building trust and goodwill.
How would China actually respond to low-level US cyber sabotage of AI training runs, and what escalation dynamics should we expect?
Definitions. What is a good definition of intelligence recursion? Of superintelligence? What other red lines would serve well for deterrence or regulation?
My brainstorm attempt. Which is the best (e.g., by the criteria of observability, precision, relevance, timeliness, and resistance to gaming)?
Securing AI datacenters against nation-states. If both the US and China were seriously bought into securing their AI datacenters, how long would it take until they were resistant to different levels of nation-state attacks?
At any given point in the future, how much time would “locking down the labs” take? What is the full space of AI disruption attacks that could occur?
On the default trajectory, how far away are we from SL-5? Are we even trying? Perhaps the SL-5 taskforce / RAND will have an answer.
Follow the money. Are the current labs spending on compute and datacenter build-out as if they believe “AI will automate most of the economy”? If not, which part of the compute supply chain is the most skeptical?
My best guess is that the chips are flying off the shelves, and that it is TSMC and the fab build-out that look conservative, but I don’t know. If so, why don’t buyers sign purchase guarantees? It would be good to run the numbers. Is it, as Samuel suggests, just risk aversion?
AI research progress. My sense is that most employees (not leadership) of the AI labs are skeptical of achieving recursive self-improvement within the next 3 years (perhaps with the exception of Anthropic). What is driving this skepticism?
My approximate model of timelines is that we are ~1 research breakthrough away from “taste in ML” (or, in the most general sense, “taste”), after which we get RSI (and “AGI” becomes an engineering problem). Such a breakthrough can be coarsely modelled as a draw from some distribution, with the odds a function of how many scientists are trying and how much compute they have; it does not seem implausible that, as the neolabs proliferate, such breakthroughs become more common.
What is a better outside-view model of AI research progress? How about an inside view? (Continual learning? Recurrence? Different architectures? Data efficiency?)
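The “draw from a distribution” framing can be made concrete with a toy Poisson sketch. Everything here (the functional form, the efficiency constant, the input numbers) is an illustrative assumption of mine, not an estimate:

```python
import math

def breakthrough_prob(researchers, compute, years, efficiency=1e-5):
    """Toy model: breakthroughs arrive as a Poisson process whose rate
    scales with researchers * compute. The 'efficiency' constant and all
    inputs are made-up illustrative numbers, not estimates."""
    rate = efficiency * researchers * compute  # expected breakthroughs per year
    return 1 - math.exp(-rate * years)         # P(at least one within 'years')

# More neolabs -> more researcher-compute -> higher chance of at least
# one breakthrough in a fixed window.
p_now = breakthrough_prob(researchers=1_000, compute=10, years=3)
p_proliferated = breakthrough_prob(researchers=5_000, compute=30, years=3)
```

The point of the sketch is only that the probability of a breakthrough in a fixed window is monotone in researcher-compute; a better model would have to pin down the rate empirically.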
Data poisoning. How much of the internet would you need to poison to meaningfully affect the propensities of the models trained on it?
This Anthropic / UK AISI paper says that a “near-constant number of poison samples” is enough. If this is true, is it happening already? Can you use this to degrade bioweapon-relevant capabilities in open-weight models?
How do you quantify the effect on the propensities, rather than the capabilities, of the models? How robust are those backdooring attacks to fine-tuning and other mitigations? See also the IAPS report on AI integrity and Forethought on catastrophic data poisoning (itself with many interesting and important defense-relevant questions).
How effective is AI-powered training-data filtering at removing backdoors? How expensive is it?
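The “near-constant number of poison samples” finding has a simple arithmetic consequence worth stating: if the absolute count needed stays roughly fixed, the fraction of the training corpus an attacker must control shrinks as datasets grow. A minimal sketch (the 250-document figure is roughly what the paper reports, but treat all numbers as assumptions):

```python
def poison_fraction(n_poison, dataset_size):
    """Fraction of a training corpus an attacker must control, assuming a
    roughly constant absolute number of poison samples suffices (per the
    Anthropic / UK AISI finding). Numbers below are illustrative."""
    return n_poison / dataset_size

# Same absolute count, shrinking relative footprint as corpora scale:
small_corpus = poison_fraction(250, 1_000_000)      # vs. a 1M-document corpus
large_corpus = poison_fraction(250, 1_000_000_000)  # vs. a 1B-document corpus
```

If this scaling holds, larger models trained on larger corpora are not automatically safer from poisoning; the attacker’s job stays constant while the defender’s haystack grows.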
Chinese chip production. Is China good at building AI chips? How far behind are they? How long until they reach parity?
There are arguments on both sides. Nvidia is invested in getting one answer, and the AI safetyists in another, which muddies the field. If China is near parity, that seems like a big deal.
In general, is AI a priority for the Chinese government? What leading indicators would we have?
Alignment difficulty. There has been a vibe shift toward thinking AI alignment is easier. What is generating that? In general, what are the best ways to reason about alignment difficulty? How solvable would it be as a large-scale R&D effort?
This is especially important given that alignment difficulty, I believe, is a cause of the split between AI safety camps. How does it relate to previous conceptual leaps, including simulators and shard theory? My sense is that the vibe shift rests on an empirical argument: we are seeing less emergent misalignment than in previous generations, which might speak to the problem being easier. Is there a more satisfying explanation?
In an upcoming post, I’ll speak a little bit about my simplified, stylized model of alignment through considering the space of human values. I’d be curious to see where that is wrong.
AI verification. What is the state of the art in AI verification? How much time / resources would it take to meet the standards of mutual visibility and vulnerability for a bilateral AI deal?
How effective are privacy-preserving AI-agent-based approaches, software- or firmware-only approaches, and hardware approaches? Are there other things you can do? How good are the best open-source datasets?
Utopia. What does a good future look like? Is it possible to reach an end-state which pleases all? What are the institutional features of a bilateral / multinational AGI project?
I know some think this is a poorly formed question, and that the journey matters more than the destination, but I’m not satisfied. What is missing from Nozick’s “Utopia of Utopias”? (Perhaps best illustrated here.) How do you solve the problem of new minds, i.e., what is the decision process for deciding when a new mind (a morally relevant agent) can be created? (Should there be a hard rate limit? Some kind of resource limitation?)
If answers don’t already exist (online, in nonpublic documents, or in people’s heads), I believe each of these could take a talented researcher’s, or an organization’s, worth of resources. Do email me if you’d be interested in working on, or funding work on, any of these (or in pointing me toward answers that already exist!)
Thanks to Samuel, Adam, Parv, Saheb and Sophie for helpful conversation and feedback.


