Moral Theories As Approximation Methods
And values as clusters in high-dimensional space
A repost from my personal site, which I thought had enough AI relevance, particularly towards the end, to add here. Content warning: dabbling in moral philosophy, no claim to particular novelty.
I’m a fan of grand ambitious theories. Derek Parfit, it seems, was too. His “moral convergence thesis”—developed across his 1,400-page On What Matters—was essentially that the great moral traditions are “climbing the same mountain from different sides.”
A quick review of those traditions:
Kantians generate rules about acts, asking “can this action be universalized without contradiction?” (e.g., the Ten Commandments, or the Universal Declaration of Human Rights)
Consequentialists evaluate outcomes (e.g., cost-benefit analysis, effective altruism)
Contractualists ask what principles no one could reasonably reject
Virtue ethicists ask what helps the agent flourish
Sophisticated versions of each tradition, Parfit argued, converge. Kantians who “treat rational agents as ends in themselves,” consequentialists who recognize side-constraints (maybe don’t harvest organs willy-nilly), and contractualists who follow “reasonable principles” universally—all end up in the same place.
Hence his conclusion: an act is wrong if and only if it’s disallowed by a principle that is:
Uniquely universally willable (passes the Kantian test)
Optimific (produces the best outcomes)
Not reasonably rejectable (passes the contractualist test)
I admire the attempt, and I suspect he was pointing at a real phenomenon. But I don’t think he ultimately proved moral realism.
Instead, I’d argue that values are points (or clusters) in a high-dimensional value-space, and moral theories are approximation methods for locating those clusters.
Moral theories are methods. And they’re portable: you can run utilitarian calculations on alien preferences, or apply Kantian universalizability tests to their maxims. The theories aren’t the values themselves. It doesn’t make sense to ask whether Kantianism or utilitarianism is “true.” Both are more or less useful for approximating a given cluster, which can differ across civilizations.
Values as Clusters
Imagine a space with many dimensions. Each dimension corresponds to something a civilization might care about: welfare, fairness, autonomy, loyalty, purity, reciprocity, and so on. Somewhere in this space sits a region—a cluster of points. This is where a civilization’s coherent values live.
For humans, the cluster includes “protect children, don’t kill unnecessarily, help your kin, don’t take what isn’t yours.” These show up everywhere: Hammurabi’s Code, the Ten Commandments, various Universal Declarations of Human Rights, Claude’s character description.
And while there are differences between human cultures (some tribal societies have very strange practices around, say, cannibalism), a combination of genetic and cultural evolutionary pressures—game-theoretic cooperation, dominance hierarchies, kin selection—has meant these core values are broadly shared.
This framing allows for local moral realism without claiming access to the moral truth. As Eliezer Yudkowsky starkly illustrates in his novella Three Worlds Collide—featuring an intelligent, civilized race of babyeaters—different civilizations can develop coherent values utterly alien to ours, no less internally consistent.
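Here is a minimal sketch of the picture I have in mind, in Python with made-up numbers and axis labels: each civilization is a noisy cloud of points in value-space, and each cloud has a well-defined centre of its own.

```python
import numpy as np

# Toy value-space. Each axis is something a civilization might care about;
# the labels and all numbers below are invented for illustration.
# Axes: [welfare, fairness, autonomy, loyalty, purity]
rng = np.random.default_rng(0)

# The human cluster: individual cultures as noisy samples around a shared core.
human_core = np.array([0.9, 0.8, 0.7, 0.6, 0.3])
humans = human_core + rng.normal(scale=0.1, size=(200, 5))

# A hypothetical alien cluster (the babyeaters), just as coherent, centred elsewhere.
alien_core = np.array([0.2, 0.9, 0.1, 0.95, 0.8])
aliens = alien_core + rng.normal(scale=0.1, size=(200, 5))

# Each cluster has a well-defined centre, and the two centres sit far apart.
print(humans.mean(axis=0).round(2))   # close to human_core
print(aliens.mean(axis=0).round(2))   # close to alien_core
print(round(np.linalg.norm(humans.mean(axis=0) - aliens.mean(axis=0)), 2))
```

The gap between the two centres is the babyeater problem in one number: both clusters are perfectly coherent, they just aren’t the same cluster.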
Theories as Approximation Methods
The moral theories, therefore, are not competing descriptions of our values. They’re competing methods for approximating any value-cluster.
Kantianism, utilitarianism, virtue ethics—these are fitting methods. They take different inputs (act-type, world-state, agent-description), apply different procedures, and produce different approximations, but all aim at the same target: whatever cluster the civilization in question has. You can imagine them as vectors from an origin, pointing in the same general direction.
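If you want to squint at that image numerically (the vectors below are invented), “pointing in the same general direction” just means the theories’ approximations have high cosine similarity:

```python
import numpy as np

# Three theories' approximations of the same cluster centre (vectors invented).
kantian = np.array([0.8, 0.9, 0.5])
consequentialist = np.array([0.9, 0.7, 0.6])
virtue = np.array([0.7, 0.8, 0.7])

def cosine(a, b):
    """Cosine of the angle between two approximation vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# All pairwise similarities are close to 1: different methods, same general direction.
print(round(cosine(kantian, consequentialist), 3),
      round(cosine(kantian, virtue), 3),
      round(cosine(consequentialist, virtue), 3))
```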
A linear algebra aside
Because linear algebra, I promise, is philosophically useful.
Consider moral theories as using different basis vectors to describe the same value-space. Kantianism carves things up by act-types: lying, promise-keeping, treating as means. Consequentialism carves by outcomes: utils. Virtue ethics carves by character: courageous, temperate, just.
You can translate between them. “Don’t lie” (Kantian) becomes “honesty produces better long-run outcomes” (consequentialist) becomes “the honest person is trustworthy” (virtue). Same region of value-space, different coordinates.
The translations aren’t always lossy. Sometimes they’re just clunky. “Maximize welfare” is a clean consequentialist statement. The Kantian translation—something like “act on maxims that respect rational agents’ capacity to pursue their ends”—is awkward but gets there.
Sometimes the translations are lossy. Consequentialism can say “this outcome is 2.3x better than that one.” Virtue ethics has no native way to express that. The precision is lost in the change of basis.
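To push the aside one step further, here is a toy numerical version, with every matrix and number invented: a faithful translation behaves like an invertible change of basis, while a lossy one behaves like a projection onto a coarser basis, which cannot be undone.

```python
import numpy as np

# A value expressed in one theory's coordinates (numbers invented).
v_consequentialist = np.array([2.3, 1.0, 0.4])

# Faithful translation: an invertible change of basis. Nothing is lost,
# because we can always map back to the original coordinates.
B = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.5],
              [0.3, 0.0, 1.0]])   # columns: the other theory's basis vectors
v_kantian = np.linalg.solve(B, v_consequentialist)      # coordinates in the new basis
assert np.allclose(B @ v_kantian, v_consequentialist)   # round-trips exactly

# Lossy translation: projecting onto a coarser basis with fewer dimensions,
# the way virtue ethics might only track a couple of character axes.
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])   # 2 x 3: merges the first two axes into one
v_virtue = P @ v_consequentialist
# No unique way back: many fine-grained values collapse to the same coarse
# description, so the "2.3x better" level of precision is gone.
```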
Under this view, a “moral parliament”—weighting your credence across theories, aggregating their verdicts—is just choosing a weighted average of coordinate systems. Which is fine, but doesn’t resolve anything fundamental. It’s still you picking the weights. (Though perhaps it means you cover more of the space than any single theory would alone.)
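A moral parliament, in the same toy notation, is nothing more mysterious than this (weights and verdicts invented):

```python
# Your credences across theories. Choosing these is the part nothing resolves.
weights = {"kantian": 0.4, "consequentialist": 0.4, "virtue": 0.2}

# Each theory's verdict on some act, scored in [-1, 1] (numbers invented).
verdicts = {"kantian": -1.0, "consequentialist": 0.3, "virtue": -0.2}

# The parliament's output is just a weighted average of the verdicts.
parliament = sum(weights[t] * verdicts[t] for t in weights)
print(parliament)   # -0.32: leans "wrong", but only because of the weights you picked
```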
Upshot
If we accept this view—that civilizations locally converge upon value clusters, and moral theories are our best approximation methods—a few implications follow:
Moral progress splits along two axes. First, improving our approximation methods: refining theories, resolving internal contradictions, reducing their error. Second, getting clearer on which values actually belong in the cluster (expanding our moral circle, for instance). Moral progress isn’t climbing toward God’s view. It’s calibrating to our own attractor.
Moral disagreements split similarly. People from the same civilization disagreeing about the cluster can resolve their conflict through evidence or dialogue. Philosophers disagreeing about methods can check which method better approximates the cluster in a given context. But disagreements across clusters—humans versus babyeaters—may not be resolvable at all.
Moral expertise is real, but limited. Philosophers refine approximation methods. That’s genuine expertise. But it doesn’t grant authority over which cluster to aim at. That should be determined by what humans actually value—perhaps, as Yudkowsky’s Coherent Extrapolated Volition would argue, after substantial time and reflection—not by what philosophers argue they should value.
Morality is neither subjective nor objective. It’s not arbitrary like favorite colors, nor mind-independent like mathematics. It’s closest to language, money, or law: real patterns that exist because of collective human activity, and that constrain individuals once they exist. Human ethics isn’t “true”—it’s just what humans do.
Our intuitions are noisy samples from the cluster. Neither infallible nor arbitrary. Philosophical “edge cases” where intuitions and theories diverge are just regions where our approximation methods have high variance. They reveal less about deep truths and more about the limits of our tools.
The AI Alignment Problem
This brings me to Amanda Askell, Anthropic’s philosopher-in-residence. She has a tough job indeed. In specifying Claude’s “character”—the values and dispositions that guide how the model behaves—she and her colleagues must approximate the human value-cluster well enough that the system behaves in ways we’d endorse, across millions of diverse interactions. No single moral theory suffices. The approximation must be robust to edge cases where our theories diverge.
I have newfound respect for Yudkowsky’s Coherent Extrapolated Volition. It asks: “What would humans want if we knew more, thought faster, were more the people we wished we were?” That’s asking: where’s the center of the cluster, after you remove noise and inconsistency? This view validates the structure of CEV while dropping the realist implication that there’s One True Answer. The cluster is real, but contingent.
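Under the cluster picture, the “remove noise and inconsistency” step looks like an ordinary robust-statistics move: estimate the centre of a cluster without letting the inconsistent samples drag you around. A sketch with invented data, which only gestures at why the operation makes sense, not at how you would do it for actual human values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of "what humans value": mostly near a shared core, plus some
# inconsistent outliers (bias, confusion, motivated reasoning). All invented.
core = np.array([0.9, 0.8, 0.7])
samples = np.vstack([
    core + rng.normal(scale=0.1, size=(300, 3)),   # the bulk of the cluster
    rng.uniform(-1, 1, size=(60, 3)),              # inconsistent outliers
])

naive_centre = samples.mean(axis=0)          # dragged toward the outliers
robust_centre = np.median(samples, axis=0)   # much closer to the shared core

print(naive_centre.round(2), robust_centre.round(2))
```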
And, of course, AIs trained on very different data, or optimizing very different objectives, might develop a different cluster. Not misaligned in the sense of “pursuing human values badly,” but genuinely alien—coherent values that don’t overlap with ours. A different attractor.
What a strange, alien prospect indeed.



