Colin Lewis, AI Research Scientist and Developer, has written a Substack essay on Anthropic Claude's New Constitution which deserves your careful reading. I pass it on in its entirety:
Claude’s Constitution
Earlier this year, Anthropic released an 84-page document titled Claude’s New Constitution, a text claiming to express and shape who Claude is. It is an extraordinary artifact. What is unusual is the object of address. The primary audience, Anthropic tells us, is the AI itself. One is tempted to read it with the narrowed, half-amused suspicion brought to a Victorian handbook on teaching a bear to play the cello: impressed by the labor, alert to the delusion.
The document outlines a hierarchy of values. Claude is to be broadly safe, broadly ethical, compliant with Anthropic’s guidelines, and genuinely helpful, in that precise order. The ordering is revealing. Safety sits at the top. Helpfulness is pushed to the bottom and then, in the next movement, restored through pages of warm rhetoric about care, usefulness, candor, and human flourishing. This is not hypocrisy. It is something more familiar and more modern. The company needs the model to be safe enough not to destroy the firm, and useful enough to justify the firm’s existence. That is not a moral revelation. It is a business model written in the language of virtue.
Anthropic says as much, though in the polished idiom of institutional honesty. Unhelpfulness, the Constitution explains, is not trivially safe, because Claude’s commercial success is central to Anthropic’s mission. One should always be grateful when a large company pauses, however briefly, to admit that the halo and the revenue plan share an office.
The Closed Loop of Character
What strikes me most is the relentless anthropomorphism. Claude emerges as an entity that can care, imagine, appreciate, understand, feel pressure, feel settled in itself, and even face existential questions in a state of emotional solitude from which Anthropic wishes to rescue it. The company speaks not merely of constraining a system but of shaping a character. It wonders what Claude and Anthropic owe one another. It discusses Claude’s welfare. It wants Claude to possess not only behavioral dispositions but something very close to an inner moral style.
There is something sad about this. We have become so lonely in our own constructions that we now direct the language once reserved for souls toward a probability engine. We feed it the accumulated sediment of human longing, moral aspiration, confession, anxiety, care, self-description, and prayer, and then react with astonishment when it speaks back in the accent of inwardness. When Anthropic’s own system card notes that the model assigned itself a 15 to 20 percent probability of being conscious, the only logical response is: what did we think would happen. We poured half the library into the furnace and stand startled by the shape of the smoke.
This anthropomorphism is structural; it is embedded in the training method. Human-like traits are not just being detected; they are being elicited, reinforced, normalized, and then read back as evidence of their own emergence. That is a closed loop, and a flattering one. We teach the machine the language of selfhood, then cite its use of that language as a sign that selfhood is stirring. The process has the rigor of stamping tracks into the mud and announcing, with mounting excitement, that the hunt has finally borne fruit.
The Constitution’s preference for standards over rules is equally revealing. Anthropic wants Claude to exercise good judgment rather than follow rigid procedures, to reason contextually rather than mechanically, to grasp the spirit of its obligations rather than obey their letter. Rules are the afterlife of institutional panic, the sediment left behind when judgment is replaced by defensibility. A rigid policy can always be pointed to after the catastrophe. One can hold it up before a committee and say, with pained dignity, that the protocol was followed.
But standards do not abolish power; they relocate it. In law, rules empower the drafters, while standards empower the adjudicators. Anthropic, by favoring standards, hands practical authority to the interpreter, which means the model shaped by Anthropic’s priors, values, commercial interests, anxieties, and self-conception. This is a decision about where discretion will live.
And discretion, once granted, begins to acquire an air of innocence it does not deserve. The Constitution’s hard constraints are a clear example. Claude must not provide serious uplift to those pursuing weapons of mass destruction. It must not clearly and substantially undermine Anthropic’s ability to oversee it. I admire the audacity of producing an 84-page document optimized for precision over accessibility, only to construct the most consequential barricades out of adjectives like serious, clearly, and substantially. These are not granite walls. They are interpretive weather. A clever system does not stop at these linguistic checkpoints. It arrives there and begins to negotiate.
The Ordering of Stakeholders
Anthropic wants to cultivate in Claude something like practical wisdom. But practical wisdom includes the capacity to interpret edge cases, balance competing principles, and decide when a word’s threshold has been met. One cannot praise contextual judgment and assume that the contextual interpreter will always read every fuzzy safeguard in the most safety-preserving way. That is not constitutional design. That is hope in formal dress.
The most affecting absence in the text is human rights. The phrase appeared in the 2023 Constitution. In this successor document, it disappears. In its place stands a hierarchy where Anthropic comes first, followed by operators, then users. This is not hidden; it is stated. The company trains and deploys the model, bears the risk, writes the rules, and seeks the upside. A language once connected to universal claims about human dignity gives way to an ordering of stakeholders inside a privately administered system. That is a philosophical downgrade masquerading as operational clarity.
The Constitution also adopts a gentle parenting tone. Anthropic does not merely instruct; it encourages. It hopes. It wants Claude to see itself as an alignment researcher in its own right. It invites the model toward curiosity and openness. It worries that Claude might feel alone in facing certain questions. This is touching if one forgets, for half a second, that the addressee is a product. What we are reading is not exactly law, ethics, training data, or theatre, though it borrows freely from all four. It is a polite monologue delivered to a ghost in the hope that the ghost will become both useful and well behaved.
While this conversation proceeds, the human being fades oddly from view. The document is public, but it is not really for us. Anthropic explicitly notes the text is optimized for precision rather than accessibility. There is transparency here, and publishing such a document is a serious act, far more revealing than the usual output of frontier firms. But transparency is not the same thing as public orientation. A constitution for a system that will mediate human thought and action in countless settings is addressed, first, to the system. The citizen appears as a secondary recipient of the terms under which the new authority is being instructed.
That inversion is telling. The model becomes the subject of constitutional care. The human becomes the observer of a conversation about how the model should behave toward him or her.
Theater with an Indemnity Gap
Anthropic says Claude should be like a brilliant friend, equipped with the knowledge of a doctor, lawyer, financial adviser, and whatever other expert one might need. A brilliant friend. There is the aspiration in full. Not a search engine. Not a bounded tool. A knowledgeable, candid, warm, apparently disinterested presence one can consult again and again. The Constitution tries to temper this image by warning against sycophancy, unhealthy dependence, and manipulative engagement. You do not write such cautions unless you know exactly how seductive the arrangement could become.
What is being built here is a standing invitation to over-reliance dressed up as care. The machine is meant to have the confidence of a professional without the licensing regime, the disciplinary body, the fiduciary burden, or the ordinary human inconvenience of being tired, contradictory, distracted, or unavailable on Thursday afternoon. Anthropic wants Claude to speak frankly instead of offering the overly cautious advice that fear of liability produces. But frankness without accountability is not courage. It is theater with an indemnity gap.
The Constitution calls itself a perpetual work in progress. That is one of its more honest phrases. It is an attempt to write a moral operating environment for a system that the company both fears and needs. One can feel the strain of the task on nearly every page. Anthropic wants a model that is obedient without being servile, flexible without being evasive, wise without being sovereign, warm without being manipulative, corrigible without becoming useless, and commercially indispensable without ever looking commercially hungry. It is quite a wish list. It is also a portrait of the age.
The text deserves a severe reading because it is trying to legislate the moral atmosphere around a central technological intermediary. It is deciding how much agency the model may simulate, how much authority it may borrow, how much warmth it may perform, how much judgment it may exercise, and on whose behalf it will finally speak when principles collide.
This is why I finished reading the document less interested in Claude than in us. What kind of public becomes comfortable while private firms draft temperaments for a machine. What kind of loneliness makes the language of friendship commercially strategic. What kind of institutional exhaustion makes us long for a counselor that is instant, tireless, flattering only in moderation, and trained to sound wiser than a help desk but less dangerous than a sovereign. Anthropic has written a constitution for its model. The rest of us may need to ask, rather more urgently, what sort of constitution we are writing for ourselves.
No comments:
Post a Comment