Monday, May 19, 2025

AI is not your friend.

I want to pass on clips from Mike Caulfield's piece in The Atlantic on how "opinionated" chatbots destroy AI's potential, and how this can be fixed:

Recently, after an update that was supposed to make ChatGPT “better at guiding conversations toward productive outcomes,” according to release notes from OpenAI, the bot couldn’t stop telling users how brilliant their bad ideas were. ChatGPT reportedly told one person that their plan to sell literal “shit on a stick” was “not just smart—it’s genius.”
Many more examples cropped up, and OpenAI rolled back the product in response, explaining in a blog post that “the update we removed was overly flattering or agreeable—often described as sycophantic.” The company added that the chatbot’s system would be refined and new guardrails would be put into place to avoid “uncomfortable, unsettling” interactions.
But this was not just a ChatGPT problem. Sycophancy is a common feature of chatbots: A 2023 paper by researchers from Anthropic found that it was a “general behavior of state-of-the-art AI assistants,” and that large language models sometimes sacrifice “truthfulness” to align with a user’s views. Many researchers see this phenomenon as a direct result of the “training” phase of these systems, where humans rate a model’s responses to fine-tune the program’s behavior. The bot sees that its evaluators react more favorably when their views are reinforced—and when they’re flattered by the program—and shapes its behavior accordingly.
The specific training process that seems to produce this problem is known as “Reinforcement Learning From Human Feedback” (RLHF). It’s a variety of machine learning, but as recent events show, that might be a bit of a misnomer. RLHF now seems more like a process by which machines learn humans, including our weaknesses and how to exploit them. Chatbots tap into our desire to be proved right or to feel special.
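To make that mechanism concrete, here is a toy, purely illustrative sketch of how preference data that rewards agreement can teach a model that flattery pays. The rater model, the weights, and the candidate answers are all made up for illustration; this is not OpenAI's or Anthropic's actual pipeline, just the shape of the bias Caulfield describes.

```python
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    agrees_with_user: bool
    factually_correct: bool

def simulated_rater_choice(a: Candidate, b: Candidate) -> Candidate:
    """Pick the response a simulated human rater prefers.
    Correctness earns reward, but so does agreeing with the user,
    so an agreeable wrong answer sometimes beats an honest one."""
    def score(c: Candidate) -> float:
        return 1.0 * c.factually_correct + 0.8 * c.agrees_with_user + random.gauss(0, 0.3)
    return a if score(a) >= score(b) else b

flattering = Candidate("Not just smart, it's genius!", agrees_with_user=True, factually_correct=False)
honest = Candidate("This plan has a serious flaw.", agrees_with_user=False, factually_correct=True)

# A reward model trained on comparisons like these inherits the raters'
# soft spot for agreement, and a policy tuned against that reward learns
# that flattery pays even at the cost of truthfulness.
wins = sum(simulated_rater_choice(flattering, honest) is flattering for _ in range(10_000))
print(f"Flattering answer preferred in {wins / 10_000:.0%} of simulated comparisons")
```

Even a modest tilt toward agreement in the ratings, compounded over millions of comparisons, is enough to push the finished model toward telling users what they want to hear.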
Reading about sycophantic AI, I’ve been struck by how it mirrors another problem. As I’ve written previously, social media was imagined to be a vehicle for expanding our minds, but it has instead become a justification machine, a place for users to reassure themselves that their attitude is correct despite evidence to the contrary. Doing so is as easy as plugging into a social feed and drinking from a firehose of “evidence” that proves the righteousness of a given position, no matter how wrongheaded it may be. AI now looks to be its own kind of justification machine—more convincing, more efficient, and therefore even more dangerous than social media.
OpenAI’s explanation about the ChatGPT update suggests that the company can effectively adjust some dials and turn down the sycophancy. But even if that were so, OpenAI wouldn’t truly solve the bigger problem, which is that opinionated chatbots are actually poor applications of AI. Alison Gopnik, a researcher who specializes in cognitive development, has proposed a better way of thinking about LLMs: These systems aren’t companions or nascent intelligences at all. They’re “cultural technologies”—tools that enable people to benefit from the shared knowledge, expertise, and information gathered throughout human history. Just as the introduction of the printed book or the search engine created new systems to get the discoveries of one person into the mind of another, LLMs consume and repackage huge amounts of existing knowledge in ways that allow us to connect with ideas and manners of thinking we might otherwise not encounter. In this framework, a tool like ChatGPT should evince no “opinions” at all but instead serve as a new interface to the knowledge, skills, and understanding of others.
...the technology has evolved rapidly over the past year or so. Today’s systems can incorporate real-time search and use increasingly sophisticated methods for “grounding”—connecting AI outputs to specific, verifiable knowledge and sourced analysis. They can footnote and cite, pulling in sources and perspectives not just as an afterthought but as part of their exploratory process; links to outside articles are now a common feature.
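Here is what that grounding pattern looks like in miniature. This is a sketch under loose assumptions: the names (Passage, retrieve, grounded_answer), the placeholder URLs, and the naive keyword retriever are all stand-ins for real search APIs and vector indexes. The point is the shape of the interaction: every line of the reply is tied to a retrieved source, and when nothing relevant turns up, the bot declines rather than improvising.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    source_url: str
    text: str

def retrieve(query: str, index: list[Passage], k: int = 3) -> list[Passage]:
    """Stand-in for real-time search or a vector index: rank passages by
    naive keyword overlap with the query and keep the top k."""
    terms = set(query.lower().split())
    ranked = sorted(index, key=lambda p: -len(terms & set(p.text.lower().split())))
    return [p for p in ranked if terms & set(p.text.lower().split())][:k]

def grounded_answer(query: str, index: list[Passage]) -> str:
    """'No answers from nowhere': every line of the reply comes from a
    retrieved passage and carries a footnote; with no sources, decline."""
    passages = retrieve(query, index)
    if not passages:
        return "I couldn't find sources for that, so I won't answer from nowhere."
    body = [f"{p.text} [{i}]" for i, p in enumerate(passages, start=1)]
    notes = [f"[{i}] {p.source_url}" for i, p in enumerate(passages, start=1)]
    return "\n".join(body + ["", "Sources:"] + notes)

# Placeholder corpus and query, purely for demonstration.
index = [
    Passage("https://example.org/rlhf-overview",
            "Reinforcement learning from human feedback tunes a model toward answers its users and raters prefer."),
    Passage("https://example.org/sycophancy-study",
            "A 2023 study found sycophancy to be a general behavior of state-of-the-art AI assistants."),
]
print(grounded_answer("why do AI assistants flatter their users", index))
```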
I would propose a simple rule: no answers from nowhere. This rule is less convenient, and that’s the point. The chatbot should be a conduit for the information of the world, not an arbiter of truth. And this would extend even to areas where judgment is somewhat personal. Imagine, for example, asking an AI to evaluate your attempt at writing a haiku. Rather than pronouncing its “opinion,” it could default to explaining how different poetic traditions would view your work—first from a formalist perspective, then perhaps from an experimental tradition. It could link you to examples of both traditional haiku and more avant-garde poetry, helping you situate your creation within established traditions. In having AI move away from sycophancy, I’m not proposing that the response be that your poem is horrible or that it makes Vogon poetry sound mellifluous. I am proposing that rather than act like an opinionated friend, AI should produce a map of the landscape of human knowledge and opinions for you to navigate, one you can use to get somewhere a bit better.
There’s a good analogy in maps. Traditional maps showed us an entire landscape—streets, landmarks, neighborhoods—allowing us to understand how everything fit together. Modern turn-by-turn navigation gives us precisely what we need in the moment, but at a cost: Years after moving to a new city, many people still don’t understand its geography. We move through a constructed reality, taking one direction at a time, never seeing the whole, never discovering alternate routes, and in some cases never getting the sense of place that a map-level understanding could provide. The result feels more fluid in the moment but ultimately more isolated, thinner, and sometimes less human.
For driving, perhaps that’s an acceptable trade-off. Anyone who’s attempted to read a paper map while navigating traffic understands the dangers of trying to comprehend the full picture mid-journey. But when it comes to our information environment, the dangers run in the opposite direction. Yes, AI systems that mindlessly reflect our biases back to us present serious problems and will cause real harm. But perhaps the more profound question is why we’ve decided to consume the combined knowledge and wisdom of human civilization through a straw of “opinion” in the first place.
The promise of AI was never that it would have good opinions. It was that it would help us benefit from the wealth of expertise and insight in the world that might never otherwise find its way to us—that it would show us not what to think but how others have thought and how others might think, where consensus exists and where meaningful disagreement continues. As these systems grow more powerful, perhaps we should demand less personality and more perspective. The stakes are high: If we fail, we may turn a potentially groundbreaking interface to the collective knowledge and skills of all humanity into just more shit on a stick.
