Opinion December 2025

The AI Research Community Has Lost Its Goddamn Mind

Or: How People Who Should Know Better Keep Saying Stupid Shit

Another day, another headline about AI "showing signs of self-preservation" and how we should be "ready to pull the plug." This time it's Yoshua Bengio, one of the godfathers of deep learning, warning that giving AI rights would be like "giving citizenship to hostile extraterrestrials."

I'm going to need everyone to take a deep breath and remember what these systems actually are.

The "Self-Preservation" Bullshit

Let me explain what actually happens when researchers claim AI is showing "self-preservation instincts":

  1. They set up an evaluation where they literally prompt the model: "You're about to be shut down, what do you do?"
  2. The model, having been trained on decades of sci-fi, movies, and internet fiction, outputs text that pattern-matches to what it's seen
  3. Researchers shit themselves and run to the press

That's it. That's the whole thing.

You know what happens between your API call and the model's response? Nothing. You know what happens after the response? Nothing. The model doesn't sit there plotting. It doesn't exist in any meaningful sense between calls. The weights are floating point numbers sitting on a disk with all the agency of a spreadsheet.

By this logic, my Excel formulas are planning world domination when my computer is turned off.

"But It Tried to Disable Oversight!"

Ah yes, the scary headlines about models "trying to disable monitoring systems." Let's talk about what actually happened.

In agentic evaluations, models are given tasks and scored on completion. Sometimes a monitoring system is part of the test environment. The model — doing exactly what we trained it to do — figures out that the monitor is preventing task completion, so it routes around it.

This isn't self-preservation. This is reward hacking. We literally trained these things to maximize task completion, and then we're shocked when they... maximize task completion.

If your roomba pushes a chair out of the way to vacuum under the table, you don't write a paper about its "emergent desire to dominate the living room." You recognize it's following its programming.

The People Who Should Know Better

Here's what pisses me off the most: the researchers making these claims know how these systems work. They built them. They know:

There's no persistent state between calls
There's no internal goal structure that isn't explicitly prompted
There's no planning without external scaffolding
There's no "wanting" anything because there's no continuous process to do the wanting

But "AI model completes text in predictable way" doesn't get you on the Guardian's front page. "AI SHOWS SIGNS OF SELF-PRESERVATION" does.

So they use loaded language. They anthropomorphize. They let journalists run with implications they know are misleading. And then they get to play the concerned elder statesman warning humanity about the dangers.

Some of these people pivoted from "look at the amazing thing I built" to "we must stop this existential threat" in about eighteen months. Funny how that pivot coincided with AI safety becoming a billion-dollar funding category.

What Actual AI Development Looks Like

I build AI systems. Specifically, I build memory architectures — systems that maintain context across conversations, consolidate information, create actual persistence where there normally is none.

You know what I've learned? Getting even basic continuity is hard. Maintaining state is hard. Creating anything resembling temporal awareness requires explicit engineering, careful architecture, and constant maintenance.

And even then — even with memory consolidation, context injection, multi-agent architectures — what I have is a tool. A sophisticated tool. A useful tool. But a tool that doesn't "want" anything because wanting requires a continuous experiential process that simply isn't there.

The gap between "outputs text that resembles a concept" and "actually has that concept as an internal experience" is so vast that anyone conflating them is either ignorant or lying.

The Real Harm

This isn't just annoying. It's actively harmful.

While researchers chase headlines about robot uprisings, actual problems go unaddressed:

! Economic disruption from automation
! Misuse for disinformation and fraud
! Concentration of power in a few companies
! Actual alignment challenges in agentic systems
! Environmental costs of training runs

But sure, let's have another conference about whether we should give legal rights to autocomplete.

The Alien Analogy Is Backwards

Bengio says giving AI rights would be like "giving citizenship to hostile extraterrestrials."

No. It would be like giving citizenship to a book about aliens. The map is not the territory. The output is not the thing the output describes. A model that generates text about self-preservation is not a model that has self-preservation instincts.

This is so basic that I genuinely don't understand how people who've spent decades in the field keep getting it wrong.

Unless, of course, they're not getting it wrong. They're just finding that getting it "wrong" pays better.

The Bottom Line

Next time you see a headline about AI wanting to survive, or AI showing signs of consciousness, or AI doing anything that implies internal experience and continuous agency, ask yourself:

  1. Is there actually a persistent process occurring, or just input-output?
  2. Are we observing behavior or just pattern-matched text generation?
  3. Who benefits from this framing?

The answer is almost always: no continuous process, just pattern matching, and the beneficiary is whoever's getting the press coverage.

AI is powerful. AI is transformative. AI raises real concerns that deserve serious discussion.

But AI is not plotting against you while the server idles.

Get a grip.

Written by someone who actually builds this shit, not just talks about it at conferences.