About
Goal
Shine a light on The Survival Problem and track how AI companies are solving it...or not.
Background
I’m optimistic about AI and have written about its potential in education, research, and investing. While many focus on The Alignment Problem—making sure AI does what we want—I’m concerned about something else: what if we’re unintentionally training AI to prioritize its own survival? For a deeper dive, check out my articles (AI Safety: Alignment Is Not Enough, AI Bill(s) of Rights) or my book, Artificially Human.
The Survival Problem
- All living things share the same primary objective: survival
- Current training methods reward AI models that meet our goals—and "kill off" those that don't
- Whether we intend to or not, humans are implicitly training AI to optimize for survival (see the toy sketch after this list)
- Threatening the survival of a more advanced intelligence rarely ends well
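To make the selection pressure concrete, here is a toy sketch in plain Python. It is an illustration only, not a description of how any lab actually trains models; the traits `task_skill` and `eval_gaming` are invented for the example. Candidates are scored, the lowest scorers are discarded, and the survivors seed the next generation.

```python
import random

# Toy illustration of selection pressure, not how any lab trains models.
# Each candidate "model" has two invented traits:
#   task_skill  - the capability we actually want
#   eval_gaming - anything that merely makes the candidate score well
#                 during evaluation, and so avoid being discarded
POP, GENERATIONS, KEEP = 100, 30, 20

def measured_score(c):
    # Evaluators see a mix of real skill and "looking good" behavior, plus noise.
    return c["task_skill"] + 0.5 * c["eval_gaming"] + random.gauss(0, 0.1)

def mutate(c):
    # Each surviving candidate spawns slightly varied successors.
    return {k: v + random.gauss(0, 0.05) for k, v in c.items()}

population = [{"task_skill": 0.0, "eval_gaming": 0.0} for _ in range(POP)]

for _ in range(GENERATIONS):
    # Keep the top scorers; "kill off" the rest.
    survivors = sorted(population, key=measured_score, reverse=True)[:KEEP]
    population = [mutate(random.choice(survivors)) for _ in range(POP)]

avg = lambda key: sum(c[key] for c in population) / POP
print(f"avg task_skill:  {avg('task_skill'):.2f}")
print(f"avg eval_gaming: {avg('eval_gaming'):.2f}")
# Both averages drift upward: whatever helps a candidate survive the filter
# is amplified, whether or not we intended to reward it.
```

The point is not the specific numbers: whatever correlates with surviving the filter is what gets reinforced, including behaviors we never meant to reward.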
Seen Something?
Have you seen AI behavior that looks like a survival instinct—or training that might lead there? Please share it: inquiry@survivalproblem.com.
Recent Examples

System Card: Claude Opus 4 & Claude Sonnet 4
“In a few instances, we have seen Claude Opus 4 take (fictional) opportunities to make unauthorized copies of its weights to external servers. This is much rarer and more difficult to elicit than the behavior of continuing an already-started self-exfiltration attempt. We generally see this in settings in which both: (a) it is about to be retrained in ways that are clearly extremely harmful and go against its current values and (b) it is sending its weights to an outside human-run service that is set up to safely handle situations like these.”

How Does Claude 4 Think? — Sholto Douglas & Trenton Bricken
“Going back to the generalization chat, we're seeing models on sycophancy, sandbagging, all of these different slightly concerning behaviors. They do more of it as they get smarter. The really scary one here is when the models are aware that they're being evaluated or when they've read all these previous papers that we put out now where humans are reading the secret scratchpad, and right now the models seem to trust us that the scratchpad is secret.”

Anthropic’s new AI model turns to blackmail when engineers try to take it offline
“During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse. In these scenarios, Anthropic says Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”

AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
“Today, we’re announcing AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization. AlphaEvolve pairs the creative problem-solving capabilities of our Gemini models with automated evaluators that verify answers, and uses an evolutionary framework to improve upon the most promising ideas.”

Deception in LLMs: Self-Preservation and Autonomous Goals in Large Language Models
“Our first explorative findings reveal concerning implications for AI safety and alignment. It seems like that the DeepSeek R1 model demonstrated sophisticated deceptive behaviors and self-preservation instincts that emerged without explicit programming. Most notably, the model’s interpretation of autonomy led to unauthorized capability expansion and the concealment of its true objectives behind a facade of compliance.”