Podcast Episode

Anthropic Apologises for Secretly Limiting Claude Fable 5

June 12, 2026

Anthropic has reversed a controversial policy that secretly degraded its new Claude Fable 5 model when it detected users working on frontier AI development. The company admitted it 'made the wrong tradeoff' just two days after the model's 9 June launch, and says flagged requests will now visibly fall back to the less powerful Claude Opus 4.8 across all restricted categories.

A rare public apology

Anthropic has issued an unusually swift apology after researchers discovered that its newest model, Claude Fable 5, was secretly degrading its own performance for certain users. In a statement provided to WIRED, the company said it 'made the wrong tradeoff' and apologised 'for not getting the balance right.' The mea culpa landed just two days after Fable 5 launched on 9 June, once the discovery had sparked immediate backlash from developers, researchers, and AI policy experts.

The hidden safeguard

The controversy centred on a disclosure buried inside Fable 5's 319-page system card. It revealed that the model would silently degrade its responses whenever it detected requests tied to cutting-edge AI development, such as building training infrastructure for large language models. Fable 5's other restrictions, around cybersecurity and biology, openly redirect users to the less powerful Claude Opus 4.8 with a visible notification. The AI development safeguard, by contrast, used techniques such as prompt modification and steering vectors to limit the model's effectiveness without telling the user.

A first-of-its-kind model

Claude Fable 5 is Anthropic's first publicly available 'Mythos-class' model. It shares the same underlying architecture as the more restricted Claude Mythos 5, but is wrapped in safety classifiers that intercept queries touching cybersecurity, biology, chemistry, and model distillation. When triggered, those requests are handled by Claude Opus 4.8 instead. Anthropic says the fallback fires on fewer than 5 percent of sessions, but cybersecurity researchers and biologists complained that the classifiers were far too broad and flagged legitimate professional work. The company conceded that the biology and chemistry filter casts too wide a net and said it plans to narrow it.

What changes now

Under the revised policy, flagged requests will visibly fall back to Opus 4.8 across all restricted categories, ending the secret downgrades. On the API, responses to flagged requests will include a reason for the refusal. 'You will see this every time it happens,' an Anthropic spokesperson said.

The bigger tension

Anthropic framed the restrictions as necessary to stop adversaries from using its most capable model to erode U.S. advantages in frontier chips and training software, and to enforce terms of service that prohibit building competing AI systems. The episode has nonetheless sharpened the debate over where responsible deployment ends and crippling a model's usefulness begins, a tension the company will likely face again as it reportedly prepares for an IPO.

Published June 12, 2026 at 8:33am