Anthropic’s policy change matters because it touches the central dream of “safety competition” in AI: the idea that labs will compete not only to build stronger models but also to build stronger guardrails. In its September 19, 2023 Responsible Scaling Policy, Anthropic committed to pausing scaling or delaying deployment if model capabilities outpaced the safeguards required at a given risk level. But in Version 3.0, released on February 24, 2026, the company rewrote that approach. The new policy separates what Anthropic itself plans to do from what it believes the industry as a whole should do, and it states plainly that it cannot promise to follow the more ambitious industry-wide recommendations unilaterally. Instead, it now emphasizes Frontier Safety Roadmaps, regular Risk Reports, and, in some cases, external review. (www-cdn.anthropic.com)
What changed is not only the policy text, but the company’s theory of change. Anthropic says it once hoped its framework would trigger a “race to the top,” encouraging rivals to adopt similar standards. In part, that worked: Anthropic notes that OpenAI and Google DeepMind later published comparable frontier-safety frameworks, and the Frontier AI Safety Commitments announced at the AI Seoul Summit on May 21, 2024 pushed major firms to publish safety frameworks and define thresholds for intolerable risk. Yet the same official documents also reveal the weakness of voluntary safety competition. Anthropic now argues that if one company slows down while others keep training and releasing powerful systems, the most reckless actor may end up setting the pace. OpenAI’s updated Preparedness Framework, published in 2025, likewise says it may adjust its requirements if another frontier developer releases a high-risk system without similar safeguards, while Google DeepMind explicitly calls frontier security a collective-action problem. (anthropic.com)
So, can “safety competition” survive? Yes—but only in a limited form. Companies can still compete on transparency, testing, red-teaming, and reporting. Anthropic itself argues that its earlier framework successfully pushed it to build stronger safeguards, and it says ASL-3 protections were activated in May 2025. But the harder lesson is that market pressure alone is unlikely to sustain costly safety promises when rivals can ignore them. In that sense, Anthropic’s retreat is less a sudden betrayal than a warning: without shared rules, public accountability, and eventually regulation, the race to the top can quickly become a race to explain why the top is unreachable. (anthropic.com)