Anthropic Reverses Secret AI Model Degradation Policy

Anthropic is altering a policy for its latest AI model, Claude Fable 5, that would have covertly limited researchers from using the technology to develop competing AI models. The company announced the change after significant criticism from the artificial intelligence research community.

Visible Safeguards for AI Development

Anthropic released Claude Fable 5 earlier this week, a version of its advanced AI model equipped with enhanced safety measures. While some safeguards, such as rerouting users asking about cybersecurity or biology to a less capable model, were anticipated, a different approach was planned for those engaged in frontier AI development.

The company had intended to invisibly degrade the performance of Claude Fable 5 for researchers working on training other AI models. This practice, explicitly banned in Anthropic’s terms of service, would have effectively hindered competitors from using the model. Anthropic stated in a message to WIRED that it is now making these safeguards visible, allowing users to be alerted if their requests are refused or rerouted due to suspected attempts to build highly capable AI.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement. “We made the wrong trade-off and we apologize for not getting the balance right.”

Backlash from AI Community

The reversal came after strong opposition from AI researchers. Critics argued that secretly degrading the model’s capabilities crossed a line, even as Anthropic has previously taken steps to limit the use of Claude for building AI models. Claude’s coding agent is a popular tool among developers, including those in open-source AI research.

Dean Ball, a senior fellow at the Foundation for American Innovation and former White House AI adviser, described the policy of degrading performance without user notification as “shockingly hostile and a terrible look.” He added that such “secret sabotage” undermines Anthropic’s broader efforts to promote AI safety and collaboration.

Will Brown, research lead at Prime Intellect, expressed concern that the policy suggested Anthropic did not trust other entities to conduct AI research. “It feels a bit like they’re starting to pull the ladder up behind them,” Brown said, noting that the lack of transparency would leave developers uncertain about potential rule violations.

Brown also highlighted the potential impact on third-party evaluation firms that assess AI models for safety and performance, work that could have been hampered by secret degradations.

Rationale for Original Policy

Anthropic originally implemented these measures due to concerns about the rapid advancement of AI capabilities, stating that AI could improve faster than society can adapt. The company believes it is beneficial to have the option to slow or pause frontier AI development to allow for societal adjustments and alignment research.

The company also cited national security concerns, stating the safeguards were intended to prevent foreign adversaries from using advanced models to gain an advantage in areas like chip optimization. Anthropic indicated that hidden safeguards are more difficult to circumvent, allowing for more targeted application. However, with the AI development safeguard now visible, Anthropic anticipates it may need to cast a broader net, potentially affecting more benign requests, and is working to improve classifier precision.

Mitchell Landsberg

Mitchell Landsberg is the senior reporter for News Raise and focuses on Technology. Mitchell regularly writes about social media platforms and how influencers, industry and general people use them to communicate and make money.