Until recently, the idea of the U.S. government testing powerful AI models before public release sounded extreme. In May 2026, it became much more real: NIST’s Center for AI Standards and Innovation, or CAISI, announced new agreements with Google DeepMind, Microsoft, and xAI to conduct pre-deployment evaluations of frontier AI systems. OpenAI and Anthropic had already signed similar agreements in 2024, so all five major U.S. frontier labs now participate in this voluntary review system. According to NIST, CAISI has already completed more than 40 evaluations, including tests on advanced models that have not yet been released to the public. (nist.gov)
Why is Washington moving in this direction? The short answer is national security. NIST says these agreements are meant to assess frontier AI capabilities, study security risks, and improve AI evaluation methods. In some cases, developers even provide versions of models with weakened safeguards so government experts can probe what the systems might do in worst-case conditions. CAISI also works with an interagency group called the TRAINS Taskforce and can support testing in classified environments. This push fits the Trump administration’s July 23, 2025 AI Action Plan, which called for building and updating national-security-related AI evaluations through collaboration between CAISI and security agencies. (nist.gov)
The bigger question is whether this should remain voluntary. Supporters argue that independent testing before release is simply common sense: if a model can dramatically strengthen cyberattacks or reveal dangerous new capabilities, society should know before the system spreads everywhere. That concern intensified following reports about Anthropic’s Mythos model and its potential to accelerate hacking. At the same time, critics worry that government review could slow innovation or evolve into political control over a fast-moving technology. For now, the United States has not created a mandatory approval regime; officials have said only that they are studying a possible executive order that would make the process more like an FDA-style safety check. In other words, America is no longer asking whether AI should be evaluated before release. It is arguing about how strong that evaluation should be. (investing.com)










