Until recently, it was normal for AI companies to launch new models first and answer hard safety questions later. That is changing. On May 5, 2026, the U.S. Center for AI Standards and Innovation, or CAISI, announced new agreements with Google DeepMind, Microsoft, and xAI. Under these agreements, CAISI can test powerful “frontier” AI systems before they are released to the public. The agency says it will run pre-deployment evaluations and targeted research to measure what these models can do and what risks they may create. (nist.gov)
This is part of a larger policy shift. CAISI sits inside NIST, the National Institute of Standards and Technology, under the U.S. Department of Commerce. It was reorganized from the U.S. AI Safety Institute on June 3, 2025, with a new mission that stresses both innovation and national security. The White House’s AI Action Plan, released on July 23, 2025, also called for an “AI evaluations ecosystem” and for the government to remain at the forefront of evaluating national-security risks in frontier models. (commerce.gov)
The scale is already considerable. CAISI says it has completed more than 40 evaluations, including tests on state-of-the-art models that have not yet been released. In some cases, developers provide versions with fewer safeguards so evaluators can better study risks such as cybersecurity, biosecurity, or other national-security concerns. The agreements are voluntary, and CAISI says they were written to be flexible and to support testing even in classified environments. Earlier agreements with Anthropic and OpenAI, first announced on August 29, 2024, also gave the government access to major new models before and after release. (nist.gov)
So, can safety and speed coexist? Maybe—but only if testing is fast, trusted, and focused. CAISI’s March 27, 2026 agreement with OpenMined shows one possible answer: use privacy-preserving methods so companies can share sensitive models or data without handing over full access. That approach suggests the government is trying to reduce risk without significantly slowing development. The real test will be whether these evaluations stay rigorous while AI keeps moving at high speed. (nist.gov)