The Preparedness team is dedicated to making frontier AI models safe
We have several safety and policy teams working together to mitigate risks from AI. Our Safety Systems team focuses on mitigating misuse of current models and products like ChatGPT. Superalignment builds foundations for the safety of superintelligent models that we (hope) to have in a more distant future. The Preparedness team maps out the emerging risks of frontier models, and it connects to Safety Systems, Superalignment and our other safety and policy teams across OpenAI.
Preparedness should be driven by science and grounded in facts
We are investing in the design and execution of rigorous capability evaluations and forecasting to better detect emerging risks. In particular, we want to move the discussions of risks beyond hypothetical scenarios to concrete measurements and data-driven predictions. We also want to look beyond what’s happening today to anticipate what’s ahead. This is so critical to our mission that we are bringing our top technical talent to this work.
We bring a builder’s mindset to safety
Our company is founded on tightly coupling science and engineering, and the Preparedness Framework brings that same approach to our work on safety. We learn from real-world deployment and use the lessons to mitigate emerging risks. For safety work to keep pace with the innovation ahead, we cannot simply do less, we need to continue learning through iterative deployment.
Preparedness Framework (Beta)
Our Preparedness Framework (Beta) lays out the following approach to develop and deploy our frontier models safely:
We will run evaluations and continually update “scorecards” for our models. We will evaluate all our frontier models, including at every 2x effective compute increase during training runs. We will push models to their limits. These findings will help us assess the risks of our frontier models and measure the effectiveness of any proposed mitigations. Our goal is to probe the specific edges of what’s unsafe to effectively mitigate the revealed risks. To track the safety levels of our models, we will produce risk “scorecards” and detailed reports.
We will define risk thresholds that trigger baseline safety measures. We have defined thresholds for risk levels along the following initial tracked categories - cybersecurity, CBRN (chemical, biological, radiological, nuclear threats), persuasion, and model autonomy. We specify four safety risk levels, and only models with a post-mitigation score of “medium” or below can be deployed; only models with a post-mitigation score of “high” or below can be developed further. We will also implement additional security measures tailored to models with high or critical (pre-mitigation) levels of risk.
We will establish a dedicated team to oversee technical work and an operational structure for safety decision-making. The Preparedness team will drive technical work to examine the limits of frontier models capability, run evaluations, and synthesize reports. This technical work is critical to inform OpenAI’s decision-making for safe model development and deployment. We are creating a cross-functional Safety Advisory Group to review all reports and send them concurrently to Leadership and the Board of Directors. While Leadership is the decision-maker, the Board of Directors holds the right to reverse decisions.
We will develop protocols for added safety and outside accountability. The Preparedness Team will conduct regular safety drills. Some safety issues can emerge rapidly, so we have the ability to mark urgent issues for rapid response. We believe it is instrumental that this work gets feedback from people outside OpenAI and expect to have audits conducted by qualified, independent third-parties. We will continue having others red-team and evaluate our models, and we plan to share updates externally.
We will help reduce other known and unknown safety risks. We will collaborate closely with external parties as well as internal teams like Safety Systems to track real-world misuse. We will also work with Superalignment on tracking emergent misalignment risks. We are also pioneering new research in measuring how risks evolve as models scale, to help forecast risks in advance, similar to our earlier success with scaling laws. Finally, we will run a continuous process to try surfacing any emerging “unknown unknowns.”
This is a summary of the main components of the Preparedness Framework (Beta), and we encourage you to read the complete version. This framework is the initial Beta version that we are adopting, and is intended to be a living document. We expect it to be updated regularly as we learn more and receive additional feedback. We welcome your thoughts at email@example.com.