Safety Systems

The Safety Systems team is dedicated to ensuring the safety, robustness, and reliability of AI models and their deployment in the real world.

Building on many years of practical alignment work and applied safety efforts, Safety Systems addresses emerging safety issues and develops new, fundamental solutions that enable the safe deployment of our most advanced models and future AGI, with the aim of making AI beneficial and trustworthy.

Safety Systems stays closest to deployment risks, while our Superalignment team focuses on aligning superintelligence and our Preparedness team focuses on safety assessments for frontier models. Together, these teams span a wide spectrum of technical efforts tackling AI safety challenges at OpenAI.

Our approach

We believe that safe AGI cannot be developed in a vacuum. Learning how to safely deploy powerful models and future AGI requires continuous learning, iterative practice, and research in the real world. We continue to invest in model behavior alignment, safety and ethical reasoning skills in foundation models, end-to-end safety infrastructure, and alignment with human values through human-AI collaboration on policy development.

Problems

Safe deployment of AI models requires solving a new and evolving set of technical challenges and open-ended safety problems. Current examples include:

  • How do we ensure our models robustly avoid giving unsafe or inappropriate answers, while still giving useful and trustworthy answers across a wide range of applications, from high-stakes domains to playful ones?

  • How do we detect unknown classes of harmful answers, actions, or usage?

  • How do we maintain user privacy while ensuring safety?

  • How do we build AI that collaborates with users and safely takes actions on their behalf?

  • How can we use one model to red-team another model and discover novel failure cases?

  • How do we best leverage diverse human expertise to guide AI safety?

  • How do we share our learnings and solutions to improve safety across the industry?

Lessons

We need to approach AI safety from first principles, using AI itself to solve AI safety challenges and building general solutions for entire categories of problems.

There is a delicate tradeoff between a model's safe behavior and its usefulness. For example, as the team builds reliable and robust refusal behavior into the model, it is crucial to draw the right boundary and to understand context so that the model does not over-refuse.

Solid engineering work and infrastructure are the foundation. They enable fast iteration on research and mitigations through analysis of real-world data and use cases, rapid prototyping, and smooth deployment. We are designing and building a safety service centered on model capabilities for automated investigation, analysis, and enforcement decisions, with a better data flywheel feeding back into model training.

Team

Safety Systems brings together a diverse team of experts in engineering, research, policy, human-AI collaboration, and product management. This combination of talents has proven highly effective, enabling us to draw on a wide spectrum of solutions, from pre-training improvements and model fine-tuning to inference-time monitoring and mitigation.

Safety Systems consists of four subteams.

  • Safety Engineering: The team implements system-level mitigations in products, builds secure, privacy-aware, centralized safety service infrastructure, and creates ML-centric tooling for investigation and enforcement at scale.

  • Model Safety Research: Model behavior alignment is a core focus of our work, with the goal of creating safer models that act in line with our values and are reliable and controllable. The team advances our ability to precisely implement robust, safe behavior in our models.

  • Safety Reasoning Research: Detecting and understanding risks, both known and unknown, is essential to guide the design of default safe model behavior and mitigations. The team is working towards this goal by building better safety and ethical reasoning skills into the foundation model and using these skills to enhance our moderation models.

  • Human-AI Interaction: Policy is the interface for aligning model behavior with desired human values. We co-design policies with models and for models, so that they can be plugged directly into our safety systems. Human experts also provide feedback at various stages to keep the system aligned with human expectations.

Join us

Safety Systems is at the forefront of AI safety research and development. If you're interested in being part of this groundbreaking work, we invite you to apply for our safety engineer and research engineer positions. Come work on making AI systems safe and beneficial for humanity.