Confidence-Building Measures for Artificial Intelligence: Workshop Proceedings

Abstract

Foundation models could eventually introduce several pathways for undermining state security: accidents, inadvertent escalation, unintentional conflict, the proliferation of weapons, and interference with human diplomacy are just a few on a long list. The Confidence-Building Measures for Artificial Intelligence workshop, hosted by the Geopolitics Team at OpenAI and the Berkeley Risk and Security Lab at the University of California, Berkeley, brought together a multistakeholder group to think through the tools and strategies to mitigate the potential risks that foundation models pose to international security. Originating in the Cold War, confidence-building measures (CBMs) are actions that reduce hostility, prevent conflict escalation, and improve trust between parties. The flexibility of CBMs makes them a key instrument for navigating the rapid changes in the foundation model landscape. Participants identified the following CBMs, which directly apply to foundation models and are further explained in these proceedings: (1) crisis hotlines, (2) incident sharing, (3) model, transparency, and system cards, (4) content provenance and watermarks, (5) collaborative red teaming and table-top exercises, and (6) dataset and evaluation sharing. Because most foundation model developers are non-government entities, many CBMs will need to involve a wider stakeholder community. These measures can be implemented either by AI labs or by relevant government actors.

Acknowledgments

Report authors, in order of contribution

Sarah Shoker (OpenAI)*
Andrew Reddie (University of California, Berkeley)**

Report authors, in alphabetical order

Sarah Barrington (University of California, Berkeley)
Ruby Booth (Berkeley Risk and Security Lab)
Miles Brundage (OpenAI)
Husanjot Chahal (OpenAI)
Michael Depp (Center for a New American Security)
Bill Drexel (Center for a New American Security)
Marina Favaro (Anthropic)
Ritwik Gupta (University of California, Berkeley)
Jake Hecla (University of California, Berkeley)
Alan Hickey (OpenAI)
Margarita Konaev (Center for Security and Emerging Technology)
Kirthi Kumar (University of California, Berkeley)
Nathan Lambert (Hugging Face)
Andrew Lohn (Center for Security and Emerging Technology)
Cullen O'Keefe (OpenAI)
Nazneen Rajani (Hugging Face)
Michael Sellitto (Anthropic)
Robert Trager (Centre for the Governance of AI)
Leah Walker (University of California, Berkeley)
Alexa Wehsener (Institute for Security and Technology)
Jessica Young (Microsoft)


All authors provided substantive contributions to the paper by sharing their ideas as participants in the workshop, writing the paper, and/or providing editorial feedback and direction. The first two authors are listed in order of contribution, and the remaining authors are listed alphabetically. Some workshop participants have chosen to remain anonymous. The claims in this paper do not represent the views of any author’s organization. For questions about this paper, contact Sarah Shoker at sshoker@openai.com and Andrew Reddie at areddie@berkeley.edu.

*Significant contribution, including writing, providing detailed input for the paper, research, workshop organization, and setting the direction of the paper.
**Significant contribution, including providing detailed input for the paper, research, workshop organization, and setting the direction of the paper.