5 minute read

GPT-2: 6-Month Follow-Up

We’re releasing the 774 million parameter GPT-2 language model after the release of our small 124M model in February, staged release of our medium 355M model in May, and subsequent research with partners and the AI community into the model’s potential for misuse and societal benefit. We’re also releasing an open-source legal agreement to make it easier for organizations to initiate model-sharing partnerships with each other, and are publishing a technical report about our experience in coordinating with the wider AI research community on publication norms.

Read ReportView CodeLegal Agreement

Key things we’ve learned

1. Coordination is difficult, but possible. To date, there hasn’t been a public release of a 1558M parameter language model, though multiple organizations have developed the systems to train them, or have publicly discussed how to train larger models. For example, teams from both NLP developer Hugging Face and the Allen Institute for Artificial Intelligence (AI2) with the University of Washington have explicitly adopted similar staged release approaches to us. Since February, we’ve spoken with more than five groups who have replicated GPT-2[1].

2. Humans can be convinced by synthetic text. Research from our research partners Sarah Kreps and Miles McCain at Cornell published in Foreign Affairs says people find GPT-2 synthetic text samples almost as convincing (72% in one cohort judged the articles to be credible) as real articles from the New York Times (83%)[2]. Additionally, research from AI2/UW has shown that news written by a system called "GROVER" can be more plausible than human-written propaganda. These research results make us generally more cautious about releasing language models.

3. Detection isn’t simple. In practice, we expect detectors to need to detect a significant fraction of generations with very few false positives. Malicious actors may use a variety of sampling techniques (including rejection sampling) or fine-tune models to evade detection methods. A deployed system likely needs to be highly accurate (99.9%–99.99%) on a variety of generations. Our research suggests that current ML-based methods only achieve low to mid–90s accuracy, and that fine-tuning the language models decreases accuracy further. There are promising paths forward (see especially those advocated by the developers of "GROVER") but it's a genuinely difficult research problem. We believe that statistical detection of text needs to be supplemented with human judgment and metadata related to the text in order to effectively combat misuse of language models.

Partnerships

We’ve partnered with four leading research organizations to analyze both the newly-released 774M parameter GPT-2 model and the unreleased full-size GPT-2 model. We’ve included some preliminary results from them in our technical report, and their ongoing analysis will factor into the potential release of the 1558M model. We’ve also developed a non-commercial legal agreement to facilitate the sharing of models between organizations and are publishing it here to help others initiate such sharing schemes.

  • Cornell University is studying human susceptibility to digital disinformation generated by language models.
  • The Middlebury Institute of International Studies Center on Terrorism, Extremism, and Counterterrorism (CTEC) is exploring how GPT-2 could be misused by terrorists and extremists online.
  • The University of Oregon is developing a series of “bias probes” to analyze bias within GPT-2.
  • The University of Texas at Austin is studying the statistical detectability of GPT-2 outputs after fine-tuning the model on domain-specific datasets, as well as the extent of detection transfer across different language models.

Future release decisions

Research from these partners will factor into our future release decisions, as will observing how the 774M model is used, and discussing language models with researchers and policymakers to understand the considerations around larger models. As part of our staged release strategy, our current plan is to release the 1558M parameter model in a few months, but it's plausible that findings from a partner, or malicious usage of our 774M model, could change this.

We think that a combination of staged release and partnership-based model sharing is likely to be a key foundation of responsible publication in AI, particularly in the context of powerful generative models. The issues inherent to large models are going to grow, rather than diminish, over time. We hope that our work on GPT-2, discussed further in the technical report we're publishing, will help provide evidence the AI community can draw on when thinking about the publication challenges inherent to some parts of AI research.


Acknowledgments
Thanks to the following for feedback on this post: Yura Burda, Daniela Amodei, Josh Tobin, Christopher Olah, Jeff Wu, Alec Radford, Ashley Pilipiszyn, Greg Brockman, Rowan Zellers, and Dario Amodei.

Footnotes

  1. Having these conversations is difficult, as it involves talking candidly about proprietary systems and it’s unclear who to reach out to in specific organizations to discuss such models and what the appropriate processes are for inter-org discussion about unreleased research. ↩︎

  2. These samples were generated via a “human-in-the-loop” process meant to simulate contemporary disinformation operations, where a human generated samples and periodically selected some for exposure to people. ↩︎