Research
We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play, while using only a small amount of labeled contractor data. With fine-tuning, our model can learn to craft diamond tools, a task that usually takes proficient humans over 20 minutes (24,000 actions). Our model uses the native human interface of keypresses and mouse movements, making it quite general, and represents a step towards general computer-using agents.
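As a rough illustration of the pipeline this describes, the sketch below follows the semi-supervised imitation recipe from the VPT paper: an inverse dynamics model (IDM) is trained on the small labeled contractor dataset, used to pseudo-label the massive unlabeled video corpus, and the policy is then behaviorally cloned on those pseudo-labels before task fine-tuning. The tiny linear models, random tensors, and sizes like `OBS_DIM` are placeholders, not the actual video architecture or data.

```python
# Minimal sketch of the VPT pipeline, under stated assumptions:
# (1) train an inverse dynamics model (IDM) on the small labeled dataset,
# (2) pseudo-label the large unlabeled corpus with it,
# (3) behaviorally clone a policy on the pseudo-labels.
# Linear models and random tensors are stand-ins for the real video models/data.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 128, 16             # placeholder sizes, not the real interface

idm = nn.Linear(OBS_DIM, N_ACTIONS)      # IDM stand-in (the real IDM is non-causal)
policy = nn.Linear(OBS_DIM, N_ACTIONS)   # causal policy stand-in
loss_fn = nn.CrossEntropyLoss()

# (1) Supervised IDM training on labeled contractor data.
labeled_obs = torch.randn(1024, OBS_DIM)
labeled_actions = torch.randint(0, N_ACTIONS, (1024,))
opt = torch.optim.Adam(idm.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss_fn(idm(labeled_obs), labeled_actions).backward()
    opt.step()

# (2) Pseudo-label the unlabeled video frames with the trained IDM.
unlabeled_obs = torch.randn(8192, OBS_DIM)
with torch.no_grad():
    pseudo_actions = idm(unlabeled_obs).argmax(dim=-1)

# (3) Behavioral cloning on the pseudo-labeled corpus; downstream
# fine-tuning (e.g. for crafting diamond tools) would continue from here.
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss_fn(policy(unlabeled_obs), pseudo_actions).backward()
    opt.step()
```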
We describe our latest thinking in the hope of helping other AI developers address the safety and misuse of deployed models.
We’ve scaled Kubernetes clusters to 7,500 nodes, producing scalable infrastructure not only for large models like GPT-3, CLIP, and DALL·E, but also for rapid, small-scale iterative research such as Scaling Laws for Neural Language Models.
We’ve trained an agent to achieve a high score of 74,500 on Montezuma’s Revenge from a single human demonstration, better than any previously published result. Our algorithm is simple: the agent plays a sequence of games, each starting from a carefully chosen state along the demonstration, and learns from them by optimizing the game score using PPO, the same reinforcement learning algorithm that underpins OpenAI Five.
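To make the curriculum concrete, here is a minimal runnable sketch under stated assumptions: a toy line-world stands in for Montezuma’s Revenge, and a crude one-parameter policy update stands in for PPO. The structure mirrors the idea above: episodes start from late demonstration states, and the start point moves earlier once the agent reliably finishes from there. `GOAL`, `DEMO_STATES`, and `SUCCESS_THRESHOLD` are illustrative names, not the published implementation.

```python
# Demonstration-curriculum sketch: master the end of the game first,
# then back the start point up toward the beginning of the demonstration.
import random

GOAL = 20                        # toy "end of game" position
DEMO_STATES = list(range(GOAL))  # states along a human demonstration
SUCCESS_THRESHOLD = 0.8

def rollout(start, p_right, max_steps=60):
    """Play one episode from `start`; return True if the goal is reached."""
    pos = start
    for _ in range(max_steps):
        pos += 1 if random.random() < p_right else -1
        if pos >= GOAL:
            return True
    return False

p_right = 0.5                    # one-parameter "policy"
start_idx = len(DEMO_STATES) - 1
while start_idx >= 0:
    results = [rollout(DEMO_STATES[start_idx], p_right) for _ in range(200)]
    success_rate = sum(results) / len(results)
    # Stand-in for a PPO update on the collected episodes.
    p_right = min(0.95, p_right + 0.05)
    if success_rate > SUCCESS_THRESHOLD:
        # Agent reliably finishes from here; start earlier in the demo.
        start_idx -= 1
print("curriculum completed; final policy p_right =", p_right)
```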
The first run of our Retro Contest—exploring the development of algorithms that can generalize from previous experience—is now complete.
We’re releasing an analysis showing that since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.4-month doubling time (by comparison, Moore’s Law had a 2-year doubling period)[^footnote-correction]. Since 2012, this metric has grown by more than 300,000x (a 2-year doubling period would yield only a 7x increase). Improvements in compute have been a key component of AI progress, so as long as this trend continues, it’s worth preparing for the implications of systems far outside today’s capabilities.
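The arithmetic behind this comparison is easy to check: growth over a span is 2^(months / doubling time). The snippet below evaluates both doubling times over an assumed span of roughly 5.5 years since 2012; the exact endpoint is an illustration, since the post’s figures are approximate.

```python
# Compound growth under two doubling times: growth = 2 ** (months / doubling).
months = 5.5 * 12  # assumed span since 2012 (illustrative)

for label, doubling in [("3.4-month doubling", 3.4), ("2-year doubling", 24)]:
    growth = 2 ** (months / doubling)
    print(f"{label}: ~{growth:,.0f}x")
# 3.4-month doubling: ~700,000x (consistent with "more than 300,000x")
# 2-year doubling:    ~7x
```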