Research
Can frontier LLMs earn $1 million from real-world freelance software engineering?
This report outlines the safety work carried out for the OpenAI o3-mini model, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
Trading Inference-Time Compute for Adversarial Robustness
Sora is OpenAI’s video generation model, designed to take text, image, and video inputs and generate a new video as output. Sora builds on learnings from DALL-E and GPT models and aims to give people expanded tools for storytelling and creative expression.
This report outlines the safety work carried out prior to releasing OpenAI o1 and o1-mini, including external red teaming and frontier risk evaluations according to our Preparedness Framework.
Advancing red teaming with people and AI
SimpleQA is a factuality benchmark that measures the ability of language models to answer short, fact-seeking questions.
We've analyzed how ChatGPT's responses to users vary based on their names, using AI research assistants to protect privacy.
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering.