We study how training on incorrect responses can cause broader misalignment in language models and identify an internal feature driving this behavior—one that can be reversed with minimal fine-tuning.
We are replacing the existing GPT-4o-based model for Operator with a version based on OpenAI o3. The API version will remain based on 4o.
Codex is a cloud-based coding agent. Codex is powered by codex-1, a version of OpenAI o3 optimized for software engineering. codex-1 was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences, adheres precisely to instructions, and iteratively runs tests until passing results are achieved.
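Introducing Codex: a cloud-based software engineering agent, powered by codex-1, that can work on many tasks in parallel. With Codex, developers can deploy multiple agents at once to independently handle coding tasks such as writing features, answering questions about their codebase, fixing bugs, and proposing pull requests for review.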
HealthBench is a new evaluation benchmark for AI in healthcare which evaluates models in realistic scenarios. Built with input from 250+ physicians, it aims to provide a shared standard for model performance and safety in health.
OpenAI o3 and o4-mini represent a major breakthrough in visual perception by reasoning with images in their chain of thought.
Our smartest and most capable models to date, with full tool access.
OpenAI o3 and OpenAI o4-mini combine state-of-the-art reasoning with full tool capabilities—web browsing, Python, image and file analysis, image generation, canvas, automations, file search, and memory.
Sharing our updated framework for measuring and protecting against severe harm from frontier AI capabilities.