Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams at Dota 2.
Evolved Policy Gradients
Proximal Policy Optimization
We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably or better than state-of-the-art approaches while being much simpler to implement and tune.
Robots that Learn
We've created a robotics system, trained entirely in simulation and deployed on a physical robot, which can learn a new task after seeing it done once.