Filip Wolski

4 posts

OpenAI Five

Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams at Dota 2.


Evolved Policy Gradients

Proximal Policy Optimization

We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably to or better than state-of-the-art approaches while being much simpler to implement and tune.


3 minute read

Robots that Learn

We've created a robotics system, trained entirely in simulation and deployed on a physical robot, which can learn a new task after seeing it done once.


3 minute read