We've found that self-play allows simulated AIs to discover physical skills like tackling, ducking, faking, kicking, catching, and diving for the ball, without explicitly designing an environment with these skills in mind.
Better Exploration with Parameter Noise
We've found that adding adaptive noise to the parameters of reinforcement learning algorithms frequently boosts performance. This exploration method is simple to implement and very rarely decreases performance, so it's worth trying on any problem.
OpenAI Baselines: DQN
We're open-sourcing OpenAI Baselines, our internal effort to reproduce reinforcement learning algorithms with performance on par with published results. We'll release the algorithms over upcoming months; today's release includes DQN and three of its variants.
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
We've discovered that evolution strategies (ES), an optimization technique that's been known for decades, rivals the performance of standard reinforcement learning (RL) techniques on modern RL benchmarks, while overcoming many of RL's inconveniences.