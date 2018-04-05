To describe the benchmark in detail, as well as provide some baseline results, we are releasing a technical report: Gotta Learn Fast: A New Benchmark for Generalization in RL. This report contains details about the benchmark as well as results from running Rainbow DQN, PPO, and a simple random guessing algorithm called JERK. JERK samples random action sequences in a way that is optimized for Sonic, and as training progresses it replays the top-scoring sequence of actions more frequently.
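
To make the JERK idea concrete, here is a minimal sketch of the two ingredients described above: random action sequences sampled with a task-specific bias, and a replay of the top-scoring sequence that becomes more frequent as training progresses. It is not the implementation from the technical report. The toy 1-D level, the `toy_score` function, the three-action set, the `right_bias` weight, and the linear replay schedule are all assumptions made up for illustration.

```python
import random

ACTIONS = ("right", "left", "jump")  # hypothetical toy action set, not Sonic's


def toy_score(actions, wall_at=5, goal=12):
    """Score an action sequence in a made-up 1-D level (a stand-in for a game).

    The agent starts at x=0. The wall at `wall_at` can only be passed by a
    "jump" immediately followed by "right". The score is the furthest x reached.
    """
    x = 0
    best_x = 0
    jumped = False
    for action in actions:
        if action == "right":
            blocked = (x + 1 == wall_at) and not jumped
            if not blocked:
                x = min(x + 1, goal)
        elif action == "left":
            x = max(x - 1, 0)
        jumped = action == "jump"
        best_x = max(best_x, x)
    return best_x


def sample_sequence(rng, length, right_bias=0.7):
    """Sample random actions biased toward moving right (an assumed heuristic)."""
    other = (1.0 - right_bias) / 2.0
    return rng.choices(ACTIONS, weights=[right_bias, other, other], k=length)


def jerk_like_search(episodes=200, length=30, seed=0):
    """Random search that replays the best sequence more often over time."""
    rng = random.Random(seed)
    best_seq, best_score = None, -1
    replay_flags = []
    for episode in range(episodes):
        # Assumed schedule: replay probability grows linearly from 0 toward 1.
        replay_prob = episode / episodes
        is_replay = best_seq is not None and rng.random() < replay_prob
        seq = best_seq if is_replay else sample_sequence(rng, length)
        score = toy_score(seq)
        replay_flags.append(is_replay)
        if score > best_score:
            best_seq, best_score = list(seq), score
    return best_seq, best_score, replay_flags


if __name__ == "__main__":
    seq, score, flags = jerk_like_search()
    print("best score:", score, "replayed episodes:", sum(flags))
```

Because the toy level is deterministic, replaying the best sequence always reproduces its score. In a stochastic game the replayed score would vary, so a fuller version would need to track an average score per sequence rather than a single value.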

We found that we could significantly boost PPO’s performance on the test levels by leveraging experience from the training levels. When the network was pre-trained on the training levels and fine-tuned on the test levels, its performance nearly doubled, making it better than the strongest alternative baselines. While this is not the first reported instance of successful transfer learning in RL, it is exciting because it shows that transfer learning can have a large and reliable effect.

But we have a long way to go before our algorithms can rival human performance. As shown above, after two hours of practice on the training levels and one hour of play on each test level, humans are able to attain scores that are significantly higher than those attained by RL algorithms, including ones that perform transfer learning.