
August 6, 2018

OpenAI Five Benchmark: Results

[Photo: two people wearing headphones at a desk, playing in front of a live audience]

Yesterday, OpenAI Five won a best-of-three against a team of 99.95th-percentile Dota players: Blitz, Cap, Fogged, Merlini, and MoonMeander—four of whom have played Dota professionally—in front of a live audience and 100,000 concurrent livestream viewers.

The human team won game three after the audience adversarially selected Five’s heroes. We also showed our preliminary work on introspecting Five’s view of the game, including its probability of winning, whose predictions surprised the human observers. These results show that Five is a step towards advanced AI systems that can handle the complexity and uncertainty of the real world.

In case you missed it: the livestream from the Benchmark, commentated by Purge and ODPixel. Christy and Greg also both livetweeted the event.


Overview of the day

Audience game

The day began with a team of volunteers from the audience bravely playing the first public match against OpenAI Five. Five won within the first 14 minutes (an evenly-matched game generally takes 45 minutes).

[Photos: the large, crowded venue, and a row of players wearing headsets at their monitors]

Games 1 and 2

[Photo: a player embracing his teammate]

We revealed a new OpenAI Five capability—the ability to draft. Drafting is considered an extremely challenging part of Dota, since heroes interact with each other in complex ways.

OpenAI Five team probability estimate

In late June, we added a win-probability output to our neural network so we could introspect what OpenAI Five is predicting. When we later considered drafting, we realized we could use this output to evaluate the win probability of any draft: just look at the prediction on the first frame of a game with that lineup. In a week of implementation work, we crafted a fake first frame for each of the 11 million possible team matchups and wrote a tree search to find OpenAI Five’s optimal draft.
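The idea can be sketched as a minimax search over alternating picks, scored by a win-probability function at the leaves. This is only an illustrative toy, not OpenAI Five’s actual drafter: `win_prob` here is a made-up logistic over hypothetical per-hero values (the real system queries the network on a crafted first frame), and the draft is simplified to a strict-alternation 2v2 with no bans.

```python
import math
from functools import lru_cache

# Toy stand-in for the win-probability head. In the real system the number
# comes from the network's prediction on a fake first frame for the lineup;
# here we score lineups with hypothetical per-hero strengths.
HEROES = ("axe", "lich", "sniper", "sven", "tiny", "viper")
VALUE = {h: i for i, h in enumerate(HEROES)}  # hypothetical strengths

def win_prob(ours, theirs):
    """P(our lineup beats theirs), as a toy logistic of the value gap."""
    gap = sum(VALUE[h] for h in ours) - sum(VALUE[h] for h in theirs)
    return 1 / (1 + math.exp(-0.5 * gap))

@lru_cache(maxsize=None)
def draft_value(ours, theirs, pool, our_turn, picks_left):
    """Minimax over alternating picks: we maximize the final win
    probability, the opposing drafter minimizes it."""
    if picks_left == 0:
        return win_prob(ours, theirs)
    outcomes = []
    for h in pool:
        rest = tuple(x for x in pool if x != h)
        if our_turn:
            outcomes.append(draft_value(ours + (h,), theirs, rest, False, picks_left - 1))
        else:
            outcomes.append(draft_value(ours, theirs + (h,), rest, True, picks_left - 1))
    return max(outcomes) if our_turn else min(outcomes)

# Best opening pick in the simplified 2v2 draft (three picks remain after ours).
best_first = max(
    HEROES,
    key=lambda h: draft_value((h,), (), tuple(x for x in HEROES if x != h), False, 3),
)
```

Memoizing on the (ours, theirs, pool) state keeps the search tractable; the real search additionally had to cover all 11 million full matchups.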

[Photo: a row of people looking upwards]

After the game 1 draft, OpenAI Five predicted a 95% win probability, even though the matchup seemed about even to the human observers. It won the first game in 21 minutes and 37 seconds. After the game 2 draft, OpenAI Five predicted a 76.2% win probability, and won the second in 24 minutes and 53 seconds.

Game 3: audience draft

For the third game, we asked the audience to draft OpenAI Five’s heroes. As expected, they selected an adversarial lineup.


Before the game began, OpenAI Five predicted a 2.9% chance of winning. Five played on despite the bad odds, and at one point made enough progress to predict a 17% win probability, before ultimately losing after 35 minutes and 47 seconds.

[Photo: people greeting each other behind a long desk of monitors]


Our usual development cycle is to train each major revision of the system from scratch. However, this version of OpenAI Five contains parameters that have been training since June 9th across six major system revisions. Each revision was initialized with parameters from the previous one.

We invested heavily in “surgery” tooling which allows us to map old parameters to a new network architecture. For example, when we first trained warding, we shared a single action head for determining where to move and where to place a ward. But Five would often drop wards seemingly in the direction it was trying to go, and we hypothesized it was allocating its capacity primarily to movement. Our tooling let us split the head into two clones initialized with the same parameters.
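At its simplest, that kind of surgery is a mapping from old parameter names to new ones, with the shared head’s weights copied into each clone. The sketch below illustrates the head-splitting case; the parameter names (`action_head`, `move_head`, `ward_head`) and the flat name-to-array layout are assumptions for illustration, not our actual tooling.

```python
import numpy as np

def split_head(old_params, shared="action_head", clones=("move_head", "ward_head")):
    """Map old parameters onto a new architecture in which one shared head
    has been split into independently trainable clones, each initialized
    from the shared head's weights. All names here are hypothetical."""
    new_params = {}
    for name, value in old_params.items():
        head, sep, rest = name.partition("/")
        if head == shared:
            # Copy the shared head's weights into every clone.
            for clone in clones:
                new_params[f"{clone}{sep}{rest}"] = value.copy()
        else:
            # Everything else maps across unchanged.
            new_params[name] = value.copy()
    return new_params

old = {"lstm/W": np.zeros((4, 4)), "action_head/W": np.ones((4, 2))}
new = split_head(old)
```

Because each clone gets its own copy, the two heads start out identical but can diverge as training continues, letting one specialize in movement and the other in ward placement.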

We estimate that we used the following amounts of compute to train our various Dota systems:

  • 1v1 model: 8 petaflop/s-days

  • June 6th model: 11 petaflop/s-days[A]

  • Aug 5th model: 35 petaflop/s-days[A]
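For scale, a petaflop/s-day is 10^15 operations per second sustained for one day, so the figures above convert to raw operation counts as follows (assuming exactly that definition):

```python
# Operations in one petaflop/s-day: 1e15 ops/sec for 24 hours.
PFLOPS_DAY = 1e15 * 24 * 3600

compute = {"1v1": 8, "June 6th": 11, "Aug 5th": 35}  # petaflop/s-days
total_ops = {name: c * PFLOPS_DAY for name, c in compute.items()}
# The Aug 5th model works out to roughly 3.0e21 operations.
```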

We are also releasing our latest network architecture.

Peeking at the model

We can get some insight into the model’s planning via an output that predicts where a hero will be in the future. In the following video, the highlighted boxes show the predicted location of Sven in 6 seconds:

We can also train outputs to predict various other quantities — last hits, tower counts, and the like:
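Schematically, each of these auxiliary quantities is just another head reading the same shared features. The sketch below is an assumption-laden illustration of that pattern, not our network: the feature size, head names, and plain linear heads are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16  # hypothetical size of the shared trunk features (e.g. an LSTM state)

# One linear head per auxiliary quantity, all reading the same features.
heads = {
    "future_position": rng.standard_normal((2, D)),  # (x, y) six seconds ahead
    "last_hits":       rng.standard_normal((1, D)),
    "tower_count":     rng.standard_normal((1, D)),
}

def auxiliary_predictions(features):
    """Apply every auxiliary head to the shared trunk features."""
    return {name: W @ features for name, W in heads.items()}

preds = auxiliary_predictions(rng.standard_normal(D))
```

Because the heads are cheap relative to the trunk, extra prediction targets like these can be added largely for introspection without changing how the policy itself acts.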

Getting our model to work requires debugging many errors and unexpected behaviors. Here are some examples:

What’s next

These results give us confidence in moving to the next phase of this project: playing a team of professionals at The International later this month. We will announce details of the games once they are confirmed—follow us on Twitter to stay up to date!


  A. We revised these numbers after a more rigorous analysis (4/14/19).