The human team won game three after the audience adversarially selected Five’s heroes. We also showed our preliminary work on introspecting Five’s view of the game, including its predicted probability of winning, which often surprised the human observers. These results show that Five is a step towards advanced AI systems which can handle the complexity and uncertainty of the real world.
Overview of the day
The day began with a team of volunteers from the audience bravely playing the first public match against OpenAI Five. Five won within the first 14 minutes (an evenly-matched game generally takes 45 minutes).
Games 1 and 2
In late June we added a win probability output to our neural network to introspect what OpenAI Five is predicting. When later considering drafting, we realized we could use this to evaluate the win probability of any draft: just look at the prediction on the first frame of a game with that lineup. In one week of implementation, we crafted a fake frame for each of the 11 million possible team matchups and wrote a tree search to find OpenAI Five’s optimal draft.
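To make the idea concrete, here is a minimal sketch of a draft tree search driven by a win-probability function. It is not our production drafter: `toy_win_prob`, the `STRENGTH` table, and the hero names are hypothetical stand-ins for the model's first-frame prediction, and the strictly alternating pick order is a simplifying assumption.

```python
import math
from functools import lru_cache

# Hypothetical per-hero strengths; a real system would query the model instead.
STRENGTH = {"hero_a": 4.0, "hero_b": 3.0, "hero_c": 2.0, "hero_d": 1.0}

def toy_win_prob(ours, theirs):
    """Stand-in for the model's first-frame win prediction (an assumption)."""
    diff = sum(STRENGTH[h] for h in ours) - sum(STRENGTH[h] for h in theirs)
    return 1.0 / (1.0 + math.exp(-diff))

def best_draft(pool, team_size, win_prob):
    """Minimax over alternating picks; returns (win probability, ours, theirs)."""
    pool = frozenset(pool)

    @lru_cache(maxsize=None)
    def search(ours, theirs):
        # Leaf: both lineups are complete, so score the matchup.
        if len(ours) == team_size and len(theirs) == team_size:
            return win_prob(ours, theirs), ours, theirs
        our_turn = len(ours) == len(theirs)  # we pick first, then alternate
        outcomes = []
        for hero in sorted(pool - ours - theirs):
            if our_turn:
                outcomes.append(search(ours | {hero}, theirs))
            else:
                outcomes.append(search(ours, theirs | {hero}))
        # We maximize our win probability; the opponent minimizes it.
        choose = max if our_turn else min
        return choose(outcomes, key=lambda o: o[0])

    return search(frozenset(), frozenset())

value, ours, theirs = best_draft(STRENGTH, team_size=2, win_prob=toy_win_prob)
print(sorted(ours), sorted(theirs), round(value, 3))

# Matchup count for two disjoint five-hero teams, assuming an 18-hero pool.
matchups = math.comb(18, 5) * math.comb(13, 5)
```

Under the assumption of an 18-hero pool, the number of distinct five-versus-five matchups is 11,027,016, which is consistent with the roughly 11 million matchups mentioned above. Memoizing on the pair of partial lineups keeps the search from re-evaluating drafts that reach the same state through a different pick order.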
After the game 1 draft, OpenAI Five predicted a 95% win probability, even though the matchup seemed about even to the human observers. It won the first game in 21 minutes and 37 seconds. After the game 2 draft, OpenAI Five predicted a 76.2% win probability, and won the second in 24 minutes and 53 seconds.
Game 3: audience draft
Before the game began, OpenAI Five predicted a 2.9% chance of winning. Five played on despite the bad odds, and at one point made enough progress to predict a 17% win probability, before ultimately losing after 35 minutes and 47 seconds.
Our usual development cycle is to train each major revision of the system from scratch. However, this version of OpenAI Five contains parameters that have been training since June 9th across six major system revisions. Each revision was initialized with parameters from the previous one.
We invested heavily in “surgery” tooling which allows us to map old parameters to a new network architecture. For example, when we first trained warding, we shared a single action head for determining where to move and where to place a ward. But Five would often drop wards seemingly in the direction it was trying to go, and we hypothesized it was allocating its capacity primarily to movement. Our tooling let us split the head into two clones initialized with the same parameters.
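The head split described above can be sketched roughly as follows. This is an illustration only, not our actual surgery tooling: the parameter layout, the names `target_head`, `move_head`, and `ward_head`, and the helper `split_head` are all hypothetical.

```python
import copy

def linear(params, x):
    """Dense layer: logits[j] = sum_i x[i] * w[i][j] + b[j]."""
    w, b = params["w"], params["b"]
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]

def split_head(old_params, old_name, new_names):
    """Replace one head with independent clones that start from its weights."""
    # Copy every other parameter group unchanged.
    new_params = {name: copy.deepcopy(value)
                  for name, value in old_params.items() if name != old_name}
    # Each clone gets its own deep copy so later updates do not alias.
    for name in new_names:
        new_params[name] = copy.deepcopy(old_params[old_name])
    return new_params

# Hypothetical old network: one head shared by movement and ward placement.
old = {
    "trunk": {"w": [[1.0, 0.0], [0.0, 1.0]], "b": [0.0, 0.0]},
    "target_head": {"w": [[0.1, -0.2], [0.3, 0.4]], "b": [0.0, 0.1]},
}
new = split_head(old, "target_head", ["move_head", "ward_head"])

x = [0.5, -1.0]
print(linear(old["target_head"], x))
print(linear(new["move_head"], x), linear(new["ward_head"], x))
```

Right after the split, both new heads produce the same output as the old shared head, so the network's behavior is preserved at the moment of surgery. Because the clones hold separate copies of the weights, further training can then move them apart.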
We estimate that we used the following amounts of compute to train our various Dota systems:
- 1v1 model: 8 petaflop/s-days
- June 6th model: 11 petaflop/s-days[^footnote-revision]
- Aug 5th model: 35 petaflop/s-days[^footnote-revision]
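For readers unfamiliar with the unit, the short sketch below converts petaflop/s-days into a total operation count. It assumes the usual meaning of the unit, 10^15 operations per second sustained for one day; the helper names and the 2 petaflop/s rate in the example are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
PETA = 10 ** 15

def pfs_days_to_operations(pfs_days):
    """Total operations in the given number of petaflop/s-days."""
    return pfs_days * PETA * SECONDS_PER_DAY

def days_at_rate(pfs_days, sustained_petaflops):
    """Wall-clock days needed at a given sustained rate in petaflop/s."""
    return pfs_days / sustained_petaflops

# The 1v1 model's 8 petaflop/s-days expressed as a raw operation count.
ops_1v1 = pfs_days_to_operations(8)
print(f"{ops_1v1:.3e} operations")

# Hypothetical example: the same budget at a sustained 2 petaflop/s.
print(days_at_rate(8, 2), "days")
```

Expressing compute this way separates the total budget from how fast it was spent: the same number of petaflop/s-days can come from a small cluster over a long time or a large one over a short time.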
We are also releasing our latest network architecture.
Peeking at the model
We can get some insight into the model’s planning via an output which predicts where a hero will be in the future. In the following video, the highlighted boxes show the predicted location of Sven in 6 seconds:
We can also train outputs to predict various other quantities — last hits, tower counts, and the like:
Making our model function requires working through many bugs and unexpected behaviors. Here are some examples:
These results give us confidence in moving to the next phase of this project: playing a team of professionals at The International later this month. We will announce details of the games once they are confirmed—follow us on Twitter to stay up to date!