
Universe

We’re releasing Universe, a software platform for measuring and training an AI’s general intelligence across the world’s supply of games, websites and other applications.

Universe allows an AI agent to use a computer like a human does: by looking at screen pixels and operating a virtual keyboard and mouse. We must train AI systems on the full range of tasks we expect them to solve, and Universe lets us train a single agent on any task a human can complete with a computer.

A sample of Universe game environments played by human demonstrators.

In April, we launched Gym, a toolkit for developing and comparing reinforcement learning (RL) algorithms. With Universe, any program can be turned into a Gym environment. Universe works by automatically launching the program behind a VNC remote desktop—it doesn’t need special access to program internals, source code, or bot APIs.

Today’s release consists of a thousand environments including Flash games, browser tasks, and games like slither.io and GTA V. Hundreds of these are ready for reinforcement learning, and almost all can be freely run with the universe Python library as follows:

import gym
import universe # register Universe environments into Gym

env = gym.make('flashgames.DuskDrive-v0') # any Universe environment ID here
env.configure(remotes=1) # automatically creates a local Docker container
observation_n = env.reset()

while True:
  # agent which presses the Up arrow 60 times per second
  action_n = [[('KeyEvent', 'ArrowUp', True)] for _ in observation_n]
  observation_n, reward_n, done_n, info = env.step(action_n)
  env.render()

A frame from the Dusk Drive Flash game.
The sample code above will start your AI playing the Dusk Drive Flash game. Your AI will be given frames like the above 60 times per second. You’ll need to have Docker and universe installed.

Our goal is to develop a single AI agent that can flexibly apply its past experience on Universe environments to quickly master unfamiliar, difficult environments, which would be a major step towards general intelligence. There are many ways to help: granting us permission to use your games, training agents across Universe tasks, (soon) integrating new games, or (soon) playing the games.

With support from EA, Microsoft Studios, Valve, Wolfram, and many others, we’ve already secured permission for Universe AI agents to freely access games and applications such as Portal, Fable Anniversary, World of Goo, RimWorld, Slime Rancher, Shovel Knight, SpaceChem, Wing Commander III, Command & Conquer: Red Alert 2, Syndicate, Magic Carpet, Mirror’s Edge, Sid Meier’s Alpha Centauri, and Wolfram Mathematica. We look forward to integrating these and many more.


Background

The field of artificial intelligence has seen rapid progress over the last few years. Computers can now see, hear, and translate languages with unprecedented accuracy. They are also learning to generate images, sound, and text. A reinforcement learning system, AlphaGo, defeated the world champion at Go. However, despite all of these advances, the systems we’re building still fall into the category of “Narrow AI”—they can achieve super-human performance in a specific domain, but lack the ability to do anything sensible outside of it. For instance, AlphaGo can easily defeat you at Go, but you can’t explain the rules of a different board game to it and expect it to play with you.

Systems with general problem solving ability—something akin to human common sense, allowing an agent to rapidly solve a new hard task—remain out of reach. One apparent challenge is that our agents don’t carry their experience along with them to new tasks. In a standard training regime, we initialize agents from scratch and let them twitch randomly through tens of millions of trials as they learn to repeat actions that happen to lead to rewarding outcomes. If we are to make progress towards generally intelligent agents, we must allow them to experience a wide repertoire of tasks so they can develop world knowledge and problem solving strategies that can be efficiently reused in a new task.

Universe Infrastructure

Universe exposes a wide range of environments through a common interface: the agent operates a remote desktop by observing pixels of a screen and producing keyboard and mouse commands. The environment exposes a VNC server and the universe library turns the agent into a VNC client.

Our design goal for universe was to support a single Python process driving 20 environments in parallel at 60 frames per second. Each screen buffer is 1024x768, so naively reading each frame from an external process would take 3GB/s of memory bandwidth. We wrote a batch-oriented VNC client in Go, which is loaded as a shared library in Python and incrementally updates a pair of buffers for each environment. After experimenting with many combinations of VNC servers, encodings, and undocumented protocol options, we now routinely drive dozens of environments at 60 frames per second with 100ms latency—almost all due to server-side encoding.
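
The bandwidth figure above follows from simple arithmetic; a quick back-of-the-envelope check (assuming 3 bytes per pixel) looks like this:

# Back-of-the-envelope check of the naive memory-bandwidth figure above,
# assuming 3 bytes per pixel (RGB); a 4-byte RGBA buffer raises it further.
width, height, bytes_per_pixel = 1024, 768, 3
envs, fps = 20, 60

bytes_per_frame = width * height * bytes_per_pixel   # ~2.4 MB per screen
bandwidth = bytes_per_frame * envs * fps              # bytes per second
print(round(bandwidth / 1e9, 1), "GB/s")              # ~2.8 GB/s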

Here are some important properties of our current implementation:

General. An agent can use this interface (which was originally designed for humans) to interact with any existing computer program without requiring an emulator or access to the program’s internals. For instance, it can play any computer game, interact with a terminal, browse the web, design buildings in CAD software, operate a photo editing program, or edit a spreadsheet.

Familiar to humans. Since people are already familiar with the pixels/keyboard/mouse interface, humans can easily operate any of our environments. We can use human performance as a meaningful baseline, and record human demonstrations by simply saving VNC traffic. We’ve found demonstrations to be extremely useful for initializing agents with sensible policies via behavioral cloning (i.e. using supervised learning to mimic what the human does), before switching to RL to optimize for the given reward function. (A minimal sketch of behavioral cloning follows this list.)

VNC as a standard. Many implementations of VNC are available online and some are packaged by default into the most common operating systems, including OSX. There are even VNC implementations in JavaScript, which allow humans to provide demonstrations without installing any new software—important for services like Amazon Mechanical Turk.

Easy to debug. We can observe our agent while it is training or being evaluated—we just attach a VNC client to the environment’s (shared) VNC desktop. We can also save the VNC traffic for future analysis.
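
To make the behavioral cloning mentioned above concrete, here’s a minimal sketch. Synthetic data and a linear softmax model stand in for recorded demonstrations and a convnet, so this illustrates the idea rather than our actual training code:

# Behavioral cloning as supervised learning: fit a policy to predict the action
# a human demonstrator took given the frame they saw. Synthetic data and a
# linear softmax model stand in for real demonstrations and a convnet.
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_pixels, n_actions = 512, 32 * 32, 4       # toy stand-ins
frames = rng.random((n_frames, n_pixels)).astype(np.float32)
actions = rng.integers(0, n_actions, size=n_frames)   # demonstrator key presses

W = np.zeros((n_pixels, n_actions), dtype=np.float32)
learning_rate = 0.1
for _ in range(100):
  logits = frames @ W
  logits -= logits.max(axis=1, keepdims=True)
  probs = np.exp(logits)
  probs /= probs.sum(axis=1, keepdims=True)
  probs[np.arange(n_frames), actions] -= 1.0           # gradient of cross-entropy
  W -= learning_rate * (frames.T @ probs) / n_frames

# The cloned policy picks the most probable human action for a new frame, and
# can then be fine-tuned with RL against the environment's reward.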

We were all quite surprised that we could make VNC work so well. As we scale to larger games, there’s a decent chance we’ll start using additional backend technologies. But preliminary signs indicate we can push the existing implementation far: with the right settings, our client can coax GTA V to run at 20 frames per second over the public internet.

Environments

We have already integrated a large number of environments into Universe, and view these as just the start. Each environment is packaged as a Docker image and hosts two servers that communicate with the outside world: the VNC server which sends pixels and receives keyboard/mouse commands, and a WebSocket server which sends the reward signal for reinforcement learning tasks (as well as any auxiliary information such as text, or diagnostics) and accepts control messages (such as the specific environment ID to run).

Atari games

Universe includes the Atari 2600 games from the Arcade Learning Environment. These environments now run asynchronously inside the quay.io/openai/universe.gym-core Docker image and allow the agent to connect over the network, which means the agent must handle lag and low frame rates. Running over a local network in the cloud, we usually see 60 frames per second, observation lags of 20ms, and action lags of 10ms; over the public internet this drops to 20 frames per second, 80ms observation lags, and 30ms action lags.

Human demonstrators playing Atari games over VNC.

Flash games

We turned to Flash games as a starting point for scaling Universe—they are pervasive on the Internet, generally feature richer graphics than Atari, but are still individually simple. We’ve sifted through over 30,000 so far, and estimate there’s an order of magnitude more.

Our initial Universe release includes 1,000 Flash games (100 with reward functions), which we distribute in the quay.io/openai/universe.flashgames Docker image with consent from the rightsholders. This image starts a TigerVNC server and boots a Python control server, which uses Selenium to open a Chrome browser to an in-container page with the desired game, and automatically clicks through any menus needed to start the game.

Human demonstrators playing Flash games over VNC.

Extracting rewards. While environments without reward functions can be used for unsupervised learning or to generate human demonstrations, RL needs a reward function. Unlike with the Atari games, we can’t simply read out success criteria from the process memory, as there is too much variation in how each game stores this information. Fortunately, many games have an on-screen score which we can use as a reward function, as long as we can parse it. While off-the-shelf OCR such as Tesseract performs well on standard fonts with clean backgrounds, it struggles with the diverse fonts, moving backgrounds, flashy animations, and occluding objects common in many games. We developed a convolutional neural network-based OCR model that runs inside the Docker container’s Python controller, parses the score (from a screen buffer maintained via a VNC self-loop), and communicates it over the WebSocket channel to the agent.

Our score OCR model in action. A human integrator has provided the bounding box for the score. The OCR model parses the score at 60 frames per second.
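
For illustration, a small convolutional digit classifier of the sort described above might be sketched as follows. The 16x16 crops, layer sizes, and tf.keras API are our own assumptions here, not the production model:

import tensorflow as tf

# Sketch of a small convolutional digit classifier for score OCR. The 16x16
# grayscale per-digit crops, layer sizes, and tf.keras API are illustrative
# assumptions, not the production model.
model = tf.keras.Sequential([
  tf.keras.Input(shape=(16, 16, 1)),
  tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(10, activation="softmax"),     # digits 0-9
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# At runtime the controller crops the integrator-provided score region from the
# VNC screen buffer, classifies each digit crop, joins the digits into a score,
# and sends the change in score to the agent over the WebSocket channel.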

Browser tasks

Humanity has collectively built the Internet into an immense treasure trove of information, designed for visual consumption by humans. Universe includes browser-based environments which require AI agents to read, navigate, and use the web just like people—using pixels, keyboard, and mouse.

Today, our agents are mostly learning to interact with common user interface elements like buttons, lists and sliders, but in the future they could complete complex tasks, such as looking up things they don’t know on the internet, managing your email or calendar, completing Khan Academy lessons, or working on Amazon Mechanical Turk and CrowdFlower tasks.

Mini World of Bits. We first set out to create a new benchmark that captures the salient challenges of browser interactions in a simple setting. We call this benchmark Mini World of Bits. We think of it as an analogue to MNIST, and believe that mastering these environments provides valuable signal towards models and training techniques that will perform well on full websites and more complex tasks. Our initial Mini World of Bits benchmark consists of 80 environments that range from simple (e.g. click a specific button) to difficult (e.g. reply to a contact in a simulated email client).

Human demonstrators completing Mini World of Bits tasks.

Real-world browser tasks. We’ve begun work on more realistic browser tasks. The agent takes an instruction, and performs a sequence of actions on a website. One such environment hands the agent details of a desired flight booking, and then requires it to manipulate a user interface to search for the flight. (We use cached recordings of these sites to avoid spamming them, or booking lots of real flights.)

Human demonstrators completing a flight booking task on various websites given the instruction: “San Francisco to Boston, departing on 12/5 and returning on 12/10”.

Future integrations

This infrastructure is general-purpose: we can integrate any game, website, or application which can run in a Docker container (most convenient) or a Windows virtual machine (less convenient). We’d like the community’s help to continue extending the breadth of Universe environments, including completing the integrations of our partners’ games, Android apps (emulators can run inside of Docker), fold.it, Unity games, HTML5 games, online educational games, and really anything else people think of.

Microsoft’s Project Malmo team will be integrating with Universe, and we look forward to supporting other AI frameworks as well.

Human demonstrators playing some games and applications from our partners.

Running an environment

Despite the huge variety, running Universe environments requires minimal setup. You’ll need only to install Docker and universe:

$ git clone https://github.com/openai/universe && pip install -e universe

We package each collection of similar environments into a “runtime”, which is a server exposing two ports: 5900 (used for the VNC protocol to exchange pixels/keyboard/mouse) and 15900 (used for a WebSocket control protocol). For example, the quay.io/openai/universe.flashgames Docker image is a runtime that can serve many different Flash game environments.

Starting a runtime. You can boot your first runtime from the console as follows:

# -p 5900:5900 and -p 15900:15900 expose the VNC and WebSocket ports
# --privileged/--cap-add/--ipc=host needed to make Selenium work
$ docker run --privileged --cap-add=SYS_ADMIN --ipc=host \
    -p 5900:5900 -p 15900:15900 quay.io/openai/universe.flashgames

This will download and run the Flash games Docker container. You can view and control the remote desktop by connecting your own VNC viewer to port 5900, such as TurboVNC, or by using the browser-based VNC client served by the webserver on port 15900. The default password is openai. OSX also has a native VNC viewer which can be accessed by running open vnc://localhost:5900 in Terminal. (Unfortunately, the OSX viewer doesn’t implement Tight encoding, which is the best option for bigger games.)

Writing your own agent. You can write your own agent quite easily, using your favorite framework such as TensorFlow or Theano. (We’ve provided a starter TensorFlow agent.) At each time step, the agent’s observation includes a NumPy pixel array, and the agent must emit a list of VNC events (mouse/keyboard actions). For example, the following agent will activate Dusk Drive and press forward constantly:

import gym
import universe # register Universe environments into Gym

env = gym.make('flashgames.DuskDrive-v0') # any Universe environment ID here (see https://github.com/openai/universe/blob/master/universe/__init__.py#L297)
# If using docker-machine, replace "localhost" with your Docker IP
env.configure(remotes="vnc://localhost:5900+15900")
observation_n = env.reset()

while True:
  # agent which presses the Up arrow 60 times per second
  action_n = [[('KeyEvent', 'ArrowUp', True)] for _ in observation_n]
  observation_n, reward_n, done_n, info = env.step(action_n)
  env.render()

You can keep your own VNC connection open, and watch the agent play, or even use the keyboard and mouse alongside the agent in a human/agent co-op mode.

Environment management. Because environments run as server processes, they can run on remote machines, possibly within a cluster or even over the public internet. We’ve documented a few ways to manage remote runtimes. At OpenAI, we use an “allocator” HTTP service, which provisions runtimes across a Kubernetes cluster on demand, and which we can use to connect a single agent process to hundreds of simultaneous environments.
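
For illustration, here’s a sketch of a single agent process driving two remote runtimes. The hostnames are placeholders, and we’re assuming the remotes argument accepts a comma-separated list of address pairs; see the universe documentation for the exact syntax:

import gym
import universe # register Universe environments into Gym

env = gym.make('flashgames.DuskDrive-v0')
# Placeholder addresses: each remote is a runtime exposing its VNC (5900) and
# WebSocket (15900) ports, e.g. provisioned by an allocator service.
env.configure(remotes="vnc://host-1:5900+15900,vnc://host-2:5900+15900")
observation_n = env.reset()

while True:
  # One action list entry per connected environment: the env is vectorized.
  action_n = [[('KeyEvent', 'ArrowUp', True)] for _ in observation_n]
  observation_n, reward_n, done_n, info = env.step(action_n)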

Validating the Universe infrastructure

Universe agents must deal with real-world griminess that traditional RL agents are shielded from: agents must run in real-time and account for fluctuating action and observation lag. While the full complexity of Universe is designed to be out of reach of current techniques, we also have ensured it’s possible to make progress today.

Universe Pong. Our first goal was solving gym-core.PongDeterministic-v3. Pong is one of the easiest Atari games, but it had the potential to be intractable as a Universe task, since the agent has to learn to perform very precise maneuvers at 4x realtime (as the environment uses a standard frameskip of 4). We used this environment to validate that Universe’s variable latencies still allowed for learning precise and rapid reactions. Today’s release includes universe-starter-agent, which takes an hour to train to a Pong score of +17 out of 21. Humans playing the same version of Pong were only able to reach a score of -11 (on a scale from -21 to 21) due to the game’s high speed.

Additional experiments. We applied RL to several racing Flash games, which worked after applying some standard tricks such as reward normalization. Some browser tasks where we tried RL had difficult exploration problems, but were solvable with behavioral cloning from human demonstration data.
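
As an example of the reward-normalization trick mentioned above, rewards can be divided by a running estimate of their standard deviation before reaching the learner. This is a generic sketch of the technique, not our exact implementation:

# Reward normalization: divide rewards by a running estimate of their standard
# deviation so that games with very different raw score magnitudes can share
# hyperparameters.
class RunningRewardNormalizer:
  def __init__(self, epsilon=1e-8):
    self.count, self.mean, self.m2 = 0, 0.0, 0.0
    self.epsilon = epsilon

  def normalize(self, reward):
    # Welford's online mean/variance update.
    self.count += 1
    delta = reward - self.mean
    self.mean += delta / self.count
    self.m2 += delta * (reward - self.mean)
    if self.count < 2:
      return reward
    std = (self.m2 / (self.count - 1)) ** 0.5
    return reward / (std + self.epsilon)

# In the agent loop: reward = normalizer.normalize(raw_reward) before the update.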

Some of our successful agents are shown below. While solving Universe will require an agent far outside the reach of current techniques, these videos show that many interesting Universe environments can be fruitfully approached with today’s algorithms.

Three trained agents playing several racing Flash games. Each of these agents uses the same code and hyperparameters.
An agent that was trained with behavioral cloning on approximately 2 hours of human demonstrations (left), and two agents that were trained with reinforcement learning (middle, right) to recognize and click specific buttons and to play Tic Tac Toe. Each of these agents uses the same code and hyperparameters as the Flash game agents.

Remote training. We’ve also experimented with slither.io agents on our physical infrastructure (which has access to Titan X GPUs) and environments in the cloud. Generally, the agent will control 32 simultaneous environments at 5 frames per second—observations are available much more frequently, but lower framerates help today’s RL algorithms, which struggle with dependencies over many timesteps.

Our agent’s “reaction time” averages around 150ms over the public internet: 110ms for an observation to arrive, 10ms to compute the action, and 30ms for the action to take effect. (For comparison, human reaction time averages around 250ms.) Reaction times drop to 80ms over a local network, and 40ms within a single machine.

An agent playing slither.io (left), using just screen pixels and mouse. The goal is to eat the fruit and avoid hitting other snakes. We’ve used JavaScript to simplify the colors. We also show the activations of each layer of the neural network which controls the snake (right).

Looking forward

Research progress requires meaningful performance measurement. In upcoming weeks, we’ll release a transfer learning benchmark, allowing researchers to determine if they are making progress on general problem solving ability.

Universe draws inspiration from the history of the ImageNet dataset in the Computer Vision community. Fei-Fei Li and her collaborators deliberately designed the ImageNet benchmark to be nearly impossible, but error rates have dropped rapidly from 28% in 2010 to 3% in 2016, matching (or in some cases even surpassing) human-level performance.

If the AI community does the same with Universe, then we will have made real progress towards systems with broad, general intelligence.

Help us improve Universe

Universe will only succeed with the community’s help. There are many ways to contribute (and one particularly great way is to join us):

Grant us permission to use your game, program, website, or app.

If your program would yield good training tasks for an AI then we’d love your permission to package it in Universe. Good candidates have an on-screen number (such as a game score) which can be parsed as a reward, or well-defined objectives, either natively or definable by the user.

Train agents on Universe tasks.

AI advances require the entire community to collaborate, and we welcome the community’s help in training agents across these tasks. We’ve released a starter agent which should be a helpful starting point for building your own agents. In upcoming weeks, we’ll release the sub-benchmarks we think are the right places to start.

Integrate new environments. (Coming soon)

We have many more environments waiting to be integrated than we can handle on our own. In upcoming weeks, we’ll release our environment integration tools, so anyone can contribute new environment integrations. In the meanwhile, we’ll be running a beta for environment integrators.

Contribute demonstrations. (Coming soon)

We’re compiling a large dataset of human demonstrations on Universe environments, which will be released publicly. If you’d like to play games for the good of science, please sign up for our beta.

Credits


Acquisition & partnerships: Erin Pettigrew, Jack Clark, Jeff Arnold

Core infrastructure: Greg Brockman, Catherine Olsson, Alex Ray

Demonstrations: Tom Brown, Jeremy Schlatter, Marie La, Catherine Olsson

Distributed training infrastructure: Vicki Cheung, Greg Brockman, Jonas Schneider

Documentation & communications: Jack Clark, Andrej Karpathy, Catherine Olsson

Environment integrations: Alec Radford, Jonathan Gray, Tom Brown, Greg Brockman, Alex Ray, Catherine Olsson, Trevor Blackwell, Tambet Matiisen, Craig Quiter

Initial agent results: Rafal Jozefowicz, Dario Amodei, Ilya Sutskever, Jonathan Ho, Trevor Blackwell, Avital Oliver, Yaroslav Bulatov

Remote environment management: Vicki Cheung, Greg Brockman, Catherine Olsson, Jie Tang

RL baselines: Dario Amodei, Harri Edwards

Website: Ludwig Petterson, Jie Tang, Tom Brown, Alec Radford, Jonas Schneider, Szymon Sidor

World of Bits: Andrej Karpathy, Tianlin (Tim) Shi, Linxi (Jim) Fan, Jonathan Hernandez, Percy Liang


Partners

The following partners have been key to creating Universe: EA, Valve, Microsoft, NVIDIA, Kongregate, Newgrounds, Yacht Club Games, Zachtronics, Ludeon Studios, Monomi Park, 2D Boy, Adam Reagle, Alvin Team, Rockspro, Anubhav Sharma, Arkadium, Beast Games, Char Studio, Droqen, Percy Pea, deeperbeige, Denny Menato, Dig Your Own Grave, Free World Group, Gamesheep, Hamumu Software, Hemisphere Games, Icy Lime, Insane Hero, inRegular Games, JackSmack, Nocanwin, Joe Willmott, Johnny Two Shoes, The Gamest Studio, László Cziglédszky, Madalin Games, Martian Games, Mateusz Skutnik, Mikalay Radchuk, Neutronized, Nitrome, ooPixel, PacoGames, Pixelante, Plemsoft, Rob Donkin, robotJam, Rumble Sushi 3D, SFB Games, Simian Logic, Smiley Gamer, Sosker, tequibo, kometbomb, ThePodge, Vasco Freitas, Vitality Games, Wolve Games, Xform Games, XGen Studios