Skip to main content

We’re releasing a Neural MMO(opens in a new window), a massively multiagent game environment for reinforcement learning agents. Our platform supports a large, variable number of agents within a persistent and open-ended task. The inclusion of many agents and species leads to better exploration, divergent niche formation, and greater overall competence.

In recent years, multiagent settings have become an effective platform(opens in a new window) for(opens in a new window) deep(opens in a new window) reinforcement(opens in a new window) learning(opens in a new window) research(opens in a new window). Despite this progress, there are still two main challenges for multiagent reinforcement learning. We need to create open-ended tasks with a high complexity ceiling: current environments are either complex but too narrow(opens in a new window) or open-ended but too(opens in a new window) simple(opens in a new window). Properties such as persistence and large population scale are key, but we also need more benchmark environments to quantify learning progress in the presence of large population scales and persistence. The game genre of Massively Multiplayer Online Games (MMOs) simulates a large ecosystem of a variable number of players competing in persistent and extensive environments.

To address these challenges, we built our Neural MMO to meet the following criteria:

  • Persistence: Agents learn concurrently in the presence of other learning agents with no environment resets. Strategies must consider long time horizons and adapt to potentially rapid changes in the behaviors of other agents.

  • Scale: The environment supports a large and variable number of entities. Our experiments consider up to 100M lifetimes of 128 concurrent agents in each of 100 concurrent servers.

  • Efficiency: The computational barrier to entry is low. We can train effective policies on a single desktop CPU.

  • Expansion: Similarily to(opens in a new window) existing(opens in a new window) MMOs(opens in a new window), our Neural MMO is designed to update with new content. Current core features include procedural generation of tile-based terrain, a food and water foraging system, and a strategic combat system. There is an opportunity for open-source driven expansion in the future.

The environment

Players (agents) may join any available server (environment), each containing an automatically generated tile-based game map of configurable size. Some tiles, such as food-bearing forest tiles and grass tiles, are traversable. Others, such as water and solid stone, are not. Agents spawn at a random location along the edges of the environment. They must obtain food and water, and avoid combat damage from other agents, in order to sustain their health. Stepping on a forest tile or next to a water tile refills a portion of the agent’s food or water supply, respectively. However, forest tiles have a limited supply of food, which regenerates slowly over time. This means that agents must compete for food tiles while periodically refilling their water supply from infinite water tiles. Players engage in combat using three combat styles, denoted Melee, Range, and Mage for flavor.

Input: Agents observe a square crop of tiles centered on their current position. This includes tile terrain types and the select properties (health, food, water, and position) of occupying agents.

Output: Agents output action choices for the next game tick (timestep). Actions consist of one movement and one attack.

Grouped visualizations of the game creation process

The model

As a simple baseline, we train a small, fully connected architecture using vanilla policy gradients(opens in a new window), with a value function baseline and reward discounting as the only enhancements. Instead of rewarding agents for achieving particular objectives, agents optimize only for their lifetime (trajectory length): they receive reward 1 for each tick of their lifetime. We convert variable length observations, such as the list of surrounding players, into a single length vector by computing the maximum across all players (OpenAI Five also utilized this trick). The source release includes our full distributed training implementation, which is based on PyTorch(opens in a new window) and Ray(opens in a new window).

Evaluation results

Agents’ policies are sampled uniformly from a number of populations—agents in different populations share architectures, but only agents in the same population share weights. Initial experiments show that agent competence scales with increasing multiagent interaction. Increasing the maximum number of concurrent players magnifies exploration; increasing the number of populations magnifies niche formation—that is, the tendency of populations to spread out and forage within different parts of the map.

Server merge tournaments: Multiagent magnifies competence

There is no standard procedure among MMOs for evaluating relative player competence across multiple servers. However, MMO servers sometimes undergo merges where the player bases from multiple servers are placed within a single server. We implement “tournament” style evaluation by merging the player bases trained in different servers. This allows us to directly compare the policies learned in different experiment settings. We vary test time scale and find that agents trained in larger settings consistently outperform agents trained in smaller settings.

Increased population size magnifies exploration


In the natural world, competition among animals can incentivize them to spread out to avoid conflict. We observe that map coverage increases as the number of concurrent agents increases. Agents learn to explore only because the presence of other agents provides a natural incentive for doing so.

Increased species count magnifies niche formation

Given a sufficiently large and resource-rich environment, we found different populations of agents separated across the map to avoid competing with others as the populations increased. As entities cannot out-compete other agents of their own population (i.e., agents with whom they share weights), they tend to seek areas of the map that contain enough resources to sustain their population. Similar effects were also independently observed in concurrent multiagent research by DeepMind(opens in a new window).

Additional insights

We visualize agent-agent dependencies by fixing an agent at the center of a hypothetical map crop. For each position visible to that agent, we show what the value function would be if there were a second agent at that position. We find that agents learn policies dependent on those of other agents, in both the foraging and combat environments. Agents learn “bull’s eye” avoidance maps to begin foraging more effectively after only a few minutes of training. As agents learn the combat mechanics of the environment, they begin to appropriately value effective engagement ranges and angles of approach.

Next steps

Our Neural MMO resolves two key limitations of previous game-based environments, but there are still many left unsolved. This Neural MMO strikes a middle ground between environment complexity(opens in a new window) and population(opens in a new window) scale(opens in a new window). We’ve designed this environment with open-source expansion in mind and for the research community to build upon.

If you are excited about conducting research on multiagent systems, consider joining OpenAI.


Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch


Thanks to Clare Zhu for her substantial work on the 3D client.

We also thank the following for feedback on drafts of this post: Greg Brockman, Ilya Sutskever, Jack Clark, Ashley Pilipiszyn, Ryan Lowe, Julian Togelius, Joel Liebo, Cinjon Resnick.