
August 3, 2017


Gathering Human Feedback

RL-Teacher is an open-source implementation of our interface to train AIs via occasional human feedback rather than hand-crafted reward functions. The underlying technique was developed as a step towards safe AI systems, but also applies to reinforcement learning problems with rewards that are hard to specify.
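The core idea behind learning from comparisons can be sketched in a few lines. The snippet below is an illustrative toy example, not RL-Teacher's actual code: it fits a linear reward model from simulated pairwise preferences using a Bradley-Terry model, where the probability a human prefers segment A over segment B is the sigmoid of the predicted reward difference. The feature vectors, the hidden "true" reward, and all variable names are assumptions for the sake of the demo.

```python
# Toy sketch of preference-based reward learning (Bradley-Terry model).
# Segment features, the hidden true reward, and all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Pretend each trajectory segment is summarized by a feature vector, and
# the hidden "true" reward is a linear function of those features.
true_w = np.array([1.0, -2.0, 0.5])
segments_a = rng.normal(size=(500, 3))
segments_b = rng.normal(size=(500, 3))

# Simulated human labels: prefer the segment with the higher true reward.
prefers_a = (segments_a @ true_w > segments_b @ true_w).astype(float)

# Learn reward weights w by minimizing the Bradley-Terry logistic loss:
# P(a preferred over b) = sigmoid(r(a) - r(b)).
w = np.zeros(3)
lr = 0.5
for _ in range(200):
    logits = (segments_a - segments_b) @ w
    p = 1.0 / (1.0 + np.exp(-logits))
    grad = (segments_a - segments_b).T @ (p - prefers_a) / len(p)
    w -= lr * grad

# The learned weights should point in roughly the same direction as true_w.
cosine = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(cosine)
```

In practice the reward model is a neural network over observations and actions rather than a linear function, and the comparisons come from real human judgments of video clips, but the training objective has this same pairwise form.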


The release contains three main components:

The entire system consists of less than 1,000 lines of Python code (excluding the agents). After you’ve set up your web server, you can launch an experiment by running:

$ python rl_teacher/ -p human --pretrain_labels 175 -e Reacher-v1 -n human-175
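From the agent's point of view, the key mechanic is that the environment's hand-crafted reward is replaced by the output of the learned reward predictor. The sketch below shows one plausible shape for that substitution; `PredictedRewardWrapper`, `DummyEnv`, and `predict_reward` are hypothetical names, not RL-Teacher's actual API.

```python
# Illustrative sketch: train the agent against a learned reward predictor
# instead of the environment's true reward. Names are hypothetical.
class PredictedRewardWrapper:
    """Wraps an environment and swaps its reward for a predicted one."""

    def __init__(self, env, predict_reward):
        self.env = env
        self.predict_reward = predict_reward  # model trained on human labels

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, _true_reward, done, info = self.env.step(action)
        # The agent only ever sees the predicted reward.
        return obs, self.predict_reward(obs, action), done, info


# Minimal demo with a stand-in environment and reward model.
class DummyEnv:
    def reset(self):
        return 0.0

    def step(self, action):
        return action, 1.0, True, {}  # true reward is always 1.0


env = PredictedRewardWrapper(DummyEnv(), lambda obs, act: -abs(obs))
env.reset()
obs, reward, done, info = env.step(2.0)
print(reward)  # the predicted reward, not the env's true reward of 1.0
```

Because the predictor is trained online from occasional comparisons, the agent's effective reward function improves as more human feedback arrives.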

Humans can give feedback via a simple web interface (shown above), which can be run locally (not recommended) or on a separate machine. Full documentation is available on the project’s GitHub repository. We’re excited to see what AI researchers and engineers do with this technology—please get in touch with any experimental results!