Implicit Generation and Generalization Methods for Energy-Based Models
We've made progress towards stable and scalable training of energy-based models (EBMs), resulting in better sample quality and generalization ability than existing models. Generation in EBMs spends more compute to continually refine its answers; doing so can generate samples competitive with GANs at low temperatures[1], while also retaining the mode-coverage guarantees of likelihood-based models. We hope these findings stimulate further research into this promising class of models.
Generative modeling is the task of observing data, such as images or text, and learning to model the underlying data distribution. Accomplishing this task leads models to understand high-level features in data and to synthesize examples that look like real data. Generative models have many applications in natural language, robotics, and computer vision.
Energy-based models represent probability distributions over data by assigning an unnormalized probability scalar (or "energy") to each input data point. This provides useful modeling flexibility: any arbitrary model that outputs a real number given an input can be used as an energy model. The difficulty, however, lies in sampling from these models.
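For reference, the standard way to turn an energy into a normalized distribution is the Boltzmann (Gibbs) form below, written in our own notation rather than copied from the post. Lower energy means higher probability, and the normalizing constant Z(θ) integrates over every possible input, which is generally intractable for images; this is one way to see why sampling, rather than evaluating the energy, is the hard part.

```latex
p_\theta(x) = \frac{\exp\left(-E_\theta(x)\right)}{Z(\theta)},
\qquad
Z(\theta) = \int \exp\left(-E_\theta(x)\right)\, dx
```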
To generate samples from EBMs, we use an iterative refinement process based on Langevin dynamics. Informally, this involves performing noisy gradient descent on the energy function to arrive at low-energy configurations (see the paper for more details). Unlike GANs, VAEs, and flow-based models, this approach does not require an explicit neural network to generate samples; instead, samples are generated implicitly. The combination of EBMs and iterative refinement has the following benefits:
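A minimal sketch of the noisy gradient descent described above, assuming a hand-written 1-D toy energy with two separate low-energy modes in place of a trained neural network. The names `double_well_energy`, `double_well_grad`, and `langevin_sample` are hypothetical. Each step moves x against the energy gradient by step_size / 2 and adds Gaussian noise with variance step_size; the paper's actual step sizes, noise schedule, and use of automatic differentiation are not reproduced here.

```python
import math
import random
from typing import Callable


def double_well_energy(x: float) -> float:
    """Hypothetical toy energy with two disjoint low-energy modes at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def double_well_grad(x: float) -> float:
    """Analytic derivative dE/dx of the toy energy (a real EBM would use autodiff)."""
    return 4.0 * x * (x * x - 1.0)


def langevin_sample(
    grad_fn: Callable[[float], float],
    x0: float,
    step_size: float,
    num_steps: int,
    rng: random.Random,
) -> float:
    """Refine x0 with Langevin dynamics:
    x <- x - (step_size / 2) * dE/dx + noise, with noise ~ N(0, step_size).
    """
    x = x0
    for _ in range(num_steps):
        noise = rng.gauss(0.0, math.sqrt(step_size))  # gauss takes a std, so sqrt(variance)
        x = x - 0.5 * step_size * grad_fn(x) + noise
    return x
```

For example, `langevin_sample(double_well_grad, 2.0, 0.01, 500, random.Random(0))` usually ends up in the basin of one of the two modes at ±1. In this sketch `num_steps` is the knob that trades compute for sample quality.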

Adaptive computation time. We can run sequential refinement for a long time to generate sharp, diverse samples, or for a short time to generate coarse, less diverse samples. In the limit of infinite time, this procedure is known to generate true samples from the energy model.

Not restricted by generator network. In both VAEs and flow-based models, the generator must learn a map from a continuous space to a possibly disconnected space containing different data modes, which requires large capacity and may not be possible to learn. EBMs, by contrast, can easily learn to assign low energies to disjoint regions.

Built-in compositionality. Since each model represents an unnormalized probability distribution, models can be naturally combined through a product of experts or other hierarchical models.
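Because each model defines an unnormalized density exp(-E_i(x)), multiplying experts is equivalent to adding their energies. The sketch below illustrates this with two hypothetical quadratic experts over a 2-D input (`expert_a`, `expert_b`, and `compose_energies` are names introduced here), each constraining only one coordinate. A coarse grid search stands in for sampling and finds that the combined energy is lowest only where both constraints hold.

```python
import itertools
import math
from typing import Callable, Sequence

Energy = Callable[[Sequence[float]], float]


def compose_energies(*experts: Energy) -> Energy:
    """Product of experts: prod_i exp(-E_i(x)) == exp(-sum_i E_i(x)),
    so composing unnormalized densities amounts to summing energies."""
    def combined(x: Sequence[float]) -> float:
        return sum(expert(x) for expert in experts)
    return combined


# Hypothetical toy experts over a 2-D input x = (x0, x1).
def expert_a(x: Sequence[float]) -> float:
    """Only constrains the first coordinate: low energy near x0 = 2."""
    return (x[0] - 2.0) ** 2


def expert_b(x: Sequence[float]) -> float:
    """Only constrains the second coordinate: low energy near x1 = -1."""
    return (x[1] + 1.0) ** 2


combined = compose_energies(expert_a, expert_b)
grid = [0.5 * i for i in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
best = min(itertools.product(grid, grid), key=combined)
```

Each expert alone is minimized along an entire line of points; the composed energy singles out `best == (2.0, -1.0)`, the only grid point that satisfies both.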
Generation
We found that energy-based models are able to generate qualitatively and quantitatively high-quality images, especially when the refinement process runs for longer at test time. By running iterative optimization on individual images, we can autocomplete images and morph images from one class (such as truck) to another (such as frog).
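One way to read "autocomplete" in these terms: clamp the observed pixels and run the refinement only on the missing ones. The sketch below assumes a hypothetical hand-written smoothness energy over a 1-D row of pixels instead of a learned image model, and drops the Langevin noise (plain gradient descent), so it shows the mechanics of masked refinement rather than the method used in the paper. `smoothness_energy_grad` and `autocomplete` are illustrative names.

```python
from typing import List, Optional


def smoothness_energy_grad(x: List[float]) -> List[float]:
    """Gradient of a hypothetical energy E(x) = sum_i (x[i+1] - x[i])**2,
    which assigns low energy to smooth signals."""
    n = len(x)
    grad = [0.0] * n
    for i in range(n - 1):
        diff = x[i + 1] - x[i]
        grad[i + 1] += 2.0 * diff  # dE/dx[i+1]
        grad[i] -= 2.0 * diff      # dE/dx[i]
    return grad


def autocomplete(
    observed: List[Optional[float]], step_size: float = 0.1, num_steps: int = 2000
) -> List[float]:
    """Fill missing entries (None) by gradient descent on the energy,
    keeping the observed entries clamped to their given values."""
    x = [0.0 if v is None else v for v in observed]
    missing = [i for i, v in enumerate(observed) if v is None]
    for _ in range(num_steps):
        grad = smoothness_energy_grad(x)
        for i in missing:  # only the unobserved pixels are refined
            x[i] -= step_size * grad[i]
    return x
```

For example, `autocomplete([0.0, None, None, None, 1.0])` fills the gap with the linear ramp 0.25, 0.5, 0.75, the configuration that minimizes this particular energy given the clamped endpoints.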
In addition to generating images, we found that energy-based models are able to generate stable robot dynamics trajectories across a large number of timesteps. EBMs can generate a diverse set of possible futures, while feed-forward models collapse to a mean prediction.
Generalization
We tested energy-based models on distinguishing in-distribution data from several different out-of-distribution datasets and found that they outperform other likelihood models such as flow-based and autoregressive models. We also tested classification using conditional energy-based models and found that the resulting classifier generalized well to adversarial perturbations. Our model, despite never being trained for classification, performed classification better than models explicitly trained against adversarial perturbations.
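A conditional EBM scores each (input, label) pair, so it can act as a classifier by picking the label with the lowest energy, with class probabilities given by a softmax over negative energies, p(y | x) = exp(-E(x, y)) / sum over y' of exp(-E(x, y')). The sketch below assumes a hypothetical toy energy (squared distance to a per-class prototype) in place of a trained network; `classify`, `prototype_energy`, and `PROTOTYPES` are illustrative names.

```python
import math
from typing import Callable, List, Sequence, Tuple

ConditionalEnergy = Callable[[Sequence[float], int], float]


def classify(
    energy_fn: ConditionalEnergy, x: Sequence[float], num_classes: int
) -> Tuple[int, List[float]]:
    """Return (predicted label, class probabilities) from a conditional energy E(x, y)."""
    energies = [energy_fn(x, y) for y in range(num_classes)]
    lowest = min(energies)
    # Softmax over negative energies; subtracting the minimum avoids overflow.
    weights = [math.exp(-(e - lowest)) for e in energies]
    total = sum(weights)
    probs = [w / total for w in weights]
    prediction = min(range(num_classes), key=lambda y: energies[y])
    return prediction, probs


# Hypothetical toy conditional energy: squared distance to a per-class prototype.
PROTOTYPES = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]


def prototype_energy(x: Sequence[float], y: int) -> float:
    px, py = PROTOTYPES[y]
    return (x[0] - px) ** 2 + (x[1] - py) ** 2
```

For example, `classify(prototype_energy, (2.5, 0.2), 3)` predicts class 1, whose prototype is closest and therefore has the lowest energy.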
Lessons learned
We found evidence that suggests the following observations, though in no way are we certain that these observations are correct:
- We found it difficult to apply vanilla HMC to EBM training, as the optimal step size and number of leapfrog steps vary greatly over the course of training, though applying adaptive HMC would be an interesting extension.
- We found that training ensembles of energy functions (sampling and evaluating on ensembles) helped a bit, but it was not worth the added complexity.
- We didn't find much success adding a gradient penalty term, as it seemed to hurt model capacity and sampling.
More tips, observations and failures from this research can be found in Section A.8 of the paper.
Next steps
We found preliminary indications that we can compose multiple energy-based models via a product of experts model. We trained one model on shapes of different sizes at a fixed position and another model on shapes of the same size at different positions. By combining the resulting energy-based models, we were able to generate shapes of different sizes at different locations, despite never having seen examples in which both size and position vary.
Compositionality is one of the unsolved challenges facing AI systems today, and we are excited about what energy-based models can do here. If you are excited to work on energy-based models, please consider applying to OpenAI!