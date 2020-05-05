Algorithmic efficiency can be defined as reducing the compute needed to train a specific capability. Efficiency is the primary way we measure algorithmic progress on classic computer science problems like sorting. Efficiency gains on such traditional problems are more straightforward to measure than in ML because they have a clearer measure of task difficulty.[^footnote-difficulty] However, we can apply the efficiency lens to machine learning by holding performance constant and asking how much training compute is needed to reach it. Efficiency trends can then be compared across domains like DNA sequencing[^reference-17] (10-month doubling), solar energy[^reference-18] (6-year doubling), and transistor density[^reference-3] (2-year doubling).

For our analysis, we primarily leveraged open-source re-implementations[^reference-19][^reference-20][^reference-21] to measure progress on AlexNet-level performance over a long horizon. We saw a similar rate of training efficiency improvement for ResNet-50-level performance on ImageNet (17-month doubling time).[^reference-7][^reference-16] We saw faster rates of improvement over shorter timescales in translation, Go, and Dota 2:

Within translation, the Transformer[^reference-22] surpassed seq2seq[^reference-23] performance on English-to-French translation on WMT’14 with 61x less training compute 3 years later. We estimate that AlphaZero[^reference-24] took 8x less compute to reach AlphaGoZero[^reference-25]-level performance 1 year later. OpenAI Five Rerun required 5x less training compute to surpass OpenAI Five[^reference-26] (which beat the world champions, OG) 3 months later.
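
To put these three comparisons on the same footing as the doubling times quoted earlier, one can convert each "N-fold less compute after T months" figure into an implied doubling time. The sketch below is a back-of-the-envelope calculation, not the method used for the AlexNet trend: it assumes a constant exponential rate between just two data points, and it treats "3 years", "1 year", and "3 months" as exactly 36, 12, and 3 months. The function name `doubling_time_months` is chosen here for illustration.

```python
import math


def doubling_time_months(efficiency_gain: float, elapsed_months: float) -> float:
    """Months per 2x efficiency gain, assuming a constant exponential rate.

    efficiency_gain: factor by which training compute dropped (e.g. 61 for 61x).
    elapsed_months:  time between the two systems being compared.
    """
    if efficiency_gain <= 1 or elapsed_months <= 0:
        raise ValueError("need efficiency_gain > 1 and elapsed_months > 0")
    # gain = 2 ** (elapsed / doubling_time)  =>  doubling_time = elapsed / log2(gain)
    return elapsed_months / math.log2(efficiency_gain)


# (efficiency gain, elapsed months) taken from the paragraph above;
# the month counts are rounded assumptions.
comparisons = {
    "Translation (Transformer vs. seq2seq)": (61, 36),
    "Go (AlphaZero vs. AlphaGoZero)": (8, 12),
    "Dota 2 (OpenAI Five Rerun vs. OpenAI Five)": (5, 3),
}

for name, (gain, months) in comparisons.items():
    print(f"{name}: ~{doubling_time_months(gain, months):.1f} months per doubling")
```

Under these assumptions, the implied doubling times come out to roughly 6 months for translation, 4 months for Go, and a little over 1 month for Dota 2, all well below the 17-month figure for ResNet-50. Because each estimate rests on two points over a short window, it is only a rough indication of the rate.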

It can be helpful to think of compute in 2012 as not being equal to compute in 2019, in the same way that dollars need to be inflation-adjusted over time. A fixed amount of compute could accomplish more in 2019 than in 2012. One way to think about this is that some types of AI research progress in two stages, similar to the “tick tock” model of development seen in semiconductors: new capabilities (the “tick”) typically require a significant amount of compute expenditure to obtain, and refined versions of those capabilities (the “tock”) then become much more efficient to deploy due to process improvements.
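
The inflation analogy can be made concrete with a small calculation. The sketch below is illustrative only: it assumes efficiency improves at a constant exponential rate, borrows the 17-month doubling time reported above for ResNet-50 as a stand-in rate, and uses hypothetical helper names (`efficiency_multiplier`, `baseline_equivalent_compute`). It is not a calibrated conversion for any particular task.

```python
def efficiency_multiplier(elapsed_months: float, doubling_time_months: float) -> float:
    """Factor by which a unit of compute 'goes further' after elapsed_months,
    assuming efficiency doubles every doubling_time_months."""
    if doubling_time_months <= 0:
        raise ValueError("doubling_time_months must be positive")
    return 2.0 ** (elapsed_months / doubling_time_months)


def baseline_equivalent_compute(
    compute: float, elapsed_months: float, doubling_time_months: float
) -> float:
    """Express later-year compute in baseline-year terms, analogous to
    converting between inflation-adjusted dollars."""
    return compute * efficiency_multiplier(elapsed_months, doubling_time_months)


# Assumption: 2012 -> 2019 is treated as 84 months, with a 17-month doubling time.
multiplier = efficiency_multiplier(elapsed_months=84, doubling_time_months=17)
print(f"One unit of 2019 compute ~ {multiplier:.0f} units of 2012 compute")
```

With these assumed inputs, a unit of compute in 2019 would be worth on the order of 30 units of 2012 compute for reaching the same level of performance. The actual factor depends on the task and on the rate measured for it.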

Increases in algorithmic efficiency allow researchers to run more experiments of interest with a given amount of time and money. In addition to being a measure of overall progress, algorithmic efficiency gains speed up future AI research in a way that’s somewhat analogous to having more compute.

