In this case, a population is initialized with empty programs. It then evolves in repeating cycles to produce better and better learning algorithms. At each cycle, two (or more) random models compete and the most accurate model gets to be a parent. The parent clones itself to produce a child, which gets mutated - i.e., the child's code is modified in a random way, which could mean, for example, arbitrarily inserting, removing or modifying a line in the code. The mutated algorithm is then evaluated on image classification tasks.
In contrast to much previous AutoML work, the AutoML-Zero setup makes the search space very sparse — an accurate algorithm might be as rare as one in 1012 candidates. This is due to the granularity of the building blocks provided to the algorithm, which include only basic operations such as variable assignment, addition, and matrix multiplication.
"In such an environment, a random search will not find a solution in a reasonable amount of time, yet evolution can be tens of thousands of times faster, according to our measurements," say the researchers. "We distributed the search on multiple machines that occasionally exchange algorithms (analogous to migration in real life). We also constructed small proxy classification tasks on which to evaluate each child algorithm, and executed this evaluation with highly optimized code."
Despite the sparsity, the evolutionary search discovers more complex and effective techniques as time passes. Initially, the simplest algorithms appear, which represent linear models with hard-coded weights. In time, stochastic gradient descent (SGD) is invented to learn the weights, in spite of the gradient itself not having been provided as a building block. Though flawed at first, SGD gets fixed relatively quickly, starting a series of improvements to the