The job of a compiler is to translate programs written by humans into binary that computer hardware can execute. Compilers target large, complex, heterogeneous, non-deterministic and constantly changing systems, and the number of possible optimisations is huge, so designing heuristics that account for all of these factors is hard. As a result, many compiler optimisations are out of date or poorly tuned.
One of the key challenges is selecting the right code transformation for a given program, which requires an effective way to evaluate the quality of a candidate compilation option: for instance, predicting how a code transformation will affect the program's eventual performance.
A decade ago, machine learning was introduced to automate the search of the optimisation space. This freed compiler writers from having to reason about architecture or program specifics: algorithms learned heuristics from the results of previous searches, so that subsequent programs could be optimised in a single shot. Machine learning-based compilation is now an established research area, and over the last decade it has generated a large amount of academic interest.
To assess the current state of machine learning and its implications for compilers, researchers from the University of Edinburgh and Facebook AI collaborated on a survey of the role of machine learning in compilation.
Current State Of ML For Compilers
Compilers have two jobs: translation and optimisation. First, a program must be translated into binary correctly; next, the compiler has to find the most efficient translation possible. The vast majority of research and engineering practice is therefore focused on optimisation. The goal is to minimise or maximise some attribute of the executable program, such as its execution time, memory requirement or power consumption.
Machine learning can be used to learn how a compiler should optimise code so that the resulting program runs faster. It is ideally suited to making any code optimisation decision where the performance impact depends on the underlying platform.
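To make this concrete, here is a minimal, hypothetical sketch of platform-specific optimisation selection: a 1-nearest-neighbour model picks a compiler flag for a new program based on measurements previously collected on the same platform. The feature names, flags and numbers are illustrative assumptions, not part of the survey.

```python
# Hypothetical sketch: pick the flag that was best for the most similar
# previously measured program. Features and flags are made up for illustration.

def nearest_neighbour_flag(training, features):
    """Return the flag recorded for the nearest training program."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(training, key=lambda sample: dist(sample[0], features))
    return best[1]

# (loop_trip_count, arithmetic_intensity) -> flag that ran fastest here
training = [
    ((1000, 0.2), "-funroll-loops"),
    ((8,    3.5), "-fvectorize"),
    ((500,  0.1), "-funroll-loops"),
]

print(nearest_neighbour_flag(training, (900, 0.3)))  # -> -funroll-loops
```

The key property is that retargeting the heuristic to a new platform only means re-collecting the training measurements, not rewriting any hand-tuned rules.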
The advantage of the ML-based approach, stated the researchers Wang and O'Boyle, is that the entire process of building the model can easily be repeated whenever the compiler needs to target a new hardware architecture, operating system or application domain.
There have been a few applications of deep learning to compilers. In one, a deep neural network was used to directly predict what the right heuristic value should be for certain optimisations in OpenCL. The authors used LSTMs, which allowed the model to capture some of a program's structure, such as whether a variable had been declared earlier. The results improved over prior, hand-built features. This work was followed by the incorporation of graph-based representations, which researchers believed would suit programming languages better: the instructions of a program can be represented as the edges of a graph, depicting the relationships between variables.
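As an illustration of the graph idea (a sketch, not the cited work's implementation), the snippet below builds def-use edges from a toy three-address IR: each edge links the instruction that defines a variable to an instruction that later uses it, which is exactly the kind of relationship a graph model consumes.

```python
# Illustrative sketch: build a def-use graph from a toy three-address IR.
# Nodes are instruction indices; each edge is (defining instr, using instr).

def def_use_edges(instructions):
    """instructions: list of (dest_var, (operand_var, ...)) in program order."""
    last_def = {}   # variable -> index of the instruction that last defined it
    edges = []
    for i, (dest, operands) in enumerate(instructions):
        for op in operands:
            if op in last_def:
                edges.append((last_def[op], i))   # def -> use edge
        last_def[dest] = i
    return edges

program = [
    ("a", ()),            # a = input
    ("b", ("a",)),        # b = a * 2
    ("c", ("a", "b")),    # c = a + b
]
print(def_use_edges(program))  # -> [(0, 1), (0, 2), (1, 2)]
```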
Data-flow is fundamental to practically every optimisation in the compiler, and models should be able to reason about complex programs. For this to happen, the researchers wrote, better representations are required, along with graph RNNs that match the data-flow pattern.
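To show the kind of data-flow reasoning a learned model would need to approximate, here is a minimal live-variable analysis over a tiny, made-up control-flow graph, iterated to a fixed point in the classical way.

```python
# Minimal live-variable analysis: a variable is live at a block's entry if it
# is used before being redefined on some path from there. The CFG is made up.

def liveness(blocks, succ):
    """blocks: name -> (use_set, def_set); succ: name -> successor names."""
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:                        # iterate until a fixed point
        changed = False
        for b, (use, defs) in blocks.items():
            out = set().union(*(live_in[s] for s in succ[b])) if succ[b] else set()
            new_in = use | (out - defs)
            if new_in != live_in[b] or out != live_out[b]:
                live_in[b], live_out[b], changed = new_in, out, True
    return live_in

blocks = {
    "entry": (set(), {"x"}),     # x = ...
    "loop":  ({"x"}, {"y"}),     # y = x + 1
    "exit":  ({"y"}, set()),     # return y
}
succ = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
print(sorted(liveness(blocks, succ)["loop"]))  # -> ['x']
```

Even this three-block example needs iteration over a cycle before the answer stabilises, which hints at why fixed-size or sequence-only program representations struggle to capture data-flow.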
In this work, the authors address two questions: where is this field going, and what do we need to do in the coming years? The optimisation process is hard, and it is only going to get harder still.
What Does The Future Hold
Going forward, the researchers are betting big on reinforcement learning for better compilers. They describe an architecture in which the compiler is governed by the principles of reinforcement learning: RL systems are tasked with exploring optimisations over functions, basic blocks and individual instructions.
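A toy sketch of such a loop, under loudly stated assumptions: an epsilon-greedy agent repeatedly picks an optimisation "action" for a function, observes a simulated speedup as reward, and updates its value estimates. The action names and reward numbers are stand-ins, not a real compiler or the survey's system.

```python
# Toy RL loop: epsilon-greedy action selection over pretend optimisations,
# with a noisy simulated speedup as the reward signal.
import random

ACTIONS = ["unroll", "vectorise", "inline"]
TRUE_SPEEDUP = {"unroll": 1.1, "vectorise": 1.6, "inline": 1.2}  # pretend data

def train(episodes=500, eps=0.1, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}   # running estimate of each action's reward
    count = {a: 0 for a in ACTIONS}
    for _ in range(episodes):
        if rng.random() < eps:                   # explore a random action
            action = rng.choice(ACTIONS)
        else:                                    # exploit the best estimate
            action = max(value, key=value.get)
        reward = TRUE_SPEEDUP[action] + rng.gauss(0, 0.05)  # noisy measurement
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return max(value, key=value.get)

print(train())  # the agent should settle on "vectorise"
```

In a real compiler the reward would come from measuring the compiled program, and the state would encode the function, basic block or instruction being optimised rather than a single bandit arm.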