5.6. Frontend Compilation Optimization

Much like classical compilers, AI compilers implement compilation optimization to enhance the effectiveness of the IRs generated during the compilation process. This strategy reduces not only the length of the code and the time required for its compilation and execution but also diminishes the energy usage of processors during execution. Compilation optimization techniques can be divided into two categories: hardware-agnostic optimization and hardware-specific optimization. However, all optimization techniques applied at the frontend are inherently hardware-agnostic, as the frontend remains oblivious to the backend hardware specifics.

5.6.1. Process of Compilation Optimization

Typically, compilation optimizers execute a sequence of optimization passes. In each pass, an IR is used as input, which then produces a revised IR as output. A single pass might incorporate several sub-passes and can be conducted once or multiple times.

The overall success of compilation optimization significantly depends on the selection and ordering of optimization operations. Not only does the compiler execute various compilation optimization operations as needed, but it can also adjust the number of optimization passes along with the types and sequence of optimization operations. These adjustments are contingent upon the set level of compilation optimization, as illustrated in Figure Fig. 5.6.1.

../_images/optimization_pass.png

Fig. 5.6.1 Structural layout of an optimization pass in compilationoptimization

5.6.2. Prevalent Optimization Methods

Today, a wide array of frontend compilation optimization methods exist. Analogously, machine learning frameworks also employ various optimization methods, although these diverge from those found in classical compilers. This section will detail three frequently employed and versatile frontend compilation optimization methods.

5.6.2.1. Elimination of Dead Code and Unreachable Code

Dead code refers to segments of code that yield outputs not utilized by any other code, while unreachable code refers to segments of code that are not included in any valid control flow path. Figure Fig. 5.6.2 demonstrates these two types of code. The removal of dead or unreachable code can decrease the size of IRs and expedite both the compilation and execution of a program. These types of code can result from human errors or may manifest during other compilation optimizations.

../_images/dead_code_elimination.png

Fig. 5.6.2 Elimination of deadcode

In Chapter [subsec:conversion_between_and_combination_of_dynamic_and_static_graphs], it was previously mentioned that the tracing method can be employed during the process of converting dynamic graphs to static graphs. The tracing method is considered highly effective in identifying dead code and unreachable code. Consequently, this step is often incorporated into the graph conversion procedure.

5.6.2.2. Constant Propagation and Constant Folding

Constant propagation is a process that replaces specific constants with their known values during compilation. On the other hand, constant folding is a process that substitutes variables with constants when the results of multiple operations can be computed directly during compilation. Figure :numref:ch06/ch06-opt-pass-constant-broadcast depicts these two methods.

../_images/constant_propagation_and_constant_folding.png

Fig. 5.6.3 Constant propagation and constant foldingtechniques

5.6.2.3. Common Subexpression Elimination

In order to understand what common subexpression elimination entails, let’s consider the following: If an expression E has been computed and the values of all its variables remain unchanged from the prior computation, E is identified as a common subexpression. This concept is visualized in Figure :numref:ch06/ch06-opt-pass-CSE. As such, E doesn’t need to be computed again; it can be directly replaced with the expression result obtained from the preceding computation.

../_images/common_subexpression_elimination.png

Fig. 5.6.4 Common subexpression eliminationprocess

Common subexpression elimination, like the elimination of dead code and unreachable code, is typically carried out during the graph conversion process. In PyTorch, the torch script module provides a dedicated API for common subexpression elimination. This approach is inherent as it simplifies the identification of common subexpressions within torchscript.