Notes on Optimizing Compilers

8 minute read

Published:

Notes from 15-745: Optimizing Compilers.

Lecture 1: Overview of Optimizations

Three-address code

A := B op C
A := unaryop B
A := B
GOTO s
IF A relop B GOTO s
CALL f
RETURN

Basic block

  • sequence of 3-addr statements
  • only first statement can be reached from outside the block (no branches into middle)
  • all statements are executed consecutively if first one is executed (no branches out or halts except maybe at end of block)
  • are maximal
  • optimizations within a basic block are local optimizations

Flow graph

  • Nodes: basic blocks
  • Edges: \(b_i \rightarrow b_j\) if there is a branch from \(b_i\) to \(b_j\). Also works if \(b_j\) physically follows \(b_i\) which does not end in an unconditional goto.

Optimization types:

  • local: within bb – across instrs
  • global: within a flow graph – across bbs
  • interprocedural analysis: within a program – across procedures (flow graphs)

Local

  • common subexpression elimination
  • constant folding or elimination
  • dead code elimination

Global

  • Global versions of local
    • global common subexpression elimination
    • global constant propagation
    • dead code elimination
  • Loop opts
    • reduce code to be executed in each iter
    • code motion
    • induction var elimination
  • Other control structures
    • Code hoisting: eliminate copies of idential code on parallel paths in flow graph to reduce code size

Induction Variable Elimination

  • Loop indices are induction var
  • Linear function of loop indices are also induction var
  • Analysis: detection of induction var
  • opts
    • strength reduction: replace mult with add
    • elim loop index: replace termination by tests on other induction vars

Loop invariant code motion

  • computation is done within loop and result is same as long as we keep going around the loop
  • move computation outside of loop

Machine dependent optimizations

  • regalloc
  • instr schd
  • memory hierarchy optz

Lecture 2: Local Optimizations

Outline

  • bb/flow graphs
  • abstr 1: dag
  • abstr 2: value numbering
  • phi in ssa

Partitionining into BBs

  • identify leader of eahc bb.
    • first instr
    • target of a jump
    • any instr after a jump
  • bb starts at leadr & ends at instr immediately before a leader or last instr

Common subexpression elimination

  • array expressions
  • field access in records
  • access to parameters

CSE cont:

  • consider Parse Tree for a+a * (b-c) + (b-c) * d
  • notice that subtrees b-c and a are duplicated
  • turn parse tree to expression DAG

DAG bad:

  • worked for one statment but not so good for multiple statments

Alternative: Value Numbering Scheme

  • var2value where each value has its own number
  • common subexpression means same value number
  • r1 + r2 => var2value(r1) + var2value(r2)
VALUES : (var, op, var) -> var = {}
BIN_VALUES : (var, op, var) -> var = {}

for dst, src1, op, src2 in instrs:
  val1 = var2value(src1)
  val2 = var2value(src2)
  if val1, op, val2 in BIN_VALUES:
    print(f`{dst} = {BIN_VALUES[src1, op, src2]}`)
  else:
    tmp = newtemp()
    BIN_VALUES[val1, op, val2] = tmp
    print(f`{tmp} = {VALUES[src1]} {op} {VALUES[src2]}`)
    print(f`{dst} = {tmp}`)
  VALUES[dst] = tmp

uhhh im sort of lost, above is wrong

phi functions in SSA

  • appears in llvm
  • copy prop
    • for a given use of X
      • are all reaching definitions of X:
        • copies from same variable: eg, X = Y
        • where Y is not redefined since that copy?
      • if so, substitute X for Y
  • Static single assignments: IR where every variable is assigned a value at most once in the program rext
  • Easy for basic block
    • For instr:
      • LHS: assign to fresh version of variable
      • RHS: use most recent version of variable
  • What about joins in CFG?
    c = 12
    if (i) {
    a = x + y
    b = a + x
    } else {
    a = b + 2
    c = y + 1
    }
    a = c + a
    

    to

    c1 = 12
    if (i) {
    a1 = x + y
    b1 = a1 + x
    } else {
    a2 = b + 2
    c2 = y + 1
    }
    a4 = c? + a?
    
  • use notational convention: phi function
    a3 = phi(a1,a2)
    c3 = phi(c1,c2)
    b2 = phi(b1,b)
    b3 = c3 + a3
    

Lecture 3: Overview of LLVM Compiler

LLVM Compiler Infrastructure

  • provides resable components for building compilers
  • reduces time/cost
  • build static compilers, jits, trace-based optz, etc LLVM Compiler Framework
  • e2e compilers using LLVM infra
  • C and C++ are robust and aggressive
  • Emit C code or native code

Parts

  • LLVM Virtual Instructio Set

System:

  • front end turns C/C++/Java/… into LLVM IR. One front end is Clang
  • LLVM optimizer opterates on LLVM IR in series of passes
  • Back end turns LLVM IR into machine code

Analysis passes do not change code

LLVM instruction set

  • RISC-like three address code
  • infinite virtual register set in SSA form
  • Simple, low-level control flow constructs
  • Load/store instructions with typed-pointers
for (int i = 0; i < N; i++) {
  Sum(&A[i], &P);
}
loop:   ; prefs = %bb0, %loop
  %i.1 = phi i32 [ 0, %bb0 ], [ %i.2, %loop ]
  #AiAddr = getelementptr float* %A, i32 %i.1
  call void @Sum(float %AiAddr, %pair* %P)
  %i.2 = add i32 %i.1, 1
  %exitcond = icmp eq i32 %i.1, %N
  br i1 %exitcond, label %outloop, label %loop
  • explicit dataflow through SSA
  • explicit cfg, even for exceptions
  • explicit language independent type-information
  • explicit typed pointer arithmetic
    • preserve array subscript and struture indexing

Type system:

  • Primitives: label, void, float, integer (incld arbitrary bitwidth integers i1, i32, …)
  • Derived: pointer, array, structure, function
  • No high-level types

Stack allocation is explicit in LLVM

int caller() {
  int T;
  T = 4;
  ...
}

int %caller() {
  %T = alloca i32
  store i32 4, i32* %T
  ...
}

LLVM IR is almost all nested doubly-linked lists

Function &M = ...
for (Function::iterator I = M->begin(); I != M->end; ++I) {
  BasicBlock &BB = I;
  ...
}
  • Modules contains functions and globals (unit of compilation/analysis/optimization)
  • Functions contain basic blocks and arguments
  • Basic blocks contain instructions
  • Instructions contain opcodes and vector of operands

Pass Manager

  • ImmutablePass: doesn’t do much
  • LoopPass
  • RegionPass: process single-enty, single-exit regions
  • ModulePas
  • CallGraphSCCPass: bottom up on the call graph
  • FunctionPass
  • BasicBlockPass: process basic blocks

Tools:

  • llvm-as: .ll (text) to .bc (binary)
  • llvm-dis: .bc to .ll
  • llvm-link: link multiple .bc files
  • llvm-prof: profile output to human readers
  • opt: run one or more optz on bc

Aggregate tools:

  • bugpoint
  • clang

mostly use clang, opt, llvm-dis

Lecture 4: Dataflow Analysis

Reaching definitions

  • every assignment is a definition
  • a definition reaches a point p if there exists a path from the point immediately following d to p such that d is not killed along that path

Live variable analysis:

  • a variable v is live at point p if the value of v is used along some path in the flow graph starting at p
  • otws the variable is dead

For each basic block, track uses and defs

reaching definitions vs live variables:

  • sets of definitions vs variables
  • forward vs backward

Reaching:

  • consider a = b + c. at what previous statments did variable c acquire a value that can reach line 10? Live range:
  • starting from definition of c, go to the next definition of variable or end of scope that variable c exists.

Lecture 6: More on LLVM Compiler

  • almost everything is sublcass of llvm::Value
for (Function F1 = func->begin(), FE = func->end(); FI != FE; ++FI) {
    for (BasicBlock::iterator BBI = FL->begin(), BBE = FI->end(); BBI != BBE; ++BBI) {
        Instruction *I = BBI;
        if (Vallnst * CI = dyn_cast<CallInst>(I)) {
            outs() << "I'm a Call Instruction!\n";
        }
        if (UnaryInstruction * UI = dyn_cast<UnaryInstruction>(I)) {
            outs() << "I'm a Unary Instruction!\n";
        }
        ...
    }
}

Lecture 8: SSA

  • only 1 asignment per variable
  • definitions dominate uses
    • x strictly dominates w iff impossible to reach w without passing through x first

If x_i is used in x <- phi(…, x_i, …) then VV(x_i) dominates ith pred of BB(PHI)

  • if x is used in y <- … x then BB(x) dominates BB(y)

Lecture 9: SSA style optz

constant prop

W = all defs
while W:
    stmt S <- W.pop()
    if (S has form "v <- c" or S has form "v <- phi(c,c, ..., c)"):
        delete S
        foreach stmt U that uses v {
            replace v with c in U
            W.add(U)
        }

Lecture 10: loop motion

  • use reaching defs and dominators