A Quick Introduction to Register Allocation — Hassam Uddin

Most overviews of register allocation are ridiculously complex and not at all practical. I hope, that by the end of this reading, you will gain a deeper understanding of the vocabulary and concepts that are often employed when discussing register allocation. Near the end, I’ll have a full-blown linear scan register allocator written in OCaml. If you want to follow along, you can find the code here.

What is Register Allocation? ¶

I/O is slow. It is ridiculously slow. Reading from RAM takes about 50-100ns, reading from a register takes approximately 300ps. That is a difference of 3 orders of magnitude. When you’ve got a program that requires every ounce of speed it can have, storing values in registers rather than RAM would result in a 1000x speedup. As compiler developers, it is our job to make sure that we are making full use of the registers available to us.

Register Allocation is a very interesting problem that arises in compiler development. How do we make the best use of registers while maintaining correctness?

Vocabulary and Concepts ¶

There are a ton of resources talking about register allocation out there, but a lot of them don’t stop and clarify the vocabulary that is being used. If you’re just looking for code samples, you can skip past this section.

Statement Ordering ¶

Programs are trees, yet when talking about register allocation, we talk about the program as a sequence of statements. This is something that is often just glossed over and to be fair, it’s not very complex, but it’s not obvious either. We need some way of converting an AST into a sequence of statements that maintain dependencies. This is called a reverse postordering and is easily derived from a DFS. Most register allocation algorithms are applied per basic block, but can also be applied globally.

Liveness ¶

We say that a variable is alive if it is being used. We can represent this pretty easily using an array of start and endpoints, also known as live ranges. Register allocation becomes messy and a lot slower when using ranges, so instead we like to talk about just the start and endpoint of a variable’s life, also known as live intervals. We use liveness analysis when allocating registers, but it is also useful for type checking and some common optimizations, so computing the intervals is essentially “free”.

Spilling ¶

Say we have 4 registers in our system and 5 variables that are alive at the same time. It is impossible to fit 5 gallons into a 4-gallon container, so we are out of luck. That won’t stop us though, we’re going to spill a variable. Spilling is the act of saving a variable into memory, rather than storing it in a register. To rephrase what we were saying earlier, we want a register allocation algorithm that minimizes unnecessary spilling. How do we choose which variable should be spilled though?

Heuristic ¶

There is no correct answer to the question we posed above. In a more positive tone, any answer is correct. We will choose something called a spilling heuristic, which is just how we decide which variable gets spilled. In linear scan register allocation, the algorithm that I’ll demonstrate in just a moment, the most common heuristic is to spill the variable that “dies” the latest.

Graph Coloring ¶

Graph coloring is the problem of giving each vertex in a graph a different color (from some fixed set of colors) than any of its neighbors. This is not a “compiler problem”, but it was discovered that register allocation can be simplified to graph coloring. If we draw a graph with every single variable as a vertex and an edge between them if they are alive at the same time, we get a register inference graph, which we can then color with the registers available to us. The only problems are that some graphs cannot be colored and graph coloring is NP-complete and computationally expensive. There is a clever algorithm developed by Chaitin et al. that handles spilling in reasonable time. I will not be going over the algorithm, but send me a message if you’re interested.

Linear Scan Register Allocation ¶

I promised code, let’s get to that code. From their 1999 paper Poletto and Sarkar devised a ridiculously simple and clever algorithm for register allocation. We greedily assign and free registers until we can’t fit them in anymore, which is when we spill.

Linear scanning is ridiculously fast, but as you may have guessed, it does not produce optimal code. When ocamlc added an option for linear scanning, they saw compile-time improvements of almost 25%. The difference in performance between Chaitin’s algorithm and linear scanning is so minuscule that for a lot of applications it does not matter, but we like as much performance as we can have. Linear scanning is used most often in JITs, like Oracle’s Hotspot, because compilation speed matters a ton. This is the basic algorithm:

LinearScanRegisterAllocation
    active ← {}
    for each live interval i, in order of increasing start point do
        ExpireOldIntervals(i)
        if length(active) = R then
            SpillAtInterval(i)
        else
            register[i] ← a register removed from pool of free registers
            add i to active, sorted by increasing end point

ExpireOldIntervals(i)
    for each interval j in active, in order of increasing end point do
        if endpoint[j] ≥ startpoint[i] then
            return
        remove j from active
        add register[j] to pool of free registers

SpillAtInterval(i)
    spill ← last interval in active
    if endpoint[spill] > endpoint[i] then
        register[i] ← register[spill]
        location[spill] ← new stack location
        remove spill from active
        add i to active, sorted by increasing end point
    else
        location[i] ← new stack location

We enter the algorithm with all of the variables in our code represented as live intervals. In OCaml, we’ll define these to look like this:

type interval =
  { name: string;
    reg: regs;
    location: int;
    start_point: int;
    end_point: int;
  }

regs is just an enum containing all the registers that we have. To make life easier for us, let’s define a constructor:

let make_interval name start close =
  {
    name = name;
    reg = Spilled;
    location = -1;
    start_point = start;
    end_point = close;
  }

To represent our input, we will use a global mutable array. I couldn’t figure out a way to make this completely immutable, but we’ll get pretty close. Once the algorithm is done, we’ll want to have the register or spilled status of each interval in the array.

let intervals = [|
  make_interval "a" 1 2;
  make_interval "b" 1 4;
  make_interval "c" 1 3;
  make_interval "d" 1 8;
  make_interval "e" 2 6;
  make_interval "f" 3 6;
  make_interval "g" 8 10;
|]

Main loop ¶

Let’s run that loop over the intervals in our main function

let () =
  let glob_pool = ref [EAX; EBX; ECX; EDX] in (* Our available registers, ideally this is passed in by the architecture stage *)
  let num_regs = List.length !glob_pool in
  let glob_active = ref [] in (* Again, I don't love mutable structures, so if anyone has suggestions to make this immutable, I would love to hear them *)
  let length = Array.length intervals in
  for i = 0 to length - 1 in
    let current = intervals.(i) in
    let (active, pool) = expire_old_intervals (* Not yet implemented *)
    let (active, pool) = match List.length active >= num_regs with (* Assign current interval, and modify active *)
      | true -> spill_interval (* not yet implemented *)
      | false -> (* Greedily assign a new register *)
        let new_reg, pool = get_reg pool in (* Can be a heuristic or a random choice of available registers *)
        let new_interval = {current with reg = new_reg} in
        let () = intervals.(i) <- new_interval in
          (new_interval, i)::active, pool (* We may need to undo what we've allocated, so we have to store the index to go back to *)
    in
    let () = glob_active := active in
    glob_pool := pool
  done;;
  Array.iter (fun interval -> printf "Interval %s with reg %s at location %d \n" interval.name (name interval.reg) interval.location) interval

That’s pretty straightforward. The intervals that we defined above don’t require any spilling, so let’s implement the expire function and see if the algorithm works.

Expiring intervals ¶

We will partition the current active registers into two groups, one for all expired intervals (intervals whose endpoint we are currently past) with free registers, and one for all intervals that are still alive

let expire_old_intervals active current pool =
  let still_active j = (fst j).end_point > current.start_point in (* Our active list is of the form interval * int *)
  let (new_active, regs_to_free) = List.partition (still_active) active in
  let rec add pool l = match l with
    | (head::tail) -> add (free_reg (fst head).reg pool) tail (* free_reg returns the pool with the register added, avoiding duplicates or spilled registers *)
    | [] ->  pool
  in
  new_active, add pool regs_to_free

let () =
  ...
  let (active, pool) = expire_old_intervals !glob_active current !glob_pool

If we remove the reference to spilling and run our algorithm, it works! This is my output:

Interval a with reg eax at location -1
Interval b with reg ebx at location -1
Interval c with reg ecx at location -1
Interval d with reg edx at location -1
Interval e with reg eax at location -1
Interval f with reg ecx at location -1
Interval g with reg ebx at location -1

Yours may differ if you chose to implement get_reg differently, and that’s okay. On some machines, certain registers are faster or more efficient, and so get_reg would need some sort of heuristic for choosing which register it allocates.

Spilling ¶

Last but not least, we need to spill. The most common heuristic is choosing to spill the interval that will die the latest. The simplest heuristic, however, is to just spill whatever we are currently dealing with.

let stack_location = ref 0

let spill_interval current =
  let current, i = current in
  let () = intervals.(i) <- {current with location = !stack_location}
  let () = stack_location := !stack_location + 4

let () =
  ...
  let () = spill_interval (current, i) in (active, pool)

This is simple and it works, but our spilling heuristic is not the best. Instead, we’ll compare our current interval with the interval that has the latest endpoint. The simplest way to do this is to sort the list, but you could maintain a smarter data structure if you cared about performance.

let spill_interval active current =
  let current, i = current in
  let rev_sort = List.sort (fun a b -> -1 * (compare_endpoint (fst a) (fst b))) active in
  let spill, index = List.hd rev_sort in (* Latest endpoint in active *)
  let tail = List.tl rev_sort in
  let active = match spill.end_point >= current.end_point with (* Should we spill the old or current interval *)
    | true -> (* old *)
      let new_interval = {current with reg = spill.reg} in
      let () = intervals.(index) <- {spill with reg = Spilled; location = !stack_location} in (* Reassign old interval's value *)
      let () = intervals.(i) <- new_interval in (* assign new interval a register *)
        (new_interval, index)::tail (* remove old interval from active and add current *)
    | false -> (* current *)
      let () = intervals.(i) <- {current with location = !stack_location} in
        active
  in
  let () = stack_location := !stack_location + 4 in
  active

let () =
  ...
  let (active, pool) = ...
    (spill_interval active (current, i)), pool

Wrapping Up ¶

That wasn’t too bad at all. Let’s try another set of intervals and fewer registers and see that it works.

let intervals = [|
  make_interval "a" 1 10;
  make_interval "b" 1 4;
  make_interval "c" 1 3;
  make_interval "d" 2 8;
  make_interval "e" 3 6;
  make_interval "f" 3 10;
  make_interval "g" 4 8;
|]
...
let glob_pool = ref [EAX; EBX; ECX]

And we get:

Interval a with reg SPILLED at location 0
Interval b with reg ebx at location -1
Interval c with reg ecx at location -1
Interval d with reg eax at location -1
Interval e with reg ecx at location -1
Interval f with reg SPILLED at location 4
Interval g with reg ebx at location -1

Awesome! That wraps up this little overview of register allocation and linear scanning. If people are interested, I would love to create an overview of Chaitin’s algorithm as well. I hope you learned something, whether that be vocabulary words to look up, or how to write an actual register allocator in OCaml.