In a nutshell

This post describes an implementation of machine and compiler that uses garbage collection to reclaim unused memory in the heap. To play with it, click the link A virtual machine with garbage collection.

Motivation

The last post presented a virtual machine with explicit memory management to represent runtime entities such as environments, function objects, and runtime-stack frames. These entities were allocated from a heap data structure, which was an array of primitive values. When a new entity is needed, it gets allocated a contiguous range of heap slots called a node starting at a free pointer. After each allocation, the free pointer gets incremented by the size of the allocated entity. That means that the heap grows bigger and bigger as computation proceeds.

This regime may not be realistic, because it keeps allocating fresh memory for all runtime entities. In the JavaScript implementation of the previous post where automatically resizing JavaScript arrays are used, the heap will grow very quickly and may exceed the maximal size of memory available to the JavaScript implementation. In programming languages that have a fixed array size, such as C, the implementer of the machine is faced with the question how big the heap should be to accommodate all runtime entities. In both cases, there is typically a lot of wasted memory: At any point of time, there are bound to be many nodes that will never be used again, and yet the machine will keep their slots reserved forever. Such nodes are called garbage. Unfortunately, we cannot predict precisely which nodes are garbage. As usual, we take a conservative approach and define a safe approximation.

Preparing the heap

A useful property of the machine is that it will only visit nodes in the heap by following pointers from a small number of registers. These are the current environment, the current operand stack, and the current frames on the runtime stack. These nodes are called roots. The nodes reachable from the roots are called live, and the rest are called dead. Any dead node is garbage, and therefore its memory can be reused. In this post, I describe a memory management technique called stop-and-copy garbage collection that makes use of this property. The first idea is to split the available memory into two halves.

onst heap = [];
const heap_size = 100000;
const heap_bottom = 0;
const space_size = heap_size / 2 ;

The machine keeps allocating memory from one half of the memory, and when that half is full, it stops execution of machine instructions, copies all live nodes into the other half, and then resumes execution.

let to_space = heap_bottom;
let top_of_space = to_space + space_size - 1; 
let from_space = top_of_space + 1;
let free = to_space;

Initially, the bottom half of the memory is called the to-space and the top half is called the from-space. The new version of the function make_node keeps allocating memory from the current to_space. If the current to-space is not big enough to accommodate the new node, it performs garbage collection by calling a function flip.

function make_node(kind, size) {
    if (free + size > top_of_space) {
        flip();
    }
    if (free + size /* still */ > top_of_space) {
        error("out of memory");
    } else {
        const node_address = free;
        heap[node_address + kind_offset] = kind;
        heap[node_address + size_offset] = size;
        heap[node_address + number_of_children_offset] = 0;
        free = free + size;
        return node_address;
    }
}

The function flip reverses the roles of from_space and to_space, and copies all nodes reachable from the roots (environment, operand stack, and runtime-stack frames) from the new from-space to the new to-space.

function flip() {
    let temp = from_space;
    from_space =to_space; 
    to_space = temp; 
    top_of_space = to_space + space_size - 1; 
    let scan = to_space; 
    free = to_space;
    environment = copy(environment);
    operand_stack = copy(operand_stack);
    for (let r = runtime_stack; !is_null(r); r = tail(r)) {
         set_head(r, copy(head(r)));
    }
    while (scan < free ) {
        let number_of_children = heap[scan + number_of_children_offset];
        for (let c = 0; c < number_of_children; c = c + 1) {
            heap[scan + first_child_offset + c] = 
                copy(heap[scan + first_child_offset + c]);
        }
        scan = scan + heap[scan + size_offset];
    }
}

After all roots are copied into the to-space, a while loop scans the copied nodes and copies their children as well. A scan pointer keeps track of the next copy whose children still need to be copied. A free pointer keeps track of the place where the next child is to be copied to. The copying of children continues until scan reaches free, at which point all live nodes have been copied from the from-space to the to-space. This particularly elegant way of copying all live nodes in a breadth-first manner using a while-loop is due to C.J. Cheney.

Each time a node is copied using the copy function, the address of the copy is recorded in the original. This address is traditionally called a “broken heart”.

const broken_heart_offset = 0; // use kind slot for broken heart
function copy(v) {
    if (already_copied(v)) {
        return heap[v + broken_heart_offset];
    } else {
        const addr = free;
        move(v, free);
        free = free + heap[v + size_offset];
        heap[v + broken_heart_offset] = addr; 
        return addr;
    }
}

The copy function checks if a node is already copied to the to-space using the broken-heart address.

function already_copied(v) {
    return heap[v + broken_heart_offset] >= to_space &&
           heap[v + broken_heart_offset] <= top_of_space;
}

The actual copying is done using the move function.

function move(source, destination) {
    let size = heap[source + size_offset];
    for (let i = 0; i < size; i = i + 1) {
        heap[destination + i] = heap[source + i];
    }
}

Nasty bugs

The integration of a stop-and-copy garbage collector into a virtual machine is notoriously error-prone. I’m going to illustrate this using the call-instruction as an example. Take a look at this seemingly reasonable implementation of the call-instruction.

        // machine clause for call-instruction
        } else if (is_call_instruction(instr)) {
            const arity = call_instruction_arity(instr);
            const callee = peek_in_operand_stack(arity);
            if (is_primitive_function(callee)) {
                const args = pop_arguments_from_operand_stack(arity);
                const callee = pop_from_operand_stack();
                pc = pc + 1;
                push_on_operand_stack(apply_primitive_function(
                                          callee,
                                          args));
            } else {
                runtime_stack = pair(make_runtime_stack_call_frame(),
                                     runtime_stack);
                const max_stack_size = function_max_stack_size(callee);
                pc = function_address(callee);
                environment = function_environment(callee);
                extend_environment(arity);
                operand_stack = make_operand_stack(max_stack_size);
            }
        } else ...

If the callee is a compound function, the machine allocates a new runtime-stack frame on the heap and pushes it on the runtime stack. It sets the pc to the first instruction of the function body and allocates a new environment node on the heap using the extend_environment function, which assigns the environment register to the new node. Finally, it allocates a new operand stack of sufficient size on the heap and assigns the operand_stack register to it.

But note that each heap allocation can trigger a garbage collection, which renders previously obtained addresses obsolete. The bug in the implementation above is that a garbage collection triggered by make_runtime_stack_call_frame renders the callee address obsolete. In that case, the environment register gets assigned to an obsolete address in the from-space, whose node might get overwritten in the next call of flip before it is used by some other instruction.

To fix the bug, I’m reobtaining the callee address after the call of make_runtime_stack_call_frame.

        // machine clause for call-instruction
        } else if (is_call_instruction(instr)) {
            const arity = call_instruction_arity(instr);
            const callee = peek_in_operand_stack(arity);
            if (is_primitive_function(callee)) {
                const args = pop_arguments_from_operand_stack(arity);
                const callee = pop_from_operand_stack();
                pc = pc + 1;
                push_on_operand_stack(apply_primitive_function(
                                          callee,
                                          args));
            } else {
                runtime_stack = pair(make_runtime_stack_call_frame(),
                                     runtime_stack);
                // callee might be obsolete; reload from current operand stack
                const callee = peek_in_operand_stack(arity);
                const max_stack_size = function_max_stack_size(callee);
                pc = function_address(callee);
                environment = function_environment(callee);
                extend_environment(arity);
                operand_stack = make_operand_stack(max_stack_size);
            }
        } else ...

A few further modifications are needed to avoid obsolete addresses in the machine. As another example, each operation on the operand stack now uses the operand_stack register just before it is needed, instead of passing a possibly obsolete heap address as argument to the operation.

function push_on_operand_stack(v) {
    heap[operand_stack + first_child_offset +
         heap[operand_stack + number_of_children_offset]] = v;
    heap[operand_stack + number_of_children_offset] = 
        heap[operand_stack + number_of_children_offset] + 1;
}
function pop_from_operand_stack() {
    const val = heap[operand_stack + first_child_offset - 1 +
                     heap[operand_stack + number_of_children_offset]];
    heap[operand_stack + number_of_children_offset] = 
        heap[operand_stack + number_of_children_offset] - 1;
    return val;
}
function top_of_operand_stack() {
    return heap[operand_stack + first_child_offset - 1 + 
                heap[operand_stack + number_of_children_offset]];
}
function peek_in_operand_stack(n) {
    return heap[operand_stack + first_child_offset - 1 + 
                heap[operand_stack + number_of_children_offset] - n];
}

Play with the machine implementation by changing heap_size. If it is large enough, the machine code runs to the end without a single garbage collection. As you make heap_size smaller and smaller, more and more garbage collections are needed. Add display messages to keep track of the calls of flip. If heap_size becomes too small, flip cannot free a sufficient amount of space for some runtime entity, and the program terminates with the message memory exhausted.

Summary

This post concludes my coverage of virtual machines in this blog. I started out with a rather high-level machine that used a global operand stack and the environment data structure from SICP JS. I replaced the global operand stack by local operand stacks of precomputed size that are allocated for each function call. I replaced the SICP-JS-style environments by arrays by exploiting the properties of lexical scoping. These changes allowed me to allocate all runtime entities on a heap data structure. Finally in this post, I introduced garbage collection as a method to reuse the space of heap nodes that are no longer needed.

If you like to take this a bit further, here are a couple of ideas:

Byte code: Represent the machine code by sequences of bytes (or JavaScript numbers). The machine can then dispatch using the first byte (number) of the sequence, instead of using nested conditionals as in the machines shown so far.

Representing primitive values: Investigate how to optimize the treatment of primitive values (numbers, boolean values) to avoid heap allocation.

Representing strings: Investigate how strings can be handled in the machine, after reading about a string pool in SICP JS 5.5.6.


A Virtual Machine with a Heap     Function Combinators