A Virtual Machine with Garbage Collection
In a nutshell
This post describes an implementation of machine and compiler that uses garbage collection to reclaim unused memory in the heap. To play with it, click the link A virtual machine with garbage collection.
Motivation
The last post presented a virtual machine with explicit memory management to represent runtime entities such as environments, function objects, and runtime-stack frames. These entities were allocated from a heap data structure, which was an array of primitive values. When a new entity is needed, it gets allocated a contiguous range of heap slots called a node starting at a free pointer. After each allocation, the free pointer gets incremented by the size of the allocated entity. That means that the heap grows bigger and bigger as computation proceeds.
This regime may not be realistic, because it keeps allocating fresh memory for all runtime entities. In the JavaScript implementation of the previous post where automatically resizing JavaScript arrays are used, the heap will grow very quickly and may exceed the maximal size of memory available to the JavaScript implementation. In programming languages that have a fixed array size, such as C, the implementer of the machine is faced with the question how big the heap should be to accommodate all runtime entities. In both cases, there is typically a lot of wasted memory: At any point of time, there are bound to be many nodes that will never be used again, and yet the machine will keep their slots reserved forever. Such nodes are called garbage. Unfortunately, we cannot predict precisely which nodes are garbage. As usual, we take a conservative approach and define a safe approximation.
Preparing the heap
A useful property of the machine is that it will only visit nodes in the heap by following pointers from a small number of registers. These are the current environment, the current operand stack, and the current frames on the runtime stack. These nodes are called roots. The nodes reachable from the roots are called live, and the rest are called dead. Any dead node is garbage, and therefore its memory can be reused. In this post, I describe a memory management technique called stop-and-copy garbage collection that makes use of this property. The first idea is to split the available memory into two halves.
onst heap = [];
const heap_size = 100000;
const heap_bottom = 0;
const space_size = heap_size / 2 ;
The machine keeps allocating memory from one half of the memory, and when that half is full, it stops execution of machine instructions, copies all live nodes into the other half, and then resumes execution.
let to_space = heap_bottom;
let top_of_space = to_space + space_size - 1;
let from_space = top_of_space + 1;
let free = to_space;
Initially, the bottom half of the memory is called the to-space and the top half is
called the from-space.
The new version of the function make_node
keeps allocating memory from the current to_space
.
If the current to-space is not big enough to accommodate the new node, it performs
garbage collection by calling a function flip
.
function make_node(kind, size) {
if (free + size > top_of_space) {
flip();
}
if (free + size /* still */ > top_of_space) {
error("out of memory");
} else {
const node_address = free;
heap[node_address + kind_offset] = kind;
heap[node_address + size_offset] = size;
heap[node_address + number_of_children_offset] = 0;
free = free + size;
return node_address;
}
}
The function flip
reverses the roles of from_space
and to_space
, and copies all
nodes reachable from the roots (environment, operand stack, and runtime-stack frames)
from the new from-space to the new to-space.
function flip() {
let temp = from_space;
from_space =to_space;
to_space = temp;
top_of_space = to_space + space_size - 1;
let scan = to_space;
free = to_space;
environment = copy(environment);
operand_stack = copy(operand_stack);
for (let r = runtime_stack; !is_null(r); r = tail(r)) {
set_head(r, copy(head(r)));
}
while (scan < free ) {
let number_of_children = heap[scan + number_of_children_offset];
for (let c = 0; c < number_of_children; c = c + 1) {
heap[scan + first_child_offset + c] =
copy(heap[scan + first_child_offset + c]);
}
scan = scan + heap[scan + size_offset];
}
}
After all roots are copied into the to-space, a while loop scans
the copied nodes and copies their children as well. A scan
pointer
keeps track of the next copy whose children still need to be copied.
A free
pointer keeps track of the place where the next child is to
be copied to. The copying of children
continues until scan
reaches free
, at
which point all live nodes have been copied from the from-space
to the to-space. This particularly elegant way of copying all
live nodes in a breadth-first manner using a while-loop is due to
C.J. Cheney.
Each time a node is copied using the copy
function, the address
of the copy is recorded in the original. This address is traditionally
called a “broken heart”.
const broken_heart_offset = 0; // use kind slot for broken heart
function copy(v) {
if (already_copied(v)) {
return heap[v + broken_heart_offset];
} else {
const addr = free;
move(v, free);
free = free + heap[v + size_offset];
heap[v + broken_heart_offset] = addr;
return addr;
}
}
The copy
function checks if a node is already copied to the
to-space using
the broken-heart address.
function already_copied(v) {
return heap[v + broken_heart_offset] >= to_space &&
heap[v + broken_heart_offset] <= top_of_space;
}
The actual copying is done using the move
function.
function move(source, destination) {
let size = heap[source + size_offset];
for (let i = 0; i < size; i = i + 1) {
heap[destination + i] = heap[source + i];
}
}
Nasty bugs
The integration of a stop-and-copy garbage collector into a virtual machine is notoriously error-prone. I’m going to illustrate this using the call-instruction as an example. Take a look at this seemingly reasonable implementation of the call-instruction.
// machine clause for call-instruction
} else if (is_call_instruction(instr)) {
const arity = call_instruction_arity(instr);
const callee = peek_in_operand_stack(arity);
if (is_primitive_function(callee)) {
const args = pop_arguments_from_operand_stack(arity);
const callee = pop_from_operand_stack();
pc = pc + 1;
push_on_operand_stack(apply_primitive_function(
callee,
args));
} else {
runtime_stack = pair(make_runtime_stack_call_frame(),
runtime_stack);
const max_stack_size = function_max_stack_size(callee);
pc = function_address(callee);
environment = function_environment(callee);
extend_environment(arity);
operand_stack = make_operand_stack(max_stack_size);
}
} else ...
If the callee is a compound function, the machine allocates a new
runtime-stack frame on the heap and pushes it on the runtime stack.
It sets the pc
to the first instruction of the function
body and allocates a new environment node on the heap
using the extend_environment
function, which assigns the
environment
register to the new node. Finally, it
allocates a new operand stack of sufficient size on the heap
and assigns the operand_stack
register to it.
But note that each heap allocation can trigger a garbage collection,
which renders previously obtained addresses obsolete. The bug
in the implementation above is that a garbage collection triggered
by make_runtime_stack_call_frame
renders the callee
address
obsolete. In that case, the environment
register gets assigned
to an obsolete address in the from-space, whose node might
get overwritten in the next call of flip
before it is used
by some other instruction.
To fix the bug, I’m reobtaining the callee address after the
call of make_runtime_stack_call_frame
.
// machine clause for call-instruction
} else if (is_call_instruction(instr)) {
const arity = call_instruction_arity(instr);
const callee = peek_in_operand_stack(arity);
if (is_primitive_function(callee)) {
const args = pop_arguments_from_operand_stack(arity);
const callee = pop_from_operand_stack();
pc = pc + 1;
push_on_operand_stack(apply_primitive_function(
callee,
args));
} else {
runtime_stack = pair(make_runtime_stack_call_frame(),
runtime_stack);
// callee might be obsolete; reload from current operand stack
const callee = peek_in_operand_stack(arity);
const max_stack_size = function_max_stack_size(callee);
pc = function_address(callee);
environment = function_environment(callee);
extend_environment(arity);
operand_stack = make_operand_stack(max_stack_size);
}
} else ...
A few further modifications are needed to avoid
obsolete addresses in the machine. As another example, each operation
on the operand stack now uses the operand_stack
register just before
it is needed, instead of passing a possibly obsolete heap address
as argument to the operation.
function push_on_operand_stack(v) {
heap[operand_stack + first_child_offset +
heap[operand_stack + number_of_children_offset]] = v;
heap[operand_stack + number_of_children_offset] =
heap[operand_stack + number_of_children_offset] + 1;
}
function pop_from_operand_stack() {
const val = heap[operand_stack + first_child_offset - 1 +
heap[operand_stack + number_of_children_offset]];
heap[operand_stack + number_of_children_offset] =
heap[operand_stack + number_of_children_offset] - 1;
return val;
}
function top_of_operand_stack() {
return heap[operand_stack + first_child_offset - 1 +
heap[operand_stack + number_of_children_offset]];
}
function peek_in_operand_stack(n) {
return heap[operand_stack + first_child_offset - 1 +
heap[operand_stack + number_of_children_offset] - n];
}
Play with
the machine implementation
by changing heap_size
. If it is large enough, the machine
code runs to the end without a single garbage collection. As
you make heap_size
smaller and smaller, more and more
garbage collections are needed. Add display
messages
to keep track of the calls of flip
. If heap_size
becomes
too small, flip
cannot free a sufficient amount of space
for some runtime entity, and the program terminates with the message
memory exhausted
.
Summary
This post concludes my coverage of virtual machines in this blog. I started out with a rather high-level machine that used a global operand stack and the environment data structure from SICP JS. I replaced the global operand stack by local operand stacks of precomputed size that are allocated for each function call. I replaced the SICP-JS-style environments by arrays by exploiting the properties of lexical scoping. These changes allowed me to allocate all runtime entities on a heap data structure. Finally in this post, I introduced garbage collection as a method to reuse the space of heap nodes that are no longer needed.
If you like to take this a bit further, here are a couple of ideas:
Byte code: Represent the machine code by sequences of bytes (or JavaScript numbers). The machine can then dispatch using the first byte (number) of the sequence, instead of using nested conditionals as in the machines shown so far.
Representing primitive values: Investigate how to optimize the treatment of primitive values (numbers, boolean values) to avoid heap allocation.
Representing strings: Investigate how strings can be handled in the machine, after reading about a string pool in SICP JS 5.5.6.