Metalinguistic Abstraction in Stages
In a nutshell
This post describes four implementations of recursive evaluators. To play with them, click the links:
- Calculator language
- Adding booleans, conditionals, and sequences
- Adding blocks, declarations, and names
- Adding functions with implicit return
- Adding return statements
Motivation
The textbook Structure and Interpretation of Computer Programs, JavaScript Edition concludes in Chapters 4 and 5 with interpreters and compilers that present executable mental models for the evaluation of programs, at various levels of abstraction. Chapter 4 starts with a recursive metacircular evaluator for the subset of JavaScript that is used throughout the book. Teaching this metacircular evaluator is a challenge because it is conceptually rich and because it is often the first medium-sized program that students encounter (around 500 lines). In this blog post I share a teaching technique that I’ve found to be useful. The idea is to introduce the metacircular evaluator in stages, starting from a simple calculator language, and gradually extend the language to become a more and more realistic programming language. This way, the main concepts are revealed step-by-step, and the students gradually adapt to larger and larger programs.
A recursive evaluator for a calculator language
We start with a calculator sublanguage of JavaScript. A program in such a language is a single expression
statement, and the expression consists of numbers and the binary operators +
, -
, *
, and /
.
A typical “program” looks like this:
1 + 2 * 3 - 4;
A parse
function comes in handy, to translate such a program into a syntax tree:
parse("1 + 2 * 3 - 4;");
which results in
list("binary_operator_combination",
"-",
list("binary_operator_combination",
"+",
list("literal", 1),
list("binary_operator_combination",
"*",
list("literal", 2), list("literal", 3))),
list("literal", 4))
using the list notation of SICP JS. A recursive evaluator (click the link to play) for such syntax trees is quite simple.
function evaluate(expr) {
return is_literal(expr)
? literal_value(expr)
: is_operator_comb(expr)
? apply(operator_comb_operator_symbol(expr),
list_of_values(
list(operator_comb_first_operand(expr),
operator_comb_second_operand(expr))))
: error(expr, "Unknown expression: ");
}
where the apply
function uses the appropriate JavaScript operator to compute the result of
binary operator combinations
function apply(operator, operands) {
const first_op = head(operands);
const second_op = head(tail(operands));
return operator === "+"
? first_op + second_op
: operator === "-"
? first_op - second_op
: operator === "*"
? first_op * second_op
: operator === "/"
? first_op / second_op
: error(operator, "Unknown operator");
}
and where the function list_of_values
just evaluates the elements of the given list of expressions.
function list_of_values(exprs) {
return map(evaluate, exprs);
}
The following syntax functions separate the concrete representation of the syntax tree from the evaluation.
// literal
function is_literal(component) {
return is_tagged_list(component, "literal");
}
function literal_value(component) {
return head(tail(component));
}
function is_tagged_list(component, the_tag) {
return is_pair(component) && head(component) === the_tag;
}
// operator combination
function is_operator_comb(component) {
return is_tagged_list(component, "binary_operator_combination");
}
function operator_comb_operator_symbol(component) {
return list_ref(component, 1);
}
function operator_comb_first_operand(component) {
return list_ref(component, 2);
}
function operator_comb_second_operand(component) {
return list_ref(component, 3);
}
Finally, a function parse_and_evaluate
provides a convenient interface to the evaluator.
function parse_and_evaluate(program) {
return evaluate(parse(program));
}
For example parse_and_evaluate("1 + 2 * 3 - 4;"));
returns the expected result, the number 3.
Adding booleans, conditionals, and sequences
The next evaluator extends the calculator language by
adding the boolean values true
and
false
, conditional expressions and sequences of statements. In JavaScript, the component
statements of a sequence are evaluates in the order in which they appear, and the result
in the case of this JavaScript sublanguage is the result of evaluating the last statement
of the sequence. The result of evaluating the program
8 + 34; true ? 1 + 2 : 17;
is 3, because the result of evaluating the first statement of the sequence 8 + 34
is ignored, and
the conditional expression evaluates to the result of 1 + 2
because its predicate is true.
The boolean values true
and false
are literal values like numbers, so the only additional
cases in this evaluator (click on the link for
all necessary syntax and support functions) handle conditionals and sequences.
function evaluate(comp) {
return is_literal(comp)
? literal_value(comp)
: is_operator_comb(comp)
? apply(operator_comb_operator_symbol(comp),
list_of_values(
list(operator_comb_first_operand(comp),
operator_comb_second_operand(comp))))
: is_conditional(comp)
? eval_conditional(comp)
: is_sequence(comp)
? eval_sequence(sequence_statements(comp))
: error(comp, "Unknown component:");
}
The function eval_conditional
checks whether the result of evaluating the predicate is
considered true (using the function is_truthy
) and evaluates the appropriate branch.
function eval_conditional(comp) {
return is_truthy(evaluate(conditional_predicate(comp)))
? evaluate(conditional_consequent(comp))
: evaluate(conditional_alternative(comp));
}
The function eval_sequence
recursively evaluates the components of the sequence.
function eval_sequence(stmts) {
if (is_empty_sequence(stmts)) {
return undefined;
} else if (is_last_statement(stmts)) {
return evaluate(first_statement(stmts));
} else {
const ignore = evaluate(first_statement(stmts));
return eval_sequence(rest_statements(stmts));
}
}
For example, the result of
parse_and_evaluate("8 + 34; true ? 1 + 2 : 17;");
is 3 because the result of 8 + 34
is ignored by eval_sequence
, and the eval_conditional
chooses
the consequent expression.
Adding blocks, declarations, and names
The next evaluator adds blocks, const
declarations,
and names. A typical example is
const y = 4;
{
const x = y + 7;
x * 2;
}
which evaluates to 22 because in the program, the name y
is declared to be 4, and in the
block (delimited by braces {...}
) the name x
is declared to refer to y + 7
, i.e. 11.
Declarations that use JavaScript’s const
and let
enjoy block scope. To handle declarations within
blocks and occurrences of names in their scope,
this evaluator (click to see the new syntax
and support functions) adds the functions eval_block
and eval_declaration
.
function evaluate(comp, env) {
return is_literal(comp)
? literal_value(comp)
...
: is_name(comp)
? lookup_symbol_value(symbol_of_name(comp), env)
: is_block(comp)
? eval_block(comp, env)
: is_declaration(comp)
? eval_declaration(comp, env)
: error(comp, "Unknown component:");
}
This version of the function evaluate
has a new parameter env
that keeps track of the names
that are declared
in any given scope and the values that these names refer to at any given time. The evaluation of
a name occurrence looks up the value associated with the symbol (string) of the name in the
environment using lookup_symbol_value
.
The function
eval_block
scans out the local names that are declared in the body of a given block, and evaluates
the body in an environment that extends the current environment by bindings of the local names
to their initial value *unassigned*
.
function eval_block(component, env) {
const body = block_body(component);
const locals = scan_out_declarations(body);
const unassigneds = list_of_unassigned(locals);
return evaluate(body, extend_environment(locals,
unassigneds,
env));
}
function list_of_unassigned(symbols) {
return map(symbol => "*unassigned*", symbols);
}
The function eval_declaration
just assigns in the current environment
the symbol of the declaration to the result of evaluating the expression of the declaration.
function eval_declaration(component, env) {
assign_symbol_value(
declaration_symbol(component),
evaluate(declaration_value_expression(component), env),
env);
}
To get declarations to work outside of any block, the function parse_and_evaluate
wraps the
given program in an implicit block.
function parse_and_evaluate(program) {
return evaluate(make_block(parse(program)),
the_empty_environment);
}
Then
parse_and_evaluate(`
const y = 4;
{
const x = y + 7;
x * 2;
}`);
yields the expected value 3.
Adding functions (with implicit return)
The
next evaluator deviates a bit from JavaScript,
to handle function declarations and lambda expressions
in the simplest possible way. In this JavaScript variant, the return value of a function is the result
of evaluating the function body; there is no need or use for return statements in this language.
For example, the function fact
in this language
function fact(n) {
n === 1 ? 1 : n * fact(n - 1);
}
computes the factorial function for positive integers. To make this happen, the evaluate
function
in the this evaluator
needs to include cases for function declarations, function applications, and lambda expressions.
function evaluate(component, env) {
return is_literal(component)
? literal_value(component)
: is_function_declaration(component)
? evaluate(function_decl_to_constant_decl(component), env)
...
: is_application(component)
? apply(evaluate(function_expression(component), env),
list_of_values(arg_expressions(component), env))
: is_operator_comb(component)
? evaluate(operator_comb_to_application(component),
env)
: is_lambda_expression(component)
? make_function(lambda_parameter_symbols(component),
lambda_body(component), env)
: error(component, "Unknown component:");
}
Operator combinations are now treated as function applications, using the function
operator_comb_to_application
, and function declarations are treated as constant
declarations with lambda
expressions, using the function function_decl_to_constant_decl
. The function apply
for evaluating function applications
is reminiscent of evaluating blocks. It creates a new environment in which the
function parameters refer to the already evaluated arguments, and then returns
the result of evaluating the body of the function with respect to the new environment.
function apply(fun, args) {
return is_primitive_function(fun)
? apply_primitive_function(fun, args)
: is_compound_function(fun)
? evaluate(function_body(fun),
extend_environment(
function_parameters(fun),
args,
function_environment(fun)))
: error(fun, "Unknown function type:");
}
The function apply
delegates the application of primitive functions to the function
apply_primitive_function
.
function apply_primitive_function(fun, arglist) {
return apply_in_underlying_javascript(
primitive_implementation(fun), arglist);
}
In order to provide bindings for predeclared names, the function parse_and_evaluate
now
uses the_global_environment
as its initial environment.
function parse_and_evaluate(program) {
return evaluate(make_block(parse(program)),
the_global_environment);
}
the_global_environment
contains bindings of all predeclared
functions to their respective primitive
functions. The application of
our example factorial function to 4
parse_and_evaluate(`
function fact(n) {
n === 1 ? 1 : n * fact(n - 1);
}
fact(4);
`);
gives the expected result of 24.
Adding return statements
The final evaluator handles return statements, a prominent feature in languages like C, Java, Python, and JavaScript. Returns statements allow the programmer to return from a function from anywhere in its body. Whatever statements in the body that remain to be evaluated are ignored. For example, in JavaScript, the program
function f(x) {
if (true) {
const y = 2;
return x + y;
44;
} else {
55;
}
66;
}
f(1);
results in 3 because the evaluation of the body of f
returns the result of evaluating x + y
to the
caller, ignoring the subsequent expression statements 44;
and 66;
that would otherwise
remain to be evaluated in the body. To achieve this, the evaluate
function
function evaluate(component, env) {
return is_literal(component)
? literal_value(component)
...
: is_return_statement(component)
? eval_return_statement(component, env)
...
: error(component, "unknown syntax -- evaluate");
}
in
this evaluator
uses the function eval_return_statement
function eval_return_statement(component, env) {
return make_return_value(evaluate(return_expression(component),
env));
}
which identifies the result of evaluating
the return expression as a “return value” using the function make_return_value
.
function make_return_value(content) {
return list("return_value", content);
}
function is_return_value(value) {
return is_tagged_list(value, "return_value");
}
function return_value_content(value) {
return head(tail(value));
}
The function eval_sequence
now checks whether the result of evaluating the
currently first statement is such a return value, in which case it ignores
the subsequent statements.
function eval_sequence(stmts, env) {
if (is_empty_sequence(stmts)) {
return undefined;
} else if (is_last_statement(stmts)) {
return evaluate(first_statement(stmts), env);
} else {
const first_stmt_value =
evaluate(first_statement(stmts), env);
if (is_return_value(first_stmt_value)) {
return first_stmt_value;
} else {
return eval_sequence(rest_statements(stmts), env);
}
}
}
The function apply
now checks whether the result of evaluating the body of the
function that is being applied is a return value, in which case it unwraps the
return value using the return_value_content
function above.
function apply(fun, args) {
if (is_primitive_function(fun)) {
return apply_primitive_function(fun, args);
} else if (is_compound_function(fun)) {
const result = evaluate(function_body(fun),
extend_environment(
function_parameters(fun),
args,
function_environment(fun)));
return is_return_value(result)
? return_value_content(result)
: undefined;
} else {
error(fun, "unknown function type -- apply");
}
}
If the result of evaluating the body is not a return value, apply
returns the
value undefined
, which is JavaScript’s default return value of functions.
With this mechanism in place, our example program
parse_and_evaluate(`
function f(x) {
if (true) {
const y = 2;
return x + y;
44;
} else {
55;
}
66;
}
f(1);
`
);
results in the expected value 3.
The factorial function above needs to have a return
added, because otherwise
it would always return undefined
. The example program
parse_and_evaluate(`
function factorial(n) {
return n === 1
? 1
: n * factorial(n - 1);
}
factorial(4);`);
results in the expected value 24.
Note that the recursive evaluator presented in this section gives rise to a recursive
process even if the underlying implementation of JavaScript has proper tail calls
(such as the Safari browser), and even if the given program being interpreted is giving
rise to an iterative process according to the
SICP terminology, because both the
wrapping of return values in eval_return_statement
and their unwrapping in
apply
are deferred operations. In a future blog post, I’ll show an evaluator that
fixes this.