Nand2Lisp

V2: Full Lispy calculator

To complete the functionality of the lispy calculator, we will need a few special forms and builtins. The result will be a sys.vm file that sets up the initial lisp environment so it knows where these are stored in compiled memory, briefly hinted at in the main.vm file previously. It will look something like this:

// Two more functions are considered part of sys
// but their implementation is given directly in 
// the translator, in assembly code
// These are call and return

function init
    // building the initial environment
    push constant 24576     // symbol(0)
    special 0               // special(if)
    cons
    push constant 24577     // symbol(1)
    special 1               // special(define)
    cons
    push constant 24578     // symbol(2)
    push label builtin.begin
    builtin                 // builtin(begin)
    cons
    push constant 24579     // symbol(3)
    push label builtin.add
    builtin                 // builtin(add)
    cons
    push constant 24580     // symbol(4)
    push label builtin.sub
    builtin                 // builtin(sub)
    cons
    push constant 24581     // symbol(5)
    push label builtin.gt
    builtin                 // builtin(gt)
    cons
    push constant 8192      // 0x2000 = emptylist
    cons
    cons
    cons
    cons
    cons
    cons
    pop environment
    push constant 0         // each func HAS to take one arg!
    call main.main
    goto end

New here are the special, builtin and push label vm-level instructions. These instructions are needed because the type prefix of our procedures goes beyond the 15-bit limit of constants we can push at a time; a constraint we inherited from the Hack VM implementation. They are purely syntactic sugar. Arguably symbol here could also benefit from that. There is also the new special type, which takes prefix 110 in our 3-bit type system. This gives us all we need in order to support our missing lisp forms if, define, set! and lambda.

Jumping around in memory

So how does this work? Remember previously we let our call to the builtin + function be a hardcoded part of eval:

// actually call the function stored in local 1
label call
    // TODO
    call eval.add
    return

We instead want to be able to map symbols to builtin vm functions more generally, and the env from sys.vm sets us up for that. The eval function can be extended so it does a lookup by builtin symbol, but in order to know where to jump to we will need to know the start of the builtin function after compilation. This is what the push label instruction provides.

    push constant 24579     // symbol(3)
    push label builtin.add
    builtin                 // builtin(add)
    cons

The above snippet creates a cons cell with symbol(3) as the car or key, and the address of the start of the builtin add function as its cdr or value, prefixed with the builtin type prefix 101.

Here is one of the new vm-level instructions in assembly. The rest are similarly simple.

// assembly level translation of vm instruction 'builtin'
// 0x7fff + 0x2001 = 0xa000, or the 101 prefix we are after
// NOTE: since we use OR, if we try to create a builtin larger than 0x1fff,
// this wil fail in unexpected ways!
// for now this is acceptable as we won't have that many and it eliminates an otherwise-needed mask
    @0x7fff
    D=A
    @0x2001
    D=D+A
    @SP
    A=M-1
    M=D|M

Now we are ready to modify eval.vm.

// actually call the function stored in local 1
label call
    push local 1
    call-builtin            // new!
    return

Once we reach call, we simply mask off the type prefix of our argument and perform a SYSCALL, like so:

// assembly level translation of vm instruction 'call-builtin'
// choice of registers for FUNC and RET mimics SYSCALL implementation
    @SP
    AM=M-1
    D=M
    @0x1fff
    D=D&A                   // mask off first three bits (TODO, could check them first)
    @R13                    // FUNC
    M=D
    @RETURNLABEL            // gensymmed label
    D=A
    @R15                    // RET
    M=D
    @SYSCALL
    0;JMP
    (RETURNLABEL)

Apart from the unmask of label, this is equivalent to how we call any other function. The main reason we added call-builtin is that the default call takes a label argument, which is now a runtime variable. This lets us define builtin functions and modify them without having to count assembly output lines to jump to them; very useful! Here is the lisp keyword begin as a builtin vm-level function:

// (begin expr ... )
function begin
label beginloop
    push argument
    cdr
    is-emptylist
    if-goto beginend
    push argument
    cdr
    pop argument
    goto beginloop
label beginend
    push argument
    car
    return

The begin function evaluates each argument then returns the last. Crucially, this definition of begin works without any calls to eval because in LISP, arguments are evaluated before function application! We already implemented this in the evalprocedure part of eval. Which is a nice seque into special builtins, of which there are very few. They are builtin LISP functions that can only work properly if they do not evaluate their arguments first. Therefore they must be treated different, and deserve a proper subtype of procedure for eval to handle.

Special builtins

We start by modifying eval.vm as follows:

label evalprocedure
    push local 1
    is-special              // new!
    if-goto evalspecial     // new!
label evalargs
    ...

The is-special instruction is implemented exactly like its friends is-symbol etc before it. This next bit is going to be very similar to call-builtin, but instead of using syscall we write an explicit if/else switch. There are only a handful of special builtins, so this is fine.

label evalspecial
    // remove mask
    push local 1
    push constant 8191      // 0x1fff
    and
    pop local 1
    push local 1
    push constant 0
    eq
    if-goto evalif
    push local 1
    push constant 1
    eq
    if-goto evaldefine
    push local 1
    push constant 2
    eq
    // ... etc for the other forms
    if-goto evalquote
    // unknown builtin, return err
    goto sys.end

The implementation of quote is very simple so let's start with that. It returns the first argument, unevaluated!

// (quote arg)
label evalquote
    push local 0
    cdr
    car
    return

The other forms are a bit more complicated than that. Next we go over the implementation of each of them, starting by adding to the eval function and changing our program from there until we have a full (if barebones) LISP implementation.

If

The if expression takes three arguments, of which the last one is optional. They are test, conseq and alt. We start by evaluating only test, and depending on the outcome will evaluate either conseq or alt, but not both.

// (if test conseq alt)
label evalif
    push argument
    car                     // env on the stack
    push local 0
    cdr
    car                     // test
    push constant 8192      // 0x2000 = emptylist
    cons
    cons
    call eval.eval
    push constant 0
    equal
    if-goto evalalt
// label evalconseq
    push argument
    car                     // env on the stack
    push local 0
    cdr
    cdr
    car                     // conseq
    push constant 8192      // 0x2000 = emptylist
    cons
    cons
    call eval.eval
    return
label evalalt
    // TODO: if no alt, return 'false'
    push argument
    car                     // env on the stack
    push local 0
    cdr
    cdr
    cdr
    car                     // alt
    push constant 8192      // 0x2000 = emptylist
    cons
    cons
    call eval.eval
    return

This was a bit more work than quote, but still possible with the instruction set we have now. For the next special forms we will have to extend our implementation slightly.

Define

The define expression takes two arguments, essentially a key and a value to store in the environment. We expect the key to be a symbol and only evaluate the second argument to obtain the value to store. Modifying the environment is new: we need to make sure the environment stays enriched outside of the scope of the current call to eval. We have defined the semantics of our stack-based vm such that by default the environment of the caller of a function is restored after returning from that function. This means you can not modify the environment at all. Define is the main way to break that assumption, so we will have to make some changes.

The environment is implemented as an assoc list on the heap. Adding to the environment is easy: we can just cons an extra pair on top. This gives us a new pointer to an extended list, which is the new environment. The problem is that we want to change this environment for everyone, so even captured references to it need to be changed (more on those later). The idea is to modify the contents of the pointer on the heap, without losing any data. If we duplicate the original environment, meaning we obtain a new reference to the same data, we can cons onto that to obtain our new env. Lastly we copy the car and cdr of the new env into the address pointed at by the original env. This modifies the contents of the pointer so that all references to the original env now point to the enhanced one. We implement this functionality as a new vm-level instruction: copy-pointer:

    @SP
    AM=M-1
    D=M
    @R7                     // dest
    M=D
    @SP
    AM=M-1
    D=M
    @R8                     // src
    AM=D
    MCDR
    @R7
    A=M
    SETCDR
    @R8
    A=M
    MCAR
    @R7
    A=M
    SETCAR

We are now ready to implement define. Note that this implementation leaves a duplicate unused cons cell on the heap; at some point when garbage collection is implemented this should be cleaned up automatically.

// (define symbol body)
label evaldefine
    // TODO: this just adds, doesnt check if already exists in env
    // meaning currently the assoc list could have duplicate keys
    push local 0
    cdr
    car                     // symbol
    // TODO: if not symbol, error!
    push argument
    car                     // env on the stack
    push local 0
    cdr
    cdr
    car                     // exp
    push constant 8192      // 0x2000 = emptylist
    cons
    cons
    call eval.eval
    cons                    // (symbol . (eval exp)) = entry
    // we need to modify the env pointer, because ENV will be restored
    // to whatever the caller's ENV was
    // first we duplicate existing env pointer on heap
    // then we repoint original pointer to ( entry . duplicatepointer )
    push environment
    car
    push environment
    cdr
    cons                    // allocate another cons cell to current env
    cons                    // ( entry . duplicatepointer )
    // TODO: difference between caller ENV and argument env ?
    push environment        // original env
    copy-pointer
    // cleanup duplicate cons cell on the heap should happen due to GC at some point
    push constant 0
    return

Lambda

The hard part when implementing lambda is not the immediate eval, but what to do when the result is evaluated again.

label evallambda
    // (lambda (params ...) body)
    // assumption: params is a list of symbols
    // TODO: typecheck param arg and check length of args=2
    push environment
    push local 0
    cons
    userdefined
    return

Following Norvig's Lispy implementation, we store a user-defined procedure as a triple of env, params and body. The parameters are assumed to be sent as a list of symbols, the body is one or more expressions. What we want to store matches almost exactly what is passed to the lambda expression; the only thing missing is to prepend the environment in which it is being evaluated. This triple is then tagged with the type prefix for user-defined procedures using userdefined.

The big change is in the code for evaluating procedure application. Previously this was a relatively small part of eval:
Apart from special forms (which we are discussing here), we only supported calls to builtin procedures with some sort of goto. What was missing is application of user-defined procedures. We'll extract this part of eval into a builtin apply function, which we then get for free.

function apply
    push constant 0     // prepare local 0 = f
    push constant 0     // prepare local 1 = f.params
    push constant 0     // prepare local 2 = args
    push constant 0     // prepare local 3 = numargs
    push argument
    car
    is-builtin
    if-goto callbuiltin

Apply starts with a switch: are we evaluating a builtin or user-defined procedure?

// label userdefined
    // remove mask
    push argument
    car
    push constant 8191  // 0x1fff
    and
    pop local 0
    // zip f.params with args
    push local 0
    cdr
    car                 // f.params
    pop local 1
    push argument
    cdr
    car                 // args
    pop local 2

User-defined procedures are a triple that we need to unpack. To apply, we first zip the parameters (list of symbols) and the arguments (list of expressions) so that we obtain a list of pairs, an assoc list just like the environment:

label applyrec
    // TODO: check if len(params) == len(args)
    push local 1
    is-emptylist
    if-goto endapplyrec
    push local 3
    push constant 1
    add
    pop local 3         // numargs++
    push local 1
    car
    push local 2
    car
    cons
    push local 1
    cdr
    pop local 1
    push local 2
    cdr
    pop local 2
    goto applyrec

This is then added to the env stored in the procedure when it was created using lambda. The new extended environment binds the parameters to the values that the function was called with. What's left is to evaluate the body of the procedure in this new environment, like so:

label endapplyrec
    // cons on top of f.env -> newenv
    push local 0
    car                 // f.env
label applyconsloop
    push constant 0
    push local 3
    eq
    if-goto applyend
    cons
    push local 3
    push constant 1
    sub
    pop local 3
    goto applyconsloop
label applyend
    // call eval.eval of newenv and f.body
    push local 0
    cdr
    cdr
    car                 // f.body
    push constant 8192  // 0x2000 = emptylist
    cons
    cons
    call eval.eval
    return

And finally, here is the evaluation of builtin, mostly unchanged:

label callbuiltin
    push argument
    cdr
    car
    push argument
    car
    call-builtin
    return

Set!

TODO: set! is define but with replace

A note on type errors

So far we haven't really handled errors. The best we have done so far is to quit the program early if a type error was detected at vm-level. But our types are represented at a level where even the ALU should be able to output a signal to indicate type errors, for example when trying to compare two values using EQL that have different types, or when calling MCAR on something that is not a pair.