Nand2Lisp

V1: From Hack to Lisp machine

I finished the nand2tetris book a while ago, building my own tooling in Go to simulate a simple computer all the way up from NAND gates. After that I dove into SICP and went on to get a translation of Peter Norvig's lispy running on the nand2tetris computer. I left the project there to explore the wonderful world of Lisp further. Now it's time to come back with everything I've learned and build a nand2tetris-style Lisp machine. In this first post we'll go over some ideas for modifying the Hack computer, and build something close to the lispy 'calculator'.

Computer Architecture and the LISP language

The core data structure in Lisp is the cons cell, representing a pair of values. Much of the time spent evaluating s-expressions goes into traversing lists via the car and cdr operations. The easiest way to bring this close to something the machine language can execute efficiently is to add another RAM16K chip next to the main memory in the Hack computer, addressed in the same way and at the same time. Each cycle we fetch data from both RAM chips and feed it into the ALU. The first RAM chip represents the car and the second the cdr of each location. We'll implement this using a builtin RAM32K chip, of which the second half will go unused (in fact, only the heap will use the cdr part, since that's where we allocate cons cells). This sketch is based on the Hack memory (the second RAM and the cdr output are the additions):

                  ┌─────────────────────┬──────────────────┐
     in (16) ──►  │ Data Memory         │ Added RAM (32K)  │
                  │   RAM (16K)         │                  │
                  │   Screen            │                  │
                  │   memory map (8K)   │                  │
                  │   Keyboard          │                  │
                  │   memory map        │                  │
address (16) ──►  └─────────┬───────────┴────────┬─────────┘
                            ▼                    ▼
                      car out (16)         cdr out (16)

This change makes addresses not just pointers into a flat array, but pointers into a list of cons cells (or pairs). Not everything in Lisp is a pair though: we have atoms, which can be symbols or primitives such as numbers, and we have procedures, which can be builtin or user-defined. This type information is crucial for handling s-expressions at the lowest level. We can reserve a few bits of our 16-bit words to encode the type of each value, which turns Hack into a tagged architecture. We'll use the first 3 bits, leaving only 13 to represent data with. The alternative would be to widen the words, but that would be a more drastic change overall.

Prefix  Type                          Example
000     Pair (0x0 is the empty list)  ()
001     Pair                          (+ . (x . 3))
010     Primitive                     42
011     Symbol                        x
101     Builtin                       +
111     Special builtin               lambda
100     User-defined                  func

The first 3 bits indicate ISPROCEDURE, ISATOM and ISSYMBOL respectively; their negations are ISEXPRESSION, ISPAIR and ISPRIMITIVE. Note that the 16-bit number representation in Hack gets some features for free that we lose this way unless we modify the ALU further, the main one being negative numbers in 2's complement. In return we get extremely low-level type checking, which I find more interesting to play with.

One caveat is that we now have to be very careful not to confuse pointers to pairs with pairs themselves, as they look the same in memory. For example, how would we tell the difference between a pointer to a pair and a cons cell starting with a pair? In general, we'll assume that the stack only holds pointers and the heap only holds actual cons cells. To make allocating a cons cell with an empty cdr easier, we represent the empty list as 0x0.

Using this data representation we can build Lisp-specific machine instructions. My approach is to extend the ALU to recognise these new instructions and execute them without breaking the existing Hack ones. This way we can reuse most of the nand2tetris infrastructure to test our machine!

Machine Language

Here's a refresher on the original Hack instructions and how a 16-bit word maps to functionality:

c  free  a  comp    dest  jump  instruction
0  vv    v  vvvvvv  vvv   vvv   A-instruction: @value
1  11    1  110000  010   000   C-instruction: dest=comp;jump. In this case D=M

You can see two bits are completely unused, or free. We will call them b[1] and b[2], or useALU and writeCdrM respectively. The following modifications leave the Hack instructions unaltered, meaning the assembler will also still be valid.

Since the original Hack platform did not have this second RAM, the ALU does not know about it; just to be safe, we will never write to CDR from the ALU.

This leaves at least 7 bits (a + comp) to switch between different instructions. Even if we don't find an elegant encoding like the ALU's, we can make an ugly switch using Mux gates. Here are the ones I found I needed to make eval work at the virtual machine level; their implementation in gates is left to the reader :)

Command   Instruction       Description
SETCAR    1111001100001000  shorthand for M=D, not a new instruction
SETCDR    1010111111000000  writes D to the CDR register
MCAR      1111110000010000  shorthand for D=M, not a new instruction (maybe it becomes one once we support typechecking)
MCDR      1000011111010000  writes the value of the CDR register into D
EQLM      1000010111010000  sets D to 0xffff if M == D, otherwise 0x0000
ISSYMB    1000001011010000  checks the tag of M, setting D to true or false the same way as EQLM
ISPRIM    1000001010010000  similar, but checking a different prefix
ISPROC    1000001101010000  checks that the first bit is set to 1
ISEMPTY   1000001001010000  all four of these reuse the same gates; the prefix is found in instruction bits b[7:9]
EMPTYCDR  1000000001010000  like the above, but checks the CDR register instead

Assembler

The changes to assembler, virtual machine and compiler are not that interesting to work out in more detail here. Instead let's have a look at a program using these instructions, hand-written in extended Hack assembly. We will build up to the eval function, one that can evaluate any s-expression. To do so we need one very important component that is not yet reflected in our types: the environment.

At its most basic, the environment is a mapping from symbols to s-expressions. Norvig's lispy uses a dictionary, and my own Golang-based Lisp interpreter does the same. But we don't have to implement hashmaps: we already have lists!
We can use association lists, with each key a symbol and each value an s-expression. This will be slow, however.

(ASSQ)
    // assume @R11 = K (key) and @R12 = P (pointer to assoc list)
    @R12
    A=M
    A=M
    D=M
    @R11
    EQLM
    @ASSQCONTINUE
    D;JEQ
    // falling through means the key matched K
    @R12
    A=M
    A=M
    MCDR
    @0x6002     // I've used this address to write to tape output
    M=D
    @END
    0;JMP
(ASSQCONTINUE)
    @R12
    A=M
    EMPTYCDR
    @FAILTOFIND
    D;JNE
    @R12
    A=M
    MCDR
    @R12
    M=D
    @ASSQ
    0;JMP
(FAILTOFIND)
    @END
    0;JMP
That's pretty compact!

Virtual Machine

We could write out eval in much the same way, but at some point the stack abstraction speeds things up a lot. I wrote all of eval for the virtual stack machine first, and only translated it to assembly later for performance reasons. It is also a lot easier to follow along this way.

I've made a few modifications to the Hack virtual machine to better represent a functional language and get a grasp of the differences. First of all, we do away with local variables and force all functions to take exactly one argument. Local variables are not needed because we will use the environment, and since we have lists we can pass any number of arguments by passing a single list pointer. Local variables will make an appearance in eval, where they refer to the first n items on the function's working stack; most functions are expected not to need this at all.

Instead of LCL we store ENV, a pointer to the current environment we are evaluating in. The THIS and THAT registers are no longer needed, and neither are pointer and temp. This simplifies things quite a bit: calling and returning from functions has less overhead.

ARG ➡  argument
       return address
       saved ENV
       saved ARG
SP ➡   working stack

CONS and memory allocation

We want cons to be supported at the VM level: it consumes the top two items on the stack and creates a new cons cell on the heap. If we push 4, then 7, then call cons, the result should be the cons cell (4 . 7).
To do so we must support some form of alloc at this abstraction level. The solution for now is to keep a global variable FREE in memory, pointing at the next unallocated heap cell. The first few registers now look like:

Register  Name  Usage
RAM[0]    NIL   indicates failure
RAM[1]    SP    stack pointer
RAM[2]    ENV   environment pointer
RAM[3]    ARG   argument pointer
RAM[4]    FREE  starts at 2048, the base of the heap

The cons VM instruction can now be translated like this:
    @SP
    AM=M-1      // pop: SP--, A = old top of stack
    D=M         // D = last pushed value (the cdr, 7 in the example)
    @FREE
    A=M
    SETCDR      // write D into the cdr of the free cell
    @SP
    A=M-1
    D=M         // D = value below it (the car, 4 in the example)
    @FREE
    A=M
    SETCAR      // write D into the car of the free cell
    @FREE
    D=M         // D = address of the new cell
    M=D+1       // bump FREE to the next cell
    @SP
    A=M-1
    M=D         // replace top of stack with pointer to the new cell

Eval

We have all the tools to start building eval. Here's a pseudocode version of the evaluation function we are going to build:
function eval(env, e):
    if e is a procedure:
        return e
    // e is guaranteed to be an expression
    if e is an atom:
        if e is a symbol:
            return env.lookup(e)
        // e is guaranteed to be a primitive
        return e
    // e is guaranteed to be a pair
    f, args = e[0], e[1:]     // unpack the list e
    f = eval(env, f)
    args = map(eval, args)
    return f(args)
This means we do not support any keywords like lambda or define yet. In fact, the only function we will support is addition, represented by the symbol +. Since all our functions take only a single argument, we will have to pass in the list (env e). This is then manipulated on the stack using car and cdr. Let's start writing the eval function in VM language:
// (eval env e) -> evaluation of e in env, or NIL if error
// NOTE env is an argument, not ENV, which is a reference to the evaluating environment.
// TODO: if first argument is not an env assoc list (how?), use *ENV instead, ie support (eval e)
function eval
    push constant 0     // prepare local 0 = e
    push constant 0     // prepare local 1 = evaluated procedure
    push constant 0     // prepare local 2 = num args
    push argument
    cdr
    car
    pop local 0         // store e in lcl0
    push local 0
    is-procedure
    if-goto evalself
    push local 0
    is-symbol
    if-goto evalsymbol
    push local 0
    is-primitive
    if-goto evalself
    // guaranteed to be a pair!
The start of our evaluating function does most of the type-based branching. Note that we do use local variables here, which we have to prepare explicitly by pushing zeroes onto the working stack. Most other functions will use the env to set local variables instead, but for that we will need define first.

We enter the more complicated evalpair before the others because of the fallthrough in the type switch above. All it does at this point is evaluate the car of e, which is expected to yield a procedure. If it does, we jump to evalargs, which is explained later. This is also the first time we see how to call the eval function from the VM: push the arguments on the stack and cons them into a single list-pointer argument.

label evalpair
    push argument
    car                 // env on the stack
    push local 0
    car
    push constant 8192  // 0x2000 = emptylist
    cons
    cons
    call eval.eval
    pop local 1
    push local 1        // evalled procedure
    is-procedure
    if-goto evalargs
    // attempt to apply non-procedure
    push constant 0
    return
Procedures and primitives evaluate to themselves. Symbols we look up in the environment using ASSQ, or in this case its VM translation assoc.assq. The latter takes the list (env e) as its argument so we can just pass the full argument to eval.
label evalself
    push local 0
    return
label evalsymbol
    push argument
    call assoc.assq
    return
Here is where things get really interesting. Before calling the procedure, we need to evaluate the arguments being passed in. We traverse the input list until we hit emptylist marking the end, call eval on each element, and leave the results on the stack. In order to cons the correct number of times at the end, we keep numargs in a separate local variable.
Unfortunately, because cons only consumes the top of the stack, we can only build the argument list once everything is evaluated; otherwise this counter would not be needed.
label evalargs
    // we will need to build ( evaluatedargs ... )
    // on the stack, then call the evaluated func with those args
    push local 0
    cdr
    // if emptylist, we ran out of args
    is-emptylist
    if-goto evalprocedure
    push local 2
    push constant 1
    add
    pop local 2         // numargs++
    push local 0
    cdr
    pop local 0
    push argument
    car                 // env on the stack
    push local 0
    car
    push constant 8192  // 0x2000 = emptylist
    cons
    cons
    call eval.eval      // returns evalled arg, or NIL if error
    goto evalargs
At this point we should have all arguments evaluated and waiting on the stack. The procedure to call is stored in local variable 1. Here we create the list of arguments by calling cons numargs times after pushing the end marker.
label evalprocedure
    push constant 8192  // 0x2000 = emptylist
// cons numargs times
label consloop
    push constant 0
    push local 2
    eq
    if-goto call
    cons
    push local 2
    push constant 1
    sub
    pop local 2
    goto consloop
// actually call the function stored in local 1
label call
    // TODO
    call eval.add
    return
For this first version we leave this as a hardcoded call, but the procedure could reference a builtin, a user-defined function (created with lambda), or a special form like lambda, define or if, each codifying different jumps around instruction memory.

The actual implementation of add is pretty straightforward, but we have to remember to mask the result to keep the type tag intact. It would probably be a lot faster hand-rolled in assembly, but then we'd have to think about how to jump to it correctly.

// TODO: make variadic, just have to check emptylist in a loop
function add
    push argument
    car
    push argument
    cdr
    car
    add
    // NOTE: primitives are prefixed with 010
    push constant 8191  // 0x1fff
    and
    push constant 16384 // 0x4000
    or
    return

Compiler

The Lisp compiler is mainly a parser that reads s-expressions, then emits VM instructions that write the parsed expressions into memory at startup. The actual environment is still hardcoded in a separate VM file and loaded during init, but the main function below is now generated from a Lisp file.
function main
    push constant 100   // pointer to ENV, set in sys.vm
    push constant 16426 // primitive(42)
    push constant 8192  // 0x2000 = emptylist
    cons
    cons
    call eval.eval
    write               // expect 0x402a, the number 42

    push constant 100   // pointer to ENV, set in sys.vm
    push constant 24578 // symbol(2) representing 'x', bound to primitive(7) in ENV
    push constant 8192  // 0x2000 = emptylist
    cons
    cons
    call eval.eval
    write               // expect 0x4007, the number 7

    push constant 100   // pointer to ENV, set in sys.vm
    push constant 24581 // symbol(5) representing '+', bound to builtin(0) in ENV
    push constant 24578 // symbol(2) representing 'x', bound to primitive(7) in ENV
    push constant 16419 // primitive(35)
    push constant 8192  // 0x2000 = emptylist
    cons
    cons
    cons
    push constant 8192  // 0x2000 = emptylist
    cons
    cons
    call eval.eval      // evaluate (+ x 35)
    write               // expect 0x402a, the number 42

    push constant 0     // void return
    return

The compiler uses a symbol table storing string-to-symbol lookups. It simply keeps a counter and adds a key-value pair whenever it sees a symbol it hasn't seen before. I've added a few initial symbols by hand; this table will become more useful once we implement define.

Background and what's next

The nand2tetris book, or The Elements of Computing Systems by Noam Nisan and Shimon Schocken, describes the multiple levels of a functioning computer. I followed it from configuring gates into chips, through machine language and assembly, to a virtual machine and finally the high-level language Jack. The only real modification I ended up making to the setup was allowing a bitshift operation, because multiplication felt too slow. For this I implemented a barrel-shifter circuit, and plugging it in meant extending all the abstraction layers: the CPU had to support a new instruction bypassing the ALU, the assembler had to recognize that instruction, and if I had wanted to I could have exposed it all the way up into Jack (or its Golang equivalent in my case).

The book includes a short postscript with ideas on how to extend the project. One of those is to replace Jack with a Scheme. In this series of posts I will take that idea a bit further and build a Lisp machine with minimal modifications to the nand2tetris computer architecture and its Hack machine language. It assumes familiarity with the book, as if this were just another chapter at the end. The aim is to have a Lisp eval loop as close to the metal as possible, and to understand the benefits of running Lisp as both operating system and application.

Next time we will implement some special procedures like if that do not necessarily evaluate all their arguments, properly switch between those and builtin procedures, and add type errors on the chip level.

> part 2: Full Lispy calculator