I finished the nand2tetris book a while ago, building my own tooling in Go to simulate a simple computer all the way up from NAND-gates. After that I dove into SICP and went on to get a translation of Peter Norvig's lispy running on the nand2tetris computer. I left the project there to explore the wonderful world of Lisp further. Now it is time to come back with all the things I've learned, and build a nand2tetris-style Lisp machine. In this first post we will go over some ideas on how to modify the Hack computer and build something close to the lispy 'calculator'.
in (16) ➡ |
| ➡ car out (16) | |||||
address (16) ➡ | ➡ cdr out (16) |
This change makes addresses not just pointers into a flat list, but pointers into a list of cons cells (or pairs). Not everything in Lisp is a pair though: we have atoms which can be symbols or primitives such as numbers, and we have procedures that can be builtin or user-defined. This type information is crucial to handle s-expressions at the lowest level. We can take some bits of our 16-bit registers and encode type data for each value in those. This changes the Hack platform into a tagged architecture. We'll use the first 3 bits, leaving us with only 13 to represent data with. The alternative would be to extend the bitsize, but that would be a more drastic overall change.
Prefix | Type | Example | |||
---|---|---|---|---|---|
000 | Pair (0x0 is the empty list) | () | |||
001 | Pair | (+ . (x . 3)) | |||
010 | Primitive | 42 | |||
011 | Symbol | x | |||
101 | Builtin | + | |||
111 | Special builtin | lambda
100 | Userdefined | func
| |
One caveat is that we now have to be very careful not to confuse pointers to pairs with pairs themselves, as it all looks the same in memory. For example, how would we tell the difference between a pointer to pair, and a cons cell starting with a pair? In general, we'll assume that the stack only works with pointers and the heap only contains actual cons cells. To make the allocation of a cons cell with empty cdr easier, we represent empty list as 0x0.
Using this data representation we can build Lisp-specific machine instructions. My approach is to extend the ALU to recognise these new instructions and execute them without breaking the existing Hack ones. This way we can reuse most of the nand2tetris infrastructure to test our machine!
c | free | a | comp | dest | jump | instruction | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | v | v | v | v | v | v | v | v | v | v | v | v | v | v | v | A-instruction: @value |
1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | C-instruction: dest=comp;jump. In this case D=M |
You can see two bits are completely unused, or free. We will call them b[1] and b[2], or useALU and writeCdrM respectively. The following modifications leave the Hack instructions unaltered, meaning the assembler will also still be valid.
This leaves at least 7 bits ( a + comp ) to switch between different instructions. Even if we don't find an elegant way like the ALU to encode them together, we can make an ugly switch using Mux-gates. Here's the ones I've found I needed to make eval work at the virtual machine level; their implementation in gates is left to the reader :)
Command | Instruction | Example |
---|---|---|
SETCAR | 1111001100001000 | shorthand for M=D, not a new instruction |
SETCDR | 1010111111000000 | writes D to CDR register |
MCAR | 1111110000010000 | shorthand for D=M, not a new instruction. Maybe when we support typechecking. |
MCDR | 1000011111010000 | writes value of CDR register into D |
EQLM | 1000010111010000 | sets D to 0xffff if M == D, otherwise 0x0000 |
ISSYMB | 1000001011010000 | checks tags of M, setting D to true or false the same way as EQLM |
ISPRIM | 1000001010010000 | similar but checking different prefix |
ISPROC | 1000001101010000 | checks first bit set to 1 |
ISEMPTY | 1000001001010000 | all 4 of these reuse the same gates; prefix can be found in the instruction b[7:9] |
EMPTYCDR | 1000000001010000 | like the above but checks the CDR register instead |
At its most basic, the environment is a mapping of symbols to s-expressions.
Norvig's lispy uses a dictionary; in my own Golang-based Lisp interpreter I do the same thing.
But we don't have to implement hashmaps: we already have lists!
We can use association lists, with the key being a symbol and the value an s-expression.
However, this will be slow.
(ASSQ) // assume @R11 = K (key) and @R12 = P (pointer to assoc list) @R12 A=M A=M D=M @R11 EQLM @ASSQCONTINUE D;JEQ // here K == D ! @R12 A=M A=M MCDR @0x6002 // I've used this address to write to tape output M=D @END 0;JMP (ASSQCONTINUE) @R12 A=M EMPTYCDR @FAILTOFIND D;JNE @R12 A=M MCDR @R12 M=D @ASSQ 0;JMP (FAILTOFIND) @END 0;JMPThat's pretty compact!
I've made a few modifications to the Hack virtual machine to better represent a functional language and get a grasp of the differences. First of all, we will do away with local variables and force all functions to take exactly one argument. Local vars are not needed because we will use the environment, and since we have list we can pass a variadic amount of args by just passing a list pointer. Local variables will make an appearance in eval but will refer to the first n items on the function's working stack. Most functions are expected to not have to use this at all.
Instead of LCL we store a pointer to ENV, a pointer to the current environment we are evaluating in. The THIS and THAT registers are also no longer needed, just like pointer and temp. This simplifies things quite a bit: calling and returning from functions has less overhead.
ARG ➡ | argument |
return address | |
saved ENV | |
saved ARG | |
SP ➡ | working stack |
Register | Name | Usage |
---|---|---|
RAM[0] | NIL | indicates failure |
RAM[1] | SP | stack pointer |
RAM[2] | ENV | environment pointer |
RAM[3] | ARG | argument pointer |
RAM[4] | FREE | starts at 2048, base of heap |
@SP AM=M-1 D=M @FREE A=M SETCDR @SP A=M-1 D=M @FREE A=M SETCAR @FREE D=M M=D+1 @SP A=M-1 M=D
function eval(env, e): if e is a procedure: return e // e is guaranteed to be an expression if e is an atom: if e is a symbol: return env.lookup(e) // e is guaranteed to be a primitive return e // e is guaranteed to be a pair f, args = e[0], e[1:] // unpack the list e f = eval(env, f) args = map(eval, args) return f(args)This means we do not support any keywords like lambda or define yet. In fact, the only function we will support is addition, represented by the symbol +. Since all our functions take only a single argument, we will have to pass in the list (env e). This is then manipulated on the stack using car and cdr. Let's start writing the eval function in VM language:
// (eval env e) -> evaluation of e in env, or NIL if error // NOTE env is an argument, not ENV, which is a reference to the evaluating environment. // TODO: if first argument is not an env assoc list (how?), use *ENV instead, ie support (eval e) function eval push constant 0 // prepare local 0 = e push constant 0 // prepare local 1 = evaluated procedure push constant 0 // prepare local 2 = num args push argument cdr car pop local 0 // store e in lcl0 push local 0 is-procedure if-goto evalself push local 0 is-symbol if-goto evalsymbol push local 0 is-primitive if-goto evalself // guaranteed to be a pair!The start of our evaluating function is doing most of the type-based branching. Note that we do use local variables here which we have to prepare explicitly by pushing zeroes on the working stack. Most other functions will use the env to set local variables, but we will need define first for that.
We enter the more complicated evalpair before others because of the fallthrough in the above type switch. All it does here is to evaluate the car of e, which is expected to yield a procedure. If it does, we jump to evalargs which is explained later. This is the first time we see how to call the eval function from the VM: pushing arguments on the stack and cons-ing them into one list pointer argument.
label evalpair push argument car // env on the stack push local 0 car push constant 8192 // 0x2000 = emptylist cons cons call eval.eval pop local 1 push local 1 // evalled procedure is-procedure if-goto evalargs // attempt to apply non-procedure push constant 0 returnProcedures and primitives evaluate to themselves. Symbols we look up in the environment using ASSQ, or in this case its VM translation assoc.assq. The latter takes the list (env e) as its argument so we can just pass the full argument to eval.
label evalself push local 0 return label evalsymbol push argument call assoc.assq returnHere is where things get really interesting. Before calling the procedure, we need to evaluate the arguments that are being passed in. We can traverse the input list until we hit emptylist marking the end, call eval on each element, and leave the results on the stack. In order to cons the correct amount of times at the end, we'll keep numargs in a separate local var.
label evalargs // we will need to build ( evaluatedargs ... ) // on the stack, then call the evaluated func with those args push local 0 cdr // if emptylist, we ran out of args is-emptylist if-goto evalprocedure push local 2 push constant 1 add pop local 2 // numargs++ push local 0 cdr pop local 0 push argument car // env on the stack push local 0 car push constant 8192 // 0x2000 = emptylist cons cons call eval.eval // returns evalled arg, or NIL if error goto evalargsAt this point we should have all arguments evaluated and waiting on the stack. The procedure to call is stored in local variable 1. Here we create the list of arguments by calling cons numargs times after pushing the end marker.
label evalprocedure push constant 8192 // 0x2000 = emptylist // cons numargs times label consloop push constant 0 push local 2 eq if-goto call cons push local 2 push constant 1 sub pop local 2 goto consloop // actually call the function stored in local 1 label call // TODO call eval.add returnFor this first version we will leave this at a hardcoded goto, but the procedure could either be referencing a builtin one, a user-defined one (using lambda), or a special one like lambda, define or if, each codifying different jumps around instruction memory.
The actual implementation of add is pretty straightforward, but we have to remember to mask the result to keep the type tag intact. Probably a lot faster if hand-rolled in assembly, but then we'd have to think about how to jump to it correctly.
// TODO: make variadic, just have to check emptylist in a loop function add push argument car push argument cdr car add // NOTE: primitives are prefixed with 010 push constant 8191 // 0x1fff and push constant 16384 // 0x4000 or return
function main push constant 100 // pointer to ENV, set in sys.vm push constant 16426 // primitive(42) push constant 8192 // 0x2000 = emptylist cons cons call eval.eval write // expect 0x402a, the number 42 push constant 100 // pointer to ENV, set in sys.vm push constant 24578 // symbol(2) representing 'x', bound to primitive(7) in ENV push constant 8192 // 0x2000 = emptylist cons cons call eval.eval write // expect 0x4007, the number 7 push constant 100 // pointer to ENV, set in sys.vm push constant 24581 // symbol(5) representing '+', bound to builtin(0) in ENV push constant 24578 // symbol(2) representing 'x', bound to primitive(7) in ENV push constant 16419 // primitive(35) push constant 8192 // 0x2000 = emptylist cons cons cons push constant 8192 // 0x2000 = emptylist cons cons call eval.eval // evaluate (+ x 35) write // expect 0x402a, the number 42 push constant 0 // void return returnThe compiler uses a symbol table, storing string->symbol lookups. It simply keeps a counter and adds a key-value pair when it sees a symbol it has never seen before. I've added a few initial symbols by hand; this table will become more useful once we implement define.
The book includes a short postscript with ideas on how to extend the project. One of those is to replace Jack with a Scheme. In this series of posts I will take that idea a bit further and build a lisp machine with minimal modifications to the nand2tetris computer architecture and its Hack machine language. It assumes familiarity with the book, as if it is just another chapter at the end. The aim is to have a Lisp eval loop as close to the metal as possible and understand the benefits of running Lisp as both operating system and application.
Next time we will implement some special procedures like if that do not necessarily evaluate all their arguments, properly switch between those and builtin procedures, and add type errors on the chip level.
> part 2: Full Lispy calculator