Datalog in miniKanren

Having access to an embedded logical programming language makes some tasks really easy. One prerequisite for RealTalk is some form of Datalog, and I built one in Scheme using miniKanren so that I had access to all of the internals. This page explains the naive Datalog implementation I did before modifying some of it to fit my version of Dynamicland.

As a typical example we can look at a directed graph with five vertices labeled a through e. Using Datalog, we introduce these vertices as separate records, establish facts about directed edges between them, and then introduce rules about what it means for a vertex to be reachable from another vertex. In the end we manually trigger fixpoint analysis and run a query against the resulting database of initial and derived facts.

(define dl (make-new-datalog)) (define a (dl-record! dl 'vertex)) (define b (dl-record! dl 'vertex)) (define c (dl-record! dl 'vertex)) (define d (dl-record! dl 'vertex)) (define e (dl-record! dl 'vertex)) (define (dl-edge x y) (dl-assert! dl x 'edge y)) (dl-edge a c) (dl-edge b a) (dl-edge b d) (dl-edge c d) (dl-edge d a) (dl-edge d e) (dl-rule! dl (reachable ,?x ,?y) :- (edge ,?x ,?y)) (dl-rule! dl (reachable ,?x ,?y) :- (edge ,?x ,?z) (reachable ,?z ,?y)) (dl-fixpoint! dl) (define query (dl-find dl ,?id where (,?id reachable ,?id)))

This is roughly the syntax that I want to go for, minus some of the commas (or unquotes) that I did not get rid of. Now that we have a goal, let's jump into the implementation! We can define a Datalog instance as a record with the following fields:

(define-record-type (make-datalog edb idb rdb idx-entity idx-attr counter) datalog? (edb datalog-edb) (idb datalog-idb set-datalog-idb!) (rdb datalog-rdb set-datalog-rdb!) (idx-entity datalog-idx-entity) (idx-attr datalog-idx-attr) (counter datalog-counter set-datalog-counter!))

Most of these are internal indices, each a different hash-table. Facts are always a 3-tuple of entity-ID, attribute-name and value. Whenever we assert a fact we not only update the main entity table (edb), we also update two indices that index facts by entity (idx-entity) and attribute (idx-attr) respectively.

(define (dl-assert! dl entity attr value) (hashtable-set! (datalog-edb dl) (list entity attr value) #t) (dl-update-indices! dl (list entity attr value))) (define (dl-update-indices! dl tuple) (let ((entity (car tuple)) (attr (cadr tuple)) (idx-entity (datalog-idx-entity dl)) (idx-attr (datalog-idx-attr dl))) (let ((m (hashtable-ref idx-entity entity #f))) (if m (hashtable-set! m tuple #t) (let ((new (make-hashtable))) (hashtable-set! idx-entity entity new) (hashtable-set! new tuple #t)))) (let ((m (hashtable-ref idx-attr attr #f))) (if m (hashtable-set! m tuple #t) (let ((new (make-hashtable))) (hashtable-set! idx-attr attr new) (hashtable-set! new tuple #t))))))

The dl-record! macro lets us introduce one or more facts in an easy way:

(define-syntax dl-record! (syntax-rules () ((_ dl type (attr value) ...) (let ((id (dl-next-id! dl))) (dl-assert! dl id (list type attr) value) ... id))))

So far so good. Adding facts to hash-tables is maybe not the most exciting part. Next we are going to look at rules, and this will get a lot more interesting! Applying a rule is the same thing a running a query against the database. In our naive Datalog, running fixpoint amounts to applying each rule, introducing each newly found fact into our set of derived facts, and repeat that process until no more new facts are being introduced. We'll start by looking at fixpoint, and come back to rules later as they are the most complicated part of this implementation.

(define (dl-assert-rule! dl rule) (hashtable-set! (datalog-rdb dl) rule #t)) (define (dl-apply-rule dl rule) (dl-find rule)) (define (dl-find goal) (runf* goal)) (define (dl-fixpoint! dl) (set-datalog-idb! dl (make-hashtable)) (dl-fixpoint-iterate dl)) (define (dl-fixpoint-iterate dl) (let* ((facts (map (lambda (rule) (dl-apply-rule dl rule)) (hashtable-keys (datalog-rdb dl)))) (factset (foldl (lambda (x y) (set-extend! y x)) facts (make-hashtable))) (new (hashtable-keys (set-difference factset (datalog-idb dl))))) (set-extend! (datalog-idb dl) new) (for-each (lambda (fact) (dl-update-indices! dl fact)) new) (if (not (null? new)) (dl-fixpoint-iterate dl))))
As you may have guessed, we will use miniKanren to run queries. That dl-find function calls runf*, which is the same as the run* macro from miniKanren but written as a normal function. This will make it a lot easier to use it within another macro. Similarly, the fresh macro is rewritten as a (slightly less ergonomic) function that now needs the amount of fresh vars as an argument.

(define (fresh-vars count kont) (lambda (s/c) (let loop ((n count) (vars '()) (st s/c)) (if (zero? n) ((apply kont (reverse vars)) st) (let ((c (cdr st))) (loop (- n 1) (cons (var c) vars) `(,(car s/c) . ,(+ c 1))) ))))) (define (runf* goal) (mK-reify (take-all (call/empty-state goal))))

So now we can run miniKanren as follows, not using any macros:

> (runf* (fresh-vars 1 (lambda (x) (disj (equalo x 5) (equalo x 6))))) > (5 6)

Now that that is done we can rewrite a rule somewhat like this. There is a lot of work needed to actually make that work, but lets start with the rough idea:

(dl_rule! (reachable ,?x ,?y) :- (edge ,?x ,?z) (reachable ,?z ,?y)) (dl-assert-rule! dl (fresh-vars 2 (lambda (q x y) (conj (equalo q `(reachable ,?x ,?y)) (dl-findo dl ((edge ,?x ,?z) (reachable ,?z ,?y))) ))))))))) ; once more, with problem areas highlighted: (dl-assert-rule! dl (fresh-vars [numvars] (lambda (q [vars]) (conj (equalo q [head]) (dl-findo dl [body]) )))))))))

The dl-rule! macro turns the rule into a function that when ran will try to match all of the body clauses against the database and returns the head with bound variables if it succeeds. The dl-findo function implements an extra-logical predicate that follows a pretty typical Datalog optimisation. If we try to match a tuple (entity attr value) against the database, we can limit the amount of potential matches if the entity or the attribute are already known.

(define-syntax dl-findo (syntax-rules () ((_ dl (m ...)) (conj+ (dl-findo_ dl `m) ... )))) (define (dl-findo_ dl m) (fresh (x y entity attr db) (conso entity x m) (conso attr y x) (conde [(boundo entity) (lookupo (datalog-idx-entity dl) entity db) (membero m db)] [(unboundo entity) (boundo attr) (lookupo (datalog-idx-attr dl) attr db) (membero m db)] ))) (define (boundo v) (lambda (s/c) (if (var? v) (let ((x (walk v (car s/c)))) (if (var? x) mzero (unit s/c))) (unit s/c)))) (define (unboundo v) (lambda (s/c) (if (var? v) (let ((x (walk v (car s/c)))) (if (var? x) (unit s/c) mzero)) mzero))) (define (lookupo m key value) (lambda (s/c) (let* ((k (if (var? key) (walk key (car s/c)) key)) (v (hashtable-ref m k #f))) (if v ((equalo value (hashtable-keys v)) s/c) mzero))))

The most problematic part of the dl-rule! macro is figuring out all of the variable scoping and hygiene. Using syntax-case ended up being the closest I could get to what I wanted. Note that right now the database (dl) is captured in a closure, so we do not have to specify it again when running a rule.

(define-syntax dl-rule! (lambda (stx) (define (symbol-with-question-mark? s) (and (symbol? s) (let ((str (symbol->string s))) (and (positive? (string-length str)) (char=? (string-ref str 0) #\?))))) (define (collect-vars datum) (cond [(symbol? datum) (if (symbol-with-question-mark? datum) (list datum) '())] [(pair? datum) (append (collect-vars (car datum)) (collect-vars (cdr datum)))] [else '()])) (define (remove-duplicates syms) (define seen '()) (define (unique s) (let ((d (syntax->datum s))) (if (member d seen) #f (begin (set! seen (cons d seen)) #t)))) (filter unique syms)) (define (replace-symbols datum sym->gen) (cond [(symbol? datum) (let ((mapped (assoc datum sym->gen))) (if mapped (cdr mapped) (datum->syntax stx datum)))] [(pair? datum) (cons (replace-symbols (car datum) sym->gen) (replace-symbols (cdr datum) sym->gen))] [else (datum->syntax stx datum)])) (syntax-case stx (:-) ((_ dl (head hx hy) :- (body bx by) ...) (let* ((datums (syntax->datum #'((hx head hy) (bx body by) ...))) (head-datum (car datums)) (body-datums (cdr datums)) (vars (remove-duplicates (collect-vars datums))) (numvars (+ 1 (length vars))) (gens (generate-temporaries vars)) (sym->gen (map cons vars gens)) (replaced-head (replace-symbols head-datum sym->gen)) (replaced-body (replace-symbols body-datums sym->gen))) #`(dl-assert-rule! dl (fresh-vars #,numvars (lambda (q #,@gens) (conj (equalo q `#,replaced-head) (dl-findo dl #,replaced-body) )))))))))

Now we are ready to run our example! The only thing I have changed is that I have not implemented the dl-find macro yet, so in order to query I have to write things out a bit more. It would have the same problems as dl-rule! and the same solution, I imagine.

> (define query (dl-find (fresh-vars 1 (lambda (?id) (dl-findo dl ( (,?id reachable ,?id) )))))) > (3 1 4)

This should print (3 1 4) or a permutation thereof, order depending on the hash-table implementation. As a bonus, this page implements and runs all of the above using Guile Scheme and WebAssembly! The result of running all of this should be printed below: