Hash Table Syntax in Common Lisp

I sometimes get caught up in thinking about what a language should do for me that I forget what a language allows me to do. Take, for instance, the hash-table in common lisp. On first glance, it seems a bit unwieldy, because building it up is somewhat annoying for small cases. Of course we can use associative arrays, but that comes with its own annoyance (such as having to specify to use #'equal as the test when looking it up if you key on strings). What I'd really like to have is a new kind of syntax:

{:key1 => "value1", :key2 => "value2"}

It is handy to have this succinct declaration at our disposal, and it's a bit annoying that CL doesn't have it off the bat. And getting caught up in that annoyance I forget that there is little standing in the way of creating it. But first let's start with something simpler:

{:key1 "value1" :key2 "value2"}

Setting up the basic reader macro

Let's turn this into a hash-table. To do so takes two steps: define a reader macro for the { character that splits the list into pairs, then generate code for the hash table after. First, the pair generation.

(set-macro-character #\{
 (lambda (str char)
  (declare (ignore char))                           ; Ignore me some warnings
  (let
   ((*readtable* (copy-readtable *readtable* nil))
    (keep-going t))
   (set-macro-character #\} (lambda (stream char)
                             (declare (ignore char) (ignore stream))
                             (setf keep-going nil)))
   (loop for key = (read str nil nil t)
         while keep-going
         for value = (read str nil nil t)
         collect (list key value)))))

There's a few things going on here. The first is simply defining the reader for {, which gets us started. Then the body of the loop reads a key and a value and keeps on going until we hit the end which is denoted by }. We set that second macro-character on a readtable that is a shadowed copy of the main readtable so as to not mess around with people trying to us } as an every day character. The nice part about this is that the reader will do the right thing and stop reading the previous atom just before the } character comes up. And this will generate pairs:

CL-USER> (read-from-string "{:key1 \"value1\" :key2 \"value2\"}")
((:KEY1 "value1") (:KEY2 "value2"))

So now we'd actually like a hash map to be born out of this, which means taking those pairs and doing something with them. We'll modify the above to save them in a variable which we'll use in a quasiquoted return value:

(let
 ((pairs (loop for key = (read str nil nil t)
               while keep-going
               for value = (read str nil nil t)
               collect (list key value)))
  (retn (gensym)))
`(let
  ((,retn (make-hash-table :test #'equal)))
  ,@(mapcar
     (lambda (pair)
      `(setf (gethash ,(car pair) ,retn) ,(cadr pair)))
     pairs)
  ,retn))

The let block at the end is all that our function will return, and it should create and return a hash-table handily:

CL-USER> (read-from-string "{:key1 \"value1\" :key2 \"value2\"}")
(LET ((#:G742 (MAKE-HASH-TABLE :TEST #'EQUAL)))
  (SETF (GETHASH :KEY1 #:G742) "value1")
  (SETF (GETHASH :KEY2 #:G742) "value2")
  #:G742)

And if we execute it as normal lisp:

CL-USER> {:key1 "value1" :key2  "value2"}
#<HASH-TABLE :TEST EQUAL :COUNT 2 {B21B109}>

CL-USER> (gethash :key1 {:key1 "value1" :key2 "value2"})
"value1"

Adding some syntactic sugar

However, we'd like it to be a bit prettier, so we'll add in some commas and arrows. What we'd really like is what I described above:

{:key1 => "value1", :key2 => "value2"}

All of this is really just syntactic sugar. Because each key and value are atoms in and of themselves, we can stop right here and create basically a new kind of list that should always have an event number of atoms and is surrounded with '{}'. But because we have fallen in love with Ruby, we should go all the way, and read in the arrows and commas, check to make sure they are arrows and commas, and discard them anyway. Unfortunately, commas already exist in the read table, and they'll throw a nasty error if you read them. However, we can't just blow away the #\, character in the readtable like we did with the #\} character because it's often used (we'll just ignore legitimate uses of the #\} character in functions like 'weird-func-name-}', as I've not yet seen a function named that). What we'll need to do is enhance that loop to include two new reads:

((pairs (loop for key = (read str nil nil t)
              for sep = (read str nil nil t)
              for value = (read str nil nil t)
              for end? = (read-separator str)

We'll need to define read-separator to override the comma to return something we can check for:

(defun read-separator (str)
 (let
  ((*readtable* (copy-readtable *readtable* nil)))
  (set-macro-character #\, (lambda (stream char)
                            (declare (ignore char) (ignore stream))
                            'break))
  (read str nil)))

And change the #\} override to return 'end instead of just changing the keep-going variable:

(set-macro-character #\} (lambda (stream char)
                          (declare (ignore char) (ignore stream))
                          'end))

And then add some checks in the loop to make sure the hash-table is looking sane:

do (when (not (eql '=> sep)) (error "Expected =>, did not get"))
do (when (not (or (eql 'end end?) (eql 'break end?))) (error "Expected , or }"))

And so our new function looks like this:

(set-macro-character #\{
 (lambda (str char)
  (declare (ignore char))
  (let
   ((*readtable* (copy-readtable *readtable* nil)))
   (set-macro-character #\} (lambda (stream char)
                             (declare (ignore char) (ignore stream))
                             'end))

   (let
    ((pairs (loop for key = (read str nil nil t)
                  for sep = (read str nil nil t)
                  for value = (read str nil nil t)
                  for end? = (read-separator str)
                  do (when (not (eql '=> sep)) (error "Expected =>, did not get"))
                  do (when (not (or (eql 'end end?) (eql 'break end?))) (error "Expected , or }"))
                  collect (list key value)
                  while (not (eql 'end end?))))
     (retn (gensym)))
    `(let
      ((,retn (make-hash-table :test #'equal)))
      ,@(mapcar
         (lambda (pair)
          `(setf (gethash ,(car pair) ,retn) ,(cadr pair)))
         pairs)
      ,retn)))))

We now have a new syntax. It should even work with weird quasiquote macros like this:

(defmacro hash-for-val (val)
 `{:key1 => ,val})

I don't know if you'd ever need a macro like that, but the comma works just like you'd expect, and indeed if you do run that, the following will work:

(gethash :key1 (hash-for-val "g"))

Adding Pretty Printing

The last part of what we'd like to do here is make it so that when lisp prints out a hash-table, it prints it in the form that we could read back in by pasting. To do this, we need to ensure the pretty printing is on, and set the pretty print dispatch for the hash-table type. This is fairly easy to do, but it's worth noting here that maphash does indeed not work like mapcar (it returns nil), so we'll have to use loop to convert it to a list of pairs so we can print it with format:

(set-pprint-dispatch 'hash-table
 (lambda (str ht)
  (format str "{?????溅玺???誉??蕃?}"
   (loop for key being the hash-keys of ht
         for value being the hash-values of ht
         collect (list key value)))))

After this, this is what you should be able to do on the REPL:

CL-USER> {:test => 9, "foo" => 3}
{:TEST => 9, "foo" => 3}

Of course, this abstraction is a little bit leaky with regard to normal read macros. When you actually do the following:

(read-from-string "{:test => 9, :foo => 3}")

You'll get a large s-expression that would build up the hashmap for you. However, this is not how other reads work. For instance, using the #S reader macro, you'll get actually get out what you put in, as follows:

CL-USER> (defstruct foo a b)
FOO

CL-USER> (read-from-string "#S(FOO :A 3 :B 4)")
#S(FOO :A 3 :B 4)

But what's interesting about this is the following:

CL-USER> (eval (read-from-string "#S(FOO :A N :B M)"))
#S(FOO :A N :B M)

Whereas if you actually type #(FOO :A N :B M) into the REPL, it will attempt to evaluate N and M. When creating my own reader macro, I could either get it to act like #S does from the REPL (by way of creating the code that creates the hashmap), or as it does from the reader (by actually creating the hashmap at read time, and not evaluating any of the symbols). Perhaps there is a way to make it do both, but I could not find it trivially.

In conclusion, I usually forget, when I'm bemoaning the lack of some nice syntax that some other language has, that Common Lisp can become that for me. But there is a danger here. The language I'm now using isn't really Common Lisp anymore, but some dialect that only I know. If I started adding some more extentions like [] for arrays, then suddenly no one will know how to read the code but me. Because I use CL alone, in a vacuum, this actually works out pretty well. However, if you're going to be creating a library for others to consume or read, then maybe you should abstain from too much weirdness. And you should definitely not ever modify the end programmer's reader without their permission, as that will just cause all kinds of problems.

And so we come to the sliding scale: Should we have complete flexibility, or complete specification, or somewhere in between. Personally? I like the flexibility, and that's why I find my home in Common Lisp.

Full Code:

(defun read-separator (str)
 (let
  ((*readtable* (copy-readtable *readtable* nil)))
  (set-macro-character #\, (lambda (stream char)
                            (declare (ignore char) (ignore stream))
                            'break))
  (read str nil)))

(set-macro-character #\{
 (lambda (str char)
  (declare (ignore char))
  (let
   ((*readtable* (copy-readtable *readtable* nil)))
   (set-macro-character #\} (lambda (stream char)
                             (declare (ignore char) (ignore stream))
                             'end))

   (let
    ((pairs (loop for key = (read str nil nil t)
                  for sep = (read str nil nil t)
                  for value = (read str nil nil t)
                  for end? = (read-separator str)
                  do (when (not (eql '=> sep)) (error "Expected =>, did not get"))
                  do (when (not (or (eql 'end end?) (eql 'break end?))) (error "Expected , or }"))
                  collect (list key value)
                  while (not (eql 'end end?))))
     (retn (gensym)))
    `(let
      ((,retn (make-hash-table :test #'equal)))
      ,@(mapcar
         (lambda (pair)
          `(setf (gethash ,(car pair) ,retn) ,(cadr pair)))
         pairs)
      ,retn)))))

(set-pprint-dispatch 'hash-table
 (lambda (str ht)
  (format str "{?????溅玺???誉??蕃?}"
   (loop for key being the hash-keys of ht
         for value being the hash-values of ht
         collect (list key value)))))