Testing Clojure macros with metadata


This is a bit of a technical aside about Clojure macros and core.async. In the sections below I’ll show how you can:

  • Attach metadata to the output of a defmacro. This metadata can be used to build robust tests for complex macros, notably those that use gensyms.
  • Use Clojure’s core.async to break out of asynchronous callback hell, and use macros to make this intuitive.

Read on to get the details…

A (slightly manufactured) problem

Let’s say we have a simple function that adds two numbers and then calls a function with the result:


(defn adder [a b f]
  (f (+ a b)))

To use the result of our function we need to pass in another function that receives it:


(adder 1 2
      (fn [result]
       (println "Result is" result)))

A little convoluted perhaps, but this is actually a fairly common pattern in asynchronous programming, especially in the JavaScript world. If you prefer a slightly more practical example, here’s a function that does the same thing but in a different thread, possibly utilising an additional CPU:


(defn adder-thread [a b f]
  (.start (Thread.
           (fn []
             (f (+ a b))))))

Disconnecting from callbacks

The problem with this style of programming is that once inside a callback tree it is very hard to get out of it. The simplest thing is to pass the result of the callback to yet another callback, which rapidly leads to callback hell: deeply nested anonymous functions, or a huge number of small, single-use functions.

What we’d often like is the ability to return the result passed to a callback up to the ‘top level’ where we originally called adder. But when doing so we need to take into account that the callback may be called concurrently with the ‘main’ thread of control, so we need to wait for it to complete before retrieving a value.

There are a number of ways of doing this, but luckily Clojure has a library to help with such asynchronous programming, called core.async. Among other things, it provides one-way, thread-safe channels, which we can use to both wait for and retrieve the value passed to the callback. The resulting code looks like this:


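(require '[clojure.core.async :as async])
(def result (let [rchan (async/chan 1)]
              ;; the callback pushes its argument onto the channel
              (adder 1 2 (fn [result] (async/>!! rchan result)))
              ;; block until the callback has delivered the value
              (async/<!! rchan)))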

Let’s unpack this a bit… The first thing we do is create a new channel with a buffer size of 1 using (async/chan 1) (assuming core.async has been required as async). We give it a buffer of one so that putting a value onto it doesn’t block, decoupling the callback from the caller. The channel effectively acts as an intermediate store for the return value; with an unbuffered channel the put would block until a receiver took the value, and because our adder calls the callback synchronously on the same thread, before we ever reach the take, the code would deadlock. We then call adder with our result callback, which puts the result onto our channel using the >!! function. Finally, back in our main flow of control, we take the value from the channel with <!!. Because <!! blocks until a value is available, we are guaranteed to wait until the return value is ready. The let then returns the retrieved value.
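To make the buffer-size argument concrete, here is a small sketch contrasting a buffered and an unbuffered channel with the synchronous adder. The function names sum-via-buffered-chan and unbuffered-put-completes? are hypothetical. Rather than letting the unbuffered case actually hang, it uses core.async’s non-blocking offer!, which only succeeds if the put can complete immediately.

```clojure
(require '[clojure.core.async :as async])

(defn adder [a b f]
  (f (+ a b)))

;; Buffer of 1: the synchronous callback's put completes at once,
;; so the later take on the same thread finds the value waiting.
(defn sum-via-buffered-chan [a b]
  (let [rchan (async/chan 1)]
    (adder a b (fn [result] (async/>!! rchan result)))
    (async/<!! rchan)))

;; No buffer: a put can only complete when a taker is already waiting.
;; offer! never blocks; it returns a falsey value when the put cannot
;; complete, which is exactly where >!! would have blocked forever here.
(defn unbuffered-put-completes? [a b]
  (let [rchan (async/chan)]
    (adder a b (fn [result] (async/offer! rchan result)))))

(sum-via-buffered-chan 1 2)     ;; => 3
(unbuffered-put-completes? 1 2) ;; falsey: no taker was waiting
```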

Obviously this is a somewhat convoluted example, but hopefully the point gets through; core.async and channels can be used to retrieve values from an asynchronous, possibly concurrent, callback system and make it act more like synchronous, imperative code.
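The same pattern works unchanged with the threaded adder-thread from earlier, where the callback fires on a different thread. In this sketch (sum-from-thread is a hypothetical name), <!! blocks the calling thread until the worker thread has put the result.

```clojure
(require '[clojure.core.async :as async])

(defn adder-thread [a b f]
  (.start (Thread.
           (fn []
             (f (+ a b))))))

;; The callback now runs on another thread; <!! blocks until it delivers.
(defn sum-from-thread [a b]
  (let [rchan (async/chan 1)]
    (adder-thread a b (fn [result] (async/>!! rchan result)))
    (async/<!! rchan)))

(sum-from-thread 1 2) ;; => 3
```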

But…

But this isn’t actually much better, is it? All we’ve done is replace callback hell with channel hell: boilerplate code to support channel creation and manipulation. However, the important word there is ‘boilerplate’; in the Lisp world any use of boilerplate code is a sign you should be considering replacing it with a macro.

If you’re not familiar with macros, the short version is that they are Clojure functions that run at compile time, taking Clojure code as input and producing new Clojure code that is passed to the compiler. This is an immensely powerful idea and has long been one of Lisp’s biggest strengths. If you’re interested in learning more about them, the best reference in my mind is not a Clojure one but Practical Common Lisp by Peter Seibel. Common Lisp is slightly different to Clojure, but the core concepts are the same.

But back to our boilerplate code. If we’re going to use this pattern a lot then it is worth our while to turn it into a macro. Being good little developers, naturally we’re going to follow TDD and write our tests up front. Luckily that’s pretty easy; for starters, we already have, from above, an example of the code we need to produce:


(let [rchan (async/chan 1)]
  (adder 1 2 (fn [result] (async/>!! rchan result)))
  (async/<!! rchan))

We also have the function macroexpand-1, which expands a macro call once and returns the resulting code. Because Clojure code is also Clojure data, we can then compare the expansion with our reference code using =. So let’s write our test:


(deftest macro-output
  (let [output (macroexpand-1 `(with-return-channel (adder 1 2)))

        expected `(let [rchan (async/chan 1)]
                    (adder 1 2 (fn [result] (async/>!! rchan result)))
                    (async/<!! rchan))]

    (is (= expected output))))

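Note: This assumes that your tests are in the same namespace as the tested code. If not, syntax-quote will qualify the local symbols in the test (rchan, result) with the test’s namespace rather than the macro’s, so you may need to fully qualify them yourself, e.g. clj-macros.core/rchan.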
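If macroexpand-1 is new to you, a quick way to see what it does is to apply it to a built-in macro such as when, which is unrelated to our code but shows the same idea: the macro call goes in as data, its one-step expansion comes out as data, and the two can be compared with =. The var name expansion and the symbols ready? and launch are arbitrary; they are never resolved because the form is quoted.

```clojure
;; macroexpand-1 expands a macro call once and returns the new form as data.
(def expansion (macroexpand-1 '(when ready? (launch))))

expansion
;; => (if ready? (do (launch)))

;; Since code is data, the expansion can be compared against a reference form.
(= '(if ready? (do (launch))) expansion) ;; => true
```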

Now that we have our test we can write our macro. We’re going to keep the implementation simple and limit it to a single target function (but see below for pointers to a more advanced version). What we want is something like this:


(with-return-channel
  (adder 1 2))

This should expand to exactly the expected code in our test. A first version of the macro looks like this:


(defmacro with-return-channel [body]
    `(let [rchan (async/chan 1)]
       ~(concat body [`(fn [result] (async/>!! rchan result))])
       (async/<!! rchan)))

As you can see, this just uses Clojure’s syntax-quote (aka ‘backtick’) operator to replicate the example code. The only special thing it does is inject the result function into the target adder call (aka body) using concat inside an unquoted (~) section. defmacro then returns this generated code to the compiler. If we run our test now, it should pass.
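The concat step is the heart of the macro, and it can be tried on its own outside defmacro. This sketch pulls it into a hypothetical helper, inject-callback, which appends a callback form as the last argument of a call form; since both are plain data, nothing is evaluated.

```clojure
;; Hypothetical helper mirroring the concat inside the macro:
;; append a callback form as the final argument of a call form.
(defn inject-callback [call-form callback-form]
  (concat call-form [callback-form]))

(inject-callback '(adder 1 2) '(fn [result] (println result)))
;; => (adder 1 2 (fn [result] (println result)))
```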

So far so simple. Except…

Cleaning up our act

If you have a background in Lisp, especially Common Lisp, alarm bells will be going off now, because our generated code uses named variables. Clojure does take some steps to make macros hygienic: syntax-quote namespace-qualifies bare symbols, so rchan becomes clj-macros.core/rchan, and the compiler refuses to bind a qualified symbol in a let or fn. That means our first macro, although macroexpand-1 happily returns its expansion, would actually be rejected by the compiler when used. More generally, it is recommended to avoid statically named variables in macros, as we don’t know what context they’re going to be used in. Luckily Clojure (like most Lisps) provides a function called gensym that generates unique symbols (Lisp’s term for identifiers) that we can use instead. Clojure even provides a handy shortcut for this; if you append a # to a symbol inside a syntax-quote it will be replaced with a generated symbol, e.g.:


clj-macros.core=> `(myvariable)
(clj-macros.core/myvariable)

clj-macros.core=> `(myvariable#)
(myvariable__24260__auto__)

Unfortunately this automatic conversion only applies within a single syntax-quote, so we can’t use it in our macro: the callback is built in a separate syntax-quote, where the same name# would produce a different symbol.
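The scoping rule is easy to check at the REPL. In this sketch (the function names are hypothetical), the first function pulls apart a syntax-quoted let and confirms that both occurrences of x# became the same symbol; the second compares x# from two separate syntax-quotes, which each generate their own symbol.

```clojure
;; Within a single syntax-quote, every x# becomes the same generated symbol.
(defn same-quote-match? []
  (let [[_ [bound-sym _] body-sym] `(let [x# 1] x#)]
    (= bound-sym body-sym)))

;; Two separate syntax-quotes each generate their own symbol for x#.
(defn separate-quotes-match? []
  (= `x# `x#))

(same-quote-match?)      ;; => true
(separate-quotes-match?) ;; => false
```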

So let’s rewrite our macro to be hygienic:


(defmacro with-return-channel [body]
  (let [csym (gensym)
        rsym (gensym)
        callback `(fn [~rsym] (async/>!! ~csym ~rsym))]

    `(let [~csym (async/chan 1)]
       ~(concat body [callback])
       (async/<!! ~csym))))

As you can see, we generate two symbols, csym and rsym, then use unquote (~) to inject them anywhere we would have used rchan or result previously. Simple enough; let’s run the tests again…


Test Summary
clj-macros.core-test
Ran 7 tests, in 7 test functions
1 failures

Results

Fail in macro-output
expected: (= expected output)
  actual: (not
 (=
  (clojure.core/let
   [clj-macros.core-test/chan (clojure.core.async/chan 1)]
   (clj-macros.core/adder
    1
    2
    (clojure.core/fn
     [clj-macros.core-test/result]
     (clojure.core.async/>!!
      clj-macros.core-test/chan
      clj-macros.core-test/result)))
   (clojure.core.async/<!! clj-macros.core-test/chan))

(clojure.core/let
   [G__24463 (clojure.core.async/chan 1)]
   (clj-macros.core/adder
    1
    2
    (clojure.core/fn
     [G__24464]
     (clojure.core.async/>!! G__24463 G__24464)))
   (clojure.core.async/<!! G__24463))))

It’s failed. This is because we’ve replaced our static symbols with gensyms, so naturally they no longer match. And here we (finally) get to the point of this post: how do you test a macro that uses gensyms? After all, gensyms are numbered from a global counter, so their names aren’t reliably predictable across runs.

(There’s another related question, which is “Should you test macro output directly, or just test the resulting code?” Hopefully the process of developing the macro above answers this; comparing macro output to a ‘known-good’ reference implementation gives us meaningful test failures we can inspect, and also documents how the macro is supposed to work.)
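Testing the expansion is not a substitute for also running it, though. A minimal sketch, assuming everything lives in one namespace and using the hypothetical name with-return-channel-v1 for the first, non-hygienic macro, evaluates both versions with eval. The first expansion is rejected by the compiler because it tries to bind namespace-qualified symbols such as user/rchan, even though macroexpand-1 returns it without complaint; the gensym version compiles and runs. The helper evaluates? is also hypothetical.

```clojure
(require '[clojure.core.async :as async])

(defn adder [a b f]
  (f (+ a b)))

;; First version, renamed so both can coexist. Syntax-quote qualifies
;; rchan and result, so let and fn try to bind qualified symbols.
(defmacro with-return-channel-v1 [body]
  `(let [rchan (async/chan 1)]
     ~(concat body [`(fn [result] (async/>!! rchan result))])
     (async/<!! rchan)))

;; Hygienic version from the text, using gensym.
(defmacro with-return-channel [body]
  (let [csym (gensym)
        rsym (gensym)
        callback `(fn [~rsym] (async/>!! ~csym ~rsym))]
    `(let [~csym (async/chan 1)]
       ~(concat body [callback])
       (async/<!! ~csym))))

(defn evaluates?
  "True if form compiles and runs without throwing, false otherwise."
  [form]
  (try (eval form) true
       (catch Exception _ false)))

;; Both expand without complaint...
(macroexpand-1 '(with-return-channel-v1 (adder 1 2)))
;; ...but only the hygienic expansion is accepted by the compiler.
(evaluates? '(with-return-channel-v1 (adder 1 2))) ;; => false
(evaluates? '(with-return-channel (adder 1 2)))    ;; => true
```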

So what’s the answer? Well, Clojure has some properties that can help us here, namely homoiconicity and metadata.

Getting meta

Clojure is homoiconic, which means that a Clojure program is represented by Clojure data structures. This is why macros are so straightforward: Clojure code that is input into a macro is manipulable in the same way as data from a socket or file, and the resulting data is treated as Clojure code and passed to the compiler.

The other property we’re going to use is metadata. Clojure collections and symbols can carry additional annotation information, called metadata, that describes special properties of the data without changing its value. Clojure uses this internally, e.g. to attach documentation to functions or mark them as private. This metadata can be attached to any collection or symbol, including the code produced by a macro (which, as noted above, is just another form of data). So we can use it to pass information back out of the macro to assist any tests.
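Two properties of metadata matter for what follows: it can be attached to a quoted code form like any other data, and it is ignored by =, so a form carrying metadata still compares equal to the same form without it. This sketch (the var name form is arbitrary) shows both, plus the limit that only values implementing clojure.lang.IObj can carry metadata.

```clojure
;; Attach a metadata map to a quoted list; code is just data.
(def form (with-meta '(adder 1 2) {:note "generated"}))

(meta form)           ;; => {:note "generated"}

;; Metadata is ignored by =, so the annotated form equals a plain one.
(= form '(adder 1 2)) ;; => true

;; Only values implementing clojure.lang.IObj (collections, symbols, ...)
;; can carry metadata; strings and numbers cannot.
(instance? clojure.lang.IObj "text") ;; => false
```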

So let’s rewrite our macro to use metadata:


(defmacro with-return-channel [body]
  (let [csym (gensym)
        rsym (gensym)
        callback `(fn [~rsym] (async/>!! ~csym ~rsym))]

    (with-meta

      `(let [~csym (async/chan 1)]
         ~(concat body [callback])
         (async/<!! ~csym))

      {:csym csym
       :rsym rsym})))

As you can see, the only difference is that we now use the with-meta function to attach a map to the generated code. This map contains two entries recording which symbols gensym produced, so that tests can use them. We can retrieve the metadata with meta:


clj-macros.core=> (meta (macroexpand-1 `(with-return-channel (adder 1 2))))
{:csym G__24477, :rsym G__24478}

Now we can update our test to use this:


(deftest macro-output-with-metadata
  (let [output (macroexpand-1 `(with-return-channel (adder 1 2)))
        metadata (meta output)
        csym (:csym metadata)
        rsym (:rsym metadata)

        expected `(let [~csym (async/chan 1)]
                    (adder 1 2 (fn [~rsym] (async/>!! ~csym ~rsym)))
                    (async/<!! ~csym))]

    (is (= expected output))))
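The shape test pairs well with a behaviour test. Pulling the pieces together, a self-contained sketch, assuming everything lives in a single namespace as the earlier note requires, might look like this. The second test, macro-evaluates, is an addition not in the original; it checks that the expansion actually compiles and returns the sum.

```clojure
(require '[clojure.core.async :as async]
         '[clojure.test :refer [deftest is run-tests]])

(defn adder [a b f]
  (f (+ a b)))

(defmacro with-return-channel [body]
  (let [csym (gensym)
        rsym (gensym)
        callback `(fn [~rsym] (async/>!! ~csym ~rsym))]
    ;; Record the generated symbols so tests can rebuild the expected form.
    (with-meta
      `(let [~csym (async/chan 1)]
         ~(concat body [callback])
         (async/<!! ~csym))
      {:csym csym :rsym rsym})))

;; Shape test, as in the text: rebuild the expected code from the metadata.
(deftest macro-output-with-metadata
  (let [output (macroexpand-1 `(with-return-channel (adder 1 2)))
        {:keys [csym rsym]} (meta output)
        expected `(let [~csym (async/chan 1)]
                    (adder 1 2 (fn [~rsym] (async/>!! ~csym ~rsym)))
                    (async/<!! ~csym))]
    (is (= expected output))))

;; Behaviour test: the expansion must also compile and run.
(deftest macro-evaluates
  (is (= 3 (with-return-channel (adder 1 2)))))

(run-tests)
```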

And our tests now pass 🙂

A more complex example

If this all seems like a lot of work for what is in fact a very simple macro, a more involved example may be in order. As noted earlier, this sort of callback-injection-and-extraction macro is potentially useful in the ClojureScript world, as JavaScript uses callbacks extensively (although ClojureScript’s core.async has no blocking >!! or <!!, so there the macro would need to be built around go blocks instead). However, wrapping each call in a macro is a bit tedious; if we want to chain a bunch of calls together it would be much nicer to have something like Clojure’s native ->> (“thread-last”) macro. Ideally we’d like a macro named something unpronounceable like -|| (although you can call it “pipe-after” if you like) that takes this:


(-|| (adder 1 2)
     (adder 3)
     (adder 4))

and converts it to something like this:


(let [rchan (async/chan 1)]
  (adder 1 2 (fn [r]
               (adder 3 r (fn [r]
                            (adder 4 r (fn [r]
                                         (async/>!! rchan r)))))))
  (async/<!! rchan))

The actual implementation of this is a bit tricky, and involves a right fold over the input forms to nest the output. In this case having a reference implementation to test against is very useful. If you’re interested, see below for a link to the final result.
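For something to experiment with before following the link, here is one possible sketch of -||. It is not the implementation in the linked repository; it simply expresses the right fold as a reduce over the reversed tail of the input forms, reuses a single gensym for every nested result parameter (shadowing, just as the reference expansion reuses r), and attaches the gensyms as metadata in the same way as the macro above.

```clojure
(require '[clojure.core.async :as async])

(defn adder [a b f]
  (f (+ a b)))

(defmacro -|| [& forms]
  (let [csym (gensym "rchan")
        rsym (gensym "r")
        ;; Innermost callback: push the final result onto the channel.
        innermost `(fn [~rsym] (async/>!! ~csym ~rsym))
        ;; Right fold: starting from the last form, wrap each form in a
        ;; callback that passes the previous result (rsym) and the callback
        ;; built so far as its final two arguments.
        nested (reduce (fn [callback form]
                         `(fn [~rsym] ~(concat form [rsym callback])))
                       innermost
                       (reverse (rest forms)))]
    ;; As with with-return-channel, record the gensyms for tests.
    (with-meta
      `(let [~csym (async/chan 1)]
         ~(concat (first forms) [nested])
         (async/<!! ~csym))
      {:csym csym :rsym rsym})))

(-|| (adder 1 2)
     (adder 3)
     (adder 4))
;; => 10, i.e. ((1 + 2) + 3) + 4
```

The metadata lets a test rebuild the nested reference expansion shown above with the generated symbols substituted in, exactly as in the earlier macro-output-with-metadata test.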

The code

Code for the simple macro example above is available in my Atlassian Bitbucket repository. If you want to look at the more involved -|| macro have a look in my personal Bitbucket repo.