A Lisp macro is a programmatic transformation of source code. A macro transformer is a procedure that takes a subtree of source code and returns a reconstructed tree of source code.
Traditional Lisp macros take the input source code as an S-expression and return the output as another S-expression. Gauche supports that type of macro, too, with the define-macro form.
Here’s a simple definition of when as a traditional macro.
(define-macro (when test . body) `(if ,test (begin ,@body)))
For example, if the macro is used as (when (zero? x) (print "zero") 'zero), the above macro transformer rewrites it to (if (zero? x) (begin (print "zero") 'zero)). So far, so good.
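To see the rewriting concretely, you can ask Gauche to expand a form without evaluating it. The following is a minimal sketch; my-when is a hypothetical name, chosen only to avoid redefining the built-in when, and macroexpand-1 performs a single expansion step.

```scheme
;; my-when is a hypothetical name; it mirrors the definition of `when' above.
(define-macro (my-when test . body)
  `(if ,test (begin ,@body)))

;; Expand one step without evaluating the result.
;; The form is quoted, so x does not need to be bound.
(define expanded
  (macroexpand-1 '(my-when (zero? x) (print "zero") 'zero)))

(write expanded)
(newline)
```

Since the transformer only sees and returns S-expressions, the result is an ordinary list that can be inspected with the usual list operations.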
But what if the when macro is used in an environment where the name begin or if is bound to a nonstandard value?
(let ([begin list]) (when (zero? x) (print "zero") 'zero))
The expanded result would be as follows:
(let ([begin list]) (if (zero? x) (begin (print "zero") 'zero)))
This obviously won’t work as the macro writer intended, since begin in the expanded code refers to the locally bound name.
This is a form of variable capture. Note that when Lisp people talk about variable capture in macros, they often mean another form of capture, where the temporary variables inserted by a macro unintentionally capture the variables passed to the macro. That kind of variable capture can be avoided easily by giving the temporary variables names that never conflict, using gensym.
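The gensym technique mentioned above can be sketched as follows. The macro names bad-or2 and good-or2 are hypothetical, made up for this illustration; both are meant to return the first argument if it is true, and the second otherwise.

```scheme
;; Naive version: the temporary is literally named tmp.
(define-macro (bad-or2 a b)
  `(let ([tmp ,a]) (if tmp tmp ,b)))

;; Safer version: the temporary gets a fresh name from gensym.
(define-macro (good-or2 a b)
  (let ([tmp (gensym)])
    `(let ([,tmp ,a]) (if ,tmp ,tmp ,b))))

;; In both uses, the user's own variable tmp is passed as the second argument.
(define bad-result                      ; the macro's tmp captures the user's tmp
  (let ([tmp 5]) (bad-or2 #f tmp)))

(define good-result                     ; the user's tmp is left alone
  (let ([tmp 5]) (good-or2 #f tmp)))
```

With bad-or2, the expansion is (let ([tmp #f]) (if tmp tmp tmp)), so the user's value 5 is never seen and the result is #f. With good-or2, the result is 5.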
On the other hand, the kind of variable capture in the above example can’t be avoided by gensym, because the (let ([begin list]) ...) part isn’t under the macro writer’s control. As a macro writer, you can do nothing to prevent the conflict except hope that the macro user won’t do such a thing. Sure, rebinding begin is a crazy idea that perhaps nobody wants to try, but the same thing can happen with any global variable, even the ones you define for your library.
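As a small illustration of the last point, consider a library that exports a helper procedure and a traditional macro that expands into a call to it. The names scale and scaled-sum are hypothetical, invented for this sketch.

```scheme
;; scale and scaled-sum stand in for a library's exported names (hypothetical).
(define (scale x) (* x 10))

(define-macro (scaled-sum a b)
  `(+ (scale ,a) (scale ,b)))

;; Ordinary use: scale in the expansion means the global procedure.
(define normal-use (scaled-sum 1 2))

;; The user happens to have an unrelated local variable named scale.
;; The expansion now picks up the local binding instead.
(define shadowed-use
  (let ([scale (^x x)])
    (scaled-sum 1 2)))
```

Here normal-use is 30 but shadowed-use is 3. Using gensym doesn't help, because the macro needs to refer to the existing global scale, not to a fresh name.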
Various Lisp dialects have tried to address this issue in different ways. Common Lisp somewhat relies on the common sense of the programmer—you can use separate packages to reduce the chance of accidental conflict but can’t prevent the user from binding the name in the same package. (The Common Lisp spec says it is undefined if you locally rebind names of CL standard symbols; but it doesn’t prevent you from locally rebinding symbols that are provided by user libraries.)
Clojure introduced a way to refer directly to toplevel variables with a namespace prefix, bypassing any local bindings of the same name (it also has a sophisticated quasiquote form that automatically renames free variables to refer to the toplevel ones). This works as long as there are no local macros. With local macros, you need a way to distinguish different local bindings of the same name, as we will see in the later examples. Clojure’s way can only distinguish between local and toplevel bindings. That’s fine for Clojure, which doesn’t have local macros, but in Scheme we prefer uniform and orthogonal axioms: if functions can be defined locally with lexical scope, why not macros?
Let’s look at local macros with lexical scope. For the sake of explanation, suppose we have a hypothetical local macro binding form, let-macro, that binds a local identifier to a macro transformer. (We don’t actually have let-macro; what we have is let-syntax and letrec-syntax, which call macro transformers in a slightly different way. But here let-macro may be easier to understand, as it is similar to define-macro.)
(let ([f (^x (* x x))])
  (let-macro ([m (^[expr1 expr2] `(+ (f ,expr1) (f ,expr2)))])
    (let ([f (^x (+ x x))])
      (m 3 4))))    ; [1]
The local identifier m is bound to a macro transformer that takes two expressions and returns an S-expression. So the (m 3 4) form [1] would be expanded into (+ (f 3) (f 4)). Let’s rewrite the above expression with the expanded form. (After expansion, we no longer need the let-macro form, so we don’t include it.)
(let ([f (^x (* x x))])
  (let ([f (^x (+ x x))])
    (+ (f 3) (f 4))))    ; [2]
Now, the question: which binding should f in the expanded form [2] refer to? If we interpret the expansion literally, it would refer to the inner binding (^x (+ x x)).
However, following Scheme’s scoping principle, the outer code should be fully understandable regardless of the inner code:
(let ([f (^x (* x x))])
  (let-macro ([m (^[expr1 expr2] `(+ (f ,expr1) (f ,expr2)))])
    ;; The code here isn't expected to accidentally alter
    ;; the behavior defined outside.
    ))
The macro writer may not know that the inner let shadows the binding of f (the inner forms may be pulled in by include, or may be changed by another person who didn’t fully realize that the macro expansion needs to refer to the outer f).
To ensure that the local macro works regardless of what’s placed inside let-macro, we need a sure way to refer to the outer f in the result of macro expansion. The basic idea is to “mark” the names inserted by the macro transformer m (which are f and +) so that we can distinguish the two f’s.
For example, suppose we rewrite the entire form, renaming the corresponding local identifiers as follows:
(let ([f_1 (^x (* x x))])
  (let-macro ([m (^[expr1 expr2] `(+ (f_1 ,expr1) (f_1 ,expr2)))])
    (let ([f_2 (^x (+ x x))])
      (m 3 4))))
Then the naive expansion would correctly preserve scopes; that is, the expansion of m refers to f_1, which doesn’t conflict with the inner name f_2:
(let ([f_1 (^x (* x x))])
  (let ([f_2 (^x (+ x x))])
    (+ (f_1 3) (f_1 4))))
(You may notice that this is similar to the way lambda calculus treats lexical bindings with higher-order functions, where bound variables are renamed to avoid capture.)
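For comparison, here is a sketch of the same shape written with the real let-syntax form and syntax-rules instead of the hypothetical let-macro. The pattern language of syntax-rules is explained in a later chapter; the point here is only which f the expansion ends up using. The variable name result is made up for this example.

```scheme
(define result
  (let ([f (^x (* x x))])
    (let-syntax ([m (syntax-rules ()
                      [(_ expr1 expr2) (+ (f expr1) (f expr2))])])
      (let ([f (^x (+ x x))])
        ;; The f inserted by m means the outer f (f_1 in the renamed
        ;; version), not the inner one that surrounds this use.
        (m 3 4)))))
```

The value of result is 25, that is, (+ (* 3 3) (* 4 4)). A literal reading of the expansion would have used the inner f and produced 14.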
The above example deals with preventing the f referred to from the macro definition (which is, in fact, f_1) from being shadowed by the binding of f at the macro use site (which is f_2).
Another type of variable capture (the one most often talked about, which can be avoided by gensym) is when a variable at the macro use site is shadowed by a binding introduced by the macro definition. We can apply the same renaming strategy to avoid that type of capture, too. Let’s see the following example:
(let ([f (^x (* x x))])
  (let-macro ([m (^[expr1] `(let ([f (^x (+ x x))]) (f ,expr1)))])
    (m (f 3))))
The local macro inserts a binding of f into the expansion. The macro use (m (f 3)) also contains a reference to f, which should be the outer f, since the macro use is lexically outside the let inserted by the macro.
We could rename each f according to its lexical scope:
(let ([f_1 (^x (* x x))])
  (let-macro ([m (^[expr1] `(let ([f_2 (^x (+ x x))]) (f_2 ,expr1)))])
    (m (f_1 3))))
Then the expansion unambiguously distinguishes the two f’s:
(let ([f_1 (^x (* x x))])
  (let ([f_2 (^x (+ x x))])
    (f_2 (f_1 3))))
This is, in principle, what hygienic macros are about (well, almost).
In reality, we don’t rename everything in a batch. One caveat shows up in the latter example: we statically renamed f to f_2, but the macro may recursively call itself, and then we have to distinguish the f’s introduced in each individual expansion of m. So macro expansion and renaming have to work together.
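Both points can be tried out with the real hygienic forms. The sketch below first writes the second example with let-syntax and syntax-rules, and then defines a recursive macro whose every expansion step inserts its own temporary. The names use-site-result, my-or and recursive-result are hypothetical, made up for this illustration.

```scheme
;; The second example: m inserts a binding of f, and the
;; macro use also mentions f.
(define use-site-result
  (let ([f (lambda (x) (* x x))])
    (let-syntax ([m (syntax-rules ()
                      [(_ expr1)
                       (let ([f (lambda (x) (+ x x))]) (f expr1))])])
      (m (f 3)))))

;; A recursive macro.  Each expansion step inserts a binding of tmp,
;; and each of them must stay distinct from the others and from the
;; user's tmp.
(define-syntax my-or
  (syntax-rules ()
    [(_) #f]
    [(_ e) e]
    [(_ e rest ...) (let ([tmp e]) (if tmp tmp (my-or rest ...)))]))

(define recursive-result
  (let ([tmp 1])
    (my-or #f #f tmp)))
```

Here use-site-result is 18, corresponding to (f_2 (f_1 3)) in the renamed form, and recursive-result is 1, because neither of the two tmp bindings inserted by the nested expansions of my-or shadows the user's tmp.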
There are multiple strategies to implement it, and the Scheme standard doesn’t want to bind implementations to a single specific strategy. The standard only states the properties the macro system should satisfy, in two concise sentences:
If a macro transformer inserts a binding for an identifier (variable or keyword), the identifier will in effect be renamed throughout its scope to avoid conflicts with other identifiers.
If a macro transformer inserts a free reference to an identifier, the reference refers to the binding that was visible where the transformer was specified, regardless of any local bindings that surround the use of the macro.
Just from reading this, it may not be obvious how to realize those properties, and the existing hygienic macro mechanisms (e.g. syntax-rules) hide the “how” part. That’s probably one of the reasons some people feel hygienic macros are difficult to grasp. It’s like continuations: the description is concise, but at first you have no idea how it works; then, through experience, you become familiar with it, and when you reread the original description you understand that it says exactly what it is.
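Even without knowing the “how”, the two properties quoted above can be observed directly. The following sketch uses syntax-rules; the macro names my-swap! and hyg-when and the variable names are hypothetical, chosen for this illustration.

```scheme
;; First property: the binding of tmp inserted by the macro is in
;; effect renamed, so it can't collide with the user's tmp.
(define-syntax my-swap!
  (syntax-rules ()
    [(_ a b) (let ([tmp a]) (set! a b) (set! b tmp))]))

(define swapped
  (let ([tmp 1] [other 2])
    (my-swap! tmp other)
    (list tmp other)))

;; Second property: the free references inserted by the macro
;; (if and begin) keep the meaning they had where the macro was
;; defined, regardless of the local binding around the use.
(define-syntax hyg-when
  (syntax-rules ()
    [(_ test body ...) (if test (begin body ...))]))

(define when-result
  (let ([begin list])
    (hyg-when #t 'first 'zero)))
```

The value of swapped is (2 1), and when-result is the symbol zero rather than the list (first zero) that a traditional macro would produce in the same context.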
This introduction may not answer how hygienic macros realize those properties, but I hope it showed what they do and why they are needed. In the following chapters we introduce a couple of hygienic macro mechanisms Gauche supports, with examples, so that you can familiarize yourself with the concept.