Module trmacros

Term rewriting macros

Term rewriting macros are macros or templates that have not only a name but also a pattern that is searched for after the semantic checking phase of the compiler: This means they provide an easy way to enhance the compilation pipeline with user defined optimizations:

template optMul{`*`(a, 2)}(a: int): int = a+a

let x = 3
echo x * 2

The compiler now rewrites x * 2 as x + x. The code inside the curlies is the pattern to match against. The operators *, **, |, ~ have a special meaning in patterns if they are written in infix notation, so to match verbatim against * the ordinary function call syntax needs to be used.

Unfortunately optimizations are hard to get right and even the tiny example is wrong:

template optMul{`*`(a, 2)}(a: int): int = a+a

proc f(): int =
  echo "side effect!"
  result = 55

echo f() * 2

We cannot duplicate 'a' if it denotes an expression that has a side effect! Fortunately Nim supports side effect analysis:

template optMul{`*`(a, 2)}(a: int{noSideEffect}): int = a+a

proc f(): int =
  echo "side effect!"
  result = 55

echo f() * 2 # not optimized ;-)

So what about 2 * a? We should tell the compiler * is commutative. We cannot really do that however as the following code only swaps arguments blindly:

template mulIsCommutative{`*`(a, b)}(a, b: int): int = b*a

What optimizers really need to do is a canonicalization:

template canonMul{`*`(a, b)}(a: int{lit}, b: int): int = b*a

The int{lit} parameter pattern matches against an expression of type int, but only if it's a literal.

Parameter constraints

The parameter constraint expression can use the operators | (or), & (and) and ~ (not) and the following predicates:

PredicateMeaning
atomThe matching node has no children.
litThe matching node is a literal like "abc", 12.
symThe matching node must be a symbol (a bound identifier).
identThe matching node must be an identifier (an unbound identifier).
callThe matching AST must be a call/apply expression.
lvalueThe matching AST must be an lvalue.
sideeffectThe matching AST must have a side effect.
nosideeffectThe matching AST must have no side effect.
paramA symbol which is a parameter.
genericparamA symbol which is a generic parameter.
moduleA symbol which is a module.
typeA symbol which is a type.
varA symbol which is a variable.
letA symbol which is a let variable.
constA symbol which is a constant.
resultThe special result variable.
procA symbol which is a proc.
methodA symbol which is a method.
iteratorA symbol which is an iterator.
converterA symbol which is a converter.
macroA symbol which is a macro.
templateA symbol which is a template.
fieldA symbol which is a field in a tuple or an object.
enumfieldA symbol which is a field in an enumeration.
forvarA for loop variable.
labelA label (used in block statements).
nk*The matching AST must have the specified kind. (Example: nkIfStmt denotes an if statement.)
aliasStates that the marked parameter needs to alias with some other parameter.
noaliasStates that every other parameter must not alias with the marked parameter.

The alias and noalias predicates refer not only to the matching AST, but also to every other bound parameter; syntactially they need to occur after the ordinary AST predicates:

template ex{a = b + c}(a: int{noalias}, b, c: int) =
  # this transformation is only valid if 'b' and 'c' do not alias 'a':
  a = b
  inc a, c

Pattern operators

The operators *, **, |, ~ have a special meaning in patterns if they are written in infix notation.

The | operator

The | operator if used as infix operator creates an ordered choice:

template t{0|1}(): expr = 3
let a = 1
# outputs 3:
echo a

The matching is performed after the compiler performed some optimizations like constant folding, so the following does not work:

template t{0|1}(): expr = 3
# outputs 1:
echo 1

The reason is that the compiler already transformed the 1 into "1" for the echo statement. However, a term rewriting macro should not change the semantics anyway. In fact they can be deactived with the --patterns:off command line option or temporarily with the patterns pragma.

The {} operator

A pattern expression can be bound to a pattern parameter via the expr{param} notation:

template t{(0|1|2){x}}(x: expr): expr = x+1
let a = 1
# outputs 2:
echo a

The ~ operator

The ~ operator is the not operator in patterns:

template t{x = (~x){y} and (~x){z}}(x, y, z: bool): stmt =
  x = y
  if x: x = z

var
  a = false
  b = true
  c = false
a = b and c
echo a

The * operator

The * operator can flatten a nested binary expression like a & b & c to &(a, b, c):

var
  calls = 0

proc `&&`(s: varargs[string]): string =
  result = s[0]
  for i in 1..len(s)-1: result.add s[i]
  inc calls

template optConc{ `&&` * a }(a: string): expr = &&a

let space = " "
echo "my" && (space & "awe" && "some " ) && "concat"

# check that it's been optimized properly:
doAssert calls == 1

The second operator of * must be a parameter; it is used to gather all the arguments. The expression "my" && (space & "awe" && "some " ) && "concat" is passed to optConc in a as a special list (of kind nkArgList) which is flattened into a call expression; thus the invocation of optConc produces:

`&&`("my", space & "awe", "some ", "concat")

The ** operator

The ** is much like the * operator, except that it gathers not only all the arguments, but also the matched operators in reverse polish notation:

import macros

type
  TMatrix = object
    dummy: int

proc `*`(a, b: TMatrix): TMatrix = discard
proc `+`(a, b: TMatrix): TMatrix = discard
proc `-`(a, b: TMatrix): TMatrix = discard
proc `$`(a: TMatrix): string = result = $a.dummy
proc mat21(): TMatrix =
  result.dummy = 21

macro optM{ (`+`|`-`|`*`) ** a }(a: TMatrix): expr =
  echo treeRepr(a)
  result = newCall(bindSym"mat21")

var x, y, z: TMatrix

echo x + y * z - x

This passes the expression x + y * z - x to the optM macro as an nnkArgList node containing:

Arglist
  Sym "x"
  Sym "y"
  Sym "z"
  Sym "*"
  Sym "+"
  Sym "x"
  Sym "-"

(Which is the reverse polish notation of x + y * z - x.)

Parameters

Parameters in a pattern are type checked in the matching process. If a parameter is of the type varargs it is treated specially and it can match 0 or more arguments in the AST to be matched against:

template optWrite{
  write(f, x)
  ((write|writeln){w})(f, y)
}(x, y: varargs[expr], f: TFile, w: expr) =
  w(f, x, y)

Example: Partial evaluation

The following example shows how some simple partial evaluation can be implemented with term rewriting:

proc p(x, y: int; cond: bool): int =
  result = if cond: x + y else: x - y

template optP1{p(x, y, true)}(x, y: expr): expr = x + y
template optP2{p(x, y, false)}(x, y: expr): expr = x - y

Example: Hoisting

The following example shows how some form of hoisting can be implemented:

import pegs

template optPeg{peg(pattern)}(pattern: string{lit}): TPeg =
  var gl {.global, gensym.} = peg(pattern)
  gl

for i in 0 .. 3:
  echo match("(a b c)", peg"'(' @ ')'")
  echo match("W_HI_Le", peg"\y 'while'")

The optPeg template optimizes the case of a peg constructor with a string literal, so that the pattern will only be parsed once at program startup and stored in a global gl which is then re-used. This optimization is called hoisting because it is comparable to classical loop hoisting.

AST based overloading

Parameter constraints can also be used for ordinary routine parameters; these constraints affect ordinary overloading resolution then:

proc optLit(a: string{lit|`const`}) =
  echo "string literal"
proc optLit(a: string) =
  echo "no string literal"

const
  constant = "abc"

var
  variable = "xyz"

optLit("literal")
optLit(constant)
optLit(variable)

However, the constraints alias and noalias are not available in ordinary routines.

Move optimization

The call constraint is particularly useful to implement a move optimization for types that have copying semantics:

proc `[]=`*(t: var TTable, key: string, val: string) =
  ## puts a (key, value)-pair into `t`. The semantics of string require
  ## a copy here:
  let idx = findInsertionPosition(key)
  t[idx] = key
  t[idx] = val

proc `[]=`*(t: var TTable, key: string{call}, val: string{call}) =
  ## puts a (key, value)-pair into `t`. Optimized version that knows that
  ## the strings are unique and thus don't need to be copied:
  let idx = findInsertionPosition(key)
  shallowCopy t[idx], key
  shallowCopy t[idx], val

var t: TTable
# overloading resolution ensures that the optimized []= is called here:
t[f()] = g()
Generated: 2014-10-03 00:45:21 UTC