1 Extensible embedded compiler

1.1 Intent

Extensible embedded compilers are a good way to implement full-featured domain-specific languages that:

1.2 Motivation

While one use of macros is to implement new language forms that extend the base Racket language, another is to implement conceptually distinct domain-specific languages (DSLs).

A DSL has its own grammar, static semantics, and evaluation model. Implementing these features may require the flexibility of a traditional multi-pass compiler. At the same time, we often want DSLs to integrate fluidly with Racket and other DSLs. And just as programmers can create abstractions atop Racket using macros, they should be able to create abstractions atop DSLs as well.

The extensible embedded compiler pattern is a way of structuring a DSL implementation to support all of these properties. It replicates the structure of the implementation of Racket, using a macro expander tailored to the DSL as front-end and a traditional multi-pass compiler as back-end. The DSL’s macro expander shares the Racket expander’s hygiene mechanism and expander environment in order to integrate the DSL syntax with Racket.

1.3 Applicability

Use an extensible embedded compiler when:

1.4 Solution and example

The high-level components of an extensible embedded compiler are a vocabulary of core syntactic forms, a macro expander, a back-end compiler, and macros to embed the DSL in Racket. The following sections address each component in turn.

We use the DSL of parsing expression grammars (PEGs) [peg] as an example to illustrate the pattern. Basic parsing expressions match empty input, characters, sequences, and alternatives. A local binding form enables named recursive grammars.

peg := -eps
     | (-char <character>)
     | (-seq <peg> <peg>)
     | (-or <peg> <peg>)
     | (-local ([<identifier> <peg>]) <peg>)

We embed the DSL in Racket with a syntactic form called parse:

racket-exp := (parse <peg> <racket-exp>)

Its Racket subexpression should evaluate to a string, which is parsed according to the grammar given by the PEG subexpression. The form returns the number of characters of the string that match the grammar.
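To make the intended behavior of parse concrete, here is a minimal sketch of an interpreter for the grammar above, written over quoted S-expressions in plain Racket. It is not the extensible embedded compiler that this pattern describes: there is no expander, no macro support, and no compilation. The name peg-match, the environment representation, and the choice to return #f when the grammar does not match are assumptions of the sketch, not part of the original design.

```racket
#lang racket/base
(require racket/match)

;; peg-match : peg-datum string -> (or/c exact-nonnegative-integer? #f)
;; Interprets a quoted PEG against str, starting at index 0.
;; Returns the number of characters matched, or #f on failure
;; (the failure result is an assumption of this sketch).
(define (peg-match peg str)
  ;; env maps each -local name to a procedure: index -> (or/c index #f)
  (define (run peg i env)
    (match peg
      ['-eps i]
      [`(-char ,(? char? c))
       (and (< i (string-length str))
            (char=? (string-ref str i) c)
            (add1 i))]
      [`(-seq ,p1 ,p2)
       (define j (run p1 i env))
       (and j (run p2 j env))]
      [`(-or ,p1 ,p2)
       ;; ordered choice: try p2 only if p1 fails
       (or (run p1 i env) (run p2 i env))]
      [`(-local ([,(? symbol? name) ,rhs]) ,body)
       ;; the right-hand side can refer to name recursively
       (define env^
         (hash-set env name (lambda (k) (run rhs k env^))))
       (run body i env^)]
      [(? symbol? name)
       ((hash-ref env name) i)]))
  (run peg 0 (hash)))

;; Zero or more #\a characters followed by one #\b.
(define as-then-b
  '(-local ([as (-or (-seq (-char #\a) as) -eps)])
     (-seq as (-char #\b))))

(peg-match as-then-b "aaab!") ; 4 characters match
(peg-match as-then-b "aaa")   ; no match
```

Because the match always starts at index 0, the final index doubles as the number of characters matched.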

1.4.1 The vocabulary of syntactic forms

Racket associates syntactic forms with an identifier binding in a module or local scope, just like runtime bindings. This allows the visibility and name of syntactic forms to be controlled by modules and lexical scope, and is essential to Racket’s conception of "languages as libraries". Consequently, the first task in defining a DSL is to create bindings for the core syntactic forms of the DSL.

When we create identifier bindings, we also have to define their meaning when they appear as normal Racket expressions. DSL forms only make sense in the context of the DSL, so we declare that expansion should raise an error if they appear in a Racket context.

The Literal pattern describes how to establish the bindings. Using define-literal-forms from syntax-generic2 to abstract over that pattern, the literal definitions for the PEG language may be written as follows:

(define-literal-forms
  peg-literals
  "peg forms cannot be used as racket expressions"
  (-eps
   -char
   -seq
   -or
   -local))
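
For readers without the library at hand, the following is a rough sketch in plain Racket of the kind of bindings such a definition establishes: each core form is bound as a macro whose only behavior is to raise a syntax error. The helper define-error-forms is hypothetical and is not the library's define-literal-forms; in particular it does not create a literal set for syntax-parse. The convert-syntax-error form from syntax/macro-testing is used only so that the expansion-time error can be observed at run time.

```racket
#lang racket/base
(require (for-syntax racket/base)
         syntax/macro-testing)

;; Hypothetical helper (not the library's define-literal-forms):
;; binds every name as a macro that always raises a syntax error.
(define-syntax-rule (define-error-forms message name ...)
  (begin
    (define-syntax (name stx)
      (raise-syntax-error #f message stx))
    ...))

(define-error-forms
  "peg forms cannot be used as racket expressions"
  -eps -char -seq -or -local)

;; Using a PEG form as a Racket expression fails during expansion.
;; convert-syntax-error defers that failure to run time so the
;; message can be inspected here.
(define (misuse-message)
  (with-handlers ([exn:fail:syntax? exn-message])
    (convert-syntax-error (-seq -eps -eps))))

(displayln (misuse-message))
```

Because the forms are ordinary bindings, they can be exported, renamed, and shadowed like any other identifier.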

1.4.2 The expander

The expander checks that a program conforms to the syntax of the DSL, expands macros, and reconstructs fully-expanded syntax. It must also construct representations of the program’s scopes and bindings, both to implement hygienic name resolution and because macro names are themselves scoped and may be shadowed.

As a first step we need to define data structures for the representations of DSL variable and macro bindings in the expander environment. Both kinds of binding need to be distinguished from those belonging to other languages, so we create new structure types to represent them. There isn’t any static information associated with PEG variables, so the corresponding structure has no fields. For macros we need to remember the transformer procedure:

(begin-for-syntax
  (struct peg-variable [])
  (struct peg-macro [transformer]))

In DSLs with richer static semantics, additional information such as types would be associated with variables. More sophisticated extensible embedded compilers usually employ the Binding interface pattern in their expander environment representations to allow other languages to create variable bindings that interoperate with the DSL.
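
As a minimal sketch of how such structure values can live in the expander environment, the example below uses only core Racket: define-syntax binds an identifier to a compile-time value, and syntax-local-value retrieves it. The names digit, many1, and binding-kind are hypothetical, and this is not a reproduction of the library operations used in Figure 1.

```racket
#lang racket/base
(require (for-syntax racket/base))

(begin-for-syntax
  (struct peg-variable ())
  (struct peg-macro (transformer)))

;; Compile-time bindings whose values are the structures above.
(define-syntax digit (peg-variable))
(define-syntax many1 (peg-macro (lambda (stx) stx))) ; placeholder transformer

;; binding-kind : hypothetical macro that reports, as a quoted symbol,
;; how an identifier is bound in the expander environment.
(define-syntax (binding-kind stx)
  (syntax-case stx ()
    [(_ name)
     (identifier? #'name)
     (let ([v (syntax-local-value #'name (lambda () #f))])
       (cond
         [(peg-variable? v) #''peg-variable]
         [(peg-macro? v) #''peg-macro]
         [else #''other]))]))

(binding-kind digit) ; bound as a PEG variable
(binding-kind many1) ; bound as a PEG macro
(binding-kind car)   ; an ordinary Racket binding
```

Distinct structure types are what let the expander tell PEG bindings apart from Racket macros and from bindings that belong to other DSLs.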

The main part of the expander for PEGs is defined in Figure 1.

(begin-for-syntax
  (define/hygienic (expand-peg stx)
    (syntax-parse stx
      #:literal-sets (peg-literals)
      ; Core forms
      [-eps this-syntax]
      [(-char c:char) this-syntax]
      [(-seq e1 e2)
       ; Expand both subexpressions, then rebuild the form
       (def/stx e1^ (expand-peg #'e1))
       (def/stx e2^ (expand-peg #'e2))
       (qstx/rc (-seq e1^ e2^))]
      [(-or e1 e2)
       (def/stx e1^ (expand-peg #'e1))
       (def/stx e2^ (expand-peg #'e2))
       (qstx/rc (-or e1^ e2^))]
      [(-local [g e]
               b)
       ; A fresh scope makes g visible in both e and b
       (define sc (make-scope))
       (def/stx g^ (bind! (add-scope #'g sc)
                          #'(peg-variable)))
       (def/stx e^ (expand-peg (add-scope #'e sc)))
       (def/stx b^ (expand-peg (add-scope #'b sc)))
       (qstx/rc
        (-local [g^ e^]
                b^))]
      [name:id
       ; A reference must resolve to a PEG variable binding
       (when (not (peg-variable? (lookup #'name)))
         (raise-syntax-error #f "not bound as a peg" #'name))
       this-syntax]

      ; Macros
      [(head . rest)
       #:when (peg-macro? (lookup #'head))
       ; Apply the transformer, then keep expanding its result
       (expand-peg
        ((peg-macro-transformer (lookup #'head)) stx))])))

Figure 1: PEG DSL Expander
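
The overall shape of this expander (check core forms, recur into subexpressions, and re-expand the result of a macro) can be tried out in a much simplified, runnable form. The sketch below is an assumption-heavy analogue, not the expander of Figure 1: it works on quoted S-expressions at run time, uses a plain hash table in place of scopes and the expander environment, and therefore provides no hygiene. It follows the -local shape given in the grammar. The names expand-peg-datum, opt-transformer, and the -opt macro are hypothetical.

```racket
#lang racket/base
(require racket/match)

;; expand-peg-datum : datum hash -> datum
;; env maps a symbol either to 'variable or to a transformer procedure.
;; No hygiene: names are plain symbols (an assumption of this sketch).
(define (expand-peg-datum e env)
  (define (recur e) (expand-peg-datum e env))
  (match e
    ;; Core forms
    ['-eps e]
    [`(-char ,(? char?)) e]
    [`(-seq ,e1 ,e2) `(-seq ,(recur e1) ,(recur e2))]
    [`(-or ,e1 ,e2) `(-or ,(recur e1) ,(recur e2))]
    [`(-local ([,(? symbol? g) ,rhs]) ,body)
     ;; g is visible in both the right-hand side and the body
     (define env^ (hash-set env g 'variable))
     `(-local ([,g ,(expand-peg-datum rhs env^)])
        ,(expand-peg-datum body env^))]
    [(? symbol? name)
     (unless (eq? (hash-ref env name #f) 'variable)
       (error 'expand-peg-datum "not bound as a peg: ~a" name))
     name]
    ;; Macros: apply the transformer, then keep expanding its result
    [(cons (? symbol? head) _)
     #:when (procedure? (hash-ref env head #f))
     (recur ((hash-ref env head) e))]
    [_ (error 'expand-peg-datum "bad peg syntax: ~e" e)]))

;; Hypothetical macro: (-opt p) rewrites to (-or p -eps)
(define (opt-transformer e)
  (match e
    [`(-opt ,p) `(-or ,p -eps)]))

(define example-env (hash '-opt opt-transformer))

(expand-peg-datum '(-seq (-opt (-char #\a)) (-char #\b)) example-env)
```

The output contains only core forms, which is the property the back-end compiler relies on.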

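1.4.3 The compiler

The compiler is a phase 1 function over syntax. It runs on the fully-expanded syntax reconstructed by the expander.

The compiler can use apply-as-transformer hygiene to ensure that generated code is fresh.

Use the Persistent symbol tables pattern as needed.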

1.4.4 Embedding macros

An embedding macro is a macro in the language that the DSL is to be embedded in, usually Racket.

It invokes the expander and the compiler.

It also performs any wrapping or conversion of values between the internal representation of the DSL and the external representation shared with Racket.
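
To illustrate how a phase 1 compiler and an embedding macro fit together, here is a minimal self-contained sketch in plain Racket. It rests on several assumptions: it skips the expander entirely (so DSL macros are not supported and core forms are recognized only as unbound literals), it follows the -local shape from the grammar, it relies on the ordinary hygiene of the enclosing parse macro rather than apply-as-transformer, and it returns #f when the grammar does not match. The name compile-peg and the generated code shape are hypothetical.

```racket
#lang racket/base
(require (for-syntax racket/base))

(begin-for-syntax
  ;; compile-peg : syntax -> syntax
  ;; Assumes its input contains only core PEG forms. The result is a
  ;; Racket expression that evaluates to a procedure
  ;; (string index -> index or #f).
  (define (compile-peg stx)
    (syntax-case stx (-eps -char -seq -or -local)
      [-eps #'(lambda (s i) i)]
      [(-char c)
       #'(lambda (s i)
           (and (< i (string-length s))
                (char=? (string-ref s i) c)
                (add1 i)))]
      [(-seq e1 e2)
       (with-syntax ([p1 (compile-peg #'e1)]
                     [p2 (compile-peg #'e2)])
         #'(lambda (s i)
             (let ([j (p1 s i)])
               (and j (p2 s j)))))]
      [(-or e1 e2)
       (with-syntax ([p1 (compile-peg #'e1)]
                     [p2 (compile-peg #'e2)])
         #'(lambda (s i)
             (or (p1 s i) (p2 s i))))]
      [(-local ([g e]) b)
       ;; a PEG variable becomes a recursively bound Racket procedure
       (with-syntax ([pe (compile-peg #'e)]
                     [pb (compile-peg #'b)])
         #'(letrec ([g pe]) pb))]
      [name
       (identifier? #'name)
       #'(lambda (s i) (name s i))])))

;; Embedding macro: compiles the PEG and applies the resulting
;; matcher to the value of the Racket subexpression.
(define-syntax (parse stx)
  (syntax-case stx ()
    [(_ peg str-expr)
     (with-syntax ([matcher (compile-peg #'peg)])
       #'(let ([str str-expr])
           (unless (string? str)
             (raise-argument-error 'parse "string?" str))
           (matcher str 0)))]))

(parse (-local ([as (-or (-seq (-char #\a) as) -eps)])
         (-seq as (-char #\b)))
       (string-append "aa" "ab"))
```

The string check in parse is a small instance of the conversion work an embedding macro performs at the boundary between Racket values and the DSL.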

1.5 Consequences

1.6 Implementation details

1.7 Known uses

1.8 Related patterns

Bibliography

[peg] Bryan Ford, “Parsing expression grammars: a recognition-based syntactic foundation,” POPL, 2004. https://doi.org/10.1145/964001.964011