Abstract
A new design methodology is introduced, with some examples of building a hierarchy
of Domain Specific Languages on top of Scheme.
methodology is different: it strongly encourages the use of all possible
programming technologies and the invention of the “impossible” ones.

2 Domain specific languages

Below I outline the proposed methodology.

Any problem domain is best expressed in a language (mathematical, programming,
natural, ...) specially designed for it. In most cases there should be one entity
in the language for every entity in the problem domain. For example, if the
problem domain is the recognition of syntax constructions in a character stream,
the Domain Specific Language should contain characters and character sets as
primary entities, and automata constructions for expressing syntax. That is
enough: the language of regular expressions has been designed. It is hard to
believe that anybody will ever invent anything better for this purpose than this
“most optimal” DSL.

If a problem domain is already specified as an algebra, we do not even have to
design the DSL: it will be this algebra itself, galvanised with some underlying
computational semantics. This is the way SQL was born. If the problem domain is
3D graphics, linear algebra and stereometry should be used. All the languages and
data formats dedicated to 3D contain subsets of these formal theories.

As stated in [1],

    “The object of a DSL-based software architecture is to minimise the
    semantic distance between the system’s specification and its
    implementation.”

3 Core language

For any problem it is convenient to have a language that best fits it.
Specialized languages already exist for some common problems. But what to do if
none is available? The answer is trivial: implement it. Implementation of domain
specific languages is not very tricky.

The approach I describe here is based on metaprogramming techniques. It requires
a so-called Core Language, on top of which we will build a hierarchy of our
domain specific languages. The Core Language should possess the following
properties:

• True macros. That is, we must have access to a complete programming language
  (preferably the same as the host language, though it may be a different one)
  inside macro definitions. Macros should be real programs that can do anything
  that programs written in the host language can do. Macros produce code in the
  host language, either as text or directly as an abstract syntax tree.

• True runtime eval. It must be possible to evaluate programs that are generated
  at run time. The evaluated language can differ from the host language, but it
  is better if it is the same one.

• Turing-completeness. This should be a real programming language, equivalent in
  its expressive power to the “general purpose” languages.

• Simplicity. It is an extensible core and should not contain any unnecessary
  complexity that can be added later by a user who really needs it.

• A comprehensive and easy-to-use data type system. If the type system is well
  suited for expressing any possible abstract syntax tree, the language fits this
  requirement.

On top of the Core Language we have to build the functionality needed to
implement programming languages. This includes lexing, parsing, and intermediate
languages that fit computational models different from the model of the Core
Language (e.g., if the core language is imperative or eager functional, we will
need a graph reduction engine to implement lazy functional DSLs, a term
unification engine to implement logic languages, and a stack machine if we have
to go to lower levels). The
Core Language enriched with this “Swiss army knife” for programming language
development then becomes a major tool for any project.

4 New methodology

The development process must fit in the following chain:

• divide the problem into sub-problems, possibly using some object oriented
  design techniques, or whatever fits better.

• formalize each sub-problem.

• implement the Domain Specific Language after this formalization, using the
  Core Language and other DSLs with the same semantics.

• solve the problem using the best possible language.

This way any project will grow into a tree (hierarchy) of domain specific
languages. Any language is a subset or a superset of another language in the
hierarchy (or, perhaps, a combination of several languages), and the amount of
coding needed for a new language is quite small once we have a deep and
comprehensive hierarchy.

A development team working within this methodology should consist of at least
one specialist who maintains this hierarchy, an architect who formalizes
problems, and a number of coders who specialize in particular problem domains.
The coders need not even be programmers at all: they just have to know their
domains well and work with them in terms that are as close as possible to the
native problem domain terminology. For example, an HTML designer will be happy
working with HTML-like tags for his templates (that is why JSP custom tags are so
popular); a mathematician will find a language modelled after the standard
mathematical notation intuitive (for this reason Wolfram Mathematica is so
popular among non-programmers); a game script writer will work in a language
expressing characters, their properties and action rules, stating rather than
programming. This list can be continued indefinitely.

5 Scheme example

A good example of a practical Core Language is Scheme (with the addition of
Common Lisp-style macros). It uses S-expressions as an AST, and composing
S-expressions is very natural. S-expressions are good enough to represent any
possible AST (for example, XML is naturally represented as SXML). Scheme provides
a true runtime eval hosting the same language as at compile time. There exist
some practical and efficient Scheme implementations which provide performance
acceptable for most tasks and a good FFI, and thus integration with legacy
libraries.

Let us start by adding the functionality described above to Scheme. First of all
we will need parsing: not all of our team members are fond of parentheses, so we
have to implement many complicated syntaxes. The most natural way for a
functional programming language is to implement a set of parsing combinators for
building recursive descent parsers (mostly LL(1), but this is not as fixed a
limit as LALR(1) is for Yacc-like automata generators).

Of course we will use metaprogramming wherever possible. All the parsers should
be functions which consume a list of tokens (e.g. characters) as input and return
the result in the following form:

    ((RESULT anyresult) unparsed-input-rest)

or

    ((FAIL reason) input)

To access the parsing result we will provide the following macros:

    (define-macro (success? r)
      `(not (eq? (caar ,r) 'FAIL)))

If we are sure that we have some result, we will use the following macro to
extract it (otherwise, it returns the fail message):

    (define-macro (result r)
      `(cdar ,r))
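As a rough illustration of this result convention, the sketch below writes the
two accessors as plain procedures instead of macros and adds one primitive
parser. The name pchar is hypothetical and is not taken from the original
implementation. The pair layout (header in the car, remaining input in the cdr)
is an assumption read off from the accessors above.

```scheme
;; Sketch only: procedure versions of the accessors, plus one primitive parser.
;; Assumed layout: (cons (cons 'RESULT values) remaining-input)
;;             or  (cons (list 'FAIL reason) original-input).

(define (success? r) (not (eq? (caar r) 'FAIL)))
(define (result r) (cdar r))

;; pchar is a hypothetical primitive: it accepts exactly the character c.
(define (pchar c)
  (lambda (l)
    (cond ((null? l) (cons (list 'FAIL "EMPTY") l))
          ((char=? (car l) c) (cons (list 'RESULT c) (cdr l)))
          (else (cons (list 'FAIL "unexpected character") l)))))

(define ok  ((pchar #\a) (string->list "abc")))
(define bad ((pchar #\a) (string->list "xyz")))

(success? ok)   ; #t
(result ok)     ; (#\a)
(cdr ok)        ; (#\b #\c), the unparsed rest of the input
(success? bad)  ; #f
(result bad)    ; ("unexpected character"), the failure reason
```

Because the result of a successful parse is always a list, two results can be
joined with append without inspecting them, which is convenient for sequencing
parsers.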
In any case, we can access the rest of the stream after the parsing pass:

    (define-macro (rest r)
      `(cdr ,r))

These macros could also be implemented as functions. But all the macros are
available in the context of macro definitions, while functions are not.

Almost all of the parsers should fail at the end of the input, so the following
safeguard macro will be extremely useful:

    (define-macro (parser p)
      `(lambda (l)
         (if (null? l) '((FAIL "EMPTY"))
             (,p l))))

Now this game becomes more interesting. Here is a very handy macro that nests a
sequence of applications into the form (m p1 (m p2 ... (m px pn))):

    (define-macro (pselect m p1 . po)
      (if (null? po) p1
          (let ((p2 (car po))
                (px (cdr po)))
            `(,m ,p1 (pselect ,m ,p2 ,@px)))))

A sequence parsing combinator with two arguments can be declared as follows:

    (define-macro (p+0 p1 p2)
      `(lambda (l)
         (let ((r1 (,p1 l)))
           (if (success? r1)
               (let ((r2 (,p2 (rest r1))))
                 (if (success? r2)
                     (cons (cons 'RESULT
                                 (append
                                   (result r1)
                                   (result r2)))
                           (rest r2))
                     (cons (list 'FAIL "p+" (car r2)) l)))
               r1))))

It is immediately turned into a sequence parsing combinator with an arbitrary
number of arguments:

    (define-macro (p+ p1 . po)
      `(pselect p+0 ,p1 ,@po))

The last definition looks surprisingly compact, thanks to the pselect macro. From
this stage on, the power of metaprogramming becomes more and more obvious.

Just for reference, here is the definition of a choice combinator:

    (define-macro (pOR0 p1 p2)
      `(lambda (l)
         (let ((r1 (,p1 l)))
           (if (success? r1)
               r1
               (,p2 l)))))

And its nested version is obvious:

    (define-macro (pOR p1 . po)
      `(pselect pOR0 ,p1 ,@po))

We will skip the rest of the combinator definitions and just show what we have
gained. For example, to define a floating point number recognizer, we can now use
this definition:

    (define parse-num
      (p+
        (pOR (pcsx-or (#\- #\+))
             parse-any)
        (pMANY pdigit)
        (pOR
          (p+ (pcharx #\.)
              (pMANY pdigit))
          parse-any)))

It looks like BNF, but is still too Schemish. This is already a Domain Specific
Language on top of Scheme, but it does not conform to the perfectionist
requirement. However, we can use this still imperfect parsing engine to implement
an intermediate regular expression language as a macro. Omitting the definitions,
we will show the previous recognizer implemented in a new way:

    (define parse-num
      (regexp
        ((#\- / #\+) / parse-any) +
        (pdigit *) +
        (("." + (pdigit *)) /
         parse-any)))

This new Domain Specific Language can be used in many ways. For example, we can
build a simple infix pre-calculator for constants:

    (define-macro (exp1 . v)
      (defparsers
        (letrec
          ((epr
             (let
               ((body
                  (regexp
                    (num :-> $0) /
                    (lst -> (aprs epr)))))
               (regexp
                 ((body + (SCM psym +) + epr)
                  :-> (list (+ $0 $2))) /
                 ((body + (SCM psym -) + epr)
                  :-> (list (- $0 $2))) /
                 ((body + (SCM psym *) + epr)
                  :-> (list (* $0 $2))) /
                 ((body + (SCM psym /) + epr)
                  :-> (list (/ $0 $2))) / body
                 ))))
          (car (result (epr v))))))

Then, wherever we want to calculate a numerical constant at compilation time, we
may use the exp1 macro:

    (exp1 5 + ((10 / 2) - (1 / 5)))

This language does not look like Scheme any more. And we can go even further,
implementing a Pascal-like (or Rlisp-like) language on top of Scheme, using the
same regexp macro to describe both a lexer and a parser, and then compiling the
resulting code to the underlying Scheme.

    (pasqualish
    "
    function fac(x)
    begin
      if (x > 0) then
        x*fac(x - 1)
      else 1;
    end
    ")

No more parentheses that frighten non-Lisp programmers so much! Now even Pascal
programmers can use Scheme.

The code samples above demonstrate some of the techniques available in this
approach. The complete implementation can be downloaded from [4]. It is possible
to produce not only languages with a computational model close to that of Scheme
(eager, dynamically typed functional languages with imperative features), but any
possible languages, by providing small intermediate DSLs which simulate
alternative computational models. For those who need very low-level power it is
possible to produce intermediate code in C (for example, the Bigloo Scheme [3]
implementation allows C code to be included when compiling through the C
backend). For implementing complicated runtime models it is easy to produce an
intermediate Forth-like DSL on top of Scheme and then use the metaprogramming
powers of both Scheme and Forth.

6 Alternatives

To make the picture complete, it is necessary to mention other possible choices
for the Core Language. The popular programming language C++ could become such a
Core Language relatively easily. It has a Turing-complete macro system which,
unfortunately, features a language different from the host language (so only
one-stage preprocessing is possible). It lacks a good type system, but one could
be simulated on top of the existing low-level features. There exist some
implementations of recursive descent parsing combinators for C++ (e.g., the Boost
Spirit library [2]), implementations of functional programming (e.g., Boost
Lambda [2]), and even Lisp compilers on top of the C++ template system. Runtime
evaluation is available in different ways: using pluggable scripting languages
other than C++, or using the C++ interpreter [14]. An interesting approach is
described in [13].

Another choice is Forth. It is a powerful metalanguage, but the core language
remains too low-level and unsafe. Forth is often the only choice available for
embedded systems with limited resources.

It is worth mentioning modern experimental extensions for strictly typed
functional languages: Template Haskell [12] and MetaOCaml [11]. Both of them
conform well to all of the Core Language requirements. Objective
Caml also provides one-stage metaprogramming using a sophisticated preprocessing
engine, CamlP4. OCaml is also quite good for implementing interpreters using the
closure-based technique. Some examples can be found in [7].

No doubt Common Lisp would also be a very good platform, since it shares almost
all the features of Scheme with the exception of simplicity. The killer feature
of Common Lisp is advanced runtime compilation in some of the major
implementations (CMU CL [8] and its descendant SBCL [9] are good examples), and
defmacro is guaranteed to work in all the implementations available, which is a
great advantage over Scheme.

For relatively small projects Tcl [10] would be a good choice. Its computational
model is based on rewrites (and the primary data structures are just strings of
text), which makes it an extremely powerful metaprogramming tool. The JavaScript
language is also based on rewrite semantics, so it could be used for
metaprogramming too.

7 Conclusion

The idea of metaprogramming is not something esoteric. Metaprogramming is used
widely by commercial programmers; they just do not realize it. The methodology
proposed in this paper is an attempt to uncover all the hidden power of the
metaprogramming techniques available.

The Scheme example presented above is part of a working project, which has
already proved the supremacy of this approach. A subset of the Domain Specific
Languages hierarchy designed for the WWW data acquiring project is shown in
Fig. 1.

The subject discussed requires future research and practical validation, whose
final result may be a completely formalized, mathematically strict description of
the methodology and a Core Language which will best fit this methodology.
References
[1] Diomidis Spinellis. Reliable software implementation using domain specific languages. In
G. I. Schuëller and P. Kafka, editors, Proceedings ESREL ’99 — The Tenth European
Conference on Safety and Reliability, pages 627–631, Rotterdam, September 1999. ESRA,
VDI, TUM, A. A. Balkema //
[draft] http://www.dmst.aueb.gr/dds/pubs/conf/1999-ESREL-SoftRel/html/dsl.html
[Figure 1: A sample DSLs hierarchy subset for the Web crawler project. Node
labels: Core language, Parsing combinators, Rule engine, SQL templates.]