
Using a hierarchy of Domain Specific Languages in complex software systems design


V. S. Lugovsky <VSLougovski@lbl.gov>
October 23, 2018
arXiv:cs/0409016v1 [cs.PL] 9 Sep 2004

Abstract
A new design methodology is introduced, with some examples of building a
hierarchy of Domain Specific Languages on top of Scheme.

1 Introduction

Programs that write programs that write programs (...) Too complicated? A
hacker's technique which cannot be applied to “real world” problems? This
is exactly how IT industry specialists think about metaprogramming. And
this is a completely wrong notion!

Metaprogramming is the only known way to reduce complexity significantly.
In some areas “programs that write programs” are accepted by the industry
because the corresponding handwritten code would be enormously complex:
regular expressions and lexer and parser generators, to name a few. Code
wizards and templates in popular “integrated development environments”
are also widely used.

But this does not help at all in the overall recognition of the
methodology. The industry's most beloved and most widely buzzworded
language, Java, does not have even such a rudimentary preprocessor as C
does. Very few C++ programmers have an idea of how to use templates; they
just use the STL without any understanding of the true source of its power.

Even in the enlightened world of Lisp programming the misunderstanding is
surprisingly wide: almost all Lisp dialects and Scheme implementations have
problems with macros (not so many people are using them), and even the
current Scheme standard, R5RS, contains only hygienic macros, which can
hardly be recognized as “true” macros because they hide access to the host
language.

This situation looks like a paradox. On the one hand, industry uses
metaprogramming ideas and tools, and it is easy to imagine how it would
suffer without them. On the other hand, industry does not want to hear
anything related to metaprogramming. It does not want people inventing new
programming languages: plenty of industry coders barely use even one
language, and IT managers believe, without any reason, that they cannot be
taught to use more [6].

Industry prefers to “re-invent the wheel” and to express any sort of
complexity in the form of libraries for static, steady languages. For some
strange reason, learning complicated libraries for a language which barely
fits the needs of the problem domain is preferred to learning a small new
language specifically designed for it.

In this paper I advocate the metaprogramming approach as the major design
methodology for complex systems. Does this sound like yet another “silver
bullet” invention? There were many methodologies claiming to solve all
possible problems of mankind: RUP, eXtreme Programming, etc. Why do we need
another one? Simply because the previous approaches did not succeed. They
were too tied to particular programming technologies (mostly OOP
varieties), which are definitely not “silver bullets”. Metaprogramming
methodology is different: it strongly encourages the use of all possible
programming technologies, and the invention of the “impossible” ones.

2 Domain specific languages

Below I provide an outline of the proposed methodology.

Any problem domain is best expressed using a language (mathematical,
programming, natural, ...) specially designed for it. In most cases there
should be one entity in the language for every entity in the problem
domain. For example, if the problem domain is the recognition of syntactic
constructions in a character stream, the Domain Specific Language should
contain characters and character sets as primary entities, and automata
constructions for expressing syntax. That is enough: the language of
regular expressions has been designed. It is hard to believe that anybody
will ever invent anything better for this purpose than this “most optimal”
DSL.

If a problem domain is already specified as an algebra, we do not even have
to design the DSL: it will be this algebra itself, galvanised with some
underlying computational semantics. This is the way SQL was born. If the
problem domain is 3D graphics, linear algebra and stereometry should be
used. All the languages and data formats dedicated to 3D contain subsets of
these formal theories.

As stated in [1],

    “The object of a DSL-based software architecture is to minimise the
    semantic distance between the system's specification and its
    implementation.”

3 Core language

For any problem it is convenient to have a language that best fits it.
Specialized languages already exist for some common problems. But what to
do if none is available? The answer is trivial: implement it.
Implementation of domain specific languages is not very tricky. The
approach I will describe here is based on metaprogramming techniques. It
requires a so-called Core Language, on top of which we will build a
hierarchy of our domain specific languages. The Core Language should
possess the following properties:

• True macros. That is, we must have access to a complete programming
  language (preferably the same as the host language, but possibly a
  different one) inside the macro definitions. Macros should be real
  programs which can do anything that programs written in the host
  language can do. Macros produce code in the host language, in the form
  of text or directly as an abstract syntax tree.

• True runtime eval. It must be possible to evaluate programs that are
  generated at runtime. The evaluated language can be different from the
  host language or, better, the same one.

• Turing-completeness. This should be a real programming language,
  equivalent in its expressive power to the “general purpose” languages.

• Simplicity. It is an extensible core and should not contain any
  unnecessary complexity that can be added later by a user who really
  needs it.

• A comprehensive and easy-to-use system of data types. If the type system
  is well suited to expressing any possible abstract syntax tree, the
  language fits this requirement.

On top of the Core Language we have to build the functionality needed to
implement programming languages: lexing, parsing, and intermediate
languages that fit computational models different from the model of the
Core Language (e.g., if the core language is imperative or eager
functional, we will need a graph reduction engine to implement lazy
functional DSLs, a term unification engine to implement logical languages,
and a stack machine if we have to go to lower levels).
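
As a hedged illustration of one such intermediate layer, the stack machine
mentioned above can be sketched in a few lines of portable Scheme. The
procedure name run-stack and the instruction names add, mul and dup are
assumptions made for this sketch only; they are not taken from the paper's
implementation.

```scheme
;; Run a list of instructions against a stack (a list, top element first).
;; Numbers are pushed; the symbols add, mul and dup are the only operations.
;; This sketch does no underflow checking.
(define (run-stack program)
  (let loop ((code program) (stack '()))
    (cond ((null? code) stack)
          ((number? (car code))
           (loop (cdr code) (cons (car code) stack)))
          ((eq? (car code) 'add)
           (loop (cdr code)
                 (cons (+ (cadr stack) (car stack)) (cddr stack))))
          ((eq? (car code) 'mul)
           (loop (cdr code)
                 (cons (* (cadr stack) (car stack)) (cddr stack))))
          ((eq? (car code) 'dup)
           (loop (cdr code) (cons (car stack) stack)))
          ;; Unknown instructions are reported in the tagged-list style.
          (else (list 'FAIL "unknown instruction" (car code))))))

(run-stack '(2 3 add dup mul)) ; => (25)
```

Because a program for this machine is an ordinary list, it can be produced
by a macro or by another DSL just as easily as it can be written by hand.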

The Core Language enriched with this “Swiss army knife” for programming
language development then becomes a major tool for any project.

4 New methodology

The development process must fit the following chain:

• divide the problem into sub-problems, possibly using some object-oriented
  design techniques, or whatever fits better.

• formalize each sub-problem.

• implement a Domain Specific Language after this formalization, using the
  Core Language and other DSLs with the same semantics.

• solve the problem using the best possible language.

This way any project will grow into a tree (hierarchy) of domain specific
languages. Any language is a subset or a superset of another language in
the hierarchy (or, maybe, a combination of several languages), and the
amount of coding for a new language, if we already have a deep and
comprehensive hierarchy, is quite small.

A development team working within this methodology should consist of at
least one specialist who maintains the hierarchy, an architect who
formalizes problems, and a number of coders who specialize in particular
problem domains. The coders need not be programmers at all: they just have
to know their domains well and operate in terms that are as close as
possible to the native terminology of the problem domain. For example, an
HTML designer will be happy operating with HTML-like tags for his templates
(that is why JSP custom tags are so popular); a mathematician will find a
language modelled after standard mathematical notation intuitive (for this
reason Wolfram Mathematica is so popular among non-programmers); a game
script writer will operate a language expressing characters, their
properties and action rules, stating rather than programming. This list can
be continued indefinitely.

5 Scheme example

A good example of a practical Core Language is Scheme (with the addition of
Common Lisp-style macros). It uses S-expressions as an AST, and composing
S-expressions is very natural. S-expressions are good enough to represent
any possible AST (for example, XML is naturally represented as SXML).
Scheme provides a true runtime eval hosting the same language as at compile
time. There exist some practical and efficient Scheme implementations which
provide performance acceptable for most tasks and a good FFI, and thus
integration with legacy libraries.

Let us start by adding the functionality described above to Scheme. First
of all we will need parsing: not all of our team members are fond of
parentheses, so we have to implement many complicated syntaxes. The most
natural way for a functional programming language is to implement a set of
parsing combinators for building recursive descent parsers (mostly LL(1),
but this is not such a fixed limit as LALR(1) is for Yacc-like automata
generators).

Of course we will use metaprogramming wherever possible. All the parsers
should be functions which consume a list of tokens (e.g. characters) as
input and return the result in the following form:

    ((RESULT anyresult) unparsed-input-rest)

or

    ((FAIL reason) input)

As the accessors show, each form is a pair: its car is the list tagged
RESULT or FAIL, and its cdr is the input that remains.

To access the parsing result we will provide the following macros:

    (define-macro (success? r)
      `(not (eq? (caar ,r) 'FAIL)))

And if we are sure that we have some result, we will use the following
macro to extract it (otherwise, it will return the fail message):

    (define-macro (result r)
      `(cdar ,r))
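
To make the result convention concrete, here is a minimal sketch in
portable Scheme. It uses plain procedures instead of define-macro so that
it runs in implementations without that extension. The names ok?,
result-of and p-char are hypothetical and are not part of the paper's
code; ok? and result-of simply mirror the success? and result macros.

```scheme
;; Procedure counterparts of the success? and result macros.
;; The names ok? and result-of are hypothetical, chosen for this sketch.
(define (ok? r) (not (eq? (caar r) 'FAIL)))
(define (result-of r) (cdar r))

;; A primitive parser (hypothetical helper): accept exactly the character c.
;; It takes a list of characters and returns a pair in the form shown above.
(define (p-char c)
  (lambda (l)
    (cond ((null? l) '((FAIL "EMPTY")))
          ((char=? (car l) c)
           (cons (list 'RESULT (car l)) (cdr l)))
          (else
           (cons (list 'FAIL "p-char" c) l)))))

(define r-ok  ((p-char #\a) (string->list "abc")))
(define r-bad ((p-char #\a) (string->list "xyz")))

(ok? r-ok)         ; => #t
(result-of r-ok)   ; => (#\a)
(ok? r-bad)        ; => #f
(result-of r-bad)  ; => ("p-char" #\a), the fail message
```

The payload is kept as a list, which lets the results of consecutive
parsers be concatenated with append.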

In any case, we can access the rest of the stream after the parsing pass:

    (define-macro (rest r)
      `(cdr ,r))

These macros could also be implemented as functions. But macros are
available in the context of macro definitions, while functions are not.

Almost all of the parsers should fail at the end of the input, so the
following safeguard macro will be extremely useful:

    (define-macro (parser p)
      `(lambda (l)
         (if (null? l) '((FAIL "EMPTY"))
             (,p l))))

Now this game becomes more interesting. Here is a very handy macro that
nests a sequence of applications into the form
(m p1 (m p2 ... (m px pn))):

    (define-macro (pselect m p1 . po)
      (if (null? po) p1
          (let ((p2 (car po))
                (px (cdr po)))
            `(,m ,p1 (pselect ,m ,p2 ,@px)))))

A sequence parsing combinator with two arguments can be declared as
follows:

    (define-macro (p+0 p1 p2)
      `(lambda (l)
         (let ((r1 (,p1 l)))
           (if (success? r1)
               ;; run p2 on the input left over by p1
               (let ((r2 (,p2 (rest r1))))
                 (if (success? r2)
                     (cons (cons 'RESULT
                                 (append
                                  (result r1)
                                  (result r2)))
                           (rest r2))
                     (cons (list 'FAIL "p+" (car r2)) l)))
               r1))))

And it is immediately turned into a sequence parsing combinator with an
arbitrary number of arguments:

    (define-macro (p+ p1 . po)
      `(pselect p+0 ,p1 ,@po))

The last definition looks surprisingly compact, thanks to the pselect
macro. From this stage on, the power of metaprogramming becomes more and
more obvious.

Just for reference, here is the definition of a choice combinator:

    (define-macro (pOR0 p1 p2)
      `(lambda (l)
         (let ((r1 (,p1 l)))
           (if (success? r1)
               r1
               (,p2 l)))))

And its nested version is obvious:

    (define-macro (pOR p1 . po)
      `(pselect pOR0 ,p1 ,@po))

We will skip the rest of the combinator definitions and just show what we
have gained. For example, to define a floating point number recognizer, we
can now use this definition:

    (define parse-num
      (p+
       (pOR (pcsx-or (#\- #\+))
            parse-any)
       (pMANY pdigit)
       (pOR
        (p+ (pcharx #\.)
            (pMANY pdigit))
        parse-any)))

It looks like BNF, but is still too Schemish. This is already a Domain
Specific Language on top of Scheme, but it does not satisfy a
perfectionist. However, we can use this still imperfect parsing engine to
implement an intermediate regular expression language as a macro. Omitting
the definitions, we will show the previous recognizer implemented in the
new way:

    (define parse-num
      (regexp
       ((#\- / #\+) / parse-any) +
       (pdigit *) +
       (("." + (pdigit *)) /
        parse-any)))

This new Domain Specific Language can be used in many ways. For example, we
can build a simple infix pre-calculator for constants:

    (define-macro (exp1 . v)
      (defparsers
        (letrec
            ((epr
              (let
                  ((body
                    (regexp
                     (num :-> $0) /
                     (lst -> (aprs epr)))))
                (regexp
                 ((body + (SCM psym +) + epr)
                  :-> (list (+ $0 $2))) /
                 ((body + (SCM psym -) + epr)
                  :-> (list (- $0 $2))) /
                 ((body + (SCM psym *) + epr)
                  :-> (list (* $0 $2))) /
                 ((body + (SCM psym /) + epr)
                  :-> (list (/ $0 $2))) / body
                 ))))
          (car (result (epr v))))))

And then, wherever we want to calculate a numerical constant at compile
time, we may use the exp1 macro:

    (exp1 5 + ((10 / 2) - (1 / 5)))

This language does not look like Scheme any more. And we can go even
further, implementing a Pascal-like (or Rlisp-like) language on top of
Scheme, using the very same regexp macro to describe both a lexer and a
parser, and then compiling the resulting code to the underlying Scheme.

    (pasqualish
     "
    function fac(x)
    begin
      if (x > 0) then
        x*fac(x - 1)
      else 1;
    end
    ")

No more parentheses that frighten non-Lisp programmers so much! Now even
Pascal programmers can use Scheme.

The code samples above demonstrate some of the techniques available in this
approach. The complete implementation can be downloaded from [4]. It is
possible to produce not only languages with a computational model close to
that of Scheme (eager, dynamically typed functional languages with
imperative features), but any possible languages, by providing small
intermediate DSLs which simulate alternative computational models. For
those who need very low-level power it is possible to produce intermediate
code in C (for example, the Bigloo Scheme implementation [3] allows C code
to be included when compiling through the C backend). For implementing
complicated runtime models it is easy to produce an intermediate Forth-like
DSL on top of Scheme and then use both Scheme and Forth metaprogramming
powers.

6 Alternatives

To make the picture complete, it is necessary to mention other possible
choices for the Core Language. The popular programming language C++ could
become such a Core Language relatively easily. It has a Turing-complete
macro system which, unfortunately, features a language different from the
host language (so only one-stage preprocessing is possible). It lacks a
good type system, but one could be simulated on top of the existing
low-level features. There exist some implementations of recursive descent
parsing combinators for C++ (e.g., the Boost Spirit library [2]),
implementations of functional programming (e.g., Boost Lambda [2]), and
even Lisp compilers on top of the C++ template system. Runtime evaluation
is available in different ways: using pluggable scripting languages other
than C++, or using a C++ interpreter [14]. An interesting approach is
described in [13].

Another choice is Forth. It is a powerful metalanguage, but the core
language remains too low-level and unsafe. Forth is often the only choice
available for embedded systems with limited resources.

It is worth mentioning modern experimental extensions for strictly typed
functional languages: Template Haskell [12] and MetaOCaml [11]. Both of
them conform well to all of the Core Language requirements. Objective
Caml also provides one-stage metaprogramming using a sophisticated
preprocessing engine, CamlP4. OCaml is also quite good for implementing
interpreters using the closure-based technique. Some examples can be found
in [7].

No doubt Common Lisp would also be a very good platform, since it shares
almost all its features with Scheme, with the exception of simplicity. The
killer feature of Common Lisp is advanced runtime compilation in some of
the major implementations (CMU CL [8] and its descendant SBCL [9] are good
examples), and defmacro is guaranteed to work in all available
implementations, which is a great advantage over Scheme.

For relatively small projects Tcl [10] would be a good choice. Its
computational model is based on rewrites (and its primary data structures
are just strings of text), which makes it an extremely powerful
metaprogramming tool. The JavaScript language is also based on rewrite
semantics, so it could be used for metaprogramming too.

7 Conclusion

The idea of metaprogramming is not something esoteric. Metaprogramming is
used widely by commercial programmers; they just do not realize it. The
methodology proposed in this paper is an attempt to uncover all the hidden
power of the available metaprogramming techniques.

The Scheme example presented above is part of a working project, which has
already proved the supremacy of this approach. A subset of the Domain
Specific Languages hierarchy designed for the WWW data acquisition project
is shown in Fig. 1.

The subject discussed requires further research and practical validation,
whose final result may be a completely formalized, mathematically strict
description of the methodology and a Core Language which best fits it.

References
[1] Diomidis Spinellis. Reliable software implementation using domain specific languages. In
G. I. Schuëller and P. Kafka, editors, Proceedings ESREL ’99 — The Tenth European
Conference on Safety and Reliability, pages 627–631, Rotterdam, September 1999. ESRA,
VDI, TUM, A. A. Balkema //
[draft] http://www.dmst.aueb.gr/dds/pubs/conf/1999-ESREL-SoftRel/html/dsl.html

[2] The Boost project // http://www.boost.org/

[3] The Bigloo Practical Scheme implementation // http://www.bigloo.org/

[4] V. S. Lugovsky, DSLEngine project home // http://dslengine.sourceforge.net/

[5] P. Graham, The Hundred-Year Language // http://www.paulgraham.com/hundred.html

[6] P. Graham, The Python Paradox // http://www.paulgraham.com/pypar.html

[7] V. S. Lugovsky, publications list // http://ontil.ihep.su/~vsl

[8] CMU Common Lisp // http://www.cons.org/cmucl/

[9] Steel Bank Common Lisp // http://sbcl.sourceforge.net/

[10] Tcl programming language resource // http://tcl.activestate.com/

[11] MetaOCaml project home // http://www.metaocaml.org/

[12] T. Sheard, S. P. Jones, Template metaprogramming for Haskell //
http://research.microsoft.com/~simonpj/papers/meta-haskell/

[13] Tempo project home // http://compose.labri.fr/prototypes/tempo/

[14] CINT project home // http://root.cern.ch/root/Cint.html

[Diagram: DSL hierarchy with the Core language at the top. Nodes: Parsing
combinators, Stack machine, Lexer, Graph machine, Parser generator,
Unification machine, Templates language, Forth-like, Regular expressions,
Data acquisition regexps, Pascal-like, Rule engine, SQL templates.]

Figure 1: A sample DSLs hierarchy subset for the Web crawler project.
