
Domain Specific Languages

And Haskell
Don Stewart | LACSS, Santa Fe, NM | Oct 14, 2009

© 2009 Galois, Inc. All rights reserved.

Two Points To Take Home

1. Embedded domain specific languages (EDSLs)

are an inexpensive way to improve portability,
maintainability, productivity, and correctness of
new scientific code

2. Haskell is a great programming language for

EDSLs, and also for exploring parallel
programming models – via STM, aggressive
speculation and nested data parallelism.


Part 1:
A Way Forward:
Embedded Domain Specific Languages


Change and Growth
• New architectures are appearing
• Increasing complexity (GPUs, The Cell)
– unusual programming models
– unusual memory hierarchies
– unusual hybrid architectures
– massive compute power
• How do we write code that is going to be portable,
maintainable, correct and fast?
• How to bridge the “programmability gap”? How do
we experiment with these machines?
Domain Specific Languages (DSLs)
• DSLs are:
– Small languages with a restricted programming
model aimed at a particular problem domain
– Work at semantic level of problem domain
– Not trying to do everything at once
• A relatively new, and very hot, approach to
tackling unusual problems and managing complexity
• Emerged from the programming language community
DSLs : the advantages

Focused to a particular problem domain, so

• Better productivity: work in domain abstractions
• Higher level – easier to maintain
• Restricted semantics: easier to optimize and verify
– Domain-level knowledge feeds new optimizations
• Usually small and declarative – not tied to a
particular hardware model – so easier to port
• Encourages explorative coding!


A nice example: Cryptol

A DSL for cryptography built by Galois

• Emphasis on performance + correctness
• High level: target users are crypto experts
• High level: not tied to any machine model, so:
– Portable to VHDL/FPGA/C/Haskell/Interpreter
• Restricted programming model enables automatic
equivalence checking
• Domain-specific optimizations: so very, very fast
– “128 bit AES targetting 100Gbps via Async FPGAs”
Domain-specific knowledge made visible
-- AES “Rounds” function in Cryptol
Rounds (State, (initialKey, rndKeys, finalKey)) = final
  where {
    istate = State ^ initialKey;
    rnds = [istate] # [| Round (state, key)   -- stream comprehension
                      || state <- rnds
                      || key <- rndKeys |];
    final = FinalRound (last rnds, finalKey);
  };


Making DSLs easier to build
• Designing and building compilers is relatively time-consuming
– Especially if we want good code generators
– Good type systems
– Good optimizers
• Embedded DSLs are the way forward
– Embed your DSL in an existing language – save $$
– Reuse its syntax, compiler, libs, tools, type system
– But, use a library for code generation
– And write your own optimizations
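To make the embedding idea concrete, here is a minimal sketch of a deeply embedded DSL in Haskell (the names `Expr`, `optimize` and `eval` are illustrative, not from any particular library): overloading supplies the surface syntax, the library builds an AST, and custom passes optimize it before interpretation or code generation.

```haskell
-- A tiny deeply embedded DSL: users write ordinary arithmetic,
-- but the program is captured as an AST we control.
data Expr
  = Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Show, Eq)

-- Overloading: numeric literals and (+), (*) build AST nodes.
instance Num Expr where
  fromInteger = Lit . fromInteger
  (+)         = Add
  (*)         = Mul
  negate e    = Mul (Lit (-1)) e
  abs         = error "abs: not needed in this sketch"
  signum      = error "signum: not needed in this sketch"

-- A domain-specific optimization pass over the AST.
optimize :: Expr -> Expr
optimize (Mul (Lit 1) e) = optimize e
optimize (Mul e (Lit 1)) = optimize e
optimize (Add (Lit 0) e) = optimize e
optimize (Add e (Lit 0)) = optimize e
optimize (Add a b)       = Add (optimize a) (optimize b)
optimize (Mul a b)       = Mul (optimize a) (optimize b)
optimize e               = e

-- One backend: an interpreter. A C or LLVM code generator
-- would walk the same AST.
eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b
```

A user writes `1 * (2 + 0) + 3` in plain Haskell syntax, yet the library sees an `Expr` tree it can simplify and then compile or interpret.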
Good host languages for EDSLs

Good host languages

– Need to support overloading (numbers, strings)
– Can build ASTs from regular syntax
– Need a rich type system (embed the domain
language's types in the host language's types)
– Should have a good toolchain (doc tools, profilers)
– Should have good code generation libraries
• C, LLVM, Asm, Haskell, ...

Haskell is a good host language!


1. The “accelerate” multi-dim array EDSL

“accelerate” - Haskell EDSL for multi-dimensional

array processing, targeting data-parallel hardware
– Collective operations on multi-dim arrays
• Targeting massive data parallelism
– Restricted control flow and types
• Widely portable, and matches what the GPU supports
– Generative code approach based on templates
• Matches hand-specialization techniques
• Not tied to any hardware – guaranteed portability

import Data.Array.Accelerate
-- EDSL code for dot product

dotp :: Vector Float -> Vector Float -> Acc (Scalar Float)

dotp xs ys =
  let xs' = use xs                       -- marshal data to the GPU
      ys' = use ys
  in  fold (+) 0 (zipWith (*) xs' ys')   -- the GPU computation

• See “Haskell Arrays, Accelerated (Using GPUs)”,

Chakravarty et al, Haskell Implementors Workshop,
2009.
2. An EDSL for SIMD-parallel algorithms
• Programming model: SIMD-parallel algos
– Target users: mathematicians (!)
– Target backends: CPU, the Cell
– Emphasizes domain-specific optimization
– Emphasizes exploratory programming
– “Generates unusual call patterns”
– Generates C, Haskell, or fed into state-of-the-art
instruction scheduler
• Anand and Kahl, “A Domain-Specific Language for the
Generation of Optimized SIMD-Parallel Assembly Code”.
Bit-shift division: example
divShiftMA :: SPUType val
           => Integer -> Integer -> Integer -> Integer -> val -> val
divShiftMA p q s n v
  | s /= 0                       = mpya m v b
  | m' < 2 ^ 10 && m' > 0        = mpyui v m'
  | m' < 2 ^ 9  && m' > (-2 ^ 9) = mpyi v m'
  | otherwise                    = mpy v m
  where
    m' = (p * 2 ^ n + (q - 1)) `div` q   -- integer exponent and division
    m  = unwrds4 m'
    b  = unwrds4 $ (s * 2 ^ n) `div` q
3. BASIC (targets LLVM)
import Basic
main = runBASIC $ do
10 GOSUB 1000
100 LET I := INT(100 * RND(0))
200 PRINT "Guess my number:"
220 LET S := SGN(I-X)
230 IF S <> 0 THEN 300
240 FOR X := 1 TO 5
250 PRINT X*X;" You won!"
260 NEXT X
EDSLs: Summary
• DSLs increase abstraction, enabling new levels of
– Portability
– Verifiability
– Maintainability
– Without sacrificing performance
• Embedded DSLs are cheaper to construct
– Reuse significant resources of a compiler toolchain
• Haskell is a rich playground for EDSLs, with many
examples, in a wide range of domains
• More examples in the paper
Part 2:
Parallel Programming in Haskell


The Haskell Approach to Multicore

Two broad approaches to multicore programming

provided by Haskell

• Deterministic parallelism
1. Hand-annotated speculation + work stealing queues
2. Nested data parallelism
• Concurrency for multicore
3. Very lightweight threads
4. Communication via MVars and transactional memory


Haskell and Parallelism: Why?
• Language reasons:
– Purity, laziness and types mean you
can find more parallelism in your code
– No specified execution order!
– Speculation and parallelism are safe.
• Purity provides inherently more parallelism
• A very high level language, but lots of static
type information for the optimizer
Haskell and Parallelism

• Custom multicore runtime: high-performance
threads are a primary concern –
thanks, Simon Marlow!
• Mature: 20 year code base, long term
industrial use, massive library system
• Ready to go


The GHC Runtime Model

• Multiple virtual CPUs

– Each virtual CPU has a pool of OS threads
– CPU-local spark pools for additional work
• Lightweight Haskell threads map onto OS threads:
many to one.
• Even lighter 'sparks' used for speculative work
• Automatic thread migration and load balancing
• Parallel, generational GC
• Transactional memory and MVars.
Approach 1. Parallel Strategies
Useful speculation built up from the `par` combinator:

a `par` b

• Creates a spark for 'a' – very cheap! A speculation “hint”

• Runtime sees chance to convert spark into a thread
• Which in turn may get run in parallel, on another core
• 'b' is returned
• No restrictions on what you can annotate – very
flexible approach to post-hoc parallelization
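As a sketch of what post-hoc annotation looks like (a hypothetical `parFib`; `par` and `pseq` are imported here from GHC.Conc, which the `parallel` package re-exports as Control.Parallel):

```haskell
import GHC.Conc (par, pseq)   -- Control.Parallel re-exports these

-- `par` sparks one recursive call while this thread evaluates the
-- other; `pseq` fixes the evaluation order so the spark has time to
-- be picked up. The cutoff guards granularity: tiny sparks cost more
-- to manage than they save.
parFib :: Int -> Int
parFib n
  | n < 20    = seqFib n
  | otherwise = a `par` (b `pseq` a + b)
  where
    a = parFib (n - 1)
    b = parFib (n - 2)

seqFib :: Int -> Int
seqFib n
  | n < 2     = n
  | otherwise = seqFib (n - 1) + seqFib (n - 2)
```

Compiled with `-threaded` and run with `+RTS -N`, sparks may become real parallel work; without those flags the program runs sequentially and produces identical results.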
Parallel Strategies: Programming Model
• Deterministic:
– Same results with parallel and sequential
– No races, no errors
– Good for reasoning: erase the `par` and get the
original program
• Cheap: sprinkle par as you like, then measure and refine
• Measurement is much easier with ThreadScope
• Strategies: combinators for common patterns
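Such a combinator can be sketched in a few lines (a hypothetical `parMap'`; the real library provides `parMap` in Control.Parallel.Strategies):

```haskell
import GHC.Conc (par, pseq)

-- Spark the evaluation of each element, then force the list spine
-- with `pseq` so all sparks are created up front rather than lazily.
parMap' :: (a -> b) -> [a] -> [b]
parMap' _ []       = []
parMap' f (x : xs) = y `par` (ys `pseq` (y : ys))
  where
    y  = f x
    ys = parMap' f xs
```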
New Tools : ThreadScope
• New thread profiling tool: ThreadScope


Approach 2: Nested Data Parallelism
We can write a lot of parallel programs with strategies or
explicit threads, however

• par/seq are very light, but granularity is hard to get right

• forkIO/MVar/STM are more precise, but more complex
• Trade offs between abstraction and precision

Another way to write parallel Haskell programs:

• nested data parallelism


Nested Data Parallelism
• If your program is expressible as a nested data
parallel program
– The compiler will flatten it to a flat data parallel one
– No worrying about explicit threads or synchronization
– Clear cost model (unlike `par` speculation)
– Good locality of data/easier partitioning of work
• Looks like a good model for array and GPU
programming (see Chakravarty's 'accelerate')
• Good speedups with many hardware threads (T2)
Approach 3: Explicit lightweight threads
• Lightweight threads are preemptively scheduled
(10 … 10M+ Haskell threads possible)
• Non-deterministic scheduling: random interleaving
• When the main thread terminates, all threads
terminate (“daemonic threads”)
• Threads may be preempted when they allocate
• Communicate via messages or shared memory
• See published benchmarks for details
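A small sketch of the model (the function `spawnAll` is illustrative): forking ten thousand Haskell threads is routine, and an MVar serves as the rendezvous point.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Spawn n lightweight threads; each delivers one value through a
-- shared MVar. putMVar blocks while the MVar is full, so the main
-- thread's takeMVar calls drain the results one at a time.
spawnAll :: Int -> IO Int
spawnAll n = do
  box <- newEmptyMVar
  mapM_ (\i -> forkIO (putMVar box i)) [1 .. n]
  results <- mapM (const (takeMVar box)) [1 .. n]
  return (sum results)
```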
Communicating between threads
• We need to communicate between threads
• We need threads to wait on results
• Use shared, mutable synchronizing variables to coordinate

Synchronization is achieved via async messages,
MVars or STM
By far the most popular concurrency technique, and
maps onto multicore well
See Simon Marlow's publications for lots of benchmarks
Approach 4: Transactional Memory
• Optimistic: Each atomic block appears to run in
complete isolation
• The runtime publishes modifications to shared
variables to all threads, or
• restarts the transaction that suffered contention
• You have the illusion you're the only thread
• Composable, deadlock free communication
• Used in concurrency-heavy systems at Galois
• Slower than MVars, but useful and can be tuned
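A sketch of the composability this buys (a hypothetical `transfer`, using the stm package that ships with GHC): two updates that must happen together are expressed as one STM action, and `atomically` runs them in isolation.

```haskell
import Control.Concurrent.STM

-- Debit one account and credit another as a single transaction.
-- Neither update is visible to other threads until both commit;
-- on contention the runtime simply restarts the block.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to n = do
  modifyTVar' from (subtract n)
  modifyTVar' to   (+ n)
```

Because `transfer` is an ordinary STM value, larger transactions compose from it, e.g. `atomically (transfer a b 10 >> transfer b c 5)`, with no locks and no deadlock.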
Parallelism and Haskell: Summary
• More information: Google for:
– “Parallel Programming in Haskell: A Reading List”
• Sophisticated, fast runtime
– 1. Sparks and parallel strategies
– 2. Nested data parallel arrays
– 3. Explicit threads + MVars and shared memory
– 4. Transactional memory

• Available in a widely used open source language

About Galois
Research and tech transition company
Just over a decade old
Specialists in
– Compiler and language engineering
– Domain-specific languages
– Formal methods
– High assurance systems
– High performance cryptography
Clients include DOE, DARPA, DHS, DOD and IC
Looking to collaborate!
