
Engineering Large Projects in Haskell

A Decade of Functional Programming at Galois

Don Stewart | 2009 04 20 | London HUG

2008 Galois, Inc. All rights reserved.


This talk made possible by...
• Aaron Tomb • Joe Hurd
• Adam Wick • Joel Stanley
• Andy Adams-Moran • John Launchbury
• Andy Gill • John Matthews
• David Burke • Laura McKinney
• Dylan McNamee • Lee Pike
• Eric Mertens • Levent Erkok
• Iavor Diatchki • Louis Testa
• Isaac Potoczny-Jones • Magnus Carlsson
• Jef Bell • Paul Heinlein
• Peter White • Sally Browning
• Trevor Elliott • Thomas Nordin
• Phil Weaver • Brett Letner
• Jeff Lewis • … and many others
What does Galois do?
• Information assurance for critical systems
• Building systems that are trustworthy and secure
• Mixture of government and industry clients
• R&D with our favorite tools:
– Formal methods
– Typed functional languages
– Languages, compilers, DSLs
• Systems components: kernels, file systems, network
stuff, analysis tools, user land apps, ...
• Haskell for pretty much everything
Yes. Haskell can do that.
• Many 20 – 200k LOC Haskell projects
• Oldest projects approaching 10 years
• Teams of 1 – 6 developers at a time
• Much pair programming, whiteboards, code reviews
• 20 – 30 devs over longer project lifetime
• Have built many tools and libraries to support
Haskell development on this scale

• Haskell essential to keeping clients happy with:
– Deadlines, performance(!), maintainability
Themes
Languages matter!

• Writing correct software is difficult!
• Programming languages vary wildly in how well they
support robust, secure, safe coding practices
• Languages and tools can aid or hinder our efforts:
– Type systems
– Purity
– Modularity / compositionality
– Abstraction support
– Tools: analyses, provers, model checking
– Buggy implementations
Detect errors early!

• Detecting problems before executing the program is
critical
– Debugging is hard
– Debugging low level systems is harder
– Debugging low level critical systems is ...
• Culture of error prevention
– “How could we rule out this class of errors?”
– “How could we be more precise?”



The toolchain matters!

• Can't build anything without a good tool chain
– Native code compiler
– Libraries, libraries, libraries
– Debugging, tracing
– Profiling, inspection
– Testing, analysis
– Open, modifiable tools
• Particularly when pushing the
boundaries
Community matters!
• Soup of ideas in a large, open research community:
– Rapid adoption of new ideas
• Support, maintenance and help
– Can't build everything we need in-house!
• Give back via:
– Workshops: CUFP, ICFP, Haskell Symposium
– Hackathons
– Industrial Haskell Group
– Open source code and infrastructure
– Teaching: papers, blogs, talks
How Galois uses Haskell
1. The Type System
Types make our lives easier

• Cheap way to verify properties
– Cheaper than theorem proving
– More assurance than testing
– Saves debugging in hostile environments
• Typical conversation:
– Engineer A: “Spec says this must never
happen”
– Engineer B: “Can we enforce that in the
type system?”
Kinds of things types enforce

• Simple things:
– Correct arguments to a function
– Function f does not touch the disk
– No null pointers
– Mixing up similar concepts:
• Virtual / physical addresses
• Serious things:
– Information flow policies
– Correct component wiring and integration
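
The virtual / physical address point can be sketched with two newtypes. This is a minimal illustration, not code from any Galois project: the names VirtAddr, PhysAddr, translate and dmaTarget, and the fixed-offset mapping, are all assumptions made up for the example.

```haskell
-- Hypothetical names throughout; a sketch only.
import Data.Word (Word64)

-- Distinct wrappers around the same machine word.
newtype VirtAddr = VirtAddr Word64 deriving (Eq, Show)
newtype PhysAddr = PhysAddr Word64 deriving (Eq, Show)

-- A toy translation: assume a fixed offset maps virtual to physical.
translate :: VirtAddr -> PhysAddr
translate (VirtAddr v) = PhysAddr (v - kernelBase)
  where
    kernelBase = 0xC0000000

-- Only physical addresses may be handed to the (pretend) DMA engine.
dmaTarget :: PhysAddr -> Word64
dmaTarget (PhysAddr p) = p

-- dmaTarget (VirtAddr 0xC0001000)   -- rejected by the type checker
```

Both wrappers cost nothing at runtime, yet passing a VirtAddr where a PhysAddr is expected no longer compiles.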
Recent experience
First demo of a big systems project
• Six engineers
• 50k lines of code, in 5 components,
developed over a number of months
• Integrated, tested, demo'd in only a week,
two months ahead of schedule, 2 rungs
above performance spec.
• 1 space leak, spotted and fixed on first
day of testing
• 2 bugs found (typos from spec)
Purity is fundamental

• Difficult to show safety without purity
• Code should be pure by default
• Makes large systems easier to glue:
– Pure code is “safe” by default to call
• Effects are “code smells”, and have to be
treated carefully
• The world has too many impure
languages: don't add to that
Types aren't enough though

• Still not expressive enough for a lot of the
properties we want to enforce
• We care a lot about sizes in types
– “Input must only be 128, 192 or 256 bits”
– “Type T should be represented with 7 bits”
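
One common workaround when the size cannot live in the type itself is to check it once at the boundary and carry the result in a closed data type. The sketch below assumes that approach; KeySize, Key, keyBits and mkKey are hypothetical names, and the check happens at runtime rather than in the type checker.

```haskell
-- Hypothetical sketch: only three key sizes are representable.
data KeySize = Bits128 | Bits192 | Bits256
  deriving (Eq, Show)

-- Key material tagged with a size that has already been checked.
data Key = Key KeySize [Bool]
  deriving (Eq, Show)

keyBits :: KeySize -> Int
keyBits Bits128 = 128
keyBits Bits192 = 192
keyBits Bits256 = 256

-- Smart constructor: intended as the only way to build a Key from raw bits.
mkKey :: [Bool] -> Maybe Key
mkKey bits = case length bits of
  128 -> Just (Key Bits128 bits)
  192 -> Just (Key Bits192 bits)
  256 -> Just (Key Bits256 bits)
  _   -> Nothing
```

For the guarantee to hold, the Key constructor would have to be hidden behind a module export list so that mkKey is the only entry point.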



Other tools in the bag

• Extended static analysis tools
• Model checking
– SAT, SMT, …
• Theorem proving
– Isabelle, Coq

• How much assurance do you need?


2. Abstractions
Monads

• Constantly rolling new monads
– Captures critical facts about the execution
environment in the type
• Directly encodes semantics we care about
– “Computed keys are not visible outside the
M component”
– “Function f has read-only access to
memory”
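
A property like “function f has read-only access to memory” might be encoded along these lines. This is a hand-rolled sketch using only base: ReadOnly, Memory, readCell and sumCells are hypothetical names, and an association list stands in for real memory.

```haskell
-- Hypothetical sketch: a monad whose only effect is reading memory.
type Memory = [(Int, Int)] -- address/value pairs standing in for real memory

newtype ReadOnly a = ReadOnly { runReadOnly :: Memory -> a }

instance Functor ReadOnly where
  fmap f (ReadOnly g) = ReadOnly (f . g)

instance Applicative ReadOnly where
  pure x = ReadOnly (const x)
  ReadOnly f <*> ReadOnly g = ReadOnly (\m -> f m (g m))

instance Monad ReadOnly where
  ReadOnly g >>= k = ReadOnly (\m -> runReadOnly (k (g m)) m)

-- The single primitive: read one cell. No write operation exists.
readCell :: Int -> ReadOnly (Maybe Int)
readCell addr = ReadOnly (lookup addr)

-- The type says this function can read memory and do nothing else.
sumCells :: Int -> Int -> ReadOnly (Maybe Int)
sumCells a b = do
  x <- readCell a
  y <- readCell b
  return ((+) <$> x <*> y)
```

Because the monad offers no write primitive, anything typed in ReadOnly cannot modify memory, whatever its body looks like.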



Algebraic Data Types
• Every system is either an interpreter or a
compiler
– Abstract syntax trees are ubiquitous
– Represent processes symbolically, via
ADTs, then evaluate them in a safe
(monadic) context
– Precise, concise control over possible
values
– But need precise representation control
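
The "represent symbolically, then evaluate in a safe context" pattern can be shown with a toy expression language. The Expr type and eval function are illustrative assumptions; Either String plays the role of the safe monadic context.

```haskell
-- Hypothetical sketch: a tiny expression language and its evaluator.
data Expr
  = Lit Int
  | Add Expr Expr
  | Div Expr Expr
  deriving (Eq, Show)

-- Evaluation runs in the Either monad, so failure is a value, not a crash.
eval :: Expr -> Either String Int
eval (Lit n)   = Right n
eval (Add a b) = (+) <$> eval a <*> eval b
eval (Div a b) = do
  x <- eval a
  y <- eval b
  if y == 0
    then Left "division by zero"
    else Right (x `div` y)
```

The data type fixes exactly which programs can be written, and the evaluator decides what each one is allowed to do.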



Laziness

• Captures some concepts perfectly
– “A stream of 4k packets from the wire”
• Critical for control abstractions in DSLs
• Useful for prototyping:
– error “M.F.foo: not implemented”
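
Both uses of laziness above fit in one small sketch. It reads "4k" as 4096 bytes, which is an assumption, and every name in it (Packet, packets, wire, checksum) is hypothetical.

```haskell
-- Hypothetical sketch: an unbounded stream of fixed-size packets.
import Data.Word (Word8)

type Packet = [Word8]

packetSize :: Int
packetSize = 4096

-- Chop a (possibly infinite) byte stream into 4k packets, lazily.
packets :: [Word8] -> [Packet]
packets [] = []
packets bs =
  let (p, rest) = splitAt packetSize bs
  in p : packets rest

-- Stand-in for bytes arriving from the wire: an endless repeating pattern.
wire :: [Word8]
wire = cycle [0 .. 255]

-- A prototype stub in the style mentioned above; it only fails if forced.
checksum :: Packet -> Int
checksum = error "Net.Packet.checksum: not implemented"
```

Taking the first few elements of `packets wire` terminates even though the input never ends, and code that maps checksum over packets still type checks and runs as long as no result is demanded.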



Laziness

• Makes time and space reasoning harder!
– Mostly harmless in practice
– Stress testing tends to reveal retainers
– Graphical profiling knocks it dead
• Must be able to precisely enable/disable
• Be careful with exceptions and mutation
• whnf/rnf/! are your friends
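
A small sketch of forcing evaluation to weak head normal form, using foldl' and bang patterns from GHC. The function names are made up for the example, and with optimisation on GHC may well make the lazy version behave the same.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Lazy accumulator: may build a chain of thunks before adding anything.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is forced to WHNF at every step.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- The same idea written by hand with bang patterns.
mean :: [Double] -> Double
mean = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go !s !n []       = if n == 0 then 0 else s / fromIntegral n
    go !s !n (x : xs) = go (s + x) (n + 1) xs
```

For flat values such as Int and Double, WHNF is full evaluation; nested structures need a deeper force such as rnf.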
Type classes

• We use type classes
– Well defined interfaces between large
components (sets of modules)
– Natural code reuse
– Capture general concepts in a natural way
– Capture interface in a clear way
– Kick butt EDSLs (see Lennart's blog)
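
As a rough illustration of a class acting as the interface between components, the sketch below defines a storage interface and one implementation. BlockStore, ListStore and roundTrip are hypothetical names, not taken from any real system.

```haskell
-- Hypothetical interface a storage component exposes to the rest of the system.
class BlockStore s where
  emptyStore :: s
  writeBlock :: Int -> String -> s -> s
  readBlock  :: Int -> s -> Maybe String

-- One implementation: an association list.
newtype ListStore = ListStore [(Int, String)]

instance BlockStore ListStore where
  emptyStore = ListStore []
  writeBlock k v (ListStore kvs) =
    ListStore ((k, v) : filter ((/= k) . fst) kvs)
  readBlock k (ListStore kvs) = lookup k kvs

-- Client code depends on the class only, never on ListStore itself.
roundTrip :: BlockStore s => s -> Int -> String -> Maybe String
roundTrip s k v = readBlock k (writeBlock k v s)
```

Swapping in a different store means writing a new instance; roundTrip and any other client code stay unchanged.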



Concurrency

• forkIO rocks
– Cheap, very fast, precise threads
• MVars rock
• STM rocks (safely composable locks!)

• Result: not shy about introducing concurrency
when appropriate
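
A minimal sketch of forkIO and MVars working together, using only base. parallelMap is a hypothetical helper, and it assumes the jobs do not throw; a failing job would leave its MVar empty and the collector blocked.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Run each job in its own lightweight thread and collect the results in order.
-- Hypothetical helper; assumes the jobs do not throw exceptions.
parallelMap :: (a -> b) -> [a] -> IO [b]
parallelMap f xs = do
  boxes <- mapM spawn xs
  mapM takeMVar boxes
  where
    spawn x = do
      box <- newEmptyMVar
      -- ($!) forces the result to WHNF inside the worker thread.
      _ <- forkIO (putMVar box $! f x)
      return box
```

Each MVar acts as a one-shot mailbox, so results come back in input order regardless of which thread finishes first.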



3. Foreign Function Interface
Foreign Function Interface

• The world is a messy place
• A good FFI means we can always call
someone else's code if necessary
• Have to talk to weird bits of hardware and
weird proof systems
• ForeignPtr is great abstraction tool
• Must have clear API into the runtime
system (hot topic at the moment)
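
Calling someone else's code can be as short as the sketch below, which binds cos from the C math library. The wrapper name cosine is an assumption for the example; the binding is declared pure because cos has no side effects.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))

-- Bind the C library's cos directly; it is pure, so no IO in the type.
foreign import ccall unsafe "math.h cos"
  c_cos :: CDouble -> CDouble

-- Wrap the raw binding so callers see ordinary Haskell types.
cosine :: Double -> Double
cosine = realToFrac . c_cos . realToFrac
```

Foreign functions with side effects would return a result in IO instead.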
4. Meta programming
There's always boilerplate
• Abstractions get rid of a lot of repetitive
code, but there's always something
that's not automated
• We use a little Template Haskell
• Other generics:
– Hinze-style generics
– SYB generics
• Particularly useful for generating instance
code for marshalling
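
To give a flavour of SYB-style generics, the sketch below uses Data.Data from base with derived instances. It is not a marshalling library: Header, Message and describe are hypothetical, and describe only reports a constructor name and field count.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, gmapQ, showConstr, toConstr)

-- Two hypothetical message types; the instances are derived, not hand-written.
data Header = Header Int Int
  deriving (Show, Data)

data Message = Ping | Payload Header String
  deriving (Show, Data)

-- One generic function covers every type with a Data instance:
-- the constructor name followed by the number of immediate fields.
describe :: Data a => a -> String
describe x =
  showConstr (toConstr x) ++ "/" ++ show (length (gmapQ (const ()) x))
```

A real encoder would recurse into the fields with gmapQ instead of counting them, but the shape is the same: one function, no per-type instance code.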
5. Performance
Fast enough for majority of things

• Vast majority of code is fast enough
– GHC -O2 -funbox-strict-fields
– Happy with 1 – 2x C for low level code
• Last few drops get squeezed out:
– Profiling
– Low level Haskell
– Cycle-level measurement
– EDSLs to generate better code
– Calling into C
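
The -funbox-strict-fields flag applies to strict constructor fields like the ones in this sketch; the UNPACK pragmas request the same thing per field. Vec2 and its functions are hypothetical, and the unboxing only takes effect when optimisation is on.

```haskell
-- Strict fields; with -funbox-strict-fields (or the UNPACK pragmas below)
-- GHC can store the Doubles inline instead of behind pointers.
data Vec2 = Vec2 {-# UNPACK #-} !Double {-# UNPACK #-} !Double
  deriving (Eq, Show)

add :: Vec2 -> Vec2 -> Vec2
add (Vec2 a b) (Vec2 c d) = Vec2 (a + c) (b + d)

-- Squared length, kept free of sqrt so the arithmetic stays exact here.
norm2 :: Vec2 -> Double
norm2 (Vec2 a b) = a * a + b * b
```

Strict fields also rule out thunks hiding inside the structure, which helps with the space reasoning discussed earlier.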
Performance

• Really precise performance requires
expertise
• Libraries are helping reify “oral traditions”
about optimization
• Still a lack of clarity about performance
techniques in the broader Haskell
community though



6. Debugging
There are still bugs!

• Testing
– QuickCheck!!!
• Heap profiling
– “By type” profiling of the heap
• GHC -fhpc
– Great for finding exceptions
– Understanding what is executing
• +RTS -sstderr
– Explains what the GC, threads and memory are up to
7. Documentation
Generating supporting artifacts

• Haddock is great for reference material
– Helps capture design in the source
– Code + types becomes self documenting
• Design documents can be partially
extracted via:
– The major data and type signatures
– graphmod
– cabalgraph
– HPC analysis
8. Libraries
Hackage Changes Everything

• There's a library for everything, and often
more than one...
• Can sit back and let mtl / monadlib / haxml
/ hxt fight it out :)
• Static linking → need BSD licensed code
if we want to ship
• Haskell Platform to answer QA questions



9. Shipping code
Cabal

• I don't know how Haskell was possible
before Cabal :)
• Quickly adopted Cabal/cabal-install across
projects
• cabal-install:
– Simple, clean integration of internal and
external components into packageable
objects



10. Conventions
We try to ...

• -Wall police
• Consistent layout
• No tabs
• import qualified Control.Exception
• {-# LANGUAGE … #-}
• Map exceptions into Either / Maybe
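
Two of these conventions, the qualified import and mapping exceptions into Either / Maybe, combine naturally. The sketch below uses try and evaluate from Control.Exception; safeDiv and safeDivMaybe are hypothetical names.

```haskell
import qualified Control.Exception as E

-- Turn a division that may throw into a value the caller must inspect.
safeDiv :: Int -> Int -> IO (Either E.ArithException Int)
safeDiv x y = E.try (E.evaluate (x `div` y))

-- Discard the detail when the caller only needs success or failure.
safeDivMaybe :: Int -> Int -> IO (Maybe Int)
safeDivMaybe x y = either (const Nothing) Just <$> safeDiv x y
```

The evaluate call matters: without it the division would stay an unevaluated thunk and the exception would escape later, outside the try.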



We try to ...

• deriving Show
• Line/column for errors if you must throw
• No global mutable state
• Put type sigs in “when you're done” with
the design
• Use GHCi for rapid experimentation
• Cabal by default.
• Libraries by default
11. Things that we still need
More support for large scale
programming
• Enforcing conventions across the code
• Data representation precision (emerging)
• A serious refactoring tool
• Vetted and audited libraries by experts
(Haskell Platform)
• Idioms for mapping design onto
types/functions/classes/monads
• Better capture your 100 module design!
