Engineering Large Projects in a Functional Language
Lessons from a Decade of Haskell at Galois
Don Stewart | 2010-07-10 | DevNation PDX
This talk made possible by...
 Aaron Tomb
 Joel Stanley
 Adam Wick
 John Launchbury
 Andy Adams-Moran
 John Matthews
 Jonathan Daugherty
 Andy Gill
 Josh Hoyt
 David Burke
 Laura McKinney
 Dylan McNamee
 Ledah Casburn
 Eric Mertens
 Lee Pike
 Iavor Diatchki
 Levent Erkok
 Isaac Potoczny-Jones
 Louis Testa
 Jef Bell
 Magnus Carlsson
 Peter White
 Matt Sottile
 Trevor Elliott
 Paul Heinlein
 Phil Weaver
 Rogan Creswick
 Jason Dagit
 Sally Browning
 Jeff Lewis
 Sigbjorn Finne
 Joe Hurd
 Thomas Nordin
 Brett Letner
 … and many others
© 2010 Galois, Inc. All rights reserved.
What does Galois do?

 Information assurance for critical systems


 Building systems that are trustworthy and secure
 Mixture of government and industry clients
 R&D with our favorite tools:
• Formal methods
• Typed functional languages
• Languages, compilers, DSLs
 Kernels, file systems, networks, servers, compilers,
security, desktop apps, ...
 Haskell for pretty much everything
Haskell is ...

 A purely functional language


 Strongly statically typed
 20 years old
 Open source
• http://haskell.org
• http://haskell.org/platform
• http://hackage.haskell.org
 Compiled and interpreted
 Used in research, open source and industry



Yes. Haskell can do that.

 Many 20 – 200k LOC Haskell projects


 Oldest commercial projects over 10 years of
development now (e.g. Cryptol)
 Teams of 1 – 6 developers at a time
 Much pair programming, whiteboards, code reviews
 20 – 30 devs over longer project lifetime
 Have built many tools and libraries to support
Haskell development on this scale
 Haskell essential to keeping clients happy with:
• Deadlines, performance(!), maintainability
Themes



Languages matter!

 Writing correct software is difficult!


 Programming languages vary wildly in how well they
support robust, secure, safe coding practices
 Languages and tools can aid or hinder our efforts:
• Type systems
• Purity
• Modularity / compositionality
• Abstraction support
• Tools: analyses, provers, model checking
• Buggy implementations
Detect errors early!

 Detecting problems before executing the program is critical
• Debugging is hard
• Debugging low level systems is harder
• Debugging low level critical systems is ...
 Culture of error prevention
• “How could we rule out this class of errors?”
• “How could we be more precise?”



The toolchain matters!

 Can't build anything without a good tool chain


• Native code, optimizing compiler
• Libraries, libraries, libraries
• Debugging, tracing
• Profiling, inspection, runtime analysis
• Testing, analysis
• Need open, modifiable tools
– Particularly when pushing the boundaries
(Haskell on bare metal..)
Community matters!

 Soup of ideas in a large, open research community:


• Rapid adoption of new ideas
 Support, maintenance and help
• Can't build everything we need in-house!
 Give back via:
• Workshops: CUFP, ICFP, Haskell Symposium
• Hackathons
• Industrial Haskell Group
• Open source code and infrastructure
• Teaching: papers, blogs, talks
How Galois Uses Haskell



1. The Type System

Types make our lives easier

 Cheap way to verify properties


• Cheaper than theorem proving
• More assurance than testing
• Saves debugging in hostile environments
 Typical conversation:
• Engineer A: “Spec says this must never happen”
• Engineer B: “Can we enforce that in the type system?”



Kinds of things types enforce

 Simple things:
• Correct arguments to a function
• Function f does not touch the disk
• No null pointers
• No mixing up of similar concepts:
– Virtual / physical addresses
 Serious things:
• Information flow policies
• Correct component wiring and integration
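As a minimal sketch of the "virtual / physical addresses" point: wrapping the same machine word in two distinct newtypes lets the compiler reject any code that confuses them. The names VirtAddr, PhysAddr, translate and busAddress are invented for illustration and are not taken from any Galois code base.

```haskell
-- Hypothetical illustration: VirtAddr, PhysAddr, translate and busAddress
-- are invented names, not part of any real library.
import Data.Word (Word64)

-- Two distinct types over the same machine representation.
newtype VirtAddr = VirtAddr Word64 deriving (Eq, Show)
newtype PhysAddr = PhysAddr Word64 deriving (Eq, Show)

-- A toy translation: add a fixed base to map virtual to physical.
translate :: Word64 -> VirtAddr -> PhysAddr
translate base (VirtAddr v) = PhysAddr (base + v)

-- Only physical addresses may be handed to the (pretend) memory bus.
busAddress :: PhysAddr -> Word64
busAddress (PhysAddr p) = p

-- busAddress (VirtAddr 0x1000)   -- rejected by the type checker
```

The newtypes cost nothing at runtime, so the check is free once the code compiles.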



Recent experience
First demo of a new system
 Six engineers
 50k lines of code, in 5 components, developed over a
number of months
 Integrated, tested, demo'd in only a week, two months
ahead of schedule, significantly above performance
spec.
 1 space leak, spotted and fixed on first day of testing via
the heap profiler
 2 bugs found (typos from spec)



Purity is fundamental

 Difficult to show safety without purity


 Code should be pure by default
 Makes large systems easier to glue:
• Pure code is “safe” by default to call
 Effects are “code smells”, and have to be treated
carefully
 The world has too many impure languages: don't add to
that



Types aren't enough though

 Still not expressive enough for a lot of the properties we want to enforce

 We care a lot about sizes in types


• “Input must only be 128, 192 or 256 bits”
• “Type T should be represented with 7 bits”
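One partial workaround with ordinary types, sketched here under the assumption that a single runtime check at the system boundary is acceptable: make the legal sizes the only values that can exist. This does not put sizes in types in the sense the slide asks for; KeySize, mkKeySize and keyBits are hypothetical names.

```haskell
-- Hypothetical names (KeySize, mkKeySize, keyBits), for illustration only.

-- The only key sizes that can exist: 128, 192 or 256 bits.
data KeySize = Key128 | Key192 | Key256
  deriving (Eq, Show)

-- Validate an untrusted bit count once, at the boundary.
mkKeySize :: Int -> Maybe KeySize
mkKeySize 128 = Just Key128
mkKeySize 192 = Just Key192
mkKeySize 256 = Just Key256
mkKeySize _   = Nothing

-- Total function: every KeySize has a bit width.
keyBits :: KeySize -> Int
keyBits Key128 = 128
keyBits Key192 = 192
keyBits Key256 = 256
```

Code that receives a KeySize never has to re-check the size, but the relationship between the size and the length of the actual key data is still not tracked by the compiler.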



Other tools in the bag

 Extended static analysis tools


 Model checking
• SAT, SMT, …
 Theorem proving
• Isabelle, Agda, Coq

 How much assurance do you need?



2. Abstractions



Monads

 Constantly rolling new monads


• Captures critical facts about the execution environment in the
type
 Directly encodes semantics we care about
• “Computed keys are not visible outside the M component”
• “Function f has read-only access to memory”
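A small sketch of what "rolling a new monad" for the read-only property might look like, using only base. ReadOnly, Memory, readCell, runReadOnly and sumFirstTwo are invented names; a real system would likely hide the constructor behind a module boundary and use a proper memory model.

```haskell
-- Hypothetical sketch: all names below are invented for illustration.

-- A toy memory: address/value pairs.
type Memory = [(Int, Int)]

-- Computations that may read memory but have no way to change it.
newtype ReadOnly a = ReadOnly (Memory -> a)

instance Functor ReadOnly where
  fmap f (ReadOnly g) = ReadOnly (f . g)

instance Applicative ReadOnly where
  pure x = ReadOnly (const x)
  ReadOnly f <*> ReadOnly g = ReadOnly (\m -> f m (g m))

instance Monad ReadOnly where
  ReadOnly g >>= k = ReadOnly (\m -> let ReadOnly h = k (g m) in h m)

-- The only primitive offered to clients: read one cell.
readCell :: Int -> ReadOnly (Maybe Int)
readCell addr = ReadOnly (lookup addr)

-- Run a read-only computation against a given memory.
runReadOnly :: Memory -> ReadOnly a -> a
runReadOnly m (ReadOnly g) = g m

-- The read-only restriction is visible in the type of this function.
sumFirstTwo :: ReadOnly Int
sumFirstTwo = do
  a <- readCell 0
  b <- readCell 1
  pure (maybe 0 id a + maybe 0 id b)
```

Because no write primitive exists for ReadOnly, a function with this type cannot modify memory, and that fact is checked at compile time.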



Algebraic Data Types

 Every system is either an interpreter or a compiler


• Abstract syntax trees are ubiquitous
• Represent processes symbolically, via ADTs, then evaluate
them in a safe (monadic) context
• Precise, concise control over possible values
• But need precise representation control
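The "represent symbolically, then evaluate in a safe context" pattern can be sketched with a tiny expression language. Expr and eval are illustrative names, and Either stands in for whatever monad a real system would use.

```haskell
-- Hypothetical mini-language; Expr and eval are illustrative names.
data Expr
  = Lit Int
  | Add Expr Expr
  | Div Expr Expr
  deriving (Eq, Show)

-- Evaluate in the Either monad so failure is a value, not a crash.
eval :: Expr -> Either String Int
eval (Lit n)   = Right n
eval (Add a b) = (+) <$> eval a <*> eval b
eval (Div a b) = do
  x <- eval a
  y <- eval b
  if y == 0
    then Left "division by zero"
    else Right (x `div` y)
```

The data type fixes exactly which programs can be written down, and the evaluator's type says exactly how evaluation can fail.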



Laziness

 Captures some concepts perfectly


• “A stream of 4k packets from the wire”
 Critical for control abstractions in DSLs
 Useful for prototyping:
• error "M.F.foo: not implemented"
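A rough sketch of both uses of laziness: an unbounded stream cut into 4k packets on demand, and a stub that type checks but fails only if it is actually forced. All names here (Packet, packets, wire, checksum) are hypothetical, and a list of bytes is only a stand-in for a real packet representation.

```haskell
-- Hypothetical names; a "packet" here is just a list of bytes.
import Data.Word (Word8)

type Packet = [Word8]

packetSize :: Int
packetSize = 4096

-- Lazily split a (possibly infinite) byte stream into 4k packets.
packets :: [Word8] -> [Packet]
packets [] = []
packets bs =
  let (p, rest) = splitAt packetSize bs
  in p : packets rest

-- An infinite wire: the bytes 0..255 repeated forever.
wire :: [Word8]
wire = cycle [0 .. 255]

-- Prototyping stub: compiles today, fails only if something evaluates it.
checksum :: Packet -> Int
checksum = error "Demo.checksum: not implemented"
```

A consumer such as `take 3 (packets wire)` only ever materialises the three packets it asks for.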



Laziness

 Makes time and space reasoning harder!


• Mostly harmless in practice
• Stress testing tends to reveal retainers
• Graphical profiling knocks it dead
 Must be able to precisely enable/disable
 Be careful with exceptions and mutation
 whnf/rnf/! are your friends
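A minimal sketch of switching laziness off where it hurts, using strict constructor fields, a strict left fold and a bang pattern. Stats, mean and sumTo are invented names. Full normal-form evaluation with rnf lives in the separate deepseq package and is not shown here.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Hypothetical example names (Stats, mean, sumTo); only base is used.
import Data.List (foldl')

-- Strict fields: both counters are evaluated when a Stats is built.
data Stats = Stats !Int !Double

-- Strict left fold: no chain of suspended additions builds up.
mean :: [Double] -> Double
mean xs =
  let Stats n total = foldl' step (Stats 0 0) xs
  in if n == 0 then 0 else total / fromIntegral n
  where
    step (Stats k s) x = Stats (k + 1) (s + x)

-- Bang pattern: force the accumulator on every iteration.
sumTo :: Int -> Int
sumTo limit = go 0 1
  where
    go !acc i
      | i > limit = acc
      | otherwise = go (acc + i) (i + 1)
```

With a lazy fold and lazy fields, the same code could retain one thunk per list element, which is the kind of retainer that stress testing and heap profiling tend to expose.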



Type classes

 We use type classes


• Well defined interfaces between large components (sets of
modules)
• Natural code reuse
• Capture general concepts in a natural way
• Capture interface in a clear way
• Kick butt EDSLs (see Lennart's blog)
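To illustrate a type class as the interface between components, here is a small sketch. The Store class, the MemStore instance and bump are hypothetical; the point is only that client code depends on the class, not on any one implementation.

```haskell
-- Hypothetical interface: Store, MemStore and the method names are invented.

-- The interface a storage component must provide.
class Store s where
  emptyStore :: s
  put        :: String -> Int -> s -> s
  get        :: String -> s -> Maybe Int

-- One implementation: an association list, newest binding first.
newtype MemStore = MemStore [(String, Int)]

instance Store MemStore where
  emptyStore            = MemStore []
  put k v (MemStore kv) = MemStore ((k, v) : kv)
  get k (MemStore kv)   = lookup k kv

-- Client code is written once against the class, not an implementation.
bump :: Store s => String -> s -> s
bump k s = put k (maybe 1 (+ 1) (get k s)) s
```

Swapping in a different backing store means writing one new instance; bump and every other client stay unchanged.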



Concurrency and Parallelism

 forkIO rocks
• Cheap, very fast, precise threads
 MVars rock
 STM rocks (safely composable locks!)

 Result: not shy about introducing concurrency when appropriate
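A small sketch of forkIO and MVar working together, using only base. parallelMap is an invented helper, not a library function. STM is left out because it lives in the separate stm package.

```haskell
-- Hypothetical helper (parallelMap); only base is used.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Run each job in its own lightweight thread and collect results in order.
parallelMap :: (a -> b) -> [a] -> IO [b]
parallelMap f xs = do
  boxes <- mapM spawn xs
  mapM takeMVar boxes
  where
    spawn x = do
      box <- newEmptyMVar
      -- Force the result to weak head normal form inside the worker thread.
      _ <- forkIO (putMVar box $! f x)
      pure box
```

This sketch does not handle a worker that throws: its MVar is never filled, so a production version would need to catch exceptions in the worker and report them to the caller.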



3. Foreign Function Interface



Foreign Function Interface

 The world is a messy place


 A good FFI means we can always call someone else's
code if necessary
 Have to talk to weird bits of hardware and weird proof
systems
 ForeignPtr is a great abstraction tool
 Must have a clear API into the runtime system (hot topic at the moment)
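For readers who have not seen the FFI, a minimal sketch of calling someone else's code: binding the C library's cos function. The wrapper name cosine is invented. ForeignPtr, which attaches finalizers to foreign memory, is not shown.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))

-- Bind the C library's cos from <math.h>. It has no side effects,
-- so the import is given a pure type with no IO.
foreign import ccall unsafe "math.h cos"
  c_cos :: CDouble -> CDouble

-- Hypothetical wrapper that hides the C types from the rest of the program.
cosine :: Double -> Double
cosine = realToFrac . c_cos . realToFrac
```

Giving a foreign function a pure type is a promise the compiler cannot check, so it should only be made for functions that really are free of side effects.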



4. Meta programming



There's always boilerplate

 Abstractions get rid of a lot of repetitive code, but there's always something that's not automated
 We use a little Template Haskell
 Other generics:
• Hinze-style generics
• SYB generics
 Particularly useful for generating instance code for marshalling
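As a taste of SYB-style generics, here is a sketch using Data.Data from base: two functions that work on any type with a derived Data instance, with no per-type code. Msg, constructorName and fieldCount are hypothetical names, and real marshalling code would need considerably more than this.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, gmapQ, showConstr, toConstr)

-- Hypothetical message type to be marshalled.
data Msg
  = Ping
  | Login String Int
  deriving (Show, Data)

-- Generic: works for any type with a Data instance.
constructorName :: Data a => a -> String
constructorName = showConstr . toConstr

-- Generic: count the immediate fields of a value.
fieldCount :: Data a => a -> Int
fieldCount = length . gmapQ (const ())
```

A generic encoder can be built from the same ingredients: emit a tag for the constructor, then recurse over the fields with gmapQ.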



5. Performance



Fast enough for majority of things

 Vast majority of code is fast enough


• GHC -O2 -funbox-strict-fields
• Happy with 1 – 2x C for low level code
 Last few drops get squeezed out:
• Profiling
• Low level Haskell
• Cycle-level measurement
• EDSLs to generate better code
• Calling into C
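A short sketch of what -funbox-strict-fields does, written out by hand with UNPACK pragmas on a hypothetical Vec3 type. The flag applies the same treatment to every strict field it can, without the per-field annotations.

```haskell
-- Hypothetical data type. Each UNPACK pragma asks GHC to store the Double
-- directly in the constructor rather than behind a pointer.
data Vec3 = Vec3
  {-# UNPACK #-} !Double
  {-# UNPACK #-} !Double
  {-# UNPACK #-} !Double
  deriving (Eq, Show)

-- Dot product over the unpacked fields.
dot :: Vec3 -> Vec3 -> Double
dot (Vec3 a b c) (Vec3 x y z) = a * x + b * y + c * z
```

Whether unpacking pays off depends on the access pattern, so it is worth confirming with the profiler rather than assuming.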



Performance

 Really precise performance requires expertise


 Libraries are helping reify “oral traditions” about
optimization
 Still a lack of clarity about performance techniques in the
broader Haskell community though



6. Debugging



There are still bugs!

 Testing
• QuickCheck!!!
 Heap profiling
• “By type” profiling of the heap
 GHC -fhpc
• Great for finding exceptions
• Understanding what is executing
 +RTS -sstderr
• Explains what the GC, threads and memory are up to



7. Documentation



Generating supporting artifacts

 Haddock is great for reference material


• Helps capture design in the source
• Code + types become self-documenting
 Design documents can be partially extracted via:
• The major data and type signatures
• graphmod
• cabalgraph
• HPC analysis



8. Libraries



Hackage Changed Everything

 2200+ libraries created in 3 years. There's a library for everything, and often more than one...
 Can sit back and let mtl / monadlib / haxml / hxt fight it
out :)
 Static linking → need BSD licensed code if we want to
ship
 Haskell Platform to answer QA questions



9. Shipping code



Cabal

 I don't know how Haskell was possible before Cabal :)


 Quickly adopted Cabal/cabal-install across projects
 cabal-install:
• Simple, clean integration of internal and external components
into packageable objects



10. Conventions



We try to ...

 -Wall police
 Consistent layout
 No tabs
 import qualified Control.Exception
 {-# LANGUAGE … #-}
 Map exceptions into Either / Maybe
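Two of these conventions fit in one small sketch: importing Control.Exception qualified, and mapping an exception into Either or Maybe at the boundary. safeDiv and safeDivMaybe are hypothetical helpers.

```haskell
import qualified Control.Exception as E

-- Hypothetical helper: turn an arithmetic exception into a Left.
-- E.evaluate forces the division inside IO so that E.try can catch it.
safeDiv :: Int -> Int -> IO (Either E.ArithException Int)
safeDiv x y = E.try (E.evaluate (x `div` y))

-- The same result with the error detail dropped.
safeDivMaybe :: Int -> Int -> IO (Maybe Int)
safeDivMaybe x y = either (const Nothing) Just <$> safeDiv x y
```

Callers then handle failure by pattern matching on an ordinary value, and the qualified import makes it obvious at each use site that exception machinery is involved.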



We try to ...

 deriving Show
 Line/column for errors if you must throw
 No global mutable state
 Put type sigs in “when you're done” with the design
 Use GHCi for rapid experimentation
 Cabal by default.
 Libraries by default



11. Training



Easy to find Haskell programmers

 With a big open source community, it's much easier to find Haskell programmers now
 Many more applicants than jobs, often with significant
experience from open source
 We train on-site, and new resources like LYAH and
RWH make this easier.



12. Things that we still need



More support for large scale programming

 Enforcing conventions across the code


 Data representation precision (emerging)
 A serious refactoring tool (HaRe on Hackage!)
 Vetted and audited libraries by experts (Haskell Platform)
 Idioms for mapping design onto
types/functions/classes/monads
 Better capture your 100 module design!
