compiler for Science (i.e. for
NumPy and other typed containers)
March 16, 2013
Travis E. Oliphant, Jon Riehl
Mark Florisson, Siu Kwan Lam
Saturday, March 16, 13
Where I’m coming from
After Before
$\rho_0 (2\pi f)^2 U_i(a, f) = \left[ C_{ijkl}(a, f)\, U_{k,l}(a, f) \right]_{,j}$
spyder
scikits
machine learning in Python
1,000,000 to 2,000,000 users of NumPy!
NumFOCUS - blatant ad!
www.numfocus.org
501(c)3 Public Charity
Join Us! http://numfocus.org/membership/
Code that users might write
$x_i = \sum_{j=0}^{i-1} k_{i-j,j}\, a_{i-j}\, a_j$
$O = I \star F$
Slow!!!!
Why is Python slow?
1. Dynamic typing
2. Attribute lookups
3. NumPy getitem (a[...])
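A minimal sketch of the three costs listed above, in plain Python with NumPy. The names `add`, `total`, `first` and `second` are illustrative only and do not come from the talk.

```python
import numpy as np

a = np.arange(3, dtype=np.float64)

# 3. NumPy getitem: each a[i] boxes the raw double into a new Python object.
first = a[0]
second = a[0]
print(type(first))        # a NumPy scalar object, not a bare C double
print(first is second)    # two separate objects for the same element

# 1. Dynamic typing: the same source line dispatches on run-time types.
def add(x, y):
    return x + y

print(add(1, 2), add(1.5, 2), add("a", "b"))

# 2. Attribute lookups: arr.shape is resolved by name on every iteration.
def total(arr):
    s = 0.0
    i = 0
    while i < arr.shape[0]:   # attribute lookup each time round the loop
        s += arr[i]           # getitem plus a boxed scalar each time
        i += 1
    return s

print(total(a))
```

None of this work is needed once the element type and array layout are known ahead of time, which is the opening a compiler can use.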
What are Scientists doing Now?
• Writing critical parts in C/C++/Fortran and “wrapping” with
    • SWIG
    • ctypes
    • Cython
    • f2py (or fwrap)
    • hand-coded wrappers
• Writing new code in Cython directly
    • Cython is “modified Python” with type information everywhere.
    • It produces a C-extension module which is then compiled
Cython is the most popular these days. But speeding up NumPy-based codes should be even easier!
NumPy Array is “typed container”
shape
Let’s use this!
NumPy users are already using “typed containers” with regular storage and access patterns. There is plenty of information to optimize the code if we either:
• Provide type information for function inputs (jit)
• Create a “call-site” for each function that compiles and caches the result the first time it gets called with new types.
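The second option can be sketched in pure Python. This is only an illustration of the caching idea, not Numba's implementation: `specialize` and `compile_for` are hypothetical names, and the "compilation" step is a stand-in that simply reuses the original function.

```python
import functools

def specialize(func):
    """Hypothetical call-site: one cached version per argument-type tuple."""
    cache = {}

    def compile_for(types):
        # Stand-in for real compilation: a compiler would emit machine
        # code specialised for `types` here. This sketch reuses `func`.
        return func

    @functools.wraps(func)
    def call_site(*args):
        key = tuple(type(a) for a in args)
        if key not in cache:          # first call with these types
            cache[key] = compile_for(key)
        return cache[key](*args)

    call_site.cache = cache
    return call_site

@specialize
def axpy(a, x, y):
    return a * x + y

axpy(2, 3, 4)        # specialises for (int, int, int)
axpy(2.0, 3.0, 4.0)  # new types: specialises again
axpy(5, 6, 7)        # same types as the first call: cache hit
print(sorted(k[0].__name__ for k in axpy.cache))
```

The dictionary lookup on every call is the "little more run-time call overhead" that an explicit signature avoids.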
Requirements Part I
• Work with CPython (we need the full scientific Python stack!)
• Minimal modifications to code (use type inference)
• Programmer control over what and when to “jit”
• Ability to build static extensions (for libraries)
• Fall back to Python C-API for “object” types.
Requirements Part II
• Produce code as fast as C (maybe even Fortran)
• Support NumPy array-expressions and be able to produce universal functions (e.g. y = sin(x))
• Provide a tool that could adapt to provide parallelism and produce code for modern vector hardware (GPUs, accelerators, and many-core machines)
Do we have to write the full compiler??
No!
LLVM has done much heavy lifting
LLVM = Compilers for everybody
Face of a modern compiler
[Diagram] Front-End (Parsing): C, C++, Fortran, ObjC → Intermediate Representation (IR) → Back-End (Code Generation): x86, ARM, PTX
Face of a modern compiler
[Diagram] Front-End (Parsing): Python, via Numba → Intermediate Representation (IR) → Back-End (Code Generation): LLVM → x86, ARM, PTX
Example
Numba
NumPy + Mamba = Numba
LLVM Library
Intel Nvidia Apple AMD
OpenCL ISPC CUDA CLANG OpenMP
LLVMPY
Python Function Machine Code
ARM
Simple API
• jit - provide type information (fastest to call at runtime)
• autojit - detects input types, infers output, generates code if needed, and dispatches (a little more runtime call overhead)

from numba import jit, autojit

#@jit('void(double[:,:], double, double)')
@autojit
def numba_update(u, dx2, dy2):
    nx, ny = u.shape
    # Sweep the interior points; the boundary rows and columns stay fixed.
    for i in xrange(1, nx-1):
        for j in xrange(1, ny-1):
            u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                      (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))

Comment out one of jit or autojit (don’t use together)
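A note for readers on newer Numba releases: there the lazy behaviour of `autojit` is available through `jit`/`njit` called without a signature. The sketch below assumes such a release and Python 3 (`range` instead of `xrange`). The `try`/`except` fallback is an addition of this sketch, so the snippet also runs, uncompiled, where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit          # assumption: a recent Numba release
except ImportError:                 # fallback: run as plain Python
    def njit(func):
        return func

@njit
def numba_update(u, dx2, dy2):
    nx, ny = u.shape
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            u[i, j] = ((u[i + 1, j] + u[i - 1, j]) * dy2 +
                       (u[i, j + 1] + u[i, j - 1]) * dx2) / (2 * (dx2 + dy2))

u = np.zeros((5, 5))
u[0, :] = 1.0                       # fixed boundary value on one edge
numba_update(u, 1.0, 1.0)           # with Numba, the first call compiles
print(u[1, 1])
```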
Example
import numba
from math import sin, pi

@numba.jit('f8(f8)')
def sinc(x):
    if x == 0.0:
        return 1.0
    else:
        return sin(x*pi)/(pi*x)
Numba
~150x speedup
Real-time image processing (50 fps Mandelbrot)
Speeding up Math Expressions
$x_i = \sum_{j=0}^{i-1} k_{i-j,j}\, a_{i-j}\, a_j$
Image Processing
from numba import jit

@jit('void(f8[:,:],f8[:,:],f8[:,:])')
def filter(image, filt, output):
    M, N = image.shape
    m, n = filt.shape
    # Only interior pixels, where the whole kernel fits inside the image.
    for i in range(m//2, M-m//2):
        for j in range(n//2, N-n//2):
            result = 0.0
            for k in range(m):
                for l in range(n):
                    result += image[i+k-m//2,j+l-n//2]*filt[k, l]
            output[i,j] = result
~1500x speedup
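To see what the kernel computes, here is the same loop nest as uncompiled Python on a tiny input. This is a sketch for checking the indexing only: the function is renamed `filter2d` (a name chosen here to avoid shadowing the built-in `filter`), and the 5x5 ramp image and 3x3 box filter are made-up test data.

```python
import numpy as np

def filter2d(image, filt, output):
    M, N = image.shape
    m, n = filt.shape
    for i in range(m // 2, M - m // 2):
        for j in range(n // 2, N - n // 2):
            result = 0.0
            for k in range(m):
                for l in range(n):
                    result += image[i + k - m // 2, j + l - n // 2] * filt[k, l]
            output[i, j] = result

image = np.arange(25, dtype=np.float64).reshape(5, 5)
filt = np.full((3, 3), 1.0 / 9.0)      # 3x3 box (mean) filter
output = np.zeros_like(image)
filter2d(image, filt, output)
print(output[2, 2])                     # mean of the 3x3 block around image[2, 2]
print(output[0, 0])                     # border pixels are never written
```

Each output pixel costs m*n multiply-adds through interpreted indexing, which is why this loop nest benefits so much from compilation.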
Compile NumPy array expressions
from numba import autojit

@autojit
def formula(a, b, c):
    a[1:,1:] = a[1:,1:] + b[1:,:-1] + c[1:,:-1]

@autojit
def express(m1, m2):
    m2[1:-1:2,0,...,::2] = (m1[1:-1:2,...,::2] *
                            m1[-2:1:-2,...,::2])
    return m2
Fast vectorize
NumPy’s ufuncs take “kernels” and apply the kernel element-by-element over entire arrays
Write kernels in Python!

from numba.vectorize import vectorize
from math import sin, pi

@vectorize(['f8(f8)', 'f4(f4)'])
def sinc(x):
    if x == 0.0:
        return 1.0
    else:
        return sin(x*pi)/(pi*x)
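The calling convention of a scalar kernel applied over an array can be tried with NumPy alone. The sketch below uses `np.vectorize` as a stand-in, an assumption of this example rather than what the talk uses: it loops in Python, so it shows how a kernel is used, not the speed of a compiled ufunc. The result is checked against `np.sinc`, which implements the same normalized sinc.

```python
import numpy as np
from math import sin, pi

def sinc(x):
    # Scalar kernel: written for one element at a time.
    if x == 0.0:
        return 1.0
    return sin(x * pi) / (pi * x)

# Stand-in for a compiled ufunc: same element-by-element semantics,
# but the loop runs in the interpreter.
vsinc = np.vectorize(sinc, otypes=[float])

x = np.linspace(-2.0, 2.0, 9)          # includes x == 0.0
print(np.allclose(vsinc(x), np.sinc(x)))
```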
Case-study - j0 from scipy.special
• scipy.special was one of the first libraries I wrote
• extended “umath” module by adding new “universal functions” to compute many scientific functions by wrapping C and Fortran libs.
• Bessel functions are solutions to a differential equation:
$$x^2 \frac{d^2 y}{dx^2} + x \frac{dy}{dx} + (x^2 - \alpha^2)\, y = 0, \qquad y = J_\alpha(x)$$
$$J_n(x) = \frac{1}{\pi} \int_0^{\pi} \cos(n\tau - x \sin\tau)\, d\tau$$
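The integral representation above can be evaluated directly as a quick numerical check. This sketch uses a plain trapezoid rule with NumPy. It is not the cephes algorithm that scipy.special wraps, and `bessel_j` and `samples` are names invented for the example.

```python
import numpy as np

def bessel_j(n, x, samples=2001):
    """Integer-order J_n(x) from the integral form, using the trapezoid rule."""
    tau = np.linspace(0.0, np.pi, samples)
    f = np.cos(n * tau - x * np.sin(tau))
    h = tau[1] - tau[0]
    integral = h * (f.sum() - 0.5 * (f[0] + f[-1]))
    return integral / np.pi

print(bessel_j(0, 0.0))
print(bessel_j(0, 2.404825557695773))  # near the first zero of J_0
```

A production implementation would use polynomial and asymptotic approximations instead of quadrature, which is where a compiled scalar kernel pays off.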
scipy.special.j0 wraps cephes algorithm
Result - equivalent to compiled code
In [6]: %timeit vj0(x)
10000 loops, best of 3: 75 us per loop
In [7]: from scipy.special import j0
In [8]: %timeit j0(x)
10000 loops, best of 3: 75.3 us per loop
But! Now code is in Python and can be experimented with more easily (and moved to the GPU / accelerator more easily)!
Laplace Example
from numba import jit

@jit('void(double[:,:], double, double)')
def numba_update(u, dx2, dy2):
    nx, ny = u.shape
    for i in xrange(1, nx-1):
        for j in xrange(1, ny-1):
            u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                      (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))

Adapted from http://www.scipy.org/PerformancePython
originally by Prabhu Ramachandran

@jit('void(double[:,:], double, double)')
def numbavec_update(u, dx2, dy2):
    u[1:-1,1:-1] = ((u[2:,1:-1]+u[:-2,1:-1])*dy2 +
                    (u[1:-1,2:] + u[1:-1,:-2])*dx2) / (2*(dx2+dy2))
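One detail to keep in mind when comparing the loop and slice forms: under plain NumPy semantics (assumed here; the slides do not say how compiled array expressions are evaluated), the loop reads neighbours already updated in the same sweep, while the slice expression computes the whole right-hand side from the old values before assigning. The uncompiled sketch below shows the difference after one sweep. `loop_update`, `slice_update` and `make_grid` are names chosen for this example.

```python
import numpy as np

def loop_update(u, dx2, dy2):
    # In-place sweep: later points see neighbours updated earlier in the sweep.
    nx, ny = u.shape
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            u[i, j] = ((u[i + 1, j] + u[i - 1, j]) * dy2 +
                       (u[i, j + 1] + u[i, j - 1]) * dx2) / (2 * (dx2 + dy2))

def slice_update(u, dx2, dy2):
    # NumPy evaluates the right-hand side from old values, then assigns.
    u[1:-1, 1:-1] = ((u[2:, 1:-1] + u[:-2, 1:-1]) * dy2 +
                     (u[1:-1, 2:] + u[1:-1, :-2]) * dx2) / (2 * (dx2 + dy2))

def make_grid(n=5):
    u = np.zeros((n, n))
    u[0, :] = 1.0                      # one edge held at 1
    return u

a, b = make_grid(), make_grid()
loop_update(a, 1.0, 1.0)
slice_update(b, 1.0, 1.0)
print(a[1, 1], b[1, 1])   # first interior point agrees
print(a[1, 2], b[1, 2])   # differs: the loop already used the new a[1, 1]
```

Both sweeps iterate toward the same steady state, so per-sweep timings remain a fair comparison of the generated code.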
Results of Laplace example
Version          Time    Speed Up
NumPy            3.19    1.0
Numba            2.32    1.38
Vect. Numba      2.33    1.37
Cython           2.38    1.34
Weave            2.47    1.29
Numexpr          2.62    1.22
Fortran Loops    2.30    1.39
Vect. Fortran    1.50    2.13
https://github.com/teoliphant/speed.git
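The Speed Up column is the NumPy time divided by each version's time. A short check of that arithmetic, using only the timings from the table:

```python
times = {
    "NumPy": 3.19, "Numba": 2.32, "Vect. Numba": 2.33, "Cython": 2.38,
    "Weave": 2.47, "Numexpr": 2.62, "Fortran Loops": 2.30, "Vect. Fortran": 1.50,
}
baseline = times["NumPy"]

# speed-up = baseline time / version time
speedups = {name: baseline / t for name, t in times.items()}
for name, s in speedups.items():
    print(f"{name:14s} {s:.2f}")
```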
Numba can change the game!
[Diagram] C, C++, Fortran, Python → LLVM IR → x86, ARM, PTX
Numba turns Python into a “compiled language” (but much more flexible). You don’t have to reach for C/C++
Many More Advanced Features
• Extension classes (jit a class - autojit coming soon!)
• Struct support (NumPy arrays can be structs)
• SSA - can refer to local variables as different types
• Typed lists and typed dictionaries and sets coming soon!
• pointer support
• calling ctypes and CFFI functions natively
• pycc (create stand-alone dynamic library and executable)
• pycc --python (create static extension module for Python)
Uses of Numba
[Diagram] Python Function → Numba → function pointer → Framework accepting dynamic function pointers
Labels in the diagram: Ufuncs, Generalized UFuncs, Function-based Indexing, Memory Filters, Window Kernel Funcs, I/O Filters, Reduction Filters, Computed Columns
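What "a framework accepting function pointers" means can be sketched with the standard library's ctypes. In this illustration the pointer wraps an interpreted Python function; with Numba the pointer would refer to compiled machine code instead. `apply_pointwise` is a hypothetical stand-in for such a framework.

```python
import ctypes

# C signature: double (*)(double)
UNARY = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)

def square(x):
    return x * x

square_ptr = UNARY(square)           # callable through a C function pointer

def apply_pointwise(func_ptr, values):
    """Hypothetical framework hook: it only ever sees a function pointer."""
    return [func_ptr(v) for v in values]

# The raw address is what a C-level framework would store and call.
address = ctypes.cast(square_ptr, ctypes.c_void_p).value
print(apply_pointwise(square_ptr, [1.0, 2.0, 3.0]))
print(isinstance(address, int))
```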
Accelerate/NumbaPro - blatant ad!
Python and NumPy compiled to Parallel Architectures (GPUs and multicore machines)
• Create parallel-for loops
• Parallel execution of ufuncs
• Run ufuncs on the GPU
• Write CUDA directly in Python!
• Free for Academics
fast development and fast execution!
Currently premium features will be contributed to open source over time!
Numba Development
1260 Mark Florisson
203 Jon Riehl
181 Siu Kwan Lam
110 Travis E. Oliphant
30 Dag Sverre Seljebotn
28 Hernan Grecco
19 Ilan Schnell
11 Mark Wiebe
8 James Bergstra
4 Alberto Valverde
3 Thomas Kluyver
2 Maggie Mari
2 Dan Yamins
2 Dan Christensen
1 timo
1 Yaroslav Halchenko
1 Phillip Cloud
1 Ondřej Čertík
1 Martin Spacek
1 Lars Buitinck
1 Juan Luis Cano Rodríguez
git log --format=format:%an | sort | uniq -c | sort -r
Milestone Roadmap
• Rapid progress this year
• Still some bugs - needs users!
• Version 0.7 end of Feb.
• Version 0.8 in April
• Version 0.9 June
• Version 1.0 by end of August
• Stable API (jit, autojit) easy to use
• Should be able to write equivalent of NumPy and SciPy with Numba and memoryviews.
http://numba.pydata.org
http://llvmpy.org
http://compilers.pydata.org
We need you:
• your use-cases
• your tests
• developer help
Architectural Overview
[Diagram] Python Source → Python Parser → Python AST → Numba Stage 1 … Numba Stage n → Numba Code Generator → LLVM
Also shown: Numba Environment, Numba AST
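The front half of this pipeline can be tried with the standard library's `ast` module. The sketch below is only an illustration of "parse, then run stages over the AST with a shared environment". The stage functions and the `env` dictionary are hypothetical and far simpler than Numba's own stages.

```python
import ast

source = """
def sinc(x):
    if x == 0.0:
        return 1.0
    return sin(x * pi) / (pi * x)
"""

def stage_find_functions(tree, env):
    # Hypothetical stage: record which functions the module defines.
    env["functions"] = [n.name for n in ast.walk(tree)
                        if isinstance(n, ast.FunctionDef)]
    return tree

def stage_find_loads(tree, env):
    # Hypothetical stage: record every name that is read.
    env["loads"] = sorted({n.id for n in ast.walk(tree)
                           if isinstance(n, ast.Name)
                           and isinstance(n.ctx, ast.Load)})
    return tree

pipeline = [stage_find_functions, stage_find_loads]
env = {}                       # shared state, passed to every stage
tree = ast.parse(source)       # Python source -> Python AST
for stage in pipeline:
    tree = stage(tree, env)
print(env)
```

A real compiler would continue with type inference and lowering stages before handing the result to a code generator.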
Numba Architecture
• Entry points
    • …/numba/decorators.py
• Environment
    • …/numba/environment.py
• Pipeline
    • …/numba/pipeline.py
• Code generation
    • …/numba/codegen/...
Development Roadmap
• Better stage separation, better modularity
• Untyped Intermediate Representation (IR)
• Typed IR
• Specialized IR
• Module level entry points
• Better Array Specialization
Community Involvement
• ~/git/numba$ wc AUTHORS
  25 88 1470 AUTHORS
    • (4 lines are blank or instructions)
• Github https://github.com/numba/numba
• Mailing list - numba-users@continuum.io
• Sprints - contact Jon Riehl
• Examples:
    • Hernan Grecco just contributed Python 3 support (Yeah!)
    • Dag collaborating on autojit classes with Mark F.
• We need you to show off your amazing demo!