Numba: A dynamic Python compiler for Science
(i.e. for NumPy and other typed containers)
March 16, 2013
Travis E. Oliphant, Jon Riehl, Mark Florisson, Siu Kwan Lam
Saturday, March 16, 13
Where I’m coming from
(Before / After figures)
$\rho_0 (2\pi f)^2 U_i(a, f) = \left[ C_{ijkl}(a, f)\, U_{k,l}(a, f) \right]_{,j}$

spyder

scikits
machine learning in Python
1,000,000 to 2,000,000 users of NumPy!
NumFOCUS --- blatant ad!
www.numfocus.org
501(c)3 Public Charity
Join Us! http://numfocus.org/membership/
Code that users might write
$x_i = \sum_{j=0}^{i-1} k_{i-j,\,j}\, a_{i-j}\, a_j$

$O = I \ast F$

Slow!!!!
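Written out directly in plain Python (a hypothetical transcription of the sum above, with `k` a 2-D and `a` a 1-D sequence; `compute_x` is not from the slides), the expression becomes a nested loop that the interpreter executes slowly:

```python
# Direct Python transcription of x_i = sum_{j=0}^{i-1} k[i-j][j] * a[i-j] * a[j].
# Every iteration pays for dynamic typing and per-item lookups -- hence "Slow!!!!".

def compute_x(k, a):
    n = len(a)
    x = [0.0] * n
    for i in range(n):
        for j in range(i):
            x[i] += k[i-j][j] * a[i-j] * a[j]
    return x

k = [[1.0] * 4 for _ in range(4)]   # toy coefficient table
a = [1.0, 2.0, 3.0, 4.0]
compute_x(k, a)
```

With all coefficients equal to one, each `x[i]` is just the sum of the products `a[i-j]*a[j]`, which makes the loop easy to check by hand.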
Why is Python slow?
1. Dynamic typing
2. Attribute lookups
3. NumPy get-item (a[...])
What are Scientists doing Now?

Writing critical parts in C/C++/Fortran and
“wrapping” with

SWIG

ctypes

Cython

f2py (or fwrap)

hand-coded wrappers

Writing new code in Cython directly

Cython is “modified Python” with type information everywhere.

It produces a C-extension module which is then compiled
Cython is the most popular
these days. But, speeding up
NumPy-based codes should be
even easier!
NumPy Array is “typed container”
(diagram: ndarray memory layout, with its shape information)
Let’s use this!
NumPy Users are already using “typed
containers” with regular storage and access
patterns. There is plenty of information to
optimize the code if we either:

Provide type information for function
inputs (jit)

Create a “call-site” for each function that
compiles and caches the result the first
time it gets called with new types.
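The second option can be sketched in a few lines of plain Python (a hypothetical `autojit_sketch` decorator, not Numba's API; the compilation step is stubbed out by reusing the interpreted function):

```python
# Sketch of a "call-site": specializations are compiled (here: stubbed)
# and cached the first time a new argument-type signature appears.

def autojit_sketch(func):
    cache = {}  # maps argument-type signature -> compiled specialization

    def call_site(*args):
        signature = tuple(type(a) for a in args)
        if signature not in cache:
            # A real JIT would type-infer and emit machine code here;
            # we simply stand in the original Python function.
            cache[signature] = func
        return cache[signature](*args)

    call_site.cache = cache
    return call_site

@autojit_sketch
def add(a, b):
    return a + b

add(1, 2)        # "compiles" the (int, int) specialization
add(1.0, 2.0)    # "compiles" the (float, float) specialization
add(3, 4)        # cache hit: no recompilation
```

Subsequent calls with already-seen types go straight to the cached specialization, which is where the run-time savings come from.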
Requirements Part I

Work with CPython (we need the full scientific
Python stack!)

Minimal modifications to code (use type inference)

Programmer control over what and when to “jit”

Ability to build static extensions (for libraries)

Fall back to Python C-API for “object” types.
Requirements Part II

Produce code as fast as C (maybe even Fortran)

Support NumPy array-expressions and be able to
produce universal functions (e.g. y = sin(x))

Provide a tool that could adapt to provide
parallelism and produce code for modern vector
hardware (GPUs, accelerators, and many-core
machines)
Do we have to write the full compiler??
No!
LLVM has done much of the heavy lifting.
LLVM = Compilers for everybody
Face of a modern compiler
(diagram) Front-End (Parsing): C, C++, Fortran, ObjC → Intermediate Representation (IR) → Back-End (Code Generation): x86, ARM, PTX
Face of a modern compiler
(diagram) Front-End (Parsing): Python, via Numba → Intermediate Representation (IR) → Back-End (Code Generation): x86, ARM, PTX, via LLVM
Example
NumPy + Mamba = Numba
(diagram) The LLVM Library (used by Intel, Nvidia, Apple, and AMD for OpenCL, ISPC, CUDA, CLANG, OpenMP) plus LLVMPY turn a Python Function into Machine Code (including ARM).
Simple API

jit --- provide type information (fastest to call at run-time)

autojit --- detects input types, infers output, generates code
if needed, and dispatches (a little more run-time call
overhead)
#@jit('void(double[:,:], double, double)')
@autojit
def numba_update(u, dx2, dy2):
    nx, ny = u.shape
    for i in xrange(1, nx-1):
        for j in xrange(1, ny-1):
            u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                      (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))
Use either jit or autojit, not both (comment one of them out).
Example
import numba
from math import sin, pi

@numba.jit('f8(f8)')
def sinc(x):
    if x == 0.0:
        return 1.0
    else:
        return sin(x*pi)/(pi*x)
~150x speed-up
Real-time image
processing (50 fps
Mandelbrot)
Speeding up Math Expressions
$x_i = \sum_{j=0}^{i-1} k_{i-j,\,j}\, a_{i-j}\, a_j$
Image Processing
@jit('void(f8[:,:],f8[:,:],f8[:,:])')
def filter(image, filt, output):
    M, N = image.shape
    m, n = filt.shape
    for i in range(m//2, M-m//2):
        for j in range(n//2, N-n//2):
            result = 0.0
            for k in range(m):
                for l in range(n):
                    result += image[i+k-m//2, j+l-n//2] * filt[k, l]
            output[i,j] = result
~1500x speed-up
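The same loop nest can be written in pure Python on nested lists (a hypothetical `filter_py`, shown only to illustrate the access pattern the decorator speeds up; it is not the Numba version):

```python
# Pure-Python version of the correlation loop from the slide:
# interior pixels only, same index arithmetic as the @jit'd function.

def filter_py(image, filt, output):
    M, N = len(image), len(image[0])
    m, n = len(filt), len(filt[0])
    for i in range(m//2, M - m//2):
        for j in range(n//2, N - n//2):
            result = 0.0
            for k in range(m):
                for l in range(n):
                    result += image[i+k-m//2][j+l-n//2] * filt[k][l]
            output[i][j] = result

image = [[float(4*r + c) for c in range(4)] for r in range(4)]
filt = [[1.0] * 3 for _ in range(3)]   # 3x3 box filter
output = [[0.0] * 4 for _ in range(4)]
filter_py(image, filt, output)
# each interior output pixel is the sum of its 3x3 neighborhood
```

The quadruple loop over individual elements is exactly the shape of code that is painfully slow in the interpreter but compiles to tight machine code.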
Compile NumPy array expressions
from numba import autojit

@autojit
def formula(a, b, c):
    a[1:,1:] = a[1:,1:] + b[1:,:-1] + c[1:,:-1]

@autojit
def express(m1, m2):
    m2[1:-1:2,0,...,::2] = (m1[1:-1:2,...,::2] *
                            m1[-2:1:-2,...,::2])
    return m2
Fast vectorize
NumPy’s ufuncs take “kernels” and
apply the kernel element-by-element
over entire arrays
Write kernels in
Python!
from numba.vectorize import vectorize
from math import sin, pi

@vectorize(['f8(f8)', 'f4(f4)'])
def sinc(x):
    if x == 0.0:
        return 1.0
    else:
        return sin(x*pi)/(pi*x)
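Without Numba, the kernel-over-elements idea is just a map. The sketch below (a hypothetical `vectorize_py` helper, not Numba's decorator) shows the semantics; the real `@vectorize` produces a true NumPy ufunc with broadcasting and compiled loops:

```python
from math import sin, pi

def sinc(x):
    # The scalar kernel from the slide.
    if x == 0.0:
        return 1.0
    else:
        return sin(x*pi)/(pi*x)

def vectorize_py(kernel):
    # Pure-Python stand-in: apply the scalar kernel element-by-element.
    def ufunc_like(xs):
        return [kernel(x) for x in xs]
    return ufunc_like

vsinc = vectorize_py(sinc)
vsinc([0.0, 0.5, 1.0])
```

The list comprehension is the slow interpreted analogue of the loop Numba compiles once per element type.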
Case-study -- j0 from scipy.special

scipy.special was one of the first libraries I wrote

extended the “umath” module by adding new “universal functions” that compute many scientific functions by wrapping C and Fortran libraries.

Bessel functions are solutions to a differential
equation:
$x^2 \frac{d^2 y}{dx^2} + x \frac{dy}{dx} + (x^2 - \alpha^2)\, y = 0, \qquad y = J_\alpha(x)$

$J_n(x) = \frac{1}{\pi} \int_0^\pi \cos(n\tau - x \sin\tau)\, d\tau$
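The integral form above can be checked numerically with nothing but the math module. This is a composite-Simpson sketch (a hypothetical `bessel_j` helper, not scipy's cephes-based implementation):

```python
from math import cos, sin, pi

def bessel_j(n, x, steps=2000):
    # J_n(x) = (1/pi) * integral_0^pi cos(n*tau - x*sin(tau)) dtau,
    # evaluated with the composite Simpson rule (steps must be even).
    h = pi / steps
    total = cos(0.0) + cos(n*pi - x*sin(pi))   # endpoint terms tau=0, tau=pi
    for i in range(1, steps):
        weight = 4 if i % 2 else 2             # Simpson weights 4,2,4,2,...
        tau = i * h
        total += weight * cos(n*tau - x*sin(tau))
    return total * h / (3 * pi)

bessel_j(0, 0.0)                  # J_0(0) = 1
bessel_j(0, 2.404825557695773)    # near the first zero of J_0
```

This kind of scalar kernel is exactly what the following slides wrap with `@vectorize` to recover compiled-library speed.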
scipy.special.j0 wraps cephes algorithm
Result --- equivalent to compiled code
In [6]: %timeit vj0(x)
10000 loops, best of 3: 75 us per loop
In [7]: from scipy.special import j0
In [8]: %timeit j0(x)
10000 loops, best of 3: 75.3 us per loop
But! Now code is in Python and can be
experimented with more easily (and moved to
the GPU / accelerator more easily)!
Laplace Example
@jit('void(double[:,:], double, double)')
def numba_update(u, dx2, dy2):
    nx, ny = u.shape
    for i in xrange(1, nx-1):
        for j in xrange(1, ny-1):
            u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                      (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))
Adapted from http://www.scipy.org/PerformancePython
originally by Prabhu Ramachandran
@jit('void(double[:,:], double, double)')
def numbavec_update(u, dx2, dy2):
    u[1:-1,1:-1] = ((u[2:,1:-1] + u[:-2,1:-1])*dy2 +
                    (u[1:-1,2:] + u[1:-1,:-2])*dx2) / (2*(dx2+dy2))
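What the sweep computes can be seen in a pure-Python version on nested lists (a hypothetical `update_py`, for illustration only): with dx2 == dy2, each interior point becomes the plain average of its four neighbors.

```python
def update_py(u, dx2, dy2):
    # In-place sweep over interior points, same stencil as the slide;
    # with dx2 == dy2 each point becomes the average of its neighbors.
    nx, ny = len(u), len(u[0])
    for i in range(1, nx-1):
        for j in range(1, ny-1):
            u[i][j] = ((u[i+1][j] + u[i-1][j]) * dy2 +
                       (u[i][j+1] + u[i][j-1]) * dx2) / (2*(dx2 + dy2))

# 3x3 grid: boundary held at 1.0, single interior point starting at 0.0
u = [[1.0, 1.0, 1.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
update_py(u, 1.0, 1.0)
# u[1][1] is now the average of its four neighbors: 1.0
```

Iterating this sweep to convergence solves the Laplace equation on the grid, which is the benchmark in the next slide.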
Results of Laplace example
Version         Time   Speed Up
NumPy           3.19   1.0
Numba           2.32   1.38
Vect. Numba     2.33   1.37
Cython          2.38   1.34
Weave           2.47   1.29
Numexpr         2.62   1.22
Fortran Loops   2.30   1.39
Vect. Fortran   1.50   2.13
https://github.com/teoliphant/speed.git
Numba can change the game!
(diagram) LLVM IR connects front-ends (C, C++, Fortran, Python) to back-ends (x86, ARM, PTX).
Numba turns Python into a “compiled language” (but much more flexible). You don’t have to reach for C/C++.
Many More Advanced Features

Extension classes (jit a class --- autojit coming soon!)

Struct support (NumPy arrays can be structs)

SSA --- can refer to local variables as different types

Typed lists and typed dictionaries and sets coming soon!

pointer support

calling ctypes and CFFI functions natively

pycc (create stand-alone dynamic library and executable)

pycc --python (create static extension module for Python)
Uses of Numba
(diagram) Python Function → Numba → function pointer → Framework accepting dynamic function pointers.
Examples: Ufuncs, Generalized UFuncs, Function-based Indexing, Memory Filters, Window Kernel Funcs, I/O Filters, Reduction Filters, Computed Columns.
Accelerate/NumbaPro -- blatant ad!
Python and NumPy compiled to
Parallel Architectures
(GPUs and multi-core
machines)

Create parallel-for loops

Parallel execution of
ufuncs

Run ufuncs on the GPU

Write CUDA directly in
Python!

Free for Academics
fast development and fast
execution!
Currently-premium features will be contributed to open source over time!
Numba Development
1260 Mark Florisson
203 Jon Riehl
181 Siu Kwan Lam
110 Travis E. Oliphant
30 Dag Sverre Seljebotn
28 Hernan Grecco
19 Ilan Schnell
11 Mark Wiebe
8 James Bergstra
4 Alberto Valverde
3 Thomas Kluyver
2 Maggie Mari
2 Dan Yamins
2 Dan Christensen
1 timo
1 Yaroslav Halchenko
1 Phillip Cloud
1 Ondřej Čertík
1 Martin Spacek
1 Lars Buitinck
1 Juan Luis Cano Rodríguez
git log --format=format:%an | sort | uniq -c | sort -r
Milestone Roadmap

Rapid progress this year

Still some bugs -- needs users!

Version 0.7 end of Feb.

Version 0.8 in April

Version 0.9 June

Version 1.0 by end of August

Stable API (jit, autojit) easy to use

Should be able to write equivalent of
NumPy and SciPy with Numba and
memory-views.
http://numba.pydata.org
http://llvmpy.org
http://compilers.pydata.org
We need you:

your use-cases

your tests

developer help
Architectural Overview
(diagram) Python Source → Python Parser → Python AST → Numba Stage 1 … Numba Stage n → Numba AST → Numba Code Generator → LLVM, with a shared Numba Environment across stages.
Numba Architecture
• Entry points --- …/numba/decorators.py
• Environment --- …/numba/environment.py
• Pipeline --- …/numba/pipeline.py
• Code generation --- …/numba/codegen/...
Development Roadmap
• Better stage separation, better modularity
• Untyped Intermediate Representation (IR)
• Typed IR
• Specialized IR
• Module-level entry points
• Better Array Specialization
Community Involvement
• ~/git/numba$ wc AUTHORS
  25 88 1470 AUTHORS
  (4 lines are blank or instructions)
• Github --- https://github.com/numba/numba
• Mailing list --- numba-users@continuum.io
• Sprints --- contact Jon Riehl
• Examples:
  • Hernan Grecco just contributed Python 3 support (Yeah!)
  • Dag collaborating on autojit classes with Mark F.
• We need you to show off your amazing demo!