PyPy and Unladen Swallow - Making Python Fast

Friday, July 30, 2010


Why is Python slow?

Why is CPython slow?

What are we going to do about it?

Why is Python Slow
Python, the abstract language, not the implementation.

Very dynamic.

Almost nothing known at compile time.

Frame introspection.

Object model.

Globals/Builtins

Frame Introspection
import sys

def f():
    a = 3
    g()

def g():
    try:
        raise Exception
    except Exception:
        # The traceback's frame is g's frame; f_back is its caller, f.
        frame = sys.exc_info()[2].tb_frame
        print(frame.f_back.f_locals["a"])  # prints 3: g reads f's local

f()

Object Model
class A(object):
    def __init__(self, **kwargs):
        # Attributes are created at runtime from arbitrary keyword arguments.
        self.__dict__.update(kwargs)

o = A(a=1, b=2)
print(o.a)

Dynamic

def f(a, b):
    print(a + b)
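A minimal sketch (Python 3; the `Money` class and `add` are hypothetical) of why almost nothing is known at compile time: the single `a + b` below, as in `f`, performs integer addition, float addition, string or list concatenation, or arbitrary user code, depending only on the runtime types.

```python
class Money(object):
    """Hypothetical user-defined type whose + runs Python code."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Arbitrary code can run here, which a compiler cannot see ahead of time.
        return Money(self.cents + other.cents)


def add(a, b):
    # Like f() on the slide, but returns the sum instead of printing it.
    return a + b


results = [
    add(1, 2),                          # int addition
    add(1.5, 2.25),                     # float addition
    add("py", "py"),                    # string concatenation
    add([1], [2]),                      # list concatenation
    add(Money(150), Money(250)).cents,  # user-defined __add__
]
print(results)  # [3, 3.75, 'pypy', [1, 2], 400]
```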

Globals/Builtins
def f(l):
    yield len(l)
    yield len(l)

for i in f([3]):
    print(i)           # prints 1, then 3
    len = lambda o: 3  # rebinds the global "len", shadowing the builtin

Why is CPython Slow

“Primitive” bytecode VM

Value boxing

Reference counting
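A small CPython-specific sketch of the last two points (Python 3; the function names are made up): even the integer 1 is a boxed heap object rather than a raw machine word, and every new reference to an object updates its reference count. Absolute `sys.getrefcount` values include temporary references, so the sketch compares a before/after difference instead.

```python
import sys


def int_is_boxed():
    # In CPython even a small int is a full object (header plus digits),
    # so it occupies more than one 8-byte machine word.
    return sys.getsizeof(1) > 8


def refcount_delta():
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj]  # storing obj in a list adds one strong reference
    after = sys.getrefcount(obj)
    del holder      # dropping the list decrements the count again
    return after - before


print(int_is_boxed(), refcount_delta())  # True 1
```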

What Are We Going To Do About It

Unladen Swallow

PyPy

Unladen Swallow

Google-funded branch of CPython

Started from Python 2.6

PEP 3146 - Merging Unladen Swallow into Py3k

LLVM-based function JIT

LLVM
Low Level Virtual Machine

Not a VM like CPython.

Take the Python representation of a function (its bytecode), turn it into the LLVM representation of that function, and generate machine code.

Includes a large set of optimization passes and code generators.

JIT

Profile and see which functions are called the most.

Record what types are seen for each operation.

Emit optimized machine code (that bails back to the interpreter if guards fail).
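A toy model of that profile, specialize, and guard cycle, written in Python 3. Everything here (`ToyFunctionJIT`, `HOT_CALL_THRESHOLD`, the specializer) is hypothetical, and a Python closure stands in for emitted machine code; it is not how Unladen Swallow is implemented.

```python
from collections import Counter

HOT_CALL_THRESHOLD = 3  # hypothetical; real JITs wait for far more calls


class ToyFunctionJIT(object):
    """Hypothetical profiling function JIT.

    Counts calls and argument types; once the function is hot, installs a
    fast path specialized for the most common types, protected by a type
    guard. Calls that fail the guard take the generic path instead, which
    stands in for bailing back to the interpreter.
    """

    def __init__(self, func, specialize):
        self.func = func                # generic ("interpreted") version
        self.specialize = specialize    # builds a fast path for a type tuple
        self.calls = 0
        self.type_profile = Counter()   # argument-type tuples seen so far
        self.guard_types = None
        self.fast_path = None
        self.bailouts = 0

    def __call__(self, *args):
        arg_types = tuple(type(a) for a in args)
        if self.fast_path is not None:
            if arg_types == self.guard_types:  # guard holds: fast path
                return self.fast_path(*args)
            self.bailouts += 1                 # guard failed: bail out
            return self.func(*args)
        self.calls += 1
        self.type_profile[arg_types] += 1
        if self.calls >= HOT_CALL_THRESHOLD:
            # "Compile": specialize for the most frequently seen types.
            self.guard_types = self.type_profile.most_common(1)[0][0]
            self.fast_path = self.specialize(self.guard_types)
        return self.func(*args)


def generic_add(a, b):
    return a + b


def specialize_add(types):
    # Stand-in for emitting machine code: an int-only adder for (int, int).
    if types == (int, int):
        return lambda a, b: int.__add__(a, b)
    return generic_add


add = ToyFunctionJIT(generic_add, specialize_add)
print([add(i, 1) for i in range(5)], add("a", "b"), add.bailouts)
# [1, 2, 3, 4, 5] ab 1
```

The guard is the key design choice: the fast path is only correct for the types seen while profiling, so any other types go back to the generic path instead of producing wrong results.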

PyPy

Python in Python

JIT Generator

Tracing JIT

RPython

Restricted Python

Statically typed subset of Python

Can be efficiently converted to C, JVM bytecode, or CIL (.NET bytecode)

JIT Generator

Take an interpreter written in RPython

Add a few hints to the source code

Automatically generate a JIT for it
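A sketch of what those hints look like on a tiny bytecode interpreter. In real RPython code `JitDriver` comes from PyPy's `rpython.rlib.jit` module; here a no-op stand-in class of the same name is defined so the example runs under plain Python 3. The interpreter and its three opcodes are hypothetical.

```python
class JitDriver(object):
    """No-op stand-in for RPython's JitDriver (rpython.rlib.jit in PyPy's
    sources), so this sketch runs under plain Python. The real translator
    reads these hints to generate a tracing JIT for the interpreter."""

    def __init__(self, greens, reds):
        self.greens = greens  # identify the position in the user's program
        self.reds = reds      # every other live variable

    def jit_merge_point(self, **live_vars):
        pass  # hint: top of the interpreter's dispatch loop

    def can_enter_jit(self, **live_vars):
        pass  # hint: a backward jump in the user's program (a possible loop)


jitdriver = JitDriver(greens=["pc", "program"], reds=["acc", "counter"])


def interpret(program, counter):
    """Hypothetical three-opcode interpreter.

    "+" adds counter to acc, "-" decrements counter, and "J" jumps back to
    the start of the program while counter > 0.
    """
    pc = 0
    acc = 0
    while pc < len(program):
        jitdriver.jit_merge_point(pc=pc, program=program, acc=acc, counter=counter)
        op = program[pc]
        if op == "+":
            acc += counter
            pc += 1
        elif op == "-":
            counter -= 1
            pc += 1
        elif op == "J":
            if counter > 0:
                pc = 0
                jitdriver.can_enter_jit(pc=pc, program=program, acc=acc, counter=counter)
            else:
                pc += 1
        else:
            raise ValueError("unknown opcode %r" % op)
    return acc


print(interpret("+-J", 4))  # 10, i.e. 4 + 3 + 2 + 1
```

Greens are the variables that identify a position in the user's program (here `pc` and `program`); reds are everything else that is live. `jit_merge_point` marks the interpreter's main loop and `can_enter_jit` marks user-level backward jumps, where the generated JIT counts loop iterations.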

Tracing JIT

Profile code looking for hot loops

Record types seen within a loop

Generate optimized machine code for loops
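A toy tracing JIT in Python 3, for illustration only; the mini bytecode, `HOT_LOOP_THRESHOLD`, and the guard format are all hypothetical. It counts back edges, records one iteration of the hot loop along with guards on the types and branch directions it observed, then replays that linear trace until a guard fails and hands control back to the interpreter.

```python
HOT_LOOP_THRESHOLD = 2  # hypothetical; real tracing JITs wait much longer

# Toy bytecode for:  while i < n: total = total + i; i = i + 1
LOOP = [
    ("lt", "i", "n", "cond"),        # 0: cond = i < n
    ("exit_if_false", "cond"),       # 1: leave the loop when cond is false
    ("add", "total", "i", "total"),  # 2: total = total + i
    ("add", "i", "one", "i"),        # 3: i = i + 1
    ("jump_back",),                  # 4: back edge, where hotness is counted
]


def execute(instr, env):
    """Interpret one instruction; return False if it exits the loop."""
    op = instr[0]
    if op == "lt":
        env[instr[3]] = env[instr[1]] < env[instr[2]]
    elif op == "add":
        env[instr[3]] = env[instr[1]] + env[instr[2]]
    elif op == "exit_if_false":
        return bool(env[instr[1]])
    return True


def make_guard(instr, env):
    """Capture what was observed while recording, as a check for replay."""
    if instr[0] == "add":
        seen = (type(env[instr[1]]), type(env[instr[2]]))
        return lambda e: (type(e[instr[1]]), type(e[instr[2]])) == seen
    if instr[0] == "exit_if_false":
        return lambda e: bool(e[instr[1]])  # recorded path stayed in the loop
    return lambda e: True


def run(env):
    stats = {"interpreted": 0, "traced": 0}
    back_edges, recording, trace, pc = 0, False, [], 0
    while True:
        if trace and not recording and pc == 0:
            # Replay the linear trace until one of its guards fails.
            guard_failed = False
            while not guard_failed:
                for trace_pc, instr, guard in trace:
                    if not guard(env):
                        pc, guard_failed = trace_pc, True  # resume here
                        break
                    execute(instr, env)
                else:
                    stats["traced"] += 1
        instr = LOOP[pc]
        if recording and instr[0] != "jump_back":
            trace.append((pc, instr, make_guard(instr, env)))
        stats["interpreted"] += 1
        if not execute(instr, env):
            return stats
        if instr[0] == "jump_back":
            back_edges += 1
            if recording:
                recording = False  # one full iteration has been recorded
            elif not trace and back_edges >= HOT_LOOP_THRESHOLD:
                recording = True   # the loop is hot: record the next iteration
            pc = 0
        else:
            pc += 1
```

Running it on `{"i": 0, "n": 5, "total": 0, "one": 1}` leaves `total == 10`: three iterations are interpreted (the third while recording), two run from the trace, and the final `i < n` check fails the branch guard, so the loop exits through the interpreter.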

Benchmarks

The Python Benchmark Suite

Extracted from Unladen Swallow

Used by CPython, Unladen Swallow, PyPy

CPython vs Unladen Swallow
Benchmark       CPython 2.6    Unladen Swallow    Difference
2to3            25.13 s        24.87 s            1.01
django           1.08 s         0.80 s            1.35
html5lib        14.29 s        13.20 s            1.08
nbody            0.51 s         0.28 s            1.84
rietveld         0.75 s         0.55 s            1.37
slowpickle       0.75 s         0.55 s            1.37
slowspitfire     0.83 s         0.61 s            1.36
slowunpickle     0.33 s         0.26 s            1.26
spambayes        0.31 s         0.34 s            1.10 (slower)

CPython vs PyPy

Faster is Possible

Global/Builtin Lookup Caching

Loading a global takes 1 dictionary lookup.

Loading a builtin takes 2 (a failed lookup in the globals, then a lookup in the builtins).

Globals/Builtins rarely, if ever, change.
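A sketch of the uncached lookup that caching avoids (Python 3; `load_global` is a hypothetical helper, not CPython's C code). It returns the value together with the number of dictionary lookups it took.

```python
import builtins

_MISSING = object()  # sentinel distinguishing "absent" from a stored None


def load_global(name, globals_dict, builtins_dict):
    """Hypothetical model of an uncached LOAD_GLOBAL.

    Returns (value, number_of_dict_lookups).
    """
    value = globals_dict.get(name, _MISSING)   # lookup 1: module globals
    if value is not _MISSING:
        return value, 1
    value = builtins_dict.get(name, _MISSING)  # lookup 2: builtins
    if value is not _MISSING:
        return value, 2
    raise NameError("name %r is not defined" % name)


module_globals = {"THRESHOLD": 10}
builtin_names = vars(builtins)
print(load_global("THRESHOLD", module_globals, builtin_names))  # (10, 1)
print(load_global("len", module_globals, builtin_names)[1])     # 2
```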

PyPy

Uses a dictionary implementation for modules similar to V8’s hidden classes.

Check that the dict has the right shape.

Read the field directly out of it.
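A simplified model of the idea on this slide, in Python 3. `Shape`, `ShapedModuleDict`, and `GlobalLoadCache` are hypothetical names, and this is not PyPy's actual module-dict code: the dict keeps its values in a list plus a shared, immutable shape that maps names to slots, so a call site that has seen the shape once only needs an identity check and a direct list read afterwards.

```python
class Shape(object):
    """Hypothetical hidden class: maps names to slots in a storage list.

    Shapes are shared and never mutated; adding a new name moves a dict to
    another Shape, while overwriting an existing name keeps the same one.
    """

    def __init__(self, slots):
        self.slots = slots     # dict: name -> index into storage
        self.transitions = {}  # name -> Shape with that name appended

    def with_name(self, name):
        if name not in self.transitions:
            new_slots = dict(self.slots)
            new_slots[name] = len(self.slots)
            self.transitions[name] = Shape(new_slots)
        return self.transitions[name]


EMPTY_SHAPE = Shape({})


class ShapedModuleDict(object):
    """Hypothetical module namespace: a shape plus a flat list of values."""

    def __init__(self):
        self.shape = EMPTY_SHAPE
        self.storage = []

    def __setitem__(self, name, value):
        index = self.shape.slots.get(name)
        if index is None:
            self.shape = self.shape.with_name(name)  # new name: new shape
            self.storage.append(value)
        else:
            self.storage[index] = value              # same shape, new value

    def __getitem__(self, name):
        return self.storage[self.shape.slots[name]]


class GlobalLoadCache(object):
    """Hypothetical per-call-site cache remembering (shape, slot)."""

    def __init__(self, name):
        self.name = name
        self.shape = None
        self.index = None
        self.misses = 0

    def load(self, module_dict):
        if module_dict.shape is self.shape:         # check the shape...
            return module_dict.storage[self.index]  # ...then read directly
        self.misses += 1                            # slow path: full lookup
        self.shape = module_dict.shape
        self.index = module_dict.shape.slots[self.name]
        return module_dict.storage[self.index]


mod = ShapedModuleDict()
mod["x"] = 1
cache = GlobalLoadCache("x")
cache.load(mod)
mod["x"] = 2                          # existing name: shape unchanged
print(cache.load(mod), cache.misses)  # 2 1
mod["y"] = 3                          # new name: new shape, one more miss
print(cache.load(mod), cache.misses)  # 2 2
```

Overwriting an existing global keeps the shape, so the cached slot stays valid and the new value is read; only adding a name changes the shape and costs one slow lookup.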

In Unladen Swallow

When the JIT compiler sees a LOAD_GLOBAL opcode, it does the lookup at compile time, writes the exact address of the value into the machine code, and registers a listener with the globals/builtins dictionary.

If the globals/builtins dictionary is written to, the machine code is invalidated.
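A Python 3 sketch of that compile-and-invalidate scheme. `WatchedDict` and `GlobalLoadSite` are hypothetical; a closure returning the looked-up value stands in for machine code with the address baked in, and invalidation swaps it for the generic lookup, which stands in for falling back to the interpreter.

```python
class WatchedDict(dict):
    """Hypothetical dict that notifies listeners whenever it is written to.

    Only item assignment and deletion are hooked in this sketch; a complete
    version would also hook update(), pop(), clear(), setdefault(), etc.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.listeners = []

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._notify()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._notify()

    def _notify(self):
        for listener in self.listeners:
            listener()


class GlobalLoadSite(object):
    """Hypothetical stand-in for one compiled LOAD_GLOBAL."""

    def __init__(self, name, globals_dict, builtins_dict):
        self.name = name
        self.globals_dict = globals_dict
        self.builtins_dict = builtins_dict
        self.invalidations = 0
        # "Compile time": look the name up once and bake the value in.
        value = self._lookup()
        self.code = lambda: value
        # Register listeners so any write invalidates the compiled code.
        globals_dict.listeners.append(self.invalidate)
        builtins_dict.listeners.append(self.invalidate)

    def _lookup(self):
        if self.name in self.globals_dict:
            return self.globals_dict[self.name]
        return self.builtins_dict[self.name]

    def invalidate(self):
        # Throw away the baked-in value and fall back to the generic lookup,
        # standing in for returning to the interpreter.
        self.code = self._lookup
        self.invalidations += 1

    def __call__(self):
        return self.code()


builtins_dict = WatchedDict(len=len)
module_globals = WatchedDict(LIMIT=10)
load_limit = GlobalLoadSite("LIMIT", module_globals, builtins_dict)
print(load_limit())                            # 10
module_globals["LIMIT"] = 20                   # invalidates the compiled load
print(load_limit(), load_limit.invalidations)  # 20 1
```

As on the slide, any write to either dictionary invalidates the baked-in value, even a write to an unrelated name: a cheap, conservative check traded for loads that skip the dictionary lookups entirely in the common case.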

Inlining
Good programming practice is to split code into small functions.

Function calls are expensive.

Also, calls across the Python interpreter/C (or other target language) barrier are expensive.

Inlining removes argument parsing, frame setup, and “raw” function call overhead.
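What inlining does, shown by hand in Python 3 (the functions are illustrative): the second version copies the body of `square` into the loop, so there is no frame creation or argument passing per element. Timings depend on the machine and interpreter, so none are quoted here.

```python
import timeit


def square(x):
    return x * x


def sum_squares(values):
    total = 0
    for v in values:
        total += square(v)  # one Python-level call, and one frame, per element
    return total


def sum_squares_inlined(values):
    total = 0
    for v in values:
        total += v * v      # square()'s body copied in: no call at all
    return total


data = list(range(1000))
assert sum_squares(data) == sum_squares_inlined(data)
for fn in (sum_squares, sum_squares_inlined):
    seconds = timeit.timeit(lambda: fn(data), number=200)
    print("%-20s %.4f s" % (fn.__name__, seconds))
```

A JIT can only do this automatically if it can be sure, or guard, that `square` still names the same function, since globals can be rebound at any time.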

In Unladen Swallow
This hasn’t landed in trunk yet.

When compiling code, if a CALL_FUNCTION always points to the same function, check how “expensive” that function is; if it’s cheap, copy its bytecode into the caller’s bytecode.

Also, at CPython compile time, turn all library functions into LLVM IR, so they can be inlined as well.
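A sketch of such a heuristic in Python 3. The threshold, the cost measure (a plain count of bytecode instructions from `dis.get_instructions`), and the function names are assumptions for illustration; Unladen Swallow's actual cost model is not described on this slide.

```python
import dis

INLINE_COST_LIMIT = 20  # hypothetical threshold, in bytecode instructions


def bytecode_cost(func):
    """Crude cost estimate: number of bytecode instructions in func."""
    return sum(1 for _ in dis.get_instructions(func))


def should_inline(call_site_targets, limit=INLINE_COST_LIMIT):
    """Hypothetical inlining decision for one call site.

    call_site_targets is the set of function objects this call site has
    been observed calling. Inline only if it always called the same
    function and that function is cheap.
    """
    if len(call_site_targets) != 1:
        return False
    (target,) = call_site_targets
    return bytecode_cost(target) <= limit


def double(x):
    return x * 2


def expensive(items):
    total = 0
    for item in items:
        if item % 2 == 0:
            total += item * item
        else:
            total -= item
        if total > 1000:
            total %= 1000
    return total


print(should_inline({double}))             # True: one target, and cheap
print(should_inline({double, expensive}))  # False: more than one target
print(should_inline({expensive}))          # False: too expensive
```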

In PyPy

The tracing JIT automatically goes through all function calls.

The final operations list automatically has all calls inlined.

Library functions are compiled to jitcode (the JIT’s IR) at PyPy compile time, so they can be inlined too.
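A toy Python 3 illustration of why a trace ends up with calls inlined. The `Tracer` class is hypothetical and records only primitive operations as they execute, so the nested Python calls leave no trace of their own; a real tracing JIT records operations on variables plus guards rather than concrete values.

```python
class Tracer(object):
    """Hypothetical recorder of primitive operations, in execution order."""

    def __init__(self):
        self.ops = []

    def mul(self, a, b):
        self.ops.append(("mul", a, b))
        return a * b

    def add(self, a, b):
        self.ops.append(("add", a, b))
        return a + b


def square(t, x):
    return t.mul(x, x)


def sum_of_squares(t, a, b):
    # Two calls to square() in the source...
    return t.add(square(t, a), square(t, b))


tracer = Tracer()
result = sum_of_squares(tracer, 3, 4)
print(result)      # 25
print(tracer.ops)  # [('mul', 3, 3), ('mul', 4, 4), ('add', 9, 16)]
# ...but the recorded operations list contains no calls at all.
```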

Questions? Complaints? Thrown Vegetables?
