
GPU Computing with

Spark and Python

Afif A. Iskandar
(AI Research Engineer)
My Bio

Afif A. Iskandar
Artificial Intelligence Research Engineer & Educator

- Artificial Intelligence Research Engineer @ Unicorn Startup
- Content Creator & Educator @ NgodingPython

Afif A. Iskandar
AI Enthusiast
Bachelor's Degree in Mathematics @ Universitas Indonesia
Master's Degree in Computer Science @ Universitas Indonesia
Overview

● Why Python?
● Numba: Python JIT Compiler for CPU and GPU
● PySpark: Distributed Programming in Python
● Hands-On Tutorial
● Conclusion
Why Python?
Python is Fast
for writing, testing and developing code
Python is Fast
because it’s interpreted, dynamically typed and high level
Python is Slow
for repeated execution of low-level tasks
Python is Slow, Because

● Python is a high-level, interpreted and dynamically-typed language
● Each Python operation comes with a small
type-checking overhead
● With many repeated small operations (e.g. in a
loop), this overhead becomes significant!
The paradox ...

what makes Python fast for development

is what makes Python slow for code execution
Is there another way?

- Switching languages for speed in your projects can be a little clunky:
- Sometimes tedious boilerplate for translating data types across the language barrier
- Generating compiled functions for the wide range of data types can be difficult
- How can we use cutting edge hardware, like GPUs?
Numba
Compiling Python

● Numba is an open-source, type-specializing compiler for Python functions
● Can translate Python syntax into machine code if all type information can be deduced when the function is called.
● Implemented as a module. Does not replace the Python interpreter!
● Code generation done with:
○ LLVM (for CPU)
○ NVVM (for CUDA GPUs).
How Does Numba Work?
Numba on the CPU
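The original slide shows code on screen; as a stand-in, here is a minimal sketch of Numba on the CPU (the function name and data are illustrative, not the slide's exact code). The @jit decorator compiles the function to machine code the first time it is called.

from numba import jit
import numpy as np

@jit(nopython=True)
def sum_of_squares(arr):
    # This loop runs as compiled machine code, not interpreted Python
    total = 0.0
    for x in arr:
        total += x * x
    return total

data = np.arange(1_000_000, dtype=np.float64)
print(sum_of_squares(data))  # first call triggers compilation; later calls reuse the compiled code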
CUDA Kernels in Python
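These slides show kernel code; as a stand-in, a minimal hedged sketch of a CUDA kernel written with numba.cuda (the kernel name and element-wise addition are illustrative assumptions):

from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)      # absolute index of this thread within the 1D grid
    if i < x.size:        # guard threads that fall past the end of the array
        out[i] = x[i] + y[i]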
Calling the Kernel from Python
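A hedged example of launching the add_kernel sketched on the previous slide: the launch configuration goes in square brackets, as kernel[blocks_per_grid, threads_per_block](args).

import numpy as np

n = 1_000_000
x = np.arange(n, dtype=np.float32)
y = 2 * x
out = np.empty_like(x)

threads_per_block = 128
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block

# Passing NumPy arrays directly makes Numba copy them to and from the GPU around the launch
add_kernel[blocks_per_grid, threads_per_block](x, y, out)
print(out[:5])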
Handling Device Memory Directly
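A sketch, under the same assumptions, of managing device memory explicitly so arrays are not copied on every kernel launch:

from numba import cuda
import numpy as np

n = 1_000_000
x = np.arange(n, dtype=np.float32)
y = 2 * x

d_x = cuda.to_device(x)              # copy host -> device once
d_y = cuda.to_device(y)
d_out = cuda.device_array_like(d_x)  # allocate the output on the device, no host copy

threads_per_block = 128
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks_per_grid, threads_per_block](d_x, d_y, d_out)

out = d_out.copy_to_host()           # copy device -> host only when the result is needed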
Higher Level Tools: GPU ufuncs
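A hedged sketch of a GPU ufunc (the function name and formula are illustrative): @vectorize with target='cuda' turns a scalar function into a NumPy-style ufunc that runs element-wise on the GPU, handling allocation, broadcasting, and the kernel launch for you.

import math
import numpy as np
from numba import vectorize

@vectorize(['float32(float32, float32)'], target='cuda')
def gpu_dist(x, y):
    return math.sqrt(x * x + y * y)

a = np.random.rand(1_000_000).astype(np.float32)
b = np.random.rand(1_000_000).astype(np.float32)
c = gpu_dist(a, b)   # looks like a normal NumPy ufunc call, but executes on the GPU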
GPU ufuncs Performance
PySpark
What is Apache Spark

● An API and an execution engine for distributed computing on a cluster
● Based on the concept of Resilient Distributed Datasets (RDDs)
○ Dataset: Collection of independent elements (files, objects, etc.) in memory from previous calculations, or originating from some data store
○ Distributed: Elements in RDDs are grouped into partitions and may be stored on different nodes
○ Resilient: RDDs remember how they were created, so if a node goes down, Spark can recompute the lost elements on another node
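A small hedged PySpark example of the three RDD properties above (the data and application name are made up for illustration):

from pyspark import SparkContext

sc = SparkContext(appName="rdd-demo")

# Dataset: a collection of elements, here created from a Python range
# Distributed: split into 8 partitions that may live on different nodes
rdd = sc.parallelize(range(1_000_000), numSlices=8)

# Resilient: Spark records the lineage (parallelize -> map -> filter), so a lost
# partition can be recomputed on another node instead of being reloaded
even_squares = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.count())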
Computation DAGs

Fig from:
https://databricks.com/blog/2015/06/22/understanding-your-spark-application-through-visualization.html
How Does Spark Scale?

● All cluster scaling is about minimizing I/O. Spark does this in several ways:
○ Keep intermediate results in memory with rdd.cache()
○ Move computation to the data whenever possible (functions are small and data is big!)
○ Provide computation primitives that expose parallelism and minimize communication between workers: map, filter, sample, reduce, …
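A hedged sketch of those primitives in PySpark (assumes an existing SparkContext sc; the data is illustrative):

data = sc.parallelize(range(10_000_000))

# Keep intermediate results in memory so later actions do not recompute them
multiples_of_three = data.filter(lambda x: x % 3 == 0).cache()

# map/reduce move small functions to the data and minimize worker-to-worker communication
total = multiples_of_three.map(lambda x: x * x).reduce(lambda a, b: a + b)
sample = multiples_of_three.sample(withReplacement=False, fraction=0.001)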
Python and Spark

● Spark is implemented in Java & Scala on the JVM
● Full API support for Scala, Java, and Python (+ limited support for R)
● How does Python work, since it doesn't run on the JVM (not counting IronPython)?
Tutorial
Notebook Link: TBA
Conclusion
PySpark and Numba for GPU Clusters

● Numba lets you create compiled CPU and CUDA functions right inside your Python applications.
● Numba can be used with Spark to easily distribute and run your code on Spark workers with GPUs (sketched below)
● There is room for improvement in how Spark interacts with the GPU,
but things do work.
● Beware of accidentally multiplying fixed initialization and compilation
costs.
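A hedged sketch of combining the two (assumes an existing SparkContext sc and Spark workers with CUDA GPUs): defining the GPU ufunc inside mapPartitions means the compilation cost is paid once per partition rather than once per element, which is the fixed-cost pitfall mentioned above.

import numpy as np

def process_partition(iterator):
    from numba import vectorize          # import and compile on the worker

    @vectorize(['float32(float32)'], target='cuda')
    def gpu_square(x):
        return x * x

    arr = np.fromiter(iterator, dtype=np.float32)
    return gpu_square(arr).tolist()

result = sc.parallelize(range(1_000_000), numSlices=4).mapPartitions(process_partition)
print(result.take(5))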
Thank You
