GPU Computing With Spark and Python
Afif A. Iskandar
(AI Research Engineer)
My Bio
Afif A. Iskandar
Artificial Intelligence Research Engineer & Educator, AI Enthusiast
Bachelor's Degree in Mathematics @ Universitas Indonesia
Master's Degree in Computer Science @ Universitas Indonesia
Overview
● Why Python?
● Numba: Python JIT Compiler for CPU and GPU
● PySpark: Distributed Programming in Python
● Hands-On Tutorial
● Conclusion
Why Python?
Python is Fast
for writing, testing and developing code
Python is Fast
because it’s interpreted, dynamically typed and high level
Python is Slow
for repeated execution of low-level tasks
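A minimal sketch of this claim (illustrative, not from the slides): the same numeric loop, first interpreted, then JIT-compiled with Numba's @njit decorator, which is covered in the next sections.

```python
# Illustrative sketch: an interpreted loop vs. the same loop compiled by Numba.
import numpy as np
from numba import njit

def py_sum_of_squares(arr):
    total = 0.0
    for x in arr:           # interpreted: per-iteration bytecode overhead
        total += x * x
    return total

@njit
def nb_sum_of_squares(arr):
    total = 0.0
    for x in arr:           # compiled to machine code on first call
        total += x * x
    return total

data = np.random.rand(1_000_000)
nb_sum_of_squares(data)     # first call pays a one-time JIT compilation cost
```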
Python is Slow, Because
[Figure from https://databricks.com/blog/2015/06/22/understanding-your-spark-application-through-visualization.html]
How Does Spark Scale?
● All cluster scaling is about minimizing I/O. Spark does this in several ways (see the sketch below):
○ Keep intermediate results in memory with rdd.cache()
○ Move computation to the data whenever possible (functions are small and data is big!)
○ Provide computation primitives that expose parallelism and minimize communication between workers: map, filter, sample, reduce, …
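A minimal PySpark sketch of these points (illustrative; assumes a local Spark installation): cache an intermediate RDD, then apply small functions to the distributed data with map, filter, and reduce.

```python
# Illustrative sketch of the scaling primitives listed above.
from pyspark import SparkContext

sc = SparkContext("local[*]", "scaling-demo")

rdd = sc.parallelize(range(1_000_000)).cache()   # keep intermediate results in memory

evens = rdd.filter(lambda x: x % 2 == 0)         # small function moves to the data
total = evens.map(lambda x: x * x) \
             .reduce(lambda a, b: a + b)         # parallel, minimal communication
print(total)
```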
Python and Spark
● Numba lets you create compiled CPU and CUDA functions right inside your Python applications.
● Numba can be used with Spark to easily distribute and run your code on Spark workers with GPUs.
● There is room for improvement in how Spark interacts with the GPU, but things do work.
● Beware of accidentally multiplying fixed initialization and compilation costs (see the sketch below).
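A hedged sketch of the pattern (illustrative, not the speaker's exact code; function names are hypothetical). Compiling inside mapPartitions means the fixed compilation cost is paid once per partition rather than once per record:

```python
# Illustrative sketch: run a Numba-compiled function on Spark workers.
# On GPU workers, @numba.cuda.jit or @vectorize(target='cuda') would
# replace @njit; the partition-level structure stays the same.
from numba import njit
from pyspark import SparkContext

sc = SparkContext("local[*]", "numba-on-spark")

def process_partition(values):
    @njit
    def kernel(x):                   # compiled once per partition,
        return x * x + 1.0           # not once per record
    for v in values:
        yield float(kernel(v))

result = (sc.parallelize([0.0, 1.0, 2.0, 3.0], numSlices=2)
            .mapPartitions(process_partition)
            .collect())
print(result)   # [1.0, 2.0, 5.0, 10.0]
```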
Thank You