
PyCUDA Tutorial
Introduction
As we may have already seen, NVIDIA GPUs use CUDA cores to perform calculations. In this segment, we
will be using Python to run CUDA programs.

The difference between running CUDA from C and from Python is that Python offers far more quality-of-life features, especially in regards to data visualization and scientific computing.
Aim
We will demonstrate one of the simplest applications of parallel programming: adding two matrices. Note that each element in the matrices will be added in parallel, i.e. in a single step, so we forgo the classical for loop in this case. We will start with two 2D matrices of arbitrary size in NumPy and then proceed to add them.
Getting started
To start our virtual workspace, we will use Google Colaboratory, address at:

https://colab.research.google.com/

Google’s cloud service provides this to us for free (note that you will require a Gmail account).
Setup

1. Top left, File -> New Python 3 notebook
2. Runtime -> Change runtime type -> Hardware accelerator -> GPU
3. Set up the workspace with a new code cell

Since PyCUDA is not a native library in Colab, we need an additional line before importing the libraries:

!pip install pycuda

Run this code cell first before proceeding (the play button at the left).
Building
Wait for a bit while PyCUDA is being installed. After the build finishes, we are ready to proceed.

Firstly, we need to import the basic libraries.
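A minimal sketch of these imports, following the standard PyCUDA pattern (pycuda.autoinit is assumed here to initialize the CUDA driver and create a context automatically):

import numpy
import pycuda.driver as cuda
import pycuda.autoinit  # initializes CUDA and creates a context on the GPU
from pycuda.compiler import SourceModule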


Declaring
We declare our matrix as follows
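A minimal sketch of one way to do this (the 5x5 size is an assumption, chosen to match the indexing example later on):

a = numpy.random.randn(5, 5)   # 5x5 matrix of random numbers
a = a.astype(numpy.float32)    # pre-convert to single precision for the GPU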

Of course, we are free to declare it however we wish, but this will create a standard random-number matrix; repeat the same for a “b” matrix. Note that the cloud GPU probably does not handle double precision well, so to avoid difficulties we pre-convert to float32, i.e. single precision.
GPU details
As we may have seen earlier, GPUs do not let us assign host data to them directly; that is to say, we can’t write something like

> accessing GPU
> GPU variable a
> a = 3

We will need to allocate memory on the device, copy the data over to it, access it via pointers, and then run the calculations on that.
We do this as follows:

First, the memory allocation step:
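A sketch using PyCUDA’s driver interface (a_gpu is just our chosen name for the device buffer):

a_gpu = cuda.mem_alloc(a.nbytes)   # reserve a.nbytes of memory on the GPU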


Then proceed with:
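A minimal sketch of that step:

cuda.memcpy_htod(a_gpu, a)   # copy the host array a into the device buffer a_gpu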

This essentially copies a into a_gpu.

htod stands for “host to device”, that is, from your system to the GPU. We will use the reverse of this command later to extract data from our GPU.
Writing the function
Interestingly enough, the function body has to be written in C. The equivalence here is such that all the a[idx] operations are executed simultaneously.

Linearization of the 2D matrix will give us the following relation between idx and the (x, y) position of an element (see the kernel sketch after the indexing diagram below).

PS: don’t forget the “;” at the end of each C statement.


Array indexing: threadIdx.x refers to x, threadIdx.y refers to y, and blockDim.x refers to the dimension in x, which in our case is 5. Laid out row by row, the 5x5 matrix occupies the offsets

a        a+1        a+2        a+3        a+4
a+5      a+(5+1)    a+(5+2)    a+(5+3)    a+(5+4)
a+(5*2)  a+(5*2+1)  ...        ...        ...
...      ...        ...        ...        ...
...      ...        ...        ...        a+(5*4+4) = a+24

Hence, we can create an equivalent 1D array of size 25x1 with the index notation idx = threadIdx.x + threadIdx.y * blockDim.x.
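A minimal sketch of such a kernel (the kernel name “sum” and the in-place update a[idx] += b[idx] are our choices, not fixed by PyCUDA):

mod = SourceModule("""
__global__ void sum(float *a, float *b)
{
    // linearize the 2D thread index into a 1D offset
    int idx = threadIdx.x + threadIdx.y * blockDim.x;
    // each thread adds one pair of elements; all threads run in parallel
    a[idx] += b[idx];
}
""")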
Calling the function

We have to create a separate variable to get the function, as follows. This will extract the function from the SourceModule and execute it on the GPU copies of a and b.
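A sketch of this step (it assumes b was declared, converted to float32, and copied to a device buffer b_gpu in the same way as a; the 5x5 block matches our matrix size):

func = mod.get_function("sum")       # pull the compiled kernel out of the module
func(a_gpu, b_gpu, block=(5, 5, 1))  # launch one thread per matrix element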
Extracting the result

This step is pretty straightforward. For simplicity, we will create a different matrix and copy the data from device to host using dtoh, which stands for “device to host”.
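A sketch of this copy-back step (a_sum is just our chosen name for the host-side result matrix):

a_sum = numpy.empty_like(a)     # host matrix of the same shape and dtype
cuda.memcpy_dtoh(a_sum, a_gpu)  # copy the result from the GPU back to the host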


Result

We should get some output in the form of matrices.

As an exercise, we could also compute the result without the GPU in action, as a + b, and compare the two results.
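For example (assuming the kernel wrote its result into a_gpu and we copied it back into a_sum as above):

print(a_sum)                         # result computed on the GPU
print(a + b)                         # the same sum computed by NumPy on the CPU
print(numpy.allclose(a_sum, a + b))  # should print True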


Advantages of parallel programming
1. Faster execution of repeated computation
2. Efficient handling of data
3. Does not need high-power machines, in comparison to CPU-only computation

Note: As we may already have seen, parallel programming needs to be applied in very specific circumstances. More specifically, it is best suited for work where one computation step is not affected by the result of another.

Ex. many repeated, independent derivative evaluations, as opposed to a complete derivative calculation where each step depends on the previous one.


Thank You
