Faculty of Engineering & Technology
High Performance Computing Laboratory
(203105430)
B. Tech CSE 4th Year 7th Semester
PRACTICAL : 09
AIM: Write a simple CUDA program to print “Hello World!”
What is CUDA programming?
CUDA (Compute Unified Device Architecture) is a parallel computing
platform and application programming interface (API) created by NVIDIA.
It allows developers to use the computational power of NVIDIA GPUs
(Graphics Processing Units) for general-purpose processing, beyond
graphics rendering.
Logical architecture of GPU
1. Grids:
A grid is the highest-level grouping of threads. It contains all the
thread blocks created by a single kernel launch and represents the
entire set of parallel work that the GPU processes for that launch.
Enrollment No.: 210303105048
Div: 7B25
2. Blocks:
A block is a group of threads that execute on the same SM (streaming
multiprocessor). Threads within the same block can cooperate with each
other through shared memory and synchronization mechanisms.
3. Warps:
A warp is the basic unit of scheduling on an SM. It consists of 32
consecutive threads of a block that are issued the same instruction
together. When threads in a warp take different branches, the branches
are executed one after another, which reduces efficiency.
4. Threads:
A thread is the basic unit of execution in CUDA. Each thread runs the
kernel code once and is identified by its index within its block.
Threads are organized into groups called thread blocks, and multiple
thread blocks are organized into a grid.
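The hierarchy above can be made visible with a small kernel. The following is a minimal sketch, assuming a 1D grid and a warp size of 32; the helper names global_thread_id and warp_in_block are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: flat index of a thread across a whole 1D grid.
__host__ __device__ int global_thread_id(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Hypothetical helper: index of the warp, inside its block, that a thread
// belongs to. Assumes a warp size of 32.
__host__ __device__ int warp_in_block(int thread_idx) {
    return thread_idx / 32;
}

__global__ void show_hierarchy() {
    int block  = (int)blockIdx.x;   // position of this block in the grid
    int thread = (int)threadIdx.x;  // position of this thread in its block
    int gid    = global_thread_id(block, (int)blockDim.x, thread);

    // Only the first thread of each warp prints, to keep the output short.
    if (thread % 32 == 0) {
        printf("block %d, warp %d, thread %d -> global id %d\n",
               block, warp_in_block(thread), thread, gid);
    }
}

int main() {
    // Grid of 2 blocks, 64 threads (2 warps) per block.
    show_hierarchy<<<2, 64>>>();
    // Wait for the kernel so that device-side printf output is flushed.
    cudaDeviceSynchronize();
    return 0;
}
```

With this launch configuration, four lines are expected (one per warp), although the order in which blocks and warps print is not guaranteed.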
CUDA program execution flow.
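Steps:
1. Copy the input data from CPU (host) memory to GPU (device) memory.
2. Launch the kernel and execute it on the GPU.
3. Copy the results from GPU (device) memory back to CPU (host) memory.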
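The three steps above can be sketched with the CUDA runtime API as follows. This is an illustrative example, not part of the original practical: the kernel double_elements and the wrapper double_on_gpu are hypothetical names, and the sketch assumes the array fits in a single block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread doubles one array element.
__global__ void double_elements(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2;
    }
}

// Hypothetical wrapper showing the host -> device -> host flow.
// Assumes n is small enough to fit in one block (n <= 1024).
cudaError_t double_on_gpu(int *host, int n) {
    int *dev = nullptr;
    size_t bytes = n * sizeof(int);

    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    // Step 1: copy the input data from CPU to GPU.
    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // Step 2: execute on the GPU (one block of n threads).
        double_elements<<<1, n>>>(dev, n);
        err = cudaGetLastError();
    }

    if (err == cudaSuccess) {
        // Step 3: copy the results from GPU to CPU.
        // This cudaMemcpy waits for the kernel to finish first.
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(dev);
    return err;
}

int main() {
    int values[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    cudaError_t err = double_on_gpu(values, 8);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < 8; ++i) {
        printf("%d ", values[i]);
    }
    printf("\n");
    return 0;
}
```

The "Hello World!" programs in this practical need no input or output arrays, so they use only step 2.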
CUDA program to print hello world.
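%%writefile p1.cu
#include <stdio.h>

// Kernel: runs on the GPU, once per launched thread.
__global__ void cuda_hello() {
    printf("Hello World!\n");
}

int main() {
    // Launch 1 block of 5 threads.
    cuda_hello<<<1, 5>>>();
    // Wait for the kernel to finish so its printf output is flushed.
    cudaDeviceSynchronize();
    return 0;
}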
Output: the kernel is launched with 1 block of 5 threads, so “Hello World!” is printed 5 times.
“Hello World” from a different number of blocks.
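%%writefile p2.cu
#include <stdio.h>

// Kernel: runs on the GPU, once per launched thread.
__global__ void cuda_hello() {
    printf("Good morning PU\n");
}

int main() {
    // Launch 2 blocks of 5 threads each (10 threads in total).
    cuda_hello<<<2, 5>>>();
    // Wait for the kernel to finish so its printf output is flushed.
    cudaDeviceSynchronize();
    return 0;
}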
Output: the kernel is launched with 2 blocks of 5 threads each, so “Good morning PU” is printed 10 times.
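To see which block and thread produced each line, the kernel can print its built-in indices. The variant below is a sketch rather than part of the original practical: the names hello_indexed and total_threads are hypothetical, and the error checks are added only so that a failed launch is reported instead of silently printing nothing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread reports its block and thread index.
__global__ void hello_indexed() {
    printf("Hello from block %d, thread %d\n",
           (int)blockIdx.x, (int)threadIdx.x);
}

// Hypothetical helper: number of threads (and printed lines) for a 1D launch.
int total_threads(int blocks, int threads_per_block) {
    return blocks * threads_per_block;
}

int main() {
    const int blocks = 2;
    const int threads_per_block = 5;

    hello_indexed<<<blocks, threads_per_block>>>();

    // cudaGetLastError reports launch problems (for example, a bad configuration).
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // cudaDeviceSynchronize waits for the kernel and reports execution errors.
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Lines printed by the kernel: %d\n",
           total_threads(blocks, threads_per_block));
    return 0;
}
```

The order in which the two blocks print is not guaranteed, so the lines from block 1 may appear before those from block 0.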