HPC 2

FACULTY OF ENGINEERING AND TECHNOLOGY
BACHELOR OF TECHNOLOGY
GPU COMPUTING LAB MANUAL
(203105398)
6th SEMESTER
COMPUTER SCIENCE & ENGINEERING DEPARTMENT
CERTIFICATE
Faculty of Engineering & Technology
Semester: 6th Year: 3rd
B.Tech CSE-AI
Subject Name: GPU
Subject Code : 203105398
This is to certify that Mr./Ms. ………………………… with
Enrolment no. 200303124548 has successfully completed his/her
laboratory experiments in GPU COMPUTING (203105398)
from the department of PIT-CSE (AI) during the academic
year 2022-23.
Date of Submission:......................... Staff In charge:...........................
Head Of Department:...........................................
INDEX
Sr. No. | Experiment Title | Page No. | Starting Date | Ending Date | Grade | Sign
1. Understand the system by various Linux/Windows commands & GPU and CUDA architectures.
2. Understand Google Colab.
3. Analyze the program using gprof profiles.
4. W.A.P to demonstrate the addition of an array using CUDA code.
5. W.A.P to demonstrate squaring an array using a simple CUDA kernel.
6. W.A.P to demonstrate vector-matrix multiplication using GPU global memory.
7. W.A.P for vector-matrix multiplication that measures time using CUDA events and uses shared memory.
8. W.A.P to demonstrate vector-matrix multiplication using GPU constant memory; it stores vector v in GPU constant memory.
9. Analyse the program using NVIDIA profilers.
10. With the help of GPU libraries like Keras, TensorFlow, GAN, etc., develop a mini project.
EXPERIMENT-01
AIM: STUDY THE FACILITIES PROVIDED BY GOOGLE COLAB
GOOGLE COLAB: Colab is a free Jupyter notebook environment that runs in the
cloud. It lets you and your team members edit notebooks together, the way you work with
Google Docs. Colab supports many popular machine learning libraries, which can be
easily loaded in your notebook.
Facilities provided by Google Colab:
1) Free access to GPU and TPU: One of the most significant advantages of using Google
Colab is the free access to Graphics Processing Units (GPUs) and Tensor Processing
Units (TPUs). This is particularly useful for tasks that require significant computational
power, such as training deep learning models.
2) Jupyter Notebook Integration: Colab supports Jupyter notebooks, making it easy
to create and share documents that contain live code, equations, visualizations, and
narrative text.
3) Collaboration and Sharing: Colab allows multiple users to collaborate on a
notebook in real time. Notebooks can be shared just like Google Docs or Sheets, and
comments can be added for collaboration.
4) Integration with Google Drive: Colab is integrated with Google Drive, so you can
save work directly to your Google Drive or share your notebooks with others. This
makes it easy to access and manage your projects.
5) Pre-installed Libraries and Packages: Colab comes with many pre-installed
libraries commonly used in machine learning and data analysis, such as NumPy, Pandas,
Matplotlib, and TensorFlow. This reduces the setup time for users.
6) Easy Installation of Additional Packages: Users can install additional Python
packages using the ‘!pip install’ command directly within a Colab notebook.
7) Data Import and Export: Colab allows easy import and export of data. You can
upload data from your local machine, connect to Google Drive, or even fetch data from
the internet. Data can also be saved and downloaded directly from Colab.
8) Markdown Support: Colab supports Markdown cells, allowing users to add
formatted text, images, and hyperlinks to provide context and explanations in their
notebooks.
9) Interactive Data Visualization: Colab provides support for interactive data
visualization using libraries like Matplotlib, Plotly, and Seaborn.
10) Access to External APIs: You can access external APIs and integrate them directly
into your Colab notebook for various tasks.
11) Hardware Acceleration: Colab allows users to enable or disable GPU and TPU
acceleration for their notebooks, depending on the requirements of their code.
Remember that while Colab provides a lot of resources for free, there are limitations on
resource usage, and sessions may be terminated after a certain period of inactivity. Users
interested in longer and more resource-intensive tasks may need to consider other
options, such as paid cloud services.
EXPERIMENT-02
AIM: - LINUX COMMANDS
TYPES OF COMMANDS: -
PWD
CD
RM
HEAD
TAIL
MKDIR
RMDIR
LS
WC
TOUCH
CAT
TAC
GREP
ID
COMM
DF
MOUNT
CLEAR
EXIT
WHOAMI
CP
SORT
DATE
TIME
ZCAT
MV
TEE
PWD: -The PWD command is used to display the location of the current working directory.
Syntax: - pwd
CD: -The CD command changes your working directory. You use it to move around within
the hierarchy of your file system.
SYNTAX: - cd <dir name>
RM: -The RM command is used to remove a file.
SYNTAX: - rm <file name>
HEAD: -The HEAD command is used to display the beginning of a file. By default, it displays the first 10 lines.
SYNTAX: - head <file name>
TAIL: -The TAIL command is similar to the head command, except that it
displays the last ten lines of the file. It is useful for reading the most recent error messages in a log file.
SYNTAX: - tail <file name>
MKDIR: -The MKDIR command is used to create a new directory under any directory.
SYNTAX: - mkdir <dir name>
RMDIR: -The RMDIR command is used to delete an empty directory.
SYNTAX: - rmdir <dir name>
LS: -The LS command is used to list the contents of a directory.
SYNTAX: - ls
WC: -The WC command is used to count the lines, words, and characters in a file.
SYNTAX: - wc <file name>
TOUCH: -The TOUCH command is used to create empty files. We can create multiple
empty files by executing it once.
SYNTAX: - touch <file name> OR touch <file1> <file2>
CAT: -The CAT command displays the content of a file. It is also used as a filter: to filter a file, it is used inside pipes.
SYNTAX: - cat <file name> | cat (or: cat <file name> | tac)
GREP: -The GREP command is the most powerful and widely used filter in a Linux system. The name 'grep' stands for "global
regular expression print." It is useful for searching the content of a file. Generally, it is used with a pipe.
SYNTAX: - cat <file name> | grep <search text>
TAC: -The TAC command is the reverse of the cat command, as its name suggests. It displays the file content in
reverse order (from the last line).
SYNTAX: - cat pra.txt | tac
ID: The ID command is used to display the user ID (UID) and group ID (GID).
SYNTAX: - id
COMM: -The COMM command is used to compare two sorted files or streams. By default, it displays three columns:
the first shows lines found only in the first file, the second shows lines found only in the second file, and
the third shows lines common to both files.
SYNTAX: - comm <file1> <file2>
DF: -The DF command is used to display the disk space usage of the file systems. Its output shows the
number of used blocks, available blocks, and the directory where each file system is mounted.
SYNTAX: - df
MOUNT: -The MOUNT command is used to attach an external device's file system to the system's file system.
SYNTAX: - mount <device> <directory>
CLEAR: -Linux clear command is used to clear the terminal screen.

SYNTAX: - clear
EXIT: -The EXIT command is used to exit from the current shell. It optionally takes a number as a parameter and exits
the shell with that number as the return status.
SYNTAX: - exit
WHOAMI: -The WHOAMI command displays the username of the current user.
SYNTAX: - whoami
CP: -The CP command is used to copy a file or directory.
SYNTAX: - cp <source> <destination>
SORT: -The SORT command is used to sort the lines of a file in alphabetical order.
SYNTAX: - sort <file name>
DATE: -The DATE command is used to display date, time, time zone, and more.
SYNTAX: - date
TIME: -The TIME command is used to display the time taken to execute a command.
SYNTAX: - time <command>
ZCAT: -The ZCAT command is used to display the contents of compressed files.
SYNTAX: - zcat <file name> (run ls first to find the file name)
MV: -The MV command is used to move a file or a directory from one location to another location.
SYNTAX: - mv <file name> <directory>
TEE: -The TEE command is quite similar to the cat command. The only difference between the two filters is that
it puts standard input on standard output and also writes it into a file.
SYNTAX: - cat <file name> | tee <new file name> | cat
PASSWD: -The PASSWD command is used to create and change a password.
SYNTAX: - passwd
PIP: -The PIP command is used to install Python packages.
SYNTAX: - pip install <package name>
HISTORY: -The HISTORY command shows the commands you have run so far.
SYNTAX: - history
EXPERIMENT-03
AIM: - Using Divide and Conquer Strategies design a class for Concurrent Quick Sort
using C++.
%%cu
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <omp.h>
using namespace std;

class QuickSortMultiThreading {
public:
    QuickSortMultiThreading(int start, int end, vector<int>& arr)
        : start_(start), end_(end), arr_(arr) {}

    // Partition arr[start..end] around a randomly chosen pivot and
    // return the pivot's final position.
    int partition(int start, int end, vector<int>& arr) {
        int i = start;
        int j = end;
        // Pick a random pivot and move it to the end of the range.
        int pivoted = rand() % (j - i + 1) + i;
        int t = arr[j];
        arr[j] = arr[pivoted];
        arr[pivoted] = t;
        j--;
        // Scan inward from both ends, swapping misplaced elements.
        while (i <= j) {
            if (arr[i] <= arr[end]) {
                i++;
                continue;
            }
            if (arr[j] >= arr[end]) {
                j--;
                continue;
            }
            t = arr[j];
            arr[j] = arr[i];
            arr[i] = t;
            j--;
            i++;
        }
        // Place the pivot between the two partitions.
        t = arr[j + 1];
        arr[j + 1] = arr[end];
        arr[end] = t;
        return j + 1;
    }

    // Sort the range, handling the two halves in separate OpenMP sections.
    void operator()() {
        if (start_ >= end_) {
            return;
        }
        int p = partition(start_, end_, arr_);
        QuickSortMultiThreading left(start_, p - 1, arr_);
        QuickSortMultiThreading right(p + 1, end_, arr_);
        #pragma omp parallel sections
        {
            #pragma omp section
            {
                left();
            }
            #pragma omp section
            {
                right();
            }
        }
    }

private:
    int start_;
    int end_;
    vector<int>& arr_;
};

int main() {
    int n = 7;
    vector<int> arr = {54, 64, 95, 82, 12, 32, 63};
    srand(time(NULL));
    QuickSortMultiThreading(0, n - 1, arr)();
    for (int i = 0; i < n; i++) {
        cout << arr[i] << " ";
    }
    cout << endl;
    return 0;
}
OUTPUT:
PRACTICAL: 04
AIM: Write a program on an unloaded cluster for several different
numbers of nodes and record the time taken in each case. Draw a graph of
execution time against the number of nodes.
AIM: W.A.P to demonstrate the addition of an array using CUDA code.
CODE:
%%cu
#include <stdio.h>
int main()
{
    int arr[] = {1, 2, 3, 4, 5};
    int sum = 0;
    int length = sizeof(arr) / sizeof(arr[0]);
    // Sum the elements on the host (no CUDA kernel is launched here)
    for (int i = 0; i < length; i++) {
        sum = sum + arr[i];
    }
    printf("Sum of all the elements of an array: %d", sum);
    return 0;
}
OUTPUT
Figure: 4.1 Addition of Arrays
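The listing for this practical adds the elements in a sequential host loop. As a rough illustration of how the same sum is commonly organized for parallel hardware, the sketch below emulates a pairwise (tree) reduction in plain C++ on the host. It is not CUDA code: the function name `treeReduceSum` and the zero-padding to a power of two are assumptions made for this sketch, and the inner loop only stands in for threads that would run concurrently on a GPU.

```cpp
#include <cstddef>
#include <vector>

// Host-side emulation of a pairwise (tree) reduction.
// Each pass of the outer loop corresponds to one synchronized step;
// the inner loop stands in for the threads active during that step.
int treeReduceSum(std::vector<int> data) {
    if (data.empty()) return 0;
    // Pad with zeros up to the next power of two so every step halves cleanly.
    std::size_t n = 1;
    while (n < data.size()) n *= 2;
    data.resize(n, 0);
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t tid = 0; tid < stride; ++tid) {
            data[tid] += data[tid + stride];
        }
    }
    return data[0];  // the total ends up in element 0
}
```

For the array {1, 2, 3, 4, 5} the padded length is 8, so the emulated reduction finishes in three steps instead of five sequential additions.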
PRACTICAL: 05
AIM: W.A.P to demonstrate squaring an array using a simple CUDA kernel.
CODE:
%%cu
#include <stdio.h>
int main()
{
    int arr[5] = {1, 2, 3, 4, 5};
    int i = 0;
    printf("Array elements:\n");
    for (i = 0; i < 5; i++)
        printf("%d ", arr[i]);
    printf("\nSquare of array elements:\n");
    // Print the square of each element
    for (i = 0; i < 5; i++)
        printf("%d ", arr[i] * arr[i]);
    printf("\n");
    return 0;
}
OUTPUT
Figure: 5.1 Squaring an Array
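The AIM of this practical asks for a kernel, in which every thread squares one element selected by its global index. The sketch below emulates that index mapping on the host in plain C++, under stated assumptions: `squareLikeKernel`, `block` and `thread` are illustrative names, the loops stand in for GPU blocks and threads, and nothing here runs on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of a 1-D kernel launch that squares each element.
// `block` and `thread` are ordinary loop variables here; on a GPU every
// (block, thread) pair would execute the loop body concurrently.
std::vector<int> squareLikeKernel(const std::vector<int>& in,
                                  std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    const std::size_t n = in.size();
    // Round up so the emulated grid covers all n elements.
    const std::size_t blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    std::vector<int> out(n, 0);
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < threadsPerBlock; ++thread) {
            // Same formula as blockIdx.x * blockDim.x + threadIdx.x
            const std::size_t id = block * threadsPerBlock + thread;
            if (id < n) {  // guard: the last block may have spare threads
                out[id] = in[id] * in[id];
            }
        }
    }
    return out;
}
```

The bounds guard matters whenever the element count is not a multiple of the block size, for example 5 elements with 2 threads per block.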
PRACTICAL: 06
AIM: W.A.P to demonstrate vector-matrix multiplication using GPU global
memory.
CODE:
%%cu
#include <stdio.h>
#include <stdlib.h>

__global__ void arradd(int* md, int* nd, int* pd, int size)
{
    // Get unique identification number for a given thread
    int myid = blockIdx.x * blockDim.x + threadIdx.x;
    pd[myid] = md[myid] + nd[myid];
}

int main()
{
    int size = 2000 * sizeof(int);
    int m[2000], n[2000], p[2000], *md, *nd, *pd;
    int i = 0;

    // Initialize the arrays
    for (i = 0; i < 2000; i++)
    {
        m[i] = i;
        n[i] = i;
        p[i] = 0;
    }

    // Allocate memory on GPU and transfer the data
    cudaMalloc(&md, size);
    cudaMemcpy(md, m, size, cudaMemcpyHostToDevice);
    cudaMalloc(&nd, size);
    cudaMemcpy(nd, n, size, cudaMemcpyHostToDevice);
    cudaMalloc(&pd, size);

    // Define number of threads and blocks (10 x 200 = 2000 threads)
    dim3 DimGrid(10, 1);
    dim3 DimBlock(200, 1);

    // Launch the GPU kernel function
    arradd<<<DimGrid, DimBlock>>>(md, nd, pd, size);

    // Transfer the results back to CPU memory
    cudaMemcpy(p, pd, size, cudaMemcpyDeviceToHost);

    // Free GPU arrays
    cudaFree(md);
    cudaFree(nd);
    cudaFree(pd);

    // Print the results
    for (i = 0; i < 2000; i++)
    {
        printf("\t%d", p[i]);
    }
    return 0;
}
OUTPUT
Figure: 6.1 Vector Matrix Multiplication
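The kernel in this practical combines two arrays element by element, whereas the AIM names vector-matrix multiplication, where each output element is y[j] = sum over i of v[i] * M[i][j]. A host-side reference for that operation can be used to check a GPU result. The sketch below is one possible version; the name `vecMatMul` and the row-major layout of M are assumptions, not something the manual specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference: y = v * M, where v has `rows` elements and M is a
// rows x cols matrix stored in row-major order (the layout is an assumption).
std::vector<int> vecMatMul(const std::vector<int>& v,
                           const std::vector<int>& M,
                           std::size_t rows, std::size_t cols) {
    assert(v.size() == rows && M.size() == rows * cols);
    std::vector<int> y(cols, 0);
    // One output element per column; on a GPU, j would map to a thread index.
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            y[j] += v[i] * M[i * cols + j];
        }
    }
    return y;
}
```

For v = {1, 2} and the 2 x 3 matrix {1, 2, 3, 4, 5, 6}, the first output element is 1*1 + 2*4 = 9.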
PRACTICAL: 07
AIM: W.A.P for vector-matrix multiplication that measures time using CUDA
events and uses shared memory.
CODE:
%%cu
#include <stdio.h>
#include <stdlib.h>

__global__ void arradd(int* md, int* nd, int* pd, int size)
{
    // Get unique identification number for a given thread
    int myid = blockIdx.x * blockDim.x + threadIdx.x;
    pd[myid] = md[myid] * nd[myid];
}

int main()
{
    int size = 2000 * sizeof(int);
    int m[2000], n[2000], p[2000], *md, *nd, *pd;
    int i = 0;

    // Initialize the arrays
    for (i = 0; i < 2000; i++)
    {
        m[i] = i;
        n[i] = i;
        p[i] = 0;
    }

    // Allocate memory on GPU and transfer the data
    cudaMalloc(&md, size);
    cudaMemcpy(md, m, size, cudaMemcpyHostToDevice);
    cudaMalloc(&nd, size);
    cudaMemcpy(nd, n, size, cudaMemcpyHostToDevice);
    cudaMalloc(&pd, size);

    // Define number of threads and blocks, then launch the kernel
    dim3 DimGrid(10, 1);
    dim3 DimBlock(200, 1);
    arradd<<<DimGrid, DimBlock>>>(md, nd, pd, size);

    // Transfer the results back to CPU memory and free GPU arrays
    cudaMemcpy(p, pd, size, cudaMemcpyDeviceToHost);
    cudaFree(md);
    cudaFree(nd);
    cudaFree(pd);

    // Print the results
    for (i = 0; i < 2000; i++)
    {
        printf("\t%d", p[i]);
    }
    return 0;
}
OUTPUT
Figure: 7.1 vector matrix multiplication uses shared memory
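The listing for this practical does not declare any shared memory. The idea behind shared-memory tiling is to stage a chunk of the vector once in a fast buffer and then reuse it for every output column. The sketch below emulates that pattern on the host in plain C++. It is not CUDA code and does not use CUDA events: `vecMatMulTiled`, `tileSize` and the row-major layout are assumptions, and the local `tile` vector only plays the role that a shared-memory buffer would play on a GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of shared-memory tiling for y = v * M (M row-major).
// A chunk of v is staged once in `tile` and reused for every output column
// before the next chunk is loaded.
std::vector<int> vecMatMulTiled(const std::vector<int>& v,
                                const std::vector<int>& M,
                                std::size_t rows, std::size_t cols,
                                std::size_t tileSize) {
    assert(tileSize > 0 && v.size() == rows && M.size() == rows * cols);
    std::vector<int> y(cols, 0);
    std::vector<int> tile(tileSize);
    for (std::size_t base = 0; base < rows; base += tileSize) {
        const std::size_t len = std::min(tileSize, rows - base);
        // "Load" phase: copy one chunk of v into the staging buffer.
        for (std::size_t k = 0; k < len; ++k) {
            tile[k] = v[base + k];
        }
        // "Compute" phase: every column reuses the staged chunk.
        for (std::size_t j = 0; j < cols; ++j) {
            for (std::size_t k = 0; k < len; ++k) {
                y[j] += tile[k] * M[(base + k) * cols + j];
            }
        }
    }
    return y;
}
```

The result does not depend on `tileSize`; on real hardware the tile size would be chosen to fit the shared memory available to a block.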
PRACTICAL: 08
AIM: W.A.P to demonstrate vector-matrix multiplication using GPU constant
memory; the vector v is stored in GPU constant memory.
CODE:
Figure: 8.1 Code
OUTPUT
Figure: 8.2 Vector-matrix multiplication using GPU
PRACTICAL: 09
AIM: Analyse the program using NVIDIA Profilers.
CODE:
!nvidia-smi
OUTPUT
Figure: 9.1 NVIDIA Profile