
S. D. M.

College of Engineering and Technology, Dharwad – 580 002


Department of Electronics and Communication Engineering
Internal Assessment-I Scheme

Semester: VIII Date: 10/04/2021


Course Code & Title: 15UECE872 - GPU Computing Max. Marks: 20
Course Instructor: Mr. Sunil S. Mathad Time: 10:00 to 11:00 AM

Note: Q3 is compulsory; answer any one of Q1 and Q2.


Q1(a) Bring out the fundamental differences in design philosophies of CPU and GPU. (5M)

Q1(b) With an appropriate example, show memory as a limiting factor to parallelism. (5M)

All threads access global memory for their input matrix elements:

–Two memory accesses (8 bytes) per floating-point multiply-add (2 floating-point operations)
–4 bytes of memory traffic per floating-point operation (4 B/FLOP)
–A memory bandwidth of 150 GB/s therefore limits the code to 150/4 = 37.5 GFLOPS
But 37.5 GFLOPS is an upper limit. In actual execution, memory is not busy all the time, and
the code runs at about 25 GFLOPS. To get closer to 1,000 GFLOPS we need to drastically
cut down accesses to global memory.
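
The access pattern described above can be sketched as follows, assuming the example in question is a one-thread-per-output-element matrix multiplication on single-precision data. The names matmulNaive, runMatmulNaive and bandwidthBoundGflops, and the 16 x 16 block size, are illustrative choices and not part of the original scheme.

```cuda
#include <cuda_runtime.h>

// Naive multiply C = A * B for width x width row-major matrices.
// One thread computes one element of C.
__global__ void matmulNaive(const float *A, const float *B, float *C, int width)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= width || col >= width) return;

    float sum = 0.0f;
    for (int k = 0; k < width; ++k) {
        // Two 4-byte global loads (8 bytes) feed one multiply-add (2 FLOPs),
        // i.e. 4 bytes of global-memory traffic per floating-point operation.
        sum += A[row * width + k] * B[k * width + col];
    }
    C[row * width + col] = sum;
}

// Bandwidth-bound ceiling: GFLOPS = (GB/s) / (bytes per FLOP).
static float bandwidthBoundGflops(float bandwidthGBs, float bytesPerFlop)
{
    return bandwidthGBs / bytesPerFlop;
}

// Hypothetical host wrapper: copies inputs to the device, launches the
// kernel, and copies the result back. Error handling is kept minimal.
static cudaError_t runMatmulNaive(const float *hA, const float *hB, float *hC, int width)
{
    size_t bytes = (size_t)width * width * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed block size
    dim3 grid((width + block.x - 1) / block.x, (width + block.y - 1) / block.y);
    matmulNaive<<<grid, block>>>(dA, dB, dC, width);

    // The blocking copy waits for the kernel and reports any earlier failure.
    cudaError_t err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Calling bandwidthBoundGflops(150.0f, 4.0f) reproduces the 37.5 GFLOPS ceiling quoted in the answer. Every operand in the inner loop comes from global memory, which is why the kernel is bound by bandwidth rather than by arithmetic throughput.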

OR

Q2(a) Briefly describe the compilation and execution of a CUDA program. (5M)

Q2(b) With an example, explain CUDA grid organization. (5M)

Q3(a) Elaborate upon synchronization and transparent scalability in CUDA GPUs. (5M)
A barrier is a synchronization point:

–each thread calls a function to enter the barrier;
–threads block (sleep) in the barrier function until all threads have called it;
–after the last thread calls the function, all threads continue past the barrier.
In CUDA, the barrier is the API function call __syncthreads():
•All threads in the same block must reach the __syncthreads() call before any of them can move on.
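
As a minimal sketch of the barrier in use, the kernel below reverses a small array through shared memory. Each thread reads an element written by a different thread, so the read is only safe after the barrier. The kernel name reverseInBlock, the wrapper runReverse and the block size of 64 are assumptions made for illustration.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 64  // illustrative block size, not taken from the scheme

// Reverses BLOCK_SIZE integers in place.
// Must be launched with exactly one block of BLOCK_SIZE threads.
__global__ void reverseInBlock(int *data)
{
    __shared__ int tile[BLOCK_SIZE];
    int t = threadIdx.x;

    tile[t] = data[t];  // each thread stages one element in shared memory

    // Barrier: no thread continues until every thread in the block
    // has finished writing its element of tile[].
    __syncthreads();

    // Safe only after the barrier: this element was written by another thread.
    data[t] = tile[BLOCK_SIZE - 1 - t];
}

// Hypothetical host wrapper with minimal error handling.
static cudaError_t runReverse(int *hData)
{
    int *dData = nullptr;
    size_t bytes = BLOCK_SIZE * sizeof(int);
    cudaMalloc((void **)&dData, bytes);
    cudaMemcpy(dData, hData, bytes, cudaMemcpyHostToDevice);

    reverseInBlock<<<1, BLOCK_SIZE>>>(dData);

    // The blocking copy waits for the kernel and reports any earlier failure.
    cudaError_t err = cudaMemcpy(hData, dData, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dData);
    return err;
}
```

Without the __syncthreads() call, a thread could read tile[BLOCK_SIZE - 1 - t] before the thread responsible for that element had written it. The barrier covers only the threads of one block; it does not synchronize different blocks.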

Q3(b) Write a CUDA code to add two matrices of order N, involving kernel definition and launch of the kernel from the host code. (5M)

#include <stdio.h>
#include <cuda_runtime.h>

#define N 3

/* Kernel: each block holds a single thread and adds one pair of elements. */
__global__ void matadd(int *l, int *m, int *n)
{
    int x = blockIdx.x;            /* column index of this block */
    int y = blockIdx.y;            /* row index of this block */
    int id = gridDim.x * y + x;    /* flattened row-major element index */
    n[id] = l[id] + m[id];
}

int main()
{
    int a[N][N];
    int b[N][N];
    int c[N][N];
    int *d, *e, *f;                /* device copies of a, b and c */
    int i, j;

    printf("\n Enter elements of first matrix of size %d * %d\n", N, N);
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < N; j++)
        {
            scanf("%d", &a[i][j]);
        }
    }
    printf("\n Enter elements of second matrix of size %d * %d\n", N, N);
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < N; j++)
        {
            scanf("%d", &b[i][j]);
        }
    }

    /* Allocate device memory for the two inputs and the result. */
    cudaMalloc((void **)&d, N * N * sizeof(int));
    cudaMalloc((void **)&e, N * N * sizeof(int));
    cudaMalloc((void **)&f, N * N * sizeof(int));

    /* Copy the input matrices from host to device. */
    cudaMemcpy(d, a, N * N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(e, b, N * N * sizeof(int), cudaMemcpyHostToDevice);

    /* Define a two-dimensional grid (collection of blocks).
       Syntax: dim3 grid(no. of columns, no. of rows) */
    dim3 grid(N, N);

    /* Launch N * N blocks of one thread each. */
    matadd<<<grid, 1>>>(d, e, f);

    /* Copy the result back; this call also waits for the kernel to finish. */
    cudaMemcpy(c, f, N * N * sizeof(int), cudaMemcpyDeviceToHost);

    printf("\nSum of two matrices:\n ");
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < N; j++)
        {
            printf("%d\t", c[i][j]);
        }
        printf("\n");
    }

    cudaFree(d);
    cudaFree(e);
    cudaFree(f);
    return 0;
}
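
The program above launches one thread per block, which is adequate for N = 3. For larger matrices a common variant, sketched here as an assumption rather than as part of the expected answer, uses two-dimensional thread blocks and a bounds check. The names matadd2D and runMatadd2D and the 16 x 16 block size are hypothetical.

```cuda
#include <cuda_runtime.h>

// Element-wise sum of two n x n row-major matrices, one thread per element.
__global__ void matadd2D(const int *l, const int *m, int *out, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    // The grid may overhang the matrix, so threads outside it do nothing.
    if (row < n && col < n) {
        int id = row * n + col;
        out[id] = l[id] + m[id];
    }
}

// Hypothetical host wrapper with minimal error handling.
static cudaError_t runMatadd2D(const int *hL, const int *hM, int *hOut, int n)
{
    size_t bytes = (size_t)n * n * sizeof(int);
    int *dL = nullptr, *dM = nullptr, *dOut = nullptr;
    cudaMalloc((void **)&dL, bytes);
    cudaMalloc((void **)&dM, bytes);
    cudaMalloc((void **)&dOut, bytes);
    cudaMemcpy(dL, hL, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dM, hM, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed block size
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matadd2D<<<grid, block>>>(dL, dM, dOut, n);

    // The blocking copy waits for the kernel and reports any earlier failure.
    cudaError_t err = cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dL);
    cudaFree(dM);
    cudaFree(dOut);
    return err;
}
```

The grid dimensions are rounded up so that every element is covered even when n is not a multiple of the block size.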

Q. No.  1a  1b   2a  2b   3a   3b

CO      1   1,2  2   1,2  1,2  2,3


Approved by IQAC
