NirajTamang Week8

Device Query
dq00.cu
This is a C program that uses the CUDA library to detect the number of CUDA capable
devices available on the system. The program starts by declaring an integer variable
device_count. It then calls the function cudaGetDeviceCount() to get the number of CUDA
capable devices available on the system. The function returns a cudaError_t value which
is stored in the variable error_id. If the value of error_id is non-zero, the program prints
an error message to the standard error stream and exits with a status code of 1. If the
value of device_count is 0, the program prints a message to the standard error stream
indicating that there are no available devices that support CUDA. Otherwise, the program
prints the number of CUDA capable devices detected on the system to the standard
output stream using the printf() function. The program was compiled with nvcc and ran
successfully. Its output reports 1 CUDA device, which means that the system has a GPU
that supports CUDA and can be used for parallel computing tasks.
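The device-count check described above could look roughly like the following. This is a minimal sketch, not the original dq00.cu: the helper name count_cuda_devices() is hypothetical, and returning -1 on error is an assumption made so the function is easy to call from main().

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from dq00.cu): returns the number of
// CUDA-capable devices, or -1 if the runtime reports an error.
int count_cuda_devices(void) {
    int device_count = 0;
    cudaError_t error_id = cudaGetDeviceCount(&device_count);
    if (error_id != cudaSuccess) {
        // cudaGetErrorString() turns the error code into readable text.
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(error_id));
        return -1;
    }
    if (device_count == 0) {
        fprintf(stderr, "No devices that support CUDA are available\n");
    } else {
        printf("Detected %d CUDA capable device(s)\n", device_count);
    }
    return device_count;
}
```

A main() function would call count_cuda_devices() and exit with status 1 when it returns -1.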
dq01.cu
This is a C program that uses the CUDA library to query the properties of the first device
on the system. The program starts by defining a cudaDeviceProp variable
device_properties. The program then calls the function cudaGetDeviceProperties() to get
the properties of the first device on the system and store them in device_properties. The
program then calls the functions cudaDriverGetVersion() and cudaRuntimeGetVersion()
to get the version numbers of the CUDA driver and runtime respectively. Finally, the
program prints the name of the first device on the system, its compute capability, the
version number of the CUDA driver, and the version number of the CUDA runtime to the
standard output stream using the printf() function.
The program compiled with nvcc and ran successfully. The output shows that the first
device on the system is an NVIDIA GeForce GT 710 with a compute capability of 3.5,
together with the version numbers of the CUDA driver and the CUDA runtime.
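The queries described for dq01.cu can be sketched as follows. The helper name print_device_summary() and the output wording are assumptions. The runtime calls are real, and they encode versions as 1000 × major + 10 × minor, which is why the sketch decodes them before printing.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from dq01.cu): prints the name, compute
// capability and CUDA versions for one device index.
// Returns the first error encountered, or cudaSuccess.
cudaError_t print_device_summary(int device) {
    cudaDeviceProp device_properties;
    cudaError_t error = cudaGetDeviceProperties(&device_properties, device);
    if (error != cudaSuccess) return error;

    int driver_version = 0;
    int runtime_version = 0;
    error = cudaDriverGetVersion(&driver_version);
    if (error != cudaSuccess) return error;
    error = cudaRuntimeGetVersion(&runtime_version);
    if (error != cudaSuccess) return error;

    printf("Device %d: %s\n", device, device_properties.name);
    printf("Compute capability: %d.%d\n",
           device_properties.major, device_properties.minor);
    // Versions are encoded as 1000 * major + 10 * minor.
    printf("Driver version: %d.%d\n",
           driver_version / 1000, (driver_version % 100) / 10);
    printf("Runtime version: %d.%d\n",
           runtime_version / 1000, (runtime_version % 100) / 10);
    return cudaSuccess;
}
```

Calling print_device_summary(0) queries the first device, as the program in the report does.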
dq02.cu
This is a C program that uses the CUDA library to retrieve information about the CUDA
capable device installed on your system. The program starts by declaring a
cudaDeviceProp variable device_properties. It then calls the function
cudaGetDeviceProperties() to get the properties of the CUDA capable device with index
0. The properties of the device are stored in the device_properties variable. The program
then prints the name of the device, its clock rate, memory, number of multi processors,
cores per multiprocessor, and total CUDA cores to the standard output stream using the
printf() function. The program compiled with nvcc and ran successfully. The output
shows that the device with index 0 is an “NVIDIA GeForce GT 710” with a clock rate of
954 MHz and 2098003968 bytes of memory (about 1.95 GiB). The device has
1 multiprocessor with 192 cores per multiprocessor, so the total number of CUDA cores
on the device is 1 × 192 = 192.
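The core count above is a product of two numbers, but only the multiprocessor count is a field of cudaDeviceProp. The cores per multiprocessor have to be derived from the compute capability. The sketch below assumes a small hand-written lookup table for that step; the helper names are hypothetical, the table is deliberately partial, and how dq02.cu itself obtains the value is not shown in the report.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical, partial lookup: CUDA cores per multiprocessor for a few
// compute capabilities. Returns 0 for architectures that are not listed.
int cores_per_multiprocessor(int major, int minor) {
    if (major == 3) return 192;                      // Kepler (e.g. 3.5)
    if (major == 5) return 128;                      // Maxwell
    if (major == 6) return (minor == 0) ? 64 : 128;  // Pascal
    if (major == 7) return 64;                       // Volta and Turing
    return 0;
}

// Total cores = multiprocessors x cores per multiprocessor.
int total_cuda_cores(int multiprocessors, int major, int minor) {
    return multiprocessors * cores_per_multiprocessor(major, minor);
}

// Hypothetical helper: prints memory and core figures for one device.
cudaError_t print_core_count(int device) {
    cudaDeviceProp p;
    cudaError_t error = cudaGetDeviceProperties(&p, device);
    if (error != cudaSuccess) return error;
    printf("%s: %zu bytes of global memory\n", p.name, p.totalGlobalMem);
    printf("%d multiprocessor(s) x %d cores = %d CUDA cores\n",
           p.multiProcessorCount,
           cores_per_multiprocessor(p.major, p.minor),
           total_cuda_cores(p.multiProcessorCount, p.major, p.minor));
    return cudaSuccess;
}
```

For the device in the report (1 multiprocessor, compute capability 3.5), total_cuda_cores(1, 3, 5) gives 192.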
dq03.cu
This is a C program that uses the CUDA library to retrieve information about the CUDA
capable device installed on your system. The program starts by declaring a
cudaDeviceProp variable device_properties. It then calls the function
cudaGetDeviceProperties() to get the properties of the CUDA capable device with index
0. The properties of the device are stored in the device_properties variable. The program
then prints the maximum number of threads per block, the maximum size of each
dimension of a block, and the maximum size of each dimension of a grid to the standard
output stream using the printf() function.
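The output of the program indicates that the maximum number of threads per block is
1024. The maximum size of each dimension of a block is [1024][1024][64]. The maximum
size of each dimension of a grid is [2147483647][65535][65535]. The per-dimension block
limits do not multiply: the product of the three block dimensions must still not
exceed 1024.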
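To make the relationship between these limits concrete, the sketch below prints them and also checks whether a given block shape would be accepted. Both helper names are hypothetical and are not part of dq03.cu. The fields maxThreadsPerBlock, maxThreadsDim and maxGridSize are the real cudaDeviceProp members.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: true if `block` respects both the per-dimension
// limits and the overall threads-per-block limit stored in `p`.
bool block_is_launchable(const cudaDeviceProp &p, dim3 block) {
    if (block.x < 1 || block.y < 1 || block.z < 1) return false;
    if (block.x > (unsigned int)p.maxThreadsDim[0]) return false;
    if (block.y > (unsigned int)p.maxThreadsDim[1]) return false;
    if (block.z > (unsigned int)p.maxThreadsDim[2]) return false;
    unsigned long long threads =
        (unsigned long long)block.x * block.y * block.z;
    return threads <= (unsigned long long)p.maxThreadsPerBlock;
}

// Hypothetical helper: prints the launch limits of one device.
cudaError_t print_launch_limits(int device) {
    cudaDeviceProp p;
    cudaError_t error = cudaGetDeviceProperties(&p, device);
    if (error != cudaSuccess) return error;
    printf("Max threads per block: %d\n", p.maxThreadsPerBlock);
    printf("Max block dimensions: [%d][%d][%d]\n",
           p.maxThreadsDim[0], p.maxThreadsDim[1], p.maxThreadsDim[2]);
    printf("Max grid dimensions: [%d][%d][%d]\n",
           p.maxGridSize[0], p.maxGridSize[1], p.maxGridSize[2]);
    return cudaSuccess;
}
```

With the limits reported above, a block of (1024, 1, 1) passes the check, while (1024, 1024, 64) fails because the product of its dimensions is far larger than 1024.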
Learning CUDA
01.cu
This is a C program that uses the CUDA library to allocate memory on the device, copy
data from the host to the device, execute a kernel function on the device, copy data back
from the device to the host, and free the memory on the device. The program starts by
declaring an integer pointer d_n and an integer variable h_n. It then calls the function
cudaMalloc() to allocate memory on the device for d_n. The function returns a cudaError_t
value which is stored in the variable error. If the value of error is non-zero, the program
prints an error message to the standard error stream and exits with a status code of 1.
The program then calls the function cudaMemcpy() to copy the value of h_n from the host
to the device. The function returns a cudaError_t value which is stored in the variable
error. If the value of error is non-zero, the program prints an error message to the standard
error stream. The program then launches a kernel function kernel() on the device using
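the <<<1,1>>> syntax, that is, one block containing one thread. The kernel function sets
the integer that d_n points to to 97. The program then calls the function
cudaThreadSynchronize() (deprecated in current CUDA releases in favour of
cudaDeviceSynchronize()) to synchronize the host thread with the device. The program
then calls the function cudaMemcpy() to copy the value stored at d_n from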
the device to the host. The function returns a cudaError_t value which is stored in the
variable error. If the value of error is non-zero, the program prints an error message to the
standard error stream. Finally, the program calls the function cudaFree() to free the
memory allocated for d_n on the device. The function returns a cudaError_t value which
is stored in the variable error. If the value of error is non-zero, the program prints an error
message to the standard error stream and exits with a status code of 1. The program
compiled with nvcc and ran successfully. The output shows that the value of h_n is 97,
which confirms that the value written by the kernel was copied back to the host.
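The allocate, copy, launch, copy back, free sequence described for 01.cu can be sketched as below. This is an illustration under assumptions rather than the original source: the wrapper run_on_device() is hypothetical (the report's program does these steps in main()), errors are returned instead of printed, and cudaDeviceSynchronize() is used in place of the deprecated cudaThreadSynchronize().

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Runs on the device: writes 97 to the integer that n points to.
__global__ void kernel(int *n) {
    *n = 97;
}

// Hypothetical host wrapper: round-trips *h_n through device memory.
// Returns the first error encountered, or cudaSuccess.
cudaError_t run_on_device(int *h_n) {
    int *d_n = NULL;
    cudaError_t error = cudaMalloc((void **)&d_n, sizeof(int));
    if (error != cudaSuccess) return error;

    error = cudaMemcpy(d_n, h_n, sizeof(int), cudaMemcpyHostToDevice);
    if (error == cudaSuccess) {
        kernel<<<1, 1>>>(d_n);            // one block, one thread
        error = cudaGetLastError();       // catches launch errors
    }
    if (error == cudaSuccess) {
        error = cudaDeviceSynchronize();  // wait for the kernel to finish
    }
    if (error == cudaSuccess) {
        error = cudaMemcpy(h_n, d_n, sizeof(int), cudaMemcpyDeviceToHost);
    }

    // Always release the device allocation, even after a failure.
    cudaError_t free_error = cudaFree(d_n);
    return (error != cudaSuccess) ? error : free_error;
}
```

After a successful call, the integer passed in holds 97 regardless of its initial value.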
01b.cu
This is a C program that uses the CUDA library to allocate memory on the device, copy
data from the host to the device, execute a kernel function on the device, copy data back
from the device to the host, and free the memory on the device. The program starts by
declaring an integer pointer d_n and an integer variable h_n. It then calls the function
cudaMalloc() to allocate memory on the device for d_n. The program then calls the
function cudaMemcpy() to copy the value of h_n from the host to the device. The program
then launches a kernel function kernel() on the device using the <<<1,1>>> syntax. The
kernel function sets the integer that d_n points to to 97. The program then calls the
function cudaThreadSynchronize() to synchronize the host thread with the device. The
program then calls the function cudaMemcpy() to copy the value stored at d_n from the
device to the host. Finally, the program calls the function cudaFree() to free the memory
allocated for d_n on the device. The program compiled with nvcc and ran successfully.
The output shows that the value of h_n is 97.
02.cu
This is a C program that uses the CUDA library to execute a kernel function on the device
with different thread block and grid dimensions. The program starts by declaring a kernel
function kernel(). The kernel function prints the block and thread indices of the current
thread to the standard output stream using the printf() function. The program then calls
the kernel function with different thread block and grid dimensions using the <<<...>>>
syntax. The program then calls the function cudaThreadSynchronize() to synchronize the
host thread with the device. Finally, the program returns 0.
The output of the program shows the block and thread indices of every thread that ran,
which confirms that the kernel function was executed with each of the thread block and
grid dimensions. The indices are printed in a tabular format for readability.
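A kernel that reports its own indices, as described for 02.cu, might look like the sketch below. The output format and the wrapper print_indices() are assumptions. The built-in variables blockIdx and threadIdx and device-side printf() are standard CUDA features.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Every thread prints the index of its block and its index in the block.
__global__ void kernel(void) {
    printf("block (%u,%u,%u)  thread (%u,%u,%u)\n",
           blockIdx.x, blockIdx.y, blockIdx.z,
           threadIdx.x, threadIdx.y, threadIdx.z);
}

// Hypothetical wrapper: launches the kernel with the given dimensions.
cudaError_t print_indices(dim3 grid, dim3 block) {
    kernel<<<grid, block>>>();
    cudaError_t error = cudaGetLastError();
    if (error != cudaSuccess) return error;
    // Synchronizing also flushes the device-side printf buffer.
    return cudaDeviceSynchronize();
}
```

For example, print_indices(dim3(2, 1, 1), dim3(3, 1, 1)) would print six lines, one per thread. The order of the lines is not guaranteed, because blocks are scheduled independently.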
03.cu
This is a C program that uses the CUDA library to perform element-wise addition of two
integer arrays h_a and h_b of size 128 on the device. The program starts by declaring
three integer pointers d_a, d_b, and d_c and two integer arrays h_a and h_b. It then calls
the function cudaMalloc() to allocate memory on the device for d_a, d_b, and d_c. The
function returns a cudaError_t value which is stored in the variable error. If the value of
error is non-zero, the program prints an error message to the standard error stream and
exits with a status code of 1. The program then calls the function cudaMemcpy() to copy
the values of h_a and h_b from the host to the device. The program then launches a
kernel function kernel() on the device using the <<<1,128>>> syntax. The kernel function
performs element-wise addition of the arrays a and b and stores the result in the array c.
The program then calls the function cudaThreadSynchronize() to synchronize the host
thread with the device. The program then calls the function cudaMemcpy() to copy the
values of d_c from the device to the host. Finally, the program calls the function
cudaFree() to free the memory allocated for d_a, d_b, and d_c on the device.
The program compiled with nvcc and ran successfully. After copying the results back to
the host, it prints them to the standard output stream using the printf() function. The
output shows the sum of each pair of elements in h_a and h_b in a tabular format, which
indicates that the element-wise addition was performed correctly on the device.
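The single-block addition described for 03.cu can be sketched as follows. The wrapper add_on_device() and its error handling are assumptions, and cudaDeviceSynchronize() stands in for the deprecated cudaThreadSynchronize(). With a <<<1,128>>> launch, each of the 128 threads handles the element whose position equals its threadIdx.x.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 128  // array size used in the report

// One thread per element: thread i computes c[i] = a[i] + b[i].
__global__ void kernel(const int *a, const int *b, int *c) {
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// Hypothetical wrapper: adds two host arrays of N ints on the device.
cudaError_t add_on_device(const int *h_a, const int *h_b, int *h_c) {
    int *d_a = NULL, *d_b = NULL, *d_c = NULL;
    size_t bytes = N * sizeof(int);

    cudaError_t error = cudaMalloc((void **)&d_a, bytes);
    if (error == cudaSuccess) error = cudaMalloc((void **)&d_b, bytes);
    if (error == cudaSuccess) error = cudaMalloc((void **)&d_c, bytes);
    if (error == cudaSuccess)
        error = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (error == cudaSuccess)
        error = cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    if (error == cudaSuccess) {
        kernel<<<1, N>>>(d_a, d_b, d_c);  // 1 block of 128 threads
        error = cudaGetLastError();
    }
    if (error == cudaSuccess) error = cudaDeviceSynchronize();
    if (error == cudaSuccess)
        error = cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    // cudaFree(NULL) is a no-op, so this is safe after a partial failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return error;
}
```

A caller would fill h_a and h_b, call add_on_device(), and print h_c in a loop.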
04.cu
The program then calls the function cudaMemcpy() to copy the values of h_a and h_b from
the host to the device. The program then launches a kernel function kernel() on the device
using the <<<1,128>>> syntax. The kernel function performs element-wise addition of the
arrays a and b and stores the result in the array c. The program then calls the function
cudaMemcpy() to copy the values of d_c from the device to the host. Finally, the program
calls the function cudaFree() to free the memory allocated for d_a, d_b, and d_c on the
device.
The output of the program indicates that the element-wise addition of the arrays h_a and
h_b was performed correctly on the device and that the results were copied back to the
host. The program prints the results of the addition to the standard output stream using
the printf() function. The output shows the sum of each pair of elements in h_a and
h_b in a tabular format.
05.cu
The program launches a kernel function kernel() on the device using the <<<1,1500>>>
syntax, that is, one block of 1500 threads. The kernel function performs element-wise
addition of
the arrays a and b and stores the result in the array c. The program then calls the function
cudaGetLastError() to check if there was any error during the kernel launch. If there was
an error, the program prints an error message to the standard error stream and exits with
a status code of 1. The program then calls the function cudaThreadSynchronize() to
synchronize the host thread with the device. The program then calls the function
cudaMemcpy() to copy the values of d_c from the device to the host. Finally, the program
calls the function cudaFree() to free the memory allocated for d_a, d_b, and d_c on the
device.
The program compiled with nvcc, but at run time its output reports an error during the
kernel launch: invalid configuration argument. This error occurs when the kernel launch
parameters are not valid for the device. Here the cause is the block size: the launch
requests 1500 threads in a single block, which exceeds the limit of 1024 threads per
block that dq03.cu reported for this device.
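The failure mode can be reproduced in isolation with the sketch below. The empty kernel and the helper try_block_size() are hypothetical and only illustrate how cudaGetLastError() reports a rejected launch. No kernel code runs when the configuration is invalid.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Empty kernel: only the launch configuration matters in this sketch.
__global__ void empty_kernel(void) {
}

// Hypothetical helper: tries to launch one block of `threads` threads
// and returns what cudaGetLastError() reports for that launch.
cudaError_t try_block_size(unsigned int threads) {
    empty_kernel<<<1, threads>>>();
    cudaError_t error = cudaGetLastError();
    if (error != cudaSuccess) {
        fprintf(stderr, "Launch with %u threads per block failed: %s\n",
                threads, cudaGetErrorString(error));
        return error;
    }
    return cudaDeviceSynchronize();
}
```

On a device with a limit of 1024 threads per block, try_block_size(1500) is expected to fail, while try_block_size(128) should succeed.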
06.cu
This is a C program that uses the CUDA library to perform element-wise addition of two
integer arrays h_a and h_b of size 1500 on the device. The program launches a kernel
function kernel() on the device using the <<<10,150>>> syntax, that is, 10 blocks of
150 threads each, which gives 10 × 150 = 1500 threads, one per element. The kernel
function performs element-wise addition of the arrays a and b and stores the result in
the array c. The program then calls the function cudaMemcpy() to copy the values of d_c
from the device to the host. Finally, the program prints the results of the addition
operation to the standard output stream using the printf() function.
The program compiled with nvcc and ran successfully. The output of the program shows
the sum of each pair of elements in h_a and h_b in a tabular format.
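With more than one block, a thread can no longer use threadIdx.x alone as its array position. The usual approach, sketched below, combines the block index, the block size and the thread index. The helper global_index(), the bounds check and the wrapper launch_add() are assumptions. The report does not show how the kernel in 06.cu computes its index.

```cuda
#include <cuda_runtime.h>

#define N 1500  // array size used in the report

// Hypothetical helper, callable on host and device: position of a
// thread in the whole grid for a one-dimensional launch.
__host__ __device__ int global_index(int block_index, int block_size,
                                     int thread_index) {
    return block_index * block_size + thread_index;
}

// Each thread adds the pair of elements at its global position.
__global__ void kernel(const int *a, const int *b, int *c) {
    int i = global_index(blockIdx.x, blockDim.x, threadIdx.x);
    if (i < N) {  // guard in case the grid covers more than N elements
        c[i] = a[i] + b[i];
    }
}

// Hypothetical wrapper: expects pointers that are already on the device.
cudaError_t launch_add(const int *d_a, const int *d_b, int *d_c) {
    kernel<<<10, 150>>>(d_a, d_b, d_c);  // 10 blocks x 150 threads = 1500
    cudaError_t error = cudaGetLastError();
    if (error != cudaSuccess) return error;
    return cudaDeviceSynchronize();
}
```

Thread 0 of block 0 handles element 0, and thread 149 of block 9 handles element 1499, so the ten blocks cover the array exactly once.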
factorise_3_4_block.c
This is a C program that uses the pthread library to factorize a given number goal using
four threads. The program starts by defining a struct arguments_t which contains two
integer fields start and block_size. The program then defines a function find_factors()
which takes an arguments_t pointer as an argument and finds the factors of goal within
the specified block of numbers. The program then creates four threads and assigns each
thread a block of numbers to factorize using the pthread_create() function. The program
then waits for all threads to complete using the pthread_join() function. Finally, the
program calculates the time taken to factorize goal using the clock_gettime() function.
The output of the program indicates the time taken to factorize goal using four threads.
factorise_3_cude.cu
This is a C program that uses the CUDA library to find the factors of a given number goal
using parallel processing. The program starts by defining a kernel function factorise().
The kernel function takes three integer variables a, b, and c as input and checks if their
product is equal to goal. If the product is equal to goal, the kernel function prints the values
of a, b, and c to the standard output stream using the printf() function. The program then
defines a function time_difference() which takes two timespec pointers start and finish
and a long long int pointer difference as input and calculates the time difference between
the two timespecs. The program then defines a struct timespec variable start and calls
the clock_gettime() function to get the current time and store it in start. The program then
defines two dim3 variables gd and bd and initializes them with the values (1000, 1000, 1)
and (1000, 1, 1) respectively. The program then launches the kernel function factorise()
on the device using the <<<gd,bd>>> syntax. The program then calls the function
cudaDeviceSynchronize() to synchronize the host thread with the device. The program
then calls the function cudaGetLastError() to check if there was any error during the kernel
launch. If there was an error, the program prints an error message to the standard error
stream and exits with a status code of 1. The program then uses the time_difference()
function to calculate the time taken to factorize goal on the device and prints the
result to the standard output stream using the printf() function.
The output of the program indicates that the kernel launch was successful and that the
factors of goal were found in parallel. The output also shows the time taken to find
them.
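The search described above can be sketched as follows, with several assumptions stated up front. The mapping a = blockIdx.x, b = blockIdx.y, c = threadIdx.x is a guess at how the three candidates are derived. Passing goal as a kernel argument, the found counter and the wrapper run_factorise() are additions for illustration. time_difference() here returns nothing, which may differ from the original. The sketch also checks for launch errors before synchronizing rather than after.

```cuda
#include <stdio.h>
#include <time.h>
#include <cuda_runtime.h>

// Each thread tests one candidate triple (a, b, c).
__global__ void factorise(unsigned long long goal, unsigned int *found) {
    unsigned long long a = blockIdx.x;   // assumed mapping
    unsigned long long b = blockIdx.y;   // assumed mapping
    unsigned long long c = threadIdx.x;  // assumed mapping
    if (a * b * c == goal) {
        printf("%llu * %llu * %llu = %llu\n", a, b, c, goal);
        atomicAdd(found, 1u);  // counter added for this sketch only
    }
}

// Stores finish - start in nanoseconds in *difference.
void time_difference(const struct timespec *start,
                     const struct timespec *finish,
                     long long int *difference) {
    long long int seconds = finish->tv_sec - start->tv_sec;
    long long int nanoseconds = finish->tv_nsec - start->tv_nsec;
    if (nanoseconds < 0) {  // borrow one second
        seconds -= 1;
        nanoseconds += 1000000000LL;
    }
    *difference = seconds * 1000000000LL + nanoseconds;
}

// Hypothetical wrapper: runs the search and times the kernel.
cudaError_t run_factorise(unsigned long long goal, dim3 gd, dim3 bd,
                          unsigned int *h_found, long long int *elapsed_ns) {
    unsigned int *d_found = NULL;
    struct timespec start, finish;

    cudaError_t error = cudaMalloc((void **)&d_found, sizeof(unsigned int));
    if (error != cudaSuccess) return error;
    error = cudaMemset(d_found, 0, sizeof(unsigned int));

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (error == cudaSuccess) {
        factorise<<<gd, bd>>>(goal, d_found);
        error = cudaGetLastError();
    }
    if (error == cudaSuccess) error = cudaDeviceSynchronize();
    clock_gettime(CLOCK_MONOTONIC, &finish);
    time_difference(&start, &finish, elapsed_ns);

    if (error == cudaSuccess)
        error = cudaMemcpy(h_found, d_found, sizeof(unsigned int),
                           cudaMemcpyDeviceToHost);
    cudaFree(d_found);
    return error;
}
```

With the dimensions from the report, gd = (1000, 1000, 1) and bd = (1000, 1, 1), the launch covers 1000 × 1000 × 1000 = 10^9 candidate triples, one per thread.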
