
Abhinay Surve

PE27 E2
Assignment No:7

Title: Write a CUDA program to find the sum/avg of elements in an N-element
vector (where N is a large number).

Aim: Design and implement a CUDA program to find the sum/avg of elements in
an N-element vector (where N is a large number).
Theory:

1] Explain the logic of reduction in addition/summation


In parallel computing, reduction is the process of collapsing the N elements of a
vector into a single value using an associative operator; for summation the operator
is addition and the result is the total. (The word "reduction" is also used in
computability theory for transforming one problem into another, but here it means
data reduction.) Because addition is associative, the elements need not be added
strictly one after another: they can be added pairwise in a binary tree. In the first
step, N/2 pairs are summed in parallel; in the next step, the resulting N/2 partial
sums are paired and summed again, and so on. The number of partial values halves at
every step, so the full sum is obtained in about log2(N) steps instead of N - 1
sequential additions, which is what makes reduction efficient on massively parallel
hardware such as a GPU. The average then follows by dividing the final sum by N.
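
A minimal sketch of this tree reduction as a CUDA kernel is given below. The kernel
name sumKernel, the BLOCK_SIZE value, and the choice of producing one partial sum per
block are illustrative assumptions for this write-up, not requirements of the
assignment.

// Sketch: tree reduction in shared memory, one partial sum per block.
// Assumes BLOCK_SIZE is a power of two.
#define BLOCK_SIZE 256

__global__ void sumKernel(const float *in, float *partial, int n)
{
    __shared__ float cache[BLOCK_SIZE];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Each thread loads one element (0 if it falls past the end of the vector).
    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Pairwise tree reduction: the number of active threads halves each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();
    }

    // Thread 0 writes this block's partial sum to global memory.
    if (tid == 0)
        partial[blockIdx.x] = cache[0];
}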

2] What role do threads play in the addition of numbers using reduction?

• In OpenMP, for example, the runtime makes a copy of the reduction variable per
thread, initialized to the identity of the reduction operator (0 for addition,
1 for multiplication).
• Each thread then reduces its share of the elements into its local variable.
• At the end of the parallel region, the local results are combined, again using the
reduction operator, into the global variable.

CUDA threads play the same role: each thread accumulates elements into a partial
sum, the threads of a block combine their partial sums in shared memory, and the
per-block results are merged into the overall total (see the host-side sketch after
this answer).
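
To make the thread roles concrete, the following host-side sketch drives sumKernel
from the previous answer: it launches one partial sum per block and finishes the
reduction on the CPU. The vector size N, the sample data, and the omission of error
checking are simplifying assumptions.

#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    const int N = 1 << 20;                           // a "large" N for illustration
    const int blocks = (N + BLOCK_SIZE - 1) / BLOCK_SIZE;

    float *h_in = new float[N];
    for (int i = 0; i < N; ++i) h_in[i] = 1.0f;      // sample data: sum should be N

    float *d_in, *d_partial;
    cudaMalloc(&d_in, N * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);

    // One partial sum per block; each block reduces BLOCK_SIZE elements.
    sumKernel<<<blocks, BLOCK_SIZE>>>(d_in, d_partial, N);

    float *h_partial = new float[blocks];
    cudaMemcpy(h_partial, d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    double sum = 0.0;
    for (int b = 0; b < blocks; ++b) sum += h_partial[b];   // final combine on host

    printf("sum = %.1f  avg = %.4f\n", sum, sum / N);

    cudaFree(d_in); cudaFree(d_partial);
    delete[] h_in; delete[] h_partial;
    return 0;
}

For very large N, the host-side combine can itself be replaced by a second kernel
launch over the array of partial sums.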
FAQs:

1. Name the top 5 supercomputers in the world


1. Fugaku, Japan
Built by Fujitsu, Fugaku is installed at the RIKEN Center for Computational Science
(R-CCS) in Kobe, Japan. With its additional hardware, the system achieved a new
world record of 442 petaflops on HPL, roughly three times the performance of
the number-two system on the list.
RIKEN’s director, Satoshi Matsuoka, stated that the improvement came as they were
“finally being able to use the entire machine rather than just a good chunk of it.”
Since the June competition, his team has been able to fine-tune the code for
maximum performance. “I don’t think we can improve much anymore,” Matsuoka
said.
2. Summit, U.S.
Based at the Oak Ridge National Laboratory (ORNL) in Tennessee, Summit was
built by IBM and is the fastest system in the US. Launched in 2018, it has a
performance of 148.8 petaflops and has 4,356 nodes, each one housing two 22-core
Power9 CPUs and six NVIDIA Tesla V100 GPUs.
Recently, two teams working on Summit won the prestigious Gordon Bell Prize for
outstanding achievement in high-performance computing, commonly referred to as
the ‘Nobel Prize of supercomputing.’
3. Sierra, U.S.
A system at the Lawrence Livermore National Laboratory (LLNL) in California,
Sierra has an HPL mark of 94.6 petaflops. With each of its 4,320 nodes equipped
with two Power9 CPUs and four NVIDIA Tesla V100 GPUs, it has an architecture
similar to that of Summit.
Sierra also made it to the 15th position on the Green500 List of the world’s most
energy-efficient supercomputers.
4. Sunway TaihuLight, China
Installed at China’s National Supercomputing Center in Wuxi, Sunway TaihuLight
previously held the number-one spot for two years (2016-2017). It has since
fallen down the list: third last year, it now stands fourth.
Built by China’s National Research Center of Parallel Computer Engineering &
Technology (NRCPC), it achieved 93 petaflops on its HPL benchmark. It is powered
exclusively by Sunway SW26010 processors.
5. Selene, U.S.
Installed in-house at NVIDIA Corp, Selene jumped from seventh position in the
June rankings to fifth. After a recent upgrade, it achieved 63.4 petaflops on
HPL, nearly doubling its previous score of 27.6 petaflops.
NVIDIA unveiled Selene, its AI supercomputer, in June this year, after constructing
and running it in less than a month. Its key uses include system development and
testing, in-house AI workloads, and chip design work.

2. Name the components of a GPU responsible for parallel processing


• Streaming Multiprocessors (SMs): the independent processing units that execute
thread blocks in parallel
• CUDA cores (the arithmetic units inside each SM): execute the instructions of
individual threads
• Warp schedulers: issue instructions to warps (groups of 32 threads) to keep the
cores busy
• Registers and shared memory: fast on-chip storage that lets the threads of a
block cooperate
• Global memory and its memory controllers: supply data to the thousands of threads
running concurrently
These properties can be inspected at runtime, as the sketch below shows.
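
As a brief illustration, the CUDA runtime API can report these hardware properties;
the snippet below only reads a few fields of cudaDeviceProp for device 0.

#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // properties of device 0
    printf("SMs: %d  warp size: %d  max threads per block: %d\n",
           prop.multiProcessorCount, prop.warpSize, prop.maxThreadsPerBlock);
    return 0;
}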
