
Parallel Computing (CS 633)

January 8, 2024

Preeti Malakar
pmalakar@cse.iitk.ac.in
Logistics
• Class hours: MW 3:30 – 4:45 PM (RM 101)
• Office hour: MTW 5:00 – 5:30 PM (KD 221)
• https://www.cse.iitk.ac.in/users/cs633/2023-24-2
– Lectures will be uploaded after every class
• Announcements/uploads on
– MooKIT
– Course email alias
• Email to the instructor should always be prefixed with
[CS633] in the subject
2
Grading Policy
Participate actively in class

3
Switch OFF All Devices

4
5
Assignments

• Programming assignments in C
• In a group (group size = 3)
– Send group member information by Jan 14 to
{gsarkar,madhavm}@cse.iitk.ac.in
– Clearly include names, roll numbers, and IITK email IDs
– Subject of email: [CS633 Group]
– Changes in group composition are not allowed
• Mode of submission will be explained in due time

6
Assignments

• Credit for early submissions (+5 / day)


– Max credit: +15 / assignment
– Only the last date of submission will be considered
• Score reduction for late submissions (-3 / day)
– Max 2 late days / assignment
• None of the assignments can be completed in a day!

Plagiarism will NOT be tolerated


Use of AI tools is NOT allowed
7
Lecture 1

Introduction
Multicore Era

• Intel 4004 (1971): single core, single chip
• Cray X-MP (1982): single core, multiple chips
• Hydra (2000): multiple cores, single chip
• IBM POWER4 (2001): multiple cores, multiple chips

9
Moore’s Law (1965)
Number of transistors in a chip doubles every 18 months

[Source: Wikipedia]
“However, it must be programmed with a more complicated parallel programming model to obtain maximum performance.”

10
Trends

[Source: M. Frans Kaashoek, MIT]


11
12
top500.org (Nov’23)

~ $600 million
~ 7300 sq. ft.
~ 22 MW power
~ 23000 L water

13
green500.org (Nov’23)

Metric of interest: Performance per Watt

14


15
Top #1 Supercomputer

https://www.top500.org/resources/top-systems/

16
Making of a Supercomputer

Source: energy.gov 17
Greenest Data Centre?

Source: MIT TR 06/19

18
“The 149,000 square foot facility built on a hillside overlooking the UC Berkeley campus and San Francisco Bay will house one of the most energy-efficient computing centers anywhere, tapping into the region’s mild climate to cool the supercomputers at the National Energy Research Scientific Computing Center (NERSC) and eliminating the need for mechanical cooling.”

https://www.science.org/content/article/climate-change-threatens-supercomputers 19
Top Supercomputers from India

20
Supercomputing in India [topsc.cdacb.in, Jul’23]

21
Source: www.iitk.ac.in
22
Credit: Ashish Kuvelkar, CDAC
23
National Supercomputing Mission Sites

24
Big Compute

25
Massively Parallel Codes

Climate simulation of Earth [Credit: NASA]


26
Discretization

Gridded mesh for a global model [Credit: Tompkins, ICTP]

27
Numerical Weather Models

• Use numerical methods to solve equations


that govern atmospheric processes
• Are based on fluid dynamics and depend on
observations of meteorological variables
• Are used to obtain nowcast/forecast

28
Massively Parallel Simulations

Self-healing material simulation


[Nomura et al., “Nanocarbon synthesis by high-temperature
oxidation of nanoparticles”, Scientific Reports, 2016] 29
Massively Parallel Analysis

[Nomura et al., “Nanocarbon synthesis by high-temperature


oxidation of nanoparticles”, Scientific Reports, 2016]
30
Massively Parallel Codes

Cosmological simulation [Credit: ANL]


31
Massively Parallel Analysis
Virgo Consortium

32
Computational Science

[Source: Culler, Singh and Gupta] 33


Big Data

34
Output Data

• High-energy physics: Higgs boson simulation, 10 PB / year [Source: CERN]
• Cosmology: Q Continuum simulation, 2 PB / simulation, scaled to 786K cores on Mira [Source: Salman Habib et al.]
• Climate/weather: Hurricane simulation, 240 TB / simulation [Source: NASA]

35
Input Data

[Credit: World Meteorological Organization]


36
System Architecture Trends

[Credit: Pavan Balaji@ATPESC’17] 37


I/O trends

NERSC I/O trends [Credit: www.nersc.gov]


38
Compute vs. I/O trends
I/O vs. FLOPS for the #1 supercomputer in the Top500 list

[Chart: Byte/FLOP ratio, log scale from 1.00E-03 down to 1.00E-06, for the years 1997–2018]
39
Why Parallel?

[Figure: A*, 20 hours vs. 2 hours. Not really]
40
Parallelism
A parallel computer is a collection of processing
elements that communicate and cooperate to solve
large problems fast.

– Almasi and Gottlieb (1989)

41
Speedup
Example – Sum of squares of N numbers
Serial:
  for i = 1 to N
    sum += a[i] * a[i]
  Cost: O(N)

Parallel (P processing elements):
  for i = 1 to N/P
    sum += a[i] * a[i]
  collate result
  Cost: O(N/P) + communication time
42
Performance Measure
• Speedup: S_P = Time (1 processor) / Time (P processors)

• Efficiency: E_P = S_P / P

43
Parallel Performance (Parallel Sum)
Parallel efficiency of summing 10^7 doubles

#Processes | Time (sec) | Speedup | Efficiency
     1     |   0.025    |   1.0   |   1.00
     2     |   0.013    |   1.9   |   0.95
     4     |   0.010    |   2.5   |   0.63
     8     |   0.009    |   2.8   |   0.35
    12     |   0.007    |   3.6   |   0.30

44
Ideal Speedup
Speedup Linear
Superlinear

Sublinear

Processors
45
Issue – Scalability

[Source: M. Frans Kaashoek, MIT]


46
Scalability Bottleneck

Performance of weather simulation application


47
Parallelism
A parallel computer is a collection of processing
elements that communicate and cooperate to solve
large problems fast.

– Almasi and Gottlieb (1989)

48
Distributed Memory Systems

• Networked systems
• Distributed memory
  – Local memory
  – Remote memory
• Parallel file system

[Figure: a cluster of nodes, each running the code]
49
Parallel Programming Models
Libraries: MPI, TBB, Pthread, OpenMP, …
New languages: Haskell, X10, Chapel, …
Extensions: Coarray Fortran, UPC, Cilk, OpenCL, …

• Shared memory
– OpenMP, Pthreads, …
• Distributed memory
– MPI, UPC, …
• Hybrid
– MPI + OpenMP
50
This course …

51
Large-scale Parallel Computing

Message Parallel
passing algorithms

Designing Performance
parallel codes analysis

52
Message Passing Paradigm

• Point-to-point (P2P) communications


• Collective communications
• Algorithms
• Performance

53
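A minimal point-to-point sketch of this paradigm in C with MPI (the payload value and message tag are arbitrary illustrative choices, not from the slides):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    int value = 0;
    if (size < 2) {
        if (rank == 0) printf("run with at least 2 processes\n");
    } else if (rank == 0) {
        value = 42;                                          /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* P2P send to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                         /* blocking receive */
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Compile with `mpicc` and run with `mpirun -np 2 ./a.out`. Collective communications (e.g., MPI_Bcast, MPI_Reduce) instead involve every process in the communicator at once.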
Profiling

54
Parallel I/O
[Figure: I/O path from a compute node rack over 2 GB/s links (not shared) to bridge nodes, then at 4 GB/s (shared) over the IB network to the I/O nodes and the GPFS filesystem; 128:1 ratio of compute nodes to I/O nodes]

55
Job Scheduling

[Figure: a scheduler mapping users' jobs onto nodes (Wikipedia)]

Example of real supercomputer activity: jobs on Theta at Argonne National Laboratory
56
Supercomputer Activity

57
Reference Material

• D. E. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, 1998.
• A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, 2nd Ed. Addison-Wesley, 2003.
• M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra, MPI - The Complete Reference, Second Edition, Volume 1: The MPI Core. MIT Press, 1998.
• W. Gropp, E. Lusk, and A. Skjellum, Using MPI, Third Edition. MIT Press, 2014.
• Research papers

58
