
Algorithms and Parallel Computing

Course Introduction

Danilo Ardagna
Politecnico di Milano
danilo.ardagna@polimi.it
Danilo Ardagna - Course Introduction 4

Instructors [A-M)
Federica Filippini
federica.filippini@polimi.it

Danilo Ardagna
danilo.ardagna@polimi.it

Bruno Guindani
bruno.guindani@polimi.it

Luca Pozzoni
lucaezio.pozzoni@polimi.it

Instructors [M-Z]

Damian Andrew Tamburri
damianandrew.tamburri@polimi.it

Paolo Joseph Baioni
paolojoseph.baioni@polimi.it

Gian Enrico Conti
gianenrico.conti@polimi.it

Course schedule
• Monday: 8:30-10:00
• Tuesday: 8:30-10:00
• Wednesday: 8:30-10:00
• Thursday: 11:30-13:00

• Office hours: Thursday 14:30-16:30 or by appointment; in either case, please email your instructor beforehand

Course schedule
• Lab hands-on sessions (innovative teaching):
• Lab assignment shared a couple of days before
• You will work in groups
• You can propose your own group; otherwise you will be assigned to a group by the lecturers
• Dates [A-M):
• 4th October
• 25th October
• 1st December
• 14th December
• 22nd December

Streaming and lectures recording


• No streaming (except during public transport strikes, but let me know in advance!)

• Lectures will always be recorded, but the recordings will be published only every 1-2 weeks (we are syncing with other courses this semester)

Teaching material
Slides of the course will be published on WeBeep.

Participate in discussion forums

Always publish your code, otherwise no answer will be provided

Useful websites – C++ reference

http://en.cppreference.com/

Useful websites – Stack Overflow

https://stackoverflow.com

Course goals

• Learn to build industry-scale software

• Develop the skills necessary to write efficient algorithms

• Solve large-scale problems on parallel computers



Motivations
• Parallel computing
• From high-end computing and hardware to commodity systems
• Development of faster programs

• Applications
• Require the processing of large amounts of data
• Data science, financial modelling and multimedia processing

• Parallel processing is the only cost-effective method for the fast solution of these (big-data) problems

• Cloud computing gives access to resources on demand

Course content

• C++ programming language (75%)

• MPI programming framework (25%)



Teaching material
C++ Primer (5th Edition), S. B. Lippman, J. Lajoie, B. E. Moo. Addison-Wesley. (Used also in APSC/PACS)

The C++ Programming Language (4th Edition), B. Stroustrup. Addison-Wesley.

Teaching material
An Introduction to Parallel Programming, Peter S. Pacheco. Morgan Kaufmann.

Introduction to Algorithms (3rd Edition), T. Cormen, C. Leiserson, R. Rivest and C. Stein. MIT Press.

Exam
• Final evaluation is based on a written exam (maximum grade 31)

• The exam will consist of three exercises (two on C++, one on MPI) and will last 2-2.5 hours

• Three assignments during the semester, graded by an auto-grader
• One point for each assignment
• Instructions will be provided soon on WeBeep
• Sampling of the solutions:
• Instructors can ask for an oral discussion
• -2 points if we find out you are cheating (and you lose any other bonus you earned)

Build industry-scale software – software is everywhere

• Our society depends on software

• Do you drive a car?
• “Cars run on code”
• High-end cars contain close to 100 million lines of code (LOC)
• Running on 70-100 microprocessor-based electronic control units
• Self-driving cars …

• Do you travel on planes?
• “A plane is software with wings”

• Do you have a smartphone?
• Besides talking, you
• Interact in social networks
• Use it for travel instructions
• …

Build industry-scale software

• Software is key and pervasive, but still fragile
• Ever increasing size (ultra-large systems)
• Ever increasing criticality
• Dependability despite normal failures
• Continuous evolution and deployment
• New paradigms for development and acquisition

• Object-oriented programming: an approach to cope with these issues

Programming languages
styles/paradigms evolution
• Procedural programming
• Data abstraction
• Object-oriented programming
• Generic programming

• Functional programming, logic programming, rule-based programming, constraint-based programming, aspect-oriented programming, …

Programming languages
styles/paradigms evolution
• Portability is good
• High performance is good
• Anything that eases debugging is good
• Stability over decades is good
• Ease of learning is good
• Small is good
• Whatever helps analysis is good
• Having lots of facilities is good

• You can’t have all at the same time: engineering trade-offs

Build industry-scale software: what do we need?

• Easily support the work of many developers

• Favour code reuse

• Easily support testing, debugging and software evolution




C++ vs. Matlab & R


I have been using Matlab and C++ for about 10 years. For
every numerical algorithms implemented for my research, I
always start from prototyping with Matlab and then
translate the project to C++ to gain a 10x to 100x (I am not
kidding) performance improvement. Of course, I am
comparing optimized C++ code to the fully vectorized
Matlab code. On average, the improvement is about 50x.

https://stackoverflow.com/questions/20513071/performance-tradeoff-when-is-matlab-better-slower-than-c-c

Parallel computing
• Consider your favorite computational application
• One processor can give me results in N hours
• Why not use N processors…
…and get the results in just one hour?

• Parallelism = applying multiple processors to a single problem

• Split your program into multiple smaller chunks (each working on a fraction of the data)

• Chunks collaborate by sending messages (MPI – Message Passing Interface)
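
The idea of splitting the data into chunks can be sketched in plain C++ before any MPI is involved. The block below is only an illustration under stated assumptions: `chunk_range` and `chunked_sum` are hypothetical helper names (not part of MPI or of the course material), and the "workers" are simulated one after another in a single process instead of exchanging real messages.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: half-open index range [first, second) of the chunk
// owned by worker `rank` when `n` items are split as evenly as possible
// among `p` workers. The first (n % p) workers get one extra item.
std::pair<std::size_t, std::size_t>
chunk_range(std::size_t n, std::size_t p, std::size_t rank)
{
    assert(p > 0 && rank < p);
    const std::size_t base = n / p;
    const std::size_t extra = n % p;
    const std::size_t begin = rank * base + (rank < extra ? rank : extra);
    const std::size_t end = begin + base + (rank < extra ? 1 : 0);
    return {begin, end};
}

// Sequential stand-in for a parallel sum: every simulated worker sums only
// its own chunk, then the partial results are combined. In a message-passing
// program the combination step would be done by exchanging messages.
double chunked_sum(const std::vector<double>& data, std::size_t p)
{
    std::vector<double> partial(p, 0.0);
    for (std::size_t rank = 0; rank < p; ++rank) {
        const auto r = chunk_range(data.size(), p, rank);
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(r.first);
        const auto last = data.begin() + static_cast<std::ptrdiff_t>(r.second);
        partial[rank] = std::accumulate(first, last, 0.0);
    }
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

For example, with 10 items and 3 workers, `chunk_range(10, 3, 1)` yields the range [4, 7): worker 0 takes four items and workers 1 and 2 take three each.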

Parallel computing
• Performance comes at a price: complexity
• Applications must be written specifically to take advantage of
distributed computing
• Performance characteristics of applications change
• Debugging becomes more of a challenge

If we have 1,000 cores?
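
As a rough way to reason about this question, speedup and efficiency on p cores are commonly defined as below. The symbols T_1, T_p, S and E are notation introduced here for illustration and do not come from the slides. The ideal case is an upper bound: real applications usually fall short of it because of communication and coordination costs.

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
% Ideal case: T_p = T_1 / p, so S(p) = p and E(p) = 1.
% Example: a job with T_1 = 1000 hours on p = 1000 cores takes,
% at best, T_p = 1 hour.
```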



Parallel computing

Development of models begins at small scale.
Working on your laptop is convenient, simple.
Actual analysis, however, is slow.

“Scaling up” typically means a small server or fast multi-core desktop.
Speedup exists, but for very large models, not significant.
Single machines don't scale up forever.

Parallel computing

For the largest models, a different approach is required.

Nowadays we can access computing resources in the Cloud on demand (e.g., Amazon, Microsoft, Google, IBM, …)

Parallel computing

ImageNet dataset: https://devopedia.org/imagenet

Training requires weeks on hundreds of servers



Top 500 Computers (http://www.top500.org)


[Chart: TOP500 performance, measured in Flops]

Study Linux!
• Since November 2017, 100% of TOP500 systems are based on Linux
• Cheat sheet available on WeBeep

Download CLion IDE for C++ programming

https://www.jetbrains.com/clion/download

Download the VM for MPI programming and the Linux cheat sheet

• If your laptop runs Windows or is an Intel Mac, install the VM

• If you run a Mac with an M1/M2 CPU, follow the Docker installation

A tutoring service will be available in early October (VM installation and review of the initial classes)

Why APC is important


• You will put into practice all the math you know (and will learn in the next two years) through coding

• At interviews, companies ask questions on, e.g., shared pointers!

• You come to my office asking how to port your R/Matlab thesis code to C++ & MPI to make it faster!

• You have a more than 30% chance of writing code after you graduate

Mateday 2019 – Employment in Mathematical Engineering

Registered: 382; morning attendance: 310; afternoon (company mentoring): 100

CCS – 27 November 2019

How to be successful in this course?

1. Review the C language
2. Write your code
3. Check solutions only after your own implementation
4. Participate in the forums
5. Hack the solutions provided on the course website
6. Forget Matlab and R!


How to be successful in this course?

Attend classes in person and study day-by-day!!!!

1. At the end of December 2022, 20 students (out of 355) were attending classes!
2. The worst results in the course history!!
3. Average grade 22!!!

[Chart: 2022 APC Statistics, success rate (0%-60%) per exam call: January, February, June, July, September]
[Chart: 2022 APC Statistics, students enrolled (0-180) per exam call: January, February, June, July, September]
