What is Parallel Computing?
Parallel computing is a process where a large problem is broken into smaller parts
and solved simultaneously using multiple processors.
Unlike serial programming, where one instruction is executed at a time, parallel
computing executes many instructions at the same time.
Limitations of Serial Programming:
o Only one task runs at a time, which wastes hardware resources and increases
execution time.
Why Parallel Programming?
Handles dynamic and large problems more efficiently.
Enables faster computations and resource savings.
Makes complex simulations and data processing possible (e.g., weather forecasting,
real-time systems).
Needed due to limits in improving processor speeds (e.g., clock speed limits, heat
issues).
Moore's Law
Predicts that the number of transistors on a chip doubles every two years.
This increases computational power.
But physical limits mean we need to focus on multiple cores rather than faster single
cores.
Performance Factors
Processor speed and memory speed both matter.
Clock rates have historically improved by roughly 40% per year, while DRAM (memory) speeds have improved by only about 10% per year.
Cache memory is used to reduce the gap between processor and memory speeds.
Types of Parallelism
1. Pipelining:
o Breaks tasks into stages so multiple instructions are processed in different
stages at the same time.
2. Superscalar Execution:
o Processor executes multiple instructions per clock cycle using multiple
execution units.
3. Very Long Instruction Word (VLIW):
o Compiler schedules multiple instructions in advance, packing them into one long instruction word.
o Simplifies the hardware but shifts the scheduling complexity to the compiler.
4. SIMD (Single Instruction, Multiple Data):
o One instruction operates on multiple data elements.
o Used in GPUs, image processing, matrix operations.
5. MIMD (Multiple Instruction, Multiple Data):
o Different processors execute different instructions on different data.
Memory Architectures
1. Shared Memory:
o All processors access the same memory.
o Easier to program but harder to scale; implemented as UMA or NUMA.
UMA: Equal access time to all memory.
NUMA: Access time varies depending on memory location.
2. Message Passing:
o Each processor has its own memory.
o Processors communicate by sending messages (e.g., using MPI or PVM).
o More scalable but harder to program.
Cache Coherency
Ensures that multiple copies of the same memory block (cached in different
processors) stay consistent.
PRAM (Parallel Random Access Machine) Models
1. EREW – Exclusive Read, Exclusive Write (no two processors access the same memory cell at the same time).
2. CREW – Concurrent Read, Exclusive Write.
3. ERCW – Exclusive Read, Concurrent Write.
4. CRCW – Concurrent Read and Write (most powerful).
Threads
A thread is a single sequence of execution within a program.
Multi-threading allows multiple threads to run in parallel.
Benefits of Using Threads
1. Portable software across systems.
2. Load balancing – Distribute work evenly.
3. Better resource usage – Avoids delays due to I/O or waiting.
OpenMP – A Tool for Parallel Programming
Used in C/C++ and Fortran.
Uses pragmas/directives to manage parallel code.
Basic Directives:
#pragma omp parallel: Starts parallel region.
#pragma omp parallel for: Splits loop among threads.
Thread Functions:
omp_get_thread_num(): Gets thread ID.
omp_get_num_threads(): Gets total threads.
omp_set_num_threads(N): Sets number of threads.
omp_get_wtime(): Gets wall-clock time.
Data Handling:
private: Each thread has its own copy.
shared: All threads share the variable.
firstprivate: Copy with initial value.
lastprivate: Saves the final value from the last iteration.
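A minimal illustrative sketch of these clauses in C with OpenMP (compile with something like gcc -fopenmp); the variable names and loop bound are made up for the example:

#include <stdio.h>

int main(void) {
    int x = 10, last = -1;

    /* firstprivate: every thread starts with its own x initialized to 10.
       lastprivate: after the loop, 'last' holds the value written by the
       logically last iteration (i == 7). */
    #pragma omp parallel for firstprivate(x) lastprivate(last)
    for (int i = 0; i < 8; i++) {
        x += i;        /* updates the thread's private copy only */
        last = i;
    }

    printf("x = %d, last = %d\n", x, last);   /* original x is still 10; last is 7 */
    return 0;
}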
Synchronization:
#pragma omp critical: Allows one thread at a time.
#pragma omp atomic: Fast, thread-safe increment.
#pragma omp barrier: Synchronizes all threads.
Loop Scheduling:
Static: Equal chunks to threads.
Dynamic: Assigns as threads become free.
Guided: Starts with large chunks, decreases over time.
Parallel Sections:
#pragma omp parallel sections: Different tasks run in parallel.
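A minimal sketch of parallel sections, where two independent blocks may run on different threads at the same time (compile with OpenMP support, e.g. gcc -fopenmp):

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("Section A on thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("Section B on thread %d\n", omp_get_thread_num());
    }
    return 0;
}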
Performance Metrics
FLOPS (Floating Point Operations Per Second): Measures computing power.
GFLOPS = Billion FLOPS.
Conclusion
Parallel computing improves speed, efficiency, and makes complex problems solvable by:
Breaking tasks into subtasks.
Running them across multiple processors/threads.
Using smart scheduling, memory handling, and architecture-aware programming.
✅ UNIT 1: Basics of Parallel Programming
1. What is Parallel Computing?
Parallel computing means solving a large problem by dividing it into smaller tasks and
running them simultaneously using multiple processors or threads. This helps in faster
execution.
2. What is Concurrency?
Concurrency means that multiple tasks are in progress, but not necessarily at the same
time. It focuses more on task management than actual simultaneous execution.
3. Why do we need Parallel Programming?
To solve problems faster, use hardware efficiently, and handle large, complex, or real-time
problems like simulations, weather forecasting, etc.
4. Difference Between Serial and Parallel Execution
Serial: One task at a time (step-by-step).
Parallel: Many tasks run together using multiple cores or threads.
✅ UNIT 2: OpenMP Basics
5. What is OpenMP?
OpenMP is an API in C, C++, and Fortran used to write parallel programs easily using
pragmas (e.g., #pragma omp parallel).
6. How to Create Threads in OpenMP?
Use:
#pragma omp parallel
This starts a parallel region where multiple threads execute the block of code.
7. OpenMP for Loop Parallelism
#pragma omp parallel for
for (int i = 0; i < N; i++) { ... }
This splits loop iterations among threads.
8. Critical and Atomic
#pragma omp critical: Only one thread at a time in this block (for complex updates).
#pragma omp atomic: Faster, for simple updates like sum += a[i].
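A short illustrative sketch contrasting the two: atomic for the single-statement sum, critical for a multi-statement update (the array contents are made up):

#include <stdio.h>

#define N 1000

int main(void) {
    int a[N], sum = 0, max = 0;
    for (int i = 0; i < N; i++) a[i] = i % 7;

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        #pragma omp atomic          /* cheap: protects one simple update */
        sum += a[i];
    }

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        #pragma omp critical        /* needed: the update spans a read, a compare, and a write */
        {
            if (a[i] > max) max = a[i];
        }
    }

    printf("sum = %d, max = %d\n", sum, max);
    return 0;
}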
9. Private, Shared, Firstprivate, Lastprivate
private: Each thread gets its own copy (uninitialized).
shared: All threads use the same variable.
firstprivate: Thread gets a copy with the original value.
lastprivate: The value from the last iteration is saved.
10. What is Synchronization?
It means controlling access to shared data to prevent conflicts. Tools:
critical
atomic
barrier
lock
11. What is Schedule in OpenMP?
It defines how loop iterations are divided:
static: Fixed-size chunks
dynamic: Assigned as threads finish
guided: Decreasing chunk size over time
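A small sketch of the schedule clause; the dynamic chunk size of 4 is an arbitrary choice, and the inner loop just simulates iterations of very different cost:

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Later iterations do more work, so dynamic scheduling keeps threads busy. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < 100; i++) {
        volatile double x = 0.0;
        for (int j = 0; j < i * 1000; j++) x += j;
    }
    printf("finished with up to %d threads\n", omp_get_max_threads());
    return 0;
}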
✅ UNIT 3: MPI Basics
12. What is MPI?
MPI (Message Passing Interface) is used in distributed memory systems to communicate
between processors.
13. Important MPI Functions
MPI_Send / MPI_Recv: Send and receive messages (blocking)
MPI_Isend / MPI_Irecv: Non-blocking versions
MPI_Bcast: Broadcast to all processes
MPI_Gather / MPI_Scatter: Collect or distribute data
MPI_Reduce: Combines values using operations (sum, max, etc.)
MPI_Barrier: All processes wait here (synchronization)
14. MPI_Comm_rank & MPI_Comm_size
MPI_Comm_rank: Gives process ID
MPI_Comm_size: Total number of processes
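A minimal illustrative MPI program combining these calls (assumes an MPI installation; compile with mpicc and run with at least two processes, e.g. mpirun -np 2):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    if (rank == 0 && size > 1) {
        int msg = 42;                       /* arbitrary example value */
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* blocking send to rank 1 */
    } else if (rank == 1) {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 of %d received %d from rank 0\n", size, msg);
    }

    MPI_Finalize();
    return 0;
}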
✅ UNIT 4: Architectures and Concepts
15. Shared vs Distributed Memory
Shared: All processors access same memory (OpenMP).
Distributed: Each has own memory (MPI), uses message passing.
16. UMA vs NUMA
UMA (Uniform): Equal memory access time
NUMA (Non-uniform): Memory access time varies based on location
17. Superscalar and VLIW
Superscalar: Executes multiple instructions in one cycle using multiple ALUs
VLIW: Compiler schedules multiple instructions in one long word
18. Cache Coherence
Ensures all processors see consistent values of shared variables in their caches.
✅ UNIT 5: Algorithms & Case Studies
19. PRAM Models
EREW: Exclusive Read, Exclusive Write
CREW: Concurrent Read, Exclusive Write
ERCW: Exclusive Read, Concurrent Write
CRCW: Concurrent Read, Concurrent Write
20. Odd-Even Sort (MPI)
Alternates between comparing and swapping even-indexed pairs and odd-indexed pairs of elements
Processes exchange and compare values using MPI_Sendrecv
21. BFS in OpenMP
Traverse levels of a graph using parallel loops
Protect next-level queue using critical or atomic
22. Histogram Sorting and Race Condition
Race condition occurs when threads update the same variable (e.g., histogram bin)
Fix using: #pragma omp atomic or locks
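An illustrative sketch of the atomic fix, counting a made-up data array into 256 bins; without the atomic, concurrent increments of the same bin could be lost:

#include <stdio.h>

#define N    (1 << 20)
#define BINS 256

int main(void) {
    static unsigned char data[N];
    int hist[BINS] = {0};

    for (int i = 0; i < N; i++) data[i] = (unsigned char)(i % BINS);   /* fake input */

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        #pragma omp atomic          /* serializes only the single increment */
        hist[data[i]]++;
    }

    printf("hist[0] = %d\n", hist[0]);   /* expect N / BINS = 4096 */
    return 0;
}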
23. Block Matrix Multiplication (MPI)
Divide matrices into blocks
Each process computes its part
Use MPI to share and collect results
Time complexity: O(n³/p) + communication cost
24. Fork-Join Model
Fork: Master creates threads (parallel region)
Join: Threads finish and sync before continuing
🌟 Introduction to Parallel Programming (Simplified Notes)
✅ OpenMP – Shared Memory Programming
🔹 Basic Directives
#pragma omp parallel: Creates multiple threads.
#pragma omp parallel for: Splits loop among threads.
#pragma omp parallel
{
    printf("Thread %d\n", omp_get_thread_num());
}
🔹 Synchronization
critical: Only one thread enters at a time.
atomic: Fast, for simple operations (like sum++).
barrier: Waits until all threads reach this point.
nowait: Removes the implicit barrier at the end of a work-sharing construct (e.g., a parallel loop).
flush: Makes updates visible to all threads.
ordered: Maintains original loop order inside a parallel loop.
🔹 Data Sharing Clauses
private: Each thread gets its own uninitialized variable.
shared: Variable is shared among all threads.
firstprivate: Private, but initialized with original value.
lastprivate: After loop, saves value from last iteration.
threadprivate: Global variable, but private to each thread.
🔹 Reduction
Combines thread results (e.g., sum, max).
#pragma omp parallel for reduction(+:sum)
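A complete illustrative example of the reduction clause (the array contents are made up); each thread accumulates into a private copy of sum, and the copies are combined with '+' at the end of the loop:

#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) a[i] = 1.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.1f\n", sum);   /* 1000000.0 */
    return 0;
}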
🔹 Loop Scheduling
static: Fixed chunks.
dynamic: Chunks given as threads finish.
guided: Large to small chunks.
auto: Compiler decides.
🔹 Tasks
task: Packages a block of work that any thread in the team may run later.
taskwait: Waits for the child tasks of the current task to finish.
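A minimal sketch of tasks; the single directive (standard OpenMP, not listed above) makes one thread create the tasks while any thread in the team may execute them:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        printf("task A on thread %d\n", omp_get_thread_num());

        #pragma omp task
        printf("task B on thread %d\n", omp_get_thread_num());

        #pragma omp taskwait    /* wait for both child tasks to finish */
        printf("both tasks done\n");
    }
    return 0;
}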
🔹 Locks
omp_lock_t lock;              /* lock object shared by all threads */
omp_init_lock(&lock);         /* initialize once before use */
omp_set_lock(&lock);          /* acquire; other threads block here */
// critical work
omp_unset_lock(&lock);        /* release so another thread can enter */
omp_destroy_lock(&lock);      /* free the lock when no longer needed */
🔹 Useful OpenMP Functions
omp_set_num_threads(n)
omp_get_num_threads()
omp_get_thread_num()
omp_get_wtime() → returns wall time
🔹 OpenMP Env Variables
OMP_NUM_THREADS: set thread count
OMP_SCHEDULE: schedule type
OMP_DYNAMIC: auto-adjust threads
OMP_PROC_BIND: fix threads to CPUs
✅ MPI – Distributed Memory Programming
🔹 Basic MPI Communication
MPI_Send / MPI_Recv: Blocking send/receive
MPI_Isend / MPI_Irecv: Non-blocking
MPI_Wait, MPI_Test: Wait for non-blocking completion
MPI_Sendrecv: Send and receive together
🔹 Collective Communication
MPI_Bcast: Broadcast to all
MPI_Gather: Collect to root
MPI_Allgather: All collect
MPI_Scatter: Divide from root
MPI_Reduce: Combine values (sum, max)
MPI_Allreduce: All get combined result
MPI_Barrier: Wait for all to sync
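An illustrative sketch combining MPI_Bcast and MPI_Reduce (the broadcast value 100 is arbitrary); compile with mpicc and run with mpirun:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = 0;
    if (rank == 0) n = 100;                        /* root chooses the value */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* now every rank has n == 100 */

    int local = rank + 1, total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("n = %d, sum over ranks of (rank + 1) = %d\n", n, total);

    MPI_Finalize();
    return 0;
}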
🔹 MPI Info & Utilities
MPI_Comm_rank(): Get process ID
MPI_Comm_size(): Total processes
MPI_COMM_WORLD: All processes group
MPI_Status: Holds message info
✅ Core Concepts
🔹 Clock Concepts
Clock cycle: Smallest unit of CPU time.
Clock speed: How many cycles/second (GHz = billions/sec).
🔹 Superscalar vs VLIW
Superscalar: Hardware runs many instructions per cycle.
VLIW: Compiler packs many instructions into one long word.
🔹 Memory Concepts
Cache coherence: Keep cache data consistent across cores.
Spatial locality: Accessing nearby memory addresses.
Temporal locality: Reusing same address soon.
Strided access: Accessing memory with a fixed step between consecutive accesses (e.g., a[0], a[2], a[4], ...).
🔹 Memory Architecture
Shared memory: One memory, many threads (OpenMP).
Distributed memory: Each processor has its own memory (MPI).
UMA: Uniform access to memory.
NUMA: Access speed varies by location.
🔹 SIMD vs MIMD vs SPMD
SIMD: One instruction, many data (used in GPUs).
MIMD: Each core does different instructions.
SPMD: Same code, different data (common in MPI).
🔹 Blocking vs Non-blocking MPI
Blocking: The call waits until the message buffer is safe to reuse.
Non-blocking: The call returns immediately; completion is checked later with MPI_Wait or MPI_Test.
✅ Algorithms
🔹 Dijkstra’s Algorithm (MPI)
Find shortest path.
Use MPI_Allreduce to share minimum distance info.
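A minimal sketch of that selection step only: each rank proposes its local closest unvisited vertex, and MPI_Allreduce with MPI_MINLOC on value/index pairs picks the global winner on every rank (the local distances below are made-up placeholders):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    struct { double dist; int vertex; } local_min, global_min;
    local_min.dist   = 10.0 + rank;   /* placeholder: best distance in this rank's slice */
    local_min.vertex = 5 * rank;      /* placeholder: the corresponding vertex id */

    MPI_Allreduce(&local_min, &global_min, 1, MPI_DOUBLE_INT,
                  MPI_MINLOC, MPI_COMM_WORLD);

    if (rank == 0)
        printf("next vertex to settle: %d at distance %.1f\n",
               global_min.vertex, global_min.dist);

    MPI_Finalize();
    return 0;
}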
🔹 Odd-Even Sort (MPI)
Even phase: sort pairs (0,1), (2,3)...
Odd phase: sort pairs (1,2), (3,4)...
Use MPI_Sendrecv to swap values.
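A toy sketch with one value per process: in each phase a process exchanges its value with a partner via MPI_Sendrecv, then the lower rank keeps the smaller value and the higher rank the larger (initial values are deliberately in reverse order):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int value = size - rank;          /* reverse order: rank 0 holds the largest */

    for (int phase = 0; phase < size; phase++) {
        /* even phase pairs (0,1),(2,3)...; odd phase pairs (1,2),(3,4)... */
        int partner = (phase % 2 == rank % 2) ? rank + 1 : rank - 1;
        if (partner < 0 || partner >= size) continue;

        int other;
        MPI_Sendrecv(&value, 1, MPI_INT, partner, 0,
                     &other,  1, MPI_INT, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        if (rank < partner) value = (value < other) ? value : other;
        else                value = (value > other) ? value : other;
    }

    printf("rank %d holds %d\n", rank, value);   /* sorted: rank i holds i + 1 */
    MPI_Finalize();
    return 0;
}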
🔹 BFS (OpenMP)
Traverse graph by levels.
Use parallel for to explore neighbors.
Use critical/atomic to update next level safely.
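A compact level-synchronous sketch on a made-up 6-vertex graph; appends to the next-level queue are protected with a critical section (per-thread buffers would scale better, but this is the simplest safe version):

#include <stdio.h>

#define N 6   /* vertices in the toy graph */

int main(void) {
    /* adjacency matrix of a small undirected graph, invented for the example */
    int adj[N][N] = {
        {0,1,1,0,0,0},
        {1,0,0,1,0,0},
        {1,0,0,1,1,0},
        {0,1,1,0,0,1},
        {0,0,1,0,0,1},
        {0,0,0,1,1,0},
    };
    int dist[N], frontier[N], next[N];
    for (int i = 0; i < N; i++) dist[i] = -1;

    frontier[0] = 0;          /* start BFS from vertex 0 */
    dist[0] = 0;
    int fsize = 1, level = 0;

    while (fsize > 0) {
        int nsize = 0;
        #pragma omp parallel for
        for (int f = 0; f < fsize; f++) {
            int u = frontier[f];
            for (int v = 0; v < N; v++) {
                if (adj[u][v]) {
                    #pragma omp critical   /* protect the visited check and the append */
                    {
                        if (dist[v] == -1) {
                            dist[v] = level + 1;
                            next[nsize++] = v;
                        }
                    }
                }
            }
        }
        for (int i = 0; i < nsize; i++) frontier[i] = next[i];
        fsize = nsize;
        level++;
    }

    for (int i = 0; i < N; i++) printf("dist[%d] = %d\n", i, dist[i]);
    return 0;
}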
🔹 Histogram Sorting
Count frequencies (e.g., in images).
Problem: race condition if threads update same bin.
Fix: use critical, atomic, or locks.
🔹 Block Matrix Multiplication (MPI)
Divide matrices into smaller blocks.
Each process computes a block.
Time: O(n³/p) + communication cost
🔹 Fork-Join Model
Fork: Start parallel threads.
Join: Wait for threads to finish.
🔹 PRAM Types
EREW: Exclusive read, exclusive write
CREW: Concurrent read, exclusive write
ERCW: Exclusive read, concurrent write
CRCW: Concurrent read, concurrent write
🔹 K-d Mesh & Hypercube
K-d Mesh: k-dimensional grid network; each node has up to 2 neighbors per dimension
Hypercube: 2^d nodes; two nodes are linked when their binary labels differ in exactly one bit