
# Performance Metrics

Outline: Performance Metrics

- Introduction
- Running Time
- Speedup
- Efficiency
- Communication Overheads
- Scalability

**What are Performance Metrics?**

Performance metrics are tools that help programmers reduce the running time of their applications. They are used to:

- determine the best parallel algorithm,
- evaluate hardware platforms, and
- examine the benefits gained from parallelism.

These metrics have been developed to aid in the performance debugging of parallel programs.

**Metric 1: Running Time**

(Run-time is the dominant metric)

Running time is the execution time of a program. It measures how long the parallel program needs to run to solve our problem, so it is a direct measure of the program's effectiveness. The running time is the amount of time consumed in executing an algorithm for a given input on an n-processor parallel computer. It is denoted by T(n), where n signifies the number of processors employed; if n = 1, the machine is a sequential computer.
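The definition above can be sketched in code. This is a minimal illustration, assuming a wall-clock timer stands in for the measured running time T(n); the helper `running_time` and the sleep-based workload are inventions for this sketch, not part of the text.

```python
import time

def running_time(fn, *args):
    """Return the wall-clock execution time of one call to fn."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

# Illustrative workload: a task that takes about 50 ms.
t = running_time(time.sleep, 0.05)
print(f"T = {t:.3f} s")
```

On a real parallel machine one would time the same program for each processor count n to obtain the T(n) curve discussed below.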

[Figure: the relation between execution time and number of processors]

It can be seen from the graph that as the number of processors increases, the execution time initially decreases, but beyond a certain optimum level it starts to increase again. This discrepancy is caused by the overheads involved in increasing the number of processors.

**Metric 2: Speedup**

To find out how much better our program does on the parallel machine compared to a program running on only one processor, taking the ratio of the two running times is the natural solution. This measure is called the speedup. Speedup is the ratio of the time required to execute a given program using a specific algorithm on a machine with a single processor, T(1), to the time required to execute the same program using that algorithm on a machine with n processors, T(n):

S(n) = T(1) / T(n)

Note that T(1) signifies the time taken to execute the program using the best sequential algorithm, i.e., the algorithm with the least time complexity. Speedup measures the performance gain and relative benefit achieved by parallelizing a given application over its sequential implementation.
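The ratio is simple enough to compute directly. The measured times below (12 s sequentially, 3 s on 8 processors) are made-up illustrative numbers, not data from the text.

```python
def speedup(t1, tn):
    """S(n) = T(1) / T(n): best sequential time over n-processor time."""
    return t1 / tn

# A program taking 12 s sequentially and 3 s on 8 processors:
print(speedup(12.0, 3.0))  # 4.0: the parallel version runs 4x faster
```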


**Graph Showing Speedup**

Assumption: the p PEs used are assumed to be identical to the one used by the sequential algorithm. [Graph legend: superlinear, linear, sublinear]

[Figure: resultant speedups vs. number of processors required]

Linear speedup: S(p) = p
Super-linear speedup: S(p) > p
Sub-linear speedup: S(p) < p

**Definitions of Speedup**

**Linear speedup:** Implicitly means a 1-to-1 speedup per processor.

**Sub-linear speedup:** Speedup(n) is less than n, reflecting the fact that not all parts of a program benefit from parallel execution. This is the more common case, due to the overheads of startup, synchronization, communication, etc.

**Super-linear speedup:** In rare situations, Speedup(n) is larger than n. This is called super-linear speedup and means the program has been sped up by more than the increase in CPUs.

**Causes of Super-linear Speedup**

A super-linear speedup comes about because each CPU now works on a smaller set of memory. The problem data handled by any one CPU fits better in cache, so each CPU executes faster than the single CPU could. Other reasons include:

- reduced overhead,
- hidden latency,
- mathematical inefficiency of the serial algorithm, and
- higher memory-access cost in sequential processing.

**Metric 3: Efficiency (Throughput)**

Efficiency measures the fraction of time for which a PE is usefully employed. It is defined as the ratio of the speedup to the number of PEs:

E = S(p) / p

In an ideal parallel system, speedup equals p and efficiency equals one. In practice, speedup is less than p and efficiency lies between zero and one.
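As a quick numeric sketch of this ratio (the speedup of 6 on 8 PEs is an illustrative number, not from the text):

```python
def efficiency(s, p):
    """E = S(p) / p: fraction of time each PE is usefully employed."""
    return s / p

print(efficiency(6.0, 8))  # 0.75: each PE does useful work 75% of the time
print(efficiency(8.0, 8))  # 1.0: the ideal case, speedup equal to p
```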

**Metric 4: Communication Overhead**

Communication overhead is the total time spent by all the PEs over and above that required by the fastest known sequential algorithm to solve the same problem on a single PE. It is represented by an overhead function T_o. The total time spent in solving a problem using p PEs is pT_p, the time spent performing useful work is T_s, and the remainder is the overhead:

T_o = pT_p - T_s
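The overhead formula can be checked with a small computation. The timings below (8 PEs busy for 3 s each, 12 s of sequential work) are illustrative numbers assumed for the sketch.

```python
def overhead(p, tp, ts):
    """T_o = p*T_p - T_s: total time the p PEs spend beyond the useful work."""
    return p * tp - ts

# 8 PEs taking 3 s each to do 12 s of sequential work:
print(overhead(8, 3.0, 12.0))  # 12.0 s of total overhead
# An ideal system with zero overhead finishes in T_s / p = 1.5 s per PE:
print(overhead(8, 1.5, 12.0))  # 0.0
```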

**Factors Causing Parallel Overheads**

**Uneven load distribution:** In a parallel computer, the problem is split into sub-problems that are assigned to various processors for computation. Sometimes the sub-problems are not distributed fairly among the processors, causing a load imbalance between them. This degrades the overall performance of the parallel computer.

**Cost of inter-processor communication:** Since the data is distributed across multiple processors while a parallel algorithm executes, the processors may need to interact with other processes, requiring inter-processor communication. There is therefore a cost involved in transferring data between processors, and this incurs an overhead.


**Synchronization:** Multiple processors require synchronization with each other while executing a parallel algorithm. That is, the task running on processor X might have to wait for the result of a task executing on processor Y. Therefore, a delay is involved in completing the whole task distributed among K processors.

**Parallel balance point:** To execute a parallel algorithm on a parallel computer, some number K of processors is required. As we know, execution time decreases as the number of processors increases. However, when the input size is fixed and we keep increasing the number of processors, after some point the execution time starts increasing. This is because of the overheads encountered in the parallel system; the optimum processor count is the parallel balance point.

**Metric 5: Scalability of Parallel Systems**

Scalability is the ability to maintain the performance gain when both system size and problem size increase. The total overhead function T_o is a function of both the problem size and the number of processing elements p. In many cases, T_o grows sub-linearly with respect to T_s; in such cases, efficiency increases if the problem size is increased while keeping the number of PEs constant. One can then simultaneously increase the problem size and the number of processors so as to keep the efficiency constant. Such systems are called scalable parallel systems.
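The idea can be illustrated numerically. This sketch assumes a purely hypothetical overhead function T_o(W, p) = p * sqrt(W), where W stands for the problem size (taken equal to T_s); the efficiency is then rewritten as E = 1 / (1 + T_o / T_s).

```python
import math

def eff(W, p):
    """Efficiency E = 1 / (1 + T_o / T_s) under an assumed overhead model."""
    To = p * math.sqrt(W)        # hypothetical overhead function T_o(W, p)
    return 1.0 / (1.0 + To / W)  # with T_s = W

# Growing the problem size with p (here W proportional to p^2)
# keeps the efficiency constant, the signature of a scalable system:
print(eff(100, 10))   # 0.5
print(eff(400, 20))   # 0.5
# Adding processors at fixed problem size lowers efficiency:
print(eff(100, 20))   # about 0.33
```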