Optimizing Code with Loop Transformations

The document discusses three techniques for improving program performance by increasing parallelism in fully permutable loops: 1. Exploiting fully permutable loops allows loops to execute in parallel by creating a loop nest transform from independent solutions to time-partition constraints. 2. Wavefronting partitions loop computation using an index variable that is a combination of all permutable loop indices, grouping iterations along diagonals for parallel execution. 3. Blocking aggregates loop iterations into blocks that can be assigned to processors, enhancing data locality and reducing pipelining overhead.

Code Transformations

Exploiting Fully Permutable Loops:

Exploiting fully permutable loops is a technique for improving
the performance of a program by increasing parallelism. A set of
loops is fully permutable when the loops can be interchanged in
any order without violating a data dependence. A nest with this
property can be pipelined, wavefronted, or blocked so that its
iterations run in parallel.

The technique involves creating a loop nest with k outermost
fully permutable loops from k independent solutions to the
time-partition constraints. This is done by making the kth solution
the kth row of the new transform. Once the affine transform is
created, an algorithm can be used to generate the code.
• The solutions found in the SOR (Successive Over-Relaxation)
example were [1 0] and [1 1]. Making the first solution the first
row and the second solution the second row gives the transform
with rows [1 0] and [1 1]. Making the second solution the first
row instead gives the transform with rows [1 1] and [1 0].
• This technique is useful because it allows the program to take
advantage of the parallelism present in the loop nest, which can
lead to a significant increase in performance.
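As a minimal sketch of what the two transforms do to a loop nest, the C code below applies each of them to a SOR-style three-point stencil. The stencil statement, the array `x`, the bounds `m` and `n`, and the function names `sor_original`, `sor_skewed`, and `sor_swapped` are all assumptions made for illustration; the slides do not give the SOR code itself. With rows [1 0] and [1 1] the new indices are p = i and q = i + j; with rows [1 1] and [1 0] they are p = i + j and q = i.

```c
/* Assumed SOR-style stencil, used only for illustration.
   x must hold at least n + 3 elements. */
void sor_original(double *x, int m, int n) {
    for (int i = 0; i <= m; i++)
        for (int j = 0; j <= n; j++)
            x[j + 1] = (x[j] + x[j + 1] + x[j + 2]) / 3.0;
}

/* Transform with rows [1 0] and [1 1]: p = i, q = i + j. */
void sor_skewed(double *x, int m, int n) {
    for (int p = 0; p <= m; p++)
        for (int q = p; q <= p + n; q++) {
            int j = q - p;               /* recover the original j */
            x[j + 1] = (x[j] + x[j + 1] + x[j + 2]) / 3.0;
        }
}

/* Transform with rows [1 1] and [1 0]: p = i + j, q = i. */
void sor_swapped(double *x, int m, int n) {
    for (int p = 0; p <= m + n; p++) {
        int lo = (p - n > 0) ? p - n : 0;    /* keeps j = p - q <= n */
        int hi = (p < m) ? p : m;            /* keeps j = p - q >= 0 */
        for (int q = lo; q <= hi; q++) {
            int j = p - q;               /* recover the original j */
            x[j + 1] = (x[j] + x[j + 1] + x[j + 2]) / 3.0;
        }
    }
}
```

All three functions execute the same set of iterations and respect the same dependences for this assumed stencil, so they leave identical values in `x`; only the order in which the iterations are visited differs.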
Wavefronting:

• It is also easy to generate k - 1 inner parallelizable loops from a loop
nest with k outermost fully permutable loops. Although pipelining is
preferable, we include this information here for completeness.

• We partition the computation of a loop nest with k outermost fully
permutable loops using a new index variable i', where i' is defined to be
some combination of all the indices in the k-deep permutable loop nest,
for example the sum of the k indices.

• We create an outermost sequential loop that iterates through the i'
partitions in increasing order; the computation nested within each
partition is ordered as before. The first k - 1 loops within each partition
are guaranteed to be parallelizable. Intuitively, given a two-dimensional
iteration space, this transform groups iterations along 135° diagonals as
one execution of the outermost loop. This strategy guarantees that the
iterations within each iteration of the outermost loop have no data
dependence.
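The following C sketch illustrates wavefronting on a two-deep nest. The stencil (each cell is the sum of its upper and left neighbours), the flat array `a`, and the function names `stencil_sequential` and `stencil_wavefront` are assumptions chosen for illustration, not code from the slides. The new outer index is d = i + j, so each value of d selects one 135° diagonal.

```c
/* Reference order: a is an n-by-n array stored row by row.
   Cell (i, j) depends on (i - 1, j) and (i, j - 1). */
void stencil_sequential(long *a, int n) {
    for (int i = 1; i < n; i++)
        for (int j = 1; j < n; j++)
            a[i * n + j] = a[(i - 1) * n + j] + a[i * n + (j - 1)];
}

/* Wavefront order: the outer loop walks the diagonals d = i + j
   sequentially; the inner loop visits the cells of one diagonal. */
void stencil_wavefront(long *a, int n) {
    for (int d = 2; d <= 2 * (n - 1); d++) {
        int lo = (d - (n - 1) > 1) ? d - (n - 1) : 1;  /* keeps j <= n - 1 */
        int hi = (d - 1 < n - 1) ? d - 1 : n - 1;      /* keeps j >= 1 */
        /* Cells on one diagonal read only cells from diagonal d - 1,
           so these iterations are independent of each other. */
        for (int i = lo; i <= hi; i++) {
            int j = d - i;
            a[i * n + j] = a[(i - 1) * n + j] + a[i * n + (j - 1)];
        }
    }
}
```

Because the inner loop carries no dependence, its iterations could be distributed across processors, with a synchronization between consecutive values of d.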
Blocking:

• A k-deep, fully permutable loop nest can be blocked in k dimensions.
Instead of assigning the iterations to processors based on the value of the
outer or inner loop indices, we can aggregate blocks of iterations into one
unit. Blocking is useful for enhancing data locality as well as for
minimizing the overhead of pipelining.
• Before: a simple loop nest.

    for (i = 0; i < n; i++)
        for (j = 1; j < n; j++) {
            <S>
        }

• After: a blocked version of this loop nest, with block size b.

    for (ii = 0; ii < n; ii += b)
        for (jj = 1; jj < n; jj += b)
            for (i = ii; i < min(ii + b, n); i++)
                for (j = jj; j < min(jj + b, n); j++) {
                    <S>
                }
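A runnable C sketch of a blocked two-deep nest follows. It assumes that `ii` and `jj` hold the first index of each block and advance by the block size `b`, and it replaces the statement <S> with a visit counter so that the coverage of the iteration space can be checked. The helper `min_int`, the function `visit_blocked`, and the array `count` are hypothetical names introduced for this illustration.

```c
/* Smaller of two ints; C has no built-in min for integers. */
static int min_int(int a, int b) {
    return a < b ? a : b;
}

/* Visits every (i, j) with 0 <= i < n and 1 <= j < n, one block of
   at most b-by-b iterations at a time. count is an n-by-n array
   stored row by row; the increment stands in for <S>. */
void visit_blocked(int *count, int n, int b) {
    for (int ii = 0; ii < n; ii += b)              /* first row of block */
        for (int jj = 1; jj < n; jj += b)          /* first column of block */
            for (int i = ii; i < min_int(ii + b, n); i++)
                for (int j = jj; j < min_int(jj + b, n); j++)
                    count[i * n + j] += 1;
}
```

The calls to `min_int` clip the last block in each dimension when `n` is not a multiple of `b`, so each iteration of the original nest runs exactly once.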
