Welcome to Scribd!

Lecture 27

Uploaded by

0% found this document useful (0 votes)

3 views14 pages

The document discusses several topics related to data level parallelism and vector processing architectures: 1) Memory systems for vector processors use multiple independent memory banks that allow multiple loads or stores per clock cycle to support high bandwidth needs for vector operations. 2) Vector stride addressing allows gathering non-sequential memory locations into vectors using unit and non-unit strides, but non-unit strides can cause bank conflicts that degrade performance. 3) Sparse matrices are also important for vector processing and use indirect addressing to access only non-zero elements in a compact sparse storage format.

Original Description:

Original Title

Lecture_27

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

3 views14 pages

Lecture 27

Uploaded by

Udai Valluru

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 14

Search inside document

CKV

Advanced VLSI Architecture

MEL G624

Lecture 27: Data Level Parallelism

CKV
Memory Banks

The start‐up time for a load/store vector unit is the time

to get the first word from memory into a register.

Penalties for start‐ups on load/

store units are higher than those for arithmetic unit

Memory system must be designed to support high bandwi

dth for vector loads and stores
CKV
Memory Banks
Vector processor use memory banks which allow multiple
independent access rather than memory interleaving
mainly to:
Support multiple loads or stores per clock. Be able to control
the addresses to the banks independently

Support (multiple) non‐sequential loads or stores

Support multiple processors sharing the same memory syste

m, processor will be generating its own independent stream
of addresses
CKV
Memory Banks
Example: Cray T90

32 processors, each generating 4 loads and 2 stores per

cycle

Processor cycle time is 2.167ns

SRAM cycle time is 15ns

How many memory banks needed?

CKV
Handling Multidimensional Arrays

for (i = 0; i < 100; i=i+1)

for (j = 0; j < 100; j=j+1) {
A[i][j] = 0.0;
for (k = 0; k < 100; k=k+1)
A[i][j] = A[i][j] + B[i][k] * D[k][j];
}

Two dimensional arrays stored in Memory in

Row-Major order (as in C)
Column-Major order (as in Fortran)
CKV
Handling Multidimensional Arrays

for (i = 0; i < 100; i=i+1)

for (j = 0; j < 100; j=j+1) {
A[i][j] = 0.0;
for (k = 0; k < 100; k=k+1)
A[i][j] = A[i][j] + B[i][k] * D[k][j];
}
CKV
(Unit/Non‐Unit) Stride Addressing
The distance separating elements to be gathered into a si
ngle register is called stride.

In the example:
Matrix D has a stride of 100 double words
Matrix B has a stride of 1 double word

Use non‐unit stride for D

To access non‐
sequential memory location and to reshape them into a
dense structure
CKV
(Unit/Non‐Unit) Stride Addressing
The Size of the matrix may not be known at compile time

Use LVWS/SVWS: load/store vector with stride

LVWS V1, (R1, R2) Load V1 from address at R1 and stride in R2

SVWS (R1, R2), V1 Store V1 to address at R1 and stride in R2

The vector stride, like the vector starting address, can be

put in a general‐purpose register (dynamic)
CKV
Problems in Stride Addressing
Once we introduce non‐unit strides, it becomes
possible to request accesses from the same bank
frequently

Memory bank conflict !!

Bank conflict (stall) occurs when the same bank is hit

faster than bank busy time:

#banks / GCD(stride, #banks) < bank busy time

CKV
Problems in Stride Addressing
8 memory banks, Bank busy time: 6 cycles, Total memory
latency: 12 cycles.

What is the difference between a 64‐element vector load

with a stride of 1 and 32?
CKV
Handling Sparse Matrices
Sparse matrices in vector mode is a necessity

Example: Consider a sparse vector sum on arrays A and C

for (i = 0; i < n; i=i+1)

A[K[i]] = A[K[i]] + C[M[i]];

where K and M and index vectors to designate the nonzero

elements of A and C

Sparse matrix elements stored in a compact form and acce

ssed indirectly
CKV
Handling Sparse Matrices
Example: Consider a sparse vector sum on arrays A and C

for (i = 0; i < n; i=i+1)

A[K[i]] = A[K[i]] + C[M[i]];
Ra, Rc, Rk and Rm the starting addresses of vectors

LV Vk, Rk ;load K
LVI Va, (Ra+Vk) ;load A[K[]] Gather
LV Vm, Rm ;load M
LVI Vc, (Rc+Vm) ;load C[M[]] Gather
ADDVV.D Va, Va, Vc ;add them
SVI (Ra+Vk), Va ;store A[K[]] Scatter
CKV
Programming Vector Architectures
Compilers can provide feedback to programmers
Programmers can provide hints to compiler
Cray Y‐MP Benchmarks
CKV

Thank You for Attending

SS448 Manual
Document8 pages
SS448 Manual
kiko
No ratings yet
Lec ltm1160 5 1
Document475 pages
Lec ltm1160 5 1
mohamed Abd Elhafeez
No ratings yet
Guide To Resistors
Document34 pages
Guide To Resistors
Viet Anh Nguyen
No ratings yet
SYNTHESIS-sparc by Rajesh
Document20 pages
SYNTHESIS-sparc by Rajesh
Rajesh Perla
No ratings yet
Verilog Interview Questions
Document7 pages
Verilog Interview Questions
Vishal K Srivastava
No ratings yet
Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation
From Everand
Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation
Bruce Dang
No ratings yet
PD Interview Questions and Answers
Document32 pages
PD Interview Questions and Answers
venkata ratnam
No ratings yet
Grammaire Progressive Du Français - Niveau Intermédiaire - Livre + Corrigés
Document20 pages
Grammaire Progressive Du Français - Niveau Intermédiaire - Livre + Corrigés
Tiago Jeronimo Lopes
0% (1)
VLSI Interview Questions
Document7 pages
VLSI Interview Questions
Vlsi Guru
No ratings yet
Transistor Analysis
Document3 pages
Transistor Analysis
Nazir Kazimi
No ratings yet
ReachStackers 42-45 T Kalmar
Document12 pages
ReachStackers 42-45 T Kalmar
Pavlovstefan6187
50% (4)
Camry PDF
Document102 pages
Camry PDF
mihai_1957
No ratings yet
Kronecker Products and Matrix Calculus with Applications
From Everand
Kronecker Products and Matrix Calculus with Applications
Alexander Graham
No ratings yet
How To Write FSM Is Verilog
Document8 pages
How To Write FSM Is Verilog
Ayaz Mohammed
No ratings yet
Advanced VLSI Architecture: Lecture 26: Data Level Parallelism
Document15 pages
Advanced VLSI Architecture: Lecture 26: Data Level Parallelism
Udai Valluru
No ratings yet
Unit Iii Data-Level Parallelism in Vector, Simd, and Gpu Architectures
Document26 pages
Unit Iii Data-Level Parallelism in Vector, Simd, and Gpu Architectures
Shweta
No ratings yet
Advanced VLSI Architecture: Lecture 23 - 24: Data Level Parallelism
Document27 pages
Advanced VLSI Architecture: Lecture 23 - 24: Data Level Parallelism
Udai Valluru
No ratings yet
CH 04. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
Document50 pages
CH 04. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
Faheem Khan
No ratings yet
7-VECTOR PROCESSING-04-Jan-2020Material - I - 04-Jan-2020 - VECTOR - PROCESSING PDF
Document31 pages
7-VECTOR PROCESSING-04-Jan-2020Material - I - 04-Jan-2020 - VECTOR - PROCESSING PDF
ANTHONY NIKHIL REDDY
No ratings yet
Advanced VLSI Architecture: Lecture 25: Data Level Parallelism
Document19 pages
Advanced VLSI Architecture: Lecture 25: Data Level Parallelism
Udai Valluru
No ratings yet
Module 1.6
Document53 pages
Module 1.6
project mission
No ratings yet
SDSL Cheatsheet
Document2 pages
SDSL Cheatsheet
felipelouza
No ratings yet
Appendix Ii. Image Processing Using Opencv Library
Document11 pages
Appendix Ii. Image Processing Using Opencv Library
vlad
No ratings yet
Indian Institute of Technology Guwahati: Department of Computer Science and Engineering
Document5 pages
Indian Institute of Technology Guwahati: Department of Computer Science and Engineering
PalashAhuja
No ratings yet
CS7103 - MultiCore Architecture Ppts Unit-II
Document43 pages
CS7103 - MultiCore Architecture Ppts Unit-II
Surya
No ratings yet
Module 5
Document37 pages
Module 5
Pavithra
No ratings yet
CS6461 - Computer Architecture Fall 2016 - Vector Operations
Document47 pages
CS6461 - Computer Architecture Fall 2016 - Vector Operations
闫麟阁
No ratings yet
Data-Level Parallelism in Vector, SIMD, and GPU Architectures
Document58 pages
Data-Level Parallelism in Vector, SIMD, and GPU Architectures
Dhananjai Yadav
No ratings yet
Lecture 28
Document22 pages
Lecture 28
Udai Valluru
No ratings yet
PS3 Programming Basics: Week 1. SIMD Programming On PPE Materials Are Adapted From The Textbook
Document37 pages
PS3 Programming Basics: Week 1. SIMD Programming On PPE Materials Are Adapted From The Textbook
jayantbildani
No ratings yet
Lecture 20: Data Level Parallelism - Introduction and Vector Architecture
Document47 pages
Lecture 20: Data Level Parallelism - Introduction and Vector Architecture
Phani Kumar
No ratings yet
Unit 2 - Chapter 2
Document29 pages
Unit 2 - Chapter 2
Pallavi Bharti
No ratings yet
Integrated Design of AES (Advanced Encryption Standard) Encrypter and Decrypter
Document9 pages
Integrated Design of AES (Advanced Encryption Standard) Encrypter and Decrypter
uptcalc2007
No ratings yet
Introduction To Verilog
Document43 pages
Introduction To Verilog
Shashwat Shrivastava
No ratings yet
CZ1102 Computing & Problem Solving Lecture 10
Document24 pages
CZ1102 Computing & Problem Solving Lecture 10
Charmaine Chu
No ratings yet
Comparison of Convolutional Codes With Block Codes
Document5 pages
Comparison of Convolutional Codes With Block Codes
Eminent Aymee
No ratings yet
Fundamentals of Parallel Processing: By: Solomon S
Document35 pages
Fundamentals of Parallel Processing: By: Solomon S
Amanuel
No ratings yet
Computer Architecture CT 2 Paper Solution: K M K M K M K M T T) S (M
Document12 pages
Computer Architecture CT 2 Paper Solution: K M K M K M K M T T) S (M
Nishant Agarwal
No ratings yet
Towards Automating Cryptographic Hardware Implementations: A Case Study of HQC
Document6 pages
Towards Automating Cryptographic Hardware Implementations: A Case Study of HQC
Ajay Rodge
No ratings yet
Verilog Interview Questions & Answers For FPGA & ASIC
Document5 pages
Verilog Interview Questions & Answers For FPGA & ASIC
prodip7
No ratings yet
S, R Are All STD - Logic
Document7 pages
S, R Are All STD - Logic
Muhammad Moin
100% (1)
Blocked Matrix Multiply
Document6 pages
Blocked Matrix Multiply
Deivid CArpio
No ratings yet
Towards Automating Cryptographic Hardware Implementations: A Case Study of HQC
Document15 pages
Towards Automating Cryptographic Hardware Implementations: A Case Study of HQC
Ajay Rodge
No ratings yet
NB:-Write The Answers in Your Own Way and Do Not Copy From Other
Document2 pages
NB:-Write The Answers in Your Own Way and Do Not Copy From Other
SayanMaiti
No ratings yet
Real-Time VLSF Architecture For Video Compression: Fatemi and Panchanathan
Document4 pages
Real-Time VLSF Architecture For Video Compression: Fatemi and Panchanathan
कर्म सिंह मलिक
No ratings yet
B2 Ece2003 50333 50077 50428
Document2 pages
B2 Ece2003 50333 50077 50428
Nandu
No ratings yet
CA 13 VectorProcessors
Document16 pages
CA 13 VectorProcessors
Caroline Sam
No ratings yet
G Nageshwara Reddy - 13MVD1036
Document8 pages
G Nageshwara Reddy - 13MVD1036
Nagesh Reddy
No ratings yet
Unit Iii - Aca
Document13 pages
Unit Iii - Aca
Anitha Denis
No ratings yet
Advanced Computer Architecture
Document5 pages
Advanced Computer Architecture
Pranav JI
No ratings yet
Vector Quantization: Data Compression and Data Retrieval
Document39 pages
Vector Quantization: Data Compression and Data Retrieval
Pooja Bharti
No ratings yet
CA Classes-236-240
Document5 pages
CA Classes-236-240
SrinivasaRao
No ratings yet
CA Classes-201-205
Document5 pages
CA Classes-201-205
SrinivasaRao
No ratings yet
APCCAS2008 MGomes
Document4 pages
APCCAS2008 MGomes
ldpc
No ratings yet
Architecture Chapter4 E5 2012
Document92 pages
Architecture Chapter4 E5 2012
EAGLEFORCE06
No ratings yet
A Unified Approach To The Construction of Binary
Document10 pages
A Unified Approach To The Construction of Binary
chaudhryadnanaslam3799
No ratings yet
Synthesizable Verilog Coding: For RTL Design
Document93 pages
Synthesizable Verilog Coding: For RTL Design
Gabriel Sales Veloso
No ratings yet
Major Field Test in Computer Science Sample Questions
Document5 pages
Major Field Test in Computer Science Sample Questions
dc5k20na
No ratings yet
Syllabus Topic: - Vector Processing - Vector Processor
Document14 pages
Syllabus Topic: - Vector Processing - Vector Processor
Ananthu RK
No ratings yet
Computer Architecture AllClasses-Outline-199-294
Document96 pages
Computer Architecture AllClasses-Outline-199-294
SrinivasaRao
No ratings yet
Unit-5: Cocurrent Processors: Vector & Multiple Instruction Issue Processors
Document106 pages
Unit-5: Cocurrent Processors: Vector & Multiple Instruction Issue Processors
Jitender Garg
No ratings yet
A Walk Through Some Openfoam Code: Vector: Håkan Nilsson 1
Document11 pages
A Walk Through Some Openfoam Code: Vector: Håkan Nilsson 1
dfi49965
No ratings yet
Design and Implementation of Efficient Advanced Encryption Standard Composite S-Box With CM-Mode
Document9 pages
Design and Implementation of Efficient Advanced Encryption Standard Composite S-Box With CM-Mode
gunasekaran k
No ratings yet
Ece201 Verilog Lecture 1
Document28 pages
Ece201 Verilog Lecture 1
Marize Nepomuceno
No ratings yet
Q1
Document3 pages
Q1
anshubankofindia
No ratings yet
Native Shader Compilation With LLVM PDF
Document37 pages
Native Shader Compilation With LLVM PDF
thi minh phuong nguyen
No ratings yet
Sol 2003 Networks
Document15 pages
Sol 2003 Networks
sumanchakravarthi
No ratings yet
Reading Assignment1
Document15 pages
Reading Assignment1
Udai Valluru
No ratings yet
Lecture 30 GPU Programming Loop Parallelism
Document16 pages
Lecture 30 GPU Programming Loop Parallelism
Udai Valluru
No ratings yet
FULLTEXT01
Document94 pages
FULLTEXT01
Udai Valluru
No ratings yet
Tu0403switcherki2009isic 140305063913 Phpapp02
Document40 pages
Tu0403switcherki2009isic 140305063913 Phpapp02
Udai Valluru
No ratings yet
Tu0401switcherki2009isic 140121090158 Phpapp01
Document128 pages
Tu0401switcherki2009isic 140121090158 Phpapp01
Udai Valluru
No ratings yet
Tu0404bgrqpldrki2009isic 140121090507 Phpapp02
Document146 pages
Tu0404bgrqpldrki2009isic 140121090507 Phpapp02
Udai Valluru
No ratings yet
Lecture 17
Document20 pages
Lecture 17
Udai Valluru
No ratings yet
Lecture 20
Document28 pages
Lecture 20
Udai Valluru
No ratings yet
Tu0402switcherki2009isic 140305063756 Phpapp01
Document48 pages
Tu0402switcherki2009isic 140305063756 Phpapp01
Udai Valluru
No ratings yet
Lecture 28
Document22 pages
Lecture 28
Udai Valluru
No ratings yet
Lecture 18
Document38 pages
Lecture 18
Udai Valluru
No ratings yet
Advanced VLSI Architecture: Lecture 7: Memory Hierarchy
Document20 pages
Advanced VLSI Architecture: Lecture 7: Memory Hierarchy
Udai Valluru
No ratings yet
Advanced VLSI Architecture: Lecture 8: Memory Hierarchy
Document35 pages
Advanced VLSI Architecture: Lecture 8: Memory Hierarchy
Udai Valluru
No ratings yet
Lecture 22
Document29 pages
Lecture 22
Udai Valluru
No ratings yet
Advanced VLSI Architecture: Lecture 25: Data Level Parallelism
Document19 pages
Advanced VLSI Architecture: Lecture 25: Data Level Parallelism
Udai Valluru
No ratings yet
(MEL G621) : Lecture 1: Introduction
Document8 pages
(MEL G621) : Lecture 1: Introduction
Udai Valluru
No ratings yet
9.ion Implantation
Document18 pages
9.ion Implantation
Udai Valluru
No ratings yet
Lecture 21
Document25 pages
Lecture 21
Udai Valluru
No ratings yet
MEL G611: Ic Fabrication Technology: Section X: Annealing
Document12 pages
MEL G611: Ic Fabrication Technology: Section X: Annealing
Udai Valluru
No ratings yet
Sec VII, L 18-21: TFD: MEL G611: Ic Fabrication Technology
Document22 pages
Sec VII, L 18-21: TFD: MEL G611: Ic Fabrication Technology
Udai Valluru
No ratings yet
Sec X, L 30 & 31: Annealing: MEL G611: Ic Fabrication Technology
Document17 pages
Sec X, L 30 & 31: Annealing: MEL G611: Ic Fabrication Technology
Udai Valluru
No ratings yet
Sec VI, L 16-17: Etching: MEL G611: Ic Fabrication Technology
Document31 pages
Sec VI, L 16-17: Etching: MEL G611: Ic Fabrication Technology
Udai Valluru
No ratings yet
ST34-G Meter Technical Manual
Document32 pages
ST34-G Meter Technical Manual
Pedro Andrés Garzon
No ratings yet
Super Quick Charger: Bcg-34Hrmf
Document2 pages
Super Quick Charger: Bcg-34Hrmf
Cuka Araújo
No ratings yet
Thermal Power Plant Training Report (Formal)
Document91 pages
Thermal Power Plant Training Report (Formal)
Gurpreet Singh
No ratings yet
Graphene Fundamentals and Emergent Applications
Document7 pages
Graphene Fundamentals and Emergent Applications
Ponczek
0% (2)
Microgrids Managementgy67
Document13 pages
Microgrids Managementgy67
sj.saravanan9540
No ratings yet
Service DP710
Document309 pages
Service DP710
Chung Kim
No ratings yet
Icom F-121
Document36 pages
Icom F-121
Sylma Costarelli Martin
No ratings yet
Imnpd Bryan
Document13 pages
Imnpd Bryan
Jasmine Kaur Buttar
No ratings yet
DM74LS574 Octal D Flip-Flop With 3-STATE Outputs: General Description
Document4 pages
DM74LS574 Octal D Flip-Flop With 3-STATE Outputs: General Description
Zakaria Zebbiche
No ratings yet
Fuji Frenic-Vg7s
Document50 pages
Fuji Frenic-Vg7s
Mathawee Chotchai
No ratings yet
Rain Sensor (RG11) Interfacing Instructions
Document7 pages
Rain Sensor (RG11) Interfacing Instructions
April
No ratings yet
4PMMNL Abridged User Guide
Document4 pages
4PMMNL Abridged User Guide
w_muangphuan
No ratings yet
D and F Block Elements Chemistry Class Xii
Document3 pages
D and F Block Elements Chemistry Class Xii
Ipshita Pathak
No ratings yet
2 DTNB Tra Vinh Wind Farm 110 KV - 20181214
Document82 pages
2 DTNB Tra Vinh Wind Farm 110 KV - 20181214
nguyenvanmyk410kx
No ratings yet
01 1299134310 2255
Document90 pages
01 1299134310 2255
Lee Kar Huo
No ratings yet
Astribank Ip PBX Brochure
Document2 pages
Astribank Ip PBX Brochure
Ruth Bridger
No ratings yet
I/A Series Hardware: ® Product Specifications
Document16 pages
I/A Series Hardware: ® Product Specifications
Karuzriv Riv
No ratings yet
FDP - Brochure & Schedule
Document2 pages
FDP - Brochure & Schedule
avinash d
No ratings yet
DI13070M 405kW
Document2 pages
DI13070M 405kW
Travis Brown
No ratings yet
Form Kosong Pengukuran Battery
Document7 pages
Form Kosong Pengukuran Battery
muhammad lendi
No ratings yet
R15 Ind780 TM en
Document538 pages
R15 Ind780 TM en
vm8119
No ratings yet
Osmeoisis 2022-09-06 15-33-19PIC - Enh - A - 9
Document39 pages
Osmeoisis 2022-09-06 15-33-19PIC - Enh - A - 9
Tomás Burón
No ratings yet
Canon IR2016 Error Codes
Document7 pages
Canon IR2016 Error Codes
nafees
No ratings yet