
18.335 Fall 2008

Performance Experiments with Matrix Multiplication

Steven G. Johnson

Hardware: 2.66GHz Intel Core 2 Duo

64-bit mode, double precision, gcc 4.1.2

optimized BLAS dgemm: ATLAS 3.6.0

http://math-atlas.sourceforge.net/

A trivial problem?

C = A B
(C is m×p, A is m×n, B is n×p)

the “obvious” C code computes

    Cij = ∑ Aik Bkj   (k = 1 to n),   for i = 1 to m, j = 1 to p

i.e. 2mnp flops (adds + mults):

/* C = A B, where A is m x n, B is n x p,
   and C is m x p, in row-major order */
void matmul(const double *A, const double *B,
            double *C, int m, int n, int p)
{
    int i, j, k;

    for (i = 0; i < m; ++i)
        for (j = 0; j < p; ++j) {
            double sum = 0;
            for (k = 0; k < n; ++k)
                sum += A[i*n + k] * B[k*p + j];
            C[i*p + j] = sum;
        }
}

just three loops, how complicated can it get?


flops/time is not constant!
(square matrices, m = n = p)

[Figure: measured flop rate vs. matrix size for the code above. The rate varies with size; annotated questions on the plot ask whether the drops occur where the L1 cache is exceeded for a single row, where the L1 cache is exceeded, and where the L2 cache is exceeded, and why a 2.66 GHz processor delivers less than 1 gigaflop/s.]
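To reproduce such a plot, one can time matmul() and divide the 2mnp flop count by the elapsed time. The harness below is a minimal sketch, not the benchmark behind the slide: it assumes the matmul() above and a POSIX clock_gettime() timer, and the matmul_gflops() helper name is made up for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void matmul(const double *A, const double *B,
            double *C, int m, int n, int p);  /* the code above */

/* time one n x n x n multiply and return gigaflops/second */
double matmul_gflops(int n)
{
    double *A = malloc((size_t)n * n * sizeof(double));
    double *B = malloc((size_t)n * n * sizeof(double));
    double *C = malloc((size_t)n * n * sizeof(double));
    int i;
    for (i = 0; i < n * n; ++i) { A[i] = 1.0; B[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    matmul(A, B, C, n, n, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec)
                + 1e-9 * (t1.tv_nsec - t0.tv_nsec);

    free(A); free(B); free(C);
    return 2.0 * n * n * n / secs / 1e9;  /* 2mnp flops, m = n = p */
}

int main(void)
{
    int n;
    for (n = 50; n <= 1000; n += 50)
        printf("n = %4d: %.3f gigaflops\n", n, matmul_gflops(n));
    return 0;
}

For stable numbers one would repeat each multiply several times and keep the best timing; even a single run is enough to see the cache transitions.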
Not all “noise” is random

All flops are not created equal

[Figure: flop rate of the optimized BLAS dgemm (ATLAS) vs. the naive code, for the same sizes. The optimized routine runs at nearly the peak theoretical flop rate (2 flops/cycle via SSE2 instructions): the same number of operations and the same abstract algorithm, yet a factor of 10 in speed.]
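For comparison, the optimized dgemm is reached through the standard CBLAS interface, which ATLAS provides (link with, e.g., -lcblas -latlas). The wrapper below is a sketch of such a call: cblas_dgemm() and its row-major convention are standard CBLAS, but the matmul_blas() wrapper name is made up here.

#include <cblas.h>

/* same interface as matmul() above, but calls the optimized BLAS:
   C = 1.0 * A*B + 0.0 * C, all matrices in row-major order */
void matmul_blas(const double *A, const double *B,
                 double *C, int m, int n, int p)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, p, n,       /* rows of C, cols of C, inner dim */
                1.0, A, n,     /* alpha, A, leading dimension n */
                B, p,          /* B, leading dimension p */
                0.0, C, p);    /* beta, C, leading dimension p */
}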
Things to remember

• We cannot understand performance without understanding memory efficiency (caches); see the blocking sketch below.
  – roughly 10 times more important than the arithmetic count
• Computers are more complicated than you think.
• Even a trivial algorithm is nontrivial to implement well.
  – matrix multiplication: 10 lines of code → 130,000+ lines of code (ATLAS)
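One concrete way those 130,000+ lines earn their keep is cache blocking: multiply the matrices tile by tile so that each tile stays resident in cache while it is reused. The version below is a minimal sketch of that idea, not ATLAS's code; the block size BLK is an assumption, to be tuned so that roughly three BLK×BLK tiles fit in the L1 cache.

#define BLK 64  /* tunable: three BLK x BLK tiles should fit in cache */

/* same interface and result as matmul() above, but blocked for cache */
void matmul_blocked(const double *A, const double *B,
                    double *C, int m, int n, int p)
{
    int i, j, k, ii, jj, kk;

    /* zero C once, then accumulate tile products into it */
    for (i = 0; i < m * p; ++i)
        C[i] = 0;

    for (ii = 0; ii < m; ii += BLK)
        for (kk = 0; kk < n; kk += BLK)
            for (jj = 0; jj < p; jj += BLK)
                /* C[ii.., jj..] += A[ii.., kk..] * B[kk.., jj..] */
                for (i = ii; i < ii + BLK && i < m; ++i)
                    for (k = kk; k < kk + BLK && k < n; ++k) {
                        double a = A[i*n + k];
                        for (j = jj; j < jj + BLK && j < p; ++j)
                            C[i*p + j] += a * B[k*p + j];
                    }
}

ATLAS goes much further (register blocking, copying tiles into contiguous buffers, SSE2 kernels, and automated tuning of all these parameters), which is where the rest of the factor of 10 comes from.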


MIT OpenCourseWare
http://ocw.mit.edu

18.335J / 6.337J Introduction to Numerical Methods


Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
