
ACA-15CS72

Assignment-1
Jinkle Pancholi
1AY16CS044
22 October 2019

Staff In Charge: Module Coordinator: HOD:


Problem Scalability

• The problem size corresponds to the data set size. This is the key to achieving scalable performance
as the program granularity changes.
• A problem scalable computer should be able to perform well as the problem size increases.
• The problem size can be scaled to be sufficiently large in order to operate efficiently on a computer with a given granularity.
• Problems such as Monte Carlo simulation and ray tracing are "perfectly parallel", since their threads of computation do not come together over long spells of computation.
• Such independence among threads is very much desired when using a scalable MPP system.
• In general, the problem granularity (operations on a grid point/data required from adjacent grid
points) must be greater than a machine's granularity (node operation rate/node-to-node
communication data rate) in order for a multicomputer to be effective.
• Laplace equations are often used to model physical structures. A 3-D Laplace equation is specified by:

∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z² = 0    (Eq. 3.39)
We want to determine the problem scalability of the Laplace equation solver on a distributed-memory
multicomputer with a sufficiently large number of processing nodes. Based on the finite-difference method,
solving Eq. 3.39 requires performing the following averaging operation iteratively across a very large grid, as
shown in Fig. 3.14:

u(i,j,k)^(m) = [ u(i-1,j,k) + u(i+1,j,k) + u(i,j-1,k) + u(i,j+1,k) + u(i,j,k-1) + u(i,j,k+1) ]^(m-1) / 6

where 1 ≤ i, j, k ≤ N and N is the number of grid points along each dimension. In total, there are N³ grid points in the
problem domain to be evaluated during each iteration m, for 1 ≤ m ≤ M. The three-dimensional domain can be
partitioned into p subdomains, each having n³ grid points, such that pn³ = N³, where p is the machine size. The
computations involved in each subdomain are assigned to one node of the multicomputer. Therefore, in each
iteration, each node is required to perform 7n³ computations as specified in the averaging equation above.
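To make the averaging step concrete, the sketch below performs one Jacobi-style sweep of the six-neighbour average over a single n³ subdomain in Python/NumPy. The array name u, the subdomain size n, and the one-point ghost (halo) layer holding neighbour data are illustrative assumptions, not part of the original text; the text's count of 7n³ operations corresponds to roughly 7 floating-point operations per grid point.

    import numpy as np

    def jacobi_iteration(u):
        """One sweep of the six-neighbour average over a subdomain.

        u is an (n+2) x (n+2) x (n+2) array: the outer layer holds ghost
        (halo) values exchanged with the six neighbouring subdomains, and
        the inner n x n x n block is the data owned by this node.
        """
        # Average the six face neighbours of every interior grid point.
        return (u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1] +
                u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1] +
                u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]) / 6.0

    # Example: a 16^3 subdomain padded with a one-point halo on each side.
    n = 16
    u = np.random.rand(n + 2, n + 2, n + 2)
    u_new = jacobi_iteration(u)          # shape (n, n, n)

On a multicomputer, each node would refresh its halo layer by exchanging the 6n² boundary words with its six neighbours before every such sweep, which is the communication cost analysed below.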
Problem scaling for solving the Laplace equation on a distributed-memory multicomputer:
• Each subdomain is adjacent to six other subdomains.
• Therefore, in each iteration, each node needs to exchange
(send or receive) a total of 6n² words of floating-point
numbers with its neighbours.
• Assume each floating-point number is double-precision (64
bits, or 8 bytes).
• Each processing node has the capability of performing 100
Mflops (or 0.01 µs per floating-point operation).
• The internode communication latency is assumed to be 1 µs
(or 1 Mword/s) for transferring a floating-point number.
• For a balanced multicomputer, the computation time within
each node and inter-node communication latency should be
equal.
• Thus the 0.07n³ µs of computation per iteration equals the
6n² µs of communication latency, implying that n has to be at
least as large as 86 (a worked check of these numbers appears
after this list).
• A node memory of capacity 86³ ≈ 640 Kwords (640K × 8 =
5120 Kbytes ≈ 5 Mbytes) is needed to hold each subdomain of
data.
• On the other hand, suppose each message exchange takes 2
µs (one receive and one send) per word.
• The communication latency is doubled. We desire to scale up
the problem size with an enlarged local memory of 32
megabytes.
• The subdomain dimension size n can be extended to at most
160, because 160³ × 8 bytes ≈ 32 Mbytes.
• This size problem requires 0.3 s of computation time and 2 x
0.15 s of send and receive time.
• Thus each iteration takes 0.6 s (0.3 + 0.3), resulting in a
computation rate of 50 Mflops, which is only 50% of the
peak speed of each node. If the problem size n is further
increased, the collective Mflops rate and efficiency will be
improved.
• But this cannot be achieved unless the memory capacity is
further enlarged. For a fixed memory capacity, the situation
corresponds to the memory-bound region shown in Fig. 3.6c.
• Another risk of problem scaling is that it exacerbates the
limited I/O capability, which is not demonstrated in this example.
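The balance arithmetic in the list above, which is one instance of the problem-granularity versus machine-granularity criterion stated earlier, can be checked directly. The short Python sketch below reproduces the numbers quoted in the text (n ≈ 86, about 5 Mbytes per subdomain, and roughly 50 Mflops per node at n = 160); the variable names are illustrative, not from the original.

    # Machine parameters from the text.
    flop_time  = 0.01e-6     # 100 Mflops -> 0.01 us per floating-point operation
    word_time  = 1e-6        # 1 us (1 Mword/s) to transfer one word
    word_bytes = 8           # double precision (64 bits)

    # Balance condition: 7*n^3 flops * 0.01 us == 6*n^2 words * 1 us.
    n_balanced = 6 * word_time / (7 * flop_time)   # ~85.7, so n >= 86
    mem_bytes  = 86**3 * word_bytes                # ~5 Mbytes per subdomain

    # Scaled case: 2 us per word (send + receive), 32-Mbyte local memory, n = 160.
    n = 160
    t_comp = 7 * n**3 * flop_time                  # ~0.29 s (~0.3 s in the text)
    t_comm = 6 * n**2 * 2 * word_time              # ~0.31 s (2 x 0.15 s in the text)
    t_iter = t_comp + t_comm                       # ~0.6 s per iteration
    mflops_per_node = 7 * n**3 / t_iter / 1e6      # ~48, i.e. roughly 50% of peak

    print(round(n_balanced), round(mem_bytes / 1e6, 1),
          round(t_comp, 2), round(t_comm, 2), round(mflops_per_node))

The computed rate of about 48 Mflops per node agrees with the text's figure of roughly 50 Mflops, i.e. about half of the 100-Mflop peak, because computation and communication each consume about half of every iteration.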
[Figure: a 3-D cubic domain with X, Y, Z axes, extending to rn along each dimension and partitioned into cubic subdomains of side n; corner coordinates such as (rn, rn, rn) mark the domain extent.]