John von Neumann Institute for Computing

Introduction to Parallel Computing

Bernd Mohr

published in

Computational Nanoscience: Do It Yourself!,
J. Grotendorst, S. Blügel, D. Marx (Eds.),
John von Neumann Institute for Computing, Jülich,
NIC Series, Vol. 31, ISBN 3-00-017350-1, pp. 491-505, 2006.

© 2006 by John von Neumann Institute for Computing. Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher mentioned above.

http://www.fz-juelich.de/nic-series/volume31
 
Introduction to Parallel Computing

Bernd Mohr

John von Neumann Institute for Computing
Central Institute for Applied Mathematics
Forschungszentrum Jülich
52425 Jülich, Germany

E-mail: b.mohr@fz-juelich.de
The major parallel programming models for scalable parallel architectures are the message passing model and the shared memory model. This article outlines the main concepts of these models as well as the industry standard programming interfaces MPI and OpenMP. To exploit the potential performance of parallel computers, programs need to be carefully designed and tuned. We will discuss design decisions for good performance as well as programming tools that help the programmer in program tuning.
1 Introduction
Although the performance of sequential computers increases incredibly fast, it is insufficient for a large number of challenging applications. Applications requiring much more performance are numerical simulations in industry and research as well as commercial applications such as query processing, data mining, and multi-media applications. Current hardware architectures offering high performance do not only exploit parallelism on a very fine grain level within a single processor but apply a medium to large number of processors concurrently to a single computation. High-end parallel computers currently (2005) deliver up to 280 Teraflop/s (2.8 × 10^14 floating point operations per second) and are developed and exploited within the ASCI (Accelerated Strategic Computing Initiative) program of the Department of Energy in the USA.

This article concentrates on programming numerical applications on parallel computer architectures introduced in Section 1.1. Parallelization of those applications centers around selecting a decomposition of the data domain onto the processors such that the workload is well balanced and the communication between processors is reduced (Section 1.2) [5]. The parallel implementation is then based on either the message passing or the shared memory model (Section 2). The standard programming interface for the message passing model is MPI (Message Passing Interface) [9–11], offering a complete set of communication routines (Section 3). OpenMP [3, 12] is the standard for directive-based shared memory programming and will be introduced in Section 4.

Since parallel programs exploit multiple threads of control, debugging is even more complicated than for sequential programs. Section 5 outlines the main concepts of parallel debuggers and presents TotalView [13], the most widely available debugger for parallel programs.

Although the domain decomposition is key to good performance on parallel architectures, program efficiency also heavily depends on the implementation of the communication and synchronization required by the parallel algorithms and the implementation techniques chosen for sequential kernels. Optimizing those aspects is very system dependent and thus, an interactive tuning process consisting of measuring performance data and