Processing
Dr. Guy Tel-Zur
Lecture 10
Agenda
• Administration
• Final presentations
• Demos
• Theory
• Next week plan
• Home assignment #4 (last)
Final Projects
• Next Sunday: Groups 1-16 will present
• Next Monday: Groups 17+ will present
• 10-minute presentation per group
• All group members should present
• Send your presentation to gtelzur@gmail.com
by midnight of the previous day
Attendance is mandatory
Final Presentations
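• The division into groups is fixed
• A group that does not present will lose 5 points from its grade
• Rehearse and make sure you stay within the time limit
• The presentation should include: the project name, its goal,
the parallel-computing challenge in the problem, and approaches
to the solution.
• Presentations will not be accepted during class! Be sure
to send them to the lecturer in advance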
The Course Roadmap
[Diagram: Introduction; HPC; HTC; Condor; Message Passing; Shared Memory; Cloud Computing; with "Today" markers]
Advanced Parallel Computing and
Distributed Computing course
• A new course at the department:
Distributed Computing: Advanced Parallel
Processing course + Grid Computing + Cloud
Computing
Course Number: 361-1-4691
An idea for a final project!!!
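#include <stdio.h>
#include <mpi.h>
#include <omp.h>
#define NBIN 100000     /* number of integration bins (assumed value; not shown on the slide) */
#define MAX_THREADS 64  /* size of the per-thread sum array (assumed; must be >= the OpenMP thread count) */
int main(int argc, char *argv[])
{
  int myid, nproc, nbin, tid, nthreads = 1;
  double step, pi = 0.0, pig, sum[MAX_THREADS] = {0.0};
  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD,&myid);
  MPI_Comm_size(MPI_COMM_WORLD,&nproc);
  nbin = NBIN/nproc;          /* bins handled by each MPI rank */
  step = 1.0/(nbin*nproc);    /* width of one bin */
  #pragma omp parallel private(tid)
  {
    int i;
    double x;
    int nt = omp_get_num_threads();
    tid = omp_get_thread_num();
    if (tid == 0) nthreads = nt;   /* single writer, so no race on the shared nthreads */
    /* cyclic distribution of this rank's bins over its threads */
    for (i=nbin*myid+tid; i<nbin*(myid+1); i+=nt) {
      x = (i+0.5)*step;
      sum[tid] += 4.0/(1.0+x*x);
    }
    printf("rank tid sum = %d %d %e\n",myid,tid,sum[tid]);
  }
  /* combine the per-thread sums into this rank's partial result */
  for(tid=0; tid<nthreads; tid++)
    pi += sum[tid]*step;
  /* add the partial results of all ranks */
  MPI_Allreduce(&pi,&pig,1,MPI_DOUBLE,MPI_SUM,MPI_COMM_WORLD);
  if (myid==0) printf("PI = %f\n",pig);
  MPI_Finalize();
  return 0;
}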
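The two-level decomposition in the hybrid listing above can be checked without MPI or OpenMP. The following is a minimal serial sketch, not the course's code: the function name emulate_hybrid_pi and its parameters are made up for illustration, and the outer two loops merely stand in for the ranks and threads. It shows that every bin is visited exactly once, so the sum approximates pi by the midpoint rule.

```c
#include <assert.h>

/* Serial emulation of the hybrid decomposition (illustrative names only):
   nproc "ranks", each with nthreads "threads", together integrate
   4/(1+x^2) over [0,1] with the midpoint rule. */
double emulate_hybrid_pi(int total_bins, int nproc, int nthreads)
{
    assert(nproc > 0 && nthreads > 0 && total_bins >= nproc);
    int nbin = total_bins / nproc;               /* bins per rank */
    double step = 1.0 / ((double)nbin * nproc);  /* width of one bin */
    double pig = 0.0;                            /* plays the role of the Allreduce result */

    for (int myid = 0; myid < nproc; myid++) {
        double pi = 0.0;                         /* partial result of this rank */
        for (int tid = 0; tid < nthreads; tid++) {
            double sum = 0.0;                    /* partial sum of this thread */
            /* cyclic distribution: thread tid takes every nthreads-th bin */
            for (int i = nbin * myid + tid; i < nbin * (myid + 1); i += nthreads) {
                double x = (i + 0.5) * step;     /* midpoint of bin i */
                sum += 4.0 / (1.0 + x * x);
            }
            pi += sum * step;
        }
        pig += pi;
    }
    return pig;
}
```

Calling emulate_hybrid_pi(100000, 4, 3) should give a value very close to pi regardless of how the bins are split, which is the property the real hybrid program relies on.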
Cilk++
http://software.intel.com/en-us/articles/intel-cilk-plus/
17/8/2011
Fibonacci
Try:
http://www.wolframalpha.com/input/?i=fibonacci+number
Fibonacci Numbers
serial version
long fib_serial(long n) {
if (n < 2) return n;
/* recursive case: F(n) = F(n-1) + F(n-2) */
return fib_serial(n-1) + fib_serial(n-2);
}
Cilk++ Fibonacci
#include <cilk.h>
#include <stdio.h>
long fib_parallel(long n)
{
long x, y;
if (n < 2) return n;
x = cilk_spawn fib_parallel(n-1);
y = fib_parallel(n-2);
cilk_sync;
return (x+y);
}
int cilk_main()
{
int N=50;
long result;
result = fib_parallel(N);
printf("fib of %d is %ld\n",N,result);
return 0;
}
cilk_spawn
...
do_stuff_1(); // execute strand 1
cilk_spawn func_3(); // spawn strand 3 at knot A
do_stuff_2(); // execute strand 2
cilk_sync; // sync at knot B
do_stuff_4(); // execute strand 4 ...
Let's add labels to the strands to indicate the number of milliseconds it takes to
execute each strand.
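Once each strand has a time label, two numbers summarize the DAG: the total time of all strands (usually called the work) and the time along the longest path (usually called the span). The sketch below computes both for the four-strand example above, where strands 2 and 3 both run between knots A and B. The function names are made up for illustration, and any timings passed in are hypothetical, since the slide's own labels are not reproduced here.

```c
#include <assert.h>

/* Work: the sum of all strand times, i.e. the running time on one processor. */
double dag_work(double t1, double t2, double t3, double t4)
{
    assert(t1 >= 0 && t2 >= 0 && t3 >= 0 && t4 >= 0);
    return t1 + t2 + t3 + t4;
}

/* Span: the longest path through the DAG. Strands 2 and 3 may overlap
   between knot A (the spawn) and knot B (the sync), so only the slower
   of the two contributes. */
double dag_span(double t1, double t2, double t3, double t4)
{
    assert(t1 >= 0 && t2 >= 0 && t3 >= 0 && t4 >= 0);
    double between_knots = (t2 > t3) ? t2 : t3;
    return t1 + between_knots + t4;
}
```

With hypothetical labels of 1, 6, 4 and 2 ms for strands 1 to 4, the work is 13 ms and the span is 9 ms, so no number of processors can finish this DAG in less than 9 ms.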
Here is the DAG for a serial loop that spawns each iteration. In this case, the
work is not well balanced, because each child does the work of only one
iteration before incurring the scheduling overhead inherent in entering a sync.
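A common alternative to spawning every iteration from a serial loop is to split the iteration range in half recursively, so that each child receives a sizable chunk of work. The following is a plain C sketch of that idea under stated assumptions: the function sum_halving, its grain parameter and its depth bookkeeping are illustrative and not taken from the slides, and the comments only mark where a spawn and a sync would go in a Cilk build.

```c
#include <assert.h>

/* Sum a[lo..hi) by recursive halving. Both halves run serially here so the
   sketch compiles with any C compiler. *max_depth records the deepest
   level of splitting that was reached. */
long sum_halving(const long *a, int lo, int hi, int grain,
                 int depth, int *max_depth)
{
    assert(a != 0 && max_depth != 0 && lo <= hi && grain >= 1);
    if (depth > *max_depth) *max_depth = depth;

    if (hi - lo <= grain) {            /* small chunk: ordinary serial loop */
        long s = 0;
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }

    int mid = lo + (hi - lo) / 2;
    /* in a Cilk build this first call would be spawned */
    long left  = sum_halving(a, lo, mid, grain, depth + 1, max_depth);
    long right = sum_halving(a, mid, hi, grain, depth + 1, max_depth);
    /* a sync would go here, before the two results are combined */
    return left + right;
}
```

For 8 elements and a grain of 1 the splitting is only 3 levels deep, whereas a serial loop would issue 8 spawns one after another. A larger grain gives each child more than one iteration of work per scheduling step.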
Race conditions
Check the “qsort-race” program with cilkscreen:
StarHPC on the Cloud
Reference: http://myxman.org/dp/node/182
Intel® Parallel Studio
• Use Parallel Composer
to create and compile a parallel application
• Use Parallel Inspector
to improve reliability by finding memory and
threading errors
• Use Parallel Amplifier
to improve parallel performance by tuning
threaded code
Intel® Parallel Studio
Parallel Studio adds new features to Visual
Studio
Intel’s Parallel Amplifier –
Execution Bottlenecks
Intel’s Parallel Inspector –
Threading Errors
Error – Data Race
Intel Parallel Studio - Composer
The installation of this part failed for me,
probably because I had not installed Intel’s C++
compiler first.
Sorry, I can’t give a demo here…