Professional Documents
Culture Documents
Ni dung
X l song song Xu hng pht trin ca CPU Cc m hnh lp trnh song song
3/5/2014
X L SONG SONG
3/5/2014
3/5/2014
3/5/2014
ng cao tc nhiu ln xe
Chia nh ra x l nhanh hn
Nhng bi ton phc tp nhiu lnh vc thc t i hi cao v tc a ra quyt nh nhanh da trn lng ln d liu nh:
D bo thi tit (d bo bo, l, ) Chun on y khoa Kinh t - ti chnh (mua bn chng khon) Qun s
3/5/2014
10
3/5/2014
11
3/5/2014
13
3/5/2014
14
Power Wall
3/5/2014
16
Power Wall
3/5/2014 17
Power Wall
S lng transistor/die tng gp i (N) Kch thc transistors thu nh hn (C) S dng s vol thp hn (V).
3/5/2014 http://www.digital-daily.com/cpu/intel_penryn/
18
Power Wall
in nng cung cp gim t 15V xung cn 1V trong vng gn 30 nm Ngng ti thiu l 0.7V cn gim thm c (1.0/0.7)2=2X Nhng khi tng mt (N) v xung nhp (f) ca CPU ln th mc tiu hao nng lng tng t 1 W ln 100 W ch trn 1 cm2 kh tn nhit t ti gii hn xung nhp CPU khng gip tng tc h thng nh trc na (k t P4)
3/5/2014 19
Memory Wall
3/5/2014
20
Memory Wall
tr ca DRAM ci thin khng ng k dng cache ca CPU CPU cache tn km do miss (c th mt 300 xung ng h) gim miss tng gp 4 ln dung lng cache (kch thc thc s tng!)
Cch d dng hn tng bng thng b nh Truy xut b nh song song Nhn chung hiu nng c nng cao
3/5/2014 21
Memory Wall
http://en.wikipedia.org/wiki/File:Cache,missrate.svg
3/5/2014 22
Gii php ca nh sn xut phn cng tng tc CPU 100x l tng s li/nhn thay v tng f. m ra mt k nguyn song song mi
3/5/2014 23
Source [3]
3/5/2014
24
Source [3]
3/5/2014 25
3/5/2014
26
Song song
Source [2]
3/5/2014
27
Song song
void quicksort(int * a, int n) { if (n <= 1) return; int s = partition(a,n); parallel_invoke( [&]{quicksort(a,s);}, [&]{quicksort(a+s,n-s);}); }
Cch suy ngh tun t Chuyn i t tun t sang song song Kh nng m rng theo phn cng
3/5/2014
29
Source [2]
kt qu c tnh tt nh (chc chn) Khng cn ph hp khi chuyn sang song song trng thi khng xc nh
3/5/2014 30
Song song
void quicksort(int * a, int n) { if (n <= 1) return; int s = partition(a,n); 3 parallel_invoke( [&]{quicksort(a,s);}, [&]{quicksort(a+s,n-s);}); }
quicksort(a,s); quicksort(a+s,n-s); }
Thi im 1 & 2 bit chc tnh trng ca mng a Thi im 3 khng bit chc tnh trng ca mng a
3/5/2014 31
Source [4]
3/5/2014 32
1 2
3 4 5
int fib(int n) { if (n < 2) return n; else { int x = fib(n-1); int y = fib(n-2); return x + y; } }
3/5/2014
33
nh lut Amdahl
1 speedup 1 p
3/5/2014
Source [2]
34
speedup
1 p s N
Tng s CPU cng gip tng tc vi cng t l code chy song song
Source [2]
3/5/2014 35
Liu khi phn cng thay i (s CPU thay i) th code c phi thay i khng?
phi
typedef struct int main(int argc, char *argv[]) { { int input; pthread_t thread; int output; thread_args args; } thread_args; int status; int result; void *thread_func ( void *ptr ) int thread_result; { int i = ((thread_args *) ptr)->input; if (argc < 2) return 1; ((thread_args *) ptr)->output = int n = atoi(argv[1]); fib(i); if (n < 30) result = fib(n); return NULL; else { } args.input = n-1; status = pthread_create(&thread, NULL, thread_func, (void*) &args ); // main can continue executing while the thread executes. result = fib(n-2); // Wait for the thread to terminate. pthread_join(thread, NULL); result += args.output; } printf("Fibonacci of %d is %d.\n", n, result); return 0; 3/5/2014 37 }
c im
Theo m hnh song song v tc v Khng ph hp vi cc kin trc my tnh a nhn/a li mi. C nhiu vn v l thuyt cha khc phc c trong m hnh lp trnh song song theo tc v.
3/5/2014
38
3/5/2014
Source [2]
39
c im ca m hnh
Song song cc thao tc trn mt tp d liu (VD: mng hoc ma trn) Mi tc v x l mt phn d liu ca cng mt CTDL Cc tc v thc hin cng mt thao tc trn d liu Ph hp vi kin trc a nhn/a li mi Khc phc nhiu vn ca lp trnh song song theo tc v.
3/5/2014 40
Mi trng lp trnh:
Ngn ng truyn thng (Fortran) Th vin ha (OpenGL, Direct3D) Ngn ng m rng (CUDA) Ngn ng x l theo kiu mng
3/5/2014
41
Bt ngun t lnh vc tnh ton hiu nng cao (High Performance Computing) S dng rng ri trong cc siu my tnh Ngn ng Fortran (Fortran 90, HPF)
V d: Cng hai ma trn A v B
Fortran 90 C=A+B Fortran 77 DO I = 1, N DO J = 1, N C(I, J) = A(I, J) + B(I, J) END DO END DO
3/5/2014
42
u im
Da trn nhng ngn ng ph bin (Fortran)
Khuyt im
Khng h tr nn tng desktop
3/5/2014
43
Th vin ha
Shading Language
OpenGLs shading Language (GLSL) DirectX High Level Shader Language (HLSL)
u im
Da trn c im chung ca phn cng GPU lm c trn nhiu GPU khc nhau
Khuyt im
Khng th hin c c im ring ca mi card ha Kh s dng
3/5/2014 44
Th vin ha v d
float main(float2 texcoord : TEXCOORD0, uniform samplerRECT img) : COLOR { float a, b, c, d; a = f1texRECT(img, texcoord); b = f1texRECT(img, texcoord + float2(0, 1)); c = f1texRECT(img, texcoord + float2(1, 0)); d = f1texRECT(img, texcoord + float2(1, 1)); return max(max(a, b), max(c, d)); }
3/5/2014
Source [8]
45
Ngn ng m rng
c pht trin v h tr bi nh sn xut phn cng M rng da trn ngn ng quen thuc Gm
XMT-C (PRAM trn Chip) CUDA - NVIDIA nm 2007 CAL(Compute Abstraction Layer) AMD Radeon
3/5/2014 46
Ngn ng m rng
u im Gn vi ngn ng quen thuc (ch yu C) n gin ha (che i phn song song) Khuyt im Kh ti u
C void incrementArray (float *a, int N) { int i; for (i=0; i < N; i++) a[i] = a[i] +1.0f; } CUDA __global__ void incrementArray (float *a, int N) { int idx = blockIdx.x*blockDim.x + threadIdx.x; if (idx < N) a[idx] = a[idx] + 1.0f; }
3/5/2014 47
Khuyt im
tng thit k da trn mng
Gm
RapidMind CPU/GPU Acceleware CPU/GPU
3/5/2014 48
RapidMind
3/5/2014
50
[2]
Blaise Barney, Introduction to Parallel Computing, High Performance Computing Training Workshop, Lawrence Livermore National Laboratory, 2009
Bi vit The Problem: Moore's Law and Fast Numerical Software
[3]
[4]
[5]
Charles E. Leiserson v Ilya B. Mirman How to Survive the Multicore Software Revolution, Cilk Arts
Taking Parallelism Mainstream, Parallel Computing Developer Center, Microsoft, 2009
[6]
[7] [8]
Stuart Oberman, GPUs: High Performance Arithmetic for Graphics and General Purpose Computation ARITH 19, 2009
Ejaz Anwer, Handling Multiple Processors in Your Code Using RapidMind, Codeguru, 2007 Ian Buck and Tim Purcell, GPU Gems: Programming Techniques, Tips and Tricks for Real-Time Graphics, ch. 37, Addison-Wesley, 2004
3/5/2014 51