You are on page 1of 51

Bi gii thiu:

Tng quan v lp trnh song song

Ni dung
X l song song Xu hng pht trin ca CPU Cc m hnh lp trnh song song

Truyn thng Da trn d liu

3/5/2014

X L SONG SONG

3/5/2014

Vai tr ca x l song song trong cuc sng

X l song song hon ton khng xa l trong cuc sng


Quy tnh tin siu th Mua v v cng vin ng cao tc nhiu ln xe

Nhiu s vic phc tp trong cuc sng u xy ra ng thi

3/5/2014

Quy tnh tin siu th

Source: http://www.baobinhthuan.com.vn/web/data/ne ws/2009/1/21476/thitruong.jpg Source: Checkouts http://www.sturm.si/en/images/iman/gal_img.0077.jpg


3/5/2014 5

Mua v v cng vin

Source:http://www.vtc.vn/newsimage/original/vtc_21692 5_suoitien.jpg Source: The Old Entrance http://www.matterhorn1959.com/blog1/4.TicketBooths.jpg

3/5/2014

ng cao tc nhiu ln xe

Source: Stock Photo: Cars In Traffic On Multi-lane Highway http://www.worldofstock.com/closeups/TRC4948.php


3/5/2014 7

Vai tr ca x l song song trong cuc sng

Ti sao li phi x l song song?


Tit kim thi gian + tin bc

Chia nh ra x l nhanh hn

x l song song gip nng cao nng sut


3/5/2014 8

ng dng x l song song


Nhng bi ton phc tp nhiu lnh vc thc t i hi cao v tc a ra quyt nh nhanh da trn lng ln d liu nh:
D bo thi tit (d bo bo, l, ) Chun on y khoa Kinh t - ti chnh (mua bn chng khon) Qun s

Xy dng m hnh tnh ton v phn tch trn my tnh


3/5/2014

ng dng x l song song


M phng thc t xem xt n nhiu yu t (tham s) khc nhau nhiu kh nng/th hin ca mt bi ton c th c x l song song

3/5/2014

10

XU HNG PHT TRIN CA CPU

3/5/2014

11

S pht trin ca CPU


Central Processing Unit (CPU) 60 nm pht trin ca CPU Intel

60 YEARS OF THE TRANSISTOR: 1947 2007 http://www.intel.com/technology/timeline.pdf


3/5/2014 12

S pht trin ca CPU

60 YEARS OF THE TRANSISTOR: 1947 2007 http://www.intel.com/technology/timeline.pdf

3/5/2014

13

S pht trin ca CPU

60 YEARS OF THE TRANSISTOR: 1947 2007 http://www.intel.com/technology/timeline.pdf

3/5/2014

14

S pht trin ca CPU


CPU nhiu li ngy cng ph dng Ti sao li phi chuyn t n li sang nhiu li? T 1975 hiu nng CPU vn tng lin tc (100x/10 nm) Nhng ro cn khi tng tc CPU n li

Power Wall Memory Wall Complexity Wall


3/5/2014 15

Power Wall

3/5/2014

16

Power Wall

Cng sut (W) ca CPU t l vi NCV2f


N: s lng transistor C: in dung V: s vol f: tn s

Xu hng N C V (Cng ngh transistor mi) S nh th no nu f

3/5/2014 17

Power Wall

Mi th h mch in mi (90, 60, 45, 32, 22, 16, 11 nm)


S lng transistor/die tng gp i (N) Kch thc transistors thu nh hn (C) S dng s vol thp hn (V).

3/5/2014 http://www.digital-daily.com/cpu/intel_penryn/

18

Power Wall
in nng cung cp gim t 15V xung cn 1V trong vng gn 30 nm Ngng ti thiu l 0.7V cn gim thm c (1.0/0.7)2=2X Nhng khi tng mt (N) v xung nhp (f) ca CPU ln th mc tiu hao nng lng tng t 1 W ln 100 W ch trn 1 cm2 kh tn nhit t ti gii hn xung nhp CPU khng gip tng tc h thng nh trc na (k t P4)

3/5/2014 19

Memory Wall

3/5/2014

20

Memory Wall
tr ca DRAM ci thin khng ng k dng cache ca CPU CPU cache tn km do miss (c th mt 300 xung ng h) gim miss tng gp 4 ln dung lng cache (kch thc thc s tng!)

Nhiu transitor trong CPU c dng cho vic x l truy xut b nh ny

Cch d dng hn tng bng thng b nh Truy xut b nh song song Nhn chung hiu nng c nng cao

3/5/2014 21

Memory Wall

T l miss khi tng dung lng cache

http://en.wikipedia.org/wiki/File:Cache,missrate.svg
3/5/2014 22

Xu hng pht trin CPU

Do nhng ro cn trn nn CPU n li


Hiu nng s tng rt chm (5 10%) Tt cho phn mm truyn thng (chy tun t

Gii php ca nh sn xut phn cng tng tc CPU 100x l tng s li/nhn thay v tng f. m ra mt k nguyn song song mi

3/5/2014 23

My tnh v hiu nng phn mm

Trc y ch cn nng cp phn cng chng trnh chy nhanh hn

Hai xu hng khc nhau

Tn s CPU b gim xung do gii hn vt l

Source [3]

3/5/2014

24

My tnh v hiu nng phn mm

Source [3]
3/5/2014 25

LP TRNH SONG SONG

3/5/2014

26

Lp trnh song song truyn thng


Tun t

Song song

Source [2]

3/5/2014

27

Lp trnh song song truyn thng


Tun t
void quicksort(int * a, int n) { if (n <= 1) return; int s = partition(a,n); quicksort(a,s); quicksort(a+s,n-s); }

Song song
void quicksort(int * a, int n) { if (n <= 1) return; int s = partition(a,n); parallel_invoke( [&]{quicksort(a,s);}, [&]{quicksort(a+s,n-s);}); }

tng n gin nhng mang li hiu qu cao


3/5/2014 28

Lp trnh song song truyn thng


Lp trnh song song khng n gin 3 kh khn ch o

Cch suy ngh tun t Chuyn i t tun t sang song song Kh nng m rng theo phn cng

Cn nhiu vn khc nh debug, kim th, hiu nng

3/5/2014

29

Cch suy ngh tun t

Kin trc Von Neumann

Source [2]

kt qu c tnh tt nh (chc chn) Khng cn ph hp khi chuyn sang song song trng thi khng xc nh
3/5/2014 30

Cch suy ngh tun t


Tun t
void quicksort(int * a, int n) { if (n <= 1) return; int s = partition(a,n);
1 2

Song song
void quicksort(int * a, int n) { if (n <= 1) return; int s = partition(a,n); 3 parallel_invoke( [&]{quicksort(a,s);}, [&]{quicksort(a+s,n-s);}); }

quicksort(a,s); quicksort(a+s,n-s); }

Thi im 1 & 2 bit chc tnh trng ca mng a Thi im 3 khng bit chc tnh trng ca mng a
3/5/2014 31

Chuyn i tun t sang song song


M hnh mi quan h gia tun t v song Song song song bng mt th c hng nh lnh Cnh ni x y lnh x thc thi trc lnh y (tun t) Khng c cnh ni x v y x || y (song song)
Tun t

Source [4]
3/5/2014 32

Chuyn i tun t sang song song

Hm tnh Fibonacci qui


song song
1

1 2

3 4 5

int fib(int n) { if (n < 2) return n; else { int x = fib(n-1); int y = fib(n-2); return x + y; } }

3/5/2014

33

Chuyn i tun t sang song song

nh lut Amdahl

1 speedup 1 p

T l code chy song song cng nhiu tc tng ln cng nhiu ln

3/5/2014

Source [2]

34

Chuyn i tun t sang song song

Khi tng s CPU ln

speedup

1 p s N
Tng s CPU cng gip tng tc vi cng t l code chy song song

Source [2]
3/5/2014 35

Kh nng m rng theo phn cng

Liu khi phn cng thay i (s CPU thay i) th code c phi thay i khng?

phi

thay i code vn ln ???


3/5/2014 36

typedef struct int main(int argc, char *argv[]) { { int input; pthread_t thread; int output; thread_args args; } thread_args; int status; int result; void *thread_func ( void *ptr ) int thread_result; { int i = ((thread_args *) ptr)->input; if (argc < 2) return 1; ((thread_args *) ptr)->output = int n = atoi(argv[1]); fib(i); if (n < 30) result = fib(n); return NULL; else { } args.input = n-1; status = pthread_create(&thread, NULL, thread_func, (void*) &args ); // main can continue executing while the thread executes. result = fib(n-2); // Wait for the thread to terminate. pthread_join(thread, NULL); result += args.output; } printf("Fibonacci of %d is %d.\n", n, result); return 0; 3/5/2014 37 }

Kh nng m rng theo phn cng

Tng tc 1.5 ln Code cho CPU 2 nhn CPU 4 nhn Sa code

Lp trnh song song kiu truyn thng

c im
Theo m hnh song song v tc v Khng ph hp vi cc kin trc my tnh a nhn/a li mi. C nhiu vn v l thuyt cha khc phc c trong m hnh lp trnh song song theo tc v.

3/5/2014

38

Lp trnh song song da trn d liu

Mi phn ca d liu c chia cho mt b x l (tc v) thc hin

3/5/2014

Source [2]

39

Lp trnh song song da trn d liu

c im ca m hnh
Song song cc thao tc trn mt tp d liu (VD: mng hoc ma trn) Mi tc v x l mt phn d liu ca cng mt CTDL Cc tc v thc hin cng mt thao tc trn d liu Ph hp vi kin trc a nhn/a li mi Khc phc nhiu vn ca lp trnh song song theo tc v.
3/5/2014 40

Lp trnh song song da trn d liu

Mi trng lp trnh:
Ngn ng truyn thng (Fortran) Th vin ha (OpenGL, Direct3D) Ngn ng m rng (CUDA) Ngn ng x l theo kiu mng

3/5/2014

41

Ngn ng truyn thng


Bt ngun t lnh vc tnh ton hiu nng cao (High Performance Computing) S dng rng ri trong cc siu my tnh Ngn ng Fortran (Fortran 90, HPF)
V d: Cng hai ma trn A v B
Fortran 90 C=A+B Fortran 77 DO I = 1, N DO J = 1, N C(I, J) = A(I, J) + B(I, J) END DO END DO

3/5/2014

42

Ngn ng truyn thng

u im
Da trn nhng ngn ng ph bin (Fortran)

Khuyt im
Khng h tr nn tng desktop

3/5/2014

43

Th vin ha

Shading Language
OpenGLs shading Language (GLSL) DirectX High Level Shader Language (HLSL)

u im
Da trn c im chung ca phn cng GPU lm c trn nhiu GPU khc nhau

Khuyt im
Khng th hin c c im ring ca mi card ha Kh s dng
3/5/2014 44

Th vin ha v d
float main(float2 texcoord : TEXCOORD0, uniform samplerRECT img) : COLOR { float a, b, c, d; a = f1texRECT(img, texcoord); b = f1texRECT(img, texcoord + float2(0, 1)); c = f1texRECT(img, texcoord + float2(1, 0)); d = f1texRECT(img, texcoord + float2(1, 1)); return max(max(a, b), max(c, d)); }

3/5/2014

Source [8]

45

Ngn ng m rng
c pht trin v h tr bi nh sn xut phn cng M rng da trn ngn ng quen thuc Gm

XMT-C (PRAM trn Chip) CUDA - NVIDIA nm 2007 CAL(Compute Abstraction Layer) AMD Radeon
3/5/2014 46

Ngn ng m rng
u im Gn vi ngn ng quen thuc (ch yu C) n gin ha (che i phn song song) Khuyt im Kh ti u

C void incrementArray (float *a, int N) { int i; for (i=0; i < N; i++) a[i] = a[i] +1.0f; } CUDA __global__ void incrementArray (float *a, int N) { int idx = blockIdx.x*blockDim.x + threadIdx.x; if (idx < N) a[idx] = a[idx] + 1.0f; }
3/5/2014 47

Ngn ng x l theo kiu mng


Tn dng sc mnh ca cc CPU/GPU nhiu nhn u im

Code ngn gn v r rng

Khuyt im
tng thit k da trn mng

Gm
RapidMind CPU/GPU Acceleware CPU/GPU
3/5/2014 48

Ngn ng x l theo kiu mng


C++ Thc hin tnh ton float results[10000]; Array< 1 , Value1f > output; //Stream Program chy trn d liu Program prg = RM_BEGIN { In<Value1f> a; // input 1 In<Value1f> b; // input 2 Out<Value1f> c; // output c = a + b; //thao tc } RM_END; output = prg(input1, input2);
3/5/2014 49

RapidMind

for(int i = 0; i < 10000; ++i) { result[i] = input1[i] + input2[i]; }

Ngn ng x l theo kiu mng


C++ Xut kt qu RapidMind

const float* results = output.read_data();


for (int i = 0; i < 10000; ++i) { std::cout << "output[" << i << "] = (" << results[i] << ")" << std::endl; } for (int i = 0; i < 10000; ++i) { std::cout << "output[" << i << "] = (" << results[i] << ")" << std::endl; }

3/5/2014

50

Ti liu tham kho


[1] Chas Boyd, Data-parallel Computing, ACM Queue vol. 6, no. 2, 2008

[2]

Blaise Barney, Introduction to Parallel Computing, High Performance Computing Training Workshop, Lawrence Livermore National Laboratory, 2009
Bi vit The Problem: Moore's Law and Fast Numerical Software

[3]

[4]
[5]

Charles E. Leiserson v Ilya B. Mirman How to Survive the Multicore Software Revolution, Cilk Arts
Taking Parallelism Mainstream, Parallel Computing Developer Center, Microsoft, 2009

[6]
[7] [8]

Stuart Oberman, GPUs: High Performance Arithmetic for Graphics and General Purpose Computation ARITH 19, 2009
Ejaz Anwer, Handling Multiple Processors in Your Code Using RapidMind, Codeguru, 2007 Ian Buck and Tim Purcell, GPU Gems: Programming Techniques, Tips and Tricks for Real-Time Graphics, ch. 37, Addison-Wesley, 2004
3/5/2014 51

You might also like