ABSTRACT
Parallelization is an important technique for problems that demand large amounts of computation, which are common in the basic sciences. The N-body problem is one of the fundamental problems of astrophysics; it concerns the interaction forces between bodies in space. There are many ways to approach it, one of which is the Barnes-Hut algorithm. OpenMP is an application programming interface (API) that gives programmers a flexible, portable interface for developing parallel applications on shared-memory computers. This thesis gives an overview of the N-body problem, the Barnes-Hut algorithm, and the OpenMP API. Based on a performance evaluation of the Barnes-Hut algorithm, it studies, analyzes, and proposes ways to parallelize the algorithm with OpenMP.
Lê Thị Lan Phương
ACKNOWLEDGEMENTS
First, I would like to express my deepest gratitude to Dr. Nguyễn Hải Châu, who supervised and guided me with great dedication throughout the work on this thesis. I sincerely thank Professor Phạm Kỳ Anh, director of the High Performance Computing Center, University of Science, Vietnam National University, Hanoi, who provided the best possible conditions for me to practice and test the algorithm. I also thank all the teachers and staff of the Center, who helped me, answered my questions, and made it possible for me to complete this thesis. I thank Mr. Đoàn Minh Phương, lecturer in the Department of Computer Networks and Communications, Faculty of Information Technology, University of Engineering and Technology, who helped me test the problem on an Intel multiprocessor machine. Finally, I am deeply grateful to my family, who have always cared for me and encouraged me in my studies and in life.
Abbreviations and terms

English term                                 Meaning in this thesis
Application Program Interface (API)          application programming interface
Open Specifications for Multi Processing     open directives for multiprocessing (OpenMP)
Thread                                       thread of execution
Body                                         particle
Cell                                         box, block
Node                                         tree node
Message Passing Interface (MPI)              message-passing interface
Center of mass                               center of mass
Table of contents

ABSTRACT
ACKNOWLEDGEMENTS
List of figures
Abbreviations and terms
Table of contents
INTRODUCTION
Chapter 1: THE N-BODY PROBLEM AND THE BARNES-HUT ALGORITHM
    1.1 The N-body problem
        1.1.1 Introduction to the N-body problem
        1.1.2 Methods for speeding up the N-body problem
        1.1.3 The Quadtree and Octree structures
    1.2 The Barnes-Hut algorithm
        1.2.1 Description of the Barnes-Hut algorithm
Chapter 2: INTRODUCTION TO OPENMP
    2.1 OpenMP (Open specifications for Multi Processing)
    2.2 Shared-memory architecture
    2.3 Goals of OpenMP
    2.4 Environments that support OpenMP
    2.5 The OpenMP programming model
    2.6 Basic OpenMP directives
        2.6.1 Parallelization directives
        2.6.2 The parallel region directive
        2.6.3 Data environment directives
        2.6.4 Work-sharing directives
        2.6.5 Synchronization directives
        2.6.6 Library routines and environment variables
    2.7 An example of parallel programming with OpenMP
        2.7.1 omp_hello.c
        2.7.2 How to compile
        2.7.3 Results
Chapter 3: PARALLELIZING THE BARNES-HUT ALGORITHM
    3.1 Treecode
        3.1.1 Tree data structures
        3.1.2 Global variables
    3.2 Testing and performance evaluation of treecode
        3.2.1 Testing the treecode program
        3.2.2 Performance evaluation
    3.3 Parallelizing treecode with OpenMP
        3.3.1 Parallel execution environment
        3.3.2 Parallel implementation
    3.4 Experimental results
CONCLUSION
REFERENCES

Parallelizing the Barnes-Hut Algorithm with OpenMP
INTRODUCTION
The N-body problem is one of the fundamental problems of astrophysics. Many different approaches have been taken to computing the interaction forces between the bodies of an N-body system in space. Two of them are basic: computing the force directly between every pair of bodies, with complexity O(N^2), and computing the potential iteratively, with complexity O(N log N). The first approach yields nearly exact interaction forces, but its running time is very large, approximately O(N^2) where N is the number of bodies. Force computation dominates that time: about 96% of the program's running time when tested on an Intel machine with 1 CPU. The second approach reduces the computation time but lacks accuracy and generality when simulating an N-body system. The Barnes-Hut algorithm and its refinements compute forces with complexity of approximately O(N log N) and give reasonably accurate results.

Parallelizing the Barnes-Hut algorithm is therefore very important for speeding up the N-body problem. Parallelizing it on distributed-memory architectures using the MPI application programming interface has been studied by many authors with good results. Parallelizing it on shared-memory multiprocessor architectures, however, has received little attention. OpenMP is one of the application programming interfaces for parallel applications on shared-memory multiprocessors. Compared with MPI, OpenMP is flexible and highly portable, and gives programmers a simple interface for building and developing parallel applications.

This thesis surveys the N-body problem and studies the Barnes-Hut algorithm and the OpenMP programming interface. From this it draws observations, evaluates the performance of the algorithm, and studies how to parallelize the Barnes-Hut algorithm with OpenMP on the shared-memory model.
Figure 1: An N-body system in space

The basic algorithm for simulating an N-body system is:

    while (t < tfinal) {
        for i = 1 to n do {
            compute the force f(i) acting on body i
            update the velocity and position of body i
        }
        t = t + Δt
    }

Here the force f(i) acting on body i can be computed simply as follows:

    for i = 1 to n
        f(i) = sum[ j=1,...,n, j!=i ] f(i,j)    /* f(i,j) is the force exerted by body j on body i */
    end for

The interaction force between bodies depends on the distance between them. Therefore, as the bodies move, the force acting on each of them must be recomputed. Consider a body moving under the gravitational force.
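$$F = G \frac{m_a m_b}{d^2}$$

where:
F is the gravitational force between the two bodies a and b.
G is the gravitational constant, G = (6.6742 ± 0.001) × 10^-11 N m^2 kg^-2.
m_a, m_b are the masses of bodies a and b respectively.
d is the distance between the two bodies.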
Figure 2: The net force acting on one body

Consider the times t0, t1, ... separated by a time interval Δt. Under the net force Fnet, the velocity of the body is:
The gravitational force projected onto the Ox axis is:
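Both quantities can be written out under simple assumptions. The following is a sketch only: it assumes an explicit Euler step over Δt and two bodies a and b with x-coordinates x_a and x_b; the symbols m (mass of the moving body), v, x_a and x_b are introduced here for illustration.

```latex
\begin{align}
\vec{a}(t_0) &= \frac{\vec{F}_{net}(t_0)}{m}, \\
\vec{v}(t_1) &= \vec{v}(t_0) + \vec{a}(t_0)\,\Delta t, \qquad \Delta t = t_1 - t_0, \\
F_x &= F\,\frac{x_b - x_a}{d} = G\,\frac{m_a m_b\,(x_b - x_a)}{d^{3}}.
\end{align}
```

Here F_x is the x-component of the force that body b exerts on body a; the components along the other axes follow the same pattern.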
Clearly, with N bodies in space, computing the interaction force directly between every pair of bodies gives the problem a complexity of O(N^2). How can the computation time be reduced?
Form the ratio:

    D/r = (size of the box) / (distance from the center of mass of the box to the Earth)

This ratio D/r is very small, so all the stars of Andromeda can be replaced, with reasonable accuracy, by a single point x located at the center of the box.
Figure 3: Observing the Andromeda galaxy from the Earth

This idea was discovered long ago and has been applied to many problems. In classical mechanics, for example, when computing the Earth's pull on a falling apple, Newton treated the Earth as a single point located at its center. What is new here is that the idea is applied recursively to solve the N-body problem. For instance, seen from Andromeda, the Milky Way can be approximated by a point located at its center. More importantly, the process can be repeated many times: as long as the ratio D1/r1 is small, the stars in a smaller box can be replaced by a point at the center of mass of that box when computing the gravitational force.
Figure 4: Recursively replacing a cluster by its center point

To make the recursive subdivision of space simple, a special data structure is used: the Quadtree and the Octree.
Figure 5: A Quadtree with 4 levels

Each node of the tree has 4 children, which are the 4 smaller squares obtained by subdividing the larger square. The Octree is built in the same way, except that each node has 8 children instead of 4. The figure below shows an Octree with 2 levels of subdivision.

Figure 6: An Octree with 2 levels

The leaves of the Quadtree store the positions and masses of the bodies contained in the corresponding box. If the bodies are not uniformly distributed in space, however, subdividing as above leaves many leaves empty, and storing such empty leaves is wasteful. To avoid this, a square is subdivided only when it contains more than one body. The resulting tree has the following form.
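The rule "subdivide a square only when it holds more than one body" can be sketched in C as follows. This is an illustrative sketch, not treecode's implementation: the type `QNode` and the functions `qnode_new`, `quad_insert` and `quad_count` are hypothetical names. Children are created on demand, so empty leaves are never stored, and the sketch assumes that no two bodies share the same position and does not free memory.

```c
#include <stdlib.h>

/* Hypothetical node type: a square that is empty, holds one body,
   or has been split into (up to) four children. */
typedef struct QNode {
    double cx, cy, half;       /* centre and half-width of the square */
    int nbodies;               /* number of bodies in this subtree */
    double bx, by, bmass;      /* the body, meaningful while nbodies == 1 */
    struct QNode *child[4];    /* all NULL while the node is a leaf */
} QNode;

QNode *qnode_new(double cx, double cy, double half)
{
    QNode *n = calloc(1, sizeof *n);    /* zeroed: no bodies, no children */
    if (n == NULL) abort();
    n->cx = cx; n->cy = cy; n->half = half;
    return n;
}

void quad_insert(QNode *n, double x, double y, double mass);

/* Pass a body down to the quadrant that contains it, creating the child
   square only when it is first needed. */
static void insert_into_child(QNode *n, double x, double y, double mass)
{
    int q = (x >= n->cx ? 1 : 0) + (y >= n->cy ? 2 : 0);
    if (n->child[q] == NULL) {
        double h = n->half / 2.0;
        n->child[q] = qnode_new(n->cx + ((q & 1) ? h : -h),
                                n->cy + ((q & 2) ? h : -h), h);
    }
    quad_insert(n->child[q], x, y, mass);
}

void quad_insert(QNode *n, double x, double y, double mass)
{
    if (n->nbodies == 0) {                  /* empty square: store the body here */
        n->bx = x; n->by = y; n->bmass = mass;
    } else {
        if (n->nbodies == 1)                /* occupied leaf: split it and push */
            insert_into_child(n, n->bx, n->by, n->bmass);  /* the old body down */
        insert_into_child(n, x, y, mass);   /* then place the new body */
    }
    n->nbodies++;
}

/* Number of nodes allocated in the subtree rooted at n. */
int quad_count(const QNode *n)
{
    if (n == NULL) return 0;
    int total = 1;
    for (int q = 0; q < 4; q++)
        total += quad_count(n->child[q]);
    return total;
}
```

Inserting three bodies at (-0.5, -0.5), (0.6, 0.6) and (0.2, 0.2) into a root square centred at the origin with half-width 1 allocates only the squares that actually contain bodies.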
    for i = 1 to n
        QuadInsert(i, root)    ... insert body i into the tree
    end for
    Traverse the tree and remove the empty leaves

    procedure QuadInsert(i, n)
        ... this procedure inserts body i into node n of the tree
        ... while the tree is being built, each leaf of the tree holds
        ... either 1 or 0 bodies
        if the subtree rooted at n contains more than 1 body
            choose the child c of n that should receive body i
            QuadInsert(i, c)
        else if the subtree rooted at n contains exactly 1 body
            add the 4 children of n to the Quadtree
            move the body already in n into the appropriate child
            choose the child c of n that should receive body i
            QuadInsert(i, c)
        else if the subtree rooted at n is empty
            store body i in node n
        endif

Step 2: For each square of the tree, compute the center of mass and the total mass of the bodies it contains.

    ... compute the center of mass and total mass for every square
    ( mass, cm ) = Compute_Mass(root)    ... cm = center of mass

    function ( mass, cm ) = Compute_Mass(n)
        ... compute the mass and the center of mass
        ... of all the bodies in the subtree rooted at n
        if n contains 1 body
            store ( mass, cm ) at n
            return ( mass, cm )
        else
            for all children c(i) of n (i=1,2,3,4)
                ( mass(i), cm(i) ) = Compute_Mass(c(i))
            end for
            mass = mass(1) + mass(2) + mass(3) + mass(4)
            cm = ( mass(1)*cm(1) + mass(2)*cm(2)
                 + mass(3)*cm(3) + mass(4)*cm(4) ) / mass
            store ( mass, cm ) at n
            return ( mass, cm )
        end

Step 3: For each body, traverse the tree to compute the force acting on it.

To compute the force acting on a body, consider the ratio:

    D/r = (size of the box) / (distance from the body to the center of mass of the box)
If the ratio D/r is small, the force due to all the bodies in the box can be computed from the total mass and the center of mass of the box. Let θ (theta) be the threshold (opening angle) used in the computation (usually 0 < θ <= 1). If D/r < θ, the gravitational force on the body is computed as follows.

(x, y, z) is the position of the body in three-dimensional space.
m is the mass of the body.
(xcm, ycm, zcm) is the position of the center of mass of the bodies in the box.
mcm is the total mass of the bodies in the box.
G is the gravitational constant.

The gravitational force is then approximated by:

    Force = G * m * mcm * ( (xcm-x)/r^3, (ycm-y)/r^3, (zcm-z)/r^3 )    (*)

where r = sqrt( (xcm-x)^2 + (ycm-y)^2 + (zcm-z)^2 ) is the distance from the body to the center of mass of the bodies in the box.

If D/r >= θ, the force computation is applied recursively: the force on the body is the sum of the forces exerted on it by the child nodes. The force computation of this step can be described as follows.
    ... for each body, traverse the tree to compute the force acting on it
    for i = 1 to n
        f(i) = TreeForce(i, root)
    end for

    function f = TreeForce(i, n)
        ... compute the gravitational force on body i
        ... due to all the bodies in node n
        f = 0
        if n contains 1 body
            f = force computed using formula (*)
        else
            r = distance from body i to the center of mass of n
            D = size of cell n
            if D/r < theta
                compute the force f using formula (*)
            else
                for all children c of n
                    f = f + TreeForce(i, c)
                end for
            end if
        end if

The algorithm shows that the tree traversal that computes the force on a body is independent from one body to the next. This step can therefore be parallelized to speed up the N-body problem.
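Because each body's traversal only reads the finished tree and writes its own result, the outer loop over bodies is a natural candidate for an OpenMP work-sharing loop. The sketch below shows the pattern only; `compute_forces`, `force_fn` and `demo_force` are hypothetical names, the force is reduced to a single `double` for brevity, and `demo_force` is a stand-in for TreeForce rather than a real traversal.

```c
#include <stddef.h>

/* Hypothetical callback: returns the force on body i. It must only read
   the shared data it is given (for Barnes-Hut, the finished tree). */
typedef double (*force_fn)(int i, const void *tree);

/* Each iteration writes only f[i], so the iterations are independent and
   can be divided among the threads. schedule(dynamic) is one plausible
   choice because the traversal cost differs from body to body. Without
   OpenMP support the pragma is ignored and the loop runs sequentially. */
void compute_forces(int n, double *f, force_fn force_on, const void *tree)
{
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; i++)
        f[i] = force_on(i, tree);
}

/* Stand-in for TreeForce, used only to exercise the loop. */
double demo_force(int i, const void *tree)
{
    const double *mass = tree;
    return 2.0 * mass[i];
}
```

The tree must be fully built, with centers of mass computed, before this loop starts; the loop itself needs no synchronization between bodies.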
Many algorithms have been proposed for parallelizing the force computation in N-body systems. There are two main directions:

1) Parallelization with MPI, using distributed memory
2) Parallelization with OpenMP, using shared memory

This thesis examines how to parallelize the Barnes-Hut algorithm with OpenMP.
Some examples of shared-memory computers:

- SGI Origin2000: an effective combination of shared-memory and distributed-memory architecture. Memory is physically distributed among the nodes, with 2 processors per node. The processors of a node have equal access to that node's local memory. From the shared-memory point of view, all nodes have the same access rights to the physically distributed memory (http://www.cray.com/products/systems/origin2000).
- Sun HPC servers, such as the Enterprise 3000 (1 to 6 processors) or the Enterprise 10000 (4 to 64 processors) (http://www.sun.com/servers).
- HP Exemplar series, such as the S-class (4 to 16 processors) and the X-class (up to 64 processors) (http://www.hp.com/pressrel/sep96/30sep96a.htm).
- DEC Ultimate Workstation: 2 processors, each with a very high clock rate (533 MHz) (http://www.workstation.digital.com/products/uwseries/uwproduct.html).
Figure 10: The fork-join model

In the fork-join model, every OpenMP program starts as a single thread of execution, the master thread. The master thread runs sequentially until it encounters a directive that declares a parallel region.

Fork: on reaching the parallel directive, the master thread creates a team of parallel threads. The statements inside the parallel region are then executed in parallel by the threads of this team.

Join: when the threads have finished their work, they synchronize and terminate, leaving only the master thread.
    f77:   !$OMP PARALLEL
    f77:       call work(x,y)
    f77:   !$OMP END PARALLEL

    C/C++: #pragma omp parallel
    C/C++: {
    C/C++:     work(x,y);
    C/C++: }

The following sections summarize the basic directives used when programming with OpenMP (in C/C++).
Figure 11: Illustration of a parallelized region

Syntax:

    #pragma omp parallel [clause [clause]] <new_line>
        structured block

where clause can be:

    private
    shared
    default
    firstprivate
    reduction
    if (scalar_logical_expression)
    copyin
- firstprivate (list): like private, except that each thread's copy of the listed variables is initialized with the value of the original variable.
- lastprivate (list): like private, except that on leaving the loop or the last section the original variables in the list receive the value from the final iteration or section.
- shared (list): all threads access the same variables named in the list. A shared variable occupies one specific location in memory, and every thread can read and write it through that address. The programmer must make sure that the threads access shared variables in a proper way.
- default (shared | none): sets the default attribute for all variables used in the parallel region. Variables declared threadprivate are not affected by default. Variables can also be declared private or shared explicitly, without a default clause.
- reduction (operator : list): each variable in list (a shared variable) gets a private copy in every thread. The threads compute with and write to their private copies. At the end of the reduction, the shared variable is obtained by combining the private copies of all threads with operator. The operator can be, for example, +, *, -, max, min.
- schedule (type [, chunk_size]): specifies how the iterations of a for loop are divided among the threads; it is typically used to balance the load. type can be static, dynamic, guided, or runtime.
    o static: if chunk_size is not given, it is set to CEILING(total number of iterations / number of threads). The chunks are assigned to the threads in turn (round-robin).
    o dynamic: if chunk_size is not given, it is set to 1. Chunks are handed out on a first-come, first-served basis: whichever thread is idle or arrives first takes the next chunk.
    o guided: if chunk_size is not given, it is set to 1. The iterations are divided into successive chunks (in increasing index order) whose size decreases exponentially; chunk_size is the size of the smallest chunk. The first chunk has CEILING(number of iterations / number of threads) iterations, and each later chunk has CEILING(number of remaining iterations / number of threads) iterations. At run time, if there are more chunks than threads, a thread that finishes its chunk takes the next unassigned chunk.
    o runtime: the chunk size is determined when the program runs. The schedule type is static unless another one is given through the environment variable OMP_SCHEDULE (so it can be DYNAMIC, GUIDED, ...).
- threadprivate:

      #pragma omp threadprivate (list) <new_line>

  The threadprivate directive makes variables with global scope local to each thread, and these copies persist across parallel regions. The directive must appear immediately after the declaration of the variables. From then on, each thread works on its own copy.
- copyin (list): before the parallel execution starts, assigns to the threadprivate variables of every thread the values of the original variables in the master thread. list names the variables to copy.
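As an illustration of the reduction and schedule clauses described above, the sketch below sums an array. The function name `array_sum` is hypothetical; the pragma uses only clauses from the list above, and if the compiler does not enable OpenMP the pragma is ignored and the loop runs sequentially with the same result.

```c
/* Sum of a[0..n-1]. Each thread accumulates into its own private copy of
   sum; the private copies are combined with + when the loop ends. */
double array_sum(const double *a, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

Without the reduction clause, `sum` would be an ordinary shared variable and the concurrent updates would race with each other.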
Figure 12: Illustration of the Do/for directive

Syntax:

    #pragma omp for [clause [clause]] <new_line>
        C/C++ for loop

where clause can be:

    private (list)
    firstprivate (list)
    lastprivate (list)
    reduction (operator:list)
    schedule (type [,chunk_size])
    ordered
    nowait

The meaning of private, firstprivate, lastprivate, reduction and schedule is described in Section 2.6.3.
    o ordered: must be present when the ordered directive is used inside the for loop.
    o nowait: tells the threads that they need not synchronize at the end of the parallel loop. They continue with the statements after the loop without waiting for any other thread.

The parallel declaration and the work-sharing directive can be combined as follows:

    #pragma omp parallel for [clauses] <new_line>
        for loop
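The effect of nowait can be seen in a small sketch with two work-sharing loops inside one parallel region. The function name `fill_arrays` is hypothetical. Dropping the barrier after the first loop is safe here only because the second loop never touches the array written by the first.

```c
/* Fill two unrelated arrays inside a single parallel region. */
void fill_arrays(double *a, double *b, int n)
{
    #pragma omp parallel
    {
        /* nowait: a thread that finishes its share of this loop moves on
           to the next loop without waiting for the other threads. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * i;

        /* This loop keeps its implicit barrier, so both arrays are
           complete when the parallel region ends. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = i + 1.0;
    }
}
```

If the second loop read `a[]`, the nowait clause would have to be removed, since a thread could otherwise read elements that another thread has not written yet.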
2.6.4.2 Sections
The sections directive specifies how blocks of code are divided among the threads. A sections declaration may contain several section blocks that are independent of one another. Each section is executed once, by one thread. If a thread is fast enough and the implementation permits it, one thread may execute more than one section. If there are more threads than sections, some threads may stay idle. If there are fewer threads than sections, the implementation decides how the sections are executed.
Figure 13: Illustration of the sections directive

Syntax:

    #pragma omp sections [clause[ clause] ...] <new-line>
    {
        [#pragma omp section <new-line>]
            structured-block
        [#pragma omp section <new-line>
            structured-block
        ...]
    }

where clause can be:

    private
    lastprivate
    firstprivate
    reduction
    nowait

The meaning of these clauses is the same as described in the previous sections. The sections declaration can be combined with the parallel declaration:

    #pragma omp parallel sections [clauses] <new-line>
    {
        [#pragma omp section <new-line>]
            structured-block
        [#pragma omp section <new-line>
            structured-block
        ...]
    }
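A minimal sketch of the combined parallel sections form: two independent scans of the same array, each placed in its own section. The function name `min_and_max` is hypothetical, and the sketch assumes n >= 1.

```c
/* Find the smallest and the largest element of a[0..n-1] (n >= 1).
   The two scans do not depend on each other, so each is a section and
   may be executed by a different thread. */
void min_and_max(const double *a, int n, double *min_out, double *max_out)
{
    double lo = a[0], hi = a[0];
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 1; i < n; i++)
                if (a[i] < lo) lo = a[i];
        }
        #pragma omp section
        {
            for (int i = 1; i < n; i++)
                if (a[i] > hi) hi = a[i];
        }
    }
    *min_out = lo;
    *max_out = hi;
}
```

Both `lo` and `hi` are shared, but each is written by exactly one section, so no further synchronization is needed.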
2.6.4.3 Single
The single directive specifies that the enclosed code is executed by exactly one thread. Unless the nowait clause is given, the other threads do not execute the block and wait at its end until it has been completed.
Figure 14: Illustration of the single directive

Syntax:

    #pragma omp single [clauses] <new-line>
        structured-block

where clause can be:

    private
    firstprivate
    nowait
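The sketch below uses single for a one-time setup step inside a parallel region. The function name `scale_all` is hypothetical; the point is the implicit barrier at the end of the single block, which guarantees that every thread sees the value before the loop starts.

```c
/* Scale every element of a[0..n-1] by a factor that is set up once. */
void scale_all(double *a, int n, double factor_in)
{
    double factor = 0.0;               /* shared by all threads */
    #pragma omp parallel
    {
        /* Exactly one thread performs the setup. There is no nowait
           clause, so the other threads wait at the end of the block. */
        #pragma omp single
        {
            factor = factor_in;
        }

        /* After the implicit barrier every thread sees the new factor. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] *= factor;
    }
}
```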
2.6.5 Synchronization directives

Consider the following simple example: 2 threads running on 2 different processors both increment the variable x at the same time (assume x = 0 initially).

    THREAD 1:                       THREAD 2:
    increment(x)                    increment(x)
    {                               {
        x = x + 1;                      x = x + 1;
    }                               }

    THREAD 1:                       THREAD 2:
    10 LOAD A, (x address)          10 LOAD A, (x address)
    20 ADD A, 1                     20 ADD A, 1
    30 STORE A, (x address)         30 STORE A, (x address)

The following interleaving can occur:

    thread1 loads the value of x into register A
    thread2 loads the value of x into register A
    thread1 adds 1 to the value in register A
    thread2 adds 1 to the value in register A
    thread1 stores the value of register A at the address of x
    thread2 stores the value of register A at the address of x

Result: x = 1, not 2 as expected. To avoid this situation, the increment of x must be synchronized between the threads so that the result is correct. The directives related to synchronization are described below.
2.6.5.1 Master
The master directive specifies that the enclosed code is executed only by the master thread. The other threads skip this code and continue normally.

Syntax:

    #pragma omp master <new-line>
        structured-block
2.6.5.2 Critical
The critical directive specifies that the enclosed code may be executed by only one thread at a time. The other threads must wait until no thread is executing that code.

Syntax:

    #pragma omp critical [(name)] <new-line>
        structured-block
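As a sketch of critical, the function below searches for the largest element of an array while protecting the shared result. The names `argmax` and `best_update` are hypothetical, and the sketch assumes n >= 1 and distinct values. A critical section inside every iteration serializes most of the work, so this illustrates the directive rather than an efficient search.

```c
/* Index of the largest element of a[0..n-1] (n >= 1, distinct values). */
int argmax(const double *a, int n)
{
    int best = 0;                      /* shared by all threads */
    #pragma omp parallel for
    for (int i = 1; i < n; i++) {
        /* Only one thread at a time may compare against and update best. */
        #pragma omp critical (best_update)
        {
            if (a[i] > a[best])
                best = i;
        }
    }
    return best;
}
```

The comparison and the assignment must be inside the same critical block; protecting only the assignment would still allow two threads to act on a stale value of `best`.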
2.6.5.3 Barrier
When a thread reaches a barrier directive, it must wait until all the other threads have reached the barrier as well.

Syntax:

    #pragma omp barrier <new-line>
2.6.5.4 Atomic
The atomic directive specifies that a particular memory location is updated atomically, so that several threads cannot update it at the same moment. The directive applies only to a single statement.

Syntax:

    #pragma omp atomic <new-line>
        statement_expression

The statement can be ++x, x++, --x or x--, or an update of the form x binop= expr, where binop is one of the operators +, -, *, /, &, ^, |, >> or <<.
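The race on `x = x + 1` shown at the start of this section is exactly what atomic prevents. The sketch below counts values into bins; the function name `histogram` is hypothetical, and the sketch assumes every value lies in the interval [0, 1).

```c
/* Count how many of v[0..n-1] fall into each of nbins equal-width bins
   on [0, 1). Assumes 0 <= v[i] < 1 for every i. */
void histogram(const double *v, int n, int *bins, int nbins)
{
    for (int b = 0; b < nbins; b++)
        bins[b] = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int b = (int)(v[i] * nbins);
        /* Two threads may hit the same bin, so the increment is atomic. */
        #pragma omp atomic
        bins[b]++;
    }
}
```

Compared with critical, atomic protects only the single update that follows it, which usually makes it the cheaper choice for simple counters.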
2.6.5.5 Flush
The flush directive writes the variables visible to a thread back to memory. With flush the programmer can control synchronization on shared memory directly. The optional list names the variables to be flushed; if it is omitted, all variables are written back to memory.

Syntax:

    #pragma omp flush [(list)] <new-line>
2.6.5.6 Ordered
The ordered directive specifies that the enclosed part of a loop is executed in the same order as it would be on a sequential processor. ordered may appear only inside a Do/for loop directive. At any moment, only one thread executes the code in the ordered block.

Syntax:

    #pragma omp ordered <new-line>
        structured-block
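A short sketch of ordered: the squares are computed in parallel, but they are appended to the output in iteration order. The function name `squares_in_order` is hypothetical. Note that the loop directive carries the ordered clause, as required when the ordered directive is used inside the loop.

```c
/* Store the squares 0, 1, 4, 9, ... in out[0..n-1] in iteration order. */
void squares_in_order(int *out, int n)
{
    int pos = 0;                       /* next free slot, shared */
    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; i++) {
        int sq = i * i;                /* may be computed in parallel */
        #pragma omp ordered
        {
            out[pos++] = sq;           /* executed in the order i = 0, 1, 2, ... */
        }
    }
}
```

Because only one thread at a time is inside the ordered block, the shared counter `pos` needs no additional protection.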
    int omp_in_parallel(void)
    void omp_set_dynamic(int dynamic_threads)
    int omp_get_dynamic(void)
    void omp_set_nested(int nested)
    int omp_get_nested(void)
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        int nthreads, tid;

        /* Fork a team of threads giving them their own copies of variables */
        #pragma omp parallel private(nthreads, tid)
        {
            /* Obtain thread number */
            tid = omp_get_thread_num();
            printf("Hello World from thread = %d\n", tid);

            /* Only master thread does this */
            if (tid == 0) {
                nthreads = omp_get_num_threads();
                printf("Number of threads = %d\n", nthreads);
            }
        }   /* All threads join master thread and disband */

        return 0;
    }
The program illustrates how the threads behave during parallel execution. The variable tid is declared private and holds the ID of each thread. The private variable nthreads holds the number of threads taking part in the parallel region.
On an IBM AIX machine, compile omp_hello.c with the command:

    xlc_r -qsmp=omp omp_hello.c -o hello

To run the program, type:

    ./hello

If the number of threads is not specified, the program uses the default number of threads of the shared-memory system. The number of threads can be set with the library function omp_set_num_threads(int num_threads), or by exporting the environment variable OMP_NUM_THREADS. For example, to use 4 threads in the example above:

    export OMP_NUM_THREADS=4

See http://www.navo.hpc.mil/Resources/Hardware/Romulus_Users_Guide.html#ProgEnv for more details on the compiler options.
2.7.3 Results

    Hello World from thread = 0
    Number of threads = 4
    Hello World from thread = 3
    Hello World from thread = 1
    Hello World from thread = 2
Figure 15: Tree data structures in treecode

The node structure holds the information that bodies and cells have in common. In principle, every element of the tree could be represented as a union of body and cell, but that representation is inefficient because body and cell need different amounts of memory. The node structure is therefore used as the common representation of bodies and cells. Type casts convert a pointer of arbitrary type into a pointer to a node, a body, or a cell.
    typedef struct _node {
        short type;
        bool update;
        real mass;
        vector pos;
        struct _node *next;
    } node, *nodeptr;

    #define Type(x)   (((nodeptr) (x))->type)
    #define Update(x) (((nodeptr) (x))->update)
    #define Mass(x)   (((nodeptr) (x))->mass)
    #define Pos(x)    (((nodeptr) (x))->pos)
    #define Next(x)   (((nodeptr) (x))->next)
where:

    Type(q) returns the type of node q, which is either CELL or BODY.
    Update(q) is a boolean that tells whether the force on q needs to be updated.
    Next(q) is a pointer to the node that follows q once all the children of q have been visited.
    Mass(q) is the mass of body q, or the total mass of all the bodies in cell q.
    Pos(q) is the position of body q, or the position of the center of mass of cell q.

The body structure represents the bodies.
    typedef struct {
        node bodynode;
        vector vel;
        vector acc;
        real phi;
    } body, *bodyptr;

    #define Vel(x) (((bodyptr) (x))->vel)
    #define Acc(x) (((bodyptr) (x))->acc)
    #define Phi(x) (((bodyptr) (x))->phi)
where:

    Vel(b) is the velocity of body b.
    Acc(b) is the acceleration of body b.
    Phi(b) is the potential of body b.

The cell structure represents the internal nodes of the tree.
    #define NSUB (1 << NDIM)

    typedef struct {
        node cellnode;
    #if !defined(QUICKSCAN)
        real rcrit2;
    #endif
        nodeptr more;
        union {
            nodeptr subp[NSUB];
            matrix quad;
        } sorq;
    } cell, *cellptr;

    #if !defined(QUICKSCAN)
    #define Rcrit2(x) (((cellptr) (x))->rcrit2)
    #endif
    #define More(x) (((cellptr) (x))->more)
    #define Subp(x) (((cellptr) (x))->sorq.subp)
    #define Quad(x) (((cellptr) (x))->sorq.quad)
where:

    Subp(c) is the array of pointers to the children of c.
    More(c) is a pointer to the first of the children of c.
    Quad(c) is the matrix of quadrupole moments.
    Rcrit2(c) is the square of the critical radius: for a point outside this radius, cell c can be treated as a single cell interaction.
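The two ideas behind these structures, a common header that makes pointer casts valid and the More/Next threading of the tree, can be illustrated with simplified stand-in types. The names `mini_node`, `mini_body`, `mini_cell`, `AS_NODE` and `children_mass` below are hypothetical and much smaller than the real treecode definitions.

```c
#include <stddef.h>

enum { BODY = 1, CELL = 2 };           /* node kinds, as returned by Type(q) */

/* Simplified stand-ins for the treecode types (hypothetical names). */
typedef struct mini_node {
    short type;
    double mass;
    struct mini_node *next;            /* node to visit after this subtree */
} mini_node;

typedef struct {
    mini_node header;                  /* must be the first member */
    double vel;
} mini_body;

typedef struct {
    mini_node header;                  /* must be the first member */
    mini_node *more;                   /* first child */
} mini_cell;

/* Because the header is the first member, a pointer to a body or a cell
   may be converted to a pointer to the common node type. */
#define AS_NODE(p) ((mini_node *) (p))

/* Add up the masses of the direct children of c by following the
   threading: start at more, follow next, stop at the cell's own next. */
double children_mass(const mini_cell *c)
{
    double total = 0.0;
    for (const mini_node *q = c->more; q != c->header.next; q = q->next)
        total += q->mass;
    return total;
}
```

The loop in `children_mass` has the same shape as the loop over subcells in walktree, `for (q = More(*ap); q != Next(*ap); q = Next(q))`.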
    actmax: the maximum length of the active list during the force calculation
    nbbcalc: the number of body-body interactions
    nbccalc: the number of body-cell interactions
    cpuforce: the CPU time needed to compute the forces
Run make to compile the source files together:

    $ make treecode

Run the program with:

    ./treecode [parameters]

The parameters can be listed with the command:

    ./treecode -help
    treecode               Hierarchical N-body code (theta scan)
      in=                  Input file with initial conditions
      out=                 Output file of N-body frames
      dtime=1/32           Leapfrog integration timestep
      eps=0.025            Density smoothing length
      theta=1.0            Force accuracy parameter
      usequad=false        if true, use quad moments
      options=             Various control options
      tstop=2.0            Time to stop integration
      dtout=1/4            Data output timestep
      nbody=4096           Number of bodies for test run
      seed=123             Random number seed for test run
      save=                Write state file as code runs
      restore=             Continue run from state file
      VERSION=1.4          Joshua Barnes  February 21 2001
When the program runs, the results of the computation are printed to the screen in the following form:
    Hierarchical N-body code (theta scan)

       nbody     dtime       eps     theta   usequad     dtout     tstop
        4096   0.03125    0.0250      1.00     false   0.25000    2.0000

       rsize  tdepth   ftree  actmax    nbbtot    nbctot   CPUfc  CPUtot
        64.0      13   3.050    1114   1051990   1417390   0.003   0.004

        time   |T+U|       T      -U    -T/U  |Vcom|  |Jtot|
       0.000 0.24032 0.25082 0.49114 0.51069 0.00000 0.00576

       rsize  tdepth   ftree  actmax    nbbtot    nbctot   CPUfc  CPUtot
        64.0      12   3.108    1121   1054863   1419799   0.003   0.007

        time   |T+U|       T      -U    -T/U  |Vcom|  |Jtot|
       0.031 0.24028 0.25075 0.49104 0.51066 0.00001 0.00576

       rsize  tdepth   ftree  actmax    nbbtot    nbctot   CPUfc  CPUtot
        64.0      13   3.057    1108   1040044   1422316   0.003   0.011

        time   |T+U|       T      -U    -T/U  |Vcom|  |Jtot|
       0.062 0.24029 0.25069 0.49098 0.51059 0.00001 0.00576
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self               self     total
     time   seconds   seconds     calls   s/call   s/call  name
    96.02    129.09    129.09        65     1.99     1.99  walktree
     0.96    130.37      1.29   2466730     0.00     0.00  subindex
     0.95    131.65      1.28        65     0.02     0.02  diagnostics
     0.76    132.68      1.03    266240     0.00     0.00  loadbody
     0.57    133.44      0.76        65     0.01     0.01  hackcofm
     0.21    133.72      0.28        65     0.00     0.00  threadtree
     0.19    133.97      0.25        65     0.00     0.00  newtree
     0.13    134.15      0.18        64     0.00     2.05  stepsystem
     0.10    134.28      0.13        65     0.00     0.00  expandbox
     0.05    134.35      0.07    130637     0.00     0.00  makecell
     0.03    134.39      0.04     69477     0.00     0.00  xrandom
     0.01    134.41      0.02    130637     0.00     0.00  setrcrit
     0.01    134.43      0.02      8192     0.00     0.00  fpickshell
     0.01    134.45      0.02        65     0.00     2.05  treeforce
     0.01    134.46      0.01         1     0.01     0.07  testdata
This result was obtained by compiling and running the program on an Intel machine with 1 CPU (Pentium 4, 2.26 GHz), 240 MB of RAM, running Linux. The profile shows that the function walktree accounts for 96.02% of the total running time of the program, while the share of every other function is very small. Treecode already runs faster and handles errors better than earlier N-body simulation programs. To optimize it further, we experiment with parallelizing the program with OpenMP on an Intel machine with 4 CPUs in order to increase its computing performance.
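The profile also indicates how much can be gained at best. As a rough estimate, Amdahl's law bounds the speedup S(p) on p processors when a fraction f of the running time can be parallelized. The calculation below assumes, optimistically, that all of walktree (f = 0.9602) parallelizes perfectly and that everything else stays sequential; it is an idealized bound, not a measurement.

```latex
S(p) = \frac{1}{(1 - f) + \dfrac{f}{p}}, \qquad
S(4) = \frac{1}{0.0398 + \dfrac{0.9602}{4}} \approx 3.57, \qquad
\lim_{p \to \infty} S(p) = \frac{1}{0.0398} \approx 25.1
```

On 4 CPUs the speedup can therefore not exceed about 3.57 under these assumptions, and synchronization overhead and load imbalance would reduce it further.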
The nodes are connected to one another through an HPS (High Performance Switch) with a bandwidth of 2 GBps, and through Gigabit Ethernet. Shared storage: IBM DS4400 and EXP700, connected to the IBM 1600 cluster by optical fiber with a bandwidth of 2 Gbps. The nodes run the AIX 5L operating system, version 5.2.

The profile of treecode on IBM AIX is:
      %   cumulative   self                 self     total
     time   seconds   seconds       calls  ms/call  ms/call  name
     42.3     13.85     13.85                                 .sqrt [8]
     34.8     25.23     11.38      532480     0.02     0.02  .sumnode [9]
      7.6     27.73      2.50                                 .__mcount [11]
      5.4     29.50      1.77   161645903     0.00     0.00  .sqrtf [12]
      4.0     30.81      1.31    16072949     0.00     0.00  .accept [13]
      2.6     31.65      0.84      570149     0.00     0.03  .walktree_13_6 <cycle 1> [7]
      0.8     31.91      0.26                                 .qincrement [15]
      0.5     32.09      0.18          65     2.77     2.77  .diagnostics [18]
      0.4     32.22      0.13     2480994     0.00     0.00  .subindex [19]
      0.4     32.34      0.12                                 .__stack_pointer [20]
      0.4     32.46      0.12                                 .qincrement1 [21]
      0.3     32.55      0.09      266240     0.00     0.00  .loadbody [16]
      0.2     32.61      0.06          65     0.92     1.25  .hackcofm [22]
      0.1     32.64      0.03          65     0.46     0.46  .threadtree [25]
With the Intel compiler on a machine with 4 Intel(R) Xeon(TM) CPUs at 2.40 GHz, the share of the running time taken by each treecode function is:
    Flat profile:

    Each sample counts as 0.00195312 seconds.
      %   cumulative   self                self     total
     time   seconds   seconds      calls  ms/call  ms/call  name
    73.87      9.13      9.13     532480     0.02     0.02  sumnode
    12.84     10.71      1.59   16013212     0.00     0.00  accept
    10.21     11.97      1.26     569962     0.00     0.02  walktree
     0.84     12.08      0.10    2466729     0.00     0.00  subindex
     0.47     12.14      0.06     266240     0.00     0.00  loadbody
     0.46     12.19      0.06     303722     0.00     0.00  walksub
     0.27     12.23      0.03         65     0.51     0.51  diagnostics
     0.25     12.26      0.03        520     0.06     0.07  hackcofm
     0.22     12.28      0.03        520     0.05     0.05  threadtree
     0.19     12.31      0.02     266240     0.00     0.03  gravsum
     0.14     12.33      0.02         64     0.27   189.50  stepsystem
With different compilers, then, the time spent in each treecode function differs completely. This makes it difficult to decide which function is the most expensive and how the parallelization should be carried out.
The function walktree computes the gravitational force on all the bodies in node p by recursively traversing p and its children. At each point of the recursive traversal, the information from the nodes between the root and p is stored in a set of nodes. This set is the interaction list. It is split into two separate lists, one of cells and one of bodies, addressed by the pointers cptr and bptr respectively. The rest of the tree is represented by a set of active nodes, which consists of node p and the nodes that surround it in space. The pointers to these nodes are stored in the array between aptr and nptr. Node p has size psize and position pmid.

In its main loop, walktree goes through all the active nodes of p and decides which nodes are added to the interaction list and which nodes are so close to p that their children must be examined at the next level of the recursion. Cells are tested by the function accept. If a cell is far enough from p, that is, the ratio D/r is small, the cell is added to the interaction list of p. Otherwise its children are added to the new list of active nodes, to be examined at the next level. If a new active list has been created, the traversal continues recursively at the next level through a call to walksub, which calls walktree on the children of p. Otherwise, if there is no new active list, p is examined: if p is a body, the force on p is computed by calling gravsum. The prototype of walksub is:
void walksub(nodeptr *nptr, nodeptr *np, cellptr cptr, cellptr bptr,
             nodeptr p, real psize, vector pmid);
The parameters of walksub have the same values as the parameters of walktree at the point of the call. Two cases arise. If p is a cell, walksub loops over all children of p and calls walktree on each child. If p is a body, walksub calls walktree exactly once, to finish traversing its active list.
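The acceptance test is described above only as "the ratio D/r is small". A minimal sketch of that idea follows, assuming the classic point-based form of the criterion: a cell of side length D is accepted when D/r is below a threshold, where r is the distance from the cell's center of mass to the point of interest. The threshold name theta, the helper cell_is_far, and the use of plain double arrays are assumptions made for illustration. The accept function in treecode has a different signature (it receives psize and pmid) and tests against the whole node p rather than a single point.

```c
#include <assert.h>
#include <stdbool.h>

#define NDIM 3   /* number of spatial dimensions, as in the listings */

/*
 * Returns true when a cell of side length cell_size (D), with center of
 * mass cell_com, is far enough from pos that D / r < theta.
 * Hypothetical helper: squared quantities are compared so that no square
 * root or division is needed and r == 0 is handled safely.
 */
static bool cell_is_far(double cell_size, const double cell_com[NDIM],
                        const double pos[NDIM], double theta)
{
    assert(theta > 0.0);
    double r2 = 0.0;                       /* squared distance r^2 */
    for (int k = 0; k < NDIM; k++) {
        double d = cell_com[k] - pos[k];
        r2 += d * d;
    }
    /* D / r < theta  <=>  D^2 < theta^2 * r^2 (all quantities non-negative) */
    return cell_size * cell_size < theta * theta * r2;
}
```

With theta = 0.5, a unit cell whose center of mass is 10 units away is accepted, while the same cell 1 unit away is rejected and would have its children placed on the active list instead.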
local void walktree(nodeptr *aptr, nodeptr *nptr, cellptr cptr, cellptr bptr,
                    nodeptr p, real psize, vector pmid)
{
    nodeptr *np, *ap, q;
    int actsafe;

    if (Update(p)) {                            /* are new forces needed?   */
        np = nptr;                              /* start new active list    */
        actsafe = actlen - NSUB;                /* leave room for NSUB more */
        for (ap = aptr; ap < nptr; ap++)        /* loop over active nodes   */
            if (Type(*ap) == CELL) {            /* is this node a cell?     */
                if (accept(*ap, psize, pmid)) { /* does it pass the test?   */
                    Mass(cptr) = Mass(*ap);     /* copy to interaction list */
                    SETV(Pos(cptr), Pos(*ap));
                    SETM(Quad(cptr), Quad(*ap));
                    cptr++;                     /* and bump cell array ptr  */
                } else {                        /* else it fails the test   */
                    if (np - active >= actsafe) /* check list has room      */
                        error("walktree: active list overflow\n");
                    for (q = More(*ap); q != Next(*ap); q = Next(q))
                                                /* loop over all subcells   */
                        *np++= q;               /* put on new active list   */
                }
            } else                              /* else this node is a body */
                if (*ap != p) {                 /* if not self-interaction  */
                    --bptr;                     /* bump body array ptr      */
                    Mass(bptr) = Mass(*ap);     /* and copy data to array   */
                    SETV(Pos(bptr), Pos(*ap));
                }
        actmax = MAX(actmax, np - active);      /* keep track of max active */
        if (np != nptr)                         /* if new actives listed    */
            walksub(nptr, np, cptr, bptr, p, psize, pmid);
                                                /* then visit next level    */
        else {                                  /* else no actives left, so */
            if (Type(p) != BODY)                /* must have found a body   */
                error("walktree: recursion terminated with cell\n");
            gravsum((bodyptr) p, cptr, bptr);   /* sum force on the body    */
        }
    }
}
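In the listing above, cells are appended with cptr++ and bodies are prepended with --bptr, which suggests that both interaction lists live in one buffer filled from opposite ends. The sketch below illustrates that layout in isolation. The types entry and interaction_list and the helper names are hypothetical, and the explicit "full" check is an addition for the sketch; walktree itself only checks the active list for overflow.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double mass;               /* stand-in for the mass/position data */
} entry;

typedef struct {
    entry *base;               /* first slot of the shared buffer       */
    entry *end;                /* one past the last slot                */
    entry *cptr;               /* next free cell slot, grows upward     */
    entry *bptr;               /* last stored body slot, grows downward */
} interaction_list;

static void list_init(interaction_list *l, entry *storage, size_t capacity)
{
    assert(l != NULL && storage != NULL);
    l->base = storage;
    l->end  = storage + capacity;
    l->cptr = l->base;         /* cells start at the front */
    l->bptr = l->end;          /* bodies start at the back */
}

/* Append a cell at the front; returns 0 if the two ends have met. */
static int push_cell(interaction_list *l, double mass)
{
    if (l->cptr == l->bptr)
        return 0;
    l->cptr->mass = mass;
    l->cptr++;
    return 1;
}

/* Prepend a body at the back; returns 0 if the two ends have met. */
static int push_body(interaction_list *l, double mass)
{
    if (l->cptr == l->bptr)
        return 0;
    --l->bptr;
    l->bptr->mass = mass;
    return 1;
}

static size_t cell_count(const interaction_list *l)
{
    return (size_t)(l->cptr - l->base);
}

static size_t body_count(const interaction_list *l)
{
    return (size_t)(l->end - l->bptr);
}
```

Because walktree receives cptr and bptr by value, whatever a deeper recursion level adds is discarded automatically when that call returns. The buffer itself is still shared by all levels, which is worth keeping in mind when sibling subtrees are processed concurrently.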
local void walksub(nodeptr *nptr, nodeptr *np, cellptr cptr, cellptr bptr,
                   nodeptr p, real psize, vector pmid)
{
    real poff;
    nodeptr q;
    int k;
    vector nmid;

    poff = psize / 4;                           /* precompute mid. offset   */
    if (Type(p) == CELL) {                      /* fanout over descendents  */
        for (q = More(p); q != Next(p); q = Next(q)) {
                                                /* loop over all subcells   */
            for (k = 0; k < NDIM; k++)          /* locate each's midpoint   */
                nmid[k] = pmid[k] + (Pos(q)[k] < pmid[k] ? - poff : poff);
            walktree(nptr, np, cptr, bptr, q, psize / 2, nmid);
                                                /* recurse on subcell       */
        }
    } else {                                    /* extend virtual tree      */
        for (k = 0; k < NDIM; k++)              /* locate next midpoint     */
            nmid[k] = pmid[k] + (Pos(p)[k] < pmid[k] ? - poff : poff);
        walktree(nptr, np, cptr, bptr, p, psize / 2, nmid);
                                                /* and search next level    */
    }
}
... the work enclosed in a task declaration is, in theory, placed on the queue. The queue terminates once all of this work has been completed. Using a work queue, the tree walk can therefore be modified as follows: in the function gravcalc(), the call to walktree is wrapped in the taskq and task directives (the pragmas of Intel's work-queuing extension to OpenMP).
void gravcalc(void)
{
    .
    .
    active[0] = (nodeptr) root;                 /* initialize active list   */
    CLRV(rmid);                                 /* set center of root cell  */
    /* Add parallel region */
    #pragma omp parallel
    {
        #pragma intel omp taskq
        {
            #pragma intel omp task
            {
                walktree(active, active + 1, interact, interact + actlen,
                         (nodeptr) root, rsize, rmid);
                                                /* scan tree, update forces */
            }
        }
    }   /* end of parallel region */
    cpuforce = cputime() - cpustart;            /* store CPU time w/o alloc */
    free(active);
    free(interact);
}
local void walksub(nodeptr *nptr, nodeptr *np, cellptr cptr, cellptr bptr,
                   nodeptr p, real psize, vector pmid)
{
    if (Type(p) == CELL) {                      /* fanout over descendents  */
        /* add parallel region */
        #pragma intel omp parallel taskq shared(q)
        {
            for (q = More(p); q != Next(p); q = Next(q)) {
                #pragma intel omp task captureprivate(q)
                {
                    for (k = 0; k < NDIM; k++)
                        nmid[k] = pmid[k] + (Pos(q)[k] < pmid[k] ? - poff : poff);
                    walktree(nptr, np, cptr, bptr, q, psize / 2, nmid);
                }
            }
        }   /* end of parallel region */
    } else {
        for (k = 0; k < NDIM; k++)
            nmid[k] = pmid[k] + (Pos(p)[k] < pmid[k] ? - poff : poff);
        walktree(nptr, np, cptr, bptr, p, psize / 2, nmid);
    }
}
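The taskq and task pragmas in the listing above are specific to the Intel compiler. For comparison, the same fan-out pattern (one task per child) can be written with the task construct standardized in OpenMP 3.0. The sketch below is not the thesis code: it uses a hypothetical toy_node tree and sums masses instead of computing forces, and the names toy_node, subtree_mass and tree_mass, as well as NSUB = 8, are assumptions for illustration. Each task writes only to its own slot of part, so no two tasks write to the same location. In a real port of walksub, per-child scratch data such as nmid would presumably need similar per-task treatment.

```c
#include <assert.h>
#include <stddef.h>

#define NSUB 8                       /* children per cell (octree), assumed */

typedef struct toy_node {
    double mass;                     /* meaningful for leaves (bodies)      */
    struct toy_node *child[NSUB];    /* all NULL when the node is a body    */
} toy_node;

/* Sum the masses of all bodies below p, spawning one task per child. */
static double subtree_mass(const toy_node *p)
{
    double part[NSUB] = {0.0};       /* one result slot per child           */
    int nchild = 0;

    assert(p != NULL);
    for (int i = 0; i < NSUB; i++) {
        if (p->child[i] == NULL)
            continue;
        nchild++;
        /* Each task gets its own copy of i and p and writes only part[i]. */
        #pragma omp task shared(part) firstprivate(i, p)
        part[i] = subtree_mass(p->child[i]);
    }
    #pragma omp taskwait             /* wait for this node's child tasks    */

    if (nchild == 0)
        return p->mass;              /* a body contributes its own mass     */

    double total = 0.0;
    for (int i = 0; i < NSUB; i++)
        total += part[i];
    return total;
}

/* Entry point: one thread starts the walk, the team executes the tasks. */
double tree_mass(const toy_node *root)
{
    double total = 0.0;
    #pragma omp parallel
    {
        #pragma omp single
        total = subtree_mass(root);
    }
    return total;
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result, which makes it easy to compare the two builds.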
Once the program has been compiled, the regions marked by the OpenMP directives are executed in parallel. The experimental results are given below.
Flat profile:

Each sample counts as 0.00195312 seconds.
  %    cumulative    self                 self     total
 time    seconds    seconds      calls  ms/call  ms/call  name
 74.84     10.43      10.43     532480     0.02     0.02  sumnode
 11.18     11.99       1.56     569962     0.00     0.02  walktree
 10.83     13.50       1.51   16013212     0.00     0.00  accept
  0.68     13.59       0.09    2466729     0.00     0.00  subindex
  0.48     13.66       0.07     266240     0.00     0.00  loadbody
  0.29     13.70       0.04                                _walksub_202__task4
  0.27     13.73       0.04        520     0.07     0.09  hackcofm
  0.24     13.77       0.03     266240     0.00     0.04  gravsum
  0.22     13.80       0.03     130637     0.00     0.00  _walksub_199__taskq3
  0.20     13.82       0.03         65     0.42     0.42  diagnostics
Thus, because the outcome depends on the particular compiler and on the configuration of each machine, the experimental results obtained on the Intel multiprocessor are only relative in nature.
CONCLUSION
Results achieved
After a period of study, research and evaluation, I find that the Barnes-Hut algorithm and its improvements make an important contribution to solving the N-body problem, with a complexity of only O(N log N). Through this study I also find that OpenMP is a simple, easy-to-use application programming interface for parallel programming. It gives users a flexible and highly portable interface for building and developing parallel applications on shared-memory computer architectures. Implementing and testing an improved version of the Barnes-Hut algorithm on Intel and IBM multiprocessor machines also gave me practical experience and produced a number of experimental results. From these I analyzed and identified several difficulties that arise when carrying out the parallelization. Although the experimental results are not yet strong, through this work I gained experience and broadened my knowledge of high-performance computing on multiprocessor machines, as well as experience in using the Linux operating system.