
VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY
UNIVERSITY OF SCIENCE
FACULTY OF INFORMATION TECHNOLOGY - DEPARTMENT OF COMPUTER SCIENCE

COURSE REPORT
DATA-PARALLEL PROGRAMMING ON GPUs

OpenCL

Project team:

Member | Contribution
Nguyn Hng Sn | 1/3
Nguyn Trung Tn | 1/3
Trn Quc T | 1/3

Contents
I. Introduction to OpenCL
1. Overview
2. History
3. Characteristics
3.1. OpenCL, an open programming standard
3.2. Making full use of all available computing resources
4. Language
5. Platform
6. Scope of OpenCL
II. OpenCL architecture on Mac OS X
1. Mac OS X at a glance
2. Framework & Runtime
3. Compiler
4. Operation Model
4.1. Platform Model
4.2. Execution Model
4.3. Memory Model
4.4. Programming Model
III. OpenCL program development workflow
1. Steps for writing an OpenCL program
1.1. Identify which tasks can run in parallel
1.2. Write the kernels and helper functions
1.3. Set up the context
1.4. Write the code that compiles and builds the OpenCL program
1.5. Create the memory objects
1.6. Enqueue commands that control the sequential, synchronized execution of kernels, the reading and writing of data, and operations on memory objects
1.7. Read the returned values
2. Writing a kernel
3. Querying devices
4. Creating the OpenCL context
5. Creating the program object
6. Build Program Executable
7. Creating kernel objects
8. Creating memory objects
9. Executing kernels
9.1. Determine the dimensionality of the data
9.2. Determine the number of work-items
9.3. Choose the work-group size
9.4. Enqueue Kernel Execution
10. Retrieving the results
10.1. Wait until the kernels finish executing
10.2. Read the results
11. Releasing memory
12. Debugging OpenCL programs
IV. Performance
1. GPGPU Performance
1.1. Floating-point arithmetic
1.2. Bandwidth
1.3. Remarks
2. CPU Performance
2.1. Arithmetic
2.2. Bandwidth
2.3. Remarks
V. References


I. Introduction to OpenCL

1. Overview
OpenCL (Open Computing Language) is an open standard, published in December 2008, for parallel programming across devices, including GPUs. It was proposed by Apple, which then handed development over to the Khronos Group. Although it is still new, OpenCL already has broad support from hardware vendors.

List of hardware vendors supporting OpenCL


2. History


OpenCL was originally proposed and developed by Apple and later refined in collaboration with AMD, IBM, Intel and nVidia. Apple then handed development over to the Khronos Group, the organization that also maintains other open standards such as OpenGL and OpenAL.
16 June 2008: the Khronos Compute Working Group is formed, with representatives from CPU, GPU, embedded-device and other processor companies.
18 November 2008: the OpenCL 1.0 technical specification is presented.
8 December 2008: OpenCL 1.0 is officially released.
20 April 2009: nVidia releases an OpenCL driver and SDK in its OpenCL Early Access program.
5 August 2009: AMD introduces the first development tools for OpenCL as part of ATI Stream SDK v2.0 Beta.
3. Characteristics
3.1. OpenCL, an open programming standard
First and foremost, OpenCL is an open programming standard that is free for any hardware vendor to study and adopt. Accordingly, all OpenCL technical documentation is published free of charge on the Khronos Group website.
OpenCL is designed to be cross-platform, independent both of the hardware of the compute devices and of the operating system. This is why it has won the support of so many manufacturers.

3.2. Making full use of all available computing resources
OpenCL is designed to exploit every compute device capable of parallel execution. On a multi-core CPU, for example, OpenCL can be used to run tasks in parallel on that CPU. OpenCL also supports both task-parallel programming and data-parallel programming.

4. Language
OpenCL uses OpenCL C, a language based on the C99 standard, to write kernels. It uses IEEE 754, the floating-point arithmetic standard, for floating-point computation. As a result, the syntax closely follows C.
5. Platform
Because OpenCL is designed to be independent of the underlying hardware, it defines its own abstract hardware layer. This layer is fully decoupled from the hardware of the actual device.

Platform model:
- A host contains one or more Compute Devices (CPU / GPU / ...).
- A Compute Device consists of one or more Compute Units (CPU core / GPU SM / ...).
- A Compute Unit is divided into one or more Processing Elements (e.g. one SP in a GPU SM).
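The following plain-C sketch makes the hierarchy concrete. It uses hypothetical structs (ComputeUnit, ComputeDevice and Host are illustrative names, not OpenCL types) and counts the processing elements a host can reach. The sizes in the usage example are assumptions chosen for illustration. On a real system, these properties are queried with clGetDeviceInfo, for example with CL_DEVICE_MAX_COMPUTE_UNITS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types mirroring the platform model above:
   host -> compute devices -> compute units -> processing elements.
   They are illustrative only; they are not part of the OpenCL API. */
typedef struct {
    int processing_elements;          /* e.g. the SPs inside one GPU SM */
} ComputeUnit;

typedef struct {
    const ComputeUnit *units;         /* e.g. the SMs of a GPU or the cores of a CPU */
    size_t unit_count;
} ComputeDevice;

typedef struct {
    const ComputeDevice *devices;     /* e.g. one GPU and one CPU */
    size_t device_count;
} Host;

/* Walk the hierarchy and count every processing element the host can use. */
static int total_processing_elements(const Host *host)
{
    assert(host != NULL);
    int total = 0;
    for (size_t d = 0; d < host->device_count; ++d)
        for (size_t u = 0; u < host->devices[d].unit_count; ++u)
            total += host->devices[d].units[u].processing_elements;
    return total;
}
```

Consider an assumed host with a GPU of two SMs (eight SPs each) and a four-core CPU, both exposed as OpenCL devices. For this host, total_processing_elements returns 20.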


6. Scope of OpenCL
Hardware: OpenCL supports parallel programming on CPUs, GPUs and even embedded and mobile devices.
GPUs officially supported by OpenCL at the time of writing include:
Nvidia:

GeForce 9400M
GeForce 9600M GT
GeForce 8600M GT
GeForce GT 120
GeForce GT 130
GeForce GTX 285
GeForce 8800 GT
GeForce 8800 GS
Quadro FX 4800
Quadro FX 5600

In practice, nVidia chips of the G80 and G90 generations or newer, including GTX 200 series GPUs, can also support OpenCL, provided they support nVidia's CUDA technology.
ATI:
Radeon 4850
Radeon 4870

For CPUs, OpenCL supports chips from both of today's major manufacturers, Intel and AMD.
Because OpenCL is still quite new, relatively few hardware designers support it yet. The outlook is promising, however, because OpenCL is backed by more major players in the hardware industry than any other standard. Although OpenCL appeared only recently, these companies already officially support it in their high-performance product lines.
Operating systems: OpenCL runs on Mac OS X, Windows and Linux.

II. OpenCL architecture on Mac OS X


1. Mac OS X at a glance
Apple's Macintosh line is known for elegant, refined design, high-end hardware and close integration of hardware and software. Mac OS X contributes greatly to this success. It offers simplicity and stability, standardized color management and robustness. It also includes advanced technologies such as Grand Central Dispatch, 64-bit support and OpenCL. Mac OS X 10.6 Snow Leopard can be said to be the first operating system to build OpenCL directly into its core.

2. Framework & Runtime


The OpenCL framework in Mac OS X provides all the headers needed to compile OpenCL source code and to communicate with the OpenCL Runtime. With a single line, #include <OpenCL/opencl.h>, a program can use the OpenCL API without any further declarations.
As the accompanying figure shows, OpenCL's working model is similar to ATI's CAL and nVidia's CUDA.
The OpenCL runtime works directly with the hardware driver. Some argue that OpenCL is only a standard and a parallel programming language, and therefore cannot run faster than CAL/CUDA. That view is mistaken. Because the OpenCL runtime talks directly to the driver, it can be regarded as a peer of CAL/CUDA, and it is not unusual for it to outperform them.
Some hardware vendors, such as ATI, have added OpenCL to their parallel programming toolkits only at the level of syntax. These toolkits do not use an OpenCL runtime, and the layer that talks to the driver is still CAL. This is one reason why STREAM runs slower than OpenCL in some cases.

3. Compiler
The OpenCL compiler in Mac OS X is based on LLVM. When an OpenCL program is compiled, its instructions are first translated into an intermediate representation (IR). LLVM then translates and optimizes the IR into code suited to the device that will run the program. The key point is that the program is written once yet can run on many different hardware architectures. For each hardware configuration, the result of the first compilation is cached to avoid unnecessary recompilation.

4. Operation Model
The operation of OpenCL is described by a set of interrelated models: the Platform Model, the Execution Model, the Memory Model and the Programming Model.
4.1. Platform Model
As mentioned above, an OpenCL device works together with the host device, which controls the running program. When the program runs, the host creates an abstract environment, also called a virtual environment or context. The context provides the compute devices together with the memory regions the program will use. An ordered command queue is also created so that the program can schedule kernel executions and memory accesses.

Note: transfers between devices are much slower than communication between the components inside a device. Also, the host device itself (e.g. the CPU) can be an OpenCL device.

4.2. Execution Model

Kernel: a set of instructions written to run on a device that supports OpenCL (an OpenCL device). The collection of kernels and helper functions is called a Program.

When the program is compiled, each kernel becomes a kernel object and, likewise, the program becomes a program object.
Executing an OpenCL program means running many instances of a kernel concurrently on one or more OpenCL devices. These instances are submitted through a command queue controlled by the host application. Each instance of a kernel is called a work-item. Every work-item executes the same code, but on a different part of the data. On a GPU, each work-item runs on a single core (processing element) of a multiprocessor. To run a program on a device, we specify the number of work-items needed to process the data; this set is called the index space. OpenCL supports index spaces of up to three dimensions.
Work-items can be grouped into work-groups. OpenCL provides a mechanism to synchronize computation among the work-items of a work-group, but not between different work-groups.
Every work-item in the program has a unique identifier, its global ID, which addresses it in the index space. For example, a work-item in a two-dimensional index space with X = 23 and Y = 6 has global ID (23, 6). Likewise, each work-group has a unique identifier, its work-group ID, which gives the work-group's position in the index space.
OpenCL also identifies the position of a work-item within its work-group through its local ID.
The analogy with CUDA is direct: each work-item corresponds to a thread, and each work-group corresponds to a thread block.
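In one dimension, and assuming no global offset (OpenCL 1.0 has none), the three identifiers are related by global_id = group_id × local_size + local_id. In a kernel, these values come from the built-ins get_global_id, get_group_id, get_local_size and get_local_id. The plain-C sketch below applies this relation per dimension to the (23, 6) example above; the function names are hypothetical. It assumes a work-group size of 8 × 4, which the text does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C model of how a work-item's identifiers relate in one dimension
   (no global offset):  global_id = group_id * local_size + local_id     */
static size_t global_id_from(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Inverse mapping: recover the work-group ID and local ID of a global ID. */
static void split_global_id(size_t global_id, size_t local_size,
                            size_t *group_id, size_t *local_id)
{
    assert(local_size > 0 && group_id != NULL && local_id != NULL);
    *group_id = global_id / local_size;
    *local_id = global_id % local_size;
}
```

With the assumed 8 × 4 work-group, work-item (23, 6) lies in work-group (2, 1) at local ID (7, 2). Applying global_id_from to those values gives back 23 and 6.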
Memory object: a handle to a region of global memory (see 4.3) that stores application data in device memory for processing. There are two main kinds. Buffer objects can hold any kind of data, while image objects are used specifically for image data. The host application uses the command queue to read from and write to memory objects.
4.3. Memory Model
OpenCL divides memory into four regions:
- Global memory: can be read and written by all work-items in all work-groups. This is the memory allocated as described in the Platform Model.
- Constant memory: a region of global memory that work-items can only read and whose values stay constant throughout a kernel's execution. Its values are supplied by the host application.
- Local memory: can be read and written by one specific work-group, and holds values shared by the work-items of that work-group.
- Private memory: accessible only by a single work-item.
Efficient, fast memory use depends heavily on how these four kinds of memory are used. Private and local memory are the fastest, while global memory is the slowest to access.
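The sketch below emulates on the CPU how a typical kernel might use these regions to compute a sum. Each work-group copies its slice of the input from global memory into a shared scratch array, which plays the role of local memory, and reduces the slice there. It then writes one partial result back to global memory, while per-work-item temporaries stand in for private memory. This is a sequential C emulation under assumed parameters, not OpenCL code. The work-group size of 4 and the input length being a multiple of 4 are assumptions, and workgroup_sums is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 4 /* assumed work-group size; must be a power of two here */

/* Two-level sum mirroring the memory model: global input -> per-group
   local scratch -> one partial sum per work-group in global output. */
static void workgroup_sums(const int *global_in, size_t n, int *global_partial)
{
    assert(n % LOCAL_SIZE == 0);
    for (size_t group = 0; group < n / LOCAL_SIZE; ++group) {
        int local_mem[LOCAL_SIZE];                 /* shared by one work-group */
        for (size_t lid = 0; lid < LOCAL_SIZE; ++lid) {
            int private_value = global_in[group * LOCAL_SIZE + lid]; /* private */
            local_mem[lid] = private_value;
        }
        /* Tree reduction in local memory. In a real kernel, a
           barrier(CLK_LOCAL_MEM_FENCE) would separate each step. */
        for (size_t stride = LOCAL_SIZE / 2; stride > 0; stride /= 2)
            for (size_t lid = 0; lid < stride; ++lid)
                local_mem[lid] += local_mem[lid + stride];
        global_partial[group] = local_mem[0];      /* back to global memory */
    }
}
```

For the input 1..8, this yields the partial sums 10 and 26. OpenCL provides no synchronization between work-groups, so the partial sums have to be combined outside the kernel, either by the host or by a second kernel launch.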

The concepts and scopes of OpenCL memory regions are also similar to those of CUDA:

4.4. Programming Model


OpenCL supports two main parallel programming models: data-parallel and task-parallel.
Data-parallel processing runs many instances of the same kernel concurrently, each instance processing its own set of data. Each data set is associated with a point in a one-, two- or three-dimensional index space.
Task parallelism, by contrast, resembles a multithreaded program whose threads are independent of each other and each perform a different task. In OpenCL, task-parallel programming consists of enqueuing multiple kernels, which OpenCL executes in parallel on the available compute devices.


III. OpenCL program development workflow

1. Steps for writing an OpenCL program
Developing an OpenCL program involves the steps below.
1.1. Identify which tasks can run in parallel
To get the best performance, first determine which parts of the work can run concurrently. From that, a suitable memory layout and cost for the program are easy to derive.

1.2. Write the kernels and helper functions
To compute in parallel on an OpenCL device, you must write kernels. The kernels are packaged and compiled when the program runs.

1.3. Set up the context
Use the functions in the OpenCL framework to find the available devices and decide which ones the program will use. Then create the context (the virtual environment), which includes the memory objects and the command queue.

1.4. Write the code that compiles and builds the OpenCL program
After choosing the OpenCL device and setting up the context, we write host-application code that compiles the program source. This code then uses the kernel objects obtained from the compiled source. The following calls are made in order:
a. clCreateProgramWithSource creates the program from given OpenCL C source. Alternatively, if precompiled code is available (cached from a previous run, or external code such as a library), call clCreateProgramWithBinary. These functions combine the kernels and helper functions into one program and return a program object.
b. Call clBuildProgram to compile the program object for the specific devices available on the system.
c. Call clCreateKernel for each kernel, or call clCreateKernelsInProgram to create all the kernel objects of an OpenCL program. In other words, this step extracts the compiled kernel objects from a given program object.

1.5. Create the memory objects
Memory objects hold the input and output data and the values passed to kernel arguments (input objects). They carry data between the host device and the OpenCL device.

1.6. Enqueue commands that control the sequential, synchronized execution of kernels, the reading and writing of data, and operations on memory objects
To execute a kernel, follow these steps:
a. Call clSetKernelArg to pass the argument (parameter) values to the kernel.
b. Determine the work-group size and the index space for executing the kernel.
c. Enqueue the kernel execution command in the command queue.

1.7. Read the returned values
Enqueue commands that read the output values of the work-items and place them in host memory.

2. Writing a kernel
Kernels are written in OpenCL C, whose syntax resembles C with a few differences. A kernel has the following form:


Notes:
1. A kernel is always declared with the __kernel qualifier.
2. When executing a kernel, use clSetKernelArg to pass values to the parameters defined above.
3. The functions get_global_id and get_local_size return information about the work-item while the kernel executes.
4. mul24 is a built-in math function of OpenCL C. OpenCL C provides many high-performance built-in functions for both scalar and vector data.
5. A kernel can be called from another kernel in the same OpenCL program.
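The following sketch illustrates these notes with a hypothetical kernel, square, that squares each element of an array. The kernel is held in a C string, the way a host program would pass it to clCreateProgramWithSource. A CPU reference function is included for checking results on the host. The kernel name, its parameters and the bounds check are assumptions for this sketch. The kernel uses __kernel (note 1) and computes its index with mul24 from get_group_id, get_local_size and get_local_id (notes 3 and 4). Its three parameters would each be set with clSetKernelArg (note 2). mul24 is only exact when both operands fit in 24 bits, which holds for ordinary group counts and work-group sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OpenCL C kernel kept as a C string for
   clCreateProgramWithSource. Comments inside the string are
   OpenCL C comments, not C comments. */
static const char *square_kernel_src =
    "__kernel void square(__global const float *in,\n"
    "                     __global float *out,\n"
    "                     const unsigned int count)\n"
    "{\n"
    "    /* index computed with the 24-bit multiply from note 4 */\n"
    "    unsigned int i = mul24((unsigned int)get_group_id(0),\n"
    "                           (unsigned int)get_local_size(0))\n"
    "                     + (unsigned int)get_local_id(0);\n"
    "    if (i < count)  /* the global size may exceed count */\n"
    "        out[i] = in[i] * in[i];\n"
    "}\n";

/* CPU reference for checking the kernel's output on the host:
   each loop iteration plays the role of one work-item. */
static void square_reference(const float *in, float *out, unsigned int count)
{
    assert(in != NULL && out != NULL);
    for (unsigned int i = 0; i < count; ++i)
        out[i] = in[i] * in[i];
}
```

After reading the device output back, a host program could compare it element by element with square_reference. For the input {1, 2, 3}, the reference produces {1, 4, 9}.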


3. Querying devices
Every OpenCL program requires a context, which contains the list of OpenCL devices present on the system. Use clGetDeviceIDs to query the OpenCL-capable devices on the machine. The query can be restricted by device type or by a combination of types (e.g. GPU only, CPU only, or both). The number of devices to use can also be limited.
For example, suppose we want to run code on a GPU and need only one, regardless of how many GPUs are available. We pass CL_DEVICE_TYPE_GPU as the device_type argument of clGetDeviceIDs and set num_entries = 1. OpenCL then returns the ID of the first GPU it finds.

4. Creating the OpenCL context

Once we know which OpenCL devices will perform the computation and at least one usable device exists, we create an OpenCL context. The context groups the devices so that memory can be shared among the compute devices. Alternatively, a context can be created from an existing OpenGL context when OpenGL and OpenCL need to work together. Memory can then be shared between the two contexts.
To create a context, first determine which devices to use (the result returned by clGetDeviceIDs), then pass them to clCreateContext.

5. Creating the program object

An OpenCL program consists of a set of kernels together with helper functions that the kernels can call. Kernels always begin with the __kernel keyword. The helper functions, however, cannot serve as entry points called from the OpenCL API; only kernels declared as above can be enqueued. A program object encapsulates the OpenCL source together with the most recently built executable of the program. It also stores the build options, the build log, and the list of devices the program was built for.
A program object can be created directly from the OpenCL source and compiled at application runtime. Alternatively, it can be created from the binary of a previous successful build, which avoids building at application runtime.


Notes:
1. Kernel source is embedded in the application as a char pointer. The source can be declared directly in the code or stored in a file and read into this pointer, which is then passed to clCreateProgramWithSource.
2. If the source strings are not null-terminated, the length of each kernel source string must be given (through the lengths argument).
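Here is a minimal sketch of note 1, assuming the kernel source lives in a separate file. The hypothetical helper load_kernel_source reads the whole file into a heap buffer and appends a terminating '\0', so the string can be passed to clCreateProgramWithSource with a NULL lengths argument. It also reports the byte count for callers who prefer to pass explicit lengths, as note 2 describes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole kernel source file into a '\0'-terminated heap buffer.
   *length receives the number of bytes read (excluding the terminator).
   Returns NULL on failure; the caller must free() the result. */
static char *load_kernel_source(const char *path, size_t *length)
{
    assert(path != NULL && length != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    char *src = malloc((size_t)size + 1);
    if (src == NULL) { fclose(f); return NULL; }
    size_t got = fread(src, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(src); return NULL; }
    src[size] = '\0';          /* terminator: lengths may then be NULL */
    *length = (size_t)size;
    return src;
}
```

A host program would then call clCreateProgramWithSource(context, 1, (const char **)&src, NULL, &err) and free(src) afterwards. OpenCL copies the source into the program object, so the buffer can be freed once the call returns.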
clCreateProgramWithSource creates a program object containing source code, but the program cannot be executed until it has been compiled and linked.
Once a binary of the program has been built, clGetProgramInfo can be used to retrieve it. If this binary is cached, later runs of the program can create the program object from the binary instead of the source. This significantly reduces creation and execution time after the application's first run on a given device.
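The caching idea can be sketched with two hypothetical helpers, save_binary and load_binary. They store one device's program binary in a file and read it back as an opaque byte array. Retrieving the bytes (clGetProgramInfo with CL_PROGRAM_BINARY_SIZES and CL_PROGRAM_BINARIES) and passing them to clCreateProgramWithBinary are left out, so the sketch runs without an OpenCL device. Using one cache file per device is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Store one device's program binary in a cache file. Returns 1 on success. */
static int save_binary(const char *path, const unsigned char *data, size_t size)
{
    assert(path != NULL && data != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return 0;
    size_t written = fwrite(data, 1, size, f);
    return fclose(f) == 0 && written == size;
}

/* Load a cached binary. Returns a malloc'd copy, or NULL when no usable
   cache exists (the caller then builds from source and saves the result). */
static unsigned char *load_binary(const char *path, size_t *size)
{
    assert(path != NULL && size != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long n = ftell(f);
    if (n <= 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    unsigned char *buf = malloc((size_t)n);
    if (buf == NULL) { fclose(f); return NULL; }
    size_t got = fread(buf, 1, (size_t)n, f);
    fclose(f);
    if (got != (size_t)n) { free(buf); return NULL; }
    *size = (size_t)n;
    return buf;
}
```

On the first run, load_binary returns NULL. The program is then built from source, and its binary is saved. Later runs load the bytes and go through clCreateProgramWithBinary, checking the returned binary status in case the driver or device has changed.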
Creating a program object from binaries works the same way as creating one from source. The difference is that a separate binary must be supplied for each device the kernel will run on. The function is clCreateProgramWithBinary:

Notes:
1. See Querying devices.
2. The value returned by clGetDeviceIDs.
3. Once you have a program binary, you can obtain information about the program object with clGetProgramInfo.
4. Because each compute device has its own instruction set, a separate binary must be supplied for each device you want to use. When clCreateProgramWithBinary is called, OpenCL checks each binary against the corresponding device on the system to make sure the binary is compatible with that device. The binaryStatus argument returns an array holding the result of this check for each binary.

6. Build Program Executable


After successfully creating the program object with clCreateProgramWithSource or clCreateProgramWithBinary, we must build the program executable from that program object. Building a program compiles any source code in the program object and links the resulting machine code into an executable program. The clBuildProgram function performs this step.

clBuildProgram modifies the program object passed to it by adding the program executable. As a result, some program objects contain an executable and some do not.
Errors may occur when compiling the program source. The OpenCL framework provides clGetProgramBuildInfo to query the OpenCL compiler for detailed information about the most recent build. It can be combined with clBuildProgram as follows:


In the example above, the application uses the constant CL_PROGRAM_BUILD_LOG to retrieve detailed error information. clGetProgramBuildInfo can also retrieve other information, such as the build options passed to clBuildProgram or the current build status.


7. Creating kernel objects

A kernel object holds specific information about a kernel function declared in the program, together with its source and the arguments used when the kernel executes. A kernel itself is a function, whereas a kernel object is a more complex data structure that contains both the kernel function and the data the kernel operates on. To execute a kernel, we enqueue the kernel object that contains it in the command queue.
Use clCreateKernel to create a single kernel object, or clCreateKernelsInProgram to create kernel objects for all kernels in the OpenCL program.
The following sections give an overview of how to create memory objects to hold data, how to associate data with kernel objects, and how to execute kernels.

8. Creating memory objects

A memory object is essentially a region of the device's global memory that serves as a container for program data. After the kernels have been created and registered with the OpenCL runtime, the application's data can be sent to kernels running on different devices. To do this, first package the data into memory objects, then associate those memory objects with specific kernels. As described above, there are two kinds of memory objects. A buffer object is a block of memory, while an image object is a more complex structure used to represent 2D or 3D images.
To create a buffer object, use clCreateBuffer. Likewise, clCreateImage2D or clCreateImage3D can be used for image data. These functions return an object of type cl_mem.

9. Executing kernels

OpenCL always executes kernels in a data-parallel fashion: instances of the same kernel (the work-items) run on different parts of the data set. For task parallelism, multiple kernels must be enqueued on different devices. Each work-item executes the kernel exactly once and operates on its assigned part of the data. We are responsible for specifying the number of work-items needed to process all the data. Data sets are usually organized in one, two or three dimensions (audio data, 2D or 3D images, three-dimensional objects).
9.1. Determine the dimensionality of the data
The first step in preparing to execute a kernel is to decide how many dimensions to use to represent the data. For example, if the data is a two-dimensional image of size m x n, we have a two-dimensional data set. Each data point is given by its coordinates on the m and n axes.
OpenCL does not support more than three dimensions.

9.2. Determine the number of work-items
The next step is to determine how many work-items are needed to process all the data. This number is the global work size, which defines the total number of work-items across the three dimensions. For one-dimensional data, the global work size equals the number of data items. For two-dimensional data, the global work size is m*n. Likewise, for three-dimensional data with x, y and z work-items in each dimension, it is x*y*z. In practice, there is no meaningful upper limit on the number of work-items. A large number of work-items (more than 1000) makes good use of the GPU's compute capacity.
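The following plain-C sketch makes these counts concrete; the helper names are hypothetical. total_work_items multiplies the global size over the dimensions in use. pixel_offset shows how the work-item with global ID (x, y) would locate its pixel in an m x n image. It assumes the image is stored row by row in a single buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of work-items for a 1-, 2- or 3-dimensional index space:
   the product of the global size over every dimension in use. */
static size_t total_work_items(unsigned int dims, const size_t global[3])
{
    assert(dims >= 1 && dims <= 3);   /* OpenCL allows at most 3 dimensions */
    size_t total = 1;
    for (unsigned int d = 0; d < dims; ++d)
        total *= global[d];
    return total;
}

/* Linear buffer offset of the pixel handled by work-item (x, y) in an
   image m pixels wide, assuming row-major storage. */
static size_t pixel_offset(size_t x, size_t y, size_t m)
{
    assert(x < m);
    return y * m + x;
}
```

Take an assumed 640 x 480 image. The global work size is 640*480 = 307200 work-items, and work-item (3, 2) handles the element at offset 2*640 + 3 = 1283.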

9.3. Choose the work-group size
When enqueuing a kernel for execution on a device, you can specify the work-group size that OpenCL uses during execution. Work-items in the same work-group can share memory and execute in synchronization. To take advantage of these features, however, you need to know the maximum work-group size the OpenCL device allows for the kernel. Use clGetKernelWorkGroupInfo with the CL_KERNEL_WORK_GROUP_SIZE property to obtain it. If the work-items of a work-group do not need to share data, pass NULL as the local_work_size argument when enqueuing the kernel.
You also need clGetDeviceInfo with CL_DEVICE_MAX_WORK_ITEM_SIZES to get the maximum size of each work-group dimension. clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE gives the maximum total size of a work-group.
Three conditions must hold for the local (work-group) size to be valid:
1. The number of work-items in each dimension of a work-group (local_x, local_y and local_z) must be less than or equal to the corresponding value returned by clGetDeviceInfo(CL_DEVICE_MAX_WORK_ITEM_SIZES).
2. The total number of work-items in each work-group (local_x*local_y*local_z) must be less than or equal to the value returned by clGetKernelWorkGroupInfo(CL_KERNEL_WORK_GROUP_SIZE).
3. In each dimension, the total number of work-items must be evenly divisible by the number of work-items per work-group (global_n mod local_n = 0).
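The three conditions can be checked with a small plain-C function; local_size_is_valid is a hypothetical name. Here max_item_sizes stands for the values from clGetDeviceInfo(CL_DEVICE_MAX_WORK_ITEM_SIZES), and max_group_size stands for the value from clGetKernelWorkGroupInfo(CL_KERNEL_WORK_GROUP_SIZE). In this sketch, both are passed in as plain arguments so the check runs without an OpenCL device. The limit of 1024 in the usage example is an assumption, not a value from any particular device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Validate a candidate local (work-group) size against the device and
   kernel limits and the global size, following conditions 1-3. */
static bool local_size_is_valid(unsigned int dims,
                                const size_t global[], const size_t local[],
                                const size_t max_item_sizes[],
                                size_t max_group_size)
{
    assert(dims >= 1 && dims <= 3);
    size_t product = 1;
    for (unsigned int d = 0; d < dims; ++d) {
        if (local[d] == 0 || local[d] > max_item_sizes[d])
            return false;                 /* condition 1: per-dimension limit */
        if (global[d] % local[d] != 0)
            return false;                 /* condition 3: even division */
        product *= local[d];
    }
    return product <= max_group_size;     /* condition 2: total size limit */
}
```

With the assumed limit of 1024 work-items per group and a global size of [64, 144], the sizes [8, 12], [4, 4] and [32, 24] pass. [24, 32] fails condition 3, and [32, 48] fails condition 2.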
The following code illustrates the use of clGetKernelWorkGroupInfo:



9.4. Enqueue Kernel Execution
Once you have determined the number of dimensions needed to represent the data, the number of work-items in each dimension and a suitable work-group size, you can enqueue the kernel for execution.


Notes:
1. Use clSetKernelArg to pass a value to each kernel parameter. The argument index follows the order in which the parameters are declared in the kernel definition.
2. The intermediate steps described in the previous sections must be carried out before enqueuing the kernel for execution (see 1, Steps for writing an OpenCL program).
3. Argument indices start at 0.
4. The data can be one-, two- or three-dimensional (see 9.1, Determine the dimensionality of the data).
5. The global work size argument is an array giving the size of each dimension of the data processed by the kernel. For example, for two-dimensional image data of size 64x128, the array is [64, 128].
6. To specify the work-group size, give it as an array with the same number of dimensions as the data. Each value must evenly divide the corresponding value of the global work size. For example, if the global work size is [64, 144], the work-group size could be [8, 12], [4, 4] or [32, 24], but not [24, 32]. Specifying the work-group size is optional when enqueuing a kernel; passing NULL as the local_work_size argument lets OpenCL decide.
7. This argument and the next two control the ordering of events when out-of-order execution was enabled in clCreateCommandQueue. This argument gives the number of events in the wait list that follows.
8. If you use event objects to manage the execution sequence, you can specify which events must complete before this command runs.
9. If other commands must wait until this command has finished, or if you want to query this kernel execution instance later, you need an event object for this execution instance.

10. Retrieving the results
After the kernel finishes executing, read the results back from the device into host memory.
10.1. Wait until the kernels finish executing
Set the blocking_read argument to CL_TRUE. clEnqueueReadBuffer or clEnqueueReadImage then does not return until the data has been read and copied into host memory. (Alternatively, clFinish blocks the host application until every command in a command queue has completed, but it hurts the program's efficiency and speed.)
Note that commands in a given command queue execute in order, unless the queue was created with out-of-order execution enabled. Commands running on different command queues (that is, on different devices) must be synchronized or waited for explicitly. Also note that a kernel can both read and write the same buffer object, but image objects require separate objects for reading and for writing.
Suppose you want to wait for one kernel to finish before enqueuing another on the same command queue. Attach an event object to the enqueued kernel execution instance, and make the next command wait for that event object before it runs (see notes 7, 8 and 9 in 9.4, Enqueue Kernel Execution).
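In addition, you can enqueue clEnqueueBarrier so that all commands enqueued before the barrier complete before later commands start. Inside a kernel, the memory fence functions (mem_fence, read_mem_fence, write_mem_fence) order the memory accesses of work-items within a work-group. Work-groups cannot be synchronized with each other inside a kernel, so coordination at that level is done between commands, using event objects. clWaitForEvents and clEnqueueWaitForEvents wait on events, and clGetEventInfo returns information about a command, including its execution status.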

10.2. Read the results

When the kernel has finished executing, read the data from the device back to the host so that the host application can process it. To read the data, call clEnqueueReadBuffer or clEnqueueReadImage, depending on the kind of memory object created to hold the output (see II.4.2, Execution Model, Memory object).
Note: unless you intend to use the same memory object for both the input and the output of a kernel, create a separate memory object to hold the output data and set it as a kernel argument.

11. Releasing memory

When the host application no longer needs the resources used to run OpenCL, such as the context, release them. The functions include:

clReleaseMemObject
clReleaseKernel
clReleaseProgram
clReleaseCommandQueue
clReleaseContext

12. Debugging OpenCL programs

Several techniques are currently common. One is to use the gdb debugger to inspect the assembly once the program is compiled. Another is to declare memory regions with explicit sizes to detect out-of-range memory accesses, since GPUs currently have no memory protection mechanism. The Shark utility can be used to tune the program's performance. However, the simplest approach, and the first one worth trying, is still to run the kernel on the CPU and use printf inside the kernel.
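The explicit-size idea can be illustrated with guard cells, a common canary technique. It is named here as an illustrative substitute, not as the report's exact method. The buffer is padded with cells holding a known value, and after the kernel runs, any changed padding cell reveals an out-of-range write. The sketch emulates the kernel on the CPU. GUARD, CANARY and the deliberately buggy emulated_kernel are assumptions for illustration. On a device, the padded buffer would be the one passed to the kernel and read back with clEnqueueReadBuffer before the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GUARD 16                  /* assumed number of guard cells per side */
#define CANARY 0x5A5A5A5Au        /* assumed sentinel value for guard cells */

/* Fill the guard cells before and after the n usable cells. */
static void fill_guards(unsigned int *padded, size_t n)
{
    assert(padded != NULL);
    for (size_t i = 0; i < GUARD; ++i) {
        padded[i] = CANARY;
        padded[GUARD + n + i] = CANARY;
    }
}

/* A changed guard cell means some work-item wrote out of range. */
static bool guards_intact(const unsigned int *padded, size_t n)
{
    for (size_t i = 0; i < GUARD; ++i)
        if (padded[i] != CANARY || padded[GUARD + n + i] != CANARY)
            return false;
    return true;
}

/* Hypothetical kernel emulated on the CPU: work-item gid writes
   out[gid + offset], so any offset > 0 overruns the buffer. */
static void emulated_kernel(unsigned int *out, size_t global_size, size_t offset)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        out[gid + offset] = (unsigned int)gid;
}
```

With an offset of 0, the guards stay intact. With an offset of 1, the last work-item writes into the first trailing guard cell, and guards_intact returns false.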


Also make sure that a kernel does not run for too long on the GPU. The GPU is a resource shared with other applications, so a long-running kernel can degrade system responsiveness and may even make the machine appear to hang.


IV. Performance
1. GPGPU Performance
1.1. Floating-point arithmetic
Environment: Windows Vista x64 SP2; Catalyst 9.11 video / STREAM 1.4.427 / OpenCL 1.0 Beta 4; ForceWare 190.89 video / CUDA 2.3 / OpenCL 1.0 live release.
According to these results, OpenCL is still slightly slower than CUDA in floating-point computation. It is, however, faster in emulated double-precision computation and much faster than CAL.



1.2. Bandwidth
Environment: Windows Vista x64 SP2; Catalyst 9.11 video / STREAM 1.4.427 / OpenCL 1.0 Beta 4; ForceWare 190.89 video / CUDA 2.3 / OpenCL 1.0 live release.

1.3. Remarks
The performance of OpenCL 1.0 is nearly on par with CUDA, and even faster in some cases.


2. CPU Performance
2.1. Arithmetic
Environment: Windows Vista SP2, Server 2003 SP2, AMD CPU OpenCL 1.0 preview.


2.2. Bandwidth
Environment: Windows Vista SP2, Server 2003 SP2, AMD CPU OpenCL 1.0 preview.

2.3. Remarks
Compared with established platforms such as .NET and Java, OpenCL does not yet perform better.


V. References

OpenCL Programming Guide for Mac OS X, Mac Developer Center, Apple Inc., 2009.
OpenCL Technology Brief, Apple Inc., 2009.
OpenCL: The Open Standard for Heterogeneous Parallel Programming, Khronos Group, 2009.
The OpenCL Specification, Aaftab Munshi, Khronos OpenCL Working Group, 2009.
OpenCL Quick Reference Card, Khronos OpenCL Working Group, 2009.
OpenCL Tutorials, David W. Gohara, Ph.D., Center for Computational Biology, Washington University School of Medicine, 2009.
OpenCL Samples & Introduction, MacResearch, 2009.
Benchmarks: OpenCL GPGPU Performance (OpenCL vs. CUDA/STREAM), http://www.sisoftware.net/index.html?dir=qa&location=gpu_opencl&langx=en&a=
Benchmarks: OpenCL CPU Performance (OpenCL vs. native/Java/.Net), http://www.sisoftware.net/index.html?dir=qa&location=cpu_opencl&langx=en&a=
