
Conference title 1

The Multi2Sim Simulation Framework



A CPU-GPU Model
for Heterogeneous Computing
www.multi2sim.org

Rafael Ubal
David R. Kaeli
Northeastern University
Boston, MA
The Multi2Sim Simulation Framework, PACT 2011 Tutorial

Outline
1. Introduction
First Block: The x86 CPU Simulation
2. The x86 CPU Emulation
3. The x86 CPU Architectural Simulation
4. The Memory Hierarchy
5. Benchmarks and Simulations
Second Block: The AMD Evergreen GPU Simulation
6. The OpenCL Programming Model
7. The AMD Evergreen GPU Emulation
8. The AMD Evergreen GPU Architectural Simulation
9. Benchmarks and Simulations
10. Conclusions and Future Work

1. Introduction
Motivation
- Limitations of existing CPU simulators
  - Such as SimpleScalar, Simics, SSMT, M-Sim, SMTSim, M5, ...
  - Full-system vs. application-only simulation.
  - Free, open-source.
  - Architectural simulation accuracy.
  - Alpha/PISA architectures + cross-compilers.
  - Integrated system.
- Current simulation needs
  - Based on current processor market.
  - Heterogeneous CPU-GPU environments.
  - Tool for evaluation of new architectural proposals.
  - Simulation of a GPU ISA.
- Existing GPU simulation approaches
  - Barra: NVIDIA Tesla ISA.
  - Ocelot: PTX intermediate language simulator.
  - No architectural simulation.
  - No emulation of AMD ISAs.
  - Not capable of heterogeneous simulation.

1. Introduction
Multi2Sim Background
- Multi2Sim 3.x version series, 2011 (x86/Evergreen)
  - Superscalar pipeline: out-of-order execution, branch prediction, trace cache, etc.
  - Multithreading: fine-grain, coarse-grain and simultaneous (SMT).
  - Multicore architecture: configurable memory hierarchy, cache coherence, interconnection networks.
  - State-of-the-art benchmarks: tested support for common research benchmarks, available for download.
  - GPU model: support for OpenCL benchmarks; model for the Evergreen ISA.
- Multi2Sim 1.x version series, 2007 (MIPS-based)
- Multi2Sim 2.x version series, 2008 (x86-based)

1. Introduction
Getting Started
- User-friendly installation and test
$ tar -xzf multi2sim-3.1.tar.gz
$ cd multi2sim-3.1
$ ./configure
$ make
$ sudo make install
- Application-only simulator
  Original execution:
$ ./test-args hola que tal
arg[0] 'hola'
arg[1] 'que'
arg[2] 'tal'
  Simulated execution:
$ m2s ./test-args hola que tal
<... Simulator output ...>
arg[0] 'hola'
arg[1] 'que'
arg[2] 'tal'
<... Simulator statistics ...>

1. Introduction
The IniFile Format
- Example of IniFile
; This is a comment.
[ Section 0 ]
Color = Red
Height = 40
[ OtherSection ]
Variable = Value
Demo 1
- Multi2Sim uses IniFile for
  - Configuration files.
  - Output statistic files.
  - Standard error output.
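
The following C sketch shows how one line of an IniFile like the example above could be classified. It assumes only the conventions visible in that example: ';' starts a comment, '[ name ]' opens a section, and 'key = value' assigns a variable. The names ini_parse_line and trim_copy are hypothetical. This is not Multi2Sim's own parser.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Kinds of IniFile lines recognized by this sketch. */
enum ini_kind { INI_BLANK, INI_COMMENT, INI_SECTION, INI_PAIR, INI_ERROR };

/* Copy s[0..len) into out, dropping surrounding whitespace (truncates to fit). */
static void trim_copy(const char *s, size_t len, char *out, size_t outsz)
{
    assert(outsz > 0);
    while (len > 0 && isspace((unsigned char) *s)) {
        s++;
        len--;
    }
    while (len > 0 && isspace((unsigned char) s[len - 1]))
        len--;
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, s, len);
    out[len] = '\0';
}

/* Classify one line; fills name (section or key) and value (for pairs).
 * Both buffers are assumed to be `size` bytes long. */
enum ini_kind ini_parse_line(const char *line, char *name, char *value, size_t size)
{
    const char *p = line;
    while (isspace((unsigned char) *p))
        p++;
    if (*p == '\0')
        return INI_BLANK;
    if (*p == ';')
        return INI_COMMENT;
    if (*p == '[') {
        const char *end = strchr(p, ']');
        if (end == NULL)
            return INI_ERROR;
        trim_copy(p + 1, (size_t) (end - p - 1), name, size);
        return INI_SECTION;
    }
    const char *eq = strchr(p, '=');
    if (eq == NULL)
        return INI_ERROR;
    trim_copy(p, (size_t) (eq - p), name, size);
    trim_copy(eq + 1, strlen(eq + 1), value, size);
    return INI_PAIR;
}
```

To read a whole file, a reader would call ini_parse_line() on each line returned by fgets() and remember the most recent section name. It would then store each key/value pair under that section.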

Block 1: The x86 CPU Simulation

2. The CPU Emulation
Definition
- Emulation (a.k.a. functional simulation)
  - Just mimics the original behavior of a program.
  - As opposed to timing/detailed/architectural simulation.
- Steps
  1) Program loading.
  2) Simulation loop.

2. The CPU Emulation
Program Loading
- Initialization of a process state
  - Virtual memory map.
  - Value of x86 registers.
[Figure: process memory map. Text and initialized data start at 0x08000000 (up to 0x08xxxxxx), followed by the heap, an mmap region (not initialized) from 0x40000000, and the stack (program arguments, environment variables) ending at 0xc0000000. Registers eax, ebx, ecx, ... are initialized; eip holds the initial instruction pointer and esp points to the top of the stack.]
1) Parse ELF executable
  - ELF sections.
  - Initialized code and data.
2) Initialize stack
  - Program headers.
  - Arguments.
  - Environment variables.
3) Initialize registers
  - Program entry point -> eip
  - Stack pointer -> esp
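
Step 2 can be illustrated with a simplified C sketch that builds the initial stack of a 32-bit process. It assumes a stack top at 0xc0000000, as in the memory map, and it deliberately omits environment variables, the auxiliary vector and program headers. The names init_stack, host_ptr and push32, and the 4 KB backing array, are hypothetical. The returned value is what step 3 would load into esp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_TOP  0xc0000000u   /* stack top taken from the memory map */
#define STACK_SIZE 4096u         /* hypothetical size of the backing store */
#define MAX_ARGS   16

/* Backs the guest addresses [STACK_TOP - STACK_SIZE, STACK_TOP). */
static uint8_t stack_mem[STACK_SIZE];

/* Translate a guest stack address into a pointer into the backing array. */
static uint8_t *host_ptr(uint32_t addr)
{
    assert(addr >= STACK_TOP - STACK_SIZE && addr < STACK_TOP);
    return &stack_mem[addr - (STACK_TOP - STACK_SIZE)];
}

/* Push a 32-bit value (host byte order; x86 is little-endian). */
static void push32(uint32_t *sp, uint32_t value)
{
    *sp -= 4;
    memcpy(host_ptr(*sp), &value, 4);
}

/* Builds, from the top down: argument strings, a NULL-ended argv[]
 * pointer array, then argc. Returns the initial value for esp. */
uint32_t init_stack(int argc, const char *const argv[])
{
    uint32_t sp = STACK_TOP;
    uint32_t arg_addr[MAX_ARGS];

    assert(argc >= 0 && argc <= MAX_ARGS);
    for (int i = argc - 1; i >= 0; i--) {        /* copy strings, last first */
        size_t len = strlen(argv[i]) + 1;
        sp -= (uint32_t) len;
        memcpy(host_ptr(sp), argv[i], len);
        arg_addr[i] = sp;
    }
    sp &= ~3u;                                   /* 4-byte alignment */
    push32(&sp, 0);                              /* argv[argc] = NULL */
    for (int i = argc - 1; i >= 0; i--)
        push32(&sp, arg_addr[i]);                /* argv[i] */
    push32(&sp, (uint32_t) argc);                /* argc at top of stack */
    return sp;
}
```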

2. The CPU Emulation
Simulation Loop
Demo 2
[Figure: emulation loop. Read instr. at eip -> instr. bytes -> decode instruction -> instr. fields -> is the instr. int 0x80? Yes: emulate system call. No: emulate x86 instr. Then move eip to the next instr. and repeat.]
- Emulation of x86 instructions
  - Update memory map (if needed).
  - Update x86 registers.
  - Example: add [bp+11], 0x3
- Emulation of Linux system calls
  - Analyze system call code and args.
  - Update memory map.
  - Update eax with return value.
  - Example: read(fd, buf, count);
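
The loop can be sketched in C for a toy subset of x86. The sketch handles only four opcodes: nop, inc eax, hlt and int 0x80. The opcode subset, the exit/ENOSYS handling and the function names are assumptions made for illustration. A real emulator decodes the full instruction set and implements many Linux system calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct regs { uint32_t eax, ebx, eip; };

enum { SYS_EXIT = 1, LINUX_ENOSYS = 38 };

/* Emulate a Linux system call: number in eax, result back in eax.
 * Returns 1 if the guest program exits. */
static int emulate_syscall(struct regs *r)
{
    if (r->eax == SYS_EXIT)
        return 1;                           /* exit status stays in ebx */
    r->eax = (uint32_t) -LINUX_ENOSYS;      /* unsupported call: -ENOSYS */
    return 0;
}

/* Runs until exit, hlt or an unknown opcode; returns instructions emulated. */
size_t emulate(const uint8_t *mem, size_t mem_size, struct regs *r)
{
    size_t count = 0;
    assert(mem != NULL && r != NULL);
    while (r->eip < mem_size) {
        uint8_t op = mem[r->eip];           /* read instr. at eip */
        int stop = 0;
        if (op == 0xcd && r->eip + 1 < mem_size && mem[r->eip + 1] == 0x80) {
            r->eip += 2;                    /* int 0x80: move eip, then syscall */
            stop = emulate_syscall(r);
        } else if (op == 0x90) {
            r->eip += 1;                    /* nop */
        } else if (op == 0x40) {
            r->eax++;                       /* inc eax: update registers */
            r->eip += 1;
        } else if (op == 0xf4) {
            r->eip += 1;                    /* hlt */
            stop = 1;
        } else {
            break;                          /* unknown opcode: not emulated */
        }
        count++;
        if (stop)
            break;
    }
    return count;
}
```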

3. The CPU Architectural Simulation
Definition
- Architectural simulation (a.k.a. detailed/timing simulation)
  - Provides performance results from executing a program on a configurable CPU model.
  - Main performance metric: execution time. Also structure occupancy, cache hit rates, contention points, etc.
[Figure: the architectural simulator (cycle counter, CPU cores model, memory hierarchy model) asks the CPU functional simulator to "run a new x86 instruction"; the functional simulator answers "this is the instr. that was run".]
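
This C sketch shows the interaction in the figure. The timing model drives the functional model one instruction at a time and accumulates cycles. The instruction classes, the latencies and the fully serialized timing are placeholder assumptions. A real architectural model overlaps instructions in the pipeline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum uop_class { UOP_INT, UOP_LOAD, UOP_BRANCH, UOP_DONE };

/* Stand-in for the functional simulator: replays a fixed list of executed
 * instruction classes, terminated by UOP_DONE. */
struct functional_sim {
    const enum uop_class *trace;
    size_t pos;
};

/* "Run a new x86 instruction" and report which kind of instruction ran. */
static enum uop_class functional_run_next(struct functional_sim *f)
{
    assert(f->trace != NULL);
    if (f->trace[f->pos] == UOP_DONE)
        return UOP_DONE;
    return f->trace[f->pos++];
}

struct timing_stats { uint64_t cycles, instructions, loads; };

/* Timing model: charges a fixed latency per instruction class, keeping a
 * cycle counter plus a couple of statistics. */
struct timing_stats timing_simulate(struct functional_sim *f)
{
    static const unsigned latency[] = {
        [UOP_INT] = 1, [UOP_LOAD] = 3, [UOP_BRANCH] = 2,
    };
    struct timing_stats s = {0, 0, 0};
    enum uop_class c;

    while ((c = functional_run_next(f)) != UOP_DONE) {
        s.cycles += latency[c];        /* no overlap between instructions */
        s.instructions++;
        if (c == UOP_LOAD)
            s.loads++;
    }
    return s;
}
```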

3. The CPU Architectural Simulation
The Superscalar Pipeline
Demo 3
[Figure: superscalar pipeline with stages Fetch (instr. cache, trace cache, fetch queue, trace queue), Decode (uop queue), Dispatch (reorder buffer, instruction queue, load/store queue), Issue (functional units, register file, data cache), Writeback and Commit.]
- Characteristics
  - Speculative execution.
  - Branch prediction.
  - Out-of-order execution.
  - Trace cache.
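
Out-of-order execution with in-order commit relies on the reorder buffer written at the dispatch stage. The C sketch below uses hypothetical names and an arbitrary 8-entry size. It shows the key property: instructions may complete in any order, but commit proceeds only from the oldest one.

```c
#include <assert.h>
#include <stdbool.h>

#define ROB_SIZE 8

struct rob_entry { int seq; bool completed; };

struct rob {
    struct rob_entry e[ROB_SIZE];   /* circular buffer */
    int head, count;                /* head = oldest instruction */
};

/* Dispatch: append in program order; fails (stall) when the ROB is full. */
bool rob_dispatch(struct rob *r, int seq)
{
    if (r->count == ROB_SIZE)
        return false;
    int tail = (r->head + r->count) % ROB_SIZE;
    r->e[tail] = (struct rob_entry){ seq, false };
    r->count++;
    return true;
}

/* Writeback: mark an instruction as completed, in any order. */
void rob_writeback(struct rob *r, int seq)
{
    for (int i = 0; i < r->count; i++) {
        struct rob_entry *x = &r->e[(r->head + i) % ROB_SIZE];
        if (x->seq == seq)
            x->completed = true;
    }
}

/* Commit: retire up to `width` completed instructions from the head only. */
int rob_commit(struct rob *r, int width)
{
    int n = 0;
    assert(width > 0);
    while (n < width && r->count > 0 && r->e[r->head].completed) {
        r->head = (r->head + 1) % ROB_SIZE;
        r->count--;
        n++;
    }
    return n;
}
```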

3. The CPU Architectural Simulation
Multithreaded Processor Model
[Figure: three sets of pipeline stages (fetch, decode, dispatch, issue, writeback, commit) sharing one functional unit pool.]
- Multithreading paradigms
  - Coarse-grain multithreading: thread switch upon long-latency events.
  - Fine-grain multithreading: thread switch at a cycle granularity.
  - Simultaneous multithreading: instructions from multiple threads issued in the same cycle.
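
Here is a C sketch of thread selection under the two switching paradigms. It assumes a 4-thread core and a boolean flag that signals a long-latency event (for example, a cache miss). Both choices are for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_THREADS 4

/* Fine grain: switch to the next thread every cycle (round-robin). */
int select_fine_grain(int current)
{
    assert(current >= 0 && current < NUM_THREADS);
    return (current + 1) % NUM_THREADS;
}

/* Coarse grain: stay on the current thread until it hits a long-latency
 * event, then switch to the next one. */
int select_coarse_grain(int current, bool long_latency_event)
{
    assert(current >= 0 && current < NUM_THREADS);
    return long_latency_event ? (current + 1) % NUM_THREADS : current;
}
```

Simultaneous multithreading has no single selected thread. The issue stage may take instructions from several threads in the same cycle, so a per-cycle selection function like these does not capture it.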

3. The CPU Architectural Simulation
Multicore Processor Model
[Figure: Core 0 and Core 1, each with its own superscalar pipeline (fetch, decode, dispatch, issue, writeback, commit) and instruction/data cache ports, connected to a shared memory hierarchy.]
- Multicore processor
  - Multiple independent superscalar pipelines.
  - Communication only through the memory hierarchy.
Demo 4
- What can we run on it?
  - Multiple single-threaded programs.
  - One (or more) programs spawning child threads.

3. The CPU Architectural Simulation
Definitions
- Core (c0, c1, ...)
  - Hardware component with an independent set of superscalar pipelines.
  - Each core may contain several threads.
Demo 4
- Thread (t0, t1, ...)
  - Hardware component with a partially independent set of pipeline stages.
- Context (ctx0, ctx1, ...)
  - Software thread with independent values for registers (incl. eip).
  - Can be a sequential program or a spawned child context.
- Node
  - Hardware component running a context.
  - Multicore proc.: c0, c1, ...
  - Multithreaded proc.: t0, t1, ...
  - Multicore-multithreaded proc.: c0-t0, c0-t1, ...
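
A C sketch of the node naming scheme follows. It assumes nodes are numbered core-major, so all threads of c0 come first. The function name and the numbering order are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the name of node i for a processor with `cores` cores and
 * `threads` threads per core: c<k>-t<j>, c<k> or t<j>. */
void node_name(int i, int cores, int threads, char *buf, size_t size)
{
    assert(cores > 0 && threads > 0 && i >= 0 && i < cores * threads);
    int core = i / threads;
    int thread = i % threads;
    if (cores > 1 && threads > 1)
        snprintf(buf, size, "c%d-t%d", core, thread);   /* multicore-multithreaded */
    else if (cores > 1)
        snprintf(buf, size, "c%d", core);               /* multicore */
    else
        snprintf(buf, size, "t%d", thread);             /* multithreaded */
}
```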

4. Memory Hierarchy
Configuration
- Configuring the memory hierarchy
  - Any number of caches organized in any number of levels.
  - Connected through any number of interconnects.
  - Each interconnect has a set of 1 or more caches connected "above" it, and only one cache (or main memory) connected "below" it.
- Memory hierarchy entries
  - Each node has two entries to the memory hierarchy: instruction entry & data entry.
  - Several node entries can converge to the same cache (or main memory).
[Figure: several caches connected from above to an interconnect, which connects below to a single cache or main memory.]

4. Memory Hierarchy
Configuration
[Figure: nodes c0-t0 and c0-t1 (Core 0) and c1-t0 and c1-t1 (Core 1), each with a data L1 and an instruction L1 cache; one L2 cache per core; both L2 caches connected to main memory.]
- Example
  - 2-core, 2-threaded processor (4 nodes).
  - Each thread has its own private data and instruction L1 caches.
  - L2 caches: shared among threads, private per core, unified for data/instr.
Demo 5
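
The example hierarchy can be expressed in C as two lookups. The first maps a node's instruction or data entry to its L1 cache. The second maps each L1 to the L2 below it. The cache numbering is a hypothetical convention. This sketches the topology only; it is not Multi2Sim's configuration file format.

```c
#include <assert.h>

enum { CORES = 2, THREADS = 2 };
enum access { ACCESS_INSTR, ACCESS_DATA };

/* L1 caches are numbered 0..7: two per node (instruction, then data),
 * with nodes numbered core-major (c0-t0, c0-t1, c1-t0, c1-t1). */
int l1_cache_id(int core, int thread, enum access kind)
{
    assert(core >= 0 && core < CORES && thread >= 0 && thread < THREADS);
    int node = core * THREADS + thread;
    return node * 2 + (kind == ACCESS_DATA);
}

/* All L1 caches of a core sit above the interconnect of that core's
 * unified L2, so the L2 id is simply the core number. */
int l2_cache_id(int l1_id)
{
    assert(l1_id >= 0 && l1_id < CORES * THREADS * 2);
    return l1_id / (THREADS * 2);
}
```

With this mapping, both the instruction and data entries of c0-t1 end up in L2 0. That matches the unified, per-core L2 of the example.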

5. Benchmarks and Simulations
Supported CPU Benchmarks
- Sequential benchmarks
  - SPEC CPU 2000
  - SPEC CPU 2006
  - MediaBench-I
Demo 6
- Parallel benchmarks
  - SPLASH-2
  - PARSEC 2.1
- Availability on website
  - x86 binaries tested on Multi2Sim.
  - List of execution commands.
  - Data files for free-distribution benchmarks.

Block 2: The AMD Evergreen GPU Simulation

6. The OpenCL Programming Model
Introduction
- GPU
  - Massively parallel device.
  - Originally devoted to graphics computations.
  - Now getting popular for general-purpose computations (GPGPU).
  - Single-Program Multiple-Data (SPMD) model.
- Major GPU vendors
  - NVIDIA: CUDA programming language.
  - AMD: OpenCL programming language.

6. The OpenCL Programming Model
Vector Addition Example
OpenCL host program (vector_add.c):
int main()
{
    [ ... ]
    clCreateProgramWithSource(..., "vector_add.cl", ...);
    clCreateKernel(..., "vector_add", ...);
    buf1 = clCreateBuffer(..., CL_MEM_READ_ONLY, size, ...);
    buf2 = clCreateBuffer(..., CL_MEM_READ_ONLY, size, ...);
    buf3 = clCreateBuffer(..., CL_MEM_WRITE_ONLY, size, ...);
    clSetKernelArg(..., 0, sizeof(cl_mem), &buf1);
    clSetKernelArg(..., 1, sizeof(cl_mem), &buf2);
    clSetKernelArg(..., 2, sizeof(cl_mem), &buf3);
    clEnqueueNDRangeKernel(...);
    [ ... ]
}
OpenCL device kernel (vector_add.cl):
__kernel void vector_add(
    __global const int *buf1,   /* input (read-only buffer) */
    __global const int *buf2,   /* input (read-only buffer) */
    __global int *buf3)         /* output (write-only buffer) */
{
    int id = get_global_id(0);
    buf3[id] = buf1[id] + buf2[id];
}
x86 executable binary: vector_add
AMD Evergreen kernel binary: vector_add.bin
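
A CPU reference for vector_add, useful for checking GPU results, might look like the C sketch below. Each loop iteration stands for one work-item, and the loop index plays the role of get_global_id(0). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential equivalent of the vector_add kernel over n work-items. */
void vector_add_ref(const int *buf1, const int *buf2, int *buf3, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf3[i] = buf1[i] + buf2[i];
}

/* Self-check: returns the index of the first mismatching element of
 * gpu_out, or n if every element equals buf1[i] + buf2[i]. */
size_t vector_add_check(const int *buf1, const int *buf2,
                        const int *gpu_out, size_t n)
{
    assert(buf1 != NULL && buf2 != NULL && gpu_out != NULL);
    for (size_t i = 0; i < n; i++)
        if (gpu_out[i] != buf1[i] + buf2[i])
            return i;
    return n;
}
```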

6. The OpenCL Programming Model
OpenCL Software Entities
[Figure: a kernel (__kernel func() { ... }) runs over an ND-Range made of work-groups, each made of work-items. Global memory is associated with the ND-Range, local memory with each work-group (synchronization allowed at this level), and private memory with each work-item.]
- Properties
  - The host program configures the ND-Range and work-group sizes.
  - Only work-items in the same work-group can synchronize and share data.
  - Work-groups in an ND-Range can execute in any order.
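
The following one-dimensional C sketch shows how work-items are identified within the ND-Range. It uses the OpenCL relation get_global_id(0) = get_group_id(0) * get_local_size(0) + get_local_id(0), which holds when no global offset is used. Work-groups are visited in reverse order to illustrate that results must not depend on work-group order. run_ndrange and count_visit are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Global id of a work-item from its group id and local id (no offset). */
size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Calls fn(gid, ctx) once per work-item, visiting work-groups in reverse. */
void run_ndrange(size_t global_size, size_t local_size,
                 void (*fn)(size_t gid, void *ctx), void *ctx)
{
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;
    for (size_t g = num_groups; g-- > 0; )
        for (size_t l = 0; l < local_size; l++)
            fn(global_id(g, local_size, l), ctx);
}

/* Example callback: counts visits per global id (ctx is an int array). */
void count_visit(size_t gid, void *ctx)
{
    ((int *) ctx)[gid]++;
}
```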

7. The Evergreen GPU Emulation
The OpenCL Call Stack
[Figure: native vs. simulated execution. Native execution: in user-space code, the OpenCL host program makes an OpenCL function call (e.g., clEnqueueNDRangeKernel) into the AMD OpenCL library (libOpenCL.so), which issues system calls (mainly ioctl) to the GPU driver in operating system code. Simulated execution: in the Multi2Sim emulated program, the OpenCL host program calls the Multi2Sim OpenCL library (m2s-libopencl.so), which issues a special system call (code 325) handled by the GPU emulator.]
- Comparison
  - OpenCL function calls are forwarded to m2s-libopencl.so.
  - Each function is implemented as system call 325.
  - Multi2Sim emulates the GPU after clEnqueueNDRangeKernel.

7. The Evergreen GPU Emulation
Program Loading
- Initialization of device kernel
  - Global memory map (whole ND-Range).
  - Local memories (each work-group).
  - Register files (each work-item).
[Figure: the OpenCL kernel binary (vector_add.bin) is loaded for an ND-Range of work-groups made of work-items, with one global memory, one local memory per work-group and one register file per work-item.]
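
Here is a C sketch of the per-scope state listed above: one global memory for the ND-Range, one local memory per work-group, and one register file per work-item. The register file shape (128 four-component registers) and all names are illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_GPRS 128

struct reg_file { float gpr[NUM_GPRS][4]; };   /* x, y, z, w components */

struct ndrange_state {
    unsigned char *global_mem;     /* whole ND-Range */
    unsigned char **local_mem;     /* one per work-group */
    struct reg_file *regs;         /* one per work-item */
    size_t num_groups, group_size;
};

/* Allocates zeroed state; returns 0 on success, -1 on allocation failure
 * (call ndrange_free in both cases). */
int ndrange_init(struct ndrange_state *s, size_t global_size, size_t group_size,
                 size_t global_bytes, size_t local_bytes)
{
    assert(group_size > 0 && global_size % group_size == 0);
    s->num_groups = global_size / group_size;
    s->group_size = group_size;
    s->global_mem = calloc(global_bytes, 1);
    s->local_mem = calloc(s->num_groups, sizeof *s->local_mem);
    s->regs = calloc(global_size, sizeof *s->regs);
    if (!s->global_mem || !s->local_mem || !s->regs)
        return -1;
    for (size_t g = 0; g < s->num_groups; g++)
        if (!(s->local_mem[g] = calloc(local_bytes, 1)))
            return -1;
    return 0;
}

void ndrange_free(struct ndrange_state *s)
{
    if (s->local_mem)
        for (size_t g = 0; g < s->num_groups; g++)
            free(s->local_mem[g]);
    free(s->local_mem);
    free(s->regs);
    free(s->global_mem);
}
```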

7. The Evergreen GPU Emulation
Evergreen Assembly Code
- Structure
  - Main control flow (CF) clause.
  - Secondary arithmetic-logic (ALU) and texture (TEX) clauses.
  - ALU instructions are VLIW.
00 ALU: ADDR(...) CNT(...) KCACHE0(...)
     0  x: LSHL ...
        w: LSHL ...
        t: MOV ...
     1  x: LSHL ...
        y: LSHR ...
        z: ADD_INT ...
        t: LSHR ...
     2  y: LSHR ...
01 TEX: ADDR(...) CNT(2)
     3  VFETCH ... MEGA(4)
        FETCH_TYPE(NO_INDEX_OFFSET)
     4  VFETCH ... MEGA(4)
        FETCH_TYPE(NO_INDEX_OFFSET)
02 ALU_PUSH_BEFORE: ADDR(...) CNT(...)
        x: LDS_WRITE ...
        x: LDS_WRITE ...
        x: PREDNE_INT ..., 0.0f
           UPDATE_EXEC_MASK UPDATE_PRED
03 JUMP POP_CNT(...) ADDR(...)
04 MEM_RAT_CACHELESS_STORE_RAW: RAT(...)[...].x___, ..., ARRAY_SIZE(4) MARK VPM
[Figure annotations: a CF instruction counter steps through the main CF clause; a secondary clause instruction counter steps through the secondary ALU and TEX clauses.]
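
An Evergreen ALU instruction group is a VLIW bundle with up to five slots (x, y, z, w, t). This C sketch models such a bundle and counts how many of its slots are filled. Unfilled slots are what limit VLIW packing efficiency. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum slot { SLOT_X, SLOT_Y, SLOT_Z, SLOT_W, SLOT_T, NUM_SLOTS };

struct vliw_bundle {
    bool used[NUM_SLOTS];
    const char *opcode[NUM_SLOTS];   /* e.g. "LSHL"; NULL when empty */
};

/* Place one instruction in a free slot of the bundle. */
void bundle_set(struct vliw_bundle *b, enum slot s, const char *opcode)
{
    assert(s < NUM_SLOTS && !b->used[s]);
    b->used[s] = true;
    b->opcode[s] = opcode;
}

/* Number of occupied slots (at most 5). */
int bundle_occupancy(const struct vliw_bundle *b)
{
    int n = 0;
    for (int s = 0; s < NUM_SLOTS; s++)
        n += b->used[s];
    return n;
}
```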

7. The Evergreen GPU Emulation
Simulation Loop
[Figure: emulation loop. Read CF instruction -> instr. bytes -> decode instruction -> instr. fields -> is the instr. CF? Yes: emulate it (all work-items). No: start an ALU/TEX clause; read each ALU/TEX instr., decode it and emulate it (all work-items) until the end of the clause, then go back up to the CF clause.]
- Execution of CF clause
  - Instructions affecting control flow.
  - Synchronization operations.
  - Writes to global memory.
- Secondary ALU clause
  - Arithmetic-logic operations.
  - Accesses to local memory.
- Secondary TEX clause
  - Reads from global memory.
Demo 7
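
This is a toy C version of the two-level loop. A CF clause contains instructions that either act directly on all work-items or run an entire secondary ALU clause before control returns. The two-instruction ISA (an accumulator add and a store to global memory) is hypothetical and only meant to show the control structure.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 64

enum cf_kind { CF_STORE, CF_ALU_CLAUSE };

struct alu_instr { int add; };                   /* acc += add */

struct cf_instr {
    enum cf_kind kind;
    const struct alu_instr *clause;              /* used by CF_ALU_CLAUSE */
    size_t clause_len;
};

struct gpu_state {
    int acc[MAX_ITEMS];                          /* one register per work-item */
    int global_mem[MAX_ITEMS];                   /* written by CF_STORE */
    size_t num_items;
};

void emulate_kernel(struct gpu_state *s, const struct cf_instr *cf, size_t cf_len)
{
    assert(s->num_items <= MAX_ITEMS);
    for (size_t pc = 0; pc < cf_len; pc++) {                 /* CF instr. counter */
        if (cf[pc].kind == CF_ALU_CLAUSE) {
            for (size_t i = 0; i < cf[pc].clause_len; i++)   /* clause counter */
                for (size_t wi = 0; wi < s->num_items; wi++)
                    s->acc[wi] += cf[pc].clause[i].add;
        } else {
            for (size_t wi = 0; wi < s->num_items; wi++)     /* CF: global write */
                s->global_mem[wi] = s->acc[wi];
        }
    }
}
```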

8. The GPU Architectural Simulation
AMD Evergreen GPU Architecture
- The GPU compute device
  - Pool of pending work-groups (WGs).
  - Set of compute units (CUs).
  - Dispatcher: maps WGs to CUs.
  - Global memory hierarchy.
[Figure: a work-group dispatcher takes work-groups from the pending work-group pool and maps them to compute units 0 to N-1, which share the global memory hierarchy. Inside a compute unit: a ready wavefront pool, the CF, ALU and TEX engines, and a register file, with paths labeled ALU clause, TEX clause, local memory, global memory (reads) and global memory (writes).]
- Compute unit
  - Pool of pending wavefronts (WFs).
  - Three execution engines.
  - Local memory.
  - Register file.
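
This C sketch shows a dispatcher that maps pending work-groups to compute units. The round-robin policy and the fixed limit of work-groups per CU are assumptions for illustration. In a real GPU, the limit depends on the registers, local memory and wavefront slots that each work-group needs.

```c
#include <assert.h>

#define NUM_CUS        4
#define MAX_WG_PER_CU  2

struct dispatcher {
    int wg_on_cu[NUM_CUS];   /* work-groups currently mapped to each CU */
    int next_cu;             /* round-robin pointer */
    int next_wg;             /* first work-group still in the pending pool */
};

/* Maps the next pending work-group to a CU with room. Returns the CU id
 * (and the work-group id in *wg_out), or -1 if no CU or no WG is available. */
int dispatch_one(struct dispatcher *d, int num_wgs, int *wg_out)
{
    if (d->next_wg >= num_wgs)
        return -1;
    for (int tries = 0; tries < NUM_CUS; tries++) {
        int cu = (d->next_cu + tries) % NUM_CUS;
        if (d->wg_on_cu[cu] < MAX_WG_PER_CU) {
            d->wg_on_cu[cu]++;
            d->next_cu = (cu + 1) % NUM_CUS;
            *wg_out = d->next_wg++;
            return cu;
        }
    }
    return -1;
}

/* Called when a work-group finishes, freeing room on its CU. */
void wg_finished(struct dispatcher *d, int cu)
{
    assert(cu >= 0 && cu < NUM_CUS && d->wg_on_cu[cu] > 0);
    d->wg_on_cu[cu]--;
}
```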

8. The GPU Architectural Simulation
Execution Engines
[Figure: CF engine pipeline. Fetch (one WF) extracts a wavefront from the ready wavefront pool and reads the instruction memory (CF clause) into the instr. byte buffers. Decode (round-robin) fills the CF instr. buffers (1 entry per WF, WF 0 to WF N-1). Execute (round-robin) executes the CF instruction or launches a secondary ALU or TEX clause. Complete inserts the WF back into the ready wavefront pool.]
- Control flow (CF) engine
  - 4 stages.
  - Extracts one WF from the pool at the fetch stage.
  - Places a WF back into the pool at the complete stage.
  - Secondary clauses can be launched at the execute stage.
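
The decode and execute stages select wavefronts round-robin. This C sketch shows that selection over the per-wavefront buffers, starting after the last wavefront served. NUM_WF is an illustrative size.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_WF 8

/* Returns the next wavefront whose buffer holds an instruction, scanning
 * from the one after *last; updates *last. Returns -1 if all are empty. */
int round_robin_pick(const bool full[NUM_WF], int *last)
{
    assert(*last >= 0 && *last < NUM_WF);
    for (int k = 1; k <= NUM_WF; k++) {
        int wf = (*last + k) % NUM_WF;
        if (full[wf]) {
            *last = wf;
            return wf;
        }
    }
    return -1;   /* stage idles this cycle */
}
```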

8. The GPU Architectural Simulation
Execution Engines
[Figure: ALU engine pipeline. Fetch (one WF) reads the instruction memory (ALU clauses) into the VLIW bundle buffer (1 entry). Decode. Read (each SubWF) takes operands from the register file and local memory. Execute runs each SubWF on stream cores 0 to N-1; each stream core has processing elements x, y, z, w, t and several pipeline stages. Write sends results to the register file and local memory.]
- Arithmetic-logic (ALU) engine
  - 5 stages.
  - A WF is split into SubWFs at the read stage.
  - SubWF size is equal to the number of available stream cores (SCs).
  - Each SC has 5 pipelined processing elements (x, y, z, w, t).
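
The C sketch below shows how a wavefront's work-items map onto stream cores when the subwavefront size equals the number of stream cores. The example sizes in the test (64 work-items, 16 or 20 stream cores) and the assumption that one subwavefront enters the stream cores per cycle are illustrative.

```c
#include <assert.h>

struct sc_slot { int subwf; int stream_core; };

/* Work-item `item` of a wavefront goes to subwavefront item / num_sc and
 * runs on stream core item % num_sc. */
struct sc_slot map_work_item(int item, int num_stream_cores)
{
    assert(item >= 0 && num_stream_cores > 0);
    struct sc_slot s = { item / num_stream_cores, item % num_stream_cores };
    return s;
}

/* Cycles to issue one VLIW bundle for the whole wavefront, assuming one
 * subwavefront enters the stream cores per cycle (last one may be partial). */
int issue_cycles(int wavefront_size, int num_stream_cores)
{
    assert(wavefront_size > 0 && num_stream_cores > 0);
    return (wavefront_size + num_stream_cores - 1) / num_stream_cores;
}
```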

8. The GPU Architectural Simulation
Execution Engines
[Figure: TEX engine pipeline. Fetch (one WF) reads the instruction memory (TEX clauses) into the TEX instr. buffer (1 entry). Decode. Read takes the address from the register file and sends a request to the L1 cache (global memory). Write moves the data from the L1 cache to the register file.]
- Texture (TEX) engine
  - 4 stages.
  - Global memory reads are issued at the read stage.
  - They complete at the write stage.
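
This C sketch splits a global memory read into two parts: issuing the request at the read stage and completing it at the write stage. L1_LATENCY is a fixed placeholder. In a full model, the latency would come from the simulated memory hierarchy.

```c
#include <assert.h>
#include <stdbool.h>

#define L1_LATENCY 20   /* placeholder latency in cycles */

struct tex_access { bool pending; long ready_cycle; };

/* Read stage: send the request to the L1 cache. */
void tex_read_stage(struct tex_access *a, long now)
{
    assert(!a->pending);
    a->pending = true;
    a->ready_cycle = now + L1_LATENCY;
}

/* Write stage: returns true if the data reaches the register file this
 * cycle, false if the access is still in flight (the stage stalls). */
bool tex_write_stage(struct tex_access *a, long now)
{
    if (!a->pending || now < a->ready_cycle)
        return false;
    a->pending = false;
    return true;
}
```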

8. The GPU Architectural Simulation
Summary of Work-item Grouping
- ND-Range
  - Group of all work-items for one kernel launch.
- Work-group
  - Work-items can perform synchronizations.
  - Work-items share a fast-access local memory.
- Wavefront
  - SIMD execution unit.
- Subwavefront
  - Work-items that can be issued to stream cores at the same time.
[Figure annotation: ND-Range and work-group belong to the OpenCL programming model; wavefront and subwavefront belong to the GPU architecture.]
Demo 8
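
The grouping levels can be related with simple arithmetic. The C sketch below covers one-dimensional launches and uses hypothetical names. The sizes in the test (an ND-Range of 1024 work-items, work-groups of 256, 64-wide wavefronts, 16 stream cores) are example values.

```c
#include <assert.h>

struct grouping {
    int work_groups;          /* work-groups in the ND-Range */
    int wavefronts_per_wg;    /* wavefronts needed per work-group */
    int subwfs_per_wf;        /* subwavefronts per wavefront */
};

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

struct grouping compute_grouping(int ndrange_size, int wg_size,
                                 int wf_size, int num_stream_cores)
{
    assert(ndrange_size > 0 && wg_size > 0 && wf_size > 0 && num_stream_cores > 0);
    assert(ndrange_size % wg_size == 0);   /* required by OpenCL 1.x */
    struct grouping g = {
        ndrange_size / wg_size,
        ceil_div(wg_size, wf_size),
        ceil_div(wf_size, num_stream_cores),
    };
    return g;
}
```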

9. Benchmarks and Simulations
Supported GPU Benchmarks
- AMD SDK's OpenCL benchmarks
  - Matrix computations.
  - Financial benchmarks.
  - Sorting algorithms.
  - etc.
- Features
  - Provided on the Multi2Sim website as x86 & Evergreen binaries.
  - Command-line arguments can be tuned for different input sizes.
  - Both CPU and GPU implementations are provided, with self-check.
Demo 9

10. Conclusions
Simulation Capabilities
- x86 CPU simulation
  - ISA-level.
  - No need for full-system simulation.
  - Superscalar/multithreaded/multicore.
  - Memory hierarchies and interconnects.
  - State-of-the-art benchmarks.
- AMD Evergreen GPU simulation
  - ISA-level.
  - First full architectural simulation framework.
  - Realistic GPU pipeline (based on AMD Radeon 5870).
  - Memory hierarchies and interconnects.

10. Conclusions
Additional Material
- The Multi2Sim Guide
  - Complete documentation.
  - "Getting started" sections, with execution examples.
  - Description of CPU and GPU architectural models.
- The Multi2Sim Forum
  - Discussion forum for Multi2Sim users.
- The Multi2Sim Mailing List
  - Announcements of new versions, updated documentation, etc.

10. Conclusions
Future Work
- Extending support for benchmarks
  - Support for the entire OpenCL specification.
  - Support for the entire Evergreen ISA.
  - Support for the complete AMD SDK suite, and other upcoming benchmarks.
- Focus on heterogeneous architectures
  - Model for AMD Fusion.
  - CPU and GPU working concurrently.
  - Supporting/designing benchmarks with heterogeneous processing.
- Maintenance of CPU simulation
  - Issues reported by Multi2Sim users.
  - Stability and support increase day by day.