Crashdump Analysis On Solaris Frank Hofmann
Operating System
on x86 Platforms
Crashdump Analysis
Operating System Internals
Table of Contents
1.Foreword
  1.1.History of this document
  1.2.About modifying this document
2.Introduction to x86 architectures
  2.1.History and Evolution of the x86 architecture
  2.2.Characteristics of x86
  2.3.Marketeering - Naming the architecture
3.Assembly Language on x86 platforms
  3.1.Generic Introduction to Assembly language
  3.2.Assembly language on x86 platforms
  3.3.x86 assembly on UNIX systems - calling conventions, ABI
  3.4.Case study: Comparing x86 and SPARC assembly languages
  3.5.The role of the stack
  3.6.Odd things about the x86 instruction set
  3.7.Examples of compiler-generated code on x86 platforms
  3.8.Accessing data structures
  3.9.Compiler help for debugging AMD64 code
4.Memory and Privilege Management on x86
  4.1.The x86 protected mode - privilege management
  4.2.Traps, Interrupts, System Calls, Contexts
  4.3.Virtual Memory Management on x86
  4.4.Advanced System Programming Techniques on x86
5.Interrupt handling, Device Autoconfiguration
  5.1.Interrupt Handling and Interrupt Priority Management
  5.2.APIC and IOAPIC features
6.Solaris/x86 architecture
  6.1.Kernel and user mode
  6.2.Entering the Solaris/x86 kernel
  6.3.Solaris/x86 VM architecture - x86 HAT layer
  6.4.Virtual Memory Layout
  6.5.Context switching
  6.6.Supporting Multiple CPUs
  6.7.isaexec - Creating 32/64bit-specific applications
7.Solaris/x86 Crashdump Analysis
  7.1.Debugging tools for core-/crashdump analysis
  7.2.Troubleshooting system hangs on Solaris/x86
  7.3.32bit kernel crashdump analysis - a well-known example
  7.4.64bit kernel crashdump analysis - well known example
  7.5.Another 64bit crashdump analysis example
  7.6.AMD64 ABI - Backtracing without framepointers
  7.7.Examples on application coredump analysis on Solaris/x86
  7.8.Advanced Debugging Topics
8.Lab Exercises
  8.1.Introduction to x86 assembly language
  8.2.Stacks and Stacktracing
9.References
10.License
1.Foreword
1.1.History of this document
This document didn't start out from nowhere, but neither has it originally been
intended for publication in book form. But then, sometimes history takes unexpected
paths ...
Shortly after Sun had revised the ill-begotten idea of "phasing out" Solaris for x86
platforms and started to ramp up a hardware product line with Intel CPUs in it, I was
approached by the Service division within Sun about where they could get an
introductory course about how to perform low-level troubleshooting - crashdump
analysis - on the x86 platform. Information and trainings about troubleshooting on this
level on SPARC platforms are widely available - starting with the famous "Panic!" book
all the way to extensive classes offered by Sun Educational Services to participants
both internal and external to Sun. That notwithstanding, we soon found out that no
internal training about the low-level guts of Solaris/x86 did exist. Development
engineers were usually both capable and encouraged to find out about the x86
platform on their own, and users outside of the engineering space were few and far
between. So this project started as a slide set for teaching engineers who were familiar
with SPARC assembly, Solaris Internals and some Crashdump Analysis the
fundamentals of x86 assembly and Solaris on x86 platforms, strongly focusing on
"what's similar" and "what's different" between the low-level Solaris kernel on SPARC
and x86 platforms.
I was to a large degree surprised by the amount of interest this material generated
internally, so it grew, as time allowed, into a multi-day internal course on Solaris/x86
internals and crashdump analysis. For a while, I came to spend a significant amount of
time teaching this never-official "class" ...
Then came the work on Solaris 10 and the AMD64 port. The new "64bit x86" platform
support brought changes in the ABI with it that severely surprised even experienced
"x86 old-timers" and required a large amount of addition to the existing material,
which at that time had grown into a braindump of semi-related slides. Revamping the
Solaris hardware interface layer for both 32bit and 64bit on x86/AMD64 as well as the
addition of new features like Dtrace or the Linux Application Environment made
further modifications necessary.
In the end, StarOffice's limited ability to deal with presentations of 200+ slides
made it inevitable to drop the till-then adopted method of "add a slide as a
new question comes up".
Would I have to make the same choice again I'd probably have opted to install myself a
TeX system, but I decided to give StarOffice another chance and turn this material into
something closer to a book. How I regret not having used TeX to start with ... that'll
teach me !
Over the course of the AMD64 port of Solaris this grew into essentially the current
form, and when people started using the 64bit port internally a large number of new
questions and typical problems came up which I attempted to address. To say it
upfront, while the assembly language on AMD64 will be immediately familiar to people
who know about "classical" x86, the calling conventions used in 64bit machine code on
AMD64 are so much different that in many aspects crashdump analysis on
Solaris/AMD64 is closer to Solaris/SPARC than it is to Solaris/x86. But then it isn't ...
well, I'm digressing, go read the book.
Then the OpenSolaris project came. Initially, I had planned to publish this on launch
day, but for many reasons this didn't work out at that time. So here it is - several
months delayed, no longer completely covering the state of our internal and external
(OpenSolaris) development releases. But it's finally reviewed, the crashdump analysis
example dumps are made available, the StarOffice document has been cleaned up to
only rely on freely available fonts + graphics.
Which means that you - yes, look into the mirror - are now supposed to work with this
material, and on it. The whole document including all illustrations is now made
available in editable form.
Please read the license attached to the end of the document.
Yes, you can make modifications to this document.
Yes, you can redistribute copies of this document in any form you see fit - you're in
fact encouraged to do so.
Yes, you're encouraged to contribute corrections or additions.
For all else legalese, see the appendix.
© 2003-2005, Frank Hofmann, Sun Microsystems, Inc.
Enjoy - and never forget:
Don't panic !
(Shall I say green is my favourite color ?)
If you wish to contact the author, please send Email to:
Frank.Hofmann@sun.com
At this point in time, I cannot even start listing the number of people that have made
this document possible. Given that it didn't start as a book project I've kept a lousy
bibliography.
I'd like to both thank every unnamed contributor as well as excuse myself for not
naming you.
Using the words of Isaac Newton:
If I have seen further it is by
standing on the shoulders of giants.
You know who you are.
1.2.About modifying this document
StarOffice8 is used to edit this document, but (Beta) versions of OpenOffice 2.x should
be able to access it as well.
The document uses the OpenSource DejaVu fonts which are a derivative of BitStream
Vera. The difference between these two is that the DejaVu font family contains full
bold/italic/bolditalic/condensed typefaces for Sans, Serif and Monospaced, while the
original Bitstream Vera fonts only supply the full typeface set for Sans. Installing the
DejaVu fonts is therefore a prerequisite to being able to edit this document and
recreate the output as-is.
These fonts are available from http://dejavu.sourceforge.net
Other fonts than DejaVu should not be used. To simplify this, switch the StarOffice
stylist tools to only show "Applied Styles", and don't use any but these.
If you wish to contribute back changes/additions in plain text that's more than
welcome. If you modify the StarOffice document itself, allow simple merge back by
enabling the change recording facility in StarOffice. See the help functionality, on
"Changes".
Note that StarOffice's master text facility is somewhat dumb - it records full
pathnames (instead of relative locations) for the subdocuments. When you open
book.odm in StarOffice8, the Navigator will show you the list of subdocuments. Use the
right mousebutton to request the context menu, and choose "Edit Link" to change the
pathnames of the subdocuments to refer to the location where you unpacked the file
set.
The same is true for embedding graphics. Not even the documented functionality
("link" to the illustrations instead of instantiate a copy for the document) is working.
So be aware when you change some file under figures/, you might need to delete and
reinsert it in the main document ...
I'll keep a pointer to the current version (StarOffice for editing / PDF for reading and
printing) of this document on my blog:
http://blogs.sun.com/ambiguous/
And finally: These instructions should be better ...
2.Introduction to x86 architectures
2.1.History and Evolution of the x86 architecture
The main driving force in development of the x86 processor family has always been to
enhance existing functionality in such a way that full binary-level compatibility with
previous x86 processors can be maintained. How important this is to Intel is best
described in Intel's own words:
    One of the most important achievements of the IA-32 architecture is
    that the object code programs created for these processors starting
    in 1978 still execute on the latest processors in the IA-32
    architecture family.
Among all CPU architectures still available in current machines, only the IBM3xx
mainframe architecture (first introduced in 1964 with the IBM360, still available in the
IBM zSeries mainframes) has a longer history of unbroken binary backward
compatibility. All current "x86-compatible" CPUs still support and implement the full
feature set of the original member of the x86 family, the Intel 8086 CPU which was
introduced in 1978.
This means: Executable programs from code originally written for the 8086 will run
unmodified on any recent x86-compatible CPU such as Intel's Pentium-IV or AMD's
Opteron processor. Yes, MSDOS 1.0 is quite likely to run on the very latest and
greatest "PC-compatible", provided you can still find some single-sided 360kB 5¼"
floppy drive which would allow you to boot it on that shiny new AMD Opteron
workstation.

Illustration 1 - Overview of the x86 architecture evolution
  1978  i8086        16bit, 1MB RAM, 8 registers, segments
  1982  i80286       still 16bit, 16MB RAM, 8 registers, protected mode
  1985  i80386       32bit, 4GB RAM, 32bit MMU (IA32 architecture)
  1987  i80486       integrated FPU, on-chip cache
  1993  Pentium      large pages, SMP support (APIC), MMX extension
  1997  Pentium-II   64GB RAM (PAE), on-chip 2nd level cache
  1999  Pentium-III  SSE extension
  2000  Pentium 4    SSE2 extension, RISC internally / µops
  2003  AMD Opteron  64bit, 256TB virtual memory, 16 registers (AMD64 architecture)

Backward compatibility of the x86 processor family goes way beyond what most other
CPU architectures (including SPARC) have to offer. Sun Microsystems' Solaris/SPARC
binary compatibility guarantee only ensures that applications (not operating systems
or other low-level code) written on and for previous OS/hardware will continue to run
on recent OS/hardware combinations, but it does not claim that old versions of the
Solaris Operating Environment will run on processors that were yet unreleased at the
time a specific release shipped. This is different on x86. New versions of x86 CPUs
from whatever vendor run older operating systems just fine. Incompatibilities, if any,
arise from the lack of device driver support for newer integrated peripherals, but not
from the newer CPU's inability to function like its predecessors.
Since introduction of the Intel i80386 in 1985 (!), most features of the x86 architecture
have remained remarkably constant. SMP support (via APIC) was added in the Pentium
and support for more than 4GB physical memory (via PAE) in the PentiumPro
processors; after that, only instruction set extensions (MMX, SSE) were
added but no externally-visible changes were done to other core subsystems of x86.
From the point of view of Solaris/x86, it was therefore never necessary to have more
than one kernel, /platform/i86pc/kernel/unix, for supporting the operating system
on x86 processors. Put this in context and compare it with Solaris on SPARC: For the
various SPARC generations (maximum number of architectures concurrently
supported in Solaris 2.6: sun4, sun4c, sun4d, sun4m, sun4u, sun4u1), each time
separate platform support was required. Even today, Solaris delivers ten (!) different
kernels for the various SPARC platforms, while Solaris for x86 still has only one.
This strict insistence on binary compatibility with its predecessors obviously has
disadvantages as well. The way how the i80386 introduced 32bit support in some areas
looks illogical and counterintuitive, especially when comparing it with 32bit
architectures that were designed for 32bit from their very beginnings. Some examples
of this will be given later.
After releasing the i80386 32bit processor, Intel decided to keep future versions of
x86-compatible ("IA32" in Intel's terms) CPUs on 32bit. Each generation became faster
and added functionality, but the limitation to 32bit remained. In the early 1990s, this
did not seem a problem because the major markets for x86 at that time (Microsoft DOS
and Windows) were 16bit only anyway, and Intel's evolutionary path to 64bit had been
laid out in the agreement with HP to co-develop a new 64bit architecture: IA64, then
dubbed "Merced", is today found in the Intel Itanium processors.
But IA64 has nothing to do with x86. The instruction sets have nothing in common and
existing programs or operating systems written for 32bit x86 processors cannot run on
machines with IA64/Itanium processors in it. The Itanium, though produced by Intel, is
a genetic child of HP's PA-RISC architecture, but only a distant relative to Intel's own
x86/IA32.
In addition to that, Intel and HP were late at delivering the IA64 CPU - very late.
So late that back in 2000, AMD stepped in and decided to extend the old x86
architecture another time - to 64bit. AMD had, with varying success, been building
x86-compatible processors since the early 1980s and saw Intel's de-facto termination
of x86 as a chance to extend its own market reach. The AMD64 (64bit x86)
architecture was done in a way very similar to how Intel had done the i80386, and
processors based on AMD64 (much unlike Itanium/IA64) are, in good old x86 tradition,
fully binary backward compatible. Of course, actually using the new 64bit operating
mode requires porting operating system and applications (like using 32bit on the
i80386 did require at the time). But even when running a 64bit operating system does
AMD64 provide a sandboxed 32bit environment to run existing applications in (again,
like the i80386 which allowed the same for 16bit programs running on a 32bit OS).
Therefore the AMD64 architecture offers much better investment protection than IA64
- which will not run existing 32bit operating systems or applications.
By the time the AMD Opteron 64bit processor became available, the Itanium, on the
market for three years then, had seen very little adoption - while users and software
vendors kept pushing ever harder on Intel to follow AMD's lead and provide 64bit
capabilities in their x86 processor line as well. Intel resisted this for several years in
order not to jeopardize the market for their Itanium processors but eventually gave in
and cloned AMD64. For obvious reasons Intel doesn't call their 64bit-capable x86
processors "AMD64-compatible" but uses the term EM64T (Extended Memory 64
Technology) for the architecture and IA32e for the 64bit instruction set extension.
Intel CPUs with EM64T are compatible with AMD64 - which Intel confirms in the FAQ
for the 64bit Extension Technology.
http://www.intel.com/technology/64bitextensions/faq.htm notes that:
    Q9: Is it possible to write software that will run on Intel's processors with
    Intel® EM64T, and AMD's 64-bit capable processors?
    A9: Yes, in most cases. Even though the hardware microarchitecture for
    each company's processor is different, the operating system and software
    ported to one processor will likely run on the other processor due to the
    close similarity of the instruction set architectures.
How the future of x86 will look remains to be seen. But the x86 architecture, with
more than 25 years of age, has far surpassed the success of all other (non-embedded)
processor architectures ever developed. With 64bit extensions that have rejuvenated
x86, and x86-compatible processors with 64bit capabilities becoming commonplace
now, this is unlikely to change in the near future.
2.2.Characteristics of x86
There are two factors responsible for the main characteristics of the machine
instruction set for what is commonly termed "x86 architecture":
  • The long history of x86 has left its mark on the instruction set.
    x86 machine code carries a huge legacy of (mis-)features from the time when the
    architecture was still 16bit only, and in parts even from pre-x86 8bit days (in the
    form of limited compatibility with the Intel 8008).
  • The need to introduce new capabilities without breaking binary compatibility has
    led to a lot of instruction set extensions that are optional, and whose presence
    needs to be detected by applications / operating systems that want to make use of
    them. In addition, x86 never was a vendor-locked-in architecture, even though
    Intel's decisions have dominated its evolution. Both operating systems and
    application code for x86 therefore need to expend some effort on determining
    which CPU by what vendor they run on, and what instruction set extensions this
    CPU provides, before they can make use of optimized code.
    This is fortunately much improved by AMD64 which establishes a new "64bit x86
    baseline".
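To make the detection step concrete: on x86 the CPUID instruction reports optional
features as individual bits in its result registers, and software tests those bits before
selecting an optimized code path. The Python sketch below only decodes such a register
value. The bit positions are the standard feature flags returned in EDX for CPUID
function 1; the names decode_features and sample_edx are made up for illustration, and
the sample value is constructed by hand, not read from a real CPU.

```python
# Bit positions in EDX after executing CPUID with EAX=1 (standard feature flags).
CPUID_1_EDX_FEATURES = {
    0: "fpu",
    6: "pae",
    9: "apic",
    23: "mmx",
    25: "sse",
    26: "sse2",
}

def decode_features(edx):
    """Return the names of all known feature bits that are set in edx."""
    return [name for bit, name in sorted(CPUID_1_EDX_FEATURES.items())
            if edx & (1 << bit)]

# Hypothetical register value, assembled for illustration only:
# a CPU with FPU, PAE, APIC, MMX and SSE, but without SSE2.
sample_edx = (1 << 0) | (1 << 6) | (1 << 9) | (1 << 23) | (1 << 25)

features = decode_features(sample_edx)
print(features)
print("can use SSE2 code path:", "sse2" in features)
```

An operating system or library would execute the real CPUID instruction and perform
the same bit tests in C or assembly; the table lookup is merely the easiest form to read.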
In addition to that, x86 CPUs use the so-called little endian way of ordering data in
memory. Endianness becomes very relevant once data needs to be exchanged between
systems of differing architecture.
2.2.1.CISC and RISC
Back in the early days of CPU design in the 1970s and early 1980s, manufacturing
technology did not allow for anything close to the complexity we have today. CPU
designers then had to make tradeoffs, mostly between a feature-rich assembly
language, but few registers and generally lower instruction throughput, and a feature-
poor assembly language with many registers and faster execution for the simple
instructions that there were.
The x86 architecture is the classical example of a so-called CISC processor. The term
CISC stands for Complex Instruction Set Computer, and is used to describe a processor
whose instruction set offers single, dedicated CPU instructions for possibly very
involved tasks. Philosophically, the ultimate design goal for a CISC processor is to
achieve a 1:1 match between CPU instructions and instructions in a high-level
programming language.
CISC is almost a requirement for CPUs which maintain full backward compatibility
such as the x86 family. Adding functionality to an existing architecture always means
adding instructions and complexity. A pure evolutionary CPU development as Intel has
done it therefore almost necessitates a CISC architecture.
All in all, Intel's latest instruction set reference needs two volumes and more than
1000 pages to describe all x86 instructions supported by the latest x86 CPUs by Intel.
For comparison - the sparcv9 architecture reference manual only has 106 pages
describing all sparcv9 assembly instructions.
Given the focus on instruction functionality vs. versatility, CISC architectures tend to
have features like:
  • many special-purpose instructions.
    An example on x86 would be two separate instructions for comparison - the generic
    CMP instruction and the TEST instruction which will only check for equality or
    zeroness.
  • the ability to modify a memory location directly, without the need to load its
    contents into a register first.
    This is done to offset the lack of registers - the idea is that if destination or source
    of an operation can be memory, fewer registers are needed.
  • instructions with varying length.
    This is both due to the fact that CISC architectures usually allow to embed (large)
    constants into the instruction, and because feature additions over time have
    required the introduction of longer opcodes (instruction encodings).
    Another consequence of this is that there are few gaps (undefined or illegal
    opcodes) in the instruction set. As we will see, to an x86 CPU random data makes up
    for a decodeable instruction stream !
  • few general-purpose registers.
    Historically there had to be a tradeoff between using the space on the CPU die to
    provide more registers or more-capable instructions. CISC CPU designers chose to
    do the latter, and it often proved difficult to extend the register set even after
    manufacturing technologies would have allowed for it. The x86 architecture lived
    with only eight registers, until AMD designing the 64bit mode finally took the
    chance and extended the register set to 16.
The x86 architecture is the single major remaining CISC architecture out there today.
Most other CPU architectures on the market today, whether SPARC, PowerPC, ARM or
(to a degree) even IA64, have gone the other way - RISC.
SPARC assembly source        binary machine code    disassembler output
                             section .text          section .text
func:
    tst    %i0               0:  80 90 00 18        tst    %i0
    orcc   %g0, %i0, %g0     4:  80 90 00 18        tst    %i0
    set    1234, %i0         8:  b0 10 24 d2        mov    0x4d2, %i0
    or     %g0, 1234, %i0    c:  b0 10 24 d2        mov    0x4d2, %i0
    cmp    %i0, %i1          10: 80 a6 00 19        cmp    %i0, %i1
    subcc  %i0, %i1, %g0     14: 80 a6 00 19        cmp    %i0, %i1
    clr    %i0               18: b0 10 00 00        clr    %i0
    or     %g0, %g0, %i0     1c: b0 10 00 00        clr    %i0
    mov    %i1, %i0          20: b0 10 00 19        mov    %i1, %i0
    or     %g0, %i1, %i0     24: b0 10 00 19        mov    %i1, %i0
    .size  func, .-func
Illustration 2 - machine code example on RISC, synthetic instructions
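The pairs of identical machine code words in the illustration can be checked by hand.
The Python sketch below splits a 32bit SPARC instruction word into the fields of the
arithmetic/logical instruction format (rd, op3, rs1, the i bit, and rs2 or a signed 13bit
immediate) and prints the underlying real instruction. It is a minimal decoder that only
knows the three op3 values occurring above; the helper names reg and disasm are
illustrative assumptions, not part of any real disassembler.

```python
# op3 values of the three real instructions used in the illustration.
OP3_NAMES = {0x02: "or", 0x12: "orcc", 0x14: "subcc"}

def reg(n):
    """Name of integer register n: 0-7 globals, 8-15 outs, 16-23 locals, 24-31 ins."""
    return "%" + "goli"[n >> 3] + str(n & 7)

def disasm(word):
    """Decode one 32bit SPARC arithmetic/logical instruction word."""
    rd = (word >> 25) & 0x1f
    op3 = (word >> 19) & 0x3f
    rs1 = (word >> 14) & 0x1f
    if (word >> 13) & 1:
        # i bit set: second operand is a sign-extended 13bit immediate
        imm = word & 0x1fff
        if imm & 0x1000:
            imm -= 0x2000
        operand = str(imm)
    else:
        # i bit clear: second operand is register rs2
        operand = reg(word & 0x1f)
    return "%s %s, %s, %s" % (OP3_NAMES[op3], reg(rs1), operand, reg(rd))

for word in (0x80900018, 0xb01024d2, 0x80a60019, 0xb0100000, 0xb0100019):
    print("%08x  %s" % (word, disasm(word)))
```

Each synthetic instruction (tst, set, cmp, clr, mov) decodes to an or, orcc or subcc that
uses %g0 as a source or as a throw-away destination, which is why the assembler emits the
same word for both source lines of a pair.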
SPARC and all its incarnations are a classical example of RISC (Reduced Instruction
Set Computer), and share many generic features with other RISC architectures:
  • Lots and lots of CPU registers are available. For example, SPARC provides at least
    32 general-purpose registers (internally hundreds, via register windows).
  • To modify data in memory, one must load it into a register, modify the register
    contents and store the register back into memory. This is called a load-store
    architecture.
  • RISC instructions usually have a fixed instruction size. All SPARC instructions, for
    example, are 32bit. RISC instruction sets are rather designed than evolved.
  • Instructions often are multi-purpose. A RISC CPU, for example, may not have
    separate instructions for subtracting values, comparing values or testing values for
    zero - instead, typically, "SUB" will be used but the result (apart from condition
    bits) be ignored. See the SPARC assembly code example above.
  • Instructions tend to be simple. If a RISC CPU offers complex instructions at all, they
    are usually completed by help of the operating system - instructions leading to
    complex system activity will trap and require software help to finish.
Unlike CISC, the focus for RISC is on raw execution power - the more instructions per
unit of time a CPU can process the faster it will be in the end. Executing a dozen
simple instructions as fast as theoretically possible often proves to provide better
throughput than executing a single, slow instruction to achieve the same effect. RISC
originally was invented to allow for simpler CPU designs running at higher clock
speed.
RISC pays for this by often requiring more instructions to achieve an equivalent result
as CISC gets with just one or two instructions:

x86 assembly                         binary code
movq   $0x123456789abcdef0, %rax     48 b8 f0 de bc 9a 78 56 34 12
addq   %rax, var                     48 01 04 25 XX XX XX XX

SPARC assembly                       binary code
sethi  %hi(0x12345400), %o1          13 04 8d 15
xor    %o1, -0x279, %o1              92 1a 7d 87
sethi  %hi(0x65432000), %o0          11 19 50 c8
xor    %o0, -0x110, %o0              90 1a 3e f0
sllx   %o1, 32, %o1                  93 2a 70 20
xor    %o1, %o0, %o0                 90 1a 40 08
sethi  %hi(var), %o1                 13 0X XX XX
or     %o1, %lo(var), %o1            92 12 6X XX
ldx    [%o1], %o2                    d4 5a 40 00
addc   %o0, %o2, %o2                 94 42 00 0a
stx    %o2, [%o1]                    d4 72 40 00

Illustration 3 - RISC / CISC: Adding a 64bit constant to a global variable "var"
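The first six SPARC instructions of the illustration do nothing but build the 64bit
constant that x86 carries inside a single instruction. The Python sketch below replays
them on plain integers to show that they really yield 0x123456789abcdef0, and adds up the
code sizes of both variants. It is an emulation under simplifying assumptions: registers
are modelled as 64bit integers, and the helper names sethi and xor_simm are invented for
this example.

```python
MASK64 = (1 << 64) - 1

def sethi(value):
    """sethi %hi(value): keep bits 31..10, clear the low 10 bits."""
    return value & 0xfffffc00

def xor_simm(reg_value, simm13):
    """xor with a sign-extended 13bit immediate, result truncated to 64 bits."""
    return (reg_value ^ simm13) & MASK64

o1 = xor_simm(sethi(0x12345400), -0x279)   # upper half, still in the low 32 bits
o0 = xor_simm(sethi(0x65432000), -0x110)   # lower half, upper 32 bits all ones
o1 = (o1 << 32) & MASK64                   # sllx %o1, 32, %o1
o0 = o1 ^ o0                               # xor  %o1, %o0, %o0
print(hex(o0))

x86_bytes = 10 + 8      # movq with 8-byte immediate + addq with 4-byte address
sparc_bytes = 11 * 4    # eleven fixed-size 32bit instructions
print(x86_bytes, sparc_bytes)
```

The negative immediates set the upper 32 bits of both intermediate values to all ones;
the final xor cancels them again, which is what makes the two-halves construction work.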
Today, most arguments in the CISC vs. RISC debate have become obsoleted by
technical progress.
Since the introduction of Intel's Pentium-IV and AMD's Athlon, modern x86 processors
internally "recompile" x86 instructions into RISC instruction sets. Intel calls this µ-ops,
while AMD uses the term ROPs (RISC ops) openly. These RISC execution engines in
x86 CPUs are not exposed to the user - the step of decoding/compiling x86 instructions
into the underlying micro-ops is done by an additional layer of hardware in the
instruction decoder part of these CPUs.
Likewise, RISC CPUs over time have added complex instructions such as hardware
multiply/divide which had to be done purely in software in early RISC designs.
Additionally, instruction set extensions like the Visual Instruction Set (VIS) on
UltraSPARC or AltiVec on PowerPC allow for DSP-like (SIMD) functionality just like
MMX/SSE do on x86.
So what is a modern x86 CPU then ? CISC or RISC ?
The answer is:
Both. It is a CISC CPU, but to perform best, one has to program it like a RISC CPU.
For example, AMD in their Software Optimization Guide for AMD Athlon64 and AMD
Opteron Processors explains it like this:
    The AMD64 instruction set is complex; instructions have variable-length
    encodings and many perform multiple primitive operations. AMD Athlon 64
    and AMD Opteron processors do not execute these complex instructions
    directly, but, instead, decode them internally into simpler fixed-length
    instructions called macro-ops. Processor schedulers subsequently break
    down macro-ops into sequences of even simpler instructions called micro-
    ops, each of which specifies a single primitive operation.
and a little later:
    Instructions are classified according to how they are decoded by the
    processor. There are three types of instructions:

    Instruction Type     Description
    DirectPath Single    A relatively common instruction that the processor
                         decodes directly into one macro-op in hardware.
    DirectPath Double    A relatively common instruction that the processor
                         decodes directly into two macro-ops in hardware.
    VectorPath           A sophisticated or less common instruction that the
                         processor decodes into one or more [ ... ] macro-ops [ ... ].

and finally:
    Use DirectPath instructions rather than VectorPath instructions.
In short:
Bypass the CISC runtime translation layer to get best performance out of the
underlying RISC execution engine.
Similar notes can be found in the respective manuals for Intel's Pentium IV CPU family
and later.
2.2.2.Endianness
The x86 CPU family is traditionally Little Endian. What does this mean ?
The topic of how bytes that form multi-byte (or, for that matter, multi-bit) entities
should be ordered in the past used to have almost religious traits. This is the reason
why the technical term for memory byte ordering, Endianness, was taken from
Gulliver's Travels by Jonathan Swift and refers to the holy war between the two
empires of Lilliput and Blefuscu about the question which end eggs are to be opened at
first.
The original reference which coined the term seems to be Danny Cohen's famous
essay "On holy wars and a plea for peace", which dates from the 1st of
April 1980 and became a classic on that subject after it was published by the IEEE
computing magazine in 1981. The article is also known under the reference number
IEN-137.
Data ordering in little endian mode
> utsname+0x303/s
utsname+0x303:  snv_24
> utsname+303/J
utsname+0x303:  736e765f32340000
> utsname+303/2X
utsname+0x303:  32340000 736e765f
> utsname+303/4x
utsname+0x303:  0 3234 765f 736e
> utsname+303/8B
utsname+0x303:  0 0 34 32 5f 76 6e 73

Data ordering in big endian mode
> utsname+0x303/s
utsname+0x303:  snv_24
> utsname+0x303/J
utsname+0x303:  736e765f32340000
> utsname+0x303/XX
utsname+0x303:  736e765f 32340000
> utsname+0x303/4x
utsname+0x303:  736e 765f 3234 0
> utsname+0x303/8B
utsname+0x303:  73 6e 76 5f 32 34 0 0
When a processor accesses a multi-byte data type (e.g. C types short, int, long,
long long) from memory in a single operation, it will make an implicit assumption
what comes first - the most significant byte (MSB) or the least significant byte (LSB).
These terms are used interchangeably with Endianness:
  LSB (least significant byte first) : Little Endian
  MSB (most significant byte first) : Big Endian

Illustration 4 - On the origin of the term "Endianness" (labels: Little End, Big End)
Just as there is endianness on byte level, there's also endianness on bit level, i.e.
regarding the ordering of bits within a byte. But this poses fewer problems than byte
ordering, because apart from serial protocols little data exchange is done on bit-level,
and fortunately mixed-endian CPUs that used little endian for bits and big endian for
bytes or vice versa (yuck - like ancient Greek written in a mode called "boustrophedon",
"like the ox plows" - one line from left-to-right, and the next right-to-left) are no
longer on the market. Today, big-endian CPUs use big endian for both bit and byte
ordering, and likewise little-endian CPUs.
To a CPU, reading numbers from memory, aka ordering bytes within a word, is like
reading a text to humans - words are made up from characters, and you read them
from left-to-right - unless, of course, you're reading Arabic or Hebrew texts, or
traditional Chinese, where you read them from right-to-left. There is no inherent
advantage or disadvantage to doing it either way, and what's supposed to be the correct
way of doing it depends on the CPU/language you use. But a consequence is that what
feels natural to one seems very odd to the other.
So accessing data the big-endian way is like reading left-to-right, while little-endian is
like reading right-to-left and therefore may look odd. But if the output is formatted, it
becomes clear again:
Little Endian, right-aligned pointers:

    > utsname+101/J
    utsname+0x101:  6361626863746168
    > utsname+101/X
    utsname+0x101:          63746168
    > utsname+101/x
    utsname+0x101:              6168
    > utsname+101/B
    utsname+0x101:                68

Big Endian, left-aligned pointers:

    > utsname+101/J
    utsname+0x101:  6361626863746168
    > utsname+101/X
    utsname+0x101:  63616268
    > utsname+101/x
    utsname+0x101:  6361
    > utsname+101/B
    utsname+0x101:  63
The difference in endianness between e.g. x86 (little endian) and SPARC (big endian)
becomes relevant as soon as data is exchanged between two machines of differing
endianness. This can happen even within the same system, if the CPU and a
peripheral device use different endianness but share memory. Whenever file contents,
shared memory or network packets are exchanged between two parties that use
different endianness, a common storage format must be agreed on, or a method to
swap endianness must be found.
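As an illustration of both options (agreeing on a storage format, or swapping), the
following is a minimal sketch in C. The function names are made up for this example and
are not part of any Solaris interface; the code only relies on standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns 1 if the lowest-addressed byte of a multi-byte integer holds its
 * least significant byte (little endian), 0 otherwise (big endian).
 */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Store a 32bit value in big-endian order, independent of the host. */
void store_be32(unsigned char buf[4], uint32_t value)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)(value >> 24);    /* most significant byte first */
    buf[1] = (unsigned char)(value >> 16);
    buf[2] = (unsigned char)(value >> 8);
    buf[3] = (unsigned char)value;
}

/* Load a 32bit value that was stored in big-endian order. */
uint32_t load_be32(const unsigned char buf[4])
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
        ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Swap the byte order of a 32bit value (little endian <-> big endian). */
uint32_t swap32(uint32_t value)
{
    return (value >> 24) | ((value >> 8) & 0x0000ff00u) |
        ((value << 8) & 0x00ff0000u) | (value << 24);
}
```

store_be32()/load_be32() implement the "common storage format" approach and behave
the same on any host, while swap32() is the building block for the "swap" approach and
is only applied when host_is_little_endian() disagrees with the format of the data.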
Examples how to deal with endianness are:
Network Byte Ordering.
On creating network packet contents, the sender is supposed to use the host-to-
network interfaces, htonl() etc., to convert data to the network byte ordering,
while the receiver shall use the corresponding network-to-host functions,
ntohl() etc., to decode network data into its native format.
Network byte ordering is big-endian, but it is unportable to program based on
that assumption. It's also unnecessary - on big-endian machines, the interfaces
for host/network byteorder conversion will do nothing - they'll simply pass
through their input. Compiler optimizers eliminate these calls on big-endian
systems.
See manpage byteorder(3SOCKET).
Remote Procedure Calls (RPC).
Passing RPC arguments between two systems requires an endian-agnostic data
representation. This is called XDR (External Data Representation), and a
library is supplied that programs can use to convert a huge variety of basic data
types into XDR representation. XDR is a generalization of network byte ordering
to arbitrary data types. See manpage xdr(3NSL).
DMA - Memory Access by device drivers.
Peripheral devices and the main CPU in a machine may access memory with
differing endianness. When writing a device driver for such a device, the
programmer therefore needs an interface to specify to the host operating
system that a given device is big- or little-endian. Depending on whether device
and host use the same or a different byte ordering, data to be transferred to or
from that device must be converted into the proper byte order. Under Solaris,
the DDI interface set provides routines to request byte swapping to be done by
the framework. See manpages:
ddi_device_acc_attr(9S), ddi_dma_mem_alloc(9F) and ddi_dma_sync(9F).
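For the network byte ordering case from the list above, a minimal sketch in C could look
as follows. htonl() and ntohl() are the standard interfaces; the wrapper names
put_wire32()/get_wire32() and the use of <arpa/inet.h> as the header are assumptions
made for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sender side: place a host-order value into a packet buffer in network order. */
void put_wire32(unsigned char wire[4], uint32_t host_value)
{
    uint32_t net_value = htonl(host_value);

    assert(wire != NULL);
    memcpy(wire, &net_value, sizeof (net_value));
}

/* Receiver side: decode a network-order value from a packet buffer. */
uint32_t get_wire32(const unsigned char wire[4])
{
    uint32_t net_value;

    assert(wire != NULL);
    memcpy(&net_value, wire, sizeof (net_value));
    return ntohl(net_value);
}
```

The code never tests which endianness the host has. On a big-endian machine both
conversions are no-ops, on x86 they swap bytes, and the bytes in the packet buffer are the
same in either case.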
2.3.Marketeering - Naming the architecture
The number of trademarked and non-trademarked terms applied to "x86 CPUs" and
software that runs on the "x86 platform" is legion, and marketing departments
everywhere keep adding to it.
Naming the architecture is truly the Babel of the computing industry.
The term "x86" and derivatives of it are generic (not trademarked), and commonly
used to describe all architectures (by various vendors) that were in one way or the
other "derived" from the original Intel 8086 microprocessor, and have a high degree
of compatibility with Intel CPUs.
The same applies to "PC compatible" - though that includes more than just a CPU
that is "x86 compatible". The original IBM PC/AT (trademarked terms, again) had, in
addition to the i80286 CPU, a set of standard hardware/peripherals whose presence
can be assumed on "compatibles". Later, Microsoft, Intel and other hardware
vendors devised updated "PC XX" standards to list a set of hardware/bus interfaces
available by default on "modern" systems.
Intel uses the trademarked terms "Intel Architecture", and more specifically "32bit
Intel Architecture" (IA32). Intel always had more than one CPU architecture in their
portfolio (e.g. today the Itanium/IA-64, in the past the i860 RISC, even before that
the i432) so "Intel Architecture" alone doesn't mean anything technically. IA32, on
the other hand, is the term applied to the instruction set/feature set of Intel CPUs
whose ancestor is the 8086 - in short, IA32 is "x86 by Intel".
The CPU names i8086, i80286, i80386, i80486, Pentium, ... are Intel trademarks.
UNIX platforms have traditionally shortened these to i86, i286, i386 (with and
without the leading 'i'). "386" is particularly frequent as the first 32bit version.
"386" and variants thereof are found all over:
    $ file ls
    ls: ELF 32-bit LSB executable 80386 Version 1,
    dynamically linked, stripped
[Illustration: Pieter Brügel, Tower of Babel]
    $ uname -a
    SunOS hatchback 5.10.1 onnv-work i86pc i386 i86pc
    $ isainfo
    amd64 i386
We also find it, for example, in the ELF format architecture name:
    <sys/elf.h>:
    #define EM_386 3 /* Intel 80386 */
It's also present as the conditional-compile definition for 32bit x86:
    <sys/isa_defs.h>:
    [ ... ]
    /*
     * The feature test macro __i386 is generic for all processors implementing
     * the Intel 386 instruction set or a superset of it. Specifically, this
     * includes all members of the 386, 486, and Pentium family of processors.
     */
    #elif defined(__i386) || defined(i386)
    /*
     * Make sure that the ANSI-C "politically correct" symbol is defined.
     */
    #if !defined(__i386)
    #define __i386
    #endif
    [ ... ]
Intel itself never used the terms "i586", "i686" (with or without the 'i') or similar,
but other CPU vendors (like AMD or Cyrix) did, and e.g. the GNU gcc compiler
recognizes -march=i586 and similar as hint to optimize code for post-486 processors.
The confusion about names doesn't get better with the extension to 64bit.
The 64bit extension was created and first specified by AMD. AMD called this
"x86-64" during development (and the term is still used as the architecture name on
Linux), and "AMD64" on release.
In fact, both are found as the ELF architecture name:
    <sys/elf.h>:
    #define EM_AMD64 62 /* AMDs x86-64 architecture */
    #define EM_X86_64 EM_AMD64 /* (compatibility) */
AMD64 applies to the instruction set (x86 including 64bit extensions).
The AMD Opteron and Athlon64 are CPUs by AMD implementing the AMD64
architecture.
Intel for obvious reasons does not use the term "AMD64". Since "IA-64" is already
given to the (x86-incompatible) Itanium architecture, Intel has created two new
names of its own instead:
EM64T (Extended Memory 64bit technology)
IA32e
The first is applied to processors by Intel that are "AMD64 compatible", while the
second (which is very uncommon) is used in Intel's architecture reference manual to
describe the 64bit x86 instruction set (extension).
Microsoft and Sun, for example, chose to use the term "x64" when talking about the
64bit x86 architecture, or about their operating systems supporting it. In that context,
"x86" means 32bit-x86, while "x64" means 64bit-x86.
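In source code, the many names show up as predefined compiler macros. The following
is a minimal sketch in C that maps them back to the two names isainfo prints. The
function names are invented for this example, and the macro list is an assumption about
what common compilers predefine (__i386 and i386 are the ones named in
<sys/isa_defs.h>; the __amd64 and __x86_64 spellings are the usual 64bit counterparts).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Name of the x86 flavour this code was compiled for, or "unknown". */
const char *build_isa_name(void)
{
#if defined(__amd64) || defined(__amd64__) || \
    defined(__x86_64) || defined(__x86_64__)
    return "amd64";
#elif defined(__i386) || defined(__i386__) || defined(i386)
    return "i386";
#else
    return "unknown";
#endif
}

/* Returns 1 if the given name is one of the two x86 flavours, 0 otherwise. */
int isa_name_is_x86(const char *name)
{
    assert(name != NULL);
    return strcmp(name, "amd64") == 0 || strcmp(name, "i386") == 0;
}
```

The 64bit test comes first on purpose, so that a compiler which defines both families of
macros is still reported as 64bit.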
In this document, the term "x86" is used wherever possible, with a specific note
"32bit", "32bit mode", "64bit" etc. as appropriate.
3.Assembly Language on x86 platforms
From "The Tao of Programming":
    The Tao gave birth to machine language.
    Machine language gave birth to the assembler.
    The assembler gave birth to the compiler.
    Now there are ten thousand languages.
3.1.Generic Introduction to Assembly language
Programming languages, however they are structured, tend to implement a common
set of minimum functionality. Programming languages usually have features like:
instructions, i.e. operations to modify and query "state"
"state" (operands/variables/data) that instructions operate on
modularization (the ability to substructure both program and data into smaller
reusable units of execution/access, termed functions/structures)
Assembly language of course supplies all of these. The purpose of this section is to
explain how constructs used in x86 assembly language implement these basic building
blocks. Since this manual is not supposed to replace introductory tutorials on either
programming in general or machine-level programming as such, no attempt will be
made to explain things like "what is an instruction", "what is an expression". Minimum
familiarity with programming is assumed.
To understand assembly language programs (or disassembled compiled code), look at
the above list of language building blocks again in more detail.
3.1.1.Instructions
Assembly language uses mnemonics (human-readable transcript of the actual binary
machine code) for instructions. The following classes are usually supplied:
1. arithmetic/logical instructions. Anything that actually modifies data (aka performs
an operation) falls under this category. Examples are addition, multiplication, and
other numerical operations.
2. comparisons and conditionals to query state and change the flow of execution
depending on that state. A typical example would be a "check if lower than" or a
"branch if equal" instruction.
3. Load/Store operations for data transfer
4. function subroutine support, aka call/ret instructions (control transfer)
How readable the assembly language for a specific processor is depends somewhat on
how the CPU vendor chose to name the instructions.
Intel for the x86 CPU family has used plain English terms (or at worst simple
abbreviations) for assembly instruction names. A typical example would be the name of
the instruction that calculates the sum of two operands: "ADD". At worst, an
abbreviation such as "MOVNTDQA" (move non-temporal double quadword, aligned) can
occur, but in most cases x86 assembly instruction names are descriptive.
3.1.2.Operands, Variables and Data
To understand the concepts used in assembly language for accessing data, we have to
examine more closely what data can be. More precisely, what the scope (visibility) of a
particular item is.
One possible way to classify data hierarchically is by scope: globally visible data on one
side, data visible only from within a specific function on the other, with the latter
divided into data common to all instantiations of the function and data belonging to one
function instance.
This is not the only possible subclassification of "data", of course, but the above
scheme has the advantage that it maps very well to some of the concepts inherent to
assembly language.
From the point of view of currently executing machine code, data can be considered to
be "closer" and "further" away.
Data that can be seen from any code within the current program is called global.
Global data is persistent, it will continue to exist even if the specific piece of code
that happened to be using it has been completed.
The C programming language knows a specific subtype of global data that is
called static. Static data in C is not visible to every code from the current
program but only to code from the same sourcefile, or to all instantiations (calls)
of a given function. C static also is persistent.
Any other data in use by the program is temporary and only lives as long as the
current function is executing. Such data is recreated/reinitialized each time a given
function is run, and different functions operate on different sets of data. This is
generically called local data. It is usually subclassed further into:
Function input: Arguments
Function output: return value(s)
Other non-persistent data in use by the function: local variables
Structured programming languages have finer-grained blocks of execution than
functions. Consider, for example, a loop within a function. It uses data, though in
most cases not all of the data that this function is operating on. Instead, it uses only
a subset of that. This subset of currently-in-use data is called the working set.
For optimal performance, a method is desired to access data from the working set in
as fast a way as possible.
[Illustration: Data Namespace based on scope of access]
    Data
        globally visible
        visible from within a specific function only
            common to all instantiations (C language static)
            per function instance
                input (arguments)
                output (return values)
                locals
    Data is either in use / active (working set) or inactive.
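The scope classes described above can be pointed out in a few lines of C. This is a
minimal sketch; all identifiers are invented for the example.

```c
#include <assert.h>

int calls_made = 0;               /* global: visible to all code, persistent */
static long running_total = 0;    /* static (file scope): persistent, but only
                                     visible to code in this source file */

int sum_up_to(int n)              /* n is function input (an argument) */
{
    static int calls_to_me = 0;   /* static (function scope): persistent and
                                     shared by all calls of this function */
    int sum = 0;                  /* local variable: recreated on every call */

    assert(n >= 0);
    for (int i = 1; i <= n; i++)  /* i and sum are the loop's working set */
        sum += i;

    calls_to_me++;
    calls_made = calls_to_me;
    running_total += sum;
    return sum;                   /* function output (the return value) */
}

/* Accessor, because running_total is invisible outside this file. */
long grand_total(void)
{
    return running_total;
}
```

After calling sum_up_to(4) and sum_up_to(3), the locals of both calls are gone, while
calls_made and running_total still hold what the calls left behind.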
In terms of machine-level architecture, data as classified above therefore falls into
three big groups:
1. Global, persistent data. This is the Heap.
2. Temporary data which lives as long as the function that uses it is executing. This is
usually called the Stack.
3. Data that makes up the current working set. Most CPUs provide fast-access
temporary storage for such data - a set of Registers.
[Illustration: Machine-Language concepts for heap, stack and registers. The data
classification by scope, mapped onto HEAP (globally visible data and data common to all
instantiations, i.e. C language static), STACK (per function instance data: input
arguments, output return values, locals) and REGISTERS (data in use / active, the
working set).]
3.1.3.Registers, the Stack and the Heap
A high-level programming language often does not inherently know the concept of
memory. Where data is stored or how it is accessed is up to the internal
implementation of the language and not usually exposed to the programmer. Even
intermediate-level languages like "C" that supply language features for specifying data
locality (C keywords extern/static/auto/register, pointers) don't usually specify
how these features are implemented, but refer to "the architecture" to supply the
backend. Assembly language is different here. Due to the tight binding between
hardware features and assembly language, the programmer here has to know about
the details regarding where data is stored, and consider the optimal place where to
put operands at any given time. This is where the above diagram comes in handy.
Assembly language at least knows the distinction between persistent and temporary
data - the heap and the stack. There are machines out there (the Java Virtual Machine,
or Forth, for example) which implement nothing else, but most current processors
provide hardware support for putting a working set of data into fast temporary storage
- a set of registers.
CPU Registers are kind of a "Level 0 Cache" within the CPU (and the existence of
registers as a fast-access temporary data storage far precedes the existence of CPU
caches), and are used to hold variables that are either frequently queried or being
modified as part of a computation. In many CPUs, arithmetic operations require the
presence of the operands within registers. CPU registers, provided enough of them are
available, will be the place where the working set of variables for the current function is
found.
But even modern CPUs, designed at a time when space on the CPU die is
aplenty, don't offer an unlimited number of registers. On the contrary, registers are
usually a scarce resource. This is where the stack comes in again - to serve as a
backing store for local variables. By giving each function its own dedicated piece of
memory specific to this instantiation (i.e. different for e.g. two CPUs calling the same
code), a so-called stack frame, the function can "swap" its working set between stack
(or heap) and registers.
Registers and/or the stack frame also serve for data-passing between nested function
calls. By letting the frames of calling and called function overlap, arguments can be
passed between functions or values returned.
Data that is not specific to one instantiation of a function call but shared between all
calls to this function (a C static), or all calls to all functions (a global variable) will
not end up in the stack but in a well-defined location in memory that every code knows
about. This memory location is often called the data segment of the program, or the
heap.
3.2.Assembly language on x86 platforms
3.2.1.Registers
The general-purpose x86 register set has evolved from the 8bit i8008 processor's AH/AL
accumulator model via the eight 16bit registers of the i8086 processor, and their
extension (hence the register name prefix 'E') to 32bit in the i80386 and 64bit in the
AMD Opteron. All registers are global, and 16/8bit register names are only alias names
for lower bits of the 32bit register. This is called register aliasing. In 32bit mode, x86
processors implement the following general-purpose registers:
Overall, x86 CPUs in 32bit mode have only eight global, general-purpose registers.
They are shared between 32/16/8bit access:
32bit registers : EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
16bit registers : AX, BX, CX, DX, SI, DI, BP, SP
These registers cover bits 0..15 of the corresponding 32bit registers.
8bit registers : AL, BL, CL, DL, and AH, BH, CH, DH,
These registers cover bits 0..7 (.L) or bits 8..15 (.H) of registers EAX ... EDX.
Processors in the x86 family supply many more registers than that, but none of these
are general-purpose. Instead, specific instructions are required to make use of those.
Commonly-seen special registers in x86 include:
The processor state register(s): EFLAGS, CR0...CR8.
The program counter (instruction pointer) register: EIP.
Floating point and vector registers: ST0..ST7, MM0..MM7, XMM0..XMM7
Peculiar to the architecture is the concept of segmentation, which also is controlled via
a special set of registers:
Descriptor Table registers: GDTR, LDTR, IDTR
Segment registers: CS, DS, ES, FS, GS, SS
Modern x86 CPUs supply hundreds of additional registers, all of them special-purpose.
They are called model-specific registers, or MSR, and control specific features of the
given CPU. Please refer to the processor manuals from the respective CPU vendors.
[Illustration: Register set (integer registers) on x86 architectures in 32bit mode]
    32bit registers  16bit registers  8bit registers
    %eax             %ax              %ah/%al
    %ebx             %bx              %bh/%bl
    %ecx             %cx              %ch/%cl
    %edx             %dx              %dh/%dl
    %esi             %si
    %edi             %di
    %ebp             %bp
    %esp             %sp
In 64bit mode (AMD64 and EM64T processors), the general-purpose register set is
twice as large as before, and access to 16/8bit "subregisters" has been unified.
64bit mode retains register aliasing but makes it uniform. In addition to that, the
number of general-purpose registers (and the number of XMM vector registers) has
been doubled. In 64bit mode, the CPU provides:
16 64bit registers : RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP and R8..R15.
16 32bit registers : EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP and R8D..R15D.
These registers map bits 0..31 of the corresponding 64bit register.
16 16bit registers : AX, BX, CX, DX, DI, SI, BP, SP and R8W..R15W.
These registers map bits 0..15 of the corresponding 32/64bit register.
16 8bit registers : AL, BL, CL, DL, DIL, SIL, BPL, SPL and R8B..R15B.
These registers map bits 0..7 of the corresponding 16/32/64bit register.
[Illustration: Register set (integer registers) on x86 architectures in 64bit mode]
    64bit registers  32bit registers  16bit registers  8bit registers
    %rax             %eax             %ax              %al
    %rbx             %ebx             %bx              %bl
    %rcx             %ecx             %cx              %cl
    %rdx             %edx             %dx              %dl
    %rsi             %esi             %si              %sil
    %rdi             %edi             %di              %dil
    %rbp             %ebp             %bp              %bpl
    %rsp             %esp             %sp              %spl
    %r8              %r8d             %r8w             %r8b
    %r9              %r9d             %r9w             %r9b
    %r10             %r10d            %r10w            %r10b
    %r11             %r11d            %r11w            %r11b
    %r12             %r12d            %r12w            %r12b
    %r13             %r13d            %r13w            %r13b
    %r14             %r14d            %r14w            %r14b
    %r15             %r15d            %r15w            %r15b
In 64bit mode, the "highbyte" registers AH..DH are deprecated; they still are available
but their use is no longer suggested for 64bit code.
The 64bit x86 register set is uniform - all registers can be used in the same way, i.e.
all of them have 8/16/32bit "subregisters". That doesn't mean all of them are equally
efficient, though. The x86 instruction set has "optimized machine opcodes" for some
arithmetic operations that put their result into eax/rax, for example. Likewise, the
64bit extensions encode the use of r8..r15 via an additional byte in the instruction
stream, so the use of the "classical" registers vs. the "new" registers creates more
compact binary code. Please refer to the CPU vendors' optimization guidelines for
instructions on how to optimally use the register set if you intend to write assembly
code for 64bit x86 platforms manually.
64bit mode also has the FLAGS register (RFLAGS), and the 64bit program counter RIP,
which is made explicitly available for PC-relative addressing, a feature not available in
32bit code.
Register aliasing requires rules that specify how the high bits of the 16/32/64bit
register are handled if an instruction operates explicitly on a 32/16/8bit register:
An 8bit operation on .L does not affect bits 8..31 (i.e. the upper bits in the .X and E..
registers). Operating on .H, bits 0..7 and 16..31 are unaffected.
Bits 32..63 of the 64bit R.. register are left unchanged as well.
A 16bit operation does not affect bits 16..31 (i.e. the upper bits in E../R..D).
Bits 32..63 of the 64bit R.. register are left unchanged as well.
A 32bit operation clears bits 32..63 of the 64bit R.. register.
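The aliasing rules can be modelled in plain C by treating a 64bit register as a
uint64_t. This is a sketch of the behaviour documented in the CPU vendors' manuals
for 64bit mode (8bit and 16bit writes leave all other bits alone, 32bit writes zero the
upper half), not code that touches real registers; the function names are invented for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write to the low byte alias (e.g. %al): bits 8..63 stay unchanged. */
uint64_t write_low8(uint64_t reg, uint8_t value)
{
    return (reg & ~UINT64_C(0xff)) | value;
}

/* Write to the high byte alias (e.g. %ah): only bits 8..15 change. */
uint64_t write_high8(uint64_t reg, uint8_t value)
{
    return (reg & ~UINT64_C(0xff00)) | ((uint64_t)value << 8);
}

/* Write to the 16bit alias (e.g. %ax): bits 16..63 stay unchanged. */
uint64_t write16(uint64_t reg, uint16_t value)
{
    return (reg & ~UINT64_C(0xffff)) | value;
}

/* Write to the 32bit alias (e.g. %eax): bits 32..63 are cleared. */
uint64_t write32(uint64_t reg, uint32_t value)
{
    (void) reg;                    /* the old contents do not survive */
    return (uint64_t)value;
}

/* 32bit addition on a 64bit register: wraps at 32bit and reports the carry. */
uint64_t add32(uint64_t reg, uint32_t operand, int *carry)
{
    uint32_t low = (uint32_t)reg;
    uint32_t sum = low + operand;  /* unsigned arithmetic wraps modulo 2^32 */

    assert(carry != NULL);
    *carry = sum < low;
    return write32(reg, sum);
}
```

add32() shows why clearing the upper half is useful: the carry out of bit 31 is
reported in the same way as on a 32bit CPU, and nothing spills into bits 32..63.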
In other words, if operating in 64bit mode, 32bit operations zero extend their result to
64bit, while 8bit and 16bit operations leave all remaining bits of the register untouched.
The advantage of zero extension is in preserving the semantics of all existing 32bit
operations. For example, a 32bit addition will overflow after 32bit and set status register
bits to indicate this condition, instead of silently carrying over into bits 32..63 and
preventing proper detection of the 32bit overflow.
3.2.2.Addressing Modes
Accessing memory is possible either:
Direct, supplying an absolute 32/64bit value as address
Register indirect, using the value contained in a register as address
Indirect with offset, using the contents of a register as the base address and a (no
larger than 32bit) constant as additional offset
Indirect with index and scale, using a register as base address of an array, a second
register as index into that array and a scale factor of 1, 2, 4 or 8 for that register to
specify the size of the elements in the array.
Indirect with offset, index and scale. Same as before, except that now the start
address of the array will be the sum of base register and offset. This allows e.g. to
efficiently access arrays that are themselves members of larger data structures.
Instruction pointer relative with offset. This is only available in 64bit mode and
allows for efficient position-independent code.
As a summary, memory access on x86 systems is done by calculating the address
implicitly from up to three parts: a constant offset, a base register, and an index
register multiplied by a scale factor.
Any parts are optional. In 32bit mode, only the 32bit registers eax...esp can be used,
of course.
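Written out in C, the address calculation looks as follows. This is a minimal sketch; the
function name and parameter layout are invented for the example, and the registers
are simply passed in as 64bit values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * address = offset + base + index * scale
 * scale must be 1, 2, 4 or 8; the offset is a signed constant of at most 32bit.
 * Parts that are not used are passed as 0 (and scale as 1).
 */
uint64_t effective_address(uint64_t base, uint64_t index, unsigned scale,
    int32_t offset)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* sign-extend the offset first, so that negative offsets subtract */
    return base + index * scale + (uint64_t)(int64_t)offset;
}
```

For example, element 3 of an array of 8-byte values that starts 16 bytes into a structure
located at 0x1000 is found at effective_address(0x1000, 3, 8, 16), which is 0x1028. A
local variable 8 bytes below a frame pointer of 0x2000 is
effective_address(0x2000, 0, 1, -8), which is 0x1ff8.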
The stack is special on x86, and the architecture has explicit support for accessing
stack memory - via PUSH/POP instructions.
Pushing something onto the stack will decrement esp/rsp by the size of the operand
and put the value of the operand into the memory location that esp/rsp points at
then.
Popping something off the stack takes the value that esp/rsp points at, and then
increments esp/rsp by the size of the operand.
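The order of the two steps matters, which a small model in C makes visible. This is only
a sketch: the "stack" is a 64 byte array, the stack pointer is an offset into it, and all
names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_stack {
    unsigned char mem[64];    /* stack memory                          */
    size_t sp;                /* models esp/rsp as an offset into mem  */
};

/* An empty stack: the stack pointer sits just past the highest address. */
void toy_init(struct toy_stack *s)
{
    assert(s != NULL);
    s->sp = sizeof (s->mem);
}

/* PUSH: first decrement the stack pointer, then store the operand there. */
void toy_push64(struct toy_stack *s, uint64_t value)
{
    assert(s != NULL && s->sp >= sizeof (value));
    s->sp -= sizeof (value);
    memcpy(&s->mem[s->sp], &value, sizeof (value));
}

/* POP: first load from where the stack pointer points, then increment it. */
uint64_t toy_pop64(struct toy_stack *s)
{
    uint64_t value;

    assert(s != NULL && s->sp + sizeof (value) <= sizeof (s->mem));
    memcpy(&value, &s->mem[s->sp], sizeof (value));
    s->sp += sizeof (value);
    return value;
}
```

The stack grows towards lower addresses, and the stack pointer always points at the
most recently pushed value, never at free space.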
[Illustration: address calculation]
    memory location = offset + base + scale * index
    base, index : rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, r8 .. r15
                  (rsp cannot serve as index)
    scale       : 1, 2, 4, 8
[Illustration: local variables addressed as -...(%rbp) or +...(%rsp); input arguments in
%rdi, %rsi, %rdx, %rcx, %r8, %r9]
[Illustration: segment, base address, base:size descriptor table entries, gdtr, ldtr]
[ ... ] segment selector is laid out (two bits for privilege, the table type bit, and 13 bit
for the table index) does only allow 8192 descriptor table entries, but at 64kB per
segment one needs 65536 (64k) to cover the entire 32bit physical address space.
32bit programs therefore were supplied with a wholly separate 32bit MMU which will
be described in section 3.3. Segmentation is no longer used for address space
separation; the new (32bit) MMU takes over this task.
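The selector layout and the resulting limits can be checked with a few lines of C. This is
a minimal sketch; the macro and function names are invented for the example. It
assumes the usual bit positions: privilege level in bits 0..1, table type in bit 2, table
index in bits 3..15.

```c
#include <assert.h>
#include <stdint.h>

#define SEL_RPL_MASK     0x3u          /* bits 0..1: requested privilege  */
#define SEL_TI_LDT       0x4u          /* bit 2: set = LDT, clear = GDT   */
#define SEL_INDEX_SHIFT  3             /* bits 3..15: table index         */
#define SEL_MAX_ENTRIES  (1u << 13)    /* 13 index bits: 8192 entries     */

unsigned selector_rpl(uint16_t sel)      { return sel & SEL_RPL_MASK; }
unsigned selector_uses_ldt(uint16_t sel) { return (sel & SEL_TI_LDT) != 0; }
unsigned selector_index(uint16_t sel)    { return sel >> SEL_INDEX_SHIFT; }

/* Build a selector from its three parts. */
uint16_t make_selector(unsigned index, unsigned use_ldt, unsigned rpl)
{
    assert(index < SEL_MAX_ENTRIES && rpl <= SEL_RPL_MASK);
    return (uint16_t)((index << SEL_INDEX_SHIFT) |
        (use_ldt ? SEL_TI_LDT : 0u) | rpl);
}

/* Number of 64kB segments needed to cover a 32bit address space. */
uint32_t segments_needed_64k(void)
{
    return (uint32_t)((UINT64_C(1) << 32) >> 16);
}
```

segments_needed_64k() yields 65536, eight times what a single descriptor table can
hold, which is the mismatch the text describes.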
Segment descriptors in 32bit mode are supposed to be set up to allow flat addressing,
where the "near" 32bit part of the 48bit logical address (i.e. the far pointer including
the implicitly used segment register) maps 1:1 to the 32bits of the virtual address. This
trick is accomplished by setting the base address to NULL and the segment size to
4GB in those segment descriptors.
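Why a base address of NULL and a size of 4GB make segmentation disappear can be seen
from a simplified model of the translation step. This is a sketch under stated
assumptions: the descriptor is reduced to a base and a limit (the last valid offset),
everything else a real descriptor contains is ignored, and the names are invented for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seg_desc {
    uint32_t base;     /* start of the segment                      */
    uint32_t limit;    /* last valid offset within the segment      */
};

/*
 * Translate the "near" 32bit part of a logical address using the given
 * descriptor. Returns 0 and stores the virtual address on success, or -1 if
 * the offset lies outside the segment.
 */
int seg_translate(const struct seg_desc *desc, uint32_t offset,
    uint32_t *virtual_address)
{
    assert(desc != NULL && virtual_address != NULL);
    if (offset > desc->limit)
        return -1;
    *virtual_address = desc->base + offset;
    return 0;
}
```

With a flat descriptor { 0, 0xffffffff } the limit check can never fail and the result
always equals the offset, so the segment register in use no longer makes a difference.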
The effect is to bypass segmentation.
The protected mode in 32bit is therefore reduced to:
1. Declare the privilege levels to use. Each ring to be used needs two segment
descriptors:
one flat code segment descriptor, and
one flat data segment descriptor.
All applications share the same set of code/data segment descriptors. Since
all these segments are flat (and therefore overlap), the segment registers contain,
at all times, only one out of two configurations:
    Segment Register    value in kernel mode    value in user mode
    cs                  ring 0 code segment     ring 3 code segment
    ds/es/ss            ring 0 data segment     ring 3 data segment
82 4.Memory and Privilege Management on x86
[Illustration: creating flat (unsegmented) 32bit address spaces on the 80386. The
segment selector (bits 0..15: privilege, table type bit, table index) picks a base:size
entry from the Global or Local Descriptor Table (located via gdtr / ldtr). With a base
address of NULL, the 32bit (logical) address equals the 32bit virtual address, which the
32bit MMU translates into the (32bit) physical address.]
2. Provide a privilege switching mechanism, i.e. at least one system call gate.
A typical 32bit protected mode setup therefore is very simple.
Even using a Local Descriptor Table at all is optional - it's possible to put both user
and kernel mode code/data segments and syscall gate into the Global Descriptor Table.
Note that on x86, executability is a property of the code segment. The classical 32bit
x86 MMU does not have an attribute bit for "is this page executable?" - only post-
AMD64 MMUs know about a "NX" page attribute bit. A certain memory location is
executable if there exists a code segment that covers it. Consequently, if we let code
and data segments overlap (and there are very good reasons for this - do you want
function addresses to be conceptually different from any other address?), everything
is executable.
On classic x86 platforms using the 32bit protected mode with flat segments, every
address is executable!
4.1.3.System segments
The Protected mode uses two special segment descriptor types (system descriptors) for
control purposes:
1. The Local Descriptor Table in fact is a (non-flat) segment. ldtr therefore doesn't
contain a memory location (like gdtr does), but a segment selector - the local
descriptor table register is a segment register, with a special rule that it may only
contain selectors whose TI bit is clear (i.e. which index the GDT).
The original idea was to use the GDT for kernel segments and the LDT for those of
applications, with operating systems keeping track of these user contexts by creating
one LDT per application.
2. The Task State Segment (TSS) is another type of system descriptor.
TSS embodies the idea of a "context" in hardware.
In the limited way most operating systems set up the protected mode, the role of
the TSS is to supply the CPU with stackpointer locations for the various rings of
privilege.
[Illustration: typical 32bit protected mode setup. The Local Descriptor Table holds user
cs and user ds/es/ss; the Global Descriptor Table holds the system call gate (pointing to
syscall_handler()), kernel %ds/%es/%ss, kernel cs, the LDT base and the 32bit TSS. All
segments describe the same 32bit flat address space, base NULL, size 4GB, 0x00000000
to 0xffffffff.]
The 32bit mode TSS is more complicated because it contains a backing store space
for the general-purpose and segment register set as well as the set of stackpointers
for the various privilege levels:
    struct tss {
        uint16_t tss_link;       /* 16-bit prior TSS selector */
        uint16_t tss_rsvd0;      /* reserved, ignored */
        uint32_t tss_esp0;       /* %esp/%ss for all privileged rings */
        uint16_t tss_ss0;
        uint16_t tss_rsvd1;      /* reserved, ignored */
        uint32_t tss_esp1;
        uint16_t tss_ss1;
        uint16_t tss_rsvd2;      /* reserved, ignored */
        uint32_t tss_esp2;
        uint16_t tss_ss2;
        uint16_t tss_rsvd3;      /* reserved, ignored */
        uint32_t tss_cr3;
        uint32_t tss_eip;
        uint32_t tss_eflags;
        uint32_t tss_eax;
        uint32_t tss_ecx;
        uint32_t tss_edx;
        uint32_t tss_ebx;
        uint32_t tss_esp;
        uint32_t tss_ebp;
        uint32_t tss_esi;
        uint32_t tss_edi;
        uint16_t tss_es;
        uint16_t tss_rsvd4;      /* reserved, ignored */
        uint16_t tss_cs;
        uint16_t tss_rsvd5;      /* reserved, ignored */
        uint16_t tss_ss;
        uint16_t tss_rsvd6;      /* reserved, ignored */
        uint16_t tss_ds;
        uint16_t tss_rsvd7;      /* reserved, ignored */
        uint16_t tss_fs;
        uint16_t tss_rsvd8;      /* reserved, ignored */
        uint16_t tss_gs;
        uint16_t tss_rsvd9;      /* reserved, ignored */
        uint16_t tss_ldt;
        uint16_t tss_rsvd10;     /* reserved, ignored */
        uint16_t tss_rsvd11;     /* reserved, ignored */
        uint16_t tss_bitmapbase; /* io permission bitmap base address */
    };
The 64bit TSS is "reduced" to the simple role of providing stackpointers - one for
each higher-privileged ring of execution, and a selectable table of seven interrupt
stackpointers, the IST[]:
    #pragma pack(4)
    struct tss {
        uint32_t tss_rsvd0;      /* reserved, ignored */
        uint64_t tss_rsp0;       /* stack pointer CPL = 0 */
        uint64_t tss_rsp1;       /* stack pointer CPL = 1 */
        uint64_t tss_rsp2;       /* stack pointer CPL = 2 */
        uint64_t tss_rsvd1;      /* reserved, ignored */
        uint64_t tss_ist1;       /* Interrupt stack table 1 */
        uint64_t tss_ist2;       /* Interrupt stack table 2 */
        uint64_t tss_ist3;       /* Interrupt stack table 3 */
        uint64_t tss_ist4;       /* Interrupt stack table 4 */
        uint64_t tss_ist5;       /* Interrupt stack table 5 */
        uint64_t tss_ist6;       /* Interrupt stack table 6 */
        uint64_t tss_ist7;       /* Interrupt stack table 7 */
        uint64_t tss_rsvd2;      /* reserved, ignored */
        uint16_t tss_rsvd3;      /* reserved, ignored */
        uint16_t tss_bitmapbase; /* io permission bitmap base address */
    };
    #pragma pack()
Since in the 64bit mode, all implicitly-used segments (cs/ds/es/ss) are flat, it's
unnecessary to provide values for ss in any privilege level.
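The packing pragma in the 64bit structure is essential: without it, a compiler would pad
the 64bit members to 8-byte boundaries and the layout would no longer match what the
CPU expects. The following sketch verifies this at compile time. It is a condensed
variant for illustration only: the name struct tss64, the arrays tss_rsp[] and tss_ist[],
and the helper function are inventions of this example, and the expected size of 104
bytes follows from adding up the member sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(4)
struct tss64 {
    uint32_t tss_rsvd0;          /* reserved, ignored                   */
    uint64_t tss_rsp[3];         /* stack pointers for CPL 0, 1 and 2   */
    uint64_t tss_rsvd1;          /* reserved, ignored                   */
    uint64_t tss_ist[7];         /* interrupt stack table 1..7          */
    uint64_t tss_rsvd2;          /* reserved, ignored                   */
    uint16_t tss_rsvd3;          /* reserved, ignored                   */
    uint16_t tss_bitmapbase;     /* io permission bitmap base address   */
};
#pragma pack()

/* 4 + 3*8 + 8 + 7*8 + 8 + 2 + 2 = 104 bytes, with no padding anywhere. */
_Static_assert(sizeof (struct tss64) == 104, "unexpected TSS size");
_Static_assert(offsetof(struct tss64, tss_rsp) == 4, "rsp0 misplaced");
_Static_assert(offsetof(struct tss64, tss_ist) == 36, "ist1 misplaced");
_Static_assert(offsetof(struct tss64, tss_bitmapbase) == 102,
    "bitmap base misplaced");

/* Hypothetical helper: set the stack pointer used on entry to ring 0. */
void tss64_set_kernel_stack(struct tss64 *tss, uint64_t kernel_stack_top)
{
    assert(tss != NULL);
    tss->tss_rsp[0] = kernel_stack_top;
}
```

If the pragma is removed, the first static assertion fails on typical 64bit compilers,
because tss_rsp would then start at offset 8 instead of 4.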
Intel originally introduced the TSS for hardware task switching. The current task is
identified by a special segment register, the task register (tr), which works like
ldtr. The selector in there indexes the GDT. An operating system could have multiple
(one per process) TSS segments in the GDT, and "switch" between them by reloading tr.
Such a task switch would save the current state (registers) to the current TSS, and then
reload that state (i.e. all register/segment register contents) from the new TSS, making
the new one current.
While a given task is running (i.e. a certain TSS being active), the TSS provides the
CPU with the information where to find kernel stackpointers when doing a privilege
switch.
Hardware task switching has proven troublesome over time:
• the TSS provides no means for saving/restoring floating point registers or other
  register extensions that were introduced in the x86 family post-80386.
• the TSS provides no means for an operating system to attach "OS state" to a task.
• hardware task switching is CISC at its worst: it's a single CPU instruction but
  executing this is horribly slow. It's in fact much slower than saving/restoring the
  register set manually using simple sequences of instructions.
• hardware task switching doesn't scale to large numbers of processes.
  Descriptor Tables have size limitations: the index part of a segment selector is only
  13bit and the GDT therefore cannot be larger than 8192 entries. But TSS segments
  must be in the GDT, and a limit of a few thousand threads is below what x86
  CPUs have been able to handle for some generations by now, even in 32bit.
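The 13bit limit follows directly from the selector layout. As a small sketch (the macro names are made up for illustration): a 16bit segment selector carries the requested privilege level in bits 1..0, the table indicator (GDT or LDT) in bit 2, and the descriptor index in the remaining 13 bits:

```c
#include <stdint.h>

/* Hypothetical helper macros for taking a segment selector apart. */
#define SEL_RPL(sel)	((uint16_t)(sel) & 0x3)		/* requested privilege level */
#define SEL_IS_LDT(sel)	(((uint16_t)(sel) >> 2) & 0x1)	/* 0 = GDT, 1 = LDT */
#define SEL_INDEX(sel)	((uint16_t)(sel) >> 3)		/* 13bit descriptor index */
#define MAX_DESCRIPTORS	(1u << 13)			/* 8192 entries per table */
```

For example, the selector 0x4b refers to GDT entry 9 with a requested privilege level of 3.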
This is why even Intel's manuals today discourage the use of hardware task switching.
AMD in devising the 64bit extension therefore decided to limit the use of the TSS
when running in 64bit mode to its remaining core purposes:
1. Provide stackpointers for the various privilege levels.
2. Provide a mechanism for interrupts to run on separate stacks.
Hardware task switching is no longer possible in 64bit mode. There's always one task
only, and the operating system will need to change kernel stackpointers within that
single TSS if it wishes to use e.g. per-thread kernel stacks.
4.1.4.Privilege switching
The x86 protected mode, very much unlike other CPU architectures, does not know
any implicit privilege switching. There is no instruction at all, and no interrupt, trap or
other event which will end up in privileged mode, unless the CPU was programmed
for the specific event to redirect execution to a handler function running with higher
privileges.

All privilege switching on x86 platforms happens through gates. Which means the
"rings" model probably should better be drawn like this:

[Illustration 6 - privilege switching and gates. Ring 3 holds unprivileged code; the
higher-privileged rings 2, 1 and 0 are entered only through gates.]

A gate is a descriptor (i.e. an entry of a descriptor table) which, instead of base
address and size, specifies a gate handler address and a target code segment selector.
Gates therefore:
• redirect execution to a specific location (the gate handler) in the target cs.
• switch privileges if the target cs privilege level is not equal to the current cs
  privilege level.

Privilege switching is done by calling a gate. Gate calls can be:
• explicit, by using the far call instruction, lcall, and specifying the segment selector
  that indexes the desired gate in the GDT or LDT.
• implicit, via the Interrupt Descriptor Table (IDT). The IDT is special in the sense
  that it only may contain gate descriptors, and holds up to 256 entries (one for
  each x86 interrupt number). All hardware exceptions, faults, traps and interrupts on
  x86 are routed via the IDT.

More details on privilege switching will be given in section 4.2.2.
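To make the "handler address plus target code segment selector" description of a gate concrete, here is a minimal sketch of the 8-byte gate descriptor used in 32bit protected mode. The struct and function names are invented for this example; the bit positions follow the architectural layout (handler offset split into two 16bit halves, present bit, descriptor privilege level and gate type packed into one attribute byte):

```c
#include <stdint.h>

/* 32bit protected mode gate descriptor, 8 bytes. Names are hypothetical. */
struct gate_desc32 {
	uint16_t offset_lo;	/* gate handler address, bits 15..0 */
	uint16_t selector;	/* target code segment selector */
	uint8_t  count;		/* parameter count (call gates only), else 0 */
	uint8_t  attr;		/* present bit, DPL, gate type */
	uint16_t offset_hi;	/* gate handler address, bits 31..16 */
};

#define GATE_TYPE_CALL32	0xC	/* 32bit call gate */
#define GATE_TYPE_INTR32	0xE	/* 32bit interrupt gate */
#define GATE_TYPE_TRAP32	0xF	/* 32bit trap gate */

/* Build a present gate; dpl is the lowest privilege allowed to call it. */
static struct gate_desc32
make_gate32(uint32_t handler, uint16_t target_cs, unsigned dpl, unsigned type)
{
	struct gate_desc32 g;

	g.offset_lo = (uint16_t)(handler & 0xffff);
	g.selector = target_cs;
	g.count = 0;
	g.attr = (uint8_t)(0x80 | ((dpl & 0x3) << 5) | (type & 0xf));
	g.offset_hi = (uint16_t)(handler >> 16);
	return (g);
}

/* Reassemble the handler address the CPU would jump to. */
static uint32_t
gate_handler(const struct gate_desc32 *g)
{
	return (((uint32_t)g->offset_hi << 16) | g->offset_lo);
}
```

The privilege level the handler runs at is not stored in the gate itself; it comes from the code segment that the selector field refers to.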
4.1.5.64bit Protected Mode
The 64bit protected mode is highly simplified. In fact, the already-described common
practice of setting up the (32bit) protected mode as follows:
• All implicitly used segments (cs and ds/es/ss) are flat
• one code and one data segment for ring 0 - kernel
• one code and one data segment for ring 3 - usermode
• Segment regs ds/es/ss are equal at all times.
• Both cs and ds/es/ss can only have one of two sets of values: kernel/user.
is made mandatory in 64bit mode.

The 64bit protected mode doesn't care about descriptor base/size values as far as the
corresponding segment selectors are in cs/ds/es/ss. These segments are
implicitly flat, and the CPU will only use/check the privilege level bits and the type of
the segment. A typical 64bit x86 protected mode setup uses:
• a 64bit code and a data segment (one each) for ring 0, the kernel
• one data segment for ring 3, applications (shared by 32/64bit applications)
• one 64bit code segment for ring 3, used by 64bit applications
• one 32bit code segment for ring 3, used by 32bit applications (compatibility mode)
All these segments are implicitly flat: the 64bit x86 CPU ignores base/size values in
these descriptors.

System call and trap handling changes slightly - see section 4.2.
4.1.6.Segment and Gate Descriptor Formats
4.1.7.The role of segment registers %fs and %gs
The x86 architecture suffers from the lack of general-purpose registers. As shown,
there are only eight of them in 32bit mode, and 16 in 64bit mode, and their contents
are shared between all functions in a program, and between the different levels of
privileges (a privilege switch doesn't change any of the general-purpose registers
except for esp/rsp).

But there often is need to keep some fixed reference, like a pointer to thread-specific
data, in a location that's quickly accessible.

CPU architectures with many registers at their disposal usually specify in the ABI that
one register is supposed to be set aside for this use; SPARC, for example, gives g7 on
every CPU to hold the address of the current thread.

Doing that on x86 is bad, since it'd slow down code significantly. Consider e.g. the i386
UNIX ABI, which already specifies fixed roles for esp/ebp (the stack-/framepointer)
and ebx (for the location of the global offset table in position-independent code).
Taking ebx is bad enough and, as shown in chapter 2, slows down position-independent
code significantly. But taking yet another of the general-purpose registers
away for thread-specific data is a very bad idea, not only because it'd reduce the
number of registers available to only four, but also because it'd prevent running code
from being able to use the full x86 instruction set. All remaining five registers
(eax/ecx/edx/esi/edi) are used implicitly in some contexts: ecx is the counter
register for loop/rep, eax/edx are the operand registers for 32x32 to 64bit
multiplication and 64/32bit division, and esi/edi are operand registers for string
instructions.

This means another solution is required. Intel had seen the need for this, and in fact
provides a way out of the problem by supplying two segment registers that are not
implicitly used for anything: fs and gs.
There are two related concepts that can be implemented using global segments:
1. thread-specific data / thread-local storage (TSD/TLS). This means a per-thread key is
   used to locate a piece of data that's global to a given thread and accessible under
   the same key from all functions within this thread.
   Different threads use different keys to locate "their" data. All descriptor tables (on
   all CPUs) will contain a common set of segment descriptors (one per key) that locate
   the various data sets.
2. CPU-local data in multiprocessor systems.
   The same key is used by code regardless of what CPU it is running on but,
   depending on that, a different set of data is provided, by putting different segment
   descriptors into each CPU's descriptor table at the index specified by the common
   selector value.
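The second concept can be modelled in a few lines of plain C. This is only a toy model under stated assumptions: the descriptor tables are reduced to arrays of base addresses, and NCPU, CPU_LOCAL_SEL, gdt_base and seg_access() are names invented for this sketch. It shows why the same selector value resolves to different data depending on which CPU's table is consulted:

```c
#include <stdint.h>

#define NCPU		2
#define NDESC		8
#define CPU_LOCAL_SEL	(5 << 3)	/* hypothetical selector: GDT index 5, RPL 0 */

struct cpu_data {
	int cpu_id;
};

/* One descriptor table per CPU; only the segment base is modelled. */
static void *gdt_base[NCPU][NDESC];

/* Model of a "%gs:offset" style access executed on the given CPU. */
static void *
seg_access(int cpu, uint16_t sel, uintptr_t offset)
{
	return ((char *)gdt_base[cpu][sel >> 3] + offset);
}
```

Code on every CPU uses the identical selector and offset; only the per-CPU table contents differ, so no CPU number has to be looked up at run time.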
4.2.Traps, Interrupts, System Calls, Contexts
4.2.1.The Interrupt Descriptor Table
As mentioned before, x86 CPUs do not know any instruction nor any other event that
implicitly would switch the CPU from nonprivileged into privileged execution. Instead,
all events that on other CPUs commonly involve a switch into supervisor mode:
• hardware interrupts
• traps and machine exceptions
• code execution errors (arithmetic faults, undefined/illegal opcodes, breakpoints)
• privilege violation attempts (executing privileged instructions / accessing privileged
  memory from unprivileged code)
are programmable on x86. They are routed through the Interrupt Descriptor Table.

The IDT is different from GDT/LDT in that it can only contain gate descriptors. In
addition to that, it provides for 256 entries, one for each interrupt vector known
to the x86 CPU.
4.2.2.Privilege switches and stacks
4.2.3.Fast system call interfaces
"Classical" x86 system calls using a call gate in the LDT and the lcall instruction
have a long latency due to the various descriptor table lookups that are needed:
• A segment lookup is performed to extract the LDT base address from the LDT
  segment descriptor in the GDT.
• A segment lookup is performed to extract the gate descriptor from the LDT.
• A segment lookup is performed to extract the kernel code segment base address
  from the GDT.
Only then can execution be transferred into the kernel, and the handler be dispatched.

A faster method to perform a system call is using an interrupt gate in the IDT and an
int instruction to issue the system call. Since the IDT is no segment, but located
directly in memory via its base address in idtr, this involves one less descriptor table
lookup. Using int instead of lcall is therefore the preferable way to perform
system calls on x86 machines and yields lower latency syscalls.

But even that is still burdened with the overhead of segmentation and descriptor table
lookup.
In a flat memory model as it is used in 32bit and 64bit protected mode, a far pointer
kernel_code_segment:syscall_handler contains all of the information required to
perform the privilege switch and call the kernel entry point. The target (kernel) code
segment's privilege bits determine that a privilege switch is requested, and since the
kernel code segment is flat no base address needs to be added to the handler;
segmentation memory translation is a no-op. No descriptor table lookups at all are
necessary to derive this.

What's needed therefore is a way to tell the CPU: For performing a syscall, call a
specific predefined far pointer, i.e.:
• switch to the kernel code segment (and raise privileges as requested)
• run the kernel's system call handler given its address.
Both Intel and AMD independently introduced fast system call mechanisms in their
x86 CPUs that allow this simple "switch to privileged mode and call that handler"
approach: sysenter from Intel, and syscall by AMD.

The syscall instruction, if available (Intel CPUs only know it if they have the
AMD64-compatible EM64T extension), is the preferable solution because it automatically
saves the usermode return address and flags (in rcx and r11) on entry, and the
corresponding sysret instruction can resume execution in userland after the system
call from there directly. The stackpointer is left untouched, so it still holds the
usermode value when the kernel entry point starts running. Intel's sysenter/sysexit
instructions require the caller to pass return addresses and usermode stackpointers in
registers, and the kernel must manually restore them before being able to issue
sysexit to return from the system call.

Apart from these implementation differences, the actual mechanisms to control fast
system calls are similar between the two. As an example, the syscall/sysret method
will be described here.
4.3.Virtual Memory Management on x86
The 8086 16bit CPU did not support any form of memory management. The mapping
between the upper 16bits of a "logical" (far) address, i.e. the segment ID, and the
upper 16bits of the 32bit (well, 20bit) physical address was static, 1:1. In addition, as
mentioned before, the 8086 had no notion of privilege and operating systems could
neither establish separate address spaces between nonprivileged user applications and
privileged kernel code, nor prevent application code from executing instructions that
would modify "critical" state.

With the 80286, Intel both introduced a mechanism for privilege management and
made the mapping between segment IDs (the far part of an address, i.e. the upper
16bit) and physical address programmable. In the 80286 and following CPUs operating
in 16bit mode, the protected mode allowed for privilege and address space separation
by declaring (non-overlapping) user/kernel code and data segments. Since the
association between segment IDs and their (physical) location in memory is fully
programmable, the 16bit protected mode implemented a simple one-level MMU, with
the GDT/LDT functioning as translation table for virtual/physical memory access. In
other words: In 16bit mode, segmentation actually performs the role of the MMU, and
logical addresses (far pointers) are virtual addresses.
4.3.1.The classical 32bit x86 MMU
The use of segmentation for virtual/physical translation may have been appropriate
when x86 CPUs were 16bit only. But for 32bit mode, the reliance on far pointers, or
explicitly segmented memory access, causes severe problems. Why should anybody
want to fiddle with multiple segments in applications/operating systems if a single
32bit pointer can locate every byte of physical memory in a machine?

In other words: If the segment offset alone (lower parts of a logical address, now a
32bit value) can address every piece of physical memory in a machine, why bother
with multiple segments (and 48bit far pointers) at all? What's needed for 32bit
operation is a flat address space: unsegmented, with addresses starting at zero and
ending at 4GB.

The 32bit Protected Mode allows the creation of such flat segments, which start at zero
and cover all of the 32bit address space. But doing that reduces the 32bit protected mode
to a vehicle for supplying privileges only. By using a pair of flat segments for
application code and data (running in ring 3), and another such flat pair for kernel
code and data (running in ring 0), both user and kernel code can run in a flat address
space.

But clearly doing so removes memory management from the Segmentation MMU. The
descriptor tables are now programmed in such a way that any memory translation
capabilities via descriptor tables are bypassed. There's again a 1:1 mapping between
logical address (i.e. far pointer) and the result of the segmentation translation step,
and the unit-of-memory, the segment size, is 4GB.

In a flat 32bit mode, a logical address cannot be translated to a physical address
directly. Logical and virtual addresses are no longer equal, and a memory granularity
of 4GB is inappropriate. The result of the segmentation translation will be a virtual
address now, and a new mechanism to translate this to the actual physical address is
required.

This means for 32bit mode, the 80386 had to supply a new memory management unit
as a 2nd stage of address translation, to convert from a 32bit virtual address to a 32bit
physical address.
The 32bit MMU performs address lookups using simple hashing. In C pseudo code, its
operation can be expressed as:
register paddr_t ***cr3;    /* pagedir base address in cr3 */
paddr_t tables[][];         /* pseudo code: the tables reachable via cr3 */
#define TABLESZ (1 << 10)   /* 1024 */
#define PAGESZ  (1 << 12)   /* 4096 */
#define PAGEDIR_IDX(vaddr)  (((vaddr) >> 22) & (TABLESZ - 1))
#define PAGETBL_IDX(vaddr)  (((vaddr) >> 12) & (TABLESZ - 1))
#define PAGE_OFFSET(vaddr)  ((vaddr) & (PAGESZ - 1))
#define VA_TO_PA(va) (tables[PAGEDIR_IDX(va)][PAGETBL_IDX(va)] \
        + PAGE_OFFSET(va))
In other words, the virtual address is split (hashed) into three parts:
• bits [31..22] supply the index into the level 1 table: The page directory (table).
  Entries in the page directory locate page tables in physical memory.
• bits [21..12] supply the index into the level 0 table: The page table.
  Entries in the page table locate the actual physical pages.
• bits [11..0] for the offset within the physical page.
Both page directory and page tables are sparse arrays of (no more than) 1024 32bit
values. A specific index can therefore locate a physical page if the table contains a
non-NULL pagetable entry that supplies sufficient permissions for the running code to
access this page.
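The three-way split can be tried out with a small software model of the table walk. This is a sketch under simplifying assumptions: pte_t here holds nothing but the physical base address of a page (0 meaning "unmapped"), attribute bits are ignored, and va_to_pa() is a name chosen for this example only:

```c
#include <stddef.h>
#include <stdint.h>

#define TABLESZ		(1u << 10)	/* 1024 entries per table */
#define PAGESZ		(1u << 12)	/* 4096 bytes per page */
#define PAGEDIR_IDX(va)	(((va) >> 22) & (TABLESZ - 1))
#define PAGETBL_IDX(va)	(((va) >> 12) & (TABLESZ - 1))
#define PAGE_OFFSET(va)	((va) & (PAGESZ - 1))

/* Simplified entry: just the physical page base address, 0 = unmapped. */
typedef uint32_t pte_t;

/*
 * Walk a simulated two-level table. pagedir[i] points to a page table or
 * is NULL. Returns 0 and stores the physical address on success, -1 where
 * the real MMU would raise a pagefault.
 */
static int
va_to_pa(pte_t **pagedir, uint32_t va, uint32_t *pa)
{
	const pte_t *pt = pagedir[PAGEDIR_IDX(va)];

	if (pt == NULL || pt[PAGETBL_IDX(va)] == 0)
		return (-1);
	*pa = pt[PAGETBL_IDX(va)] + PAGE_OFFSET(va);
	return (0);
}
```

For the virtual address 0xC0801234 the macros yield page directory index 770, page table index 1 and page offset 0x234.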
Otherwise, the MMU will cause a pagefault (#PF trap). This happens if:
• the pagetable entry is NULL, as indicator of unmapped memory
• the pagetable entry present bit has been cleared by the operating system to indicate
  e.g. a swapped-out page
• the running code executes at ring 3 but the user/supervisor bit in the pagetable
  entry indicates that this page is supposed to be accessible from supervisor mode
  only.

[Illustration 7 - classical 32bit x86 MMU. The MMU context register cr3 locates the
page directory table (PDE[0]..PDE[1023]). The 10bit PDT index (virtual address bits
31..22) selects a PDE, which locates a page table (PTE[0]..PTE[1023]). The 10bit PT
index (bits 21..12) selects a PTE holding a 20bit page frame number, which is combined
with the 12bit page offset (bits 11..0) to form the 32bit physical address.]
The #PF trap has its own pagefault address register: cr2. Virtual memory (i.e.
paging) can easily be implemented via this mechanism. Whenever a pagefault occurs,
the handler will find the virtual address that caused the fault in cr2. It will explicitly
perform the table lookup and inspect the pagetable entry at this position. If the entry
doesn't exist, it can e.g. choose to create it (giving semantics of MAP_ANON mmap). If it
exists but the page present bit is clear, it may decide to e.g. load the page from swap.
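The decision a pagefault handler takes after looking up the entry for the address in cr2 can be sketched as follows. This is an illustration only, not how any particular kernel structures its handler; the enum and function names are made up, and PT_VALID stands for the page present bit:

```c
#include <stdint.h>

#define PT_VALID	0x001u	/* page present bit */

enum pf_action {
	PF_CREATE_MAPPING,	/* no entry at all: e.g. allocate a fresh page */
	PF_LOAD_FROM_SWAP,	/* entry exists, present bit clear */
	PF_PROTECTION_FAULT	/* entry present: the access was not permitted */
};

/* pte is the entry the handler found for the faulting address from cr2. */
static enum pf_action
classify_pagefault(uint32_t pte)
{
	if (pte == 0)
		return (PF_CREATE_MAPPING);
	if ((pte & PT_VALID) == 0)
		return (PF_LOAD_FROM_SWAP);
	return (PF_PROTECTION_FAULT);
}
```

A real handler would additionally check whether the faulting address belongs to a mapping the process is entitled to at all before creating anything.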
The illustration and the examples given so far already indicate that page directory and
page table entries are not just 32bit physical addresses. And they need not be: they
locate pages, and because the page size is a fixed 4kB, a 20bit number, the page frame
number, is sufficient to enumerate all 4kB pages on a machine that allows for 4GB of
physical memory. Pagetable entries therefore contain the PFN and attributes for that
page. Some of them were already mentioned, the page present and the user/supervisor
attributes. But there are more. <vm/hat_pte.h> on Solaris 10/x86 names them:
#define PT_VALID      (0x001)  /* a valid translation is present */
#define PT_WRITABLE   (0x002)  /* the page is writable */
#define PT_USER       (0x004)  /* the page is accessible by user mode */
#define PT_WRITETHRU  (0x008)  /* write back caching is disabled (non-PAT) */
#define PT_NOCACHE    (0x010)  /* page is not cacheable (non-PAT) */
#define PT_REF        (0x020)  /* page was referenced */
#define PT_MOD        (0x040)  /* page was modified */
#define PT_PAGESIZE   (0x080)  /* above level 0, indicates a large page */
#define PT_PAT_4K     (0x080)  /* at level 0, used for write combining */
#define PT_GLOBAL     (0x100)  /* the mapping is global */
#define PT_SOFTWARE   (0xe00)  /* available for software */
The upper 20bits of a pagetable entry will of course contain the page frame number.
PTEs therefore have the following format:

Illustration 8 - 32bit pagetable entry
  bits 31..12   20bit page frame number
  bits 11..9    for OS usage
  bit 8         G    (global)
  bit 7         LP   (large page; PAT at level 0)
  bit 6         MOD  (modified)
  bit 5         REF  (referenced)
  bit 4         NC   (not cacheable)
  bit 3         WT   (write-through)
  bit 2         U/S  (user/supervisor)
  bit 1         R/W  (writable)
  bit 0         P    (present)

As shown, there are no spare/reserved attribute bits left; all are used. This wasn't so
in the 80386, which did not have global or large pages nor caching attributes. These
MMU features, as usual with x86, have to be detected and activated before use, and
CPUID is required to find out whether a given x86-compatible CPU has these features.

What's noticeably missing from the 32bit MMU is an attribute for page executability.
"Classical" x86 has this in the segmentation MMU only, in form of code segments. This
means that in flat address spaces, a page is executable if it is readable!
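Putting page frame number and attribute bits together is plain bit arithmetic. A minimal sketch, using three of the attribute values listed above; make_pte() and pte_pfn() are hypothetical helper names, not functions from the Solaris sources:

```c
#include <stdint.h>

#define PT_VALID	0x001u
#define PT_WRITABLE	0x002u
#define PT_USER		0x004u
#define PAGESHIFT32	12	/* 4kB pages: low 12 bits hold the attributes */

/* Combine a 20bit page frame number with attribute bits into a 32bit PTE. */
static uint32_t
make_pte(uint32_t pfn, uint32_t attrs)
{
	return ((pfn << PAGESHIFT32) | (attrs & 0xfffu));
}

/* Extract the page frame number again. */
static uint32_t
pte_pfn(uint32_t pte)
{
	return (pte >> PAGESHIFT32);
}
```

A user-writable mapping of page frame 0x12345 therefore reads 0x12345007, which is the kind of raw value one meets when inspecting pagetables in a crashdump.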
4.3.2.Physical Address Extension (PAE)
Ten years after the introduction of the 80386, x86-based systems had evolved far
enough (and far beyond what Intel had anticipated) that the need for big server
systems capable of accessing more than 4GB of physical memory became evident.
Competing 32bit architectures of that time, like sun4d/32bit sun4u, PA-RISC 7xxx or
MIPS 3xxx, which were all used in server systems by various vendors, have had MMUs
that allowed the operating system to hold several 32bit programs including all of their
4GB address space in system memory concurrently. This requires MMU translation
modes that convert 32bit virtual addresses into more-than-32bit physical addresses.

The 32bit x86 MMU had no provisions for that. There are no spare/reserved bits in
32bit pagetable entries, so the page frame number couldn't simply be extended. In
addition to that, the size of a pagetable had to be one physical page (4kB), so Intel
could not just double the size of pagetable entries either.

Intel therefore had to:
• increase the size of a pagetable entry from 32bit to 64bit, using (some of) the new
  spare bits for a larger page frame number
• halve the size of page table/page directory table from 1024 entries to 512 entries so
  that the total size of the table would stay 4kB.
But halving the table size to 512 of course means that the virtual address can no
longer be split 10:10:12, but 2:9:9:12, necessitating:
• the introduction of a third translation table.
This so-called Page Directory Pointer Table can, given that there are only two bits for
its index, only contain four entries of course.

This MMU mode, called Physical Address Extension (PAE), was introduced with the
PentiumPro/II CPUs and uses three levels of translation tables:
• the page directory pointer table (four entries only)
• the page directory table (which now has 512 entries)
• the page table.

[Illustration 9 - Physical Address Extension: allow 64GB memory in 32bit mode. The
MMU context register cr3 locates the four-entry page directory pointer table
(PDPTE[0]..PDPTE[3]), indexed by the two topmost virtual address bits (31..30). Bits
29..21 (9bit PDT index) select a PDE in the page directory table (PDE[0]..PDE[511]),
bits 20..12 (9bit PT index) select a PTE in the page table (PTE[0]..PTE[511]). The
24bit page frame number from the PTE and the 12bit page offset (bits 11..0) form the
36bit physical address.]
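The 2:9:9:12 split of a 32bit virtual address in PAE mode can be expressed the same way as the classical 10:10:12 split. The struct and function names below are chosen for this sketch only:

```c
#include <stdint.h>

/* The four parts of a 32bit virtual address in PAE mode. */
struct pae_va {
	unsigned pdpt_idx;	/* bits 31..30, index into the 4-entry PDPT */
	unsigned pd_idx;	/* bits 29..21, index into the page directory */
	unsigned pt_idx;	/* bits 20..12, index into the page table */
	unsigned offset;	/* bits 11..0, offset within the page */
};

static struct pae_va
pae_split(uint32_t va)
{
	struct pae_va v;

	v.pdpt_idx = (va >> 30) & 0x3;
	v.pd_idx = (va >> 21) & 0x1ff;
	v.pt_idx = (va >> 12) & 0x1ff;
	v.offset = va & 0xfff;
	return (v);
}
```

The address 0xC0801234 splits into PDPT index 3, page directory index 4, page table index 1 and offset 0x234, where the classical 32bit MMU would have used page directory index 770 and page table index 1.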
Pagetable entries are twice the size as before, but use the same format
including all of the attribute bits, with two exceptions:
1. The page frame number is now (at least) 24bits.
2. If the CPU supports it, bit 63 is a new page attribute "NX" - not executable.
The NX bit got introduced by AMD in the Opteron processors, but it's not 64bit
specific. As usual with x86 extensions, the presence of the NX bit can be queried by
the operating system using the CPUID instruction.

CPUs that have the NX bit provide the long-missing capability in (32bit) x86 to protect
areas of memory from being executed. The feature is also called Data Execution
Prevention (DEP) by Microsoft, and Enhanced Virus Protection (EVP) by AMD Marketing,
though the latter is technically a misleading term, since the "only" thing NX protects
against are simply-written stack overflow exploits.
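Illustration 10 - 64bit pagetable entry, 36bit PAE mode
  bit 63        NX   (not executable)
  bits 62..36   reserved (shall be 0)
  bits 35..12   24bit page frame number
  bits 11..9    for OS usage
  bit 8         G    (global)
  bit 7         LP   (large page)
  bit 6         MOD  (modified)
  bit 5         REF  (referenced)
  bit 4         NC   (not cacheable)
  bit 3         WT   (write-through)
  bit 2         U/S  (user/supervisor)
  bit 1         R/W  (writable)
  bit 0         P    (present)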
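In the 64bit wide PAE entry format the NX attribute is simply the topmost bit. A minimal sketch; the helper names are invented for this example, and PT_NX is assumed here as the name for bit 63:

```c
#include <stdint.h>

#define PT_NX	(1ULL << 63)	/* assumed name for the "not executable" bit */

/* Build a 64bit PAE-format PTE from page frame number and attributes. */
static uint64_t
make_pae_pte(uint64_t pfn, uint64_t attrs, int noexec)
{
	uint64_t pte = (pfn << 12) | (attrs & 0xfffULL);

	if (noexec)
		pte |= PT_NX;
	return (pte);
}

/* Without NX set, a readable page remains executable. */
static int
pte_is_executable(uint64_t pte)
{
	return ((pte & PT_NX) == 0);
}
```

On CPUs without NX support bit 63 is reserved, so an operating system would only set it after CPUID has confirmed the feature and it has been enabled.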
4.3.3.The AMD64 64bit MMU
The major reason for going to 64bit, whatever CPU vendor, has always been to allow
concurrent access to large virtual address spaces. This of course mandated a new
MMU mode, since both the classical 32bit x86 MMU and the PAE mode only support
32bit virtual addresses.

AMD in designing the 64bit mode retained all existing x86 characteristics. It's
therefore not surprising that the 64bit MMU mode is simply an extension of PAE mode
to 64bit virtual addresses. PAE mode properties also found in the 64bit MMU are:
• pages are 2^12 bytes, 4kB.
• Page directory entries can point to large pages of 2^21 bytes, 2MB.
• pagetable entries use the PAE format (64bit wide, attribute bits identical to PAE)
• translation tables contain 2^9 (512) entries.
The big deficit of PAE mode, unbalanced translation tables because of the use of only
2 bits for the page directory pointer table index, of course is solved because 64bit
virtual addresses supply enough bits to make that table, like all others, 512 entries.
AMD additionally added a fourth table, simply called "Page Map Level 4". The 64bit
MMU therefore looks like this:
[Illustration 11 - 64bit x86 MMU. The MMU context register cr3 locates the page map
level 4 table (PML4E[0]..PML4E[511]). Virtual address bits 47..39 (9bit PML4 index)
select the page directory pointer table (PDPTE[0]..PDPTE[511]), bits 38..30 (9bit PDPT
index) select the page directory table (PDE[0]..PDE[511]), bits 29..21 (9bit PDT index)
select the page table (PTE[0]..PTE[511]), and bits 20..12 (9bit PT index) select the PTE.
Its page frame number and the 12bit page offset (bits 11..0) form the 52bit physical
address. Bits 63..48 of the 64bit virtual address are the 16bit canonical part.]

Pagetable entries, as mentioned, use the PAE format, except, of course, for the PFN
which is no longer 24bit as in 32bit PAE mode, but 40bit now.
This allows the 64bit MMU to access 2^52 bytes of memory, 4PB in total.

The 64bit MMU of course supplies the NX bit that AMD introduced with the Opteron
and Athlon64 CPU series. 64bit pagetable entries use the format shown in
Illustration 12.
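The four-level lookup of the 64bit MMU uses four 9bit indices plus the 12bit page offset. As a sketch (struct and function names are made up for illustration), a 64bit virtual address can be taken apart like this:

```c
#include <stdint.h>

/* The translation-relevant parts of a 64bit virtual address. */
struct amd64_va {
	unsigned pml4_idx;	/* bits 47..39 */
	unsigned pdpt_idx;	/* bits 38..30 */
	unsigned pd_idx;	/* bits 29..21 */
	unsigned pt_idx;	/* bits 20..12 */
	unsigned offset;	/* bits 11..0 */
};

static struct amd64_va
amd64_split(uint64_t va)
{
	struct amd64_va v;

	v.pml4_idx = (unsigned)((va >> 39) & 0x1ff);
	v.pdpt_idx = (unsigned)((va >> 30) & 0x1ff);
	v.pd_idx = (unsigned)((va >> 21) & 0x1ff);
	v.pt_idx = (unsigned)((va >> 12) & 0x1ff);
	v.offset = (unsigned)(va & 0xfff);
	return (v);
}
```

Bits 63..48 do not take part in the lookup at all, which is why their permitted values need separate rules.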
What needs explanation of course is the question what the virtual address bits that are
not used for page offset and table indices are supposed to be. Simple arithmetic tells
us that the MMU uses 12+9+9+9+9 = 48 bit for address translation only. What's the
state of the upper 16 bits of a virtual address?

The answer to that is that the MMU will only perform address translation if the upper
16bits are either all zero or all one, depending on the state of bit 47.

If we consider the 64bit virtual address space to be unsigned and to extend from 0 to
2^64-1, this splits the addressable virtual memory into two ranges of 2^47 bytes (128TB)
each, one at the bottom and one at the top of the virtual address space:

Illustration 13 - Address space hole in AMD64
  0xffff800000000000...0xffffffffffffffff   128TB upper virtual address space
                                            (bits 63..47 all one)
  0x0000800000000000...0xffff7fffffffffff   address space hole: non-translatable
                                            virtual addresses, memory access in this
                                            range causes #GP faults
  0x0000000000000000...0x00007fffffffffff   128TB lower virtual address space
                                            (bits 63..47 all zero)

The upper and lower address spaces are separated by an address space hole.
Translatable virtual addresses, i.e. those where bits 63..47 inclusively are
either all zero or all one so that the address is either within the low 128TB or the
high 128TB of the virtual address space, are called canonical addresses. Virtual
addresses in the hole are noncanonical and their use causes #GP faults.

Virtual address spaces with holes are not new in 64bit environments. For example,
UltraSPARC-I and II also had an address space hole (but they were even more limited
than AMD64, with only 2x8TB of virtual address space).

There is a different way of understanding 64bit MMUs which use a canonical mode.
Consider the virtual address to be a signed 48bit value, i.e. virtual addresses range
from -2^47...2^47-1 bytes (i.e. 128TB each side). The upper bits of the 64bit address
are therefore derived by sign extension.

In this signed representation, there is no address space hole. Allowed virtual addresses
are continuous over a 256TB range, while virtual addresses outside of that range are
undefined and cause #GP faults when used.

The terms negative address range and upper address range describe the same thing,
and are often used interchangeably.

Illustration 14 - signed virtual addresses
  -0x000000000001...-0x800000000000    128TB negative virtual address space
   0x000000000000... 0x7fffffffffff    128TB positive virtual address space
  Everything outside of this range is non-translatable; access causes #GP faults.
  Bits 63..48 are the 16bit canonical part (copies of the sign bit, bit 47).

Illustration 12 - 64bit pagetable entry, 64bit mode
  bit 63        NX   (not executable)
  bits 62..52   reserved (shall be 0)
  bits 51..12   40bit page frame number
  bits 11..9    for OS usage
  bits 8..0     attribute bits as in PAE mode (G, LP, MOD, REF, NC, WT, U/S, R/W, P)
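Whether an address seen in a crashdump is canonical can be checked with a shift and a compare. The following sketch (function names invented for this example) implements both views described above: the "bits 63..47 all zero or all one" rule and the sign extension of a 48bit value:

```c
#include <stdint.h>

/* Canonical if bits 63..47 are all zero or all one. */
static int
is_canonical(uint64_t va)
{
	uint64_t top = va >> 47;	/* the 17 bits 63..47 */

	return (top == 0 || top == 0x1ffff);
}

/* Derive the full 64bit address from a 48bit value by sign extension. */
static uint64_t
sign_extend48(uint64_t va48)
{
	if (va48 & (1ULL << 47))
		return (va48 | 0xffff000000000000ULL);
	return (va48 & 0x0000ffffffffffffULL);
}
```

An address such as 0x0000800000000000 is the first one inside the hole, while 0xffff800000000000 is the lowest address of the upper (negative) range.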
4.3.4.Large Pages
For accessing very large amounts of memory, 4kB pages are inconvenient and slow for
obvious reasons:
• Having the operating system create e.g. 1 million pagetable entries to allocate a
  4GB chunk of memory takes a while.
• Cached translations (TLB/translation lookaside buffer entries) will show heavy
  contention when so many translations need to be done all the time.
In order to speed up the use of large amounts of memory, large pages are being used.

The common mechanism for creating a large page is to use a higher-level translation
table entry as "large PTE" and make the size of large pages the sum of the sizes of all
pages in the next lower-level table. In x86 terms, a page directory entry that has the
largepage attribute bit set will not point to a pagetable, but directly to the physical
location of the large page. Since pagetables contain 512 entries, the size of a large
page will be 512x4kB, 2MB (if using the classical 32bit MMU mode: 1024x4kB, 4MB).

The capability to support large pages, as all of the post-80386 features, again must be
detected using a CPUID instruction, and selectively enabled. All 64bit-capable x86
CPUs support large pages, but not necessarily all 32bit "x86 compatibles".

At the time of writing, there is no support for >2MB large pages (the next logical size
would be 512x2MB, 1GB) in the x86 MMU. It's likely, though, that if this is ever
introduced it'll be done the same way: by making a page directory pointer table entry
refer to a "huge page" of 1GB then, or even a "giant page" of 512GB, if ever the page
map level 4 table may point directly to a page...

[Illustration 15 - using large pages. The 64bit MMU lookup as in Illustration 11, but
the page directory entry does not locate a page table; it supplies the page frame
number of the large page directly, and the remaining low bits of the virtual address
form the offset within the large page.]

In any case, turning around the concept of large pages:
A NULL as table entry means:
• an unmapped area of 512GB if in the PML4 table
• an unmapped area of 1GB if in a page directory pointer table
• an unmapped area of 2MB if in a page directory table
• an unmapped page of 4kB if in a page table.
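The sizes in this list, and the "1 million pagetable entries" figure, follow from multiplying by 512 per table level. A short sketch of the arithmetic for the PAE/64bit table format; the level numbering (0 for the page table up to 3 for the PML4 table) and the function name are choices made for this example:

```c
#include <stdint.h>

#define PAGESIZE_4K	4096ULL
#define NENTRIES	512ULL	/* entries per table in PAE and 64bit mode */

/*
 * Amount of virtual address space covered by a single entry of a table at
 * the given level: 0 = page table, 1 = page directory table,
 * 2 = page directory pointer table, 3 = PML4 table.
 */
static uint64_t
entry_coverage(unsigned level)
{
	uint64_t size = PAGESIZE_4K;

	while (level-- > 0)
		size *= NENTRIES;
	return (size);
}
```

Mapping 4GB with 4kB pages needs 4GB / 4kB = 1048576 pagetable entries, while the same range is covered by 2048 page directory entries when 2MB large pages are used.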
4.4.Advanced System Programming Techniques on x86
4.4.1.DMA using virtual addresses - IOMMU
4.4.2.Using the Protected Mode for Hardware Virtualization
Modern system architectures often allow running multiple operating system instances
on the same physical machine. Domaining / hardware partitioning / virtualization are
the terms used to describe this capability. Supporting this requires
5.Interrupt handling, Device Autoconfiguration
5.1.Interrupt Handling and Interrupt Priority Management
x86 processors alone know only interrupt vectors (indices into the IDT) and a bit in the
processor status register EFLAGS/RFLAGS that says "interrupts enabled". The CPU
knows no concept of interrupt priorities. To an x86 CPU, all 256 (resp. all 224
user-available) interrupt vectors are equal. The processor has pins that connected
peripheral devices can use to "cause interrupts", but apart from "ignore all" (IF bit
clear, after a cli instruction) and "accept all" (IF bit set, after a sti instruction) it
hasn't the ability to selectively block interrupts, e.g. from a low-prio disk device if a
handler for a high-prio network interrupt is just running. All of the following:
• binding hardware interrupt sources to CPU interrupt vectors,
• classifying interrupts on peripheral devices into different priority classes,
• selectively blocking out specific devices or specific interrupt priorities, and
• managing interrupt state (interrupt active/pending)
has, on x86 platforms, always been the task of a device called Programmable Interrupt
Controller (PIC). The "classical" x86 PIC is the Intel i8259A.
5.2.APIC and IOAPIC features
5.2.1.Overview
In a multiprocessor environment, interrupt handling becomes significantly more
complicated than before. The interrupt controller in SMP systems must provide all the
capabilities of interrupt management and prioritization of the simple PIC, but in
addition to that, an SMP-capable interrupt controller must be able to:
• route interrupts. It's highly undesirable on a multiprocessor system to bother all
CPUs at once with handling a specific device interrupt.
• support inter-processor interrupts, i.e. a CPU itself being the interrupt source of
another CPU.
With the introduction of the Pentium (P5) microprocessor architecture, Intel
integrated SMP support on chip, with an interrupt arbitrator / coherency controller
subsystem called the Advanced Programmable Interrupt Controller, APIC.
Every modern x86 processor contains a local APIC, whose tasks are:
• Receive and dispatch local interrupts (from devices directly connected to CPU
interrupt pins)
• Dispatch CPU-internal interrupts (APIC timer, temperature sensors, performance
monitoring events)
• Receive and dispatch external interrupts (from the IOAPIC, a system/peripheral bus
component that routes peripheral interrupts via the APIC protocols)
• Receive and dispatch inter-processor interrupts, IPIs (IA32 term for crosscall).
• Manage interrupt priorities.
[Illustration: APIC functionality. Devices on the peripheral bus raise device interrupts
at the IOAPIC, which forwards them over the APIC / system bus to the local APICs of
CPU 0 .. CPU 3. Each local APIC additionally handles the CPU-internal interrupt sources
of its CPU. CPUs reach each other through their local APICs via directed crosscalls or
broadcast crosscalls.]
The chipset will contain an external or I/O APIC, which is then used as a
programmable dispatch facility for hardware interrupts to the various local APICs of
the processors in the system.
The APIC routes external (IOAPIC) and local (see above) interrupt sources to IA32
interrupt numbers in a user-programmable way. APIC registers used for this purpose
contain the IA32 int# to dispatch on event in their lowest 8 bits.
The APIC groups interrupt vector numbers (0..255) into 16 priority groups, with the
priority of an interrupt given by int#/16. Priority groups 0 and 1 (the hardware
exceptions) are effectively highest, since they can neither be created nor blocked by
the APIC. Priorities 2..15 are available to APIC interrupts. The Task Priority Register,
TPR, allows blocking interrupts of low priority, while the read-only Processor Priority
Register, PPR, reflects the current settings.
CPU-CPU communication is executed via Inter-Processor Interrupt, IPI, the x86 term
for "crosscall". To generate IPIs, the Interrupt Command Register, ICR, is written. The
ICR alone enables the programmer to:
• dispatch a programmable int# to a single other CPU (targeted IPI)
• broadcast a programmable int# to all CPUs, including/excluding the sending CPU
Additional APIC registers, the Logical Destination Register, LDR, and the Destination
Format Register, DFR, even allow for multicast IPIs (restricting broadcasts to a
selected set of CPUs).
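How such a 64bit ICR value could be put together is sketched below. This is only an illustration: the placement of the vector (bits 0..7) and of the destination APIC ID (bits 56..63) follows the register format given in this chapter, while the two "destination shorthand" bits 18..19 that select the broadcast variants are taken from Intel's APIC documentation rather than from this text. The names `ipi_target` and `icr_value` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Destination shorthand, bits 18..19 of the ICR (per Intel's APIC docs). */
enum ipi_target {
    IPI_SINGLE       = 0,   /* use the destination APIC ID field     */
    IPI_SELF         = 1,   /* sending CPU only                      */
    IPI_ALL          = 2,   /* broadcast including the sending CPU   */
    IPI_ALL_BUT_SELF = 3    /* broadcast excluding the sending CPU   */
};

/* Compose the 64bit value to be written to the ICR for a simple IPI. */
uint64_t icr_value(uint8_t vector, enum ipi_target target, uint8_t dest_apic_id)
{
    assert((unsigned)target <= IPI_ALL_BUT_SELF);

    uint64_t icr = vector;                     /* bits 0..7: int# to raise */
    icr |= (uint64_t)target << 18;             /* bits 18..19: shorthand   */
    if (target == IPI_SINGLE)
        icr |= (uint64_t)dest_apic_id << 56;   /* bits 56..63: target CPU  */
    return icr;
}
```

A targeted IPI with vector 0xf0 to the CPU with APIC ID 2 would then be `icr_value(0xf0, IPI_SINGLE, 2)`; all other delivery options are left at zero in this sketch.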
The APIC is programmed like a memory-mapped device; the base address for the mapping
is held in the model-specific register IA32_APIC_BASE, and all APIC registers are
located at fixed offsets relative to that base. To read or write an APIC register, simply
access memory at the corresponding offset relative to the APIC base address.
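In C, such an access boils down to indexing a volatile pointer. The sketch below is an assumption-laden illustration, not kernel code: `apic_read` and `apic_write` are hypothetical helper names, the register offsets are the ones listed in this chapter, and in a real kernel the base pointer would be an uncached mapping of the physical address found in IA32_APIC_BASE (here any 4kB buffer will do).

```c
#include <assert.h>
#include <stdint.h>

/* Offsets relative to the APIC base, as listed in this chapter. */
#define APIC_TPR        0x80
#define APIC_PPR        0xa0
#define APIC_LVT_TIMER  0x320

/* Read the 32bit APIC register at 'offset' from the mapped base. */
uint32_t apic_read(volatile uint32_t *apic_base, uint32_t offset)
{
    /* APIC registers sit on 16-byte boundaries inside a 4kB window. */
    assert(offset % 16 == 0 && offset < 0x1000);
    return apic_base[offset / 4];
}

/* Write the 32bit APIC register at 'offset' from the mapped base. */
void apic_write(volatile uint32_t *apic_base, uint32_t offset, uint32_t value)
{
    assert(offset % 16 == 0 && offset < 0x1000);
    apic_base[offset / 4] = value;
}
```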
The IOAPIC is a "doubly indirect" mapped device. It has two registers (IOREGSEL and
IOWIN) that are accessed relative to the Southbridge's APIC_BASE (this is not the same
as the CPU's IA32_APIC_BASE!). IOWIN maps to one of the actual IOAPIC registers
depending on the value in IOREGSEL. I.e. to program an IOAPIC register, the register
number is put into IOREGSEL first, and the actual IOAPIC register is then accessed via
IOWIN.
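The select-then-access pattern can be modelled in a few lines of C. The following is a software model for illustration only: `struct fake_ioapic` stands in for the device, its size `IOAPIC_NREGS` is an arbitrary assumption, and on real hardware IOREGSEL and IOWIN would be two memory-mapped locations rather than a struct member and a function.

```c
#include <assert.h>
#include <stdint.h>

#define IOAPIC_NREGS 64     /* assumed size of the simulated register file */

struct fake_ioapic {
    uint32_t ioregsel;              /* index register                      */
    uint32_t regs[IOAPIC_NREGS];    /* internal registers, only via IOWIN  */
};

/* Touching IOWIN reaches whichever internal register IOREGSEL selects. */
uint32_t *iowin(struct fake_ioapic *io)
{
    assert(io->ioregsel < IOAPIC_NREGS);
    return &io->regs[io->ioregsel];
}

void ioapic_write(struct fake_ioapic *io, uint32_t regno, uint32_t value)
{
    io->ioregsel = regno;   /* step 1: select the register number */
    *iowin(io) = value;     /* step 2: access it through IOWIN    */
}

uint32_t ioapic_read(struct fake_ioapic *io, uint32_t regno)
{
    io->ioregsel = regno;
    return *iowin(io);
}
```

Because the two steps are not atomic, code that programs a real IOAPIC has to keep select and access together, e.g. under a lock; the sketch ignores this.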
5.2.2.APIC interrupt registers
APIC registers, whether IOAPIC (Intel 82093AA) or CPU-local APIC, use a common
format for all registers associated with interrupt delivery:
[Illustration: APIC register format, for interrupt-dispatch related registers. The 64bit
register holds the fields INT#, delivery/destination mode, DS, MSK and the destination
APIC ID at the bit positions listed below.]
The bits have the following meaning:
• 0..7: INT#
the x86 interrupt vector to dispatch to the target CPU(s) on event
• 8..11: delivery mode / destination mode (only in registers for non-local delivery)
decides which CPUs are targeted in broadcast/multicast routing modes
• 12: DS (delivery status)
indicates whether an interrupt of this type is pending (queued) due to higher-
prioritized interrupt handlers running
• 16: MSK (mask)
can be used to selectively block generation of interrupts from interrupt sources
(devices) controlled by the given (IO)APIC register.
• 56..63: destination APIC ID (only in registers for non-local delivery, i.e. the IOAPIC
redirection table and the local APIC's Interrupt Command Register)
determines the interrupt target CPU set, together with the delivery / destination
mode bits and two APIC control registers (logical destination register, destination
format register).
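When looking at raw register contents in a crashdump, it helps to split such a value into its fields. A minimal sketch, using exactly the bit positions listed above; the struct and function names (`apic_route`, `apic_decode`, `apic_encode`) are invented for this example, and bits not mentioned in the list are simply ignored.

```c
#include <assert.h>
#include <stdint.h>

struct apic_route {         /* hypothetical name */
    uint8_t vector;         /* bits 0..7:   INT#                        */
    uint8_t mode;           /* bits 8..11:  delivery / destination mode */
    uint8_t pending;        /* bit 12:      DS, delivery status         */
    uint8_t masked;         /* bit 16:      MSK                         */
    uint8_t dest;           /* bits 56..63: destination APIC ID         */
};

/* Split a 64bit interrupt-dispatch register value into its fields. */
struct apic_route apic_decode(uint64_t reg)
{
    struct apic_route r;
    r.vector  = (uint8_t)(reg & 0xff);
    r.mode    = (uint8_t)((reg >> 8) & 0xf);
    r.pending = (uint8_t)((reg >> 12) & 1);
    r.masked  = (uint8_t)((reg >> 16) & 1);
    r.dest    = (uint8_t)(reg >> 56);
    return r;
}

/* Build a register value; DS is status only, so it is not an input. */
uint64_t apic_encode(uint8_t vector, uint8_t mode, int masked, uint8_t dest)
{
    assert(mode < 16);
    return (uint64_t)vector
        | ((uint64_t)mode << 8)
        | ((uint64_t)(masked != 0) << 16)
        | ((uint64_t)dest << 56);
}
```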
The main APIC registers related to interrupt creation/dispatch are on Pentium-IV
systems (others may not have all of the P-IV's LVT[] registers - again, ask via CPUID):

APIC base +...   Register         Description
0x320            LVT[0]           timer register.
                                  Program high-resolution timer interrupts here.
0x380, 0x390,    ICR, CCR, DCR    initial count/current count/divide configuration
0x3e0                             register. Additional state for the APIC timer.
0x330            LVT[1]           thermal monitor register.
                                  Create interrupts on danger of overheating.
0x340            LVT[2]           performance counter register.
                                  Interrupts for profiling.
0x350            LVT[3]           local device 0 register
0x360            LVT[4]           local device 1 register
0x370            LVT[5]           APIC error register.
                                  Create interrupt if APIC encounters an error.
0x280            ESR              Error status register.
                                  Auxiliary (readonly) information on APIC errors.
0x300, 0x310     ICR              Interrupt Command Register (64bit).
                                  Create inter-processor interrupts.
0xd0, 0xe0       LDR, DFR         Logical Destination / Destination Format Register.
                                  Used to clarify destination CPUs for IPIs.

The IOAPIC uses a set of 24 registers called I/O Redirection Table that uses the
mentioned format to dispatch events on the peripheral bus via the APIC mechanism.
The IOREDIR[] registers use the described generic register format to select interrupt
vector and target CPU.
5.2.3.Interrupt priorities, cr8
The APIC (as well as the older i8259 PIC) uses a simple mapping between interrupt
priorities and x86 interrupt vectors:
Interrupt Priority = vector# / 16
Priority subclass = vector# % 16
I.e. high-priority interrupts get high interrupt vector numbers.
This mapping is not configurable; what the APIC does allow is to block/queue
interrupts not only based on the interrupt source (MSK bit), but also based on
priority. For that purpose, the APIC provides a "priority register" which comes in two
flavours:

Register Name                       Description
Task Priority Register, TPR         Read/write register. Offset 0x80.
                                    Used to set the current IPL (Interrupt Priority Level).
Processor Priority Register, PPR    Read-only register. Offset 0xa0.
                                    Used to query the current IPL.

Recent x86 CPUs map the APIC Interrupt Priority register directly to a new CPU
control register. cr8, if available, serves both purposes, to query and set the current
IPL.
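The vector-to-priority mapping and the effect of the priority register can be written down as follows. This is a sketch under stated assumptions: the division and modulo come from this section, whereas the details that the priority class lives in bits 4..7 of the TPR, that an interrupt is only delivered if its class is strictly greater than that value, and that cr8 carries just the 4bit class, are taken from the Intel/AMD architecture manuals. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Priority class and subclass of an interrupt vector. */
unsigned int_priority(uint8_t vector) { return vector / 16; }
unsigned int_subclass(uint8_t vector) { return vector % 16; }

/*
 * Nonzero if the vector is held back by the given TPR value.
 * Assumption: TPR bits 4..7 hold the priority class, and only
 * interrupts of a strictly higher class get through.
 */
int blocked_by_tpr(uint8_t vector, uint8_t tpr)
{
    return int_priority(vector) <= (unsigned)(tpr >> 4);
}

/* Assumption: cr8 bits 0..3 mirror TPR bits 4..7. */
uint8_t tpr_from_cr8(uint64_t cr8)
{
    assert(cr8 < 16);
    return (uint8_t)(cr8 << 4);
}
```

For example, vector 0x41 has priority 4, subclass 1, and is blocked as long as cr8 is 4 or higher.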
5.2.4.APIC interrupt processing flow
The APIC runs concurrently with and asynchronously to the CPU, in a state machine
similar to the following:
[Illustration: APIC interrupt processing workflow. If an int is pending, the APIC sets
the requested INT# bit in IRR. If IRR indicates an interrupt and no ISR bit is set (no
handler running), the APIC moves the INT# bit from IRR to the active INT# bit in ISR and
dispatches INT# to the CPU. The CPU dispatches the INT# handler via the gate in the IDT.
The interrupt handler runs and writes EOI at its end, whereupon the APIC clears the
active INT# bit in ISR.]
Three additional APIC registers are involved with this flow:
• Interrupt Request Register, IRR.
The IRR is a 256bit-wide bitmap - a pending but not-yet-dispatched interrupt will
result in a bit being set in the IRR.
The APIC permanently monitors (active, i.e. not masked) interrupt sources and
automatically manages the bitmap in IRR.
The register occupies 256bit of memory, offsets 0x200..0x270 to the base. Software
can read this for debug purposes (but not write it).
• In-Service Register, ISR.
The APIC uses ISR to indicate which interrupt is currently being serviced by the
CPU. The position of the set bit in ISR gives the vector number. It's again a
256bit-wide register, but unlike the IRR it contains at most one set bit. The APIC
automatically moves the highest-priority bit from IRR to ISR and dispatches the
interrupt vector to the CPU if ISR gets cleared.
Again, 256bit of memory at offsets 0x100..0x170 map the ISR. It's also readonly.
• End-of-Interrupt Register, EOI.
This register is writeonly and used as a trigger. The interrupt service routine
running on the CPU writes to EOI to indicate completion. Writing EOI clears ISR
and causes the APIC to continue interrupt dispatch if there are interrupts queued
via IRR. The value written to EOI is irrelevant - as said, it's a pure trigger.
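The interplay of IRR, ISR and EOI can be replayed with a small software model. The sketch below follows the simplified flow described in this section (at most one interrupt in service at a time); it is not how an APIC is implemented, and all names (`fake_apic`, `apic_request`, `apic_dispatch`, `apic_eoi`, `irr_offset`) are invented for illustration. The layout used by `irr_offset`, eight 32bit registers 0x10 bytes apart, matches the offset range 0x200..0x270 given above.

```c
#include <assert.h>
#include <stdint.h>

struct fake_apic {
    uint32_t irr[8];    /* 256bit bitmap of requested vectors         */
    int in_service;     /* vector marked in ISR, or -1 if ISR is clear */
};

/* Offset (relative to the APIC base) of the IRR word holding 'vector'. */
uint32_t irr_offset(uint8_t vector)
{
    return 0x200 + (vector / 32) * 0x10;
}

/* A device or IPI requests 'vector': the APIC sets the bit in IRR. */
void apic_request(struct fake_apic *a, uint8_t vector)
{
    a->irr[vector / 32] |= 1u << (vector % 32);
}

/*
 * If no handler is running, move the highest pending vector from IRR
 * to ISR and return it (the CPU would now enter the handler via the
 * IDT). Returns -1 if nothing can be dispatched.
 */
int apic_dispatch(struct fake_apic *a)
{
    if (a->in_service >= 0)
        return -1;
    for (int v = 255; v >= 0; v--) {
        if (a->irr[v / 32] & (1u << (v % 32))) {
            a->irr[v / 32] &= ~(1u << (v % 32));
            a->in_service = v;
            return v;
        }
    }
    return -1;
}

/* The handler writes EOI: ISR is cleared, dispatch may continue. */
void apic_eoi(struct fake_apic *a)
{
    assert(a->in_service >= 0);
    a->in_service = -1;
}
```

Requesting vectors 0x41 and 0x82 and then calling `apic_dispatch` delivers 0x82 first; 0x41 stays queued in IRR until `apic_eoi` has been called.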
6.Solaris/x86 architecture
From "The Tao of Programming":
The warlord asked the programmer:
"Which is easier to design: an accounting package or an operating system?"
"An operating system," replied the programmer.
The warlord uttered an exclamation of disbelief.
"Surely an accounting package is trivial next to the complexity of an
operating system," he said.
"Not so," said the programmer, "When designing an accounting package, the
programmer operates as a mediator between people having different ideas:
how it must operate, how its reports must appear,
and how it must conform to the tax laws.
By contrast, an operating system is not limited by outside appearances.
When designing an operating system,
the programmer seeks the simplest harmony between machine and ideas.
This is why an operating system is easier to design."
The warlord of Wu nodded and smiled.
"That is all good and well, but which is easier to debug?"
The programmer made no reply.
To be covered here:
Solaris Internals
6.1.Kernel and user mode
The way Solaris/x86 sets up the 64bit protected mode is subject to the following
constraints:
• The use of a bootloader and the use of BIOS services requires thunking interfaces
between the (64bit) kernel and the 64/32bit parts of the bootloader that supply the
bootops services. This is why the descriptor table contains a set of privileged
code/data descriptors for calling into bootloader/BIOS during system startup.
• Using fast system call mechanisms (syscall and/or sysenter) mandates the shown
ordering of code/data segments (i.e. the kernel data segment must directly follow
the kernel code segment in the descriptor table, because this assumption is implicit
in the way fast syscalls are set up).