Professional Documents
Culture Documents
Next
The classic Zen Timer code profiling tool, in both executable and source
code format.
a\
4
@
Exclusive! The text of Michaels long out of print 1989 cult classic Zen of
AsS6T373ibly
Language, plus scans of all 1 OOt technical figures.
ant essays from Michaels ongoing work in game developg for the first time in book form.
:nts, descriptions,
copyrights,
installation,
limita-
n.
Hardware
Wimiws
95 or NT.
Black Book
Michael Abrash
Albany, NY Belmont, CA Bonn Boston Clnclnnatl Detrolt Johannesburg London
Madrld Melbourne Mexlco C I New
~ York Paris Slngapore Tokyo Toronto Washlngton
Publisher
Project Editor
Production Proiect Coordinator
Compositor
Cover Artist and Cover Design
Proofreaders
Indexer
CD-ROM Development
Keith Weiskamp
Denise Constantine
Kim Eoff
Rob Mauhar
Anthony Stock
JefKellum and Meredith Brittain
Caroline Parks
Robert Clarfield
The author and publisher of this book have used their best effortsin preparing the book and
the programs contained in it. These efforts include the development, research, and testing of
the theories and programs to determine their effectiveness. The
author and publisher makeno
warranty of any kind, expressed or implied, with regard
to these programs or the documentation contained in this book.
The author and publisher shall notbe liable in the event of incidental or consequential damages in connection with, or arising out the
of, furnishing, performance, or
use of the programs,
associated instructions,and/or claims of productivity gains.
Tiademarks
Trademarked names appear throughout this book. Rather than list the names and entities that
own the trademarks or insert
a trademark symbol witheach mention of the trademarked name,
the publisher states thatit is using the names for editorialpurposes only and to the benefit of
the trademark owner, with no intention of infringing upon that trademark.
The Coriolis Group, Inc.
602/483-0192
FAX 602/483-0193
http://www.coriolis.com
109876543
Acknowledgments
Because this bookwas written over many years,in many different settings, an unusually large number of people have played a part in making this book possible.
First and foremost, thanks (yet again) to Jeff Duntemann for getting this book
started, doing thedirty work, and keeping things on track and everyones spirits
up. Thanks to Dan Illowsky for not only contributing ideas and encouragement,
but also getting me started writing articles long ago, when I lacked the confidence
to do it on myown-and
for teaching me how to handle the business end of
things. Thanks to Will Fastie for giving me my first crack at writing for a large
audience in the long-gone but still-missed PC TechJournal, and for showing me
how much fun it could be in his even longer-vanished but genuinely terrific column in Creative Computing (the most enjoyable single columnI have everread in a
computer magazine; I used to haunt the mailbox around the beginning of the
month justto see whatWill had to say). Thanks to Robert Keller, Erin OConnor,
Liz Oakley, Steve Baker,
and the rest of the cast ofthousands that made PJa uniquely
fun magazine-especiallyErin, who did more thananyone toteach me the proper
use of English language. (To this day, Erin will still patiently explain to me when
one should use that andwhen one should use whicheven though eight years
of instruction on this and related topics have left no discernible imprint on my
brain.) Thanks toJon Erickson, Tami Zemel, Monica Berg,
and therest of the DDJ
crew for excellent, professional editing, and forjustbeing great people. Thanks to
the Coriolis gangfor their tireless hard work: JeffDuntemann andKeith Weiskamp
on the editorial and publishing side, and Brad Grannis, Rob Mauhar, and Kim
Eoffwho handled art,design, and layout. Thanks to Jim
Mischel whodid a terrific
job testing code for the book and putting thecode disk together. Thanks to Jack
Tseng, for teaching me a lot about graphics hardware, and even more about how
much difference hard work can make. Thanks John
to Cockerham, David Stafford,
Terje Mathisen,the BitMan, Chris Hecker, Jim Mackraz, Melvin
Laftte, JohnNavas,
Phil Coleman, Anton Truenfels,
John Carmack,John Miles,John Bridges, Jim Kent,
Hal Hardenberg, Dave Miller, Steve Levy, Jack Davis,
Duane Strong, Daev Rohr, Bill
Weber, Dan Gochnauer, Patrick Milligan, Tom Wilson, the people in the ibm.pc/
fast.code topic on Bix, and all the rest of you who havebeen so generous with your
ideas and suggestions. Ive done my best to acknowledge contributors by name in
this book, but if yourname is omitted, my apologies, and consider yourself thanked;
this bookcould not have happened without you.And, of course, thanks to Shayand
Emily for their generous patience with my passion for writing and computers.
And, finally, thanks to the readers of my articles and to you, the reader of this
book. You are, after all, the ultimate reason why I write, and I hope you learn as
much and have as much fun reading this book as I did writing it!
Michael Abrash (mikeab@microsoft.com)
Bellevue, Washington, 1997
To Shay
Previous
Next
Home
Contents
Foreword xxxi
Introduction xxxiii
Part I
Chapter 1
Chapter 2
19
19
WhereWe'reGoing
A World Apart 21
The Unique Nature of Assembly
Language Optimization 23
Instructions: The Individual versus
the Collective 23
Assembly Is Fundamentally Different
Transfwmation Inefficiencies 25
Self-Rliance 27
vi i
25
Knowledge 27
The FlexibleMind
28
Whereto Begm ? 30
Chapter 3
Assume Nothing
31
33
42
72
75
The 8-BitBusCycle-Eater79
The Impact of the 8-Bit Bus Cycle-Eater 82
What to Do about the 8-Bit Bus Cycle-Eater? 83
86
Contents
95
Wait States 99
The Display Adapter Cycle-Eater 101
The Impact of the DisplayAdapter Cycle-Eater 104
What to Do about the Display Adapter Cycle-Eater? 107
Cycle-Eaters:A Summary 108
What Does It All Mean? 108
Chapter 5
111
121
Chapter 6
123
125
LookingPastFaceValue
128
130
Chapter 7
Local Optimization
135
138
139
Chapter 8
Speedin Up C with
Assem ly Language
149
153
Chapter 9
155
156
167
Chapter 10 PatientCoding,Faster
187
Code
189
196
193
Recursion
199
Chapter 11
Code Alignment
215
21 9
223
Caveat Programmor
241
246
Chapter 14
Boyer-Moore String
Searching 259-
Chapter
15
277
279
306
312
Chapter 18
347
372
Chapter 20
Pentium
Rules
378
381
Chapter 21
404
Chapter 22
400
Zennin
Flexib eand
Mind
the 41 3
Taking a Spin through What
Youve Learned 415
Zenning 41 5
406
427
Chapter 24
Parallel Processing
with the VGA 449
Taking on Graphics Memory Four Bytes
at a Time 451
VGA Programming: ALUs and Latches 451
Notes on the ALU/Latch Demo
Program 458
Chapter 25
Chapter 26
Chapter 28
523
Chapter 29
561
Chapter 31
Two 256-ColorPages600
Something to Think About
Chapter 32
605
Chapter 33
625
Chapter
34
ChangingColorswithout
Writing Pixels 637
Chapter 35
An Implementation in C 661
Looking at EVGALine 665
Drawing Each Line 668
Drawing Each Pixel 669
Chapter 36
Chapter 37
695
705
Contents
Chapter 38
The PolygonPrimeval
707
710
Chapter 39
723
Chapter 40
732
735
735
739
742
Nonconvex Polygons
Details, Details
Chapter 41
Contents
'755
755
Chapter 42
Wued in Haste;Fried,
Stewed at Leisure 773
Fast Antialiased LinesUsing WusAlgorithm 775
Wu Antialiasing 776
Tracing and Intensity in One 778
Sample Wu Antialiasing 782
Noteson WuAntialiasing
Chapter 43
791
799
Chapter 44
Split ScreensSavethe
PageFlipped Day 817
640x480 Page Flipped Animation in
64K...Almost 819
A Plethora of Challenges 819
A Page Flipping Animation Demonstration 820
WriteMode 3 831
Drawing Text 832
PageFlipping 833
Knowing WhentoFlip 835
Chapter 45
836
841
Contents
844
855
Image? 859
872
Chapter 47
861
862
873
Mode X: 256-Color
VGA Magic875
908
Chapter 49
Mode X 256-Color
Animation 913
How to Make the VGA Really Get up
and Dance 915
Masked Copying 915
Faster Masked Copying 91 8
Notes on Masked Copying 923
Animation 924
Mode X Animation in Action 924
Works Fast, Looks Great 930
931
939
An Ongoing Journey
Chapter 51
948
949
Sneakers in Space
951
Chapter 52
954
957
966
987
989
Having a Ball
984
992
1000
1003
1005
Chapter 55
Color Modeling in
256-Color Mode
1027
1031
Chapter 58
1068
1092
1095
BSP Trees
1098
VisibilityDetermination 1099
Limitations of BSP Trees 11 00
1101
Visibility Ordmng 11 04
1107
1113
14
1 1 15
Chapter 61
Frames of Reference
1 1 31
1135
Contents
11 4 3
Chapter 62
Chapter 63
1162
Floating-Point for
Real-Time 3-D 1 163
Knowing When to Hurl Conventional Math
Wisdom Out the Window 1165
Not Your Fathers Floating-point 1167
Pentium Floating-point Optimization 1167
Pipelining, Latenq, and Throughput
FXCH 1169
11 68
1175
Contents
11 83
Overdraw 1 184
The Beam Tree 1 185
3-D Engine du Jour 1186
Subdividing Raycast 11 87
Vntex-FreeSurfaces 11 87
The DrawBufer 1 18 7
Span-Based Drawing 1 18 7
Portals 1188
Breakthrough! 1188
Simph.@,and Keep on Trying New Things 1189
Learn Now,Pay Forward 1190
References 1190
Chapter 65
1 191
Polygon Clipping
1 197
Chapter 66
1207
QuakesHidden-Surface
Removal 1209
Struggling with Z-Order Solutions to the
Hidden Surface Problem 121 1
Creative Flux and Hidden Surfaces 1212
Contents
Sorted Spans
1214
Edge-Sorting Keys
1220
1221
1223
Chapter 68
1230
1239
QuakesLighting
Model
1243
Chapter 69
SurfaceCachinandQuakes
Triangle Mode s 1257
Letting the Graphics Card Build the Textures
The Light Map as A y h a Texture 1262
Drawing Triangle Models Fast 1263
Trading Subpixel Precisionfor Speed 1265
A n Idea that Didnt Work 1265
1261
Contents
Previous
Home
Afterword
Index
Contents
1297
1299
Next
Foreword
Home
Next
I talked myself hoarse that day, explaining all the ins and outsof Doom to
Michael and an interested group
of his coworkers.Every few daysafterwards,
I would get an email fromMichael asking for an elaboration on oneof my
points, or discussing an aspectof the futureof graphics.
Eventually, I popped thequestion-I offered him ajob atid. Just think:no
reporting to anyone, an opportunity to code all day, starting with a clean
the right thingas a programmer. It didnt
work.
sheet of paper. A chance to do
I kept at it though, and about a year later I finally convinced him to come
down and take a look at
id. I was working on Quake.
Going from Doom to Quakewas a tremendous step. Iknew where I wanted
I wasnt at all clear what the steps were to get Ithere.
was trying
to end up, but
a huge numberof approaches, andeven the failureswere teaching me a lot.
My enthusiasm must have been contagious, because he took the
job.
Much heroic programming ensued.Several hundred thousand linesof code
were written. And rewritten. And rewritten. And rewritten.
is
In hindsight,I have plenty of regrets aboutvarious aspectsof Quake, but it
a rare person that doesnt freely acknowledge the technicaloftriumph
it. We
nailed it. Sure, a year from
now I will have probably found anew perspective
that will make me cringe at the clunkiness
of some partof Quake, but at the
moment it still looks pretty damn goodto me.
I was very happy to have Michael describe muchof the Quake technology in
his ongoing magazine articles.We learned a lot, andI hope we managed to
teach a bit.
When a non-programmer hears about
Michaels articlesor thesource codeI
have released, I usually get a stunnedWTF would you do that for??? look.
They dont getit.
Programming is not a zero-sum game. Teaching something to a fellow programmer doesnttake it away from you. Im happy to share
what I can, because
gravy, honest!
Im in it for thelove of programming. The Ferraris are just
This book containsmany of the original articles that helped launch
my programming career.I hope my contribution to the contents
of the later articles
can provide similar stepping stones for others.
-John Camnack
id Software
Foreword
Previous
Home
Introduction
What was it like working with John Carmack onQuake? Like being strapped
onto a rocket during takeoff-in the middle of a hurricane. It seemedlike
the whole world was watching, waiting to see if id Software could top Doom;
every casual e-mail tidbit or conversation with a visitor ended up posted on
the Internet within hours. And meanwhile, we were pouring everything we
had into Quakes technology; Id often come inin the morning to find John
still there, working on anew idea so intriguing that he couldnt bear
to sleep
until he had tried it out. Toward the end, when I spent most of my time
speeding things up, would
I
spend the day in a trance writing optimized assembly code, staggerout of the Town East Tower into theblazing Texas heat,
and somehowdrive home on LBJ Freeway without smacking into anyof the
speeding pickups whizzing past me on both sides. At home, Id fall into a
fitful sleep, then come back the next day in a daze and do it again. EveryI wonder
thing happenedso fast, and underso much pressure, that sometimes
how any of us made it through that without completely burning out.
At the same time, of course, it was tremendously exciting. Johns ideas were
endless and brilliant, and Quake ended up establishing a new standard for
Internet and first-person 3-D game technology. Happily, id has an enlightened attitude about sharing information,was
and
willing to let mewrite about
the Quake technology-both how it worked and how it evolved. Over the two
years I workedat id, wrote
I
a number of columns about Quake in07:Dobbs
Sourcebook, as well as a detailed overview for the 1997 Computer GameDevelopers Conference. You can find these in the latter part of this book; they
represent a rare look into the development and inner workings of leadingedge software development, andI hope you enjoy reading themas much as I
enjoyed developing the technology and writing about it.
The rest of this book is pretty much everything Ive written over the past
decade about graphics and performance programming thats
still relevant to
programming today, and thatcovers a lot of ground. Most of Zen of Ch-aphics
Programming, 2nd Edition is in there (and therest is on the CD) ; all of Zen of
Code Optimization is there too, and even my 1989 book Zen of Assembly Lan-
xxxiii
Next
Previous
Home
Next
I dont have much else to say that hasnt already been said elsewhere in
this book, in one of the introductions to the previous volumes or in one
of the astonishingly large number of chapters. As youll see as you read,
its been quite a decade for microcomputer programmers, Iand
have been
extremely fortunate to not only be a part
of it, but to be able to chronicle
part of it as well.
And the next decadeis shaping upto be justas exciting!
Michael Abrash
Bellevue, Washington
May 1997
Introduction
Previous
Part 1
Home
Next
Previous
chapter 1
Home
Next
My point is simply this:PCs can work wonders. Its not easy coaxing them into doing
that, but its rewarding-and its sure as heck fun. In this book, were going towork
some of those wonders, starting.. .
...now.
mthout good design, good algorithms, and complete understanding of the program
k
operation, your carefully optimized code will amount to one of mankindb least
fruitful creations-a fast slow program.
Whats a fast slow program? you ask. Thats a good question, and a brief (true)
story is perhaps the best answer.
Chapter 1
Being an engineer back then meantknowing howto use a slide rule, andIrwin could
jockey a slipstick withthe best of them. In fact,he was so good that hechallenged a
fellow witha calculator to a duel-and won, becoming a local legend in the
process.
When you get right down to it, though, Irwin was spitting into the wind. In a few
short years his hard-earned slipstick skills would be worthless, and the entirediscipline would be essentially wiped from the face of the earth. Whats more, anyone
with half a brain could see that changeover coming, Irwin had basically wasted the
considerable effortand time he had spent optimizing his soon-to-be-obsoleteskills.
What does all this haveto do with programming? Plenty. When you spend time optimizing poorlydesigned assembly code, or when you count on an optimizing compiler
to make your code fast, youre wastingthe optimization, much as Irwin did. Particularly in assembly, youll find thatwithout proper up-front design and everything else
that goes into high-performance design, youll waste considerableeffort and time on
making an inherently slow program as fast aspossible-which is still slow-when you
could easily have improved
performance a great deal more withjust a little thought. As
well see, handcrafted assembly language and optimizing compilers matter, but less
than you might think, in the grandscheme of things-and they scarcely matter at all
unless theyre used in the context of a good design and a thorough understanding of
both thetask at hand and thePC.
LISTING 1.1
11-1.C
I*
*
*
*
Program t o c a l c u l a t e t h e 1 6 - b i t c h e c k s u m o f a l l b y t e s i n t h e
s p e c i f i e df i l e .O b t a i n st h eb y t e s
one a t a t i m e v i a r e a d 0 .
l e t t i n g DOS p e r f o r ma l ld a t ab u f f e r i n g .
*I
#i
n c l ude < s t d i 0. h>
Chapter 1
i n t Handle;
u n s i g n e dc h a rB y t e ;
u n s i g n e d i n t Checksum:
i n t ReadLength;
i f ( a r g c !- 2 ) {
p r i n t f ( " u s a g e :c h e c k s u mf i l e n a m e \ n " ) :
exit(1):
i f ( (Handle
open(argvC11. 0-RDONLY I 0-BINARY))
p r i n t f ( " C a n ' t open f i l e :% s \ n " .a r g v C 1 1 ) :
exit(1);
- -1
>
/* I n i t i a l i z e t h e
Checksum
0;
checksumaccumulator
*/
/ * Add e a c hb y t ei nt u r ni n t ot h ec h e c k s u ma c c u m u l a t o r
w h i l e ( (ReadLength
r e a d ( H a n d 1 e .& B y t e .s i z e o f ( B y t e ) ) )
Checksum +- ( u n s i g n e di n t )B y t e ;
}
i f ( ReadLength
- -1
*/
) {
) {
p r i n t f ( " E r r o rr e a d i n gf i l e% s \ n " .a r g v C 1 1 ) ;
exit(1):
)
/ * R e p o r tt h er e s u l t
*/
p r i n t f ( " T h e checksum i s : %u\n".
exit(0);
Checksum);
Table 1.1 shows the time takenfor Listing 1.1to generate a checksum ofthe WordPerfect
version 4.2 thesaurus file, TH.WP(362,293 bytes in size), on a 10 MHz AT machine of
no special parentage. Execution times are given for Listing 1.1 compiled with Borland
and Microsoft compilers, with optimization both on andoff; all four times are pretty
much the same, however, and all are much too slow to be acceptable. Listing 1.1 requires over two and one-half minutes to checksum one file!
Listings 1.2 and 1.3form the Uassembly equivalent to Listing 1.1, and Listings
1.6 and 1.7form the Uassembly equivalent to Listing 1.5.
These results make it clear that it's folly to rely on your compiler's optimization to
make your programs fast. Listing 1.1 is simply poorly designed, and no amount of
compiler optimization will compensate for that failing. To drive home thepoint, consider Listings 1.2 and 1.3, which together are equivalent to Listing 1.1 except that the
entire checksum loop is written in tight assembly code. The assembly language implein Table 1.1, but it's less
mentation is indeed faster than any ofthe C versions, as shown
than 10 percent faster, and it's still unacceptablyslow.
LISTING1.211-2.C
I*
*
*
*
*I
Program t o c a l c u l a t e t h e 1 6 - b i t c h e c k s u m o f t h e s t r e a m o f b y t e s
f r o mt h es p e c i f i e df i l e .O b t a i n st h eb y t e so n ea t
a timein
a s s e m b l e r .v i ad i r e c tc a l l st o
00s.
# i n c l u d e< s t d i o . h >
# i n c l u d e< f c n t l . h >
m a i n ( i n ta r g c .c h a r* a r g v [ l )
in t Hand1 e;
u n s i g n e dc h a rB y t e :
u n s i g n e d i n t Checksum:
in t ReadLength:
i f ( a r g c !- 2 ) {
p r i n t f ( " u s a g e :c h e c k s u mf i l e n a m e \ n " ) :
exit(1):
i f ( (Handle
open(argvC11. 0-RDONLY I 0-BINARY))
p r i n t f ( " C a n ' to p e nf i l e :% s \ n " .a r g v C 1 1 ) :
exit(1):
i f ( !ChecksumFile(Handle.&Checksum)
) {
p r i n t f ( " E r r o rr e a d i n gf i l e% s \ n " .a r g v C 1 1 ) :
exit(1):
1
I* R e p o r t t h e r e s u l t
*I
p r i n t f ( " T h ec h e c k s u m
exit(0);
i s : %u\n".Checksum):
10
Chapter 1
- -1
LISTING1.311-3.ASM
; A s s e m b l e rs u b r o u t i n et op e r f o r m
a 1 6 - b i t checksumon
; opened on t h ep a s s e d - i nh a n d l e .S t o r e st h er e s u l ti nt h e
;
p a s s e d - i nc h e c k s u mv a r i a b l e .R e t u r n s
1 f o rs u c c e s s ,
thefile
0 f o re r r o r .
; C a lal s :
i n t ChecksumFile(unsigned i n t H a n d l e ,u n s i g n e di n t* C h e c k s u m ) ;
; where:
--
Handle
handle # u n d e rw h i c hf i l et oc h e c k s u mi s
Checksum
p o i n t e rt ou n s i g n e di n tv a r i a b l ec h e c k s u m
t o b es t o r e di n
open
is
; P a r a m e t e rs t r u c t u r e :
Parms
Hand1
Checksum
Pa rms
struc
dw
dw
dw
dw
?
?
?
?
.modelsmal
.data
word
db
db
;pushed BP
; r e t u r na d d r e s s
ends
TempWord1 a b e l
TempByte
-ChecksumFile
.code
pub1 ic
p r once a r
push
mov
push
mov
sub
?
0
-ChecksumFi
bp
bp. sp
si
le
v a r iraebglies t e r C ' s
:save
; g e tf i l eh a n d l e
: z e r ot h ec h e c k s u m
;accumulator
bx.[bp+Handlel
si ,si
d x . o f f s e t TempByte
mov
int
jc
and
jz
add
ah,3fh
21h
ErrorEnd
ax.ax
Success
si.[TempWord]
jmp
ChecksumLoop
ChecksumLoop:
ErrorEnd:
Success :
; e r r o r sub
jmp
mov
mov
mov
;read
; p o i n t DX t ot h eb y t ei n
; w h i c h DOS s h o u l ds t o r e
: e a c hb y t er e a d
:DOS r e a d f i l e f u n c t i o n
#
; r e a dt h eb y t e
:an e r r o ro c c u r r e d
; a n yb y t e sr e a d ?
;no-end o f f i l e reached-we'redone
; a d dt h eb y t ei n t ot h e
;checksum t o t a l
ax,ax
s h o r t Done
bx.[bp+Checksuml
[bxl ,si
ax.1
; p o i n tt ot h ec h e c k s u mv a r i a b l e
; s a v et h e
new checksum
;success
11
Done:
POP
POP
ret
si
bP
: r e s t o r e C s r e g i s t e rv a r i a b l e
-ChecksumFile
endp
end
The lesson is clear: Optimizationmakes code faster, but withoutproper design, optimization just creates fast slow code.
Well, then, how are we going toimprove our design? Before we can do that,we have
to understandwhats wrong with the current design.
12
Chapter 1
Make sure you understand what really goes on when you insert a seeminglyinnocuous function call into the time-critical portions of your code.
In this case that means knowing how DOS and the C / C t t file-access libraries do
their work. In other words, know the territory !
LISTING1.411-4.C
I*
*
*
*
*/
Program t o c a l c u l a t e t h e 1 6 - b i t
checksum o f t h e s t r e a m o f b y t e s
f r o mt h es p e c i f i e df i l e .O b t a i n st h eb y t e so n ea t
a t i m ev i a
g e t c 0 .a l l o w i n g
C t op e r f o r md a t ab u f f e r i n g .
# i n c l u d e < s t d i o . h>
m a i n ( i n ta r g c .c h a r* a r g v [ ] )
F I L E* C h e c k F i l e :
i n tB y t e :
u n s i g n e di n t Checksum:
i f ( a r g c != 2 ) {
p r i n t f ( " u s a g e :c h e c k s u mf i l e n a m e \ n " ) :
exit(1):
i f ( (CheckFile
= f o p e n ( a r g v C 1 1 ." r b " ) )
=- NULL
p r i n t f ( " C a n ' t open f i l e :% s \ n " .a r g v [ l ] ) :
exit(1):
/* I n i t i a l i z e t h e
Checksum = 0 :
checksumaccumulator
*/
/ * Add e a c h b y t e i n t u r n i n t o t h e
checksumaccumulator
w h i l e ( ( B y t e = g e t c ( C h e c k F i 1 e ) ) != EOF
{
Checksum += ( u n s i g n e di n t )B y t e :
*/
/ * R e p o r tt h er e s u l t
p r i n t f ( " T h ec h e c k s u m
exit(0):
*/
i s : %u\n". Checksum):
13
Ifyou were to implement any of the listings in this chapter entirely in hand-optimized assembly, I suppose you might get a performance improvement of a few
percent-but Irather doubtyou iiget
even that much,andyou iisure asheckspend
an awful lot of time for whatever meager improvementdoes result. Let C do what
it does well,and use assembly only whenit makes a perceptibledzfference.
Besides, we dont want to optimize untilthe design is refined to our satisfaction, and
that wont be thecase until weve thought about other approaches.
14
Chapter 1
I'll ignore the first reason, both because performance is no longer an issue if the
code is fast enough andbecause the currentapplication doesnot run fast enough1 3 seconds is a longtime. (Stop and wait for 1 3 seconds while you're doing something
intense, andyou'll see just how long it is.)
The second reason is the hallmark of the mediocre programmer. Know when optimization matters-and then optimize when it does!
The third reason is often fallacious. C library functions are not always written in
assembly, nor arethey always particularly well-optimized. (In fact, they're oftenwritten forportability, which has nothing to do with optimization.) What's more, they're
general-purpose functions,
and often can be outperformedby well-but-not- brilliantlywritten code that is well-matched to a specific task. As an example, consider Listing
1.5,which uses internal bufferingto handle blocks of bytesat a time. Table 1.1 shows
that Listing 1.5 is 2.5 to 4 times faster thanListing 1.4 (and as much as 49 times faster
than Listing 1.1 !), even though it uses no assembly at all.
LISTING1.511-5.C
I*
*
*
Program t o c a l c u l a t e t h e 1 6 - b i t
checksum o f t h e s t r e a m o f b y t e s
f r o mt h es p e c i f i e df i l e .B u f f e r st h eb y t e si n t e r n a l l y ,r a t h e r
t h a n l e t t i n g C o r DOS do t h ew o r k .
*I
#i
n c l u d e < s t d 0.
i h>
d i n c l u d e < f c n t l h>
# i n c l u d e< a l l o c . h >
I* a l 1 o c . hf oBr o r l a n d .
r n a l 1 o c . hf o M
r icrosoft
# d e f i n e BUFFER-SIZE
0x8000
I* 32Kb d a t ab u f f e r
*I
*/
m a i n ( i n ta r g c .c h a r* a r g v [ I )
[
in t Hand1 e ;
u n s i g n e di n t Checksum:
u n s i g n e dc h a r* W o r k i n g B u f f e r .* W o r k i n g P t r ;
i n t W o r k i n g L e n g t h L. e n g t h c o u n t ;
i f ( a r g c != 2 1 {
p r i n t f ( " u s a g e :c h e c k s u mf i l e n a r n e \ n " ) :
exit(1);
-- -1
I
I* Get memory i n w h i c h t o b u f f e r t h e d a t a
*I
i f ( ( W o r k i n g B u f f e r = malloc(BUFFER-SIZE)) == NULL
p r i n t f ( " C a n ' tg e te n o u g hm e m o r y \ n " ) :
) {
15
I* I n i t i a l i z e t h e
Checksum = 0:
checksumaccumulator
*I
I* P r o c e s s t h e f i l e i n
BUFFER-SIZE chunks * I
do {
i f ( (WorkingLength = read(Hand1eW
. orkingBuffer.
BUFFER-SIZE)) == - 1 ) {
p r i n t f ( " E r r o rr e a d i n gf i l e% s \ n " .a r g v [ l ] ) ;
exit(1);
I* Checksum t h i s c h u n k * I
WorkingPtr
WorkingBuffer:
Lengthcount = WorkingLength:
1
w h i l e ( Lengthcount"
I* Add e a c h b y t e i n t u r n i n t o t h e
checksumaccumulator
Checksum += ( u n s i g n e di n t )* W o r k i n g P t r + + :
1
1 while
WorkingLength
I* R e p o r tt h er e s u l t
*I
p r i n t f ( " T h ec h e c k s u mi s :
exit(0);
*I
);
%u\n". Checksum);
16
Chapter 1
and 1.7, outperforms even the best-optimized C version of Listing 1.5 by 26 percent.
These are considerable improvements, well worth pursuing-once the design has
been maxed out.
LISTING 1.6 11-6.C
/*
*
*
*
*
Program t o c a l c u l a t e t h e 1 6 - b i t
checksum o f t h e s t r e a m o f b y t e s
f r o mt h es p e c i f i e df i l e .B u f f e r st h eb y t e si n t e r n a l l y ,r a t h e r
t h a n l e t t i n g C o r DOS do t h e w o r k , w i t h t h e t i m e - c r i t i c a l
p o r t i o no ft h ec o d ew r i t t e ni no p t i m i z e da s s e m b l e r .
*I
# i n c l u d e< s t d i o . h >
# i n c l u d e < f c n t l h>
# i n c l u d <e a l l o c . h >
/ * a l 1 o c . fhoBr o r l a n d .
*/
m a l 1 o c . hf o M
r icrosoft
# d e f i n e BUFFER-SIZE
0x8000
/*
3 2 K d a t ab u f f e r
*I
m a i n ( i n ta r g c .c h a r* a r g v [ ] )
t
i n t Handle:
u n s i g n e di n t Checksum:
u n s i g n e dc h a r* W o r k i n g B u f f e r :
i n tW o r k i n g L e n g t h ;
i f ( a r g c != 2 )
p r i n t f ( " u s a g e :c h e c k s u mf i l e n a m e \ n " ) :
exit(1):
= open(argv[l].
0-ROONLY I 0-BINARY))
p r i n t f ( " C a n ' to p e nf i l e :% s \ n " .a r g v [ l l ) :
exit(1);
i f ( (Handle
-1
==
/ * Get memory i n w h i c h t o b u f f e r t h e d a t a
i f ( (WorkingBuffer
/* I n i t i a l i z e t h e
Checksum = 0 :
checksumaccumulator
I* P r o c e s s t h e f i l e i n
do
*/
malloc(BUFFER-SIZE))
p r i n t f ( " C a n ' tg e te n o u g hm e m o r y \ n " ) :
exit(1);
=
32K chunks
==
NULL 1
*/
*/
i f ( ( W o r k i n g L e n g t h = r e a d ( H a n d 1 e .W o r k i n g B u f f e r .
BUFFER-SIZE)) == -1 ) 1
p r i n t f ( " E r r o rr e a d i n gf i l e% s \ n " .a r g v C 1 1 ) :
exit(1);
I
/ * Checksum t h i s chunk i f t h e r e ' s a n y t h i n g i n
it */
WorkingLength )
ChecksumChunk(WorkingBuffer.WorkingLength.&Checksum);
] w h i l e ( WorkingLength ) :
if
/ * R e p o r tt h er e s u l t
*/
p r i n t f ( " T h ec h e c k s u mi s :% u \ n " .C h e c k s u m ) :
exit(0):
17
: checksum.
; C a lal s :
;
v oCi dh e c k s u m C h u n k ( u n s i g ncehd*aBru f f e r .
u n s i g n ei ndBtu f f e r L e n g t hu .n s i g n ei nd* C
t hecksum);
; where:
;
B u f f e r = p o i n t e rt os t a r to fb l o c ko fb y t e st o
checksum
;
BufferLength
# o fb y t e st o
checksum ( 0 means
64K.
n o t 0)
;
Checksum = p o i n t e tr ou n s i g n e di n vt a r i a b l e
checksum i s
;
s t oi rne d
: P a r a m e t e rs t r u c t u r e :
Parms s t r u c
Buffer
BufferLength
Checksum
Parmsends
dw
dw
dw
dw
dw
?
?
?
?
?
;pushed BP
: r e t u r na d d r e s s
.modelsmall
.code
p u b l i c _ChecksumChunk
-ChecksumChunk
pnr oe ca r
push
bp
mov
bp.sp
push s i
v ar er igaibs lteeCr' s ; s a v e
c ld
;make LODSB i n c r e m e n t S I
mov
s. [i b p + B u f f e r l
;point t o buffer
mov
cx.[bp+BufferLengthl
; g e tb u f f e rl e n g t h
mov
bx.[bp+Checksuml
: p o i n t t o c h e c k s u mv a r i a b l e
; g e tt h ec u r r e n tc h e c k s u m
mov
d x[ ,b x l
ah,ah
sub
; s o AX will be a 1 6 - b i t v a l u e a f t e r
ChecksumLoop:
1 odsb
; g e tt h en e x tb y t e
dx.ax
add
:add i t i n t o t h e checksum t o t a l
l o o p ChecksumLoop
: c o n t i n u ef o ra l lb y t e si nb l o c k
; s a v et h e new checksum
mov
[ b x ] ,d x
LODSB
r epop
vg;ariC
resi 'stasetbosrl riee
POP
bp
ret
ChecksumChunk
endp
end
Note that in Table 1.1, optimization makes little difference except in the case of
Listing 1.5, where the design has been refined considerably. Execution time in the
other cases is dominated by time spent in DOS and/or the C library, so optimization
of the code you write is pretty much irrelevant. What's more, while the approximately two-times improvement we got by optimizing is not to be sneezed at, itpales
against the up-to-50-times improvement we got by redesigning.
18
Chapter 1
Previous
Home
Next
By the way, the execution times even of Listings 1.6 and 1.7 are dominated by DOS
disk access times. If a disk cache is enabled and the file to be checksummed is already in the cache, the assembly version is three times as fast as the C version. In
other words, the inherent nature
of this application limits the performanceimprovement that can be obtained via assembly. In applications that are moreCPU-intensive
and less disk-bound, particularly those applications in which string instructions and/
or unrolled loops can be used effectively, assembly tends to be considerably faster
relative to C thanit is in this very specific case.
All this is basicallya way of saying: Knowwhere youre going, know the territory, and
know when itmatters.
19
Previous
chapter 2
a world apart
Home
Next
1:i
.n
:I;
e!:J
Nature of Assembly LanguageOptimization
23
I examined the subroutineline by line, saving a cycle here and a cycle there, until
the code truly seemed to be optimized. When I was done, the key part of the code
looked somethinglike this:
LoopTop:
l o d;sgtbnehetebxeyttxtoet r a c t
a bf ri to m
and
a1 , a; ihs o l a t hebei t
we w a n t
rol
a1 . c; rl o t a ttehbeii nt ttohdee s i r epdo s i t i o n
or
b l . a: 1i n s e trhtbei int t ohf ei n na il b b l e
1 p l a ct hoer ei g h t
d;cet nxhcebgxiot e s
; cdoxeucn t
down tnhuemb boi tefs r
jnz
L o o p T o: p r o c e st hsneebx itt ,
i f any
Now, its hard towrite code thats much faster than seven instructions, only one of
which accesses memory, and most programmers would have called it a day at this
point. Still, something bothered me, so I spent a bit of time going over the code
again. Suddenly, the answer struck me-the code was rotating each bit intoplace
separately, so that a multibit rotation was being performedevery time through the
loop, fora total of four separatetime-consuming multibit rotations!
; g e tt h en e x tb y t e
t o extract a bitfrom
: i s o l a t e t h e b i t we w a n t
: i n s e r tt h eb i ti n t ot h ef i n a ln i b b l e
;make room f o r t h e n e x t b i t
; c o u n t down t h e number o f b i t s
: p r o c e s st h en e x tb i t ,
i f any
: r o t a t ea l lf o u rb i t si n t ot h e i rf i n a l
: p o s i t i o n s a t t h e same t i m e
This moved the costly multibit rotation out of the loopso that itwas performed just
once, rather than four
times. While the codemay not look much different fromthe
original, and in fact still contains exactly the same number of instructions, the performance of the entire subroutine
improved by about 10 percent from just
this one
change. (Incidentally, that wasnt the endof the optimization; I eliminated the DEC
andJNZ instructions by expanding the fouriterations of the loop-but thats a tale
for another chapter.)
The pointis this:To write trulysuperior assembly programs, you need to know what
the various instructions do andwhich instructions execute fastest...and more. You
must also learn to look at your programming problems fromvariety
a
ofperspectives
so that you can put those fast instructions to work in the most effective ways.
24
Chapter 2
Transformation Inefficiencies
N o matter how well an implementation is derived from the correspondingdesign,
however, high-levellanguages like C/C++ and Pascal inevitablyintroduce additional
transformation inefficiencies, as shown in Figure 2.1.
The process of turning a design into executable code by way of a high-level language
involves two transformations: one performedby the programmerto generate source
code, and another performed by the compiler to turn source code into machine
High-Level Language
Compiled to machine
language by a high-level
language compiler
(Transformation #2)
Language Code
Figure 2.1
A World Apart 25
Assem bler
Source Code
Language Code
26
Chapter 2
The key, of course, is the programmer, since in assembly the programmer mustessentially perform the transformation fromthe application specification to machine
language entirelyon his or herown. (The assembler merely handles thedirect translation from assembly to machine language.)
Self-Reliance
The first part of assemblylanguage optimization,then, is self-reliance.An assembler
is nothing more thana tool to let you design machine-language programs without
having to think in hexadecimal codes. S o assembly language programmers-unlike
all other programmers-must take full responsibility for the quality of their code.
Since assemblers provide little help at any level higher than the generation of machine language, the assembly programmer must be capable both of coding any
programming construct directly and of controlling the PC at the lowest practical
level-the operating system, the BIOS, even the hardware where necessary. Highlevel languages handle most of this transparently to the programmer,
but in assembly
everything is fair-and necessary-game, which
brings us to another aspect of assembly optimization: knowledge.
Knowledge
In the PC world, you can never have enough knowledge, and every item you add to
your store will make your programs better.Thorough familiarity with both the operating system APIs and BIOS interfaces is important; since those interfaces are
well-documented and reasonably straightforward, my advice is to get a good book or
two and bring yourself up to speed. Similarly, familiarity with the PC hardware is
required. While that topic covers a lotof ground-display adapters, keyboards, serial
ports, printer ports, timer and DMA channels, memory organization, and moremost of the hardware is well-documented, and articles about programming major
hardware components appearfrequently inthe literature,so this sort of knowledge
can be acquiredreadily enough.
The single most critical aspect of the hardware,and the one about
which it is hardest
to learn,is the CPU. The x86 family CPUshave a complex, irregular instruction set,
and, unlike most processors, they are neitherstraightforward nor well-documented
regarding true code performance.
Whats more, assembly isso difficult to learn that
most articles and books that present assembly code settle for code that justworks,
rather than code that pushes the CPU to its limits. In fact, since most articles and
books are written for inexperiencedassembly programmers, there is verylittle information of any sort available about how to generate high-quality assembly code for
the x86 family CPUs.As a result,knowledge about programming themeffectively is
by far the hardest knowledge to gather. A good portion of this book is devoted to
seeking out such knowledge.
A World Apart
27
Be forewarned, though: No matter how much you learn about programming the
PC in assembly, there 5 always more to discover.
28
Chapter 2
Simple logic dictates thatno compiler canknow asmuch aboutwhat a piece of code
needs todo or adapt
as well to those needsas the person who wrote the code.Given
that superior information andadaptability, an assembly language programmer can
generate better code than
a compiler,all the more so given that compilers are constrained by the limitations of high-levellanguages and by the process of transformation
from high-level to machine language.Consequently, carefully optimized assembly is
notjust the language of choice but the only choice for thelpercent to 10 percent of
code-usually consisting of small, well-defined subroutines-that determines overall program performance, andit is the only choice for code that must as
becompact
as possible, as well. In the run-of-the-mill, non-time-critical portions of your programs, it makes no sense to waste time and effort on writing optimized assembly
code-concentrate your efforts on loops and the like instead; but in those areas
where you need the finest code quality, accept no substitutes.
Note that I said that an assembly programmer can generate better code than acompiler, not will generate better code.While it is true that goodassembly code is better
than good compiled code, it is also true that bad assembly code is often much worse
than bad compiled code; since the assembly programmer has so much control over
the program,he orshe has virtually unlimited opportunities to waste cycles and bytes.
The sword cuts both ways, and good assemblycode requires more, not less, forethought
and planning than good code
written in ahigh-level language.
The gist of all this is simply that goodassembly programming is done in the context
of a solid overall framework unique to each program, and theflexible mind is the
key to creating that framework and holding ittogether.
A World Apart
29
Previous
Home
Next
Where to Begin?
To summarize, the skill of assembly language optimization is a combinationof knowledge, perspective, and a way of thought thatmakes possiblethe genesis of absolutely
the fastest or the smallest code. With that in mind, what should the first step be?
Development of the flexible mind is an obvious step. Still, the flexible mind is no
better than the
knowledge at its disposal. The first step in the journey
toward mastering optimization at that exalted level, then, would seem to be learning how to learn.
30
Chapter 2
Previous
chapter 3
assume nothing
Home
Next
chapter 3
33
There you have an important tenet of assembly language optimization: After crafting the best code possible, check it in action to see if it j. really doing what you
think it is. r f it k not behaving as expected, that 5. all to the good, since solving
mysteries is thepath to knowledge. Youll learn more in thisway, Iassure you,than
from any manual or book on assembly language.
care about performance, do your best to improve the code and thenmeasure the improvement. If
you dont measure performance, youre just guessing, and if youre guessing, youre
not very likely to write top-notch code.
Ignorance about true performance can be costly. When I wrote video games for a
living, I spent days at a time trying to wring more performance from my graphics
drivers. I rewrote whole sections of code justto save a few cycles,juggled registers,
and relied heavily on blurry-fast register-to-register shifts and adds. As I was writing
my last game, I discovered that the program ranperceptibly faster if I used look-up
tables instead of shifts and adds formy calculations. It shouldnt have run faster, according to my cycle counting, but it did. In truth, instruction fetching
was rearing its
head again, as it often does, and the fetching of the shifts and adds was taking as
much as four times the nominal execution time of those instructions.
Ignorance can also be responsible for considerable wasted effort. I recall a debate in
the letters column of one computermagazine about exactly how quickly text can be
drawn on a Color/Graphics Adapter(CGA) screen without causing snow. The letterwriters counted every cycle in their timing loops, just as the authorin the story that
started this chapter had. Like that author, the letter-writers had failed to take the
prefetch queue into account. In fact, they had neglected the effects of video wait
states as well, so the code they discussed was actually much slower than their estimates. The proper test would, of course, have been to run the code to see if snow
resulted, since the only true measure of code performanceis observing it inaction.
34
Chapter 3
PZTIMER.ASM
Zen t i m e r ,w i t hi n t e r r u p t sd i s a b l e d .
Z T i m e r O f f :S t o p st h e
Zen t i m e r ,s a v e st h et i m e rc o u n t ,
t i m e st h eo v e r h e a dc o d e ,a n dr e s t o r e si n t e r r u p t st ot h e
s t a t et h e yw e r ei n
when ZTimerOn was c a l l e d .
Z T i m e r R e p o r t :P r i n t st h en e tt i m et h a tp a s s e db e t w e e ns t a r t i n g
a n ds t o p p i n gt h et i m e r .
5 4 ms passesbetweenZTimerOnand
Note: I f l o n g e rt h a na b o u t
Z T i m e r O f fc a l l s ,t h et i m e rt u r n so v e ra n dt h ec o u n ti s
i n a c c u r a t e . When t h i sh a p p e n s ,
an e r r o r message i s d i s p l a y e d
i n s t e a d o f a c o u n t . The l o n g - p e r i o d Zen t i m e rs h o u l db eu s e d
i n suchcases.
N o t e :I n t e r r u p t s
*MUST* be l e f t o f f b e t w e e nc a l l st o
a n dZ T i m e r O f ff o ra c c u r a t et i m i n ga n df o rd e t e c t i o no f
t i m e ro v e r f l o w .
ZTimerOn
N o t e :T h e s er o u t i n e sc a ni n t r o d u c es l i g h ti n a c c u r a c i e si n t ot h e
s y s t e mc l o c kc o u n tf o re a c hc o d es e c t i o nt i m e de v e n
if
t i m e r 0 d o e s n to v e r f l o w .
I f t i m e r 0 d o e so v e r f l o w ,t h e
s y s t e mc l o c kc a n
become s l o w b y v i r t u a l l y
anyamount
of
t i m e ,s i n c et h es y s t e mc l o c kc a n ta d v a n c ew h i l et h e
p r e c i s o nt i m e ri st i m i n g .C o n s e q u e n t l y ,i t s
a goodidea
t or e b o o ta tt h e
end o fe a c ht i m i n gs e s s i o n .( T h e
Assume Nothing
35
b a t t e r y - b a c k e dc l o c k ,
timer.)
i f any.
i sn o at f f e c t e db yt h e
Zen
: Al r e g i s t e r s , and a l lf l a g se x c e p tt h ei n t e r r u p tf l a g ,a r e
: p r e s e r v e db ya l lr o u t i n e s .I n t e r r u p t sa r ee n a b l e da n dt h e nd i s a b l e d
: b yZ T i m e r O n .a n da r er e s t o r e db yZ T i m e r O f ft ot h es t a t et h e yw e r e
: i n when ZTimerOn was c a l l e d .
Code
s e g m e nw
t o r dp u b l i c
'CODE'
assume
cs:Code.
ds:nothing
public
ZTimerOn.
ZTimerOff.
ZTimerReport
: Base a d d r e s so ft h e8 2 5 3t i m e rc h i p .
EASEL8253
40h
equ
: T h ea d d r e s so ft h et i m e r
TIMER-0-8253
equ
; T h ea d d r e s so ft h e
MODEL8253
0 c o u n tr e g i s t e r si nt h e8 2 5 3 .
BASE-8253 + 0
mode r e g i s t e r i n t h e
equ
8253.
EASEL8253 + 3
: T h ea d d r e s so fO p e r a t i o n
Command Word3
i n t h e 8259Programmable
: I n t e r r u p tC o n t r o l l e r ( P I C ) ( w r i t eo n l y ,a n dw r i t a b l eo n l y
when
: b i t 4 o ft h eb y t ew r i t t e nt ot h i sa d d r e s si s
OCW3
20h
0 and b i t 3 i s 1).
equ
: T h ea d d r e s so ft h eI n t e r r u p tR e q u e s tr e g i s t e ri nt h e8 2 5 9
: ( r e a do n l y .a n dr e a d a b l eo n l y
when b i t 1 o f OCW3
: o f OCW3
- 0).
I RR
20h
- 1 and b i t 0
PIC
equ
: Macro t oe m u l a t e a POPF i n s t r u c t i o n i n o r d e r t o f i x t h e b u g i n
some
: 8 0 2 8 6c h i p sw h i c ha l l o w si n t e r r u p t st oo c c u rd u r i n g
a POPF even when
: i n t e r r u p t sr e m a i nd i s a b l e d .
MPOPF macro
l o c a lp l .
p2
j m ps h o r tp 2
i rp el :t
push
p2:
cs
c a pl l l
endm
jump t o pushedaddress
& p o pf l a g s
c o n s t r u c tf a rr e t u r na d d r e s st o
t h en e x ti n s t r u c t i o n
t oe n s u r et h a te n o u g ht i m eh a se l a p s e d
: Macro t o d e l a y b r i e f l y
: b e t w e e ns u c c e s s i v e
1/0 accesses s o t h a t t h e d e v i c eb e i n ga c c e s s e d
: canrespond
t ob o t ha c c e s s e se v e n
DELAY macro
jmp
$+2
jmp
5+2
jmp
S+2
endm
36
Chapter 3
ona
v e r y f a s t PC.
db
OriginalFlags
: s t o r a g ef o ru p p e rb y t eo f
FLAGS r e g i s t e r when
ZTimerOn c a l l e d
: t i m e r 0 c o u n t when t h e t i m e r
i ss t o p p e d
number o f c o u n t s r e q u i r e d t o
e x e c u t et i m e ro v e r h e a dc o d e
: used t o i n d i c a t e w h e t h e r t h e
t i m e ro v e r f l o w e dd u r i n gt h e
t i m i n gi n t e r v a l
TimedCount
dw
Referencecount
dw
O v e r f l owFl ag
db
: S t r i n gp r i n t e d
t or e p o r tr e s u l t s .
O u t p u tlSa tbre l
db
ASCIICountEnd
db
db
byte
Odh.
Oah.
' T i m e dc o u n t :
' , 5 dup ( ? )
l a b e lb y t e
' m i c r o s e c o n d s ' , Odh. Oah
'f'
: S t r i n gp r i n t e d
t or e p o r tt i m e ro v e r f l o w .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
db
Odh. Oah
' * The t i m eor v e r f l o w e d ,
so t h ei n t e r v atli m e d
was
db
db
Odh. Oah
db
' * t oloo nfgot hr per e c i s i otni m etmor e a s u r e .
db
Odh, Oah
db
'* P l e a s pe e r f o r mt h tei m i n tge sat g a i wn i t thh e
db
Odh. Oah
db t i m'e*r .l o n g - p e r i o d
Odh.
db
Oah
db
......................................................
db
Odh. Oah
db
't'
t i m i n sg t. a r t
*'
*'
*'
*.
. ....................................................................
*
. ....................................................................
t o: *c aRl loeudt i n e
ZTimerOn
proc
near
Save t h ec o n t e x to ft h ep r o g r a mb e i n gt i m e d .
push
ax
pushf
POP
ax
mov
cs:[OriginalFlags].ah
ah.0fdh
and
g e t f l a g s so we cankeep
i n t e r r u p t s o f f when l e a v i n g
t h i sr o u t i n e
: remember t h e s t a t e o f t h e
Interruptflag
s e tp u s h e di n t e r r u p tf l a g
to 0
push
ax
: T u r no ni n t e r r u p t s ,
: pending.
s o t h et i m e ri n t e r r u p tc a no c c u r
if it's
sti
Assume Nothing
37
S e tt i m e r 0 o f t h e 8 2 5 3 t o
mode2
( d i v i d e - b y - N ) . t o cause
l i n e a rc o u n t i n gr a t h e rt h a nc o u n t - b y - t w oc o u n t i n g .A l s o
l e a v e st h e8 2 5 3w a i t i n gf o rt h ei n i t i a lt i m e r
0 c o u n tt o
b el o a d e d .
mov
out
al.00110100b
MODEL8253 .a1
S e tt h et i m e rc o u n tt o
t i m e ri n t e r r u p tr i g h t
N o t e :t h i si n t r o d u c e s
c l o c kc o u n te a c ht i m e
;mode
0. so we know we w o n ' tg e ta n o t h e r
away.
a ni n a c c u r a c yo fu pt o
54 ms i n t h e s y s t e m
i t i s executed.
DELAY
sub
a1
,a1
out
TIMER-0-8253.al
DELAY
out
TIMER-0-8253.al
:msb
;lsb
W a i tb e f o r ec l e a r i n gi n t e r r u p t st oa l l o wt h ei n t e r r u p tg e n e r a t e d
when s w i t c h i n g f r o m mode 3 t o mode2
t o b er e c o g n i z e d .
The d e l a y
mustbe
a t l e a s t 2 1 0n sl o n gt oa l l o wt i m ef o rt h a ti n t e r r u p tt o
o c c u r .H e r e ,1 0j u m p sa r eu s e df o rt h ed e l a yt oe n s u r et h a tt h e
a v e r y f a s t PC.
d e l a yt i m e will bemorethanlongenoughevenon
r e p t1 0
jmp
S+2
endm
D i s a b l ei n t e r r u p t st og e t
a na c c u r a t ec o u n t .
cli
S e tt h et i m e rc o u n tt o
mov
out
DELAY
as lu. ba l
out
DELAY
out
0 againtostartthetiminginterval.
al.00110100b
MODE-8253.al
; s e t up t o l o a d i n i t i a l
; t i m e rc o u n t
TIMER-0-8253,al
; l o a dc o u n tl s b
TIMER-0-8253.al
; lcooaudn t
msb
; R e s t o r et h ec o n t e x ta n dr e t u r n .
MPOPF
POP
ret
; k e e p si n t e r r u p t so f f
ax
ZTimerOn
endp
.....................................................................
;* cRotgosium
aeutcnontaittipndonl. lge d
*
.....................................................................
ZTimerO
p rnfofeca r
38
Chapter 3
: Save t h ec o n t e x t
o f t h ep r o g r a mb e i n gt i m e d .
push
ax
push
cx
pushf
: L a t c ht h ec o u n t .
mov
out
; lt ai m
t cehr
al,00000000b
MODE-8253,al
: See i f t h et i m e rh a so v e r f l o w e db yc h e c k i n gt h e8 2 5 9f o r
a pending
: t i m e ri n t e r r u p t .
mov
al.00001010b
out
OCW3.al
DELAY
in
a1,IRR
: I n t e r r u p tR e q u e s tr e g i s t e r
and
al,1
mov
cs:[0verflowFlag].al
: A l l o wi n t e r r u p t st o
IRnet;eqrureeuaspdtt
; register
: s e t AL t o 1 i f IRQO ( t h e
: t i m e ri n t e r r u p t )i sp e n d i n g
: s t o r et h et i m e ro v e r f l o w
: status
happenagain.
sti
: Read o u tt h ec o u n t
we l a t c h e d e a r l i e r .
in
al.TIMER_0-8253
DELAY
mov
ah.al
in
a1 .TIMER-0-8253
xchg
ah.al
neg
; l e a s ti g n i f i c a n
b ty t e
; m o ssti g n i f i c a nbty t e
c o:ufcnrot dnmovw
e rnt
; r e m a i n i n gt oe l a p s e d
: count
mov
cs:[TimedCountl.ax
: Time a z e r o - l e n g t hc o d ef r a g m e n t ,t og e t
a r e f e r e n c e f o r how
; much o v e r h e a dt h i sr o u t i n eh a s .T i m e
it 1 6t i m e sa n da v e r a g e
it,
: f o ra c c u r a c y ,r o u n d i n gt h er e s u l t .
ax
mov
mov
cli
cs:[ReferenceCountl,O
cx.16
: ailnl ottew
or or uf fp t s
: D r e c i s er e f e r e n c ec o u n t
RefLoop:
cR
a lel f e r e n c e Z T i m e r D n
cRa lel f e r e n c e Z T i m e r O f f
1 oop Ref Loop
sti
add
cs:[ReferenceCount].8
mov
c l .4
s hc rs : [ R e f e r e n c e C o u n t ] . c l
: total
( 0.5
16)
: ( t o t a l ) / 16 + 0.5
: R e s t o r eo r i g i n a li n t e r r u p ts t a t e .
POP
ax
: r e t r i e v e f l a g s when c a l l e d
Assume Nothing
39
mov
ch.~s:[OriginalFlags1
: g e tb a c kt h eo r i g i n a lu p p e r
b y t eo ft h e
FLAGS r e g i s t e r
o n l yc a r ea b o u to r i g i n a l
interrupt flag
keep all o t h e r f l a g s i n
t h e i rc u r r e n tc o n d i t i o n
make f l a g s w o r d w i t h o r i g i n a l
interruptflag
p r e p a r ef l a g st ob ep o p p e d
and
ch.not
Ofdh
or
ah.ch
push
ax
: R e s t o r et h ec o n t e x t
o ft h ep r o g r a mb e i n gt i m e da n dr e t u r nt o
MPOPF
POP
POP
ret
...
...
ah.0fdh
and
it.
: r e s t o r et h ef l a g sw i t ht h e
: o r i g i n a li n t e r r u p ts t a t e
cx
ax
ZTimerOffendp
: C a l l e db yZ T i m e r O f f
t os t a r tt i m e rf o ro v e r h e a dm e a s u r e m e n t s .
ReferenceZTimerOnproc
: Save t h e c o n t e x t o f
push
ax
pushf
near
t h ep r o g r a mb e i n gt i m e d
: i n t e r r u patarsel r e a od fyf
: S e tt i m e r 0 o f t h e 8 2 5 3 t o
mode 2 ( d i v i d e - b y - N ) , t o c a u s e
: l i n e a rc o u n t i n gr a t h e rt h a nc o u n t - b y - t w oc o u n t i n g .
: s e t up lt oo a d
mov
al.00110100b
out
MODE-8253.al
DELAY
: S e tt h et i m e rc o u n tt o
a1,al
sub
out
TIMER-0-8253,al
DELAY
o Tu It M E R - 0 8 2 5 3 , a l
: i n i t ti ai ml ceoru n t
0.
: l o acdo u nl st b
: l o acdo u n t
msb
: R e s t o r et h ec o n t e x to ft h ep r o g r a mb e i n gt i m e da n dr e t u r nt o
MPOPF
POP
ax
ret
ReferenceZTimerOnendp
: C a l l e db yZ T i m e r O f ft os t o pt i m e ra n da d dr e s u l tt oR e f e r e n c e c o u n t
: f o ro v e r h e a dm e a s u r e m e n t s .
40
Chapter 3
it.
near proc
ReferenceZTimerOff
: Save t h ec o n t e x t
o ft h ep r o g r a mb e i n gt i m e d .
push
ax
push
cx
pushf
: L a t c ht h ec o u n ta n dr e a d
it.
mov
out
DELAY
in
DELAY
mov
in
xchg
neg
a1 .00000000b
MODEU3253,al
: l a t c ht i m e r
a1 .TIMER-0_8253
: lsb
add
cs:[ReferenceCountl,ax
ah.al
al.TIMER-OC8253
ah.al
ax
: msb
: c o n v e r tf r o mc o u n t d o w n
: r e m a i n i n gt o amount
: c o u n t e d down
: R e s t o r et h ec o n t e x to ft h ep r o g r a mb e i n gt i m e da n dr e t u r n
t o it.
MPOPF
POP
cx
POP
ax
ret
ReferenceZTimerOffendp
. ....................................................................
*
. ....................................................................
rt;rei em
s*pui onR
l trgcosot a.ultliende
Z T i m e nr Repear pro oc r t
pushf
push
ax
push
bx
push
cx
push
dx
push s i
push
ds
cs
ds
push
POP
assume
; Check f o r t i m e r
ds
: DOS f u n c t i o n s r e q u i r e t h a t
DS p o i n t
: tt oe xt ot
dbies p l a y e d
on t shcer e e n
:Code
0 overflow.
[ O v e r f l owFl a g l .O
PrintGoodCount
mov
d x . o f f sO
e tv e r f l o w S t r
mov
ah.9
int
21h
jm
s hpoErnt d Z T i m e r R e p o r t
cmp
jz
: C o n v e r tn e tc o u n tt od e c i m a l
A S C I I i nm i c r o s e c o n d s .
Assume Nothing
41
Previous
PrintGoodCount:
mov
ax.CTimedCount1
ax.[ReferenceCount]
sub
mov
s i , o f f s eAt S C I I C o u n t E n d
Home
Next
- 1
; C o n v e r tc o u n tt om i c r o s e c o n d sb ym u l t i p l y i n gb y. 8 3 8 1 .
mov
mu1
mov
db ixv
dx.8381
dx
bx,
10000
:*
.8381
: C o n v e r tt i m ei nm i c r o s e c o n d st o
mov
mov
CTSLoop:
sub
div
add
mov
dec
1 oop
-*
8381 / 10000
5 decimal A S C I I d i g i t s
bx. 10
cx.5
dx.dx
bx
d1:O
[sil.dl
si
CTSLoop
; P r i n tt h er e s u l t s .
mov
mov
int
ah.9
d x , o f f sO
e tu t p u t S t r
21h
EndZTimerReport:
POP
ds
pop
si
POP
dx
POP
cx
POP
bx
POP
ax
MPOPF
ret
ZTimerReport
endp
Code
ends
end
42
Chapter 3
Assume Nothing
43
Timer
From bit 0
of port 61 h
Output.
*
To speaker
circuitry
Gate
Timer 1
output 1
+5 volts
(makes the
timers run
non-stop
in all the
modes we'll
discuss)
b DRAM refresh
Gate
Timer
output
Gate
To IRQO (hardware
interrupt 0, the
timer interrupt)
44
Chapter
operation, timer 0 generates an output pulse that is lowfor about 27.5 msand high for
about 27.5 ms; this pulse is sent to the 8259 interrupt controller, and its rising edge
generates a timer interrupt onceevery 54.925 ms.
Square wave mode is not very usefulfor precision timing because
it counts down by two
twice per timer interrupt, thereby rendering exact timings impossible. Fortunately,the
8253 offers another timer mode, mode 2 (divide-by-N mode), which is both a good
substitute for square wave mode and aperfect mode for precision timing.
Divide-by-Nmode counts down by one from the initial count. When the countreaches
zero, the timer turns over and starts counting down again without stopping, and a
pulse is generated for a single clock period. While the pulse is not held fornearly as
long as in square wave mode, it doesnt
matter, since the 8259 interrupt controller is
configured in thePC to be edge-triggered and hencecares only about theexistence
of a pulse from timer 0, not the durationof the pulse. As a result, timer 0 continues
to generate timer interrupts in divide-by-N mode, and thesystem clockcontinues to
maintain good time.
Why not use timer 2 instead of timer 0 for precision timing? After all, timer 2 has a
programmable gate input and isnt used for anything but sound generation. The
problem with timer 2 is that its output cant generate an interrupt; in fact, timer 2
cant do anything but drive the speaker. We need the interrupt generated by the
output of timer 0 to tell us when the count has overflowed, and we will see shortly
that thetimer interrupt also makesit possible to time much longer periods than the
Zen timer shown in Listing 3.1 supports.
In fact, the Zen timer shown in Listing 3.1 can only time intervals of up to about 54
ms in length, since that is the periodof time that can be measured by timer 0 before
its count turns over and repeats. fifty-four ms maynot seem like a very long time, but
even a CPU as slow as the 8088 can perform more than 1,000 divides in 54 ms, and
division is the single instruction that the 8088 performs most slowly. If a measured
period turns out to be longer than 54 ms (that is, if timer 0 has counted down and
turned over), theZen timer will display a message to that effect. A long-period Zen
timer for use in such cases willbe presented later inthis chapter.
The Zen timer determines whether timer 0 has turned over by checking to seewhether
an IRQO interrupt is pending. (Remember, interrupts are off while the Zen timer
runs, so the timer interrupt cannot be recognized until the Zen timer stops and
enables interrupts.) If an IRQO interrupt is pending, then timer 0 has turned over
and generated atimer interrupt. Recall that ZTimerOn initially setstimer 0 to 0, in
order to allowfor thelongest possible period-about 54 ms-before timer 0 reaches
0 and generates the timer interrupt.
Now were ready tolook at theways in which the Zen timer can introduce inaccuracy
into thesystem clock. Since timer 0 is initially set to 0 by the Zen timer,and since the
system clock ticks only when timer 0 counts off 54.925 ms and reaches 0 again, an
average inaccuracy of one-half of 54.925 ms,or about27.5 ms, is
incurred eachtime
Assume Nothing
45
the Zen timer is started. In addition, atimer interrupt is generated when timer 0 is
switched from mode 3 to mode 2, advancing the system clock by up to 54.925 ms,
although this only happens the first time the Zen timer is run after awarm or cold
boot. Finally, up to 54.925 ms canagain be lost when ZTimerOff is called, since that
routine again sets the timer count to zero. Net result: The system clock willrun upto
110 ms (about a ninthof a second) slow each time the Zen timer is used.
Potentially far greater inaccuracy can be incurred by timing code that takes longer
than about 110 ms to execute. Recall that all interrupts, including the timer interrupt, aredisabled while timing code with the Zen timer.The 8259 interrupt controller
is capable of remembering at most one pending timer interrupt,so all timer interrupts after the first one during
any given Zen timing interval are ignored.
Consequently, if a timing interval exceeds 54.9 ms, the system clock effectivelystops
54.9 ms after the timing interval startsand doesnt restart until the timing interval
ends, losing time all the while.
The effects on the system time of the Zen timer arent a matter for great concern,
as
they are temporary, lasting only until the nextwarm or cold boot. Systems that have
battery-backed clocks, (AT-style machines; that is, virtually allmachines in common
use) automatically reset the correcttime whenever the computer is booted, and systemswithout battery-backed clocks
prompt for the correct datetime
andwhenbooted.
Also, repeated use of the Zen timer usually makes the system clock slow by at most a
total of a few seconds, unless code thattakes much longer than54 ms to run is timed
(in which case the Zen timer will notify you that the codeis too long to time).
Nonetheless, its a good idea to reboot
your computer at the
end of each session with
the Zen timer in order to make sure that thesystem clock iscorrect.
46
Chapter 3
47
stored in a buffer within the driver, to be dumped at a later time. (Credit for this
final approach goes to Michael Geary,and thanks go to David Millerfor passing the
idea on to me.)
You may well want to devise still other approaches better suited to your needs than
those Ive presented. Go to it! Ive
just thrown out afew possibilities toget you started.
48
Chapter
IBM machines, or that either absolute or relative code performance will be similar
even on different IBM models; in fact, quite the oppositeis true. For example, every
PS/2 computer, even the relatively slow Model 30, executes code much faster than
does a PC or XT. As another example, I set out to do thetimings for my earlier book
Zen of Assembly Language on an XT-compatible computer, only to find that thecomputer wasn't quite IBM-compatible regarding code performance. The differences
were minor, mind you, but my experience illustrates the risk of assumingthat a specific make ofcomputer will perform ina certain way without actually checking.
Not that this variation between models makes the Zen timer one whit less usefulquite the contrary. The Zen timeris an excellent tool for evaluating code performance
over the entire spectrumof PC-compatible computers.
Program t o measureperformance
o f c o d et h a tt a k e sl e s st h a n
54 ms t oe x e c u t e .
(PZTEST.ASM)
L i n kw i t h PZTIMER.ASM ( L i s t i n g3 . 1 ) .
PZTEST.BAT ( L i s t i n g3 . 4 )
canbeused
t o assembleand
linkbothfiles.
Code t o be
measuredmustbe
i n t h e f i l e TESTCODE; L i s t i n g3 . 3
shows
a sample TESTCODE f i l e .
By M i c h a e A
l brash
m y sst ae cgkm
sp taearncatk
dup(?)
512
db
ends
mystack
'STACK'
Code
s e g m pe anp rtuab l i c
'CODE'
assume
cs:Code.
ds:Code
e x t r nZ T i m e r 0 n : n e a r .Z T i m e r 0 f f : n e a r .2 T i m e r R e p o r t : n e a r
S t a r t p r on ce a r
push
cs
; s e t DS t op o i n t ot h e
code
segment,
pop
ds
; s o d a t aa sw e l la sc o d ec a ne a s i l y
; b ei n c l u d e di n
TESTCODE
include
; D i s p l a yt h er e s u l t s .
c aZ lTl i m e r R e p o r t
; T e r m i n a t et h ep r o g r a m .
Assume Nothing
49
mov
ah.4ch
int
21h
Start endp
Code ends
Se tnadr t
Listing 3.3 shows some sample code to be timed. This listing measures the time required to execute 1,000 loads of AL from the memory variable MemVar. Note that
Listing 3.3 calls ZTimerOn to start timing, performs 1,000 MOV instructions in a
row, and calls ZTimerOff to end timing. When Listing 3.2 is named TESTCODE and
included by Listing 3.3, Listing 3.2 calls ZTimerReport to display the executiontime
after the codein Listing 3.3 has been run.
M e a s u r e st h ep e r f o r m a n c eo f1 , 0 0 0l o a d so f
AL f r o m
memory.(Usebyrenaming
t o TESTCODE. w h i c hi s
i n c l u d e db y
PZTEST.ASM ( L i s t i n g 3 . 2 ) . PZTIME.BAT
( L i s t i n g3 . 4 )d o e st h i s ,a l o n gw i t ha l la s s e m b l y
a n dl i n k i n g . )
j mSpk i:pj u mapr o u ndde f i n edda t a
MemVar
db
Skip:
; S t a r tt i m i n g .
call
ZTimerOn
r e p1 t0 0 0
a1 , [MemVarl
endm
mov
: S t o pt i m i n g .
c aZ lTl i m e r O f f
Its worth noting that Listing 3.3 begins by jumping around the memory variable
MemVar. This approach lets us avoid
reproducing Listing 3.2in its entirety for each code
fragment we want to measure;by defining any needed data right in the code segment
and jumping around that
data, each listing becomes selfcontained andcan be plugged
directly into Listing 3.2 as TESTCODE. Listing3.2 sets DS equal toCS before doing
timed.
anything else preciselyso that datacan be embeddedin code fragments being
Note that only after the initial jump is performed in Listing 3.3 is the Zen timer
started, since we dont want to include the execution time of start-up code in the
timing interval. Thats why the calls to ZTimerOn and ZTimerOff are in TESTCODE,
not in PZTESTMM; this way, wehave full control over whichportion of TESTCODE
is timed, and we can keep set-up code and thelike out of the timing interval.
50
Chapter 3
Listing 3.3 is used by naming it TESTCODE, assembling both Listing 3.2 (which
includes TESTCODE) and Listing 3.1 with TASM or MASM, and linking the two
resulting OBJ files together by way of the Borland or Microsoft linker. Listing 3.4
shows a batch file, PZTIME.BAT, which does all that; when run, this batch file generates and runs the executable file PZTEST.EXE. PZTIME.BAT (Listing 3.4) assumes
that thefile PZTIMER.ASM contains Listing 3.1, and thefile PZTEST.ASM contains
Listing 3.2. The command-line parameter to PZTIME.BAT is the name of the file to
be copied to TESTCODE and included intoPZTEST.ASM. (Note that TurboAssembler can be substituted for MASM by replacing masmwith tasmand linkwith
tlink in Listing 3.4. The same is true of Listing 3.7.)
LISTING 3.4
PZTIME.BAT
echo o f f
rem
rem *** L i s t i n g 3 . 4 ***
rem
rem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rem * B a t c h f i l e PZTIME.BAT, w h i c hb u i l d sa n dr u n st h ep r e c i s i o n
rem * Zen t i m e rp r o g r a m PZTEST.EXE t ot i m et h ec o d e
named a st h e
rem * c o m m a n d - l ipnaer a m e t Le irs. t i3m
n.g1ubset
named
rem * PZTIMER.ASM. and L i s t i n g3 . 2m u s bt e
named PZTEST.ASM. To
rem * t i mtehceo diLenS T 3 - 3y.o u tdy pteh e
DOS command:
rem *
lst3-3
rem * p z t i m e
rem *
rem * N o t e t h a t MASM andLINKmustbe
inthecurrentdirectoryor
rem * on t h ec u r r e n tp a t hi no r d e rf o rt h i sb a t c hf i l et ow o r k .
rem *
rem * T h i sb a t c hf i l ec a nb es p e e d e du pb ya s s e m b l i n g
PZTIMER.ASM
l i n e s : trheem
rem
o tvhi*ne gno n c e ,
rem *
rem * masm p z t i m e r ;
rem * i f e r r o r l e v e lr r o1r eg no dt o
rem *
this
rem * f r o m
rem *
A brem
r a s h* By M i c h a e l
rem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
rem
remMake s u r e a f i l e t o t e s t
was s p e c i f i e d .
rem
i f n o tx % l - xg o t oc k e x i s t
echo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
test. to
file
echo
echo
* asPpl ee caisf ey
*
...............................................................
g o t oe n d
rem
rem Make s u r e t h e f i l e e x i s t s .
rem
:ckexist
i f e x i s t %1gotodocopy
echo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
echo *s pT eh cefiifliee,d
%1, d o e xs instt,
echo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
g o t oe n d
Assume Nothing
51
Assuming that Listing 3.3 is named LST3-3.ASM and Listing 3.4 is named
PZTIME.BAT, the code in Listing 3.3 wouldbe timed with the command:
p z t i m e LST3-3.ASM
In fact, thats exactly how I timed each of the listings in this book. Code fragments
you write yourself
can be timed in just thesame way. If you wishto time code directly
in place in your programs, rather thanin the test-bed program of Listing 3.2, simply
52
Chapter 3
Assume Nothing
53
The long-period Zen timer has some of the same effects on thesystem time as does
the precision Zen timer, so its a good idea to reboot the system after a session with
the long-period Zen timer. The long-period Zen timer does not, however, have the
same potential for introducingmajor inaccuracy into thesystem clocktime during a
single timing run since it leaves interrupts enabled and thereforeallows the system
clock to update normally.
54
Chapter 3
method will work on all PGcompatible computers, but may occasionally produce results that are incorrect by 54 ms. The timer-stop approach avoids synchronization
problems, but doesn't work on all computers.
LISTING 3.5
UTIMER.ASM
T h el o n g - p e r i o d
Zen t i m e r . (LZTIMER.ASM)
Uses t h e8 2 5 3t i m e ra n dt h e
BIOS t i m e - o f - d a y c o u n t t o t i m e t h e
p e r f o r m a n c eo fc o d et h a tt a k e sl e s st h a n
a nh o u rt oe x e c u t e .
B e c a u s ei n t e r r u p t sa r el e f t
on ( i n o r d e r t o a l l o w t h e t i m e r
i n t e r r u p t t o b er e c o g n i z e d ) ,t h i si sl e s sa c c u r a t et h a nt h e
p r e c i s i o n Zen t i m e r , s o it i sb e s tu s e do n l yt ot i m ec o d et h a tt a k e s
m o r et h a na b o u t5 4m i l l i s e c o n d st oe x e c u t e( c o d et h a tt h ep r e c i s i o n
Zen t i m e rr e p o r t so v e r f l o w
on). R e s o l u t i o n i s l i m i t e d b y t h e
o c c u r r e n c eo ft i m e ri n t e r r u p t s .
By MichaelAbrash
E x t e r n a l l yc a l l a b l er o u t i n e s :
ZTimerOn:Savesthe
B I O S t i m eo fd a yc o u n ta n ds t a r t st h e
l o n g - p e r i o d Zen t i m e r .
Z T i m e r O f f :S t o p st h el o n g - p e r i o d
Zen t i m e ra n ds a v e st h et i m e r
c o u n ta n dt h e
BIOS t i m e - o f - d a yc o u n t .
Z T i m e r R e p o r t :P r i n t st h et i m et h a tp a s s e db e t w e e ns t a r t i n ga n d
s t o p p i n gt h et i m e r .
Note: I f e i t h e rm o r et h a na nh o u rp a s s e so rm i d n i g h tf a l l sb e t w e e n
c a l l s t o ZTimerOnandZTimerOff,anerror
i sr e p o r t e d .F o r
t i m i n gc o d et h a tt a k e sm o r et h a n
a f e wm i n u t e st oe x e c u t e ,
e i t h e r t h e OOS TIME command i n a b a t c h f i l e b e f o r e
and a f t e r
DOS
e x e c u t i o no ft h ec o d et ot i m eo rt h eu s eo ft h e
t i m e - o f - d a yf u n c t i o ni np l a c eo ft h el o n g - p e r i o d
Zen t i m e r i s
morethanadequate.
Note:The
P S / 2 v e r s i o ni sa s s e m b l e db ys e t t i n gt h es y m b o l
PS2 t o 1.
PS2 m u s tb es e tt o
1 on P S / 2 computersbecausethe
PS/Z's
t i m e r sa r en o tc o m p a t i b l ew i t h
a nu n d o c u m e n t e dt i m e r - s t o p p i n g
f e a t u r eo ft h e8 2 5 3 :t h ea l t e r n a t i v et i m i n ga p p r o a c ht h a t
mustbeusedonPS/2computersleaves
a s h o r tw i n d o w
d u r i n gw h i c ht h et i m e r
0 c o u n ta n dt h e
B I O S t i m e rc o u n t may
n o tb es y n c h r o n i z e d .
You s h o u l da l s os e tt h e
PS2 symbol t o
1 i f y o u ' r eg e t t i n ge r r a t i co ro b v i o u s l yi n c o r r e c tr e s u l t s .
Note: When PS2 i s 0. t h ec o d er e l i e so n
anundocumented8253
f e a t u r et og e t
more r e l i a b l er e a d i n g s .
It i s p o s s i b l e t h a t
t h e8 2 5 3( o rw h a t e v e rc h i pi se m u l a t i n gt h e8 2 5 3 )
may b ep u t
i n t o an u n d e f i n e d o r i n c o r r e c t s t a t e
when t h i s f e a t u r e i s
used.
....................................................................
Assume Nothing
55
: N o t e :E a c hb l o c ko fc o d eb e i n gt i m e ds h o u l di d e a l l yb er u ns e v e r a l
:
t i m e sw, i t h
a t l e a sttw os i m i l arre a d i n g rse q u i r e dt o
e s t a b l i s h a t r u e measurement, i no r d e tr oe l i m i n a t e
v a r i a b i l ict ay u s ebi ndyt e r r u p t s .
:
:
: N o t e :I n t e r r u p t sm u s tn o tb ed i s a b l e df o rm o r et h a n
:
:
:
any
54 ms a t a
: Note: Any e x t r ac o d er u n n i n go f ft h et i m e ri n t e r r u p t( s u c ha s
:
some m e m o r y - r e s i d e nut t i l i t i e s )
will i n c r e a stehtei m e
:
measured
by
the
Zen t i m e r .
: N o t e :T h e s er o u t i n e sc a ni n t r o d u c ei n a c c u r a c i e so f
UD t o a few
t e n t h s o f a second i n t o t h e s y s t e m c l o c k c o u n t f o r e a c h
c o d es e c t i o nt i m e d .C o n s e q u e n t l y ,i t ' s
a g o o di d e at o
r e b o o t a t t h ec o n c l u s i o no ft i m i n gs e s s i o n s .( T h e
b a t t e r y - b a c k e dc l o c k ,
i f any. i s n o t a f f e c t e d
b yt h e
Zen
timer.)
: Al r e g i s t e r s and a l lf l a g sa r ep r e s e r v e db ya l lr o u t i n e s .
Code
'CODE'
segmentwordpublic
assume
cs:Code.
ds:nothing
public
ZTimerOn.
ZTimerOff.
ZTimerReport
ona
f u l l y8 2 5 3 - c o m p a t i b l e
S e t P S 2 t o 0 t o a s s e m b l ef o ru s e
system: when PS2 i s 0. t h er e a d i n g sa r em o r er e l i a b l e
i f the
c o m p u t e rs u p p o r t st h eu n d o c u m e n t e dt i m e r - s t o p p i n gf e a t u r e ,
b u t may b eb a d l yo f f
i f t h a tf e a t u r ei sn o ts u p p o r t e d .I n
f a c t , t i m e r - s t o p p i n g may i n t e r f e r e w i t h y o u r c o m p u t e r ' s
o v e r a l lo p e r a t i o nb yp u t t i n gt h e8 2 5 3i n t o
an u n d e f i n e do r
i n c o r r e c ts t a t e .
Use w i t hc a u t i o n ! ! !
Set PS2 t o 1 t o assemble f o r u s e o nn o n - 8 2 5 3 - c o m p a t i b l e
P S / 2 computers: when PS2 i s 1. r e a d i n g s
s y s t e m s ,i n c l u d i n g
will work
may o c c a s i o n a l l y be o f f by54
ms. b u tt h ec o d e
p r o p e r l y on all systems.
PS2
: B a s ea d d r e s so ft h e8 2 5 3t i m e rc h i p .
BASE-8253
40h
equ
: T h ea d d r e s so ft h et i m e r
TIMER-0-8253
: T h ea d d r e s so ft h e
MODEL8253
56
Chapter 3
0 c o u n tr e g i s t e r si nt h e8 2 5 3 .
equ
BASE-8253
+ 0
mode r e g i s t e r i n t h e 8 2 5 3 .
equ
BASEL8253
; Theaddress
ofthe
B I O S t i m e rc o u n tv a r i a b l ei nt h e
BIOS
: datasegment.
TIMER-COUNT
46ch equ
: Macro t oe m u l a t e
a POPF i n s t r u c t i o n i n o r d e r t o f i x t h e b u g i n
some
a POPF even when
: i n t e r r u p t sr e m a i nd i s a b l e d .
; 80286 c h i p sw h i c ha l l o w si n t e r r u p t st oo c c u rd u r i n g
MPOPF macro
l o c a lp l .p 2
j m ps h o r tp 2
i rp el :t
cs
push
p2:
cpal l l
endm
;jump t o pushedaddress
: c o n srt erf autaurcdrtdnrteos s
:ni ntehsxettr u c t i o n
& p o pf l a g s
; Macro t o d e l a y b r i e f l y t o e n s u r e t h a t e n o u g h t i m e h a s e l a p s e d
: b e t w e e ns u c c e s s i v e
; canrespond
1 / 0 accesses s o t h a tt h ed e v i c eb e i n ga c c e s s e d
t ob o t ha c c e s s e se v e no n
a v e r y f a s t PC.
DELAY macro
J+2
jmp
jmp
J+2
jmp
6+2
endm
StartBIOSCountLowdw
StartBIOSCountHigh
dw
EndBIOSCountLow
dw
EndBIOSCountHigh dw
EndTimedCount
dw
Referencecount
dw
:BIOS c o u n tl o ww o r da tt h e
: s t a r to ft h et i m i n gp e r i o d
:BIOS c o u nhti gwh o ratdht e
; s t a r to ft h et i m i n gp e r i o d
;BIOS c o u nl ot w o ratdht e
: end o f t h et i m i n gp e r i o d
:BIOS c o u n th i g hw o r da tt h e
; end o f t h e t i m i n g p e r i o d
: t i m e r 0 c o u natttheen d
of
: t h et i m i n gp e r i o d
;number oc fo u n tr se q u i r et do
: e x e c u t et i m e ro v e r h e a dc o d e
: S t r i n gp r i n t e d
t or e p o r tr e s u l t s .
O u t p u tlSa tbre l
byte
Odh.
Oah.
' T i m e dc o u n t :
'
db
10 dup ( ? )
' m i c r o s e c o n d s ' , Odh. Oah
'J'
db
TimedCountStr
db
db
: T e m p o r a r ys t o r a g ef o rt i m e dc o u n t
: o f t e n when c o n v e r t i n gf r o md o u b l e w o r db i n a r yt o
CurrentCountLow
C u r r e n t C o u n t H i g h dw
dw
?
: Powers o f t e n t a b l e u s e d t o p e r f o r m d i v i s i o n
; d o u b l e w o r dc o n v e r s i o nf r o mb i n a r yt o
by 10 when d o i n g
ASCII.
Assume Nothing
57
m i n g .s t a r t
100 dd
1000 dd
10000dd
100000
dd
1000000
dd
10000000
dd
100000000
dd
1000000000
dd
PowersOfTenEnd
l a bweol r d
: S t r i n gp r i n t e dt or e p o r tt h a tt h eh i g hw o r do ft h e
: c h a n g e dw h i l et i m i n g( a nh o u re l a p s e do rm i d n i g h t
: and s o t h e c o u n t i s i n v a l i d a n d t h e t e s t n e e d s t o b e r e r u n .
BIOS count
was c r o s s e d ) ,
*'
*'
*'
*'
*'
*'
*'
.....................................................................
to
c:*
a l lRe od u t i n e
.....................................................................
ZTimerOn
proc
near
Save t h ec o n t e x to ft h ep r o g r a mb e i n gt i m e d .
push
ax
pushf
S e tt i m e r 0 o f t h e 8253 t o mode 2 ( d i v i d e - b y - N ) . t o c a u s e
l i n e a rc o u n t i n gr a t h e rt h a nc o u n t - b y - t w oc o u n t i n g .
Also stops
t i m e r 0 u n t i lt h et i m e rc o u n ti sl o a d e d ,e x c e p t
onPS/2
computers.
mov
out
al.00110100b
MODE-8253.al
S e tt h et i m e rc o u n tt o
t i m e ri n t e r r u p tr i g h t
N o t e :t h i si n t r o d u c e s
c l o c kc o u n te a c ht i m e
58
Chapter 3
:mode2
0, so we know we w o n ' tg e ta n o t h e r
away.
54 ms i n t h e s y s t e m
a ni n a c c u r a c yo fu pt o
i t i s executed.
DELAY
sub
a1 .a1
out
TIMERPOP8253.al
DELAY
out
TIMER-0-8253,al
:lsb
:msb
: I n c a s ei n t e r r u p t sa r ed i s a b l e d ,e n a b l ei n t e r r u p t sb r i e f l yt oa l l o w
: t h ei n t e r r u p tg e n e r a t e d
when s w i t c h i n gf r o m mode 3 t o mode 2 t o be
: r e c o g n i z e d .I n t e r r u p t sm u s tb ee n a b l e df o ra tl e a s t2 1 0n st oa l l o w
: t i m ef o rt h a ti n t e r r u p tt oo c c u r .H e r e ,
10 j u m p sa r eu s e df o rt h e
: d e l a yt oe n s u r et h a tt h ed e l a yt i m e
will b em o r et h a nl o n ge n o u g h
: evenon
a v e r yf a s t
PC.
pushf
sti
r e p t 10
jmp
1+2
endm
MPOPF
: S t o r et h et i m i n gs t a r t
BIOS count.
: ( S i n c et h et i m e rc o u n t
was j u s t s e t t o
0 . t h e B I O S c o u n t will
: s t a yt h e same f o rt h en e x t5 4
ms. s o we d o n ' t n e e d t o d i s a b l e
: i n t e r r u p t si no r d e rt oa v o i dg e t t i n g
push
ds
sub
mov
mov
mov
mov
mov
POP
ax.ax
ds.ax
ax,ds:[TIMERPCOUNT+2]
cs:[StartBIOSCountHighl.ax
ax.ds:[TIMERPCOUNT]
cs:[StartBIOSCountLow],ax
ds
: S e tt h et i m e rc o u n tt o
mov
out
DELAY
sub
out
DELAY
out
a h a l f - c h a n g e dc o u n t . )
0 a g a i nt os t a r tt h et i m i n gi n t e r v a l .
: t i m e rc o u n t
a1 .a1
TIMER-0-8253,al
: l o a dc o u n tl s b
TIMER-0-8253.al
c: loouandt
msb
: R e s t o r et h ec o n t e x to ft h ep r o g r a mb e i n gt i m e da n dr e t u r nt o
it.
MPOPF
POP
ret
ax
ZTimerOn
endp
.....................................................................
:*c oRgt usioeamtnuctnotitadpn.iolngl e d
*
.....................................................................
ZTimerO
p rfnof eca r
: Save t h ec o n t e x to ft h ep r o g r a mb e i n gt i m e d .
Assume Nothing
59
pushf
push
ax
push
cx
: I n c a s ei n t e r r u p t sa r ed i s a b l e d ,e n a b l ei n t e r r u p t sb r i e f l yt oa l l o w
: a n yp e n d i n gt i m e ri n t e r r u p tt ob eh a n d l e d .I n t e r r u p t sm u s tb e
: e n a b l e df o r a t l e a s t 210ns
toallowtimeforthatinterruptto
: o c c u r .H e r e ,1 0j u m p sa r eu s e df o rt h ed e l a yt oe n s u r et h a tt h e
: d e l a yt i m e
will bemorethanlongenougheven
on a v e r y f a s t
PC.
sti
r e p1 t0
jmp
9+2
endm
: L a t c ht h et i m e rc o u n t .
i f PS2
mov
out
al,00000000b
MODE-8253.al
: tl ai mt cehr
0 count
:
:
:
:
:
T h i s i s where a o n e - i n s t r u c t i o n - l o n gw i n d o we x i s t s
on t h e PS/2.
The t i m e rc o u n ta n dt h e
B I O S c o u n tc a nl o s es y n c h r o n i z a t i o n :
s i n c et h et i m e rk e e p sc o u n t i n ga f t e ri t ' sl a t c h e d ,
i t c a nt u r n
overrightafterit'slatched
andcausethe
B I O S c o u n tt ot u r n
o v e rb e f o r ei n t e r r u p t sa r ed i s a b l e d ,l e a v i n g
us w i t h t h e t i m e r
: c o u n tf r o mb e f o r et h et i m e rt u r n e do v e rc o u p l e dw i t ht h e
BIOS
: c o u n tf r o ma f t e rt h et i m e rt u r n e do v e r .
The r e s u l t i s a c o u n t
: t h a t ' s 54 ms t o ol o n g .
else
: S e tt i m e r 0 t o mode2
: l o a d ,w h i c hs t o p st i m e r
( d i v i d e - b y - N ) ,w a i t i n gf o r
a 2 - b y t ec o u n t
0 u n t i lt h ec o u n ti sl o a d e d .( O n l yw o r k s
: on f u l l y8 2 5 3 - c o m p a t i b l ec h i p s . )
mov
out
DELAY
mov
out
al.00110100b
MODEL8253,al
:mode 2
a l . 0 0 0 0 0:t0li am
0t0ec brh
MODEL8253,al
0 count
endi f
cli
the
;stop
BIOS count
: Read t h e B I O S c o u n t .( S i n c ei n t e r r u p t sa r ed i s a b l e d ,t h e
: c o u n tw o n ' tc h a n g e . )
push
ds
ax.ax
sub
mov
mov
mov
mov
60
Chapter 3
ds,ax
ax,ds:[TIMER_COUNT+2]
cs:[EndBIOSCountHighl,ax
ax,ds:[TIMERLCOUNT1
BIOS
mov
POP
cs:[EndBIOSCountLowl.ax
ds
; Read t h et i m e rc o u n ta n ds a v e
rno m ; c o n v e r t
it.
in
a1 .TIMERpOp8253
DELAY
mov
ah.al
in
a1 ,TIMERp0-8253
:msb
xchg
ah.al
neg
ax
:lsb
: r e m a i n i n gt oe l a p s e d
: count
mov
cs:[EndTimedCountl.ax
: R e s t a r tt i m e r
: t o beloaded.
0. w h i c hi ss t i l lw a i t i n gf o r
an i n i t i a l c o u n t
i f e PS2
DELAY
mov
a1 .00110100b
: 2 - b y t ec o u n t
out
DELAY
sub
out
DELAY
mov
out
DELAY
MODEL8253,al
a1 .a1
TIMERpOp8253.al
:lsb
a1 ,ah
TIMERpOp8253.al
:msb
endi f
the
;let
sti
: Time a z e r o - l e n g t hc o d ef r a g m e n t ,t og e t
: much o v e r h e a dt h i sr o u t i n eh a s .T i m e
: f o ra c c u r a c y ,r o u n d i n gt h er e s u l t .
mov
mov
cli
a r e f e r e n c ef o r
i t 16timesandaverage
cs:[ReferenceCountl.O
cx.16
a l l o w t o : i n toef rf r u p t s
: p r e c i s er e f e r e n c ec o u n t
Ref Loop:
c a l l ReferenceZTimerOn
cR
a lel f e r e n c e Z T i m e r O f f
1 oop Ref Loop
sti
casd:d[ R e f e r e n c e C o u n:ttlo. 8t a l
mov
c l .4
s hcr s : [ R e f e r e n c e C o u n t l . c:l( t o t a l )
how
it,
+ (0.5 * 16)
/ 16
+ 0.5
; R e s t o r et h ec o n t e x to ft h ep r o g r a mb e i n gt i m e da n dr e t u r nt o
it.
POP
cx
POP
ax
MPOPF
ret
Assume Nothing
61
ZTimerOffendp
: C a l l e db yZ T i m e r O f ft os t a r tt h et i m e rf o ro v e r h e a dm e a s u r e m e n t s .
ReferenceZTimerOnproc
near
: Save t h ec o n t e x to ft h ep r o g r a mb e i n gt i m e d .
push
ax
pushf
: S e tt i m e r 0 o f t h e 8253 t o mode 2 ( d i v i d e - b y - N ) .t oc a u s e
: l i n e a rc o u n t i n gr a t h e rt h a nc o u n t - b y - t w oc o u n t i n g .
a1
mov
out
.00110100b
MODE-8253.al
: Set t h et i m e rc o u n tt o
;mode 2
DELAY
sub
a 1 ,a1
out
TIMER-0-8253.al
DELAY
out
TIMERPOP8253.al
;msb
: R e s t o r et h ec o n t e x t
:lsb
o f t h ep r o g r a mb e i n gt i m e da n dr e t u r nt o
MPOPF
POP
ax
ret
ReferenceZTimerOnendp
: C a l l e db yZ T i m e r O f f
t os t o pt h et i m e r
a n da d dt h er e s u l tt o
: Referencecountforoverheadmeasurements.Doesn'tneed
: a t t h e B I O S c o u n tb e c a u s et i m i n g
: i s n ' tg o i n gt ot a k ea n y w h e r en e a r5 4
t ol o o k
a z e r o - l e n g t hc o d ef r a g m e n t
ms.
ReferenceZTim
n peerarorOcf f
: Save t h ec o n t e x to ft h ep r o g r a mb e i n gt i m e d .
pushf
push
ax
p u schx
: M a t c ht h ei n t e r r u p t - w i n d o wd e l a yi nZ T i m e r O f f
62
sti
rept
jmp
endm
10
$+2
mov
out
al.00000000b
MODE-8253,al
Chapter 3
: l a t c ht i m e r
it.
: Read t h ec o u n ta n ds a v e
o wf rno m ; c o n v e r t
ax
it.
DELAY
in
a1 ,TIMER_0_8253
DELAY
mov
ah.al
in
a1 .TIMER_0_8253
x c hagh , a l
neg
;lsb
;msb
remaining t o elapsed
: count
cs:[ReferenceCountl,ax
add
; R e s t o r et h ec o n t e x ta n dr e t u r n .
POP
POP
MPOPF
cx
ax
ret
ReferenceZTimerOffendp
.....................................................................
r et isrmeuipln;*
tosgr.t R
oc oa ul ltei nd e
.....................................................................
Z T i m enr R
epaer rop co r t
pushf
push
ax
push
bx
push
cx
push
dx
push s i
p u sdhi
push
ds
push
POP
ds assume
cs
ds
:DOS f u n c t i o n s r e q u i r e t h a t
: totextto
b ed i s p l a y e d
DS p o i n t
on t h es c r e e n
:Code
; See i f m i d n i g h to rm o r et h a na nh o u rp a s s e dd u r i n gt i m i n g .
; n o t i f yt h eu s e r .
mov
cmp
jz
I f so,
ax.[StartBIOSCountHighl
ax.[EndBIOSCountHighl
C a l c B I O S T;i hmcoeouduri cnd htn a' tn g e ,
; s o e v e r y t h i n g ' sf i n e
aixn c
cmp
ax.[EndBIOSCountHighl
j n zT e s t T o o L o n g: m i d n i g hot trw oh o u r
: b o u n d a r i e sp a s s e d ,
so t h e
: r e s u l t sa r en og o o d
mov
ax.CEndBIOSCountLowl
cmp
ax.[StartBIOSCountLowl
jb
CalcBIOSTime
single
:a
hour
boundary
; p a s s e d - - t h a t ' s OK. s o l o n ga s
: t h et o t a lt i m ew a s n ' tm o r e
; t h a n anhour
Assume Nothing
63
: O v e ra nh o u re l a p s e do rm i d n i g h tp a s s e dd u r i n gt i m i n g ,w h i c h
: r e n d e r st h er e s u l t si n v a l i d .N o t i f yt h eu s e r .T h i sm i s s e st h e
: casewhere a m u l t i p l e o f 24 h o u r sh a sp a s s e d ,b u tw e ' l lr e l y
: o nt h ep e r s p i c a c i t yo ft h e
user t o d e t e c t t h a t c a s e .
TestTooLong:
mov
ah.9
mov
dx.offsT
e tu r n O v e r S t r
int
21h
jmp
short
ZTimerReportOone
: C o n v e r tt h e
BIOS t i m et om i c r o s e c o n d s .
CalcBIOSTime:
mov
ax.CEndBIOSCountLowl
ax.[StartBIOSCountLow]
sub
mov 54925
dx.
:number
m iocfr o s e c oenadcsh
: BIOS c o u n tr e p r e s e n t s
mu1
dx
mov abs: sxi de. aet x
BIOS c oi nu n t
mov
cx.dx
: microseconds
: C o n v e r tt i m e rc o u n tt om i c r o s e c o n d s .
mov
mov
mu1
mov
sdi i v
ax, [ EndTimedCount]
s i ,8381
si
s i ,10000
:*
: Add t i m e ra n d
.E381
- * 8381 /
10000
B I O S c o u n t st o g e t h e rt og e ta no v e r a l lt i m ei n
: microseconds.
add
adc
bx.ax
cx.0
: S u b t r a c tt h et i m e ro v e r h e a da n ds a v et h er e s u l t .
mov
mov
mu1
mov
sdi i v
bx.ax
sub
cx.0
sbb
mov
mov
ax.[ReferenceCount]
: c o n v e r tt h er e f e r e n c ec o u n t
s i ,8381
si
: t om i c r o s e c o n d s
s i , 10000
:* .E381 * 8381 / 10000
[CurrentCountLowl.bx
[CurrentCountHigh].~~
: C o n v e r tt h er e s u l tt o
anASCII
s t r i n gb yt r i a ls u b t r a c t i o n s
: powers o f1 0 .
mov
d i . o f f s e t PowersOfTenEnd - o f f s e t PowersOfTen - 4
mov
s i . o f f sTe itm e d C o u n t S t r
CTSNextOigi t :
mov
b l ,'O'
CTSLoop:
mov
ax.[CurrentCountLow]
mov
dx,[CurrentCountHigh]
ax.PowersOfTen[di]
sub
64
Chapter 3
of
sbb
dx.PowersOfTenCdi+2l
jc
CTSNextPowerDown
bi nl c
mov
CCurrentCountLowl.ax
mov
[CurrentCountHigh].dx
jrnp
CTSLoop
CTSNextPowerDown:
rnov
[sil.bl
si ni c
s udbi
.4
CjTnSs N e x t D i g i t
: P r i n tt h er e s u l t s .
mov
rnov
int
ah.9
d x , o f f sO
e tu t p u t S t r
21h
ZTirnerReportDone:
POP
ds
pop
di
pop
si
POP
dx
POP
cx
POP
bx
POP
ax
MPOPF
ret
ZTimerReport
endp
Code
ends
end
Assume Nothing
65
If you do leave the PS2 equate at1 in Listing 3.5, you should repeat each code-timing
run several timesbefore relying on the results to be
accurate to more than 54 ms, since
variations may result from the possible lack of synchronization between the timer 0
count and the BIOS time-ofday count. Infact, its a good idea to time code more than
once no matter which versionof the long-period Zen timer youre using, since interrupts, which must beenabled in order for the
long-period timer to work properly,may
occur at any time and can alter execution time substantially.
Finally, please note that the precision Zen timer works perfectly well on both PS/2
and non-PS/S computers. The PS/2 and 8253 considerations weve just discussed
apply only to the long-periodZen timer.
: Program t o m e a s u r ep e r f o r m a n c eo fc o d et h a tt a k e sl o n g e rt h a n
: 5 4 ms t oe x e c u t e . (LZTEST.ASM)
:
:
:
:
L i n kw i t h LZTIMER.ASM ( L i s t i n g3 . 5 ) .
LZTIME.BAT ( L i s t i n g3 . 7 )
canbeused
t o assembleand
l i n kb o t hf i l e s .
Code t ob e
measuredmustbe
i n t h e f i l e TESTCODE: L i s t i n g3 . 8 shows
asample
f i l e (LST3-8.ASM)whichshouldbe
named TESTCODE.
: By M i c h a e A
l brash
mystack
segment
stack
para
dup(?)
512
db
ends
mystack
STACK
s e g m epnaptruab l i c
CODE
assume
cs:Code.
ds:Code
e x t r nZ T i m e r 0 n : n e a r .Z T i m e r 0 f f : n e a r .Z T i m e r R e p o r t : n e a r
S t a r tp r o cn e a r
push
cs
Code
66
Chapter 3
pop
ds
: p o i n t D S t toh e
code
segment.
: so d a t a a s w e l l ascodecan
: b ei n c l u d e di n
TESTCODE
easily
: Delay f o r 6 - 7s e c o n d s .t ol e tt h eE n t e rk e y s t r o k et h a ts t a r t e dt h e
cmp
ah,2ch
21h
bh.dh
: g e tt h ec u r r e n tt i m e
: s e tt h ec u r r e n tt i m ea s i d e
ah.2ch
bx
21h
bx
dh,bh
CheckDelayTime
jnb
dh.60
add
CheckDelayTime:
s u b dh,bh
cmp
dh.7
jb
Del
ayLoop
include
TESTCODE
; p r e s e r v es t a r tt i m e
: g e tt i m e
:retrievestarttime
: i s t h e new secondscount
less t h a n
: t h es t a r ts e c o n d sc o u n t ?
:no
:yes. a m i n u t em u s th a v et u r n e do v e r ,
; so add one m i n u t e
: g e tt i m et h a t ' sp a s s e d
;has i t beenmorethan
;notyet
6 s e c o n d sy e t ?
:code t o b em e a s u r e d ,i n c l u d i n gc a l l s
: t o ZTimerOnandZTimerOff
: D i s p l a yt h er e s u l t s .
c aZ lTl i m e r R e p o r t
: T e r m i n a t et h ep r o g r a m .
mov
int
S t a r t endp
Code ends
end
ah.4ch
21h
Start
As with the precision Zen timer, the program inListing 3.6 is used by naming the file
containing the codeto be timed TESTCODE, then assembling both Listing 3.6 and
Listing 3.5 with MASM or TASM and linking the two files together byway of the
Microsoft or Borland linker. Listing 3.7 shows a batch file, named LZTIME.BAT,
which does all of the above, generating and running the
executable file LZTEST.EXE.
LZTIME.BAT assumes that the file LZTIMER.ASM contains Listing 3.5 and the file
LZTEST.ASM contains Listing 3.6.
LISTING 3.7 UTIME.BAT
echo o f f
rem
rem *** L i s t i n g 3.7 ***
rem
rem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rem * B a t cf ihl e
LZTIME.BAT, w h ibcuhi ladrnsudtnhse
rem * l o n g - p e r i o d Zen t i m e rp r o g r a m
LZTEST.EXE t ot i m et h ec o d e
*
*
Assume Nothing
67
e.
brash
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
rem
remMake s u r e a f i l e t o t e s t was s p e c i f i e d .
rem
i f n o t x%l-x
g o t oc k e x i s t
echo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
t e s t .t oa f i l e
echo *s pPel ec ai fsye
echo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
g o t oe n d
rem
remMake s u r e t h e f i l e e x i s t s .
rem
:ckexist
i f e x i s t %1 g o t od o c o p y
echo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
echo *s pT eh cefiifl iee,d
"%1." d o ee xs ins'tt.
echo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
g o t oe n d
rem
r e mc o p yt h ef i l et om e a s u r et o
TESTCODE.
:docopy
copy %1t e s t c o d e
masm l z t e s t ;
i f e r r o r l e v e l 1 g o t oe r r o r e n d
masm l z t i m e r :
i f e r r o r l e v e l 1 g o t oe r r o r e n d
linklztest+lztimer:
i f e r r o r l e v e l 1 g o t oe r r o r e n d
1z t e s t
g o t oe n d
:errorend
echo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
echo * An e r r o r o c c u r r e d w h i l e b u i l d i n g t h e l o n g - p e r i o d
Zen t i m e r .
echo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
:end
68
Chapter 3
LISTING 3.8
LST3-8.ASM
: N o t e :t a k e sa b o u tt e nm i n u t e st oa s s e m b l e
;
you a ur es i n g
on a s l o w P C i f
MASM
db
Skip:
: S t a r tt i m i n g .
call
ZTimerOn
rept
20000
a1 , [MemVar]
mov
endm
; S t o pt i m i n g .
cZ
a lTl i m e r O f f
When LZTIME.BAT is run ona PC with the following command line (assuming the
code in Listing 3.8 is the file LST3-8.ASM)
l z t i m el s t 3 - 8 . a s m
the result is 72,544 ps, or about 3.63 ps per load of AL from memory. This is just
slightly longer than the time per load of AL measured by the precision Zen timer, as
we would expect given that interrupts are
left enabled by the long-period Zen timer.
The extra fraction of a microsecond measured per MOV reflects the time required
to execute the BIOS code that handles the 18.2 timer interrupts that occur each
second.
Note that the commandcan take asmuch as 10 minutes to finish on a slow PC if you
are using MASM, with most of that time spent assembling Listing3.8. Why? Because
MASM is notoriously slow at assembling REPT blocks, and theblock in Listing 3.8 is
repeated 20,000 times.
69
(display timing results) routines can be called from C. There aretwo separate cases
to be dealt with here: small code model and large; Ill tackle the simpler one, the
small code model,first.
Altering the Zen timerfor linking toa small code model C program involves the following steps:Change ZTimerOn to -ZTimerOn, change ZTimerOff to -ZTimerOff, change
ZTimerReport to -ZTiierReport, and change Code to -TEXT. Figure 3.2 shows the
line numbers and new states of all linesfrom Listing 3.1 that must be changed. These
changes convertthe code to use Cstyle
external label namesand thesmall model C code
segment. (In C++, usethe C specifier,as in
e x t e r n C
ZTimerOn(void);
when declaring thetimer routines extern, so that name-mangling doesnt occur, and
the linker can find the routines C-style names.)
Thats all it takes; after doingthis, youll be able to use the Zen timer fromC, as, for
example, in:
ZTimerOn( :
f o r( i - 0 .
x-0; i
<
l
O
O
i++)
;
x +- i;
ZTimerOff( ;
ZTimerReportO;
(Im talking about the precision timer here. The long-period timer-Listing
requires the same modifications, but todifferent lines.)
Line #
47
4a
49
near
ZTimerOn
proc
140
210
ZnTei 2apm1reo6rcO f f
296
372
assume
384
437
439
cs:-TEXT.
assume
Nw S t a t e
-TEXT
public
~
CODE
ds:nothing
-ZTimerOn.
-2TimerOff.
-2TimerReport
s e g m ewn ot pr du b l i c
ZTimerOn
endp
-ZTimerOff
endp
-Z T i m e r R e p oprrtn
o ec a r
ds:-TEXT
-ZTimerReport endp
-TEXT
ends
These are the lines in Listing 3.1 that must be changed for use with small code
model C, and the states of the lines after the changesare made.
70
Chapter 3
3.5-
Altering the Zen timer for use in Cs large code model is a tad more complex, because in addition to the above changes, all functions, including the internal reference
timing routines that areused to calculate overhead so it can be subtracted out,must
be converted to far. Figure 3.3 shows the line numbers and new states of all lines
from Listing 3.1 that must be changed inorder to callthe Zen timer from large code
model C. Again, the line numbers are specific to the precision timer, but the longperiod timer is very similar.
The full listings for the C-callable Zen timers are presented in Chapter K on the
companion CD-ROM.
ReferenceZTimerOn
with
cs
push
R e pf net receaanrl cl e Z T i m e r O n
47
48
49
140
210
216
267
268
296
302
336
372
384
437
439
New S t a t e
PZTIMER-TEXT s e g m e nwt o r dp u b l i c
CODE
assume
cs:PZTIMER-TEXT.
ds:nothing
p u b l i c -ZTimerOn._ZTimerOff.-ZTimerReport
Z T i m e r O np r o cf a r
-ZTimerOn
endp
-Z T i m e r O f f p r o c f a r
c a l lf a rp t r
ReferenceZTimerOn
c a l fl a pr t R
r eferenceZTimerOff
ZTimerOff endp
R e f e r e n c e Z T i m e rpOrf nao rc
R e f e r e n c e Z T i m e r Opf fr of ca r
- Z T i m e r R e p o rpt r o fca r
assume ds:PZTIMER_TEXT
-ZTimerReportendp
PZTIMER-TEXT ends
-
These are the lines in Listing 3.1 that must be changed for use with large
code modelC, and the states of the lines after the changes are made.
Figure 3.3
Assume Nothing
71
timer, because our purpose in calling the reference timing code is to determine
exactly how much time is taken by overhead code-including the far calls to
ZTimerOn and ZTimerOff ! By converting the far calls to push/near call pairs within
the Zen timer module, TASM makes it impossible to emulate exactly the overhead of
the Zen timer, and makes timings slightly (about 16 cycles on a 386) less accurate.
Whats the solution? Put theNOSMART directive at the startof the Zen timer code.
This directive instructs TASM to turn off all optimizations, including converting far
calls to push/near call pairs. By the way, there is, to the best of my knowledge, no
such problem with MASM up throughversion 5.10A.
In my mind, the whole business of optimizing assemblers is a mixed blessing. In
general, its nice to have the assembler shorteningjumps andselecting sign-extended
forms of instructions for you. On the other hand, the benefits of tricks likesubstituting
push/near call pairs for far calls are relatively small, and those tricks can get in the
way when complete controlis needed. Sure, complete control
is needed very rarely,
but when it is, optimizing assemblers can cause subtle problems; I discovered TASMs
alteration of far calls onlybecause I happenedto view the code in the debugger, and
you might want to do thesame if youre using a recentversion of MASM.
Ive tested the changes shown in Figures 3.2 and 3.3 with TASM and Borland C++
4.0, and also with the latest MASM and Microsoft C/C++ compiler.
Further Reading
For those of you who wish to pursue the mechanics of code measurement further,
one good article about measuring code performance with the 8253 timer is Programming Insight: High-Performance Software Analysis on the IBM PC, by Byron
Sheppard, which appeared in the January, 1987 issue of Byte. For complete if somewhat cryptic information on the 8253 timer itself, I refer you to Intels Microsystem
Components Handbook, which is also a useful reference for a number of other PC
components, including the 8259 Programmable Interrupt Controller and the 8237
DMA Controller. For details about the way the 8253 is used in the PC, as well as a
great dealof additional information aboutthe PCs hardware and BIOS resources, I
suggest you consult IBMs series of technical reference manuals for the PC, XT, AT,
Model 30, and microchannel computers, such as the Models 50, 60, and 80.
For our purposes, however, its not critical that you understand exactly how the Zen
timer works. All you reallyneed to know is what the Zen timer can do andhow to use
it, and weve accomplished that in this chapter.
72
Chapter 3
Previous
Home
Next
Assume Nothing
73
Previous
Home
Next
"";n
" &
," "
')i
77
cycle-eaters and optimization rules have changed over time, but dotake the time to
at least skim through this chapter to give yourself a goodstart on thematerial in the
rest of this book.
Cycle-Eaters
Programming has many levels, ranging from the
familiar (high-levellanguages, DOS
calls, and thelike) down to the esoteric things that lie on theshadowy edge of hardware-land. I call these cycle-eaters because, like the monsters in a bad50s horror movie,
they lurk in those shadows, taking their share of your programs performance without regardto the forces of goodness or theU S . Army. In this chapter, were going to
jump right in at thelowest level by examining the cycle-eaters that live beneath the
programming interface; thatis, beneath your application, DOS, and BIOS-in fact,
beneath the instruction set itself.
Why start at the lowest level? Simplybecause cycle-eaters affect the performance of
all assembler code, and yet are almost unknown to most programmers.A full understanding of code optimization requires an understanding of cycle-eaters and their
implications. Thats no simple task, and in fact it is in precisely that area that most
books and articles about assembly programming fall short.
Nearly allliterature on assembly programming discusses onlythe programming interface: the instruction set, the registers, the flags, and the BIOS and DOS calls. Those
topics coverthe functionality of assemblyprograms most thoroughly-but its performance above all else that were after. No one ever tells you about the raw stuff of
performance, which liesbeneath the programminginterface, in the dimly-seen realmpopulated by instruction prefetching, dynamic RAM refresh, and wait states-where
software meets hardware. This area is the domain of hardware engineers, and is almost
never discussed asit relates to code performance. And yetit is only by understanding
the mechanisms operatingat this level that we can fullyunderstand andproperly improve the performanceof our code.
Which brings us to cycle-eaters.
78
Chapter 4
The nature andseverity of the cycle-eaters vary enormously from processor to processor, and (especially) from memory architecture tomemory architecture. In order
to understand themall, we need first to understand thesimplest among them,those
that haunted the original 8088-based IBM PC. Later on in this book, Ill be better
able to explain the newer generation of cycle-eaters in terms of those ancestral cycleeaters-but we have to get the groundwork
down first.
. . .an 8088!
Fans of the 8088 call it a 16-bit processor. Fans of other 16-bit processors call the
8088 an 8-bit processor.The truthof the matter is that the8088 is a 16-bit processor
that often performs like an %bit processor.
equivalent to an8086. (In fact, the 8086
The 8088 is internally a full 16-bit processor,
is identical to the 8088, except that it has a full 16-bit bus. The 8088 is basically the
poor mans 8086, because it allows a cheaper-albeit slower-system
to be built,
thanks to the half-sized bus.) In terms of the instruction set, the 8088 is clearly a l6bit
In the Lair of the Cycle-Eaters
79
The 8088
Internally, the 8088 is
a full 16-bit
processor, just like the
8086. No cycle
eaters live in here!
Cycle-eater #2
Prefetch
Queue
Cycleeater # 1
Cycle-eater #4
PC Bus
I
Cycleeater
#3
Figure 4.1
80
Chapter 4
The 8088
PC Bus
data bus. Technically, it might be more accurate to place this cycle-eater in the Bus
Interface Unit,which breaks 16-bit memory accesses
into paired8-bit accesses,but it is
really the limited width of the external databus that constricts data flow into and out
of the 8088. True, theoriginal PCs bus is also only8 bits wide,but thatsjust to match
the 8088s &bitbus; even if the PCs buswere 16 bits wide, data could still passinto and
out of the 8088 chip itself only 1 byte at atime.
Each bus access by the 8088 takes 4 clock cycles, or 0.838 ps in the 4.77 MHz PC, and
transfers 1 byte. That means thatthe maximum rate atwhich data can be transferred
into and outof the 8088 is 1 byte every 0.838ps. While 8086 bus accesses also take 4
clock cycles, each 8086 bus access can transfer either 1 byte or 1 word, for a maximum transfer rate of 1 word every 0.838 ps. Consequently, for word-sized memory
accesses, the 8086 has an effective transfer rate of 1 byte every 0.419 ps. By contrast,
every word-sized access on the 8088 requires two 4cycle-long bus accesses, one for
the highbyte of the word and one for the
low byte of the word. As a result,the 8088
has an effective transfer rate for word-sized memory accesses of just 1 word every
1.676 ps-and that, in a nutshell,is the 8-bit bus cycle-eater.
A related cycle-eater lurks beneath the 386SX chip, which isa 32-bit processor internally with only a 16-bit path to system memory. The numbers are different, but the
In the Lair of the Cycie-Eaters
81
way the cycle-eater operates is exactly the same. AT-compatible systems have 16-bit
data buses, which can access a full 16-bit wordat a time. The 386SX can process 32
bits (a doubleword) at atime, however, and loses a lot of timefetching that doubleword
from memory intwo halves.
a x , w o r dp t r[ M e m V a r l
al.byteptr[MemVarl
takes toread the byte at address MemVar. (Actually, the difference betweenthe two isnt
very likely to be exactly4 cycles, for reasons that will become clear once we discuss the
prefetch queue anddynamic RAM refresh cycleeaters later in this chapter.)
Whats more, in some cases one instmction can perform multiple word-sized accesses, incurring that 4cycle penalty on each access. For example, adding value
a
to
a word-sized memory variable requires two word-sized accesses-one to read the
destination operand frommemory prior to adding to it, and oneto write the result
of the addition back to the destination operand-and thus incurs not one but
two 4
cycle penalties. As a result
a d dw o r dp t rC M e m V a r 1 . a ~
String instructionscan suffer from the%bit bus cycle-eater to a greater extent than
other instructions. Believe it or not, a single REP MOVSW instruction can lose as
much as 131,070 word-sized
memory accesses x 4 cycles, or 524,280 c y c b to the 8-bit
bus cycle-eater! In otherwords, one 8088 instruction (admittedly, an instruction that
does a great deal)
can take overone-tenth of a second longer on 8088
an than on an
8086, simply because of the 8-bit bus. One-tenth of a second! Thats a phenomenally
long time in computer terms; in one-tenth of a second,the 8088 can perform more
than 50,000 additions and subtractions.
The upshotof allthis is simply that the 8088 can transfer word-sized data to and from
memory at only half the speed of the 8086, which inevitably causes performance
problems when coupled with an Execution Unit that can process word-sized data
82
Chapter 4
every bit as quickly as an 8086. These problems show up with any code that uses
word-sized memory operands. More ominously, as we will see shortly, the 8-bit bus
cycle-eater can cause performance problemswith other sorts of code as well.
LST4- 1.ASM
; M e a s u r e st h ep e r f o r m a n c eo f
a l o o pw h i c h u s e s a
; b y t e - s i z e d memory v a r i a b l e as t h e l o o pc o u n t e r .
jmp
Counter
Skip
db
100
Skip:
c a l l ZTimerOn
LoopTop:
[C
d eo cu n t e r ]
jnz
LoopTop
c aZ lTl i m e r O f f
LISTING 4.2
LST4-2.ASM
: M e a s u r e st h ep e r f o r m a n c eo f
a l o o pw h i c hu s e s
; w o r d - s i z e d memory v a r i a b l ea st h el o o pc o u n t e r .
Sj m
k ipp
Counter
dw
100
Skip:
c a l l ZTimerOn
LoopTop:
[ Cdoeuc n t e r ]
jnz
LoopTop
c aZ lTl i m e r O f f
83
instructions dont exist in avacuum, so in the listings I used code that both decremented
the counter and tested the result. The difference is that between decrementing a
memory location (simply an instruction) and using a loop counter (afunctional instruction sequence). If youcome across code in this book thatseems lessthan optimal,
its simplydue to my desire to providecode thats relevant to real programming p r o b
lems. On the other hand,optimal code is an elusive thing indeed; by no means should
you assume that the code in this book is ideal! Examine it, question it, and improve
upon it, for an inquisitive, skepticalmind is an important partof the Zen of assembly
optimization.
Back to the 8-bit bus cycle-eater. As Ive said, in 8088 work you should strive to use
byte-sized memory variables whenever possible. That doesnot mean thatyou should
use 2 byte-sized memory accesses to manipulate a word-sized memory variable in
preference to 1 word-sized memory access, as, for instance,
mov
mov
d 1 , b y pt et r
d h . b y tpet r
[MemVarl
[MemVar+ll
versus:
mov
d x . w o rpd[tM
r emVarl
Recall that every accessto a memory byte takesat least 4 cycles; that limitation is built
right into the 8088. The 8088 is also built so that the second byte-sized memory
access to a 16-bit memory variable takes just those 4 cycles and no more.
Theres no
way you can manipulate the second
byte of a word-sized memory variable faster with
a second separate byte-sized instruction in less than 4 cycles. As a matter of fact,
youre bound to access that second byte much more slowly with a separate instruction, thanks to the overhead of instruction fetching and execution, address calculation,
and thelike.
For example, consider Listing 4.3,which performs 1,000 word-sized reads from
memory. This code runs in 3.77ps per word read on a 4.77 MHz 8088. Thats 45
percent faster than the 5.49 ps per word read of Listing 4.4,which reads the same
1,000 words as Listing 4.3 but does so with 2,000 byte-sized reads. Both listings perform exactly the same number of memory accesses-2,000 accesses, each byte-sized,
as all 8088 memory accesses must be. (Remember thatthe Bus Interface Unit must
perform two byte-sized memory accesses in order to handle a word-sized memory
operand.) However, Listing4.3is considerably faster because it expends only 4 additional cycles to read thesecond byte ofeach word, while Listing
4.4performs a second
LODSB, requiring 13 cycles, to read the second byte of each word.
LISTING 4.3
LST4-3.ASM
; M e a s u r e st h ep e r f o r m a n c e
o f r e a d i n g 1,000 words
; from memory w i t h 1,000 w o r d - s i z e da c c e s s e s .
s iu. bs i
84
Chapter 4
mov
c x , 1000
c a l l ZTimerOn
rep
lodsw
c aZ lTl i m e r O f f
LISTING 4.4
LST4-4.ASM
: M e a s u r e st h ep e r f o r m a n c e
o f r e a d i n g 1000 words
: f r o m memory w i t h 2,000 b y t e - s i z e da c c e s s e s .
sub
s, is i
mov
c x , 2000
c a l l ZTimerOn
l or ed ps b
c Za T
llimerOff
In short,if you must perform a16-bit memory access, let the 8088 break the access
into two byte-sized accessesfor you. The 8088 is more efficient at thattask than your
code canpossibly be.
Word-sized variables should be stored in registers to the greatest feasible extent,
since registers are inside the 8088, where 16-bit operations are just as fast as 8-bit
operations because the 8-bit cycle-eatercant get at them. In fact,
its a good idea to
keep as many variables of all sorts in registers as you can. Instructionswith registeronly operands execute very rapidly, partially because they avoid both the
time-consuming memory accesses and the lengthy address calculations associated
with memory operands.
There is yet another reason why register operands are preferable to
memory operands, andits an unexpectedeffect of the %bit bus cycle-eater. Instructions with only
register operands tendto be shorter (in
terms of bytes) than instructions with memory
operands, andwhen it comes to performance, shorter is usually better. In order to
explain why that is true andhow it relates to the &bit bus cycle-eater, I mustdiverge
for a moment.
For the last few pages, you may well havebeen thinking that the
%bit bus cycle-eater,
while a nuisance, doesnt seemparticularly subtle or difficult to quantify. After all,
any instruction reference tells us exactly how many cycles each instruction loses to
the 8-bit bus cycle-eater, doesnt it?
Yes and no.Its true that in general
we know approximately how much longer given
a
instruction will take to execute with a word-sized memory operand thanwith a bytesized operand, although the
dynamic RAM refresh and wait state cycle-eaters (which
Ill cover a little later) can raise the cost of the 8-bit bus cycle-eater considerably.
However, all word-sized memory accesses lose 4 cycles to the 8-bit bus cycle-eater,
and theres one sortof word-sized memory access we havent discussed yet: instruction fetching. The ugliest manifestation of the %bit bus cycle-eater is in fact the
prefetch queue cycle-eater.
85
Cbx1.a~
its well-documented that anextra 4 cycles will always be requiredto write the upper
byte of AX to memory. Not so with the prefetch queue cycle-eater lurking nearby.
For instance, the instructions
shr
shr
shr
shr
shr
ax.1
ax.1
ax.1
ax.1
ax.1
should execute in 10 cycles, since each SHR takes 2 cycles to execute, accordingto
Intels specifications. Those specifications contain Intels official instruction execution times, but in this case-and in many others-the specifications are drastically
wrong. Why? Because theydescribe execution time once an instruction reaches thep-efetch
86
Chapter 4
queue. They say nothing about whether a given instruction will be in the prefetch
queue when its time for thatinstruction to run, orhow long itwill take that instruction to reach the prefetch queue
if its not there already. Thanks to the low
performance of the 8088s external databus, thats a glaring omission-but, alas, an
unavoidable one. Lets look at why the official execution times are wrong, and why
that cant be helped.
In other words, while the execution time for a given instruction is constant, the
fetch timefor that instruction depends heavily on the context which
in the instruction is executing-the amount of prefetching the preceding instructions
allowed-and can vary from a full 4 cycles per instruction byte tono time at all.
As well see later, other cycle-eaters, such as DRAM refresh and display memory wait
states, can cause prefetching variations even during different executions of the same
code sequence. Given that, its meaningless to talk about theprefetch time of a given
instruction except inthe context of a specific code sequence.
So now you know why the official instruction execution times are often wrong, and
why Intel cant provide better specifications. You also know now why it is that you
must time your code if you want toknow how fast it really is.
In theLair of theCycle-Eaters
87
: M e a s u r e st h ep e r f o r m a n c eo f
1,000 SHR i n s t r u c t i o n s
: i n a r o w .S i n c e
SHR e x e c u t e si n 2 c y c l e sb u t
is
: 2 b y t e sl o n g ,t h ep r e f e t c h
queue i s alwaysempty,
: a n dp r e f e t c h i n gt i m ed e t e r m i n e st h e
: p e r f o r m a n c eo ft h ec o d e .
call
rept
88
Chapter 4
ZTimerOn
1000
overall
ax.1
shr
endm
c a l l ZTimerOff
LISTING 4.6
LST4-6.ASM
sub
cx. 1000
ax.ax
ZTimerOn
call
r e p t 1000
mu1
ax
ax.1
shr
endm
c a l l ZTimerOff
Execution Unit
Activity
Bus Interface
Unit Activity
Execution Unit
executes shr
Execution Unit
idle
Execution Unit
executes shr
Execution Unit
idle
Execution Unit
executes shr
Execution Unit
idle
89
Now lets examine Listing 4.6. Here each SHR follows a MUL instruction. Since
MUL instructions take so long to execute thatthe prefetch queue is alwaysfull when
they finish, each SHR should be ready and waiting in the prefetch queue when the
preceding MUL ends. As a result, wed expect that each SHR would execute in 2
cycles; together with the 118-cycle execution time of multiplying 0 times 0, the total
execution time should come to 120 cyclesper SHR/MUL pair, as shown in Figure 4.4.
And, by God, when we run Listing 4.6we get an execution time of 25.14ps per SHR/
MUL pair, or exact4 120 cycles!According to these results, the trueexecution time
of SHR would seem to be 2 cycles, quite a change from
the conclusion we drew from
Listing 4.5.
The key point is this:Weve seen one code sequence
in which SHR took 8-plus cycles
to execute, and another in which it took only 2 cycles. Are we talking about two
different forms of S H R here? Of course not-the difference is purely a reflection of
the differing states in which
the preceding code left the prefetch queue. In Listing 4.5,
each SHR after the first few follows a slew of other SHR instructions which have
sucked the prefetch queue dry, so overall performance reflects instruction fetch time.
By contrast, each SHR in Listing 4.6 follows a MUL instruction which leaves the
prefetch queue full, so overall performance reflects Execution Unit execution time.
Clearly, either instruction fetch time or Execution Unit execution time-or even a
mix of the two, if an instruction is partially prefetched-can determine codeperformance. Some people operate under a rule
of thumb by which they assume that the
execution time of each instruction is 4 cycles times the number of bytes in the instruction. While thats often true for register-only code, it frequently doesnt hold
for code that accesses memory. For one thing, the rule should be4 cycles times the
number of memory accesses, not instruction bytes, since all accesses take 4 cycles on
the 8088-based PC. For another, memory-accessing instructions often have slower
Execution Unit execution times than the 4 cycles per memory access rule would
dictate, because the 8088 isnt very fastat calculating memory addresses. Also, the 4
cycles per instruction byte rule isnt true for register-only instructions that are already in the prefetch queue when the precedinginstruction ends.
The truthis that it never hurts performance to reduce either thecycle count or the
byte count of a given bit of code, but theres no guaranteethat one or the other
will
improve performance either. For example, consider Listing 4.7, whichconsists of a
series of 4cycle, 2-byte MOV A L , O instructions, and which executes at the rate of
1.81 ps per instruction. Now consider Listing 4.8, which replaces the 4-cycle MOV
A L , O with the 3-cycle (but still 2-byte) S U B a,&.
Despite its l-cycle-per-instruction
advantage, Listing 4.8runs atexactly the same speedas Listing 4.7.
The reason: Both
instructions are 2 bytes long, and in both cases it is the 8-cycle instruction fetch time,
not the3 or 4cycle Execution Unit execution time, that limits performance.
90
Chapter 4
Execution Unit
Activity
Execution Unit
executes shr
Bus Interface
Unit Activity
,
Cvcle 0
t
I
Cvcle 3
Cycle
Cycle
Bus Interface
Unit prefetches
next shr
Cycle 6
Cycle 7
Execution Unit
executes mu1
Cycle 1 1
Bus Interface
Unit prefetches
next mu1
Bus Interface
Unit idle
Execution Unit
executes shr
Execution Unit
executes mu1
Bus Interface
Unit prefetches
next shr
91
LISTING 4.7
LST4-7.ASM
: M e a s u r e st h ep e r f o r m a n c eo fr e p e a t e d
: w h i c ht a k e 4 c y c l e se a c ha c c o r d i n g
: specifications.
ax,ax
sub
call
rept
mov
MOV A L . 0 i n s t r u c t i o n s
to Intelsofficial
ZTimerOn
1000
a1 ,O
endm
c aZ lTl i m e r O f f
LISTING 4.8
LST4-8.ASM
: M e a s u r e st h ep e r f o r m a n c eo fr e p e a t e d
: w h i c ht a k e
3 c y c l e se a c ha c c o r d i n g
: specifications.
SUB A L . A L i n s t r u c t i o n s
to Intelsofficial
ax.ax
sub
c a l l ZTimerOn
r e p t 1000
sub
a1 .a1
endm
c aZ lTl i m e r O f f
As you can see, its easy to be drawn into thinkingyoure saving cycles when youre
not. You can only improve the performanceof a specific bit of code by reducing the
factor-either instruction fetch time or execution time, or sometimes a mix of the
two-thats limiting the performance of that code.
In case you missed it in all the excitement, the variability ofprefetching means that
our method of testing performance by executing 1,000 instructions in arow by no
means produces trueinstruction execution times, any more than the official execution times in the Intel manuals are truetimes. The fact of the matter is that a
given instruction takes at least as long to execute as the time given for it in the Intel
manuals, but may take as much as 4 cycles per byte longer, depending on the
state of
the prefetch queue when the precedinginstruction ends.
The only true execution time for an instruction is a time measured in a certain
context, and that time is meaningfiil only in that context.
What we Teal&want isto know howlong useful workingcode takes torun, nothow long
a single instruction takes, and theZen timer gives us the tool we need to gather that
information. Granted, it would be easier if we could just add up
neatly documented
instruction execution times-but thats not goingto happen. Without actually measuring the performanceof a given code sequence,you simplydont know how fastit
is. For crying out loud, even the people who designed the 8088 at Intel couldnt tell
you exactly how quickly
a given 8088 code sequenceexecutes on thePC just by looking at it! Get used to the idea that executiontimes are only meaningful in context,
learn the rules of thumb in this book, anduse the Zen timer to measure your code.
92
Chapter 4
93
More than anything,the above rules mean using the registers as heavily as possible,
both because register-only instructions are short and because they dont perform
memory accesses to read orwrite operands. However, using the registers is a rule of
thumb, not a commandment. In some circumstances, it may actually be faster to
access memory. (The look-up table technique is one such case.) Whats more, the
performance of the prefetch queue (and hence the performance of each instruction) differs from one code sequence to the next, and can even differ during different
executions of the same code sequence.
All in all, writing good assembler code is as much an artas a science. As a result, you
should follow the rules of thumb described here-and then time your code to see
how fast it really is. You should experimentfreely, but always remember that actual,
measured performance is the bottom line.
94
Chapter 4
Lets start with DRAM refresh, which affects the performanceof everyprogram that
runs on thePC.
95
The 256 addresses accessed by the refresh DMA accesses are arrangedso that taken
together they properly refresh all the memory in the PC. By accessing one of the 256
addresses every 15.08 ps, all of the PCs DRAM is refreshed in 256 x 15.08 ps, or 3.86
ms, which is just about the desired 4 ms time I mentioned earlier. (Only the first
640K of memory is refreshed in the PC; video adapters and other adapters above
640K containing memory that requires refreshing must provide their own DRAM
refresh in pre-AT systems.)
Dont sweat the details here. The important point
is this:For at least 4 out of every72
cycles, the original PCs bus is given over toDRAM refresh and is not available tothe
8088, as shown in Figure 4.5. That means thatas much as 5.56 percent of the PCs
already inadequate bus capacity is lost. However, DRAM refresh doesnt necessarily
72 cycle$
4 cycles
Figure 4.5
96
Chapter 4
stop the 8088 in its tracks for 4 cycles. The Execution Unit of the 8088 can keep
processing while DRAM refresh is occurring, unless the EU needs to access memory.
Consequently, DRAM refresh can slow code performanceanywhere from 0 percent
to 5.56 percent (and actually a bit more, as well see shortly), depending on the
extent to which DRAM refresh occupies cycles during which the 8088 would otherwise be accessing memory.
t od e m o n s t r a t e
a case i nw h i c h
MUL i n s t r u c t i o n s ,
be f u l l a t a l l t i m e s ,
DRAM r e f r e s hh a s
no i m p a c t
: oncodeperformance.
sub
ax.ax
c a l l ZTimerOn
r e p t 1000
mu1
ax
endm
c aZ lTl i m e r O f f
Running Listing 4.9, we find that each MUL executes in 24.72 ps, or exactly 118
cycles. Since thats the shortest time in which MUL can execute,we can see that no
performance is lost to DRAM refresh. Listing 4.9 clearly illustrates that DRAM refresh only affects code performance when a DRAM refresh forces the Execution
Unit of the 8088 to wait for amemory access.
Now lets look at the series of SHR instructions shown in Listing 4.10. Since SHR
executes in 2 cycles but is 2 bytes long, the prefetch queue shouldbe empty while
Listing 4.10 executes, with the 8088 prefetching instruction bytes non-stop. As a result, the time per instruction of Listing 4.10 should precisely reflect the time required
to fetch the instruction bytes.
97
LISTING 4.10
LST4- 1O.ASM
: M e a s u r e st h ep e r f o r m a n c eo fr e p e a t e d
SHR i n s t r u c t i o n s .
: w h i c he m p t yt h ep r e f e t c hq u e u e ,t od e m o n s t r a t et h e
: w o r s t - c a s ei m p a c t
o f DRAM r e f r e s h oncodeperformance.
c a l l ZTimerOn
r e p t 1000
asxh. 1r
endm
c aZ lTl i m e r O f f
Since 4 cycles are required to read each instruction byte, wed expect each SHR to
execute in 8 cycles, or 1.676 ps, if there were no DRAM refresh. In fact, each SHR in
Listing 4.10 executes in 1.81 ps, indicating that DRAM refresh is taking 7.4 percent
of the programs execution time. Thats nearly 2 percent more than ourworst-case
estimate of the loss to DRAM refresh overhead! In fact, the result indicates that DRAM
refresh is stealing not 4, but 5.33 cycles out of every 72 cycles. How can this be?
The answer is that agiven DRAM refresh can actually hold up CPU memory accesses
for as many as 6 cycles, depending on the timing of the DRAM refreshs DMA request relative to the 8088s internal instruction execution state. When the code in
Listing 4.10 runs, each DRAM refresh holds up the CPU for either 5 or 6 cycles,
depending onwhere the 8088 is in executing the currentS H R instruction whenthe
refresh request occurs. Now we see that things can get even worse than we thought:
DRAM reji-esh can steal as much as 8.33 percent of available memory access time-4 out of
a e r y 72 cycles-from the 8088.
Which of the two cases weve examined reflects reality? Whileeither case can happen,
the latter case-significant performance reduction, ranging
as high as 8.33 percentis far more likely to occur. This is especially true for high-performance assembly
code, which uses fast instructions that tend to cause non-stop instruction fetching.
98
Chapter 4
wouldnt have any effect. In the old days when code size was measured in bytes, not
K bytes, and processors were less powerful-and complex-programmers did in fact
use similar tricks to eke every last bit of performance from their code. When programming thePC, however, the prefetch queue cycle-eater would makesuch careful
code synchronization adifficult task indeed, andany modest performanceimprovement that did result could neverjustify the increase
in programming complexity and
the limits on creative programming thatsuch an approachwould entail. Besides, all
that effort goes to waste on faster 8088s, 286s, and other computerswith different
execution speedsand refresh characteristics. Theres noway around it: Useful code
accesses memory frequently and atirregular intervals, and over the long haul
DRAM
refresh always exacts its price.
If youre still harboring thoughts of reducing the overheadof DRAM refresh, consider this. Instructions that tend not to suffer very much from DRAM refresh are
those that have a high ratio of execution time to instruction fetch time, and those
arent thefastest instructions of the PC. It certainly wouldnt make sense to use slower
instructions just to reduce DRAM refresh overhead, for its total execution timeDRAM refresh, instruction fetching,and all-that matters.
The important thing to understand about DRAM refresh is that it generally slows
your code down, and that the extent of that performance reduction can vary considerably and unpredictably,depending onhow the DRAM refreshes interactwith your codes
pattern of memory accesses. When you use the Zen timer and geta fractional cycle
count for theexecution time of an instruction, thats often the
DRAM refresh cycleeater at work. (The display adapter cycle-eater is another possible culprit, and, on
386s and later processors, cache misses and pipeline execution hazardsproduce this
sort of effect as well.) Whenever you get two timing results that differ less or more
than they seemingly should, thats usually DRAM refresh too. Thanks to DRAM refresh, variations of up to 8.33 percent in PC code performance are par for the course.
Wait States
Wait states are cycles during which a bus access by the CPU to a device on the PCs
bus is temporarily halted by that device while the device gets ready to complete the
read or write. Wait states are well and truly the lowest level of code performance.
Everything we have discussed (and will discuss)-even DMA accesses-can
be affected by wait states.
Wait states exist because the CPIJ must to be able to coexist with any adapter, no matter how slow(withinreason). The 8088 expects to be
able to complete each bus access-a
memory or 1 / 0 read or write-in 4 cycles, but adapters cant always respond that
quickly for a number of reasons. For example, display adapters must split access to
display memory between the CPU and the circuitry that generates the video signal
based on the contents of display memory, so they often cant immediately fulfill a
request by the CPU for a display memoryread or write. To resolve this conflict, display
In the Lair of the Cycle-Eaters
99
adapters can tell the CPU to wait during bus accesses by inserting one or morewait
states, as shownin Figure 4.6. The CPU simply sits and idles as long as wait statesare
inserted, then completes the access as soon as the display adapter indicates its readiness by no longer inserting wait states. The same would be true of any adapter that
couldnt keep up with the CPU.
Mind you, this is all
transparent to executing code. An instruction that encounters wait
states runs exactly asif there were no wait states, only slower. Waitstates are nothing
more or less than wasted time as far as the CPU and your program are concerned.
By understanding the circumstances in which wait states can occur, you can avoid
them when possible. Even when its not possible to work around wait states, its still
to your advantage to understand how they can cause your code to run moreslowly.
First, lets learn a bit more aboutwait states by contrast with DRAM refresh. Unlike
DRAM refresh, wait states do not occur on
any regularly scheduled basis, and areof
no particular duration. Wait states can only occur when an instruction performs a
memory or 1/0 read or write. Both the presence of wait states and the numberof
wait states inserted on any given bus access are entirely controlled by the device
being accessed. When it comes to wait states, the CPU is passive, merely accepting
whatever wait states the accessed device chooses to insert during the course of the
access. All of this makes perfect sense given that the whole point of the wait state
Cycle 3
The display adapter inserts n wait
states while waiting for an access
to display memory to become
available. The Bus Interface Unit of
the 8088 is idle during this time.
Cycle n
Cycle n + l
Cycle n+2
Cycle n+3
100
Chapter 4
mechanism is to allow a device to stretch outany access to itself for however much
time it needs to perform the access.
As with DRAM refresh, wait states dont stop the 8088 completely. The Execution
Unit can continueprocessing while waitstates are inserted,so long as the EU doesnt
need to perform a busaccess. However,in thePC, wait states most often occurwhen
an instruction accesses a memory operand, so in fact the Execution Unit usually is
stopped by wait states. (Instruction fetchesrarely wait in an 8088-based PC because
system memory is zero-wait-state. AT-class
memory systems routinely insert1 or more
wait states, however.)
As it turns out,wait states pose a serious problem injust one areain the PC. While
any adapter can insert wait states, in the PC only displayadapters do so to the extent
that performance is seriously affected.
101
50 cycles
n
.
Display memory is
beina read for
vide: data and is
not available to the
8088;wait states
are inserted when
8088 accesses
occur.
Second, because the displayed dots (or pixels, short for picture elements) must be
drawn on thescreen at a constant speed, many display adapters provide memory accesses only at fixed intervals.As a result, time can be lost while the 8088 synchronizes
with the start of the next display adapter memory access, even if the video circuitry
isnt accessing displaymemory at thattime, as shownin Figure 4.8.
Finally, the time it takes a display adapter to complete a memory access is related to
the speed of the clock whichgenerates pixels on the screen rather than to the memory
access speed of the 8088. Consequently, the time taken for display memory to complete an 8088 read or write access is often longer than the time taken for system
memory to complete an access, even if the 8088 lucks into hitting a free display
memory accessjust as it becomesavailable, again as shownin Figure 4.8.Any or all of
102
Chapter 4
the threefactors Ive described can result inwait states, slowing the 8088 and creating the display adapter cycle-eater.
is that display memory
If some of this is Greek to you, dont worry. The important point
is not very fast compared to normal system memory. How slow is it? Incredibly slow.
Remember how slow IBMsill-fated PCjrwas?
In case youveforgotten, Ill refresh your
memory: The PCjrwas at best only half as fast the
as PC. The PCjrhadan 8088 running
at 4.77 MHz, just like the PC-why do you suppose it was so much slower? Ill tell you
why: All the memory in the Pcjr was display memory.
In theLair of theCycle-Eaters
103
Enough said. All the memory in the PC is not display memory, however,and unless
youre thickheaded enoughto put codein display memory,the PC isnt going to run
as slowly asa PC& (Putting codeor othernon-video data in unusedareas of display
memory sounds like a neat idea-until you consider the effect on instruction
prefetching of cutting the 8088s already-poor memory access performance in half.
Running your code from display memory is sort of like running on a hypothetical
8084-an 8086 with a 4-bit bus. Not recommended!) Given that your code anddata
reside in normal system memory below the 640K mark, how great an impact does
the display adapter cycle-eater have on performance?
The answer variesconsiderably depending onwhat displayadapter andwhat display
mode were talking about. The display adapter cycle-eater is worst withthe Enhanced
Graphics Adapter (EGA) and the original Video Graphics Array (VGA). (Many
VGAs,
especially newer ones, insert
many fewer wait
states than IBMs original VGA. On the
other hand, Super
VGAs have more bytes of displaymemory to be accessed in highresolution mode.) While the Color/Graphics Adapter(CGA), Monochrome Display
Adapter (MDA), and Hercules Graphics Card (HGC) all suffer from the display
adapter cycle-eater as well, they suffer to a lesser degree. Since the VGA represents
the base standard forPC graphics now and for the foreseeable future, andsince it is
the hardest graphics adapter to wring performance from, well restrict our discussion to the VGA (and its close relative, the EGA) for the remainderof this chapter.
104
Chapter 4
course, now we know that their cardinal sin was to ignore the prefetch queue; even if
there were no wait states, their calculations would have been overly optimistic.
There are
display memory wait statesas well, however, so the calculations werenot justoptimistic
but wildly optimistic.
Text mode situations suchas the above notwithstanding, wherethe display adapter
cycle-eater really kicks
in is in graphics mode, andmost especiallyin thehigh-resolution graphics modes of the EGA and VGA. The problem hereis not that there are
necessarily more wait states per access in high-resolution graphics modes
(that varies
from adapter to adapter and mode to mode).
Rather, the problemis simply that are
many more bytes of displaymemory per screen in these modes than in
lower-resolution graphics modes and in text modes, so many more display memoryaccesses-each
incurring its share of display memory wait states-are required in order to
draw an
image of a given size.When accessing the many thousands of bytes used inthe highresolution graphics modes,the cumulative effects of display memory waitstates can
seriously impact code performance, even as measured in human time.
For example, if we assume the same 5 ps per display memory access for the EGAs
high-resolution graphics mode thatwe assumed for text mode, it would take 26,000
X 2 X 5 ps, or 260 ms, to scroll the screen once inthe EGAs high-resolution graphics
mode, mode10H. Thats more than one-quarterof a second-noticeable by human
standards, an eternity by computer standards.
That sounds pretty serious, but we did make an unfoundedassumption about memory
access speed. Lets get some hard numbers. Listing 4.11 accesses displaymemory at
the 8088s maximum speed, by way of a REP MOVSW with display memory as both
source and destination. The code in Listing 4.11 executes in 3.18 ps per access to
display memory-not as long as we had assumed, but a long time nonetheless.
o f memory a c c e s st oE n h a n c e dG r a p h i c s
: A d a p t e rg r a p h i c s
mode d i s p l a y memory a t A000:OOOO.
mov
int
mov
mov
mov
sub
mov
mov
c ld
call
rep
ax.0010h
: 1s e0hlhie- cr et s
ax.Oa000h
ds,ax
es.ax
si.si
d,i s i
cx.800h
:move
ZTimerOn
movsw
EGA g r a p h i c s
(AH=O s e l e c t s
: B I O S s e t mode f u n c t i o n ,
: w i t h AL-mode t o s e l e c t )
: mode 1 0h e x
: w r i t i n ge a c hb y t ei m m e d i a t e l yb a c k
105
: t o t h e same a d d r e s s . No memory
: l o c a t i o n sa r ea c t u a l l ya l t e r e d :t h i s
: isjustto
measurememoryaccess
: times
c aZ lTl i m e r O f f
mov
int
ax.0003h
10h
: r et et tuxotr n
mode
For comparison, lets see howlong thesame code takes when accessing normal system
RAM instead of display memory. The code in Listing 4.12, which performs a REP
MOVSW from thecode segment to the code segment, executes in 1.39 ps per display
memory access. That means that on average, 1.79 ps (more than 8 cycles!) are lost to
the display adapter cycle-eater on each access. In other words, the display adapter
cycle-eater can rnure than doubb the execution time of 8088 code!
LISTING 4.1 2 LST4- 12.ASM
: Times s w e d o f memory a c c e s s t o n o r m a sl y s t e m
: memory.
mov
mov
sub
mov
mov
cld
call
rep
ax.ds
es.ax
si.si
d i ,si
cx.800h
ZTimerOn
movsw
: s i m p l yr e a de a c ho ft h ef i r s t
: 2 K w o r d so ft h ed e s t i n a t i o ns e g m e n t ,
: w r i t i n ge a c hb y t ei m m e d i a t e l yb a c k
: t o t h e same a d d r e s s . No memory
: l o c a t i o n s a r e a c t u a l l ya l t e r e d :t h i s
: i s j u s t t o measurememoryaccess
: times
c aZ lTl i m e r O f f
Bear in mind that were talking about a worst case here; the impact of the display
adapter cycle-eater is proportional to the percent of time a given code sequence
spends accessing display memory.
106
Chapter 4
the next. As a result, code that accesses display memory less intensively than the
code inListing 4.1 1 will on average lose 4 or 5 rather than8-plus cycles to the display
adapter cycle-eater on each memory access.
Nonetheless, the display adapter cycle-eater always takes its toll on graphics code.
Interestingly, that toll becomes much higher on ATs and 80386 machines because
while those computers can executemany more instructions per microsecond than
can the 8088-based PC, it takes just as long to access displaymemory on those computers as on the8088-based PC. Remember, the limited speed
of access to a graphics
adapter is an inherentcharacteristic of the adapter,so the fastest computer around
cant access display memoryone iota faster than the adapterwill allow.
Moreovel; 486s and Pentiums, as well as recent Super VGAs, employ write-caching schemes that make display memory writes considerably faster than display
memory reads.
Along the same line, thedisplay adapter cycle-eater makes the popularexclusive-OR
animation technique, which requires paired reads and writes of display memory,
less-than-ideal for the PC. Exclusive-OR animation should be avoided in favor of
simply writing images to display memory wheneverpossible.
Another principle fordisplay adapter programming on the8088 is to perform multiple accesses to display memory very rapidly, in order to make use of as many of the
scarce accesses to display memory as
possible. This is especially important when many
large images need to be drawn quickly, since only by using virtually every available
display memory access can many bytes
be written to display memoryin a short period
of
time. Repeated string instructions are ideal for making maximum use of display
memory accesses; ofcourse, repeated string instructions can
only be used on whole
bytes, so this is another pointin favor of modifying display memorya byte at a time.
(On faster processors,however, displaymemory is so slow that it often pays to do several
In the Lair of the Cycle-Eaters
107
Cycle-Eaters: A Summary
Weve covered a great dealof sophisticated materialin this chapter, so dont feel bad
if you havent understood everything youve read; it will all become clear fromfurther reading, especially once you study, time, and tune code thatyou have written
yourself. Whats really important is that you come away from this chapter understanding thaton the8088:
The 8-bit bus cycle-eater causes each access to
a word-sized operand to be 4
cycles longer than an equivalent access atobyte-sized operand.
The prefetch queue cycle-eater can cause instruction execution times to be as
much as four times longer than the officially documented cycle times.
The DRAM refresh cycle-eater slows most PC code, with performance reductions ranging as high as
8.33 percent.
The display adapter cycle-eater typically doubles and can more than triple the
length of the standard 4-cycle access to display memory, with intensive display
memory access suffering most.
This basic knowledge about cycle-eaters puts you in a good position tounderstand
the results reported by the Zen timer, and thatmeans that youre well on your way to
writing high-performance assembler code.
108
Chapter 4
Previous
Home
Next
In theLair of theCycle-Eaters
109
Previous
Home
Next
ree little words should strike terror into the heartof anyone
who owns more
t
bag and toothbrush.
a
Our last
move was the usual
the distance from the old house to the new was only
everythingsmaller than a washing machine. We have
, kids, computers, you name it-so the moving proa sizable household
A large number-33, to be exact. I personally spent
riving back and forth between the two houses. The move took
things: What does this have to do with high-performance programming, andwhy on earth didntI rent a truck and get the move over
in one ortwo trips, saving hours of driving? As it happens, thesecond question answers the first. I didnt rent a truck because it seerned easier and cheaperto use cars-no
big truck to drive, no rentals, spread the work out moremanageably, and so on.
It wasnt easier, and wasnt even much cheaper. (It costs quite abit to drive a car 330
miles, tosay nothing of the value of 15 hours of my time.) But, at thetime, it seemed
as though my approach would be easier and cheaper. In fact, I didnt realize just how
much time I hadwasted driving back and forthuntil I sat down to write thischapter.
In Chapter 1, I briefly discussedusing restartable blocks. This, you might remember, is
the process of handling in chunks data sets too large to fit in memory so that they
113
can be processed just aboutas fast asif they did fit in memory. The restartable block
approach is very fast but is relatively difficult to program.
At the opposite end of the spectrum lies byte-by-byteprocessing, whereby DOS (or,
in less extreme cases, a group of library functions) is allowedto do all the hardwork,
so that you only have to
deal with one byte at a time. Byte-by-byte processing is easy to
program butcan be extremely slow, due to the vast overhead that results from invoking DOS each time a byte must be processed.
Sound familiar? It should. Imoved via the byte-by-byte approach, and theoverhead
of driving back and forth made for miserable performance. Renting a truck (the
restartable block approach) would have required moreeffort and forethought, but
would have paid off handsomely.
The easy, familiar approach often has nothing in its favor except that it requires
less thinking; not a great virtue when writing high-performance code-or when
moving.
And with that, lets look at afairly complex application of restartable blocks.
1 14
Chapter 5
Brute-Force Techniques
Given that no C/Ct+ library function meets our needs
precisely, an obvious alternative approach is the brute-force technique that uses memcmp() to compare every
potential matching location in the buffer to the string were searching for, as illustrated in Figure 5.1.
By the way, wecould, of course, use our own code, working with pointers in a loop, to
perform the comparison
in place of memcmp().But memcmp() will almost certainly
use the very fastREPZ CMPS instruction. However, never assume! It wouldnthurt to
use a debuggerto check out theactual machine-code implementation
of memcmp()
from yourcompiler. If necessary,you could always write your own assemblylanguage
implementation of memcmp().
1 15
Invoking memcmp() for each potential match location works, but entails considerable overhead. Each comparison requires that parametersbe pushed andthat a call
to and return from memcmp() be performed, along with a pass through the comparison loop. Surely theres a better way!
Indeed there is. We can eliminate most calls to memcmp() by performing a simple
test on each potential match
location that will reject most such locations right off the
bat. Well just check whether the first character of the potentially matching buffer
location matches the first character of the string were searching for. We could make
this check by using a pointer in a loop to scan the buffer for the next match for the
first character, stopping to check for a match
with the rest of the string only when the
first character matches, as shown in Figure 5.2.
Using memchr()
Theres yet a betterway to implement this approach, however. Usethe memchr()function, which does nothing more or less than find the next occurrence of a specified
character in a fixed-length buffer (presumably by using the extremely efficientREPNZ
SCASB instruction, although again it wouldnt hurt to check). By using memchr() to
scan for potential matches that can then be fully tested with memcmp(),we can build
a highly efficientsearch engine thattakes good advantage of the information we have
about thebuffer being searched and the string were searching for. Our enginealso
relies heavily on repeated string instructions, assuming that the memchr() and
memcmp() library functions are properly coded.
1 16
Chapter 5
Were going to go with the this approach in our file-searching program; the only
trick lies in decidinghow to integratethis approach with restartable blocks in order
to search through files larger than ourbuffer. This certainly isnt the fastest-possible
searching algorithm; as one example, the Boyer-Moore algorithm, which cleverly
eliminates many buffer locations as potential matches in the process of checking
preceding locations, can be considerably faster. However,the Boyer-Moore algorithm
is quite complex to
understand and implement, and
would distract us from our main
focus, restartable blocks, so well save it for a later chapter (Chapter14, to be precise). Besides, I suspectyoull find the approachwell use to be fast enough formost
purposes.
Now that weve selected a searching approach, lets integrate it with file handling
and searching through multiple blocks. In other words, lets make it restartable.
1 17
Otherwise, well load in another buffer full of data fromthe file, search it, and so on.
The only trick lies
in handling potentially matching sequences in the
file that start in
one buffer and end in the next-that is, sequences that span buffers. Well handle
this by copying the uncheckedbytes at the endof one buffer to the start of the next
and readingthat many fewer bytes the nexttime we fill the buffer.
The exact number of bytesto be copied from the end
of one buffer to the start of the
next is the length of the searched-for string minus 1, since thats how many bytes at
the endof the buffer cant be checkedas possiblematches (because the check would
run off the endof the buffer).
Thats really allthere is to it. Listing 5.1 shows the file-searching program. As you can
see, its not particularly complex, although a few fairly opaque lines of code are
required to handle merging the end of one block with the start of the next. The code
that searches a single block-the function SearchForString()-is
simple and compact
(as it should be,given that its by far the most heavily-executed code in the listing).
Listing 5.1 nicely illustrates the core concept of restartable blocks: Organize your
program so that you can do your processing within each block as fast as youcould if
there were only one block-which is to say at top speed-and make your blocks as
large as possible in order tominimize the overhead associated with going from one
block to the next.
LISTING 5.1
SEARCH.C
I* Program t o s e a r c h t h e f i l e s p e c i f i e d
*
*
b yt h ef i r s tc o m m a n d - l i n e
argument f o rt h es t r i n gs p e c i f i e db yt h es e c o n dc o m m a n d - l i n e
a r g u m e n t .P e r f o r m st h es e a r c hb yr e a d i n ga n ds e a r c h i n gb l o c k s
o f s i z e BLOCK-SIZE. * I
# i n c l u d e< s t d i o . h >
# i n c l u d e < f c n t l h>
# i n c l u d e< s t r i n g . h >
# i n c l u d e< a l l o c . h >
I* a l 1 o c . hf o B
r o r l a n dc o m p i l e r s ,
*/
m a l 1 o c . hf o rM i c r o s o f tc o m p i l e r s
# d e f i n e BLOCK-SIZE
0x4000
I* w e l lp r o c e s st h ef i l ei n
16K blocks
I* S e a r c h e st h es p e c i f i e dn u m b e ro fs e q u e n c e s
inthespecified
b u f f e rf o rm a t c h e st oS e a r c h s t r i n go fS e a r c h S t r i n g L e n g t h .N o t e
t h a tt h ec a l l i n g
c o d es h o u l da l r e a d yh a v es h o r t e n e dS e a r c h L e n g t h
i f n e c e s s a r yt oc o m p e n s a t ef o rt h ed i s t a n c ef r o mt h ee n do ft h e
a matchingsequence i n t h e
buffertothelastpossiblestartof
buffer.
*I
i n t SearchForString(unsigned c h a r* B u f f e r ,i n tS e a r c h L e n g t h ,
u n s i g n e dc h a r* S e a r c h s t r i n g .i n tS e a r c h S t r i n g L e n g t h )
(
u n s i g n e dc h a r* P o t e n t i a l M a t c h :
I* Search s o l o n g as t h e r e a r e p o t e n t i a l - m a t c h l o c a t i o n s
remaining *I
w h i l e ( SearchLength ) I
I* See i f t h e f i r s t c h a r a c t e r o f S e a r c h s t r i n g c a n
1 18
Chapter 5
befound
*/
*/
--
i f ( (PotentialMatch =
m e m c h r ( B u f f e r *. S e a r c h s t r i n g S
, earchLength))
break:
/* No matches i n t h i sb u f f e r */
I* The f i r s tc h a r a c t e rm a t c h e s :
a l s o matches * /
NULL )
see i f t h e r e s t o f t h e s t r i n g
i f ( S e a r c h S t r i n g L e n g t h -= 1 1 {
r e t u r n ( 1 ) : I* T h a t one m a t c h i n gc h a r a c t e r
s e a r c hs t r i n g , s o w e ' v eg o t
was t h ew h o l e
a match * I
else {
/*
*I
C h e c kw h e t h e rt h er e m a i n i n gc h a r a c t e r sm a t c h
i f ( !memcmp(PotentialMatch + 1. S e a r c h s t r i n g
S e a r c h S t r i n g L e n g t h - 1) ) {
r e t u r n c l ) ; / * We've g o t a match * I
+ 1.
1
I* The s t r i n gd o e s n ' tm a t c h :k e e pg o i n gb yp o i n t i n gp a s tt h e
p o t e n t i a lm a t c hl o c a t i o n
we j u s t r e j e c t e d * I
SearchLength
P o t e n t i a l M a t c h - B u f f e r + 1;
Buffer
P o t e n t i a l M a t c h + 1:
return(0):
--
I* No m a t c hf o u n d
m a i n ( i n ta r g c .c h a r* a r g v [ ] )
i n t Done:
i n t Handle:
i n t WorkingLength;
i n tS e a r c h S t r i n g L e n g t h ;
i n t BlockSearchLength:
i n t Found;
*/
/ * I n d i c a t e s w h e t h e rs e a r c h i s done * /
/ * H a n d l eo f f i l eb e i n gs e a r c h e d * /
/ * L e n g t ho f
c u r r e n tb l o c k
*/
/ * L e n g t ho f s t r i n g t o s e a r c h f o r
I*
/*
*/
L e n g t h t o s e a r c hi nc u r r e n tb l o c k
*/
I n d i c a t e s f i n a ls e a r c hc o m o l e t i o n
status *I
I* # obf y t e tsor e a di n t on e xbt l o c k ,
i n t NextLoadCount;
a c c o u n t i n gf o rb y t e sc o p i e df r o mt h e
lastblock */
u n s i g n e dc h a r* W o r k i n g B l o c k ;
I* B l o c k s t o r a g e b u f f e r
*I
u n s i g n e dc h a r* S e a r c h s t r i n g ;
I* P o i n t e r t o t h e s t r i n g t o s e a r c h f o r
u n s i g n e dc h a r* N e x t L o a d P t r ;
/* O f f s e ta tw h i c ht os t a r tl o a d i n g
t h en e x tb l o c k ,a c c o u n t i n gf o r
b y t e sc o p i e df r o mt h el a s tb l o c k
/ * Check f o r t h e p r o p e r
i f ( a r g c !- 3
{
number o f arguments
*/
*/
*I
p r i n t f ( " u s a g e :s e a r c hf i l e n a m es e a r c h - s t r i n g \ n " ) ;
exit(1):
1
besearched * /
i f ( (Handle
o p e n ( a r g v [ l ] . OERDONLY 1 0-BINARY))
p r i n t f ( " C a n ' t open f i l e :% s \ n " .a r g v [ l l ) ;
exit(1):
/ * T r y t o open t h e f i l e t o
>
-- -1 1
I* C a l c u l a t e t h e l e n g t h o f t e x t t o s e a r c h f o r
*I
Searchstring
argvCE1:
strlen(SearchString):
SearchStringLength
I* T r y t o g e t memory i n w h i c h t o b u f f e r t h e d a t a
*/
i f ( ( W o r k i n g B l o c k = malloc(BLOCK-SIZE))
NULL 1 I
p r i n t f ( " C a n ' t g e t enoughmemory\n"):
exit(1);
--
CrossingtheBorder
1 19
I* Load t h e f i r s t b l o c k
a t t h es t a r to ft h eb u f f e r ,
and t r y t o
fill t h e e n t i r e b u f f e r
*/
WorkingBlock:
NextLoadPtr
NextLoadCount = BLOCK-SIZE:
I* Not
done
w i t sh e a r c yh e t
*I
Done = 0:
Found = 0 :
I* Assume we w o n ' ft i n d
a match * I
/* Searchthefilein
BLOCK-SIZE chunks * /
do
I* Read i n however many b y t e sa r en e e d e dt o
fill o u tt h eb l o c k
( a c c o u n t i n gf o rb y t e sc o p i e do v e rf r o mt h el a s tb l o c k ) .o r
*I
t h er e s to ft h eb y t e si nt h ef i l e ,w h i c h e v e ri sl e s s
i f ( (WorkingLength
read(Hand1e.NextLoadPtr.
NextLoadCount)) == - 1 ) I
p r i n t f ( " E r r o rr e a d i n gf i l e% s \ n " .a r g v C 1 1 ) :
exit(1):
/*
A c c o u n tf o ra n yb y t e s
we c o p i e df r o mt h ee n d
o f thelast
*I
b l o c ki nt h et o t a ll e n g t ho ft h i sb l o c k
WorkingLength +- NextLoadPtr - WorkingBlock:
/ * C a l c u l a t et h e number o f b y t e s i n t h i s b l o c k t h a t c o u l d
a m a t c h i n gs e q u e n c et h a tl i e s
p o s s i b l y be t h e s t a r t o f
entirelyinthisblock
( s e q u e n c e st h a tr u no f ft h ee n do f
andfound
t h eb l o c k will b e t r a n s f e r r e d t o t h e n e x t b l o c k
when t h a t b l o c k i s s e a r c h e d )
*I
if
( (BlockSearchLength
WorkingLength - S e a r c h S t r i n g L e n g t h + 1) <= 0 1 {
Done = 1: / * Too f e wc h a r a c t e r si nt h i sb l o c kf o r
so t h i s
t h e r et ob e
a n yp o s s i b l em a t c h e s ,
isthefinalblock
a n dw e ' r ed o n ew i t h o u t
f i n d i n g a match
*I
I
else {
/ * S e a r c ht h i sb l o c k
*I
i f ( S e a r c h F o r S t r i n g ( W o r k i n g B 1 o c k . BlockSearchLength.
) {
S e a r c h s t r i n gS
. earchStringLength)
Found = 1:
I* We've f o u n d a match * I
Done = 1:
else
I* Copy any b y t e sf r o mt h e
end o f t h e b l o c k t h a t s t a r t
p o t e n t i a l l y - m a t c h i n gs e q u e n c e st h a tw o u l dr u no f f
*/
t h ee n do ft h eb l o c ko v e rt ot h en e x tb l o c k
i f ( SearchStringLength > 1 ) I
memcpy(WorkingB1ock.
WorkingBlock+BLOCK-SIZE - S e a r c h S t r i n g L e n g t h
SearchStringLength - 1):
/*
Setup
t ol o a dt h en e x tb y t e sf r o mt h ef i l ea f t e rt h e
*I
b y t e sc o p i e df r o mt h e
end o f t h e c u r r e n t b l o c k
NextLoadPtr = WorkingBlock + S e a r c h S t r i n g L e n g t h - 1:
NextLoadCount
BLOCK-SIZE - S e a r c h S t r i n g L e n g t h + 1:
120
Chapter 5
1.
w h i l e ( !Done ) :
/*
R e p o r tt h er e s u l t s
*/
( Found ) (
p r i n t f ( S t r i n gf o u n d \ n ) :
else I
p r i n t f ( S t r i n gn o tf o u n d \ n ) :
if
exit(Found);
/*
R e t u r nt h ef o u n d / n o ft o u n ds t a t u s
DOS e r r o r l e v e l * /
as the
1 21
that the overhead of loading the program, running through the C startup code,
opening the file, executing printf(), and exiting the program and returningto the
DOS shell are also included in my timings. Given which, it shouldbe apparent why
converting to assembly language isnt worth the trouble-the best we could do by
speeding up the search is a 10 percent orso improvement, and thatwould require
more than doubling the performance of code that already uses repeated string instructions to do most of the work.
Not likely.
By the way, dont think thatjust because very large block sizes dont muchimprove
performance, it wasnt worth using restartable blocks in Listing 5.1. Listing 5.1 runs
more than three
times more slowly witha block size of32 bytes than with a block size
122
Chapter 5
Previous
Home
Next
of 4K, and any byte-by-byte approach would surely be slower still, due to the overhead of repeated calls to DOS and/or the C stream I/O library.
Restartable blocks do minimize the overhead of DOS file-access callsin Listing 5.1;
its just that theres no way to reduce that overhead to the point where
it becomes
worth attempting to further improve the performanceof our relatively efficient search
engine. Althoughthe search engine is by no means fully optimized, its nonetheless
as fast as theres any reason forit to be, given the balanceof performance among the
components of this program.
1 program so that you know where optimization is needed and whether it will make
a significant difference beforeyou invest your time.
CrossingtheBorder
1 23
Previous
Home
Next
in
127
Oh.
Had I known I was seated next to a real, live science-fiction writer-an award-nominated
writer, by God!-I would havepumped him for all I was worth, but thepossibility had
never occurred to me. I was at a dinner put on
by a computermagazine, seated next
to an editorwho had justfinished a book about Turbo Pascal, and, gosh, it was obvious that the appropriatetopic was computers.
For once, the moral is not dont judge abook by its cover. Jeff isin fact what he
appeared to be at face value:a computerwriter and editor. However, he is more, too;
face value wasntfull value. Youll similarly find thatface value isnt always full value
in computer programming, and especially so when working in assembly language,
where many instructions have talents above and beyond their obvious abilities.
On the other hand, there are
also a numberof instructions, such as LOOP, that are
designed to perform specific functions but arent always the best instructions for
those functions. So dont judge abook by its cover,either.
Assembly language for the x86 family isnt like any other language (for which we
should, without hesitation, offer our profuse thanks). Assembly language reflects
the design of the processor rather than the waywe think, so its full of multiple
instructions that performsimilar functions, instructions with odd andoften confusing side effects, and endless ways to string together different instructions
to do much
the same things, often with seeminglyminuscule differences that can turn outto be
surprisingly important.
To produce the best code, you must decide precisely what you
need to accomplish,then
put together the sequence
of instructions that accomplishes that end most efficiently,
regardless of what the instructions are usually used for. Thats
why optimization for the
PC is an art, and its whythe best assembly language
for the x86 familywill almost always
handily outperform compiled code. With that in mind, lets look past face valueand while wereat it, Ill toss in a few examples of not judginga book by its cover.
The point to all this: You must come to regard the x86 family instructions for what
they do, notwhat youre used to thinking they do. Yes, SHL shifts a patternleft-but
a look-up table can do the same thing, and can often do it faster. ADD can indeed
add two operands, but it cantput theresult in a third
register; LEA can. The instruction set is your raw material for writing high-performance code.By limiting yourself
to thinking only in certain well-establishedways about thevarious instructions, youre
putting yourself at a substantial disadvantage every time you sit downto program.
In short, thex86 familycan do much more thanyou think-if youll use everything
it has to offer. Give it a shot!
128
Chapter 6
bx.si
a1 , [ b x l
Or you could let the processor do thearithmetic for you in a single instruction:
mov
a1 ,[ b x + s i ]
The two approaches arefunctionally interchangeable but not equivalent from a performance standpoint,and which is better depends on the
particular context. If its a
one-shot memory access, its best tolet theprocessor perform the addition;its generally fasterat doingthis than a separate ADD instruction would be. If its a memory
access within a loop, however, its advantageous on the 8088 CPU to perform the
addition outside the loop, if possible, reducing effective address calculation time
inside the loop, as in the following:
add
LoopTop:
mov
inc
1 oop
bx.si
a 1 ,[ b x ]
bx
LoopTop
129
not always better to perform the addition outside the loop on the 486. All memory addressing calculationsare free on the Pentium,however. Ill discuss
486 performance issues
in Chapters12 and 13, and the Pentium in Chapters 19 through 21.
ax.bx
ax.di
ax.2
(It would be more compact to incrementAX twice than to addtwo to it, and would
probably be faster on an 8088, but thats not what were after at the moment.) An
elegant alternative solutionis simply:
a xl e. [ab x + d i + 2 1
di ,si
di.2
or:
l e ad i
,[ s i + 2 l
Mind you,the only components LEA can add areBX or BP, SI or DI, and a constant
displacement, so its not going to replace
ADD most of the time. Also, LEA is considerably slower than ADD on an 8088, although itis just as fast as ADD on a 286 or 386
1 30
Chapter 6
when fewer than three memory addressing components are used. LEA is 1 cycle
slower than ADD on a 486 if the sum of two registers is used to point to memory,but
no slower than ADD on a Pentium. On both a 486 and Pentium, LEA can also be
slowed down by addressing interlocks.
where it can take advantage ofthe enhancedmemory addressing modesof those processors. (The 486 and Pentium offerthe same modesas the 386, so Ill refer only tothe
386 from now on.) The 386 can do two very interesting things: It can use any 32-bit
register (EAX, EBX, and so on) as the memory addressing base register and/or the
memory addressing index register, and it can multiply any 32-bit register used as an
index by two, four, or eight in the process of calculating a memory address,as shownin
Figure 6.2. Lets see what thats
good for.
Well, the obvious advantageis that any two 32-bit registers,or any 32-bit registerand
any constant, or any two 32-bit registers and any constant, can be added together,
Looking Past Face Value
13 1
with the result stored in any register. This makes the 32-bit LEA much more generally useful than the standard16-bit LEA in the role of an ADD with an independent
destination.
But what else can LEA do ona 386, besides add?
It can multiply any register used as an index. LEA can multiply only by the power-oftwo values 2,4, or 8, but thats useful more often thanyou might imagine,especially
when dealing with pointers into tables. Besides, multiplying by 2,4, or 8 amounts to
a left shift of 1, 2, or 3 bits, so we can now add up to two 32-bit registers and a
constant, and shift (or multiply) one of the registers to some extent-all with a single
instruction. For example,
lea
edi,TableBase[ecx+edx*4]
132
Chapter 6
Previous
Home
Next
some values other than powers of two. You see, the same 32-bit register can be both
base and index on the 386, and can be scaled as the index while being used unchanged as the base. That means that you can, forexample, multiply EBX by 5 with:
1 e ae b x[.e b x + e b x * 4 1
Without LEA and scaling, multiplication of EBX by 5 would require either a relatively slowMUL, along with a set-up instruction or two, or three separate
instructions
along the lines of the following
mov
edx.ebx
shl
ebx.2
e abdx d, e d x
and would in either case require thedestruction of the contents of another register.
Multiplying a 32-bit valueby a non-power-of-twomultiplier in just 2 cycles is a pretty
neat trick, even though it works only on a 386 or 486.
Thefull list of values thatLEA can multiply aregister by on a 386 or 486 is: 2, 3,
4, 5, 8, and 9. That list doesn 't include every multiplier you might want, but it
covers some common1y used ones, and the performance ishard to beat.
I'd like to extend my thanks to Duane Strongof Metagraphics for his help in brainstorming uses for the 386 versionof LEA and for pointing out the complications of
486 instruction timings.
Looking PastFaceValue
133
Previous
chapter 7
local optimization
Home
Next
optimizing
halfway
between
algorithms
andand
cycle
counting
optimizing
halfway
between
algorithms
cycle
counting
You might not think hhbut theres much to learn about performance programming
from the Great Buffald-.Fiasco.Towit:
The scene is Buffalo, j4ew York, in the deadof winter, withthe snow piled several feet
deep. Four college &dents, living in typical student housing, are frozen to the bone.
The third floor of their house, uninsulated and so cold that its uninhabitable, has an
ancient bathrooW6One fabulously cold day, inspiration strikes:
.&3g$$@@q
Hey-we could make that bathroom into asauna!
Pandemonium ensuks. Someone rushes out andbuys a gas heater, and at considerable risk to lifeand limb hooks it up to an abandoned butstill live gaspipe that once
fed a stove on the third floor. Someone else gets sheets of plastic and lines the walls
of the bathroomto keep the moisture in, andyet another studentgets a bucket full
of rocks. The remaining chapbrings up some old wooden chairs and sets them up to
make benches along the sides of the bathroom. Voila-instant sauna!
They crank up the gas heater, put the bucket of rocks in front of it, close the door,
take off their clothes, and sit down to steam themselves. Mind you,
notits
yet 50 degrees
Fahrenheit inthis room, but thegas heater is roaring. Surely warmer times await.
Indeed they do. The temperature climbs to 55 degrees, then 60, then 63, then 65,
and finally creeps up to 68 degrees.
137
So, drawing fortitude from theknowledge that our quest is a pure andworthy one,
lets resume our exploration of assembly language instructions with hidden talents
and instructions with well-known talents that are less than they appear to be. In the
process, well come to see that there is another, very important optimization level
between the algorithm/design level and the cycle-counting/individual instruction
level. Ill call this middle level local optimization; it involves focusing on optimizing
sequences of instructions rather thanindividual instructions, all withan eye toimplementing designs as efficiently as possible given the capabilities of the x86 family
instruction set.
And yes, in case youre wondering, the above story is indeed true. Was I there? Let
me put it this way: If I were, Id never admit it!
138
Chapter 7
How can this be? Dont ask me, ask Intel. On the 8088 and 80286, LOOP is indeed
faster than DEC CX/JNZ by a cycle, and LOOP is generally a little faster still because its a byte shorter and so can be fetched faster. On the 386, however, things
change; LOOP is two cycles slower than DEC/JNZ, and the fetch time for one extra
byte on even an uncached 386 generally isnt significant. (Remember that the 386
fetches four instruction bytes at a pop.) LOOPis three cycles slower than DEC/JNZ
on the 486, and the 486 executes instructions in so few cycles that those three cycles
mean that DEC/JNZ is nearly twice as fast asLOOP. Then,too, unlike LOOP, DEC
doesnt require thatCX be used, so the DEC/JNZ solution is both faster and more
flexible on the 386 and 486, and on thePentium as well. (By the way, all this is not
just theory; Ive timed the relative performances of LOOP and DEC CX/JNZ on a
cached 386, and LOOPreally is slower.)
Things are strangerstillfor LOOPk relative JCXZ, which branches ifand only if
CX is zero. JCXZ is faster than AND CXCWJZ on the 8088 and 80286, and
equivalent on the 80386-but is about twice as slow on the486!
By the way, dont fall victim to the lures of JCXZ and do somethinglike this:
and
jcxz
f i e: Il fd
i s 0, dboont h te r
The AND instruction has already set the Zero flag, so this
and
jz
fdieesl idr:tIehcsdxeo.l0afthe
SkipLoop
f i e: lIdf
i s 0 . bdootnhet r
will do justfine and is faster on all processors. Use JCXZ only whenthe Zero flagisnt
already set to reflect the status of CX.
~ocalOptimization
139
In particular, if youre going to write 386 protected mode code,which will run only
on the 386,486, and Pentium,
youd bewell advised to rethinkyour use of the more
esoteric members of the x86 instruction set. LOOP, JCXZ, the various accumulatorspecific instructions, and even the string instructions in manycircumstances no longer
offer the advantages they did on the8088. Sometimes theyrejust notany faster than
more general instructions, so theyre not worth going out of your way to use; sometimes, as with LOOP, theyre actually slower, and youd do well to avoid them
altogether in the 386/486 world. Reviewing the instruction cycle timesin theMASM
or TASM manuals, or looking over the cycle times in Intels literature, is a goodplace
to start; published cycle times are closer to actual execution times on the 386 and
486 than on the 8088, and are reasonably reliable indicators of the relative performance levels of x86instructions.
Local Optimization
One level at which assembly language programming pays off handsomely is that of
local optimization; that is, selecting the best sequence of instructions for task.
a
The key
to local optimization is viewing the 80x86 instruction set as a setof building blocks,
each with unique characteristics. Your job is to sequence those blocks so that they
perform well. It doesnt matter what the instructions are intended to do or what
their names are;all that matters is what they do.
Our discussion of LOOP versus DEC/JNZ is an excellent example of optimization
by cycle counting. Its worth knowing, but onceyouve learned it, you just routinely
use DEC/JNZ at the bottom of loops in 386/486specific code, and thats that. Besides, youllsave at most a few cycles each time, and while that helps alittle, its not
going to make all that much difference.
Now lets step back for a moment, andwith no preconceptions consider what the
x86 instruction set can do for us. The bulk of the time with both LOOP and DEC/
JNZ is taken up by branching, which just happensto beone of the slowest aspects of
every processor in the x86 family,and therest is taken up by decrementing the count
register and checking whether its zero. There may be ways to perform those tasks a
140 Chapter 7
little fasterby selecting different instructions, but they can get only so fast, and branching can't even get all that fast.
The trick, then, is not to find the fastest way to decrement a count and branch
conditionully, but rather to figure out how to accomplish the same result without
decrementing or branching as often. Remember the Kobiyashi Muru problem in
Star Trek? The same principle applies here: Redefine the problem
to one that offers better solutions.
17-1.ASM
: Program t o i l l u s t r a t e s e a r c h i n g t h r o u g h
a buffer o f a specified
: l e n g t hu n t i le i t h e r
a s p e c i f i e db y t e o r a z e r o b y t e i s
: encountered.
: A s t a n d a r dl o o pt e r m i n a t e dw i t h
.model
smal
stack
LOOP i s used.
lOOh
.data
: Sample s t r i n gt os e a r c ht h r o u g h .
S a m p l e s ltbar yibnteegl
db
' T h i si s a sample s t r i n g o f a l o n ge n o u g hl e n g t h
db
' s o t h a t raw
searching
speed
can
outweigh
any
db
' e x t r as e t - u pt i m et h a t
may b er e q u i r e d . ' . O
SAMPLE-STRING-LENGTH
$e- qSua m p l e s t r i n g
: Userprompt.
P r o'm
E
db
pnct h
ea
r racter
'
'
t o s e a rfcohr : $ '
; R e s u l ts t a t u s
ByteFoundMsg
messages.
db
0dh.Oah
'dSbp e c i f i b
e ydftoeu n d . ' , O d h . O a h , ' $ '
ZeroByteFoundMsgdb
0dh.Oah
'dZbebryot e
encountered.'.Odh.Oah.'$'
0dh.Oah
db
NoByteFoundMsg
db
' B u f f eerx h a u s t ewd i tnhron a t c h . ' , O d h . O a h . ' $ '
Local Optimization
141
,code
S t a r tp r o cn e a r
mov ax,Bdata
;point
tsot a n d a rdda t a
segment
mov ds.ax
mov
d x . o f f sPe rt o m p t
mov ah.9
:OOS sfpturni nc tgi o n
i :prompt
nt
21h
user the
mov ah.1
f :OOS
u nkcet yi g
oe
nt
i nkey
t the
;get
21h
s e taf oor cr h
mov ah,al
:putcharactertosearchforin
AH
mov cx.SAMPLE-STRING-LENGTH
:#o f b y t e s t o s e a r c h
mov s, oi f f s eSta m p l e s t r i n: gp o i ntbotu f f etsore a r c h
c a l l SearchMaxLength
:search
b u f ft eh re
mov d x , o f f s e t ByteFoundMsg
;assume we f o u n d t h e b y t e
P r i nj ct s t a t u s
:we d i d f i n d t h e b y t e
;we d i d n ' t f i n d t h e b y t e , f i g u r e o u t
:whether we found a z e r ob y t eo r
: r a no u to fb u f f e r
mov d x , o f f s e t NoByteFoundMsg
;assume we d i d n ' t f i n d a z e r o b y t e
Pj rci xnzt s t a t u s
;we d fi idnnd' t
a b zy et er o
mov dx,offset
ZeroByteFoundMsg
:we found a z e r bo y t e
Printstatus:
mov ah.9
:DOS p r i n t s t r i n g f u n c t i o n
int
: r e p o r ts t a t u s
21h
mov ah.4ch
: rt eo t u r n
OOS
Zi nl ht
S t a r t endp
: F u n c t i o nt os e a r c h a b u f f e r o f a s p e c i f i e d l e n g t h u n t i l e i t h e r
: s p e c i f i e db y t eo r
a z e r ob y t ei se n c o u n t e r e d .
: Input:
;
;
-DS:SI -
AH
CX
:
: output:
:
CX
:
c h a r a c t etrso e a r c fho r
maximum l e n g t ht o besearched(mustbe
p o i n t e rt ob u f f e rt o
be
searched
> 0)
0 i f and o n l y i f we r a no u to fb y t e sw i t h o u tf i n d i n g
e i t h e rt h ed e s i r e db y t eo r
a z e r ob y t e
DS:SI
p o i n t e tr os e a r c h e d - f o br y t e
i f f o u n do, t h e r w i s e
byte
a f t e rz e r ob y t e
i f found. o t h e r w i s e b y t e a f t e r l a s t
b y t ec h e c k e d i f n e i t h e r s e a r c h e d - f o r b y t e n o r z e r o
b y t ei sf o u n d
C a r rF
y lag
s e t i f s e a r c h e d - f obr y t ef o u n dr,e s eot t h e r w i s e
SearchMaxLength
proc
near
cld
SearchMaxLengthLoop:
1 odsb
cmp
al.ah
jz
ByteFound
and
a1
.al
jz
ByteNotFound
loop
SearchMaxLengthLoop
ByteNotFound:
c lc
ret
ByteFound:
dec
si
stc
142
Chapter 7
: g e tt h en e x tb y t e
;isthisthebyte
we want?
;yes.we'redone
w i t hs u c c e s s
0 byte?
;isthistheterminating
; y e s .w e ' r ed o n ew i t hf a i l u r e
: i t ' sn e i t h e r ,
so c h e c kt h en e x t
; b y t e , i f any
r e t u r n" n o tf o u n d "s t a t u s
p o i n tb a c kt ot h el o c a t i o na tw h i c h
we f o u n dt h es e a r c h e d - f o rb y t e
r e t u r n" f o u n d "s t a t u s
ret
SearchMaxLength
endp
end
Start
Unrolling Loops
Listing 7.2 takes a different tack, unrolling the loop so that four bytes are checked
for each LOOP performed. Thesame instructions are used inside the loop in each
listing, but Listing 7.2 is arranged so that threequarters of the LOOPSare eliminated.
Listings 7.1 and 7.2 perform exactly the same task,and they usethe same instructions in
the loop-the searching algorithm hasn't changed in anyway-but we have sequenced
the instructionsdifferently in Listing 7.2, and that makes all the difference.
: A l o o pu n r o l l e df o u rt i m e s
and t e r m i n a t e dw i t h
a specified
LOOP i s used.
.model
small
.stack
lOOh
.data
: Sample s t r i n gt os e a r c ht h r o u g h .
Sampl e S t r i n g
1 a b eb ly t e
db
' T h i si s a sample s t r i n go f a l o n g enough l e n g t h '
db
' s o t h a t raw
searching
speed
can
outweigh
any
'
d b' e x t r as e t - u pt i m et h a t
may be r e q u i r e d . ' . O
SAMPLE-STRING-LENGTH
equ
$-Samplestring
: Userprompt.
Prompt
db
: R e s u l ts t a t u s
ByteFoundMsg
messages.
db
Odh.Oah
db
'Specifie
bd
yftoeu n d . ' . O d h . O a h . ' l '
ZeroByteFoundMsgdb
0dh.Oah
encountered.'.Odh.Oah.'S'
db
' Z ebr oy t e
0dh.Oah
db
NoByteFoundMsg
db
' B u f f eerx h a u s t ewd i tnhm
o atch.',Odh.Oah.'S'
: T a b l eo fi n i t i a l ,p o s s i b l yp a r t i a ll o o pe n t r yp o i n t sf o r
: SearchMaxLength.
SearchMaxLengthEntryTable
label
word
dw
SearchMaxLengthEntry4
dw
SearchMaxLengthEntryl
dw
SearchMaxLengthEntry2
dw
SearchMaxLengthEntry3
.code
S t a r tp r o cn e a r
mov
a x , @ d a :t p
a o i stnot a n d a rdda t a
segment
mov
ds.ax
mov
d x . o f f s e t Prompt
mov
ah.9
:DOS sfpturni nc g
tion
user the
i n:prompt
t
21h
mov
ah.1
:DOS g e t key f u n c t i o n
int
21h
s efatoorkrceth:hygee t
mov
a h;cp.haualt rsaectiaotnferocrrh
AH
Local Optimization
143
mov
cx.SAMPLELSTRING-LENGTH
;#boyf tstee
osa r c h
mov
s.io f f s eSt a m p l e s t r i n g
; p o i n tt ob u f f e rt os e a r c h
c a l l SearchMaxLength
; s e a r c ht h eb u f f e r
mov
d x . o f f s e t ByteFoundMsg
;assume we f o u n d t h e b y t e
P r i nj tcs t a t u s
;we d i d f i n d t h e b y t e
;we d i d n ' t f i n d t h e b y t e , f i g u r e o u t
;whether we f o u n d a z e r ob y t eo r
; r a no u to fb u f f e r
mov
d x . o f f s e t NoByteFoundMsg
;assume we d i d n ' t f i n d a z e r ob y t e
jP
c xr izn t s t a t u s
;we d i df ni n' td
a zbeyrtoe
mov
d x , o f f s e t ZeroByteFoundMsg ;we f o u n d a z e r ob y t e
Printstatus:
mov
ah.9
;DOS p r i n t s t r i n g f u n c t i o n
int
21h
i r e p o r ts t a t u s
mov
int
Start endp
ah.4ch
21h
: F u n c t i o nt os e a r c h
: r e t u r n t o DOS
a b u f f e ro f
a s p e c i f i e dl e n g t hu n t i le i t h e r
; s p e c i f i e db y t eo r
a z e r ob y t ei se n c o u n t e r e d .
; Input:
;
AH
c h a r a c t et sroe a r c fho r
;
CX
-- -
:
DS:SI
: output:
;
maximum l e n g t ht o be
searched
(must
be
p o i n t e tr ob u f f e tr o
be
searched
>
0)
- 0 i f and o n l y
i f we r a no u o
tb
f y t e sw i t h o u ft i n d i n g
o r a z e r ob y t e
e i t h e rt h ed e s i r e db y t e
DS:SI
p o i n t e tros e a r c h e d - f o br y t e
i f f o u n do, t h e r w i s eb y t e
a f t e rz e r ob y t e
i f f o u n d ,o t h e r w i s eb y t ea f t e rl a s t
bytechecked i f n e i t h e r s e a r c h e d - f o r b y t e n o r z e r o
b y t ei sf o u n d
CarrF
y lag
s e t i f s e a r c h e d - f obr y t ef o u n dr.e s eot t h e r w i s e
CX
SearchMaxLength
proc
near
c ld
mov
bx.cx
C X , ~
add
; tcw
lhixosrh
s.oto1hipcure,hg h
: u n rcoxl .l 1e ds h r
;bcaxan. le3dcintnuihtndlhreatyeotxe
; c a l c utl ha et e
maximum
o f passes
4 times
;pointtableforthefirst,
; p o s s i b l yp a r t i a ll o o p
a w o r d - lsoi zo ek d
-up
144 Chapter 7
do
SearchMaxLengthEntry2:
: g e tt h en e x tb y t e
lodsb
:isthisthebyte
we want?
cmp
a1 ,ah
:yes.we'redonewithsuccess
jz
EyteFound
0 byte?
: i st h i st h et e r m i n a t i n g
and
a1 .a1
: y e s .w e ' r ed o n ew i t hf a i l u r e
jz
EyteNotFound
SearchMaxLengthEntryl:
: g e tt h en e x tb y t e
1 odsb
; i s t h i st h eb y t e
we want?
cmp
a1
,ah
;yes.we'redonewithsuccess
jz
ByteFound
: i st h i st h et e r m i n a t i n g
0 byte?
and
al.al
: y e s .w e ' r ed o n ew i t hf a i l u r e
jz
ByteNotFound
; i t ' s n e i t h e r . s o c h e c kt h en e x t
l o o p SearchMaxLengthLoop
; f o u rb y t e s ,
i f any
ByteNotFound:
clc
s t faot u ns"dn:"or et t u r n
ret
ByteFound:
si
l ot ch: p
aebto
toai iocnnkt
a t which
dec
: we f o u n dt h es e a r c h e d - f o rb y t e
s t a t u s " f o u n d ": r e t u r n
stc
ret
SearchMaxLength
endp
end
Start
How much difference? Listing 7.2 runs in121 ps-40 percent faster than Listing 7.1,
even though Listing 7.2 still uses LOOP rather than DEC CX/JNZ. (The loop in
Listing 7.2 could be unrolled further, too; it's just a question of how much more
memory you wantto trade for ever-decreasing performance benefits.) That's typical
of local optimization; it won't often yield the order-of-magnitude improvements that
algorithmic improvements can produce, but it can getyou a critical 50 percent or
100 percent improvement when you've exhausted all other avenues.
Thepoint issimply this: You can gain far more by stepping back a bit and thinking
of the fastest
overall way for the CPU to performa task than you can by saving a
cycle here or there usingdifferent instructions. T q to thinkatthe level ofsequences
of instructions rather than individual instructions, and learn to treat x86 instructions as building blocks with unique characteristics rather than as instructions
dedicated to spec@ tasks.
BX.l
SHL
OR
EX.CL
AX.BX
Local Optimization
145
This solution is obvious because it takes good advantage of the special ability ofthe
x86 familyto shift or rotate by the variable number of bits specified by CL. However,
it takes an average of about 45 cycles on an8088. Its actually far faster to precalculate
the results, pass the bit number in BX, and look the shifted bit up, as shown in
Listing 7.3.
BX.l
AX.ShiftTableCBX1
ShiftTable
LABEL
BIT-PATTERN-0001H
REPT 1 6
DW
BIT-PATTERN
BIT-PATTERN-BIT-PATTERN
ENOM
: p r e p a r ef o rw o r ds i z e dl o o ku p
; l o o ku pt h eb i t
and OR it i n
WORD
SHL 1
Even though it accesses memory, this approach takes only 20 cycles-more than
twice as fast asthe variable shift. Once again,we were able toimprove performance
considerably-not by knowing the fastest instructions, but by selecting the fastest
sequence of instructions.
In the particular example
above, we once again run into thedifficulty of optimizing
across the x86 family.The table lookup is faster on the 8088 and 286, but its slightly
slower on the 386 and no faster on the 486. However, 386/486specific code could
instruction, along
use enhanced addressing to accomplish the whole job in just one
the lines of the code snippet inListing 7.4.
LISTING 7.4
17-4.ASM
EAX,ShiftTableCEBX*4]
:look
up
the
OR
ShiftTable
LABEL
BIT-PATTERN-0001H
REPT 32
DD
BIT-PATTERN
BIT-PATTERN-BIT-PATTERN
ENDM
b i t and OR i t i n
DWORD
SHL 1
Besides illustrating the advantages of local optimization, this example also shows
that it generally pays toprecalculate results; this is oftendone at or before assembly time, butprecalculated tables can also be built at run time. This is merely one
aspect of a fundamental optimization rule: Move as much work as possibleout of
your critical code by whatever means necessary.
Flags
The NOT instruction flips all the bits in the operand, from 0 to 1 or from 1 to 0.
Thats as simple as could be, but NOT nonetheless has a minor but interestingtalent: It doesnt affect the flags. That can be irritating; I once spent a good hour
tracking
146 Chapter 7
down a bug caused by my unconscious assumption that NOT does set the flags. After
all, everyother arithmetic and logical instruction sets the flags; why not NOT? Probably because NOT isnt considered to be an arithmetic or logical instruction at all;
rather, its a data manipulation instruction,
like MOV and the various rotates. (These
are RCR, RCL, ROR, and ROL, which affect only the Carry and Overflow flags.)
NOT is often used for tasks, such as flipping masks, where theres no reason to test
the state of the result, and in that context it can be handy to keep the flags unmodified for later testing.
Besides, fyou want to NOT an operand and settheJags in the process, you can
just XOR it with -1. Put another way, the only functional d@rence between NOT
AX and XOR AX,OFFFFH is that XOR modifies the Jags and NOT doesn t.
The x86 instruction set offers manyways to accomplish almost any task.
Understanding the subtle distinctions between the instructions-whether and which flags are
set, for example-can be critical when youre trying to optimize a code sequence
and youre running outof registers, or when youre trying to minimize branching.
LISTING 7.5
c LC
LOOP-TOP:
MOV
ADC
INC
INC
INC
INC
17-5.ASM
; c l etChaaetirfn
rhorierytai da dl i t i o n
A X . [ S I ] ; g e nt e x st o u r c eo p e r a n dw o r d
COI1,AX;add w i t hC a r r yt od e s to p e r a n dw o r d
SI
;npseot xoiuntortpc e r w
a nodr d
SI
; p o i n tt on e x td e s to p e r a n dw o r d
DI
Dl
LOOP LOOP-TOP
If ADD were used, the Carry flag would have to be saved between additions, with
code along thelines shown in Listing 7.6.
Local Optimization
147
Previous
Home
; c l ce
t haaitfnehrroriertayi da dl i t i o n
AX.CS11
[DII.AX
MOV
ADC
LAH F
ADD SI.2
ADD D I . 2
SAHF
LOOP
LOOP-TOP
;get
next
source
operand
word
;add w i t hc a r r yt od e sot p e r a n dw o r d
; as sectithdaferl rayg
: p o i n t t o nextsourceoperandword
; p o i n tt on e x td e s to p e r a n dw o r d
; r e s t o r et h ec a r r yf l a g
Its not that theListing 7.6approach is necessarily better or worse; that depends onthe
processor and the situation. The Listing 7.6approach is di&mt, and if you understand
the differences, youll be able to choose
the best approach forwhatever code youh a p
pen to write. (DEC has the same property of preserving the Carry flag,by the way.)
There are acouple of interesting aspects to the last example. First, note that LOOP
doesnt affect any flags at all; this allows the Carry flag to remain unchanged from
one addition to the next. Not altering the arithmeticflags is a common characteristic ofprogram control instructions (as opposed to arithmetic
and logical instructions
like SUB and AND, which do alter the flags).
The rule is not that the arithmetic Jags change whenever the CPU performs a
calculation; rathei: theflags change wheneveryou execute an arithmetic, logical,
orflag control (such as CLC to clear the Carryflag) instruction.
Not only do LOOP and JCXZ not alter theflags, but REP MOVS, which counts down
CX to 0, doesnt affect the flags either.
The other interesting point about the last example is the use of LAHF and SAHF,
which transfer thelow byte ofthe FLAGS register to and from AH, respectively. These
instructions were created to help provide compatibility with the 8080s (thats 8080,
not 8088) PUSH PSW and POP PSW instructions, but turnout to be compact(one
byte) instructions for saving and restoring the arithmetic flags. A word of caution,
however: SAHF restores theCarry, Zero, Sign, Auxiliary Carry,and Parity flags-but
not the Overflow flag, which resides in the highbyte of the FLAGS register. Also, be
aware that LAHF and SAHF provide a fast way to preserve the flags on an 8088 but
are relatively slow instructions on the 486 and Pentium.
There are times when its a clear liability that INC doesnt set the Carry flag. For
instance
INC
AOC
AX
DX.0
does not increment the 32-bit valuein DX:AX. To do that, youd need the following:
ADD
ADC
AX.l
DX.0
148 Chapter 7
Next
Previous
Home
Next
jumping languages
languages when
when you
you know
know it'll
it'll help
help
jumping
151
Which bringsus without missing a beat to this chapters theme, speedingup C with
assembly language. When you seek to speed up a C programby converting selected
parts of it (generally no more than
a few functions) to assembly language, make sure
you end upwith high-performance assembly language code, not fine-tuned C code.
Compilers like Microsoft C/C++ andWatcom C are by now pretty good atfine-tuning C code,
and youre not likely todo much better
by taking the compilers assembly
language output andtweaking it.
Apropos of which, when was the last time you heard of TerryJacks?
152
Chapter 8
Assembly language optimization is the finaland f a r from the only step in the optimization chain, and assuch should beperformed last; converting to assembly too
soon can lock in your codebefore the design is optimal. Atthe very least,conversion to assembly tends to makefuture changes and debuggingmore dijficult, slowing
you down and limitingyour options.
C compilers work within the stack frame model,whereby variables reside in ablock
of stack memory and areaccessed via offsets from BP. Compilers may store a couple
of variables in registers and may briefly keep othervariables in registers when theyre
used repeatedly, but thestack frame is the underlying architecture.Its a nice architecture; its flexible, convenient, easy to program, andmakes for fairly compact code.
However, stackframes have a few drawbacks. They must
be constructed and destroyed,
which takes both time and code. They are so easy to use that they tend to bias the
assembly language programmerin favor of accessing memory variables more often
than might be necessary. Finally, youcannot use BP as a general-purpose register if
SpeedingUp C with AssemblyLanguage
153
you intend to access a stack frame, andhaving that seventh register available issometimes useful indeed.
That doesnt meanyou shouldnt use stackframes, which are useful and often necessary. Just dont fall victim to their undeniable charms.
C compilers are not terrific at handling segments. Some compilers can efficiently
handle
a single far pointer used in a loop by leaving ES set for the duration of the loop. But two
far pointers used in the same loop confuse every compiler Ive seen, causing the full
segment:offset address to be reloaded each time either pointer is used.
This particularly affects performance in 286 protected mode (under OS/2 1.X or
the Rational DOS Extendel;for example) because segment loads in protected mode
take a minimum of 17 cycles, versus a mere 2 cycles in real mode.
In assembly language you have full control over segments. Use it, and, if necessary,
reorganize your code to minimize segment loading.
You can write goodassembly, bad assembly, or assembly that is virtually indistinguishable from compiled code; you are more likely than not to write the latter if
you think that optimization consists of tweaking compiled C code.
Sure, you can probably use the registers more efficiently and take advantage of an
instruction or two that the compiler missed, but the codeisnt going to get a whole
lot faster that way.
True optimization requires rethinking your code to take advantage of assembly language. A C loop that searches through an integerarray for matches might compile
154
Chapter 8
a x . [ b p - 8: G
1 tehst ee a r c h e d - f ov ar l u e
; I s t h i s a match?
[dil.ax
;Yes
Match
d i .2
; N o , a d v a n c et h ep o i n t e r
si
: D e c r e m e n tt h el o o pc o u n t e r
i f t h e raerme o rdea tpao i n t s
LoopTop
:Continue
natehrvxreaatl yu e
:Does i t m a t ct hs ee a r c h e d - f vo ar l u e ?
:Yes
:No. c o n t i n u e i f t h e raerm
e o rdea tpao i n t s
The ultimate in assembly language optimization comeswhen you change the rules;
that is, when youreorganize the entire programto allow the use of better assembly
language code in the small section of code that most affects overall performance.
For example, consider thatthe datasearched inthe last example is stored in an
array
of structures, with each structure in thearray containing otherinformation as well.
In this situation, REP SCASW couldnt be usedbecause the data searched through
wouldnt be contiguous.
SpeedingUp
Organizing a program h data so that the performance of the critical sections can
be optimized is a key part of design, and one thats easily shortchanged unless,
during the design stage, you thoroughly understand and work to bring together
your data needs, the critical sections of your program, and potential assembly
language optimizations.
LISTING8.118/*
# i n c l u d e< s t d i o . h >
# i f d e f -TURBOC#i
n c l u d e < a 11 oc. h>
#else
#i
n c l u d e <mal 1 oc. h>
#endi f
156
1.C
P r o g r a mt os e a r c ha na r r a ys p a n n i n g
a linkedlistofvariables i z e db l o c k s ,f o ra l le n t r i e sw i t h
a s p e c i f i e d I D number,
a n dr e t u r nt h ea v e r a g eo ft h ev a l u e so fa l ls u c he n t r i e s .E a c ho f
t h ev a r i a b l e - s i z e db l o c k s
may c o n t a i na n yn u m b e ro fd a t ae n t r i e s ,
*I
s t o r e da sa na r r a yo fs t r u c t u r e sw i t h i nt h eb l o c k .
Chapter 8
v o i dm a i n ( v o i d ) :
voidexit(int1;
*):
u n s i g n e di n tF i n d I D A v e r a g e ( u n s i g n e di n t .s t r u c tB l o c k H e a d e r
/ * S t r u c t u r et h a ts t a r t se a c hv a r i a b l e - s i z e db l o c k
*/
s t r u c tB l o c k H e a d e r
{
s t r u c tB l o c k H e a d e r* N e x t B l o c k :
/ * P o i n t e rt on e x tb l o c k ,o r
NULL
if thisisthelastblockinthe
l i n k e d l i s t */
u n s i g n ei B
nd tl o c k C o u n t :
/ * The
number
oDfa t a E l e m e ne tn t r i e s
i nt h i sv a r i a b l e - s i z e db l o c k
*/
I:
/ * S t r u c t u r e t h a t c o n t a i n s one e l e m e n to ft h ea r r a yw e ' l ls e a r c h
s t r u c tD a t a E l e m e n t {
/ * I D // f o r a r r a y e n t r y
*/
unsigned i n t I D :
/ * V a l u eo fa r r a ye n t r y
*/
u n s i g n e d i n tV a l u e :
*/
I:
v o i dm a i n ( v o i d )
{
i n t i.j:
u n s i g n e di n tI D T o F i n d :
s t r u c tB l o c k H e a d e r * B a s e A r r a y B l o c k P o i n t e r . * W o r k i n g B l o c k P o i n t e r :
s t r u c tD a t a E l e m e n t* W o r k i n g D a t a P o i n t e r :
s t r u c tB l o c k H e a d e r* * L a s t B l o c k P o i n t e r :
"):
p r i n t f ( " 1 D /I f o r w h i c h t o f i n d a v e r a g e :
scanf("%d".&IDToFind):
/ * B u i l d an a r r a ya c r o s s 5 b l o c k s , f o r t e s t i n g
*/
/ * A n c h o rt h el i n k e dl i s tt oB a s e A r r a y B l o c k P o i n t e r
*/
LastBlockPointer
&BaseArrayBlockPointer:
/ * C r e a t e 5 b l o c k so fv a r y i n gs i z e s
*/
f o r (i 1: i < 6 : i++)
I
/ * T r y t o g e t memory f o r t h e n e x t b l o c k
*/
i f ((WorkingBlockPointer
( s t r u c tB l o c k H e a d e r * ) m a l l o c ( s i z e o f ( s t r u c tB l o c k H e a d e r )
s i z e o f ( s t r u c tD a t a E l e m e n t )
* i * 10))
NULL) {
exit(1):
I
/* S e tt h e
/I o f d a t a e l e m e n t s i n t h i s b l o c k
*/
WorkingBlockPointer->Blockcount = i * 10:
/ * L i n k t h e new b l o c k i n t o t h e c h a i n
*/
*LastBlockPointer
WorkingBlockPointer:
/* P o i n tt ot h ef i r s td a t af i e l d
*/
WorkingDataPointer =
( s t r u c tD a t a E l e m e n t * ) ( ( c h a r* ) W o r k i n g B l o c k P o i n t e r
s i z e o f ( s t r u c tB l o c k H e a d e r ) ) :
/ * Fill t h ed a t af i e l d sw i t h
I D numbersandvalues
*/
f o r (j
0: j < (i* 1 0 ) : j++, W o r k i n g D a t a P o i n t e r + + ) {
WorkingDataPointer->ID
j:
WorkingDataPointer->Value
i * 1000 + j :
/*
Remember where t o s e t l i n k f r o m t h i s b l o c k t o t h e n e x t
LastBlockPointer
&WorkingBlockPointer->NextBlock:
*/
NULL t o i n d i c a t e
no m o r eb l o c k s
*/
t h a tt h e r ea r e
WorkingBlockPointer->NextBlock
NULL:
I D %d: %u\n".
p r i n t f ( " A v e r a g eo fa l le l e m e n t sw i t h
I D T o F i n dF, i n d I D A v e r a g e ( I D T o F i n d ,
BaseArrayBlockPointer)):
157
I* S e a r c h e st h r o u g ht h ea r r a yo fD a t a E l e m e n te n t r i e ss p a n n i n gt h e
l i n k e d l i s t o f v a r i a b l e - s i z e db l o c k s ,s t a r t i n gw i t ht h eb l o c k
all e n t r i e s w i t h
I D S matching
p o i n t e dt ob yB l o c k P o i n t e r .f o r
S e a r c h e d F o r I D .a n dr e t u r n st h ea v e r a g ev a l u eo ft h o s ee n t r i e s .
no m a t c h e sa r ef o u n d ,z e r o
i sr e t u r n e d *I
If
u n s i g n e di n tF i n d I D A v e r a g e ( u n s i g n e di n tS e a r c h e d F o r I D .
s t r u c tB l o c k H e a d e r* B l o c k P o i n t e r )
{
s t r u c tD a t a E l e m e n t* D a t a P o i n t e r :
u n s i g n e di n t IDMatchSum:
u n s i g n e di n tI D M a t c h C o u n t ;
u n s i g n e di n tW o r k i n g B l o c k C o u n t :
IDMatchSum
0:
all t h e l i n k e d b l o c k s u n t i l t h e l a s t b l o c k
( m a r k e dw i t h a N U L L p o i n t e r t o t h e n e x t b l o c k ) h a s b e e n
searched * I
IDMatchCount
I* S e a r c ht h r o u g h
do C
I* P o i n t t o t h e f i r s t D a t a E l e m e n t e n t r y w i t h i n t h i s b l o c k
DataPointer
( s t r u c tD a t a E l e m e n t * ) ( ( c h a r* ) B l o c k P o i n t e r
+
s i z e o f ( s t r u c tB l o c k H e a d e r ) ) :
I* S e a r c h all t h eD a t a E l e m e n te n t r i e sw i t h i nt h i sb l o c k
a n da c c u m u l a t ed a t af r o ma l lt h a tm a t c ht h ed e s i r e d
f o r( W o r k i n g B l o c k C o u n t - 0 ;
*I
I D *I
WorkingBlockCount<BlockPointer->BlockCount:
W o r k i n g B l o c k C o u n t t cD
. ataPointer++)
i nt h ev a l u e
I* If t h e I D matches,add
m a t c hc o u n t e r * I
a n di n c r e m e n tt h e
i f (DataPointer->ID
SearchedForID) {
IDMatchCounttc:
IDMatchSum +- D a t a P o i n t e r - > V a l u e :
I* P o i n t t o t h e n e x t b l o c k , a n d c o n t i n u e
i s n ' t NULL *I
w h i l e( ( B l o c k P o i n t e r
I* C a l c u l a t et h ea v e r a g eo f
a s l o n ga st h a tp o i n t e r
- BlockPointer->NextBlock)
all matches * I
i f (IDMatchCount
0)
return(0):
/* A vdoi vi di sbiyo n
else
return(1DMatchSum I I D M a t c h C o u n t ) ;
!-
NULL):
0 *I
The main body of Listing 8.1 constructs a linkedlist of memory blocks of various
sizes and stores an array of structures across those blocks, as shownin Figure8.2. The
formatches to
function FindIDAverage in Listing 8.1 searches through that array all
of all such matches.
a specified ID number and returns the average value
FindIDAverage contains two nested loops, the outer one repeating once for each
linked block and the inner one repeating once forarray
each
element in each
block.
The innerloop-the critical one-is compact, containingonly four statements, and
should lenditself rather well to compiler optimization.
158
Chapter 8
BlockHeader->NextBlock
BlockHeader->BlockCount
DataElement[Ol->ID
DataElement[Ol->Value
DataElementllI->ID
DataElementC11->Value
Blockneader->NextBlock
Blockneader->Blockcount
DataElementCOI->ID
DataElementCOI->Value
Blockneader->NextBlock
Blockneader->Blockcount
DataElement[Ol->ID
DataElement[Ol->Value
DataElement[ll->ID
DataElementC11->Value
DataElementCPI->ID
DataElement[ZI->Value
1000-
-----
Array Element 0
Array Element 1
Array Element 2
lobi"""
2000"""
F l )
- -
--
- _NULL -
--
--
Array Element 3
3000"""
3ooi
3002 -
-----
Array Element 4
-----
Array Element 5
As it happens, Microsoft C/C++ does optimize the inner loop of FindIDAverage nicely.
Listing 8.2 shows the code Microsoft C/C++ generates for the inner loop, consisting of
a mere seven assembly language instructions inside the loop. The compiler is smart
enough to convert the loop index variable, which countsup but is used for nothing but
counting loops, into a count-down variableso that the LOOP instruction can be used.
LISTING 8.218-2.COD
: Code g e n e r a t e db yM i c r o s o f t
:I***
: I ***
f o r (WorkingBlockCount-0:
C f o ri n n e rl o o p
o f FindIDAverage.
WorkingBlockCount<BlockPointer->BlockCount:
WorkingBlockCount++.
DataPointer++)
{
mov
WORD PTR [ ;bW
p -o6r1k .i n0 g B l o c k C o u n t
mov
bx.WORD
PTR
[bp+61
:B1 o c k P o i n t e r
cmp
WORD PTR [bx+21,0
I FB264
je
cx.WORD
movPTR
[bx+21
WORD PTR [ b p: W
- 6ol .r ck xi n g B l o c k C o u n t
add
di.WORD
mov
PTR :IDMatchSum
[bp-21
dx.WORD
PTR mov
: I[ D
b pM- a4 t1c h C o u n t
IL20004:
:I*** i f ( D a t a P o i n t e r - > I O
SearchedForID) {
ax.WOR0
mov
PTR [ s i ]
WORD PTR [ b p: S
+ 4e la, racxh e d F o r I D
cmp
$1265jne
;
I ***
159
I ***
dx
I ***
IOMatchCount++;
inc
I ***
I ***
I
265:
si.4
IDMatchSum += D a t a P o i n t e r - > V a l u e ;
d i .WORD PTR [ s i + 2 1
add
add
1 oop
tL20004
WORD PTR C b: IpD- 2Ml a. dt ci h S u m
mov
WORD PTR [ b p: I-D4 M
l . da xt c h C o u n t
mov
$FB264:
equ
; T y p i c a l l yo p t i m i z e da s s e m b l yl a n g u a g ev e r s i o no fF i n d I D A v e r a g e .
4
; P a s speadr a m eot ef frtshienet s
S e a r c h e d F oer qI Du
6
: s t af cr ak m
( sekoivpeurs h e d
BP
B l o c k P o i n et eqru
; a n dt h er e t u r na d d r e s s )
e q uoNcekx t B l
0
: F i e l od f f s e tisnt r u cBt l o c k H e a d e r
2
B1 ockCount
equ
BLOCK-HEAOERLSIZE equ
4
:Number obf y t e isns t r u cBt l o c k H e a d e r
D
I
equ
0
: s t r u cDt a t a E l e m e nf it e lodf f s e t s
2
Value
DATALELEMENT-SIZE equ
4
:Number obf y t e isns t r u cDt a t a E l e m e n t
.model
small
.code
p u b _l i F
c indIOAverage
160
Chapter 8
- F i n d I D A v neperraaogrce
f r ascmtaa:elScl eakrv' seb p p u s h
mov sf rtaaom
cukreb: tPpoo, si np t
C r e g ivsat er ir
es ab1
: P r edpsi uesr hv e
si push
= 0
:IDMatchSum
dx.dx sub
mov :IDMatchCount
bx.dx
0
mov
s i . [ b p + B l o c k P o i n t;ePro] if nibtrtloseotrc k
: I D w e l' or eo k ifnogr
mov
ax.[bp+SearchedForIDl
; S e a r c ht h r o u g ha l lt h el i n k e db l o c k su n t i lt h el a s tb l o c k
: ( m a r k e dw i t h
a NULL p o i n t e rt ot h en e x tb l o c k )h a s
been s e a r c h e d .
BlockLoop:
: P o i n tt ot h ef i r s tD a t a E l e m e n te n t r yw i t h i nt h i sb l o c k .
dl ei a
.[si+BLOCKpHEADER-SIZEl
: S e a r c ht h r o u g ha l lt h eD a t a E l e m e n te n t r i e sw i t h i nt h i sb l o c k
: a n da c c u m u l a t ed a t af r o ma l lt h a tm a t c ht h ed e s i r e d
ID.
mov
cx.~si+BlockCountl
j cDxozN e x t B l o c k
:No d at ihtnbai lso c k
IntraBlockLoop:
;Do we havean D
I match?
cmp
[di+IDl.ax
;No m a t c h
jnz
NoMatch
bx
inc
:We have a match:IDMatchCount++:
d x . [ d i +aVdadl u e ]
:IDMatchSum += D a t a P o i n t e r - > V a l u e :
NoMatch:
add
di.DATApELEMENT-SIZE
; p o i nttothnee xetl e m e n t
1 oop
I n t r a Bolc k L o o p
: P o i n tt ot h en e x tb l o c ka n dc o n t i n u e
i f t h a tp o i n t e ri s n ' t
NULL.
DoNextBlock:
mov
s i . [ s i + N e x t B l o c k: G
l epto i n t ettorhnee xbtl o c k
: I s i t a NULL p o i n t e r ?
and
si.si
B l o c kj Ln oz o p
:No. c o n t i n u e
: C a l c u l a t et h ea v e r a g eo fa l lm a t c h e s .
:Assume
ax.ax
sub
m
wea t ncf ohuens d
bx.bx and
0
jz
Done
:We d i fdi nn 'dt
maant cyr he et usr, n
d i v i :sPfi oar enrx p.xdacxrheg
/ IDMatchCount
I D: RMeat tucrhbnSx u m d i v
C r ev ga ir si at eb rl e s
popDone:
: R e s t osri e
pop
di
POP
bP
f r :aR
scmtaeaelscl tekor r' se
ret
-FindIDAverage
ENDP
end
1 61
LISTING 8.418-4.ASM
: H e a v i l yo p t i m i z e da s s e m b l yl a n g u a g ev e r s i o no fF i n d I D A v e r a g e .
: F e a t u r e sa nu n r o l l e dl o o pa n dm o r ee f f i c i e n tp o i n t e ru s e .
4
S e a r c h eedqFuo r I D
; P a s s e dp a r a m e t e ro f f s e t si nt h e
B l o c kePq ou i n t e r
equ
: s t a c kf r a m e( s k i po v e rp u s h e d
: a n dt h er e t u r na d d r e s s )
BP
0
s; F
B
ot rifl uefoi nsclcdetktHs e a d e r
N e x teBql ou c k
Blockcount
equ
2
4
BLOCK-HEADER-SIZE equ
;Number o f b y t e s i n s t r u c t B l o c k H e a d e r
ID
0
; s t rDu ac t a E l e mfeioenfltfds e t s
equ
2
Value
DATA-ELEMENT-SIZE equ
4
:Number boyf t e s
i n s t r uDcat t a E l e m e n t
.model
small
.code
p u b l-iFc i n d I D A v e r a g e
-F i n d I D A v e r apgre
noeca r
c:fS
asr latlam
evbcerpek' us s h
mov
bsop;tf P
aru. satcoropm
ki net
C r e gv iasrt iearb l e s
: P r e sd eipruvseh
s ip u s h
mov
d: P
i .rdesfpoar r e
SCASW
mov
es.di
cld
= 0
:IDMatchSum
dx.dx
sub
mov
: IbDxM. daxt c h C o u n t
0
mov
s i . [ b p + B l o c k P o i n t;ePr o] ifnibtrtolesotr c k
: I D w e l' or eo k ifnogr
mov
ax.[bp+SearchedForIDl
: S e a r c ht h r o u g ha l lo ft h el i n k e db l o c k su n t i lt h el a s tb l o c k
: ( m a r k e dw i t h a NULL p o i n t e rt ot h en e x tb l o c k )h a sb e e ns e a r c h e d .
BlockLoop:
: P o i n tt ot h ef i r s tD a t a E l e m e n te n t r yw i t h i nt h i sb l o c k .
l edai
, [si+BLDCK-HEADER-SIZE]
: S e a r c ht h r o u g ha l lt h eD a t a E l e m e n te n t r i e sw i t h i nt h i sb l o c k
: a n da c c u m u l a t ed a t af r o ma l lt h a tm a t c ht h ed e s i r e d
ID.
mov
c x . C s i + B l o c k C o u n t ] :Number oef l e m e n tistnh ibs l o c k
j c xDz o N e x t B l o c;kS k itph ibsl o c k
i f i t ' s empty
mov
b p . c: *x* * s t a cf rka mnl oeon g ae vr a i l a b l e * * *
cx.7 add
schxr . 1
;Number or ef p e t i t i o not shfuen r o l l e d
: loop
(Blockcount + 7) / 8
s hc rx . 1
cx.1 shr
a nb dp:.G
7 e n e r atthe en tpr yo ifntohtre
; f i r spt o, s s i bpl ya r t i pa al st hs r o u g h
sbhpl . 1
: t huen r o l l el od oapn d
j mc ps : [ L o o p E n t r y T a b l e + b p l
: v e c t o rt ot h a te n t r yp o i n t
2
align
L o o p E n t r y T al aw
bbloer ld
dw
LoopEntryB.LoopEntryl,LoopEntry2~LoopEntry3
dw
LoopEntry4.LoopEntry5.LoopEntry6.LoopEntry7
P1
M-IBL
macro
Nl o M
c aal t c h
LoopEntry&Pl&:
:Do we have
an
I D match?
scasw
jnz
NoMatch
:No m a t c h
:We h a v e a m a t c h
; I D M a t cbhxC o u n ti+n+c ;
d;xIaD
. [dM
ddia] t c h S u m
+- D a t a P o i n t e r - > V a l u e :
NoMatch:
add
di.DATA-ELEMENT-SIZE-2
: p o i nttothnee xetl e m e n t
: (SCASW advanced 2 b y t e sa l r e a d y )
162
Chapter 8
endm
2
a1 i g n
IntraBlockLoop:
M-IBL
8
M-IBL
7
MKIBL
6
M-IBL
5
M-IBL
4
M-IBL
3
M-IBL
2
1
MKIBL
I n t lroaoBpl o c k L o o p
: P o i n tt ot h en e x tb l o c ka n dc o n t i n u e
i f t h a tp o i n t e ri s n ' t
NULL.
DoNextBlock:
: G e tp o i n t e rt ot h en e x tb l o c k
mov
. [ s si +i N e x t B l o c k ]
and
si.si
: I s i t a NULL p o i n t e r ?
: N oc. o n t i n u e
B l o c k Lj on oz p
: C a l c u l a t et h ea v e r a g eo fa l lm a t c h e s .
:Assume wematches
found
no
ax.ax
sub
bx.bx and
jz
Done
:We d i df ni n' td
ma an tyc hr eetsu, r n
0
:adPxxi vrc.efidhospxgiraorne
/ IDMatchCount
:IRDeMt uabrtxcn h Sdui vm
C r e gv iasrti ea rb l e s
pop
Done:
; R e s st oi r e
di
pop
POP : R fcersasabtltm
alpoecerrke' s
ret
ENDP
-FindIDAverage
end
Listings 8.5 and 8.6 together go the final step and change the
rules infavor of assembly language. Listing 8.5 creates thesame listof linked blocks asListing 8.1. However,
instead of storing an array of structures within each block, it stores two arrays in each
block, one consisting of ID numbers and the other
consisting of the corresponding
values, as shown in Figure 8.3. No information is lost; the datais merely rearranged.
LISTING 8.5
/*
18-5.C
Program t os e a r c h an a r r a ys p a n n i n g
a l i n k e dl i s to fv a r i a b l e s i z e db l o c k s ,f o ra l le n t r i e sw i t h
a s p e c i f i e d ID number,
a n dr e t u r nt h ea v e r a g eo ft h ev a l u e so fa l ls u c he n t r i e s .
Each o f
t h ev a r i a b l e - s i z e db l o c k s
may c o n t a i na n yn u m b e ro fd a t ae n t r i e s .
o f t w os e p a r a t ea r r a y s ,o n ef o r
I D numbersand
s t o r e di nt h ef o r m
*/
o n ef o rv a l u e s .
# i n c l u d e< s t d i o . h >
B i f d e f -TURBOC#i
n c l u d e < a 11 oc. h>
#else
# i n c l u d e < m a l 1 oc. h>
#endi f
v o i dm a i n ( v o i d 1 :
voidexit(int);
e x t e r nu n s i g n e di n tF i n d I D A v e r a g e Z ( u n s i g n e di n t .
s t r u c tB l o c k H e a d e r
*):
163
BlockHeader->NextBlock
BlockHeader->BlockCount
IDCOl
IOCll
Val ue[Ol
ValueCll
BlockHeader->BlockCount
IDCOI
ValueCOI
"""""""_
Array Elements
" " "
1)-
2000- - - -
Element
Array
BlockHeader->NextBlock
BlockHeader->Blockcount
IDCOl
IDCll
IDC21
ValueCOl
ValueCll
ValueCZl
A r r a y Elements
3 thl,ough 5
S t r u c t u r et h a ts t a r t se a c hv a r i a b l e - s i z e db l o c k
*/
s t r u c tB l o c k H e a d e r
I
s t r u c tB l o c k H e a d e r* N e x t B l o c k :
/ * P o i n t e rt on e x tb l o c k .
o r NULL
i f thisisthelastblockinthe
l i n k e d l i s t */
u n s i g n ei B
nd tl o c k C o u n t ;
/ * The
number
D
o fa t a E l e m e en nt t r i e s
i nt h i sv a r i a b l e - s i z e db l o c k
*/
1:
v o i dm a i n ( v o i d 1
{
i n t i.j:
u n s i g n e di n tI D T o F i n d :
s t r u c tB l o c k H e a d e r * B a s e A r r a y B l o c k P o i n t e r , * W o r k i n g B l o c k P o i n t e r :
i n t* W o r k i n g D a t a P o i n t e r ;
s t r u c tB l o c k H e a d e r* * L a s t B l o c k P o i n t e r :
p r i n t f ( " 1 D I/f o r w h i c h t o f i n d a v e r a g e :
scanf("%d".&IDToFind):
/*
/*
'I):
B u i l d an a r r a ya c r o s s
5 b l o c k s ,f o rt e s t i n g
A n c h o rt h el i n k e dl i s tt oB a s e A r r a y B l o c k P o i n t e r
LastBlockPointer
&BaseArrayBlockPointer:
/ * C r e a t e 5 b l o c k so fv a r y i n gs i z e s
*/
f o r ( i 1; i < 6: i++)
I
/ * T r y t o g e t memory f o r t h e n e x t b l o c k
*/
164
Chapter 8
- -
*/
*/
i f ((WorkingBlockPointer =
( s t r u c tB l o c k H e a d e r * ) m a l l o c ( s i z e o f ( s t r u c tB l o c k H e a d e r )
s i z e o f ( i n t ) * 2 * i * 1 0 ) ) == NULL) {
exit(1):
/ * S e tt h en u m b e ro fd a t ae l e m e n t si nt h i sb l o c k
*/
WorkingBlockPointer->BlockCount
i * 10:
/ * L i n k t h e new b l o c k i n t o t h e c h a i n
*/
*LastBlockPointer = WorkingBlockPointer;
/* P o i n t t o t h e f i r s t d a t a f i e l d
*/
W o r k i n g D a t a P o i n t e r = ( i n t * ) ( ( c h a r* ) W o r k i n g B l o c k P o i n t e r
s i z e o f ( s t r u c tB l o c k H e a d e r ) ) :
/ * F i l lt h ed a t af i e l d sw i t h
I D numbersandvalues
*/
for ( j
0 ; j < ( i * 1 0 ) ; j++, W o r k i n g D a t a P o i n t e r + + ) (
*WorkingDataPointer = j ;
* ( W o r k i n g D a t a P o i n t e r + i * 1 0 ) = i * 1000 + j ;
/ * Remember where t o s e t l i n k f r o m t h i s b l o c k t o t h e n e x t
LastBlockPointer = &WorkingBlockPointer->NextBlock;
WorkingBlockPointer->NextBlock
*/
NULL t o i n d i c a t e
NULL:
I D % d :% u \ n " .
p r i n t f ( " A v e r a g eo fa l le l e m e n t sw i t h
IDToFind. FindIDAverageZ(1DToFind. B a s e A r r a y B l o c k P o i n t e r ) ) :
exit(0);
LISTING 8.618-6.ASM
; A l t e r n a t i v eo p t i m i z e da s s e m b l yl a n g u a g ev e r s i o no fF i n d I D A v e r a g e
;
r e q u i r e sd a t ao r g a n i z e da st w oa r r a y sw i t h i ne a c hb l o c kr a t h e r
; t h a na sa na r r a yo ft w o - v a l u ee l e m e n ts t r u c t u r e s .T h i sa l l o w st h e
: u s e o f REP SCASW f o r I D s e a r c h i n g .
SearchedForID
BlockPointer
equ
equ
Next61 ock
BlockCount
BLOCK-HEADER-SIZEequ
equ
0
2
;Number obfy t eisnt r u cBt l o c k H e a d e r
equ
; P a s s e dp a r a m e t e ro f f s e t si nt h e
; s t a c kf r a m e( s k i po v e rp u s h e d
; a n dt h er e t u r na d d r e s s )
: F i e ol df f s esitntsr u Bc tl o c k H e a d e r
BP
.model
small
.code
p u b l-iFc i n d I D A v e r a g e Z
- F i n d I D A v e r a pgnreeoEacr
c:fS
asr alatlam
evbcerpek' su s h
mov
sboftp;rauPa.ctrsom
okpi en t
C r e gv iasrt iearb l e s
; P r e sdeipruv seh
si push
SCASW
mov : P. rdesfpdoair r e
mov
es.di
c ld
mov
s i . [ b p + B l o c k P o i n t:eProl ifnibtrtolseotrc k
; I D w e l' or eo k ifnogr
mov
ax.[bp+SearchedForID]
0
;IDMatchSum
dx.dx
sub
mov
;IDMatchCount
bp,dx
0
: * * * s t a c kf r a m en ol o n g e ra v a i l a b l e * * *
; S e a r c ht h r o u g ha l lt h el i n k e db l o c k su n t i lt h el a s tb l o c k
: ( m a r k e dw i t h a NULL p o i n t e rt ot h en e x tb l o c k )h a sb e e ns e a r c h e d .
165
Previous
BlockLoop:
: S e a r c ht h r o u g ha l lt h eD a t a E l e m e n te n t r i e sw i t h i nt h i sb l o c k
: a n da c c u m u l a t ed a t af r o ma l lt h a tm a t c ht h ed e s i r e d
ID.
mov
cx,Csi+BlockCount]
jCXZ
D o N e x t B l o c ;kS k i tph i bs l o c k
i f t h e r e ' ns do a t a
: t os e a r c ht h r o u g h
mov
bx.cx
BX t o p o i n t t o t h e
: W e ' l lu s e
bx.1 shl
: c o r r e s p o n d i n gv a l u ee n t r yi nt h e
: c a s eo fa n
I D m a t c h (BX i s t h e
: l e n g t hi nb y t e so ft h e
I D array)
: P o i n tt ot h ef i r s tD a t a E l e m e n te n t r yw i t h i nt h i sb l o c k .
lea
di.Csi+BLOCK-HEADER-SIZE]
IntraBlockLoop:
the :S
fsocerreaapsr cw
nhz
ID
j nDzo N e x t B l o c k
:No m a t c ht h, bel o cidkso n e
bp inc
:We have aI DmMaat ct chh: C o u n t t t ;
+- D a t a P o i n t e r - > V a l u e :
add
dx.Cdi+bx-Z]
:IDMatchSum
: (SCASW hasadvanced D I 2 b y t e s )
s etahtm
drohrcaoceo:htuIxarase.gnchdx?
I n t r a Bj ln:oyzceksL o o p
: P o i n tt ot h en e x tb l o c ka n dc o n t i n u e
if t h a t p o i n t e r i s n ' t
NULL.
DoNextBlock:
mov
s i . C s i + N e x t B l o c k: G
l epto i n t ettorhnee xbtl o c k
and
, ssi i
: I s i t a NULL p o i n t e r ?
jnz
B1 ockLoop
:No. c o n t i n u e
: C a l c u l a t et h ea v e r a g eo fa l lm a t c h e s .
ax,ax
sub
:Assume we mfaontuocnhde s
bp,bp and
Jz
Done
:We d i dfm
ainna' ydt c hreest u, r n
0
ax.dx
xchg
; P r e p a r ef o rd i v i s i o n
bp
div
/ IDMatchCount
:ReturnIDMatchSum
C r e gv iasrti ea rb l e s
pop
Done:
: R e sst oi r e
di
pop
: R e s t o r ec a l l e r ' ss t a c kf r a m e
POP
bp
ret
-FindIDAverageZ ENDP
end
Home
Next
166
Chapter 8
Previous
Home
Next
iB
Back in high school, I took a precalculus class from Mr. Bourgeis, whose most notable
characteristics wer6bcessant pacing and truly enormous feet. My friend Barry, who
sat in theback row, rig$$ behind me,claimed that itwas because of his large feet that
Mr. Bourgeis was so resd
se feet were so heavy,Barry hypothesized, that ifMr.
would give way under the
Bourgeis remained id any one place for too long, the floor
strain, plunging thekmfortunate teacherdeep into the
mantle of the Earthand possibly all the way thr&gh to China. Many amusing cartoons were drawn tothis effect.
8:
UnfortunatelyJ3dh-y
-*,e..:
was too busy drawing cartoons, or, alternatively,sleeping, to
actually learn any math. In the long run, that didnt turn out
to be a handicap for
Barry, who went onko become vice-president of sales for a ham-packing company,
where presumably he hasrarely called upon to derive the quadratic equation.
Barrys
lack of scholarship caused some problems back then, though. On one memorable
occasion, Barry was half-asleep, with his eyesopen but unfocused and his chin balanced on his hand in the classic if I fall asleep my head will fall off myhand andIll
wake up posture, when Mr. Bourgeis popped a killer problem:
Barry, solvethis for X, please. On the blackboard lay the equation:
x
1 = 0
169
Mr. Bourgeis shook his head mournfully. Try again. Barry thought hard. Heknew
the fundamentalrule that the answer to most mathematical questions is either 0, 1,
infinity, -1, or minus infinity (do not apply this rule to balancing your checkbook,
however); unfortunately, that gave him only a 25 percent chanceof guessing right.
One,I whispered surreptitiously.
Zero,Barry announced. Mr. Bourgeis shook his head even more sadly.
One, I whispered louder. Barry looked still more thoughtful-a bad sign-so I
whispered oneagain, even louder. Barry looked so thoughtful that his eyes nearly
rolled up into his head, andI realized that hewas just doinghis best to convince Mr.
Bourgeis that Barry had solved this one by himself.
As Barry neared the climax of hisstimng performance and openedhis mouth to speak,
Mr. Bourgeis looked at himwith great concern.Barry, can you hear meall right?
Yes, sir, Barry replied. Why?
Well, I could hear theanswer allthe way up here.Surely youcould hear it just one
row away?
The class went wild. They might as well have sent us home early for all we accomplished the rest of the day.
I like to think I know more about performance programming Barry
than knewabout
math. Nonetheless, I always welcome good ideas and comments, and many readers
have sent me a slew of those over the years. So in this chapter, I thinkIll return the
favor by devoting a chapter to reader feedback.
EBX
170
Chapter 9
MOV
c LC
:# o f 3 2 - b iwt o r d tso
:no c a r r y i n t o t h e i n i t i a l
ADDLOOP:
EAX.[ESII
ADC
[EDII . A X
E LS EI .A[ S I + 4 1
L EEAD[ IE, D I + 4 ]
LOOP
ADDLOOP
MOV
add
ADC
: g e tt h en e x te l e m e n t
o f o n ea r r a y
:add i t t o t h e o t h e r a r r a y , w i t h c a r r y
:advance one a r r a y sp o i n t e r
: a d v a n c et h eo t h e ra r r a y sp o i n t e r
(Yes, I could use LODSD instead of MOV/LEA, Im just illustrating a point here.
Besides, LODS is only 1 cycle faster than MOV/LEA on the386, and is actually more
than twice as slow on the 486.) Ifwe used ADD rather than LEA to advance the
pointers, the carry from one
ADC to the nextwould haveto be preserved with either
PUSHF/POPF or LAHF/SAHF. (Alternatively,we could use multiple INCs, since
INC doesnt affect the Carry flag.)
In short, LEA is indeed different fromADD. Sometimes its better. Sometimes not;
thats the nature of the various instruction substitutions and optimizations thatwill
occur to you over time. Theres no such thing as best instructionson thex86; it all
depends onwhat youre trying to do.
But there sure area lot of interesting options, arent there?
The
Kennedy
Portfolio
AX.DX
NEG
:negative?
I s p o s i t i v e ;no
AX
:yes,negate
it
Ispositive:
171
:wordcount
:copyas
many w o r d sa sp o s s i b l e
:CX-1 i f c ol pe yn g t h
was odd,
;O e l s e
:copyanyoddbyte
CX.l
CX,CX
MOVSB
REP
(ADC CX,CX can be replaced with RCL CX,l;which is faster depends on the
processor type.)It might be hard tobelieve that the above is faster than this:
:word c o u n t
:copy
as
many words
as
:possible
i f e v ecno pl eyn g t h
JNC
CopyDone
;done
MOVSB
t h: ec o p y
odd b y t e
CopyDone:
SHR
REP
CX.l
MOVSW
However, it generally is. Sure, if the length is odd, Johns approach incurs a penalty
approximately equal to the REP startup time for MOVSB. However, if the length is
even, Johns approach doesnt branch,
saving cycles and notemptylng the prefetch
queue. If copy lengths areevenly distributed between even and odd,
Johns approach
is faster in most x86 systems.(Not onthe 486, though.)
John also points out that on the 386, multiple LEAs can be combined to perform
multiplications that cantbe handled by a single L E A , much as multiple shifts and
adds can be used for multiplication, only faster. LEA can be used to multiply in a
single instruction on the 386, but only by the values 2,3,4,5,8, and9; several LEAS
strung together can handle a muchwider range of values. For example, video programmers are undoubtedly
familiar with the following code to multiply AX times 80
(the width in bytes of the bitmap in most PC display modes) :
SHL
SHL
SHL
SHL
MOV
SHL
SHL
ADD
AX.l
AX.l
AX.l
AX.l
BX.AX
AX.l
AX.l
AX.BX
:*2
:* 4
:*8
:*16
;*32
:*64
;*EO
E A X . [EAX*ZI
EAX.[EAX*81
EAX.[EAX+EAX*41
172 Chapter 9
:*2
;*16
:*EO
EAX.MultiplesOf80Table[EAX*41
AX.4
BX.AX
AX.2
A X .ABDXD
;*16
;*64
;*80
Speeding Up Multiplication
That brings us to multiplication, one of the slowest of x86 operations and one that
allows for considerable optimization. One way to speed up multiplication is to use shift
and add, LEA, or a lookup table to hard-code a multiplication operation for afixed
multiplier, as shown above.Another is to take advantage of the early-out feature of the
386 (and the 486, but in the interests of brevity Ill just say 386from now on) by
arranging your operands so that themultiplier (always the rightmost operand following MUL or IMUL) is no larger than the other operand.
Why? Because the 386 processes one multiplier bit per cycle and immediately
ends a multiplication when all sign@ant bits
of the multiplier have been processed, so f m e r cycles arerequired to multiply a large multiplicandtimes a small
multiplier than a small multiplicand times a large multipliel; by a factor ofabout
1 cycle for each significant multiplier bit eliminated.
On 386SXs and uncached 386s, where code size can significantly affect performance due to instruction prefetching, the compact MUL and IMUL instructions
can approach and in some cases evenoutperform the optimized alternatives.
All in all, MUL and IMUL are reasonable performers on the 386, no longer to be
avoided in mostcases-and you can help that alongby arranging your code to make
the smaller operand themultiplier whenever you know which operand is smaller.
That doesnt mean that your code should test and swap operands to make sure the
smaller one is the multiplier; that rarely pays off. Im speaking more of the case
where youre scaling an array up by a value thats always in the rangeof, say, 2 to 10;
because the scale value will always be small and the array elements may have any
value, the scale value isthe logical choice for the multiplier.
Optimizing OptimizedSearching
Rob Williams writes witha wonderful optimization to the REPNZ SCASB-based optimized searching routine I discussed in Chapter 5. As a quick refresher, I described
searching a buffer for a text stringas follows: Scanfor the first byte of the text string
with REPNZ SCASB, then use REPZ CMFS to check for a full match whenever REPNZ
Startof
-0
buffer
being
1
searched
2
A
T
E
<blank>
5
6
A
N
D
c blank>
10
11
U
A
12
13
14
15
The obvious
searching
approach is to
scan through the
buffer for just the
- first
character
of
the search string,
, stopping
when
a
matchforthefirst
. : character is found;
only when a firstj charactermatch is
: found
are
buffer
bytescomparedto
;-the rest of the
: searchstring.In
case, 10 first:: this
character
<blank>
p;
I,
i
1
I
I D
:
,;cl:,
j j
: ---.
j
- - - - - a
comparisonsare
; ; needed (requiring
: :
-
-8
starting REPNZ
SCASB twice),
followedby two
comparisons
of
the
rest of the string.
174 Chapter 9
Q
U
A
L
- Startof
search
string
SCASB finds a match forthe first character, as shown in Figure 9.1. The principle is
that most buffer characters wont match the first character of any given string, so
REPNZ SCASB, by far the fastest way to search on the PC, can be used to eliminate
most potential matches; each remaining potential match can then be checked
its in
entirety with REPZ CMPS.
Robs revelation, which he credits without explanation to Edgar Allen Poe (search
nevermore?),was that by far the slowest part of the whole deal is handling REPNZ
SCASB matches, which require checking the remainder of the string with REPZ
CMPS and restarting REPNZ SCASB if no match is found.
The difference between Listings 9.1 and 9.2 (which gives you more than a doubling ofperformance) is due entirely to understanding the natureof the data being
handled, and biasing the code to reject that knowledge.
Hints My ReadersGave Me
175
Start of
buffer
being
searched
-0
1
2
3
4
5
<blank>
A
N
D
6
7
8
9
10
11
12
13
14
15
R
A
T
E
4-
<blank>
E
Q
:
:
A
L
"
<blank>
A faster searching
approach
scan
J~I
:"-
through
:4
, ,
j j
I - - -.
:
j
- - A
Figure 9.2
19-1.ASM
Searches a t e x t b u f f e r f o r
a t e x t s t r i n g . Uses REPNZ SCASB t o scan
; thebufferforlocationsthat
m a t c ht h ef i r s tc h a r a c t e r
of t h e
; s e a r c h e d - f o rs t r i n g ,t h e n
uses REPZ CMPS t o check f u l l y o n l y t h o s e
; l o c a t i o n st h a t
REPNZ
SCASB
has i d e n t i f i e d as p o t e n t i a l matches.
;
; Adaptedfrom
Zen o f AssemblyLanguage,byMichaelAbrash
; C s m a l lm o d e l - c a l l a b l e
as:
;
unsigned
char
F i n d S t r i n g ( u n s i g n e cdh a r
;
unsigned i nBt u f f e r L e n g t hu.n s i g n ecdh a r
;
unsigned Si neta r c h S t r i n g L e n g t h ) ;
: Returns a p o i n t e r t o t h e f i r s t
* Buffer,
* Searchstring.
match f o rS e a r c h s t r i n gi nB u f f e r . o r
; NULL
a
p o i n t e r i f nomatch i s f o u n d .B u f f e rs h o u l dn o ts t a r ta t
; offset 0 inthedata
segment t oa v o i dc o n f u s i n g
a match a t 0 w i t h
; no matchfound.
Parms s t r u c
Buffer
BufferLength
176 Chapter 9
dw
dw
dw
2 dup(?)
to - the
Start Of
search
string
is
;pushedBP/returnaddress
?
; p o i n t e rt ob u f f e rt os e a r c h
?s e a r ctbohu f f :eol ref n g t h
Searchstring
dw
?
:pointertostringforwhichtosearch
: l e n g t ho fs t r i n gf o rw h i c ht os e a r c h
SearchStringLength
dw
?
Parmsends
.model
smal
1
.code
public -Findstring
-F i n d s t r i n g p r o c n e a r
p u sb;hp r e s e r vcea l l e rs' st a cf rka m e
mov
b p . s p; p o i n t oo u rs t a c kf r a m e
push :spi r e s e r vcea l l e r r' se g i s t ev ra r i a b l e s
push d i
;make
cld
s t r i inngs t r u c t i oinnsc r e m epnot i n t e r s
mov
s i . [ b p + S e a r c h S t r i n; pgo] i sn tt soreiternoagrf co hr
mov
bx.[bp+SearchStringLengthl
: l e n g toshtf r i n g
bx.bx
and
i f string i s 0 length
jz
F i n d S t r i n g N o t Fm
o:unanot cdh
mov
d x . [ b p + B u f f e r Lb eu:nfl feogenftrhglt h
s u b ; d idfbfxbee.urbt efwxfneecreen
andl esnt gr itnhgs
i f s e asr tcrihisn g
jc
FindStringNotFound
match
:no
; l o n g e rt h a nb u f f e r
i nd:cxd i f f e r e n cbee t w e ebnu f f e r
a nsde a r csht r i n g
: l e n g t h s ,p l u s 1 ( # o f p o s s i b l e s t r i n g s t a r t
: l o c a t i o n st oc h e c ki nt h eb u f f e r )
mov
d i .ds
mov
es.di
E S : D I t bo u f f etrso e a r c thh r u
mov
d i , [ b p + B u f f e r: pl o i n t
1o d: sptf hibubtsrehtsyoeetfaersct hr i n g
AL
mov
b p:as. sepitdosteihentectoeosrne dabr cy ht e
: dbdconxeonetcm
o'et dptfhtbaihoreyrsefette
: s t r i n g w i t h CMPS: w e ' l l do i t w i t h SCAS
FindStringLoop:
CX
mov
c x . d x: p u rt e m a i n i n gb u f f e rs e a r c hl e n g t hi n
r e p n zs c a s b: s c a nf o rt h ef i r s tb y t eo ft h es t r i n g
jnz
F i n d S t r i n g N o t F o u n d: n ofto u n d ,
s o t h e r e ' sn om a t c h
: f o u n d . s o we have a p o t e n t i a lm a t c h - c h e c kt h e
; r e s to ft h i sc a n d i d a t el o c a t i o n
push d i
:remember satcndbhtaedoyhenxrteoetfs s
mov
d x;as.rscteihxdm
teea i nl sei nentgagoi rnt ch h
: t h eb u f f e r
;pointtotherestofthesearchstring
mov
s i .bp
mov
cx.bx
: s t r i n gl e n g t h( m i n u sf i r s tb y t e )
csxh. 1r
: c o n v e r tt ow o r df o rf a s t e rs e a r c h
:dowordsearch
i f nooddbyte
F ji n cd S t r i n g W o r d
; c o m p a r et h eo d db y t e
cmpsb
so we
;odd b y t ed o e s n ' tm a t c h ,
F ji n zd S t r i n g N o M a t c h
; h a v e n ' tf o u n dt h es e a r c hs t r i n gh e r e
FindStringWord:
j c xFzi n d S t r i n g F o u n; dt e w
s th e t h ewre ' vael r e a dcyh e c k e d
: t h ew h o l es t r i n g :
i f s o . t h i s i s a match
: b y t e sl o n g :
i f s o . we'vefound
a match
r e p z cmpsw
: c h e c kt h er e s to ft h es t r i n g
a word a t a t i m e
; i t ' s a match
jz
FindStringFound
FindStringNoMatch:
; g e tb a c kp o i n t e rt ot h en e x tb y t et os c a n
pop
di
dx.dx
and
: i st h e r ea n y t h i n gl e f tt oc h e c k ?
: y e s - c h e c kn e x tb y t e
F ji n zd S t r i n g L o o p
FindStringNotFound:
ax.ax
sub
; r e t u r n a NULL p o i n t e r i n d i c a t i n g t h a t t h e
: s t r i n g was n o tf o u n d
Fjim
n dpS t r i n g D o n e
177
FindStringFound:
ax
pop
axdec
; p o i n tt ot h eb u f f e rl o c a t i o na tw h i c ht h e
we p u s h e dt h e
: a d d r e s so ft h eb y t ea f t e rt h es t a r to ft h e
; p o t e n t i am
l atch)
; s t r i n g was f o u n d ( e a r l i e r
FindStringDone:
pop :dr ie s t o rcea l l e rr' es g i s t e
v ar r i a b l e s
pop
si
p ob; rppe s t o cr ea l l e rs' st a cf rka m e
ret
-Findstring endp
end
LISTING 9.2
;
;
;
;
L9-2.ASM
Searches a t e x t b u f f e r f o r
a t e x t s t r i n g . Uses REPNZ SCASB t o scan
t h eb u f f e rf o rl o c a t i o n st h a tm a t c h
a s p e c i f i e dc h a r a c t e ro ft h e
s e a r c h e d - f o rs t r i n g ,t h e nu s e s
REPZ CMPS t o check f u l l y o n l y t h o s e
l o c a t i o n st h a t REPNZ SCASB has i d e n t i f i e d as p o t e n t i a lm a t c h e s .
: C s m a l lm o d e l - c a l l a b l ea s :
;
:
;
;
u n s i g n ecdh a r
* F i n d S t r i n g ( u n s i g n e cd h a r
u n s i g n eidnBt u f f e r L e n g t hu.n s i g n ecdh a r
u n s i gS
ni ne td
archStringLength.
u n s i gS
ni nec ta
dnCharOffset);
; Returns a p o i n t e r t o t h e f i r s t
* Buffer,
* Searchstring.
match f o r S e a r c h s t r i n g i n B u f f e r . o r
: a NULL p o i n t e r i f no match i s f o u n d .B u f f e rs h o u l dn o ts t a r ta t
: o f f s e t 0 i n t h e d a t a segment t o a v o i d c o n f u s i n g a m a t c ha t 0 w i t h
; n o matchfound.
Parms s t r u c
Buffer
BufferLength
Searchstring
SearchStringLength
ScanCharOffset
dw
dw
dw
dw
dw
dw
2 dup(?)
?
?
?
?
?
;pushedBP/returnaddress
; p o i n t e rt ob u f f e rt os e a r c h
; l e n g t ho fb u f f e rt os e a r c h
sea
wt rso
;hcfptiohrcoirthinongt e r
; l e n g t ho fs t r i n gf o rw h i c ht os e a r c h
; osfctfirh
snoieanf trga cf toer r
; w h i c ht os c a n
Parmsends
.model
smal
1
.code
public -Findstring
-F i n d S t r i n g p r o c n e a r
: pprbeups cehar vl sleeftrara'csmk e
mov
b p .; sppostuitonarftcr ak m e
push; p rsei s ec ravl elreerg' si vs at er ir a b l e s
push d i
:make
cld
s ti rni sntgr u c tiinocnrse m
p oe innt t e r s
mov
si.[bp+SearchStrin
; pgo] isntstroeeti n
oragrf oc h
r
mov
cx.[bp+SearchStringLengthl
; l e n g toshtf r i n g
jF
c xi nzd S t r i n g N o t F o u n d
;no match i f s t r i ni sg
mov
d x . [ b p + B u f f e r Lb eu;nfl feogenftrhglt h
s e a r c ha n d;bdbui feffftedewrxre. cnexcn es u b
; lengths
jc
F i n d S t r i n g N o t F o u n d ;no match i f s e a r c hs t r i n gi s
; l o n g e rt h a nb u f f e r
; d i f f e r e n cbee t w e ebnu f f earnsde a r csht r i n g
i ndc x
; l e n g t h s ,p l u s
1 ( # o fp o s s i b l es t r i n gs t a r t
; l o c a t i o n st oc h e c ki nt h eb u f f e r )
mov
d i .ds
mov
es.di
178
Chapter 9
0 length
mov
mov
add
mov
bixn c
:point ES:DI t ob u f f e rt os e a r c ht h r u
; o f f s e ti ns t r i n go fc h a r a c t e r
: o nw h i c h t o scan
:point ES:DI t o f i r s t b u f f e r b y t e t o
scan
: p u tt h es c a nc h a r a c t e ri n
AL
: s e t BX t o t h e o f f s e t b a c k t o t h e s t a r t o f t h e
: p o t e n t i a lf u l lm a t c ha f t e r
a scanmatch,
: a c c o u n t i n gf o rt h e1 - b y t eo v e r r u no f
: REPNZ SCASB
d i ,[ b p + B u f f e r l
bx.[bp+ScanCharOffsetl
d. ib x
al.Csi+bxl
FindStringLoop:
rnov
cx.dx
repnz scasb
F ijnndz S t r i n g N o t F o u n d
push d i
mov
d x:as.rscteihxd
tmeea i nl e
isnentgagoi rtnhc h
: p u tr e m a i n i n gb u f f e rs e a r c hl e n g t hi n
CX
: s c a nf o rt h es c a nb y t e
: n o tf o u n d ,
s o t h e r e ' s no match
; f o u n d . s o we have a p o t e n t i a lm a t c h - c h e c kt h e
: r e s to ft h i sc a n d i d a t el o c a t i o n
:remember astdcnhbtadehoynrxo
eteetfs s
; t h eb u f f e r
: match i n t h e b u f f e r
mov
si,[bp+SearchStringl
:pointtothestartofthestring
mov
cx.[bp+SearchStringLengthl
: s t r i n gl e n g t h
: c o n v e r tt ow o r df o rf a s t e rs e a r c h
csxh. 1r
:dowordsearch
i f no o d db y t e
F ji n cd S t r i n g W o r d
;compare t h e odd b y t e
cmpsb
;odd b y t ed o e s n ' tm a t c h .
s o we
F ji n zd S t r i n g N o M a t c h
; h a v e n ' tf o u n dt h es e a r c hs t r i n gh e r e
FindStringWord:
; i f t h es t r i n gi so n l y
1 b y t el o n g ,
j c xFzi n d S t r i n g F o u n d
: w e ' v ef o u n d
a match
; c h e c kt h er e s to ft h es t r i n g
a word a t a t i m e
r e p z cmpsw
: i t ' s a match
F i n d jSzt r i n g F o u n d
FindStringNoMatch:
: g e tb a c kp o i n t e rt ot h en e x tb y t et os c a n
pop
di
; i st h e r ea n y t h i n gl e f tt oc h e c k ?
dx.dx
and
; y e s - c h e c kn e x tb y t e
F ji n zd S t r i n g L o o p
FindStringNotFound:
asx:ur.bea txu r n
NULLa
p o i ni tnedr i c a t ti thnhaget
: s t r i n g was n of ot u n d
j mFpi n d S t r i n g D o n e
FindStringFound:
p o ap:xp o i nt toh be u f f el ro c a t i o anwt h i c thh e
: s t r i n g was f o u n d( e a r l i e r we pushed t h e
subax.bx
: a d d r e s so ft h eb y t ea f t e rt h es c a nm a t c h )
F i n d S t ngDone:
ri
pop :dr ie s t o cr ea l l e rr' es g i s t ve ar r i a b l e s
pop
si
p ob; rpe s t ocr ea l l e rs' tsa fcrka m e
ret
_ F i n d s t r i n g endp
end
LISTING 9.3
19-3.C
I* Program t o e x e r c i s e b u f f e r - s e a r c h r o u t i n e s i n L i s t i n g s
#i
n c l ude < s t d i 0. h>
# i n c l u d e< s t r i n g . h >
# d e f i n e DISPLAYLLENGTH 40
e x t e r nu n s i g n e dc h a r
* F i n d S t r i n g ( u n s i g n e dc h a r
u n s i g n e dc h a r *, u n s i g n e di n t .u n s i g n e di n t ) ;
v o i dm a i n ( v o i d 1 :
*,
*/
u n s i g n e di n t .
179
s t a t i cu n s i g n e dc h a rT e s t B u f f e r C ]
"When, i n t h e c o u r s e o f
human \
e v e n t s , i t becomes n e c e s s a r yf o r o n ep e o p l et od i s s o l v et h e
\
p o l i t i c a l b a n d sw h i c hh a v ec o n n e c t e dt h e mw i t ha n o t h e r ,a n dt o
\
assumeamong
the powers oftheearththeseparate
and equal s t a t i o n \
t o w h i c ht h el a w so fn a t u r ea n do fn a t u r e ' s
God e n t i t l e t h e m . . . " :
v o i dm a i n 0
{
s t a t i cu n s i g n e dc h a rT e s t S t r i n g L l
"equal";
u n s i g n e dc h a r TempBufferCDISPLAY-LENGTH+ll;
u n s i g n e dc h a r* M a t c h P t r :
/*
S e a r c hf o rT e s t s t r i n g
and r e p o r t t h e r e s u l t s
*/
i f ((MatchPtr
FindString(Test6uffer.
( u n s i g n e di n t ) s t r l e n ( T e s t 6 u f f e r ) . T e s t s t r i n g .
NULL) {
( u n s i g n e di n t ) s t r l e n ( T e s t S t r i n g ) . 1))
/* T e s t s t r i n gw a s n ' tf o u n d
*/
p r i n t f ( " \ " % s \ "n o tf o u n d \ n " ,T e s t s t r i n g ) ;
1 else I
/ * T e s t s t r i n g was f o u n d .Z e r o - t e r m i n a t eT e m p B u f f e r ;s t r n c p y
w o n ' td o
it i f DISPLAY-LENGTH c h a r a c t e r sa r ec o p i e d
*/
TempBuffer[DISPLAYLLENGTHl
0:
p r i n t f ( " \ " % s \ "f o u n d .N e x t
%d c h a r a c t e r sa tm a t c h : \ n \ " % s \ " \ n " ,
T e s t s t r i n g . DISPLAY-LENGTH.
s t r n c p y ( T e m p B u f f e rM
. atchPtr,
DISPLAY-LENGTH)):
You'll notice that in Listing 9.2 I didn't use a table of character frequencies in English text to determine thecharacter forwhich to scan, but ratherlet the caller make
that choice. Each buffer of bytes has unique characteristics, and English-letter frequency could well be inappropriate. What if the buffer is filled with French text?
Cyrillic? What if it isn't text that's being searched? It might be worthwhile for an
application to build a dynamicfrequency table for each buffer so that the best scan
character could be chosen for eachsearch. Or perhaps not,if the search isn't timecritical or the buffer is small.
The pointis that you can improve performance dramatically by understanding the
nature of the data with which you work.(This is equally true forhigh-level language
programming, by the way.) Listing 9.2 is very similar to and only slightlymore complex than Listing 9.1; the difference lies not in elbow grease or cycle counting butin
the organic integrating optimizer technology we all carry around in our heads.
Short Sorts
David Stafford (recently of Borland and Borland Japan) who happens to be one of
the best assembly language programmers I've ever met, has written a C-callable routine that sorts an array of integers in ascending order. That
wouldn't be particularly
noteworthy, except thatDavid's routine, shown in Listing 9.4, is exactly25 bytes long.
Look at the code; you'll keep saying to yourself, "But this doesn't work.. .oh, yes, I
guess it does."As they say in the Prego spaghetti sauce ads, it's in thereand what a
job of packing. Anyway, David says that a 24byte sort routine eludes him, and he'd
like to knowif anyone can come up with one.
180
Chapter 9
LISTING9.419-4.ASM
.-""..._".."___..."""""..""....""...""..""."""...."...
: S o r t s an a r r a y o f i n t s . C c a l l a b l e( s m a l m
l odel).
; v o i ds o r t (i n t
num. i n t a [ ] 1:
25 bytes.
C o u r t e s yo fD a v i dS t a f f o r d .
.".."___..."_.""""..""...""....""..""....""".""..""..
.model m a l 1
.code
pub1 i c - s o r t
top:
mov
xchg
xchg
C:dsbaxtwxdw
.aljoapci netnetg e r s
d x , [bx+E]
dx. Cbxl
jl
dx. Cbxl
top
; d i d we p u tt h e m i n t h er i g h to r d e r ?
:no. swaD themback
inc
inc
1oop
bx
bx
top
:go t on e x ti n t e g e r
cmp
- s oprot :p
dx
cx
bx
push
bx
cx
dec
cx
push
a d d r epush
;st ruser snt o r ed x
jg
top
POP
POP
: g e tr e t u r na d d r e s s
; g e tc o u n t
; g e tp o i n t e r
: r e s t o r ep o i n t e r
:decrementcount
:savecount
: i f cx
>
( e n t r yp o i n t )
ret
end
181
Dividend
Bit 47
Bit 0
1-
Bit 47
And so on...
Bit 0
Quotient
most significant word is divided by the divisor (with no chance of overflow because
there areonly 16 bits in each) ; then the remainderis prepended to the next 16 bits
of dividend, and the process is repeated, as shown in Figure 9.3. This process is
equivalent to dividing by hand, except that here we stop to carry the remainder
manually only after eachword of the dividend; the hardware
divide takes care of the
rest. Listing 9.5 shows a function to divide an arbitrarily large dividend by a 16-bit
divisor, and Listing 9.6 shows a sample division of a large dividend. Note that the
same principle can be applied to handling arbitrarily large dividends in386 native
mode code,but in that case the operationcan proceed adword, rather thana word,
at a time.
As for handling signed division with arbitrarily large dividends, that can be done
easily enough by remembering the signs of the dividend and divisor, dividing the
absolute value of the dividendby the absolutevalue of the divisor, and applying the
stored signs to set the proper signs for the quotient and remainder. There may be
more clever ways to produce the same result,
by using IDN, for example;if you know
of one, drop mea line c/o Coriolis Group Books.
LISTING 9.5
; Divides
L9-5.ASM
an a r b i t r a r i l y l o n g u n s i g n e d d i v i d e n d
: d i v i s o r . C near-callable as:
:
unsigned i nDt i v ( u n s i g n e idn t
182 Chapter 9
Dividend,
by a 1 6 - b i tu n s i g n e d
i n tD i v i d e n d L e n g t h ,u n s i g n e di n tD i v i s o r ,
unsigned i n t * Q u o t i e n t ) ;
; R e t u r n st h er e m a i n d e ro ft h ed i v i s i o n .
: T e s t e d w i t h TASM 2.
parms s t r u c
Dividend
dw
dw
2 dup ( ? )
?
D i v i d e n d L e n g t h dw ?
Divisor
dw
Quotient
dw
Darms ends
small.model
.code
-D i v
public
-D i vp r o cn e a r
: pprbeups cehar vl sleeftrara'csmk e
mov
b p .; spopsotutioanrf rct akm e
push
si
; p r e s e r v ec a l l e r ' sr e g i s t e rv a r i a b l e s
push
di
; wwseot'drfrrkeoi nmg
msb
l s tbo
mov
ax.ds
STOS
mov ;ef so .ra x
mov
cx.[bp+DividendLength]
cx.2
sub
mov
s. [ib p + D i v i d e n d l
add
s, ci ;xp o i nt toh lea swt o r odt fh de i v i d e n d
; ( t h em o s ts i g n i f i c a n tw o r d )
mov
d i ,[ b p + O u o t i e n t ]
add
d. ic;xp o i nt toh lea swt o r odt fh qe u o t i e n t
; b u f f e r ( t h e most s i g n i f i c a n tw o r d )
mov
bx.[bp+Divisorl
csxh,rl
;# o f words t o p r o c e s s
ci xn c
sub
dx.dx
:convert
i n i t i ad li v i s o r
word t o a 3 2 - b i t
; v a l u ef o r D I V
DivLoop:
1odsw
bdxi v
sqtu;ohsw
otseoahtow
fivie
rsedn t
; gneemtxost si gt n i f i c awndootirvfdi s o r
:DX c o n t a i n s t h e r e m a i n d e r a t t h i s o o i n t .
ready t o prepend t o t h e n e x t d i v i i o r w o r d
1oop
mov
c ld
pop
pop
POP
D i vLoop
ax,dx
di
si
bP
r e t u r nt h er e m a i n d e r
r e s t o r ed e f a u l tD i r e c t i o nf l a gs e t t i n g
r e s t o r ec a l l e r ' sr e g i s t e rv a r i a b l e s
r e s t o r ec a l l e r ' ss t a c kf r a m e
183
ret
-Div
endp
end
Sampleuse
o fD i vf u n c t i o nt op e r f o r md i v i s i o n
d o e s n ' t f i t i n 16 b i t s * /
when t h e r e s u l t
# i n c l u d e< s t d i o . h >
* Dividend,
e x t e r nu n s i g n e di n tD i v ( u n s i g n e di n t
i n tD i v i d e n d L e n g t h .u n s i g n e di n tD i v i s o r ,
u n s i g n e di n t * Quotient);
main0 {
u n s i g n e dl o n g m, i
0x20000001;
u n s i g n e d i n t k . j = 0x10;
k
D i v ( ( u n s i g n e d i n t *)&i.
s i z e o f ( i ) . j. ( u n s i g n e di n t* ) & I n ) ;
p r i n t f ( " % l u / %u
% l u r %u\n", i. j . m. k ) ;
1
184
Chapter 9
In assembly, its easy to control the organizationof your stack frame. InC, however,
youll have to figure out the allocation scheme your compiler uses to allocate automatic variables, and declare automatics appropriatelyto produce thedesired effect.
C some years back, and trimmed thesize of a proIt can be done: I did it in Turbo
gram (admittedly, a large one) by several K-not bad, when you consider that the
sweet spot optimizationis essentially free, with no code reorganization, changein
logic, or heavy thinking involved.
+
L
l
- AX
car,,,
Bit 15
Bit 0
RCR AX, 1
car,,,
D
Bit 15
+
AX
Bit 0
RCL AX, 1
AX
ROR AX, 1
+
car,,,
c
Bit 15
AX
Bit 0
ROL AX, 1
185
RCL reg1 take 3 cycles,just like ROR, ROL, SHR, and SHL. At least, thats how fast
youll find different executiontimes
they run onmy 386, and I very much doubt that
on other386s. (Please let me know if you do, though!)
Interestingly, according to Intels i486 Microprocessor ProgrammersReference Manual,
the 486 can RCR or RCL a register by one bit in 3 cycles, but takes between 8 and 30
cycles to perform a multibit register RCR or RCL!
No great lesson here, just a caution to be leery of multibit RCR and RCL when
performance matters-and to take cycle-time documentation with a grain of salt.
dd
mov
mov
jmp
word p tCr P t r l . 5
word p t r CPtr+E].lDOOh
CPtrl
186
Chapter 9
LISTING 9.719-7.ASM
***
t ob u i l d
: T e s t e dw i t h
FarSeg
segment
org
F a r L albaeble l
endsFarSeg
a d i r e c tf a r
jump t o an a b s o l u t ea d d r e s s
***
***
far
1.model
smal
.code
start:
FarLabel
jmp
end
start
By the way, if youre wondering how I figured this out, I merely applied my good
friend Dan Illowskys long-standing rule for dealingwith MASM:
If the obvious doesnt work (and it usually doesnt), justtry everything you can think
of, no matter how ridiculous, until you find something that
does-a rule with plenty
of history on its side.
eax.1
takes only2 cycles to execute, but is 5 bytes long (because native mode constants are
dwords and the MOV instruction doesnt sign-extend). Both code fragments are
ways to set EAX to 1 (although thefirst affectsthe flags and the second
doesnt) ; this
is a classic trade-off of speed forspace. Second,
e b x .o- r1
ebx. -1
takes 2 cycles to execute and is 5 bytes long. Both instructionsset EBX to -1; this is a
classic trade-off of-gee, its not a trade-off at all, is it? OR is a better way to set a32bit register to all 1-bits,
just as SUB or XOR is a better way to set a register to all 0-bits.
Who woulda thunk it? Just goes to show howthe 32-bit displacements and constants
of 386 native mode change thefamiliar landscape of 80x86 optimization.
Hints My Readers Gave Me
187
Previous
Home
Next
188
Chapter 9
Previous
Home
Next
191
In this chapter, Im going to walk you through a simple but illustrative case history
that nicely points up thewisdom of delaying gratification when faced with programming problems, so that your mind has time to chew on the problems from other
angles. The alternative solutions you find by doing this may seem obvious,once youve
come up with them. They may not even differ greatly from your initial solutions.
Often, however, theywill be much better-and youll never even have the chanceto
decide whether theyre better or notif you take the first thing that comes into your
head and runwith it.
1 92
Chapter 10
193
solution, were apt to say, Ah, let the computer do that, its fast, and go back to
making paper airplanes. Unfortunately, brute-force solutions tend to beslow even
when performed by modern-day microcomputers, which are capableof several MIPS
except when Im late foran appointment andwant to finish a compileand run just
one more test before I leave, in which case the crystal in my computer is apparently
designed to automatically revert to 1 Hz.)
The solution that I instantly cameup with to finding theGCD is about as brute- force
as youcan get: Divide both the larger integer (iL) andsmaller
the integer (is)by every
integer equal to or less than the smaller integer, until a numberis found thatdivides
both evenly, as shownin Figure 10.1. This works,but its a lousy solution, requiringas
many as iS*2 divisions; uery expensive, especially for large values of is. For example,
finding the GCD of 30,001 and 30,002 would require 60,002 divisions, which alone,
disregarding tests and branches, would take about 2 seconds on an 8088, and more
than 50 milliseconds evenon a25 MHz 486-a very long time in computer years, and
not insignificant in human years either.
Listing 10.1 is an implementation of the brute-force approach toCCD calculation.
Table 10.1 shows howlong ittakes this approachto find the GCD for several integer
pairs. As expected, performanceis extremely poor when is is large.
1 94
Chapter
IO
LISTING 10.1
110- 1.C
I* F i n d sa n dr e t u r n st h eg r e a t e s t
common d i v i s o r o f t w op o s i t i v e
i n t e g e r s . Worksby
t r y i n ge v e r yi n t e g r a ld i v i s o rb e t w e e nt h e
1. u n t i l a d i v i s o r t h a t d i v i d e s
s m a l l e ro ft h et w oi n t e g e r sa n d
C c o d et e s t e dw i t hM i c r o s o f t
b o t hi n t e g e r se v e n l yi sf o u n d .A l l
a n dB o r l a n dc o m p i l e r s . * /
u n s i g n e di n tg c d ( u n s i g n e di n ti n t l .u n s i g n e di n ti n t 2 )
u n s i g n e di n tt e m p .t r i a l - d i v i s o r ;
/ * Swap i f n e c e s s a r yt o make s u r e t h a t i n t l
i f ( i n t l < int2) {
temp = i n t l ;
i n t l = int2;
temp;
int2
{
>=
int2 *I
195
I* Now j u s t t r y e v e r y d i v i s o r f r o m i n t 2
on down, u n t i l a common
d i v i s o ri sf o u n d .T h i sc a nn e v e r
bean
i n f i n i t el o o pb e c a u s e
1 d i v i d e se v e r y t h i n ge v e n l y
*I
f o r( t r i a l - d i v i s o r
i n t 2 ; ( ( i n t l X t r i a l - d i v i s o r ) !- 0) I I
( ( i n t 2 X t r i a l - d i v i s o r ) !- 0); t r i a l - d i v i s o r - )
return(tria1Ldivisor);
Wasted Breakthroughs
Sedgewick's first solution to the GCD problem was pretty much the oneI came up
with. He then pointed out that the
GCD of iL and is is the same asthe GCD of iLiS
and is. This was obvious (once Sedgewick pointed it out); by the very nature of
division, any number that divides iL evenly nL times and is evenly nS times must
divide iL-iS evenly nLnS times. Given that insight, I immediately designed a new,
faster approach, shown in Listing 10.2.
LISTING 10.2 110-2.C
I* F i n d sa n dr e t u r n st h eg r e a t e s t
common d i v i s o r o f t w o p o s i t i v e
i n t e g e r s . Worksby
s u b t r a c t i n gt h es m a l l e ri n t e g e rf r o mt h e
l a r g e ri n t e g e ru n t i le i t h e rt h ev a l u e sm a t c h( i nw h i c hc a s e
t h a t ' st h eg c d ) ,o rt h el a r g e ri n t e g e r
becomes t h e s m a l l e r o f
t h et w o ,i nw h i c hc a s et h et w oi n t e g e r s
swap r o l e s and t h e
*/
s u b t r a c t i o np r o c e s sc o n t i n u e s .
u n s i g n e di n tg c d ( u n s i g n e di n ti n t l .u n s i g n e di n ti n t 2 )
I
u n s i g n e di n tt e m p ;
I* I f t h et w oi n t e g e r sa r et h e
same, t h a t ' st h eg c da n dw e ' r e
done * I
if (intl
int2) I
return(int1);
/*
Swap i f n e c e s s a r y t o make s u r e t h a t i n t l
if ( i n t l < int2) {
temp
intl:
int2;
intl
temp;
int2
--
>-
inti!
*/
I* S u b t r a c t i n t 2 f r o m i n t l u n t i l i n t l i s
no l o n g e rt h el a r g e ro f
t h et w o * I
do (
i n t l -- i n t i ? ;
1 w h i l e ( i n t l > inti!):
I* Now r e c u r s i v e l y c a l l t h i s f u n c t i o n t o c o n t i n u e t h e p r o c e s s
r e t u r n ( g c d ( i n t 1 ,i n t 2 ) ) ;
*/
Listing 10.2 repeatedly subtracts is from iL until iL becomes less than orequal to is.
If iL becomes equal to is, then that's the GCD; alternatively, if iL becomes less than
is, iL and is switch values,and the process is repeated, as shownin Figure 10.2. The
number of iterations this approach requires relative to Listing 10.1depends heavily
on thevalues of iLand is,so it's not always faster, but, as Table10.1 indicates, Listing
10.2 isgenerally much better code.
196
Chapter
IO
Listing 10.2 isa far graver misstepthan Listing 10.1,for all that its faster. Listing 10.1
is obviouslya hacked-up, brute-force approach; no onecould mistake itfor anything
else. It could be speededup in any of a number of ways with a little thought. (Simply
skipping testing all the divisors between is and iS/2, not inclusive, would cut the
worst-case timein half, for example; thats not a particularly good optimization, but it
illustrates how easily Listing10.1 can be improved.) Listing 10.1 is a hack job, crying
out for inspiration.
Listing 10.2, on the other hand, has gotten the inspiration-and largely wasted it
through haste. Had Sedgewick not told me otherwise, I might well have assumed
that Listing 10.2was optimized, a mistake I would never havemade with Listing 10.1.
I experienced a conceptual breakthrough when I understood Sedgewicks point: A
smaller number can besubtracted from a larger number without affectingtheir GCD,
thereby inexpensively reducing the scale of the problem. And,in my hurry tomake
this breakthrough reality, I missed its fullscope. As Sedgewick says on thevery next
Patient Coding, Faster Code
197
page, the number that one gets by subtracting is from iL until iL is less than is is
precisely the same as the remainder that one
gets by dividing iL by i s a g a i n , this is
inherent in the natureof division-and that is the basis for Euclids algorithm, shown
in Figure 10.3. Listing 10.3 is an implementation of Euclids algorithm.
LISTING10.311
/*
0-3.C
F i n d sa n dr e t u r n st h eg r e a t e s t
common d i v i s o r o f t w o i n t e g e r s .
Uses E u c l i d sa l g o r i t h m :d i v i d e st h el a r g e ri n t e g e rb yt h e
i s 0. t h e s m a l l e r i n t e g e r i s t h e
GCD,
s m a l l e r ; i f t h er e m a i n d e r
o t h e r w i s et h es m a l l e ri n t e g e r
becomes t h e l a r g e r i n t e g e r , t h e
r e m a i n d e r becomes t h es m a l l e ri n t e g e r ,
and t h e p r o c e s s i s
r e p e a t e d . *I
s t a t i cu n s i g n e di n tg c d - r e c u r s ( u n s i g n e di n t .u n s i g n e di n t ) ;
u n s i g n e di n tg c d ( u n s i g n e di n ti n t l .u n s i g n e di n ti n t 2 )
u n s i g n e di n tt e m p ;
same, t h a t s t h e
/ * I f t h et w oi n t e g e r sa r et h e
done * I
int2) {
if (intl
return(int1);
GCO andwere
/ * Swap i f n e c e s s a r y t o
if ( i n t l < int2) {
temp
intl;
int2;
intl
temp;
int2
make s u r e t h a t i n t l
>-
i n t 2 */
I* Now c a l l t h e r e c u r s i v e f o r m
of t h ef u n c t i o n ,w h i c h
o f t h et w o
thatthefirstDarameteristhelarger
r e t u r n ( g c d - r e c u r s ( ; n t l .i n t 2 ) ) ;
assumes
*I
s t a t i cu n s i g n e di n tg c d - r e c u r s ( u n s i g n e di n tl a r g e r - i n t .
u n s i g n e di n ts m a l l e r - i n t )
i n t temp;
/*
I f t h er e m a i n d e ro fl a r g e r - i n td i v i d e db ys m a l l e r - i n ti s
t h e ns m a l l e r - i n ti st h eg c d
*/
i f ((temp
larger-int % smaller-int)
0) {
return(sma1ler-int);
0.
/*
1
Make s m a l l e r - i n t t h e l a r g e r i n t e g e r a n d t h e r e m a i n d e r t h e
s m a l l e ri n t e g e r ,
and c a l l t h i s f u n c t i o n r e c u r s i v e l y t o
*I
c o n t i n u et h ep r o c e s s
return(gcd-recurs(smaller-int, t e m p ) ) ;
As you can see from Table 10.1, Euclids algorithm is superior, especially for large
numbers (andimagine if we were working with large longs.?.
P
198
Chapter 10
number from the greater:In a commercial product, my lack of patience and discipline could have beencostly indeed.
Give your mind time and space to wander around theedges of important programming problems before you settle on any one approach. I titled this booksfirst chapter
The Best Optimizer Is between Your Ears, and thats still true; whats even more
true is that the optimizer between your ears does its best work not at theimplementation stage, but at thevery beginning, when you try to imagine how what you want
to do andwhat a computeris capable of doing can best be brought together.
Recursion
Euclids algorithm lends itself to recursion beautifully, so much so that an implementation like Listing 10.3 comes almost without thought. Again, though, take a
moment to stop and consider whats reallygoing on, at the
assembly language level,
in Listing 10.3. Theres recursion and then theres recursion; code recursion and
data recursion, to beexact. Listing 10.3 is code recursion-recursion through callsPatientCoding,Faster Code
199
the sort most often used because it is conceptually simplest. However, code recursion tends to be slow because it pushes parameters and calls a subroutine for every
iteration. Listing 10.4, which uses data recursion, is much faster and no morecomplicated than Listing 10.3. Actually, you could just say that Listing 10.4 uses a loop
and ignore any mention of recursion; conceptually, though, Listing 10.4 performs
the same recursive operations that Listing 10.3 does.
common d i v i s o r o f t w o i n t e g e r s .
Uses E u c l i d ' sa l g o r i t h m :d i v i d e st h el a r g e ri n t e g e rb yt h e
i s 0 . t h es m a l l e ri n t e g e ri st h e
GCD.
s m a l l e r ; i f t h er e m a i n d e r
o t h e r w i s et h es m a l l e ri n t e g e r
becomes t h e l a r g e r i n t e g e r , t h e
r e m a i n d e rb e c o m e st h es m a l l e ri n t e g e r ,a n dt h ep r o c e s s
is
*I
r e p e a t e d .A v o i d sc o d er e c u r s i o n .
u n s i g n e di n tg c d ( u n s i g n e di n ti n t l .u n s i g n e di n ti n t 2 )
u n s i g n e di n tt e m p ;
I* Swap i f n e c e s s a r y t o make s u r e t h a t i n t l
if (intl
temp
intl
int2
<
--
>-
i n t 2 *I
int2) {
intl;
int2;
temp;
I* Now l o o p , d i v i d i n g i n t l b y i n t 2
a n dc h e c k i n gt h er e m a i n d e r ,
0. A t e a c hs t e p ,
i f t h er e m a i n d e ri s n ' t
u n t i lt h er e m a i n d e ri s
0, a s s i g n i n t 2 t o i n t l .
a n dt h er e m a i n d e rt oi n t 2 .t h e n
repeat *I
for (:;) {
I* I f t h er e m a i n d e r o f in t l d i v i d e d b y i n t 2 i s
t h eg c d * I
i f ((temp
i n t l % int2)
0) {
return(int2);
I* Make i n t 2 t h e l a r g e r
s m a l l e ri n t e g e r ,a n d
int2;
intl
temp;
int2
0. t h e ni n t 2i s
i n t e g e ra n dt h er e m a i n d e rt h e
*/
r e p e a tt h ep r o c e s s
Patient Optimization
At long last, we're ready to optimize GCD determination in the classic sense. Table
10.1 shows the performanceof Listing 10.4 with and without MicrosoftC/C++'s maximum optimization, and also shows the performance of Listing 10.5, an assembly
language version of Listing 10.4. Sure, the optimized versions are faster than the
unoptimized version of Listing 10.4-but the gains are small compared to those realized from the higher-level optimizationsin Listings 10.2through 10.4.
LISTING10.5
110-5.ASM
; F i n d sa n dr e t u r n st h eg r e a t e s t
common d i v i s o r o f t w o i n t e g e r s .
; Uses E u c l i d ' sa l g o r i t h m :d i v i d e st h el a r g e ri n t e g e rb yt h e
; smaller;
200
i f t h er e m a i n d e ri s
Chapter 10
0. t h es m a l l e ri n t e g e ri st h e
GCD.
; o t h e r w i s et h es m a l l e ri n t e g e r
becomes t h e l a r g e r i n t e g e r , t h e
: r e m a i n d e rb e c o m e st h es m a l l e ri n t e g e r ,a n dt h ep r o c e s si s
: repeatedA
. v o i d sc o d er e c u r s i o n .
: C n e a r - c a l l a b l ea s :
: u n s i g n e di n tg c d ( u n s i g n e di n ti n t l .u n s i g n e di n ti n t 2 ) :
: P a r a m e t e rs t r u c t u r e :
parmsstruc
?
dw
dw
?
?
i n t l dw
?
i n t 2 dw
parmsends
:pushed B P
a:drpedutrsuehrsnesd
: i n t efwigfnhteodoircsh
: t h e GCD
small.model
.code
-gcd
pub1 ic
align 2
_ g cpdr once a r
: p prbeupsscehar vl sleeftrara'csmk e
mov
b p o;.susut epafrrtcakm e
: p rpseui ssechravl lereerg' sivsat er ira b l e s
p u sdhi
>- i n t 2
:Swap i f n e c e s s a r yt o make s u r e t h a t i n t l
mov
ax.intl[bpl
mov
bx.int2Cbpl
>- i n t 2 ?
cmp
a x i.:nbi stxl
s o w e ' rasel el t
j nI nbt s S:eyte s .
:no. so swap i n t l and i n t 2
xchg
ax.bx
IntsSet:
: Now l o o p , d i v i d i n g i n t l b y i n t 2
a n dc h e c k i n gt h er e m a i n d e r ,u n t i l
: t h er e m a i n d e ri s
0 . A t e a c hs t e p ,
i f t h er e m a i n d e ri s n ' t
0. a s s i g n
: int2tointl,
a n dt h er e m a i n d e rt oi n t 2 ,t h e nr e p e a t .
GCDLoop:
; i f t h er e m a i n d e ro fi n t ld i v i d e db y
: i n t Z i s 0 . t h e ni n t 2i st h eg c d
DX:AX f o rd i v i s i o n
; p r e p a r ei n t li n
dx.dx
sub
DX
; i n t l / i n t 2 :r e m a i n d e ri si n
bdxi v
: i st h er e m a i n d e rz e r o ?
dx.dx
and
: y e s . s o i n t 2 ( B X ) i st h eg c d
jz
Done
:no. s o move i n t 2 t o i n t l a n dt h e
; r e m a i n d e rt oi n t 2 ,
and r e p e a tt h e
: process
mov
ax.bx
: i n t l = int2:
r e m a i n d e rf r o m D I V
:int2
mov
bx,dx
: - s t a r to fl o o pu n r o l l i n g :t h ea b o v ei sr e p e a t e dt h r e et i m e s DX:AX f o r d i v i s i o n
; p r e p a r ei n t li n
dx.dx
sub
DX
; i n t l / i n t 2 ;r e m a i n d e ri si n
bdxi v
: i st h er e m a i n d e rz e r o ?
dx.dx
and
;yes. s o i n t 2 ( B X ) i s t h e g c d
jz
Done
mov
ax.bx
int2;
:intl
: i n t 2 = r e m a i n d e rf r o m D I V
mov
bx.dx
._
dx.dx
sub
bdxi v
; p r e p a r ei n t li n
DX:AX f o rd i v i s i o n
DX
; i n t l / i n t 2 ;r e m a i n d e ri si n
PatientCoding,FasterCode
201
._
dx.dx
and
jz
mov
mov
Done
ax.bx
bx.dx
dx.dx
sub
bdxi v
dx.dx
and
jz
Done
mov
ax.bx
mov
bx,dx
:-end o f l o o p u n r o l l i n g jmp
GCDLoop
: i st h er e m a i n d e rz e r o ?
: y e s . s o i n t 2 ( B X ) i s t h e gcd
: i n t l = int2:
: i n t 2 = r e m a i n d e rf r o m D I V
DX:AX f o rd i v i s i o n
: p r e p a r ei n t li n
i s i n DX
: i n t l / i n t 2 :r e m a i n d e r
: i st h er e m a i n d e rz e r o ?
: y e s . s o i n t 2 ( B X ) i st h eg c d
: i n t l = int2:
DIV
; i n t 2 = r e m a i n d e rf r o m
align 2
Done:
mov
a :xr.ebttxhu er n
pop : r edscit ao lr lreeerg isvsat reira b l e s
pop
si
POP : r ebspct ao lrlseeftrraacsmk e
ret
_gcd
endp
end
GCD
202
Chapter
IO
Previous
Home
Next
Pull back the reins a little. Dont measure progress by lines of code written today;
measure it instead by overall progress and by quality. Relax and listen to that quiet
inner voice that provides the real breakthroughs. Stop, look, listen-and think. Not
only will youfind thatits a more productive and creative way to program-but youll
also find that its more fun.
And think what youcould do with all those extra computeryears!
PatientCoding,FasterCode
203
Previous
Home
Next
::$
'_.b
.%
207
FamiIy Matters
While the x86 family is a large one, only a few members of the family-which includes the 8088, 8086, 80188, 80186, 286, 386SX, 386DX, numerous permutations
of the 486, and now the Pentium-really matter.
The 8088 is now allbut extinct in the PC arena. The8086 was used fairly widelyfor a
while, but has now all but disappeared. The 80186 and 80188 never really caught on
for use in PC and dont require further
discussion.
That leaves us withthe high-end chips: the 286, the 386SX, the 386, the 486, and the
Pentium. At this writing, the 386SX is fast going the way of the 8088; people are
realizing that its relatively small costadvantage over the 386 isnt enough to offset its
relatively large performance disadvantage. After all, the 386SX suffers from the same
debilitating problem thatlooms over the 8088-a too-small bus.Internally, the 386SX
is a 32-bit processor, but externally, its a 16-bit processor, a non-optimal architecture, especially for 32-bit code.
Im not goingto discussthe 386SX in detail. If youdo find yourself programming for
the 386SX, follow the same general rules you should follow for the 8088: use short
instructions, use the registers as heavily aspossible, and dont branch.In otherwords,
avoid memory, since the 386SX is by definition better at processing data internally
than it is at accessing memory.
The 486 is a world unto itself for the purposesof optimization, and the Pentiumis a
universe unto itself. Well treat them separately in later chapters.
This leaves us with
just two processors: the 286 and the386. Each was the PC standard
in its day. The 286 is no longer used in new systems, but there are millions of 286based systems still in daily use. The 386 is still being used in new systems, although
its on the downhill leg of its lifespan, and it is in even wider use than the 286. The
future clearly belongs to the 486 and Pentium, but the 286 and 386 are still very
much a partof the present-day landscape.
208
Chapter 1 1
ax.dword p [t Lr o n g V a r l
dx.es
mov
a x . w o pr dt r
d x . w o pr dt r
[LongVarl
[LongVar+21
Protected mode uses those altered segmentregisters to offer access to a great deal
more memory than real mode:The 286 supports 16megabytes of memory, whilethe
386 supports 4 gigabytes (4K megabytes) of physical memory and 64 terabytes (64K
gigabytes!) of virtual memory.
In protected mode,your programs generally run under an operating
system (OS/2,
Unix, Windows NT or the like) that exerts much more control over the computer
than does MS-DOS. Protected mode operating systems can generally run multiple
programs simultaneously, and the performanceof any one programmay depend far
less on code quality than on how efficiently the program uses operating system services and how often and under what circumstances the operating system preempts
the program. Protected mode programs are often mostly collections of operating
system calls, and the performanceof whatever code isnt operating-system oriented
may depend primarily on how large atime slice the operatingsystem givesthat code
to run in.
In short,taken as a whole, protected mode programmingis a different kettle
of fish
altogether fromwhat Ive been describing inthis book. Theres certainly a knack to
optimizing specifically for protected mode undergiven
a
operating system.. .but its
not what weve been learning, andnow is not the time to pursue it further. In general, though, the optimization strategies discussed in this book still hold true in
protected mode;itsjust issues specific to protected mode or a particular operating
system that we wont discuss.
mismatch that plagues the 8088 is eliminated. Consequently, theres no longer any
need to use byte-sized memory variables in preference to word-sized variables, at
least so long as word-sized variablesstart at even addresses, as well see shortly. On
the other hand,
access to byte-sized variables still
isnt any slowm than access to wordsized variables, so you can use whichever size suits a given task best.
You might think thatthe elimination of the 8-bit bus cycle-eater wouldmean that the
prefetch queue cycle-eater would alsovanish, since on the 8088 the prefetch queue
cycle-eater is a side effect of the 8-bit bus. That would seem all the morelikely given
that both the
286 and the386 havelarger prefetchqueues than the
8088 (6 bytes for
the 286, 16 bytes for the 386) and can perform memory accesses, including instruction fetches, in far fewer cycles than the 8088.
However, the prefetch queue cycle-eater doesnt vanish on either the286 or the386,
for several reasons. For one thing, branching instructions still empty the prefetch
queue, so instruction fetching still slows things down after most branches; when the
prefetch queue is empty, it doesnt much matter how big it is. (Even apart from
emptying the prefetch queue, branchesarent particularly fast on the286 or the386,
at a minimumof seven-plus cyclesapiece. Avoid branching whenever possible.)
After a branch it does matter how fast the queuecan refill, and therewe come to the
second reason the prefetch queue cycle-eater lives on: The 286 and 386 are so fast
that sometimes the Execution Unit can execute instructionsfaster than they can be
fetched, even though instruction fetchingis much faster on the286 and 386 than on
the 8088.
(All other things being equal,too-slow instruction fetchingis more of a problem on
the 286 than on the
386, since the 386 fetches 4 instruction bytes at atime versus the
2 instruction bytes fetched per memory access by the 286. However, the 386 also
typically runs atleast twice asfast as the 286, meaning that the386 can easily execute
instructions faster than they can be fetched unless very high-speed memory is used.)
The most significant reason that the prefetch queue cycle-eater not only survivesbut
prospers on the286 and 386, however, liesin the various memory architecturesused
in computersbuilt around the286 and 386. Due to the memory architectures,the 8bit bus cycle-eater is replaced by a new form of the wait state cycle-eater: waitstates
on accesses to normal system memory.
210
Chapter 1 1
that offer more performance for the dollar-but less performance overall-than
zero-wait-state memory.(It is possible to build zero-wait-state systemsfor the 286 and
386; itsjust so expensive that its rarely done.)
The IBM AT and true compatibles use one-wait-statememory (some AT clones use
zero-wait-state memory, but such clones are less common than one-wait-state AT
clones). The 386 systems use a wide variety of memory systems-including high-speed
caches, interleaved memory, and static-column RAM-that insert anywhere from 0 to
about 5 wait states (and many more if 8 or l6bitmemory expansion cards are used)
;
the exact number of wait states inserted at any given time depends on theinteraction between the code being executed and the
memory system its running on.
The performance of most 386 memory systems can vary great&,from one memory
access to anothel; depending on factorssuch as what data happensto bein the cache
and which interleavedbank and/or RAM column was accessedlast.
The many memory systems in use make it impossible for us to optimize for 286/386
computers with the precision thats possible on the 8088. Instead, we must write code
that runsreasonably well under thevarying conditions found in the286/386 arena.
The wait states that occur onmost accesses to system memory in 286 and 386 computers mean that nearly every accessto system memory-memory in the DOSs normal
640K memory area-is slowed down. (Accessesin computerswith high-speed caches
may be wait-state-free if the desired data is already in the cache, but will certainly
encounter wait states if the data isnt cached; this phenomenon produces highly
variable instruction execution times.) While this is our first encounter with system
memory wait states,we haverun into wait-state
a
cycle-eaterbefore: the display adapter
cycle-eater, which we discussed along with the other 8088 cycle-eaters way back in
Chapter 4. System memory generally has fewer wait states per access than display
memory. However, system memory is also accessed far more often than display
memory, so system memory wait states hurt plenty-and the place they hurt most is
instruction fetching.
Consider this: The 286 can store an immediate value to memory, as in MOV
[WordVar],O,in just 3 cycles. However, that instruction is 6 bytes long. The 286 is
capable of fetching 1word every 2 cycles; however,the one-wait-statearchitecture of
the AT stretches that to 3 cycles. Consequently, nine cycles are needed tofetch the
six instruction bytes. On topof that, 3 cycles are needed towrite to memory, bringing the total memory access time to
1 2 cycles. On balance, memory access
time-especially instruction prefetching-greatly exceeds execution time, to the
extent that this particular instruction can take up to four times as long to run as it
does to execute in the Execution Unit.
And that, my friend, is unmistakably the prefetchqueue cycle-eater. I mightadd that
the prefetch queuecycle-eater is in rare good form in theabove example: A 440-1
Pushing the 286 and 386
21 1
ratio of instruction fetch time to execution time is in a class with the best (or worst!)
thats found on the8088.
Lets check out the prefetch queue cycle-eater in action. Listing 11.1 times MOV
WordVar1,O. The Zen timer reports that on a one-wait-state 10 MHz 286-based AT
clone (the computerused for all tests in this chapter), Listing 11.1 runs in 1.27 ps
per instruction. Thats12.7 cycles per instruction, just as we calculated. (That extra
seven-tenths of a cycle comes fromDRAM refresh, which well get to shortly.)
11 1-1.ASM
LISTING 1 1.1
move t o
: memory. i n o r d e r t o d e m o n s t r a t e t h a t t h e p r e f e t c h
: q u e u ec y c l e - e a t e ri sa l i v ea n dw e l l
on t h e AT.
Skip jmp
: ae lvweany s
make wsourrde- s i z e d
memory
: v a r i a b l e sa r ew o r d - a l i g n e d !
WordVar dw
Skip:
call
rept
mov
endm
ZTim
c aelrl O f f
ZTimerOn
1000
CWordVarl .O
What does this mean? Itmeans that, practically speaking, the 286 as used in the AT
doesnt have a 16-bit bus.From a performance perspective, the 286 in an AT has twothirds of a 16-bit bus (a 10.7-bit bus?), since every bus access on an AT takes 50
percent longer than it should. A 286 running at 10 MHz should be able to access
memory at a maximum rate of 1 word every 200 ns; in a 10 MHz AT, however, that
rate is reduced to 1 word every 300 ns by the one-wait-state memory.
In short, a close relative ofour old friend the8-bit bus cycleeater-the system memory
wait state cycle-eater-haunts us still on all but zero-wait-state 286and 386 computers,
and that means that the prefetch queue cycleeater is aliveand well. (The system memory
wait state cycle-eater isnt really a new cycleeater, but rather a variant of the general
wait state cycleeater, of which the display adapter cycleeater is yet another variant.)
While the 286 in the AT can fetch instructions much faster than can the 8088 in the
PC, it can execute those instructions faster still.
The picture is less clear in the 386 world since there areso many different memory
architectures, but similar problems can occur inany computer built around a 286 or
386. The prefetch queue cycle-eater is even a factor-albeit a lesser one-on zerowait-state machines, both because branching empties the queue and
because some
instructions can outrun even zero-wait-stateinstruction fetching.(Listing 11.1 would
21 2
Chapter 1 1
take at least 8 cycles per instruction on a zero-wait-stateAT-5 cycles longer than the
official execution time.)
To summarize:
Memory-accessing instructions dont run at their official speeds on non-zerowait-state 286/386 computers.
Theprefetchqueuecycle-eaterreducesperformanceon
286/386 computers,
particularly when non-zero-wait-state memoryis used.
Branches often execute at less than their rated speeds on the 286 and 386 since
the prefetch queue is emptied.
The extent to which the prefetch queue and wait states affect performance varies
from one 286/386 computer to another, making precise optimization impossible.
Whats to be learned fromall this? Several things:
Keepyourinstructionsshort.
Keep it in the registers; avoid memory, since memory generally cant keep up
with the processor.
Dont
jump.
Of course, those areexactly the rules that apply to 8088 optimization as well. Isnt it
convenient that the same general rules apply across the board?
Data Alignment
Thanks toits l6bit bus, the 286 can access word-sizedmemory variables just as fast as
byte-sized variables.Theres a catch, however: Thats onlytrue forword-sized variables
that start at even addresses. When the 286 is asked to perform a word-sized access
starting at an odd address, it actually performs two separate accesses, each of which
fetches 1 byte, just as the 8088 does for all word-sized accesses.
Figure 11.1 illustrates thisphenomenon. The conversion of word-sized accesses toodd
addresses into double byte-sized accesses transparent
is
to memory-accessing instructions;
all any instruction knows is that the requested word has been accessed, no matter
whether 1 word-sized access or 2 byte-sized accesses wererequired to accomplish it.
The penalty for performinga word-sized accessstarting at anodd address is easy to
calculate: Two accesses take twice as long as one access.
.p
That, ina nutshell, is the data alignment cycle-eater, the onenew cycle-eater of the
286 and 386. (The dataalignment cycle-eater is a close
relative of the 8088s 8-bit bus
cycle-eater, but since it behaves differently-occurring only at odd addresses-and is
avoided with adifferent workaround,well consider it to be a new cycle-eater.)
Pushing the 286 and 386
21 3
69
Memory
To
286
The 80286 reads the word value
838217 at address 20000h with a
single word-sized access since that
word value starts at an even address.
2o02
2003
Memory
To
286
2002
2003
85
The way to deal with the data alignmentcycle-eater isstraightforward: Dont perform
word-sized accesses to odd addmses on the 284 ifyou can
it. The easiest way to avoid the
data alignment cycleeater is to placethe directive EVEN before each of your word-sized
variables. EVEN forces the offset of the nextbyte assembled to be even by inserting
a NOP if the currentoffset is odd; consequently, you can ensure thatany word-sized
variable can be accessed efficiently by the 286 simply by preceding itwith EVEN.
Listing 11.2, which accesses memory a word at a time with each word startingat an
odd address, runs on a 10 MHz AT clone in 1.27 ps per repetition of MOVW, or 0.64 ps
per word-sized memory access. Thats 6plus cycles per word-sized access, which
breaks
down to two separate memory accesses-3 cycles
to access the high byte of each
word and 3 cycles to access the low byte of each word, the inevitable result of nonword-aligned word-sized memory accesses-plus a bit extra forDRAM refresh.
he&
214
Chapter 1 1
***
L i s t i n g1 1 . 2
***
; M e a s u r e st h ep e r f o r m a n c eo fa c c e s s e st ow o r d - s i z e d
; v a r i a b l e st h a ts t a r ta t
o d da d d r e s s e s( a r en o t
; word-aligned).
Skip:
push
POP
mov
mov
mov
c ld
call
rep
call
ds
es
s i .; ls o u r c ae n d e s t i n a t i o an r teh e
; a n bd o t ah r ne owt o r d - a l i g n e d
di.si
c x . 1 0 0 0 ;move 1000words
same
ZTimerOn
movsw
ZTimerOff
On the other hand,Listing 11.3, which is exactly the same as Listing 11.2 save that
the memory accesses are word-aligned (start ateven addresses), runs in0.64 ps per
repetition of MOVSW, or 0.32 ps per word-sized memory access. Thats 3 cycles per
word-sized access-exactly twice as fast as the non-word-aligned accesses of Listing
11.2,just as we predicted.
LISTING 1 1.3 11
;
***
L i s t i n g1 1 . 3
1 -3.ASM
***
; M e a s u r e st h ep e r f o r m a n c eo fa c c e s s e st ow o r d - s i z e d
; v a r i a b l e st h a ts t a r ta te v e na d d r e s s e s( a r ew o r d - a l i g n e d ) .
Skip:
ds push
POP
sub
mov
mov
cl d
call
rep
ZTim
c aelrl O f f
es
s i .;ssio u r ca end de s t i n a t i oa tnrhee
; abnoadwtrhoe r d - a l i g n e d
di.si
c x . 1 0 0 0 :move
1000
words
same
ZTimerOn
movsw
Code Alignment
Lack of word alignment can also interfere with instruction fetching on the 286, although not to the extent that it interferes
with accessto word-sized memoryvariables.
Pushing the 286 and 386
21 5
cx. 1000
ZTimerOn
LoopTop:
1 oop
call
LoopTop
ZTimerOff
Memory
A
201 00
20101
20102
201 03
201 04
The last byte of mov ax, 1 and the first
byte of mov bx,2, which together
form a worduligned word, are
prefetched with a single word-sized
access; the 286 later splits the bytes
apart internally in the prefetch queue.
Word-aligned
Figure 1 1.2
21 6
Chapter 1 1
prefetching
on the 286.
201O5
mov ax, 1
I mov bx,2
00
02
Memory
286
20 100
c3
20101
68
201 02
05
201 03
00
201 04
28
201 05
D2
I ret
I mov ax,5
sub dl,dl
Now, this code should run in, say, about 12 cycles per loop at most. Instead, it took
over 14 cycles per loop, an execution time that I could not explain in any way. After
rolling i t around in my head for a while, I took a look at the code under a
debugger...and the answer leaped out atme. The loop begun ut a n odd address! That
meant that two instruction fetches were required eachtime through the loop; one
to
of one wordget the opcodebyte of the LOOP instruction, which resided at the end
aligned word, and anotherto get the displacementbyte, whichresided at the start of
the nextword-aligned word.
One simple change broughtthe execution time down to a reasonable 12.5 cycles per
loop:
mov
call
c x . 1000
ZTimerOn
even
LoopTop:
1 oop
call
LoopTop
ZTimerOff
Pushing the
execute aNOP can sometimes cancel the performanceadvantage of having a wordaligned branch destination.
I recommend that you onlygo out of your way to word-align the start offsets ofyour
subroutines, as in:
even
FindChar
proc near
So far weve only discussed alignment as it pertains to the 286. What, you may well
ask, of the 386?
The 386 adds theissue of doubleword alignment (thatis, alignment to addresses that
are multiples of four.) The rule for the 386 is: Word-sized memory accesses should
be word-aligned (its impossible for word-aligned word-sized accesses to cross
doubleword boundaries) , and doubleword-sized memory accesses should be
doubleword-aligned. However, in real (as opposed to 32-bit protected) mode,
doubleword-sized memory accesses are rare,so the simple word-alignment rule weve
developed for the 286 servesfor the 386 in real mode as well.
As for code alignment..
.the subroutine-start word-alignment rule of the 286 serves
reasonably well there too since it avoids the worst case, where just 1byte is fetched on
entry to a subroutine.While optimum performancewould dictate doublewordalignment of subroutines, that takes 3 bytes, a high price to pay for an optimization that
improves performance only on thepost 286 processors.
21 8
Chapter 1 1
An interesting corollary to this rule is that you shouldnt INC SP twice to add 2, even
though that takes fewer bytes than ADD SP,2. The stack pointer is odd between the
first and secondINC, so any interrupt occurringbetween the two instructions will be
serviced more slowly than it normally would. The same goes for decrementingtwice;
use SUB SP,2 instead.
The DRAM refresh cycle-eateris the cycle-eater thats least changed fromits 8088 form
on the 286 and 386. In the AT,DRAM refresh uses a little over five percent of all
available memory accesses, slightly
less than it uses in the PC, but in thesame ballpark.
While the DRAM refresh penalty varies somewhaton various AT clones and 386 computers (infact, a few computers arebuilt around static RAM, which requires no refresh
at all; likewise,caches are made of static RAM so cached systems generally suffer less
from DRAM refresh), the5 percent figure is a goodrule of thumb.
Basically, the effect of the DRAM refresh cycle-eater is pretty much the same throughout the PC-compatible world: fairly small, so it doesnt greatly affect performance;
unavoidable, so theres no point in worrying about it anyway; and a nuisance since it
results in fractional cycle counts when using the Zen timer.Just as with the PC, a given
code sequenceon theAT can execute atvarying speeds at differenttimes as a result of
the interaction between the code and DRAM refresh.
Theres nothing much new with DRAMrefresh on 286/386 computers, then. Be aware
of it, but dontoverly concern yourself-DRAM refresh is stillan act of God, and theres
not a blessed thing you can do aboutit. Happily, the internal cachesof the 486 and
Pentium make DRAM refresh largely a performance non-issue on those processors.
21 9
during any given period of time. The 8088 is capable of making use of most but not
all of those slots withREP MOVSW,so the numberof memory accesses allowedby a
display adapter such as a standard VGA is reasonably well-matched to an 8088s
memory access speed. Granted,access to a VGA slows the 8088 down considerablybut, as wereabout to find out, considerablyis a relative term. What VGA
a
does to
PC performance is nothing compared to what it doesto faster computers.
Under ideal conditions, a 286 can access memory much, muchfaster than an 8088.
A 10 MHz 286 is capable of accessing a word of system memory every 0.20 ps with
REP MOVSW, dwarfing the 1 byte every 1.31 ps that the 8088 in a PC can manage.
However, access to display memory is anything but ideal for a 286. For one thing,
most display adapters are 8-bit devices,although newer adapters are
16-bit in nature.
One consequence of that is that only 1 byte can be read or written per access to
display memory; word-sized accesses to 8-bit devices are automatically split into 2
separate byte-sized accesses by the ATs bus. Another consequence is that accesses
are simply slower; the ATs bus inserts additional wait states on accesses to 8-bit devices since it mustassume that such devices weredesigned for PCs and may not run
reliably at AT speeds.
However, the 8-bit size of most display adapters is but one of the two factors that
reduce the speed
with whichthe 286 can access display memory. Far
more cycles are
eaten by the inherent memory-access limitations of display adapters-that is, the
limited number of display memory accesses that display adapters make available to
the 286. Look at it this way: If REP MOVSW on a PC can use more than half of all
available accesses to display memory, then how much faster can code running ona
286 or 386 possibly run when accessing displaymemory?
Thats right-less than twice as fast.
In otherwords, instructions thataccess displaymemory wont run a whole lot faster
on ATs and faster computers than they do on PCs. That explains one of the two
viewpoints expressed at the beginning
of this section: The display adapter cycle-eater
is just about thesame on high-end computers as it is on thePC, in the sense that it
allows instructions thataccess displaymemory to run atjust about the
same speed on
all computers.
Of course, the picture is quite abit different whenyou compare the performanceof
instructions that access display memory to the maximum performance of those instructions. Instructions that access display memory receive many more wait states
when running ona 286 than they do on an8088. Why? While the 286 is capable of
accessing memory much more often than the
8088, weve seen that thefrequency of
access to display memory is determined not by processor speed but by the display
adapter itself. As a result, both processors are actually allowedjust about thesame
maximum number of accesses to display memory in any given time. By definition,
then, the 286 must spend many more cycles waiting than does the8088.
220
Chapter 1 1
And that explains the second viewpoint expressed above regarding thedisplay adapter
cycle-eater vis-a-vis the 286 and 386. The display adapter cycle-eater, asmeasured in
cycles lost to wait states, is indeed muchworse on AT-class computers than it
is on the
PC, and its worse stillon morepowerful computers.
How bad is the display adapter
cycle-eater on an AT? Its this bad: Based on my (not
inconsiderable) experience in timing display adapter access, found
Ive that the display adapter cycle-eater can slow an AT-r even a 386 computer-to near-PC
speeds when display memory is accessed.
I know thats hard to believe, but the display adapter cycle-eater gives out just so
many displaymemory accesses in agiven time, and no more, no matter
how fast the
processor is. In fact, the faster the processor, the more the display adapter cycleeater
hurts the performance
of instructions that access display memory.
The display adapter
cycle-eater is not only still present in 286/386 computers,its worsethan ever.
What can we do about this new, more virulent form of the display adapter cycleeater? The workaround is the same as it was on the PC: Access display memory as
little as you possibly can.
221
A couple of old instructions gain new features on the 286. For one, the 286 version
of PUSH is capable of pushing a constanton the stack. For another, the 286 allows
all shifts and rotates tobe performed for notjust 1bit or the number of bits specified
by CL, but for any constant number of bits.
222
Chapter 1 1
Note well: Those tricks don t necessarily work with system sofmare such asWindows, so Ih recommend against using them.Ifyou want $-gigabyte segments, use
a 32-bit environment suchas Win32.
Detailed Optimization
While the major 8088optimization rules hold true on computers built around the 286
and 386, manyof the instruction-specific optimizations no longer hold, forthe execution times of most instructions are quite different on the 286 and 386 than on the
8088. We have already seen one such example of the sometimes vast difference between 8088 and 286/386 instruction execution times: MOV [wordvar],O, which has
an Execution Unit execution time of 20 cycleson the8088, hasan EU execution time
ofjust 3cycles on the 286 and 2 cycles on the 386.
In fact, the performanceof virtually all memory-accessing
instructions has been improved enormously on the 286 and 386. The key to this improvement is the near
elimination of effective address (EA) calculation time. Where an 8088 takes from 5
to 12 cycles to calculate an EA, a 286 or 386 usually takes no time whatsoever to
perform the calculation. If a base+index+displacement addressing mode, such as
MOV AX,[WordArray+BX+SI],is used on a 286 or 386, 1 cycle is taken to perform
the EA calculation, but thats both the worst case and the only case in which theres
any EA overhead at all.
EU execution time of memoryThe elimination of EA calculation time means that the
addressing instructions is much closer to the EU execution time of register-only
instructions. For instance, on the 8088 ADD [wordVar],lOOH is a 31-cycle instruction, while ADD DX,lOOHis a 4cycle instruction-a ratio of nearly 8 to 1. By contrast,
Pushing the
223
on the286ADD wordVar1,lOOH is
a kycle instruction,while ADD DX,lOOH ais3-cycle
instruction-a ratio ofjust 2.3 to 1.
It would seem, then, thatits lessnecessary to use the registers on the286 than itwas
on the8088, but thats simply not thecase, for reasons weve already seen. Thekey is
this: The 286 can execute memory-addressing instructions so fast that theres no
spare instruction prefetchingtime during those instructions,so the prefetch queue
runs dry, especially on the AT, with its one-wait-statememory. On the AT, the 6-byte
instruction ADD [WordVar],lOOH iseffectivelyat least a 15-cycle instruction, because
3 cycles are needed tofetch each of the three instruction words and 6 more cycles
are needed to read
WordVar and write the result back to memory.
Granted, theregister-only instruction ADD DX,lOOH also slows down-to 6 cyclesbecause of instruction prefetching, leaving a ratio of 2.5 to 1. Now, however,lets look at
the performanceof the same code on an 8088. The register-only code would run in 16
cycles (4 instruction bytes at 4 cycles per byte),while the memory-accessing code would
run in 40 cycles (6 instruction bytes at 4 cycles per byte, plus 2 word-sized memory
accesses at 8 cycles per word).Thats a ratioof 2.5 to 1, exactly the same ason the 286.
This is all theoretical. We put our trust not in theory but in actual performance,so
lets run this code through the Zen timer. On a PC, Listing 11.4, which performs
register-only addition, runs in3.62 ms, while Listing 11.5, which performs addition
to amemory variable, runs in10.05 ms. On a 10MHz AT clone, Listing 11.4 runs in
0.64 ms, while Listing 11.5 runs in 1.80 ms. Obviously,the AT is much faster...but the
ratio of Listing 11.5 to Listing 11.4 is virtually identical on both computers, at2.78
for the PC and 2.81 for the AT.If anything, the register-only form of ADD has a
slightly Zurgeradvantage on the AT than it doeson the PC in this case.
Theory confirmed.
LISTING 1 1.4 11
:
***
L i s t i n g1 1 . 4
1-4.ASM
***
; M e a s u r e st h ep e r f o r m a n c eo fa d d i n ga ni m m e d i a t ev a l u e
; t o a r e g i s t e r ,f o rc o m p a r i s o nw i t hL i s t i n g1 1 . 5 ,w h i c h
: a d d sa ni m m e d i a t ev a l u et o
call
1 0 0r0e p t
dx.100h
add
endm
ZTim
c ae lrlO f f
ZTimerOn
LISTING 1 1.5
:
***
a memory v a r i a b l e .
L i s t i n g1 1 . 5
11 1-5.ASM
***
: M e a s u r e st h ep e r f o r m a n c e
o f a d d i n ga ni m m e d i a t ev a l u e
: t o a memory v a r i a b l e ,f o rc o m p a r i s o nw i t hL i s t i n g1 1 . 4 ,
; w h i c ha d d sa ni m m e d i a t ev a l u e
t o a register.
224
Chapter 1 1
jv
Skip
even
WordVar dw
: a l w a y s make s u r ew o r d - s i z e d
: v a r i a b l e sa r ew o r d - a l i g n e d !
memory
Skip:
call
rept
add
endm
call
ZTimerOn
1000
[WordVarllOOh
ZTimerOff
Whats going on? Simply this: Instruction fetching is controlling overall execution
time on both processors. Boththe 8088 in a PC and the 286 in an AT can execute the bytes
of the instructions inListings 11.4 and 11.5faster than they can be fetched.
Since the
instructions areexactly the same lengths on bothprocessors, it standsto reason that
the ratio of the overall execution times of the instructions should be the same on
both processors as well.Instruction length controls execution time,
and theinstruction lengths are thesame-therefore the ratios of the execution times are thesame.
The 286 can both fetch and execute instruction bytes faster than the 8088 can, so
code executes much faster on the 286; nonetheless, because the 286 can also execute those instruction
bytes much faster than it can fetchthem, overall performance
is still largely determined by the size of the instructions.
Is this always the case? No. When the prefetch queue is full, memory-accessing instructions on the 286 and 386 are much faster(relative to register-only instructions)
than they are on the 8088. Given the system wait states prevalent on 286 and 386
computers, however, the prefetch queue is likely to be empty quitea bit, especially
when code consisting of instructions with short EU execution times is executed. Of
course, thatsjust the sort
of code were likely to write when
were optimizing, so the
performance of high-speed code is more likely to be controlledby instruction size
than by EU execution time on most 286 and 386 computers, justas it is on thePC.
All of which is just a way of saying that faster memory access and EA calculation
notwithstanding, itsjustas desirable to keep instructions short
and memory accesses
to a minimum on the
286 and 386 asit is on the8088. And the way to do that is to use
the registers as heavily aspossible, use string instructions,use short formsof instructions, and the like.
The more things change, the morethey remain the same....
225
interrupts off. In other words, an interrupt can happen even though the Interrupt
flag is never set to1. Now, I dont want to blow this particular bug out
of proportion.
It only causes problems in code that cannot tolerate interrupts underany circumstances, and thats a rare sortof code, especially in user programs. However, some
code really does need to have interrupts absolutely disabled, with no chance of an
interrupt sneaking through. For example, a critical portion of a disk BIOS might
need to retrieve data from
the disk controller the instant it becomes
available; even
a few hundred microseconds of delay could result in asectors worth of data misread. In this case, one misplaced interrupt during aPOPF could result in a trashed
hard disk if that interruptoccurs while the disk BIOS isreading a sectorof the File
Allocation Table.
There is a workaround for thePOPF bug. While the workaroundis easy to use, its
considerably slower than POPF, and costs a few bytes as well, so you wont want to
use it in code that can tolerate interrupts. On the other hand, in code that truly
cannot be interrupted,you should view those extracycles and bytes as cheap insurance againstmysterious and erratic program crashes.
One obvious reason to discuss the POPF workaround is that its useful. Another
reason is that the workaround
is an excellent example
of Zen-level assembly
coding,
in that theres a well-defined goal to be achieved but no obvious way to do so. The
goal is to reproduce the functionality
of the POPF instruction withoutusing POPF,
and theplace to startis by asking exactly whatPOPF does.
All POPF does is pop theword on topof the stack into theFLAGS register, as shown
in Figure 11.4. How can we do that withoutPOPF? Of course, the 286s designers
intended us touse POPF for this purpose, and didnt intentionally provide
any alternative approach, so well have to devise an alternative approach of our own. To do
that, well have to search for instructions that contain
some of the same functionality
as POPF, in the hope that one of those instructions can be used in some way to
replace POPF.
Well, theres only one instruction other thanPOPF that loads the FLAGS register
directly from the stack, and thats IRET, which loads the FLAGS register from the
stack as it branches, as shown in Figure 11.5. IRET has no known bugs of the sort
that plague POPF, so its certainly a candidate to replace
POPF in non-interruptible
applications. Unfortunately,IRET loads theFLAGS register with the third word down
on thestack, not theword on topof the stack, as isthe case withPOPF; the far return
address that IRET pops into CS:IP lies between the top of the stack and the word
popped into theFLAGS register.
Obviously, the segment:offset that IRET expects to find on the
stack abovethe pushed
flags isntpresent when the stack is set up forPOPF, so well have to adjustthe stack
a bit before we can substitute IRET for POPF. What well have to do is push the
segment:offset of the instruction after our workaround code onto the stack right
above the pushed flags. IRET will then branch to that address and pop the flags,
226
Chapter 1 1
SP
ss
I-
3000
FLAGS
1800
ss
3000
FLAGS
0640
,1802
SP
31801
Memory
1
I
ss
FLAGS
31800
31802
SP
Smo
31800
31801
0295
31802
227
ss
IP
cs
18
FLAGS
31800
05
31801
90
31802
10
31803
31804
95
31805
02
31806
57
Memory
31800
05
31801
90
31802
10
18
31804
95
02
31806
57
Memory
IP
FLAGS
Figure 1 1.5
228
Chapter 1 1
31800
05
31801
90
10
31802
18
31803
31804
95
31805
02
+ 31806
57
address pushed on the stack will point to the instruction we want to continuewith.
The codeworks out like this:
j ms ph opr ot p f s k i p
popfiret:
:ibr reat n c h e s
: pushedby
CALL i n t o t h e
FLAGS r e g i s t e r
popfskip:
c a l lf a rp t rp o p f i r e t
;
;
:
:
o f t h en e x t
; p u s h e st h es e g m e n t : o f f s e t
; i n s t r u c t i o n on t h es t a c kj u s ta b o v e
; t h ef l a g sw o r d ,s e t t i n gt h i n g su p
so
: t h a t IRET will b r a n c ht ot h en e x t
; i n s t r u c t i o n a n dp o pt h ef l a g s
When e x e c u t i o nr e a c h e st h ei n s t r u c t i o nf o l l o w i n gt h i s
comment,
t h ew o r dt h a t
was on t o p o f t h e s t a c k
when JMP
SHORT
POPFSKIP
was r e a c h e dh a sb e e np o p p e di n t ot h e
FLAGS r e g i s t e r , j u s t as
i f a POPF i n s t r u c t i o nh a db e e ne x e c u t e d .
By the way, the flags can be popped much morequickly if youre willing to alter a
register in the process. For example, the following macro emulates POPF with just
one branch, butwipes out AX:
EMULATE-POPFLTRASHLAX
macro
p u schs
mov a x . o f f s e t $+5
push
ax
ir e t
endm
macro
$+4
ir e t
endm
which is shorter still, alters no registers, and branches just once. (Of course, this
version of EMULATE-POPF won't work on an8088.)
'L
Memory
IP
cs
1 s e g m e n tp o p f s k i p b
FLAGS
o f f s e tp o p f s k i p
???
317FA
317FC
317FE
4 31800
31802
H
???
???
???
???
Memory
317FA
cs
FLAGS
1 s e g m e n tp o p f s k i p C
1-
317FC
o f f speot p f s k i p + 5
317FE
o f f sp eo tp f s k i p
3p1uf8sl 0ah0gesd
31802
???
Memory
3-
317FA
317FA
317FE
cs
I s e g m e n tp o p f s k i p
Figure 1 1.6
230
Chapter 1 1
???
31800
31802
???
Previous
Home
Next
The standard version of EMULATE-POPF is 6 bytes longer than POPF and much
slower, as youdexpect given that itinvolves three branches. Anyone in his/her right
mind would prefer POPF to a larger, slower, three-branch macro-given a choice. In
non-interruptible code, however, theres no choice here; thesafer-if slower-approach
is the best. (Having people associate your programs with crashed computers is nota
desirable situation, no matter how unfair the circumstances under which it occurs.)
And now you know the nature of and theworkaround for the POPF bug. Whether
you ever need theworkaround or not,its a neatly packaged example of the tremendous flexibility of the x86 instruction set.
231
Previous
Home
Next
So this travelingsabpnan is walking downa road, and he sees a group of men digging
oa, there! he says. Whatyouguys need is a Model
a ditch with their b
8088 ditch digger!
ut a trowel and sells it to them.
A fewdays later, he st0
round. Theyre happywith the trowel, but he sells
them the latest ditchkigging technology, the Model 80286 spade. That keeps them
(a full 32 inches wide, with
content until he stohs by again with a Model 80386 shovel
ate the trowel), and that holds them until he comes back
eally need: a Model 80486 bulldozer.
&&opof the line, the salesman doesntpay them a call for a while.
re they none too friendly, but theyre digging with
the 80386
shovel;the bulldozer is sitting off to one side. Why on earth are you usingthat shovel?
the salesman asks. Whyarent you digging with the bulldozer?
Well, Lord knows we tried, says the foreman, but itwas all we could do just to lift
the damn thing!
Substitute processorfor the various digging implements, and you get an idea of
just how different the optimization rules for the 486 are from what youre used to.
Okay, its not quite that bad-but upon encountering a processor where string instructions are often to be avoided and memory-to-register MOVs are frequently as
fast as register-to-registerMOVs, Dorothy was heard to exclaim (before she sank out
235
Rules to Optimize By
In Appendix G of the i486 Microprocessor Programmers Reference Manual, Intel lists a
number of optimization techniques for the
486. Whileneither exhaustive (welllook
at two undocumented optimizations shortly) nor entirely accurate (well correct two
of the rules here), Intels list is certainly a good starting point. In particular, the list
conveys the extent to which 486 optimization differs from optimization for earlier
x86 processors. Generally, Ill be discussing optimization for real mode (it being the
most widely used mode at the moment),
although many ofthe rules should apply to
protected mode as well.
236
486 optimization is generally more precise and less frustrating than optimization
for other x86processors because every 486 has an identical internal cache. Whenever both the instructions being executed and thedata the instructions access are
in the cache, those instructions will run ina consistent and calculatable number of
cycles on all 486s, with little chance of interferencefrom the prefetch queue and
without regard to the speed of external memov.
Chapter 12
In other words, for cached code (which time-critical code almost always is), performance is predictable and can be calculated
with good precision, and those calculations
will apply on any 486. However, predictabledoesnt mean trivial;the cycle times
printed for the various instructions are not thewhole story.You must be aware of all
the rules, documented and undocumented, that go into calculating actual execution times-and uncovering some of those rules is exactly what this
chapter is about.
Therefore, in real mode, the rule is to avoid using two registers to point to memory
wheneverpossible. Often, this simply means adding thetwo registers together outside a loop before memory is actually addressed.
add
add
dcexc
jnz
ax.[bx+sil
s i .2
LoopTop
with this
add
s i .bx
LoopTop:
add
ax.Csil
add s i .2
dcexc
jnz
LoopTop
sub
si.bx
which calculates the same sum and leaves the registers in the same state as the first
example, but avoids indexed addressing.
In protected mode, thedefinition of indexed addressing is a tad more complex. The
use of two registers to address memory, as in MOV EAX, [EDX+EDI], still qualifies
Pushing the 486
237
for the one-cycle penalty. In addition, the use of 386/486 scaled addressing, as in
MOV [ECX*2],EAX,
also constitutes indexed addressing, even if only one register is
used to point to memory.
All this fuss over one cycle! You might well wonder how much difference one cycle
could make. After all,on the8088, effectiveaddress calculations take a minimum of5
cycles. On the 486, however, 1 cycle is a big deal because many instructions, including most register-only instructions (MOV, ADD, GMP, and so on) execute injust 1
cycle. In particular, MOVs to and from memory execute in 1 cycle-if theyre not
hampered by something like indexed addressing, in which case they slow to half
speed (or worse, as we will see shortly).
For example, consider the summing example shown earlier. The version that uses
base+index ( [BX+SI])addressing executes in eightcycles per loop. As expected, the
version that uses base ( [SI]) addressing runs one cycle faster, at seven cycles per
loop. However, the loop codeexecutes so fast on the 486 that the single cycle saved
by using base addressing makes the whole loop more than 14 percentfaster.
In a key loop on the 486, 1 cycle can indeed matter.
BX.OFFSET M e m V a r
A X , [BXI
theres no way that the 486 can calculate the address referenced by MOV AX,[BX]
until MOV BX,OFFSET MemVar finishes, so pipelining that calculation ahead of
time is not possible. A good workaroundis rearranging your code so that atleast one
instruction lies between the loadingof the memory pointer andits use. For example,
postdecrementing, as in the following
238
Chapter 12
LoopTop:
add ax, [ s i 1
add
s i .2
dcexc
jnz
LoopTop
add
add
dec
jnz
s i ,2
ax,[SIl
cx
LoopTop
Now that we understand what Intel means by this rule, letme make a very important
comment: My observations indicate that forreal-mode code, the documentation understates the extentof the penalty for interrupting theaddress calculation pipeline
its used.
by loading amemory pointer just before
The truth of the matter appears to be that i f a register is the destination of one
instructionand is thenusedbythenextinstruction
to address memory in real
mode, not one but two cycles are lost!
In 32-bit protected mode, however, the penalty is, in fact, the 1 cycle that Intel
documents.
Considering that MOV normally takes only one cycle total, thats quite a loss. For example, the postdecrement loop shown above is 2 full cycles faster than the
preincrement loop, resulting in a 29 percent improvement in the performance of
the entire loop.But wait, theres more. If a register is loaded 2 cycles (which generally means 2 instructions, but, because some 486 instructions take more than 1 cycle,
I
Cycle #
Instruction
being executed
NOV
AX,BX
n+l
[BX] MOV
n+2
A L , [ S I + l ]M O V
n+3
CX.DX
Address being
calculated (arrow
points to cycle during
which address is used)
,1
MOV
239
the 2 are not always equivalent) before its used to point to memory, 1 cycle is lost.
Therefore, whereas this code
mov
mov
di xn c
cdxe c
jnz
b x . o f f s e t MemVar
a x[ b, x ]
LoopTop
loses two cycles from interrupting theaddress calculation pipeline, this code
mov
di xn c
mov
cdxe c
jnz
b x . o f f s e t MemVar
a x[ b, x ]
LoopTop
jnz
b x . o f f s e t MemVar
a x[ b, x ]
LoopTop
Cycle #
CBXI
CSI+11
A X ,nB X
Instruction
being executed
NOV
CX,DX
n+l
MOV
n+2
[EX]
MOV
n+3
,1
M O V AL.[SI+ll
Figure 12.2
240
Chapter 12
Address being
calculated (arrow
points to cycle during
which address is used)
Caveat Programmor
A caution: Im quite certain that the 2-cycle-ahead addressing pipeline interruption
penalty Ivedescribed exists in the two 486s Ive tested. However, theres no guarantee that Intelwont change this aspect of the 486 in the future,especially giventhat
the documentation indicates otherwise. Perhaps the 2-cycle penalty is the result of a
bug in the initial steps of the 486, and will revert to the documentedl-cycle penalty
someday; likewise for the undocumented optimizations Ill describe below. Nonetheless, none of the optimizations I suggest would hurt performance even if the
undocumented performancecharacteristics of the 486 were to vanish, and they certainly will help performance on at least some 486s right now, so I feel theyre well
worth using.
There is, of course, no guarantee that Im entirely correctabout the optimizations die
cussed in this
chapter. Without knowingthe internals of the 486, all I can do is time code
and make inferences fromthe results; I invite you todeduce your own rules and cross
check them against mine.
Also, most likelythere are other optimizations that Im unaware
of. If you have
further information on these or any other undocumented optimizations,
please write and let me know. And, of course, if anyone from Intelis reading this and
wants to give usthe gospel truth, please do!
241
to exhibit the addressing pipeline interruption phenomenon (SP is both destination and addressing register for bothinstructions, according to Intel), butthis code
runs in six cycles per POP/RET pair, matching the official execution times exactly.
Likewise, a sequence like
POP
PcOxP
PbOxP
PaOxP
dx
loses two cycles because SP is the explicit destination of one instruction and then the
implied addressing register for the next, and the sequence
add
sp.10h
POP
ax
ah.0
mov
mov
cx.[MemVarll
al.CMemVar21
add
cx.ax
242
Chapter 12
because AL is loaded by one instruction, then AX is used as the source register for
the next instruction. A cycle canbe saved simply byrearranging the instructions so that
the byte register load isnt immediately followed by the word register usage, like so:
mov
ah.0
mov
mov
a1 .[MemVarZI
cx.[MemVarll
add
cx.ax
Strange as it may seem, this rule is neither arbitrary nor nonsensical. Basically, when
a byte destination register is part of a word source register for the next instruction,
the 486 is unable to directly use the result from thefirst instruction as the source for
the second instruction, because only part of the register required by the second
instruction is contained in the first instructions result. The full, updated register
value must be read from theregister file, and that value cant be read out until the
result from the first instruction has been written into the register file, a process that
takes an extra cycle. Im not going to explain this in great detail because its not
important thatyou understand why this rule exists (only that itdoes in fact exist) ,but
it is an interesting window on theway the 486 works.
In case youre curious, theres no such penalty for the typical XLAT sequence like
mov
bx.offset MemTable
mov a1 . [ s i 1
x1 at
In general, penalties for interrupting the 486s pipeline apply primarily to the fast
core instructions of the 486, most notably register-only instructions and MOV, although arithmeticand logical operations thataccess memory are also often affected.
I dont know all the performance dependencies, and I dont plan to; figuring all of
them outwould be a big, boring job of little value. Basically, on the486 you should
concentrate on using those fast core instructions when performance matters, and all
the rules Ill discussdo indeedapply to those instructions.
You dont need to understand every corner of the 486 universe unless youre a diehard head
who does this stuff for fun.Just learnenough to be able to speed up
Pushing the 486
243
the key portions of your programs, and spend the rest of your time on a fast design
and overall implementation.
a1.bl
cx.dx
s i , [di]
takes four rather than the expected three cycles to execute. Note that it is not required that the
byte register be partof the registerused to address memory; any byte
register will do the trick.
Worse still, loading byte registers both one andtwo cycles before a registeris used to
address memorycosts two cycles, as in
mov
mov
mov
bl .a1
c1.3
bx. [ s i 1
which takes fiverather than threecycles to run. However, there is no penalty if a byte
register is loaded one cycle but not two cycles before a register is used to address
memory. Therefore,
mov
mov
mov
cx.3
dl .a1
si, [bxl
bx.offset M e m V a r
cl.a1
ax, [ bx]
A more sophisticated programmer would expect tolose one cycle, becauseBX is loaded
two cycles beforebeing used to address memory.
In fact, though, this code takes
5 cycles
2 cycles, or 67 percent, longer than normal. Why? Well, under normal conditions,
244
Chapter 12
loading abyte register-CL in this case-one cycle before using a register to address
memory produces no penalty; loading 2 cycles ahead is the only case that normally
incurs a penalty. However, think of Rule #4 as meaning that loadinga byte register
disrupts the memory addressing pipeline as it starts up. Viewed that way, we can see
that MOV BX,OF'FSET MemVar interrupts theaddressing pipeline, forcing it to start
again, and then,presumably, MOV CL,AL interrupts thepipeline again because the
pipeline is now on its first cycle: the onethat loading abyte register can affect.
I know-it seems awfully complicated. It isn 't, rea&. Generally, try not to use byte
destinations exactly two cycles before usinga register to address memory, andtry
not to load a register either one or two cycles before using it to address memory,
and you '11 be fine.
Timing Your O w n
486 Code
In case you wantto do some 486 performance analysis ofyour own, let me show you
how I arrived at oneof the above conclusions; at thesame time, I can warn you ofthe
timing hazards of the cache. Listings 12.1 and 12.2 showthe code I ran through the
Zen timer in order to establish the effects of loading a byte register before using a
register to address memory. Listing 12.1ran in 120 ps on a 33 MHz 486, or 4 cycles
per repetition (120 ps/ 1000 repetitions = 120 ns per repetition; 120 ns per repetition/30 ns per cycle = 4 cycles per repetition);Listing 12.2 ran in 90 ps, or 3 cycles,
establishing that loading a byte register costs a cycle only when it's performed exactly 2 cycles before addressing memory.
1 .ASM
: M e a s u r e st h ee f f e c to fl o a d i n g
: u s i n g a r e g i s t e rt oa d d r e s s
mov
a b y t er e g i s t e r
memory.
b p:.r2ut htneecsot dt we i ct oe
: i t ' s cached
2 c y c l e sb e f o r e
m askuer e
bsxu. bbx
CacheFill Loop:
c a l l Z T i m e r O :ns t a rt ti m i n g
r e p t 1000
mov
d. cl l
noP
mov
a x[ b, x l
endm
c a lZl T i m e r O f:fs t o pt i m i n g
bdpe c
jz
Done
jmpCacheFi11Loop
Done:
1 c y c l eb e f o r e
m askuer e
: i t ' s cached
bsxu. bbx
245
Previous
Home
Next
CacheFill Loop:
c a lZl T i m e r O n: s t a rtti m i n g
r e p t 1000
noP
mov d, cl l
mov
ax.[bxl
endm
c a lZl T i m e r O f;fs t o pt i m i n g
bp
dec
jz
Done
j mCpa c h e F iLl lo o p
Done:
Note that Listings 12.1 and 12.2 each repeat the timing of the code under test a
second time,to make sure that the instructions are in the cache on the second
pass,
is lessthan 8Kin size,
the onefor which results are displayed. Also note that the code
so that it can all fit in the 486s 8K internal cache. If I double the REP value in
Listing 12.2 to 2,000, making the test code larger than 8K, the execution time more
than doubles to224 ps, or 3.7 cycles per repetition; the extraseven-tenths of a cycle
comes from fetchingnoncached instruction bytes.
Wheneveryou see non-integral timing results of this sort, its a good bet that the
test code or data isn t cached.
Previous
Home
Next
249
100 times faster than the 8088-and for which there is no good optimization documentation available. Sure, Intelprovides a few tidbits on optimization in the
back of
the i486 Microprocessor Programmers Reference Manual, but, as I discussed in Chapter
12, that information is both incomplete and not entirely correct.Besides, most assembly language programmers dont bother to read Intels manuals (which are
extremely informative and well done, but only slightly more fun to read than the
phone book), and
go right on programming the
486 using outdated 8088 optimization techniques, blissfully unaware of a new and heavily mutated generation of
cycle-eaters that interactwith their code inways undreamt of even on the386.
For example, considerhow Terje Mathisen doubled the speedof hiswordcounting
program on a486 simply by shuffling a coupleof instructions.
LISTING 13.1
mov di.[bp+OFFSl
mov b l , [ d i 1
add d x . [ b x + 8 0 0 0 h l
113-1.ASM
: g e tt h en e x tp a i ro fc h a r a c t e r s
: g e tt h es t a t ev a l u ef o rt h ep a i r
:incrementwordand
l i n ec o u n t
: a p p r o p r i a t e l yf o rt h ep a i r
250
Chapter
13
MOV DI,[BP+OFFS]
MOV B L . C D I 1
ADD DX,[BX+8000Hl
MOV DI.[BP+OFFSl
one instruction from the use of BX to address memory, so the pipeline penalty is
reduced fromtwo cycles toone cycle. The load of DI isalso separated by one instruction from the use of DI to address memory (remember, the loopis unrolled, so the
last instruction is followed by the first instruction), butbecause the intervening instruction takes two cycles, theres no penalty at all.
Remembel; pipeline penalties diminish with increasing numberof cycles, not instructions, between the pipeline disrupter and the potentially aficted instruction.
LISTING 13.2
mov b l , [ d i 1
mov di.[bp+OFFS]
adddx.[bx+8000h]
11 3-2.ASM
; g e tt h es t a t ev a l u ef o rt h ep a i r
; g e tt h en e x tp a i ro fc h a r a c t e r s
: i n c r e m e n tw o r da n dl i n ec o u n t
; a p p r o p r i a t e l yf o rt h ep a i r
At this point, Terje had nearly doubled the performanceof this code simply by moving one instruction. (Note that swapping the instructions also made it necessary to
preload DI at the startof the loop; Listing 13.2 is not exactly equivalent to Listing
13.1.) Ill let Terje describe his next optimization in his own words:
Aiming the 486
251
When I looked closely as this, I realized that the two cycles for the finalADD is just
the sum of 1 cycle to load the data frommemory, and 1cycle to add it to
DX, so the
code couldjust as well have been written as shown in Listing 13.3. The final breakthrough came when I realized that by initializing AX to zero outside the loop, I
could rearrange it as shown in Listing 13.4 and do the final ADD DXafter the
loop. Thisway there aretwo single-cycle instructions between the first and the fourth
line, avoiding all pipeline stalls, for a total throughputof two cycles/char.
LISTING 13.3
mov b l , [ d i 1
mov di.[bp+OFFSl
mov ax.[bx+8000hl
adddx,ax
LISTING 13.4
11 3-3.ASM
; g e tt h es t a t ev a l u ef o rt h ep a i r
; g e tt h en e x tp a i ro fc h a r a c t e r s
; i n c r e m e n tw o r da n dl i n ec o u n t
; a p p r o p r i a t e l yf o rt h ep a i r
11 3-4.ASM
mov b l , [ d i 1
mov di.[bp+OFFSl
adddx,ax
; g e tt h es t a t ev a l u ef o rt h ep a i r
; g e tt h en e x tp a i ro fc h a r a c t e r s
; i n c r e m e n tw o r da n dl i n ec o u n t
; a p p r o p r i a t e l yf o rt h ep a i r
mov a x . [ b x + 8 0 0 0 h; lg ei tn c r e m e n t fsonr e xtti m e
Id like to
point outtwo fairly remarkable things. First,the single cyclethat Terje savedin
Listing 13.4 sped up his entire word-counting engine
by 25 percent or more;Listing
13.4 is fully twice as fast as Listing 13.1-allthe resultof nothing more than shifting
an instructionand splitting another intotwo operations. Second,Terjes word-counting engine canprocess more than 16million characters per second on a486/33.
Clever 486 optimization can pay off big. QED.
BSWAP can also be useful for reversing the order of pixel bits from a bitmapso that
they can be rotated32 bits at atime with an instruction suchas ROR =,I.
Intels
byte ordering for multiword values (least-significant byte first) loads pixels in the
wrong order, s o far as word rotation is concerned, butBSWAP can take care of that.
252
Chapter
13
EAX before
BSWAP
0x34
0 0x x7 58 6
Bit 0
Bit 3 1
EAX after
BSWAP
0x56
0x34
Bit 3 1
0x12
Bit 0
BSWAP in operation.
Figure 13.2
As it turns out, though,BSWAP is also useful inan unexpected way, having todo with
making efficient use of the upper half of 32-bit registers.As any assembly language
programmer knows, the x86 register set is too small; or, to phrase thatanother way, it
sure would benice if the register set were bigger.As any 386/486 assembly language
programmer knows, there are many cases in which 16 bits is plenty. For example, a
16-bit scan-line counter generally does the trick nicely in a video driver, because
there are very few video devices withmore than 65,535 addressable scan lines. Combining these two observations yieldsthe obvious conclusion that itwould be greatif
there were some way to use the upper andlower 16 bits of selected 386 registers as
separate 16-bit registers, effectivelyincreasing the available register space.
Unfortunately, the x86 instruction set doesnt provide any way to work directly with
only the upperhalf of a 32-bit register.The next best solution is to rotate theregister
to give you accessin the lower 16 bits to thehalf youneed at any particular time, with
code along the lines of that in Listing 13.5. Having to rotate the 16-bit fields into
position certainly isnt as good as having direct access to the upper half, but surely
its better than having to get the values out of memory, isntit?
LISTING13.5
113-5.ASM
mov
cx,[initialskipl
s eh cl x . 1; 6p us kt vi pa l uui nep p eh ra l f
mov
c x , l O; pO
l oucot opui n t
o f ECX
CX
253
1 ooptop:
r oe rc x . 1 6
b:axsd.kcdi xp
vsna:kesliuxepcettxi n c
ror
e c x . 1: p6u t
: c o u cnxt d e c
jnz
1 ooptop
CX
1 oop
count
i n CX
down l o o p
Not necessarily. Shifts and rotates are among the worst performing instructions of
the 486, taking 2 to 3 cycles to execute. Thus, ittakes 2 cycles torotate the skip value
into CX in Listing 13.5, and 2 more cycles to rotate itback to the upperhalf of ECX.
Id say four cycles isa pretty steep priceto pay, especially considering thata MOV to
or from memory takes onlyone cycle. Basically,using ROR to access a 1&bit valuein
the upper half of a 16-bit register is a pretty marginal technique, unless for some
reason you cant access memory at all (for example, if youre using BP as a working
register, temporarily making the stack frame inaccessible).
On the386, ROR was the only way to split a 32-bit register into two 16-bit registers.
On the 486, however, BSWAP can not only do the job, but
can do it better, because
BSWAP executes in just onecycle. BSWAP has the addedbenefit of not affecting any
flags, unlikeROR. With BSWAP-basedcode like that in Listing 13.6,the upper16 bits of
a register can be accessed with only 2 cycles of overhead and without altering any
flags, making the technique of packing two 16-bit registers into one 32-bit register
much more useful.
LISTING13.6113-6.ASM
mov
cx.[initialskipl
b s w ae :pcpxsuktvi ap l u e
mov
c x . 1 :0pl0uo ctoopui n t
1o o p t o p :
bswap
ecx
b:axsd.kcdi xp
vsna:kesliuxepcettxi n c
b s we:aplcopuxcoot pui nn t
: c o u nc xt d e c
l oj nozp t o p
i n u p p eh ra l f
CX
o f ECX
CX
CX
down l o o p
254
Chapter 13
mov
push
ax,[bxl
ax
is twice as fast as
p u s h word p t r [bxl
and the only costis that the previous contents of AX are destroyed.
Likewise, popping a memory location takes six cycles, but popping a register and
writing it to memory takes only two cycles combined. The i486 Microprocessor
Programmers Refeen,ce Manual lists a 4cycle execution time for popping a register,
but pay that no mind; poppinga register takes only 1 cycle.
Why is it that such a convenient operation as pushing or popping memory is so slow?
The rule on the 486 is that simple operations, which can beexecuted in a single cycle
by the 486s MSG core, are fast; whereas complexoperations, which must be carried
out in microcode just as they were on the 386, are almost all relatively slow. Slow,
complex operations include all the string instructions except REP MOVS, as well as
XLAT, LOOP, and, of course, PUSH mem and POP mem.
ax.1
O c l hO.e O hO.O l h
dx.ax
255
At the endof this sequence, DXwill contain 2, and thefast n-bit version of SHLAX,l
will have executed. If you use this approach, Id recommendusing a macro, rather
than sticking DBs in the middle of your code.
Again, this technique is advantageous only on a 486. It also doesnt apply to RCL and
RCR,where you definitely want to use the 1-bit versions whenever youcan, because
the n-bit versions are horrendously slow. But if youre optimizing for the 486, these
tidbits cansave a few critical cycles-and Lord knows that if youre optimizingfor the
486-that is, if youneed even more performance thanyou get from unoptimized code
on a 486-you almost certainlyneed all the speed you can get.
256
Chapter 13
Ill end this chapter with two more quirks of 32-bit addressing. First, as with l6bit
addressing, addressing that uses EBP asa base register both accesses the SS segment
by default and always has a displacement of at least 1 byte. This reflects the common
use of EBP to address a stack frame, but is worth keeping in mind if you should
happen to use EBP to address non-stack memory.
Aiming the 486
257
Previous
Home
Lastly, as I mentioned, ESP cannot be scaled. In fact, ESP cannot be an indexregister; it must be a base register. Ironically, however,ESP is the oneregister that cannot
be used to address memory without the presence of an SIB byte, even if its used
without an index register. This is an outcome of the way in which the SIB byte extends the capabilities of the Mod-R/M byte, and theres nothingto be done aboutit,
but its at least worth noting that ESP-based, non-indexed addressing makes for instructions that are a byte larger than other non-indexed addressing (but not any
slower; theres no l-cycle penalty for using ESP as a base register) on the 486.
Next
Previous
Home
Next
'
When you seem t&be stumped, stop fora minute and think. All the information you
need may be right"ih"frontof your nose if you just look at things a little differently.
Here's a case in poin6:;:~
s '"".$" .
When I was in college&-iisEd to stay around campus for the summer. Oh, I'd take a
have fun. In that spirit, my
course or two, but m&tly it was an excuse tohang out and
my future wife, partly for reasons that will soon become appargirlfriend, Adrian
ent), bussed in to sp,&nd
a week, sharing a less-than-elegant $150 per month apartment
with me andiag$*.&!&&
br>kggcessity, my roommate.
Our apartmentw?i$::pretty much standard issue for two male collegestudents; maybe
even a cut above. The dishes were usually washed, there was generally food in the
refrigerator, and nothinglarger than a small dog hadtaken up permanentresidence
in the bathroom.However, there was one sticking point (literally): the kitchen floor.
This floor-standard tile, with
a nice pattern of black lines on an off-white background (orso we thought)-had never been cleaned. By which I mean that I know
for a certainty that we had never cleaned it, but I suspect that it had in fact not been
cleaned since the Late Jurassic,or possibly earlier. Our feet tended to stick toit; had
the apartment suddenly turned upside-down, I think we'd all have been hanging
from the ceiling.
One day, my roommate and I returned from a pickup basketball game. Adrian, having
been left to her own devices for a couple of hours, hadapparently kept herself busy.
."e
({it
02..8gj
26 1
262 Chapter 14
Only when that first character does (infrequently) match must we drop back to the
slower REPZ CMPS approach.
Its important to understand thatwere assuming that thebuffer is typical text.Thats
what I meant at the outset,when I said that the informationyou need may be under
your nose.
Formally, you don t know a blessed thing about thesearch buffeer, but experience,
common sense, and your knowledge of the application give you a great deal of
useful, ifsomewhat imprecise, information.
If the buffer contains the letter A repeated 1,000 times, followed by the letter B,
then theREPNZ SWB/REPZ CMPS approach will be muchslower than thebruteforce REPZ CMPS approach when searching for the patternAB, because REPNZ
SCASB would match at every buffer location. You could construct a horrendousworstcase scenario for almost any good optimization; the key is understanding the usual
conditions under which your code will work.
As discussed in Chapter 9, we also knowthat certain characters
have lowerprobabilities of matching than others. In a normal
buffer, Twill match far more often than
X. Therefore, if we use REPNZ SCASB to scan for the least common letter in the
search string, rather than thefirst letter, well greatly decrease the numberof times
we have to drop back to REPZ CMPS, and the search time will become very close to
the time it takes REPNZSCASB to go from the start of the buffer to the match
location. If the distance to the first match is N bytes, the least-commonRJPNZ SCASB
approach will take about as long as N repetitions of REPNZ SCASB.
At this point, were pretty much searching at the speed of REPNZ S W B . On the
x86, there simply is no faster way to test each character in turn. In orderto get any
faster, wed have to
check fewer characters-but we cant do that and still be sure of
finding all matches. Can we?
Actually, yes,we can.
263
algorithm, is what gives Boyer-Moore its reputation for inscrutability. That reputation is well deserved for this aspect (which I will not discuss further in this book), but
theres another part of Boyer-Moore thats easily understood, easily implemented,
and highly effective.
Consider this: Were searching for the pattern ABC, beginning the search at the
start (offset 0) of a buffer containing ABZABC. We match on A, we match on B,
and we mismatch on C; the buffer contains a Z in this position. What have we
learned? Why, wevelearned notonly that the pattern doesnt match buffer
the starting at offset 0, but also that it cant possibly match starting at offset 1 or offset 2,
either! After all, theres a Z in the buffer at offset 2; since the pattern doesnt contain a single Z, theres no way that the pattern can match starting at any location
from which it would span theZ at offset 2. We can just skip straight from offset 0 to
offset 3 and continue,saving ourselvestwo comparisons.
Unfortunately, this approach only pays offbig when a near-complete partial match is
found; if the comparison fails on thefirst pattern character, as often happens,we can
only skip ahead 1 byte, as usual. Look at it differently, though: What if we compare
the pattern starting with the last (rightmost) byte, rather than the first (leftmost)
byte? In other words, what if we compare from high memory toward low, in the
direction in which string instructions go after the STD instruction? After all, were
comparing one set of bytes (the pattern) to another set of bytes (a portion of the
buffer) ; it doesnt matterin the least in what order we compare them, so long as all
the bytes in one set are compared to the corresponding bytes in the otherset.
Why on earth would we want to start with the rightmost character? Because a
mismatch on the rightmost charactertells us a great deal more than a mismatch
on
the leftmost character.
We learn nothing new from a mismatch on the leftmost character, except that the
pattern cant match starting at that location.
A mismatch on the rightmost character,
however, tells us
about thepossibilities ofthe patternmatching starting at every buffer
location from which the pattern spans the mismatch location. If the mismatched
character inthe buffer doesnt appearin the pattern, thenwevejust eliminated not
one potential match, but as many potential matches as there are characters in the
pattern; thats how many locations there arein the buffer that might have matched,
but have just been shown not to, because they overlap the mismatched character
that doesnt belongin the pattern. Inthis case, we can skip ahead by the full pattern
length in the buffer! This is how we can outperform even REPNZ SCASB;REPNZ
SCMB has to check every byte in the buffer, but Boyer-Moore doesnt.
Figure 14.1 illustrates the operationof a Boyer-Moore search when the rightmost character of the search pattern (which is the first character thats compared at each location
because were comparing backwards) mismatches witha buffer character that appears
264 Chapter 14
+0
Start of
buffer being
searched
R
A
2
3
4
5
<blank:,
6
7
<blank>
9
10
11
12
13
14
15
lil'
.
/
<blank>
E
Q
U
A
Start of
search pattern
nowhere in the pattern.Figure 14.2 illustrates the operation of a partial match when
the mismatch occurs with a character that's not a patternmember. In this case,we can
only skipahead past the mismatch location, resulting in an advance of fewer bytes
than
the pattern length, andpotentially as little as the same single bytedistance by which
the standard search approach advances.
What if the mismatch occurs with a buffer character that does occur in the pattern?
Then we can't skip past the mismatch location, but we can skip to whatever location
aligns the rightmost occurrence of that character in the pattern with the mismatch
location, as shown in Figure 14.3.
Basically, we exercise our right as members of a free society to compare strings in
whichever direction we choose, and we choose to do so right to left, rather than the
more intuitive left to right. Whenever we find amismatch, we see whatwe can learn
from thebuffer character thatfailed to match the pattern.Imagine that we move the
pattern to the right across the mismatch location until we find a start location that
Boyer-Moore String Searching
265
+0
A
T
<blank>
A
N
D
Start of
buffer being
searched
6
7
8 <blank>
E
9
10
11
12
13
14
15
Q
U
A
L
<blank>
S
H+
search
Of pattern
the mismatch does not eliminate as a possible match for the pattern.
If the mismatch
character doesn't appear in the pattern, the pattern can move clear past the mismatch location.Otherwise, the patternmoves until a matching patternbyte liesatop
the mismatch. That's all there is to it!
266
Chapter 14
-+
Start of
buffer being
searched
Start of
search pattern
T
E
3
4 <blank>
5
6
7
<blank>
E
Q
10
11
12
13
N
D
U
A
L
14
<blank>
15
The best case for Boyer-Moore is good indeed: About N/M comparisons are required,
the ability of Boyerwhere N is the buffer lengthand M is the pattern length. This reflects
Moore to skipahead by a full pattern length on a complete mismatch.
is a C implementation of Boyer-Moore searchHow fast isBoyer-Moore? Listing 14.1
ing; Listing 14.2 is a test-bed program that searches up to the first 32K of a file for a
pattern. Table 14.1 (all times
measured with Turbo Profiler on a 20 MHz cached 386,
searching a modified version of the text of this chapter) shows that this implementation is generally much slower than REPNZ S W B , although it does come close when
searching for long patterns.
Listing 14.1is designed primarily to make later assembly
implementations more comprehensible, rather thanfaster; Sedgewicks implementation uses arraysrather than pointers,is a great deal more compact and very clever,
and may be somewhat faster. Regardless, the far superior performance of REPNZ
SCASB clearly indicates that assembly language is in order atthis point.
Boyer-MooreStringSearching
267
LISTING 14.1
/*
# i n c l u d e< s t d i o . h >
268
114- 1.C
Searches a b u f f e r f o r a s p e c i f i e d p a t t e r n . I n c a s e o f
a mismatch,
s k i p a c r o s sa s many
uses t h e v a l u e o f t h e m i s m a t c h e d b y t e t o
a s p o s s i b l e( p a r t i a lB o y e r - M o o r e ) .
p o t e n t i a lm a t c hl o c a t i o n s
NULL i f
R e t u r n ss t a r to f f s e to ff i r s tm a t c hs e a r c h i n gf o r w a r d ,o r
n om a t c h i s f o u n d .
C++ i n C mode a n dt h es m a l lm o d e l .
*/
T e s t e dw i t hB o r l a n d
Chapter 14
u n s i g n e dc h a r * F i n d S t r i n g ( u n s i g n e dc h a r
u n s i g n e d i n tB u f f e r L e n g t h .u n s i g n e dc h a r
u n s i g n e d i n tP a t t e r n L e n g t h )
{
BufferPtr.
* PatternPtr.
* WorkingPatternPtr. * WorkingBufferPtr:
u n s i g n e dc h a r
u n s i g n e d i n t CompCount.SkipTableC2561,Skip.DistanceMatched:
i n t i;
/ * R e j e c t i f t h eb u f f e ri st o os m a l l
*/
i f ( B u f f e r L e n g t h < P a t t e r n L e n g t h )r e t u r n ( N U L L ) :
I* R e t u r n an i n s t a n tm a t c h i f t h e p a t t e r n i s 0 - l e n g t h
i f ( P a t t e r n L e n g t h == 0 ) r e t u r n ( B u f f e r P t r 1 ;
*I
/ * C r e a t et h et a b l eo fd i s t a n c e sb yw h i c ht os k i p
aheadon
m i s m a t c h e sf o re v e r yp o s s i b l eb y t ev a l u e
*/
/* I n i t i a l i z ea l ls k i p st ot h ep a t t e r nl e n g t h :t h i si st h es k i p
d i s t a n c ef o rb y t e st h a td o n ' ta p p e a ri nt h ep a t t e r n
*/
f o r ( i = 0: i < 2 5 6 ; i++l
SkipTableCil = PatternLength;
/ * S e tt h es k i pv a l u e sf o rt h eb y t e st h a td oa p p e a ri nt h ep a t t e r n
tothedistancefromthebytelocationtothe
end o f t h e
p a t t e r n . When t h e r ea r em u l t i p l ei n s t a n c e so ft h e
same b y t e ,
t h er i g h t m o s ti n s t a n c e ' ss k i pv a l u ei su s e d .N o t et h a tt h e
r i g h t m o s tb y t eo ft h ep a t t e r ni s n ' te n t e r e di nt h es k i pt a b l e :
i f we g e t t h a t v a l u e f o r
a mismatch, we know f o r s u r e t h a t t h e
r i g h t end o f t h ep a t t e r n h a sa l r e a d yp a s s e dt h em i s m a t c h
a r e l e v a n tb y t ef o rs k i p p i n gp u r p o s e s
l o c a t i o n , so t h i s i s n o t
f o r ( i = 0 : i < ( P a t t e r n L e n g t h - 1 ) : i++)
SkipTable[PatternPtr[i]]
= PatternLength - i
1:
*/
/*
P o i n tt or i g h t m o s tb y t eo ft h ep a t t e r n
*/
P a t t e r n P t r += P a t t e r n L e n g t h - 1 :
I* P o i n t t o l a s t ( r i g h t m o s t ) b y t e o f t h e f i r s t p o t e n t i a l p a t t e r n
m a t c hl o c a t i o ni nt h eb u f f e r
*/
B u f f e r P t r += P a t t e r n L e n g t h - 1:
/ * Count o f number o f p o t e n t i a lp a t t e r nm a t c hl o c a t i o n si n
b u f f e r *I
B u f f e r L e n g t h - = P a t t e r n L e n g t h - 1;
I* S e a r c ht h eb u f f e r
*/
w h i l e (1) (
/ * See i f we have a m a t c h a t t h i s b u f f e r l o c a t i o n
*I
WorkingPatternPtr = P a t t e r n P t r :
WorkingBufferPtr = BufferPtr:
CompCount = P a t t e r n L e n g t h :
/ * Compare t h ep a t t e r n a n dt h eb u f f e rl o c a t i o n ,s e a r c h i n gf r o m
h i g h memory t o w a r d l o w ( r i g h t t o l e f t )
*I
== * W o r k i n g B u f f e r P t r - )
I
w h i l e( * W o r k i n g P a t t e r n P t r / * I f w e ' v em a t c h e dt h ee n t i r ep a t t e r n ,i t ' s
a match
i f (-CompCount == 0 )
/* R e t u r n a p o i n t e r t o t h e s t a r t o f t h e m a t c h l o c a t i o n
r e t u r n ( B u f f e r P t r - P a t t e r n L e n g t h + 1):
*/
*/
I
/* I t ' s
Boyer-MooreStringSearching
269
s k i p a h e a df r o mt h em i s m a t c hl o c a t i o nb yt h es k i pd i s t a n c e
*I
f o rt h em i s m a t c hc h a r a c t e r
i f (SkipTable[*WorkingBufferPtrl <- DistanceMatched)
1:
I* s k i pd o e s n ' t doanygood,advanceby
1 *I
Skip
else
/ * Use s k i pv a l u e ,a c c o u n t i n gf o rd i s t a n c ec o v e r e db yt h e
p a r t i a lm a t c h *I
Skip
SkipTable[*WorkingBufferPtrl - DistanceMatched;
I* If s k i p p i n ga h e a dw o u l de x h a u s tt h eb u f f e r ,w e ' r ed o n e
w i t h o u t a match * I
if ( S k i p >- B u f f e r L e n g t h fr e t u r n ( N U L L ) :
I* S k i pa h e a da n dp e r f o r mt h en e x tc o m p a r i s o n
*I
B u f f e r L e n g t h -- S k i p ;
B u f f e r P t r +- S k i p ;
1
J
& 14.3.
( M u s tb em o d i f i e dt op u tc o p yo fp a t t e r na ss e n t i n e la te n do ft h e
s e a r c hb u f f e ri no r d e rt o
beused w i t h L i s t i n g 1 4 . 4 . )
*/
# i n c l u d e< s t d i o . h >
P
in c l ude < s t r i n g . h>
C i n c l ude < f c n t l h >
# d e f i n e DISPLAY-LENGTH
# d e f i n e BUFFER-SIZE
40
0x8000
* F i n d S t r i n g ( u n s i g n e dc h a r
e x t e r nu n s i g n e dc h a r
u n s i g n e dc h a r *, u n s i g n e di n t ) ;
v o i dm a i n ( v o i d 1 :
*,
u n s i g n e di n t .
v o i dm a i n 0 I
u n s i g n e dc h a r TempBuffer[DISPLAY-LENGTH+ll:
u n s i g n e dc h a rF i l e n a m e [ l 5 0 ] P
. a t t e r n C 1 5 0 1 *. M a t c h P t r *, T e s t B u f f e r :
in t Hand1 e;
u n s i g n e di n tW o r k i n g L e n g t h :
p r i n t f ( " F i 1 et os e a r c h : " ) :
gets(Filename1:
p r i n t f ( " P a t t e r nf o rw h i c ht os e a r c h : " ) :
gets(Pattern):
i f ( (Handle
open(Fi1ename. 0-RDONLY 1 0-BINARY))
p r i n t f ( " C a n ' to p e nf i l e :
% s \ n " . F i l e n a m e ) ;e x i t ( 1 ) ;
*/
I* Get memory i n w h i c h t o b u f f e r t h e d a t a
i f ( ( T e s t B u f f e r - ( u n s i g n e dc h a r
*)malloc(BUFFER-SIZE+1))
p r i n t f ( " C a n ' tg e te n o u g hm e m o r y \ n " ) ;e x i t ( 1 ) ;
1
/*
-1
) (
NULL) (
TestBufferCWorkingLength]
0: I* 0 - t e r m i n a t e b u f f e r f o r p r i n t f
I* S e a r c hf o rt h ep a t t e r n
and r e p o r t t h e r e s u l t s
*I
i f ((MatchPtr
F i n d S t r i n g ( T e s t B u f f e r .W o r k i n g L e n g t h .P a t t e r n ,
NULL) (
( u n s i g n e di n t )s t r l e n ( P a t t e r n ) ) )
270
Chapter 14
*I
/ * P a t t e r nw a s n ' tf o u n d
*/
p r i n t f ( " \ " % s \ "n o tf o u n d \ n " .P a t t e r n ) ;
else {
/ * P a t t e r n was f o u n d .Z e r o - t e r m i n a t eT e m p B u f f e r :s t r n c p y
w o n ' td o
i t i f DISPLAY-LENGTH c h a r a c t e r sa r ec o p i e d
*/
TempBufferCDISPLAY-LENGTH] = 0 :
p r i n t f ( " \ " % s \ "f o u n d .N e x t
Xd c h a r a c t e r sa tm a t c h : \ n \ " % s \ " \ n " .
P a t t e r n , DISPLAY_.LENGTH,
s t r n c p y ( T e m p B u f f e rM
. atchPtr.
DISPLAY-LENGTH)):
exit(0):
a s p e c i f i e dp a t t e r n .
I n case o f a mismatch,
many
p o t e n t i a lm a t c hl o c a t i o n sa sp o s s i b l e( p a r t i a lB o y e r - M o o r e ) .
R e t u r n ss t a r to f f s e to ff i r s tm a t c hs e a r c h i n gf o r w a r d ,o r
NULL i f
no match i s f o u n d .
T e s t e d w i t h TASM.
C n e a r - c a l l a b l ea s :
u n s i g n e dc h a r * F i n d S t r i n g ( u n s i g n e dc h a r * B u f f e r P t r .
* PatternPtr.
u n s i g n e d i n tB u f f e r L e n g t h .u n s i g n e dc h a r
u n s i g n e d in t P a t t e r n L e n g t h ) :
; u s e st h ev a l u eo ft h em i s m a t c h e db y t et os k i pa c r o s sa s
:
:
;
:
:
parms
struc
2 d u p: (p?u)s h e d
dw
?
B u f f e r P t r dw
B u f f e r L e n g t h dw ?
?
P a t t e r n P t r dw
P a t t e r n L e n g t h dw ?
parms
ends
BP & r e t ua rdnd r e s s
: p o bi nut toftefore r
be s e a r c h e d
:#bo yf t bei nus f fbtseoer a r c h e d
: p o i np taoet rt ew
f orhn
sr itecoahr c h
: l e n gpotahf t t efrwonhr istceoha r c h
.model
small
.code
public -Findstring
near
~ F i n dpSr ot rci n g
c ld
: p r ebcsfpase
r ulatrlasvm
echerek' s
mov
sbotfp:arup.atcsroom
kpi net
Boyer-MooreStringSearching
271
push
:spi r e s e r vceael rl r' e
s g i s t ev a
r ri
ab1 es
push
di
s us p
b . 2 5 6:*a2l l o c ast p
e a fcSoekr i p T a b l e
: C r e a t et h et a b l eo fd i s t a n c e sb yw h i c ht os k i p
aheadonmismatches
: f o re v e r yp o s s i b l eb y t ev a l u e .F i r s t .i n i t i a l i z ea l ls k i p st ot h e
: p a t t e r nl e n g t h :t h i si st h es k i pd i s t a n c ef o rb y t e st h a td o n ' t
: appear i n t h e p a t t e r n .
mov
ax.Cbp+PatternLengthl
i f t h pe a t t e r ins
a nadx . a; rxe t u rainn s t a n
mta t c h
jz
InstantM
; Oa-tlcehn g t h
rnov
d i .ds
mov
es . d i
:ES=DS=SS
mov
d i;.pSsop
kt oii np tB u f f e r
mov
cx.256
stosw rep
e e do n l y
:from ax
dec
now on, nwe
: PatternLength - 1
mov
[bp+PatternLength].ax
: P o i n tt ol a s t( r i g h t m o s t )b y t eo ff i r s tp o t e n t i a lp a t t e r nm a t c h
: l o c a t i o ni nb u f f e r .
add
[bp+BufferPtrl.ax
: Reject i f b u f f e ri st o os m a l l ,
and s e tt h ec o u n to ft h e
number o f
: p o t e n t i a lp a t t e r nm a t c hl o c a t i o n si nt h eb u f f e r .
[ b p + B usfuf eb r L e n g t h l . a x
jbe
NoMatch
: S e tt h es k i pv a l u e sf o rt h eb y t e st h a t
doappear
i nt h ep a t t e r nt o
; t h ed i s t a n c ef r o mt h eb y t el o c a t i o nt ot h ee n do ft h ep a t t e r n .
: When t h e r ea r em u l t i p l ei n s t a n c e so ft h e
same b y t e ,t h er i g h t m o s t
; i n s t a n c e ' ss k i pv a l u ei su s e d .N o t et h a tt h er i g h t m o s tb y t eo ft h e
; p a t t e r ni s n ' te n t e r e di nt h es k i pt a b l e :
i f we g e t t h a t v a l u e f o r
: a mismatch, we know f o r s u r e t h a t t h e r i g h t
end o f t h e p a t t e r n
has
: a l r e a d yp a s s e dt h em i s m a t c hl o c a t i o n ,
s o t h i s i s not a r e l e v a n tb y t e
: f o rs k i p p i n gp u r p o s e s .
mov
si.[bp+PatternPtr] : p o i n t t o s t a r t o f p a t t e r n
: a r et h e r e any s k i p s t o s e t ?
and
ax.ax
:no
jz
SetSkipDone
mov
: p o i n tt oS k i p B u f f e r
d i .sp
SetSkipLoop:
sub
dddressino
g fbf y t ve a l u e
bx , b :xp r e p a r feowr o r a
mov
b,l[ s i:]g etth en e xpt a t t e r nb y t e
inc
si
: a dpvataphtntoeecirennt e r
shl
b; p
x .r 1ewplfoaorordek u p
mo v
C d i + b x l . a x: s e tt h es k i pv a l u e
when t h i s b y t e v a l u e i s
: t h em i s m a t c hv a l u ei nt h eb u f f e r
dec
ax
j nz
SetSkipLoop
SetSkipDone:
; D L - r i g h t m o s tp a t t e r nb y t ef r o m
now on
mov
dl ,[si 1
:pointtonext-to-rightmostbyteofpattern
dec
si
mov
[ b p + P a t t e r n P t r l . s i : f r o m now on
: S e a r c ht h eb u f f e r .
backward:for
std
REP2 CMPSB
mov
d i . [ b p + B u f f e r P t; rp] o iftniortsset a r cl ohc a t i o n
:#o f m a t lcohc a t i octnhose c k
mov
cx.[bp+BufferLength]
SearchLoop:
S k i Sp IT at ob l e
; p omov
i n.ts p
si
: S k i pt h r o u g hu n t i lt h e r e ' s
a m a t c hf o rt h er i g h t m o s tp a t t e r nb y t e .
QuickSearchLoop:
; r i g h t m o s tb u f f e rb y t e
a t t h i sl o c a t i o n
mov
b l , Cdi 1
:does i t m a t c ht h er i g h t m o s tp a t t e r nb y t e ?
cmp
, b dl l
:yes,sokeepgoing
jz
F u l lCompare
272 Chapter 14
:convert
bh,bh
sub
l:bopxora.efkbdop-xdruapr e
mov
t o a word
S k i ipnT a b l e
a [xs, i + b :xg] se kt vi pa l uf reosmk ti ap b fl otehr i s
: mismatchvalue
+- S k i p :
add : B udfif,earxP t r
Skip:
: B ucf fxe. raLxseunbg t h
ja
QuickSearchLoop
;continue
i f a nbyu f f el er f t
NoMatch
short
jmp
: R e t u r n a p o i n t e rt ot h es t a r to ft h eb u f f e r( f o r0 - l e n g t hp a t t e r n ) .
align
2
InstantMatch:
mov
ax.[bp+BufferPtrl
s hj m
o rpt
Done
: Compare t h ep a t t e r n and t h eb u f f e rl o c a t i o n ,s e a r c h i n gf r o mh i g h
: memory t o w a r d l o w ( r i g h t t o l e f t ) .
2
align
FullCompare:
mov
[bp+BufferPtr].di
: s a v et h ec u r r e n ts t a t eo f
:s et haer c h
mov
[bp+BufferLengthl.cx
:#bo yf t ey steot
compare
mov
cx.[bp+PatternLengthl
M a jt c xhz
;done i f o n l y one c h a r a c t e r
mov
si.[bp+PatternPtr]
: p o i n tt on e x t - t o - r i g h t m o s tb y t e s
di
dec
: o fb u f f e rl o c a t i o n
and p a t t e r n
repz
cmpsb
p a t t h:compare
e er n o f r e s t h e
jz
: t hMaat t' sc h
i tfound
: we've
a match
: I t ' s a mismatch: l e t ' s s e e what we can l e a r n from i t .
i nd ci
:compensate f1o-rb yot ve e r r ou fn
REP2 CMPSB;
: p o i n tt om i s m a t c hl o c a t i o ni nb u f f e r
: d o fb y t e st h a td i dm a t c h .
mov
si.[bp+BufferPtr]
sub
. dsi i
: I f . based on t h em i s m a t c hc h a r a c t e r ,
we c a n ' te v e ns k i p
aheadas
far
: a s where we s t a r t e dt h i sp a r t i c u l a rc o m p a r i s o n ,t h e nj u s ta d v a n c eb y
: 1 t ot h en e x tp o t e n t i a lm a t c h ;o t h e r w i s e ,s k i p
a h e a df r o mt h i s
: c o m p a r i s o nl o c a t i o nb yt h es k i pd i s t a n c ef o rt h em i s m a t c hc h a r a c t e r .
: l e s st h ed i s t a n c ec o v e r e db yt h ep a r t i a lm a t c h .
s bu xb,:bpxr e p afwroeorar d d r e s s i obn fygvftael u e
mov
b l , [ d :i gl et thvea l uoet fhm
e i s m a t cbhy tiebnu f f e r
add
b x: p. br xewplfoaorordek - u p
:SP pSo kit noi pt sT a b l e
bx.sp
add
mov
c x . [ b :xgl et htsek vi pa l uf eot hr m
i si s m a t c h
mov
ax.1
:assume w ej u
' al sldt v a nt nthcoee x t
: p o t e n t i a lm a t c hl o c a t i o n
s ucbx . :sti h
issek fi a
pe rn o u gt o
h
bweo rttahk i n g ?
1
jna
MoveAhead ;no. go w i t thh de e f a u lat d v a n c oe f
mov
a xc:,xy e st h: iitsshdei s t a n cte
sok i p
ahead
from
: t h el a s tp o t e n t i a lm a t c hl o c a t i o nc h e c k e d
MoveAhead:
: S k i pa h e a da n dp e r f o r mt h en e x tc o m p a r i s o n ,
i f t h e r e ' sa n yb u f f e r
: l e f t t o check.
mov
di.Cbp+BufferPtr]
+==S k i p :
add : B u f df ei r,Pa txr
mov
cx.[bp+BufferLength]
-- S k i p :
: B u f f e r Lcexn.gatxh
sub
; c o n t i n u e i f any b u f f e r l e f t
ja
SearchLoop
: R e t u r n a NULL p o i n t e r f o r no match.
2
align
NoMatch:
sub
ax.ax
s h o r t Done
jmp
--
Boyer-MooreStringSearching
273
: R e t u r ns t a r t
align
o fm a t c hi nb u f f e r( B u f f e r P t r
2
- (PatternLength
1)).
Match:
mo v
sub
ax.Cbp+BufferPtr]
ax,[bp+PatternLengthl
Done:
c ld
add
POP
POP
POP
ret
F indStri ng
end
:restoredefaultdirectionflag
s p . 2 5 6 * 2; d e a l l o c a t es p a c ef o rS k i p T a b l e
; r de is t co ar el l er re' gs i svt ae r i a b l e s
si
bP
: r e s t coar el l es rt 'fasrcakm e
endp
:
:
:
:
;
:
:
:
:
:
:
;
;
a s p e c i f i e dp a t t e r n .
I n c a s eo f a mismatch,
uses t h ev a l u eo ft h em i s m a t c h e db y t et os k i pa c r o s s
as many
p o t e n t i a lm a t c hl o c a t i o n s
as p o s s i b l e( p a r t i a lB o y e r - M o o r e ) .
R e t u r n ss t a r to f f s e to ff i r s tm a t c hs e a r c h i n gf o r w a r d ,o r
NULL i f
no match i s found.
R e q u i r e st h a tt h ep a t t e r n
be no l o n g e rt h a n 255 b y t e s ,a n dt h a t
t h e r eb e
a match f o r t h e p a t t e r n
somewhere i n t h e b u f f e r ( i e . .
a
c o p yo ft h ep a t t e r ns h o u l d
beplacedas
a s e n t i n e l a t t h e end o f
t h eb u f f e r i f t h e p a t t e r n i s n ' t a l r e a d y
known t o be i n t h e b u f f e r ) .
T e s t e dw i t h TASM.
C n e a r - c a l l a b l ea s :
u n s i g n e dc h a r * F i n d S t r i n g ( u n s i g n e dc h a r * B u f f e r P t r .
u n s i g n e di n tB u f f e r L e n g t h .u n s i g n e dc h a r
* PatternPtr,
u n s i g n e di n tP a t t e r n L e n g t h ) :
parms
274
struc
dw
Chapter 14
2 dup(?)
;pushed
BP & r e t uardnd r e s s
B u f f e r P t r dw
?
B u f f e r L e n g t h dw ?
P a t t e r n P t r dw
P a t t e r n L e n g t h dw ?
parms
ends
; p o i n tbteour f fbtseoer a r c h e d
.model sma 1 1
.code
public _Findstring
n_ eFai npr dr os ct r i n g
cld
; pbu
rp
esshecravlelsetfrar' sac m
k e
t o our s t afcr a
kme
mov
b p .;sppo i n t
push : p sr e
i s ecr av e
l l er er 'gs i svtaerri a b l e s
push
di
ssupb. 2;5a6l l o c astpeafScoker i p T a b l e
: C r e a t et h et a b l e
o f d i s t a n c e sb yw h i c ht os k i p
aheadonmismatches
: f o re v e r yp o s s i b l eb y t ev a l u e .F i r s t ,i n i t i a l i z ea l ls k i p st ot h e
; p a t t e r nl e n g t h :t h i si st h es k i pd i s t a n c ef o rb y t e st h a td o n ' t
: appear i n t h e p a t t e r n .
mov
d i .ds
: ES-DS=SS
mov
es,di
movS :kpit.posB
ipndutif f e r
mov
a 1 , b[pb
y tpre+ P a t t e r n L e n g t h ]
i f th
pe
a t t eirsn
and
a l:.rael tiuanrnsnt amnat t c h
jz
InstantMatch ; 0-length
mov
ah.al
mov
cx.256/2
stosw rep
mov
ax.[bp+PatternLength]
e do n l y
:from ax
dec
now on. n ewe
: PatternLength - 1
mov
[bp+PatternLength].ax
: P o i n tt or i g h t m o s tb y t eo ff i r s tp o t e n t i a lp a t t e r nm a t c hl o c a t i o n
: i nb u f f e r .
[ b p + B au df fde r P t r l , a x
: S e tt h es k i pv a l u e sf o rt h eb y t e st h a td oa p p e a r
i n t h ep a t t e r nt o
: t h ed i s t a n c ef r o mt h eb y t el o c a t i o n
t o t h e end o f t h e p a t t e r n .
mov
s i . C b p + P a t t e r n P t r ]; p o i n tt os t a r to fp a t t e r n
and
a ax: ax, rt h
e e raensyk i pst ose t ?
SetSkipDone
;no
jz
mov
t o SkipBuffer
:. p
sdpoi i n t
b xb;,xp r e p a rf eowro radd d r e s s i nogbf fy tvea l u e
sub
SetSkipLoop:
mov
b l . [ s: igl e
t htnee px ta t t e rbny t e
inc
si
: apdapvt o
aht einenrcntee r
mov
[ d i + b x l . a l: s e t h es k i pv a l u e
when t h i sb y t ev a l u ei s
: t h em i s m a t c hv a l u e
i n t h eb u f f e r
dec
ax
jnz
SetSkipLoop
SetSkipDone:
mov
.d[ls:iD] L - r i g h t m o ps ta t t e rbny tf er o m
now on
dec
:npesot xoii nt -t t o - r i g h t pmbaoyt fstt ete r n
mov
[ b p + P a t t e r n P t r l . s i ; f r o m now on
: S e a r c ht h eb u f f e r .
std
R E P Z CMPSB
: f o rb a c k w a r d
mov
di.[bp+BufferPtrl
: p o i n tt ot h ef i r s ts e a r c hl o c a t i o n
mov
bx.sp
XLAT
: p o i n tt oS k i p T a b l ef o r
Boyer-MooreStringSearching
275
SearchLoop:
sub
ah,ah
;used
tcoo n v e r t
AL t o a word
; S k i pt h r o u g hu n t i lt h e r e ' s
a match f o r t h e f i r s t p a t t e r n b y t e .
QuickSearchLoop:
; See i f we have a match a t t h e f i r s t b u f f e r l o c a t i o n .
8
; u nl roool pl
8 r tei m
d
to
buercasen c h i n g
REPT
: n ebxut f f be yr t e
mov
a1 ,[ d i 1
i t m aptthacetht e r n ?
cmp
d ;does
l .a1
jz
F u l lCompare
;yes,
so k e eg po i n g
x1 a t
stv;lkhntu
oamihfeop
oloiu.skrem a t c h
; B u f f e r P t r +- S k i p ;
add
d i ,ax
ENDM
QuickSearchLoop
jmp
; Return a p o i n t e rt ot h es t a r to ft h eb u f f e r( f o r0 - l e n g t hp a t t e r n ) .
align
2
InstantMatch:
mov
ax.Cbp+BufferPtrl
s jhmopr t
Done
; Compare t h e p a t t e r n
and t h eb u f f e rl o c a t i o n ,s e a r c h i n gf r o mh i g h
; memory t o w a r d l o w ( r i g h t t o l e f t ) .
align
2
Fullcompare:
mov
[ b p + B u f f e r P t r l . d: si a vt ehceu r r e nb tu f f el or c a t i o n
compare
mov
cx.Cbp+PatternLength] ;# obf y t e ys etto
j c xM
zatch
;done i f t h e r e was o n l y one c h a r a c t e r
dec
;dpi o intnoet dx et s t i n a t i obny t oe
compare ( S I
; p o i n t st on e x t - t o - r i g h t m o s ts o u r c eb y t e )
repz
cmpsb
; c o m p a rt ehr e ost htfpea t t e r n
jz
M a t;ct h a t ' s
i t ; w e 'fvoeu n d
a match
; I t ' s a mismatch: l e t ' s seewhat
we c a nl e a r nf r o m
it.
i nd ci
;compensate fIo- rb yot ev e r r uo nf
REPZ CMPSB;
; p o i n tt om i s m a t c hl o c a t i o ni nb u f f e r
; # o fb y t e st h a td i dm a t c h .
mov
si.Cbp+BufferPtrl
sub
. d si i
; I f . b a s e do nt h em i s m a t c hc h a r a c t e r ,
we c a n ' te v e ns k i pa h e a da sf a r
; aswhere
we s t a r t e d t h i s p a r t i c u l a r c o m p a r i s o n , t h e n j u s t a d v a n c e
by
; 1 t ot h en e x tp o t e n t i a lm a t c h ;o t h e r w i s e .s k i p
aheadfrom
this
; c o m p a r i s o nl o c a t i o nb yt h es k i pd i s t a n c ef o rt h em i s m a t c hc h a r a c t e r ,
: l e s st h ed i s t a n c ec o v e r e db yt h ep a r t i a lm a t c h .
; g e tt h ev a l u eo ft h em i s m a t c hb y t ei nb u f f e r
, Cdi 1
mov
a1
; g e tt h es k i pv a l u ef o rt h i sm i s m a t c h
x1 a t
;assume w e ' l l j u s t advance t o t h e n e x t
mov
cx.1
; p o t e n t i a lm a t c hl o c a t i o n
; i st h es k i pf a re n o u g ht o
be w o r t h t a k i n g ?
ax.si sub
1
;no. go w i t h t h e d e f a u l t a d v a n c e o f
jna
MoveAhead
; y e s .t h i si st h ed i s t a n c et os k i p
aheadfrom
mov
cx.ax
; t h el a s tp o t e n t i a lm a t c hl o c a t i o nc h e c k e d
MoveAhead:
; S k i pa h e a da n dp e r f o r mt h en e x tc o m p a r i s o n .
mov
di.[bp+BufferPtrl
+- S k i p ;
; Badd
u f ,f ce xr P t rd i
mov
s i . [ b p + P a t t e r n P t;rpl o i n
to
htnee x t - t o - r i g h t m o s t
; p a t t e r nb y t e
SearchLoop
jmp
; R e t u r ns t a r to fm a t c hi nb u f f e r( B u f f e r P t r
- ( P a t t e r n L e n g t h - 1)).
align
2
Match:
mov
ax,[bp+BufferPtrl
a x . [ b ps+uPba t t e r n L e n g t h l
276
Chapter Id
Previous
Home
Next
Done :
c ld
: r e sdt eo frdaei ruel tcf tl a
i ogn
asdpd. 2:5d6e a l l o c astpea c e
pop
: r edsi tcoarlel reer g
si sv taer ri a b l e s
si
pop
POP
: r eb spct oa rl leseftrraacsm
ke
ret
-F i n d S t r i ng
endp
end
for S k i p T a b l e
Note that Table 14.1 includes the time required to build the skip table each time
Findstring is called. This time could be eliminated for all but the first search when
repeatedly searching for a particular pattern, by building the skip table externally
and passing a pointer to it as a parameter.
boyer-MooreStringSearching
277
Previous
Home
Next
28 1
And whackedher onthe top of the head with my chin. (As I said, she was only about
four feet tall.) AndI do mean whacked.Jeannie burst into hysterical laughter, tried to
calm herself down, said
goodnight, and went inside, still giggling.No kiss.
I was a pretty mature teenager, so this was only slightlymore traumatic than leading
the Tournament of Roses parade in my underwear. On the next try, though, I did
manage to get the hangof this kissing business, and eventually evenwent on to have
a child. (Not with Jeannie, I might add; the mind
boggles at the mess I could have
made of that with her.) As it turns out, none of that stuff is particularly difficult; in
fact, its kind of enjoyable, wink, wink, say no more.
When youre dealing with something new, a little knowledge goes a longway. When
it comes to kissing, we have to fumble along the learning curve on our own, but
there areall sorts of resources to help speed up thelearning process when it comes
to programming. The basic mechanisms of programming-searches, sorts, parsing,
and the like-are well-understood and superbly well-documented. Treat yourself to
a book like Algorithms, by Robert Sedgewick (Addison Wesley), or Knuths The Art of
Computer Programmingseries (also from Addison Wesley; and where was Knuth with
The Art of Kissing when I needed him?), orpractically anything by Jon Bentley, and
when you tacklea new area, give yourselfa headstart. Theres still plenty of room for
inventiveness and creativity on your part, but why not apply that energy on top of the
knowledge thats already been gained, instead of reinventing the wheel? I know,
reinventing thewheel is just thekind of challenge programmers love-but can you
really affordto waste the time? Anddo you honestly think that youre so smart that you
can out-think Knuth,whos spent alifetime at this stuff and happensto be a genius?
Maybe you can-but I sure cant. For example, consider theevolution of my understanding of linked lists.
Linked Lists
Linked lists are data structurescomposed of discrete elements, or nodes, joined together with links.In C, the links are typically pointers. Like alldata structures, linked
lists have their strengths and their weaknesses. Primary among the strengths are:
simplicity; speedy sequential processing; ease and speed of insertion and deletion;
the ability to mix nodes of various sizes and types; and theability to handle variable
amounts of data, especially when the total amount of data changesdynamically or is
not always known beforehand. Weaknesses include: greater memory requirements
than arrays (the pointers take up space);slow non-sequential processing, including
finding arbitrary nodes; and an inability to backtrack, unless doubly-linked lists are
used. Unfortunately, doubly linked lists need more memory, as well as processing
time to maintain the backward links.
Linked lists arent very good for most types of sorts.Insertion and bubble sorts work
fine, but more sophisticated sorts depend on effkientrandom access, which linked
282
Chapter 15
Pointer to
head of list
&Node # 1
Node # 1
&Node
#2
Other data
in node
Node #2
Node #3
Node #4
Other data
Other data
in node
Other data
in node
283
node's NextNode pointer to point to the following node, as shown in Listing 15.1.
is ##included by all the linkedlist listings
(Listing 15.2 is the headerfile LLIST.H, which
in this chapter.) Easy enough-unless the preceding node happens to be the head
pointer, which doesn't have a NextNode field, because it's not a node,so Listing 15.1
won't work.Cumbersome special code and extra information
(a pointer to the head
of
the list) are required to handle the head-pointer case, as shown in Listing 15.3. (I'll
grant you that if you make the next-node pointer the
first fieldin theLinkNode structure, atoffset 0, then you could successfully point to the head pointerand pretendit
was a M o d e structure-but that's an ugly and potentially dangerous trick, and
we'll see a better approach next.)
LISTING 1 5.1
/*
115- 1 .C
D e l e t e st h en o d ei n
a l i n k e dl i s tt h a tf o l l o w st h ei n d i c a t e dn o d e .
Assumes l i s t i s headedby
a dummy node, s o no s p e c i a lt e s t i n gf o r
t h eh e a d - o f - l i s tp o i n t e ri sr e q u i r e d .R e t u r n st h e
same p o i n t e r
t h a t was p a s s e di n .
*/
#i
n c l ude "1 1 i s t . h"
s t r u c tL i n k N o d e* D e l e t e N o d e A f t e r ( s t r u c tL i n k N o d e* N o d e T o D e l e t e A f t e r )
NodeToDeleteAfter->NextNode
NodeToOeleteAfter->NextNode->NextNode:
return(NodeToDe1eteAfter):
LISTING 15.2
/*
1LIST.H
L i n k e dl i s th e a d e rf i l e .
# d e f i n e MAX-TEXT-LENGTH 100
# d e f i n e SENTINEL 32767
*/
/*
*/
l o n g e s ta l l o w e dT e x tf i e l d
/ * l a r g e ps ot s s i b V
l ea l uf iee l d
s t r u c tL i n k N o d e
{
s t r u c tL i n k N o d e* N e x t N o d e ;
i n t Value:
c h a r TextCMAX-TEXT-LENGTH+ll;
/* Any number o f a d d i t i o n a l d a t a f i e l d s
may b yp r e s e n t
*/
*/
1:
s t r u c tL i n k N o d e* D e l e t e N o d e A f t e r ( s t r u c tL i n k N o d e
*);
s t r u c tL i n k N o d e * F i n d N o d e B e f o r e V a l u e ( s t r u c t LinkNode
s t r u c tL i n k N o d e* I n i t L i n k e d L i s t ( v o i d ) ;
s t r u c tL i n k N o d e *InsertNodeSorted(struct LinkNode *.
s t r u c tL i n k N o d e
*);
LISTING 15.311
/*
*,
int):
5-3.C
D e l e t e st h en o d ei nt h es p e c i f i e dl i n k e dl i s tt h a tf o l l o w st h e
a h e a d - o f - l i s tp o i n t e r :
i n d i c a t e dn o d e .L i s ti sh e a d e db y
pointertothenodetodeleteafterpointstothehead-of-list
p o i n t e r ,s p e c i a lh a n d l i n gi sp e r f o r m e d .
*/
i i n c l ude "1 1 is t . h"
s t r u c tL i n k N o d e* D e l e t e N o d e A f t e r ( s t r u c tL i n k N o d e* * H e a d O f L i s t P t r .
s t r u c tL i n k N o d e* N o d e T o D e l e t e A f t e r )
{
/*
H a n d l es p e c i a l l y
i f t h en o d et od e l e t ea f t e ri sa c t u a l l yt h e
head o f t h e l i s t ( d e l e t e t h e f i r s t e l e m e n t i n t h e l i s t )
i f (NodeToDeleteAfter
( s t r u c tL i n k N o d e* ) H e a d O f L i s t P t r )
*HeadOfListPtr
(*HeadOfListPtr)->NextNode;
284
Chapter 15
- -
i f the
*/
1 else
NodeToDeleteAfter->NextNode =
NodeToDeleteAfter->NextNode->NextNode;
return(NodeToDe1eteAfter):
Dummy
head node
Node # 1
Node
#2
Node #3
Dummy tail
node
Not
used
&Node # 1
+ &Node #2
+ &Node #3
+ &Tail node
Not used
Other data
in node
Other data
in node
Other data
in node
285
The next-node pointer of the head node, which points to thefirst real node, is the
onlypart of the head node thatb actually used. This way the same code works on the
head nodeas on the rest of the list, so there areno special cases.
Likewise, there should be a separate node for the tail of the list, so that every node
that contains real data is guaranteed to have a node on either side of it. In this
scheme, anempty list contains two nodes, as shownin Figure 15.3.Although itis not
necessary, the tail node may point toitself as its own next node, rather than contain
a NULL pointer. Thisway, a deletion operationon anempty list will haveno effectquite unlike the
same operation performedon a list terminated with a NULL pointer.
The tail node of a list terminated like this can be detected because it will be the only
node for which the next-node pointer equals the current-node pointer.
we can still makea few refinements.
Figure 15.3 is a giant step in the right direction, but
The inner loop
of anycode that scans through such a list has to
perform a special teston
each node to determine whether the tail has been reached. So, for example, code to find
the first node containing a value field greater than or equal to a certain value has to
perform two tests in the inner loop,as shownin Listing 15.4.
LISTING 15.4
/*
115-4.C
F i n d st h ef i r s tn o d ei n
a l i n k e dl i s tw i t h
a v a l u ef i e l dg r e a t e r
t h a no re q u a lt o
a k e yv a l u e ,a n dr e t u r n s
a p o i n t e rt ot h en o d e
p r e c e d i n gt h a tn o d e( t of a c i l i t a t ei n s e r t i o n
and d e l e t i o n ) . o r
NULL p o i n t e r i f nosuchvalue
was f o u n d . Assumes t h e l i s t i s
terminated w i t h a tail node p o i n t i n g t o i t s e l f a s thenext node. */
# i n c l u d e< s t d i o . h >
b
in c l ude "1 1 is t . h"
s t r u c tL i n k N o d e
*FindNodeBeforeValueNotLess(
s t r u c tL i n k N o d e* H e a d O f L i s t N o d e .i n tS e a r c h v a l u e )
s t r u c tL i n k N o d e* N o d e P t r
while
- HeadOfListNode:
--
i f (NodePtr->NextNode->NextNode
NodePtr->NextNode)
return(NULL);
/ * we f o u n tdh see n t i n e lf :a i l e sde a r c h
else
r e t u r n ( N o d e P t r ) : /* s u c c e s s :r e t u r np o i n t e rt on o d ep r e c e d i n g
node t h a t was >- */
*/
Suppose, however, that we make the tail node a sentinel by giving it a value that is
guaranteed to terminate the search,
as shown in Figure 15.4. The list in Figure 15.4
has a sentinelwith a value field of 32,767; since we're working with integers, that's
the highestpossible search value, and is guaranteed tosatisfy anysearch thatcomes
down the pike. The success or failure of the search can then be determined outside
the loop,if necessary, by checking for thetail node's special pointer-but the inside
of the loopis streamlined tojust onetest, as shownin Listing 15.5. Not all linked lists
286
Chapter 15
tail DummyheadDummy
node
node
&Tail node
Tail node
Not
used
Not
used
LISTING15.5115-5.C
/*
F i n d st h ef i r s tn o d ei n
a v a l u e - s o r t e dl i n k e dl i s tt h a t
has a V a l u ef i e l dg r e a t e rt h a no re q u a lt o
a k e yv a l u e ,
and
r e t u r n s a p o i n t e rt ot h en o d ep r e c e d i n gt h a tn o d e( t of a c i l i t a t e
i n s e r t i o n and d e l e t i o n ) . o r
a NULL p o i n t e r i f nosuchvalue
was
a s e n t i n e l tail node
f o u n d . Assumes t h e l i s t i s t e r m i n a t e d w i t h
c o n t a i n i n gt h el a r g e s tp o s s i b l eV a l u ef i e l ds e t t i n g
and p o i n t i n g
*/
t o i t s e l f as t h en e x tn o d e .
# i n c l u d e< s t d i o . h >
C i n c l ude "1 1 is t . h"
s t r u c tL i n k N o d e
*FindNodeBeforeValueNotLess(
s t r u c tL i n k N o d e* H e a d O f L i s t N o d e .i n tS e a r c h v a l u e )
s t r u c tL i n k N o d e* N o d e P t r
HeadOfListNode;
w h i l e (NodePtr->NextNode->Value < S e a r c h v a l u e )
NodePtr = NodePtr->NextNode:
i f ( N o d e P t r - > N e x t N o d e - > N e x t N o d e -= NodePtr->NextNode)
return(NULL);
/ * we f o u n tdh see n t i n e fl ;a i l e sde a r c h
else
r e t u r n ( N o d e P t r ) ; / * s u c c e s s r; e t u r np o i n t e rt on o d ep r e c e d i n g
n o d et h a t was >- */
Dummy head
node
Node # 1
&Node # 1
& N o d e #2
Not used
Node #2
*/
Node #3
D u m tmali l
n o e
Other data
I
287
Circular Lists
One minor butelegant refinementyet remains: Use a single node as both the head
and the tail of the list. We can do this by connecting the last node back to the first
through the head/tail node
in a circular fashion, as shown inFigure 15.5. This head/
tail node can also, of course, be a sentinel;when its necessary to check for theend of
the list explicitly, that can be done by comparing the current node pointer to the
head pointer. If theyre equal, youre at the head/tail node.
W h y am I so fond of this circular list architecture? For one thing, itsaves a node, and
most of my linked list programming has been done in
severely memory-constrained
environments. Mostly, though, itsjust so neut;with this setup, theres not asingle node or
inner-loop instructionwasted. Perfect economy of programming, if you ask me.
I must admit thatI racked my brains for quite awhile to come up with the circular
list, simple as it may seem. Shortly after coming up with it, I happened to look in
Sedgewicks book, only to find my nifty optimization described plain as day; and a
little while after that, I came across a thread in thealgorithms/computer.sci topic on
BIX that described it in considerable detail. Folks, the informationis out there.Look
it up before turning on your optimizer afterburners!
Listings 15.1 and 15.6 together form suite
a
of C functions for maintaining circular
a
linked list sorted by ascending value. (Listing 15.5 requires modification before it
will work with circular lists.) Listing 15.7 is an assembly language version of
InsertNodeSorted(); note the tremendous efficiency of the scanning loop in
InsertNodeSorted()-four instructions per node!-thanks to the dummy head/tail/
sentinel node. Listing 15.8is a simple application that illustrates the use of the linkedlist functions in Listings 15.1 and 15.6.
Contrast Figure 15.5 with Figure 15.1, and Listings 15.1, 15.5, 15.6, and 15.7 with
Listings 15.3 and 15.4. Yes, linked lists are simple, but not so simple that a little
knowledge doesnt make a substantial difference. Make it a habit to read Knuth or
Sedgewick or the like before you write a single line of code.
Dummy head/tail
node
+ &Node # 1 -+
L
288
Chapter 15
&Node #2
Other data
in node
Not used
Figure 15.5
Node #1
Node #2
j
. &Node
#3
Other data
in node
Node #3
&Head/
tail node
Other data
in node
LISTING 15.7
115-7.ASM
: C n e a r - c a l l a b l ea s s e m b l yf u n c t i o nf o ri n s e r t i n g
a new node i n a
The l i s t
: iscircular:thatis.
i t has a dummy n o d ea sb o t ht h eh e a da n dt h e
: tailofthelist.
The dummy node i s a s e n t i n e l ,c o n t a i n i n gt h e
: l a r g e s tp o s s i b l eV a l u ef i e l ds e t t i n g .T e s t e dw i t h
TASM.
MAXLTEXT-LENGTH equ 100
: l o n g ea sl l to wTeef di xetl d
: l a rpgoesssVt fiabi elluel de
SENTINEL equ 32767
L i n k N o d es t r u c
NextNode dw
?
Value
dw
?
d bT e x t
MAX-TEXTLLENGTH+l d u p ( ? )
:*** Anynumber o f a d d i t i o n a l d a t a f i e l d s
may b yp r e s e n t ***
L i nkNode ends
: l i n k e dl i s ts o r t e db ya s c e n d i n go r d e ro ft h eV a l u ef i e l d .
.model
smal
.code
:
:
:
:
I n s e r t st h es p e c i f i e dn o d ei n t o
a a s c e n d i n g - v a l u e - s o r t e dl i n k e d
l i s t , s u c ht h a tv a l u e - s o r t i n gi sm a i n t a i n e d .R e t u r n s
a p o i n t e rt o
t h en o d ea f t e rw h i c ht h e
new node i s i n s e r t e d .
C n e a r - c a l l a b l ea s :
: s t r u c tL i n k N o d e *InsertNodeSorted(struct LinkNode*HeadOfListNode.
s t r u c tL i n k N o d e* N o d e T o I n s e r t )
parms
struc
: p u s hr eetdaudr nd r e s s
& BP
dw
2 dup ( ? I
HeadOfListNode dw
?
: p o i nhtt eonra odod fe
1 is t
N o d e T o I n s e r t dw
?
i; npnsot ooei tndrotee r
parms
ends
p u b l-i1c n s e r t N o d e S o r t e d
- 1 n s e r t N o d e S o r t epdr once a r
push
bP
mov
bP. SP
push
si
push
di
mov
si,[bpl.NodeToInsert
mov
ax.[sil.Value
mov
di.Cbpl.Head0fListNode
SearchLoop:
mov
mov
CmP
bx.di
di.Cbxl.NextNode
Cdil.Value.ax
SearchLoop
mov
mov
mov
290
Chapter 15
ax.[bxl.NextNode
[sil.NextNode,ax
Cbx1.NextNode.si
: p o i n tt os t a c kf r a m e
: p r e s e r v er e g i s t e rv a r s
; p o i n tt on o d et oi n s e r t
; s e a r c hv a l u e
:pointtolinkedlistin
: w h i c ht oi n s e r t
:advance t o t h e n e x t node
: p o i n tt of o l l o w i n gn o d e
: i st h ef o l l o w i n gn o d e ' s
: v a l u el e s st h a nt h ev a l u e
: f r o mt h en o d et oi n s e r t ?
: y e s . s o c o n t i n u es e a r c h i n g
:no. s o we h a v ef o u n do u r
: i n s e r tp o i n t
; l i n k t h e new nodebetween
: t h ec u r r e n tn o d ea n dt h e
: f o l l o w i n gn o d e
mov
bxax,
; r e t u r np o i n t e rt o
node
we i n s e r t e d
: r e s t o r er e g i s t e rv a r s
: a f t e rw h i c h
pop
pop
di
si
bp
POP
ret
JnsertNodeSortedendp
end
LISTING 15.8
/*
115-8.C
Sample l i n k e dl i s tp r o g r a m .T e s t e dw i t hB o r l a n d
#i
n c l u d e < s t d lib . h>
#i
n c l u d e < s t d i0 . h>
B
in c l u d e < c o n i0 . h>
Pi n c l u d e < c t y p e . h >
# i n c l u d e< s t r i n g . h >
{ { i n c l u d e" 1 l i s t . h "
C++.
*I
v o i dm a i n ( )
{ i n t Done = 0 . Char,Tempvalue:
s t r u c tL i n k N o d e* T e m p P t r .* L i s t P t r .* T e m p P t r Z :
c h a r TempBuffer[MAX-TEXT-LENGTH+31:
if ((ListPtr
InitLinkedListO)
printf("0utofmemory\n");
exit(1):
=-
NULL) I
w h i l e( ! D o n e )
{
p r i n t f ( " \ n A = a d dD
; - d e l e t e F: - f i n d L; - l i s at l l C
: !-quit\n>"):
Char = t o u p p e r ( g e t c h e 0 ) :
printf("\n"):
s w i t c h( C h a r )
{
case ' A ' :
I* add a node * I
i f ( ( T e m p P t r = m a l l o c ( s i z e o f ( s t r u c tL i n k N o d e ) ) )
p r i n t f ( " 0 u to f
exit(1):
memory\n
-- NULL)
):
p r i n t f ( " N o d ev a l u e :
"1:
s c a n f ( " % d "&
. TempPtr->Value):
i f ((FindNodeBeforeValue(ListPtr.TempPtr->Value))!=-NULL)
{ p r i n t f ( " * * *v a l u ea l r e a d yi nl i s t :t r ya g a i n* * * \ n " ) :
free(TempPtr1:
) e l s e{ p r i n t f ( " N o d et e x t :
"):
TempBuffer[O]
MAX-TEXT-LENGTH:
cgets(TempBuffer);
s t r c p y ( T e m p P t r - > T e x t&
. TempBuffer[El):
I n s e r t N o d e S o r t e d ( L i s t P t r .T e m p P t r ) ;
printf("\n"):
..
1
break:
I* d e l e t e a node * I
case ' D ' :
"):
p r i n t f ( " V a 1 u ef i e l do fn o d et od e l e t e :
scanf ("%d". &TempVal ue) :
i f ((TempPtr
F i n d N o d e B e f o r e V a l u e ( L i s t P t r . Tempvalue))
!=-NULL) I
TempPtrE
TempPtr->NextNode; I* - > node to d e l e t e * I
I* d e l e t e i t * I
DeleteNodeAfter(TempPtr):
I* f r e e i t s memory * /
free(TempPtr2):
291
1 else
p r i n t f ( " * * * n os u c hv a l u ef i e l di nl i s t* * * \ n " )
break;
I* f i n d a node * I
case I F ' :
p r i n t f ( " V a 1 u ef i e l do fn o d et of i n d :
"1;
scanf("%d".&Tempvalue);
i f ((TempPtr
F i n d N o d e B e f o r e V a l u e ( L i s t P t r . Tempvalue))
!- NULL)
printf("Va1ue%
: d\nText%
: s\n".
TempPtr->NextNode->Value.TempPtr->NextNode->Text);
else
p r i n t f ( " * * * n os u c hv a l u ef i e l di nl i s t* * * \ n " ) ;
break;
I* l i s t all nodes * I
' Lc' :a s e
L i s t P t r - > N e x t N o d e ; I* p o i n t t o f i r s t
node * I
TempPtr
i f (TempPtr
ListPtr) {
I* empty i f a st e n t i n e l
*I
p r i n t f ( " * * *L i s ti s
empty***\n");
1 else I
do { p r i n t f ( " V a l u e%
: d \ nT e x t%
: s \ n "T, e m p P t r - > V a l u e .
TempPtr->Text);
TempPtr->NextNode;
TempPtr
1 w h i l e( T e m p P t r !- L i s t P t r ) ;
- -
break;
case '0':
Done
1;
break;
default:
break;
Hi/Lo in 24 Bytes
In one of myPC TECHNIQLES "Pushing the Envelope" columns,I passed along one of
David Stafford's fiendish programming puzzles: Writea Gcallable function to find the
greatest or smallest unsigned int. Not a big deal-except that David had already done it
in 24 bytes, so the challenge was to do it in 24 bytes or less.
Such routines soon began coming at me from
all angles. However (and I hate to say
this because some of my correspondents were very pleased with the thought thatthey
had bested David), no one has yet met the challenge-because most of you folks
missed a key point. When David said, "Write a functionto find the greatestor smallest unsigned int in 24 bytes or less," he meant, 'Write the hi and the lo functions in
24 bytes or less-combined."
Oh.
Yes, a 24byte hi/lo function is possible, anatomically improbable as it might seem.
Which I guess goes to show that when one of David's puzzles seems lessthan impossible, odds are you're missing something. Listing 15.9 is David's 24byte solution,
from which a lotmay be learned if one reads closely enough.
292
Chapter 15
Previous
LISTING 15.9
Home
L15-9.ASM
; F i n dt h eg r e a t e s t
or s m a l l e s tu n s i g n e di n t .
; C c a l l a b l e( s m a l lm o d e l ) :
24 b y t e s .
: By D a v i dS t a f f o r d .
: u n s i g n e dh i (i n t
num. u n s i g n e da [ ]
: u n s i g n e dl o (i n t
n u m . u n s i g n e da [ ]
p u b l i c -.hi.
-hi :
-1
0:
db
xor
POP
POP
save:
top:
around:
POP
push
push
push
mov
cmp
j cxz
cmc
ja
inc
inc
dec
j nz
):
);
-10
Ob9h
cx.cx
ax
dx
bx
bx
dx
ax
ax, Cbxl
ax.[bxl
around
:mov c x . i m m e d i a t e
: g e tr e t u r na d d r e s s
: g e tc o u n t
: g e tp o i n t e r
: r e s t o r ep o i n t e r
; r e s t o r ec o u n t
; r e s t o r er e t u r na d d r e s s
save
bx
bx
dx
top
ret
Before I end this chapter, letme say that I get alot of feedback from my readers, and
it's much appreciated. Keep those cards, letters, and email messages coming. And if
any of you knowJeannie Schweigert, haveher dropme a line and letme know how
she's doing these days....
293
Next
Previous
Home
Next
297
runs! Im not saying that performanceis everything, but optimization isnt even down
there at number10, below online help! Ye gods and little fishes! We are talking here
about people who would take a bus from LA to New York instead ofa plane because it
had a cleaner bathroom; who would choose a painting from a Holiday Inn over a
Matisse because it had a fancier frame;who would buya h g o instead of-well, hell,
anything-because it hada nice owners manual and particularly attractive keys. We
are talking about peoplewho are focusing on means, and have forgotten about ends.
We are talking about people with no programming souls.
298
Chapter 16
LISTING 16.1
/*
116-1.C
W o r d - c o u n t i n gp r o g r a m .T e s t e dw i t hB o r l a n d
c o m p i l a t i o n mode a n dt h es m a l m
l odel.
C++
in C
*/
# i n c l u d e< s t d i o . h >
% i n c l u d e < f c n t l h>
# i n c l u d e< s y s \ s t a t . h >
# i n c l u d e < s t d l ib . h>
#i
n c l ude <i
0 . h>
# d e f i n e BUFFER-SIZE
i n tm a i n ( i n t .c h a r
I * l a r g e s ct h u n ko ff i l ew o r k e d
w i t h a t any one t i m e * /
Ox8000
**);
i n tm a i n ( i n ta r g c .c h a r* * a r g v )
i n tH a n d l e ;
u n s i g n e di n tB l o c k S i z e :
1o n g F i 1e S i z e :
u n s i g n e dl o n gW o r d C o u n t
c h a r* B u f f e r .C h a r f l a g
i f ( a r g c != 2 ) {
printf("usage:
exit(1):
- 0:
=
Ch:
0. P r e d C h a r F l a g .* B u f f e r P t r .
1
i f ( ( B u f f e r = rnalloc(BUFFERKS1ZE)) == NULL)
p r i n t f ( " C a n ' ta l l o c a t ea d e q u a t em e m o r y \ n " ) :
exit(1):
i f ( ( H a n d l e = o p e n ( a r g v C 1 1 , 0-RDONLY I 0-BINARY))
p r i n t f ( " C a n ' to p e nf i l e
%s\n". argvC11):
exit(1):
=-
-1) {
i f ( ( F i l e s i z e = f i l e l e n g t h ( H a n d 1 e ) ) == -1) I
printf("Errorsizingfile
%s\n". a r g v [ l l ) ;
exit(1):
}
I* P r o c e s st h ef i l ei nc h u n k s
*/
w h i l e( F i l e s i z e
> 0) {
I* G e tt h en e x tc h u n k
*I
F i l e s i z e - = ( B l o c k S i z e = min(Fi1eSize.BUFFER-SIZE)):
i f ( r e a d ( H a n d 1 e .B u f f e r ,B l o c k S i z e )
== -1) {
p r i n t f ( " E r r o rr e a d i n gf i l e
%s\n". a r g v C 1 1 ) :
exit(1):
I* Countwords
i n t h e chunk * I
BufferPtr = Buffer:
do I
PredCharFlag = C h a r f l a g :
Ch = * B u f f e r P t r + + & Ox7F; I* s t r i p h i g h b i t , w h i c h
w o r dp r o c e s s o r ss e ta sa n
flag *I
CharFlag =
II
)
some
II
II
There Ain't
i f ((!CharFlag)
Wordcount++:
&& P r e d C h a r F l a g ) {
I
1 w h i l e( - B l o c k S i z e ) ;
/ * C a t c ht h el a s tw o r d ,
i f (CharFlag) {
Wordcount++;
i f any
*/
1
I
p r i n t f ( " \ n T o t a lw o r d si nf i l e :% l u \ n " .W o r d c o u n t ) :
return(0):
LISTING16.211
/*
6-2.C
W o r d - c o u n t i n gp r o g r a mi n c o r p o r a t i n ga s s e m b l yl a n g u a g e .T e s t e d
w i t hB o r l a n d C++ i n C c o m p i l a t i o n mode & t h es m a l lm o d e l .
*/
#i
n c l ude < s t d i 0. h>
# i n c l u d e < f c n t l h>
#include <sys\stat. h>
#i
n c l u d e < s t dil b. h>
# i n c l u d e< i o . h >
# d e f i n e BUFFER-SIZE
intmain(int,char
v o i dS c a n B u f f e r ( c h a r
i f ( a r g c !- 2 ) {
printf("usage:
exit(1):
l a r g e sct h u n k
o f f i l e worked
w i t h a t a n yo n et i m e
*/
**I:
*,
u n s i g n e di n t ,c h a r
i n tm a i n ( i n ta r g c .c h a r* * a r g v )
i n t Handle:
u n s i g n e di n tB l o c k S i z e :
l o n gF i l e S i z e :
u n s i g n e dl o n gW o r d c o u n t
c h a r* B u f f e r .C h a r F l a g
/*
0x8000
*,
u n s i g n e dl o n g
*);
- 0:
0:
i f ((Buffer
malloc(BUFFER-SIZE))
NULL) {
p r i n t f ( " C a n ' ta l l o c a t ea d e q u a t em e m o r y \ n " ) ;
exit(1):
i f ((Handle
open(argvC11, OCRDONLY I 0-BINARY))
p r i n t f ( " C a n ' t open f i l e% s \ n " .a r g v C l ] ) :
300
Chapter 16
- -1)
exit(1):
i f ( ( F i l e s i z e = f i l e l e n g t h ( H a n d 1 e ) ) == -1) {
p r i n t f ( " E r r o rs i z i n gf i l e% s \ n " .a r g v [ l ] ) :
exit(1);
CharFlag = 0 :
w h i l e( F i l e s i z e
> 0) {
F i l e s i z e - = ( B l o c k S i z e = m i n ( F i 1 e S i z e . BUFFER-SIZE)):
i f ( r e a d ( H a n d 1 e .B u f f e r ,B l o c k S i z e )
=- -1) {
p r i n t f ( " E r r o rr e a d i n gf i l e% s \ n " .a r g v C 1 1 ) :
exit(1):
S c a n B u f f e r ( B u f f e r B. l o c k S i z e &. C h a r F l a g &. W o r d C o u n t ) :
1
I* C a t c ht h el a s tw o r d ,
i f (CharFlag) I
Wordcount++:
i f any * I
p r i n t f ( " \ n T o t a lw o r d si nf i l e :% l u \ n " .W o r d C o u n t ) :
return(0):
LISTING16.3116-3.ASM
; A s s e m b l ys u b r o u t i n ef o rL i s t i n g1 6 . 2 .S c a n st h r o u g hB u f f e r ,o f
: l e n g t hB u f f e r L e n g t h .c o u n t i n gw o r d sa n du p d a t i n gW o r d C o u n ta s
: a p p r o p r i a t e .B u f f e r L e n g t hm u s tb e
> 0 . *CharFlagand*Wordcount
: s h o u l de q u a l
0 on t h e f i r s t c a l l . T e s t e d w i t h
TASM.
: C n e a r - c a l l a b l ea s :
: v o i dS c a n B u f f e r ( c h a r* B u f f e r .u n s i g n e di n tB u f f e r L e n g t h ,
: c h a r* C h a r F l a g .u n s i g n e dl o n g* W o r d c o u n t ) :
psatrrm
u cs
2 d u;pp(u?sr)ehateduddr nr e s s
& BP
dw
B u f f e r dw
?
; b u f f e rt os c a n
: l e n g t ho fb u f f e rt os c a n
B u f f e r L e n g t h dw ?
: p o i n t e rt of l a gf o rs t a t eo fl a s t
?
C h a r F l a g dw
: c h a rp r o c e s s e do ne n t r y
( 0 on
: i n i t i a lc a l l ) .U p d a t e do ne x i t
WordCount dw
?
w
: 3po2coroo-itdbfnousitntet r
; f o u n d ( 0 on i n i t i a l c a l l )
parms
ends
.model
smal
1
.code
pub1 i c _ S c a n B u f f e r
.n_eSacrap nr oBcu f f e r
cal :preserve bp
push
l o c amov
l u p ; s ebt p . s p
c a l; p r e s e r v e s i
push
dipush
mov
mov
mov
mov
mov
l e r ' s s t a c kf r a m e
s t a c k frame
l e r ' s r e g i s t e rv a r s
301
mov
mov
ScanLoop:
mov
1o d s b
a l , 7 f ha n d
b l ,[ b x l
di.[bp+BufferLength]
bh.bl
; g e tc u r r e n tC h a r F l a g
;get I ofbytestoscan
CharFlag;
:PredCharFlag
* B u f f e r P t r + + & Ox7F;
;Ch
; s t r i ph i g hb i tf o rw o r dp r o c e s s o r s
; t h a ts e t
i t a sa ni n t e r n a lf l a g
;assume t hi iss
a c hCahra; r F l a g
; i t i s a c h a r i f b e t w e e n a and z
1;
mov
b l ,1
cmp
al.'a'
CheckAZ
jb
cmp
al.'z'
I s A C h aj nr a
C hec kAZ :
cmp
a1,'A'
; i t i s a c h a r i f b e t w e e n A and Z
Check09
jb
cmp
a1,'Z'
I s A C h aj nr a
Check09:
;it i s a c h a r i f b e t w e e n 0 and 9
cmp
a1 , ' 0 '
jb
CheckApostrophe
cmp
a1 , ' 9 '
I s A C h aj rn a
CheckApostrophe:
; i t i s a c h a r i fa paons t r o p h e
cmp
a1 .27h
jz
IsAChar
a cCh ha ar ;r F l a g
0;
; n. ob tlbslu b
bh.bh and
jz
ScanLoopBottom ; i f ( ( ! C h a r F l a g ) && P r e d C h a r F l a g ) (
cx.1
add
;
(WordCount)++;
dx.0 adc
;I
IsAChar:
ScanLoopBottom:
di
dec
("B
; Iu fwf ehri Ll ee n g t h ) ;
jnz
ScanLoop
mov
mov
mov
mov
mov
pop
pop
POP
ret
3 c ea n dB pu f f e r
end
. [ bspi+ C h a r F l a g l
; [ssei t] . b l
bx.[bp+WordCount]
[ b x ] ,; cs xe t
[bx+2],
dx
di
new C h a r F l a g
new w oc rodu n t
; r e s t o r ec a l l e r ' sr e g i s t e rv a r s
si
bP
; r e s t o r ec a l l e r ' ss t a c kf r a m e
302
Chapter 16
Ponder this. What we really want to know is nothing more than whether abyte is a
character, not what sort of character it is. For each byte value, we want a yes/no
status, and nothing else-and that description practically begs for a lookup table.
Listing 16.4 usesa lookuptable approach to boost performance another 50 percent,
to three times the performance of the original C code. On a 20 MHz 386, this represents a change from 4.6 to 1.6 seconds, which could be significant-who likes
to
wait? On an 8088, the improvement in word-counting a large file could easily be 10
or 20 seconds, which is definitely significant.
:
;
;
;
A s s e m b l ys u b r o u t i n ef o rL i s t i n g1 6 . 2 .S c a n st h r o u g hB u f f e r .o f
l e n g t hB u f f e r L e n g t h ,c o u n t i n gw o r d sa n du p d a t i n gW o r d C o u n ta s
a p p r o p r i a t e ,u s i n g
a l o o k u pt a b l e - b a s e da p p r o a c h .B u f f e r L e n g t h
mustbe
> 0. * C h a r F l a ga n d* W o r d c o u n ts h o u l de q u a l
0 on t h e
f i r s tc a l l .T e s t e dw i t h
TASM.
C n e a r - c a l l a b l ea s :
v o i dS c a n B u f f e r ( c h a r* B u f f e r .u n s i g n e di n tB u f f e r L e n g t h .
c h a r* C h a r F l a g ,u n s i g n e dl o n g* W o r d C o u n t ) ;
psatrrm
u cs
& BP
dw
2 d u:pp(u?s)r heaet uddrdnr e s s
?
B u f f e r dw
; b u f f e rt os c a n
; l e n g t ho fb u f f e rt os c a n
B u f f e r L e n g t h dw ?
;pointertoflagforstateoflast
?
C h a r F l a g dw
: c h a rp r o c e s s e do ne n t r y
( 0 on
; i n i t i a lc a l l ) .U p d a t e d
on e x i t
Wordcount dw
?
:w3po2cooro-itdbnfousitntetr
; f o u n d ( 0 on i n i t i a l c a l l )
parms
ends
.model
smal
1
.data
; T a b l eo fc h a r / n o ts t a t u s e sf o rb y t ev a l u e s0 - 2 5 5( 1 2 8 - 2 5 5a r e
; d u p l i c a t e s o f 0 - 1 2 7 t o e f f e c t i v e l y mask o f f b i t 7 . w h i c h some
: w o r dp r o c e s s o r ss e ta sa ni n t e r n a lf l a g ) .
C h a r S t a t u s T a bl al ebbey lt e
REPT
2
d u p (309) d b
1
;apostrophe
db
8 dup(0)
db
d u p (110) d b
;o-9
db
7 dup(0)
; A - d2 u p ( 1 ) 2 6
db
6 dup(0)
db
:a-z
26 d u p ( 1 )
db
db
5 dup(0)
ENDM
.code
p u b l-iSc c a n B u f f e r
n- eSac rapnr oBcu f f e r
cal ;preserve bp
push
mov
: s ebt p . s p
c a l; p r e s e r v e s i
push
dipush
l e r ' ss t a c kf r a m e
u p l o c a l s t a c kf r a m e
l e r ' sr e g i s t e rv a r s
There Ain't
303
mov
mov
mov
mov
mov
mov
mov
mov
ScanLoop:
and
s i . [ b p + B u f f e r :l p o i n t ob u f f e rt os c a n
bx.[bp+WordCount]
: g e t c u r r e3n2t - bwi ot cr od u n t
d i ,[ b x ]
dx. [bx+El
bx.[bp+CharFlag]
a1 C:cbguxerltrCe hn at r F l a g
# o fb y t e st os c a n
c x , C b p + B u f f e r L e n g t h l: g e t
b x . o f f s e tC h a r S t a t u s T a b l e
a1 .a1
:ZF-0
jz
ScanLoooBottom
and
a1 .a1
; g e tt h en e x tb y t e
; * * * d o e s n ' tc h a n g ef l a g s * * *
: l o o ku pi t sc h a r / n o ts t a t u s
; * * * d o e s n ' tc h a n g ef l a g s * * *
: d o n ' tc o u n t
a word i f l a s t b y t e
: not a character
; l a s t b y t e was a c h a r a c t e r :i st h e
: c u r r e n tb y t e a c h a r a c t e r ?
;no. s o c o u n t a w o r d
1 odsb
xlat
jz
ScanLoopBottom
dec
jnz
Done:
mov
mov
mov
mov
mov
POP
POP
POP
ret
a1 i g n
Countword:
add
adc
dec
jnz
jmp
-ScanBuffer
end
Countword
cx
ScanLoop
i f l a s tb y t e
: ZF=l i f not
was a c h a r ,
was
: c o u n t down b u f f e r l e n g t h
si .[bp+CharFlag]
[ s i 1.a1
; s e t new C h a r F l a g
bx.[bp+WordCountl
[ b :xsl e. dt i
new cwoourndt
[bx+2l ,dx
di
si
bP
: r e s t o r ec a l l e r ' sr e g i s t e rv a r s
: r e s t o r ec a l l e r ' ss t a c kf r a m e
2
d i .I
dx.0
cx
ScanLoop
Done
endp
: i n c r e m e n tt h ew o r dc o u n t
: c o u n t down b u f f e r l e n g t h
Listing 16.4 features several interesting tricks. First, it uses LODSB and XLAT in
succession, a very neat way to get a pointed-to
byte, advance the pointer, and look up
the value indexed by the byte in a table, all withjust two instruction bytes. (Interestingly, Listing16.4would probably run quite a bit better still on an8088, where LODSB
and XLAT have a greater advantage over conventional instructions. On the 486 and
Pentium, however, LODSB and XLAT lose much of their appeal, and should be
replaced with MOV instructions.) Better yet, LODSB and XLAT don't alterthe flags,
so the Zero flag status set before LODSB is still around to be tested after XLAT.
Finally, if you look closely, you will see that Listing 16.4 jumps out of the loop to
increment the word count in the case wherea word is actually found, with a duplicate of
the loop-bottom code placed after the code that increments the
word count, to avoid
304
Chapter 16
an extra branch back into the loop; this replaces the more intuitive approach of
jumping around the
incrementing codeto the loop bottom when a word isnt found.
Although this incurs a branch every time a word is found, a word is typically found
only once every 5 or 6 bytes; on average, then, a branchis saved about two-thirds of
the time. This is an excellent example of how understanding the natureof the data
youre processing allows you tooptimize in ways the compiler cant. Know your data!
So, gosh, Listing 16.4 is the best word-counting code in the universe, right? Not
hardly. If theres one thingmy years of toilin this vale of silicon havetaught me,its
that theres never a lack of potential for further optimization. Never! Off the top of
my head, Ican think of at least three ways to speed up Listing 16.4;and, since Turbo
Profiler reports thateven in Listing 16.4,88 percent of the time is spent scanning the
buffer (as opposed to reading the file),theres potential for those further optimizations to improve performance significantly. (However,it is true that when access is
performed to a hard rather than
RAM disk, disk accessjumps to about half of overall
execution time.) One possible optimization is unrolling the loop, although that is
truly a last resort because it tends to make further changes extremely difficult.
305
handling two bytes simultaneously. This is not the sort of technique one comes up
with by brute-force optimization.) Thinking (or,
worse yet, boasting) thatyour code
is the fastest possible is rollerskating on a tightrope in a hurricane; youredue for a
fall, if you catch my drift. Case in point: Terje Mathisens word-counting program.
DH.CEBX+EAXI
Harmless enough, save for two things. First, EBX happened to be zero at this point
(a leftover from an earlier version of the code,as it turnedout), so it was superfluous
as a memory-addressing component; this made it possible to use base-only addressing ([EAX]) rather than baset-index addressing([EBX+EAX]), which saves a cycle
on the 386. Second: Changing the instruction to CMP [EAX],DH saved 2 cyclesjust enough,by good fortune, to speed up thewhole program by 5 percent.
CMP reg,[mem]takes 6 cycles on the 386, but CMP /memJ,reg takes only 5 cycles;
1 you should always pevformCMP with the memory operandon the left on the 386.
(Granted, CMP [mem],reg is 1 cycle slower than CMP reg,[mem] on the 286, and
theyre both the same on the8088; in this case,though, the code was specific to the 386.
In case youre curious, both forms
take 2 cycles on the 486; quite a lotfaster, eh?)
306
Chapter 16
times while tryingto speedup his code, but he didnt really see whatit did; instead,
he saw what it was supposed to do. Mental shortcuts like this are what enable us to
deal with the complexities of assembly language without overloading after about 20
instructions, but they can be a major problem when looking over familiar code.
The third, and most interesting, lesson is that a far more fruitful optimization came
of all this, one that nicely illustrates that cycle counting is not the key to happiness,
riches, and wondrous performance. After getting my 5 percent speedup, I mentioned
to Terje the possibility of using a 64K lookup table. (This predated the arrival of
entries for the optimization contest.) He said that he hadconsidered it, but it didnt
seem to him to be worthwhile. He couldnt shake the thought, though, and started
to poke around, and oneday, voila, he posted a new version of his wordcount program, WC50, that was much faster than the oldversion. I dont have exact numbers,
but Terjes preliminary estimate was 80 percent faster, and word counting--including
disk cache access time-proceeds at more than3 MB per second on a 33 MHz 486.
Even allowing for the speedof the 486, those are very impressive numbers indeed.
The point I want to make, though, is that the biggest optimization barrier that Terje
faced was that he thought he had the fastest code possible. Once he opened up the
possibility that therewere fasterapproaches, and looked beyondthe specific approach
that he had so carefully optimized, he was able to come up with code that was a lot
faster. Considerthe incongruity of Terjes willingness to
consider a 5 percent speedup
significant in light of his later near-doubling
of performance.
success, with dozens ofgood entries. I certainly enjoyedit, even though I did have to
look at a lot of tricky assembly code that I didnt write-hard work under the best of
circumstances. It was worth the trouble, though. The winning entry was an astonishing
example of what assembly languagecan do in the right hands; on my 386, it was four
times faster at word counting than the nice, tight assemblycode I provided asa starting
There Aint
point-and about 13times faster than theoriginal C implementation. Attention, highlevel language chauvinists: Is the speedup getting significant yet? Okay, maybe word
counting isnt the most criticalapplication, but how would you like
to have that kind of
improvement in your compression software, or in your real-time games-or in Windows graphics?
The winner was David Stafford, who at the time was working for Borland International; his entry is shown in Listing 16.5. Dave Methvin, whom some of you may
recall as a tech editor of the late, lamented PC TechJournal, was a close second, and
Mick Brown, about whom I know nothing more than that he is obviously an extremely good assembly language programmer, was a close third, as shown in Table
16.2, whichprecedes Listing 16.5.Those three were out ahead of the pack; the fourthplace entry, good as it was (twice as fast as my original code), was twice as slow as
Davids winning entry, so you can see that David, Dave,and Mick attained a rarefied
level of optimization indeed.
Table 16.2 has two times
for each entry listed: the first valueis the overall counting time,
including time spent in the
main program, disk I/O, and everything else; the second
value is the time actuallyspent counting words, the time spent in ScanBuffer.The first
value is the time perceived by the user, but thesecond value best reflects the quality
of the optimization in each entry, since the rest of the overall execution time is fixed.
308
Chapter 16
LISTING 16.5
;
;
QSCAN3.ASM
QSCAN3.ASM
D a v iSdt a f f o r d
COMMENT $
How i t w o r k s
T h ei d e a
i s t o g ot h r o u g ht h eb u f f e rf e t c h i n ge a c hl e t t e r - p a i r( w o r d s
r a t h e rt h a nb y t e s ) .T h ec a r r yf l a gi n d i c a t e sw h e t h e r
we a r e
c u r r e n t l y i n a ( t e x t )w o r do rn o t .T h el e t t e r - p a i rf e t c h e df r o mt h e
i t l e f t one b i t
b u f f e ri sc o n v e r t e dt o
a 1 6 - b i ta d d r e s sb ys h i f t i n g
( l o s i n gt h eh i g hb i to ft h es e c o n dc h a r a c t e r )a n dp u t t i n gt h ec a r r y
f l a gi nt h el o wb i t .
T h eh i g hb i to ft h ec o u n tr e g i s t e ri ss e tt o
1. T h e nt h ec o u n tr e g i s t e ri sa d d e dt ot h eb y t ef o u n da tt h eg i v e n
a d d r e s s i n a l a r g e ( 6 4 K . n a t u r a l l y )t a b l e .T h eb y t ea tt h eg i v e n
i f t h el a s tc h a r a c t e ro ft h e
a d d r e s s will c o n t a i n a 1 i n t h e h i g h b i t
will
l e t t e r - p a i r i s a w o r d - l e t t e r( a l p h a n u m e r i co ra p o s t r o p h e ) .T h i s
s e tt h ec a r r yf l a gs i n c et h eh i g hb i to ft h ec o u n tr e g i s t e ri sa l s o
a
1. Thelow
b i to ft h eb y t ef o u n da tt h eg i v e na d d r e s s
will beone
if
t h es e c o n dc h a r a c t e ro ft h ep r e v i o u sl e t t e r - p a i r
was a w o r d - l e t t e r
a n dt h ef i r s tc h a r a c t e ro ft h i sl e t t e r - p a i ri sn o t
a w o r d - l e t t e r . It
will a l s ob e 1 i f t h e f i r s t c h a r a c t e r o f t h i s l e t t e r - p a i r i s
a
w o r d - l e t t e rb u t h es e c o n dc h a r a c t e ri sn o t .T h i sp r o c e s si s
r e p e a t e d .F i n a l l y ,t h ec a r r yf l a gi ss a v e dt oi n d i c a t et h ef i n a l
i n - a - w o r d / n o t - i n - a - w o r ds t a t u s .T h ec o u n tr e g i s t e ri sm a s k e dt o
r e m o v et h eh i g hb i t
and t h ec o u n to fw o r d sr e m a i n si nt h ec o u n t
register.
S o u n dc o m p l i c a t e d ?Y o u ' r er i g h tB! uitt ' fsa s t !
T h eb e a u t yo ft h i sm e t h o di st h a tn oj u m p sa r er e q u i r e d ,t h e
it r e q u i r e so n l yo n et a b l ea n dt h ep r o c e s sc a n
o p e r a t i o n sa r ef a s t .
b er e p e a t e d( u n r o l l e d )
many t i m e s .
QSCAN3 c a nr e a d2 5 6b y t e sw i t h o u t
jumping.
COMMEND $
.modelsmall
.code
Test1
Addr&x:
macro
x.y
mov
d i , Cbp+yl
adc
. ddi i
or
ax.si
add
a1 , Cdi 1
endm
Test2
Addr&x:
macro
x.y
mov
d i , Cbp+yl
adc
. d di i
1
a ad[ hd ,i
endm
Scan
Buffer
BufferLength
CharFlag
WordCount
128
4
6
:9 o r1 0b y t e s
4 bytes
; 3o r
;7 o r 8 b y t e s
4 bytes
: 3o r
; s c a n2 5 6b y t e sa t
a time
;parms
10
309
public -ScanBuffer
-S c a n B u f fperrnoec a r
push
mov
push
push
xor
mov
mov
shr
jnz
- t ebxut f f e r
cx, cx
. [sbi p + B u f f e; sr i]
a x . [ b p + B u f f e r L e n g t h l; d x
ax.1
;dx
Normal Buf
--
l e n g t hi nb y t e s
lengthinwords
OneByteBuf:
Normal Buf:
mov
mov
ax.segWordTable
es.ax
mov
mov
mov
add
add
mov
cbw
shr
adc
xchg
jmp
d i ,[ b p + C h a r F l a g ]
bh.[dil
b l , [ s i1
bh.'A"l
bx, bx
a1 . e s : [ b x ]
push
pushf
cwd
mov
div
or
mov
shl
mov
xchg
xor
mov
mov
mov
mov
mov
mov
mov
shr
j mp
bx, dx
bx.1
di ,LoopEntry[bx]
dx, ax
cx, cx
bx.[bp+CharFlagl
bl [bxl
bp,segWordTable
ds. bp
bp,si
s i ,8080h
ax.si
b l .1
di
a1 i g n
add
2
bx, bx
0
Scan12
rept
3 10
Chapter
16
:dx
sub
sub
sub
inc
Top :
n
: g e th ib i ti n
:getlowbit
0 or 1
;cx
ah ( t h e nb h )
:(1)
:( 2 )
bp
c l .Scan
cx
dx, dx
StartAtTheTop
cx, dx
si .cx
si .cx
ax
jz
StartAtTheTop:
a1 .1
c x ,c x
ax, bx
C1 eanUp
--
o l dC h a r F l a g
;bh
character
:bl
;makebh
i n t oc h a r a c t e r
: p r e p a r et oi n d e x
:remainder?
;nope.do
t h ew h o l eb a n a n a
: a d j u s tb u fp o i n t e r
; a d j u s tf o rp a r t i a lr e a d
:get index for start
:...address i n d i
:dx i s t h e l o o p c o u n t e r
; t o t a lw o r dc o u n t
;bl
o l dC h a r F l a g
: s c a nb u f f e rw i t hb p
: h ib i t s
: i n i tl o c a lw o r dc o u n t e r
o l dC h a r F l a g
;carry
: r e s t o r ec a r r y
...
Testl
Test2
%n.%n*2
%n+l.%n*2+2
n+2
endm
EndCount:
if
bx.bx sbb
Scange128
or
ax.si
add
a1 ,ah
mov
ah.0
: s a v ec a r r y
:becauseal+ah
may e q u a l1 2 8 !
else
add
and
a1 ,ah
ax.7fh
:mask
endi f
c o uadd
n tw o:rudp d ac xt e. a x
mov
ax.si
add
bp,Scan*2
:any dec
dx
Quit
jng
jmp
TOP
Quit:
POPf
jnc
c lc
Testl
c a r r: ys a vsbb
e bx
shr
adc
left?
:(2)
e v e no ro d db u f f e r ?
ItsEven
Odd.-1
bx,
ax.1
cx.0
ItsEven:
Cleanup:
-ScanBuffer
Address
push
POP
POP
ss
ds
bp
.data
macro
dw
endm
:(1)
X
Addr&X
-l a b e wl o r dScan
include
:restore
mov
si.[bp+WordCountl
[sil.cx
add
w o r dp t r[ s i + E l . O
adc
f lcaagrtrhyoe: nsbal yhv .e1
and
si.[bp+CharFlagl
mov
[ s i 1, b h
mov
di
POP
si
POP
bp
POP
ret
endp
LoopEntry
ds
REPT Scan
Address%n MOD Scan
n - 1
ENDM
. f a r d a t aW o r d T a b l e
qscan3.inc
end
: b u i l tb y
MAKETAB
31 1
Levels of Optimization
Three levels of optimization were evident in the word-counting entries I received in
response to my challenge. Id briefly describe them as fine-tuning, new perspective, and table-driven state machine. The latter categories produce faster code,
but, by the same token, they are harder to design, harder to implement, and more
difficult to understand, so theyre suitable for only the most demanding applications. (Heck, I dont even guarantee that David Staffords entry works perfectly,
although, knowing him, it probably does; the more complex and cryptic the code,
the greater the chance for obscurebugs.)
Remember, optimize only when needed, and stop when further optimization will
not be noticed. Optimization that 5. not perceptible to the user is like buying Telly
Savalas a comb; it5. not going todo any harm, but 5.it nonetheless a waste of time.
31 2
Chapter 16
was time to count a word. There aremany ways to reduce this sort of branching; in
fact, it is quite possible to eliminate it entirely. The most straightforward way to reduce such branching is to employ two loops. One loop is used to look for the end of
a word when the last byte was a non-separator, and oneloop is used to look for the
start of a word whenthe last bytewas a separator. This way, its no longer necessary to
maintain a flag to indicate the state of the last byte;that state is implied by whichever
loop is currently executing. This considerably simplifies and streamlines the inner
loop code.
Listing 16.6, contributed by Willem Clements, of Granada, Spain, illustrates a variety
of level 1 optimizations: the two-loop approach, the use of a 16- rather than 32-bit
counter, and theuse of LODSW. Together, these optimizations made Willems code
nearly twice as fast as mine in Listing 16.4. A few details could stand improvement;
for example,AND Axpx is a shorter way to test for zero
than CMP AX,O, and ALIGN 2
could be used. Nonetheless, this is good code, and its also fairly compact and reasonably easy to understand. In short,
this is an excellent example of how an hour or
so of hand-optimization might accomplish significantlyimproved performance at a
reasonable cost in complexity and time. This level of optimization is adequate for
most purposes (and, in truth, is beyond the abilities of most programmers).
F i n a lo p t i m i z a t i o nw o r dc o u n t
M i c h a eAl b r a s h
W i l l e mC l e m e n t s
C1 Moncayo
5,
Laurel
de
l a Reina
18140 La Z u b i a
Granada,Spain
T e3l 4 - 5 8 - 8 9 0 3 9 8
Fax34-58-224102
parms
struc
2 dup(?)
dw
?
buffer
dw
bufferlength
?
dw
charflag
?
dw
?
wordcount
dw
parms
ends
small
.model
.data
c h a r s t a t u s t a b l el a b e l b y t e
2
rept
db
3 9d u p ( 0 )
I
db
8 dup(0)
db
1 0d u p ( 1 )
db
7 dup(0)
db
26 d u p ( 1 )
db
db
6 dup(0)
26 d u p ( 1 )
db
db
5 dup(0)
endm
.code
3 13
-ScanBuffer
oddentry:
pub1 ic
proc
push
mov
push
push
mov
mov
mov
mov
mov
xor
shr
jc
cmp
jne
j mp
xchg
1 odsb
in c
cmp
jne
jmp
ScanBuffer
near
bP
bps sp
si
di
si .[bp+bufferl
bx.[bp+charflagl
a1 C b x l
cx.[bp+bufferlengthl
b x . o f f s e tc h a r s t a t u s t a b l e
: sw
e to r d c o u ztnoet r o
d. di i
cx.1
: c h a nc go eu n t
wt oo r d c o u n t
oddentry
: odd
number
o bf y t e st op r o c e s s
: c h e c k i f l a s t one i s c h a r
a1 . O l h
s c a n l oop4 : i f n o t s o . s e a r c h f o r c h a r
s c a n l o o p l : i f so. s e a r c hf o r z e r o
: l a s t one i n ah
a1 ,ah
: g e tf i r s tb y t e
cx
ah.0lh
: c h e c k i f l a s t one was c h a r
scanl oop5 : i f n o t so. search f o rc h a r
zero
s c a n l o o p 2 : i f so, s e a r c hf o r
locatetheendof
a word
c h a r tsw o: g e t
1 odsw
: t rf ai rnsst l a t e
x1 a t
: f i ir ns t
ah
xchg
a1 ,ah
: ster ca on ns dl a t e
x1 a t
scanl oop2:
: c o u n t down
cx
dec
: no blmeyoft ter es
d o nj ze l
: c h e c k i f cthwaor s
CmP
ax.0101h
:nfbteowgyxrotte s
s cj ea n l o o p l
d i in c
: i nwcor redacsoeu n t
: c h e c k i f new w o r ds t a r t e d
cmp
a1 , O l h
: l o c a t ee n do fw o r d
scanloopl
je
scanl oopl:
l o c a t et h eb e g i no f
a word
1 odsw
x1 a t
xchg
a1 , a h
x1 a t
scanl oop5:
cx
dec
jz
done2
cmp
ax.0
o o ps 4c a n l j e
CmP
a1 . O l h
o o sp cl a n l j e
d i in c
o o ps 4c a njlm p
donel :
CmP
ax.0101h
je
done
d i in c
done
jmp
done2:
cmp
ax.0100h
done
jne
d i in c
mov
si.[bp+charflagl
done:
mov
[ s i 1.a1
mov
bx,[bp+wordcountl
mov
Cbxl ax.
scanl oop4:
314
Chapter 16
g e tt w oc h a r s
trans1 ate first
f i r s t i n ah
t r a n s l a t es e c o n d
c o u n t down
n om o r eb y t e s
left
c h e c k i f w o r ds t a r t e d
i f n o t ,l o c a t eb e g i n
c h e c ko n e - l e t t e rw o r d
i f n o t ,l o c a t ee n do fw o r d
i n c r e a s ew o r d c o u n t
l o c a t e b e g i n o f n e x tw o r d
check i f end-of-word
i f n o t . we h a v ef i n i s h e d
i n c r e a s ew o r d c o u n t
c h e c kf o ro n e - l e t t e rw o r d
i f n o t , we h a v ef i n i s h e d
i n c r e a s ew o r d c o u n t
rnov
add
adc
rnov
rnov
POP
POP
-ScanBuffer
POP
ret
endp
end
dx. [bx+E]
di ,ax
dx.0
[bxl .di
[bx+Z] .dx
di
si
bp
Ttry to code as closely as possible to thereal world nature of those things my programmodels. It
seems somehow wrong to me to count the end
o f a word as you do when you
look for a transition
from a word to a non-word. A word is not a transition, it is the
presence o f a group of characters.
Thought ofthisway, the code would have counted theword when itfirstdetected thegroup. Had
you done this, your
main program would not haveneeded to look for the possible last transition
or deal with the semanticsof the valuein Charvalue.
John Richardson, of New York, contributed a good example of the benefits of a
different perspective (in this case, a hardware perspective). John eliminated all
There Aint
branches used for detectingword edges; theinner loopof his code is shown in Listing 16.7. As John explains it:
My next shot was to get rid of all the branches in the loop. To do that, I reached back to my
college hardware courses. I noticed that we were really looking at an edge triggered device we
want to count each time the I,m a character state goes from one to zero. Rememberingthat XOR
on two single-bit values will always return whether the bits are d$fierent or the same, I implemented a transition countm The counter triggers every time a word begins or ends.
LISTING16.7116-7.ASM
ScanLoop:
f i r s t , AH
1 odsw
: g et htnee x t
2 b y t e( As L
x1
f: il uroacspotht kas r /snt oa t u s
i f t h e r e s a new c h a r / n os t a t u s
x do 1r ,:asle e
a dddi . d x
:we add 1 f oera cchh a r / n ot rt a n s i t i o n
mov
d l ,a1
mov
a, a1; lho otahskt ee c o nbdy t e
c hi:talsuorpo/snkt ao t u s
x1 a t
i f t h e r e s a new c h a r / n so t a t u s
x od rl . :asl e e
a dddi . d x
:we add 1 f oera cchh a r / n ot rt a n s i t i o n
mov
d l .a1
dx
dec
jnz
ScanLoop
2nd)
John later divides the transition count by two to get the word count. (Food for thought:
Its also possible to useCMP and ADC to detect words withoutbranching.)
Johns approach makes it clear that wordcountingis nothing more than a fairly simple
state machine.The interesting part, of course, is building the fastest state machine.
Level 3: Breakthrough
The boundaries between the levels of optimization are not sharply defined. In a
sense, level 3 optimization is just like levels1 and 2, but moreso. At level 3, one takes
whatever level 2 perspective seems most promising, and implements it as efficiently
as possible on the x86. Even more than at level 2, at level 3 this means breakingout
of familiar patterns of thinking.
In the case of word counting, level 3 means building a table-driven state machine
dedicated to processing a buffer of bytes into a countof words with a minimum
of branching. Thislevel ofoptimization stripsaway many of the abstractionswe usually use in coding, such as loops, tests, and named variables-look back to Listing
16.5, and youll see what I mean. Only a few people reached this level, and I dont
think any of them did it without long, hard thinking; David Staffords final entry
(that is, the one I presentas Listing 16.5) was at least the fifth entry he sentme.
The key concept at level 3 is the use of a massive (64K) lookup table thatprocesses
byte sequences directly into word-count actions. With such a table, its possible to
look up the appropriate action for two bytes simultaneously in just a few instructions; next, Im going to look at the inspired and highly unusual way that Davids
316
Chapter 16
code, shown in Listing 16.5, does exactly that. (Before assembling Listing 16.5, you
must run the C code in Listing 16.8, to generate an include file defining the 64K
lookup table. When you assemble Listing 16.5,TASM will report a "location counter
overflow" warning; ignore it.)
LISTING 16.8
//
MAKETAB.C
MAKETALC
B u i l d QSCAN3.INC f o r QSCAN3.ASM
l i n c l u d e < s t d i o . h>
#i
ncl ude <ctype. h>
# d e f i n eC h T y p e (
( ( ( c ) & Ox7f)
c )
i n tN o c a r r y [
4
inC
t arry[
4 1
v o i dm a i n (v o i d
=
=
==
'\"
II
1 0 . 0x80, 1. 0x80 I :
( 1 . 0x81, 1. Ox80 ) :
i n ta h c h a r .a l C h a r .
i:
FILE *t = f o p e n ( "QSCAN3.INC".
p r i n t f (" B u i l d i n gt a b l e .P l e a s e
f o r ( ahChar
0 : ahChar
f o r (a l C h a r
0: alChar
i f ( alChar % 8
else
<
==
fclose( t
):
wait
..."
):
1 2 8 : ahchar++ 1
<
2 5 6 : alChar++ 1
f p r i n t f ( t . "\ndb%02Xh".Nocarry[
f p r i n t f ( t . " . % 0 2 X h "N. o c a r r y [
f p r i n t f ( t . ".%02Xh".Carry[
"wt"
i ] 1;
i ] 1:
3 1:
):
3 17
Byte 0
Byte 1
Byte 2
pair of bytes to the main word count simply by getting the two bytes, workingin the
carry status from the last byte, and using the resulting value to index into the 64K
table, adding in the 1 or 0 value found in that table.
A sequence of MOV/ADC/ADD
suffices to performall word-counting tasks for a pair
of bytes.Three instructions, no
branches-pretty nearly perfect code.
One detail remains to be attended to: setting the Carry flagfor next time if the last
byte was a non-separator.David does this in a bizarreand incredibly effective way: He
Byte 0
Byte 1
Byte 2
31 8
Chapter 16
9Ah
41 h
I
.
0
0
Previous
Home
Next
presets the high bit of the count, andsets the high bit in the lookup table for those
entries looked up by non-separators. When a non-separators lookup entryis added
to the count,it will produce a carry, as desired. The high bit of the countis masked
off before being added to the total count, so David is essentially using different parts
of the countvariables for different purposes (counting, andsetting the Carry flag).
There are a numberof other interesting details in Davids code, including the unrolling of the loop64 times, so that 256 bytes in a row are processed without a single
branch. Unfortunately, I lack the space to discuss Listing 16.5 any further. Perhaps
thats not so unfortunate, after all; Id hate to deny you the pleasure of discovering
the wonders of thisrather remarkable code yourself. I will sayone more thing, though.
The cycle count for Davids inner loop is 6.5 cycles per byte processed, and theactual
measured time for his routine, overhead and all, is 7.9 cycles/byte. The original C
code clocked in at around 100 cycles/byte.
Enough said, I trust.
There Aint
3 19
Previous
Home
Next
Chapt
I;:
nj"i
si".
,*si
sjnin
.,*a8
of Algorithmic Optimization
.&tomata Game
'"I&
cussing assembly language optimization,which I conderappreciated topic. However, I'd like to take this
t there is much, much more to optimization than ass essential for absolute maximum performance, but
ecessary but notsufficient, if you catch my drift-and
ing forimproved but not maximum performance.
imes: Optimize your algorithm first. Devise new apth said,Premature optimization is the root of all evil.
This is, of course, o&hat, stuff you knowlike the back of your hand. Oris it? As Jeff
Duntemann pointed out to
me the otherday, performance programmers are made,
not born. While I'm merrily gallivanting around in this book optimizing 486
pipelining and turning simple
tasks into horriblycomplicated and terrifylngly fast
state machines, many of youare still developing your basic optimization skills. I don't
want to shortchange thoseof you in the latter category, so in this chapter, we'll discuss some high-level language optimizations that can be applied by mere mortals
within a reasonable periodof time. We're going to examine a complete optimization
process, from start to
finish, and what we will find is that it's possible to get 50-times
a
speed-up withoutusing one byte of assembly! It's all a matterof perspective-how you
look at your code and data.
I've spent a lot of m
323
Conways Game
Previous
Home
Next
The program that were going to optimize is Conways famous Game of Life, longago favorite of the hackers at MITs AI Lab. If youvenever seen it, let meassure you:
Life isneat, and more than
a little hypnotic. Fractals have been the hot
graphics topic
in recent years, but foreye-catching dazzle, Life is hard to beat.
Of course, eye-catching dazzle requires real-time performance-lots of pixels help
too-and theres the rub. When there are, say, 40,000 cells to process and display, a
simple, straightforward implementation just doesnt cut it, even on a 33 MHz 486.
Happily, though, there are many, many ways to speed up Life, and they illustrate a
variety of important optimization principles, as this chapter will show.
First, Ill describe the groundrules of Life, implement avery straightforward version
in C++, and then speed that
version up by about eight times without using any drastically different approaches or any assembly. This may be a little tame for some of
you, but be patient; for after that,
well haul out thebig guns andmove into the30 to
40 times speed-up range. Then in the next chapter,Ill show you how several programmers really floored it in taking me up on my second Optimization Challenge,
which involved the Game of Life.
324
Chapter 17
Previous
Home
Next
ON-COLOR
15
OFF-COLOR 0
MSG-LINE 10
GENERATION-LINE 1 2
1
LIMIT-18-HZ
WRAP-EDGES
1
mode f o r w h i c h
mode s e t
*/
/ / o n - c pe il xlceoll o r
/ / o f f - c ep li lxceol l o r
/ / row f ot er x t
messages
/ / row f o r g e n e r a t i o n # d i s p l a y
/ / s e t 1 f o r maximum f r a m rea t e
/ / s etto
0 tdoi s a b lwe r a p p i nagr o u n d
/ / a t c e l l map edges
{
c l a s sc e l l m a p
private:
unsigned char *cell s :
u n s i g n e di n tw i d t h :
u n s i g n e di n tw i d t h - i n - b y t e s :
u n s i g n e di n th e i g h t :
u n s i g n e di n tl e n g t h - i n - b y t e s :
public:
c e l l m a p ( u n s i g n e di n th .u n s i g n e di n tv ) :
-cellmap(void):
v o i dc o p y - c e l l s ( c e l 1 m a p& s o u r c e m a p ) :
v o i ds e t _ c e l l ( u n s i g n e di n tx .u n s i g n e di n t
v o i dc l e a r - c e l l ( u n s i g n e di n tx .u n s i g n e di n t
i n tc e l l - s t a t e ( i n tx .i n ty ) :
v o i d next-generation(cellmap& dest-map):
18Hz
y):
y);
1:
e x t e r nv o i d enter-display-mode(void):
e x t e r nv o i d exit-display-mode(void):
e x t e r nv o i dd r a w - p i x e l ( u n s i g n e di n t
X . u n s i g n e di n t
u n s i g n e d in t C o l o r :
e x t e r nv o i ds h o w - t e x t ( i n tx .i n t
y . c h a r* t e x t ) :
Y.
/*
C o n t r o l st h es i z eo ft h ec e l l
map. Mustbe
w i t h i nt h ec a p a b i l i t i e s
o ft h ed i s p l a y
mode,andmustbe
limitedtoleave
room f o r t e x t
*/
d i s p l a ya tr i g h t .
u n s i g n e di n tc e l l m a p - w i d t h
96;
u n s i g n e di n tc e l l m a p - h e i g h t
= 96:
/* W i d t h & h e i g h t i n p i x e l s o f e a c h c e l l
as d i s p l a y e d on s c r e e n . * /
u n s i g n e di n tm a g n i f i e r
2:
325
Previous
voidmain0
(
u n s i g n e di n ti n i t - l e n g t h .x .y ,s e e d :
0;
u n s i g n e dl o n gg e n e r a t i o n
chargen-textC801;
l o n gb i o s - t i m e .s t a r t - b i o s - t i m e :
cellmap-width);
c e l l m a p current-map(cel1map-height.
cellmap-width):
c e l l m a p next-map(cel1map-height.
11 Gettheseed:seedrandomly
i f 0 entered
":
c o u t << "Seed ( 0 f o r randomseed):
c i n >> seed:
i f (seed
0 ) seed
( u n s i g n e d )t i m e ( N U L L 1 :
11 Randomly i n i t i a l i z e t h e i n i t i a l c e l l
map
c o u t << " I n i t i a l i z i n g . .
srand(seed);
( c e l l m a p - h e i g h t * c e l l m a p - w i d t h ) I 2;
init-length
do {
x
random(cel1map-width);
random(cel1map-height);
y
n e x t - m a p . s e t - c e l l ( x ,y ) :
3 w h i l e( - i n i t - l e n g t h ) ;
current_map.copy-cells(next_map): 11 p u t i n i t map i n current-map
.";
enter-display-mode():
/ I Keep r e c a l c u l a t i n g and r e d i s p l a y i n gg e n e r a t i o n su n t i l
/ I i sp r e s s e d
a key
&bios-time);
current_map.next-generation(next-map);
/ I Make c u r r e n t - m a pc u r r e n ta g a i n
current-map.copy-cells(next~map):
# i f LIMIT-18-HZ
/ I L i m i t t o amaximum
o f1 8 . 2f r a m e sp e rs e c o n d . f o rv i s i b i l i t y
do I
-bios-timeofday(-TIMELGETCLOCK. &bios-time):
3 w h i l e( s t a r t - b i o s - t i m e
bios-time):
bios-time:
start-bios-time
#endif
I w h i l e( ! k b h i t O ) ;
11 c l e ak re y p r e s s
g e t c h ( 1:
exit-display-mode();
" << g e n e r a t i o n << "\nSeed:
" <<
c o u t << " T o t a lg e n e r a t i o n s :
seed << "\n":
I* c e l l m a pc o n s t r u c t o r .
cellmap::cellmap(unsigned
{
width
w;
width-in-bytes
height
h;
326
Chapter 17
*I
i n t h .u n s i g n e di n t
- (w + 7)
I 8;
w)
Home
Next
Previous
Home
Next
/* cellmap destructor. */
cellmap::-cellmap(void)
I
delete[] cells:
1
*/
void cellmap::set_cell(unsigned
&
0x07):
*(cell-ptr)
&-
-(Ox80 >> (x
+ (x / 8 ) ;
&
+ (x / 8 ) ;
0x07)):
#if WRAP-EDGES
while (x < 0 ) x +- width:
/ / wrap, if necessary
while (x >- width) x -- width:
while (y < 0 ) y +- height:
while (y >- height) y -- height;
#else
if ((x < 0 ) 1 1 (x >- width) 1 ) (y < 0 ) 1 1 (y >- height))
/ / return 0 for off edges if no wrapping
return 0 :
lendi f
cells + (y * width-in-bytes) + (x / 8 ) ;
cell-ptr
return (*cell-ptr & (0x80 >> (x & 0x07))) ? 1 : 0;
1
next-map)
327
Previous
f o r ( y - 0 ;y < h e i g h t :
y++) {
f o r ( x - 0x; < w i d t h ;
x++) t
/ / F i g u r eo u t how many n e i g h b o r st h i sc e l lh a s
cell-state(x-1. y-1) + c e l l - s t a t e ( x . y-1)
neighbor-count
cell-state(x+l.
y-1) + cell-state(x-1,
y) +
c e l l - s t a t e ( x + l .y )
+ c e l l - s t a t e ( x - 1 . y+l) +
c e l l s-t a t e ( x .
y+l) + cell-statetx+l.
y+l);
i f (cell-state(x, y)
1) I
/ / The c e l l i s on; does i t s t a y on?
if ( ( n e i g h b o r - c o u n t !- 2 ) && ( n e i g h b o r - c o u n t != 3 ) ) I
next-map.clear-cell(x.
y);
/ / t u r n it o f f
d r a w - p i x e l ( x . y . OFF-COLOR);
I
I else t
--
/ / The c e l l i s o f f :
does it t u r n on?
i f (neighbor-count
3) I
next-map.set-cell(x.
y);
/ / t u r n i t on
d r a w - p i x e l ( x y, .
ON-COLOR):
VGA mode 1 3 h f u n c t i o n s f o r
T e s t e dw i t hB o r l a n d
C++.
# i n c l u d e< s t d i o . h >
# i n c l u d e< c o n i o . h >
l i n c l ude <dos. h>
*/
27
# d e f i n e TEXT-X-OFFSET
# d e f i n e SCREEN-WIDTH-IN-BYTES
Game o f L i f e .
320
/* W i d t h & h e i g h t i n p i x e l s o f e a c h c e l l .
e x t e r nu n s i g n e di n tm a g n i f i e r ;
*/
/*
Mode 1 3 hd r a wp i x e lf u n c t i o n .P i x e l sa r eo fw i d t h
s p e c i f i e db ym a g n i f i e r .
*/
v o i dd r a w - p i x e l ( u n s i g n e di n tx .u n s i g n e di n t
& height
y . u n s i g n e di n tc o l o r )
# d e f i n e SCREEN-SEGMENT
OxAOOO
u n s i g n e dc h a rf a r* s c r e e n - p t r ;
i n t i. j ;
FP-SEG(screen-ptr)
SCREEN-SEGMENT;
FP_OFF(screen-ptr)
y * m a g n i f i e r * SCREEN-WIDTH-IN-BYTES
f o r( i - 0 ;i < m a g n i f i e r :
i++)
I
f o r (j-0; j < m a g n i f i e r ; j++) t
*(screen-ptr+j)
color;
I
I
screen-ptr
+- SCREEN-WIDTH-IN-BYTES;
/*
Mode 13h m o d e - s e tf u n c t i o n .
v o i de n t e r - d i s p l a y - m o d e 0
{
u n i o n REGS r e g s e t :
328
Chapter 17
*/
magnifier;
Home
Next
Previous
Home
Next
r e g s e t . x . a x = 0x0013;
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) :
I* T e x t mode m o d e - s e tf u n c t i o n .
v o i de x i t - d i s p l a y - m o d e 0
{
union
*/
REGS regset:
r e g s e t . x . a x = 0x0003;
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) ;
/*
T e x td i s p l a yf u n c t i o n .O f f s e t st e x tt on o n - g r a p h i c sa r e ao f
screen. * I
v o i ds h o w - t e x t ( i n tx .i n ty .c h a r* t e x t )
gotoxy(TEXTpX_OFFSET + x . y ) :
puts(text):
329
Previous
Home
Next
the potential for significant speed-up lies? Ill tell you one place where I thought
there was considerable potential-in draw-pixel(). As a programmer of high-speed
graphics, I figuredany drawing function thatwas not only written in C/C++ but also
recalculated the target address from scratch for eachpixel would be among the
first
optimization targets. I also expected to get major gains out of going to a Ping-Pong
arrangement so that I didnt have to copy the new cellmap back to current-map
after calculating the next generation.
I was wrong. Wrong, wrong, wrong. (But at least I was smart enough to use a profiler
before actually writing any newcode.) Table 17.1 shows where the time actually goes
in Listings 17.1and 17.2. As you can see, the time taken by draw-pixel(), copy-cells(),
and atmythingother than calculating the next generationis nothing more than
noise.
We could optimize these
routines right down toexecuting instantaneously, and you know
what? It wouldnt make the slightest perceptible difference in how fast the program
runs. Given the present state of our Game of Life implementation, the only areas
worth looking at forpossible optimizations are cell-state() and nextsenerationo.
p
330
However, ifyou never look beneath the suflaceof the abstract model at the implementation details,you have noidea ofwhat thetruepe$nnance cost of various operations
is, and, withoutthat, you have largeb surrendered control over performance.
Chapter 17
Previous
Home
Next
Having said that, let me hasten to add that algorithmic improvements can make a
big difference even when working at a purely abstract level. For a large unordered
data set, a high-level Quicksort will beat the pants off the best-implemented insertion sort you can imagine. Still, you can optimize your algorithm from here 'til
doomsday, and if you have a fast algorithm running ontop of a highly abstract programming model, you'll almost certainly end up with a slow program. In Listing
17.1, the abstraction that's killing us is that of looking at the eight neighbors with
eight completely independent operations, requiring eight calls to cell-state() and
eight calculations of cell address and cell mask. In fact, given the nature of cell storage, the eight neighbors are
in a fixed relationship to one another, and theaddresses
and masks of all eight can generally be foundvery easily via hard-wired offsets and
shifts once theaddress and mask of anyone is known.
There's a kicker here, though, and that's the counting of neighbors forcells at the edge of
the cellmap. When cellmap wrapping
is enabled (so that the cellmap becomes essentially
a
toroid, with each edge joined seamlessly to the opposite edge, as opposed to having a
border of offcells), neighbors that reside on the otheredge of the cellmap can't be
accessed by the standard fixed offset, as shown
in Figure 17.1. So, in general, we could
improve performance by hard-wiring our neighborcountingfor the bit-percell cellmap
1
J
Cellmap
Edge-wrapping complications.
Figure 17.1
331
format, butit seems wed need a lotof conditional codeto handle wrapping, and that
would slow things back down again.
When a problem doesnt lend
itself well to optimization, make it a practice to see if
you can change the problem definition to one thatallows for greater efficiency. In
this case, wellchange the problemby putting padding bytes around the edgeof the
cellmap, and duplicating each edgeof the cellmapin the paddingbytes at the opposite side, as shown in Figure 17.2. That way, a hard-wired neighbor count will find
exactly whatit should-the opposite edge-without any special code at all.
But doesnt that extracopying of the edges take time? Sure, butonly a little;we can
build itinto the cellmapcopying function, and thenfrankly we wont even notice it.
Avoiding tens or hundredsof thousands of calls to cell-state(), on the other hand,
will be very noticeable. Listing 17.3 shows the alterations toListing 17.1required to
in truth,
implement ahard-wired neighborcounting function.This is a minor change,
implemented in abouthalf an hour and not
making the codesignificantly largerbut Listing 17.3 is 3.6 times faster thanListing 17.1, as shownin Table 17.1.Were up
to about 10 generations per second on a 486; not where we want to be, but it is a
vast improvement.
All neighbors for this cell are at
the usual adjacent addresses,
thanks to the padding cells.
Fbdding Cells
I
Fbdding Cells -
JI
I
0 0 / 0 O O O 0 0 . 0
Cellmap
332
Chapter 17
LISTING 17.3
/*
11 7-3.CPP
c e l l m a pc l a s sd e f i n i t i o n ,c o n s t r u c t o r ,c o p y - c e l l s o ,s e t L c e l l 0 ,
c l e a r - c e l l O .c e l l L s t a t e 0 .c o u n t L n e i g h b o r s 0 .
and
n e x t - g e n e r a t i o n 0f o rf a s t ,h a r d - w i r e dn e i g h b o rc o u n ta p p r o a c h .
O t h e r w i s e ,t h e
same as L i s t i n g 1 7 . 1 * /
c l a s sc e l l m a p
1
private:
u n s i g n e dc h a r* c e l l s ;
u n s i g n e di n tw i d t h :
u n s i g n e di n tw i d t h - . i n - b y t e s ;
u n s i g n e di n th e i g h t :
u n s i g n e di n tl e n g t h - i n - b y t e s ;
public:
c e l l m a p ( u n s i g n e di n th .u n s i g n e di n tv ) :
-cellmap(void);
v o i dc o p y - c e l l s ( c e l 1 m a p& s o u r c e m a p ) :
v o i ds e t - c e l l ( u n s i g n e di n tx .u n s i g n e di n ty ) :
v o i dc l e a r - c e l l ( u n s i g n e di n tx .u n s i g n e di n ty ) ;
intcell-state(int x. i n t y ) :
i n tc o u n t - n e i g h b o r s ( i n tx .i n ty ) ;
v o i d next-generation(cellmap& dest._map);
}:
/*
c e l l m a pc o n s t r u c t o r .
P a d sa r o u n dc e l ls t o r a g ea r e aw i t h
*I
b y t e ,u s e df o rh a n d l i n ge d g ew r a p p i n g .
cellmap::cellmap(unsigned i n t h .u n s i g n e di n t
w)
w i d t h = w;
width-in-bytes
height = h;
length-in-bytes
7) / 8)
+ 2:
width-in-bytes
((w
1 extra
/ / p a de a c hs i d ew i t h
/ / 1 e x t r ab y t e
( h + 2);
new u n s i g n ecdh a r C l e n g t h - i n - b y t e s ] ;
cells
m e m s e tl(ecne0gl.1t hs-. i n - b y t e s ) :
/ / p atdo p / b o t t o m
I / w i t h 1 e x t r ab y t e
/ / c e sl lt o r a g e
/ I c cl aseeltatllalorsr t.
/ * C o p i e so n ec e l l m a p ' sc e l l st oa n o t h e rc e l l m a p .
I f wrapping i s
e n a b l e d .c o p i e se d g e( w r a p )b y t e si n t oo p p o s i t ep a d d i n gb y t e si n
s o t h a tt h ep a d d i n gb y t e so f fe a c he d g eh a v et h e
s o u r c ef i r s t ,
same v a l u e sa sw o u l db ef o u n db yw r a p p i n ga r o u n dt ot h eo p p o s i t e
edge.Bothcellmapsareassumed
t o b et h e same s i z e . * /
v o i d cel1map::copy-cells(cel1map &sourcemap)
u n s i g n e dc h a r* c e l l - p t r ;
i n t i;
# i f WRAP-EDGES
/ / Copy l e f t and r i g h t edges i n t op a d d i n gb y t e s
on r i g h t and l e f t
c e l l - p t r = sourcemap.cells + width-in-bytes:
i;
< h e i g h t ; i++)
{
f o r (i=O
* c e l l - p t r = * ( c e l l - p t r + width-in-bytes
- 2):
* ( c e l l - p t r + width-in-bytes
- 1) = * ( c e l l L p t r + 1 ) :
c e l l - p t r += w i d t h - i n - b y t e s :
/ / Copy t o pa n db o t t o me d g e si n t op a d d i n gb y t e s
on b o t t o ma n dt o p
rnemcpy(sourcemap.cells, s o u r c e m a p . c e l l s + l e n g t h - i n - b y t e s
(width-in-bytes
* 2 ) . width-in-bytes):
memcpy(sourcemap.cel1s + l e n g t h - i n - b y t e s
- width-in-bytes.
sourcemap.cel1.s + w i d t h - i n - b y t e s .w i d t h - i n - b y t e s ) ;
333
#endi f
/ / Copy all cells to the destination
memcpy(cel1s. sourcemap.cells. length-in-bytes);
I
/ * Turns cell on. x and y are offset by 1 byte down and to the right,to compensate for the
padding bytes around the cellmap. * I
void ce1lmap::set-cell(unsigned int x . unsigned int y)
I-
Ox80
>>
+ ( ( x / 8 ) + 1);
( x & 0x07);
>>
+ ( ( x / 8 ) + 1):
(x & 0x07));
+ ( ( x / 8 ) + 1);
: 0;
- -
if ((mask
cell-ptr++;
334
Chapter 17
i f ((*(cell-ptr
neighbor-count++;
(width-in-bytes
2 ) ) & mask))
I 1 P o i n tt ou p p e rr i g h tn e i g h b o r
i f ((mask >>- 1) = 0 ) {
mask = 0x80:
cell-ptr++;
/ / C o u n tu p p e rr i g h tn e i g h b o r
i f ( ( * c e l l _ p t r & mask))neighbor-count++;
/ / Count r i g h t n e i g h b o r
i f ( ( * ( c e l l - p t r + width-in-bytes)
& mask))neighbor-count++:
I / C o u n tl o w e rr i g h tn e i g h b o r
i f ( ( * ( c e l l L p t r + (width-in..bytes
* 2 ) ) & mask))
neighbor-count++;
r e t u r nn e i g h b o r - c o u n t :
/* C a l c u l a t e st h en e x tg e n e r a t i o no fc u r r e n t - m a pa n ds t o r e s
next-map. * I
v o i d cellmap::next_generation(cellmap&
f
u n s i g n e di n tx .y .n e i g h b o r - c o u n t :
it i n
nexttmap)
f o r( y - 0 ;y < h e i g h t :
y++) 1
f o r (x=O; x < w i d t h ; x++) I
n e i g h b o r - c o u n t = c o u n t - n e i g h b o r s ( x .y ) :
i f ( c e l l - s t a t e ( x .y )
== 1) I
i f ( ( n e i g h b o r - c o u n t != 2 ) && ( n e i g h b o r - c o u n t != 3 ) )
n e x t - m a p . c l e a r - c e l l ( yx ), :
/ I t u r n it o f f
d r a w - p i x e l ( x , y . OFF-COLOR):
I else
i f ( n e i g h b o r - c o u n t == 3 ) {
n e x t - m a p . s e t - c e l l y( x) :.
d r a w - p i x e l ( x . y . ONKCOLOR):
/ I t u r n i t on
In Listing 17.3, note the padded cellmap edges, and the alterationof the member
functions to compensate for the padding.Also note that the width now has to be a
multiple of eight, to facilitate the process of copying the edges to the opposite padding
bytes. We have decreased the generality of our Game of Life implementation in exchange for better performance.
Thats a very common trade-off, ascommon as trading
memory for performance.As a rule, the more generala programis, the slower it is.
A corollary is that often (not always, but often), the moreheavily optimized a program is, the more complex and the moredifficult to implement it is. You can often
improve performance a good dealby implementing only the level of generality you
need, but at the
same time decreased generality makes it moredifficult to change or
port the program
a t some later date.
A Game of Life implementation, such as Listing
17.1, thats built on set-cell(), clear-cell(), and get-cell() is completely general; you
The Game of Life
335
can change the cell storage format simply by changing the constructor and those
three functions. Listing 17.3 is harder to changebecause count-neighborso would
also have to be altered, and
its more complex than any of the otherfunctions.
So, in Listing 17.3, weve gotten under the hood and changed the
cellmap format a
little, and gotten impressive results. But now count-neighborso is hard-wired for
optimized counting, and its stilltaking up more thanhalf the time. Maybe now its
time to go to assembly?
Not hardly.
117-4.CPP
I* n e x t L g e n e r a t i o n 0 .i m p l e m e n t e du s i n gf a s t ,a l l - i n - o n eh a r d - w i r e d
n e i g h b o rc o u n t / u p d a t e / d r a wf u n c t i o n .O t h e r w i s e ,t h e
*I
L i s t i n g1 7 . 3 .
same as
I* C a l c u l a t e st h en e x tg e n e r a t i o no fc u r r e n t - m a pa n ds t o r e s
next-map. * I
v o i d cel1map::next-generation(cellmap&
it i n
next-map)
u n s i g n e di n tx .
y. neighbor-count:
width-in-bytes
<< 1;
u n s i g n e d i n t wi dth-in-bytesX2
u n s i g n e dc h a r* c e l l L p t r .* c u r r e n t L c e l l - p t r .m a s k ,c u r r e n t t m a s k ;
u n s i g n e dc h a r* b a s e - c e l l - p t r *. r o w - c e l l - p t r b. a s e - m a s k ;
= next-map.cells;
u n s i g n e dc h a r* d e s t - c e l l - p t r
11 P r o c e s sa l lc e l l si nt h ec u r r e n tc e l l m a p
cells;
/ / p o i n tt ou p p e rl e f tn e i g h b o ro f
row-cel1-ptr
/I firstcellincell
map
f o r( y - 0 :y < h e i g h t :
y++) [ / I r e p e a tf o re a c hr o wo fc e l l s
11 C e l lp o i n t e ra n dc e l lb i t
mask f o r f i r s t c e l l i n
row
/ I t o access
upper
l e fnt e i g h b o r
base-cell-ptr = row-cell-ptr;
/ If iciornesf ltl
row
base-mask = 0x01:
/ I r e p e faoet ra ccheri lnol w
f o r ( x -x0<: w i d t h ;
x++) [
/ I F i r s t ,c o u n tn e i g h b o r s
/ / P o i n tt ou p p e rl e f tn e i g h b o ro fc u r r e n tc e l l
base-cell-ptr;
/ I p o i n t e ra n db i t
mask f o r
cell-ptr
11 u p lnpeef irtg h b o r
mask = basecmask;
/ I Countupper
l e f tn e i g h b o r
( * c e l l L p t r & mask) ? 1 : 0;
neighbor-count
/ / Count l e f t n e i g h b o r
i f ( ( * ( c e l l - p t r + width-in-bytes)
& mask))
neighbor-count++;
/ I C o u n tl o w e rl e f tn e i g h b o r
i f ( ( * ( c e l l - p t r + width-in-bytesX2)
& mask))
neighbor-count++:
336
Chapter 17
--
--
1
row-cell-ptr
--
+- width-in-bytes:
//
Listing 17.4 and Listing 17.3 are functionally the same; the only difference lies in
how nextsenerationo is implemented. (Only nextsenerationo is shownin Listing
17.4;the program is otherwise identical to Listing 17.3.) Listing 17.4 applies the
following optimizations to nextsenerationo:
The neighbor-counting code is brought into nextseneration, eliminating many function calls and from-scratch address/mask calculations; all multiplies are eliminated by
using pointers and addition; and all cellsare accessed directly via pointers and masks,
eliminating all remaining functioncalls and from-scratch address/mask calculations.
The Game of Life
337
The neteffect of these optimizations is that Listing 17.4is more than twice as fast as
Listing 17.3;weve achieved the desired 18 generationsper second, albeit only on a
486, and only at 96x96. (The #define that enables codelimiting the speedto 18 Hz,
which seemed ridiculous in Listing 17.1, is actually useful for keeping the generations from iterating too
quickly when Listing 17.4 is running ona 486, especially with
a small cellmap like 48x48.) Weve sped things up by about eight times so far; we
need to increase our speed another ten times to reach our goal of 200~200at 18
generations per second ona 20 MHz 386.
Its undoubtedly possible to improve the performanceof Listing17.4 further by finetuning thecode, but no tremendous improvement
is possible that way.
Were still not ready for assembly, though; what we need is a new perspective that
lends itself to vastly better performancein C++.The Life program in the nextsection
is three to seven times faster than Listing 17.4-and its still in C++.
How is this possible? Here aresome hints:
After a few dozen generations, mostof the cellmap consists of cellsin the off state.
There are many possible cellmap representations other than one bit-per-pixel.
Cellschangestaterelativelyinfrequently.
338
Chapter 17
Relax. Step back.Try to divine the true natureof theproblem. The objectis not to
count neighbors and check cell states as quickly as possible; thatk just one possible implementation. The object is to determine whenb state
a cellmust be
changed
and to change it appropriately, andthats what we needto do asquickly us possible.
What difference does that new perspective make? Lets approach it this way. What
does atypical cellmap look like? As it happens, after afew generations, thevast majority of cellsare off. In fact, the vast majority of cells are notonly off but areentirely
surrounded by off-cells. Also, cells change state infrequently; in any given generation after the first few, most cellsremain in the same state as inthe previous generation.
Do you see where Im heading? Do you hear a whisper of inspiration fromyour right
brain? The original implementation storedcell states as 1-bits (on), or0-bits (off).
For each generation and for eachcell, it counted thestates of the eight neighbors,
for an average of eight operations per cell per generation. Suppose, now, that on
average 10 percent of cells change state from one generation to the next. (The actual percentageis even lower, but this will do for illustration.) Supposealso that we
change the cell map format to store abyte rather than a bit for each cell, with the
byte storing notonly the cell state but also the countof neighboring on-cells for that
cell. Figure 17.3 shows this format. Then, rather than counting neighbors
each time,
we could just look at the neighbor countin the cell and operatedirectly from that.
But what about the overhead needed to maintain the neighbor counts? Well, each
time a cell changes state, eight operations
would be needed to update the countsin
the eight neighboring
cells. Butthis happens only once every ten cells, on averageso the cost of this approach is only one-tenth that of the original approach!
Know your data.
The Game of Life
339
7-5.CPP
C++ Game o f L i f e i m p l e m e n t a t i o n f o r a n y
mode f o r w h i c h mode s e t
a n dd r a wp i x e lf u n c t i o n sc a nb ep r o v i d e d .T h ec e l l m a ps t o r e st h e
n e i g h b o rc o u n tf o re a c hc e l la sw e l la st h es t a t eo fe a c hc e l l :
t h i sa l l o w sv e r yf a s tn e x t - s t a t ed e t e r m i n a t i o n .
Edgesalwayswrap
i nt h i si m p l e m e n t a t i o n .
C++. To r u n . l i n k w i t h L i s t i n g
17.2
T e s t e dw i t hB o r l a n d
i nt h el a r g em o d e l .
*/
# i n c l u d e < s t d l ib. h>
#i
n c l u d e < s t d 0.
i h>
# i n c l u d e< i o s t r e a m . h >
# i n c l u d e< c o n i o . h >
340
Chapter 17
# i n c l u d e< t i m e . h >
#i
n c l ude <dos .h>
fkin c l u d e < b i o s . h >
#i
n c l u d e <mem. h>
#define
#define
P d e f ine
#define
#define
ONKCOLOR 1 5
OFF-COLOR 0
MSG-LINE
10
GENERATION-LINE 1 2
0
LIMIT-18-HZ
/I
/I
/I
/I
o n - c pe il xlceoll o r
o f f - c ep li lxceol l o r
row f o tre x t
messages
row f o r g e n e r a t i o n # d i s p l a y
/ / s e t 1 t ot o maximum f r a m rea t e
c l a s sc e l l m a p
{
private:
unsigned char *cell s :
u n s i g n e dc h a r* t e m p - c e l l s :
u n s i g n e di n tw i d t h :
u n s i g n e di n th e i g h t :
u n s i g n e di n tl e n g t h - i n - b y t e s :
public:
c e l l m a p ( u n s i g n e di n th .u n s i g n e di n tv ) :
-cellmap(void):
v o i ds e t - c e l l ( u n s i g n e di n tx .u n s i g n e di n ty ) :
v o i d c l e a r - c e l l ( u n s i g n e di n tx .u n s i g n e di n t
intcell-state(intx.int
y):
i n tc o u n t - n e i g h b o r s ( i n tx .i n t
y):
v o i dn e x t - g e n e r a t i o n ( v o i d ) :
v o i di n i t ( v o i d ) ;
- 18Hz
y);
I:
e x t e r nv o i d
enter-displaymode(void):
e x t e r nv o i d exit-display-mode(void);
e x t e r nv o i dd r a w - p i x e l ( u n s i g n e di n t
X . u n s i g n e di n t
u n s i g n e di n tC o l o r ) ;
e x t e r nv o i ds h o w - t e x t ( i n tx .i n ty .c h a r* t e x t ) ;
I* C o n t r o l s t h e s i z e o f t h e c e l l
o ft h ed i s p l a y
mode, andmustbe
*I
d i s p l a ya tr i g h t .
96:
u n s i g n e di n tc e l l m a p - w i d t h
96:
u n s i g n e di n tc e l l m a p - h e i g h t
--
map. Mustbe
w i t h i nt h ec a p a b i l i t i e s
l i m i t e dt ol e a v e
room f o r t e x t
I* Width & h e i g h t i n p i x e l s o f e a c h c e l l .
u n s i g n e di n tm a g n i f i e r
2;
I* Randomizingseed
Y.
*/
*/
unsigned i n t seed:
v o i dm a i n 0
u n s i g n e dl o n gg e n e r a t i o n
0:
chargen-textC801:
l o n gb i o s - t i m e .s t a r t - b i o s - t i m e :
c e l l m a p current-map(cel1map-height.
current-map.init0:
cellmap-width):
/ / r a n d o m l yi n i t i a l i z ec e l l
map
enter-di splay-mode( ) :
341
/ / Keep r e c a l c u l a t i n ga n dr e d i s p l a y i n gg e n e r a t i o n su n t i la n yk e y
/ I i s pressed
s h o w - t e x t ( 0 . MSG-LINE. " G e n e r a t i o n : " ) :
start-bios-time
-bios-timeofday(-TIME-GETCLOCK.
&bios-time):
do {
generation++:
s p r i n t f ( g e n - t e x t ". % 1 0 1 u " g. e n e r a t i o n ) ;
show-text(1. GENERATION-LINE, g e n - t e x t ) ;
/ / R e c a l c u l a t ea n dd r a wt h en e x tg e n e r a t i o n
current-map.next-generationo;
# i f LIMIT-18-HZ
/ / L i m i t t o amaximum
o f 18.2 f r a m e s p e r s e c o n d , f o r v i s i b i l i t y
do
bios-timeofday(-TIME-GETCLOCK.&bios-time):
] w h i l e( s t a r t - b i o s - t i m e
bios-time);
bios-time;
start-bios-time
#endi f
1 w h i l e( ! k b h i t O ) :
/*
/ I c l ekaery p r e s s
getch0;
):
e x i t - d i s pay-mode(
l
c o u t << " T o t a lg e n e r a t i o n s :
seed << "\n":
c e l l m a pc o n s t r u c t o r .
cellmap::cellmap(unsigned
-- - -
"
<<
generation
*/
<<
width
w:
height
h;
w * h:
length-in-bytes
new u n s i g n ecdh a r C l e n g t h - i n - b y t e s ] :
cells
new u n s i g n e dc h a r [ l e n g t h - i n - b y t e s l ;
temp-cells
if ( (cells
NULL)
(temp-cells
NULL) 1
p r i n t f ( " 0 u to fm e m o r y \ n " ) :
exit(1):
memset(cel1s.
I* c e l l m a pd e s t r u c t o r .
cellmap::-cellmap(void)
0. l e n g t h - i n - b y t e s ) ;
"
<<
w)
i n t h ,u n s i g n e di n t
I(
"\nSeed:
/ / c e sl lt o r a g e
I / temp c e l l s t o r a g e
I / c l e a ra l lc e l l s ,t os t a r t
*I
d e l e t e C l c e l l s;
d e l e t e [ ]t e m p - c e l l s :
1
/ * T u r n sa no f f - c e l lo n ,i n c r e m e n t i n gt h eo n - n e i g h b o rc o u n tf o rt h e
e i g h tn e i g h b o r i n gc e l l s .
*/
v o i d cel1map::set-cell(unsigned
(
i n t x ,u n s i g n e di n ty )
u n s i g n e di n t w
width. h
height:
i n tx o l e f t .x o r i g h t .y o a b o v e .y o b e l o w ;
c e l l s + (Y
u n s i g n e dc h a r* c e l l - p t r
W)
+ X:
I / C a l c u l a t et h eo f f s e t st ot h ee i g h tn e i g h b o r i n gc e l l s .
/ / a c c o u n t i n gf o rw r a p p i n ga r o u n da tt h ee d g e so ft h ec e l l
-- -
if (x
0)
xoleft
w - 1:
else
xoleft
-1:
342
Chapter 17
map
-- -- -
i f (y
0)
yoabove
length-in-bytes
- w:
else
-w:
yoabove
i f (x
(w - 1))
x o r i g h t = - ( w - 1):
else
1:
xoright
i f (y
(h - 1 ) )
yobelow
-(length-in-bytes
- w):
else
w:
yobelow
-- --
*(cell-ptr)
I- 0x01:
* ( c e l l - p t r + yoabove + x o l e f t ) +- 2:
* ( c e l l - p t r + yoabove) +- 2:
* ( c e l l - p t r + yoabove + x o r i g h t ) +- 2:
* ( c e l l - p t r + x o l e f t ) +- 2:
* ( c e l l - p t r + x o r i g h t ) +- 2:
* ( c e l l - p t r + yobelow + x o l e f t ) +- 2:
* ( c e l l - p t r + yobelow) +- 2:
* ( c e l l - p t r + yobelow + x o r i g h t ) +- 2 :
1
I* T u r n sa no n - c e l lo f f ,d e c r e m e n t i n gt h eo n - n e i g h b o rc o u n tf o rt h e
e i g h tn e i g h b o r i n gc e l l s .
*I
i n tx .u n s i g n e di n ty )
v o i d cel1map::clear-cell(unsigned
(
u n s i g n e di n t w
width, h
height;
i n tx o l e f t ,x o r i g h t .y o a b o v e .y o b e l o w :
c e l l s + (y
u n s i g n e dc h a r* c e l l - p t r
w ) + x:
I / C a l c u l a t et h eo f f s e t st ot h ee i g h tn e i g h b o r i n gc e l l s ,
/ I a c c o u n t i n gf o rw r a p p i n ga r o u n da tt h ee d g e so ft h ec e l l
--
if (x
0)
xoleft
else
xoleft
if (y
0)
yoabove
else
yoabove
i f (x
(w
xoright
else
xoright
if ( y
(h
yobelow
else
yobelow
-- -- -w:
-- - ---
map
1:
-1:
lengthkin-bytes
w:
1))
- ( w - 1);
1:
1))
-(length-in-bytes
w):
- w;
* ( c e l l L p t r ) &- -0x01:
* ( c e l l _ p t r + yoabove + x o l e f t ) -- 2:
*(eel 1 - p t r + yoabove ) -- 2:
* ( c e l l - p t r + yoabove + x o r i g h t ) -- 2:
*(eel 1 - p t r + x o l e f t ) -- 2:
* ( c e l l _ p t r + x o r i g h t ) -- 2:
* ( c e l l - p t r + yobelow + x o l e f t ) -- 2:
* ( c e l l - p t r + yobelow) -- 2:
* ( c e l l - p t r + yobelow + x o r i g h t ) -- 2:
343
*I
cells + ( y * width) + x;
cell-ptr
return *cell-ptr & 0x01;
I f which to work
1
1 else
RowDone:
if (count
3) (
set-cell(x. y);
draw-pixel (x. y. ON-COLOR):
1
344
Chapter 17
/ / Gettheseed;seedrandomly
i f 0 entered
;
c o u t << Seed ( 0 f o r randomseed):
c i n >> seed;
i f ( s e e d =- 0 ) seed = ( u n s i g n e d )t i m e ( N U L L ) :
/ / Randomly i n i t i a l i z e t h e i n i t i a l c e l l
/ / ( a c t u a l l yg e n e r a l l yf e w e r ,b e c a u s e
/ / r a n d o m l ys e l e c t e dm o r et h a no n c e )
c o u t << I n i t i a l i z i n g . . . :
srand(seed);
( h e i g h t * w i d t h ) / 2:
init-length
do {
x = random(width):
random(height);
y
i f ( c e l l - s t a t e ( x .y )
-= 0 ) 1
s e t - c e l l ( x .y ) ;
map t o 50% a n - p i x e l s
some c o o r d i n a t e s will be
I
I w h i l e( - i n i t - l e n g t h ) ;
The large modelis actually not necessary for the96x96 cellmap inListing 17.5.However, I was actually more interested in seeingfast
a 200x200 cellmap, and two 200x200
cellmaps cant fit in asingle segment. (This caneasily be worked around inassembly
language for cellmaps up to a segment in size; beyond that size, cellmap scanning
becomes pretty complex, although it can
still be efficiently implemented with some
clever programming.)
Anyway, using the large model helps illustrate that its the data representation and
the dataprocessing approach you choose that mattermost. Optimization details like
memory models and segments and in-line functions andassembly language are important but secondary. Let your mind roam creatively before you start coding.
Otherwise, you may find youre writing well-tuned slow code, which is by no means
the same thingas fast code.
Take a close look at Listing 17.5. You will see that its quite a bit simpler than
Listing
17.4. To some extent, thats because I decided to hard-wire the program to wrap
around from one edgeof the cellmap to theother (its much more interesting that
way), but the main reason is that its a lot easier to work with the neighbor-count
model. Theresno complex mask and pointer management, and the
only thing that
reuZ(y needs to be optimized is scanning for zerobytes. (And, in fact,I havent optimized even that because its done in a C t + loop; it should really be REPZ SCASB.)
In truth, none of the code in Listing 17.5 is particularly well-optimized, and, as I
noted, the program
must be compiledwith the large model for large
cellmaps. Also,
of course, the entire program is still in C+t; note well that theres not a whit of
assembly here.
345
Previous
Home
No doubt we could get another two to five times improvement with good assembly
code-but thats dwarfed by a 30-times improvement, so optimization at a conceptual level must come first.
346
Chapter 17
Next
Previous
Home
Next
When I was in hi& school, my gym teacher had us run a race around the soccer
field, or rather, arotiNd a course marked with cones that roughly outlined the shape
of the field. I quickly s
d into second place behind Dwight Chamberlin. We cruised
around the field, and &We came to the far corner, Dwight cut across the corner,
inside a cone placed dkwardly far out from the others. followed,
I
and everyone else
cut inside the conet$o-except the pear-shaped kid bringing up the rear, who plodded his way around kvery singlecone on his way to finishing about half a lap behind.
When the laggar&&nally crossedthe finish line, thecoach named him the winner, to
my considCE%#&rj@itation.After all, the object was to see whocould run the fastest,
wasnt it?
Actually, it wasnt.The object was to see who could run the fastest according to the
limitations placed upon the contest. This is a crucial distinction, although usually
taken for granted. Would it havebeen legitimate if I had cutacross the middle of the
field? If I had ridden a bike? If I had broken theworld record for the100 meters by
dropping 100 meters from a plane? Competition has meaning only withina carefully
circumscribed arena.
Why am I telling you this? First, becauseit is a useful lesson for programming.
,_x
n 8.
349
The other reason for the anecdote has to do with the waymy second Optimization
Challenge worked itselfout. If youll recallfrom the last chapter, the challenge I made
to the readers of PC TECHNIQLES was to devise the fastest possible version of the
Game of Lifecellular automata simulation game. I gave an example, laid out therules,
and stood aside. Good thing, too. Apres moi, le deluge.. ..
And whenthe dust had settled, I was left withthe uneasy realizationthat every submitted
entry broke the rules. Every single entry. The rules clearly statedthat submitted code must
produce exactly thesame output as myexample implementation under all circumstances
in order to be eligible to win. I do notthink that therecan be anyquestion about what
exactly the same output means. It means the same pixels,in the same colors,at the
same placeson the screen at the same points in all the Life simulations that the original code was capable of running. Period. And not oneof the entries met that standard.
Some submitted listings were more than 400 lines long. Some didnt display the generation number at theright side of the screen, didnt draw the same pixel colors, or
didnt bother with magnification. Some had bugs. Some didnt support all possible
cellmap widths and heights up to 200x200, requiring widths and heights that were
specific multiples of a number of cellsthat lentitself toa particular implementation.
This last mission is,
in a way, a brilliant approach, as evidenced by the fact that ityielded
the two fastest submissions,but it is not within the rules of the contest. Some of the
rule-breaking was major, some very minor, and some had nothing to do with the Life
engine itself, but the rules were clear; where was I to draw the line if not with exact
compliance? And I was fully prepared to draw that line rigorously, disqualifjmg some
mind-bending submissions in order to let lesser but fully compliant entries win-until
I realized that there were no fully compliant entries.
Given which, I heaved a sigh of relief, threw away the rules, and picked a winner in
the true spirit of the contest: raw speed. Two winners, in fact: Peter Klerings, a programmer for Turck GmbH in Munich, Germany, whose entry just plain runs like a
bat out of hell, and David Stafford (who was also the winner of my first Optimization
Challenge), of Borland International, whose entry is slightly slower mainlybecause
he didnt optimize the drawing part of the program, in full accordance with the
contest rules, which specifically excluded drawing time from consideration. Unfortunately, Peters generation code and drawing code are so tightly intertwined that it
is impossible to separate them, and hence not really possible to figure out whose
350
Chapter 18
generation engine is faster. Anyway, at 180 to200 generations per second, including
drawing time, for 200x200 cellmaps (and in the neighborhoodof lOOOgpsfor 96x96
cellmaps, the size of my original implementation), theyre the fastest submissions I
received. Theyre both more than an order of magnitude faster than my final optimized c++
Life implementation shown in Chapter 17, and more than 300 times
faster than my original, perfectly functional Life implementation. Not 300 percent300 times. Cell generations scud across the screen like clouds, and walkers shoot out
like bullets. Each is a worthy winner, and I feel confident that the true objective of
the challenge has been met: pure, breathtaking speed.
Notwithstanding, mea culpa. The next time I lay a challenge, I will define the rules
with scrupulous care.Even so, this was much more thanjust anothercycle-counting
contest. Were fortunate enoughto be privy to
a startling demonstration of the power
of the best optimizer anyone has yet devised-you. (Thats the general you;I realize that the specific youmay or may not be quite up to the optimizing level of the
specific David Stafford or Peter Klerings.)
Onward to the code.
Table-Driven Magic
David Stafford won my first Optimization Challenge by means of a huge look-up
table and an incredible state machine driven by that table. The table didnt cause
Davids entry to exceed the line limit because Davids submission included code to
generate thetable on the fly as part of the build process. David has done himself one
better this time with his QLIFEprogram; not only does his build process generate a
64K table, but it also generates virtually all hiscode, consisting of 17,000-pluslines of
assembly language spanning another 64K. What David has done is write the equivalent of a bitblt compiler for the
Game of Life; one might in fact call it aLife compiler.
What Davids code generates is still a general-purpose program; it takes arbitrary
seed values, and can run for an arbitrary number of generations, so its not as if David
simply hardwired the instructions to draw each successive screen. However, its a
general-purpose program that is exquisitely tailored to the task it needsto perform.
All the pieces of QLIFE are shown in Listings 18.1 through 18.5, as follows: Listing
18.1is BUILD.BAT,the batch file used to build QLIFE; Listing 18.2 is LCOMP.C, the
program used to generate theassembler code and data file QLIFE.ASM; Listing 18.3
is MAIN.C,the main program for QLIFE; Listing 18.4 is VIDEO.C, the video-related
functions, and Listing 18.5 is LIFE.H, the headerfile. The following sidebar contains
Davids build instructions, exactly ashe wrote them. I certainly wont have room to
discuss all the marvelous intricacies of Davids code; I suggest you look over these
listings until you understand themthoroughly (it took me a day to pick them apart)
because theres a lot of neat stuff in there, and its an approach to performance
programming that operates at a more
efficient, tightly integrated level than you may
ever see again. One hint: It helps a lotto build and runLCOMP.C, redirect its output
Its a Wonderful Life
351
is
and
LISTING 18.1
to
BUILD.BAT
b c c- v
-D%1=%2;%2=%3:%3=%4;%4-%5:%5=%6:%6-%7:%7=%8;%8 1comp.c
lcomp > q l i f e . a s m
tasmxImx
lkh30000 q l i f e
b c c- v
-D%1=%2:%2-%3;%3=%4:%4=%5;%5-%6:%6-%7;%7-%8:%8
q l i f e . o b jm a i n . cv i d e 0 . c
LISTING18.2
LC0MP.C
I / LC0MP.C
/I
/ / L i f e c o m p i l e r .v e r
/I
/ I D a v i dS t a f f o r d
/I
352
Chapter 18
1.3
of
with-
# i n c l u d e< s t d i o . h >
# i n c l u d e < s t d l ib. h>
#i
n c l ude "1 if e . h "
(46
#defineLIST-LIMIT
138)
c h a r *Seg
- "";
i f ( WIDTH
printf(
printf(
printf(
printf(
printf(
>
HEIGHT
LIST-LIMIT
Seg
- "es:";
>
v o i dN e x t 2 (v o i d
(
printf(
printf(
printf(
printf(
printf(
"mov
"add
"mov
bp.es:[sil\n"
s i . 2 \ n " 1:
dh.Cbp+ll\n"
"or
dh.l\n" ) :
"jmp
dx\n"
1:
v o i dB u i l d M a p s (v o i d
u n s i g n e ds h o r t
1;
):
- 0.
i.j. S i z e , x
1:
p r i n t f ( "-DATA segment'DATA'\nalign2\n"
p r i n t f (" p u b l i c- C e l l M a p \ n "
1:
p r i n t f ( " - C e l l M a pl a b ew
l ord\n"
):
for( j
(
for( i
if( i
(
<
0; j
HEIGHT: j++
0; i
<
II
WIDTH; i++
i
- WIDTH-1
p r i n t f ( "dw 8000h\n"
C2.
C3:
II
-- 0
II
HEIGHT-1
):
else
(
p r i n t f ( "dw O\n"
):
p r i n t f ( "ChangeCell dw O\n" ) :
p r i n t f ( "_RowColMap l a b e w
l ord\n"
1:
353
- 0; j
i - 0:
<
HEIGHT: j++ 1
<
WIDTH: i++
)
, i n t f ( "dw 0%02x%02xh\n". j . i
i f ( WIDTH
HEIGHT
>
LIST-LIMIT
3 ):
p r i n t f ( "Changel dw o f f s e t -CHANGE:-ChangeListl\n"
p r i n t f ( "Change2 dw o f f s e t -CHANGE:-ChangeList2\n"
p r i n t f (" e n d s \ n \ n "
):
p r i n t f ( '"CHANGE
segmentparapublic'FAR_DATA'\n"
):
):
1:
else
{
p r i n t f (" C h a n g e l
dw o f f s e t DGR0UP:-ChangeListl\n"
p r i n t f ( "Change2 dw o f f s e t DGROUP:-ChangeListZ\n"
Size
WIDTH
HEIGHT
):
):
1:
p r i n t f (" p u b l i c
-ChangeListl\n_ChangeListl l a b e w
l ord\n"
1:
p r i n t f ( "dw %ddup
( o f f s e t DGROUP:ChangeCell)\n",Size
):
p r i n t f (" p u b l i c_ C h a n g e L i s t Z \ n - C h a n g e L i s t Zl a b e lw o r d \ n "
):
p r i n t f ( "dw %ddup ( o f f s e t DGROUP:ChangeCell)\n".
Size ):
p r i n t f (" e n d s \ n \ n "
):
p r i n t f ( "-LDMAP
do
s e g m e n tp a r ap u b l i c
'FAR-DATA'\n"
):
/ / C u r r e n tc e l ls t a t e s
c1
( x & 0x0800) >> 11;
C2
( X & 0x0400) >> 10:
c3
( x & 0x0200) >> 9 ;
--
---
/ / N e i g h b o rc o u n t s
N1
( X & OxOlCO)
N2
( X & 0x0038)
N3
( x & 0x0007):
y
- x & Ox8FFF:
I
1-
1-
--
2)
( N 1 + C2
3) )
354
1-
1)
( N 1 + C2
- 3))
C3
0x4000:
i f ( C2 && ( ( N 2
{
/ / P r e s e r v ea l lb u tt h en e x tg e n e r a t i o ns t a t e s
0x4000:
i f ( ! C 1 &&
6:
3;
+ C2
C 1 && ( ( N 1
if(
>>
>>
0x2000:
Chapter 18
+ C 1 + C3
-- 2) 1 1
(N2
C1
- 3))
i f ( !C2 &&
(
1-
(N2
C 1 + C3
- 3)
0x2000;
v o i d GetUpAndDown( v o i d )
(
p r i n t f ( "mov a x . [ b p + ~ R o w C o l M a p - ~ C e l l M a p l \ n " ) :
p r i n t"f oa( rh , a h \ n "
):
p r i n t f ( "mov dx.%d\n", DOWN ) :
p r i n t f ( "mov cx.%d\n". WRAPUP ) :
printf( "jz
s h oDr t% d \ nL" ,a b e l
):
p r i n t f ( "cmp ah.%d\n". HEIGHT - 1 ) :
p r i n t f ( "mov c x . % d \ n " . UP 1:
p r i n t f "( j bs h o r t
D%d\n".
Label
1:
p r i n t f ( "mov dx,%d\n". WRAPDOWN ) :
p r i n t f ( "D%d:\n",
Label
):
I
v o i dF i r s t p a s s (v o i d
c h a r *Op;
u n s i g n e ds h o r t
p r i n t f (" o r g
UpDown
- 0:
0%02xOOh\n".(Edge
<<
7 ) + (New
/ / r e s e tc e l l
p r i n t f (" x o rb y t ep t r[ b p + l l , 0 % 0 2 x h \ n " .
(New
<<
A
4) + (Old
Old)
<<
<<
1) ) :
1 ):
/ / g e tt h es c r e e na d d r e s sa n du p d a t et h ed i s p l a y
#i
f n d e f NOORAW
p r i n t f ( "mov a1 .160\n" ) :
p r i n t f ( "mov bx,[bp+-RowColMap-~CellMapl\n" 1:
p r i n t f ( "mu1 b h \ n " 1:
p r i n t f ( "add
ax.ax\n"
):
p r i n t f ( "mov bh.O\n" ) :
p r i n t f ( "add
bx.ax\n"
1:
/ / bx
s c r e eonf f s e t
i f ( ((New
Old) & 6)
355
i f ( (New
Old) & 1 )
p r i n t f ( "mov b y t ep t fr s : C b x + 2 l . % s \ n " ,
(New & 1) ? "15" : " d l " 1:
else
i f ( ((New
Old) & 3)
else
i f ( (New
Old) & 2
p r i n t f ( "mov b y t ep t fr s : C b x + l l . % s \ n " .
(New & 2 ) ? "15" : " d l " 1:
i f ( (New
Old) & 1
i f ( (New
Old) & 4 1
p r i n t f c "mov b y t e p t r f s : [ b x l . % s \ n " .
(New & 4 ) ? "15" : " d l " ) ;
#endi f
i f ( (New
i f ( (New
i f ( (New
Old) & 4 )
Old) 8 2 )
Old) & 1
i f ( Edge )
(
GetUpAndDownO;
i f ( (New
/ / ah
Old) & 4
row, a1
c o l .c x
--
"inc":
"dec":
356
Chapter 18
Op ) ;
Op 1;
- down
up.dx
i f ( New & 4
else
/I di
left
Op ) :
I
i f ( (New
Old) & 1 1
printf(
printf(
printf(
printf(
printf(
printf(
Op
Op
=
=
I1 d i
right
):
"add":
"sub".
"%s
word p t[rb p + d i l , 4 0 h \ n " .
" a dddi . c x \ n "
1;
"%s
word p t[rb p + d i I , 4 0 h \ n " .
" s udbi . c x \ n "
);
" a dddi . d x \ n "
):
"%s
word p t[rb p + d i l , 4 0 h \ n " .
Op ) :
Op ) :
Op 1 :
I
printf(
printf(
printf(
printf(
"mov d i . c x \ n "
"add
word
ptr
"mov d i . d x \ n "
"add
word
ptr
p r i n t f ( "mov
1:
UpDown 1 :
[bp+dil.%d\n".
1:
[bp+diI,%d\n".
UpDown
):
1:
dl.O\n"
else
(
Old) & 4 )
i f ( (New
(
i f ( New & 4
Op
Op
else
"inc":
"dec":
b y t ep t[rb p + % d l \ n " .
b y t ep t[rb p + % d l \ n " .
b y t ep t[rb p + % d l \ n " .
p r i n t f ( "%s
p r i n t f ( "%s
p r i n t f ( "%s
i f ( (New
Op. LEFT ) :
Op. UPPERLEFT ) :
Op, LOWERLEFT 1 ;
Old) & 1 )
i f ( New & 1
else
p r i n t f ( "%s
p r i n t f ( "%s
p r i n t f ( "%s
Op
Op
=
=
"add":
"sub".
word p t r [bp+%dl.40h\n".
word p t r [bp+%d].40h\n".
word p t r [bp+%d].40h\n".
i f ( abs( UpDown )
>
p r i n t f ( "add
word
p r i n t f ( "add
word
Op.
RIGHT
):
Op.
UPPERRIGHT
1:
Op. LOWERRIGHT ) :
1 )
p t r [bp+%dl.%d\n".
p t r [bp+%dl,%d\n".
U P , UpDown ) :
DOWN, UpDown ) :
else
i f ( UpDown
else
==
Op
Op
- "inc":
=
"dec":
357
p r i n t f ( "%s
p r i n t f ( "%s
Op. UP
1;
Op. DOWN 1;
b y t ep t r
[bp+%d]\n".
b y t ep t [r b p + % d l \ n " .
N e x t 1( ) ;
1
v o i dT e s t (c h a r* O f f s e t ,c h a r* S t r
printf(
printf(
printf(
printf(
"mov
"cmp
b x . C b p + % s l \ n "O. f f s e t
);
bh.[bxl\n"
1;
" j n z s h o rFt I X _ % s % d \ n "S. t rL. a b e l
"%s%d:\n",Str.Label
);
);
*Str. i n t JumpBack 1
v o i dF i x (c h a r* O f f s e t .c h a r
(
p r i n t f ( "FIX-%s%d:\n".Str.Label
p r i n t f ( "mov b h . [ b x l \ n "
);
p r i n t f ( "mov [ b p + % s l , b x \ n "O. f f s e t
i f ( * O f f s e t !- ' 0 '
p r i n tef l( s e
1;
p r i n t "f (l eaax . [ b p + % s l \ n O
" .f f s e t
"mov ax.bp\n" ) ;
);
1;
p r i n t f (" s t o s w \ n "
i f ( JumpBack )
);
Str. Label 1;
p r i n t f "( j m ps h o r%
t s%d\n".
v o i dS e c o n d p a s s (v o i d
p r i n t f (" o r g
O%OZxOOh\n".
(Edge << 7) + (New
<<
4)
+ ( O l d << 1) + 1 1;
i f ( Edge )
/ / f i n i s h e dw i t hs e c o n dp a s s
i f ( New
7 && O l d
0
(
p r i n t f ( "cmp b p . o f f s e t DGROUP:ChangeCell\n" ) ;
p r i n t f "( j n es h o r N
t otEnd\n"
);
p r i n t f ( "mov word p t re s : [ d i ] . o f f s e t
DGROUP:ChangeCell\n"
p r i n t f ( "pop d i s i
bpds\n"
1;
p r i n t f ( "mov C h a n g e c e l l .O\n" ) ;
p r i n t f (" r e t f \ n "
1;
p r i n t f (" N o t E n d : \ n "
);
1
GetUpAndDownO;
/ / ah
row,a1
c o l .c x
p r i n t f ( "push s i \ n " 1;
p r i n t f ( "mov s i . % d \ n " . WRAPLEFT 1;
p r i n t f ( "cmp a l . O \ n " ) ;
p r i n t f "( j es h o r t
L%d\n".
Label
);
p r i n t f ( "mov s i . % d \ n " . LEFT 1;
p r i n t f ( "L%d:\n".
Label
1;
358
Chapter 18
- u p .d x - down
// s i
left
1;
T e s t (" s i " ,
"LEFT" 1:
p r i n t f ( " a dsdi , c x \ n "
):
T e s t (" s i " .
"UPPERLEFT" ) :
p r i n t f "( s u bs i . c x \ n "
1;
p r i n t f ( "add s .i d x \ n "
):
T e s t (" s i " .
"LOWERLEFT" ) :
p r i n t f ( "mov
si,cx\n"
T e s t (" s i " .
"UP" 1:
p r i n t f ( "mov
si.dx\n"
T e s t (" s i " .
"DOWN" ) :
):
);
p r i n t f ( "cmp b y t ep t r
[bp+_RowColMap-_CellMapl.%d\n".
(WIDTH - 1) * 3 ) :
printf(
printf(
printf(
printf(
"mov
s i .%d\n". WRAPRIGHT ) ;
s h oRr t% d \ nL" .a b e l
):
s i . % d \ n " , RIGHT 1:
"R%d:\n".
Label
):
// si
right
"je
"mov
T e s t (" s i " .
"RIGHT" ) ;
p r i n t f ( " a dsdi . c x \ n "
):
T e s t (" s i " .
"UPPERRIGHT" 1:
p r i n t f ( " s usbi . c x \ n "
):
p r i n t f ( " a dsdi . d x \ n "
):
T e s t (" s i " .
"LOWERRIGHT" ) ;
}
else
T e s t (i t o a (
T e s t (i t o a (
T e s t (i t o a (
T e s t (i t o a (
T e s t (i t o a (
T e s t (i t o a (
T e s t (i t o a (
T e s t (i t o a (
I
i f (New
Old )
i f (Edge )
T e s t ( "0".
p r i n t f ( "pop
"CENTER"
si\n"
):
dl.O\n"
"mOV
);
NextE( ) :
i f ( Edge )
F" si xi "( ,
F i x"(s i " ,
F i x"(s i " ,
" sF ii "x,(
F
" si xi "(.
Fix( "si".
F i x (" s i " .
F i x (" s i " ,
"LEFT",
"UPPERLEFT".
"LOWERLEFT".
"UP",
"DOWN".
"RIGHT".
"UPPERRIGHT".
"LOWERRIGHT".
1 ):
1
):
):
1 );
1 ):
1 ):
1 1:
New == O l d
);
else
Fi ti ox (a (
F i x i(t o a (
F i x i(t o a (
LEFT. Buf, 1 0 ) .
UPPERLEFT. B u f . 10 1,
LOWERLEFT. B u f . 10 1,
"LEFT",
"UPPERLEFT".
"LOWERLEFT".
1 ):
1 );
1 1:
359
Fi tiox a( (
Fi ti xo (a (
F ii txo( a (
F i x (i t o a (
F i x (i t o a (
i f (New
UP, 1B0u f .
),
DOWN, B u f . 10 ) ,
RIGHT, B u f . 10 1,
UPPERRIGHT. B u f 1. 0
LOWERRIGHT. B u f 1. 0
- Old
i f (Edge )
F i x ( "0".
"UP" ,
"DOWN",
"RIGHT",
"UPPERRIGHT".
"LOWERRIGHT".
),
),
"CENTER".
"mov
1 );
1 1;
1 ):
1 );
New
-=
Old
0 );
dl.O\n"
);
NextE( ) :
v o i dm a i n (v o i d
c h a r *Seg
"ds";
BuildMapsO:
p r i n t f ( "DGROUP g r o u p _DATA\n" ) ;
p r i n t f (" L I F Es e g m e n t
'CODE'\n" ) ;
p r i n t f ( "assume cs:LIFE.ds:DGROUP,ss:DGROUP,es:NOTHING\n"
p r i n t f ( ".386C\n""public-NextGen\n\n"
1;
f o r ( Edge
f o r ( New
0 : Edge
f o r (O l d
<
0 ; New
- 0;
1; Edge++ )
<=
Old
8 : New++ )
<
8 : Old++
i f ( New != OFl di r s t p a s s Lo a: b e l * ;
SecondPassO:
Label++:
/ / f i n i s h e dw i t hf i r s tp a s s
p r i n t f "( o r g
O\n" ) ;
p r i n t f ( "mov s i .Changel\n" ) :
p r i n t f ( "mov d i .ChangeZ\n" ) ;
p r i n t f ( "mov C h a n g e l . d i \ n " ) ;
p r i n t f ( "mov ChangeZ,si\n" ) :
p r i n t f ( "mov Changecell.OF000h\n"
p r i n t f c "mov ax.seg -LDMAP\n" ) ;
p r i n t f ( "mov d s . a x \ n " 1:
NextZ( ) ;
/ I e n t r yp o i n t
p r i n t f ( '"NextGen:
i f ( WIDTH
p r i n t f ( "mov
p r i n t f ( "mov
HEIGHT
pushdsbp
>
#i
f n d e f NDDRAW
p r i n t f ( "mov ax.OAOOOh\n"
p r i n t f ( "mov f s , a x \ n " ) :
#endi f
360
Chapter 18
s id i \ n "" c l d \ n "
LIST-LIMIT
ax.%s\n". S e g
es,ax\n" ) :
);
);
):
Seg
- "seg
):
-CHANGE";
):
);
p r i n t f ( "mov
p r i n t f ( "mov
N e x t 1( ) :
1:
s .i C h a n g e l \ n "
d l . O \ n " 1:
1:
p r i n t f (" L I F Ee n d s \ n e n d \ n "
I/
/ / D a v i dS t a f f o r d
//
lin c l ude < s t d l ib . h>
# i n c l u d e< s t d i o . h >
# i n c l u d e< c o n i o . h >
# i n c l u d e< t i m e . h >
iin c l u d e < b i o s . h>
#i
n c l ude "1 if e . h"
/ / f u n c t i o n s i n VIDE0.C
v o i de n t e r - d i s p l a y - m o d e (v o i d
1:
v o i de x i t - d i s p l a y - m o d e (v o i d
):
v o i ds h o w - t e x t (i n tx .i n ty .c h a r* t e x t
v o i dI n i t c e l l m a p (v o i d
u n s i g n e di n t
x
y
--
i. j , t . x .y .i n i t :
1
for( i
- j
WIDTH) + x / 3 1
0: i
i f ( CellMapC i
<
WIDTH
NextGenO:
1-
Ox1000
<<
(2 - (x % 3)):
HEIGHT: i++
1
1 & 0x7000 1
C h a n g e L i s t l C j++1
2; i n i t : i n i t -
random( WIDTH * 3 ) :
random( HEIGHT ) :
CellMapC(y
- (HEIGHT * WIDTH * 3 ) /
for(init
):
( s h o r t ) & C e l l M a p C i 1:
pump.
1
v o i dm a i n (v o i d
- 0:
u n s i g n e dl o n gg e n e r a t i o n
c h a rg e n - t e x t [
80 1:
l o n gs t a r t - t i m e .e n d - t i m e :
u n s i g n e di n ts e e d :
" ) :
p r i n t f ( "Seed ( 0 f o r randomseed):
scanf("%d".&seed
):
i f ( seed
0 ) seed
( u n s i g n e d t) i m e ( N U L L ) :
srand(seed
1:
361
# i f n d e f NODRAW
enter-display-mode0:
show-text( 0. 1 0". G e n e r a t i o n : "
#end? f
/ / r a n d o mi nl yi t i a l iczeel l
InitCellmapO:
-b i o s - t i m e o f d a y (
):
-TIME-GETCLOCK.
map
&start-time
):
do
(
NextGenO:
generation++:
#i
f n d e f NOCOUNTER
s p r i n t f (g e n - t e x t ". % 1 0 1 u " g. e n e r a t i o n
show-text( 0. 12.gen-text
1:
#endi f
1:
C i f d e f GEN
w h i l e (g e n e r a t i o n
< GEN 1:
#else
w h i l e (! k b h i t O
):
#endi f
- b i o s _ t i m e o f d a y ( -TIMELGETCLOCK.
end-time -- s t a r t - t i m e :
&end-time
):
# i f n d e f NODRAW
g e t c h ( 1:
/ / c l e akre y p r e s s
exit-display-mode0:
#endif
p r i n t f (" T o t a lg e n e r a t i o n s :% l d \ n S e e d :% u \ n " .g e n e r a t i o n .s e e d
p r i n t f (" % l dt i c k s \ n " .e n d - t i m e
1:
p r i n t f ( "Time: %fgenerations/second\n".
( d o u b 1 e ) g e n e r a t i o n / (doub1e)end-time * 18.2 1:
# i n c l u d e< s t d i o . h >
# i n c l u d e< c o n i o . h >
#i
ncl ude <dosf h>
# d e f i n e TEXT-X-OFFSET
28
# d e f i n e SCREEN-WIDTH-IN-BYTES
320
# d e f i n e SCREEN-SEGMENT
OxAOOO
/ * Mode 1 3 hm o d e - s e tf u n c t i o n .
v o i de n t e r - d i s p l a y - m o d e 0
I
u n i o n REGS r e g s e t :
regset.x.ax
0x0013:
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) :
362
Chapter 18
*/
ofLife.
):
I* T e x t mode m o d e - s e tf u n c t i o n .
v o i d e x i t - d i s pay-mode(
l
)
*I
u n i o n REGS r e g s e t :
0x0003:
regset.x.ax
i n t 8 6 ( 0 x 1 0 .& r e g s e t ,& r e g s e t ) :
/*
T e x td i s p l a yf u n c t i o n .O f f s e t st e x tt on o n - g r a p h i c sa r e ao f
screen. * I
v o i ds h o w - t e x t ( i n tx .i n t
y . c h a r* t e x t )
+ x. y ) :
gotoxy(TEXT-XKOFFSET
puts(text):
1:
e x t e r nu n s i g n e ds h o r tC e l l M a p [ l ;
e x t e r nu n s i g n e ds h o r tf a rC h a n g e L i s t l C I :
#define
l d e f ine
#define
#define
#define
#define
#define
%define
ddefi ne
#define
#define
#define
LEFT
RIGHT
UP
DOWN
UPPERLEFT
UPPERRIGHT
LOWERLEFT
LOWERRIGHT
WRAPLEFT
WRAPRIGHT
WRAPUP
WRAPOOWN
(-2)
(+2 1
(WIDTH * LEFT)
(WIDTH * RIGHT)
( U P + LEFT)
( U P + RIGHT)
(DOWN + LEFT)
(DOWN + RIGHT)
(RIGHT * (WIDTH - 1))
(LEFT * (WIDTH - 1 ) )
(DOWN * (HEIGHT - 1 ) )
(UP
* (HEIGHT - 1 ) )
363
edited Davids text a bit, and addedmy own comments in square brackets, so blame
me forany errors.)
Each three cells in the life grid are packed into two bytes, as shownin Figure 18.1.
So, it is convenient if the width of the cell array is an even multiple of three. Theres
nothing in the algorithm that prevents it from supporting
any arbitrary size, but the
code is a bit simpler this way. So if you wanta 200x200 grid, I recommend justusing
a 201x200 grid, and be happy with the extra free column.
Otherwise the edgewrapping codegets more complex.
Since every cellhas from zeroto eight neighbors,you may be wondering how I can
manage to keep track ofthem with onlythree bits. Each cell really has only
a maximum
of sevenneighbors since we only need to keep track of neighbors uutsde of the current
cell word. That is, if cell B changes state then we dont need to reflect this in the
neighbor countsof cellsA and C.Updating is made a little faster. [In otherwords,
when David picks up a word representing three cells, each of the three cells has at
least one of the othercells in that word asa neighbor, and thestate of that neighbor
is stored rightin that word, as shown in Figure 18.1. Therefore, the neighbor count
E A B C a b c X X X Y Y Y Z Z Z
-
B i t 1 1 1 1 1 1 9 8 7 6 5 4 3 2 1 0
5 4 3 2 1 0
XXX : theneighborcountforcell
YYY : theneighborcountforcell
ZZZ : theneighborcountforcell
B
C
364
Chapter 18
for a given cell never needs to reflect more than seven neighbors, because at least
one of the eight neighborsstates is already encoded in the word.]
The basicidea is to maintain a changelist. This is an array of pointers into thecell
array. Each change list element pointsto a word which changes in the next generation. This way we dont have to waste time scanning every cell since most of them do
not change. Two passes are made through the change
list. The first pass updates the
cell display on thescreen, sets the life/death status of each cell for this new generation, and updates the neighbor counts
for the adjacent cells. There are some
efficiencies gained by using cell triplets rather than individual cells since we usually
dont need to set all eight neighbors. [Again, the neighbor counts for cells in the
same word are implied by the states of those cells.] The second pass sets the nextgeneration states for the cells and their neighbors, and in the process builds the
change list for the next generation.
Processing each word is a little complex but very fast. A 64K block of code exists
with routines on each 256-byte boundary. Generally speaking, the entry point corresponds to the high byte ofthe cell word.This byte contains the life/death values and
a bit to indicate if this is an edge condition. During the first pass we take the cell
triplet word, AND it with OXFEOO, and jumpto that address. During the second pass
we take the cell triplet word, AND it with OxFE00, OR it with 0x0100, and jumpto
that address. [Therefore, there are 128 possiblejump targets on the first pass, and
128 more on thesecond, all on 256-byte boundaries and all keyed offthe high 7 bits
of the cell triplet state; because bit 8 of the jumpindex is 0 on thefirst pass and 1 on
the second, there is no conflict. The lower bit isnt needed for other purposes because only the edge flag bit and thesix life/death state bits matter forjumping into
Davids state machine. The other nine
bits, the bits usedfor the neighbor counts, are
used only in the next step.]
Determining which changes must be made to a cell triplet is easy and surprisingly
quick. Theres no counting! Instead, I use a 64K lookup table indexed by the cell
triplet itself. The value of the lookuptable entry is equal to what the high byte should
be in the next generation. If this value is equal to the current high byte, then no
changes are necessary to the cell. Otherwise it is placed in the change list. Look at
the code in the Test() and Fix() functions to see how this is done. [This step is as
important as it is obscure. David has a 64K table organized so that if you use a word
describing a cell triplet as a lookup index, thebyte you will read will be the state of
the high byte for the next generation. Inother words, Davidstable is constructed so
that the edgeflag bit, the life/deathstates, and the three neighbor count
fields form
an index to a byte describing the next generation state for that triplet. In practice,
only the next generation field of the cell changes. Then, if another change to a
nearby cell tries to nudge that cell into changing again, Davids code sees that the
desired state is already set, and does not add thatcell to the change list again.]
365
These suffice to select the proper, minimum-work code to process the nextcell tr ip
let thathas changed, andall potentially affected neighbors. For all the size of Davids
code, it has an astonishing economy of effort, as execution glides through the change
list without a wasted instruction.
366
Chapter 18
Previous
Home
Next
Alas, I donthave the room todiscuss Peter Klerings equallyremarkable Life implementation here. Ill close this chapter with a quote fromTerje Mathisen, one of the
finest optimizers it has everbeen my pleasure to meet,
who, after looking over Davids
and Peters entries, said,This has been an eye-opening experience for me. I honestly thought I had the fastest possible approach. TANSTATFC.
There Aint No Such Thing As the Fastest Code.
367
Previous
chapter 19
Home
Next
371
startling results. Nonetheless, the 486 was still too simple to mark a return to the
golden age of optimization.
372
Chapter 19
My first thought upon hearingof the Pentiums dual pipeswas to wonder how often
the prefetch queue stalls for lack of instruction bytes, given that the demand forinonly
struction bytes can be twice that of the 486. The answer is:rarely indeed, and then
because the codeis not in the internalcache. The 486 has a single 8K cache thatstores
both code and data, and prefetching can stall if data fetching doesnt allow time for
prefetching to occur (although this rarely happens in practice).
The Pentiurn,on the other hand, has
two separate 8K caches, onefor code and one
1 for data, so codepreftches can never collide with datafetches; theprefetch queue
can stall only when the code being fetched isn t in the internal code cache.
(And yes, self-modifyingcode still works;as with allPentium changes, the dual
caches
introduce noincompatibilities with 386/486 code.) Also, because the code and data
caches are separate, code cant be driven out of the cache in a tight loop that ac486. In addition, the Pentium expands486s
the 32-byte
cesses a lot of data, unlike the
prefetch queue to128 bytes. In conjunctionwith the branchprediction feature (described next), which allows the Pentium to prefetch properly at most branches,this
larger prefetchqueue means thatthe Pentiums two pipes should be better fed than
those of any previous x86 processor.
373
lines. Unlike the 486, where there is a 3-cycle penalty for branching to an
instruction
that spans a cache line, theres no such penalty on the Pentium. Thesecond is that
the cache line size (the number of bytes fetched from the external cache or main
memory on a cachemiss) on the Pentium
is 32 bytes, twice the size ofthe 486s cache
line, so a cache miss causes a longer run of instructions to be placed in the cache
than on the486. The third is that the Pentiums external bus is twice as wide as the
486s, at 64 bits, and runs twice as fast, at 66 MHz, so the Pentium can fetch both
instruction and data bytes from the externalcache four times as fast asthe 486.
Even when the Pentium is runningflat-out with bothpipes in use, it can generally
consume only about twice asmany bytes asthe 486; so the ratio ofexternal memory
bandwidth to processing power is much improved, although real-world performance is heavily dependenton the size and speed ofthe external cache.
The upshot of all this is that at the same clock speed, with code and data that are
mostly in the internal caches, the Pentium maxes out somewhere around twice as
fast as a 486. (When the caches are missed a lot, the Pentium can get as much as
three to four times faster,due to the superiorexternal bus and bigger caches.) Most
of this wont affect how you program, butit is useful to know that you dont have to
worry about instruction fetching. Its also useful to knowthe sizes of the caches because a high cache hit rate is crucial to Pentium performance. Cache misses are
vastly slower than cache hits (anywhere from two to 50 or more times as slow, depending on the speed
of the external cache and whether theexternal cache misses
as well), and the Pentium cant use the V-pipe on code that hasnt already been
executed outof the cache at least once. This means that is
it very important to get the
working sets of critical loops to fit in the internalcaches.
One change in the Pentium that
you definitely do have to worry about is superscalar
execution. Utilization of the V-pipe can range from near zero percent to 100 percent,
depending on the code being executed, and careful rearrangement of code can
have amazing effects. Maxing out V-pipe use is not a trivial task; Illspend all of the
next chapter discussing it so as to have time to cover it properly. In the meantime,
two good references for superscalar programming and other Pentium
information
~3:
and BopammingManual
are Intels Pentium Processor UsersManual: V o l u ~ ~Architecture
(ISBN 1-55512-195-0;Intel order number241430-OOl), and the article Optimizing
Pentium Codeby Mike Schmidt, in DXDobbsJoumal for January 1994.
Cache Organization
There aretwo other interesting changes inthe Pentiums cache organization. First,
the cache is two-way set-associative, whereas the 486 is four-way set-associative. The
details of this dont matter, but simply put, this, combined with the 32-byte cache
line size, means that the Pentium has somewhat coarser granularity in both space
and time than the 486 in terms of packing bytes into the cache, although the total
374
Chapter 19
cache space is now bigger. Theres nothingyou can do aboutthis, but itmay make it
a little harder to get a loopsworking set into the cache. Second, the internal cache
can now be configured (by the BIOS or OS; you wont have to worry about it) for
write-back rather than write-through operation. This meansthat writes to the internal data cache dont necessarily get propagated to the external bus until other
demands for cachespace force the data out
of the cache,making repeated writes to
memory variables such as loop counters cheaper on average than on the 486, although not as cheap as registers.
As a final note on Pentium architecture for this chapter, the pipeline stalls (what
Intel calls AGIs,for Address Generation Interlocks) that I discussed earlier in this book
theyre there in spadeson
(see Chapter 12) are still present in the Pentium. In fact,
the Pentium; the two pipelines mean that an AGI can now slow down execution of
an instruction thats three instructions away from theAGI (because four instructions
can execute in two cycles). So, for example, the code sequence
add
mov
add
mov
edx.4
ecx.[ebxl
ebx.4
[edxl.ecx
;U-pipe
;V-pipe
;U-pipe
;V-pipe
cycle
cycle
cycle
cycle
1
2
; due t o A G I
; ( w o u l d have been
; V - p i p e cycle 2 )
takes three cycles rather than thetwo cycles it should take,because EDX was modified on cycle 1 and an attemptwas made to use it on cycle two, before the AGI had
time to clear-even though there aretwo instructions between the instructions that
are actually involved in the AGI. Rearranging the code like
mov
add
mov
add
ecx.[ebxl
ebx.4
[edx+4].ecx
edx.4
;U-pipe
;V-pipe
:U-pipe
;V-pipe
cycle
cycle
cycle
cycle
1
2
2
makes it functionally identical, but cuts the cycles to 2-a 50 percent improvement.
Clearly, avoiding AGIs becomes a muchmore challenging and rewarding game in a
superscalar world, one to which Ill return in the next chapter.
Old Song
375
which can be multipliedby one, two,four, or eight, plus a constant value, and can store
the result in any register-allin one cycle, apart from AGIs. Not only
that, but as wellsee
through
in the nextchapter, LEA can go through eitherpipe, whereas SHL can only go
the U-pipe, so LEA is often a superior choice for multiplication by three, four, five,
eight, or nine. (ADD is the best choice for multiplication by two.)If you use LEA for
arithmetic, do rememberthat unlike ADD and SHL, it doesntmodifjr any flags.
As on the 486, memory operands should notcross any more alignment boundaries
than absolutely necessary. Wordoperands shouldbe word-aligned, dword operands
should be dword-aligned, and qword operands (double-precision variables) should
be qword-aligned. Spanning a dwordboundary, as in
mov ebx.3
mov eax.[ebxl
In fact, given that most jump targets aren t in performance-critical code, its hard
to make a compelling argumentfor aligning branch targets even on the 486. I i l
say that no alignment (except possiblywhere you know a branch target lies in a
key loop), or at most dword alignment f o r the 386) isplenq, and can shrink code
size considerably.
Instruction prefixes are awfully expensive; avoid them if you can. (These include
size
and addressing prefixes, segment ovemdes, LOCK, and the OFH prefixes that extend
the instruction set with instructions such as MOVSX. The exceptions are conditional
jumps, a fast special case.) At a minimum, prefix
a
byte generally takes an extra cycle
and shuts down the V-pipe for that cycle, effectively costing as much as two normal
instructions (although prefix cyclescan overlapwith previous multicycle instructions,or
AGIs, ason the486). This means thatusing 32-bit addressing or 32-bit operands in a
16-bit segment, or vice versa, makesfor bigger code thats significantly slower.So, for
example, you should generally avoid 16-bit variables (shorts, in C) in 32-bit code,
although if using 32-bit variables where theyre not needed makes your data space
get a lotbigger, you maywant to stick withshorts, especially since longs use the cache
less efficiently than shorts. The trade-off depends on the amountof data and the
number of instructions that reference that data. (eight-bit variables, such as chars,
have no extra overhead and can be used freely, although they may be less desirable
than longs for compilers that tend to promote variables to longs when performing
calculations.) Likewise, you should if possible avoid putting data in the code segment andreferring to it with a CS: prefix, or otherwise using segment overrides.
376
Chapter 19
LOCK is a particularly costly instruction, especially on multiprocessor machines, because it locks the busand requires that thehardware be brought into asynchronized
state. The cost varies depending on the
processor and system, but LOCK can make an
INC [ r n e m ] instruction (which normally takes 3 cycles) 5 , 10, or more cycles slower.
Most programmers will never use LOCK on purpose-its primarily an operating system instruction-but
theres a hidden gotcha here because the XCHG instruction
always locks the bus when used with a memory operand.
As with the 486, dont use ENTER or LEAVE, which are slower than the equivalent
discrete instructions.Also, start using TEST reg,reginstead of AND ngregor OR regreg
to test whether aregister is zero. The reason,as wellsee in Chapter21, is that TEST,
unlike AND and OR, never modifies the target register. Although in this particular
case AND and OR dont modify the target register either, the Pentium has
no way of
knowing that aheadof time, so if AND or OR goes through theU-pipe, the Pentium
may have to shutdown the V-pipe for acycle to avoid potential dependencies on the
result of the AND or OR. TEST suffers from no such potential dependencies.
Branch Prediction
One brand-spanking-new feature of the Pentium is hunch prediction, whereby the
Pentium tries to guess, based on past history, which way (or, for conditional jumps,
whether or not),
your codewill jump at eachbranch, andprefetches alongthe likelier path. If the guess is correct, the branch or fall-through takes only 1 cycle:!
cycles less than a branch and the same as a fall-through on the 486; if the guess is
wrong, the branch or fall-through takes 4 or 5 cycles (if it executes in the U- or Vpipe, respectively)-1 or 2 cycles more than a branch and3 or 4 cycles more than a
fall-through on the 486.
Branch prediction is unprecedented in the x86, and fundamentally alters the nature ofpedal-to-the-metal optimization,for the simple reason that it renders unrolled
loops largely obsolete. Rare indeed is the loop that can t afford to spare even 1 or
0 (yes, zero!) cycles per iteration for loop counting, and that j. how low the cost
can go for maintaining a loop on the Pentium.
Also, unrolled loops are bigger than normal loops, so there are extra (and expensive) cache misses the first time through the loopif the entire loopisnt already in
Pentium: Not the Same
the cache; then, too, an unrolled loopwill shoulder other code outof the internal
and external caches. If in a critical loop you absolutely need the time taken by the
loop control instructions, or if youneed an extra register that can be freed by unrolling
a loop, then by all meansunroll the loop. Dont expect the sort of speed-up you get from
this on the486 or especially the 386, though, andwatch out for the cache
effects.
You may wellwonder exactly w h the Pentium correctly predictsbranching. Alas, this is
one area that Intel has declined to document, beyond saying that you should endeavor
to fall through branches whenyou have a choice. Thats good advice on every other
x86 processor, anyway, so its well worth following. Also, its a pretty safe bet thatin a
tight loop, the Pentium will start guessing the right branch direction at the bottom
of the loop pretty quickly, so you can treat loop branchesas one-cycleinstructions.
Its an equally safe bet thatits a badmove to have in a loop a conditional
branch that
goes both ways on a random basis; its hard to see how the Pentium could consistently predict such branches correctly, and mispredicted branches are moreexpensive
than they might appear to be. Not only does a mispredicted branch take 4 or 5
cycles, but the Pentium can potentially execute as many as 8 or 10 instructions in
that time-3 times as many as the 486 can execute during its branch time-so correct branch prediction (or eliminating branch instructions, if possible) is very
important in inner loops. Note that on the486 you can count ona branch to take 1
cycle when itfalls through, but on the Pentium
you cant be sure whether will
it take
1 or either 4 or5 cycles on any given iteration.
As things currently stand, branch prediction is an annoyance for assembly language optimization because its impossible to be certain exactly how code will
perform until you measure it, and even then it j. drflcult to be sure exactly where
the cycles went.All I can say is try to fall through branchesifpossible, and try to
be consistent in your branching ifnot.
378
Chapter 19
Previous
Home
Next
would double the sizes of these discussions and make them hard to follow. In general, Id recommend reserving Pentium optimization for your most critical code,
and even there, its a good ideato have at least two code paths, one for the 386 and
one for the 486/Pentium.
Its also a good idea time
to your code on a 486 before and
after Pentium-optimizingit, to make sure you havent hurt performance on
what will
be, after all, by far the most important processor over the next coupleof years.
With that in mind, is optimizing for the Pentium even worthwhile today? That depends onyour application and its market-but if youwant absolutely the best possible
performance for your DOS and Windows apps on the fastest hardware, Pentium
optimization can make your code scream.
Going Superscalar
In thenext chapter, well lookinto thesingle biggestelement of Pentium performance,
cranking up the Pentiums second execution pipe. This is the area in which comtwo thoughts apparently being
piler technologyis most touted for the Pentium, the
is an
that (1) most existing code is in C, so recompiling to use the second pipe better
automatic win, and (2) its so complicated to optimize Pentium code that only a
compiler can do it well. The first point is a reasonable one, but it doessuffer from
one flaw for large programs, in that Pentium-optimized codeis larger than 486- or
386-optimizedcode, forreasons that will become apparent in the next chapter.
Larger
code meansmore cache misses and morepage faults; and while most ofthe code in
any program is not critical to performance,compilers optimize code indiscriminately.
The result is that Pentium compiler optimization
not only expands code, but can be
less beneficial than expected oreven slower in some cases. What makesmore sense
is enabling Pentium optimization
only for key code. Betteryet, you could hand-tune
the most important code-and yes, you can absolutely do a better jobwith a small,
critical loop than any PC compiler Ive ever seen, or expect tosee. Sure,you keep
hearing how great each new compiler generation is, and compilers certainly have
improved; but they play by the same rules we do, andwere more flexible and know
more aboutwhat were doing-and now we have the wonderfully complex and powerful Pentium uponwhich to loose our carbon-based optimizers.
379
Previous
chapter 20
pentium rules
Home
Next
1%
8p
384
Chapter 20
have handled, as shown in Figure 20.1. What the Pentium does, pure and simple, is
execute a single instruction streamand, whenever possible, takethe next two waiting
instructions and execute both at once, ratherthan one after the other.
The U-pipe is the more capable of the two pipes, able to execute any instruction in
the Pentium's instruction set. (A number of instructions actually use both pipes at
once. Logically, though, you can think of such instructions as U-pipe instructions,
and of the Pentiumoptimization model as one in which the U-pipe isable to execute
all instructions and is always active, with the objective being to keep the V-pipe also
working asmuch of the time as possible.)The U-pipe is generally similar toa full 486
in terms of both capabilities and instruction cycle counts. The V-pipe isa 486 subset,
able to execute simple instructions such as MOV and ADD, but unable to handle
MUL, DIV, string instructions, any sort of rotation or shift, or even ADC or SBB.
-11
Instruction Stream
PUSH EBX
DECEDX
SHR EDX,1
Cycle 2
Writebeforeread
-Idtecontention on EDX
Figure 20.1
PentiumRules
385
The use of both pipes does make MOVSD nearly twice asfast on the Pentium as on
the 486, but it4 nonetheless slower than using equivalent simpler instructions that
allow for superscalar execution. Stick to the Pentium 4 RISC-like instructionsthe pairable instructions Ill discuss next-when youre seeking maximum
performance, with just a few exceptions such as REP MOVS and REP STOS.
Trickier yet, register contention can shut down the V-pipe on any given cycle, and
Address Generation Interlocks (AGIs) can stall either pipe at
any time, as wellsee in
the next chapter.
The key to Pentium optimization is to view execution as a stream of instructions
going through theU- and V-pipes, and to eliminate, as much as possible,instruction
mixes that take the V-pipe out of action. In practice,this is not too difficult. The only
hard partis keeping in mind the long
list ofrules governing instruction pairing. The
place to begin is with the set of instructions that can go through the V-pipe.
V-Pipe-Capable Instructions
Any instruction can go through the U-pipe, and, for practical purposes, the U-pipe
is always executing instructions. (The exceptions are when the U-pipe execution
unit is waiting for instruction or data bytes after a cache miss, and when a U-pipe
instruction finishes before a paired V-pipe instruction, as Ill discuss below.) Only
the instructions shown in Table 20.1 can go through the V-pipe. In addition, the Vpipe can execute a separate instructiononly when one of the instructions listed in
Table 20.2 is executing in the U-pipe; superscalar execution is not possible while any
instruction not listed in Table 20.2 is executing in the U-pipe. So, for example,if you
use SHR EDX,CL, which takes4 cycles toexecute, no other instructions can execute
you use SHR EDX,10, it will take 1 cycle
during those 4 cycles; if, on the other hand,
to execute in the U-pipe, and another instruction can potentially execute concurrently in the V-pipe. (As you can see, similar instruction sequences can have vastly
different performancecharacteristics on the Pentium.)
Basically, after the currentinstruction or pair of instructions is finished (that is, once
neither the U- nor V-pipe is executing anything), the Pentium sends the next instruction
through the U-pipe. If the instruction after the one in the U-pipe is an instruction
the V-pipe can handle, if the instruction in the U-pipe is pairable, and if register
contention doesntoccur, then theV-pipe starts executing that instruction,as shown
in Figure 20.2. Otherwise, the second instruction waits until the first instruction is
386
Chapter 20
done, thenexecutes in the U-pipe, possibly pairing with the next instruction in line
if all pairing conditions are met.
The list of instructions the V-pipe can handle is not very long, and the list of U-pipe
pairable instructions is not much longer, but these actually constitute the bulk of the
instructions used inPC software. As a result, a fair amount of pairing happens even in
normal, non-Pentium-optimized code. This fact, plus the 64bit 66 MHz bus, branch
prediction, dual 8Kinternalcaches, and otherPentium features, together mean thata
Pentium is considerably faster
than a 486 at the same clockspeed, even without Pentiumspecific optimization,contrary to some reports.
Besides, almost all operations can be performed by combinations of pairable instructions. For example, PUSH [mem] is not on eitherlist, but both MOV reg,[mem]
and PUSH reg are, andthose two instructions can be used to push a value stored in
Pentium Rules
387
memory. In fact, given the proper instruction stream, the discrete instructions can
perform this operation effectively in just 1 cycle (taking one-half of each of 2 cycles,
for 2*0.5 = 1 cycle total execution time), as shown in Figure 20.3-a full cycle faster
than PUSH [mem],which takes 2 cycles.
388
A fundamental rule of Pentium optimization is that itpays to break complexinstructions into equivalent simple instructions, then shufle the simple instructions
for maximum use of the Vpipe. This is true partly because most of the pairable
instructions are simpleinstructions, andpartly because breakinginstructions into
pieces allows morefreedom torearrange code to avoid the
AGIs and register contention Ill discuss in the next chapter.
Chapter 20
Instruction Stream
PUSH EBX
!
Pentium
Rules
389
The single complex instruction takes 3 cycles and is 6 bytes long; with proper sequencing, interleaving the simple instructions with other instructions that dontuse
EDX or MemVar, the three-instruction sequence can be reduced to 1.5 cycles, but it
is 14 bytes long.
Its not unusual for Pentium optimization to approximately double both performance and code size at the same time. In an important loop, go for performance
and ignore the size, but on a program-widebasis, the size bears watching.
Lockstep Execution
You may wonder why anyone would bother breakingADD [MemVar],EAX into three
instructions, given that this instruction can go through either pipewith equal ease.
The answer isthat while the memory-accessing instructions other thanMOV, PUSH,
and POP listed in Table 20.1 (that is, INC/DEC [mem],ADD/SUB/XOR/AND/
OR/CMP/ADC/SBBreg,[mem],and ADD/SUB/XOR/AND/OR/CMP/ADC/SBB
[mem],reg/imrned)
can be paired, they do not provide the 100 percent overlap that
we seek. If youlook at Tables 20.1 and 20.2, you willsee that instructionstaking from
1 to 3 cycles can pair. However, anypair of instructions goes through thetwo pipes in
lockstep. This means, for example, that ifADD [EBX],EDXis going through the U-pipe,
and INC EAX is going through the
V-pipe, the V-pipe willbe idle for 2 of the 3 cycles
that the U-pipe takes to execute its instruction, as shown in Figure 20.4. Out of the
theoretical 6 cycles of work that can be done duringthis time, we actually get only 4
cycles of work, or 67 percent utilization. Even though these instructions pair, then,
this sequence fails to make maximum use of the Pentiums horsepower.
The key here is that when two instructions pair, both execution units are tied up
until both instructions have finished (which means at least for the amount of time
required for the longer of the two to execute, plus possibly some extra cycles for
pairable instructions that cant fully overlap, as described below). The logical conclusion would seem
to be that we should strive topair instructions of the same lengths,
but thatis often not correct.
p
390
Chapter 20
Instruction Stream
INC E A X
Instruction execution in the two pipes
U-pipe
ADD [EBX],EDX
Ste 1 lood [EBX]
io; memory
Cycle
V-pipe
-1dleKeep
pipes
in
lockstep
Step 3: storeSte 2
result to [EBXP
391
; Bank 0 Bank 1
~
Cache
line 0
Cache
line 1
Cache
line 2
Bank 2
; Bank 3
Cache
line 255
12
; Bank 5
Bank 6
Bank 7
0
.
Bank4
;
8
16
20
24
28
Figure 20.5
bits 2, 3, and 4 (fall in the same bank) in tight loops, and you should also avoid
sequences like
mov
mov
bl , [esi 1
bh, [esi+ll
bl ,[esi 1
edi ,edx
bh.[esi+ll
By the way, the reason a code sequence that takes two instructions to load a single
word is attractive in a 32-bit segment is because it takes onlyone cycle when the two
instructions can be paired with other instructions; by contrast, the obvious way of
loading BX
mov
bx.[esil
takes 1.5 to two cycles because the size prefix cant pair, as described below. This is
yet another example of how different Pentiumoptimization can be from everything
weve learned about its predecessors.
The problem with pairing non-single-cycle instructions arises when a pipe executes
an instructionother thanMOV that has an explicit memory operand. (Ill call these
complex memory instrmctions. Theyre the only pairable instructions, other thanbranches,
that take more than one cycle.) Weve already seen that, because instructions go
through the pipes in lockstep, if one pipe executes a complex memory instruction
392
Chapter 20
such asADD FAX,[EBX] while the otherpipe executes a single-cycle instruction, the
pipe with the faster instruction will sit idle for part of the time, wasting cycles. You
might think that
if both pipes execute complex instructions of the same length, then
neither would lie idle, but that turns out to not always be the case. Two two-cycle
instructions (instructions with register destination operands) can indeed pair and
execute in two cycles, so its okayto pair two instructions such as these:
add esi.[SourceSkipl
add e d i . t D e s t i n a t i o n S k i p 1
: V - p i p ec y c l e s
Instruction Stream
AND [ECX],DL
U-pipe
- ) b e /
-Idle-
-+I
ANDEBXI At
1: b a d iEBX]
rom memorv
AND [EBX],At
Step 2: and At with
value loaded
in
Step 1
1
I
Cycle 0
1
I
Cycle 1
V-pipe
-IdleWait for U-pipe to
reach its last cycle
AND [EBX],At
result to [EBXP
393
Instruction Stream
MOV DH,[ECX]
AND DH,DL
MOV [EBX],AH
MOV [ECX],DH
+ v i
V-pipe
U-pipe
Cycle 0
-+I
MOV [EBX],AH
Cycle 2
It
MOV [ECX],DH
I
d
Superscalar Notes
You may well ask whyits necessary tointerleave operations, as isdone in Figure 20.7.
It seems simpler just to turn
and [ e b x l . a 1
394
Chapter
20
into
mov
d l ,[ebxl
and
d l .a1
mov
[ebxl , d l
and be done with it. The problem here is one of dependency. Before the Pentium
can execute AND DL&, it must first know what
is in DL, and it cant know that until
it loads DL from the address pointed to by EBX. Therefore, AND DL& cant happen until thecycle after MOV DL,[EBX] executes. Likewise,the result cant bestored
until the cycle after AND DL& has finished. This means that these instructions, as
written, cant possibly pair, so the sequence takes the same three cycles as AND
[EBX],AL. (Now itshould be clear why AND [EBX], AL takes 3 cycles.) Consequently,
its necessary to interleave these instructions with instructions that use other registers, so this set of operations can execute in one pipe while the other, unrelatedset
executes in the other pipe,as is done in Figure 20.7.
What wevejust seen is the read-after-write form of the superscalar hazard known as
register contention. Ill return to thesubject of register contention in the next chapter;
in the remainderof this chapter Idlike to covera few short items about superscalar
execution.
Register Starvation
The above examples should make
it pretty clear that effective superscalar programming
puts a lot of strain ori the Pentiums relatively small register set.There areonly seven
general-purpose registers (I strongly suggest usingEBP in criticalloops), and does
it
not help to have to sacrifice one of those registers for temporary storage on each
complex memory operation; in pre-superscalar days,we used to employthose handy
CISC memory instructions to do all that stuff without using any extra registers.
More problematicstill is thatfbrmaximum pairing, youlltypically have two operations proceeding at once, onein each pipe, andtrying to keep two operations in
registers at once is difJicult indeed. There k not much to be done about this, other
than clever and Spartan register usage, but be aware that it j . a major element of
Pentium performance programming.
Also be awarethat prefixes of everysort, with the sole exception of the OFH prefix on
non-short conditional jumps, always execute in the U-pipe, and that Intels documentation indicates that no pairing can happen while a prefix byte executes. (As Ill
discuss in the next chapter, my experiments indicate that this rule doesnt always
apply to multiple-cycle
instructions, but you still wontgo far wrong by assuming that
the above rule is correct and trying to eliminate prefix bytes.) A prefix byte takesone
cycle to execute;after that cycle, the actual prefixed instruction itselfwill go through
the U-pipe, and if it and the following instruction are mutually pairable, then they
Pentium Rules
395
Previous
Home
will pair. Nonetheless, prefix bytes are very expensive, effectively taking at least as
long as two normal instructions, and possibly, if a prefixed instruction could otherwise have paired in theV-pipe with the previous instruction, taking as long as three
normal instructions, as shown in Figure 20.8.
Finally, bear in mind thatif the instructions being executedhave not already been
executed at least once since they were loaded into the internalcache, they can pair
only if the first (U-pipe) instructionis not only pairable but also exactly1 byte long,
a category that includes
only INC reg, DEC reg, PUSH reg, and POP reg. Knowing this
can helpyou understand why sometimes, timingreveals that your code runs slower
than itseems it should, althoughthis will generally occur only when the cacheworking set for the code
youre timing is on the order
of 8K or more-an awful lot of code
to try to optimize.
It should be excruciatingly clearby this point thatyou must time yourPentiumaptimized
code if youre to have anyhope of knowing if your optimizations are working as well
as youthink they are; there are just many
too details involved for you to be sure your
optimizations are working properly without checking. My most basic optimization
rule has always been to grab theZen timer and measure actual performance-and nowhere is this more true than on the Pentium.
Dont believe it untilyou measure it!
Instruction Stream
ipe
U-pipe
PUSH EDX
Cycle
1
I
Prefix delays.
Figure 20.8
396
Chapter 20
Prefixes
-Idle-cant
execute in V-pipe
I
L
Next
Previous
chapter 21
unleashing the pentium's V-pipe
Home
Next
Ch
&
rill
399
optimization because the Pentium is perhaps the most complex (and rewarding)
chip to optimize for that Ive ever seen.
In thelast chapter, we explored thedual-execution-pipe nature of the Pentium, and
learned which instructions couldpair (execute simultaneously) in which pipes. Now
were ready to look
at AGIs and register contention-two hazards that can prevent otherwise properly written code from taking full advantage ofthe Pentiurns two pipes, and
can thereby keep your code from pushing the Pentium
to maximum performance.
and thevalue of the register is not set farenough aheadfor the Pentiumto perform
the addressing calculations before the instruction needs the address, then the Pentium
will stall the pipe in which the instruction is executing untilthe value becomes available and the addressing calculations have been performed. Remember, also, that
instructions execute in lockstep on the Pentium, so if one pipe stalls for a cycle,
making its instruction take one cycle longer, that extendsby one cycle the time until
the otherpipe can begin its next instruction, as well.
The rule for AGIs is simple: If you modify any part of a register during a cycle, you
cannot use that register to address memory during either that
cycle or the next
cycle.
If you tryto do this, the Pentiumwill simply stallthe instruction that tries to use that
register to address memory untiltwo cycles after the register was modified. This was
true on the 486 as well, but the Pentiums new twist is that since more than one
instruction can execute in a single cycle, an AGI can stall an instruction thats as
many as three instructions away from the changing of the addressing register, as
shown in Figure 21.1, and an AGI can also cause a stall that costs as many as three
instructions, as shown in Figure 21.2. This means thatAGIs are both mucheasier to
cause and potentially more expensive than on the486, and you must keep a sharp
eye out for them. It also means that its often worth calculating a memory pointer
several instructions ahead of its actual use. Unfortunately, this tends to extend the
lifetimes of pointer registers to span a greater numberof instructions, making the
Pentiurns relatively smallregister set seem even smaller.
400
Chapter 21
Instruction Stream
II
lockstep
-Idleexecution
Cycle 1
AGI (EGl:kiified
previous cycle]
on
: U - p i p ec y c l e
: V - p i p ec y c l e
: U - p i p ec y c l e
: V - p i p ec y c l e
: U - p i p ec y c l e
: V - p i p ec y c l e
: U - p i p ec y c l e
: V - p i p ec y c l e
1
1
2
2
3 AGI stall
3 l o c k s t e pi d l e
4 mov e a x . [ e b x ]
4 mov e d x , [ e b p - 8 ]
This commonplace codeloses a U-pipe cycle to the AGI caused by AND EBX,EBX,
followed by the attempt two instructions later touse EBX to point tomemory. The
code loses a V-pipe cycleas well, because lockstep execution won't let the next
V-pipe
Unleashing thePentium's V-pipe
401
Instruction Stream
ADDEBX,EDX
"_
DECEAX
PUSH EBX
+I
U-pipe
MOV ESI,[Ptr]
Cycle
V-pipe
Register contention
-Idleon ESI
instruction execute until the paired U-pipe instruction that suffered the AGI finishes. The solution is to use TEST EBX,EBX instead of AND; TEST can't modify
EBX, so no AGI occurs. Sure, AND EBX,EBX doesn't modify EBX either, but the
Pentium doesn't know that, so it has to insertthe AGI.
As on the486, you should keep a careful
eye out forAGIs involving the stack pointer.
Implicit modifiers of ESP, such as PUSH and POP, are special-cased so you don't
have to worry about AGIs. However, if youexplicitly modify ESP withthis instruction
sub esp.100h
402
Chapter
21
you can then getAGIs if you attempt to use ESP to address memory, either explicitly
with instructions like this one
mov
eax.[esp+20h]
or via PUSH, POP, or other instructions that implicitly use ESP as an addressing
register.
On the 486, any instruction that had both a constant value and an addressing displacement, such as
mov dword p t r [ebp+16].1
suffered a 1-cycle penalty, taking a total of 2 cycles. Such instructions take only one
cycle on the Pentium, but
they cannot pair, so theyre still the mostexpensive sort of
MOV. Knowing this can speedup something as simple as zeroing two memory variables, as in
;U-pipe
eax.eax
sub
mov [MemVarll.eax
;U-pipe
mov CMemVar2l.eax
1
; a n yV - p i p ep a i r a b l e
; i n s t r u c t i o nc a n
go h e r e ,
; o r SUB c o u l db ei nV - p i p e
2
;V-pipe 2
which should never be slower and should potentially be 0.5 cycles faster, and six
bytes smaller than this sequence:
mov CMemVarll.0
:U-pipe
mov [MemVarEl.O
:U-pipe
1
2
Register Contention
Finally, we come to the last major component of superscalar optimization: register
contention. Thebasic premise here is simple: You cant use the same register in two
inherently sequentialways in a single cycle. For example, you cant execute
i en:acUx- p i cp yec l e
: V - p i p ei d l ec y c l e
1
: due t o dependency
2
a n de b x . e a x; U - p i p ec y c l e
in a single cycle;AND EBX,EAX cant execute untilthe value in EAX is known, and
that cant happen untilINC EAX is done. Consequently, the V-pipe idles while INC
Unleashing the Pentiums V-pipe
403
EAX executes in the U-pipe. We saw this in the last chapter when we discussed splitting instructions into simple instructions, and it is by far the most common sortof
register contention, known as read-after-write register contention. Read-after-write
register contention is the primary reason we have to interleave independent operations in order to get maximumV-pipe usage.
The other sort of register contention is known as write-after-write. Write-after-write
register contention happenswhen two instructions try to write to the same register
on the same cycle. While that may not seem like a particularly useful operation in
general, it can happen when subregisters are beingset, as in the following
1
; V - p i p ei d l ec y c l e
1
; due t o r e g i s t e r c o n t e n t i o n
mov a l , [ V a; Ur l- p i cp yec l e
2
seuabx . e; Ua -xpci py ec l e
where an attemptis made to set both EAX and its AL subregister on thesame cycle.
Write-after-write contention implies that the two instructions comprising the above
substitute for MOVZX should have at least one unrelatedinstruction between them
when SUB EAX,EAX executes in the V-pipe.
1
1
eax,[MemVar]
p u s he s i
pusheax
p u s he d i
pushebx
c a lFl o o T i l d e
; U - p i p ec y c l e
; V - p i p ec y c l e
; U - p i p ec y c l e
; V - p i p ec y c l e
; U - p i p ec y c l e
; V - p i p ec y c l e
1
1
2
2
3
3
But in fact, all the instructions pair, even though ESP is modified five times in the
space of six instructions.
The final register-contention special case isboth remarkable andremarkably important. There is exactlyone sort of instruction that can pair only in the V-pipe: branches.
404
Chapter 21
; U - p i p ec y c l e
; V - p i p ec y c l e
; U - p i p ec y c l e
; V - p i p ec y c l e
1
1
Branches cant pairin the U-pipe; a branch that executes in the U-pipe runs alone,
with the V-pipe idle. If a call orjump is correctly predicted by the Pentiums branch
prediction circuitry (as discussed in the last chapter), it executes in a single cycle,
pairing if it runs in the
V-pipe; if mispredicted, conditionaljumps take 4 cycles in the
U-pipe and 5 cycles in the V-pipe, and mispredicted calls and unconditional jumps
take 3 cycles in either pipe. Note thatRET cant pair.
Whos in First?
One of the trickiest things about superscalar optimizationis that agiven instruction
stream can executeat a different speed depending on the pipe where
startsitexecution, because which instruction goes through the U-pipe first determines which of
the following instructions will be able to pair. If we take the last example and add
one more instruction, the other instructions will go through different pipes than
previously, and cause the loopas a whole to take 50 percent longer, even though we
only added 25 percent morecycles:
LoopTop:
i n c edx
rnov [ e s i ] . e a x
add esi .4
dececx
j n z LoopTop
1
; & p i p ec y c l e
; V - p i p ec y c l e 1
; U - p i p ec y c l e 2
; V - p i p ec y c l e 2
; U - p i p ec y c l e 3
; V - p i p ei d l ec y c l e
3
; because JNZ c a n t
; p a i r i n the U-pipe
Its actually not hard to figure out which instructions go through which pipes; just
back up until you find an instruction that cant pair or can only go through theU-pipe,
and work forward from there,given the knowledge that that instruction executes in
the U-pipe. The easiest thing to look for is branches. All branch target instructions
execute in the U-pipe, as do all instructions after conditional branches that fall
through. Instructions with prefix bytes are generally good U-pipe markers, although
theyre expensive instructions that should be avoided whenever possible, and have
at least one aberrationwith regard to pipeusage, as discussed below. Shifts, rotates,
ADC, SBB, and all other instructions not listed in Table 20.1 in the last chapter are
likewise U-pipe markers.
405
Pentium Optimization
Action
Now, lets takea look at oneof the simplest, tightest pieces of code imaginable, and
see what our new Pentium perspective reveals. Listing21.1 shows a loop implementing the TCP/IP checksum, a 16-bit checksum that wraps carries around to the low
bit so that the result is endian-independent. This makes it easy to perform checksums
on blocks of data regardless of the endiancharacteristics of the machines on which
those blocks are generated andreceived. (Thanks to fellow performance enthusiast
Terje Mathisen for suggesting this checksum as fertile ground for Pentiumoptimization, in the ibm.pc/fast.code forum on Bix.) The loop in Listing 21.1 consists of
exactly fiveinstructions; its hard to imagine that theres a lotof performance to be
wrung from this snippet, right?
LISTING 2 1.1
121 1.ASM
: C a l c u l a t e sT C P / I P( 1 6 - b i tc a r r y - w r a p p i n g )c h e c k s u mf o rb u f f e r
: s t a r t i n ga tE S I ,o fl e n g t h
: Returnschecksum i n AX.
E C X words.
: ECX andESIdestroyed.
: A
l c y c l ec o u n t s
assume 3 2 - b i t p r o t e c t e d
mode.
: Assumes b u f f e rl e n g t h > 0 .
: N o t et h a tt i m i n gi n d i c a t e st h a tt h ep i p es e q u e n c ea n d
: c y c l ec o u n t s shown(basedondocumentedexecutionrules)
: d i f f e rf r o mt h ea c t u a le x e c u t i o ns e q u e n c ea n dc y c l ec o u n t s :
: t h i sl o o ph a sb e e nm e a s u r e dt oe x e c u t ei n
5 c y c l e s :a p p a r e n t l y ,
: t h e1 s th a l fo f
ADD somehow p a i r s w i t h t h e p r e f i x b y t e , o r t h e
: r e f i xb y t eg e t se x e c u t e da h e a do ft i m e .
c h e:ci nkt haistexui a.mal ixszueb
ckloop:
[aaedxsd,i
; c ay cx l.ae0d c
easdi d
ecx dec
c k l o o pj n z
.2
1
1
2
2
3
3
4
:cycle 4
:cycle 5
;cycle 5
;cycle 6
:cycle 6
:cycle
:cycle
:cycle
:cycle
:cycle
:cycle
U - p i pp ree bf iyxt e
V - p i p ei d l e( n op a i r i n gw / p r e f i x )
ADD
U - p i p e1 s th a l fo f
V - p i p ei d l e( r e g i s t e rc o n t e n t i o n )
ADD
U - p i p e2 n dh a l fo f
V - p i p ei d l e( r e g i s t e rc o n t e n t i o n )
U p- bpr yei ptfeei x
V - p i p ei d l e( n op a i r i n gw / p r e f i x )
U - p i p e ADC AX.0
V-pipe
U-pipe
V-pipe
Wrong, wrong, wrong! As detailed in Listing 21 . l , this loop shouldtake 6 cycles per
checksummed word in 32-bit protected mode, a ridiculously high number for the
Pentium. (Youll see why I say should take, not takes,shortly.) We should lose 2
cycles in each pipe to the two size prefixes (because the ADDS are 16-bit operations
in a 32-bit segment), and another2 cycles because of register contention that arises
when ADC A X , O has to wait for the result of ADD AX,[ESI]. Then, too, even though
DEC and JNZ can pair and the branch prediction for JNZ is presumably correct
virtually allthe time, they do take a full cycle, and maybe we can do something about
that as well.
406
Chapter 21
The first thing todo is to time the code inListing 21.1 to verify our analysis. When I
unleashed theZen timer on Listing 21.1, I found, to my surprise, that the codeactually takes only five cycles per checksum word processed, not six. A little more
experimentation revealed that addinga size prefix to the two-cycle ADD EAX,[ESI]
instruction doesntcost anything, certainly not the onefull cycle in each pipe that a
prefix is supposed to take. More experimentation showed that prefix bytes do cost
the documented extra
cycle whenused with one -cycle instructions such as MOV. At
this point, my preliminary conclusionis that prefixes can pair with the first cycle of
at least some multiple-cycle instructions. Determiningexactly why this happens will
take further research on my part, but the
most important conclusion is that you must
measure your code!
The first, obvious thing we can do to Listing 21.1 is change ADC A X , O to ADC E A X , O ,
eliminating a prefixbyte and saving a full cycle. Now were down from five to four
cycles. What next?
Listing 21.2 shows one interesting alternative that doesnt really buy us anything.
Here, weve eliminated all size prefixes by doing byte-sized MOVs and ADDS, but
because the size prefix on ADD AX,[ESI], for whatever reason, didntcost anything
in Listing 21.1, our efforts are to no avail-Listing 21.2 still takes 4 cycles per
checksummed word. Whats worth noting aboutListing 21.2 is the extent towhich
the code is broken into simple instructions and reordered so as to avoid size prefixes, register contention, AGIs, and data bank conflicts (the latter because both
[ESI] and [ESI+l] are in the same cache data bank,as discussed in the last chapter).
LISTING 2 1.2 12 1-2.ASM
: C a l c u l a t e sT C P / I P( 1 6 - b i tc a r r y - w r a p p i n g )c h e c k s u mf o rb u f f e r
: s t a r t i n ga t E S I . o fl e n g t h
E C X words.
: Returnschecksum
i n AX.
: H i g hw o r do f
EAX. O X , E C X and E S I d e s t r o y e d .
: Al c y c l ec o u n t s
assume 3 2 - b i t p r o t e c t e d
> 0.
mode.
: Assumes b u f f e rl e n g t h
sub
mov
ecx
dec
jz
esi.2 add
eax,
eax
dx.[esil
c sk hl ooor pt e n d
: i n i t i a l i z e t h e checksum
: f i r s t word t o checksum
; w e l l do 1 c h e c k s u mo u t s i d et h el o o p
: o n l y 1 checksum t o do
: p o i n tt ot h en e x tw o r dt oc h e c k s u m
ckloop:
add
mov
adc
mov
adc
add
dec
j nz
ckloopend:
:checksum
ax.dx
the
add
eax.O adc
a1 . d l
d l ,[esi 1
ah.dh
dh.[esi+ll
eax.O
e s i ,2
ecx
Ckl O O D
:cycle
:cycle
:cycle
:cycle
:cycle
:cycle
:cycle
:cycle
1 U-pipe
1 V-pipe
2 U-pipe
2 V-pipe
3 U-pipe
3 V-pipe
4 U-pipe
4 V-pipe
l a s t word
407
i n AX.
: H i g hw o r do f
EAX. B XE. D XE. C X
and E S I d e s t r o y e d .
: A
l c y c l ec o u n t s assume 3 2 - b i t p r o t e c t e d mode.
: Assumes b u f f e rl e n g t h > 0 .
eax, eax
sub
edx. edx
sub
ecx, 1
shr
s h o r tc k l o o p s e t u p
jnc
mov
a x , [ e s i1
s h o r tc k l o o p d o n e
jz
add
e s i .2
ckloopsetup:
d[xe,s i
1
mov
.b[ le s i + 2 1
mov
dwords
more
:anyecx
dec
schk ol or ot p e n d
jz
: i n i t i a l i z e t h e checksum
; p r e p a r ef o rl a t e r
ORing
; w e l l 1 d ot w ow o r d sp e rl o o p
;evennumber
o fw o r d s
:dotheoddword
:no morewords t o checksum
: p o i n tt ot h en e x tw o r d
: l o amdo1osw
sft to tr od
: checksum ( l a sb ty tl eo a d eil dno o p )
t o checksum?
;no
ckloop:
mov
add
shl
bh.[esi+31
e s i ,4
ebx.16
or
mov
add
mov
adc
mov
dec
jnz
ebx, edx
d l , [ e s i1
eax.ebx
b l ,[ e s i + 2 1
eax.0
dh.[esi+ll
ecx
ckl oop
ckloopend:
mov
add
adc
adc
mov
e d x . 1 6s h r
ax.dx add
eax.O adc
ckloopdone:
408
Chapter 21
bh.[esi+31
ax.dx
ax.bx
ax.0
edx.eax
:cycle 1 U-pipe
;cycle 1 V-pipe
;cycle 2 U-pipe
: c y c l e 2 V - p i p ei d l e
: ( r e g i s t e rc o n t e n t i o n )
;cycle 3 U-pipe
; c y c l e 3 V-pipe
:cycle 4 U-pipe
:cycle 4 V-pipe
:cycle 5 U-pipe
; c y c l e 5 V-pipe
;cycle 6 U-pipe
:cycle 6 V-pipe
: c h e c k s u mt h el a s td w o r d
: c o m p r e s st h e3 2 - b i tc h e c k s u m
: i n t o a 1 6 - b i t checksum
The checksum loop in Listing 21.3 takes longer than the loop in Listing 21.2, at 6
cycles versus 4 cycles for Listing 21.2-but Listing 21.3 does two checksum operations in those 6 cycles, so weve cut the time per checksum addition from 4 to 3
cycles. You might think thatthis small an improvement doesntjustifythe additional
complexity of Listing 21.3, but it is a one-third speedup, well worth it if this is a
critical loop-and, in general, if it isnt critical, theres no point in hand-tuning it.
Thats why I havent bothered to try to optimize the non-inner-loop code in Listing
21.3; its only executed once per checksum,
so its unlikely that a cycle or two saved
there would make any real-world difference.
Listing 21.3 could be made abit faster yet with some loop unrolling,but thatwould
make the code quite bit
a more complex forrelatively little return. Instead, why not
make the code more complex and get a bigreturn?
Listing 21.4 does exactly that by
loading one dword at a time to eliminate both the word prefix of Listing 21.1 and
the multiple byte-sized accesses of Listing 21.3. An obvious drawback to this is the
considerable complexity needed to ensure that the
dword accessesare dword-aligned
(remember that unaligned dword accesses cost three cycles each), and to handle
buffer lengths that arent
dword multiples. Ive handled these problems by requiring
that the buffer be
dword-aligned and a dword multiple in length,which is ofcourse
not always the case in the real world. However, the point of these listings is to illustrate Pentium optimization-dword issues, being non-inner-loop stuff, are solvable
details that arent germaneto the main focus. In any case, the complexity and assumptions arewell justified by the performance of this code: three cycles per loop,
or 1.5 cycles per checksummed word, more than threetimes the speed of the original code. Again, note that theactual order in which the instructions are arrangedis
dictated by the various optimization hazardsof the Pentium.
LISTING 2 1.4
121-4.ASM
: C a l c u l a t e sT C P / I P( 1 6 - b i tc a r r y - w r a p p i n g )c h e c k s u mf o rb u f f e r
: s t a r t i n ga tE S I .o fl e n g t h
ECX words.
: Returnschecksum
i n AX.
E A X . E C X . E O X . and E S I d e s t r o y e d .
: Al c y c l ec o u n t s assume 3 2 - b i tp r o t e c t e d mode.
: Assumes b u f f e r s t a r t s on a dwordboundary,
i s a d w o r dm u l t i p l e
: i n l e n g t h .a n dl e n g t h
> 0.
; H i g hw o r do f
sub
ecx.1 shr
mov
easdi d
ecx dec
jz
ckloop:
eax.edx
add
mov
eax.O adc
easdi d
ecx
dec
jnz
eax.eax
edx,
Cesi
.4
cskhl o or tp e n d
edx,
Cesi
,4
ckloop
; i n i t i a l i z e t h e checksum
: w e l l d ot w ow o r d sp e rl o o p
: p r e l o a dt h ef i r s td w o r d
; p o i n tt ot h en e x td w o r d
: w e l l do 1 c h e c k s u mo u t s i d et h el o o p
: o n l y 1 checksum t o do
:cycle
;cycle
;cycle
:cycle
:cycle
;cycle
1 U-pipe
1 V-pipe
2 U-pipe
2 V-pipe
3 U-pipe
3 V-pipe
409
ckloopend:
eax.edx
add
eax.O adc
mov
edx.16 shr
ax.dx add
eax.0 adc
: c h e c k s u mt h el a s td w o r d
; c o m p r e s st h e3 2 - b i tc h e c k s u m
eax
edx,
: i n t o a1 6 - b i tc h e c k s u m
Listing 21.5 improves upon Listing 21.4 by processing 2 dwords per loop, thereby
bringing the time per checksummed word down to exactly 1 cycle. Listing 21.5 basically does nothing but unrollListing 21.4's loop one time, demonstrating that the
venerable optimization techniqueof loop unrollingstill has some life left init on the
Pentium. Thecost for this is, as usual, increased code size and complexity, and the
use of more registers.
LISTING21.5121
-5.ASM
; C a l c u l a t e sT C P / I P( 1 6 - b i tc a r r y - w r a p p i n g )c h e c k s u mf o rb u f f e r
; s t a r t i n ga t
E S I . o fl e n g t h E C X words.
: Returnschecksum
i n AX.
: H i g hw o r do f
EAX. EBX. ECX. E D X , and E S I d e s t r o y e d .
: A
l c y c l ec o u n t s assume 3 2 - b i t p r o t e c t e d mode.
; Assumes b u f f e r s t a r t s
on adwordboundary,
i s ad w o r dm u l t i p l e
; i nl e n g t h ,a n dl e n g t h
> 0.
sub
shr
jnc
mov
jz
add
noodddword:
mov
mov
dec
jz
add
eax, eax
ecx ,2
s h o r tn o o d d d w o r d
eax. [ e s i 1
s h o r tc k l o o p d o n e
e s i .4
; i n i t i a l i z e t h e checksum
: w e ' l l d ot w od w o r d sp e rl o o p
; i s t h e r e an odddword
inbuffer?
;checksumtheodddword
; n o . done
; p o i n tt ot h en e x td w o r d
edx. Cesi 1
ebx.[esi+4]
ecx
s h o r tc k l o o p e n d
e s i .8
; p r e l o a dt h ef i r s td w o r d
: p r e l o a dt h es e c o n dd w o r d
; w e ' l l do 1 c h e c k s u mo u t s i d et h el o o p
; o n l y 1 checksum t o do
; p o i n tt ot h en e x td w o r d
eax ,edx
e d x . [ e s i1
eax.ebx
ebx. [esi+41
eax, 0
e s i .8
ecx
c k l oop
;cycle 1 U-pipe
:cycle 1 V-pipe
:cycle 2 U-pipe
;cycle 2 V-pipe
; c y c l e3U - p i p e
: c y c l e3V - p i p e
; c y c l e4U - p i p e
: c y c l e4V - p i p e
ckloop:
add
mov
adc
mov
adc
add
dec
j nz
ckloopend:
add
adc
adc
ckloopdone:
mov
shr
add
adc
41 0
Chapter 21
Previous
Home
Next
41 1
Previous
chapter 22
zenning and the flexible mind
Home
Next
Ch
Lennmg
In Jeff Duntemanns excellent book Bodand PascaZFrum Square One (Random House,
1993),theres asmall assemblysubroutine thats designed to be called from a Turbo
415
Pascal program in order to fill the screen or a system-memory screen buffer with a
specified character/attribute pair in text mode. This subroutine involves only 21
instructions and works perfectly well; however, with whatwe know, we can compact
the subroutine tremendously and speed it up a bit as well. To coin a verb, we can
Zen this already-tight assembly code to an astonishing degree. In the process, I
hope youll get a feel forhow advanced your assembly skills havebecome.
Jeffs original code follows as Listing 22.1(withsome text converted to lowercase in
order to match the style of this book), but thecomments are mine.
LISTING 22.1 122OnStack
01 dBP
RetAddr
Filler
Attrib
BufSize
BufOfs
BufSeg
EndMrk
OnStack
1.ASM
PUSH
BP
fill
p r once a r
bP
callers
;save
BP
bP. SP
f r a mset a c k t o : p o i n t
word p t r C b p l . B u f S e g:.s0ktihpe
fill i f a n u l l
Start
: p o i n t e r i s passed
word p t r Cbpl.BufOfs,O
je
Bye
:make STOSW countup
Start: cld
mov
a x . C b p l . A t :t lroi ba d
AX waittthr i bpuat er a m e t e r
fill c h a r
and
a x . O f; fpm
0r 0efhprw
ogari itrnheg
BX w i t h fill c h a r
b x . [ b p l . F i :l l oe ar d
mov
xpr r.irf0gbteohfiuprn
f thager e
and ambt :tew
o r : caaotantmrxdib.bbiunxtee
fill c h a r
mov
DI w ti tahr gbeutf foef rf s e t
bx,Cbpl.BufO
: l of sa d
mov
d i ,bx
mov
ES w ti tahr gbeutf f e r
segment
b x . [ b p l . B u f S: leoga d
mov
e s , bx
mov
c x . C b p l . B u f S;ilzoea d
C X w bi tuhf f e r
size
stosw
b u f;fill
fer
the
rep
Bye:
p o i nstmov
teoarr;cri kge isnst aoplr. eb p
POP
; and c a l l e r s BP
bp
ret
E n d M r k - R e t A d d r:-r2e t u rcnl .e a r i nt hpgea r m
f rsot m
hs tea c k
endp
Clears
Clears
push
mov
cmp
jne
cmp
The first thing youll notice about Listing 22.1 is that Clears uses a REP STOSW
instruction. That means that were not going to improve performance by any great
amount, no matter how clever we are. While we can eliminate some cycles, the bulk
of the work in Clears is done by that one repeated
string instruction, and theres no
way to improve on that.
Does that mean thatListing 22.1 is as good as it can be? Hardly. While the speed of
Clears is verygood, theres another side to the optimization equation: size. The whole of
Clears is 52 bytes long as it stands-but, as well see, that size is hardly set in stone.
41 6
Chapter 22
Where do we begin with Clears? For starters, theres an instruction in there that
serves no earthly purpose-MOV SP,BP. SP is guaranteed to be equal to BP at that
point anyway, so why reload it with the same value? Removingthat instruction saves
us two bytes.
Well, that was certainly easy enough! Were not going to find any more totally nonfunctional instructions in Clears, however, so lets get on to someserious optimizing.
Well look first for cases where we know of better instructions for particular tasks
than those that were chosen. For example, theres no need to load any register,
whether segment or general-purpose, through BX; we can eliminate two instructions by loading ES and DI directly as shownin Listing 22.2.
LISTING22.2122-2.ASM
Clears
p r once a r
push b p
mov
bp. sp
cmp
word p t r Cbpl.BufSeg.0
Start
jne
cmp
word p t r[ b p l . B u f O f s . O
Bye
je
S t a r t :c l d
mov
ax.Cbpl.Attrib
ax.Off00h
and
mov
b x . [ b p l .F i 1 1 e r
and
bx.0ffh
a x . b xo r
mov
d i .Cbp].BufOfs
mov
es,[bpl.BufSeg
mov
cx.[bpl.BufSize
s troespw
Bye :
POP
bP
ret
EndMrk-RetAddr-2
C1endp
ears
: s a v ec a l l e r s
BP
: p o i n tt os t a c kf r a m e
: s k i pt h e fill i f a n u l l
: p o i n t e ri sp a s s e d
:make STOSW c o u n t up
: l o a d AX w i t h a t t r i b u t e p a r a m e t e r
: p r e p a r ef o rm e r g i n gw i t h
fill c h a r
: l o a d BX w i t h fill c h a r
: p r e p a r ef o rm e r g i n gw i t ha t t r i b u t e
: c o m b i n ea t t r i b u t ea n d
fill c h a r
;load D I w i t ht a r g e tb u f f e ro f f s e t
: l o a d ES w i t h t a r g e t b u f f e r
segment
:load CX w i t hb u f f e rs i z e
:fill t h eb u f f e r
: r e s t o r e c a l l e r s BP
: r e t u r n .c l e a r i n gt h e
parms f r o mt h es t a c k
:save
nC
epalreor acr s
bp
push
mov
bp,sp
cmp
word p t r C b p l . B u f S e g:.s0kt ihpe
Start jne
passed
cmp
word p t[ rb p l . B u f O f s , O
je
Bye
cld
Start:
mov
ax.[bpl.Attrib
ax.Off00h
and
BP
: p o i nf rt a tmoe s t a c k
fill i f a n u l l
i s: p o i n t e r
:makeu pSTOSW c o u n t
: l o a d AX w i t ha t t r i b u t ep a r a m e t e r
: p r e p a r ef o rm e r g i n gw i t h
fill c h a r
Bye :
mov
and
or
1es
bx.[bpl.Filler
bx.0ffh
ax, bx
d i . d w o r dp t r[ b p l . B u f O f s
mov
rep
cx,Cbpl.BufSize
stosw
POP
ret
Clears
bP
EndMrk-RetAddr-2
endp
: l o a d BX w i t h fill c h a r
: p r e p a r ef o rm e r g i n gw i t ha t t r i b u t e
;combine a t t r i b u t e and fill c h a r
: l o a d E S : D I w i t ht a r g e tb u f f e r
:segment:offset
:load CX w i t hb u f f e rs i z e
:fill t h eb u f f e r
: r e s t o r e c a l l e r s BP
: r e t u r n .c l e a r i n gt h e
p a r m sf r o mt h es t a c k
Thats good for another three bytes. Were downto 43 bytes, and counting.
We can save 3 more bytes by clearing the low and high bytes of AX and BX, respectively,
by using SUB reg8,reg8rather than ANDing 16-bit values as shown
in Listing 22.4.
LISTING 22.4122-4.ASM
Clears
push
mov
cmp
jne
cmp
je
Start: cld
mov
sub
p r once a r
bp
bp.sp
word p t r Cbpl.BufSeg.0
Start
word p t rC b p l . B u f O f s . 0
Bye
ax.[bpl.Attrib
a1,al
bx.Cbpl.Filler
mov
bh,bh
sub
a x . b xo r
l edsi . d w o rpd[t br p l . B u f O f s
mov
cx.Cbpl.BufSize
srt eo ps w
: s a v ec a l l e r s B P
: p o i n tt os t a c kf r a m e
: s k i pt h e
fill i f a n u l l
: p o i n t e ri sp a s s e d
Bye :
POP
bP
E nr edtM r k - R e t A d d r - 2
Clears
endD
: r e s t o r e c a l l e r s BP
: r e t u r n .c l e a r i n gt h e
p a r m sf r o mt h es t a c k
Now were down to 40 bytes-more than 20 percent smaller than the original code.
Thats pretty much it for simple instruction-substitution optimizations. Now lets look
for instruction-rearrangement optimizations.
It seems strange to load a word value into AX and then throw away AL. Likewise, it
seems strange to load a word value into BX and then throw away BH. However, those
steps are necessary becausethe two modified word valuesare ORed into a single character/attribute word value that is then used to fill the target buffer.
Lets step back and see what this code really does, though. All it does in the end is
load one byte addressed relative to BP into AH and anotherbyte addressed relative
to BP into AL. Heck, we can just do that directly! Presto-weve saved another 6
bytes, and turned two word-sized memory accesses into byte-sized memory accesses
as well. Listing22.5 shows the new code.
41 8
Chapter 22
LISTING22.5122-5.ASM
Clears
p r once a r
push bp
mov
bp,sp
cmp
word p t r Cbpl.BufSeg.0
Start
jne
cmp
word p t r[ b p l . B u f O f s . O
je
Bye
S t a r t :c l d
mov
a h , b y t pe t[ rb p ] . A t t r i b [ l l
mov
a1 , b y t ep t r[ b p l . F i l l e r
l edsi . d w o rpd[t br p ] . B u f O f s
mov
cx,Cbpl.BufSize
s rt eo ps w
Bye :
POP
bp
E nr edtM r k - R e t A d d r - 2
e n d pC l e a r s
; s a v ec a l l e r ' s
BP
; p o i n tt os t a c kf r a m e
; s k i pt h e fill i f a n u l l
: p o i n t e r i s passed
;make
;load
;load
;load
;load
STOSW countup
AH w i t h a t t r i b u t e
AL w i t h fill c h a r
ES:OI w i t h t a r g e t b u f f e r s e g m e n t : o f f s e t
CX withbuffersize
;fill t h eb u f f e r
; r e s t o r e c a l l e r ' s BP
: r e t u r n .c l e a r i n gt h e
p a r m sf r o mt h es t a c k
(We could getrid ofyet anotherinstruction by having the calling code pack both the
attribute and thefill valueinto thesame word, but that's not partof the specification
for this particular routine.)
Another nifty instruction-rearrangement trick saves 6 more bytes. Clears checks to see
whether the far pointer
is null (zero) at the start
of the routine.. .then
loads and uses
that same far pointer later on.Let's get that pointerinto registers and keep it there;
that way we can check to seewhether it's null with a single comparison, and can use it
later without having to
reload it from memory.This technique is shown in Listing22.6.
LISTING 22.6122-6.ASM
Clears
push
mov
les
mov
or
je
p r once a r
bp
bp,sp
d i . d w o r dp t r[ b p ] . B u f O f s
ax.es
ax.di
Bye
Start: c l d
mov
a h . b y tpeC
t rb p l . A t t r i b C 1 1
mov
a l . b y t pe tCr b p ] . F i l l e r
mov
cx.[bpl.BufSize
s rt oe spw
Bye :
POP
bp
E nr edtM r k - R e t A d d r - 2
e n d pC l e a r s
; s a v ec a l l e r ' s
BP
; p o i n tt os t a c kf r a m e
;load ES:DI w i t ht a r g e tb u f f e r
;segment:offset
we c a n t e s t
;putsegmentwhere
; i s i t a n u l lp o i n t e r ?
;yes. s o w e ' r ed o n e
;make STOSW countup
; l o a d AH w i t h a t t r i b u t e
; l o a d AL w i t h fill c h a r
:load CX w i t hb u f f e rs i z e
:fill t h eb u f f e r
; r e s t o r ec a l l e r ' s
BP
: r e t u r n ,c l e a r i n gt h e
it
p a r m sf r o mt h es t a c k
Well. Now we're down to 28 bytes, having reduced the size of this subroutine by
nearly 50 percent. Only 13 instructions remain. Realistically, howmuch smaller can
we make this code?
About one-third smaller yet, as it turnsout-but in order to do that, we must stretch
our minds and use the 8088's instructions inunusual ways. Let me ask youthis: What
do most of the instructionsin the currentversion of Clears do?
Zenning and the Flexible Mind
41 9
Previous
Home
Next
They either load parameters from the stack frame or set up the registers so that the
parameters can be accessed. Mind you, theres nothing wrong with the stack-frameoriented instructions used in Clears;those instructions access the stack frame in a
highly efficientway, exactly asthe designers of the 8088 intended, and just
as the code
generated by a high-level language would. That means that we arent going to be able
to improvethe code if we dont bend therules a bit.
Lets think ...the parameters are sitting on the stack, and most of our instruction
bytes are beingused to read bytes off the stack with BP-based
addressing.. .we need a
more efficient way to address the stack...the stack.. .THE STACK!
Ye gods! Thats easy-we can use the stuck pointer to address thestack rather thanBP.
While its true that the stack pointer cant be used for mod-reg-rm addressing, as BP
can, it can be used to pop data off the stack-and POP is a one-byte instruction.
Instructions dont getany shorter than that.
There is one detail to be taken care of before we can put ourplan into action: The
return address-the address of the calling code-is on top of the stack, so the parameters we want cant be reached with POP. Thats easily solved, however-well
just pop the return address into an unused
register, then branch through thatregister when were done, as we learned to do in Chapter 14. As we pop theparameters,
well also be removing them from the stack, thereby neatly avoiding the need to
discard them when its time to return.
With that problem dealtwith, Listing 22.7 shows the Zennedversion of Clears.
dx
ax
bx
ah.bh
cx
di
es
bx.es
Bye
; g e tt h er e t u r na d d r e s s
; p u t fill c h a ri n t o AL
; g e tt h ea t t r i b u t e
AH
;putattributeinto
; g e tt h eb u f f e rs i z e
:gettheoffsetofthebufferorigin
: g e tt h es e g m e n to ft h eb u f f e ro r i g i n
;putthesegmentwhere
we c a n t e s t
;nul 1 p o i n t e r ?
;yes. so weredone
:make STOSW countup
:do t h e s t r i n g s t o r e
it
420
Chapter 22
Previous Home
part 2
Part 2
Next
Previous
chapter 23
Home
Next
The VGA is un
ry
of computer graphics, for itby
is far
the most
e closest we may ever come to a linguaj-anca of
computer graphics.
standard has
even
come close
to
the 50,000,000
or so VGAs in use t
Ily every
PC compatible sold
today has full VGA
compatibility built iq*.There are, of course, a variety of graphics accelerators that
outperform the sta6dardVGA, and indeed, it is becoming hard to find a plain vat there is no standard foraccelerators, and every accelerator
VGA at its core.
t if you write your programs for the VGA, youll have the
for your software. In order for graphics-based software to
st perform well. Wringing the best performance from the
VGA is no simple task, and its impossible unless you really understand how the VGA
works-unless you have the internals down cold. This book is about PC graphics at
many levels, but high performance is the foundation for all that is to come, so it is
with the inner workings of the VGA that we will begin our exploration of PC graphics.
The first eight chapters of Part I1 is a guided tour of the heart of the VGA, after
youve absorbed what well coverin this and the nextseven chapters, youll have the
foundation for understandingjust abouteverything the VGA can do, including the
fabled Mode X and more. As you read through these first chapters, please keep in
mind that the really exciting stuff-animation, 3-D, blurry-fast lines and circles and
425
polygons-has to wait until we have the fundamentalsout of the way. So hold on and
follow along, and before you know it thefireworks will be well underway.
Well start our exploration with a quick overview of the VGA, and then well dive
right in and get ataste of what the VGA can do.
The VGA
The VGA is the baseline adapter formodern IBM PC compatibles, present in virtually every PC sold today or in the last several years. (Note that the VGA is often
nothing more than a chip on a motherboard,
with some memory, a DAC, and maybe
a couple of glue chips; nonetheless, Ill refer to it as an adapter from now on for
simplicity.) It guarantees that every PC is capable of documented resolutions up to
640x480 (with 16 possible colors per pixel) and 320x200 (with 256 colors per pixel),
as well asundocumented-but nonetheless thoroughly standard-resolutions up to
360x480 in 256-color mode, as well see in Chapters31-34 and 47-49. In order for a
video adapter to claim VGA compatibility, it must support all the features and code
discussed in this book (with a very few minor exceptions that Ill note)-and my
experience is that just about100 percent of the video hardware currently shipping
or shipped since 1990 is in fact VGA compatible. Therefore, VGA code will run on
nearly all of the 50,000,000 or so PC compatibles out there, with the exceptions
being almost entirely obsolete machines from 1980s.
the This makes good VGA code
and VGA programming expertisevaluable commodities indeed.
Right off the bat, Id like to make one thing perfectly clear: The VGA is hardsometimes very hard-to program for good performance. Hard, but not
impossible-and thats why I like this odd board.Its a throwback to an earlier generation of micros, when inventive coding and asolid understanding of the hardware
were the best tools for improving performance. Increasingly, faster processors and
powerful coprocessors are seen as the solution to thesluggish softwareproduced by
high-level languages and layers of interface and driver code,and thats surely a valid
approach. However, there are tens of millions of VGAs installed right now, in machines ranging from &MHz 286s to 90-MHz Pentiums. Whats more, because the
VGAs are generally 8- or at best 16-bit devices, and because of display memory wait
states, a fasterprocessor isnt as much of a help as youd expect. The upshot is that
only a seasoned performance programmerwho understands the VGA through and
through can drive the board toits fullest potential.
Throughout this book, Ill explore the VGA by selecting aspecific algorithm or feature and implementing code to support it on the VGA, examining aspects of the
VGA architecture as they become relevant.Youll get tosee VGA features in context,
where they are more comprehensible than in IBMs somewhat arcane documentation, and youll get working code to use or to modify to meet your needs.
The prime directive of VGA programming is that theres rarely just oneway to program theVGA for agiven purpose. Once you understand thetools the VGA provides,
426
Chapter 23
At the Core
A little background is necessarybefore were ready to examine Listing
23.1. The VGA is
built around fourfunctional blocks, named the CRT Controller (CRTC), the Sequence
Controller (SC), the Attribute Controller
(AC), and the Graphics Controller (GC).
The single-chipVGA could have been designedto treat theregisters for all the blocks
as one large set, addressed at one pair of 1/0 ports, but in the EGA, each of these blocks
was a separate chip,and thelegacy of EGA compatibility is why each of these blocks
has a separateset of registers and is addressed at differentI/O ports in theVGA.
Each of these blocks has a sizable complement of registers. It is not particularly important that you understand why a given block hasa given register; all the registers together
make up the programming interface, andis it
the entire interface that
is of interest
to the VGA programmer. However, the means by which most VGA registers are addressed makes it necessary for you to rememberwhich registers are in which blocks.
Most VGA registers are addressed as internally indexed registers. The internaladdress
of the register is written to a given blocksIndex register, and then the data for that
register is written to the blocks Data register. For example, GC register 8, the Bit
Bones and Sinew
427
Mask register, is set to OFFH by writing 8 to port SCEH, the GC lndex register, and
then writing OFFH to port SCFH, the GC Data register. Internal indexing makes it
possible to address the9 GC registers through only two ports, and allows the entire
VGA programming interface to be squeezed into fewer than a dozen ports. The
downside is that two 1 / 0 operations are required toaccess most VGA registers.
The ports used to control theVGA are shown in Table 23.1. The CRTC, SC, and GC
Data registers are located at the addresses of their respective Index registers plus
one. However, the AC Index and Data registers are located at the same address,
3COH. The function of this port toggles on every OUT to 3COH, and resets to Index
mode (in which the Index register is programmed by the next OUT to 3COH) on
every read from theInput Status 1 register (3DAH when the VGA is in a color mode,
428
Chapter 23
The method used in the VGA BIOS to set registers is to point DX to the desired
Index register, load AL with the index, perform
a byte OUT, incrementDX to point
to the Data register (except in the case ofthe AC, where DX remains the same),load
AL with the desired data, and performa byte OUT. A handy shortcut is to point DX
to the desired Index register, load AL with the index, load AH with the data, and
perform a word OUT. Since the high byte of the OUT value goes toport DX+1, this is
equivalent tothe first method butis faster. However,this technique does not
work for
programming the AC Index and Data registers; both AC registers are addressed at
3COH, so two separate byte OUTs must be used to program the AC. (Actually, word
OUTs to the AC do work in the EGA, but notin the VGA, so they shouldnt be used.)
As mentioned above, you must be sure which mode-Index or Data-the AC is in
before you do anOUT to 3COH; you can read the InputStatus 1 register at any time
to force the AC to Index mode.
How safe is the word-OUT method of addressing VGA registers? I have, in the past,
run into adapter/computer combinations that had trouble with word OUTs; however, all such problems I am aware of have been fixed. Moreover, a great deal of
graphics software now uses word OUTs, so any computer orVGA that doesnt properly support word OUTs could scarcely be considered a clone at
all.
A speed tip: The setting of each chip S Index register remains the same until it is
reprogrammed. This means that in cases where you are setting the same internal
register repeatedly, you can set the Indexregister to point tothat internal register
once, then write to the Data register multiple times. For example, the Bit Mask
register (GC register 8) is often set repeatedly inside a loop when drawing lines.
The standard code for this is:
MOV
MOV
OUT
DX.03CEH
AL.8
DX ,AX
; p o itnot
GC I n dreexg i s t e r
; i n t e r n a li n d e x
o f B i t Mask r e g i s t e r
;AH c o n t a i nBs i t
Mask r e g i s t esr e t t i n g
Alternatively, the GC Index register could initially be set to point tothe Bit Mask
register with
MOV
MOV
OUT
INC
DX.03CEH
AL.8
DX.AL
DX
: p o i n t t o GC I n d e xr e g i s t e r
o f B i t Mask r e g i s t e r
; i n t e r n a li n d e x
; s e t GC I n d e xr e g i s t e r
: p o i n t t o GC D a t ar e g i s t e r
and then the Bit Mask register could be set repeatedly with the byte-size OUT
instruction
OUT
DX.AL
:AL c o n t B
a i nt s
Mask r e g i s et et tr i n g
429
which is generally faster (and never slower) than a word-sized OUT, and which
does not require AH to be set, freeing up a register. Of course, this method only
works ifthe GC Index register remains unchanged throughout theloop.
430
Chapter 23
Byte from
Plane 0
0 -
Byte from
Plane
Byte from
Plane 2
Byte from
Plane 3
n
u
Palette RAM
"+
(1 6 6-Bit-wide
storage
1 locations
bit first
plane 2 (red)
plane byte, shifted out 1
per dot clock, mostsignificant bit first
8 bits
from
addressed
with four bits
from
memory)
2 -
3
4
5 -
One pixel
per dot
clock to
+
digital-toanalog
converter
(DAC)
1 .ASM
:
:
:
:
Sample V G A p r o g r a m .
A n i m a t e sf o u rb a l l sb o u n c i n ga r o u n d
a p l a y f i e l db yu s i n g
p a g ef l i p p i n g .P l a y f i e l di sp a n n e ds m o o t h l yb o t hh o r i z o n t a l l y
and v e r t i c a l l y .
: By M i c h a e Al b r a s h .
s t a cske g m e np ta rsat a c' Sk T A C K '
d u p5(1?2) d b
setnadc sk
MEORES"/IOEO~MOOE
equ
VIOEO_.SEGMENT
OaOOOh
equ
LOGICAL-SCREENKWIOTH
equ
6: 4d 0fevoxfiri3dn5ee0o
mode
: comment o u t f o r 640x200 mode
: d i s p l a y memory
segment
for
: t r u e VGA g r a p h i c s modes
6 7 2: /w8i bdi ytnht aehns ed i sgi chnat n
0
431
LOGICALLSCREEN-HEIGHT
PAGE0
PAGEl
PAGEOKOFFSET
PAGElLOFFSET
BALLLWIOTH
BALLLHEIGHT
BLANK-OFFSET
equ
equ
equ
BALL-OFFSET
equ
NUM-BALLS
equ
equ
equ
equ
384
equ
; w e ' l lw o r kw i t h
0
; f l af pog ar g e
0 when
page
flipping
1
; f l af pog ar g e
1 when
page
flipping
0
; s t a or tf f s eoptfa g e
0 i n VGA memory
LOGICALLSCREEN-WIDTH * LOGICALLSCREENKHEIGHT
1 ( b o t hp a g e s
; s t a r to f f s e to fp a g e
; a r e6 7 2 x 3 8 4v i r t u a ls c r e e n s )
2 4 1 8; w i d t hobf a li lnd i s p l a y
memory b y t e s
2; h4 e i gbohafsti nlcl al inn e s
PAGE1-OFFSET * 2
; s tbaol raf tinmka g e
; i n VGA memory
BLANK-OFFSET + (BALLLWIDTH * BALLLHEIGHT)
: s t a r to f f s e to fb a l li m a g ei n
VGA memory
;number o f b a l l s t o a n i m a t e
4
; VGA r e g i s t e re q u a t e s .
SC-INDEX
MAP-MASK
3c4h
equ
;SC i nrdeegxi s t e r
equ
2
; S C map mask r e g i s t e r
GC-INDEX
3cehequ
;GC r iengdi es xt e r
equ
GC-MODE
5
:GC mode r e g i s t e r
CRTC-INDEX
03d4h
equ
;CRTC i n dr eegx i s t e r
STARTLADDRESS-HIGH equ Och
:CRTC s t a ar td d r e shsi gbhy t e
START-ADDRESS-LOW equ
Odh
;CRTC s t aar dt d r e sl osbwy t e
CRTC-OFFSET
13hequ
:CRTC o rf ef sg ei st t e r
INPUT-STATUS-1 03dah
equ
;VGA s t a rt ue sg i s t e r
VSYNC-MASK
e 0q :8uvhe r t i csayslbni tnicat t ruesg i s t e r
DE-MASK
e qO;uldhi s p l ea ny a bsbil tnei at t ruesg i s t e r
AC-INDEX
03cOh
equ
:AC i n dr eegx i s t e r
HPELPAN
OR 1 3 h
: A C h o r i z o n t papel al n n i nr eg g i s t e r
equ
20h
: ( b i t 7 i sh i g ht ok e e pp a l e t t e
; a d d r e s s i n go n )
dseg
segment
para
common 'DATA'
Currentpage
PAGEl
;page t o draw t o
CurrentPageOffset
dw
PAGEl-OFFSET
db
: F o u rp l a n e ' sw o r t ho fm u l t i c o l o r e db a l li m a g e .
B a l l P1 aneOImage1abelbyte
: b l u ep l a n ei m a g e
d b0 0 0 h0. 3 c h0. 0 0 h0. 0 1 hO
. f f h0. 8 0 h
0d0b7Ohf.f h .
DeOh. O OO
f hf.f h .
OfOh
4 db
*d u3p ( 0 0 0 h )
0d 7b fOhf.fOhf.eOhf.fOhf.fOhf.f h
dObf fOh f. fOh f. fOh f. fOh f. fOh f. f h
4 db
*d u3p ( 0 0 0 h )
0d 7b fOhf.fOhf.e0h3. fOhf.fOhf.c h
0d3bO
f hf.O
f hf.cOhl.O
f hf.O
f hf.B h
4 db
*dup(000h)
3
B a l l P1 a n e l I m a g e1a b e lb y t e
: g r e e np l a n ei m a g e
*dup(000h)
3
4db
O
d bl fOhf.fOhf,80h3.fOhf.fOhf.c h
0d3bO
f hf.O
f hf.c0h7.O
f hf.O
f hf.e h
0d 7b fOhf.fOhf.eOhf.fOhf.fOhf.f h
dObf fOh f. fOh f. fOh f. fOh f. fOh f. f h
db
8 *d u3p ( 0 0 0 h )
OOfh.
db Offh.
OfOh.
007h.
Offh.
OeOh
001h.
dbOffh.
080h.
000h.
03ch.
OOOh
B a l l P1 a n e 2aIb;m
pbryle1atiam
eldgnaeeg e
*d u3p ( 0 0 0 h )
1d2b
432
Chapter 23
1
1
RAM
db
O f f h ,O f f h .O f f h .O f f h .O f f h .O f f h
db
O f f hO
. f f hO
. f f h 0. 7 f h O
. f f hO
, feh
0d7bO
f hf.O
f hf.e0h3.O
f hf.O
f hf.c h
db
03fh. O f f h . Ofch. O l f h . O f f h . Of8h
db
OOfh, O f f h . OfOh. 007h. O f f h . OeOh
001h.
db
O f f h . 080h. 000h. 0 3 c h . OOOh
BPal al ln e 3 I m a g e
1a b e
y: itlnet e n s i t y
on paf oll alr n e s ,
: t op r o d u c eh i g h - i n t e n s i t yc o l o r s
db
000h. 03ch. 000h. 001h. O f f h 0. 8 0 h
db
007h. O f f h . OeOh. OOfh. O f f h . OfOh
db
. fch
Olfh. O f f h . Of8h. 03fh. OffhO
db
. feh
03fh. O f f h . Ofch. 07fh. OffhO
db
0 7 f h . O f f h . O f e h , O f f h . O f f h .O f f h
db
O f f h . O f f h . O f f h . O f f h . O f f h .O f f h
db
O f f h . O f f h , O f f h . O f f h . O f f h .O f f h
db
. feh
Offh. O f f h . O f f h . 07fh. OffhO
db
. fch
07fh. O f f h . Ofeh. 03fh. OffhO
db
. f8h
03fh. O f f h . Ofch, O l f h . OffhO
db
OOfh. O f f h . OfOh. 007h. O f f h . OeOh
db
001h. O f f h . 080h, 000h. 0 3 c h . OOOh
BallX
BallY
L a s t B a l 1X
LastBallY
BallXInc
BallYInc
B a l l Rep
dw
dw
dw
dw
dw
dw
dw
Ball Control
dw
dw
BallControlString
1 5 . 5 0 , ; a74br00aor .lafl y
4 0 , 200.
110.
300
1 5 . 50. 40.70
40. 100. 160. 30
1. 1. 1. 1
8. 8, 8. 8
1. 1 . 1. 1
x coords
: a r r abo yaf l l
y coords
; p r e v i o u sb a l l
x coords
: p r e v i o u sb a l l y c o o r d s
: x move f a c t o r s f o r b a l l
;y move f a c bt foaorlrsl
:m
B
k eoevttipiom
n ge s
: b a l la c c o r d i n gt oc u r r e n t
: increments
B a l l O C o n tBr ao ll l,l C o n: tproociluntrtoreer ns t
; l o c a t bi oai nnl ls
B a l l 2 C o n tBr oa l .l 3 C o n t r o l
; contros
l trings
dw
B a l l O C o n t rBoal l, l l C o n t: rpool i n t teor s
dw
B a l l 2 C o n t r o 1B,a l l 3 C o n t r o l
: s t a rotbf a l l
: c o n t r o sl t r i n g s
: B a l cl o n t r o sl t r i n g s .
B a l l OwClooarnbdterlo l
dw
Bl al Cl o n t r o l
dw
Bal12Control
dw
B a l l 3wClooarnbdter lo l
dw
-1. 41 ,0 .
-1. - 41 .0 ,
1. - 4 . 0
1 0 . 1. 41 ,0 .
1 a bweolr d
1 2 . -1. 1. 28. -1. -1. 1 2 . 1. -1. 28. 1 . 1. 0
1 awboer dl
20, 0. -1. 40. 0 . 1. 2 0 , 0 . -1. 0
8. 1. 0. 5 2 . -1. 0 . 44. 1. 0 . 0
: P a n n i n gc o n t r o ls t r i n g .
i f d e f MEDRESpVIOEO_MODE
PanningControlString
dw
3 2 . 1. 0 . 34. 0 . 1. 3 2 . -1, 0 . 3 4 . 0 . -1. 0
else
PanningControlString
dw
3 2 . 1. 0. 1 8 4 , 0, 1 . 32. -1. 0. 1 8 4 . 0. -1. 0
endif
P a n n i n g C o n t r o l dw
P a n n i n g C o n t r o l S t r i: np go i cn tut oer r el onct a t i o n
; i np a n n i n gc o n t r o ls t r i n g
dw
1
; # t i m e st op a na c c o r d i n gt oc u r r e n t
PanningRep
: p a n n i n gi n c r e m e n t s
1
dw
PanningXInc
; x p a n n i n gf a c t o r
0
dw
PanningYInc
;y p a n n i n gf a c t o r
433
HPan
db
P a n n i n g S t a r t O f f s e t dw
0
0
ends
dseg
: Macro t o s e t i n d e x e d r e g i s t e r
P2 o f c h i p w i t h i n d e x r e g i s t e r
; a t P 1 t o AL.
SETREG
macro
mov
mov
mov
dx.ax out
endm
P 1 . P2
dx,P1
ah.al
a1 .P2
c s es eg g m epnatpr au b l i c
assume
cs:cseg,
ds:dseg
n pesratoar cr t
mov
ax.dseg
mov
ds.ax
: S e l e c tg r a p h i c s
'CODE'
mode.
i f d e f MEDRES-VIDEO-MODE
mov
ax.010h
else
mov
ax.0eh
endif
int
10h
: ES a l w a y sp o i n t st o
mov
mov
VGA memory.
ax.VIDE0-SEGMENT
es ,ax
: Draw b o r d e ra r o u n dp l a y f i e l di nb o t hp a g e s .
mov
d i , PAGEO-OFFSET
D rcaawl l B o; pr da eg re
mov
d i .PAGEl-OFFSET
D rcaawl lB o; pr ad ge er
: Draw a l l f o u r p l a n e ' s w o r t h o f t h e b a l l
0 border
1 border
t o u n d i s p l a y e d VGA memory.
; e n a b l ep l a n e
mov
a1 , O l h
SETREG S C - I N D E X . MAP-MASK
mov
s. oi f f s eBtaPl ll a n e O I m a g e
mov
d i .BALL-OFFSET
mov
cx.BALL-WIDTH * BALLLHEIGHT
r e p movsb
mov
a1
: epn.l a0an2behl e
SETREG S C - I N D E X . MAP-MASK
mov
, os if f sBeat l l P l a n e l I m a g e
mov
di.BALL-OFFSET
mov
cx.BALL-WIDTH * BALLLHEIGHT
r e p movsb
mov
a1 .04h
: e n a b l ep l a n e
SETREG S C - I N D E X . MAP-MASK
mov
. os fi f sBeat l l P l a n e 2 I m a g e
mov
d i .BALLLOFFSET
434
Chapter 23
mov
cx.BALLLWIDTH * BALLLHEIGHT
repmovsb
mov
p: el annla. e0b8l eh
SETREG SC-INDEX.
MAP-MASK
mov
s i . oB
f fas lel P
t lane3Image
mov
d i .BALL-OFFSET
mov
cx,BALL-WIDTH * BALL-HEIGHT
repmovsb
: Draw a b l a n k i m a g et h es i z eo ft h eb a l lt ou n d i s p l a y e d
VGA memory.
mov
a1; e. O
na alf lhb l e
memory p l ast nhi necse,
SETREG S C - I N D E X , MAP-MASK
; b l a nhkatseor a saepl ll a n e s
mov
d i .BLANK-OFFSET
mov
cx.BALLLWIDTH * BALLLHEIGHT
sub
a1 .a1
r e ps t o s b
; S e t VGA t o w r i t e
mov
mov
d;oxpu.oat iln t
t; op o i dn xt i n c
jmp
in
and
or
jmp
dx.al out
mode 1. f o rb l o c kc o p y i n gb a l la n db l a n ki m a g e s
dx.GCLINDEX
a1 .GCLMODE
GC I n tdoe x
GC Mode r e g i s t e r
GCr eDg ai st tae r
$+2
s ebt tul sel e t ; tdoe l a y
a1 :c,gduexrtrsetonaftt e
: c l etwhareri t e
a1 . n o t 3
a1 .1
w:t shr eiet te
$+2
s e bt tul sel e t : tdoe l a y
GC Mode
mode b i t s
mode f i et ol d
1
: S e t VGA o f f s e tr e g i s t e ri nw o r d st od e f i n el o g i c a ls c r e e nw i d t h .
mov
SETREG
a1 .LOGICALLSCREENLWIDTH / 2
CRTC-INDEX. CRTC-OFFSET
: Move t h eb a l l sb ye r a s i n ge a c hb a l l ,m o v i n g
: r e d r a w i n g it, t h e ns w i t c h i n gp a g e s
BallAnimationLoop:
mov
b x . ( NUM-BALLS
EachBall Loop:
; E r a s eo l di m a g e
mov
mov
mov
call
of ballinthis
i t , and
when t h e y ' r e a l l moved.
2 ) - 2
page ( a tl o c a t i o nf r o mo n em o r ee a r l i e r ) .
si.BLANKLOFFSET : p o i btnolt a ni m
k age
cx,[LastBallX+bxl
dx.[LastBallY+bxl
DrawBal 1
: S e t new l a s t b a l l l o c a t i o n .
mov
mov
mov
mov
;
Change t h e b a l l
ax.[BallX+bxl
[LastballX+bxl.ax
ax.[BallY+bxl
[LastballY+bxl.ax
movementvalues
if it'stimeto
l[RB
deeapcl c;+hubarxrrse]epfnoaertucuattn?to r
M o v e Bj na zl l
mov
s i , [ B a l l C o n t r o l + b; xi tl 'i sm
ct hoea n g e
do s o .
movement
values
435
; g e t1o d s w
f rnew
om
f a cr teopre a t
; c o n t r o ls t r i n g
e n d ; a ta x . a ax n d
jnz
mov
; g e t 1o d s w
SetNewMove:
mov
1 odsw
mov
1 odsw
mov
mov
SetNewMove
si,[BallControlString+bxl
[ B a l l R e p + b;xsl e. at x
[BX
a lIln c + b, ax lx
; s e t new y increment
movement
[BallYInc+bxl.ax
[ B a l l C o n t r o l + b x: ls, as iv e
; Move t h e b a l l .
MoveBall
mov
a[xB, aXl Il n c + b x l
X[ ,Ba+adbxldxl l
mov
a x[,B a l l Y I n c + b x l
[ B a al ldYd+ b x l . a x
; Draw b a l l a t
:move i n y d i r e c t i o n
new l o c a t i o n .
mov
mov
mov
Draw
c aBl al l l
bx
bx
;move i n x d i r e c t i o n
si.BALL-OFFSET
c[ xB.aXl+l b x l
dx.CBallY+bxl
; p o bitnoatl li 'ms a g e
dec
dec
E jancshLBoaol pl
; S e tu pt h en e x tp a n n i n gs t a t e( b u td o n ' tp r o g r a m
; VGA y e t ) .
it intothe
A d j cu as lt lp a n n i n g
; W a i tf o rd i s p l a ye n a b l e( p i x e ld a t ab e i n gd i s p l a y e d )
; w e ' r en o w h e r en e a rv e r t i c a ls y n c .w h e r et h es t a r ta d d r e s sg e t s
; l a t c h e da n du s e d .
call
; Fliptothe
Wai t D i s pa ly E n a b l e
new p a g eb yc h a n g i n gt h es t a r ta d d r e s s .
mov
add
push
SETREG
mov
POP
mov
SETREG
ax.[CurrentPageOffsetl
ax.CPanningStartOffset1
ax
CRTC-INDEX. START-ADDRESS-LOW
a 1 , b y t ep t r[ C u r r e n t P a g e O f f s e t + l l
ax
a1 ,ah
CRTC-INDEX. START-ADDRESS-HIGH
; W a i tf o rv e r t i c a ls y n c
; t ot a k ee f f e c t .
436
Chapter 23
s o we know
s o t h e new s t a r ta d d r e s sh a s
a chance
call
Wai tVSync
; S e th o r i z o n t a lp a n n i n g
mov
mov
in
mov
mov
; sdext . ao lu t
a1 ,[HPanl
dx.INPUT-STATUS-1
;a1
r e,sdext
dx.AC-INDEX
a1 .HPELPAN
AC a dridnerdgeetsoxs i n g
r e gp a np eAC
l t oi n d e x
a1 .[HPanl
mov
;set dx.al
now, j u s t as new s t a r ta d d r e s st a k e se f f e c t .
out
p a nnew
n i n gp e l
; F l i pt h ep a g et od r a wt ot ot h eu n d i s p l a y e dp a g e .
C C u r r exnotrP a g e l . 1
I s P a g jenl z
[CurrentPageOffset].PAGEO-OFFSET
mov
sEhjnm
odrpFt l i p P a g e
IsPagel:
mov
[CurrentPageOffsetl.PAGEl-OFFSET
EndFl ipPage:
;
E x i t i f a k e y ' sb e e nh i t .
mov
ah.1
int
16h
jnz
Done
B a l l A nj m
i mpa t i o n L o o p
; F i n i s h e d ,c l e a rk e y ,r e s e ts c r e e n
mode and e x i t .
Done:
mov
int
start
k; ec ylaeha. r0
16h
mov
int
a ;xr.te3etsoxett
10h
mov
int
a h . 4; cethxoi t
21h
mode
DDS
endp
; R o u t i n et od r a w
a b a l l - s i z e di m a g et o
all p l a n e s .c o p y i n gf r o m
VGA memory i n
DrawBall
mov
mu1
add
add
mov
mov
push
push
POP
D r a w B a l l Loop:
n e ap r o c
ax.LOGICAL-SCREEN-WIDTH
d;xo f f s eostft a rottfoipm a gsec alni n e
a x . c x; o f f s e ot uf p p e lre f ot ifm a g e
a x . [ C u r r e n t P a g e O f f s e t :] o f f s e ot sf t a r ot pf a g e
di ,ax
bp,BALL-HEIGHT
dS
es
dS
;move f r o m VGA memory t o VGA memory
437
dipush
mov
cx.BALL-WIDTH
r e p movsb
;draw
a s c al inniomef a g e
di
POP
di.LOGICAL-SCREEN-WIDTH ; p o i n t t o n e x t d e s t i n a t i o n s c a n l i n e
add
dec
bp
DrawBall Loop
j nz
ds
POP
ret
endp
DrawBall
; W a i tf o rt h e
l e a d i n ge d g eo fv e r t i c a ls y n cp u l s e .
n e ap r o c
Wai tVSync
dx.INPUT-STATUS-1
mov
WaitNotVSyncLoop:
in
a1 .dx
and
a1 .VSYNC-MASK
jnz
Wai tNotVSyncLoop
WaitVSyncLoop:
in
a1 ,dx
and
a1 .VSYNC-MASK
WaitVSyncLoop
Jz
ret
endp WaitVSync
; W a i tf o rd i s p l a ye n a b l et oh a p p e n( p i x e l st ob es c a n n e dt o
; t h es c r e e n ,i n d i c a t i n gw e ' r ei nt h em i d d l eo fd i s p l a y i n g
a frame).
panning.
A d j u s t p a n n i n g n e ap r o c
dec
;timetoget
new p a n n i n gv a l u e s ?
[PanningRepl
DoPan
inz
mov
s i . C P a n n i n g C o n t r ;oplc1outionr rt el onct a t i no n
: p a n n i n gc o n t r o ls t r i n g
odsw
1
f a c t o r e p ep a tn n i n g; g e t
spctaor ninnntgri no? glf e n d ; a ta x . aaxn d
SetnewPanVal
jnz
ues
mov
s i . o f f s ePta n n i n g C o n t r o l S t r i n; gr e s etsott a orstft r i n g
1 odsw
f a c t o r e p ep a tn n i n g; g e t
SetNewPanValues:
mov
C P a n n i n g R e p; 1s e. at ~
new p a nr nevipnaegl ua et
1 odsw
mov
C P a n n i n g X I n; hco1r. ipazav~oannlntuai enl g
1 odsw
mov
C P a n n i n g Y I n; vcpe1avr.tanai lnc~uai enl g
mov
[ P a n n i n g C o n t r o l ;] s, sacivu er rleonctap taiinno n i n g
438
Chapter 23
: c o n t r o ls t r i n g
; Pan a c c o r d i n g t op a n n i n gv a l u e s .
OoPan:
PanLeft:
mov
and
js
jz
mov
in c
CmP
jb
sub
in c
j mp
ax,[PanningXIncl
ax, ax
PanLeft
CheckVerticalPan
a1 , [HPanl
a1
al.8
SetHPan
a1 .a1
[PanningStartOffsetl
s h o r t SetHPan
mov
a1 .[HPan]
a1
SetHPan
al.7
[PanningStartOffsetl
dec
jns
mov
dec
: h o r i z o n t apl a n n i n g
: n e g a t i v e meanspan
left
:pan r i g h t : i f p e pl a nr e a c h e s
move t o t h e
; n e x tb y t ew i t h
a p e lp a no f
: and a s t a r t o f f s e t t h a t ' s o n e
: higher
: 8. i t ' s t i m e t o
-1,
:pan l e f t : i f p e pl a nr e a c h e s
: i t ' s t i m e t o move t o t h e n e x t
: b y t e w i t h a p e lp a no f
7 and a
: s t a r to f f s e tt h a t ' so n el o w e r
SetHPan:
[HPanl .a1
; s a v e new p e lp a nv a l u e
mov
Checkvertical Pan:
mov
ax,[PanningYIncl
: v e r t i c apl a n n i n g
ax.ax and
: n e g a t i v e meanspanup
js
PanUp
jz
EndPan
add
[PanningStartOffsetl,LOGICAL_SCREEN_WIDTH
; p a nd o w nb ya d v a n c i n gt h es t a r t
; a d d r e s sb y
a s c a nl i n e
s hj mo pr t
EndPan
PanUp:
[PanningStartOffsetl.LOGICAL_SCREEN_WIDTH
sub
; p a nu pb yr e t a r d i n gt h es t a r t
: a d d r e s sb y
a s c a nl i n e
EndPan:
ret
: Draw t e x t u r e d b o r d e r a r o u n d p l a y f i e l d t h a t s t a r t s a t
DI.
n e aDr rparwoBc o r d e r
: Draw t h e l e f t b o r d e r .
dipush
rnov
cx.LOGICAL-SCREEN-HEIGHT / 16
DrawLeftBorderLoop:
mov .Ocha1
: s ecbrlofleolcdorctrk
Draw
c aBlol r d e r B l o c k
a dddi
.LOGICAL-SCREEN-WIDTH
* 8
mov .Oeha1
; s e yl eeclcblt ofl olworc rk
D r a wc B
a lol r d e r B l o c k
add
di.LOGICAL-SCREEN-WIDTH
* 8
D r al w
o oLpe f t B o r d e r L o o p
pop
di
: Draw t h er i g h tb o r d e r .
d ip u s h
439
add
di.LOGICAL-SCREEN-WIDTH - 1
mov
cx.LOGICAL-SCREEN-HEIGHT
/ 16
DrawRightBorderLoop:
mov
a1 .Oeh
; s e lyeeclcbtl ofl oow
l orcrk
D r a wc B
a lol r d e r B l o c k
add
di.LOGICAL-SCREEN-WIDTH * 8
.Ocha1mov
: s ecbrlofleolcdorctrk
cDarlal w B o r d eor cBkl
add
di.LOGICAL-SCREEN-WIDTH * 8
D r al w
o oRpi g h t B o r d e r L o o p
di
pop
;
Draw t h e t o p b o r d e r .
dipush
cx.(LOGICAL-SCREEN-WIDTH
- 2) / 2
mov
DrawTopBorderLoop:
di
inc
mov
a1 .Oeh
; s e lyeeclcbtl ofl oow
l orcrk
D r a wc B
a lol r d e r B l o c k
di
inc
mov
a1 .Och
; s ecbrl ofel olcdorct rk
cDarlal w B o r d eor cBkl
DrawTopBorderLoop
loop
di
pop
; Draw t h eb o t t o mb o r d e r .
di.(LOGICAL-SCREEN-HEIGHT
- 8 ) * LOGICAL-SCREEN-WIDTH
add
mov
cx.(LOGICAL-SCREEN-WIDTH - 2 ) / 2
DrawBottomBorderLoop:
di
inc
mov
a1 .Och
; s ecbrlofleolcdorctrk
D r a wc B
a lol r d e r B l o c k
dl
inc
mov
a1 .Oeh
; s e lyeeclctlbofl ow
loorcrk
cDarlal w B o r d eor cBkl
D r al w
o oBpo t t o m B o r d e r L o o p
ret
endp
DrawBorder
; D r a w sa n8 x 8b o r d e rb l o c k
; D I preserved.
incolorin
D r a w B o r d e r B lponrceokac r
dipush
SETREG SC-INDEX. MAP-MASK
mov
a1 . O f f h
rept8
stosb
add
di.LOGICAL-SCREEN-WIDTH
endm
POP
di
ret
DrawBorderBl ock endp
A d j u s t p a nenni dn pg
ends
cseg
startend
440
Chapter 23
AL a t l o c a t i o n
- 1
01.
Smooth Panning
The first thing youll notice upon running the sample program is the remarkable
smoothness with which the display pans from side-to-side and up-and-down. That
the display can pan at all is made possible by two VGA features: 256K of display
memory and thevirtual screen capability. Eventhe most memory-hungry of the VGA
modes, mode 12H (64Ox480), uses only 37.5Kper plane, for a total of 150K out of
the total 256K of VGAmemory. The medium-resolution mode, mode10H (640~350),
requires only 28K per plane, for atotal of 112K. Consequently, there is room in VGA
memory to store more than two full screens of video data in mode 1OH (which the
sample program uses), and there
is room inall modes to store alarger virtual screen
than is actually displayed. In the sample program, memoryis organized as two virtual
screens, each with a resolution of 672x384, as shownin Figure 23.2. The areaof the
virtual screen actually displayed at any given time is selected by setting the display
memory address at which to begin fetching video data; this is set by way of the start
address registers (Start Address High, CRTC register OCH, and Start Address Low,
CRTC register ODH). Together these registers make up a 16-bit displaymemory address at which the CRTC begins fetching dataat the beginningof each video frame.
Increasing the start address causes higher-memory areas of the virtual screen to be
A000 :0000
A000 :7 EO0
A000 : FCOO
Figure 23.2
441
displayed. For example, the Start Address High register could be set to SOH and the
Start Address Low register could be set to OOH in order tocause the display screen to
reflect memory starting at offset 8000H in each plane, rather than at the default
offset of 0.
The logical height of the virtual screen is defined by the amount of VGA memory
available. As the VGA scans display memory for video data, it progresses from the
start address toward higher memory one scan line at a time, until the frameis completed. Consequently, if the start address is increased, lines farther toward the bottom
of the virtual screen are displayed; in effect, the virtual screen appears to scroll up on
the physical screen.
The logical width of the virtual screen is defined by the Offset register (CRTC register 13H), which allows redefinition of the number of words of display memory
considered to makeup onescan line. Normally,
40 words of display
memory constitute a
scan line; after the CRTC scans these40 wordsfor 640 pixels worth data,
of it advances 40
words from thestart of that scan line to find the start of the nextscan line in memory.
This means thatdisplayed scan lines are contiguous in memory. However, the Offset
register can be set so that scan lines are logically wider (or narrower, for thatmatter)
than their displayed width.The sample program sets the Offset register to
2 A H , making
the logical width ofthe virtual screen 42 words, or 42 * 2 * 8 = 672 pixels, as contrasted
with the actual width of the mode 10h screen, 40 words or 640 pixels. The logical
height of the virtual screen inthe sample program is 384; this is accomplished simply
by
reserving 84 * 384 contiguous bytes ofVGA memory forthe virtual screen, where 84
is the virtual screen width in bytes and 384 is the virtual screen height inscan lines.
The start address is the key to panning around thevirtual screen. The start address
registers select the row of the virtual screen that maps to the top of the display;
panning down a scan line requires only that the start address be increased by the
logical scanline width in bytes, which is equal to the Offset register timestwo. The start
address registers select the column that mapsto the left edge of the display as well,
allowing horizontal panning, although in this case only relatively coarse byte-sized
adjustments-panning by eight pixels at a time-are supported.
Smooth horizontal panning is provided by the Horizontal Pel Panning register, AC
register 13H, working in conjunction with the start address. Up to 7 pixels worth of
single pixel panning of the displayed image to the left is performed by increasing
the Horizontal Pel Panning register from 0 to 7. This exhausts the rangeof motion
possible via the Horizontal Pel Panning register; the next pixels worth of smooth
panning is accomplished by incrementing thestart address by one andresetting the
Horizontal PelPanning register to0. Smooth horizontal panning should be viewed as a
series of fine adjustments in the 8-pixel range between coarse byte-sized adjustments.
A horizontal panning oddity: Alone among VGA modes, text mode (in most cases)
has 9 dots per character clock. Smooth panning in this mode requires cycling the
442
Chapter
23
443
image is written to the same VGA addresses; only the destination plane, selected by
the Map Mask register, is different. You might think of the balls image as consisting
of four colored overlays, whichtogether makeup a multicolored image. The sample
program writes a blank image to VGA memory by enabling all planes and writing a
block of zero bytes; the zero bytes are written to all four VGA planes simultaneously.
The images are written to a nondisplayed portion of VGA memory in order to take
advantage of a useful VGA hardware feature, the ability to copy all four planes at
once. As shown by the image-loading code discussed above, four different sets of
reads and writes-and several OUTs as well-are required to copy a multicolored
image into VGA memory as would be needed to draw the same image into a nonplanar pixel buffer. This causes unacceptably slow performance, all the more so
because the wait states that occur on
accesses to VGA memory make it
very desirable
to minimize display memory accesses, and because OUTs tend to be very slow.
The solution is to take advantage of the VGAs writemode 1,which isselected via bits
0 and 1 of the GC Mode register (GC register 5 ) . (Be careful to preserve bits 2-7
when setting bits 0 and 1, as is done in Listing 23.1.) In write mode 1, a single CPU
read loads the addressed byte from all four planes into the VGAs four internallatches,
and a single CPU write writes the contents of the latches to the fourplanes. During
the write, the byte written by the CPU is irrelevant.
The sample program uses write mode 1 to copy the images that were previously
drawn to the high end of VGA memory into a desiredarea of display memory, allin
a single block copyoperation. This is an excellent way to keep the numberof reads,
writes, and OUTs required to manipulate theVGAs display memory low enough to
allow real-time drawing.
The Map Mask register can still maskout planes in write mode 1.All four planes are
copied in thesample program because the Map Mask register is still OFh from when
the blank image was created.
The animatedimages appear to move a bitjerkilybecause they are byte-aligned and
so must move a minimum of 8 pixels horizontally. This is easily solved by storing
rotated versions of all images in VGA memory, and then in each instance drawing
the correct rotation forthe pixel alignment atwhich the image is to be drawn; well
see this technique in action in Chapter 49.
Dont worry if youre not catching everything in this chapter on thefirst pass; the
VGA is a complicated beast, and learning about it is an iterative process. Well be
going over these features again, in different contexts, over the course of the rest of
this book.
Page Flipping
When animated graphics are drawn directly on the screen, with no intermediate
frame-composition stage, the image typically flickers and/or ripples, an unavoidable
444
Chapter 23
result of modifying display memory at the same time that it is being scanned for
video data. The display memoryof the VGA makes it possible to perform page flipping,
which eliminates such problems. The basic premise of page flippingis that one area
of display memory
is displayed while
another is being modified. The modifications never
affect an areaof memory as it is providing video data, so no undesirable side effects
occur. Once themodification is complete, the modified
buffer is selected fordisplay,
causing the screen to change to the new imagein a single framestime, typically 1/60th
or 1/70th of a second. The otherbuffer is then available for modification.
As described above, the VGA has 64K per plane, enough to hold
two pages and more
in 640x350 mode 10H, but not enough for two pages in 640x480 mode 12H. For
page flipping,two non-overlapping areasof display memoryare needed. The
sample
program uses two 672x384 virtual pages, each 32,256 bytes long, one starting at
A000:OOOO and the other starting at A000:7E00. Flipping between the pages is as
simple as setting the start address
registers to pointto one display area or the
otherbut, as it turns out,thats not as simple as it sounds.
The timing of the switch betweenpages is critical to achieving flicker-free animation.
It is essential that the program
never be modifying an areaof display memory asthat
memory is providing video data. Achieving this is surprisingly complicated on the
VGA, however.
The problemis as follows.The start addressis latched by the VGAs internal circuitry
exactly once per frame, typically (but not always on all clones) at the start of the
vertical sync pulse. The vertical sync status is, in fact, available as bit 3 of the Input
Status 0 register, addressable at 3BAH (in monochrome modes) or 3DAH (color).
Unfortunately, by the time the vertical sync status is observed by a program, the start
address for the next frame hasalready been latched, having happened the instant
the vertical sync pulse began. That means that its no good to wait for vertical sync to
begin, then set thenew start address; if we did that, wed have to wait until the next
vertical sync pulse to start drawing, because the page wouldntflip until then.
Clearly, what we want is to set the new start address, then wait for the start of the
vertical sync pulse, at which point we can be surethe page has flipped.However, we
cantjust set the start address and wait, because we might have the extreme misfortune toset one of the start addressregisters before the start of vertical sync and the
other after, resulting in mismatchedhalves of the start addressand a nasty jump of
the displayed image for oneframe.
One possible solution to this problem is to pick a second page start address that has
a 0 value for the lower byte, so only the StartAddress High register ever needs to be
set, but in the sample program inListing 23.1 Ivegone for generalityand always set
both bytes. To avoid mismatched start address bytes, the sample program waits for
pixel data tobe displayed, as indicated by the Display Enable status; this tells uswere
somewhere inthe displayed portion of the frame, far enough
away from vertical sync
so we can be sure the new start addresswill get used atthe next vertical sync. Once
Bones and Sinew
445
the Display Enable status is observed, the program sets the new start address, waits
for vertical sync tohappen, sets the new pel panning state, and thencontinues drawing. Don't worry about the details right now; page flipping will come up again, at
considerably greater length,in later chapters.
Waiting for the sync pulse has the side effect of causing program execution to synchronize to the VGA's frame rate of 60 or 70 frames per second, depending on the
display mode. This synchronization has the useful consequence of causing the program to execute at the same speed onany CPU that can draw fastenough to complete
the drawing in a single frame; the program just
idles for the rest of each frame that it
finishes before the VGA is finished displaying the previous frame.
An important pointillustrated by the sample program is that while the VGA's display
memory is far larger and more versatile than is the case with earlier adapters, it is
nonetheless a limited resource and must be used judiciously. The sample program
uses VGA memory to store two 672x384 virtual pages, leaving only1024 bytes free to
store images. In this case, the only imagesneeded are a colored ball and a blank block
with which to erase it, so there is no problem, butmany applications require dozens
or hundredsof images. The tradeoffs between virtual page size, page flipping, and
image storage must always be kept in mind when designing programs for theVGA.
To see the program runin 640x200 16-color mode, comment out the
EQU line for
MEDRES-VIDEO-MODE.
drawing pixels into display memory; all the write modes and read modes and set/
reset capabilities and everything else involved with manipulating display memory
really does work in the same way on all VGAs and VGA clones. That compatibility
isnt as airtight when it comes
to scanning pixels out of displaymemory and onto the
screen in certain infrequently-used ways, however.
The areas of incompatibility of which Im aware are illustrated by the sample program, and may in fact have caused you to see some glitches when you ran Listing
23.1. The problem, which arises only on certain VGAs, is that some settings of the
Row Offset register cause some pixels to be dropped ordisplaced to the wrong place
on the screen; often, this happens only in conjunction with certain start address
settings. (In my experience, only VRAM (Video RAM)-basedVGAs exhibit this problem, no doubt due
to the way that pixel data is fetched fromVRAM in large blocks.)
Panning andlarge virtual bitmaps can be made to work reliably,by careful selection
of virtual bitmap sizes and start addresses,but its difficult; thats
one of the reasons that
most commercial software
does not use these features,although a numberof gamesdo.
The upshot is that if youre going to use oversized virtual bitmaps and pan around
them, you should take great care to test your software on a wide variety of VRA
and DRAM-based VGAs.
447
Previous
Home
Now that we know whatthe VGA looks like in broadstrokes and have a sense of what
VGA programming is like, we can start looking at specific areas in depth. In the next
chapter, well take a look at thehardware assistance the VGA provides the CPU during display memory access. There are four latches and four ALUs in those chips,
along with some useful masks and comparators, and its that hardware thats the
difference between sluggish performance andmaking the VGA get up anddance.
Next
Previous
chapter 24
parallel processing with the vga
Home
Next
VGA
I'm going to begin o4detailed tour of the VGA at the heart of the flow of data through
the VGA the four ALhs built into the VGA's Graphics Controller (GC) circuitry.The
and XORing
&Us (one for each display memoryplane) are capable of ORing, ANDing,
CPU data and display memorydata together, as well as masking off some or all ofthe bits
in the data from affecting the find result. All the ALUs perform the same logicaloperation at any given time, but each ALU operates on a different display memory byte.
Recall that the VGA has four display memory planes, with one byte in each planeat
any given display memory address. All four display memory bytes operated on are
read fromand written to the same address, but each ALU operates on a byte that was
read from a different plane and writes the result to that plane. This arrangement
allows four display memory bytes to be modified by a single CPU write (which must
451
often be preceded by a single CPU read, as we will see). The benefit is vastly improved performance; if the CPU had to select each of the four planes in turn via
OUTSand perform the four
logical operations itself, VGA performance would slow
to a crawl.
Figure 24.1 is a simplified depiction of data flow around the&Us. Each ALU has a
matching latch, which holds the byte read from the corresponding plane
during the
last CPU read from display memory, even if that particular plane wasnt the plane
that theCPU actually read on thelast read access. (Only one byte can be read by the
CPU with a single display memory read; theplane supplying the byte is selected by
the Read Map register. However, the bytes at thespecified address in all four planes
are always read when the CPU reads display memory,and those four bytes are stored
in their respective latches.)
Each ALU logically combines the byte written by the CPU and the byte stored in the
matching latch, according to the settings of bits 3 and 4 of the Data Rotate register
(and the Bit Mask register as well, which Ill cover next time), and then writes the
result to display memory.It is most important to understand that neither
ALU operand comes directly from display memory. The temptation is to think of the ALUs as
combining CPU data and thecontents of the display memory address being written
to, but they actuallycombine CPU data and the contents of the last displaymemory
location read, which need not be the location being modified. The most common
452
Chapter 24
application of the ALUs is indeed to modify a given display memory location, but
doing so requires a read from thatlocation to load the latches before the write that
modifies it. Omission of the readresults in awrite operation that logically combines
CPU data with whatever data happens to be in the latches from the last read, which
is normally undesirable.
Occasionally, however, the independence of the latches from the display memory
location being written to can be used to great advantage. The latches can be used to
perform 4byte-at-a-time (one byte from each plane) block copying; in this application, thelatches are loaded with a read from the
source area andwritten unmodified
to the destination area.The latches can be written unmodified in one of two ways:By
selecting write mode 1 (for anexample of this, see the last chapter), orby setting the
Bit Mask register to 0 so only the latched bits are written.
The latches can also be used to draw a fairly complex area fill pattern, with a different bit pattern used to fill each plane. The mechanism for this is as follows: First,
generate the desired pattern
across allplanes at any displaymemory address. Generating the pattern requires a separate write operation for each plane, so that each
plane's byte will be unique. Next, read that memory address to store the pattern in
the latches. The contents of the latches can now be written to memory any number
of times by using either write mode 1 or thebit mask, since they willnot change until
a read is performed. If the fill pattern does not require a different bit pattern for
each plane-that is, if the patternis blackand white-filling can be performed more
easily by simply fanning theCPU byte out to all four planes with writemode 0. The
set/reset registers can be used in conjunction with fanning out the data
to support a
variety of two-color patterns. More on this in Chapter 25.
The sample program in Listing 24.1 fills the screen with horizontal bars,then illustrates
the operation of each of the four ALU logical functions by writing avertical SO-pixel-wide
box filled with solid, empty, and vertical and horizontal bar patterns over that background using each of the functions in turn. When observing the outputof the sample
program, it is important to remember that all four vertical boxes are drawn with exactly
the same code-only the logical function that is in effect differs from box to box.
All graphics in the sample program are done in black-and-white by writing to all
planes, in order to show the operation of the ALUs most clearly. Selectiveenabling
of planes via the Map Maskregister and/or set/reset would produce color effects; in
that case, the operation of the logical functions must be evaluated on a plane-byplane basis, since only the enabled planes would be affected by each operation.
LISTING 24.1
124- 1.ASM
: Program t o i l l u s t r a t e o p e r a t i o n o f
453
stack
segment
para
stack
'STACK'
512 d u p ( ? )
db
stack
ends
VGA-VIDEO-SEGMENT
350
SCREEN-HEIGHT
SCREEN-WIDTH-IN-BYTES
DEMO-AREA-HEIGHT
equ
equ
equ
equ
336
DEMO-AREA-WIDTH-IN-BYTES
equ
VERTICAL-BOX-WIDTH-IN-BYTES
OaOOOh
80
40
equ 10
VGA r e g i s t e re q u a t e s .
GC-INDEX
GC-ROTATE
GC-MODE
dseg
segment
para
3ceh
equ
3 equ
equ
;GC i nrdeegxi s t e r
:GC rdoattaat e / l o gf ui cnacl t i o n
: r e g i s t e ri n d e x
:GC mode r e g iisntdeer x
common 'DATA'
: S t r i n g used t ol a b e ll o g i c a lf u n c t i o n s .
L a b be yl sat etbr ei nl g
db
'UNMODIFIED
LABEL-STRING-LENGTH
equ
AND
OR
S-LabelString
: S t r i n g s used t o l a b e l
fill p a t t e r n s .
F iPatternFF
11
db
FILL-PATTERN-FF-LENGTH
F i 11 P a t t e r n 0d' F0bPi lal t t e r0n0: 0 h '
FILL-PATTERN-00-LENGTH
F i 11 P a t t e r n V e r t d b
FILL-PATTERN-VERT-LENGTH
F i 11 P a t t e r n H o rdzb
FILL-PATTERN-HORZ-LENGTH
'Fill P a t t e r n : OFFh'
S - FillPatternFF
equ
XOR
'
S - FillPattern00
equ
'Fill P a t t e r n :V e r t i c a B
l ar'
equ
S - FillP a t t e r n V e r t
'Fill P a t t e r nH: o r i z o n t aBl a r '
equ
S - FillPatternHorz
ends
dseg
: Macro t o s e t i n d e x e d r e g i s t e r
SETGC
macro
mov
mov
dx.ax out
endm
INDEX o f GC c h i p t o SETTING.
INDEX. SETTING
dx, GC-INDEX
ax.(SETTING SHL 8 ) OR I N D E X
: Macro t o c a l l BIOS w r i t e s t r i n g f u n c t i o n t o d i s p l a y t e x t s t r i n g
: TEXT-STRING. o f l e n g t h TEXT-LENGTH, a tl o c a t i o n ROW.COLUMN.
TEXT-UP macro
TEXT-STRING.
TEXT-LENGTH.
mov :BIOS
ah.13h
mov
b p . o f f s e t TEXT-STRING
cx.TEXT-LENGTH
mov
454
Chapter 24
ROW. COLUMN
f u nsct tr i w
on ngr i t e
;ES:BP p o i n tsost r i n g
dx.(ROW
SHL
mov
sub
mov
int
endm
8 ) OR COLUMN
: s ct rhii soan nrgcsluy r,ns o tr
: taetxt rt w
i bh(iusli itggerhaty )
a1 ,a1
b l ,7
10h
:position
moved
'CODE'
cseg
segment
para
public
assume cs:cseg.
ds:dseg
npesratoarcr t
mov
dseg
ax,
mov
ds.ax
: S e l e c t 640x350 g r a p h i c s mode.
mov
int
: ES p o i n t s t o
ax.010h
10h
VGA memory.
ax,VGA-VIDEO-SEGMENT
mov
mov
es.ax
: Draw b a c k g r o u n do fh o r i z o n t a lb a r s .
mov
dx,SCREEN_HEIGHT/4
:# o f b a r s t o draw(each 4 p i x e l s h i g h )
sub
:.sddotiaif ftrst e t
0 d ii ns p l a y
memory
mov
ax.0ffffh
:fill p a t t elf io
rgn
arhr be
t oafrss
bx.DEMO-AREA-WIOTH-IN-BYTES
/ 2 : l e n g toh
efa cbha r
mov
mov
si.SCREEN-WIOTH-IN-BYTES
- DEMO-AREA-WIDTH-IN-BYTES
bp.(SCREEN-WIDTH-IN-BYTES
* 3 ) - DEMO-AREA-WIDTH-INKBYTES
mov
BackgroundLoop:
mov
bxcx.
: l e n g t ho fb a r
b a r o f h a l ft o: d
ps rt ao w
sr w
ep
: p o i n tt os t a r to fb o t t o mh a l fo fb a r
add
di.si
mov
bxcx,
: l e n g t ho fb a r
bar of b
ho
a: dstl ftrtoaom
w
srw
ep
addin:,dbpebost aoxpfttoarpitofnr t
dx
dec
jnz
BackgroundLoop
: D r a w v e r t i c a l boxes f i l l e d w i t h a v a r i e t y o f
: u s i n ge a c ho ft h e
4 l o g i c a lf u n c t i o n si nt u r n .
GC-ROTATE.
SETGC
fill p a t t e r n s
u:nsdm
ea ltoeadcitf i e d
: l o g i c a lf u n c t i o n . .
mov
d i .O
D r acwaVl lebr toid;cx.ra.a.law
Bnodx
SETGC
mov
call
GC-ROTATE, 08h
d i .10
DrawVerticalBox
box
draw
:...and
SETGC
call
GC-ROTATE, 1: s0ehl e c t
d i .20
DrawVerticalBox
SETGC
mov
call
GC-ROTATE,
d i .30
D r a w V ec ratli
mo v
: s e l e c t AND l o g i fcuanl c t i o n . .
OR l o g i fcuanl c t i o n . .
;
. . .and
draw
box
1: s8ehl e c t
Box
X O R l o g i fcuanl c t i o n
...
:...and
draw
box
455
: R e s e tt h el o g i c a lf u n c t i o nt od a t au n m o d i f i e d ,t h ed e f a u l ts t a t e .
0
SETGC
GC-ROTATE.
: L a b e tl h es c r e e n .
ds push
POP
;esst r i nw
ge
sd' il sl p laapryae s s teod
: by p o i n t i n g ES:BP t o them
VGA BIOS'S
: L a b e lt h el o g i c a lf u n c t i o n s ,u s i n gt h e
: w r i t es t r i n gf u n c t i o n .
TEXT-UP
L a b e l s t r i n g , LABEL-STRING-LENGTH,
: L a b e tl h e fill p a t t e r n s ,u s i n gt h e
: w r i t es t r i n gf u n c t i o n .
TEXT-UP
TEXT-UP
TEXT-UP
TEXT-UP
24. 0
VGA BIOS'S
F i l l P a t t e r n F F . FILL-PATTERN-FF-LENGTH.
3.42
F i l l P a t t e r n 0 0 . FILL-PATTERN-00-LENGTH.
9, 42
F i l l P a t t e r n V e r t . FILL-PATTERN-VERT-LENGTH.
15.42
F i l l P a t t e r n H o r z , FILL-PATTERN-HORZ-LENGTH.
21.42
: Wait u n t i l a key'sbeen
WaitForKey:
mov
int
jz
BIOS
h i tt or e s e ts c r e e n
mode & e x i t .
ah.1
16h
WaitForKey
: F i n i s h e dC
. l e a rk e y r, e s e st c r e e n
mode and e x i t .
Done:
mov : c l e aarh . 0
int
16h
Zlh
start
k e yt h a t
mov
int
ax.3
10h
:reset t o t e x t
mov
int
ah.4ch
: e xt oi t
we j u s t d e t e c t e d
mode
DOS
endp
: S u b r o u t i n et od r a w
a box80x336 i n s i z e , u s i n g c u r r e n t l y s e l e c t e d
: l o g i c a lf u n c t i o n ,w i t hu p p e rl e f tc o r n e r
a t t h ed i s p l a y memory o f f s e t
:
:
:
:
:
i n D I . Box i sf i l l e dw i t hf o u rp a t t e r n s .
Top q u a r t e ro fa r e ai s
f i l l e d w i t h OFFh ( s o l i d )p a t t e r n ,n e x tq u a r t e ri sf i l l e dw i t h
OOh
( e m p t y )p a t t e r n ,n e x tq u a r t e ri sf i l l e dw i t h
3 3 h( d o u b l ep i x e w
l ide
v e r t i c a lb a r )p a t t e r n ,
and b o t t o mq u a r t e ri sf i l l e dw i t hd o u b l ep i x e l
h i g hh o r i z o n t abl a pr a t t e r n .
456
Chapter 24
b ohxe i g h t
RowLoop:
mov
ColumnLoop:
mov
cx.WIDTH
ah.es:ldil
stosb
1oop
add
dx
dec
jnz
endm
: l o a dd i s p l a y memory c o n t e n t s i n t o
: GC l a t c h e s (we d o n ' ta c t u a l l yc a r e
: a b o u tv a l u er e a di n t o
AH)
: w r i t ep a t t e r n ,w h i c hi sl o g i c a l l y
: c o m b i n e dw i t hl a t c hc o n t e n t sf o re a c h
: p l a n e a n dt h e nw r i t t e nt od i s p l a y
: memory
Col umnLoop
- WIDTH
:pointtostartofnextline
di.SCREEN_WIDTH_IN-BYTES
down i n box
RowLoop
D r a w V e r tci a l Box p r o nc e a r
DRAW-BOXQUARTER
Offh.
VERTICALLBOX-WIDTHKIN-BYTES
: f i r s t fill p a t t e r n :s o l i d
fill
0. VERTICAL_BOX-WIDTHKIN-BYTES
:second fill p a t t e r n : empty fill
033h. VERTICAL-BOXKWIDTHKIN-BYTES
DRAWKBOXLOUARTER
: t h i r d fill p a t t e r n :d o u b l e - p i x e l
: w i d ev e r t i c a lb a r s
mov
dx.DEMOKAREALHEIGHT / 4 / 4
: f o u r t h fill p a t t e r n :h o r i z o n t a lb a r si n
: s e t s o f 4 s c a nl i n e s
ax.ax
sub
mov
si.VERTICAL-BOXKWIDTH-IN-BYTES
: w i d t ho f
fill a r e a
HorzBarLoop:
dec
ax
; O f f h fill ( s m a ltloe r
word
than
byte
do
DEC)
mov
cx, s i
: wt oi d t h
fill
HBLoopl:
mov
b. el s : [ d:illo al da t c h e( sd o nc' ta raeb o uv ta l u e )
: w r i t es o l i dp a t t e r n ,t h r o u g h
ALUs
stosb
1oop
HBLoopl
add
di.SCREEN-WIDTH_IN-BYTES - VERTICAL_BOX_WIDTH-IN_BYTES
mov t o: w icdxt .hs i
fill
HBLoopE:
mov
b, el s : [ d i l
oad
:1
1a t c h e s
: w r i t es o l i dp a t t e r n ,t h r o u g h
ALUs
stosb
1oop
HBLoopE
add
di.SCREEN-WIDTH-IN-BYTES - VERTICAL-BOX-WIDTH-IN-BYTES
inc
:O fill ( s m a ltloe r word
than
do
byte
DEC)
ax
mov t o: w i cdxt h. s i
fill
HBLoop3:
mov
bl.es:[dil
:1oad 1 a t c h e s
: w r i t e empty p a t t e r n ,t h r o u g h ALUs
stosb
1 oop
HBLoop3
add
di,SCREEN-WIDTH-IN-BYTES
- VERTICAL-BOX_WIDTH-IN_BYTES
mov t o: w icdxt ,hs i
fill
HBLoop4:
mov
. ebs l::[ldol aialtdc h e s
: w r i t e empty p a t t e r n ,t h r o u g h ALUs
stosb
1 oop
HBLoop4
add
di.SCREENKWIDTH-IN_BYTES - VERTICALLBOXKWIDTH-IN-BYTES
dec
dx
jnz
HorzBarLoop
DRAW-BOX-OUARTER
457
ret
OrawVerticalBoxendp
ends
cseg
end
start
Logical function 0, which writesthe CPU data unmodified, is the standard modeof
operation of the ALUs. In this mode, the CPU data is combined with the latched
data by ignoring the latched data
entirely. Expressed as a logical function, this could
be considered CPU data ANDed with 1 (or ORed with 0). This is the mode to use
whenever you want to place CPU data into display memory, replacing the previous
contents entirely. It may occur to you that there is no need to latch
display memory
at all when the data unmodified functionis selected. In thesample program, that is
true, but if the bit mask is being used, the latchesmust be loaded even for the data
unmodified function, as 111discuss in the next chapter.
Logical functions 1 through 3 cause the CPU data tobe ANDed, ORed, and XORed
with the latched data,respectively. Ofthese, XOR is the most useful, since exclusiveORing is a traditionalway to perform animation. The
uses ofthe AND and OR logical
functions areless obvious.AND can be used to mask a blank area intodisplay memory,
or to mask off those portions of a drawing operation that dont overlap an existing
display memory image. OR could conceivably be used to forcean image into display
memory over an existing image. To be honest, I havent encountered any particularly valuable applications for AND and OR, but theyre the sortof building-block
features that could comein handy in just the right
context, so keep them in mind.
458
Chapter 24
which requires fourinstruction bytes. By shifting the value destined for the highbyte
into the high byte with MASMs shift- left operator, SHL (*100H would work also),
and then logically combining the values with MASMs OR operator (or the ADD
operator), bothhalves of DX can be loaded with a single instruction, as in
MOV D X , ( V A L U E ES H L
8 ) O RV A L U E 1
which takes onlythree bytes and is faster, being a single instruction. (Note, though,
that in 32-bit protected mode, theres a size and performance penalty for 16-bit instructions such as the MOV above; see the first part of this book for details.) As
shown, a macro is an ideal place to use this technique; the macro invocation can
refer to two separate byte values, making matters easier for the programmer, while
the macro itself can combine the values into a single word-sized constant.
A minor optimization tip illustrated in the listing is the use of INCAX and DEC
AX in the DrawVerticalBox subroutine when only AL actually needs to be modified. Word-sized register increment and decrement instructions (or dword-sized
Parallel Processing with the VGA
459
Previous
Home
instructions in 32-bit protected mode) are only one byte long, while byte-sized
register increment and decrement instructions are two bytes long. Consequentb,
when size counts, it is worth using a whole 16-bit (or 32-bit) register instead of the
low 8 bits of that register for INC and DEC-ifyou don 't need the upper portion
of the register for any other purpose, or
ifyou can be sure that the INC or DEC
won't aflect the upperpart of the registex
The latches and ALUs are central to high-performance VGA code, since they allow
programs to process across all four memory planes without a series of OUTS and
read/write operations. It
is not always easyto arrange a program to exploit
this power,
however, because the &Us are far more limited than a CPU. In many instances,
however, additional hardware inthe VGA, including the bitmask, the set/reset features, and the barrel shifter, can assist the ALUs in controlling data, as we'll see in
the nextfew chapters.
460
Chapter 24
Next
Previous
chapter 25
vga data machinery
Home
Next
Ch
*&4#
Set/Reset
In the last chapter, amined a
tion of the VGA,
model
to
include
only the write mo
tation
expanded model of GC data flow, featuring the barrel shifter
and bit mask circui
Lets look at the barrel shifter first. A barrel shifter is circuitry
capable of shifting-ok rotating, in the VGAs case-data an arbitrary number of bits
in a single operation, as opposed to being able to shift only one bit position at a time.
The barrel shifter in theVGA can rotate incoming CPU data up to seven bits to the
right (toward the least significant bit), with bit 0 wrapping back to bit 7, after which
the VGA continues processing the rotatedbytejust as it normally processes unrotated
CPU data. Thanks to the nature of barrel shifters, this rotation requires no extra
processing time over unrotated VGA operations. The number of bits by which CPU
data is shifted is controlled by bits 2-0 of GC register 3, the Data Rotate register,
which also contains the ALU function select bits (data unmodified, AND, OR, and
XOR) that we looked at in thelast chapter.
463
Data
Data flow
flow through
through the
the Graphics
Graphics Controller:
Figure 25.1
Figure 25.1
464
Chapter 25
theyre generally referred to in the singular, asthe bit mask. Figure 25.2 illustrates
the operation of one bit of the bit mask for one plane. This circuitry occurs eight in
times
the bit maskfor a given plane, once for
each bit of the byte written to display memory.
Briefly, the bit mask determines on a bit-by-bit basiswhether thesource for each byte
written to display memory isthe ALU for that plane or the latch for that plane.
The bit mask iscontrolled by GC register 8, the Bit Maskregister. If a given bit of the
Bit Maskregister is 1, then the correspondingbit of data from theALUs iswritten to
display memory for all four planes, while if that bit is 0, then the correspondingbit
of data from thelatches for the fourplanes is written to display memory
unchanged.
written to display memory
(In write mode 3, the actual bit mask thatsapplied to data
written by the
is the logical AND of the contentsof the Bit Maskregister and the data
CPU, as well seein Chapter 26.)
The most common use of the bit mask is to allow updating of selected bits within a
display memory byte. This works as follows: The display memory byte of interest is
latched; thebit mask is set to preserve all but thebit or bits to be changed; theCPU
writes to display memory, with
the bit mask preserving the indicated latchedbits and
allowing ALU data through to change the other bits. Remember, though, that it is
not possible to alterselected bits in a display memory bytedirectly; the byte must first
be latched by a CPU read, and then the
bit mask can keep selected bits ofthe latched
byte unchanged.
Listing 25.1 showsa program that uses the bit mask data rotation capabilities of the
GC to draw bitmapped text at any screen location. The BIOS only draws characters
VGA Data Machinery 465
125- 1.ASM
LISTING 25.1
: Program t o i l l u s t r a t e o p e r a t i o n o f d a t a r o t a t e a n d b i t
mask
:
:
f e a t u r e so G
f r a p h i c sC o n t r o l l e r D
. r a w s8 x 8c h a r a c t e ar t
s p e c i f i e dl o c a t i o n u, s i n g
VGA's 8 x 8 ROM f o n tD. e s i g n e d
: f o ru s ew i t h
modes OOh, OEh. OFh. 10h. and1Zh.
: By M i c h a e Al b r a s h .
s t a cske g m e npta rsat a c k
d u p5(1?2) d b
set na dc ks
'STACK'
equ
VGACVIOEOCSEGMENT
044ah equ
SCREEN-WIDTH-INCBYTES
equ
FONT-CHARACTER-SIZE
OaOOOh
:VGA d i s p l a y memorysegment
: o f f s e t o f BIOS v a r i a b l e
:# b y t e s i n e a c h f o n t c h a r
: VGA r e g i s t e re q u a t e s .
GC-INDEX
GC-ROTATE
;GC r iengdi es xt e r
:GCr odt a t ae / l of ug ni ccat il o n
: r e g i s t e ri n d e x
;GC b i t mask r e g iisntdeer x
3cehequ
3 equ
GC-BIT-MASK
equ
d s seegg m epnat r a
TEST-TEXT-ROW
equ
TEST-TEXT-COL
equ
TEST-TEXT-WIDTH 8 equ
Teststring
db
d dF o n t P o i n t e r
ends
dseg
common 'DATA'
69
:row t o d i s p l a y t e s t t e x t a t
17
;column t o d i s p l a y t e s t t e x t a t
: w i d t ho fac h a r a c t e ri np i x e l s
lbaybteel
'Helw
l oo, r l d ! ':.tOesst rt pi tnroign t .
?
offset :font
: Macro t o s e t i n d e x e d r e g i s t e r
SETGC
macro
mov
mov
dx.ax out
endm
I N D E X . SETTING
dx.GC-INDEX
ax,(SETTING SHL 8 ) OR INDEX
c s es ge g m epnat pr au b l i c
assume
cs:cseg,
ds:dseg
npsertaoarcr t
mov
ax,dseg
mov
ds,ax
: S e l e c t6 4 0 x 4 8 0g r a p h i c s
mov
int
: S e td r i v e r
466
Chapter 25
INDEX o f GC c h i p t o
'CODE'
mode.
ax.012h
10h
t o u s et h e8 x 8f o n t .
SETTING.
mov
ah.llh
mov
a1 .30h
p o i nf ot mov
enrt 8 x 8 : g e bt h , 3
int
10h
cSea ecl llt F o n t
:VGA B I O S c h a r a c t e rg e n e r a t o rf u n c t i o n ,
: r e t u r ni n f os u b f u n c t i o n
: P r i n tt h et e s ts t r i n g .
mov
. so if f sTeet s t S t r i n g
mov
bx.TEST_TEXT_ROW
mov
cx.TEST_TEXT_COL
StringOutLoop:
1 odsb
and
a1 .a1
jz
StringOutDone
call
DrawChar
add
cx.TEST_TEXT_WIDTH
S t r i n g jOmupt L o o p
StringOutDone:
: R e s e tt h ed a t ar o t a t ea n db i t
mask r e g i s t e r s .
0
SETGC
GC-ROTATE.
SETGC
GC_EJT_MASK, O f f h
: W a i tf o r
a keystroke.
mov
int
ah.1
21h
: R e t u r nt ot e x t
mode
mov
int
ax,03h
10h
: E x i t t o DOS.
mov
int
ah.4ch
21h
Setnadr tp
: S u b r o u t i n et od r a w
a t e x tc h a r a c t e ri n
(ODh, OEh. OFh. 0 1 0 h0.1 2 h ) .
: F o n tu s e ds h o u l db ep o i n t e dt ob yF o n t P o i n t e r .
a l i n e a rg r a p h i c s
mode
: Input:
: AL
c h a r a c t e rt od r a w
: EX
r o w t o d r a wt e x tc h a r a c t e ra t
C X - column t od r a wt e x tc h a r a c t e ra t
F o r c e s ALU f u n c t i o n t o
DrawChar
push
push
push
push
push
push
push
push
"move".
n e pa r o c
ax
bx
cx
dx
si
di
bP
ds
467
: Set DS:SI t o p o i n t t o f o n t
and ES t o p o i n t t o d i s p l a y
memory.
memory
: C a l c u l a t es c r e e na d d r e s so fb y t ec h a r a c t e rs t a r t si n .
push
sub
mov
xchg
mov
ds
dx, dx
ds .dx
ax.bx
POP
mu1
push
mov
and
shr
shr
shr
add
ds
di
di
di .cx
cl .Olllb
d i .1
d i .1
d i .1
di ,ax
: p tooi n t
B I O S sdeagtm
a ent
di.ds:[SCREEN-WIDTH-IN-BYTES]
: r e t r i e v e BIOS
: s c r e e nw i d t h
c a l c u l a t eo f f s e to fs t a r to fr o w
s e ta s i d es c r e e nw i d t h
s e ta s i d et h ec o l u m n
k e e po n l yt h ec o l u m ni n - b y t ea d d r e s s
d i v i d ec o l u m nb y
and p o i n t t o b y t e
8 t o make a b y t ea d d r e s s
: C a l c u l a t ef o n ta d d r e s so fc h a r a c t e r .
bh.bh
bx.1
bx.1
bx.1
sub
shl
shl
shl
add
: S e tu pt h e
mov
mov
mov
dx.ax out
;assumes8
b y t e sp e rc h a r a c t e r :u s e
: a m u l t i p l yo t h e r w i s e
: o f f s e ti nf o n to fc h a r a c t e r
.: bos xif f soisnenetg mceohnfat r a c t e r
GC r o t a t i o n .
dx, GC-INDEX
a1 ,GC-ROTATE
ah.cl
: Setup BH as b i t mask f o r l e f t h a l f ,
: EL as r o t a t i o n f o r r i g h t h a l f .
mov
bh.cl shr
cl
neg
c1.8 add
, c bl sl h l
bx.0ffffh
: Draw t h e c h a r a c t e r , l e f t h a l f f i r s t , t h e n r i g h t h a l f i n t h e
; s u c c e e d i n gb y t e ,u s i n gt h ed a t ar o t a t i o nt op o s i t i o nt h ec h a r a c t e r
: a c r o s st h eb y t eb o u n d a r ya n dt h e nu s i n gt h eb i t
: p r o p e rp o r t i o no ft h ec h a r a c t e ri n t oe a c hb y t e .
;
mask t o g e t t h e
Does n o tc h e c kf o rc a s ew h e r ec h a r a c t e ri sb y t e - a l i g n e da n d
: n or o t a t i o na n do n l yo n ew r i t ei sr e q u i r e d .
cx
468
mov
bp.FONT-CHARACTER-SIZE
mov
d x , GC-INDEX
POP
swcbirdaet;cehgkne t c x
dec
; - 2 because do tbwyeo
ft aoec crhs ha r
cx
dec
Chapter 25
CharacterLoop:
: S e tt h eb i t
mask f o r t h e l e f t h a l f o f t h e c h a r a c t e r .
mov
mov
dx,ax out
a1 .GC..BIT-MASK
ah.bh
: G e tt h en e x tc h a r a c t e rb y t e
;
mov
mov
stosb
;
memory.
( L e f th a l fo fc h a r a c t e r . )
S e tt h eb i t
a1 , [ s i ]
ah.es:[dil
; g e tc h a r a c t e rb y t e
; l o a dl a t c h e s
; w r i t ec h a r a c t e rb y t e
mask f o r t h e r i g h t h a l f o f t h e c h a r a c t e r .
mov
mov
dx.ax out
a1 .GC~LBIT_MASK
ah.bl
: G e tt h ec h a r a c t e rb y t ea g a i n
: ( R i g h th a l fo fc h a r a c t e r . )
1 odsb
mov
stosb
ah.es:[dil
& w r i t e i t t o d i s p l a y memory.
; g e tc h a r a c t e rb y t e
: l o a dl a t c h e s
: w r i t ec h a r a c t e rb y t e
; P o i n tt on e x tl i n eo fc h a r a c t e r
add
i n d i s p l a y memory.
. c xd i
bp
dec
C h a r a cj nt ez r L o o p
POP
POP
pop
pop
POP
POP
POP
POP
ret
endp
ds
bp
di
si
dx
cx
bx
ax
DrawChar
: S e tt h ep o i n t e rt ot h ef o n tt od r a wf r o mt o
ES:BP.
enceptS
arForeocln t
mov
mov
ret
eSnedlpe c t F o n t
[ Fw
po tnor tr P
d o i n t e r: ps] .aobvipnet e r
w[ poFtrordn t P o i n t e r + Z ] . e s
ends
cseg
startend
well see in Chapter 35. Its also useful for drawing the edges of primitives, such as
filled polygons, that potentially involve modifylng some but notall ofthe pixels controlled by a single byte of display memory.
Basically, the bit mask is handy whenever only some of the eight pixels in a byte of
display memory need tobe changed, because it allows full use of the VGAs four-way
parallel processing capabilities for the pixels that are to be drawn, without interfering with the pixels that are to be left unchanged. The alternative
would be
plane-by-plane processing, which from a performance
perspective would be undesirable indeed.
Its worth pointing out again that the bitmask operates on the datain the latches,
not on thedata in display memory.This makes the bit mask a flexible resource that
with a little imagination can be used for some interesting purposes. For example,
you could fill the latches with a solid background color (by writing the color somewhere in display memory, then reading that location to load the latches), and then
use the Bit Mask register (or write mode 3, as well see later) as a mask through
which to draw a foregroundcolor stencilled into the background color
without reading display memory first. This only works for writing whole bytes at a time (clipped
bytes require theuse ofthe bit mask; unfortunately, were already using it forstencilling in this case), but it completely eliminates reading display memory and does
foreground-plus-background drawing in one blurry-fast pass.
The example code in Listing 25.1 is designed to illustrate the use of the Data Rotate
and Bit Mask registers, and is not as fast or as complete as it might be. The case
where text is byte-aligned could be detected and performed much
faster, without the
use of the Bit Maskor Data Rotate registers and with onlyone display memory access
per fontbyte (to write the font byte), rather than fourread
(to display memory and
write the font byte to each of the two bytes the character spans). Likewise, nonaligned text drawing could be streamlined to one display memory access per byte by
having the CPU rotate and combine the fontdata directly, rather than setting up the
VGAs hardware to do it. (Listing 25.1 was designed to illustrate VGA data rotation
and bit masking rather than thefastest way to draw text. Well see faster text-drawing
code soon.) Oneexcellent rule of thumb is to minimize display memory accesses of
all types, especiallyreads, which tend to be considerably slower than writes. Also, in
470
Chapter
25
Listing 25.1 it would be faster to use a table lookup to calculate the bit masks for the
two halves of each character rather than theshifts used in the example.
For another (and morecomplex) example of drawing bit-mapped text on theVGA,
seeJohn Cockerhams article, Pixel AlignmentEGA
of Fonts,PC TechJournaZ,January,
1987. Parenthetically, Id like to pass along Johns comment about the
VGA When
programming theVGA, everything is complex.
Hes got a point there.
Data
flow
Figure 25.3
Dataflow
Figure 25.3
471
the desiredcolor. For example, theMap Maskregister would be set to 09H to draw in
high-intensity blue; here, bits 0 and 3 are set to 1, so only the blue plane (plane 0)
and the intensity plane (plane 3) are written to.
Remember, though, that planes that are disabled
by the Map Mask register are not
written to or modified in any way. This means that the
above approach works onlyif
the memory being written to is zeroed; if, however, the memory already contains
non-zero data, that data
will remain in the planes disabled
by the Map Mask,and the
end result will be that some planes contain the data just written and other planes
contain old data. In short, color control
using the Map Maskdoes not force all planes
to contain the desiredcolor. In particular, itis not possible to force some planes to
zero and otherplanes to one in a single write with the Map Mask register.
The program in Listing 25.2 illustrates this problem. A green pattern (plane1 set to
1, planes 0, 2, and 3 set to 0) is first writtento display memory. Display memoryis then
filled with blue (only
plane 0 set to 1),with a Map Mask setting of 01H. Where the blue
crosses the green, cyan is produced, rather than blue,
because the Map Mask register
setting of 01H that produces blue leaves the green plane (plane 1 ) unchanged. In
order to generate blue unconditionally, would
it
be necessary to set the Map Mask
register to OFH, clear memory, and then set the Map Mask register to 01H and fill
with blue.
LISTING 25.2
L25-2.ASM
; Program t o i l l u s t r a t e o p e r a t i o n
o f Map Mask r e g i s t e r when d r a w i n g
; t o memory t h a ta l r e a d yc o n t a i n sd a t a .
; By M i c h a e A
l brash.
equ
OaOOOh
; S C riengdies xt e r
;SC map mask r e g i s t e r
EGA r e g i s t e re q u a t e s .
SC-INDEX
SC-MAP-MASK
3c4hequ
equ
I N D E X o f SC c h i p t o
; Macro t o s e t i n d e x e d r e g i s t e r
macro
I N D E X , SETTING
mov
dx.SC-INDEX
mov
a1 , I N D E X
dx,al out
dx
inc
mov ,SETTING
a1
dx.al out
dx
dec
endm
SETSC
c s es eg g m epnatpr au b l i c
assume
cs:cseg
472
;EGA d i s p l a y memory
segment
Chapter 25
'CODE'
SETTING.
n pesratoar cr t
: S e l e c t6 4 0 x 4 8 0g r a p h i c s
mov
int
video
mode.
ax.012h
10h
mov
ax.EGA-VIDEO-SEGMENT
mov
t o ; p o i nets . a x
memory
: D r a w2 41 0 - s c a n - l i n eh i g hh o r i z o n t a lb a r si ng r e e n ,1 0s c a nl i n e sa p a r t .
SETSC
SC-MAP_MASK.OLh
bar
a1 . O f f h
bp.24
cx.80*10
; I bh p
yotereirzs obnat ar l
d i .80*10
: p osit tnonaoterbf xt at r
: F i l ls c r e e nw i t hb l u e ,u s i n g
Map Mask r e g i s t e r t o e n a b l e w r i t e s
: t ob l u ep l a n eo n l y .
SC-MAP-MASK.Olh
SETSC
, d idsiu b
mov 8 0 * 4c8x0,
mov
a1 . O f f h
r e ps t o s b
: W a i tf o r
on1 y
s c:#
r epeebnry t e s
; p e r f o r m fill ( a f f e c t so n l y
: p l a n e 0. t h eb l u ep l a n e )
a keystroke.
mov
in t
: R e s t o r et e x t
mov
in t
ah.1
21h
mode.
ax.03h
10h
: E x i t t o 00s.
start
cseg
mov
in t
endp
ends
end
ah.4ch
21h
start
473
For each of the bits 0-3 in the Enable Set/Reset register (Graphics Controller register 1) that is 1, the corresponding bit in the Set/Reset register (GC register 0) is
extended to a byte (0 or OFFH) and replaces the CPU data for the corresponding
plane. For each of the bits in theEnable Set/Reset register that is 0, the CPU data is
used unchanged for that plane (normal operation).
For example, if the Enable Set/
Reset register is set to 01H and the Set/Reset register is set to 05H, then the CPU
data is replaced for plane0 only (the blue plane), and the
value it is replaced with is
OFFH (bit 0 of the Set/Reset register extended to a byte).Note that in this case, bits
1-3 of the Set/Reset register have no effect.
It is important to understand that the set/resetcircuitry directly replaces CPU data
in Graphics Controller dataflow. Refer back to Figure 25.3 to see that the outputof
the set/reset circuitry passesthrough (and may be transformedby) the ALU and thebit
mask before being written to memory, and even then the Map Mask register must
enable the write. When using set/reset, it is generally desirable to set theMap Mask
register to enable all planes the set/reset circuitry is controlling, since those memory
planes which are disabled by the Map Mask register cannot be modified, and the
purpose of enabling set/reset for a planeis to force that plane to be set by the set/
reset circuitry.
Listing 25.3 illustrates the use of set/reset to force aspecific color to be written. This
the Map
program is the same as that of Listing25.2, except that set/reset rather than
Mask register is used to control color. The preexisting pattern is completely ovenvritten this time, because the set/reset circuitry writes 0-bytesto planes thatmust be off
as well as OFFH-bytes to planes that must be on.
LISTING25.3125-3.ASM
; P r o g r a mt oi l l u s t r a t eo p e r a t i o n
o f s e t / r e s e tc i r c u i t r yt of o r c e
; s e t t i n g o f memory t h a ta l r e a d yc o n t a i n sd a t a .
; By M i c h a e A
l brash.
'STACK'
equ
OaOOOh
3c4h
2
;SC
;SC
;GC
;GC
;GC
;EGA d i s p l a y memory
segment
; EGA r e g i s t e re q u a t e s .
SC-INDEX
equ
SC-MAPLMASK
equ
GC-INDEX
3ceh
equ
GC-SET-RESET
equ
GC-ENABLELSET-RESET equ
0
1
; Macro t o s e t i n d e x e d r e g i s t e r
SETSC
macro
mov
mov
dx.al out
474
Chapter 25
I N D E X , SETTING
dx.SC-INDEX
a1 , I N D E X
i nrdeegxi s t e r
map mask r e g i s t e r
i nrdeegxi s t e r
s e t / r e s e tr e g i s t e r
e n a b l es e t / r e s e tr e g i s t e r
I N D E X o f SC c h i p t o SETTING.
dx
inc
mov ,SETTING
a1
dx.al out
dx
dec
endm
; Macro t o s e t i n d e x e d r e g i s t e r
I N D E X o f GC c h i p t o
SETTING.
I N D E X . SETTING
macro
mov
dx,GC_.INOEX
mov
a1 , I N D E X
dx.al out
dx
inc
mov .SETTING
a1
dx.al out
dx
dec
endm
SETGC
c s es ge g m epnat pr au b l i c
assume
cs:cseg
n pesratoar cr t
; S e l e c t6 4 0 x 4 8 0g r a p h i c s
mov
int
'CODE'
mode.
ax.012h
10h
mov
ax.EGA-VIDEO-SEGMENT
v i d e o mov
t o ; p o i ne ts . a x
memory
; D r a w2 41 0 - s c a n - l i n eh i g hh o r i z o n t a lb a r si ng r e e n ,
SETSC
SC-MAP-MASK.02h
bar
b; se.atdgatdisirvniutinbdoi efn og
mov
mov
HorzBarLoop:
mov
;drawstosb rep
add
bp
dec
H o r z B aj nr Lz o o p
a1 . O f f h
bp.24
cx.80*10
; # bh p
yotereirzs obnat ar l
d i .80*10
; p ostitnnoaoetbrfxtatr
; Fill s c r e e nw i t hb l u e ,u s i n gs e t / r e s e tt of o r c ep l a n e
: o t h e rp l a n et o
SETSC
10 s c a nl i n e sa p a r t .
0 to1's
and a l l
0's.
SC_MAPKMASK.Ofh
; m us es t
; p l a n e s , s o s e t / r e s e tv a l u e sc a n
; b ew r i t t e nt o
memory
SETGC
GC-ENABLE-SET-RESET,Ofh
SETGC
GC-SET-RESET.Olh
sub
di .di
mov 80*480
cx,
mov
a1 .; O
ssifenf anthc/laerlif ebso slreedt
;CPU d a t at oa l lp l a n e s
will be
; r e p l a c e db ys e t / r e s e tv a l u e
; s e t / r e sveat lOui sfeffpohlra n e
; ( t h eb l u ep l a n e )a n d
0 f o ro t h e r
; planes
s ;c#
r epbe ynr t e s
; p l a n e s ,t h e
CPU d a t ai si g n o r e d ; o n l yt h ea c to fw r i t i n gi s
; important
475
; p e r f o r m fill ( a f f e c t sa l lp l a n e s )
r e ps t o s b
; T u r no f fs e t / r e s e t .
SETGC
; W a i tf o r
GC-ENABLELSET-RESET.0
a keystroke.
mov
int
; R e s t o r et e x t
ah,l
21h
mode.
mov
int
; Exitto
start
cseg
ax,03h
10h
00s.
mov
int
endp
ends
end
ah.4ch
21h
start
LISTING25.4125-4.ASM
; Program t oi l l u s t r a t eo p e r a t i o no fs e t / r e s e tc i r c u i t r yi nc o n j u n c t i o n
; w i t h CPU d a t a t o m o d i f y s e t t i n g o f
memory t h a ta l r e a d yc o n t a i n sd a t a
; By M i c h a eA
l brash.
'STACK'
equ
OaOOOh
;SC i nrdeegxi s t e r
; S C map mask r e g i s t e r
;EGA d i s p l a y memory
segment
; EGA r e g i s t e re q u a t e s .
SC-INDEX
SC-MAP-MASK
476
Chapter 25
3c4h
equ
equ
GC-INDEX
equ
3ceh
equ
0
GC-SET-RESET
GC-ENABLELSET-RESET equ 1
:GC i n d e x r e g i s t e r
:GC s e t / r e s e tr e g i s t e r
;GC e n a b l e s e t / r e s e t r e g i s t e r
: Macro t o s e t i n d e x e d r e g i s t e r
I N D E X o f SC c h i p t o
SETTING.
I N D E X o f GC c h i p t o
SETTING.
SETSC
macro
mov
mov
dx,al out
inc
mov
dx.al out
dx
dec
endm
I N D E X , SETTING
d x , SC-INDEX
a1 , I N D E X
dx
a1 .SETTING
; Macro t o s e t i n d e x e d r e g i s t e r
I N D E X , SETTING
macro
mov
dx.GC-INDEX
mov
a1 , I N D E X
dx.al out
dx
inc
mov
.SETTING
a1
dx.al out
dx
dec
endm
SETGC
c s es eg g m epnatpr au b l i c
assume
cs:cseg
n pesratoracr t
: S e l e c t6 4 0 x 3 5 0g r a p h i c s
'CODE'
mode.
mov
int
ax.010h
10h
mov
mov
ax,EGA-VIDEO-SEGMENT
e s .; ap xvoti odn et o
: Draw 1 81 0 - s c a n - l i n eh i g hh o r i z o n t a l
SC-MAP_MASK,OEh
SETSC
memory
b a r si ng r e e n ,1 0s c a nl i n e sa p a r t .
:mapmask
s e t t i n ge n a b l e so n l y
: p l a n e 1. t h eg r e e n
plane
. d dis iu b
mov
a1 . O f f h
mov
bp.18
HorzBarLoop:
mov
80*10
cx,
r e ps t o s b
. 8 0d*ai1d0d
bp
dec
H o r z B aj nr Lz o o p
: s t a r ta tb e g i n n i n go fv i d e o
memory
; # b a r st od r a w
:# b y t e sp e rh o r i z o n t a lb a r
: d r a wb a r
: p o i n tt os t a r t
: F i l ls c r e e nw i t ha l t e r n a t i n gb a r so f
r e da n db r o w n ,u s i n g
: t os e tp l a n e
1 a n ds e t / r e s e tt os e tp l a n e s
0 . 2 & 3.
o f n e x tb a r
CPU d a t a
477
SETSC
SCLMAPLMASK.Ofh
SETGC
GCLENABLELSETLRESET.Odh
SETGC
GC-SET-RESET.04h
. d dis iu b
mov
mov
cx.80*350/2
a x , 07eOh
; p e rsftoorsmw r e p
;
: m sues t
map mask et no a bal lel
; p l a n e s , s o s e t / r e s e tv a l u e sc a n
; b ew r i t t e nt op l a n e s
0. 2 & 3
; and CPU d a t ac a nb ew r i t t e n
to
: p l a n e 1 ( t h eg r e e np l a n e )
;CPU d a t tapo l a n e s
0 . 2 & 3 will be
; r e p l a c e db ys e t / r e s e tv a l u e
: s e t / r e sveat lOui sfeffpohlra n e
2
; ( t h er e dp l a n e )a n d
0 f o ro t h e r
; planes
s c r e; epCnewr o r d s
:CPU cdoapnotltanr onl yles
1;
: s e t / r e s e tc o n t r o l so t h e rp l a n e s
fill
p l a n e(asal)fl f e c t s
T u r no f fs e t / r e s e t .
SETGC
: W a i tf o r
GC-ENABLE-SET-RESET.0
a keystroke.
mov
int
: R e s t o r et e x t
mov
int
ah.1
21h
mode.
ax.03h
10h
: E x i t t o DOS.
mov
int
start
endp
ends
cseg
start end
ah.4ch
21h
There is no clearly defined role for the set/resetcircuitry, asthere is for, say, the bit
mask. In many cases,set/reset is largelyinterchangeable with CPU data, particularly
with CPU data written in write mode 2 (write mode 2 operates similarly to the set/
reset circuitry, as wellsee in Chapter27). The most powerful use of set/reset, in my
experience, is in applications such as the exampleof Listing 25.4,where it is used to
force the value written to certain planes while the CPU data is written to other planes.
In general, though, thinkof set/reset as one moretool you have at your disposal in
getting the VGA to do what you need done,in this case a tool that lets you force all
bits in each plane to either zero or one, orpass CPU data through unchanged, on
each write to display memory. As tools go, set/reset is a handy one, anditll pop up
often in this book.
Notes on Set/Reset
The set/reset circuitry is not active in write modes 1 or 2. The Enable Set/Reset
register is inactive in write mode 3, but the Set/Reset register provides the primary
drawing color in write mode 3, as discussed in the next chapter.
478
Chapter 25
Previous
Home
Next
Be aware that because setheset directly replaces CPU data, it does not necessarily
have to force an entire display memory byte to 0 or OFFH, even when setlreset is
replacing CPU datafor allplanes. For example, ifthe Bit Mask registeris set to 80H,
the setheset circuitry can only modlfi bit 7 of the destination byte in each plane,
since the other seven bits will comefrom the latchesfor each plane. Similarly, the
setheset valuefor each plane can be modifiedby that plane b ALU Once again, this
illustrates that setheset merely replaces the CPU datafor selectedplanes; theset/
reset value is then processed in exactly the same way that CPU data normally is.
479
Previous
chapter 26
Home
Next
483
in the last major chunk of 16-color graphics code I wrote, write mode 3 was used
more than write mode 0 overall, excluding simple pixel copying.So write mode 3 is
well worth using, but to use it you must firstunderstand it. Here's how it works.
In write mode 3, set/reset is automatically enabled for all four planes (the Enable
Set/Reset register is ignored). The CPU data byte is rotated and then ANDed with
the contents of the Bit Mask register, and the result of this operation is used as the
contents of the Bit Mask register alone would normally beused. (If this is Greek to
you, havea look backat Chapters 23 through 25. There's no way to understand write
mode 3 without understanding the rest of the VGA's write data path first.)
That's what write mode 3 does-but what is it for? It turns outthat write mode 3 is
excellent for a surprisingly large number of purposes, because it makes it possible to
avoid the bane of VGA performance, OUTS.Some uses for write mode 3 include
lines, circles,and solid and two-color pattern fills. Most importantly, writemode 3 is
ideal for transparent text; that is, it makes it possible to draw text in l k o l o r graphics mode quickly without wiping out the background in the process. (As we'll see at
the end of this chapter, write mode 3 is potentially terrific for opaque text-text
drawn with the character box filled in with a solid color-as well.)
Listing 26.1 isa modification of code I presented in Chapter 25. That code used the
data rotate and bit mask features of the VGA to draw bit-mapped text in write mode
0. Listing 26.1 uses write mode 3 in place of the bit mask to draw bit-mapped text,
and in the process gainsthe useful abilityto preserve the background into which the
text is being drawn. Where the original text-drawing code drew the entire character
box for each character, with 0 bits in the font patterncausing a black box to appear
around each character, the code in Listing 26.1 affects displaymemory only when 1
bits in the font pattern are drawn. As a result, the characters appear to be painted
into the background, rather than over it. Another advantage of the code in Listing
26.1 is that the characters can be drawn in any of the 16 available colors.
LISTING 26.1 126-
1.ASM
: Program t o i l l u s t r a t e o p e r a t i o n o f w r i t e
mode 3 o f t h e VGA.
; Draws 8x8 c h a r a c t e r s a t a r b i t r a r yl o c a t i o n sw i t h o u td i s t u r b i n g
; the
b a c k g r o u n du, s i n g
VGA's 8x8 ROM f o n tD. e s i g n e d
; f o ru s ew i t h
modes ODh.
OEh.
OFh.
10h.and12h.
; Runs o n l y on VGAs ( i n Models 50 & upand
I B M D i s p l a yA d a p t e r
; a n 1d 0 0 %
compatibles).
; A s s e m b l e dw i t h
MASM
; By M i c h a e A
l brash
: VGA r e g i s t e re q u a t e s .
484
Chapter 26
equ
OaOOOh
e0q4u4; oa fhfos fe t
8
equ
: S C i n d e xr e g i s t e r
:SC map mask r e g i s t e r i n d e x
: G C i n d e xr e g i s t e r
:GC s e t / r e s e tr e g i s t e ri n d e x
:GC e n a b l es e t / r e s e tr e g i s t e ri n d e x
:GC d a t ar o t a t e / l o g i c a lf u n c t i o n
: r e g i s t e ri n d e x
:GC Mode r e g i s t e r
:GC b i t mask r e g i s t e r i n d e x
SC-INDEX 3c4h
equ
SC-MAP-MASK
equ
2
GC-INDEX 3ceh
equ
GC-SET-RESET
equ
0
GC-ENABLE-SET-RESET
equ 1
GC-ROTATE
equ
3
GC-MODE
GC-BIT-MASK
equ
equ
5
8
dseg
segment
para
common 'DATA'
69
:row dt ios p l taetyseatxt t
TEST-TEXT-ROW
equ
TEST-TEXT-COL
:column
17
equ
dt ios p l taetyseatxt t
TEST-TEXTLWIOTH equ
8
; w i dotfh
a c h a r a cpti ienxre l s
Tbey stl eat sbter li n g
' Hdew
bl loor ,l d;p!st' ret. itrO
snoi tnt .g
?
offset ;font
d dF o n t P o i n t e r
ends
dseg
c s es ge g m epnat pr au b l i c
assume
cs:cseg.
ds:dseg
npesratoarcr t
mov
ax.dseg
mov
ds,ax
: S e l e c t6 4 0 x 4 8 0g r a p h i c s
mov
int
'CODE'
mode.
ax.0lZh
10h
: S e tt h es c r e e nt oa l lb l u e .u s i n gt h er e a d a b i l i t yo f
VGA r e g i s t e r s
: t op r e s e r v er e s e r v e db i t s .
dx.GC-INDEX
mov
mov
a1 .GC-SET-RESET
dx,al out
dx
inc
in
a1 . d x
and
a1 .OfOh
or
a1 .1
dx.al out
dx
dec
al.GC_ENABLE-SET-RESET
mov
dx.al out
dx
inc
in
a1 .dx
and
a1 ,OfOh
or
a1
p:lea.snO
naeaelftlbhs/flroeers e t
dx.al out
dx.VGA-VIDEO-SEGMENT
mov
d i s mov
p tl:aopyo .i dn xt e s
mov
d i .O
mov 8000h
cx,
mov
a x . :0sbfeefvtfc/aforhatlefuhuseseet .
stosw rep
orteshseoep;etnrlb.atsl ynu e
memory
;fill a lwords
l 32k
: w r i t t e na c t u a l l yd o e s n ' tm a t t e r
:fill
blue with
: S e td r i v e rt ou s et h e8 x 8f o n t .
mov
mov
ah.llh
a1 .30h
:VGA B I O S c h a r a c t e r g e n e r a t o r f u n c t i o n ,
: r e t u r ni n f os u b f u n c t i o n
485
mov
bh.3
int
10h
S e l ec ac tl lF o n t
; g e t8 x 8f o n tp o i n t e r
; P r i n tt h et e s ts t r i n g ,c y c l i n gt h r o u g hc o l o r s .
mov
si .offset Teststring
mov
bx.TEST-TEXT-ROW
mov
cx,TEST-TEXT-COL
c omov
l owr i t;hs t a ar th . 0
StringOutLoop:
1 odsb
and
a1 ,a1
StringOutDone
jz
push
ax
call
DrawChar
ax
POP
inc
ah
and
ah.0fh
add
cx.TEST-TEXT-WIDTH
StringOutLoop
jmp
StringOutDone:
a k e y ,t h e ns e tt ot e x t
; Wait f o r
mov
int
mov
int
; E x i tt o
; p r e s e r v ec o l o r
;restorecolor
; n e x tc o l o r
; c o l o r sr a n g ef r o m
0 t o 15
ah.1
21h
; w a i t for a key
ax.3
10h
it reexst t o r e
mode
DOS.
mov
int
ah.4ch
21h
Se tnadr pt
; S u b r o u t i n et od r a w
a t e x tc h a r a c t e ri n
a l i n e a rg r a p h i c s
(ODh. OEh. OFh. 0 1 0 h 0. 1 2 h ) B
. a c k g r o u n da r o u n dt h ep i x e l st h a t
; make u pt h ec h a r a c t e ri sp r e s e r v e d .
; F o n tu s e ds h o u l db ep o i n t e dt ob yF o n t P o i n t e r .
---
; Input:
; AL
c h a r a c t e rt od r a w
; AH
c o l o rt od r a wc h a r a c t e ri n( 0 - 1 5 )
: BX
near
row t o d r a wt e x tc h a r a c t e ra t
column t o d r a wt e x tc h a r a c t e r
CX
;
;
Forces ALU f u n c t i o nt o
F o r c e sw r i t e
mode 3.
proc
DrawChar
ax push
bx push
cx push
dx push
push
si
push
di
bp push
ds push
p u;apsxrhe s e rcvhea r a cdtreoianrw
486 Chapter 26
at
"move".
AL
mode
: S e tu ps e t / r e s e tt op r o d u c ec h a r a c t e rc o l o r ,u s i n gt h er e a d a b i l i t y
: o f VGA r e g i s t e r t o p r e s e r v e t h e s e t t i n g o f r e s e r v e d b i t s
dx.GC-INDEX
mov
mov
dx.al out
dx
inc
in
and
ah.0fhand
or
dx.al out
: S e l e c tw r i t e
7-4.
a1 .GC_SETLRESET
a1 .dx
a1 .OfOh
a1 ,ah
mode 3 . u s i n g t h e r e a d a b i l i t y o f
VGA r e g i s t e r s
mode b i t s unchanged.
: t ol e a v eb i t so t h e rt h a nt h ew r i t e
mov
mov
out
dx
inc
in
al.3
or
dx,al out
: S e t DS:SI t o p o i n t t o f o n t
and ES t o p o i n t t o d i s p l a y
memory.
memory
: C a l c u l a t es c r e e na d d r e s so fb y t ec h a r a c t e rs t a r t s
in.
POP cb:haigdancerarkatatxwoc t e r
t:op o i dn spt u s h
dx.dx sub
mov
ax.bx
xchg
mov
At
BIOS
segment
data
.dxds
di,ds:[SCREEN-WIDTH-IN-BYTES]
POP
ds
mu1 r:oc waosldfcti auoorlf taf st e t
ws icdartsehied:nse e t d i p u s h
mov
c oatlhs:useim
de. cetnx d i
and
. Oc l; lkl beoetncphol yel u imn n- b ya tded r e s s
.1
ds ih r
ds ih r
,1
:dsdihi. vrl ci do el ubmy n
add
b y tpte:ooa ,inandxt d i
: r e t r i e v e BIOS
: s c r e e nw i d t h
8 t o make a baydt de r e s s
: C a l c u l a t ef o n ta d d r e s so fc h a r a c t e r .
bh.bh sub
shl
b x . 1s h l
c h a or aff oc tn:eotbi rnfxf .s1he lt
add
bx,l
:.obfsfxofi snnet t
: S e tu pt h e
GC r o t a t i o n . I n w r i t e mode 3, t h i s i s t h e r o t a t i o n
: o f CPU d a t ab e f o r e i t i s ANDed w i t h t h e B i t Mask r e g i s t e r t o
487
: f o r m t h e b i t mask.Forcethe
ALU f u n c t i o n t o "move".Uses
the
: r e a d a b i l i t y o f VGA r e g i s t e r st ol e a v er e s e r v e db i t su n c h a n g e d .
dx.GC-INDEX
mov
mov
dx.al out
dx
inc
in
and
or
dx.al out
: S e tu p
a1 ,GC-ROTATE
a1 .dx
a1 .OeOh
a1 . c l
BH as b i t mask f o r l e f t h a l f .
mov
bh.cl shr
neg
add
bl.cl shl
BL as r o t a t i o n f o r r i g h t h a l f .
bx.0ffffh
cl
c l .0
: Draw t h e c h a r a c t e r , l e f t h a l f f i r s t . t h e n r i g h t h a l f i n t h e
: s u c c e e d i n gb y t e ,u s i n gt h ed a t ar o t a t i o nt op o s i t i o nt h ec h a r a c t e r
: a c r o s st h eb y t eb o u n d a r ya n dt h e nu s i n gw r i t e
mode 3 t o c o m b i n et h e
: c h a r a c t e rd a t aw i t ht h eb i t
mask t o a l l o w t h e s e t / r e s e t v a l u e ( t h e
: c h a r a c t e rc o l o r )t h r o u g ho n l yf o rt h ep r o p e rp o r t i o n( w h e r et h e
: f o n tb i t sf o rt h ec h a r a c t e ra r e
1) o ft h ec h a r a c t e rf o re a c hb y t e .
: W h e r e v e rt h ef o n tb i t sf o rt h ec h a r a c t e ra r e
0. t h eb a c k g r o u n d
: c o l o ri sp r e s e r v e d .
: Does n o tc h e c kf o rc a s ew h e r ec h a r a c t e ri sb y t e - a l i g n e da n d
;
n or o t a t i o na n do n l yo n ew r i t ei sr e q u i r e d .
bp.FONT-CHARACTER-SIZE
mov
dx.GC-INDEX
mov
wsi cd rt behPOP
ae cn k: g e t c x
cx
dec
cdxe c
CharacterLoop:
: S e tt h eb i t
mov
mov
dx.ax out
mask f o r t h e l e f t h a l f o f t h e c h a r a c t e r .
a1 .GC-BIT-MASK
ah.bh
: G e tt h en e x tc h a r a c t e rb y t e
: ( L e f th a l f o f character.)
. .
mov
mov
stosb
: Setthebit
mov
mov
dx.ax out
a1 , [ s i ]
ah.es:[di]
; g e tc h a r a c t e rb y t e
:1 oad 1 a t c h e s
: w r i t ec h a r a c t e rb y t e
a1 .GC-BIT-MASK
ah.bl
; G e tt h ec h a r a c t e rb y t ea g a i n
Chapter 26
& w r i t e i t t o d i s p l a y memory.
mask f o r t h e r i g h t h a l f o f t h e c h a r a c t e r .
: ( R i g h th a l fo fc h a r a c t e r . )
488
: - 2 b ebctacfyhwoactadoerhuross e
& w r i t e it t o d i s p l a y memory.
1 odsb
mov
stosb
; g e tc h a r a c t e rb y t e
ah.es:[dil
:1 oad 1 a t c h e s
: w r i t ec h a r a c t e rb y t e
; P o i n tt on e x tl i n eo fc h a r a c t e ri nd i s p l a y
add
memory.
, c xd i
bp
dec
C h a r a jcnt ze r L o o p
POP
POP
pop
pop
POP
POP
POP
POP
ret
endp
ds
bP
di
si
dx
cx
bx
ax
DrawChar
: S e tt h ep o i n t e rt ot h ef o n tt od r a wf r o mt o
n e aSr eplreocct F o n t
mov
mov
ret
eSn ed lpe c t F o n t
ES:BP.
ends
cseg
end
start
The key to understanding Listing 26.1 is understanding the effect of ANDing the
rotated CPU data with the contents of the Bit Mask register. The CPU data is the
pattern for the character to be drawn, with bitsequal to 1indicating where character
pixels are to appear. The Data Rotate register is set to rotate the CPU data to pixelalign it, since without rotation characters could only be drawn on byte boundaries.
At the same time that the Data Rotate register is set, the Bit Mask register is set to
allow the CPU to modify onlythat portion of the display memory byte accessed that
the pixel-aligned character falls in,
so that other characters and/or graphics data wont
be wiped out. The result of ANDing the rotated CPU data byte with the contents of
the Bit Mask register is a bit mask that allows only the bits equal to 1 in the original
489
LISTING26.2126-2.ASM
: Program t oi l l u s t r a t eh i g h - s p e e dt e x t - d r a w i n go p e r a t i o no f
:
;
:
;
;
;
;
490
w r i t e mode 3 o ft h e VGA.
Draws a s t r i n go f8 x 1 4c h a r a c t e r sa ta r b i t r a r yl o c a t i o n s
w i t h o u td i s t u r b i n gt h eb a c k g r o u n d ,u s i n g
VGA's 8x14 RDM f o n t .
D e s i g n e df o ru s ew i t h
modes ODh.
OEh,
OFh, 10h.and12h.
Runs o n l y on VGAs ( i n Models 50 & upand
I B M D i s p l a yA d a p t e r
a n1d0 0 %
compatibles).
A s s e m b l e dw i t h MASM
By M i c h a e A
l brash
Chapter 26
'STACK'
VGA-VIDEO-SEGMENT
SCREEN-WIDTH-IN-BYTES
FONT-CHARACTER-SIZE
equ
equ
equ
OaOOOh
044ah
14
;VGA d i s p l a y memorysegment
: o f f s e t o f BIOS v a r i a b l e
:I b y t e s i n e a c h f o n t c h a r
equ
equ
equ
equ
equ
equ
3c4h
:SC i n d e xr e g i s t e r
equ
equ
: VGA r e g i s t e re q u a t e s .
SC-INDEX
SC-MAP-MASK
GC-INDEX
GC-SET-RESET
GC-ENABLE-SET-RESET
GC-ROTATE
GC-MODE
GC-BIT-MASK
dseg
segment
para
TEST-TEXT-ROW
TEST-TEXT-COL
TEST-TEXT-COLOR Ofh
T
b ye tsl eat sb ter li n g
'w
H oe rldl odb,! ' . O
F od nd t P o i n t e r
ends
dseg
common 'DATA'
69
equ
equ
17
equ
: S e l e c t6 4 0 x 4 8 0g r a p h i c s
mode.
ax.012h
10h
: t op r e s e r v er e s e r v e db i t s .
in
and
or
dx.al out
mov
;row t o d i s p l a y t e s t t e x t a t
:column t o d i s p l a y t e s t t e x t a t
: h i g hi n t e n s i t yw h i t e
'CODE'
: S e tt h es c r e e nt oa l lb l u e ,u s i n gt h er e a d a b i l i t yo f
mov
mov
dx.al out
dx
inc
in
.OfOh
a1and
or
dx.al out
dx
dec
mov
dx,al out
dx
inc
:teststringtoprint.
:fontoffset
c s es ge g m epnat pr au b l i c
assume
cs:cseg,
ds:dseg
n pesratoracr t
mov
ax.dseg
mov
ds.ax
mov
int
2
3ceh
0
VGA r e g i s t e r s
d x , GC-I NDEX
a1 .GC-SETLRESET
a1 ,d x
a1 .1
orteshseeope:trnlb.tasllynu e
a1 .GC-ENABLE-SET-RESET
a1 . d x
a1 .OfOh
a1p: el. aO
nsaneaflehtbl/sflroeers e t
dx.VGA-VIDEO-SEGMENT
491
mov
mov
mov
mov
es,dx
d i ,O
8000h
cx,
ax.0ffffh
:pointtodisplay
memory
;fill a l l 32kwords
: b e c a u s eo fs e t / r e s e t .t h ev a l u e
: w r i t t e na c t u a l l yd o e s n ' tm a t t e r
;fill w i t hb l u e
r e ps t o s w
; S e td r i v e rt ou s et h e8 x 1 4f o n t .
mov
mov
mov
int
S e l ec ac tl lF o n t
ah.llh
a1 .30h
bh.2
10h
:VGA B I O S c h a r a c t e rg e n e r a t o rf u n c t i o n .
; r e t u r ni n f os u b f u n c t i o n
; g e t8 x 1 4f o n tp o i n t e r
; P r i n tt h et e s ts t r i n g .
mov
. soif f sTeet s t S t r i n g
bx.TEST-TEXT-ROW
mov
mov
cx.TEST-TEXT-COL
ah.TEST-TEXT-COLOR
mov
D r acwasl lt r i n g
: W a i tf o r
a k e y .t h e ns e tt ot e x t
mov
ah.1
i n tf o r; w a i2t 1 h
mov
ax.3
i n t t :er xe ts t o1 r0eh
: E x i tt o
mode
DOS.
mov
int
endp
Start
a key
ah.4ch
21h
: S u b r o u t i n et od r a w
a textstringleft-to-rightin
a linear
: g r a p h i c s mode
(ODh.
OEh.
OFh.
0 1 0 h 0. 1 2 h w
) i t h8 - d o t - w i d e
:
c h a r a c t e r s .B a c k g r o u n da r o u n dt h ep i x e l st h a t
; c h a r a c t e r isp
sreserved.
; F o n tu s e ds h o u l db ep o i n t e dt ob yF o n t P o i n t e r .
--
; Input:
; AH
c o l o rt od r a ws t r i n gi n
; EX
row t o d r a ws t r i n g
on
; CX
column t o s t a r t s t r i n g
- s t r i n gt o
DS:SI
Forces ALU f u n c t i o nt o
F o r c e sw r i t e
mode 3.
n e aDr rparw
o cs t r i n g
ax push
bx push
cx push
dx push
push
push
bp push
ds push
492
Chapter 26
si
di
at
draw
"move".
make up t h e
: S e tu ps e t / r e s e tt op r o d u c ec h a r a c t e rc o l o r ,u s i n gt h er e a d a b i l i t y
: o f VGA r e g i s t e r t o p r e s e r v e t h e s e t t i n g o f r e s e r v e d b i t s
dx.GC-INDEX
mov
mov
dx,al out
dx
inc
in
.OfOh
a1and
ah.0fh and
or
dx,al out
7-4.
a1 .GC-SETLRESET
a1 .dx
a1 ,ah
: S e l e c tw r i t e
mode 3 . u s i n g t h e r e a d a b i l i t y o f
VGA r e g i s t e r s
: t ol e a v eb i t so t h e rt h a nt h ew r i t e
mode b i t s unchanged.
dx.GC-INDEX
mov
mov
a
1 ,GC-MODE
dx.al out
dx
inc
in
a1 .dx
or
a1 . 3
dx.al out
dx.VGA-VIDEO-SEGMENT
mov
dmov
i s p l at yo: p oesi n. dt x
memory
: C a l c u l a t es c r e e na d d r e s so fb y t ec h a r a c t e rs t a r t si n .
to ;point ds push
dx.dx sub
mov
mov
BIOS
segment
data
,dxds
di,ds:[SCREEN-WIDTH-IN-BYTES]
POP
ds
mov
bx
ax,
mu1 r: oc w
aosdl cfti auoorlf atf st e t
ws ciapush
drstehied; nsee t d i
mov
c oatlhs:useim
de. cetnx d i
and
. Oc l;lkl beoetncphloyel u imn n- b ya tded r e s s
di.l shr
di.1 shr
:dsdihi. vlrci do el ubmy n
add b y t ptde;ooai i,nnadtx
: r e t r i e v e BIOS
: s c r e e nw i d t h
:row
: S e tu pt h e
GC r o t a t i o n . I n w r i t e
8 t o make a baydt de r e s s
mode 3 . t h i s i s t h e r o t a t i o n
: Setup
VGA r e g i s t e r st ol e a v er e s e r v e db i t su n c h a n g e d .
dx.GC_INDEX
a1 .GC-ROTATE
a1 . d x
a1 .OeOh
a1 . c l
BH as b i t mask f o r l e f t h a l f ,
B L as r o t a t i o n f o r r i g h t h a l f .
493
mov
bh.cl shr
neg
add
. c lb sl h l
:
:
:
:
:
:
:
:
:
:
bx.0ffffh
cl
c l .8
D r a w a l lc h a r a c t e r s ,l e f tp o r t i o nf i r s t ,t h e nr i g h tp o r t i o ni nt h e
s u c c e e d i n gb y t e ,u s i n gt h ed a t ar o t a t i o nt op o s i t i o nt h ec h a r a c t e r
a c r o s st h eb y t eb o u n d a r ya n dt h e nu s i n gw r i t e
mode 3 t o c o m b i n et h e
c h a r a c t e rd a t aw i t ht h eb i t
mask t o a l l o w t h e s e t / r e s e t v a l u e ( t h e
c h a r a c t e rc o l o r )t h r o u g ho n l yf o rt h ep r o p e rp o r t i o n( w h e r et h e
f o n tb i t sf o rt h ec h a r a c t e ra r e
1) o f t h e c h a r a c t e r f o r e a c h b y t e .
W h e r e v e rt h ef o n tb i t sf o rt h ec h a r a c t e ra r e
0. t h eb a c k g r o u n d
c o l o ri sp r e s e r v e d .
Does n o tc h e c kf o rc a s ew h e r ec h a r a c t e ri sb y t e - a l i g n e da n d
n or o t a t i o n
and o n l y one w r i t e i s r e q u i r e d .
: Draw t h e l e f t p o r t i o n o f e a c h c h a r a c t e r i n t h e s t r i n g .
wsi dc rt hebPOP
ea nc k: g e t c x
push
si
push
di
bx push
; S e tt h eb i t
mask f o r t h e l e f t h a l f o f t h e c h a r a c t e r .
mov
mov
mov
dx.ax out
LeftHalfLoop:
lodsb
and
dx.GC-INDEX
a1 .GC-BIT-MASK
ah.bh
a1 .a1
L e f t H a l f LoopDone
jz
C h acr a lcl t e r u p
c hnaerl xoatct: opat eot iriondnti i n c
L e f t H aj m
l f pL o o p
LeftHalfLoopDone:
POP
bx
pop
di
POP
si
: D r a w t h er i g h tp o r t i o no fe a c hc h a r a c t e ri nt h es t r i n g .
; erpciahogdacarohitchirftinaroioccsnst es r
: b y t eb o u n d a r y
: Setthebit
mask f o r t h e r i g h t h a l f o f t h e c h a r a c t e r .
mov
mov
mov
dx.ax out
RightHalfLoop:
1 odsb
.a1a1and
jz
C h acr a lcl t e r u p
c hnaerlxoattc: opat eot ir ondnti i n c
R i g h t Hj ma pl f L o o p
494
Chapter 26
dx.GC-INDEX
a1 .GC-BIT-MASK
ah.bl
RightHalfLoopDone
RightHalfLoopDone:
POP
POP
pop
pop
POP
POP
POP
POP
ret
eD
n dr apw s t r i n g
ds
bp
di
si
dx
cx
bx
ax
: Draw a c h a r a c t e r .
-- -
: Input:
: AL
character
: CX
:
ES:DI
s c r e e nw i d t h
a d d r e s st od r a wc h a r a c t e ra t
Cnheaarprarcotce r u p
cx push
push
push
ds push
si
di
: S e t DS:SI t o p o i n t t o f o n t a n d
ES t o p o i n t t o d i s p l a y
memory.
: C a l c u l a t ef o n ta d d r e
mov
mu1
add
bp.FDN mov
cx
dec
CharacterLoop:
1 odsb
mov
stosb
s o fc h a r a c t e r .
. 1 4b l
bl
. a sx i
; 1 4b y t e sp e rc h a r a c t e r
:offsetinfont
-CHARACTER-SIZE
: -1 b e c a u s eo n eb y t ep e rc h a r
; g e tc h a r a c t e rb y t e
ah.es:[di]
:1 oad 1 a t c h e s
: w r i t ec h a r a c t e rb y t e
: Pointtonextlineofcharacterindisplay
add
d i ,c x
dec
jnz
bP
CharacterLoop
POP
POP
POP
POP
ret
Characterup
memory.
dS
di
si
cx
endp
: S e tt h ep o i n t e rt ot h ef o n tt od r a wf r o mt o
n e aSrepl reocct F o n t
mov
segment o f c h a r a c t e r
ES:BP.
495
mov
word [ pF tor n t P o i n t e r + E ] . e s
ret
eSn ed lpe c t F o n t
cseg
ends
end
start
In this chapter, Ive tried to give you a feel for how write mode 3 works and what it
might be used for, rather thanproviding polished, optimized, plug-it-in-and-gocode.
Like the rest of the VGAs writepath, write mode 3 is a resource thatcan be used in
a remarkablevariety of ways,and I dontwant to lockyou into thinkingof it as useful
in just one context. Instead, you should take the time to thoroughly understand
what write mode 3 does, and then, when you do VGA programming, think about
how write mode 3 can best be applied to the task at hand. Because I focused on
illustrating the operationof write mode 3, neither listing in this chapter is the fastest
way to accomplish the desired result. For example, Listing 26.2 could be made nearly
twice asfast by simply having the CPU rotate, mask, and join the
bytes from adjacent
characters, then draw the combined bytes to display memoryin a single operation.
Similarly, Listing 26.1is designed to illustrate write mode 3 and its interaction with
the rest of the VGA as a contrastto Listing 25.1in Chapter 25, rather than formaximum speed, and it couldbe made considerably more efficient. If we were going for
performance, wed have the CPU not only rotate the bytes into position, but also do
the masking by ANDingin software. Even more significantly,we would havethe CPU
combine adjacent characters into complete, rotated
bytes whenever possible, so that
only one drawing operation would be required per byte of display memory modified. By doing this, we would eliminate all per-character OUTS,and would minimize
display memory accesses,approximately doubling text-drawing speed.
As a final note, consider that non-transparent text could also be acceleratedwith write
mode 3. The latches could be filled with the background (text box)color, set/reset
could be set the
to foreground (text) color, and write mode 3 could then be used toturn
monochrome text bytes written by the CPU into characters on the screen with just
one write per byte. There arecomplications, such as drawing partial bytes, and rotating the bytes to align the characters, which well revisitlater on in Chapter55, while
were working through the details of the X-Sharp library. Nonetheless, the performance benefit of this approach can be a speedup of as much as four times-all
thanks to the decidedly quirky but surprisingly powerful and flexible write mode 3.
496
Chapter 26
Previous
Home
Next
VGA registers are readable, which makes it possible to change only those bits of
immediate interest, and, in general, I highly recommend doing exactly that, since
IBM (or clone manufacturers) may well someday use some of those reserved bits or
change the meanings of some of the bits that are currently in use.
497
Previous
chapter 27
Home
Next
Chunky Bitmaps,
ics Coexistence
In thelast chapter, wekarned about the
markedly peculiar write mode 3of the VGA,
after having spent thre& learning the
ins and outs of the VGAs data path in
1 aswell in Chapter 23. In all, the VGA supwrite mode 0, touchingmode
ports four write mod&-write modes 0, 1,2, and 3-and read modes 0 and 1 as well.
Which leaves two bbning questions: What is write mode 2, and how the heck do you
bit unusual but not really hard to understand, particularly if you
followed the descri&on of set/reset in Chapter 25. Reading VGA memory, on the
other hand, can be &anger than you could ever imagine.
Lets start with the easy stuff, write mode 2, and save the read modes for the next
chapter.
501
Recall that the set/resetcircuitry for each of the fourplanes affects the byte written
by the CPU in one of three ways: By replacing the CPU byte with 0, by replacing it
with OFFH, or by leaving it unchanged. The natureof the transformation for each
plane is controlled bytwo bits. The enable set/reset bit for a given plane selects
bit for that planeselects
whether the CPU byte is replaced or not, and the set/reset
the value with whichthe CPU byte isreplaced if the enable set/resetbit is 1. The net
effect of set/reset is to independently forceany, none, or all planes to either of all
ones or all zeros on CPU writes. As we discussed in Chapter 25, this is a convenient
way to force aspecific color to appear no matter what color the pixels being overwritten are. Set/reset also allows the CPU to control the contentsof some planes while
the set/reset circuitry controls the contentsof other planes.
Write mode 2 is basicallya set/reset-type mode with enable set/reset always on forall
planes and the set/reset data coming
directly from thebyte written by the CPU. Put
another way, the lower four bits written by the CPU are written across the fourplanes,
thereby becoming a color value. Put yet another way, bit 0 of the CPU byte is expanded to a byte and sent to the plane 0 ALU (if bit 0 is 0, a 0 byte is the CPU-side
input to the plane 0 ALU, while if bit 0 is 1, a OFFH byte is the CPU-side input);
likewise, bit I of the CPU byteis expanded to a byte for plane1 , bit 2 is expanded for
plane 2, and bit 3 is expanded for plane3.
Its possible that you understand write mode 2 thoroughly at this point; nonetheless, I
suspect that some additional explanation of an admittedly non-obvious mode wouldnt
hurt. Lets followthe CPU byte through the VGA in write mode 2, step by step.
502
Chapter 27
It k worth noting two differences between write mode 2 and write mode 0, the
standard write mode of the VGA. First, rotation of the CPUdatabyte does not take
place in write mode 2. Second, the Set/Reset and Enable Set/Reset registers have
no effect in write mode 2.
Now that we understand the mechanics of write mode 2, we can step back and get a
feel for what it might be useful for. View bits 3-0 of the CPU byte as a single pixel in
Yet Another VGA Write Mode
503
one of 16 colors. Next imagine that nibble turned sideways and written across the
four planes, one bit to a plane.Finally, expand eachof thebitsto a byte, as shownin
Figure 27.2, so that 8 pixels are drawn in the color selectedby bits 30 of the CPU byte.
Within the constraintsof the VGAs data paths, thats exactly what write
mode 2 does.
By the constraints of the VGAs data paths,I mean the ALUs, the bitmask, and the
map mask. As Figure 21.1 indicates, the ALUs can modify the color written by the
CPU, the mapmask can prevent the CPU from altering selected planes, and the bit
mask can prevent the CPU from altering selected bits of the byte written to. (Actually, the bit mask simply substitutes latch bits for ALU bits, but since the latches are
normally loaded from the destination display memory byte,
the net effect ofthe bit mask
is usuallyto preserve bits ofthe destination byte.) These are not really constraints at
all, of course, but rather featuresof the VGA; I simply wantto make it clear that the
use of write mode 2 to set 8 pixels to a given color is a rather simple special case
among the many possible ways in which write mode 2 can be used to feed data into
the VGAs data path.
Write mode 2 is selected by setting bits 1and 0 of the Graphics Mode register (Graphics
Controller register 5 ) to 1 and 0 , respectively. Since VGA registers are readable, the
correct way to select write mode 2 on theVGA is to read the Graphics Mode register,
mask off bits 1 and 0, OR in OOOOOOlOb (OZH), and write the result back to the
Graphics Mode register, thereby leaving the otherbits in the register undisturbed.
504
Chapter 27
stored in the first chunky bitmap byte to be modified. Next, the destination byte in
display memory is read in order to load the latches. Then a byte containing two
chunky pixels is read from the chunky bitmap in system memory, and the byte is
rotated four bits to the right to get the leftmostchunky pixel in position. This rotated byte is written to the destination byte; since write mode 2 is active, each bit of
the chunky pixel goes to itsrespective plane, and since theBit Maskregister is set up
to allow onlyone bit in each plane to be modified, a single pixel in the color of the
chunky pixel is written to VGA memory.
This process is then repeated for the rightmost chunky pixel, if necessary, and repeated again for as many pixels asthere are in the image.
LISTING 27.1
127- 1.ASM
memory t o a p l a n a r b i t - m a p i n
equ
equ
equ
equ
: i n d e x o f Map Mask r e g i s t e r
5
8
: i n d e x o f G r a p h i c s Mode r e g
: i n d e x o f B i t Mask r e g
common 'DATA'
: C u r r e n tl o c a t i o n
o f "A"
CurrentX
dw
CurrentY
dw
RemainingLength dw
: Chunky b i t - m a pi m a g e
AImage
as i t i s a n i m a t e da c r o s st h es c r e e n .
?
?
?
of a y e l l o w "A"
on a b r i g h tb l u eb a c k g r o u n d
byte
label
dw
13 13.
;width. h e i g h t i n p i x e l s
000h.db
OOOh, 000h.000h.
000h.
000h.
OOOh
099h.099h.
099h.099h.
OOOh
099h,
009h.
db
099h,
009h.db
099h.099h.
099h.099h.
OOOh
db
O09h. 099h, 099h. Oe9h. 099h.
099h.
OOOh
099h,
009h.db
09eh. Oeeh. 099h.
099h.
OOOh
099h.
009h.db
Oeeh, 09eh. Oe9h. 099h. OOOh
09eh.
009h.db
Oe9h.099h.
Oeeh.099h.
OOOh
09eh.
009h.db
Oeeh.
Oeeh.
Oeeh. 099h. OOOh
Oe9h. 099h. Oeeh.099h.
OOOh
09eh.
009h.
db
09eh.
009h.db
Oe9h. 099h. Oeeh, 099h. OOOh
505
009h.
db 099h.
099h.
099h.
099h.
099h.
009h.
db 099h.
099h.
099h.
099h.
099h.
000h.
db000h.
000h.
000h.
000h.
000h.
OOOh
OOOh
OOOh
ends
Data
Code
s e g m e pn at pr au b l i c
'CODE'
assume cs:Code,
ds:Data
npeSratoarcr t
mov
mov
mov
int
Data
ax,
,axds
ax.10h
: s1ve0il dehec ot
mode(640x350)
10h
: P r e p a r ef o ra n i m a t i o n .
mov
mov
mov
CCurrentX1.0
CCurrentYl.200
CRemainingLength1.600
:move 600 t i m e s
: A n i m a t e ,r e p e a t i n gR e m a i n i n g L e n g t ht i m e s .I t ' su n n e c e s s a r yt oe r a s e
: t h eo l di m a g e ,s i n c et h eo n ep i x e lo fb l a n kf r i n g ea r o u n dt h ei m a g e
: e r a s e st h ep a r to ft h eo l di m a g en o to v e r l a p p e db yt h e
new image.
AnimationLoop:
mov
bx.CCurrentX1
mov
cx.CCurrentY1
mov
.soi f f s e t
AImage
call
DrawFromChunkyBitmap
:draw
[ C ui nr rce n t X l
mov
DelayLoop:
cx.0
t h e "A" image
;move r one
i gt hhetpt oi x e l
; d e l a y s o we d o n ' t move t h e
as
: needed
: i m a g et o of a s t :a d j u s t
oop
1Del
ayLoop
[RemainingLengthl
AnimationLoop
dec
jnz
: W a i tf o ra
Start
k e yb e f o r er e t u r n i n gt ot e x t
ah.0lh
21h
ax.03h
10h
ah.4ch
21h
mov
int
mov
in t
mov
in t
endp
: Drawanimage
s t o r e d i n ac h u n k y - b i t
: a tt h es p e c i f i e dl o c a t i o n .
: Input:
map i n t o p l a n a r
V G A I E G A memory
X s c r e e nl o c a t i o na tw h i c ht od r a wt h eu p p e r - l e f tc o r n e r
o f t h e image
CX
Y s c r e e nl o c a t i o na tw h i c ht od r a wt h eu p p e r - l e f tc o r n e r
o f t h e image
DS:SI
p o i n t e rt oc h u n k yi m a g et od r a w ,
as f o l l o w s :
word a t 0: w i d t h o f i m a g e , i n p i x e l s
w o r da t 2: h e i g h to fi m a g e ,i np i x e l s
BX
506
mode and e n d i n g .
Chapter 27
b y t e a t 4: msb/lsb
f i r s t & s e c o n dc h u n k yp i x e l s ,
r e p e a t i n gf o rt h er e m a i n d e r
o f t h es c a nl i n e
all s c a nl i n e s .
Images
o ft h e i m a g e ,t h e nf o r
w i t h oddwidthshave
an unused n u l l n i b b l e
p a d d i n ge a c hs c a nl i n eo u tt o
a b y t ew i d t h
; AX,
SI.
D I , ES d e s t r o y e d .
DrawFromChunkyBitmap
near
proc
c ld
;
S e l e c t w r i t e mode 2.
mov
a1mov
dx,al out
dx
inc
a1mov
dx.al out
; E n a b l ew r i t e st o
mov
mov
dx.al out
dx
inc
mov
dx,al out
dx.GC-INDEX
.GRAPHICS-MODE
.02h
all 4 p l a n e s .
d x , SC-I NDEX
a1 .MAP-MASK
al.Ofh
: P o i n t E S : D I t ot h ed i s p l a y
; o f t h ei m a g eg o e s ,w i t h
memory b y t e i n w h i c h t h e f i r s t p i x e l
mask t o a c c e s st h a t
AH s e t u p a s t h e b i t
; p i x e lw i t h i nt h ea d d r e s s e db y t e .
mov
ax.SCREEN-WIDTH-IN-BYTES
l si nctmu1
eaonp osft :aoroftfcsxe t
mov
d i ,ax
mov
. bcl l
and
. lcl ll b
mov
ah.80h
;set
AH tbht ioet
mask t fhoer
p; i ixnei lt i a l
ah.cl shr
bx.1 shr
bx.1 shr
bx.1 shr
b y t e;Xs i n
add
du;i op. fbopfixfesmeraob-t lgfyeet fet
mov
bx.DISPLAY-MEMORY-SEGMENT
mov
es.bx
; E S : D I w bhptat h
yiohtctetieheno t s
; upper l e f t o f t h e
imagegoes
; Get t h ew i d t h
and h e i g h t o f t h e
w mov
i d t ht h e ; gcext. [ s i l
inc
inc
mov
[ sbi x .
1
si
inc
si
inc
dx.GC-INDEX
mov
mov ,BIT-MASK
a1
;dtl ehxaeo. avulet
dx inc
image.
si
si
h et ih;ggehett
t h;e t o
GC Irnedgepi sxo ti en rt i n g
B i t Mask r e g i s t e r
507
RowLoop:
push
push
push
ColumnLoop:
mov
out
mov
mov
shr
shr
shr
shr
stosb
ror
jc
dec
a1 ,ah
dx.al
a1,es:Cdil
a1 , [ s i 1
al.1
a1 .1
a1 .1
a1 ,1
ah.1
CheckMorePixels
di
mask
: s e t t h e b i t mask t o draw t h i s p i x e l
: l o a dt h el a t c h e s
; g e tt h en e x tt w oc h u n k yp i x e l s
;move t h e f i r s t p i x e l i n t o t h e l s b
:drawthe
first pixel
;movemask
t on e x tp i x e lp o s i t i o n
: i sn e x tp i x e li nt h ea d j a c e n tb y t e ?
:no
CheckMorePixels:
i f t ah reer e
more
any
pixels
dec ;seecx
AdvanceToNextScanLine
: a c r o s s i n image
jz
mov
a1 ,ah
d x; b.tsahietlet
mask t o draw t hp i sx e l
out
a1 . e s ;:l l[atodht aicel dh e s
mov
: g e tt h e
same t w oc h u n k yp i x e l sa g a i n
lodsb
: a n da d v a n c ep o i n t e rt ot h en e x t
; t w op i x e l s
; d r a wt h es e c o n do ft h et w op i x e l s
stosb
ror
:movemask
t on e x tp i x e lp o s i t i o n
ah.1
CheckMorePixels2 ; i s n e x t p i x e l i n t h e a d j a c e n t b y t e ?
jc
di
dec
:no
CheckMorePixels2:
1oop
Col
umnLoop
:see
i f t h e raer e
; across i n t h e
any
more
image
pixels
CheckMoreScanLines
short
jmp
AdvanceToNextScanLine:
sinc
:advance
n tehxseot hfa ert ot
; scan 1 i n e i n t h e
image
CheckMoreScanLines:
: g e tb a c kt h ed e s t i n a t i o no f f s e t
pop
di
: g e tb a c kt h ew i d t h
POP
cx
; g e tb a c kt h el e f tc o l u m n sb i t
POP
ax
add
di.SCREEN-WIDTH-IN-BYTES
; p o i n tt ot h es t a r to ft h en e x ts c a n
: 1 i n e o f t h e image
;see
bxdec
i f t haer ree
scan
more
any
t h:ei n
image
jnz
RowLoop
ret
DrawFromChunkyBitmap
endp
Code
ends
end
Start
mask
lines
508
Chapter 27
Forper$ormance, its best to store 16-color bitmaps in pre-separatedfour-plane format in system memory, and copy one plane at a time to the screen. Ideally, such
bitmaps should be copiedone scan line at a time, with all four planes completedfor
one scan line before moving on to the next.
I say this because when entire images
are copied one planeat a time, nasty transient color effects can occur as one plane
becomes visibly changed before other planes have been modified.
A more serviceable use of write mode 2 is shown in the program presented in Listing
27.2. The program draws multicolored horizontal, vertical, and diagonal lines, basing the color patterns on passed color tables. Write mode 2 is ideal because in this
application color can vary from one pixel to the next, and in write mode 2 all thats
required to set pixelcolor is a change of the lower nibble of the byte written by the
CPU. Set/reset could be used to achieve the same result, but an index/data pair of
OUTSwould be required to set the Set/Reset register to each new color. Similarly,
the Map Mask register could be used in write mode 0 to set pixel color, but in this
case not only wouldan index/data pairof OUTSbe required but therewould also be
no guaranteethat data already in display memory wouldnt interfere with the color
of the pixel being drawn, since the Map Mask register allows onlyselected planes to
be drawn to.
Listing 27.2 is hardly a comprehensive line drawing program. It draws only a few
special line cases, and although it is reasonably fast,it is far from the fastest possible
code to handlethose cases, becauseit goes through a dot-plot routine andbecause it
draws horizontal lines a pixel rather than a byte at a time. Write mode 2 would,
however, servejust as well in a full-blown line drawing routine. For any type of patterned line drawing on the VGA, the basic approach remains the same: Use the bit
mask to select the pixel (or pixels) to be alteredand use the CPU byte in write mode
2 to select the color in which to draw.
LISTING 27.2127-2.ASM
: d r a w i n gl i n e si nc o l o rp a t t e r n s .
: Assemble w i t h MASM o r TASM
: By MichaelAbrash
Stack
segment
para
stack
STACK
db
512 dup(0)
Stack
ends
SCREEN-WIDTH-IN-BYTES
GRAPHICSLSEGMENT
SC-INDEX
MAP-MASK
GC-INDEX
GRAPHICS-MODE
BIT-MASK
80
OaOOOh
:mode 10 b i t - m a p segment
3c4h
:Sequence
equ
C o n t r o lIlnedr e gx i s t e r
equ
2
; i n d e x o f Map Mask r e g i s t e r
e0q3uc: G
e hr a p hCi cosn t r o Il lnedrre gx
equ
5
: i n d e x o f Graphics Mode r e g
0
: i n d e x o f B i t Mask r e g
equ
equ
equ
509
Data
segment p a r a common 'DATA'
db
P a t t e r n 016
db
0, 1, 2, 6.5.4.3.
7. 8
db
9 . 10, 11. 12.
13.
14.
15
db
6
Patternl
2. 2. 2. 10,10.10
db
8
Pattern2
db
db
15,
15.
15.
0.
0.
15. 0. 0
db
9
Pattern3
db
1.
1,
2. 2. 2. 4.
4.
4
ends
Data
Code
Spt raor ct
segment p a r ap u b l i c 'CODE'
assume cs:Code.ds:Data
near
ax.0ata
mov
ds.ax
mov
ax, 10h
mo v
in t
: sve1i dl0eehcot
mode
(640x350)
10h
: Draw 8 r a d i a l l i n e s i n u p p e r - l e f t q u a d r a n t i n p a t t e r n
mov
mov
mov
call
0.
bx.0
CX.0
s i . o f f s e tP a t t e r n 0
RuadrantUp
: Draw 8 r a d i a l l i n e s i n u p p e r - r i g h t q u a d r a n t i n p a t t e r n
mov
mov
mov
call
b x , 320
cx.0
s i . o f f s e tP a t t e r n l
RuadrantUp
: Draw 8 r a d i a l l i n e s i n l o w e r - l e f t q u a d r a n t i n p a t t e r n
mov
mov
mov
call
1.
2.
bx.0
cx.175
s i . o f f s e tP a t t e r n 2
Quadrantup
: Draw 8 r a d i a l l i n e s i n l o w e r - r i g h t q u a d r a n t i n p a t t e r n 3 .
mov
mov
mov
call
: W a i tf o r
Zlh
bx.320
cx.175
s i . o f f s e tP a t t e r n 3
Quadrantup
a k e yb e f o r er e t u r n i n gt ot e x t
mov
int
mov
int
mov
int
mode andending.
ah.0lh
ax.03h
10h
ah.4ch
21h
: Draws 8 r a d i a l l i n e s w i t h s p e c i f i e d p a t t e r n i n s p e c i f i e d
: quadrant.
510
Chapter 27
mode 10h
; Input:
SI -
X c o o r d i n a t eo fu p p e rl e f tc o r n e ro fq u a d r a n t
Y c o o r d i n a t eo fu p p e rl e f tc o r n e ro fq u a d r a n t
p o i n t e rt op a t t e r n ,i nf o l l o w i n gf o r m :
B y t e 0: L e n g t ho fp a t t e r n
B y t e 1: Start o f p a t t e r n , one c o l o rp e rb y t e
BX
CX
: AX, BX. C X , DX d e s t r o y e d
Quadrantup
add
add
mov
mov
call
mov
mov
call
mov
mov
call
mov
mov
call
mov
mov
call
mov
mov
call
rnov
mov
call
mov
mov
call
ret
Quadrantup
n e apr o c
bx. 160
cx.87
ax.0
dx.160
L i neUp
ax, 1
dx, 88
L i neUp
ax.2
dx, 88
L i neUp
ax.3
dx, 88
L i neUp
ax.4
dx.161
L i neUp
ax.5
dx, 88
L i neUp
ax.6
dx, 88
L i neUp
ax.7
dx, 88
L i neUp
; p o i n tt ot h ec e n t e ro ft h eq u a d r a n t
; d r a wh o r i z o n t a ll i n et or i g h t
edge
: d r a wd i a g o n a ll i n et ou p p e rr i g h t
: d r a wv e r t i c a ll i n et ot o p
edge
: d r a wd i a g o n a ll i n et ou p p e rl e f t
:draw h o r i z o n t a l l i n e t o l e f t
edge
: d r a wd i a g o n a ll i n et ol o w e rl e f t
: d r a wv e r t i c a ll i n et ob o t t o m
edge
; d r a wd i a g o n a ll i n et ob o t t o mr i g h t
endp
; Draws a h o r i z o n t a l ,v e r t i c a l ,o rd i a g o n a ll i n e( o n eo ft h ee i g h t
: p o s s i b l er a d i a ll i n e s )
: s t a r t i n gp o i n t .
AX
- linedirection,
BX
CX
; Input:
DX
;
o f t h es p e c i f i e dl e n g t hf r o mt h es p e c i f i e d
as f o l l o w s :
3 2 1
4 * 0
5 6 7
X c o o r d i n a t eo fs t a r t i n gp o i n t
Y coordinateofstartingpoint
l e n g t ho fl i n e
(number o fp i x e l sd r a w n )
Al r e g i s t e r sp r e s e r v e d .
: T a b l eo fv e c t o r st or o u t i n e sf o re a c ho ft h e
L i n e U p V ewlcaotbor edr sl
dw
dw
8 p o s s i b l el i n e s .
51 1
; Macro t o d r a wh o r i z o n t a l ,v e r t i c a l ,o rd i a g o n a ll i n e .
; Input:
--
XParm
1 t o draw r i g h t , -1 t o draw l e f t , 0 t o n o t move h o r z .
1 t o drawup,
-1 t o draw down, 0 t o n o t move v e r t .
YParm
BX
X startlocation
CX
Y startlocation
DX
number o f p i x e l s t o draw
DS:SI
linepattern
---
jnz
o f pattern
of pattern
;getcolorofthispixel
DotUpInColor
;...and
draw
it
bx
bx
cx
cx
ah
end
CheckMoreLine
;bg. dsaestictaikr t
;at
o f pattern?
mov
1 odsb
mov cpoa;utrnteetasrhen.t a l
CheckMoreLine:
dec
j nz
jmp
endm
o f pattern
dx
L i neUpLoop
L i neUpEnd
push
push
push
push
push
push
push
near
ax
bx
cx
dx
si
di
es
mov
di ,ax
mov
mov
ax.GRAPHICSLSEGMENT
es ,ax
push
;save
dx
L i n e up pr o c
512 Chapter 27
l e n g tlhi n e
: E n a b l ew r i t e s
dx.SC-INDEX
mov
mov
dx.al out
dx
inc
mov
dx.al out
t o a l lp l a n e s .
a1 .MAP-MASK
a1 .Ofh
: S e l e c t w r i t e mode 2.
dx.GC-INDEX
mov
mov
dx.al out
dx
inc
mov
dx.al out
a1 .GRAPHICS-MODE
al.02h
: V e c t o rt op r o p e rr o u t i n e .
l e n gl it nh ebPOP
a c :kg e t d x
di.l shl
cs:CLineUpVectors+di]
jmp
: H o r i z o n t a ll i n et or i g h t .
L i neUpO:
MLineUp 1, 0
: D i a g o n a ll i n et ou p p e rr i g h t .
L i n e u p 1:
MLineUp 1. -1
: V e r t i c a ll i n et ot o p .
L i neUp2:
MLineUp 0. -1
: D i a g o n a ll i n et ou p p e rl e f t .
L i neUp3:
MLineUp -1. - 1
: H o r i z o n t a ll i n et ol e f t .
L i neUp4:
MLineUp -1. 0
: D i a g o n a ll i n et ob o t t o ml e f t .
L i neUp5:
MLineUp -1. 1
: V e r t i c a ll i n e
t o bottom.
LineUpC:
MLineUp 0. 1
51 3
: D i a g o n a ll i n et ob o t t o mr i g h t .
L i neUp7 :
MLineUp 1. 1
L i neUpEnd:
POP
es
di
si
dx
cx
bx
ax
POP
pop
POP
POP
POP
POP
L i neUp
ret
endp
: Draws a d o t i n t h e s p e c i f i e d c o l o r a t t h e s p e c i f i e d
: Assumes t h a t t h e VGA i s i n w r i t e mode 2 w i t h w r i t e s
: e n a b l e da n dt h a t
: Input:
AL
BX
CX
ES
---
ES p o i n t s t o d i s p l a y
1 oca ti on.
t o a 1 1p l a n e s
memory.
dotcolor
X coordinate o f dot
Y coordinateofdot
d i s p l a y memory segment
:A
l r e g i s t e r sp r e s e r v e d .
DotUpInCol or
bx push
cx push
dx push
push
pnreoacr
di
: P o i n t ES:DI t o t h e d i s p l a y
memory b y t ei nw h i c ht h ep i x e lg o e s ,w i t h
: t h e b i t mask s e t up t o a c c e s st h a tp i x e lw i t h i nt h ea d d r e s s e db y t e .
c o l o r push
d: po rt e s e r vaex
ax.SCREEN-WIDTH-IN-BYTES
mov
mu1
cx
:offsetofstartoftopscanline
mov
d i .ax
mov
c l ,b l
cl .lllb
and
dx.GC-INDEX
mov
a1 .BIT-MASK
mov
dx,al
out
inc
dx
mov
al.80h
shr
a1 . c l
:setthebit
mask f o r t h e p i x e l
out
dx.al
shr
bx.1
shr
bx.1
shr
:X i nb y t e s
bx.1
add
: o f f s e to fb y t ep i x e li si n
d i ,bx
:1oad 1a t c h e s
a1 . e s : [ d i l
mov
: g e tb a c kd o t
color
ax
POP
:writedotindesiredcolor
stosb
POP
POP
514
Chapter 27
di
dx
POP
POP
ret
D o t U p I n eC no dl po r
Start
endp
Code
ends
Startend
cx
bx
Write mode 2 lends itself tomore efficient implementations when the drawing color
changes frequently, as in Listing 27.2.
Set/reset tendsto be superior when many pixels in succession are drawn in thesame
color, since with set/reset enabled for all planes the Set/Reset register provides the
color data and as a result the CPU is free to draw whatever byte valueit wishes. For
example, theCPU can execute anOR instruction to display memory when set/reset
is enabled for all planes, thus both loading the latches and writing the color value
with a single instruction, secure in the knowledge that the value it writes is ignored
in favor of the set/reset color.
Set/reset is also the mode of choice wheneverit is necessary to force
the value writtento
some planes to a fixed value while allowing the CPU byte to modify other planes.
This is the mode of operation when set/reset is enabled forsome but notall planes.
5 15
preserve some or all of the image while I temporarily switch to text mode 3to give
my user a Help screen.
Naturally memory is scarce so Id rather notmake a copy of
the video buffer at AOOOH to remember theimage while I digress to the Help text.
The EGA BIOS saysthat the screen memory
will not be cleared on a modeset if bit 7
of AL is set. Yet if I try that, itis clear that writing text into theB800H buffer trashes
much more than the
4K bytes ofa text page; whenI switch backto mode 10H, ghosts
appear in the form
of bands of colored dots. (When in
text mode, I domake a copy
of the 4K buffer at B800H before showing the help; and I restore the 4K before
switching back to mode 10H.) Is there a way to preserve the graphics image while I
switch to text mode?
A corollary to this question is: Where does the 64/128/256Kof
EGA memory hide
when the EGA is in text mode? Some I guess is used to store charactersets, but what
happens to the rest? Or rather, how can I protect it?
Those are good questions.Alas, answering them in full would require extensive explanation that would have little general application, so Im not going to do that.
However, the issue of how to go to text mode and back without losing the graphics
image certainly rates a short discussion, complete with some working code. Thats
especially true given that both the discussion and the codeapply just as well to the
VGA as to the EGA (with a few differences in mode 12H, the
VGAs high-resolution
mode, as noted below).
Phil is indeed correct in his observation that setting bit7 of AL instructs the BIOS
not to clear display memory on modesets, and heis also correct insurmising that a
font is loaded when going
to text mode. The normal mode 10H bitmap
occupies the
first 28,000 bytes ofeach of the VGAs four planes. (The mode 12H bitmap
takes up
the first 38,400 bytes ofeach plane.) The normal mode 3 character/attribute
memory
map resides in the first 4000 bytes of planes 0 and 1 (the blue and green planes in
mode 10H). The
standard font in mode
3 is stored in the first 8K of plane 2 (the red
plane in mode10H). Neither mode 3nor any other text mode makes use of plane 3
(the intensity plane in mode IOH) ; if necessary, plane 3 could be used as scratch
memory intext mode.
Consequently, you can get away with savinga totalof just under16K bytes-the first
4000 bytes of planes 0 and 1 and the first 8K bytes of plane 2-when going from
mode 10H or mode 12H
to mode 3, to be restored on returning to graphics mode.
Thats hardly all there is to the matter of going fromtext to graphics and back without bitmap corruption, though. One interesting point
is that the mode 10H bitmap
can be relocated toA000:8000 simply by doing a mode set
to mode 10H and setting
the startaddress (programmed atCRT Controller registers OCH and ODH) to 8000H.
You can then access display memory starting at A800:8000 instead of the normal
AOOO:OOOO, with the resultant display exactly like that of normal mode 10H. There
are BIOS issues,since the BIOS doesnt automatically access display memory at the
516
Chapter 27
new start address, but if your program does all its drawing directly without the help
of the BIOS, thats no problem.
The mode 12H bitmap
cant start at A000:8000, because its so long thatit would run
off the endof display memory. However,the mode 12H bitmap
can be relocated to,
say, A000:6000, where it would fit without conflicting with the default font or the
normal text mode memory map, although it would overlap two of the upper pages
available for use (but rarely used) by text-mode programs.
At any rate, once the graphics mode bitmap is relocated, flipping to text mode and
back becomes painless. The memory used by mode 3 doesntoverlap the relocated
mode 10H bitmap atall (unless additional portions of font memory are loaded),so
all youneed do is set bit 7 of AL on modesets in order to flip back and forthbetween
the two modes.
Another interesting point aboutflipping from graphics to text and back is that the
standard mode 3 character/attribute map doesnt
actually takeup every byte ofthe
first 4000 bytes of planes 0 and 1. The standard mode 3 character/attribute map
actually only takesup every even byte of the first 4000 in each plane; the odd bytes
are left untouched. This means that only about 12K bytes actually have to be saved
when going to text mode. The code in Listing 27.3flips from graphics mode to text
mode and back, saving onlythose 12K bytes that actually haveto be saved. This code
while in graphics mode, but
saves and restores the first 8K of plane 2 (the font area)
performs the save and restore of the 4000 bytes used for the character/attribute
map while in text mode, because the characters and attributes, which are actually
stored in the even bytes ofplanes 0 and 1, respectively, appear to be contiguous bytes
in memory in text mode and so are easily saved asa single block.
Explaining why only everyother byte ofplanes 0 and 1 is used in text mode and why
characters and attributes appear to be contiguous bytes when they are actually in
different planes is a large part of the explanationIm not going to go into now. One
bit of fallout from this, however, is that if you flip to text mode and preserve the
graphics bitmap using the mechanism illustrated in Listing 27.3, you
shouldnt write
to anytext page other thanpage 0 (that is, dont write to any offsetin display memory
above 3999 in text mode) or alter the Page Select bit in the Miscellaneous Output
register (3C2H) while in text mode. In order to allow completely unfettered access
to text pages, it would be necessary to save every byte in the first 32K of each of
planes 0 and 1. (On the other hand, this would allow up to 16 text screens to be
if any fonts other
stored simultaneously, with any
one displayable instantly.) Moreover,
than the default font are loaded, the portions
of plane 2 that those particular fonts
are loaded intowould have to be saved, up to a maximum of all 64K of plane 2. In
the worst case,a full 128K would have
to be saved in order to preserve all the memory
potentially used by text mode.
As I said, Phil Colemans question is an interestingone, andIve onlytouched on the
intriguing possibilities arising from thevarious configurations of display memory in
Yet Another VGA Write Mode
517
VGA graphics and text modes. Right now, though, we've still got the basics of the
remarkably complex (but rewarding!) VGA to cover.
LISTING 27.3
L27-3.ASM
: Program t o i l l u s t r a t e f l i p p i n g f r o m b i t - m a p p e d g r a p h i c s
mode t o
: t e x t mode a n db a c kw i t h o u tl o s i n ga n y
o f t h eg r a p h i c sb i t - m a p .
: A s s e m b l ew i t h
MASM o r TASM
: By MichaelAbrash
Stack
segment
para
stack
dup(0)
512 db
Stack
ends
'STACK'
equ
OaOOOh :mode
10
bit-map
segment
GRAPHICS-SEGMENT
TEXT-SEGMENT
equ
Ob800h
:mode
3
bit-map
segment
SC-INDEX
: SreeCq3goIucins4edtthnreoecrxlqel e
ur
: i n d e x o f Map Mask r e g i s t e r
MAP-MASK
equ
2
e3 qc: G
ue hr a p hCi cosn t r oIl nl edr reegxi s t e r
GC-INDEX
READ-MAP
equ
4
o :f i n d e x
Read Map r e g i s t e r
Data
segment
para
common 'DATA'
lbaybt e l
GStri keAnyKeyMsg0
Odh. Oah. 'Graphicsmode',
Odh. Oah
db
db
' S t r i k e anykey
t oc o n t i n u e . . . ' ,
Odh. Oah. ' f '
G S t r i kbey A
t en lyaKbeeyl M s g l
Odh.
Oah.
db
' G r a p h i c s mode a g a i n ' ,
db
' S t r i k e any
key
ct o n t i n u e . . . ' .
Odh. Oah
Odh.
Oah.
T S t r i keAnyKeyMsg
b lyat be e l
Odh.
Oah,
'Textmode',
Odh. Oah
db
' S t r i k e anykey
t oc o n t i n u e . . . ' ,
db
P1 ane2Save
CharAttSave
(?I
2000h
dup
db
db
4000 dup ( ? )
; s a v ea r e af o rp l a n e
2 data
: where f o n tg e t sl o a d e d
; s a v ea r e af o r
memory wiped
: o u t by c h a r a c t e r / a t t r i b u t e
: d a t a i n t e x t mode
ends
Data
Code
s e g m epnat pr au b l i c
'CODE'
assume cs:Code.
ds:Data
n peSratoracr t
mov
int
ax.10h
:vsied1lee0ocht
: Fill t h eg r a p h i c sb i t - m a pw i t h
cld
ax.GRAPHICS-SEGMENT
mov
mov
,axes
mov
ah.3
mov pt ol :af noceuxsr . 4
mov
dx.SC-INDEX
mov
a1 .MAP-MASK
do:xl ue, tatahlvee
dx inc
518
Chapter 27
'$'
mode
(640x350)
10h
a c o l o r e dp a t t e r n .
: i n i t i a l fill p a t t e r n
fill
SC I n pd oe itxnh teion g
: Map Mask r e g i s t e r
FillBitMap:
mov
al.10h
shr
a1 , c l
dx.al out
di.di sub
mov
t ha1
:eg e, at h
cx push
mov
cx.8000h
;do
stosw
rep
c o pu lnat nbPeaOcPk: g e t c x
ah.1 shl
ah.1 shl
1 oop
F i1 1 B i tMap
; Putup"strikeanykey"message.
mov
mov
mov
mov
int
; W a i tf o r
ax.Data
ds.ax
d x . o f f s e t GStrikeAnyKeyMsgO
ah.9
21h
a key.
mov
int
ah.0lh
21h
; Save t h e 8K o f p l a n e
2 t h a t will b eu s e db yt h ef o n t .
dx.GC-INDEX
mov
mov
a1 , READ-MAP
dx.al out
dx
inc
mov
a1 .2
dx.al out
: s e tu pt or e a df r o mp l a n e
mov
ax.Data
mov
es.ax
ax.GRAPHICS-SEGMENT
mov
mov
ds.ax
si.si sub
mov
d i . o f f s e t PlaneZSave
mov
cx.Z000h/2
:save
8K ( l e n d
goe
t fhf a uf ol tn t )
r e p movsw
; GO t o t e x t
mode w i t h o u t c l e a r i n g d i s p l a y
mov
int
; Save t h e t e x t
memory.
ax.083h
10h
mode b i t - m a p .
mov
ax.Data
mov
es.ax
ax.TEXT-SEGMENT
mov
mov
ds.ax
sub
si.si
mov
d i , o fCf sheatr A t t S a v e
mov
c x . 4 0 0 0; l/e2notgonestfecxhrt e e n
r e p movsw
i n words
519
;
;
F i l lt h et e x t
message.
mov
ax.TEXT-SEGMENT
mov
es,ax
di .di
sub
a1
;fill c h a r a c t e r
mov
ah.7
;fill a t t r i b u t e
mov
mov
c x . 4 0 0 0 ;/ l2e n got hnft esxct r e ew
i nno r d s
r e ps t o s w
mov
ax.Data
mov
ds.ax
mov
d x . oTf fSs terti k e A n y K e y M s g
mov
ah.9
int
21h
. I . '
; W a i tf o r
akey.
mov
int
;
ah.0lh
21h
R e s t o r et h et e x t
mode s c r e e n t o t h e s t a t e
i t was i n on e n t e r i n g
; t e x t mode.
mov
ax.0ata
mov
ds.ax
ax.TEXT-SEGMENT
mov
mov
es,ax
mov
s i . o fCf sh ea tr A t t S a v e
di.di sub
mov
c x . 4 0 0 0; l/e2notgonestfecxhrt e e n
r e p movsw
; R e t u r nt o
mov
int
;
mode 1 0 h w i t h o u t c l e a r i n g d i s p l a y
2 t h a t was w i p e do u tb yt h ef o n t .
mov
dx,SC-INDEX
mov
a1 .MAP-MASK
out
dx.al
dx
inc
mov
a1 .4
pwl tardtooionx;tuuse.ptae lt
mov
ax.Data
mov
ds.ax
ax.GRAPHICS-SEGMENT
mov
mov
es.ax
s i . o f f s e t PlaneESave
mov
sub
. ddi i
mov
c x . 2 0 0 0 ;hr/e2s t o r e
r e p movsw
; P u tu p" s t r i k ea n yk e y "m e s s a g e .
520
Chapter 27
memory.
ax,90h
10h
R e s t o r et h ep o r t i o no fp l a n e
mov
mov
i n words
ax.Data
ds.ax
8K ( l e n dgoet fhf a uf ol tn t )
Previous
mov
mov
int
: W a i tf o rak e yb e f o r er e t u r n i n g
Start
Code
mov
int
mov
int
mov
int
endp
ends
end
Home
Next
d x . oG
f fSs terti k e A n y K e y M s g l
ah.9
21h
t o t e x t mode a n de n d i n g .
ah.0lh
21h
ax.03h
10h
ah.4ch
21h
Start
52 1
Previous
chapter 28
Home
Next
Read Mode 0
Read mode 0 is actually relatively uncomplicated, given that you understand the
four-plane nature of the
four-plane natureof the VGA. (If you dont understand the
VGA, I strongly urge you to read Chapters 23-27 before continuing with this chapter.) Read mode 0, the read mode counterpart of write mode 0, lets you read from
one (andonly one) plane of VGA memory at any one time.
Read mode 0 is selected by setting bit 3 of the Graphics Mode register (Graphics
Controller register 5 ) to 0. When read mode 0 is active, the plane that supplies the
data when the CPU reads VGA memory is the plane selected by bits 1 and 0 of the
525
Read Map register (Graphics Controller register4).When the Read Map register is
set to 0, CPU reads come from plane
0 (the plane that normally contains bluepixel
data). When the Read Map register is set to 1, CPU reads come from plane1;when
the Read Map register is 2, CPU reads come from plane2; and when the Read Map
register is 3, CPU reads come from plane3.
That all seemssimple enough; in read mode
0, the Read Map register acts as a selector among the four planes, determining which one of the planes will supply the
value returned to the CPU. There is a slight complication, however, in that thevalue
written to the Read Map register in order to read from a given plane is not the same
as the value written to the Map Mask register (Sequence Controller register 2) in
order to write to that plane.
Why is that? Well, in read mode 0, one andonly one plane can be read at atime, so
there are only four possible settings of the Read Map register: 0, 1, 2, or 3, to select
reads from plane 0, 1, 2, or 3. In write mode 0, by contrast (in fact, in any write
mode), any or all planes may be written to at once,since the byte written by the CPU
can fan out to multiple planes. Consequently, there are not four butsixteen possible settings of the Map Mask register. The setting of the Map Maskregister towrite
only to plane 0 is 1; to write onlyto plane 1 is 2; to write only to plane 2 is 4;and to
write only to plane 3 is 8.
As you can see, the settings of the Read Map and Map Mask registers for accessing a
given plane dont match.The code inListing 28.1illustrates this. Listing 28.1 simply
copies a sixteencolorimage from system memory VGA
to memory, one plane at atime,
then animates by repeatedly copying the image back to system memory, again one
plane at a time, clearing the oldimage, and copying the image to anew location in
VGA memory. Note the differing settings of the Read Map and Map Mask registers.
LISTING 28.1 128- 1.ASM
; Program t o i l l u s t r a t e t h e
use o f t h e Read Map r e g i s t e r i n r e a d
mode 0.
; A n i m a t e sb yc o p y i n ga1 6 - C o l O ri m a g ef r o m
VGA memory t o systemmemory.
; one p l a n ea tat i m e ,t h e nc o p y i n gt h ei m a g eb a c kt oa
new l o c a t i o n
: i n VGA memory.
: By M i c h a e Al b r a s h
s t a cs ke g m ew
n to sr dt a c k
512 dup ( ? )
db
stack ends
data
segment
IMAGE-WIDTHEQU
IMAGELHEIGHT
LEFT-BOUND EQU
RIGHT-BOUNDEOU
VGA-SEGMENTEQU
SCREEN-WIDTH
SC-INDEX
EQU
GC-INDEX
EQU
526
Chapter 28
STACK
word DATA
4
; i nb y t e s
: i np i x e l s
EQU
32
10
; i nb y t e s
66
; i nb y t e s
OaOOOh
EQU
80
b y t e s; i n
3; cS4ehq u eCnocnet r oI nl rl de ergxi s t e r
3;cGerha p hCioc ns t r oI lnlrdeeergxi s t e r
MAP-MASK
READ-MAP
EOU
EQU
2
4
SC
GC
: B a s ep a t t e r nf o r1 6 - C O l O ri m a g e .
P a t t e r n P l a nl aebbO
yet el
dup
32
db
(Offh,Offh.O.O)
P a t t e r n Pal n e l
1 a b ebl y t e
dup
32
db
(Offh.O.Offh.0)
P a t t e r n Paln e 2
1 a b ebl y t e
dup
32
(OfOh.OfOh.OfOh.OfOh)
db
P a t t e r n Paln e 3
1 a b ebl y t e
dup
32
(0cch.Occh.Occh.Occh)
db
: T e m p o r a r ys t o r a g ef o r1 6 - c o l o ri m a g ed u r i n ga n i m a t i o n .
ImagePlaneOdb
ImagePl anel db
ImagePlaneZdb
ImagePlane3 db
32*4 d u p
32*4dup
32*4dup
32*4dup
: C u r r e n ti m a g el o c a t i o n
ImageX
b y t e s :dw
in
ImageY
dw
ImageXDi r e c t i o n dw
data
ends
(?I
(?I
(?)
(?)
& direction.
40
100
p i x e l:si n
1
b: iynt e s
code
segment
word
'CODE'
assume
cs:code,ds:data
S t a r tp r o cn e a r
c ld
mov
ax.data
mov
ds.ax
: S e l e c tg r a p h i c s
mov
int
mode 10h.
ax,lOh
10h
: Draw t h e i n i t i a l image
mov
call
si .offset PatternPlaneO
DrawImage
: Loop t oa n i m a t eb yc o p y i n gt h ei m a g ef r o m
VGA memory t o systemmemory,
: e r a s i n gt h ei m a g e ,a n dc o p y i n gt h ei m a g ef r o ms y s t e m
memory t o anew
: l o c a t i o n i n VGA memory.Ends
when a key i s h i t .
AnimateLoop:
: Copy t h ei m a g ef r o m
mov
call
d .i o f f s e t
GetImage
: C l e a rt h ei m a g ef r o m
call
ImagePlaneO
VGA memory.
EraseImage
527
: Advancetheimage
X c o o r d i n a t e ,r e v e r s i n gd i r e c t i o n
: o ft h es c r e e nh a sb e e nr e a c h e d .
if e i t h e r edge
mov
ax,CImageX]
ax.LEFT-BOUND
cmp
jz
ReverseDirection
ax.RIGHT-BOUND
cmp
jnz
SetNewX
ReverseDirection:
CnI emga g e X D i r e c t i o n ]
SetNewX:
ax.CImageXDirection]
add
mov
CImageX1.a~
: Draw t h ei m a g eb yc o p y i n g
mov
call
i t f r o ms y s t e m
s i . o f f s e t ImagePlaneO
DrawImage
: S l o wt h i n g s
down a b i t f o r v i s i b i l i t y ( a d j u s t
asneeded).
mov
cx.0
Del ayLoop:
1 oop Delay Loop
: See if a keyhasbeen
mov
int
jz
h i t ,e n d i n gt h ep r o g r a m .
ah.1
16h
AnimateLoop
: C l e a rt h ek e y .r e t u r nt ot e x t
sub
int
mov
int
mov
int
S t a r t endp
mode,and
r e t u r n t o 00s.
ah.ah
16h
ax.3
10h
ah.4ch
21h
: Draws t h ei m a g ea to f f s e t
: VGA memory.
DrawImage
proc
near
ax,VGA-SEGMENT
mov
mov
es,ax
c aG
ll etImageOffset
DS:SI t o t h e c u r r e n t
:ES:DI
i m a g el o c a t i o ni n
i st h ed e s t i n a t i o na d d r e s sf o tr h e
mov
d x ,SC-I NDEX
mov
al.l
p:l da on e
0 first
DrawImagePlaneLoop:
push d i
:image i s drawn taht e
same o f f s ei nt
: e a c hp l a n e
: p r easpspxeleualrsenvhcet
mov
a1 .MAP-MASK :Map Mask i n d e x
o du xt . :apl o i n t
SC I n d et xoh e
Map Mask r e g i s t e r
POPpbs;laegaclenaektexc t
SC i n rdeegxi s t e r
: pdiotnxoi cn t
528
Chapter 28
o ud tx . a; sl euttph e
SC D raet ga i s t e r
; pdobxeitacnoct k
mov
bx.IMAGE-HEIGHT
; # osf c a n
l i n e si ni m a g e
DrawImageLoop:
mov
cx.IMAGE-WIDTH
;# ob fy t easc r o si m
s age
movsb
rep
add
di.SCREEN-WIDTH-IMAGE-WIDTH
: p o i n tt on e x ts c a nl i n eo fi m a g e
lines?
scan
more
;anybxdec
jnz
DrawImageLoop
pop : dgbieaticmk a sg teao rf tf si en t
shl
a1 . I
:Map Mask s e t t i nf ong re px lt a n e
CmD
a l . 1 ;0hha v e
we done af lol uo rl a n e s ?
DrawImagePlaneLoop
jnz
ret
DrawImage
endp
; C o p i e st h ei m a g ef r o mi t sc u r r e n tl o c a t i o ni n
; b u f f e ra t
DS:DI.
VGA memory
VGA memory i n t o t h e
p r once a r
GetImage
:move d e s t i n a t ioofnf s e t
mov . ds i
into SI
: D I i so f f s e to f
image i n VGA memory
c a l l GetImageOffset
image.01
i sd e s t i n a t i o no f f s e t
xchg s i . d i ;SI i s o f f s e t o f
push ds
es
; E S :dDeI s ti isn a t i o n
POP
ax.VGA-SEGMENT
mov
mov ,ax
ds
; D S : S I s iosu r c e
dx.GC-INDEX
mov
a1 .a1
:do
plane
sub
0 first
GetImagePlaneLoop:
push s i
;image comes f r o m same o f f s ei nt
e apclha n e
; p r easpxelualrsenvhcet
mov
a1,READAP;Read
Map i n d e x
oduxt .:aplo i n t
GC I n d et ox
Read Map r e g i s t e r
POPpbs;lageacelnaketexc t
; pdiotxnoi cn t
GC I nrdeegxi s t e r
o du xt . ;asl uet hpt e
Read Map tsoe l e cr et a df sr o m
; t h ep l a n eo fi n t e r e s t
; pdobexitacnoct k
GC d ar et ag i s t e r
mov
bx,IMAGE-HEIGHT
; C osf c a nl i n e isni m a g e
GetImageLoop:
mov
cx.IMAGE-WIDTH
;# o
b fy t easc r o si m
s age
movsb
rep
add
si.SCREEN-WIDTH-IMAGE-WIDTH
; p o i n tt on e x ts c a nl i n eo fi m a g e
lsm
i n;caoeanrsnbey?xd e c
GetImageLoop
jnz
pop ;bgi amescsatikotgaf efrst e t
inc
a1
;Read Map s e t tnfi nope grlxat n e
cmp
a:lh. 4a v e
we done af ol plulra n e s ?
GjentzI m a g e P l a n e L o o p
push
es
POP
ds
; r eosrt iogri en a l
DS
ret
GetImage
endp
; E r a s e st h ei m a g e
a t i t sc u r r e n tl o c a t i o n .
529
EraseImage
proc
near
mov
dx.SC_INDEX
mov
a1 .MAP-MASK
: p o i n t SC I n d e x t o t h e
Map Mask r e g i s t e r
doxu, at l
di nx c
; p o i n t t o SC D a t ar e g i s t e r
mov
a1 .Ofh
: s e tu pt h e
Map Mask t o a l l o w w r i t e s t o g o t o
doxu. at l
: a l l 4 planes
ax.VGALSEGMENT
mov
mov
es.ax
C aG
l l e t I m a g e O f f s e: tE S : DpIo i n ttstohset a ratd d r e s s
; o f t h e image
a1,al
sub
: e r a s ew i t hz e r o s
mov
bx.IMAGE-HEIGHT
;# o f s c a n l i n e s i n
image
EraseImageLooD:
mov cX.IMAGE-WIDTH
:#obfy t easc r o sism a g e
stosb
rep
di.SCREEN-WIDTH-IMAGE-WIDTH
add
; p o i n tt on e x ts c a nl i n eo fi m a g e
l ism
nc:eoasrne?by x
dec
j nz
EraseImageLoop
ret
EraseImage endp
: R e t u r n st h ec u r r e n to f f s e to ft h ei m a g ei nt h e
G e t I m a g e O f fpsnreeotac r
ax,SCREEN_WIDTH
mov
mu1
[ ImageY 1
ax.[lmageX]
add
mov
d i ,ax
ret
GetImageOffset
endp
code
ends
S
e nt adr t
530
An important point regarding reading VGA memory involves the VGA5. latches.
(Remember that each of the four latches stores a byte for one plane; on CPU
writes, the latches can provide some or all of the data written to display memory,
allowing fast copying and eflcient pixel masking.) Whenever the CPU reads a
given address in VGA memory, each of thefour latches is loaded with the contents
of the byte at that address in its respective plane. Even though the CPU only receives data from one planein read mode 0, all four planes are always read, and
the values read are stored in the latches. This is true in read mode I as well. In
short, whenever the CPUreads VGA memory in any read mode, allfourplanes are
read andall four latches are always loaded.
Chapter 28
Read Mode 1
Read mode 0 is the workhorse read mode, but
its got an annoying limitation: Whenever you want to determine the color of a given pixel in read mode 0, you have to
perform four VGA memory reads, one for each plane, and then interpret the four
bytes youveread as eight 16-color pixels. Thats a lot of programming. The code is
also likely torun slowly, all the more so because a standard IBM VGA takes an aver-
age of 1.1 microseconds to complete each memory read, and read mode 0 requires
four reads inorder to read the fourplanes, not to mention the even greater amount
of time taken by the OUTSrequired to switchbetween the planes. (1.1microseconds
may not sound like much, but ona 66MHz 486, its 73 clock cycles! Local-busVGAs
can be a good deal faster, but a read from thefastest local-busadapter Ive yet seen
would still cost in the neighborhoodof 10 486/66 cycles.)
Read mode 1, also known ascolor compare mode, provides special hardware assistance
for determining whetherpixel
a is a given color. With
a single read mode1read, you
can determine whether each of up to eight pixels is a specific color, and you can
even specify anyor all planes as dont careplanes in the pixel color comparison.
Read mode 1 is selected by setting bit 3 of the Graphics Mode register (Graphics
Controller register 5 ) to 1. In its simplest form, read mode 1 compares the crossplane value ofeach of the eightpixels at a given address to the color value in bits 3-0
of the Color Compare register (Graphics Controller register 2),and returns a 1 to
the CPU in the bit position of each pixel that matches the color in the Color Compare register and a 0 for each pixel that does not match.
Thats certainly interesting, butwhats read mode 1 good for?One obvious application is in implementing flood-fill algorithms, since read mode 1 makes it easy to tell
when a given byte contains a pixel of a boundary color. Another application is in
detecting on-screen object collisions, asillustrated by the code in Listing 28.2.
LISTING 28.2
128-2.ASM
: Program t o i l l u s t r a t e u s e o f r e a d mode 1 ( c o l o rc o m p a r em o d e )
: t od e t e c tc o l l i s i o n si nd i s p l a y
memory.Draws
a y e l l o wl i n eo n
; b l u eb a c k g r o u n d ,t h e nd r a w sap e r p e n d i c u l a rg r e e nl i n eu n t i lt h e
: y e l l o wl i n e
; By
i s reached.
MichaelAbrash
s t a sc ek g m ewn ot sr dt a c k
db
5 1 2 dup ( ? )
stack ends
EQU
VGA-SEGMENT
SCREEN-WIDTH
GC-INDEX
SETLRESET
ENABLE-SETLRESET
COLOR-COMPARE
GRAPHICS-MODE
B I TLMAS K
EQU
EQU
EQU
EQU
STACK
OaOOOh
EQU
80
3ceh
0
1
EQU
2
EQU 5
8
;in b y t e s
: G r a p h i c sC o n t r o l l e rI n d e xr e g i s t e r
: S e t / R e s e tr e g i s t e ri n d e xi n
GC
: E n a b l eS e t / R e s e tr e g i s t e ri n d e xi n
GC
; C o l o rC o m p a r er e g i s t e ri n d e xi n
GC
; G r a p h i c s Mode r e g i s t e r i n d e x i n GC
; B i t Mask r e g i s t e r i n d e x i n
GC
531
'CODE'
word
segment
code
assume
near
proc
code
cs:
Start
c ld
; S e l e c tg r a p h i c s
mov
int
mode 10h.
ax,lOh
10h
; F i l lt h es c r e e nw i t hb l u e .
1
c o l o r i smov
; b l u ea l . l
C aSl le l e c t S e t R e s e t C o l o: sr etdotr aibwnl u e
ax.VGA-SEGMENT
mov
mov
es.ax
ds iu. db i
mov
cx,
7000h
r se tpo :st bvh ae l w
u er i t t ae cn t u a dl l oy e s n ' t
; m a t t e r ,s i n c es e t / r e s e ti sp r o v i d i n g
; t h ed a t aw r i t t e nt od i s p l a y
memory
: Drawa
v e r t i c a ly e l l o wl i n e .
1 4 c o l omov
r i ;sy eal l o. 1w4
c a Sl l e l e c t S e t R e s e t C o l o: rs etdot r a iw
yne l l o w
mov
dx,GC-INDEX
mov
a1 .BIT-MASK
; pdoxi.noatul t
GC tI on d e x
B i t Mask
to ;point dx
inc
GC D a t a
mov
al.10h
B i t Mask t o 10h
; s edtx . aol u t
mov m
:t-tsht4iotihde0napoeidrf lte
1 ine
mov ;do
cx.350
s c r ehoeefnifguhl lt
VLineLoop:
mov
a1 . e s : [ d i ]
; l o a dt h el a t c h e s
stosb
; w r i t en e x tp i x e lo fy e l l o wl i n e
(set/reset
: p r o v i d e st h ed a t aw r i t t e nt o
d i s.p l a y
; memory,and
AL i s a c t u a l l y i g n o r e d )
add
di.SCREEN-WIDTH-1
: p o i tnhont ee sx ct al inn e
1 oop VLi neLoop
: S e l e c tw r i t e
mode 0 andread
mov
dx.GC-INDEX
mov
a1 .GRAPHICS-MODE
do;xup. tao li n t
t o; p o i ndtx i n c
mov
a l . 0 0 0 0 1 0 0 0;bb i t
doG
:xsu.reat tlp h i c s
mode 1.
GC G
I nrtdaoep xh i c s
Mode r e g i s t e r
GC Data
3-1 i rs e a d
mode 1. b i t s 1 & 0-OD
; i s w r i t e mode 0
Moder et ao d
mode 1.
; w r i t e mode 0
: D r a w a h o r i z o n t a lg r e e nl i n e ,
; t or i g h tu n t i lc o l o r
one p i x e l a t a t i m e , f r o m l e f t
c o m p a r er e p o r t s
a y e l l o wp i x e li se n c o u n t e r e d .
; Draw i n green.
mov
al.2
c aSl el l e c t S e t R e s e t C o l o r
532
Chapter 28
:green i s c o l o r 2
; s e tt od r a wi ng r e e n
: S e tc o l o rc o m p a r et ol o o kf o ry e l l o w .
mov
dx.GC_INDEX
mov
a1 ,COLOR-COMPARE
oduxt ,:aplo i n t
GC I n d C
et oxo l o r
: ptdooixn ct
GC D a t a
mov
a l . :1w4e 'l roeo k i fnyoegrl l o w ,
oduxt .:ascleot l coor m p altrooeof yoker l l o w
: pt odidxnet c
GC I n d e x
: S e tu pf o rq u i c ka c c e s st oB i t
mov
a1 .BIT-MASK
oduxt .:aplo i n t
: p tdooixi nn ct
: S e ti n i t i a lp i x e l
Compare r e g i s t e r
color 1 4
Mask r e g i s t e r .
GC I n d etBxoi t
GC D a t a
m a s ka n dd i s p l a y
Mask r e g i s t e r
memory o f f s e t .
mov p: la1
i xn ei.tl8i a0 lh
mov
di,lOO*SCREEN-WIDTH
mask
:startatleft
HLineLoop:
mov
a h . e s : [ d i:ld o
edge o f s c a n l i n e 1 0 0
: S l o wt h i n g s
down a b i t f o r v i s i b i l i t y ( a d j u s t
so we're
asneeded).
mov
cx.0
Del ayLoop:
1oop Del ayLoop
HLineLoop
jmp
: W a i tf o r
a k e y t o b ep r e s s e dt oe n d ,t h e nr e t u r nt ot e x t
mode and
: r e t u r n t o DOS.
WaitKeyAndDone:
WaitKeyLoop:
mov
ah.1
int
16h
jz
WaitKeyLoop
ah.ah
sub
i n kt e:tych1lee6ahr
mov
ax.3
int
:1rt0eehtxuot r n
mov
ah.4ch
:done
int
21h
Start endp
mode
533
: E n a b l e ss e t / r e s e tf o ra l lp l a n e s ,a n ds e t st h es e t / r e s e tc o l o r
; t o AL.
S e l e c t S e t R e s e t Cpnorelooacr
mov
d x ,GC-I NDEX
; p r ceos leorrv e
p u s h ax
mov
a1 .SETPRESET
out
d x :. pa ol i n t
GC I n dStee
oxt / R e sr ee tg i s t e r
; ptdooixn ct
GC D a t a
cPOP
oblao:cgr ke ta x
out
d x .: asSel et t / R e sr ee tg i sstteolr e c t ceodl o r
ddxe c
; p o i n t t o GC I n d e x
mov
a1 ,ENABLEPSETPRESET
oduxt ,;aplo i n t
GC I n dEteonx a bSl ee t / R e sreetg i s t e r
; ptodoixn tc
GC D a t a
mov
al.Ofh
od ux:t,eanl a bs leet / r epsaf loelalrtn e s
ret
S e l e c t S e t R e s eetnCdopl o r
code
ends
Se nt adr t
534
Chapter 28
Now, were allprone to using toolsthe rightway-that is, in the way in which they
is clearly intended
were intended to be used.
By that token,the Color Dont Care register
to mask one or more
planes out of a color comparison, and as such, has limited use.
However, the Color Dont Care register becomes far more interesting in exactly the
extreme case described above, where allplanes become dont careplanes.
Why? Well, as Ivesaid, when all planes are dont careplanes, read mode 1 reads
always return OFFH. Now, when you AND any value with OFFH, the value remains
unchanged, andthat can be awfully handy when youre using the bit mask to modify
selected pixels in VGA memory. Recall that you must always read VGA memory to
load the latches before writing to VGA memory when youre using the bit mask.
Traditionally, two separate instructions-a read followed by a write-are used to perform this task. The code in Listing 28.2 uses this approach. Suppose, however, that
youve set the VGA t o read mode1, with the Color Dont Care register set to 0 (meaning all reads of VGA memory will return OFFH). Under these circumstances, you can
use a single AND instruction to both read and
write VGA memory, since ANDing any
value with OFFH leaves that value unchanged.
Listing 28.3 illustrates an efficient use of write mode 3 in conjunction with read
mode 1 and a Color Dont Care register setting of 0. The mask in AL is passed directly tothe VGAs bit mask (thatshow writemode 3 works-see Chapter 4 for details).
Because the VGA always returns OFFH, the single AND instruction loads the latches,
and writes the value in AL, unmodified, to the VGA, where it is used to generate the
bit mask. This is more compact and register-efficient than using separate instructions to read and write, although it is not necessarily faster by cycle count, because
on a 486 or a Pentium MOV is a l-cycle instruction, but AND with memory is a 3cycle instruction. However, given displaymemory wait states, it is often the case that
the two approaches run at thesame speed, and theregister that theabove approach
frees up can frequently be used to save one ormore cycles in any case.
By the way, Listing 28.3 illustrates how write mode 3 can make for excellent pixeland line-drawing code.
LISTING28.3128-3.ASM
Program t h a t draws a d i a g o n a l l i n e t o i l l u s t r a t e t h e
use o f a
C o l o rD o n tC a r er e g i s t e rs e t t i n go f
OFFH t o s u p p o r t f a s t
r e a d - m o d i f y - w r i t eo p e r a t i o n st o
VGA memory i n w r i t e mode 3 by
drawing a diagonal ine.
Note:Workson
VGAs o n l y .
By MichaelAbrash
s t a sc ek g m ewn ot sr tda c k
512
dup
db
stack ends
VGA-SEGMENT
SCREEN-WIDTH
STACK
(?)
EQU
EQU
OaOOOh
80
: i n bytes
535
GC-INDEX
SETLRESET
ENABLE-SET-RESET
GRAPHICS-MODE
COLOR-OONT-CARE
EQU
EQU
EQU
EQU
EQU
code
segment
word
:code
assume
cs
S t a r tp r o cn e a r
: S e l e c tg r a p h i c s
mov
int
3ceh
0
1
5
7
'CODE'
mode 12h.
ax.12h
10h
: S e l e c t w r i t e mode 3andread
mov
dx.GC_INDEX
mov
a1 .GRAPHICS-MODE
doxu. at l
di xn c
in
a1,dx
aolr. 0 0 0 0 1 0 1 1; bbi t
jmp
out
dec
: G r a p h i c sC o n t r o l l e rI n d e xr e g i s t e r
; S e t / R e s e tr e g i s t e ri n d e x
i n GC
; E n a b l eS e t / R e s e tr e g i s t e ri n d e xi n
GC
: G r a p h i c s Mode r e g i s t e r i n d e x i n GC
: C o l o rD o n ' tC a r er e g i s t e ri n d e xi n
GC
f+2
mode 1.
dx.al
dx
: S e tu ps e t / r e s e tt oa l w a y sd r a wi nw h i t e .
mov
a1 .SET_RESET
doxu. at l
di xn c
mov
a1
.Ofh
doxu. at l
ddxe c
a1,ENABLE-SET-RESET
mov
doxu. at l
di xn c
mov
a1 .Ofh
doxu, ta l
dx
dec
: S e tC o l o rD o n ' tC a r et o
0. s o r e a d s o f VGA memory a l w a y sr e t u r n
mov
a1 .COLOR-OONT_CARE
dox u. at l
di xn c
sub
a1 .a1
doxu. ta l
;
S e tu pt h ei n i t i a l
ax.VGA_SEGMENT
mov
mov
ds.ax
bx.bx
sub
mov
al.80h
536
Chapter 28
down and t o t h e r i g h t .
OFFH.
Previous
mov
cx.400
DrawDiagonal Loop:
a n[ bdx l .:ar el a d si s p l a y
Home
Next
memory, l o a d i nt ghl ae t c h e s ,
: t h e n w r i t e s AL t o t h e VGA. AL becomes t h e
: b i t mask,and
s e t / r e s e tp r o v i d e st h e
: a c t u a ld a t aw r i t t e n
bx.SCREEN-WIDTH
: p o i n tt ot h en e x ts c a nl i n e
ar lo. 1r
:move t h e p i x e l maskone
p i x e lt ot h er i g h t
bx.0
adc
;advance t o t h e n e x t b y t e
i f t h e p i x e l maskwrapped
loopDrawDiagonalLoop
add
: W a i tf o rak e yt o
: r e t u r nt o
be p r e s s e d t o e n d . t h e n r e t u r n t o t e x t
mode and
DOS.
WaitKeyLoop:
mov
ah.1
int
16h
jz
WaitKeyLoop
ah.ah
sub
i n kt e:ytchl1ee6ahr
mov
ax.3
int
:1rt0eehttxuot r n
mov
ah.4ch
:done
int
21h
S t a r t endp
code
ends
end
Start
mode
I hope Ive given youa goodfeel for what color compare mode is and what it might
be used for. Color compare mode isnt particularly easy to understand, but its not
that complicated in actual operation, and its certainly useful at times; take some
time to study the sample code and perform afew experiments of your own, and you
may well find useful applications for color compare mode in your graphics code.
A final note: The Read Map register has no effect in read mode 1, and the Color
Compare and Color Dont Care registers have no effect either in read mode 0 or
when writing to VGA memory. And with that, by gosh, were actuallydone with the
basics of accessing VGA memory!
Not to worry-that still leaves us a slew of interesting VGA topics, including smooth
panning and scrolling, the split screen, colorselection, page flipping, and Mode X.
And thats not to mention actual uses to which the VGAs hardware can be put, including lines, circles, polygons, and my personal favorite, animation. Weve covered
a lot of challenging and rewarding ground-and weve only just begun.
537
Previous
chapter 29
saving screens and other vga mysteries
Home
Next
Savin
Restoring
541
File SNAPSHOT.SCR
"_""""
to save, and creating the 112K file SNAPSHOT.SCR, whichcontains the visible portion of the mode 1OH frame buffer.
The only part of Listing 29.1 that's even remotely tricky is the use of the Read Map
register (Graphics Controller register4) to make each of the four planesof display
memory readable in turn. The same code is used to write 28,000 bytes of display
memory to disk four times, and 28,000 bytes of memory starting at A000:OOOO are
written to disk each time; however, a different planeis read each time, thanks to the
changing setting of the Read Map register. (If this is unclear, refer back to Figure
29.1; you may also want to reread Chapter 28 to brush up on the operation of the
Read Map register in particular and reading EGA and VGA memory in general.)
Of course, we'll want the ability to restore what we've saved, and Listing 29.2 does
this. Listing 29.2 reverses the action of Listing 29.1, selecting mode 10H and then
loading 28,000 bytesfrom SNAPSHOT.SCRinto each planeof display memory.The
Map Mask register (Sequence Controllerregister 2) is used to select the plane to be
written to. If your computer is slowenough, you can see the colors of the text change
542
Chapter
29
segment
as each plane is loaded when Listing 29.2runs. Note that Listing 29.2 does not itself
draw anytext, but rathersimply loads the bitmap saved by Listing 29.1 back into the
mode 10H frame buffer.
1 .ASM
equ
GC-INDEX
equ
OaOOOh
3ceh
equ
4
equ ( 6 4 0 / 8 ) * 3 5 0
READ-MAP
DISPLAYED-SCREEN-SIZE
stack
dup
stack
s e g m e n pt a r as t a c k
512
db
ends
'STACK'
it
: G r a p h i c sC o n t r o l l e rI n d e xr e g i s t e r
;Read Map r e g i s t e r i n d e x i n
GC
; I o fd i s p l a y e db y t e sp e rp l a n ei n
; h i - r e sg r a p h i c ss c r e e n
(?I
Data
Sampl e T e x t
segment
'DATA'
word
d b' T h i isbs i t - m a p p e dt e x td, r a w ni nh i - r e s
'
db
'EGA g r a p h i c s mode 1 0 h . ' . Odh.
Oah.
Oah
d b' S a v i n gt h es c r e e n( i n c l u d i n gt h i st e x t )
...'
db
Odh.
Oah.
'I'
F i 1 ename
db
'SNAPSHOT.SCR'.O
;name o f f i l e w e ' r e s a v i n g t o
ErrMsgl
'*** Cou1dn"t
open
SNAPSHOT.SCR ***'.Odh.Oah.'$'
db
ErrMsg2
db
'*** E r r o wr r i t i n gt o
SNAPSHOT.SCR ***'.Odh,Oah.'$'
WaitKeyMsg
db
Odh.
Oah,
'Done.Pressanykey
t o end
Odh.Oah.'t'
Hand1 e
dw
?
t soawvei n' rgfei l e :ohfa n d l e
db
P1 ane
?
read
being ;plane
Data
ends
...'.
Code
Start
assume
cs:Code.
ds:Data
n e a rp r o c
mov
ax.Data
mov
,ax ds
; Go t o h i - r e s g r a p h i c s
: P u tu p
mode.
mov
ax.10h
in t
10h
:AH = 0 meansmode
s e t , AL
: h i - r e sg r a p h i c s mode
; B I O S v i d e oi n t e r r u p t
- 1 0 hs e l e c t s
some t e x t , s o t h es c r e e ni s n ' te m p t y .
mov
mov
i2n 1t
ah.9
; O O S sfpturni nc tgi o n
d x . oSfaf m
s ept l e T e x t
h
; D e l e t e SNAPSHOT.SCR i f i t e x i s t s .
mov
mov
in t
: C r e a t et h ef i l e
mov
ah.4lh
d x . oFf iflseenta m e
21h
SNAPSHOT.SCR.
ah.3ch
543
mov
d x , oFf iflseenta m e
c xc sx u, b
:make i t a n o r m a l f i l e
in t
21h
mov
C H a n d l e 1 , a x : hstahvneed l e
j nSca v e T h e S c r e e; nw e ' rree a dtsyoa v e
i f neor r o r
mov
ah.9
:DOS sfpturni nc gt i o n
mov
d x , oEf rf rsM
e ts g l
in t
21h
e rt rh; onero ft i f y
s hj mo pr t
done
:and
Done
: L o o pt h r o u g ht h e
4 p l a n e s ,m a k i n ge a c hr e a d a b l e
i n t u r n and
: w r i t i n g it t o d i s k . N o t e t h a t a l l
4 p l a n e sa r er e a d a b l ea t
: A000:OOOO: t h e Read Map r e g i s t e rs e l e c t sw h i c hp l a n ei sr e a d a b l e
: a t anyonetime.
SaveTheScreen:
mov
mov
mov
out
in c
mov
dx.GC-INDEX
al.READ-MAP:set
dx.al
dx
a1 . [ P l a n e :l g e t h e
out
mov
mov
mov
sub
push
mov
mov
SaveLoop:
int
POP
cmp
jz
mov
mov
in t
jmp
SaveLoopBottom:
a1 , CP1 a n e l
mov
in cpnl eat hxn:teept oai xn t
[Plane] .a1
mov
: ha al . v3e
cmp
SaveLoop
j be
GC I n d e x t o Read Map r e g i s t e r
# o ft h ep l a n e
: t o save
we donep laal nl e s ?
:no. s o dt hnoe pxlta n e
: C l o s e SNAPSHOT.SCR
DoCl ose:
mov
mov
in t
: W a i tf o r
a keypress.
mov
mov
in:prompt
t
mov
544
ah,3eh
bx,[Handlel
21h
Chapter 29
ah.9
:DOS sfpturni nc gt i o n
d x . oWf fasi et Kt e y M s g
21h
ah.8
;DOS iwnipfteuhucnothcuott i o n
we w a n t
in t
; R e s t o r et e x t
mode.
mov
in t
ax.3
10h
mov
int
ah,4ch
21h
end
Start
: Done.
Done :
endp
ends
;DDS t e r m i n faut ne c t i o n
Start
Code
LISTING29.2129-2.ASM
: P r o g r a mt or e s t o r ea
mode 1 0 h EGA g r a p h i c s s c r e e nf r o m
: t h e f i l e SNAPSHOT.SCR.
VGA-SEGMENT
equ
OaOOOh
3c4h
equ
2 equ
e q u( 6 4 0 / 8 ) * 3 5 0
SC-INDEX
MAP-MASK
DISPLAYED-SCREEN-SIZE
ends
s tsaecgk mpesantratac k
dup
512
stack
; S e q u e n c eC o n t r o l l e rI n d e xr e g i s t e r
;Map Mask r e g i s t e r i n d e x i n
SC
;# o f d i s p l a y e d b y t e s p e r p l a n e i n a
; h i - r e sg r a p h i c ss c r e e n
'STACK'
db
'DATA'
word
segment
Data
1
F i ename
Edrbr M s g l
ErrMsg2
db
Wai tKeyMsg
db
Hand1 e
db
P1 ane
ends
Data
segment
near
21h
(?
db
'SNAPSHOT.SCR',t )
;name o f f i l e w e ' r e r e s t o r i n g f r o m
I***
Cou1dn"t
open
SNAPSHOT.SCR ***'.Odh.Oah.'$'
'*** rEe rfardoim
rn g
SNAPSHOT.SCR ***'.Odh,Oah,'$'
Odh.Oah.'$'
Odh.Oah,
'Done.Pressanykey
t o end
?
; h a n d l eo ff i l ew e ' r er e s t o r i n gf r o m
?
written
being
;plane
...'.
dw
Code
assume
cs:Code.
ds:Oata
proc
Start
mov
mov
ax.Data
,axds
; Go t o h i - r e s g r a p h i c s
mode.
mov
ax.10h
in t
10h
0 meansmode
s e t , AL
;AH
; h i - r e sg r a p h i c s
mode
; B I O S v i d e oi n t e r r u p t
- 1 0 hs e l e c t s
; Open SNAPSHOT.SCR.
mov
mov
b
;open s u,a1
in t
mov
jnc
mov
ah.3dh
d x . o f f s e tF i l e n a m e
a1
21h
CHandl e l ,ax
RestoreTheScreen
ah.9
;DOS open
f u n c ft ii loen
r e a d i nfgo r
; s a v et h eh a n d l e
; w e ' r er e a d yt or e s t o r e
i f n oe r r o r
;DOS p r i n t s t r i n g f u n c t i o n
545
mov
d x , oEf fr sr M
e ts g l
e r r o rt h ein t o f: n o t i2f y1 h
jsmhpo r t
Done ;and
done
: L o o pt h r o u g ht h e
4 p l a n e s .m a k i n ge a c hw r i t a b l ei nt u r na n d
r e a d i n g it f r o md i s k .N o t et h a ta l l
4 p l a n e sa r ew r i t a b l ea t
: A000:OOOO: t h e Map Mask r e g i s t e rs e l e c t sw h i c hp l a n e sa r er e a d a b l e
: a t anyonetime.
We o n l y makeone
p l a n er e a d a b l ea t
a time.
RestoreTheScreen:
mov
RestoreLoop:
mov
rnov
out
inc
mov
dx.SC-INDEX
a1 .MAP-MASK
dx,al
dx
cl .[Planel
mov
shl
a1 .1
a1 . c l
out
mov
mov
rnov
sub
push
mov
mov
in t
dx,al
ah,3fh
bx. [Handl e l
cx.DISPLAYED-SCREEN-SIZE
dx. dx
ds
s i .VGA-SEGMENT
ds.si
21h
ds
ReadError
ax.DISPLAYED-SCREEN-SIZE
RestoreLoopBottom
POP
jc
cmp
jz
ReadError:
CPlanel.0
mov
rnov
int
j mp
RestoreLoopBottom:
mov
in c
rnov
crnp
j be
ah.9
d x . o f f s e tE r r M s g Z
21h
s h o r tD o C l o s e
a1 , [ P l a n e l
ax
[Planel.al
al.3
RestoreLoop
:startwithplane
: s e t SC I n d e x t o
0
Map Mask r e g i s t e r
g e t t h e 11 o f t h e p l a n e
t o restore
we want
setthebitenablingwritesto
o n l yt h eo n ed e s i r e dp l a n e
s e tt or e a df r o md e s i r e dp l a n e
DOS r e a d f r o m f i l e f u n c t i o n
:#o f b y t e s t o r e a d
: s t a r tl o a d i n gb y t e sa t
A000:OOOO
; r e a dt h ed i s p l a y e dp o r t i o no ft h i sp l a n e
: d i d all b y t e sg e tr e a d ?
:DDS p r i n t s t r i n g f u n c t i o n
; n o t i f ya b o u tt h ee r r o r
;anddone
; p o i n tt ot h en e x tp l a n e
:have we done a l l p l a n e s ?
:no. so do t h en e x tp l a n e
: C l o s e SNAPSHOT.SCR.
DoCl ose:
mov
mov
in t
;
W a i tf o r
:DOS c l o s e f i l e f u n c t i o n
ah.8
21h
;DOS i n p u t w i t h o u t e c h o f u n c t i o n
a keypress.
mov
in t
: R e s t o r et e x t
546
ah.3eh
bx.CHandle1
21h
Chapter 29
mode.
mov
int
ax.3
10h
mov
in t
ah,4ch
21h
: Done.
Done:
endp
Start
Code
:DOS t e r m i n a t e f u n c t i o n
ends
Start end
Whats the solution? Frankly, the solution is to getVGA-specific. A TSR designed for
the VGA can simply read out andsave the state of the registers of interest, program
those registers as needed, save the screen image, and restore the original settings.
From a programmers perspective, readable registers are certainly near the top of
the list of things to like about the VGA! The remaining installed base of EGAs is
steadily dwindling,and you maybe able to ignoreit asa market today, as you
couldnt
even a year or two ago.
Saving Screens and Other VGA Mysteries
547
If youare goingto write a hi-res VGA version ofthe screen capture program,be sure
to account for theincreased size of the VGAs mode 12H bit map. The mode 12H
(640x480) screen uses 37.5K per plane of display memory, so for mode 12H the
displayed screen size equate in Listings 29.1 and 29.2 should be changed to:
DISPLAYED-SCREEN-SIZE
equ
(640/8)*480
Similarly, if youre capturing a graphics screen that starts at anoffset other than0 in
the segment at AOOOH, you must change the memory offset used by the disk functions to match.You can, if you so desire, read the startoffset of the display memory
providing the information shown on the screen from theStart Address registers (CRT
Controller registers OCH and ODH); these registers are readable even on anEGA.
Finally, be aware that the screen capture and restore programs in Listings 29.1and 29.2
are only appropriate for EGA/VGA modes ODH, OEH,O F H , OlOH, and 012H, since they
assume a four- plane configuration of EGA/VGA memory.In all text modes
and in CGA
graphics modes,and in VGA modes 11H and 13H as well, display
memory can simply
be
written to disk and read back as a linear block of memory,just like a normal array.
While Listings 29.1and 29.2 are written in assembly, the principles they illustrate apply
equally wellto high-level languages.In fact, theresno need for any assemblyat all when
can
saving an EGA/VGA screen, as long as the high-level language youre using perform
direct port 1 / 0 to setup the adapter and can read and write display memory directly.
16 Colors out of 64
548
Chapter 29
IBM Color Display), you can never select from more than 16 colors, since those
monitors only accept four input signals. For now, well assume werein one of the
350-scan line color modes, a group which includes mode 10H and the350-scan-line
versions of modes 0, 1, 2, and 3.
Each pixel comes out of memory (or, in text mode, out of the attribute-handling
portion of the EGA) as a 4bit value, denoting 1 of 16 possible colors. In graphics
modes, the 4bit pixel value is made up of one bit from each plane, with 8 pixels
any given byteaddress in display memory. Normally,we think
worth of data stored at
of the 4bit value of a pixel as being that pixels color,so a pixel value of0 is black, a
pixel value of 1 is blue, and so on, as if thats a built-in feature of the EGA.
Actually, though, the correspondenceof pixel values tocolor is absolutely arbitrary,
depending solely on how the color-mapping portion of the EGA containing thepalette registers is programmed. If you cared to have color 0 be bright redand color l
be black, that could easily be arranged, as could a mapping in which all 16 colors
were yellow. Whats more, these mappings affect text-mode characters as readily as
they do graphics-mode pixels, so you could map text attribute 0 to white and text
attribute 15 to black to producea black on white display, if you wished.
Each of the 16 palette registers storesthe mapping of one of the 16 possible4bit pixel
values from memory to one of 64 possible &bit pixel values
to be sent
to the monitor
as video data, as shown in Figure 29.2. A 4bit pixel value of 0 causes the &bit value
from
display
memory or from
a text attribute,
used to look up a
palette register
Figure 29.2
549
Palette
Register
B i t 7
R'
G'
B'
550
Chapter 29
(subfunction 3), bit 0 of BL is set to 1 to cause bit 7 of text attributes to select blinking, or set to 0 to cause bit 7 of text attributes to select high-intensity reverse video.)
Listing 29.3 uses videofunction 10H, subfunction 2 to step through all 64 possible
colors. This is accomplished by putting up 16 color bars, one for each of the 16
possible 4bit pixel values,then changing the mapping provided by the palette registers
to select a differentgroup of 16 colors from the set of 64 each time a key is pressed.
Initially, colors 0-15 are displayed, then 1-16, then 2-17, and so on up to color 3FH
wrapping around to colors 0-14, and finally backto colors 0-15. (By the way, at mode
set time the 16 paletteregisters are not set to colors 0-15, but rather to OH, IH, 2H,
3H, 4H,5H, 14H, 7H, 38H, 39H,
3AH, 3BH, 3CH, 3DH, 3EH,
and 3FH, respectively. Bits
6,5, and 4-secondary red, green, and blue-are all set to 1 in palette registers8-15 in
order to produce high-intensity colors. Palette register 6 is set to 14H to produce
brown, rather than theyellow that the expectedvalue of 6H would produce.)
When you run Listing 29.3, you'll see that the whole screen changes color as each
new color set is selected. This occurs because most of the pixels on thescreen have a
value of 0, selecting the background color stored in palette register 0, and we're
reprogramming palette register 0 right alongwith the other 15 palette registers.
It's important to understand thatin Listing 29.3the contentsof display memory are
never changed after initialization. The only change is the mapping from the 4bit
pixel data coming out of display memory to the &bit data goingto the monitor. For
this reason, it's technically inaccurate to speak of bits in display memory as representing colors; more accurately, they represent attributes in the range 0-15, which
are mapped to colors 0-3FH by the palette registers.
LISTING 29.3129-3.ASM
: Program t o i l l u s t r a t e t h e c o l o r m a p p i n g c a p a b i l i t i e s o f t h e
: EGA's p a l e t t er e g i s t e r s .
VGA-SEGMENT
SC-INDEX
3c4h
MAP-MASK
BAR-HEIGHT
b a re a c h o: hf e i g 1h 4t
TOP-BAR
equ
equ
equ
equ
equ
OaOOOh
BARKHEIGHT*6
s tsaecgk mpesantratac k
: sbt ah rest
down a bt oi t
: l e a v er o o mf o rt e x t
'STACK'
5 1 2 dup ( ? )
db
ends
: S e q u e n c eC o n t r o l l e rI n d e xr e g i s t e r
:Map Mask r e g i s t e r i n d e x i n
SC
stack
'DATA'
word
segment
Data
db KeyMsg
dup 13
db
db
db
db
' P r e s sa n yk e yt o
s e e t h en e x tc o l o rs e t .
'Thereare64colorsetsinall
.'
Odh.
Oah.
Oah.
Oah.
Oah
( ' '1, 'Attribute'
38 dup ( ' ' 1 , ' C o l o r $ '
: Used t o l a b e l t h e a t t r i b u t e s
'
o f t h ec o l o rb a r s .
551
A t t r i b u t e Nbuyl m
at ebbeelr s
X'
0
16
rept
i f x It 1 0
db
else
db
endi f
X'
x+l
endm
db
'0'.
x+'O'.
'h',
Oah. 8 . 8 . 8
Oah. 8 . 8. 8
,$*
: Used t ol a b e lt h ec o l o r so ft h ec o l o rb a r s .( C o l o rv a l u e sa r e
: f i l l e d i n o nt h e
fly.)
obr N
y tluC
eambobel el r s
16
rept
' 0 d0 b0 h ' .
endm
COLORKENTRY-LENGTH
db
Oah.8. 8.
8. 8
( $ - Ce oq luo r N u m b e r s ) / l 6
'I'
C u r r ednbt C o l o r
: Space f o r t h e a r r a y o f 1 6 c o l o r s w e ' l l p a s s t o t h e
: a no v e r s c a ns e t t i n go fb l a c k .
ends
Coor Tl a b l
Data
segment
Code
BIOS, p l u s
(?I, 0
dup
16
db
assume
cs:Code,
ds:Data
near
proc
Start
c ld
mov
mov
ax.Data
,axds
: Go t o h i - r e s g r a p h i c s
mov
mode.
:AH
0 meansmode
s e t , AL
: h i - r e sg r a p h i c s mode
ax.10h
1i
0nht
;BIOS v i d e o i n t e r r u p t
: P u tu pr e l e v a n tt e x t .
mov
mov
in t
ah.9
dx.offset
21h
KeyMsg
;DOS
f su tnpr cirnti ingot n
: P u tu pt h ec o l o rb a r s ,o n ei ne a c ho ft h e1 6p o s s i b l ep i x e lv a l u e s
: ( w h i c hw e ' l lc a l la t t r i b u t e s ) .
mov
sub
cx.16
a1 .a1
push
push
call
POP
POP
ax
cx
BarUp
cx
ax
BarLoop:
552
Chapter 29
: w e ' l lp u tu p1 6c o l o rb a r s
:startwithattribute
1 0 hs e l e c t s
axinc
B a r Lloooopp
; s e l e c tt h en e x ta t t r i b u t e
: P u tu pt h ea t t r i b u t el a b e l s .
mov
bhbh,
sub
mov
ah.2
: v i d e oi n t e r r u p ts e tc u r s o rp o s i t i o nf u n c t i o n
:page 0
dh,TOP_BAR/14
: c o u n t i n gi nc h a r a c t e rr o w s ,m a t c ht o
: t o po ff i r s tb a r ,c o u n t i n gi n
: s c a nl i n e s
d l .16
b aolr efs: ftj tou s t
10h
:DOS
f su tnpr cirnti ingot n
ah.9
d x . oAf tf tsrei bt u t e N u m b e r s
21h
mov
int
mov
mov
in t
: L o o pt h r o u g ht h ec o l o rs e t .
mov
one new s e t t i n gp e rk e y p r e s s .
[ C u r r e n t C o l o r l .: O
s t awr itct ho l zoer r o
ColorLoop:
: S e tt h ep a l e t t er e g i s t e r st ot h ec u r r e n tc o l o rs e t .c o n s i s t i n g
: o ft h ec u r r e n tc o l o r
mapped t o a t t r i b u t e 0. c u r r e n t c o l o r
: mapped t o a t t r i b u t e 1. and s o on.
mov
a1 , [ C u r r e n t C o l o r l
mov
b x , oCf fosl eo tr T a b l e
mov
cx.16
s :we
ect tool1ho6ar sv e
PaletteSetLoop:
:lim
6ict-vobtaloiolt ur e s
and
a1 . 3 f h
mov
C:.ba1uxt1hl6i le-dc ot laoubsrsfleoetrdt i n g
b x in c
pra:el gtehit set et e r s
ax
inc
P a l eltot oe pS e t L o o p
mov
ia;nvhtpi.efd1aure0lnreouhctpttiteo n
mov
p: asa1rlul e.6b2atgftliueslsnet ctet otr iso n
: a n do v e r s c a na to n c e
mov
d x . oCf fosl eo tr T a b l e
ds push
es
; E Stca: oDbtXl lhoeerptooi n t s
POP
in t
:1i 0n hvvtoihidkneept toeashrel ret tuot pt et
: P u tu pt h ec o l o rn u m b e r s ,
s o we c a n see how a t t r i b u t e s map
: t oc o l o rv a l u e s ,a n d
s o we c a n s e e how e a c hc o l o r # l o o k s
: ( a t l e a s t on t h i sp a r t i c u l a rs c r e e n ) .
call
: W a i tf o r
ColorNumbersUp
a k e y p r e s s , s o t h e yc a n
see t h i s c o l o r s e t .
Wai t K e y :
mov
int
ah.8
21h
;DOS w
i nif eptuhcunohtcuot ti o n
: Advance t o t h e n e x t c o l o r s e t .
mov
a xin c
mov
cmp
C o l o r L oj bo ep
a1 . [ C u r r e n t C o l o r l
[CurrentColorl,al
a1 .64
553
; R e s t o r et e x t
mode.
mov
in t
ax.3
10h
mov
in t
ah,4ch
21h
; Done.
Done:
;DOS t efrumnicntai ot en
; P u t su p
a b a rc o n s i s t i n go ft h es p e c i f i e da t t r i b u t e( p i x e lv a l u e ) .
; a t a v e r t i c a lp o s i t i o nc o r r e s p o n d i n gt ot h ea t t r i b u t e .
; I n p u t : AL
proc
attribute
BarUp
mov
mov
mov
out
in c
mov
out
near
dx.SC-INDEX
ah.al
a1 .MAP_MASK
dx.al
dx
a1 ,ah
dx.al
; s e tt h e
Map Mask r e g i s t e r t o p r o d u c e
; t h ed e s i r e dc o l o r
mov
mu1
add
ah,BAR-HEIGHT
ah
ax,TDP-BAR
mov
mu1
dx.80
dx
add
mov
mov
mov
a x , 20
d i ,ax
ax.VGA_SEGMENT
es ,ax
mov
mov
dx.BAR-HEIGHT
a1 .Offh
mov
rep
add
c x , 40
stosb
d i ,40
dec
jnz
ret
dx
BarLineLoop
;row o f t o p o f b a r
; s t a r t a f e w l i n e s down t o l e a v e room f o r
; text
: r o w sa r e EO b y t e sl o n g
: o f f s e ti nb y t e so fs t a r to fs c a nl i n eb a r
; s t a r t so n
leftcornerofbar
; o f f s e ti nb y t e so fu p p e r
;ES:DI p o i n t s t o o f f s e t
; c o r n e ro fb a r
of u p p e r l e f t
BarLineLoop:
endp
:make t h eb a r s4 0w i d e
; d oo n es c a nl i n eo ft h eb a r
;pointtothestartofthenextscanline
; o ft h eb a r
BarUp
; C o n v e r t s AL t o a h e x d i g i t i n t h e r a n g e
Bni T o H e x Dgi i
IsHex
t p r o cn e a r
cmp
al.9
ja
add
a1
,'0'
ret
IsHex:
add
a1
ret
B i n T o H e x D i gt i endp
554
Chapter 29
0-F
endp
ends
: D i s p l a y st h ec o l o rv a l u e sg e n e r a t e db yt h ec o l o rb a r sg i v e nt h e
: c u r r e n tp a l e t t er e g i s t e rs e t t i n g so f ft ot h er i g h to ft h ec o l o r
: bars.
Col orNumbersUpproc
mov
sub
mov
near
ah.2
bh,bh
dh.TOP-BAR114
mov
in t
mov
mov
; v i d e oi n t e r r u p ts e tc u r s o rp o s i t i o nf u n c t i o n
:page 0
: c o u n t i n g i n c h a r a c t e rr o w s .m a t c ht o
: t o po ff i r s tb a r ,c o u n t i n gi n
: s c a nl i n e s
;justtorightofbars
d l ,20+40+1
10h
a1 . [ C u r r e n t C o l o r ] : s t a r t w i t h t h e c u r r e n t c o l o r
b x . o f f s e tC o l o r N u m b e r s + l
: b u i l d c o l o r number t e x t s t r i n g o n t h e
movd coot:1olwo6g1eros6cvt xe,
ColorNumberLoop:
//
c o l o r t h ep u:ssha v e a x
and
al.3fh
:lim
vc6ao-ilt
lbuoitertos
shr
al.l
shr
a1 .1
shr
a1 .1
shr
a1 .1
: i s o l htahnitgeictbhhoobelfloer
call
B i nToHexDi g i t
: c o n v e rtth eh i g hc o l o r
11 n i b b l e
mov
: pa un td
i t i ntth
teoex t
Cbxl
.a1
I/
c o l o rt h e bPOP
a c k; g e t a x
c o l o r t h ep u:ssha v e a x
#
and
.; O
i sacfol1hootl lhawoetre
# nibble
call
Bni T o H e x Dgi i t
: c o n v e rtth el o wn i b b l eotfh e
: color # t o ASCII
mov
[ b x + l l .a1
: apnudt
i t i nttthoeex t
add
bx,COLOR-ENTRY-LENGTH
: p otnhti eone xtntt r y
c o l o rt h e bPOP
a c k; g e t a x
#
c o l o r ; ni ne cx t
ax
#
1 oop
Col orNumberLoop
mov
ah.9
:DOS
f usnt cr pti nirogi n t
mov
d x . o f f s e tC o l o r N u m b e r s
2 1h
a t t trnhi ubem
uutbpe:eprus t
in t
ret
ColorNumbersUpendp
fly
Start
Code
end
Start
Overscan
While wereat it, Im going to touch on overscan. Overscan isthe color of the border
of the display, the rectangular area around the edge of the monitor thats outside
the region displaying active video data but inside the blanking area. The overscan
(or border) color can be programmed to any of the 64 possible colors by either
setting Attribute Controller register 11H directly or calling video function 10H,
subfunction 1.
555
A Bonus Blanker
An interesting bonus: The Attribute Controllerprovides a very convenient way to
5 of the Attribute Controller
blank the screen, in the form
of the aforementioned bit
Index register (at address 3COH after the InputStatus 1register-3DAH in color, 3BAH
in monochrome-has been read and
on every other write to3COH thereafter). Whenever bit 5 of the AC Index registeris 0, video data is cut off, effectively blanking the
screen. Setting bit 5 of the AC Index back to 1 restores video data immediately.
Listing 29.4 illustrates this simple but effective form of screen blanking.
LISTING29.4129-4.ASM
; P r o g r a mt od e m o n s t r a t es c r e e nb l a n k i n gv i ab i t
; A t t r i b u t eC o n t r o l l e rI n d e xr e g i s t e r .
AC-INDEX
INPUT-STATUS-1
equ
3cOh
; c3odleaoqhr -uamd od dr ees s
: Macro t o w a i t f o r
a n dc l e a rt h en e x tk e y p r e s s .
WAIT-KEY
s teagscpm
tkaecr nakt
;DOS i n p u t w i t h o u t e c h o f u n c t i o n
ah,8
21h
'STACK'
512 dup ( ? )
db
stack
' DATA'
' T h i si sb i t - m a p p e dt e x t ,d r a w ni nh i - r e s
'EGA g r a p h i c s mode 1 0 h . ' . Ddh. Oah. Oah
' P r e s sa n yk e yt ob l a n kt h es c r e e n ,t h e n
it,',Odh. Oah
' a n yk e yt ou n b l a n k
' t h e na n yk e yt oe n d . $ '
segment
word
Data
SampldebT e x t
db
db
db
db
ends
'
'
Data
Code
near
; A t t r i b u t eC o n t r o l l e rI n d e xr e g i s t e r
o If n tphuet
; Status 1 register
macro
mov
in t
endm
ends
5 o ft h e
proc
segment
assume
cs:Code.
ds:Data
Start
mov
mov
; Go t o h i - r e s g r a p h i c s
556
Chapter 29
ax,Data
ds.ax
mode.
mov
ax,lOh
in t
10h
;AH
0 meansmode
s e t . AL
; h i - r e sg r a p h i c s
mode
;BIOS v i d e oi n t e r r u p t
- 1 0 hs e l e c t s
; P u t up some t e x t ,
s o t h es c r e e ni s n te m p t y .
mov
mov
i n2 t1
ah.9
dx.S
o faf m
s eDt l e T e x t
h
f;DOS
u sn tcr tipinor ginn t
WAIT-KEY
; B l a n kt h es c r e e n .
dx.al
mov
in
dx.INPUT-STATUS-]
a1 . d x
mov
sub
out
dx.AC-INDEX
a1 .a1
: r e s e tp o r t
: mode
3cOh t o i n d e x ( r a t h e r t h a n d a t a )
;make b i t 5 z e r o . . .
: . . . w h i c hb l a n k st h es c r e e n
WAIT-KEY
: U n b l a n kt h es c r e e n .
mov
in
dx.INPUT-STATUS-]
a1 . d x
: r e s e tp o r t
3cOh t o I n d e x ( r a t h e r t h a n d a t a )
; mode
dx.al
mov
mov
out
dx.AC-INDEX
a1 .ZOh
:make b i t 5 one...
:. . w h i c h u n b l a n k s t h e s c r e e n
WAIT-KEY
: R e s t o r et e x t
mode.
mov
int
ax.2
10h
mov
in t
ah.4ch
21h
endp
: Done.
Done:
Start
Code
Start
;DOS t e r m i n a t e f u n c t i o n
end
Does that do it for color selection? Yes and no. For the EGA, wevecovered the whole
of color selection-but not so forthe VGA. The VGA can emulate everything weve
discussed, but actually performs one 4bit to 8-bit translation (except in 256-color
modes, where all 256 colors are simultaneously available), followed by yet another
translation, this one 8-bit to 18-bit. Whatsmore, the VGA has the ability to flip instantly through as many as 16 16-color sets. The VGAs color selection capabilities,
which are supportedby another set of BIOS functions, can be used to produce stunning color effects, as well see when we cover them starting in Chapter 33.
557
or
out
dx.3ceh
a1 .5
dx.al
dx
a1 .dx
al.not 3
a1 .1
dx.al
: G r a p h i c sc o n t r o l l e ri n d e x
: G r a p h i c s mode r e gi n d e x
: p o i n t GC i n d e x t o GLMODE
: G r a p h i c sc o n t r o l l e rd a t a
: g e tc u r r e n t mode s e t t i n g
:mask o f f w r i t e mode f i e l d
: s e t w r i t e mode f i e l d t o 1
: s e t w r i t e mode 1
This approach is more of a nuisance than simply setting the whole register, but its
safer. Its also slower; for cases where you must set a field repeatedly, it might be
worthwhile to read and mask the register once at the start, save
and it in a variable, so
that thevalue is readily available in memory and need not be
repeatedly read from
the port. This approach is especially attractive because INS are much slower than
memory accesses on 386 and 486 machines.
Astute readers may wonder why I didnt put a delay sequence, such as JMP $+2,
between the IN and OUT involving the same register. There are,after all, guidelines
from IBM, specifjmg that a certain period should be allowed to elapse before a
second access to an 1 / 0 port is attempted, because not all devices can respond as
rapidly as a 286 or faster CPU can access a port. My answer is that while I cant
guarantee that a delay isnt needed, Ive never found a VGA that required one; I
suspect that the delay specification has more to do with motherboard chips such as
the timer, the interrupt controller, and the like, and I sure hate to waste the delay
time if its not necessary. However, Ive never been able to find anyone with the
definitive word on whether delays might ever be neededwhen accessing VGAs, so if
558
Chapter 29
Previous
Home
Next
01
Bit 7
I I
I
0
Reserved
I
I
Readmode
Odd/even
addressing off
CGA pixel
formatting off
Reserved
you know the gospel truth, or if you know of a VGA/processor combo that does
require delays, please let me knowby contacting me through thepublisher. Youd be
doing afavor for a whole generation of graphics programmers who arent sure whether
theyre skating on thin ice without those legendary delays.
559
Previous
chapter 30
video est omnis divisa
Home
Next
4,
ilir
.pa.
as^
s m
*!gib.
563
Offset ,0
(start
of splitscreen
area of
display
memory)
Display Memory
Start "+
address
(start of
normalscreen
area of
display
memory)
564
Chapter 30
So, where is the split screen start scan line stored? The answer variesa bit, depending
on whether youre talking
about the EGA or the VGA. On the EGA, the split screen start
scan lineis a 9-bit value, with bits 7-0
stored inthe Line Compare register
(CRTCregister
18H) andbit 8 stored in bit 4 of the Overflow register (CRTC register7). Other bits
in the Overflow register serve as the high bits of other values, such as the vertical
total and the vertical blanking start. Since EGA registers are-alas!-not
readable,
you must know the correctsettings for theother bits in theOverflow registers to use
the split screenon an EGA. Fortunately, there are only two standard Overflow register
settings on the EGA 11H for 200-scan-line modesand 1FH for 350-scan-line modes.
The VGA, of course, presents no such problem in setting the split screen start scan
line, for it has readable registers. However, the VGA supports a 10-bit split screen
start scan line value, with bits 8-0
storedjust as withthe EGA, and bit 9 stored inbit 6
of the Maximum Scan Line register (CRTC register9).
Turning the split screen on involves nothing more than setting all bits of the split
screen start scan line to the scan line after which you want the split screen to start
appearing. (Of course, youll probably want to
change thestart address before using
the split screen; otherwise, youll just end updisplaying the memory at offset zero
twice: once in the normal screen and oncein the split screen.) Turning off the split
screen is a simple matter of setting the split screen start scan line to a value equal to
or greater than thelast scan line displayed; the safest such approach is to set all bits
of the split screen start scan line to 1. (That is, in fact, the split screen start scan line
value programmed by the BIOS during a mode set.)
565
the whole screen, by setting the start address to 0 and then flippingback and forth
between the normal screen and the split screen with a split screen start scan line
setting of zero. Both the normal screen and the split screen display the same text,
but the split screen displays it one scan line lower, because the split screen doesn't
start untilafter the first scan line, and that produces a jittering
effect as the program
switches the split screen on and off. (On the EGA, the split screen may display two
scan lines lower, for reasons I'll discuss shortly.)
Finally, after another keypress, Listing 30.1 halts.
LISTING 30.1
: D e m o n s t r a t e st h e
130-1.ASM
VGA/EGA s p l i t s c r e e n i n a c t i o n .
......................................................................
I S-VGA
VGA-SEGMENT
SCREEN-WIDTH
SCREENKHEIGHT
CRTC-INDEX
OVERFLOW
MAXIMUM-SCAN-LINEequ
OaOOOh
640
350
3d4h ;CRT C o n t r o l l e rI n d e xr e g i s t e r
7
: i n d eoO
xf v e r f l o rweign
CRTC
: i n d e x o f MaximumScan L i n e r e g i s t e r
: i n CRTC
Och
: i n d e xo S
f t a rA
t d d r e s sH i g hr e g i s t e r
: i n CRTC
: i n d e xo fS t a r tA d d r e s s
Low r e g i s t e r
: i n CRTC
1 8 h; i n d e xo Lf i n e
Compare r e g( b i t s
7-0
: o f s p l i ts c r e e ns t a r ts c a nl i n e )
: i n CRTC
3 d a h: I n p u St t a t u s
0 register
1
: s et to
0 t o a s s e m b lfeo r
: c o m p u t e r st h a tc a n ' th a n d l e
: w o r do u t st oi n d e x e d
VGA r e g i s t e r s
START-ADDRESS-HIGH
STARTLADDRESS-LOWequ
LINE-COMPARE
INPUT-STATUS-0
WORD-OUTS-OK
; s e tt o
0 t o a s s e m b lfeo r
EGA
......................................................................
: Macro t o o u t p u t
aw o r dv a l u et oap o r t .
OUTLWORD
macro
i f WORD-OUTS-OK
dox u. at x
else
doxu. at l
di xn c
x c hagh , a l
doxu. at l
ddxe c
x c hagh . a l
endi f
endm
......................................................................
MyStack
segment
para
stack
512
db
dup
ends
MyStack
'STACK'
(0)
......................................................................
Data
segment
SplitScreenLine
566
Chapter 30
dw
: l i nt hse ep sl ict r e ec nu r r e n t l y
: s t a r t sa f t e r
StartAddress
dw
......................................................................
Code
segment
assumecs:Code.ds:Oata
......................................................................
S t a r tp r o cn e a r
mov
ax.0ata
mov
ds.ax
: S e l e c t mode 1 0 h .6 4 0 x 3 5 01 6 - c o l o rg r a p h i c s
mov
ax.0010h
int
10h
mode.
:AH-0 i s s e l e c t mode f u n c t i o n
:AL=lOh i s mode t o s e l e c t ,
: 6 4 0 x 3 5 01 6 - c o l o rg r a p h i c s
mode
: P u tt e x ti n t od i s p l a y
memory s t a r t i n g a t o f f s e t
0. w i t he a c hr o w
: l a b e l l e d as t o number.This
isthepartof
memory t h a t will be
: d i s p l a y e di nt h es p l i ts c r e e np o r t i o n
o f t h ed i s p l a y .
mov
cx.25
i dnwrt oaet ew
' lxl tol if:n#eos f
: t h es p l i ts c r e e np a r to f
F i l l Spl it S c r e e n L o o p :
f ul on cccat uitoironsmov
no :rs e t a h . 2
c u r s o r: s e t b h . b h s u b
mov
dh.25
i n d r a w t osub: craolwdc uh l. ac tl e
ni n : s t a r t , d l
dl
sub
o c a t icounr s o r t h e
: s ient t
10h
mov
a1 ,25
a g a i ni nsub
d r a w: tcoa1
a lrc.ocuwll a t e
:make
ah,ahsub
mov
dh.10
r o w t h :es p l i td h d i v
ax, add
'00'
mov
[ O i g i t I n s e r dt :tliph.teiganhuexi xtetos
mov
mov
int
1 oop
#
i n page 0
v a l ut he e
memory s t a r t i n g a t
ax.VGA-SEGMENT
mov
mov
es.ax
mov
d i ,8000h
dx,SCREENLHEIGHT
mov
mov
ax, 888811
c ld
RowLoop:
mov
cx.SCREEN-WIOTH/8/2
srt eo ps w
d i v iafsoiwor on r d
d i g ti #twsoi n t o
t do :i gc oit thnsev e r t
ASCII
: t o b ed i s p l a y e d
ah.9
d x . o f f sSe pt l i t S c r e e n M s g
21h
text
the
F i l l Spl it S c r e e n L o o p
: F i l ld i s p l a y
: pattern.
memory
8 0 0 0 hw i t h
a d i a g o n a l l ys t r i p e d
:fill a l ll i n e s
: s t a r t i n g fill p a t t e r n
:fill 1 scan l i n e a word a t a t i m e
:fill t h es c a nl i n e
567
hift
ax.1
ror
ddxe c
jnz
word
RowLoop
o f memory.
; S e tt h es t a r ta d d r e s st o8 0 0 0 ha n dd i s p l a yt h a tp a r t
mov
[StartAddress1,8000h
c aSl el t S t a r t A d d r e s s
: S l i d et h es p l i ts c r e e nh a l f
way u pt h es c r e e na n dt h e nb a c k
down
; a q u a r t e ro ft h es c r e e n .
mov
CSplitScreenLine1,SCREEN-HEIGHT-1
;settheinitiallinejustoff
; t h eb o t t o mo ft h es c r e e n
mov
call
mov
call
cx,SCREENKHEIGHT/2
S p l itScreenUp
cx,SCREEN_HEIGHT/4
SplitScreenDown
; Now move u pa n o t h e rh a l f
mov
call
mov
call
a s c r e e na n dt h e nb a c k
down a q u a r t e r .
cx.SCREEN-HEIGHT/Z
S p l itScreenUp
cx,SCREEN_HEIGHT/4
SplitScreenDown
; F i n a l l y move up t o t h e t o p o f t h e s c r e e n .
mov
call
: W a i tf o r
mov
in t
; T u r nt h e
mov
call
: W a i tf o r
mov
in t
cx.SCREENPHEIGHT/2-2
S p l itScreenUp
a k e yp r e s s( d o n ' te c h oc h a r a c t e r ) .
ah.8
21h
;DOS f uw
cenoi cntcnhhptsioououtnl te
s p l i ts c r e e no f f .
[SplitScreenLine].Offffh
SetSplitScreenScanLine
a k e yp r e s s( d o n ' te c h oc h a r a c t e r ) .
ah.8
21h
; D i s p l a yt h e
:OOS f uw
cenoi cntcnhhptsoiououtlnte
mov
CStartAddressl.0
c aSl el t S t a r t A d d r e s s
; F l i pb e t w e e nt h es p l i ts c r e e na n d
; f r a m eu n t i l
a key i s p r e s s e d .
F1 ipLoop:
CSplitScreenLine1,Offffh
xor
c a l l S e t S p l it S c r e e n S c a n L i n e
mov
cx.10
568
Chapter 30
t h en o r m a sl c r e e ne v e r y1 0 t h
aracter
the
rnov
to
int
rnov
; r e t ui rnnt
Start endp
ax.0003h
;AH-0
s e l e ci ts
mode f u n c t i o n
;AL-3 i s mode t o s e l e c t , t e x t
mode
; r e t u r n t o t e x t mode
10h
ah.4ch
21h
DOS
......................................................................
; W a i t sf o rt h el e a d i n ge d g eo ft h ev e r t i c a ls y n cp u l s e .
; I n p u tn: o n e
; O u t p u tn: o n e
; R e g i s t e r sa l t e r e d :
AL. DX
WaitForVerticalSyncStart
nperaorc
mov
dx.INPUT-STATUS-0
WaitNotVerticalSync:
a l . di xn
t e sa tl . 0 8 h
jnz
WaitNotVerticalSync
WaitVerticalSync:
in
a1,dx
t e sa tl , 0 8 h
jz
WaitVerticalSync
ret
WaitForVerticalSyncStart
endp
......................................................................
; W a i t sf o rt h et r a i l i n ge d g eo ft h ev e r t i c a ls y n cp u l s e .
; I n p u tn: o n e
: O u t p u tn: o n e
; R e g i s t e r sa l t e r e d :
AL. DX
W a i t F o r V e r t i c a l S y n c E n dp r o cn e a r
mov
dx.INPUTLSTATUS-0
WaitVerticalSyncZ:
a l . di xn
t e s t a1 .08h
jz
WaitVerticalSyncZ
WaitNotVerticalSync2:
in
al.dx
t e satl . 0 8 h
jnz
WaitNotVerticalSync2
ret
WaitForVertical SyncEnd endp
569
......................................................................
: S e t st h es t a r ta d d r e s st ot h ev a l u es p e c i f e db yS t a r t A d d r e s s .
: W a i tf o rt h et r a i l i n g
edge o f v e r t i c a l s y n c b e f o r e s e t t i n g
so t h a t
: o n eh a l fo ft h ea d d r e s si s n ' tl o a d e db e f o r et h es t a r to ft h ef r a m e
: a n dt h eo t h e rh a l fa f t e r ,r e s u l t i n gi nf l i c k e r
asoneframe
is
: d i s p l a y e dw i t hm i s m a t c h e dh a l v e s .T h e
new s t a r ta d d r e s sw o n ' tb e
: l o a d e du n t i lt h es t a r to ft h en e x tf r a m e :t h a ti s .o n ef u l lf r a m e
: will b ed i s p l a y e db e f o r et h e
new s t a r ta d d r e s st a k e se f f e c t .
: I n p u tn: o n e
: O u t p u tn: o n e
: R e g i s t e r s a1 t e r e d : A X , DX
S e t S t a r t A d d r epsrsnoec a r
c aWl la i t F o r V e r t i c a l S y n c E n d
dx.CRTC-INDEX
mov
al.START-ADDRESS-HIGH
mov
mov
a h . b y tpe[t S
r tartAddress+ll
cli
;make orbsneoucagtretsihesegtte tr s
OUT-WORD
al.START-ADDRESS-LOW
mov
mov
a h . b y tpe[t S
r tartAddress]
OUT-WORD
sti
ret
S e t S t a r t A d d r eesnsd p
......................................................................
: S e t st h es c a nl i n et h es p l i ts c r e e ns t a r t sa f t e rt ot h es c a nl i n e
: s p e c i f i e db yS p l i t S c r e e n L i n e .
; I n p u tn: o n e
: O u t p u tn: o n e
; A
l r e g i s t e r sp r e s e r v e d
S e t S p l i t S c r e e n S c a n L i n ep r o cn e a r
push a x
push c x
push d x
W a i tf o rt h el e a d i n ge d g eo ft h ev e r t i c a ls y n cp u l s e .T h i se n s u r e s
t h a t we d o n ' t g e t m i s m a t c h e d p o r t i o n s o f t h e s p l i t s c r e e n s e t t i n g
w h i l es e t t i n gt h et w oo rt h r e es p l i ts c r e e nr e g i s t e r s( r e g i s t e r1 8 h
s e tb u tr e g i s t e r
7 notyetset
when a m a t c ho c c u r s ,f o re x a m p l e ) .
w h i c hc o u l dp r o d u c eb r i e ff l i c k e r i n g .
call
WaitForVerticalSyncStart
S e tt h es p l i ts c r e e ns c a nl i n e .
dx.CRTCCINDEX
mov
mov
a h . b y tpet[ rS p l i t S c r e e n L i n e ]
mov
a1 .LINE-COMPARE
cli
:maker teshaoguelnirlascestgeeetrts
OUT-WORD
; sb ei tts
7 - 0st hposelcf i rt esecl nianne
mov
a h . b y t pe t[ rS p l i t S c r e e n L i n e + l l
ah.1
and
570
Chapter 30
mov
a hs,hcll
c l .4
;move b i t 8 o f t h e s p l i t s p l i t s c r e e n s c a n
; l i n ei n t op o s i t i o nf o rt h eO v e r f l o wr e g
mov
a1 ,OVERFLOW
i f IS-VGA
;
;
;
;
The S p l i tS c r e e n ,O v e r f l o w ,a n dL i n e
Compare r e g i s t e r s a l l c o n t a i n
p a r to ft h es p l i ts c r e e ns t a r ts c a nl i n eo nt h e
VGA. W e ' l lt a k e
a d v a n t a g e o f t h er e a d a b l er e g i s t e r so ft h e
VGA t o l e a v e o t h e r b i t s
i n t h e r e g i s t e r s we a c c e s su n d i s t u r b e d .
dx.al
: s e t CRTC I n d e x r e g t o p o i n t t o O v e r f l o w
; p o i n t t o CRTC D a t ar e g
a1,dx
; g e tt h ec u r r e n tO v e r f l o wr e gs e t t i n g
8
a1 ,not
10h
;turnoffsplitscreenbit
a1,ah
; i n s e r t t h e new s p l i t s c r e e n b i t
8
; ( w o r k s i n anymode)
new s ps lci rt ebei tn
8
CRTC rIengd e x
a h . b y tpet[ rS p l i t S c r e e n L i n e + l l
out
di xn c
in
and
or
cl .3
ah.cl
ror
al.MAXIMUM-SCAN-LINE
mov
doxu; s.taelt
CRTC I n pdrtoeeotixgon t
MaximumScan
Maximum
; Scan L i n e
;tpoo di nxti n c
a l . di nx
and
a1
,not
40h
a l . a oh r
CRTCr eDg a t a
; g e tt h ec u r r e n t
MaximumScan
Linesetting
9
;turnoffsplitscreenbit
; i n s e r t t h e new s p l i t s c r e e n b i t 9
; ( w o r k s i n anymode)
new s ps lci rt ebei tn
9
dox;tush, taeelt
else
;
;
;
;
;
O n l yt h eS p l i tS c r e e na n dO v e r f l o wr e g i s t e r sc o n t a i np a r to ft h e
S p l i tS c r e e ns t a r ts c a nl i n ea n dn e e dt ob es e to nt h e
EGA.
EGA r e g i s t e r s a r e n o t r e a d a b l e ,
s o we have t o s e t t h e n o n - s p l i t
s c r e e nb i t so ft h eO v e r f l o wr e g i s t e rt o
a p r e s e tv a l u e ,i nt h i s
c a s et h ev a l u ef o r3 5 0 - s c a n - l i n e
modes.
or
a h .;0i nf hst he er t
; ( o n l yw o r k s
;setthe
OUT-WORD
endi f
sti
POP
dx
POP
cx
POP
ax
ret
SetSplitScreenScanLineendp
......................................................................
Moves t h e s p l i t s c r e e n u p t h e s p e c i f i e d
I n p u t : CX
-#
ofscanlinesto
number o f s c a nl i n e s .
move t h e s p l i t s c r e e n u p b y
O u t p u tn: o n e
R e g i s t e r sa l t e r e d :
CX
571
S p l i t S c r e enpner U
o
a rcp
SplitScreenUpLoop:
it S c r e e n L i n e 1
dec[Spl
c a lS
l e t S p il t S c r e e n S c a n L i n e
1 oop Spl itScreenUpLoop
ret
Spl itScreenUp
endp
......................................................................
; Moves t h es p l i ts c r e e n
.. # o f
: Input: CX
down t h es p e c i f i e d
s c a nl i n e st o
number o f s c a nl i n e s .
move t h es p l i ts c r e e n
down by
: Output:none
: R e g i s t e r sa l t e r e d :
CX
S pi tl S c r e e n D o wpnr once a r
SplitScreenDownLoop:
[ Si npcl i t S c r e e n L i n e l
c aSlel t S p l i t S c r e e n S c a n L i n e
1 oop Spl itScreenOownLoop
ret
Spl itScreenDownendp
......................................................................
Code
ends
end
Start
572
Chapter 30
Remember, the split screen start scan line is spread out overtwo or three registers.
What ifthe incompletely-changed value matches the current scan line after
you veset
one register but before youve set the rest? For oneframe, youll see the split screen
in a wrongplace-possibly a vevy wrongplace-resulting in jumping andflicker.
The solution is simple: Set the split screen start scan line at a time when it cant
possibly match the currently displayed scan line. The easy way to do that is to set it
when there isnt anycurrently displayed scan line-during vertical non-displaytime.
One safe time thats easy to find is the start of the vertical sync pulse, which is typically pretty near themiddle of vertical non-display time,and thats the approachIve
followed in Listing 30.1. Ive also disabled interrupts during the period when the
split screen registers are beingset. This isnt absolutely necessary,
but if its not done,
theres the possibility that an interruptwill occur between register sets and delay the
later register sets until display time, again causing flicker.
One interesting effect of setting the split screen registers at thestart of vertical sync
is that it has the effect of synchronizing the program to thedisplay adapters frame
rate. No matter how fast the computer runningListing 30.1 may be, thesplit screen
will move at a maximum rateof once per frame. Thisis handy for regulating execution
speed over a wide variety of hardware performance ranges; however, be aware that
the VGA supports 70 Hz frame rates in all non-480-scan-linemodes, while the VGA
in 480-scan-line-modes and the EGA in all color modes support 60 Hz frame rates.
573
The bug is this: The first scan line of the EGA split screen-the scan line startingat
offset zero in display memory-is displayed not once but twice. In other words, the
first line of split screen display memory, and only the first line, is replicated one
unnecessary time, pushingall the otherlines down by one.
if the first few scan lines are identical,
its not
Thats not afatal bug, of course. In fact,
even noticeable. The EGAs split-screen bug can produce visible distortion given
certain patterns, however, so you should try to make the top few lines identical (if
possible) when designing split-screen images that might be displayed on EGAs, and
you should in any casecheck how your split-screens look on bothVGAs and EGAs.
I have an important caution here: Don t count on the EGAs split-screen bug; that
is, don t rely on thefirst scan line being doubled when you design your split screens.
IBM designed and made the original EGA, but a lot of companies cloned it, and
there s no guarantee that all EGA clones copy the bug. It is a certainty, at least,
that the VGA didnt copy it.
Theres another respectin which the EGA is inferior to the VGA when it comes to
the split screen, and thats in the area of panning when the split screen is on. This
isnt a bug-its just one of the many areas in which the VGAs designers learned
from the shortcomingsof the EGA and went the EGA one better.
574
Chapter 30
On theVGA, there is recourse. AVGA-only bit, bit 5 of the AC Mode Control register
(AC register lOH), turns off pel panning in the split screen. In other words, when
this bit is set to 1, pel panning is reset to zero before the first line of the split screen,
and remains zero until the endof the frame.This doesnt allow you to pan thesplit
screen horizontally, mind you-theres no way to do that-but it does let you pan the
normal screen while the split screen stays rock-solid. This can be used to produce an
attractive streaming tape effect in the normal screen while the split screen is used
to display non-movinginformation.
: p e lp a n n i n gs u p p r e s s i o na n dp a n sr i g h ti nt h et o ph a l fw h i l et h e
: s p l i ts c r e e nr e m a i n ss t a b l e .
On an EGA. t h e s p l i t s c r e e n j e r k s
: a r o u n d i nb o t hc a s e s ,b e c a u s et h e
EGA d o e s n ts u p p o r ts p l i t
: s c r e e np e pl a n n i n gs u p p r e s s i o n .
: The j e r k i n g i n t h e s p l i t s c r e e n o c c u r s b e c a u s e t h e s p l i t s c r e e n
: i sb e i n gp e lp a n n e d( p a n n e db ys i n g l ep i x e l s - - i n t r a b y t ep a n n i n g ) .
: b u ti sn o t
a n dc a n n o tb eb y t ep a n n e d( p a n n e db ys i n g l eb y t e s - -
575
: " e x t r a b y t e "p a n n i n g )b e c a u s et h es t a r ta d d r e s so ft h es p l i ts c r e e n
: i sf o r e v e rf i x e da t
0.
......................................................................
I S-VGA
equ
VGA-SEGMENT
LOGICAL-SCREENLWIDTH
equ
equ
OaOOOh
1024
: st eo t
0 t o a s s e m fbol re
EGA
SCREEN-HEIGHT
SPLIT-SCREEN-START
SPLIT-SCREEN-HEIGHT
CRTC-INDEX
AC-INDEX
OVERFLOW
MAXIMUM-SCAN-LINE
equ
equ
equ
equ
e w
equ
equ
STARTLADDRESS-HIGH
equ
START-ADDRESS-LOWequ
Odh
HOFFSET
equ
LINE-COMPARE
e w
AC-MOOELCONTROL
PELLPANNING
INPUT-STATUS-0
WORD-OUTS-OK
equ
e w
equ
ew
350
200
; s t a r ts c a nl i n ef o rs p l i ts c r e e n
SCREENCHEIGHT-SPLITpSCREEN-START-1
3d4h
3cOh
7
9
......................................................................
: Macro t o o u t p u t a w o r dv a l u et o
a port.
OUT-WORD
macro
i f WORD-OUTS-OK
doxu, ta x
else
doxu. at l
di xn c
x c hagh , a l
out
dx.al
ddxe c
x c hagh . a l
endi f
endm
......................................................................
MyStack
segment
para
stack
512
db
dup
ends
MyStack
'STACK'
(0)
......................................................................
Data
segment
Spl it S c r e en ne L i
dw
StartAddress
dw
Pel Pan
db
Data
ends
576
Chapter 30
........................................................................
Code
segment
assume
cs:Code.
ds:Oata
......................................................................
S t a r t p r o cn e a r
mov
ax.Data
mov
ds.ax
; S e l e c t mode 10h.640x35016-ColOrgraphics
mov
ax.0010h
int
10h
mode.
;AH-0 i s s e l e c t mode f u n c t i o n
;AL-lOh i s mode t o s e l e c t ,
; 6 4 0 x 3 5 01 6 - c o l o rg r a p h i c s
mode
S e tt h eO f f s e tr e g i s t e rt o
make t h e o f f s e t f r o m t h e s t a r t o f
one
scan l i n e t o t h e s t a r t o f t h e n e x t t h e d e s i r e d
number o f p i x e l s .
T h i sg i v e s us a v i r t u a ls c r e e nw i d e rt h a nt h ea c t u a ls c r e e nt o
panacross.
: N o t et h a tt h eO f f s e tr e g i s t e ri s
programmed w i t h t h e l o g i c a l
; s c r e e nw i d t hi nw o r d s ,n o tb y t e s ,h e n c et h ef i n a ld i v i s i o nb y
2.
;
;
;
;
mov
dx.CRTC-INDEX
ax.(LOGICAL-SCREEN-WIOTH/8/2
mov
OUT-WORD
; S e tt h es t a r ta d d r e s st od i s p l a yt h e
; s c r e e n memory.
s h l 8) o r HOFFSET
memory j u s t p a s t t h e s p l i t
mov
[StartAddressl.SPLIT_SCREEN_HEIGHT*(LOGICAL-SCREEN-WIOTH/8)
c aSl el t S t a r t A d d r e s s
; S e tt h es p l i ts c r e e ns t a r ts c a nl i n e .
mov
[SplitScreenLinel.SPLIT_SCREEN-START
c a l l S e t S p l itScreenScanLi ne
: F i l lt h es p l i ts c r e e np o r t i o no fd i s p l a y
memory ( s t a r t i n g a t
; o f f s e t 0 ) w i t h a c h o p p yd i a g o n a lp a t t e r ns l o p i n gl e f t .
ax.VGA-SEGMENT
mov
mov
es.ax
sub d. id i
mov
dx.SPLIT-SCREEN-HEIGHT
mov
cld
RowLoop:
mov
ax.OFFOh
;fill a l l l i n e s i n t h e s p l i t s c r e e n
; s t a r t i n g fill p a t t e r n
cx.LOGICAL-SCREEN-WIDTH/8/4
;fill 1 s c a nl i n e
ColumnLoop:
o f p a; drstrtaows w
mov
word pet rs : [ d i ] . O
inc
inc
1 oop
rol
dec
jnz
di
di
Col umnLoop
ax.1
dx
RowLoop
a d i a g o n a l 1i n e
so
;make v e r t i c ab l a nskp a c e s
; p a n n i n ge f f e c t sc a nb es e e ne a s i l y
;shiftpatternword
: F i l lt h ep o r t i o no fd i s p l a y
memory t h a t will b e d i s p l a y e d i n t h e
: n o r m a ls c r e e n( t h en o n - s p l i ts c r e e np a r to ft h ed i s p l a y )w i t h
a
: c h o p p yd i a g o n a lp a t t e r ns l o p i n gr i g h t .
mov
mov
mov
cld
RowLoop2:
mov
di.SPLIT-SCREEN-HEIGHT*(LOGICAL_SCREEN-WIOTH/8)
dx.SCREEN-HEIGHT :fill a l ll i n e s
ax.Oc510h
: s t a r t i n g fill p a t t e r n
cx.LOGICAL-SCREEN-WIOTti/8/4
:fill 1 scan l i n e
ColumnLoop2:
o f p a r:td r as two s w
a dl i an ge o n a l
mov
word p t r es:Cdil.O
:make v e r t i c ab l a nskp a c e s
so
: p a n n i n ge f f e c t sc a nb es e e ne a s i l y
idni c
di ni c
1 oopColumnLoop2
w o rpda t t e :r snh i f t a x . 1
ror
dx
dec
jnz
RowLoop2
: P e lp a nt h en o n - s p l i ts c r e e np o r t i o no ft h ed i s p l a y :b e c a u s e
: s p l i ts c r e e np e lp a n n i n gs u p p r e s s i o ni sn o tt u r n e d
: s c r e e nj e r k sb a c k
on, t h e s p l i t
and f o r t h as t h ep e lp a n n i n gs e t t i n gc y c l e s .
l emov
f tt h e pt oi x2ce:0xpl.0s2a 0n 0
c aPlal n R i g h t
: W a i tf o r
mov
int
a k e yp r e s s( d o n ' te c h oc h a r a c t e r ) .
;DOS c o n s o l ei n p u tw i t h o u te c h of u n c t i o n
ah.8
21h
: R e t u r nt ot h eo r i g i n a ls c r e e nl o c a t i o n ,w i t hp e lp a n n i n gt u r n e do f f .
[StartAddressl.SPLIT~SCREEN~HEIGHT*(LOGICAL-SCREEN-WIDTH/8)
mov
c aSl el t S t a r t A d d r e s s
mov
[PelPan] .O
c a l lS e t P e l
Pan
: T u r no ns p l i ts c r e e np e lp a n n i n gs u p p r e s s i o n ,
: w o n ' tb ea f f e c t e db yp e lp a n n i n g .N o td o n eo n
: r e a d a b l er e g i s t e r s
: a r e n ' ts u p p o r t e db y
if IS-VGA
mov
in
dx.INPUT-STATUS-0
a1 .dx
mov
al.20h+AC-MODE-CONTROL
mov
out
inc
in
dx.AC-INDEX
dx.al
dx
a1 ,dx
a1 ,20h
or
578
so t h e s p l i t s c r e e n
EGA b e c a u s eb o t h
and t h es p l i ts c r e e np e lp a n n i n gs u p p r e s s i o nb i t
EGAs.
Chapter 30
: r e s e tt h e
AC I n d e x / D a t a t o g g l e t o
: I n d e xs t a t e
: b i t 5 s e t t o 1 t o keepvideoon
: p o i n t t o AC I n d e x / D a t ar e g i s t e r
: p o i n t t o AC D a t ar e g( f o rr e a d so n l y )
: g e tt h ec u r r e n t
AC Mode C o n t r o lr e g
; e n a b l es p l i ts c r e e np e lp a n n i n g
: suppression
dec
dx
: p o i n tt o
AC I n d e x / D a t ar e g( D a t af o r
; w r i t e so n l y )
out
dx.al
: w r i t et h e
new AC Mode C o n t r o l s e t t i n g
: w i t hs p l i ts c r e e np e lp a n n i n g
: s u p p r e s s i o nt u r n e do n
endi f
o f t h ed i s p l a y ;b e c a u s e
P e lp a nt h en o n - s p l i ts c r e e np o r t i o n
s p l i ts c r e e np e lp a n n i n gs u p p r e s s i o ni st u r n e do n .t h es p l i t
s c r e e n will n o t move as t h ep e lp a n n i n gs e t t i n gc y c l e s .
mov
cx.200
c aPlal n R i g h t
;pan
l e f tt200
h e tpoi x e l s
Wait f o r a k e yp r e s s( d o n ' te c h oc h a r a c t e r ) .
mov
int
ah.8
21h
R e t u r nt ot e x t
mov
text
f u n;DOS
cewtciihot ohincnopunutst o l e
mode and DOS.
ax.0003h
:AH-0
s e l e cits
mode f u n c t i o n
:AL-3 i s mode t o s e l e c t , t e x t
mode
mode
to
; r ient ut r n 1 0 h
mov
ah.4ch
int
21h
S t a r t endp
to
:return
DOS
......................................................................
: Waits f o r t h el e a d i n g
edge o f t h ev e r t i c a ls y n cp u l s e .
: I n p u tn: o n e
: Output:none
: R e g i s t e r sa l t e r e d :
AL. DX
WaitForVerticalSyncStart
nperaorc
mov
dx.INPUT-STATUS-0
WaitNotVerticalSync:
a l . di xn
t e s t a1 .08h
Wj nazi t N o t V e r t i c a l S y n c
WaitVerticalSync:
a l . di xn
t e s t a1 .08h
jz
WaitVerticalSync
ret
WaitForVerticalSyncStart
endp
......................................................................
: W a i t sf o rt h et r a i l i n ge d g e
o f t h ev e r t i c a ls y n cp u l s e .
: I n p u tn: o n e
;
Output:none
: R e g i s t e r sa l t e r e d :
AL. D X
W a i t F o r V e r t i c a l S y n c E n dp r o cn e a r
mov
dx.INPUT-STATUS-0
WaitVerticalSyncZ:
a l . di xn
t e satl . 0 8 h
W a i tj Vz e r t i c a l S y n c E
WaitNotVerticalSyncE:
a l . di xn
t e satl . 0 8 h
Wj nazi t N o t V e r t i c a l S y n c E
ret
WaitForVerticalSyncEndendp
......................................................................
:
:
:
:
:
:
:
S e t st h es t a r ta d d r e s st ot h ev a l u es p e c i f e db yS t a r t A d d r e s s .
W a i tf o rt h et r a i l i n g
edge o f v e r t i c a l s y n c b e f o r e s e t t i n g
so t h a t
one h a l f o f t h e a d d r e s s i s n ' t l o a d e d b e f o r e t h e s t a r t o f t h e f r a m e
and t h e o t h e r h a l f a f t e r , r e s u l t i n g i n f l i c k e r
asoneframe
is
d i s p l a y e dw i t hm i s m a t c h e dh a l v e s .
The new s t a r ta d d r e s sw o n ' tb e
l o a d e du n t i lt h es t a r to ft h en e x tf r a m e :t h a ti s .o n ef u l lf r a m e
w
i
l b ed i s p l a y e db e f o r et h e
new s t a r t a d d r e s s t a k e s e f f e c t .
: I n p u t n: o n e
: Output:none
: R e g i s t e r sa l t e r e d :
A X , OX
S e t S t a r t A d d r eps rsno ec a r
c aWl la i t F o r V e r t i c a l S y n c E n d
mov
dx.CRTC-INDEX
mov
al.START-ADDRESS-HIGH
mov
a h . b y tpetCr S t a r t A d d r e s s + l l
cli
o n c:make
ea t rsebegotsgi tsuehtrteer s
OUTLWORO
al.START-ADDRESS-LOW
mov
mov
a h . b y t pe t[ rS t a r t A d d r e s s l
OUT-WORD
sti
ret
S e t S t a r t A d d r eesnsd p
......................................................................
; S e t st h eh o r i z o n t a lp e lp a n n i n gs e t t i n gt ot h ev a l u es p e c i f i e d
: b yP e l P a n .W a i t su n t i lt h es t a r to fv e r t i c a ls y n ct o
: t h e new p e lp a ns e t t i n gc a nb el o a d e dd u r i n gn o n - d i s p l a yt i m e
do s o , so
: a n dc a nb er e a d yb yt h es t a r to ft h en e x tf r a m e .
: I n p u t n: o n e
: Output:none
: R e g i s t e r sa l t e r e d :
S e t P e l Pan
call
AL. OX
p r onc e a r
WaitForVerticalSyncStart
mov
dx.AC-INDEX
mov
al.PEL-PANNING+ZOh
t h e: p odi nx t, a lo u t
mov
a1 ,[PelPan]
t h e : l o addx . a l o u t
ret
S e t P e l Pan
endp
580
Chapter 30
: a l sr eo s etthse
AC
: I n d e x / D a t at o g g l e
; t oI n d e xs t a t e
1 t o keep v i deoon
: b i t 5 steot
P e l ACt o I n d e x
Pan r e g
new Pel Pan s e t t.i
ng
......................................................................
: S e t st h es c a nl i n et h es p l i ts c r e e ns t a r t sa f t e rt ot h es c a nl i n e
: s p e c i f i e db yS p l i t S c r e e n L i n e .
: I n p u tn: o n e
: O u t p u tn: o n e
: Al r e g i s t e r sp r e s e r v e d
S e t S p l i t S c r e e n S c a n L i n ep r o cn e a r
push ax
push c x
push d x
W a i tf o rt h el e a d i n ge d g eo ft h ev e r t i c a ls y n cp u l s e .T h i se n s u r e s
t h a t we d o n ' t g e t m i s m a t c h e d p o r t i o n s o f t h e s p l i t s c r e e n s e t t i n g
w h i l es e t t i n gt h et w oo rt h r e es p l i ts c r e e nr e g i s t e r s( r e g i s t e r1 8 h
7 n o ty e ts e t
when a m a t c ho c c u r s ,f o re x a m p l e ) .
s e tb u tr e g i s t e r
w h i c hc o u l dp r o d u c eb r i e ff l i c k e r i n g .
call
Wai t F o r V e r t i c a 1 S y n c S t a r t
S e tt h es p l i ts c r e e ns c a nl i n e .
mov
dx.CRTC-INDEX
mov
a h . b y tpetCr S p l i t S c r e e n L i n e l
mov
a1 , LINE-COMPARE
c li
:make s u r e a l l t h e r e g i s t e r s g e t s e t a t o n c e
: s e t b i t s 7 - 0 o ft h es p l i ts c r e e ns c a nl i n e
OUT-WORD
mov
a h . b y tpet[ rS p l i t S c r e e n L i n e + l l
ah.1
and
mov
c l .4
a hs .hcl l
:move b i t ss8tsphpcloeliritfset ceann
: l i n ei n t op o s i t i o nf o rt h eO v e r f l o wr e g
mov
a1
,OVERFLOW
The S p l i tS c r e e n ,O v e r f l o w ,a n dL i n e
Compare r e g i s t e r s a l l c o n t a i n
p a r to ft h es p l i ts c r e e ns t a r ts c a nl i n e
on t h e VGA. W e ' l lt a k e
a d v a n t a g eo ft h er e a d a b l er e g i s t e r so ft h e
VGA t o l e a v e o t h e r b i t s
i nt h er e g i s t e r s
we a c c e s su n d i s t u r b e d .
:dsxe. ta l
out
t o :ipnc
oint dx
trhrsa1
v:egenret.retdftilgxno gw
i n c uO
a1 . n1;o0t usthoprsf nflci tr ebei nt
and
or
a1: i ,ntashhee r t
O
CRTC
vpet or fitnlIoonrt ew
dge x
CRTC
reg Data
8
8
MaximumScan
Maximum
di xn c
in
and
a l . a ho r
: p o i n t t o CRTC D a t a r e g
; g e tt h ec u r r e n t
MaximumScan
L i n es e t t i n g
9
:turnoffsplitscreenbit
; i n s e r t t h e new s p l i t s c r e e n b i t
9
; ( w o r k si na n y
mode)
9
; s e tt h e
new s p l i t s c r e e n b i t
a1,dx
a1 . n o4t 0 h
d ox u. at l
else
O n l yt h eS p l i tS c r e e na n dO v e r f l o wr e g i s t e r sc o n t a i np a r to ft h e
S p l i tS c r e e ns t a r ts c a nl i n ea n dn e e dt ob es e t
on t h e EGA.
EGA r e g i s t e r sa r en o tr e a d a b l e ,
s o we have t o s e t t h e n o n - s p l i t
s c r e e nb i t so ft h eO v e r f l o wr e g i s t e r
t o a p r e s e tv a l u e ,i nt h i s
c a s et h ev a l u ef o r3 5 0 - s c a n - l i n em o d e s .
a h . 0 of hr
; i n s e r t t h e new s p l i t s c r e e n b i t 8
; ( o n l yw o r k si n3 5 0 - s c a n - l i n e
EGA modes)
: s e t t h e new s p l i t s c r e e n b i t 8
OUT-WORD
endi f
sti
POP
POP
dx
cx
ax
POP
ret
SetSplitScreenScanLineendp
......................................................................
; Pan h o r i z o n t a l l y t o t h e r i g h t t h e
: I n p u t : CX
- # ofpixels
number o f p i x e l s s p e c i f i e d
by CX.
b yw h i c ht op a nh o r i z o n t a l l y
: Output:none
; R e g i s t e r sa l t e r e d :
A X , C X . DX
P a n Rpni greohact r
PanLoop:
i n[cP e l
Pan]
[PelPan],07h
and
DjonSz e t S t a r t A d d r e s s
CiSn tca r t A d d r e s s l
DoSetStartAddress:
c aSlel t S t a r t A d d r e s s
c a l lS e t P e l
Pan
l o o p PanLoop
ret
PanRight
endp
........................................................................
Code
ends
Se tnadr t
582 Chapter 30
Recall also that the AC Index and Data registers are both written to at 1/0 address
3COH, with the toggle that determineswhich one is written to at any time switching
state on every writeto 3COH; this toggle is reset to index modeby each read fromthe
Input Status 0 register (3DAH in color modes, 3BAH in monochrome modes). The
AC Index andData registers can also be written to at 3C1H on theEGA, but not on
the VGA, so steer clear of that practice.
On the VGA, reading AC registers is a bit different from writing to them. The AC
Data register can be read from3COH, and theAC register currently addressed by the
AC Index register can be read from 3C1H; reading does not affect the state of the
AC index/data toggle. Listing 30.2 illustrates reading from and writing to the AC
registers. Finally, setting the start address registers (CRTC registers OCH and ODH)
has its complications. As with the split screen registers, the start address registers
must be set together and without interruption at time
a
when theres no chance of a
partial setting being used for a frame. However, its a little more difficult to know
when that might be the case with the start address registers than itwas with the split
screen registers, because its not clear when the start address is used.
You see, the start address is loaded intothe EGAs or VGAs internal display memory
pointer once per frame. The internal pointer is then advanced, byte-by-byte and
line-by-line, until the end of the frame (with a possible resetting to zero if the split
screen line is reached), andis then reloaded for the next frame.Thats straightforward enough; thereal question is, Exactly when is the start address loaded?
In his excellent book Programmers Guide to PC Video Systems (MicrosoftPress) Richard
Wilton says that the start address is loaded at the start of the vertical sync pulse.
(Wilton callsit vertical retrace, which can also be taken to mean vertical non-display
time, but given that hes testing the vertical sync status bit in the InputStatus 0 register, I assume he means that the start address is loaded at the start of vertical sync.)
Consequently, he waits until the end of the vertical sync pulse to set the start address
registers, confident that the start address wont take effect until the nextframe.
Im sure Richard is right when it comes to the real McCoy IBM VGA and EGA, but
Im less confident that every clone out there loads the start address at the start of
vertical sync.
For that vevy reason,I generally advisepeople not to use horizontal smooth
panning
unless they can test their software
on all the makes of display adapter
it might run on.
Ive used Richard
j . approach in Listings 30.1 and 30.2, and sofar as Ive seenit works
fine, but be awarethat there arepotential, albeit unproven, hazardsto relying on
the setting of the start address registers to occur at a speclfic time in the frame.
The interaction of the start address registers and the Pel Panning register is worthy
of note. After waiting for the end of vertical sync to set the start address in Listing
30.2, I wait for thestart of the nextvertical sync toset the Pel Panning register. Thats
583
584
Chapter 30
Previous
Home
for example, there shouldbe no problem if the split screen has a border of blanks
on the leftside.
How Safe?
So, how safe is it to use the split screen? My opinion is that its perfectly safe, although Id welcomeinput from peoplewith extensivesplit screen experience-and
the effects are striking enough that the split screen is well worth using in certain
applications.
Im a little more
leery of horizontal smoothscrolling, with or without thesplit screen.
Still, the Wilton book doesnt advise anyparticular caution,and I havent heard any
horror stories from the field lately, so the clone manufacturers must finally have
gotten it right. (I vividly remember some early clones years backthat didnt quite get
it right.)So, on balance, Idsay to use horizontal smooth scrolling
if you reallyneed
it; on the other hand, in fast animation you can often get away with byte scrolling,
which is easier, faster, and safer. (I recently saw a game thatscrolled as smoothly as
you could ever want.It was only by stopping itwith Ctrl-NumLock that Iwas able to
be sure that itwas, in fact, byte panning, notpel panning.)
In short, use the fancy stuff-but only when you have to.
585
Next
Previous
chapter 31
higher 256-color resolution on the vga
Home
Next
> *
One of the more ippealing features of the VGA is its ability to display 256 simultaneous colors. Unf&&unately, one of the less appealing features of the VGA is the
f the one 256-color mode the IBM-standard BIOS
higher resolution 256-color modes in thelegion of
eans a standard, and differences between seemingly
anufacturers can be vexing.) Morecolors can often
ut the resolution difference between the 640x480
the 320x200 256color modeis so great thatmany programmers
simply cant afford to use the 256-color mode.
about theVGA, however, itsthat theres neverjust
,alternatives always exist for theclever programmer, and thats more true thanyou might imagine with 256-color mode. Not only is
there a high 256-color resolution, there arelots of higher 256-color resolutions, going all the way up to 360x480-and thats with the vanilla IBM VGA!
In this chapter, Im going to focus on one of my favorite 256-color modes, which
provides 320x400 resolution and two graphics pages and can be set up with verylittle
reprogramming of the VGA. In the next chapter,Ill discuss higher-resolution 256color modes, and starting in Chapter 47, Ill cover the high-performance Mode X
256-color programming that many games use.
So. Lets get started.
589
320x400 256-ColorMode
Okay, whats so great about 320x400 256-color mode? Two things: easy, safe mode
sets and page flipping.
As I said above, mode 13His really a 320x400 mode, albeitwith each line doubled to
produce an effective resolution of 320x200. That means thatwe dont need to change
any display timings, widths, or heights in order to tweak mode 13H into 320x400
590
Chapter 31
mode-and that makes 320x400a safe choice. Basically, 320x400 mode differs from
mode 13H only in thesettings of mode bits, whichare sure to be consistent from one
VGA clone to the next andwhich work equallywell with allmonitors. The otherhires 256-color modes differ from mode 13H not only in the settings of the mode bits
but also in the settings of timing and dimension registers, which may not be exactly
the same on all VGA clones and particularly not onall multisyncmonitors. (Because
multisyncs sometimes shrink the active area of the screen when used with standard
VGA modes, some VGAs use alternate register settings for multisync monitors that
adjust the CRT Controller timings to use as much of the screen area as possible for
displaying pixels.)
The other good thing about
320x400 256-colormode is that two pages are supported.
Each 320x400 256-colormode requires 128,000 bytes of display memory,
so we can
just barely manage two pages in 320x400 mode, one starting at offset 0 in display
memory and the otherstarting at offset 8000H. Those two pages are thelargest pair
of pages that can fit in theVGAs 256K, though, and the
higher-resolution 256-color
modes, which use still
larger bitmaps (areas of displaymemory that controlpixels on
and will
the screen), cant support two pages at all. As weve seen in earlier chapters
see again in this book, paging is veryuseful for off-screen construction of imagesand
fast, smooth animation.
Thats why I like 320x400 256-color mode. Thenext step is to understand how display memory is organized in 320x400 mode, andthats not so simple.
591
592
Chapter 31
the text yourself. Likewise, the BIOS read pixel and write pixel routines wont work
in 320x400 mode, but thats no problem because Ill provide equivalent routines in
the nextsection.
Our next task is to convert standard mode 13H into 320x400 mode. Thats accomplished by undoing some of the modebits that are set especially
up
for mode13H, so
that from a programming perspective the VGA reverts to a straightforward planar
model of memory. That means taking the VGA out of chain 4 mode and doubleword
mode, turning off the double display ofeach scanline, making sure chain mode, odd/
even mode, and word mode are turned off, and selecting byte mode for video data
display. All thats done in the Set320~400Modesubroutine in Listing 31 . l ,which
well discussnext.
LISTING 31.1
;
;
;
;
;
131 1.ASM
Program t od e m o n s t r a t ep i x e ld r a w i n gi n3 2 0 x 4 0 02 5 6 - c o l o r
mode on t h e VGA. Draws 8 l i n e s t o f o r m
anoctagon,
a pixel
a t a t i m e . Draws 8 octagons i n all, one on t o po ft h eo t h e r ,
each i n a d i f f e r e n tc o l o rs e t .A l t h o u g hi t sn o tu s e d ,
a
p i x e lr e a df u n c t i o ni sa l s op r o v i d e d .
VGA-SEGMENT
SC-INDEX
GC-INDEX
CRTC-INDEX
MAP-MASK
MEMORY-MODE
MAX-SCAN-LINE
START-ADDRESS-HIGH
UNDERLINE
MODE-CONTROL
READ-MAP
GRAPHICS-MODE
MISCELLANEOUS
SCREEN-WIDTH
SCREEN-HEIGHT
equ
equ
equ
equ
equ
equ
equ
equ
equ
equ
equ
equ
equ
equ
equ
OaOOOh
3c4h
3ceh
3d4h
2
4
9
Och
14h
17h
4
5
6
320
400
; S e q u e n c eC o n t r o l l e rI n d e xr e g i s t e r
: G r a p h i c sC o n t r o l l e rI n d e xr e g i s t e r
;CRT C o n t r o l l e rI n d e xr e g i s t e r
;Map Mask r e g i s t e r i n d e x i n
SC
;Memory Mode r e g i s t e r i n d e x i n
SC
;Maximum Scan L i n e r e g i n d e x i n
CRTC
; S t a r t A d d r e s sH i g hr e gi n d e xi n
CRTC
: U n d e r l i n eL o c a t i o nr e gi n d e xi n
CRTC
;Mode C o n t r o lr e g i s t e ri n d e xi n
CRTC
;Read Map r e g i s t e r i n d e x i n
GC
; G r a p h i c s Mode r e g i s t e r i n d e x i n
GC
; M i s c e l l a n e o u sr e g i s t e ri n d e xi n
GC
;#o fp i x e l sa c r o s ss c r e e n
;I/o f s c a nl i n e so ns c r e e n
593
WORD-OUTS-OK
equ
t o: s e t
0 t o assemble f o r
: c o m p u t e r st h a tc a n ' th a n d l e
: w o r do u t st oi n d e x e d
VGA r e g i s t e r s
s et asgpctmakacer akn t
ends
'STACK'
db
stack
512 dup ( ? I
'DATA'
word
segment
Data
db
BaseCol
or
: S t r u c t u r eu s e dt oc o n t r o ld r a w i n go f
Linecontrol struc
StartX
StartY
LineXInc
LineYInc
BaseLength
d b Loi rn e C o l
eLni nd es c o n t r o l
dw
dw
dw
dw
dw
a line.
?
?
?
?
?
?
: L i s to fd e s c r i p t o r sf o rl i n e st o
draw.
L i n e L li as btLei nl e c o n t r o l
L i n e c o n t r o l <130.110.1.0.60,0>
L i n e c o n t r o l <190.110.1.1.60,1>
Data
LineControl<250,170.0.1.60,2>
L i n e c o n t r o l <250.230.-1.1.60.3>
L i n e c o n t r o l <190.290.-1.0.60,4>
L i n e c o n t r o l <130.290.-1.-1,60.5>
LineControl<70.230.0.-1.60,6>
LineControl<70.170.1.-1.60,7>
L i n e c o n t r o l <-1.0.0,0,0.0>
ends
: Macro t o o u t p u t
a w o r dv a l u et o
a port.
OUT-WORD
macro
if WORD-OUTS-OK
doxu. at x
else
dox u. at l
di xn c
x c hagh . a l
doxu. at l
dxdec
x c hagh . a l
endi f
endm
: Macro t o o u t p u t
a c o n s t a n tv a l u et o
anindexed
CONSTANT-TO-INDEXED-REGISTERmacro
ADDRESS.
mov
dx.AD0RESS
mov
ax.(VALUE s h l 8) + INDEX
OUT-WORD
endm
594
Chapter 31
VGA r e g i s t e r .
I N D E X , VALUE
Code
segment
assume
cs:Code.
ds:Oata
S t a r tp r o cn e a r
mov
ax.0ata
mov
ds.ax
: S e t3 2 0 x 4 0 02 5 6 - c o l o r
call
mode.
Set320By400Mode
: We're i n 3 2 0 x 4 0 02 5 6 - c o l o r
ColorLoop:
mov
si .offset LineList
LineLoop:
mov
cmp
jz
cx,[si+StartXl
c x . -1
L i nesOone
mov
mov
mov
add
P i x e l L o o p:
push
push
call
POP
POP
add
add
dec
jnz
add
jmp
Linesoone:
call
inc
cmp
jb
dx.[si+StartYl
bl ,[si+LineColorl
bp.[si+BaseLengthl
bl .[BaseColorl
:pointtothestartofthe
: linedescriptorlist
X coordinate
: s e tt h ei n i t i a l
:a d e s c r i p t o r w i t h a -1 X
: c o o r d i n a t em a r k st h ee n d
: ofthelist
Y coordinate,
: s e tt h ei n i t i a l
: l i n ec o l o r ,
: and p i x e lc o u n t
: a d j u s tt h el i n ec o l o ra c c o r d i n g
: t oB a s e c o l o r
: s a v et h ec o o r d i n a t e s
cx
dx
W
irt e P i x e l
dx
cx
cx.[si+LineXIncl
dx,[si+LineYIncl
bP
Pixel Loop
si .size Linecontrol
LineLoop
; s e tt h ec o o r d i n a t e so ft h e
: n e x t p o i n t o f t h e 1i n e
: a n ym o r ep o i n t s ?
: y e s .d r a wt h en e x t
; p o i n tt ot h en e x tl i n ed e s c r i p t o r
: a n dd r a wt h en e x tl i n e
GetNextKey
[BaseColorl
[BaseColorl.B
ColorLoop
; w a i tf o r
a k e y .t h e n
: bump t h ec o l o rs e l e c t i o na n d
: see i f we'redone
: n o td o n ey e t
: W a i tf o r
a k e ya n dr e t u r nt ot e x t
: one i s p r e s s e d .
c a l l GetNextKey
mov
ax.0003h
int
10h
mov
ah.4ch
i n t :done
21h
:draw t h i s p i x e l
: r e t r i e v et h ec o o r d i n a t e s
mode andendwhen
t e x t mode
S t a r t endp
: Setsup320x400256-COlOrmodes.
: I n p u tn: o n e
: O u t p u tn: o n e
595
Set320By400Mode
proc
near
: F i r s t . go t o normal320x200256-color
mode, which i s r e a l l y a
: 320x400256-color mode w i t he a c hl i n es c a n n e dt w i c e .
mov
ax.0013h
int
10h
: Change CPU a d d r e s s i n go fv i d e o
:AH
0 means mode s e t . AL
: 2 5 6 - c o l o rg r a p h i c s mode
: B I O S v i d e oi n t e r r u p t
13h s e l e c t s
memory t o l i n e a r ( n o t o d d / e v e n .
: c h a i n .o rc h a i n4 ) .t oa l l o w
us t o access a l l 256K o f d i s p l a y
: memory. When t h i s i s done, VGA memory wil l o o k j u s t l i k e
memory
: i n modes 1 0 ha n d1 2 h .e x c e p tt h a te a c hb y t eo fd i s p l a y
: c o n t r o l o n e2 5 6 - c o l o rp i x e l ,w i t h
: a d d r e s s ,o n ep i x e lp e rp l a n e .
mov
mov
out
inc
in
and
or
out
mov
mov
out
inc
in
and
out
dec
mov
out
inc
in
and
out
:
:
:
:
d i s p l a yo f
dx.SC-INDEX
a1 .MEMORY-MODE
dx.al
dx
a1 ,dx
a1 .not 08h
al.04h
dx.al
dx.GC-INDEX
a1 .GRAPHICS-MODE
dx.al
dx
a1 ,dx
a1 .not 10h
dx.al
dx
a1,MISCELLANEOUS
dx.al
dx
a1 ,dx
a1 .not 02h
dx.al
memory will
4 a d j a c e n tp i x e l sa ta n yg i v e n
:turnoffchain
4
; t u r no f fo d d / e v e n
: t u r no f fo d d / e v e n
:turnoffchain
NOW c l e a rt h ew h o l es c r e e n ,s i n c et h e
mode 13h mode s e t o n l y
c l e a r e d 64K o u t o f t h e
256K o f d i s p l a y memory. Do t h i s b e f o r e
we s w i t c h t h e CRTC o u t o f mode 13h. so we don'tseegarbage
on t h es c r e e n when we make t h e s w i t c h .
CONSTANT-TO-INDEXED-REGISTERSC-1NDEX.MAP-MASK.Ofh
: e n a b l ew r i t e st oa l lp l a n e s ,
so
: we c a nc l e a r 4 p i x e l s a t a t i m e
ax.VGA-SEGMENT
mov
mov
es.ax
sub
d,id i
mov
ax.di
:#o f words i n 64K
mov
cx.8000h
cld
a l l : c l es at or s wr e p
memory
596
Chapter 31
dx.CRTC-INDEX
a1 .MAX-SCAN-LINE
mode by n o ts c a n n i n ge a c h
di xn c
in
and
doxu. at l
dx
dec
a1 .dx
a1
: sl.efnht o t
maximum scan l i n e
: Change CRTC s c a n n i n gf r o md o u b l e w o r d
: t h e CRTC t o scanmorethan
mov
out
di xn c
a l . dixn
and
doxu, at l
dx
dec
mov
doxu. at l
di xn c
a l . dixn
or
mode t o b y t e mode. a l l o w i n g
64K o f v i d e od a t a .
a1,UNDERLINE
dx.al
:turn off doubl eword
a1 . n o4t0 h
a1 .MODELCONTROL
a1: t. u4 r0nh
onb yt ht ee
mode b i t , s o memory i s
a purely
: l i n e a r way. j u s t as i n modes 10hand12h
: scanned f o rv i d e od a t ai n
doxu. at l
ret
Set320By400Mode
endp
: Draws a p i x e l i n t h e s p e c i f i e d c o l o r a t t h e s p e c i f i e d
: l o c a t i o n i n 3 2 0 x 4 0 02 5 6 - c o l o r
: Input:
:
CX
:
DX
BL
mode.
X c o o r d i n a t oepf i x e l
Y c o o r d i n a t oepf i x e l
p i x ceol l o r
: O u t p u tn: o n e
: R e g i s t e r sa l t e r e d :
Writepixel
mov
mov
mov
p r once a r
ax.VGA_SEGMENT
es.ax
ax,SCREEN_WIDTH/4
:pointtodisplay
: t h e r ea r e
memory
4 p i x e l sa te a c ha d d r e s s ,
so
i s 80 b y t e sw i d e
: e a c h3 2 0 - p i x e lr o w
: i n e a c hp l a n e
mu1
push
shr
shr
add
mov
POP
and
mov
shl
dx
cx
cx.1
cx.1
ax.cx
d i ,ax
cx
c1.3
ah.1
ah.cl
mov
a1 .MAP_MASK
mov
dx.SC_INDEX
OUTLWORD
:pointtostart
o f d e s i r e dr o w
X coordinate
: s e ta s i d et h e
: t h e r ea r e 4 p i x e l sa te a c ha d d r e s s
: s o d i v i d e t h e X c o o r d i n a t eb y 4
: p o i n tt ot h ep i x e l ' sa d d r e s s
: g e tb a c kt h e
: g e tt h ep l a n e
X coordinate
# ofthepixel
: s e tt h eb i tc o r r e s p o n d i n gt ot h ep l a n e
: t h ep i x e li si n
:settowritetotheproperplane
: t h ep i x e l
for
597
pixel
t mov
h e : der sa :w[ d i l , b l
ret
W r i t e P i x e nl d p
; Reads t h e c o l o r
o f t h ep i x e la tt h es p e c i f i e dl o c a t i o ni n3 2 0 x 4 0 0
; 2 5 6 - c o l o r mode.
; Input:
;
CX
;
DX
- X c o o r d i n a t eopf i x etlor e a d
- Y c o o r d i n a t e o f p i x etlor e a d
; output:
;
AL
- p i x ceol l o r
; R e g i s t e r sa l t e r e d :
AX, C X . D X , S I . ES
R e a d P i px renoleca r
ax.VGA-SEGMENT
mov
mov
es.ax
mov
ax,SCREENKWIDTH/4
:pointtodisplay
memory
; t h e r ea r e
4 p i x e l sa te a c ha d d r e s s ,
80 b y t e sw i d e
: i n e a c hp l a n e
o f d e s i r e d row
X coordinate
a d ed ar ec sh4sa pt i x e l s
: s o dt hi vei d e
X c o o r bd yi n a t e
4
; e a c h3 2 0 - p i x e lr o wi s
s t a r t t o mu1
; p o i n td x
cx
push
a ;rteh ec rxe. 1s h r
c x . 1s h r
d pd irxeesls' st h e
t o ; p o ianxt , c x a d d
mov
s i ,ax
t h e b a c k : g POP
et
ax
X coordinate
p l a n et h e : g e at 1 . 3a n d
p i x tehl eIo f
mov
ah.al
mov
a1 ,READ-MAP
dx.GC-INDEX
mov
OUT-WORD
f opprlraonptehe erf r orm
ead to ;set
: t h ep i x e l
e spblp;o:tyrti[rdexhtseseai lld
ret
ReadPixel
endp
the aside ;set
; W a i t sf o rt h en e x tk e ya n dr e t u r n s
i t i n AX.
; I n p u tn: o n e
; output:
;
AX
- f u l 1l 6 - b i t
G e t N e x t K epyr once a r
WaitKey:
mov
ah.1
int
16h
jz
WaitKey
ah,ah
sub
int
16h
ret
GetNextKey
endp
Code
ends
Se tnadr t
598
Chapter 31
code f o r k epyr e s s e d
; w a i tf o r
: r e a dt h ek e y
a key t o become a v a i l a b l e
so
The interesting aspects of Listing31.1 are three.First, the Set320x400Mode subroutine selects 320x400 256color mode. This is accomplished by performing a mode
13H mode set followed by then putting the VGA into standard planar byte mode.
Set320x400Modezeros display memory as well. Its necessary to clear display memory
even after a mode 13H mode set because the mode13H mode set clears onlythe 64K
of display memory
that can be accessedin that mode,leaving 192Kof display memory
untouched.
The second interesting aspect of Listing 31.1 is the Writepixel subroutine, which
draws a colored pixel at any x,y addressable location on thescreen. Although it may
not be obvious because Ive optimized the code a little, the process of drawing a
pixel is remarkably simple. First,the pixels display memory address is calculated as
address = (y * (SCREEN-WIDTH / 4 ) ) + ( x / 4)
which might be more recognizable as:
address = ((y * SCREEN-WIDTH) + x) / 4
(There are 4 pixels at each display memory address in 320x400 mode, hence the
division by 4.) Then the pixels plane is calculated as
plane = x and 3
which is equivalent to:
plane = x modulo 4
The pixels color is then written to theaddressed byte in the addressed plane. Thats
all there is to it!
The third item of interest in Listing 31.1 is the ReadPixel subroutine. ReadPixel is
provirtually identical to Writepixel, save that in ReadPixel the Read Map register is
grammed with a plane number, while WritePixel uses a plane mask to set the Map
Mask register. Ofcourse, that difference merely reflectsa fundamental difference in
the operationof the two registers. (If thatsGreek to you,refer back to Chapters2330 for a refresher on VGA programming.) ReadPixel isnt used in Listing 31.1, but
Ive included it because, as I said above, the read and write pixel functions together
can support a whole host of more complex graphics functions.
How does 320x400 256-color mode stack up as regards performance? As it turns out,
the programming modelof 320x400 mode is actually pretty good forpixel drawing,
pretty much on a par with the model of mode 13H. When you run Listing 31.1, youll
no doubtnotice that thelines are drawn quite rapidly. (In fact, the drawing could be
considerably faster still witha dedicated line-drawing subroutine, which would avoid
the multiplication associated with each pixel in Listing 31.1 .)
In 320x400 mode, the calculation of the memory address is not significantly slower
than in mode 13H, and the calculation and selection of the target plane is quickly
accomplished. As with mode 13H, 320x400 mode benefits tremendously from the
byte-per-pixel organization of 256-color mode, which eliminates the need for the
Higher 256-Color Resolution on the VGA
599
L3 1-2.ASM
: Program t od e m o n s t r a t et h et w op a g e sa v a i l a b l ei n3 2 0 x 4 0 0
: 2 5 6 - c o l o r modeson
a VGA. D r a w sd i a g o n a lc o l o rb a r si na l l
: 2 5 6c o l o r s i n page 0. t h e nd o e st h e
same i n page 1 ( b u t w i t h
600
Chapter 31
; t h eb a r st i l t e dt h eo t h e rw a y ) .
: c o l o rb a r si n
and f i n a l l y d r a w sv e r t i c a l
page 0.
VGA-SEGMENT
SC-INDEX
GC-INDEX
CRTC-INDEX
MAP-MASK
MEMORY-MODE
MAX-SCAN-LINE
STARTLADDRESS-HIGH
UNDERLINE
MODELCONTROL
GRAPHICS-MODE
MISCELLANEOUS
SCREEN-WIDTH
SCREEN-HEIGHT
WORD-OUTS-OK
stack
OaOODlh
3c4h
3ceh
3d4h
2
4
9
Och
14h
17h
5
6
320
400
1
segment
db
ends
stack
; Macro t o o u t p u t
; S e q u e n c eC o n t r o l l e rI n d e xr e g i s t e r
: G r a p h i c sC o n t r o l l e rI n d e xr e g i s t e r
:CRT C o n t r o l l e r I n d e x r e g i s t e r
;Map Mask r e g i s t e r i n d e x i n SC
;MemoryMode
r e g i s t e ri n d e xi n
SC
;Maximum Scan L i n e r e g i n d e x i n
CRTC
: S t a r tA d d r e s sH i g hr e gi n d e xi n
CRTC
; U n d e r l i n eL o c a t i o nr e gi n d e xi n
CRTC
;Mode C o n t r o l r e g i s t e r i n d e x i n
CRTC
; G r a p h i c s Mode r e g i s t e r i n d e x i n
GC
; M i s c e l l a n e o u sr e g i s t e ri n d e xi n
GC
;# o f p i x e l sa c r o s ss c r e e n
;# o f s c a nl i n e s onscreen
; s e t t o 0 t o assemble f o r
; c o m p u t e r st h a tc a n ' th a n d l e
: w o r do u t st oi n d e x e d
VGA r e g i s t e r s
p a r as t a c k
'STACK'
(?)
512dup
awordvalue
t o a port.
OUT-WORD
macro
i f WORD-OUTS-OK
dx.ax
out
else
doxu. at l
di xn c
xchg
ah.al
doxu. at l
dx
dec
xchg
ah.al
endi f
endm
; Macro t o o u t p u t
ac o n s t a n tv a l u et o
CONSTANT-TO-INDEXED-REGISTERmacro
mov
dx.AO0RESS
mov
ax.(VALUE s h l 8)
OUT-WORD
endm
anindexed
VGA r e g i s t e r .
ADDRESS, I N D E X , VALUE
INDEX
Code segment
cs:Code
assume
S t a r tp r o cn e a r
; S e t3 2 0 x 4 0 02 5 6 - c o l o r
call
mode.
Set320By400Mode
; We're i n 3 2 0 x 4 0 02 5 6 - c o l o r
mode, w i t h page 0 d i s p l a y e d .
; L e t ' s fill page 0 w i t h c o l o r b a r s s l a n t i n g
down and t o t h e r i g h t .
sub
di.di
;page 0 s t a r t s a t a d d r e s s
601
mov
bl.1
call
ColorBarsUp
;make c o l o r b a r s s l a n t
; t ot h er i g h t
: d r a wt h ec o l o rb a r s
down and
way.
mov
mov
d i ,8000h
bl,-l
call
ColorBarsUp
; Waitforakeyand
;page 1 s t a r t sa ta d d r e s s8 0 0 0 h
;make c o l o r b a r s s l a n t
down and
; totheleft
;draw t h e c o l o r b a r s
c aGl el t N e x t K e y
CONSTANT-TO-INDEXED-REGISTER
CRTC-INDEX.START-ADDRESS-HIGH,EDh
; s e tt h eS t a r tA d d r e s sH i g hr e g i s t e r
; t o 80h. f o r a s t a r ta d d r e s so f8 0 0 0 h
; Draw v e r t i c a l b a r s i n
page 0 w h i l e page 1 i s d i s p l a y e d .
ds iu, db i
sub
b, lb l
bcacaroslol lort rB
h; e
dC
a roasl w
Up
;page 0 s t a r t s a t a d d r e s s
;make c o l o r b a r s v e r t i c a l
; W a i tf o ra n o t h e rk e ya n df l i pb a c kt op a g e
c aGl el t N e x t K e y
CONSTANT-TO-INDEXED-REGISTER
0 whenone
i s pressed.
CRTC-INDEX.START-ADDRESSHIGH,OOh
; s e tt h eS t a r tA d d r e s sH i g hr e g i s t e r
; t o OOh. f o r a s t a r t a d d r e s s o f
; W a i tf o ry e ta n o t h e rk e ya n dr e t u r nt ot e x t
; one i s pressed.
c aGl el t N e x t K e y
mov
ax.0003h
int
10h
mov
ah.4ch
int
21h
mode andend
OOOOh
when
; t e x t mode
;done
S t a r t endp
: Setsup320x400256-color
modes
: I n p u tn: o n e
; Output:none
Set320By400Mode
p r no ec a r
; F i r s t , go t o normal320x200256-color
mode, which i s r e a l l y a
; 320x400256-color
mode w i t he a c hl i n es c a n n e dt w i c e .
mov
ax.0013h
int
10h
; Change C P U a d d r e s s i n go fv i d e o
: c h a i n ,o rc h a i n4 ) .t oa l l o w
602
Chapter 31
;AH
0 means mode s e t . AL
: 2 5 6 - c o l o rg r a p h i c s mode
;BIOS v i d e o i n t e r r u p t
memory t o l i n e a r ( n o t o d d / e v e n .
us t o access a l l 256K o f d i s p l a y
- 1 3 hs e l e c t s
: i n modes 1 0 ha n d1 2 h ,e x c e p tt h a te a c hb y t eo fd i s p l a y
: c o n t r o lo n e2 5 6 - c o l o rp i x e l ,w i t h
4 a d j a c e n tp i x e l s
: a d d r e s s .o n ep i x e lp e rp l a n e .
mov
dx.SC-INDEX
mov
a1 .MEMORY-MODE
doxu. at l
di xn c
in
a1 ,dx
and
a1 ,not
08h
or
a1 .04h
doxu. at l
dx.GC-INDEX
mov
mov
a1 .GRAPHICS-MODE
doxu, at l
di xn c
a l . dixn
and
a1 . n o1t0 h
doxu. at l
ddxe c
mov
a1 ,MISCELLANEOUS
doxu. at l
di xn c
a l . di xn
c h a i n o f f : t u r n 0 2 ha l . n o at n d
doxu. at l
d i s p l a yo f
memory
memory will
a t a n yg i v e n
; t u r no f fc h a i n
4
: t u r no f fo d d l e v e n
; t u r no f fo d d l e v e n
: Now c l e a rt h ew h o l es c r e e n ,s i n c et h e
: c l e a r e d 64K o u t o f t h e
; we s w i t c h t h e
: on t h es c r e e n
CONSTANT-TO-INDEXED-REGISTERSC-INDEX.MAP_MASK,Ofh
: e n a b l ew r i t e st oa l lp l a n e s ,
so
; we c a nc l e a r
4 p i x e l sa t a time
ax.VGA_SEGMENT
mov
mov
es,ax
sub
d. id i
mov
ax.di
mov
cx.8000h
:# o f words i n 64K
c ld
a l l ; c l es at or s wr e p
memory
: Tweak t h e mode t o 3 2 0 x 4 0 02 5 6 - c o l o r
mode b yn o ts c a n n i n ge a c h
; l i n et w i c e .
dx.CRTC-INDEX
mov
mov
a1 .MAX_SCAN-LINE
doxu. at l
di xn c
a l . di nx
a1,not
and
: sl feht
dox u. at l
dx
dec
maximum scan 1 i n e
-0
; Change CRTC s c a n n i n gf r o md o u b l e w o r d
mode t o b y t e mode, a l l o w i n g
; t h e CRTC t o scanmorethan
64K o fv i d e od a t a .
mov
dox u. at l
a1,UNDERLINE
603
dixn c
in
and
doxu. at l
dxdec
mov
doxu. at l
di xn c
in
b y t eh e oa:nlt.u4ro0nrh
a1,dx
a1 . n o4t 0 h
: t u r no f fd o u b l e w o r d
a1 ,MODELCONTROL
a1,dx
mode b i t , so memory i s
: scanned f o r v i d e o d a t a i n
a purely
: l i n e a r way, j u s t as i n modes 10hand12h
doxu. at l
ret
Set320By400Mode
endp
: Draws a f u l ls c r e e n
: Input:
:
DI
:
BL
o f s l a n t i n gc o l o rb a r si nt h es p e c i f i e dp a g e .
- page s t aar td d r e s s
1 t o make t h eb a r ss l a n t
down and t ot h er i g h t ,
-1 t o
make t h e ms l a n t down and t o t h e l e f t ,
0 t o make
them v e r t i c a l .
ColorBarsUpprocnear
ax.VGA-SEGMENT
mov
mov
es.ax
bh.bh
sub
mov
s i .SCREEN-HEIGHT
mov
dx.SC-INDEX
mov
a1 .MAP-MASK
;dtpxho.eiaunlt t
: p o i n dt x i n c
RowLoop:
mov
cx,SCREEN_WIDTH/4
push
bx
ColumnLoop:
MAP-SELECT
1
rept 4
mov
: sdpexl al,oeanucletts
mov
bihn c
MAP-SELECT
endm
nacedoxdntrttehhaseei ns i ntgo: p o i n td i
inc
memory
0
t hSCe t oI rnedge x
t h DX
e to
SC
r e gDi sat tear
Map Mask r e g
: 4p i x e l sa te a c ha d d r e s s ,
so
: e a c h3 2 0 - p i x e lr o w
i s 80 b y t e sw i d e
: i n e a c hp l a n e
: s a v et h er o w - s t a r tc o l o r
:do a adwdlt ilhrt eih4ss sap ti x e l s
: i n - l i n e code
a1 .MAP-SELECT
es:[dil.bh
MAP-SELECT s h l 1
604
:pointtodisplay
:startwithcolor
:#o f rows t o do
Chapter 31
0. and
1. 2.
t 3u ri nn
: w r i t et h i sp l a n e ' sp i x e l
:setthecolorforthenextpixel
: 4 pixels
: d oa n yr e m a i n i n gp i x e l so nt h i sl i n e
: g e tb a c kt h er o w - s t a r tc o l o r
: s e l e c tn e x tr o w - s t a r tc o l o r( c o n t r o l s
: s l a n t i n go fc o l o rb a r s )
down l i n e ssc ron
e e nt h e
Previous
: W a i t sf o rt h en e x tk e ya n dr e t u r n s
GetNextKey
proc
near
WaitKey:
rnov
ah.1
16h
int
WaitKey
jz
ah.ah
sub
int
16h
ret
GetNextKey
endp
Code
Home
Next
it i n AX.
: w a i t f o r a key t o become a v a i l a b l e
the
:read
key
ends
end
Start
When you run Listing 31.2,note theextremely smooth edges and fine gradationsof
color, especially in the screens with slanting color bars. The displays produced by
Listing 31.2 make it clearthat 320x400 256-color mode can produce effects that are
simply not possible in any 16-color mode.
605
Previous
chapter 32
be it resolved: 360x480
Home
Next
609
610
Chapter 32
VGAs-there are just too many variables involved for me to be certain. Feedback
from readerswith broad 360x480 256-color experience is welcome.
The above notwithstanding, 360x480 256-color mode offers 64 times as many colors
and nearly three times as many pixels as IBMs original CGA color graphics mode,
making startlingly realistic effects possible. No mode of the VGA (at least no mode
that 1 know of!), documented or undocumented,
offers a better combinationof resolution and color;even 320x400 256-color mode has 26 percent fewer pixels.
In other words, 360x480 256-color mode is worth considering-so lets have a look.
1.ASM
: B o r l a n d C/C++ t i n y / s m a l l / m e d i u mm o d e l - c a l l a b l ea s s e m b l e r
: s u b r o u t i n e st o :
:
* S e3 t6 0 x 4 8205 6 - c o l o r
VGA mode
:
* Draw a d oi t3n 6 0 x 4 8 20 5 6 - c o l o r
V G A mode
:
* Read t h ec o l o r o f a d o ti n 3 6 0 x 4 8 20 5 6 - c o l o r
VGA mode
: A s s e m b l e dw i t h
TASM
: T h e3 6 0 x 4 8 02 5 6 - c o l o r
mode s e t c o d ea n dp a r a m e t e r sw e r ep r o v i d e d
t h e mi n t ot h ep u b l i cd o m a i n .
: b yJ o h nB r i d g e s ,
who h a sp l a c e d
VGALSEGMENT
SC-INDEX
GC- I N D E X
MAPYMASK
READ-MAP
SCREENKWIDTH
WORD-OUTSLOK
equ
equ
3ceh
equ
equ
equ
OaOOOh
3c4h
2
4
equ
equ
360
1
: d i s p l a y memorysegment
: S e q u e n c eC o n t r o l l e rI n d e xr e g i s t e r
: G r a p h i c sC o n t r o l l e rI n d e xr e g i s t e r
SC
;Map Mask r e g i s t e r i n d e x i n
:Read Map r e g i s t e r i n d e x i n
GC
:# o f p i x e l sa c r o s ss c r e e n
: s e t t o 0 t o assemble f o r
; c o m p u t e r st h a tc a n th a n d l e
: w o r do u t st oi n d e x e d
VGA r e g i s t e r s
Be It Resolved: 360x480
61 1
-DATA
s e g m e npt u b l i bc y t e
'DATA'
; 3 6 0 x 4 8 02 5 6 - c o l o r
mode CRT C o n t r o l l e r r e g i s t e r s e t t i n g s .
( C o u r t e s y o f J o h nB r i d g e s . )
vptbl
dw
dw
dw
dw
dw
dw
dw
dw
dw
dw
dw
dw
dw
dw
dw
dw
dw
06b00h
05901h
05a02h
08e03h
05e04h
08a05h
DOd06h
03e07h
04009h
OealOh
Oacllh
Odfl2h
02d13h
00014h
Oe715h
00616h
Oe317h
word
vpend
abel
1
ends
horztotal
horz di spl ayed
s t a r th o r zb l a n k i n g
e n dh o r zb l a n k i n g
s t a r t h sync
endhsync
v e r t i c a lt o t a l
overflow
c e l lh e i g h t
vsync
start
vs y n ce n da n dp r o t e c tc r 0 - c r 7
v e r t i c a dl i s p l a y e d
offset
t u r n o f f dword mode
v b l a n ks t a r t
vblankend
t u r n on b y t e mode
-DATA
; Macro t o o u t p u t
awordvalue
t o a port.
OUT-WORD
macro
i f WORD-OUTS-OK
doxu. at x
else
d ox u. at l
di xn c
xchg
ah.al
doxu. at l
dxdec
xchg
ah.al
endi f
endm
-TEXT
'CODE'
ds:-DATA
; Setsup360x480256-color
; ( C o u r t e s yo fJ o h nB r i d g e s . )
mode.
; Callas:voidSet360By480ModeO
; R e t u r n sn
: othing
p u b l i c -Set360x480Mode
-Set360x480Mode
procnear
push
push
mov
int
si
di
ax.12h
10h
mov
int
ax.13h
10h
612 Chapter 32
; p r e s e r v e C r e g i s t e rv a r s
; startwith
mode 12h
: l e t t h e B I O S c l e a rt h ev i d e o
; s t a r tw i t hs t a n d a r d
: l e t t h e B I O S s e tt h e
mode 13h
mode
memory
mov
mov
dox u, at x
d x3,c 4 h
ax.0604h
; a l t e rs e q u e n c e rr e g i s t e r s
: d i s a b l ec h a i n
: s y n c h r o n o u sr e s e t
mov
ax.0100h
d xo ,uat x
mov
dx.3c2h
mov
.Oe7h
a1
doxu, at l
mov
dx.3c4h
mov
ax.0300h
dox u. at x
dx.al
dx
dx.al
dx
mov
dx.3d4h
mov
out
inc
in
and
out
dec
cld
mov
mov
a1 . l l h
a1 . d x
al.7fh
: asserted
; m i s co u t p u t
; use 28 mHz d o tc l o c k
: select it
: s e q u e n c e ra g a i n
: r e s t a r ts e q u e n c e r
; r u n n i n ga g a i n
r e g i s ct er t:rcs a l t e r
: crll
v a l u ;e c u r r e n t
data
to : point
v a l u ec r l l : g e t
crO
; remove
p r o t e c:t w r i t e
index
to : point
. so if f svept t b l
c x . ( ( o f f s ev pt e n d ) - ( o f f s ev pt t b l s) )h r
lodsw
@b:
dx.ax out
loop
@b
: r e s pop
tore di
pop
si
ret
-Set360x480Mode
endp
->
cr7
v aCr sr e g i s t e r
; Draws a p i x e l i n t h e s p e c i f i e d c o l o r a t t h e s p e c i f i e d
: l o c a t i o ni n3 6 0 x 4 8 02 5 6 - c o l o r
; Cala
l s :v o i dD r a w 3 6 0 x 4 8 0 D o t ( i n t
mode.
X , i n t Y . i n tC o l o r )
: R e t u r n s n: o t h i n g
DParms
dw
dw
DrawX dw
DrawY dw
C o l o r dw
struc
?
?
?
?
?
DParms
ends
p u b l i c- D r a w 3 6 0 x 4 8 0 D o t
-Draw360x480Dot
proc
near
push
bp
mov
bp,sp
p u sshi
p u sdhi
mov
ax.VGA-SEGMENT
mov
es.ax
mov
ax,SCREEN_WIOTH/4
;pushed BP
; r e t u r na d d r e s s
; X c o o r d i n a t ea tw h i c ht od r a w
: Y c o o r d i n a t ea tw h i c ht od r a w
;colorinwhichtodraw(inthe
: r a n g e 0-255; u p p e rb y t ei g n o r e d )
BP
: p r e s e r v ec a l l e r ' s
; p o i n tt os t a c kf r a m e
: p r e s e r v e C r e g i s t e rv a r s
:pointtodisplay
memory
: t h e r ea r e 4 p i x e l sa te a c ha d d r e s s ,
; e a c h3 6 0 - p i x e lr o wi s9 0b y t e sw i d e
; i ne a c hp l a n e
so
Be It Resolved: 360x480 61 3
mu1
mov
sd hi .rl
dsih. 1r
add
mov
and
mov
pcl oa rntrheebesi tp tohoned: si naeght . csl h l
Cbp+DrawYl
d i , Cbp+DrawX]
d, ai x
c l. b y t ep t r
c l .3
ah.1
: p o i n tt os t a r to fd e s i r e dr o w
: g e tt h e X c o o r d i n a t e
; t h e r ea r e 4 p i x e l sa te a c ha d d r e s s
: so d i v i d e t h e X c o o r d i n a t eb y 4
: p o i n tt ot h ep i x e l ' sa d d r e s s
X c o o r d i n a t ea g a i n
; g e tt h e
# ofthepixel
: g e tt h ep l a n e
Cbp+OrawX]
: t h ep i x e li si n
mov
a1 .MAP-MASK
mov
dx,SC_INDEX
OUT-WORD
f pop rl oa tpnheer t ow r i t et o ; s e t
: t h ep i x e l
al.bC
yp ttberp + C o l :ogtcrhe]oetl o r
:draw
mov
stosb
; r e s pop
tore di
pop
si
POP
bp
ret
-Draw360x480Dotendp
v aC r sr e g i s t e r
: r e s t o r ec a l l e r ' s
BP
: Reads t h e c o l o r o f t h e p i x e l a t t h e s p e c i f i e d
: l o c a t i o ni n3 6 0 x 4 8 02 5 6 - c o l o r
mode.
; C a l la s :i n tR e a d 3 6 0 ~ 4 8 0 D o t ( i n t
X. i n t Y )
: R e t u r n s p: i x e cl o l o r
RParms
dw
dw
ReadX dw
Ready dw
RParms
struc
?
?
?
?
ends
:pushed BP
: r e t u r na d d r e s s
:X c o o r d i n a t e f r o m w h i c h t o r e a d
:Y c o o r d i n a t ef r o mw h i c ht or e a d
p u b l i c- R e a d 3 6 0 x 4 8 0 D o t
-Read360x480Dot
procnear
push
bp
mov
p u sshi
p u sdhi
mov
mov
mov
mu1
mov
ss ih. rl
s hi . rl
sai .dadx
mov
ah.3
and
ax.VGA-SEGMENT
es.ax
ax,SCREEN-WIOTH/4
[bp+DrawY]
si,[bp+DrawX]
a h . b y t pe t r
mov
a1 ,READ-MAP
mov
dx.GC-INDEX
OUT-WORD
614
Chapter 32
BP
: p r e s e r v ec a l l e r ' s
: p o i n tt os t a c kf r a m e
: p r e s e r v e C r e g i s t e rv a r s
bp.sp
Cbp+DrawX]
: p o i n t t o d i s p l a y memory
: t h e r ea r e 4 p i x e l sa te a c ha d d r e s s ,
so
: e a c h3 6 0 - p i x e lr o wi s
90 b y t e sw i d e
: i n e a c hD l a n e
; p o i n tt os t a r to fd e s i r e dr o w
: g e tt h e X c o o r d i n a t e
: t h e r e a r e 4 p i x e l sa te a c ha d d r e s s
: s o d i v i d e t h e X c o o r d i n a t eb y 4
: p o i n tt ot h ep i x e l ' sa d d r e s s
X c o o r d i n a t ea g a i n
: g e tt h e
: g e tt h ep l a n e
# ofthepixel
: s e tt or e a df r o mt h ep r o p e rp l a n ef o r
: t h ep i x e l
l o d sb y t ep t re s : [ s i ]
ah.ah
sub
dpio p
spi o p
POP
bp
ret
-Read360x480Dot
endp
-TEXT ends
end
; r e a dt h ep i x e l
;make t h e r e t u r n v a l u e
a w o r df o r
; r e s t o r e C r e g i s t e rv a r s
;restorecaller's
C o m p i l e dw i t hB o r l a n d
b1c1c0 - 21.1c0 - l . a s m
*
*
BP
V G A l i n ed r a w i n gi n3 6 0 x 4 8 0
C/C++.
M u s tb el i n k e dw i t hL i s t i n g3 2 . 1w i t h
a command l i n e l i k e :
By M i c h a e lA b r a s h
*/
/* c o n t a i n sg e n i n t e r r u p t
#i
n c l u d e < d o s h>
{{define
#define
#define
#define
TEXT-MODE
BIOS-VIDEO-INT
X-MAX
Y-MAX
0x03
Ox10
360
480
/ * w o r k i sn cg r e w
e ni d t h
/ * w o r k i sncgr e he en i g h t
*/
*I
*/
e x t e r nv o i dD r a w 3 6 0 x 4 8 0 D o t O ;
e x t e r nv o i dS e t 3 6 0 x 4 8 0 M o d e O ;
/*
*
Draws a l i n e i n o c t a n t
0 o r 3 ( IDeltaXl
I D e l t a X I + lp o i n t sa r ed r a w n .
>-
DeltaY
).
*I
v o i dO c t a n t O ( X 0 .
YO.
u n s i g n e di n t X O . Y O ;
u n s i g n e di n tD e l t a X .
i n tX D i r e c t i o n ;
intColor;
D e l t a XD
. e l t a YX
. O i r e c t i o nC
. olor)
I* c o o r d i n a t e s o f s t a r t o f t h e l i n e
*I
D e l t a Y ; / * l e n g t ho ft h el i n e
*/
/* 1 i f l i n e i s drawn l e f t t o r i g h t ,
-1 i f d r a w n r i g h t t o l e f t
*/
/* c o l o ri nw h i c ht od r a wl i n e
*I
i n tD e l t a Y x 2 ;
i n tD e l t a Y x Z M i n u s D e l t a X x Z :
i n tE r r o r T e r m ;
/*
Setup
i n i t i a le r r o rt e r ma n dv a l u e s
used i n s i d ed r a w i n gl o o p
OeltaYx2 = DeltaY * 2;
D e l t a Y x 2 - ( i n t ) ( D e l t a X * 2 1;
OeltaYxZMinusDeltaXxZ
E r r o r T e r m = D e l t a Y x 2 - ( i n t )D e l t a X ;
*/
/ * Draw t h e l i n e * /
Draw360x480Dot(XO. Y O . C o l o r ) ;
/ * d r a wt h ef i r s pt i x e l
while ( DeltaX-- 1 {
/ * See i f i t ' s t i m e t o a d v a n c et h e Y c o o r d i n a t e * /
i f ( E r r o r T e r m >= 0 ) {
/ * A d v a n c et h e Y c o o r d i n a t e & a d j u s t t h e e r r o r t e r m
b a c k down * /
*/
YO++;
ErrorTerm
1 else
+- D e l t a Y x E M i n u s D e l t a X x Z ;
I* Add t o t h e e r r o r t e r m
ErrorTerm
+-
*I
DeltaYxE;
X0 +- X D i r e c t i o n ;
Draw360x480Dot(XO, Y O ,
I* a d v a n c et h e
X coordinate *I
I* d r a w a p i x e l * I
Color);
1
I*
*
*
Draws a l i n e i n o c t a n t
1 o r 2 ( IDeltaXl
I D e l t a Y I + Ip o i n t sa r ed r a w n .
<
DeltaY
).
*I
YO. D e l t a X D
v o i dO c t a n t l ( X 0 .
. e l t a YX
. D i r e c t i o nC
. olor)
u n s i g n e d i n t XO, Y O ;
I* c o o r d i n a t e s o f s t a r t o f t h e l i n e
u n s i g n e di n tD e l t a X .
I* l e n g t ho ft h el i n e
*I
DeltaY:
i n tX D i r e c t i o n ;
I* 1 i f l i n e i s d r a w n l e f t t o r i g h t ,
-1 i f d r a w n r i g h t t o l e f t
*I
i n t Color;
*I
I* c o l o r i n w h i c h t o d r a w l i n e
f
i n tD e l t a X x 2 ;
i n tD e l t a X x Z M i n u s D e l t a Y x Z ;
i n tE r r o r T e r m :
*I
/ * S e tu pi n i t i a le r r o rt e r ma n dv a l u e su s e di n s i d ed r a w i n gl o o p
D e lt a X x 2
D e l t a X * 2;
DeltaXxZMinusDeltaYxZ
DeltaXxZ - ( i n t ) ( DeltaY * 2 ) ;
ErrorTerm
D e l t a X x Z - ( i n t )D e l t a Y :
*I
Draw360x480Dot(XO. Y O . C o l o r ) ;
I* d r atwhf ei r sp ti x e l
while ( DeltaY-- ) {
I* See i f i t ' s t i m e t o a d v a n c et h e X c o o r d i n a t e * I
i f ( E r r o r T e r m >- 0 1 I
I* A d v a n c et h e X c o o r d i n a t e & a d j u s t t h e e r r o r t e r m
b a c k down * I
X0 +- X D i r e c t i o n :
E r r o r T e r m +- D e l t a X x 2 M i n u s D e l t a Y x Z ;
1 else I
/ * Add t o t h e e r r o r t e r m * I
E r r o r T e r m +- D e l t a X x E :
YO++;
I* a d v a n tchee
Draw360x480Dot(XO.
YO.Color);
*I
Y coordinate *I
I* d r a w a p i x e l * I
I*
Draws a l i n e o nt h e
EGA o r VGA.
*I
v o i dE V G A L i n e ( X 0 .
YO, X 1 . Y1. C o l o r )
I* c o o r d i n a t eosf
one e n odt fh lei n e
i n t XO. YO;
i n t X1, Y1;
I* c o o r d i n a t e s o f t h eo t h e er n dotfh el i n e
u n s i g n e dc h a rC o l o r ;
/* c o l o ri nw h i c ht od r a wl i n e
616
i n t D e l t a X .D e l t a Y ;
i n t Temp:
Chapter 32
*I
*I
*I
I* Save h a l ft h el i n e - d r a w i n gc a s e sb ys w a p p i n g
YO w i t h Y 1
and X0 w i t h X 1 i f Y O i s g r e a t e r t h a n
Y 1 . As a r e s u l t ,D e l t a Y
0-3 casesneed t o be
i s a l w a y s > 0. a n do n l yt h eo c t a n t
hand1ed. * I
i f ( YO > Y 1 1 I
Temp
YO;
YO
Y1;
Y1
Temp;
XO;
Temp
x0
x1;
X1
Temp:
--
f o u r o c t a n t si nw h i c h
/ * H a n d l ea sf o u rs e p a r a t ec a s e s ,f o rt h e
Y 1 i sg r e a t e rt h a n
YO */
DeltaX
X 1 - XO;
/*
c a l c u l a t et h el e n g t h
i n e a c hc o o r d i n a t e
DeltaY
Y 1 - YO;
i f ( DeltaX > 0 1 I
i f ( DeltaX > DeltaY
(
O c t a n t O ( X 0 . Y O , D e l t a XD
. eltaY,
) else {
O c t a n t l ( X 0 . Y O , D e l t a XD
. eltaY.
I else
of t h e l i n e
*I
1. C o l o r
1. C o l o r
DeltaX
-DeltaX;
/ * a b s o lvDuateoleluft ea X
i f ( DeltaX > DeltaY ) (
O c t a n t O ( X 0 . Y O , D e l t a XD
, eltaY.
-1. C o l o r ) ;
I else I
O c t a n t l ( X 0 . YO, O e l t a XD
. eltaY.
-1. C o l o r ) ;
*I
I*
*
*
*
S u b r o u t i n et od r a w
a r e c t a n g l ef u l lo fv e c t o r s .o ft h e
s p e c i f i e d l e n g t h and i nv a r y i n gc o l o r s ,a r o u n dt h e
s p e c i f i e dr e c t a n g l ec e n t e r .
*I
v o i dV e c t o r s U p ( X C e n t e r Y
. CenterX
. LengthY
. Length)
/ * c e n t e r o f r e c t a n g l et o
i n t XCenter.
YCenter;
i n t XLength.
YLength;
I* d i s t a n c ef r o mc e n t e tr oe d g e
o f r e c t a n g l e */
i n t W o r k i n g X .W o r k i n g Y ,C o l o r
fill * I
- 1;
--
/* L i n e s f r o m c e n t e r t o t o p o f r e c t a n g l e
*/
WorkingX
XCenter - XLength;
YCenter - YLength;
WorkingY
f o r ( ; WorkingX < ( XCenter + XLength 1; W o r k i n g X W )
EVGALine(XCenter.YCenter.WorkingX,WorkingY.Color++);
--
I* L i n e s f r o m c e n t e r t o r i g h t o f r e c t a n g l e
*I
WorkingX
XCenter + XLength - 1;
WorkingY
YCenter - YLength;
f o r ( ; WorkingY < ( YCenter + YLength ) ; WorkingY++ 1
EVGALine(XCenter.YCenter.WorkingX.WorkingY.Color++);
/ * L i n e sf r o mc e n t e rt ob o t t o mo fr e c t a n g l e
WorkingX
XCenter + XLength - 1:
WorkingY
YCenter + YLength - 1:
--
*/
Be It Resolved: 360x480
617
for
I* L i n e s f r o m c e n t e r t o l e f t o f r e c t a n g l e
*I
WorkingX = XCenter - XLength:
WorkingY = YCenter + YLength - 1;
f o r ( : WorkingY >= ( YCenter - YLength 1: WorkingY-EVGALine(XCenter.YCenter.WorkingX.WorkingY.Color++):
1
I*
Sampleprogram
*J
v o i dm a i n ( )
t o d r a wf o u rr e c t a n g l e sf u l lo fl i n e s .
chartemp;
Set360x480ModeO;
/ * Draw each o f f o u r r e c t a n g l e s f u l l o f v e c t o r s
*/
VectorsUp(X-MAX I 4. Y-MAX / 4 . X-MAX / 4 , Y-MAX / 4 . 1 ) :
VectorsUp(X_MAX * 3 / 4, Y-MAX I 4 , X-MAX / 4. Y-MAX I 4 , 2 ) ;
VectorsUp(X-MAX I 4 . Y-MAX * 3 / 4, X-MAX I 4 . Y-MAX / 4. 3 ) :
VectorsUp(X-MAX * 3 I 4. Y-MAX * 3 / 4 . X-MAX / 4 , Y-MAX / 4 . 4 ) ;
/ * W a i tf o rt h ee n t e rk e yt ob ep r e s s e d
s c a n f ( " % c "&
. temp):
*/
/ * Back t o t e x t mode * /
-AX
TEXTLMODE;
geninterrupt(BIOS-VIDEO_INT):
The first thing you'll noticewhen yourun this code is that the speed
of 360x480 256color mode is pretty good, especially considering that most of the program is implemented in C.
Drawing in 360x480 256-color mode can sometimes actually befaster than in the
16-color modes, because the byte-per-pixel display memory organization of 256color mode eliminates the need to read display memory before writing to it in
order to isolate individual pixels coexisting within a single byte. In addition,
360x480 256-color mode is a variant of Mode X, which we'll encounter in detail
in Chapter 47, and supports
all the high-perfrmance features of Mode X
The second thingyou'll notice is that exquisite shading effects are possible in 360x480
256-color mode; adjacent lines blend togetherremarkably smoothly, even with the
256 colors from a palette
of 2 5 6 8
default palette.The VGA allows you to select your
so you could, if you wished, set up thecolors to produce still finer shading albeit
with
fewer distinctly different colors available. For more on this and related topics, see
the coverage of palette reprogramming thatbegins in the next chapter.
The one thing
you may not notice right
away isjust how much detailis visible on the
screen, because the blending of colors tends to obscure the superior resolution of
61 8
Chapter 32
this mode. Each of the four rectanglesdisplayed measures 180 pixels horizontally by
240 vertically. Put another way, each one of those rectangles hastwo-thirds as many
pixels as the entire mode13H screen; inall, 360x480 256-color mode has 2.7 times
as many pixels as mode 13H! As mentioned above, the resolution is unevenly distributed, with vertical resolution matching thatof mode 12H but horizontal resolution
barely exceeding that of mode 13H-but resolution is hot stuff, no matter how its
laid out, and 360x480 256-color mode has the highest 256-color resolution youre
ever likely to see o n a standard VGA. (SuperVGAs are quite another matter-but
when yourequire a SuperVGA youre automatically excluding what might be a significant chunkof the market foryour code.)
Now that weve seen the wonders of which our new mode is capable, lets take the
time to understand how it works.
480 Scan Lines per Screen: A Little Slower, But No Big Deal
Theres nothing unusual about 480 scan lines; standard modes 11H and 12H support thatvertical resolution. The numberof scan lines hasnothing to do
with either
the numberof colors or thehorizontal resolution, so converting 320x400 256-color
mode to 320x480 256-color mode is a simple matter of reprogramming the VGAs
vertical control registers-which control the scan lines displayed, the vertical sync
pulse, vertical blanking, and the total number of scan lines-to the 480-scan-line
settings, and setting the polarities of the horizontal and vertical sync pulses to tell
the monitor to adjust to a 480-line screen.
Switching to 480 scan lines has the effect of slowingthe screen refresh rate.
The VGA
always displays at 70 Hz except in 480-scan-linemodes; there, dueto the time required
to scan the extra lines, the refresh rate slows to 60 Hz. (VGA monitors always scanat the
same rate horizontally; that is, the distance across the screen covered by the electron
beam in a given period of time is the same inall modes. Consequently, adding extra
lines per frame requires extra time.)
60 Hz isnt bad-thats the only refresh rate the
EGA ever supported, and the
EGA was the industry standardin its time-but it does
tend toflicker a little more andso is a little harder on theeyes than 70 Hz.
Be It Resolved: 360x480
619
620
Chapter 32
Setting up for 360x480 256color mode proved to be quite a task. Is drawing in this
mode going to be as difficult?
No. In fact, if you know howto draw in 320x400 256-color mode, you already know
how to draw in 360x480 256-colormode; theconversion between the two is a simple
matter of changing theworking screen width from 320 pixels to 360 pixels. In fact, if
you were to take the 320x400 256color pixel reading and pixel writing code from
Chapter 31 and change the SCREEN-WIDTH equate from 320 to 360, those routines would workperfectly in 360x480 256color mode.
The organization of display memory in 360x480 256-color mode is almost exactly
the same as in 320x400 256color mode,which we covered in detail in the last chapter. However, as
a quick refresher, each byte of display memory controlsone 256-color
VGA is reprogrammed by the modeset so that adjapixel, just as in mode 13H. The
cent pixels lie in adjacent planesof display memory.Look back to Figure 31.1 in the
last chapter to see the organization of the first few pixels on the screen; the bytes
controlling those pixels run cross-plane, advancing to the next address only every
fourth pixel. The address of the pixel at screen coordinate (x,y) is
address = ((y*360)+x)/4
and the plane of a given pixel is:
plane = x modulo 4
A new scan line starts every 360 pixels, or 90 bytes, as shown in Figure 32.1. This is
the major programming difference between the 360x480 and 320x400 256-color
modes; in the 320x400 mode, anew scan line startsevery 80 bytes.
The other programming differencebetween the two modes is that the area of display memory mapped to the screen is longer in 360x480 256-color mode, which is
only common sense given that there are more
pixels in that mode. The
exact amount
of memory required in 360x480 256-color mode is 360 times 480 = 172,800 bytes.
Thats more than half of the VGAs 256 Kb memory complement,so page-flipping is
out; however, theres no reason you couldnt use that extra memory to create avirtual screen larger than360x480, around which you could then scroll, if you wish.
Thats really all there is to drawing in 360x480 256color mode. From a programming perspective, this mode is no more complicated than 320x400 256-color mode
once the modeset is completed, and should be capable of good performance given
some clever coding. Its not particular straightforward to implement bitblt, block
Be It Resolved: 360x480
621
Previous
AOOOO
Home
Next
.....
0.. .O
.O
.O
.
0...0000..
A005A
...o
..
AOOB4
AOlOE
0 1 OF
01
14 22
A0 168
0 1 00
00
28 86
..000000..
000..
Screen
The
~~
.
"
~"
~"
move, or fast line-drawing code forany of the extended256-color modes, but it can
be done-and it's worth the trouble. Even the small taste we've gotten of the capabilities of these modes shows that they put the traditionalCGA, EGA, and generally
even VGA modes to shame.
There's more and better to come, though; in later chapters, we'll return to highresolution 256-color programming in a big way, by exploring the tremendous potential
of these modes for realtime 2-D and 3-D animation.
622
Chapter 32
Previous
chapter 33
Home
Next
625
Our topic in this chapter complements Michaels article nicely. Where he focused
on color perception, well focus on color generation; that is, the ways in which the
VGA can be programmed to generate those colors that lie within itscapabilities. To
find out why a VGA cant generate as pure a red as an LED, read Michaels article. If
you wantto find out how to flip between 16 different sets of 16 colors, though, dont
touch that dial!
I would be remiss if I didnt pointyou in the direction of two more articles, these in
the July 1990 issue DX
of DobbsJournal. Super VGA Programming, by Chris Howard,
provides a good deal of useful information about SuperVGA chipsets, modes, and
programming. Circles and the Digital Differential Analyzer, by Tim Paterson, is a
good article about fast circle drawing, a topic well tackle soon. All in all, the dog
days of 1990 weregood times for graphics.
The DAC
Once the EGA-compatible palette RAM has fulfilled its karma and performed 4bit
to &bit translation on a pixel, the resulting value is sent to the DAC (Digital/Analog
Converter). TheDAC performs an8-bit to 18-bitconversion in much the same manner as the palette RAM, converts the 18-bit result to analog red, green, and blue
626
Chapter
33
If bit 7 of AC Mode
reg is 0, select
palette RAM source,
if 1 , select Color
Palette RAM
Uses incoming 4-bit
pixel values to look
up one of the 16 6-bit
registers, then sends
the contents of that
register out(4-bit to
6-bit conversion)
c
Bits 4-5 out
~~~
Bits 0-3 in
~~
DAC
Uses incoming 8-bit pixel value to look up one of 256
18-bit registers, then sends the contents of that
register, organized as 6-bit red, green, and blue color
components, on to analog conversion circuitry, where
they are converted to three proportional analog signals
and sent to the monitor (8-bit to 18-bit conversion)
J.
I
Green analog signal
to monitor (one of
64 possible levels)
64 possible levels)
627
signals (6 bits for each signal), and sends the three analog signals to the monitor.
The DAC is a separate chip, external to the
VGA chip, butits an integral partof the
VGA standard andis present onevery VGA.
(Id like to take a moment to point out that
you cant speakof colorat any point in
the color translation
process until theoutput stage of the DAC. The 4bitpixel values
in memory, &bit valuesin the palette R A M , and 8-bit values sent to theDAC are all
attributes, not colors, because theyre subject to translationby a later stage. For example, apixel with a 4bit value of 0 isnt black,its attribute 0. It will be translated to
3FH if palette RAM register 0 is set to 3FH, but thats not the color white, just another attribute. The value 3FH coming into the DAC isnt white either, and if the
value stored in DAC register 63 is red=7, green=O, and blue=O, the actual color displayed for thatpixel that was 0 in display memory will be dim red. Itisnt color until
the DAC says its color.)
The DAC contains 256 18-bitstorage registers, usedto translate one of 256possible 8-bit
values into one of 256K (262,144, to be precise) 18-bit values.The 18-bit valuesare
actually composedof three Gbit values,one each for red, green,
and blue; for each color
component, thehigher the number, the brighter the
color, with 0 turning that color
off in the pixel and 63 (3FH) making that color maximum brightness.
all Got
that?
628
Chapter 33
register 1 set to 1, and so on, a configuration in which the upper two bits of all the
palette RAM registers are the same (zero) and therefore irrelevant. As a matter of
fact, youll generally want to set the palette RAM to this pass-through state when
working with VGA color, whether youreusing color pagingor not.
Why is it a good idea to set the palette
RAM to a pass-through state?Its a good idea
because the palette RAM is programmed by the BIOS to EGA-compatible settings
and thefirst 64 DAC registers are programmedto emulate the64 colors thatan EGA
can display during modesets for l6-color modes. This
is done forcompatibility with
EGA programs, and its uselessif youre going to tinker with the VGAs colors. As a
VGA programmer, you wantto take a 4bit pixel value and turnit into an18-bit RGB
value; you can do that without any help from the paletteRAM, and setting the palette RAM to pass-through values effectivelytakes it outof the circuit and simplifies
life something wonderful.The palette RAM exists solelyfor EGA compatibility, and
serves no useful purpose that I know of for VGA-only color programming.
256-Color Mode
So far Ive spoken only of 16-color modes; what of 256-color modes?
The rulein 256-color modes is: Dont tinker withthe VGApalette. Period. You can select
any colors you want by reprogramming the DAC, and theres no guarantee as to
what will happen if you mess around with the palette RAM. Theres no benefit thatI
know of to changing the palette RAM in 256-color mode, and the effect may vary
from VGA to VGA. So dont do it unless you know something I dont.
On the other hand,
feel free to alter the
DAC settings to your hearts content in 256color mode, all the more so because this is the only mode in which all 256 DAC
settings can be displayed simultaneously. By the way, the Color Select register and bit
7 of the Attribute ControllerMode register are ignoredin 256-color mode; all 8 bits
sent from the VGA chip to the DAC come from display memory.Therefore, thereis
no color paging in 256-color mode. Of course, that makes sense given that all 256
DAC registers are simultaneously in use in 256-color mode.
629
630
Chapter 33
DAC register consists of three bytes; the first byte is a 6-bit red component, thesecond byte is a 6-bit green component, and the third
byte is a 6-bit blue component, as
illustrated by Listing 33.1.
63 1
three-byte RGB value separately for each DAC register; although that doesnt take
advantage of the autoincrementing feature,it seems to me to be least susceptible to
outside influences. (It would be evenbetter to disableinterrupts for the entire duration
of DAC register loading, but thats much too long a time to leave interrupts off.) However, I have no hard evidence to offerin support of my conservative approach to setting
the DAC,just an uneasy feeling,so Id be mostinterested in hearing from any readers.
A final point is that the process of loading both the palette RAM and DAC registers
involves performing multiple OUTS to the same register. Many people whose opinions I respect recommend delaying between 1 / 0 accesses to the same port by
performing aJMP $+2 (jumping flushes the prefetch queue and forces a memory
access-or at least a cache access-to fetch the next instruction byte).
In fact, somepeople
recommend twoJMP $+2 instructions between 1 / 0 accesses to the same port, and
three jumps between 1 / 0 accesses to the same port that go in opposite directions
(OUT followed by IN or IN followed by OUT). This is clearly necessary when accessing
some motherboard chips, but I dont know howapplicable it is when accessing VGAs,
so make of it what you will. Input fromknowledgeable readers is eagerly solicited.
In the meantime,if you can use the BIOS to set the DAC, do so; then you wont have
to worry about thereal and potential complications of setting the DAC directly.
LISTING 33.1
133-1.ASM
: P r o g r a m t od e m o n s t r a t eu s eo ft h e
DAC r e g i s t e r s b y s e l e c t i n g
: s m o o t h l yc o n t i g u o u ss e to f2 5 6c o l o r s ,t h e nf i l l i n gt h es c r e e n
so t h a tt h e yb l e n d
a c o n t i n u u mo fc o l o r .
; w i t hc o n c e n t r i cd i a m o n d si na l l2 5 6c o l o r s
: i n t oo n ea n o t h e rt of o r m
small.model
2 0 0 h. s t a c k
.data
: T a b l eu s e dt os e ta l l
256 DAC e n t r i e s
: T a b l ef o r m a t :
:
B y t e 0: DAC r e g i s t e r 0 r evda l u e
:
B y t e 1: DAC r e g i s t e r 0 g r e evna l u e
632
Chapter 33
:
:
:
:
:
DAC
2B:y t e
DAC
3B:y t e
B y t e 4: DAC
DAC
5B: y t e
0 blue value
1 r e dv a l u e
1 g r e e nv a l u e
1 b l u ev a l u e
DAC r e g i s t e r 255 r e dv a l u e
DAC r e g i s t e r 2 5 5g r e e nv a l u e
DAC r e g i s t e r 255 b l u ev a l u e
7B6y5t :e
7B6y6t :e
7B6y7t :e
Col orTable
register
register
register
register
1a b e l b y t e
: The f i r s t 6 4e n t r i e sa r ei n c r e a s i n g l yd i mp u r eg r e e n .
x-0
REPT 64
db
0.63-X.0
x-x+l
ENDM
: T h en e x t6 4e n t r i e sa r ei n c r e a s i n g l ys t r o n gp u r eb l u e .
x-0
REPT 64
db
0,O.X
X-X+l
ENDM
; The n e x t 6 4 e n t r i e s f a d e t h r o u g h v i o l e t t o r e d .
x-0
REPT 6 4
db
X.O.63-X
x-x+l
ENDM
: The l a s t 6 4 e n t r i e s a r e i n c r e a s i n g l y d i m p u r e r e d .
x-0
REPT 64
63-X,O.O
db
x-x+l
ENDM
.code
Start:
mov
ax.DO13h
int
10h
mov
rnov
mov
mov
ax,@data
es.ax
d x . o f f sC
e to l o r T a b l e
ax,
1012h
bx.bx
sub
mov
int
cx,
lOOh
10h
:AH-0
s e l e c t ss e t
mode f u n c t i o n ,
: AL-13h s e l e c t s3 2 0 x 2 0 02 5 6 - c o l o r
: mode
: l o a dt h e DAC r e g i s t e r s w i t h t h e
: c o l o rs e t t i n g s
: p o i n t ES t o t h e d e f a u l t
: datasegment
: p o i n t ES:DX t o t h e s t a r t o f t h e
; b l o c k o f RGB t h r e e - b y t ev a l u e s
: t ol o a di n t ot h e
DAC r e g i s t e r s
:AH-lOh s e l e c t s s e t c o l o r f u n c t i o n ,
: AL-12h s e l e c t s s e t b l o c k o f
DAC
: r e g i s t e r ss u b f u n c t i o n
: l o a dt h eb l o c ko fr e g i s t e r s
: s t a r t i n g a t DAC r e g i s t e r l o
: s e t a l l 2 5 6r e g i s t e r s
: l o a dt h e DAC r e g i s t e r s
633
:now fill t h es c r e e nw i t h
: c o n c e n t r i cd i a m o n d si na l l
256
: c o l o ra t t r i b u t e s
mov
mov
a x , OaOOOh
ds,ax
: p o i n t DS t o t h e d i s p l a y
: segment
memory
: d r a wd i a g o n a ll i n e si nt h eu p p e r -
: l e f tq u a r t e ro ft h es c r e e n
mov
mov
mov
a1 .2
ah.-1
bx.320
mov
dx.160
mov
s, 1i 0 0
s udb, di i
mov
bp.1
c aFl il l l B l o c k
mov
mov
mov
al,2
ah:l
bx.320
mov
mov
mov
mov
dx.160
s, 1i 0 0
d, 3i 1 9
b p . -1
c aFl il l l B l o c k
#2
:startwithcolorattribute
: c y c l e down t h r o u g ht h ec o l o r s
: d r a wt o pt ob o t t o m( d i s t a n c ef r o m
: one l i n e t o t h e n e x t )
: w i d t ho fr e c t a n g l e
: h e i g h to fr e c t a n g l e
: s t a r ta t (0.0)
:draw l e f t t o r i g h t ( d i s t a n c e f r o m
: onecolumn t o t h e n e x t )
:draw i t
: d r a wd i a g o n a ll i n e si nt h eu p p e r : r i g h tq u a r t e ro ft h es c r e e n
112
:startwithcolorattribute
: c y c l e down t h r o u g ht h ec o l o r s
: d r a wt o pt ob o t t o m( d i s t a n c ef r o m
: one l i n e t o t h e n e x t )
: w i d t ho fr e c t a n g l e
: h e i g h to fr e c t a n g l e
: s t a r ta t( 3 1 9 . 0 )
: d r a wr i g h tt ol e f t( d i s t a n c ef r o m
: onecolumn t o t h e n e x t )
:draw i t
: d r a wd i a g o n a ll i n e si nt h el o w e r -
: l e f tq u a r t e ro ft h es c r e e n
mov
mov
mov
al.2
ah:l
b x- 3, 2 0
mov
mov
mov
mov
dx.160
s i ,100
d i .199*320
bp.1
c a l lF i l l 8 1o c k
mov
mov
mov
al.2
ah:l
b x- 3. 2 0
mov
mov
mov
mov
dx,160
s, 1i 0 0
d i .199*320+319
bp.-1
call Fi
mov
int
634
Chapter 33
11 B1 o c k
ah.1
21h
:startwithcolorattribute
#2
: c y c l e down t h r o u g h t h e c o l o r s
: d r a wb o t t o mt ot o p( d i s t a n c ef r o m
: one l i n e t o t h e n e x t )
: w i d t ho fr e c t a n g l e
: h e i g h to fr e c t a n g l e
: s t a r ta t( 0 , 1 9 9 )
:draw l e f t t o r i g h t ( d i s t a n c e f r o m
: onecolumn
t ot h en e x t )
:draw i t
: d r a wd i a g o n a ll i n e si nt h el o w e r : r i g h tq u a r t e ro ft h es c r e e n
:startwithcolorattribute
#2
: c y c l e down t h r o u g h t h e c o l o r s
: d r a wb o t t o mt ot o p( d i s t a n c ef r o m
: one l i n e t o t h e n e x t )
: w i d t ho fr e c t a n g l e
: h e i g h to fr e c t a n g l e
: s t a r ta t( 3 1 9 . 1 9 9 )
: d r a wr i g h tt ol e f t( d i s t a n c ef r o m
: onecolumn t o t h e n e x t )
:draw i t
: w a i tf o r
a key
mov
int
ax.0003h
10h
; r e t u r nt ot e x t
mov
int
ah.4ch
21h
: d o n e - - r e t u r nt o
: F i l l st h es p e c i f i e dr e c t a n g u l a ra r e a
mode
DOS
o f t h es c r e e nw i t hd i a g o n a ll i n e s
: Input:
;
AL
AH
:
:
;
;
;
i n i t i a al t t r i b u t ew i t hw h i c ht od r a w
amount by w h i c ht oa d v a n c et h ea t t r i b u t ef r o m
o n ep i x e lt ot h en e x t
BX = d i s t a n c et oa d v a n c ef r o mo n ep i x etl ot h en e x t
DX = w i d t ohrfe c t a n g l teo
fill
S I = h e i g hotrfe c t a n g l teo
fill
DS:ON = s c r e e na d d r e s osffi r spt i x etlod r a w
BP = o f f s e tf r o mt h es t a r t
o f one
column
t ot h es t a r to f
t h en e x t
FillBlock:
F i 11 HorzLoop:
p u sdhi
push
ax
mov
cx.si
Fill VertLoop:
mov
[dil.al
add
d. ib x
add
a1 .ah
1 oop F i 11 VertLoop
POP
ax
aal .dadh
dpio p
add
d. bi p
dxdec
F ji nl lzH o r z L o o p
ret
; p r e s e r v ep o i n t e rt ot o po fc o l u m n
; p r e s e r v ei n i t i a la t t r i b u t e
; c o l u m nh e i g h t
: s e tt h ep i x e l
; p o i n tt ot h en e x tr o wi nt h ec o l u m n
; a d v a n c et h ea t t r i b u t e
; r e s t o r ei n i t i a la t t r i b u t e
;advance t o t h e n e x t a t t r i b u t e t o
: s t a r tt h en e x tc o l u m n
o f column
: r e t r i e v ep o i n t e rt ot o p
: p o i n tt on e x tc o l u m n
;have we done a l lc o l u m n s ?
; n o . d ot h en e x tc o l u m n
end Start
Note the jagged lines at the corners of the screen when you run Listing 33.1. This
shows how coarse the 320x200 resolution of mode 13H actually is. Now look athow
smoothly the colors blend together in the rest of the screen. This is an excellent
example of how careful color selection can boost perceived resolution, as for example when drawing antialiased lines, as discussed in Chapter 42.
Finally, note that the border of the screen turns green when Listing 33.1 is run.
Listing 33.1 reprograms DAC register 0 to green, and the border attribute (in the
Overscan register) happens to be0, so the bordercomes out greeneven though we
havent touched the Overscan register. Normally, attribute 0 is black, causing the
border to vanish, but the border is an %bit attribute that has to pass through the
DACjust like any other pixel value, and itsjust as subject to DAC color translationas
the pixels controlled by display memory. However, the border color is not affected
by the palette RAM or by the Color Select register.
Yogi Bear and Eurythmics Confront VGA Colors
635
Previous
Home
Next
In this chapter, we traced the surprisingly complex path by which the VGA turns a
pixel value into RGB analog signals headed for the monitor.In the next chapter and
Chapter A on the companionCD-ROM,well look at some more code that
plays with
VGA color. Well explore in more detail the process of reading and writing the palette RAM and DAC registers, and well observe color paging and cycling in action.
636
Chapter 33
Previous
chapter 34
changing colors without writing pixels
Home
Next
WHOOPS!
Our printer failed to strip in the art for
Figure 34.1. The figure,however,isnot
Color Cycling
-
As weve learned in past chapters, theVGAs DAG contains 256 storage locations, each
holding one 18-bit value representing an RGB color triplet organized as 6 bits per
primary color. Eachand every pixelgenerated by the VGA is fed into theDAC as an
8-bit value(refer to Chapter 33 and to Chapter A on thecompanion CD-ROM to see
how pixels become %bit values in non-256 color modes)
and each ?-bit value is used to
639
look up one of the 256 valuesstored inthe DAC. The looked-up value is then converted
to analog red, green,and blue signals and sent to the monitor to form one pixel.
Thats straightforward enough, and weve produced some pretty impressive color
effects by loading the DAG once and then playing with the &bit path into theDAC.
Now, however, we want to generate color effects by dynamically changing the values
stored in theDAC in real time, a technique thatIll call color cycling. The potential of
color cycling should be obvious: Smooth motion can easily besimulated by altering
the colors in an appropriate pattern, andall sorts of changing color effects can be
produced without altering asingle bit of display memory.
For example, a sunset can be made to color and darken by altering the DAC locations containing thecolors used to draw the sunset,or a river can bemade to appear
to flow by cycling through the colors used to draw the river. Another use for color
cycling isin providing more realistic displaysfor applications like realtime 3-D games,
where the VGAs 256 simultaneous colors can be made to seem like many more by
changing the DAC settings from frame to frame to match the changing color demands of the rendered scene.Which leaves only one question: How do we load the
DAC smoothly in realtime?
Actually, so far as I know, you cant. At least you cant load the entire DAC-all 256
locations-frame after frame without producing distressing on-screen effects on at
least some computers. In
non-256 color modes, it
is indeed possible to load theDAC
quickly enough to cycle all displayedcolors (of which there are 16 or fewer),
so color
cycling could be used successfully to cycle all colors in such modes. On the other
hand, colorpaging (which flipsamong a number
of color sets stored within the DAC
in all modes other than 256 color mode, as discussed inChapter A on the companion CD-ROM) can be used in non-256 color modes to produce many of the same
effects as color cycling and is considerably simpler and more reliable then color
cycling, so color paging is generally superior to color cycling whenever its available.
In short,color cycling is reallythe methodof choice for dynamic color effects onlyin
256-color mode-but, regrettably, color cycling is at its least reliable and capable in
that mode,as well see next.
640
Chapter 34
As it happens, the DAG should only be loaded during vertical blanking; that is, the
time between the endof displaying the bottom borderand the start
of displayingthe
top border, when no video information atall is being sentto the screen by the DAG.
Otherwise, small dots of snow appear on thescreen, and while an occasional dot of
this sort wouldnt be a problem, the constant
DAG loading requiredby color cycling
would produce averitable snowstorm on thescreen. By the way, I do mean border,
not frame buffer; the overscan pixels pass through the DAC just like the pixels
controlled by the frame buffer, so you cant even load the DAC while the border
color is being displayed without getting snow.
The start of vertical blanking itself is not easy to find, but the leading edgeof the
vertical sync pulse is easy to detect via bit 3 of the Input Status 1 register at 3DAH;
when bit 3 is 1, the vertical sync pulse is active. Conveniently,the vertical sync pulse
starts partway through but not toofar into vertical blanking, so it serves as a handy
way to tell when its safe to load the DAC without producing snow on thescreen.
So we wait for thestart of the vertical sync pulse, then begin to load the DAG. Theres
a catch, though. On many computers-Pentiums, 486s, and 386s sometimes, 286s
most of the time, and 8088s all the time-there just isnt enough time between the
start of the vertical sync pulse and the end of vertical blanking to load all 256 DAG
locations. Thatsthe crux of the problem with the DAG, and shortly wellget to a tool that
will let you explore for yourself the extent of the problem on computers in which youre
interested. First, though, we must address unother DAC loading problem: theBIOS.
641
Thesefindings lead me inexorably to the conclusion that the BIOS should not be
used to load the DAC dynamically. That is, $you i-e loading the DAC just once in
preparation for a graphicssession-sort of a DAC mode set-by all means load by
way of the BIOS. No one will care that some garbage is displayed for a single
frame; heck, I have boards that bounce andflicker and show garbage every timeI
do a mode set,and the amount of garbage produced by loading the DAC once is
far less noticeable. If; however, you intend to load the DAC repeatedly for color
cycling, avoid the BIOS DAC load functions like theplague. They will bring you
only heartache.
642
Chapter 34
DAC; you could end up with quite a mess on the screen, especially when your program resumes and continues writing to the DAC-but in all likelihood to the wrong
locations. No problem, you say;just disable interrupts for the duration. Good ideabut it takes much longer to load the DAC than interrupts should be disabled for. If,
on the other hand, you set the index for each DAC location separately, you can
disable interrupts 256 times, once as each DAG location is loaded, without problems.
As I commented in the last chapter, I dont have any gruesome tale to relate that
mandates taking the slower but safer road and setting the index for each DAC location separately while interrupts are disabled. Im merely hypothesizing as to what
ghastly mishaps could happen. However, its been my experience that anything that
can happen on thePC does happen eventually; there are justtoo dang many PCs out
there for it to be otherwise. However, load the DAC any way you like; just dont
blame me if you get a call from someonewhos claimsthat your program sometimes
turns their screen into something resembling month-old yogurt. Its not really your
fault, of course-but try explaining that to them!
643
enrct i c aol f
LISTING 34.1
134-1 .ASM
; F i l l s ab a n da c r o s st h es c r e e nw i t hv e r t i c a lb a r si na l l2 5 6
; a t t r i b u t e s ,t h e nc y c l e sap o r t i o no ft h ep a l e t t eu n t i lak e yi s
; pressed.
equ
GUARD-AGAINST-INTS
equ
WAIT-VSYNC
equ
1
0
CYCLE-SIZE
SCREEN-SEGMENT
SCREEN-WIDTH-IN-BYTES
INPUT-STATUS-1
DAC-READ-INDEX
DAC-WRITE-INDEX
DAC-DATA
equ
equ
equ
equ
equ
eclu
equ
256
OaOOOh
320
03dah
03c7h
03c8h
03c9h
: s e t t o 1 t o u s e BIOS f u n c t i o n st oa c c e s st h e
; DAC. 0 t o r e a d a n d w r i t e t h e
DAC d i r e c t l y
;1 t o t u r n o f f i n t e r r u p t s
a n ds e tw r i t ei n d e x
: b e f o r el o a d i n ge a c h
DAC l o c a t i o n . 0 t o r e l y
; on t h e DAC a u t o - i n c r e m e n t i n g
; s e t t o 1 t ow a i tf o rt h el e a d i n g
edge o f
; v e r t i c a ls y n cb e f o r ea c c e s s i n gt h e
DAC, 0
; not to wait
; s e t t o 1 t o u s e REP INS6and
REP OUTSB when
; a c c e s s i n gt h e
DAC d i r e c t l y , 0 t ou s e
; IN/STOSB
and
LOOSB/OUT
;#o f DAC l o c a t i o n s t o c y c l e , 2 5 6
max
;mode 1 3 hd i s p l a y memorysegment
; I o f b y t e sa c r o s st h es c r e e ni n
mode 1 3 h
1 r e g i s t e rp o r t
; i n p u ts t a t u s
;DAC Read I n d e xr e g i s t e r
;DAC W r i t e I n d e x r e g i s t e r
;DAC D a t ar e g i s t e r
i f NOT-8088
.286
e n d i f ;NOT-8088
1.model
smal
l O O h. s t a c k
.data
; S t o r a g e f o r a l l 256 DAC l o c a t i o n s ,o r g a n i z e da so n et h r e e - b y t e
; ( a c t u a l l yt h r e e6 - b i tv a l u e s ;u p p e rt w ob i t s
o f e a c hb y t ea r e n ' t
; s i g n i f i c a n t ) RGB t r i p l e t p e r c o l o r .
P a l e t t e T e m p2 d5db6u* 3p ( ? )
.code
start:
mov
ax.@data
mov
ds,ax
; S e l e c t VGA's s t a n d a r d2 5 6 - c o l o rg r a p h i c s
mov
ax.0013h
int
10h
mode,mode
13h.
0: s e t mode f u n c t i o n ,
:AH
; AL
1 3 h : modes e# t t o
; e a c hf o rr e d ,g r e e n ,a n db l u e ,p e r
out
i f WAIT-VSYNC
; W a i t f o r t h el e a d i n ge d g eo ft h ev e r t i c a ls y n cp u l s e :t h i se n s u r e s
; t h a t we r e a dt h e
DAC s t a r t i n g d u r i n g t h e v e r t i c a l n o n - d i s p l a y
; period.
mov
dx.INPUTLSTATUS-1
b e W a ti toN o; twVaSi ty n c :
in
al.dx
aal n. 0d8 h
Wj na zi t N o t V S y n c
WaitVSync:
; w a i tu n t i lv e r t i c a ls y n cb e g i n s
in
a1 . d x
and
a1 .08h
644
Chapter 34
jz
WaitVSync
endi f
:WAIT_VSYNC
i f USE-BIOS
mov
ax.lOl7h
:AH
bsxu. bbx
mov
mov
mov
mov
cx.256
d x . sP
e ga l e t t e T e m p
es.dx
d x . o f f sPe at l e t t e T e m p
int
10h
else
i f GUARD-AGAINST-INTS
mov
cx.CYCLELSIZE
mov
d. is e P
galetteTemp
mov
es.di
mov
d.io f f s ePt a l e t t e T e m p
sub ah.ah
DACStoreLoop:
mov
dx.DAC-READ-INDEX
mov
al.ah
cli
doxu. at l
mov
d x , DAC-DATA
a l . di xn
stosb
a l . di xn
stosb
a 1 , di xn
stosb
sti
inc
ah
1 oopDACStoreLoop
e l s e : !GUARD-AGAINST-INTS
mov
dx,DAC-READ-INDEX
sub
a1 .a1
doxu. at l
mov
d, si e Pg a l e t t e T e m p
mov
es.di
mov
d.io f f s ePt a l e t t e T e m p
mov
d x , DAC-DATA
i f NOTL8088
mov
cx.CYCLELSIZE*3
i nr es pb
else
:!NOT_8088
mov
cx.CYCLE-SIZE
DACStoreLoop:
a l . di xn
stosb
a l . di xn
stosb
in
a1 .dx
stosb
1 oopDACStoreLoop
endi f
endi f
e n d i f :USE-BIOS
--
1 0 h :s e t
DAC f u n c t i o n ,
1 7 h :r e a d
DAC b l o c ks u b f u n c t i o n
: s t a r t w i t h DAC l o c a t i o n 0
: r e a do u ta l l2 5 6l o c a t i o n s
: AL
: p o i n t ES:DX t o a r r a y i n w h i c h
: t h e DAC v a l u e sa r et ob es t o r e d
: r e a dt h e DAC
:!USE-BIOS
:# o f DAC l o c a t i o n s t o l o a d
:dump t h e DAC i n t o t h i s a r r a y
: s t a r t w i t h DAC l o c a t i o n 0
: s e tt h e
DAC l o c a t i o n #
: g e tt h er e dc o m p o n e n t
; g e tt h eg r e e nc o m p o n e n t
: g e tt h eb l u ec o m p o n e n t
: s e tt h ei n i t i a l
DAC l o c a t i o n t o
:dump t h e DAC i n t o t h i s a r r a y
: r e a d CYCLELSIZE DAC l o c a t i o n s a t o n c e
:# o f DAC l o c a t i o n s t o l o a d
; g e tt h er e dc o m p o n e n t
: g e tt h eg r e e nc o m p o n e n t
: g e tt h eb l u ec o m p o n e n t
:NOTL8088
:GUARDLAGAINST_INTS
645
;Draw a s e r i e so f1 - p i x e l - w i d ev e r t i c a lb a r sa c r o s st h es c r e e ni n
: a t t r i b u t e s 1 t h r o u g h2 5 5 .
mov
mov
mov
ax,SCREEN-SEGMENT
es.ax
di.50*SCREEN-WIOTH-IN-BYTES
: p o i n t ES:OI tthsotea r t
: o f l i n e 5 0o nt h es c r e e n
c ld
100
:mov
draw dx.lOO
RowLoop:
a t t r w i t h l i n e e a cmov
:hs t a r ta l . 1
1
mov
cx.SCREEN-WIDTH-IN-BYTES
:do a laf iucnlrel o s s
ColumnLoop:
:draw
stosb
a pixel
:increment al.l
add
al.0
adc
t u r n e d j uasttt r:i ibfu tt eh e
: o v e r t o 0. i n c r e m e n t it t o 1
: b e c a u s ew e ' r en o tg o i n gt o
: c y c l e OAC l o c a t i o n 0. s o
: a t t r i b u t e 0 w o n ' tc h a n g e
1 oop Col umnLoop
dx
dec
jnz
RowLoop
high
gosr i g i n a l
lines
back
: C y c l et h es p e c i f i e dr a n g eo f
DAC l o c a t i o n s u n t i l a k e y i s p r e s s e d .
CycleLoop:
; R o t a t ec o l o r s1 - 2 5 5o n ep o s i t i o ni nt h eP a l e t t e T e m pa r r a y :
: l o c a t i o n 0 i s a l w a y sl e f tu n c h a n g e d
s o t h a tt h eb a c k g r o u n d
: a n db o r d e rd o n ' tc h a n g e .
p uwPspoahtrlrde t t e T e m p + ( l;a*s3sePi)tdael e t t e T e m p
p u swho prPdt ra l e t t e T e m p + ( l * 3 ) + 2
; s e t t i n fgoar t t r
1
mov
cx.254
mov
s i . o f f sPe at l e t t e T e m p + ( 2 * 3 )
mov
d i , o f f sPe at l e t t e T e m p + ( l * 3 )
mov
ax.ds
mov
es.ax
cx, 254*3/ 2
mov
rep
movsw
s e t t iPnaglse t t e T e m
; r po t a t e
: f o r a t t r s 2 t h r o u g h2 5 5t o
: a t t r s 1 t h r o u g h2 5 4
: g e t POP
bx
attri:
b uf toer
move 1 and
POP
ax
stosw
P a l e t t e T e m pt h e : t h e m t o
a t t r i:b fluot crea t i o n
255
mov
es:[di],bl
i f WAIT-VSYNC
: W a i tf o rt h el e a d i n ge d g eo ft h ev e r t i c a ls y n cp u l s e :t h i se n s u r e s
: t h a t we r e l o a dt h e OAC s t a r t i n g d u r i n g t h e v e r t i c a l n o n - d i s p l a y
: period.
mov
dx.INPUT-STATUS-1
WaitNotVSync2:
;wa it t o b e o u t o f v e r t i c a l s y n c
a l . di xn
aal .n0d8 h
jnz
Wai t N o t V S y n c 2
WaitVSync2:
;wa i t u n t i lv e r t i c a ls y n cb e g i n s
in
al.dx
al.08h
and
jz
WaitVSync2
e n d i f ;WAIT_VSYNC
i f USE-BIOS
; S e tt h en e w ,r o t a t e dp a l e t t e .
646
Chapter 34
mov
bsxu. b x
mov
mov
mov
mov
ax.lOl2h
cx.CYCLE-SIZE
d x , sPe ag l e t t e T e m p
es.dx
d x , o f f sPeat l e t t e T e m p
int
10h
e l s e : !USE-BIOS
i f GUARD-AGAINST-INTS
mov
cx.CYCLE-SIZE
mov
s.io f f s ePt a l e t t e T e m p
ah.ah
sub
DACLoadLoop:
mov
dx,DAC-WRITE-INDEX
mov
a1 ,ah
cli
doxu. at l
mov
d x , DAC-DATA
1 odsb
doxu. at l
1 odsb
dox u, at l
1 odsb
doxu. at l
sti
ai hn c
1oopDACLoadLoop
e l s e :!GUARD-AGAINST-INTS
mov
dx.DAC_WRITE-INDEX
as1u, ab l
doxu. at l
mov
s. io f f s ePt a l e t t e T e m p
mov
d x ,DAC-DATA
i f NOT-8088
mov
cx,CYCLE_SIZE*3
outsb
rep
else
:!NOTL8088
mov
cx.CYCLE-SIZE
DACLoadLoop:
1 odsb
doxu. at l
1 odsb
dox u, at l
1 odsb
doxu, at l
l o o p DACLoadLoop
:NOTL8088
endi f
;GUARDLAGAINSTLINTS
endi f
e n d i f ;USE-BIOS
;See i f ak e yh a sb e e np r e s s e d .
mov
ah,Obh
int
21h
and
a1
.a1
jz
Cycl
eLoop
:C1 e a r t h e k e y p r e s s .
mov
ah.1
int
21h
:AH
1 0 h :s e t
OAC f u n c t i o n ,
: AL
1 2 h :s e t
DAC b l o c ks u b f u n c t i o n
: s t a r t w i t h DAC l o c a t i o n 0
:#o f DAC l o c a t i o n s t o s e t
; p o i n t ES:DX t o a r r a y f r o m w h i c h
: t o l o a d t h e DAC
; l o a dt h e DAC
:I\
o f DAC l o c a t i o n s t o 1oad
: l o a d t h e DAC f r o m t h i s a r r a y
; s t a r t w i t h DAC l o c a t i o n 0
; s e tt h e
DAC l o c a t i o n #
: s e tt h er e dc o m p o n e n t
: s e tt h eg r e e nc o m p o n e n t
: s e tt h eb l u ec o m p o n e n t
: s e tt h ei n i t i a l
DAC l o c a t i o n t o
: l o a d t h e DAC f r o m t h i s a r r a y
: l o a d CYCLE-SIZE
DAC l o c a t i o n s a t o n c e
:#o f DAC l o c a t i o n s t o l o a d
: s e tt h er e dc o m p o n e n t
: s e tt h eg r e e nc o m p o n e n t
: s e tt h eb l u ec o m p o n e n t
;DOS c h e c ks t a n d a r di n p u ts t a t u sf n
: i s ak e yp e n d i n g ?
: n o .c y c l e
some more
:DDS k e y b o a r di n p u tf n
647
: R e s t o r et e x t
mov
int
mov
int
mode anddone.
a x , 0003h
10h
ah,4ch
21h
- 0:
s e t mode f u n c t i o n ,
03h: mode It o s e t
:DOS t e r m i n a t ep r o c e s sf n
:AH
: AL
setnadr t
The big question is, How does Listing 34.1 cycle colors? Via the BIOS or directly?
With interrupts enabled or disabled?
Et ceteru?
However you like, actually. Four equates at the top of Listing 34.1 select the sortof
color cycling performed; by changing these equates
and CYCLE-SIZE, you can get a
feel forhow well various approaches to colorcycling work with whatevercombination of computer system and VGA you care to test.
The USE-BIOS equate is simple. SetUSEBIOS to 1 to load the DAC through the
block-load-DAC BIOSfunction, or to 0 to load theDAC directly with OUTS.
If USE-BIOS is 1, the only other equate of interest is WAIT-VSYNC. If WAIT-VSYNC
is 1, the programwaits for the leading edge
of vertical sync before loading theDAC;
if WAIT-VSYNC is 0, the program doesnt wait before loading. Theeffect of setting
or notsetting WAIT-VSYNC depends onwhether theBIOS ofthe VGA the program
is running onwaits for vertical sync before loading theDAC. You may end upwith a
double wait, causing color cycling to proceed athalf speed, you may end upwith no
wait at all, causing cycling to occur far too
rapidly (and almost certainly with hideous
on-screen effects), oryou may actually end up cycling at the proper one-cycle-perframe rate.
If USEBIOS is 0, WAIT-VSYNC still applies. However, you will always want to set
WAIT-VSYNC to 1 when USE-BIOS is 0; otherwise, cycling willoccur much toofast,
and a good deal of continuous on-screen garbageis likely to make itself evident as
the program loads the DAC non-stop.
If USEBIOS is 0, GUARD-AGAINST-INTS determines whether thepossibility of
the DAC loading process being interrupted is guarded against by disabling interrupts and setting the write index once for every location loaded and whether the
DACs autoincrementing feature is relied upon or not.
If GUARD-AGAINST-INTS is 1, the following sequence is followed for the loading
of each DAC location in turn: Interrupts are disabled, the
DAC Write Index register
is set appropriately, theRGB triplet for the location
is written to theDAC Data register, and interrupts are enabled. This
is the slow but safe approach described earlier.
Matters get still more interesting if GUARD-AGAINST-INTS is 0. In that case, if
NOT-8088 is 0, then an autoincrementing load is performed in a straightforward
fashion; theDAC Write Index register is set to the index
of the first location to load
and the RGB triplet is sent to the DAC by way of three LODSB/OUT DX& pairs,
with LOOP repeating theprocess for each of the locations in turn.
648
Chapter 34
649
animating a night scene, with stars twinklingin the background, meteors streaking
across the sky, and a spaceship moving acrossthe screen with itsjets flaring. One way
to produce most of the necessary effects withlittle effort would be to draw the stars
in several attributes and then cycle the colors for those attributes, draw the meteor
paths in successive attributes, one for each pixel, and then cycle the colors for those
attributes, and domuch the same for thejets. The only remaining task would beto
animate the spaceship across the screen, which is not a particularly difficult task.
The key to getting all the color cycling to work in the above example, howevel;
would be to assign each color cycling task dlfferentpart
a
of the DAC, with each
part cycled independentlyas needed. r f ; as is likely, the total number ofDAC locations cycledproved to be too great to manage in oneframe, youcould simply cycle
the colors ofthe stars after oneframe, the colors of the meteors afterthe next, and
the colors of the jets after yet another frame, then back around to cycling the
colors of the stars. By splitting up the DAC in this manner and interleaving the
cycling tasks, you can perform a great deal of seemingly complex color animation
without loadingvery much ofthe DAC during any oneframe.
Yet another and somewhat odder workaround is that of using only 128 DAC locations and page flipping. (Page flipping in 256color modes involves using the VGAs
undocumented 256color modes; see Chapters 31, 43, and 47 for details.) In this
mode of operation, youd first displaypage 0, which is drawn entirely with colors 0127. Then youd draw page 1 to look just like page 0, except that colors 128-255are
used instead. Youd load DAC locations 128-255 with the next cycle settings for the
128 colors youre using, then youd switch to display the second page with the new
colors. Then you could modify page 0 as needed, drawing in colors 0-127, load DAC
locations 0-127 with the next color cycle settings, and flip back to page 0.
The idea is that you modify only
those DAC locations that arenot used to display any
pixels on the current screen. Theadvantage of this is not, as you might think, that
you dont generate garbage on the screen when modifying undisplayed DAC locations; in fact, you do, for a spot of interference will showup if you set a DAC location,
displayed or not, duringdisplay time. No, you still have to waitfor vertical sync and
load only during vertical blanking before loading theDAC when page flipping with
128 colors; the advantage is that since none of the DAG locations youre modifying is
currently displayed, youcan spreadthe loading out over two or more vertical blanking periods-however long it takes. If you did this withoutthe 128-colorpage flipping,
you might get odd on-screen effects as some of the colors changed after one frame,
some after the next,and so o n - o r you might not; changing the entire
DAG in chunks
over severalframes is another possibility worth considering.
Yet another approach to
color cycling isthat of loading a bit of the DAC during each
horizontal blanking period. Combine that with counting scan lines, and you could
650
Chapter
34
The
DAC Mask
Theres one register in the DAC that Ihavent mentioned yet, the DAC Maskregister
at 03C6H. The operation of this register is simple but powerful; it can mask off any
or all of the 8 bits of pixel information coming into the DAC from the VGA. Whenever a bit ofthe DAG Mask register is 1 , the correspondingbit of pixel information is
passed along to the DAC to be used in looking up the RGB triplet to be sent to the
screen. Whenever a bit of the DAC Mask register is 0, the corresponding pixel bit is
ignored, and a 0 is used for that bit position in all look-ups of RGB triplets. At the
extreme, a DAC Mask setting of 0 causes all8 bits of pixelinformation to be ignored,
so DAC location 0 is looked up for every pixel, and the entire screen displays the
color stored in DAC location 0. This makes setting the DACMask register to 0 a
quick and easy way to blank the screen.
651
Previous
Home
Next
applies to reading from the DAC. In fact, reading from the DAC can even cause snow,
just as loading the DAC does, so it should ideally beperformed during vertical blanking.
The DAC can also be read by way of the BIOS in either of two ways. INT 10H, function 1OH (AH=lOH), subfunction 15H (AL=15H) reads out a single DAC location,
specified by BX;this function returnsthe RGB triplet storedin the specified location
with the red component
in the lower 6 bits of DH, the green component in the
lower
6 bits of CH, and theblue component in the lower 6 bits of CL.
INT 10H, function 10H (AH=lOH), subfunction 17H
(AL=17H)reads out a block of
DAC locations of length CX, starting with the location specified by BX. ES:DX must
point to the buffer in which the RGB values from the specified block of DAC locations are to be stored. The form of this buffer (RGB, RGB, RGB ..., with three bytes
per RGB triple) is exactly the same asthat of the buffer used when calling the BIOS
to load ablock of registers.
Listing 34.1 illustrates reading the DAC both through theBIOS block-read function
and directly, with the direct-read code capable of conditionally assembling to either
guard against interrupts or not and
to use REP INSB or not.As you can see, reading
the DAC settings is very much symmetric withsetting theDAC.
Cycling Down
And so, at long last, we come to the end of our discussion of color control on the
VGA. If it has been more complex than anyone might have imagined, it has also
been most rewarding. Theres as much obscure but very real potential in colorcontrol as there is anywhere on the VGA, which is to say that theres a very great dealof
potential indeed. Put color cycling or color paging together with the page flipping
and image drawing techniques exploredelsewhere in this book, and youll leavethe
audience gasping and wondering How the heck did they do that?
652
Chapter 34
Previous
chapter 35
Home
Next
For all the complexity~,ofgraphics design and programming,surprisingly few primitive functions lie at the.& &&most graphics software. Heavily used primitives include
routines that draw dati
cles, area fills, bit block logical transfers, and, of course,
lines. For many ye&, computer graphics were created primarily with specialized
line-drawing hardware, so lines are in a way the Zinguafranca of computer graphics.
.
.de variety of microcomputer graphics applicationstoday, notaLines are
bly CAD/
computer-aided
engineering.
Probably the best-khown formula for drawing lines on a computer display is called
Bresenhams line-drawing algorithm. (We have to be specific here because there is
also a less-well-known Bresenhams circle-drawing algorithm.) In this chapter, 111
present two implementations for the EGA and VGA of Bresenhams line-drawing
algorithm, which provides decent linequality and excellent drawing speed.
The first implementation is in rather plain C, with the second innot-so-plain assembly, and theyre both pretty good code. The assembly implementation is damned
good code, in fact, but want
ifyou to know
whether its the fastest Bresenhamsimplementation possible, I must tell youthat it isnt. First ofall, the code could be sped
up
a bit by shuffling and combining thevarious error-term manipulations, but that results in truly cryptic code. I wanted you to be able to relate the original algorithm to
the final code, so I skipped thoseoptimizations. Also, write mode 3, which is unique
+l
jii
,,
655
to the VGA, could be used for considerably faster drawing. Ive described write mode
3 in earlier chapters,and I strongly recommend its use in VGA-only line drawing.
Second, horizontal, vertical, and diagonal lines could be special-cased, since those
particular lines require little calculation and can be drawn very rapidly. (This is especially true of horizontal lines, which can be drawn 8 pixels at a time.)
Third, run-length slice line drawing could be used to significantly reduce the number of calculations required perpixel, as Ill demonstrate in the next two chapters.
Finally, unrolled loops and/or duplicated code could be used to eliminate most
of the
branches in the final assembly implementation, and because x86 processors are notoriously slow at branching, that would make quite a difference in overall performance. If
youre interested in unrolled loops and similar assembly techniques, I refer you to the
first part of this book.
That brings us neatly to my final point: Even if I didnt know that there were further
optimizations to be made to my line-drawing implementation, Idassume that there
were. As Im sure the experiencedassembly programmers among you know, there
are dozensof waysto tackle any problem in assembly, and someoneelse always seems
to have come up with a trick that never occurred toyou. Ive incorporated a suggestion made by Jim Mackraz in the code inthis chapter, and Idbe most interested in
hearing of anyother tricks or tips you may have.
Notwithstanding, the linedrawingimplementation in Listing 35.3 is plentyfast enough
for most purposes, so lets get the discussion underway.
656
Chapter 35
assembly language and callable directly from Borland C++, draws lines at extremely
high speed, on the order of three to six times faster than the C version. Between
them, thetwo implementations illuminate Bresenhams line-drawing algorithm and
provide high-performance line-drawing capability.
The difficulty in drawing a line lies in generating aset of pixels that, taken together,
are a reasonable facsimile of a true line. Only horizontal, vertical, and 1:l diagonal
lines can be drawn precisely along the true line being represented; all other lines
must be approximated from thearray of pixels that a given video mode supports, as
shown in Figure 35.1.
Considerable thought has gone into the design of line-drawing algorithms, and a
number of techniques fordrawing high-quality lines have been developed. Unfortunately, most of these techniques were developed for powerful, expensive graphics
workstations and requirevery high resolution, a large color palette, and/or floatingpoint hardware. These techniques tend to perform poorly and produceless visually
impressive results on all but thebest-endowed PCs.
Bresenhams line-drawingalgorithm, on the other hand,is uniquely suited to microcomputer implementation in that it requires no floating-point operations, no divides,
and no multiplies inside the line-drawing loop. Moreover, it can be implemented
with surprisingly little code.
0 0 0 0 0 0 0
@ 0 0 0 0 0 0
Approximating a true linefrom a pixel array.
Figure 35.1
657
drawing at that point. As the drawing of the line progresses from one pixel to the
next, the error can be used to tell when, given the resolution of the display, a more
accurate approximationof the line canbe drawn by placing agiven pixel one unitof
screen resolution away from its predecessor in either the horizontal or thevertical
direction, or both.
Lets examine thecase of drawing a line where the horizontal,
or X length of the line
is greater than the vertical, or Y length, and both lengths are greater than 0. For
example, supposewe are drawing a line from(0,O) to (5,2),as shown in Figure 35.2.
Note that Figure 35.2 showsthe upper-left-hand cornerof the screenas (O,O), rather
than placing (0,O) at its more traditional lower-left-hand corner location. Due to the
way in which the PCs graphics are mapped to memory, it is simpler to work within
this framework, although a translationof Y from increasingdownward to increasing
upward couldbe effected easily enough by simply subtracting theY coordinate from
the screen height minus1; if you are more comfortablewith the traditional coordinate system, feel free tomodify the code in Listings 35.1 and 35.3.
In Figure 35.2, the endpointsof the linefall exactlyon displayed pixels. However,
no
other partof the linesquarely intersects the centerof a pixel, meaning thatall other
pixels will have to be plotted as approximations of the line. The approach to approximation that Bresenhams algorithm takes is to move exactly 1 pixel along the
major dimension of the line each time a new pixel is drawn, while moving 1 pixel
along the minor dimension each time the line moves more than halfway between
pixels along the minor dimension.
In Figure 35.2, the X dimension is the major dimension. This means that
6 dots, one
at each of X coordinates 0,1,2,3,4, and5, will be drawn. The trick, then, is to decide
on the correctY coordinates to accompany those
X coordinates.
3 0 0 0 0 0 0 0
Drawing between two pixel endpoints.
Figure 35.2
658
Chapter 35
Its easy enough to select the Y coordinates by eye in Figure 35.2. The appropriateY
coordinates are0, 0, 1, 1, 2, 2, based on theY coordinate closest to the line for each
X coordinate. Bresenhams algorithmmakes the same selections, based on thesame
criterion. The mannerin which it does this is by keeping a running recordof the
error of the line-that is, how far from the true line the current
Y coordinate is-at
each X coordinate, as shown in Figure 35.3. When the running error of the line
indicates thatthe currentY coordinate deviates from the true lineto the extent that
the adjacent Y coordinate would be closer to the line, then the currentY coordinate
is changed to that adjacentY coordinate.
Lets take a moment to follow the steps Bresenhams algorithm would go through in
drawing the line in Figure 35.3. The initial pixel is drawn at (O,O), the starting point
of the line. At this point the errorof the line is 0.
Since X is the major dimension, the next pixel has an X coordinate of 1. The Y
coordinate of this pixel will be whichever of 0 (the last Y coordinate) or1 (the adjacent Ycoordinatein the directionof the end pointof the line) the true
line at this X
coordinate is closer to. The running error atthis point is B minus A, as shown in
Figure 35.3. This amount is less than 1/2 (that is, less than halfway to the next Y
coordinate), so the Y coordinate does not change X
atequal to 1. Consequently, the
second pixel is drawn at ( 1 ,0).
The thirdpixel has anX coordinate of 2. The running error at
this point is C minus
A, which is greater than 1/2 and therefore closer to the next than to the currentY
coordinate. The thirdpixel is drawn at (2,1), and 1 is subtracted from the running
error to compensate for the adjustment of one pixel in the current Y coordinate.
The running errorof the pixel actually drawn at this point is C minus D.
0 0 0 0 0 0 0
659
660
Chapter 35
interested in the overall picture than in the primitive elements fromwhich that picture is built. As a general rule, any collection of pixels that trend from pointA to
point B in a straight fashion is accepted by the eye as a line. Bresenhams algorithm
is successfully used by many current PC programs, and by the standard of this wide
acceptance the algorithm is certainly good enough.
Then, too,users hate waiting for their computer to finish drawing. By any standard
of drawing performance, Bresenhams algorithm excels.
An Implementation in C
Its time toget down and look at some actual working code. Listing 35.1 is a C implementation of Bresenhams line-drawing algorithm for modes OEH, OFH, IOH, and
12H of the VGA, called as function EVGALiie. Listing 35.2 is a sample program to
demonstrate theuse of EVGALine.
*
*
*
*
*
*/
C i m p l e m e n t a t i o no fB r e s e n h a m sl i n ed r a w i n ga l g o r i t h m
f o r t h e EGA and VGA. Works i n modes OxE. OxF. 0x10.and0x12.
C o m p i l e dw i t hB o r l a n d
C++
By MichaelAbrash
#i
nclude<dos
#define
.h>
/ * c o n t a i n s MK-FP
EVGA-SCREEN-WIDTHKIN-BYTES
# d e f i n e EVGA-SCREEN-SEGMENT
OxAOOO
GCINDEX
{{define
l d e f ine GC-DATA
# d e f i n e SET-RESET-INDEX
# d e f i n e ENABLELSETLRESET-INDEX
# d e f i n e BIT-MASK-INDEX
/*
*
*
*
*I
macro
*/
80
/*
memory o f f s e t f r o m s t a r t o f
one row t o s t a r t o f n e x t
*/
/ * d i s p l a y memorysegment * /
Ox3CE
/ * G r a p h i c s C o n t r o l 1e r
I n d e xr e g i s t e rp o r t
*/
Ox3CF
/ * G r a p h i c sC o n t r o l l e r
D a t ar e g i s t e rp o r t
*/
0 / * i n d e x e so nf e e d e d
*/
1 I* G r a p h i c sC o n t r o l l e r
*/
8 /* r e g i s t e r s */
Draws a d o t a t ( X O . Y O ) i n w h a t e v e rc o l o rt h e
EGA/VGA hardware i s
s e t up f o r . L e a v e st h eb i t
mask s e tt ow h a t e v e rv a l u et h e
d o tr e q u i r e d .
v o i d EVGADot(X0, Y O )
u n s i g n e di n t
XO:
u n s i g n e di n t
YO:
/ * c o o r d i n a t e sa wt h i c ht od r a wd o t w, i t h
/ * ( 0 . 0 ) a t h eu p p e lr e f ot tf h es c r e e n
*I
*/
u n s i g n e dc h a rf a r* P i x e l B y t e P t r :
u n s i g n e dc h a rP i x e l M a s k ;
661
/*
C a l c u l a t et h eo f f s e ti nt h es c r e e ns e g m e n to ft h eb y t e
w h i c ht h ep i x e ll i e s
*/
PixelBytePtr
MK-FP(EVGA-SCREEN-SEGMENT.
( Y O * EVGA-SCREEN-WIDTH-IN-BYTES
) + ( X0 / 8 ) I ;
/*
in
/*
S e tu pt h eG r a p h i c sC o n t r o l l e r ' s
B i t Mask r e g i s t e r t o a l l o w
o n l yt h eb i tc o r r e s p o n d i n gt ot h ep i x e lb e i n gd r a w nt o
b e m o d i f i e d */
outportb(GC-INDEX. BIT-MASK-INDEX);
outportb(GC-DATA.PixelMask);
/*
D r a w t h ep i x e l .B e c a u s e
o f t h eo p e r a t i o no ft h es e t i r e s e t
f e a t u r eo ft h e
EGA/VGA. t h e v a l u e w r i t t e n d o e s n ' t m a t t e r .
The s c r e e n b y t e i s
ORed i n o r d e r t o p e r f o r m
a r e a dt ol a t c ht h e
d i s p l a y memory. t h e np e r f o r m a w r i t e i n o r d e r t o m o d i f y
it. * I
* P i x e l B y t e P t r 1- OxFE:
/*
* Draws
*/
a lineinoctant
v o i dO c t a n t O ( X 0 .
YO,
u n s i g n e di n t XO. YO:
u n s i g n e di n tD e l t a X .
i n tX D i r e c t i o n :
0 or 3 ( IDeltaXJ
>-
DeltaY
).
D e l t a XD
. e l t a YX
. Direction)
/* c o o r d i n a t e so fs t a r to ft h el i n e
/ * l e n g t ho ft h el i n e( b o t h
> 0 ) */
DeltaY;
I* 1 i f l i n e i s drawn l e f t t o r i g h t ,
-1 i f d r a w n r i g h t t o l e f t
*/
i n t Del taYx2;
i n t DeltaYx2MinusDeltaXx2;
i n tE r r o r T e r m :
/*
*/
Setup
initialerrorterm
a n dv a l u e su s e di n s i d ed r a w i n gl o o p
DeltaYx2
D e l t a Y * 2:
DeltaYx2MinusDeltaXx2
D e l t a Y x 2 - ( i n t ) ( D e l t a X * 2 1;
ErrorTerm
D e l t a Y x 2 - ( i n t )D e l t a X :
/*
D r a w t h e l i n e */
EVGADot(X0. Y O ) ;
I* fdpitrirhaxseew
tl
*/
while ( DeltaX-- ) (
/ * See i f i t ' s t i m e t o a d v a n c et h e Y c o o r d i n a t e */
i f ( E r r o r T e r m >- D ) {
/ * Advancethe Y c o o r d i n a t e & a d j u s t t h e e r r o r t e r m
back down * /
YO++;
E r r o r T e r m +- DeltaYx2MinusDeltaXx2;
I else {
/* Add t o t h e e r r o r t e r m
*/
E r r o r T e r m +- OeltaYx2:
X0 +- X D i r e c t i o n ;
EVGADot(XD. Y O ) ;
/*
/*
/*
*
*/
662
*/
Draws a l i n e i n o c t a n t
Chapter 35
or 2
a d v a nt chee
X coordinate
draw a p i x e l */
IDeltaXl
<
D e l t a Y 1.
*/
v o i dO c t a n t l ( X 0 .
YO, DeltaXD
. eltaYX
. Direction)
/* coordinates o f s t a r t o f t h e l i n e
u n s i g n e di n t X O . Y O :
> 0) * I
D e l t a Y : I* l e n g t ho ft h el i n e( b o t h
u n s i g n e di n tD e l t a X .
in t X D i r e c t i on :
I* 1 i f l i n e i s drawn l e f t t o r i g h t ,
- 1 i f drawn r i g h t t o l e f t * I
*/
i n tD e l t a X x 2 ;
i n tD e l t a X x Z M i n u s D e l t a Y x 2 :
i n t ErrorTerm:
/ * Setup i n i t i a l e r r o r t e r m andvalues u s e di n s i d ed r a w i n gl o o p
D e l t a X * 2:
DeltaXxZ
DeltaXxZMinusDeltaYx2
DeltaXx2 - ( i n t ) ( DeltaY * 2 ) :
ErrorTerm
D e l t a X x 2 - ( i n t )D e l t a Y :
*I
EVGADot(X0. Y O ) :
I* d rtfahiprweisxt e l
*I
w h i l e ( DeltaY-- ) [
/ * See i f i t ' s t i m e t o a d v a n c et h e X c o o r d i n a t e * I
i f ( E r r o r T e r m >- 0 1 (
I* A d v a n c et h e X c o o r d i n a t e & a d j u s t t h e e r r o r t e r m
b a c k down * I
X0 +- X D i r e c t i o n ;
E r r o r T e r m +- D e l t a X x 2 M i n u s D e l t a Y x 2 :
1 else {
I* Add t o t h e e r r o r t e r m
*/
E r r o r T e r m +- D e l t a X x Z :
I* atdhvea n c e
Y coordinate *I
I* draw a p i x e l * I
YO++:
EVGADot(X0. Y O ) :
1
I*
*I
voidEVGALine(X0.
i n t XO,
YO:
I*
i n t X1. Y1:
/*
c h aC
r olor:
I*
YO, X 1 . Y 1 . C o l o r )
c o o r d i n a t e so of n ee n do tf h el i n e
c o o r d i n a t e so tf h eo t h e re n do tf h el i n e
c o l o rt o draw 1 i n e i n * I
*I
*/
i n tD e l t a X .D e l t a Y :
i n t Temp:
I* S e tt h ed r a w i n gc o l o r
*I
I* P u tt h ed r a w i n gc o l o ri nt h eS e t / R e s e tr e g i s t e r
outportb(GC-INDEX, SET-RESETLINDEX):
outportb(GC_DATA.Color);
/ * Cause a l l p l a n e s t o b e f o r c e d t o t h e S e t / R e s e t c o l o r
outportb(GC_INDEX. ENABLELSET-RESETLINDEX):
outportb(GC_DATA, OxF);
*I
*/
/ * Save h a l ft h el i n e - d r a w i n gc a s e sb ys w a p p i n g
YO w i t h Y 1
and X0 w i t h X 1 i f Y O i s g r e a t e r t h a n
Y 1 . As
a
r e s u l t ,D e l t a Y
i s always > 0, a n do n l yt h eo c t a n t
0 - 3 casesneed t o be
hand1 ed. * I
i f ( YO
>
Y1 )
Temp
YO;
YO
Y1:
Y1
Temp;
XO:
Temp
--
663
--
x0
X1
}
/*
x1:
Temp:
H a n d l ea sf o u rs e p a r a t ec a s e s ,f o rt h ef o u ro c t a n t si nw h i c h
Y 1 i sg r e a t e rt h a n
Y O */
DeltaX
X 1 - XO:
/ * c a l c u l a t et h el e n g t ho ft h el i n e
i n e a c hc o o r d i n a t e * I
DeltaY
Y 1 - YO:
i f ( DeltaX > 0 ) I
i f ( DeltaX > DeltaY
{
OctantO(X0. Y O , D e l t a X .D e l t a Y .
1);
} else {
O c t a n t l ( X 0 , YO, D e l t a XD
, eltaY.
1):
1 else
D e l t a X = -Del t a x :
/ * a b s o l uv taDeloeuflet a X
i f ( DeltaX > DeltaY
{
OctantO(X0, YO, D e l t a XD
. eltaY.
-1):
1 else {
O c t a n t l ( X 0 . Y O , D e l t a XD
. eltaY.
-1):
*I
/* R e t u r nt h es t a t eo ft h e
EGAIVGA t o normal * I
outportb(GC-INDEX. ENABLE-SET-RESET-INDEX):
outportb(GC-DATA. 0 ) :
outportb(GC-INDEX. BIT-MASK-INDEX):
outportb(GC-DATA. OxFF):
1
*
*
*
*
*
Sampleprogram
to illustrate
C o m p i l e dw i t hB o r l a n d
EGAIVGA l i n e d r a w i n g r o u t i n e s .
C++
By M i c h a e lA b r a s h
*I
# i n c l u d<ed o s . h >
#define
i d e f i ne
#define
# d e f ine
#define
I* c o n t a i ngse n i n t e r r u p t
GRAPHICS-MODE
TEXT-MODE
BIOSpVIDEO-INT
X-MAX
Y-MAX
Ox10
0x03
Ox10
640
348
/*
/*
*/
w o r k i ns cg r e ewni d t h
w o r k i sn cg r e he en i g h t
*I
*/
e x t e r nv o i dE V G A L i n e ( ) :
/*
S u b r o u t i n et od r a w
a r e c t a n g l ef u l lo fv e c t o r s ,o ft h es p e c i f i e d
l e n g t h and c o l o r ,a r o u n dt h es p e c i f i e dr e c t a n g l ec e n t e r .
*I
v o i dV e c t o r s U p ( X C e n t e r .
i n t XCenter.YCenter:
i n t XLength.YLength:
Ci no tl o r :
664
i n t WorkingX.WorkingY:
Chapter 35
YCenter.XLength.YLength.Color)
/ * c e n t e r o f r e c t a n g l e t o fill * I
I* d i s t a n c ef r o mc e n t e rt oe d g e
o fr e c t a n g l e * I
I* cdortlloaioniwrne s
*I
I* L i n e sf r o mc e n t e r
WorkingX = XCenter WorkingY = YCenter f o r ( : WorkingX < (
EVGALine(XCenter.
totopofrectangle
*I
XLength:
YLength:
XCenter + XLength ) : WorkingX++ )
YCenter.WorkingX,WorkingY.Color);
I* L i n e sf r o mc e n t e r
WorkingX = XCenter +
WorkingY = YCenter f o r ( : WorkingY < (
EVGALine(XCenter.
torightofrectangle
*I
XLength - 1;
Y Length :
YCenter + YLength ) : WorkingY++
YCenter.WorkingX.WorkingY.Color):
t ob o t t o mo fr e c t a n g l e
I* L i n e sf r o mc e n t e r
*I
WorkingX = XCenter + XLength - 1:
WorkingY = YCenter + YLength - 1 ;
f o r ( ; WorkingX >- ( XCenter - XLength 1; WorkingX"
EVGALine(XCenter.YCenter.WorkingX.WorkingY,Color):
--
I* L i n e s f r o m c e n t e r t o l e f t o f r e c t a n g l e
*I
WorkingX
XCenter - XLength;
WorkingY
YCenter + YLength - 1;
f o r ( : WorkingY >- ( YCenter - YLength ) : W o r k i n g Y - - 1
EVGALine(XCenter.YCenter.WorkingX.WorkingY.Color
):
1
I*
* Sampleprogram
*/
t o d r a wf o u rr e c t a n g l e sf u l lo fl i n e s .
v o i dm a i n 0
chartemp:
/ * S e tg r a p h i c s
-AX
mode * I
GRAPHICSLMDDE:
geninterrupt(BIOS-VIDEO-1NT):
I* Draw e a c ho ff o u rr e c t a n g l e sf u l lo fv e c t o r s
*I
VectorsUp(XLMAX I 4, Y-MAX I 4, X-MAX I 4.
Y L M A X I 4 . 1);
VectorsUp(X-MAX * 3 / 4, YLMAX I 4. X-MAX I 4.
Y-MAX f 4. 2 ) :
VectorsUp(XLMAX I 4 , Y-MAX * 3 I 4 . XKMAX / 4 .
Y-MAX / 4 . 3 ) ;
VectorsUp(X-MAX * 3 I 4. YLMAX * 3 / 4 . X-MAX I 4 .
Y"AX
/ 4, 4 ) :
I* W a i tf o rt h ee n t e rk e y
scanf ("Xc", &temp) ;
I* R e t u r n b a c k t o t e x t
-AX
- TEXT-MODE;
t o be p r e s s e d * I
mode * I
geninterrupt(BIOS-VIDE0-INT):
Looking at EVGALine
The EVGALine function itself performs four operations.
EVGALie first sets up the
VGAs hardware so that all pixels drawn will be in the desired color. This is accomplished by setting two of the VGA's registers, the Enable Set/Reset register and the
Bresenham Is Fast,andFast Is Good
665
Set/Reset register. Setting the Enable Set/Reset to the value OFH,asis done in
EVGALine, causes all drawing to produce pixels in the color contained in the Set/
Reset register. Setting the Set/Reset registerto the passed color, in conjunctionwith
the Enable Set/Reset settingof OFH, causes all drawing done by EVGALine and the
functions it calls to generate the passed color. In summary, setting up the Enable
Set/Reset and Set/Reset registers in this way causes the remainderof EVGALine to
draw a line inthe specified color.
EVGALine next performs a simple check to cut in half the numberof line orientations that must be handled separately. Figure 35.4 shows the eight possible line
orientations among which a Bresenhams algorithm implementation must distinguish. (In interpretingFigure 35.4, assume that lines radiate outward from the center
of the figure, falling into oneof eight octants delineatedby the horizontal and
vertical axes and thetwo diagonals.) The need tocategorize lines into these octantsfalls
out of the major/minor axis nature of the algorithm; the orientations are distinguished by which coordinate forms the majoraxis and by whether each of X and Y
increases or decreases from the line start to the line end.
A moment of thought will show, howevel; that four of the line orientations are
redundant. Each of the four orientationswhich
for DeltaY, the Y component of the
line, is less than 0 (that is,for which the line startY coordinate is greater than the
line end Y coordinate) can be transformed into one of the four orientationsfor
which the line startY coordinate is less than the line end
Y coordinate simply by
reversing the line start andend coordinates, so that the line isdrawn in the other
direction. EVGALine does this by swapping (XO,YO) (the line start coordinates)
with (XI,
Y l ) (the lineend coordinates) whenever YO is greater thanYI.
666
Chapter 35
Decreasing Y
Octant 5
<
<
0
0
IDeltaYl
>
DeltaX
DeltaY
Octant 6
DeltaX
DeltaY
>
<
0
0
>
I D e l t aI XD le l t a Y l
IDeltaXl
Octant 4
DeltaX < 0
DeltaY < 0
IOeltaXl > IDeltaY
Decreasing X
OeltaY < 0
IDeltaXl > IDeltaY
increasing X
Octant 3
IDeltaYJ > JDeltaX
J DJ e l t a Y 1
DeltaX < 0
DeltaX
DeltaY
increasing Y
>
>
0
JDeltaXJ
> 0
Octant 1
when the running error of the line dictates. In octants 1 and 2, the Y coordinate
changes on every pixel and the X coordinate changesonly when the running error
dictates, sinceY is the major axis.
There is one line-drawing function for octants0 and 3, OctantO, and oneline-drawing function for octants
1 and 2, Octantl. A single function with if statements could
certainly be used to handle all four octants, but at a significant performance cost.
There is, on the other hand,
very little performance cost to grouping octants
0 and 3
together andoctants 1 and 2 together, since the two octants in each pairdiffer only
in the direction of change of the X coordinate.
EVGALiie determines which line-drawing function to call and with what value for
the direction of change of the X coordinate based on two criteria: whetherDeltaX is
negative or not, and whether the absolute value of DeltaX (IDeltaXI) is less than
DeltaY or not,as shownin Figure 35.5. Recall that thevalue of DeltaY, and hence the
direction of change of the Y coordinate, is guaranteed to be non-negative as a result
of the earlier eliminationof four of the line orientations.
After calling the appropriate function to draw the line (more on those functions
shortly), EVGALiie restores the state of the Enable Set/Reset register to its default
of zero. In this state, the Set/Reset register has no effect, so it is not necessary to
restore thestate of the Set/Resetregister as well.EVGALine also restores the state of
Bresenham is Fast, and Fast is Good
667
Decreasing Y
ling X
Dec
Increasing Y
the Bit Mask register (which,as we will see, is modified by EVGADot, the pixeldrawing
routine actually used to draw each pixel of the lines produced by EVGALine) to its
default of OFFH. While it would be more modular to
have EVGADot restore the state
of the Bit Maskregister afterdrawing each pixel, would
it
alsobe considerably slower
to do so. The same could besaid of having EVGADot set the Enable Set/Resetand
Set/Reset registers for each pixel: While modularity would improve, speed would
suffer markedly.
668
Chapter
35
669
670
Chapter
35
do onething: draw lines into thedisplay memory of the VGA. Additional functionality can be suppliedby the code thatcalls EVGALine.
The version of EVGAlLine shown in Listing 35.1 is reasonably fast, but itis not as fast
as it might be. Inclusion of EVGADot directly into Octant0 and Octantl, and, indeed,
inclusion of Octant0 and Octantl directly into EVGALine would speed executionby
saving the overhead of calling and parameterpassing. Handpicked register variables
might speed performance as well, as would the use of word OUTs rather than byte
OUTs. A more significant performance increasewould come from eliminating
separate calculation of the address and mask for each pixel. Since the location of each
pixel relative to the previous pixel is known, the address and mask could simply be
adjusted from one pixel to the next, rather than recalculated from scratch.
These enhancements are not incorporated into the in
code
Listing 35.1 for a couple
of reasons. One reason is that its important that theworkings of the algorithm be
clearly visible in the code, for learning purposes. Once the implementation
is understood, rewriting it for improved performance would certainly be a worthwhile exercise.
Another reason is that when flat-out speed is needed, assembly language is the best
way to go. Why produce hard-to-understandC code to boost speeda bit when assembly-language code can perform thesame task at two or moretimes the speed?
Given which, a high-speed assembly language version of EVGALine would seem to
be a logical next step.
671
.................................................................
: C - c o m p a t i lbi lnee - d r a w ei nngpt roayi nt t
: N e a rC - c a l l a b l ea s :
EVGALine(X0. Y O ,
-EVGALine.
.................................................................
*
*
modelsmall
.code
: Equates.
EVGA-SCREEN-WIDTH-IN-BYTES
equ
EVGA-SCREEN-SEGMENT
GC-INDEX
80
OaOOOh
; GCr3aocpneht hri co sl l e r
SET-RESET-INDEX
ENABLE-SET-RESET-INDEX
BIT-MASK-INDEX
0
1
: I n d e xr e g i s t e rp o r t
: i n d e x e so fn e e d e d
; G r a p h i c s C o n t r o l 1e r
: registers
: S t a c kf r a m e .
EVGALineParms
x0
YO
;pushed BP
: p u s h e dr e t u r na d d r e s s
(make d o u b l e
: w o r df o r f a r c a l l )
: s t a r t i n g X c o o r d i n a t eo fl i n e
;starting Y coordinate o f l i n e
;ending X c o o r d i n a t eo fl i n e
;ending Y c o o r d i n a t eo fl i n e
;colorofline
;dummy t o pad t o w o r ds i z e
dw
dw
dw
dw
x1
Y1
db
struc
dw
dw
Color
db
EVGALineParms
ends
.................................................................
; L i n ed r a w i n gm a c r o s .
.................................................................
: Macro t ol o o pt h r o u g hl e n g t ho fl i n e ,d r a w i n ge a c hp i x e li nt u r n .
; Used f o rc a s eo f( D e l t a X I
: Input:
(DeltaYl.
MOVE-LEFT: 1 i f D e l t a X < 0, 0 e l s e
AL: p i x e l mask f o r i n i t i a l p i x e l
BX: ( D e l t a X I
D X : a d d r e s so f GC d a t a r e g i s t e r . w i t h i n d e x r e g i s t e r s e t t o
i n d e x o f B i t Mask r e g i s t e r
SI: D e l t a Y
E S : D I : d i s p l a y memory address o f b y t e c o n t a i n i n g i n i t i a l
pixel
LINE1
macro
1o c a l
1o c a l
mov bx
672
>-
Chapter 35
MOVE-LEFT
LineLoop.MoveXCoord,NextPixel
, LinelEnd
MoveToNextByte.ResetBitMaskAccumulator
l i n pe:#
iinx eo lf s
cx.
shl
mov
bp.bx sub
bx.1 shl
sub
bx.si add
mov
pmi oxfnreeoltashreer e
: ( t h e r e ' sa l w a y s
a t l e a s tt h e
: a tt h es t a r tl o c a t i o n )
one p i x e l
;DeltaY * 2
: e r r o rt e r m
a t OeltaY * 2 - OeltaX
: e r r o rt e r ms t a r t s
:DeltaX * 2
:DeltaY * 2 - D e l t a X * 2 ( u s e di nl o o p )
;OeltaY * 2 ( u s e di nl o o p )
; s e ta s i d ep i x e l
mask f o r i n i t i a l p i x e l
: w i t h AL ( t h ep i x e l mask a c c u m u l a t o r ) s e t
: f o rt h ei n i t i a lp i x e l
si.l
bp.si
s i .bx
ah.al
LineLoop:
: See i f i t ' s t i m e t o
:see
bp.bpand
js
advancethe
Y c o o r d i n a t ey e t .
n e g a itisi fvt eer rmr o r
t h eastt a y
MoveXCoord
;yes,
same Y c o o r d i n a t e
: Advancethe
Y c o o r d i n a t e ,f i r s tw r i t i n g
all p i x e l s i n t h e c u r r e n t
move t h e p i x e l mask e i t h e r l e f t o r r i g h t , d e p e n d i n g
: on MOVE-LEFT.
: b y t e .t h e n
b i t u d;psxoe. uat tl
x c h bg y t pe t[rd i l
add
b a c kt e r me r:raodr jbups .t s i a d d
mask
bt yph tifiexsoner l s
.a1
: l o a dl a t c h e sa n dw r i t ep i x e l s ,w i t hb i t
mask
: p r e s e r v i n go t h e rl a t c h e db i t s .B e c a u s e
; s e t / r e s e ti se n a b l e df o r
all p l a n e s ,t h e
: v a l u ew r i t t e na c t u a l l yd o e s n ' tm a t t e r
di.EVGALSCREEN-WIOTH_IN-BYTES
;increment Y c o o r d i n a t e
down
: on MOVE-LEFT),
MoveXCoord:
bp.bx add
i f MOVELLEFT
a hr.o1l
else
a hr .o1r
endi f
Nj necx t P i x e l
; i n c r e m e n te r r o rt e r m
same b y t e , no need t o
memory y e t
mask pf oitxbhrieyinslt se .
: m o d i f yd i s p l a y
dbox;ui st.ptae lt
x c bh ygpCt tedr i 1 , a l
673
; l o a dl a t c h e sa n dw r i t ep i x e l s ,w i t hb i t
; p r e s e r v i n go t h e rl a t c h e db i t s .B e c a u s e
: s e t l r e s e ti se n a b l e df o ra l lp l a n e s ,t h e
; v a l u ew r i t t e na c t u a l l yd o e s n ' tm a t t e r
MoveToNextByte:
i f MOVE-LEFT
l e f tdec
boy ti en ; pni siexdxei tl
else
di
inc
endi f
ResetBitMaskAccumulator:
sub
a1 .a1
NextPixel :
or
a1
ptni:hptx,aehteaiedoxehldet l
mask
; n e x tp i x e li si nb y t et or i g h t
; r e s e tp i x e l
maskaccumulator
mask
; accumulator
1 oop
LineLoop
: W r i t et h ep i x e l si nt h ef i n a lb y t e .
LinelEnd:
dbox;iuust.ptae lt
x c hb gypt[ edt ri ] . a l
endm
mask
; Macro t ol o o pt h r o u g hl e n g t ho fl i n e ,d r a w i n ge a c hp i x e li nt u r n .
: Used f o rc a s eo fD e l t a X
;
Input:
<
DeltaY.
MOVE-LEFT: 1 i f D e l t a X < 0 . 0 e l s e
AL: p i x e l mask f o r i n i t i a l p i x e l
EX: I D e l t a x I
DX: a d d r e s so f GC d a t a r e g i s t e r . w i t h i n d e x r e g i s t e r s e t t o
i n d e x o f B i t Mask r e g i s t e r
S I : DeltaY
ES:DI: d i s p l a y memory a d d r e s so fb y t ec o n t a i n i n gi n i t i a l
pixel
LINE2
macro
MOVE-LEFT
Ll oi nceaLl o o p .
MoveYCoord.
ETermAction.
LineEEnd
mov
cx,si
;# o f p i x e l s i n l i n e
:done i f t h e r e a r e n o m o r e p i x e l s
jcxz
LineEEnd
bx.1 shl
;DeltaX * 2
mov
bp.bx
; e r r o rt e r m
bp.si sub
: e r r o rt e r ms t a r t sa tD e l t a X
* 2 - DeltaY
ssi h l
,I
;DeltaY * 2
bx.si sub
:DeltaX * 2 - DeltaY * 2 (used i n l o o p )
add
. b sx i
;DeltaX * 2 (used i nl o o p )
: Setupinitialbit
mask & w r i t e i n i t i a l p i x e l .
dx,al out
x c hb gypt[ edt ri ] . a h
: l o a dl a t c h e sa n dw r i t ep i x e l ,
w i t h b i t mask
: p r e s e r v i n go t h e rl a t c h e db i t s .B e c a u s e
: s e t / r e s e ti se n a b l e df o ra l lp l a n e s ,t h e
: v a l u ew r i t t e na c t u a l l yd o e s n ' tm a t t e r
674
Chapter 35
LineLoop:
: See i f i t ' s t i m e t o
bp.bp and
E T e r m Aj ncst i o n
bp.si add
s jhmo pr t
ETermAction:
a d v a n c et h e
X c o o r d i n a t ey e t .
MoveYCoord
;see i f e r r o rt e r mi sn e g a t i v e
;no.advance
X coordinate
: i n c r e m e n te r r o rt e r m
& keep same
; X coordinate
: Move p i x e l mask o n ep i x e l( e i t h e rr i g h to rl e f t ,d e p e n d i n g
: on MOVE-LEFT). a d j u s t i n gd i s p l a y memory a d d r e s s when p i x e l maskwraps.
i f MOVELLEFT
rol
sbb
a1 .1
ror
di.0 adc
endi f
dx.al out
bp.bx add
a1 .1
di.0
else
; s e t new b i t mask
; a d j u s te r r o rt e r mb a c k
down
: Advance Y c o o r d i n a t e .
MoveYCoord:
add
di.EVGALSCREENLWIDTHLINLBYTES
; W r i t et h en e x tp i x e l .
x c hb gypt[ edt ri l . a h
: l o a dl a t c h e sa n dw r i t ep i x e l ,w i t hb i t
mask
: p r e s e r v i n go t h e rl a t c h e db i t s .B e c a u s e
: s e t / r e s e ti se n a b l e df o ra l lp l a n e s ,t h e
: v a l u ew r i t t e na c t u a l l yd o e s n ' tm a t t e r
1 oop
L i ne2End:
endm
L i neLoop
.................................................................
drawing
; Line
.................................................................
pub1 ic -EVGALi ne
-EVGALi ne
n e aprr o c
bp
push
mov
bp.sp
si
push
di
push
ds
push
; P o i n t DS t o d i s p l a y
mov
mov
; p r e s e r v er e g i s t e rv a r i a b l e s
memory.
; S e t h eS e t / R e s e ta n d
; t h es e l e c t e dc o l o r .
S e t / R e s e tE n a b l er e g i s t e r sf o r
675
mov
mov
out
inc
mov
out
dec
mov
out
inc
mov
out
dx.GC-INDEX
a1 .SET-RESET-I NDEX
dx,al
dx
a1 .[bp+Colorl
dx.al
dx
a1 .ENAELE-SET- RESET-INDEX
dx,al
dx
a1 , O f f h
dx.al
: G e tD e l t a Y
mov
mov
s i , [bp+Y11
ax, [bp+YD]
sub
jns
si ,ax
CalcStartAddress
;line Y start
; l i n e Y end,used
later in
; c a l c u l a t i n gt h es t a r ta d d r e s s
;c a l c u l a t e D e l t a Y
; i f p o s i tw
i vesee' ,rte
: D e l t a Y i s n e g a t i v e - - swap c o o r d i n a t e s so w e ' r ea l w a y sw o r k i n g
: w i t h a p o s i t i v eD e l t a Y .
mov
lpt oa sYi t i v e t o : c o n v e r t s i
ax,[bp+Y11
;set
mov
dx, [bp+XO]
xchg
[bp+X11
dx.
mov
[bp+XO]
;swap .dx
neg
: C a l c u l a t et h es t a r t i n ga d d r e s s
: H a r d w i r e df o r a s c r e e n w i d t h o f
CalcStartAddress:
a x . 1s h l
ax.1
shl
ax.1
shl
ax.1
shl
mov
d i .ax
ax.1
shl
ax.1
shl
di,ax
add
mov
d x , [bp+XOI
moval:sosiwed, detelr c l
c1.7
and
dx.1 shr
dx.1 shr
(cXo0l /u8m) aondf dbr ey st es: g e td x . 1 s h r
adds lt;iano,refodtffxsdei t
: S e tu p
; S e tu pp i x e l
676
X coordinates
i n d i s p l a y memory o f t h e l i n e .
80 b y t e s .
:YO
:YO
:YO
Chapter 35
*
*
:YO
:YO
:YO
:YO
*
*
2 ;a
Y Ol r ei sa d y
i n AX
4
8
16
32
64
80
m a s k i n:g p i x e l
GC I n d e x r e g i s t e r t o p o i n t t o t h e B i t
dx,GC-INDEX
mov
mov
dx.al out
; l e a dv ex i n c
Y 1 .u sfeo r
lsi tnoaer t
: i n c a l c u l a t i n gt h es t a r ta d d r e s s
3 bo if t s
c o fl ou rm n
i n d issepgl amye n t
Mask r e g i s t e r .
al.EIT-MASK-INDEX
DX t ph oe itnot i n g
mask ( i n - b y t ep i x e la d d r e s s ) .
GCr eDgai st at e r
mov
shr
al.80h
a1 . c l
: C a l c u l a t eD e l t a X .
mov
bx.[bp+XOI
sub
bx.[bp+Xl]
: H a n d l ec o r r e c to n e
js
cmp
O c t a n t lj b
: DeltaX
NegDel t a x
sbix .
>-
DeltaY
LINEl
jmp
: DeltaY
>
o f f o u ro c t a n t s
>-
0.
0
EVGALi neDone
DeltaX
>-
0.
Octantl:
LINE2
s hj mo pr t
NegDel t a x :
neg
bx
cmp
Octant2 jb
: IDeltaXl
EVGALineDone
bx.si
>-
LINEl
s jhmo pr t
: IDeltaXl
<
: I Dt ae xl
I
<
DeltaYandDeltaX
0.
1
EVGALineDone
DeltaYandDeltaX
<
0.
Octant2:
LINE2
EVGALi neDone:
: R e s t o r e EVGA s t a t e .
mov
a1 . O f f h
o u t B i t : s de xt . a l
dec
dx
al.ENABLE-SET-RESET-INDEX
mov
out
dx.al
in c
dx
a1 .a1
sub
S ne at:r/sbeR
t deolgexti s.saet elt r
out E
POP
POP
POP
POP
ret
-EVGALi ne
MaskO frfehgt oi s t e r
ds
di
si
bP
endp
end
677
Previous
Home
An explanation of the workings of the code inListing 35.3 would be a lengthy one,
and would be redundantsince the basic operation of the code in Listing 35.3 is no
different from that
of the code in
Listing 35.1,although the implementation
is much
changed dueto the natureof assemblylanguage and also due to designing for speed
rather thanfor clarity. Giventhat you thoroughly understand theC implementation
in Listing 35.1, the assembly language implementation in Listing 35.3, which is
well-commented, should speak for itself.
One point I do want to make is that Listing 35.3 incorporates a clever notion for
which credit is due Jim Mackraz, who described the notion in a letter written in
response to an article I wrote long agoin the late and lamentedProgrammersJournul. Jims suggestion was that when drawing lines for whichIDeltaXl is greater than
IDeltaYI, bits set to 1 for eachof the pixels controlled by a given byte can be accumulated in a register, rather than drawing each pixel individually. All the pixels
controlled by that byte can then be drawn at once, with a single access to display
memory, when all pixel processing associated with that byte has been completed.
This approach can save many OUTSand many display memory reads and writes
when drawing nearly-horizontal lines, and thats important because EGAs and VGAs
hold the CPU up for a considerable period of time on each 1/0 operation and
display memory access.
All too many PC programmers fall into thehigh-level-language trap of thinking that
a good algorithm guarantees good performance. Not so: As our two implementations of Bresenhams algorithm graphically illustrate (pun notoriginally intended,
but allowed to stand once recognized), truly great PC code requires both a good
algorithm and a good assembly implementation. In Listing 35.3,weve got bothand my-oh-my, isnt it fun?
678
Chapter 35
Next
Previous
chapter 36
the good, the bad, and the run-sliced
Home
Next
68 1
*WorkingScreenPtr
i f (XDelta > 0)
i++)
- Color;
WorkingScreenPtr++:
else
(
WorkingScreenPtr--:
Here, the programmer knows which way the line is going before the main loop begins-but nonetheless performs that test every time through the loop, when
calculating the address of the next pixel. Far better to perform the test only once,
outside the loop, as shown here:
i f (XDelta
>
f o r (i-0:
0)
i<RunLength: i++)
*WorkingScreenPtr++
- Color:
else
{
f o r( i - 0 :i < R u n L e n g t h :
*WorkingScreenPtr--
i++)
Color:
Think of it this way: A program is a state machine. It takes a set of inputs and produces a corresponding
set of outputs by passing through aset of states. Your primary
job as a programmeris to implement the desired state machine. Your additionaljob
as a performance programmeris to minimize the lengths of the paths through the
682
Chapter 36
state machine. This means performing as many tests and calculations as possible
outside the loops, so that the loops themselves can do as little work-that is, pass
through as few states-as possible.
Which brings us full circle to Bresenham's run-length slice line-drawing algorithm,
which just happensto be an excellent example of a minimized statemachine. In case
you're fuzzy on the good/bad performancething, that's "good"-as in fast.
'.\
"""."""""""""""""..""""..""""..""~
Midway points
between pixels
along minor axis
__"""."_"""."""".""". ___""".""""".
\\//
683
The run-length slice algorithm rotates matters 90 degrees, with salubrious results.
The basis of the run-length slice algorithm is stepping one pixel at a time along the
minor axis (the shorter dimension), while maintaining an integer error term indicating
how closethe lineis toadvancing an extrapixel along themajor axis, as illustrated by
Figure 36.2.
Consider this: When youre called upon to draw a line with an Xdimension of 35
and a Y-dimension of 10, you have a great deal of information available, some of
which is ignored by standard Bresenhams. In particular, because the slope is between 1/3 and 1/4, you knowthat every single run-a run being a set
of pixels at the
same minor-axis coordinate-must be
either three or four pixels long. No other
length is possible, as shown inFigure 36.3 (apart from thefirst and last runs, which
are special casesthat Ill discuss shortly). Therefore, forthis line, theres no need to
perform an error-term calculation and test for each pixel. Instead, we can just perform one test per run, to see whether the run is three or four pixels long, thereby
eliminating about 70 percent of the calculations in drawing this line.
Take a moment to let the idea behind run-length slice drawing soak in. Periodic decisions must be made to control pixel placement. The key to speed is to make those
decisions as infrequently and as quickly as possible. Of course, it will work to make a
decision at each pixel-thats standard Bresenhams. However, most of those per-pixel
decisions are redundant, and infact we have enough information before we begin
drawing to know whichare the redundant
decisions. Run-length slice drawing
is exactly
equivalent to standard Bresenhams, but it pares thedecision-making process down
to a minimum.Its somewhat analogousto the difference between finding the greatest
common divisor of two numbers using Euclids algorithm and finding it by trying
/
after each step, to see
whether to draw
RUNLENGTH or
RUNLENGTH+l pixels
along the major axis.
684
Chapter
36
Error terms
(cumulative partial pixels)
at ends of runs \
I""""_
(x
685
,io\o
/&
6
I
0 0 0
"""""_
;-
~.
."""""""""
*
I
,
i -,""
"_
- 1
"""""__:".~
'"""I
im
*io 0 6 0 0
m o o o o o ~ o \ o
0
""""..""""r"I
#I
Cumulative error
term > 1, so draw
an extra pixel
686
Chapter
36
possibilities for the length of the next run would be three and four, so they could
increment the error term, then jump directly to the appropriate one of the two
routines depending on whether the error
term turned over. Properly implemented,
it should be possible to reduce theaverage per-run overhead of line drawing to less
than one branch, with only two additions and two tests (the number of runs must
also be counted down),plus a subtraction half the time. On a 486, this amounts to
something on the orderof 150 nanoseconds of overhead per pixel, exclusive of the
time required to actually writethe pixel to display memory.
Thats good.
The second and more difficult detail is balancing the runs so that theyre centered
around theideal line, and therefore draw the same pixels that standardBresenhams
would draw. Ifwe just drew full-length runs from the start, wed end up with an
unbalanced line, as shown in Figure 36.5. Instead, we have to split the initial pixel
plus one full run as evenly as possible between the first and last runs of the line,and
adjust the initial error term appropriately for the initial half-run.
The initial error term is advanced by one-half of the normal per-step fractional advance, because the initial step
is only one-half pixel along the minor axis. This half-step
gets us exactly halfivay betweenthe initial pixel and the next pixel along the minor
axis. All the error-term adjustments are scaled up by two times precisely so that we
can scale up this halvederror term for the initial run by two times, and thereby make
it an integer.
The othertrick here is that if an odd numberof pixels are allocated between the first
and last partial runs, well end up with an odd pixel, since we are unable to draw a
half-pixel. This odd pixel is accounted for by adding half a pixel to the error term.
Thats all there is to run-length slice line drawing; the partial first and last runs are
the only trickypart. Listing 36.1 is a run-length slice implementation in C. This is not
an optimized implementation, nor is it meant to be;this listing is provided so that
you can see how the run-length slice algorithm works. In the nextchapter, Ill move
on to an optimized version, but for now, Listing 36.1 will make it much easier to
grasp the principles of run-length slice drawing, and to understand the optimized
code Ill present in the next chapter.
The Good, the Bad, and the Run-Sliced
687
/ * R u n - l e n g t hs l i c el i n ed r a w i n gi m p l e m e n t a t i o nf o r
mode 0 x 1 3 .t h e
3 2 0 x 2 0 02 5 6 - c o l o r
mode. N o to p t i m i z e d !T e s t e dw i t hB o r l a n d
C++ i n
t h es m a l m
l odel.
*/
VGAs
# d e f i n e SCREEN-WIDTH
320
# d e f i n e SCREEN-SEGMENT
OxAOOO
v o i dD r a w H o r i z o n t a l R u n ( c h a rf a r* * S c r e e n P t r ,i n tX A d v a n c e ,i n tR u n L e n g t h .
intColor):
v o i dD r a w V e r t i c a l R u n ( c h a r far **ScreenPtr. i n t XAdvance. i n t RunLength.
intColor);
/* Drawsa l i n e b e t w e e nt h es p e c i f i e de n d p o i n t si nc o l o rC o l o r .
*/
v o i dL i n e D r a w ( i n tX S t a r t .i n tY S t a r t .i n t
XEnd. i n t YEnd. i n t C o l o r )
I.
688
i n t Temp. AdjUp.AdjDown.ErrorTerm.XAdvance.XDelta.YDelta;
i n t W h o l e s t e p .I n i t i a l P i x e l C o u n t .F i n a l P i x e l C o u n t .
i. RunLength:
c h a rf a r* S c r e e n P t r :
Chapter 36
/* W e ' l la l w a y sd r a wt o pt ob o t t o m ,t or e d u c et h e
number o f cases we have t o
handle,and
t o make l i n e sb e t w e e nt h e
same e n d p o i n t sd r a wt h e
same p i x e l s */
i f ( Y S t a r t > YEnd) {
Temp
YStart:
YEnd;
YStart
YEnd
Temp;
Temp
XStart;
XStart
XEnd;
Temp;
XEnd
-- --
/* P o i n tt ot h eb i t m a pa d d r e s sf i r s tp i x e lt od r a w
*/
ScreenPtr
MK-FP(SCREEN_SEGMENT. Y S t a r t * SCREEN-WIDTH + X S t a r t ) :
/ * F i g u r eo u tw h e t h e rw e ' r eg o i n gl e f to rr i g h t ,a n d
g o i n g h o r i z o n t a l l y */
i f ((XDelta
XEnd - X S t a r t )
{
<
how f a r w e ' r e
0)
XAdvance
-1;
XDelta
-XDelta:
else
XAdvance
1;
/ * F i g u r e o u t how f a r w e ' r e g o i n g v e r t i c a l l y
YDelta
YEnd - Y S t a r t ;
*/
S p e c i a l - c a s eh o r i z o n t a l ,v e r t i c a l .a n dd i a g o n a ll i n e s .f o rs p e e d
and t oa v o i dn a s t yb o u n d a r yc o n d i t i o n sa n dd i v i s i o nb y
(XDelta
0)
I* V e r t i c a l l i n e
f o r( i - 0 i; < - Y D e l t a ;
*I
i++)
*ScreenPtr
Color;
S c r e e n P t r +- SCREEN-WIDTH;
return;
(YDelta
0)
/ * H o r i z o n t a ll i n e
f o r( i - 0 ;i < - X D e l t a :
{
*ScreenPtr
ScreenPtr
return;
(XDelta
*/
i++)
- Color;
+-
XAdvance;
YDelta)
/ * D i a g o n a ll i n e
f o r( i - 0 i: < - X D e l t a ;
0 */
*I
i++)
*ScreenPtr
Color;
S c r e e n P t r +- XAdvance
SCREEN-WIDTH;
return;
689
/*
D e t e r m i n ew h e t h e rt h el i n ei s
i f ( X D e l t a >- Y D e l t a )
{
/ * X m a j o rl i n e
/*
*/
X o r Y m a j o r ,a n dh a n d l ea c c o r d i n g l y
*/
Minimum # o f p i x e l s i n
a runinthisline
WholeStep
XDelta / Y D e l t a :
*/
/*
E r r o rt e r ma d j u s te a c ht i m e
Y s t e p sb y
1: used t o t e l l when one
e x t r ap i x e ls h o u l d
bedrawnas
p a r t o f a r u n ,t oa c c o u n tf o r
f r a c t i o n a ls t e p sa l o n gt h e
X a x i sp e r1 - p i x e ls t e p sa l o n g
Y */
AdjUp
( X D e l t a % Y D e l t a ) * 2:
/*
E r r o rt e r ma d j u s t
when t h e e r r o r t e r m t u r n s o v e r , u s e d t o f a c t o r
o u t t h e X s t e p made a t t h a t t i m e
*I
AdjDown
Y D e l t a * 2:
/*
I n i t i a le r r o rt e r m :r e f l e c t s
an i n i t i a l s t e p
a x i s */
ErrorTerm
(XDelta % YDelta) - (YDelta * 2 ) ;
o f 0 . 5 a l o n gt h e
/*
The i n i t i a l and l a s tr u n sa r ep a r t i a l ,b e c a u s e
Y a d v a n c e so n l y
f o rt h e s er u n s ,r a t h e rt h a n
1. D i v i d eo n ef u l lr u n ,p l u st h e
i n i t i a lp i x e l ,
b e t w e e nt h ei n i t i a l
and l a s tr u n s */
InitialPixelCount
(Wholestep / 2 ) + 1:
FinalPixelCount
InitialPixelCount:
0.5
/*
I f t h eb a s i cr u nl e n g t hi se v e n
and t h e r e ' sn of r a c t i o n a l
advance, we h a v eo n ep i x e lt h a tc o u l dg ot oe i t h e rt h ei n i t i a l
o rl a s tp a r t i a lr u n ,w h i c hw e ' l la r b i t r a r i l ya l l o c a t et ot h e
l a s t r u n */
i f ((AdjUp
0 ) && ((WholeStep & 0x01)
0))
--
InitialPixelCount--:
/*
I f t h e r e ' r e an oddnumber
o fp i x e l sp e rr u n ,
we have 1 p i x e l t h a t c a n ' t
be a l l o c a t e d t o e i t h e r t h e i n i t i a l o r l a s t p a r t i a l r u n .
s o w e ' l l add 0 . 5
t oe r r o rt e r m
s o t h i s p i x e l will b eh a n d l e db yt h en o r m a lf u l l - r u nl o o p
*/
i f ( ( W h o l e s t e p & 0x01) !- 0 )
E r r o r T e r m +- Y D e l t a :
3
I* Draw t h e f i r s t , p a r t i a l r u n o f p i x e l s
D r a w H o r i z o n t a l R u n ( & S c r e e n P t r . XAdvance.
/ * Draw all f u l lr u n s
*/
f o r( i - 0 :i < ( Y D e l t a - 1 ) ;
i++)
*/
I n i t i a l P i x e l C o u n t ,C o l o r ) ;
RunLength
Wholestep:
/ * r u ni sa tl e a s tt h i sl o n g
an e x t r a p i x e l
/ * A d v a n c et h ee r r o rt e r ma n da d d
t e r m so i n d i c a t e s */
i f ( ( E r r o r T e r m +- AdjUp) > 0 )
RunLength++;
E r r o r T e r m -- AdjDown;
/*
r e s e t h ee r r o rt e r m
*/
i f t h ee r r o r
*/
/*
Draw t h i s s c a n l i n e ' s r u n
*/
DrawHorizontalRun(&ScreenPtr. XAdvance.RunLength.Color):
3
/ * Draw t h e f i n a l r u n o f p i x e l s
*/
DrawHorizontalRun(&ScreenPtr, X A d v a n c eF, i n a l P i x e l C o u n tC
. olor):
return:
690
Chapter 36
else
{
/* Y m a j o rl i n e
*/
/* Minimum # o f p i x e l s i n a r u n i n t h i s l i n e
Wholestep = Y D e l t a / X D e l t a :
*/
/ * E r r o rt e r ma d j u s te a c ht i m e
X s t e p sb y
1: used t o t e l l when 1 e x t r a
p i x e ls h o u l db ed r a w n
as p a r t o f a r u n .t oa c c o u n tf o r
Y a x i sp e r1 - p i x e ls t e p sa l o n g
X */
f r a c t i o n a ls t e p sa l o n gt h e
AdjUp = ( Y D e l t a % X D e l t a ) * 2 ;
/ * E r r o rt e r ma d j u s t
when t h e e r r o r t e r m t u r n s o v e r , u s e d t o f a c t o r
*/
o u t t h e Y s t e p made a t t h a t t i m e
AdjDown = X D e l t a * 2 :
/ * I n i t i a le r r o rt e r m :r e f l e c t si n i t i a ls t e po f
ErrorTerm
(YDelta % XDelta)
(XDelta
0 . 5 a l o n gt h e
X axis
*/
2):
/*
The i n i t i a l and l a s tr u n sa r ep a r t i a l ,b e c a u s e
X a d v a n c e so n l y0 . 5
f o rt h e s er u n s ,r a t h e rt h a n
1. D i v i d eo n ef u l lr u n .p l u st h e
i n i t i a lp i x e l ,
b e t w e e nt h ei n i t i a la n dl a s tr u n s
*/
I n i t i a l P i x e l C o u n t = ( W h o l e s t e p / 2) + 1:
FinalPixelCount = InitialPixelCount:
/*
I f t h eb a s i cr u nl e n g t hi se v e n
and t h e r e ' sn of r a c t i o n a la d v a n c e .
go t o e i t h e r t h e i n i t i a l o r l a s t p a r t i a l r u n ,
have 1 p i x e lt h a tc o u l d
w h i c hw e ' l la r b i t r a r i l ya l l o c a t et ot h el a s tr u n
*/
i f ( ( A d j U p == 0 ) && ( ( W h o l e s t e p & 0 x 0 1 )
0))
we
--
InitialPixelCount--;
/*
I f t h e r ea r e
anoddnumber
o fp i x e l sp e rr u n ,
we h a v eo n ep i x e l
t h a t c a n ' t be a l l o c a t e d t o e i t h e r t h e i n i t i a l o r l a s t p a r t i a l
r u n , s o w e ' l l add0.5
t ot h ee r r o rt e r m
s o t h i s p i x e l will be
h a n d l e db yt h en o r m a lf u l l- r u nl o o p
*/
i f ( ( W h o l e s t e p & 0x01) != 0 )
[
/*
E r r o r T e r m += XDel t a :
Draw t h e f i r s t , p a r t i a l r u n o f p i x e l s
*/
DrawVerticalRun(&ScreenPtr. X A d v a n c e I. n i t i a l P i x e l C o u n t C
. olor):
/ * Draw a l l f u l l r u n s
f o r( i = O ;i < ( X D e l t a - 1 ) :
(
*/
i++)
RunLength = WholeStep:
/* r u n i s a tl e a s tt h i sl o n g
*/
A d v a n c et h ee r r o rt e r ma n da d da ne x t r ap i x e l
i f t h ee r r o r
term so i n d i c a t e s */
i f ( ( E r r o r T e r m +- AdjUp) > 0 )
/*
1
RunLength++;
E r r o r T e r m - = AdjDown:
I
/ * Draw t h i s s c a n l i n e ' s r u n
*/
/ * r e s e t h ee r r o rt e r m
*/
D r a w V e r t i c a l R u n ( & S c r e e n P t r . XAdvance,RunLength.Color):
/ * Draw t h e f i n a l r u n o f p i x e l s
DrawVerticalRun(&ScreenPtr.
*/
X A d v a n c eF. i n a l P i x e l C o u n tC
, olor):
return:
691
I* Draws a h o r i z o n t a lr u no fp i x e l s ,t h e na d v a n c e st h eb i t m a pp o i n t e rt o
t h ef i r s tp i x e lo ft h en e x tr u n .
*I
far * * S c r e e n P t r .i n t
XAdvance.
v o i dD r a w H o r i z o n t a l R u n ( c h a r
i n t RunLength. i n t C o l o r )
{
i n t i:
c h a rf a r* W o r k i n g S c r e e n P t r
f o r( i - 0 ;i < R u n L e n g t h ;
{
*ScreenPtr;
i++)
*WorkingScreenPtr
Color:
WorkingScreenPtr +- XAdvance;
I* Advance t o t h e n e x t s c a n l i n e
*I
WorkingScreenPtr +- SCREEN-WIDTH;
*ScreenPtr
WorkingScreenPtr;
/*
Drawsa
v e r t i c a lr u no fp i x e l s ,t h e na d v a n c e st h eb i t m a pp o i n t e rt o
t h ef i r s tp i x e lo ft h en e x tr u n .
*I
v o i dD r a w V e r t i c a l R u n ( c h a rf a r* * S c r e e n P t r .i n t
XAdvance.
i n t RunLength. i n t C o l o r )
{
i n t i:
c h a rf a r* W o r k i n g S c r e e n P t r
f o r (i-0;
i<RunLength; i++)
*WorkingScreenPtr
WorkingScreenPtr
*ScreenPtr;
+-
Color;
SCREEN-WIDTH:
I* Advance t o t h e n e x t c o l u m n
*I
WorkingScreenPtr +- XAdvance;
*ScreenPtr
WorkingScreenPtr:
Notwithstanding that its not optimized, Listing 36.1 is reasonably fast. If you run
Listing 36.2 (a sample linedrawing program that you can use totestdrive Listing 36.1),
you may be as surprised as I was at how quicklythe screenfills with vectors, considering thatListing 36.1 is entirely inC and has some redundant divides. Or perhaps you
wont be surprised-in which case I suggest you not miss the next chapter.
LISTING 36.2
136-2.C
I* Sample l i n e - d r a w i n gp r o g r a m .
Uses t h eo p t i m i z e d
l i n e - d r a w i n gf u n c t i o n sc o d e di nL L i s t i n g
L36.1.C.
T e s t e dw i t hB o r l a n d
C++ i nt h es m a l lm o d e l .
*I
#i
n c l ude <dos. h>
#define
#define
#define
# d e f ine
#define
GRAPHICS-MODE
TEXT-MODE
BIOS-VIDEO-INT
X-MAX
Y-MAX
0x13
0x03
Ox10
320
200
e x t e r nv o i dL i n e D r a w ( i n tX S t a r t .i n t
692
Chapter 36
/ * w o r k i sn cgr e w
e ni d t h
/*
w o r k i sn cg r e he en i g h t
YStart.
*I
*/
i n t XEnd. i n t YEnd. i n t C o l o r ) ;
Previous
I* S u b r o u t i n et od r a w
a r e c t a n g l ef u l lo fv e c t o r s ,o ft h es p e c i f i e d
* l e n g t h and c o l o r ,a r o u n dt h es p e c i f i e dr e c t a n g l ec e n t e r .
*I
voidVectorsUp(XCenter,YCenter.XLength.YLength.Color)
i n t XCenter.YCenter:
I* c e n t e ro fr e c t a n g l et o
fill * I
i n t XLength.YLength:
I* d i s t a n c ef r o mc e n t e rt o
edge o fr e c t a n g l e
int Color;
I* c o l o r t o draw1ines i n * I
Home
Next
*I
i n t WorkingX.WorkingY;
I* l i n e s f r o m c e n t e r t o t o p
o f rectangle *I
WorkingX
XCenter - XLength:
YCenter
YLength;
WorkingY
f o r ( ; WorkingX < ( XCenter + XLength 1: WorkingX++
--
LineDraw(XCenter.YCenter.WorkingX.WorkingY.Color);
--
I* l i n e s f r o m c e n t e r t o r i g h t o f r e c t a n g l e
*I
WorkingX
XCenter + XLength - 1 ;
WorkingY
YCenter - YLength;
f o r ( ; WorkingY < ( YCenter + YLength ) ; WorkingY++ )
LineDraw(XCenter.YCenter.WorkingX.WorkingY.Color);
--
I* l i n e s f r o m c e n t e r t o b o t t o m o f r e c t a n g l e
*/
WorkingX
XCenter + XLength - 1:
WorkingY
YCenter + YLength - 1;
f o r ( ; WorkingX >- ( XCenter - XLength 1: WorkingX--
1.
LineDraw(XCenter.YCenter.WorkingX.WorkingY.Color);
--
I* l i n e s f r o m c e n t e r t o l e f t o f r e c t a n g l e
WorkingX
XCenter - XLength;
WorkingY
YCenter + YLength - 1 ;
f o r ( ; WorkingY >- ( YCenter - YLength
*I
);
WorkingY--
LineDraw(XCenter.YCenter.WorkingX.WorkingY.Color);
I* Sampleprogram
int main0
t o d r a wf o u rr e c t a n g l e sf u l lo fl i n e s .
*I
u n i o n REGS r e g s ;
mode */
regs.x.ax
GRAPHICS-MODE;
int86(BIOS-VIDEO-INT.®s.®s);
I* S e t g r a p h i c s
I* Draw each o f f o u r r e c t a n g l e s f u l l
o f vectors *I
VectorsUp(X-MAX I 4 . Y-MAX I 4 . X-MAX I 4 . Y-MAX I 4 . 1);
VectorsUp(X-MAX * 3 1 4 . Y-MAX / 4 . X-MAX 1 4 . Y-MAX / 4 . 2 ) ;
VectorsUp(X-MAX I 4 . Y-MAX * 3 I 4 . X-MAX I 4 . Y-MAX I 4 , 3 ) ;
VectorsUp(X-MAX * 3 I 4 . Y-MAX * 3 I 4 , X-MAX I 4 . Y-MAX I 4 . 4 ) ;
I* Wait f o r akey
getch( ) :
t o bepressed
*I
I* R e t u r n b a c k t o t e x t
mode * I
regs.x.ax
TEXT-MODE;
int86(BIDS-YIDED-INT.®s,®s):
}
693
Previous
chapter 37
dead cats and lightning lines
Home
Next
As I writethis, th
transcontinental
697
substitute that cat, and since all cats treat all humans with equal disdain, the owner
would never know the difference, and would never suffer the trauma of the loss of
her cat. So FOF drove home, got his cat, put it in the kennel, and waited for the
owner to showup-at which point, she took one look at the kennel andsaid, This
isnt my cat. My cat is dead.
As it turned out,she had shipped her recently deceased feline home to be buried.
History does not recordhow our FOF dug himself out of this one.
Okay, but whats the point? The pointis, if it isnt broken, dont fix it. And if it is
broken, maybe thats all right, too. Which brings us, neat as a pin, to the topic of
drawing lines in a serious hurry.
137-1 .ASM
F a s tr u n - l e n g t hs l i c el i n ed r a w i n gi m p l e m e n t a t i o n
f o r mode 0 x 1 3 .t h e VGAs
3 2 0 x 2 0 02 5 6 - c o l o r
mode.
Draws a l i n e b e t w e e nt h es p e c i f i e de n d p o i n t si nc o l o rC o l o r .
C n e a r - c a l l a b l ea s :
v o i dL i n e D r a w ( i n tX S t a r t .i n tY S t a r t .i n t
XEnd, i n t YEnd. i n tC o l o r )
: T e s t e dw i t h TASM
;
;
;
;
;
SCREEN-WIDTH
SCREENKSEGMENT
.model
small
.code
db
; Parameters t o c a l l .
parms
struc
dw
?
?
dw
XStart
dw
?
YStart
dw
?
XEnd
dw
?
?
YEnd
dw
Color
?
?
db
parmsends
; L o c a vl a r i a b l e s .
AdjUp
AdjDown
equ
l e n g t; h
m
r ui n i m -u6mW heoqlue S t e p
XAdvance
LOCAL-SIZE
pub1 ic -Li neDraw
698
equ320
equ OaOOOh
Chapter 37
;pushed BP
;pushed r e t u r na d d r e s s
;X s t a r t c o o r d i n a t e o f l i n e
:Y s t a r t c o o r d i n a t e o f l i n e
;X e n dc o o r d i n a t eo fl i n e
; Y e n dc o o r d i n a t eo fl i n e
;colorinwhichtodrawline
;dummy b y t eb e c a u s eC o l o r
i s r e a l l y a word
e q-;u2e r r ot er r a
md j u suotpena cahd v a n c e
-4
; e traer odr m
rj u s t
equ - 8
equ 8
when
down
;1 o r -1. f o dr i r e c t i o ni nw h i c h
e trteruoorrm
vrnesr
X advances
_ L i n e D r ap w
rnoeca r
c ld
push
bP
f r a m e s t a c k c a l l e: pr 'rse s e r v e
mo v
bP.SP
f r a ms et a c k o u r
to :point
sp. sub
LOCALLSIZE
v al roi ca:abafllosel prosac ac et e
v a r iCa br e
l egsi s t e r
:push
preserve
si
push
di
serve
ds
push
DS
; W e ' l ld r a wt o pt ob o t t o m ,t or e d u c et h e
number o f cases we have t oh a n d l e ,
: and t o make l i n e sb e t w e e nt h e same e n d p o i n t sa l w a y sd r a wt h e
same p i x e l s .
mov
ax.[bpl.YStart
cmp
ax.[bpl.YEnd
jle
LineIsTopToBottom
ndpoints :swap
Cbp1.YEnd.a~
xchg
mov
Cbp1.YStart.a~
mov
bx.[bpl.XStart
[bpl.XEnd,bx
xchg
mov
Cbp1.XStart.b~
LineIsTopToBottom:
: P o i n t DI t o t h e f i r s t p i x e l t o d r a w .
dx.SCREENLWIDTH
mov
'
: Y S t a r t * SCREEN-WIDTH
mu1
dx
mov
si.[bpl.XStart
mov
,si
di
add
d i ,ax
:DI
Y S t a r t * SCREENKWIDTH + X S t a r t
: offset of initial pixel
: F i g u r eo u t how f a r w e ' r e g o i n g v e r t i c a l l y ( g u a r a n t e e d t o b e p o s i t i v e ) .
mov
cx.[bpl.YEnd
:CX
YDelta
c x , [ b p l . YsSutba r t
: F i g u r eo u tw h e t h e rw e ' r eg o i n gl e f to rr i g h t ,a n d
how f a r w e ' r e g o i n g
: h o r i z o n t a l l y . I n t h ep r o c e s s ,s p e c i a l - c a s ev e r t i c a ll i n e s ,f o rs p e e d
and
: t oa v o i dn a s t yb o u n d a r yc o n d i t i o n sa n dd i v i s i o nb y
0.
mov
dx.Cbpl.XEnd
dx.si
sub
:XDel t a
N o t jVnezr t i c:aXl L
Dienlet a
0 means v e lr itni ce a l
:it i s a v e r t i c a ll i n e
: y e s .s p e c i a lc a s ev e r t i c a l i n e
ax.SCREEN-SEGMENT
mov
mo v : p o i an xt d s ,
DS:DI
tbfoiytr thseett o
draw
mov
a1 . [ b p l . C o l o r
VLoop:
mov
[ d i 1.a1
add
d i .SCREEN-WIDTH
cx
dec
jns
VLOOP
jmp
Done
: S p e c i a l - c a s ec o d ef o rh o r i z o n t a l i n e s .
align
2
IsHorizontalLine:
ax.SCREENKSEGMENT
mov
:point ES:DI t o t h e f i r s t b y t e t o d r a w
rnov
,ax
es
mov
al.[bpl.Color
: d u p l i c a t ei nh i g hb y t e
f o r wordaccess
mov
ah.al
:lefttoright?
bx
bx, and
jns
D i rSet
:yes
:currentlyrighttoleft,pointtoleft
sub
.dx di
: end so we cango l e f t t o r i g h t
: ( a v o i d su n p l e a s a n t n e s sw i t h r i g h tt o
: l e f t REP STOSW)
--
699
D i rSet:
mov
dx
cx,
c x in c
shr
cx.1
stosw
rep
adc
cx, cx
s t orsebp
j mp
Done
: S p e c i a l - c a s ec o d ef o rd i a g o n a ll i n e s .
2
a1 i g n
IsDiagonalLine:
ax.SCREEN-SEGMENT
mov
mov
; p o, ianxt d s
mov
a1 ,[bp] .Col or
add
bx.SCREEN-WIDTH
DLoop:
mov
[ d i 1.a1
add
d i ,bx
cx
dec
jns
DLoop
j mP
Done
a1 i g n
NotVerticalLine:
mov
:# o f p i x e l s t o
draw
many w o r d sa sp o s s i b l e
i f ti hs e r e
b y:do
toed, dt h e
one
DS:DI
fbtitoyrhsteteto
draw
2
bx.1
:assume
r i g ht otl .e f t
s o XAdvance
: * * * l e a v e sf l a g su n c h a n g e d * * *
s e t arl ilLgehfttj,T
no so: lRe if gt h t
l e f t , t o: r i g h bt x n e g
so XAdvance
-1
dx
neg
: I XDel t a I
LeftToRight:
: S p e c i a l - c a s eh o r i z o n t a l i n e s .
and
:YDelta
cx,cx
O?
IsHorizontalLine;yes
Jz
: S p e c i a l - c a s ed i a g o n a l i n e s .
;YDelta
cmp dx
cx,
XDelta?
: y eIss D i a g o n a l L i n e j z
: D e t e r m i n ew h e t h e rt h el i n ei s
X or Y m a j o r ,a n dh a n d l ea c c o r d i n g l y .
cmp
cx dx,
jae
XMa j o r
j mP
YMajor
: X - m a j o r( m o r eh o r i z o n t a lt h a nv e r t i c a l )l i n e .
a1 i g n
2
XMa j o r :
ax.SCREEN-SEGMENT
mov
E S :fb
t DiotyrIhtset o
draw
mov
; p o,ianxt e s
bx
and
,bx
r i g h t ?t o : l e f t
: y e s . CLD i s a l r e a d y s e t
.ins
DFSet
std
;righttoleft,
s o drawbackwards
DFSet:
ta
:XDel
mov
ax.dx
dx.dx
sub
: p r e p a r ef o rd i v i s i o n
d iv
cx
:AX
XDelta/YDelta
: (minimum # o f p i x e l s i n a r u n i n t h i s l i n e )
:DX
XDelta % YDelta
mov
dx
bx,
: e r r o rt e r ma d j u s te a c ht i m e
Y s t e p sb y
1;
bxbx.
: used tt eo l l
when one e xpt isrxaheolbuel d
add
mov
Cbp1.AdjUp.b~
: drawn
as
p oa fr t
a arctuocno.fuonr t
: f r a c t i o n a ls t e p sa l o n gt h e
X a x i sp e r
: 1 - p i x e ls t e p sa l o n g
Y
mov
;ae
td
e.r cjr um
oxsrst i
whentetuertrrhnomesr
700
Chapter 37
; o v e r .u s e dt of a c t o ro u tt h e
X s t e p made a t
add
s i ,si
; t h a tt i m e
mov
[bpl.AdjDown,si
; I n i t i a le r r o rt e r m ;r e f l e c t s
an i n i t i a l s t e p o f
0 . 5 a l o n gt h e Y a x i s .
dx.si
sub
;(XDelta % YDelta) - (YDelta * 2 )
;OX * i n i t i a l e r r o r t e r m
; The i n i t i a l and l a s tr u n sa r ep a r t i a l ,b e c a u s e
Y a d v a n c e so n l y
0.5 f o r
; t h e s er u n s ,r a t h e rt h a n
1. D i v i d e one f u l l r u n , p l u s t h e i n i t i a l p i x e l ,
; b e t w e e nt h ei n i t i a l
and l a s tr u n s .
;SI
YDelta
mov
s i ,c x
l e n g t h )r(u
mov
mni n i ms u
t emp; w h o cl ex . a x
cx.1
shr
; i npcci iotxi
xiuanenlcl t
( ws ht eopl e
/ 2 ) + 1;
; (may
be
a d j u s t e dl a t e r ) .T h i si sa l s ot h e
; f i n a lr u np i x e lc o u n t
;remember
cx push
l a t e rf ocropuinx ter lu nf i n a l
; I f t h eb a s i cr u nl e n g t hi se v e n
and t h e r e ' s no f r a c t i o n a la d v a n c e ,
we have
; one p i x e lt h a tc o u l d
go t o e i t h e r t h e i n i t i a l o r l a s t p a r t i a l r u n . w h i c h
; w e ' l la r b i t r a r i l ya l l o c a t et ot h el a s tr u n .
; I f t h e r ei sa n
oddnumber
o fp i x e l sp e rr u n .
we haveone
p i x e lt h a tc a n ' t
; b ea l l o c a t e dt oe i t h e rt h ei n i t i a lo rl a s tp a r t i a lr u n .
s o w e ' l l add 0 . 5 t o
; t h ee r r o rt e r m
s o t h i s p i x e l will b eh a n d l e db yt h en o r m a lf u l l - r u nl o o p .
add
dx.si
;assume
odd
l e n g t hYDelta
, add
t e r rmotro
; (add 0.5 o f a p i x e l t o t h e e r r o r t e r m )
; i sr u nl e n g t he v e n ?
test
a1 .I
; n o .a l r e a d yd i dw o r kf o r
oddcase,
a l ls e t
XMajorAdjustDone
jnz
dx,si
sub
; l e n g t hi se v e n ,
undoodd
s t u f f we j u s t d i d
bx
bx, and
; i st h ea d j u s t
upequal
t o O?
; n o( d o n ' tn e e dt oc h e c kf o ro d dl e n g t h ,
X M a j o r A d j u s t Dj no zn e
; because o ft h ea b o v et e s t )
1
c om
n; dbeiott;it oh ncsx d e c
make ri un ni t i a l
; shorter
XMajorAdjustDone:
mov
[bp].WholeStep,ax;whole
step
(minimum
length)
run
mov
a1 ,.[Cb op l o r
;AL
d r a wc ionl go r
; Draw t h e f i r s t , p a r t i a l r u n o f p i x e l s .
nal
the ;drawstosb rep
(Y)
; a d v a n c ea l o n gt h em i n o ra x i s
add
di,SCREEN-WIDTH
; D r a w a l lf u l lr u n s .
; a r et h e r e morethan 2 scans, s o t h e r ea r e
cmp
s i .1
; some f u l l r u n s ?
(SI # scans - 1)
;no.no
f u l lr u n s
j na
XMajorDrawLast
b yt e;ear rm
dr joudrsxt d e c
-1 so we
use can
; c a r r yt e s t
si shr
.1
s c ac no -utponatsi c;r caf ornonm
vert
j nXcM a j o r F u l l R u n s O d d E n t r y
; i f t h e r ies
an odd
number
socf a n s ,
; do theoddscan
now
XMajorFullRunsLoop:
mov
c x . [ b p l . W h o l e S t e p ; r u n laei tsatshltiosn g
a dadnt;eedarrdm
rvtobharxendcxea, d d
an e x t r a
XM
j nacj o r N o E x t r a
; p i x e l i f etre
hr reom
r
so i n d i c a t e s
i n p i x eel x t r a; o n e c x
in c
d x . [tsbeup
erbm
lr.rAotdrh; jreD
e so ewtn
XMajorNoExtra:
n l i n e ' s c a n t h i s; d r aswt o s b r e p
(Y)
add
di.SCREEN-WIDTH
; a dm
vaaitxnlho
i osecnreg
X M a j o r F u l l R u n s O d d E n; ehtl rnoeytor:eepr
i f t h e r e i s annumber
odd
; o ff u l lr u n s
mov
c x . [ b p l . W h o l e S t e p ; r u n i s a t l e a s tt h i sl o n g
bx
dx, add
; a d v a n c et h ee r r o rt e r ma n da d da ne x t r a
; p i x e l i f t h ee r r o rt e r m
so indicates
XMajorNoExtraE
j nc
Dead CatsandLightningLines
701
in
p i x e lx t r ;ao n e
cx
in c
d x . [tsbeup
erbm
lr.rAotdrh: jreD
e so ewtn
XMajorNoExtraZ:
stosb
rep
add
di.SCREEN-WIDTH
:draw t h i s s c a nl i n e ' sr u n
: a d v a n c ea l o n gt h em i n o ra x i s
dec
si
X M a j o r F u l l R u njsnLzo o p
: Draw t h e f i n a l r u n o f p i x e l s .
XMajorDrawLast:
POP
cx
stosb
rep
g e tb a c kt h ef i n a lr u np i x e ll e n g t h
:'d r a wt h ef i n a lr u n
c ld
jmp
Done
: Y - m a j o r( m o r ev e r t i c a lt h a nh o r i z o n t a l
align
2
YMajor:
mov
[bpl.XAdvance.bx
mov
ax.SCREENKSEGMEN1
mov
ds ,ax
ax.cx
mov
mov
cx.dx
sub
dx, dx
d iv
cx
mov
add
mov
bx.dx
bx, bx
Cbpl.AdjUp.bx
mov
add
mov
s i ,c x
si ,si
[bpl.AdjDown.si
: I n i t i a le r r o rt e r m :r e f l e c t s
: ( YdDxe. lstsiau b
(Y)
r e s t o r en o r m a ld i r e c t i o nf l a g
line.
:rememberwhich
way X advances
: p o i n t DS:DI t o t h e f i r s t b y t e t o
draw
:YDelta
:XDel t a
: p r e p a r ef o rd i v i s i o n
:AX = Y D e l t a / X D e l t a
; (minimum # o f p i x e l s i n
a runinthisline)
:DX = Y D e l t a % X D e l t a
X s t e p sb y
1:
: e r r o rt e r ma d j u s te a c ht i m e
; used t o t e l l whenone
e x t r ap i x e ls h o u l db e
: d r a w na sp a r to f
a r u n ,t oa c c o u n tf o r
: f r a c t i o n a ls t e p sa l o n gt h e
Y a x i sp e r
: 1 - p i x e sl t e p sa l o n g
X
: e r r o rt e r ma d j u s t
when t h e e r r o r t e r m t u r n s
: o v e r ,u s e dt of a c t o ro u tt h e
Y s t e p made a t
: t h a tt i m e
an i n i t i a l s t e p o f
0 . 5 a l o n gt h e X a x i s .
% XDelta) - (XDelta * 2)
:DX = i n i t i a l e r r o r t e r m
: The i n i t i a l and l a s tr u n sa r ep a r t i a l ,b e c a u s e
X a d v a n c e so n l y
0.5 f o r
: t h e s er u n s ,r a t h e rt h a n
1. D i v i d eo n ef u l lr u n ,p l u st h ei n i t i a lp i x e l ,
: b e t w e e nt h ei n i t i a l
and l a s tr u n s .
mov
s i ,c x
:SI
XDelta
l e n g t h )r(mov
umni n i m
s tuemp: w h ocl ex . a x
shr
; i npccioitxii
uaennlct
= (w
s theopl e
/ 2 ) + 1;
: (may b ea d j u s t e dl a t e r )
cx push
late
:remember
fr oc ropui nx e
tr u
l n
final
cx.1
; If
t h eb a s i cr u nl e n g t hi se v e n
and t h e r e ' sn of r a c t i o n a la d v a n c e ,
we have
: one p i x e lt h a tc o u l d
go t o e i t h e r t h e i n i t i a l o r l a s t p a r t i a l r u n , w h i c h
: w e ' l la r b i t r a r i l ya l l o c a t et ot h el a s tr u n .
: I f t h e r e i s anoddnumber
o fp i x e l sp e rr u n ,
we h a v eo n ep i x e lt h a tc a n ' t
; b ea l l o c a t e dt oe i t h e rt h ei n i t i a lo rl a s tp a r t i a lr u n ,
s o w e ' l l add 0 . 5 t o
: t h ee r r o rt e r m
s o t h i s p i x e l will b eh a n d l e db yt h en o r m a lf u l l - r u nl o o p .
dx.si
add
test
Y M a j o r A d j u s t Dj no zn e
dx,si
sub
bx
bx, and
702
Chapter 37
a1 .1
;assumeodd
l e n g t h ,a d dX D e l t at oe r r o rt e r m
: i sr u nl e n g t he v e n ?
; n o .a l r e a d yd i dw o r kf o ro d dc a s e ,a l ls e t
: l e n g t hi se v e n ,u n d oo d ds t u f f
we j u s t d i d
: i st h ea d j u s tu pe q u a lt o
D?
Y M a j o r A d j u s t Dj nozn e
: n o ( d o n ' tn e e dt oc h e c k
cx
; b o t hc o n d i t i o n sm e t :
: shorter
: b e c a u s eo ft h ea b o v et e s t )
dec
YMajorAdjustDone:
mov
[bp].WholeStep.ax
a1
mov
, [.bCpo] l o r
mov
bx.[bpl.XAdvance
: D r a w t h ef i r s t ,p a r t i a lr u no fp i x e l s .
YMajorFirstLoop:
mov
[dil.al
add
di.SCREEN_WIDTH
cx
dec
Y M a j o r F i r s t L oj nozp
add
amx;dai nsditoah,vbrleaoxnncge
: D r a w a l lf u l lr u n s .
CmP
s i .I
YMajorDrawLast
no
jna :no.
b yt e; earrdmrjoudrsx td e c
f o r odd l e n g t h ,
make i n i t i a l r u n
; w h o l es t e p( m i n i m u mr u nl e n g t h )
:AL
d r a w i n gc o l o r
;which way X advances
; d r a wt h ep i x e l
: a d v a n c ea l o n gt h em a j o ra x i s
CY)
(X)
:#
f u ol l f
runs.
2 tm
ht haoeA
nrreree
: columns, so t h e r ea r e some f u l lr u n s ?
: (SI I/columns - 1)
r u nfsu l l
- 1 s o we
use can
: c a r r yt e s t
si shr
.1
jY
n cM a j o r F u l l R u n s O d d E n t r y
YMajorFullRunsLoop:
mov
c[ bxp, l
.Who1 eStep
add
dx.[bpl.AdjUp
YMajorNoExtrajnc
c x in c
dx.[bpl.AdjDown
sub
YMajorNoExtra:
: d r a wt h e run
YMajorRunLoop:
mov
Cdil.al
add
d i ,SCREEN-WIDTH
cx
dec
jnz
YMajorRunLoop
add
d i ,bx
YMajorFullRunsOddEntry:
mo v
add
jnc
inc
sub
YMajorNoExtraZ:
: d r a wt h er u n
YMajorRunLoopE:
mov
add
dec
j nz
add
cx.[bpl.WholeStep
dx.[bpl.AdjUp
YMajorNoExtraZ
cx
dx.Cbpl.AdjDown
Cdil.al
d i ,SCREEN-WIDTH
cx
YMajorRunLoop2
d i bx
dec
si
Y M a j o r F u l l R u nj ns zL o o p
; Draw t h e f i n a l r u n o f p i x e l s .
YMajorDrawLast:
POP
cx
c o l uc: m
ctoofonlruno- m
vpmeanri t
: i f t h e ri es
an odd
number
; columns,dotheoddcolumn
r count
of
now
;run i s a t l e a s tt h i sl o n g
: a d v a n c et h ee r r o rt e r ma n da d da ne x t r a
: p i x e l i f t h ee r r o rt e r m
so indicates
; o n ee x t r ap i x e li nr u n
; r e s e tt h ee r r o rt e r m
: d r a wt h ep i x e l
: a d v a n c ea l o n gt h em a j o ra x i s
CY)
; a d v a n c ea l o n gt h em i n o ra x i s
(X)
: e n t e rl o o ph e r e
i f t h e r e i s an oddnumber
; o ff u l lr u n s
: r u ni sa tl e a s tt h i sl o n g
; a d v a n c et h ee r r o rt e r ma n da d d
an e x t r a
: p i x e l i f t h ee r r o rt e r m
so i n d i c a t e s
;one e x t r a p i x e l i n r u n
; r e s e tt h ee r r o rt e r m
: d r a wt h ep i x e l
; a d v a n c ea l o n gt h em a j o ra x i s
(Y)
: a d v a n c ea l o n gt h em i n o ra x i s
(X)
: g e tb a c kt h ef i n a lr u np i x e ll e n g t h
DeadCatsandLightningLines
703
YMajorLastLoop:
mov
add
cx
dec
Y M a j o r L a s t L o oj npz
Cdi1,al
di.SCREEN-WIDTH
: d r a wt h ep i x e l
: a d v a n c ea l o n gt h em a j o ra x i s
ds
di
si
SP * bP
bP
: r e s t o r ec a l l e r s
(Y)
Done:
POP
POP
POP
mov
POP
ret
-Li
neDraw
endp
end
DS
: r e s t o r e C r e g i s t e rv a r i a b l e s
: d e a l l o c a t el o c a lv a r i a b l e s
: r e s t o r ec a l l e r ss t a c kf r a m e
704
Chapter
37
assembly code was limited to a maximum of two times, which is pretty close to what
Listing 37.1 did, in fact, achieve. When I compared theC and assembly implementations drawing to normal system (nondisplay) memory, I found that the assembly
code was actually four times as fast asthe C code.
In fact, Listing 37.1 draws VGA lines at about 92percent of the maximumpossible
rate in my system-that is, it draws very nearly as fast as the VGA hardware will
allow. All the optimization in the world would get me less than 10 percent faster
line drawing-and only $I eliminated all overhead, an unlikely proposition at
best. The code isn 1fully optimized, but so what?
Now its true that faster linedrawing codewould likelybe more beneficial on faster
VGAs, especially local-bus VGAs, and in slower systems. For that reason, Ill list a
variety of potential optimizations to Listing 37.1. On the other hand,its also true
that Listing 37.1 is capable of drawing lines at a rateof 2.2 million pixels per second
on a 486/ 33, given fast enough VGA memory, so it should be able to drive almost
any non-local-busVGA at nearly full speed. In short,Listing 37.1 is very fast, and, in
many systems,further optimization is basically a waste of time.
Profile before you optimize.
Further Optimizations
Following is a quick tour of some of the many possible further optimizations to
Listing 37.1.
The run-handling loops could be unrolled more than the current two times. However, bear in mind that a two-times unrolling gets more than half the maximum
unrolling benefitwith less overhead than a moreheavily unrolled loop.
BX could be freed up in the Y-major code by breaking out separate loops for X
advances of 1 and -1. DX could be freed up by using AH as the counter for the run
loops, although this would limit the maximum line length that couldbe handled.
The freed registers could be used to keep more of the whole-step and errorvariables
in registers. Alternatively, the freedregisters could beused to implement more esoteric approaches like unrolling the Y-major inner loop; such unrolling could take
advantage of the knowledge that only two run lengths arepossible for any givenline.
Strangely enough, on the 486 it might also be worth unrolling the X-major inner
loop, which consists of
REP STOSB, because of the slow start-up time of REP relative
to the speed of branching on thatprocessor.
Special code could be implemented for lines with integral slopes, because all runs
are exactly the same length in such lines. Also, the X-major code couldtry to writean
aligned word at a time to display memorywhenever possible; this would
improve the
maximum possible performance onsome 1&bit VGAs.
Dead CatsandLightningLines
705
Previous Home
One weakness of Listing 37.1 is that for lines with slopes between 0.5 and 2, the
average run length is lessthan two, rendering run-lengthslicing ineffective.This can
be remedied by viewing lines in that rangeas being composed of diagonal, rather
than horizontal or vertical runs. I havent space to take this idea any further in this
book, but its not very complicated, and it guarantees aminimum run length of 2,
which renders rundrawing considerably more efficient, and makes techniques such
as unrolling the inner run-drawing loops more attractive.
Finally, be aware that run-length slice drawing is best for long lines, because it has
more and slower setup than a standardBresenhams line draw, including a divide.
Run-length slice is great for100-pixellines, but notnecessarily for 20-pixel lines, and
its a sure thing that
its not terrific for %pixel lines. Both approaches will work, but
if line-drawing performance is critical, whether youll wantto use run-length slice or
standard Bresenhams depends on thetypical lengths of the linesyoull be drawing.
For lines of widely varyinglengths, you might want to implement both approaches,
and choose the best one for each line, depending on the line
length-assuming, of
course, that your display memory is fast enough and your application demanding
enough tomake that level of optimization worthwhile.
If your code looks broken from a performanceperspective, think before you fix it;
that particular cat may be dead for aperfectly good reason. 111say it again: Profile
bejwe you optimize.
706
Chapter 37
Next
Previous Home
chapter 38
the polygon primeval
Next
"Giveme but one jirin spot on which to stand, and I will move theEarth. ''
-Archimedes
Were Archimedes ali&,~,today,
he might say, "Give me but one fast polygon-fill
..
routine onwhich to calf, an'd 1 will draw the Earth." Programmers often think of
pixel drawing as beink thebasic graphics primitive, but filled polygons are equally
fundamental and fir more useful. Filled polygons can be used for constructs as
diverse as a single <$ixelor a 3-D surface, and virtually everything in between.
I'll spends?me&ne in this chapter and the nextseveral developing routines to
draw filled polygoris and building more sophisticated graphics operations atop
those routines. Once.wehave that foundation,I'll get into 2-D manipulation and
animation of polygon-based entities as preface to an exploration of 3-D graphics.
You can't get there from here
without laying some groundwork, though, so in this
we'll see
chapter I'll begin with the basics offilling a polygon. In the next chapter,
how to draw a polygon considerably faster. That's my general approach for this
sort of topic: High-level exploration of a graphics topic first, followed by a speedy
hardware-specific implementation for the IBM PC/VGA combination, the most
widely used graphics system around. Abstract, machine-independent graphics is a
thing of beauty, but only by understanding graphics at all levels, including the
hardware, can you boost performance into therealm of the sublime.
And slow computer graphics is scarcely worth the bother.
,~v:"j
"" "
Inan
n.l
709
Filled Polygons
A polygon is simply a shape formed by lines laid end to end to form a continuous,
closed path. A polygon is filled by setting all pixels withinthe polygons boundaries
to a coloror pattern.For now, well work only with polygons
filled with solid colors.
You can divide polygons
into three categories: convex, nonconvex,
and complex, as shown
in Figure 38.1. Convex polygons include what youd normally think of as convex
and more; as far as wereconcerned, aconvex polygonis one forwhich anyhorizontal line drawn through the polygon encounters the right edgeexactly once and the
left edge exactly once, excluding horizontal and zero-length edge segments. Put
another way, neither the right nor left edge
of a convex polygonever reverses direction fromup to down, or vice-versa.Also, the rightand left edges of a convex polygon
may not cross one another, althoughthey may touch so long as the right edgenever
crosses overto the leftside of the left edge. (Checkout the secondpolygon drawnin
Listing 38.3, which certainly isnt convex in the normal sense.) The boundaries of
nonconvex polygons, on the other hand,can go in whatever directions they please,
so long as they never cross. Complex polygons can have any boundaries you might
imagine, which makesfor interesting problemsin deciding which interior spaces to
fill and which not to fill. Each category is a superset of the previous one.
(See Chapter 41 for a more detaileddiscussion of polygon types and naming.)
Why bother todistinguish between convex,nonconvex, and complex polygons? Easy:
performance, especially when it comes to filling convex polygons. Were going to
start with filled convex polygons; theyre widely useful and will serve well to introduce some of the subtler complexities of polygon drawing, not the least of which is
the slippery concept of inside.
Figure 38.1
710
Chapter 38
00
00000000
00
00000000
71 1
filled polygon will match the edges of the same polygon drawn unfilled. Such polygons will look pretty much as theyre supposed to, and all drawingon raster displays
is, after all, only an approximation of an ideal.
Theres one great drawback to tracing polygons with standard lines, however: Adjacent polygons wont fit together properly, as shown in Figure 38.3. If you use six
equilateral triangles to make a hexagon, for example, theedges of the triangles will
overlap when traced with standard lines, and morerecently drawn triangleswill wipe
out portions of their predecessors. Worse still, odd color effects will show up along
the polygon boundaries if XOR drawing is used. Consequently, filling out to the
boundary lines just wont do for drawing images composed of fitted-together polygons. And because fitting polygons together is exactly whatI have in mind, we need
a different approach.
00000000
The screen after a filled polygon
is drawn using a standard
line-drawing algorithmto trace
the edges.
Figure 38.3
712
Chapter 38
00
00
00
00
00
00
00
00
00000000
The screen after a second, adjacent
polygon is drawn; the second polygon
wipes out several pixelsdrawn as part
of the first polygon, some of them within
the first polygons boundaries.
Points located exactly on horizontal edges are drawn only if the interior of the
polygon is directly below them (horizontal top edges are drawn, horizontal bottom edges aren't).
A vertex is drawn only if all lines endingat that point meet the above conditions
(no rightor bottom edges end at that point).
All edges of a polygon except those thatare flat tops or flat bottoms will be considered either right edges
or left edges, regardless of slope. The left edge is the one that
starts with the leftmost line down from the topof the polygon.
These rules ensurethat nopixel is drawn more than oncewhen adjacent polygons
are filled, and thatif polygons cover the full 360degree range arounda pixel, then
that pixel will be drawn once andonly once-just what we need in order to be ableto
fit filled polygons together seamlessly.
This sort of non-overlapping polygonfilling isn 't ideal for allpurposes. Polygons
are skewed towardthe top and left edges, which not only introduces drawing error
relative to the ideal polygon but also means that a Jilledpolygonwon 't match the
same polygon drawn unfilled. Narrow wedges and one-pixel-wide polygons will
show upspottily. All in all, the choice ofpolygon-filling approach depends entirely
on the ways in which thefilledpolygons must be used.
For our purposes, nonoverlappingpolygons are theway to go, so let's have at them.
C o l o r - f i l l s a convexpolygon.
Al v e r t i c e sa r eo f f s e t
b y( X O f f s e t .
Y O f f s e t ) ." C o n v e x "
means t h a te v e r yh o r i z o n t a ll i n ed r a w nt h r o u g h
t h ep o l y g o na ta n yp o i n tw o u l dc r o s se x a c t l yt w oa c t i v ee d g e s
( n e i t h e rh o r i z o n t a ll i n e sn o rz e r o - l e n g t he d g e sc o u n t
as a c t i v e
e d g e s :b o t ha r ea c c e p t a b l ea n y w h e r e
i nt h ep o l y g o n ) .
and t h a t t h e
r i g h t & l e f t edgesnevercross.
( I t ' s OK f o r them t ot o u c h .t h o u g h .
s o l o n g as t h e r i g h t edgenevercrossesover
totheleftofthe
1
l e f t e d g e . )N o n c o n v e xp o l y g o n sw o n ' tb ed r a w np r o p e r l y .R e t u r n s
f o rs u c c e s s ,
0 i f memory a l l o c a t i o n f a i l e d . * /
# i n c l u d e< s t d i o . h >
l i n c l ude <math. h>
# i f d e f -TURBOC-
713
/*
Advances t h e i n d e x b y
one v e r t e x f o r w a r d t h r o u g h t h e v e r t e x l i s t ,
w r a p p i n ga tt h ee n do ft h el i s t
*/
# d e f i n e INDEX-FORWARD(1ndex) \
Index
( I n d e x + 1) % V e r t e x L i s t - > L e n g t h ;
/*
Advances t h e i n d e x b y o n e v e r t e x b a c k w a r d t h r o u g h t h e v e r t e x l i s t ,
wrapping a t t h e s t a r t o f t h e l i s t
*/
# d e f i n e INDEXLBACKWARD(1ndex) \
Index
(Index - 1 + VertexList->Length) % VertexList->Length:
/*
Advances t h ei n d e xb y
one v e r t e x e i t h e r f o r w a r d o r b a c k w a r d t h r o u g h
thevertexlist,wrappingateither
end o f t h e l i s t */
# d e f i n e INDEX-MOVE(Index,Direction)
i f ( D i r e c t i o n > 0)
Index
( I n d e x + 1) % V e r t e x L i s t - > L e n g t h :
else
Index
(Index
1 + VertexList->Length) % VertexList->Length:
e x t e r nv o i d D r a w H o r i z o n t a l L i n e L i s t ( s t r u c t H L i n e L i s t
s t a t i cv o i dS c a n E d g e ( i n t .i n t .i n t .i n t .i n t .i n t .s t r u c tH L i n e
i n t FillConvexPolygon(struct P o i n t L i s t H e a d e r
i n tX O f f s e t .i n tY O f f s e t )
*,
int):
\
\
\
\
**):
V e r t e x L i s t .i n tC o l o r ,
i n t1 .M i n I n d e x L .M a x I n d e x .M i n I n d e x R .S k i p F i r s t .
i n t M i n P o i n t - YM
. a x P o i n t - YT
. o p I s F l a t L. e f t E d g e D i r ;
i n tN e x t I n d e x .C u r r e n t I n d e x .P r e v i o u s l n d e x ;
i n t DeltaXN.DeltaYN.DeltaXP.DeltaYP;
s t r u c tH L i n e L i s tW o r k i n g H L i n e L i s t ;
s t r u c tH L i n e* E d g e P o i n t P t r :
s t r u c tP o i n t* V e r t e x P t r ;
Temp:
/* P o i n t t o t h e v e r t e x l i s t
*/
VertexPtr
VertexList->PointPtr:
/ * Scan t h e l i s t t o f i n d t h e t o p
a n db o t t o mo ft h ep o l y g o n
*/
i f (VertexList->Length
0)
return(1):
/* r e j e c nt u lpl o l y g o n s
*/
MaxPoint-Y
MinPoint-Y
VertexPtrCMinIndexL
MaxIndex
01.Y:
f o r (i 1: i < V e r t e x L i s t - > L e n g t h : i++)
(
i f (VertexPtrCi1.Y < MinPoint-Y)
VertexPtrCMinIndexL
i1.Y: /* new t o p */
MinPoint-Y
e l s e i f ( V e r t e x P t r C i 1 . Y > MaxPoint-Y)
MaxPoint-Y
VertexPtrCMaxIndex
i1.Y: /* new b o t t o m */
- -
i f (MinPoint-Y
return(1):
-/*
MaxPoint-Y)
p o l y g o ni s0 - h e i g h t :a v o i di n f i n i t el o o pb e l o w
/* Scan i n a s c e n d i n g o r d e r t o f i n d t h e l a s t t o p - e d g e p o i n t
MinIndexR
MinIndexL;
w h i l e( V e r t e x P t r C M i n I n d e x R 1 . Y
MinPoint-Y)
INDEX-FORWARD(Min1ndexR);
INDEX-BACKWARD(Min1ndexR): /* back u p t o l a s t t o p - e d g e p o i n t
714
Chapter 38
*/
*/
*/
I* Now scan i n d e s c e n d i n g o r d e r t o f i n d t h e f i r s t t o p - e d g e p o i n t
w h i l e( V e r t e x P t r [ M i n I n d e x L l . Y
MinPoint-Y)
INDEX_BACKWARD(MinIndexL);
INDEXLFORWARD(Min1ndexL); / * backup
t of i r s tt o p - e d g ep o i n t
*/
*/
I* F i g u r e o u t w h i c h d i r e c t i o n t h r o u g h t h e v e r t e x l i s t f r o m t h e t o p
vertexistheleft
edgeandwhich
i s t h e r i g h t */
LeftEdgeDir
-1: I* assume l e f t edgeruns
down t h r u v e r t e x l i s t
--
*I
i f ((TopIsFlat
(VertexPtrCMinIndexL1.X !VertexPtr[MinIndexRl.X) ? I : 0)
1) C
I* I f t h e t o p i s f l a t , j u s t seewhich o f t h e ends i s l e f t m o s t * I
i f ( V e r t e x P t r [ M i n I n d e x L l . X > VertexPtrCMinIndexR1.X) {
L e f t E d g e D i r = 1 ; I* l e f t e d g er u n su pt h r o u g hv e r t e x
l i s t *I
Temp
MinIndexL:
I* swap t hi ne d i c e s
so M i n I n d e x L
*/
M i n I n d e x L = MinIndexR; I* p o i n t s t o t h e s t a r t o f t h e l e f t
*/
MinIndexR
Temp;
/ * edge, s i m i l a rflM
oy ri n I n d e x R
*/
1 else
/* P o i n tt ot h e
downwardend
ofthefirstlineof
each o f t h e
twoedges down f r o m t h e t o p
*/
NextIndex
MinIndexR;
INDEXLFORWARD(Next1ndex);
PreviousIndex
MinIndexL:
INDEX-BACKWARD(Previous1ndex);
I* C a l c u l a t e X and Y l e n g t h s f r o m t h e t o p v e r t e x t o t h e
end o f
t h e f i r s t l i n e down eachedge:usethose
t o compareslopes
andseewhich
l i n e i s l e f t m o s t *I
DeltaXN
V e r t e x P t r [ N e x t I n d e x l . X - VertexPtrCMinIndexL1.X;
DeltaYN
V e r t e x P t r [ N e x t I n d e x l . Y - VertexPtrCMinIndexL1.Y:
DeltaXP
VertexPtr[PreviousIndexl.X - VertexPtrCMinIndexL1.X:
DeltaYP
V e r t e x P t r [ P r e v i o u s I n d e x l . Y - VertexPtrCMinIndexL1.Y;
i f ( ( ( 1 o n g ) D e l t a X N * DeltaYP - (1ong)DeltaYN * DeltaXP) < OL) {
L e f t E d g e D i r = 1; I* l e f t e d g er u n su pt h r o u g hv e r t e x
l i s t *I
Temp = M i n I n d e x L :
I* swap t hi ne d i c e s
so MinIndexL
*I
MinIndexL
MinIndexR;
I* p o i n t s t o t h e s t a r t o f t h e l e f t
*/
MinIndexR
Temp:
/ * edge, s i m i l a rflM
oy ri n I n d e x R
*I
--
..
# o f s c a nl i n e si nt h ep o l y g o n ,s k i p p i n gt h eb o t t o me d g e
and a l s o s k i p p i n g t h e t o p v e r t e x
if thetopisn'tflat
because
i n t h a t c a s et h et o pv e r t e xh a s
a r i g h t edgecomponent,andset
t h et o ps c a nl i n et od r a w ,w h i c hi sl i k e w i s et h es e c o n dl i n eo f
*I
t h ep o l y g o nu n l e s st h et o pi sf l a t
i f ((WorkingHLineList.Length
MaxPoint-Y - MinPoint-Y - 1 + T o p I s F l a t ) <- 0 )
return(1):
/ * t h e r e ' sn o t h i n gt od r a w ,
s o we'redone
*/
WorkingHLineList.YStart
Y O f f s e t + MinPoint-Y + 1 - T o p I s F l a t :
I* S e tt h e
/*
Get memory i n w h i c h t o s t o r e t h e l i n e l i s t
we g e n e r a t e * I
i f ((WorkingHLineList.HLinePtr
( s t r u c tH L i n e
* ) ( m a l l o c ( s i z e o f ( s t r u c tH L i n e )
*
WorkingHLineList.Length)))
NULL)
memory f o r t h e l i n e l i s t
*I
r e t u r n ( 0 ) : / * c o u l d n ' tg e t
--
I* Scan t h e l e f t edgeand
s t o r et h eb o u n d a r yp o i n t si nt h el i s t
I* I n i t i a l p o i n t e r f o r s t o r i n g s c a n c o n v e r t e d l e f t - e d g e c o o r d s
*I
*I
EdgePointPtr
WorkingHLineList.HLinePtr:
I* S t a r t f r o m t h e t o p o f t h e l e f t e d g e
*I
PreviousIndex
CurrentIndex
MinIndexL;
ThePolygonPrimeval
715
/*
Skipthefirstpointofthefirstlineunlessthetopisflat:
i f thetopisn'tflat,thetopvertexisexactly
on a r i g h t
edgeand
i s n ' t drawn * I
SkipFirst
T o p I s F l a t ? 0 : 1;
/ * Scan c o n v e r t e a c h l i n e i n t h e l e f t
e d g ef r o mt o pt ob o t t o m
*/
do {
INDEX-MOVE(Current1ndex.LeftEdgeDir):
S c a nE d g e ( V e r t e x P t r [ P r e v i o u s I n d e x ] . X + X O f f s e t .
VertexPtr[PreviousIndexl.Y.
VertexPtr[CurrentIndexl.X + X O f f s e t .
VertexPtr[CurrentIndexl.V, 1. S k i p F i r s t .& E d g e P o i n t P t r ) :
- -
PreviousIndex
CurrentIndex:
SkipFirst
0: I* s c a n c o n v e r t t h e f i r s t p o i n t f r o m
w h i l e( C u r r e n t I n d e x
!- MaxIndex):
/*
now on
*/
Scan t h e r i g h t e d g ea n ds t o r et h eb o u n d a r yp o i n t s
i n t h e l i s t *I
EdgePointPtr
WorkingHLineList.HLinePtr;
PreviousIndex
CurrentIndex
MinIndexR:
SkipFirst
T o p I s F l a t ? 0 : 1;
edge,top
t o bottom. X c o o r d i n a t e sa r e
/* Scan c o n v e r tt h er i g h t
a d j u s t e d 1 t ot h el e f t .e f f e c t i v e l yc a u s i n gs c a nc o n v e r s i o no f
thenearestpointstotheleftofbutnotexactly
on t h ee d g e */
do
- -
INDEX-MOVE(Current1ndex.-LeftEdgeDir):
+ X O f f s e t - 1.
VertexPtr[PreviousIndex3.Y.
VertexPtr[CurrentIndexl.X + X O f f s e t - 1.
VertexPtr[CurrentIndex].Y, 0. S k i p F i r s t .& E d g e P o i n t P t r ) ;
CurrentIndex:
PreviousIndex
SkipFirst
0: / * s c a n c o n v e r t t h e f i r s t p o i n t f r o m
now on */
w h i l e( C u r r e n t I n d e x
!- MaxIndex):
ScanEdge(VertexPtr[PreviousIndexl.X
- -
I* Draw t h e l i n e l i s t r e p r e s e n t i n g t h e s c a n c o n v e r t e d p o l y g o n
DrawHorizontalLineList(&WorkingHLineList, Color);
*/
I* R e l e a s e t h e l i n e l i s t ' s
memory a n dw e ' r es u c c e s s f u l l yd o n e
free(WorkingHLineList.HLinePtr):
return(1):
*I
/*
(X1,Yl) t o ( X Z . Y 2 ) .n o ti n c l u d i n gt h e
S c a nc o n v e r t s
an edgefrom
p o i n t a t ( X Z . Y 2 ) .T h i sa v o i d so v e r l a p p i n gt h ee n do fo n el i n ew i t h
t h e s t a r t o f t h en e x t ,a n dc a u s e st h eb o t t o ms c a nl i n eo ft h e
I f S k i p F i r s t !- 0. t h ep o i n t a t (X1,Yl)
p o l y g o nn o tt ob ed r a w n .
i s n ' t d r a w n .F o re a c hs c a nl i n e ,t h ep i x e lc l o s e s tt ot h es c a n n e d
o f t h es c a n n e dl i n ei sc h o s e n .
*/
linewithoutbeingtotheleft
s t a t i cv o i dS c a n E d g e ( i n t
X 1 . i n t Y 1 . i n t X2. i n t Y2. i n tS e t X S t a r t .
i n tS k i p F i r s t .s t r u c tH L i n e* * E d g e P o i n t P t r )
{
i n t Y . DeltaXD
. eltaY:
d o u b l eI n v e r s e S l o p e ;
s t r u c tH L i n e* W o r k i n g E d g e P o i n t P t r ;
- -
/ * C a l c u l a t e X and Y l e n g t h s o f t h e l i n e
and t h e i n v e r s e s l o p e
DeltaX
X2 - X 1 ;
i f ((DeltaY
V2 - Y 1 ) <- 0 )
/* g u a radg a i n s0t- l e n g tahnhdo r i z o n t aeld g e s
return:
InverseSlope
(doub1e)DeltaX / (doub1e)DeltaY:
716
Chapter 38
*/
*/
I* S t o r et h e
X c o o r d i n a t eo ft h ep i x e lc l o s e s tt ob u tn o tt ot h e
leftofthelinefor
each Y c o o r d i n a t eb e t w e e n Y 1 and Y 2 . n o t
i n c l u d i n g Y2 and a l s o n o t i n c l u d i n g
Y1 i f S k i p F i r s t != 0 * /
W o r k i n g E d g e P o i n t P t r = * E d g e P o i n t P t r : I* a v o i dd o u b l ed e r e f e r e n c e
*I
for (Y
Y 1 + S k i p F i r s t : Y < Y2: Y++.
WorkingEdgePointPtr++) {
I* S t o r et h e X c o o r d i n a t e i n t h e a p p r o p r i a t e
edge l i s t */
i f ( S e t X S t a r t -= 1 )
WorkingEdgePointPtr->XStart
X 1 + ( i n t ) ( c e i l ( ( Y - Y I ) * InverseSlope)):
else
WorkingEdgePointPtr->XEnd =
X 1 + ( i n t ) ( c e i l ( ( Y - Y l ) * InverseSlope));
*EdgePointPtr
WorkingEdgePointPtr:
/ * advance c a l l e r ' sp t r
*I
I* Draws a l l p i x e l s i n t h e l i s t o f h o r i z o n t a l l i n e s p a s s e d i n , i n
mode 1 3 h .t h e
VGA's 3 2 0 x 2 0 02 5 6 - c o l o r
mode.Uses
a s l o wp i x e l - b y p i x e la p p r o a c h ,w h i c hd o e sh a v et h ev i r t u eo fb e i n ge a s i l yp o r t e d
t o anyenvironment.
*I
#i
ncl ude <dos h>
d i n c l ude "polygon. h"
# d e f i n e SCREEN-WIDTH
320
# d e f i n e SCREEN-SEGMENT
OxAOOO
s t a t i cv o i dD r a w P i x e l ( i n t .i n t ,i n t ) :
void DrawHorizontalLineList(struct HLineList
intColor)
HLineListPtr.
s t r u c tH L i n e* H L i n e P t r ;
i n t Y. X:
I* P o i n t t o t h e X S t a r t I X E n d d e s c r i p t o r f o r t h e f i r s t ( t o p )
h o r i z o n t a ll i n e */
HLineListPtr->HLinePtr;
HLinePtr
/ * Draweach h o r i z o n t a l l i n e i n t u r n , s t a r t i n g w i t h t h e t o p
a d v a n c i n go n el i n ee a c ht i m e
*I
f o r ( Y = HLineListPtr->YStart: Y < (HLineListPtr->YStart
H L i n e L i s t P t r - > L e n g t h ) ; Y++.
HLinePtr++) {
I* Draweach p i x e l i n t h e c u r r e n t h o r i z o n t a l l i n e i n t u r n ,
s t a r t i n gw i t ht h el e f t m o s t
one * /
f o r ( X = H L i n e P t r - > X S t a r t : X <= H L i n e P t r - > X E n d ; X++)
DrawPixel(X. Y . C o l o r ) ;
one and
1
I* Draws t h e p i x e l a t
(X. Y) incolorColorin
s t a t i cv o i dD r a w P i x e l ( i n t
X . i n t Y . i n tC o l o r )
u n s i g n e dc h a rf a r* S c r e e n P t r ;
X);
ThePolygonPrimeval
717
*ScreenPtr
( u n s i g n e dc h a r ) C o l o r ;
LISTING38.3138-3.C
/*
Sampleprogram t o e x e r c i s e t h e p o l y g o n - f i l l i n g r o u t i n e s . T h i s c o d e
and a l l p o l y g o n - f i l l i n g c o d eh a sb e e nt e s t e dw i t hB o r l a n da n d
M i c r o s o f tc o m p i l e r s .
*/
# i n c l u d e< c o n i o . h >
P
in c l ude <dos. h>
#i
n c l ude "polygon. h"
/*
Draws t h e p o l y g o n d e s c r i b e d b y t h e p o i n t l i s t P o i n t L i s t i n c o l o r
C o l o rw i t h all v e r t i c e s o f f s e t by ( X . Y ) */
# d e f i n e DRAW-POLYGON(PointList.Color,X.Y)
\
Polygon.Length
sizeof(PointList)/sizeof(struct P o i n t ) ; \
Polygon.PointPtr
PointList;
\
FillConvexPolygon(&Polygon. C o l o r . X . Y ) ;
v o i dm a i n ( v o i d 1 :
e x t e r n i n t FillConvexPolygon(struct P o i n t L i s t H e a d e r
v o i dm a i n 0
i n t i. j :
s t r u c tP o i n t L i s t H e a d e rP o l y g o n ;
s t a t i cs t r u c tP o i n tS c r e e n R e c t a n g l e [ ]
*,
i n t .i n t .i n t l ;
~t0.0~,t320.03.t320.200~,~0,200~};
s t a t i cs t r u c tP o i n t
ConvexShape[]
t~0.0).~121,0}.t320.0~,t200,513,~301,51~,~250,51~,~319.143~,
1320.2001,~22.200~.~0.2001,~50.180~,t20.1603,~50,1403,
(20.120}, {50.100), t20.80}, ( 5 0 . 6 0 } , { 2 0 . 4 0 } , t 5 0 . 2 0 } ) ;
staticstructPoint
Hexagon[]
- ~~30.0~.~15.201,t0.0}3:
-- (I30.20}.(15.0}.(0,203}:
~~0.201.I20.101.I0,0}~;
tt90.-50~.~0.-901.~-90.-501~~-90.50~,~0,90~,~90,50~~;
s t a t i cs t r u c tP o i n tT r i a n g l e l C l
staticstructPointTriangle2Cl
s t a t i cs t r u c tP o i n tT r i a n g l e 3 C l
s t a t i cs t r u c tP o i n tT r i a n g l e 4 C l
u n i o n REGS r e g s e t ;
/*
- Ct20.20).~20.03.(0.103~;
--
S e tt h ed i s p l a yt o
VGA mode 1 3 h .3 2 0 x 2 0 02 5 6 - c o l o r
mode */
regset.x.ax
0x0013;
/ * AH 0 s e l e c t s mode s e tf u n c t i o n ,
AL
0 x 1 3s e l e c t s mode 0x13
when s e t a sp a r a m e t e r sf o rI N T
Ox10
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) ;
/*
C l e a rt h es c r e e nt oc y a n
DRAW-POLYGON(ScreenRectang1e.
*/
3 . 0. 0 ) ;
/*
D r a w an i r r e g u l a r shape t h a t m e e t so u rd e f i n i t i o no fc o n v e xb u t
i s n o t c o n v e xb ya n yn o r m a ld e s c r i p t i o n
*/
DRAW-POLYGON(ConvexShape. 6. 0. 0 ) ;
getch0:
I* w a i ft o r
a keypress * I
/ * D r a w a d j a c e n tt r i a n g l e sa c r o s st h et o ph a l fo ft h es c r e e n
f o r( j - 0 ;j < - 8 0 ;
j+-20)
(
f o r( i - 0 ;i < 2 9 0 ;
i +- 3 0 ) {
DRAW-POLYGON(Triangle1. 2. i , j ) :
j);
DRAW-POLYGON(Triangle2. 4,i+15.
1
718
Chapter 38
*/
*/
/*
*I
D r a w a d j a c e n tt r i a n g l e sa c r o s st h eb o t t o mh a l fo ft h es c r e e n
f o r (j-100: j<-170;j+-20)
{
/ * Do a row o f p o i n t i n g - r i g h t t r i a n g l e s */
f o r( i - 0 :i < 2 9 0 :
i +- 20) I
DRAWKPOLYGON(Triangle3. 40. i. j ) :
I* Do a row o fp o i n t i n g - l e f tt r i a n g l e sh a l f w a yb e t w e e n
o f p o i n t i n g - r i g h t t r i a n g l e s and t h e n e x t , t o
f o r( i - 0 :i < 2 9 0 ;
i +- 20) I
DRAW-POLYGON(Triangle4. 1. 1. j+lO):
onerow
f i t between * I
I* w a i t f o r a k e y p r e s s * I
getch0:
I* F i n a l l y , draw a s e r i e so fc o n c e n t r i ch e x a g o n so fa p p r o x i m a t e l y
t h e same p r o p o r t i o n s i n t h e c e n t e r o f t h e s c r e e n
*/
i++)
I
f o r( i - 0 :i < 1 6 :
DRAW-POLYGON(Hexagon. i. 160, 1 0 0 ) :
f o r( j - 0 :
j<sizeof(Hexagon)/sizeof(struct P o i n t ) : j++) I
I* A d v a n c ee a c hv e r t e xt o w a r dt h ec e n t e r
*/
i f (HexagonCj1.X !- 0 ) I
HexagonCj1.X >- 0 ? 3 : - 3 :
HexagonCj1.X
HexagonCj1.Y >- 0 ? 2 : - 2 :
HexagonCj1.Y
} else I
HexagonCj1.Y -- HexagonCj1.Y >- 0 ? 3 : - 3 :
--
--
getch0:
I* w a i t f o r a k e y p r e s s * I
I* R e t u r n t o t e x t
mode and e x i t */
regset.x.ax
0x0003:
I* AL
3 s e l e c t s8 0 x 2 5t e x t
i n t 8 6 ( 0 x 1 0 ,& r e g s e t .& r e g s e t ) :
LISTING 38.4
POLYG0N.H
mode
*/
I* PDLYG0N.H: Header f i l e f o r p o l y g o n - f i l l i n g
/ * D e s c r i b e s a s i n g l ep o i n t( u s e df o r
s t r u c tP o i n t I
i n t X;
I* X c o o r d i n a t e */
i n t Y;
I* Y c o o r d i n a t e * I
code * I
a s i n g l ev e r t e x )
*I
I:
I* D e s c r i b e s a s e r i e s o f p o i n t s ( u s e d t o s t o r e
a listofverticesthat
assumed t o c o n n e c t t o t h e t w o
d e s c r i b e a p o l y g o n :e a c hv e r t e xi s
a d j a c e n tv e r t i c e s ,a n dt h el a s tv e r t e xi s
assumed t o c o n n e c t t o t h e
f i r s t ) *I
s t r u c tP o i n t L i s t H e a d e r I
i n t Length:
I* # o f p o i n t s * I
s t r u c tP o i n t * P o i n t P t r :
/* p o i n t e r t o l i s t o f p o i n t s
*I
1:
I* D e s c r i b e st h eb e g i n n i n ga n de n d i n g
h o r i z o n t a ll i n e *I
s t r u c tH L i n e
i n tX S t a r t :
i n t XEnd:
X coordinates o f a single
I* X c o o r d i n a t e o f l e f t m o s t p i x e l i n l i n e
/ * X c o o r d i n a t eo fr i g h t m o s tp i x e li nl i n e
*/
*I
1:
719
I* D e s c r i b e s a L e n g t h - l o n gs e r i e s
o f h o r i z o n t a ll i n e s ,
all assumed t o
b eo nc o n t i g u o u ss c a nl i n e ss t a r t i n ga tY S t a r t
a n dp r o c e e d i n g
downward(used
t o d e s c r i b e a s c a n - c o n v e r t e dp o l y g o nt ot h e
l o w - l e v e hl a r d w a r e - d e p e n d e n td r a w i n gc o d e )
*I
s t r u c tH L i n e L i s t (
i n t Length;
I* # o f h o r i z o n t a l l i n e s * I
i n t YStart;
I* Y c o o r d i n a t e o f t o p m o s t l i n e * I
s t r u c tH L i n e * H L i n e P t r ;
I* p o i n t e r t o l i s t o f h o r z l i n e s * /
}:
Listing 38.2 isnt particularly interesting; itmerely draws each horizontal linein the
passed-in list in thesimplest possible way, one pixel at atime. (No, that doesntmake
the pixel the fundamentalprimitive; in the next chapter
Ill replace Listing 38.2 with
a much fasterversion that doesnt botherwith individual pixels at all.)
Listing 38.1 is where the actionis in this chapter. Our goal is to scan out theleft and
right edgesof each polygon so that all points insideand nopoints outside thepolygon aredrawn, and so that all points locatedexactly on the boundary aredrawn only
if they are not on right or bottom edges. Thats precisely what Listing 38.1 does.
Heres how:
Listing 38.1 first finds the topand bottom of the polygon, then works out from the
top point to find the
two ends of the top edge.If the ends are at different locations,
the top is flat, which has two implications. First, itseasy to find the startingvertices
and directions through thevertex list for the leftand right edges. (To scan-convert
them properly, we must first determine which edge is which.) Second, the topscan
line of the polygon should be drawn without the rightmostpixel, because only the
rightmost pixel of the horizontal edge that makes up the top scan line is part of a
right edge.
If, on the other hand, the ends
of the top edge are at thesame location, the top is
pointed. In that case, the top scan line of the polygon isnt drawn; its part of the
right-edge line that startsat the top vertex.(Its part of a left-edge line, too, but the
right edge overrides.)
When the topisnt flat, its more difficult to tell in which direction through thevertex list the right and left edges go,because both edges startat
the top vertex. The solution
is to compare theslopes from the topvertex to the ends
of the two lines comingout of it in order to see whichis leftmost. The calculations in
Listing 38.1 involving the various deltas do this, using a rearranged form
of the slopebased equation:
(DeltaYN/DeltaXN)>(DeltaYP/DeltaXP)
Once we know where the left edge starts in thevertex list, we can scan-convert it a
line segmentat atime until the bottomvertex is reached. Each point is stored as the
starting X coordinate for the corresponding scan line in the list well pass to
DrawHorizontalLineLt. The nearest X coordinate on each scan line thats on or to
the right of the left edgeis selected. The last point of each line segmentmaking up
the left edge isnt scan-converted, producing two desirable effects. First, it avoids
720
Chapter 38
Previous Home
Next
drawing each vertex twice; two lines come into every vertex, but we want to scanconvert each vertex only once. Second, not scan-converting the last point of each
line causes the bottom scan line of the polygon not to be drawn, as required by our
rules. The first scan line of the polygon is alsoskipped if the top isnt flat.
Now we need to scan-convert the right edge into the endingX coordinate fields of
the line list. This is performed in the same manner as for the left edge, except that
every line in the right edge is moved one pixel to the left before being scan-converted. Why? We want the nearest point to the left of but not on the right edge, s o
that the right edgeitself isnt drawn. As it happens, drawing the nearest pointon or
to the right of a line moved one pixel to the left is exactly the same as drawing the
nearest point to the left of but not onthat line in its original location. Sketch it out
and youll see what I mean.
Once the two edges are scan-converted, the whole line list
is passed to
DrawHorizontalLineList,and the polygon is drawn.
Finis.
Oddball Cases
Listing 38.1 handles zero-length segments (multiple vertices at the same location)
by ignoring them,which will be useful down the roadbecause scaled-down polygons
can end up with nearby vertices moved to the same location. Horizontal line segments are fine anywhere in a polygon, too. Basically, Listing38.1 scanconverts between
active edges (the edges that define the extentof the polygon on each scan line) and
both horizontal and zero-length lines are non-active; neither advances to another
scan line, so they dont affect the edges being scanned.
Ive limited this chapters code to merely demonstrating theprinciples of filling convex polygons, and the listings given are by no means fast. In the next chapter,well
spice things up by eliminating the floating point calculations and pixel-at-a-time drawing and tossing a little assembly language into themix.
72 1
Previous
chapter 39
Home Next
725
The black box approachdoes not, however, necessarilycause the software itselfto
become faster, smaller,or more innovative; quite the opposite, suspect.
I
Ill reserve
judgement onwhether thatis a good thingor not, but
Ill make a prediction:In the
short run, the aforementioned techniques
will lead tonoticeably larger, slower programs, asprogrammers understand less and less of what
the key parts of their programs
do and rely increasingly on general-purpose code written by other people. (In the
long run, programswill be bigger and slower yet, but computers will be so fast and
will have so much memory that no onewill care.) Over time, PC programs will also
come to be more similar to one another-and to programs running on otherplatforms, such as the Mac-as regards both user interface and performance.
Again, I am not saying that this is bad. Itdoes, however, havemajor implications for
the future nature of PC graphics programming, in ways that will directly affect the
means by which many of youearn your livings. Not so very long fromnow, graphics
programming-all programming, for that matter-will become mostly a matter of
assembling in various ways components written by other people, and will cease to be
the all-inclusively creative, mindbendingly complex pursuit itis today.(Using legally
certified black boxes is, by the way, one direction in which the patent lawyers are
leading us; legal considerations may be the final nail in the coffin of homegrown
code.) For now, though, its stillwithin your power, as a PC programmer, to understand and even control every single thing that happens on a computer if you so
desire, to realize any vision you may have. Takeadvantage of this unique window of
opportunity to create some magic!
Neither does ithurt to understand whats involved in drawing, say, a filled polygon,
even if you are using a GUI. You will better understand the performance implications of the available GUI functions, and you will be able to fill in any gaps in the
functions provided. You may even find that you can outperform the GUI on occasion by doing your own drawing into a system memory bitmap, then copying the
result to the screen; for instance,you can do this under Windows by using the WinG
library available from Microsoft. You will also be able to understand why various
quirks exist, and will be able to put them to good use. For example, the X Window
System followsthe polygon drawing rules described in theprevious chapter (although
its not obvious from the X Window System documentation) ; if you understood the
previous chapters discussion, youre in good shapeto use polygons under X.
In short, even though doing so runs counter to current trends, it helps to understand how things work, especiallywhen theyre very visibleparts of the software you
develop. That said, lets learn more aboutfilling convex polygons.
726
Chapter 39
tion ScanEdge);
Drawingthescanned-outhorizontallinesthatconstitutethefilledpolygon
(DrawHorizontalLineList); and
Characterizing thepolygonand
coordinating the tracing anddrawing
(FillConvexPolygon).
The amountof time that the previous chapters sample program
spent in eachof these
areas is shown in Table 39.1.As you can see, halfthe time was spent drawing and the
other halfwas spent tracing the polygon edges (the time spent in FiUConvexPolygon
was relatively minuscule), so we have our choice of where to begin optimizing.
Fast Drawing
Lets start with drawing, which is easilysped up. Theprevious chapters code used a
double-nested loop that called a draw-pixel function to plot each pixel in the polygon individually. Thats a ridiculous approach in a graphics mode thatoffers linearly
mapped memory, asdoes VGA mode 13H, the mode in which were working.At the
very least, we could point a far pointer to the left edge of each polygon scan line,
then draw each pixel in that scan line in quick succession, using something along
the lines of *ScrPtr++ = FillColor; inside a loop.
However, it seems silly to use a loop when the x86 has an instruction, REP STOS,
thats uniquely suited to filling linear memory buffers. Theres no way to use REP
STOS directly in C code, but its a good bet that the memset library function uses
REP STOS, so you could greatly enhance performance by using memset to draw
each scan line of the polygon in a single shot. That, however, is easier saidthan done.
The memset function linked in from the library is tied to the memory model in use;
in small (which includes Tiny, Small,or Medium) data models memset accepts only
near pointers, so it cant be used to access screen memory. Consequently, a large
(which includes Compact, Large, or Huge)data model must be used to allow memset
to drawto display memory-a clear case ofthe tail waggingthe dog.This is an excellent example of why, although it is possible to use C to do virtually anything, its
sometimes much simpler just to use a little assembly code and be done with it.
At any rate, Listing 39.1for this chapter shows a version of DrawHorizontalLineList
that uses memset to draweach scan line of the polygon in a single call.When linked
to Chapter 38s test program, Listing 39.1 increasespure drawing speed (disregarding edge tracing and othernondrawing time) by more than an order of magnitude
Fast Convex Polygons
727
. .""
728
Chapter 39
,n
"
^ ^
."
~~., -
FillConvex
139-1 .C
LISTING 39.1
/*
Draws a l l p i x e l s i n t h e l i s t o f h o r i z o n t a l l i n e s
p a s s e di n ,i n
mode 13h,the
VGAs 3 2 0 x 2 0 02 5 6 - c o l o r mode. Usesmemset t o fill
e a c hl i n e ,w h i c hi s
much f a s t e rt h a n u s i n g D r a w P i x e lb u tr e q u i r e s
t h a t a l a r g ed a t a model(compact.large,orhuge)be
i n use when
r u n n i n g i n r e a l mode o r 286 p r o t e c t e d mode.
A
l C code t e s t e d w i t h B o r l a n d C++.
*/
# i n c l u d e< s t r i n g . h >
# i n c l u d e < d o s . h>
l i n c l ude polygon. h
# d e f i n e SCREEN-WIDTH
320
# d e f i n e SCREEN-SEGMENT
OxAOOO
v o i dD r a w H o r i z o n t a l L i n e L i s t ( s t r u c tH L i n e L i s t
intColor)
HLineListPtr.
s t r u c tH L i n e* H L i n e P t r ;
i n t Length,Width:
u n s i g n e dc h a rf a r* S c r e e n P t r ;
/* P o i n t t o t h e s t a r t o f t h e f i r s t
scan l i n e o nw h i c ht od r a w
ScreenPtr
MK-FP(SCREENLSEGMENT.
H L i n e L i s t P t r - > Y S t a r t * SCREEN-WIDTH);
*/
/*
P o i n tt ot h eX S t a r t / X E n dd e s c r i p t o rf o rt h ef i r s t( t o p )
h o r i z o n t a ll i n e */
HLineListPtr->HLinePtr:
HLinePtr
/ * D r a w e a c hh o r i z o n t a ll i n ei nt u r n ,s t a r t i n gw i t ht h et o p
oneand
*/
a d v a n c i n go n el i n ee a c ht i m e
Length
HLineListPtr->Length:
w h i l e( L e n g t h - > 0) I
I* Draw t h ew h o l eh o r i z o n t a ll i n e
i f i t has a p o s i t i v e w i d t h */
i f ((Width
HLinePtr->XEnd - H L i n e P t r - > X S t a r t + 1) > 0 )
memset(ScreenPtr + H L i n e P t r - > X S t a r t ,C o l o r ,W i d t h ) ;
HLinePtr++:
/ * pnosteiocnx at n
1 i n e X i n f o */
S c r e e n P t r +- SCREEN-WIDTH; /* p o i n t t o n e x t s c a n l i n e s t a r t
*/
At this point, Id like to mention that benchmarks are notoriously unreliable; the
results in Table 39.1 are accurate only for the test program, and only when running
on a particular system. Results could be vastly different if smaller, larger, or more
complex polygons weredrawn, or if a faster or slower computer/VGA combination
were used. Thesefactors notwithstanding, thetest program doesfill a variety of polygons of varying complexity sizedfrom large to small and in between, and certainly
the order of magnitude difference between Listing 39.1 and the old version of
DrawHorizontalLineList is a clear indication of which code is superior.
Fast Convex Polygons
729
Anyway, Listing 39.1 has the desired effect of vastly improving drawing time. There
are cycles yetto be had in the
drawing code, butas tracing polygon edges now takes
92 percent of the polygon filling time, its logical tooptimize the tracing code next.
Floating point is very accurate-but it is not precise. Integer calculations, properly performed, are.
730
Scan c o n v e r t s anedgefrom
( X 1 . Y l ) t o (X2.YZ).
n o ti n c l u d i n gt h e
p o i n ta t (X2.Y2).
If SkipFirst
1. t h e p o i n t a t
(X1.Yl) isnt
0. it i s . F o r eachscan
l i n e ,t h ep i x e l
drawn; i f S k i p F i r s t
c l o s e s tt ot h es c a n n e de d g ew i t h o u tb e i n gt ot h el e f t
o f the
scannededge i s chosen.Uses
an a l l - i n t e g e ra p p r o a c h f o r speedand
p r e c i s i o n . */
Chapter 39
#i
n c l ude <math. h>
# i n c l u d e " p o l y g o n . h"
v o i dS c a n E d g e ( i n t X 1 . i n t Y 1 . i n t X2, i n t Y2. i n tS e t X S t a r t ,
i n tS k i p F i r s t .s t r u c tH L i n e* * E d g e P o i n t P t r )
i n t Y . DeltaX.Height,Width,AdvanceAmt.ErrorTerm,
i n t ErrorTermAdvance.XMajorAdvanceAmt:
s t r u c tH L i n e* W o r k i n g E d g e P o i n t P t r ;
i:
--
* E d g e P o i n t P t r : / * a v o i dd o u b l ed e r e f e r e n c e
WorkingEdgePointPtr
AdvanceAmt
((DeltaX
X2 - X 1 ) > 0 ) ? 1 : -1:
/* d i r e c t i o n i n w h i c h X moves(Y2 i s
always > Y 1 , s o Y a l w a y sc o u n t su p )
i f ((Height
return:
edge
*/
edges
*/
*/
*/
/ * F i g u r eo u tw h e t h e rt h ee d g ei sv e r t i c a l ,d i a g o n a l ,X - m a j o r
( m o s t l yh o r i z o n t a l ) .o rY - m a j o r( m o s t l yv e r t i c a l )a n dh a n d l e
a p p r o p r i a t e l y */
i f ((Width
abs(De1taX)) -= 0 ) {
I* Theedge i s v e r t i c a l ; s p e c i a l - c a s e b y j u s t s t o r i n g t h e
same
X c o o r d i n a t ef o re v e r ys c a nl i n e
*/
/ * Scan t h e edge f o r e a c h s c a n l i n e i n t u r n
*/
f o r (i H e i g h t - S k i p F i r s t : i - -> 0; WorkingEdgePointPtr++) {
/ * S t o r et h e X c o o r d i n a t e i n t h e a p p r o p r i a t e e d g e l i s t
*/
i f (SetXStart
1)
WorkingEdgePointPtr->XStart
X1;
else
WorkingEdgePointPtr->XEnd
X1:
I e l s e i f (Width
Height) {
Theedge
i sd i a g o n a l ;s p e c i a l - c a s eb ya d v a n c i n gt h e
X
c o o r d i n a t e 1 p i x e lf o re a c hs c a nl i n e
*I
i f ( S k i p F i r s t ) /* s k i p t h e f i r s t p o i n t
i f s o i n d i c a t e d */
X 1 +- AdvanceAmt; / * move 1 p i x e l t o t h e l e f t o r r i g h t
*/
/ * Scan t h ee d g ef o re a c hs c a nl i n e
i n t u r n *I
f o r (i H e i g h t - S k i p F i r s t ; i - > 0: WorkingEdgePointPtr++) {
/ * S t o r et h e X c o o r d i n a t e i n t h e a p p r o p r i a t e
edge l i s t */
i f (SetXStart
1)
WorkingEdgePointPtr->XStart
X1:
else
WorkingEdgePointPtr->XEnd
X1;
X 1 +- AdvanceAmt; /* move 1 p i x e l t o t h e l e f t o r r i g h t
*/
/*
--
I
1 e l s e i f (Height > Width)
/*
Edge i sc l o s e rt ov e r t i c a lt h a nh o r i z o n t a l( Y - m a j o r )
*/
i f ( D e l t a X >- 0 )
ErrorTerm
0: / * i n i t i a l e r r o r t e r m g o i n g l e f t - > r i g h t
*/
else
ErrorTerm
- H e i g h t + 1;
/* g o i n gr i g h t - > l e f t */
i f (SkipFirst) {
/* s k i pt h ef i r s tp o i n t
i f so i n d i c a t e d * /
/ * Determinewhether i t ' s t i m e f o r t h e X c o o r dt oa d v a n c e */
i f ( ( E r r o r T e r m +- W i d t h ) > 0) t
X 1 +- AdvanceAmt: / * move 1 p i x e l t o t h e l e f t
or r i g h t * I
E r r o r T e r m -- H e i g h t : / * advanceErrorTerm t o n e x t p o i n t */
/ * Scan t h e edge f o r e a c h s c a n l i n e i n t u r n
*/
f o r (i H e i g h t - S k i p F i r s t ; i - > 0: WorkingEdgePointPtr++)
73 1
X c o o r d i n a t ei nt h ea p p r o p r i a t ee d g el i s t
*/
1)
WorkingEdgePointPtr->XStart
X1:
else
WorkingEdgePointPtr->XEnd
X1;
/ * D e t e r m i n ew h e t h e ri t ' st i m ef o rt h e
X c o o r dt oa d v a n c e
i f ( ( E r r o r T e r m +- W i d t h ) > 0) {
X 1 +- AdvanceAmt: I* move 1 p i x e l t o t h e l e f t o r r i g h t
E r r o r T e r m -- H e i g h t : /* advanceErrorTerm t o c o r r e s p o n d
I* S t o r e t h e
i f (SetXStart
1 else
*/
*/
*/
/*
Edge i sc l o s e rt oh o r i z o n t a lt h a nv e r t i c a l( X - m a j o r )
*/
I* Minimum d i s t a n c e t o a d v a n c e
X e a c ht i m e * I
XMajorAdvanceAmt
( W i d t h / H e i g h t ) * AdvanceAmt;
I* E r r o r t e r m a d v a n c e f o r d e c i d i n g
when t o advance X 1 e x t r a */
ErrorTermAdvance
Width % Height:
i f ( D e l t a X >- 0)
ErrorTerm
0: / * i n i t i a l e r r o r t e r m g o i n g l e f t - > r i g h t
*/
else
- H e i g h t + 1:
/* g o i n gr i g h t - > l e f t */
ErrorTerm
i f (SkipFirst) {
I* s k i pt h ef i r s tp o i n t
i f so i n d i c a t e d * /
X 1 +- XMajorAdvanceAmt:
/* move X minimum d i s t a n c e * I
I* D e t e r m i n e w h e t h e r i t ' s t i m e f o r
X t o advanceoneextra
*I
i f ( ( E r r o r T e r m +- ErrorTermAdvance) > 0) {
I* move X onemore * I
X 1 +- AdvanceAmt:
E r r o r T e r m -- H e i g h t : /* advanceErrorTerm t o c o r r e s p o n d * /
*EdgePointPtr
- WorkingEdgePointPtr:
/ * a d v a n c ec a l l e r ' sp t r
*/
732
Chapter 39
meets that eye, though. Display memory generally responds much moreslowly than
system memory, especiallyin 386 and 486 systems. That means that muchof the time
taken by Listing 39.3 is actually spent waiting for display memory accesses to complete, with the processor forced to idle by wait states. If,instead, Listing 39.3 drew to
a local buffer in system memory or to a particularly fast VGA, the assembly implementation might well display a far more substantial advantage over the C code.
And indeed it does. When the test program is modified to draw to a local buffer,
both theC and assembly language versions get 0.29 seconds faster, that being a measure of the time taken by display memory wait states. Withthose wait states factored
out, theassembly language version of DrawHorizontalLineLitbecomes almost three
times as fast asthe C code.
There is a lesson here. An optimization has no fixed payofl its value fluctuates
according to the context in which it is used. Therek relatively little benefit to firther
optimizing code that already spends
halfits time waitingfor display memoy; no matter how good your optimizations, you'll getonly a two-times speedup at best, and
generally much less than that. There is, on the other hand, potential for tremendous improvement when drawing to system memoy , so ifthat k where most ofyour
drawing will occui; optimizations such as Listing 39.3 are well worth the effort.
Know the environments in which your code will run, and know where the cycles go
in those environments.
LISTING 39.3139-3.ASM
o f h o r i z o n t a ll i n e sp a s s e di n .i n
VGA's 320x200256-color
mode.Uses
REP STOS t o fill
; Draws a l l p i x e l s i n t h e l i s t
: mode 1 3 h .t h e
: each l i n e .
; C n e a r - c a l l a b l ea s :
;
void DrawHorizontalLineList(struct HLineList
intColor);
; A
l a s s e m b l yc o d et e s t e dw i t h
HLineListPtr.
SCREEN-WIDTH
SCREEN-SEGMENT
equ
equ
320
OaOOOh
HLine struc
XStart
XEnd
HLi ne
dw
dw
ends
?
?
;X c o o r d i n a t e o f l e f t m o s t p i x e l i n l i n e
; X c o o r d i n a t eo fr i g h t m o s tp i x e li nl i n e
H L i n e L i s ts t r u c
Lngth
YStart
HLinePtr
HLineList
dw
dw
dw
ends
?
?
;# o
;pointertolist
dw
dw
dw
ends
2 dup(?)
?
?
; r e t u r na d d r e s s & pushed BP
; p o i n t e rt oH L i n e L i s ts t r u c t u r e
; c o l o rw i t hw h i c ht o
fill
f h o r i z o n t a ll i n e s
; Y c o o r d i n a t eo ft o p m o s tl i n e
o f h o r zl i n e s
Parms s t r u c
HLineListPtr
Color
Parms
733
.modelsmall
.code
pub1 ic - D r a w H o r i z o n t a l L i n e L i s t
align
2
- D r a w H o r i z o n t a l L i n epLr oi sct
push
bp
mov
bp.sp
p u sshi
push d i
cld
ax.SCREEN-SEGMENT
mov
mov
es.ax
mov
s. [i b p + H L i n e L i s t P t r l
ax,SCREEN-WIDTH
mov
mu1
[si+YStartl
mov
dx,ax
mov
bx.[si+HLinePtrl
mov
si.[si+Lngthl
and
s. is i
j zF i
11 Done
mov
a l . b y tpetCr b p + C o l o r l
mov
ah.al
Fi 11 Loop:
mov
d. [i b x + X S t a r t l
mov
cx.[bx+XEndl
cx.di
sub
L i n ejFs i l l D o n e
ci xn c
add
d i .dx
t e s t d i .1
jz
Mai
nFi
11
stosb
cx
dec
jz
MainFill:
shr
rep
adc
L ni e F 1i 1 Done
cx.1
stosw
cx.cx
srteops b
LineFillDone:
add
b x . s i zHeL i n e
add
dx.SCREEN-WIDTH
dec
si
j n zF i
11Loop
F i 1 1Done:
pop
di
pop
si
bPPOP
ret
- D r a w H o r i z o n t a l L i n e Ln idspt
end
734
Chapter 39
: p r e s e r v ec a l l e r ' ss t a c kf r a m e
: p o i n tt oo u rs t a c kf r a m e
; p r e s e r v ec a l l e r ' sr e g i s t e rv a r i a b l e s
:make s t r i n g i n s t r u c t i o n s i n c p o i n t e r s
: p o i n t ES t o d i s p l a y
:pointtothelinelist
:pointtothestartofthefirstscan
; lineinwhichto
draw
:ES:DX p o i n t s t o f i r s t s c a n l i n e t o
; draw
: p o i n tt ot h eX S t a r t / X E n dd e s c r i p t o r
: f o rt h ef i r s t( t o p )h o r i z o n t a ll i n e
P
: o fs c a nl i n e st od r a w
; a r e t h e r e any l i n e s t o draw?
:no. so we'redone
; c o l o rw i t hw h i c ht o
fill
: d u p l i c a t ec o l o rf o r
STOSW
: l e f t edge o f fill on t h i s l i n e
; r i g h t edge o f fill
; s k i p i f n e g a t i v ew i d t h
: w i d t ho f fill on t h i s l i n e
;offset of left
edge o f fill
:does fill s t a r t a t anoddaddress?
:no
; y e s .d r a wt h eo d dl e a d i n gb y t et o
; w o r d - a l i g nt h er e s to ft h e
fill
; c o u n to f ft h e
o d dl e a d i n gb y t e
;done i f t h a t was t h e o n l y b y t e
;#o f words i n fill
:fill as many w o r d sa sp o s s i b l e
trailingbyteto
:1 i f t h e r e ' s anodd
: do, 0 o t h e r w i s e
:fill anyodd
t r a i l i n gb y t e
:pointtothenextlinedescriptor
: p o i n tt ot h en e x ts c a nl i n e
:countofflinesto
fill
; r e s t o r ec a l l e r ' sr e g i s t e rv a r i a b l e s
; r e s t o r ec a l l e r ' ss t a c kf r a m e
LISTING39.4139-4.ASM
:
:
:
:
:
Scan c o n v e r t sa ne d g ef r o m
( X 1 , Y l ) t o ( X 2 . Y Z ) . n o ti n c l u d i n gt h e
p o i n t a t ( X 2 , Y Z ) . I f S k i p F i r s t == 1. t h e p o i n t a t
(X1,Yl) isnt
drawn: i f S k i p F i r s t == 0 . i t i s .F o re a c hs c a nl i n e ,t h ep i x e l
c l o s e s t t o t h e s c a n n e de d g ew i t h o u tb e i n gt ot h el e f to ft h es c a n n e d
edge i s chosen. Uses a na l l - i n t e g e ra p p r o a c hf o rs p e e d
& precision.
735
: C n e a r - c a l l a b l ea s :
v o i dS c a n E d g e ( i n t
X 1 , i n t Y 1 , i n t X2, i n t Y2. i n S
t etXStart,
iS
n tk i p F i r sst t. r u H
c tL i n* e* E d g e P o i n t P t r ) ;
: Edgesmust n o t gobottom t o t o p : t h a t i s ,
Y 1 mustbe
<- Y2.
: U p d a t e st h ep o i n t e rp o i n t e dt ob yE d g e P o i n t P t rt op o i n tt ot h en e x t
: f r e ee n t r yi nt h ea r r a yo fH L i n es t r u c t u r e s .
;
HsLtirnuec
XStart
XEnd
HLine
ends
dw
dw
?
?
;X c o o r d i n a t e o f l e f t m o s t p i x e l i n s c a n l i n e
; X c o o r d i n a t eo fr i g h t m o s tp i x e li ns c a nl i n e
Y1
x2
Y2
SetXStart
dw
dw
dw
dw
dw
dw
2 dup(?)
?
?
?
?
?
SkipFi rst
dw
EdgePointPtr
dw
; r e t u r na d d r e s s & pushed BP
:X s t a r tc o o r do fe d g e
:Y s t a r t c o o r d o f
edge
: X e n dc o o r do fe d g e
: Y endcoordofedge
;1 t o s e t t h e X S t a r t f i e l d o f e a c h
: H L i n es t r u c . 0 t o s e t XEnd
;1 t o s k i p s c a n n i n g t h e f i r s t p o i n t
: o f t h ee d g e , 0 t o scan f i r s t p o i n t
; p o i n t e rt o a pointertothearrayof
: H L i n es t r u c t u r e si nw h i c ht os t o r e
; thescanned
X coordinates
Parms
struc
x1
Parms
ends
: O f f s e t sf r o m
AdvanceAmt -2
Height
LOCALLSIZE
BP i ns t a c kf r a m eo fl o c a lv a r i a b l e s
equ
equ
equ
-4
4
.modelsmal 1
.code
pub1 i c -ScanEdge
a1 i g n 2
-ScanEdge
proc
push
bp
mov
bp.sp
sub
sp.LOCAL-SIZE
push s i
push d i
mov
di.[bp+EdgePointPtrl
mov
d i ,[ d i 1
cmp
Cbp+SetXStartl.l
jz
add
HLinePtrSet
d i .XEnd
HLinePtrSet:
mov
bx.[bp+YZI
bx.[bp+Y11
sub
jl e
ToScanEdgeExi t
mov
Cbp+Heightl,bx
cx,cx
sub
mov
dx.1
mov
ax.Cbp+XZl
ax.Cbp+Xll
sub
jz
Isvertical
736
Chapter 39
; t o t a ls i z eo fl o c a lv a r i a b l e s
; p r e s e r v ec a l l e r ' ss t a c kf r a m e
: p o i n tt oo u rs t a c kf r a m e
; a l l o c a t es p a c ef o rl o c a lv a r i a b l e s
: p r e s e r v ec a l l e r ' sr e g i s t e rv a r i a b l e s
: p o i n tt ot h eH L i n ea r r a y
; s e tt h eX S t a r tf i e l do fe a c hH L i n e
: struc?
;yes. D I p o i n t s t o t h e f i r s t X S t a r t
:no. p o i n t t o t h e XEnd f i e l d o f t h e
: f i r s t H L i n es t r u c
: e d g eh e i g h t
: g u a r da g a i n s t0 - l e n g t h
& horzedges
:Height
Y2 - Y 1
;assume E r r o r T e r ms t a r t sa t
0 (true if
: w e ' r em o v i n gr i g h t
as we draw)
1 (move r i g h t )
;assumeAdvanceAmt
;DeltaX
X2 - X 1
: i t ' s a v e r t i c a le d g e - - s p e c i a lc a s e
it
;DeltaX >- 0
jns
SetAdvanceAmt
mov
cx.1
;DeltaX < 0 (move l e f t as we draw)
sub
cx.
bx
;ErrorTerm
-Height + 1
dx
neg
-1 (move l e f t )
;AdvanceAmt
ax
neg
;Width
abs(De1taX)
SetAdvanceAmt:
mov
[bp+AdvanceAmtl.dx
: F i g u r eo u tw h e t h e rt h ee d g ei sd i a g o n a l ,X - m a j o r( m o r eh o r i z o n t a l ) .
: or Y - m a j o r( m o r ev e r t i c a l )
a n dh a n d l ea p p r o p r i a t e l y .
cmp
ax.bx
; i f Width-Height.
i t ' s a diagonaledge
; i t ' s a d i a g o n a el d g e - - s p e c i a cl a s e
I s D i jazg o n a l
; i t ' s a Y - m a j o r( m o r ev e r t i c a l )e d g e
jb
YMajor
; i t ' s anX-major(morehorz)edge
;prepare
dx.dxsub
DX:AXd i v(i W
fsoi iordnt h )
div
;DX
e r r o rt e r ma d v a n c ep e rs c a nl i n e
;SI
minimum I
p ioxtfeo l s
advance X
mov
si.ax
: on eachscan l i n e
t e s t [bp+AdvanceAmt1.8000h
;move l e f t or r i g h t ?
jz
XMajorAdvanceAmtSet
: r i gahl tr .e as de yt
a d vdai stnotcntaneg
hen;egl ceaeftte. s i
; oneachscan
line
XMajorAdvanceAmtSet:
;
mov [ b; p
s at+axXr, lt ]i n g
X coordinate
cmp
C b p + S k i p F i r s t l .;l s k i pt h ef i r s pt o i n t ?
jz
XMajorSkipEntry
XMajorLoop:
mov
[di].ax
; s t o r et h ec u r r e n t
X value
add
d i . s i zHeL i n e
; p o i n tt ot h en e x tH L i n es t r u c
XMajorSkipEntry:
ax.si
add
; s e t X f o rt h en e x ts c a nl i n e
cx.dx
add
: a d v a n c ee r r o rt e r m
jle
XMajorNoAdvance
; n o tt i m ef o r
X c o o r dt oa d v a n c eo n e
; extra
ax.Cbp+AdvanceAmtl
add
:advance X c o o r do n ee x t r a
: a d j u s te r r o rt e r mb a c k
cx.[bp+Heightl
sub
XMajorNoAdvance:
:countoffthisscanline
bx
dec
jnz
XMajorLoop
ScanEdgeDone
jmp
align 2
ToScanEdgeExit:
ScanEdgeExit
jmp
align 2
Isvertical :
mov
ax,[bp+Xl]
X coordinate
; s t a r t i n g( a n do n l y )
bsxu,b[ b p + S k i p F i r s t l
; l o o pc o u n t
Height - S k i p F i r s t
jz
ScanEdgeExit
;no s c a n l i n e s l e f t a f t e r s k i p p i n g 1 s t
V e r t i c a l Loop:
mov
[di].ax
: s t o r et h ec u r r e n t
X value
add
d i . s i zHeL i n e
: p o i n tt ot h en e x tH L i n es t r u c
bxdec
; c o u n to f ft h i ss c a nl i n e
jnz
V e r t i c a l Loop
ScanEdgeDone
jmp
align 2
IsDiagonal :
mov
ax.Cbp+Xl]
: s t a r t i n g X coordinate
cmp
[bp+SkipFi r s t l . l
;skipthefirstpoint?
0 i a g jozn a l S k i p E n t r y ; y e s
--
bx
--
737
Previous
Diagonal Loop:
mov
Cdi1.a~
add
d i . s i zHeL i n e
DiagonalSkipEntry:
ax.dx
add
bx
dec
j nDz i a g o n a l
Loop
jmp
ScanEdgeDone
align 2
YMajor:
push
bp
mov
s i , Cbp+X11
cmp
[bp+SkipFirstl.l
mov
bp.bx
jz
YMajorSkipEntry
YMajorLoop:
mov
[di 3 ,si
add
d i . s i zHeL i n e
YMajorSkipEntry:
cx.ax
add
jle
YMajorNoAdvance
add
s i .dx
cx.bp
sub
YMajorNoAdvance:
bx
dec
YMajorLoop
jnz
POP
bp
ScanEdgeDone:
cmp
Cbp+SetXStartl.l
jz
UpdateHLinePtr
sub
d i .XEnd
UpdateHLinePtr:
mov
bx.Cbp+EdgePointPtrl
mov
C b .xdl i
ScanEdgeExit:
pop
di
POP
si
mov
sp.bp
POP
bp
ret
-ScanEdge
endp
end
738
Chapter 39
: s t o r et h ec u r r e n t
X value
: p o i n tt ot h en e x tH L i n es t r u c
:advancethe X c o o r d i n a t e
: c o u n to f ft h i ss c a nl i n e
: p r e s e r v es t a c kf r a m ep o i n t e r
; s t a r t i n g X coordinate
:skipthefirstpoint?
BP f o r e r r o r t e r m c a l c s
: p u tH e i g h ti n
: y e s .s k i pt h ef i r s tp o i n t
X value
: s t o r et h ec u r r e n t
: p o i n tt ot h en e x tH L i n es t r u c
; a d v a n c et h ee r r o rt e r m
: n o tt i m ef o r
X c o o r dt oa d v a n c e
:advancethe X c o o r d i n a t e
: a d j u s te r r o rt e r mb a c k
: c o u n to f ft h i ss c a nl i n e
: r e s t o r es t a c kf r a m ep o i n t e r
:were we w o r k i n g w i t h X S t a r t f i e l d ?
:yes. D I p o i n t s t o t h e n e x t X S t a r t
:no. p o i n t b a c k t o t h e X S t a r t f i e l d
: p o i n tt op o i n t e rt oH L i n ea r r a y
: u p d a t ec a l l e r ' sH L i n ea r r a yp o i n t e r
: r e s t o r ec a l l e r ' sr e g i s t e rv a r i a b l e s
: d e a l l o c a t el o c a lv a r i a b l e s
: r e s t o r ec a l l e r ' ss t a c kf r a m e
Home
Next
Previous
chaptyer 40
of songs, taxes, and the simplicity of complex
polygons
Home
Next
8?
$2::
.2
Ia
h IrregularPolygonalAreas
,~
*_n.
Every so often, m5daughterasks me to sing her to sleep. (If youve ever heard me
sing, this may caus&you concern about either her hearing or her judgement, but
love knows no boun&j As any parent is well aware, singing a young child to sleep
can easily take several;&%&%,
or until sunrise, whichever comes last. One night, running low on childre$s songs, I switched to a Beatles medley, and at long last her
breathing became s&w and regular. At the end,I softly sang A Hard Days Night,
then quietly stood i p to leave. As I tiptoed out, she said, in a voice not even faintly
tinged with slee #Dad, what do they mean, working like a dog? Chasing a stick?
That doesflFf;& ..x
asense; people dont
chase sticks.
That ledus into a dikussion of idioms, which made about as much sense to her as an
explanation of quantnm mechanics. Finally, I fell back on my standard explanation
of the Universe, whichis that a lot of the time it simplydoesnt make sense.
As a general principle, that explanation holds up remarkably well. (In fact, having
just done my taxes, I think Earth is actually run by blob-creatures from the planet
Mrxx, who are helplessly doubled over with laughter at the ridiculous things they
can make us do. Lets make them get Social Security numbers for theirpets next
year! theyre saying right now, gasping for breath.) Occasionally, however,one has
the rarepleasure of finding a corner of the Universe that makes sense, where everything fits together as if preordained.
Filling arbitrary polygons is such a case.
74 1
Active Edges
The basic premise of filling a complex polygon is that fora given scanline, we determine all intersections between the polygons edges and that scan line and then fill
the spans between the intersections, as shown in Figure 40.1. (Section 3.6 of Foley
and van Dams Computer Guphics, Second Edition provides an overview of this and
other aspects of polygon filling.) There areseveral rules that mightbe used to determine which spans are drawn and which arent; well use the odd/even rule, which
specifies that drawing turns on after odd-numbered intersections(first, third, andso
on) andoff after even-numbered intersections.
The question then becomes how can we most efficiently determine which edges
cross each scan line and where? As it happens, there is a great deal of coherence
from one scan line to the next ina polygon edge list, because each edge starts at a
given Y coordinate and continues unbroken until it ends. In other words, edges
dont leap about and stop and start randomly; the X coordinate of an edge at one
scan line is a consistent delta from thatedges X coordinate at the
last scanline, and
that is consistent for the lengthof the line.
742
Chapter 40
Intersection#2
turns off
Intersection #1
turns on
drawing
Intersection#3
turns on
0 0 0 0 0 0
This allows us toreduce the numberof edges that must be checked for intersection;
on any given scan line, we only need to check for intersections with the currently
active edges-edges that start on that scan line, plus all edges that start on earlier
(above) scan lines and haven't ended yet-as shown in Figure 40.2. This suggests
that we can proceed from thetop scan line of the polygon to the bottom, keepinga
743
running list of currently active edges-called the active edge table (AET)-with the
edges sorted in order of ascending X coordinate of intersection with the current
scan line. Then, we can simply filleach scan line in turn according to the list of active
edges at that line.
Maintaining the AET from one scan line to the next involves three steps: First, we
must add to the AET any edges that start on the current scan line, making sure to
keep the AET X-sorted for efficient odd/even scanning. Second, we must remove
edges that end on the current
scan line. Third, we must advance the X coordinates
of active edges with the same sort of error term-based, Bresenhams-like approach
we used for convex polygons,again ensuring that the AET is X-sorted after advancing the edges.
Advancing the X coordinates is easy. For each edge,well store the currentX coordinate and all required error term information, and
well usethat to advance the edge
one scan line at a time; then, well resort the AET by X coordinate as needed. Removing edges as theyend is also easy; well
just countdown the lengthof each active
edge on each scan line and remove an edge when its count reaches zero. Adding
edges as their tops are encountered is a tad more complex. While there are a number of ways to do this, one particularly efficient approach is to start out by putting all
the edges of the polygon, sorted by increasing Y coordinate, into single
a
list, called
the global edge table (GET). Then,as each scan line is encountered, all edges at the
start of the GET that begin on the currentscan line are moved to the AET; because
the GET is Y-sorted,theres no need to search the entire GET. For still greater efficiency, edges in the GET that share common Y coordinates can be sorted by increasing
X coordinate; this ensures that no more than one pass through the AET per scan
line is ever needed when adding new edges from the GET in such a way as to keep
the AET sorted in ascending X order.
What form shouldthe GET and AET take? Linked lists of edge structures,as shown
in Figure 40.3. With linked lists, all thats required to move edges from the GET to
the AET as they become active,sort theAET, and remove edges that have been fully
drawn is the exchangingof a few pointers.
In summary, well initiallystore all the polygon edges in Yprimary/X-secondary sort
order in the GET, complete with initial X and Y coordinates, error terms and error
term adjustments, lengths, and directions of X movement for each edge. Once the
GET is built, well do thefollowing:
1. Set the current Y coordinate to the Y coordinate of the first edge in the GET.
2. Move all edgeswith the current Y coordinate from the GET to
the AET, removing them
from the GET and maintaining the X-sorted order of the AET.
3. Draw all odd-to-even spansin the AET at the current Y coordinate.
4. Count down the lengths ofall edges in the AET, removing any edges that are done, and
advancing theX coordinates of all remaining edges in the AET by one scan line.
744
Chapter 40
Count
Count
Next * edge
Next'edge
Count
Count
Count
'edge
Next
'edge
Next
'edge
Next
L40-1.C
C o l o r - f i l l s an a r b i t r a r i l y - s h a p e d p o l y g o n d e s c r i b e d
by V e r t e x L i s t .
I f t h e f i r s t and l a s t p o i n t s i n V e r t e x L i s t a r e n o t t h e
same, t h ep a t h
a r o u n dt h ep o l y g o n
i sa u t o m a t i c a l l yc l o s e d .
Al v e r t i c e s a r e o f f s e t
1 f o rs u c c e s s ,
0 i f memory a l l o c a t i o n
b y( X O f f s e t ,Y O f f s e t ) .R e t u r n s
failed. A
l C c o d et e s t e dw i t hB o r l a n d
C++.
I f t h ep o l y g o ns h a p ei s
known i n a d v a n c e ,s p e e d i e rp r o c e s s i n g
may be
- a r u b b e rb a n d
e n a b l e db ys p e c i f y i n gt h es h a p ea sf o l l o w s :" c o n v e x "
s t r e t c h e da r o u n dt h ep o l y g o nw o u l dt o u c he v e r yv e r t e xi no r d e r :
"nonconvex" - t h ep o l y g o ni sn o ts e l f - i n t e r s e c t i n g ,b u tn e e dn o tb e
convex:"complex"
- t h ep o l y g o n
may b es e l f - i n t e r s e c t i n g ,o r ,i n d e e d ,
any s o r t o f p o l y g o n a t
all. Complex will w o r kf o ra l lp o l y g o n s :c o n v e x
i sf a s t e s t .U n d e f i n e dr e s u l t s
will o c c u r i f convex i s s p e c i f i e d f o r
a
n o n c o n v e xo rc o m p l e xp o l y g o n .
D e f i n e CONVEX-CODELLINKED i f t h e f a s t c o n v e x p o l y g o n f i l l i n g c o d e f r o m
Chapter 38 i sl i n k e di n .O t h e r w i s e ,c o n v e xp o l y g o n sa r e
h a n d l e db yt h ec o m p l e xp o l y g o nf i l l i n gc o d e .
Nonconvex i s handledascomplex
i nt h i si m p l e m e n t a t i o n .
See t e x t f o r a
d i s c u s s i o no ff a s t e rn o n c o n v e xh a n d l i n g .
*/
P
in c l u d e < s t d i o . h >
# i n c l u d e< m a t h . h >
P i f d e f -TURBOC-
745
{temp
- a: a - b: b - temp:)
s t r u c tE d g e S t a t e
(
s t r u c tE d g e S t a t e* N e x t E d g e :
i n t X:
in t S t a r t Y :
in t WholePixelXMove;
i n t X D i r e c t i on:
in t E r r o r T e r m :
in t ErrorTermAdjUp:
i n t ErrorTermAdjDown;
i n t Count:
1:
extern
extern
static
static
static
static
static
v o i d OrawHorizontalLineSeg(int. i n t . i n t , i n t ) :
i n t FillConvexPolygon(struct P o i n t L i s t H e a d e r *, i n t . i n t . i n t ) :
v o i dB u i l d G E T ( s t r u c tP o i n t L i s t H e a d e r
*, s t r u c tE d g e S t a t e *, i n t . i n t ) :
v o i dM o v e X S o r t e d T o A E T ( i n t ) :
v o i dS c a n O u t A E T ( i n t .i n t ) :
v o i dA d v a n c e A E T ( v o i d 1 :
v o i dX S o r t A E T ( v o i d ) ;
I* P o i n t e r st og l o b a le d g et a b l e
(GET) a n da c t i v ee d g et a b l e( A E T )
s t a t i cs t r u c tE d g e S t a t e* G E T P t r .* A E T P t r ;
*I
* V e r t e x L i s t .i n tC o l o r .
i n tF i l l P o l y g o n ( s t r u c tP o i n t L i s t H e a d e r
i n t PolygonShape. i n t X O f f s e t . i n t Y O f f s e t )
s t r u c tE d g e S t a t e* E d g e T a b l e B u f f e r :
intCurrentY:
# i f d e f CONVEX-CODELLINKED
I* P a s sc o n v e xp o l y g o n st h r o u g ht of a s tc o n v e xp o l y g o nf i l l e r
*I
i f (PolygonShape
CONVEX)
return(FillConvexPolygon(VertexList. C o l o r ,X O f f s e t .Y O f f s e t ) ) ;
#endl f
I* It t a k e s a minimum o f 3 v e r t i c e s t o c a u s e a n y p i x e l s t o b e
d r a w n :r e j e c tp o l y g o n st h a ta r eg u a r a n t e e dt ob ei n v i s i b l e
*I
i f (VertexList->Length < 3 )
return(1):
I* Getenough memory t o s t o r e t h e e n t i r e
edge t a b l e * I
i f ((EdgeTableBuffer
*
( s t r u c tE d g e S t a t e *) ( m a l l o c ( s i z e o f ( s t r u c tE d g e s t a t e )
VertexList->Length)))
NULL)
return(0):
I* c o u l d n ' tg e t memory f o rt h ee d g et a b l e
*/
I* B u i l d t h e g l o b a l e d g e t a b l e
*I
B u i l d G E T ( V e r t e x L i s tE
. d g e T a b l e B u f f e rX
, O f f s e tY
, Offset);
I* Scan down t h r o u g ht h ep o l y g o ne d g e s ,o n es c a nl i n ea t
a time,
so l o n g a s a t l e a s t
oneedgeremains
i n e i t h e r t h e GET o r AET * I
NULL:
I* i n i t i a l i z et h ea c t i v e
e d g et a b l et oe m p t y
*I
AETPtr
CurrentY
G E T P t r - > S t a r t Y ; /* s t a r t a t t h e t o p p o l y g o n v e r t e x
*I
w h i l e( ( G E T P t r
!- NULL) 1 1 (AETPtr !- NULL)) (
I* u p d a t e AET f o r t h i s s c a n l i n e
*I
MoveXSortedToAET(CurrentY):
I* draw t h i s scan l i n e f r o m AET * I
S c a n O u t A E T ( C u r r e n t YC
. olor);
--
746
Chapter 40
/*
/*
/*
AdvanceAETO;
XSortAETO;
CurrentYU;
*/
*/
*/
/*
C r e a t e s a GET i n t h e b u f f e r p o i n t e d t o b y N e x t F r e e E d g e S t r u c f r o m
i f n e c e s s a r y ,t o
t h ev e r t e xl i s t .
Edge e n d p o i n t sa r ef l i p p e d ,
g u a r a n t e ea l le d g e sg ot o pt ob o t t o m .T h e
GET i s s o r t e d p r i m a r i l y
b ya s c e n d i n g
Y s t a r tc o o r d i n a t e ,a n ds e c o n d a r i l yb ya s c e n d i n g
X
s t a r tc o o r d i n a t ew i t h i ne d g e sw i t h
common Y c o o r d i n a t e s . */
s t a t i cv o i dB u i l d G E T ( s t r u c tP o i n t L i s t H e a d e r
* VertexList.
s t r u c tE d g e S t a t e * N e x t F r e e E d g e S t r u c .i n tX O f f s e t .i n tY O f f s e t )
{
i n t i. S t a r t X .S t a r t Y .
EndX. EndY. D e l t a YD
. e l t a XW
. i d t ht,e m p ;
s t r u c tE d g e S t a t e* N e w E d g e P t r ;
s t r u c tE d g e S t a t e* F o l l o w i n g E d g e ,* * F o l l o w i n g E d g e L i n k ;
s t r u c tP o i n t* V e r t e x P t r :
/*
S c a nt h r o u g ht h ev e r t e xl i s ta n dp u ta l ln o n - 0 - h e i g h te d g e si n t o
t h e GET, s o r t e d by i n c r e a s i n g Y s t a r t c o o r d i n a t e * /
VertexPtr
VertexList->PointPtr;
/ * p o i n tt ot h ev e r t e xl i s t
GETPtr
NULL;
/ * i n i t i a l i z et h eg l o b a l
e d g et a b l et oe m p t y
f o r ( i 0; i < V e r t e x L i s t - > L e n g t h ; i++)
{
/ * C a l c u l a t et h ee d g eh e i g h ta n dw i d t h
*/
StartX
VertexPtrCi1.X + XOffset;
StartY
VertexPtrCi1.Y + YOffset;
/ * T h ee d g er u n sf r o mt h ec u r r e n tp o i n tt ot h ep r e v i o u so n e
i f (i
0) I
/ * Wrap b a c k a r o u n d t o t h e e n d o f t h e l i s t
*/
EndX
VertexPtrCVertexList->Length-1l.X + X O f f s e t ;
EndY
VertexPtr[VertexList->Length-1l.Y + YOffset;
1 else I
VertexPtrCi-11.X + XOffset;
EndX
EndY
VertexPtrCi-ll.Y + YOffset;
--
--
I* Make s u r et h ee d g er u n st o pt ob o t t o m
i f ( S t a r t Y > EndY) {
SWAP(StartX. EndX);
SWAP(StartY. EndY);
/*
S k i p i f t h i sc a n ' te v e rb ea na c t i v ee d g e( h a s
i f ((DeltaY
EndY - S t a r t Y ) !- 0 ) {
*/
*/
*/
*/
0 height)
*/
/ * A l l o c a t es p a c ef o rt h i se d g e ' si n f o ,a n d
fill i n t h e
s t r u c t u r e */
NextFreeEdgeStruc++:
NewEdgePtr
/* d i r e c t i o ni nw h i c h
X moves
NewEdgePtr->XDirection
((DeltaX
EndX - S t a r t X ) > 0 ) ? 1 : -1;
Width
abs(De1taX):
StartX;
NewEdgePtr->X
NewEdgePtr->Starty
StartY;
Del taY;
NewEdgePtr->Count
NewEdgePtr->ErrorTermAdjDown
DeltaY:
i f ( D e l t a X >- 0 ) /* i n i t i a le r r o rt e r mg o i n g
L->R */
NewEdgePtr->ErrorTerm
0;
else
/ * i n iet trigear orlRmir -n>gL
*/
NewEdgePtr->ErrorTerm
- 0 e l t a Y + 1:
- - -
*/
747
>- W i d t h ) (
/* Y-major
edge
NewEdgePtr->WholePixelXMove
0;
NewEdgePtr->ErrorTermAdjUp W i d t h ;
else I
I* X-major
edge
i f (DeltaY
*/
*/
-NewEdgePtr->WholePixelXMove *
(Width I DeltaY)
NewEdgePtr->XDirection:
Width X DeltaY;
NewEdgePtr->ErrorTermAdjUp
I* L i n k t h e
- -
--
- &FollowingEdge->NextEdge;
FollowingEdgeLink
I* S o r t s a l l
edges c u r r e n t l y i n t h e a c t i v e
edge t a b l e i n t o a s c e n d i n g
o r d e ro fc u r r e n t
X coordinates *I
s t a t i cv o i dX S o r t A E T O
{
s t r u c tE d g e S t a t e* C u r r e n t E d g e .* * C u r r e n t E d g e P t r .
*TempEdge;
i n t Swapoccurred;
I* Scan t h r o u g h t h e
AET andswapany
a d j a c e n te d g e sf o rw h i c ht h e
secondedge
i s a t a l o w e rc u r r e n t
X c o o r dt h a nt h ef i r s te d g e .
i s needed * I
R e p e a tu n t i ln of u r t h e rs w a p p i n g
i f (AETPtr !- NULL) (
do
Swapoccurred
0:
CurrentEdgePtr
&AETPtr;
w h i l e( ( C u r r e n t E d g e
*CurrentEdgePtr)->NextEdge !- NULL) {
i f ( C u r r e n t E d g e - > X > CurrentEdge->NextEdge->X)
I* The secondedgehasalower
X thanthefirst;
swap them i n t h e AET * I
TempEdge
CurrentEdge->NextEdge->NextEdge:
*CurrentEdgePtr
CurrentEdge->NextEdge:
CurrentEdge->NextEdge->NextEdge C u r r e n t E d g e ;
CurrentEdge->NextEdge
TempEdge:
1:
Swapoccurred
--
CurrentEdgePtr
I w h i l e( S w a p o c c u r r e d
I* Advanceseachedge
&(*CurrentEdgePtr)->NextEdge;
!- 0 ) :
i n t h e AET byonescan
line.
Removes e d g e st h a th a v eb e e nf u l l ys c a n n e d .
*/
s t a t i c v o i d AdvanceAETO I
s t r u c tE d g e S t a t e* C u r r e n t E d g e .* * C u r r e n t E d g e P t r :
748
Chapter 40
/* Countdownandremove
oradvanceeachedge
i n t h e AET */
CurrentEdgePtr
&AETPtr:
w h i l e( ( C u r r e n t E d g e = * C u r r e n t E d g e P t r ) !- NULL) I
/ * Count o f f one scan l i n e f o r t h i s edge * /
i f ((--(CurrentEdge->Count))
0) I
/ * T h i se d g e i s f i n i s h e d , s o remove i t f r o mt h e AET * I
*CurrentEdgePtr
CurrentEdge->NextEdge:
I else t
I* A d v a n c et h ee d g e ' s
X c o o r d i n a t eb ym i n i m u m
move * /
CurrentEdge->X +- CurrentEdge->WholePixelXMove:
/ * D e t e r m i n ew h e t h e ri t ' st i m ef o r
X t o a d v a n c eo n ee x t r a
--
i f ((CurrentEdge->ErrorTerm
*/
CurrentEdge->ErrorTermAdjUp) > 0 ) t
CurrentEdge->X +- C u r r e n t E d g e - > X D i r e c t i o n :
C u r r e n t E d g e - > E r r o r T e r m -- CurrentEdge->ErrorTermAdjDown:
CurrentEdgePtr
+-
&CurrentEdge->NextEdge;
/*
Moves a l l edges t h a t s t a r t a t t h e s p e c i f i e d
Y c o o r d i n a t ef r o mt h e
GET t o t h e AET, m a i n t a i n i n g t h e X s o r t i n g o f t h e
AET. * /
s t a t i cv o i dM o v e X S o r t e d T o A E T ( i n t
YToMove) I
s t r u c tE d g e s t a t e *AETEdge.**AETEdgePtr,*TempEdge:
i n tC u r r e n t X :
/*
- ---
/*
F i l l st h es c a nl i n ed e s c r i b e db yt h ec u r r e n t
AET a t t h e s p e c i f i e d
Y
c o o r d i n a t ei nt h es p e c i f i e dc o l o r ,u s i n gt h eo d d l e v e n
fill r u l e * I
s t a t i cv o i dS c a n O u t A E T ( i n t
YToScan. i n t C o l o r ) {
i n tL e f t X :
s t r u c tE d g e s t a t e* C u r r e n t E d g e :
/*
Scan t h r o u g ht h e AET, d r a w i n gl i n es e g m e n t sa se a c hp a i ro fe d g e
c r o s s i n g si se n c o u n t e r e d .
The n e a r e s tp i x e l on o r t o t h e r i g h t
o f l e f t edges i s d r a w n , a n d t h e n e a r e s t p i x e l t o t h e l e f t o f b u t
n o t on r i g h t edges i s drawn * /
C u r r e n t E d g e = AETPtr;
w h i l e( C u r r e n t E d g e
!- NULL) I
L e f t X = CurrentEdge->X:
CurrentEdge
CurrentEdge->NextEdge;
OrawHorizontalLineSeg(YToScan. L e f t XC. u r r e n t E d g e - > X - 1C. o l o r ) :
CurrentEdge
CurrentEdge->NextEdge:
LAO-2.C
Draws a l l p i x e l s i n t h e h o r i z o n t a l l i n e
s e g m e n tp a s s e di n .f r o m
( L e f t X . Y )t o( R i g h t X . Y ) .i nt h es p e c i f i e dc o l o r
i n mode 1 3 h .t h e
VGA's 3 2 0 x 2 0 02 5 6 - c o l o r
mode. B o t hL e f t Xa n dR i g h t Xa r ed r a w n .
No
d r a w i n g will t a k ep l a c e i f L e f t X > R i g h t X . */
#i
ncl ude <dos. h>
P
incl ude "polygon. h"
# d e f i n e SCREEN-WIDTH
320
# d e f i n e SCREEN-SEGMENT
OxAOOO
s t a t i cv o i dO r a w P i x e l ( i n t .i n t .i n t ) :
v o i d DrawHorizontalLineSeg(Y,
i n t X;
/*
750
L e f t X .R i g h t X .C o l o r )
Draweach
p i x e li nt h eh o r i z o n t a ll i n e
t h e l e f t m o s t one * I
for (X
L e f t X : X <- R i g h t X ; X++)
DrawPixel(X. Y . C o l o r ) ;
Chapter 40
segment, s t a r t i n g w i t h
I* Draws t h e p i x e l a t
(X. Y ) i nc o l o rC o l o ri n
s t a t i cv o i dD r a w P i x e l ( i n t
X , i n t Y . i n tC o l o r )
u n s i g n e dc h a rf a r* S c r e e n P t r :
i l i f d e f -TURBOCScreenPtr
MK_FP(SCREEN-SEGMENT. Y * SCREEN-WIDTH + X ) :
I* MSC 5.0 * /
#else
FPLSEG(ScreenPtr)
SCREENLSEGMENT:
FP-OFF(ScreenPtr1 = Y * SCREENKWIDTH + X:
#endi f
* S c r e e n P t r = ( u n s i g n e dc h a r )C o l o r :
LISTING 40.3
POLYG0N.H
I* POLYG0N.H: Header f i l e f o r p o l y g o n - f i l l i n g
code * I
# d e f i n e CONVEX
0
# d e f i n e NONCONVEX 1
# d e f i n e COMPLEX
2
I* D e s c r i b e s a s i n g l e p o i n t ( u s e d f o r
s t r u c tP o i n t I
i n t X;
I* X c o o r d i n a t e * I
i n t Y;
I* Y c o o r d i n a t e * I
a s i n g l ev e r t e x )
*I
1:
I* D e s c r i b e s a s e r i e s o f p o i n t s ( u s e d t o s t o r e
a listofverticesthat
d e s c r i b e a p o l y g o n ;e a c hv e r t e xc o n n e c t st ot h et w oa d j a c e n t
v e r t i c e s ;t h el a s tv e r t e xi s
assumed t o c o n n e c t t o t h e f i r s t )
*I
s t r u c tP o i n t L i s t H e a d e r
{
L e ni ngtt h ;
I* p ilo i on ft s
*I
s t r u c tP o i n t * P o i n t P t r ;
I* p o i n t e r t o l i s t o f p o i n t s * /
1:
I* D e s c r i b e st h eb e g i n n i n ga n de n d i n g
X c o o r d i n a t e so f a s i n g l e
h o r i z o n t a ll i n e( u s e do n l yb yf a s tp o l y g o n
fill code) * I
s t r u c tH L i n e {
i n tX S t a r t :
I* X c o o r d i n a t e o f l e f t m o s t p i x e l i n l i n e
*I
i n t XEnd;
I* X c o o r d i n a t eo fr i g h t m o s tp i x e li nl i n e
*I
1:
I* D e s c r i b e s a l e n g t h - l o n gs e r i e so fh o r i z o n t a ll i n e s ,a l l
assumed t o
be on c o n t i g u o u ss c a nl i n e ss t a r t i n ga tY S t a r ta n dp r o c e e d i n g
downward(used
t o d e s c r i b e a s c a n - c o n v e r t e dp o l y g o nt ot h e
l o w - l e v e lh a r d w a r e - d e p e n d e n td r a w i n gc o d e )( u s e do n l yb yf a s t
p o l y g o n fill c o d e ) . * /
s t r u c tH L i n e L i s t {
i n tL e n g t h :
/ * il o f h o r i z o n t a l l i n e s * I
intYStart:
I* Y c o o r d i n a t e o f t o p m o s t l i n e
s t r u c tH L i n e * H L i n e P t r ;
I* p o i n t e r t o l i s t o f h o r z l i n e s
*I
*I
1;
LISTING 40.4
I* Sampleprogram
L40-4.C
t oe x e r c i s et h ep o l y g o n - f i l l i n gr o u t i n e s
*I
# i n c l u d e < c o n i o . h>
#i
ncl ude <dos .h>
#i
ncl ude "polygon. h"
# d e f i n e DRAW_POLYGON(PointList,Color,Shape.X.Y)
\
Polygon.Length = s i z e o f ( P o i n t L i s t ) / s i z e o f ( s t r u c t P o i n t ) : \
Polygon.PointPtr
PointList;
\
F i l l P o l y g o n ( & P o l y g o n .C o l o r ,
Shape, X . Y ) :
751
voidmain(void):
e x t e r ni n tF i l l P o l y g o n ( s t r u c tP o i n t L i s t H e a d e r
*,
i n t .i n t .i n t .i n t ) ;
v o i dm a i n 0 (
i n t i, j;
s t r u c tP o i n t L i s t H e a d e rP o l y g o n :
s t a t i cs t r u c tP o i n tP o l y g o n l [ l
~{0.01.~320.0~.~320.200~,~0,2001,~0,0~,~50,50~,
(270.50~.{270.150~.(50.150~,~50.50~~:
~(0.0).{10.0}.(105.1851,{260.30),~15,150},~5,150~,~5
{ 2 6 0 . 5 3 , ~ 3 0 0 , 5 ~ , ~ 3 0 0 ~ 1 5 1 , ~ 1 1 0 , 2 0 0 ~ , ~ 1 0 0 . 2i0 0 ~ , ~ 0 , 1 0 ~
~(0.0}.~30,-20).~30.0).~0,20},~-30,0~,~-30,-20~~:
-- {(30.20).(15.0).{0.20)):
-
~(0.0).~100.150~.~320,0~,~0,200~,~220.50~,~320~200~~
s t a t i cs t r u c tP o i n tP o l y g o n 2 C I
s t a t i cs t r u c tP o i n tP o l y g o n 3 C l
s t a t i cs t r u c tP o i n tP o l y g o n 4 C I
staticstructPointTrianglelCl
s t a t i cs t r u c tP o i n tT r i a n g l e 2 C l
s t a t i cs t r u c tP o i n tT r i a n g l e 3 C l
s t a t i cs t r u c tP o i n tT r i a n g l e 4 C l
u n i o n REGS r e g s e t :
{(30.0}.(15.20}.(0.0}};
- {{20,20).(20.0}.~0.10}):
~~0.20~.~20.10~.~0.01~:
/*
S e tt h ed i s p l a yt o
VGA mode 1 3 h .3 2 0 x 2 0 02 5 6 - c o l o r
regset.x.ax
0x0013;
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) ;
mode * I
I* Draw t h r e ec o m p l e xp o l y g o n s
*/
DRAW-POLYGON(Polygon1. 15. COMPLEX, 0. 0 ) ;
getch0;
I* w a i ft o r
a k e y p r e s s */
DRAW-POLYGON(Polygon2. 5. COMPLEX. 0. 0):
getch0:
I* w a i ft o r
a keypress * I
DRAW-POLYGON(Polygon3, 3. COMPLEX. 0, 0):
getch0:
I* w a i ft o r
a keypress *I
I* Draw some a d j a c e n tn o n c o n v e xp o l y g o n s
f o r( i - 0 :i < 5 :
i++)
(
f o r (j-0: j < 8 : j++) {
ORAW~POLYGON(Polygon4. 16+i*8+j.
30+(j*20));
*I
NONCONVEX. 4 0 + ( i * 6 0 ) .
getch0:
I* w a i ft o r
a keypress *I
/ * D r a w a d j a c e n tt r i a n g l e sa c r o s st h es c r e e n
*I
f o r( j - 0 ;
j<-80: j+-20)
(
f o r( i - 0 :i < 2 9 0 :
i +- 3 0 ) (
DRAW-POLYGON(Triangle1. 2, CONVEX, i. j ) :
DRAW-POLYGON(Triangle2. 4 . CONVEX. i+15. j ) :
1
f o r( j - 1 0 0 :j < - 1 7 0 ;
j+-20)
I
I* Do a r o w o f p o i n t i n g - r i g h t t r i a n g l e s
f o r( i - 0 :i < 2 9 0 :
i +- 20) I
DRAW-POLYGON(Triangle3.
40. CONVEX.
I* Do a rowof
*I
i. j ) :
p o i n t i n g - l e f tt r i a n g l e sh a l f w a yb e t w e e n
onerow
fit between * /
o fp o i n t i n g - r i g h tt r i a n g l e sa n dt h en e x t ,t o
f o r( i - 0 ;i < 2 9 0 :
i +- 20) (
DRAW-POLYGON(Triangle4.
1, CONVEX, i, .$+lo):
752
Chapter 40
getch0;
/*
/*
w a ifto r
a keypress
R e t u r nt ot e x t
mode and e x i t
regset.x.ax
0x0003;
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) :
*/
*/
Listing 40.4illustrates several interesting aspectsof polygon filling. The first and third
polygons drawn illustrate the operation of theodd/even fill rule. The second polygon
drawn illustrates how holes can be
created in seemingly solid objects;
an edge runs from
the outside of the rectangle to the inside, the edges comprising the hole are defined,
and then the same edge is used to move back to the outside; because the edgesjoin
seamlessly, the rectangle appears to form a solid boundary around the hole.
The set of V-shaped polygons drawnby Listing 40.4demonstrate that polygons sharing common edges meet butdo not overlap. This characteristic, which I discussed at
length in Chapter 38, is not a trivial matter; it allows polygonsto fit together without
fear of overlapping or missed pixels. In general, Listing 40.1guarantees that polygons are filled such that commonboundaries and vertices are drawn once and only
once. This has the sideeffect for any individual polygonof not drawing pixels that
lie exactlyon the bottom or right boundaries or atvertices that terminate bottom or
right boundaries.
By the way, I have not seen polygon boundary filling handled precisely thisway elsewhere. The boundary filling approach inFoley and van Dam is similar,but seems to
me to not draw all boundary and vertex pixels once and only once.
Performance Considerations
How fastis Listing 40.1?When drawing triangleson a 20-MHz 386, its lessthan one-fifth
the speed of the fast convex polygon fill code. However, most of that time is spent
drawing individual pixels; when Listing40.2 is replaced with the fast assembly line
segment drawing code in Listing 40.5,performance improves by two and one-half
times, to about half as fast as
the fast convex fill
code. Even after conversion to assembly in Listing 40.5,DrawHorizontalLineSeg still takes more than half of the total
execution time, and the remaining time is spread out fairly evenly over the various
subroutines in Listing 40.1.Consequently, theres no single place in which its possible to greatly improve performance, and the maximum additional improvement
753
that's possible looks to be a good deal less than two times; for that reason, and because of spacelimitations, I'm not going to convert the rest of the code to assembly.
However, when filling a polygon with a great many edges, and especially one with a
great many active edges at onetime, relatively more time would be spent traversing
the linked lists. In such a case, conversion to assembly (which does a very good job
with linked list processing) could pay off reasonably well.
LISTING 40.5
;
:
:
:
:
L40-5.ASM
Draws all p i x e l s i n t h e h o r i z o n t a l l i n e
segmentpassed
i n ,f r o m
( L e f t X . Y )t o( R i g h t X . Y ) ,i nt h es p e c i f i e dc o l o ri n
mode 1 3 h t. h e
VGA's 3 2 0 x 2 0 02 5 6 - c o l o r
mode. No d r a w i n g will t a k ep l a c e
if
L e f t X > R i g h t X T. e s t e dw i t h
TASM
C n e a r - c a l l a b l ea s :
v o i d DrawHorizontalLineSeg(Y. L e f t X .R i g h t X .C o l o r ) ;
SCREEN-WIDTH320
SCREEN-SEGMENT
Parms
Y
LeftX
RightX
Color
Parms
ends
struc
dw
dw
dw
dw
dw
equ
equ
2 dup(?)
?
?
?
?
OaODOh
: r e t u r na d d r e s s
& pushed BP
:Y c o o r d i n a t e o f l i n e
segment t o draw
; l e f te n d p o i n to ft h el i n e
segment
; r i g h te n d p o i n to ft h el i n e
segment
: c o l o ri nw h i c ht od r a wt h el i n es e g m e n t
.modelsmal 1
.code
public-DrawHorizontalLineSeg
align
2
DrawHorizontal LineSeg proc
f: rparscm
etaasel blceekprrvp' esu s h
mov fsrtaaom
cukerb:tppoo, si npt
; p rcrevaesalgedlrierisvarpt'ebsulrseh
cld
:make i sn tsprtoirniungi cnt teci rosn s
ax.SCREEN-SEGMENT
mov
mov
;epso, ianxt
EdSi stpol a y
memory
mov
d i ,[ b p + L e f t X ]
mov
cx.[bp+RightX]
o f; w i dc xt h. d is u b
jl
DrawDone
: R i g h t X < Lderndfatoow
Xt oi: n g
c xi n c
ebnou:dtidphneociln t s
ax.SCREEN-WIDTH
mov
mu1
[ b p ;+oYsf flclosianfenet
w
d orhtanoi cwh
add
, adxi
: E S : D I p o isnttl otiaosnrf et
mov
a 1 , b y t pe tCr b p + C o l o r; cl o l oi w
rn h i c thdo r a w
mov
ac:ihopn.luaot lr
AH f o r STOSW
:#
w ot or df s
f i 11
c xs.h1r
s troespw
:fill a word a t a t i m e
cx.cx adc
i f any
b oys:tdttdreohder,seapbw
DrawDone:
pop; rcrevaesalgltrieoisdarrti'ebs lr e
POP
bp
f r; acrsem
at asleltceokrr'es
ret
- D r a w H o r i z o n t a l L i n e Seengd p
end
~
line
754
Chapter 40
seg
Nonconvex Polygons
Nonconvex polygonscan be filled somewhatfaster than complex polygons. Because
edges never cross or switch positions with other edges once theyre in the AET, the
AET for a nonconvex polygon needs to be sorted only when newedges are added.In
order forthis to work, though, edges must be added to the AET in strict left-to-right
order. Complications arise whendealing with two edges that start at the same point,
because slopes must be compared to determine which edge is leftmost. This is certainly doable, but because of space limitations and limited performance returns, I
havent implemented this in Listing 40.1,
Details, Details
Every so often, a programming demonthat Id thought Id forever laidto rest arises
to haunt me once again. A minor example of this-an imp, if you will-is the use of
= when I mean == ,which Ivedone all too often in the past, and am sure Ill do
again. Thats minor deviltry, though, compared to the considerably greater evils of
one of my personal scourges, of whichI was recently reminded anew: too-closeattention to detail. Not seeing the forest for the trees. Looking low when I should have
looked high. Missing the big picture, if you catch my drift.
755
Previous
Home
Next
Thoreau said it best: Our life is frittered away by detail....SimpliQ, simplify That
quote sprangto mind when I received a lettera while backfrom Anton Treuenfels of
Fridley, Minnesota, thanking me for clarifylng the principles of filling adjacent convex polygons in my ongoing writings on graphics programming. (Youll find this
material in the previous two chapters.) Anton then went on to describe his own
method forfilling convex polygons.
Antons approach hadits virtues and drawbacks, foremost among the
virtues being a
simplicity Thoreau would have admired. For instance, in writing my polygon-filling
code, I had spent quitesome time tryingto figure out thebest way to identify which
edge was the left edge and which the right,finally settling on comparing the
slopes
of the edges if the top of the polygon wasnt flat, and comparing the starting points
of the edges if the topwas flat. Anton simplified thistremendously by not bothering
to figure out ahead of time which was the right edge of the polygon and which the
left, instead scanningout the two edges in whatever order he found them and letting
the low-level drawing code test, and if necessary swap, the endpoints of each horizontal line of the fill, so that filling started at theleftmost edge. This is a little slower
than my approach (although the difference is almost surely negligible), but italso
makes quite a bitof code go away.
What that example, and others like it in Antons letter, didwas kick my mind into a
mode that ithadnt-but should have-been in when I wrote the code, a mode in
which I began to wonder, How elsecan I simplify this code?;what you might call
Occams Razor mode. You see, I created the convex polygondrawing code by first
writing pseudocode, then writing C code, and finally writing assembly code, and
once the pseudocodewas finished, I stopped thinking about the interactions
of the
various portions of the program.
In other words, I becameso absorbed in individual details that I forgotto consider
the code as a whole. That was a mistake, and anembarrassing one for someonewho
constantly preaches that programmers should look at their code from a variety of
perspectives; the next chapter shows just how much difference thinking about the
big picture can make. May my embarrassment be your enlightenment.
The point is not whether, in the final analysis, my code or Antons code is better;
both have their advantages. The point is that I was programming with half a deck
because I was so fixated on thedetails of a single type of implementation; I ended up
with relatively hard-to-write,complex code, andmissed out onmany potentially useful optimizations by being so focused. Its a big world out there, and there aremany
subtle approaches to any problem, s o relax and keep thebig picture in mind as you
implement your programs. Your code will likely be not only better, but also simpler.
And whenever you see me walking across
hot coals in this book or elsewhere when
theres aneasier way to go, please, let me know!
Thanks, Anton.
756
Chapter 40
Previous
chapter 41
those way-down polygon nomenclature blues
Home
Next
759
rather long#define variable). Actually, tobe atad more precise, Id call them monotone with respect to a vertical line and simple, where simple means not
self-intersecting.Similarly, the polygon typeI called nonconvexis actuallysimple,
and I suppose what I called complex should be referred to as nonsimple, or
maybe just noneof the above.
This may seem like nit-picking, but actually, it isn t;what its really about is the
tremendous importance of having a shared language. In one of his books, Richard
Feynman describes having developed his own mathematical
framework, complete
with his ownnotation and terminolom, in high school. When hegot to college and
started working with other people who were at his level, he suddenly understood
that people cant share ideas effectively unless they speak the same language;
otherwise, they waste a great deal of time on misunderstandings andexplanation.
Or, as Bill Huber put it, You are free to adopt your own terminology when it suits
your purposes well. But you risklosing or confusing those who could be among
your
most astute readers-those who already have been trained in the same or a related
field. Ditto. Likewise. Duccord. And mea tuba; I shall endeavor to watch my language in the future.
Nomenclature in Action
Just to show you howmuch difference proper description and interchangeof ideas
can make, consider the case of identifylng convex polygons. When I was writing
about polygons in my column in DDJ a nonfunctional method foridentifylng such
polygons-checking for exactly two X direction changesand two Y direction changes
around the perimeter of the polygon-crept into the column by accident. That
not (Thatswhy you wont find itin
method, as I noted in a later column, doeswork.
this book.) Still, a fast method of checking for convex polygons would be highly
desirable, because such polygons can be drawn with the fast code from Chapter39,
rather than therelatively slow, general-purpose code from Chapter40.
Now consider Bills point thatwere not limited to drawing convex polygons in our
convex fill code, butcan actually handle any simple polygon thats monotone with
respect to avertical line. Additionally, consider AntonTreuenfelsspoint, made back
in Chapter 40,that life gets simpler if we stop worrying about which edge of a polygon is the left edge and which is the right, and instead just scan out each raster line
starting at whichever edge is left-most.Now, what do we have?
What we have isan approachpassed along by Jim Kent, of Autodesk Animator fame.
If we modify the low-level code to check which edge is left-most on each scan line
and start drawing there, as just described, then we can handle any polygon thats
monotone with respect to a vertical line regardless of whether the edges cross. (Ill
call this monotone-verticalfrom now on; if anyone wants to correct that terminology,jump right in.) Inother words, we can then handle nonsimple
polygons that are
760
Chapter 41
LISTING 41.1
L41-1.C
/*
R e t u r n s 1 i f p o l y g o nd e s c r i b e db yp a s s e d - i nv e r t e xl i s ti sm o n o t o n ew i t h
i f polygon i ss i m p l e
r e s p e c t t o a v e r t i c a l l i n e , 0 o t h e r w i s e .D o e s n ' tm a t t e r
( n o n - s e l f - i n t e r s e c t i n g )o rn o t .T e s t e dw i t hB o r l a n d
C++ i ns m a l lm o d e l .
*/
#i
n c l u d e " p o l y g o n . h"
VertexList)
- -
i n t i, L e n g t h D
, e l t a Y S i g nP
. reviousDeltaYSign:
i n t NumYReversals
0:
s t r u c tP o i n t* V e r t e x P t r
VertexList->PointPtr:
/ * T h r e eo rf e w e rp o i n t sc a n ' t
i f ((Length-VertexList->Length)
make a n o n - v e r t i c a l - m o n o t o n ep o l y g o n
< 4) return(1):
*/
/ * Scan t o t h e f i r s t n o n - h o r i z o n t a l
edge * I
PreviousDeltaYSign
SIGNUM(VertexPtr[Length-1l.Y - VertexPtrCO1.Y):
i
0:
w h i l e( ( P r e v i o u s D e l t a Y S i g n
0 ) && (i < ( L e n g t h - 1 ) ) ) I
PreviousDeltaYSign
SIGNUM(VertexPtrCi1.Y - V e r t e x P t r [ i + l l . Y ) ;
i++:
1
i f (i
- ( L e n g t h - 1 ) )r e t u r n ( 1 ) :
/*
p o l y g o ni s
a flatline
*/
I* Now c o u n t Y r e v e r s a l s .M i g h tm i s so n er e v e r s a l ,a tt h el a s tv e r t e x ,b u t
b e c a u s er e v e r s a lc o u n t sm u s tb ee v e n ,b e i n go f fb yo n ei s n ' t
do
a problem
*/
i f ((DeltaYSign
SIGNUM(VertexPtrCi1.Y - V e r t e x P t r C i + l ] . Y ) )
!- 0 ) I
i f ( D e l t a Y S i g n !- P r e v i o u s D e l t a Y S i g n ) [
/ * S w i t c h e d Y d i r e c t i o n :n o tv e r t i c a l - m o n o t o n e
if
r e v e r s e d Y d i r e c t i o n as many a s t h r e e t i m e s * I
i f (ttNumYReversals > 2 ) r e t u r n ( 0 ) :
DeltaYSign:
PreviousDeltaYSign
<
) w h i l e (i++
(Length-1)):
return(1):
/ * i t ' s a v e r t i c a l - m o n o t o n ep o l y g o n
*/
Listings 41.2 and 41.3 are variants of the fast convex polygon fill
code from Chapter
39, modified to be able to handle all monotone-vertical polygons,
including nonsimple
ones; the edge-scanning code
(Listing 39.4 from Chapter39) remains thesame, and
so is not shown again here.
Those Way-Down PolygonNomenclatureBlues
761
Monotone-verticalpolygons.
Figure 41.1
Non-monotone-verticalpolygons.
Figure 4 1.2
LISTING 41.2L41-2.C
/* C o l o r - f i l l s a c o n v e xp o l y g o n . Al v e r t i c e sa r eo f f s e tb y( X O f f s e t .Y O f f s e t ) .
"Convex" means "monotone w i t h r e s p e c t t o
a v e r t i c a ll i n e " :t h a ti s ,e v e r y
h o r i z o n t a ll i n ed r a w nt h r o u g ht h ep o l y g o n
a t any p o i n tw o u l dc r o s se x a c t l yt w o
a c t i v ee d g e s( n e i t h e rh o r i z o n t a ll i n e sn o rz e r o - l e n g t he d g e sc o u n ta sa c t i v e
e d g e s :b o t ha r ea c c e p t a b l ea n y w h e r e
i nt h ep o l y g o n ) .R i g h t
& l e f t edges may
c r o s s( p o l y g o n s
may b en o n s i m p l e ) .P o l y g o n st h a ta r en o tc o n v e xa c c o r d i n gt o
t h i sd e f i n i t i o nw o n ' t
bedrawnproperly.(Yes."convex"
i s a l o u s y name f o r
t h i st y p eo fp o l y g o n ,b u ti t ' sc o n v e n i e n t :u s e" m o n o t o n e - v e r t i c a l "
i f i t makes
y o uh a p p i e r ! )
...................................................................
NOTE: t h el o w - l e v e ld r a w i n gr o u t i n e ,D r a w H o r i z o n t a l L i n e L i s t .m u s tb ea b l et o
r e v e r s et h ee d g e s ,
i f n e c e s s a r yt o make t h e c o r r e c t e d g e l e f t
edge. It must
a l s oe x p e c tr i g h te d g et o
be s p e c i f i e d i n +1 f o r m a t( t h e X c o o r d i n a t e i s 1 p a s t
h i g h e s tc o o r d i n a t et od r a w ) .I nb o t hr e s p e c t s ,t h i sd i f f e r sf r o ml o w - l e v e l
d r a w i n gr o u t i n e sp r e s e n t e di ne a r l i e rc o l u m n s :c h a n g e sa r en e c e s s a r yt o
make i t
p o s s i b l et od r a wn o n s i m p l em o n o t o n e - v e r t i c a lp o l y g o n s :t h a ti nt u r n
makes it
p o s s i b l et ou s eJ i mK e n t ' st e s tf o rm o n o t o n e - v e r t i c a lp o l y g o n s .
...................................................................
0 i f memory a l l o c a t i o n f a i l e d */
R e t u r n s 1 f o rs u c c e s s ,
762
Chapter 41
# i n c l u d e< s t d i o . h >
# i n c l u d e< m a t h . h >
# i n c l u d e < s t d l ib. h>
#i
n c l ude "polygon. h"
I* Advances t h ei n d e xb yo n ev e r t e xf o r w a r dt h r o u g ht h ev e r t e xl i s t ,
w r a p p i n ga tt h ee n d
o f t h e l i s t *I
# d e f i n e INDEXKFORWARD(1ndex) \
Index
( I n d e x + 1) % V e r t e x L i s t - > L e n g t h :
/ * A d v a n c e st h ei n d e xb yo n ev e r t e xb a c k w a r dt h r o u g ht h ev e r t e xl i s t ,
*I
wrappingatthestartofthelist
# d e f i n e INDEXLBACKWARD(1ndex) \
Index
(Index - 1 + VertexList->Length) % VertexList->Length:
I* A d v a n c e st h ei n d e xb y
one v e r t e xe i t h e rf o r w a r do rb a c k w a r dt h r o u g h
t h ev e r t e xl i s t ,w r a p p i n ga te i t h e r
end o f t h e l i s t * I
# d e f i n e INDEX_MOVE(Index.Direction)
i f (Direction > 0)
Index
( I n d e x + 1) % V e r t e x L i s t - > L e n g t h ;
else
Index
(Index - 1 + VertexList->Length) % VertexList->Length:
e x t e r nv o i dS c a n E d g e ( i n t .i n t .
i n t . i n t . i n t . i n t . s t r u c tH L i n e
e x t e r nv o i d D r a w H o r i z o n t a l L i n e L i s t ( s t r u c t H L i n e L i s t *, i n t ) ;
i n t FillMonotoneVerticalPolygon(struct P o i n t L i s t H e a d e r
i n tC o l o r ,i n tX O f f s e t .i n tY O f f s e t )
\
\
\
\
**);
VertexList.
i n t i, MinIndex.MaxIndex.MinPoint-Y.MaxPoint-Y:
i n t N e x t I n d e xC
. u r r e n t I n d e xP
. reviousIndex:
s t r u c tH L i n e L i s tW o r k i n g H L i n e L i s t :
s t r u c tH L i n e* E d g e P o i n t P t r ;
s t r u c tP o i n t* V e r t e x P t r :
/* P o i n t t o t h e v e r t e x l i s t
*I
VertexPtr
VertexList->PointPtr;
I* Scan t h e l i s t t o f i n d t h e t o p
a n db o t t o mo ft h ep o l y g o n
*I
i f (VertexList->Length
0)
r e t u r n ( 1 ) : I* r e j e c nt u l pl o l y g o n s
*I
MaxPoint-Y
MinPoint-Y
VertexPtrCMinIndex
MaxIndex
01.Y:
f o r (i 1: i < V e r t e x L i s t - > L e n g t h ; i++)
{
i f ( V e r t e x P t r C i 1 . Y < MinPoint-Y)
i1.Y: I* new t o p * I
MinPointLY = VertexPtrCMinIndex
e l s e i f ( V e r t e x P t r l i 1 . Y > MaxPoint-Y)
MaxPoint-Y
VertexPtrCMaxIndex
i1.Y: I* new b o t t o m */
I* S e tt h e # o f s c a n l i n e s i n t h e p o l y g o n , s k i p p i n g t h e b o t t o m
i f ((WorkingHLineList.Length
MaxPoint-Y - M i n P o i n t - Y ) <- 0 )
r e t u r n ( 1 ) : I* t h e r e ' sn o t h i n gt od r a w ,
s o we'redone
*I
WorkingHLineList.YStart
Y O f f s e t + MinPointLY:
/*
edge * I
Get memory i n w h i c h t o s t o r e t h e l i n e l i s t
we g e n e r a t e * I
i f ((WorkingHLineList.HLinePtr
( s t r u c tH L i n e * ) ( m a l l o c ( s i z e o f ( s t r u c tH L i n e )
*
WorkingHLineList.Length1))
NULL)
return(0);
/ * c o u l d n ' tg e t memory f o r t h e l i n e l i s t
*I
763
/*
*/
*/
Scan t h e f i r s t edgeand
s t o r et h eb o u n d a r yp o i n t si nt h el i s t
I n i t i a lp o i n t e rf o rs t o r i n g
s c a nc o n v e r t e df i r s t - e d g ec o o r d s
EdgePointPtr
WorkingHLineList.HLinePtr:
/* S t a r t f r o m t h e t o p o f t h e f i r s t
edge */
PreviousIndex
CurrentIndex
MinIndex:
/ * Scan c o n v e r t e a c h l i n e i n t h e f i r s t
e d g ef r o mt o pt ob o t t o m
do I
/*
*/
INDEX-BACKWARD(Current1ndex);
ScanEdge(VertexPtr[PreviousIndexl.X + X O f f s e t .
VertexPtr[PreviousIndexl.Y.
VertexPtr[CurrentIndexl.X + X O f f s e t .
VertexPtr[CurrentIndexl.Y. 1. 0. & E d g e P o i n t P t r ) :
PreviousIndex
CurrentIndex;
w h i l e( C u r r e n t I n d e x !- MaxIndex):
--
/*
Scan t h es e c o n de d g ea n ds t o r et h eb o u n d a r yp o i n t s
EdgePointPtr
WorkingHLineList.HLinePtr;
PreviousIndex
CurrentIndex
MinIndex;
/ * Scan c o n v e r tt h es e c o n de d g e ,t o pt ob o t t o m
do I
in the list
*/
*/
INDEX-FORWARD(Current1ndex):
ScanEdge(VertexPtr[PreviousIndexl.X + X O f f s e t .
VertexPtr[PreviousIndexl.Y,
VertexPtr[CurrentIndexl.X + X O f f s e t .
VertexPtr[CurrentIndexl.Y. 0. 0. & E d g e P o i n t P t r ) ;
PreviousIndex
CurrentIndex;
!- M a x I n d e x ) :
) w h i l e( C u r r e n t I n d e x
/ * Draw t h e l i n e l i s t r e p r e s e n t i n g t h e
s c a nc o n v e r t e dp o l y g o n
DrawHorizontalLineList(&WorkingHLineList, Color):
*/
/*
*/
Releasethelinelist's
memory a n dw e ' r es u c c e s s f u l l yd o n e
free(WorkingHLineList.HLinePtr):
return(1);
LISTING 41.3
L41-3.ASM
: Draws a l l p i x e l s i n l i s t o f h o r i z o n t a l l i n e s p a s s e d i n , i n
: 3 2 0 x 2 0 02 5 6 - c o l o r mode.Uses
REP STOS t o fill e a c hl i n e .
. ..................................................................
: NOTE: i s a b l e t o r e v e r s e t h e
X c o o r d sf o r a s c a nl i n e ,
i f n e c e s s a r y ,t o make
: X S t a r t < XEnd. Expectswhicheveredge
i s r i g h t m o s t onanyscan
l i n e t o be i n
: +1 f o r m a t :t h a ti s ,
XEnd i s 1 g r e a t e rt h a nr i g h t m o s tp i x e lt od r a w .
If
n o t h i n g i s drawnon
t h a t scan l i n e .
.: .X.S.t.a.r.t. .-. . .XEnd.
.......................................................
: C near-callable as:
:
void DrawHorizontalLineList(struct HLineList
: Al a s s e m b l yc o d et e s t e dw i t h
TASM and MASM
320
SCREEN-WIDTH
SCREEN-SEGMENT
equ
equ
OaOOOh
HsLtirnuec
XStart
XEnd
HLine
ends
?
?
:X c o o r d i n a t e o f l e f t m opsi ltxi en le
:X c o o r d i rni oag fthet m pol iisnixnte l
dw
dw
H L i n e L i s ts t r u c
Lngth
dw
YStart
dw
764
Chapter 41
?
?
:#
h o roi fz ol innt ea sl
:Y c o o r dtionopafm
t leoi nset
H L i n e L i s t P t ri .nCt o l o r ) :
H L i n e P t r dw
H L i n e L i s tends
Parms
struc
dw
2 d u p ( ?; r)e t u rand d r e s s
& pushed BP
HLineListPtr
dw
?
; p o i n t e rt oH L i n e L i s ts t r u c t u r e
Color
dw
?
; c o lwoiw
rt h itcoh
fill
Parms
ends
.modelsmall
.code
public-DrawHorizontalLineList
align
2
- 0 r a w H o r i z o n t a l L i n e L i s tD r o c
push
bP
; pf rcaesam
tsalelecer kvr 'es
mov
bP * SP
; p ostiutonarft cr ak m e
push
si
; p r e s e r v ec a l l e r ' sr e g i s t e rv a r i a b l e s
push
di
c ld
;make s t r i n g i n s t r u c t i o n s i n c p o i n t e r s
mov
mov
mov
mov
mu 1
mov
mov
mov
and
jz
mov
mov
F i 11 Loop:
mov
mov
CmP
jle
xchg
ax,SCREEN-SEGMENT
e s . a x; p o i n t
ES t od i s p l a y
s i . C b p + H L i n e L i s t P t r ]; p o i n tt ot h el i n el i s t
ax,SCREEN-WIDTH ; p o i n t t o t h e s t a r t o f t h e f i r s t s c a n
; l i n iew
n h i c tho
draw
Csi+YStartl
;ES:DX p o i nfttiorsssct al i nnt oe
dx.ax
b x . [ s i + H L i n e P t r l; p o i n tt ot h eX S t a r t / X E n dd e s c r i p t o r
; f o rt h ef i r s t( t o p )h o r i z o n t a ll i n e
;#o f s c a nl i n e st od r a w
s .i E s i + L n g t h l
; .atshsriei r e
any l i n teos
draw?
F i 11 Done
;no. s o we're
done
a 1 , b y t ep t r[ b p + C o l o r l; c o l o rw i t hw h i c ht o
fill
: ad hu .paf cloli ocr laot re
STOSW
di.[bx+XStartl
cx.[bx+XEndl
d i ,c x
NoSwap
di ,cx
draw
; l e f t edge o f fill on t h i s l i n e
: r i g h t edge o f fill
; i s X S t a r t > XEnd?
;now
, e're
all s e t
;yes. s o swap edges
NoSwap:
sub
jz
add
test
jz
stosb
dec
jz
MainFill:
shr
rep
adc
rep
LineFillOone:
add
add
dec
j nz
;# o f words i n fill
;fill
many
as
pwooasrssdi sb l e
;1 i f t h e r e ' s an odd t r a i l i nbgy t eo
; do, 0 o t h e r w i s e
;fill
odd
any
t r a i bl iyntge
b x . s i z eH L i n e: p o i n t ot h en e x lt i n ed e s c r i p t o r
dx.SCREEN-WIDTH ; p o i n t t o t h e n e x t s c a n l i n e
si
:countofflinesto
fill
F i 11 Loop
765
F i 11 Done:
pop; crr eve
aasg
l ltrieoisdarrt'ibeslre s
pop
si
POP
bP
f r;acsr m
eat aslelcteokrr' es
ret
-D r a w H o r i z o n t a l L i n e L i s t e n d p
end
LISTING 41.4L41-4.C
/ * C o l o r - f i l l s an a r b i t r a r i l y - s h a p e dp o l y g o nd e s c r i b e db yV e r t e x L i s t .
I f t h e f i r s t and l a s t p o i n t s i n V e r t e x L i s t a r e n o t t h e
same, t h e p a t h
a r o u n dt h ep o l y g o n
i sa u t o m a t i c a l l yc l o s e d .
A
l v e r t i c e sa r eo f f s e t
b y( X O f f s e t .Y O f f s e t ) .R e t u r n s
1 f o rs u c c e s s ,
failed. A
l C c o d et e s t e dw i t hB o r l a n d
C++.
0 i f memory a l l o c a t i o n
I f t h ep o l y g o ns h a p e
i s known i n a d v a n c e ,s p e e d i e rp r o c e s s i n g
may be
e n a b l e db ys p e c i f y i n gt h es h a p ea sf o l l o w s :" c o n v e x "
- a r u b b e rb a n d
s t r e t c h e da r o u n dt h ep o l y g o nw o u l dt o u c he v e r yv e r t e xi no r d e r ;
"nonconvex" - t h e p o l y g o n i s n o t s e l f - i n t e r s e c t i n g , b u t
neednotbe
- t h ep o l y g o n may be s e l f - i n t e r s e c t i n g ,o r ,i n d e e d ,
convex;"complex"
any s o r t o f p o l y g o n a t
all. Complex will work f o r all polygons:convex
i sf a s t e s t .U n d e f i n e dr e s u l t s
will o c c u r i f convex i s s p e c i f i e d f o r a
n o n c o n v e xo rc o m p l e xp o l y g o n .
D e f i n e CONVEX-CODE-LINKED i f t h e f a s t c o n v e x p o l y g o n f i l l i n g c o d e f r o m
t h eF e b r u a r y1 9 9 1c o l u m n
i s 1 i n k e di n .O t h e r w i s e ,c o n v e xp o l y g o n sa r e
h a n d l e db yt h ec o m p l e xp o l y g o nf i l l i n gc o d e .
Nonconvex i s handledascomplex
i n t h i s i m p l e m e n t a t i o n . See t e x t f o r
d i s c u s s i o no ff a s t e rn o n c o n v e xh a n d l i n g .
*/
# i n c l u d e< s t d i o . h >
# i n c l u d e < m a t h . h>
# i f d e f -TURBOC# i n c l u d e< a l l o c . h >
#else
I* MSC * I
# i n c l u d e< m a l l o c . h >
#endif
#i
n c l ude "polygon. h"
# d e f i n e SWAP(a.b)
{temp
a; a
s t r u c tE d g e S t a t e {
s t r u c tE d g e s t a t e* N e x t E d g e ;
i n t X;
intStartY;
i n t WholePixelXMove;
intXDirection;
i n tE r r o r T e r m :
i n t ErrorTermAdjUp;
i n t ErrorTermAdjDown;
i n t Count;
3;
766
Chapter 41
b: b
temp;]
e x t e r nv o i d OrawHorizontalLineSeg(int, i n t . i n t . i n t ) :
e x t e r n i n t FillMonotoneVerticalPolygon(struct P o i n t L i s t H e a d e r *,
i n t , i n t .i n t ) :
e x t e r n i n t PolygonIsMonotoneVertical(struct P o i n t L i s t H e a d e r * ) :
s t a t i cv o i dB u i l d G E T ( s t r u c tP o i n t L i s t H e a d e r
*, s t r u c tE d g e S t a t e *,
i n t .i n t ) ;
s t a t i cv o i dM o v e X S o r t e d T o A E T ( i n t ) ;
s t a t i cv o i dS c a n O u t A E T ( i n t .i n t ) :
s t a t i cv o i dA d v a n c e A E T ( v o i d ) :
s t a t i cv o i dX S o r t A E T ( v o i d ) ;
/* P o i n t e r s t o g l o b a l e d g e t a b l e
(GET) and a c t i v e edge t a b l e (AET)
s t a t i cs t r u c tE d g e S t a t e
*GETPtr.*AETPtr:
*/
i n tF i l l P o l y g o n ( s t r u c tP o i n t L i s t H e a d e r
* V e r t e x L i s t .i n tC o l o r ,
i n t PolygonShape. i n tX O f f s e t .i n tY O f f s e t )
(
s t r u c tE d g e S t a t e* E d g e T a b l e B u f f e r :
i n tC u r r e n t Y :
# i f d e f CONVEX-CODE-LINKED
/ * P a s sc o n v e xp o l y g o n st h r o u g h
i f ((PolygonShape
CONVEX)
1I
PolygonIsMonotoneVertical(VertexList))
t o f a s t c o n v e xp o l y g o n
filler
*/
return(FillMonotoneVerticalPolygon(VertexList, C o l o r X
, Offset.
YOffset));
Cendi f
/*
It t a k e s a minimum o f 3 v e r t i c e s t o c a u s e a n y p i x e l s t o b e
drawn: r e j e c t p o l y g o n s t h a t a r e g u a r a n t e e d t o b e i n v i s i b l e
*/
i f (VertexList->Length < 3 )
return(1):
/ * Getenough memory t o s t o r e t h e e n t i r e edge t a b l e */
i f ((EdgeTableBuffer
( s t r u c tE d g e S t a t e * ) ( m a l l o c ( s i z e o f ( s t r u c tE d g e s t a t e ) *
VertexList->Length)))
NULL)
r e t u r n ( 0 ) : / * c o u l d n ' tg e t memory f o r t h e edge t a b l e */
*/
/ * B u i l dt h eg l o b a le d g et a b l e
B u i l d G E T ( V e r t e x L i s tE
. d g e T a b l e B u f f e rX
. O f f s e tY
. Offset);
/ * Scandown t h r o u g ht h ep o l y g o ne d g e s ,o n es c a nl i n ea t
a time,
so l o n ga sa tl e a s to n ee d g er e m a i n s
i n e i t h e r t h e GET o r AET */
AETPtr
NULL:
/* i n i t i a l i z et h ea c t i v e edge t a b l et o empty */
CurrentY
GETPtr->StartY; / * s t a r ta tt h et o pp o l y g o nv e r t e x
*/
w h i l e( ( G E T P t r
!- NULL) I I (AETPtr !- NULL)) {
MoveXSortedToAET(CurrentY): I* u p d a t e AET f o r t h i s scan l i n e */
ScanOutAET(CurrentY.Color);
/ * draw t h i s s c a n l i n e f r o m AET */
/* advance AET edges 1 scan l i n e */
AdvanceAETO:
/ * r e s o r t on X * /
XSortAETO:
/* advance t o t h e n e x t s c a n l i n e
*/
CurrentY++:
/*
*/
C r e a t e s a GET i nt h eb u f f e rp o i n t e dt ob yN e x t F r e e E d g e S t r u cf r o m
t h ev e r t e xl i s t .
Edge e n d p o i n t sa r ef l i p p e d ,
i f necessary, t o
g u a r a n t e ea l le d g e sg ot o pt ob o t t o m .
The GET i s s o r t e d p r i m a r i l y
b ya s c e n d i n g Y s t a r t c o o r d i n a t e ,
a n ds e c o n d a r i l yb ya s c e n d i n g
X
s t a r t c o o r d i n a t e w i t h i n edges w i t h common Y c o o r d i n a t e s . */
767
s t a t i cv o i dB u i l d G E T ( s t r u c tP o i n t L i s t H e a d e r
* VertexList.
s t r u c tE d g e S t a t e * N e x t F r e e E d g e S t r u c , i n t X O f f s e t . i n t Y O f f s e t )
i n t i. S t a r t X .S t a r t Y .
EndX.
EndY.
D e l t a YD
. e l t a XW
. i d t ht,e m p :
s t r u c tE d g e S t a t e *NewEdgePtr;
s t r u c tE d g e S t a t e* F o l l o w i n g E d g e .* * F o l l o w i n g E d g e L i n k :
s t r u c tP o i n t* V e r t e x P t r ;
/*
Scan t h r o u g h t h e v e r t e x l i s t
and p u ta l ln o n - 0 - h e i g h te d g e si n t o
t h e GET, s o r t e db yi n c r e a s i n g
Y s t a r t c o o r d i n a t e */
VertexPtr
VertexList->PointPtr:
/ * p o i n tt ot h ev e r t e xl i s t
*/
GETPtr
NULL:
/ * i n i t i a l i z et h eg l o b a l
edge t a b l et o empty * /
f o r (i 0: i < V e r t e x L i s t - > L e n g t h ; i++)
(
/* C a l c u l a t et h ee d g eh e i g h ta n dw i d t h
*/
StartX
VertexPtrCi1.X + X O f f s e t :
StartY
VertexPtrCi1.Y + YOffset:
/ * T h ee d g er u n sf r o mt h ec u r r e n tp o i n tt ot h ep r e v i o u so n e
*/
i f (i
0) {
/ * Wrap b a c ka r o u n d t o t h e end o f t h e l i s t */
EndX
VertexPtrCVertexList->Length-1l.X + XOffset:
EndY
VertexPtrCVertexList->Length-l1.Y + YOffset:
} else (
EndX
V e r t e x P t r C i - l l . X + XOffset:
EndY
VertexPtrCi-11.Y + YOffset:
--
*/
I* Make s u r e t h e e d g e r u n s t o p t o b o t t o m
i f ( S t a r t Y > EndY) {
SWAP(StartX.
EndX):
SWAP(StartY.EndY):
0 height) *I
I* S k i p if t h i sc a n ' te v e rb ea na c t i v ee d g e( h a s
EndY - S t a r t Y ) !- 0 )
i f ((DeltaY
I* A l l o c a t es p a c ef o rt h i se d g e ' si n f o ,a n d
fill i n t h e
s t r u c t u r e */
NextFreeEdgeStruc++;
NewEdgePtr
NewEdgePtr->XDirection
/ * d i r e c t i o ni nw h i c h X moves * /
((DeltaX
EndX - S t a r t X ) > 0 ) ? 1 : -1:
Width
abs(De1taX);
NewEdgePtr->X
StartX:
StartY:
NewEdgePtr->StartY
NewEdgePtr->Count
DeltaY;
NewEdgePtr->ErrorTermAdjDown
DeltaY:
i f ( D e l t a X >- 0 ) / * i n i t i a le r r o rt e r mg o i n g
L->R */
NewEdgePtr->ErrorTerm
0;
else
/* i n iet tri gear orlRmri n- >gL
*/
NewEdgePtr->ErrorTerm
- D e l t a Y + 1:
i f ( D e l t a Y >- W i d t h ) (
/ * Y-major
edge
*/
NewEdgePtr->WholePixelXMove
0:
NewEdgePtr->ErrorTermAdjUp
Width:
3 else I
/ * X-major
edge
*/
NewEdgePtr->WholePixelXMove
(Width / DeltaY) * NewEdgePtr->XDirection:
NewEdgePtr->ErrorTermAdjUp
Width % DeltaY:
- -
/*
768
Chapter 41
i f ((FollowingEdge
NULL) I I
(FollowingEdge->StartY > S t a r t Y ) I I
((FollowingEdge->StartY
S t a r t Y ) &&
( F o l l o w i n g E d g e - > X >- S t a r t X ) ) ) I
NewEdgePtr->NextEdge
FollowingEdge:
*FollowingEdgeLink
NewEdgePtr:
break;
--
FollowingEdgeLink
- &FollowingEdge->NextEdge;
/*
S o r t s a l l e d g e sc u r r e n t l yi nt h ea c t i v e
edge t a b l ei n t oa s c e n d i n g
o r d e ro fc u r r e n t
X c o o r d i n a t e s */
s t a t i cv o i dX S o r t A E T O
I
s t r u c tE d g e S t a t e* C u r r e n t E d g e .* * C u r r e n t E d g e P t r .
*TempEdge:
i n t Swapoccurred:
/*
S c a nt h r o u g ht h e
AET and swap a n ya d j a c e n te d g e sf o rw h i c ht h e
secondedge
i s a t a l o w e rc u r r e n t X c o o r d t h a n t h e f i r s t
edge.
Repeat u n t i l n of u r t h e rs w a p p i n gi sn e e d e d
*/
i f (AETPtr !- NULL) (
do I
Swapoccurred
0:
CurrentEdgePtr
&AETPtr;
*CurrentEdgePtr)->NextEdge !- NULL) I
w h i l e( ( C u r r e n t E d g e
if (CurrentEdge->X > CurrentEdge->NextEdge->X) (
/ * Thesecondedgehas
a lower X t h a nt h ef i r s t ;
swap them i n t h e AET */
CurrentEdge->NextEdge->NextEdge:
TempEdge
*CurrentEdgePtr
CurrentEdge->NextEdge:
CurrentEdge->NextEdge->NextEdge CurrentEdge:
CurrentEdge->NextEdge
TempEdge:
Swapoccurred
1;
--
CurrentEdgePtr
}
} w h i l e( S w a p o c c u r r e d
- &(*CurrentEdgePtr)->NextEdge;
!- 0):
/*
Advanceseachedge
i n t h e AET byonescan
line.
Removes edges t h a th a v eb e e nf u l l ys c a n n e d .
*/
s t a t i c v o i d AdvanceAETO I
s t r u c tE d g e S t a t e* C u r r e n t E d g e ,* * C u r r e n t E d g e P t r :
- -
/*
*/
769
CurrentEdge->X +- C u r r e n t E d g e - > X D i r e c t i o n :
CurrentEdge->ErrorTerm
CurrentEdge->ErrorTermAdjDown:
CurrentEdgePtr
--
KurrentEdge->NextEdge:
/*
Moves a l l edges t h a t s t a r t a t t h e s p e c i f i e d
Y c o o r d i n a t ef r o mt h e
GET t o t h e AET. m a i n t a i n i n g t h e X s o r t i n g o f t h e
AET. */
s t a t i cv o i dM o v e X S o r t e d T o A E T ( i n t
YToMove) I
s t r u c tE d g e S t a t e *AETEdge. **AETEdgePtr. *TempEdge;
i n tC u r r e n t X :
/*
for
(::)
- --
AETEdge
*AETEdgePtr:
i f ((AETEdge
NULL) I I (AETEdge->X >- C u r r e n t X ) )
TempEdge
GETPtr->NextEdge;
GETPtr: /* l i n k t h e edge i n t o t h e
*AETEdgePtr
GETPtr->NextEdge
AETEdge:
AETEdgePtr
&GETPtr->NextEdge;
TempEdge;
/ * u n l i n kt h ee d g ef r o mt h e
GETPtr
break:
3 else I
AETEdgePtr
&AETEdge->NextEdge:
I
AET
*/
GET */
/*
Fillsthescanlinedescribed
b yt h ec u r r e n t
AET a t t h e s p e c i f i e d
c o o r d i n a t ei nt h es p e c i f i e dc o l o r ,u s i n gt h eo d d l e v e n
fill r u l e
s t a t i cv o i dS c a n O u t A E T ( i n t
YToScan. i n t C o l o r ) I
i n tL e f t X :
s t r u c tE d g e S t a t e* C u r r e n t E d g e :
/*
S c a nt h r o u g ht h e
AET. d r a w i n gl i n es e g m e n t sa se a c hp a i ro fe d g e
c r o s s i n g si se n c o u n t e r e d .
The n e a r e s tp i x e l on o r t o t h e r i g h t
o f l e f t edges i s d r a w n ,a n dt h en e a r e s tp i x e lt ot h el e f to fb u t
n o t on r i g h t edges i s drawn */
CurrentEdge
AETPtr;
w h i l e( C u r r e n t E d g e
!- NULL) I
LeftX
CurrentEdge->X:
CurrentEdge
CurrentEdge->NextEdge:
D r a w H o r i z o n t a l L i n e S e g ( Y T o S c a n . L e f t XC
. u r r e n t E d g e - > X - 1C
, olor);
CurrentEdge
CurrentEdge->NextEdge;
770
Chapter 41
*/
Previous
Home
Next
code * I
# d e f i n e CONVEX
0
# d e f i n e NONCONVEX 1
# d e f i n e COMPLEX
2
I* D e s c r i b e s a s i n g l e p o i n t ( u s e d f o r
s t r u c tP o i n t
i n t X:
/* X coordinate *I
i n t Y:
I* Y c o o r d i n a t e * I
1;
a s i n g l ev e r t e x )
*/
I* D e s c r i b e ss e r i e s
o f p o i n t s( u s e dt os t o r e
a l i s to fv e r t i c e st h a td e s c r i b e
assumed t o c o n n e c t t o t h e t w o a d j a c e n t v e r t i c e s ,
and
l a s t v e r t e x i s assumed t o c o n n e c t t o t h e f i r s t )
*I
s t r u c tP o i n t L i s t H e a d e r {
i n t Length;
/ * 11 o f p o i n t s * I
s t r u c tP o i n t * PointPtr:
I* p o i n t e r t o l i s t o f p o i n t s
*I
a p o l y g o n ;e a c hv e r t e xi s
I;
/*
D e s c r i b e sb e g i n n i n ga n de n d i n g
X c o o r d i n a t e so f a s i n g l e h o r i z o n t a l l i n e
s t r u c tH L i n e {
i n t X S t a r t ; I* X c o o r d i n a t e o f l e f t m o s t p i x e l i n l i n e
*/
i n t XEnd;
I* X c o o r d i n a t e o f r i g h t m o s tp i x e li nl i n e
*I
*/
I;
I* D e s c r i b e s a L e n g t h - l o n gs e r i e so fh o r i z o n t a ll i n e s ,a l l
assumed t o beon
c o n t i g u o u ss c a nl i n e ss t a r t i n ga tY S t a r t
andproceedingdownward(used
to
d e s c r i b es c a n - c o n v e r t e dp o l y g o nt ol o w - l e v e lh a r d w a r e - d e p e n d e n td r a w i n gc o d e )
s t r u c tH L i n e L i s t {
i n t Length;
/* # o fh o r i z o n t a ll i n e s */
intYStart;
I* Y c o o r d i n a t e o f t o p m o s t l i n e
*I
s t r u c tH L i n e * H L i n e P t r ;
I* p o i n t e r t o l i s t o f h o r z l i n e s
*I
*/
3;
I* D e s c r i b e s a c o l o r asan
s t r u c t RGB { u n s i g n e dc h a r
RGB t r i p l e , p l u s o n e b y t e f o r o t h e r i n f o
Red, Green, B l u eS, p a r e :
*I
I:
Is monotone-vertical polygon detection worth all this trouble? Under the right circumstances, you bet. In a situation where a great many polygons are being drawn,
and the application either doesnt know whether theyre monotone-vertical or has
no way to tell the polygon filler that they are, performance can be increased considerably if most polygons are, in fact, monotone-vertical. This potential performance
advantage is helped alongby the surprising fact that Jims testfor monotone-vertical
status is simpler and faster than my original, nonfunctional test for convexity.
See what accurate terminology and effective communication can do?
771
Previous
chapter 42
wu'ed in haste; fried, stewed at leisure
Home
Next
. I can now, unhesitatingly and without reservaed, stewed chicken at all costs.
The thought I had was as follows:This is not good food. Not a profound thought, but it
raises an interesting question: Why was I eating in this restaurant? The answer, to
borrow a phrase from E.F. Schumacher, is uppropnate technology. For a family on a
budget, with a small child, tired of staring at each other over the kitchen table, this
was a perfect place to eat. It was cheap, it had greasy food and ice cream, no one
cared if children dropped things or talked loudly or walked around, and, most important of all, it wasnt home.So what if the food was lousy? Goodfood was a luxury,
a bonus; everything on the above list was necessary. A family restaurant was the appropriate dining-out technology, given
the parameters within whichwe had to work.
775
When I read through SIGGRAPH proceedings and otherstate-of-the-artcomputergraphics material, all too often I feel like Im dining at a four-star restaurant with
two-year-old triplets and anempty wallet. Were talking
incredibly inappropriate technology for PC graphics here. Sure, I say to myself as I read about an antialiasing
technique, that sounds
wonderful-if I had 24bpp color, and dedicated hardware to
do the processing, and all day to wait to generate one image. Yes, I think, that is a
good way to do hiddensurface removal-in a system with hardware z-buffering.Most
of the stuff in the journalComputer Ofaphicsis riveting, but, alas, pretty much useless
on PCs. When an x86 has to do all the work, speed becomes the overriding parameter, especially for real-time graphics.
Literature thats applicable to fast PC graphics is hard enoughto find, but what wed
really like is above-averageimage quality combined with terrific speed, and theres
almost no literature of that sort around.There is some, however, and you folks are
right on top
of it. For example, alert reader
Michael Chaplin, of San Diego, wrote to
suggest that I might enjoy the line-antialiasing algorithm presented in Xiaolin Wus
article, An Efficient AntialiasingTechnique, in the
July 1991 issue
of Computer Guphics. Michael was dead-on right. This is a great algorithm, combining excellent
antialiased line qualitywith speed thats closeto that of non-antialiased Bresenhams
line drawing. This is the sortof algorithm that makes you wantto go out andwrite a
wire-frame animation program, just so you can see how good those smooth lines
look in motion. Wu antialiasing is a wonderfulexample of what can be accomplished
on inexpensive, mass-market hardware with the proper programming perspective.
In short,its a splendidexample of appropriate technology for PCs.
Wu Antialiasing
Antialiasing, as weve been discussing for the past few chapters, is the process of
smoothing lines and edges so that they appear less jagged. Antialiasing is partly an
aesthetic issue, because it makes images more attractive. Its also partly an accuracy
issue, because it makes it possible to position and draw images with effectivelymore
precision than the resolution of the display. Finally, itspartly a flat-out necessity, to
avoid the horrible, crawling, jagged edges of temporal aliasing when performing
animation.
As the algorithm steps
The basic premise of Wu antialiasingis almost ridiculously simple:
one pixel unit ata time along the major (longer) axis of a line, draws
it
the two pixels
bracketing the line along the minor axis at each point. Each of the two bracketing
pixels is drawn with
a weighted fraction of the full intensity of the drawing color, with
the weighting for eachpixel equal to one minus the pixels distance along the minor
axis from the ideal line. Yes, its a mouthful, but Figure 42.1 illustrates the concept.
The intensities of the two pixels that bracket the line areselected so that they always
sum to exactly 1;that is, to the intensity of one fully illuminated pixel of the drawing
color. The presence of aggregate full-pixel intensity means thatat each step, the line
776
Chapter 42
has the same brightness it would have if a single pixel were drawn at precisely the
correct location. Moreover, thanks to the distribution of the intensity weighting, that
brightness is centered at the ideal line. Not coincidentally, a line drawn with pixel
pairs of aggregate single-pixel intensity, centered on the ideal line, is perceived by
the eye not as ajagged collection of pixel pairs, but as a smooth line centered on the
ideal line. Thus, by weighting the bracketing pixels properly at each step, we can
readily produce what looks like a smooth line at precisely the right location, rather
than the jagged pattern
of line segments that non-antialiased line-drawing algorithms
such as Bresenhams (see Chapters 35,36, and37) trace out.
You might expect that the implementation of Wu antialiasing would fall into two
distinct areas: tracing out the line (that is, finding the appropriate pixel pairs to
draw) and calculating the appropriate weightings for each pixel pair. Not so, however.
The weighting calculations involve onlya few shifts, XORS, and adds; forall practical
purposes, tracing and weighting are rolled into one step-and a very fast step it is.
How fastis it? On a 33-MHz 486 witha fast VGA, a goodbut notmaxed-out assembly
implementation of Wu antialiasing draws a more than respectable 5,000 150-pixellong vectors per second. Thats especially impressive
considering that about1,500,000
Wued in Haste; Fried,Stewed at Leisure
777
actual pixelsare drawn per second, meaning that Wu antialiasing is drawing at around
50 percent of the maximum memory bandwidth-half the fastest theoretically possible drawingspeed-of anAT-bus VGA. In short, Wu antialiasing is about as fastan
antialiased line approach as you could ever hope to find for the VGA.
778
Chapter 42
works because what the error accumulator accumulates is precisely the ideal lines
current distance between the two bracketing pixels.
The intensity calculations take longer to describe than they do to perform. All thats
involved is a shift of the erroraccumulator to right-justify the desired intensityweightXOR to flip the least-significantn bits of the first pixels valuein
ing bits, and then an
order to generate thesecond pixels value. Listing42.1 illustratesjust how efficient
Wu antialiasing is; the intensity calculations take only three statements,and the entire Wu linedrawing loopis only nine statements long. Of course, a single C statement
can hide a greatdeal of complexity, but Listing 42.6, an assembly implementation,
shows that only 15 instructions are requiredper step along themajor axis-and the
number of instructions could be reduced to ten by special-casing and loop unrolling. Make no mistake about it, Wu antialiasing is fast.
LISTING 42.1L42-1 .C
/*
*
*
*
*
*
*
*
*
*
*
*
*/
F u n c t i o nt od r a wa na n t i a l i a s e dl i n ef r o m
(XO.YO) t o ( X 1 , Y l ) .
u s i n ga n
a n t i a l i a s i n ga p p r o a c hp u b l i s h e db yX i a o l i n
Wu i n t h e J u l y 1 9 9 1 i s s u e o f
C o m p u t e rG r a p h i c s .R e q u i r e st h a tt h ep a l e t t eb es e tu p
so t h a tt h e r e
a r e NumLevels i n t e n s i t y l e v e l s o f t h e d e s i r e d d r a w i n g c o l o r , s t a r t i n g a t
c o l o rB a s e C o l o r
(100% i n t e n s i t y ) a n df o l l o w e db y( N u m L e v e l s - 1 )l e v e l so f
e v e n l yd e c r e a s i n gi n t e n s i t y ,w i t hc o l o r( B a s e C o l o r + N u m L e v e l s - 1 )b e i n g
0%
i n t e n s i t yo ft h ed e s i r e dd r a w i n gc o l o r( b l a c k ) .T h i sc o d ei ss u i t a b l ef o r
use a ts c r e e nr e s o l u t i o n s ,w i t hl i n e st y p i c a l l yn o
more t h a n 1 K l o n g :f o r
l o n g e rl i n e s ,3 2 - b i te r r o ra r i t h m e t i cm u s tb eu s e dt oa v o i dp r o b l e m sw i t h
f i x e d - p o i n ti n a c c u r a c y . No c l i p p i n g i s p e r f o r m e d i n
DrawWuLine; i t mustbe
p e r f o r m e de i t h e ra t
a h i g h e rl e v e lo ri nt h eD r a w P i x e lf u n c t i o n .
T e s t e dw i t hB o r l a n d
C++ i n C c o m p i l a t i o n mode and t h es m a l lm o d e l .
e x t e r nv o i dD r a w P i x e l ( i n t .i n t .i n t ) ;
779
- -
I* Wu a n t i a l i a s e d l i n e d r a w e r .
*
*
*
*
*
*
*
(XO,YO).(Xl.Yl)
l i n e t o draw
BaseColor
color # o f f i r s t c o l o r i n b l o c k
100% i n t e n s vi teyr stidohorfena w icnogl o r
used f o r a n t i a l i a s i n g . t h e
NumLevels
s i z eo fc o l o rb l o c k ,w i t hB a s e C o l o r + N u m L e v e l s - 1b e i n gt h e
0% i n t e n s vi teyr stidohorfena w i cnogl o r
IntensityBits
l o g base 2 o f NumLevels:the # o f b i t s used t o d e s c r i b e
t h ei n t e n s i t yo ft h ed r a w i n gc o l o r .
2**IntensityBits--NumLevels
*I
v o i dD r a w W u L i n e ( i n t
X O . i n t YO,
u n s i g n e di n tI n t e n s i t y B i t s )
i n t X1.
i n t Y1.
i n t BaseColor. i n t NumLevels.
u n s i g n e d i n tI n t e n s i t y S h i f t .E r r o r A d j .E r r o r A c c :
WeightingComplementMask:
u n s i g n e di n tE r r o r A c c T e m p .W e i g h t i n g ,
i n tD e l t a X .D e l t a Y .
Temp, XDir:
I* Make s u r e t h e l i n e r u n s t o p t o b o t t o m
i f (YO > Y1) t
Temp
YO: YO
Temp
X O : X0
--
--
Y1: Y 1
X1: X 1
--
*I
Temp:
Temp:
/*
Draw t h e i n i t i a l p i x e l , w h i c h i s a l w a y s e x a c t l y i n t e r s e c t e d b y
*/
t h e l i n e and so needsnoweighting
D r a w P i x e l ( X 0 . YO, B a s e C o l o r ) :
- --
i f ((DeltaX
XDir
1:
I else I
X 1 - XO)
>-
0) t
XDir
-1:
DeltaX
- D e l t a X : I* make D e l t a X p o s i t i v e
}
*I
/*
S p e c i a l - c a s eh o r i z o n t a l .v e r t i c a l ,a n dd i a g o n a l i n e s ,w h i c h
r e q u i r en ow e i g h t i n gb e c a u s et h e yg or i g h tt h r o u g ht h ec e n t e r
e v e r yp i x e l */
i f ((DeltaY
Y 1 - YO)
D) I
I* H o r i z o n t a l l i n e * I
w h i l e( D e l t a X - !- 0) I
X0 +- XDir:
DrawPixel(X0. YO, B a s e C o l o r ) :
of
return:
i f (DeltaX
0) (
/* V e r t i c a l l i n e */
do I
YD M :
DrawPixel(X0. YO. B a s e C o l o r ) :
1 w h i l e( - - D e l t a Y !- 0 ) :
return:
i f (DeltaX
DeltaY) (
I* D i a g o n a l 1 i n e * I
do I
X0 +- XDir:
YO*:
DrawPixel(X0. YO, BaseColor):
1 w h i l e( - - D e l t a Y !- 0):
return:
*I
i n i t i a l i z et h el i n ee r r o ra c c u m u l a t o rt o
I* l i n ei sn o th o r i z o n t a l ,d i a g o n a l ,o rv e r t i c a l
ErrorAcc
780
Chapter 42
0:
/*
*/
I* # o f b i t s b y w h i c h t o s h i f t E r r o r A c c t o g e t i n t e n s i t y l e v e l
*I
I n t e n s i t y S h i f t = 16 - I n t e n s i t y B i t s :
/ * Mask used t o f l i p a l l b i t s i n an i n t e n s i t yw e i g h t i n g ,p r o d u c i n gt h e
r e s u l t ( 1 - i n t e n s i t yw e i g h t i n g ) */
WeightingComplementMask
NumLevels - 1;
I* I s t h i s a nX - m a j o ro rY - m a j o rl i n e ?
*I
i f (DeltaY > DeltaX) {
I* Y - m a j o r l i n e : c a l c u l a t e 1 6 - b i t f i x e d - p o i n t f r a c t i o n a l p a r t
of a
p i x e l t h a t X advanceseachtime
Y advances 1 p i x e l , t r u n c a t i n g t h e
X a x i s */
r e s u l t s o t h a t we w o n ' to v e r r u nt h ee n d p o i n ta l o n gt h e
ErrorAdj
( ( u n s i g n e dl o n g )D e l t a X
<< 1 6 ) I ( u n s i g n e dl o n g )D e l t a Y ;
/* Draw a l l p i x e l s o t h e r t h a n t h e f i r s t
and l a s t * /
w h i l eC - D e l t a Y )
I
ErrorAccTemp = E r r o r A c c ;
I* remember c u r r r e n at c c u m u l a t e de r r o r
*/
E r r o r A c c += E r r o r A d j ;
I* c a l c u l a t e r r ofronr e xpt i x e l
*I
i f ( E r r o r A c c <- ErrorAccTemp) {
I* The e r r o ra c c u m u l a t o rt u r n e do v e r ,
s o advancethe X c o o r d * I
X0 += XDir:
I* Y-major. so alwaysadvance
Y */
The I n t e n s i t y B i t s m o s t s i g n i f i c a n t b i t s o f E r r o r A c c g i v e u s t h e
i n t e n s i t yw e i g h t i n gf o rt h i sp i x e l ,
and t h ec o m p l e m e n to ft h e
w e i g h t i n gf o rt h ep a i r e dp i x e l
*I
Weighting
E r r o r A c c >> I n t e n s i t y S h i f t :
DrawPixel(X0. Y O . BaseColor + W e i g h t i n g ) :
DrawPixel(X0 + XDir. Y O .
BaseColor + ( W e i g h t i n g
WeightingComplementMask)):
YO++:
/*
I* Draw t h e f i n a l p i x e l . w h i c h i s a l w a y s e x a c t l y i n t e r s e c t e d b y t h e l i n e
and s o n e e d sn ow e i g h t i n g
*I
DrawPixel(X1. Y 1 . BaseColor):
return:
I* I t ' s a nX - m a j o rl i n e :c a l c u l a t e1 6 - b i tf i x e d - p o i n tf r a c t i o n a lp a r to f
X advances 1 p i x e l , t r u n c a t i n g t h e
p i x e l t h a t Y advanceseachtime
r e s u l tt oa v o i do v e r r u n n i n gt h ee n d p o i n ta l o n gt h e
X a x i s *I
ErrorAdj
( ( u n s i g n e dl o n g )D e l t a Y
<< 1 6 ) / ( u n s i g n e dl o n g )D e l t a X :
I* Draw a l l p i x e l s o t h e r t h a n t h e f i r s t
and l a s t * I
w h i l e( - - D e l t a X )
I
ErrorAccTemp
ErrorAcc:
I* remember c u r r r e n at c c u m u l a t e de r r o r
*I
E r r o r A c c +- E r r o r A d j :
I* c a l c u l a t e r r ofronr e xpt i x e l
*I
i f ( E r r o r A c c <- ErrorAccTemp) I
I* The e r r o ra c c u m u l a t o rt u r n e do v e r ,
s o advancethe Y c o o r d * I
YO++;
>
I* Draw t h e f i n a l p i x e l , w h i c h
i s a l w a y se x a c t l yi n t e r s e c t e db yt h el i n e
and s o n e e d sn ow e i g h t i n g
*I
DrawPixel(X1. Y 1 . BaseColor);
781
Sample Wu Antialiasing
The true test of any antialiasing technique is howgood itlooks, so let's havea look at
Wu antialiasing in action. Listing 42.1 is a C implementation ofWu antialiasing.
Listing 42.2 isa sample program thatdraws a variety of Wu-antialiasedlines, followed
by non-antialiased lines, for comparison. Listing 42.3 contains
DrawPixel()and &Mode()
functions for mode13H, the VGA's 320x200 256-colormode. Finally, Listing 42.4 is
a simple, non-antialiased linedrawing routine. Link these four listings together and
run the resulting program to see both Wu-antialiased and non-antialiased lines.
*/
Sample l i n e - d r a w i n gp r o g r a mt od e m o n s t r a t e
Wu a n t i a l i a s i n g . A l s o d r a w s
n o n - a n t i a l i a s e dl i n e sf o rc o m p a r i s o n .
T e s t e dw i t hB o r l a n d
C++ i n C c o m p i l a t i o n mode and t h es m a l lm o d e l .
1:
*/
/ * # o f c o l o r s w e ' l l do a n t i a l i a s e d d r a w i n g w i t h */
/ * d e s c r i b eosnceo l ours efdoarn t i a l i a s i n g
*/
/ * # o fs t a r t o f p a l e t t ei n t e n s i t yb l o c ki n
DAC */
/ * # o fi n t e n s i t yl e v e l s
*/
/* I n t e n s i t y B i t s
log2
NumLevels
*/
/*
/*
/*
*/
red
component
oc fo l oaf rut ilnl t e n s i t y
green
component
ocf o l oarf tu li ln t e n s i t y
b l u e component ocfo l oafr tu il nl t e n s i t y
enum {WU-BLUE-0.
WU-WHITE-11:
/* d r a cwoi nl ogr s
s t r u c t WuColor WuColorsCNUM~WU~COLORSl /* b l u e and w h i t e
((192,32,5.
0. 0. Ox3F).{224.32.5.Ox3F.Ox3F.Ox3F1};
*/
*/
*/
*/
voidmain0
i n tC u r r e n t C o l o r . i;
u n i o n REGS r e g s e t :
/ * Draw W u - a n t i a l i a s e d l i n e s i n a l l d i r e c t i o n s
SetModeO:
SetPalette(WuCo1ors):
f o r( i - 5 :i < S c r e e n W i d t h I n P i x e l s :
i +- 10) {
*/
DrawWuLine~ScreenWidthInPixels/2-ScreenWidthInPixels/lO+i/5,
S c r e e n H e i g h t I n P i x e l s / 5 . i. S c r e e n H e i g h t I n P i x e l s - 1 ,
WuColors[WU~BLUEl.BaseColor. WuColorsCWU~BLUE1.NumLevels.
WuColors[WU~BLUEl.IntensityBits):
f o r( i - 0 :i < S c r e e n H e i g h t I n P i x e l s :
i +- 10) {
DrawWuLine(ScreenWidthInPixels/2"creenWidthInPixels/lO,
782
Chapter 42
i / 5 . 0. i.
WuColorsCWU~BLUE1.BaseColor. WuColors[WU~BLUEl.NumLevels.
WuColorsCWU~BLUE1.IntensityBits):
f o r( i - 0 ;i < S c r e e n H e i g h t I n P i x e l s ;
+- 1 0 )
DrawWuLine(ScreenWidthInPixels/2+ScreenWidthInPixels/lO, i / 5 ,
S c r e e n W i d t h I n P i x e l s - 1 , i. WuColors[WU~BLUE1.BaseColor.
WuColors[WU~BLUE1.NumLevels. WuColors[WU_BLUEl.IntensityBits);
f o r( i - 0 ;i < S c r e e n W i d t h I n P i x e l s ;
+-
10) {
OrawWuLine~ScreenWidthInPixels/2-ScreenWidthInPixels/lO+i/5,
S c r e e n H e i g h t I n P i x e l s . i, 0. WuColors[WU~WHITEl.BaseColor.
WuColorsCWU~WHITE1.NumLevels,
WuColors[WU_WHITE1.IntensityBits):
/ * fwoar i t
getch();
*/
ap rkeesys
/* Now c l e a rt h es c r e e n
a n dd r a wn o n - a n t i a l i a s e dl i n e s
SetModeO;
SetPalette(WuCo1ors);
f o r( i - 0 ;i < S c r e e n W i d t h I n P i x e l s ;
i +- 1 0 ) {
*/
OrawLine~ScreenWidthInPixels/2-ScreenWidthInPixels/lO+i/5,
S c r e e n H e i g h t I n P i x e l s / 5 , i. S c r e e n H e i g h t I n P i x e l s - 1 ,
WuColors[WU~BLUEl.BaseColor~;
i +- 1 0 ) (
DrawLine~ScreenWidthInPixels/2-ScreenWidthInPixels/lO, i / 5 . 0. i.
f o r( i - 0 ;i < S c r e e n H e i g h t I n P i x e l s ;
WuColors[WU~BLUE1.BaseColor);
i +- 1 0 ) {
f o r( i - 0 ;i < S c r e e n H e i g h t I n P i x e l s ;
OrawLine(ScreenWidthInPixels/2+ScreenWidthInPixels/lO, i / 5 .
S c r e e n W i d t h I n P i x e l s - 1 , i , WuColors[WU-BLUE1.BaseColor);
f o r( i - 0 ;i < S c r e e n W i d t h I n P i x e l s ;
getch( ) :
/ * wfaoirt
regset.x.ax
0x0003;
/* AL
i n t 8 6 ( 0 x 1 0&, r e g s e t&. r e g s e t ) ;
/*
*
*
*
*
*/
+-
10)
OrawLine~ScreenWidthInPixels/2-ScreenWidthInPixels/lO+i/5,
S c r e e n H e i g h t I n P i x e l s . i, 0. WuColors[WU-WHITEI.BaseColor);
a press
key
*/
3 s e l e c t s8 0 x 2 5t e x t
/* r e t u r nt ot e x t
mode */
mode */
S e t su pt h ep a l e t t ef o ra n t i a l i a s i n gw i t ht h es p e c i f i e dc o l o r s .
I n t e n s i t ys t e p sf o re a c hc o l o ra r es c a l e df r o mt h ef u l ld e s i r e di n t e n s i t y
o ft h er e d ,g r e e n ,
and b l u ec o m p o n e n t sf o rt h a tc o l o r
down t o 0%
i n t e n s i t y ;e a c hs t e pi sr o u n d e dt ot h en e a r e s ti n t e g e r .C o l o r sa r e
c o r r e c t e d f o r a gamma o f 2 . 3 .T h ev a l u e st h a tt h ep a l e t t ei s
programmed
w i t ha r eh a r d w i r e df o rt h e
VGA's 6 b i t p e r c o l o r DAC.
v o i dS e t P a l e t t e c s t r u c t
WuColor
WColors)
i n t i. j;
u n i o n REGS r e g s e t :
s t r u c t SREGS s r e g s e t ;
/ * 256 RGB e n t r i e s */
s t a t i cu n s i g n e dc h a P
r aletteBlockC2561C31;
/* Gamma-corrected DAC c o l o rc o m p o n e n t sf o r6 4l i n e a rl e v e l sf r o m
0% t o
100% i n t e n s i t y * /
s t a t i cu n s i g n e dc h a r
GammaTableCl
{
0. 10.14.17.19.21.23.24.26.27.28.29.31.32.33.34.
35.36.37.37.38.39.40.41.41.42,43.44.44.45.46.46.
47.48.48.49.49.50.51.51.52.52.53.53,54.54.55.
55.
56,56.57.57,
58. 58. 59. 59. 60.60.61.61,62,62.63.631;
783
f o r( i - 0 ;
i<NUM-WU-COLORS; i++)
{
f o r( j - 0 j; < W C o l o r s [ i l . N u m L e v e l s ;
j++) I
PaletteBlock[jl[O]
GammaTable[((double)WColors[i].MaxRed
* ( 1.0 ( d o u b 1 e ) j I (double)(WColors[i].NumLevels
- 1 ) ) ) + 0.51;
PaletteBlockCjl[11
GammaTable[((double)WColors[il.MaxGreen
* (1.0 .
( d o u b 1 e ) j / (double)(WColors[i].NumLevels
- 1 ) ) ) + 0.51;
GammaTable[((double)WColors[i].MaxBlue
* (1.0 PaletteBlock[jl[2]
( d o u b 1 e ) j I (double)(WColors[il.NumLevels - 1 ) ) ) + 0.51;
I
/ * Now s e t up t h e p a l e t t e t o
regset.x.ax
regset.x.bx
regset.x.cx
regset.x.dx
*/
- 0x1012; I* s e tb l o c ko f DAC r e g i s t e r sf u n c t i o n * /
-- WWColors[il.NumLevels;
Colors[il.BaseColor;
I* f i r s t DAC l o c a t i o nt ol o a d
*I
/ * # o f DAC l o c a t i o n st ol o a d * I
- ( u n s i g n e di n t ) P a l e t t e B l o c k ; / * o f f s e t o f a r r a y f r o m w h i c h
do Wu a n t i a l i a s i n g f o r t h i s c o l o r
t o l o a d RGB s e t t i n g s * I
sregset.es
-DS; I* segment o f a r r a y f r o m w h i c h t o l o a d s e t t i n g s
*I
i n t 8 6 x ( O x 1 0 .& r e g s e t .& r e g s e t .& r e g s e t ) ;
I* l o a dt h ep a l e t t eb l o c k
*I
LISTING 42.3L42-3.C
/*
*
model
*I
tin c l ude <dos .h>
I* S c r e e nd i m e n s i o ng l o b a l s .u s e di nm a i np r o g r a mt os c a l e .
i n tS c r e e n W i d t h I n P i x e l s
320;
200;
i n tS c r e e n H e i g h t I n P i x e l s
*I
/ * Mode 1 3 hd r a wp i x e lf u n c t i o n .
*I
v o i dD r a w P i x e l ( i n t
X. i n t Y . i n tC o l o r )
(
# d e f i n e SCREEN-SEGMENT
OxAOOO
u n s i g n e dc h a rf a r* S c r e e n P t r ;
FP-SEG(ScreenPtr)
SCREENLSEGMENT;
FP-OFF(ScreenPtr)
( u n s i g n e di n t ) Y
*ScreenPtr
Color;
I* Mode 1 3 hm o d e - s e tf u n c t i o n .
ScreenWidthInPixels + X;
*I
voidSetModeO
(
u n i o n REGS r e g s e t ;
I* S e tt o3 2 0 x 2 0 02 5 6 - c o l o rg r a p h i c s
regset.x.ax
0x0013;
i n t 8 6 ( 0 x 1 0 ,& r e g s e t .& r e g s e t ) ;
mode
*/
LISTING42.4L42-4.C
I* F u n c t i o nt od r a w
a n o n - a n t i a l i a s e dl i n ef r o m
(X0,YO) t o (X1,Yl). u s i n g a
* s i m p l ef i x e d - p o i n te r r o ra c c u m u l a t i o na p p r o a c h .
* T e s t e dw i t hB o r l a n d C++ i n C c o m p i l a t i o n mode andthesmallmodel.
*/
e x t e r nv o i dD r a w P i x e l ( i n t .i n t .i n t ) ;
784
Chapter 42
/*
*/
N o n - a n t i a l i a s e dl i n ed r a w e r .
(XO.YO).(Xl.Yl)
l i n e t o d r a w ,C o l o r
v o i dD r a w L i n e ( i n t
XO.
c o l o r i n w h i c ht od r a w
i n t YO, i n t X 1 . i n t Y 1 . i n t C o l o r )
u n s i g n e dl o n gE r r o r A c c .E r r o r A d j ;
i n tD e l t a X .D e l t a Y .
XDir. Temp;
/*
*/
Make s u r e t h e l i n e r u n s t o p t o b o t t o m
> Y1) I
Temp
YO; YO
Y1: Y 1
Temp;
Temp
XO; X0
X1; X 1
Temp:
--
i f (YO
--
-I ---
--
DrawPixel(X0. YO. C o l o r ) ; / * d r a wt h ei n i t i a lp i x e l
i f ((DeltaX
X 1 - X O ) >- 0 ) {
XDir
1 else
1;
XDir
-1;
DeltaX
-DeltaX;
*/
/*
make D e l t a X p o s i t i v e
i f ((DeltaY
Y 1 - YO)
0)
i f (DeltaX
0) r e t u r n ;
- 0x8000:
*/
*/
/* done i f o n l y o n e p o i n t i n t h e l i n e
/*
i n i t i a l i z el i n ee r r o ra c c u m u l a t o rt o
. 5 . so we can
advancewhen we g e t h a l f w a y t o t h e n e x t p i x e l
*/
/ * Is t h i s a nX - m a j o ro rY - m a j o rl i n e ?
*/
i f (DeltaY > DeltaX) {
/* Y - m a j o rl i n e ;c a l c u l a t e1 6 - b i tf i x e d - p o i n tf r a c t i o n a lp a r to f
a
p i x e l t h a t X advanceseachtime
Y advances 1 p i x e l */
ErrorAdj
( ( ( ( u n s i g n e d1 o n g ) D e l t a X
<< 17) / ( u n s i g n e d1 o n g ) D e l t a Y ) +
1) >> 1:
/ * Draw a l l p i x e l s b e t w e e n t h e f i r s t
and l a s t */
do I
E r r o r A c c +- E r r o r A d j ;
/* c a l c u l a t ee r r o rf o rt h i sp i x e l
*/
i f ( E r r o r A c c & -0xFFFFL) I
/* The e r r o ra c c u m u l a t o rt u r n e do v e r .
so advance t h e X c o o r d */
X0 +- XDir;
E r r o r A c c &- OxFFFFL;
/ * c l e a ri n t e g e rp a r to fr e s u l t
*/
ErrorAcc
YO++:
DrawPixel(X0. Y O , C o l o r ) :
I w h i l e( - - D e l t a Y ) ;
return:
/*
/*
Y-major. s o advance
always
*/
I t ' s a nX - m a j o rl i n e :c a l c u l a t e1 6 - b i tf i x e d - p o i n tf r a c t i o n a lp a r to f
p i x e l t h a t Y advanceseachtime
X advances 1 p i x e l */
( ( ( ( u n s i g n e d1 o n g ) D e l t a Y
<< 17) / ( u n s i g n e d1 o n g ) D e l t a X )
ErrorAdj
1) >> 1:
/ * Draw a l l r e m a i n i n g p i x e l s * /
do I
E r r o r A c c +- E r r o r A d j ;
/* c a l c u l a t e e r r o r f o r t h i s p i x e l
*/
i f ( E r r o r A c c & -0xFFFFL) I
/ * The e r r o ra c c u m u l a t o rt u r n e do v e r ,
so advance t h e Y c o o r d
YO++:
E r r o r A c c &- OxFFFFL;
/* c l e a ri n t e g e rp a r t
o f r e s u l t */
X0 +- XDir;
DrawPixel(X0. Y O , C o l o r ) :
w h i l e( - - D e l t a X ) ;
/*
X-major.
s o always
advance
*/
*/
785
*
*
Mode s e t a n dp i x e l - d r a w i n gf u n c t i o n sf o rt h e6 4 0 x 4 8 02 5 6 - c o l o r
mode o f
TsengLabsET4000-basedSuperVGAs.
T e s t e dw i t hB o r l a n d
C++ i n C c o m p i l a t i o n mode and t h es m a l lm o d e l .
*I
#i
ncl ude <dos.h>
I* S c r e e nd i m e n s i o ng l o b a l s .u s e d
i n tS c r e e n W i d t h I n P i x e l s
640;
480:
intScreenHeightInPixels
t os c a l e */
i n mainprogram
/ * E T 4 0 0 06 4 0 x 4 8 02 5 6 - c o l o rd r a wp i x e lf u n c t i o n .
v o i dD r a w P i x e l ( i n t X . i n t Y . i n t C o l o r )
*I
t
# d e f i n e SCREENKSEGMENT
OxAOOO
# d e f i n e GC-SEGMENT-SELECT
u n s i g n e dc h a rf a r* S c r e e n P t r :
unsigned i n t Bank:
unsignedlongBitmapAddress;
segment
(bank)
select
reg
Ox3CD / * ET4000
*/
/ * f u l lb i t m a pa d d r e s so fp i x e l ,
asmeasuredfromaddress
0 t o OxFFFFF
BitmapAddress
( u n s i g n e dl o n g ) Y * S c r e e n W i d t h I n P i x e l s + X :
/ * Bank # i s u p p e rw o r do fb i t m a pa d d r
*I
Bank
BitmapAddress >> 16:
I* Upper n i b b l e i s r e a d b a n k
#, l o w e r n i b b l e i s w r i t e b a n k
i/ * I
outp(GC-SEGMENTKSELECT, (Bank << 4 ) I Bank):
/ * Draw i n t o t h e b a n k * I
FPKSEG(ScreenPtr) = SCREEN-SEGMENT:
( u n s i g n e di n t )B i t m a p A d d r e s s :
FP-OFF(ScreenPtr)
*ScreenPtr
Color:
I* E T 4 0 0 06 4 0 x 4 8 02 5 6 - c o l o rm o d e - s e tf u n c t i o n .
v o i d SetMode( )
{
*/
*I
u n i o n REGS r e g s e t ;
I* S e t t o 6 4 0 x 4 8 02 5 6 - c o l o rg r a p h i c s
regset.x.ax
Ox002E;
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) :
mode
*/
786
Chapter 42
of the block is programmable, but must be a power oftwo.The more intensity levels,
the better. Wu says that 32 intensities are enough; on
my system, eight andeven four
levels looked pretty good. I found thatgamma correction, which giveslinearly spaced
intensity steps, improved antialiasing quality significantly. Fortunately, we can program the palette with gamma-corrected values, so our drawing code doesn'thave to
do any extra work.
Listing 42.1 isn't very fast, so I implemented Wu antialiasing in assembly, hard-coded
for mode 13H. The implementation is shown in full in Listing 42.6. High-speedgraphics code andfast VGAs go together like peanut butter andjelly, which is to say very
well indeed; theassembly implementation ran more thantwice as fast asthe C code
on my 486. Enough said!
LISTING 42.6L42-6.ASM
;
;
;
;
;
;
;
;
;
;
;
;
;
C n e a r - c a l l a b l ef u n c t i o nt od r a w
an a n t i a l i a s e d l i n e f r o m
( X O . Y O ) t o (X1,YI). i n mode 1 3 h .t h e
VGA's s t a n d a r d3 2 0 x 2 0 02 5 6 - c o l o r
mode. Usesan
a n t i a l i a s i n ga p p r o a c hp u b l i s h e db yX i a o l i n
Wu i n t h e J u l y
1 9 9 1i s s u eo fC o m p u t e rG r a p h i c s .R e q u i r e st h a tt h ep a l e t t eb es e t
up s o
t h a tt h e r ea r e
NumLevels i n t e n s i t y l e v e l s o f t h e d e s i r e d d r a w i n g c o l o r ,
s t a r t i n ga tc o l o rB a s e C o l o r( 1 0 0 %i n t e n s i t y )a n df o l l o w e db y( N u m L e v e l s - 1 )
l e v e l so fe v e n l yd e c r e a s i n gi n t e n s i t y ,w i t hc o l o r( B a s e C o l o r + N u m L e v e l s - 1 )
b e i n g 0% i n t e n s i t yo ft h ed e s i r e dd r a w i n gc o l o r( b l a c k ) .
No c l i p p i n g i s
p e r f o r m e d i n DrawWuLine.Handles
a maximum o f 256 i n t e n s i t y l e v e l s p e r
a n t i a l i a s e dc o l o r .T h i sc o d ei ss u i t a b l ef o ru s e
a t s c r e e nr e s o l u t i o n s ,
w i t hl i n e st y p i c a l l y
nomorethan
1 K l o n g ;f o rl o n g e rl i n e s ,3 2 - b i te r r o r
a r i t h m e t i cm u s tb eu s e dt oa v o i dp r o b l e m sw i t hf i x e d - p o i n ti n a c c u r a c y .
T e s t e dw i t h TASM.
; C n e a r - c a l l a b l ea s :
;
v o i dD r a w W u L i n e ( i n t
XO. i n t YO, i n t X 1 . i n t Y 1 . i n B
t asecolor.
i n t NumLevels.unsigned
i n t1 n t e n s i t y B i t . s ) ;
SCREEN-WIDTH-IN-BYTES
SCREEN-SEGMENT
dw
equ
equ
320
OaOOOh
;# o b
f y t e sf r o mt h es t a r ot f
; tothestartofthenext
;segment i n w h iscchr e e n
one
scan
line
memory r e s i d e s
; Parameterspassed
i ns t a c kf r a m e .
parms
struc
;pushed BP and r e t u rand d r e s s
dw
2 dup ( ? )
X0
?
;X c o o r d i n a t eo fl i n es t a r tp o i n t
YO
dw
?
;Y c o o r d i n a t eo fl i n es t a r tp o i n t
X1
dw
?
; X c o o r d i n a t e o f l i n e end p o i n t
Y1
dw
?
; Y c o o r d i n a t e o f l i n e end p o i n t
BaseColor dw
?
; c o l o r # of ifr sc to l oibnrl o c k
used f o r
; a n t i a l i a s i n g .t h e1 0 0 %i n t e n s i t yv e r s i o no ft h e
; d r a w i n gc o l o r
NumLevels dw
?
cbo; losloB
w
oif zcrai teksh,e C o l o r + N u m L e v e l s - I
; b e i n g t h e 0% i n t e n s i t y v e r s i o n o f t h e d r a w i n g c o l o r
; (maximumNumLevels
256)
I n t e n s i t y B i t s dw ?
: l obga s e
2 o f NumLevel s : t h e # o bf i t s
used t o
; d e s c r i b et h ei n t e n s i t yo ft h ed r a w i n gc o l o r .
; 2**IntensityBits--NumLevels
; (maximum I n t e n s i t y B i t s
8)
parmsends
787
small.model
.code
; S c r e e nd i m e n s i o ng l o b a l s ,u s e d
-S c r e e n W i d t h I n P i x e l s
dw
-S c r e e n H e i g h t I n P i x e l s
dw
.code
public
-DrawWuLine
-DrawWuLine proc near
; ppb
ru
eps h
e
c ar vl sleeftra
ra'csm
ke
mov
b p .;sppol oti oncs taalf cr ak m e
push; p rseis e r v e
push d i
;preserve
push
ds
;make
c ld
i n mainprogram
320
200
t os c a l e .
C ' s r e g i vs at er ir a b l e s
C's d e f adual tt a
segment
s t ri ni nsgt r u c t iionncsr e mt ehpneoti rn t e r s
; Make s u r et h el i n er u n st o pt ob o t t o m .
mov
si.Cbpl.XO
mov
ax.Cbpl.YO
cmp
ax.[bp].Yl
;swap e n d p o i n t s i f n e c e s s a r yt oe n s u r et h a t
; Y O <- Y 1
jna
NoSwap
x c hCgb p 1 . Y l . a ~
mov
Cbp1.YO.a~
x c h[ bg p l . X l . s i
mov
[bpl.XO.si
NoSwap:
: Draw t h ei n i t i a lp i x e l ,w h i c hi sa l w a y se x a c t l yi n t e r s e c t e d
b yt h el i n e
; and so n e e d sn ow e i s h t i n s .
dx,SCREEN->EGMENT
mov
mov ; p. dodxisn t
DSst chtreoes e gn m e n t
mov
dx.SCREEN_WIDTH-IN-BYTES
mu1
dx
; Y O * SCREEN-WIDTH-IN-BYTES oy fitefhslede st
; ofthestartofthe
row s t a r t t h e i n i t i a l
; p i x e l i s on
add
s i; ,paoxi n t
D S : S I ti htnoei t pi ai xl e l
a l . b y t ep t r C b p 1 . B a s e C o l o ;rc o l o w
r i t hw h i c h
t o draw
mov
d r a wt h ei n i t i a lp i x e l
mov
[sil.al
mov
mov
sub
Jns
bx.1
cx.[bp].Xl
cx.[bpl.XO
Del taXSet
cx
neg
bx
neg
Del taXSet:
XDir
- 1; assume D e l t a X >-
; D e l t a X ; i s i t >- l ?
;yes. move l e f t - > r i g h t . a l l s e t
;no. move r i g h t - > l e f t
;make D e l t a Xp o s i t i v e
;XDir
- -1
; S p e c i a l - c a s eh o r i z o n t a l ,v e r t i c a l ,
a n dd i a g o n a l i n e s ,w h i c hr e q u i r en o
; w e i g h t i n gb e c a u s et h e yg or i g h tt h r o u g ht h ec e n t e ro fe v e r yp i x e l .
mov
dx.Cbpl.Yl
dx.Cbpl.YO
sub ;DeltaY;
h o r inzN
ooto;nnttH
joan.olzr z
bx.bx
and
jns
DoHorz
std
DoHorz:
l ed ai . [ b x + s; pi lo i n t
mov
ax.ds
788
Chapter 42
i s it O?
;yes. i sh o r i z o n t a l ,s p e c i a lc a s e
; d r a wf r o ml e f t - > r i g h t ?
;yes. a l ls e t
;no.draw
right->left
D I nt oepxitxdteor la w
mov
mov
e s: p. ao xi n t
ES:DI n pedt oixrtxaotewl
a l . b y t ep t r[ b p ] . B a s e C o l o r: c o l o rw i t hw h i c ht od r a w
:CX
DeltaXatthispoint
sr teh:otpdhosrotrebrhafiszewto nl itnael
c ld
f l a g d i r e c t i odne f a u l:tr e s t o r e
jmp
Done
done
we're
;and
align 2
NotHorz:
cx.cx
and
jnz NotVert
: i sD e l t a X O?
: n o , n o t a v e r t i c a ll i n e
:yes. i sv e r t i c a l ,s p e c i a lc a s e
; c o l o rw i t hw h i c ht od r a w
mov
a1 . b y t ep t[rb p ] . B a s e C o l o r
V e r t L o o :p
add
si.SCREEN-WIDTH-IN-BYTES
mov
[sil.al
dx
dec
Vj en rzt L o o p
jmp
Done
:no,
: p o i n tt on e x tp i x e lt od r a w
: d r a wt h en e x tp i x e l
:- - D e l t a Y
:andwe'redone
align 2
NotVert:
;DeltaX
cmp
cx,dx
NotDiag
jnz
mov
DiagLoop:
l e as i
dx
DeltaY?
:yes. i sd i a g o n a l ,s p e c i a lc a s e
a 1 , b y t ep t[rb p l . B a s e C o l o;rc o l owr i t hw h i c ht od r a w
.[si+SCREEN-WIDTH-IN-BYTES+bx]
mov
[ s i 1.a1
dec
Djinazg L o o p
jmp
Done
;advance t o n e x t p i x e l t o
drawby
; i n c r e m e n t i n g Y andadding
XDir t o X
p i x e nl e x t t h e: d r a w
taY
:- - D e l
done
we're
;and
: L i n ei sn o th o r i z o n t a l ,d i a g o n a l ,o rv e r t i c a l .
align 2
NotDiag:
: I s t h i s a nX - m a j o ro rY - m a j o rl i n e ?
cmp dx.cx
jb
XMajor
X-major
:it's
: I t ' s a Y - m a j o rl i n e .C a l c u l a t et h e1 6 - b i tf i x e d - p o i n tf r a c t i o n a lp a r to f
: p i x e l t h a t X advanceseachtime
Y advances 1 p i x e l , t r u n c a t i n g t h e r e s u l t
: t oa v o i do v e r r u n n i n gt h ee n d p o i n ta l o n gt h e
X axis.
xchg
dx.cx
ax.ax
sub
cd xi v
mov . c xd i
sub
s i .;bbxatuschptkea r t
mov
dx. -1
:DX
DeltaX. CX
DeltaY
;make D e l1t a6fX.i 1x 6e d - p ov iani lntu e
DX:AX
;AX
( D e l t a X << 1 6 ) / D e l tW
aoY
ovn. e' tr f l o w
; becauseDeltaX
< DeltaY
:DI
D e(l ltcoaooYupn t )
X by 1. eaxsp l a i nbeedl o w
: i n i t i a l itzlhienereraocrc u m u l a tt oo r
-1.
; so t h a t i t w
ill t u r no v e ri m m e d i a t e l ya n d
; advance X t o t h e s t a r t
X. T h i si sn e c e s s a r y
; properlytobiaserror
sums o f 0 t o mean
: " a d v a n c en e x tt i m e "r a t h e rt h a n" a d v a n c e
: t h i st i m e , ' '
so t h a tt h ef i n a le r r o r
sum can
; n e v e rc a u s ed r a w i n g
t oo v e r r u nt h ef i n a l
X
; c o o r d i n a t e( w o r k s
i n c o n j u n c t i o nw i t h
: t r u n c a t i n gE r r o r A d j .t o
make s u r e X c a n ' t
: overrun)
789
mov
cx.8
csxu. C
b bpl.1ntensityBits
mov
dec
ch
mov
c h . b ypt[ ebt rp l . N u m L e v e l s
s h i f ttwo h:CL
i cbhy b i t#s o f
: E r ri ngo ter teAtonclscei vt ye l
(8
: i n s t e a d o f 16because we w o r ko n l y
: w i t ht h eh i g hb y t eo fE r r o r A c c )
;mask used t fol i ap lbl i t isn
an
w:e ii ng thet ni ns gi t y
, producing
: r e s u l t ( 1 - i n t e n s i t yw e i g h t i n g )
a bv pa .i lBaanbsol et *fC
:r*a
o* m
l*osert[abcpk]
;***from now on
***
;BP
E r r o r A d j . AL
BaseColor.
: AH
s c r a t c hr e g i s t e r
--
bp.ax xchg
: Draw a l lr e m a i n i n gp i x e l s .
YMajorLoop:
dx.bp
add
jnc
NoXAdvance
: c a l c u l a t ee r r o rf o rn e x tp i x e l
:nottimetostepin
X yet
: t h ee r r o ra c c u m u l a t o rt u r n e do v e r ,
;so advancethe X c o o r d
p o pi ni xt ee rtlXDir
he to
add:adds i .bx
NoXAdvance:
add
si.SCREEN-WIDTH-IN-BYTES
;Y-major,
s o always
advance
The I n t e n s i t y B i t s m o s t s i g n i f i c a n t b i t s o f E r r o r A c c g i v e
us t h e i n t e n s i t y
w e i g h t i n gf o rt h i sp i x e l .
and t h ec o m p l e m e n to ft h ew e i g h t i n gf o rt h e
: p a i r e dp i x e l .
mov
ah.dh
:msb o f E r r o r A c c
:Weighting
E r r o r A c c >> I n t e n s i t y S h i f t :
ashh. cr l
add
ah.al
:BaseCol o r + W e i g h t i n g
:DrawPixel(X. Y . BaseColor + W e i g h t i n g ) :
mov
[ s i 1,ah
mov
ah.dh
:msb o f E r r o r A c c
ashh. cr l
:Weighting
E r r o r A c c >> I n t e n s i t y S h i f t :
a xho. cr h
: W e i g h t i n g A WeightingComplementMask
add
ah.al
:BaseColor + ( W e i g h t i n g A WeightingComplementMask)
:DrawPixel(X+XDir. Y .
mov
[ s i + b x, a] h
: BaseColor + ( W e i g h t i n g A WeightingComplementMask));
dec
di
:- - D e l t a Y
jnz
YMajorLoop
jmp
Done
l i n et h i sw i tdh:ownee' r e
;
;
: I t ' s a nX - m a j o rl i n e .
align 2
XMajor:
: C a l c u l a t et h e1 6 - b i tf i x e d - p o i n tf r a c t i o n a lp a r to f
a p i x e l t h a t Y advances
: e a c ht i m e X advances 1 p i x e l , t r u n c a t i n g t h e r e s u l t t o a v o i d o v e r r u n n i n g
: t h ee n d p o i n ta l o n gt h e
X axis.
ax.ax
sub
:make D e1f l6i tx.ae1Yd6 - vpi nao li un et
DX:AX
c xd i v
:AX
( D e l t a Y << 16) / D eW
ol tvoaenxr'.ft l o w
: becauseDeltaY < DeltaX
:DI
cD(oel o
ul tn
oatpX
)
mov. c x d i
sub
si.SCREEN-WIDTH-IN-BYTES
:back
up
t h se t a r t
X by 1. as
: e x p l a i n e db e l o w
mov
dx. -1
: i n i t i ael italrnhirczeoceru m u l at ot o r
-1.
: s o t h a t it will t u r no v e ri m m e d i a t e l ya n d
: advance Y t o t h e s t a r t Y . T h i si sn e c e s s a r y
: properlytobiaserror
sums o f 0 t o mean
: " a d v a n c en e x tt i m e "r a t h e rt h a n" a d v a n c e
: t h i s t i m e . " so t h a t t h e f i n a l e r r o r
sum can
: n e v e rc a u s ed r a w i n gt oo v e r r u nt h ef i n a l
Y
: c o o r d i n a t e( w o r k si nc o n j u n c t i o nw i t h
: t r u n c a t i n gE r r o r A d j .t o
make s u r e Y c a n ' t
: overrun)
790
Chapter 42
mov
cx.8
csxu. bC b p 1 . I n t e n s i t y B i t s
mov
ch
dec
mov
;CL
- I/
o f b i t s bywhich
to shift
: E r r o r A c ct og e ti n t e n s i t yl e v e l( 8
: i n s t e a d o f 16 because we w o r ko n l y
: w i t ht h eh i g hb y t eo fE r r o r A c c )
c h . b y tpe[t br p l . N u m L e v e l s
bp,BaseColor[bpl
;mask used t o f l i p a l l b i t s i n an
: i n t e n s i t yw e i g h t i n g ,p r o d u c i n g
: r e s u l t ( 1 - i n t e n s i t yw e i g h t i n g )
: * * * s t a c kf r a m en o ta v a i l a b l e * * *
***
:***from now on
:BP
E r r o r A d j . AL
BaseColor.
: AH
s c r a t c hr e g i s t e r
xchg
bp.ax
: Draw a l lr e m a i n i n gp i x e l s
XMajorLoop:
dx,bp
add
jnc
NoYAdvance
si.SCREEN_WIDTH-IN-BYTES
add
NoYAdvance:
add
s i .bx
:calculateerrorfornextpixel
Y yet
;nottimetostepin
: t h ee r r o ra c c u m u l a t o rt u r n e do v e r ,
: so advancethe Y c o o r d
:advance Y
:X-major. s o add XDir t o t h e p i x e l p o i n t e r
: The I n t e n s i t y e i t s m o s t s i g n i f i c a n t b i t s o f E r r o r A c c g i v e
: weightingforthispixel,
: p a i r e dp i x e l .
us t h e i n t e n s i t y
a n dt h ec o m p l e m e n to ft h ew e i g h t i n gf o rt h e
mov
ah.dh
E r r o r A:msb
cc of
: W e i ga h t. icnl g s h r
E r r o r A c c >> I n t e n s i t y S h i f t :
+ Weighting
:Basecolorah,al
add
Y . BaseColor + W e i g h t i n g ) :
mov :DrawPixel(X.
Csi1,ah
mov
ah.dh
E r r o r A:msb
cc of
: W e i gahht.icnlg s h r
E r r o r A c c >> I n t e n s i t y S h i f t :
A
WeightingComplementMask
:Weighting
xor
ah.ch
+ ( W e i g h t i n g WeightingComplementMask)
:Basecolor
ah.al add
mov
[si+SCREEN-WIDTH-IN-BYTES].ah
:DrawPixel(X. Y+SCREEN-WIDTH-IN-BYTES,
: BaseColor + ( W e i g h t i n g A WeightingComplementMask)):
:--0eltaX
dec
di
jnz
XMajorLoop
Done:
POP
ds
pop
di
pop
si
POP
bP
ret
-0rawWuLineendp
end
; w e r ed o n ew i t ht h i sl i n e
; r e s t o r eC sd e f a u l td a t a
segment
: r e s t o r eC sr e g i s t e rv a r i a b l e s
: r e s t o r ec a l l e r ss t a c kf r a m e
;done
Notes on Wu Antialiasing
Wu antialiasing can be applied to any curve for which its possible to calculate at
each step the positions and intensities of two bracketing pixels, although theimplementation will generally be nowhere near as efficient as it is for lines. However, Wus
article in Computer Ofaphicsdoes describe an efficient algorithm for drawing antialiased
circles. Wu also describes a technique forantialiasing solids, such as filled circles and
polygons. Wus approach biases the edges of filled objects outward. Although this is
no good for adjacentpolygons of the sortused in rendering,its certainly possible to
791
Previous
Home
With a l6bit erroraccumulator, fixed-point inaccuracy becomes a problem forWuantialiased lines longer than 1K. For such lines, you should switch to using 32-bit
error values, which wouldlet you handle linesof any practical length.
In the listings, I have chosen to truncate, rather than round, the error-adjust
value.
This increases the intensity error of the line but guarantees that fixed-point
inaccuracy wont cause the minor axis to advance past the endpoint. Overrunning the
endpoint would result in thedrawing of pixels outside thelines bounding box, and
potentially even in an attempt to
access pixelsoff the edgeof the bitmap.
Finally, I should mention that, as published, Wus algorithm draws lines symmetrically, from both endsat once. Ihavent done this for a numberof reasons, not least
of which is that symmetric drawing is an inefficientway to draw lines that spanbanks
on banked Super-VGAs. Banking aside, however, symmetric drawing is potentially
faster, because it eliminates half of all calculations; in so doing, it cuts cumulative
error in half, as well.
With or without symmetrical processing, Wu antialiasing beats fried,stewed chicken
hands-down. Trust me on this one.
792
Chapter 42
Next
Previous
chapter 43
bit-plane animation
Home
Next
\
i
~F
A:
795
Anyway, my point is that the PCs animation capabilities are pretty good. Theres a
trick, though: You can only push the VGA to its animation limits by stretching your
mind a bit andusing some unorthodox approaches to animation. Infact, stretching
your mind is the key to producing goodcode for any task on the PC-thats the topic
of the first part of this book. For most software, however, its not fatal if your code
isnt excellent-theres slow but functionalsoftware all over
the place. When it comes
to VGA animation, though,you wont get to first base without a clever approach.
So, what clever approaches do I have in mind? All sorts. The resources of the VGA
(or even its now-ancient predecessor, the EGA) are many and varied, and can be
applied and combinedin hundreds of ways to produce effective animation. For example, referback to Chapter 23 for anexample of page flipping. Or look at theJuly
1986 issue of PC TechJournal, which describes the basic block-moveanimation technique, or the August 1987 issue of PC Tech Journal, which shows a software-sprite
scheme built around theEGAs vertical interrupt and theAND-OR image drawing
technique. Or look over the rest of this book, which contains dozens of tips and
tricks that can be applied to animation, including
Mode X-basedtechniques starting
in Chapter 47 that are thebasis for many commercial games.
This chapter adds yet another sort of animation to thelist. Were going to take advantage of the bit-plane architecture and color palette of the VGA to develop an
animation architecture thatcan handle several overlapping images withterrific speed
and with virtually perfect visual quality.This technique produces no overlap effects
or flicker and allows us to use the fastest possiblemethod to draw images-the REP
MOVS instruction. Ithas its limitations, but unlike Mode X and some other animation techniques, the techniques Ill show you in this chapter will also work on the
EGA, which may be important in some applications.
As with any technique on the PC, there are tradeoffs involved with bit-plane animation.
While bit-plane animation is extremely attractive as far as performance and visual
quality are concerned, is
it somewhat limited. Bit-plane animation supportsonly four
colors plus the background color at any one time, each image must consist of only
one of the fourcolors, and its preferable thatimages of the same color not intersect.
It doesnt much matter if bit-plane animation isnt perfect for all applications,though.
The real point of showing you bit-plane animation is to bring home thereality that
the VGA is a complex adapter with manyresources, and thatyou can do remarkable
things if youunderstand those resources and comeup with creativeways to put them
to workat specific tasks.
796
Chapter 43
data, plane 2 red pixel data, and plane 3 intensity pixel data-but were going to mix
that up a bit in a moment,so well simplyrefer to them as planes 0, 1, 2, and 3 from
now on.
Each bit plane can be writtento independently. The contents of the four bit planes
are used to generate pixels, with the four bits that control the color of each pixel
coming from the four planes. However, the bits from the planes go through a lookup stage on the way to becoming pixels-theyre used to look up a 6bit color from
one of the sixteen palette registers. Figure 43.1 shows how the bits from the four
planes feed into the palette registers to select the color of each pixel. (On the VGA
specifically, the output of the palette registers goes to the DAC for an additional
look-up stage, as described in Chapters 33 and 34 and also Chapter A on the companion CD-ROM.)
Take a goodlook at Figure 43.1. Any light bulbs going on over yourhead yet? Ifnot,
consider this. The general problem with VGA animation is that its complex and
Palette I
Plane 1
color data
per pixel
to the
screen (or
to the DAC
on a VGA)
Plane O
plxel
l.bitper
rom
each plane
Bit-PlaneAnimation
797
timeconsuming to manipulate images that span the fourplanes (as most do), and
that it's hard to avoid interference problemswhen images intersect, since those images share thesame bits in display memory. Since the fourbit planes can be written
to and read from independently, it should be apparent
if wethat
could comeup with
a way to display imagesfrom each plane independently
of whatever imagesare stored
in the otherplanes, we would havefour sets of images that we could manipulatevery
easily. There would be no interference effects between images in different planes,
because imagesin oneplane wouldn't share bits with images
in another plane.
What's
more, since all the bits for agiven image would reside in asingle plane, we could do
away with the cumbersome programming of the VGA's complex hardware that is
needed to manipulate images that span multipleplanes.
All in all, it would be a good dealif we could store eachimage in a single plane, as
shown in Figure 43.2.However, a problem arises when images in different planes
overlap, as shownin Figure 43.3. The combined bits from overlappingimages generate new colors, so the overlapping parts
of the images don't look like theybelong to
either of the two images. What we really want,of course, is for one of the images to
appear to be infront of the other. would
It
bebetter yet if the rearward image showed
through any transparent (that is, background-colored) partsof the forward image.
Can we do that?
You bet.
Plane 3
I
.
Plane 2
Plane 1
I
Plane 0
Figure 43.2
798
Chapter 43
t
1,
Screen
Bit-PlaneAnimation
799
800
Chapter 43
PRO
PR 1
PR2
PR3
PR4
PR5
PR6
PR7
PR8
PR9
PRlO
PRl 1
PR12
PR13
PR14
PRl5
register settings we instantly have exactly what we want, which is an approach that
keeps images in one plane from interferingwith images in other planes while providing precedence and transparency.
Seems almost too easy, doesnt it? Nonetheless, it works beautifully, as wellsee very
shortly. First,though, Idlike to point out thattheres nothing sacred about plane0
having precedence. We could rearrange the palette
register settings so that any plane
had the highest precedence, followed by the otherplanes in any order. Ive chosen
to make plane 0 the highest precedence only because it seems simplest to think of
plane 0 as appearing in front of plane 1, which is in front of plane 2, which is in front
of plane 3.
: Program t o d e m o n s t r a t eb i t - p l a n ea n i m a t i o n .P e r f o r m s
: f l i c k e r - f r e ea n i m a t i o nw i t hi m a g et r a n s p a r e n c ya n d
: imageprecedenceacross
f o u r d i s t i n c tp l a n e s ,w i t h
: 1 3 32x32 imageskept i n m o t i o n a t once.
Bit-Plane Animation
801
; S e tt oh i g h e rv a l u e st os l o w
down on f a s t e rc o m p u t e r s .
an AT.
; S l o w i n ga n i m a t i o nf u r t h e ra l l o w s
a good l o o k a t
; t r a n s p a r e n c ya n dt h el a c ko ff l i c k e r
and c o l o r e f f e c t s
; when imagescross.
: 0 isfinefor
a PC. 500 i s a r e a s o n a b l es e t t i n gf o r
SLOWDOWN 10000
equ
; P l a n es e l e c t sf o rt h ef o u rc o l o r sw e ' r eu s i n g .
RED Olh
equ
GREEN equ
02h
BLUE equ
04h
WHITE equ
08h
VGA-SEGMENT
equ
OaOOOh
:mode 1 0 h d i s p l a y
memory
; segment
SC-INDEX
3c4h
equ
;Sequence C o n t r o l l e r I n d e x
; register
MAP-MASK
SCREEN-WIDTH
SCREEN-HEIGHT
WORD-OUTS-OK
equ
equ
350
equ
equ
80
:Map Mask r e g i s t e r i n d e x i n
; Sequence C o n t r o l l e r
;# o fb y t e sa c r o s ss c r e e n
;# o f s c a n l i n e s
onscreen
; s e t t o 0 t o assemble f o r
; c o m p u t e r st h a tc a n ' t
; h a n d l ew o r do u t st o
; i n d e x e d VGA r e g s
s t a c k segmentparastack
'STACK'
db
512 dup ( ? )
s t a c k ends
; Complete i n f o a b o u t
O b j e c t S t r u c t usrter u c
Delay
dw
BaseDel
ay
Image
dw
dw
?
?
XCoord
XInc
dw
dw
?
?
XLeftLimit
XRightLimit
YCoord
Y Inc
dw
dw
dw
dw
?
?
?
?
YTopLimit
YBottomLimit
P1 aneSel
db
ect
dw
dw
?
?
?
db
ObjectStructure
802
one o b j e c tt h a tw e ' r ea n i m a t i n g .
Chapter 43
ends
;used t o d e l a y f o r
n passes
; t h r o u g h tt h el o o pt o
; c o n t r o al n i m a t i o ns p e e d
; r e s e tv a l u ef o rD e l a y
; p o i n t e rt od r a w i n gi n f o
: f o ro b j e c t
;object X l o c a t i o ni np i x e l s
;# o f p i x e l s t o i n c r e m e n t
: l o c a t i o n by i n t h e X
: d i r e c t i o n oneach move
; l e f t limit o f X m o t i o n
;right l
i
m
i
t o f X motion
;object Y l o c a t i o n i n p i x e l s
; i o f p i x e l st oi n c r e m e n t
; l o c a t i o n by i n t h e Y
: d i r e c t i o n oneach move
; t o p limit o f Y m o t i o n
; b o t t o m limit o f Y m o t i o n
;mask t o s e l e c t p l a n e t o
; w h i c ho b j e c ti sd r a w n
; t o make aneven # o f w o r d s
; l o n g ,f o rb e t t e r
286
; p e r f o r m a n c e( k e e p st h e
; f o l l o w i n gs t r u c t u r e
; word-aligned)
word 'DATA'
Data
segment
;
;
;
;
P a l e t t es e t t i n g st og i v ep l a n e
0 p r e c e d e n c e ,f o l l o w e d
by
p l a n e s 1. 2 . and3.Plane3
h a s t h el o w e s tp r e c e d e n c e
(is
obscured by a n yo t h e rp l a n e ) .w h i l ep l a n e
0 h a st h e
h i g h e s tp r e c e d e n c e( d i s p l a y s
i n f r o n t ofany
o t h e rp l a n e ) .
Colors
;
;
;
;
db
db
db
db
db
db
db
db
db
db
db
db
db
db
db
db
db
OOOh
03ch
03a h
03ch
039h
03ch
03a h
03ch
03f h
03ch
03a h
03ch
039h
03ch
03ah
03ch
OOOh
:backgroundcolor-black
; p l a n e 0 only-red
: p l a n e 1 only-green
; p l a n e sO & l - r e d( p l a n e
0 priority)
;plane2only-blue
; p l a n e s O&E-red ( p l a n e 0 p r i o r i t y )
1 priority)
; p l a n e s1 & 2 - g r e e n( p l a n e
; p l a n e sO & l & E - r e d( p l a n e
0 priority)
; p l a n e3o n l y - w h i t e
; p l a n e s O&3-red ( p l a n e 0 p r i o r i t y )
: p l a n e s1 & 3 - g r e e n( p l a n e
1 priority)
; p l a n e sO & l & 3 - r e d( p l a n e
0 priority)
; p l a n e s2 6 3 - b l u e( p l a n e2p r i o r i t y )
; p l a n e s0 & 2 & 3 - r e d( p l a n e
0 priority)
; p l a n e s1 & 2 & 3 - g r e e n( p l a n e
1 priority)
0 priority)
; p l a n e s0 & 1 & 2 & 3 - r e d( p l a n e
;border col or-bl ack
Imageofahollowsquare.
T h e r e ' sa n8 - p i x e l - w i d eb l a n kb o r d e ra r o u n da l le d g e s
so t h a tt h ei m a g ee r a s e st h eo l dv e r s i o n
of i t s e l f a s
i t ' s moved andredrawn.
Sbql ayubtaeer le
dw
4 8 .;6h e i g hiptni x e l sw, i d t h
i n bytes
rept 8
db
0 . 0 , 0 . 0 , 0 . 0 ; t obpl a nbko r d e r
endm
.r a d i x
2
0.11111111.11111111,11111111,11111111,0
db
0.11111111.11111111.11111111.11111111.0
db
0.11111111,11111111,11111111,11111111,0
db
0.11111111,11111111,11111111,11111111,0
db
0,11111111.11111111.11111111.11111111.0
db
0.11111111.11111111,11111111,11111111,0
db
0.11111111.11111111.11111111.11111111.0
db
db
0.11111111.11111111.11111111.11111111.0
0.11111111.00000000.00000000,11111111,0
db
0,11111111,00000000.00000000.11111111,0
db
db
0.11111111.00000000.00000000.11111111.0
0,11111111.00000000.00000000.11111111.0
db
0.11111111.00000000,00000000,11111111,0
db
db
0,11111111,00000000,00000000.11111111.0
0.11111111.00000000.00000000.11111111.0
db
0.11111111.00000000.00000000.11111111.0
db
0.11111111.00000000.00000000.11111111.0
db
0,11111111,00000000,00000000.11111111.0
db
0.11111111,00000000,00000000,11111111,0
db
db
0,11111111.00000000.00000000,11111111,0
db
0,11111111.00000000.00000000.11111111.0
0.11111111,00000000.00000000,11111111,0
db
0.11111111.00000000.00000000.11111111.0
db
db
0,11111111.00000000.00000000,11111111,0
Bit-Plane Animation
803
db
0,11111111.11111111.11111111,11111111,0
db
0.11111111.11111111,11111111,11111111,0
db
0.11111111.11111111.11111111.11111111.0
db
0.11111111,11111111.11111111,11111111,0
db
0.11111111.11111111.11111111,11111111,0
db
0.11111111.11111111.11111111,11111111~0
db
0,11111111.11111111.11111111.11111111.0
db
0.11111111,11111111.11111111,11111111,0
.radix
10
rept 8
db
0.0.0.0.0.0 ;bottom
b l a nbko r d e r
endm
: Imageof a hollowdiamondwith
a s m a l l e rd i a m o n di nt h e
: middle.
: T h e r e ' sa n8 - p i x e l - w i d eb l a n kb o r d e ra r o u n da l le d g e s
: so t h a t t h e i m a g ee r a s e st h eo l dv e r s i o n
of itselfas
: i t ' s moved andredrawn.
Diamond
l abbyet le
dw
48.6 : h e i g hitpn i x e l sw, i d t ihbn y t e s
rept 8
db
0.0,O.O.O.O ; t obpl a nbko r d e r
endm
1
.radix
2
db
0.00000000.00000001.1000000.000000000.0
db
0.00000000.00000011.11000000.00000000,0
db
0.00000000.00000111.11100000.00000000.0
db
0.00000000,00001111.11110000.00000000.0
db
0.00000000.00011131.11111000.00000000.0
db
0.00000000.00111110.01111100.00000000.0
db
0.00000000.01111100.00111110,00000000.0
db
0.00000000.11111000.00011111.00000000.0
db
0.00000001,11110000.00001111,10000000.0
db
0,00000011.11100000.00000111.11000000.0
db
0.00000111.11000000.00000011,11100000.0
db
0.00001111.10000001,10000001,11110000,0
db
0.00011111.00000011.11000000*11111000.0
db
0.00111110.00000111.11100000.01111100.0
db
0,01111100.00001111.11110000.00111110.0
db
0.11111000.00011111,11111000.00011111,0
db
0.11111000.00011111.11111000.00011111.0
db
0.01111100.00001111.11110000,00111110,0
db
0.00111110,00000111.11100000,01111100.0
db
0,00011111.00000011.11000000,11111000.0
db
0,00001111,10000001.100000001.11110000.0
db
0.00000111.11000000.00000011.11100000.0
db
0,00000011.11100000.00000111.11000000.0
db
0,00000001.11110000.00001111,10000000,0
db
0.00000000.11111000.00011111,00000000*0
db
0.00000000.01111100.00111110,00000000.0
db
0.00000000.00111110.01111100.00000000.0
db
0,00000000,00011111.11111000.00000000.0
db
0.00000000.00001111.11110000.00000000.0
db
0.00000000.00000111.11100000.00000000.0
db
0.00000000.00000011.11000000.00000000.0
db
0.00000000.00000001.1000000.000000000.0
.radix
10
rept 8
O,O.O,O.O.O;bottom
db
blank
border
endm
804
Chapter 43
: L i s to fo b j e c t st oa n i m a t e .
e v e n: w o r d - a l i g nf o rb e t t e r
O b j e c t L i slta b e l
Objectstructure
Objectstructure
Objectstructure
ObjectStructure
Objectstructure
Objectstructure
Objectstructure
ObjectStructure
Objectstructure
Objectstructure
Objectstructure
Objectstructure
Objectstructure
ObjectListEnd
286 p e r f o r m a n c e
Objectstructure
<1,21.Diamond,88.8.80,512,16,0,0,350,RED>
<1.15.Square,296.8.112,480,144.0.0,350,REO>
<1,23.Diamond.88,8.80,512,256.0,0.35O,RED>
<1.13.Square.120,0.0.640,144.4,0,28O,~LUE>
<1.11.0iamond.208.0.0,640.144.4,0,280,ElLUE>
<1.8.Square.296.0.0.640,144.4.0,288,BLUE>
<1.9.Diamond,384,0.0.640,144.4,0~288,BLUE>
<I .14.Square.472.0.0.640,144.4.0.280.BLUE>
<1.8.Diamond,200,8.0,576,48,6,0,28O~GREEN>
<1.8.Square.Z48.8,0,576,96.6.0.280.GREEN>
<1.8.Diamond,296.8.0.576,144,6,0,28O,GREEN>
<1.8.Square.344,8.0.576,192,6,0.280,GREEN>
<1,8,0iamond.392.8.0.576,240,6.0,280~GREEN>
l a b e lO b j e c t S t r u c t u r e
Data
ends
: Macro t o o u t p u t
a w o r dv a l u et o
a port.
OUT-WORD
macro
i f WORD-OUTS-OK
dox u. at x
else
dox u, at l
inc
dx
xchg
ah.al
dox u. at l
dx
dec
xchg
ah,al
endi f
endm
: Macro t o o u t p u t
: register.
a c o n s t a n tv a l u et o
anindexed
CONSTANT-TO-INDEXED-REGISTERmacro
ADDRESS,
INDEX,
dx.AODRESS
mov
mov
ax.(VALUE s h l 8) + I N D E X
OUT-WORD
endm
VGA
VALUE
Code
segment
assume
cs:Code.
ds:Data
S t a r tp r o cn e a r
cld
mov
ax.Data
mov
ds.ax
: S e t6 4 0 x 3 5 01 6 - c o l o r
mov
ax.0010h
int
10h
mode.
;AH-0 means s e l e c t mode
:AL-lOh means s e l e c t
: mode 10h
:BIOS v i d e o i n t e r r u p t
Bit-PlaneAnimation
805
: S e tt h ep a l e t t eu pt op r o v i d eb i t - p l a n ep r e c e d e n c e .
If
D c o l o r wil beshown;
; i f p l a n e s 1 & 2 o v e r l a p ,t h ep l a n e
1 color w
i
l be
: p l a n e s 0 L 1 o v e r l a p ,t h ep l a n e
: shown:and
mov
ds
s o on.
:AH
10h means
: s e tp a l e t t e
: r e g i s t e r sf n
2 means s e t
:AL
: a l lp a l e t t e
: registers
:ES:DX
to points
palette
; the
: settings
the ;call
BIOS t o
: s e tt h ep a l e t t e
push
POP
mov
int
8) + 2
a x . ( l Oshh l
es
d x . o fCf soel ot r s
10h
: Draw t h e s t a t i c b a c k d r o p i n p l a n e
3. A
l themovingimages
:w
i
l appear t o be i n f r o n t o f t h i s b a c k d r o p , s i n c e p l a n e
3
: has t h el o w e s tp r e c e d e n c et h e
way t h e p a l e t t e i s s e t
up.
CONSTANT-TO-INDEXED-REGISTER
SC-INDEX.
MAP-MASK.
D8h
:allowdatato
go t o
: plane 3 only
: P o i n t ES t o d i s p l a y memory f o r t h e r e s t o f t h e p r o g r a m .
ax.VGA-SEGMENT
mov
mov
es.ax
sub
d.id i
bp.SCREEN-HEIGHT116
mov
:fill i n t h es c r e e n
: 1 6l i n e sa t
a time
BackdropBlockLoop:
c a l l DrawGridCross
cD
a lrla w G r i d V e r t
bpdec
Bjanczk d r o p B l o c k L o o p
c a l l DrawGridCross
:draw a c r o s sp i e c e
: d r a wt h er e s to f
a
: 1 5 - h i g hb l o c k
:bottomlineofgrid
: S t a r ta n i m a t i n g !
AnimationLoop:
mov
b x . o f f s eOtb j e c t L i s t
: objectinthelist
: For e a c ho b j e c t ,s e e
if it's time to
move anddraw
that
; object.
ObjectLoop:
; See i f i t ' s t i m e t o
move t h i s o b j e c t .
dec
Cbx+Del
: c o uany t]
D o Nj nezxdteO
; lsabt yji e
lilnc gt - d o n ' t
mov
ax.Cbx+BaseDelay]
mov
[ b x + D e:l,raaeyxdsl eneftletoiam
xryte
806
Chapter 43
down d e l a y
move
: S e l e c tt h ep l a n et h a tt h i so b j e c t
wil bedrawn
in.
mov
d x , SC-INDEX
mov
ah.[bx+PlaneSelectl
mov
a1 .MAP-MASK
OUT-WORD
; Advance t h e X c o o r d i n a t e , r e v e r s i n g d i r e c t i o n
: o ft h e
if either
X marginshasbeenreached.
mov
cx.Cbx+XCoordl
cmp
c x . [ b x + X L e f t L i ml;leai tm
ftlt i t ?
C h e cj ak X R i g h t L i m i t
Cbx+XIncl
neg
CheckXRightLimit:
cmp
cx.[bx+XRightLimitl
jb
SetNewX
[bx+XIncl
neg
SetNewX:
cx.Cbx+XIncl
add
mov
[bx+XCoordl.cx
;current X l o c a t i o n
:no
;yes-reverse
; a t r i g h t limit?
;no
:yes-reverse
;move t h e X c o o r d
; & save it
; Advance t h e Y c o o r d i n a t e , r e v e r s i n g d i r e c t i o n
; o f t h e Y marginshasbeenreached.
i f either
mov
dx,[bx+YCoordl
;current Y l o c a t i o n
cmp
d x . C b x + Y T o p L i m i t l : at to lpi m i t ?
ja
CheckYBottomLimit
;no
;yes-reverse
Cbx+YIncl
neg
CheckYBottomLimit:
limit?
cmp
dx.[bx+YBottomLimit]
;at
bottom
;no
jb
SetNewY
;yes-reverse
Cbx+YIncl
neg
SetNewY:
;move t h e Y c o o r d
dx.Cbx+YIncl
add
; & save it
mov
[bx+YCoordl.dx
s i . Ct bh xe+ It;m
op ao gi net l
; o b j e c t ' s image
; info
c aDlrl a w o b j e c t
; Pointtothenextobjectinthelistuntil
; objects.
we r u n o u t o f
DoNextObject:
a dbdx . s i zOeb j e c t S t r u c t u r e
cmD
bx.offsO
e tb j e c t L i s t E n d
jb
ObjectLoop
; D e l a ya ss p e c i f i e dt o
s l o wt h i n g s
down
i f SLOWDOWN
mov
c x , SLOWDOWN
Del ayLoop:
1oop Del ayLoop
endi f
Bit-PlaneAnimation
807
CheckKey:
mov
int
jz
ah.ah
sub
int
ah.1
16h
AnimationLoop
: i s a k e yw a i t i n g ?
:no
16h
: y e s - c l e a rt h ek e y
& done
: Back t o t e x t mode.
mov
int
ax.0003h
:AL-O3h
means s e l e c t
: mode 03h
10h
: Back t o DOS.
mov
int
ah.4ch
21h
;DOS t e r m i n a t e f u n c t i o n
:done
S t a r t endp
Draws a s i n g l e g r i d c r o s s - e l e m e n t a t t h e d i s p l a y
memory
l o c a t i o np o i n t e dt ob y
ES:DI. 1 h o r i z o n t a ll i n ei s
drawn
a c r o s st h es c r e e n .
I n p u t : ES:DI p o i n t st ot h ea d d r e s s
a t w h i c ht od r a w
O u t p u t : ES:DI p o i n t st ot h ea d d r e s sf o l l o w i n gt h e
l i n e drawn
R e g i s t e r sa l t e r e d :
the
AX, C X , DI
DrawGridCross
near
proc
a l si no el i d
mov
a x:.d0rfaf fwf h
mov
cx.SCREEN-WIDTH12-1
;draw
stosw
rep
r i g ht ht m
e bo us t a l l
: edge
mov
ax,
0080h
o f e d g er i g h t t h e : d r as two s w
: grid
ret
DrawGridCross
endp
Draws t h e n o n - c r o s s p a r t o f t h e g r i d a t t h e d i s p l a y
memory
ES:DI. 15scan
l i n e sa r ef i l l e d .
l o c a t i o np o i n t e dt ob y
I n p u t : E S : D I p o i n t st ot h ea d d r e s sa tw h i c ht od r a w
O u t p u t : ES:DI p o i n t st ot h ea d d r e s sf o l l o w i n gt h e
partofthegrid
drawn
R e g i s t e r sa l t e r e d :
OrawGridVert
near
proc
mov
ax,
0080h
mov
dx.15
808
Chapter 43
BackdropRowLoop:
mov
cx.SCREEN_WIDTH/Z
srt eo ps w
on t h es c r e e n
dxdec
jnz
BackdropRowLoop
ret
DrawGridVert
endp
;
;
;
;
;
D r a w t h es p e c i f i e d image a t t h e s p e c i f i e d l o c a t i o n .
Images a r ed r a w no nb y t eb o u n d a r i e sh o r i z o n t a l l y ,p i x e l
b o u n d a r i e sv e r t i c a l l y .
The Map Mask r e g i s t e rm u s ta l r e a d yh a v e
been s e t t o e n a b l e
access t ot h ed e s i r e dp l a n e .
; Input:
;
;
;
;
C X - X c o o r d i n a t oeuf p p el re fct o r n e r
DX - Y c o o r d i n a t oeuf p p elre fct o r n e r
DS:SI - p o i n t e tro draw i n f of o r
ES - d i s p l a y memory segment
image
; Output:none
; R e g i s t e r sa l t e r e d :
D r a w o b j e cptr once a r
ax.SCREEN-WIDTH
mov
mu1
dx
shr
shr
shr
add
cx.1
cx.1
cx.1
ax.cx
AX, C X . D X . S I . D I . BP
; c a l c u l a t et h es t a r to f f s e ti n
; d i s p l a y memory o ft h er o wt h e
; image w
ill bedrawn
at
; d i v i d et h e
X c o o r d i n a t ei np i x e l s
X c o o r d i n a t ei n
; by 8 t o g e t t h e
; bytes
; d e s t i n a t i o no f f s e ti nd i s p l a y
image
; p o i n t E S : D I t ot h ea d d r e s st o
; whichtheimage
will b ec o p i e d
; i n d i s p l a y memory
; memory f o r t h e
mov
d i ,ax
1 odsw
mov
dx.ax
1 odsw
bp.SCREENKWIDTH
mov
bp,ax
sub
;# o f l i n e s i n t h e
image
; # o fb y t e sa c r o s st h ei m a g e
;#o f b y t e s t o
add t o t h e d i s p l a y
; memory o f f s e t a f t e r c o p y i n g
a line
; o f t h e image t o d i s p l a y memory i n
: o r d e rt op o i n tt ot h ea d d r e s s
; where t h e n e x t l i n e o f t h e
; w
ill go i n d i s p l a y memory
DrawLoop:
mov
rep
cx.ax
movsb
di,bp
add
dx
dec
jnz
ret
image
; w i d t ho ft h e
image
; c o p yt h en e x tl i n eo ft h e
image
; i n t o d i s p l a y memory
a t w h i c ht h e
; p o i n tt ot h ea d d r e s s
; n e x tl i n e w
i
l go i n d i s p l a y
; memory
; c o u n t down t h e l i n e s o f t h e
image
DrawLoop
Bit-PlaneAnimation
809
Drawobject
endp
Code
ends
end
Start
For those of you who havent experienced the frustrations of animation programming on aPC, theres awholelot ofanimation going on Listing
in
43.1. Whats
more,
the animationis virtuallyflicker-free,partly thanks tobit-plane animation and partly
because imagesare never really erased
but ratherare simply overwritten.(The principle
behind the animationis that of redrawing each image with a blank fringe around it
when it moves, so that the blank fringe
erases the partof the oldimage that thenew
image doesnt overwrite. For details
on this sort of animation, see the above-mentioned
PC TechJournaZJuly 1986 article.) Better yet, the redimages takeprecedence over the
green images, which takeprecedence over the blue images, whichtake precedence
over the white backdrop, andall obscured images showthrough holes in and around
the edges of images infront of them.
In short,Listing 43.1accomplishes everything we wished for earlier in an animation
technique.
If you possiblycan, run Listing 43.1.The animation may be a revelation to those of you
who are used toweak, slow animation on PCs with EGA or VGA adapters. Bit-plane
animation makes the PC look an awful lot like-dare I say it?-a games machine.
Listing 43.1 was designed to run at theabsolute fastest speed, andas I mentioned it
puts in a pretty amazing performance on theslowest PCs of all. Assuming youll be
running Listing 43.1on anfaster computer, youll haveto crank up the DELAY equate
at thestart of Listing 43.1 to slow things down to a reasonable pace.(Its not a very
good gamewhere all the pieces are a continual blur!)
Even on somethingas modest
as a 286based AT, Listing 43.1 runs much too fast without
a substantial delay(although
it doeslook rather interesting at
warp speed). We should all havesuch problems, eh?
In fact, we could easily increase the numberof animated images past 20 on thatold
AT, and well into the hundreds on
a cuttingedge local-bus 486or Pentium.
Im not going to discuss Listing 43.1 in detail; the code is very thoroughly commented and should
speak for itself, and most of the individual components of Listing
43.1-the Map Mask register, mode sets, word versus byteOUT instructions to the
VGA-have been covered in earlier chapters. Do notice, however, that Listing 43.1
sets the paletteexactly asI described earlier. This is accomplished by passing a pointer
to a 17-byte array (1 byte for each of the 16 palette registers, and 1 byte for the
border color) to the BIOS videointerrupt (INT lOH), function10H, subfunction 2.
Bit-plane animation does have inherent limitations, which well get to in a second.
One limitation thatis not inherent to bit-plane animation butsimply a shortcoming
of Listing 43.1is somewhat choppy horizontal motion. In the interests
of both clarity
and keeping Listing 43.1 to a reasonable length, I decided to byte-align all images
horizontally. This saved the many tables needed to define the 7 non-byte-aligned
8 10
Chapter 43
rotations of the images, as well as the code needed to support rotation. Unfortunately, it also meant that thesmallest possible horizontal movement was 8 pixels (1
byte of displaymemory), which is far enoughto be noticeable at certain speeds. The
situation is, however, easilycorrectable with the additional rotationsand code. Well
see an implementation of fully rotated images (in this case for Mode X, but the
principles generalize nicely) in Chapter 49. Vertically, where there is no byte-alignment issue, the images move4 or 6 pixels at a times, resulting in considerably smoother
animation.
The addition of code to support rotatedimages would alsoopen the door
to support
for internal animation,where the appearanceof a given image changes over time to
suggest that the image is an active entity.For example, propellers couldwhirl, jaws
could snap, and jets could flare. Bit-plane animation with bit-aligned images and
internal animation can look truly spectacular. Its a sight worth seeing, particularly
for those who doubt thePCs worth when it comes to animation.
only be used in the VGAs planar modes, modesODH, OEH, IOH, and 12H. Also, the
reprogramming of the palette registers that provides image precedence also reduces
the available color set from the normal16 colors to just 5 (one color per plane plus
the background color). Worse still, each image must consist entirely of only one of
the fourcolors. Mixing colors within an image is not allowed, since the bits for each
image are limited to a single plane and can therefore select only one color. Finally,
all images of the same precedence must be the same color.
It is possible to work around thecolor limitations to some extent by using only one
or two planes for bit-plane animation, while reserving the other planes for multicolor drawing. For example, you could use plane 3 for bit-plane animation while
using planes 0-2 for normal 8-color drawing. The images in plane 3 would then appear to be in front of the 8-color images. Ifwe wanted the plane 3 images to be
yellow, we could set up thepalette registers as shown in Table 43.2.
As you can see, the color yellow is displayed whenever a pixels bit from plane3 is 1.
This gives the images from plane 3 precedence, while leaving us withthe 8 normal
low-intensity colors for images drawn across the other 3 planes, as shown in Figure
43.5. Of course, this approach provides only 1rather than3 high-precedence planes,
but that mightbe a good tradeoff for being able to draw multi-colored images as a
backdrop to the high-precedence images. Forthe right application, high-speed flickerfree plane 3 images moving in front of an 8-color backdrop could be a potent
combination indeed.
Another limitation of bit-plane animation is that its best if images stored in the
same
plane never cross each other. Why? Because whenimages do cross, the blank fringe
Bit-PlaneAnimation
811
around each image can temporarily erase the overlapped parts of the otherimage or
images, resulting in momentary flicker. While thats not fatal, it certainly detracts
from the rock-solid animation effect of bit-plane animation.
Not allowing images in the same plane to overlap is actually less ofa limitation than
it seems. Run Listing 43.1 again. Unless you were looking for it,youd never notice
that images of the same color almost never overlap-theres plenty of action to distract the eye, and the trajectories of images of the same color are arranged so that
they have a full range of motion without running into eachother. The only exception is the chain of green images, which occasionallydoubles back on itself when it
bounces directly into a corner and
reverses direction. Here,however, the images are
moving so quickly that thebrief moment duringwhich one images fringe blanks a
8 12
Chapter 43
PRO
PR 1
PR2
PR3
PR4
PR5
PR6
PR7
PR8
PR9
PR10
PRl 1
PR12
PR13
PR14
PR15
8 normal
low-intensity
colors
portion of another image is noticeable only upon close inspection, andnot particularly unaesthetic even then.
When a techniquehas such tremendous visual and performanceadvantages as does
bit-plane animation, it behooves you to design your animation software so that the
limitations of the animation technique dont
get in theway. For example, you might
design a shootinggallery game with allthe images in a given plane marching along
in step in a continuous band. Theimages could never overlap, so bit-plane animation would produce very high image quality.
Bit-Plane Animation
8 13
controller, with the same net result of mismatched top and bottom partsof the image, as the CRT controller scans out first unchanged bytes and then changedbytes.
Basically, shear will occasionally occur unless the CPU and CRT proceed at exactly
the same rate, which is most unlikely.Shear is more noticeablewhen there arefewer
but largerimages, since its more apparentwhen a larger screen area
is sheared, and
because its easier to spotone outof three largeimages momentarily shearing than
one outof twenty small images.
Image shear isnt terrible-Ive written and sold severalgames in which images occasionally shear, and Ive never heard anyone complain-but neither is it ideal. One
solution is page flipping, inwhich drawing is done to a nondisplayed
page of display
memory while another page of display memory is shown on the screen. (We saw
page flipping back in Chapter23, well see it again in the next chapter, and
well use
it heavily starting in Chapter 47.)When the drawing is finished, the newlydrawn
part of display memory is made thedisplayed page, so that thenew screen becomes
visible all at once, with no shearing or flicker. The otherpage is then drawn to, and
when the drawing is complete thedisplay is switched backto that page.
Page flipping can be used in conjunction with bit-plane animation, although page
flipping does diminishsome of the unique advantages of bit-plane animation. Page
flipping produces animationof the highest visual quality whether bit-plane animation is used or not. There are
a few drawbacks to page flipping, however.
Page flipping requirestwo display memory buffers, one to draw in and one
to display
at any given time. Unfortunately, in mode 12H there justisnt enough memory for
two buffers, so page flipping is not an optionin that mode.
Also, page flipping requires thatyou keep the contents of both buffers up to date,
which can require a good deal
of extra drawing.
Finally, page flipping requires that you wait until youre sure thepage has flipped
before you start drawing to the otherpage. Otherwise, you could end upmodifying
a page while its stillbeing displayed, defeating thewhole purpose of page flipping.
Waiting for pages to flip takes time and can slow overall performance significantly.
Whats more, its sometimes difficult to be sure when the page has flipped, since not
all VGA clones implement the display adapter status bitsand page flip timing identically.
To sum up, bit-plane animation by itself is very fast and looks good. In conjunction
with page flipping, bit-plane animation looks a little better but is slower, and the
overall animation schemeis more difficult to implement and perhaps a bit
less reliable on some computers.
814
Chapter 43
Previous
Home
Next
the place to go if you want to make a lot of jawsdrop.) Dont let anyone tell youthat
you cant do good animation on the PC. You can--ifyou stretch your mind to find
ways to bring the full power of the VGA to bear on your applications. Bit-plane animation isnt for every task; neither arepage flipping, exclusive-ORing, pixelpanning,
or any of the many other animation techniques you have available. One or more
tricks from that grab-bag should give you what you need, though, and the bigger
your grab-bag, the betteryour programs.
Bit-Plane Animation
815
Previous
chapter 44
split screens save the page flipped day
Home
Next
A Plethora of Challenges
-
In its simplestterms, computer animation consists of rapidly redrawing similar images at slightly differing locations, so that theeye interprets thesuccessive images as
819
a single object in motion over time. The fact that the world is an analog realm and
the images displayed on a computer screen consist of discrete pixels updated at a
maximum rate of about 70 Hz is irrelevant; your eye can interpret both real-world
images and pixel patterns on the screen as objects in motion, and thats that.
One of the key problems of computer animation is that it takes time to redraw a
screen, time during which the bitmap controlling the screen is in an intermediate
state, with, quite possibly, many objects erased and others half-drawn. Even when
only briefly displayed, a partially-updated screen can cause flicker at best, and at
worst can destroy the illusion of motion entirely.
Another problem of animation is that the screenmust update often enough so that
motion appears continuous. A moving object that moves just once every second,
shifting by hundreds of pixels each time it does move, will appear to jump, not to
move smoothly. Therefore, there are two overriding requirements for smooth animation: 1) the bitmap must be updated quickly (once per frame-60 to 70 Hz-is
ideal, although30 Hz will do fine), and,
2) the process of redrawing the screenmust
be invisible to the user; only the end result should ever be seen. Both of these requirements are metby the program presented inListings 44.1 and 44.2.
L44-1.C
I* S p l i ts c r e e n
VGA a n i m a t i o np r o g r a m .P e r f o r m sp a g ef l i p p i n gi nt h e
t o pp o r t i o no ft h es c r e e nw h i l ed i s p l a y i n gn o n - p a g ef l i p p e d
i n f o r m a t i o ni nt h es p l i ts c r e e na tt h eb o t t o mo ft h es c r e e n .
C o m p i l e dw i t hB o r l a n d
C++ i n C c o m p i l a t i o n mode. * I
# i n c l u d e< s t d i o . h >
# i n c l u d e< c o n i o . h >
# i n c l ude <dos. h>
# i n c l u d e< m a t h . h >
# d e f i n e SCREEN-SEG
# d e f i n e SCREEN-PIXWIDTH
# d e f i n e SCREEN-WIDTH
# d e f i n e SPLIT-START-LINE
SPLIT-LINES
#define
# d e f i n e NONSPLIT-LINES
# d e f i n e SPLIT-START-OFFSET
# d e f i n e PAGEO-START-OFFSET
820
Chapter 44
OxAOOO
640
80
I* ipni x e l s
I*b yi nt e s
*I
*I
339
141
339
0
(SPLIT-LINES*SCREEN-WIDTH)
#define
#define
#define
#define
#define
#define
#define
/\define
#define
PAGEl-START-OFFSET
CRTC-INDEX
Ox3D4
CRTC-DATA
Ox3D5
OVERFLOW
0x07
((SPLIT-LINES+NONSPLIT-LINES)*SCREEN-WIOTH)
/*
CRT C o n t r o l l e rI n d e xr e g i s t e r
/ * CRT C o n t r o l l e D
r a t ar e g i s t e r
/*
*/
*/
i n d e xo f
CRTC r e gh o l d i n gb i t
8 o tf h e
linethesplitscreenstartsafter
*/
MAX-SCAN
Ox09 / * i n d e xo f
CRTC r e gh o l d i n gb i 9to tf h e
linethesplitscreenstartsafter
*/
LINE-COMPARE Ox18 /* i n d e xo f CRTC r e gh o l d i n gl o w e r
8 bits
oflinesplitscreenstartsafter
*/
NUM-BUMPERS
(sizeof(Bumpers)/sizeof(bumper))
BOUNCER-COLOR 15
BACK-COLOR
1
/ * p l a y f i e bl da c k g r o u ncdo l o r
*/
be
used
offof
*/
f o dr r a w i n g
*/
t y p e d e fs t r u c t t /* o n eb o u n c i n go b j e c tt o
move a r o u n dt h es c r e e n
*/
i n t LeftX.TopY;
/* l o c a t i o n */
i n t Width.Height;
/* s i zpi nei x e l s
*/
i n t DirX.DirY:
/ * m ovt ieocnt o r s
*/
i n t CurrentXC2l.CurrentYC21; / * c u r r e n t l o c a t i o n i n e a c h
page */
C
i not l o r :
/* cwoihblnotiecor h
drawn */
image
*RotationO:
/ * r o t a t i o nfsohra n d l i ntgh e
8 possible */
image * R o t a t i o n l ;
/ * i n t r a b y tset a ar td d r e sawst h i cthh e
*/
image
*Rotation2:
/* l e f t edge
can
be
*I
image*Rotation3;
image*Rotation4:
image*Rotation5;
image*Rotation6:
image*Rotation7;
1 bouncer:
v o i dm a i n ( v o i d 1 ;
voidDrawBumperList(bumper *, i n t .u n s i g n e di n t ) ;
v o i dD r a w S p l i t S c r e e n ( v 0 i d ) :
void EnableSplitScreen(void);
voidMoveBouncer(bouncer *, bumper *, i n t ) :
i n t . u n s i g n e di n t ) :
e x t e r nv o i d D r a w R e c t ( i n t . i n t . i n t . i n t . i n t . u n s i g n e d
e x t e r nv o i d ShowPage(unsigned i n t ) ;
e x t e r nv o i dO r a w I m a g e ( i n t . i n t . i m a g e* * . i n t . u n s i g n e di n t . u n s i g n e di n t ) :
externvoidShowBounceCount(void):
e x t e r nv o i dT e x t U p ( c h a r* . i n t . i n t . u n s i g n e di n t . u n s i g n e di n t ) :
e x t e r nv o i dS e t B I O S B x 8 F o n t ( v o i d ) :
/ * Al bumpers i n t h e p l a y f i e l d */
bumperBumpers[]
t
(0.0.19.339.21.
{0.0.639.19.21.
{620.0.639.339.21.
{0.320.639.339.21.
I60.48.79.67.121.
~60.108.79.127.121.
(60.168.79.187.121.
t60.228.79.247.121.
I120.68.131.131,131,
{120.188.131.271.131.
I240.128.259.147.141.
{240.192.259.211.141.
t208.160.227.179.141.
{272.160.291.179.141.
t228.272.231.319.111.
t192.52.211.55.111.
I302.80.351.99.121.
I320.260.379.267.131.
821
{380.120,387.267.13),
{420.160.579,163.11},
{420.60.579.63.11),
{428.110.571,113.11~,
~428.210.571.213.11).(420.260.579.263.11)
1;
/*
I n i t i a ls e t t i n g sf o rb o u n c i n go b j e c t .O n l y
2 r o t a t i o n sa r e needed
b e c a u s et h eo b j e c t
moves 4 p i x e l s h o r i z o n t a l l y a t a t i m e */
bouncerBouncer
(156,60.20.20.4.4.156.156.60.60.BOUNCER-COLOR,
&BouncerRotationO,NULL,NULL,NULL,EBouncerRotation4,NULL,NULL,NULL~;
unsigned i n tP a g e S t a r t O f f s e t s C 2 1
(PAGEO-START-OFFSET.PAGEl-START-OFFSET);
unsigned i n t BounceCount;
v o i dm a i n 0 I
i n t DisplayedPage.NonDisplayedPage.
u n i o n REGS r e g s e t ;
Done, i:
regset.x.ax
0x0012; / * s e t d i s p l a y t o
6 4 0 x 4 8 01 6 - c o l o r mode * /
i n t 8 6 ( 0 x 1 0 .E r e g s e t .& r e g s e t ) ;
/ * s e t h ep o i n t e rt ot h e
B I O S 8 x 8f o n t
*/
SetBIOS8x8FontO;
E n a b l e S p l i t S c r e e n O : /* t u r n on t h e s p l i t s c r e e n
*/
/*
D i s p l a yp a g e
0 above t h e s p l i t s c r e e n
ShowPage(PageStartOffsetsC0isplayedPage
-*/
03):
/* Clearbothpagestobackgroundanddrawbumpers
f o r ( i - 0 : i < 2 : i++)
(
i n eachpage
*/
OrawRect~O.O.SCREEN~PIXWIDTH-l,NONSPLIT~LINES-l,BACK~COLOR,
PageStartOffsets[i].SCREEN-SEG);
OrawBumperList~Bumpers,NUM~BUMPERS,PageStartOffsets[il~;
)
DrawSplitScreenO;
BounceCount
0;
ShowBounceCountO;
/*
/*
draw t h es t a t i cs p l i ts c r e e ni n f o
/*
p u t up t h ei n i t i a zl e r oc o u n t
D r a w t h eb o u n c i n go b j e c ta ti t si n i t i a ll o c a t i o n
*/
*/
*/
OrawImage(Bouncer.LeftX.Bouncer.TopY,EBouncer.RotationO,
Bouncer.Color.PageStartOffsetsCDisplayedPage1.SCREEN~SEG~;
/*
Move t h e o b j e c t , draw i t i n t h e n o n d i s p l a y e d
page u n t i l Esc i s p r e s s e d */
Done
0;
do (
NonDisplayedPage
DisplayedPage A 1:
822
Chapter 44
page,and
flip the
/ * E r a s ea tc u r r e n tl o c a t i o ni nt h en o n d i s p l a y e d
page
DrawRect(Bouncer.CurrentX[NonDisplayedPagel,
*I
Bouncer.CurrentY[NonDisplayedPagel.
Bouncer.CurrentXCNonDisplayedPage3+8ouncer.Width-l,
Bouncer.CurrentY[NonDisplayedPagel+Bouncer.Height-l,
BACK~COLOR.PageStartOffsets[NonDisplayedPagel,SCREEN~SEG~:
I* Move t h eb o u n c e r * /
MoveBouncer(&Bouncer.Bumpers.
NUM-BUMPERS):
/ * Draw a t t h e new l o c a t i o n i n t h e n o n d i s p l a y e d page * /
DrawImage(Bouncer.LeftX.Bouncer.TopY.&Bouncer.RotationO,
Bouncer.Color.PageStartOffsetsCNonDisplayedPage1,
/*
SCREEN-SEG):
--
i s i n t h e n o n d i s p l a y e d page * /
Bouncer.LeftX:
Bouncer.CurrentY~NonDisplayedPage1 Bouncer.TopY:
/ * F l i p t o t h e page we j u s t drew i n t o */
ShowPage(PageStartOffsetsCDisp1ayedPage
NonDisplayedPagel):
I* Respond t o any k e y s t r o k e * I
if (kbhit0) {
s w i t c h( g e t c h 0 )
I
case
OxlB:
/* Esc t o end */
Done
1: b r e a k :
case 0:
/ * extended
branch
the
oncode
*I
s w i t c h( g e t c h 0 )
(
case0x48:
I* nudge
up
*/
Bouncer.Dit-Y
- a b s ( B o u n c e r . D i r Y ) ;b r e a k :
case Ox4B: I* nudge l e f t */
Bouncer.DirX
- a b s ( B o u n c e r . D i r X ) :b r e a k :
case Ox4D: / * nudge r i g h t */
Bouncer.OirX
a b s ( B o u n c e r . D i r X ) ;b r e a k :
case0x50:
/ * nudge down */
Bouncer.DirY
a b s ( B o u n c e r . D i r Y ) ;b r e a k :
Remember where t h eb o u n c e r
Bouncer.CurrentXCNonDisplayedPage1
break:
default :
break:
I w h i l e( ! D o n e ) :
/*
R e s t o r e t e x t mode anddone
regset.x.ax
0x0003:
i n t B 6 ( 0 x 1 0 .I r e g s e t .I r e g s e t ) :
*I
/ * Draws t h e s p e c i f i e d l i s t o f
bumpers i n t o t h e s p e c i f i e d
voidDrawBumperList(bumper * Bumpers, i n t NumBumpers.
u n s i g n e d i n tP a g e s t a r t o f f s e t )
page * I
i n t i:
f o r( i - 0 :
i<NumBumpers:
i++.Bumpers++)
DrawRect(Bumpers->LeftX.Bumpers->TopY.Bumpers->RightX,
Bumpers->BottomY.Bumpers->Color.PageStartOffset,
SCREEN-SEG) :
/ * D i s p l a y st h ec u r r e n t
bouncecount
v o i d ShowBounceCountO {
charCountASCII[71:
*I
Split ScreensSavethePageFlipped
Day
823
itoa(BounceCount.CountASCII.10);
A S C I I */
I* c o n v e r t t h e c o u n t t o
TextUp(CountASCII.344.64.SPLIT_START_OFFSET.SCREEN_SEG):
/ * Frames t h e s p l i t s c r e e n
v o i dD r a w S p l i t S c r e e n O
[
and f i l l s i t w i t h v a r i o u s t e x t
*/
DrawRect~O.O,SCREEN~PIXWIDTH-1.SPLIT~LINES-1.O,SPLIT~START~OFFSET,
SCREEN-SEG);
DrawRect~O,l.SCREEN~PIXWIDTH-l~4,15,SPLIT~START~OFFSET,
SCREEN-SEG) ;
DrawRect~O.SPLIT~LINES-4.SCREEN~PIXWIDTH-l,SPLIT~LINES-l,15,
SPLIT-START-OFFSET,SCREEN-SEG);
D~~~R~C~(~.~.~.SPLIT-LINES-~,~~.SPLIT_START-OFFSET.SCREEN_SEG):
OrawRect~SCREEN~PIXWIDTH-4.1.SCREEN~PIXWIDTH-l,SPLIT~LINES-l,l5,
SPLIT-START-OFFSET,SCREEN-SEG);
TextUp("This i s t h e s p l i t s c r e e n a r e a
B.8.SPLIT-START-OFFSET.
SCREEN-SEG) ;
TextUp("Bounces: ".272.64.SPLIT-START-OFFSET.SCREEN-SEG):
TextUp("\O33:nudge
left".520.78.SPLIT-START-OFFSET.SCREEN-SEG):
TextUp("\032:nudge
right".520.90.SPLIT-START-OFFSET.SCREEN-SEG);
TextUp("\031:nudge
down".520.102.SPLIT~START~OFFSET,SCREEN~SEG);
TextUp("\030:nudge
up".520.114.SPLIT_START-OFFSET.SCREEN-SEG);
TextUp("Esc t o end",520.126.SPLIT-START-OFFSET.SCREEN-SEG);
...",
/*
T u r n on t h e s p l i t s c r e e n a t t h e d e s i r e d l i n e ( m i n u s
1 b e c a u s et h e
s p l i ts c r e e ns t a r t s* a f t e r *t h el i n es p e c i f i e db yt h e
LINE-COMPARE
register)(bit 8 ofthesplitscreenstartlineisstoredinthe
9 i s i n t h e Maximum Scan L i n er e g ) */
O v e r f l o wr e g i s t e r ,a n db i t
v o i dE n a b l e S p l i t S c r e e n O {
outp(CRTC-INDEX, LINE-COMPARE);
outp(CRTC-DATA.
(SPLIT-START-LINE
- 1) & OXFF):
outp(CRTC-INDEX, OVERFLOW):
outp(CRTC-DATA. (((((SPLIT-START-LINE
- 1) & 0x100) >> 8) << 4 ) I
(inp(CRTC-DATA) & - 0 ~ 1 0 ) ) ) ;
outp(CRTC-INDEX. MAX-SCAN);
outp(CRTC-DATA. (((((SPLIT-START-LINE
- 1) & 0x200) >> 9) << 6) I
(inp(CRTC-DATA) & - 0 ~ 4 0 ) ) ) :
/* Moves t h eb o u n c e r ,b o u n c i n g
i f bumpers a r e h i t * /
voidMoveBouncer(bouncer*Bouncer,bumper*BumperPtr.
i n t NewLeftX, NewTopY. NewRightX,NewBottomY.
i:
---
i n t NumBumpers)
824
Chapter 44
B u m p e r P t r - > L e f t X ) &&
(NewRightX >- B u m p e r P t r - > L e f t X ) ) ) {
Bouncer->DirX
-Bouncer->DirX;
/* bounce h o r i z o n t a l l y * /
NewLeftX
Bouncer->LeftX + Bouncer->DirX:
*/
I* U p d a t et h eb o u n c ec o u n td i s p l a y :t u r no v e ra t
10000 * /
i f (++BounceCount >- 10000) t
TextUp("0
".344.64.SPLIT-START~OFFSET,SCREEN_SEG):
BounceCount
0:
I else t
ShowBounceCountO:
Bouncer->LeftX
Bouncer->TopY
NewLeftX;
- NewTopY:
/ * s e tt h ef i n a l
new c o o r d i n a t e s
*/
LISTING 44.2L44-2.ASM
: L o w - l e v e al n i m a t i o nr o u t i n e s .
: T e s t e dw i t h TASM
SCREEN-WIDTH
INPUT-STATUS-1
CRTC-INDEX
START-ADDRESS-HIGH
START-ADDRESS-LOW
GC-INDEX
SET-RESET
G-MODE
80
03dah
03d4h
Och
Od h
03ceh
0
5
: s c r e e nw i d t hi nb y t e s
: I n p u tS t a t u s
1 register
:CRT C o n t r o l l e rI n d e xr e g
: b i t m a ps t a r ta d d r e s sh i g hb y t e
; b i t m a ps t a r ta d d r e s sl o wb y t e
: G r a p h i c sC o n t r o l l e rI n d e xr e g
:GC i n d e x o f S e t / R e s e t r e g
:GC i n d e x o f Mode r e g i s t e r
.model
small
.data
?
: p o i nttos
BIOS 8fxo8n t
BIOS8x8Ptr
dd
: Tablesused t o l o o k up l e f t and r i g h t c l i p masks.
L e f t M a sdkOb f f h0.7 f h0.3 f hO. l f h .
OOfh. 007h.
003h.
RightMask
db
080h.
OcOh,
OeOh,
OfOh.
Of8h.
Ofch.
Ofeh.
Offh
OOlh
.code
: Draws t h e s p e c i f i e d f i l l e d r e c t a n g l e i n t h e s p e c i f i e d c o l o r .
: Assumes t h e d i s p l a y i s i n
mode 12h. Does n o t c l i p andassumes
: r e c t a n g l ec o o r d i n a t e sa r ev a l i d .
: C n e a r - c a l l a b l ea s :v o i dD r a w R e c t ( i n tL e f t X .i n t
TopY. i n t RightX,
i n t BottomY. i n tC o l o r ,u n s i g n e di n tS c r n D f f s e t .
unsigned i n t ScrnSegment);
DrawRectParms
dw
LeftX
dw
TopY
dw
RightX
dw
struc
2 dup ( ? ) : p u s h e d BP and r e t u r na d d r e s s
?
:X coordinateofleftsideofrectangle
?
:Y c o o r d i n a t eo ft o ps i d eo fr e c t a n g l e
?
: X c o o r driisngoiah
df ttee
o f rectangle
825
BottomY
Color
dw
dw
?
?
ScrnOffset
ScrnSegment
OrawRectParms
dw
dw
ends
?
?
pub1 i c -DrawRect
-0rawRect
n e apr o c
push
bP
mov l so:ftpcrataoscbm
lp
iknpets
: vpr cear aegr ilsia
lsebtsrel'viesres
push
di
push
:Y c o o r d i n a t e o f b o t t o m s i d e o f r e c t a n g l e
: c o l o ri nw h i c ht od r a wr e c t a n g l e( o n l yt h e
; l o w e r 4 b i t sm a t t e r )
; o f f s e t o f base o f b i t m a p i n w h i c h t o d r a w
:segment o f base o f b i t m a p i n w h i c h t o d r a w
fram
:spcteraaecl slkeerr'vse
cld
mov
dx.GC-INDEX
a1 .SET-RESET
mov
a h . b y t ep t rC o l o r C b p l
mov
out
:ctw
sahdodeihxexn
lrttioa,ocrw
h
mov
ax.G-MODE + (0300h)
mode 3
o u t w rt:i otseaedxt x ,
d i . d w o r dp t rS c r n O f f s e t C b p ]; p o i n tt ob i t m a ps t a r t
1 es
ax.SCREEN-WIDTH
mov
;pointtothestartofthetopscan
mu1
TopY Cbpl
; l i n e t o fill
d i ,ax
add
ax.LeftXCbp1
mov
bx, ax
mov
ax.1
;/8 b y t e o f f s e t f r o m l e f t o f s c r e e n
shr
ax.1
shr
ax. 1
shr
d i .ax
; p o i n tt ot h eu p p e r - l e f tc o r n e ro f
fill a r e a
add
; i s o l a t ei n t r a p i x e la d d r e s s
and
bx.7
; s e tt h el e f t - e d g ec l i p
mask
mov
dl .LeftMaskCbxl
bx,RightX[bpl
mov
mov
si, bx
: i s o l a t ei n t r a p i x e la d d r e s so fr i g h t
edge
and
bx.7
: s e tt h er i g h t - e d g ec l i p
mask
mov
dh.RightMaskCbx1
mov
bx.LeftX[bpl
; i n t r a p i x e la d d r e s so fl e f t
edge
and
bx.NOT 7
si, bx
sub
s i .1
shr
s i .1
shr
:#o fb y t e sa c r o s s
s p a n n e db yr e c t a n g l e
- 1
s i .1
shr
; i f t h e r e ' s o n l y one b y t ea c r o s s .
MasksSet
jnz
: combinethe masks
d l ,dh
and
MasksSet:
bx.BottomYCbp1
mov
P
: o f scan l i n e s t o fill - 1
sub
bx.TopYCbp1
F i 11 Loop:
;remember l i n e s t a r t o f f s e t
di
push
; l e f t edge c l i p mask
a1 , d l
mov
; d r a wt h e l e f t edge
es:[dil.al
xchg
inc
: p o i n t t o t h en e x tb y t e
di
mov
; I of bytes left to
do
cx.si
;# o f b y t e s l e f t t o
do - 1
dec
cx
: t h a t ' s i t i f t h e r e ' so n l y
1 b y t ea c r o s s
L i neDone
js
:no m i d d l eb y t e s i f o n l y 2 b y t e sa c r o s s
DrawRightEdge
jz
: n o n - e d g eb y t e sa r es o l i d
mov
a1 .Offh
stosb
:draw t h es o l i db y t e sa c r o s st h em i d d l e
rep
826
Chapter 44
DrawRightEdge:
mov
a1,dh
e sx: c[ dhigl . a l
LineDone:
pop
di
add
d i .SCREEN-WIDTH
bx
dec
jns
F iLoop
11
pop
pop
di
POP
bp
ret
-DrawRect
endp
: r i g h t - e d g e c l i p mask
: d r a wt h er i g h t
edge
:retrievelinestartoffset
: p o i n tt ot h en e x tl i n e
; c o u n to f fs c a nl i n e s
: r e s t o r ec a l l e r ' sr e g i s t e rv a r i a b l e s
si
: r e s t o r ec a l l e r ' ss t a c kf r a m e
: Shows t h e page a t t h e s p e c i f i e d o f f s e t i n t h e b i t m a p .
: d i s p l a y e d when t h i s r o u t i n e r e t u r n s .
: C n e a r - c a l l a b l ea s :v o i d
ShowPageParms
dw
S t a r t o f f s e t dw
ShowPageParms
struc
2 dup ( ? I
?
ends
public
-Showpage
n epar ro c
-ShowPage
Page i s
ShowPage(unsigned i n t S t a r t o f f s e t ) ;
;pushed BP and r e at udrdnr e s s
: o f fbsi ie
nt tm oa fp
page dtios p l a y
fram
:spcteraaecl slkeebrr'pvse p u s h
mov
f rsatl ao
mceka tl:obp po .i sn pt
: W a i t f o rd i s p l a ye n a b l et ob ea c t i v e( s t a t u si sa c t i v el o w ) .t ob e
: s u r eb o t hh a l v e s
o f t h es t a r ta d d r e s s
will t a k e i n t h e
same frame.
mov
: p r e l offaoasrdt e s t
b l ,START-ADDRESS-LOW
mov
bh,byte p t S
r tartOffset[bpl
: f l i p p i n g once d i s p l a y
mov
cl.START-ADDRESS-HIGH
: e n a b l de eitse c t e d
mo v
c h . b y t ep t rS t a r t D f f s e t + l [ b p l
mov
dx.INPUT-STATUS-1
WaitDE:
in
a1 .dx
test
a1 .Dlh
j nz
Wai t D E
; d i s p leany aabicsl et ilvoew
(0
active)
: S e tt h es t a r t
o f f s e t i n d i s p l a y memory o f t h e page t o d i s p l a y .
dx.CRTC-INDEX
mov
mov
ax, bx
aodl;ousdw
traersat sx d x ,
mov
ax, cx
ao:dhusditsraherast xs d x ,
: Now w a i t f o r v e r t i c a l
sync, s o t h eo t h e r page will be i n v i s i b l e when
: we s t a r t d r a w i n g t o i t .
mov
dx.INPUT-STATUS-1
Wai t V S :
in
a1 .dx
test
a1 .08h
jz
WaitVS
: v e rat csi ichtysian
ivglceh
(1 a c t i v e )
POP
bP
fram
sc:teraecl lsket ro' rse
ret
-ShowPage
endp
: D i s p l a y st h es p e c i f i e d
image a t t h e s p e c i f i e d l o c a t i o n i n t h e
: s p e c i f i e db i t m a p ,i nt h ed e s i r e dc o l o r .
Day
827
; C n e a r - c a l l a b l ea s :v o i dD r a w I m a g e ( i n tL e f t X .i n t
TopY.
i m a g e* * R o t a t i o n T a b l e .
i n tC o l o r ,u n s i g n e di n tS c r n O f f s e t .
u n s i g n e d i n t ScrnSegment);
DrawImageParms
dw
DILeftX
DITopY
RotationTable
DIColor
struc
dw
dw
dw
2 dup(?);pushed
BP and r e t u r na d d r e s s
?
;X c o o r d i nlseaoo
i fdtfftee
image
:Y c o o r di sm
itnoi o
adafpfget ee
; p o itnpattobeiflrne t teor s
image
; rotations
;cw
o ilhndoitrcoahw
image ( o tnhl ye
; l o w e r 4 b i t sm a t t e r )
; o f f s e t o f base boi tf m
wiahndpitrcoahw
;segment o f base boi tf m
wiahndpitrcoahw
?
?
dw
DIScrnOffset
DIScrnSegment
DrawImageParms
dw
dw
ends
?
?
image s t r u c
WidthInBytes
Height
BitPattern
imageends
dw
dw
dw
?
?
pub1 ic -DrawImage
n e apr o c
push
bP
mov
bP.SP
push
;vp
r cearae
gr ilsialsebtrsel've
isres
push
di
-DrawImage
; p r e s e r v ec a l l e r ' ss t a c kf r a m e
fsrl toaacm
ca;kteploo i n t
c ld
mov
dx.GC_INDEX
mo v
a1 .SETPRESET
mov
a h . b y t ep t rD I C o l o r E b p ]
out
dwdrchtax;ioohisn.wclaeohxtr
mov
ax.G-MODE + (0300h)
o u t w r i tteod; sxe, at x
mode 3
1es
d i , d w o r dp t rD I S c r n O f f s e t [ b p l; p o i n tt ob i t m a ps t a r t
mov
ax.SCREEN-WIDTH
mu1
DITopYCbpl
; p o i nt ohtset aotrhtf eospc a n
add
; l i n ewhich
on
t o draw
, a xd i
ax.DILeftX[bpl
mov
mov
bx, ax
;/E
b oy ft fefsrleoetm
ft
o f screen
shr
ax.1
ax.1
shr
ax.1
shr
di ,ax
add
; p o i n tt ot h eu p p e r - l e f tc o r n e ro fd r a wa r e a
bx.7
and
; i s o l a t ei n t r a p i x e la d d r e s s
bx.1
shl
;*2 f ol oro kw-ourpd
add
b x . R o t a t i o n T a b l e C b p 1; p o i n tt ot h ei m a g es t r u c t u r ef o r
mov Cbxlbx.
i;n tt rhareob tyat tei o n
mov
dx.Cbx1.WidthInBytes:imagewidth
mov
s i , [ b x l . B i t P a t t e r n; p o i n t etro
image p a t t e r nb y t e s
mov
bx.Cbx1.Height
height
;image
DrawImageLoop:
push
di
;remember
o f f ss teat rl ti n e
mov
cx.dx
a cbryo;#
t se so f
DrawImageLineLoop:
1odsb
; g e tt h en e x ti m a g eb y t e
; d r a wt h en e x ti m a g eb y t e
x c hegs[ d: .ia] 1
di
inc
; p o i n tt ot h ef o l l o w i n gs c r e e nb y t e
828
Chapter 44
DrawImageLineLoop
loop
o f f s e st t a r t l i:pop
nr e t r i e v ed i
add
d i ,SCREEN-WIDTH
:count
bx
dec
jnz
DrawImageLoop
; pnt h
oleti e
ionxntet
o f f scan l i n e s
: Draws a 0 - t e r m i n a t e d t e x t s t r i n g a t t h e s p e c i f i e d l o c a t i o n i n t h e
: s p e c i f i e db i t m a pi nw h i t e .u s i n gt h e8 x 8
: c o o r d i n a t e t h a t ' s a m u l t i p l e o f 8.
B I O S f o n t .M u s t
: C n e a r - c a l l a b l ea s :v o i dT e x t U p ( c h a r* T e x t .i n tL e f t X .i n t
be a t an X
TopY,
u n s i g n e d i n tS c r n O f f s e t .u n s i g n e di n tS c r n S e g m e n t ) :
TextUpParms
Text
TULeftX
struc
dw
dw
dw
TUTopY
TUScrnOffset
TUScrnSegment
TextUpParms
dw
dw
dw
ends
pub1 i c -Textup
near
- T e x t u pp r o c
push
bP
mov
bP.SP
v a r ri ae bg licesas:t p
elpush
l reer s' se r v es i
di
push
2 dup
(?):pushed
?
?
?
?
?
BP and r e t u rand d r e s s
t ot: ep xoti on t e r
draw
: X c o o rrdeslo
iceinfdo
tfaatef tneg l e
; (mustbe
a m u l t i p l e o f 8)
;Y c o o r d i n a t e o f t o p s i d e o f r e c t a n g l e
: o f f s e t o f base o f b i t m a p i n w h i c h t o d r a w
:segment o f base o f b i t m a p i n w h i c h t o d r a w
f r a m e s t a ccka: lpl e
r er 'sse r v e
f r asmt aelcokc a l t o : p o i n t
cld
dx.GC-INDEX
mov
ax.G-MODE + (0000h)
mov
w r i t et o
mode 0
o u t : s edt x . a x
d i . d w o r dp t rT U S c r n O f f s e t E b p l: p o i n tt ob i t m a ps t a r t
1 es
ax.SCREEN-WIDTH
mov
TUTopYCbpl
:stp
tstochho
totpaeeiofnr tt
mu1
s tt e:atxrhl tiens e
on
d i .ax
add
ax.TULeftX[bp]
mov
bx, ax
mov
ax.1
:/8
s c r oeofe
blffern
yffosttem
et
shr
ax, 1
shr
ax.1
shr
d i .ax
u:t ph po
teoei nr t-clfce
oi rhfostnafter r
add
. T esdxir tattCo:e
wpbxtopot iln t
mov
TextUpLoop:
: g e tt h en e x tc h a r a c t e rt od r a w
lodsb
a1 .a1
and
TextUpDone
:done
ibfy tneu l l
jz
p o i n t se trr i n g t e: pxpush
tr e s e r v es i
di
: p r e s e r v ec h a r a c t e r ' ss c r e e no f f s e t
push
d a t a d e :f papush
ruel ts e r v e d s
segment
CharUp
c h a r a c t et hr i s
c a :draw
t1
ds
d a t a d e f a: ruel st t o r e
segment
POP
829
POP
POP
inc
jmP
TextUpDone:
POP
POP
POP
ret
di
si
di
TextUpLoop
: r e t r i e v ec h a r a c t e r ' ss c r e e no f f s e t
:retrievetextstringpointer
: p o i n tt on e x tc h a r a c t e r ' ss t a r tl o c a t i o n
di
si
bP
: r e s t o r ec a l l e r ' sr e g i s t e rv a r i a b l e s
: r e s t o r ec a l l e r ' ss t a c kf r a m e
CharUp:
1ds
mov
sub
shl
shl
shl
add
mov
CharUpLoop:
movsb
add
1 oop
ret
-Textupendp
si.[BIOSBx8Ptrl
b l ,a1
bh, bh
bx.1
bx.1
bx.1
s i , bx
cx.8
d i .SCREEN-WIDTH-1
CharUpLoop
: S e t st h ep o i n t e rt ot h e
:draws t h e c h a r a c t e r i n AL a t
:pointtothe8x8fontstart
ES:DI
:*8 t o l o o k u p c h a r a c t e r o f f s e t i n f o n t
: p o i n t 0S:SI t o c h a r a c t e r d a t a i n f o n t
: c h a r a c t e r sa r e 8 h i g h
:copy t h e n e x t c h a r a c t e r p a t t e r n b y t e
:pointtothenextdestbyte
BIOS 8 x 8f o n t .
: C n e a r - c a l l a b l e as: e x t e r nv o i dS e t B I O S B x 8 F o n t ( v o i d ) :
p u b l i c -SetBIOSBx8Font
_SetBIOSdx8Font
proc
near
f r a m e s t a ccka: pl lreers' se r v eb p
push
v a rr ei agcb:push
iapsl elrtlesersr e' sr v se i
segment
(don't
data
: and
assume B I O S
push
di
ds
push
a n:y tphriensge) r v e s
mov
ah.llh
:BIOS cghfeuannr eacrct aitoet onr r
mov
al.30h
:BIOS isnuf bo frumnacttiioonn
: r e q u e s t8 x 8f o n tp o i n t e r
mov
bh.3
: i n v o k e BIOS v i d e o s e r v i c e s
int
10h
mov
word p t r [ B I O S 8 x B P t r ] . b: spt o tr hepeo i n t e r
mov
word p t r [BIOSBx8Ptr+2].es
POP
ds
v a r ei agcbi asl e:ltPOP
rleserrs' tso r e d i
POP
si
POP
bP
f r a m es t a c ka l:lreers' st o r e
ret
_SetBIOSBxBFontendp
end
Listing 44.1 is written in C . It could equally well have been written in assembly language, and would then have been somewhat faster. However, I wanted to make the
point (asI've made again and again) thatassembly language, and, indeed,optimization in general,
is needed only in themost critical portions of any program, and then
only when the programwould otherwise be tooslow. Only in ahighly performancesensitive situation would the performance boost resulting from converting
Listing
44.1 to assemblyjustifythe time spent in coding and bugs
the that would likelycreep
830
Chapter 44
in-and the sample program already updates the screen at the maximum possible
rate of once per frameeven on a 1985-vintage 8-MHz AT. In this case, faster performance would result only in a longerwait for the page to flip.
Write Mode 3
Its possibleto update the bitmapvery efficiently on theVGA, because the VGA can
draw up to 8 pixels at once, and because the VGA provides a number of hardware
features to speed up drawing. This article makes considerable use ofone particularly
unusual hardwarefeature, write mode 3. We discussed writemode 3 back in Chapter 26,
but weve covered a lotof ground since then-so Im goingto run through a quick
refresher on write mode 3.
Some background: In the standardVGAshigh-resolution mode, mode 12H (640x480
with 16 colors, the mode inwhich this chapters sample program runs), each byte of
display memory controls 8 adjacent pixels on thescreen. (The color of each pixel is,
in turn, controlledby 4 bits spread across the fourVGA memory planes, but we need
not concern ourselves withthat here.) Now, there will often be times when we want
to change some but not all of the pixels controlled by a particular byte of display
memory. This is not easily done, for there
is no way to write half a byte, or two bits, or
such to memory; its the whole byte or none of it at all.
You might think that using AND and OR to manipulate individual bits could solve
if the VGA had only one plane
the problem. Alas, not so. ANDing and ORing would work
of memory (like the original monochrome Hercules Graphics Adapter) but theVGA
has four planes, and ANDing and ORing would work only
ifwe selected and manipulated each plane separately, a process that would be hideously slow. No, with the VGA
you must use the hardware assist features, or you might as wellforget aboutreal-time
screen updates altogether. Write mode 3 will do the trick for ourpresent needs.
Write mode 3 is useful when you want to set some but notall ofthe pixels in asingle
byte of display memory to the same color. That is, if youwant to draw a numberof pixels
within a byte in a single color, write mode 3 is a goodway to do it.
Write mode 3 works like this. First, set the Graphics Controller Mode register to
write mode 3. (Look at Listing 44.2 for code that doeseverything described here.)
Next, set the Set/Reset register to the color with which you wish
to draw, in the range
0-15. (It is not necessary to explicitly enable set/reset via the Enable Set/Reset register; write mode 3 does that automatically.) Then, to draw individual pixels within a
single byte, simply read display memory, and then write a byte to display memory
with 1-bits where you want the color to be drawn and 0-bits where you want the
current bitmap contentsto be preserved. (Note well that the data actually read ly the
CPUdoesnt m a t t q the read operationlatches all four planes data, as described way
back in Chapter 24.) So, for example, if write mode 3 is enabled and theSet/Reset
register is set to 1 (blue), then thefollowing sequence of operations:
Split ScreensSavethePage Flipped Day 831
mov
mov
mov
mov
dx.Oa000h
es.dx
a1 .es:[O]
byteptr es:[O].OfOh
will change the first 4 pixels on the screen (the left nibble of the byte at offset 0 in
display memory) to blue,and will leave the next4 pixels (the right nibbleof the byte
at offset 0) unchanged.
Using one MOV to read from
display memory and another to
write to display memory
is not particularly efficient on some processors. InListing 44.2, I insteaduse XCHG,
which reads and then writes a memory location in a single operation, as in:
mov
mov
mov
dx.Oa000h
es.dx
a1 .OfOh
xchg es: [O] ,a1
Again, the actual value thats read is irrelevant. In general, theXCHG approach is
more compact than
two MOVs, and is faster on 386 and earlier processors, but slower
on 486s and Pentiums.
If all pixels in a byte of display memory are to be drawn in a single color, its not
necessary to read before
writing, because none of the information
in display memory
at that byte needs to be preserved; asimple write of OFFH (to draw allbits) will set all
8 pixels to the set/reset color:
mov
mov
mov
dx.Oa000h
es.dx
byteptres:Cdil.Offh
rfvou refamiliar with VGA programming, you re no doubt aware that everything
that can be done with write mode 3 can also be accomplished in write mode 0 or
write mode 2 by using the Bit Mask register. However,setting the BitMask register
requires at least one OUTper byte written, in addition to the read and write of
display memory, and OUTs are often slower than display memory accesses, especially on 386s and 486s. One of the great virtues of write mode 3 is that it requires
virtually no OUTs and is therefore substantially faster formasking than the other
write modes.
In short,write mode 3 is a good choice forsingle-color drawing that modifies individual pixels within display memory bytes. Not coincidentally, the sample application
draws onlysingle-color objects within the animation area; this
allows writemode 3 to
be used for all drawing, in keeping with our desire forspeedy screen updates.
Drawing Text
Well need text in the sample application;
is that also a good use for write mode 3?
Sometimes it is, but notin this particularcase.
832
Chapter 44
Page Flipping
Now that we know howto update thescreen reasonably quickly, itstime to get on to
the funstuff. Pageflipping answers the second requirement for animation,
by keeping bitmap changes off the screen until theyre complete. In other words, pageflipping
guarantees that partially updated bitmaps are never seen.
SplitScreensSavethePageFlipped
Day
833
How is it possible to update a bitmap without seeing the changes as theyre made?
Easy-with page flipping, there are two bitmaps; the program shows youone bitmap
while it updates the other. Conceptually, itsthat simple. In practice, unfortunately,
its not so simple, because of the design of the VGA. To understand why that is, we
must look at how the VGA turns bytes in display memory into pixels on thescreen.
The VGA bitmap is a linear 64 K block of memory. (True, most adapters nowadays
but every makeof SuperVGA
are SuperVGAs withmore than256 K of display memory,
has its own way of letting you access that extra memory, so going beyond standard
VGA is a daunting and difficult task. Also, its hard to manipulate the large frame
buffers of SuperVGA modes fast enough for real-time animation.) Normally, the
VGA picks up the first byte of memory (the byte at offset 0) and displays the corresponding 8 pixels on the screen, then picks up the byte at offset 1 and displays the
next 8 pixels, and so on to the endof the screen.However, the offset of the first byte
of display memory picked up during each frameis not fixed at 0, but is rather programmable by wayof the StartAddress High and Low registers, whichtogether store
the 16-bit offset in display memory at which the bitmap to bedisplayed during the
next framestarts. So, for example, in mode IOH (640~350,16colors), a large enough
bitmap to store a complete screen of information can be stored at display memory
offsets 0 through 27,999, and another full bitmap could be stored at offsets 28,000
through 55,999, as shownin Figure 44.1. (Im discussing 640x350 mode at themoment forgood reason; well get to640x480 shortly.)When the Start Address registers
are set to 0, the first bitmap (or page) is displayed; when they are set to 28,000, the
second bitmapis displayed. Pageflipped animationcan be performedby displaying
I
A000 :0000
p e t
eclma ]
A000 :DACO
( ffset 76,000
&cima
834
Chapter 44
page 0 and drawing to page 1, then setting the start address to page 1 to display that
page and drawing to page 0, and so on ad infinitum.
835
836
Chapter 44
Previous
A000 :0000
(offset O
decimal)
A000: 2C10
(offset 1 1,280
decimal)
Next
Home
Split Screen
(always controls scan lines339-479)
Page 0
(controls scan lines 0-338
when start address = 1 1,280)
A000 :9600
(offset 38,400
decimal)
Page 1
(controls scan lines 0-338
when start address = 38,400)
A000: FFFO
(offset 65,520
decimal)
Screen
animation
The sample program for this chapter uses the split screen and page flipping exactly
as described above. The playfield through which the object bounces is the pageflipped portion of the screen, and therectangle at thebottom containing the bounce
count and the
instructions is the split (that is, not animatable) portionof the screen.
Of course, to the user it all looks like one screen. There are no visible boundaries
between the two unless you choose to create them.
Very few animation applications use the entire screen for animation. If you can get
by with 339 scan lines of animation, split-screen page flipping gives you the best
combination of square pixels and high resolution possible on a standardVGA.
So. Is VGA animation worth all the fuss? Muzs oui. Run the sample program; if youve
never seen aggressive VGA animation before,youll be amazed at how smooth itcan
be. Not every square millimeter of every animated screen must be in constant motion. Most graphics screens need a little quiet space to display scores, coordinates,
file names, or (if all else fails) company logos. If you dont tell the user hes/shes
only getting 339 scan lines of animation, hell/shell probably never know.
SplitScreensSavethePageFlipped
Day
837
Previous
chapter 45
dog hair and dirty rectangles
Home
Next
,q
Ies on Animation
We brought our p&s with us whenwe moved to Seattle. At about thesame time,our
Golden Retriever, S
his third birthday. Sam
relatively
is
intelligent, in
ter than a banana slug, although if he were in the
the sense thathe is cf
s dog Mr.Byte, theres a reasonable chance that he
same room withJeff Du
would mistake Mr. B$e for something edible (a category that includes rocks, socks,
and asurprising nuhber of things too disgusting to mention), andJeff would have
of things to write about.
rtant now. What is important is that-and I am not making this
anaged to find the one pair of socks Samhadnt chewed holes
re important is that after we moved and Sam turned three,he
calmed down amazinkly. We had been waiting for this magic transformation since
Sam turned one,the age at which mostpuppies turn intonormal dogs who lie
around
a lot, waking up to eat their Science Diet(motto, The dog food that costs more than
the average neurosurgeon makes in a year) before licking themselves in embarrassing places and going back to sleep. When Sam turned one andremained hopelessly
out of control we said, Goldens take two years to calm down, as if we had a clue.
When he turned two and remained undeniably Sam we said, Any day now.By the
time he turned three,we were reduced to figuring that it was only about seven more
years until he expired, at which point we might be able to take all the fur he had
shed in his lifetimeand weave ourselves some
clothes without holes in them, or quite
possibly a house.
a
84 1
But miracle of miracles, we moved, and Sam instantly turned into the dog
we thought
wed gotten when we forked over $500--calm, sweet, and obedient.Weeks went by,
and Sam was, if anything, better than ever. Clearly, the change was permanent.
And then we took Sam tothe vet for his annual check-up and foundthat he had an ear
infection. Thanks to the wonders of modern animal medicine, a $5 bottle of liquid restored his health in just two days. And with hishealth, we got, as a bonus, the old Sam.
You see, Sam hadnt changed. Hewas just tired from beingsick. Now he onceagain
joyously knocks down anystranger who makes the mistake of glancing in his
direction,
and will, quite possibly, be booked any day nowon suspicion of homicide by licking.
Plus cu Change
Okay, you give up. What exactly does this have to do with graphics? Im glad you
asked. The lesson to be learned from Sam, The Dog With A Brain The Size Of A
Walnut, is that while things may look like theyvechanged, in fact they often havent.
Take VGA performance. If you buy a 486 with a SuperVGA, youllget performance
that knocks your socks off, especially if you run Windows. Things are liable to be so
fast that youll figure theSuperVGA has to deserve some of the credit.Well, maybeit
does if its a local-bus VGA. But maybe it doesnt,even if it is local bus-and it certainly doesnt if its an ISA bus VGA, because no ISA bus VGA can run faster than
about 300 nanoseconds per access, and VGAs capable of that speedhave been common for at least a coupleof years now.
Your 486 VGA system is fast almost entirely because it has a 486 in it. (486 systems
with graphics accelerators such as the AT1 Ultra or Diamond Stealth are another
story altogether.) Underneath it all, the VGA is still painfully slow-and if you have
an old VGA or IBMs original PS/2 motherboard VGA, its incredibly slow. The fastest ISA-bus VGA around is two to twenty times slower than system memory, and the
slowest VGA around is as much as 100 times slower. In the old days, the rule was,
Display memory is slow, and should be avoided. Nowadays, the rule is, Display
memory is not quite so slow, but should still be avoided.
So, as I say, sometimes things dont change. Of course, sometimes they do change.
For example, in just 49 dog years, I fully expect to own at least one pair of underwear
without a single hole in it. Which brings us, deus ex machina and the creek dont
rise, to yetanother animation method: dirty-rectangle animation.
842
Chapter 45
Table 45.1 shows the results of the aforementioned1 / 0 performance tests, asrun on
two 486/33 SuperVGA systemsunder thePhar Lap 3861DOS-Extender.(The systems
and VGAs are unnamedbecause thisis a not-very-scientific spot test, and I dont want
to unfairly malign, say, a VGA whose only sin is being plugged into a lousy
motherboard, or vice versa.) Under Phar Lap, 32 protected-mode apps run with
full 1 / 0 privileges, meaning that the OUT instructions I measured had the best
official cycle times possible on the 486: 10 cycles. OUT officially takes 16 cycles in
real mode on a 486, and officially takes a mind-boggling 30 cycles in protected mode
if running without full 1 / 0 privileges (as is normally the case for protected-mode
applications). Basically, 1 / 0 is just plain slow on a 486.
As slow as30 or even 10 cycles is for an OUT, one could only wish that VGA 1 / 0 were
actually that fast. The fastest measured OUT to a VGA in Table45.1 is 26 cycles,and
the slowest is 126this for an operationthats supposed to take 10 cycles. To put this
in context, MUL takes only 13 to 42 cycles, and a normal MOV to or from system
memory takes exactly one cycle on the 486. In short, OUTSto VGAs are as much as
100 times slowerthan normal memory accesses, and are generally two to four times
slower than even display memory accesses, although there are exceptions.
Of course, VGA display memory has its ownperformance problems. The fastest ISA
bus VGA can, at best, support sustained write timesof about 10 cycles per word-sized
843
write on a486/33; 15or 20 cycles is more common,even for relatively fast SuperVGAs;
the worst case Ive seen is 65 cycles per byte. However, intermittent writes, mixed
with a lotof register and cache-only code, can effectively execute in one cycle, thanks
to the cachingdesign of many VGAs and the 486s 4-deep write buffer, whichstores
pending writes while the CPU continues executing instructions. Display memory
reads tendto take longer, because coprocessing isnt possible-one microsecond is a
reasonable ruleof thumb for VGA reads, althoughtheres considerable variation. So
VGA memory tends notto be as bad as VGA I/O, but lord knows it isnt good.
OUTs, in general, are lousy on the 486 (and to think they onlytook three cycles on
the 286!). OUTs to VGAs are particularly lousy. Display memory performance is
pretty pool; especiallyfor reads. The conclusions areobvious, I would hope. Structure your graphics code, and, in general, all 486 code, to avoid OUTs.
For graphics, this especially means using writemode 3 rather than thebit-mask register. When you must use the bit mask, arrange drawing so that you can set the bit
mask once, then do a of
lotdrawing with that mask. Forexample, draw a whole edge
at once, then the middle, then theother edge, rather than setting the
bit mask several times on each scan line to draw the edge and middle bytes together. Dont read
from display memory if you dont have to. Write each pixel once and only once.
It is indeed a strange concept: The key to fast graphics is staying away from the
graphics adapteras much as possible.
Dirty-Rectangle Animation
The relative slowness ofVGA hardware is part of the appeal of the technique that I
call dirty-rectangleanimation, inwhich a completecopy ofthe contentsof display
memory is maintained in offscreensystem (nondisplay) memory. All drawing is done
to this system buffer. As offscreen drawing is done, alist is maintained of the bounding rectangles for the
drawn-to areas; these arethe dirty rectangles, dirty in the
sense
that thathave been alteredand nolonger match the contentsof the screen.After all
drawing for a frame
is completed, all the dirty rectangles for that frame are copied to
the screen in a burst,
and then thecycle of off-screen drawing begins again.
Why, exactly, would we want to go through all this complication, rather thansimply
drawing to the screenin the first place? The reason is visual quality.If we were to do
all our drawing directly to the screen, thered
be a lotof flicker as objects were erased
and then redrawn. Similarly, overlapped drawing done with the painters algorithm
(in which farther objects are drawn first, so that nearerobjects obscure them)would
flicker as farther objects were visible for short periods.With dirty-rectangle animation, only the finished pixels for
any given frame ever appear on the screen;
intermediate results are never visible. Figure 45.1 illustrates the visual problems associated with drawing directly to the screen; Figure 45.2 shows how dirty-rectangle
animation solves these problems.
844
Chapter 45
845
846
Chapter 45
pixels are copied multiple times. Listing 45.1 runs pretty well, considering all of its
failings; on my 486/33, 10 11x11 images animate at a very respectable clip.
LISTING 45.1 145- 1.C
/ * S a m p l es i m p l ed i r t y - r e c t a n g l ea n i m a t i o np r o g r a m .D o e s n ' ta t t e m p tt oc o a l e s c e
r e c t a n g l e st om i n i m i z ed i s p l a y
memory a c c e s s e s .N o te v e nv a g u e l yo p t i m i z e d !
T e s t e dw i t hB o r l a n d
C++ i nt h es m a l lm o d e l .
*/
# i n c l u d e < s t d l ib. h>
# i n c l u d e< c o n i o . h >
#i
ncl ude <a11oc. h>
# i n c l u d e <memory. h >
P
incl ude <dos h>
# d e f i n e SCREEN-WIDTH 320
# d e f i n e SCREEN-HEIGHT 200
# d e f i n e SCREENKSEGMENT
OxAOOO
/* Describes a rectangle
t y p e d e fs t r u c t {
i n t Top;
i n tL e f t ;
i n tR i g h t ;
i n t Bottom;
I Rectangle;
*/
/* D e s c r i b e sa na n i m a t e do b j e c t
*/
t y p e d e fs t r u c t {
i n t X:
/ * u p pl ec orf trvniienr tr ubai tl m a p
i n t Y;
i n tX D i r e c t i o n :
/ * d i r e c t i o n a n dd i s t a n c eo f
i n tY D i r e c t i o n ;
1 Entity;
/ * S t o r a g eu s e df o rd i r t yr e c t a n g l e s
*/
# d e f i n e MAX-DIRTY-RECTANGLES
100
i n t NumDi r t y R e c t a n g l e s ;
R e c t a n g l e D i r t y R e c t a n g 1 es[MAX-DIRTY-RECTANGLES]
/*
I f s e t t o 1. i g n o r e d i r t y r e c t a n g l e l i s t
i n t DrawWholeScreen
0:
/ * P i x e l sf o ri m a g ew e ' l la n i m a t e
11
#def ne IMAGE-WIDTH
#def ne IMAGE-HEIGHT 11
[
char I m a g e P i x e l s [ l
15 15.15.9.9.9.9.9,15,15.15,
15 15.9.9.9.9.9.9.9.15.15.
15 9.9.14.14.14.14.14.9.9.15.
9 9.14.14.14.14.14.14.14,9.9.
9. 9.14.14.14.14.14.14.14,9.9.
9.9.14.14.14.14.14,14.14,9.9.
9.9.14.14.14.14.14.14.14,
9.9.14.14.14.14.14.14.14,9.9.
15.9.9.14.14.14.14.14.9.9.15.
15.15,9.9.9,9,9.9.9.15.15.
15.15.15.9.9.9.9.9.15.15.15.
*/
movement
*/
a n dc o p yt h ew h o l es c r e e n .
*/
*/
}:
/*
a n i m a t e de n t i t i e s
9, 9.
*/
847
# d e f i n e NUM-ENTITIES 10
EntityEntitiesCNUM-ENTITIES];
/* p o i n t e r t o s y s t e m b u f f e r i n t o w h i c h w e ' l l d r a w
c h a r far * S y s t e m B u f f e r P t r ;
/*
p o i n t e rt os c r e e n
c h a r far * S c r e e n P t r ;
*/
*I
v o i dE r a s e E n t i t i e s ( v o i d 1 ;
v o i d CopyDi rtyRectangl esToScreen(void) ;
v o i dD r a w E n t i t i e s ( v o i d ) :
voidmain0
i n t i. XTemp. YTemp;
u n s i g n e di n tT e m p c o u n t :
c h a r far *TempPtr;
u n i o n REGS r e g s ;
/ * A l l o c a t e memory f o r t h e s y s t e m b u f f e r i n t o w h i c h w e ' l l d r a w
i f (!(SystemBufferPtr
f a r m a l l o c ( ( u n s i g n e d int)SCREEN-WIDTH*
SCREEN-HEIGHT))) I
p r i n t f ( " C o u 1 d n ' tg e tm e m o r y \ n " ) ;
exit(1);
/ * C l e a rt h es y s t e mb u f f e r
*/
TempPtr
SystemBufferPtr;
f o r (Tempcount
((unsigned)SCREEN-WIDTH*SCREENLHEIGHT);
*TempPtr++
0:
1
/ * P o i n tt ot h es c r e e n
*/
ScreenPtr
MK-FP(SCREEN-SEGMENT.
Tempcount--;
0);
/*
S e tu pt h ee n t i t i e sw e ' l la n i m a t e ,
a t r a n d o ml o c a t i o n s
randomize( ;
f o r (i 0 ; i < NUM-ENTITIES: i++)
I
EntitiesCi1.X
random(SCREENKW1DTH - IMAGE-WIDTH);
Entities[i].Y
random(SCREENKHE1GHT - IMAGE-HEIGHT);
Entities[il.XDirection
1;
Entities[i].YDirection
-1;
--
S e t3 2 0 x 2 0 02 5 6 - c o l o rg r a p h i c s
regs.x.ax
0x0013;
i n t 8 6 ( 0 x 1 0 .& r e g s .& r e g s ) ;
I* Loopanddraw
do
mode
u n t i l a key i sp r e s s e d
*/
*/
/*
Draw t h ee n t i t i e st ot h es y s t e mb u f f e ra tt h e i rc u r r e n tl o c a t i o n s ,
updatingthedirtyrectanglelist
*/
DrawEntitiesO;
/*
Draw t h e d i r t y r e c t a n g l e s ,
or t h ew h o l es y s t e mb u f f e r
a p p r o p r i a t e */
CopyDirtyRectanglesToScreenO;
/*
R e s e tt h ed i r t yr e c t a n g l el i s tt oe m p t y
NumDi r t y R e c t a n g l e s
0;
/*
*/
E r a s et h ee n t i t i e si nt h es y s t e mb u f f e ra tt h e i ro l dl o c a t i o n s ,
updatingthedirtyrectanglelist
*/
EraseEntitiesO;
848
*/
--
/*
*/
Chapter 45
if
1 I
--
I* Move t h e e n t i t i e s , b o u n c i n g o f f t h e e d g e s o f t h e s c r e e n
*I
< NUM-ENTITIES; i++)
I
XTemp
EntitiesLi1.X + Entities[il.XDirection:
EntitiesCi1.Y + Entities[il.YDirection;
YTemp
i f ((XTemp < 0 ) 1 1 ((XTemp + IMAGE-WIDTH) > SCREEN-WIDTH))
Entities[i].XOirection
-Entities[il.XDirection;
XTemp
EntitiesCi1.X + Entities[il.XDirection;
f o r (i 0; i
I
3
EntitiesCi1.X
EntitiesCi1.Y
- XTemp;
YTemp;
} w h i l e( ! k b h i t O ) :
getch0;
I* c l e at hr kee y p r e s s
/* Back t o t e x t mode */
regs.x.ax
0x0003;
i n t 8 6 ( 0 x 1 0 .& r e g s .& r e g s ) ;
I
/*
*/
*I
Draw e n t i t i e s a t c u r r e n t l o c a t i o n s , u p d a t i n g d i r t y r e c t a n g l e l i s t .
v o i dD r a w E n t i t i e s O
i n t i. j , k;
c h a rf a r* R o w P t r B u f f e r ;
c h a rf a r* T e m p P t r B u f f e r ;
c h a rf a r* T e m p P t r I m a g e ;
f o r (i 0; i < NUM-ENTITIES; i++)
I
I* Remember t h e d i r t y r e c t a n g l e i n f o f o r t h i s e n t i t y
*/
i f ( N u m D i r t y R e c t a n g l e s >- MAX-DIRTY-RECTANGLES)
I
I* Toomany d i r t y r e c t a n g l e s ; j u s t r e d r a w t h e w h o l e s c r e e n
DrawWhol eScreen
1;
3 else I
/* Remember t h i s d i r t y r e c t a n g l e */
DirtyRectanglesCNumDirtyRectangles1.Left
EntitiesCi1.X;
DirtyRectangles[NumDirtyRectanglesl.Top
EntitiesCi1.Y:
DirtyRectangles[NumDirtyRectangles3.Right
E n t i t i e s C i 1 . X + IMAGE-WIDTH;
---
DirtyRectangles[NumDirtyRectangles++l.Bottom
E n t i t i e s E i 1 . Y + IMAGE-HEIGHT:
I* P o i n t t o t h e d e s t i n a t i o n i n t h e s y s t e m b u f f e r
*/
*/
RowPtrBuffer
S y s t e m B u f f e r P t r + ( E n t i t i e s C i 1 . Y * SCREEN-WIDTH) +
EntitiesCi1.X;
/ * P o i n tt ot h ei m a g et od r a w
*I
TempPtrImage
ImagePixels;
/ * Copy t h ei m a g et ot h es y s t e mb u f f e r
*I
f o r (j
0; j < IMAGE-HEIGHT; j++)I
/ * Copya row * I
f o r( k
0. T e m p P t r B u f f e r
R o w P t r B u f f e r ; k < IMAGE-WIDTH;
k++)
*TempPtrBuffer++
*TempPtrImage++;
- -
I* P o i n t t o t h e n e x t s y s t e m b u f f e r r o w
*I
R o w P t r B u f f e r +- SCREEN-WIDTH;
/*
Copy t h e d i r t y
t ot h es c r e e n .
r e c t a n g l e s ,o rt h ew h o l es y s t e mb u f f e r
i f appropriate,
*I
849
v o i d C o p y D i r t y R e c t a n g 1 e s T o S c r e e n (1
i n t i. j. k , R e c t W i d t h R
. ectHeight;
u n s i g n e di n tT e m p c o u n t :
u n s i g n e di n tO f f s e t :
c h a rf a r* T e m p P t r S c r e e n ;
c h a rf a r* T e m p P t r B u f f e r ;
i f ( DrawWhol e S c r e e n )
I* J u s tc o p yt h ew h o l eb u f f e rt ot h es c r e e n
DrawWhol eScreen
0;
*I
TempPtrScreen
ScreenPtr:
TempPtrBuffer
SystemBufferPtr;
f o r (Tempcount
((unsigned)SCREEN_WIDTH*SCREEN-HEIGHT):
*TempPtrScreen++
*TempPtrBuffer++;
>
Tempcount--;
I else i
Copy o n l y t h e d i r t y r e c t a n g l e s
*/
f o r ( i = 0; i < N u m D i r t y R e c t a n g l e s : i++)
I
/ * O f f s e ti nb o t hs y s t e mb u f f e ra n ds c r e e no fi m a g e
Offset
(unsigned i n t ) (DirtyRectangles[il.Top
/*
*/
SCREENKWIDTH) +
DirtyRectangles[il.Left;
I* Dimensions o f d i r t y r e c t a n g l e * I
RectWidth
DirtyRectangles[il.Right - DirtyRectangles[il.Left:
RectHeight
DirtyRectangles[il.Bottom - D i r t y R e c t a n g l e s [ i l . T o p :
I* Copy a d i r t y r e c t a n g l e */
--
for (j
- 0;
<
R e c t H e i g h t ; j++) {
I* P o i n t t o t h e s t a r t o f r o w o n s c r e e n
TempPtrScreen
ScreenPtr + O f f s e t :
*I
/* P o i n tt ot h es t a r to fr o wi ns y s t e mb u f f e r
TempPtrBuffer
SystemBufferPtr + O f f s e t ;
/*
*I
Copy a row * /
for (k
0 ; k < R e c t W i d t h ; k++) i
*TempPtrBuffer++;
*TempPtrScreen++
I
/*
P o i n tt ot h en e x tr o w
O f f s e t +- SCREEN-WIDTH;
*/
/*
E r a s et h ee n t i t i e si nt h es y s t e mb u f f e ra tt h e i rc u r r e n tl o c a t i o n s ,
u p d a t i n gt h ed i r t yr e c t a n g l el i s t .
*/
v o i dE r a s e E n t i t i e s O
i n t i. j , k :
c h a rf a r* R o w P t r ;
c h a rf a r* T e m p P t r :
--
850
Chapter 45
*/
DirtyRectangles[NumDirtyRectanglesl.Right
E n t i t i e s [ i ] . X + IMAGELWIDTH:
D i r t y R e c t a n g l e s [ N u m D i r t y R e c t a n g 1es++l . B o t t o m
E n t i t i e s C i 1 . Y + IMAGE-HEIGHT;
/*
/*
Pointtothedestinationinthesystembuffer
*/
RowPtr
S y s t e m B u f f e r P t r + (Entities[i].Y*SCREEN-WIDTH)
C l e a rt h ee n t i t y sr e c t a n g l e
*I
for (j
0 ; j < IMAGE-HEIGHT; j++) {
/ * C l e a r a row */
for (k
0, TempPtr
RowPtr: k < IMAGELWIDTH: k++)
0:
*TempPtr++
- -
I* P o i n t t o t h e n e x t r o w
RowPtr +- SCREENCWIDTH:
+ EntitiesCi1.X:
*I
851
monster mode 12Hpages. With onlyone page, flipping is obviously out of the question, and without page flipping, top-flight, hi-res animation cant be implemented.
The standard fallback is to use the EGAs hi-res mode, mode 10H(640x350, 16 colors) for page flipping, but this mode is less than ideal for a couple of reasons: It
offers sharply lower vertical resolution, and its lousy for handling scaled-up CGA
graphics, because the vertical resolution is a fractional multiple-1.75 times, to be
exact-of that of the CGA. CGA resolution may not seem important these days, but
many images were originally created for the CGA, as were many graphics packages
and games, and its at least convenient to be able to handle CGA graphics easily.
Then, too, 640x350 is also
a poor multiple of the 200 scan lines of the popular 320x200
256-color mode 13Hof the VGA.
There are a couple of interesting, if imperfect, solutions to the problem of hi-res
page flipping. One is to use the split screen to enable page flipping only in the top
two-thirds of the screen; see the previous chapter for details, and for details on the
mechanics of page flipping generally. This doesnt address the CGA problem, but it
does yield square pixels and a full 640x480 screen resolution, althoughnot all those
pixels are flippable and thus animatable.
A second solution is to program the screen to a 640x400 mode. Such a mode uses
almost every byte of displaymemory (64,000 bytes, actually; youcould add another
few lines, if you really wanted to), andthereby provides the highest resolution possible on theVGA for a fully page-flipped display. It maps well to CGA and mode13H
resolutions, being either identical or double in both
dimensions. As an added benefit, it offers an easy-on-the-eyes 70-Hzframe rate,as opposed to the 60 Hz that is the
best that mode 12Hcan offer, due to the design of standard VGA monitors. Best of
all, perhaps, is that 640x400 16-colormode is easy to set up.
The key to 640x400 mode is understanding that ona VGA, mode 10H (640x350) is,
at heart,a 400-scan-line mode. What I mean by that is that in mode 10H, the Vertical
Total register, which controls the total number of scan lines, both displayed and
nondisplayed, is set to 447, exactly the same as in the VGAs text modes, which do in
fact support 400 scan lines. A properly sized and centered display is achieved in
mode 10Hby setting the polarity of the sync pulses to tell the monitorto scan vertically at a faster rate (to make fewer lines fill the screen), by starting the overscan
after 350 lines, and by setting the vertical sync and blanking pulses appropriately for
the faster vertical scanning rate. Changing
those settings is allthats required to turn
mode 10H intoa 640x400 mode, and thats easy to do, as illustrated by Listing 45.2,
which provides mode set code for
640x400 mode.
LISTING 45.2
/*
#i
n c l ude <dos h>
852
L45-2.C
Chapter 45
mode. T e s t e dw i t h
v o i dS e t 6 4 0 x 4 0 0 0
{
u n i o n REGS r e g s e t :
I* F i r s t ,s e tt os t a n d a r d6 4 0 x 3 5 0
regset.x.ax
0x0010:
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) ;
mode (mode10h)
*/
/*
M o d i f yt h es y n cp o l a r i t yb i t s( b i t s
7 & 6) o f t h e
a t Ox3CC. w r i t a b l e a t
M i s c e l l a n e o u sO u t p u tr e g i s t e r( r e a d a b l e
Ox3C2) t os e l e c tt h e4 0 0 - s c a n - l i n ev e r t i c a ls c a n n i n gr a t e
*/
outp(Ox3C2,((inp(Ox3CC)
& Ox3F) I 0 x 4 0 ) ) :
/*
Now, t w e a kt h er e g i s t e r sn e e d e dt oc o n v e r tt h ev e r t i c a l
t i m i n g sf r o m3 5 0t o4 0 0s c a nl i n e s
*I
I* a d j u s t h eV e r t i c a l
Sync S t a r t r e g i s t e r
outpw(Ox3D4.
Ox9C10):
f o r 4 0 0s c a nl i n e s
*/
o u t p w ( O x 3 D 4O. x 8 E l l ) :
I* a d j u s t h eV e r t i c a l
Sync End r e g i s t e r
f o r 400 s c a nl i n e s * /
I* a d j u stth eV e r t i c aDl i s p l a y
End
outpw(Ox304.
Ox8FlZ);
r e g i s t e r f o r 4 0 0s c a nl i n e s
*I
outpw(Ox304,
0x9615):
I* a d j u s tt h eV e r t i c aBl l a n kS t a r t
r e g i s t e rf o r4 0 0s c a nl i n e s
*/
outpw(Ox3D4.
0x6916):
/ * a d j u s tt h eV e r t i c aBl l a n k
End r e g i s t e r
f o r 400scan
l i n e s *I
LISTING 45.3
/ * Sampleprogram
L45-3.C
1,
#i
n c l ude <dos h>
P
in c l u d e < c o n 0.
i h>
#define
#define
#define
#define
SCREEN-SEGMENT
OxAOOO
SCREEN-HEIGHT
400
SCREEN-WIDTH-IN-BYTES
INPUT-STATUS-1
Ox3DA
80
/*
c o l o r - m o d ea d d r e s so fI n p u tS t a t u s
r e g i s t e r */
/ * Thepage s t a r ta d d r e s s e sm u s tb ee v e nm u l t i p l e so f2 5 6 .b e c a u s ep a g e
f l i p p i n g i s p e r f o r m e d b yc h a n g i n go n l yt h eu p p e rs t a r ta d d r e s sb y t e
# d e f i n e PAGE-0-START 0
# d e f i n e PAGEL-START (400*SCREEN_WIDTHKIN_BYTES)
1
*I
853
v o i dm a i n ( v o i d ) ;
v o i dW a i t 3 0 F r a m e s ( v o i d ) ;
e x t e r nv o i dS e t 6 4 0 x 4 0 0 ( v o i d ) ;
v o i dm a i n ( )
{
i n t i;
u n s i g n e di n tf a r* S c r e e n P t r :
u n i o n REGS r e g s e t ;
Set640x4000;
/*
s e tt o
/*
6 4 0 x 4 0 01 6 - c o l o r
mode
*/
--
Pointtofirstlineof
page 0 anddraw a h o r i z o n t a l l i n e a c r o s s s c r e e n
FP-SEG(ScreenPtr)
SCREEN-SEGMENT;
FP-OFF(ScreenPtr)
PAGE-0-START;
i<(SCREEN-WIDTH-IN-BYTESlZ);
i++)
*ScreenPtr++
OxFFFF;
f o r( i - 0 ;
/*
Pointtolastlineof
page 1 anddraw a h o r i z o n t a ll i n ea c r o s ss c r e e n
FP-OFF(ScreenPtr)
PAGE-1-START + ((SCREEN_HEIGHT-l)*SCREEN-WIDTH-IN-BYTES);
f o r (i-0; i<(SCREEN_WIDTH_IN_BYTES/2); i++)
*ScreenPtr++
OxFFFF;
*/
/ * Now f l i p p a g e so n c ee v e r y3 0f r a m e su n t i l
do {
Wait30FramesO;
I* F1 i p t o page 1
outpw(Ox3D4. OxOC
*/
*/
((PAGELLSTART
a key i s p r e s s e d
>> 8) <<
8)):
>>
8));
*/
Wait30FramesO:
/ * F l i p t o page 0 * I
outpw(Ox3D4. OxOC I ((PAGE-OKSTART
while(kbhit0
0);
getch0;
/*
I
/*
<<
*/
c l e a rt h ek e yp r e s s
8)
R e t u r n t o t e x t mode and e x i t * I
regset.x.ax
0x0003;
/ * AL 3 s e l e c t s8 0 x 2 5t e x t
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) ;
mode
*/
voidWait30Frames.O
i n t i:
f o r( i - 0 ;i < 3 0 ;
i++)
{
/ * W a i tu n t i lw e r en o t
i n v e r t i c a ls y n c ,
s o we c a nc a t c hl e a d i n ge d g e
w h i l e ((inp(1NPUT-STATUS-1)
& 0x08) !- 0 ) ;
/* W a i t u n t i l we a r e i n v e r t i c a l s y n c
*I
w h i l e ((inp(INPUT-STATUS-1)
& 0x08)
0) :
*/
After I described 640x400 mode ina magazine article, Bill Lindley, of Mesa,
Arizona,
wrote me to suggest that when programming theVGA to a nonstandard modeof this
sort, its a good idea to tell the BIOS about the new screen size, for a couple of
reasons. For one thing, pop-uputilities often use the BIOS variables; Billsmemoryresident screen printer,EGAD Screen Print, determines thenumber of scan lines to
854
Chapter 45
print by multiplying the BIOS number of text rows variable times the character
heightvariable. Foranother, the BIOS itself maydo a poorjob
of displaying textif not
given proper information; the active textarea may not match the screen dimensions,or
an inappropriate graphics font
may be used. (Of course, theBIOS isnt going to be
able todisplay text anyway in highly nonstandard modes suchas Mode X, but it will
do fine in slightly nonstandard modes suchas 640x400 16-color mode.) In the case
of the 640x400 16-color model described little
a
earlier, Bill suggests that the codein
Listing 45.4be called immediately after putting theVGA into that mode totell the
BIOS that were working with 25 rows of 16-pixel-high text. I think this is an excellent suggestion; it cant hurt, andmay save youfrom gettingaggravating tech support
calls down the road.
LISTING 45.4
L45-4.C
I* F u n c t i o n t o t e l l t h e
B I O S t o s e t up p r o p e r l ys i z e dc h a r a c t e r sf o r
25 rows o f
16 p i x e l h i g h t e x t i n
6 4 0 x 4 0 0g r a p h i c s mode. C a l l i m m e d i a t e l y a f t e r mode s e t .
Basedon
a c o n t r i b u t i o nb y
Bill L i n d l e y . * I
#i
n c l ude <dos h>
v o i dS e t 6 4 0 x 4 0 0 0
u n i o n REGS r e g s :
0x11:
regs.h.ah
regs.h.al
0x24;
regs.h.bl
2:
i n t 8 6 ( 0 x 1 0 .& r e g s .
®s):
I* c h a r a c t e rg e n e r a t o rf u n c t i o n
*I
I* use ROM 8 x 1 6c h a r a c t e rs e tf o rg r a p h i c s
I* 25 rows * I
I* i n v o kt eh e
B I O S v i d ei on t e r r u p t
t o s e t up t h e t e x t * I
*I
The 640x400 mode Ive described here isnt exactly earthshaking, butit can come in
handy forpage flipping and CGA emulation, and Im sure thatsome of you will find
it useful at onetime or another.
855
keep two pages current can make for a lot of bookkeeping and possibly extra drawing, especially in applications where only some of the objects are redrawn each time.
Serge didnt care to do all that bookkeeping in his animation applications, so he
came up with the following approach, which Ive reworded, amplified, and slightly
modified in the summary here:
1. Set the start address to display page
0.
2. Draw to page1.
3. Set the start address
to display page1 (the newly drawn page), then wait for the leading
edge of vertical sync, at which point the page has flipped and its safe to0. modify page
4. Copy, via the latches,from page 1 to page 0 the areas that changedfrom the previous
screen to the current one.
0, which is now identical to page1, then wait for
5 . Set the start address to display page
the leading edge of vertical sync, at which point the page has flipped and its safe to
modify page 1.
6 . Go to step 2.
The greatbenefit of Sergesapproach is that theonly page that is ever actually
drawn
to (as opposedto being block-copied to) is page 1. Only one page needs to be maintained, and the complications of maintaining two separate pages vanish entirely.
The performance of Serges approach may be better or worse than standard page
flipping, depending onwhether a lot of extra work isrequired to maintain two pages
or not.My guess is that Serges approach will usually be slower, owing tothe considerable amount of display-memorycopying involved, and also to the doublepage-flip
per frame. Theres no doubt, however, that Serges approach is simpler, and the
resultant display quality is every bit as good as standard page flipping. Given page
flippings fair degree of complication, this approach is a valuable tool, especially for
lessexperienced animation programmers.
An interesting variation on Serges approach doesnt page flip
nor wait for vertical sync:
1. Set the start address to display page0.
2. Drawtopage 1.
3. Copy, via the latches, the areas that changed from the last screen to the current one
from page 1 to page 0.
4. Go to step 2.
This approach totally eliminates page flipping, which can consume a great deal of
time. The downside is that images may shear for one frame if theyre only partially
copied when the raster beam reaches them. This approach is basically a standard
dirty-rectangle approach, except that the
drawing buffer is stored indisplay memory,
rather than in system memory. Whether this technique is faster than drawing to
system memory depends onwhether the benefit you get from the VGAs hardware,
856
Chapter 45
Previous
Home
Next
such as the Bit Mask, the &Us, and especially the latches (for copymg the dirty
rectangles) is sufficient to outweigh the extra display-memory accesses involved in
drawing and copying, since displaymemory is notoriously slow.
Finally, Id like to point out that in any scheme that involves changing the displaymemory start address, a clever trickcan potentially reduce thetime spent waiting for
pages to flip. Normally, its necessaryto wait for display enable to be active, then set
the two start address registers, and finally wait for vertical sync to be active, so that
you know the new start address has taken effect. The start-address registers must
never be set around the time vertical sync is active
(the new start address is accepted
at either the start or end
of vertical sync on the EGAs and VGAs Im familiar with),
because it would then be possible to load a half-changed start address (one register
loaded, theother notyet loaded), and thescreen would jump for a frame. Avoiding
this condition is the motivation for waiting for display enable, because display enable is active only when vertical sync is not active and will not become active for a
long while.
Suppose, however, that you arrange your page start addresses so that they both have
a low-byte value of 0 (page 0 starts at OOOOH, and page 1 starts at 8000H, for example). Page flipping can then be done simply by setting the new high byte of the
start address, then waiting for the leading edgeof vertical sync. This eliminates the
need to wait for display enable (the two bytes of the start address can never be mismatched) ;page flippingwill often involve less waiting, because display
enable becomes
inactive long beforevertical syncbecomes active. Usingthe above approach reclaims
all the time between the endof displayenable and thestart of vertical syncfor doing
useful work. (The steps Ive given for Serges animation approach assume that the
single-byte approach is in use; thats why display enable is never waited for.)
In the next chapter,
Ill return to the original dirty-rectangle algorithm presented in
this chapter, and goose it a little with some assembly, so that we can see what dirtyrectangle animation is really made of. (Probably not dog hair
....)
857
Previous
chapter 465
who was that masked image
Home
Next
\:
::h
irty-Rectangle Animation
d
,
.
-
Programming is,
861
funny, perhaps not-but as for me, it hit me exactly right.I started laughinghelplessly,
tears rolling down my face. WhenI went back to work-presto!-the pieces of the debugging puzzlehad come together in my head, and the
work went quicklyand easily.
Obviously, my mind needed break
a
from linear, left-brain, push-it-out thinking, so it
could do the sortof integrating work it does so well-but that its rarely willing
to do
under conscious control. It was exactly this sort of thinking I had in mind when I
titled my 1989 optimization book Zen ofAssembly Language. (Although I must admit
that few people seem to have gotten the connection, and Ive had to field a lot of
questions about whether Im a Zen disciple. Im not-actually, Im more of a Dave
Barry disciple.If you dont know who Dave Barry is, youshould; hes good foryour
right brain.) Give your mind a break once
in a while, and Ill bet youll find youre
more productive.
Were strange thinkingmachines, but were the best ones yet invented, andits worth
learning how to tap our full potential. And with that, its back to dirty-rectangle
animation.
862
Chapter 46
1.C
I* S a m p l es i m p l ed i r t y - r e c t a n g l ea n i m a t i o np r o g r a m ,p a r t i a l l yo p t i m i z e d
f e a t u r i n gi n t e r n a la n i m a t i o n ,
maskedimages
C++
r e c t a n g l ec o p y i n g .T e s t e dw i t hB o r l a n d
and
( s p r i t e s ) . a n dn o n o v e r l a p p i n gd i r t y
i nt h es m a l lm o d e l .
*I
#i
n c l ude < s t d l ib. h>
# i n c l u d e< c o n i o . h >
#i
ncl ude <a11 oc. h>
# i n c l u d e <memory.h>
# i n c l u d e < d o s . h>
I* Comment o u t t o d i s a b l e o v e r l a p e l i m i n a t i o n i n t h e d i r t y r e c t a n g l e l i s t .
# d e f i n e CHECK-OVERLAP 1
# d e f i n e SCREEN-WIDTH 320
# d e f i n e SCREEN-HEIGHT 200
# d e f i n e SCREEN-SEGMENT
OxAOOO
*I
I* D e s c r i b e s a d i r t y r e c t a n g l e * I
t y p e d e fs t r u c t {
v o i d* N e x t :
I* p o i n t e rt on e x tn o d ei nl i n k e dd i r t yr e c tl i s t
*I
i n t Top:
i n tL e f t :
i n tR i g h t :
i n t Bottom:
1 DirtyRectangle:
I* D e s c r i b e sa na n i m a t e do b j e c t
*I
t y p e d e fs t r u c t
{
i n t X:
I* upper l ec fotr vniienr rt ubai tl m a p
*I
i n t Y:
i n tX D i r e c t i o n :
I* d i r e c t i o n and d i s t a n c eo f movement * I
intYDirection:
i n tI n t e r n a l A n i m a t e C o u n t : I* t r a c k i n gi n t e r n a la n i m a t i o ns t a t e
*I
i n It n t e r n a l A n i m a t e M a x :
I* maximum i n t e r n aal n i m a t i o ns t a t e
*/
1 Entity:
I* s t o r a g eu s e df o rd i r t yr e c t a n g l e s
*I
# d e f i n e MAX-DIRTY-RECTANGLES
100
i n t NumDirtyRectangles:
D i r t y R e c t a n g l e DirtyRectanglesCMAX-DIRTY-RECTANGLES]:
I* h e a d l t a i l o f d i r t y r e c t a n g l e l i s t
*I
D i r t y R e c t a n g l e D i rtyHead;
I* I f s e t t o 1, i g n o r e d i r t y r e c t a n g l e l i s t
a n dc o p yt h ew h o l es c r e e n .
*I
i n t DrawWholeScreen
0:
I* p i x e l s andmasks
f o rt h et w oi n t e r n a l l ya n i m a t e dv e r s i o n s
o f t h ei m a g e
*I
w e ' l la n i m a t e
# d e f i n e IMAGE-WIDTH 13
# d e f i n e IMAGE-HEIGHT 11
charImagePixelsOCl
I
0 . 0 . 0 . 9.9.9.9.9,
0.0.
0. 0. 0.
0,0.
9,9.9.9.9.9.9.
0. 0 . 0,0.
0. 9 9. .
0. 0.14.14.14, 9. 9 . 0. 0 . 0 .
99. .
0, 0 . 0 . 0.14.14.14.9.9.
0,0.
9.9.
0.0.0,
0.14.14.14.9.9.
0, 0,
9.9.14.
0. 0.14.14.14.14.9.9.
0 . 0.
9,9.14,14.14.14.14.14.14,9.9.
0, 0.
9.9.14.14.14.14.14.14.14,9.9.
0,0.
0. 9,9.14.14.14.14.14.9.9,
0. 0, 0.
0,0,9.9.9.9.9.9.9.0.0.0,0.
0. 0.0,
9.9.9.9,9.
0,0.
0 . 0. 0 .
1:
863
c h a r ImageMaskOCl
I
0.0.0.1.1.1.1.1.0.0.0,0.0,
00, .
1. 1. 1. 1, 1. 1, 1. 0. 0. 00. .
0. 1, 1. 0.0.
1 , 1 , 1, 1, 1, 0,
0,
0,
1.1.0.0.0.0.1.1,1.1,1.0,0.
1,1.0,0.0.0.1.1.1.1.1.0.0.
1 . 1 . 1 , 0. 0, 1 . 1. 1 . 1. 1 , 1 . 0. 0 .
1.1.1.1.1,1.1,1.1.1.1.0,0.
1.1,1.1.1.1,1.1,1,1.1.0.0,
0,1.1,1.1,1,1.1.1.1,0.0.0.
0.0,1.1.1.1.1.1.1.0.0.0.0.
1;
o,o.o,
1.1.1. 1 . 1 . 0 . 0 . 0 . 0 . 0 .
c h a rI m a g e P i x e l s l [ l
{
0. 0. 0 . 9.9.9.9.9.
0.
0,
0, 0,
0.0.9,9.9,9,9.9,9.0.9.9.9.
0, 9 9. .
0 . 0.14.14,14.9,9.9,9.
99. ,
0. 0 . 0. 0.14.14.14.
0, 0. 0.
9.9.
0.0.0,
0.14.14, 0,0.0.0.0,
9.9.14.
0 . 0.14.14.14. 0, 0, 0.0.
9,9.14,14.14,14.14,14.
0. 0 . 0.0.
9.9,14.14.14.14.14.14.14,
0.0,0.0,
0 , 9,9.14.14,14,14,14,9.9.9.9.
0.0.9.9.9,9.9.9.9,0.9,9.9.
0.0.0.9.9.9,9.9.0,0.0,9,9.
1;
9.
0.
0.
0.
0.
0.
charImageMasklL]
I
0.0.0.1.1.1.1.1.0,0.0.0.1.
0,0.1.1.1.1,1,1.1.0,1.1,1.
0.1.1.0.0.1.1.1.1,1.1.1.0.
1.1.0.0.0.0.1.1.1.0.0.0,0.
1.1.0.0.0,0.1,1.0.0.0.0.0.
1.1,1.0.0.1,1.1,0.0.0.0.0.
1.1.1.1.1,1.1,1,0.0,0,0,0,
1.1,1.1.1.1.1.1,1.0.0.0.0.
0.1.1,1.1,1.1,1.1.1.1,1.0,
0.0.
1. 1. 1. 1. 1. 1.1,
0. 1, 1. 1.
0.0.0,1.1.1,1.1.0.0.0.1,1.
1;
/*
P o i n t e r st op i x e l
andmask
d a t af o rv a r i o u si n t e r n a l l ya n i m a t e d
v e r s i o n so fo u ra n i m a t e di m a g e .
*/
c h a r * I m a g e P i x e l A r r a y [ ] = { I m a g e P i x e l s O .I m a g e P i x e l s l ) ;
c h a r * ImageMaskArrayC] = {ImageMaskO. I m a g e M a s k l l ;
/ * A n i m a t e de n t i t i e s * /
# d e f i n e NUM-ENTITIES 15
E n t i t y EntitiesCNUM-ENTITIES];
/* p o i n t e r t o s y s t e m b u f f e r i n t o w h i c h w e ' l l d r a w
*/
c h a rf a r* S y s t e m B u f f e r P t r ;
/* p o i n t e rt os c r e e n * I
c h a r f a r* S c r e e n P t r :
void EraseEntities(void);
v o i d CopyOirtyRectanglesToScreen(void);
void DrawEntities(void):
v o i d A d d O i r t y R e c t ( E n t i t y *, i n t , i n t ) ;
v o i d DrawMasked(char f a r *, c h a r *, c h a r *, i n t , i n t , i n t ) ;
int):
v o i d F i l l R e c t ( c h a rf a r *, i n t . i n t . i n t .
v o i d C o p y R e c t ( c h a rf a r *, c h a rf a r *, i n t . i n t . i n t . i n t ) ;
864
Chapter 46
voidmain0
i n t i. XTemp.
YTemp;
u n s i g n e di n tT e m p c o u n t :
c h a r far *TempPtr:
u n i o n REGS r e g s :
/* A l l o c a t e memory f o r t h e s y s t e m b u f f e r i n t o w h i c h w e ' l l d r a w
i f (!(SystemBufferPtr
f a r m a l l o c ( ( u n s i g n e d int)SCREEN-WIDTH*
SCREEN-HEIGHT))) {
p r i n t f ( " C o u 1 d n ' tg e tm e m o r y \ n " ) ;
exit(1):
/* C l e a r t h e s y s t e m b u f f e r * I
TempPtr
SystemBufferPtr:
f o r (Tempcount
((unsigned)SCREEN-WIDTH*SCREEN-HEIGHT):
*TempPtr++
0:
1
--
/*
P o i n tt ot h es c r e e n
*/
ScreenPtr
MK-FP(SCREEN-SEGMENT. 0):
/ * S e tu pt h ee n t i t i e sw e ' l la n i m a t e ,
a t random l o c a t i o n s
randomize();
f o r (i 0; i < NUM-ENTITIES: i++)
C
EntitiesCi1.X
random(SCREEN-WIDTH - IMAGE-WIDTH):
EntitiesCi1.Y
random(SCREENkHE1GHT - IMAGE-HEIGHT):
1:
EntitiesCil.XDirection
Entities[i].YDirection
-1;
EntitiesCil.Interna1AnimateCount i E 1:
Entities[i].InternalAnimateMax
2;
--
/*
--
*I
Tempcount--:
*/
--
Setthedirtyrectanglelistto
e m p t y ,a n ds e tu pt h eh e a d / t a i ln o d e
as a s e n t i n e l */
NumDirtyRectangles
0:
DirtyHead.Next
EDirtyHead:
DirtyHead.Top
Ox7FFF:
D i r t y H e a d . L e f t - Ox7FFF:
Ox7FFF;
DirtyHead.Bottom
D i rtyHead. Right
Ox7FFF:
/* S e t3 2 0 x 2 0 02 5 6 - c o l o rg r a p h i c s
mode */
regs.x.ax
0x0013:
i n t 8 6 ( 0 x 1 0 .& r e g s .& r e g s ) :
/ * Loopanddraw
u n t i l a key i s p r e s s e d */
do I
/* Draw t h e e n t i t i e s t o t h e s y s t e m b u f f e r a t t h e i r c u r r e n t l o c a t i o n s ,
updatingthedirtyrectanglelist
*/
DrawEntitiesO:
/* Draw t h e d i r t y r e c t a n g l e s , or t h ew h o l es y s t e mb u f f e r
if
appropriate */
CopyDirtyRectanglesToScreenO:
/* R e s e t t h e d i r t y r e c t a n g l e l i s t t o
empty * I
NumDirtyRectangles
0:
D i r t y H e a d . N e x t & D i r t y H e a d:
/ * E r a s et h ee n t i t i e si nt h es y s t e mb u f f e ra tt h e i ro l dl o c a t i o n s ,
updatingthedirtyrectanglelist
*/
EraseEntitiesO:
/* Move t h e e n t i t i e s , b o u n c i n g o f f t h e e d g e s
o f t h es c r e e n */
f o r (i 0: i < NUM-ENTITIES: i++)
{
XTemp
EntitiesCi1.X + Entities[il.XDirection:
YTemp
EntitiesCi1.Y + EntitiesCi1.YDirection:
-- -
--
- -
--
865
EntitiesCi1.X
EntitiesCi1.Y
--
XTemp;
YTemp:
1 w h i l e( ! k b h i t O ) :
/*
getch0:
c l e at rhkee y p r e s s
*/
/*
Returnbacktotext
mode */
regs.x.ax
0x0003:
i n t 8 6 ( 0 x 1 0 .& r e g s .& r e g s ) :
/* Draw e n t i t l e s a t t h e i r c u r r e n t l o c a t i o n s , u p d a t i n g d i r t y r e c t a n g l e l i s t .
void DrawEnti ties()
*/
i n t i:
c h a rf a r* R o w P t r B u f f e r :
char*TempPtrImage:
char*TempPtrMask:
Entity*EntityPtr:
I
/*
--
/*
Copy t h e d i r t y r e c t a n g l e s , o r t h e w h o l e s y s t e m b u f f e r
t ot h es c r e e n .
*/
v o i d CopyDirtyRectanglesToScreenO
(
i n t i. R e c t W i d t h R
. ectHeight:
u n s i g n e di n tO f f s e t :
Di rtyRectangle * DirtyPtr;
i f (DrawWholeScreen) (
/ * J u s tc o p yt h ew h o l eb u f f e rt ot h es c r e e n
*/
DrawWholeScreen
0:
C o p y R e c t ( S c r e e n P t rS
. ystemBufferPtr.
SCREEN-HEIGHT,
SCREEN-WIDTH.
SCREEN-WIDTH.
SCREEN-WIDTH):
I else {
/* Copy o n l y t h e d i r t y r e c t a n g l e s , i n t h e Y X - s o r t e d o r d e r i n w h i c h
t h e y ' r e l i n k e d */
866
if appropriate.
Chapter 46
--
DirtyPtr
D i rtyHead. Next:
f o r (i 0; i < NumDirtyRectangles: i++)
I
/ * O f f s e ti nb o t hs y s t e mb u f f e r
andscreen
o f image */
Offset
( u n s i g n e di n t )( D i r t y P t r - > T o p
* SCREEN-WIDTH) +
DirtyPtr->Left;
/ * D i m e n s i o n so fd i r t yr e c t a n g l e
*/
RectWidth
DirtyPtr->Right - DirtyPtr->Left:
DirtyPtr->Bottom - DirtyPtr->Top:
RectHeight
/ * Copya
d i r t yr e c t a n g l e */
CopyRect(ScreenPtr + O f f s e t , S y s t e m B u f f e r P t r + O f f s e t .
R e c t H e i g h tR
. ectWidth.
SCREEN-WIDTH.
SCREEN-WIDTH):
/* P o i n tt ot h en e x td i r t yr e c t a n g l e
*/
DirtyPtr
DirtyPtr->Next:
--
1
/*
E r a s et h ee n t i t i e si nt h es y s t e mb u f f e r
u p d a t i n gt h ed i r t yr e c t a n g l el i s t .
v o i dE r a s e E n t i t i e s O
*/
a t t h e i rc u r r e n tl o c a t i o n s ,
i n t i;
c h a r f a r *RowPtr:
{
f o r (i 0 : i < NUM-ENTITIES: i++)
/ * Remember t h e d i r t y r e c t a n g l e i n f o f o r t h i s e n t i t y
*/
A d d D i r t y R e c t ( & E n t i t i e s [ i l , IMAGE-HEIGHT.
IMAGE-WIDTH);
/ * P o i n tt ot h ed e s t i n a t i o ni nt h es y s t e mb u f f e r
*/
RowPtr
S y s t e m B u f f e r P t r + ( E n t i t i e s C i 1 . Y * SCREEN-WIDTH) +
Entities[il.X:
/ * C l e a rt h er e c t a n g l e */
F i l l R e c t ( R o w P t r . IMAGELHEIGHT. IMAGE-WIDTH.
SCREEN-WIDTH.
0):
/ * Add a d i r t y r e c t a n g l e t o t h e l i s t .
The l i s t i s m a i n t a i n e d i n t o p - t o - b o t t o m ,
no p i x e le v e ri n c l u d e dt w i c e ,t om i n i m i z e
l e f t - t o - r i g h t ( Y X s o r t e d )o r d e r ,w i t h
t h e number o f d i s p l a y memory accessesand
t oa v o i ds c r e e na r t i f a c t sr e s u l t i n g
f r o m a l a r g et i m ei n t e r v a lb e t w e e ne r a s u r e
andredraw
f o r a g i v e no b j e c to r
for
a d j a c e n to b j e c t s .
The t e c h n i q u eu s e d i s t o c h e c kf o ro v e r l a pb e t w e e nt h e
r e c t a n g l e and all r e c t a n g l e s a l r e a d y i n t h e l i s t .
I f n oo v e r l a pi sf o u n d ,t h e
r e c t a n g l e i s added t o t h e l i s t .
I f o v e r l a pi sf o u n d ,t h er e c t a n g l ei sb r o k e n
i n t on o n o v e r l a p p i n gp i e c e s ,
and t h ep i e c e sa r e
added t o t h e l i s t b yr e c u r s i v e
c a l l st ot h i sf u n c t i o n .
*/
v o i dA d d D i r t y R e c t ( E n t i t y * p E n t i t y ,i n tI m a g e H e i g h t .i n tI m a g e w i d t h )
DirtyRectangle * D i r t y P t r :
D i r t y R e c t a n g 1 e * TempPtr;
E n t i t y TempEnti t y :
i n t i:
i f ( N u m D i r t y R e c t a n g l e s >- MAX-DIRTY-RECTANGLES) {
/ * Too many d i r t yr e c t a n g l e s :j u s tr e d r a wt h ew h o l es c r e e n
DrawWholeScreen
1;
return:
*/
/*
Remember t h i s d i r t y r e c t a n g l e . B r e a k
up i f necessary t o a v o i d
o v e r l a pw i t hr e c t a n g l e sa l r e a d yi nt h el i s t ,t h e n
addwhatever
r e c t a n g l e sa r el e f t ,i n
Y X s o r t e do r d e r */
# i f d e f CHECK-OVERLAP
*/
/ * Check f o r o v e r l a p w i t h e x i s t i n g r e c t a n g l e s
TempPtr
DirtyHead.Next:
TempPtr
TempPtr->Next)
f o r (i 0: i < N u m D i r t y R e c t a n g l e s : i++.
867
/*
We've f o u n d an o v e r l a p p i n gr e c t a n g l e .C a l c u l a t et h e
r e c t a n g l e s . i f a n y ,r e m a i n i n ga f t e rs u b t r a c t i n go u tt h e
*/
OVerlaDDedareas.andaddthem
to the dirtr list
/ * Check f o r a n o n o v e r l a p p e d l e f t p o r t i o n * /
i f (TempPtr->Left > pEntity->X) {
/* T h e r e ' s d e f i n i t e l y a n o n o v e r l a p p e dp o r t i o n a t t h e l e f t ; add
it, b u t o n l y t o a t m o s t t h e t o p
andbottom o f t h e o v e r 1 a p p i n g
o f below */
r e c t : t o p a n db o t t o ms t r i p sa r et a k e nc a r e
TempEntity.X
pEntity->X:
TempEntity.Y
m a x ( p E n t i t y - > Y ,T e m p P t r - > T o p ) :
AddDirtyRect(&TempEntity.
m i n ( p E n t i t y - > Y + ImageHeight.TempPtr->Bottom)
TempEntity.Y,
TempPtr->Left - p E n t i t y - > X ) ;
--
/ * Check f o r a n o n o v e r l a p p e d r i g h t p o r t i o n * /
i f (TempPtr->Right < ( p E n t i t y - > X + Imagewidth)) I
/ * T h e r e ' s d e f i n i t e l y a n o n o v e r l a p p e dp o r t i o na tt h er i g h t :
i t , b u to n l yt oa tm o s tt h et o pa n db o t t o mo ft h eo v e r l a p p i n g
r e c t ; t o p a n db o t t o ms t r i p sa r et a k e nc a r eo fb e l o w
*/
TempEntity.X
TempPtr->Right:
TempEntity.Y
m a x ( p E n t i t y - > Y .T e m p P t r - > T o p ) ;
AddDirtyRect(&TempEntity.
m i n ( p E n t i t y - > Y + ImageHeight,TempPtr->Bottom)
TempEntity.Y.
( p E n t i t y - > X + Imagewidth) - TempPtr->Right):
add
--
/*
Check f o r a n o n o v e r l a p p e d t o p p o r t i o n
*/
i f (TempPtr->Top > p E n t i t y - > Y ) I
/ * T h e r e ' s a t o pp o r t i o nt h a t ' sn o to v e r l a p p e d
*/
TempEntity.X
pEntity->X;
TempEntity.Y
pEntity->Y;
A d d D i r t y R e c t ( & T e m p E n t i t y . TempPtr->Top - p E n t i t y - > Y I. m a g e w i d t h ) ;
--
/*
Check f o r a n o n o v e r l a p p e d b o t t o m p o r t i o n */
i f (TempPtr->Bottom < ( p E n t i t y - > Y + I m a g e H e i g h t ) ) I
/* T h e r e ' s a b o t t o m p o r t i o n t h a t ' s n o t o v e r l a p p e d
*/
TempEntity.X
pEntity->X;
TempEntity.Y
TempPtr->Bottom;
AddDirtyRect(&TempEntity.
( p E n t i t y - > Y + ImageHeight) - T e m p P t r - > B o t t o m .I m a g e w i d t h ) :
--
/*
We'veadded
return;
*/
a l ln o n - o v e r l a p p e dp o r t i o n st ot h ed i r t yl i s t
/* CHECK-OVERLAP * /
T h e r e ' sn oo v e r l a pw i t ha n ye x i s t i n gr e c t a n g l e ,
add t h i s r e c t a n g l e a s - i s
*/
/ * F i n dt h eY X - s o r t e di n s e r t i o np o i n t .S e a r c h e s
b e c a u s et h eh e a d / t a i lr e c t a n g l ei ss e tt ot h e
TempPtr
&DirtyHead;
w h i l e( ( ( D i r t y R e c t a n g l e* ) T e m p P t r - > N e x t ) - > T o p
TempPtr
TempPtr->Next:
#endif
/*
868
Chapter 46
so we can j u s t
will a l w a y st e r m i n a t e ,
maximum v a l u e s
<
pEntity->Y)
*/
w h i l e( ( ( ( D i r t y R e c t a n g l e* ) T e m p P t r - > N e x t ) - > T o p
( ( ( D i r t y R e c t a n g l e* ) T e m p P t r - > N e x t ) - > L e f t
TempPtr
TempPtr->Next:
-- p E n t i t y - > Y )
<
&&
pEntity->X)) {
I
/ * S e tt h er e c t a n g l ea n da c t u a l l ya d d
it t o t h e d i r t y l i s t
D i rtyPtr
& D i r t y R e c t a n g 1es[NumDi r t y R e c t a n g l e s + + l :
DirtyPtr->Left
pEntity->X:
pEntity->Y:
DirtyPtr->Top
DirtyPtr->Right
p E n t i t y - > X + Imagewidth;
DirtyPtr->Bottom
p E n t i t y - > Y + ImageHeight;
DirtyPtr->Next
TempPtr->Next;
DirtyPtr:
TempPtr->Next
*/
LISTING 46.2L46-2.ASM
: A s s e m b l yl a n g u a g eh e l p e rr o u t i n e sf o rd i r t yr e c t a n g l ea n i m a t i o n .T e s t e dw i t h
: TASM.
: F i l l s a r e c t a n g l ei nt h es p e c i f i e db u f f e r .
: C - c a l l a b l ea s :
: v o i dF i l l R e c t ( c h a rf a r
* B u f f e r P t r .i n tR e c t H e i a h t .i n tR e c t W i d t h .
i n tB u f f e r w i d t h .i n tC o l o r ) :
parms
.model
.code
struc
smal 1
dw
dw
dd
dw
dw
dw
dw
?
?
?
?
?
?
?
BufferPtr
RectHeight
RectWidth
Bufferwidth
Color
parms
ends
pub1 ic -F i 1 1 R e c t
n eparro c
-F i 11 Rect
c ld
push
bp
mov
bp.sp
di
push
;pushed BP
:pushed r e t u r na d d r e s s
:farpointertobufferinwhichto
; h e i g h to fr e c t a n g l et o
fill
: w i d t ho fr e c t a n g l et o
fill
;widthofbufferinwhichto
fill
; c o l o rw i t hw h i c ht o
fill
di.[bp+BufferPtrl
dx.[bp+RectHeightl
bx,[bp+BufferWidthl
b x . C b p + R e c t W i d t;hdli s t a n cf reoem
onondf essct a n
: tostartofnext
mov
a l . b y t ep t r[ b p + C o l o r l
movf ocr o l;odt ahroheu.balle
fill
1 es
mov
mov
sub
REP STOSW
RowLoop:
mov
shr
rep
adc
rep
add
dec
jnz
POP
POP
ret
-Fi11Rectendp
cx.[bp+RectWidthl
cx.1
stosw
cx, cx
stosb
di .bx
dx
RowLoop
: p o i n tt on e x ts c a nt o
fill
: c o u n t down rows t o fill
di
bP
869
; Draws a m a s k e di m a g e( as p r i t e )t ot h es p e c i f i e db u f f e r .
C - c a l l a b l ea s :
v o iDd r a w M a s k e d ( c h af ar r
* B u f f e r P t rc.h a r
* P i x e l s , c h a r * Mask,
i n t ImageHeight, i n t Imagewidth. i n tB u f f e r w i d t h ) :
p a r msster u c
dw
?
;pushed BP
dw
?
; p u s h e dr e t u r na d d r e s s
;farpointertobufferinwhichtodraw
BufferPtr2
dd
?
; p o i n t e rt oi m a g ep i x e l s
Pixels
dw
?
; p o i n t e rt oi m a g e
mask
Mask
dw
?
ImageHeight
; h e i g h to fi m a g et od r a w
dw
?
; w i d t h o f image t o draw
dw
?
Imagewidth
: w i d t ho fb u f f e ri nw h i c ht od r a w
BufferWidthE
dw
?
parmse
ends
pub1 i c JrawMasked
n e aprr o c
-DrawMasked
cld
push
bp
mov
bp. sp
push
si
push
di
1es
mov
mov
mov
mov
sub
mov
RowLoopZ:
mov
ColumnLoop:
1 odsb
and
jz
mov
mov
SkipPixel :
inc
inc
dec
jnz
add
dec
jnz
POP
POP
POP
ret
JrawMasked
di .[bp+BufferPtrE]
s i .[bp+Mask]
bx.[bp+Pixelsl
dx.[bp+ImageHeightl
ax.[bp+BufferWidthEI
ax.[bp+ImageWidth]
[bp+BufferWidthZl.ax
: d i s t a n c ef r o me n do fo n ed e s ts c a n
; tostartofnext
cx.[bp+ImageWidthl
: g e tt h en e x t
mask b y t e
:draw t h i s p i x e l ?
;no
; y e s .d r a wt h ep i x e l
a1 .a1
SkipPixel
a1 Cbxl
es:[dil.al
bx
di
cx
Col umnLoop
di.[bp+BufferWidthZl
dx
RowLoopZ
; p o i n tt on e x ts o u r c ep i x e l
; p o i n tt on e x td e s tp i x e l
: p o i n tt on e x ts c a nt o
fill
;count down rows t o fill
di
si
bp
endp
C - c a l l a b l ea s :
CopyHeight.Copywidth.
O e s t B u f f e r W i d t hS
. rcBufferWidth):
; Copies a r e c t a n g l ef r o m one b u f f e r t o a n o t h e r .
;
vC
o iodp y R e c t ( D e s t B u f f e r PStrrc. B u f f e r P t r .
p a r msst3r u c
dw
dw
D e s t B u f f e r dPdt r
S r c B u f f e r dPdt r
870
Chapter 46
?
?
?
?
;pushed BP
;pushed ar ed tdur rens s
; fpaori nbt ueofw
rft ehocritocohp y
; pf aori nb tuoef fr rew
or m
hc itocophy
CopyHeight
dw
?
Copywidth
dw
?
D e s t B u f f e r W i d t h dw
?
S r c B u f f e r W i d t h dw
?
parms3
ends
pub1 ic _CopyRect
-CopyRect
n e aprr o c
c ld
push
mov
push
push
push
; h e i rgcoethofcotpt y
; w ircdo
etotofchpt y
; w i d tobhfu f f e r
: w bi doutf hff rew
or m
hc tiocophy
t o w h i ct oh p y
1es
di.[bp+DestBufferPtr]
Ids
si.[bp+SrcBufferPtr]
mov
dx.[bp+CopyHeightl
mov
b x , [ b p + D e s t B u f f e r W i d t h l: d i s t a n c ef r o me n d
o f one d e s ts c a n
sub
bx.Cbp+CopyWidthl
: coofttphnoye x t
mov
a x . [ b p + S r c B u f f e r W i d t h:l d i s t a n c ef r o me n do of n es o u r c es c a n
sub
ax.Cbp+CopyWidthl
; o f copy t h
onee x t
RowLoop3:
mov
cx.[bp+CopyWidthl
:#boycfttoeops y
shr
cx.1
movsw
as :copy
pmany
o s s iabswl eo r d s
rep
adc
cx ,c x
movsb
b y t e , i f any
r eodd
p :copy
add sl o
insnue
cearx:sctnpoei o, ai nxt
add
: p o i n tt on e x td e s ts c a nl i n e
d i ,bx
dec
dx
: c o u n t down rows t o fill
jnz
RowLoop3
POP
POP
POP
POP
ret
JopyRect
ds
di
si
bP
endp
end
Masked Images
Masked images are rendered by drawing an objects pixels through a mask; pixels
are actually drawn only where the mask specifies that drawing is allowed.This makes
it possible to draw nonrectangular objects that dontimproperly interfere with one
another when theyoverlap. Masked images also makeit possible to havetransparent
areas (windows) withinobjects. Masked images produce far more realistic animation
than do rectangular images, and therefore are moredesirable. Unfortunately,masked
images are also considerably slower to draw-however,
a good assembly language
implementation can go a long way toward making masked images draw rapidly
enough, as illustrated by this chapters code. (Masked imagesare also known asspdes;
some video hardware supports sprites directly, but on the PC its necessaryto handle
sprites in software.)
871
Masked images make it possible to render scenes so that a given image convincingly
appears to be in front
of or behind other
images; that is, so images are displayed in zorder (by distance). By consistently drawing images that are supposedto be farther
away before drawing nearer images, the nearer images will appear in front of the
other images, and because masked images draw only preciselythe correct pixels (as
opposed to blank pixels in the bounding rectangle),
theres no interference between
overlapping images to destroy the illusion.
In this chapter, Ive used the approachof having separate, pairedmasks and images.
Another, quite different approach to masking is to specify a transparent color for
copying, and copy only those pixels that are not thetransparent color. This has the
advantage of not requiring separate mask data, so its more compact, and thecode
to implement this is a little less complexthan the full masking Ive implemented. On
the other hand, the transparent
color approach is lessflexible because it makes one
color undrawable. Also, with a transparent color, its not possible to keep the same
base image but use different masks, because the mask information is embedded in
the image data.
Internal Animation
Ive added another feature essential to producing convincing animation: internal
animation, which is the process of changing the appearance of a given object over
time, as distinguished from changing only the locution of a given object. Internal
animation makes images look active and alive. Ive implemented the simplest possible form of internal animation in Listing 46.1-alternation between two images-but
even this level ofinternal animation greatly improves the feel of the overall animation.
You could easily increase the number of images cycled
through, simply byincreasing the
value of InternalAnimateMaxfor a given entity.You could also implement morecomplex image-selection logic to produce more interesting
and less predictable
internal-animation effects, such as jumping, ducking, running,and thelike.
Dirty-Rectangle Management
As mentioned above, dirty-rectangle animation makes it possible to access display
memory a minimum number of times. The previous chapters code didntdo any of
that; instead, it copiedall portions of every dirty rectangle to the screen, regardless
of overlap between rectangles. The code Ive presented in this chapter goes to the
other extreme, taking great pains never to draw overlapped portions of rectangles
more than once. This
is accomplished by checking for overlap whenever a rectangle
is to be added to the dirty list. When overlap with an existing rectangle is detected,
the new rectangle is reduced to between zero and fournonoverlapping rectangles.
Those rectangles are then again considered for addition to the dirty list, and may
again be reduced,if additional overlap is detected.
872
Chapter 46
873
Previous
Home
874
Chapter 46
Next
Previous
chapter 47
mode x: 256-color vga magic
Home
Next
VGAs Undocumented
timal Mode
At a book signing fo?
n of Code Optimization, an attractive young woman
came up to me, holding
and said,YoureMichaelAbrash, arent you?I confessed that Iwas, prepared to respond in an appropriately modest yet proud way to
the compliments I a s sure would follow. (It was my own book signing, after all.) It
didnt work out that
way, though. The first thing out of her mouth was:
Mode X is a s
name for a graphics mode. As my jaw started to drop, she
dnt invent themode,either. My husbanddiditbefore
you did.
added,
And they say there &e no groupies in programming!
Well. I never claimedthat I invented the mode (which is a 320x240 256-colormode with
some very special properties, as well see shortly). I did discover it independently,
but so did other people in the game business, some of them no doubtbefore I did.
The difference is that all those other people heldonto this powerful mode as a trade
secret, while I didnt; instead, I spread theword as broadly as I could in my column
in 07;
DobbSJournaZ, on the theory that the more peopleknew about this mode, the
more valuable it would be. And I succeeded, as evidenced by the fact that this now
widely-used mode is universally knownby the name Igave it in00)Mode X. Neither do I think thats a bad name; its short, catchy, and easy to remember, and it
befits the mystery status of this mode, which was omitted entirely from IBMs documentation of the VGA.
877
878
Chapter 47
Fifth, unlike mode 13H, Mode X has plenty of offscreen memory free for image
storage. This is particularly effectivein conjunction with the use of the VGAs latches;
together, the latches and the off-screen memory allow images to be copied to the
screen four pixels at a time.
Theres a sixth feature of Mode X thats not so terrific: Its hard to program efficiently. As Chapters 23 through 30 of this book demonstrates, 16-color VGA
programming can be demanding. Mode X is often as demanding as 16-color programming, and operates by a set of rules that turns everything youve learned in
16-color mode sideways. Programming Mode X is nothing like programming the
nice, flat bitmap of mode 13H, or, for that matter, the flat, linear (albeit banked)
bitmap used by 256-color SuperVGA modes. (Its
important to remember thatMode
X works on all VGAs, notjust SuperVGAs.) Many programmers I talk to love the flat
bitmap model, and think that its the ideal organization for display memory because its
so straightforward to program. Here, however, the complexity of Mode X is opportunity-opportunity for the best combination of performance and appearance the VGA
has to offer.If you do 256-color programming, and especially if you use animation,
youre missing the boat if youre not using Mode X.
Although some developers have taken advantage of ModeX, its use is certainly not
universal, being entirely undocumented; only an experienced VGA programmer
would have the slightest inkling that it even exists, and figuring out how to make it
perform beyond the write pixel/read pixel level is no mean feat. Little other than my
DDJcolumns hasbeen publishedabout it, althoughJohn Bridges has widelydistributed
his code for a number of undocumented 256-color resolutions, and Id like to acknowledge the influence of hiscode on the mode set routine presented in this chapter.
Given the tremendous advantages of Mode X over the documented mode 13H, Id
very much like to get it into the hands of as many developers as possible, so Im
going to spend the next few chapters exploring this odd but worthy mode. Ill provide mode set code, delineate the bitmap organization, and show howthe basic write
pixel and read pixel operations work. Then, Ill move on to the magic stuE rectangle fills, screen clears, scrolls, image copies, pixel inversion, and, yes, polygon fills
(just a different driver for the polygon code), all blurry fast; hardware
raster ops; and
page flipping. In the end, Ill build a working animation program that shows many
of the features of Mode X in action.
The mode set code is the logical place to begin.
879
The codein Listing 47.1 has been around for some time, and the very first version
had a bug thatserves up aninteresting lesson. The original DDJversion made images
roll on IBMs fixed-frequency VGA monitors, a problem that didntcome to my attention until the code was in print andshipped to 100,000 readers.
The bug came about this way: The code I modified to make the Mode X mode set
code used the VGAs 28-MHz clock. Mode X should have used the %-MHz clock, a
simple matter of setting bit 2 of the Miscellaneous Output register (3C2H) to 0 instead of 1.
Alas, I neglected to change thatsingle bit, so frames were drawn at a faster rate than
they should have been; however, both of my monitors are multifrequency types, and
they automatically compensated for the faster frame rate. Consequently, my clockselection bug was invisible and innocuous-until it was distributed broadly and
everybody started bangingon it.
IBM makes only fixed-frequency VGA monitors, which require very specific frame
rates; if they dont get what youve told them to expect, the image rolls. The corrected version is the oneshown here as Listing 47.1;it doesselect the 25-MHz clock,
and works just fine on fixed-frequency monitors.
Why didnt I catch this bug? Neither I nor a single one of my testers had a fixedfrequency monitor! This nicely illustrates how difficult it is these days to test code in
all the PC-compatible environments inwhich it might run. Theproblem is particularly severefor small developers, who
cant afford to buy everymodel of everyhardware
component fromevery manufacturer;just imagine trying to test network-aware software in all possible configurations!
When people ask why software isnt bulletproof; why it crashes or doesnt coexist
with certain programs; why PC clones arent always compatible; why, in short, the
myriad irritations of using a PC exist-this is a big part of the reason. I guess thats
just theprice we pay for the unfetteredcreativity and vast choice of the PC market.
1.ASM
; Mode X (320x240.256colors)
mode s e tr o u t i n e .
Works on a l l VGAs.
. ................................................................
* R e v i s e d6 / 1 9 / 9 1t os e l e c tc o r r e c tc l o c k :f i x e sv e r t i c a lr o l l
*
* p r o b l e mfoi nxs e d - f r e q u e n c y
( I B M 8 5 1 X - t y pm
e )o n i t o r s .
*
. ................................................................
;
;
; C n e a r - c a l l a b l ea s :
voidSet320x240Mode(void):
; T e s t e dw i t h TASM
; M o d i f i e df r o mp u b l i c - d o m a i n
SC-INDEX
CRTC-INDEX
MIS-OUTPUT
SCREEN-SEG
.model
small
.data
880
Chapter 47
mode setcodebyJohnBridges.
equ
03c4h
;Sequence
Controller
Index
equ
03d4h
;CRT C o n t r o l lIenrd e x
03c2h
equ
; M i s c e l l a n e o u sO u t p u tr e g i s t e r
equ
OaOOOh ;segment o f d i s p l a y memory i n mode X
: I n d e x / d a t ap a i r sf o r
CRT C o n t r o l l e r r e g i s t e r s t h a t d i f f e r
mode X.
CRTParms l a b ewl o r d
dw
00d06h : v e r t i c a lt o t a l
dw
03e07h : o v e r f l o w ( b i t 8 o f v e r t i c a l c o u n t s )
dw
04109h : c e l lh e i g h t( 2t od o u b l e - s c a n )
dw
OealOh :vsync
start
dw
O a c l l h :v syncendand
p r o t e c tc r 0 - c r 7
dw
O d f l 2 h ; v e r t i c a ld i s p l a y e d
dw
00014h : t u r n o f f dword mode
dw
Oe715h :v b l a n k s t a r t
dw
00616h ;v b l a n ke n d
dw
Oe317h : t u r n on b y t e mode
CRT-PARM-LENGTH ((S-CRTParms)/2)
equ
: mode 13hand
between
.code
p u b l i c -Set320x240Mode
-Set320x240Mode
Droc
near
push
bP
: p r e s e rcvael l esr t' saf cr ak m e
: psrie s e r v e
C r e g i svt ea r s
push
: ( d o n ' tc o u n t on B I O S p r e s e r v i n ga n y t h i n g )
di
push
mov
in t
ax.13h
10h
: l e tt h e BIOS s e ts t a n d a r d2 5 6 - c o l o r
: mode ( 3 2 0 x 2l0i n0 e a r )
mov
mov
out
mov
out
mov
mov
out
dx.SC-INDEX
ax, 0604h
d x . a; xd i s a b lceh a i n 4
mode
ax.0100h
d x . a x: s y n c h r o n o u sr e s ewt h i l es e t t i n gM i s cO u t p u t
: f o rs a f e t y ,e v e nt h o u g hc l o c ku n c h a n g e d
dx.MISC-OUTPUT
a1 .Oe3h
d x . a :l s e l e c t
25 MHz d o tc l o c k & 60 Hz s c a n n i n gr a t e
mov
mov
out
dx.SC-INDEX
ax, 0300h
d x . a:xu n droe s e( rt e s t a sr te q u e n c e r )
dx.CRTC-INDEX : r e p r o g r a mt h e CRT C o n t r o l l e r
mov
al.llh
;VSync End r e gc o n t a i n sr e g i s t e rw r i t e
mov
dx.al
: p r o t e cbti t
out
dx
:CRT C o n t r o l 1 eDr a trae g i s t e r
inc
a l . d x: g ect u r r e n t
VSync
End
r e g i s t e rs e t t i n g
in
a l . 7 f h :remove w r i t ep r o t e c t on v a r i o u s
and
dx.al
: CRTC r e g i s t e r s
out
dx
:CRT C o n t r oIl nl edre x
dec
c ld
s i . o f f s e t CRTParms : p o i n t t o CRT p a r a m e t e rt a b l e
mov
:# o f t a b l e e n t r i e s
mov
cx.CRT-PARM-LENGTH
SetCRTParmsLoop:
odsw
1
: gtnheeetx t
CRT I n d e x / O ap taai r
oduxt . a: sxtehntee x t
CRT I n d e x / O a pt aa i r
loop
SetCRTParmsLoop
mov
dx.SC-INDEX
mov
ax.OfO2h
o du xt , a: ex n a b w
l er i t eatfosol l pu lr a n e s
mov
ax.SCREEN-SEG
:now c l e a ar ldl i s p l a y
: a t a time
mov
es.ax
memory. 8 p i x e l s
881
sub
d i ,:dpio i n t
E S : D I dt oi s p l a y
memory
sub
ax,ax
:clear
zt eo r o - v a l pu iex e l s
mov
cx.8000h :# o f words i nd i s p l a y memory
r es tpo s: cwl edaoailfrsl p l a y
memory
pop
: r ed si t o r e
pop
si
POP
bP
ret
-Set320x240Mode endp
end
C r e g i vs at er rs
: r e s tcoarlel se trfar sac km e
After setting up mode 13H, Listing 47.1 alters the vertical counts and timings to
select 480 visible scanlines. (Theres no needto alter any horizontal values, because
mode 13H and Mode X both have 320-pixelhorizontal resolutions.) The Maximum
Scan Line register is programmed to double scan each line (that is, repeat eachscan
line twice), however, so we get aneffective vertical resolution of 240 scan lines. It is,
in fact, possible to get 400 or 480 independent scan lines in 256-color mode, as
discussed in Chapter 31 and 32; however, 400-scan-line modes lack
square pixels and
cant support simultaneous off-screen memoryand page flipping. Furthermore, 480scan-line modes lack page flipping altogether, due to memory constraints.
At the same time, Listing 47.1 programs the VGAs bitmap to a planar organization
that is similar to that used by the 16-color modes, and utterly different from the
linear bitmapof mode 13H. The bizarre bitmap organization of Mode X is shownin
Figure 47.1. The first pixel (the pixel at the upperleft corner of the screen) is controlled by the byte at offset 0 in plane0. (The onething thatMode X blessedly has in
common with mode 13H is that eachpixel is controlled by a single byte, eliminating
the needto mask out individual bits of display memory.) The second pixel, immediately to the right of the first pixel, is controlled by the byte at offset 0 in plane 1. The
third pixel comes from offset 0 in plane 2, and the fourth
pixel from offset 0 in plane
3. Then, thefifth pixel is controlled by the byte at offset 1 in plane 0, and thatcycle
continues, with each group of four pixels spread across the fourplanes at thesame
address. The offset M of pixel N in display memory is M = N/4, and the plane P of
pixel N is P = N mod 4. For display memory writes,
the plane is selected by setting bit
P of the Map Mask register (Sequence Controllerregister 2) to 1and all other bits to
0; for display memory reads, the plane is selected by setting the Read Map register
(Graphics Controller register 4) to P.
It goes without saying that this is one ugly bitmap organization, requiring a lot of
overhead to manipulate a single pixel. The write pixel code shown in Listing 47.2
must determine the appropriate plane and perform
16-bitaOUT to select that plane
for each pixel written, and likewise for the read pixel code shown in Listing 47.3.
Calculating and mapping in a plane once foreach pixel written is scarcely a recipe
for performance.
Thats all right, though, because most graphics software spends little time drawing
individual pixels. Iveprovided the write and read pixel routines as basic primitives,
882
Chapter 47
and so youll understand how the bitmap is organized, but the building blocks of
high-performance graphics software are fills, copies, and bitblts, and its there that
Mode X shines.
Workson
a l l VGAs.
; C n e a r - c a l l a b l ea s :
;
v o i dW r i t e P i x e l X ( i n t
SC-INDEX
03c4h
equ
MAP-MASK
02h equ
SCREEN-SEG
equ
equ
SCREENKWIDTH
struc
dw
X
dw
Y
dw
PageBase dw
X.
i n t Y . unsigned i n t PageBase. i n C
t olor);
OaOOOh
EO
:Sequence C o n t r o l l e rI n d e x
: i n d e x i n SC o f Map Mask r e g i s t e r
:segment o f d i s p l a y memory i n mode X
; w i d t ho fs c r e e ni nb y t e sf r o mo n es c a nl i n e
; t ot h en e x t
parms
Color
parms
dw
ends
2 dup ( ? )
?
?
?
:pushed BP and r e t u r na d d r e s s
: X c o o r d i n a t eo fp i x e lt o
draw
: Y c o o r d i n a t eo fp i x e lt od r a w
;base o f f s e t i n d i s p l a y memory o f page i n
; w h i c ht od r a wp i x e l
w
t oh i ;cicnho l o r
draw p i x e l
883
.model s m a l l
.code
p u b l i c -WritePixelX
-Wri t e P i x e l X n e aprr o c
push
bp
mov
bP*sP
; p r e s e r v ec a l l e r ' ss t a c kf r a m e
; p o i n tt ol o c a ls t a c kf r a m e
mov
mu1
mov
shr
shr
add
add
mov
mov
ax.SCREEN-WIDTH
C bp+Y 1
bx.Cbp+XI
bx.1
bx.1
bx, ax
bx.[bp+PageBasel
ax.SCREEN-SEG
es.ax
mov
and
mov
shl
mov
out
c l . b y t e p t r Cbp+Xl
cl .Ollb
ax.0100h + MAP-MASK
ah.cl
dx.SC-INDEX
dx, ax
mov
mov
a1 . b y t e p t r [ b p + C o l o r ]
e s d: [tebhsx;pecidltinro.rhxealaeodlwlr
;offsetofpixel'sscanlinein
page
;X/4
offsetofpixelinscanline
; o f f s e to fp i x e li n
page
: o f f s e to fp i x e li nd i s p l a y
memory
; p o i n t ES:BX t o t h e p i x e l ' s a d d r e s s
--
;CL
pixel'splane
;AL
i n d e x i n SC o f Map Mask r e g
;setonlythebitforthepixel'splaneto
; s e t t h e Map Mask t o e n a b l e o n l y t h e
; p i x e l ' sp l a n e
f r a m e s t a ccka l l;ePOP
re
r 'sst o r e b p
ret
endD
-W r i t e P i x e l X
end
LISTING 47.3
L47-3.ASM
Workson
a l l VGAs.
; No c l i p p i n gi sp e r f o r m e d .
: C n e a r - c a l l a b l ea s :
:
u n s i g n e di nRt e a d P i x e l X ( i n t
GC-INDEX
READ-MAP
SCREEN-SEG
SCREEN-WIDTH
:G
CIron0and3ptecrhoexi clhlse r
:index
04h
OaOOOh
80
struc
dw
X
dw
Y
dw
PageBase dw
X . i n t Y , u n s i g n e idnPt a g e B a s e ) ;
i n GCt hoef
:segment d oi sf p l a y
;w
s cibodryfeitftnhreeonsm
: t ot h en e x t
Read Map r e g i s t e r
memory i n mode X
scan
one
line
parms
:pushed BP and r e t u r na d d r e s s
; X c o o r d i n a t eo fp i x e lt or e a d
;Y c o o r d i n a t eo fp i x e lt or e a d
;base o f f s e t i n d i s p l a y memory o f pagefrom
; w h i c ht or e a dp i x e l
parms
ends
.model
smal
1
.code
publ i c -Readpixel X
-ReadPixelX
n e pa r o c
bp
push
mov
bp.sp
884
Chapter 47
: p r e s e r v ec a l l e r ' ss t a c kf r a m e
; p o i n tt ol o c a ls t a c kf r a m e
mov
mu1
mov
shr
shr
add
add
mov
mov
ax.SCREEN-WIDTH
[ bp+Y 1
bx.Cbp+XI
bx.1
bx.1
bx, ax
bx.[bp+PageBasel
ax.SCREEN-SEG
es ,ax
mov
and
mov
mov
out
a h , b y t ep t r
ah.0llb
a1 ,READ-MAP
dx.GC-INDEX
dx.ax
mov
sub
a1 . e s : [ b x l
ah.ah
; r e a dt h ep i x e l ' sc o l o r
: c o n v e r t i t t o an u n s i g n e d i n t
bP
: r e s t o r ec a l l e r ' ss t a c kf r a m e
POP
ret
-ReadPixel X
end
; o f f s e to fp i x e l ' ss c a nl i n ei n
;X/4
offsetofpixelin
; o f f s e to fp i x e li n
page
:offsetofpixelindisplay
page
scan l i n e
memory
: p o i n t ES:BX t o t h e p i x e l ' s a d d r e s s
[bp+X1
--
pixel'splane
i n d e x i n GC o f t h e Read Map r e g
;AL
; s e tt h e
Read Map t o r e a d t h e p i x e l ' s
: plane
:AH
endp
Mode X ( 3 2 0 x 2 4 0 .2 5 6c o l o r s )r e c t a n g l e
fill r o u t i n e . Workson
all
VGAs. Uses s l o wa p p r o a c ht h a ts e l e c t st h ep l a n ee x p l i c i t l yf o re a c h
p i x e l . F i l l s up t o b u t n o t i n c l u d i n g t h e c o l u m n a t
EndX andtherow
a t EndY. No c l i p p i n g i s p e r f o r m e d .
C n e a r - c a l l a b l ea s :
v o i dF i l l R e c t a n g l e X ( i n St t a r t X i.n t
S t a r t Y . i n t EndX. i n t EndY.
u n s i g n e di n t PageBase. i n t C o l o r ) :
SC-INDEX
MAP-MASK
SCREEN-SEG
SCREEN-WIDTH
parms
StartX
StartY
EndX
struc
dw
dw
dw
dw
03c4h
02h
OaOOOh
80
:Sequence C o n t r o l l e rI n d e x
: i n d e x i n SC o f Map Mask r e g i s t e r
:segment o f d i s p l a y memory i n mode X
: w i d t ho fs c r e e ni nb y t e sf r o m
onescan
: t ot h en e x t
line
:pushed BP and r e t u r na d d r e s s
: X c o o r d i n a t eo fu p p e rl e f tc o r n e ro fr e c t
:Y c o o r d i n a t eo fu p p e rl e f tc o r n e ro fr e c t
;X c o o r d i n a t eo fl o w e rr i g h tc o r n e ro fr e c t
: ( t h er o wa t
EndX i s n o t f i l l e d )
885
EndY
dw
PageBase dw
Color
parms
ends
dw
.model
smal
1
.code
pub1 ic -Fi 11 RectangleX
-Fi11Rectangl eX p r o cn e a r
push
bP
mov
bP.SP
push
si
di
push
mov
mu1
mov
shr
shr
ax.SCREEN-WIDTH
[bp+StartYl
di,[bp+StartX]
d i .1
d i .1
add
add
di ,ax
di.[bp+PageBasel
mov
mov
ax.SCREEN-SEG
es.ax
:Y c o o r d i n a t e o f l o w e r r i g h t c o r n e r o f r e c t
: ( t h ec o l u m na t
EndY i s n o t f i l l e d )
;base o f f s e t i n d i s p l a y memory o f page i n
; w h i c ht o fill r e c t a n g l e
; c o l o ri nw h i c ht od r a wp i x e l
: p r e s e r v ec a l l e r ' ss t a c kf r a m e
; p o i n tt ol o c a ls t a c kf r a m e
: p r e s e r v ec a l l e r ' sr e g i s t e rv a r i a b l e s
;offsetin
page o f t o p r e c t a n g l e s c a n l i n e
offsetoffirstrectanglepixelin
:X/4
: line
: o f f s e to ff i r s tr e c t a n g l ep i x e li n
page
;offsetoffirstrectanglepixelin
: d i s p l a y memory
scan
: p o i n t ES:DI t o t h e f i r s t r e c t a n g l e p i x e l ' s
; address
dx.SC-INDEX
mov
a1 .MAP-MASK
mov
dx.al
out
inc
dx
c l . b y t ep t r[ b p + S t a r t X l
mov
and
cl .Dllb
mov
a1 . O l h
shl
a1 . c l
a h , b y t ep t r[ b p + C o l o r l
mov
mov
bx.[bp+EndYI
sub
bx.[bp+StartYI
F i 1Done
1
jle
mov
s i , [bp+EndX]
s i ,[ b p + S t a r t X l
sub
Fi 11Done
jle
F i 11 RowsLoop:
ax
push
di
push
cx.si
mov
FillScanLineLoop:
dx.al out
mov
es:[dil.ah
a1,l shl
a1and
.Ollllb
A d d r e sj nszS e t
di
inc
mov
al.00001b
AddressSet:
1oop
Fi11ScanLineLoop
pop
di
add
d i .SCREEN-WIDTH
886
Chapter 47
; s e tt h e
Sequence C o n t r o l l e r I n d e x t o
Map Mask r e g i s t e r
; pointtothe
; p o i n t DX t o t h e
;CL
SC D a t ar e g i s t e r
- firstrectanglepixel'splane
;setonlythebitforthepixel'splaneto
: c o l o rw i t hw h i c ht o
fill
;BX
h e i g h to fr e c t a n g l e
: s k i p i f 0 or n e g a t i v e h e i g h t
:CX
widthofrectangle
: s k i p i f 0 o rn e g a t i v ew i d t h
: r e t r i e v et h ep l a n e
mask f o r t h e l e f t
;count down s c a nl i n e s
ax
POP
bx
dec
j n zF i
F i 1Done:
1
pop
pop
Rows
1
Loop
di
si
bp
POP
edge
; r e s t o r ec a l l e r sr e g i s t e rv a r i a b l e s
; r e s t o r ec a l l e r ss t a c kf r a m e
ret
-F i 1 1 R e c t a neX
g l endp
end
The two major weaknesses of Listing 47.4 both result from selecting the plane on a
pixel by pixel basis. First, endless OUTs (which are particularly slow on 386s, 486s,
and Pentiums, much slower than accesses to display memory) must be performed,
and, second, REP STOS cant be used. Listing 47.5 overcomes both these problems
by tailoring the fill technique to the organization of display memory. Each plane is
filled in its entirety in one burst before the next plane
is processed, so only fiveOUTs
are required in all, and REP STOS can indeed be used; Ive used REP STOSB in
Listings 47.5 and 47.6. REP STOSW could be usedand would improveperformance on
most VGAs; however, REP STOSW requires extra overhead to set
up, so it can be slower
for small rectangles, especiallyon &bit VGAs. Note that doing an entire plane at atime
can produce afading-ineffect for large images, because all columns for one plane
are drawn before any columns for the next.
If this is a problem,the fourplanes can
be cycled through once for each
scan line, rather than once for the entire
rectangle.
Listing 47.5 is 2.5 times faster than Listing 47.4 at clearing the screen on a 20-MHz
cached 386 with a Paradise VGA. Although Listing 47.5 is slightly slower than an
equivalent mode 13H fill routine would be, its not grievously so.
LISTING 47.5
L47-5.ASM
p e rr e c t a n g l e ;t h i sr e s u l t si n
a f a d e - i ne f f e c tf o rl a r g e
r e c t a n g l e s .F i l l su p
t o b u tn o ti n c l u d i n gt h ec o l u m na t
; row a t EndY. No c l i p p i n g i s performed.
; C n e a r - c a l l a b l ea s :
;
;
v o i dF i l l R e c t a n g l e X ( i n St t a r t X i.n St t a r t Y i,n t
unsigned i n t PageBase, i n t C o l o r ) ;
SC-INDEX
MAPLMASK
SCREEN-SEG
03c4h
equ
;index
02h
equ
equ
OaOOOh
on a l l
EndX and t h e
EndX. i n t EndY.
;Sequence C o n t r o l l e rI n d e x
i n SC o f Map Mask r e g i s t e r
;segmentd i so pf l a y
memory i n mode X
887
equ
SCREEN-WIDTH
80
; w i d t ho fs c r e e ni nb y t e sf r o m
: t ot h en e x t
onescan
line
parms s t r u c
StartX
StartY
EndX
dw
dw
dw
dw
2 dup ( ? )
?
EndY
dw
PageBase
dw
Color
parmsends
dw
Startoffset
Width
Height
PlaneInfo
STACK-FRAME-SIZE
?
?
?
equ
equ
equ
equ
equ
-2
-4
-6
-8
8
.model
smal
1
.code
pub1 i c - F i 11 RectangleX
-F i 11Rectangl eX p r o c n e a r
push
bp
mov
bp. sp
sp.STACK-FRAME-SIZE
sub
si
push
push
di
cld
mov
mu1
mov
shr
shr
ax.SCREEN-WIDTH
[bp+StartYl
d i ,[ b p + S t a r t X l
d i .1
d i .I
add
add
d i ,ax
di.Cbp+PageBasel
mov
mov
mov
mov
mov
out
mov
sub
Jle
mov
mov
mov
ax.SCREEN-SEG
es ,ax
Cbp+StartOffsetl,di
dx,SC-INDEX
a1 .MAP-MASK
dx.al
bx, [bp+EndY 1
bx.Cbp+StartYl
F i 1Done
1
Cbp+Heightl.bx
dx. [bp+EndXI
cx.[bp+StartX]
dx.cx
F i 11 Done
dx
c x . n o tO l l b
dx.cx
dx.1
dx. 1
CmP
Jle
dec
and
sub
shr
shr
888
Chapter 47
:pushed BP and r e t u r na d d r e s s
:X c o o r d i n a t e o f u p p e r l e f t c o r n e r o f r e c t
:Y c o o r d i n a t eo fu p p e rl e f tc o r n e ro fr e c t
;X c o o r d i n a t eo fl o w e rr i g h tc o r n e ro fr e c t
: ( t h er o wa t
EndX i s n o t f i l l e d )
:Y c o o r d i n a t e o f l o w e r r i g h t c o r n e r o f r e c t
: ( t h ec o l u m na t
EndY i s n o t f i l l e d )
;base o f f s e t i n d i s p l a y memory o f page i n
: w h i c ht o fill r e c t a n g l e
:colorinwhichto
d r a wp i x e l
; l o c a ls t o r a g ef o rs t a r to f f s e to fr e c t a n g l e
: l o c a ls t o r a g ef o ra d d r e s sw i d t ho fr e c t a n g l e
: l o c a ls t o r a g ef o rh e i g h to fr e c t a n g l e
IF and p l a n e mask
:1oca1 s t o r a g e f o r p l a n e
; p r e s e r v ec a l l e r ' ss t a c kf r a m e
: p o i n tt ol o c a ls t a c kf r a m e
: a l l o c a t es p a c ef o rl o c a lv a r s
; p r e s e r v ec a l l e r ' sr e g i s t e rv a r i a b l e s
: o f f s e t i n page o f t o p r e c t a n g l e s c a n l i n e
;X/4
: line
offsetoffirstrectanglepixelinscan
:offsetoffirstrectanglepixelin
: o f f s e to ff i r s tr e c t a n g l ep i x e li n
: d i s p l a y memory
page
: p o i n t ES:DI t o t h e f i r s t r e c t a n g l e p i x e l ' s
: address
; s e tt h eS e q u e n c eC o n t r o l l e rI n d e xt o
Map Mask r e g i s t e r
: pointtothe
:BX
heightofrectangle
; s k i p i f 0 o rn e g a t i v eh e i g h t
; s k i p i f 0 o rn e g a t i v ew i d t h
dx inc
mov
mov
a d;draIree
ccotrsotfoassenssg l e
[bp+Width].dx
word p t r [bp+PlaneInfo],OOOlh
: l o w e rb y t e
; u p p e rb y t e
fill
--ppl al anneemask
f o r p l a n e 0.
# f o rp l a n e 0
F i 1 P1
1 anesLoop:
mov
ax,word p t r[ b p + P l a n e I n f o ]
mov
dx.SC-INDEX+l
; p o i n t DX t thoe
SC D raet ga i s t e r
p i xtehl i s foopurl ta nt hee: sdext . a l
mov
E S : D I tr oe c t a n g lset a r t
d i , [ b p + S t a r t O f f s e; tpl o i n t
mov
dx.Cbp+Widthl
mov
c 1 , b y t ep t r[ b p + S t a r t X ]
and
cl .Ollb
;plane # o f f i r s t p i x e l
in initialbyte
ah,cl
;do we draw t h i s p l a n e i n t h e i n i t i a l b y t e ?
CmP
jae
InitAddrSet
;yes
dec
dx
;no. so s k i p t h e i n i t i a l b y t e
Fi 11 LoopBottom
jz
: s k i pt h i sp l a n e
i f n op i x e l si n
it
inc
di
InitAddrSet:
mov
c l . b y t e p t r [bp+EndX]
dec
cl
and
cl .Ollb
;plane # o f l a s t p i x e l i n f i n a l b y t e
ah.cl
;do we draw t h i s p l a n e i n t h e f i n a l b y t e ?
CmP
j be
WidthSet
:yes
dec
dx
;no. s o s k i p t h e f i n a l b y t e
F i 11 LoopBottom
i f no p i x e l s i n i t
; s k i pt h i sp l a n e s
jz
WidthSet:
mov
s i .SCREEN-WIDTH
s i ,dx
sub
: d i s t a n c ef r o me n do fo n es c a nl i n et os t a r t
; o fn e x t
mov
bx.Cbp+Heightl
;# o f l i n e s t o fill
mov
a l . b y t ep t rC b p + C o l o r l
: c o l o rw i t hw h i c ht o
fill
F i 11 RowsLoop:
cx, dx
mov
;# o f b y t e s a c r o s s s c a n l i n e
stosb
;fill t h e s c a n l i n e i n t h i s p l a n e
r eP
add
di ,si
:pointtothestartofthenextscan
: 1i n e o f t h e r e c t a n g l e
dec
bx
; c o u n t down s c a nl i n e s
jnz
F i 11 RowsLoop
FillLoopBottom:
mov
ax.word Cp bt rp + P l a n e I n f o l
shl
a1 .1
;settheplanebittothenextplane
inc
ah
#
; i n c r e m e n tt h ep l a n e
mov
word [pbtpr + P l a n e I n f o ] . a x
cmp ;have ah.4
we pdone
l a n eas l?l
j n Fz1i 1
P1 anesLoop
; c o n t i n u e i f anymoreplanes
F i 1 Done:
1
v a r ieagbci alse;tl pop
lrseeer rs' tso r e d i
pop
si
mov
bpsp,
; d i s c a r ds t o r a g ef o rl o c a lv a r i a b l e s
; r e s t o r ec a l l e r ' ss t a c kf r a m e
POP
bp
ret
-Fi 11 RectangleX endp
end
889
about as fast as mode 13H. That alone makes Mode X an attractive mode, given its
square pixels, pageflipping, and offscreen memory,but superior performance would
nonetheless be a pleasant addition to that list. Superior performance is indeed possible in Mode X, although, oddly enough, it comes courtesyof the VGAs hardware,
which was never designed to be used in 256-color modes.
All of the VGAs hardware assistfeatures are available in ModeX, although some are
not particularly useful. The VGA hardware feature thats truly the key to Mode X
performance is the ability to process four planes worth of data in parallel; this includes both the latches and the capability to fan data out to any or all planes. For
rectangular fills, welljust need to fan the data out to various planes, so Ill defer a
discussion of other hardware features for now. (By the way, the ALUs, bit mask, and
most other VGA hardware features are also available in mode 13H-but parallel
data processing is not.)
In planar modes, such as Mode X, a byte written by the CPU to display memory may
actually goto anywhere betweenzero and four planes, as shownin Figure 47.2. Each
plane for which the setting of the corresponding bit in the Map Mask register is 1 receives the CPU data, and each planefor which the corresponding bitis 0 is not modified.
In 16-color modes, each plane contains onequarter of each of eight pixels, withthe
4 bits of each pixel spanning all four planes. Not so in Mode X. Look at Figure 47.1
again; each plane contains one pixel in its entirety, with four pixels at any given
address, one perplane. Still, the Map Mask register does the same job in Mode X as
CPU write of valueThe
41 h tooffset 0 inthe
displaymemory
890
Chapter 47
in 16-color modes;set it to OFH (all 1-bits), and all four planes will be written to by
each CPU access. Thus, it would seemthat up to four pixels could be set by a single
Mode X byte-sized writeto display memory, potentially speeding up operations like
rectangle fills by four times.
And, as it turnsout, four-plane parallelism works quite nicely indeed. Listing 47.6 is
yet another rectangle-fill routine, this time using the Map Mask to set up to four
pixels per STOS. The only trick to Listing 47.6
is that any leftor right edge thatisnt
aligned to a multiple-of-four pixel column(that is, a column at which one four-pixel
set ends and the next
begins) must be clippedvia the Map Maskregister, becausenot
all pixels at the address containing the edge are modified. Performance is as expected; Listing 47.6 is nearlyten times fasterat clearing the screen than Listing 47.4
and just about
four times faster than Listing 47.5-and also about fourtimes faster
than the same rectangle fill in mdde 13H. Understanding the bitmap organizztion
and display hardware of Mode X does indeedpay.
Note that the return from Mode Xs parallelism is not always 4x; someadapters lack
the underlying memory bandwidth to writedata thatfast. However, ModeX parallel
access should always be faster than mode 13H access;the only question on any given
adapter is how much faster.
LISTING47.6147-6.ASM
: Mode X (320x240. 256 c o l o r s )r e c t a n g l e
fill r o u t i n e . Workson
: VGAs. Uses f a s ta p p r o a c ht h a tf a n sd a t ao u tt ou pt of o u rp l a n e sa t
: once t o drawup t o f o u r p i x e l s a t o n c e . F i l l s
up t o b u t n o t
: i n c l u d i n gt h ec o l u m na t
EndX and t h er o wa t
: performed.
: C n e a r - c a l l a b l ea s :
:
v o i dF i l l R e c t a n g l e X ( i nSt t a r t Xi.nSt t a r t Yi,n t
unsigned i n t PageBase. i n t C o l o r ) :
equ
equ
SC-INDEX
MAP-MASK
SCREEN-SEG
SCREEN-WIDTH
parms
StartX
StartY
EndX
struc
dw
dw
dw
dw
equ
80
equ
03c4h
02h
OaOOOh
2 dup ( ? )
?
?
?
EndY
dw
PageBase
dw
Color
parms
ends
dw
all
EndY. No c l i p p i n g i s
EndX. i n t EndY.
;Sequence C o n t r o l l e rI n d e x
; i n d e x i n SC o f Map Mask r e g i s t e r
:segment o f d i s p l a y memory i n mode X
: w i d t ho fs c r e e ni nb y t e sf r o mo n es c a nl i n e
: t ot h en e x t
:pushed BP and r e t u r na d d r e s s
; X c o o r d i n a t eo fu p p e rl e f tc o r n e ro fr e c t
: Y c o o r d i n a t eo fu p p e rl e f tc o r n e ro fr e c t
:X c o o r d i n a t eo fl o w e rr i g h tc o r n e ro fr e c t
: ( t h er o wa t
EndX i s n o t f i l l e d )
: Y c o o r d i n a t eo fl o w e rr i g h tc o r n e ro fr e c t
: ( t h e column a t EndY i s n o t f i l l e d )
;base o f f s e t i n d i s p l a y memory o f page i n
: w h i c ht o fill r e c t a n g l e
: c o l o ri nw h i c ht o
draw p i x e l
.model s m a l l
.data
: Planemasks f o r c l i p p i n g l e f t
and r i g h t edges o f r e c t a n g l e .
00fh,00eh.00ch.008h
db
L e f t C l ipP1 aneMask
891
00fh.001h.003h.007h
RightClipPlaneMask
db
.code
pub1 ic JillRectangl eX
J i l l R e c t a n gDl enr X
eo ac r
push
mo v
push
push
c ld
mov
mu1
mov
shr
shr
add
add
mov
mov
,ax
mov
mov
out
inc
mov
and
mov
mov
and
mov
ax.SCREEN-WIDTH
[ b p + S t a r:toYprft lfeoaiosncpgfeteta n sgcl ieanne
di .Cbp+StartXl
:X/4
ofriferfoscftet at snpcgialnxene l
d i .1
: line
d i .1
d i ,ax
:rf oierfcosf tstf aenpti ginxl e l
d i . [ b p + P a g e B a s: oe fl ffsoi erfset tc t a n gpl iexi ne l
: d i s p l a y memory
: p o i n t ES:DI t o t h e f i r s t r e c t a n g l e
ax.SCREEN-SEG
es
a :d dpriexsesl ' s
dx.SC-INDEX
t:hsee t
Sequence C o n t r oI lnlt deo er x
: p ot ithnoet
Map Mask r e g i s t e r
a1 .MAP-MASK
dx.al
: p o i ndtx
t hDXe t o
SCr e D
g iasttae r
s i .Cbp+StartXl
s i ,0003h
up
:look
l e f t plane
edge
mask
bh.LeftClipP1aneMaskCsil : t o c l i p 6 p u t i n BH
s i .Cbp+EndXl
pel adr nisggeeihu,t0:pl0o0o3kh
bl.RightClipP1aneMaskCsil : mask t o c l i p 6 p u t i n BL
mov
mov
CmP
Jle
dec
and
sub
shr
shr
j nz
and
MasksSet:
mov
sub
Jle
mov
mov
sub
dec
FillRowsLoop:
push
mov
out
mov
stosb
dec
Js
Jz
892
Chapter 47
: p r e s e r v ec a l l e r ' ss t a c kf r a m e
: p o i n tt ol o c a ls t a c kf r a m e
: p r e s e r v ec a l l e r ' sr e g i s t e rv a r i a b l e s
cx.Cbp+EndXI
si .Cbp+StartXl
cx.si
F i 1Done
1
cx
si .not Ollb
cx.si
cx.1
cx.1
MasksSet
bh,bl
s i ,Cbp+EndYI
si .Cbp+StartYl
F i 1Done
1
a h . b y t ep t r[ b p + C o l o r l
bp.SCREEN-WIDTH
bp.cx
bp
cx
a1 ,bh
dx.al
a1 ,ah
cx
F i 11 LoopBottom
DoRightEdge
page
: c a l c u l a t e # o fa d d r e s s e sa c r o s sr e c t
:skip i f 0 o rn e g a t i v ew i d t h
fill - 1
: t h e r e ' sm o r et h a no n eb y t et od r a w
: t h e r e ' so n l yo n eb y t e ,
so c o m b i n e t h e l e f t : a n dr i g h t - e d g ec l i p
masks
:# o fa d d r e s s e sa c r o s sr e c t a n g l et o
heightofrectangle
:BX
: s k i p i f 0 o rn e g a t i v eh e i g h t
fill
:colorwithwhichto
: s t a c kf r a m ei s n ' tn e e d e da n ym o r e
: d i s t a n c ef r o me n do fo n es c a nl i n et os t a r t
: o fn e x t
:remember w i d t hi na d d r e s s e s
- 1
:putleft-edgeclip
mask i n AL
: s e tt h el e f t - e d g ep l a n e( c l i p )
mask
:putcolorin
AL
: d r a wt h el e f te d g e
:countoffleft
edge b y t e
: t h a t ' st h eo n l yb y t e
: t h e r ea r eo n l yt w ob y t e s
Previous
mov
out
mov
rep
OoRightEdge:
mov
out
mov
stosb
F i 11 LoopBottom:
add
pop
POP
; m i d d l ea d d r e s s e sa r ed r a w n
4 pixelsat
; s e tt h em i d d l ep i x e l
mask t o no c l i p
; p u tc o l o r
i n AL
; d r a wt h em i d d l ea d d r e s s e sf o u rp i x e l sa p i e c e
a1 . b l
dx,al
a1 .ah
: p u tr i g h t - e d g ec l i p
mask i n AL
: s e tt h er i g h t - e d g ep l a n e( c l i p )
; p u tc o l o ri n
AL
; d r a wt h er i g h te d g e
d i bp
cx
si
F i Rows
1
POP
dec
j nz
F i 11 Done:
pop
a1 .OOfh
dx.al
a1 ,ah
stosb
Home
a pop
mask
:pointtothestart
o f t h en e x ts c a nl i n eo f
: t h er e c t a n g l e
; r e t r i e v ew i d t hi na d d r e s s e s
- 1
: c o u n t down s c a nl i n e s
Loop
di
si
bp
; r e s t o r ec a l l e r sr e g i s t e rv a r i a b l e s
; r e s t o r ec a l l e r ss t a c kf r a m e
ret
-Fi 11 RectangleX endp
end
Just so you can see Mode X in action, Listing 47.7 is a sample program that selects
Mode X and draws a numberof rectangles. Listing 47.7 links to any of the rectangle
fill routines Ive presented.
And now, I hope, youre beginning to see why Im so fond of Mode X. In the next
chapter, well continue with Mode X by exploring thewonders that the latches and
parallel plane hardware can work on scrolls, copies, blits, and pattern fills.
LISTING 47.7
/*
L47-7.C
Program t od e m o n s t r a t e mode X ( 3 2 0 x 2 4 0 .2 5 6 - c o l o r s )r e c t a n g l e
fill b yd r a w i n ga d j a c e n t2 0 x 2 0r e c t a n g l e si ns u c c e s s i v ec o l o r sf r o m
0 onupacrossand
down t h es c r e e n * /
# i n c l u d e < c o n i o . h>
#include <dos. h >
893
Next
Previous
chapter 487
Home
Next
!I%
,:\,
6F
&&
897
.f
7
+
All four
latches are
loaded from
the corresponding planesby every
display memory read
P
How the VGA latches are loaded.
Figure 48.1
Figure 48.2
898
Chapter 48
Mode X ( 3 2 0 x 2 4 0 .2 5 6c o l o r s )p a t t e r n e d
r e c t a n g l e f i l l s by f i l l i n g t h e s c r e e n w i t h a d j a c e n t 8 0 x 6 0
C++
r e c t a n g l e s i n a v a r i e t y o f p a t t e r n s .T e s t e dw i t hB o r l a n d
i n C c o m p i l a t i o n mode a n dt h es m a l m
l odel
*/
# i n c l u d e< c o n i o . h >
# i n c ude
l
<dos. h>
v o i dS e t 3 2 0 ~ 2 4 0 M o d e ( v o i d ) :
v o i dF i l l P a t t e r n X ( i n t .i n t .i n t .i n t .u n s i g n e di n t .c h a r * ) :
/ * 16 4 x 4p a t t e r n s * /
s t a t i cc h a rP a t t 0 [ 1 = ~ 1 0 . 0 . 1 0 , 0 , 0 . 1 0 . 0 . 1 0 . 1 0 . 0 , 1 0 . 0 . 0 , 1 0 , 0 , 1 0 ~ :
s t a t i cc h a r P a t t l [ 1 - ( 9 . 0 . 0 . 0 , 0 , 9 . 0 . 0 . 0 . 0 , 9 . 0 . 0 , 0 , 0 . 9 1 ;
s t a t i cc h a rP a t t 2 [ ] = ~ 5 , 0 . 0 . 0 , 0 , 0 , 5 , 0 , 5 , 0 , 0 . 0 , 0 , 0 , 5 , 0 ~ :
Patt3[]=~14,0.0,14,0.14.14.0.0.14.14.0.14~0~0~141:
s t a t i cc h a r
s t a t i cc h a r
Patt4~]=(15.15,15,1.15.15.1.1.15.1.1.1.1~1,1,1~;
s t a t i cc h a r
Patt5[1=~12.12.12.12.6.6.6.12.6.6.6.12.6~6,6,121:
s t a t i c c h a r Patt6[1=~80.80.80.E0,80,80,80,80,80,80,80,E0,80,80,80,15~:
s t a t i cc h a r P a t t 7 [ ] - I 7 8 . 7 8 . 7 8 . 7 8 . 8 0 . 8 0 . 8 0 . 8 0 . 8 2 . 8 2 . 8 2 , 8 2 , 8 4 , E 4 , 8 4 , 8 4 ) :
s t a t i c c h a r Patt8[1=~78.80,82,84.80.82.84.78,84,78,82,84,78,80,84,78,80~E2~;
s t a t i cc h a r
Patt9[1=~78.80.82,84.78,80,82,84.78,80,82,84,78,80,82,84~:
s t a t i cc h a r
Patt10[]-(0.1.2.3.4,5.6.7.8.9.10.11.12.13,14,151:
s t a t i cc h a r P a t t 1 1 [ 1 - ~ 0 . 1 . 2 , 3 , 0 , 1 , 2 , 3 , 0 , 1 . 2 . 3 , 0 , 1 , 2 , 3 1 :
s t a t i cc h a rP a t t 1 2 [ 1 = [ 1 4 . 1 4 , 9 , 9 , 1 4 ~ 9 , 9 , 1 4 . 9 . 9 . 1 4 . 1 4 . 9 , 1 4 ~ 1 4 , 9 1 :
s t a t i c c h a r Patt13[]-[15.8.8.8,15.15.15.8,15,15,15,8,15,8,8,E~:
899
s t a t i c c h a r Patt14[]-{3,3,3.3.3.7.7.3.3.7.7.3.3.3.3.3);
s t a t i c c h a r Pattl5[l-~O.O.O.O.O.64.0,0.0.0.0.0.0.0.0,89~;
/* T a b l e o f p o i n t e r s t o t h e 1 6 4 x 4 p a t t e r n s w i t h w h i c h t o d r a w
*/
s t a t i cc h a r *P a t t T a b l e C l
(PattO.Pattl.Patt2.Patt3.Patt4.Patt5.Patt6,
Patt7,Patt8.Patt9.PattlO.Pattll.Pattl2.Pattl3,Pattl4,Pattl5~;
v o i dm a i n 0
{
i n t i.j;
u n i o n REGS r e g s e t ;
-FillPatternX~i*80.j*60,i*80+8O,j*6O+6O,O,PattTable~j*4+il~;
Set320x240ModeO;
for (j
0 ; j < 4; j++){
f o r (i 0; i < 4; i++)
(
1
}
getch( ) ;
regset.x.ax
0x0003:
/ * s w i t c hb a c kt ot e x t
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) ;
mode anddone
*/
LISTING 48.2
L48-2.ASM
Mode X ( 3 2 0 x 2 4 0 .2 5 6c o l o r s )r e c t a n g l e4 x 4p a t t e r n
fill r o u t i n e .
U p p e r - l e f tc o r n e ro fp a t t e r ni sa l w a y sa l i g n e dt o
a multiple-of-4
rowandcolumn.Workson
a l l VGAs. Usesapproach
o fc o p y i n gt h e
patterntooff-screendisplay
m e m o r y ,t h e nl o a d i n gt h el a t c h e sw i t h
t h ep a t t e r nf o re a c hs c a nl i n ea n df i l l i n ge a c hs c a nl i n ef o u r
p i x e l s a t a t i m e .F i l l su pt ob u tn o ti n c l u d i n gt h ec o l u m na t
EndX
A
l ASM c o d et e s t e d
a n dt h er o wa t
EndY. No c l i p p i n gi sp e r f o r m e d .
w i t h TASM. C n e a r - c a l l a b l ea s :
v o i dF i l l P a t t e r n X ( i n tS t a r t X .i n tS t a r t Y .i n t
u n s i g n e di n tP a g e B a s e .c h a r *P a t t e r n ) ;
SC-INDEX
MAP-MASK
GC-INDEX
BIT-MASK
PATTERN-BUFFER
03c4h
equ
02h
equ
03ceh
equ
08h
equ
O
e qf fuf c h
SCREEN-SEG
SCREEN-WIDTH
equ
80
equ
OaOOOh
EndX. i n t EndY.
; S e q u e n c eC o n t r o l l e rI n d e xr e g i s t e rp o r t
; i n d e x i n SC o f Map Mask r e g i s t e r
: G r a p h i c sC o n t r o l l e rI n d e xr e g i s t e rp o r t
; i n d e x i n GC o f B i t Mask r e g i s t e r
; o f f s e ti ns c r e e n
memory o f t h e b u f f e r u s e d
; t os t o r ee a c hp a t t e r nd u r i n gd r a w i n g
;segment o f d i s p l a y memory i n Mode X
; w i d t ho fs c r e e ni na d d r e s s e sf r o mo n es c a n
; linetothenext
psatrrm
u cs
StartX
StartY
EndX
dw
dw
dw
dw
EndY
dw
PageBase
dw
Pattern
parms
ends
dw
2 dup ( ? )
NextScanOffset
RectAddrWidth
Height
STACK-FRAMELSIZE
equ
900
Chapter 48
;pushed BP a n dr e t u r na d d r e s s
;X c o o r d i n a t e o f u p p e r l e f t c o r n e r o f r e c t
:Y c o o r d i n a t e o f u p p e r l e f t c o r n e r o f r e c t
;X c o o r d i n a t eo fl o w e rr i g h tc o r n e ro fr e c t
; ( t h er o wa t
EndX i s n o t f i l l e d )
;Y c o o r d i n a t e o f l o w e r r i g h t c o r n e r o f r e c t
; ( t h ec o l u m na t
EndY i s n o t f i l l e d )
; b a s eo f f s e ti nd i s p l a y
memory o f page i n
; w h i c ht o fill r e c t a n g l e
fill r e c t a n g l e
; 4 x 4p a t t e r nw i t hw h i c ht o
; l os ct oadrlifasoogtrnaeofenrfonc m
de
; s c a nl i n et os t a r to fn e x t
-6
; l o c a ls t o r a g ef o rh e i g h to fr e c t a n g l e
.model
small
.data
: P l a n em a s k sf o rc l i p p i n gl e f ta n dr i g h te d g e so fr e c t a n g l e .
0 0i pf hPL.le0adf0ntbeC
ehM
l . 0a 0s kc h . 0 0 8 h
0R
0 fi gh h. 0t C
0 1l ihpd.Pb0l0a3nhe.M
0 0a7s hk
.code
p u b l-i Fc i l l P a t t e r n X
_ F1i P
1 atternX
p r o nc e a r
: p r e s e r v ec a l l e r ' ss t a c kf r a m e
push
bp
; p o i n tt ol o c a ls t a c kf r a m e
mov
bp.sp
: a l l o c a t es p a c ef o rl o c a lv a r s
sub
sp,STACK_FRAMELSIZE
: p r e s e r v ec a l l e r ' sr e g i s t e rv a r i a b l e s
push
si
push
di
c ld
mov
mov
ax.SCREEN-SEG
es.ax
si .[bp+Patternl
mov
di.PATTERN-BUFFER
mov
mov
dx.SC-INDEX
a1 ,MAP..MASK
mov
out
dx.al
in c
dx
mov
cx .4
DownloadPatternLoop:
mov
al.1
out
dx,al
movsb
dec
di
mov
a1 .2
out
dx.al
movsb
dec
di
a1 , 4
mov
dx,al
out
movsb
dec
di
al.8
mov
dx,al
out
movsb
1 oop
DownloadPatternLoop
mov
mov
out
dx.GC_INDEX
ax.OOOOOh+BIT-MASK
dx, ax
mov
mov
and
add
ax,Cbp+StartYl
si ,ax
si .Ollb
s i ,PATTERNCBUFFER
mov
mu1
mov
mov
shr
shr
add
dx.SCREEN-WIDTH
dx
di .[bp+StartXl
bx,di
d i .1
d i .1
d i ,ax
: p o i n t ES t o d i s p l a y memory
: c o p yp a t t e r nt od i s p l a y
memory b u f f e r
; p o i n tt op a t t e r nt o
fill w i t h
: p o i n t ES:OI t op a t t e r nb u f f e r
: p o i n tS e q u e n c eC o n t r o l l e rI n d e xt o
: Map Mask
: p o i n t t o SC D a t ar e g i s t e r
: 4p i x e lq u a d r u p l e t si np a t t e r n
0 forwrites
: s e l e c tp l a n e
0 p a t t e r np i x e l
: c o p yo v e rn e x tp l a n e
; s t a y a t same a d d r e s sf o rn e x tp l a n e
1 f o rw r i t e s
: s e l e c tp l a n e
1 p a t t e r np i x e l
: c o p yo v e rn e x tp l a n e
: s t a ya t same a d d r e s sf o rn e x tp l a n e
: s e l e c tp l a n e
2 forwrites
2 p a t t e r np i x e l
: c o p yo v e rn e x tp l a n e
; s t a y a t same a d d r e s sf o rn e x tp l a n e
3 forwrites
; s e l e c tp l a n e
: c o p yo v e rn e x tp l a n e
3 p a t t e r np i x e l
: andadvanceaddress
: s e tt h eb i t
mask t o s e l e c t a l l b i t s
: f r o mt h el a t c h e sa n dn o n ef r o m
: t h e CPU. s o t h a t we c a n w r i t e t h e
: l a t c hc o n t e n t sd i r e c t l yt o
memory
: t o pr e c t a n g l es c a nl i n e
1i n em o d u l o 4
:toprectscan
: p o i n tt op a t t e r ns c a nl i n et h a t
: maps t o t o p l i n e o f r e c t
t o draw
; o f f s e ti np a g eo ft o pr e c t a n g l es c a nl i n e
;X/4
o f f s e to ff i r s tr e c t a n g l ep i x e li ns c a n
: line
: o f f s e t o f f i r s tr e c t a n g l ep i x e li np a g e
901
add
di.[bp+PageBasel
; o f f s e to ff i r s tr e c t a n g l ep i x e li n
; d i s p l a y memory
and
mov
mov
and
mov
mov
bx.[bp+EndX]
bx.0003h
a1 . R i g h t C l i p P l a n e M a s k [ b x ]
bx,ax
mov
mov
CmP
jle
dec
and
sub
shr
shr
jnz
and
cx.[bp+EndXl
ax.[bp+StartXl
cx,ax
F i 11 Done
cx
a x . n o tO l l b
cx, ax
cx.1
cx.1
MasksSet
bh,bl
bx ,0003h
;lookupleft
ah.LeftClipPlaneMaskCbx1
; toclip
MasksSet:
mov
sub
jle
mov
mov
sub
dec
mov
mov
mov
ax,[bp+EndY]
ax.[bp+StartYl
F i 11 Done
[bp+Heightl,ax
ax.SCREEN-WIDTH
ax, cx
ax
[bp+NextScanOffset] ,ax
[bp+RectAddrWidth].cx
dx.SC-INDEX+l
FillRowsLoop:
mov
mov
cx,[bp+RectAddrWidthl
al.es:[sil
in c
jnz
sub
si
s h o r t NoWrap
s i .4
mov
out
stosb
a1 ,bh
dx.al
dec
js
jz
mov
out
rep
cx
Fi 11 LoopBottom
OoRightEdge
a1 .OOfh
dx.al
stosb
edgeplanemask
; l o o k up r i g h te d g ep l a n e
; mask t o c l i p
; p u tt h e masks i n BX
;calculate
o fa d d r e s s e sa c r o s sr e c t
; s k i p i f 0 o rn e g a t i v ew i d t h
fill - 1
; t h e r e ' sm o r et h a no n ep i x e lt od r a w
s o c o m b i n et h el e f t
; t h e r e ' so n l yo n ep i x e l ,
; a n dr i g h t - e d g ec l i p
masks
; # o fa d d r e s s e sa c r o s sr e c t a n g l et o
;AX
h e i g h to fr e c t a n g l e
; s k i p i f 0 o rn e g a t i v eh e i g h t
; d i s t a n c ef r o me n do fo n es c a nl i n et os t a r t
; o fn e x t
- 1
remember w i d t hi na d d r e s s e s
p o i n t t o S e q u e n c eC o n t r o l l e rD a t ar e g
(SC I n d e x s t i l l p o i n t s t o
Map Mask)
- 1
w i d t ha c r o s s
r e a dd i s p l a y memory t o l a t c h t h i s s c a n
l i n e ' sp a t t e r n
p o i n tt ot h en e x tp a t t e r ns c a nl i n e .w r a p p i n g
; b a c kt ot h es t a r to ft h ep a t t e r n
if
; w e ' v er u no f ft h ee n d
NoWrap:
OoRightEdge:
mov
out
stosb
a1 ,b l
dx.al
Fi 11 LoopBottom:
d i , [ b p +a N
d de x t S c a n O f f s e t l
902
Chapter 48
;putleft-edgeclip
mask i n AL
; s e tt h el e f t - e d g ep l a n e( c l i p )
mask
; d r a wt h el e f te d g e( p i x e l s
come f r o ml a t c h e s ;
; v a l u ew r i t t e nb y
CPU d o e s n ' tm a t t e r )
; c o u n to f fl e f te d g ea d d r e s s
; t h a t ' st h eo n l ya d d r e s s
; t h e r ea r eo n l yt w oa d d r e s s e s
4 p i x e l s a t a pop
; m i d d l ea d d r e s s e sa r ed r a w n
; s e tt h em i d d l ep i x e l
mask t o no c l i p
; d r a wt h em i d d l ea d d r e s s e sf o u rp i x e l sa p i e c e
; ( f r o ml a t c h e s ;v a l u ew r i t t e nd o e s n ' tm a t t e r )
; p u tr i g h t - e d g ec l i p
mask i n AL
; s e tt h er i g h t - e d g ep l a n e( c l i p )
mask
; d r a wt h er i g h te d g e( f r o ml a t c h e s ;v a l u e
; w r i t t e nd o e s n ' tm a t t e r )
; p o i n tt ot h es t a r to ft h en e x ts c a n
; l i n eo ft h er e c t a n g l e
dw[epobctrprd+ H e i g h t l
F iRowsLoop
11
F i 11 Done:
mov
dx.GC-INDEX+l
mov
a1 . O f f h
dx.al out
: c o u n t downscan
lines
jnz
di
pop
pop
si
mov
sp.bp
POP
bP
ret
-Fill PatternX endp
end
: r e s t o r et h eb i t
mask t o i t s d e f a u l t ,
: w h i c hs e l e c t sa l lb i t sf r o mt h e
CPU
: a n dn o n ef r o mt h el a t c h e s( t h e
GC
: I n d e xs t i l lp o i n t st o
B i t Mask)
: r e s t o r ec a l l e r sr e g i s t e rv a r i a b l e s
: d i s c a r ds t o r a g ef o rl o c a lv a r i a b l e s
: r e s t o r ec a l l e r ss t a c kf r a m e
Offset O
Offset 1 9200
Offset 38400
Offset 57600
Offset 65532
904
Chapter 48
Although copying through the latches is, in general, a speedy technique, especially on slower VGAs, it 5 not always a win. Reading video memory tends to be
quite a bit slower than writing, and on a fast VLB or PCI adaptel; it can befaster
to copyfrom main memoryto display memory thanit is to copyfrom display memory
to display memory via the latches.
LISTING 48.3
L48-3.ASM
: Mode X ( 3 2 0 x 2 4 0 ,2 5 6c o l o r s )d i s p l a y
memory t o d i s p l a y memory c o p y
r o u t i n e .L e f te d g eo fs o u r c er e c t a n g l em o d u l o
4 m u s te q u a l
l e f t edge
o fd e s t i n a t i o nr e c t a n g l em o d u l o
4. Workson
a l l VGAs. Usesapproach
o f r e a d i n g 4 p i x e l s a t a t i m ef r o mt h es o u r c ei n t ot h el a t c h e s ,t h e n
w r i t i n gt h el a t c h e st ot h ed e s t i n a t i o n .C o p i e su pt ob u tn o t
: i n c l u d i n gt h ec o l u m na tS o u r c e E n d Xa n dt h er o wa tS o u r c e E n d Y .
No
: c l i p p i n gi sp e r f o r m e d .R e s u l t sa r en o tg u a r a n t e e d
i f t h es o u r c ea n d
: d e s t i n a t i o no v e r l a p .
C n e a r - c a l l a b l ea s :
:
:
:
:
v oC
i do p y S c r e e n T o S c r e e n X ( i nS to u r c e S t a r t Xi nS. to u r c e S t a r t Y .
i n t SourceEndX. i n t SourceEndY. i n tD e s t S t a r t X .
i n tD e s t S t a r t Y .u n s i g n e di n tS o u r c e P a g e B a s e .
u n s i g n e di n tD e s t P a g e B a s e .i n tS o u r c e B i t m a p W i d t h ,
i n tD e s t B i t m a p W i d t h ) :
SC-INDEX
MAP-MASK
GC-INDEX
B I T-MAS K
SCREENKSEG
03c4h
equ
02h
equ
03ceh
equ
08h
equ
equ
OaOOOh
: S e q u e n c eC o n t r o l l e rI n d e xr e g i s t e rp o r t
: i n d e x i n SC o f Map Mask r e g i s t e r
: G r a p h i c sC o n t r o l l e rI n d e xr e g i s t e rp o r t
: i n d e x i n GC o f B i t Mask r e g i s t e r
:segment o f d i s p l a y memory i n Mode X
psatrrm
u cs
SourceStartX
SourceStartY
SourceEndX
dw
dw
dw
dw
2 dup ( ? )
?
?
SourceEndY
dw
DestStartX
DestStartY
SourcePageBase
dw
dw
dw
?
?
?
DestPageBase
dw
:pushed B P a n dr e t u r na d d r e s s
: X c o o r d i n a t eo fu p p e r - l e f tc o r n e ro fs o u r c e
: Y c o o r d i n a t eo fu p p e r - l e f tc o r n e ro fs o u r c e
: X c o o r d i n a t eo fl o w e r - r i g h tc o r n e ro fs o u r c e
: ( t h er o wa tS o u r c e E n d Xi sn o tc o p i e d )
: Y c o o r d i n a t eo fl o w e r - r i g h tc o r n e ro fs o u r c e
: ( t h ec o l u m na tS o u r c e E n d Yi sn o tc o p i e d )
: X c o o r d i n a t eo fu p p e r - l e f tc o r n e ro fd e s t
:Y c o o r d i n a t eo fu p p e r - l e f tc o r n e ro fd e s t
: b a s eo f f s e ti nd i s p l a y
memory o f page i n
: w h i c hs o u r c er e s i d e s
: b a s eo f f s e ti nd i s p l a y
memory o f p a g e i n
; w h i c hd e s tr e s i d e s
905
SourceBitmapWidth
dw
DestBitmapWidth
dw
:#o fp i x e l sa c r o s ss o u r c eb i t m a p
: (mustbe
a m u l t i p l eo f4 )
:# o f p i x e l s a c r o s s d e s t b i t m a p
: ( m u s t be a m u l t i p l e o f 4 )
parms
ends
-2
S o u r c e N e x t S c a n Oef fqsue t
: l o c a ls t o r a g ef o rd i s t a n c ef r o me n do f
: o n es o u r c es c a nl i n et os t a r to fn e x t
D e s t N e x t S ceaqnuO f f s e t
equ
RectAddrWidth
Height
STACK-FRAME-SIZE
-4
-6
equ
equ
-8
8
: l o c a ls t o r a g ef o rd i s t a n c ef r o me n do f
; one d e s t s c a n l i n e t o s t a r t o f n e x t
; l o c a ls t o r a g ef o ra d d r e s sw i d t ho fr e c t a n g l e
; l o c a ls t o r a g ef o rh e i g h to fr e c t a n g l e
.model
small
.data
: P l a n em a s k sf o rc l i p p i n g
l e f t and r i g h te d g e so fr e c t a n g l e .
L e f t Cdl ibp P l a n e M a s k
00fh.00eh.00ch.008h
R i g h t C l i pdPbl a n e M a s k
00fh,001h.O03h,007h
.code
p u b Jl iocp y S c r e e n T o S c r e e n X
-CopyScreenToScreenX
proc
near
: p r e s e r v ec a l l e r ' ss t a c kf r a m e
push
bP
: p o i n tt ol o c a ls t a c kf r a m e
mov
bP.SP
sub
: a l l o c a t es p a c ef o rl o c a lv a r s
sp.STACK-FRAME-SIZE
si
: p r e s e r v ec a l l e r ' sr e g i s t e rv a r i a b l e s
push
di
push
ds
push
cld
mov
mov
out
mov
mov
mov
shr
shr
mu1
mov
shr
shr
add
add
ax.SCREEN-SEG
e s ,ax
ax,[bp+DestBitmapWidth]
ax.1
ax.1
[bp+DestStartYl
di.Cbp+DestStartXl
d i .1
d i .1
d i ,ax
di.[bp+DestPageBasel
: i n d i s p l a y memory
mov
shr
shr
mu1
mov
mov
shr
shr
add
add
ax.[bp+SourceBitmapWidthl
and
mov
mov
906
dx.GC-INDEX
ax.OOOOOh+BIT-MASK
dx.ax
Chapter 48
ax, 1
ax, 1
[bp+SourceStartY]
si,[bp+SourceStartXl
bx.si
s i .1
s i .1
s i ,ax
si.[bp+SourcePageBase]
bx.0003h
ah,LeftClipPlaneMask[bxl
bx.[bp+SourceEndX]
; s e tt h eb i t
mask t o s e l e c t a l l b i t s
: f r o mt h el a t c h e sa n dn o n ef r o m
: t h e CPU. s o t h a t we c a n w r i t e t h e
: l a t c hc o n t e n t sd i r e c t l yt o
; p o i n t ES t o d i s p l a y memory
memory
: c o n v e r tt ow i d t hi na d d r e s s e s
; t o pd e s tr e c ts c a nl i n e
:X/4
- offsetoffirstdestrectpixelin
: s c a nl i n e
; o f f s e to ff i r s td e s tr e c tp i x e li np a g e
; o f f s e to ff i r s td e s tr e c tp i x e l
: c o n v e r tt ow i d t hi na d d r e s s e s
: t o ps o u r c er e c ts c a nl i n e
;X/4
- offsetoffirstsourcerectpixelin
: s c a nl i n e
; o f f s e to ff i r s ts o u r c er e c tp i x e li np a g e
:offsetoffirstsourcerect
: p i x e li nd i s p l a y
memory
: l o o ku pl e f te d g ep l a n e
mask
: toclip
and
mov
mov
bx.0003h
a1 . R i g h t C l i p P l a n e M a s k [ b x l
bx,ax
: l o o ku pr i g h t - e d g ep l a n e
; mask t o c l i p
: p u t h em a s k s
i n BX
mov
mov
cmp
jle
dec
and
sub
shr
shr
j nz
and
cx.[bp+SourceEndX]
ax.[bp+SourceStartXl
cx.ax
CopyDone
cx
ax.not O l l b
cx.ax
cx.1
cx.1
MasksSet
bh, bl
:calculate
; rect
MasksSet:
mov
sub
jle
mov
mov
shr
shr
sub
dec
mov
mov
shr
shr
sub
dec
mov
mov
ax,[bp+SourceEndYI
ax.[bp+SourceStartYl
CopyDone
[bp+Heightl.ax
ax.[bp+DestBitmapWidthl
ax, 1
ax, 1
ax.cx
ax
[bp+DestNextScanOffsetl.ax
ax.[bp+SourceBitmapWidthl
ax.1
ax.1
ax.cx
ax
[bp+SourceNextScanOffsetl.ax
C
- b. o + R e c t A d d r W i d t h l . c x
BUG F I X
mov
dx.SC-INDEX
a1 .MAPKMASK
mov
out
dx.al
in c
dx
.....- - BUG F I X
mov
ax, es
mov
ds, ax
CopyRowsLoop:
mov
cx.[bp+RectAddrWidthl
mov
a1 .bh
out
dx.al
movsb
."""""""""""
# o fa d d r e s s e sa c r o s s
; s k i p i f 0 o rn e g a t i v ew i d t h
:# o fa d d r e s s e sa c r o s sr e c t a n g l et oc o p y
: t h e r e ' sm o r et h a no n ea d d r e s st od r a w
s o c o m b i n et h e
; t h e r e ' so n l yo n ea d d r e s s ,
; l e f t - a n dr i g h t - e d g ec l i p
masks
:AX
h e i g h to fr e c t a n g l e
; s k i p i f 0 o rn e g a t i v eh e i g h t
: c o n v e r tt ow i d t hi na d d r e s s e s
; d i s t a n c ef r o me n do fo n ed e s ts c a nl i n et o
; s t a r to fn e x t
; c o n v e r tt ow i d t hi na d d r e s s e s
; d i s t a n c ef r o me n do fo n es o u r c es c a nl i n et o
: s t a r to fn e x t
;remember w i d t hi na d d r e s s e s
- I
~~
...."""""
dec
js
jz
mov
out
rep
DoRightEdge:
mov
out
movsb
cx
CopyLoopBottom
DoRightEdge
a1 .OOfh
dx.al
movsb
a1 ,b l
dx,al
: p o i n t SC I n d e x r e g t o
: p o i n t t o SC D a t ar e g
;DS-ES-screensegment
Map Mask
f o r MOVS
- 1
: w i d t ha c r o s s
;putleft-edgeclip
mask i n AL
; s e tt h el e f t - e d g ep l a n e( c l i p )
mask
; c o p yt h el e f te d g e( p i x e l s
g ot h r o u g h
: latches)
;countoffleft
edgeaddress
: t h a t ' st h eo n l ya d d r e s s
; t h e r ea r eo n l yt w oa d d r e s s e s
4 pixelsat
; m i d d l ea d d r e s s e sa r ed r a w n
; s e tt h em i d d l ep i x e l
mask t o no c l i p
; d r a wt h em i d d l ea d d r e s s e sf o u rp i x e l sa p i e c e
; ( p i x e l sc o p i e dt h r o u g hl a t c h e s )
a pop
; p u tr i g h t - e d g ec l i p
mask i n AL
; s e tt h er i g h t - e d g ep l a n e( c l i p )
mask
; d r a wt h er i g h te d g e( p i x e l sc o p i e dt h r o u g h
; latches)
Mode X MarkstheLatch
907
CopyLoopBottom:
add
add
word
dec
jnz
CopyDone:
mov
mov
dx.al out
POP
pop
pop
mov
POP
si.[bp+SourceNextScanOffset]
di,[bp+DestNextScanOffsetl
[pbtpr + H e i g h t l
CopyRowsLoop
dx.GC-INDEX+l
a1 . O f f h
ds
di
si
sp.bp
bp
: p o i n tt ot h es t a r to f
& d e s tl i n e s
; c o u n t down s c a nl i n e s
: n e x ts o u r c e
: r e s t o r et h eb i t
mask t o i t s d e f a u l t ,
CPU
GC
: I n d e xs t i l lp o i n t st o
B i t Mask)
: w h i c hs e l e c t sa l lb i t sf r o mt h e
: a n dn o n ef r o mt h el a t c h e s( t h e
; r e s t o r ec a l l e r sr e g i s t e rv a r i a b l e s
: d i s c a r ds t o r a g ef o rl o c a lv a r i a b l e s
: r e s t o r ec a l l e r ss t a c kf r a m e
ret
-CopyScreenToScreenXendp
end
908
Chapter 48
LISTING 48.4L48-4.ASM
: Mode X ( 3 2 0 x 2 4 0 .2 5 6c o l o r s )s y s t e m
memory t o d i s p l a y memorycopy
: r o u t i n e . U s e sa p p r o a c ho fc h a n g i n gt h ep l a n ef o re a c hp i x e lc o p i e d ;
: t h i si ss l o w e rt h a nc o p y i n ga l lp i x e l si no n ep l a n e ,t h e na l lp i x e l s
: i nt h en e x tp l a n e ,a n d
so o n ,b u t
i t i ss i m p l e r ;b e s i d e s ,i m a g e sf o r
: w h i c hp e r f o r m a n c ei sc r i t i c a ls h o u l db es t o r e di no f f - s c r e e n
memory
: a n dc o p i e d
t ot h es c r e e nv i at h el a t c h e s .C o p i e su pt ob u tn o t
; i n c l u d i n gt h ec o l u m na tS o u r c e E n d Xa n dt h er o wa tS o u r c e E n d Y .
: c l i p p i n gi sp e r f o r m e d .
;
No
C n e a r - c a l l a b l ea s :
voC
i do p y S y s t e m T o S c r e e n X ( i nSto u r c e S t a r t Xi nS. to u r c e S t a r t Y .
i n t SourceEndX. i n t SourceEndY. i n tD e s t S t a r t X .
i n tD e s t S t a r t Y .c h a r *S o u r c e P t r .u n s i g n e di n tD e s t P a g e B a s e .
i n tS o u r c e B i t m a p W i d t h .i n tO e s t B i t m a p W i d t h ) ;
SC-INDEX
MAP-MASK
SCREEN-SEG
03c4h
equ
02h
equ
equ
OaOODh
: S e q u e n c eC o n t r o l l e rI n d e xr e g i s t e rp o r t
; i n d e x i n SC o f Map Mask r e g i s t e r
:segment o f d i s p l a y memory i n Mode X
p sa trrmu sc
SourceStartX
SourceStartY
SourceEndX
dw
dw
dw
dw
2 dup ( ? )
?
?
?
SourceEndY
dw
DestStartX
DestStartY
SourcePtr
dw
dw
dw
DestPageBase
dw
SourceBitmapWidth
DestBitmapWidth
dw
dw
;pushed BP and r e t u r na d d r e s s
:X c o o r d i n a t eo fu p p e r - l e f tc o r n e ro fs o u r c e
:Y c o o r d i n a t e o f u p p e r - l e f t c o r n e r o f s o u r c e
: X c o o r d i n a t eo fl o w e r - r i g h tc o r n e ro fs o u r c e
: ( t h er o wa t
EndX i s n o t c o p i e d )
: Y c o o r d i n a t eo fl o w e r - r i g h tc o r n e ro fs o u r c e
: ( t h ec o l u m na t
EndY i s n o t c o p i e d )
;X c o o r d i n a t eo fu p p e r - l e f tc o r n e ro fd e s t
;Y c o o r d i n a t eo fu p p e r - l e f tc o r n e ro fd e s t
; p o i n t e r i n DS t o s t a r t o f b i t m a p i n w h i c h
: s o u r c er e s i d e s
i n d i s p l a y memory o f page i n
; b a s eo f f s e t
; w h i c hd e s tr e s i d e s
;# o fp i x e l sa c r o s ss o u r c eb i t m a p
;# o fp i x e l sa c r o s sd e s tb i t m a p
: (mustbe a m u l t i p l e o f 4 )
parms
ends
equ
equ
RectWidth
LeftMask
STACK-FRAME-SIZE
-2
-4
equ
; l o c a ls t o r a g ef o rw i d t ho fr e c t a n g l e
; l o c a ls t o r a g ef o rl e f tr e c te d g ep l a n e
mask
.model
smal
1
.code
p u b -l iCco p y S y s t e m T o S c r e e n X
-CopySystemToScreenX
proc
near
bp push
mov
bp.sp
sub
sp.STACK-FRAMELSIZE
s ip u s h
d ip u s h
;preserve
:pointto
:allocate
:preserve
c a l l e r ' ss t a c kf r a m e
l o c a ls t a c kf r a m e
s p a c ef o rl o c a lv a r s
c a l l e r ' sr e g i s t e rv a r i a b l e s
909
cld
mov
mov
mov
mu1
add
add
mov
ax.SCREEN_SEG
es ,ax
[bp+SourceStartYl
ax.[bp+SourceStartX]
ax.[bp+SourcePtr]
s i ,ax
mov
shr
shr
mov
mu1
mov
mov
shr
shr
add
add
ax.[bp+DestBitmapWidth]
ax.1
ax.1
[bp+DestBitmapWidth].ax
[bp+DestStartYl
di.[bp+DestStartX]
cx,di
d i .1
d i .1
di ,ax
di.Cbp+DestPageBase]
and
mov
cl .Ollb
a1 . l l h
shl
mov
a1 . c l
[bp+LeftMask] .a1
1 oop
POP
add
CopyScanLineLoop
di
di.[bp+DestBitmapWidthl
POP
add
si
dec
jnz
bx
CopyRowsLoop
Chapter 48
memory
ax,Cbp+SourceBitmapWidth]
mov
cx.[bp+SourceEndX1
sub
cx.[bp+SourceStartX]
jle
CopyDone
mov
[bp+RectWidthl.cx
mov
bx.[bp+SourceEndY]
sub
bx.[bp+SourceStartY]
jle
CopyDone
dx.SC-INDEX
mov
mov
a1 .MAP-MASK
dx.al
out
inc
dx
CopyRowsLoop:
mov
ax,[bp+LeftMask]
mov
cx.[bp+RectWidthl
push
si
push
di
CopyScanLineLoop:
out
dx.al
movsb
al.l
rol
cmc
sbb
d i .O
910
: p o i n t ES t o d i s p l a y
si.[bp+SourceBitmapWidthl
: t o ps o u r c er e c ts c a nl i n e
: o f f s e to ff i r s ts o u r c er e c tp i x e l
: i n DS
: c o n v e r tt ow i d t hi na d d r e s s e s
: r e m e m b e ra d d r e s sw i d t h
; t o pd e s tr e c ts c a nl i n e
X/4
o f f s e to ff i r s td e s tr e c tp i x e li n
scan l i n e
o f f s e to ff i r s td e s tr e c tp i x e li n
page
offsetoffirstdestrectpixel
i n d i s p l a y memory
CL = f i r s t d e s t p i x e l ' s p l a n e
u p p e rn i b b l e comes i n t o p l a y when
3 b a c kt o 0
p l a n ew r a p sf r o m
:setthebitforthefirstdestpixel's
; p l a n ei ne a c hn i b b l et o
1
:calculate
: rect
o fp i x e l sa c r o s s
: s k i p i f 0 or n e g a t i v e w i d t h
;EX
h e i g h to fr e c t a n g l e
; s k i p i f 0 o rn e g a t i v eh e i g h t
; p o i n t t o SC I n d e xr e g i s t e r
: p o i n t SC I n d e x r e g t o t h e
: p o i n t DX t o SC D a t a r e g
Map Mask
:remember t h e s t a r t o f f s e t i n t h e s o u r c e
;remember t h e s t a r t o f f s e t i n t h e d e s t
; s e tt h ep l a n ef o rt h i sp i x e l
: c o p yt h ep i x e lt ot h es c r e e n
: s e t mask f o r n e x t p i x e l ' s p l a n e
; a d v a n c ed e s t i n a t i o na d d r e s so n l y
when
; w r a p p i n gf r o mp l a n e
3 t op l a n e 0
: ( e l s e undo I N C D I doneby
MOVSB)
:retrievethedeststartoffset
: p o i n tt ot h es t a r to ft h e
: n e x ts c a nl i n eo ft h ed e s t
: r e t r i e v et h es o u r c es t a r to f f s e t
:pointtothestartofthe
: n e x ts c a nl i n e
o f t h es o u r c e
: c o u n t down s c a nl i n e s
Previous
CopyDone:
pop
di
pop
si
mov
sp.bp
POP
bP
ret
-CopySystemToScreenXendp
end
Home
; r e s t o r ec a l l e r sr e g i s t e rv a r i a b l e s
; d i s c a r ds t o r a g ef o rl o c a lv a r i a b l e s
: r e s t o r ec a l l e r ss t a c kf r a m e
91 1
Next
Previous
chapter 49
mode x 256-color animation
Home
Next
Masked Copying
Over the past two chapters, weve put together most of the tools needed to implement animation in the VGAs undocumented 320x240 256-color Mode X. We now
have mode set code, solid and 4x4 pattern fills, system memory-to-display memory
block copies, and display memory-to-display memory block copies. The final piece
915
of the puzzle is the ability to copy a nonrectangular image to display memory.I call
this masked copying.
Masked copying is sort of like drawing through a stencil, in that only certain pixels
within the destination rectangle are drawn. The objective isto fit the image seamlessly
into thebackground, without the rectangular fringe that results when nonrectangular
images are drawn by block copying their bounding rectangle. This is accomplished
by using a second rectangular bitmap, separate fromthe image but corresponding
to it on a pixel-by-pixel basis, to control which destination pixels are set from the
source and which are left unchanged. With a masked copy, only those pixels properly belonging to an image are drawn, and the image
fits perfectly into the
background, with no rectangular border. In fact, masked copying even makesit possible to havetransparent areaswithin images.
Note that anotherway to achieve this effect is to implement copying code that supports a transparent color; that is, a color that doesnt get copied
but ratherleaves the
destination unchanged. Transparent copying makes for more compactimages, because no separate mask is needed, and is generally faster in a software-only
implementation. However, Mode X supports masked copying but not transparent
copying in hardware, so well use masked copying in this chapter.
The system memory to display memory masked copy routine in Listing 49.1 implements masked copying in a straightforward fashion. In the main drawing loop, the
corresponding mask byte is consulted as each image pixel is encountered, and the
image pixel is copied only if the mask byteis nonzero. As with most of the system-todisplay code Ive presented, Listing 49.1 is not heavily optimized, because its
inherently slow; theres a better way to go when performance matters, and thats to
use the VGAs hardware.
LISTING 49.1 L49-
1.ASM
struc
SourceStartX
916
OaOOOh
: S e q u e n c eC o n t r o l l e rI n d e xr e g i s t e rp o r t
i n SC o f Map Mask r e g i s t e r
:segmentd i sopf l a y
memory i n mode X
Chapter 49
dw
dw
2 dup ( ? )
?
SourceStartY
SourceEndX
dw
dw
?
?
SourceEndY
dw
DestStartX
dw
DestStartY
SourcePtr
DestPageBase
dw
dw
dw
?
?
?
SourceBitmapWidth
dw
DestBi
tmapWidth
MaskPtr
dw
dw
?
?
;Y c o o r d i n a t eo fu p p e rl e f tc o r n e ro fs o u r c e
;X c o o r d i n a t e o f l o w e r r i g h t c o r n e r o f s o u r c e
; ( t h ec o l u m na t
EndX i s n o t c o p i e d )
; Y c o o r d i n a t eo fl o w e rr i g h tc o r n e ro fs o u r c e
: ( t h er o wa t
EndY i s n o t c o p i e d )
;X c o o r d i n a t eo fu p p e rl e f tc o r n e ro fd e s t
; ( d e s t i n a t i o ni si nd i s p l a y
memory)
;Y c o o r d i n a t eo fu p p e rl e f tc o r n e ro fd e s t
; p o i n t e r i n DS t o s t a r t o f b i t m a p w h i c h s o u r c e r e s i d e s
:base o f f s e t i n d i s p l a y
memory o f page i n
; w h i c hd e s tr e s i d e s
;# o fp i x e l sa c r o s ss o u r c eb i t m a p( a l s om u s t
: b ew i d t ha c r o s st h em a s k )
; # o fp i x e l sa c r o s sd e s tb i t m a p( m u s tb em u l t i p l eo f4 )
mask
; p o i n t e r i n DS t o s t a r t o f b i t m a p i n w h i c h
: r e s i d e s( b y t e - p e r - p i x e lf o r m a t ,j u s tl i k et h es o u r c e
i m a g e ;0 - b y t e s
mean d o n ' tc o p yc o r r e s p o n d i n gs o u r c e
p i x e l , 1 - b y t e s mean docopy)
parms
ends
equ
RectWidth
:1 o c a ls t o r a g ef o rw i d t ho fr e c t a n g l e
- 4 R eecqtuH e i g h t
;1 o c a ls t o r a g ef o rh e i g h to fr e c t a n g l e
equ
-6
LeftMask
:1 o c a ls t o r a g ef o rl e f tr e c te d g ep l a n e
STACK-FRAME-SIZE equ 6
.model
smal
1
.code
p u b l i c -CopySystemToScreenMaskedX
-CopySystemToScreenMaskedX pnr eo ac r
f r a m e s t a c kc a l;lppush
erre' ss e r v e b p
f r a smt ea cl okmov
c a l t o ; p ob iPn*ts P
sp.STACK-FRAME-SIZE
; a l l o c a st ep a cf loeor c vaal r s
sub
v a r ri ae bg licesa:st pelpush
l reer s' se r v es i
push
di
-2
mask
mov
mov
mov
mu1
add
mov
add
mov
add
ax.SCREEN-SEG
es ,ax
mov
shr
shr
mov
mu1
mov
mov
shr
shr
add
add
ax.[bp+DestBitmapWidthl
ax, 1
a d d r ei nswsied:stchtoon v e r t
ax.1
[bp+DestBitmapWidth].ax;rememberaddresswidth
: t o pd e s tr e c ts c a nl i n e
[bp+DestStartYl
di.[bp+DestStartX]
cx.di
;X/4
offsetoffirstdestrectpixelin
d i .1
; s c a nl i n e
d i .1
; o f f s e to ff i r s td e s tr e c tp i x e li n
page
d i ,ax
di.Cbp+OestPageBasel
:offsetoffirstdestrectpixel
: i n d i s p l a y memory
cl .Ollb
;CL
firstdestpixel'splane
a1 . l l h
: u p p e rn i b b l e
comes i n t o p l a y when p l a n ew r a p s
; f r o m 3 b a c kt o 0
a1 . c l
:setthebitforthefirstdestpixel'splane
: i n e a c hn i b b l et o
1
[bp+LeftMaskl.al
and
mo v
sh l
mov
; p o i n t EdS i st op l a y
memory
ax.Cbp+SourceBitmapWidthl
mask p i x ienl
OS
917
mov
sub
jle
mov
sub
a x . [ b p + S o u r c e E: cnadlXc Iu l a t e
p11iaxcoerfloss s
: rect
ax.[bp+SourceStartXl
ef gi d0atthoi vr e
CopyDone
: s k i p ni w
Cbp+RectWidthl.ax
word p t r [bp+SourceBitmapWidthl.ax
: d i s t a n c ef r o me n do fo n es o u r c es c a nl i n et os t a r to fn e x t
ax.[bp+SourceEndYI
: h e i g h to fr e c t a n g l e
ax.Cbp+SourceStartYl
: s k i p i f 0 o rn e g a t i v eh e i g h t
CopyDone
Cbp+RectHeightl.ax
; p o i n t t o SC I n d e xr e g i s t e r
dx.SC-INDEX
a1 .MAP-MASK
Map Mask
: p o i n t SC I n d e x r e g t o t h e
dx,al
: p o i n t DX t o SC D a t ar e g
dx
mov
sub
jle
mov
mov
mov
out
inc
CopyRowsLoop:
a1 .[bp+LeftMaskl
mov
cx.[bp+RectWidthl
mov
di
push
CopyScaniineLoop:
b y t ep t rC b x l . 0
CmP
MaskOff
jz
out
mov
mov
dx.al
ah.[sil
es:[dil.ah
inc
inc
rol
adc
bx
si
a1 ,1
d i .O
1oop
POP
add
CopyScanLineLoop
di
di.[bp+DestBitmapWidthl
add
si.[bp+SourceBitmapWidthl
add
bx.[bp+SourceBitmapWidthl
:remember t h e s t a r t o f f s e t i n t h e d e s t
: i st h i sp i x e lm a s k - e n a b l e d ?
it
;no. s o d o n ' td r a w
: y e s .d r a wt h ep i x e l
: s e tt h ep l a n ef o rt h i sp i x e l
: g e tt h ep i x e lf r o mt h es o u r c e
: c o p yt h ep i x e lt ot h es c r e e n
MaskOff:
dec
jnz
CopyDone:
POP
POP
mov
POP
ret
word p t r[ b p + R e c t H e i g h t l
CopyRowsLoop
when
0
: r e t r i e v et h ed e s ts t a r to f f s e t
:pointtothestartofthe
: n e x ts c a nl i n eo ft h ed e s t
:pointtothestartofthe
: n e x ts c a nl i n eo ft h es o u r c e
:pointtothestartofthe
: n e x ts c a nl i n eo ft h e
mask
: c o u n t down s c a nl i n e s
; r e s t o r ec a l l e r ' sr e g i s t e rv a r i a b l e s
di
si
sp.bp
bP
-CopySystemToScreenMaskedX
end
: a d v a n c et h e mask p o i n t e r
: a d v a n c et h es o u r c ep o i n t e r
: s e t mask f o r n e x t p i x e l ' s p l a n e
: a d v a n c ed e s t i n a t i o na d d r e s so n l y
: w r a p p i n gf r o mp l a n e
3 toplane
: d i s c a r ds t o r a g ef o rl o c a lv a r i a b l e s
: r e s t o r ec a l l e r ' ss t a c kf r a m e
endp
918
Chapter 49
Theres a slight hitch, though. The latches can only be used when the source and
destination left edge coordinates, modulo four, are the same, as explained in the
previous chapter. The solution is to copy allfour possible alignments of each image
to display memory, each properly positioned for one of the four possible destination-left-edge-modulo-four cases.These aligned images must be accompaniedby the
four possible alignments of the image mask,stored in system memory. Given all
four
image and mask alignments, masked copying is
a simple matter of selecting the alignment thats appropriate for the destinations left edge, then setting the Map Mask
with the 4bit mask corresponding to each four-pixel set as we copy four pixels at a
time via the latches.
Listing 49.2 performs fast masked copying.This code expects to receivea pointer to
a MaskedImage structure, which in turn points to fourAlignedMaskedImage structures that describe the fourpossible imageand mask alignments. The aligned images
are already stored in display memory, and the aligned masks are already stored in
system memory; further, the masks are predigested into Map Mask register-compatible form. Given all that ready-to-use data, Listing 49.2 selects and works with the
appropriate image-mask pair for thedestinations left edge alignment.
LISTING 49.2L49-2.ASM
: Mode X ( 3 2 0 x 2 4 0 .2 5 6c o l o r s )d i s p l a y
memory t o d i s p l a y memorymaskedcopy
: r o u t i n e . Workson
a l l VGAs. Uses approach o fr e a d i n g 4 p i x e l s a t a t i m ef r o m
: s o u r c ei n t ol a t c h e s ,t h e nw r i t i n gl a t c h e st od e s t i n a t i o n ,u s i n g
Map Mask
: r e g i s t e rt op e r f o r mm a s k i n g .C o p i e su pt ob u tn o ti n c l u d i n gc o l u m na t
: SourceEndXandrow
a t SourceEndY. No c l i p p i n gi sp e r f o r m e d .R e s u l t sa r en o t
: g u a r a n t e e d i f s o u r c ea n dd e s t i n a t i o no v e r l a p .
C n e a r - c a l l a b l ea s :
:
v o i d CopyScreenToScreenMaskedX(int S o u r c e S t a r t X .
i n tS o u r c e S t a r t Y .i n t
SourceEndX. i n t SourceEndY.
i n tD e s t S t a r t X ,i n tO e s t S t a r t Y .M a s k e d I m a g e
* Source,
u n s i g n e di n tD e s t P a g e B a s e .i n tD e s t B i t m a p W i d t h ) :
03c4h
02h
03ceh
08h
OaOODh
SC-INDEX
MAP-MASK
GC-INDEX
BIT-MASK
SCREENKSEG
parms
: S e q u e n c eC o n t r o l l e rI n d e xr e g i s t e rp o r t
: i n d e x i n SC o f Map Mask r e g i s t e r
; G r a p h i c sC o n t r o l l e rI n d e xr e g i s t e rp o r t
; i n d e x i n GC o f Bit Mask r e g i s t e r
:segment o f d i s p l a y memory i n mode X
struc
SourceStartX
SourceStartY
SourceEndX
2 dup ( ? )
?
?
?
SourceEndY
DestStartX
DestStartY
Source
?
?
?
DestPageBase
DestBitmapWidth
parms
ends
;pushed B P and r e t u r na d d r e s s
: X c o o r d i n a t eo fu p p e rl e f tc o r n e ro fs o u r c e
:Y c o o r d i n a t e o f u p p e r l e f t c o r n e r o f s o u r c e
:X c o o r d i n a t e o f l o w e r r i g h t c o r n e r o f s o u r c e
: ( t h ec o l u m na tS o u r c e E n d X
i sn o tc o p i e d )
; Y c o o r d i n a t eo fl o w e rr i g h tc o r n e ro fs o u r c e
: ( t h e rowatSourceEndY
i sn o tc o p i e d )
; X c o o r d i n a t eo fu p p e rl e f tc o r n e ro fd e s t
:Y c o o r d i n a t eo fu p p e rl e f tc o r n e r
o f dest
: p o i n t e r t o MaskedImage s t r u c t f o r s o u r c e
: w h i c hs o u r c er e s i d e s
: b a s eo f f s e ti nd i s p l a y
memory o f page i n
: w h i c hd e s tr e s i d e s
;#o f p i x e l s a c r o s s d e s t b i t m a p ( m u s t b e m u l t i p l e o f
4)
9 19
SourceNextScanOffset
- e2 q u
DestNextScanOffset
-4equ
RectAddrWidth
RectHeight
SourceBitmapWidthequ
STACK-FRAME-SIZE
MaskedImage
A1 i g n m e n t s
equ
equ
- 10
-6
-8
10equ
struc
dw 4 d u p ( ? )
; l o c a ls t o r a g ef o rd i s t a n c ef r o me n do f
; o n es o u r c es c a nl i n et os t a r to fn e x t
; l o c a ls t o r a g ef o rd i s t a n c ef r o me n do f
; o n ed e s ts c a nl i n et os t a r to fn e x t
; l o c a ls t o r a g ef o ra d d r e s sw i d t ho fr e c t a n g l e
; l o c a ls t o r a g ef o rh e i g h to fr e c t a n g l e
; l o c a ls t o r a g ef o rw i d t ho fs o u r c eb i t m a p
; ( i na d d r e s s e s )
; p o i n t e r st oA l i g n e d M a s k e d I m a g e sf o rt h e
: 4 p o s s i b l ed e s t i n a t i o ni m a g ea l i g n m e n t s
MaskedImage
ends
AlignedMaskedImage
struc
i n a d d r e s s e s( a l s o
: i m a g ew i d t h
Imagewidth
dw
?
; o f f s e to fi m a g eb i t m a pi nd i s p l a y
ImagePtr
dw
?
: p o i n t e r t o mask b i t m a p i n DS
MaskPtr
dw
?
ends
AlignedMaskedImage
.model
small
.code
p u b l i c -CopyScreenToScreenMaskedX
-CopyScreenToScreenMaskedX pnr eo ac r
f r a m e s t a c kc a ;l push
pl er re' s e r v eb p
f r a smt ea cl okcmov
a l t o ; p obi pn .t s p
sp.STACK-FRAME-SIZE
; a l l o c a st ep a cf loeorc a l
VarS
sub
v a r ri ea gb ilcsea;tsplpush
elrerers' se r v es i
di
push
cld
mov
mov
out
dx.GC-INDEX
ax.OOOOOh+BIT-MASK
dx.ax
; stbhei et
mask w i d t h i n b y t e s )
memory
; f r otm
hl ae t c h easnndo nf reo m
: t h e CPU. so t h a t we wtchraietne
; l a t c hc o n t e n t sd i r e c t l yt o
memory
ax.SCREEN-SEG
: p o i n t ES
d i st op l a y
memory
mov
mov
es.ax
ax.[bp+DestBitmapWidthl
mov
a d d r e s isne sw i dshr
t h t:oc o n av ex .r 1t
ax.1
shr
[ b p + D e s t S t a ;drrstteolY
ciscpnal ten
mu1
di.[bp+DestStartXl
mov
mov
si ,di
shr
d i .1
;X/4
o ff idforrsepsefesitictxtnte l
shr
; scan l i n e
d i .1
d i ,ax
; ofpdirfparefoissgsnxcfteetetl
add
add
d i . [ b p + D e s t P a g e B a s e:]o f f s eot ffi r sdt e srt e cpt i x eilnd i s p l a y
; memory. now l o o ku pt h ei m a g et h a t ' s
: a l i g n e dt om a t c hl e f t - e d g ea l i g n m e n t
; o fd e s t i n a t i o n
and
; D e s t S t a r t Xm o d u l o
4
s i .3
: s e ta s i d ea l i g n m e n tf o rl a t e r
mo v
cx,si
; p r e p a r ef o rw o r dl o o k - u p
shl
s i .1
; p o i n tt os o u r c eM a s k e d I m a g es t r u c t u r e
mov
bx. [bp+Sourcel
mov
; p o i n tt oA l i g n e d M a s k e d I m a g e
bx.[bx+Alignments+sil
: strucforcurrentleft
e d g ea l i g n m e n t
mov
ax,[bx+ImageWidthl
;image
width
i n addresses
[bp+SourceBitmapWidthl.ax
;remember
image
w i d tiahnd d r e s s e s
mov
mu1
[ b p + S o u r c e S t a r;ttsYoo]pu r cse ecl iatnne
mov
si, [bp+SourceStartXl
shr
s i .1
:X/4
a dfdsi oroseufrptsreiiscnxceet l
; scan l i n e
s i .1
shr
s i ,ax
;fsoiiorfm
osfuprsfitaerinexcgcteet l
add
920
Chapter 49
mov
add
mov
add
ax.si
si.[bx+MaskPtrl
bx.[bx+ImagePtrl
bx.ax
: p o i n t t o mask o f f s e t o f f i r s t
; o f f s e to ff i r s ts o u r c er e c tp i x e l
: i n d i s p l a y memory
mask p i x e l i n
OS
mov
ax,[bp+SourceStartXl
: c a l c u l a t e # o f a d d r e s s e sa c r o s s
ax,cx
add
; r e c t .s h i f t i n g
i f n e c e s s a r yt o
cx,[bp+SourceEndX]
add
; a c c o u n tf o ra l i g n m e n t
cx.ax
CmP
jle
CopyDone
; s k i p i f 0 o rn e g a t i v ew i d t h
add
cx.3
and
a x . n o tO l l b
sub
cx.ax
shr
cx. 1
shr
cx.1
;# o f a d d r e s s e sa c r o s sr e c t a n g l et oc o p y
mov
ax.[bp+SourceEndYl
sub
:AX
h e i g h to fr e c t a n g l e
ax.Cbp+SourceStartYl
jle
CopyDone
; s k i p i f 0 o rn e g a t i v eh e i g h t
mov
[bp+RectHeightl.ax
mov
ax.[bp+DestBitmapWidthl
a d d r e sisne sw i sd ht hr t; oc o nav xe.r1t
shr
ax.1
sub
a x . :cdxi s t a n fcreoem
ononddf essct al isnt toenaoerftx t
mov
Cbp+DestNextScanOffsetl.ax
mov
ax.[bp+SourceBitmapWidthl
; wai di ntdhr e s s e s
sub
a x .: cd xi s t a nf cr eoenmd
o f s o u r sc ce al isnnt toeanoreftx t
mov
[bp+SourceNextScanDffsetl.ax
mov
Cbp+RectAddrWidthl.cx
;remember
width
i n addresses
mov
dx.SC-INDEX
mov
a1 ,MAP"MASK
out
; pdoxi.natl
t oinc
;point dx
CopyRowsLoop:
mov
cx.[bp+RectAddrWidthl
CopyScanLineLoop:
lodsb
SC Ir ne dgteiosx t e r
r eSC
g i sDt ea rt a
Map Mask
; w i d t ha c r o s s
: g e tt h e mask f o r t h i s f o u r - D i x e l s e t
: - a n da d v a n c et h e
mask p o i n t e r
the
dx.al
mask
o: suett
mov
a1 . e s : [;bl olxtahal tedc hwfeoi tsuh r - p fisxrseoeotm
lu r c e
e s : [ d ;fi c]otd.ohutaehperls-eyttposi ex te l
mov
p o i n t esro u r c e t h e : aidncv a n c e b x
di
pdoeisnttienra t itohne; a d v a n c e
inc
s ef ot su r - p i x eol f f ; cdec
ount
cx
jnz
CopyScanLineLoop
mov
ax,[bp+SourceNextScanOffset]
o f s t a r t t h eadd t o ; p osi in,ta x
add
s o u r c e n, e x t : t h e
mask,
bx.ax
add
di.[bp+DestNextScanOffsetl
: a nddelsi nt e s
dec
word p[ tbrp + R e c t H e i g h: ct lo u n t
down s c al inn e s
CopyRowsLoop
j nz
CopyDone:
dx.GC-INDEX+l
; r e s t otbhr ieet
mask idt eos f a u l t ,
mov
a1 . O f f h
mov
; w hsi ec lhbfera
ti cohtl stlm
es
CPU
out
dx.al
l a (ttcthfh:hnreeoeoam
snned
GC
; I n d e xs t i l lp o i n t st oB i t
Mask)
v a r ri ae bg licesatsle:l rePOP
er 'sst o r e d i
si
POP
v a r imov
l ao bc lael sf s;odtroi sr cpaa.gbredp
921
f r a m es t a cc ka lPOP
:l reer s' st o r e b p
ret
-CopyScreenToScreenMaskedX endp
end
It would be handy to have a function that, given a base image and mask, generates
the fourimage and mask alignments and fills in the MaskedImage structure. Listing
49.3,together with the includefile in Listing 49.4and thesystem memory-to-display
memory block-copy routine in Listing 48.4 (in theprevious chapter) does just that.
It would be faster if Listing 49.3were in assembly language, but there's no reason to
think that generating aligned images needs to be particularly fast; in such cases, I
prefer to use C, for reasons of coding speed,fewer bugs, and maintainability.
LISTING 49.3
/*
L49-3.C
Generates a l l f o u r p o s s i b l e
mode X i m a g e / m a s ka l i g n m e n t s ,s t o r e si m a g e
a l i g n m e n t si nd i s p l a y
memory. a l l o c a t e s memory f o r andgenerates
mask
a l i g n m e n t s ,a n d
f i l l s o u t an A l i g n e d M a s k e d I m a g es t r u c t u r e .I m a g ea n d
mask must
b o t hb ei nb y t e - p e r - p i x e lf o r m ,a n dm u s tb o t hb eo fw i d t hI m a g e w i d t h .
Mask
maps i s o m o r p h i c a l l y( o n et oo n e )o n t oi m a g e ,w i t he a c h0 - b y t ei n
mask masking
o f fc o r r e s p o n d i n gi m a g ep i x e l( c a u s i n g
i t n o t t o b ed r a w n ) .a n de a c hn o n - 0 - b y t e
a l l o w i n gc o r r e s p o n d i n gi m a g ep i x e lt ob ed r a w n .R e t u r n s
0 i f f a i l u r e , or # o f
d i s p l a y memory a d d r e s s e s( 4 - p i x e ls e t s )u s e d
i f success. For s i m p l i c i t y ,
a l l o c a t e d memory i s n o t d e a l l o c a t e d i n c a s e o f f a i l u r e . C o m p i l e d w i t h
B o r l a n d C++ i n C c o m p i l a t i o n mode. */
# i n c l u d e< s t d i o . h >
#i
n c l ude < s t d l ib. h>
#include"maskim.h"
e x t e r nv o i d CopySystemToScreenX(int, i n t . i n t . i n t . i n t . i n t . c h a r
u n s i g n e di n t ,i n t .i n t ) ;
u n s i g n e d i n t CreateAlignedMaskedImage(Masked1mage * ImageToSet.
u n s i g n e di n tD i s p M e m S t a r t .c h a r
* Image, i n t Imagewidth.
i n t I m a g e H e i g h t .c h a r
* Mask)
*,
i n tA l i g n ,S c a n L i n e .B i t N u m .S i z e ,T e m p I m a g e W i d t h ;
u n s i g n e dc h a r MaskTemp;
DispMemStart;
u n s i g n e di n tD i s p M e m O f f s e t
A1 ignedMaskedImage *WorkingAMImage;
char*NewMaskPtr.*OldMaskPtr:
I* G e n e r a t ee a c ho ft h ef o u ra l i g n m e n t si nt u r n .
*I
f o r( A l i g n
0: A l i g n < 4;Align++)
I
/ * A l l o c a t es p a c ef o rt h eA l i g n e d M a s k e d I m a g es t r u c tf o rt h i sa l i g n m e n t .
i f ((WorkingAMImage
ImageToSet->AlignmentsCAlignl
malloc(sizeof(AlignedMasked1mage)))
NULL)
r e t u r n 0;
--
WorkingAMImage->Imagewidth
( I m a g e w i d t h + A l i g n + 3 ) / 4; / * w i d t h i n 4 - p i x e l s e t s
W o r k i n g A M I m a g e - > I m a g e P t r DispMemOffset: I* i m a g ed e s t */
*/
/ * Download t h i sa l i g n m e n to ft h ei m a g e .
*/
CopySystemToScreenX(0, 0. I m a g e w i d t h I. m a g e H e i g h t A
, lign,
0.
Image,DispMemOffset.Imagewidth.
WorkingAMImage->Imagewidth
/ * C a l c u l a t e t h e number o f b y t e s n e e d e d t o s t o r e t h e
mask i n
n i b b l e (Map M a s k - r e a d y )f o r m ,t h e na l l o c a t et h a ts p a c e .
*/
WorkingAMImage->Imagewidth * ImageHeight:
Size
i f ((WorkingAMImage->MaskPtr
malloc(Size))
NULL)
r e t u r n 0;
922
Chapter 49
*/
4);
/*
G e n e r a t et h i sn i b b l eo r i e n t e d
(Map M a s k - r e a d y )a l i g n m e n to f
t h e mask,onescan
l i n e a t a t i m e . */
OldMaskPtr
Mask:
WorkingAMImage->MaskPtr:
NewMaskPtr
f o r( S c a n L i n e
0: ScanLine < ImageHeight:ScanLine++)
{
BitNum
Align:
0:
MaskTemp
TempImageWidth
Imagewidth:
do {
/ * S e tt h e mask b i t f o r n e x t p i x e l a c c o r d i n g t o i t s a l i g n m e n t .
MaskTemp I- (*OldMaskPtr++ !- 0 ) << BitNum:
i f (++BitNum > 3 ) {
*NewMaskPtr++
MaskTemp:
MaskTemp
BitNum
0:
--
---
*/
- - -
1
1 w h i l e( - - T e m p I m a g e W i d t h ) :
/*
1
S e ta n yp a r t i a lf i n a l
maskon
i f ( B i t N u m !- 0 ) *NewMaskPtr++
DispMemOffset +- S i z e :
r e t u r nD i s p M e m O f f s e t
LISTING 49.4
/*
/*
*/
t h i ss c a nl i n e .
MaskTemp:
mark o f f t h e
space we j u s t u s e d
*/
- DispMemStart;
MASK1M.H
MASK1M.H: s t r u c t u r e su s e df o rs t o r i n ga n dm a n i p u l a t i n gm a s k e d
images */
/* D e s c r i b e so n ea l i g n m e n to f
a m a s k - i m a g ep a i r .
*/
t y p e d e fs t r u c t {
i n tI m a g e w i d t h : / * i m a g ew i d t hi na d d r e s s e si nd i s p l a y
memory ( a l s o
mask w i d t h i n b y t e s )
*/
u n s i g n e di n tI m a g e P t r :
/ * o f f s e t o f imagebitmap i n d i s p l a y mem */
char*MaskPtr;
/ * p o i n t e rt o maskbitmap */
1 AlignedMaskedImage;
/ * D e s c r i b e sa l lf o u ra l i g n m e n t so f
t y p e d e fs t r u c t {
AlignedMaskedImage*Alignments[41:
a m a s k - i m a g ep a i r .
1 MaskedImage:
/*
*/
p t r st oA l i g n e d M a s k e d I m a g e
s t r u c t sf o rf o u rp o s s i b l ed e s t i n a t i o n
i m a g ea l i g n m e n t s
*/
923
Also, it would be more efficient tomake up structures that describe the source
and
destination bitmaps, with dimensions and coordinates built
in, andsimply passpointers to these structures tolow
thelevel, rather thanpassing manyseparate parameters,
as is now the case. Ive used separate parameters forsimplicity and flexibility.
Be aware that as nijii as Mode X hardware-assisted masked copying is, whether
or not its actually faster than software-only masked or transparent copying depends upon theprocessor and thevideo adapter The advantage of Mode Xmasked
copying is the 32-bit parallelism; the disadvantages are the need to read display
memory and the needto perform an OUT for every four pixels. (OUT is a slow
486/Pentium instruction, and mostVGAs respond to OUTSmuch moreslowly than
to display memory writes.)
Animation
Gosh. Theres just no way I can discuss high-level animation fundamentals in any
detail here; I could spend
an entire (andentirely separate) bookon animation techniques alone. You might want to have a look at Chapters 43 through 46 before
attacking the code inthis chapter; thatwill have to do us for the presentvolume. (I
will return to 3-Danimation in the next chapter.)
Basically, Imgoing to performpage flipped animation,in which one page (that is, a
bitmap large enough to hold a full screen) of display memory is displayed while
another page is drawn to. When the drawing is finished, thenewly modified page is
displayed, and the other-now invisible-page is drawn
to. The process repeats ad
infinitum. For further information, some good places to start areComputer Guphics,
by Foley and van Dam (Addison-Wesley);Principles oflnteructive Computer Graphics,
by
Newman and Sproull (McGraw Hill) ; and Real-TimeAnimation by Rahner James
(January 1990, Dr. Dobbs Journal ) .
Some of the code inthis chapter was adapted forMode X from the codein Chapter
44-yet another reason to read that chapter before finishing this
one.
924
Chapter 49
background page. (See the discussion in the previous chapter for the
display memory
organization used by Listing 49.5.)So far as the displayed image is concerned, there
is never any hint of flicker or disturbance of the background.This continues at a rate
of up to 60 times a second until Esc is pressed to exit the program. See Figure 49.1
for ascreen shot of the resulting image-add the animation inyour imagination.
LISTING 49.5 L49-5.C
/*
*/
# i n c l u d e< s t d i o . h >
#i
n c l u d e < c o n i0. h>
#include <dos. h>
# i n c l u d e< m a t h . h >
#include"maskim.h"
#define
#define
#define
#define
#define
#define
#define
SCREEN-SEG
OxAOOO
SCREEN-WIDTH
320
SCREEN-HEIGHT
240
PAGEO-START-OFFSET 0
PAGE1-START-OFFSET (((long)SCREEN-HEIGHT*SCREEN-WIOTH)/4)
BG-STARTLOFFSET
(((long)SCREEN_HEIGHT*SCREEN_WIDTH*2)/4)
DOWNLOAD-STARTLOFFSET (((long)SCREENKHEIGHT*SCREEN-WIDTH*3)/4)
s t a t i cu n s i g n e di n tP a g e S t a r t O f f s e t s C Z l
{PAGEOKSTART-OFFSET.PAGEl-START-OFFSET):
s t a t i cc h a rG r e e n A n d B r o w n P a t t e r n C ]
( 2 . 6 . 2 . 6 .6 . 2 . 6 . 2 .2 . 6 . 2 . 6 .6 . 2 . 6 . 2 ) ;
s t a t i cc h a rP i n e T r e e P a t t e r n C l
(2.2.2.2, 2 . 6 . 2 . 6 .2 . 2 . 6 . 2 .
2.2.2,2):
s t a t i cc h a rB r i c k P a t t e r n C l
I 6 . 6 . 7 . 6 .7 . 7 . 7 . 7 .7 . 6 . 6 , 6 .
7.7,7,7.}:
s t a t i cc h a rR o o f P a t t e r n C l
( 8 . 8 . 8 . 7 , 7 . 7 . 7 . 7 . 8 . 8 . 8 . 7 ,8 . 8 . 8 . 7 ) ;
# d e f i n e SMOKE-WIDTH
7
# d e f i n e SMOKE-HEIGHT 7
925
s t a t i cc h a rS m o k e P i x e l s C l
0.
0.
8.
8.
0.
0.
0.
0.15.15.15. 0. 0.
7. 7.15.15.15. 0.
7. 7. 7.15.15.15,
7. 7. 7. 7.15.15.
8, 7. 7, 7. 7.15.
0. 8. 7. 7. 7. 0.
0. 0. 8. 8. 0. 01:
s t a t i cc h a r
SmokeMaskCl
0. 0. 1. 1. 1. 0. 0.
0. 1. 1. 1. 1. 1. 0.
1, 1, 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1.
1, 1. 1. 1. 1. 1. 1.
0.1.1.1.1.1.0.
0. 0. 1. 1. 1. 0. 01:
# d e f i n e KITELWIDTH 10
# d e f i n e KITELHEIGHT 16
s t a t i cc h a rK i t e P i x e l s C l
- 0. 0.
(
0. 0. 0. 0.45, 0. 0. 0.
0. 0. 0.46.46.46, 0. 0.
0. 0.47.47.47.47.47. 0.
0.48.48.48,48.48.48.48.
0. 0.
0. 0.
0. 0.
49.49,49.49.49.49.49.49.49.
0.
0,50.50.50.50.50.50.50. 0. 0.
0.51.51.51.51.51,51.51, 0. 0.
0. 0.52.52.52.52.52. 0. 0. 0.
0. 0,53.53.53.53.53. 0. 0. 0.
0, 0, 0.54.54.54. 0. 0, 0. 0.
0. 0. 0.55.55.55. 0. 0. 0. 0.
0. 0. 0. 0.58, 0. 0. 0. 0. 0.
0. 0. 0. 0.59, 0. 0. 0. 0.66.
0. 0. 0. 0.60, 0. 0.64, 0.65.
0. 0. 0. 0. 0.61, 0. 0.64. 0.
0. 0. 0. 0. 0. 0.62.63, 0.641;
s t a t i cc h a rK i t e M a s k C l
0.0.0.0,1.0.0.0.0.0.
0. 0. 0. 1. 1. 1. 0. 0, 0. 0.
0. 0. 1. 1. 1. 1. 1. 0. 0. 0.
0. 1. 1. 1. 1. 1. 1. 1. 0. 0.
1. 1. 1. 1. 1. 1. 1. 1. 1. 0.
0.1.1.1.1.1.1.1.0.0.
0. 1. 1. 1. 1. 1. 1. 1. 0. 0.
0. 0. 1. 1, 1. 1. 1. 0. 0. 0,
0. 0. 1. 1. 1. 1. 1. 0. 0. 0.
0.0.0.1.1.1.0.0,0.0.
0. 0. 0. 1. 1. 1. 0. 0. 0. 0.
0.0.0.0.1.0.0.0,0.0.
0. 0. 0. 0. 1. 0. 0. 0. 0. 1.
0. 0. 0. 0. 1. 0. 0. 1. 0. 1.
0. 0. 0. 0. 0. 1. 0. 0. 1. 0.
0. 0. 0. 0. 0. 0. 1. 1. 0. 11:
s t a t i c MaskedImageKiteImage:
# d e f i n e NUM-OBJECTS
20
t y p e d e fs t r u c t (
i n t X,Y.Width.Height.XDir.YDir.XOtherPage.YOtherPage;
MaskedImage*Image:
1 Animatedobject:
926
Chapter 49
-I
A n i m a t e d o b j e c tA n i m a t e d O b j e c t s C l
[ 0.
O.KITE-WIDTH.KITE_HEIGHT.
0. O , & K i t e I m a g e l ,
0 . 1, 1 0l.O . & K i t e I m a g e I .
20. 20.KITEKWIDTH.KITEKHEIGHT.-1.
1, 20.2O.&KiteImagej.
30. 30.KITE~WIDTH.KITE~HEIGHT."1.
3 0 3. 0 . & K i t e I m a g e j ,
40. 40.KITE-WIDTH.KITE-HEIGHT.
1;l.
40.40.&KiteImage).
50, 50.KITEKWIDTH,KITEKHEIGHT. O,-l
50.
, 50.&KiteImage).
I 6 0 , 60.KITE-WIDTH.KITE_HEIGHT. 1. 0. 60.60.&KiteImage).
[ 70. 7O.KITE-WIDTH.KITE-HEIGHT,-l,
0 . 70.7D.&KiteImage).
[ EO. 80.KITE-WI0TH.KITE-HEIGHT.
1, 2. EO. EO,&KiteImage).
{ 90. 90.KITE-WI0TH.KITE-HEIGHT.
0. 2.90,90.&KiteImage}.
[100.100.KITE~WIDTH.KITE~HEIGHT.-1.
2.100.10D.&KiteImageI.
~110.110.KITEKWIDTH.KITEHEIGHT.-1.-2,llO,llO.&KiteIma~e~.
[ 1 2 0 . 1 2 0 . K I T E ~ W I D T H , K I T E ~ H E I G H T , 1.-2.120.120.&KiteImage).
{
{
[
(
[
1. 1.
1 0 . 10,KITE-WIDTH.KITE-HEIGHT.
[130.130,KITEKWIDTH,KITEKHE1GHT, 0.-2.130.130.&KiteImage).
(14D.140.KITE~W10TH.KITELHEIGHT. 2,0.140.140.&KiteImage).
{150.150,KITE~WIDTH.KITEKHE1GHT,-2. 0.150,150.&KiteImage).
(160,160,KITEKWIDTH.K1TEKHE1GHT,
I;
2.2,16D.l60.&KiteImage),
{170.170.KITE~WIDTH.KITE~HEIGHT.-2, 2.170.170.&KiteImage).
{1E0.1E0,KITEKWIOTH,KITEKHEIGHT.-2.-2,lEO,lEO,&KiteImage~,
[190.190,KITE~WIDTH,KITEKHEIGHT, 2.-2.190.190.&KiteImagej.
voidmain(void);
v o i dD r a w B a c k g r o u n d ( u n s i g n e di n t ) ;
v o i d MoveObject(Animated0bject * ) ;
e x t e r nv o i dS e t 3 2 0 ~ 2 4 0 M o d e ( v o i d ) ;
e x t e r nv o i dF i l l R e c t a n g l e X ( i n t ,i n t .i n t .i n t .u n s i g n e di n t .i n t ) :
e x t e r nv o i dF i l l P a t t e r n X ( i n t .i n t .i n t .i n t .u n s i g n e di n t .c h a r * ) ;
e x t e r nv o i d CopySystemToScreenMaskedX(int. i n t . i n t . i n t , i n t . i n t .
*);
c h a r *, u n s i g n e di n t .i n t .i n t ,c h a r
e x t e r nv o i d CopyScreenToScreenX(int. i n t . i n t . i n t . i n t . i n t ,
u n s i g n e di n t .u n s i g n e di n t .i n t .i n t ) ;
CreateAlignedMaskedImage(Masked1mage * ,
e x t e r nu n s i g n e di n t
*, i n t .i n t .c h a r
*):
u n s i g n e di n t .c h a r
e x t e r nv o i d CopyScreenToScreenMaskedX(int. i n t . i n t . i n t . i n t . i n t .
MaskedImage *, u n s i g n e d i n t . i n t ) ;
e x t e r nv o i dS h o w P a g e ( u n s i g n e di n t ) ;
voidmain0
i nD
t i s p l a y e d P a g eN
. o n D i s p l a y e d P a g eD
, one,
i;
u n i o n REGS r e g s e t ;
Set320x240ModeO;
/ * D o w n l o a dt h ek i t ei m a g ef o rf a s tc o p y i n gl a t e r .
*/
i f (CreateAlignedMaskedImage(&KiteImage. DOWNLOADKSTART-OFFSET,
0) {
K i t e P i x e l s . KITELWIDTH.
KITELHEIGHT,
KiteMask)
regset.x.ax
0 x 0 0 0 3 ;i n t E 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) :
p r i n t f ( " C o u 1 d n ' tg e tm e m o r y \ n " ) :e x i t ( ) ;
--
/*
Draw t h eb a c k g r o u n dt ot h eb a c k g r o u n dp a g e .
*/
DrawBackground(BG-STARTL0FFSET);
/ * Copy t h eb a c k g r o u n dt ob o t hd i s p l a y a b l ep a g e s .
*/
CopyScreenToScreenX(0, 0 . SCREEN-WIDTH,
SCREENKHEIGHT.
00. .
BG-START-OFFSET,
PAGEO-START-OFFSET,
SCREEN-WIDTH.
SCREENKWIDTH);
CopyScreenToScreenX(0, 0. SCREENKWIDTH.
SCREEN-HEIGHT.
0.0.
BGKSTART-OFFSET.
PAGE1-START-OFFSET,
SCREEN-WIDTH.
SCREEN-WIDTH);
/ * Move t h eo b j e c t sa n du p d a t et h e i ri m a g e si nt h en o n d i s p l a y e d
p a g e t, h e n
f l i p t h e page, u n t i l Esc i s p r e s s e d . * /
0;
Done = D i s p l a y e d P a g e
do I
NonDisplayedPage
D i s p l a y e d P a g e A 1;
927
I* E r a s ee a c ho b j e c ti nn o n d i s p l a y e dp a g eb yc o p y i n gb l o c kf r o m
*I
b a c k g r o u n dp a g ea tl a s tl o c a t i o ni nt h a tp a g e .
f o r( i - 0 :
i<NUM-OBJECTS; i++)
(
CopyScreenToScreenX~AnimatedObjects~il.XOtherPage,
AnimatedObjects[il.YOtherPage.
AnimatedObjects[il.XOtherPage
AnimatedObjects[il.Width.
AnimatedObjects[il.YOtherPage
+
+
AnimatedObjects[il.Height.
AnimatedObjects[il.XOtherPage.
A n i m a t e d O b j e c t s [ i l . Y O t h e r P a g e . BG-START-OFFSET.
PageStartOffsetsCNonDisplayedPagel. SCREEN-WIDTH.
SCREEN-WIDTH):
I* Move a n dd r a we a c ho b j e c t
i nt h en o n d i s p l a y e dp a g e .
*I
f o r( i - 0 ;
i<NUMLOBJECTS; i++)
(
MoveObject(&AnimatedObjects[il):
I* Draw o b j e c ti n t on o n d i s p l a y e dp a g ea t
new l o c a t i o n * I
CopyScreenToScreenMaskedX(0, 0, AnimatedObjects[il.Width,
AnimatedObjects[il.Height. AnimatedObjects[il.X,
A n i m a t e d O b j e c t s [ i l . Y . AnimatedObjectsCil.1mage.
PageStartOffsets[NonDisplayedPagel, SCREEN-WIDTH):
/*
F l i p t o t h ep a g ei n t ow h i c h
we j u s t drew. * I
ShowPaqe(PaaeStartOffsetsrDisD1ayedPage
NonOisplayedPage]);
I* See i f i t ' s t i m e t o
end. * I
if (kbhit0) (
if (getch0
OxlB)Done
1;
I* Esc t o end * I
1 w h i l e( ! D o n e ) :
I* R e s t o r e t e x t
regset.x.ax
mode anddone.
- 0 x 0 0 0 3 ;i n t 8 6 ( 0 x 1 0 .
voidDrawBackground(unsigned
- .
"
*I
& r e g s e t .& r e g s e t ) ;
i n tP a g e s t a r t )
i n ti . j , T e m p ;
I* F i l lt h es c r e e nw i t hc y a n .
*I
F i l l R e c t a n g l e X ( 0 , 0. SCREEN-WIDTH.
SCREEN-HEIGHT.
P a g e s t a r t . 11);
I* Draw a g r e e na n db r o w nr e c t a n g l et oc r e a t e
a f l a t p l a i n . *I
F i l l P a t t e r n X ( 0 1. 6 0 ,
SCREEN-WIDTH,
SCREEN-HEIGHT.
PageStart,
GreenAndBrownPattern):
I* D r a wb l u ew a t e ra tt h eb o t t o mo ft h es c r e e n .
*I
F i l l R e c t a n g l e X ( 0 . SCREENLHEIGHT-30. SCREEN-WIDTH.
SCREEN-HEIGHT.
P a g e s t a r t , 1) :
I* Draw a b r o w n m o u n t a i n r i s i n g o u t o f t h e p l a i n .
*I
f o r( i - 0 :i < 1 2 0 :
i++)
FillRectangleX(SCREEN~WIDTHl2-30-i. 51+i, SCREEN-WIDTH/2-30+i+l,
5 1 + i + lP, a g e s t a r t6. ) ;
I* Draw a y e l l o ws u nb yo v e r l a p p i n gr e c t so fv a r i o u ss h a p e s .
*I
f o r( i - 0 i; < - 2 0 :
i++)
(
Temp
(int)(sqrt(20.0*20.0 - ( f l o a t ) i * ( f l o a t ) i ) + 0.5):
F i l l R e c t a n g l e X ( S C R E E N _ W I D T H - 2 5 - i . 30-Temp,
SCREEN-WIDTH-25+i+l.
30+Temp+l. P a g e s t a r t 1. 4 ) ;
I* D r a wg r e e nt r e e s
down t h e s i d e
f o r( i - 1 0 :i < 9 0 ;
i +- 1 5 )
f o r( j - 0 ;j < 2 0 ;
j++)
o ft h em o u n t a i n .
*I
FillPatternX(SCREENLWIDTH12+i-j13-15, i + j + 5 1 , S C R E E N ~ W I D T H / 2 + i + j I 3 - 1 5 + 1 ,
i + j + 5 1 + 1P. a g e s t a r tP. i n e T r e e P a t t e r n ) :
I* Draw a house on t h e p l a i n .
*I
F i l l P a t t e r n X ( 2 6 5 1. 5 0 2. 9 5 1. 7 0 P
. a g e s t a r tB
. rickPattern);
928
Chapter 49
/*
Move t h e s p e c i f i e d o b j e c t , b o u n c i n g a t t h e e d g e s
o f t h es c r e e na n d
r e m e m b e r i n gw h e r et h eo b j e c t
was b e f o r e t h e move f o r e r a s i n g n e x t t i m e .
v o i d MoveObject(Animated0bject * ObjectToMove)
i n t X, Y:
X
ObjectToMove->X + ObjectToMove->XDir;
Y
ObjectToMove->Y + ObjectToMove->YDir:
i f ( ( X < 0 ) 1 1 (X > (SCREEN-WIDTH - O b j e c t T o M o v e - > W i d t h ) ) ) [
ObjectToMove->XDir
-0bjectToMove->XDir:
X
ObjectToMove->X + ObjectToMove->XDir:
if ((Y
*I
/ * Remember p r e v i o u sl o c a t i o nf o re r a s i n gp u r p o s e s .
ObjectToMove->XDtherPage
ObjectToMove->X:
ObjectToMove->YOtherPage ObjectToMove->Y;
ObjectToMove->X
X : / * s e t new l o c a t i o n * /
ObjectToMove->Y
Y:
1
*I
LISTING 49.6L49-6.ASM
: Shows t h e p a g e a t t h e s p e c i f i e d o f f s e t i n t h e b i t m a p .
: t h i sr o u t i n er e t u r n s .
Page i s d i s p l a y e d when
: C n e a r - c a l l a b l ea s :v o i dS h o w P a g e ( u n s i g n e di n tS t a r t o f f s e t ) :
0e3q:duI naSpht ua t u s
1 register
INPUTLSTATUSLl
equ 03d4h
:CRT C or enI gnt rdoel lxe r
CRTC-INDEX
START-ADDRESS-HIGH
equ
Och
; b i t sm
atdadrptbhr eyi gst ehs
START_ADDRESSLLOWequ
Odh
: basiydttltdm
aoerrw
aetps s
ShowPageParms
Startoffset
ShowPageParms
ends
struc
dw
dw
2 dup ( ? )
:pushed BP and r e t ua rdnd r e s s
?
d i s ptpol abygioet: fm
oi fnaf ps e t
Mode X
256-ColorAnimation 929
Previous
;start
;start
.model
small
.code
p u b l3i ch o w P a g e
-Showpage
n e aprr o c
: p r e s e r v ec a l l e r ss t a c kf r a m e
push
bp
mov
; p o i n tt ol o c a ls t a c kf r a m e
bp.sp
: W a i tf o rd i s p l a ye n a b l et ob ea c t i v e( s t a t u si sa c t i v el o w ) .t ob e
: s u r eb o t hh a l v e so ft h es t a r ta d d r e s s
will t a k e i n t h e
same frame.
bl.START-ADDRESS-LOW
mov
: p r e l o a df o rf a s t e s t
mov
bh.bS
yp tt era r t O f f s e t C b p ]
: f l i p p i n g o n c ed i s p l a y
cl.START_ADDRESS-HIGH
mov
: e n a b l e i sd e t e c t e d
mov
c h . b yS
pttera r t O f f s e t + l [ b p ]
mov
dx.INPUT-STATUSpl
WaitDE:
in
a1 .dx
test
a1 ,Olh
WaitDE
; d i s p l a ye n a b l ei sa c t i v el o w
jnz
: Setthestartoffsetindisplay
memory o f t h e page t o d i s p l a y .
dx.CRTCJNDEX
mov
mov
ax.bx
dx.ax
out
mov
ax,cx
dx.ax
out
: Now w a i t f o r v e r t i c a l s y n c ,
s o t h eo t h e rp a g e
will b e i n v i s i b l e when
: we s t a r t d r a w i n g t o i t .
mov
dx.INPUT-STATUS-1
Wai tVS:
in
,dx a1
test
a1 .08h
: v e r t i c a ls y n ci sa c t i v eh i g h
jz
WaitVS
: r e s t o r ec a l l e r ss t a c kf r a m e
POP
bp
ret
endp -Showpage
end
Home
(0
- active)
(1
- active)
Next
930
Chapter 49
Previous
chapter 50
adding a dimension
Home
Next
933
934
Chapter 50
A source that you may or may not find useful is the series of six bookson C graphics
by Lee Adams, as exemplified by High-Performance CAD Graphics in C (Windcrest/
Tab, 1986). (I dontknow if all six books discuss3-D graphics, but the fourIve seen
do.) To be honest, this book has a number of problems, including: Relatively little
theory and explanation; incompleteand sometimes erroneous discussions ofgraphics hardware; use of nothing but global variables, with cryptic names like array3
the at least touches
and B21; and-well, you get theidea. On the other hand, book
on a great many aspects of 3-D drawing, and theres a lot of C code toback that up.
A number of people have spoken warmly to me of Adams books astheir introduc3-D references,
tion to 3-D graphics. I wouldnt recommend these books as your only
but if youre just starting out, you might want to look at one andsee if it helps you
bridge the gap between the theory and implementation of 3-D graphics.
935
4)
936
Chapter 50
Projection
Working backwardfrom thefinal image, we want to take the vertices of a polygon, as
transformed into view space, and project them to 2-D coordinates on the screen,
to
which, for projection purposes, is assumed to be centered on and perpendicular
the Z axis in view space, at some distance from thescreen. Were after visual realism,
s o well want to do a perspective projection, in order that fartherobjects look smaller
than nearer objects, and so that the field of view will widen with distance. This is
done by scaling the X and Y coordinates of each point proportionately to the Z
distance of the point from the
viewer, a simple matter of similar triangles, as shown
in Figure 50.3. It doesnt really matter how far down the Z axis the screen is assumed
to be; what matters is the ratioof the distance of the screen from theviewpoint to the
width of the screen. This ratio defines the rate of divergence of the viewing pyramid-the full field of view-and is used for performing all perspective projections.
Once perspective projection has been performed,all that remains before calling the
polygon filler is to convert the projected X and Y coordinates to integers, appropriately clipped and adjusted as necessary to
center theorigin on thescreen or otherwise
map the image into awindow, if desired.
Translation
A
I
!I
I
I
TOD ofview
pyramid
/c3proiected
-D12
Perspective projection.
Figure 50.3
Adding a Dimension
937
for eachaxis. Translation is, for example,used to move objects from object space, in
which the centerof the object is typically the origin (O,O,O), into world space, where
the object may be located anywhere.
Rotation
Rotation is the process of circularly moving
coordinatesaround the origin. Forour present
purposes, its necessary onlyto rotate objects about their centersin object space, so
as to turn them to the desired attitude beforetranslating them into world space.
Rotation of a point about an
axis is accomplished by transforming it according to the
formulas shown in Figure 50.4. These formulas map into themore generally useful
matrix-multiplication forms also shownin Figure 50.4. Matrix representation is more
(a)
new =x
newy = cos(theta) * y - sin(theta) * z
newz = sin(theta) * y + cos(theta) * z
k]
[i
cos(theta)
0
sin(theta)
-sin/th:a/]
cos
theta
(b)
n e w = cos(theta) * x + sinltheta) * z
newy = y
newz = -sin(theta) * x + cos(theta) * z
Matrix form of rotation around
-Y axis:
-_
~
(c)
n e w = cos(theta) * x - sin(theta) * y
newy = sin(theta) * x + cos(theta) * y
newz = z
8]
k]
3 - 0 rotation formulas.
Figure 50.4
938
Chapter 50
useful for two reasons: First, it is possible to concatenate multiple rotations into a
single matrix by multiplying them together in the desired order; that single matrix
can then be used to perform the rotations more efficiently.
Second, 3x3 rotation matrices can become the upper-left-hand portions of 4x4 matrices that also perform translation (and scaling as well,but we won't need scaling in
the near future),as shown in Figure50.5. A 4x4 matrix of this sort utilizes homogeneous coordinates; that's a topic way beyond this book, but, basically, homogeneous
coordinates allow you to handle both rotations and translations with 4x4 matrices,
thereby allowing the same code towork witheither, and making it possible to concatenate a long series of rotations and translations into a single matrix that performs
the same transformation as the sequence of rotations and transformations.
There's much more to be said about transformations and the supporting matrix
math, but,in the interests of getting toworking code in this chapter, I'll leave that to
be discussed as the need arises.
""""""""""""-
-1
L 9
,""""""""""""""""
r""-
-1
$ 0
'I
'
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
939
projected polygon (complete with clipping) and handle the otherdetails of animation (2-D functionality).
Happily (and notcoincidentally), we put together a nice 2-D animation framework
back in Chapters47,48, and 49, during ourexploratory discussion of Mode X, so we
dont have much toworry about in terms of non-3-Ddetails. Basically, well use Mode
X (320x240, 256 colors), and well flip between two display pages, drawing to one
while the otheris displayed. One new 2-D element thatwe need is the ability to clip
polygons; whilewe could avoid this for the momentby restricting the rangeof motion of the polygon so that itstays fully on the screen,certainly in the long runwell
want to be able to handle
partially or fully clipped polygons. Listing 50.1 is the lowlevel code for Mode
a
X polygon filler that supports clipping.
(The high-level polygon
fill code is mode independent, andis the same as that presented in Chapters
38,39,
and 40,as noted furtheron.) The clipping is implemented at the low level, by trimming the Y extent of the scan line list up front, thenclipping the X coordinates of
each scan line in turn. This is not a particularly fast approach to clipping-ideally,
the polygon would be clipped before itwas scanned into a linelist, avoiding potentially wasted scanning and eliminating the line-by-line X clipping-but its much
simpler, and, as we shall see, polygon filling performance is the least of our worries at
the moment.
150- 1.ASM
LISTING 50.1
; Draws a l l p i x e l s i n t h e l i s t o f h o r i z o n t a l l i n e s p a s s e d i n , i n
: Mode X . t h e VGAs u n d o c u m e n t e d3 2 0 x 2 4 02 5 6 - c o l o r
mode. C l i p s t o
: t h er e c t a n g l es p e c i f i e db y
(ClipMinX.ClipMinY).(ClipMaxX.ClipMaxY).
: Draws t ot h ep a g es p e c i f i e db yC u r r e n t P a g e B a s e .
; C n e a r - c a l l a b l ea s :
: A
l a s s e m b l yc o d et e s t e dw i t h
HLine
XStart
XEnd
HLine
ends
H L i n e L i s ts t r u c
Lngth
YStart
HLinePtr
e n dsnstH
eLi
Parms
?
?
dw
dw
dw
?
?
?
: S e q u e n c eC o n t r o l l e rI n d e x
Mask r e g iiinsndt ee rx
l i: nX e cl ieonfpotirm
xdoeoif nls at t e
l i n :eX r icngophoit xrm
odefoil ns at t e
; I o f hl ionrei sz o n t a l
l ti no;Yep mcoosot fr d i n a t e
l i nheosr z o f l i s ;t p tooi n t e r
struc
HLineListPtr
940
dw
dw
HLineListPtr.
Chapter 50
dw
dw
2 d u ;pr(e?at)udrdnr e s s
?
H Ls itnr ue ;tcLpoti ous itrnet e r
& pushed BP
SC
Color
Parms
ends
line
dw
t o w h i c hw i t h: c o l o r
fill
.model
small
.data
-CurrentPageBase:word._ClipMinX:word
extrn
extrn
-ClipMinY:word.LClipMaxX:word,-ClipMaxY:word
: P l a n em a s k sf o rc l i p p i n gl e f ta n dr i g h te d g e so fr e c t a n g l e .
0 0 f h .L0e0fet C
h .ld0i pb0Pclha.n0e0M
8 ha s k
0R
0 1i ghh. 0t C0 l3i phdP
.b0l 0a 7n he .M0 a0 sf hk
.code
2
align
ToFil lDone:
jmp
F iDone
11
public-0rawHorizontalLineList
2
a1 i g n
- D r a w H o r i z o n t a l L i n e L i s tp r o c
; p r e s e r v ec a l l e r ' ss t a c kf r a m e
bp push
; p o i n tt oo u rs t a c kf r a m e
mov
bp.sp
: p r e s e r v ec a l l e r ' sr e g i s t e rv a r i a b l e s
s ip u s h
d ip u s h
;make s t r i n g i n s t r u c t i o n s i n c p o i n t e r s
cld
mov
dx.SC-INDEX
mov
a1 .MAPFMASK
: p o i n t SC I n d e x t o t h e
Map Mask
dx.al out
mov
ax.SCREENLSEGMENT
: p o i n t ES t o d i s p l a y memory f o r REP STOS
es ,ax
mov
;pointtothelinelist
mov
si.[bp+HLineListPtrl
: p o i n tt ot h eX S t a r t / X E n dd e s c r i p t o r
mov
bx.[si+HLinePtrl
: f o rt h ef i r s t( t o p )h o r i z o n t a ll i n e
: f i r s ts c a nl i n et od r a w
mov
cx.[si+YStartl
:# o f s c a nl i n e st od r a w
mov
. [ ssi i+ L n g t h l
; a r et h e r ea n yl i n e st od r a w ?
cmp
si.0
:no. s o w e ' r ed o n e
j 1e
ToFi11Done
: c l i p p e da tt o p ?
cmp
cx.[LClipMinYI
:no
M i n Y NjogteC l i p p e d
; y e s .d i s c a r dh o w e v e r
many l i n e s a r e
cx
neg
: clipped
c x . [ - C lai pd M
d inYl
; t h a t many f e w e rl i n e st od r a w
si.cx sub
:no l i n e s l e f t t o d r a w
j 1e
ToFi11Done
: l i n e st os k i p * 2
cx.1 shl
; l i n e st os k i p * 4
cx.1 shl
: a d v a n c et h r o u g ht h el i n el i s t
bx.cx add
:startatthetopclipline
mov
c x . [LC1 ipMi nY 1
MinYNotClipped:
mov
dx.si
+ 1
; b o t t o mr o wt od r a w
dx,cx add
: c l i p p e da tb o t t o m ?
cmp
d x . [LC1 ipMaxY 1
:no
jle
MaxYNotClipped
:# o f l i n e s t o c l i p o f f t h e b o t t o m
d x . [ - C lsi pu M
b axYI
:# o f l i n e s l e f t t o d r a w
sub
. d sx i
: a l ll i n e sa r ec l i p p e d
T o F i l lj O
l eo n e
MaxYNotClipped:
:pointtothestartofthefirst
mov
ax,SCREENlWIDTH/4
: s c a nl i n eo nw h i c ht od r a w
mu1
cx
:offsetoffirstline
ax.[-CurrentPageBasel
add
:ES:DX p o i n t s t o f i r s t s c a n l i n e t o d r a w
mov
dx.ax
fill
; c o l o rw i t hw h i c ht o
mov
a h , b[pybtper + C o l o r l
F i 11 Loop:
:remember l i n e l i s t l o c a t i o n
bx push
o f s t a r t: r oe fm
o fefdm
sxebpteurs h
Adding a Dimension
941
s ip u s h
;remember # o f 1 i n e s t o d r a w
mov
, [ bdxi+ X S t a r t ]
fill o n t h i s l i n e
; l e f te d g eo f
cmp
d i , [LC1 ipMinX1
; c l i p p e d t o 1 e f t edge?
M
inXNotCl ipped
jge
;no
mov
d i , [LC1 ipMinX]
;yes.cliptotheleft
edge
M i nXNotCl ipped:
mov
. dsi i
mov
cx.Cbx+XEndl
; r i g h te d g eo f
fill
cmp
c x , [LC1 ipMaxX]
; c l i p p e dt or i g h te d g e ?
;no
jl
MaxXNotCl
ipped
mov
c x , [LC1
ipMaxX]
; y e s ,c l i pt ot h er i g h t
edge
cx
dec
MaxXNotCl ipped:
cx.di
CmP
i f n e g awt i dv et h
L i nl D
e; sF
okni li ep
jl
shr
;X/4
of fsripfreocsiscfaxnettent l
d i .1
d i ,1
shr
; line
d i ,d x
add
; o f f s e to ff i r s tr e c tp i x e li nd i s p l a y
mem
dx.si
mov
and
l esf tm
ip-u,ela0;palds0ongk0oee3k h
bh.LeftClipPlaneMask[si]
; t oc l i p
& p u i tn
BH
mov
si .cx
mov
r pi gl ahand
nt -ee dugpe ;sl oi o, 0k0 0 3 h
mov
bl.RightClipPlaneMask[sil
; mask t oc l i p
& p u it n
BL
and
d x . n o tO l l b
; c a l c u l a t e I o fa d d r e s s e sa c r o s sr e c t
cx.dx
sub
shr
CX.1
shr
cx.1
; # o fa d d r e s s e sa c r o s sr e c t a n g l et o
fill - 1
MasksSet
jnz
; t h e r e ' sm o r et h a no n eb y t et od r a w
bh,bl
and
so c o m b i n e t h e l e f t
; t h e r e ' so n l yo n eb y t e ,
; a n dr i g h te d g ec l i p
masks
MasksSet:
mov
dx.SC-INDEX+l
;alreadypointstothe
Map Mask r e g
F i 11 RowsLoop:
mov
a1 ,bh
;putleft-edgeclip
mask i n AL
out
dx,al
; s e tt h el e f t - e d g ep l a n e( c l i p )
mask
mov
a1 .ah
; p u tc o l o ri n
AL
stosb
; d r a wt h el e f te d g e
; c o u n to f fl e f te d g eb y t e
dec
cx
Fi 11 LoopBottom
; t h a t ' st h eo n l yb y t e
js
DoRightEdge
; t h e r ea r eo n l yt w ob y t e s
jz
rnov
; m i d d l ea d d r e s s e sa r ed r a w n
4 p i x e l s a t a pop
a1 .OOfh
out
dx.al
; s e tt h em i d d l ep i x e l
mask t o n o c l i p
mov
a1 ,ah
; p u t c o l o r i n AL
stosb
; d r a wt h em i d d l ea d d r e s s e sf o u rp i x e l sa p i e c e
rep
DoRightEdge:
mov
a1 ,b l
; p u tr i g h t - e d g ec l i p
mask i n AL
dx.al
out
; s e tt h er i g h t - e d g ep l a n e( c l i p )
mask
a1 .ah
mov
:putcolorin
AL
; d r a wt h er i g h te d g e
stosb
F i l l LoopBottom:
LineFillDone:
si
pop
;retrieve # oflinestodraw
;retrieveoffsetofstartofline
POP
dx
;retrievelinelistlocation
POP
bx
add
dx,SCREENLWIDTH/4
;pointtostartofnextline
b xaH
.dsLdi zi nee
:pointtothenextlinedescriptor
si
dec
: c o u n t down l i n e s
j 1nLF1zoi o p
;XStart
942
Chapter 50
F i 11 Done:
pop
di
pop
si
POP
bP
ret
- D r a w H o r i z o n t a l L i n e L i s te n d p
end
; r e s t o r ec a l l e r ' sr e g i s t e rv a r i a b l e s
: r e s t o r ec a l l e r ' ss t a c kf r a m e
The other2-D element we need is some way to erase the polygon at its old location
before it's moved and redrawn. We'll do that by remembering the bounding rectangle of the polygon each time it's drawn, then erasing by clearing that area with a
rectangle fill.
With the 2-D side of the picture well under control, we're ready to concentrate on
the good stuff. Listings 50.2 through 50.5 are the sample 3-D animation program.
Listing 50.2 provides matrix multiplication functions in a straightforward fashion.
Listing 50.3 transforms, projects, and draws polygons. Listing 50.4 is the general
header file for the program, and Listing 50.5 is the main animation program.
Other modules required are:Listings 47.1 and 47.6 from Chapter47 (Mode X mode
set, rectangle fill); Listing 49.6 from Chapter49; Listing 39.4 from Chapter39 (polygon edge scan);and theFillConvexPolygon()function fromListing 38.1 in Chapter
38. All necessary code modules, along with a project file, are present in the
subdirectory for this chapter on the
listings disk,whether they werepresented in this
chapter orsome earlier chapter. This will be the case for the nextseveral chapters as
well, where listings from previous chapters are, referenced. This scheme may crowd
the listings diskette a little bit, but it will certainly reduce confusion!
150-2.C
LISTING 50.2
/*
M a t r i xa r i t h m e t i cf u n c t i o n s .
C++ i nt h es m a l lm o d e l .
T e s t e dw i t hB o r l a n d
/*
M a t r i xm u l t i p l i e sX f o r mb yS o u r c e V e c .a n ds t o r e st h er e s u l ti n
D e s t V e c .M u l t i p l i e s a 4 x 4m a t r i xt i m e s
a 4 x 1m a t r i x :t h er e s u l t
i s a 4 x 1m a t r i x ,a sf o l l o w s :
..
1 4 x 4
"
"
"
"
.. ..
"
I 1 41 14 1
I x 1 x 1
I
I ..
Il l l
"
1 x 1
"
"
*/
v o i dX f o r m V e c ( d o u b 1 eX f o r m [ 4 1 [ 4 1 .d o u b l e
double * DestVec)
*I
SourceVec.
i n t i.j;
f o r( i - 0 ;i < 4 :
i++)
I
DestVecCil
0;
f o r (j-0; j < 4 : j++)
D e s t V e c C i l +- X f o r m C i l C j l
SourceVecCjl:
Adding a Dimension
943
/*
M a t r i xm u l t i p l i e sS o u r c e X f o r m lb yS o u r c e X f o r m Za n ds t o r e st h e
a 4 x 4m a t r i xt i m e s
a 4 x 4m a t r i x ;
r e s u l ti nD e s t X f o r m .M u l t i p l i e s
a 4 x 4m a t r i x ,a sf o l l o w s :
theresultis
"
"
"
"
"
"
"
"
"
"
"
..
*/
v o i dC o n c a t X f o r m s ( d o u b 1 eS o u r c e X f o r m l [ 4 1 [ 4 1 .d o u b l eS o u r c e X f o r m 2 [ 4 1 [ 4 1 ,
d o u b l eD e s t X f o r m [ 4 1 C 4 1 )
i n ti , j , k ;
i++)
{
f o r( i - 0 :i < 4 ;
f o r( j - 0 ;j < 4 :
j++)I
DestXform[il[jl
0;
f o r (k-0:k<4;
k++)
D e s t X f o r m [ i l [ j l +- S o u r c e X f o r m l C i l [ k l
SourceXform2~klCjl;
T r a n s f o r m sc o n v e xp o l y g o nP o l y( w h i c hh a sP o l y L e n g t hv e r t i c e s ) .
p e r f o r m i n gt h et r a n s f o r m a t i o na c c o r d i n gt oX f o r m( w h i c hg e n e r a l l y
r e p r e s e n t s a t r a n s f o r m a t i o nf r o mo b j e c ts p a c et h r o u g hw o r l ds p a c e
t ov i e ws p a c e ) .t h e np r o j e c t st h et r a n s f o r m e dp o l y g o no n t ot h e
it i nc o l o rC o l o r .A l s ou p d a t e st h ee x t e n to ft h e
screenanddraws
r e c t a n g l e( E r a s e R e c t )t h a t ' su s e dt oe r a s et h es c r e e nl a t e r .
C++ i nt h es m a l lm o d e l .
*/
T e s t e dw i t hB o r l a n d
#i
ncl ude "polygon. h"
v o i d XformAndProjectPoly(doub1e X f o r m [ 4 ] [ 4 ] .s t r u c tP o i n t 3
i n tP o l y L e n g t h .i n tC o l o r )
Poly,
i n t i;
s t r u c t P o i n t 3 XformedPoly[MAX_POLY-LENGTH];
s t r u c t P o i n t ProjectedPoly[MAX-POLY-LENGTH];
s t r u c tP o i n t L i s t H e a d e rP o l y g o n :
/ * T r a n s f o r mt ov i e ws p a c e ,t h e np r o j e c tt ot h es c r e e n
*/
f o r( i - 0 i; < P o l y L e n g t h ;
i++)
{
/ * T r a n s f o r mt ov i e ws p a c e
*/
X f o r m V e c ( X f o r m .( d o u b l e* ) & P o l y [ i l .( d o u b l e* ) & X f o r m e d P o l y [ i l ) ;
/ * P r o j e c t t h e X & Y c o o r d i n a t e st ot h es c r e e n ,r o u n d i n gt ot h e
n e a r e s ti n t e g r a lc o o r d i n a t e s .
The Y c o o r d i n a t ei sn e g a t e dt o
Y i s up. t os c r e e n
f l i pf r o mv i e ws p a c e ,w h e r ei n c r e a s i n g
Y i s down.Add
i nh a l ft h es c r e e n
s p a c e ,w h e r ei n c r e a s i n g
w i d t ha n dh e i g h tt oc e n t e r
on t h es c r e e n */
ProjectedPolyCi1.X
( ( i n t ) (XformedPoly[i].X/XformedPoly[i].Z
PROJECTION~RATIO*~SCREEN~WIDTH/Z.O~+O.5~~+SCREEN~WIDTH/Z;
ProjectedPolyCi1.Y
( ( i n t ) (XformedPolyCil.Y/XformedPoly[i].Z
-1.0 * PROJECTION-RATIO * (SCREEN-WIDTH / 2 . 0 ) + 0.5)) +
SCREEN-HEIGHT/Z:
/ * A p p r o p r i a t e l ya d j u s tt h ee x t e n to ft h er e c t a n g l eu s e dt o
*/
e r a s et h i sp a g el a t e r
i f ( P r o j e c t e d P o l y C i 1 . X > EraseRect[NonDisplayedPage].Right)
i f ( P r o j e c t e d P o l y C i 1 . X < SCREENKWIDTH)
EraseRect[NonDisplayedPage].Right
ProjectedPoly[i].X:
e l s e EraseRect[NonDisplayedPagel.Right
SCREEN-WIDTH;
--
944
Chapter 50
i f (ProjectedPoly[il.Y
i f (ProjectedPoly[il.Y
>
EraseRect[NonDisplayedPagel.Bottom)
SCREENkHEIGHT)
EraseRect[NonDisplayedPagel.Bottom = P r o j e c t e d P o l y [ i l . Y :
SCREEN-HEIGHT:
e l s e EraseRect[NonDisplayedPagel.Bottom
i f ( P r o j e c t e d P o l y [ i l . X < EraseRect[NonDisplayedPagel.Left)
i f (ProjectedPolyCi1.X > 0)
EraseRect[NonDisplayedPagel.Left = ProjectedPolyCi1.X:
e l s e E r a s e R e c t [ N o n D i s p l a y e d P a g e l . L e f t = 0;
i f ( P r o j e c t e d P o l y [ i ] . Y < EraseRect[NonDisplayedPagel.Top)
i f (ProjectedPolyCi1.Y > 0)
EraseRect[NonDisplayedPagel.Top = P r o j e c t e d P o l y [ i l . Y ;
e l s e EraseRect[NonDisplayedPagel.Top = 0 :
<
/ * Draw t h ep o l y g o n * /
DRAWlPDLYGON(ProjectedPo1y. P o l y L e n g t hC
. olor.
0. 0 ) ;
LISTING 50.4
/*
POLYG0N.H
POLYG0N.H: Header f i l e f o r p o l y g o n - f i l l i n g c o d e , a l s o i n c l u d e s
*/
a number o fu s e f u li t e m sf o r3 - 0a n i m a t i o n .
# d e f i n e MAX-POLY-LENGTH 4 /* f o u rv e r t i c e si st h e
max p e rp o l y */
# d e f i n e SCREENKWIDTH 320
# d e f i n e SCREEN-HEIGHT 240
# d e f i n e PAGEDKSTART-OFFSET 0
# d e f i n e PAGElLSTART-OFFSET (((long)SCREENKHEIGHT*SCREEN-WIDTH)/4)
/* R a t i o : d i s t a n c e f r o m v i e w p o i n t t o p r o j e c t i o n p l a n e
/ w i d t ho f
p r o j e c t i o np l a n e .D e f i n e st h ew i d t ho ft h ef i e l do fv i e w .L o w e r
w i d e r f i e l d s o f v i e w :h i g h e rv a l u e s
= narrower
a b s o l u t ev a l u e s
# d e f i n e PRDJECTIDN-RATIO
- 2 . 0 / * n e g a t i v eb e c a u s ev i s i b l e
Z
c o o r d i n a t e sa r en e g a t i v e
*I
/ * Draws t h e p o l y g o n d e s c r i b e d b y t h e p o i n t l i s t P o i n t L i s t i n c o l o r
( X . Y ) */
C o l o rw i t ha l lv e r t i c e so f f s e tb y
# d e f i n e DRAW-POLYGON(PointList.NumPoints.Co1or.X.Y)
\
Polygon.Length
NumPoints;
\
Polygon.PointPtr
PointList;
\
FillConvexPolygon(&Polygon. C o l o r , X . Y):
/* D e s c r i b e s a s i n g l e2 - Dp o i n t
s t r u c tP o i n t I
i n t X:
/* X coordinate */
i n t Y:
/* Y coordinate */
I:
/*
*/
*/
D e s c r i b e s a s i n g l e3 - Dp o i n ti n
s t r u c tP o i n t 3 {
double X;
/* X coordinate */
double Y :
/* Y coordinate */
double Z;
/* 2 coordinate */
double W:
h o m o g e n e o u sc o o r d i n a t e s
*/
1:
/*
D e s c r i b e s a s e r i e so fp o i n t s( u s e dt os t o r e
a listofverticesthat
d e s c r i b e a p o l y g o n :e a c hv e r t e xi s
assumed t oc o n n e c tt ot h et w o
a d j a c e n tv e r t i c e s ,a n dt h el a s tv e r t e xi s
assumed t o c o n n e c t t o t h e
f i r s t ) */
s t r u c tP o i n t L i s t H e a d e r
I
i n tL e n g t h :
/* # o f p o i n t s * /
s t r u c tP o i n t * P o i n t P t r :
/* p o i n t e r t o l i s t o f p o i n t s
*/
1:
Adding a Dimension
945
I* D e s c r i b e st h eb e g i n n i n ga n de n d i n g
h o r i z o n t a ll i n e * I
s t r u c tH L i n e
i n tX S t a r t :
i n t XEnd;
X c o o r d i n a t e so f
a single
I* X c o o r d i n a t e o f l e f t m o s t p i x e l
i n l i n e *I
I* X c o o r d i n a t eo fr i g h t m o s tp i x e li nl i n e
*I
1:
I* D e s c r i b e s a L e n g t h - l o n gs e r i e so fh o r i z o n t a ll i n e s ,a l l
assumed t o
be o nc o n t i g u o u ss c a nl i n e ss t a r t i n ga tY S t a r ta n dp r o c e e d i n g
downward(used
t o d e s c r i b e a s c a n - c o n v e r t e dp o l y g o nt ot h e
*/
l o w - l e v e hl a r d w a r e - d e p e n d e n td r a w i n gc o d e )
s t r u c tH L i n e L i s t I
i n tL e n g t h :
I* # o f h o r i z o n t a l l i n e s
*I
intYStart:
I* Y c o o r d i n a t e o f t o p m o s t l i n e
s t r u c tH L i n e * H L i n e P t r :
I* p o i n t e r t o l i s t o f h o r z l i n e s
I:
s t r u c tR e c t
{ i n tL e f t ,
*I
*I
1:
T o pR
, i g h tB
, ottom:
e x t e r nv o i dX f o r m V e c ( d o u b 1 eX f o r m C 4 3 [ 4 1 .d o u b l e
* SourceVec.
double * DestVec):
e x t e r nv o i dC o n c a t X f o r m s ( d o u b 1 eS o u r c e X f o r m l [ 4 ] [ 4 ] .
d o u b l eS o u r c e X f o r m 2 [ 4 ] [ 4 ] .d o u b l eD e s t X f o r m [ 4 1 [ 4 1 ) ;
e x t e r nv o i d X f o r m A n d P r o j e c t P o l y ( d o u b 1 e X f o r m [ 4 ] [ 4 ] .
s t r u c tP o i n t 3 * P o l y ,i n tP o l y L e n g t h ,i n tC o l o r ) :
e x t e r n i n t FillConvexPolygon(struct P o i n t L i s t H e a d e r *, i n t . i n t . i n t ) ;
e x t e r nv o i dS e t 3 2 0 ~ 2 4 0 M o d e ( v o i d ) :
e x t e r nv o i dS h o w P a g e ( u n s i g n e di n tS t a r t o f f s e t ) :
e x t e r nv o i dF i l l R e c t a n g l e X ( i n tS t a r t X .i n tS t a r t Y .i n t
EndX,
i n t EndY, u n s i g n e di n tP a g e B a s e .i n tC o l o r ) :
e x t e r ni n tD i s p l a y e d P a g e .N o n D i s p l a y e d P a g e :
e x t e r ns t r u c tR e c tE r a s e R e c t C ] :
LISTING 50.5
150-5.C
I* S i m p l e3 - Dd r a w i n gp r o g r a mt ov i e w
a p o l y g o na s
it r o t a t e s i n
i sc o n g r u e n tw i t hw o r l ds p a c e ,w i t ht h e
Mode X . Viewspace
( 0 . 0 . 0 ) o fw o r l ds p a c e ,l o o k i n gi n
v i e w p o i n tf i x e da tt h eo r i g i n
Z. A r i g h t - h a n d e d
t h ed i r e c t i o no fi n c r e a s i n g l yn e g a t i v e
c o o r d i n a t es y s t e mi su s e dt h r o u g h o u t .
C++ i nt h es m a l lm o d e l .
*/
T e s t e dw i t hB o r l a n d
# i n c l u d e< c o n i o . h >
# i n c l u d e< s t d i o . h >
#i
ncl ude <dos. h>
#i
ncl ude <math. h>
l i n c l ude "polygon. h"
v o i dm a i n ( v o i d ) :
I* B a s e o f f s e t o f p a g e t o w h i c h t o d r a w
u n s i g n e di n tC u r r e n t P a g e B a s e
= 0;
/ * C l i pr e c t a n g l e :c l i p st ot h es c r e e n
*I
*/
i n t ClipMinX-0C
. lipMinY-0:
i n t ClipMaxX-SCREEN-WIDTH.
ClipMaxY-SCREEN-HEIGHT;
/ * R e c t a n g l es p e c i f y i n ge x t e n tt ob ee r a s e di ne a c hp a g e
*/
s t r u c tR e c tE r a s e R e c t C Z ]
[ IO. 0 . SCREEN-WIDTH. SCREEN-HEIGHT}.
[ O , 0 . SCREEN-WIDTH,
SCREEN-HEIGHT}
}:
I* T r a n s f o r m a t i o nf r o mp o l y g o n ' so b j e c ts p a c et ow o r l ds p a c e .
I n i t i a l l y s e t up t o p e r f o r m n o r o t a t i o n a n d t o
move t h ep o l y g o n
i n t ow o r l ds p a c e- 1 4 0u n i t s
away f r o m t h e o r i g i n
down t h e Z a x i s .
u n i t s away
G i v e nt h ev i e w i n gp o i n t ,- 1 4 0
down t h e 2 a x i s means140
s t r a i g h t ahead i n t h e d i r e c t i o n o f v i e w .
T h ep r o g r a md y n a m i c a l l y
*/
c h a n g e st h er o t a t i o na n dt r a n s l a t i o n .
946
Chapter 50
s t a t i cd o u b l eP o l y W o r l d X f o r m [ 4 1 [ 4 1
I
{1.0. 0.0, 0.0. 0 . 0 1 ,
(0.0. 1.0, 0.0, 0.0).
{O.O. 0.0, 1 . 0 , -140.01.
{O.O. 0.0, 0.0, 1.0) ) :
/ * T r a n s f o r m a t i o nf r o mw o r l ds p a c ei n t ov i e ws p a c e .I nt h i sp r o g r a m .
t h ev i e wp o i n ti sf i x e da tt h eo r i g i no fw o r l ds p a c e ,l o o k i n g
down
2. s o v i e w
t h e 2 a x i si nt h ed i r e c t i o no fi n c r e a s i n g l yn e g a t i v e
*/
space i s i d e n t i c a l t o w o r l d s p a c e : t h i s i s t h e i d e n t i t y m a t r i x .
s t a t i cd o u b l eW o r l d V i e w X f o r m [ 4 1 [ 4 1
I
{1.0.
t0.0,
(0.0,
(0.0.
0.0, 0.0,
0.01.
1 . 0 . 0.0, 0 . 0 1 .
0.0, 1 . 0 , 0.01.
0.0, 0.0, 1.01
):
s t a t i cu n s i g n e di n tP a g e S t a r t O f f s e t s [ E l
{PAGEO_START-OFFSET.PAGEl-STARTCOFFSET1:
inD
t isplayedPageN
. onDisplayedPage:
v o i dm a i n 0
[
i n t Done
0:
d o u b l eW o r k i n g X f o r m [ 4 1 [ 4 1 ;
s t a t i cs t r u c tP o i n t 3T e s t P o l y C l
~t-30.-15.0.11.~0.15.0,11,t10,-5,0,111;
# d e f i n e TEST-POLY-LENGTH
d o u b l eR o t a t i o n = " P I
u n i o n REGS r e g s e t ;
(sizeof(TestPoly)/sizeof(struct P o i n t 3 ) )
/* i n i t i a l r o t a t i o n
3 d e g r e e s */
/ 60.0:
Set320x240ModeO;
ShowPage(PageStartOffsetsCDisp1ayedPage
01);
/ * Keep r o t a t i n gt h ep o l y g o n ,d r a w i n g
i t t ot h eu n d i s p l a y e dp a g e .
and f l i p p i n g t h e p a g e t o
show it * /
do
CurrentPageBase =
/ * s e l e cot t h e pr a g e
f o r d r a w i n gt o */
P a g e S t a r t O f f s e t s [ N o n O i s p l a y e d P a g e = D i s p l a y e d P a g e A 11;
/ * M o d i f yt h eo b j e c ts p a c et ow o r l ds p a c et r a n s f o r m a t i o nm a t r i x
f o rt h ec u r r e n tr o t a t i o na r o u n dt h e
Y a x i s */
PolyWorldXform[O][O] = P o l y W o r l d X f o r m [ 2 1 [ Z 1
cos(Rotation);
-(PolyWorldXform[O1[~]
sin(Rotation)):
PolyWorldXform[E][O]
/ * C o n c a t e n a t et h eo b j e c t - t o - w o r l da n dw o r l d - t o - v i e w
will
t r a n s f o r m a t i o n s t o make a t r a n s f o r m a t i o n m a t r i x t h a t
c o n v e r tv e r t i c e sf r o mo b j e c ts p a c et ov i e ws p a c ei n
a single
operation */
C o n c a t X f o r m s ( W o r 1 d V i e w X f o r m . P o l y W o r l d X f o r mW
, orkingXform);
/ * C l e a rt h ep o r t i o no ft h en o n - d i s p l a y e dp a g et h a t
was drawn
t ol a s tt i m e ,t h e nr e s e tt h ee r a s ee x t e n t
*/
FillRectangleX(EraseRect[NonDisplayedPagel.Left,
EraseRect[NonDisplayedPagel.Top.
EraseRect[NonDisplayedPagel.Right.
EraseRect[NonDisplayedPagel.Bottom. C u r r e n t P a g e B a s e . 0 ) ;
EraseRect[NonDisplayedPagel.Left
EraseRect[NonDisplayedPagel.Top
Ox7FFF:
EraseRect[NonDisplayedPagel.Right =
EraseRect[NonDisplayedPagel.Bottom
0;
/ * T r a n s f o r mt h ep o l y g o n ,p r o j e c t
i t o nt h es c r e e n ,d r a w
i t */
X f o r m A n d P r o j e c t P o l y ( W o r k i n g X f o r m . T e s t P o l y . TEST-POLYCLENGTH.9):
/* F l i pt od i s p l a yt h ep a g ei n t ow h i c h
we j u s t d r e w * /
ShowPage(PageStartOffsets[DisplayedPage = NonDisplayedPagel):
/ * R o t a t e 6 d e g r e e sf a r t h e ra r o u n dt h e
Y a x i s */
i f ( ( R o t a t i o n += ( M - P I / 3 0 . 0 ) ) >- ( M _ P I * E ) )R o t a t i o n
- = M-PI*2;
Adding a Dimension
947
if (kbhit0) {
s w i t c h( g e t c h 0 )
{
case
0x16:
/* Esc t oe x i t */
Done
1; b r e a k ;
/ * away ( - 2 ) * I
c a s e ' A ' : c a s* ea ' :
P o l y W o r l d X f o r m C E l [ 3 1 -- 3 . 0 b; r e a k ;
/* t o w a r d s (+2). D o n 'at l l o wt go et to o
case ' T ' :
case ' t ' :
/ * c l o s e , s o 2 c l i p p i nigs n n' te e d e d
i f ( P o l y W o r l d X f o r m [ 2 1 C 3 1 < -40.0)
P o l y W o r l d X f o r m ~ 2 1 C 3 1+- 3 . 0 :b r e a k ;
case 0:
/ * e x t e n d ec od d e
*/
s w i t c h( g e t c h 0 )
I
case
0x46:
/* l e f t ( - X ) */
3 . 0 b; r e a k :
PolyWorldXformC01[31
case Ox4D:
/ * r i g h t (+X) */
PolyWorldXformCO1[31 +- 3 . 0 ;b r e a k :
case
0x48:
/ * up ( + Y ) */
P o l y W o r l d X f o r m [ 1 1 [ 3 1 +- 3 . 0 ;b r e a k ;
case
0x50:
/ * down ( - Y ) */
P o l y W o r l d X f o r m ~ 1 3 [ 3 1 -- 3 . 0 b; r e a k :
default:
break:
*I
*/
--
break:
/ * a no yt hkepetroay u s e
default:
g e t c h 0 :b r e a k ;
*I
} w h i l e( ! D o n e ) ;
mode and e x i t */
regset.x.ax
0x0003;
/* AL 3 s e l e c t s8 0 x 2 5t e x t
i n t 8 6 ( 0 x 1 0 .& r e g s e t ,& r e g s e t ) ;
I* R e t u r n t o t e x t
mode
*/
948
Chapter 50
Next
Previous Home
with the projection and 2-D clipping, and would require 3-D clipping, which is far
more complicated than2-D. Well get to3-D clipping at some point, but,for now, its
much simplerjust to limit all vertices to negative Z coordinates. The polygon does
get mighty close to the viewpoint, though; run the program and use the Tkey to
move the polygon as close as possible-the near vertex swinging pastprovides a striking sense of perspective.
The performance of Listing 50.5 is, perhaps, surprisingly good, clocking in at 16
frames per second on a 20 MHz 386 with a VGA of average speed and no 387, although there is, of course, only one polygon being drawn, rather than the hundreds
or thousands wed ultimately like. Whats far more interesting is where the execuone polygon, 73 percent
tion time goes.
Even though theprogram is working with only
of the time goes for transformation and projection.
An additional 7 percent is spent
waiting to flip the screen. Only 20 percent of the total time is spent in all other
activity-and only 2 percent is spent actually drawing polygons. Clearly, well want to
tackle transformation and projection first when we look to speed things up. (Note,
however, that a math coprocessor would considerably decrease the time taken by
floating-point calculations.)
In Listing 50.3, when the extent of the bounding rectangle is calculated for later
erasure purposes, that extent is clipped to the screen. This is due to the lack of
clipping in the rectangle fill code from Listing 47.5 in Chapter 47; the problem
would more appropriately be addressed by putting clipping into the fill code, but,
unfortunately, Ilack the space to do that here.
Finally, observe the jaggiescrawling along theedges of the polygon asit rotates. This
is temporal aliasing at its finest! We wont address antialiasing further, realtime
antialiasing being decidedly nontrivial, but this should give you an idea of why
antialiasing is so desirable.
An Ongoing Journey
In the next chapter,
well assignfronts and backs to polygons, and start drawing only
those that are facing the viewer. That will enable us to handle convex polyhedrons,
such as tetrahedrons andcubes. Well also look at interactively controllable rotation,
and at morecomplex rotations than thesimple rotation around the Y axis that we
did this time. In time, well use fixed-point arithmetic to speed things up, and do
some shading and texture mapping. The journey
has only begun; well get to all that
and more soon.
Adding a Dimension
949
Previous
chapter 51
sneakers in space
Home
Next
In some senses, however, 3 - 0 animation is easier than 2-0. Because thereS more
going on in 3 - 0 animation, the eye and brain tend to make more assumptions, and
so are more apt to see what they expect to see, rather than what 5. actually there.
953
about a bit.
Youll be busy viewingthe asteroids in their
primary role, as objects to be
navigated around, and themere presence of topographic detailwill suffice;without
being aware of it, youll fill in theblanks. Your mind will see the topography peripherally, recognize it for what it is supposed to be, and, unless the landscape does
something really obtrusive such as vanishing altogether or suddenly shooting spike
a
miles into space, you will see what you expect to see: a bunch of nicely detailed
asteroids tumblingaround you.
To what extent can you relyon theeye and mind tomake up for imperfections in the
3-D animation process? In some areas, hardly at all; for example, jaggies crawling
along edges stick out like red flags, and likewise for flicker. In other areas, though,
the human perceptual system is more forgiving than youd think. Consider this: At
the end of Return of the Jedi, in the battle to end all battles around the Death Star,
there is a sequence of about five seconds in which several spaceships are visible in
the background. One of those spaceships (and its not very far in the background,
either) looks a bit unusual.
What it looks like is a sneaker. In fact, it is a sneaker-but
unless you know to look for it, youll never notice it, because your mind is busy
making simplifylng assumptions about the complex scene its seeing-and one of
those assumptionsis that medium-sized objects floatingin space are spaceships, unless
proven otherwise.(Thanks to Chris Heckerfor pointing this out. Id never have noticed
the sneaker,myself, without being tippedoff-which is, of course, thewhole point.)
If its good enough for GeorgeLucas, its good enough for us. And with that, lets
resume our quest for realtime3-D animation on the PC.
954
Chapter 51
955
polygon; then, well know that the polygon is visibleif and only if the cross-product
vector points toward the viewer. We need one morething to make the cross-product
approach work, though. Thecross-product can actually point eitherway, depending
on which edges of the polygon we choose to work with and the order inwhich we
evaluate them, so we must establish some conventions for defining polygons and
evaluating the cross-product.
Well define only convex polygons, with the vertices defined in clockwise order, as
viewed from the outside; that is, if youre looking at thevisible side of the polygon,
the vertices will appear in the polygon definition in clockwise order. With those assumptions, the cross-product becomes a quick and easy indicator of polygon
orientation with respect to the viewer; well calculate it as the cross-product of the
first and last vectors in apolygon, as shownin Figure 51.2, and if its pointing toward
the viewer, well know that the polygon is visible. Actually, we dont even have to
calculate the entirecross-product vector, because the Z component alonesuffices to
tell us which way the polygon is facing: positive Z means visible, negative Z means
not. The Z component can be calculated very efficiently, with only two multiplies
and a subtraction.
The question remains of the proper space in which to perform backface removal.
Theres a temptation
to perform it in
view space, which is,after all, the space defined
with respect to the viewer, but view space is not a good choice. Screen space-the
space in which perspective projection has been performed-is the best choice. The
purpose of backface removal is to determine whether eachpolygon is visible to the
viewer, and, despite its name, view space does not provide that information; unlike
screen space, it doesnot reflect perspective effects.
Vector w
(polygon edge #3)
Vertex 3
Figure 5 1.2
956
Chapter 51
Polygon normal = v x w
(cross-productof v & w)
Vertex 1
Backface removal may alsobe performed using the polygon vertices in screen coordinates, which are integers. This isless accurate than using the screen space
coordinates, which are floating point, but is, by the same token, faster. In Listing
51.3, which we'll discuss shortly, backface removal is performed in screen coordinates in the interests of speed.
Backface removal, asimplemented in Listing 51.3,will not work reliablyif the polygon is not convex, if the vertices don't appearin clockwise order, if either thefirst or
last edge in a polygon has zero length, or if the first and last edges are collinear.
These latter two points are thereason it's preferable to work in screen space rather
than screen coordinates (which suffer from rounding problems), speed considerations aside.
LISTING51.1151-1.C
/*
3D a n i m a t i o np r o g r a mt ov i e w
a cubeas
i t r o t a t e s i n Mode X . T h ev i e w p o i n t
isfixedattheorigin
( 0 . 0 . 0 ) o f w o r l ds p a c e ,l o o k i n g
i n t h ed i r e c t i o n o f
i n c r e a s i n g l yn e g a t i v e Z . A r i g h t - h a n d e dc o o r d i n a t es y s t e mi su s e dt h r o u g h o u t .
Al C c o d et e s t e dw i t hB o r l a n d
C++ i n C c o m p i l a t i o n mode. */
#i
n c l u d e < c o n i0. h>
#i
ncl ude <dos h>
# i n c l u d e< m a t h . h >
#i
ncl ude "polygon. h"
# d e f i n e ROTATION
("PI
/ 30.0)
/*
r o t a t eb y
6 d e g r e e sa t
a time
*/
/*
b a s e o f f s e t o f page t o w h i c h t o d r a w
*/
u n s i g n e di n tC u r r e n t P a g e B a s e
= 0:
/ * C l i pr e c t a n g l e :c l i p st ot h es c r e e n
*/
i n t ClipMinX-0.ClipMinY-0:
i n t ClipMaxX-SCREEN-WIDTH.
ClipMaxY-SCREEN-HEIGHT:
/ * R e c t a n g l es p e c i f y i n ge x t e n tt ob ee r a s e di ne a c hp a g e .
*/
s t r u c tR e c tE r a s e R e c t C E l
I IO. 0 . SCREEN-WIDTH,
SCREEN-HEIGHT),
IO. 0 . SCREEN-WIDTH.
SCREEN-HEIGHT)
I:
..
Sneakers in Space
957
s t a t i cu n s i g n e di n tP a g e S t a r t O f f s e t s C 2 1
{PAGEO-START-OFFSET.PAGEl-START-OFFSETl:
i n t OisplayedPage.NonDisplayedPage:
I* T r a n s f o r m a t i o nf r o mc u b e ' so b j e c ts p a c et ow o r l ds p a c e .I n i t i a l l y
s e tu pt op e r f o r mn or o t a t i o na n dt o
move t h e c u b e i n t o w o r l d
s p a c e- 1 0 0u n i t s
away f r o m t h e o r i g i n
down t h e Z a x i s .G i v e nt h e
v i e w i n gp o i n t ,- 1 0 0
down t h e Z a x i s means 1 0 0u n i t s
away i n t h e
d i r e c t i o no fv i e w .
T h ep r o g r a md y n a m i c a l l yc h a n g e sb o t ht h e
t r a n s l a t i o na n dt h er o t a t i o n .
*I
s t a t i cd o u b l eC u b e W o r l d X f o r m [ 4 1 [ 4 1
I
I 1 . 0 . 0.0, 0.0, 0.0).
/*
T r a n s f o r m a t i o nf r o mw o r l ds p a c ei n t ov i e ws p a c e .B e c a u s ei nt h i s
a p p l i c a t i o nt h ev i e wp o i n ti sf i x e da tt h eo r i g i no fw o r l ds p a c e ,
l o o k i n g down t h e Z a x i s i n t h e d i r e c t i o n o f i n c r e a s i n g
Z . viewspace
i d e n t i c a lt ow o r l ds p a c e ,
and t h i s i s t h e i d e n t i t y m a t r i x .
*I
{
s t a t i cd o u b l eW o r l d V i e w X f o r m [ 4 ] [ 4 1
I 1 . 0 . 0.0, 0.0, 0.0).
{O.O. 1.0, 0.0, 0.01.
{O.O. 0.0, 1.0. 0.01.
{O.O. 0.0, 0.0, 1.01
1:
*I
I* a l lv e r t - i c e si nt h ec u b e
s t a t i cs t r u c tP o i n t 3C u b e V e r t s C l
~15.15.15.11.{15.15.-15,1)~{15,-15,15,1~,~15,-15,-15,1~,
{-15.15.15.1).{-15.15,-15,1)~{-15,-15,15,1~,~-15,-15,-15,1~~;
I* v e r t i c e s a f t e r t r a n s f o r m a t i o n
s t a t i cs t r u c tP o i n t 3
*I
XformedCubeVerts[sizeof(CubeVerts)/sizeof(struct
Point3)l;
*I
I* v e r t i c e s a f t e r p r o j e c t i o n
s t a t i cs t r u c tP o i n t 3
ProjectedCubeVerts[sizeof(CubeVerts)/sizeof(struct
I* v e r t i c e s i n s c r e e n c o o r d i n a t e s
Point3)I;
*I
s t a t i cs t r u c tP o i n t
ScreenCubeVerts[sizeof(CubeVerts)/sizeof(struct
---
I* v e r t e x i n d i c e s f o r i n d i v i d u a l f a c e s
Point3)I:
*I
s t a t i ci n tF a c e l C l
{1.3,2.0};
s t a t i c i n t Face2[1
{5.7.3,1):
s t a t i c i n t Face3[]
(4.5.1.0):
s t a t i c i n t Face4[]
{3.7.6.2):
{5.4.6.7):
s t a t i c i n t Face5[]
s t a t i c i n t Face6[]
{0,2.6.4};
I* l i s t o f c u b ef a c e s * I
s t a t i c s t r u c t FaceCubeFaces[]
~~Face1.4.151.~Face2.4.143.
~Face3.4.12~.{Face4.4,11~,~Face5,4,10~,~Face6,4.91~:
I* m a s t e r d e s c r i p t i o n f o r c u b e
*I
staticstructObject
Cube
{sizeof(CubeVerts)/sizeof(struct P o i n t 3 ) .
CubeVerts.XformedCubeVerts.ProjectedCubeVerts.ScreenCubeVerts.
sizeof(CubeFaces)/sizeof(struct Face).CubeFaces);
v o i dm a i n 0 I
i n t Done
0. RecalcXform
d o u b l eW o r k i n g X f o r m [ 4 1 [ 4 ] :
u n i o n REGS r e g s e t :
- 1:
I* S e t u p t h e i n i t i a l t r a n s f o r m a t i o n
*I
Set320x240ModeO: / * s e t t h e s c r e e n t o
Mode X */
ShowPage(PageStartOffsetsCDisp1ayedPage
01):
958
Chapter 51
is
/*
do
Keep t r a n s f o r m i n gt h ec u b e ,d r a w i n g
i t t ot h eu n d i s p l a y e dp a g e .
and f l i p p i n g t h e page t o show i t * I
/*
R e g e n e r a t et h eo b j e c t - > v i e wt r a n s f o r m a t i o na n d
r e t r a n s f o r m l p r o j e c t i f necessary * I
i f (RecalcXform) I
ConcatXforms(Wor1dViewXform. CubeWorldXform.WorkingXform);
/ * T r a n s f o r ma n dp r o j e c ta l lt h ev e r t i c e si nt h ec u b e
*I
X f o r m A n d P r o j e c t P o i n t s ( W o r k i n g X f o r m , &Cube);
0;
RecalcXform
CurrentPageBase I
/ * s e l e c ot t h e pr a g ef o dr r a w i n gt o
*I
PageStartOffsetsCNonDisplayedPage
D i s p l a y e d P a g e A 11;
I* C l e a r t h e p o r t i o n o f t h e n o n - d i s p l a y e d p a g e t h a t
was drawn
t ol a s tt i m e ,t h e nr e s e tt h ee r a s ee x t e n t
*/
FillRectangleX(EraseRect[NonDisplayedPagel.Left,
EraseRect[NonDisplayedPagel.Top.
EraseRect[NonDisplayedPagel.Right.
- -
EraseRect[NonDisplayedPagel.Bottom, CurrentPageBase. 0 ) ;
EraseRect[NonDisplayedPagel.Left
EraseRect[NonDisplayedPagel.Top
Ox7FFF;
EraseRect[NonDisplayedPagel.Right
EraseRect[NonDisplayedPagel.Bottom
I* Draw a l l v i s i b l e f a c e s
o f t h ec u b e
DrawVisibleFaces(&Cube);
/*
*I
- 0;
Fliptodisplaythe
page i n t o w h i c h we j u s t d r e w * I
ShowPage(PageStartOffsets[DisplayedPage
NonDisplayedPagel);
(
w h i l e( k b h i t 0 )
s w i t c h( g e t c h 0 )
t
case
OxlB:
I* Esc t oe x i t
*I
Done
1; b r e a k ;
case ' A * : c a s' ae' :
I* away ( - 2 ) * I
CubeWorldXform[2l[31 -- 3.0;RecalcXform
1; b r e a k ;
/ * t o w a r d s (+Z). D o n 'at l l o wt go etto o
*/
c a s' eT I :
case ' t ' :
I* c l o s e , s o Z c l i p p i n igs n ' t
needed * /
i f (CubeWorldXform[21[31 < -40.0) I
CubeWorldXform[21[31 +- 3.0:
RecalcXform
1;
break;
case ' 4 ' :
/ * r o t ac tl eo c k w iasreo u n d
Y *I
AppendRotationY(CubeWor1dXform. -ROTATION);
RecalcXform-1;break;
case '6':
I* r o t actoeu n t e r c l o c k w iasreo u n d
Y *I
AppendRotationY(CubeWor1dXform. ROTATION);
RecalcXform-1;break:
case '8':
I* r o t ac tl eo c k w iasreo u n d
X */
AppendRotationX(CubeWor1dXform. -ROTATION);
RecalcXform-1;break;
case '2':
I* r o t actoeu n t e r c l o c k w iasreo u n d
X *I
AppendRotationX(CubeWor1dXform. ROTATION);
RecalcXform-1;break;
case 0:
I* extended
code
*I
I
s w i t c h( g e t c h 0 )
case Ox3B: I* r o t a t ec o u n t e r c l o c k w i s ea r o u n d
Z *I
AppendRotationZ(CubeWor1dXform. ROTATION);
RecalcXform-1;break;
case Ox3C: I* r o t a t ec l o c k w i s ea r o u n d
Z *I
AppendRotationZ(CubeWor1dXform. -ROTATION);
RecalcXform-1;break;
SneakersinSpace
959
case Ox4B: I* l e f t ( - X ) * I
CubeWorldXform[OIC31
3.0:
case
0x40:
I* r i g h t ( + X ) * I
CubeWorldXformCO1[31 +- 3.0;
case
0x48:
I* up ( + Y ) */
CubeWorldXform[11[31 +- 3.0;
case
0x50:
I* down ( - Y ) * I
CubeWorldXformCllC31 -- 3.0:
default:
break:
--
RecalcXform-1;
break;
RecalcXform-1:
break:
RecalcXform-1:
break:
RecalcXform-1:
break:
break:
default:
I* any o t hkepetroay u s e
g e t c h 0 :b r e a k :
*I
I w h i l e( ! D o n e ) ;
I* R e t u r n t o t e x t
mode and e x i t * I
regset.x.ax
0x0003;
/ * AL 3 s e l e c t s8 0 x 2 5t e x t
i n t 8 6 ( 0 x 1 0 .& r e g s e t ,& r e g s e t ) :
mode * I
LISTING 5 1.2
15 1-2.C
I* T r a n s f o r m s all v e r t i c e s i n t h e s p e c i f i e d o b j e c t i n t o v i e w s p a c e . t h e n
p e r s p e c t i v ep r o j e c t st h e mt os c r e e ns p a c ea n d
s t o r i n gt h er e s u l t si nt h eo b j e c t .
*/
# i n c l u d e < m a t h . h>
Ikincl ude "polygon. h"1
maps them t os c r e e nc o o r d i n a t e s ,
v o i d X f o r m A n d P r o j e c t P o i n t s ( d o u b 1 e Xform[4][4].
s t r u c t O b j e c t * ObjectToXform)
i n t i, NumPoints
ObjectToXform->NumVerts:
structPoint3 * Points
ObjectToXform->VertexList;
s t r u c t P o i n t 3 * XformedPoints
ObjectToXform->XformedVertexList:
structPoint3 * ProjectedPoints
ObjectToXform->ProjectedVertexList:
s t r u c t P o i n t * Screenpoints
ObjectToXform->ScreenVertexList;
- -
f o (r i - 0 i: < N u m P o i n t s ;
i++.
P o i n t s + +X. f o r m e d P o i n t s t t .
P r o j e c t e d P o i n t s + +S, c r e e n P o i n t s + + )
{
I* T r a n s f o r mt ov i e ws p a c e
*I
X f o r m V e c ( X f o r m (. d o u b l e* ) P o i n t s (, d o u b l e* ) X f o r m e d P o i n t s ) :
I* P e r s p e c t i v e - p r o j e c tt os c r e e ns p a c e
*I
ProjectedPoints->X
XformedPoints->X I X f o r m e d P o i n t s - > Z *
PROJECTION-RATIO * (SCREENLWIDTH / 2 . 0 ) :
ProjectedPoints->Y
XformedPoints->Y I X f o r m e d P o i n t s - > Z *
PROJECTION-RATIO * (SCREEN-WIDTH I 2.0);
ProjectedPoints->Z
XformedPoints->Z;
I* C o n v e r tt os c r e e nc o o r d i n a t e s .
The Y c o o r di sn e g a t e dt o
f l i pf r o mi n c r e a s i n g
Y b e i n gu pt oi n c r e a s i n g
Y b e i n g down,
a se x p e c t e db yt h ep o l y g o nf i l l e r .
Add i n h a l f t h e s c r e e n
w i d t h and h e i g h t t o c e n t e r
on t h es c r e e n . * I
ScreenPoints->X
( ( i n t ) floor(ProjectedPoints->X + 0 . 5 ) ) + SCREENLWIOTH/2:
ScreenPoints->Y
( - ( ( i n t ) floor(ProjectedPoints->Y + 0.5))) +
SCREEN-HEIGHTIE:
--
960
Chapter 51
I* Draws a l l v i s i b l e f a c e s ( f a c e s p o i n t i n g t o w a r d t h e v i e w e r ) i n t h e s p e c i f i e d
so
o b j e c t .T h eo b j e c tm u s th a v ep r e v i o u s l yb e e nt r a n s f o r m e da n dp r o j e c t e d .
t h a tt h eS c r e e n V e r t e x L i s ta r r a yi sf i l l e di n .
*/
#i
ncl ude "polygon. h"
v o i dD r a w V i s i b l e F a c e s ( s t r u c tO b j e c t
ObjectToXform)
i n t i. j . NumFaces
ObjectToXform->NumFaces, NumVertices;
i n t * VertNumsPtr:
ObjectToXform->FaceList:
s t r u c t Face * F a c e P t r
s t r u c t P o i n t * ScreenPoints
ObjectToXform->ScreenVertexList;
l o n gv l . v 2 . w l , w 2 :
s t r u c t P o i n t VerticesCMAX-POLYLLENGTHI:
s t r u c tP o i n t L i s t H e a d e rP o l y g o n :
/ * Draweach v i s i b l e f a c e ( p o l y g o n ) o f t h e o b j e c t i n t u r n
*/
f o(ri - 0i;< N u m F a c e s ;
i++.
FacePtr++) I
NumVertices
FacePtr->NumVerts:
/ * C o p yo v e rt h ef a c e ' sv e r t i c e sf r o mt h ev e r t e xl i s t
*I
f o r ( j - 0 , VertNumsPtr-FacePtr->VertNums: j<NumVertices:j++)
ScreenPoints[*VertNumsPtr++l:
VerticesCjl
/ * Draw o n l y i f o u t s i d ef a c es h o w i n g
( i f t h en o r m a lt ot h e
p o l y g o np o i n t st o w a r dt h ev i e w e r :t h a ti s ,h a s
a positive
2 component) * I
vl
V e r t i c e s C 1 l . X - VerticesCO1.X:
wl
V e r t i c e s C N u m V e r t i c e s - l 1 . X - VerticesCO1.X:
V e r t i c e s C 1 l . Y - VerticesCO1.Y;
v2
w2
V e r t i c e s C N u m V e r t i c e s - l 1 . Y - VerticesCO1.Y:
i f ( ( v l * w 2 - v2*wl) > 0 ) I
/ * It i sf a c i n gt h es c r e e n ,s od r a w
*/
I* A p p r o p r i a t e l y a d j u s t t h e e x t e n t o f t h e r e c t a n g l e u s e d t o
e r a s et h i sp a g el a t e r
*/
f o r( j - 0 :j < N u m V e r t i c e s :
j++){
i f ( V e r t i c e s C j 1 . X > EraseRectCNonDisplayedPagel.Right)
i f ( V e r t i c e s C j 1 . X < SCREEN-WIDTH)
EraseRect[NonDisplayedPagel.Right
VerticesCj1.X:
SCREENLWIDTH:
e l s e EraseRectCNonDisp1ayedPagel.Right
if ( V e r t i c e s C j 1 . Y > E r a s e R e c t [ N o n D i s p l a y e d P a g e l . B o t t o m )
i f ( V e r t i c e s C j 1 . Y < SCREENKHEIGHT)
VerticesCj1.Y:
EraseRectCNonDisplayedPagel.Bottom
e l s e EraseRect[NonDi~playedPage].Bottom-SCREEN~HEIGHT:
if ( V e r t i c e s C j 1 . X < EraseRect[NonDisplayedPagel.Left)
i f (VerticesCj1.X > 0 )
EraseRectCNonDisplayedPage3.Left
VerticesCj1.X;
e l s e E r a s e R e c t C N o n D i s p l a y e d P a g e 1 . L e f t = 0:
i f ( V e r t i c e s C j 1 . Y < EraseRectCNonDisplayedPagel.Top)
i f (VerticesCj1.Y > 0)
EraseRect[NonDisplayedPagel.Top
VerticesCj1.Y:
elseEraseRectCNonDisp1ayedPagel.Top
0:
--
--
/* Draw t h ep o l y g o n * /
DRAW-POLYGON(Vertices. N u m V e r t i c e s F. a c e P t r - > C o l o r .
0 .0 ) ;
The sample program, as shownin Figure 51.3, places a cube, floating in three-space,
under the complete control of the user. The arrow keys may be used to move the
SneakersinSpace
96 1
cube left, right, up, and down, and the A and T keys may be used to move the cube
away from or toward the viewer. The F1 and F2 keys perform rotation around the Z
axis, the axis running from the viewer straight into the screen. The 4 and 6 keys
perform rotation around theY (vertical) axis, and the2 and 8 keys perform rotation
around theX axis, whichruns horizontally across the screen; the latter fourkeys are
most conveniently used by flipping the keypad to the numeric state.
The demoinvolves six polygons,one for each side of the cube. Each of the polygons
must be transformed and projected, so it would seem that 24 vertices (four for each
polygon) must be handled, butsome steps have been taken to improve performance.
All vertices for theobject have been stored in single
a
list; the definition of each face
contains not thevertices for thatface themselves, but ratherindexes into theobjects
vertex list,as shown in Figure 51.4. This reduces the number of vertices to bemanipulated from 24 to 8, for there are, after all, only eight vertices in a cube, with three
faces sharing eachvertex. In this way, the transformation burden is lightened by twothirds. Also, as mentioned earlier, backface removal is performed with integers, in
screen coordinates, rather than with floating-point values in screen space. Finally,
the RecalcXForm flag is set whenever the user changes theobject-to-world transformation. Only when this flag set
is is the full object-to-viewtransformation recalculated
and theobjects vertices transformedand projected again; otherwise,the values already
stored within the object are reused. In thesample application, this brings no visual
improvement, because theres only the oneobject, but theunderlying mechanism is
962
Chapter 51
41
rl
Jums
t,J
15,-15,-15, 1
-15, 15, 15, 1
sound: Ina full-blown 3-D animation application, with multiple objects moving about
the screen, it would help a great deal to flag which of the objects had moved with
respect to the viewer, performing a new transformation and projection only for those
that had.
on
With the above optimizations,the sample program is certainly adequately responsive
a 20 MHz 386 (sans 387; Im sure its wonderfully responsive with
a math coprocessor).
Still, it couldnt quite keep
up with the keyboard when I modified it to read only one
key each time through the loop-and were talking about only eight vertices here.
This indicates that were already near the limit of animation complexity possible
with our current approach. Its time to start rethinking that approach; over twothirds of the overall time is spent in floating-point calculations, and its there that
well begin to attack the performance bottleneck we find ourselves up against.
SneakersinSpace
963
Incremental Transformation
Listing 51.4contains three functions;each concatenates an additional rotation around
one of the three axes to an existing rotation. To improve performance, only the
matrix entries thatare affected in a rotation
around each particularaxis are recalculated (all but four of the entries in asingle-axis rotation matrix are either0 or 1, as
64
shown in Chapter50). This cuts the number
of floating-point multiplies from the
required for the multiplication of two 4x4 matrices to just 12, and floating point
adds from48 to 6.
Be aware that Listing 51.4 performs an incremental rotation on top of whatever
rotation is already in thematrix. The cube may already have been turned left, right,
up, down, and sideways; regardless, Listing 51.4just tacks the specified rotation onto
whatever already exists. In this way, the object-to-world transformation matrix contains a history of all the rotations ever specified by the user, concatenated one after
another onto theoriginal matrix. Potential loss of precision is a problem associated
with using such an approach to represent very
a long concatenation of transformations, especially withfixed-point arithmetic; that's
not aproblem forus yet,but we'll
run intoit eventually.
LISTING 5 1.4
15 1-4.C
I* R o u t i n e st op e r f o r mi n c r e m e n t a lr o t a t i o n sa r o u n dt h et h r e ea x e s
#include<math.h>
#include "polygon. h"
*I
I* C o n c a t e n a t ear o t a t i o nb yA n g l ea r o u n dt h e
X a x i st ot h et r a n s f o r m a t i o ni n
*I
X f o r m T o C h a n g e ,p l a c i n gr e s u l tb a c ki nX f o r m T o C h a n g e .
voidAppendRotationX(doub1eXformToChange[41[41.doubleAngle)
{
doubleTemplO.Templl,Templ2,
Temp2O. TempEl.Temp22:
cos(Ang1e).SinTemp
sin(Ang1e);
d o u b l e CosTemp
I* C a l c u l a t e t h e new v a l u e s o f t h e f o u r a f f e c t e d m a t r i x e n t r i e s
*/
TemplO
CosTemp*XformToChange[l][Ol+
-SinTemp*XformToChange[2][01:
Templl
CosTemp*XformToChangeCll[ll+ -SinTemp*XformToChange[21[1]:
Temp12
CosTemp*XformToChangeCll[21+ -SinTemp*XformToChange[21[2]:
Temp20
SinTemp*XformToChange[ll[Ol+ CosTemp*XformToChange~21~01:
Temp21
SinTemp*XformToChange[ll[ll+
CosTemp*XformToChange~21~11:
Temp22
SinTemp*XformToChange[ll[21+ CosTemp*XformToChange~21[21;
I* P u t t h e r e s u l t s b a c k i n t o
XformToChange * I
XformToChange[l][O]
TemplO:XformToChange~11~11
Templl:
Templ2:XformToChange[21[01
Temp2O:
XformToChange[l][2]
XformToChange[21[11
Temp21; XformToChange[21C23
Temp22:
---
--
--
I* C o n c a t e n a t ear o t a t i o nb yA n g l ea r o u n dt h e
Y a x i st ot h et r a n s f o r m a t i o n
X f o r m T o C h a n g e .p l a c i n gr e s u l tb a c ki nX f o r m T o C h a n g e .
*I
voidAppendRotationY(doub1eXformToChange[4][4].doubleAngle)
d o u b l e TempOO. TempOl.Temp02.
Temp2O. Tempel.Temp22:
cos(Ang1e).SinTemp
sin(Ang1e):
d o u b l e CosTemp
964
Chapter 51
in
/ * C a l c u l a t e t h e new v a l u e s o f t h e f o u r a f f e c t e d m a t r i x e n t r i e s
*/
TempOO = CosTemp*XformToChange[Ol[Ol+
SinTemp*XformToChange[21[01:
TempOl
CosTemp*XformToChange[Ol[ll+ SinTemp*XformToChange[2][11:
Temp02
CosTemp*XformToChange[OI[21+SinTemp*XformToChangeC21C21;
Temp20
-SinTemp*XformToChange[Ol[Ol+ CosTemp*XformToChange[Zl[Ol;
Temp21 = -SinTemp*XformToChange[Ol[ll+ CosTemp*XformToChange~21~11:
Temp22
-SinTemp*XformToChange[OI[21+ C o s T e m p * X f o r m T o C h a n g e ~ 2 1 ~ 2 1 :
/ * P u tt h er e s u l t sb a c ki n t oX f o r m T o C h a n g e
*I
XformToChange[O1[0]
TempOO; XformToChange[0][11
TempOl:
XformToChange[Ol[2l
TempO2: XformToChange[2lC01
TempEO:
XformToChange[2][11
Tempel:XformToChange[21[2]
Temp22:
--
I* C o n c a t e n a t ear o t a t i o nb vA n c l l ea r o u n dt h e
2 axistothetransformationin
X f o r m T o C h a n g e .p l a c i n gr e s u l ib a c ki nX f o r m T o C h a n g e .
*/
voidAppendRotationZ(doub1eXformToChange[41C41,doubleAngle)
d o u b l e TempOO. TempOl, TempO2. TemplO.Templl.TemplE:
d o u b l e CosTemp
cos(Ang1e).SinTemp
sin(Ang1e):
/ * C a l c u l a t e t h e new v a l u e s o f t h e f o u r a f f e c t e d m a t r i x e n t r i e s
*/
TempOO
CosTemp*XformToChange[Ol[Ol+
-SinTemp*XformToChange~ll~Ol:
TempOl
CosTemp*XformToChange[Ol[ll+ -SinTemp*XformToChange~ll~ll:
Temp02
CosTemp*XformToChange[Ol[Zl+ -SinTemp*XformToChange[l1[21;
TemplO
SinTemp*XformToChange[Ol[Ol+ CosTemp*XformToChange[ll[Ol:
Templl
SinTemp*XformToChange[Ol[ll+ CosTemp*XformToChangeC1IC13:
Temp12
SinTemp*XformToChange[Ol[Zl+ C o s T e m p * X f o r m T o C h a n g e ~ l 1 ~ 2 1 :
/ * P u tt h er e s u l t sb a c ki n t oX f o r m T o C h a n g e
*/
XformToChange[0][01
TempOO: XformToChangeCO1~11 TempOl;
XformToChange[Ol[Z]
Temp02: XformToChange~11C01 TemplO:
XformToChange[l][ll
Templl:XformToChange[1][21
TemplZ;
---
--
--
..
POLYG0N.H: Header f i l e f o r p o l y g o n - f i l l i n g c o d e , a l s o i n c l u d e s
anumber
of
u s e f u li t e m sf o r
3D a n i m a t i o n . * /
# d e f i n e MAX-POLY-LENGTH 4 / * f o u r v e r t i c e s i s t h e
max p e rp o l y */
# d e f i n e SCREEN-WIDTH 320
# d e f i n e SCREEN-HEIGHT 240
# d e f i n e PAGEO-START-OFFSET 0
# d e f i n e PAGE1-START-OFFSET (((long)SCREEN-HEIGHT*SCREENKWIDTH)/4)
/ * R a t i o :d i s t a n c ef r o mv i e w p o i n tt op r o j e c t i o np l a n e
/ w i d t ho fp r o j e c t i o n
p l a n e .D e f i n e st h ew i d t ho ft h ef i e l do fv i e w .L o w e ra b s o l u t ev a l u e s
wider
f i e l d so fv i e w :h i g h e rv a l u e s
narrower. */
# d e f i n e PROJECTION-RATIO
-2.0 /* n e g a t i v eb e c a u s ev i s i b l e
Z c o o r d i n a t e sa r en e g a t i v e
/ * Draws t h e p o l y g o n d e s c r i b e d b y t h e p o i n t l i s t P o i n t L i s t i n c o l o r C o l o r w i t h
a l lv e r t i c e so f f s e tb y
( X . Y ) */
# d e f i n e DRAW-POLYGON(PointList.NumPoints.Co1or.X.Y)
\
Polygon.Length
N u m P o i n t s :P o l y g o n . P o i n t P t r
PointList: \
FillConvexPolygon(&Polygon. C o l o r , X . Y ) ;
/ * D e s c r i b e sas i n g l e
2D p o i n t * /
s t r u c tP o i n t I
i n t X;
I* X c o o r d i n a t e * /
i n t Y:
I* Y c o o r d i n a t e */
1:
/*
*I
D e s c r i b e sas i n g l e
3D p o i n t i n homogeneous c o o r d i n a t e s
s t r u c tP o i n t 3 {
double X:
/ * X c o o r d i n a t e */
double Y:
/* Y coordinate */
d o u b l e Z:
/ * 2 c o o r d i n a t e */
d o u b l e W:
*/
3:
Sneakers in Space
965
I* D e s c r i b e s a s e r i e s o f p o i n t s ( u s e d t o s t o r e
a listofverticesthat
d e s c r i b e a p o l y g o n :e a c hv e r t e xi s
assumed t o c o n n e c t t o t h e t w o a d j a c e n t
v e r t i c e s , and t h e l a s t v e r t e x i s
assumed t o c o n n e c t t o t h e f i r s t )
*I
s t r u c tP o i n t L i s t H e a d e r
{
L e ni ngtt h :
I*p #o i on ft s
*I
s t r u c tP o i n t * P o i n t P t r :
I* p o i n t e r t o l i s t
o f points *I
I:
I* D e s c r i b e sb e g i n n i n ga n de n d i n g
X c o o r d i n a t e so f a s i n g l e h o r i z o n t a l l i n e
s t r u c tH L i n e {
i n t X S t a r t : I* X c o o r d i n a t e o f l e f t m o s t p i x e l i n l i n e
*I
i n t XEnd:
I* X c o o r d i n a t eo fr i g h t m o s tp i x e li nl i n e
*I
*I
I:
I* D e s c r i b e s a L e n g t h - l o n gs e r i e so fh o r i z o n t a ll i n e s ,a l l
assumed t o beon
c o n t i g u o u ss c a nl i n e ss t a r t i n ga tY S t a r t
a n dp r o c e e d i n gd o w n w a r d( d e s c r i b e s
a s c a n - c o n v e r t e dp o l y g o nt ol o w - l e v e lh a r d w a r e - d e p e n d e n td r a w i n gc o d e )
*I
s t r u c tH L i n e L i s t
i n tL e n g t h :
I* i o f h o r i z o n t a l l i n e s
*I
intYStart:
I* Y c o o r d i n a t e o f t o p m o s t l i n e
*I
I* p o i n t e r t o l i s t o f h o r z l i n e s
*I
s t r u c tH L i n e * H L i n e P t r :
I:
s t r u c tR e c t
{ i n tL e f t ,
I* S t r u c t u r e d e s c r i b i n g
s t r u c t Face I
i n t * VertNums:
I*
i n t NumVerts;
I*
intColor:
I*
I:
T o pR
, i g h tB
, ottom:
l:
o n ef a c eo fa no b j e c t( o n ep o l y g o n )
pointertovertexptrs
# ofvertices
p o l y g o nc o l o r
*I
*I
*I
*I
I* S t r u c t u r e d e s c r i b i n g
an o b j e c t * I
s t r u c tO b j e c t
{
i n t NumVerts:
s t r u c tP o i n t 3 * V e r t e x L i s t :
s t r u c tP o i n t 3 * XformedVertexList:
struct'Point3 * ProjectedVertexList:
s t r u c tP o i n t * S c r e e n V e r t e x L i s t ;
i n t NumFaces:
s t r u c t Face * F a c e L i s t ;
1:
* SourceVec.double * OestVec):
e x t e r nv o i dX f o r m V e c ( d o u b 1 eX f o r m C 4 1 C 4 1 .d o u b l e
e x t e r nv o i dC o n c a t X f o r m s ( d o u b 1 eS o u r c e X f o r m l [ 4 ] [ 4 1 .
doubleSourceXform2C41C41.doubleDestXformC4lC41):
e x t e r n v o i d XformAndProjectPoly(doub1e XformC4lC41.
s t r u c tP o i n t 3 * P o l y ,i n tP o l y L e n g t h .i n tC o l o r ) :
e x t e r n i n t FillConvexPolygon(struct P o i n t L i s t H e a d e r *, i n t . i n t . i n t ) :
e x t e r nv o i dS e t 3 2 0 ~ 2 4 0 M o d e ( v o i d ) :
e x t e r nv o i dS h o w P a g e ( u n s i g n e di n tS t a r t o f f s e t ) :
e x t e r nv o i dF i l l R e c t a n g l e X ( i n t
S t a r t X . i n t S t a r t Y . i n t EndX.
i n t EndY. u n s i g n e di n t PageBase, i n t C o l o r ) :
* ObjectToXform):
e x t e r n v o i d X f o r m A n d P r o j e c t P o i n t s ( d o u b 1 e X f o r m [ 4 1 [ 4 l . s t r u c tO b j e c t
e x t e r nv o i dD r a w V i s i b l e F a c e s ( s t r u c tO b j e c t
* ObjectToXform):
e x t e r nv o i dA p p e n d R o t a t i o n X ( d o u b 1 e
XformToChangeC41C41. d o u b l eA n g l e ) :
e x t e r nv o i dA p p e n d R o t a t i o n Y ( d o u b 1 e
XformToChangeC41C41. d o u b l eA n g l e ) ;
e x t e r nv o i dA p p e n d R o t a t i o n Z ( d o u b 1 e
XformToChangeC41C41. d o u b l eA n g l e ) :
e x t e r ni n tD i s p l a y e d P a g e .N o n D i s p l a y e d P a g e :
e x t e r ns t r u c tR e c tE r a s e R e c t C l :
Previous
Next
Home
and using the floor() function. For positive values, the two approaches are equivalent; fornegative values, onlythe floor() approach works properly.
Object Representation
Each object consists of a list of vertices and a list of faces, withthe vertices of each
face defined by pointers into the
vertex list; this allows each vertex to be transformed
exactly once, even though several faces may share a single vertex. Each object contains the vertices not only in their original, untransformed state, but in three other
forms as well:transformed to view space, transformed and projected to screen space,
and converted to screen coordinates. Earlier, wesaw that it can be convenient to
store the screen coordinateswithin the object, so that if the object hasnt moved with
respect to the viewer, it can be redrawn without the need forrecalculation, but why
bother storing theview and screen space forms of the vertices as well?
The screen space vertices are useful for some sorts of hidden surface removal. For
example, to determine whethertwo polygons overlap as seen by the viewer, youmust
first know how they lookthe
toviewer, accounting for perspective; screen space provides
that information. (So do the final screen coordinates,
but with less accuracy,and without
any Z information.) The view space vertices are useful for collision and proximity
detection; screen space cant be used here, because objects are distortedby the perspective projection into screen space. World space would serve aswell as view space
for collision detection, but because its possible to transform directly from object
space to view space with a single matrix, its often preferable to skip over world
space.
Its not mandatory thatvertices be stored forall these different spaces, but thecoordinates in all those spaces have to be calculated as intermediate steps anyway, so we
might as well keep them around forthose occasions when theyre needed.
Sneakers in Space
967
Previous
CHAPTER 52
FAST 3-D ANIMATION: MEET X-SHARP
Home
Next
97 1
Never mind what it was like tryingto herd that group around a gorge that
looks like
it was designed toswallow children and small animals without a trace. The hard part
was getting back to the docks and finding theyd have to wait an hour for the next
ferry. As my friend putit, Let me tell you,an houris an eternitywith65 sixth graders
screaming thesong You Are My Sunshine.
Apart from reminding you how lucky you are to be working in a quiet, air-conditioned room, in front of a gently humming computer, freeto think deep thoughts
and eatCheetos to your hearts content,this story provides a useful perspective on
the malleable nature of time. An hour isntjust anhour-it can be forever, or it can
be the wink ofan eye.Just thinkof the last hour you spent working under a deadline;
I bet went
it
past in aflash. Which isnot to say, mind you, that I recommend
working
in a bus full of screaming kids in order to make time pass more slowly; there are
quality issueshere as well.
In our 3-D animation work so far, weve used floating-point arithmetic. Floatingpoint arithmetic-even with a floating-point
processor but especially without one-is
the microcomputer animation equivalent ofworking school
in a bus: It takes forever
to do anything, and you just know youre never going to accomplish as much as you
want to. In this chapter, well address fixed-point arithmetic,which will give us an
instant order-of-magnitude performance boost. Well also give our 3-D animation
code a much morepowerful and extensible framework, making it easy to add new
and differentsorts of objects. Taken together, these alterations
will let us start to do
some really interesting real-time animation.
972 Chapter 52
LISTING 52.1
152- 1.C
/* 3-0 a n i m a t i o np r o g r a mt or o t a t e1 2c u b e s .
t e s t e dw i t hB o r l a n d
C++
Uses f i x e d p o i n t .
A
l C code
i n C c o m p i l a t i o n mode and t h es m a l lm o d e l .
*/
#i
n c l ude <coni 0. h>
#i
n c l ude <dos h>
#include "polygon. h"
/ * b a s eo f f s e to fp a g et ow h i c ht od r a w
u n s i g n e di n tC u r r e n t P a g e B a s e
0:
/ * c l i pr e c t a n g l e ;c l i p st ot h es c r e e n
i n t C l i p M i n X = 0. C l i p M i n Y
0;
i n t ClipMaxX = SCREEN-WIDTH. ClipMaxY
s t a t i cu n s i g n e di n tP a g e S t a r t O f f s e t s C 2 1
*/
*/
--
SCREEN-HEIGHT:
{PAGEOpSTART-OFFSET,PAGEl-STARTpOFFSET);
i n t OisplayedPage.NonOisplayedPage:
i n tR e c a l c A l l X f o r m s = 1. NumObjects
0;
Xform
WorldViewXform:
/ * i n i t i a l i z e df r o mf l o a t s
/ * p o i n t e r s t o o b j e c t s */
O b j e c t *ObjectList[MAX._OBJECTSI:
*/
v o i dm a i n 0 {
i n t Done = 0. i :
O b j e c t* O b j e c t P t r ;
u n i o n REGS r e g s e t ;
InitializeFixedPointO:
Initializecubeso:
Set320x240ModeO:
I* s e tu pf i x e d - p o i n td a t a
*/
/ * s e t up
cubes
and
add
them
t oo b j e c lt i s t o: t h e r
o b j e c t sw o u l db ei n i t i a l i z e d
now, i f t h e r ew e r ea n y
I* s et htsec r e et no
mode X * I
ShowPage(PageStartOffsetsCDisp1ayedPage
/*
*/
01):
Keep t r a n s f o r m i n gt h ec u b e ,d r a w i n g
i t t ot h eu n d i s p l a y e d
and f l i p p i n g t h e p a g e t o
show i t * /
do I
/* For e a c ho b j e c t ,r e g e n e r a t ev i e w i n gi n f o ,
i f necessary
f o (r i - 0i:< N u r n O b j e c t s :
i++)
[
i f ((ObjectPtr
ObjectListCi1)->RecalcXform I I
Recal cAl1 Xforrns) I
ObjectPtr->RecalcFunc(ObjectPtr):
ObjectPtr->RecalcXform
--
0;
RecalcAllXforms
0:
CurrentPageBase
/ * s e l e c ot t h e rp a g ef o rd r a w i n gt o
*/
PageStartOffsetsCNonDisplayedPage
DisplayedPage * 11:
/ * F o re a c ho b j e c t .c l e a rt h ep o r t i o no ft h en o n - d i s p l a y e dp a g e
t h a t was drawn t o l a s t t i m e , t h e n r e s e t t h e e r a s e e x t e n t
*/
i++)
[
f o (r i - 0 i:< N u r n O b j e c t s ;
ObjectPtr = ObjectList[il;
FillRectangleX~ObjectPtr->EraseRect[NonDisplayedPagel.Left,
ObjectPtr->EraseRect[NonDisplayedPagel.Top,
ObjectPtr->EraseRect[NonDisplayedPagel.Right,
ObjectPtr->EraseRect[NonDisplayedPagel.Bottom,
- ObjectPtr->EraseRect[NonDisplayedPagel.Bottom -
CurrentPageBase. 0 ) ;
ObjectPtr->EraseRectCNonDisplayedPage].Left
ObjectPtr->EraseRect[NonDisplayedPage].Top
ObjectPtr->EraseRect[NonDisplayedPagel.Right
Ox7FFF;
0:
973
/*
Draw a l l o b j e c t s */
f o r( i - 0 i: < N u m O b j e c t s :
i++)
ObjectListCil->DrawFunc(ObjectListCil);
/* F l i pt od i s p l a yt h ep a g ei n t ow h i c h
we j u s t drew * I
ShowPage(PageStartOffsets1DisplayedPage
NonDisplayedPage]):
/ * Move a n d r e o r i e n t e a c h o b j e c t
*/
f o r ( i - 0 : i<NumObjects; i++)
ObjectListCil->MoveFunc(0bjectListCi3);
if (kbhit0)
i f (getch0
OxlB) Done
1:
/ * Esc t o e x i t */
1 w h i l e( ! D o n e ) ;
/* R e t u r n t o t e x t mode and e x i t * /
regset.x.ax
0x0003;
/ * AL 3 s e l e c t s8 0 x 2 5t e x t
mode
i n t 8 6 ( 0 x 1 0 .& r e g s e t .& r e g s e t ) ;
exit(1):
*/
T r a n s f o r m sa l lv e r t i c e si nt h es p e c i f i e dp o l y g o n - b a s e do b j e c ti n t ov i e w
s p a c e ,t h e np e r s p e c t i v ep r o j e c t st h e mt os c r e e ns p a c ea n d
maps them t o s c r e e n
c o o r d i n a t e s ,s t o r i n gr e s u l t si nt h eo b j e c t .R e c a l c u l a t e so b j e c t - > v i e w
t r a n s f o r m a t i o nb e c a u s eo n l y
i f t r a n s f o r mc h a n g e sw o u l d
we b o t h e r
t or e t r a n s f o r mt h ev e r t i c e s .
*/
# i n c l u d e< m a t h . h >
# i n c l u d e" p o 1 y g o n . h "
v o i d XformAndProjectPObject(P0bject
-- -
ObjectToXform)
i n t i. NumPoints
ObjectToXform->NumVerts:
Point3 * Points
ObjectToXform->VertexList:
P o i n t 3 * XformedPoints
ObjectToXform->XformedVertexList:
Point3 * ProjectedPoints
ObjectToXform->ProjectedVertexList:
Point * Screenpoints
ObjectToXform-XcreenVertexList:
/ * R e c a l c u l a t et h eo b j e c t - > v i e wt r a n s f o r m
*/
ConcatXforms(Wor1dViewXform. ObjectToXform->XformToWorld.
ObjectToXform->XformToView):
/ * A p p l y t h a t new t r a n s f o r m a t i o n and p r o j e c t t h e p o i n t s */
f o r( i - 0 i: < N u m P o i n t s ;
i++.
Points++,XformedPoints++,
P r o j e c t e d P o i n t s + +S
. creenPointstt)
I
/* T r a n s f o r mt ov i e ws p a c e
*/
XformVec(0bjectToXform->XformToView. ( F i x e d p o i n t *) P o i n t s ,
( F i x e d p o i n t *) X f o r m e d P o i n t s ) :
/* P e r s p e c t i v e - p r o j e c tt os c r e e ns p a c e
*/
ProjectedPoints->X
FixedMul(FixedDiv(XformedPoints->X. X f o r m e d P o i n t s - > Z ) .
D O U B L E ~ T O ~ F I X E D ( P R O J E C T I O N ~ R A T I*O (SCREEN-WIDTH/Z)));
ProjectedPoints->Y
FixedMul(FixedDiv(XformedPoints->Y, X f o r m e d P o i n t s - > Z ) ,
O O U B L E ~ T O ~ F I X E O ( P R O J E C T I O N ~ R A T *I O (SCREEN_WIOTH/E))):
ProjectedPoints->Z
XformedPoints->Z:
/ * C o n v e r tt os c r e e nc o o r d i n a t e s .
The Y c o o r d i s n e g a t e d t o
f l i p from
i n c r e a s i n g Y b e i n gu pt oi n c r e a s i n g
Y b e i n g down, a se x p e c t e db yp o l y g o n
f i l l e r . Add i n h a l f t h e s c r e e n w i d t h a n d h e i g h t t o c e n t e r
onscreen.
ScreenPoints->X
( ( i n t )( ( P r o j e c t e d P o i n t s - > X +
DOUBLE-TO-FIXED(0.5))
>> 1 6 ) ) + SCREEN_WIDTH/2:
ScreenPoints->Y
( - ( ( i n t )( ( P r o j e c t e d P o i n t s - > Y +
>> 1 6 ) ) ) + SCREEN-HEIGHT/Z;
DOUBLE-TO-FIXED(0.5))
974
Chapter 52
*/
#include <math.h>
t i ncl ude "polygon.
h"
/ * Concatenate a rotation by Angle around the X axis to transformation in
--
--
--
1
/* Concatenate a rotation by Angle around the Y a x i s to t r a n s f o r m a t i o n i n
XformToChange. placing the result back into XformToChange. * /
void AppendRotationY(Xform XformToChange. double Angle)
{
--
--
--
975
/*
C o n c a t e n a t e a r o t a t i o n b yA n g l ea r o u n dt h e
2 a x i st ot r a n s f o r m a t i o ni n
X f o r m T o C h a n g e ,p l a c i n gt h er e s u l tb a c ki n t oX f o r m T o C h a n g e .
*/
voidAppendRotationZ(XformXformToChange.doubleAngle)
{
--
/* C a l c u l a t e t h e new v a l u e s o f t h e s i x a f f e c t e d m a t r i x e n t r i e s
TempOO
FixedMul(CosTemp.XformToChange[O][Ol)
+
FixedMul(-SinTemp,XformToChange[l][O]):
TempOl
FixedMul(CosTemp.XformToChange[O1[11)
+
FixedMul(-SinTempX
, formToChange~ll~11);
Temp02
FixedMul(CosTemp.XformToChange[O1[El)
+
FixedMul(-SinTemp.XformToChange[l][21);
TemplO
FixedMul(SinTemp.XformToChange[01~01)
+
FixedMul(CosTernp.XformToChange[ll[Ol):
Templl
FixedMul(SinTemp.XformToChange[O1[1])
+
FixedMul(CosTernp.XformToChangeC11[11):
Temp12
FixedMul(SinTemp.XformToChange[O1[El)
+
FixedMul(CosTemp.XforrnToChange[11[21):
/ * P u tt h er e s u l t sb a c ki n t o
XformToChange */
XformToChange[OI[O1
TempOO; XformToChange[0][11
TempOl:
XformToChange[O1CE]
TempO2: XformToChange[l][O]
TemplO:
XformToChange[ll[ll
T e m p l l ; XformToChange[l][E]
TernplZ;
--
--
LISTING52.4152-4.C
/*
F i x e dp o i n tm a t r i xa r i t h m e t i c
*/
functions.
*/
# i n c l u d e " p o l y g o n . h"
I* M a t r i xm u l t i p l i e sX f o r mb yS o u r c e V e c .
and s t o r e st h er e s u l ti nD e s t V e c .
M u l t i p l i e sa4 x 4m a t r i xt i m e sa4 x 1m a t r i x :t h er e s u l ti sa4 x 1m a t r i x .C h e a t s
b ya s s u m i n gt h e
W c o o r d i s 1 a n db o t t o mr o wo fm a t r i xi s
0 0 0 1. and d o e s n ' t
bothertosetthe
W c o o r d i n a t eo ft h ed e s t i n a t i o n .
*/
v o i dX f o r m V e c ( X f o r r nW o r k i n g X f o r r n .F i x e d p o i n t* S o u r c e V e c .
F i x e d p o i n t* D e s t V e c )
(
i n t i:
i++)
FixedMul(WorkingXform[ilCOI,SourceVecCOI)
FixedMul(WorkingXform[ilCll, S o u r c e V e c C l I ) +
F i x e d M u l ( W o r k i n g X f o r m [ i I ~ 2 1 ~ SourceVecCEl) +
WorkingXform[i][3];
/ * noneed t om u l t i p l y by W
f o r( i - 0 :i < 3 :
DestVecCil
/*
*/
M a t r i xm u l t i p l i e sS o u r c e X f o r m lb yS o u r c e X f o r m Ea n ds t o r e sr e s u l ti n
D e s t X f o r m .M u l t i p l i e sa4 x 4m a t r i xt i m e sa4 x 4m a t r i x ;r e s u l ti sa4 x 4m a t r i x .
C h e a t sb ya s s u m i n gb o t t o mr o wo fe a c hm a t r i xi s
0 0 0 1. a n dd o e s n ' tb o t h e r
t os e tt h eb o t t o mr o wo ft h ed e s t i n a t i o n .
*/
v o i dC o n c a t X f o r m s ( X f o r mS o u r c e X f o r m l .X f o r mS o u r c e X f o r m E ,
XformDestXform)
(
i n t i. j :
f o r( i - 0 :i < 3 ;
f o r (j-0:
976
Chapter 52
i++)
(
j < 4 ; j++)
DestXformCil[jl
SourceXform2[O][j])
+
S o u r c e X f o r m 2 ~ 1 1 ~ j+
l)
FixedMul(SourceXforml[ilC23. S o u r c e X f o r m 2 [ 2 l [ j l ) +
SourceXforml[il[31:
FixedMul(SourceXforml[il[Ol,
FixedMul(SourceXforml[il[ll,
LISTING 52.5152-5.C
/* S e t u p b a s i c d a t a t h a t n e e d s t o b e i n f i x e d p o i n t , t o a v o i d d a t a
*/
d e f i n i t i o nh a s s l e s .
#i
ncl ude "polygon. h"
/ * Al v e r t i c e s i n t h e b a s i c c u b e
*/
s t a t i c I n t P o i n t 3 IntCubeVertsCNUM-CUBE-VERTSI
/*
(15.15.15}.~15.15.-15~,~15,-15,15~,~15,-15,-153,
[ - 1 5 . 1 5 . 1 5 ] . ( - 1 5 , 1 5 , - 1 5 ~ ~ ~ ~ 1 5 , ~ 1 5 , 1 5 ~ . ~ ~ 1 5 ~1~; 1 5 , ~ 1 5 1
T r a n s f o r m a t i o nf r o mw o r l ds p a c ei n t ov i e ws p a c e
c u r r e n t l y ) */
s t a t i ci n tI n t W o r l d V i e w X f o r m [ 3 ] [ 4 1
(
t1,O.O.O).
to.1.0.01. t 0 . 0 . 1 . 0 ~ 1 :
(no t r a n s f o r m a t i o n ,
v o i dI n i t i a l i z e F i x e d P o i n t O
i n t i. j :
f o r( i - 0 :i < 3 :
i++)
f o r( j - 0 :j < 4 :
j++)
WorldViewXform[il[jl
INT~TO_FIXEO(IntWorldViewXform~il[jl):
f o r ( i - 0 : i<NUM-CUBE-VERTS:
i++)
I
CubeVerts[i].X
INT-TO-FIXED(IntCubeVertsCi1.X);
CubeVerts[i].Y
INT_TO_FIXED(IntCubeVerts[il.Y):
CubeVertsCil.2
INT-TO-FIXED(IntCubeVerts[il.Z):
LISTING 52.6152-6.C
/*
Rotatesandmoves
a p o l y g o n - b a s e do b j e c ta r o u n dt h et h r e ea x e s .
Movement i s i m p l e m e n t e do n l ya l o n gt h e
2 a x i sc u r r e n t l y .
*/
#i
ncl ude "polygon. h"
v o i dR o t a t e A n d M o v e P O b j e c t ( P 0 b j e c t
(
ObjectToMove)
--
i f (--0bjectToMove->RDelayCount
0) (
/* r o t a t e */
ObjectToMove->RDelayCount ObjectToMove->RDelayCountBase:
i f (ObjectToMove->Rotate.RotateX !- 0 . 0 )
AppendRotationX(0bjectToMove->XformToWorld,
ObjectToMove->Rotate.RotateX):
i f (ObjectToMove->Rotate.RotateY !- 0 . 0 )
AppendRotationY(0bjectToMove->XformToWorld,
ObjectToMove->Rotate.RotateY):
i f ( O b j e c t T o M o v e - > R o t a t e . R o t a t e Z !- 0.0)
AppendRotationZ(0bjectToMove->XformToWorld,
ObjectToMove->Rotate.RotateZ):
ObjectToMove->RecalcXform
1:
977
--
I* Move i n Z, c h e c k i n gf o rb o u n c i n ga n ds t o p p i n g
*I
i f (--0bjectToMove->MDelayCount
0) {
ObjectToMove->MDelayCount
ObjectToMove->MDelayCountBase;
ObjectToMove->XformToWorld[21[31 +- ObjectToMove->Move.MoveZ;
i f ~ObjectToMove->XformToWorldC21C33>0bjectToMove->Move.MaxZ)
ObjectToMove->Move.MoveZ
0 ; I* s t o p i f c l o s ee n o u g h * I
ObjectToMove->RecalcXform
1:
--
LISTING 52.7
152-7.C
I* Draws a l lv i s i b l ef a c e si ns p e c i f i e dp o l y g o n - b a s e do b j e c t .O b j e c tm u s th a v e
p r e v i o u s l yb e e nt r a n s f o r m e da n dp r o j e c t e d ,
f i l l e d i n . *I
s o t h a tS c r e e n V e r t e x L i s ta r r a yi s
ObjectToXform)
i n t i . j . NumFaces
ObjectToXform->NumFaces. N u m V e r t i c e s ;
i n t * VertNumsPtr;
Face * F a c e P t r
ObjectToXform->FaceList:
P o i n t * Screenpoints
ObjectToXform->ScreenVertexList;
l o n gv l ,v 2 ,
w l . w2;
P o i n t VerticesCMAX-POLY-LENGTH];
P o i n t L i s t H e a d e rP o l y g o n ;
I* Draweach v i s i b l e f a c e ( p o l y g o n ) o f t h e o b j e c t i n t u r n
*I
f o (r i - 0i;< N u m F a c e s ;
i++.
FacePtr++) {
NumVertices
FacePtr->NumVerts;
/ * C o p yo v e rt h ef a c e ' sv e r t i c e sf r o mt h ev e r t e xl i s t
*I
f o r( j - 0 ,
V e r t N u m s P t r - F a c e P t r - > V e r t N u m s ; j < N u m V e r t i c e s ; j++)
VerticesCjl
ScreenPointsC*VertNumsPtr++l;
/ * Draw o n l y i f o u t s i d ef a c es h o w i n g
( i f t h en o r m a lt ot h e
p o l y g o np o i n t st o w a r dv i e w e r ;t h a ti s .h a s
a p o s i t i v e Z component) * I
vl
V e r t i c e s C 1 l . X - VerticesCO1.X:
wl
V e r t i c e s C N u m V e r t i c e s - 1 l . X - VerticesCO1.X:
v2
VerticesC11.Y - VerticesCO1.Y:
w2
VerticesCNumVertices-l1.Y - VerticesCO1.Y;
i f ((vl*w2 - v2*wl) > 0 ) (
I* It i s f a c i n g t h e s c r e e n ,
so draw * I
I* A p p r o p r i a t e l y a d j u s t t h e e x t e n t o f t h e r e c t a n g l e u s e d t o
e r a s et h i so b j e c tl a t e r
*/
f o r( j - 0 ;j < N u m V e r t i c e s ;
j++){
i f (VerticesCj1.X >
ObjectToXform->EraseRectCNonDisplayedPagel.Right~
i f ( V e r t i c e s C j 1 . X < SCREEN-WIDTH)
ObjectToXform->EraseRect[NonDisplayedPagel.Right
VerticesCj1.X;
e l s e ObjectToXform->EraseRect[NonDisplayedPagel.Right =
SCREEN-WIDTH;
i f (VerticesCj1.Y >
DbjectToXform->EraseRect[NonDisplayedPagel.Bottom~
i f ( V e r t i c e s C j 1 . Y < SCREEN-HEIGHT)
ObjectToXform->EraseRect[NonDisplayedPagel.Bottom
VerticesCj1.Y;
e l s e ObjectToXform->EraseRect[NonDisplayedPagel.BottomSCREEN-HEIGHT;
i f (VerticesCj1.X <
--
ObjectToXform->EraseRect[NonDisplayedPagel.Left)
978
Chapter 52
i f (VerticesCj1.X
>
0)
ObjectToXform->EraseRect[NonDisplayedPagel.Left
VerticesCj1.X;
e l s e O b j e c t T o X f o r m - > E r a s e R e c t [ N o n D i spl ayedPage1. Left-0:
i f (Verticesrj1.Y <
ObjectToXform->EraseRect[NonDisplayedPagel.Top)
i f (VerticesCj1.Y > 0)
ObjectToXform->EraseRectCNonDisplayedPagel.Top
VerticesCj1.Y:
e l s e ObjectToXform->EraseRect[NonDisplayedPagel.Top-O:
>
I
/ * Draw t h ep o l y g o n * /
DRAW-POLYGON(Vertices. N u m V e r t i c e sF. a c e P t r - > C o l o r .
LISTING 52.8152-8.C
/ * I n i t i a l i z e s t h e cubesandaddsthem
totheobjectlist.
0. 0 ) :
*/
#i
n c l u d e < s t d lib. h>
#i
ncl ude <math. h>
ti ncl ude "polygon. h"
#define
#define
#define
#define
ROT-6
("PI
/ 30.0)
ROT-3
("PI
/ 60.0)
ROT-2
("PI
/ 90.0)
NUM-CUBES 12
/*
/*
/*
/*
r o t a t e 6 d e g r e e sa t
r o t a t e 3 d e g r e e sa t
r o t a t e 2 d e g r e e sa t
d o f cubes * /
a time
a time
a time
*/
*/
*/
P o i n t 3 CubeVertsCNUM-CUBE-VERTSI:
/* s e te l s e w h e r e ,f r o mf l o a t s
*/
/ * v e r t e xi n d i c e sf o ri n d i v i d u a lc u b ef a c e s
*/
s t a t i c in tF a c e l [ ]
= (1.3.2.0}:
s t a t i c in t FaceZCl
[5.7.3.11;
s t a t i c in t Face3C1
14.5.1.01:
(3.7.6.21:
s t a t i c in t Face4[]
s t a t i c in t Face5C1
(5.4.6.71;
s t a t i c in t Face6CI = I 0 . 2 . 6 . 4 1 :
s t a t i c int*VertNumList[]-[Facel,FaceZ,Face3.Face4.Face5.Face61:
s t a t i c in tV e r t s I n F a c e [ ] - {
sizeof(Facel)/sizeof(int).
s i z e o f ( F a c e 2 ) / s i z e o f ( i n t ) . sizeof(Face3)/sizeof(int),
sizeof(Face4)/sizeof(int). sizeof(Face5)/sizeof(int).
sizeof(Face6)/sizeof(int) I:
/* X . Y . 2 r o t a t i o n s f o r cubes * I
s t a t i cR o t a t e c o n t r o l
InitialRotateCNUM-CUBES] = I
{O.O.ROT_6.ROT-6),
~ROT~3,0.O,ROT~3II [ROT_3.ROT_3.0.0},
(ROT-3, -ROT-3,0.01 .I-ROT_3.ROT-2,0.01, (-ROTL6.-ROT-3.0.01,
~ROT~3.0.0.-ROT~6~.~-ROT_2.0.O.0~R0T~3J,~-R0T~3,0.0,-R0T~31,
[O.O.ROT_2.-ROT~2~.(O.O,-ROT_3.ROT~3},~O.O,-ROT~6,-ROT~6~,}:
staticMoveControlInitialMove[NUM-CUBES]
I
[0,0.80.0.0.0,0,0,-350J,[0,0,80,0,0,0,0,0,-350J,
~0.0.B0.0.0.0,0.0.-3501,~0,0,80,0,0,0,0,0,-3501,
I0.0.80.0.0.0.0.0.-3501~~0,0~80.0.0.0.0.0,-3501,
~0.0.80.0.0,0.0.0.-350~,(0,0,80,0,0,0,0,0,-350),
~0,0.80.0.0.0.0.0,-3501,~0,0.80,0,0.0.0.0;3501,
/*
~0,0,80,0.0,0,0.0.-350~,~0,0,80.0.0.0.0.0,-3501, I :
f a c ec o l o r sf o rv a r i o u sc u b e s
*/
s t a t i c i n t Colors[NUM-CUBES][NUM_CVBE-FACESI
-I
~15.14.12.11.10.9~.I1,2,3,4,5,61,~35.37,39,41,43,45~,
(47.49,51.53.55.571.(59.61.63.65.67.691,(71,73,75,77,79,811,
I83,85.87.89.91,93~.~95.97.99,101,103.105J,
979
(107.109,111.113.115~1171,~119,121,123,125,127,1291,
/*
{131,133,135.137,139,1411,~143.145.147,149,151~1531I ;
s t a r t i n g c o o r d i n a t e s for cubes i n w o r l d s p a c e
*/
s t a t i c i n t CubeStartCoords[NUM-CUBESIC31
I
I 1 0 0 , - 7 0 . - 6 0 0 0 1 , (33.0.-6000].
[ 1 0 0 . 0 . - 6 0 0I0110.0 . 7 0 . ~ 6 0 0 0 1 .
I-33.0.-60001.
I-33,70.-60001,
133.70,-60001,
I33.-70.-60001.
I-100.-70.-60003):
(-100.70.-6000),
~-33.-70.-60001.~-100.0.-6000~,
/ * d e l a yc o u n t s( s p e e dc o n t r o l )f o rc u b e s
*/
s t a t i c i n t InitRDelayCountsCNUM_CVBESI
(1.2.1,2.1.1.1.1.1.2.1.1):
s t a t i c i n t BaseRDelayCountsCNUM_CUBESI
(1,2,1,2,2,1,1,1,2,2,2,11;
s t a t i c i n t InitMDelayCountsCNUM_CUBESl
{1,1,1,1,1,1,1,1,1,1,1,11;
s t a t i c i n t BaseMDelayCountsCNUM~CUBESl ~1.1.1.1.1.1.1.1,1.1,1,11;
--
v o i dI n i t i a l i z e C u b e s O
i n t i. j , k ;
PObject*Workingcube:
i<NUM-CUBES; i++)
(
f o r( i - 0 :
i f ((Workingcube
malloc(sizeof(P0bject)))
NULL)
p r i n t f ( " C o u 1 d n ' tg e tm e m o r y \ n " ) :e x i t ( 1 ) ;
I
Workingcube->DrawFunc
DrawPObject:
Workingcube->RecalcFunc = X f o r m A n d P r o j e c t P O b j e c t :
Workingcube->MoveFunc
RotateAndMovePObject;
Workingcube->RecalcXform
1:
f o r (k-0:
k<2:
k++) {
-Workingcube->EraseRect[kl.Bottom I
Workingcube->RDelayCount Workingcube->MDelayCount Workingcube->MDelayCountBase /*
Workingcube->EraseRect[kl.Left
Workingcube->EraseRect[kl.Top
Workingcube->EraseRect[kl.Right
Ox7FFF;
0;
0:
InitRDelayCountsCil:
Workingcube->RDelayCountBase
BaseRDelayCountsCil;
InitMDelayCountsCil:
BaseMDelayCounts[il;
S e tt h eo b j e c t - > w o r l dx f o r mt on o n e
*/
f o r( j - 0 ;j < 3 ;
j++)
f o (r k - 0 k; < 4 ;
k++)
Workingcube->XformToWorld[jl[kl
INT-TOpFIXED(0);
Workingcube->XformToWorld[Ol[Ol
/*
S e tt h ei n i t i a ll o c a t i o n
*/
f o r( j - 0 ;j < 3 ;
j++)WorkingCube->XformToWorldCjl[31
INT~TO~FIXED(CubeStartCoordsCi1Cjl):
Workingcube->NumVerts
NUM-CUBE-VERTS:
Workingcube->VertexList
CubeVerts:
Workingcube->NumFaces
NUM-CUBELFACES:
Workingcube->Rotate
InitialRotateCil:
Workingcube->Move.MoveX
INT-TO~FIXED(InitialMove[i].MoveX):
Workingcube->Move.MoveY
INT~TO~FIXED(InitialMoveCil.MoveY);
Workingcube->Move.MoveZ
INT~TO~FIXEO(InitialMove[il.MoveZ);
Workingcube->Move.MinX
INT~TO~FIXEO(InitialMove~il.MinX):
Workingcube->Move.MinY
INT_TO-FIXED(InitialMoveCil.MinY);
Workingcube->Move.MinZ
INT-TO-FIXED(InitialMoveCil.MinZ);
Workingcube->Move.MaxX
INT-TO-FIXED(InitialMove[il.MaxX);
Workingcube->Move.MaxY
INT-TO-FIXED(InitialMove[il.MaxY);
Workingcube->Move.MaxZ
INT_TO-FIXED(InitialMove[i].MaxZ);
----
980
Chapter 52
i f ((Workingcube->XformedVertexList
malloc(NUM-CUBE-VERTS*sizeof(Point3)))
malloc(NUM-CUBE-VERTS*sizeof(Point3)))
malloc(NUM_CUBE-VERTS*sizeof(Point)))
malloc(NUM-CUBE_FACES*sizeof(Face)))
Workingcube->FaceList[jl.VertNums
Workingcube->FaceListCjl.NumVerts
WorkingCube->FaceList[jl.Color
ObjectListCNumObjects++l
NULL) {
- NULL)
- NULL)
- NULL)
1
--
VertNumListCjl;
VertsInFaceCjl:
ColorsCil[j];
- (Object*)WorkingCube;
a n dd i v i d e .
; C n e a r - c a l l a b l ea s :F i x e d p o i n tF i x e d M u l ( F i x e d p 0 i n t
M 1 . F i x e d p o i n tM 2 ) ;
F i x e d p o i n tF i x e d D i v ( F i x e d p o i n tD i v i d e n d ,F i x e d p o i n tD i v i s o r ) ;
; T e s t e dw i t h
TASM
.model
small
.386
.code
pub1 ic -FixedMul .-Fi xedDi v
; M u l t i D l i e st w of i x e d - p o i n tv a l u e st o g e t h e r .
FMparms s t r u c
dw
2 d u p; r( e? t)audr dn r e s s
M1
dd
?
M2
dd
?
FMparms ends
2
a1 i g n
n e pa r o c
-FixedMul
push
bP
mov
bp.w
mov
eax.[bp+M11
imu1
dword p t r Cbp+M2] ; m u l t i p l y
add
e8a0x;0ra,0odbhudy2ni^dn(g- 1 6 )
adc
edx, 0
; w h opr eloaesfr itusl t
shr
e af rtx;ahp. c1eut6ti oipnna ar tl
POP
bP
ret
-Fi xedMul
endp
; D i v i d e so n ef i x e d - p o i n tv a l u eb ya n o t h e r .
FDparms s t r u c
dw
2 d u;pr e( a?t du) dr nr e s s
D i v i d ed n d
?
D di vdi s o r
?
FDparmsends
align
2
& pushed BP
i n OX
AX
& pushed BP
98 1
n e pa r o c
i xedDi v
push
bp
mov
bp. S P
cx cx,
;assume p o s ri et i sv ue l t
sub
eax.Cbp+Dividendl
mov
and ; p oe dsaeixatvixivd, ee n d ?
;yes
FDPl
jns
inc :markcx
i t ' s a n e gdai vt ii vdee n d
eax
;make
d i vpitodhseeint idv e
neg
edx,
edx
;make it a 6 4 - bdi it v i d e n tdh, esnh i f t
FDPl :
sub
: l e f t 16 b i t s so t h a t r e s u l t will be i n EAX
e a:1pxf6ur, at c t i o npdaaoil rvf ti d e innd
rol
: h i g hw o r do f
EAX
d x :,w
pa uxhdtpoioavl feri dt ei n d
DX
mov
aw
o: xcfloo.l raewdxa r
EAX
sub
e b x . d w o r dp t r[ b p + D i v i s o r l
mov
: p o s i t i v ed i v i s o r ?
ebx.ebx
and
FDPZ
jns
:yes
i t ' s a n e gdai tvi ivseo r
dec :markcx
ebx
:make p do isvi it si vo er
neg
; d i v i d eFDPL:
ebx
div
e b; xd.i 1v i s o m
r / 2i n. u s
shr
1 i f tdhi ev i si so r
: even
ebx.O
adc
ebx
dec
e bex;ds,C
xeat r r y
i f remainder iasl et a s t
CmP
eax, 0
adc
; h a l f as l a r g e as t d
h iev i s ot hr ,e n
: u s et h a tt or o u n du p
i f necessary
; s h o u l dt h er e s u l t
be made n e g a t i v e ?
c x ,c x
and
FDP3
;no
jz
negate
it
neg;yes. eax
edx ,e a; rxe t u r en s ui lnt
DX:AX; f r a c t i o n a l
mov
FDP3:
: p a r ti sa l r e a d yi n
AX
e :d1wx6h, oprlaoeerfstiunl t
DX
shr
bp
POP
ret
endp
-Fi xedDi
v
end
-F
LISTING 52.10
/*
POLYG0N.H
POLYG0N.H: Header f i l e f o r p o l y g o n - f i l l i n g c o d e , a l s o i n c l u d e s
a number o f u s e f u l i t e m s f o r
3-D a n i m a t i o n . * I
# d e f i n e MAX-OBJECTS
100
/ * max s i m u l t a n e o u s # o b j e c t s u p p o r t e d */
# d e f i n e MAX-POLY-LENGTH 4 I* f o u r v e r t i c e s i s t h e
max p e rp o l y * /
# d e f i n e SCREEN-WIDTH 320
# d e f i n e SCREEN-HEIGHT 240
# d e f i n e PAGEO-START-OFFSET 0
# d e f i n e PAGE1-START-OFFSET (((long)SCREENLHEIGHT*SCREEN_WIDTH)/4)
# d e f i n e NUM-CUBE-VERTS 8
/ * #v eo rf t i cpceeusrb e
*/
# d e f i n e NUM-CUBE-FACES 6
I* #f aocpfue bsr e
*/
/ * R a t i o :d i s t a n c ef r o mv i e w p o i n tt op r o j e c t i o np l a n e
/ w i d t ho f
p r o j e c t i o np l a n e .D e f i n e st h ew i d t ho ft h ef i e l do fv i e w .
Lower
a b s o l u t ev a l u e s
w i d e rf i e l d so fv i e w :h i g h e rv a l u e s
narrower */
# d e f i n e PROJECTION-RATIO - 2 . 0 / * n e g a t i v eb e c a u s ev i s i b l e
2
c o o r d i n a t e sa r en e g a t i v e
*/
/ * Draws t h e p o l y g o n d e s c r i b e d b y t h e p o i n t l i s t P o i n t L i s t i n c o l o r
C o l o rw i t ha l lv e r t i c e so f f s e t
by ( X . Y ) * I
# d e f i n e DRAW-POLYGON(PointList.NumPoints.Co1or.X.Y)
\
Polygon.Length
N u m P o i n t s :P o l y g o n . P o i n t P t r
PointList; \
FillConvexPolygon(&Polygon. C o l o r , X. Y ) ;
982
Chapter 52
# d e f i n e INT-TO-FIXED(x)
( ( ( l o n g ) ( i n t ) x ) << 1 6 )
# d e f i n e DOUBLE-TO-FIXED(x)
( ( l o n g ) ( x * 65536.0 + 0.5))
t y p e d e f 1o n g F i x e d p o i n t :
t y p e d e fF i x e d p o i n tX f o r m C 3 1 [ 4 1 :
I* D e s c r i b e s a s i n g l e 2D p o i n t * I
t y p e d e fs t r u c t
{ i n t X: i n t Y; 1 P o i n t :
I* D e s c r i b e s a s i n g l e 3D p o i n t i n homogeneous c o o r d i n a t e s :t h e W
c o o r d i n a t ei s n ' tp r e s e n t ,t h o u g h :
assumed t o be 1 a n di m p l i e d * I
t y p e d e fs t r u c t
I F i x e d p o i n t X . Y . Z; I P o i n t 3 :
{ i n t X : i n t Y: i n t Z: I I n t P o i n t 3 :
t y p e d e fs t r u c t
I* D e s c r i b e s a s e r i e s o f p o i n t s ( u s e d t o s t o r e
a listofverticesthat
d e s c r i b e a p o l y g o n :e a c hv e r t e xi s
assumed t oc o n n e c tt ot h et w o
a d j a c e n tv e r t i c e s :l a s tv e r t e xi s
assumed t o c o n n e c t t o f i r s t )
*I
t y p e d e fs t r u c t
{ i n tL e n g t h :P o i n t
* PointPtr:
PointListHeader;
/ * D e s c r i b e st h eb e g i n n i n ga n de n d i n g
X coordinates o f a s i n g l e
h o r i z o n t a ll i n e * I
t y p e d e fs t r u c t
{ i n t X S t a r t ;i n t
XEnd: I H L i n e :
I* D e s c r i b e s a L e n g t h - l o n gs e r i e so fh o r i z o n t a ll i n e s ,a l l
assumed t o
be on c o n t i g u o u ss c a nl i n e ss t a r t i n ga tY S t a r ta n dp r o c e e d i n g
downward(used
t o d e s c r i b e a s c a n - c o n v e r t e dp o l y g o nt ot h e
l o w - l e v e hl a r d w a r e - d e p e n d e n td r a w i n gc o d e ) .
*/
t y p e d e fs t r u c t { i n tL e n g t h :i n tY S t a r t :H L i n e
* H L i n e P t r : }H L i n e L i s t :
t y p e d e fs t r u c t
{ i n tL e f t ,T o p ,R i g h t ,B o t t o m :
Rect;
I* s t r u c t u r ed e s c r i b i n go n ef a c e
o f an o b j e c t( o n ep o l y g o n )
*/
t y p e d e fs t r u c t
{ i n t * VertNums: i n t NumVerts; i n tC o l o r :
}
Face:
t y p e d e fs t r u c t
I d o u b l eR o t a t e X .R o t a t e Y .R o t a t e Z :
1 RotateControl;
{ F i x e d p o i n t MoveX,
MoveY.MoveZ.
Minx,MinY.MinZ.
t y p e d e fs t r u c t
MaxX. MaxY. MaxZ; I M o v e C o n t r o l :
I* f i e l d s common t o e v e r y o b j e c t
*/
# d e f i n e BASE-OBJECT
\
o b/j *e c dt r a w s
*I
\
v o i d( * D r a w F u n c ) O :
I* p r e p a r eosb j e cf otdrr a w i n g
*/
\
v o i d( * R e c a l c F u n c ) ( ) :
I* moves o b j e c t * I
\
v o i d( * M o v e F u n c ) O :
I* 1 t oi n d i c a t e need t or e c a l c * I
\
i n t RecalcXform;
I* r e c t a n g l et oe r a s ei ne a c hp a g e
*/
RectEraseRectC21:
I* b a s i c o b j e c t * I
t y p e d e f s t r u c t { BASELOBJECT 1 O b j e c t :
I* s t r u c t u r e d e s c r i b i n g a p o l y g o n - b a s e do b j e c t * I
t y p e d e fs t r u c t I
BASE-OBJECT
I* c o n t r o l sr o t a t i o ns p e e d
*/
i n t RDelayCount.RDelayCountBase:
i n t MDelayCount,MDelayCountBase:
I* c o n t r o l s movementspeed
*I
Xform
XformToWorld;
/* t r a n s f o rf rmoomb j e c t - > w o rsl pd a c e
*/
Xform
XformToView;
/ * t r a n s f ofrrmo m
b j e c t - > v i se pwa c e
*I
RotateContrR
o lo t a t e :
I* c o n t r o rl so t a t i ocnh a n goev et irm e
*I
MoveControl Move:
I* c o n t r ool bs j e c t
movement
over
time
*I
i n t NumVerts;
/ * # v e Vr teiicrnteesx L i s t
*I
P onit 3 * V e r t e x Lsit :
I* u n t r a n s f o r m evde r t i c e s
*I
Point3 * XformedVertexList;
/ * t r a n s f o r m e di n t ov i e ws p a c e
*I
P o i n t 3 * P r o j e c t e d V e r t e x L i s t : I* p r o j e c t e di n t os c r e e ns p a c e
*/
Point * ScreenVertexList:
I* c o n v e r t e tdsoc r e e cno o r d i n a t e s
*I
in t NumFaces :
I* # foafcoienbsj e c t
*/
Face * F a c e L i s t ;
/* p o i n tfteaoircnef o
*I
I PObject:
>
>
*, F i x e d p o i n t * ) ;
e x t e r nv o i dX f o r m V e c ( X f o r m .F i x e d p o i n t
e x t e r nv o i dC o n c a t X f o r m s ( X f o r m .X f o r m ,X f o r m ) :
e x t e r n i n t FillConvexPolygon(PointListHeader *, i n t . i n t . i n t ) :
e x t e r nv o i dS e t 3 2 0 ~ 2 4 0 M o d e ( v o i d ) :
Fast 3-D Animation: Meet X-Sharp
983
e x t e r nv o i dS h o w P a g e ( u n s i g n e di n t ) :
e x t e r nv o i dF i l l R e c t a n g l e X ( i n t .i n t .i n t .i n t .u n s i g n e di n t .i n t ) ;
e x t e r nv o i d X f o r m A n d P r o j e c t P O b j e c t ( P 0 b j e c t * ) ;
e x t e r nv o i dD r a w P O b j e c t ( P 0 b j e c t
*);
e x t e r nv o i dA p p e n d R o t a t i o n X ( X f o r m .d o u b l e ) :
e x t e r nv o i dA p p e n d R o t a t i o n Y ( X f o r m .d o u b l e ) ;
e x t e r nv o i dA p p e n d R o t a t i o n Z ( X f o r m .d o u b l e ) :
e x t e r nn e a rF i x e d p o i n tF i x e d M u l ( F i x e d p o i n t .F i x e d p o i n t ) ;
e x t e r nn e a rF i x e d p o i n tF i x e d D i v ( F i x e d p 0 i n t .F i x e d p o i n t ) ;
e x t e r nv o i d InitializeFixedPoint(void);
e x t e r nv o i d RotateAndMovePObject(P0bject * ) ;
e x t e r nv o i dI n i t i a l i z e C u b e s ( v o i d ) ;
e x t e r n i n t D i s p l a y e d P a g e , NonDi spl ayedPage. Recal cAl1 Xforms ;
e x t e r ni n tN u m O b j e c t s ;
externXformWorldViewXform;
e x t e r nO b j e c t* O b j e c t L i s t [ l ;
e x t e r nP o i n t 3C u b e V e r t s C ] ;
984
Chapter 52
draw a solid cube in real time. Not too shabby...but the code Im presenting in this
chapter goes a bit further, rotating 12 solid cubes at an update rate of about 15
frames per second (fps) on a20 MHz 386 with a slow VGA. Thats 12 transformation
matrices, 72 polygons, and 96 vertices being handled in real time; not Star Wars,
granted, but a giant step
beyond a single cube. Run the programif you get a chance;
you may be surprised at justhow effective this level of animation is. Id like to point
out, in case anyone missed it, thatthis is fully general 3-D. Im not using any shortcuts
or tricks, likeprestoring coordinatesor pregenerating bitmaps; if you wereto feed in
different rotationsor vertices, the animation would change accordingly.
The keys to the performance increase manifested in this chapters code are three.
The first key is fixed-point arithmetic. In theprevious two chapters, we worked with
floating-point coordinates and transformation matrices. Those values are now stored
as 32-bitfixed-point numbers, in the form
16.16 (16 bits of whole number, 16bits of
fraction). 32-bit fxed-point numbers allow sufficient precision for 3-D animation, but
can be manipulated with fast integer operations, rather thanby slow floating-point
processor operationsor excruciatinglyslow floating-point emulator operations. Although
the speed advantage of fixed-point varies depending on the operation, on the
processor, and on whether or not a coprocessor is present, fixed-point multiplication
can be as much as 100 times faster than the emulatedfloating-point equivalent. (Id
like to take a momentto thank Chris Hecker for his invaluable input in this area.)
The second performance key is the use of the 386s native 32-bit
multiply and divide
instructions. C compilers operating in real mode call libraryroutines to perform multiplications and divisions involving 32-bit values,
and those library functions arefairly
slow, especiallyfor division. On a 386,32-bitmultiplicationand division can behandled
with the bit of code in Listing 52.9-and most of even that code is only for rounding.
Fast 3-D Animation: Meet X-Sharp
985
The third performance key is maintaining and operating on only the relevant portions of transformation matrices and coordinates. The bottom row of every
transformation matrix well use (in this book) is [0 0 0 11, so why bother using or
recalculating it when concatenating transforms and transforming points? Likewise
for the fourth elementof a 3-D vector in homogeneous coordinates, which is always
1. Basically, transformation matrices are treated as consisting of a 3x3 rotation matrix and a 3x1 translation vector, and coordinates are treated as 3x1 vectors. This
saves a great many multiplications in the course of transforming each point.
Just for fun, I reimplemented the animation of Listings 52.1 through 52.10 with
floating-point instructions. Together, the preceeding optimizations improve the performance of the entire animation-including drawing time and overhead, and not
just math-by more than ten times overthe code that uses the floating-point emulator. Amazing what one can accomplish with a few dozen lines of assembly and a
switch in number format, isnt it? Note that no assembly code other than the native
386 multiplyand divide is used in Listings 52.1through 52.10, although the polygon
fill code is of course mostly in assembly; weve achieved 12 cubes animated at 15fps
while doing the 3-D work almost entirely in Borland C++, and were still doing sine
and cosine via the floating-point emulator. Happily, were still nowhere near the
upper limit on the animation potential of the PC.
Drawbacks
The techniques weve used to turbocharge 3-D animation are very powerful, but
theres a dark side to them as well. Obviously, native 386instructions wont work on
8088 and 286 machines. Thats rectifiable; equivalent multiplication and division
routines could be implemented for real mode and performance would still be reasonable. It sure is nice to be able to plug in a 32-bit IMUL or DIV and be done with
it, though. More importantly, 32-bit fixed-point arithmetic has limitations in range
and accuracy. Points outside a 64Kx64Kx64K space cant be handled, imprecision
tends to creep in over the course of multiple matrix concatenations, and its quite
possible to generate the dreaded divide by 0 interrupt if Z coordinates with absolute
values less than one are used.
I dont have space to discuss these issues in detail, but here aresome brief thoughts:
The working 64Kx64Kx64Kfixed-point space can be paged
into a larger virtual space.
Imprecision of a pixel or two rarely matters in terms of display quality,and deterioration of concatenated rotations can be corrected by restoring orthogonality, for
example by periodically calculating one row ofthe matrix as the cross-product of the
other two (forcing it to be perpendicular to both). Alternatively, transformations
can becalculated from scratch each time an object or the viewer moves,so theres no
chance for cumulative error. 3-D clipping with a front clip plane of -1 or less can
prevent divide overflow.
986
Chapter 52
Previous
Home
Next
987
Previous
chapter 53
raw speed and more
Home
Next
991
LISTING 53.1
FIXED.ASM
; 3 8 6 - s p e c i f i cf i x e dp o i n tr o u t i n e s .
: T e s t e dw i t h
ROUNDING-ON
TASM
equ
equ
:1 rfoourn d i n g ,
; more a c c u r a t e
ALIGNMENT
.model
smal
.386
.code
992
Chapter 53
0 f o rrounding
no
:no r o u n d i n gi sf a s t e r .r o u n d i n gi s
; M u l t i p l i e st w of i x e d - p o i n tv a l u e st o g e t h e r .
; C n e a r - c a l l a b l ea s :
FixedpoiF
n ti x e d M u l ( F i x e d p 0 i n t
M1. F i x e d p o i n t M2);
F i x e d p o iFn itx e d D i v ( F i x e d p 0 iD
n ti v i d e nFd i.x e d p o iD
n ti v i s o r ) :
FMparms s t r u c
dw
2 d u p: r( e? t)audr dn r e s s
d pushed BP
M1
dd
?
M2
dd
?
ends
FMparms
a1 i g n
ALIGNMENT
p u b l i c -FixedMul
-FixedMul
n e apr o c
push
bp
mov
bp.sp
mov
eax,[bp+Ml]
imul
;multiply
dword p t r Cbp+M21
i f ROUNDING-ON
eax.8000h
add
:roundbyadding2".(-17)
edx.O adc
;wholepartofresultisin
e n d i f ;ROUNDING-ON
eax.16shr
; p u tt h ef r a c t i o n a lp a r ti n
POP
bp
ret
-FixedMul
endp
;
DX
AX
: D i v i d e s one f i x e d - p o i n tv a l u eb ya n o t h e r .
: C n e a r - c a l l a b l ea s :
;
F i x e d p o i Fn it x e d D i v ( F i x e d p 0 i D
n ti v i d e n Fd i. x e d p o i D
n ti v i s o r ) :
FOparms s t r u c
dw
2 d:uraped(t ?
ud)r ne s s
D i vdi d e n d
?
Divisor
dd
?
FDparms
ends
a1 i g n ALIGNMENT
public
-F i xedDi v
-Fi xedDiv
n e ap r o c
push
bp
mov
bp.sp
i f ROUNDING-ON
sub
mov
and
jns
inc
neg
FDP1:
sub
C X .cx
eax,[bp+Dividend]
eax.eax
FDPl
cx
eax
edx ,edx
& pushed BP
;assume rpeossui lt ti v e
;positive dividend?
;yes
;mark i t ' s a n e g a t i v ed i v i d e n d
p odsi:make
ivtiidv e ntdh e
;make i t a 6 4 -dbi vi ti d e tnhsdeh, ni f t
: l e f t 16 b i t s s o t h a t r e s u l t will be
; i n EAX
r o l f r ;apcutp1it oae6nratax l.
o f d i v i di ne n d
: h i g hw o r do f
EAX
mov
d i vwoi:pdihfpnaeouarnlttxed x ,
DX
o f w o rlsub
do: cwl e aar x a x ,
EAX
mov
ebx,dword p t r[ b p + O i v i s o r ]
and
ebx, ebx
:positivedivisor?
jns
FDP2
;yes
dec
cx
;mark i t ' s a dnievgi saot irv e
ebx
p:make
o s i t i vdei v i s o r
neg
993
FDP2:
ebx.1
ebx.O
ebx
div
shr
adc
dec
cmp
adc
ebx
:divide
; d i v i s o r / 2 ,m i n u s
: even
ebx,
edx
eax.O
cx,cx and
jz
FOP3
eax neg
FDP3:
e l s e : !ROUNDING-ON
mov
edx.Cbp+Dividendl
eax.eax sub
eax.edx.16
shrd
16
edx,
sar
idiv
dword [pbtpr + D i v i s o r ]
endi f
shld
edx.eax.16
1 i f thedivisoris
; s e tC a r r y i f r e m a i n d e r i s a t l e a s t
; h a l f as l a r g e as t h ed i v i s o r .t h e n
; use t h a t t o r o u n d
up if n e c e s s a r y
; s h o u l dt h er e s u l tb e
made n e g a t i v e ?
:no
:yes.negate
it
; p o s i t i o n so t h a t r e s u l t
; i n EAX
endsup
;ROUNDING-ON
; w h o l ep a r to fr e s u l ti n
DX;
; f r a c t i o n a lp a r ti sa l r e a d yi n
bp
POP
ret
-FixedDi
endp
v
~~
~~
~~
~~
~~
; R e t u r n st h es i n ea n dc o s i n eo f
anangle.
; C n e a r - c a l l a b l ea s :
;
v o i d CosSin(TAng1e
Angle,
Fixedpoint
*Cos.
Fixedpoint
~~
*):
a1 i g n ALIGNMENT
CosTable1abeldword
i n c l u d ec o s t a b l e . i n c
SCparms s t r u c
dw
dw
Angle
cos
dw
dw
Sin
ends
SCparms
2 dup(?)
?
?
?
a1 i g n ALIGNMENT
pub1 i c JosSi n
-CosSin
p r once a r
push bp
mov
bp.sp
mov
bx.Cbpl.Angle
bx.bx
and
jns
CheckInRange
MakePos:
bx.360*10
add
js
MakePos
jmp
short
CheckInRange
a1 i g n ALIGNMENT
MakeInRange:
bx.360*10
sub
CheckInRange:
cmp
bx,
360*10
jg
MakeInRange
994
Chapter 53
; r e t u r na d d r e s s
& pushed BP
:angle t o c a l c u l a t e s i n e & c o s i n ef o r
: p o i n t e rt oc o sd e s t i n a t i o n
; p o i n t e rt os i nd e s t i n a t i o n
; p r e s e r v es t a c kf r a m e
: s e t up l o c a ls t a c kf r a m e
:make s u r ea n g l e ' sb e t w e e n
; l e s st h a n
0 and2*pi
0, so make i t p o s i t i v e
:make s u r e a n g l e i s
no more t h a n2 * p i
AX
cmp
ja
cmp
Q u a jdar a n t l
bx.
180*10
EottomHalf
bx,
90*10
: f i g u r eo u tw h i c hq u a d r a n t
:quadrant 2 o r 3
:quadrant 0 o r 1
:quadrant 0
b xs.h2l
mov
eax.CosTable[bxl
bx
neg
mov
edx.CosTable[bx+90*10*41
j ms ph o r t
CSDone
a l i g n ALIGNMENT
Quadrantl:
bx
neg
add
bx,
180*10
b xs.h2l
mov
eax.CosTable[bx]
eax
neg
bx
neg
mov
edx.CosTableCbx+90*10*4]
jmp
s h o r t CSDone
a1 i g n ALIGNMENT
EottomHalf:
bx
neg
bx.360*10
add
cmp
bx,
90*10
ja
Quadrant2
: l o o k up s i n e
cos(90-Angle)
:sin(Angle)
:look u pc o s i n e
: c o n v e r tt oa n g l eb e t w e e n
0 and90
: l o o k up c o s i n e
: n e g a t i v ei nt h i sq u a d r a n t
:sin(Angle)
cos(90-Angle)
: l o o k up c o s i n e
:quadrant 2 o r 3
: c o n v e r tt oa n g l eb e t w e e n
:quadrant 2 o r 3
0 and180
:quadrant 3
b xs.h2l
mov
eax.CosTable[bx]
bxneg
mov
edx.CosTable[90*10*4+bx]
edx
neg
j ms ph o r t
CSDone
a1 i g n ALIGNMENT
Quadrant2:
bxneg
bx.180*10
add
b xs.h2l
mov
eax.CosTable[bxl
eax
neg
bx
neg
mov
edx,CosTable[90*10*4+bxl
edx
neg
CSDone:
mov
b x ,[ b p l .Cos
mov
Cbxl
.eax
mov
bx,[bpl.Sin
mov
[ b x ] ,edx
POP
ret
-CosSin
endp
bP
: l o o k up c o s i n e
cos(90-Angle)
;sin(Angle)
: l o o k up s i n e
: n e g a t i v ei nt h i sq u a d r a n t
: c o n v e r tt oa n g l eb e t w e e n
0 and90
: l o o ku pc o s i n e
: n e g a t i v ei nt h i sq u a d r a n t
:sin(Angle)
cos(90-Angle)
: l o o k up s i n e
:negative i n t h i s q u a d r a n t
: r e s t o r es t a c kf r a m e
: M a t r i xm u l t i p l i e sX f o r m
b yS o u r c e V e c .a n ds t o r e st h er e s u l t
in
: O e s t V e c .M u l t i p l i e s
a 4 x 4m a t r i xt i m e s
a 4 x 1m a t r i x :t h er e s u l t
: i s a 4 x 1m a t r i x .C h e a t sb ya s s u m i n gt h e
W c o o r d i s 1 and t h e
: b o t t o mr o wo ft h em a t r i xi s
0 0 0 1. a n dd o e s n ' tb o t h e rt os e t
RawSpeedand More
995
: t h e W c o o r d i n a t eo ft h ed e s t i n a t i o n .
: C n e a r - c a l l a b l ea s :
:
void
XformVec(Xform
WorkingXform,
Fixedpoint
*SourceVec,
F i x e d p o i n t* D e s t V e c ) :
: Thisassemblycode
;
i n t i:
f o( ri - 0i <: 3 :
DestVecCi]
i s equivalenttothis
C code:
- FixedMul(WorkingXform[il[Ol,
FixedMul(WorkingXform[~l[ll,
i+)
SourceVecCOI) +
SourceVecCl]) +
FixedMul(WorkingXformCilC21, SourceVecC21) +
WorkingXform[il[3]:
/ * noneed t o m u l t i p l y by W 1
XVparms s t r u c
WorkingXform
SourceVec
DestVec
XVparms
dw d u p2: r( e? t)audr dn r e s s
dw
?
t r amnas tf:tropoi rxomi n t e r
dw
?
v escot ourr c e t o: p o i n t e r
?
d evset icnt ao troi ;opno i n t e r
dw
ends
a1 i g n ALIGNMENT
pub1 ic -XformVec
-XformVec
proc
near
push bp
mov
bp.sp
push s i
push d i
mov
mov
mov
si.[bpl.WorkingXform
bx.[bpl.SourceVec
di.Cbp1.DestVec
*/
pushed BP
: p r e s e r v es t a c kf r a m e
: s e tu pl o c a ls t a c kf r a m e
: p r e s e r v er e g i s t e rv a r i a b l e s
:SI pointstoxformmatrix
:BX p o i n t s t o s o u r c e v e c t o r
:DI p o i n t st od e s tv e c t o r
soff-0
doff-0
REPT 3
mov
eax.[si+soffl
imul
dword
p t r[ b x l
i f ROUNDING-ON
eax.8000h
add
edx.0
adc
e n d i f :ROUNDING-ON
shrd
eax.edx.16
mov
ecx,
eax
mov
eax.[si+soff+41
imul
dword
p t r [bx+41
i f ROUNDING-ON
eax.8000h
add
edx.0
adc
e n d i f :ROUNDING-ON
shrd
eax.edx.16
ecx.eax
add
mov
eax.[si+soff+81
imuldword
p t r Cbx+81
i f ROUNDING-ON
eax.8000h
add
adc
edx.O
996
Chapter 53
:doonceeach
f o r d e s t X , Y . and Z
:column 0 e n t r y on t h i s row
: x f o r me n t r yt i m e ss o u r c e
X entry
: r o u n db ya d d i n g2 A ( - 1 7 )
:wholepartofresultisin
OX
: s h i f tt h er e s u l tb a c kt o1 6 . 1 6f o r m
: s e tr u n n i n gt o t a l
;column 1 e n t r y on t h i s row
: x f o r me n t r yt i m e ss o u r c e
Y entry
:roundbyadding
2^(-17)
:wholepartofresultisin
: s h i f tt h er e s u l tb a c kt o
: r u n n i n gt o t a lf o rt h i s
DX
16.16form
row
:column2entry
on t h i s row
: x f o r me n t r yt i m e ss o u r c e
Z entry
: r o u n db ya d d i n g2 ^ ( - 1 7 )
:wholepartofresultisin
OX
e n d i f ;ROUNDING-ON
shrd
eax.edx.16
add
ecx,
eax
: s h i f tt h er e s u l tb a c kt o1 6 . 1 6f o r m
: r u n n i n gt o t a lf o rt h i s
row
add
ecx,[si+soff+lZ]
mov
[ d i + d o f ,f e] c x
soff-soff+l6
doff-doff+4
ENDM
v ra:errgpop
ieissttoerred i
pop
si
bPPOP
ret
-XformVec
endp
:add i n t r a n s l a t i o n
: s a v et h er e s u l t
i n t h ed e s tv e c t o r
ab1 es
: r e s t o r es t a c kf r a m e
""""""Y""p""y.n-.
: M a t r i xm u l t i p l i e sS o u r c e X f o r m l
bySourceXformZand
: r e s u l t i n D e s t X f o r m .M u l t i p l i e s a 4 x 4m a t r i xt i m e s
:
:
:
:
s t o r e st h e
a 4 x 4m a t r i x ;
t h er e s u l ti s
a 4 x 4m a t r i x .C h e a t sb ya s s u m i n gt h eb o t t o mr o wo f
e a c hm a t r i x i s 0 0 0 1. and d o e s n ' tb o t h e rt os e tt h eb o t t o m
row
o ft h ed e s t i n a t i o n .
C n e a r - c a l l a b l ea s :
voidConcatXforms(XformSourceXforml.XformSourceXformZ.
XformOestXform)
: Thisassemblycode
: i n t i, j :
:
i se q u i v a l e n tt ot h i s
C code:
f o r ( i - 0 : i < 3 : i++)
{
f o r( j - 0 :j < 3 ;
j++)
OestXform[ilCjl
FixedMul(SourceXforml~il[Ol, SourceXformZ[O1Cjl) +
FixedMul(SourceXforml~il[ll, S o u r c e X f o r m Z [ l ] C j l ) +
FixedMul(SourceXform1Cil~21, SourceXformE[ZICjl):
DestXformCilC31
F i x e d M u l ( S o u r c e X f o r m l ~ i l ~ O 1 ,SourceXformZCOIC3]) +
FixedMul(SourceXform1Cil~l1, SourceXform2[11[31) +
F i x e d M u l ( S o u r c e X f o r m l ~ i l ~ Z 1 ,SourceXform2[21C31) +
SourceXforml[il[31:
CXparms s t r u c
SourceXforml
SourceXformZ
DestXform
CXparms
dw
dw
dw
dw
ends
2 dup(?)
?
?
?
a1 i g n A L I G N M E N T
public_ConcatXforms
-ConcatXforms
near
proc
push bp
mov
bp.sp
push s i
oush d i
mov
mov
mov
bx.Cbpl.SourceXform2
si.Cbpl.SourceXform1
di.[bpl.DestXform
: r e t u r na d d r e s s
& pushed BP
:pointertofirstsourcexformmatrix
: p o i n t e r t o s e c o n ds o u r c ex f o r mm a t r i x
:pointertodestinationxformmatrix
; p r e s e r v es t a c kf r a m e
: s e t up l o c a ls t a c kf r a m e
: p r e s e r v er e g i s t e rv a r i a b l e s
;BX p o i n t st ox f o r m 2m a t r i x
:SI p o i n t st ox f o r m lm a t r i x
:DI p o i n t s t o d e s t x f o r m m a t r i x
RawSpeedand More
997
roff-0
coff-0
REPT 3
REPT 3
mov
eax.Csi+roffl
imul
dword
ptC
r bx+coffl
i f ROUNDING-ON
add
eax,
8000h
edx.O
adc
e n d i f ;ROUNDING-ON
shrd
eax.edx.16
mov
ecx,
eax
mov
eax.[si+roff+41
imul
dword
p t [r b x + c o f f + l 6 1
i f ROUNDING-ON
eax.8000h
add
adc
edx.O
e n d i f ;ROUNDING-ON
shrd
eax.edx.16
add
ecx,
eax
mov
e a x[ s, i + r o f f + 8 1
imul
dword
p t[rb x + c o f f + 3 2 ]
i f ROUNDING-ON
eax.8000h
add
adc
edx.O
e n d i f :ROUNDING-ON
shrd
eax.edx.16
add
ecx,
eax
mov
[di+coff+roffl.ecx
coff-coff+4
ENDM
;row o f f s e t
:once f o r eachrow
;column o f f s e t
;once f o r e a c h o f t h e f i r s t 3 c o l u m n s ,
; assuming 0 as t h eb o t t o me n t r y( n o
; translation)
;column 0 e n t r y on t h i s row
;timesrow
0 e n t r y i n column
; r o u n db ya d d i n g2 ^ ( - 1 7 )
;wholepartofresultisin
; s h i f tt h er e s u l tb a c kt o
; s e tr u n n i n gt o t a l
DX
16.16form
;column 1 e n t r y on t h i s row
;timesrow
1 entryincol
; r o u n db ya d d i n g2 " ( - 1 7 )
;wholepartofresultisin
DX
; s h i f tt h er e s u l tb a c kt o1 6 . 1 6f o r m
; r u n n i n gt o t a l
;column 2 e n t r y on t h i s row
; t i m e sr o w2e n t r y
incol
;roundbyadding2"(-17)
;wholepartofresultisin
shifttheresultbackto
r u n n i n gt o t a l
DX
16.16form
save t h e r e s u l t i n d e s t m a t r i x
col i n xform2 & d e s t
p o i n tt on e x t
now do t h ef o u r t hc o l u m n ,a s s u m i n g
; 1 as t h eb o t t o me n t r y ,c a u s i n g
; t r a n s l a t i o n t o beperformed
mov
eax.[si+roffl
imul
dword
p t r[ b x + c o f f l
i f ROUNDING-ON
eax.8000h
add
edx.O
adc
e n d i f ;ROUNDING-ON
shrd
eax.edx.16
mov
ecx,
eax
mov
eax.[si+roff+41
imul
dword
p t [r b x + c o f f + l 6 1
i f ROUNDING-ON
eax.8000h
add
edx.O
adc
e n d i f ;ROUNDING-ON
shrd
eax.edx.16
ecx.eax
add
mov
eax.[si+roff+8]
imul
dword
p t[rb x + c o f f + 3 2 1
998
Chapter 53
;column 0 e n t r y on t h i s row
0 e n t r y i n column
;timesrow
: r o u n db ya d d i n g2 ^ ( - 1 7 )
; w h o l ep a r to fr e s u l ti si n
DX
: s h i f tt h er e s u l tb a c kt o1 6 . 1 6f o r m
; s e tr u n n i n gt o t a l
;column 1 e n t r y on t h i s row
;timesrow
1 e n t r yi nc o l
; r o u n db ya d d i n g2 " ( - 1 7 )
:wholepartofresultisin
DX
; s h i f tt h er e s u l tb a c kt o1 6 . 1 6f o r m
; r u n n i n gt o t a l
on t h i s row
;column2entry
;timesrow
2 entryincol
i f ROUNDING-ON
eax.8000h
add
edx.O
adc
e n d i f ;ROUNDING-ON
shrd
eax.edx.16
ecx.eax
add
eacdxd. [ s i + r o f f + l 2 1
mov
[di+coff+roff].ecx
c o l n e x ct otfof :-pc oo ifnf +t 4
roff-roff+l6
ENOM
pop
di
pop
si
POP bP
ret
X o n c a teXnf do pr m s
end
; r o u n db ya d d i n g
2^(-17)
: w h o l ep a r to fr e s u l t
i s i n DX
: s h i f tt h er e s u l tb a c kt o1 6 . 1 6f o r m
; r u n n i n gt o t a l
:add i n t r a n s l a t i o n
;save t h e r e s u l t
i n destmatrix
i n xform2 & d e s t
: p o i n tt on e x tc o li nx f o r m 2
& dest
; r e s t o r er e g i s t e rv a r i a b l e s
: r e s t o r es t a c kf r a m e
999
Hidden Surfaces
So far, weve made a number of simplifymg assumptions in order to get the animation to look good; for example, all objects must currently be convex polyhedrons.
Whats more, right now, objects can never pass behind or in front of each other.
What that meansis that its time to havea look at hiddensurfaces.
There area passel of ways to do hidden surfaces. Way off at one end (theslow end)
of the spectrum is Z-buffering, wherebyeach pixel of each polygon is checked as its
drawn to seewhether its the frontmost version of the pixel at those coordinates. At
the other endis the technique of simply drawing the objects in back-to-front order,
so that nearerobjects are drawn on top of farther objects. The latter approach, depth
sorting, is the one well take today. (Actually, true depth sorting involves detecting
and resolving possible ambiguities when objects overlap in 2; in this chapter, well
simply sort the objects on Z and leave it atthat.)
This limited version of depth sorting is fast but less than perfect. For one thing, it
doesnt address the issue of nonconvex objects, so well have to stick with convex
polyhedrons. For another, theres the question of what part of each object to use as
the sorting key; the nearest point, the center, and thefarthest point are all possibilities-and, whichever point is used, depth sorting doesnt handle some overlap cases
properly. Figure 53.1 illustrates
one case in which back-to-frontsorting doesnt work,
regardless of what point is used asthe sorting key.
For photo-realistic rendering, these are serious problems. For fast PC-based animation, however, theyre manageable. Choose objects
that arent too elongated; arrange
their paths of travel so they dont intersect in problematic ways; and, if they do overlap incorrectly, trust that the glitch will be lost inthe speed of the animation and the
complexity of the screen.
Listing 53.2 shows X-Sharp file OLIST.C, which
includes the key routines for depth
sorting. Objects are now stored in a linked list. The initial, empty list, created by
InitializeObjectList(), consists of a sentinel entry at either end, one at the farthest
possible z coordinate, and one atthe nearest.New entries are inserted byAddObject()
in z-sorted order. Each timethe objects are moved, before theyre drawnat their new
locations, Sortobjects0 is called to 2-sort
the object list, so that drawing will proceed
from back to front. The Z-sorting is done onthe basis of the objects center points; a
1000
Chapter 53
X axis
'\
points Farthest
Middle points
V
Viewer
why back-to-font sorting doesn 't always workproperly
Figure 53.1
center-point field has been added to the object structure to support this, and the
center point for each object
is now transformedalong with the vertices. That's really
all there is to depth sorting-and now we can have objects that overlap in X and Y
*/
terminate searches * /
void InitializeObjectListO
{
-- -
0bjectListStart.NextObject
&ObjectListEnd:
0bjectListStart.PreviousObject
NULL:
0bjectListStart.CenterInView.Z
INT_TO_FIXED(-32768):
0bjectListEnd.NextObject
NULL:
0bjectListEnd.PreviousObject
&ObjectListStart;
ObjectListEnd.CenterInView.2
Ox7FFFFFFFL:
0:
NumObjects
Object *ObjectListPtr
- 0bjectListStart.NextObject;
1001
/* L i n k i n t h e new o b j e c t * /
ObjectListPtr->Previousobject->Nextobject
ObjectPtr;
ObjectListPtr;
ObjectPtr->Nextobject
ObjectPtr->Previousobject
ObjectListPtr->PreviousObject;
ObjectListPtr->PreviousObject
ObjectPtr;
NumObjects++;
- -
/*
R e s o r t st h eo b j e c t si no r d e ro fa s c e n d i n gc e n t e r
2 c o o r d i n a t ei nv i e ws p a c e ,
bymovingeachobject
inturntothecorrectpositionintheobjectlist.
v o i dS o r t O b j e c t s O
*/
i n t i;
O b j e c t* O b j e c t P t r .* O b j e c t C m p P t r .* N e x t O b j e c t P t r :
--
/ * S t a r tc h e c k i n gw i t ht h es e c o n do b j e c t
*/
ObjectCmpPtr
0bjectListStart.NextObject;
ObjectPtr
ObjectCmpPtr->Nextobject;
f o r (i-1; i<NumObjects; i++)
(
/* See i f we need t o move b a c k w a r dt h r o u g ht h e l i s t */
i f (ObjectPtr->CenterInView.Z < ObjectCmpPtr->CenterInView.Z) [
/* Remember where t o resume s o r t i n g w i t h t h e n e x t o b j e c t
*/
NextObjectPtr
ObjectPtr->Nextobject;
/ * Yes. move backward u n t i l we f i n d t h e p r o p e r i n s e r t i o n
p o i n t .T e r m i n a t i o ng u a r a n t e e db e c a u s eo fs t a r ts e n t i n e l
*/
do (
ObjectCmpPtr
ObjectCmpPtr->PreviousObject;
3 w h i l e (ObjectPtr->CenterInView.Z <
ObjectCmpPtr->CenterInView.Z);
/*
/*
Now move t h e o b j e c t t o i t s
new l o c a t i o n */
U n l i n kt h eo b j e c ta tt h eo l dl o c a t i o n
*/
ObjectPtr->PreviousObject->Nextobject
ObjectPtr->Nextobject;
ObjectPtr->Nextobject->Previousobject
ObjectPtr->PreviousObject:
/* L i n k i n t h e o b j e c t a t t h e
new l o c a t i o n * /
ObjectCmpPtr->Nextobject->Previousobject
ObjectPtr;
ObjectPtr->PreviousObject
ObjectCmpPtr;
ObjectCmpPtr->Nextobject;
ObjectPtr->Nextobject
ObjectCmpPtr->Nextobject
ObjectPtr;
--
--
/ * Advance t o t h e n e x t o b j e c t t o s o r t
*/
ObjectCmpPtr
NextObjectPtr->PreviousObject;
ObjectPtr
NextObjectPtr;
1 else (
/* Advance t o t h e n e x t o b j e c t t o s o r t
*/
ObjectCmpPtr
ObjectPtr:
ObjectPtr->Nextobject;
ObjectPtr
I
--
Rounding
FIXED."
contains the equate ROUNDING-ON. When this equate is 1 , the results of multiplications and divisions are rounded to the nearest fixed-point values;
when it's 0, the results are truncated. The difference between the results produced
1002
Chapter 53
Previous
Home
Next
by the two approaches is, at most, 2-16;you wouldnt think that would make much
difference, now, would you? Butit does. When the animation is run with rounding
disabled, the cubes start to distort
visibly after a few minutes, and after a few minutes
more they look like theyve
been run over. In contrast, Ive never seen any significant
distortion with rounding on,even after a half-hour or so. I think the difference with
rounding is not that its so much more accurate, but ratherthat the errors are
evenly
distributed; with truncation, the errors are biased, and biased errors become very
visible when theyreapplied to right-angle objects.Even withrounding, though, the
errors will eventually creep in, and reorthogonalization will become necessary at
some point.
The performance cost of rounding is small, and thebenefits are highly visible. Still,
truncation errors become significant only when they accumulate over time, as,for
example, when rotation matrices are repeatedly concatenated over the course
of many
transformations. Some time could besaved by rounding only in such cases. For example, divisionis performed only inthe course of projection, and theresults do not
accumulate over time,so it would be reasonable to disablerounding for division.
Having a Ball
So far in our exploration of 3-D animation, weve had nothing to look at but triangles and cubes. Its time for something a little more visually appealing, so the
demonstration program now features a 72-sided ball. Whats particularly
interesting
about this ball is that its created by the GENBALL.C program in the BALL
subdirectory of X-Sharp, and both the size of the ball and the number of bands of
faces are programmable. GENBALL.C spits out to a file allthe arrays of vertices and
faces needed to create the ball, ready for inclusion in 1NITBALL.C. Ti-ue, if you
change the number of bands, you must change the Colors array in 1NITBALL.C to
match, but thats a tiny detail; by and large, the process of generating a ball-shaped
object is now automated. In fact, werenot limited to ball-shaped objects; substitute
a different vertex and face generation program for GENBALL.C, and you can make
whatever convexpolyhedron you want; again, all you havedotois change the Colors
array correspondingly.You can easily create multiple versionsof the base object, too;
1NITCUBE.C is an example of this, creating 11 different cubes.
What we have here is the first glimmer of an object-editing system. GENBALL.C is
the prototype for object definition, and 1NITBALL.C is the prototype for generalpurpose object instantiation. Certainly, it would be nice to someday
an interactive
have
3-D object editing tool and resource management setup. We have our handsfull with
the drawing end of things at the moment,
though, and for now itsenough to be able
to create objects in a semiautomated way.
1003
Previous
chapter 54
3-d shading
Home
Next
1007
fast asit used to. The switch to 8088 instructions makes X-Sharpsfixed-point calculations about 2.5 times slower overall. Since
a PC isperhaps 40 times slowerthan a 486/33,
were talkingabout a hundred-times speeddifference between the low end andmainstream. A 486/33 can animate a 72-sided ball, complete with shading (as discussed
later), at60 frames per second (fps),with plenty of cyclesto spare; an 8-MHz ATcan
animate the same ball at about6 fps. Clearly, the level of animation an application
uses must be tailored to the available CPU horsepower.
The implementation of a 32-bit multiply using 8088 instructions is a simple matter of
adding together fourpartial products. A 32-bit divide isnot so simple, however. In
fact, in Listing 54.1 Ive chosen not to implement a full 32x32 divide, but ratheronly
a 32x16 divide. The reason is simple: performance. A 32x16 divide can be implemented on an8088 with two DIV instructions, but a 32x32 divide takesa great deal
more work, so far as I can see. (If anyone has a fast 32x32 divide, or has a faster way
to handle signed multiplies and divides than the approach taken by Listing 54.1,
please drop me a line care of the publisher.) In X-Sharp, division is used only to
divide either X or Y by Z in the process of projecting fromview space to screen space,
so the cost of using a 32x16 divide is merely some inaccuracy in calculating screen
coordinates, especially when objects get very close to the Z = 0 plane. This error is
not cumulative (that is, it doesntcarry over to later frames), and in
my experience
doesnt cause noticeable image degradation; therefore, given the already slow performance of the 8088 and 286, Ive opted for performanceover precision.
At any rate, please keep in mind that the non-386 version of FixedDiv() is not a
general-purpose 32x32 fixed-point division routine. Infact, it will generate a divideby-zero error if passed a fixed-point divisor between -1 and 1. As Ive explained, the
non-386 version of Fixed-Div() is designed to do justwhat X-Sharp needs, and no
more, as quickly aspossible.
LISTING 54.1
FIXED.ASM
; F i x e dp o i n tr o u t i n e s .
; T e s t e dw i t h TASM
USE386
equ
MUL-ROUNDING-ON
equ
1
1
0 for
; 8088 opcodes
; 1 frooru n d i n g
on m u l t i p l i e s ,
; 0 f o r norounding.Notrounding
i sf a s t e r ,
; r o u n d i n g i s m o r ea c c u r a t ea n dg e n e r a l l y
a
; good i d e a
DIV-ROUNDING-ON
equ
;1 frooru n d i n g
;
;
;
;
;
ALIGNMENT
equ
.model smal 1
.386
.code
1008
Chapter 54
on d i v i d e s ,
0 f o r norounding.Notrounding
i sf a s t e r ,
r o u n d i n g i s more a c c u r a t e b
, u tb e c a u s e
d i v i s i o ni so n l yp e r f o r m e dt op r o j e c tt o
t h es c r e e n ,r o u n d i n gq u o t i e n t sg e n e r a l l y
i s n tn e c e s s a r y
~~~~
; M u l t i p l i e st w of i x e d - p o i n tv a l u e st o g e t h e r .
; C n e a r - c a l l a b l ea s :
;
F i x e d p o i Fn it x e d M u l ( F i x e d p 0 i n t
FMparms s t r u c
dwd;uraped(t2?
ud)r ne s s
M1
dd
?
M2
dd
?
FMparms ends
a1 i g n ALIGNMENT
public
F i xedMul
-FixedMul
p r once a r
push bp
mov
bp.sp
M 1 . F i x e d p o iH
n tZ ) ;
& pushed BP
i f USE386
mov
eax.[bp+Ml]
i m udl w o r dp t r
[bp+M2]
i f MUL-ROUNDING-ON
add
eax,
8000h
edx.0
adc
e n d i f :MUL-ROUNDING-ON
eax.16
shr
else
;multiply
; r o u n db ya d d i n g2 ^ ( - 1 7 )
; w h o l ep a r to fr e s u l ti si n
DX
; p u tt h ef r a c t i o n a lp a r ti n
AX
; !USE386
push s i
push d i
cx.cx
sub
mov
ax.word p t r [bp+M1+2]
mov
s i . w o rpdt r
[bp+Ml]
ax.ax
and
jns
CheckSecondOperand
axneg
neg
si
ax.0
sbb
ci xn c
CheckSecondOperand:
mov
bx.word p t r [bp+M2+2]
mov
d i .word p t r [bp+M2]
bx.bx
and
S janvse S i g n S t a t u s
bx
neg
neg
di
bx.0
sbb
cxxo. 1r
SaveSignStatus:
push
cx
push ax
mu1
bx
mov
cx.ax
;do f o u rp a r t i a lp r o d u c t s
and
; addthem
t o g e t h e r .a c c u m u l a t i n g
; theresultin
CX:BX
; p r e s e r v e C r e g i s t e rv a r i a b l e s
; f i g u r eo u ts i g n s ,
so we canuse
; u n s i g n e dm u l t i p l i e s
;assume b o t ho p e r a n d sp o s i t i v e
; f i r s t o p e r a n dn e g a t i v e ?
;no
;yes. s o n e g a t e f i r s t operand
;mark t h a t f i r s t o p e r a n d i s n e g a t i v e
; s e c o n do p e r a n dn e g a t i v e ?
;no
;yes. so negatesecondoperand
;mark t h a t secondoperand
i sn e g a t i v e
;remember s i g n o f r e s u l t ;
1 i f result
; n e g a t i v e , 0 i f r e s u l tn o n n e g a t i v e
;remember h i g hw o r d o f M 1
; h i g hw o r d M 1 t i m e sh i g hw o r d
M2
; a c c u m u l a t er e s u l ti n
CX:BX (BX n o tu s e d
; u n t i ln e x to p e r a t i o n ,h o w e v e r )
;assumeno
o v e r f l o w i n t o DX
mov
mu1
mov
add
ax, s i
bx
bx.ax
cx.dx
ax
POP
di
mu1
add
bx.ax
adc
cx.dx
mov
ax, s i
di
mu1
i f MUL-ROUNDING-ON
ax.8000h
add
bx.dx
adc
e l s e :!MUL-ROUNDING-ON
bx.dx
add
e n d i f ;MUL-ROUNDING-ON
cx.0
adc
mov
dx.cx
mov
ax.bx
POP
cx
cx,cx
and
jz
F i xedMul Done
dx
neg
axneg
dx.0
sbb
FixedMulDone:
pop
pop
di
si
:lowword
M 1 t i m e sh i g hw o r d
: a c c u m u l a t er e s u l ti n
CX:BX
; r e t r i e v eh i g hw o r do f
M1
; h i g hw o r d M 1 t i m e sl o ww o r d
; r o u n db ya d d i n g2 ^ ( - 1 7 )
: d o n ' tr o u n d
; a c c u m u l a t er e s u l ti n
CX:BX
: i st h er e s u l tn e g a t i v e ?
; n o .w e ' r ea l ls e t
;yes. s o n e g a t e DX:AX
: r e s t o r e C r e g i s t e rv a r i a b l e s
POP
bP
ret
-FixedMul
endp
byanother.
;
F i x e d p o i Fn it x e d D i v ( F i x e d p 0 i D
n ti v i d e n Fd i, x e d p o i D
n ti v i s o r ) ;
FDparms s t r u c
dwd :uraped(t2?ud)rrne s s
& pushed BP
D i vdi d e n d
?
D
d di v i s o r
?
FDparms ends
ALIGNMENT
a1 ign
JixedOiv
public
JixedDi v
n e apr o c
bp
push
mov
bP SSP
i f USE386
i f DIV-ROUNDING-ON
cx.cx
sub
mov
eax.[bp+Oividendl
d i v i d e n; dp ?o s i t i v e a x , e a x
and
jns
FOP1
cx
inc
eax
neg
1010
Chapter 54
MZ
: a c c u m u l a t er e s u l ti n
CX:BX
:lowword M 1 t i m e sl o ww o r d
M2
e n d i f :USE386
: D i v i d e s one f i x e d - p o i n tv a l u e
: C n e a r - c a l l a b l ea s :
M2
;assume
r e spuol ts i t i v e
:yes
:mark i t ' s a dni ev gi daet ni vde
p od:make
si vi ti idveent dh e
FDPl :
,edx
sub
edx
eax,l6 rol
;make i t a 6 4 - b i td i v i d e n d ,t h e ns h i f t
: l e f t 16 b i t s s o t h a t r e s u l t w
il be i n EAX
: p u tf r a c t i o n a lp a r to fd i v i d e n di n
EAX
DX
; p u tw h o l ep a r to fd i v i d e n di n
EAX
: h i g hw o r do f
mov
o f w o r d l o w: c l eaaxr. a xs u b
mov
ebx.ebxand
jns
cx
dec
ebx
neg
div
FDPZ:
ebx.1 shr
adc
ebx dec
cmp
eax.O adc
dx.ax
ebx.dword [pbt pr + D i v i s o r l
FOP2
ebx
ebx.O
ebx,
edx
cx.cx and
jz
FDP3
eax neg
FDP3 :
e l s e :!DIV-ROUNDING-ON
mov
edx.[bp+Dividendl
eax,eaxsub
eax.edx.16
shrd
edx.16 sar
idiv
dword p[ bt rp + D i v i s o r ]
e n d i f :DIV-ROUNDING-ON
edx.eax.16
shld
else
: p o s i t i v ed i v i s o r ?
:yes
;mark i t ' s a n e g a t i v e d i v i s o r
:make d i v i s o r p o s i t i v e
:divide
; d i v i s o r / 2 .m i n u s
1 i f t h ed i v i s o ri s
: even
isatleast
: s e tC a r r y i f t h er e m a i n d e r
: h a l f as l a r g e as t h ed i v i s o r ,t h e n
: u s et h a tt or o u n du p
i f necessary
; s h o u l dt h er e s u l tb e
:no
it
:yes.negate
made n e g a t i v e ?
: p o s i t i o n so t h a t r e s u l t
: i n EAX
endsup
: w h o l ep a r to fr e s u l ti n
DX:
: f r a c t i o n a lp a r ti sa l r e a d yi n
AX
: !USE386
..
: u n s i g n e dd i v i s i o n s
sub
cx, cx
mov
ax.word p t r [bp+Dividend+Z]
ax.ax and
jns
CheckSecondOperandD
:no
ax
neg
word
neg
p t r [bp+Dividend]
ax.0
sbb
cxinc
CheckSecondOperandD:
mov
b x . [wbpopt r+dD i v i s o r + E l
bxbx,
and
S a v e S i gjn S
s tatusD
s o we canuse
:assume b o t ho p e r a n d sp o s i t i v e
: f i r s t o p e r a n dn e g a t i v e ?
:yes. so n e g a t ef i r s to p e r a n d
:mark t h a t f i r s t
operand i s n e g a t i v e
; s e c o n do p e r a n dn e g a t i v e ?
:no
3-D Shading
1011
:yes. bx
neg
word
neg
sbb
xor
SaveSignStatusD:
push
operand
sosecond
negate
[pbtpr + D i v i s o r l
bx.0
cx.1
:mark t h a t secondoperand
cx
sub
div
dx.dx
bx
mov
mov
cx.ax
ax.word p t[ rb p + D i v i d e n d l
bdxi v
i f DIV-ROUNDING-ON EO 0
shr
bx.1
adc
bx.0
dec
bx
bx.dx
cmp
adc
ax.0
cx.0
adc
e n d i f :DIV-ROUNDING-ON
isnegative
1 i f result
:remember s i g n o f r e s u l t :
: n e g a t i v e , 0 i f r e s u l tn o n n e g a t i v e
DX:AX
: p u tD i v i d e n d + Z( i n t e g e rp a r t )i n
:firsthalfof
3 2 / 1 6d i v i s i o n ,i n t e g e rp a r t
: d i v i d e db yi n t e g e rp a r t
: s e ta s i d ei n t e g e rp a r to fr e s u l t
: c o n c a t e n a t et h ef r a c t i o n a lp a r to f
: t h ed i v i d e n dt ot h er e m a i n d e r( f r a c t i o n a l
: p a r t )o ft h er e s u l tf r o md i v i d i n gt h e
: i n t e g e rp a r to ft h ed i v i d e n d
:second h a l f o f 3 2 / 1 6 d i v i s i o n
: d i v i s o r / Z .m i n u s
1 if thedivisoris
: even
: s e tC a r r y
i f t h er e m a i n d e ri sa tl e a s t
: h a l f as l a r g e as t h ed i v i s o r ,t h e n
: u s et h a tt or o u n du p
i f necessary
mov
dx.cx
POP c x
cx,cx
and
jz
F i xedDi vDone
dx
neg
ax
neg
dx.0
sbb
FixedDivDone:
:absolutevalueofresultin
DX:AX
: i st h er e s u l tn e g a t i v e ?
: n o .w e ' r ea l ls e t
:yes. s o n e g a t e DX:AX
e n d i f :USE386
POP
ret
-F i xedDi v
bP
endp
: R e t u r n st h es i n ea n dc o s i n eo f
an a n g l e .
: C n e a r - c a l l a b l ea s :
:
v o iCdo s S i n ( T A n g 1Aen g l eF,i x e d p o i n t
*Cos.
Fixedpoint *);
a1 i g n ALIGNMENT
CosTabl e 1 abel dword
i n c l u d ec o s t a b l e . i n c
SCparms s t r u c
dw d u2p ( ? )
dw
?
Angle
dw
?
cos
dw
?
Sin
ends
SCparms
a1 i g n ALIGNMENT
pub1 1 c -CosSi n
1 012
Chapter 54
; r e t u r na d d r e s s & pushed BP
:angletocalculatesine
& cosfnefor
: p o i n t e rt oc o sd e s t i n a t i o n
:pointertosindestination
-CosSin
p r once a r
push bp
mov
bp.sp
; p r e s e r v es t a c kf r a m e
: s e tu pl o c a ls t a c kf r a m e
i f USE386
mov
bx.[bpJ.Angle
bx.bx
and
jns
CheckInRange
Ma kePos :
bx.360*10
add
js
MakePos
jmp
s h o r t CheckInRange
a l i g n ALIGNMENT
MakeInRange:
bx.360*10
sub
CheckInRange:
cmp
bx.360*10
jg
MakeInRange
cmp
b x . 180*10
ja
BottomHal f
cmp
bx.90*10
Q u a jdar a n t l
0 and 2*pi
:make s u r ea n g l e ' sb e t w e e n
: l e s st h a n
0. so make i t p o s i t i v e
;make s u r e a n g l e i s
nomore
t h a n2 * p i
; f i g u r eo u tw h i c hq u a d r a n t
;quadrant 2 o r 3
:quadrant 0 o r 1
:quadrant 0
shl
mov
bxneg
mov
jmp
bx.2
eax.CosTable[bxl
ed~.CosTable[bx+90*10*4J
: l o o k up s i n e
;sin(Angle)
cos(90-Angle)
: l o o ku pc o s i n e
s h o r t CSDone
a1 i g n ALIGNMENT
Quadrantl:
bx
neg
180*10
add
bx,
b xs.h2l
mov
eax.CosTable[bxl
eax
neg
bx
neg
mov
edx.CosTable[bx+90*lO*4]
j ms ph o r t
CSDone
a1 ign ALIGNMENT
BottomHal f :
bx
neg
bx.360*10
add
cmp
bx.90*10
ja
Quadrant2
; c o n v e r tt oa n g l eb e t w e e n
0 and90
; l o o ku pc o s i n e
; n e g a t i v ei nt h i sq u a d r a n t
;sin(Angle)
cos(90-Angle)
: l o o ku pc o s i n e
;quadrant 2 o r 3
; c o n v e r tt oa n g l eb e t w e e n
;quadrant 2 o r 3
0 and180
:quadrant 3
shl
mov
bx
neg
mov
edx
neg
jmp
bx.2
eax.CosTable[bxl
edx.CosTable[90*10*4+bxl
: l o o ku pc o s i n e
;sin(Angle)
cos(90-Angle)
; l o o ku ps i n e
; n e g a t i v ei nt h i sq u a d r a n t
s h o r t CSDone
a1 i g n ALIGNMENT
Quadrant2:
bx
neg
bx.180*10
add
; c o n v e r tt oa n g l eb e t w e e n
0 and90
shl
mov
neg
neg
mov
neg
CSDone:
mov
mov
mov
mov
bx.2
eax.CosTable[bxl
eax
bx
edx.CosTable[90*10*4+bxl
edx
: l o o k up c o s i n e
: n e g a t i v ei nt h i sq u a d r a n t
cos(90-Angle)
:sin(Angle)
: l o o ku ps i n e
: n e g a t i v ei nt h i sq u a d r a n t
bx.[bpl.Cos
[bxl .eax
bx.[bpl.Sin
[bxl .edx
e l s e : !USE386
mov
bx.[bpl.Angle
bx.bx
and
jns
CheckInRange
Ma kePos :
bx.360*10
add
js
MakePos
jmp
short
CheckInRange
a1 ign ALIGNMENT
MakeInRange:
bx.360*10
sub
CheckInRange:
cmp
bx.360*10
jg
MakeInRange
q u a d rwahnitc h
o u t ; f i g u1r8e0 * 1 0 b x ,
BottomHal f
:quadrant
90*10
bx,
Quadrantl
0 and2*pi
:make s u r ea n g l e ' sb e t w e e n
: l e s st h a n
0, s o make i t p o s i t i v e
:make s u r ea n g l ei sn o
morethan2*pi
: q u a d3r a n t 2 o r
0 or 1
:quadrant 0
:sin(Angle)
:negative
sin(Angle)
dx
bx
bx.2
ax.word
dx.word
bx
cx.word
bx.word
CSDone
p t r C o s T a bslie:nul[eobpox kl
p t r CosTable[bx+2]
p t r CosTable[bx+90*10*4+2]:lookupcosine
p t r CosTable[bx+90*10*41
cos(90-Angle)
a1 i g n ALIGNMENT
Quadrantl:
bx
neg
bx.180*10
:convert
add
a n g l et o
between 0 and 90
b xs .h2l
mov
ax.word p t r CosTableCbxl
:look
up c o s i n e
mov
dx.word p t r CosTableCbx+21
neg
quadrant this
in
ax
neg
dx.0
sbb
neg
cos(90-Angle)
mov
c x . w o r dp t r
CosTable[bx+90*10*4+21
:look
up
cosine
mov
bx,word p t r CosTableCbx+90*10*43
j msph o r t
CSDone
a1 i g n ALIGNMENT
BottomHal f :
bx
neg
bx.360*10
add
1014
Chapter 54
:quadrant2or3
: c o n v e r tt oa n g l eb e t w e e n
0 and180
cmp
ja
bx.90*10
Quadrant2
;quadrant 2 o r 3
:quadrant3
shl
bx.2
mov
ax.word p t r CosTableCbx]
mov
dx.word p t r CosTable[bx+2]
bx
neg
mov
cx.word p t r CosTable[90*10*4+bx+2]
mov
bx.word p t r CosTable[90*10*4+bxl
cx
neg
bx
neg
cx.0
sbb
j ms ph o r t
CSDone
a1 i g n ALIGNMENT
Quadrant2:
bx
neg
bx.180*10
add
b xs.h2l
mov
ax.word p t r
mov
dx.word p t r
dx
neg
ax
neg
sbb
dx,O
bx
neg
mov
cx.word p t r
mov
bx.word p t r
cxneg
bx
neg
cx.0
sbb
CSDone:
push
bx
mov
b x , [ b p l .Cos
mov
[bx]
,ax
mov
[ bx+2] ,d x
mov
b x .[ b p l. S i n
POP
ax
mov
[bx].ax
mov
Cbx+Zl . c x
CosTable[bx]
CosTable[bx+P]
: l o o k up c o s i n e
cos(90-Angle)
;sin(Angle)
; l o o k p; s i n e
; n e g a t i v ei nt h i s
quadrant
: c o n v e r tt oa n g l e
between 0 and90
: l o o ku pc o s i n e
:negativeinthis
CosTable[90*10*4+bx+2]
CosTable[90*10*4+bx]
quadrant
cos(90-Angle)
:sin(Angle)
; l o o k UD s i n e
: n e g a t i v ei nt h i sq u a d r a n t
e n d i f ;USE386
POP
ret
XosSin
: r e s t o r es t a c kf r a m e
bP
endp
M a t r i xm u l t i p l i e sX f o r m
bySourceVec.and
s t o r e st h er e s u l ti n
OestVec. M u l t i p l i e s a4x4 m a t r i x t i m e s a 4 x 1 m a t r i x ; t h e r e s u l t
i s a 4 x 1m a t r i x .C h e a t sb ya s s u m i n gt h e
W c o o r d i s 1 and t h e
b o t t o mr o wo ft h em a t r i xi s
0 0 0 1. and d o e s n ' t b o t h e r t o s e t
: t h e W c o o r d i n a t eo ft h ed e s t i n a t i o n .
; C n e a r - c a l l a b l ea s :
;
void
XformVec(Xform
WorkingXform.
Fixedpoint
*SourceVec.
F i x e d p o i n t* D e s t V e c ) ;
;
;
;
;
; Thisassemblycode
;
i n t i;
isequivalenttothis
C code:
3-D Shading
1015
f o( ri - 0i <; 3 ;
DestVecCil
i++)
FixedMul(WorkingXform[il[Ol, SourceVecCO]) +
F i x e d M u l ( W o r k i n g X f o r m ~ i l ~ l 1 ,SourceVecC11) +
FixedMul(WorkingXform[ilL?l, SourceVecCZI) +
WorkingXformCil[3]:
/* noneed t o m u l t i p l y by W
*/
XVparms s t r u c
WorkingXform
SourceVec
DestVec
XVparms ends
dw
dw
dw
dw
2 d u p: r( e? t)audr nd r e s s
?
?
?
& pushed BP
t r amnas tf:tropoi ro
xm
inter
v e sc ot ourr c e t o: p o i n t e r
d evset icnt aottroi:opno i n t e r
:do f o u r p a r t i a l p r o d u c t s
and
; a d dt h e mt o g e t h e r ,a c c u m u l a t i n g
: theresultIn
cx.cx
sub
mov
bx.word p t r C&M1&+2]
bx,bx
and
jns
CheckSecondOperand
bx
neg
neg
word
ptr C
&
M
&
l I
bx.0
sbb
mov
word p t r C&M1&+2].bx
ci xn c
CheckSecondOperand:
mov
bx.word p t r C&M2&+21
bx.bx
and
S janvse S i g n S t a t u s
bxneg
neg
word
p t r [&M2&1
sbb
bx.0
mov word p t r [&M2&+2].bx
cxxo. 1r
SaveSignStatus:
push c x
mov
mu1
mov
ax.word p t r C&M1&+21
word p t r [&M2&+21
cx.ax
mov
ax.word p t r C&M1&+2]
mu1
word p t r [&MZ&l
bx, ax
mov
cx. dx
add
ax.word p t r [ & M l & l
mov
word p t r [&M2&+21
mu1
bx, ax
add
cx.dx
adc
ax,word p t r [ & M l & l
mov
word p t r [&M2&1
mu1
i f MUL-ROUNDING-ON
8000h
ax,
add
bx.dx
adc
1016 Chapter 54
CX:BX
: f i g u r eo u ts i g n s ,
so we canuse
: u n s i g n e dm u l t i p l i e s
:assume b o t ho p e r a n d sp o s i t i v e
: f i r s to p e r a n dn e g a t i v e ?
:no
:yes. so n e g a t ef i r s to p e r a n d
:mark t h a t f i r s t
operand i s n e g a t i v e
:secondoperandnegative?
:no
:yes. s o negatesecondoperand
:mark t h a t secondoperand
isnegative
;remember s i g n o f r e s u l t :
1 if result
: n e g a t i v e , 0 i f r e s u l tn o n n e g a t i v e
: h i g hw o r dt i m e sh i g hw o r d
:assumeno
o v e r f l o wi n t o
: h i g hw o r dt i m e sl o ww o r d
: l o ww o r dt i m e sh i g hw o r d
: l o ww o r dt i m e sl o ww o r d
:roundbyadding
ZA(-17)
DX
e l s e ;!MUL-ROUNDING-ON
bx.dx
add
e n d i f :MUL-ROUNDING-ON
cx.0
adc
mov
dx,cx
mov
ax.bx
POP
cx
cx.cx
and
jz
FixedMulDone
dx
neg
axneg
dx.0
sbb
FixedMulDone:
ENDM
a1 i g n ALIGNMENT
pub1 ic -XformVec
-XformVec
p rnoeca r
push bp
mov
bp.sp
push s i
push d i
;dontround
: i st h er e s u l tn e g a t i v e ?
:no.were
a l ls e t
;yes. s o n e g a t e DX:AX
: p r e s e r v es t a c kf r a m e
: s e t u p l o c a ls t a c kf r a m e
; p r e s e r v er e g i s t e rv a r i a b l e s
i f USE366
mov
mov
mov
si.[bpl.WorkingXform
bx.Cbpl.SourceVec
d i ,[ b p l .DestVec
:SI p o i n t s t o x f o r m m a t r i x
:BX p o i n t s t o s o u r c e v e c t o r
: D I p o i n t st od e s tv e c t o r
soff-0
doff-0
REPT 3
eax.[si+soffl
i m udl w o r dp t[rb x l
i f MUL-ROUNDING-ON
eax.8000h
add
edx.O
adc
e n d i f ;MUL-ROUNDING-ON
shrd
eax.edx.16
mov
ecx,
eax
mov
mov
eax.[si+soff+41
i m udl w o r dp t[rb x + 4 ]
i f MUL-ROUNDING-ON
eax.8000h
add
edx.0
adc
e n d i f ;MUL-ROUNDING-ON
shrd
eax.edx.16
add
ecx,
eax
mov
eax.[si+soff+61
imul
dword
p t r [bx+6]
i f MUL-ROUNDING-ON
eax.8000h
add
edx.O
adc
e n d i f :MUL-ROUNDING-ON
shrd
eax.edx.16
ecx.eax
add
add
mov
ecx.[si+soff+l2]
Cdi+doffl.ecx
:doonceeach
f o r d e s t X . Y , and 2
:column 0 e n t r y on t h i s row
; x f o r me n t r yt i m e ss o u r c e
X entry
: r o u n db ya d d i n g2 ( - 1 7 )
; w h o l ep a r to fr e s u l ti si n
:shifttheresultbackto
: s e tr u n n i n gt o t a l
DX
16.16form
;column 1 e n t r y on t h i s row
Y entry
; x f o r me n t r yt i m e ss o u r c e
; r o u n db ya d d i n g2 ^ ( - 1 7 )
;wholepartofresultisin
DX
:shifttheresultbackto16.16form
; r u n n i n gt o t a lf o rt h i sr o w
; c o l u m n2e n t r y
on t h i s row
: x f o r me n t r yt i m e ss o u r c e
2 entry
; r o u n db ya d d i n g2 * ( - 1 7 )
:wholepartofresultisin
DX
; s h i f tt h er e s u l tb a c kt o1 6 . 1 6f o r m
: r u n n i n gt o t a lf o rt h i sr o w
:add i n t r a n s l a t i o n
: s a v et h er e s u l ti nt h ed e s tv e c t o r
3-D Shading
1017
soff-soff+l6
doff-doff+4
ENOM
e l s e : !USE386
mov
mov
mov
push
bp
si.Cbp].WorkingXform
di.[bp].SourceVec
bx.[bpl.OestVec
REPT
push
push
push
push
push
call
add
mov
mov
3
bx
word p t r [ s i + s o f f + 2 1
word p t r [ s i + s o f f l
word p t r Cdi+21
word p t r [ d i l
-FixedMul
sp.8
c x , a x: s e tr u n n i n gt o t a l
bp.dx
:SI p o i n t s t o x f o r m m a t r i x
: D I p o i n t st os o u r c ev e c t o r
:BX p o i n t s t o d e s t v e c t o r
; p r e s e r v es t a c kf r a m ep o i n t e r
soff-0
doff-0
push
push
push
push
push
call
add
POP
add
adc
cx
word p t r[ s i + s o f f + 4 + 2 ]
word p t r [ s i + s o f f + 4 ]
word p t r [di+4+21
word p t r [d1+41
-FixedMul
sp.8
cx
cx, ax
bp .dx
push
push
push
push
push
call
add
POP
add
adc
cx
word p t r[ s i + s o f f + 8 + 2 1
word p t r [ s i + s o f f + 8 1
word p t r Cdi+B+ZI
word p t r [ d i + 8 1
-FixedMul
sp.8
cx
cx, ax
bp.dx
add
cx.[si+soff+l21
bapd. c[ s i + s o f f + l 2 + 2 1
POP bx
mov
[bx+doffl.cx
mov
[bx+doff+El,bp
soff-soff+l6
doff-doff+4
ENDM
POP
bP
;doonceeach
f o r d e s t X. Y . and 2
:remember d e s t v e c t o r p o i n t e r
; x f o r me n t r yt i m e ss o u r c e
: c l e a rp a r a m e t e r sf r o ms t a c k
: p r e s e r v el o ww o r do fr u n n i n gt o t a l
: x f o r me n t r yt i m e ss o u r c e
Y entry
: c l e a rp a r a m e t e r sf r o ms t a c k
; r e s t o r el o ww o r do fr u n n i n gt o t a l
; r u n n i n gt o t a lf o rt h i s
row
: p r e s e r v el o ww o r do fr u n n i n gt o t a l
: x f o r me n t r yt i m e ss o u r c e
2 entry
; c l e a rp a r a m e t e r sf r o ms t a c k
; r e s t o r el o ww o r do fr u n n i n gt o t a l
; r u n n i n gt o t a lf o rt h i s
row
:add i n t r a n s l a t i o n
: r e s t o r ed e s tv e c t o rp o i n t e r
: s a v et h er e s u l ti nt h ed e s tv e c t o r
: r e s t o r es t a c kf r a m ep o i n t e r
e n d i f ;USE386
pop
POP
1018
di
si
Chapter 54
X entry
; r e s t o r er e g i s t e rv a r i a b l e s
POP
ret
-XformVec
endp
: r e s t o r es t a c kf r a m e
bp
~
"
"
~
"
"
"
"
"
"
y
"
~
"
"
"
~
M a t r i xm u l t i p l i e sS o u r c e X f o r m l
bySourceXformEand
s t o r e st h e
r e s u l t i n O e s t X f o r m .M u l t i p l i e s
a 4 x 4m a t r i xt i m e s
a 4x4 m a t r i x :
theresultis
a 4 x 4m a t r i x .C h e a t sb ya s s u m i n gt h eb o t t o mr o wo f
e a c hm a t r i xi s
0 0 0 1, and d o e s n ' tb o t h e rt os e tt h eb o t t o mr o w
o ft h ed e s t i n a t i o n .
C n e a r - c a l l a b l e as:
voidConcatXforms(XformSourceXforml.XformSourceXformE.
XformDestXform)
Thisassemblycode
i n t i.j ;
i se q u i v a l e n tt ot h i s
C code:
f o r( i - 0 :i < 3 :
i++)
{
f o r ( j - 0 : j < 3 : j++)
OestXform[il[jl
FixedMul(SourceXform1Cil[Ol, S o u r c e X f o r m 2 [ 0 1 [ j l ) +
FixedMul(SourceXforml[il[11. S o u r c e X f o r m 2 [ 1 1 [ j l ) +
FixedMul(SourceXformlCil~21, S o u r c e X f o r m E ~ E l ~ j l ) :
OestXformCilC31
SourceXform2[01[31) +
FixedMul(SourceXforml[i][ll.
SourceXform2[llC31) +
F i x e d M u l ( S o u r c e X f o r m l ~ i l ~ 2 1 ,SourceXform2C21C31) +
FixedMul(SourceXforml[il[Ol,
SourceXformlCil[31:
}
CXparms s t r u c
SourceXforml
SourceXformE
OestXform
CXparms ends
dw
dw
dw
dw
2 dup(?)
?
?
?
a1 i g n
ALIGNMENT
p u b l i c_ C o n c a t X f o r m s
nearproc
XoncatXforms
bp
push
mov
bp.sp
si
push
di
Dush
: r e t u r na d d r e s s & pushed BP
: p o i n t e rt of i r s ts o u r c ex f o r mm a t r i x
: p o i n t e r t o s e c o n ds o u r c ex f o r mm a t r i x
:pointertodestinationxformmatrix
: p r e s e r v es t a c kf r a m e
: s e t up l o c a ls t a c kf r a m e
: p r e s e r v er e g i s t e rv a r i a b l e s
i f USE386
mov
mov
mov
bx.[bpl.SourceXform2
si.[bpl.SourceXforml
di.[bpl.OestXform
:BX p o i n t s t o x f o r m 2 m a t r i x
:SI p o i n t s t o x f o r m l m a t r i x
: D I p o i n t st od e s tx f o r mm a t r i x
eax,[si+roffl
p[ bt rx + c o f f l
:row o f f s e t
;once f o r eachrow
:column o f f s e t
:once f o r each o f t h e f i r s t
3 columns.
: assuming 0 as t h eb o t t o me n t r y
(no
: translation)
;column 0 e n t r y on t h i s row
:timesrow
0 e n t r y i n column
roff-0
REPT 3
coff-0
REPT 3
mov
imul
dword
i f MUL-ROUNDING-ON
eax.8000h
add
adc
edx
,0
e n d i f :MUL-ROUNDING-ON
eax.edx.16
shrd
mov
eax
ecx,
mov
eax.[si+roff+41
imul
dword
p t r Cbx+coff+l61
i f MUL-ROUNDING-ON
eax.8000h
add
0
edx,
adc
e n d i f :MUL-ROUNDING-ON
eax.edx.16
shrd
ecx.eax add
mov
eax.[si+roff+81
im
dw
u[lopbrtxrd+ c o f f + 3 2 1
i f MUL-ROUNDING-ON
eax.8000h
add
edx,
adc
0
e n d i f ;MUL-ROUNDING-ON
eax.edx.16
shrd
ecx.eax add
mov
[di+coff+roff].ecx
coff-coff+4
: r o u n db ya d d i n g2 " ( - 1 7 )
:wholepartofresultisin
DX
;shifttheresultbackto
: s e tr u n n i n gt o t a l
16.16form
:column 1 e n t r y on t h i s row
1 entry i n col
:timesrow
: r o u n db ya d d i n g2 * ( - 1 7 )
:wholepartofresultisin
DX
: s h i f tt h er e s u l tb a c kt o
: r u n n i n gt o t a l
16.16form
;column 2 e n t r yo nt h i sr o w
:timesrow
2 entryincol
: r o u n db ya d d i n g2 " ( - 1 7 )
:wholepartofresultisin
DX
: s h i f tt h er e s u l tb a c kt o1 6 . 1 6f o r m
: r u n n i n gt o t a l
:save t h e r e s u l t i n d e s t m a t r i x
: p o i n tt on e x tc o li nx f o r m 2
& dest
ENDM
:now do t h e f o u r t h
column,assuming
: 1 as t h eb o t t o me n t r y ,c a u s i n g
: t r a n s l a t i o n t o b ep e r f o r m e d
mov
eax.[si+roffl
imul
dword
p[ bt rx + c o f f l
i f MUL-ROUNDING-ON
eax.8000h
add
0
edx,
adc
e n d i f ;MUL-ROUNDING-ON
eax.edx.16
shrd
mov
ecx,eax
mov
eax.[si+roff+41
imul
dword
[pbt xr + c o f f + l 6 1
i f MUL-ROUNDING-ON
8000h
eax,
add
edx,
adc
0
e n d i f ;MUL-ROUNDING-ON
e a x .sehdrxd, l 6
ecx.eax add
mov
eax.[si+roff+8]
imul
dword
[pbtxr + c o f f + 3 2 1
i f MUL-ROUNDING-ON
eax.8000h
add
edx.0 adc
e n d i f :MUL-ROUNDING-ON
eax.edx.16
shrd
eax
ecx,
add
e c x , C sai +drdo f f + l 2 ]
1020
Chapter 54
;column 0 e n t r y on t h i s row
0 e n t r y i n column
:timesrow
: r o u n db ya d d i n g2 ^ ( - 1 7 )
;wholepartofresultisin
: s h i f tt h er e s u l tb a c kt o
: s e tr u n n i n gt o t a l
DX
16.16form
;column 1 e n t r y on t h i s row
:timesrow
1 entryincol
: r o u n db ya d d i n gZ A ( - 1 7 )
:wholepartofresultisin
DX
: s h i f tt h er e s u l tb a c kt o1 6 . 1 6f o r m
: r u n n i n gt o t a l
:column2
e n t r y on t h i s row
:timesrow
2 entryincol
: r o u n db ya d d i n g2 ^ ( - 1 7 )
; w h o l ep a r t o f r e s u l t i s i n
; s h i f tt h er e s u l tb a c kt o
: r u n n i n gt o t a l
:add i n t r a n s l a t i o n
DX
16.16form
mov
Cdi+coff+roffl.ecx
coff-coff+4
roff-roff+l6
: s a v et h er e s u l ti nd e s tm a t r i x
:pointtonextcolin
xform2 & d e s t
; p o i n tt on e x tc o li n
xform2 & d e s t
ENDM
e l se : ! USE386
mov
mov
mov
bp push
di.[bp].SourceXformZ
si.[bpl.SourceXforml
bx.[bpl.DestXform
:SI p o i n t s t o x f o r m l m a t r i x
:BX p o i n t s t o d e s t x f o r m m a t r i x
: p r e s e r v es t a c kf r a m ep o i n t e r
roff-0
REPT 3
coff-0
REPT 3
push
push
push
push
push
call
: D I p o i n t st ox f o r m 2m a t r i x
bx
word p t rC s i + r o f f + 2 ]
word p t r [ s i + r o f f ]
word p t r[ d i + c o f f + 2 ]
word p t r[ d i + c o f f ]
-FixedMul
;row o f f s e t
:once f o r eachrow
;column o f f s e t
;once f o r each o f t h e f i r s t
3 columns,
: assuming 0 as t h eb o t t o me n t r y
(no
: translation)
:remember d e s t v e c t o r p o i n t e r
:column 0 e n t r y on t h i s rowtimesrow
: e n t r y i n column
add
mov
mov
sp.8
c x , a: sxreut n n i nt ogt a l
bp.dx
: c l e a rp a r a m e t e r sf r o ms t a c k
push
push
push
push
push
call
cx
word p t r[ s i + r o f f + 4 + Z ]
word p t r[ s i + r o f f + 4 ]
word p t r[ d i + c o f f + l 6 + 2 ]
word p t r[ d i + c o f f + l 6 ]
-FixedMul
: p r e s e r v el o ww o r do fr u n n i n gt o t a l
:column 1 e n t r y on t h i s rowtimesrow
column
: c l e a rp a r a m e t e r sf r o ms t a c k
: r e s t o r el o ww o r do fr u n n i n gt o t a l
; r u n n i n gt o t a lf o rt h i s
row
; entryin
add
add
adc
sp.8
cx
cx.ax
bp. dx
push
push
push
push
push
call
cx
word p t r[ s i + r o f f + 8 + 2 1
word p t r [ s i + r o f f + 8 ]
word p t r[ d i + c o f f + 3 2 + 2 ]
word p t r[ d i + c o f f + 3 2 ]
JixedMul
POP
add
:column 1 e n t r y on t h i s rowtimesrow
column
: c l e a rp a r a m e t e r sf r o ms t a c k
; r e s t o r el o w word o f r u n n i n g t o t a l
; r u n n i n gt o t a lf o rt h i s
row
: e n t r yi n
add
adc
sp.8
cx
cx.ax
bp.dx
POP
mov
mov
bx
Cbx+coff+roff].cx
[bx+coff+roff+21,bp
POP
: p r e s e r v el o ww o r do fr u n n i n gt o t a l
; r e s t o r e DestXForm p o i n t e r
; s a v et h er e s u l ti nd e s tm a t r i x
; p o i n tt on e x tc o li nx f o r m 2
coff-coff+4
ENDM
push
push
push
push
push
call
bx
word p t r [ s i + r o f f + 2 ]
word p t r [ s i + r o f f l
word p t r[ d i + c o f f + 2 1
word p t r [ d i + c o f f l
-F i xedMul
add
mov
mov
sp.8
c x . a; sxreut n n i nt ogt a l
bp. dx
push
push
push
push
push
call
cx
word p t r[ s i + r o f f + 4 + 2 1
word p t r[ s i + r o f f + 4 1
word p t r[ d i + c o f f + l 6 + 2 ]
word p t r [ d i + c o f f + l 6 ]
-F i xedMul
add
POP
add
adc
sp.8
cx
cx,ax
bp ,d x
push
push
push
push
push
call
cx
word p t r[ s i + r o f f + 8 + 2 1
word p t r [ s i + r o f f + 8 1
word p t r [di+coff+32+21
word p t r[ d i + c o f f + 3 2 ]
-FixedMul
add
POP
add
adc
sp,8
cx
cx, ax
bp.dx
:now do t h ef o u r t hc o l u m n ,a s s u m i n g
: 1 as t h eb o t t o me n t r y ,c a u s i n g
; t r a n s l a t i o n t o beperformed
;remember d e s t v e c t o r p o i n t e r
;column 0 e n t r y on t h i s r o wt i m e s
; e n t r y i n column
; c l e a rp a r a m e t e r sf r o ms t a c k
; p r e s e r v el o ww o r d
;column 1 e n t r y on t h i s r o wt i m e s
; e n t r y i n column
; c l e a rp a r a m e t e r sf r o ms t a c k
; r e s t o r el o ww o r do fr u n n i n gt o t a l
; r u n n i n gt o t a lf o rt h i s
row
:column 1 e n t r y on t h i s r o wt i m e sr o w
: e n t r y i n column
: c l e a rp a r a m e t e r sf r o ms t a c k
; r e s t o r el o ww o r do fr u n n i n gt o t a l
row
; r u n n i n gt o t a lf o rt h i s
;add i n t r a n s l a t i o n
POP
bx
[bx+coff+roffl,cx
[bx+coff+roff+2].bp
; r e s t o r e DestXForm p o i n t e r
; s a v et h er e s u l ti nd e s tm a t r i x
roff-roff+l6
; p o i n tt on e x tc o li nx f o r m 2
& dest
; p o i n tt on e x tc o li nx f o r m 2
& dest
ENDM
POP
bp
; r e s t o r es t a c kf r a m ep o i n t e r
di
si
bv
: r e s t o r er e g i s t e rv a r i a b l e s
e n d i f ;USE386
POP
POP
POP
ret
-ConcatXforms
end
1022
Chapter 54
endp
row 1
; p r e s e r v el o ww o r do fr u n n i n gt o t a l
cx.Csi+roff+lZl
bp.[si+roff+l2+21
coff-coff+4
row 0
o f runningtotal
add
add
mov
mov
& dest
: r e s t o r es t a c kf r a m e
Shading
So far, the polygons out of which our animated objects have been built have had
colors of fixed intensities. For example, aface of a cube might
be blue, or green, or
white, but whatever color it is, that color never brightens or dims. Fixed colors are
easy to implement, butthey dont make for very realistic animation. In the
real world,
the intensity of the color of a surface varies depending on how brightly it is illuminated. The ability to simulate the illumination of a surface, or shading, is the next
feature well add to X-Sharp.
The overall shading of an object is the sum of several typesof shading components.
Ambient shadingis illumination by what youmight think of asbackground light, light
thats coming from all directions; all surfaces are equally illuminated by ambient
light, regardless of their orientation.Directed lighting, producing diffuse shading, is
illumination from one or more specific light sources. Directed light has a specific
direction, and theangle at which it strikes a surface determines how brightly it lights
that surface. Specular reJection is the tendency of a surface to reflect light ina mirrorlike fashion. There are othersorts of shading components, includingtransparency
and atmospheric effects, but the ambient and diffuse-shading components are all
were going to deal with in X-Sharp.
Ambient Shading
The basic model for both ambient
and diffuse shading is a simple one. Each surface
has a reflectivity between 0 and 1,where 0 means alllight is absorbed and 1means all
light is reflected. A certain amount of light energy strikes each surface. The energy
(intensity) of the light is expressed such that if light of intensity 1 strikes a surface
with reflectivity 1, then the brightest possible shading is displayed for that surface.
Complicating this somewhat is the need to support color; we do this by separating
reflectance and shading into three components each-red, green, and blue-and
calculating the shading for each color component separately for eachsurface.
Given an ambient-light red intensity of Ured
and a surface red reflectance Rred,the
displayed red ambient shading for thatsurface, as a fraction of the maximum red
intensity, is simply min(IAredxRred,1).The green and blue color components are
handled similarly.Thats really allthere is to ambient shading, although of course we
must design some way to map displayed color components into the
available palette
of colors; Ill do that in the next chapter.Ambient shading isnt the whole shading
picture, though. Infact, scenes tend to look pretty bland without diffuse shading.
Diffuse Shading
Diffuse shading is more complicated than ambient shading, because the effective
intensity of directed light falling on asurface depends onthe angle at which it strikes
the surface. According to Lamberts law, the light energy from a directed light
source
3-D Shading
1023
striking a surface is proportional to the cosine of the angle at which it strikes the
surface, with the angle measured relative to a vector perpendicular to the polygon (a
polygon normal), as shown in Figure 54.1. If the red intensity of directed light is
IDred,the red reflectance of the surface is Rred,and the angle between the incoming
directed light and the surfaces normal is theta, then thedisplayed red diffuse shading for that surface, as a fraction of the largest possible red intensity, is min
(1D,edxRre,xcos(8),
)*
Thats easy enough to calculate-but seemingly slow. Determining the cosine of an
angle can be sped up with a table lookup, but theres alsothe task of figuring out the
angle, and, all in all, it doesnt seem that diffuse shading is going to be speedyenough
for our purposes. Consider this, however: According to the properties of the dot
product (denoted by the operator
as shown in Figure 54.2),
cos(B)=(vw)/ IvI X IwI ) ,
where v and w are vectors, 8 is the angle between v and w, and Ivl is the length of v.
Suppose, now, that v and w are unit vectors; that is, vectors exactly one unit long.
Then the above equation reduces to cos(e)=v.w. In other words, we can calculate
the cosine between N, the unit-normal vector (one-unit-long perpendicular vector)
of a polygon, and L, the reverse of a unit vector describing the direction of a light
source, with just three multiplies and two adds. (Ill explain why the lightdirection
vector must be reversed later.) Once we have that, we can easily calculate the red
.,
LightfromdirectedPolygonnormal(perpendicularvector)
illumination source
D, of energy E.
Figure 54.2
1024
Chapter 54
w is:
w = v,w,+ yw,+ v,w,
LISTING 54.2
/*
DRAWP0BJ.C
Draws a l l v i s i b l e f a c e s i n t h e s p e c i f i e d p o l y g o n - b a s e d o b j e c t .
The o b j e c t
m u s th a v ep r e v i o u s l yb e e nt r a n s f o r m e d
and p r o j e c t e d , so t h a t all v e r t e x
a r r a y sa r ef i l l e di n .A m b i e n t
and d i f f u s es h a d i n ga r es u p p o r t e d .
*/
D
in c l u d e " p o l y g o n .h"
v o i dD r a w P O b j e c t ( P 0 b j e c t
ObjectToXform)
i n t i. j . NumFaces
O b j e c t T o X f o r m - > N u m F a c e s . NumVertices:
i n t * VertNumsPtr,Spot;
Face * FacePtr
ObjectToXform->FaceList:
P o i n t * Screenpoints = ObjectToXform->ScreenVertexList;
P o i n t L i s t H e a d e rP o l y g o n :
F i x e d p o i n tD i f f u s i o n :
Model Col o r Col orTemp:
M o d e l I n t e n s i t yI n t e n s i t y T e m p :
P o i n t 3U n i t N o r m a l ,* N o r m a l S t a r t p o i n t .* N o r m a l E n d p o i n t :
v2, w l . w2:
l o n g VI.
P o i n t VerticesCMAX-POLY-LENGTH];
/*
D r a w each v i s i b l e f a c e ( p o l y g o n ) o f t h e o b j e c t i n t u r n
*/
f o r (i=Oi<NumFaces;
:
i++.
FacePtr++) I
/ * Remember where we can f i n d t h e s t a r t andend o ft h ep o l y g o n ' s
u n i tn o r m a li nv i e ws p a c e ,
and s k i po v e rt h eu n i tn o r m a le n d p o i n t
e n t r y . Theendand
s t a r tp o i n t so ft h eu n i t
normal t ot h ep o l y g o n
must be t h e f i r s t andsecond
e n t r i e si nt h ep o l g y o n ' sv e r t e xl i s t .
*/
N o t et h a tt h e
second p o i n t i s a l s o
an a c t i v ep o l y g o nv e r t e x
VertNumsPtr = FacePtr->VertNums:
&ObjectToXform->XformedVertexList[*VertNumsPtr++l:
NormalEndpoint
NormalStartpoint = &ObjectToXform->XformedVertexListC*VertNumsPtrl:
/ * Copy o v e r t h e f a c e ' s v e r t i c e s f r o m t h e v e r t e x l i s t
*/
NumVertices = FacePtr->NumVerts:
j++)
f o r( j - 0 :j < N u m V e r t i c e s ;
VerticesCjl
ScreenPointsC*VertNumsPtr++l:
I* D r a w o n l y i f o u t s i d ef a c es h o w i n g
( i f t h en o r m a lt ot h ep o l y g o n
i n s c r e e nc o o r d i n a t e sp o i n t st o w a r dt h ev i e w e r :t h a ti s .
has a
p o s i t i v e 2 component) */
vl
VerticesC11.X - VerticesCO1.X:
w l = Vertices[NumVertices-1l.X
- VerticesCO1.X:
v2 = VerticesC11.Y - VerticesCO1.Y;
w2 = V e r t i c e s C N u r n V e r t i c e s - l 1 . Y - VerticesCO1.Y;
i f ( ( v l * w 2 - v2*wl) > 0 ) [
/ * It i s f a c i n g t h e s c r e e n ,
s o draw * /
3-D Shading
1025
/*
A p p r o p r i a t e l ya d j u s tt h ee x t e n to ft h er e c t a n g l eu s e dt o
e r a s et h i so b j e c tl a t e r
*/
f o r( j - 0 :j < N u m V e r t i c e s ;
j++) {
i f (Vertices1jl.X >
ObjectToXform->EraseRect[NonOisplayedPagel.Right~
i f (VerticesCj1.X
<
SCREEN-WIDTH)
ObjectToXform->EraseRectCNonDisplayedPagel.Right
VerticesCj1.X;
e l s e ObjectToXform->EraseRect[NonDisplayedPagel.Right =
SCREEN-WIDTH:
i f (VerticesCj1.Y >
ObjectToXform->EraseRect[NonOisplayedPagel.Bottom~
i f ( V e r t i c e s C j 1 . Y < SCREEN-HEIGHT)
ObjectToXform->EraseRect[NonDisplayedPagel.Bottom
Vertices[j].Y:
e l s e ObjectToXform->EraseRect[NonDisplayedPagel.BottomSCREEN-HEIGHT;
i f (VerticesCj1.X <
ObjectToXform->EraseRect[NonDisplayedPagel.Left)
i f (VerticesCj1.X
>
0)
ObjectToXform->EraseRect[NonOisplayedPagel.Left
VerticesCj1.X:
e l s e ObjectToXform->EraseRectCNonOisplayedPagel.Left-O:
i f (VerticesCj1.Y <
ObjectToXform->EraseRect[NonDisplayedPagel.Top)
i f (VerticesCj1.Y > 0)
ObjectToXform->EraseRect[NonDisplayedPagel.Top
VerticesCj1.Y:
e l s e ObjectToXform->EraseRectCNonDisplayedPagel.Top-O:
/*
I* Do d i f f u s es h a d i n g ,
i f enabled * /
i f (FacePtr->ShadingType & DIFFUSE-SHADING) {
/ * C a l c u l a t et h eu n i tn o r m a lf o rt h i sp o l y g o n ,f o ru s ei nd o t
products * I
UnitNorma1.X
NormalEndpoint->X - N o r m a l S t a r t p o i n t - > X ;
NormalEndpoint->Y - N o r m a l S t a r t p o i n t - > Y :
UnitNorma1.Y
NormalEndpoint->2 - N o r m a l S t a r t p o i n t - > Z :
UnitNormal.2
/ * C a l c u l a t et h ed i f f u s es h a d i n g
component f o r e a c ha c t i v e
s p o t l i g h t */
f o r (Spot-0: Spot<MAX-SPOTS: Spot++) I
i f (SpotOnCSpotl !- 0 ) I
/* Spot i s o n . so sum. f o r each c o l o r component, t h e
i n t e n s i t y ,a c c o u n t i n gf o rt h ea n g l eo ft h el i g h tr a y s
r e l a t i v et ot h eo r i e n t a t i o no ft h ep o l y g o n
*/
I* C a l c u l a t ec o s i n eo fa n g l eb e t w e e nt h el i g h t
and t h e
i f s p o ti ss h i n i n gf r o mb e h i n d
p o l y g o nn o r m a l :s k i p
t h ep o l y g o n * /
--
1026
Chapter 54
i f ((Diffusion
DDT~PRODUCT(SpotDirectionView[Spotl,
UnitNorma1 1) > 0 ) I
1ntensityTemp.Red +=
FixedMul(SpotIntensity[Spotl.Red. D i f f u s i o n ) ;
1ntensityTemp.Green +FixedMul(SpotIntensity[Spotl.Green, D i f f u s i o n ) ;
1ntensityTemp.Blue +FixedMul(SpotIntensity[Spotl.Blue. D i f f u s i o n ) :
1
1
/* C o n v e r tt h ed r a w i n gc o l o rt ot h ed e s i r e df r a c t i o no ft h e
b r i g h t e s tp o s s i b l ec o l o r
*/
&IntensityTemp);
I* Draw w i t ht h ec u m u l a t i v es h a d i n g ,c o n v e r t i n gf r o mt h eg e n e r a l
c o l o rr e p r e s e n t a t i o nt ot h eb e s t - m a t c hc o l o ri n d e x
*/
DRAWKPOLYGON(Vertices. NumVertices,
ModelColorToColorIndex(&ColorTemp). 0. 0):
Polygon
3-D Shading
1027
Reversed unit
vector L toward
Light from directed illumination sourceD, directed
light
i
of energy E, with
direction
expressed
the unit vector L
Polygon unit
by
Polygon surface
Figure 54.4
second point, however, is a polygon vertex. Calculating the difference vector between the first and second points yields the polygons unit normal. Adding a
unit-normal endpoint to each polygon isnt free; each of those end-points has to be
transformed, along with the rest of the vertices, and that takes time. Still, its faster
than calculating a unit normal for each polygon from scratch.
We also need a unitvector for each directed light source. The directed light sources
Ive implemented in X-Sharp are spotlights; that is, theyre considered to be point
light sources that are infinitely far away. This allows the simplifylng assumption that
all light rays from a spotlight are parallel and of equal intensity throughout the displayed universe, so each spotlight can be represented with a single unit vector and a
single intensity. The only trick is that in order to calculate the desired cos(theta)
between the polygon unit normal and a spotlights unit vector, the direction of the
spotlights unit vector must be reversed, as shown in Figure 54.4.This is necessary
because the dot product implicitly places vectors withtheir start points at the same
location when its usedto calculate the cosine of the angle between two vectors. The
light vector is incoming to the polygon surface, and the unit normal is outbound, so
only by reversing one vector or the otherwill we get the cosine of the desired angle.
Given the two unit vectors, its a piece of cake to calculate intensities, as shown in
Listing 54.2.The sample program DEMO1, in the X-Sharp archive on the listings
disk (built by running K1 .BAT), puts the shading code to work displaying a rotating
ball with ambient lighting and three spot lighting sources that the user can turn on
and off. What youll seewhen you run DEMO1 is that the shading is very good-face
colors change very smoothly indeed-so long as only green lighting sources are on.
However, if you combine spotlight two, which is blue, with any other light source,
polygon colors will start to shift abruptly and unevenly. As configured in the demo,
the palette supports a wide range of shading intensities for apure version of anyone
of the threeprimary colors, but avery limited number of intensity steps (four, in this
1028
Chapter 54
Previous
Home
Next
case) for each color componentwhen two or more primary colors are mixed. While
this situation can be improved, it is fundamentally a result of the restricted capabilities of the 256-color palette, and there is only so much that can be done without a
larger color set. In the nextchapter, Ill talk about some ways to improve the quality
of 256-color shading.
3-D Shading
1029
Previous
chapter 55
Home
Next
1033
A Color Model
Weve been developing X-Sharp for several chapters now. In the previous chapter,
we added illumination sourcesand shading; that additionmakes it necessary for us
to have a general-purpose color model,
so that we can display the gradationsof color
intensity necessary to render illuminated surfacesproperly. In other words, when a
bright lightis shining straightat a green surface,
we need tobe able todisplay bright
green, and as that light dims or tilts to strike the surface at a shallower angle, we
need to be able to display progressivelydimmer shadesof green.
The first thing to do is to select a color model in which to perform our shading
calculations. Ill use the dot product-based stuff I discussed in the previous chapter.
The approach well take is to select an ideal representation of the full color space
and do our calculations there,as if we really could display every possiblecolor; only
as a final step will wemap each desired color into the limited 25kolor set of the VGA, or
the color range of whateveradapter we happen to be working
with. There are a number
of color models that we might choose to
work with,but Im going togo with the one
thats bothmost familiar and, in my opinion, simplest: RGB (red, green, blue).
In the RGB model, a given color is modeled as the mix of specific fractions of full
intensities of each of the three color primaries.
For example, the brightestpossible
pure blue is O.O*R,O.O*G, l.O*B. Half-bright cyan is O.O*R, 0.5*G, 0.5*B. Quarterbright gray is 0.25*R, 0.25*G, 0.25B. You can think of RGB color space as being a
cube, as shown in Figure 55.1, with any particular color lying somewhere inside or
on the cube.
Cyan
Green
Red
Increasing
red intensity
1034
Chapter 55
/
Yellow
Increasing
green intensity
/*
255
/ * 255
/ * 255
=
=
=
rnax r e d , 0 = n or e d * I
rnax g r e e n , 0 = n og r e e n */
rnax b l u e . 0 = n o b l u e * I
Here, each color is described by three color components-one each for red, green,
and blue-and each primary color component is represented by eight bits. Zero
intensity of a color component is represented by the value 0, and full intensity is
represented by the value 255.This gives us 256 levels ofeach primary color component, anda total of 16,772,216 possible colors.
Holy cow! Isn't 16,OOO,OOO-pluscolors a bit of overkill?
Actually, no, it isn't. At the eighth Annual Computer Graphics Show in New York,
Sheldon Linker, of Linker Systems, related an interesting tale about color perception research at the Jet
Propulsion Lab backin the '70s. The JPL color research folks
had the capability to print more than50,000,000 distinct and very precise colors on
each word printed
paper. As a test, theytried printing out words in various colors, with
on a background that differed by only one color index from the word's color. No
one expected the human eye to be able to differentiate between two colors, out of
5O,OOO,OOO-plus,that were so similar. It turned out, though, that
everyone could read
the words with no trouble at all; the human eye is surprisingly sensitive to color
gradations, and also happens to be wonderful at detecting edges.
When the JPL team went to test the eye's sensitivity to color on the screen, they
found that only about 16,000,000 colors could be distinguished, because the colorsensing mechanism of the human eye is more compatible with reflective sources
such as paper andink than with emissive sources such as CRTs. Still, the human eye
can distinguish about 16,000,000 colorson the screen.That's not so hard tobelieve,
if you think about it; the eye senses each primary color separately, so we're really only
talking about detecting 256 levels of intensity per primary here. It's the brain that
does the amazing part; the 16,OOO,OOO-pluscolor capability actuallycomes not from
extraordinary sensitivity in the eye, but rather from the brain's ability to distinguish
between all the mixes of 256 levels ofeach of three primaries.
Color Modeling in 256-Color Mode
1035
So it's perfectly reasonable to maintain24 bits of color resolution,and X-Sharp represents colors internally
as ideal, device-independent24bit RGB triplets. All shading
calculations are performed on these triplets, with 24bit color precision. It's only
after the final24bit RGB drawing color is calculated that thedisplay adapter's color
capabilities come into play, as the X-Sharp function ModelColorToColorIndex()is
called to map the desiredRGB color to the closest match the adapter is capable of
displaying. Of course, that mapping is adapter-dependent. On a24bpp device, it's
pretty obvious how the internal RGB color format maps to displayed pixel colors:
directly. On VGAs with 15-bpp Sierra HicolorDACS, the mapping is equally simple,
with the five upper bits of each color component mapping straight display
to
pixels.
But how on earth do we map those 16,OOO,OOO-plus RGB colors into the 256-color
space of a standardVGA?
This is the "color definition" problem I mentioned at the startof this chapter. The
VGA palette is arbitrarily programmable to any set of 256 colors, with each color
defined by six bits each of red, green, and blueintensity. In X-Sharp, the function
InitializePaletteO can be customized to set up the palettehowever we wish; this gives
us nearly complete flexibility in defining the working color set. Even with infinite
selecflexibility, however, 256
out of 16,000,000or so possible colors is a pretty puny
tion. It's easy to set up the palette to give yourself a good selection of just blue
intensities, or ofjustgreens; but for general
color modeling there'ssimply not enough
palette to go around.
One way to deal with the limited simultaneous color capabilities of the VGA is to
build an application thatuses onlya subsetof RGBspace, then bias the VGA's palette
toward that subspace. Thisis the approach used in the DEMOl sample program in
X-Sharp; Listings 55.2 and 55.3 show the versions of Initializepalette0 and
ModelColorToColorIndex()that set up andperform the color mapping for DEMOl.
LISTING55.2155-2.C
/*
S e t su pt h ep a l e t t ei n
mode X , t o a 2 - 2 - 2 g e n e r a l R - G - B o r g a n i z a t i o n , w i t h
64 s e p a r a t el e v e l se a c ho fp u r er e d ,g r e e n ,a n db l u e .T h i si sv e r yg o o d
f o rp u r ec o l o r s ,b u tm e d i o c r ea tb e s tf o rm i x e s .
....."""""""""~
10
R e d l G r e e nBl l u e
""""""""""""
""""""""""""
10
Red
""""""""""""
""""""""""""
11 0
Green
"""""""..........
1036
Chapter 55
_______"_."...........
I1 1
B1 ue
"""""""""""..
7 6 5 4 3 2 1 0
53, 63 I ;
29, 31. 32.
44. 44, 45.
53, 54. 54.
61. 62, 62.
33.
46.
55.
63.
34.
46,
55.
63.
void InitializePaletteO
-- Gamma64Levels[Redl;
=
0:
0:
PaletteBlock[l2B+Green1[2]
- 0:
- Gamma64Levels[Greenl:
- 0:
{
-- Gamma64Level
{
0:
0:
sCBl : uel
1037
- 256:
I* I o f DAC l o c a t i o n s t o 1 oad * I
( u n s i g n e d i n t ) P a l e t t e B l o c k ; I* o f f s e t o f a r r a y f r o m w h i c h
t o l o a d RGB s e t t i n g s * I
sregset.es
-DS; I* segment o f a r r a y f r o m w h i c h t o l o a d s e t t i n g s
*I
i n t 8 6 x ( O x l O .& r e g s e t .& r e g s e t .& r e g s e t ) ;
I* l o a dt h ep a l e t t eb l o c k
*I
regset.x.cx
regset.x.dx
LISTING 55.3
155-3.C
/ * C o n v e r t s a m o d e lc o l o r( ac o l o ri nt h e
RGB c o l o rc u b e ,i nt h ec u r r e n t
a c o l o ri n d e xf o r
mode X . P u r ep r i m a r yc o l o r sa r e
c o l o rm o d e l )t o
s p e c i a l - c a s e d ,a n de v e r y t h i n ge l s ei sh a n d l e db y
a 2 - 2 - 2 model. * I
in t Model Col orToCol orIndex(Mode1 Col or* C o l o r )
i f (Color->Red
0) {
i f (Color->Green
0) {
/ * P u r eb l u e * I
r e t u r n ( l 9 2 + ( C o l o r - > B l u e >> 2 ) ) ;
1 else i f (Color->Blue
0) {
I* Puregreen * I
return(l28+(Color->Green
1 e l s e i f ((Color->Green
/*
P u r er e d * I
return(64+(Color->Red
>>
2));
0 ) && ( C o l o r - > B l u e
>>
2));
- 0))
I* M u l t i - c o l o r m i x ; l o o k u p t h e i n d e x w i t h t h e t w o m o s t s i g n i f i c a n t b i t s
o fe a c hc o l o rc o m p o n e n t
*I
r e t u r n ( ( ( C o 1 o r - > R e d & OxCO)
( ( C o l o r - > B l u e & OxCO)
>>
>>
2)
6));
>>
4)
1038
Chapter 55
1039
for each byte of font data: once to draw the background box, once to read display
memory to load the latches, and once to actually draw the font pattern. Display
of accesses asmuch as
memory is incredibly slow, so wed liketo reduce the number
possible. Withthe BitMans approach, we can reduce the numberof accesses to just
one per fontbyte, and eliminate flicker, too.
The keys to fast solid text are the latches and write mode 3. The latches, as you may
recall from earlierdiscussions in this book, are four internal
VGA registers that hold
the last bytes read from theVGAs four planes; every read fromVGA memory loads
the latches with the values stored at that display memory address across the four
planes. Whenever a write is performed to VGA memory, the latches can provide
some, none, orall of the bits written to memory, depending on the bitmask, which
selects between the latched data and the
drawing data on a bit-by-bit basis.The latches
solve half our problem; we can fill the latches with the background color, then use
them to draw the background box. The trick now is drawing the text pixels in the
foreground color at the
same time.
This is where it gets a little complicated. In write mode 3 (which incidentally is not
available on theEGA) , each byte valuethat the CPU writes to the VGA does not get
written to display memory. Instead, it turns into thebit mask. (Actually, its ANDed
with the Bit Maskregister, and theresult becomes the bitmask, but well leavethe Bit
Mask register set to OxFF, so the CPU value will become the bit mask.) The bit mask
selects, on a bit-by-bit basis, between the data in the latches for each plane (the
previously loaded background color, in this case) and the foregroundcolor. Where
does the foreground color come from, if not from the CPU? From the Set/Reset
register, as shown in Figure 55.3. Thus, each byte written by the CPU (font data,
presumably) selects foreground or background color for each of eight pixels, all
done with a single write to display memory.
1040
Chapter 55
I Bit-maskRegister 1
Byte written to
I I
1
I
OxFF; a 0
TI
bit where
bit-mask is 0;
set/reset bit
where bit-mask
J-
Selects latch
bit where
bit-mask is 0;
set/reset bit
where bit-mask
bit is I .
Eight bits
(Assumes
Map Mask is written to
display
OXOF, so all
memory
planes are
V
written.)
bit where
bit-mask is 0;
set/reset bit
where bit-mask
bit is I .
Eight bits
written to
display
memory
Memory
L
set/reset bit
where bit-mask
bit is I .
Eight bits
written to
display
memory
c
Memory
Eight bits
written to
display
memory
J-
Memory
I041
LISTING 55.4
155-4.ASM
: D e m o n s t r a t e sd r a w i n gs o l i dt e x t
: 3 - b a s e d o. n e - p a s st e c h n i q u e .
CHAR-HEIGHT
SCREEN-HEIGHT
SCREENLSEGMENT
FGLCOLOR
BG-COLOR
GC-INDEX
SETLRESET
G-MODE
BIT-MASK
.model
.stack
.data
on t h e VGA. u s i n gt h eB i t M a n ' sw r i t e
mode
:# o fs c a nl i n e sp e rc h a r a c t e r( m u s tb e< 2 5 6 )
equ 8
equ480
equ OaOOOh
e q u1 4
equ 1
equ3ceh
equ 0
equ 5
equ 8
:# o fs c a nl i n e sp e rs c r e e n
: w h e r es c r e e n
memory i s
:text col or
: b a c k g r o u n db o xc o l o r
: G r a p h i c sC o n t r o l l e r
( G C ) I n d e xr e g 1/0 p o r t
: S e t / R e s e tr e g i s t e ri n d e xi n
GC
: G r a p h i c s Mode r e g i s t e r i n d e x i n
GC
: B i t Mask r e g i s t e r i n d e x i n
GC
smal 1
200h
:currentline #
dw ?
Line
:# o fs c a nl i n e si ne a c hc h a r a c t e r( m u s tb e< 2 5 6 )
dw ?
CharHeight
will f i t o ns c r e e n
:max # o f s c a n l i n e s o f t e x t t h a t
dw ?
MaxLines
: o f f s e tf r o mo n es c a nl i n et ot h en e x t
dw ?
LineWidthBytes
; p o i n t e rt of o n tw i t hw h i c ht od r a w
dd ?
FontPtr
lbavbteel
Samplestring
db 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
db ' a b c d e f g h i j k l m n o p q r s t u v w x y z '
db ' 0 1 2 3 4 5 6 7 8 9 ! @ ~ ~ S % A & * 0 . < . > / ? ; : ' . 0
.code
start:
mov
mov
ax.@data
ds.ax
mov
int
ax, 12h
10h
1 6 6- c4o0lx:os4re8l0e c t
mov
ah.llh
mov
a1 .30h
8 x 8mov: g e t b h . 3
t hnpetotihon: get1e0rt h
i
mov
w o r dp t rC F o n t P t r 1 . b ~
mov
w o r dp t rC F o n t P t r + E l . e s
mov
mov
mov
sub
div
mu1
mov
1042
Chapter 55
bx.CHAR-HEIGHT
CCharHeight1.b~
ax.SCREEN-HEIGHT
dx, dx
bx
bx
[ M,naaexxsLli
mode
ROM
s u b sf ou nb tf u n c t i o n
B I O S 8x8 f o n t
:#socflai npeceshra r a c t e r
t htaetx to:max
l fisncefausn#l l o f
: will f i t on s tchr ee e n
mov
ah.0fh
i n: tg e t 1 0 h
mov
cvional ;rubci m
ayoa1
btnelsv,eae hr t
ah.ahsub
mov
C L i n e W i d t h B y t e s 1:b,w
saylciioixtndaneftnesh
AX
: n o wd r a wt h et e x t
yet
bx.bx
sub
l i sn cemov
a na: ts t a[ bLr txi n e ] ,
LineLoop:
c o la;uasxm
stt.uaanbrxt
mov
c h , FG-COLOR
mov
c l .BG-COLOR
mov . os if f sSeat m p l e s t r i :nt gdet rxoat w
D r acwastTlael em
xxttpht:Sdleetr rai w
ng
mov
b x . I: L i n e ]
baxd. [dC h a r H e i g h t ]
mov
CLine1.b~
y e;td?cmp
o n[eMsa] xbLxi ,
: nj obt
LineLoop
0; be
must
ax.03h
10h
mov
int
ah.4ch
21h
:#
ndseorlctxiafonantwne
mov ah.7
fi on r:t w a i2t 1 h
mov
:back
int
a m u l toi fp l e
t o tderxatw
w h: ci cno hl o r
db rawacth:w
okci gcnorhl oburonx d
ewcipht rhoeaosukst e, y
text
to
to
mode
DOS
;exit
: Draws a t e x t s t r i n g .
: I n p u t : AX = X c o o r d i n a t e a t w h i c h t o d r a w u p p e r - l e f t c o r n e r o f f i r s t c h a r
:
BX
Y c o o r d i n a t ea tw h i c ht od r a wu p p e r - l e f tc o r n e ro f i r s tc h a r
:
CH = f o r e g r o u n(dt e x ct )o l o r
:
CL
b a c k g r o u n( db o xc )o l o r
:
DS:SI
p o i n t e tr os t r i n gt od r a w z, e r ot e r m i n a t e d
:
C h a r H e i g hm
t u s bt es e t ot h eh e i g h ot ef a c hc h a r a c t e r
:
FontPtm
r u s bt es e t ot h ef o n w
t i t hw h i c ht od r a w
L i n e W i d t h B y t e sm u s tb es e tt ot h es c a nl i n ew i d t hi nb y t e s
: D o n ' tc o u n t
on a n yr e g i s t e r so t h e rt h a n
DS. SS. and SP b e i n gp r e s e r v e d .
: The X c o o r d i n a t e i s t r u n c a t e d t o
a m u l t i p l e o f 8. C h a r a c t e r sa r e
: assumed t o be 8 p i x e l sw i d e .
align 2
D r a w T e x t Snpterriaonrcg
c ld
a sx h. 1r
; b y t ea d d r e s so fs t a r t i n g
X w i t h i ns c a nl i n e
a sx h. 1r
asxh. 1r
mov
d, ai x
mov
ax.CLineWidthBytes1
mu1
bx
;startoffsetofinitial
s c a nl i n e
add
d, ai x
;startoffsetofinitial
byte
mov
ax.SCREENKSEGMENT
mov
es.ax
offsetofinitialcharacter's
;ES:DI
: f i r s ts c a nl i n e
: s e t up t h e V G A ' s h a r d w a r e s o t h a t we c a n
: fill t h el a t c h e sw i t ht h eb a c k g r o u n dc o l o r
mov
dx,GC-INDEX
mov
a x . ( O f f h SHL 8 ) + BIT-MASK
doxu. at x
: s e t B i t Mask r e g i s t e r t o OxFF ( t h a t ' s t h e
: d e f a u l t ,b u t I ' m d o i n g t h i s j u s t t o
make s u r e
1043
Previous
mov
d ox u. at x
mov
mov
dox u. at x
mov
a x , ( 0 0 3 h SHL 8 )
; y o uu n d e r s t a n dt h a tB i t
Mask r e g i s t e ra n d
; CPU d a t aa r e
ANDed i n w r i t e mode 3 )
G-MODE
ah.cl
a1 .SET-RESET
b y t pe ter s : [ O f f f f h l . O f f h
mov
c. le s : [ O f f f f h l
mov
out
ah.ch
dx.ax
DrawTextLoop:
1o d s b
and
a1 .a1
jz
DrawTextDone
p u sdhs
p u sshi
p u sdhi
mov
dx,[LineWidthBytesl
dx
dec
mov
cx.CCharHeight1
mu1
cl
I d ss. iC F o n t P t r l
add
s i ,ax
DrawCharLoop:
movsb
add d i . d x
l o oDpr a w C h a r L o o p
pop
di
di ni c
pop
si
POP
ds
Dj rmapw T e x t L o o p
align 2
DrawTextDone:
mov
dx.GC-INDEX
mov
a x , ( 0 0 0 h SHL 8 ) + G-MODE
d ox u. at x
ret
D r a w T e x t Set nr idnpg
setnadr t
1044
Chapter 55
Home
; s e l e c t w r i t e mode 3
; b a c k g r o u n dc o l o r
; s e tt h ed r a w i n gc o l o rt ob a c k g r o u n dc o l o r
; w r i t e 8 p i x e l so ft h eb a c k g r o u n d
; c o l o rt ou n u s e do f f - s c r e e n
memory
; r e a dt h eb a c k g r o u n dc o l o rb a c ki n t ot h e
; l a t c h e s ;t h el a t c h e sa r e
now f i l l e d w i t h
; t h eb a c k g r o u n dc o l o r .T h ev a l u ei n
CL
; d o e s n ' tm a t t e r ,
we j u s tn e e d e d a t a r g e t
: f o rt h er e a d ,
s o we c o u l dl o a dt h el a t c h e s
; f o r e g r o u n dc o l o r
: s e tt h eS e t / R e s e t( d r a w i n g )c o l o rt ot h e
; f o r e g r o u n dc o l o r
; w e ' r er e a d yt od r a w !
; n e x tc h a r a c t e rt od r a w
;end o f s t r i n g ?
;yes
;remember s t r i n g ' s segment
;remember o f f s e t o f n e x t c h a r a c t e r i n s t r i n g
: r e m e m b e rd r a w i n go f f s e t
; l o a dt h e s ev a r i a b l e sb e f o r e
we w i p e o u t DS
: o f f s e tf r o mo n el i n et on e x t
; c o m p e n s a t ef o r STOSB
; o f f s e to fc h a r a c t e r
i n f o n tt a b l e
;pointtofonttable
:pointtostartofcharactertodraw
; t h ef o l l o w i n gl o o ps h o u l db eu n r o l l e df o r
; maximum p e r f o r m a n c e !
;draw a l l l i n e s o f t h e c h a r a c t e r
; g e tt h en e x tb y t eo ft h ec h a r a c t e ra n dd r a w
; c h a r a c t e r ;d a t a
i s ANDed w i t h B i t Mask
: r e g i s t e r t o become b i t m a s k ,a n ds e l e c t s
; b e t w e e nl a t c h( c o n t a i n i n gt h eb a c k g r o u n d
; c o l o r )a n dS e t / R e s e tr e g i s t e r( c o n t a i n i n g
; f o r e g r o u n dc o l o r )
;pointtonextlineofdestination
; r e t r i e v ei n i t i a ld r a w i n go f f s e t
; d r a w i n go f f s e tf o rn e x tc h a r
; r e t r i e v eo f f s e to fn e x tc h a r a c t e ri ns t r i n g
; r e t r i e v e s t r i n g ' s segment
; d r a wn e x tc h a r a c t e r ,
i f any
; r e s t o r et h eG r a p h i c s
Mode r e g i s t e r t o i t s
; defaultstateofwrite
mode 0
;selectwrite
mode 0
Next
Previous
chapter 56
Home
Next
1047
Americas withher mountain lion during thelast Ice Age; or theMars Cats and their
trip in suspended animation to the Lesser Magellenic Cloud and beyond; or most
assuredly Little Whale, the baby Universe Whale that is naughty enough to eat inhabited universes. But I digress.)
Anyway, I bring up Pooh andthe space station because the time has come todiscuss
fast texture mapping. Texture mapping is the process of mapping an image (in our
case, a bitmap)onto the surface of a polygon thatsbeen transformed in the
process
of 3-D drawing. Up to this point, each polygon weve drawn in X-Sharp has been a
single, solid color. Over the last couple of chapters we added the ability to shade
polygons according to lighting, but each polygon was still a single color. Thus, in
order to produce
any sort of intricate design, a greatmany tiny polygons would have
to be drawn. That would be very slow, so we need another approach. Onesuch approach is texture mapping; that is, mapping the bitmap containing the desired
image
onto thepixels contained within the transformed polygon. Done properly, thisshould
make it possible to change X-Sharps output from a bland collection of monocolor
facets to a lively, detailed, and much morerealistic scene.
What sort of scene? you may well ask. This is where Pooh and the space station
came in. When I sat down to think of a sample texture-mapping application, itoccurred to me that the shaded
ball demo we added to X-Sharp recently looked at least
a bit like a spinning, spherical space station, and that the single unshaded, yellow
polygon looked somewhat like a window in the space station, and it mightbe a nice
example if someone were standing in thewindow. ...
The rest is history.
1048
Chapter 56
Figure 56.1
56.1
Figure 56.2
1049
image and the polygon, each time advancing the X coordinates of the edges according to theslopes of the lines,just as is normally done when filling a polygon. Since
the polygon is untransformed, the stepping is identical in both the image and the
polygon, and thepixel mappingis one-to-one, so the appropriate partof each scan
line of the image can simply be block copied to the destination.
Now, matters get more complicated. What if the destination polygon is rotated in
two dimensions? We no longerhave a neat direct mapping from image
scan lines to
destination polygon scan lines. We still want to draw across each destination scan
line, but the proper source
pixels for each destination
scan line may nowtrack across
the source bitmapat an angle, as shown in Figure 56.3. What can we do?
The solution is remarkably simple. Well just map each transformed vertex to the
corresponding vertex in the bitmap;this is easy, because the vertices are atthe same
indices in the originaland transformed vertex lists. Each time we select anew edge
to scanfor the destination
polygon, well select the corresponding edge in the source
bitmap, as well. Then-and this is crucial-each time we step a destination edge
one
scan line, well step the corresponding sourceimage edge an equivalent amount.
Ah, but what is an equivalentamount? Thinkof it this way. If a destination edgeis
100 scan lines high, will
it be stepped 100 times. Then, well divide the SourceXWidth
and SourceYHeight lengths of the source edge
by 100, and addthose amounts to the
source edges coordinates each time the destination is stepped one scan line. Put
another way, we have, as usual, arranged thingsso that in the destinationpolygon we
step DestYHeight times, where DestYHeight is the height of the destination edge.
The this approach arranges to step the source
image edge DestYHeight times also,
to matchwhat the destination is doing.
1050
Chapter 56
Now were able to track the coordinates of the polygon edges through the source
image in tandem with the destination edges. Stepping across each destination scan
line uses precisely the same technique, as shown in Figure 56.4. In the destination,
we step DestXWidth times across each scan line of the polygon, once for eachpixel
on the scan line. (DestXWidthis the horizontal distance between the two edges being scanned on any given scan line.) To match this, we divide SourceXWidth and
SourceYHeight (the lengths of the scan line in the source image, as determined by
the source edge points weve been tracking, as just described) by the width of the
destination scan line, DestXWidth, to produce SourceXStep and SourceYStep.Then,
we just step DestXWidth times, adding SourceXStep and SourceYStep to SourceX
and SourceY each time, and choose the nearest image pixel to (SourceX,SourceY)
to copy to (DestX,DestY).(Note that thenames used above, such asSourceXWidth,
are used for descriptive purposes, and dont necessarily correspond to the actual
variable names used in Listing 56.2.)
Thats a workable approach for 2-D rotated polygons-but what about 3-D rotated
polygons, where the visible dimensions of the polygon can vary with 3-Drotation and
perspective projection? First, Id like to make it clear that texture mapping takes
place fromthe source image tothe destination polygon afterthe destination polygon is
projected to the screen. That is, the image will be mapped after the destination
polygon is in its final, drawable form. Given that, it should be
apparent that the above
approach automatically compensates for all changes in the dimensions of a polygon.
You see, this approach divides source edges and scan lines into however many steps
the destination polygon requires. If the destination polygon is much narrower than
the source polygon, as a result of 3-D rotation and perspective projection, we just
end up taking bigger steps through the source image and skipping a lot of source
image pixels,as shown in Figure 56.5.The upshot is that theabove approach handles
Source image
(texture to map)
1051
Source image
(texture to map)
all transformations and projections effortlessly. It could also be used to scale source
images up to fit in larger polygons; all thats needed is a list of where the polygons
vertices map into the source image, and everything else happens automatically. In
fact, mapping from any polygonal area of a bitmap to any destination polygon will
work, given only that the two polygons have the same number of vertices.
1052
Chapter 56
Even more important is that the sortof texture mapping Ill do in X-Sharp doesnt
correct for perspective. That doesnt much matter for small polygons or polygons
that are nearly parallel to the screen in 3-space, but it can produce very noticeable
bowing of textures on large polygons at an angle to the screen. Perspective texture
mapping is a complex subject thats outside the scope of this book, but you should
be aware of its existence, because perspective texture mapping is a key element of
many games these days.
Finally, Id like to point out that this sort of DDA texture mapping is display-hardware
dependent, because the bitmap for each image must be compatible with the number of bits per pixel in the destination. Thats actually a fairly serious issue. One of
the nice things about X-Sharps polygon orientation is that, until now, the only display dependent partof X-Sharp has been the transformation from RGB color space
to the adapters color space. Compensation for aspect ratio, resolution, and thelike
all happens automatically in the course of projection. Still, we need the ability to
display detailed surfaces, and its hard to conceive ofa fast way to do so thats totally
hardware independent. (If you know of one, let me know care of the publisher.)
For now, all we need is fast texture mapping of adequate quality, whichthe straightforward, non-antialiased DDA approach supplies. Im sure there aremany other fast
approaches, and,as Ivesaid, there are moreaccurate approaches, butDDA texture
mapping works well, giventhe constraints of the PCs horsepower. Next, well look at
code that performsDDA texture mapping. First, though, Idlike to take a moment
to thank Jim Kent, author of Autodesk Animator and a frequent correspondent, for
getting me started with the DDA approach.
Here b a major tip: DDA texture mapping look best onfast-moving surfaces, where
the eye doesnt have timetopick nits with the shearing and aliasing thats an inevitable
by-product of such a crude approach. Compile DEMO1 from the X-Sharp archive
in this chapter b subdirectory of the listings disk, and run it. The initial display
looks okay, but certainly not great, because the rotational speed is so slow. Now
Pooh and theSpaceStation
1053
press the S key a f a y times to speed up the rotation and flip between different
rotation axes. I think you'll be amazed at how much better DDA texture mapping
looks at high speed. This technique would be greatfor mapping textures onto hurtling asteroids orjets, but would come upshortfor slow,finelydetailed movements.
1.C
New header f i l e e n t r i e s r e l a t e d t o t e x t u r e - m a p p e d p o l y g o n s
*/
/*
Draws t h e p o l y g o n d e s c r i b e d b y t h e p o i n t l i s t P o i n t L i s t w i t h
a bitmap
t e x t u r e mapped o n t o i t */
i d e f i n e DRAW_TEXTURED-POLYGON(PointList.NumPoints,TexVerts,TexMap)
\
Polygon.Length
N u m P o i nPt so;l y g o n . P o i n t P t r
PointList;
\
DrawTexturedPolygon(&Polygon. TexVerts.TexMap):
# d e f i n e FIXED-TO-INT(FixedVa1)
( ( i n t )( F i x e d V a l
>> 16))
# d e f i n e ROUND-FIXED-TO_INT(FixedVal)
\
( ( i n t )( ( F i x e d V a l + DOUBLE-TO_FIXED(0.5)) >> 16))
/ * R e t r i e v e ss p e c i f i e dp i x e lf r o ms p e c i f i e di m a g eb i t m a po fs p e c i f i e dw i d t h .
# d e f i n e GET-IMAGE-PIXEL(TexMapBits. TexMapWidth, X Y. )
\
TexMapBits[(Y * TexMapWidth) + X ]
/* Masks t o m a r ks h a d i n gt y p e s
i n Face s t r u c t u r e */
# d e f i n e NO-SHADING
0x0000
# d e f i n e AMBIENT-SHADING Ox0001
# d e f i n e DIFFUSE-SHADING Ox0002
i d e f i n e TEXTURE-MAPPED-SHADING 0x0004
/ * D e s c r i b e s a t e x t u r e map */
t y p e d e fs t r u c t {
i n t TexMapWidth;
/ * t e x t u r e map w i d t h i n b y t e s */
char*TexMapBits;
/ * p o i n t e r t o t e x t u r e b i t m a p */
I TextureMap;
*/
/ * S t r u c t u r ed e s c r i b i n go n ef a c eo fa no b j e c t( o n ep o l y g o n )
*/
t y p e d e fs t r u c t I
i n t * VertNums:
/ * p o i n t e rt ol i s to fi n d e x e so ft h i sp o l y g o n ' sv e r t i c e s
intheobject'svertexlist.
The f i r s t t w o i n d e x e s
m u s ts e l e c te n da n ds t a r tp o i n t s ,r e s p e c t i v e l y ,o ft h i s
p o l y g o n ' su n i tn o r m a lv e c t o r .
Second p o i n ts h o u l da l s o
b ea na c t i v ep o l y g o nv e r t e x
*/
i n t NumVerts;
/ * # o vf e r t si nf a c e n, o it n c l u d i n gt h ei n i t i a l
v e r t e x ,w h i c hm u s tb et h ee n do f
a u n i tn o r m a lv e c t o r
t h a ts t a r t sa tt h es e c o n di n d e xi n
VertNums */
/ * d i r e cpt a l e t t ei n d e xu; s e do n l yf onr o n - s h a d e df a c e s
i nC
t olorIndex;
M o d e l C o l o rF u l l C o l o r ;
/* p o l y g o n ' sc o l o r */
i n t ShadingType: / * n o n e a, m b i e n t d, i f f u s e t, e x t u r e
mapped, e t c . * /
TextureMap * TexMap; / * p o i n t e r t o b i t m a p f o r t e x t u r e m a p p i n g ,
i f any */
Point
TexVerts; /* p o i n t e r t o l i s t o f t h i s p o l y g o n ' s v e r t i c e s , i n
T e x t u r e M a pc o o r d i n a t e s .I n d e x
n must map t o i n d e x
n + 1 i n VertNums. ( t h e + 1 i s t o s k i p o v e r t h e u n i t
n o r m a el n d p o i n t
i n VertNums) */
1 Face;
e x t e r nv o i d D r a w T e x t u r e d P o l y g o n ( P o i n t L i s t H e a d e r *, P o i n t *, TextureMap * ) ;
1054
Draws a b i t m a p . mapped t o a c o n v e xp o l y g o n( d r a w s
a t e x t u r e - m a p p e dp o l y g o n ) .
"Convex"means
t h a te v e r yh o r i z o n t a ll i n ed r a w nt h r o u g ht h ep o l y g o na ta n y
p o i n tw o u l dc r o s se x a c t l yt w oa c t i v ee d g e s( n e i t h e rh o r i z o n t a ll i n e sn o r
z e r o - l e n g t he d g e sc o u n ta sa c t i v ee d g e s ;b o t ha r ea c c e p t a b l ea n y w h e r e
in
t h ep o l y g o n ) .a n dt h a tt h er i g h t
& l e f t edgesnevercross.Nonconvex
p o l y g o n sw o n ' tb ed r a w np r o p e r l y .C a n ' tf a i l .
*/
Chapter 56
*/
#i
n c l ude < s t d i 0 .h>
#i
n c l ude <math. h>
#include "polygon. h"
/ * D e s c r i b e st h ec u r r e n tl o c a t i o n
a n ds t e p p i n g ,
i nb o t ht h es o u r c e
and
t h ed e s t i n a t i o n ,o f
an edge * /
t y p e d e fs t r u c t I
in t D i r e c t i on :
/ * t h r o u g he d g el i s t :
1 f o r a r i g h t edge ( f o r w a r d
t h r o u g hv e r t e xl i s t ) ,
- 1 f o r a l e f t edge(backward
t h r o u g hv e r t e xl i s t )
*/
in t Remai n i ngScans :
I* h e i g h t l e f t t o s c a no u t i n d e s t * I
i n t CurrentEnd:
/* v e r t e x # o f end o f c u r r e n te d g e */
F i x e d p o i n tS o u r c e X ;
I* c u r r e n t X l o c a t i o n i n s o u r c e f o r t h i s
edge * I
F i x e d p o i n tS o u r c e Y :
I* c u r r e n t Y l o c a t i o n i n s o u r c e f o r t h i s
edge */
F i x e d p o i n tS o u r c e S t e p X ;
I* X s t e p i n s o u r c e f o r
Y s t e pi nd e s to f
1 */
F i x e d p o i n tS o u r c e S t e p Y :
I* Y s t e p i n s o u r c e f o r
Y s t e pi nd e s to f
1 *I
/ * v a r i a b l e su s e df o ra l l - i n t e g e rB r e s e n h a m ' s - t y p e
X s t e p p i n gt h r o u g ht h ed e s t ,n e e d e df o rp r e c i s e
p i x e lp l a c e m e n tt oa v o i dg a p s
*I
i n t DestX:
I* c u r r e n t X l o c a t i o n i n d e s t f o r t h i s
edge */
i n tD e s t X I n t S t e p :
X s t e pp e rs c a n - l i n e
Y s t e p */
/ * w h o l ep a r to fd e s t
i n tD e s t X D i r e c t i o n :
/* -1 o r 1 t o i n d i c a t e way X s t e p s ( l e f t / r i g h t ) */
i n t DestXErrTerm:
I* c u r r e n t e r r o r t e r m f o r d e s t
X stepping */
i n t DestXAdjUp:
I* amount t o add t o e r r o r t e r m p e r s c a n l i n e
move */
i n t DestXAdjDown;
I* amount t o s u b t r a c t f r o m e r r o r t e r m
when t h e
e r r o rt e r mt u r n so v e r
*/
1 EdgeScan:
i n t StepEdge(EdgeScan * ) :
i n t SetUpEdge(EdgeScan *, i n t ) :
voidScanOutLine(EdgeScan *, EdgeScan * ) :
i n tG e t I m a g e P i x e l ( c h a r *, i n t . i n t . i n t ) ;
/ * S t a t i c st os a v et i m et h a tw o u l do t h e r w i s ep a s st h e mt os u b r o u t i n e s .
*/
s t a t i ci n tM a x V e r t .N u m V e r t s .D e s t Y :
s t a t i cP o i n t * VertexPtr:
s t a t i c P o i n t * TexVertsPtr:
s t a t i c c h a r * TexMapBits:
s t a t i c i n t TexMapWidth;
/ * Draws a t e x t u r e - m a p p e dp o l y g o n ,g i v e n
a l i s to fd e s t i n a t i o np o l y g o n
and a
v e r t i c e s , a l i s to fc o r r e s p o n d i n gs o u r c et e x t u r ep o l y g o nv e r t i c e s ,
p o i n t e rt ot h es o u r c et e x t u r e ' sd e s c r i p t o r .
*/
v o i d D r a w T e x t u r e d P o l y g o n ( P o i n t L i s t H e a d e r * P o l y g o n ,P o i n t
* TexVerts,
TextureMap * TexMap)
(
i n t MinY. MaxY. M i n V e r t . i:
EdgeScan L e f t E d g eR
. ightEdge:
NumVerts
Polygon->Length:
VertexPtr = Polygon->PointPtr;
TexVertsPtr
TexVerts:
TexMapBits
TexMap->TexMapBits;
TexMapWidth
TexMap->TexMapWidth:
/ * N o t h i n gt od r a w i f l e s st h a n 3 v e r t i c e s
i f (NumVerts < 3 ) {
return:
*/
/ * Scan t h r o u g ht h ed e s t i n a t i o np o l y g o nv e r t i c e s
and f i n d t h e t o p o f t h e
l e f t and r i g h t edges, t a k i n ga d v a n t a g eo fo u rk n o w l e d g et h a tv e r t i c e sr u n
i n a c l o c k w i s e d i r e c t i o n ( e l s e t h i sp o l y g o nw o u l d n ' tb ev i s i b l e
due t o
b a c k f a c er e m o v a l )
*/
MinY
32767;
MaxY
-32768;
f o (r i - 0 i:< N u m V e r t s :
itc)(
1055
i f ( V e r t e x P t r [ i l . Y < MinY) {
MinY
VertexPtrCi1.Y;
i;
MinVert
--
i f ( V e r t e x P t r C i 1 . Y > MaxY) {
MaxY
VertexPtrCi1.Y;
MaxVert
i;
/*
R e j e c t flat ( 0 - p i x e l - h i g h )p o l y g o n s
i f (MinY >- MaxY) I
return;
*/
/*
The d e s t i n a t i o n Y c o o r d i n a t e i s n o t e d g e s p e c i f i c ;
i t a p p l i e st o
b o t he d g e s ,s i n c e
we a l w a y ss t e p Y by 1 */
DestY
MinY;
/ * Setup t o s c a n t h e i n i t i a l l e f t
and r i g h t edges o ft h es o u r c ea n d
d e s t i n a t i o np o l y g o n s . We a l w a y ss t e pt h ed e s t i n a t i o np o l y g o ne d g e s
byone
i n Y . so c a l c u l a t e t h e c o r r e s p o n d i n g d e s t i n a t i o n
X s t e pf o r
e a c he d g e ,a n dt h e nt h ec o r r e s p o n d i n gs o u r c ei m a g e
X and Y s t e p s */
LeftEdge.Direction
-1;
/* s e tu pl e f t
edge f i r s t */
SetUpEdge(&LeftEdgeM
. inVert);
RightEdge.Direction
1;
/* s e t up r i g h t edge */
SetUpEdge(&RightEdge.MinVert);
/ * Step down d e s t i n a t i o ne d g e s onescan l i n e a t a t i m e . A t eachscan
l i n e .f i n dt h ec o r r e s p o n d i n ge d g ep o i n t si nt h es o u r c ei m a g e .
Scan
b e t w e e nt h ee d g ep o i n t s
i nt h es o u r c e ,d r a w i n gt h ec o r r e s p o n d i n g
p i x e l sa c r o s st h ec u r r e n ts c a nl i n ei nt h ed e s t i n a t i o np o l y g o n .
(We
know w h i c h way t h e l e f t and r i g h t e d g e s r u n t h r o u g h t h e v e r t e x l i s t
because v i s i b l e( n o n - b a c k f a c e - c u l l e d )p o l y g o n sa l w a y sh a v et h ev e r t i c e s
i nc l o c k w i s eo r d e r
a ss e e nf r o mt h ev i e w p o i n t )
*/
for ( ; : I
/* Done i f o f f b o t t o m o f c l i p r e c t a n g l e
*/
i f (DestY >- ClipMaxY) I
return;
/*
Draw o n l y i f i n s i d e Y bounds o f c l i p r e c t a n g l e
i f (DestY >- C l i p M i n Y ) {
/ * Draw t h es c a nl i n eb e t w e e nt h et w oc u r r e n te d g e s
ScanOutLine(&LeftEdge&
. RightEdge);
*/
*/
/*
A d v a n c et h es o u r c ea n dd e s t i n a t i o np o l y g o ne d g e s ,e n d i n g
scanned a l l t h e way t o t h e b o t t o m o f t h e p o l y g o n
i f (!StepEdge(&LeftEdge)) {
break:
*/
i f we've
i f (!StepEdge(&RightEdge)) {
break;
1
1
DestY++;
/*
Stepsanedgeonescan
l i n ei nt h ed e s t i n a t i o n ,
and t h ec o r r e s p o n d i n g
d i s t a n c ei nt h es o u r c e .
I f a ne d g er u n so u t ,s t a r t s
a new edge i f t h e r e
i s one.Returns
1 f o rs u c c e s s .o r
0 i f t h e r ea r en om o r ee d g e st os c a n .
i n t StepEdge(EdgeScan * Edge)
{
/*
Count o f f t h e s c a n l i n e
we s t e p p e dl a s tt i m e ;
finished, try tostartanother
one */
i f (--Edge->Remaininsscans
0) {
1056
Chapter 56
i f t h i s edge i s
*/
/*
--
Setupthenextedge;done
i f t h e r e i s no n e x t edge * I
i f ( S e t U p E d g e ( E d g e .E d g e - X u r r e n t E n d )
0) I
r e t u r n ( 0 ) : I* nomoreedges:donedrawingpolygon
*/
/*
return(1);
all s ettod r a wt h e
I* S t e pt h ec u r r e n ts o u r c ee d g e
new edge
*/
*I
Edge->SourceX +- Edge->SourceStepX;
Edge->SourceY +- Edge->SourceStepY;
/ * S t e pd e s t X w i t hB r e s e n h a m - s t y l ev a r i a b l e s ,t og e tp r e c i s ed e s tp i x e l
placement a n d a v o i d g a p s */
Edge->DestX += E d g e - > D e s t X I n t S t e p ; / * w h o l ep i x e sl t e p
*/
/ * Do e r r o r t e r m s t u f f f o r f r a c t i o n a l p i x e l
X s t e ph a n d l i n g */
i f ((Edge->DestXErrTerrn +- Edge->DestXAdjUp) > 0 ) I
Edge->DestX +- E d g e - > D e s t X D i r e c t i o n :
Edge->DestXErrTerm - = Edge->DestXAdjDown;
return(1);
1
/ * Setsup
an edge t o b es c a n n e d ;t h ee d g es t a r t s
a t S t a r t V e r t andproceeds
i nd i r e c t i o nE d g e - > D i r e c t i o nt h r o u g ht h ev e r t e xl i s t .E d g e - > D i r e c t i o nm u s t
be s e t p r i o r t o c a l l ;
-1 t o scan a l e f t e d g e( b a c k w a r dt h r o u g ht h ev e r t e x
l i s t ) . 1 t o scan a r i g h t edge ( f o r w a r dt h r o u g ht h ev e r t e xl i s t ) .
A u t o m a t i c a l l ys k i p so v e r0 - h e i g h te d g e s .R e t u r n s
1 f o rs u c c e s s ,o r
0 if
t h e r ea r e no moreedges
t o scan. */
i n t SetUpEdge(EdgeScan * Edge, i n t S t a r t V e r t )
i n tN e x t V e r t .D e s t X W i d t h ;
F i x e d p o i n tD e s t Y H e i g h t ;
for (;;I I
/ * Done i f t h i s edge s t a r t s a t t h eb o t t o mv e r t e x
i f ( S t a r t V e r t =- MaxVert) I
return(0);
*I
/*
Advance t o t h e n e x t v e r t e x , w r a p p i n g
if we r u n o f f t h e s t a r t o r
*/
ofthevertexlist
NextVert
S t a r t V e r t + Edge->Direction;
i f ( N e x t V e r t >- NumVerts) {
NextVert = 0;
I e l s e i f (NextVert < 0) I
NextVert
NumVerts - 1;
end
I* C a l c u l a t e t h e v a r i a b l e s f o r t h i s
z e r o - h e i g h t edge * I
edgeanddone
if thisisnot
i f ((Edge->RemainingScans =
V e r t e x P t r C N e x t V e r t 1 . Y - V e r t e x P t r C S t a r t V e r t 1 . Y ) !- 0 ) I
DestYHeight
INT-TO_FIXED(Edge->Remaiflingscans);
Edge->CurrentEnd
NextVert:
Edge->SourceX = INTLTO-FIXED(TexVertsPtr[StartVert].X);
Edge->SourceY
INT-TOLFIXED(TexVertsPtr[StartVertl.Y);
Edge->SourceStepX
FixedDiv(INT~TOLFIXED(TexVertsPtr[NextVertl.X~ Edge->SourceX.DestYHeight):
Edge->SourceStepY = FixedDiv(INT-TOLFIXED(TexVertsPtr[NextVertl.Y) Edge->SourceY.DestYHeight):
/ * S e tu pB r e s e n h a r n - s t y l ev a r i a b l e sf o rd e s t
X stepping */
Edge->OestX
VertexPtrCStartVert1.X;
i f ((OestXWidth
(VertexPtr[NextVertl.X - VertexPtrCStartVert1.X)) < 0) I
/* Setup f o r d r a w i n g r i g h t t o l e f t
*/
E d g e - > D e s t X D i r e c t i o n = -1;
- -
1057
DestXWidth
-DestXWidth;
1 - Edge->RemainingScans;
Edge->DestXErrTerm
Edge->DestXIntStep
- ( D e s t X W i d t h / Edge->RemainingScans):
else {
/* S e tu pf o rd r a w i n gl e f tt or i g h t
*/
Edge->DestXDi r e c t i on
1;
Edge->DestXErrTerm
0;
Edge->DestXIntStep
DestXWidth / Edge->RemainingScans;
--
Edge->DestXAdjUp
DestXWidth % Edge->RemainingScans;
Edge->RemainingScans;
Edge->DestXAdjDown
return(1);
/ * success * /
StartVert
NextVert;
/*
k e e pl o o k i n gf o r
/ * T e x t u r e - m a p - d r a wt h es c a nl i n eb e t w e e nt w oe d g e s .
voidScanOutLine(EdgeScan * LeftEdge. EdgeScan
a n o n - 0 - h e i g h et d g e
*/
*/
RightEdge)
F i x e d p o i n tS o u r c e X
LeftEdge->SourceX:
F i x e d p o i n tS o u r c e Y
LeftEdge->SourceY;
i n t DestX
LeftEdge->DestX;
i n t DestXMax
RightEdge->DestX;
F i x e d p o i n tD e s t W i d t h ;
F i x e d p o i n tS o u r c e X S t e p .S o u r c e Y S t e p ;
/ * N o t h i n g t o do i f f u l l y X c l i p p e d */
i f ((DestXMax <- C l i p M i n X ) 1 1 (DestX >- C l i p M a x X ) ) {
return:
*/
I* W i d t ho fd e s t i n a t i o ns c a nl i n e .f o rs c a l i n g .N o t e :b e c a u s et h i si s
an
i n t e g e r - b a s e ds c a l i n g ,
i t canhave
a t o t a le r r o ro f
as much as n e a r l y
one p i x e l .F o rm o r ep r e c i s es c a l i n g ,a l s om a i n t a i n
a f i x e d - p o i n t DestX
i t f o rs c a l i n g .
I f t h i s i s done, i t will a l s o
i n eachedge.anduse
b en e c e s s a r yt on u d g et h es o u r c es t a r tc o o r d i n a t e st ot h er i g h tb y
an
a m o u n tc o r r e s p o n d i n gt ot h ed i s t a n c ef r o mt h et h er e a l( f i x e d - p o i n t )
DestXandthe
f i r s t p i x e l ( a t an i n t e g e r X ) t o bedrawn)
*/
DestWidth
INT-TO_FIXED(DestXMax - DestX);
/* C a l c u l a t es o u r c es t e p st h a tc o r r e s p o n dt oe a c hd e s t
X s t e p( a c r o s s
*/
t h es c a nl i n e )
SourceXStep
FixedDiv(RightEdge->SourceX - SourceX.DestWidth);
SourceYStep
FixedDiv(RightEdge->SourceY - SourceY,DestWidth);
/* C l i p r i g h t edge i f n e c e s s a r y */
i f (DestXMax > ClipMaxX) I
DestXMax
ClipMaxX:
I* C l i p l e f t edge i f n e c s s a r y */
i f (DestX < C l i p M i n X ) {
SourceX +- SourceXStep * ( C l i p M i n X - D e s t X ) ;
SourceY +- SourceYStep * ( C l i p M i n X - D e s t X ) ;
DestX
ClipMinX;
/*
S c a na c r o s st h ed e s t i n a t i o ns c a nl i n e ,u p d a t i n gt h es o u r c ei m a g e
p o s i t i o na c c o r d i n g l y * I
f o r ( ; DestX<DestXMax;DestX++)
I
/* Get c u r r e n t l y mapped p i x e lo u to fi m a g ea n dd r a w
i t t os c r e e n
W r i t e P i x e l X ( D e s t XD
. estY.
GET-IMAGE-PIXEL(TexMapBits. TexMapWidth.
FIXED-TO-INT(SourceX).
FIXED_TO-INT(SourceY))
);
1058
Chapter 56
*/
Previous
Home
Next
I* P o i n t t o t h e n e x t source p i x e l * I
SourceX +- SourceXStep:
SourceY +- SourceYStep:
No matter how you slice it, DDA texture mapping beats boring, single-color polygons nine ways to Sunday. The big downside is that its much slower than a normal
polygon fill: move the ball close to the screen in DEMO1, and watch things slow
down when one of those big texture maps comes around. Of course, thats partly
because the codeis all in C; some well-chosen optimizations would workwonders. In
the next chapter well discuss texture mapping further, crank up the speed of our
texture mapper, and attend to some rough spots that remain in the DDA texture
mapping implementation, most notably in the area of exactly which texture pixels
map to which destination pixels as a polygon rotates.
And, in case youre curious, yes, there is a bear in DEMO1. I wouldnt say he looks
much like a Pooh-type bear, but hes a bearnonetheless. He does tendto look a little
startled when you flip the ball around so that hes zipping by on his head, but,heck,
you would too in the same situation. And remember, when you buy the next VGA
megahit, Bears in Space, you saw it here first.
1059
Previous
chapter 57
10,000 freshly
sheared sheep on the
screen
Home
Next
1063
matter of illusion, of convincing the eye to see what you wantit to see, and thats very
much a black art based on experience.
so
that they re in the sweet spot of apparent motion, in which the eye is willing to
ignore the jumping and aliasing, and blend the images together into continuous
motion. Only experience can give youa feel forthat sweet spot.
1064
Chapter 57
Polygonscannedwith
all-integer approach
.....
Ed
\Edge
as
scanned by
ge endVertex fixed-point approach
mapped exactly tothe left and topdestination edges wereconsidered to be inside, and
pixels that mapped exactly tothe right and
bottom destination edges wereconsidered to
be outside. Thats fine filling
for polygons, but when copying texture maps, it causes
different edges of the texture map to be omitted, depending on the destination
orientation, because different edgesof the texture map correspond to the right
and
bottom destination edges, depending on the current rotation. Also, the previous
chapters code truncated to get integer source coordinates.
This, together with the
orientation problem, meant that
when a texture turned upside down, it slowed one
new rowand onenew column of pixels from the next
row and column of the texture
map. Thisasymmetry was quite visible, and not atall the desired effect.
Listing 57.1 is one solution to these problems. This code,
which replaces theequivalently named function presented in theprevious chapter (and, of course, is present
in the X-Sharp archive in this chapters subdirectory
of the listings disk), makes no
attempt to followthe standardpolygon inside/outside rules when mapping thesource.
Instead, itadvances a half-step into the texture map before
drawing the first pixel, so
pixels along all edges arehalf included. Rounding rather than truncation to texturemap coordinates is also performed. The result is that the texture map stays pretty
much centeredwithin the destinationpolygon as the destination rotates,
with a muchreduced level of orientation-dependent asymmetry.
1066
Chapter 57
LISTING 57.1
157- 1.C
I* T e x t u r e - m a p - d r a wt h es c a nl i n eb e t w e e nt w oe d g e s .U s e sa p p r o a c ho f
p r e - s t e p p i n g 112 p i x e li n t ot h es o u r c ei m a g ea n dr o u n d i n gt ot h en e a r e s t
s o t h a t t e x t u r e maps will appear
s o u r c ep i x e la te a c hs t e p ,
*I
r e a s o n a b l ys i m i l a ra ta l la n g l e s .
voidScanOutLine(EdgeScan
LeftEdge.EdgeScan
RightEdge)
F i x e d p o i n tS o u r c e X ;
F i x e d p o i n tS o u r c e Y :
i n t DestX
LeftEdge->DestX;
i n t DestXMax = R i g h t E d g e - > D e s t X ;
F i x e d p o i n tD e s t W i d t h :
F i x e d p o i n tS o u r c e S t e p X .S o u r c e S t e p Y :
I* N o t h i n gt od o
i f ((DestXMax
return:
<-
i f f u l l y X clipped *I
C l i p M i n X ) I I (DestX >- C l i p M a x X ) ) {
1
i f ((DestXMax - DestX)
SourceX
SourceY
=
=
<=
0) {
I* n o t h i n gt od r a w
return:
*I
LeftEdge->SourceX:
LeftEdge->SourceY:
I* W i d t ho fd e s t i n a t i o ns c a nl i n e ,f o rs c a l i n g .N o t e :b e c a u s et h i si sa n
i n t e g e r - b a s e ds c a l i n g ,
i t canhave
a t o t a le r r o ro f
asmuchas
nearly
one p i x e l .F o rm o r ep r e c i s es c a l i n g ,a l s om a i n t a i n
a f i x e d - p o i n t DestX
i n eachedge,and
use i t f o rs c a l i n g .
I f t h i s i s done, i t will a l s o
benecessary
t o n u d g et h es o u r c es t a r tc o o r d i n a t e st ot h er i g h tb y
an
a m o u n tc o r r e s p o n d i n gt ot h ed i s t a n c ef r o mt h et h er e a l( f i x e d - p o i n t )
*I
DestXand
thefirstpixel(at
an i n t e g e r X) t o bedrawn).
D e s t W i d t h = INTCTOCFIXED(OestXMax - D e s t X ) :
I* C a l c u l a t es o u r c es t e p st h a tc o r r e s p o n dt oe a c hd e s t
X s t e p( a c r o s s
t h es c a nl i n e )
*I
F i x e d D i v ( R i g h t E d g e - > S o u r c e X - SourceX.DestWidth):
SourceStepX
SourceStepY = F i x e d D i v ( R i g h t E d g e - > S o u r c e Y - SourceY.DestWidth):
I* Advance 112 s t e pi nt h es t e p p i n gd i r e c t i o n ,t os p a c es c a n n e dp i x e l s
e v e n l yb e t w e e nt h el e f ta n dr i g h te d g e s .( T h e r e ' s
a s l i g h ti n a c c u r a c y
2 b ys h i f t i n gr a t h e rt h a nd i v i d i n g ,
i n d i v i d i n g n e g a t i v e numbersby
b u tt h ei n a c c u r a c yi si nt h el e a s ts i g n i f i c a n tb i t ,
and w e ' l l j u s t
l i v e w i t h it.) */
SourceX +- SourceStepX >> 1:
SourceY +- SourceStepY >> 1:
I* C l i p r i g h t
i f (DestXMax
DestXMax
>
edge i f n e c s s a r y * /
ClipMaxX)
ClipMaxX;
I* C1 i p l e f t edge i f n e c s s a r y * I
< ClipMinX) {
SourceX +- FixedMul(SourceStepX.INTCTOCFIXED(ClipMinX
SourceY +- FixedMul(S0urceStepY.INT-TO-FIXED(C1ipMinX
DestX
ClipMinX:
i f (DestX
/*
- OestX)):
- DestX)):
S c a na c r o s st h ed e s t i n a t i o ns c a nl i n e ,u p d a t i n gt h es o u r c ei m a g e
p o s i t i o na c c o r d i n g l y * I
1067
f o r ( ; DestX<DestXMax;
DestX++)
I
I* G e tt h ec u r r e n t l y
mapped p i x e l o u t o f theimageanddraw
t h es c r e e n * I
W r i t e P i x e l X ( D e s t XD
. estY.
GET-IMAGE-PIXEL(TexMapBits. TexMapWidth,
ROUND-FIXED-TO_INT(SourceX).
ROUND_FIXED_TO_INT(SourceY))
I* P o i n tt ot h en e x ts o u r c ep i x e l
*I
SourceX +- SourceStepX;
SourceY +- SourceStepY;
it t o
):
1068
Chapter 57
map.
Usesapproach
: 1 / 2p i x e li n t ot h es o u r c ei m a g ea n dr o u n d i n gt ot h en e a r e s ts o u r c e
o fp r e - s t e p p i n g
: p i x e la te a c hs t e p ,
s o t h a t t e x t u r e maps will a p p e a rr e a s o n a b l ys i m i l a r
: a ta l la n g l e s .T h i sr o u t i n ei ss p e c i f i ct o3 2 0 - p i x e l - w i d ep l a n a r
: ( n o n - c h a i n 4 12 5 6 - c o l o r modes,suchas
mode X , w h i c h i s a p l a n a r
: ( n o n - c h a i n 4 12 5 6 - c o l o r mode w i t h a r e s o l u t i o no f3 2 0 x 2 4 0 .
: C n e a r - c a l l a b l ea s :
: void
ScanOutLine(EdgeScan
: T e s t e d w i t h TASM 3.0.
SC-INDEX 03c4h
MAP-MASK 02h
SCREEN-SEG
SCREEN-WIDTH
equ
equ
equ
equ
OaOOOh
80
LeftEdge.
EdgeScan
RightEdge);
; S e q u e n c eC o n t r o l l e rI n d e x
; i n d e x i n SC o f Map Mask r e g i s t e r
:segment d oi sf p l a y
memory i n mode X
s;bw
cysforliritcidenfoenatem
esehnn
: t ot h en e x t
.model
small
.data
extrn
-TexMapBits:word.
-TexMapWidth:word.
-DestY:word
extrn
-CurrentPageBase:word.
-ClipMinX:word
extrn
-ClipMinY:word.
-ClipMaxX:word.
-ClipMaxY:word
: D e s c r i b e st h ec u r r e n tl o c a t i o na n ds t e p p i n g ,i nb o t ht h es o u r c ea n d
: t h ed e s t i n a t i o n ,o f
EdgeScan s t r u c
dw
D i r e c t i on
a ne d g e .M i r r o r ss t r u c t u r ei n
?
DRAWTEXP.C.
:throughedge
l i s t : 1 f o r a r i g h t e d g e( f o r w a r d
: t h r o u g hv e r t e xl i s t ) .
-1 f o r a l e f t edge(backward
: t h r o u g hv e r t e xl i s t )
: h e i g h tl e f tt os c a no u ti nd e s t
: v e r t e x # o f end o f c u r r e n t edge
;X l o c a t i o n i n s o u r c e f o r t h i s
edge
: Y l o c a t i o n i n s o u r c ef o rt h i s
edge
:X s t e pi ns o u r c ef o r
Y s t e p i n d e s to f 1
:Y s t e pi ns o u r c ef o r
Y stepindestof
1
: v a r i a b l e su s e df o ra l l - i n t e g e rB r e s e n h a m ' s - t y p e
: X s t e p p i n gt h r o u g ht h ed e s t .n e e d e df o rp r e c i s e
: p i x e lp l a c e m e n tt oa v o i dg a p s
edge
:current X l o c a t i o n i n d e s t f o r t h i s
: w h o l ep a r to fd e s t
X s t e pp e rs c a n - l i n e
Y step
: - 1 o r 1 t oi n d i c a t ew h i c h
way X s t e p s ( l e f t / r i g h t )
: c u r r e n te r r o rt e r mf o rd e s t
X stepping
:amount t o add t o e r r o r t e r m p e r s c a n l i n e
move
:amount t o s u b t r a c t f r o m e r r o r t e r m
when t h e
: e r r o rt e r mt u r n so v e r
RemainingScans
CurrentEnd
SourceX
SourceY
SourceStepX
SourceStepY
dw
dw
dd
dd
dd
dd
?
?
?
?
?
?
DestX
OestXIntStep
DestXDi rection
DestXErrTerm
DestXAdjUp
DestXAdjDown
dw
dw
dw
dw
dw
dw
?
?
?
?
?
?
dw
dw
dw
2 d u p ( ?: )r e t u rand d r e s s
EdgeScanends
Parms
struc
LeftEdge
RightEdge
Parms
ends
?
?
; p o i n t eo r
: p o i n t eo r
& pushed BP
EdgeScan s t r u c t uflroeerf t
EdgeScan s t r u c t urf iorgeer hd tg e
; O f f s e t sf r o m
BP i n s t a c k f r a m e o f l o c a l v a r i a b l e s .
-4
: c u r r e n t X c o o r ds ioniuinam
rtceaeg e
1 SourceX
equ
1 SourceY
equ
-8
: c u r r e n t Y c o o r dsioniunai m
rtceaeg e
1SourceStepX
equ
-12
;X s t esi npo u r ci m
e a gf oe r
1SourceStepY
equ
-16
;Y s t esi nop u r ci m
e a gf oe r
X d e s t eopf
X d esstteopf
edge
1 069
lXAdvanceByOne
equ
1XBaseAdvance
equ
1YAdvanceByOne
equ
1YBaseAdvance
equ
1 pixel
minimum
number
1 pixel
LOCALLSIZE
equ
.code
e x t rJni x e d M u: nl e a r- ,F i x e d D i v : n e a r
align
2
ToScanDone:
ScanDone
jmp
public 3canOutLine
2
align
-ScanOutLine
near
proc
f; rparscm
etaasel blecepkrrvp' esu s h
mov fsrtaaom
cukerb;tppoo. si npt
sub
sp.LOCAL-SIZE
: a l l o c ast pe a fclooercvaal r i a b l e s
push: p r cerasevsleglaireirvsri e'at sebrl e s
di push
; N o t h i n g t o do i f d e s t i n a t i o n i s f u l l y
X clipped.
mov
di.[bpl.RightEdge
mov
s i , Cdi 1.DestX
cmp
.[-C
s il i p M i n X ]
jlToScanDone
e
; r i g h t edge i st ol e f ot cf l i pr e c t .
mov
bx,[bpl.LeftEdge
mov
dx.[bxl.OestX
cmp
dx,
[LC1 ipMaxXl
jge
ToScanDone
; l e f t edge i tsroi g hoct f l i rpe c t ,
; d e s it.i dn xastui ob n
fill w i d t h
jle
ToScanDone
; nnuoelr lg a t i fvuwelild t h ,
mov
mov
mov
mov
of
so done
s o done
s o done
X coordinate
mov
axsC
wpbotxrrdl . S o u r ;cienYist oi aulr c e
Y coordinate
mov
word Cp bt rp l . 1 S o u r c e Y . a ~
mov
a x . w[ pbotxrrdl . S o u r c e Y + Z
mov
word [ pb tprl . l S o u r c e Y + 2 . a x
; C a l c u l a t es o u r c es t e p st h a tc o r r e s p o n dt oe a c h1 - p i x e ld e s t i n a t i o n
X step
: ( a c r o s st h ed e s t i n a t i o ns c a nl i n e ) .
si
d; peusst h
Xf i xwei nd tpfho,ri nmt
push
ax, ax
sub
ax
0 as f r a c t i dopenoasaf rtl t
X width
push ;push
ax.word p t r[ d i ] . S o u r c e X
mov
sub
ax.word p t[ rb p ] . l S o u r c e ;Xl o w o rodsfo u r c e
X width
mov
dx.word p t r[ d i ] . S o u r c e X + Z
sbb
X width
dx.word p t [r b p ] . l S o u r c e X + Z; h i g hw o r do sf o u r c e
push source
:push
dx
X f iwx iedndt phf o, irnmt
push
ax
call
-F i x e d D:isvc asl eo u r c e
X w i d ttdhoe s t
X width
add psat;arcacfl rem
kosaepm
rt.e8r s
mov
X s t e pf o r
word p t r[ b p ] . l S o u r c e S t e p X . a x; r e m e m b e rs o u r c e
mov
; 1 - p i x edl e s t i n a t i o n
X step
word p [t br p ] . l S o u r c e S t e p X + 2 , d x
mov
CX.1
;assume s o u r c e X a d vnaonnc-ense g a t i v e
X advance?
dx.dx
:which
way source
does
and
1070
Chapter 57
:negative
jns
neg
cmp
jz
inc
SourceXNonNeg
;non-negative
cx
wteshhitxon:aeailtaxspeec.ntg0l ey r ?
SourceXNonNeg
:yes
dx
SourceXNonNeg:
mov
mov
:no. t r u indntcithraoeietgnecetori fo n
: 0 . b e c a u s eo t h e r w i s ew e ' l le n du pw i t h
: w h o l es t e po f1 - t o o - l a r g em a g n i t u d e
Cbpl 1 XAdvanceByOne, cx
Cbpl.1XBaseAdvance.d~
:amount t o add t o s o u r c e p o i n t e r t o
in X
:minimumamount
t o add t o s o u r c e
: p o i n t e r t o advance i n X e a c ht i m e
: t h ed e s ta d v a n c e s
one i n X
d e s t: p u s h f i xfeodr Ym
p ionhi ne ti g h t .
: move byone
push
si
sub
ax, ax
push :push
ax
0 as f r a c t ipoanrat l
o f dest Y height
mov
ax.word p t r[ d i l . S o u r c e Y
sub
ax.word p t r Cbp].lSourceY
:low
word
osf o u r c e
Y height
mov
dx.word p t r Cdil.SourceY+Z
sbb
dx.word p t r [ b p l . l S o u r c e Y + E: h i g hw o r dosf o u r c e
Y height
push
f i xY e hdf ioepnriom
gi hn tt ,
s o:uprucsehd x
push
ax
call
- F i x e d: sDscioav ul er c e
Y h de eitgo
sht t
X width
s t a c k pf raoradd
m
a m: celteear rs s p . 8
mov
word p t r[ b p l . l S o u r c e S t e p Y . a x
;remembersource
Y stepfor
mov
word p t r [bp].lSourceStepY+2,dx
: 1 - p i x edl e s t i n a t i o n
X step
mov
cx.[-TexMapWidth]
:assume
source
Y a d v a n cneosn - n e g a t i v e
and
:which
dx dx,
waysource
does
Y advance?
j n;non-negative
s
SourceYNonNeg
cx
neg
e xscmp
w
at che toptlhlyee : i sa x . 0
an i n t e g e r ?
:yes
SourceYNonNeg
jz
d iirtnehoccieftntrit:uonenntgoced.arxt e
: 0. b e c a u s eo t h e r w i s ew e ' l le n du pw i t h
a
: w h o l es t e po f1 - t o o - l a r g em a g n i t u d e
SourceYNonNeg:
mov
Cbpl.lYAdvanceBy0ne.c~:amount
t o add t os o u r c ep o i n t e rt o
; move byone
in Y
mov
: m i n i m u md i s t a n c es k i p p e di ns o u r c e
ax.[-TexMapWidthl
imu1
dx
: imagebitmap when Y s t e p s( i g n o r i n g
mov
C b ~ 1 . l Y B a s e A d v a n c e . a ~ : c a r r yf r o mt h ef r a c t i o n a lD a r t )
: Advance112 s t e p ' i n t h e s t e p p i n g d i r e c t i o n ,
t o s p a c es c a n n e dp i x e l ;e v e n l y
: b e t w e e nt h el e f ta n dr i g h te d g e s .( T h e r e ' s
a s l i g h ti n a c c u r a c yi nd i v i d i n g
: n e g a t i v en u m b e r sb y
2 b ys h i f t i n gr a t h e rt h a nd i v i d i n g ,b u tt h ei n a c c u r a c y
: isintheleastsignificantbit,
and w e ' l l j u s t l i v e w i t h
it.)
mov
ax,word p t r Cbpl.lSourceStepX
mov
dx.word p t r [bp].lSourceStepX+E
dx.1 sar
ax,l rcr
word
add
[pbtpr l . l S o u r c e X . a x
word
adc
p t r [bp].lSourceX+E.dx
mov
ax.word p t r [bp].lSourceStepY
mov
dx.word p t r [bp].lSourceStepY+E
dx,l sar
ax.1 rcr
word
add
[pbtpr ] . l S o u r c e Y . a x
word
adc
p t r Cbp].lSourceY+Z.dx
: C l i p r i g h t edge i f n e c e s s a r y .
mov
s i [, d i 1.DestX
1 071
cmp
s i ,[LC1 ipMaxX1
jl
RightEdgeClipped
mov
s i , [LC1 ipMaxXl
RightEdgeClipped:
; C1 i p l e f t edge i f n e c s s a r y
mov
bx.Cbp1.LeftEdge
mov
. d[ bi. xDl e s t X
cmp
d i , [LC1
ipMinX1
L e f t E djggeeC l i p p e d
; L e f tc l i p p i n gi sn e c e s s a r y ;a d v a n c et h es o u r c ea c c o r d i n g l y
di
neg
add
; C l i p M i n X - DestX
d i , C-Cl i p M i n X 1
; f i r s t . a d v a n c et h es o u r c e
in X
push
;pushClipMinX
- DestX. i nf i x e d p o i n tf o r m
di
sub
ax.ax
push
;push 0 as f r a c t i o n a lp a r to fC l i p M i n X - D e s t X
ax
push
SourceStepX+E
word p t r [ b p l .
SourceStepX
push
word p t r [ b p l .
call
-F i xedMul
X s t e p p i n gi nc l i p p e da r e a
; t o t a ls o u r c e
add
; c l e a rp a r a m e t e r sf r o ms t a c k
SP.8
add
X p a sctl i p p i n g
word p t r [ b p l . 1 S o u r c e X . a; sx t et phseo u r c e
adc
word p t r [ b p l . 1 SourceX+2,dx
;now a d v a n c et h es o u r c e
in Y
push
;pushClipMinX
- DestX. i nf i x e d p o i n tf o r m
di
sub
ax, ax
push
;push 0 as f r a c t i o n a lp a r to fC l i p M i n X - D e s t X
ax
push
word
pCtbr p ] . l S o u r c e S t e p Y + 2
push
word
p[ bt rp l . l S o u r c e S t e p Y
- Fsi xoc; etuaodrl tlcM
aelu l
Yc laisrptepeiaepn dp i n g
f r opm
a r a m e t; ec lres a r
sp.8
add
add
word
p[ tbrp ] . l S o u r c e Y . a: sxt et hspeo u r c e
Y p a sc tl i p p i n g
word
adc
[pbtpr l . l S o u r c e Y + Z . d x
mov
d i , [LC1 ipMi nX1
; s t a r t X c o o r d i n adi tenaesf ttcel ri p p i n g
LeftEdgeCl ipped:
: C a l c u l a t ea c t u a lc l i p p e dd e s t i n a t i o nd r a w i n gw i d t h .
sub
si,di
; S c a na c r o s st h ed e s t i n a t i o ns c a nl i n e ,u p d a t i n gt h es o u r c ei m a g ep o s i t i o n
; accordingly
: P o i n tt ot h ei n i t i a ls o u r c ei m a g ep i x e l ,a d d i n g
0 . 5 t o b o t h X and Y s o t h a t
; we c a n t r u n c a t e t o i n t e g e r s f r o m
now on b u t e f f e c t i v e l y g e t r o u n d i n g .
0.5
add
word
p t r Cbpl.1SourceY.8000h
;add
mov
ax.word Cpbt rp ] . l S o u r c e Y + 2
ax.0 adc
mu1
[ LsT
im
oeuaxrgM
ci neea lpi W
:nsi enci dai ttnihall
add
word
p[ bt rp l . l S o u r c e X . 8 0 0;0ahd d
0.5
mov
b x . w o rpd[tbr p ] . l S o u r c e X +;Eo f f s ei nt tsoo u r csec al inn e
i msaoguer c e i n o f fssoe;uti rnci etbi xa .l a x a d c
[LTexMapBi
bx,
add
tsl
;DS:BX p o ti hnt oeit tsi a l
i m pa igxee l
; P o i n tt oi n i t i a ld e s t i n a t i o np i x e l .
ax.SCREEN_SEG
mov
mov
,axes
ax,SCREENLWIDTH
mov
; o f f s e to fi n i t i a ld e s ts c a nl i n e
mu1
[LDestY 1
; i n i t i a ld e s t i n a t i o n
mov
cx.di
X
di.l shr
di,l shr
;X/4
offsetofpixelinscanline
add
d i ,ax
: o f f s e to fp i x e li n
page
di,[-CurrentPageBasel
add
; o f f s e to fp i x e li nd i s p l a y
memory
; E S : D I now p o i n t s t o t h e f i r s t d e s t i n a t i o n p i x e l
1072
Chapter 57
and
mov
mov
dox;up. taot hli ne t
mov
c, lO l l b
;CL
a1 ,MAP_MASK
d x , SC-INDEX
p i x e l ' ps l a n e
Within Listing 57.2, all the important optimization is in the loop that draws across
each destination scan line, near theend of the listing. One optimization is elimination of the call to the set-pixel routine used to draw
each pixel in Listing 57.1. Function
calls are expensive operations, to be avoided when performance matters. Also, although Mode X (the undocumented 320x240 256-color VGA mode X-Sharp runs
in) doesnt lenditself well to pixel-oriented operations like line drawing or texture
mapping, t,he inner loop has been set up to minimize ModeXs overhead. A rotating
plane mask is maintained in AL, with DX pointing to the Map Mask register; thus,
only a rotate and anOUT are required toselect the plane to which to write, cycling
from plane 0 through plane 3 and wrapping back to 0. Better yet, because we know
that were simply stepping horizontally acrossthe destination scan line, we can use a
clever optimization to both step the destination and reduce theoverhead of maintaining the mask. Two copies of the currentplane mask are maintained,one ineach
nibble ofAL. (The Map Maskregister pays attention only to the lower nibble.) Then,
when one copy rotates out of the lower nibble, theother copy rotates into thelower
nibble and is ready to be used. This approach eliminates the need to test for the
mask wrapping from plane3 to plane0, all the moreso because a carry is generated
when wrapping occurs, and that carry can be added to DI to advance the screen
pointer. (Check out the next chapter, however, to see the best Map Mask optimization of all-setting it once andleaving it unchanged.)
In all, the overhead of drawing each pixel is reduced from a call to the set-pixel
routine andfull calculation of the screen address and plane mask to fiveinstructions
and nobranches. This is an excellent example of converting full, from-scratch calculations to incrementalprocessing, whereby onlyinformation thathas changed since
the last operation (the plane mask moving one pixel, for example) is recalculated.
Incremental processing and knowing where the cycles go are both importantin the
final optimization in Listing 57.2, speeding up the retrieval of pixels from the texture map. This operation looks very efficient in Listing 57.1, consisting of only two
adds and themacro GET- IMAGE-PIXEL. However, those adds arefixed-point adds,
so they takefour instructions apiece, and themacro hidesnot only conversionfrom
fixed-point to integer, but also a time-consuming multiplication. Incremental approaches areexcellent at avoiding multiplication, because cumulative additions can
often replace multiplication. Thats the case with stepping through the source texture in Listing 57.2; ten instructions, with a maximum of two branches, replace all
the texture calculations of Listing 57.1. Listing 57.2 simply detects when the fractional part of the source x or y coordinate turnsover and advances the source texture
pointer accordingly.
As you might expect, all this optimization is pretty hard to implement, and makes
Listing 57.2 much more complicated than Listing 57.1. Is it worth the trouble? Indeed itis. Listing 57.2 is more than twice as fast as Listing57.1, and thedifference is
very noticeable when large, texture-mapped areas are animated. Whether more than
1074
Chapter 57
Previous
Home
Next
1 075
Previous
chapter 58
heinlein's crystal ball,
spock's brain, and the
9-cycle dare
Home
Next
1079
1080
Chapter 58
bitmaps, can represent more detail and look more realistic than dozens or even
hundreds of solid-color polygons. My X-Sharp texture mapper was in reasonable
assembly-pretty good code, by most standards!-and I felt comfortable with my
implementation; but thenI got a letter from
John Miles, whowas at thetime getting
seriously into 3-D and is now the authorof a 3-D game library. (Yes,you can license it
from his company, Non-Linear Arts,
if youd like; John can be reached at
70322.2457@compuserve.com.)John wrote me as follows: Hmm, so thats how texture-mapping works. But 3 jumps perpixel! Hmph!
It was the Hmph that
really got to me.
Left-Brain Optimization
That was the first shot ofjuice for
my optimizer (or atleast blow tomy ego, which can
be just as productive). John went on to say he had gotten texture mapping
down to
9 cycles per pixel and one jump per
scanline on a 486 (all cycle times will be for the
486 unless otherwise noted); given that my code took, on average, about 44 cycles
and 2 taken jumps (plus 1 not taken) perpixel, I had a long
way to go.
The innerloop of myoriginal texture-mapping codeis shownin Listing 58.1. All this
code does is draw a single texture-mapped scanline, as shown in Figure 58.1; an
outer loop runs throughall the scanlines in whatever polygon is being drawn. I immediately saw that Icould eliminate nearly 10 percent of the cycles byunrolling the
loop; obviously,John had done that,else theres no way he could branchonly once
per scanline. (By the way, branching only once perscanline via a fully unrolled loop
is not generally recommended. A branch every few pixels costs relativelylittle, and
the cacheeffects of fullyunrolled codeare not good.) I quickly came up with several
108 1
1.ASM
I n n e rl o o pt od r a w
a s i n g l et e x t u r e - m a p p e dh o r i z o n t a ls c a n l i n ei n
Mode X . t h e VGA's p a g e - f l i p p e d2 5 6 - c o l o r mode. Becauseadjacent
pixelslieindifferentplanesin
Mode X . an OUT mustbeperformed
t os e l e c tt h ep r o p e rp l a n eb e f o r ed r a w i n ge a c hp i x e l .
: A t t h i sp o i n t :
AL
initialpixel'splane
mask
initialsourcetexturepointer
DS:BX
DX
p o i n t e r t o VGA's S e q u e n c e rD a t ar e g i s t e r
SI
# o f p i x e l s t o fill
ES:DI
pointertoinitialdestinationpixel
--
TexScanLoop:
: S e tt h e
d r a wt h ep i x e l .
dx.al out
mov t e:pxgCitexubaterxhell ,
mov
e s : [sdc;pisrlie.xaetehnl
; P o i n tt ot h en e x ts o u r c ep i x e l .
add
Cbpl.1
bx. XBaseAdvance
:advance
t h e minimum il opfi x e li sn
X
mov
cx.word [pbtpr l . l S o u r c e S t e p X
add
word
p[ tbrp l . l S o u r c e X . c; sxt et hspeo u r c e
X f r a c t i o n ap la r t
jnc
NoExtraXAdvance
; d i d n ' tt u r no v e r :n oe x t r aa d v a n c e
bx.Cbpl.1XAdvanceByOne
add
: tduoi rdvanedr v; a n c e
X one e x t r a
NoExtraXAdvance:
add
bx.[bpl.lYBaseAdvance
:advance
minimum
the
mov
cx,word p t r Cbp1.lSourceStepY
add
word
p[ b
t rp l . l S o u r c e Y . ;csxt et hspeo u r c e
jnc
N o E xatedr a
xvtY
arnn
A
aoocdve:vteduarirnd
:ncne' t
bx.[bpl.lYAdvanceByOne
add
:did
NoExtraYAdvance:
# po if x ei lns
Y f r a c t i o n ap la r t
t ou vranedrv: a n c e
: P o i n tt ot h en e x td e s t i n a t i o np i x e l ,b yc y c l i n gt ot h en e x tp l a n e ,
: advancing t o t h e n e x t a d d r e s s
i f t h ep l a n ew r a p sf r o m
3 t o 0.
rol
adc
Y one e x t r a
and
a1 .1
di.0
: C o n t i n u e i f t h e r ea r e
dec
jnz
anymore
destpixelsto
draw.
si
TexScanLoop
Figure 58.2 shows why this cycling is necessary. In Mode X, the page-flipped 2 5 6
color modeof the VGA, each successive pixel across a scanlineis stored in a different
hardware plane, and an OUT to the VGA's hardware is needed to select the plane
being drawn to. (See Chapters 47, 48, and 49 for details.) An OUT instruction by
1082
Chapter 58
Pixels on Screen
Display Memory
itself takes 16 cycles (and in the neighborhood of 30 cycles in virtual46 or nonprivileged protected mode), and an
ROL takes 2 more, for a total of 18 cycles, double
Johns 9 cycles,just to handle plane management. Clearly, getting plane controlout
of the inner loopwas absolutely necessary.
I must confess, with some embarrassment, that at this point I threw myself into designing a solution that involved executing the texture mapping code to
upfour times
per scanline, once For the pixels in each plane. Its hard to overstate the complexity
of this approach, which involves quadrupling the normalpixel-to-pixel increments,
adjusting the start value for each of the passes, and dealing with some nastyboundary cases. Make no mistake, the code was perfectly doable, and would in fact have
gotten plane control out of the inner loop, butwould havebeen very difficult to get
exactly right, and would have suffered from substantial overhead.
Fortunately, inthe last sentence I was able to say would have,not was, becausemy
friend Chris Hecker (checker@bix.com) came along to toss a figurative bucket of
cold wateron my right brain, which was evidently asleep.(Or possibly stolen by scantilyclad, attractive aliens; remember SpocksBrain?) Chris is the author of the WinG
Windows game graphics package, available from Microsoft via FTP, CompuServe, or
MSDN Level 2; if, like me, you were at the Game Developers Conference in April
1994, you, along with everyone else, were stunned to see Ids megahit DOOM running atfull speed in a window, thanks to WinG. If you write gamesfor a living, run,
dont walk, to check WinG out!
Heinleins Crystal Ball, Spocks Brain, and the 9-Cycle Dare
1083
Chris listened to my proposed design for all of maybe 30 seconds, growing visibly
more horrifiedby the moment, before he
said, But why dont you just draw vertical
rather than horizontal scanlines?
W h y indeed?
158-2.ASM
I n n e rl o o pt od r a w
a s i n g l et e x t u r e - m a p p e dv e r t i c a lc o l u m n ,r a t h e r
t h a n a h o r i z o n t a ls c a n l i n e .T h i sa l l o w sa l lp i x e l sh a n d l e d
by t h i s code t o r e s i d e i n t h e
same p l a n e , so t h et i m e - c o n s u m i n g
p l a n es w i t c h i n gc a nb e
moved o u t o f t h e i n n e r l o o p .
: A t t h i sp o i n t :
- -
DS:BX
i n i t i a ls o u r c et e x t u r ep o i n t e r
DX
o f f s e tt oa d v a n c et ot h en e x tp i x e li nt h ed e s tc o l u m n
- #-
(eitherpositiveornegativescanlinewidth)
o f p i x e l s t o fill
ES:DI
p o i n t e rt oi n i t i a ld e s t i n a t i o np i x e l
YGA s e t up t o draw t o t h e c o r r e c t p l a n e f o r t h i s c o l u m n
SI
REPT
LOOP-UNROLL
: S e tt h e
Map Mask f o r t h i s p i x e l s p l a n e , t h e n d r a w t h e p i x e l .
mov
mov
ah.Cbx1
es:Cdil.ah
: g e tt e x t u r ep i x e l
: s e ts c r e e np i x e l
: P o i n tt ot h en e x ts o u r c ep i x e l .
add
bx.[bpl.lXBaseAdvance
:advance
minimum
the
mov
cx.word Cpbt rp l . 1 S o u r c e S t e p X
add
word
p[ bt rp ] . l S o u r c e X . :csxt et hspoe u r c e
N o Ej naxcedt rxvataX
r naAocdve:tveduarir:dnnnc e t
ab dx .d[ b p l . l X A d v a n c e B y O:todunvairedendrv: a n c e
1084
Chapter 58
I/poifx ei lns
X f r a c t i o n ap la r t
X one e x t r a
NoExtraXAdvance:
# o f p i x eilns
Y
bx,[bp].lYBaseAdvance
add
:advance
minimum
the
mov
cx.word [ pb tprl . l S o u r c e S t e p Y
Y fractionp
a al r t
add
word
p[ tbrp l . l S o u r c e Y . :csxt et hspeo u r c e
no e x t r a advance
jnc
NoExtraYAdvance
o: dvt uiedrrn: ' t
bx.[bpl.lYAdvanceByOne
add
Y one e x t r a
: d i dt u r no v e r :a d v a n c e
NoExtraYAdvance:
: P o i n tt ot h en e x td e s t i n a t i o np i x e l ,w h i c hi s
on t h en e x ts c a nl i n e .
di,dx adc
ENDM
That k what Zen programming is all about, though; tying together two pieces of
seemingly unrelated information to good effect-and that's what I hadfailed todo.
Like Robert Heinlein-like all of us-I had viewedthe world through afilter composed of my ingrained assumptions, and one of those assumptions, based on all
my past experience, was that pixel processingproceeds left to right. Eventually, I
might have come up with Chris k approach; but I would only have come up with it
when andif1 relaxed andstepped back a little, and allowedmyself"a1most dared
myself-to think of it. Whenyou 're optimizing, be sure to leave quiet, nondirected
time in which to conjure up those less obvious solutions, and periodically try to
figure out what assumptions you 're making-and then question them!
~~~1
'p$
Figure 58.3
1085
There area few complications with Chriss approach, not least that
X-Sharps polygon-fillingconvention (top andleft edges included, bottom and right
edges excluded)
is hard to reproduce for column-oriented texture mapping.
I solved this in X-Sharp
version 22 by tweaking the edge-scanning code to allow column-oriented texture
mapping to match the current convention. (Youll find X-Sharp 22 on the listings
diskette in the directory forthis chapter.)
Chris also illustrated another important principleof optimization: A second pairof
eyes is invaluable. Even the best of us haveblind spotsand get caughtup in particular implementations;if you bounce your ideas
off someone, you may wellfind them
coming back with an unexpected-and welcome-spin.
Thats Nice-But
Excellent as Chriss suggestion was, I still had work to do: Listing 58.2 is still more
than twice as slow John
as
Miless code. Traditionally, I start the optimization
process
with algorithmic optimization, then try to tie the algorithm and the hardware together for maximum efficiency, and finish up with instruction-by-instruction,
take-no-prisoners optimization. Weve already done thefirst two steps, so its time to
get down to the bare metal.
Listing 58.2contains three functionalparts: Drawing the pixel, advancing the destination pointer, and advancing the source texture pointer.Each of the three partsis
amenable to further acceleration.
Drawing the pixel is difficult to speed up, given that it consists of only two instructions-diffkult, but not impossible. True, the instructions themselves are indeed
irreducible, but if we can get ridof the ES: prefix (and, as we shall see, we can), we
can rearrange the code to
make it run faster on the Pentium. Without aprefix, the
instructions executeas followson the Pentium:
MOV
AH.CBX1
MOV
[DII.AH
:cycle 1 U-pipe
; c y c l e 1 V - p i p ei d l e ;
;cycle 2 U-pipe
reg c o n t e n t i o n
The second MOV, being dependent on thevalue loaded into AH by the first MOV,
cant execute until the first MOV is finished, so the Pentiums second pipe, the V-pipe,
lies idle for cycle.
a
We can reclaim that cycle simplyby shuffling another instruction
between the two MOVs.
Advancing the destination pointeris easyto speed up: Just build offset
the from one
scanline to the next into each pixeldrawing instruction
as a constant,as in
MOV [EDI+SCANOFFSETI.AH
1086
Chapter 58
source pointer, with the source texture coordinatesand increments stored in 16.16
(16 bits of integer, 16 bits of fraction) format.The source coordinates are storedin a
slightly unusual format, whereby the fractional X and Y coordinates are stored and
advanced separately, but asingle integer value, the source pointer, is used to reflect
both the X and Y coordinates. In Listing 58.2, the integer and fractional parts are
added into the current coordinates
with four separate16-bit operations, and carries
from fractional to integer parts are detected
via conditional jumps,as shownin Figure 58.4. There's quite a lot we can do to improve this.
J-
JCarry from
fractional addition?
I
Yes
No
Advance source pointer
one more pixel in X
Add integer Y
increment
to source pointer
-I
.1
Add fractional Y increment
to fractional Y coordinate
J-
Carry from
fractional addition?
I
Yes
No
Advance source pointer
one more pixel in Y
1087
First, we can sum the X andY integer advance amounts outside the loop, then add
them both to the source pointer with a single instruction. Second, we can recognize
that X advances exactly one extra byte when its fractional part carries, and use ADC
to account for X carries, as shown in Figure 58.5. That single ADC can add in not
only any X carry, but both the X and Y integer advance amounts as well, thereby
eliminating a good chunkof the source-advance code inListing 58.2. Furthermore,
we should somehow be able to use 32-bit registers and instructions to help with the
32-bit fixed-point arithmetic; true, thesize override prefix (because were in a 16-bit
segment) will cost a cycle per 32-bit instruction, but thats better than the 3cycles it
takes to do 32-bit arithmetic with 16-bit instructions. It isnt obvious, but theres a
nifty trick we can use here, again courtesy of Chris Hecker (who, as you can tell, has
done a fair amount of thinking about thecomplexities of texture mapping).
We can store the currentfractional parts of both the X and Y source coordinates in
a single 32-bit register, EDX, as shownin Figure 58.6. Its important to note that the
Y fraction is actually only15 bits, withbit 15 of EDX always
kept at zero; this allowsbit
15 to store thecarry status from eachY advance. We can similarly store the fractional
X and Y advance amounts in ECX, and can store thesum of the integer partsof the
X and Y advance amounts inBP. With this arrangement, thesingle instruction ADD
EDX,ECX advances the fractional partsof both X and y and the following instruction
J
Add fractional X increment
fractional
to
X coordinate
Y coordinate
JCarry from
fractional
addition?
1088
Chapter 58
I
I
Fractional Y Carry
1
Fractional X
Coordinate (1 6 bits)
Bit 31
Fractional Y
Coordinate (1 5 bits)
Bit 16
Bit 14
Bit 0
Bit 15
ADC S1,BP finishes advancing the source pointer in X. Thats a mere3 cycles, and all
that remains is to finish advancing the source pointer in Y
Actually, wealso advanced the source pointer by the Yinteger amount back when we
added BP to SI; all thats left is to detect whether our addition to the Y fractional
current coordinate produced a carry. Thats easily done by testing bit 15 of EDX; if
its zero, therewas no carry and were done; otherwise, Y carried, so we have to reset
bit 15 andadvance the source pointer by one scanline. The resulting program flow is
shown in Figure 58.7. Note that unlike the X fractional addition, we cant get away
with just adding in the carry from the Y fractional addition, because when the Y
fraction carries, it indicates a move not from one pixel to the next on ascanline (a
single byte), but rather from one
scanline to the next (a full scanline width).
All of the above optimizations together getus to 10 cycles--very close to John Miles,
but not thereyet. We have one more trick up our sleeve, though: Suppose we point
SS to the segment containing our textures, and point DS to the screen? (This requires either setting up a stack in the texture segment or ensuring that interrupts
and other stack activity cant happen while SS points to that segment.) Then, we
could swap the functionsof SI and BP; that would let us useBP, which accessesSS by
default, to get at the textures, and DI to access the screen-all with no segment
prefixes at all. By gosh, that would get us exactly one morecycle, and would bring us
down to the same 9 cyclesJohn Miles attained; Listing 58.3 shows that code.At long
last, the Holy Grail attained and our honor defended,
we can rest.
Or can we?
LISTING 58.3158-3.ASM
: I n n e rl o o pt od r a w
: r a t h e rt h a n
a s i n g l et e x t u r e - m a p p e dv e r t i c a lc o l u m n ,
a h o r i z o n t a ls c a n l i n e .M a x e d - o u t1 6 - b i tv e r s i o n .
: A t t h i sp o i n t :
AX = s o u r c ep o i n t e ri n c r e m e n tt oa d v a n c eo n ei n
Y
E C X = f r a c t i o n a l Y advance i n l o w e r 1 5 b i t s o f C X .
f r a c t i o n a l X advance i nh i g hw o r d
o f ECX. b i t
15 s e t t o 0
1089
4
Add integer X increment, integerY
increment, and carry from last
operation to source pointer with ADC
SCANOFFSET-0
REPT LOOP~UNROLL
mov
mov
b l ,[ b p l
[di+SCANOFFSETl,bl
edx.ecx
add
: g e tt e x t u r ep i x e l
; s e ts c r e e np i x e l
; a d v a n c ef r a c
Y i n DX,
; f r a c X i nh i g hw o r do f
bapd, cs i
test
jz
dh,80h
@F
bp,ax
add
and
dh.not
80h
1090
Chapter 58
EDX
; a d v a n c es o u r c ep o i n t e rb yi n t e g r a l
; X & Y amount, a l s oa c c o u n t i n gf o r
; c a r r yf r o m
X f r a c t i o n a la d d i t i o n
; c a r r yf r o m
Y f r a c t i o n a la d d i t i o n ?
:no
;yes.advance
Y byone
; r e s e tt h e Y f r a c t i o n a l c a r r y b i t
@@:
SCANOFFSET
SCANOFFSET + SCANWIDTH
ENDM
a s i n g l et e x t u r e - m a p p e dv e r t i c a lc o l u m n ,
a h o r i z o n t a ls c a n l i n e .M a x e d - o u t3 2 - b i tv e r s i o n .
: A t t h i sp o i n t :
EAX = sum o f i n t e g r a l X & Y s o u r c ep o i n t e ra d v a n c e s
ECX
EDX
1 5 b i t s o f D X , f r a c t i o n a ls o u r c et e x t u r e
i nh i g hw o r d
o f E D X . b i t 15 s e t t o 0
X coord
1091
--
ESI
ED1
EBP
i n i t i a ls o u r c et e x t u r ep o i n t e r
i n i t i a ld e s t i n a t i o np o i n t e r
f r a c t i o n a l Y advance i n l o w e r 15 b i t s o f B P .
f r a c t i o n a l X advance i n h i g h w o r d o f
EBP. b i t
15 s e t t o 0
SCANOFFSET-0
REPT LOOP-UNROLL
mov
b l , Cesi 3
add
edx,
ebp
adc
e s. ei a x
mov
Cedi+SCANOFFSETl , b l
test
jz
add
dh.8Oh
s h o r t @F
esi .ecx
and
dh.not
80h
; g e ti m a g ep i x e l
:advance f r a c Y i n D X ,
; f r a c X i n h i g hw o r do f
EDX
; a d v a n c es o u r c ep o i n t e rb yi n t e g r a l
; X & Y a m o u n t ,a l s oa c c o u n t i n gf o r
; c a r r yf r o m
X f r a c t i o n a la d d i t i o n
; s e ts c r e e np i x e l
; ( l o c a t e dh e r et oa v o i d4 8 6
; A G I f r o mp r e v i o u sb y t eo p )
; c a r r yf r o m Y f r a c t i o n a l a d d i t i o n ?
;no
;yes.advance
Y byone
; ( p r o d u c e sP e n t i u m
A G I f o r MOV B L . [ E S I ] )
; r e s e tt h e Y f r a c t i o n a l c a r r y b i t
@e:
SCANOFFSET
- SCANOFFSET + SCANWIDTH
ENDM
And there you have it: A five to 10-times speedup of a decent assembly language
texture mapper. All it took was some help frommy friends, a good, stiffjolt
of rightbrain thinking,and some solid left-brain polishing-plus the knowledge that such a
speedup was possible. Treat every optimization task as if John Miles has just written
to inform you that hes made it faster than your wildest dreams, and youll be amazed
at what you can do!
;cycle
;cycle
;cycle
;cycle
;cycle
1 U-pipe
1 V-pipe
2 i d l e A G I on E S I
3 U-pipe
3 V-pipe
However, I dont see any way to eliminate this last AGI, which happens abouthalf the
time; even withit, the Pentium execution
time for Listing 58.4is 5.5 cycles. Thats 61
1092
Chapter 58
Previous
Home
Next
nanoseconds-a highly respectable 16 million texture-mapped pixels per secondon a90 MHz Pentium.
The type of texture mapping discussed in both this and earlier chapters doesnt do
perspective correction when mapping textures. Why that is and how to handle perspective correction is a topic for a whole separate book, but be aware that thetextures
on some large polygons (not thepolygon edges themselves) drawn withthe code in
this chapter will appear to be unnaturally bowed, although small polygons should
look fine.
Finally, we never did get rid of the last jump in the texture mapper, yet John Miles
claimed no jumps at
all. How did he doit? Im not sure, butId guessthat he used a
two-entry look-up table, based on the Y carry, to decide how much to advance the
source pointer in Y. However, I couldnt come up with any implementation of this
approach that didnt take 0.5 to 1 cycle more than the test-and-jump approach, so
either I didnt come up with an adequately efficient implementation of the table,
John saved a cycle somewhere else, or perhaps John implemented his code in a 32bit segment,but used the less-efficient table in his fervor to get rid of the finaljump.
The knowledge that I apparently came up with a differentsolution than Johnhighlights that the technical aspects of Johns implementation were, in truth, totally
irrelevant to my optimization efforts; the only actual effect Johns code had on me
was to make me belime a texture mappercould run that fast.
Believe it! And while youre at it, give both halves of your brain equal time-and
watch out for aliens in short skirts, 60s bouffant hairdos, and an undue interest in
either half.
1093
Previous
chapter 59
the idea of bsp trees
Home
Next
1097
The name was there in my mind, somewhere; I could feel the shape of it, in that
same back storeroom, if only I could figureout how to retrieveit.
1poked and worried at thatmemory, trying to get to
it come to the surface.
I concentrated on it as hard as I could, and even started going through the alphabet one
letter at a time, trying to remember if her name started with each letter. After 15
minutes, I was wide awake and totally frustrated. I was also farther than ever from
answering the question; all thefocusing on the memory was beginning to blur the
original imprint.
At this point, I consciously relaxed and made myself think about something completely different. Every time my mind returned to the mystery girl, I gently shifted it
to something else. After a while, I began to driftoff to sleep,and as I did a connection was made, and aname popped, unbidden, intomy mind.
Wendy Tucker.
There aremany problems thatare amenable to the straight-ahead,purely conscious
sort of approach that I first tried to use to retrieve Wendys name. Writing code
(once its designed) is often like that, as are some sorts of debugging, technicalwriting, and balancing your checkbook. Ipersonally find these left-brainactivities to be
very appealing because theyre finiteand controllable; when I start one, I know 111
be able to deal with whatever comes up and make good progress, just by plowing
along. Inspirationand intuitive leaps are sometimes useful, but not required.
The problem is, though, that neitheryou nor I will ever do anything great without
inspiration and intuitive leaps, and especially not without steppingaway from whats
known and venturing into territories beyond. The way to do that is not by trying harder
but, paradoxically, by q n g less hard, stepping back, and giving yourright brain room to
work, then listening for and nurturing whatever comes of that. On a small scale,
thats how I rememberedWendys name, and on alarger scale, thats how programmers come up with products that are more than
me-too, checklist-oriented software.
Which, for a coupleof reasons, bringsus neatly to this chapters topic,Binary Space
Partitioning (BSP) trees. First, games are probably the sort of software in whichthe
right-brain elementis most important-blockbuster games are almost always breakthroughs in one way or another-and some very successful games use BSP trees,
most notably id Softwaresmegahit DOOM. Second, BSP trees arentintuitively easy
to grasp, and considerable ingenuity and inventiveness is required to get themost
from them.
Before we begin, Id liketo thankJohn Carmack, the technical wizard behind DOOM,
for generously sharinghis knowledge of BSP trees with me.
BSP Trees
A BSP tree is, at heart, nothing more than a tree thatsubdivides space in order to
isolate features of interest. Each node of a BSP tree splits an area or volume
a
(in 2-D or
1098
Chapter 59
3-D, respectively) into two parts along a line or a plane; thus the name
Binary Space
Partitioning. The subdivision is hierarchical; the root nodesplits the world into two
subspaces, then eachof the roots two children splits one of those two subspaces into
two more parts. This continues with each subspace being further subdivided, until
each component of interest (each line segment or polygon, for example) has been
assigned its own unique subspace. This is, admittedly, a pretty abstract description,
but the workings of BSP trees will become clearer shortly; it may help to glance
ahead to this chapters figures.
Visibility Determination
For the time being, Im goingto discuss onlyone of the many uses ofBSP trees: The
ability of a BSP tree to allow you to traverse a set of line segments or polygons in
back-to-front or front-to-back order as seen from any arbitrary viewpoint. This sort of
traversal can be very helpful in determining which parts of each line segment or
polygon are visible and which are occluded from the current viewpoint in a 3-D
scene. Thus, a BSP tree makes possible an efficient implementation of the painters
algorithm, whereby polygons are drawn in back-to-frontorder, with closer polygons
overwriting more distant ones that overlap, as shown in Figure 59.1. (The line segments in Figure 1(a) and in other figures in this chapter, represent vertical walls,
viewed from directly above.) Alternatively, visibilitydetermination can be performed
by front-to-back traversal workingin conjunction with some method for remembering which pixels have already
been drawn. The latter approachis more complex, but
has the potential benefit of allowing you to early-out from traversal of the scene
database when all the pixels on the screen have been drawn.
Back-to-front or front-to-back traversal in itself
wouldnt be so impressive-there are
many ways to do that-were it notfor one additional detail: The traversal can always
be performed in linear time, as well see later on. For instance, you can traverse, a
polygon list back-to-frontfrom any viewpoint simplyby walking through the corresponding BSP tree once,visiting each node one andonly one time, and performing
only one relatively inexpensive testat each node.
Its hard to get cheapersorting than linear time, and BSP-based rendering stacks up
well against alternatives such as z-buffering,
octrees, z-scan sorting, and polygon sorting. Better yet, a scene database represented as a BSP tree can be clipped to theview
pyramid very efficiently;huge chunksof a BSP tree can be loppedoff when clipping
to the view pyramid, because if the entire area or volume of a node lies entirely
The Idea of BSP Trees
1099
/
C
B. After
Figure 59.1
outside the view volume, then all nodes and leaves that are children of that node
must likewise be outside the view volume, for reasons that will become clear as we
delve into theworkings of BSP trees.
Limitations of
BSP Trees
Powerful as they are, BSP trees arent perfect.By far the greatestlimitation of BSP
trees is that theyre time-consuming to build, enough so that, for all practical purposes, BSP trees must be precalculated, and cannot be built
dynamically at runtime.
In fact, a BSP-tree compiler that attempts to perform some optimization (limiting
the numberof surfaces that need to
be split, for example)can easily takeminutes or
even hours to process large world databases.
A fixed world database is fine forwalkthrough or flythrough applications (where the
viewpoint movesthrough astatic scene), but not much
use for games or virtual reality, where objects constantly move relative to one another. Consequently, various
workarounds have been developed to allow moving objects to appear in BSP treebased scenes.DOOM, for example,uses 2-D sprites mixed into BSP-based 3-D scenes;
note, though, thatthis approach requires maintainingz information so that sprites
can be drawn and occluded properly. Alternatively, movable objects could be represented as separate BSP trees and merged anew into the world BSP tree with each
move. Dynamic merging may or may not be fast enough, depending on the scene,
but merging BSP trees tends to be quicker than building them, because the BSP
trees being merged arealready spatially sorted.
1 100
Chapter 59
Another possibility would be to generate a per-pixel z-buffer for each frame as its
rendered, to allow dynamically changing objects to be drawn into the BSP-based
world. In this scheme, the BSP tree would allow fast traversal and clipping of the
complex, static world,and thez-buffer wouldhandle therelatively localized visibility
determination involving moving objects. The drawback of this is the need for a
memory-hungry z-buffer; a typical 640x480 z-bufferrequires a fairly appalling 600K,
with equally appalling cache-miss implications for performance.
Yet another possibility would be to buildthe world so that each dynamic object falls
entirely within a single subspace of the static BSP tree, rather than straddlingsplitting lines or planes. In this case, dynamic objectscan be treated as points, which are
then just sorted into the
BSP tree on the fly as they move.
The only other drawbacks of BSP trees that I know of are the memory required to
store the tree,which amounts to a few pointers per node,and therelative complexity of debugging BSP-tree compilation and usage; debugging a large data set being
processed by recursive code (which BSP code tends to be) can be quitea challenge.
Tools like the BSP compiler Ill present in the next chapter, which visually depicts
the process of spatial subdivision asa BSP tree is constructed, help a great dealwith
BSPdebugging:
1 101
There areinfinitely valid waysto carve up Figure 59.2, but thesimplest isjust to carve
along the linesof the walls themselves, with each node containing one wall. This is
not necessarily optimal, in the sense of producing the smallest tree, but it has the
virtue of generating the splitting lines without expensive analysis. It also saves on
data storage, because the data for the walls can do double duty in describing the
splitting linesas well. (Putting onewall on each splitting line doesnt
actually create
a unique subspace for each wall, but it does create a uniquesubspace boundary for
each wall; as wellsee, that spatial organization provides for the same unambiguous
visibility ordering as a uniquesubspace would.)
Creating a BSP tree is a recursive process,so well perform the h t split and go from there.
Figure 59.3 shows the world carvedalong the line of wallC into two parts: walls that are
in frontofwall C, and walls that are behind.(Any of the walls would havebeen anequally
1
BSP tree
front child
frontlines
child
back lines
J
Initial split along the line of wall C.
Figure 59.3
1 102
Chapter 59
valid choice for the initial split;we'll return to the issue ofchoosing splittingwalls in the
next chapter.) This splittinginto front and back isthe essential dualismof BSP trees.
Next, in Figure 59.4, the front subspace of wall C is split by wall D. This is the only
wall in that subspace, so we're done with wall C's front subspace.
Figure 59.5 shows the back subspace of wall C being split by wall B. There's a difference here, though: Wall A straddles the splitting line generated from wall B. Does
wall A belong in the front orback subspace of wall B?
fronl
1 103
Both, actually. WallA gets splitinto two pieces, which Ill call wall A and wall E; each
piece is assigned tothe appropriatesubspace and treatedas a separatewall. As shown
in Figure 59.6, each of the split pieces then has a subspace to itself, and each becomes a leaf of the tree. The BSP tree is now complete.
Visibility Ordering
Now that weve successfullybuilt aBSP tree, you might justifiably be a little puzzled
as to how any ofthis helps with visibilityordering. The answer is that each BSP node
can definitively determine which of its child trees is nearer andwhich is farther from
any and all viewpoints; applied throughout the tree,this principle makes it possible
to establish visibility ordering for all the line segments or planes in a BSP tree, no
matter what the viewing angle.
Consider theworld ofFigure 59.2 viewedfrom an arbitrary angle,
as shown inFigure
59.7. The viewpoint is in front of wall C; this tells us that all walls belonging to the
front tree that descends fromwall C are nearer alongevery ray from the viewpoint
than wall C is (that is, they cant be occluded by wall C) . All the walls in wall Cs back
tree are likewise farther away than wall C along any ray. Thus, forthis viewpoint,we
know for sure that
if were usingthe painters algorithm,
we want to draw allthe walls
in the back tree first, then wall C, and then the walls in the front tree.If the viewpoint had been on the
back side of wall C, this order would have been reversed.
Of course, we need more ordering information than
wall C alone can give us, but we
get thatby traversing the tree recursively, making the same far-near decision at each
node. Figure 59.8 shows the painters algorithm (back-to-front) traversal order of
the tree for the viewpoint of Figure 59.7. At each node, we decide whether were
BSP tree
1 104
Chapter 59
~
~
seeing the front or back side of that nodes wall, then visit whichever of the walls
children is on the farside from the viewpoint, drawthe wall, and thenvisit the nodes
nearer child, in that order.Visiting a child is recursive, involving the same far-near
visiting order.
The key is that eachBSP splitting line separatesall the walls in the currentsubspace
into two groups relative to the viewpoint, and every single member of the farther
1 105
group is guaranteed not toocclude every single member of the nearer. By applying
this ordering recursively, the BSP tree can be traversed to provide back-to-front or
front-to-back ordering, with each node beingvisited only once.
The type of tree walk used to producefront-to-back or back-to-front BSP traversal is
known as an inorderwalk. Moreon this very shortly; youre also likelyto find adiscussion of inorder walking in any good data structuresbook. The only special aspect of
BSP walks isthat a decision has to be made at each node aboutwhich way the nodes
wall is facing relative to the viewpoint, so we know which child tree is nearer and
which is farther.
Listing 59.1 shows a function that
draws a BSP tree back-to-front.The decision whether
a nodes wall is facing forward, made by WallFacingForward()in Listing 59.1, can, in
general, be made by generating a normal
to the nodes wall in screenspace (perspective-corrected space as seen from the viewpoint) and checking whether the z
component of the normal is positive or negative, or by checking the sign of the dot
product of a viewspace (non-perspective corrected space as seen from theviewpoint)
normal and a ray from theviewpoint to the wall. In 2-D, the decision can be made by
enforcing theconvention that when a wall is viewed from the front, the start
vertex is
leftmost; then asimple screenspace comparison of the x coordinatesof the left and
right vertices indicates which way the wall is facing.
listing59.1159-1
.C
v o i d WalkBSPTree(N0DE*pNode)
i f (WallFacingForward(pNode) {
i f (pNode->Backchild) {
WalkBSPTree(pN0de->Backchild);
Draw(pNode);
i f (pNode->Frontchild) I
WalkBSPTree(pN0de->Frontchild):
I
J else
i f (pNode->Frontchild) {
WalkBSPTree(pN0de->Frontchild):
Draw(pNode):
i f (pNode->Backchild) {
1 106
WalkBSPTree(pN0de->Backchild):
Be aware that BSP trees can oftenbe made smaller and more efficientby detecting
collinear surfaces (like aligned wall segments) and generating only oneBSP node
for each collinear set, with the collinear surfaces stored in, say, a linked list attached to that node.Collinearsurfacespartition space identically and cant occlude
one another,so it suffices to generate one splitting nodefor each collinear set.
Chapter 59
1 107
/ / T e s t e dw i t h3 2 - b i tV i s u a l
P
in c l ude < s t d l ib. h>
a t r e e ,u s i n gc o d er e c u r s i o n
C++ 1.10.
l i n c l ude tree. h
e x t e r nv o i dV i s i t ( N 0 D E* p N o d e ) :
v o i d WalkTree(N0DE*pNode)
(
I / Make s u r e t h e t r e e i s n t e m p t y
i f (pNode !- NULL)
(
/ / T r a v e r s et h el e f ts u b t r e e .
i f t h e r e i s one
i f ( p N o d e - > p L e f t C h i l d !- NULL)
WalkTree(pNode->pLeftChild):
I / V i s i t t h i s node
Visit(pNode):
/ / T r a v e r s et h er i g h ts u b t r e e .
i f thereis
i f ( p N o d e - > p R i g h t C h i l d !- NULL)
one
WalkTree(pNode->pRightChild);
listing59.3
159-3.H
/ / Header f i l e TREE.H f o r t r e e - w a l k i n g c o d e .
t y p e d e fs t r u c t -NODE I
s t r u c t -NODE * p L e f t C h i l d :
s t r u c t -NODE * p R i g h t C h i l d ;
1 NODE:
Then I ask if they have any idea how to make the code faster; some dont, but
most
point out that function calls are pretty expensive. Either way, I then ask them to
rewrite the function without code recursion.
And then I sit back and squirm for a minimumof 15 minutes.
I have never had anyone write a functional data-recursion inorder walk function in
less time than that, and several people have simply nevergotten the codeto work at
all. Even the best of them have fumbled their way through the code, sticking in a
push here or a pop there, then working through sample scenarios in their head to
see whatsbroken, programmingby trial and erroruntil the errorsseem to be gone.
No one is eversure they have it right; instead, whenthey cant find any more bugs,
they look at me hopefully to see if its thumbs-up or thumbs-down.
And yet,a data-recursive inorder walk implementation has exactly the same flowchart
and exactly the same functionality as the code-recursive version theyve already written. They already have
a fully functional modelto follow, with allthe problems solved,
but they cant make the connection between that modeland thecode theyretrylng
to implement. Why is this?
1108
Chapter 59
Know it Cold
The problem is that these people don't understand inorder walking through and
through. They understand the conceptsof visiting left and right subtrees, and they
have a general pictureof how traversal movesabout the tree, but
they do notunderstand exactly what the code-recursive version does. If they really comprehended
everything that happens in each iteration of WalkTreeO-how each call saves the
state, and what that implies for the order in which operations are performed-they
would simplyand without fuss implement codelike that inListing 59.4, working with
the code-recursive version as a model.
Listing 59.4159-4.C
/ I F u n c t i o nt oi n o r d e rw a l k
a t r e e ,u s i n gd a t ar e c u r s i o n .
/ / No s t a c k o v e r f l o w t e s t i n g i s p e r f o r m e d .
/ / T e s t e dw i t h3 2 - b i tV i s u a l
C++ 1.10.
# i n c l u d e< s t d l i b . h >
#i
n c l ude " t r e e . h"
100
# d e f i n e MAX-PUSHED-NODES
e x t e r nv o i dV i s i t ( N O 0 E* p N o d e ) :
v o i d WalkTree(NO0E*pNode)
(
NODE *NodeStack[MAX-PUSHED_NODESI:
NODE **pNodeStack;
/ / Make s u r et h et r e ei s n ' te m p t y
i f (pNode !- NULL)
--
NodeStackCOl
NULL: / / push"stackempty"value
pNodeStack
NodeStack + 1;
f o r (::)
[
/ / I f t h ec u r r e n tn o d eh a s
a l e f t c h i l d , push
I / t h ec u r r e n tn o d ea n dd e s c e n dt ot h el e f t
/ / childtostarttraversingtheleftsubtree.
/ I Keep d o i n g t h i s u n t i l
we come t o a node
/ / w i t hn ol e f tc h i l d ;t h a t ' st h en e x t
node t o
I / v i s i t i n i n o r d e r sequence
w h i l e( p N o d e - > p L e f t C h i l d !- NULL)
*pNodeStack++
pNode:
pNode
pNode->pLeftChild;
We're a t a node t h a t hasno
leftchild.
v i s i t t h e n o d e ,t h e nv i s i tt h er i g h t
s u b t r e e i f t h e r e i s one. o r t h e l a s t p u s h e dn o d eo t h e r w i s e :r e p e a tf o re a c h
poppednode u n t i l one w i t h a r i g h t
o r we r u no u to fp u s h e d
s u b t r e ei sf o u n d
n o d e s( n o t et h a tt h el e f ts u b t r e e so f
pushednodeshavealreadybeenvisited.
t h e y ' r ee q u i v a l e n ta tt h i sp o i n tt on o d e s
w i t hn ol e f tc h i l d r e n )
SO
so
for (::I
{
Visit(pNode1;
I / I f thenodehas
a r i g h t c h i l d . make
/ / t h ec h i l dt h ec u r r e n tn o d e
and s t a r t
1 109
I!
/I
//
//
//
t r a v e r s i n gt h a ts u b t r e e ;o t h e r w i s e ,p o p
b a c ku pt h et r e e ,v i s i t i n g
nodes we
passedonthe
way down, u n t i l we f i n d a
node w i t h a r i g h t s u b t r e e t o t r a v e r s e
o rr u no u to fp u s h e dn o d e sa n da r ed o n e
i f ( p N o d e - > p R i g h t C h i l d !- NULL)
/ / Currentnodehas
a r i g h tc h i l d :
/ / t r a v e r s et h er i g h ts u b t r e e
pNode->pRightChild:
pNode
break:
so
Pop t h en e x tn o d ef r o mt h es t a c k
we can v i s i t i t andsee
i f it has a
r i g h ts u b t r e et ob et r a v e r s e d
((pNode
*-pNodeStack)
NULL)
/ I S t a c k i s emptyandthecurrentnode
/ / hasno
r i g h tc h i l d :w e r ed o n e
return:
Take a few minutes to look over Listing 59.4and relate it to Listing 59.2.The structure is different, but uponexamination it becomes
clear that bothlistings reflect the
same underlying model: For each node,visit the left subtree,visit the node,visit the
right subtree.And although Listing 59.4is longer, thats mostly because
I commented
it heavily to make sure its workingsare understood; there areonly 13 lines that actually do anything in Listing 59.4.
Lets look at it anotherway. All the code in Listing 59.2 does is say: Here I am at a
node. First Ill visit the left subtreeif there is one, then Ill visit this node, then Ill
visit the right subtreeif there is one. While Im visiting the left subtree, Ill just push
a markeron a stack that tells meto comeback here when theleft subtree is done. If,
after visiting a node, there are no right children
visittoand nothingleft on thestack,
Im finished. The code doesthis at each node-and thats allit does. Thatsall Listing 59.4does, too, but people tendto get tangledup in pushes and pops and while
loops when they use data recursion. When the implementation model changes to
one with which they are unfamiliar, they abandon the perfectly good model they
used before and try to rederive it in the new context by the seatof their pants.
Here S a secret when you refaced with a situation like this: Step back and get a
clear picture of what your code has to do. Omit no steps. You should builda model
that is so consistent and solid that you can instantly answer any question about
how the code should behave in any situation. For example, my intewiavees often
decide, by trial and error, that there are
two distinct types of right children: Right
children visited after popping back to visit a node after the left subtree has been
visited, and right children visited after descending to a node that has no left
child.
1 1 10 Chapter 59
This makes the traversal code a mass of special cases, each of which has to be
detected by the programmer by trying out scenarios. Worse,you can never be sure
with this approach that you 've caught all the special cases.
The alternative is to develop and apply a unlfiing model. There aren 'treally two
types of right children; the rule is that all right children are visited after their
parents are visited, period. The presence or absence of a left child is irrelevant.
The possibility that a right child may be reached via different code paths depending on the presence of a left child does not afect the overall model. While this
distinction may seem trivial it is in fact crucial, because ifyou have the model
down cold, you can always tell if the implementation is correct by comparing it
with the model.
159-5.C
/ / Sampleprogram t o e x e r c i s ea n dt i m et h ep e r f o r m a n c e
of
I1 i m p l e m e n t a t i o n s o f Wal k T r e e 0 .
/ / T e s t e dw i t h3 2 - b i tV i s u a l
C++ 1.10 under Windows NT.
# i n c l u d e< s t d i o . h >
# i n c l u d e< c o n i o . h >
#i
n c l u d e < s t d il b . h >
# i n c l u d e< t i m e . h >
#i
n c l u d e " t r e e . h"
longVisitcount
0;
v o i dm a i n ( v o i d 1 ;
voidBuildTree(N0DE*pNode.
i n t RemainingOepth):
e x t e r nv o i d WalkTree(N0DE*pRootNode);
voidmain0
NODE RootNode;
i n t i;
l o n gS t a r t T i m e ;
I / B u i l d a sample t r e e
B u i l d T r e e ( & R o o t N o d e1. 4 ) ;
11 Walk t h e t r e e 1000 timesandsee
StartTime
time(NULL);
<
iO
l O
O
i++)
;
f o r( i - 0 :
how l o n g i t t a k e s
WalkTree(&RootNode);
1111
p r i n t f ( " S e c o n d se l a p s e d :% l d \ n " .
time(NULL) - S t a r t T i m e l :
g e t c h ( 1;
//
/ / F u n c t i o n t o add r i g h t and l e f t s u b t r e e s
/ / s p e c i f i e dd e p t ho f ft h ep a s s e d - i n
/I
v o i dB u i l d T r e e ( N 0 D E
*pNode,
i f (RemainingDepth
of t h e
i n t RemainingDepth)
0)
--
pNode->pLeftChild
pNode->pRightChild
else
node.
NULL;
NULL:
- -
pNode->pLeftChild
malloc(sizeof(N0DE)):
i f (pNode->pLeftChild
NULL)
p r i n t f ( " 0 u to fm e m o r y \ n " ) :
exit(1):
- -
pNode->pRightChild
malloc(sizeof(N0DE)):
i f (pNode->pRightChild
NULL)
p r i n t f ( " 0 u to fm e m o r y \ n " ) :
exit(1);
//
/ / N o d e - v i s i t i n gf u n c t i o n
// call.
/I
voidVisit(N0DE*pNode)
so WalkTreeOhassomething
to
V
i s i tCount++:
1 1 12
Chapter 59
parameter is passed and because the compiler doesnt bother setting up an EBPbased stack frame, instead it uses ESP to address the stack. (And, in fact, this cost
could be reduced still further by eliminating the check for a NULL pNode at all but
the top level.) There are other interesting aspects to what the compiler does with
Listings 59.2 and 59.4 but thats enough to give you the idea. Its worth noting that
the compiler might not doas well with code recursion in a more complex function,
and thata good assembly language implementation couldprobably speed up Listing
59.4 enough to make it measurably faster than Listing 59.2, but not even close to
being enough faster to be worth the effort.
The moral of this story(apart from being
it
a good ideato enable compiler optimization) is:
1. Understand what youre doing, through and through.
2. Build a complete and consistent model in your head.
3. Design from the principles that the model provides.
4. Implementthedesign.
5. Measure to learn what youve wrought.
6. Go back to step 1 and apply what youvejust learned.
With each iteration youll dig deeper, learn more,and improve your ability to know
where and how to focus your design and programming efforts. For example, with
the C compilers I used five to 10 years ago, back when I learned about the relative
strengths and weaknesses of code and data recursion, and with the processors then
in use, Listing 59.4 would have blown away Listing 59.2. While doing this chapter,
Ive learned that given current processors and compiler technology, data recursion
isnt going to get me any big wins; and yes, that was news to me. Thats good; this
information saves me fromwasted effort in the future andtells me what to concentrate on when I use recursion.
Assume nothing, keep digging deeper, and never stop learning and growing. The
world wonthold still for you, but fortunately you can run fast enough to keepup if
you just keep atit.
Depths within depths indeed!
1 1 13
Previous
Home
Related Reading
Foley,J., A. van Dam, S. Feiner, andJ. Hughes, Computer Gaphics: Principles and Practice
(Second Edition), Addison Wesley, 1990, pp. 555-557, 675-680.
Fuchs, H., Z. Kedem, andB. Naylor, OnVisible Surface Generation
by A Priori Tree
Structures,Computer GraphicsVol. 17(3),June 1980, pp. 124133.
Gordon, D., and S. Chen, Front-teBack Displayof BSP Trees,IEEE Computer Graphics
and Applications, September 1991, pp. 79-85.
Naylor, B., Binary Space Partitioning Trees as an Alternative Representation of
Polytopes,Computer Aided Design, Vol. 22(4), May 1990, pp. 250-253.
1 1 14
Chapter 59
Next
Previous
chapter 60
Home
Next
?:
3 %$~
1117
In 1992, I did a series of columns about my X-Sharp 3-D library, and hung out on
DDJs bulletin board. There was another guy who hung out there who knew a lot
about 3-D, a fellow named John Carmack whowas surely the only game programmer
Id everheard of who developed under NEXTSTEP. When we moved to Redmond, I
didnt have time for BBSs anymore, though.
In early 1993, I hired Chris Hecker. Later that year, Chris showed me an alpha copy
of DOOM, and I nearly fellout of my chair. About ayear later, Chrisforwarded me a
newsgroup post about NEXTSTEP, and said, Isntthis the guy you usedto know on
the DDJ bulletin board? Indeed it was John Carmack; whats more, it turned out
that John was the guy who had written DOOM. I sent him acongratulatory piece of
mail, and he sent back some thoughts about what he was working on, and somewhere in there I asked if he ever came up myway. It turned out he had family in
Seattle, so he stopped in and visited, and we had a greattime.
Over the next year, we exchanged some fascinating mail, and I became steadily more
impressed with Johns company, id Software. Eventually,John asked if Id be interested in joining id, and after a goodbit of consideration I couldnt think of anything
else that would be as much fun or teach me asmuch. The upshot is that here we all
are in Dallas, our fourth move of 2,000 miles or more since Ive starting writing in
the computer field, and now Im writing some seriously cool3-D software.
Now that Im here, its an eye-opener to look back and see how events fit together
over the last decade. You see, when John started doing PC game programming he
learned fast graphics programming from those early ProgrammersJournal articles of
mine. The copy of Commander Keen that validated my faith in the PC as a game
machine was the fruit of those articles, for that was an id game (although I didnt
know that then). When John was hanging out on the DDJBBS, he had just done
Castle Wolfenstein3-D, the first great indoor3-D game, and was thinking about how
to do DOOM. (If only Id known that then!) And had I not hiredChris, or had he
not somehow remembered me talkingabout thatguy who used NEXTSTEP, never
Id
have gotten back in touch with John, and things would surely be different. (At the
very least, I wouldnt be hearing jokes about how my daughters going to grow up
saying yall.)
I think theres a worthwhile lesson to be learned from all this, a lesson that Ive
seen hold true formany other people, as well.If you do what you love,and do it as
well as you can, good things will eventually come of it. Not necessarily quickly or
easily, but if you stick with it, they will come. There are threads that run through
our lives, and by the time weve been adults for awhile, practically everything that
happens has roots that run far back in time. The implication should be clear: If
you want good things to happen in your future, stretch yourself and put in the
extra effort now at whatever you care passionately about, so those roots will have
plenty to work with down the road.
1 1 18
Chapter 60
All this is surprisingly closely related to this chapters topic, BSP trees, because
John is the fellow who brought BSP trees into the spotlightby building DOOM
around them. He also got me started with BSP trees by explaining how DOOM
worked and getting me interested enough towant to experiment; the BSP compiler in this article is the direct result. Finally, John has been an invaluable help
to me as Ive learned about BSP trees, as will become evident when we discuss
BSP optimization.
Onward to compiling BSP trees.
Parametric Lines
Were all familiar with lines described in slope-intercept
form, with y as a functionof x
y=mx+b
but theres another sort of line description thats very useful for clipping (and for a
variety of 3-D purposes, such as curved surfaces and texture mapping): parametric
Compiling BSP Trees
1 1 19
lines. In parametric lines, x and y are decoupled from one another, and are instead
described as a function of the parameter t:
%nd
- x,,,,)
t(Yend- Y,,,).
This can be summarized as
x = Xstart
Y = Ys,t
(1 60,170),
ik1.2
(1 50,150)
r
1
(133,117)
k0.67
00)50)/k 0
(80,lO)).
ob-0.4
1 120 Chapter 60
I
I
Line equations:
100+t(150-100~
y = 50 + t( 150-50)
x=
Clipped
segment #1:
f=O to M . 2 5
t = 0.25
Original line segment:
(100,50), (150,1501,
from t=O to 01
1 121
Clipped
segment
#2: k0.6 to kl
Clipped
1 122
Chapter 60
Lend
t= 1
the value
t
for theintersection point, as shown in Figure 60.3.
We then simply compare
the t value to the t values of the endpoints of the line segment being split. If its between them, thatswhere we split the line segment, otherwise,we can tell whichside of
the splitter the line segment is on by which side of the line segments t range its on.
Simple comparisons do all the work, and theres no need to do thework of generating
actual x and y values. If you look closely at Listing 60.1,the core of the BSP compiler,
youll see that the parametric clipping code itself is exceedingly short andsimple.
One interesting point about Listing 60.1is that it generates normals to splitting surfaces
simply by exchanging the xand y lengths of the splitting line segment and negating
the resultant y value, thereby rotating the line90 degrees. In 3-D, itsnot thatsimple
to come by a normal;you could calculate the normal as the cross-product of two of
the polygons edges, or precalculate it when you build the world database.
LISTING60.1160-1
# d e f i n e MAX-NUM-LINESEGS
# d e f i n e MAX-INT
# d e f i n e MATCH-TOLERANCE
/ / A vertex
t y p e d e f s t r u c t _VERTEX
.CPP
1000
Ox7FFFFFFF
0.00001
d o u b l ex :
doubley:
1 VERTEX:
// A potentiallysplitpieceof
a l i n e segment,asprocessedfromthe
/ / base l i n e i n t h e o r i g i n a l l i s t
t y p e d e f s t r u c t -LINESEG
{
-LINESEG * p n e x t l i n e s e g :
i n ts t a r t v e r t e x :
i n te n d v e r t e x :
double wall top:
double wall bottom:
d o u b l et s t a r t :
d o u b l et e n d :
1 123
int color;
-LINESEG
*pfronttree;
LINESEG *pbacktree;
1 LINESEG. *PLINESEG:
static VERTEX *pvertexlist;
static int NumCompiledLinesegs 0:
static LINESEG *pCompiledLinesegs:
/ / Builds a BSP tree from the specified line list. List must contain
/ / at least one entry. If pCurrentTree is NULL, then thisis the root
/ / node, otherwise pCurrentTree is the tree that's been buildso far.
/ / Returns NULL for errors.
LINESEG * SelectBSPTree(L1NESEG * plineseghead.
LINESEG * pCurrentTree, LINESEG ** pParentsChildPointer)
LINESEG *pminsplit;
int minsplits:
int tempsplitcount;
LINESEG *prootline:
LINESEG *pcurrentline:
double nx. ny. numer, denom. t;
/ / Pick a line as the root. and remove it from the list o f lines
/ / to be categorized. The line we'll select is the one of those in
/ / the list that splits the fewest of the other lines
in the list
mi nspl its MAX-INT:
plineseghead;
prootline
while (prootline !- NULL) (
pcurrentline
plineseghead;
tempsplitcount
0;
while (pcurrentline !- NULL) I
/ / See how many other lines the current line splits
nx pvertexlist[prootline->startvertex].y
pvertexlist[prootline->endvertexl.y;
ny
-(pvertexlist[prootline->startvertex].x
pvertexlist[prootline->endvertexl.x);
/ / Calculate the dot products we'll need for line
/ / intersection and spatial relationship
(nx * (pvertexlist[pcurrentline->startvertexl.x numer
pvertexlist[prootline->startvertex3.x)) +
(ny * (pvertexlist[pcurrentline->startvertexl.y pvertexlist[prootline->startvertexl.y));
denom
( ( - n x ) * (pvertexlist[pcurrentline->endvertexl.x pvertexlist[pcurrentline->startvertexl.x)) +
((-fly) * (pvertexlist[pcurrentline->endvertexl.y pvertexlist[pcurrentline->startvertexl.y));
/ / Figure out if the infinite lines of the current line
/ / and the root intersect; if so, figure out if the
/ / current line segment is actually split, split ifso,
/ / and add front/back polygons as appropriate
0.0) I
if (denom
/ / No intersection. because lines are parallel: no
/ / split, s o nothing to do
I else (
/ / Infinite lines intersect: figure out whether the
/ / actual line segment intersects the infinite line
/ / of the root, and split ifso
t
numer / denom;
if ((t > pcurrentline->tstart) I &
(t < pcurrentline->tend)) (
I / The root splits the current line
tempspl i tcounttt:
1 else (
--
--
1 124
Chapter 60
pcurrentline = pcurrentline->pnextlineseg:
1
if (tempsplitcount < minsplits) (
pminsplit = prootline;
minsplits = tempsplitcount;
3
I
prootline
prootline->pnextlineseg:
Builds a BSP tree given the specified root,by creating front and
back lists from the remaining lines,and calling itself recursively
LINESEG * BuildBSPTree(L1NESEG * plineseghead. LINESEG * prootline.
LINESEG * pCurrentTree)
t
LINESEG *pfrontlines;
LINESEG *pbacklines;
LINESEG *pcurrentline:
LINESEG *pnextlineseg;
LINESEG *psplitline;
double nx. ny. numer, denom. t ;
int Done;
/ / Categorize all non-root lines as either in front of the root's
/ / infinite line, behind the root's infinite line, or split by the
/ / root's infinite line, in which case we split it into two lines
pfrontlines = NULL:
pbacklines = NULL;
pcurrentline = plineseghead;
while (pcurrentline != NULL)
//
//
pvertexlist[prootline->endvertexl.y;
-(pvertexlist[prootline->startvertexl.x pvertexlist[prootline->endvertexl.x);
/ / Calculate the dot productswe'll need for line intersection
/ / and spatial relationship
numer = (nx * (pvertexlist[pcurrentline->startvertexl.x pvertexlist[prootline->startvertexl.x)) +
(ny * (pvertexlist[pcurrentline->startvertexl.y pvertexlist[prootline->startvertexl.y));
denom = ((-nx) * (pvertexlist[pcurrentline->endvertexl.x pvertexlist[pcurrentline->startvertex].x))
+
(-(ny) * (pvertexlist[pcurrentline->endvertexl.y pvertexlist[pcurrentline->startvertexl.y));
ny
1 125
--
--
pnextlineseg
pcurrentline->pnextlineseg:
if (numer < 0.0) {
I / Presplit part is in front of root line: link
/ / into front list and put postsplit part in back
/ f list
pcurrentline->pnextlineseg
pfrontlines;
pcurrentline;
pfrontlines
psplitline->pnextlineseg
pbacklines;
pbackl ines pspl i tl ine:
1 else (
/ / Presplit part is in back of rootline: link
/ / into back list and put postsplit part in front
I / list
psplitline->pnextlineseg pfrontlines;
pfrontlines psplitline:
pcurrentline->pnextlineseg
pbacklines:
pbacklines pcurrentline;
>
pcurrentline
1 else (
1 126
Chapter 60
pnextlineseg:
while (!Done)
Done
if (numer
< -MATCHTOLERANCE)
p c u r r e n t l i n e - > p n e x t l i n e s e g pfrontlines;
pfrontlines pcurrentline:
Done I ;
1 else if (numer > MATCH-TOLERANCE) [
/ / Current line i s behind root line: link
/ / into back list
pcurrentline->pnextlineseg pbacklines;
pbacklines pcurrentline;
Done 1:
1 else I
I / The point on the current line we
picked to
I / do frontlback evaluation happens to be
/ / collinear with the root,s o use the other
/ / end of the current line
and try again
numer
(nx *
(pvertexlist[pcurrentline->endvertexl.x pvertexlist[prootline->startvertexl.x))+
(ny *
(pvertexlist[pcurrentline-hndvertex1.y pvertexlist[prootline->startvertexl.y));
- -
>
I
pcurrentline
- pnextlineseg;
--
if (pfrontlines
NULL) {
prootline->pfronttree NULL:
1 else I
if (!SelectBSPTree(pfrontlines. pCurrentTree,
&prootline->pfronttree)) I
return NULL:
1
I
if (pbacklines
NULL) (
prootline->pbacktree NULL:
1 else {
if (!SelectBSPTree(pbacklines. pCurrentTree.
&prootline->pbacktree)) {
return NULL:
I
1
return(proot1ine);
Listing 60.1 isnt verylong or complex, but its somewhat more complicated than it
could be because its structured to allow visual displayof the ongoing compilation
Compiling BSP Trees
1 127
process. Thats because Listing60.1 is actuallyjust a partof a BSP compiler for Win32
that visually depicts the progressive subdivisionof space as the BSP tree is built. (Note
that Listing 60.1 might not compile as printed; I may have missed copying some
global variables that it uses.) The complete code is too large to print here in its
entirety, but its on the CD-ROM in file DDJBSP.ZIP.
1 128
Chapter 60
Previous
Home
Next
single game level might have 5,000 to 10,000 polygons; there arent anywhere near
enough dog years in the lifetime of the universe to handle that. Were going to have
to give up on optimal compilation and come up with a decent heuristic approach,
no matter what optimization objective we select.
In Listing 60.1, Ive applied the popular heuristic of choosing as the splitter at each
node the surface that splits the fewest of the other surfaces that are being considered for that node. In other
words, I choose the wall that splits the fewest ofthe walls
in the subspace its subdividing.
1 129
Previous
chapter 61
frames of reference
Home
Next
1133
An excellent example of this, and onethat Ill discuss towardthe endof this chapter,
is that of 3-Dtransfomzation-the process of converting coordinates fromone coordinate space to another, for example from worldspace to viewspace. The way this is
traditionally explained is functional, but notparticularly intuitive, and fairly hard to
visualize. Recently, Ivecome across another way of looking at transforms that seems
to me to befar easier to grasp. The two approaches aretechnically equivalent, s o the
difference is purely a matterof howwe choose to view things-but sometimes thats
the most important sort of difference.
Before we can talk about transforming between coordinate spaces, however, we need
two building blocks: dot products and cross products.
3-D Math
At this point in the book, I was originally going to present aBSP-based renderer, to
complement the BSP compiler I presented in the previous chapter. What changed
my plans was the considerable amount of mail about 3-D math that Ive gotten in
recent months. In every case,the writer hasbemoaned his/her lack ofexpertise with 3-D
math, and has asked what books
about 3-D math Id recommend, and how elsehe/she
could learn more.
Thats a commendable attitude, but the truth is, theres not all that much to 3-D
math, at least not when it comes to the sort of polygon-based, realtime 3-D thats
done on PCs. You really need only two basic math tools beyond simple arithmetic:
dot products and cross products, and really mostlyjust theformer. My friend Chris
Hecker points out that this is an oversimplification; he notes that lots more mathrelated stuff, like BSP trees, graphs, discrete math for edge stepping,and affine and
perspective texture mappings, goes into a productionquality
game. While thats surely
true, dot andcross products, together with matrix math and perspective projection,
constitute the bulk of what most people are asking about when they inquire about
3-D math, and,as wellsee, are key tools for a lotof useful 3-D operations.
The otherthing the mail made clear was that there are a of
lotpeople out therewho
dont understand eithertype of product, at least insofar as they apply to 3-D. Since
much or even most advanced 3-D graphics machinery relies to a greater or lesser
extent on dot products and cross products (even the line intersection formula I
discussed in the last chapter is actually a quotient of dot products), Im going to
spend this chapter examiningthese basic tools and some of their 3-D applications. If
this is old hat to you, my apologies, and Ill return to BSP-based rendering in the
next chapter.
Foundation Definitions
The dot andcross products themselves are straightforward and require almost no
context to understand, but I need
to define some terms Ill use when describing applicawith dot products.
tions of the products, so Ill do that now, and then get started
1 1 34
Chapter 61
Im going to have to assume you have some math background, or well never get to
the good stuff. So, Im just going to quickly define avector as a directionand a magnitude, represented as a coordinate pair (in 2-D) or triplet (in 3-D), relative to the
origin. Thats a pretty sloppy definition, but itll do for our purposes; if you wantthe
Real McCoy, I suggest you check out Calculus and Analytic Geometry, by Thomas and
Finney (Addison-Wesley:ISBN 0-201-52929-7).
So, for example, in 3-D, the vector V = [5 0 51 has a length, or magnitude, by the
Pythagorean theorem, of
(where vertical double bars denote vector length), anda direction in the plane of
the x and z axes, exactlyhalfway between those two axes.
Ill be working in a left-handed coordinate
system, wherebyif you wrapthe fingers of
your left hand around the
z axis withyour thumb pointing in the
positive z direction,
your fingers will curl from thepositive x axis to the positive y axis. The positive x axis
runs left to right across the screen, the positive y axis runs bottom to top across the
screen, and the positive z axis runs into the screen.
For our purposes, projection is the process of mapping coordinates onto aline or surface. Perspectiveprojection projects 3-D coordinates onto a viewplane, scalingcoordinates
according to their z distance from the viewpoint in order to provide proper perspective. Objectspace is the coordinate space in which an object is defined, independentof
other objects and the world itself. Worldspace is the absolute frameof reference for a 3-D
world; all objectslocations and orientations are with respect to worldspace,and this is
the frame of reference around which the viewpoint and view direction move. Viewspace
is worldspace as seen from the viewpoint, looking in the view direction. Screenspace is
viewspace after perspective projection and scaling tothe screen.
Finally, transformation is the process of converting points from one coordinate space
into another; in our
case, thatll mean rotatingand translating (moving) points from
objectspace or worldspace to viewspace.
For additional information, you might want to check out Foley & van Dams Computer Graphics (ISBN 0-201-12110-7), or the chapters in this book dealing with my
X-Sharp 3-D graphics library.
Frames of Reference
1 1 35
As you can see, the result is a scalar value (a single real-valued number), not
another vector.
Now that we know how to calculate dot
a
product,what does that getus? Not much.
The dot product
isnt of much use for graphics until
you start thinkingof it this way
IPII
u v = cos(8) llvll
(eq. 3)
where q is the anglebetween the two vectors, and the other
two terms arethe lengths
of the vectors, as shown in Figure 61 .l.Although its not immediately obvious, equation 3 has a wide varietyof applications in 3-D graphics.
Dot Productsof Unit Vectors
The simplest case of the
dot product is when both vectors are unit vectars; that is, when their
lengths are both one,
as calculated as in Equation 1.In this case,equation 3 simplifies to:
u v = cos(e)
(eq. 4)
In otherwords, the dot productof two unit vectors is the cosine of the anglebetween
them.
One obvious useof this is to find angles
between unit vectors, in conjunctionwith an
inverse cosine function or lookup table. A more useful application in 3-D graphics
llull llvll
Figure 6 1 1
1 1 36
Chapter 61
lies in lighting surfaces, where the cosine of the angle between incident light and the
normal (perpendicularvector) of a surface determines thefraction of the lights full
intensity at which the surface is illuminated, as in
where Is is the intensity of illumination of the surface, I, is the intensity of the light,
and q is the angle between -D, (where Dl is the light direction vector) and the surface
normal. If the inverse light vector and thesurface normal are both unit
vectors, then
this calculation can be performedwith four multiplies and threeadditions-and no
explicit cosine calculations-as
I, = I&
),
(eq. 6)
where Nsis the surface unit normaland D, is the light unitdirection vector, as shown
in Figure 61.2.
1 1 37
scaling preserve right angles, which is why normals are still normals in viewspace, but
perspective projection does not preserve angles, so vectors that were surface normals
in viewspace are no longernormals in screenspace.
Why does this matter? It matters because, on average, half the polygons in any scene
are facing away from theviewer, and hence shouldntbe drawn. One way to identify
such polygons is to see whether theyre facing toward or away from the viewer; that
is, whether their normals have negative z values (so theyre visible) or positive z Values (so they should be culled). However, were talking about screenspace normals
here, because the perspective projection can shift a polygon relative tothe viewpoint
so that although its viewspacenormal has a negative z, its screenspace normal has a
positive z, and vice-versa, as shownin Figure 61.3. Sowe need screenspace normals,
but those cant readily be generated by transformation from worldspace.
viewpoint in viewspace
x,
1 138
Chapter 61
The solution is to use the cross product of two of the polygon's edges to generate a
normal. The formula for thecross product is:
ux v
= [u2v3-u3v2 u3v1- q v 3 y v 2 - U 2 V 1 ]
(eq. 7)
(Note that thecross product operation is denoted by an X.) Unlike the dot product,
the result of the cross product is a vector. Not just any vector,either; thevector generated by the cross product is perpendicular to both of the original vectors. Thus,
the cross product can be used to generate a normal to any surface for which you
have two vectors that lie within the surface. This means that we can generate the
screenspace normals we need by taking the cross product of two adjacent polygon
edges, as shown in Figure 61.4.
In fact, we can cull with only one-third the work needed to generate afull cmss
product; because we 're interested only in the sign of the z component of the normal, we can skip entirely calculating the x and y components. The only caveat
is to
be careful that neither edge you choose is zero-length and that the edges aren 't
collineal: because the dot product can ?produce a normal
in those cases.
How the cross product of polygon edge vectors generates a polygon normal.
Figure 6 1.4
Frames of Reference
1 1 39
1 140 Chapter 61
polygon 0
polygon 1
i-"
L;
v o * N0<0,
so pol gon O
faces orward &
is visible
;
'"-,
*
'
Vl.N1>0,
so pol
gon O
viewpoint in viewspace
where a plane is described by any point that's on the plane (which I'll call the plane
origin), plus a plane normal. One such application is in clipping a line (such as a
polygon edge) to a plane. Just do a dot product between the plane normal and the
vector from one line endpoint to the plane origin, and repeat for the otherline endpoint. If the signs ofthe dotproducts are thesame, no clipping is needed; if they differ,
clipping is needed. And yes,the dot productis also the way to do the actual clipping;
but before we can talk about that, we need to understand the use of the dot product
for projection.
In otherwords, the result is the cosine of the angle between the two vectors, scaled
by the magnitude of the non-unit vector. Now, consider that cosine is really just the
FramesofReference
1 141
"""."""""""..
unit vector U
U*V
Ip -
$)
Elp[
CxCOl*yCOl+x[ll*yC1l+x[21*yC2l)
#define
DOT-PROOUCT(x,y)
void LineIntersectPlane (float *linestart. float *lineend.
float *planeorigin. float *planenormal, float *intersectpoint)
{
1 142
Chapter 61
- -
veclC2l
linestartC21 - planeoriginC21:
OOT-PROOUCT(vec1. planenormal):
startdistfromplane
if (startdistfromplane
0)
/ / point is in plane
intersectpointC01
intersectpointC11
intersectpointCZ1
return:
--
-- linestartC11;
linestartC11:
veclCO1
linestartCO1 - lineendC01:
veclCll
linestartcll - lineendC11:
linestartC21 - lineendC21:
vecl[21
DOT-PRODUCT(vec1, planenormal):
projectedlinelength
startdistfromplane / projectedlinelength:
scale
1inestartCOl - vecl[Ol * scale;
intersectpointC01
linestartCl] - vecl[ll * scale:
intersectpoint[l]
1inestartCll - veclC21 * scale;
intersectpointC21
- linestartC01:
Rotation by Projection
We can use the dotproducts projection capability to look at rotation in an interesting way. Typically, rotations are representedby matrices. This is certainly a workable
representation that encapsulates all aspects of transformation in a single object, and
is ideal for concatenationsof rotations and translations. One problem with matrices,
though, is that many people, myself included, have a hard time looking at a matrix of
sines and cosines and visualizing whats actually
going on. So when two 3 D experts,John
Using the dot product to get the distance from a point to a plane.
Figure 61.7
Frames of Reference
1 143
Previous
Home
Next
Carmack and Billy Zelsnack, mentioned that they think of rotation differently, in a
way that seemed moreintuitive to me,I thought itwas worth passing on.
Their approach is this: Think of rotation as projecting coordinates onto new axes.
That is, given that you have points in, say, worldspace, define the new coordinate
space (viewspace,for example) you want to rotate to by a set of three orthogonal unit
vectors defining the new axes,and thenproject each point ontoeach of the three axes to
get the coordinates in the
new coordinate space,as shown for the2-D case in Figure
61.8. In 3-D, this involves three dot products per point, one to project the point onto
each axis. Translation canbe done separately from rotation by simple addition.
Rotation by projection is exactly the same as rotation via matrix multiplication; in
fact, the rows of a rotation matrix are the orthogonal unit vectors pointing along
the new axes. Rotation byprojection buys us no technical advantages, so that b not
what b important here; the key is that the concept of rotation by projection, together with a separate translation step, gives us a new wayto look at transformation
that I, for one, find easier to visualize and experiment with. A new frame of reference for how we think about 3-0frames of reference, f y o u will.
Three things Ive learned over the years are that itnever hurts to learn new
a way of
looking at things, thatit helps tohave a clearer, more
intuitive model inyour head of
whatever it is youre working on, and that
new tools, or new waysto use old tools, are
Good Things.My experience has been that rotation
by projection, and dot product
tricks in general, offer those sorts of benefits for3-D.
y axis
1 144
Chapter 61
Previous
chapter 62
Home
Next
ga@>
,2@&
d(
Ai
a
.a*:
.;.~
n _ n
1147
time I tested my assumptions by writing an ASM routine to copy the framea dwordat
a time, without the help of memcpy(). This time the Pentium counters reported
16,000 writes.
whoops.
As it turns out, the memcpy() routine in the DOS version of our compiler (gcc)
inexplicably copies memory a byte at a time. With my new routine, the non-pageflipped approach suddenly became slightlyfaster than page flipping.
The first relevant rule is pretty obvious: Assume nothing. Measure early and often.
Know whats really going onwhen your program runs, if you catch my drift. To do
otherwise is to risk looking mighty foolish.
The second rule: when you do look foolish (and trust me, it will happen if you do
challenging work) have a good laugh yourself,
at
and use it as a reminder of Rule#l.
I hadnt done any extra page-flipping work yet, so I didnt waste any time due to my
faulty assumption that memcpy() performed a maximum-speed copy, but that was
just luck. I shouldhave done experimentsuntil I was sure I knew what was going on
before drawing any conclusions and acting on them.
In general, make it apoint not tofall into a tightlyfocused rut; stay loose and think
of alternative possibilities and newapproaches, and always, always,always keep
asking questions. It llpay offbig in the long run. IfI hadn t indulged my curiosity
by running the Pentium counter test on the copy to the screen, even though there
was no specific reason to do so, I would never have discovered the memcpyo
problem-and by so doing I doubled the performance of theentire program in five
minutes, a rare accomplishment indeed.
By the way, I have found the Pentiums performance counters to be very useful in
figuring out what my code really does and where the cycles are going. One useful source
of information on the performance counters and other aspects of the Pentium is
Mike Schmits book, Pentium Processor Optimization Tools, AP Professional,
ISBN 0-12-627230-1.
Onward to rendering froma BSP tree.
BSP-based Rendering
For the last several chapters Ive been discussing the nature of BSP (Binary Space
Partitioning) trees, and in Chapter 60 Ipresented a compiler for
2-D BSP trees. Now
were ready to use those compiled BSP trees to do realtime rendering.
As youll recall, the BSP compiler took a list of vertical walls
and built a 2-D BSP tree
from thewalls, as viewed
from above. The result is shown in Figure 62.1. The world is
split into two pieces by the line of the rootwall, and each half of the world is then
split again by the roots children, and so on, until the world is carved into subspaces
along thelines of all the walls.
1 148
Chapter 62
BSP tree
front %
ckc
front child
back child
child
Our objective is to draw the world so that wheneverwalls overlap we see the nearer
wall at eachoverlapped pixel. The simplest way to do that is with the painters algorithm; that is, drawing the walls in back-to-front order, assuming no polygons
interpenetrate or formcycles. BSP trees guarantee that no polygons interpenetrate
(such polygons are automatically split), and make it easy to walk the polygons in
back-to-front (or front-to-back) order.
Given a BSP tree, in order to render a view of that tree, all we have to do is descend
the tree, deciding each
at node whetherwere seeing the front or
back of the wall at
that node from the current viewpoint. We use that knowledge to first recursively
descend anddraw the farthersubtree of that node, thendraw that node, andfinally
draw the nearersubtree of that node.Applied recursively from the rootof our BSP
trees, this approach guarantees that overlapping polygons will always be drawn in
back-to-front order. Listing 62.1 drawsa BSP-based worldin this fashion. (Because of
the constraints of the printedpage, Listing 62.1is only the core of the BSP renderer,
without the program framework, some math routines, and the polygon rasterizer;
but, the entire program
is on theCD-ROM as DDJBSP2.ZIP. Listing 62.1 is in a compressed format, with relatively little whitespace; the full version on the CD-ROM is
formatted normally.)
LISTING 62.1 162-
1.C
/ * C o r er e n d e r e rf o rW i n 3 2p r o g r a mt od e m o n s t r a t ed r a w i n gf r o m
a 2-D
BSP t r e e : i l l u s t r a t e t h e u s e o f
BSP t r e e s f o r s u r f a c e v i s i b i l i t y .
U p d a t e W o r l d O i st h et o p - l e v e lf u n c t i o ni nt h i sm o d u l e .
f o r t h eB S P - b a s e dr e n d e r e r ,a n df o rt h e
F u l ls o u r c ec o d e
accompanying BSP c o m p i l e r , may b e d o w n l o a d e df r o m
ftp.idsoftware.com/mikeab. i n t h e f i l e d d j b s p 2 . z i p .
NT 3 . 5 . * /
T e s t e dw i t h VC++ 2.0runningonWindows
#define
FIXEDPOINT(x)
((FIXEDPOINT)(((long)x)*((long)OxlOOOO))~
# d e f i Fn IeX T O I N T ( x( i)n t ) ( x
>> 1 6 ) )
1 149
ANGLE(x)
((1ong)x)
STANDARD-SPEED (FIXEDPDINT(20))
STANDARD-ROTATION (ANGLE(4))
MAX-NUM-NODES
2000
MAX-NUM-EXTRA-VERTICES
2000
WORLD-MIN-X
(FIXEDPOINT(-16000))
WORLD-MAX-X
(FIXEDPOINT(16000))
WORLD-MIN-Y
(FIXEDPOINT(-16000))
WORLD-MAX-Y
(FIXEDPOINT(16000))
WORLD-MIN-Z
(FIXEDPOINT(-16000))
WORLD-MAX-Z
(FIXEDPDINT(16000))
PROJECTION-RATIO ( 2 . 0 1 1 . 0 )
11 c o n t r o l sf i e l do fv i e w :t h e
I1 b i g g e r t h i s i s , t h e n a r r o w e r t h e f i e l d o f v i e w
t y p e d e f l o n g FIXEDPOINT;
t y. p. e d e f s t r u c t -VERTEX (
FIXEDPOINT x .z .v i e w x ,v i e w z :
1 VERTEX, *PVERTEX;
t y p e d e fs t r u c t -POINT2 { FIXEDPOINT x , z : 1 POINTE.*PPOINT2;
t y p e d e fs t r u c t -POINTZINT ( i n t x : i n t y : 1 POINTLINT.*PPOINTZINT;
t y p e d el of n g
ANGLE:
11 a n g l easrset o r e d
i n degrees
t y p e d e f s t r u c t -NODE (
VERTEX * p s t a r t v e r t e x .* p e n d v e r t e x :
FIXEDPOINT w a l l t o p w
. a l l b o t t o m t. s t a r t t. e n d :
FIXEDPOINT c l i p p e d t s t a r tc.l i p p e d t e n d :
s t r u c t -NODE * f r o n t t r e e . * b a c k t r e e ;
c iosil vno itrs, i b l e :
FIXEDPOINT s c r e e n x s t a r st .c r e e n x e n d ;
FIXEDPOINT s c r e e n y t o p s t a r st ,c r e e n y b o t t o m s t a r t ;
FIXEDPOINT s c r e e n y t o p e n ds .c r e e n y b o t t o m e n d :
1 NODE. *PNDDE;
c h a r * pDIB:
/ I p o i n t teor
D I B s e c t iw
o ned' lr lai nwt o
HBITMAP h D I B S e c t i o n :
/ I h a n d l oe f
DIB s e c t i o n
HPALETTE h p a l D I B ;
int iteration
0. W o r l d I s R u n n i n g
1;
HWND h w n d o u t p u t ;
i n t D I B W i d t hD
. I B H e i g h tD
. I B P i t c hn. u m v e r t i c e sn, u m n o d e s :
FIXEDPOINT f x H a l f D I B W i d t h .f x H a l f O I B H e i g h t ;
VERTEX * p v e r t e x l i s t , * p e x t r a v e r t e x l i s t :
NODE * p n o d e l i s t :
POINT2 c u r r e n t l o c a t i o n .c u r r e n t d i r e c t i o n .c u r r e n t o r i e n t a t i o n :
ANGLE c u r r e n t a n g l e :
FIXEDPOINT c u r r e n t s p e e d f. x V i e w e r Y c. u r r e n t Y S p e e d :
FIXEDPOINT F r o n t C l i p P l a n e
FIXEDPOINT(10);
FIXEDPOINT FixedMul(FIXEDPOINTx.FIXEDPOINT
y):
FIXEDPOINTFixedDiv(FIXEDPD1NTx.FIXEDPOINT
y):
FIXEDPOINTFixedSin(ANGLEangle).FixedCos(ANGLEangle):
e x t e r n i n t FillConvexPolygon(POINT2INT * V e r t e x P t r .i n tC o l o r ) :
11 R e t u r n sn o n z e r o i f a w a l l i s f a c i n g t h e v i e w e r ,
0 else.
i n t WallFacingViewer(N0DE * p w a l l )
l d e f in e
#define
# d e f ine
i d e f ine
#define
#define
#define
P d e f in e
#define
#define
# d e f ine
l d e f i ne
--
FIXEDPOINT v i e w x s t a r t
pwall->pstartvertex->viewx:
FIXEDPOINT v i e w z s t a r t
pwall->pstartvertex->viewz:
FIXEDPOINT v i e w x e n d
pwall->pendvertex->viewx:
FIXEDPOINT v i e w z e n d
pwall->pendvertex->viewz:
i n t Temp;
I* I / e q u i v a l e n t C c o d e
i f ( ( ((pwall->pstartvertex->viewx >> 1 6 ) *
--
((pwall->pendvertex->view2 pwall->pstartvertex->viewz)
((pwall->pstartvertex->viewz
1 150
Chapter 62
>>
>>
16)) +
16) *
return(1):
else
return(0):
*I
rnov
eax.viewzend
e as ux .bv i e w z s t a r t
i mv iuel w x s t a r t
rnov
ecx,
edx
mov
ebx.eax
rnov
eax.viewxstart
e asxu.bv i e w x e n d
i mv iuel w z s t a r t
eax.ebx
add
edx.ecx
adc
mov
eax.O
jns
s h o r t WFVDone
e ai nx c
WFVDone:
mov
Temp, eax
return(Temp):
/ / U p d a t et h ev i e w p o i n tp o s i t i o n
v o i dU p d a t e v i e w P o s o
asneeded.
i f ( c u r r e n t s p e e d != 0) {
current1ocation.x
+= FixedMul(currentdirection.x.
currentspeed):
i f ( c u r r e n t 1 o c a t i o n . x <= WORLDLMINKX)
c u r r e n t l o c a t i o n . ~ = WORLDLMIN-X:
i f ( c u r r e n t l o c a t i o n . ~ >- WORLD-MAXLX)
c u r r e n t 1 o c a t i o n . x = WORLDLMAXLX - 1:
c u r r e n t 1 o c a t i o n . z += FixedMul(currentdirection.z.
currentspeed):
i f ( c u r r e n t 1 o c a t i o n . z <= WORLDLMINLZ)
c u r r e n t 1 o c a t i o n . z = WORLD-MIN-2:
i f ( c u r r e n t 1 o c a t i o n . z >= WORLDLMAXLZ)
c u r r e n t l o c a t i o n . ~= WORLDLMAXKZ - 1;
}
i f ( c u r r e n t Y S p e e d != 0) {
f x V i e w e r Y += c u r r e n t Y S p e e d :
i f ( f x V i e w e r Y <= WORLDLMINKY)
f x V i e w e r Y = WORLO_MIN_Y:
i f ( f x V i e w e r Y >= WORLD-MAX-Y)
f x V i e w e r Y = WORLDLMAXKY - 1;
/ / T r a n s f o r ma l lv e r t i c e si n t ov i e w s p a c e .
v o i dT r a n s f o r m v e r t i c e s 0
(
VERTEX * p v e r t e x :
FIXEDPOINTtempx.tempz:
intvertex:
pvertex = p v e r t e x l i s t :
f o r ( v e r t e x = 0 : v e r t e x < n u m v e r t i c e s ;v e r t e x + + )
I / T r a n s l a t et h ev e r t e xa c c o r d i n gt ot h ev i e w p o i n t
1 151
tempx
pvertex->x - current1ocation.x:
tempz
pvertex->z - current1ocation.z;
11 R o t a t et h ev e r t e x
so v i e w p o i n ti sl o o k i n g
pvertex->viewx
FixedMul(FixedMul(tempx.
down z a x i s
current orientation.^)
F i x e d M u l ( t e m p z-. c u r r e n t o r i e n t a t i o n . x ) .
FIXEDPOINT(PROJECTION_RATIO)):
p v e r t e x - > v i e w 2 = F i x e d M u l ( t e m p x . current orientation.^) +
F i x e d M u l ( t e m p z .c u r r e n t o r i e n t a t i o n . z ) :
pvertex++:
/ I 3-0 c l i pa l lw a l l s .
If a n y p a r t o f e a c h w a l l i s s t i l l v i s i b l e ,
/ I t r a n s f o r mt op e r s p e c t i v ev i e w s p a c e .
v o i dC l i p w a l l s o
NODE * p w a l l :
int wall:
FIXEDPOINT t e m p s t a r t x t. e m p e n d x t. e m p s t a r t z t. e m p e n d z :
FIXEDPOINT t e m p s t a r t w a l l t o p .t e m p s t a r t w a l l b o t t o m :
FIXEDPOINT t e m p e n d w a l l t o p .t e m p e n d w a l l b o t t o m ;
VERTEX * p s t a r t v e r t e x .* p e n d v e r t e x :
VERTEX * p e x t r a v e r t e x
pextravertexlist:
pwall
pnodelist:
f o r( w a l l
0: w a l l < numnodes;wall++)
I
I / Assume t h ew a l lw o n ' tb ev i s i b l e
pwall->isvisible
0:
11 G e n e r a t et h ew a l le n d p o i n t s ,a c c o u n t i n gf o r
t v a l u e sa n d
I / c l ip p i n g
/ I C a l c u l a t et h ev i e w s p a c ec o o r d i n a t e sf o rt h i sw a l l
pstartvertex
pwall->pstartvertex:
pendvertex
pwall->pendvertex;
I / L o o kf o r z c l i p p i n g f i r s t
/ I C a l c u l a t es t a r ta n de n d
z c o o r d i n a t e sf o rt h i sw a l l
i f (pwall->tstart
FIXEDPOINT(0))
tempstartz
pstartvertex->viewz:
else
tempstartz
pstartvertex->view2 +
- --
FixedMul((pendvertex->viewz-pstartvertex->viewz),
- -+
FixedMul((pendvertex->viewz-pstartvertex->viewz),
pwall->tstart);
i f (pwall->tend
FIXEDPOINT(1))
tempendz
pendvertex->viewz:
else
tempendz
pstartvertex->view2
pwall->tend):
I / C l i pt ot h ef r o n tp l a n e
i f (tempendz < F r o n t C l i p P l a n e ) I
i f (tempstartz < FrontClipPlane) [
/ I F u l l yf r o n t - c l i p p e d
g o t oN e x t w a l l :
1 else {
pwall->clippedtstart = pwall->tstart:
/ I Cliptheendpointtothefrontclipplane
pwall-klippedtend
FixedDiv(pstartvertex->view2 - FrontClipPlane,
pstartvertex->viewz-pendvertex->viewz):
tempendz
- pstartvertex->viewz +
FixedMul((pendvertex->viewz-pstartvertex->viewz),
pwall->clippedtend):
1 152
Chapter 62
1 else {
pwall->clippedtend pwall->tend;
if (tempstartz < FrontClipPlane) t
/ / Clip the start point to the front clip plane
pwall->clippedtstart
FixedDiv(FrontClipP1ane - pstartvertex->viewz,
pendvertex->viewz-pstartvertex->viewz):
tempstartz pstartvertex->view2 +
FixedMul((pendvertex->viewz-pstartvertex->viewz),
pwall->clippedtstart):
1 else t
pwall->clippedtstart pwall->tstart;
Calculate x coordinates
if (pwall-hlippedtstart
FIXEDPOINT(0))
tempstartx pstartvertex->viewx;
else
tempstartx pstartvertex->viewx +
FixedMul((pendvertex->viewx-pstartvertex->viewx),
pwall->clippedtstart);
if (pwall->clippedtend
FIXEDPOINT(1))
tempendx pendvertex->viewx;
else
tempendx pstartvertex->viewx +
FixedMul((pendvertex->viewx-pstartvertex->viewx).
pwall ->cl ippedtend)
;
/ / Clip in x as needed
if ((tempstartx > tempstartz) 1 1 (tempstartx < -tempstartz)) I
/ / The start point is outside the view triangle in
x:
/ / perform a quick test for trivial rejection by seeing if
/ / the end point is outside the view triangle
on the same
/ / side as the start point
if (((tempstartx>tempstartz) && (tempendx>tempendz)) I I
((tempstartx<-tempstartz) && (tempendx<-tempendz)))
/ / Fully clipped-trivially reject
goto NextWall;
/ / Clip the start point
if (tempstartx > tempstartz) {
/ / Clip the start point on the right side
pwall-klippedtstart
FixedDiv(pstartvertex->viewx-pstartvertex->viewz,
//
pendvertex->viewz-pstartvertex->viewz
pendvertex->viewx+pstartvertex->viewx):
tempstartx pstartvertex->viewx +
FixedMul((pendvertex->viewx-pstartvertex->viewx),
pwall->clippedtstart):
tempstartz tempstartx:
else {
/ / Clip the start point on the left side
pwall ->clippedtstart
FixedDiv(-pstartvertex->viewx-pstartvertex->viewz,
pendvertex->viewx+pendvertex->view2 pstartvertex->viewz-pstartvertex->viewx);
tempstartx
- pstartvertex->viewx +
tempstartz
- -tempstartx:
FixedMul((pendvertex->viewx-pstartvertex->viewx),
>
pwall->clippedtstart);
1 153
FixedDiv(pstartvertex->viewx-pstartvertex->viewz,
pendvertex->viewz-pstartvertex->view2
pendvertex->viewx+pstartvertex->viewx);
tempendx pstartvertex->viewx +
FixedMul((pendvertex->viewx-pstartvertex->viewx),
pwall-klippedtend):
tempendz tempendx:
1 else {
I / Clip the end point on the left side
pwall ->cl ippedtend
FixedDiv(-pstartvertex->viewx-pstartvertex->viewz,
pendvertex->viewx+pendvertex->view2 pstartvertex->viewz-pstartvertex->viewx):
tempendx pstartvertex->viewx +
FixedMul((pendvertex->viewx-pstartvertex->viewx),
pwall-klippedtend):
tempendz
-tempendx:
1
1
tempstartwall top FixedMul ((pwall ->wall top- fxViewerY 1,
FIXEDPOINT(PROJECTION_RATIO)):
tempendwalltop tempstartwalltop:
tempstartwall bottom FixedMul ((pwall ->wall
FIXEDPOINT(PROJECTION_RATIO)):
bottom-fxViewerY)
,
tempendwallbottom tempstartwallbottom:
I1 Partially clip in y (the rest is done later in 2D)
I / Check for trivial accept
if ((tempstartwalltop > tempstartz) I I
(tempstartwallbottom < -tempstartz) 1 I
(tempendwalltop > tempendz) I I
(tempendwallbottom < -tempendz)) {
I1 Not trivially unclipped: check for fully clipped
if ((tempstartwallbottom > tempstartz) &&
(tempstartwalltop < -tempstartz) &&
(tempendwallbottom > tempendz) &&
(tempendwalltop < -tempendz)) {
I / Outside view triangle. trivially clipped
goto NextWall :
1
I 1 Partially clipped in Y: we'll do Y clipping at
/ I drawing time
1
I1 The wall is visible: mark it as such and project it.
I1 +1 on scaling because of bottomlright exclusive polygon
I1 filling
pwall->isvisible 1:
pwall->screenxstart
( F i x e d M u l D i v ( t e m p s t a r t x . fxHalfDIBWidth+FIXEDPOINT(O.5).
tempstartz) + fxHalfDIBWidth + FIXEDPOINT(0.5)):
pwall->screenytopstart
(FixedMulDiv(tempstartwal1top.
(FixedMulDiv(tempstartwal1bottom.
1 154
Chapter 62
(FixedMulDiv(tempendwal1top.
--
apointC2l.x
apointC3l.x
apointC0l.y
apointCl1.y
apointC2l.y
apointC3l.y
-- FIXTOINT(pwal1->screenybottomstart):
- FIXTOINT(pwal1->screenybottomend):
- FIXTOINT(pwal1->screenytopend):
FillConvexPolygon(apoint.
//
//
//
//
//
//
//
FIXTOINT(pwal1->screeflytopstart):
pwall->color);
- FIXTOINT(pwal1->screenxend):
- FIXTOINT(pwal1->screenxend);
I f t h e r e ' s a n e a rt r e ef r o mt h i sn o d e .d r a w
it:
o t h e r w i s e ,w o r kb a c ku pt ot h el a s t - p u s h e dp a r e n t
node o f t h e b r a n c h
we j u s t f i n i s h e d : w e ' r e d o n e
if
t h e r e a r e no p e n d i n gp a r e n tn o d e s .
F i g u r ew h e t h e rt h i sw a l li sf a c i n gf r o n t w a r do r
backward:do
i n v i e w s p a c eb e c a u s en o n - v i s i b l ew a l l s
a r e n ' tp r o j e c t e di n t os c r e e n s p a c e ,a n d
we need t o
/ / t r a v e r s e all w a l l s i n t h e
BSP t r e e , v i s i b l e o r n o t ,
// i no r d e rt of i n d
all t h e v i s i b l e w a l l s
i f (WallFacingViewer(pwal1)) {
/ / We're on t h e f o r w a r d s i d e o f t h i s w a l l ,
d ot h e
/ / f r o n t c h i l d r e n now
pNearChildren
pwall->fronttree:
3 else {
/ / We're on t h e b a c k s i d e o f t h i s
w a l l , do t h eb a c k
/ / c h i l d r e n now
pNearChildren
pwall->backtree;
/ / Walk t h e n e a r s u b t r e e o f t h i s
i f ( p N e a r C h i l d r e n !- NULL)
g o t o Wal k N e a r T r e e ;
/ / Pop t h e l a s t - p u s h e d w a l l
pendingstackptr-;
pwall
*pendingstackptr:
i f (pwall
NULL)
g o t o NodesDone:
wall
Wal kNearTree:
pNearChildren:
pwall
NodesDone:
/ / R e n d e rt h ec u r r e n ts t a t e
v o i dU p d a t e w o r l d 0
o f t h ew o r l dt ot h es c r e e n .
HPALETTE h o l d p a l :
HDC h d c S c r e e nh. d c D I B S e c t i o n :
HBITMAP h o l d b i t m a p ;
/ / D r a w t h ef r a m e
UpdateViewPosO;
memset(pD1B. 0. D I B P i t c h * D I B H e i g h t ) :
/ / c l e af rr a m e
TransformVerticesO;
ClipWallsO:
DrawWallsBackToFrontO;
/ / We'vedrawntheframe:copy
i t t ot h es c r e e n
hdcScreen
GetDC(hwnd0utput):
S e l e c t P a l e t t e ( h d c S c r e e n . h p a l O I B . FALSE) :
holdpal
- -
RealizePalette(hdcScreen):
hdcDIBSection
CreateCompatibleDC(hdcScreen);
holdbitmap
SelectObject(hdcD1BSection. h D I B S e c t i o n ) :
1 156
Chapter 62
B i t B l t ( h d c S c r e e n . 0 . 0 . D I B W i d t hD
. I B H e i g h th. d c D I B S e c t i o n .
0 . 0. SRCCOPY);
SelectPalette(hdcScreen. h o l d p a l . FALSE):
R e l e a s e D C ( h w n d 0 u t p u th. d c S c r e e n ) :
SelectObject(hdcD1BSection. holdbitmap):
R e l e a s e D C ( h w n d 0 u t p u th. d c D I B S e c t i o n ) :
iteration++:
1 157
Clipping
In viewspace, the walls may be anywhere relative to the viewpoint: in front, behind,
off to the side. We only want to draw those parts of walls that properly belong on the
screen; that is, those parts that lie in the view pyramid (view frustum), as shown in
Figure 62.2. Unclipped walls-walls that lie entirely in the frustum-should be drawn
in their entirety, fully clipped walls should notbe drawn, and partially clipped walls
must be trimmed before beingdrawn.
In Listing 62.1, Clipwalk()does this in three steps for eachwall in turn. First, the z
coordinates of the two ends of the wall are calculated. (Remember, walls are vertical
and their ends
go straight up anddown, so the top and bottom
of each endhave the
side of the frontclip plane,
same xand z coordinates.) If both ends are on the near
far
then thepolygon is fully clipped, and were done with it. If both ends are on the
side, then the polygon isnt z-clipped, and we leave it unchanged. If the polygon
straddles the nearclip plane, then thewall is trimmed to stop at the nearclip plane
by adjusting the t value of the nearest endpoint appropriately; this calculation is a
simple matter of scaling by z, because the nearclip plane is at a constantz distance.
(The use o f t values for parametriclines was discussed in Chapter 60.) The process is
further simplified because the walls can be treated as lines viewed from above, so we
can perform 2-D clipping in z; this would not be the case if walls sloped or had
sloping edges.
After clipping in z, we clip by viewspace x coordinate, to ensure that we draw only
wall portions that lie between the left and right edges of the screen. Like z-clipping,
x-clipping can be done as a 2-D clip, because the walls and theleft and right sides of
1 158
Chapter 62
x == z clip plane
-x == z clip plane
Note: Solid lines are visible (unclipped)parts of walls, viewed from above.
the frustum are all vertical. We compare both thestart and endpointof each wall to
the left and right sides of the frustum, and reject, accept, or clip each walls t values
accordingly. The test for x clipping is very simple, because the edges of the frustum
are defined as the planes where x==z and -x==z.
The final clip stage is clipping by y coordinate, and this is the most complicated,
because vertical walls can be clipped at an angle in y, as shownin Figure 62.3, so true
3-D clipping of all four wall vertices is involved. We handle this in ClipWalls() by
detecting trivial rejection in y, using y==z and -y==z as the y boundaries of the frustum. However, we leave partial clipping to be handled as a 2-D clipping problem; we
are able to do this only because our earlier z-clip to the near clip plane guarantees
that no remainingpolygon point can have z<=O,ensuring that when
we project well
always pass valid, y-clippablescreenspace vertices to the polygon filler.
Projection to Screenspace
At this point, we have viewspace verticesfor each wall thats at least partially visible.
All we have to do is project these vertices according to z distance-that is, perform
perspective projection-and scale the results to the width of the screen, then well
be ready to draw. Although this step is logically separate from clipping, it is performed as the last step forvisible walls in Clipwalk().
One Story, Two Rules, and a BSP Renderer
1 159
clip plane
-y == z clip plane
Figure 62.3
1 160
Chapter 62
between the vector from theviewpoint to any polygonpoint and thepolygons normal and checking the sign. Listing 62.1 does both, but because our BSP tree is 2-D
and theviewer is always upright, we can save some work.
Consider this: Walls are stored so that the left end, as viewed from the frontside of
the wall, is the start vertex, and the right end is the end vertex. There are only two
possible ways that a wall can be positioned in screenspace, then: viewed from the
front, in which casethe start vertex is to the left of the endvertex, or viewed from the
back, in which case the start vertex is to the right of the end vertex, as shown in
Figure 62.4. So we can tell which side of a wall were seeing, and thus backface cull,
simply by comparing the screenspace x coordinates of the start and end vertices, a
simple 2-D version of checking the direction of the screenspace normal.
The wall orientation test usedfor walking the BSP tree, performed in WaUFacingViewer(),
takes the other approach, and
checks the viewspace sign of the dot productof the
walls normal with a vector from the viewpoint to the wall. Again, this code takes
advantage of the 2-D nature of the tree to generate the wall normal by swapping x
and z and altering signs. We cant use the quicker screenspace x test here that we
used for backface culling, because not all walls can be projected into screenspace;
for example,trying to project awall at z==O would result in division by zero.
All the visible, front-facing walls are drawn into buffer
a
by DrawWallsBackToFront(),
then Updateworld() calls Win32 to copy the new frame to the screen. The frameof
animation is complete.
vertex
start
vertex
end
vertexstartvertexend
BSP Renderer 1 1 61
Previous
Home
Next
1 162
Chapter 62
Previous
chapter 63
Home
Next
,/
@id
herhadan
aging, three-cylinder Saab-not one of the
e late OS, but a blunt-nosed, ungainly little wagon
sardine-like comfort, with two of them perched on
as the car I learned to drive on, and the oneI took whenever I
mother didnt needit.
, was a Volvo sedan, only a coupleof yearsold and
easily the classiest carfny family had ever owned. To the best of my recollection, as of
New Years of my senior year, I had never driven that car. However, I was going to a
New Years party-in fact, I was going to chauffeur four otherpeople-and for reasons lost in the mists of time, I was allowed to take the Volvo. So, one crystal clear,
stunningly cold night, I picked up my passengers, who included Robin Viola, Kathy
Smith,Jude Hawron ...and Alan, whose last
name Ill omit in case he wants to run for
president someday.
The party was at Craig Alexanders house, way out in the middle of nowhere, andit
was a goodone. I heard Al Green for thefirst time, much beer was consumed (none
by me, though), and around 2 a.m., we decided it was time to head home. So we
piled into theVolvo, cranked the heatup to the max, and set off.
1165
1 166
Chapter 63
Pentium Floating-PointOptimization
Im going to assume youre already familiar with FP
x86code in general; for additional
information, check out Intels Pentiurn Processor Users Munuul (order #241430-001;
1-800-548-4725),a book that you should have if youre doing Pentium programming
of any sort. Id also recommend taking a look around http://www.intel.com.
Im going to focus on six core instructions in this section: FLD, FST, FADD, FSUB,
FMUL, and FDIV. First, lets look at cycle times for these instructions. FLD takes 1
cycle; the value ispushed onto theFP stack and ready for use on the nextcycle. FST
takes 2 cycles, although when storing to memory, theres a potential extracycle that
can be lost, as Ill describe shortly.
Floating-Point for Real-Time 3-D
1 167
FDIV is a painfully slow instruction, taking 39 cycles at full precision and 33 cycles at
double precision, which is the default precision for Visual Ct+ 2.0.While FDIV executes, the FPU isoccupied, and cant process subsequent FP instructions untilFDIV
finishes. However, during the cycles while FDIV is executing (with the exception of
the onecycle during which FDIV starts), the integer unit
can simultaneously execute
instructions other thanIMUL. (IMUL usesthe FPU, and can only overlap with FDIV
for a few cycles.) Since the integer unit can execute two instructions per cycle, this
means its possibleto have three instructions, an FDIV and two integer instructions,
executing at the same time. Thats exactly what happens, for example, during the
second cycle of this code:
F D I VS T ( O ) . S T ( l )
ADD
EAX.ECX
I N C EDX
takes 6 cycles in all.) Again, its possible to execute integer-unit instructions during
the 2 (or 3, for FST) cycles after one of these FP instructions starts. Theres a more
exciting possibility here, though:Given properly structured code, theFPU is capable
of averaging 1 cycle per FADD, FSUB, or FMUL. The secret is pipelining.
FADD, can start on cycle N, FSUB can start on cycle N+1, FADD, can start on cycle
N+2, and FMUL can start on cycle N+3. At the start of cycleN+3, the result of FADD,
1 168
Chapter 63
is available in the destination operand, because its been 3 cycles since the instruction started; FSUB isstarting the final cycle ofcalculation; FADD, isstarting its second
cycle, withone cycle yetto go after this; and FMUL is about to be issued. Each of the
instructions takes 3 cycles to produce a result from the time it starts, but because
theyre simultaneously processed at different pipeline stages, one instruction is issued and one instruction completes
every cycle. Thus, the latency of these
instructions-that is, the time until theresult is available-is 3 cycles, but the throughput-the rate at which the FPU can start new instructions-is 1 cycle. An exception
is that the FPU is capable of starting an FMUL only every 2 cycles, so between these
two instructions
FMUL S T ( l ) . S T ( O )
F M U LS T ( E ) . S T ( O )
theres al-cycle stall,and the following three instructions execute just as fast asthe
above pair:
FMUL ST(l),ST(O)
FSLTD( 4 )
FMUL ST(O).ST(l)
Theres acaveat here, though: A FP instruction cant be issued until its operands are
available. The FPU can reach a throughputof 1 cycle per instruction on this code
FADD ST(l).ST(O)
F L D [temp]
FSUB ST(l).ST(O)
because neither the FLD nor the FSUB needs the result from the FADD. Consider,
however
FADD S T ( O ) . S T ( 2 )
FSUB ST(O).ST(l)
where the ST(0) operand to FSUB is calculated by FADD. Here, FSUB cant start
until FADD has completed, so there are 2 stall cycles between the two instructions.
When dependencies like this occur, the FPU runs atlatency rather than throughput
speeds, and performance can drop by as much as two-thirds.
FXCH
One piece of the puzzle is still missing. Clearly, to get maximum throughput, we
need to interleave FP instructions, such that at any one time ideally three instructions are in the pipeline at once. Further,these instructions must not depend on one
another for operands. But ST(0) must always be one of the operands; worse, FLD
can only push into ST(0), and FST can only store from ST(0). How, then, can we
keep three independentinstructions going?
1 169
o f f i r s tt w o :p r o d u c t st os t a r tw h i l et h i r d
: m u l t i p l i c a t i o nf i n i s h e s
fld
[ v e c 0; s+t0a]r t s
[ v e cflm
+O
u ll
fd1
[ vec0+4]
[ v e cfl m
+ 4u1l
fld
[vecO+dl
[ v e cfl m
+ 8u1l
st(1)fxch
cost
f a dsdt p( 2 ) . s t (:O
s t) a r t s
;starts
;starts
:starts
:starts
;starts
:no
& ends on c y c l e 0
on c y c l e 1
& ends on c y c l e 2
on c y c l e 3
& endson
cycle 4
on c y c l e 5
on c y c l e 6
fld
[ v e cfl m
+O
ul
fld
[ v e cfl m
+ 4u1l
1 170 Chapter 63
[vec0+0]
[vec0+41
17 c y c l e s
: s t a r t s & ends on c y c l e 0
; s t a r t s on c y c l e 1
: s t a r t s & ends on c y c l e 2
; s t a r t s on c y c l e 3
fld
fmul
[vecO+81
[vecl+8]
faddp
st(l).st(O)
faddp
st(l).st(O)
C
f s:dst opt tal r t s
s t a r t s & ends on c y c l e 4
s t a r t s on c y c l e 5
s t a l l sf o rc y c l e s6 - 7
s t a r t s on c y c l e 8
s t a l l sf o rc y c l e s9 - 1 0
s t a r t s on c y c l e 11
s t a l l sf o rc y c l e s1 2 - 1 4
on c1y5c.l e
: ends on c y c l e1 6
0
: s t a r t s & e n d so nc y c l e
: s t a r t s on c y c l e 1
2
: s t a r t s & e n d so nc y c l e
: s t a r t s on c y c l e 3
; s t a r t s & ends on c y c l e 4
; s t a r t s on c y c l e 5
;no c o s t
; s t a r t s on c y c l e 6
: s t a l l sf o rc y c l e s7 - 8
f a d d ps t ( l ) . s t ( O;)s t a r t s
on c y c l e 9
: s t a l l sf o rc y c l e s1 0 - 1 2
f[ sd: stoptta] r t s
on c1y3c.l e
: e n d so nc y c l e1 4
LISTING 63.4L63-4.ASM
; u n o p t i m i z e dc r o s sp r o d u c t :
fld
[vec0+41
[ v e cfl m
+ 8u]l
fld
[vec0+8]
[vec1+4]
fmul
36 c y c l e s
: s t a r t s & endson
cycle 0
: s t a r t s on c y c l e 1
2
: s t a r t s & e n d so nc y c l e
; s t a r t s on c y c l e 3
: s t a l l sf o rc y c l e s4 - 5
f s u b r ps t ( l ) . s t ( O;)s t a r t s
on c y c l e 6
: s t a l l sf o rc y c l e s7 - 9
f[svtepc 2:+s0t a] r t s
on c y1c0l .e
: ends on c y c l e 11
fld
[vecO+8]
: s t a r t s & ends on c y 1c l2e
: s t a r t s on c y c l e1 3
[vecl+O]
fmul
: s t a r t s & e n d so nc y c l e1 4
fld
[vec0+0]
; s t a r t s on c y c l e1 5
[ v e cflm
+ 8u]l
; s t a l l sf o rc y c l e s1 6 - 1 7
f s u b r ps t ( l ) . s t ( O;)s t a r t s
on c y c l 1e 8
: s t a l l sf o rc y c l e s1 9 - 2 1
fstp
[ v e c 2 ;+s4t 1a r t s
on c y2c2l .e
: ends on c y c l e 23
Floating-pointforReal-Time 3-D
1 171
: s t a r t s & e n d so nc y c l e2 4
: s t a r t so nc y c l e
25
: s t a r t s & ends on c y c l e2 6
[vec0+4]
: s t a r t so nc y c l e
27
: s t a l l sf o rc y c l e s2 8 - 2 9
f s u b r ps t ( l ) . s t ( O:)s t a r t os nc y c l e3 0
: s t a l l sf o rc y c l e s3 1 - 3 3
fstp
Cvec2+8]
: s t a r t s on c y3c4l e.
: e n d so nc y c l e3 5
fld
[vec1+4]
fmul
fld
[vecl+O]
fmul
Cvec0+01
We couldnt get rid of many of the stalls in the dot product codebecause with six
inputs and one output, itwas impossible to interleave all the operations. However,
the cross product, with three outputs, is much more amenable to optimization. In
fact, three is the magic number; because we have three calculation streams and the
latency of FADD, FSUB, and FMUL is 3 cycles, we can eliminate almost every single
stall in the cross-product calculation, as shownin Listing 63.5. Listing 63.5 loses only
one cycle to a stall, the cycle before the first FST; the relevant FSUB has just finished
extra cycle of latency associated with FST.
on thepreceding cycle, so we run into the
Listing 63.5 is more than 60 percent faster than Listing 63.4, a striking illustration of
the power of properly managing the Pentiums FP pipeline.
LISTING 63.5
L63-5.ASM
: o p t i m i z e dc r o s sp r o d u c t :2 2c y c l e s
fld
Cvec0+41
: s t a r t s & ec noy cdn lse
0
fmul
Cvec1+8]
: s t a r t s on c y c l e 1
fld
Cvec0+8]
: s t a r t s & e n doscny c l e
2
fmul
on c y c l e 3
Cvecl+O
: slt a r t s
fld
Cvec0+01
: s t a r t s & ends on c y c l e 4
fmul
C v e c l + 4: s1t a r t s
on c y c l e 5
fld
Cvec0+81
: s t a r t s & ends on c y c l e 6
fmul
Cvec1+41
: s t a r t s on c y c l e 7
fld
Cvec0+01
: s t a r t s & ends on c y c l e 8
fmul
C v e c l + :8sIt a r t s
on c y c l e 9
fld
: s t a r t s & ends on c y c l e1 0
Cvec0+41
: s t a r t s on c y c l e 11
[vecl+Ol fmul
st(2) fxch
: n oc o s t
sf st (u5b)r. ps t ( O )
: s t a r t s on c y c l e1 2
sf st (u3b)r. ps t ( O )
: s t a r t s on c y c l e1 3
fsst u( lb) r. ps t ( O )
: s t a r t s on c y c l e1 4
st(2) fxch
: n oc o s t
: s t a l l sf o rc y c l e1 5
: s t a r t s on c y c l e1 6 .
fstp
[vecE+O]
: ends on c y c l e1 7
: s t a r t s on c y c l e1 8 .
fstp
Cvec2+41
: e n d so nc y c l e1 9
: s t a r t s on c y c l e2 0 .
fstp
[vec2+81
: ends on c y c l e2 1
Transformation
Transforming a point, forexample from worldspace to viewspace, is one of the most
heavily used FP operations inrealtime 3-D. Conceptually, transformation is nothing
more than three dot products and three additions, as I will discuss in Chapter 61.
1 172
Chapter 63
U1
U,
m31
m32
m33
u3
m34
1 1
-I
"
1'
= mllul
+ m12u2
+ m13u3
m14
2'
= m21u1
m22u2
+ m23u3
m24
3'
= m31u1
+ m32u2
m33u3
m34.
LISTING63.6163-6.ASM
: o p t i m i z e dt r a n s f o r m a t i o n :3 4c y c l e s
: s t a r t s & ends on c y c l e
fld
[vecO+01
; s t a r t s on c y c l e 1
fmul
[matrix+Ol
; s t a r t s & ends on c y c l e
fld
[vec0+01
: s t a r t s on c y c l e 3
fmul
[matrix+l61
: s t a r t s & e n d so nc y c l e
fld
Cvec0+0]
; s t a r t s on c y c l e 5
fmul
[matrix+321
: s t a r t s & ends on c y c l e
fld
[vec0+41
: s t a r t s on c y c l e 7
fmul
[matrix+41
; s t a r t s & endson
cycle
fld
[vec0+41
: s t a r t s on c y c l e 9
fmul
[matrix+20]
; s t a r t s & e n d so nc y c l e1 0
fld
[vec0+43
: s t a r t s on c y c l e 11
fmul
[matrix+361
:no c o s t
fxch
st(2)
: s t a r t so nc y c l e1 2
faddp st(5),st(O)
; s t a r t s on c y c l e1 3
st(3),st(O)
faddp
: s t a r t s on c y c l e1 4
st(l),st(O)
faddp
: s t a r t s & ends on c y c l e
fld
[vecO+81
4
6
15
1 173
fmul
fl d
fmul
fld
fmul
fxch
faddp
faddp
faddp
fxch
fadd
fxch
fadd
fxch
fadd
fxch
fstp
[rnatrix+E]
[vecO+El
[matrix+241
[vecO+81
[matrix+40]
st(2)
st(5),st(O)
st(3).st(O)
st(l).st(O)
st(2)
[matrix+lEl
st(1)
[matrix+28]
st(2)
[matrix+441
st(1)
[vecl+Ol
fstp
[vecl+81
fstp
[vecl+41
;starts on cycle 16
;starts & ends on cycle 17
:starts on cycle 18
;starts & ends on cycle 19
;starts on cycle 20
:no cost
:starts on cycle 21
;starts on cycle 22
;starts on cycle 23
;no cost
;starts on cycle 24
;starts on cycle 25
;starts on cycle 26
;no cost
:starts on cycle27
:no cost
;starts on cycle 28,
; ends on cycle 29
;starts on cycle 30.
: ends on cycle 31
;starts on cycle 32,
; ends on cycle 33
Projection
The final optimization well look at is projection to screenspace. Projectionitself is
basically nothing more than divide
a
(to getl / z ) , followed by two multiplies (to get
x/z and y/z), so there wouldnt seem to be much in the way of FP optimization
possibilities there. However, remember that although
FDIV has a latency of up to 39
cycles, it can overlap with integer instructions for all but one of those cycles. That
means that if we can find enough independent integerwork to do before we need
the l / z result, we can effectively reduce thecost of the FDIV to one cycle. Projection
by itself doesnt offer much
with whichto overlap, but otherwork such as clamping,
with the FDIV for
window-relative adjustments, or 2-D clipping could be interleaved
the next point.
Another dramatic speed-upis possible by setting the precisionof the FPU down to
single precision via FLDCW, thereby cutting thetime FDIV takes to a mere19 cycles.
I dont have the space to discuss reduced precision in detail in this book, but be
aware that alongwith potentially greater performance, it carries certain
risks, as well.
The reduced precision,which affects FADD, FSUB, FMUL, FDIV, and FSQRT, can
cause subtle differences from the resultsyoud get using compiler defaults. If you
use reduced precision, you should be on the alert for precision-related problems,
such as clipped values that vary more than youd expect from the
precise clip point,
or the needfor using larger epsilons in comparisons for point-on-plane
tests.
Rounding
Control
-
Another useful area thatI can noteonly in passing here is that of leaving the FPU in
a particular rounding mode while performing bulk operations of some sort. For
1 1 74
Chapter 63
Previous
Home
Next
example, conversion to int via the FIST instruction requires that the
FPU be in chop
mode. Unfortunately, the FLDCW instruction must be used to get theFPU into and
out of chop mode, and eachFLDCW takes 7 cycles, meaning that compilers often
take at least 14 cycles for each float->int conversion. In assembly, youcan just set the
rounding state (or, likewise, the precision, for faster FDIVs) once at thestart of the
loop, and save all those FLDCW cycles each time through the loop. This is even
more true for ceil(), which many compilers implement as horrendously inefficient
subroutines, even though there are rounding
modes for both ceil()and floor().Again,
though, be aware that results of FP calculations will be subtly different from compiler default behavior while chop, ceil, or floor mode is in effect.
1 175
Previous
chapter 64
quake's visiblesurface
determination
Home
Next
anished video adapter manufacclone. The fellow whowas designing Video Sevens
ked around theclock for monthsto make his VGA
nfident he hadpretty much maxed out its perforishing touches on his chip design, however, news
r, Paradise, had juiced up the performance
of the
1179
The one-deepFIFO turned outto work astonishingly well; for a time,Video Sevens
VGAs were the fastest around, a testament to Toms ingenuity and creativity under
pressure. However, the truly remarkable part of this storyis that Paradises FIFO design
turned out to bear not the slightest resemblance to Toms, and didnt work as well.
Paradise had stuck a read FIFO between display memoryand the video output stage of
the VGA, allowing the video output to read ahead,so that when theCPU wanted to
access display memory,
pixels could come from the
FIFO while the CPU wasserviced
immediately. That did indeed help performance-but not as much as Toms writeFIFO.
The problem, of course, is howto go about transcendinglimits you dont even know
youve imposed. Theresno formula forsuccess, but two principles can standyou in
good stead: simplify and keep on trylng new things.
Generally, if you find your code getting more complex, youre fine-tuning a frozen
design, and its likely youcan get more of a speed-up,with less code, by rethinking
the design.A really good design should bringwith it a moment of immense satisfaction in which everything falls into place, and youre amazed at how little code is
needed andhow all the boundary cases just work properly.
As for how to rethink the design,do it by pursuing whatever ideas occur toyou, no
matter how off-the-wall theyseem. Many ofthe truly brilliant design ideas Ive heard
of over the years sounded like nonsense at first, because they didnt fitmy preconceived view of the world. Often, such ideas are in off-the-wall,
fact
butjust as the news
about Paradises chip sparked Toms imagination, aggressively pursuing seemingly
outlandish ideas can open upnew design possibilities for you.
Case in point: The evolution of Quakes 3-D graphics engine.
1 180 Chapter 64
on your side), detailed lightingand shadows, and 3-D monsters and players in place
of DOOMSsprites. Someday I hope to talk about how all that works, but for the here
and now I want to talk about what is,in my opinion, thetoughest 3-D problem of all:
visible surface determination (drawing the proper surface at each pixel), and its close
relative, culling (discardingnon-visible polygons as quickly
as possible, a way ofaccelerating visible surface determination). In the
interests of brevity,Ill use the abbreviation
VSD to mean both visible surface determination and culling from
now on.
Why do I think VSDis the toughest 3-D challenge? Although rasterization issues
such as texture mapping are fascinating and important, they are tasks of relatively
finite scope, and are being moved into hardware as 3-D accelerators appear; also,
they only scale with increases in screen resolution,which are relatively modest.
In contrast, VSDis an open-ended problem, and there are dozens of approaches
unsocurrently inuse. Even more significantly,the performanceof VSD, done in an
phisticated fashion,scales directly with scene complexity, which tends to increaseas
a square or cube function,
so this very rapidly becomes the limiting factor in rendering realistic worlds. I expect VSD to be the increasingly dominant issue in realtime
PC 3-D over the nextfew years, as 3-Dworlds become increasingly detailed. Already,
a good-sized Quake level contains on the order
of 10,000 polygons, about threetimes
as many polygons as a comparable DOOM level.
1 18 1
Chapter 64
As I discussed at lengthin Chapter 62, given a BSP, its easy and inexpensive to walk
the world in front-to-back or back-to-front order. The simplest VSD solution, which I
in fact demonstrated earlier,is to simply walk the tree back-to-front, clip each polygon to the frustum, anddraw it if its facing forward and not entirely clipped (the
painters algorithm).Is that an adequate solution?
For relativelysimple worlds, it is perfectly acceptable. It doesnt scale very well,though.
One problem is that as you add more polygons in the world, more transformations
and tests have to be performed to cull polygons that arent visible; at some point,
that will bog considerably performance down.
1 1 83
into two pieces created by the nodes above it in the tree, and the nodes children
then furthercarve that subspace into all the leaves that descend from the node.
Since a nodes subspace is bounded and convex, it is possible to test whether it is
entirely outside the frustum. If it is, all of the nodes children are certain to be fully
clipped and can be rejected without any additional processing. Since most of the
world is typically outside the frustum, many of the polygons in the world can be
culled almost for free, in huge, node-subspace chunks. Its relatively expensive to
perform a perfect test for subspace clipping, so instead bounding spheres or boxes
are often maintained for each node,
specifically for culling tests.
So culling to the frustum isnt a problem, and theBSP can be used to draw back-tofront. What, then, is the problem?
Overdraw
The problem John
Carmack, the driving technical force behind DOOM and Quake,
faced when he designed Quake was that in a complex world, many scenes have an
awful lot of polygons in the frustum. Most of those polygons are partially or entirely
obscured by other polygons, but the painters algorithm described earlier requires
that every pixel of every polygon in the frustum be drawn, often only to be overdrawn. In a 10,000-polygon Quake level, it would be easy to get a worst-case overdraw
level of 10 times or more; that
is, in some frames each pixel could be drawn
10 times
or more, on
average. No rasterizer is fastenough to compensate for an order
of such
magnitude and more
work than is actually necessaryto show a scene;worse still,the
painters algorithm will cause a vast difference between best-case and worst-case performance, so the frame ratecan vary wildly as the viewer moves around.
So the problem Johnfaced was how to keep overdraw down to a manageable level,
preferably drawing each pixel exactly once, but certainly no more thantwo or three
times in the worst case. As with frustum culling, it would be ideal if he could eliminate all invisible polygons in the frustum with virtually no work. It would also be a
plus if he could manageto draw only the visible parts of partially-visible polygons,
but that was a balancing act in that it had to be a lower-cost operation than the
overdraw that would otherwise result.
When I arrived at id at the beginningof March 1995,John already had an engine
prototyped anda plan in mind, andI assumed thatour work was a simple matter of
finishing and optimizing that engine. If I had been aware of ids history, however,I
would have known better. John had done not
only DOOM, but also the engines for
Wolfenstein 3-D and several earlier games, and had actually done several different
versions of each enginein the course of development (once doing four engines in
four weeks), for a total of perhaps 20 distinct engines over a four-year period. Johns
tireless pursuit of new and better designs for Quakes engine, fromevery angle he
could think of, would end only when we shipped the product.
1 184
Chapter 64
By three months after I arrived, only one element of the original VSD design was
anywhere in sight, and John hadtaken the dictum of try new things farther than
Id ever seen it taken.
Figure 64.4
Quakes Visible-SurfaceDetermination
1 1 85
The advantage to workingwith a 3 D beam tree, rather than a 2-D region, is that determining which side of
a beam plane a polygon vertexis on involves onlychecking the sign
of the dot product of the ray to the vertex and the plane normal, because allbeam planes
run through the origin (the viewpoint). Also, because a beam plane is completely described by a single normal, generating a beam from a polygon edge requires only a
crossproduct of the edge and a ray from the edge to the viewpoint. Finally, bounding
spheres of BSP nodes can be used do
to the aforementioned bulk culling to
the frustum.
The early-out feature of the beam tree-stopping when the beam tree becomes solidseems appealing, because it appearsto cap worst-case performance. Unfortunately,
there arestill scenes where its possibleto see all the way to the sky or theback wall of
the world, so in the worst case, all polygons in the frustum will still have to be tested
against the beam tree. Similar problems can arise from tiny cracks due to numeric
precision limitations. Beam-tree clipping is fairly time-consuming, and in scenes with
long view distances, such as views across the top of a level, the total cost of beam
processing slowed Quakes frame rate to a crawl. So, in the end, thebeam-tree approach proved to suffer from much the same malady asthe painters algorithm: The
worst case was much worse than the average case, and it didnt scale well with increasing level complexity.
1 186
Chapter 64
Last night I couldnt get to sleep, so I was thinking ... and Id know that I was
about to get my mind stretched yet again. John tried many ways to improve the
beam tree, with some success, but more interesting was the profusion of wildly
different approaches that he generated,
some of which weremerely discussed,others of which were implemented in overnightor weekend-long burstsof coding, in
both cases ultimately discarded or further evolved when they turned out not to
meet the design criteria well enough. Here are some of those approaches, presented in minimal detail in the hopes that,
like Tom Wilson withthe Paradise FIFO,
your imaginationwill be sparked.
Subdividing Raycast
Rays are cast in an 8x8 screen-pixel grid; this is a highly efficient operation because
the first intersection with a surface canbe found by simply clipping the ray into the
BSP tree, startingat theviewpoint, until asolid leaf is reached. If adjacent rays dont
hit the same surface, thenray
a is cast halfway between, and so on until all adjacent
rays either hit the same surface or are on adjacent pixels; then the block around
each ray is drawn from the polygon that was hit. This scales very well,being limited
by the number of pixels, with no overdraw. The problemis dropouts; its quite possible for small polygons to fall between rays and vanish.
Vertex-Free Surfaces
The world is represented by a set of surface planes.The polygons are implicit in the
plane intersections,and are extracted from planes
the as a final step beforedrawing.
This makes for fast clipping and a very small data set (planes are far more compact
than polygons), but its time-consuming to extract polygons from planes.
The Draw-Buffer
Like a z-buffer, but with 1 bit per pixel, indicating whether the pixel has been
drawn yet. This eliminatesoverdraw, but at thecost of an inner-loop buffer test,
extra writes and cache misses, and, worst of all, considerable complexity. Variations include testing the draw-buffer a byte at a time and completely skipping
fully-occluded bytes, or branching off each draw-buffer byte to one of 256 unrolled inner loops for
drawing 0-8 pixels, in theprocess possibly taking advantage
of the ability of the x86 to do the perspective floating-point divide in parallel
while 8 pixels are processed.
Span-Based Drawing
Polygons are rasterized into spans, which are addedto a global span list and clipped
against that list so that only the nearest span at each
pixel remains. Little sorting is
needed with front-to-back walking, because if theres any overlap, the span already in
the list is nearer. This eliminates
overdraw, but at the
cost of a lotof span arithmetic;
also, every polygon still has to be turned intospans.
Quakes Visible-Surface Determination
1 1 87
Portals
The holes where polygons are missing on surfaces are tracked, because its only
through such portals that line-of-sight can extend. Drawing goes front-to-back, and
when a portalis encountered, polygons and portals behind it are clipped to its limits, until no polygons or portals remain visible. Applied recursively, this allowsdrawing
only the visible portions of visible polygons,but at the cost ofa considerable amount
of portal clipping.
Breakthrough!
In the end, John decided that the beam was
treea sort of second-order structure,
reflecting information already implicitly contained in the world BSP tree, so he
tackled the problem of extracting visibility information directly from the world
BSP tree. He spenta week on this, as a byproduct devising a perfect DOOM (2-D)
visibility architecture, whereby a single, linear
walk of a DOOM BSP tree produces
zero-overdraw 2-D visibility. Doing the samein 3-D turned outto be a much more
complex problem, though, and
by the endof the week John was frustrated by the
increasing complexity and persistent glitches in the visibility code. Although the
direct-BSP approach was getting closer to working, it was taking more and more
tweaking, and a simple, clean design didnt seem to be falling out. When I left
work one Friday,John was preparing to try to get thedirect-BSP approach working
properly over the weekend.
When I came on
in Monday,John had the
look of a manwho had broken throughto
the other side-and also the look of a man who hadnt had much sleep. He had
worked all weekend on thedirect-BSP approach, and had gotten
it working reasonably well, withinsights into how to finish it off. At 3:30 Monday morning, as he lay in
bed, thinking about portals, he thoughtof precalculating and storing in each leaf a
list of all leaves visible from that leaf, and then at runtime justdrawing the visible
leaves back-to-front for whatever leaf the viewpoint happens to be in, ignoring all
other leaves entirely.
Size was a concern; initially, a raw, uncompressed potentially visible set (PVS) was
several megabytes in size. However,the PVS could be stored as a bit vector, with 1 bit
per leaf, a structure that shrunk a great deal with simple zero-byte compression.
Those steps, along with changing theBSP heuristic to generate fewer leaves(choosing as the nextsplitter the polygon that splits the fewest other polygons appears to
be the best heuristic) and sealing the outside of the levels so the BSPer can remove
the outside surfaces, which can never be seen, eventually brought thePVS down to
about 20 Kb for a good-size level.
In exchange for that20 Kb, culling leaves outside the frustum is speeded up (because only leaves in the PVS are considered), andculling inside the frustum costs
nothing more thana little overdraw (the PVS for a leaf includes all leaves visible
1 188
Chapter 64
from anywhere in the leaf, so some overdraw, typically on the orderof 50 percent
but ranging up to 150 percent, generally occurs). Better yet, precalculating the
PVS results in aleveling of performance; worst case is no longer muchworse than
best case, because theres no longer extra VSD processing-just more polygons
and perhapssome extra overdraw-associated with complex scenes.The first time
John showed me his working prototype, I went to the most complex sceneI knew
of, a place wherethe frame rate
used togrind down into thesingle digits, and spun
around smoothly, with no perceptible slowdown.
John says precalculating the PVS was a logical evolution of the approaches he had
been considering, that there
was no momentwhen he said Eureka! Nonetheless,
it was clearly a breakthrough to a brand-new, superior design, a design that, together with a still-in-development sorted-edge rasterizer
that completely eliminates
overdraw, comes remarkablyclose to meeting theperfect-worldspecifications we
laid out at thestart.
My friend Chris Hecker has a theory that all approaches work out to the same
thing in the end, since they
all reflect the sameunderlying state and functionali@.
In terms ofunderlying theory, Ive found that be
to true; whether you do perspective texture mapping with a divide or with incremental hyperbolic calculations,
the numbers do exactly the samething. When it comesto implementation, however,
my experience is that simply time-shifting an approach, or matching hardware
capabilities better, or caching can make an astonishing difference.
My friend Terje Mathisen likes to say that almost all programming can beviewed as
an exercise in caching, and thats exactly what John did. No matter how fast he
made his VSD calculations, they could never be as fast as precalculating and looking
up the visibility, and his most inspired move was to yank himself out of the faster
code mindset and
realize that itwas in fact possible to precalculate (in effect, cache)
and look up thePVS.
The hardest thing
in the world is to step outside afamiliar, pretty good solutionto a
difficult problem andlook for a different,
better solution. The best ways I know to do
Quakes Visible-Surface Determination
1 189
Previous
Home
Next
that are to keeptrying new, wacky things, and always, always, always try to simplify.
One of Johns goals is to have fewer lines of code in each 3-D game than in the
previous game, on the assumption that as he learns more, he should be able to
do
things betterwith less code.
So far, it seems to have worked out pretty well for him.
References
Foley,James D., et al., Computer Graphics: Principles
and Practice,Addison Wesley, 1990,
ISBN 0-201-121 10-7(beams, BSP trees, VSD) .
Teller, Seth, Visibility Computationsin Densely Occluded Polyhedral Environments (dissertation), available on http://theory.lcs.mit.edu/-Seth/ along with severalother papers
relevant tovisibility determination.
Teller, Seth, Visibility Preprocessing
for Interactive Walkthroughs,SIGGRAPH 91 proceedings, pp. 61-69.
1 190
Chapter 64
Previous
chapter 65
3-d clipping and other thoughts
Home
Next
;n
1193
I know several trulyoriginal thinkers who have far moreyet. Folks, itsnot theideas;
its design, implementation, and especially hard work that make the difference.
Virtually everyidea Ive encountered in 3-D graphics was invented decadesago. You
think you have a clever graphics idea? Sutherland, Sproull, Schumacker, Catmull,
Smith, Blinn, Glassner, Kajiya, Heckbert, or Teller probably thought of your idea
years ago. (Im serious-spend a few weeks reading through the literature on 3-D
graphics, and youll be amazed at whats already been invented and published.) If
they thought it was important enough,they wrote a paper about it, or tried
to commercialize it, but what they didnt dowas try to charge people for the idea
itself.
A closely related point is the astonishing lack of gratitude some programmers show
for the hard work and sense of community that went into building the knowledge
base with which they work. Howabout this? Anyone who thinks they have a unique
idea that they want to ownand milk for money can do so-but first they have to
track down and appropriately compensate all the people who made possible the
compilers, algorithms, programming courses, books, hardware, and so forth that
put themin a position to have their brainstorm.
Put that way, it sounds like a silly idea, but the idea behindsoftware patents is precisely that eventually everyone will own parts of our communal knowledge base, and
that programming will become in large part a process of properly identifylng and
compensating eachand every owner of the techniquesyou use. All I can say is that if
we do go down that path, I guarantee that
it will be a poorerprofession for all ofusexcept the patent attorneys, I guess.
Anecdote the third: A while back, I had the good fortune to have lunch down by
Seattles waterfront with Neal Stephenson, the author of Snow Crash and The Diumond Age (one of the best SF books Ive come across in a long time). As he talked
about the nature of networked technology and what he hoped to see emerge, he
mentioned that a couple of blocks down the street was the pawn shop where Jimi
Hendrix bought his first guitar. His point was that if a cheap guitar hadnt been
available, Hendrixs unique talentwould never have emerged. Similarly, he views the
networking of society as a way to get affordable creative tools to many people, so as
much talent as possible can be unearthed and developed.
Extend that to programming. The way it should work is that asteady flow of
information circulates, so that everyone can do thebest work theyre capable of. The idea is
that I dont gain by intellectually impoverishing you, and vice-versa; aswe both compete and (intentionally or otherwise) share ideas, both our products becomebetter,
so the market grows larger and everyone benefits.
Thats the way things have worked with programming for a long
time. So far as I can
see it has worked remarkably well, and the recent signs of change make me concerned about the future
of our profession.
1 194
Chapter 65
Things arent changingeverywhere, though; over the past year, Ivecirculated a good
bit of info about 3-D graphics, and plan to keep on doing itas long as I can. Next,
were going to take a look at 3-D clipping.
where each vertex is an (x,y,z) point. Planes can be described in many ways, among
a point on the plane and a unit normal, or a unit
them are three points on the plane,
well use the latter defininormal and a distance from the origin along the normal;
tion. Further,well define the normal to point to the inside (unclipped side) of the
plane. The structures for points,polygons, and planes are shown in Listing 65.1.
165-1 .h
LISTING 65.1
t y p e d e fs t r u c t I
d o u b l e vC31;
1 point-t;
color;
t y p e d e fs t r u c t
d o u ybx;l.e
I point2D-t:
t y p e d e fs t r u c t
o rc:oiln t
n uimn vt e r t s
point-t
1 polygon-t:
t y p e d e fs t r u c t
int
in t
point2D-t
1 polygon2D-t;
verts[MAX-POLY-VERTSl;
numverts;
vertsCMAX-POLY-VERTSI;
t y p e d e fs t r u c tc o n v e x o b j e c t L s
s t r u c tc o n v e x o b j e c t - s
point-t
double
in t
polygon-t
1 convexobject-t:
t y p e d e fs t r u c t I
d o u bdl ei s t a n c e ;
point-t normal
1 plane-t;
*pnext;
center;
vdi st;
numpolys :
*ppoly;
Given a line segment,and aplane to which to clip the segment, thefirst question is
whether the segment is entirely on the inside or the outside of the plane, or intersects the plane. If the segment is on the inside, then the segmentis not clipped by
the plane,and were done. If its on the outside, thenits entirely clipped,and were
likewise done. If it intersects the plane, then
we have to remove the clipped portion
of the line by replacing the endpoint on the outside of the plane with the point of
intersection between the line and the plane.
The way to answer this question is to find out which side ofthe plane each endpoint
is on, and the dot product
is the righttool for thejob. As you may recall from Chapof the projection of
ter 61, dotting any vector with a unit normal returns the length
that vector onto the normal. Therefore,
if we take anypoint and dot with
it the plane
normal well find out how far from the origin the point is, as measured along the
1 196
Chapter 65
plane normal. Anotherway to think of this is to say that the dot productof a point
and the plane normal returnshow far from the origin along the normal the plane
would have to be in order to have the point lie within the plane, as if we slid the
plane along the normal until it touched the point.
Now, remember that our definition of a plane is a unit normal and a distance along
the normal. That means that we have a distance for the plane as part of the plane
structure, and we can get the distance at which the plane would have to be to touch
the pointfrom the dot productof the point and the
normal; a simple comparison of
the two values suffices to tell us which side of the plane the point is on. If the dot
product of the point and the
plane normal is greater than the planedistance, then
the point is in front of the plane (inside the volume being clipped to); if its less,
then the pointis outside the volume and should be clipped.
After we do this twice,once foreach line endpoint, we know everything necessaryto
categorize our line segment. If both endpoints are on the same side of the plane,
theres nothing more to do, because the line is either completely inside or completely outside; otherwise, its on to the next step,clipping the line to the planeby
replacing the outside vertex with the point of intersection of the line and theplane.
Happily, itturns out that we already have all ofthe information we need to do this.
From our earlier tests, we already know the length from the plane, measured along
the normal, to the inside endpoint; thats just the distance, along the normal, of
the inside endpoint from the origin (the dot product of the endpoint with the
normal), minus the plane distance, as shown in Figure 65.1. We also know the
length of the line segment, again measured as projected onto the normal; thats
the difference between the distances along the normal of the inside and outside
endpoints from the origin. The ratio of these two lengths is the fraction of the
segment that remains after clipping. If we scale the x, y, and z lengths of the line
segment by that fraction, and add the
results to the inside endpoint, we get a new,
clipped endpoint at the point
of intersection.
Polygon Clipping
Line clipping is fine for wireframe rendering, butwhat we really want to do is polygon rendering of solid models, which requires polygon clipping.
As with line segments,
the clipping process with polygonsis to determine if theyre inside, outside, or partially inside the clip volume, lopping off any verticesthat areoutside the clip volume
and substituting vertices at theintersection between the polygon and theclip plane,
as shown in Figure 65.2.
An easy way to clipa polygon is todecompose it into a set of edges, and clip each edge
separately asa line segment. Lets define a polygon asa set of verticesthat wind clockwise around the outside of the polygonal area, as viewed from the front side of the
polygon; the edges are implicitly defined by the orderof the vertices. Thus, anedge is
3-D Clipping and Other Thoughts
1 1 97
1 1 98 Chapter 65
the line segment described by the two adjacent vertices that form its endpoints. Well
clip a polygon by clipping each edge individually, emitting vertices for the resulting
polygon as appropriate, depending on the
clipping state of the edge. If the start point
of the edge is inside, that point is added to the outputpolygon. Then, if the start and
end points are in different states (one inside and one outside),
we clip the edge to the
plane, as described above, and add the point which
at the line intersects the clip plane
as the next polygon vertex, as shown in Figure 65.3. Listing 65.2 shows a polygonclipping function.
LISTING 65.2
165-2.c
i n tC l i p T o P l a n e ( p o 1 y g o n - t* p i n .p l a n e - t* p p l a n e .p o l y g o n - t* p o u t )
int
i , j . n e x t v e rct u. r i n e. x t i n :
d o u b l ceu r d o tn. e x t d o ts,c a l e :
p o i n t - t* p i n v e r t .* p o u t v e r t :
pinvert = pin->verts;
poutvert = pout->verts;
c u r d o t = D o t P r o d u c t ( p i n v e r t .& p p l a n e - > n o r m a l ) :
c u r i n = ( c u r d o t >= p p l a n e - > d i s t a n c e ) :
f o r (i=O: i < p i n - > n u m v e r t s : i++)
nextvert
( i + 1) % p i n - > n u m v e r t s :
/ / Keep t h ec u r r e n tv e r t e x
i f i t si n s i d et h ep l a n e
i f (curin)
*poutvert++ = * p i n v e r t ;
nextdot = DotProduct(&pin->verts[nextvertl,
n e x t i n = ( n e x t d o t >= p p l a n e - > d i s t a n c e ) ;
&pplane->normal):
Add a c l i p p e d v e r t e x i f oneend
o ft h ec u r r e n te d g ei s
i n s i d et h ep l a n ea n dt h eo t h e ri so u t s i d e
( c u r i n != n e x t i n )
(pplane->distance - curdot) /
(nextdot - curdot):
f o r ( j = O : j < 3 : j++)
scale
poutvert->v[jl
pinvert->v[jl +
((pin->verts[nextvertl.v[jl
- pinvert->vCJl) *
scale):
poutvert++:
curdot = nextdot;
curin = nextin;
pinvert++:
pout->numverts = poutvert
i f (pout->numverts < 3 )
r e t u r n 0:
pout->verts;
pout->color
return 1;
- pin->color:
Believe it or not, this technique, applied inturn to each edge,is all thatsneeded to
clip a polygon to a plane.Better yet, a polygon can be clipped to multiple planes by
repeating the above process once for each
clip plane, with each interationtrimming
away any part of the polygon thats clipped by that particular plane.
One particularly useful aspect of 3-Dclipping is that if youre drawing texture mapped
polygons, texture coordinates can be clipped in exactly the same way as (x,y,z)coordinates. In fact, the very same fraction thats used to advance x, y, and z from the
inside point to the point of intersection with the clip plane can be used to advance
the texture coordinates as well, so only one extra multiply and one extra add are
required for each texture coordinate.
1 200
Chapter 65
11 P r o j e c tv i e w s p a c ep o l y g o nv e r t i c e si n t os c r e e nc o o r d i n a t e s .
I / N o t et h a tt h e
y a x i sg o e su p
i n w o r l d s p a c ea n dv i e w s p a c e .b u t
11 goes down i n screenspace.
v o i dP r o j e c t P o l y g o n( p o l y g o n - t* p p o l y ,p o l y g o n 2 D - t* p p o l y 2 D )
i:
int
d o u bzl ree c i p :
f o r( i - 0
zrecip
1.0 I p p o l y - > v e r t s [ i ] . v [ Z ] :
p p o l y Z D - > v e r t s [ i 1. x
p p o l y - > v e r t s ~ i I . v [ 0 1* z r e c i p * m a x s c a l e + x c e n t e r :
p p o l y Z D - > v e r t s [ i l . y = DIBHeight ( p p o l y - > v e r t s [ i l . v [ 1 1 * z r e c i p * maxscale + y c e n t e r ) :
--
ppoly2D->color
ppoly->color;
ppoly->numverts:
ppoly2D->numverts
/ / S o r tt h eo b j e c t sa c c o r d i n gt o
v o i dZ S o r t O b j e c t s ( v o i d )
st:
dist:
z d i s t a n c ef r o mv i e w p o i n t .
in t
i . j:
v d di o u b l e
c o n v e x o b j e c*t p- to b j e c t ;
point-t
objecthead.pnext
&objecthead:
: i < n u m o b j e c t s : i++)
f o r( i - 0
f o r( j - 0
: j < 3 : j++)
d i s t . v [ j ] = objectsCil.center.v[jl - currentpos.v[j];
objects[i].vdist = sqrt(dist.vC01 * dist.v[Ol +
dist.vC11 * dist.vC11 +
dist.v[Z] * dist.vC21):
pobject = &objecthead:
objects[il.vdist;
vdist
I1 V i e w s p a c e - d i s t a n c e - s o r tt h i so b j e c ti n t ot h eo t h e r s .
11 Guaranteed t o t e r m i n a t e b e c a u s e o f s e n t i n e l
w h i l e( v d i s t < p o b j e c t - > p n e x t - > v d i s t )
pobject = pobject->pnext:
1201
--
objects[il.pnext
pobject->pnext
/ / Move t h e v i e w p o s i t i o n
v o i dU p d a t e V i e w P o s O
pobject->pnext:
&objects[il:
and s e tt h ew o r l d - > v i e wt r a n s f o r m .
int
i;
p o i n t - tm o t i o n v e c ;
d o u b l e s . c,
mtemplC31C31,
mtempZC31C31:
/ / Move i nt h ev i e wd i r e c t i o n ,a c r o s st h ex - yp l a n e ,
as i f
I / w a l k i n g .T h i sa p p r o a c h
moves s l o w e r when l o o k i n g up or
I / down a t more o f an a n g l e
D o t P r o d u c t ( & v p n .& x a x i s ) :
motionvec.vC01
0.0:
motionvec.v[ll
motionvec.v[Zl
D o t P r o d u c t ( & v p n .& z a x i s ) :
f o r( i - 0
: i < 3 ; i++)
c u r r e n t p o s . v [ i ] +- m o t i o n v e c . v [ i l
i f ( c u r r e n t p o s . v [ i l > MAXKCOORD)
currentpos.vCi1
MAX-COORD:
i f ( c u r r e n t p o s . v [ i l < -MAX-COORD)
c u r r e n t p o s . v C i 1 = -MAXLCOORD:
currentspeed:
11 S e tu pt h ew o r l d - t o - v i e wr o t a t i o n .
/ / Note: much o ft h ew o r kd o n ei nc o n c a t e n a t i n gt h e s em a t r i c e s
/ / c a nb ef a c t o r e do u t ,s i n c e
i t c o n t r i b u t e sn o t h i n gt ot h e
/ I f i n a lr e s u l t :m u l t i p l yt h et h r e em a t r i c e st o g e t h e r
on paper
/ / t o g e n e r a t e a m i n i m u me q u a t i o nf o re a c ho ft h e
9 f i n a le l e m e n t s
s
sin(rol1):
c
cos(rol1):
mroll[Ol[O1
c:
mroll[0][11
s;
mroll[11COl = -s:
mroll[ll[ll
c;
s
sin(pitch1:
c = cos(pitch1:
mpitchCllCl1
c:
mpitchCl][Zl
s;
mpitch[21[11
-s;
mpitch[Zl[Zl
c:
s
sin(yaw);
c
cos(yaw);
myaw[Ol[Ol
c;
-s:
myaw[O1[21
myaw[Zl[O]
s:
myawCEl[Zl
c:
MConcat(mrol1. myaw. m t e m p l ) ;
MConcat(mpitch.mtempl,mtempz);
/ / B r e a ko u tt h er o t a t i o nm a t r i xi n t ov r i g h t .v u p ,
andvpn.
/ / We c o u l d w o r k d i r e c t l y w i t h t h e m a t r i x : b r e a k i n g
i t out
// into three vectors is just to
make t h i n g s c l e a r e r
f o r( i - 0
: i < 3 : i++)
--
--
--
-- -
vright.vCi1
mtempZCOlCi1:
vup.v[il
mtempZC11Cil:
vpn.v[il
mtempZC21Cil:
1 202
Chapter 65
/ / S i m u l a t ec r u d ef r i c t i o n
i f ( c u r r e n t s p e e d > (MOVEMENT-SPEED
* s p e e d s c a l e I 2.0))
MOVEMENT-SPEED * s p e e d s c a l e I 2.0;
currentspeed
e l s e i f ( c u r r e n t s p e e d < -(MOVEMENT-SPEED * s p e e d s c a l e I 2.0))
c u r r e n t s p e e d +- MOVEMENT-SPEED * speedscale / 2.0;
else
0.0:
currentspeed
--
/ / R o t a t e a v e c t o rf r o mv i e w s p a c et ow o r l d s p a c e .
v o i d B a c k R o t a t e V e c t o r ( p o i n t - t * p i n .p o i n t - t* p o u t )
{
int
i:
11 R o t a t e i n t o t h e w o r l d o r i e n t a t i o n
; i < 3 : it+)
f o r( i - 0
pout->v[il
pin->v[01 * vright.v[il
pin->v[11 * vup.v[il +
pin->v[21 * vpn.v[i]:
3
/ I T r a n s f o r m a p o i n tf r o mw o r l d s p a c et ov i e w s p a c e .
v o i dT r a n s f o r m P o i n t ( p o i n t - t* p i n ,p o i n t - t* p o u t )
{
int
i:
poi nt-t tvert:
/ / T r a n s l a t e i n t o a v i e w p o i n t - r e l a t i v ec o o r d i n a t e
f o r( i - 0
: i < 3 : i++)
tvert.v[il
pin->v[il - currentpos.v[il:
/ / R o t a t ei n t ot h ev i e wo r i e n t a t i o n
pout->v[O]
D o t P r o d u c t ( & t v e r t .& v r i g h t ) ;
pout->v[I]
O o t P r o d u c t ( & t v e r t .L v u p ) :
D o t P r o d u c t ( & t v e r t .b v p n ) ;
pout->v[2]
--
/ / T r a n s f o r m a p o l y g o nf r o mw o r l d s p a c et ov i e w s p a c e .
v o i d TransformPolygon(po1ygon-t * p i n p o l y ,p o l y g o n - t* p o u t p o l y )
{
in t
i:
/ I R e t u r n st r u e
i f p o l y g o nf a c e st h ev i e w p o i n t ,a s s u m i n g
/ / w i n d i n go fv e r t i c e s
a ss e e nf r o mt h ef r o n t .
i n t PolyFacesViewer(po1ygon-t * p p o l y )
int
i:
p o i n t - tv i e w v e c ,
f o r( i - 0
{
edgel,edge2.normal:
: i < 3 : i++)
-- -
viewvec.v[il
edgel.vCi1
edge2.vEil
a clockwise
p p o ~ y - > v e r t s [ 0 l . v ~ i -l c u r r e n t p o s . v [ i l :
ppoly->verts[0].vCil - p p o l y - > v e r t s ~ l l . v ~ i l ;
- ppoly->verts~ll.v~il;
ppoly->verts[2].v[il
BackRotateVector(norma1. &plane->normal);
plane->distance DotProduct(¤tpos. &plane->normal) +
CLIP-PLANELEPSILON;
--
---
-- -
--
curpoly 0;
ppoly
pin;
for (i-0 : i<(NUM-FRUSTUM-PLANES-l);
i++)
t
if (!ClipToPlane(ppoly.
&frustumpl anes[i 3 ,
&tpolyCcurpolyl) 1
return 0;
1204
Chapter 65
ppoly = &tpoly[curpolyl;
1;
curpoly
r e t u r nC l i p T o P l a n e ( p p o 1 y .
&frustumplanes[NUMKFRUSTUM_PLANES-ll,
pout) :
11 R e n d e rt h ec u r r e n ts t a t eo ft h ew o r l dt ot h es c r e e n .
v o i dU p d a t e W o r l d O
h o l d p a l:
hdcScreen.hdcOIBSection;
HBITMAP
holdbitmap:
polygon2D-t
screenpoly:
p o l Y g o*npKptopl oy t.l py 0ot .lpyol .l y 2 :
convexobject-t
*pobject:
in t
i , j . k:
HPALETTE
HDC
UpdateViewPosO;
memset(pDIBBase, 0, O I B W i d t h * O I B H e i g h t ) :
SetUpFrustumO:
ZSortObjectsO:
/ I Draw a l l v i s i b l e f a c e s i n a l l o b j e c t s
pobject = objecthead.pnext;
w h i l e( p o b j e c t
!= & o b j e c t h e a d )
ppoly
f o r( i - 0
{
/ / c l e af rra m e
pobject->ppoly:
; i < p o b j e c t - > n u m p o l y s ; i++)
/ I Move t h e p o l y g o n r e l a t i v e t o t h e a b j e c t c e n t e r
tpoly0.color = ppoly->color:
tpoly0.numvert.s
ppoly->numverts:
f o r ( j = O : j < t p o l y O . n u m v e r t s : j++)
i f (PalyFacesViewer(&tpalyO))
i f ( C l i p T o F r u s t u m ( & t p o l y O .& t p o l y l ) )
1
1
>
T r a n s f o r m P o l y g o n( & t p o l y l ,& t p o l y 2 ) :
P r o j e c t P o l y g o n( & t p o l y 2 &
. screenpoly):
F i l l P o l y g o n E D( & s c r e e n p o l y ) ;
ppoly++:
- pobject->pnext:
pobject
/ I We'vedrawn
t h ef r a m e :c o p y
i t t ot h es c r e e n
hdcScreen
GetDC(hwnd0utput):
holdpal
S e l e c t P a l e t t e ( h d c S c r e e n , hpalDIB.FALSE):
RealizePalette(hdcScreen):
h d c D I B S e c t i o n = CreateCompatibleDC(hdcScreen);
holdbitmap
SelectObject(hdc0IBSection. h O I B S e c t i o n ) :
B i t B l t ( h d c S c r e e n . 0. 0. DIBWidth.DIBHeight.hdcDIBSection.
0. 0 , S R C C O P Y ) :
1205
SelectPalette(hdcScreen. holdpal. F A L S E ) :
ReleaseDC(hwnd0utput. hdckreen):
SelectObject(hdcD1BSection. holdbitmap):
ReleaseDC(hwnd0utput. hdcDIBSection):
1 206
Chapter 65
and can be more efficient to generate the final, clipped polygon without any intermediate representations. For further reading on advanced clipping techniques, see
the discussion starting on page 271 of Foley and van Dam.
Finally, clipping in Listing 65.3 is performed in worldspace, rather thanin viewspace.
The frustum is backtransformed from viewspace (where it is defined, since it exists
relative to the viewer) to worldspace for this purpose. Worldspace clipping allows us
to transform only those vertices that arevisible, rather thantransforming all vertices
into viewspace, then clipping them. However, the decision whether to clip in
worldspace or viewspace is not clear-cut and is affected by several factors.
Previous
Home
Next
Further Reading
You now have the basics of 3-D clipping, but because fast clipping is central to highperformance 3-D, theres a lot more to be learned. One good place for further reading
is Foleyand van Dam; another is Procedural Elements of Computer Graphics,by David F.
Rogers. Readand understand eitherof these books, and youll knoweverything you
need forworld-class clipping.
And, as you read, you might take a moment to consider how wonderful it is that
anyone whos interested can tap into so much expert knowledge for the priceof a
book-or, on the Internet, for
free-with no strings attached. Our partof the world
is a pretty good place right now, isnt it?
1 208
Chapter 65
Previous
chapter 66
quake's hidden-surface removal
Home
Next
121 1
1 21 2
Chapter 66
Performance Impact
Whenever a z-buffer is involved, the questions inevitablyare: Whats the memory footprint and whats the performance impact? Well, the memory footprint at 320x200 is
128K, not trivial but not a big deal for a game that requires 8 MB to run. Theperformance impact is about 10 percent for z-filling the world, and roughly 20 percent (with
lots of variation)for drawing spritesand polygon models.In return, we get a perfectly
sorted world, and also the ability to do additional effects, such as particle explosions
and smoke, becausethe z-buffer lets us flawlessly
sort such effectsinto theworld. All in
all, the use of the z-buffer vastly improved
the visual quality and flexibility ofthe Quake
engine, and also simplified the code quite a bit, at an acceptable memory and performance cost.
The precalculated PVS was an important step toward both faster and more level
performance, because it eliminated the need
to identify visible polygons,a relatively
slow step that tended tobe at its worst in the most complex scenes. Nonetheless, in
some spots in real game levels
the precalculated PVS contains five times more polygons
than areactually visible;together with the back-to-front HSR approach, this created
Quakes Hidden-SurfaceRemoval
12 1 3
hot spots in which the frame rate boggeddown visibly ashundreds of polygons are
drawn back-to-front, most of those immediately getting overdrawn by nearer polygons. Raw performance in general was also reduced by the typical 50% overdraw
resulting fromdrawing everything in thePVS. So, although drawing the PVS back-tofront as the finalHSR stage worked and was an improvementover previous designs,
it was not ideal. Surely,John thought, theres a betterway to leverage the PVS than
back-to-front drawing.
And indeed there is.
Sorted Spans
The ideal final HSR stage for Quake would reject all the polygons in the PVS that are
actually invisible, and draw only the visible pixels of the remaining polygons, with no
overdraw, that is, with every pixel drawn exactly once, all at no performance cost, of
course. One way to do that (althoughcertainly not atzero cost) would be to drawthe
polygons from front-to-back, maintaining a region describing the currently occluded
portions of the screen and clipping each polygon tothat region before drawing it. That
sounds promising, but it is in fact nothing more or less than the beam tree approach I
described in Chapter 64, an approach that
we found to haveconsiderable overheadand
serious leveling problems.
We can do much betterif we move the final HSR stage from thepolygon levelto the
span level and use a sorted-spans approach. In essence, this approach consists of
turning each polygon into a setof spans, as shown in Figure 66.1, and then sorting
polygon A
Span generation.
Figure 66.1
1214
Chapter 66
spans
and clipping the spans against eachother until onlythe visible portions of visiblespans
are left to be drawn, as shown in Figure 66.2.This may sound a lot like z-buffering
(which is simply too slow for use in drawing the world, although its fine for smaller
moving objects, asdescribed earlier), but thereare crucial differences.
By contrast with z-buffering, only visible portions of visible spans are scanned out
pixel by pixel (although all polygon edges must still be rasterized). Better yet, the
sorting that z-buffering does at each pixel becomes a per-span operation with sorted
spans, and because of the coherenceimplicit in a span list, each edge is sorted only
against some of the spans on the same line and is clipped only to the few spans that
it overlaps horizontally. Although complex scenes still take longer to process than
simple scenes, the worst case isnt as bad as with the beam tree or back-to-front approaches, because theres no overdraw or scanning of hidden pixels, because
complexity is limited to pixel resolution and because span coherence tendsto limit
the worst-case sorting in anyone areaof the screen.As a bonus, the outputof sorted
spans is in precisely the form thata low-level rasterizer needs, a set of span descriptors, each consisting of a start coordinate and a length.
In short, the sortedspans approach meets our original criteria pretty well; although
it isnt zero-cost, itsnot horribly expensive, it completelyeliminates both overdraw
and pixel scanning of obscured portions of polygons and it tends to level worst-case
performance. We wouldnt want to rely on sorted spans alone as our hidden-surface
mechanism, but the precalculated PVS reduces the number of polygons to a level
that sorted spans can handle quite nicely.
So weve found the approach we need; now itsjust a matter of writing some code
and were on our way, right? Well, yes and no. Conceptually, the sorted-spans approach is simple, but its surprisingly difficult toimplement, with a couple of major
design choices to be made,a subtle mathematical element, andsome tricky gotchas
that Ill have to defer until Chapter 67. Lets look at the design choices first.
1 21 5
polygon A
I
spans
I
x = 22, y = 0, count = o
x = 2 2 , v = 1,count=0
x=21.v=2,count=1
x = 20. v = 3. count = 2
A and B composited
visible spans
A: x = 20, y = 0, count = 0
B: x = 22, y = 0, count = 0
A:x=20,y=l,count=l
B:x=22,y=l,count=O
Ax=19,y=2,count=2
B:x=21,y=2,count=l
Figure 66.2
1 21 6
Chapter 66
With edge-sorting, edges are stored inx-sorted, linked list buckets according to their
start scan line. Each polygon in turn is decomposed into edges, cumulatively building a list of all the edges in the scene. Once all edges for all polygons in the view
frustum have been added to the edge list, the whole list is scanned out in a single
top-to-bottom, left-to-right pass. An active edge list (AEL) is maintained. With each
step to a new scan line, edges that end onthat scan line are removed from the AEL,
active edges are stepped to their new x coordinates,edges starting on the new scan
line are added to the AEL, and the edges are sorted by current x coordinate.
For each scan line, a z-sorted active polygon list (APL)is maintained. The x-sorted
AEL is stepped through in order. As each new edge is encountered (that is, as each
polygon starts or ends as we move left to right), the associated polygon is activated
and sorted into theAPL, as shownin Figure 66.3, or deactivated and removed from
the APL, as shown in Figure 66.4, for a leadingor trailing edge, respectively. If the
nearest polygon has changed (that is, if the new polygon is nearest, or if the nearest
polygon just ended), a span is emitted for the polygon that just stopped being the
nearest, starting at the pointwhere the polygon first because nearest and ending at
the x coordinate of the current edge, and the current x coordinateis recorded in
the polygon that is now the nearest. This saved coordinate laterserves asthe startof
the span emitted when the new nearest polygon ceases to be in front.
Dont worryif you didnt follow all ofthat; theabove isjust a quick overview ofedgesorting to help make the rest of this chapter alittle clearer. My thorough discussion
of the topic will be in Chapter 67.
The spans that are generatedwith edge-sorting are exactly the same spans that ultimately emerge from span-sorting; the difference lies in the intermediate data
structures that are used to sort the spans in the scene. With edge-sorting, the spans
are kept implicit in the edges until the final set of visible spans is generated, so the
sorting, clipping, and span emission is done as each edge adds or removes a polygon,
based on the span state implied by the edge and the set of active polygons. With
span-sorting, spans are immediately made explicit when each polygon is rasterized,
and those intermediate spans are then sorted and clipped
against other thespans on
the scan line to generate the final spans, so the states of the spans are explicit at all
times, and all work isdone directly with spans.
Both span-sorting and edge-sorting work well, and both have been employed successfully in commercial projects. Weve chosen to use edge-sorting in Quake partly
because it seems inherently moreefficient, with excellent horizontal coherence that
makes for minimal time spent sorting, incontrast with the potentially costly sorting
into linked lists that span-sorting can involve. A more important reason, though,is
that with edge-sorting were able to share edges between adjacent polygons, and that
cuts the work involvedin sorting, clipping, and rasterizing edges nearly in half, while
also shrinking the world database quite a bit due to the sharing.
QuakesHidden-SurfaceRemoval
1217
I
I
.+
polygon J
zatx=18 is 100
1
z a t x = l 8 is 125
polygon L
z at x = l 8 is 500
121 8
Chapter 66
t
I
Current
edge;
since
its
lead
a edge
polygon
trailing edge, remove poly on
M from the activepolygon ist.
S; x =111
nearest at x=l8
polygon J
polygon L
mance penalties.
Nonetheless, theres no cut-and-dried answer as to which approach is better. In the
end, span-sorting and edge-sorting amount to the same functionality,and thechoice
between them is a matter of whatever you feel most
comfortable with. In Chapter 67,
Ill go into considerable detailabout edge-sorting, completewith a full implementation.
Im going the spend therest of this chapter laying the foundation for Chapter
67 by
discussing sorting keys and l / z calculation. In the process, Im going to have to
Quakes Hidden-SurfaceRemoval
1 2 19
Edge-Sorting Keys
Now that we know weregoing to sort edges, using them to emit spans for the polygons nearest the viewer, the question becomes: How can we tell which polygonsare
nearest? Ideally, wedjust store a sortingkey in each polygon, and whenever a new
edge came along, wed compare its surfaces key to the keys ofother currently active
polygons, and could easily tell which polygonwas nearest.
That sounds too good to be true, butit is possible. If,for example, your world database is stored as a BSP tree, with all polygons clipped into theBSP leaves, then BSP
walk order is a valid drawing order. So, for example, if you walk the BSP back-tofront, assigning each polygon an incrementally higher key as youreach it, polygons
with higher keys are guaranteedto be in front of polygons with lower keys.
This is the
approach Quakeused for awhile, although a different approach
is now being used,
for reasons Ill explain shortly.
If youdont happento have a BSP or similar data structurehandy, or if you havelots
of moving polygons (BSPs dont handle moving polygons veryefficiently), another
way to accomplish your objectives would be to sort all the polygons against one another before drawing the scene, assigning appropriate keys based on their spatial
relationships in viewspace. Unfortunately, this is generally an extremely slow task,
because every polygon must be compared to every other polygon. There are techniques to improve the performance of polygon sorts, but I dont know of anyone
whos doing general polygon sorts of complex scenes in realtime on a PC.
An alternative is to sort by z distance from the viewer in screenspace, an approach
that dovetails nicely with the excellent spatial coherence of edge-sorting. As each
new edge is encountered on ascan line, the correspondingpolygons z distance can
be calculated and compared to the other polygons distances, and thepolygon can
be sorted into theAPL accordingly.
Getting z distances can be tricky, however. Remember that we need to be able to
calculate z at any arbitrary point on a polygon, because an edge may occur and
cause its polygon to be sorted into theAPL at any point on thescreen. We could
calculate z directly from the screen x and y coordinates and the polygons plane
equation, but unfortunately this cant be done very quickly, because the z for a
plane doesnt vary linearly in screenspace; however, l / z does vary linearly, so well
use that instead. (See Chris Heckers 1995 series of columns on texture mapping
in Game Developer magazine for a discussion of screenspace linearity and gradients
for l/z.) Another advantage of using l / z is that its resolution increases with decreasing distance, meaning that by using l / ~ well
,
have better depth resolution
for nearby features, where it matters
most.
1 220
Chapter 66
The obvious way to get al / z value at any arbitrary point ona polygon is to calculate
l / z at the vertices, interpolate it down both edges of the polygon, and interpolate
between the edges to get the value at the point of interest. Unfortunately, that requires doing alot of workalong each edge, and
worse, requires division to calculate
the l / z step per pixel across each span.
A better solution is to calculate l / z directly from the plane equationand the screen
122 1
Previous
Home
Next
sorting on l / z , were able to leave polygons unsplit but still get correct drawing
order, so we have far fewer edges to process and faster drawing overall, despite the
added cost of l / z sorting.
Another advantage of l / z sorting is that it solves the sorting issues I mentioned at
the start involving moving models that are themselves small BSP trees. Sorting in
world BSP order wouldnt work here, because these models are separate BSPs, and
theres no easy wayto work them into the
world BSPs sequence order.We dont want
to use z-buffering for thesemodels because theyre often large objects such as doors,
and we dont want to lose the overdraw-reduction benefits that closed doors provide
when drawn through the edgelist. Withsorted spans, the edges of moving BSP models are simply placed in the edgelist (first clippingpolygons so they dont cross any
solid worldsurfaces, to avoidcomplications associated withinterpenetration), along
with all the world edges, and l / z sorting takes care of the rest.
Decisions Deferred
There is, without a doubt, an
awful lot of information in the preceding
pages, and it
may not all connect together yet in your mind. The code and accompanying explanation in the next chapter
should help; if you wantto peek ahead, the code
is available
on the CD-ROM as DDJZSORT.ZIP in the directory for Chapter 67. Youmay also
want to take a look at Foley and van Dams Computer Graphics or Rogers Procedural
Elements fm Computer Graphics.
As I write this, its unclear whether Quakewill end upsorting edges by BSP order or
l / z . Actually, theres no guarantee that sorted spans in any form will be the final
design. Sometimes it seems like we change graphics engines as often as they play
Elvis on the 50s oldies stations (but, onewould hope, with more aesthetically pleasing results!) and no doubt
well beconsidering thealternatives right up until theday
we ship.
1222
Chapter 66
Previous
chapter 67
Home
Next
l/z-sorted spans in Quake turned out pretty much the same way, as well see in a
moment. First, though, Id like to note up front that this chapter is very technical
and builds heavily on material I covered earlier in this section of the book; if you
havent already read Chapters 59 through 66, you really should. Make no mistake
about it, this is commercial-quality stuff; in fact, the code in this chapter uses the
1225
same sorting technique as the test version of Quake, QTESTl.ZIP, that id Software
placed on the Internet in early March 1996. This material is the Real McCoy, true
reports from the leading edge, and
I trust thatyoull be patient if careful rereading
and some occasional catch-up reading of earlier chapters are required to absorb
everything contained herein. Besides, the ultimate reference for
any design is working code, which youll find, in part, in Listing 67.1, and in its entirety in the file
DDJZSORT.ZIP on the CD-ROM.
1226
Chapter 67
First, as the world BSP tree is descended, we clip each nodes bounding box in turn
to see if its inside or outside each plane of the view frustum. The clipping results can
be remembered, andoften allow the avoidance of some or all clipping for thenodes
polygons. Forexample, all polygonsin a node that
has a trivially accepted bounding
box are likewise guaranteed to be unclipped and in the frustum, since they all lie
within the nodes volume and needno furtherclipping. This efficient clipping mechanism vanished as soon as we stepped out of BSP order, because a polygon was no
longer necessarily confined to its nodes volume.
Second, sorting on l / z isnt as cheap as sorting on BSP order, because floating-point
calculations and comparisons are involved, rather than integercompares. So Quake
got faster but, like Heinleins fifth rocket stage, there was clear evidence of diminishing returns.
That wasnt the bad part;after all, even a small speed increase is A Good Thing. The
real problem was that our initial l / z sorting proved to be unreliable. We first ran
into problems when two forward-facingpolygons started at a common edge,
because
it was hard to tell whichone was really in front (as discussed below), and we had to
do additional floating-point calculations to resolve these cases. This fixed the problems for a while, but then odd cases started popping up where just the right
combination of polygon alignments caused new sorting errors. We tinkered with
those too, adding more code and incurring additional slowdowns in the process.
Finally, we had everything working smoothly again, although by this point Quake
was back to pretty much the same speed it had beenwith BSP sorting.
And then yet another cropof sorting errors popped up.
We could have fixed those errors too; well takea quick look at how to deal with such
cases shortly. However, like
the sixth rocket stage, the fixes would havemade Quake
slower than it had been
with BSP sorting. S o we gave up andwent back to BSP order,
and now the code is simpler and sortingworks reliably. Its too bad our experiment
didnt work out, butit wasnt wastedtime because in trying whatwe did we learned
quite abit. In particular, we learned that the information
provided by a simple, reliable world ordering mechanism, such as a BSP tree, can do more good than is
immediately apparent, in terms of both performance and solid code,
Nonetheless, sortingon l / z can bea valuable tool, used inthe right context; drawinga
Quake world just doesnt happen to be sucha case. In fact,sorting on l / z is how were
now handling the sortingof multiple BSP models that lie within the same world leaf
in Quake. In this case, we dont have the option of using BSP order (because were
drawing multiple independent trees),so weve set restrictions on theBSP models to
avoid running into the
types of l / z sorting errors we encountered drawing the Quake
world. Next, well look at anotherapplication in which sorting on l / z is quite useful,
one where objects move freely through space. As is so often the case in 3-D, there is
no one right technique, but rather a great many different techniques, each one
handy in the right situations. Often, a combination of techniques is beneficial; for
SortedSpansinAction
1227
There are three types of l / z span sorting, each requiring a different implementation. Inorder of increasing speedand decreasing complexity, theyare: intersecting,
abutting, and independent. (These are namesof my own devising; I haven't come
across any standard nomenclature in the literature.)
invisible portion
of polygon B
-" ". -.
".-.
of polygon A
".""
l..--*
I...
-I
h
""
visible
of polygon A
*.--
""
portion
point
split
span
visible
portion
of polygon B
viewpoint
Note: Polygons A and B are viewed from above.
1228
Chapter
67
Intersecting is the slowest and most complicated type of span sorting, because it is
necessary to compare l / z values at two points in order to detect interpenetration,
and additional work must be done to split the spans as necessary. Thus, although
intersecting span sorting certainly works, its not the first choice for performance.
invisible portion
of polygon A
visible portion
of polygon A
I
visible portion
Polygone B starts here,
of polygon B
abutting polygon A.
At this location, both polygons
have the same 1 /z value.
viewpoint
Note: Polygons A and B are viewed from above.
1 229
most tie cases will show up as near matches, not exact matches. This imprecision
makes it necessary to perform two comparisons, one with an adjust-up by a small
epsilon and one with an adjust-down, creating a range in which near-matches are
considered matches. Fine-tuning this epsilon to catch all ties,without falsely reporting close-but-not-abutting edges as ties, proved to be troublesome in Quake, and the
epsilon calculations and extra comparisons slowed things down.
I do think that abuttingl / z span sortingcould have been made reliable enough for
production use in Quake, were it not thatwe share edges between adjacent polygons
in Quake, so that the world is a large polygon mesh. When a polygon ends and is
followed by an adjacent polygon that shares the edge that just ended, we simply
assume that the adjacent polygon sorts relative to other active polygonsin the same
place as the one that ended (because the mesh is continuous and theres no interpenetration), rather than doinga l / z sort from scratch. This speeds things up by
saving a lotof sorting, but it means that if there is a sorting error, awhole string of
adjacent polygons can be sorted incorrectly, pulled inby the onemissorted polygon.
Missorting is a very real hazard when a polygon is very nearly perpendicular to the
screen, so that the l / z calculations push the limits of numeric precision, especially
in single-precision floating point.
Many caching schemes are possible with abutting span sorting, because any given
pair of polygons, being noninterpenetrating,will sort in thesame order throughout
a scene. However, in Quake at least, the benefits of caching sort results were outweighed by the additional overhead of maintaining the caching information, and
every caching variant we tried actually slowedQuake down.
1230
Chapter 67
whole objects and drawing them back-to-front, while Listing67.1 draws all polygons
byway of a l/z-sorted edge list. Consequently, where the earlier program worked
only so long as object centers correctly described sorting order, Listing 67.1 works
properly for all combinations of non-intersecting and non-abutting polygons. In
particular, Listing 67.1 correctly handles concave polyhedra; a new L-shaped object
(the data for which is not included in Listing 67.1) has been added to the sample
program to illustrate this capability. The ability to handle complex shapes makes
Listing 67.1 vastly more useful for real-world applications than the 3-D clipping demo
from Chapter65.
LISTING67.1167-1
.C
/ / P a r t o f Win32program
t od e m o n s t r a t ez - s o r t e ds p a n s .W h i t e s p a c e
/ / removed f o rs p a c er e a s o n s .F u l ls o u r c ec o d e ,w i t hw h i t e s p a c e ,
/ / a v a i l a b l e f r o m ftp.idsoftware.com/mikeab/ddjzsort.zip.
10000
1000
5000
{
* p n e x t *. p p r e v :
color,visxstart,state:
z i n v 0 0 z. i n v s t e p x z. i n v s t e p y :
t y p e d e fs t r u c t edge-s t
x x. s t e p l.e a d i n g :
in t
*psurf:
surf-t
s t r u c t edge-s
* p n e x*tp. p r e *vp. n e x t r e m o v e :
I edge-t:
/ / Span.edge,andsurface
span-t
spans[MAX_SPANSl:
edge-t
edgesCMAX-EDGES]:
surf-t
surfsCMAXLSURFS1:
lists
line
line
/ / Headand
tail f o r t h e a c t i v e e d g e l i s t
e d g ee- td g e h e a de .d g e t a i l :
s u r f a c eo fa c t i v es u r f a c es t a c k
/ / p o i n t e r st on e x ta v a i l a b l es u r f a c ea n de d g e
s u r f*-pt a v a i l s u r f :
edge-t*pavai
1 edge:
SortedSpans in Action
1231
i; int
point-t viewvec;
for (i-0 ; i < 3 : i++)
viewvec.v[il
ppoly->verts[Ol.v[il
- currentpos.v[i];
11 Use an epsilon here s o we don't get polygons tilted s o
/ / sharply that the gradients are unusable or invalid
if (OotProduct (&viewvec. &pplane->normal) < -0.01)
return 1:
return 0;
numverts
- screenpoly->numverts;
-- --
-- -
slope
1232
Chapter 67
screenpoly->verts[nextvertl.y:
- deltax /
deltay:
- -
- -
--
pavailedge->psurf
- pavailsurf:
/ / Create the surface,so we'll know how to sort and draw from
I / the edges
pavailsurf->state 0:
Davai 1 surf
->col
- orcurrentcol
or:
1233
/ / Scan all the edges in the global edge table into spans.
void ScanEdges (void)
{
int
double
edge-t
span-t
surf-t
pspan
x. y ;
fx. fy, zinv, zinv2;
*pedge. *pedge2. *ptemp;
*pspan;
*psurf, *psurf2;
- spans;
- - --
&edgetail:
edgehead.pnext
NULL;
edgehead.pprev
edgehead.x
-0xFFFF;
edgehead.leading
1;
edgehead.psurf
&surfstack:
edgetail.pnext
NULL;
&edgehead;
edgetail.pprev
DIBWidth << 16;
edgetai1.x
0;
edgetai1.leading
edgetail.psurf
&surfstack;
/ / left
edge
of
screen
//
mark
edge
of
list
surfstack.pprev
&surfstack;
surfstack.pnext
0;
surfstack.color
surfstack.zinv00
-999999.0;
surfstack.zinvstepx
surfstack.zinvstepy
0.0:
for (y-0 ; y<OIBHeight : y++) {
fy
(doub1e)y;
/ / Sort in any edges that starton this scan
pedge newedges[yl.pnext:
pedge2 &edgehead;
while (pedge !- &maxedge) (
while (pedge->x > pedge2->pnext->x)
pedge2 pedgeZ->pnext;
ptemp pedge->pnext;
pedge->pnext pedge2->pnext;
pedge->pprev pedge2;
pedge2->pnext->pprev pedge;
pedgeZ->pnext pedge:
pedge2 pedge:
pedge ptemp;
--
--
- -- -
1;
surfstack.state
surfstack.visxstart
0;
for (pedge-edgehead.pnext ; pedge : pedge-pedge->pnext) I
psurf
pedge->psurf;
if (pedge->leading) (
/ / It's a leading edge. Figure out where it is
/ / relative to the current surfaces and insert
in
/ / the surface stack; if it's on top, emit the span
/ / for the currenttop.
1234
Chapter 67
/ I F i r s t , make s u r et h ee d g e sd o n ' tc r o s s
i f (ttpsurf->state
1) (
fx
(doub1e)pedge->x * (1.0 / (double)Ox10000):
/ I C a l c u l a t et h es u r f a c e ' s
l l z v a l u ea tt h i sp i x e l
psurf->zinv00 + psurf->zinvstepx * f x +
psurf->zinvstepy * f y ;
I / See i f t h a t makes i t a new t o p s u r f a c e
surfstack.pnext;
psurf2
zinv2
psurf2->zinv00 + psurf2->zinvstepx * f x
psurfZ->zinvstepy * fy:
zinv
--
i f (zinv
>-
zinv2) {
/ I I t ' s a new t o ps u r f a c e
/ I e m i tt h es p a nf o rt h ec u r r e n tt o p
x
( p e d g e - > x + OxFFFF) >> 16:
pspan->count
x - psurf2->visxstart:
i f ( p s p a n - > c o u n t > 0) (
pspan->y
y:
psurf2->visxstart;
pspan->x
pspan->color
psurf2->color:
/ I Make s u r e we d o n ' t o v e r f l o w
I / t h es p a na r r a y
i f ( p s p a n < &spansCMAX-SPANS])
pspan++:
- -
psurf->visxstart
x:
/ I Add t h e e d g e t o t h e s t a c k
psurf->pnext
psurf2:
psurf2->pprev
psurf:
surfstack.pnext
psurf:
&surfstack;
psurf->pprev
else {
/ I N o t a new t o p :s o r ti n t ot h es u r f a c es t a c k .
/ I Guaranteed t ot e r m i n a t ed u et os e n t i n e l
/ I b a c k g r o u n ds u r f a c e
do {
psurf2
psurf2->pnext:
zinv2
psurfZ->zinv00 +
psurf2->zinvstepx * f x +
psurf2->zinvstepy * fy;
1 w h i l e( z i n v < z i n v 2 ) :
/ I I n s e r tt h es u r f a c ei n t ot h es t a c k
psurf->pnext
psurf2:
psurf->pprev
psurfZ->pprev:
psurf2->pprev->pnext
psurf:
psurf2->pprev
psurf:
--
--
else {
I / I t ' s a t r a i l i n ge d g e :
i f t h i s was t h et o ps u r f a c e .
I / e m i t h es p a na n dr e m o v e
it.
I / F i r s t , make s u r et h ee d g e sd i d n ' tc r o s s
i f (-psurf->state
0) {
i f (surfstack.pnext
psurf) {
/ I I t ' s on t o p ,e m i tt h es p a n
x
( ( p e d g e - > x + OxFFFF) >> 1 6 ) :
pspan->count
x - psurf->visxstart:
i f (pspan->count > 0) {
pspan->y
y:
pspan->x
psurf->visxstart:
psurf->color:
pspan->color
- --
1 235
/ / Make s u r e we d o n ' t o v e r f l o w
/ / t h es p a na r r a y
i f ( p s p a n < &spans[MAX-SPANSl)
pspantc;
psurf->pnext->visxstart
x;
/ / Remove t h es u r f a c ef r o mt h es t a c k
psurf->pprev;
psurf->pnext->pprev
psurf->pprev->pnext
psurf->pnext;
}
/ / Remove e d g e st h a ta r ed o n e
pedge
removeedgesCy1;
{
w h i l e( p e d g e )
pedge->pprev->pnext
pedge->pnext;
pedge->pnext->pprev
pedge->pprev;
pedge
pedge->pnextremove:
--
/ / S t e pt h er e m a i n i n ge d g e so n es c a nl i n e .a n dr e - s o r t
f o (r p e d g e - e d g e h e a d . p n e x t
; pedge !- & e d g e t a i l ; 1 {
ptemp
pedge->pnext;
/ / S t e pt h ee d g e
p e d g e - > x +- p e d g e - > x s t e p ;
/ / Move t h e e d g e b a c k t o t h e p r o p e r s o r t e d l o c a t i o n .
/ / i f necessary
w h i l e( p e d g e - > x
< pedge->pprev->x) I
pedge2
pedge->pprev;
pedge->pnext:
pedge2->pnext
pedge->pnext->pprev
pedge2:
pedge2->pprev->pnext
pedge;
pedge->pprev
pedgeZ->pprev:
pedge->pnext
pedge2;
pedge2->pprev
pedge:
pspan->x
pedge
-1:
- -- -
ptemp;
/ / m a r kt h ee n do ft h el i s t
/ / D r a w a l lt h es p a n st h a tw e r es c a n n e do u t .
v o i d DrawSpans ( v o i d )
span-t
*pspan;
f o r (pspan-spans ; p s p a n - > x !- -1 ; pspan++)
memset(pDIB
+ (DIBPitch * pspan->y) + pspan->x.
pspan->col or,
pspan->count):
1
/ / C l e a rt h el i s t so fe d g e st oa d da n dr e m o v e
v o i dC l e a r E d g e L i s t s ( v o i d 1
(
i n t i:
1236
Chapter 67
on e a c hs c a nl i n e .
f o r (i=O
; i < D I B H e i g h t ; i++)
{
n e w e d g e s [ i l . p n e x t = &maxedge;
r e m o v e e d g e s [ i ] = NULL;
}
1
/ / R e n d e rt h ec u r r e n ts t a t eo ft h ew o r l d
v o i dU p d a t e W o r l d O
t o t h es c r e e n .
HPALETTE
HDC
HBITMAP
polygon2D-t
polygon-t
convexobject-t
in t
pl ane-t
poi ntLt
holdoal :
h d c S c r e e nh. d c D I B S e c t i o n ;
hol dbi tmap:
screenpoly;
* p p o l y .t p o l y 0 .t p o l y l .t p o l y 2 ;
*pobject;
i. j . k ;
plane;
tnormal :
UpdateviewPoso;
SetUpFrustumO:
ClearEdgeListsO;
pavailsurf = surfs:
p a v a i l e d g e = edges;
I / Draw a l l v i s i b l e f a c e s i n a l l o b j e c t s
pobject = objecthead.pnext;
w h i l e( p o b j e c t
!= & o b j e c t h e a d ) [
ppoly = pobject->ppoly;
: i < p o b j e c t - > n u m p o l y s : i++)
{
f o r (i=O
I / Move t h e p o l y g o n r e l a t i v e t o t h e o b j e c t c e n t e r
tpoly0.numvert.s = p p o l y [ i l . n u m v e r t s ;
f o r ( j = O ; j < t p o l y O . n u m v e r t s ; j++){
f o r (k=O ; k < 3 ; k++)
tpolyO.verts[jl.v[kl
= ppoly[il.verts[jl.v[kl
pobject->center.v[kl;
i f (PolyFacesViewer(&tpolyO. & p p o l y [ i l . p l a n e ) )
i f ( C l i p T o F r u s t u m ( & t p o l y O .& t p o l y l ) )
t
currentcolor = ppoly[il.color;
T r a n s f o r m P o l y g o n( & t p o l y l &
. tpoly2);
P r o j e c t P o l y g o n( & t p o l y 2 .& s c r e e n p o l y ) :
/ I Move t h ep o l y g o n ' sp l a n ei n t ov i e w s p a c e
/ / F i r s t move i t i n t ow o r l d s p a c e( o b j e c tr e l a t i v e )
tnormal = ppoly[il.plane.normal;
plane.distance = ppoly[i].plane.distance +
D o t P r o d u c t( & p o b j e c t - > c e n t e r .& t n o r m a l ) ;
/ / Now t r a n s f o r m i t i n t ov i e w s p a c e
/ I D e t e r m i n et h ed i s t a n c ef r o mt h ev i e w p o n t
p l a n e . d i s t a n c e -=
D o t P r o d u c (t & c u r r e n t p o s &
. tnormal
1;
I / R o t a t et h en o r m a li n t ov i e wo r i e n t a t
ion
plane.norma1 .v[O] =
D o t p r o d u c t( & t n o r m a l .& v r i g h t ) :
plane.norma1 .v[ll =
D o t p r o d u c (t & t n o r m a l &
. vup);
1237
plane.norma1 .v[21
D o t P r o d u c (t & t n o r r n a l &
. vpn):
A d d P o l y g o n E d g e s( & p l a n e &
, screenpoly):
pobject
pobject->pnext;
ScanEdges 0 ;
DrawSpans 0 ;
/ / Wevedrawntheframe;copy
i t t ot h es c r e e n
GetDC(hwndOutput1:
hdcScreen
holdpal
SelectPalette(hdcScreen. hpalDIB.FALSE);
RealizePalette(hdcScreen):
hdcDIBSection
CreateCompatibleDC(hdcScreen):
holdbitmap
SelectObject(hdcD1BSection. h D I B S e c t i o n ) ;
D I B W i d t hD. I B H e i g h th. d c D I B S e c t i o n .
B i t B l t ( h d c S c r e e n , 0.0,
0.0.
SRCCOPY):
S e l e c t P a l e t t e ( h d c S c r e e n , h o l d p a l , FALSE) ;
R e l e a s e D C ( h w n d D u t p u th, d c s c r e e n ) ;
SelectObject(hdcD1BSection. h o l d b i t m a p ) :
DeleteDC(hdcD1BSection);
By the same token, Listing 67.1is quite a bit more complicated than the earlier code.
The earlier codes HSR consisted of a z-sort ofobjects, followed by the drawing of the
objects in back-to-front order, one polygon at a time. Apart from the simple object
sorter, all that was needed was backface culling and a polygon rasterizer.
Listing 67.1 replaces this simple pipeline with a three-stage HSR process. After
backface culling, the edges of each of the polygons in the scene are added to the
global edge list, by way of AddPolygonEdges().After alledges have been added, the
edges are turned into spans by ScanEdgesO, with each pixel on the screen being
covered by one andonly one span (that is, theres no overdraw). Once all the spans
have been generated, theyredrawn by Drawspans(), and rasterization is complete.
Theres nothingtricky aboutAddPolygonEdges(),and Drawspans(),as implemented
in Listing 61.1, is very straightforward as well. In an implementation that supported
texture mapping,however, allthe spans wouldnt be
put on oneglobal span list and
drawn at once, as is done in Listing 67.1,because that would result indrawing spans
from all the surfaces in no particular order. (A surface is a drawing object thats
originally described by a polygon, but in ScanEdgesO there is no polygon in the
classic sense of a setof vertices bounding an area, but rather just ofa edges
set and a
surface that describeshow to draw the spans outlinedby those edges.) That would
mean constantly skipping from one texture to another, which in turn would hurt
processor cache coherency a great deal,
and would alsoincur considerable overhead
in settingup gradient and
perspective calculations eachtime a surfacewas drawn. In
Quake, we have a linkedlist of spans hangingoff each surface,and draw all the spans
for one surface beforemoving on to the next surface.
1238
Chapter 67
The core of Listing 67.1, and the most complex aspect of l/z-sorted spans, is
ScanEdgesO, where the global edge list is converted into a set of spans describing
the nearest surface at each pixel. This process is actually pretty simple, though, if
you think of it as follows:
For each scan line, thereis a setof activeedges, which are those edges that intersect
the scan line. A good partof S c d d g e s ( ) is dedicated to adding any edges that first
appear on the current
scan line (scan lines are processed from the topscan line on
the screen to the bottom), removing edges that reach their bottom on the current
scan line, and x-sorting the active edges so that theactive edges for the next
scan can
be processed from left to right. All this is per-scan-linemaintenance, and is basically
just linked list insertion, deletion, and sorting.
The heart of the action is the loop in ScanEdges() that processes the edges on the current scan line fromleft to right, generatingspans as needed. Thebest way to think of
this loop is as a surface event processor, where
each edge is an event withan associated
surface. Eachleading edge is an event markingthe start of its surface on that scan line;if
the surface is nearer than the currentnearest surface, then a span ends for the nearest
surface, and a spanstarts for thenew surface. Each trailing edge is an event marking
the endof its surface; if its surface is currently nearest,then a span ends for that surface,
and a span starts for thenext-nearest surface (the surface with the next-largest l / z at
the coordinate where the edge intersects the scan line). One handy aspect of this
event-oriented processing is that leadingand trailing edges do not need
to be explicitly paired, because they are implicitly paired by pointing to the same surface. This
saves the memory and time that would otherwise be needed to track edge pairs.
One more element is required in order for ScanEdges() to work efficiently. Each
time a leading or trailing edge occurs, it must be determined whetherits surface is
nearest (at a larger l / z value than any currently active surface). In addition, for
leading edges, the currently topmost surface must be known, and fortrailing edges,
it may be necessary to know
the currentlynext-to-topmost surface. The easiest way to
accomplish this is with a surface stuck that is, a linked list of all currently active surfaces, starting with the nearest surface and progressing toward the farthest surface,
which, as described below, is always the background surface. (The operation of this
sort of edge event-based stack was described and illustrated in Chapter 66.) Each
leading edge causes its surface to be l/z-sorted into the surface stack, with a span
emitted if necessary. Each trailing edge causes its surface to be removed from the
surface stack, again with a span emitted if necessary. As you can seefrom Listing 67.1,
it takes a fair bit of code to implement this, but all thats reallygoing on is a surface
stack driven by edge events.
Implementation Notes
Finally, a few notes onListing 67.1. First, youll
notice that althoughwe clip all polygons to the view frustum in worldspace, we nonetheless later clamp them to valid
Sorted Spans in Action
1239
1240
Chapter 67
Previous
Home
Next
plied to backface culling, so that polygons that are very nearly edge-on are culled.
This calculation would be more accurate if it were based directly on the viewing
angle, rather than on the dot product
of a viewing ray to the polygon with the polygon normal,but that would require a square root,and in my experience theepsilon
used in Listing 67.1works fine.
1 241
Previous
chapter 68
quake's lighting model
Home
Next
1245
or three other boyfriends (I never did get an exact count). The interesting thing,
though, was her response when I finally got aroundto asking her out.She said,Its
about time! When Iasked what she meant, shesaid, Ivebeen trying to get you to
ask me out all evening-but it took you forever! You didnt actually think Iwas interested in that Star Trek program, did you?
Actually, yes, I had thought that,
because Iwas interested in it. One thing I learned
from that experience, and have had reinforced countless times since, is that weyou, me, anyone who programs because they love it, who would do it for free if
necessary-are a breed apart.Were different, and luckily so; while everyone else is
worrying about downsizing, were in one of the hottest industries in the
world. And,
so far as I can see,
the biggest reason were in sucha good situation isnt intelligence,
or hardwork, or education, although those help;
its that we actually like this stuff.
Its important to keep it that way. Ive seen far too many people start to treat programming like a job, forgetting the
joy of doing it, and burn out.
So keep aneye on
how you feel about theprogramming youre doing,and if its getting stale,its time
to learn somethingnew; theres plentyof interesting programming of allsorts tobe
done. Follow your interests-and dont forgetto have fun!
Gouraud Shading
The traditional way to do realistic lighting inpolygon pipelines is Gouraud shading
(also known assmooth shading).
Gouraud shadinginvolves generating a lighting
value
1246
Chapter 68
A primary problem with Gouraud shading is that it requires the vertices used for
world geometry to serve as lighting sample points as well, even though there isnt
necessarily a close relationship between lighting and geometry. This artificial coupling often forces the subdivision of a single polygon into several polygons purely for
lighting reasons, as with the spotlights mentioned above; these extra polygons increase the world database size, and the extra transformations and projections that
they induce can harm performanceconsiderably.
Similar problems occurwith overlapping lights, and with shadows, where additional
polygons are required in order to approximate lighting detail well. In particular,
good shadow edges need small polygons, because otherwise the gradient between
light and dark gets spread across too wide an area. Worse still, the rate of lighting
Quakes Lighting Model
1247
Perspective Correctness
Another problem is that Gouraud shadingisnt perspective-correct. With Gouraud
shading, lighting varies linearly across the face of a polygon, in equal increments per
pixel-but unless the polygon is parallel to the screen, the same sort of perspective
correction is needed to step lighting across the polygon properly as is required for
texture mapping. Lack of perspective correction is not as visibly wrong for lighting
as it is for texture mapping,because smooth lighting gradients
can tolerate considerably more warping than can the detailed bitmapped images used in texture mapping,
but it nonetheless shows up in several ways.
1248
Chapter 68
Rotated 0
Rotated
degrees
90 degrees
1249
same scene, increasing the size ofthe world database and requiring extra rasterization,
at a performance cost. Triangles still dont provide perspective lighting; their lighting is rotationally invariant, but its still wrong-just wrong in a more consistant way.
Gouraud-shaded triangles still result in odd lighting patterns, and require lots of
triangles to support shadowing and otherlighting detail. Finally, triangles dont solve
clipping or viewing variance.
Yet another problem is that while it may work well to add extra geometry so that
spotlights and shadows showup well, thats feasible only for static lighting. Dynamic
lighting-light cast by sources that move-has to work with whatever geometry the
world has to offer, because its needs are constantly changing.
These issues led us to conclude that if we were going to use Gouraud shading, we
would have to build Quake levels from many small triangles, with sufficiently finely
detailed geometryso that complexlighting could be supported and the
inaccuracies
of Gouraud shadingwouldnt be too noticeable.Unfortunately, that line of thinking
brought us back to the problemof a muchlarger world database and a muchheavier
rasterization load (all the worse because Gouraud shading requires an additional
interpolant, slowing the inner rasterization loop), so that not only wouldthe world still
be less than totally solid, becauseof the limitations of Gouraud shading, but the engine
would also be too slow to support thecomplex worlds we had hoped for in Quake.
1 250
Chapter 68
thought to himselfthat if only there were some way to treat it as one surface, it would
then he realized that there was a way to do that.
The insight was to split lighting and rasterization into two separate steps. In a normal
Gouraud-based rasterizer, theres first an off-line preprocessing step whenthe world
database is built, during which polygons are added to support additional lighting
detail as needed, andlighting values are calculated at thevertices of all polygons.At
runtime, the lighting values are modified if dynamic lighting is required, and then
the polygons are drawn with Gouraud shading.
Quakes approach, which Ill call surface-based lighting, preprocesses differently,
and adds an extra rendering step. Duri,ng off-line preprocessing, a grid, called a
light map, is calculated for each polygon in the world, with a lighting value every 16
texels horizontally and vertically. This lighting is done by casting light from all the
nearby lights in the world to each of the grid points on the polygon, and summing
the results for each grid point. The Quake preprocessor filters the values, so shadow
edges dont have a stair-step appearance (a techniquesuggested by Billy Zelsnack) ;
additional preprocessing could be done, for example Phong shading to make surfaces appear smoothly curved. Then, at runtime, the
polygons texture is tiled into a
buffer, with each texel lit according to the weighted average intensities of the four
nearest light map points, as shown in Figure 68.3. If dynamic lighting is needed, the
light map is modified accordingly before the buffer, whichIll call a surface, is built.
Then thepolygon is drawn with perspective texture mapping,with the surface serving as the inputtexture, and with no lighting performed during the texture
mapping.
So what does surface-based lighting buy us? First and foremost, it provides consistent, perspective-correct lighting, eliminating all rotational, viewing, and clipping
variance, because lighting is done in surface space rather than in screen space. By
lighting in surface space, we bind the lighting to the texels in an invariant way, and
then the lighting gets a free ridethrough the perspective texture mapper and ends
up perfectly matched to the texels. Surface-based lighting also supports good, although notperfect, detail for overlapping lights and shadows. The 16-texel grid has
a resolution of two feet in the Quake frame of reference, and this relatively fine
resolution, together with the filtering performed when thelight map is built, is sufficient to support complex shadows with smoothly fading edges. Additionally,
surface-based lighting eliminates lighting glitches at t-junctions, because lighting is
unrelated to vertices.In short, surface-based lighting meets allof Quakesvisual quality
goals, which leaves onlyone question: How does it perform?
1251
Texture tile
Light map
0
32
128
9664
0
0
0
32 64 128
160
96
0
0
0
64 128
96
160
192
0
0
0
96
128
160
192
224
0
0
0
128
160
192
224
255
0
0
0
1252
Chapter 68
Surtace
1 252
1252
Chapter 68
Chapter
68
particularly efficient, because it consists of nothing more thaninterpolating intensity, combining it with a texel and using the result to look up a lit texel color, and
storing the results with a dwordwrite everyfour texels. In assembly language, we got
this code down to 2.25 cycles per lit texel in Quake. Similarly, the texture-mapping
inner loop, which overlaps an FDIV for floating-point perspective correction with
integer pixel drawing in 16-pixel bursts, has been squeezed down to 7.5 cycles per
pixel on a Pentium, so the combined inner looptimes for building and drawing a
surface is roughly in the neighborhood of 10 cycles per pixel. Its certainly possible
to write a Gouraud-shaded perspectivecorrect texture mapperthats somewhatfaster
than 10 cycles, but 10 cycles/pixel is fast enough to do 40 frames/second at640x400
on a Pentium/100, so the cycle counts of surface-based lighting are acceptable. Its
worth noting thatits possible to write a one-pass texture mapper that doesapproximately perspective-correct lighting. However, I have yet tohear of or devise such an
inner loop that isnt complicated and full of special cases, which makes it hard to
optimize; worse, this approach doesntwork well with the procedural andpost-processing techniques Ill discuss shortly.
Moreover, surface-based lighting tends to spend more of its time in inner loops,
because polygons can have any
number of sides and dont need to be split into multiple
smaller polygons for lighting purposes; this reduces the amount of transformation
and projection that are required, and makes polygon spans longer. So the performance of surface-based lighting stacks up very well indeed-except for caching.
I mentioned earlier that a 64x64 texture tile fits nicely in the processor cache. A
typical surface doesnt. Every texel in every surface is unique, so even at 320x200
resolution, something on the rough order
of 64,000 texels must be read in order to
draw a single scene. (The numberactually variesquite a bit, as discussed below,but
64,000 is in the ballpark.) This means that on a Pentium,were guaranteed to miss
the cache once every 32 texels, and the numbercan be considerably worse than that
if the texture access patterns are such that
we dont use every texel in agiven cache
line before that data
gets thrown out of the cache. Then, too, when surface
a
is built,
the surface buffer wont be in the cache, so the writes will be uncached writes that
have to go to main memory, then get readback from main memory at texture mapping time, potentially slowing things further still. All this together makes the
combination of surface building and unlit texture mapping a potential performance
problem, butthat never posed a problem during the
development of Quake, thanks
to surface caching.
Surface Caching
When he thought of surface-based lighting,
John immediately realizedthat surface building would be relatively
expensive. (In fact, he assumed itwould be considerably more
expensive than it actually turned out to be with full assembly-language optimization.)
1253
Consequently, his designincluded the concept of caching surfaces,so that if the same
surface were visible inthe next frame, it could bereused without having tobe rebuilt.
With surface rebuilding needed only rarely, thanks to surface caching, Quake's
rasterization speed is generally the speed of the unlit, perspective-correct texturemapping inner loop,which suffers from more cache misses than Gouraud-shaded,
tiled texture mapping, but doesn't have the overhead of Gouraud shading, and allows
the use of larger polygons. In the worst case, where everything in a frame is a new
surface, the speedof the surface-caching approach is somewhat slower than Gouraud
shading, but generally surface caching provides equal or better performance, so once
surface caching was implemented in Quake, performance was no longer a problem-but size became a concern.
The amountof memory required for surface caching looked
forbidding at first. Surfaces
are large relative to texture tiles, because every texel of every surface is unique. Also,
a surface can contain many texels relative to the numberof pixels actuallydrawn on
the screen, because due to perspective foreshortening, distant polygons have onlya
few pixels relativeto the surface size in texels. Surfaces associated with partly hidden
polygons must be fully built, even though only part of the polygon is visible, and if
polygons are drawn back to front with overdraw, some polygons won't even be visible, but will still require surface building and caching. What all this meant was that
the surface cache initially looked to be very large, on the order
of several megabytes,
even at 32Ox200"too much for a game intended
to run on an8 MB machine.
1254
Chapter 68
oooeee
\
Corresponding mipmap
level 1 texels
1255
Previous
Home
Next
less than effects done by on-screen overdraw every frame. Basically, the surface is a
handy repository for all sorts of effects, because multiple techniques can be
composited, because it caches the results for reuse without rebuilding, and because
the texels constructed ina surface are automaticallydrawn in perspective.
1 256
Chapter 68
Previous
chapter 69
Home
Next
1 259
Id goneonly a few hundred feetwhen I heard footsteps and shouting behind me,
and a wild-eyed man in a business suit camerunning up to me, screaming. Its bad
enough you park in our lot, but now youre leaving your garbage lying around! he
yelled. Dontyou people have anysense of decency? I told him
I worked at NESEC
and was going to pick up the can on my way back, and he shouted,Dont give me
that! I repeated my statements, calmly, and told him who I worked for and where
my office was, and hesaid, Dont give me thatagain, butwith a littleless certainty.
I kept adding detail until was
it obvious that Iwas telling thetruth, and hesuddenly
said, Oh, my God, turned red, andstarted to apologize profusely. A few days later,
we passed in the hallway, and he didnt look me in the eye.
The interesting point is that there was really no useful outcome that could have
resulted from his outburst. Suppose I had been astudent-what would he have accomplished by yelling at me? He let
his emotions overrulehis common sense,and as
a result, did something he laterwished he hadnt. Ive seen many programmers do
the same thing, especially when theyre working
long hoursand not feeling
adequately
appreciated. For example, sometime back I got mail from a programmerwho complained bitterly that although hewas critical to his companys success, management
didnt appreciate his hard work and talent, and asked if I could help him find a
betterjob. I suggested several ways that he mightlook for anotherjob, butalso asked
if he had tried working his problems out with his employers; if he really was that
valuable, what did he have to lose? He admitted he hadnt, and recently he wrote
back and said that he hadtalked to his boss, and now he was getting paid a lot more
money, was getting creditfor his work,and was just flat-out happy.
We programmers thinkof ourselves asrational creatures, butmost of us get angry at
times, and when we do, like everyone else, we tend to be driven by our emotions
instead of our minds. Its my experience that thinking rationally under those circumstances can be difficult, but produces better long-termresults every time-so if
you find yourself in that situation, stay cool and think your way through it, and odds
are youll be happier down the road.
Of course, most of the time programmers really are rational creatures,and the more
information we have, the better. In that spirit, lets look at more of the stuff that
makes Quaketick, starting with what Ive recently learned about surface caching.
1 260
Chapter 69
1261
which is not universally true. Neither do all APIs or accelerators allow applications
enough control over the texture heap so that an efficient surface cache can be implemented, a point that favors non-caching approaches. (A similar option that wasnt
open to us due to time limitations is downloading 8-bpp surfaces and having the
accelerator expand them to l 6 b p p surfaces as it stores them in texture memory.
Better yet, some accelerators support 8-bpp palettized hardware textures that are
expanded to IGbpp on thefly during texturing.)
1262
Chapter 69
1263
Quake 5 triangle-model
Figure 69.1
1264
Chapter
69
drawing pipeline.
1265
As it turns out, though,we were wrong: The repeatedframes were sometimes painfully visible, looking like jerky cardboard cutouts. In fact they looked a lot like the
sprites used in DOOM-precisely the effect we were trying to avoid. This was especially true if we reused them more than once-and if we reused them only once,
then we had to do onefull 3-D rendering plus two sprite renderings every two frames,
which wasnt much faster than simply doing two 3-D renderings.
The sprite architecture also introduced considerable code complexity, increased
memory footprint because of the needto cache thesprites, and madeit difficult to
get hidden surfaces exactly right because sprites are unavoidably 2-D. The performance of drawing the sprites dropped sharply as models got closer, and thats also
where the sprites looked worse when they were reused, limiting sprites to use at a
considerable distance. All these problems could have been worked out reasonably
well if necessary, but thesprite architecturejust had the
feeling of being fundamentally not the right approach,
so we tried thinking along differentlines.
1266
Chapter 69
split, with texture and screen coordinates that are halfway between those of the vertices at theendpoints. (The same splitting could be done for lighting, but we found
that for small triangles-the sort that subdivision works well on-it was adequate to
flat-shade each triangle at the light level of the first vertex, so we didnt botherwith
Gouraud shading.) The halfway values can be calculated very quickly with shifts.
This vertex is drawn, and then each of the two resulting triangles is then processed
recursively in the sameway, as shown in Figure 69.2. There are some additionaldetails, such as the fill rule that ensures that eachpixel is drawn only once (exceptfor
backside vertices, as noted above),but basically subdivisionrasterization boils down
to taking a triangle, splitting a side that has at least one undrawn pixel and drawing
the vertex at the split, and repeating the process for each of the two new triangles.
The code to do this, shown in Listing 69.1, is very simple and easily optimized, especially by comparison with a generalized triangle rasterizer.
Subdivision rasterization introduces considerably more error than affine texture
mapping, and doesnt draw exactly the right triangle shape, but the difference is
very hard to detect for triangles that contain only a few pixels. We found that the
point at which the difference between the two rasterizers becomes noticeable was
surprisingly close: 30 or 40 feet for the Ogres, and about 12 feet for the Zombies.
This means thatmost ofthe triangle models that are visible in atypical Quake scene
are drawn with subdivision rasterization, not affine texture mapping.
How much does subdivision rasterization help performance? When John
originally
implemented it, it more than doubled triangle-model drawing speed, because the
affine texture mapper was not yet optimized. However, I took it upon myself to see
how fast I could make themapper, so now affine texture mapping is only about 20
percent slower than subdivision rasterization. While 20 percent may not soundimpressive, it includes clipping, transform, projection, and backface-culling time, so
the rasterization difference alone is more than 50 percent. Besides, 20 percent overall means thatwe can have 12 monsters now where we could only havehad 10 before,
so we count subdivision rasterization as a clear success.
LISTING 69.1
169-1.C
Quakes r e c u r s i v es u b d i v i s i o nt r i a n g l er a s t e r i z e r :d r a w sa l l
p i x e l s i n a t r i a n g l eo t h e rt h a nt h ev e r t i c e sb ys p l i t t i n g
an
edge t o f o r m a new v e r t e x .d r a w i n gt h ev e r t e x ,a n dr e c u r s i v e l y
p r o c e s s i n ge a c ho ft h et w o
new t r i a n g l e sf o r m e d by u s i n gt h e
new v e r t e x .R e s u l t sa r e
l e s s a c c u r a t et h a nf r o m
a precise
a f f i n eo rp e r s p e c t i v et e x t u r em a p p e r ,
anddrawingboundaries
a r en o ti d e n t i c a lt ot h o s eo f
a p r e c i s ep o l y g o nd r a w e r ,a l t h o u g h
t h e ya r ec o n s i s t e n tb e t w e e na d j a c e n tp o l y g o n sd r a w nw i t ht h i s
technique.
I n v e n t e d andimplementedbyJohn
Carmack o f i d S o f t w a r e .
v o i d D-PolysetRecursiveTriangle ( i n t * I p l . i n t * l p 2 , i n t * l p 3 )
(
int
int
*temp:
d;
1267
Original triangle
(vertices have
already been drawn)
t
Split vertex
id rawn as soon
as its identified)
1268
Chapter 69
step.
int
int
short
newC61:
z;
*zbuf;
I/ try to find
anedge t h a t ' s m o r et h a no n ep i x e ll o n g
d
lp2CO1 - 1 p l C O l :
i f ( d < -1 I ( d > 1)
goto spl i t:
d
lp2C11 - l p l [ l l
i f ( d < -1
d > 1
goto spl i t:
d
lp3CO1 - lp2CO1
i f ( d < -1 ( 1 d > 1
gotosplit2:
d
1 ~ 3 1 1 1- l p 2 1 1 1
i f ( d < -1 1 1 d > 1
aoto soli t2:
d
1piCOl -' lp3CO1:
i f ( d < -1 ( 1 d > 1)
g o t os p l i t 3 :
d
l p l C l l - lp3C11;
i f ( d < -1 1 ) d > 1)
i n x or y
)I
s p l it 3 :
/ / s h u f f l e p o i n t s s o f i r s t edge i s edge t o s p l i t
temp
lpl:
lpl
lp3:
lp3
lp2:
lp2
temp:
goto spl i t:
---
return:
fill it nr i a n g l e
s p l it 2 :
/ I s h u f f l e p o i n t s so f i r s t edge I s edge t o s p l i t
temp
lpl:
lpl
lp2:
1pZ
lp3;
lp3
temp:
---
split:
/ / s p l i t f i r s t edgescreenx.screen
y . t e x t u r e s . t e x t u r e t , and
/ I t of o r m a new v e r t e x .L i g h t i n g( i n d e x
4 ) i si g n o r e d :t h e
/ I d i f f e r e n c eb e t w e e ni n t e r p o l a t i n gl i g h t i n g
and u s i n g t h e same
/ / shading f o r t h ee n t i r et r i a n g l ei su n n o t i c e a b l ef o rs m a l l
/ / t r i a n g l e s , so we j u s t u s e t h e l i g h t i n g f o r t h e f i r s t v e r t e x
of
/ I t h eo r i g i n a lt r i a n g l e( w h i c h
was u s e dd u r i n gs e t - u pt os e t
/ I d-colormap.usedbelow
t o l o o k up lit t e x e l s )
/ / s p l i ts c r e e n x
newCOl
( l p l C 0 1 + 1pZCOl) >> 1:
/ / s p l i ts c r e e n y
( 1 p l C l l + lpZC11) >> 1:
newCll
/ I splittexture s
new[,?]
( l p l C 2 1 + l p 2 [ 2 1 ) >> 1;
/ I splittexture t
new[Jl
( l p l C 3 1 + l p 2 [ 3 1 ) >> 1:
/I split 2
newC51
( l p 1 [ 5 1 + l p 2 [ 5 1 ) >> 1:
---
I1 d r a wt h ep o i n t
i f s p l i t t i n g a l e a d i n g edge
i f ( l p 2 C l l > lplC11)
gotonodraw;
i f ((lp2C11
l p 1 [ 1 ] ) && (1p2COI < l p l C O 1 ) )
gotonodraw:
1269
- newC51>>16:
/ / p o i n tt ot h ep i x e l sz - b u f f e re n t r y .l o o k i n g
up t h e s c a n l i n e s t a r t
/ I addressbased
on s c r e e n y andadding
i nt h es c r e e n
x coordinate
zbuf
zspantable[new[111 + newCO1;
/ / d r a wt h es p l i tv e r t e x
i f i t sn o to b s c u r e db ys o m e t h i n gn e a r e r ,
/ / i n d i c a t e db yt h ez - b u f f e r
i f ( z >- * z b u f )
as
pix: int
11 s e t t h e z - b u f f e r t o t h e
*zbuf
//
/I
/I
/I
//
new p i x e l s d i s t a n c e
z:
g e tt h et e x e lf r o mt h em o d e l ss k i nb i t m a p ,a c c o r d i n gt o
t h e s and t t e x t u r ec o o r d i n a t e s , and t r a n s l a t e i t t h r o u g h
t h el i g h t i n gl o o k - u pt a b l es e ta c c o r d i n gt ot h ef i r s t
v e r t e xf o rt h eo r i g i n a l( t o p - l e v e l )t r i a n g l e .B o t h
s and
t a r e i n 1 6 . 1 6f o r m a t
p i x = d~pco1ormap[skintab1e[new[31>>161Cnew~2]>>1611;
I / d r a wt h ep i x e l ,l o o k i n g
up t h es c a n l i n es t a r ta d d r e s s
/ I based on s c r e e n y and a d d i n gi nt h es c r e e n
x coordinate
d ~ v i e w b u f f e r [ d ~ s c a n t a b l e ~ n e w [ l l+
l new[O]l
nodraw:
/ I r e c u r s i v e l y draw t h et w o
/ / s p l i tv e r t e x
pix:
new t r i a n g l e s we c r e a t e d b ya d d i n gt h e
D-PolysetRecursiveTriangle
D-PolysetRecursiveTriangle
( l p 3 . I p l , new):
( l p 3 , new, l p 2 ) :
1270
Chapter 69
Previous
Home
Next
1 271
Previous
chapter 70
quake: a post-mortem
and a glimpse into
the future
Home
Next
1275
Ive talked about Quakes technology elsewhere in this book, However, those chapters focused on specific areas, not overall structure. Moreover, Quake changed in
significant ways between the writing of those chapters and thefinal shipping. Then,
after shipping, Quake was ported to 3-D hardware. And the postQuake engine, codenamed Trinity, is already in development at this writing (Spring 1997), with some
promising results. So in wrapping up this book, Ill recap Quakes overall structure
relatively quickly, then bring you up to date on thelatest developments. And in the
spirit of Frederik Pohls quote, Ill point outthat we implemented anddiscarded at
least half a dozen3-D engines in the course of developing Quake (and all of Quakes
code was written from scratch, rather than using Doom code), andalmost switched
to another onein the final month, as Ill describe later. And even at this early stage,
Trinity uses almost no Quake technology.
In fact, Ill take this opportunity to coin Carmacks Law, as follows: Fightcode entropy.
If you have a new fundamental assumption, throw away your old code andrewrite it
from scratch. Incremental patching and modifying seems easier at first, and is the
normal course of things in software development, but endsup being much harder
and producingbulkier, markedly inferior code in the long run,
as well see when we
discuss the net code
for Quakeworld. It
may seem safer to modifyworking code, but
the nastiest bugs arise from unexpectedside effects and incorrect assumptions, which
almost always arise in patched-over code, notin code designed from the ground up.
Do the hardwork up frontto make your code simple, elegant, great-and just plain
right-and itll pay off many times over in the long run.
Before I begin, Idlike to remind you that all of the Doom and Quakematerial Im
presenting in this book is presented in thespirit of sharing informationto make our
corner of the world a better place for everyone. Id like to thank John Carmack,
Quakes architect and lead programmer, and id Software for allowing me to share
this technology with you, and I encourageyou to share your own insights by posting
on the Internet andwriting books and articles whenever you have the opportunity
and the right to do so. (Of course, check with your employer first!) Weve all benefited greatly from the shared
wisdom of people like Knuth, Foley and van Dam, Jim
Blinn, Jim Kajiya, and hundreds of others-are you ready to take a shot at making
your own contribution to the future?
Preprocessing theWorld
For the most part, Ill discuss Quakes3-D engine in this chapter, although Ill touch
on other areas of interest. For 3-D rendering purposes, Quake consists of two basic
sorts of objects: the world, which is stored as a single BSP model andnever changes
shape orposition; and potentially moving objects, called entities, which are drawn in
several different ways. Ill discuss each separately.
The world is constructed from aset of brushes, which are n-sided convex polyhedra
placed in a level by a designerusing a map editor,with a selectable texture on each
1276
Chapter 70
1277
1 278
Chapter 70
of leaves that have any interest to the player. In Quake, all sounds that happenanywhere in the
world are sent to the client, and are heard,
even through walls, if theyre
close enough; anexplosion around the corner
could be well within hearing andvery
important to hear, so the PVS cant beused to reject that sound, but unfortunately
an explosion on the otherside of a solid wall will sound exactly the same. Not only is
it confusing hearing sounds through walls, but in a modem game, the bandwidth
required to send all the sounds in a level can slow things down considerably. In a
recent version of Quakeworld, aspecifically multiplayer variant of Quake Ill dlSCUSS
later, John uses the PHS to determine which sounds to bother sending, and the
resulting bandwidth improvement
has made it possible to bump themaximum number of playersfrom 16 to 32. Better yet, a sound on the other
side of a solid wall wont
be heardunless theres an opening that permits the sound
to come through. (In the
future, John will use the PVS to determine fully audible sounds, and the PHS to
determine muted sounds.)
Also, the PHS can beused for events like explosions that
might not have their center in the
PVS, but have portions that reach into the
PVS. In
general, thePHS is useful as an approximationof the space in which the client might
need tobe notified of events.
The final preprocessing step is light map generation. Each light is traced out into
the world to see what polygons it strikes, and the cumulative effect of all lights on
each surfaceis stored as a lightmap, a samplingof light values on a lf5texel grid. In
Quake 2, radiosity lighting-a considerably more expensive process, but one that
produces highly realistic lighting-is performed, butIll save that forlater.
1279
in Quakes development, Johnrealized that the approach of storing portals themselves in the world database could readily be improved upon. (To be clear, Quake
wasnt using portals at that point, and didnt endup using them.) Since the aforementioned sets of 3-D visibility clipping planes between portals-which he named
pussuge+were what actuallygot used for visibility, if he stored those, instead of generating them dynamically from the portals, he would be able to do visibility much
faster than with standard portals. This would give a significantly tighter polygon set
than thePVS, because it would be based on visibility through thepassages from the
viewpoint, rather than the PVSs approach of visibility from anywhere in the leaf,
and that would be a considerable help, because the level designers were running
right up against performance limits, partly because of the PVSs relatively loose polygon set. John immediately decided that passages-based visibility was a sufficiently
he would switchQuake to it, even at thatlate
superior approach thatif it worked out,
stage, and within a weekend, he had implemented it and had it working-only to
find that, like portals, it improved best cases but worsened worst cases, and overall
wasnt a win for Quake. In truth, given how close we were to shipping, John was as
much thankful as disappointed that passages didnt work out, but the possibilities
were too great for us not to have taken a shot at it.
So why even bother mentioning this? Partly to show that not every interesting idea
pans out;I tend to discussthose that did pan out, and
its instructive to point out that
many ideas dont. That doesnt mean you shouldnt try promising ideas, though.
First, some do pan out, and
youll never know whichunless you try. Second, an idea
that doesnt work out in one case can still be filed away for another case. Its quite
likely that passages will be useful in a different contextin a future engine.
The more approachesyou try, the larger your toolkit and the broaderyour understanding will be when you tackle your next project.
1280
Chapter 70
The children in front of the node are then processed recursively. When a leaf is
reached, polygons that touch that leaf are marked as potentially drawable. When
recursion in front of a node is finished, all polygons on the front side of the node
that are marked as potentially drawable are added to the edge list, and then the
children on theback side of that node aresimilarly processed recursively.
The edge list is a special, intermediate step between polygons and drawing. Each
polygon is clipped, transformed, and projected, and its non-horizontal edges are
added to a global list of potentially drawable edges. After all the potentially drawable
edges in the world have been added, theglobal edge list is scanned outall at once,
and all the visible spans (the nearest spans, as determined by sorting on BSP-walk
order) in the world are emitted into spanlists linked off the respective surface descriptors (fornow, youcan thinkof a surfaceas being thesame asa polygon).Taken
together, these spans cover every pixel on the screen once and only once, resulting
in zero overdraw; surfaces that are completely hidden by nearer surfaces generate
no spans at all. The spans are then drawn; all the spans for one surface are drawn, and
then all the spans for the next, so that theres texture coherency between spans,which is
very helpful for processor cache coherency,
and also to reduce setup overhead.
The primary purpose of the edge list is to make Quakesperformance as level-that is, as
consistent-as possible. Compared to simply drawing all potentially drawable polygons
front-to-back, the edgelist certainly slows down the best case, that is, whentheres no
overdraw. However, by eliminating overdraw, the worst case is helped considerably; in
Quake, theres a ratio of perhaps 4:lbetween worst and best case drawing time, versus
the 1 O : l or more that can happen with straight polygon drawing. Leveling is very
important, because cases where a gameslows down to the point
of being unplayable
dictate game and level design, and the fewer constraints placed on design, the better.
A corollary is that best case performance can be seductively misleading; itk a
great feeling to see
a scene running at 30 or even 60frames per second, but ifthe
bulk of the gameruns at ISfPs, those best cases are just going to make the
rest of
the game lookworse.
The edgelist isan atypical technology for John;its an extrastage in the engine,its
complex, and it doesntscale well. A Quake level might have a maximum of 500
potentially drawable polygons that get placed into the edge
list, and that runs fine,
but if you were to try to put 5,000 polygons into the edge list, it would quickly bog
down due to edge sorting,link following, and dataset size. Different data structures
(like using a tree to store the edges rather than a linear linked list) would help to
some degree, butbasically the edgelist has arelatively small window
of applicability;
it was appropriate technology for the degree of complexity possible in a Pentiumbased game (and even then, only with the reduction in polygons made possible by
the PVS) , but will probably be poorly suited to more complex scenes. It served well
it feels
in the Quake engine, but remains an inelegant solution, and, in the
end,like
Quake: A Post-Mortem and a Glimpse into the Future
1 281
theres something better we didnt hit on. However, as John says, Im pragmatic
above all else-and the edgelist did the job.
Rasterization
Once thevisible spans are scanned out
of the edgelist, theymust still be drawn,with
perspective-correct texture mappingand lighting. This involves hundreds of lines of
heavily optimized assembly language, but is fundamentally pretty simple. In orderto
draw the spans for a given surface, the screenspace equations for l/z, s/z, and t/z
(where s and t are the texture coordinates and z is distance) are calculated for the
surface. Then for each
span, these values are calculated for the points at each end
of
the span, the reciprocal of l / z is calculated with a divide, and s and t are thencalculated as (s/z)*z and (t/z) *z. If the span is longer than 16 pixels, s and t are likewise
calculated every 16 pixels along the span. Then each stretch of up to 16 pixels is
drawn by linearly interpolating between these correctly calculated points. This introduces someslight error, but this is almost never visible, and even then is only a small
ripple, well worth the performance improvement gainedby doing the perspectivecorrect math only once every 16 pixels. To speed things up a little more, theFDIV to
calculate the reciprocal of l / z is overlapped with drawing 16 pixels, taking advantage of the Pentiurns ability to perform floating-point in parallel with integer
instructions, so the FDIV effectively takes onlyone cycle.
Lighting
Lighting is less simple to explain. The traditional way of doing polygon lighting is to
calculate the correct light at thevertices and linearly interpolate between those points
(Gouraud shading), but
this has several disadvantages; in particular, it makes it hard
to get detailed lighting without creating a lot of extra polygons, the lighting isnt
perspective correct, and the lighting varies with viewing angle for polygons other
than triangles. To address these problems, Quake uses surface-based lighting instead.
In this approach, when its time to draw a surface (a world polygon), that polygons
texture is tiled into a memorybuffer. At the same time, the texture is lit according to
the surfaces lightmap, as calculated during preprocessing. Lighting values
are linearly
so the lighting effects
are smooth,
interpolated betweenthe light maps lGtexel grid points,
but slightly blurry. Then, thepolygon is drawn to the screen using the perspectivecorrect texture mapping described above, with the prelit surface buffer being the
source texture, rather than the original texture tile. No additional lighting is performed duringtexture mapping; all lighting is done when the surface buffer is created.
Certainly it takes longer to build a surface buffer and thentexture mapfrom it than
it does to do lighting and texture mapping in a single pass. However, surface buffers
are cached for reuse, so only the texture mapping stage is usually needed. Quake
surfaces tend to be big, so texture mapping is slowed by cache misses; however, the
Quake approach doesnt need to interpolate lighting on a pixel-by-pixel basis, which
1282
Chapter 70
helps speed thingsup, andit doesnt require additionalpolygons to provide sophisticated lighting. On balance, the performance of surface-based drawing is roughly
comparable to tiled, Gouraud-shaded texture mapping-and it looks much better, being perspectivecorrect, rotationallyinvariant,and highly detailed. Surface-based drawing
also has the potential to support some interestingeffects, because anything that can
be drawn into the surface buffer can
be cached as well,and is automatically drawn in
correct perspective. For instance, paint splattered on a wall could be handled by
drawing the splatter image as a sprite into the appropriate surface buffer, so that
drawing the surfacewould draw the splatter as well.
Dynamic Lighting
Here we come to a feature added to Quake after last years Computer Game
not supportdynamic lightDevelopersConference (CGDC). At that time, Quake did
ing; thatis, explosions and such didnt produce temporary lighting
effects. We hadnt
thought dynamic lighting would add enough to thegame to be worth the trouble;
however, at CGDC Billy Zelsnack showed us a demo of his latest 3-D engine, which
was far from finished at the time, but did
have impressive dynamic lighting effects.
This caused us to move dynamic lighting up thepriority list, and when I got back to
id, I spentseveral days making the surface-building code as fast as possible (winding
up at 2.25 cycles per texel in the inner loop) in anticipation of adding dynamic
lighting, which would of course cause dynamically lit surfaces to constantly be rebuilt as the lighting changed. (A significant drawback of dynamic lighting is that it
makes surface cachingworthless for dynamically lit surfaces, but if most of the surfaces in a scene are not dynamically lit at any one time, it works out fine.) There
things stayed for several weeks, while more critical work was done, andit was uncertain whether dynamic lighting would, in fact,make it into Quake.
Then, one Saturday,John suggested that I take a shot at adding the high-level dynamic lighting code,the code that
would take the dynamic light sourcesand project
their sphereof illumination into theworld, and which would then add thedynamic
contributions into the appropriate light maps and rebuild the affected surfaces. I
said I would as soon as I finished up thestuff I was working on, but it
might be aday
or two.A little while later, he said, I bet I can get dynamic lighting working in less
than an hour, and dove into the code. One hour and nine minutes later, we had
dynamic lighting, and its nowhard to imagine Quake withoutit. (It sure is easier to
imagine the impact of features and implement them once
youve seen themdone by
someone else!)
One interesting point aboutQuakes dynamic lighting is how inaccurate it is. It is
basically a linear projection, accounting properly for neither surface angle
light-nor
ing falloff with distance-and yet thats almost impossible to notice unless you
specifically look for it, and has no negative impact on gameplay whatsoever. Motion
and fast action can surely cover for a multitudeof graphics sins.
Quake: A Post-Mortem and a Glimpse into the Future
1283
Its well worth pointing out that because Quakes lighting is perspective correct and
independent of vertices, and because the rasterizer is both subpixel and subtexel
correct, Quakeworlds are visually verysolid and stable. This was an importantdesign
goal from the start, both
as a point of technical pride and because it greatly improves
the players sense of immersion.
Entities
So far, all weve drawn isthe static, unchanging (apart from
dynamic lighting)world.
Thats an important foundation, butits certainly not a game; now we need to add
moving objects. These objects fall into four very different categories: BSP models,
polygon models, sprites, and particles.
BSP Models
BSP models are just like the world, except that they can move. Examples include
doors, moving bridges, and health and ammo
boxes. The way these are renderedis
by clipping their polygons into the world BSP tree, so each polygon fragment is in
only one leaf. Then these fragments are addedto the edge list, just like world polygons, and scanned out, along with the rest ofthe world, when the edge list isprocessed.
The only trick here is front-to-back ordering. Each BSP model polygon fragment is
given the BSP sorting order of the leaf in which it resides, allowing it to sort properly
versus the world polygons.If two or morepolygons from differentBSP models are in
the same leaf, however, BSP ordering is no longeruseful, so we then sort those polygons by l / z , calculated from the polygons plane equations.
Interesting note: We originally tried to sort all world polygons on l / z as well, the
reason being that we could then avoid splitting polygons except whenthey actually
intersected, rather than having to split them along the lines of parent nodes. This
would result in fewer edges, and faster edge list processing and rasterization. Unfortunately, we found thatprecision errors andspecial cases such as seamlesslyabutting
objects made itdifficult to get global l / z sorting to work completely reliably, and the
code that we had to add to work around these problems slowed things up to the
point where we were getting no extra performance for all the extra codecomplexity.
This is not to say that l / z sorting cant work (especiallyin something like a flight sim,
where objects never abut), but BSP sorting order can be a wonderful thing,partly
because it always worksperfectly, and partly because its simpler and faster to sort on
integer node andleaf orders than onfloating-point l / z values.
BSP models take some extratime because of the cost ofclipping them into the
world
BSP tree, but render justas fast asthe rest of the world, again with no overdraw, so
closed doors, for example,block drawing of whatevers on the otherside (although
its still necessary to transform, project, and add to the edge list the polygons the
door occludes, because theyre still in the PVS-theyre potentially visible if the door
opens). This makes BSP models most suitable for fairly simple structures, such as
1 284
Chapter 70
boxes, which have relatively few polygons to clip, and cause relatively few edges to be
added to the edgelist.
1285
z-buffer can be; it would have been very difficult to get all those rectangles to draw
properly using any other occlusion technique. Certainly z-buffering by itself cant
perform well enough to serve for all hidden surface removal; thats why we have the
PVS and the edgelist (although for hardware rendering thePVS would suffice), but
z-buffering pretty much means thatif you can figure out how to drawan effect, you
can readily insert it intothe world withproper occlusion, and thats a powerful capability indeed.
Supporting scenes with a dozen or moremodels of 300 to 500 polygons each was a
major performance challenge in Quake, and thepolygon-model drawing code was
being optimized right up until the last weekbefore it shipped. One helpin allowing
more models per scene was the PVS; we only drewthose models that were in the PVS,
meaning thatlevels could have a hundred or more
models without requiring alot of
work to eliminate most of those that were occluded. (Note that this is not uniqueto
the PVS; whatever high-level culling scheme we had ended upusing for world polygons would have provided the same benefit for polygon models.) Also, model
bounding boxes were used to trivially clip those that werent in the view pyramid,
and to identify those that were unclipped, s o they could be sent through a special
fast path. The biggest breakthrough, though,was a very different sort of rasterizer
that John came up with for relatively distant models.
1286
Chapter 70
Sprites
We had hoped to be able to eliminate spritescompletely, making Quake 100% 3-D,
but sprites-although sometimes very visibly 2-D-were used for afew purposes, most
noticeably the cores of explosions. As of CGDC last year, explosions consisted of an
exploding spray of particles (discussed below), but there justwasnt enough visual
punch with that representation; adding a series of sprites animating an explosion
did the trick. (In hindsight,we probably should have made the explosions polygon
models rather than sprites; it would have looked about as good, and the few sprites
we used didnt justify the considerable amount
of code and programmingtime required to support them.) Drawing a sprite is similar to drawing a normal polygon,
must detect
complete with perspective correction, althoughof course the inner loop
and skip over transparent pixels, and must also perform z-buffering.
Particles
The last drawing entity type is particles. Each particle is a solid-colored rectangle,
scaled by distance from the viewer and drawn with z-buffering. There can be up to
2,000 particles in a scene, and they are used for rocket trails, explosions, and the
like. In one sense, particles are very primitive technology, but they allow effects that
would be extremely difficult to do well withthe othertypes ofentities, and they work
well in tandemwith other entities, as, for example,providing a trail of fire behind a
polygon-model lava ball that flies into the air, or generating an expanding cloud
around a sprite explosion core.
Verite Quake
Verite Quake (VQuake)was the first hardware-acceleratedversion of Quake. Itlooks
extremely good, dueto bilinear texture filtering, which eliminates most pixel aliasing,
and because it provides good performance at higher resolutions such as 512x384
and 640x480. Implementing VQuake proved to be an interesting task, for two reasons: The Verite chips fill rate was marginal forQuakes needs, andVerite contains
processing than most 3-D
a programmableRISC chip, enabling more sophisticated
accelerators. The need to squeeze as much performance as possible out of Verite
ruled out theuse of a standardAPI such as Direct 3D or OpenGL;instead, VQuake
uses Renditions proprietaryAPI, Speedy3D, with the addition of some special calls
and custom Verite code.
Quake: A Post-Mortem and a Glimpse into the Future
1287
GLQuake
The second (and, according to current plans, last) port of Quake to a hardware
accelerator was an OpenGL version, GLQuake, a native Win32 application. I have
1288
Chapter 70
1289
WinQuake
Im not going to spend much time on the Win32 port of Quake; most of what I
learned doing this consists of tedious details that are doubtless well covered elsewhere, and frankly it wasnt a particularly interesting task and was harder than I
expected, and Im pretty much tired of the whole thing. However, I will say that
Win32 is clearly the future,especially nowthat NT is coming on strong, and like it or
not, you had best learn to writegames forWin32. Also, Internet gaming is becoming
ever more important, andWin32s built-in TCP/IP support is a big advantage over
DOS; that alone was enough to convince us we had to port Quake. As a last comment, Id say that it is nice to have Windows take care of device configuration and
interfacing-now if only we could get manufacturersto write drivers for those devices that actually worked reliably! This will come as no surprise to veteran Windows
programmers, who have suffered through years of buggy2-D Windows drivers, but if
youre new to Windows programming, be prepared to run into and learn to work
a r o u n d - o r a tleast document in your readme files-driverbugs on a regular basis.
Still, when you get down to it, the futureof gaming is a networked Win32 world, and
thats that, so if you havent already moved to Win32, Id say its time.
1290
Chapter 70
Qua keWorld
Quakeworld is a native Win32 multiplayer-onlyversion of Quake, and was done as a
learning experience; itis not a commercialproduct, butis freely distributed on the
Internet. The idea behind it was to try to improve the multiplayer experience, especially
for people linked by modem, by reducing actual and perceived latency. BeforeI discuss
Quakeworld, however, I should discuss the evolution of Quakes multiplayer code.
From the beginning, Quakewas conceived as a client-server app, specifically so that
it would be possible to have persistent servers always running on the Internet, independent of whether anyone was playing on them at any particular time, as a step
toward the long-term goal of persistent worlds. Also, client-server architectures tend
to be moreflexible and robust than peer-to-peer, and itis much easier to have players come and gowill
at with client-server. Quake is client-server from the ground up,
and even in single-player mode, messages are passed through buffers between the
client codeand theserver code; its quite likely that the client and
server would have
been two processes, in fact, were it not for the need to support
DOS. Client-server
turned out to be the right decision, because Quakes ability to support persistent,
come-and-go-as-you-pleaseInternet servers with up to 16 people has been instrumental in the games high visibility in the press, and its lasting popularity.
However, client-server is not without a cost, because, in its pure form, latency for
clients consists of the round trip from the
client to the server and back. (In Quake,
orientation changesinstantly on theclient, short-circuitingthe trip to the server, but
all other events, such as motion and firing, must make the round trip before they
happen on theclient.) In peer-to-peergames, maximum latency can bejust thecost
of the one-way trip, because each client is running a simulation of the game, and
each peer sees its own actions instantly. What all this means is that latency is the
downside of client-server, but in many other respects client-server is very attractive.
So the big task with client-server is to reduce latency.
As of the release of QTestl, thefirst and last prerelease of Quake, John had smoothed
net play considerably by actually keeping the clients virtual time a bit earlier than
the time of the last server packet, and interpolatingevents between the last two packets to the clients virtual time. This meant thatevents didnt snap to
whatever packet
had arrived last, and got rid
of considerable jerking andstuttering. Unfortunately, it
actually increased latency, because of the retarding of time needed to make the interpolation possible. This illustrates a common tradeoff, which is that reducedlatency
often makes for rougherplay.
Reduced latency alsoo f e n makes for more frustrating play.Its actually nothard
to reduce the latency perceived by the player, but many of the approaches that
reduce latency introduce the potentialfor paradoxes that can be
quite distracting
and annoying. For example, a player may see a rocket go by, and think theyve
dodged it, only toJind themselves exploding a second later as the d@erence of opinion
between his simulationand the other simulation isresolved to hisdetriment.
Quake: A Post-Mortem and a Glimpse into the Future
1291
Worse, QTestl was prone to frequent hitching over allbut the best connections, because
it was built around reliable packet delivery (TCP) provided by the operating system.
Whenever a packet didnt arrive, there was a long pause waiting for the retransmission. After QTestl, John realized that this was a fundamentally wrong assumption,
and changed the code
to use unreliable packet delivery (UDP), sending the
relevant
portion of the full state every time (possible only because the PVS can be used to cull
most events in a level), and letting the game logic itself deal with packets that didnt
arrive. A reliable sideband was used as well, but only for events like scores, not for
gameplay state. However, this was a good example of Carmacks Law: John did not
rewrite the net code
to reflect this new fundamental assumption, and wound up with
8,000 lines of messy code that took right up until Quake shipped to debug. For
Quakeworld, Johndid rewrite the net code from
scratch around theassumption of
unreliable packet delivery, and it woundup as just 1,500 lines of clean, bug-free code.
In the long run,its cheaper to rewrite than to patch and modify!
1 292
Chapter 70
Quake 2
I cant talk in detail about Quake 2 as a game, but I can describe some interesting
technology features. The Quake 2 rendering engineisnt going to change that much
from Quake; the improvements are largely in areas such as physics, gameplay, artwork,
and overall design.The most interesting graphics change is in the preprocessing,where
John has added supportfor radiosity lighting; that is, the ability to put a light source into
the world and have the light bounced around theworld realistically. This is sometimes
terrific-it makes for great glowing light around lava and hanging light panels-but in
other cases its lessspectacular than the effects that designers can get by placing lotsof
direct-illumination light sources in a room, so the two methods can be used as needed.
Also, radiosity is very computationally expensive, approximately as expensive as BSPing.
Most of the radiosity demos Ive seen have been in one or
two rooms, and the order
of the problem goes up tremendously onwhole Quake levels. Heres another case
where the PVS is essential;without it, radiosity processing time would be 0(polygons2),
but with the PVS its 0 (po1ygonsaverage-potentially-visible-polygons),which is over
an order of magnitude less (and increases approximately linearly, rather than as a
squared function,with greater-level complexity).
Also, the moving sky texture will probably be gone orwill change. Onelikely replacement is an enclosing texture-mapped box around the world, at a virtually infinite
distance; this will allow open vistas, much like Doom, a welcome change from the
claustrophobic feelof Quake.
2 is a shift from interpretedQuake-C code forgame
Another likely change in Quake
logic to compiled DLLs. Part of the incentive here is performance-interpretation
isnt cheap-and part is debugging, because the standard debugger can be used with
DLLs. The drawback, of course, is portability; Quake-C program files are completely
portable to any platform Quakeruns on,with no modification or recompilation, but
DLLs compiled for Win32 require a real porting effort to run anywhere else. Our
thinking hereis that there arealmost no non-console platforms other than thePC
that matter that much anymore, and for those few that do (notably the Mac and
Linux), the DLLs can be ported along with the core engine code. It just doesnt
make sense for easy portability to tiny markets to impose a significant development
and performance cost on the one huge
market. Consoles will always require serious
porting effortanyway, so going to Win32-specific DLLsfor thePC version wont make
much difference inthe ease of doing console ports.
Finally, Internet supportwill improve in Quake 2. Some of the Quakeworldlatency
improvements will doubtless be added, but more important, there will be a new
interface, especially for monitoring and joining games,
net
in the formof an HTML
page. John has always been interested inmoving asmuch codeas possible out of the
game core, and letting the
browser take care of most of the UI makes it possible to
eliminate menuingand such from the Quake 2 engine. Thinkof being ableto browse
hundreds of Quake servers from a single Web page (much as you can today with
Quake:
1293
QSpy, but with the advantage of a standard,familiar interface and easy extensibility),
and I think youll see why John considers this the game interface of the future.
By the way, Quake 2 is currently beingdeveloped as a native Win32app only; no DOS
version is planned.
Looking Forward
In my address to the ComputerGame Developers Conference in1996, I said that it
wasnt a badtime to start up a game company aimed
at hardware-only rasterization,
and trying to make a game that leapfrogged the competition. It looks like I was
probably a year early, because hardware took longer to ship than I expected, although there was a good living to be made writing games that hardware vendors
could bundlewith their boards. Now, though, it clearly is time. By Christmas 1997,
there will be several million fast accelerators out there, and
by Christmas 1998, there
access tothe
will be tens of millions. At the same time,vastly more people are getting
Internet, andits from the convergence of these two trends that I thinkthe technology for the next generationof breakthrough real-time games will emerge.
John is already working on ids next graphics engine, code-named Trinity and targeted around Christmas of 1998. Trinity not
is only a hardware-only engine, its baseline
system isa Pentium Pro
200-plus withMMX, 32 MB, and anaccelerator capable of at
least 50 megapixels and 300 K triangles per second with alpha blending and
z-buffering. The goals of Trinity are quite different from those of Quake. Quakes primary
technical goals were to do high-quality, well-lit, complex indoor scenes with 6 degrees of freedom, andto support client-server Internet play. That was a good start,
but only that. Trinitys goals are to have much less-constrained, better-connected
worlds than Quake. Imagine seeing through open landscape from oneserver to the
next, and seeing the action on adjacent servers in detail,in real time, and youll have
an idea of where things are heading in the near future.
A huge graphics challenge for the next generationof games is level ofdetail (LOD)
management. If were to have
larger, more openworlds, there will inevitably be more
geometry visible at one time. At the same time, the push for greater detail thats
been in progress for the past four years or so will continue; people will start expecting to see real cracks and bumpswhen they get close to a wall, not justa picture of
cracks and bumpspainted on a flat wall. Without LOD, these two trends are in direct
opposition; theres no way you can make the world larger and make all its surfaces
more detailed atthe same time, without bringing the rendererto its knees.
The solution is to draw nearer surfaces with more detail than farther surfaces. In itself,
thats not so hard, butdoing it without popping and snapping being visible as you move
about is quite achallenge. John has implemented fractal landscapes with constantly
adjustable level of detail, and has made it so new vertices appear as needed and
gradually morph to their final positions, so there is no popping. Trinity is already
1294
Chapter 70
Previous
Home
Next
capable of displaying oval pillars that have four sides when viewed from a distance,
and add vertices and polygons smoothly as you get closer, such that the change is
never visible, and thepillars look oval at all times.
1295
Previous
Home
Afterword
If youve followed me this far, you might agree thatweve come through some rough
country. Still, Im of the opinion that hard-won knowledge is the best knowledge,
not only because itsticks to you better, but also because winning ahard race makes it
easier to win the next one.
This is an unusualbook in thatsense: In additionto being a compilationof much of
what I know about fast computer graphics, it is a journal recording some of the
process by which I discovered and refined that
knowledge. I didntjust sit down one
day to write this book-I wrote it over a periodof years and published its component
parts inmany places. It is ajournal of my successes and frustrations, with side glances
of my life as it happened along theway.
And there is yet another remarkable thing about this book: You, the reader, helped
me write it. Perhaps not you personally, but many people who have read my articles
and columns over the years sent me notes asking me questions,suggesting improvements (occasionally by daring me to beat them atthe code performance game!)or
sometimesjust dumping remarkable code into my lap. Where it seemed approprisometimes even the words of my correspondents, and
ate, I dropped in the code and
the book is much the richer forit.
Here and there,I learned things that had nothing all
at to do with fast graphics.
For example: Im not a doomsayer who thinks American education lags hopelessly
behind the rest of the Western world, but now and then something happens that
makes me wonder. Some time back, I received a letter from one Melvyn J. Lafitte
requesting that I spend some time in my columns describing fast 3-D animation
techniques. Melvyn hoped thatI would be so kind as to discuss, among otherthings,
hidden surface removal and perspective projection, performed in
real time, of course,
and preferably in Mode X. Sound familiar?
Melvyn shared with me a hidden surface approach that he had
developed. His technique involved defining polygon vertices in clockwise order, as viewedfrom thevisible
side. Then, he explained, one can use the cross-product equations found in any
math book to determine which way the perpendicular to the polygon is pointing.
Better yet, he pointed out, its necessary to calculate only the Z component of the
perpendicular, andonly the sign of the Z component needactually be tested.
1297
Next
Previous
Home
What Melvyn described is, of course, backface removal, a key hidden-surface technique that I used heavily in X-Sharp. In general, other hidden surface techniques
must be used in conjunction with backface removal, but backface removal is nonetheless important and highly efficient. Simply put, Melvyn had devised for himself
one of the fundamentaltechniques of 3-D drawing.
Melvyn livesin Moens, France. At the time he wrote me, Melvyn was 17years old. Try
to imagine any American 17-year-old of your acquaintance inventing backface removal. Tryto imagine any teenager you know even using
the phrase the cross-product
equations found in any math book. Not to mention that Melvyn was able to write a
highly technical letter in English; and if Melvyns English was something less than
flawless, it was perfectly understandable, and,in my experience, vastly better than an
average, or even well-educated, Americans French. Please understand, I believe we
Americans excel in awide variety of ways,
but I worry that when it comes
to math and
foreign languages, we are becoming a nation of tEtes depomme de t m e .
Maybe I worry too much. If the glass is half empty, well, its also half full. Plainly,
something I wrote inspired Melvyn to do something thatis wonderful, whether he
realizes it or not. And it has been tremendously gratifylng to sense in the letters I
have received the same feeling of remarkably smart people going out there and
doing amazing things just for the sheer unadulterated of
funit.
I dontthink Im exaggeratingtoo much (well, maybea little) when I say that this sort of
fun is what I live for. Im glad to see that so many of you share that samepassion.
Good luck. Thank you for your input, your code, and all your kind words. Dont be
afraid to attempt theimpossible. Simply knowing whatis impossible is useful knowledge-and you may well find, in the wake of some unexpectedsuccess, that nothalf
of the things we call impossible have any right at all to wear the label.
-Michael
1298
Afterword
Abrash
Next
Previous
Index
3-D clipping
arithmetic imprecision, handling, 1240
line segments, clipping to planes,
1195-1197
overview, 1195
polygon clipping
BackRotateVector function, 1203
clipping to frustum, 1200, 12011206, 1206-1207
ClipToFrustum function, 1204
ClipToPlane function, 1199
optimization, 1207
overview, 1197-1200
PolyFacesViewer function, 1203
ProjectPolygon function, 1201
SetUpFrustum function, 1204
SetWorldspace function, 1204
TransformPoint function, 1203
TransformPolygon function, 1203
Updateworld function, 1205
viewspace clipping, 1207
ZSortObjects function, 1201
3-D drawing
See also BSP (Binary Space
Partitioning) trees; Hidden surface
removal; Polygons, filling; Shading;
3-D animation.
backface removal
BSP tree rendering, 1160-1161
calculations, 955-957
motivation for, 954-9j j
and sign of dot product, 1140
solid cube rotation demo program,
957-961, 962-963, 964-966, 967
background surfaces, 1240
draw-buffers, and beam trees, 1187
and dynamic objects, 1100-1101
Gouraud shading, 1246-1250
Numbers
l/z sorting
abutting span sorting, 1229-1230
AddPolygonEdges function, 12321233, 1238
vs. BSP-order sorting, 1226-1227
calculating l/z value, 1220-1222
ClearEdgeLists function, 1236-1237
Drawspans function, 1236
independent span sorting, 1230, 12311238, 1239-1241
intersecting span sorting, 1228-1229
PolyFacesViewer function, 1232
reliability, 1227
ScanEdges function, 1234-1236,
1238-1239
Updateworld function, 1237-1238
3-D animation
See also Hidden surface removal; 3-D
drawing; 3-D polygon rotation
demo program; X-Sharp 3-D
animation package.
demo programs
solid cube rotation program, 957961, 962-963, 964-966, 967
3-D polygon rotation program, 939,
940-945, 948-949
12-cube rotation program, 972, 973984, 985-987
depth sorting, 1000, 1001-1002
rotation
ConcatXforms function, 944
matrix representation, 938-939
multiple axes of rotation, 948
XformVec function, 943
rounding vs. truncation, 1002-1003
translation of objects, 937-938
1299
Home
lighting
Gouraud shading, 1246-1250
overlapping lights, 1247
perspective correctness, 1248-1250
rotational variance, 1247
surface-based lighting, 1250-1256,
1260-1262
viewing variance, 1249
moving models in 3-D drawings,
1212-1222
painter's algorithm, 1099, 1104-1105
perspective correctness problem,
1248-1250
portals, and beam trees, 1188
projection
dot products, 1141-1 142
overview, 937, 948
raycast, subdividing, and beam trees, 1187
reference materials, 734-935
rendering BSP trees
clipping, 1158-1159
Clipwalls function, 1152-1155,
1158-1157
DrawWahBackToFront function,
1155-1156, 1160-3161
overview, 1149
reference materials, 1157
TransformVertices function, 11511152, 1158
UpdateViewPos function, 1151, 1157
Updateworld function,
1156-1157,1157
viewspace, transformation of
objects to, 1158
wall orientation testing, 1160-1 161
WallFacingViewer function, 11501151, 1161
span-based drawing, and beam
trees, 1187
transformation of objects, 935-936
triangle model drawing
fast triangle drawing, 1263-1265
overview, 1262-1263
precision, 1265
subdivision rasterization, 1266-1267,
1267-1270
944,748
overview, 937
performance, 949
polygon filling with clipping support,
940-943
effects on performance, 82
optimizing for, 83-85
overview, 79-82
and registers, 85
12-cube rotation demo program
limitations of, 986
optimizations in, 985-986
performance, 986
X-Sharp animation package, 972, 973984, 984-985
16-bit checksum program
See also TCP/IP checksum program.
assembly implementation, 10-12, 17-18
C language implementation, 8-9, 15-16
overview, 8
redesigning, 9
16-color VGA modes
color paging, 628-629
DAC (DigitaVAnalog Converter), 626-628
palette RAM, 626
24-byte hi/lo function, 292-293
32-bit addressing modes, 256-258
32-bit division, 181-184, 1008
32-bit fixed-point arithmetic, optimizing,
1086-1089, 1090-1091, 1092-1093
32-bit instructions, optimizing, 1091
32-bit registers
See also Registers; VGA registers.
adding with LEA, 131
BSWAP instruction, 252
multiplying with LEA, 132-133
386 processor, 222
time vs. space tradeoff, 187
using as two 16-bit registers, 253-254
256-color modes
See also 320x400 256-color mode.
DAC settings, 629
mapping RGB model to, 1036, 10371038, 1039
resolution, 360x480 256-color mode,
619-620
286 processor
CMP instruction, 161, 306
code alignment, 215-218
cycle-eaters, 209-210
data alignment, 213-215
data transfer rates, 212
display adapter cycle-eater, 219-221
display memory wait states, 220
DRAM refresh cycle-eater, 219
A
Absolute value, setting AX register, 171
Abstraction, and optimization, 330-332,
345-346
Abutting span sorting, 1229-1230
AC (Attribute Controller), VGA
addressing, 427-428
Color Select register, 628-629
Index register, 443, 555
Mode Control register, 575
Mode register
color paging, 628-629
256-color modes, 629
palette RAM registers, setting, 631-632
Pel Panning register, 574
registers, setting and reading, 583
screen blanking demo program,
556-557
441
92,93
objectives, 28
optimizing instructions, 23-24
programmers responsibilities, 27-29
rearranging instructions, 418-419
reducing size of code, 416-418
stack addressing, 420
understanding data, importance of, 122
Assembly language programmers, vs.
compilers, 154-155
Assembly language, transformation
issues, 25-26
AT computer
display adapter cycle-eater, 107
286 processor, data transfer rates, 212
Attribute Controller, VGA. See AC
(Attribute Controller), VGA.
Automatic variables, 184-185
AX register, setting to absolute value, 171
B
Backface culling. See Backface removal.
Backface removal
See also Hidden surface removal;
Visible surface determination.
BSP tree rendering, 1160-1161
calculations, 955-957
motivation for, 954-955
and sign of dot product, 1140
solid cube rotation demo program,
957-961, 962-963, 964-966, 967
Background surfaces, 1240
BackRotateVector function, 1203
Ball animation demo program, 431-441
Barrel shifter, VGA, 463-464
Beam trees
improvement, attempts at, 1187-1188
overview, 1185
performance, 1186
potentially visible set (PVS),
precalculating, 1188-1189
Benchmarks, reliability of, 729
Biased perceptions, and optimization,
1080, 1085
Big endian format, 252
BIOS. See EGA BIOS; VGA BIOS.
Bit mask
bitmapped text demo program, 466469, 470-471
505-508
relocating, 516-517
transient color effects, 509
Bit-plane animation
assembly implementation, 801-809,810
limitations, 811-813
overview, 796
page flipping, 814
palette registers, 799-801
principles, 796-798
shearing, 813
Black box approach, and future of
programming, 725-726
Blocks. See Restartable blocks.
Borders (overscan), 555-556
BOUND instruction, 221
Boundary pixels, polygons
rules for selecting, 712
texture mapping, 1049-1052, 10651066, 1067
Bounding volumes, 1184
Boyer-Moore algorithm
assembly implementations, 271-274,
274-277
671-677
692, 692-693
description, 683-684
implementation details, 685-687
integer-based implementation,
685-687
potential optimizations, 70j
Bresenhams run-length slice algorithm.
See Run-length slice algorithm.
Bridges, John
mode set routine, 360x480 256-color
mode, 609, 612, 620-621
256-color modes, undocumented, 879
Brute-force solutions, 193
BSP (Binary Space Partitioning) trees
2-D line representation, 1120
3-D rendering, 1162
beam trees
improvement, attempts at, 1187-1 188
overview, 118j
performance, 1186
potentially visible set (PVS),
precalculating, 1188-1189
BSP compiler
BuildBSPTree function, 1125-1127
SelectBSPTree function, 1124-1125
BuildBSPTree function, 1125-1127
building,1101-1104
BuildTree function, 1112
data recursion vs. code recursion,
1108-1113
description, 1098-1099, 1119
and dynamic objects, 1100-1101
C
C library functions
getco function, 12, 14
m e m c w function, 116
memcmpo function, 116
memcpy0 function, 1147-1148
memsea function, 727
optimization, 15
read0 function, 12, 121
strsts() function, 115
Cache, internal. See Internal cache.
Cache lines, Pentium processor, 374
Calculations, redundant, and
optimization, 682-683
Calculus and Analytic Geometry
(book), 1135
CALL instruction
486 processor, 241-242
Pentium processor, 404
Carmack, John
and id Software, 1118
overdraw, 1184-1186
subdivision rasterization, 1266-1267,
1267-1270
Carry flag
DEC instruction, 148
INC vs. ADD instructions, 147-148
LOOP instruction, 148
rotating bits through, 185
in word count program (David
Stafford), 317-319
Cats, shipping via air freight, 697-698
cellmap class, 325-329, 333-335, 341-345
Cellmap wrapping, Game of Life, 331332, 333-335, 336, 337-338
Cell-state method, 327, 334, 344
CGA (Color/Graphics Adapter)
display adapter cycle-eater, 104
VGA compatibility with, 430
Challenges
Game of Life
rules, 346, 350
3-cell-per-word implementation
(David Stafford), 351-352, 353363, 363-365
ScanBuffer routine, 305 , 307-319
Change list, in Game of Life, 363-366
Chaplin, Michael, 776
Charactedattribute map, VGA mode 3, 517
Chartreuse moose story, 399
Checksum programs. See 16-bit
checksum program; TCP/IP checksum
program.
Chunky bitmap conversion demo
program, 505-508
Chunky bitmaps, converting to planar,
504-505, 505-508
Circular linked lists. 288-292
Clear-cell method, 327, 334, 343
ClearEdgeLists function, 1236-1237
Clements, Willem, 313-315
Client-server architecture, and
Quakeworld, 1291
Clipping
See also Hidden surface removal
(HSR); Visible surface determination
(VSD).
arithmetic imprecision, handling, 1240
in BSP tree rendering, 1158-1159
line segments, clipping to planes,
1195-1197
masked copying, Mode X, 923
overview, 1195
polygon clipping
BackRotateVector function, 1203
clipping to frustum, 1200, 12011206, 1206-1207
ClipToFrustum function, 1204
ClipToPlane function, 1199
optimization, 1207
overview, 1197-1200
PolyFacesViewer function, 1203
ProjectPolygon function, 1201
SetUpFrustum function, 1204
SetWorldspace function, 1204
TransformPoint function, 1203
TransformPolygon function, 1203
UpdateViewPos function, 1202
Updateworld function, 1205
viewspace clipping, 1207
ZSortObjects function, 1201
ClipToFrustum function, 1204
ClipToPlane function, 1199
Clock cycles
See also Cycle-eaters.
address calculation pipeline, 238-240
branch prediction, 377-378
byte registers and lost cycles, 242-245
cross product floating point
optimization, 1171, 1172
and data alignment, 213-215
data transfer rates, 81, 82
dot product floating point
optimization, 1170
dual-pipe execution, 405
effective address calculations
286 and 386 processors, 223-225
Pentium processor, 375-376
8088 processor
data transfer rates, 81, 82
memory access, 82, 83-85
floating point instructions, 1167-1170
486 processor
address calculation pipeline, 238240, 250
byte registers and lost cycles,
242-245
indexed addressing, 237-238
stack addressing, 241-242
32-bit addressing modes, 256-258
EXCH instruction, 1170
indexed addressing, 237-238
551-555
509-515
Compilers
vs. assembly language programmers,
154-155
avoiding thinking like, 152, 154-155
bitblt compiler for Game of Life
(David Stafford), 351-352, 353-363,
363-365
handling of segments, 154
Complex polygons
defined, 710, 742
edges, keeping track of, 742-744, 753,
755, 756
polygon-filling programs, 745-752, 754
Computational Geomety,An
Introduction (book), 759-760
Computer Graphics:Princaples and
Practice (book), 660, 934, 1121
Computer Graphics (book), 1135, 1157
ConcatXforms function
assembly implementation,997-999,
1019-1022
CopyDirtyRectangleToScreen
function, 866-867
Copying
bytes between registers, 172
pixels, using latches (Mode X), 905907, 908, 909-911
CopyRect subroutine, 871
CopyScreenToScreenMaskedX
subroutine, 918, 919-921
CopyScreenToScreenX subroutine,
905-907,908
CopySystemToScreenMakedX
subroutine, 916-918
CopySystemToScreenX subroutine,
908, 909-911
CosSin subroutine, 994-996,
999, 1013-1015
Count-neighbors method, 334-335
CPU reads from VGA memory, 526
CPUID instruction, Pentium
processor, 378
CreateAlignedMaskedImage function,
922-923
Cross products
calculating, 955-956, 1139-1140
floating point optimization, 1171, 1172
CRT Controller, VGA. See CRTC (CRT
Controller), VGA.
CRTC(CRT Controller), VGA
addressing, 427-428
Line Compare register, 565
Overflow register, 565
shearing, 813-814
start address registers, setting, 583
Cycle-eaters
286 and 386 processors
data alignment cycle-eater,
213-215, 218
display adapter cycle-eater, 219-221
DRAM refresh cycle-eater, 219
overview, 209-210
prefetch queue cycle-eater, 211-212
system wait states, 210-212
data alignment cycle-eater
386 processor, 218
286 processor, 213-215
display adapter cycle-eater
286 and 386 processors, 219-221
8088 processor, 101-108
DRAM refresh cycle-eater
286 and 386 processors, 219
8088 processor, 95-99, 108
8-bit bus cycle-eater, 79-85, 108
8088 processor
display adapter cycle-eater, 101-108
DRAM refresh cycle-eater, 95-99, 108
8-bit bus cycle-eater, 79-85, 108
prefetch queue cycle-eater, 86-94, 108
wait states, 99-101
overview
286 and 386 processors, 209-210
8088 processor, 78-79, 80
prefetch queue cycle-eater
286 and 386 processors, 211-212
8088 processor, 86-94, 108
system wait states, 210-212
wait states, 99-101
Cycles. See Clock cycles; Cycle-eaters.
D
DAC (DigitaVAnalog Converter)
color cycling
bit-by-bit loading, 650-651
color cycling demo program, 643,
644-648, 648-649
interleaved loading, 649-650
problems, 640-643
using subset of, 649
Data register, 642-643
index wrapping, 651
loading
bit-by-bit loading, 650-651
directly, 642-643
interleaved loading, 649-650
via VGA BIOS, 641-642, 648
and Write Index register, 642-643, 651
Mask register, blanking screen, 651
Read Index register, 651-652
reading, 651-652
setting registers, 630, 631-632
in VGA color path, 626-628
Write Index register
DAC index wrapping, 651
loading DAC, 642-643
DAC registers demo program, 632-635
Data alignment cycle-eater
386 processor, 218
286 processor, 213-215
Data bus, 8-bit
See also 8-bit bus cycle-eater.
Data manipulation instructions, and
flags, 147
Data recursion
vs. code recursion, 1108
Euclids algorithm, 200
inorder tree traversal, 1108, 11091110,1110
Data register, loading DAC, 642-643
Data Rotate register
barrel shifter, controlling, 463
vs. CPU-based rotations, 489
effect on ALUs, 452
Data rotation, VGA
barrel shifter, 463-464
bit mask, 464-471
CPU vs. Data Rotate register, 489
847-851, 863-869
description, 844-845
ordering rectangles, 873
overlapping rectangles, 872-873
vs. page flipping, 846, 862
performance, 873
system memory buffer size, 851
writing to display memory, 856-857
Disk caches, 19
Display adapter cycle-eater
286 and 386 processors, 219-221
data transfer rates, 220
8088 processor
graphics routines, impact on, 106
optimizing for, 107
overview, 101-104
performance, impact on, 104
read/write/modify operations, 107
wait states, 99-101
Display memory
See also Bit mask; Display
memory access.
Mode X
copying between memory locations,
905-907, 908
copying from system memory, 908,
909-911
1267-1270
DrawHorizontdLineList subroutine,
941-943
DrawHorizontdLineSegfunction
assembly implementation, 754
C implementation, 750-751
DrawHorizontalRunfunction, 692
DrawImage subroutine, 828
Drawing
See also Line-drawing algorithms;
Lines; 3-D drawing.
fill patterns, using latches, 453
pixel drawing
EVGADot function, 6 1 - 6 2 , 669-670
optimization, 1074, 1086
painters algorithm and overdraw
problem, 1184
single-color drawing with write mode
3, 831-832
speeding up, 727-729
text
bitmapped text using bit mask, 466469, 470-471
bitmapped text using write mode 3,
484-489, 489-490, 490-496
solid text using latches, 1039-1041,
1042-1044
1025-1027
1055-1056
E
EA (effective address) calculations
286 and 386 processors, 223-225
8088 processor, 129
486 processor
address calculation pipeline, 238-240
stack addressing, 241-242
Pentium processor, 375-376
320x400 256-color mode, 599-600
EBP register, 257
Edge tracing
overview, 711-713
ScanEdge function
assembly implementation,
735-738, 735
floating-point C implementation,
716-717
integer-based C implementation,
730-732
200-202
675-677
F
FADD instruction, Pentium processor,
1167-1170
Far jumps, to absolute addresses, 186-187
FDIV instruction, Pentium processor,
1167-1170
Fetch time
See also Instruction fetching.
286 and 386 processors, 210, 211
8088 processor, 86-93
Files
reading from
getco function, 12, 14
read0 function, 12
restartable blocks, 16
text, searching for. See Search engine.
Fill patterns, drawing using latches, 453
FillConvexPolygon function, 714-716,
720-721
FillMonotoneVerticalPolygon
function, 763-764
FillPatternX subroutine, 899, 900-903,
903-904
FillPolygon function
complex polygons, 746
monotone-vertical polygons, 767
FillRect subroutine, 869-870
FillRectangleX subroutine
four-plane parallel processing, 888891, 891-893
pixel-by-pixel plane selection, 885-887
plane-by-plane processing, 887-889
FindIDAverage function
assembly implementations
based on compiler optimization, 160
data structure reorganization, 163,
165-166
FindNodeBeforeValueNotLess
function, 286, 287
F i n a t r i n g function
Boyer-Moore algorithm, 269, 271-274,
274-277
overview, 175
scan-on-first-character approach, 176
s c a n - o n - s p e c i c t e r approach, 178
FirsWass function, 355-358
Fix function, 358, 365
FixedDiv subroutine, 982, 993,
1010-1012
FIXED-MUL macro, 1016-1017
FtvedMul subroutine, 981, 993-994,
1009-1010
Fixed-point arithmetic
vs. floating point, 985, 1206
vs. integer arithmetic, 730, 1065
32-bit fixed-point arithmetic, 10861089, 1090-1091, 1092-1093
Flags
and BSWTAP instruction, 254
Carry flag, 147-148, 185, 317-319
INC vs. ADD, 147-148
and LOOP instruction, 148
and NOT instruction, 146-147
FLD instruction, Pentium processor,
1167-1 170
Floating point optimization
clock cycles, core instructions,
1167-1168
cross product optimization, 1171, 1172
dot product optimization, 1170, 1171
FXCH instruction, 1169-1170
interleaved instructions, 1169-1170
matrix transformation optimization,
1172-1173, 1173-1174
overview, 1167-1170
pipelining, 1168-1170
projection to screen space, 1174
rounding control, 1174-1175
Floating-point calculations
vs. fixed-point calculations, 985, 1206
vs. integer calculations, 730
FMUL instruction
486 processor, 236
Pentiurn processor, 1167-1170
486 processor
AX register, setting to absolute value,172
byte registers and lost cycles, 242-245
CMP instruction
operands, order of, 306
vs. SCASW, 161
copying bytes between registers, 172
and display adapter cycle-eater, 107
indexed addressing, 237-238
internal cache
effect on code timing, 246
optimization, 236
LAHF and S A H F instructions, 148
LEA instruction, vs. ADD, 131
LODSB instruction, 304
LODSD instruction, vs. MOVLEA
sequence, 171
G
Game of Life
abstraction and performance, 330-332,
345-346
341-345
C++ implementation
basic, 324, 325-328
optimized, 336, 337-338
cellmap-wrapped implementation,
331-332, 333-335, 336, 337-338
challenge to readers
rules, 346, 350
3-cell-per-word implementation
(David Stafford), 351-352, 353363, 363-365
change list, 363-366
performance analysis, 329-330, 332,
338, 340, 350
re-examining problem, 338-339, 363
rules, 324
3-cell-per-word implementation
discussion, 363-365
listing, 352-363
overview, 351-352
GC (Graphics Controller), VGA
addressing, 427-428
architecture
ALUS, 451-452
barrel shifter, 463-464
bit mask, 464-471
latches, 452-453
set/reset circuitry, 471-479
Bit Mask register
bit mask, controlling, 465
drawing solid text, 1040
setting inside a loop, 429
vs. write mode 3, 832, 844
Color Compare register, 531
Data Rotate register
barrel shifter, controlling, 463
vs. CPU-based rotations, 489
effect on ALUs, 452
Enable Set/Reset register
setting drawing color, 666
specifying plane, 474
Graphics Mode register
read mode 0 , selecting, 525
read mode 1 , selecting, 531
Read Map register
plane, selecting, for CPU reads, 526
planes, specifying to be read, 542
Set/Reset register, 666
Gcdo function
543-545
545-547
H
Hardware dependence, DDA (digital
differential analyzer) texture
mapping, 1053
Hecker, Chris
texture mapping insight, 1083
I
id Software, 1118, 1190
Ideas, selling, 1193-1194
Illowsky, Dan, 187, 315
Image precedence. See
Bit-plane animation.
DluL instruction
486 processor, 236
on 386 processor, 173-174
INC instruction
VS. ADD, 147-148, 219
and Carry flag, 147-148
1173-1174
overview, 394-395
TCPAP checksum program, 408
Internal animation, 872
Internal buffering
See also Restartable blocks.
in 16-bit checksum program, 15-16
in search engine, 114-115
Internal cache
486 processor
effect on optimization, 236
timing code, 246
Pentium processor
instruction fetching, 374
organization, 374-375
paired instructions, 391, 396
Internal indexing, VGA, 427-429
Internet support
Quake 2, 1293
Quakeworld, 1291
Interrupts
DAC, loading, 643, 648
Divide By Zero interrupt, 181
and IRET instruction, 227
and long-period Zen timer, 53, 66
and page flipping, 446
and POPF instruction, 226
and Zen timer, 43, 45-46
Index
J
Jet Propulsion Lab, color perception
research, 1035
JMP $+2 instructions, 558, 632
JMP DWORD PTR instruction, 186-187
Jumps, to absolute addresses, 186-187
K
Kennedy, John, 171-172
Kent, Jim
dynamic palette adjustment, 1039
monotone-vertical polygons, filling,
760-761
Kissing, learning to,
281-282
Kitchen floor story, 261-262
Klerings, Peter, 350
Knuth, Donald, 323
L
U H F instruction, 148
Large code model
linking Zen timer, 71
optimizing assemblers, 71-72
Latches
and bit mask, 470
and Color Dont Care register, 535537, 535
and CPU reads, 530
drawing solid text, 1039-1041,
1042-1044
Mode X
copying pixels, 905-907, 908,
909-911
922-923
704-706
C-language implementation, 688-691
Line-drawing algorithms
accumulated pixels approach (Jim
Mackraz), 678
Bresenhams algorithms
basic line-drawing algorithm, 655661, 661-665, 665-671, 671-677
run-length slice algorithm, 683-693,
698-704, 705
characteristics of, 656-657
run-length slice algorithm, 683-693,
698-704, 705
Wu antialiasing algorithm, 776-779,
780-791, 791-792
Line-drawing demo program, 615-618,
618-619
LineIntersectPJane function, 1142-1143
Lines
drawing
See also Line-drawing algorithms.
color-patterned lines demo
program, 509-515
32OSee also Restartable blocks.400
256-color mode, 600
write mode 2, 509
intersecting, 1121-1123
parametric lines
clipping, 1121-1123
overview, 1119-1120
Linked lists
basic implementation, 283-285
circular lists, 288-292
dummy nodes, 285-287
head pointers, 284, 285
InsertNodeSorted assembly
routine, 290
overview, 282
sentinels, 285-287
sorting techniques, 755
tail nodes, 286
test-bed program, 291
Little endian format, 252
Local optimization
See also Assemhly language
optimization; Optimization.
bit flipping and flags, 146-147
defined, 140
incrementing and decrementing,
147-148
M
Mackraz, Jim, 678
Map Mask register
demo program, 472-473
drawing text, 833
optimizing Mode X, 1074
vs. Read Map register, 526
selecting planes for CPU writes, 443444, 471-472
with sedreset circuitry, 474
write mode 1, 443
Map Mask register demo program,
472-473
1174
944, 948
1173-1174
925-930
3-D polygon rotation, 939,
940-945, 943
885-887
925-930
clipping, 923
between display memory locations,
918-919, 919-921
image and mask alignments,
generating, 922-923
performance, 924
system memory to display memory,
916-918, 916
memory allocation, 903-904
mode set routine, 880-881, 882
optimization, 1074
pattern fills, 899, 900-903, 903-904
pixel access and hardware planes, 1082
ReadPixelX subroutine, 884-885
vertical scanlines vs. horizontal,
1084-1086
WritePixelX subroutine, 883-884
ModelColor structure, 1035
ModelColofloColorIndex function,
1036, 1038
Mod-WM byte, 257
Modular code
and future of programming
profession, 725-726
optimizing, 153
N
NEG EAX instruction, 222
Negation, twos complement, 171
Next1 function, 353
Next2 function, 353
Nextseneration method, 327-328,
335, 336, 337-338, 344
Nonconvex objects, depth sorting, 1000,
1001-1002
Normal vectors
building BSP trees, 1106
calculating, 955-956
direction of, 1140
Normals. See Normal vectors.
NOSMART assembler directive, 72
NOT instruction, 146-147, 147
0
Object collisions, detecting, 531-534
Object space, 935, 1135
Object-oriented programming, 725-726
Octant0 function
360x480 256-color mode line drawing
demo program, 615
Bresenhams line-drawing algorithm,
662, 668-669
Octant1 function
360x480 256-color mode line drawing
demo program, 616
Bresenhams line-drawingalgorithm,
663, 668-669
531-534
P
Page flipping
and bit-plane animation, 814
color cycling, 650
vs. dirty-rectangle animation, 846, 862
display memory start address,
changing, 857
mechanics of, 833-836
memory allocation, 834, 903-904
overview, 444-446
single-page technique, 855-857
640x480 mode, 836-837
with split screen, 836-837
320x400 256-color mode, 600-605
timing updates, 835-836
VGA mode 12H (hi-res mode), 851-855
Page flipping animation demo programs
Mode X, 924-925, 925-930
split screen and page flipping, 820825, 825-830, 836-837
320x400 256-color mode, 600-605
Painters algorithm
See also 3-D animation; 3-D drawing.
and BSP trees, 1099, 1104-1105
overdraw problem, 1184-118j
potentially visible set (PVS),
precalculating, 1188-1189
Pairable instructions, Pentium
processor, 388
(book), 1148
Performance
See also Assembly language
optimization; Clock cycles; Cycleeaters; Local optimization;
Optimization; Zen timer.
and abstraction, 330-332, 345-346
551-555
Pixels
See also Boundary pixels, polygons;
Pixel drawing.
copying, using latches (Mode X), 905907,908, 909-911
reading (320x400 256-color mode), 599
redrawing, display adapter
cycle-eater, 102
rotating bits, 252
writing (320x400 256-color mode),
599, 600
Plane mask, 1074
Plane-manipulation demo program,
476-478
Planes
clipping line segments to, 1195-1197
l/z value, calculating, 1221
representation, 1196
Planes, VGA
See also Bit-plane animation.
ALUs and latches, 451-453
and bit mask, 465
capturing and restoring screens, 541542, 543-547,547-548
and Color Dont Care register, 534-535,
535-537
885-887
719-720
982-984
761-771
733-734, 735-739
PolygonIsMonotoneVertical
function, 761
Polygons
See also Texture mapping.
adjacent, and l/z span sorting, 1230
backface removal, 954-957, 1160-1161
categories of, 710, 742,759-760
clipping, 1158-1159
Gouraud shading, 1247
hidden surface removal, 1214-1222
normal vector, calculating, 955-956
projection in 3-D space, 937,
944-945,948
representation, 1196
3-D polygon rotation demo program
matrix multiplication functions, 943-
944,948
overview, 939
performance, 949
polygon filling with clipping
support, 940-943
transformation and projection, 944945, 948
transformation to 3-D space, 935
unit normal, calculating, 1027-1028,
1137-1140
visibility, calculating, 955-956
visible surface determination (VSD)
beam trees, 1185-1189
overdraw problem, 1184-1185
1142-1143
944-945
Q
QLife program, 352-363
QSCAN3.ASM listing, 309-311
Quake 2, 1293
Quake
surface caching, 1253-1256, 1260-1262
surface-based lighting
description, 1250-1251
mipmapping, 1254-1255
performance, 1251-1253
surface caching, 1253-1256,
1260-1262
texture mapping, 1261-1262
3-D engine
BSP trees, 1276-1277
lighting, 1282-1283
model overview, 1276-1277
portals, 1279-1280
potentially visible set (PVS), 1278-1279
rasterization, 1282
world, drawing, 1280-1281
and visible surface determination
(VSD), 1181
Quakeworld, 1291-1292
R
Radiosity lighting, Quake 2, 1293
Rasterization of polygons
See also Polygons, filling.
boundary pixels, selecting, 712
efficient implementation, 711
in Quake 3-D engine, 1282
Rate of divergence, in 3-D drawing, 937
Raycast, subdividing, and beam trees, 1187
RCL instruction, 185-186
RCR instruction, 185-186
Read360x480Dot subroutine, 614-615
R e a d 0 C library function
vs. getco function, 12
overhead, 121
Read Index register, 651-652
Read Map register
demo program, 526-530
planes, specifying to be read, 542
read mode, 0 , 526
Read Map register demo program, 526-530
Read mode, 0 , 521
Read mode 1
Color Dont Care register, 534
overview, 525-526
vs. read mode 0, 521
selecting, 525
Read/write/modify operations, 107
Read-after-write register contention, 404
ReadPixel subroutine, 598, 599
ReadPixelX subroutine, 884-885
Real mode. See 386 processor.
Real mode
addressing calculation pipeline, 239
32-bit addressing modes, 256-258
Rectangle fill, Mode X
four-plane parallel processing, 888891, 891-893
pixel-by-pixel plane selection,
885-887
plane-by-plane processing, 887-889
Recursion
BSP trees
building BSP trees, 1101-1104
data recursive inorder traversal,
1107-1113
visibility ordering, 1104-1 106
code recursion
vs. data recursion, 1108-1110
Euclids algorithm, 198-199
compiler-based optimization,
1112-1113
data recursion
vs. code recursion, 1108-1 110
compiler-based optimization,
1112-1113
Euclids algorithm, 200
inorder tree traversal, 1108-1110
performance, 1111-1113
performance, 1111-1113
Reference materials
3-D drawing, 934-935
3-D math, 1135
bitmapped text, drawing, 471
Bresenhams line-drawing
algorithm, 660
BSP trees, 1114, 1157
circle drawing, 626
color perception, 625
8253 timer chip, 72
486 processor, 236
parametric line clipping, 1121
Pentium processor, 374, 1148
SVGA programming, 626
VGA registers, 583
ReferenceZTimerOff subroutine, 41
ReferenceZTimerOn subroutine, 40
Reflections, in GLQuake, 1290
Reflective color, vs. emissive color, 1035
Register contention, Pentium processor,
403-405
Register-only instructions, 223-225
Registers
See also 32-bit registers; VGA registers.
AX register, 171
copying bytes between, 172
EGA palette registers, 549-550
8-bit bus cycle-eater, 85
486 processor
addressing pipeline penalty, 238240, 250
byte registers and lost
cycles, 242-245
indexed addressing, 237-238
pushing or popping, vs. memory
locations, 254-255
scaled, 2 56-258
stack addressing, 241-242
32-bit addressing modes, 256-258
prefetch queue cycle-eater, 94
and split screen operations, 573
and stack frames, 153
VGA architecture, 427-429
Relocating bitmaps, 516-517
Rendering BSP trees
backface removal, 1160-1161
clipping, 1158-1159
Clipwalls function, 1152-1155,
1158-1159
DrawWallsBackToFront function,
1155-1156,1160-1161
overview, 1149
reference materials, 1157
TransformVertices function, 11511152,1158
UpdateViewPos function, 1151,1157
Updateworld function, 11561157,1157
viewspace, transformationof objects
to, 1158
wall orientation testing, 1160-1161
WallFacingViewer function, 11501151,1161
RenderMan Companion (book), 742
REP MOVS instruction, 148
REP MOVSW instruction, 82, 105, 220
REP SCASW instruction, 166
REP STOS instruction, 727, 735
REPNZ SCASB instruction
vs. Boyer-Moore algorithm, 267-268,
271, 274
in string searching problem, 121-122,
174-175, 262-263
REPZ CMPS instruction
vs. Boyer-Moore algorithm, 267-268,
271, 274
in string searching problem, 121-122,
174-175, 262-263
Restartable blocks
in 16-bit checksum program, 16
optimizing file processing, 118
performance, 122
in search engine, 117-118
size of, 114, 121
Results, precalculating
See also lookup tables.
BSP trees and potentially visible set
(PVS), 1188-1189
RET instruction, 241-242
Reusable code, and future of
programming profession, 725-726
RGB (red, green, blue) color model
mapping to 256-color mode, 1036,
1037-1038, 1039
overview, 1034-1035
Richardson,John, 316
Right-handed coordinate system, 935-937
ROL instruction, 185-186
Roll angle, in polygon clipping, 1206
ROR instruction, 185-186
Rotate instructions
hand assembling, 255-256
n-bit vs. 1-bit, 255-256
286 processor, 222
RotateAndMovePObject
function, 977-978
Rotation, 3-D animation
Concatxforms function, 944
matrix representation, 938-939
multiple axes of rotation, 948
using dot product, 1143-1144
XformVec function, 943
Rotational variance, 1249
Rotations, bitwise
vs. lookup tables, 145-146
multi-bit vs. single-bit, 185-186
Rounding vs. truncation
in 3-D animation, 1002-1003
floating point optimization, 1174-1175
texture mapping, 1066-1067
Run-length slice algorithm
assembly implementation, 698-704
C-language implementations, 688-692,
692-693
description, 683-684
implementation details, 685-687
integer-based implementation, 685-687
potential optimizations, 705
Ruts, mental, staying out of, 1147-1148
Index
S
SAHF instruction, 148
Sam the Golden Retriever, 841-842
SC (Sequence Controller), VGA
addressing, 427-428
Map Mask register
CPU writes, selecting planes, 443444, 471-472
drawing text, 833
optimizing Mode X, 1074
vs. Read Map register, 526
with set/reset circuitry, 474
write mode 1, 444
Scaled registers, 256-258
Scan conversion, polygons
active edges, 721, 742-744,
753, 755, 756
C-language implementation, 713-717,
720-721
defined, 710
zero-length segments, 721
Scan lines
redefining length of, 442
in split screens, 564-565, 573
360x480 256-color mode, 619
vertical, in texture mapping, 1084-1086
ScanBuffer assembly routine
authors implementation, 301-302,
303-304
hand-optimized
implementation(Wil1em Clements),
313-315
730-732
ScanOutLine function
assembly implementation,
1069-1073,1074
C-language implementation, 1058-
1059, 1067-1069
SCASW instruction, 161
Screen blanking
demo program, 556-557
using DAC Mask register, 651
Screen blanking demo program,
556-557
178, 269
optimization, 174-180
restartable blocks, 117-118
search space and optimization,
122, 175
search techniques, 115-116, 175
SearchForString function, 118
Searching
See also Search engine.
Boyer-Moore algorithm, 263-277
in linked list of arrays, 156-166
for specified byte in buffer, 141-145
using REP SCASW, 166
SecondPass function, 358-360
Sedgewick, Robert (Algorithms), 192, 196
Segments
compiler handling of, 154
and far jumps, 186
protected mode, 208-209
386 processor, 222
SelectBSPTree function, 1124-1125
476-478
planes, setting all to single
color, 473-474
and write mode 2, 501-502, 509, 515
Set/Reset register, 666
SetBIOSSxSFont subroutine, 830
Set-cell method, 327, 334, 342
SETGC macro, 454, 475
Setpalette function, 783-784
SetPelPan subroutine, 580
SETSC macro, 474
SetSplitScreenScanLine subroutine,
570-571, 581
SetStartAddress subroutine, 570, 580
SetUpEdge function, 1057-1058
Setworldspace function, 1204
Shading
See also Lighting; 3-D drawing.
ambient shading, 1023
diffuse shading, 1023-1024
directed light sources, 1028
effects, 360x480 256-color mode, 618
overall shading, calculating, 1025
of polygons, 1025-1026, 1027-1029
Shearing
cause of, 813
in dirty-rectangle animation, 846
page flipping, 814
sheep, 1063
Shift instructions, 222, 255-256
Shifting bits, vs. lookup tables, 145-146
SHL instruction, 376
ShowBounceCount function, 823-824
Showpage subroutine
masked copying animation, Mode X,
929-930
852-853
8-9, 15-16
overview, 8
redesigning, 9
16-color VGA modes
color paging, 628-629
DAC (DigitaVAnalog
Converter), 626-628
palette R A M , 626
Small code model, linking Zen timer, 70
Software patents, 1194
Sorted span hidden surface removal
abutting span sorting, 1229-1230
AddPolygonEdges function, 12321233, 1238
BSP order vs. l / z order, 1220, 1226
ClearEdgeLists function, 1236-1237
DrawSpans function, 1236
edge sorting, 1220-1222
edges vs. spans, 1215-1220
independent span sorting, 1230, 12311238, 1239-1241
intersecting span sorting, 1228-1229
l / z sorting, 1220-1222, 1227-1231,
1231-1238,1239-1241
overview, 1214-1215
PolyFacesViewer function, 1232
ScanEdges function, 1234-1236,
1238-1239
Updateworld function, 1237-1238
Sorting techniques
25-byte sorting routine, 180-181
BSP trees, 1099
moving models in 3-D
drawings, 1212-1222
911
T
Table-driven state machines, 316-319
Tail nodes, in linked lists, 286
TASM (Turbo Assembler), 71-72
TCPAP checksum program
basic implementation, 406
dword implementation, 409
Index
overview, 1197-1200
PolyFacesViewer function, 1203
ProjectPolygon function, 1201
SetUpFrustum function, 1204
SetWorldspace function, 1204
TransformPoint function, 1203
TransformPolygon function, 1203
updateviewpos function, 1202
Updateworld function, 1205
viewspace clipping, 1207
ZSortObjects function, 1201
3-D drawing
See also BSP (Binary Space
Partitioning) trees; Hidden surface
removal; Polygons, filling; Shading;
3-D animation.
backface removal
BSP tree rendering, 1160-1161
calculations, 955-957
motivation for, 954-955
and sign of dot product, 1140
solid cube rotation demo program,
957-961, 962-963, 964-966, 967
background surfaces, 1240
draw-buffers, and beam trees, 1187
and dynamic objects, 1100-1101
Gouraud shading, 1246-1250
lighting
Gouraud shading, 1246-1250
overlapping lights, 1247
perspective correctness, 1248-1250
rotational variance, 1249
surface-based lighting, 1250-1256,
1260-1262
viewing variance, 1249
moving models in 3-D
drawings, 1212-1222
painters algorithm, 1099, 1104-1105
perspective correctness problem,
1248-1250
portals, and beam trees, 1188
projection
dot products, 1141-1142
overview, 937, 948
raycast, subdividing, and
beam trees, 1187
reference materials, 934-935
1267-1270
vertex-free surfaces, and
beam trees, 1187
visibility determination, 1099-1 106
visible surface determination (VSD)
beam trees, 1185-1189
culling to frustum, 1181-1184
overdraw problem, 1184-1185
polygon culling, 1181-1184
potentially visible set (PVS),
precalculating, 1188-1189
3-D engine, Quake
BSP trees, 1276-1277
lighting, 1282-1283
model overview, 1276-1277
portals, 1279-1280
potentially visible set (PVS), 1278-1279
rasterization, 1282
world, drawing, 1280-1281
3-D math
cross products, 1139-1140
dot products
calculating, 1135-1137
calculating light intensity, 1137
projection, 1141-1142
rotation, 1143-1144
sign of, 1140-1141
o f unit vectors, 1136
of vectors, 1135-1136
matrix math
assembly routines, 992, 996-999
C-language implementations,
974-976
normal vectors, calculating, 955-956
rotation of 3-D objects, 938-939,
943-944, 948
962-963
145-146
LOOP instruction vs. DEC/JNZ
sequence, 139
memory access, performance, 223-225
MUL and I"Linstructions, 173-174
multiplication operations, increasing
speed of, 173-174
new instructions and features, 222
Pentium code, running on, 411
protected mode, 208-209
rotation instructions, clock
cycles, 185-186
system wait states, 210-212
32-bit addressing modes, 256-258
32-bit multiply and divide
operations, 985
using 32-bit register as two 16-bit
registers, 253-254
XCHG vs. MOV instructions, 377, 832
386SX processor, 16-bit bus
cycle-eater, 81
360x480 256-color mode
display memory, accessing, 621-622
Draw360x480Dot
subroutine, 613-614
drawing speed, 618
horizontal resolution, 620
line drawing demo program,
615-618,618-619
mode set routine (John Bridges), 609,
612, 620-621
on VGA clones, 610-611
Read360x480Dot
subroutine, 614-615
256-color resolution, 619-620
vertical resolution, 619
320x400 256-color mode
advantages of, 590-591
display memory organization, 591-593
line drawing, 600
page flipping demo program, 600-605
performance, 599-600
pixel drawing demo program, 593598, 599-600
320x240 256-color mode. See Mode X.
Time perception, subjectivity of, 972
Time-critical code, 13
Index
1173-1174
U
Unifying models, and optimization,
1110-1111
Unit normal of polygons, calculating,
1027-1028, 1137-1140
Unit vectors, dot product, 1136-1137
Unrolling loops, 143-145, 305, 312, 377378, 410
Updateviewpos function, 1202
V
Variables, word-sized vs. byte-sized,
82, 83-85
Vectors
cross product, 1139-1140
cross-products, calculating, 955-956
dot product, 1135-1137
length equation, 1135
optimization of, 986
unit vectors, dot product, 1136-1137
Vectorsup function
Bresenhams line-drawing algorithm,
664-665
360x480 256-color mode line drawing
program, 617-618
Verite Quake, 1287-1280
Vertex-free surfaces, and beam trees, 1187
Vertical blanking, loading DAC, 641
Vertical resolution, 360x480 256-color
mode, 619
Vertical scanlines, in texture mapping,
1084-1086
Vertical sync pulse
loading DAC, 641, 648
and page flipping, 444-446, 835-836
split screens, 573
VGA BIOS
DAC (Digital/Analog Converter)
loading, 641-642, 648
setting registers, 630, 631-632
vs. direct hardware
programming, 458-459
function 13H, 459
and nonstandard modes, 854-855
palette RAM, setting registers, 629-630,
631-632
reading from DAC, 652
text routines, in 320x400 256-color
mode, 592
Index
W
Wait30Frames function, 854
Wait states
display memory wait states
8088 processor, 101-103
286 processor, 220
vs. DRAM refresh, 100
overview, 99
system memory wait states, 210-213
WaitForVerticaLSyncEnd subroutine,
569, 579-580
WaitForVerticaLSyncStart subroutine,
569, 579
WalkBSPTree function, 1106
WalkTree function
code recursive version, 1108
data recursive version, 1109-1110
Wall orientation testing, BSP tree
rendering, 1160-1161
WC word counting program (Terje
Mathisen), optimization, 250-252,
306, 319
Williams, Rob, 174
Winnie the Pooh orbiting Saturn, 1047
WinQuake, 1290
Word alignment, 286 processor
code alignment, 215-218
data alignment, 213-215
stack pointer alignment, 218-219
Word count program
313-315
509-515
mechanics, 502
overview, 501-502
selecting, 504
vs. set/reset circuitry, 509, 515
vs. write mode 0, 503
Write mode 3
vs. Bit Mask register, 844
charactedattribute map, 517
drawing bitmapped text, 484-489,
489-490, 490-496
drawing solid text, 1039-1041,
1042-1044
X
X86 family CPUs
See also 8088 processor.
32-bit division, 181-184, 1008
branching, performance, 140
copying bytes between registers, 172
interrupts, 9
limitations for assembly
programmers, 27
lookup tables, vs. rotating or shifting,
145-146
LOOP instruction vs. DEC/JNZ
sequence, 139
machine instructions, versatility, 128
memory addressing modes, 129-133
overview, 208
transformation inefficiencies, 26
XCHG instruction, 377, 832
X-clipping, in BSP tree rendering, 1159
XformAndProjectPObject function, 974
SormAndProjectPoints function, 960
XformAndproectPoly function, 944-945
XformVec function
assembly implementation, 996-997,
1017-1019
964-965,975
1019-1022
1013-1015
1067
1009-1010
Previous
996-999
ModelColoffoColorIndexfunction,
1036, 1038
older processors, support for, 10071008, 1008-1023
overview, 984-985
POLYG0N.H header file, 982-984
RGB color model
mapping to 256-color mode, 1036,
1037-1038, 1039
overview, 1034-1035
RotateAndMovePObject function,
'
977-978
X
n
o
r
f
A
rn
d
P
a
o
r
e
jc
t hnction, 974
XformVec function
assembly implementation, 996-997,
1017-1019
C implementation, 976
XSortAET function
complex polygons, 748
monotone-vertical polygons, 769
Y
Yaw angle, in polygon clipping, 1206
Y-clipping, in BSP tree rendering, 1159
Z
Z-buffers
performance, 1213
Quake 3-D engine, 1285-1286
vs. sorted spans, 1215
sorting moving models, 1212-1213
Z-clipping, in BSP tree rendering, 1158
Zen timer
See also Long-period Zen timer.
calling, 48
Index
Home