
PC Assembly Language

Paul A. Carter
July 23, 2006

Copyright © 2001, 2002, 2003, 2004 by Paul Carter

This may be reproduced and distributed in its entirety (including this authorship, copyright and permission notice), provided that no charge is made for the document itself, without the author's consent. This includes "fair use" excerpts like reviews and advertising, and derivative works like translations.

Note that this restriction is not intended to prohibit charging for the service of printing or copying the document.

Instructors are encouraged to use this document as a class resource; however, the author would appreciate being notified in this case.

Contents

Preface

1 Introduction
  1.1 Number Systems
    1.1.1 Decimal
    1.1.2 Binary
    1.1.3 Hexadecimal
  1.2 Computer Organization
    1.2.1 Memory
    1.2.2 The CPU
    1.2.3 The 80x86 family of CPUs
    1.2.4 8086 16-bit Registers
    1.2.5 80386 32-bit registers
    1.2.6 Real Mode
    1.2.7 16-bit Protected Mode
    1.2.8 32-bit Protected Mode
    1.2.9 Interrupts
  1.3 Assembly Language
    1.3.1 Machine language
    1.3.2 Assembly language
    1.3.3 Instruction operands
    1.3.4 Basic instructions
    1.3.5 Directives
    1.3.6 Input and Output
    1.3.7 Debugging
  1.4 Creating a Program
    1.4.1 First program
    1.4.2 Compiler dependencies
    1.4.3 Assembling the code
    1.4.4 Compiling the C code
    1.4.5 Linking the object files
    1.4.6 Understanding an assembly listing file
  1.5 Skeleton File

2 Basic Assembly Language
  2.1 Working with Integers
    2.1.1 Integer representation
    2.1.2 Sign extension
    2.1.3 Two's complement arithmetic
    2.1.4 Example program
    2.1.5 Extended precision arithmetic
  2.2 Control Structures
    2.2.1 Comparisons
    2.2.2 Branch instructions
    2.2.3 The loop instructions
  2.3 Translating Standard Control Structures
    2.3.1 If statements
    2.3.2 While loops
    2.3.3 Do while loops
  2.4 Example: Finding Prime Numbers

3 Bit Operations
  3.1 Shift Operations
    3.1.1 Logical shifts
    3.1.2 Use of shifts
    3.1.3 Arithmetic shifts
    3.1.4 Rotate shifts
    3.1.5 Simple application
  3.2 Boolean Bitwise Operations
    3.2.1 The AND operation
    3.2.2 The OR operation
    3.2.3 The XOR operation
    3.2.4 The NOT operation
    3.2.5 The TEST instruction
    3.2.6 Uses of bit operations
  3.3 Avoiding Conditional Branches
  3.4 Manipulating bits in C
    3.4.1 The bitwise operators of C
    3.4.2 Using bitwise operators in C
  3.5 Big and Little Endian Representations
    3.5.1 When to Care About Little and Big Endian
  3.6 Counting Bits
    3.6.1 Method one
    3.6.2 Method two
    3.6.3 Method three

4 Subprograms
  4.1 Indirect Addressing
  4.2 Simple Subprogram Example
  4.3 The Stack
  4.4 The CALL and RET Instructions
  4.5 Calling Conventions
    4.5.1 Passing parameters on the stack
    4.5.2 Local variables on the stack
  4.6 Multi-Module Programs
  4.7 Interfacing Assembly with C
    4.7.1 Saving registers
    4.7.2 Labels of functions
    4.7.3 Passing parameters
    4.7.4 Calculating addresses of local variables
    4.7.5 Returning values
    4.7.6 Other calling conventions
    4.7.7 Examples
    4.7.8 Calling C functions from assembly
  4.8 Reentrant and Recursive Subprograms
    4.8.1 Recursive subprograms
    4.8.2 Review of C variable storage types

5 Arrays
  5.1 Introduction
    5.1.1 Defining arrays
    5.1.2 Accessing elements of arrays
    5.1.3 More advanced indirect addressing
    5.1.4 Example
    5.1.5 Multidimensional Arrays
  5.2 Array/String Instructions
    5.2.1 Reading and writing memory
    5.2.2 The REP instruction prefix
    5.2.3 Comparison string instructions
    5.2.4 The REPx instruction prefixes
    5.2.5 Example

6 Floating Point
  6.1 Floating Point Representation
    6.1.1 Non-integral binary numbers
    6.1.2 IEEE floating point representation
  6.2 Floating Point Arithmetic
    6.2.1 Addition
    6.2.2 Subtraction
    6.2.3 Multiplication and division
    6.2.4 Ramifications for programming
  6.3 The Numeric Coprocessor
    6.3.1 Hardware
    6.3.2 Instructions
    6.3.3 Examples
    6.3.4 Quadratic formula
    6.3.5 Reading array from file
    6.3.6 Finding primes

7 Structures and C++
  7.1 Structures
    7.1.1 Introduction
    7.1.2 Memory alignment
    7.1.3 Bit Fields
    7.1.4 Using structures in assembly
  7.2 Assembly and C++
    7.2.1 Overloading and Name Mangling
    7.2.2 References
    7.2.3 Inline functions
    7.2.4 Classes
    7.2.5 Inheritance and Polymorphism
    7.2.6 Other C++ features

A 80x86 Instructions
  A.1 Non-floating Point Instructions
  A.2 Floating Point Instructions

Index

Preface

Purpose

The purpose of this book is to give the reader a better understanding of how computers really work at a lower level than in programming languages like Pascal. By gaining a deeper understanding of how computers work, the reader can often be much more productive developing software in higher level languages such as C and C++. Learning to program in assembly language is an excellent way to achieve this goal. Other PC assembly language books still teach how to program the 8086 processor that the original PC used in 1981! The 8086 processor only supported real mode. In this mode, any program may address any memory or device in the computer. This mode is not suitable for a secure, multitasking operating system. This book instead discusses how to program the 80386 and later processors in protected mode (the mode that Windows and Linux run in). This mode supports the features that modern operating systems expect, such as virtual memory and memory protection. There are several reasons to use protected mode:

1. It is easier to program in protected mode than in the 8086 real mode that other books use.

2. All modern PC operating systems run in protected mode.

3. There is free software available that runs in this mode.

The lack of textbooks for protected mode PC assembly programming is the main reason that the author wrote this book.


As alluded to above, this text makes use of Free/Open Source software: namely, the NASM assembler and the DJGPP C/C++ compiler. Both of these are available to download from the Internet. The text also discusses how to use NASM assembly code under the Linux operating system and with Borland's and Microsoft's C/C++ compilers under Windows. Examples for all of these platforms can be found on my web site: http://www.drpaulcarter.com/pcasm. You must download the example code if you wish to assemble and run many of the examples in this tutorial.


Be aware that this text does not attempt to cover every aspect of assembly programming. The author has tried to cover the most important topics that all programmers should be acquainted with.

Acknowledgements

The author would like to thank the many programmers around the world that have contributed to the Free/Open Source movement. All the programs and even this book itself were produced using free software. Specifically, the author would like to thank John S. Fine, Simon Tatham, Julian Hall and others for developing the NASM assembler that all the examples in this book are based on; DJ Delorie for developing the DJGPP C/C++ compiler used; the numerous people who have contributed to the GNU gcc compiler on which DJGPP is based; Donald Knuth and others for developing the TeX and LaTeX 2ε typesetting languages that were used to produce the book; Richard Stallman (founder of the Free Software Foundation), Linus Torvalds (creator of the Linux kernel) and others who produced the underlying software the author used to produce this work.

Thanks to the following people for corrections:

John S. Fine
Marcelo Henrique Pinto de Almeida
Sam Hopkins
Nick D'Imperio
Jeremiah Lawrence
Ed Beroset
Jerry Gembarowski
Ziqiang Peng
Eno Compton
Josh I Cates
Mik Mifflin
Luke Wallis
Gaku Ueda
Brian Heward
Chad Gorshing
F. Gotti
Bob Wilkinson
Markus Koegel
Louis Taber
Dave Kiddell
Eduardo Horowitz
Sébastien Le Ray
Nehal Mistry
Jianyue Wang
Jeremias Kleer
Marc Janicki
Resources on the Internet

Author's page            http://www.drpaulcarter.com/
NASM SourceForge page    http://sourceforge.net/projects/nasm/
DJGPP                    http://www.delorie.com/djgpp
Linux Assembly           http://www.linuxassembly.org/
The Art of Assembly      http://webster.cs.ucr.edu/
USENET                   comp.lang.asm.x86
Intel documentation      http://developer.intel.com/design/Pentium4/documentation.htm

Feedback

The author welcomes any feedback on this work.

E-mail:  pacman128@gmail.com
WWW:     http://www.drpaulcarter.com/pcasm

Chapter 1

Introduction

1.1 Number Systems

Memory in a computer consists of numbers. Computer memory does not store these numbers in decimal (base 10). Because it greatly simplifies the hardware, computers store all information in a binary (base 2) format. First let's review the decimal system.

1.1.1 Decimal

Base 10 numbers are composed of 10 possible digits (0-9). Each digit of a number has a power of 10 associated with it based on its position in the number. For example:

$234 = 2 \times 10^2 + 3 \times 10^1 + 4 \times 10^0$

1.1.2 Binary

Base 2 numbers are composed of 2 possible digits (0 and 1). Each digit of a number has a power of 2 associated with it based on its position in the number. (A single binary digit is called a bit.) For example:

$11001_2 = 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 16 + 8 + 1 = 25$

This shows how binary may be converted to decimal. Table 1.1 shows how the first few numbers are represented in binary.

  Decimal  Binary    Decimal  Binary
     0      0000        8      1000
     1      0001        9      1001
     2      0010       10      1010
     3      0011       11      1011
     4      0100       12      1100
     5      0101       13      1101
     6      0110       14      1110
     7      0111       15      1111

Table 1.1: Decimal 0 to 15 in Binary

Figure 1.1 shows how individual binary digits (i.e., bits) are added.

     No previous carry          Previous carry
    0    0    1    1          0    0    1    1
   +0   +1   +0   +1         +0   +1   +0   +1
    0    1    1    0          1    0    0    1
                   c               c    c    c

Figure 1.1: Binary addition (c stands for carry)

Here's an example:

    11011_2
  + 10001_2
  ---------
   101100_2
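The positional rule for binary numbers can be checked mechanically. The following is a minimal C sketch, not code from this book; the function name binary_to_decimal is hypothetical. It reads a string of binary digits from left to right, doubling the running total before adding each new digit, which has the same effect as attaching the correct power of 2 to every position.

```c
#include <stddef.h>

/* Convert a string of '0'/'1' characters to its numeric value.
   Each step doubles the running total (moving every earlier digit
   one power of 2 higher) and then adds the new digit.
   Conversion stops at the first character that is not '0' or '1'. */
unsigned long binary_to_decimal(const char *bits)
{
    unsigned long value = 0;
    for (size_t i = 0; bits[i] == '0' || bits[i] == '1'; i++) {
        value = value * 2 + (unsigned long)(bits[i] - '0');
    }
    return value;
}
```

With this sketch, binary_to_decimal("11001") yields 25, and the sum of the values of 11011 and 10001 equals the value of 101100, in agreement with the addition example.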
If one considers the following decimal division:

$1234 \div 10 = 123 \; r \; 4$

he can see that this division strips off the rightmost decimal digit of the number and shifts the other decimal digits one position to the right. Dividing by two performs a similar operation, but for the binary digits of the number. Consider the following binary division [1]:

$1101_2 \div 10_2 = 110_2 \; r \; 1$

This fact can be used to convert a decimal number to its equivalent binary representation as Figure 1.2 shows. This method finds the rightmost digit first; this digit is called the least significant bit (lsb). The leftmost digit is called the most significant bit (msb). The basic unit of memory consists of 8 bits and is called a byte.

[1] The 2 subscript is used to show that the number is represented in binary, not decimal.

        Decimal                  Binary
   25 ÷ 2 = 12 r 1      11001 ÷ 10 = 1100 r 1
   12 ÷ 2 =  6 r 0       1100 ÷ 10 =  110 r 0
    6 ÷ 2 =  3 r 0        110 ÷ 10 =   11 r 0
    3 ÷ 2 =  1 r 1         11 ÷ 10 =    1 r 1
    1 ÷ 2 =  0 r 1          1 ÷ 10 =    0 r 1

   Thus $25_{10} = 11001_2$

Figure 1.2: Decimal conversion

   589 ÷ 16 = 36 r 13
    36 ÷ 16 =  2 r 4
     2 ÷ 16 =  0 r 2

   Thus $589 = 24D_{16}$

Figure 1.3:
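The repeated division used in Figures 1.2 and 1.3 can be sketched as a short C routine. This is only an illustration; the name to_base and its buffer convention are assumptions, not part of the book's example code. Each division yields the next digit as its remainder, least significant digit first, so the digits are collected and then reversed.

```c
#include <stddef.h>

/* Write the digits of `value` in the given base (2..16) into `out`.
   Repeated division produces the least significant digit first, so
   the digits are collected in `tmp` and then copied in reverse.
   `out` must have room for at least 65 characters.
   Returns the number of digits written (0 if the base is invalid). */
size_t to_base(unsigned long value, unsigned base, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    char tmp[64];
    size_t n = 0;

    if (base < 2 || base > 16) {
        out[0] = '\0';
        return 0;
    }
    do {
        tmp[n++] = digits[value % base];  /* remainder is the next digit */
        value /= base;                    /* quotient feeds the next step */
    } while (value != 0);

    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

Calling to_base(25, 2, buf) fills buf with "11001", and to_base(589, 16, buf) fills it with "24D", matching the two figures.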

1.1.3 Hexadecimal

Hexadecimal numbers use base 16. Hexadecimal (or hex for short) can be used as a shorthand for binary numbers. Hex has 16 possible digits. This creates a problem since there are no symbols to use for these extra digits after 9. By convention, letters are used for these extra digits. The 16 hex digits are 0-9 then A, B, C, D, E and F. The digit A is equivalent to 10 in decimal, B is 11, etc. Each digit of a hex number has a power of 16 associated with it. Example:

$2BD_{16} = 2 \times 16^2 + 11 \times 16^1 + 13 \times 16^0 = 512 + 176 + 13 = 701$

To convert from decimal to hex, use the same idea that was used for binary conversion except divide by 16. See Figure 1.3 for an example.

The reason that hex is useful is that there is a very simple way to convert between hex and binary. Binary numbers get large and cumbersome quickly. Hex provides a much more compact way to represent binary.

To convert a hex number to binary, simply convert each hex digit to a 4-bit binary number. For example, $24D_{16}$ is converted to $0010\ 0100\ 1101_2$. Note that the leading zeros of the 4-bit groups are important! If the leading zero for the middle digit of $24D_{16}$ is not used the result is wrong. Converting from binary to hex is just as easy. One does the process in reverse. Convert each 4-bit segment of the binary to hex. Start from the right end, not the left end of the binary number. This ensures that the process uses the correct 4-bit segments [2]. Example:

   110  0000  0101  1010  0111  1110_2
     6     0     5     A     7     E_16

A 4-bit number is called a nibble. Thus each hex digit corresponds to a nibble. Two nibbles make a byte and so a byte can be represented by a 2-digit hex number. A byte's value ranges from 0 to 11111111 in binary, 0 to FF in hex and 0 to 255 in decimal.
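The digit-by-digit conversion from hex to binary can be sketched in C as follows. This is a minimal illustration under assumed names (hex_to_binary is hypothetical); it expands every hex digit into its 4-bit group and keeps the leading zeros of each group.

```c
#include <stddef.h>

/* Expand each hex digit of `hex` into its 4-bit group, with the
   groups separated by single spaces. `out` needs 5 bytes per hex
   digit (and at least 1 byte). Returns 0 on success, or -1 if a
   character is not a hex digit. */
int hex_to_binary(const char *hex, char *out)
{
    size_t pos = 0;
    for (size_t i = 0; hex[i] != '\0'; i++) {
        char c = hex[i];
        int v;
        if (c >= '0' && c <= '9')      v = c - '0';
        else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
        else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
        else { out[0] = '\0'; return -1; }

        if (i > 0)
            out[pos++] = ' ';
        for (int bit = 3; bit >= 0; bit--)   /* high bit of the nibble first */
            out[pos++] = ((v >> bit) & 1) ? '1' : '0';
    }
    out[pos] = '\0';
    return 0;
}
```

For the input "24D" the sketch produces "0010 0100 1101". Because every group is always four characters wide, the leading zeros that the text warns about cannot be dropped by accident.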

1.2 Computer Organization

1.2.1 Memory

The basic unit of memory is a byte. A computer with 32 megabytes of memory can hold roughly 32 million bytes of information. Each byte in memory is labeled by a unique number known as its address as Figure 1.4 shows.

Memory is measured in units of kilobytes ($2^{10}$ = 1,024 bytes), megabytes ($2^{20}$ = 1,048,576 bytes) and gigabytes ($2^{30}$ = 1,073,741,824 bytes).

   Address    0    1    2    3    4    5    6    7
   Memory    2A   45   B8   20   8F   CD   12   2E

Figure 1.4: Memory Addresses

Often memory is used in larger chunks than single bytes. On the PC architecture, names have been given to these larger sections of memory as Table 1.2 shows.

   word           2 bytes
   double word    4 bytes
   quad word      8 bytes
   paragraph     16 bytes

Table 1.2: Units of Memory

All data in memory is numeric. Characters are stored by using a character code that maps numbers to characters. One of the most common character codes is known as ASCII (American Standard Code for Information Interchange). A new, more complete code that is supplanting ASCII is Unicode. One key difference between the two codes is that ASCII uses one byte to encode a character, but Unicode uses two bytes (or a word) per character. For example, ASCII maps the byte $41_{16}$ ($65_{10}$) to the character capital A; Unicode maps the word $0041_{16}$. Since ASCII uses a byte, it is limited to only 256 different characters [3]. Unicode extends the ASCII values to words and allows many more characters to be represented. This is important for representing characters for all the languages of the world.

[2] If it is not clear why the starting point makes a difference, try converting the example starting at the left.

1.2.2 The CPU

The Central Processing Unit (CPU) is the physical device that performs instructions. The instructions that CPUs perform are generally very simple. Instructions may require the data they act on to be in special storage locations in the CPU itself called registers. The CPU can access data in registers much faster than data in memory. However, the number of registers in a CPU is limited, so the programmer must take care to keep only currently used data in registers.

The instructions a type of CPU executes make up the CPU's machine language. Machine programs have a much more basic structure than higher-level languages. Machine language instructions are encoded as raw numbers, not in friendly text formats. A CPU must be able to decode an instruction's purpose very quickly to run efficiently. Machine language is designed with this goal in mind, not to be easily deciphered by humans. Programs written in other languages must be converted to the native machine language of the CPU to run on the computer. A compiler is a program that translates programs written in a programming language into the machine language of a particular computer architecture. In general, every type of CPU has its own unique machine language. This is one reason why programs written for a Mac can not run on an IBM-type PC.

Computers use a clock to synchronize the execution of the instructions. The clock pulses at a fixed frequency (known as the clock speed). When you buy a 1.5 GHz computer, 1.5 GHz is the frequency of this clock [4]. The clock does not keep track of minutes and seconds. It simply beats at a constant rate. The electronics of the CPU uses the beats to perform their operations correctly, like how the beats of a metronome help one play music at the correct rhythm. The number of beats (or as they are usually called cycles) an instruction requires depends on the CPU generation and model. The number of cycles depends on the instructions before it and other factors as well.

GHz stands for gigahertz or one billion cycles per second. A 1.5 GHz CPU has 1.5 billion clock pulses per second.

[3] In fact, ASCII only uses the lower 7-bits and so only has 128 different values to use.

[4] Actually, clock pulses are used in many different components of a computer. The other components often use different clock speeds than the CPU.

1.2.3 The 80x86 family of CPUs

IBM-type PCs contain a CPU from Intel's 80x86 family (or a clone of one). The CPUs in this family all have some common features including a base machine language. However, the more recent members greatly enhance the features.

8088, 8086: These CPUs from the programming standpoint are identical. They were the CPUs used in the earliest PCs. They provide several 16-bit registers: AX, BX, CX, DX, SI, DI, BP, SP, CS, DS, SS, ES, IP, FLAGS. They only support up to one megabyte of memory and only operate in real mode. In this mode, a program may access any memory address, even the memory of other programs! This makes debugging and security very difficult! Also, program memory has to be divided into segments. Each segment can not be larger than 64K.

80286: This CPU was used in AT class PCs. It adds some new instructions to the base machine language of the 8088/86. However, its main new feature is 16-bit protected mode. In this mode, it can access up to 16 megabytes and protect programs from accessing each other's memory. However, programs are still divided into segments that could not be bigger than 64K.

80386: This CPU greatly enhanced the 80286. First, it extends many of the registers to hold 32-bits (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, EIP) and adds two new 16-bit registers FS and GS. It also adds a new 32-bit protected mode. In this mode, it can access up to 4 gigabytes. Programs are again divided into segments, but now each segment can also be up to 4 gigabytes in size!

80486/Pentium/Pentium Pro: These members of the 80x86 family add very few new features. They mainly speed up the execution of the instructions.

Pentium MMX: This processor adds the MMX (MultiMedia eXtensions) instructions to the Pentium. These instructions can speed up common graphics operations.

Pentium II: This is the Pentium Pro processor with the MMX instructions added. (The Pentium III is essentially just a faster Pentium II.)

         AX
   +----+----+
   | AH | AL |
   +----+----+

Figure 1.5: The AX register

1.2.4 8086 16-bit Registers

The original 8086 CPU provided four 16-bit general purpose registers: AX, BX, CX and DX. Each of these registers could be decomposed into two 8-bit registers. For example, the AX register could be decomposed into the AH and AL registers as Figure 1.5 shows. The AH register contains the upper (or high) 8 bits of AX and AL contains the lower 8 bits of AX. Often AH and AL are used as independent one byte registers; however, it is important to realize that they are not independent of AX. Changing AX's value will change AH and AL and vice versa. The general purpose registers are used in many of the data movement and arithmetic instructions.
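The relationship between AX, AH and AL can be modeled in C with shifts and masks. The sketch below is only a model of the idea, using an ordinary 16-bit integer in place of the register; the function names are hypothetical and no real register is touched.

```c
#include <stdint.h>

/* A 16-bit value stands in for AX; AH and AL are its two halves. */
uint8_t high_byte(uint16_t ax) { return (uint8_t)(ax >> 8); }    /* AH */
uint8_t low_byte(uint16_t ax)  { return (uint8_t)(ax & 0xFF); }  /* AL */

/* Writing AL replaces only the low 8 bits of AX. */
uint16_t set_low_byte(uint16_t ax, uint8_t al)
{
    return (uint16_t)((ax & 0xFF00u) | al);
}

/* Writing AH replaces only the high 8 bits of AX. */
uint16_t set_high_byte(uint16_t ax, uint8_t ah)
{
    return (uint16_t)((ax & 0x00FFu) | ((uint16_t)ah << 8));
}
```

If the modeled AX holds 12AB in hex, then AH is 12 and AL is AB. Setting AL to FF changes the modeled AX to 12FF, which illustrates why AH and AL are not independent of AX.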

There are two 16-bit index registers: SI and DI. They are often used as pointers, but can be used for many of the same purposes as the general registers. However, they can not be decomposed into 8-bit registers.

The 16-bit BP and SP registers are used to point to data in the machine language stack and are called the Base Pointer and Stack Pointer, respectively. These will be discussed later.

The 16-bit CS, DS, SS and ES registers are segment registers. They denote what memory is used for different parts of a program. CS stands for Code Segment, DS for Data Segment, SS for Stack Segment and ES for Extra Segment. ES is used as a temporary segment register. The details of these registers are in Sections 1.2.6 and 1.2.7.

The Instruction Pointer (IP) register is used with the CS register to keep track of the address of the next instruction to be executed by the CPU. Normally, as an instruction is executed, IP is advanced to point to the next instruction in memory.

The FLAGS register stores important information about the results of a previous instruction. These results are stored as individual bits in the register. For example, the Z bit is 1 if the result of the previous instruction was zero or 0 if not zero. Not all instructions modify the bits in FLAGS; consult the table in the appendix to see how individual instructions affect the FLAGS register.

1.2.5 80386 32-bit registers

The 80386 and later processors have extended registers. For example, the 16-bit AX register is extended to be 32-bits. To be backward compatible, AX still refers to the 16-bit register and EAX is used to refer to the extended 32-bit register. AX is the lower 16-bits of EAX just as AL is the lower 8-bits of AX (and EAX). There is no way to access the upper 16-bits of EAX directly. The other extended registers are EBX, ECX, EDX, ESI and EDI.

Many of the other registers are extended as well. BP becomes EBP; SP becomes ESP; FLAGS becomes EFLAGS and IP becomes EIP. However, unlike the index and general purpose registers, in 32-bit protected mode (discussed below) only the extended versions of these registers are used.

The segment registers are still 16-bit in the 80386. There are also two new segment registers: FS and GS. Their names do not stand for anything. They are extra temporary segment registers (like ES).

One of the definitions of the term word refers to the size of the data registers of the CPU. For the 80x86 family, the term is now a little confusing. In Table 1.2, one sees that word is defined to be 2 bytes (or 16 bits). It was given this meaning when the 8086 was first released. When the 80386 was developed, it was decided to leave the definition of word unchanged, even though the register size changed.

1.2.6 Real Mode

In real mode, memory is limited to only one megabyte ($2^{20}$ bytes). Valid addresses range from (in hex) 00000 to FFFFF. These addresses require a 20-bit number. Obviously, a 20-bit number will not fit into any of the 8086's 16-bit registers. Intel solved this problem by using two 16-bit values to determine an address. The first 16-bit value is called the selector. Selector values must be stored in segment registers. The second 16-bit value is called the offset. The physical address referenced by a 32-bit selector:offset pair is computed by the formula

$16 \times \text{selector} + \text{offset}$

So where did the infamous DOS 640K limit come from? The BIOS required some of the 1M for its code and for hardware devices like the video screen.

Multiplying by 16 in hex is easy, just add a 0 to the right of the number. For example, the physical address referenced by 047C:0048 is given by:

     047C0
   +  0048
   -------
     04808
In effect, the selector value is a paragraph
number (see Table 1.2).
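The real mode address formula translates directly into C. The following is a minimal sketch; the function name real_mode_address is hypothetical, and the sketch does not model what the hardware does with results above FFFFF.

```c
#include <stdint.h>

/* Physical address for a real-mode selector:offset pair.
   Multiplying the selector by 16 is a left shift by 4 bits,
   which appends one hex digit 0 to it. */
uint32_t real_mode_address(uint16_t selector, uint16_t offset)
{
    return ((uint32_t)selector << 4) + offset;
}
```

For 047C:0048 the function returns 04808 (hex). Different pairs can give the same result: 047D:0038 also evaluates to 04808, since raising the selector by one adds 16 and lowering the offset by 16 cancels it.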
Real segmented addresses have disadvantages:

• A single selector value can only reference 64K of memory (the upper limit of the 16-bit offset). What if a program has more than 64K of code? A single value in CS can not be used for the entire execution of the program. The program must be split up into sections (called segments) less than 64K in size. When execution moves from one segment to another, the value of CS must be changed. Similar problems occur with large amounts of data and the DS register. This can be very awkward!

• Each byte in memory does not have a unique segmented address. The physical address 04808 can be referenced by 047C:0048, 047D:0038, 047E:0028 or 047B:0058. This can complicate the comparison of segmented addresses.

1.2.7 16-bit Protected Mode

In the 80286's 16-bit protected mode, selector values are interpreted completely differently than in real mode. In real mode, a selector value is a paragraph number of physical memory. In protected mode, a selector value is an index into a descriptor table. In both modes, programs are divided into segments. In real mode, these segments are at fixed positions in physical memory and the selector value denotes the paragraph number of the beginning of the segment. In protected mode, the segments are not at fixed positions in physical memory. In fact, they do not have to be in memory at all!

Protected mode uses a technique called virtual memory. The basic idea of a virtual memory system is to only keep the data and code in memory that programs are currently using. Other data and code are stored temporarily on disk until they are needed again. In 16-bit protected mode, segments are moved between memory and disk as needed. When a segment is returned to memory from disk, it is very likely that it will be put into a different area of memory than it was in before being moved to disk. All of this is done transparently by the operating system. The program does not have to be written differently for virtual memory to work.

In protected mode, each segment is assigned an entry in a descriptor table. This entry has all the information that the system needs to know about the segment. This information includes: is it currently in memory; if in memory, where is it; access permissions (e.g., read-only). The index of the entry of the segment is the selector value that is stored in segment registers.
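The role of the descriptor table can be illustrated with a deliberately simplified C model. Everything in this sketch is an assumption made for illustration: the struct layout, the field names and the translate function are hypothetical, and real 80286 descriptors and selectors are laid out differently. The sketch only shows the idea that a selector picks a table entry and that the entry says whether and where the segment is in memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical descriptor; not the real 80286 format. */
struct descriptor {
    bool     present;    /* is the segment currently in memory? */
    uint32_t base;       /* where it starts, if it is present */
    bool     read_only;  /* one example of an access permission */
};

/* Look up the segment named by `selector` (an index into `table`).
   Returns true and stores the physical address in *phys if the
   segment is in memory. Returns false if the index is out of range
   or the segment is not present (an operating system would then
   have to bring the segment back from disk). */
bool translate(const struct descriptor *table, size_t count,
               size_t selector, uint16_t offset, uint32_t *phys)
{
    if (selector >= count || !table[selector].present)
        return false;
    *phys = table[selector].base + offset;
    return true;
}
```

Because the program only ever names the selector and offset, the operating system is free to change the base field when it moves a segment, which is why programs need not be written differently for virtual memory to work.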
One big disadvantage of 16-bit protected mode is that offsets are still 16-bit quantities. As a consequence of this, segment sizes are still limited to at most 64K. This makes the use of large arrays problematic!

One well-known PC columnist called the 286 CPU "brain dead."

1.2.8 32-bit Protected Mode

The 80386 introduced 32-bit protected mode. There are two major differences between 386 32-bit and 286 16-bit protected modes:

1. Offsets are expanded to be 32-bits. This allows an offset to range up to 4 billion. Thus, segments can have sizes up to 4 gigabytes.

2. Segments can be divided into smaller 4K-sized units called pages. The virtual memory system works with pages now instead of segments. This means that only parts of a segment may be in memory at any one time. In 286 16-bit mode, either the entire segment is in memory or none of it is. This is not practical with the larger segments that 32-bit mode allows.
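The arithmetic behind 4K pages can be sketched in C. This is a simplified illustration that follows the description above (a 32-bit offset split into a page number and a position inside the page); the names PAGE_SIZE, page_number and page_offset are hypothetical, and the sketch says nothing about how the processor's actual page tables are organized.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* 4K bytes per page */

/* Which 4K page a 32-bit offset falls in. */
uint32_t page_number(uint32_t offset) { return offset / PAGE_SIZE; }

/* Position of the byte within that page. */
uint32_t page_offset(uint32_t offset) { return offset % PAGE_SIZE; }
```

Offsets 0 through 4095 all fall in page 0 and offset 4096 is the first byte of page 1. Only the pages a program is actually touching need to be in memory, rather than the whole segment.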


In Windows 3.x, standard mode referred to 286 16-bit protected mode and enhanced mode referred to 32-bit mode. Windows 9X, Windows NT/2000/XP, OS/2 and Linux all run in paged 32-bit protected mode.

1.2.9 Interrupts

Sometimes the ordinary flow of a program must be interrupted to process events that require prompt response. The hardware of a computer provides a mechanism called interrupts to handle these events. For example, when a mouse is moved, the mouse hardware interrupts the current program to handle the mouse movement (to move the mouse cursor, etc.). Interrupts cause control to be passed to an interrupt handler. Interrupt handlers are routines that process the interrupt. Each type of interrupt is assigned an integer number. At the beginning of physical memory, a table of interrupt vectors resides that contains the segmented addresses of the interrupt handlers. The number of an interrupt is essentially an index into this table.
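Indexing the interrupt vector table can be sketched in C. The layout used here goes beyond what the text states and should be read as an assumption of the sketch: in real mode each vector is taken to occupy four bytes starting at physical address 4 times the interrupt number, with the offset word first and the segment word second, both stored low byte first. The names handler_addr and read_vector are hypothetical, and `memory` is an ordinary array standing in for the start of physical memory.

```c
#include <stdint.h>

/* Segmented address of an interrupt handler. */
struct handler_addr {
    uint16_t segment;
    uint16_t offset;
};

/* Read vector number `n` from a copy of the start of memory.
   Assumed layout: 4 bytes per vector, offset word then segment
   word, each stored low byte first. `memory` must hold at least
   4 * (n + 1) bytes. */
struct handler_addr read_vector(const uint8_t *memory, uint8_t n)
{
    const uint8_t *entry = memory + 4u * n;
    struct handler_addr h;
    h.offset  = (uint16_t)(entry[0] | (entry[1] << 8));
    h.segment = (uint16_t)(entry[2] | (entry[3] << 8));
    return h;
}
```

Under these assumptions the interrupt number is used purely as an array index, which is all the hardware needs in order to find the handler to pass control to.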

interrupts

the CPU. (The

mouse

are

raised from outside

is an example of this type.)

Many

keyboard,
cards)

devices

I/O

raise

interrupts

(e.g.,

timer, disk drives, CD-ROM and sound

are raised from within


an error or the interrupt
interrupts are also called traps.

Internal interrupts

the CPU, either from


instruction. Error
Interrupts

generated

are

instruction

uses

called

these types of interrupts

(Application
modern

UNIX)

Interface)

interface.

Many interrupt

handlers

program

restore

values they

More

as Windows

and

return control back


when they

all the registers

to the

had before the interrupt

Thus, the interrupted

DOS

to implement its API

Programming

use a C based

interrupt

interrupts.

operating systems (such

to the interrupted
They

the

from
software

finish.

same

occurred.

program runs as if nothing


some CPU cycles).

happened (except that it lost


Traps generally

do not return. Often they abort

the program.
5

However,

kernel level.

they

may use a

lower level interface

at the


1.3 Assembly Language

1.3.1 Machine language

Every type of CPU understands its own machine language. Instructions in machine language are numbers stored as bytes in memory. Each instruction has its own unique numeric code called its operation code or opcode for short. The 80x86 processor's instructions vary in size. The opcode is always at the beginning of the instruction. Many instructions also include data (e.g., constants or addresses) used by the instruction.

Machine language is very difficult to program in directly. Deciphering the meanings of the numerical-coded instructions is tedious for humans. For example, the instruction that says to add the EAX and EBX registers together and store the result back into EAX is encoded by the following hex codes:

      03 C3

This is hardly obvious. Fortunately, a program called an assembler can do this tedious work for the programmer.

1.3.2 Assembly language

An assembly language program is stored as text (just as a higher level language program). Each assembly instruction represents exactly one machine instruction. For example, the addition instruction described above would be represented in assembly language as:

      add eax, ebx

Here the meaning of the instruction is much clearer than in machine code. The word add is a mnemonic for the addition instruction. The general form of an assembly instruction is:

      mnemonic operand(s)

An assembler is a program that reads a text file with assembly instructions and converts the assembly into machine code. Compilers are programs that do similar conversions for high-level programming languages. An assembler is much simpler than a compiler. Every assembly language statement directly represents a single machine instruction. High-level language statements are much more complex and may require many machine instructions. (It took several years for computer scientists to figure out how to even write a compiler!)

Another important difference between assembly and high-level languages is that since every different type of CPU has its own machine language, it also has its own assembly language. Porting assembly programs between different computer architectures is much more difficult than in a high-level language.

This book's examples use the Netwide Assembler or NASM for short. It is freely available off the Internet (see the preface for the URL). More common assemblers are Microsoft's Assembler (MASM) or Borland's Assembler (TASM). There are some differences in the assembly syntax for MASM/TASM and NASM.

1.3.3 Instruction operands

Machine code instructions have varying number and type of operands; however, in general, each instruction itself will have a fixed number of operands (0 to 3). Operands can have the following types:

register: These operands refer directly to the contents of the CPU's registers.

memory: These refer to data in memory. The address of the data may be a constant hardcoded into the instruction or may be computed using values of registers. Addresses are always offsets from the beginning of a segment.

immediate: These are fixed values that are listed in the instruction itself. They are stored in the instruction itself (in the code segment), not in the data segment.

implied: These operands are not explicitly shown. For example, the increment instruction adds one to a register or memory. The one is implied.

1.3.4 Basic instructions

The most basic instruction is the MOV instruction. It moves data from one location to another (like the assignment operator in a high-level language). It takes two operands:

      mov dest, src

The data specified by src is copied to dest. One restriction is that both operands may not be memory operands. This points out another quirk of assembly. There are often somewhat arbitrary rules about how the various instructions are used. The operands must also be the same size. The value of AX can not be stored into BL.

Here is an example (semicolons start a comment):

      mov    eax, 3      ; store 3 into EAX register (3 is immediate operand)
      mov    bx, ax      ; store the value of AX into the BX register

The ADD instruction is used to add integers.

      add    eax, 4      ; eax = eax + 4
      add    al, ah      ; al = al + ah

The SUB instruction subtracts integers.

      sub    bx, 10      ; bx = bx - 10
      sub    ebx, edi    ; ebx = ebx - edi

The INC and DEC instructions increment or decrement values by one. Since the one is an implicit operand, the machine code for INC and DEC is smaller than for the equivalent ADD and SUB instructions.

      inc    ecx         ; ecx++
      dec    dl          ; dl--

1.3.5 Directives

A directive is an artifact of the assembler not the CPU. They are generally used to either instruct the assembler to do something or inform the assembler of something. They are not translated into machine code. Common uses of directives are:

• define constants
• define memory to store data into
• group memory into segments
• conditionally include source code
• include other files

NASM code passes through a preprocessor just like C. It has many of the same preprocessor commands as C. However, NASM's preprocessor directives start with a % instead of a # as in C.

The equ directive

The equ directive can be used to define a symbol. Symbols are named constants that can be used in the assembly program. The format is:

      symbol equ value

Symbol values can not be redefined later.

      Unit           Letter
      byte           B
      word           W
      double word    D
      quad word      Q
      ten bytes      T

      Table 1.3: Letters for RESX and DX Directives

The %define directive

This directive is similar to C's #define directive. It is most commonly used to define constant macros just as in C.

      %define SIZE 100
      mov    eax, SIZE

The above code defines a macro named SIZE and shows its use in a MOV instruction. Macros are more flexible than symbols in two ways. Macros can be redefined and can be more than simple constant numbers.

Data directives

Data directives are used in data segments to define room for memory. There are two ways memory can be reserved. The first way only defines room for data; the second way defines room and an initial value. The first method uses one of the RESX directives. The X is replaced with a letter that determines the size of the object (or objects) that will be stored. Table 1.3 shows the possible values.

The second method (that defines an initial value, too) uses one of the DX directives. The X letters are the same as those in the RESX directives.

It is very common to mark memory locations with labels. Labels allow one to easily refer to memory locations in code. Below are several examples:

      L1    db     0          ; byte labeled L1 with initial value 0
      L2    dw     1000       ; word labeled L2 with initial value 1000
      L3    db     110101b    ; byte initialized to binary 110101 (53 in decimal)
      L4    db     12h        ; byte initialized to hex 12 (18 in decimal)
      L5    db     17o        ; byte initialized to octal 17 (15 in decimal)
      L6    dd     1A92h      ; double word initialized to hex 1A92
      L7    resb   1          ; 1 uninitialized byte
      L8    db     "A"        ; byte initialized to ASCII code for A (65)

Double quotes and single quotes are treated the same. Consecutive data definitions are stored sequentially in memory. That is, the word L2 is stored immediately after L1 in memory. Sequences of memory may also be defined.

      L9    db     0, 1, 2, 3              ; defines 4 bytes
      L10   db     "w", "o", "r", 'd', 0   ; defines a C string = "word"
      L11   db     'word', 0               ; same as L10

The DD directive can be used to define both integer and single precision floating point⁶ constants. However, the DQ can only be used to define double precision floating point constants.

For large sequences, NASM's TIMES directive is often useful. This directive repeats its operand a specified number of times. For example,

      L12   times 100 db 0                 ; equivalent to 100 (db 0)'s
      L13   resw   100                     ; reserves room for 100 words

Remember that labels can be used to refer to data in code. There are two ways that a label can be used. If a plain label is used, it is interpreted as the address (or offset) of the data. If the label is placed inside square brackets ([]), it is interpreted as the data at the address. In other words, one should think of a label as a pointer to the data and the square brackets dereferences the pointer just as the asterisk does in C. (MASM/TASM follow a different convention.) In 32-bit mode, addresses are 32-bit. Here are some examples:

      1     mov    al, [L1]      ; copy byte at L1 into AL
      2     mov    eax, L1       ; EAX = address of byte at L1
      3     mov    [L1], ah      ; copy AH into byte at L1
      4     mov    eax, [L6]     ; copy double word at L6 into EAX
      5     add    eax, [L6]     ; EAX = EAX + double word at L6
      6     add    [L6], eax     ; double word at L6 += EAX
      7     mov    al, [L6]      ; copy first byte of double word at L6 into AL

Line 7 of the examples shows an important property of NASM. The assembler does not keep track of the type of data that a label refers to. It is up to the programmer to make sure that he (or she) uses a label correctly. Later it will be common to store addresses of data in registers and use the register like a pointer variable in C. Again, no checking is made that a pointer is used correctly. In this way, assembly is much more error prone than even C.

Consider the following instruction:

      mov    [L6], 1             ; store a 1 at L6

This statement produces an operation size not specified error. Why? Because the assembler does not know whether to store the 1 as a byte, word or double word. To fix this, add a size specifier:

      mov    dword [L6], 1       ; store a 1 at L6

This tells the assembler to store a 1 at the double word that starts at L6. Other size specifiers are: BYTE, WORD, QWORD and TWORD⁷.

6 Single precision floating point is equivalent to a float variable in C.

1.3.6 Input and Output

Input and output are very system dependent activities. It involves interfacing with the system's hardware. High level languages, like C, provide standard libraries of routines that provide a simple, uniform programming interface for I/O. Assembly languages provide no standard libraries. They must either directly access hardware (which is a privileged operation in protected mode) or use whatever low level routines that the operating system provides.

It is very common for assembly routines to be interfaced with C. One advantage of this is that the assembly code can use the standard C library I/O routines. However, one must know the rules for passing information between routines that C uses. These rules are too complicated to cover here. (They are covered later!) To simplify I/O, the author has developed his own routines that hide the complex C rules and provide a much more simple interface. Table 1.4 describes the routines provided. All of the routines preserve the value of all registers, except for the read routines. These routines do modify the value of the EAX register. To use these routines, one must include a file with information that the assembler needs to use them. To include a file in NASM, use the %include preprocessor directive. The following line includes the file needed by the author's I/O routines⁸:

      %include "asm_io.inc"

To use one of the print routines, one loads EAX with the correct value and uses a CALL instruction to invoke it. The CALL instruction is equivalent to a function call in a high level language. It jumps execution to another section of code, but returns back to its origin after the routine is over. The example program below shows several examples of calls to these I/O routines.

      print_int      prints out to the screen the value of the integer stored in EAX
      print_char     prints out to the screen the character whose ASCII value is stored in AL
      print_string   prints out to the screen the contents of the string at the address
                     stored in EAX. The string must be a C-type string (i.e. null terminated).
      print_nl       prints out to the screen a new line character.
      read_int       reads an integer from the keyboard and stores it into the EAX register.
      read_char      reads a single character from the keyboard and stores its ASCII code
                     into the EAX register.

      Table 1.4: Assembly I/O Routines

7 TWORD defines a ten byte area of memory. The floating point coprocessor uses this data type.

8 The asm_io.inc (and the asm_io object file that asm_io.inc requires) are in the example code downloads on the web page for this tutorial, http://www.drpaulcarter.com/pcasm

1.3.7 Debugging

The author's library also contains some useful routines for debugging programs. These debugging routines display information about the state of the computer without modifying the state. These routines are really macros that preserve the current state of the CPU and then make a subroutine call. The macros are defined in the asm_io.inc file discussed above. Macros are used like ordinary instructions. Operands of macros are separated by commas.

There are four debugging routines named dump_regs, dump_mem, dump_stack and dump_math; they display the values of registers, memory, stack and the math coprocessor, respectively.

dump_regs This macro prints out the values of the registers (in hexadecimal) of the computer to stdout (i.e. the screen). It also displays the bits set in the FLAGS⁹ register. For example, if the zero flag is 1, ZF is displayed. If it is 0, it is not displayed. It takes a single integer argument that is printed out as well. This can be used to distinguish the output of different dump_regs commands.
dump_mem This macro prints out the values of a region of memory (in hexadecimal) and also as ASCII characters. It takes three comma delimited arguments. The first is an integer that is used to label the output (just as dump_regs argument). The second argument is the address to display. (This can be a label.) The last argument is the number of 16-byte paragraphs to display after the address. The memory displayed will start on the first paragraph boundary before the requested address.


dump_stack This macro prints out the values on the CPU stack. (The stack will be covered in Chapter 4.) The stack is organized as double words and this routine displays them this way. It takes three comma delimited arguments. The first is an integer label (like dump_regs). The second is the number of double words to display below the address that the EBP register holds and the third argument is the number of double words to display above the address in EBP.

dump_math This macro prints out the values of the registers of the math coprocessor. It takes a single integer argument that is used to label the output just as the argument of dump_regs does.

9 Chapter 2 discusses this register

1.4 Creating a Program

Today, it is unusual to create a stand alone program written completely in assembly language. Assembly is usually used to write certain critical routines. Why? It is much easier to program in a higher level language than in assembly. Also, using assembly makes a program very hard to port to other platforms. In fact, it is rare to use assembly at all.

So, why should anyone learn assembly at all?

1. Sometimes code written in assembly can be faster and smaller than compiler generated code.

2. Assembly allows access to direct hardware features of the system that might be difficult or impossible to use from a higher level language.

3. Learning to program in assembly helps one gain a deeper understanding of how computers work.

4. Learning to program in assembly helps one understand better how compilers and high level languages like C work.

These last two points demonstrate that learning assembly can be useful even if one never programs in it later. In fact, the author rarely programs in assembly, but he uses the ideas he learned from it every day.

1.4.1 First program

The early programs in this text will all start from the simple C driver program in Figure 1.6. It simply calls another function named asm_main. This is really a routine that will be written in assembly. There are several advantages in using the C driver routine. First, this lets the C system set up the program to run correctly in protected mode. All the segments and their corresponding segment registers will be initialized by C. The assembly code need not worry about any of this. Secondly, the C library will also be available to be used by the assembly code. The author's I/O routines take advantage of this. They use C's I/O functions (printf, etc.). The following shows a simple assembly program.

      int asm_main(void);   /* the routine written in assembly */

      int main()
      {
        int ret_status;
        ret_status = asm_main();
        return ret_status;
      }

      Figure 1.6: driver.c code
first.asm

       1  ; file: first.asm
       2  ; First assembly program. This program asks for two integers as
       3  ; input and prints out their sum.
       4  ;
       5  ; To create executable using djgpp:
       6  ; nasm -f coff first.asm
       7  ; gcc -o first first.o driver.c asm_io.o
       8
       9  %include "asm_io.inc"
      10  ;
      11  ; initialized data is put in the .data segment
      12  ;
      13  segment .data
      14  ;
      15  ; These labels refer to strings used for output
      16  ;
      17  prompt1 db    "Enter a number: ", 0       ; don't forget null terminator
      18  prompt2 db    "Enter another number: ", 0
      19  outmsg1 db    "You entered ", 0
      20  outmsg2 db    " and ", 0
      21  outmsg3 db    ", the sum of these is ", 0
      22
      23  ;
      24  ; uninitialized data is put in the .bss segment
      25  ;
      26  segment .bss
      27  ;
      28  ; These labels refer to double words used to store the inputs
      29  ;
      30  input1  resd 1
      31  input2  resd 1
      32
      33  ;
      34  ; code is put in the .text segment
      35  ;
      36  segment .text
      37          global  _asm_main
      38  _asm_main:
      39          enter   0,0               ; setup routine
      40          pusha
      41
      42          mov     eax, prompt1      ; print out prompt
      43          call    print_string
      44
      45          call    read_int          ; read integer
      46          mov     [input1], eax     ; store into input1
      47
      48          mov     eax, prompt2      ; print out prompt
      49          call    print_string
      50
      51          call    read_int          ; read integer
      52          mov     [input2], eax     ; store into input2
      53
      54          mov     eax, [input1]     ; eax = dword at input1
      55          add     eax, [input2]     ; eax += dword at input2
      56          mov     ebx, eax          ; ebx = eax
      57
      58          dump_regs 1               ; print out register values
      59          dump_mem 2, outmsg1, 1    ; print out memory
      60  ;
      61  ; next print out result message as series of steps
      62  ;
      63          mov     eax, outmsg1
      64          call    print_string      ; print out first message
      65          mov     eax, [input1]
      66          call    print_int         ; print out input1
      67          mov     eax, outmsg2
      68          call    print_string      ; print out second message
      69          mov     eax, [input2]
      70          call    print_int         ; print out input2
      71          mov     eax, outmsg3
      72          call    print_string      ; print out third message
      73          mov     eax, ebx
      74          call    print_int         ; print out sum (ebx)
      75          call    print_nl          ; print new-line
      76
      77          popa
      78          mov     eax, 0            ; return back to C
      79          leave
      80          ret

Line 13 of the program defines a section of the program that specifies memory to be stored in the data segment (whose name is .data). Only initialized data should be defined in this segment. On lines 17 to 21, several strings are declared. They will be printed with the C library and so must be terminated with a null character (ASCII code 0). Remember there is a big difference between 0 and '0'.

Uninitialized data should be declared in the bss segment (named .bss on line 26). This segment gets its name from an early UNIX-based assembler operator that meant "block started by symbol." There is also a stack segment. It will be discussed later.

The code segment is named .text historically. It is where instructions are placed. Note that the code label for the main routine (line 38) has an underscore prefix. This is part of the C calling convention. This convention specifies the rules C uses when compiling code. It is very important to know this convention when interfacing C and assembly. Later the entire convention will be presented; however, for now, one only needs to know that all C symbols (i.e., functions and global variables) have an underscore prefix added to them by the C compiler. (This rule is specifically for DOS/Windows, the Linux C compiler does not prepend anything to C symbol names.)

The global directive on line 37 tells the assembler to make the _asm_main label global. Unlike in C, labels have internal scope by default. This means that only code in the same module can use the label. The global directive gives the specified label (or labels) external scope. This type of label can be accessed by any module in the program. The asm_io module declares the print_int, et al. labels to be global. This is why one can use them in the first.asm module.

1.4.2 Compiler dependencies

The assembly code above is specific to the free GNU¹⁰-based DJGPP C/C++ compiler.¹¹ This compiler can be freely downloaded from the Internet. It requires a 386-based PC or better and runs under DOS, Windows 95/98 or NT. This compiler uses object files in the COFF (Common Object File Format) format. To assemble to this format use the -f coff switch with nasm (as shown in the comments of the above code). The extension of the resulting object file will be o.

The Linux C compiler is a GNU compiler also. To convert the code above to run under Linux, simply remove the underscore prefixes in lines 37 and 38. Linux uses the ELF (Executable and Linkable Format) format for object files. Use the -f elf switch for Linux. It also produces an object with an o extension.

(The compiler specific example files, available from the author's web site, have already been modified to work with the appropriate compiler.)

Borland C/C++ is another popular compiler. It uses the Microsoft OMF format for object files. Use the -f obj switch for Borland compilers. The extension of the object file will be obj. The OMF format uses different segment directives than the other object formats. The data segment (line 13) must be changed to:

      segment _DATA public align=4 class=DATA use32

The bss segment (line 26) must be changed to:

      segment _BSS public align=4 class=BSS use32

The text segment (line 36) must be changed to:

      segment _TEXT public align=1 class=CODE use32

In addition a new line should be added before line 36:

      group DGROUP _BSS _DATA

The Microsoft C/C++ compiler can use either the OMF format or the Win32 format for object files. (If given an OMF format, it converts the information to Win32 format internally.) Win32 format allows segments to be defined just as for DJGPP and Linux. Use the -f win32 switch to output in this mode. The extension of the object file will be obj.

1.4.3 Assembling the code

The first step is to assemble the code. From the command line, type:

      nasm -f object-format first.asm

where object-format is either coff, elf, obj or win32 depending on what C compiler will be used. (Remember that the source file must be changed for both Linux and Borland as well.)

10 GNU is a project of the Free Software Foundation (http://www.fsf.org)

11 http://www.delorie.com/djgpp

1.4.4 Compiling the C code

Compile the driver.c file using a C compiler. For DJGPP, use:

      gcc -c driver.c

The -c switch means to just compile, do not attempt to link yet. This same switch works on Linux, Borland and Microsoft compilers as well.

1.4.5 Linking the object files

Linking is the process of combining the machine code and data in object files and library files together to create an executable file. As will be shown below, this process is complicated.

C code requires the standard C library and special startup code to run. It is much easier to let the C compiler call the linker with the correct parameters, than to try to call the linker directly. For example, to link the code for the first program using DJGPP, use:

      gcc -o first driver.o first.o asm_io.o

This creates an executable called first.exe (or just first under Linux).

With Borland, one would use:

      bcc32 first.obj driver.obj asm_io.obj

Borland uses the name of the first file listed to determine the executable name. So in the above case, the program would be named first.exe.

It is possible to combine the compiling and linking step. For example,

      gcc -o first driver.c first.o asm_io.o

Now gcc will compile driver.c and then link.

1.4.6 Understanding an assembly listing file

The -l listing-file switch can be used to tell nasm to create a listing file of a given name. This file shows how the code was assembled. Here is how lines 17 and 18 (in the data segment) appear in the listing file. (The line numbers are in the listing file; however notice that the line numbers in the source file may not be the same as the line numbers in the listing file.)

      48 00000000 456E7465722061206E-    prompt1 db    "Enter a number: ", 0
      49 00000009 756D6265723A2000
      50 00000011 456E74657220616E6F-    prompt2 db    "Enter another number: ", 0
      51 0000001A 74686572206E756D62-
      52 00000023 65723A2000

The first column in each line is the line number and the second is the offset (in hex) of the data in the segment. The third column shows the raw hex values that will be stored. In this case the hex data correspond to ASCII codes. Finally, the text from the source file is displayed on the line. The offsets listed in the second column are very likely not the true offsets that the data will be placed at in the complete program. Each module may define its own labels in the data segment (and the other segments, too). In the link step (see section 1.4.5), all these data segment label definitions are combined to form one data segment. The new final offsets are then computed by the linker.


Here is a small section (lines 54 to 56 of the source file) of the text segment in the listing file:

      94 0000002C A1[00000000]        mov eax, [input1]
      95 00000031 0305[04000000]      add eax, [input2]
      96 00000037 89C3                mov ebx, eax

The third column shows the machine code generated by the assembly. Often the complete code for an instruction can not be computed yet. For example, in line 94 the offset (or address) of input1 is not known until the code is linked. The assembler can compute the op-code for the mov instruction (which from the listing is A1), but it writes the offset in square brackets because the exact value can not be computed yet. In this case, a temporary offset of 0 is used because input1 is at the beginning of the part of the bss segment defined in this file. Remember this does not mean that it will be at the beginning of the final bss segment of the program. When the code is linked, the linker will insert the correct offset into the position. Other instructions, like line 96, do not reference any labels. Here the assembler can compute the complete machine code.

Big and Little Endian Representation

If one looks closely at line 95, something seems very strange about the offset in the square brackets of the machine code. The input2 label is at offset 4 (as defined in this file); however, the offset that appears in memory is not 00000004, but 04000000. Why? Different processors store multibyte integers in different orders in memory. There are two popular methods of storing integers: big endian and little endian. (Endian is pronounced like indian.) Big endian is the method that seems the most natural. The biggest (i.e. most significant) byte is stored first, then the next biggest, etc. For example, the dword 00000004 would be stored as the four bytes 00 00 00 04. IBM mainframes, most RISC processors and Motorola processors all use this big endian method. However, Intel-based processors use the little endian method! Here the least significant byte is stored first. So, 00000004 is stored in memory as 04 00 00 00. This format is hardwired into the CPU and can not be changed. Normally, the programmer does not need to worry about which format is used. However, there are circumstances where it is important.

1. When binary data is transferred between different computers (either from files or through a network).

2. When binary data is written out to memory as a multibyte integer and then read back as individual bytes or vice versa.
Endianness does not apply to the order of array elements. The first element of an array is always at the lowest address. This applies to strings (which are just character arrays). Endianness still applies to the individual elements of the arrays.

1.5 Skeleton File

Figure 1.7 shows a skeleton file that can be used as a starting point for writing assembly programs.

skel.asm

      %include "asm_io.inc"
      segment .data
      ;
      ; initialized data is put in the data segment here
      ;

      segment .bss
      ;
      ; uninitialized data is put in the bss segment
      ;

      segment .text
              global  _asm_main
      _asm_main:
              enter   0,0               ; setup routine
              pusha

      ;
      ; code is put in the text segment. Do not modify the code before
      ; or after this comment.
      ;

              popa
              mov     eax, 0            ; return back to C
              leave
              ret

      Figure 1.7: Skeleton Program

Chapter 2

Basic Assembly Language

2.1 Working with Integers

2.1.1 Integer representation

Integers come in two flavors: unsigned and signed. Unsigned integers (which are non-negative) are represented in a very straightforward binary manner. The number 200 as a one byte unsigned integer would be represented by 11001000 (or C8 in hex).

Signed integers (which may be positive or negative) are represented in more complicated ways. For example, consider -56. +56 as a byte would be represented by 00111000. On paper, one could represent -56 as -111000, but how would this be represented in a byte in the computer's memory? How would the minus sign be stored?

There are three general techniques that have been used to represent signed integers in computer memory. All of these methods use the most significant bit of the integer as a sign bit. This bit is 0 if the number is positive and 1 if negative.

Signed magnitude

The first method is the simplest and is called signed magnitude. It represents the integer as two parts. The first part is the sign bit and the second is the magnitude of the integer. So 56 would be represented as the byte 00111000 (the sign bit is the leftmost bit) and -56 would be 10111000. The largest byte value would be 01111111 or +127 and the smallest byte value would be 11111111 or -127. To negate a value, the sign bit is reversed. This method is straightforward, but it does have its drawbacks. First, there are two possible values of zero, +0 (00000000) and -0 (10000000). Since zero is neither positive nor negative, both of these representations should act the same. This complicates the logic of arithmetic for the CPU. Secondly, general arithmetic is also complicated. If 10 is added to -56, this must be recast as 10 subtracted by 56. Again, this complicates the logic of the CPU.

One's complement

The second method is known as one's complement representation. The one's complement of a number is found by reversing each bit in the number. (Another way to look at it is that the new bit value is 1 - oldbitvalue.) For example, the one's complement of 00111000 (+56) is 11000111. In one's complement notation, computing the one's complement is equivalent to negation. Thus, 11000111 is the representation for -56. Note that the sign bit was automatically changed by one's complement and that, as one would expect, taking the one's complement twice yields the original number. As for the first method, there are two representations of zero: 00000000 (+0) and 11111111 (-0). Arithmetic with one's complement numbers is complicated.

There is a handy trick to finding the one's complement of a number in hexadecimal without converting it to binary. The trick is to subtract each hex digit from F (or 15 in decimal). This method assumes that the number of bits in the number is a multiple of 4. Here is an example: +56 is represented by 38 in hex. To find the one's complement, subtract each digit from F to get C7 in hex. This agrees with the result above.
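The hex-digit trick can be checked against plain bit reversal with a short C sketch. This is only an illustration under the assumption of one-byte values; the function names ones_complement and ones_complement_hex_trick are made up here and do not come from the text.

```c
#include <stdint.h>

/* One's complement of a byte: reverse every bit. */
uint8_t ones_complement(uint8_t x)
{
    return (uint8_t)~x;
}

/* Same value computed with the hex trick: subtract each hex digit from F.
   (Hypothetical helper, shown only to compare the two methods.) */
uint8_t ones_complement_hex_trick(uint8_t x)
{
    uint8_t high = (uint8_t)(0xF - (x >> 4));   /* upper hex digit */
    uint8_t low  = (uint8_t)(0xF - (x & 0xF));  /* lower hex digit */
    return (uint8_t)((high << 4) | low);
}
```

For the byte 38 hex, both functions give C7 hex, matching the worked example.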

Two's complement

The first two methods described were used on early computers. Modern computers use a third method called two's complement representation. The two's complement of a number is found by the following two steps:

1. Find the one's complement of the number
2. Add one to the result of step 1

Here's an example using 00111000 (56). First the one's complement is computed: 11000111. Then one is added:

      11000111
    +        1
      11001000

In two's complement notation, computing the two's complement is equivalent to negating a number. Thus, 11001000 is the two's complement representation of -56. Two negations should reproduce the original number. Surprisingly, two's complement does meet this requirement. Take the two's complement of 11001000 by adding one to the one's complement.

      00110111
    +        1
      00111000

    Number    Hex Representation
    0         00
    1         01
    127       7F
    -128      80
    -127      81
    -2        FE
    -1        FF

Table 2.1: Two's Complement Representation

When performing the addition in the two's complement operation, the addition of the leftmost bit may produce a carry. This carry is not used. Remember that all data on the computer is of some fixed size (in terms of number of bits). Adding two bytes always produces a byte as a result (just as adding two words produces a word, etc.). This property is important for two's complement notation. For example, consider zero as a one byte two's complement number (00000000). Computing its two's complement produces the sum:

      11111111
    +        1
    c 00000000

where c represents a carry. (Later it will be shown how to detect this carry, but it is not stored in the result.) Thus, in two's complement notation there is only one zero. This makes two's complement arithmetic simpler than the previous methods.
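The two steps can be written out directly in C. The sketch below assumes one-byte values, and the function name twos_complement is chosen here for illustration only. Converting the result back to uint8_t plays the role of the discarded carry.

```c
#include <stdint.h>

/* Negate a byte in two's complement notation. */
uint8_t twos_complement(uint8_t x)
{
    uint8_t ones = (uint8_t)~x;     /* step 1: one's complement      */
    return (uint8_t)(ones + 1);     /* step 2: add one; a carry out  */
                                    /* of the leftmost bit is lost   */
}
```

With this sketch, 38 hex (56) becomes C8 hex, C8 hex becomes 38 hex again, and 00 stays 00, so there is only one zero.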

Using two's complement notation, a signed byte can be used to represent the numbers -128 to +127. Table 2.1 shows some selected values. If 16 bits are used, the signed numbers -32,768 to +32,767 can be represented. +32,767 is represented by 7FFF, -32,768 by 8000, -128 as FF80 and -1 as FFFF. 32 bit two's complement numbers range from approximately -2 billion to +2 billion.

The CPU has no idea what a particular byte (or word or double word) is supposed to represent. Assembly does not have the idea of types that a high level language has. How data is interpreted depends on what instruction is used on the data. Whether the hex value FF is considered to represent a signed -1 or an unsigned +255 depends on the programmer. The C language defines signed and unsigned integer types. This allows a C compiler to determine the correct instructions to use with the data.
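The point that the same bits can mean different numbers can be sketched in C. The helper names as_unsigned and as_signed are hypothetical; the signed version subtracts 256 by hand instead of relying on a cast, so the result does not depend on the compiler.

```c
#include <stdint.h>

/* Interpret the eight bits as an unsigned byte: 0 to 255. */
int as_unsigned(uint8_t bits)
{
    return bits;
}

/* Interpret the same eight bits as a two's complement byte: -128 to 127.
   If the sign bit is set, the value is the unsigned value minus 256. */
int as_signed(uint8_t bits)
{
    return (bits >= 0x80) ? (int)bits - 256 : (int)bits;
}
```

The byte FF hex is 255 under the first interpretation and -1 under the second; the bits themselves are identical.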

2.1.2 Sign extension

In assembly, all data has a specified size. It is not uncommon to need to change the size of data to use it with other data. Decreasing size is the easiest.

Decreasing size of data

To decrease the size of data, simply remove the more significant bits of the data. Here's a trivial example:

        mov     ax, 0034h         ; ax = 52 (stored in 16 bits)
        mov     cl, al            ; cl = lower 8-bits of ax

Of course, if the number can not be represented correctly in the smaller size, decreasing the size does not work. For example, if AX were 0134h (or 308 in decimal) then the above code would still set CL to 34h. This method works with both signed and unsigned numbers. Consider signed numbers: if AX was FFFFh (-1 as a word), then CL would be FFh (-1 as a byte). However, note that this is not correct if the value in AX was unsigned!

The rule for unsigned numbers is that all the bits being removed must be 0 for the conversion to be correct. The rule for signed numbers is that the bits being removed must be either all 1's or all 0's. In addition, the first bit not being removed must have the same value as the removed bits. This bit will be the new sign bit of the smaller value. It is important that it be the same as the original sign bit!
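The two rules can be expressed as bit tests. The C sketch below checks whether a 16-bit word survives truncation to a byte; the function names fits_unsigned_byte and fits_signed_byte are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned rule: every removed bit (bits 8 to 15) must be 0. */
bool fits_unsigned_byte(uint16_t word)
{
    return (word & 0xFF00) == 0;
}

/* Signed rule: the removed bits (8 to 15) and the new sign bit (bit 7)
   must all have the same value, either all 0's or all 1's. */
bool fits_signed_byte(uint16_t word)
{
    uint16_t upper = word & 0xFF80;
    return upper == 0x0000 || upper == 0xFF80;
}
```

For example, 0034h passes both tests, 0134h fails both, and FFFFh passes only the signed test.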

Increasing size of data

Increasing the size of data is more complicated than decreasing. Consider the hex byte FF. If it is extended to a word, what value should the word have? It depends on how FF is interpreted. If FF is an unsigned byte (255 in decimal), then the word should be 00FF; however, if it is a signed byte (-1 in decimal), then the word should be FFFF.

In general, to extend an unsigned number, one makes all the new bits of the expanded number 0. Thus, FF becomes 00FF. However, to extend a signed number, one must extend the sign bit. This means that the new bits become copies of the sign bit. Since the sign bit of FF is 1, the new bits must also be all ones, to produce FFFF. If the signed number 5A (90 in decimal) was extended, the result would be 005A.
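Both kinds of extension can be spelled out bit by bit. This C sketch extends a byte to a word by hand; the names zero_extend and sign_extend are chosen for illustration and are not taken from the text.

```c
#include <stdint.h>

/* Unsigned extension: all new bits are 0. */
uint16_t zero_extend(uint8_t byte)
{
    return (uint16_t)byte;
}

/* Signed extension: the new bits are copies of the sign bit (bit 7). */
uint16_t sign_extend(uint8_t byte)
{
    uint16_t word = byte;
    if (byte & 0x80)        /* sign bit is 1 */
        word |= 0xFF00;     /* so the eight new bits are all ones */
    return word;
}
```

Extending FF gives 00FF with the first function and FFFF with the second, while 5A gives 005A either way because its sign bit is 0.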

There are several instructions that the 80386 provides for extension of numbers. Remember that the computer does not know whether a number is signed or unsigned. It is up to the programmer to use the correct instruction.

For unsigned numbers, one can simply put zeros in the upper bits using a MOV instruction. For example, to extend the byte in AL to an unsigned word in AX:

        mov     ah, 0             ; zero out upper 8-bits

However, it is not possible to use a MOV instruction to convert the unsigned word in AX to an unsigned double word in EAX. Why not? There is no way to specify the upper 16 bits of EAX in a MOV. The 80386 solves this problem by providing a new instruction MOVZX. This instruction has two operands. The destination (first operand) must be a 16 or 32 bit register. The source (second operand) may be an 8 or 16 bit register or a byte or word of memory. The other restriction is that the destination must be larger than the source. (Most instructions require the source and destination to be the same size.) Here are some examples:

        movzx   eax, ax           ; extends ax into eax
        movzx   eax, al           ; extends al into eax
        movzx   ax, al            ; extends al into ax
        movzx   ebx, ax           ; extends ax into ebx

For signed numbers, there is no easy way to use the MOV instruction for any case. The 8086 provided several instructions to extend signed numbers. The CBW (Convert Byte to Word) instruction sign extends the AL register into AX. The operands are implicit. The CWD (Convert Word to Double word) instruction sign extends AX into DX:AX. The notation DX:AX means to think of the DX and AX registers as one 32 bit register with the upper 16 bits in DX and the lower bits in AX. (Remember that the 8086 did not have any 32 bit registers!) The 80386 added several new instructions. The CWDE (Convert Word to Double word Extended) instruction sign extends AX into EAX. The CDQ (Convert Double word to Quad word) instruction sign extends EAX into EDX:EAX (64 bits!). Finally, the MOVSX instruction works like MOVZX except it uses the rules for signed numbers.

Application to C programming

Extending of unsigned and signed integers also occurs in C. Variables in C may be declared as either signed or unsigned (int is signed). Consider the code in Figure 2.1. In line 3, the variable a is extended using the rules for unsigned values (using MOVZX), but in line 4, the signed rules are used for b (using MOVSX).

Note: ANSI C does not define whether the char type is signed or not; it is up to each individual compiler to decide this. That is why the type is explicitly defined in Figure 2.1.

    unsigned char uchar = 0xFF;
    signed char schar = 0xFF;
    int a = (int) uchar;       /* a = 255 (0x000000FF) */
    int b = (int) schar;       /* b = -1 (0xFFFFFFFF) */

Figure 2.1

    char ch;
    while( (ch = fgetc(fp)) != EOF ) {
      /* do something with ch */
    }

Figure 2.2

There is a common C programming bug that directly relates to this subject. Consider the code in Figure 2.2. The prototype of fgetc() is:

    int fgetc( FILE * );

One might question why the function returns an int since it reads characters. The reason is that it normally does return a char (extended to an int value using zero extension). However, there is one value that it may return that is not a character, EOF. This is a macro that is usually defined as -1. Thus, fgetc() either returns a char extended to an int value (which looks like 000000xx in hex) or EOF (which looks like FFFFFFFF in hex).

The basic problem with the program in Figure 2.2 is that fgetc() returns an int, but this value is stored in a char. C will truncate the higher order bits to fit the int value into the char. The only problem is that the numbers (in hex) 000000FF and FFFFFFFF both will be truncated to the byte FF. Thus, the while loop can not distinguish between reading the byte FF from the file and end of file.

Exactly what the code does in this case depends on whether char is signed or unsigned. Why? Because in line 2, ch is compared with EOF. Since EOF is an int value (1), ch will be extended to an int so that the two values being compared are of the same size (2). As Figure 2.1 showed, whether the variable is signed or unsigned is very important.

If char is unsigned, FF is extended to be 000000FF. This is compared to EOF (FFFFFFFF) and found to be not equal. Thus, the loop never ends!

If char is signed, FF is extended to FFFFFFFF. This does compare as equal and the loop ends. However, since the byte FF may have been read from the file, the loop could be ending prematurely.

The solution to this problem is to define the ch variable as an int, not a char. When this is done, no truncating or extension is done in line 2. Inside the loop, it is safe to truncate the value since ch must actually be a simple byte there.

(1) It is a common misconception that files have an EOF character at their end. This is not true!
(2) The reason for this requirement will be shown later.
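A corrected version of the loop in Figure 2.2 might look like the following C sketch. The function count_bytes and its counting body are invented for illustration; the only change that matters is that ch is declared as an int.

```c
#include <stdio.h>

/* Count the bytes in a stream. Because ch is an int, the byte FF
   (returned as 000000FF) and EOF (usually FFFFFFFF) stay distinct. */
long count_bytes(FILE *fp)
{
    int ch;                 /* int, not char */
    long count = 0;

    while ((ch = fgetc(fp)) != EOF) {
        /* here ch holds a plain byte value (0 to 255), so it could
           safely be stored in an unsigned char if needed */
        count++;
    }
    return count;
}
```

With this declaration a file containing the byte FF is read to its real end, whether the compiler treats char as signed or unsigned.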

2.1.3 Two's complement arithmetic

As was seen earlier, the add instruction performs addition and the sub instruction performs subtraction. Two of the bits in the FLAGS register that these instructions set are the overflow and carry flag. The overflow flag is set if the true result of the operation is too big to fit into the destination for signed arithmetic. The carry flag is set if there is a carry in the msb of an addition or a borrow in the msb of a subtraction. Thus, it can be used to detect overflow for unsigned arithmetic. The uses of the carry flag for signed arithmetic will be seen shortly. One of the great advantages of 2's complement is that the rules for addition and subtraction are exactly the same as for unsigned integers. Thus, add and sub may be used on signed or unsigned integers.

      002C         44
    + FFFF     + (-1)
      002B         43

There is a carry generated, but it is not part of the answer.

There are two different multiply and divide instructions. First, to multiply use either the MUL or IMUL instruction. The MUL instruction is used to multiply unsigned numbers and IMUL is used to multiply signed integers. Why are two different instructions needed? The rules for multiplication are different for unsigned and 2's complement signed numbers. How so? Consider the multiplication of the byte FF with itself yielding a word result. Using unsigned multiplication this is 255 times 255 or 65025 (or FE01 in hex). Using signed multiplication this is -1 times -1 or 1 (or 0001 in hex).

There are several forms of the multiplication instructions. The oldest form looks like:


        mul     source

The source is either a register or a memory reference. It can not be an immediate value. Exactly what multiplication is performed depends on the size of the source operand. If the operand is byte sized, it is multiplied by the byte in the AL register and the result is stored in the 16 bits of AX. If the source is 16-bit, it is multiplied by the word in AX and the 32-bit result is stored in DX:AX. If the source is 32-bit, it is multiplied by EAX and the 64-bit result is stored into EDX:EAX.

The IMUL instruction has the same formats as MUL, but also adds some other instruction formats. There are two and three operand formats:

        imul    dest, source1
        imul    dest, source1, source2

    dest     source1      source2    Action
             reg/mem8                AX = AL*source1
             reg/mem16               DX:AX = AX*source1
             reg/mem32               EDX:EAX = EAX*source1
    reg16    reg/mem16               dest *= source1
    reg32    reg/mem32               dest *= source1
    reg16    immed8                  dest *= immed8
    reg32    immed8                  dest *= immed8
    reg16    immed16                 dest *= immed16
    reg32    immed32                 dest *= immed32
    reg16    reg/mem16    immed8     dest = source1*source2
    reg32    reg/mem32    immed8     dest = source1*source2
    reg16    reg/mem16    immed16    dest = source1*source2
    reg32    reg/mem32    immed32    dest = source1*source2

Table 2.2: imul Instructions

Table 2.2 shows the possible combinations.
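The reason for having both MUL and IMUL can be seen by multiplying the same two bytes under each interpretation. The C sketch below is only a model of the byte-sized forms (AX = AL*source); the function names are made up for this illustration.

```c
#include <stdint.h>

/* Model of the byte form of MUL: both bytes are unsigned. */
uint16_t mul_unsigned_bytes(uint8_t a, uint8_t b)
{
    return (uint16_t)((unsigned)a * (unsigned)b);
}

/* Model of the byte form of IMUL: both bytes are two's complement. */
uint16_t mul_signed_bytes(uint8_t a, uint8_t b)
{
    int sa = (a >= 0x80) ? (int)a - 256 : (int)a;   /* signed value of a */
    int sb = (b >= 0x80) ? (int)b - 256 : (int)b;   /* signed value of b */
    return (uint16_t)(sa * sb);                     /* 16-bit result     */
}
```

Multiplying FF by FF gives FE01 with the unsigned model and 0001 with the signed model, the same two word results worked out in the text.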


The two division operators are DIV and IDIV. They perform unsigned and signed integer division respectively. The general format is:

        div     source

If the source is 8-bit, then AX is divided by the operand. The quotient is stored in AL and the remainder in AH. If the source is 16-bit, then DX:AX is divided by the operand. The quotient is stored into AX and the remainder into DX. If the source is 32-bit, then EDX:EAX is divided by the operand and the quotient is stored into EAX and the remainder into EDX. The IDIV instruction works the same way. There are no special IDIV instructions like the special IMUL ones. If the quotient is too big to fit into its register or the divisor is zero, the program is interrupted and terminates. A very common error is to forget to initialize DX or EDX before division.

The NEG instruction negates its single operand by computing its two's complement. Its operand may be any 8-bit, 16-bit, or 32-bit register or memory location.
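The 32-bit signed case (CDQ followed by IDIV) can be modelled in C as a rough sketch. The struct divmod32 and the function idiv32 are hypothetical names. The model assumes the caller avoids a zero divisor and a quotient that does not fit in 32 bits, which are the cases where the real instruction would interrupt the program.

```c
#include <stdint.h>

/* Quotient and remainder as IDIV would leave them in EAX and EDX. */
typedef struct {
    int32_t quotient;
    int32_t remainder;
} divmod32;

/* Model of: cdq ; idiv divisor
   Assumes divisor != 0 and that the quotient fits in 32 bits. */
divmod32 idiv32(int32_t dividend, int32_t divisor)
{
    int64_t wide = dividend;    /* cdq: sign extend EAX into EDX:EAX */
    divmod32 result;

    result.quotient  = (int32_t)(wide / divisor);   /* goes to EAX */
    result.remainder = (int32_t)(wide % divisor);   /* goes to EDX */
    return result;
}
```

C division truncates toward zero, as IDIV does, so dividing -7 by 2 yields a quotient of -3 and a remainder of -1.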

2.1.4 Example program

math.asm

%include "asm_io.inc"
segment .data                   ; Output strings
prompt          db    "Enter a number: ", 0
square_msg      db    "Square of input is ", 0
cube_msg        db    "Cube of input is ", 0
cube25_msg      db    "Cube of input times 25 is ", 0
quot_msg        db    "Quotient of cube/100 is ", 0
rem_msg         db    "Remainder of cube/100 is ", 0
neg_msg         db    "The negation of the remainder is ", 0

segment .bss
input           resd 1

segment .text
        global  _asm_main
_asm_main:
        enter   0,0               ; setup routine
        pusha

        mov     eax, prompt
        call    print_string

        call    read_int
        mov     [input], eax

        imul    eax               ; edx:eax = eax * eax
        mov     ebx, eax          ; save answer in ebx
        mov     eax, square_msg
        call    print_string
        mov     eax, ebx
        call    print_int
        call    print_nl

        mov     ebx, eax
        imul    ebx, [input]      ; ebx *= [input]
        mov     eax, cube_msg
        call    print_string
        mov     eax, ebx
        call    print_int
        call    print_nl

        imul    ecx, ebx, 25      ; ecx = ebx*25
        mov     eax, cube25_msg
        call    print_string
        mov     eax, ecx
        call    print_int
        call    print_nl

        mov     eax, ebx
        cdq                       ; initialize edx by sign extension
        mov     ecx, 100          ; can't divide by immediate value
        idiv    ecx               ; edx:eax / ecx
        mov     ecx, eax          ; save quotient into ecx
        mov     eax, quot_msg
        call    print_string
        mov     eax, ecx
        call    print_int
        call    print_nl
        mov     eax, rem_msg
        call    print_string
        mov     eax, edx
        call    print_int
        call    print_nl

        neg     edx               ; negate the remainder
        mov     eax, neg_msg
        call    print_string
        mov     eax, edx
        call    print_int
        call    print_nl

        popa
        mov     eax, 0            ; return back to C
        leave
        ret

2.1.5 Extended precision arithmetic

Assembly language also provides instructions that allow one to perform addition and subtraction of numbers larger than double words. These instructions use the carry flag. As stated above, both the ADD and SUB instructions modify the carry flag if a carry or borrow is generated, respectively. This information stored in the carry flag can be used to add or subtract large numbers by breaking up the operation into smaller double word (or smaller) pieces.

The ADC and SBB instructions use this information in the carry flag. The ADC instruction performs the following operation:

    operand1 = operand1 + carry flag + operand2

The SBB instruction performs:

    operand1 = operand1 - carry flag - operand2

How are these used? Consider the sum of 64-bit integers in EDX:EAX and EBX:ECX. The following code would store the sum in EDX:EAX:

        add     eax, ecx          ; add lower 32-bits
        adc     edx, ebx          ; add upper 32-bits and carry from previous sum

Subtraction is very similar. The following code subtracts EBX:ECX from EDX:EAX:

        sub     eax, ecx          ; subtract lower 32-bits
        sbb     edx, ebx          ; subtract upper 32-bits and borrow

For really large numbers, a loop could be used (see Section 2.2). For a sum loop, it would be convenient to use the ADC instruction for every iteration (instead of all but the first iteration). This can be done by using the CLC (CLear Carry) instruction right before the loop starts to initialize the carry flag to 0. If the carry flag is 0, there is no difference between the ADD and ADC instructions. The same idea can be used for subtraction, too.
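The ADD/ADC and SUB/SBB pairs can be imitated in C with two 32-bit halves and an explicit carry or borrow. This is a sketch for illustration only; the type u64_pair and the functions add64 and sub64 are names invented here.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, like EDX:EAX. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} u64_pair;

/* Model of: add lo, lo ; adc hi, hi */
u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair sum;
    sum.lo = a.lo + b.lo;                 /* add: low halves, may wrap */
    uint32_t carry = (sum.lo < a.lo);     /* carry flag after the add  */
    sum.hi = a.hi + b.hi + carry;         /* adc: high halves + carry  */
    return sum;
}

/* Model of: sub lo, lo ; sbb hi, hi */
u64_pair sub64(u64_pair a, u64_pair b)
{
    u64_pair diff;
    uint32_t borrow = (a.lo < b.lo);      /* carry flag after the sub  */
    diff.lo = a.lo - b.lo;                /* sub: low halves           */
    diff.hi = a.hi - b.hi - borrow;       /* sbb: high halves - borrow */
    return diff;
}
```

Adding 1 to the pair 00000001:FFFFFFFF wraps the low half to 0 and carries into the high half, giving 00000002:00000000.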

2.2 Control Structures

High level languages provide high level control structures (e.g., the if and while statements) that control the thread of execution. Assembly language does not provide such complex control structures. It instead uses the infamous goto, which used inappropriately can result in spaghetti code! However, it is possible to write structured assembly language programs. The basic procedure is to design the program logic using the familiar high level control structures and translate the design into the appropriate assembly language (much like a compiler would do).

2.2.1 Comparisons

Control structures decide what to do based on comparisons of data. In assembly, the result of a comparison is stored in the FLAGS register to be used later. The 80x86 provides the CMP instruction to perform comparisons. The FLAGS register is set based on the difference of the two operands of the CMP instruction. The operands are subtracted and the FLAGS are set based on the result, but the result is not stored anywhere. If you need the result use the SUB instead of the CMP instruction.

For unsigned integers, there are two flags (bits in the FLAGS register) that are important: the zero (ZF) and carry (CF) flags. The zero flag is set (1) if the resulting difference would be zero. The carry flag is used as a borrow flag for subtraction. Consider a comparison like:

        cmp     vleft, vright

The difference of vleft - vright is computed and the flags are set accordingly. If the difference computed by CMP is zero, vleft = vright, then ZF is set (i.e. 1) and the CF is unset (i.e. 0). If vleft > vright, then ZF is unset and CF is unset (no borrow). If vleft < vright, then ZF is unset and CF is set (borrow).

For signed integers, there are three flags that are important: the zero (ZF) flag, the overflow (OF) flag and the sign (SF) flag. The overflow flag is set if the result of an operation overflows (or underflows). The sign flag is set if the result of an operation is negative. If vleft = vright, the ZF is set (just as for unsigned integers). If vleft > vright, ZF is unset and SF = OF. If vleft < vright, ZF is unset and SF != OF.

Note: Why does SF = OF if vleft > vright? If there is no overflow, then the difference will have the correct value and must be non-negative. Thus, SF = OF = 0. However, if there is an overflow, the difference will not have the correct value (and in fact will be negative). Thus, SF = OF = 1.

Do not forget that other instructions can also change the FLAGS register, not just CMP.

2.2.2 Branch instructions

Branch instructions can transfer execution to arbitrary points of a program. In other words, they act like a goto. There are two types of branches: unconditional and conditional. An unconditional branch is just like a goto; it always makes the branch. A conditional branch may or may not make the branch depending on the flags in the FLAGS register. If a conditional branch does not make the branch, control passes to the next instruction.

The JMP (short for jump) instruction makes unconditional branches. Its single argument is usually a code label to the instruction to branch to. The assembler or linker will replace the label with the correct address of the instruction. This is another one of the tedious operations that the assembler does to make the programmer's life easier. It is important to realize that the statement immediately after the JMP instruction will never be executed unless another instruction branches to it!
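Returning briefly to the flag rules of Section 2.2.1, the way CMP sets ZF, CF, SF and OF for 32-bit operands can be modelled in C. This is a sketch under the assumption of 32-bit values; the struct flags and all function names are invented for illustration and are not part of any real API.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four flags discussed in the text. */
typedef struct {
    bool zf, cf, sf, of;
} flags;

/* Model of: cmp vleft, vright  (computes vleft - vright, keeps only flags) */
flags cmp32(uint32_t vleft, uint32_t vright)
{
    uint32_t diff = vleft - vright;
    flags f;

    f.zf = (diff == 0);             /* difference is zero               */
    f.cf = (vleft < vright);        /* a borrow was needed              */
    f.sf = (diff >> 31) != 0;       /* sign bit of the difference       */
    /* signed overflow: the operands differ in sign and the sign of the
       difference differs from the sign of vleft */
    f.of = (((vleft ^ vright) & (vleft ^ diff)) >> 31) != 0;
    return f;
}

/* The conditions from the text, read back from the flags. */
bool unsigned_above(flags f)  { return !f.zf && !f.cf; }
bool unsigned_below(flags f)  { return f.cf; }
bool signed_greater(flags f)  { return !f.zf && f.sf == f.of; }
bool signed_less(flags f)     { return f.sf != f.of; }
```

Comparing FFFFFFFF with 1 shows the difference between the two views: the flags report "above" for unsigned values but "less" for signed ones, since FFFFFFFF is -1 when read as signed.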


There are several variations of the jump instruction:

SHORT This jump is very limited in range. It can only move up or down 128 bytes in memory. The advantage of this type is that it uses less memory than the others. It uses a single signed byte to store the displacement of the jump. The displacement is how many bytes to move ahead or behind. (The displacement is added to EIP.) To specify a short jump, use the SHORT keyword immediately before the label in the JMP instruction.

NEAR This jump is the default type for both unconditional and conditional branches; it can be used to jump to any location in a segment. Actually, the 80386 supports two types of near jumps. One uses two bytes for the displacement. This allows one to move up or down roughly 32,000 bytes. The other type uses four bytes for the displacement, which of course allows one to move to any location in the code segment. The four byte type is the default in 386 protected mode. The two byte type can be specified by putting the WORD keyword before the label in the JMP instruction.

FAR This jump allows control to move to another code segment. This is a very rare thing to do in 386 protected mode.

Valid code labels follow the same rules as data labels. Code labels are defined by placing them in the code segment in front of the statement they label. A colon is placed at the end of the label at its point of definition. The colon is not part of the name.

There are many different conditional branch instructions. They also take a code label as their single operand. The simplest ones just look at a single flag in the FLAGS register to determine whether to branch or not. See Table 2.3 for a list of these instructions. (PF is the parity flag which indicates the odd or evenness of the number of bits set in the lower 8-bits of the result.)

    JZ     branches only if ZF is set
    JNZ    branches only if ZF is unset
    JO     branches only if OF is set
    JNO    branches only if OF is unset
    JS     branches only if SF is set
    JNS    branches only if SF is unset
    JC     branches only if CF is set
    JNC    branches only if CF is unset
    JP     branches only if PF is set
    JNP    branches only if PF is unset

Table 2.3: Simple Conditional Branches

The following pseudo-code:

    if ( EAX == 0 )
      EBX = 1;
    else
      EBX = 2;

could be written in assembly as:

        cmp     eax, 0            ; set flags (ZF set if eax - 0 = 0)
        jz      thenblock         ; if ZF is set branch to thenblock
        mov     ebx, 2            ; ELSE part of IF
        jmp     next              ; jump over THEN part of IF
thenblock:
        mov     ebx, 1            ; THEN part of IF
next:

Other comparisons are not so easy using the conditional branches in Table 2.3. To illustrate, consider the following pseudo-code:

    if ( EAX >= 5 )
      EBX = 1;
    else
      EBX = 2;

If EAX is greater than or equal to five, the ZF may be set or unset and SF will equal OF. Here is assembly code that tests for these conditions (assuming that EAX is signed):

        cmp     eax, 5
        js      signon            ; goto signon if SF = 1
        jo      elseblock         ; goto elseblock if OF = 1 and SF = 0
        jmp     thenblock         ; goto thenblock if SF = 0 and OF = 0
signon:
        jo      thenblock         ; goto thenblock if SF = 1 and OF = 1
elseblock:
        mov     ebx, 2
        jmp     next
thenblock:
        mov     ebx, 1
next:

The above code is very awkward. Fortunately, the 80x86 provides additional branch instructions to make these types of tests much easier. There are signed and unsigned versions of each. Table 2.4 shows these instructions. The equal and not equal branches (JE and JNE) are the same for both signed and unsigned integers. (In fact, JE and JNE are really identical to JZ and JNZ, respectively.) Each of the other branch instructions has two synonyms. For example, look at JL (jump less than) and JNGE (jump not greater than or equal to). These are the same instruction because:

    x < y  implies  not(x >= y)

The unsigned branches use A for above and B for below instead of L and G.

    Signed                                   Unsigned
    JE        branches if vleft = vright     JE        branches if vleft = vright
    JNE       branches if vleft != vright    JNE       branches if vleft != vright
    JL, JNGE  branches if vleft < vright     JB, JNAE  branches if vleft < vright
    JLE, JNG  branches if vleft <= vright    JBE, JNA  branches if vleft <= vright
    JG, JNLE  branches if vleft > vright     JA, JNBE  branches if vleft > vright
    JGE, JNL  branches if vleft >= vright    JAE, JNB  branches if vleft >= vright

Table 2.4: Signed and Unsigned Comparison Instructions

Using these new branch instructions, the pseudo-code above can be translated to assembly much more easily.

        cmp     eax, 5
        jge     thenblock
        mov     ebx, 2
        jmp     next
thenblock:
        mov     ebx, 1
next:

2.2.3 The loop instructions

The 80x86 provides several instructions designed to implement for-like loops. Each of these instructions takes a code label as its single operand.

LOOP Decrements ECX, if ECX != 0, branches to label

LOOPE, LOOPZ Decrements ECX (FLAGS register is not modified), if ECX != 0 and ZF = 1, branches

LOOPNE, LOOPNZ Decrements ECX (FLAGS unchanged), if ECX != 0 and ZF = 0, branches

The last two loop instructions are useful for sequential search loops. The following pseudo-code:

    sum = 0;
    for( i=10; i >0; i-- )
      sum += i;

could be translated into assembly as:

        mov     eax, 0            ; eax is sum
        mov     ecx, 10           ; ecx is i
loop_start:
        add     eax, ecx
        loop    loop_start

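The behavior of the LOOP instruction in the example above can be imitated in C with a goto. This is a sketch only; the function name sum_with_loop is made up, and the variables are named after the registers they stand for.

```c
/* Model of the assembly sum loop: ecx counts down, eax accumulates. */
unsigned sum_with_loop(unsigned count)
{
    unsigned eax = 0;           /* eax is sum */
    unsigned ecx = count;       /* ecx is i   */

loop_start:
    eax += ecx;                 /* add  eax, ecx */
    if (--ecx != 0)             /* loop loop_start: decrement ecx, */
        goto loop_start;        /* branch if ecx is not zero       */
    return eax;
}
```

A count of 10 gives 55. Because the decrement happens before the test, starting with a count of 0 would wrap around and run the body about four billion times, so the count should be at least 1.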
2.3 Translating Standard Control Structures

This section looks at how the standard control structures of high level languages can be implemented in assembly language.

2.3.1 If statements

The following pseudo-code:

    if ( condition )
      then_block;
    else
      else_block;

could be implemented as:

        ; code to set FLAGS
        jxx     else_block        ; select xx so that branches if condition false
        ; code for then block
        jmp     endif
else_block:
        ; code for else block
endif:

If there is no else, then the else_block branch can be replaced by a branch to endif.

        ; code to set FLAGS
        jxx     endif             ; select xx so that branches if condition false
        ; code for then block
endif:

2.3.2 While loops

The while loop is a top tested loop:

    while( condition ) {
      body of loop;
    }

This could be translated into:

while:
        ; code to set FLAGS based on condition
        jxx     endwhile          ; select xx so that branches if false
        ; body of loop
        jmp     while
endwhile:

2.3.3 Do while loops

The do while loop is a bottom tested loop:

    do {
      body of loop;
    } while( condition );

This could be translated into:

do:
        ; body of loop
        ; code to set FLAGS based on condition
        jxx     do                ; select xx so that branches if true
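The two translation patterns can be tried out in C by restricting oneself to if and goto, which mirror the conditional and unconditional branches. The functions below are a sketch written for this purpose; their names and the summing bodies are made up.

```c
/* Top tested loop: the test comes first, so the body may run zero times. */
unsigned sum_while(unsigned n)
{
    unsigned sum = 0, i = 1;

while_top:
    if (!(i <= n))              /* jxx endwhile: branch if condition false */
        goto endwhile;
    sum += i;                   /* body of loop */
    i++;
    goto while_top;             /* jmp while */
endwhile:
    return sum;
}

/* Bottom tested loop: the body always runs at least once. */
unsigned sum_do_while(unsigned n)
{
    unsigned sum = 0, i = 1;

do_top:
    sum += i;                   /* body of loop */
    i++;
    if (i <= n)                 /* jxx do: branch if condition true */
        goto do_top;
    return sum;
}
```

For n = 10 both functions return 55. For n = 0 the top tested version returns 0, while the bottom tested version returns 1 because its body executes once before the test.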

2.4 Example: Finding Prime Numbers

This section looks at a program that finds prime numbers. Recall that prime numbers are evenly divisible by only 1 and themselves. There is no formula for doing this. The basic method this program uses is to find the factors of all odd numbers (3) below a given limit. If no factor can be found for an odd number, it is prime. Figure 2.3 shows the basic algorithm written in C.

(3) 2 is the only even prime number.

    unsigned guess;           /* current guess for prime */
    unsigned factor;          /* possible factor of guess */
    unsigned limit;           /* find primes up to this value */

    printf("Find primes up to: ");
    scanf("%u", &limit);
    printf("2\n");            /* treat first two primes as */
    printf("3\n");            /* special case */
    guess = 5;                /* initial guess */
    while ( guess <= limit ) {
      /* look for a factor of guess */
      factor = 3;
      while ( factor*factor < guess &&
              guess % factor != 0 )
        factor += 2;
      if ( guess % factor != 0 )
        printf("%u\n", guess);
      guess += 2;             /* only look at odd numbers */
    }

Figure 2.3

Here's the assembly version:
prime.asm

%include "asm_io.inc"

segment .data
Message         db      "Find primes up to: ", 0

segment .bss
Limit           resd    1               ; find primes up to this limit
Guess           resd    1               ; the current guess for prime

segment .text
        global  _asm_main
_asm_main:
        enter   0,0                     ; setup routine
        pusha

        mov     eax, Message
        call    print_string
        call    read_int                ; scanf("%u", & limit );
        mov     [Limit], eax

        mov     eax, 2                  ; printf("2\n");
        call    print_int
        call    print_nl
        mov     eax, 3                  ; printf("3\n");
        call    print_int
        call    print_nl

        mov     dword [Guess], 5        ; Guess = 5;
while_limit:                            ; while ( Guess <= Limit )
        mov     eax, [Guess]
        cmp     eax, [Limit]
        jnbe    end_while_limit         ; use jnbe since numbers are unsigned

        mov     ebx, 3                  ; ebx is factor = 3;
while_factor:
        mov     eax, ebx
        mul     eax                     ; edx:eax = eax*eax
        jo      end_while_factor        ; if answer won't fit in eax alone
        cmp     eax, [Guess]
        jnb     end_while_factor        ; if !(factor*factor < guess)
        mov     eax, [Guess]
        mov     edx, 0
        div     ebx                     ; edx = edx:eax % ebx
        cmp     edx, 0
        je      end_while_factor        ; if !(guess % factor != 0)

        add     ebx, 2                  ; factor += 2;
        jmp     while_factor
end_while_factor:
        je      end_if                  ; if !(guess % factor != 0)
        mov     eax, [Guess]            ; printf("%u\n")
        call    print_int
        call    print_nl
end_if:
        add     dword [Guess], 2        ; guess += 2
        jmp     while_limit
end_while_limit:

        popa
        mov     eax, 0                  ; return back to C
        leave
        ret
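The trial-division algorithm above can also be wrapped in a function that collects the primes instead of printing them, which makes it easier to check. The sketch below is an illustration, not part of the original program: the name find_primes and its output-array parameters are hypothetical, and unlike the original it only reports 2 and 3 when they do not exceed the limit. It assumes limits small enough that factor*factor and guess + 2 do not overflow an unsigned.

```c
#include <stddef.h>

/* Stores the primes <= limit in out[] (at most max_out of them)
   and returns how many were stored. */
size_t find_primes(unsigned limit, unsigned *out, size_t max_out)
{
    size_t count = 0;
    unsigned guess, factor;

    /* treat the first two primes as special cases */
    if (limit >= 2 && count < max_out) out[count++] = 2;
    if (limit >= 3 && count < max_out) out[count++] = 3;

    for (guess = 5; guess <= limit; guess += 2) {  /* only odd numbers */
        factor = 3;
        /* look for an odd factor no larger than sqrt(guess) */
        while (factor * factor < guess && guess % factor != 0)
            factor += 2;
        if (guess % factor != 0 && count < max_out)
            out[count++] = guess;
    }
    return count;
}
```

For a limit of 30 this stores the ten primes from 2 through 29.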

Chapter 3

Bit Operations

3.1 Shift Operations

Assembly language allows the programmer to manipulate the individual bits of data. One common bit operation is called a shift. A shift operation moves the position of the bits of some data. Shifts can be either toward the left (i.e. toward the most significant bits) or toward the right (the least significant bits).

3.1.1 Logical shifts

A logical shift is the simplest type of shift. It shifts in a very straightforward manner. Figure 3.1 shows an example of a shifted single byte number.

[Figure 3.1: Logical shifts. Rows: Original, Left shifted, Right shifted]

Note that new, incoming bits are always zero. The SHL and SHR instructions are used to perform logical left and right shifts respectively. These instructions allow one to shift by any number of positions. The number of positions to shift can either be a constant or can be stored in the CL register. The last bit shifted out of the data is stored in the carry flag. Here are some code examples:

        mov    ax, 0C123H
        shl    ax, 1           ; shift 1 bit to left,   ax = 8246H, CF = 1
        shr    ax, 1           ; shift 1 bit to right,  ax = 4123H, CF = 0
        shr    ax, 1           ; shift 1 bit to right,  ax = 2091H, CF = 1
        mov    ax, 0C123H
        shl    ax, 2           ; shift 2 bits to left,  ax = 048CH, CF = 1
        mov    cl, 3
        shr    ax, cl          ; shift 3 bits to right, ax = 0091H, CF = 1

3.1.2 Use of shifts

Fast multiplication and division are the most common uses of shift operations. Recall that in the decimal system, multiplication and division by a power of ten are simple: just shift digits. The same is true for powers of two in binary. For example, to double the binary number 1011 (11 in decimal), shift once to the left to get 10110 (22 in decimal). The quotient of a division by a power of two is the result of a right shift. To divide by just 2, use a single right shift; to divide by 4 (2^2), shift right 2 places; to divide by 8 (2^3), shift 3 places to the right, etc. Shift instructions are very basic and are much faster than the corresponding MUL and DIV instructions!

Actually, logical shifts can be used to multiply and divide unsigned values. They do not work in general for signed values. Consider the 2-byte value FFFF (signed -1). If it is logically right shifted once, the result is 7FFF, which is +32,767! Another type of shift can be used for signed values.
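The same arithmetic can be tried in C on 16-bit unsigned values. The helper names mul_pow2 and div_pow2 below are made up for this sketch, and the shift count k is assumed to be between 0 and 15.

```c
#include <stdint.h>

/* Multiply an unsigned 16-bit value by 2^k with a left shift.
   Bits shifted past bit 15 are lost, as with SHL on a 16-bit register. */
uint16_t mul_pow2(uint16_t x, unsigned k)
{
    return (uint16_t)(x << k);
}

/* Divide an unsigned 16-bit value by 2^k with a logical right shift.
   The result is the quotient; the remainder bits are discarded. */
uint16_t div_pow2(uint16_t x, unsigned k)
{
    return (uint16_t)(x >> k);
}
```

Here mul_pow2(11, 1) gives 22 and div_pow2(100, 2) gives 25. Applying div_pow2 to 0xFFFF once gives 0x7FFF, which shows why a logical shift is the wrong tool when the bit pattern is meant to be the signed value -1.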

3.1.3 Arithmetic shifts

These shifts are designed to allow signed numbers to be quickly multiplied and divided by powers of 2. They ensure that the sign bit is treated correctly.

SAL Shift Arithmetic Left - This instruction is just a synonym for SHL. It is assembled into exactly the same machine code as SHL. As long as the sign bit is not changed by the shift, the result will be correct.

SAR Shift Arithmetic Right - This is a new instruction that does not shift the sign bit (i.e. the msb) of its operand. The other bits are shifted as normal except that the new bits that enter from the left are copies of the sign bit (that is, if the sign bit is 1, the new bits are also 1). Thus, if a byte is shifted with this instruction, only the lower 7 bits are shifted. As for the other shifts, the last bit shifted out is stored in the carry flag.

        mov    ax, 0C123H
        sal    ax, 1           ; ax = 8246H, CF = 1
        sal    ax, 1           ; ax = 048CH, CF = 1
        sar    ax, 2           ; ax = 0123H, CF = 0
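To make the sign-bit rule concrete, here is a small C sketch that imitates SAR on a 16-bit pattern using only unsigned operations. The function name sar16 is hypothetical; it does not model the carry flag.

```c
#include <stdint.h>

/* Arithmetic right shift of a 16-bit pattern by count positions.
   Each step moves the bits right by one and copies the old sign
   bit (bit 15) back into the top position. */
uint16_t sar16(uint16_t x, unsigned count)
{
    while (count-- > 0)
        x = (uint16_t)((x >> 1) | (x & 0x8000u));
    return x;
}
```

With this helper, sar16(0x048C, 2) gives 0x0123 as in the example above, and sar16(0xFFFE, 1) gives 0xFFFF, i.e. -2 divided by 2 is -1.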

3.1.4 Rotate shifts

The rotate shift instructions work like logical shifts except that bits lost off one end of the data are shifted in on the other side. Thus, the data is treated as if it is a circular structure. The two simplest rotate instructions are ROL and ROR, which make left and right rotations, respectively. Just as for the other shifts, these shifts leave a copy of the last bit shifted around in the carry flag.

        mov    ax, 0C123H
        rol    ax, 1           ; ax = 8247H, CF = 1
        rol    ax, 1           ; ax = 048FH, CF = 1
        rol    ax, 1           ; ax = 091EH, CF = 0
        ror    ax, 2           ; ax = 8247H, CF = 1
        ror    ax, 1           ; ax = C123H, CF = 1

There are two additional rotate instructions that shift the bits in the data and the carry flag, named RCL and RCR. For example, if the AX register is rotated with these instructions, the 17 bits made up of AX and the carry flag are rotated.

        mov    ax, 0C123H
        clc                    ; clear the carry flag (CF = 0)
        rcl    ax, 1           ; ax = 8246H, CF = 1
        rcl    ax, 1           ; ax = 048DH, CF = 1
        rcl    ax, 1           ; ax = 091BH, CF = 0
        rcr    ax, 2           ; ax = 8246H, CF = 1
        rcr    ax, 1           ; ax = C123H, CF = 0

3.1.5 Simple application

Here is a code snippet that counts the number of bits that are "on" (i.e. 1) in the EAX register.

1          mov    bl, 0           ; bl will contain the count of ON bits
2          mov    ecx, 32         ; ecx is the loop counter
3  count_loop:
4          shl    eax, 1          ; shift bit into carry flag
5          jnc    skip_inc        ; if CF == 0, goto skip_inc
6          inc    bl
7  skip_inc:
8          loop   count_loop

The above code destroys the original value of EAX (EAX is zero at the end of the loop). If one wished to retain the value of EAX, line 4 could be replaced with rol eax, 1.

3.2 Boolean Bitwise Operations

There are four common boolean operators: AND, OR, XOR and NOT. A truth table shows the result of each operation for each possible value of its operands.

3.2.1 The AND operation

The result of the AND of two bits is only 1 if both bits are 1, else the result is 0 as the truth table in Table 3.1 shows.

X  Y  X AND Y
0  0     0
0  1     0
1  0     0
1  1     1

Table 3.1: The AND operation

Processors support these operations as instructions that act independently on all the bits of data in parallel. For example, if the contents of AL and BL are ANDed together, the basic AND operation is applied to each of the 8 pairs of corresponding bits in the two registers as Figure 3.2 shows. Below is a code example:

[Figure 3.2: ANDing a byte]

        mov    ax, 0C123H
        and    ax, 82F6H       ; ax = 8022H

3.2.2 The OR operation

The inclusive OR of 2 bits is 0 only if both bits are 0, else the result is 1 as the truth table in Table 3.2 shows. Below is a code example:

        mov    ax, 0C123H
        or     ax, 0E831H      ; ax = E933H

X  Y  X OR Y
0  0    0
0  1    1
1  0    1
1  1    1

Table 3.2: The OR operation

3.2.3 The XOR operation

The exclusive OR of 2 bits is 0 if and only if both bits are equal, else the result is 1 as the truth table in Table 3.3 shows. Below is a code example:

        mov    ax, 0C123H
        xor    ax, 0E831H      ; ax = 2912H

X  Y  X XOR Y
0  0     0
0  1     1
1  0     1
1  1     0

Table 3.3: The XOR operation
3.2.4 The NOT operation

The NOT operation is a unary operation (i.e. it acts on one operand, not two like binary operations such as AND). The NOT of a bit is the opposite value of the bit as the truth table in Table 3.4 shows. Below is a code example:

        mov    ax, 0C123H
        not    ax              ; ax = 3EDCH

X  NOT X
0    1
1    0

Table 3.4: The NOT operation

Note that the NOT finds the one's complement. Unlike the other bitwise operations, the NOT instruction does not change any of the bits in the FLAGS register.

3.2.5 The TEST instruction

The TEST instruction performs an AND operation, but does not store the result. It only sets the FLAGS register based on what the result would be (much like how the CMP instruction performs a subtraction but only sets FLAGS). For example, if the result would be zero, ZF would be set.

Turn on bit i       OR the number with 2^i (which is the binary number with just bit i on)
Turn off bit i      AND the number with the binary number with only bit i off. This operand is often called a mask
Complement bit i    XOR the number with 2^i

Table 3.5: Uses of boolean operations

3.2.6 Uses of bit operations

Bit operations are very useful for manipulating individual bits of data without modifying the other bits. Table 3.5 shows three common uses of these operations. Below is some example code, implementing these ideas.

        mov    ax, 0C123H
        or     ax, 8           ; turn on bit 3,   ax = C12BH
        and    ax, 0FFDFH      ; turn off bit 5,  ax = C10BH
        xor    ax, 8000H       ; invert bit 15,   ax = 410BH
        or     ax, 0F00H       ; turn on nibble,  ax = 4F0BH
        and    ax, 0FFF0H      ; turn off nibble, ax = 4F00H
        xor    ax, 0F00FH      ; invert nibbles,  ax = BF0FH
        xor    ax, 0FFFFH      ; 1's complement,  ax = 40F0H

The AND operation can also be used to find the remainder of a division by a power of two. To find the remainder of a division by 2^i, AND the number with a mask equal to 2^i - 1. This mask will contain ones from bit 0 up to bit i - 1. It is just these bits that contain the remainder. The result of the AND will keep these bits and zero out the others. Next is a snippet of code that finds the quotient and remainder of the division of 100 by 16.

        mov    eax, 100        ; 100 = 64H
        mov    ebx, 0000000FH  ; mask = 16 - 1 = 15 or F
        and    ebx, eax        ; ebx = remainder = 4
        shr    eax, 4          ; eax = quotient = 6

Using the CL register it is possible to modify arbitrary bits of data. Next is an example that sets (turns on) an arbitrary bit in EAX. The number of the bit to set is stored in BH.

        mov    cl, bh          ; first build the number to OR with
        mov    ebx, 1
        shl    ebx, cl         ; shift left cl times
        or     eax, ebx        ; turn on bit

Turning a bit off is just a little harder.

        mov    cl, bh          ; first build the number to AND with
        mov    ebx, 1
        shl    ebx, cl         ; shift left cl times
        not    ebx             ; invert bits
        and    eax, ebx        ; turn off bit

Code to complement an arbitrary bit is left as an exercise for the reader.

        mov    bl, 0           ; bl will contain the count of ON bits
        mov    ecx, 32         ; ecx is the loop counter
count_loop:
        shl    eax, 1          ; shift bit into carry flag
        adc    bl, 0           ; add just the carry flag to bl
        loop   count_loop

Figure 3.3: Counting bits with ADC

It is not uncommon to see the following puzzling instruction in an 80x86 program:

        xor    eax, eax        ; eax = 0

A number XORed with itself always results in zero. This instruction is used because its machine code is smaller than the corresponding MOV instruction.
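For readers who want to experiment with these masks outside of assembly, the same operations can be sketched in C. The function names below are hypothetical, bit numbers are assumed to be in the range 0 to 31, and toggle_bit is one possible answer to the complement exercise rather than the author's.

```c
#include <stdint.h>

/* Turn on bit n: OR with a mask that has only bit n set. */
uint32_t set_bit(uint32_t x, unsigned n)
{
    return x | (UINT32_C(1) << n);
}

/* Turn off bit n: AND with the inverted mask (only bit n clear). */
uint32_t clear_bit(uint32_t x, unsigned n)
{
    return x & ~(UINT32_C(1) << n);
}

/* Complement bit n: XOR with the mask. */
uint32_t toggle_bit(uint32_t x, unsigned n)
{
    return x ^ (UINT32_C(1) << n);
}

/* Remainder of x divided by 2^i: keep bits 0 .. i-1. */
uint32_t mod_pow2(uint32_t x, unsigned i)
{
    return x & ((UINT32_C(1) << i) - 1);
}
```

For example, set_bit(0xC123, 3) is 0xC12B, clear_bit(0xC12B, 5) is 0xC10B, and mod_pow2(100, 4) is 4, matching the assembly examples in this section.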

3.3 Avoiding Conditional Branches

Modern processors use very sophisticated techniques to execute code as quickly as possible. One common technique is known as speculative execution. This technique uses the parallel processing capabilities of the CPU to execute multiple instructions at once. Conditional branches present a problem with this idea. The processor, in general, does not know whether the branch will be taken or not. If it is taken, a different set of instructions will be executed than if it is not taken. Processors try to predict whether the branch will be taken. If the prediction is wrong, the processor has wasted its time executing the wrong code.

One way to avoid this problem is to avoid using conditional branches when possible. The sample code in 3.1.5 provides a simple example of where one could do this. In that example, the "on" bits of the EAX register are counted. It uses a branch to skip the INC instruction. Figure 3.3 shows how the branch can be removed by using the ADC instruction to add the carry flag directly.

The SETxx instructions provide a way to remove branches in certain cases. These instructions set the value of a byte register or memory location to zero or one based on the state of the FLAGS register. The characters after SET are the same characters used for conditional branches. If the corresponding condition of the SETxx is true, the result stored is a one; if false, a zero is stored. For example,

        setz   al              ; AL = 1 if Z flag is set, else 0

Using these instructions, one can develop some clever techniques that calculate values without branches.

For example, consider the problem of finding the maximum of two values. The standard approach to solving this problem would be to use a CMP and use a conditional branch to act on which value was larger. The example program below shows how the maximum can be found without any branches.
 1  ; file: max.asm
 2  %include "asm_io.inc"
 3  segment .data
 4
 5  message1 db "Enter a number: ", 0
 6  message2 db "Enter another number: ", 0
 7  message3 db "The larger number is: ", 0
 8
 9  segment .bss
10
11  input1  resd    1               ; first number entered
12
13  segment .text
14          global  _asm_main
15  _asm_main:
16          enter   0,0             ; setup routine
17          pusha
18
19          mov     eax, message1   ; print out first message
20          call    print_string
21          call    read_int        ; input first number
22          mov     [input1], eax
23
24          mov     eax, message2   ; print out second message
25          call    print_string
26          call    read_int        ; input second number (in eax)
27
28          xor     ebx, ebx        ; ebx = 0
29          cmp     eax, [input1]   ; compare second and first number
30          setg    bl              ; ebx = (input2 > input1) ?          1 : 0
31          neg     ebx             ; ebx = (input2 > input1) ? 0xFFFFFFFF : 0
32          mov     ecx, ebx        ; ecx = (input2 > input1) ? 0xFFFFFFFF : 0
33          and     ecx, eax        ; ecx = (input2 > input1) ?     input2 : 0
34          not     ebx             ; ebx = (input2 > input1) ?          0 : 0xFFFFFFFF
35          and     ebx, [input1]   ; ebx = (input2 > input1) ?          0 : input1
36          or      ecx, ebx        ; ecx = (input2 > input1) ?     input2 : input1
37
38          mov     eax, message3   ; print out result
39          call    print_string
40          mov     eax, ecx
41          call    print_int
42          call    print_nl
43
44          popa
45          mov     eax, 0          ; return back to C
46          leave
47          ret

The trick is to create a bit mask that can be used to select the correct value for the maximum. The SETG instruction in line 30 sets BL to 1 if the second input is the maximum or 0 otherwise. This is not quite the bit mask desired. To create the required bit mask, line 31 uses the NEG instruction on the entire EBX register. (Note that EBX was zeroed out earlier.) If EBX is 0, this does nothing; however, if EBX is 1, the result is the two's complement representation of -1 or 0xFFFFFFFF. This is just the bit mask required. The remaining code uses this bit mask to select the correct input as the maximum.

An alternative trick is to use the DEC instruction. In the above code, if the NEG is replaced with a DEC, again the result will either be 0 or 0xFFFFFFFF. However, the values are reversed compared to using the NEG instruction.
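The mask-selection idea can be written out in C as well. The sketch below mirrors the SETG/NEG/AND/NOT/OR sequence; the function name max_branchless is made up, and whether the compiled code actually contains no branch depends on the compiler, so this illustrates the logic rather than guaranteeing branch-free machine code.

```c
#include <stdint.h>

/* Returns the larger of two signed 32-bit values using a bit mask
   instead of an if statement. */
int32_t max_branchless(int32_t input1, int32_t input2)
{
    /* (input2 > input1) is 1 or 0, like SETG; negating it gives
       all ones (-1) or all zeros, like NEG. */
    int32_t mask = -(int32_t)(input2 > input1);

    /* Keep input2 where the mask is ones, input1 where it is zeros. */
    return (mask & input2) | (~mask & input1);
}
```

When the two inputs are equal the comparison is false, the mask is zero, and input1 is returned, which is still a correct maximum.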

3.4 Manipulating bits in C

3.4.1 The bitwise operators of C

Unlike some high-level languages, C does provide operators for bitwise operations. The AND operation is represented by the binary & operator (footnote 1). The OR operation is represented by the binary | operator. The XOR operation is represented by the binary ^ operator. And the NOT operation is represented by the unary ~ operator.

The shift operations are performed by C's << and >> binary operators. The << operator performs left shifts and the >> operator performs right shifts. These operators take two operands. The left operand is the value to shift and the right operand is the number of bits to shift by. If the value to shift is an unsigned type, a logical shift is made. If the value is a signed type (like int), then an arithmetic shift is normally used (for negative values the C standard leaves the result of >> implementation-defined, but most compilers shift arithmetically). Below is some example C code using these operators:

  short int s;          /* assume that short int is 16-bit */
  short unsigned u;

  s = -1;               /* s = 0xFFFF (2's complement) */
  u = 100;              /* u = 0x0064 */
  u = u | 0x0100;       /* u = 0x0164 */
  s = s & 0xFFF0;       /* s = 0xFFF0 */
  s = s ^ u;            /* s = 0xFE94 */
  u = u << 3;           /* u = 0x0B20 (logical shift) */
  s = s >> 2;           /* s = 0xFFA5 (arithmetic shift) */
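The values in the comments above can be checked with fixed-width types so that the 16-bit assumption is explicit. The following is a sketch under stated assumptions: the function names are invented, the first function works on unsigned bit patterns only, and the signed shift helper relies on the compiler shifting negative values arithmetically, which is common but implementation-defined.

```c
#include <stdint.h>

/* Repeats the OR, AND, XOR and left-shift steps on 16-bit patterns
   and reports the final values through the two pointers. */
void bitwise_demo(uint16_t *u_out, uint16_t *s_bits_out)
{
    uint16_t s = 0xFFFF;        /* bit pattern of -1 */
    uint16_t u = 100;           /* 0x0064 */

    u = u | 0x0100;             /* 0x0164 */
    s = s & 0xFFF0;             /* 0xFFF0 */
    s = s ^ u;                  /* 0xFE94 */
    u = (uint16_t)(u << 3);     /* 0x0B20 */

    *u_out = u;
    *s_bits_out = s;
}

/* Logical right shift: the operand is unsigned, so zeros enter
   from the left. */
uint16_t shift_right_logical(uint16_t u, unsigned n)
{
    return (uint16_t)(u >> n);
}

/* Right shift of a signed value. On compilers that shift negative
   values arithmetically, copies of the sign bit enter from the left. */
int16_t shift_right_signed(int16_t s, unsigned n)
{
    return (int16_t)(s >> n);
}
```

The pattern 0xFE94 is -364 as a signed 16-bit value; shifting it right by 2 arithmetically gives -91, which is the pattern 0xFFA5, while the logical shift of the same pattern gives 0x3FA5.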

3.4.2 Using bitwise operators in C

The bitwise operators are used in C for the same purposes as they are used in assembly language. They allow one to manipulate individual bits of data and can be used for fast multiplication and division. In fact, a smart C compiler will use a shift for a multiplication like x *= 2 automatically.

Many operating system APIs (Application Programming Interfaces), such as POSIX (the Portable Operating System Interface for Computer Environments, a standard developed by the IEEE based on UNIX) and Win32, contain functions which use operands that have data encoded as bits. For example, POSIX systems maintain file permissions for three different types of users: user (a better name would be owner), group and others. Each type of user can be granted permission to read, write and/or execute a file. To change the permissions of a file requires the C programmer to manipulate individual bits. POSIX defines several macros to help (see Table 3.6). The chmod function can be used to set the permissions of a file. This function takes two parameters, a string with the name of the file to act on and an integer (footnote 4) with the appropriate bits set for the desired permissions. For example, the code below sets the permissions to allow the owner of the file to read and write to it, users in the group to read the file and others have no access.

  chmod("foo", S_IRUSR | S_IWUSR | S_IRGRP );

Macro      Meaning
S_IRUSR    user can read
S_IWUSR    user can write
S_IXUSR    user can execute
S_IRGRP    group can read
S_IWGRP    group can write
S_IXGRP    group can execute
S_IROTH    others can read
S_IWOTH    others can write
S_IXOTH    others can execute

Table 3.6: POSIX File Permission Macros

The POSIX stat function can be used to find out the current permission bits for the file. Used with the chmod function, it is possible to modify some of the permissions without changing others. Here is an example that removes write access to others and adds read access to the owner of the file. The other permissions are not altered.

  struct stat file_stats;     /* struct used by stat() */
  stat("foo", &file_stats);   /* read file info.
                                 file_stats.st_mode holds permission bits */
  chmod("foo", (file_stats.st_mode & ~S_IWOTH) | S_IRUSR);

Footnote 1: This operator is different from the binary && and unary & operators!

Footnote 4: Actually a parameter of type mode_t, which is a typedef to an integral type.

3.5 Big and Little Endian Representations

Chapter 1 introduced the concept of big and little endian representations of multibyte data. However, the author has found that this subject confuses many people. This section covers the topic in more detail.

The reader will recall that endianness refers to the order in which the individual bytes (not bits) of a multibyte data element are stored in memory. Big endian is the most straightforward method. It stores the most significant byte first, then the next significant byte and so on. In other words the big bits are stored first. Little endian stores the bytes in the opposite order (least significant first). The x86 family of processors uses little endian representation.

As an example, consider the double word representing 12345678 (hex). In big endian representation, the bytes would be stored as 12 34 56 78. In little endian representation, the bytes would be stored as 78 56 34 12.

The reader is probably asking himself right now, why would any sane chip designer use little endian representation? Were the engineers at Intel sadists for inflicting this confusing representation on multitudes of programmers? It would seem that the CPU has to do extra work to store the bytes backward in memory like this (and to unreverse them when read back in from memory). The answer is that the CPU does not do any extra work to write and read memory using little endian format. One has to realize that the CPU is composed of many electronic circuits that simply work on bit values. The bits (and bytes) are not in any necessary order in the CPU.

Consider the 2-byte AX register. It can be decomposed into the single byte registers: AH and AL. There are circuits in the CPU that maintain the values of AH and AL. Circuits are not in any order in a CPU. That is, the circuits for AH are not before or after the circuits for AL. A mov instruction that copies the value of AX to memory copies the value of AL then AH. This is not any harder for the CPU to do than storing AH first.

The same argument applies to the individual bits in a byte. They are not really in any order in the circuits of the CPU (or memory for that matter). However, since individual bits can not be addressed in the CPU or memory, there is no way to know (or care about) what order they seem to be kept internally by the CPU.

The C code in Figure 3.4 shows how the endianness of a CPU can be determined. The p pointer treats the word variable as a two element character array. Thus, p[0] evaluates to the first byte of word in memory, which depends on the endianness of the CPU.

  unsigned short word = 0x1234;   /* assumes sizeof(short) == 2 */
  unsigned char * p = (unsigned char *) &word;

  if ( p[0] == 0x12 )
    printf("Big Endian Machine\n");
  else
    printf("Little Endian Machine\n");

Figure 3.4: How to Determine Endianness

  unsigned invert_endian( unsigned x )
  {
    unsigned invert;
    const unsigned char * xp = (const unsigned char *) &x;
    unsigned char * ip = (unsigned char *) &invert;

    ip[0] = xp[3];   /* reverse the individual bytes */
    ip[1] = xp[2];
    ip[2] = xp[1];
    ip[3] = xp[0];

    return invert;   /* return the bytes reversed */
  }

Figure 3.5: invert_endian Function
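The pointer techniques of Figures 3.4 and 3.5 can also be expressed with memcpy, shifts and masks, which avoids assumptions about the size of unsigned. The following is a sketch, not the author's code: the names is_little_endian and swap_bytes32 are hypothetical, and fixed-width types from stdint.h are used so the byte counts are explicit.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte of a 16-bit value is
   stored first in memory, 0 otherwise. */
int is_little_endian(void)
{
    uint16_t word = 0x1234;
    unsigned char first;

    memcpy(&first, &word, 1);   /* copy the byte at the lowest address */
    return first == 0x34;
}

/* Reverses the four bytes of a double word using shifts and masks.
   The result does not depend on the endianness of the machine. */
uint32_t swap_bytes32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
```

For example, swap_bytes32(0x12345678) yields 0x78563412, and applying it twice returns the original value.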

3.5.1 When to Care About Little and Big Endian

For typical programming, the endianness of the CPU is not significant. The most common time that it is important is when binary data is transferred between different computer systems. This is usually either using some type of physical data media (such as a disk) or a network. Since ASCII data is single byte, endianness is not an issue for it.

Note: With the advent of multibyte character sets, like UNICODE, endianness is important even for text data. UNICODE supports either endianness and has a mechanism for specifying which endianness is being used to represent the data.

All internal TCP/IP headers store integers in big endian format (called network byte order). TCP/IP libraries provide C functions for dealing with endianness issues in a portable way. For example, the htonl() function converts a double word (or long integer) from host to network format. The ntohl() function performs the opposite transformation. (Actually, reversing the endianness of an integer simply reverses the bytes; thus, converting from big to little or little to big is the same operation. So both of these functions do the same thing.) For a big endian system, the two functions just return their input unchanged. This allows one to write network programs that will compile and run correctly on any system irrespective of its endianness. For more information about endianness and network programming see W. Richard Stevens' excellent book, UNIX Network Programming.

Figure 3.5 shows a C function that inverts the endianness of a double word. The 486 processor introduced a new machine instruction named BSWAP that reverses the bytes of any 32-bit register. For example,

        bswap  edx             ; swap bytes of edx

The instruction can not be used on 16-bit registers. However, the XCHG instruction can be used to swap the bytes of the 16-bit registers that can be decomposed into 8-bit registers. For example:

        xchg   ah, al          ; swap bytes of ax

3.6 Counting Bits

Earlier a straightforward technique was given for counting the number of bits that are "on" in a double word. This section looks at other less direct methods of doing this as an exercise using the bit operations discussed in this chapter.

3.6.1 Method one

The first method is very simple, but not obvious. Figure 3.6 shows the code.

 1  int count_bits( unsigned int data )
 2  {
 3    int cnt = 0;
 4
 5    while( data != 0 ) {
 6      data = data & (data - 1);
 7      cnt++;
 8    }
 9    return cnt;
10  }

Figure 3.6: Bit Counting: Method One

How does this method work? In every iteration of the loop, one bit is turned off in data. When all the bits are off (i.e. when data is zero), the loop stops. The number of iterations required to make data zero is equal to the number of 1 bits in the original value of data.
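To watch the bits disappear one at a time, the loop can be instrumented to record the value of data after each iteration. This is a sketch for illustration only; the function trace_clear_lowest and its steps buffer are hypothetical additions, not part of Figure 3.6.

```c
#include <stddef.h>

/* Runs the method one loop on data, storing the value left after
   each iteration in steps[] (up to max_steps entries).
   Returns the number of iterations, i.e. the number of 1 bits. */
size_t trace_clear_lowest(unsigned data, unsigned *steps, size_t max_steps)
{
    size_t n = 0;

    while (data != 0) {
        data = data & (data - 1);   /* turn off the rightmost 1 bit */
        if (n < max_steps)
            steps[n] = data;
        n++;
    }
    return n;
}
```

Starting from 0xB3 (binary 10110011), the recorded values are 0xB2, 0xB0, 0xA0, 0x80 and 0x00, so the loop runs five times, once for each 1 bit.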

Line 6 is where a bit of data is turned off. How does this work? Consider the general form of the binary representation of data and the rightmost 1 in this representation. By definition, every bit after this 1 must be zero. Now, what will be the binary representation of data - 1? The bits to the left of the rightmost 1 will be the same as for data, but from the rightmost 1 onward the bits will be the complement of the original bits of data. For example:

  data     = xxxxx10000
  data - 1 = xxxxx01111

where the x's are the same for both numbers. When data is ANDed with data - 1, the result will zero the rightmost 1 in data and leave all the other bits unchanged.

Method two

A lookup table can also be used to count the bits of an arbitrary double word. The straightforward approach would be to precompute the number of bits for each double word and store this in an array. However, there are two related problems with this approach. There are roughly 4 billion double word values! This means that the array will be very big and that initializing it will also be very time consuming. (In fact, unless one is going to actually use the array more than 4 billion times, more time will be taken to initialize the array than it would require to just compute the bit counts using method one!)

A more realistic method would precompute the bit counts for all possible byte values and store these into an array. Then the double word can be split up into four byte values. The bit counts of these four byte values are looked up from the array and summed to find the bit count of the original double word. Figure 3.7 shows the code to implement this approach.

 1  static unsigned char byte_bit_count[256];   /* lookup table */
 2
 3  void initialize_count_bits()
 4  {
 5    int cnt, i, data;
 6
 7    for( i = 0; i < 256; i++ ) {
 8      cnt = 0;
 9      data = i;
10      while( data != 0 ) {      /* method one */
11        data = data & (data - 1);
12        cnt++;
13      }
14      byte_bit_count[i] = cnt;
15    }
16  }
17
18  int count_bits( unsigned int data )
19  {
20    const unsigned char * byte = ( unsigned char * ) &data;
21
22    return byte_bit_count[byte[0]] + byte_bit_count[byte[1]] +
23           byte_bit_count[byte[2]] + byte_bit_count[byte[3]];
24  }

Figure 3.7: Method Two

The initialize_count_bits function must be called before the first call to the count_bits function. This function initializes the global byte_bit_count array. The count_bits function looks at the data variable not as a double word, but as an array of four bytes. The byte pointer acts as a pointer to this four byte array. Thus, byte[0] is one of the bytes in data (either the least significant or the most significant byte depending on if the hardware is little or big endian, respectively.) Of course, one could use a construction like:

  (data >> 24) & 0x000000FF

to find the most significant byte value and similar ones for the other bytes; however, these constructions will be slower than an array reference.

One last point: a for loop could easily be used to compute the sum on lines 22 and 23. But a for loop would include the overhead of initializing a loop index, comparing the index after each iteration and incrementing the index. Computing the sum as the explicit sum of four values will be faster. In fact, a smart compiler would convert the for loop version to the explicit sum. This process of reducing or eliminating loop iterations is a compiler optimization technique known as loop unrolling.
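For comparison, here is a sketch of the lookup-table method written with the shift-and-mask construction mentioned above instead of a byte pointer. It gives the same counts regardless of endianness; whether it is measurably slower than the pointer version depends on the compiler and processor, so no timing claim is made here. The name count_bits_table is made up to avoid clashing with the count_bits of Figure 3.7.

```c
#include <stdint.h>

static unsigned char byte_bit_count[256];   /* lookup table */

/* Fills the table with the number of 1 bits in each byte value,
   using method one for each entry. Must be called once before
   count_bits_table is used. */
void initialize_count_bits(void)
{
    int i;

    for (i = 0; i < 256; i++) {
        unsigned data = (unsigned)i;
        unsigned char cnt = 0;

        while (data != 0) {
            data = data & (data - 1);
            cnt++;
        }
        byte_bit_count[i] = cnt;
    }
}

/* Splits the double word into four bytes with shifts and masks
   and sums the table entries for each byte. */
int count_bits_table(uint32_t data)
{
    return byte_bit_count[data & 0xFF]
         + byte_bit_count[(data >> 8) & 0xFF]
         + byte_bit_count[(data >> 16) & 0xFF]
         + byte_bit_count[(data >> 24) & 0xFF];
}
```

After calling initialize_count_bits(), count_bits_table(0xC123) returns 6 and count_bits_table(0xFFFFFFFF) returns 32.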

Method three

There
counting

of reducing

is

is

yet

another

the bits that

clever

are on

in

method

of

data. This

method literally adds the ones and zeros of the


data together. This

sum must

equal the number of

ones in the data. For example, consider


the ones in

byte stored in

data. The first step

counting

variable

named

is to perform the following

operation:
data

= (data & 0x55) +((data

>> 1) & 0x55);

What does this do? The hex constant

0x55 is

01010101 in binary. In the first operand of the

addition, data is ANDed with this mask, so the bits at the odd bit positions are pulled out. The second operand ((data >> 1) & 0x55) first moves all the bits at the even positions to an odd position and uses the same mask to pull out these same bits. Now, the first operand contains the odd bits and the second operand the even bits of data. When these two operands are added together, the even and odd bits of data are added together. For example, if data is 10110011₂, then:

      data & 01010101₂                  00 01 00 01
    + (data >> 1) & 01010101₂    or   + 01 01 00 01
                                        -----------
                                        01 10 00 10

The addition on the right shows the actual bits added together. The bits of the byte are divided into four 2-bit fields to show that actually there are four independent additions being performed. Since the most these sums can be is two, there is no possibility that the sum will overflow its field and corrupt one of the other fields' sums.

Of course, the total number of bits has not been computed yet. However, the same technique that was used above can be used to compute the total in a series of similar steps. The next step would be:

    data = (data & 0x33) + ((data >> 2) & 0x33);

Continuing the above example (remember that data now is 01100010₂):

      data & 00110011₂                  0010 0010
    + (data >> 2) & 00110011₂    or   + 0001 0000
                                        ---------
                                        0011 0010

Now there are two 4-bit fields that are independently added.

The next step is to add these two bit sums together to form the final result:

    data = (data & 0x0F) + ((data >> 4) & 0x0F);

Using the example above (with data equal to 00110010₂):

      data & 00001111₂                  00000010
    + (data >> 4) & 00001111₂    or   + 00000011
                                        --------
                                        00000101

Now data is 5, which is the correct result. Figure 3.8 shows an implementation of this method that counts the bits in a double word. It uses a for loop to compute the sum. It would be faster to unroll the loop; however, the loop makes it clearer how the method generalizes to different sizes of data.

    int count_bits( unsigned int x )
    {
      static unsigned int mask[] = { 0x55555555,
                                     0x33333333,
                                     0x0F0F0F0F,
                                     0x00FF00FF,
                                     0x0000FFFF };
      int i;
      int shift;   /* number of positions to shift to right */

      for( i=0, shift=1; i < 5; i++, shift *= 2 )
        x = (x & mask[i]) + ( (x >> shift) & mask[i] );
      return x;
    }

    Figure 3.8: Method 3
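The unrolled form mentioned above can be sketched by writing out the five loop iterations of Figure 3.8 as straight-line code. This is only an illustration: the names count_bits_unrolled and demo are not from the text, and the fixed-width type uint32_t is used to make the 32-bit assumption explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Unrolled variant of Method 3 for a 32-bit value (illustrative name). */
int count_bits_unrolled(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1)  & 0x55555555u);  /* sixteen 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2)  & 0x33333333u);  /* eight 4-bit sums   */
    x = (x & 0x0F0F0F0Fu) + ((x >> 4)  & 0x0F0F0F0Fu);  /* four 8-bit sums    */
    x = (x & 0x00FF00FFu) + ((x >> 8)  & 0x00FF00FFu);  /* two 16-bit sums    */
    x = (x & 0x0000FFFFu) + ((x >> 16) & 0x0000FFFFu);  /* final total        */
    return (int)x;
}

/* Usage: the byte from the worked example, 10110011 in binary, has 5 bits on. */
void demo(void)
{
    assert(count_bits_unrolled(0xB3u) == 5);
}
```

Each line doubles the width of the fields being summed, exactly as the loop does with shift *= 2; the masks are simply the entries of the mask array written in place.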

Chapter 4

Subprograms
This chapter looks at using subprograms to make modular programs and to interface with high level languages (like C). Functions and procedures are high level language examples of subprograms.

The code that calls a subprogram and the subprogram itself must agree on how data will be passed between them. These rules on how data will be passed are called calling conventions. A large part of this chapter will deal with the standard C calling conventions that can be used to interface assembly subprograms with C programs. This (and other conventions) often pass the addresses of data (i.e. pointers) to allow the subprogram to access the data in memory.

4.1 Indirect Addressing

Indirect addressing allows registers to act like pointer variables. To indicate that a register is to be used indirectly as a pointer, it is enclosed in square brackets ([]). For example:

    1    mov    ax, [Data]     ; normal direct memory addressing of a word
    2    mov    ebx, Data      ; ebx = & Data
    3    mov    ax, [ebx]      ; ax = *ebx

Because AX holds a word, line 3 reads a word starting at the address stored in EBX. If AX was replaced with AL, only a single byte would be read. It is important to realize that registers do not have types like variables do in C. What EBX is assumed to point to is completely determined by what instructions are used. Furthermore, even the fact that EBX is a pointer is completely determined by what instructions are used. If EBX is used incorrectly, often there will be no assembler error; however, the program will not work correctly. This is one of the many reasons that assembly programming is more error prone than high level programming.
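The point that the instruction, not the register, decides what is read can be mimicked in C. In the following sketch (the helper names read_byte, read_word and demo are made up for this illustration) the same untyped address is read once as a byte, like mov al, [ebx], and once as a 16-bit word, like mov ax, [ebx]. The word is assembled low byte first because the 80x86 is little endian.

```c
#include <assert.h>
#include <stdint.h>

/* Models "mov al, [ebx]": read a single byte at the address. */
uint8_t read_byte(const uint8_t *addr)
{
    return addr[0];
}

/* Models "mov ax, [ebx]": read a 16-bit word, low byte first. */
uint16_t read_word(const uint8_t *addr)
{
    return (uint16_t)(addr[0] | (addr[1] << 8));
}

/* Usage: one address, two different reads. */
void demo(void)
{
    const uint8_t data[2] = { 0x34, 0x12 };
    assert(read_word(data) == 0x1234);
    assert(read_byte(data) == 0x34);
}
```

Nothing about the pointer itself says whether a byte or a word lives at that address; only the choice of helper does, which mirrors the choice between AL and AX as the destination.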

All the 32-bit general purpose (EAX, EBX, ECX, EDX) and index (ESI, EDI) registers can be used for indirect addressing. In general, the 16-bit and 8-bit registers can not be.

4.2 Simple Subprogram Example
A subprogram is an independent unit of code that can be used from different parts of a program. In other words, a subprogram is like a function in C. A jump can be used to invoke the subprogram, but returning presents a problem. If the subprogram is to be used by different parts of the program, it must return back to the section of code that invoked it. Thus, the jump back from the subprogram can not be hard coded to a label. The code below shows how this could be done using the indirect form of the JMP instruction. This form of the instruction uses the value of a register to determine where to jump to (thus, the register acts much like a function pointer in C). Here is the first program from chapter 1 rewritten to use a subprogram.
sub1.asm

     1  ; file: sub1.asm
     2  ; Subprogram example program
     3  %include "asm_io.inc"
     4
     5  segment .data
     6  prompt1 db    "Enter a number: ", 0        ; don't forget null terminator
     7  prompt2 db    "Enter another number: ", 0
     8  outmsg1 db    "You entered ", 0
     9  outmsg2 db    " and ", 0
    10  outmsg3 db    ", the sum of these is ", 0
    11
    12  segment .bss
    13  input1  resd 1
    14  input2  resd 1
    15
    16  segment .text
    17          global  _asm_main
    18  _asm_main:
    19          enter   0,0               ; setup routine
    20          pusha
    21
    22          mov     eax, prompt1      ; print out prompt
    23          call    print_string
    24
    25          mov     ebx, input1       ; store address of input1 into ebx
    26          mov     ecx, ret1         ; store return address into ecx
    27          jmp     short get_int     ; read integer
    28  ret1:
    29          mov     eax, prompt2      ; print out prompt
    30          call    print_string
    31
    32          mov     ebx, input2
    33          mov     ecx, $ + 7        ; ecx = this address + 7
    34          jmp     short get_int
    35
    36          mov     eax, [input1]     ; eax = dword at input1
    37          add     eax, [input2]     ; eax += dword at input2
    38          mov     ebx, eax          ; ebx = eax
    39
    40          mov     eax, outmsg1
    41          call    print_string      ; print out first message
    42          mov     eax, [input1]
    43          call    print_int         ; print out input1
    44          mov     eax, outmsg2
    45          call    print_string      ; print out second message
    46          mov     eax, [input2]
    47          call    print_int         ; print out input2
    48          mov     eax, outmsg3
    49          call    print_string      ; print out third message
    50          mov     eax, ebx
    51          call    print_int         ; print out sum (ebx)
    52          call    print_nl          ; print new-line
    53
    54          popa
    55          mov     eax, 0            ; return back to C
    56          leave
    57          ret
    58  ; subprogram get_int
    59  ; Parameters:
    60  ;   ebx - address of dword to store integer into
    61  ;   ecx - address of instruction to return to
    62  ; Notes:
    63  ;   value of eax is destroyed
    64  get_int:
    65          call    read_int
    66          mov     [ebx], eax        ; store input into memory
    67          jmp     ecx               ; jump back to caller

The get_int subprogram uses a simple, register-based calling convention. It expects the EBX register to hold the address of the DWORD to store the number input into and the ECX register to hold the code address of the instruction to jump back to. In lines 25 to 28, the ret1 label is used to compute this return address. In lines 32 to 34, the $ operator is used to compute the return address. The $ operator returns the current address for the line it appears on. The expression $ + 7 computes the address of the MOV instruction on line 36.

Both of these return address computations are awkward. The first method requires a label to be defined for each subprogram call. The second method does not require a label, but does require careful thought. If a near jump was used instead of a short jump, the number to add to $ would not be 7! Fortunately, there is a much simpler way to invoke subprograms. This method uses the stack.

4.3 The Stack

Many CPUs have built-in support for a stack. A stack is a Last-In First-Out (LIFO) list. The stack is an area of memory that is organized in this fashion. The PUSH instruction adds data to the stack and the POP instruction removes data. The data removed is always the last data added (that is why it is called a last-in first-out list).

The SS segment register specifies the segment that contains the stack (usually this is the same segment data is stored into). The ESP register contains the address of the data that would be removed from the stack. This data is said to be at the top of the stack. Data can only be added in double word units. That is, one can not push a single byte on the stack.

The PUSH instruction inserts a double word on the stack by subtracting 4 from ESP and then storing the double word at [ESP]. (Actually, words can be pushed too, but in 32-bit protected mode it is better to work with only double words on the stack.) The POP instruction reads the double word at [ESP] and then adds 4 to ESP. The code below demonstrates how these instructions work and assumes that ESP is initially 1000H.

        push   dword 1     ; 1 stored at 0FFCh, ESP = 0FFCh
        push   dword 2     ; 2 stored at 0FF8h, ESP = 0FF8h
        push   dword 3     ; 3 stored at 0FF4h, ESP = 0FF4h
        pop    eax         ; EAX = 3, ESP = 0FF8h
        pop    ebx         ; EBX = 2, ESP = 0FFCh
        pop    ecx         ; ECX = 1, ESP = 1000h

The stack can be used as a convenient place to store data temporarily. It is also used for making subprogram calls, passing parameters and local variables.

The 80x86 also provides a PUSHA instruction that pushes the values of the EAX, EBX, ECX, EDX, ESI, EDI and EBP registers (not in this order). The POPA instruction can be used to pop them all back off.
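The PUSH and POP steps described above can be modelled in a few lines of C. This is only a sketch: the array memory, the variable esp and the helper names are stand-ins invented here, and no check is made for running off either end of the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_TOP 0x1000u             /* initial ESP, as in the example above */

static uint8_t  memory[STACK_TOP];    /* hypothetical memory below the stack top */
static uint32_t esp = STACK_TOP;      /* stands in for the ESP register          */

/* PUSH: subtract 4 from ESP, then store the double word at [ESP]. */
void push_dword(uint32_t value)
{
    esp -= 4;
    memcpy(&memory[esp], &value, 4);
}

/* POP: read the double word at [ESP], then add 4 to ESP. */
uint32_t pop_dword(void)
{
    uint32_t value;
    memcpy(&value, &memory[esp], 4);
    esp += 4;
    return value;
}

uint32_t current_esp(void) { return esp; }

/* Usage: the same six steps as the assembly example. */
void demo(void)
{
    push_dword(1);  assert(current_esp() == 0x0FFCu);
    push_dword(2);  assert(current_esp() == 0x0FF8u);
    push_dword(3);  assert(current_esp() == 0x0FF4u);
    assert(pop_dword() == 3 && current_esp() == 0x0FF8u);
    assert(pop_dword() == 2 && current_esp() == 0x0FFCu);
    assert(pop_dword() == 1 && current_esp() == 0x1000u);
}
```

The model makes the direction of growth visible: each push moves esp toward lower addresses, and the values come back in the reverse of the order in which they went in.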

4.4 The CALL and RET Instructions

The 80x86 provides two instructions that use the stack to make calling subprograms quick and easy. The CALL instruction makes an unconditional jump to a subprogram and pushes the address of the next instruction on the stack. The RET instruction pops off an address and jumps to that address. When using these instructions, it is very important that one manage the stack correctly so that the right number is popped off by the RET instruction!

The previous program can be rewritten to use these new instructions by changing lines 25 to 34 to be:

        mov    ebx, input1
        call   get_int

        mov    ebx, input2
        call   get_int

and change the subprogram get_int to:

    get_int:
        call   read_int
        mov    [ebx], eax
        ret

There are several advantages to CALL and RET:

- It is simpler!

- It allows subprogram calls to be nested easily. Notice that get_int calls read_int. This call pushes another address on the stack. At the end of read_int's code is a RET that pops off the return address and jumps back to get_int's code. Then when get_int's RET is executed, it pops off the return address that jumps back to asm_main. This works correctly because of the LIFO property of the stack.

Remember it is very important to pop off all data that is pushed on the stack. For example, consider the following:

    get_int:
        call   read_int
        mov    [ebx], eax
        push   eax
        ret                    ; pops off EAX value, not return address!!

This code would not return correctly!
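Both points above, that nested calls unwind in LIFO order and that a stray PUSH makes RET pick up the wrong value, can be checked with a toy model in C. The sketch below does not execute any machine code: the stack array, the helpers do_call and do_ret, and the addresses 0x100 and 0x200 are all invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t stack[16];   /* hypothetical stack of double words */
static int      top = 0;     /* number of entries currently in use */

static void     push(uint32_t v) { stack[top++] = v; }
static uint32_t pop(void)        { return stack[--top]; }

/* CALL: push the address of the next instruction; the jump is implied. */
void do_call(uint32_t next_instruction) { push(next_instruction); }

/* RET: pop an address; in this model the caller "jumps" to the result. */
uint32_t do_ret(void) { return pop(); }

/* Two nested calls return in reverse order because the stack is LIFO. */
void nested(uint32_t in_main, uint32_t in_get_int, uint32_t out[2])
{
    do_call(in_main);        /* asm_main calls get_int          */
    do_call(in_get_int);     /* get_int calls read_int          */
    out[0] = do_ret();       /* read_int returns into get_int   */
    out[1] = do_ret();       /* get_int returns into asm_main   */
}

/* An extra PUSH that is never popped hides the return address from RET. */
uint32_t unbalanced(uint32_t return_address, uint32_t eax)
{
    do_call(return_address);
    push(eax);               /* never popped before the RET below     */
    return do_ret();         /* yields the EAX value, not the address */
}

void demo(void)
{
    uint32_t order[2];
    nested(0x100, 0x200, order);
    assert(order[0] == 0x200 && order[1] == 0x100);
    assert(unbalanced(0x100, 42) == 42);
}
```

In the unbalanced case the model "returns" to address 42, which is exactly the failure shown in the broken get_int above: RET has no way of knowing that the top of the stack is data rather than a return address.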

4.5 Calling Conventions

When a subprogram is invoked, the calling code and the subprogram (the callee) must agree on how to pass data between them. High-level languages have standard ways to pass data known as calling conventions. For high-level code to interface with assembly language, the assembly language code must use the same conventions as the high-level language. The calling conventions can differ from compiler to compiler or may vary depending on how the code is compiled (e.g. if optimizations are on or not). One universal convention is that the code will be invoked with a CALL instruction and return via a RET.

All PC C compilers support one calling convention that will be described in the rest of this chapter in stages. These conventions allow one to create subprograms that are reentrant. A reentrant subprogram may be called at any point of a program safely (even inside the subprogram itself).

4.5.1 Passing parameters

Parameters to a subprogram may be passed on the stack. They are pushed onto the stack before the CALL instruction. Just as in C, if the parameter is to be changed by the subprogram, the address of the data must be passed, not the value. If the parameter's size is less than a double word, it must be converted to a double word before being pushed.

The parameters on the stack are not popped off by the subprogram; instead they are accessed from the stack itself. Why?

- Since they have to be pushed on the stack before the CALL instruction, the return address would have to be popped off first (and then pushed back on again).

- Often the parameters will have to be used in several places in the subprogram. Usually, they can not be kept in a register for the entire subprogram and would have to be stored in memory. Leaving them on the stack keeps a copy of the data in memory that can be accessed at any point of the subprogram.

Consider a subprogram that is passed a single parameter on the stack. When the subprogram is invoked, the stack looks like Figure 4.1. The parameter can be accessed using indirect addressing ([ESP+4]). (It is legal to add a constant to a register when using indirect addressing. More complicated expressions are possible too. This topic is covered in the next chapter.)

    ESP + 4    Parameter
    ESP        Return address

    Figure 4.1:

(When using indirect addressing, the 80x86 processor accesses different segments depending on what registers are used in the indirect addressing expression. ESP (and EBP) use the stack segment while EAX, EBX, ECX and EDX use the data segment. However, this is usually unimportant for most protected mode programs, because for them the data and stack segments are the same.)

If the stack is also used inside the subprogram to store data, the number needed to be added to ESP will change. For example, Figure 4.2 shows what the stack looks like if a DWORD is pushed onto the stack. Now the parameter is at ESP + 8, not ESP + 4. Thus, it can be very error prone to use ESP when referencing parameters. To solve this problem, the 80386 supplies another register to use: EBP. This register's only purpose is to reference data on the stack. The C calling convention mandates that a subprogram first save the value of EBP on the stack and then set EBP to be equal to ESP. This allows ESP to change as data is pushed or popped off the stack without modifying EBP. At the end of the subprogram, the original value of EBP must be restored (this is why it is saved at the start of the subprogram). Figure 4.3 shows the general form of a subprogram that follows these conventions.

    ESP + 8    Parameter
    ESP + 4    Return address
    ESP        subprogram data

    Figure 4.2:

    1  subprogram_label:
    2        push   ebp          ; save original EBP value on stack
    3        mov    ebp, esp     ; new EBP = ESP
    4  ; subprogram code
    5        pop    ebp          ; restore original EBP value
    6        ret

    Figure 4.3: General subprogram form

Lines 2 and 3 in Figure 4.3 make up the general prologue of a subprogram. Lines 5 and 6 make up the epilogue. Figure 4.4 shows what the stack looks like immediately after the prologue. Now the parameter can be accessed with [EBP + 8] at any place in the subprogram without worrying about what else has been pushed onto the stack by the subprogram.

    ESP + 8    EBP + 8    Parameter
    ESP + 4    EBP + 4    Return address
    ESP        EBP        saved EBP

    Figure 4.4:

After the subprogram is over, the parameters that were pushed on the stack must be removed. The C calling convention specifies that the caller code must do this. Other conventions are different. For example, the Pascal calling convention specifies that the subprogram must remove the parameters. (There is another form of the RET instruction that makes this easy to do.) Some C compilers support this convention too. The pascal keyword is used in the prototype and definition of the function to tell the compiler to use this convention. In fact, the stdcall convention that the MS Windows API C functions use also works this way. What is the advantage of this way? It is a little more efficient than the C convention. Why do all C functions not use this convention, then? In general, C allows a function to have a varying number of arguments (e.g., the printf and scanf functions). For these types of functions, the operation to remove the parameters from the stack will vary from one call of the function to the next. The C convention allows the instructions that perform this operation to be easily varied from one call to the next. The Pascal and stdcall conventions make this operation very difficult. Thus, the Pascal convention (like the Pascal language) does not allow this type of function. MS Windows can use this convention since none of its API functions take varying numbers of arguments.
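To see why the clean-up has to vary from call to call, consider a small C function that takes a varying number of arguments. The sketch below uses the standard stdarg.h macros; the name sum_ints and the choice of passing the argument count as the first parameter are made for this illustration only.

```c
#include <assert.h>
#include <stdarg.h>

/* Adds up 'count' int arguments that follow the count itself. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);           /* begin after the last fixed parameter */
    for (i = 0; i < count; i++)
        total += va_arg(args, int);  /* fetch the next int argument */
    va_end(args);
    return total;
}

/* Usage: the same function called with different numbers of arguments. */
void demo(void)
{
    assert(sum_ints(3, 1, 2, 3) == 6);
    assert(sum_ints(1, 5) == 5);
    assert(sum_ints(0) == 0);
}
```

With a 32-bit compiler following the C convention described here, the call sum_ints(3, 1, 2, 3) pushes four double words, so the caller would remove 16 bytes afterwards, while sum_ints(0) pushes only one and the caller would remove 4. Only the caller knows which applies, which is why the C convention leaves the clean-up to it.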

Figure 4.5 shows how a subprogram using the C calling convention would be called. Line 3 removes the parameter from the stack by directly manipulating the stack pointer. A POP instruction could be used to do this also, but would require the useless result to be stored in a register. Actually, for this particular case, many compilers would use a POP ECX instruction to remove the parameter. The compiler would use a POP instead of an ADD because the ADD requires more bytes for the instruction. However, the POP also changes ECX's value!

    1        push   dword 1      ; pass 1 as parameter
    2        call   fun
    3        add    esp, 4       ; remove parameter from stack

    Figure 4.5: Sample subprogram call

Next is another example program with two subprograms that use the C calling conventions discussed above. Line 55 (and other lines) shows that multiple data and text segments may be declared in a single source file. They will be combined into single data and text segments in the linking process. Splitting up the data and code into separate segments allows the data that a subprogram uses to be defined close to the code of the subprogram.

sub3.asm

     1  %include "asm_io.inc"
     2
     3  segment .data
     4  sum     dd   0
     5
     6  segment .bss
     7  input   resd 1
     8
     9  ;
    10  ; pseudo-code algorithm
    11  ; i = 1;
    12  ; sum = 0;
    13  ; while( get_int(i, &input), input != 0 ) {
    14  ;   sum += input;
    15  ;   i++;
    16  ; }
    17  ; print_sum(sum);
    18  segment .text
    19          global  _asm_main
    20  _asm_main:
    21          enter   0,0               ; setup routine
    22          pusha
    23
    24          mov     edx, 1            ; edx is 'i' in pseudo-code
    25  while_loop:
    26          push    edx               ; save i on stack
    27          push    dword input       ; push address of input on stack
    28          call    get_int
    29          add     esp, 8            ; remove i and &input from stack
    30
    31          mov     eax, [input]
    32          cmp     eax, 0
    33          je      end_while
    34
    35          add     [sum], eax        ; sum += input
    36
    37          inc     edx
    38          jmp     short while_loop
    39
    40  end_while:
    41          push    dword [sum]       ; push value of sum onto stack
    42          call    print_sum
    43          pop     ecx               ; remove [sum] from stack
    44
    45          popa
    46          leave
    47          ret
    48
    49  ; subprogram get_int
    50  ; Parameters (in order pushed on stack)
    51  ;   number of input (at [ebp + 12])
    52  ;   address of word to store input into (at [ebp + 8])
    53  ; Notes:
    54  ;   values of eax and ebx are destroyed
    55  segment .data
    56  prompt  db      ") Enter an integer number (0 to quit): ", 0
    57
    58  segment .text
    59  get_int:
    60          push    ebp
    61          mov     ebp, esp
    62
    63          mov     eax, [ebp + 12]
    64          call    print_int
    65
    66          mov     eax, prompt
    67          call    print_string
    68
    69          call    read_int
    70          mov     ebx, [ebp + 8]
    71          mov     [ebx], eax        ; store input into memory
    72
    73          pop     ebp
    74          ret                       ; jump back to caller
    75
    76  ; subprogram print_sum
    77  ; prints out the sum
    78  ; Parameter:
    79  ;   sum to print out (at [ebp+8])
    80  ; Note: destroys value of eax
    81  ;
    82  segment .data
    83  result  db      "The sum is ", 0
    84
    85  segment .text
    86  print_sum:
    87          push    ebp
    88          mov     ebp, esp
    89
    90          mov     eax, result
    91          call    print_string
    92
    93          mov     eax, [ebp+8]
    94          call    print_int
    95          call    print_nl
    96
    97          pop     ebp
    98          ret

4.5.2 Local variables on the stack

The stack can be used as a convenient location for local variables. This is exactly where C stores normal (or automatic in C lingo) variables. Using the stack for variables is important if one wishes subprograms to be reentrant. A reentrant subprogram will work if it is invoked at any place, including the subprogram itself. In other words, reentrant subprograms can be invoked recursively. Using the stack for variables also saves memory. Data not stored on the stack uses memory from the beginning of the program until the end of the program (C calls these types of variables global or static). Data stored on the stack only uses memory when the subprogram it is defined for is active.

Local variables are stored right after the saved EBP value in the stack. They are allocated by subtracting the number of bytes required from ESP in the prologue of the subprogram. Figure 4.6 shows the new subprogram skeleton. The EBP register is used to access local variables. Consider the C function in Figure 4.7. Figure 4.8 shows how the equivalent subprogram could be written in assembly.

    subprogram_label:
          push   ebp                  ; save original EBP value on stack
          mov    ebp, esp             ; new EBP = ESP
          sub    esp, LOCAL_BYTES     ; = # bytes needed by locals
    ; subprogram code
          mov    esp, ebp             ; deallocate locals
          pop    ebp                  ; restore original EBP value
          ret

    Figure 4.6: General subprogram form with local variables

    void calc_sum( int n, int * sump )
    {
      int i, sum = 0;

      for( i=1; i <= n; i++ )
        sum += i;
      *sump = sum;
    }

    Figure 4.7: C version of sum

Figure 4.9 shows what the stack looks like after the prologue of the program in Figure 4.8. This section of the stack that contains the parameters, return information and local variable storage is called a stack frame. Every invocation of a C function creates a new stack frame on the stack.

The prologue and epilogue of a subprogram can be simplified by using two special instructions that are designed specifically for this purpose. The ENTER instruction performs the prologue code and the LEAVE performs the epilogue. The ENTER instruction takes two immediate operands. For the C calling convention, the second operand is always 0. The first operand is the number of bytes needed by local variables. The LEAVE instruction has no operands. Figure 4.10 shows how these instructions are used. Note that the program skeleton (Figure 1.7) also uses ENTER and LEAVE.

(Despite the fact that ENTER and LEAVE simplify the prologue and epilogue, they are not used very often. Why? Because they are slower than the equivalent simpler instructions! This is an example of when one can not assume that a one instruction sequence is faster than a multiple instruction one.)

    calc_sum:
          push   ebp
          mov    ebp, esp
          sub    esp, 4               ; make room for local sum

          mov    dword [ebp - 4], 0   ; sum = 0
          mov    ebx, 1               ; ebx (i) = 1
    for_loop:
          cmp    ebx, [ebp+8]         ; is i <= n?
          jnle   end_for

          add    [ebp-4], ebx         ; sum += i
          inc    ebx
          jmp    short for_loop

    end_for:
          mov    ebx, [ebp+12]        ; ebx = sump
          mov    eax, [ebp-4]         ; eax = sum
          mov    [ebx], eax           ; *sump = sum;

          mov    esp, ebp
          pop    ebp
          ret

    Figure 4.8: Assembly version of sum
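The link between stack-based locals and reentrancy described in this section can be observed from C itself. The two functions below are an illustrative sketch (their names are invented here): both try to add up the integers from n down to 0 recursively, but one keeps its working value in an automatic variable and the other in a static one.

```c
#include <assert.h>

/* Reentrant: 'saved' is an automatic variable in this call's stack frame. */
int sum_to_auto(int n)
{
    int saved = n;
    int rest;

    if (n == 0)
        return 0;
    rest = sum_to_auto(n - 1);    /* inner calls get their own 'saved' */
    return saved + rest;
}

/* Not reentrant: a single static 'saved' is shared by every invocation. */
int sum_to_static(int n)
{
    static int saved;
    int rest;

    saved = n;
    if (n == 0)
        return 0;
    rest = sum_to_static(n - 1);  /* inner calls overwrite 'saved' */
    return saved + rest;          /* 'saved' is 0 by now, not n */
}

void demo(void)
{
    assert(sum_to_auto(4) == 10);   /* 4 + 3 + 2 + 1 */
    assert(sum_to_static(4) == 0);  /* every level sees saved == 0 */
}
```

The first version works because each invocation gets a fresh stack frame with its own copy of saved. In the second, the innermost call stores 0 into the one shared variable before any outer call gets to use it, so every addition sees 0.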

    ESP + 16    EBP + 12    sump
    ESP + 12    EBP + 8     n
    ESP + 8     EBP + 4     Return address
    ESP + 4     EBP         saved EBP
    ESP         EBP - 4     sum

    Figure 4.9:

    subprogram_label:
          enter  LOCAL_BYTES, 0       ; = # bytes needed by locals
    ; subprogram code
          leave
          ret

    Figure 4.10: General subprogram form with local variables using ENTER and LEAVE

4.6 Multi-Module Programs

A multi-module program is one composed of more than one object file. All the programs presented here have been multi-module programs. They consisted of the C driver object file and the assembly object file (plus the library object files). Recall that the linker combines the object files into a single executable program. The linker must match up references made to each label in one module (i.e. object file) to its definition in another module. In order for module A to use a label defined in module B, the extern directive must be used. After the extern directive comes a comma delimited list of labels. The directive tells the assembler to treat these labels as external to the module. That is, these are labels that can be used in this module, but are defined in another. The asm_io.inc file defines the read_int, etc. routines as external.

In assembly, labels can not be accessed externally by default. If a label can be accessed from other modules than the one it is defined in, it must be declared global in its module. The global directive does this. Line 13 of the skeleton program listing in Figure 1.7 shows the _asm_main label being defined as global. Without this declaration, there would be a linker error. Why? Because the C code would not be able to refer to the internal _asm_main label.

Next is the code for the previous example, rewritten to use two modules. The two subprograms (get_int and print_sum) are in a separate source file from the _asm_main routine.

main4.asm

    %include "asm_io.inc"

    segment .data
    sum     dd   0

    segment .bss
    input   resd 1

    segment .text
            global  _asm_main
            extern  get_int, print_sum
    _asm_main:
            enter   0,0               ; setup routine
            pusha

            mov     edx, 1            ; edx is 'i' in pseudo-code
    while_loop:
            push    edx               ; save i on stack
            push    dword input       ; push address of input on stack
            call    get_int
            add     esp, 8            ; remove i and &input from stack

            mov     eax, [input]
            cmp     eax, 0
            je      end_while

            add     [sum], eax        ; sum += input

            inc     edx
            jmp     short while_loop

    end_while:
            push    dword [sum]       ; push value of sum onto stack
            call    print_sum
            pop     ecx               ; remove [sum] from stack

            popa
            leave
            ret

sub4.asm

    %include "asm_io.inc"

    segment .data
    prompt  db      ") Enter an integer number (0 to quit): ", 0

    segment .text
            global  get_int, print_sum
    get_int:
            enter   0,0

            mov     eax, [ebp + 12]
            call    print_int

            mov     eax, prompt
            call    print_string

            call    read_int
            mov     ebx, [ebp + 8]
            mov     [ebx], eax        ; store input into memory

            leave
            ret                       ; jump back to caller

    segment .data
    result  db      "The sum is ", 0

    segment .text
    print_sum:
            enter   0,0

            mov     eax, result
            call    print_string

            mov     eax, [ebp+8]
            call    print_int
            call    print_nl

            leave
            ret

The previous example only has global code labels; however, global data labels work exactly the same way.

4.7 Interfacing Assembly with C

Today, very few programs are written completely in assembly. Compilers are very good at converting high level code into efficient machine code. Since it is much easier to write code in a high level language, it is more popular. In addition, high level code is much more portable than assembly!

When assembly is used, it is often only used for small parts of the code. This can be done in two ways: calling assembly subroutines from C or inline assembly. Inline assembly allows the programmer to place assembly statements directly into C code. This can be very convenient; however, there are disadvantages to inline assembly. The assembly code must be written in the format the compiler uses. No compiler at the moment supports NASM's format. Different compilers require different formats. Borland and Microsoft require MASM format. DJGPP and Linux's gcc require GAS format. (GAS is the assembler that all GNU compilers use. It uses the AT&T syntax, which is very different from the relatively similar syntaxes of MASM, TASM and NASM.) The technique of calling an assembly subroutine is much more standardized on the PC.

Assembly routines are usually used with C for the following reasons:

- Direct access is needed to hardware features of the computer that are difficult or impossible to access from C.

- The routine must be as fast as possible and the programmer can hand optimize the code better than the compiler can.

The last reason is not as valid as it once was. Compiler technology has improved over the years and compilers can often generate very efficient code (especially if compiler optimizations are turned on). The disadvantages of assembly routines are reduced portability and readability.

Most of the C calling conventions have already been specified. However, there are a few additional features that need to be described.

4.7.1 Saving registers

First, C assumes that a subroutine maintains the values of the following registers: EBX, ESI, EDI, EBP, CS, DS, SS, ES. This does not mean that the subroutine can not change them internally. Instead, it means that if it does change their values, it must restore their original values before the subroutine returns. The EBX, ESI and EDI values must be unmodified because C uses these registers for register variables. Usually the stack is used to save the original values of these registers.

(The register keyword can be used in a C variable declaration to suggest to the compiler that it use a register for this variable instead of a memory location. These are known as register variables. Modern compilers do this automatically without requiring any suggestions.)

    segment .data
    x            dd     0
    format       db     "x = %d", 10, 0     ; 10 is the newline character

    segment .text
    ...
          push   dword [x]         ; push x's value
          push   dword format      ; push address of format string
          call   _printf           ; note underscore!
          add    esp, 8            ; remove parameters from stack

    Figure 4.11: Call to printf

EBP + 12

value of

EBP + 8

address of format string

EBP + 4

Return address

EBP

saved EBP
Figure 4.12: Stack inside printf

4.7.2

Labels of functions

Most

compilers

prepend

single

underscore( ) character at the be-ginning of the

names

of functions

For example,

and global/static

a function

variables.

named f will be assigned

the label f.Thus, if this is to be

an

assembly

routine, it must be labelled f,not f.The Linux

gcc

compiler

does

not

Under Linux ELF executables,

use

any character.
one simply would

prepend

the label f for the C function f.However,

an underscore. Note
that inthe assembly skeleton program (Figure 1
.7), the label for the main routine is asm main.
DJGPP s

gcc

does prepend

Passing parameters

4.7.3

Under the C calling convention, the arguments of

a function are pushed on the stack inthe reverse


order that they appear inthe function call.
Consider the following C statement: printf("x

%d\n",x); Figure 4.11 shows how this would be


compiled (shown inthe equivalent NASM format)

Figure 4.12 shows what the stack looks like


after the prologue inside the printf function.

The printf function is one of the C library


functions that

can take any number

of arguments.

The rules of the C calling conventions

necessary to use were specifically written

It is not

to allow these types of functions. Since the


address assembly to process

anar- of the format

string is pushed last, its location

on the stack

will

always be at
bitrary

number

of

argu-

EBP + 8 no matter how

header file defines

many parameters are passed to the function. The


can then look at the format string to determine how many
parameters should have been passed and look for them on the stack.
Of course, if a mistake is made, printf("x = %d\n"), the printf code

good C book for details.

will still print out the double word value at [EBP

ments inC.The stdarg.h

macros
that can be used to process
them portably.
See any

printf code

+12]. However,

not be xs value!

4.7.4

Calculating addresses of local

this will

variables

Finding the address of a label defined in the data or bss segments is simple. Basically, the linker does this. However, calculating the address of a local variable (or parameter) on the stack is not as straightforward. However, this is a very common need when calling subroutines. Consider the case of passing the address of a variable (let's call it x) to a function (let's call it foo). If x is located at EBP - 8 on the stack, one cannot just use:

    mov    eax, ebp - 8

Why? The value that MOV stores into EAX must be computed by the assembler (that is, it must in the end be a constant). However, there is an instruction that does the desired calculation. It is called LEA (for Load Effective Address). The following would calculate the address of x and store it into EAX:

    lea    eax, [ebp - 8]

Now EAX holds the address of x and could be pushed on the stack when calling function foo. Do not be confused: it looks like this instruction is reading the data at [EBP - 8]; however, this is not true. The LEA instruction never reads memory! It only computes the address that would be read by another instruction and stores this address in its first register operand. Since it does not actually read any memory, no memory size designation (e.g. dword) is needed or allowed.

4.7.5 Returning values

Non-void C functions return back a value. The C calling conventions specify how this is done. Return values are passed via registers. All integral types (char, int, enum, etc.) are returned in the EAX register. If they are smaller than 32-bits, they are extended to 32-bits when stored in EAX. (How they are extended depends on if they are signed or unsigned types.) 64-bit values are returned in the EDX:EAX register pair. Pointer values are also stored in EAX. Floating point values are stored in the ST0 register of the math coprocessor. (This register is discussed in the floating point chapter.)
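The remark about how small return values are extended can be seen from the C side. The following is a minimal sketch, not code from the text: the function names are hypothetical, and it only shows the widening that C itself guarantees when a narrow return value is used as an int.

```c
#include <assert.h>

/* Hypothetical helpers (not from the text). Both return the bit
   pattern 0xFF in a type narrower than 32 bits. */
signed char minus_one_signed(void)
{
    return (signed char)-1;
}

unsigned char all_ones_unsigned(void)
{
    return (unsigned char)0xFF;
}

/* Widening to int: the signed value is sign extended, the unsigned
   value is zero extended. */
int widened_signed(void)
{
    return minus_one_signed();      /* -1 */
}

int widened_unsigned(void)
{
    return all_ones_unsigned();     /* 255 */
}
```

The same 8-bit pattern therefore yields two different 32-bit values depending only on the declared type.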

4.7.6 Other calling conventions

The rules above describe the standard calling convention that is supported by all 80x86 C compilers. Often compilers support other calling conventions as well. When interfacing with assembly language it is very important to know what calling convention the compiler is using when it calls your function. Usually, the default is to use the standard calling convention; however, this is not always the case[4]. Compilers that use multiple conventions often have command line switches that can be used to change the default convention. They also provide extensions to the C syntax to explicitly assign calling conventions to individual functions. However, these extensions are not standardized and may vary from one compiler to another.

[4] The Watcom C compiler is an example of one that does not use the standard convention by default. See the example source code file for Watcom for details.

The GCC compiler allows different calling conventions. The convention of a function can be explicitly declared by using the __attribute__ extension. For example, to declare a void function named f that uses the standard calling convention and takes a single int parameter, use the following syntax for its prototype:

    void f( int ) __attribute__ ((cdecl));

GCC also supports the standard call calling convention. The function above could be declared to use this convention by replacing the cdecl with stdcall. The difference between stdcall and cdecl is that stdcall requires the subroutine to remove the parameters from the stack (as the Pascal calling convention does). Thus, the stdcall convention can only be used with functions that take a fixed number of arguments (i.e. ones not like printf and scanf).
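To see why a function with a variable number of arguments cannot clean up its own parameters, consider a minimal sketch using the portable stdarg.h macros. The function sum_ints is a hypothetical example, not part of the text. Only at run time, through its first argument, does it learn how many arguments were pushed, so a fixed clean-up amount cannot be built into the function.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: 'count' says how many int arguments follow,
   much as the format string does for printf. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);              /* start after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);     /* fetch the next int argument */
    va_end(args);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) passes four parameters, while sum_ints(0) passes one; under cdecl the caller, which knows how many it pushed, removes them.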

GCC also supports an additional attribute called regparm that tells the compiler to use registers to pass up to 3 integer arguments to a function instead of using the stack. This is a common type of optimization that many compilers support.

Borland and Microsoft use a common syntax to declare calling conventions. They add the __cdecl and __stdcall keywords to C. These keywords act as function modifiers and appear immediately before the function name in a prototype. For example, the function f above would be defined as follows for Borland and Microsoft:

    void __cdecl f( int );

There are advantages and disadvantages to each of the calling conventions. The main advantages of the cdecl convention are that it is simple and very flexible. It can be used for any type of C function and C compiler. Using other conventions can limit the portability of the subroutine. Its main disadvantage is that it can be slower than some of the others and use more memory (since every invocation of the function requires code to remove the parameters on the stack).

The advantage of the stdcall convention is that it uses less memory than cdecl. No stack cleanup is required after the CALL instruction. Its main disadvantage is that it can not be used with functions that have variable numbers of arguments.

The advantage of using a convention that uses registers to pass integer parameters is speed. The main disadvantage is that the convention is more complex. Some parameters may be in registers and others on the stack.

4.7.7 Examples

Next is an example that shows how an assembly routine can be interfaced to a C program. (Note that this program does not use the assembly skeleton program (Figure 1.7) or the driver.c module.)

main5.c
     1  #include <stdio.h>
     2  /* prototype for assembly routine */
     3  void calc_sum( int, int * ) __attribute__ ((cdecl));
     4
     5  int main( void )
     6  {
     7    int n, sum;
     8
     9    printf("Sum integers up to: ");
    10    scanf("%d", &n);
    11    calc_sum(n, &sum);
    12    printf("Sum is %d\n", sum);
    13    return 0;
    14  }
main5.c

sub5.asm
     1  ; subroutine _calc_sum
     2  ; finds the sum of the integers 1 through n
     3  ; Parameters:
     4  ;   n    - what to sum up to (at [ebp + 8])
     5  ;   sump - pointer to int to store sum into (at [ebp + 12])
     6  ; pseudo C code:
     7  ; void calc_sum( int n, int *sump )
     8  ; {
     9  ;   int i, sum = 0;
    10  ;   for( i=1; i <= n; i++ )
    11  ;     sum += i;
    12  ;   *sump = sum;
    13  ; }
    14
    15  segment .text
    16          global  _calc_sum
    17  ;
    18  ; local variable:
    19  ;   sum at [ebp-4]
    20  _calc_sum:
    21          enter   4,0               ; make room for sum on stack
    22          push    ebx               ; IMPORTANT!
    23
    24          mov     dword [ebp-4],0   ; sum = 0
    25          dump_stack 1, 2, 4        ; print out stack from ebp-8 to ebp+16
    26          mov     ecx, 1            ; ecx is i in pseudocode
    27  for_loop:
    28          cmp     ecx, [ebp+8]      ; cmp i and n
    29          jnle    end_for           ; if not i <= n, quit
    30
    31          add     [ebp-4], ecx      ; sum += i
    32          inc     ecx
    33          jmp     short for_loop
    34
    35  end_for:
    36          mov     ebx, [ebp+12]     ; ebx = sump
    37          mov     eax, [ebp-4]      ; eax = sum
    38          mov     [ebx], eax
    39
    40          pop     ebx               ; restore ebx
    41          leave
    42          ret
sub5.asm

    Sum integers up to: 10
    Stack Dump # 1
    EBP = BFFFFB70 ESP = BFFFFB68
     +16  BFFFFB80  080499EC
     +12  BFFFFB7C  BFFFFB80
      +8  BFFFFB78  0000000A
      +4  BFFFFB74  08048501
      +0  BFFFFB70  BFFFFB88
      -4  BFFFFB6C  00000000
      -8  BFFFFB68  4010648C
    Sum is 55

Figure 4.13: Sample run of sub5 program

Why is line 22 of sub5.asm (the PUSH EBX) so important? Because the C calling convention requires the value of EBX to be unmodified by the function call. If this is not done, it is very likely that the program will not work correctly.

Line 25 demonstrates how the dump_stack macro works. Recall that the first parameter is just a numeric label, and the second and third parameters determine how many double words to display below and above EBP respectively. Figure 4.13 shows an example run of the program. For this dump, one can see that the address of the dword to store the sum is BFFFFB80 (at EBP + 12); the number to sum up to is 0000000A (at EBP + 8); the return address for the routine is 08048501 (at EBP + 4); the saved EBP value is BFFFFB88 (at EBP); the value of the local variable is 0 (at EBP - 4); and finally the saved EBX value is 4010648C (at EBP - 8).

The calc_sum function could be rewritten to return the sum as its return value instead of using a pointer parameter. Since the sum is an integral value, the sum should be left in the EAX register. Line 11 of the main5.c file would be changed to:

    sum = calc_sum(n);

Also, the prototype of calc_sum would need to be altered. Below is the modified assembly code:

sub6.asm
    ; subroutine _calc_sum
    ; finds the sum of the integers 1 through n
    ; Parameters:
    ;   n - what to sum up to (at [ebp + 8])
    ; Return value:
    ;   value of sum
    ; pseudo C code:
    ; int calc_sum( int n )
    ; {
    ;   int i, sum = 0;
    ;   for( i=1; i <= n; i++ )
    ;     sum += i;
    ;   return sum;
    ; }
    segment .text
            global  _calc_sum
    ;
    ; local variable:
    ;   sum at [ebp-4]
    _calc_sum:
            enter   4,0               ; make room for sum on stack

            mov     dword [ebp-4],0   ; sum = 0
            mov     ecx, 1            ; ecx is i in pseudocode
    for_loop:
            cmp     ecx, [ebp+8]      ; cmp i and n
            jnle    end_for           ; if not i <= n, quit

            add     [ebp-4], ecx      ; sum += i
            inc     ecx
            jmp     short for_loop

    end_for:
            mov     eax, [ebp-4]      ; eax = sum

            leave
            ret
sub6.asm

    segment .data
    format  db "%d", 0

    segment .text
    ...
            lea     eax, [ebp-16]
            push    eax
            push    dword format
            call    _scanf
            add     esp, 8
    ...

Figure 4.14: Calling scanf from assembly

4.7.8 Calling C functions from assembly

One great advantage of interfacing C and assembly is that it allows assembly code to access the large C library and user-written functions. For example, what if one wanted to call the scanf function to read in an integer from the keyboard? Figure 4.14 shows code to do this. One very important point to remember is that scanf follows the C calling standard to the letter. This means that it preserves the values of the EBX, ESI and EDI registers; however, the EAX, ECX and EDX registers may be modified! In fact, EAX will definitely be changed, as it will contain the return value of the scanf call. For other examples of interfacing with C, look at the code in asm_io.asm which was used to create asm_io.obj.
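The value that scanf leaves in EAX is worth checking, since it reports how many items were successfully converted. The sketch below is an illustration only: parse_int is a hypothetical helper, and it uses sscanf on a string rather than scanf on the keyboard so that it can be tried without typing input.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 and stores the number in *out if
   text begins with an integer, otherwise returns 0 and leaves *out
   untouched. */
int parse_int(const char *text, int *out)
{
    /* sscanf, like scanf, returns the number of items converted */
    return sscanf(text, "%d", out) == 1;
}
```

An assembly caller would make the same test by comparing EAX with 1 immediately after the call, before EAX is reused for anything else.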


4.8 Reentrant and Recursive Subprograms

A reentrant subprogram must satisfy the following properties:

- It must not modify any code instructions. In a high level language this would be difficult, but in assembly it is not hard for a program to try to modify its own code. For example:

        mov    word [cs:$+7], 5      ; copy 5 into the word 7 bytes ahead
        add    ax, 2                 ; previous statement changes 2 to 5!

  This code would work in real mode, but in protected mode operating systems the code segment is marked as read only. When the first line above executes, the program will be aborted on these systems. This type of programming is bad for many reasons. It is confusing, hard to maintain and does not allow code sharing (see below).

- It must not modify global data (such as data in the data and the bss segments). All variables are stored on the stack.

There are several advantages to writing reentrant code.

- A reentrant subprogram can be called recursively.

- A reentrant program can be shared by multiple processes. On many multi-tasking operating systems, if there are multiple instances of a program running, only one copy of the code is in memory. Shared libraries and DLLs (Dynamic Link Libraries) use this idea as well.

- Reentrant subprograms work much better in multi-threaded[5] programs. Windows 9x/NT and most UNIX-like operating systems (Solaris, Linux, etc.) support multi-threaded programs.
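The second property (no modification of global data) can be illustrated in C. This is a minimal sketch with hypothetical function names, not code from the text: the first function keeps its state at a fixed memory location and so is not reentrant, while the second keeps everything in parameters and locals.

```c
#include <assert.h>

/* Not reentrant: the running total lives at a fixed memory location,
   so every caller (and every thread) shares and overwrites it. */
static int shared_total = 0;

int add_shared(int x)
{
    shared_total += x;
    return shared_total;
}

/* Reentrant: all state is passed in by the caller or held on the stack. */
int add_reentrant(int total, int x)
{
    return total + x;
}
```

Calling add_reentrant(0, 5) always gives 5, while the result of add_shared(5) depends on every call made before it.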
4.8.1 Recursive subprograms

These types of subprograms call themselves. The recursion can be either direct or indirect. Direct recursion occurs when a subprogram, say foo, calls itself inside foo's body. Indirect recursion occurs when a subprogram is not called by itself directly, but by another subprogram it calls. For example, subprogram foo could call bar and bar could call foo.

[5] A multi-threaded program has multiple threads of execution. That is, the program itself is multi-tasked.

    ; finds n!
    segment .text
            global _fact
    _fact:
            enter  0,0

            mov    eax, [ebp+8]      ; eax = n
            cmp    eax, 1
            jbe    term_cond         ; if n <= 1, terminate
            dec    eax
            push   eax
            call   _fact             ; eax = fact(n-1)
            pop    ecx               ; answer in eax
            mul    dword [ebp+8]     ; edx:eax = eax * [ebp+8]
            jmp    short end_fact
    term_cond:
            mov    eax, 1
    end_fact:
            leave
            ret

Figure 4.15: Recursive factorial function

Recursive subprograms must have a termination condition. When this condition is true, no more recursive calls are made. If a recursive routine does not have a termination condition or the condition never becomes true, the recursion will never end (much like an infinite loop).

Figure 4.15 shows a function that calculates factorials recursively. It could be called from C with:

    x = fact(3);    /* find 3! */

Figure 4.16 shows what the stack looks like at its deepest point for the above function call.

Figures 4.17 and 4.18 show another more complicated recursive example in C and assembly, respectively. What is the output for f(3)? Note that the ENTER instruction creates a new i on the stack for each recursive call. Thus, each recursive instance of f has its own independent variable i. Defining i as a double word in the data segment would not work the same.
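For comparison, the factorial routine of Figure 4.15 can be mirrored in C. This is only a sketch of the same logic, not code from the text; the unsigned type is an assumption chosen to match the unsigned JBE comparison in the assembly.

```c
#include <assert.h>

/* C counterpart of the recursive factorial routine. */
unsigned fact(unsigned n)
{
    if (n <= 1)                 /* termination condition */
        return 1;
    return n * fact(n - 1);     /* recursive call on a smaller problem */
}
```

Each call receives its own copy of n on the stack, which is what the frames in Figure 4.16 depict.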

    n=3 frame    n(3)
                 Return address
                 Saved EBP
    n=2 frame    n(2)
                 Return address
                 Saved EBP
    n=1 frame    n(1)
                 Return address
                 Saved EBP

Figure 4.16: Stack frames for factorial function

    void f( int x )
    {
      int i;
      for( i=0; i < x; i++ ) {
        printf("%d\n", i);
        f(i);
      }
    }

Figure 4.17: Another example (C version)

4.8.2 Review of C variable storage types

C provides several types of variable storage.

global: These variables are defined outside of any function and are stored at fixed memory locations (in the data or bss segments) and exist from the beginning of the program until the end. By default, they can be accessed from any function in the program; however, if they are declared as static, only the functions in the same module can access them (i.e. in assembly terms, the label is internal, not external).

static: These are local variables of a function that are declared static. (Unfortunately, C uses the keyword static for two different purposes!) These variables are also stored at fixed memory locations (in data or bss), but can only be directly accessed in the functions they are defined in.

    %define i ebp-4
    %define x ebp+8            ; useful macros
    segment .data
    format  db "%d", 10, 0     ; 10 = '\n'
    segment .text
            global _f
            extern _printf
    _f:
            enter  4,0             ; allocate room on stack for i

            mov    dword [i], 0    ; i = 0
    lp:
            mov    eax, [i]        ; is i < x?
            cmp    eax, [x]
            jnl    quit

            push   eax             ; call printf
            push   format
            call   _printf
            add    esp, 8

            push   dword [i]       ; call f
            call   _f
            pop    eax

            inc    dword [i]       ; i++
            jmp    short lp
    quit:
            leave
            ret

Figure 4.18: Another example (assembly version)


automatic: This is the default type for a variable defined inside a function. These variables are allocated on the stack when the function they are defined in is invoked and are deallocated when the function returns. Thus, they do not have fixed memory locations.

register: This keyword asks the compiler to use a register for the data in this variable. This is just a request. The compiler does not have to honor it. If the address of the variable is used anywhere in the program it will not be honored (since registers do not have addresses). Also, only simple integral types can be register values. Structured types can not be; they would not fit in a register! C compilers will often automatically make normal automatic variables into register variables without any hint from the programmer.

volatile: This keyword tells the compiler that the value of the variable may change at any moment. This means that the compiler can not make any assumptions about when the variable is modified. Often a compiler might store the value of a variable in a register temporarily and use the register in place of the variable in a section of code. It can not do these types of optimizations with volatile variables. A common example of a volatile variable would be one that could be altered by two threads of a multi-threaded program. Consider the following code:

     1  x = 10;
     2  y = 20;
     3  z = x;

If x could be altered by another thread, it is possible that the other thread changes x between lines 1 and 3 so that z would not be 10. However, if x was not declared volatile, the compiler might assume that x is unchanged and set z to 10.

Another use of volatile is to keep the compiler from using a register for a variable.
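The difference between the fixed-location storage types and automatic storage can be observed directly in C. The following is a minimal sketch with hypothetical names, not code from the text.

```c
#include <assert.h>

int global_calls = 0;           /* global: fixed location, visible to every function */

int count_with_static(void)
{
    static int calls = 0;       /* static local: fixed location, keeps its value
                                   between calls, visible only in this function */
    calls++;
    global_calls++;
    return calls;
}

int count_with_automatic(void)
{
    int calls = 0;              /* automatic: a new stack slot on every call */
    calls++;
    return calls;
}
```

Successive calls to count_with_static return 1, 2, 3 and so on, while count_with_automatic returns 1 every time because its variable is recreated on each invocation.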


Chapter 5

Arrays

5.1 Introduction

An array is a contiguous block of data in memory, organized as a list. Each element of the list must be the same type and use exactly the same number of bytes of memory for storage. Because of these properties, arrays allow efficient access of the data by its position (or index) in the array. The address of any element can be computed by knowing three facts:

- The address of the first element of the array.

- The number of bytes in each element.

- The index of the element.

It is convenient to consider the index of the first element of the array to be zero (just as in C). It is possible to use other values for the first index, but it complicates the computations.
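The three facts combine into a single computation: address of first element + (bytes per element) x index. A minimal C sketch follows; element_address is a hypothetical helper, not part of the text, and it assumes a zero-based index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper applying the three facts:
   first-element address + element size * index. */
const void *element_address(const void *first, size_t elem_size, size_t index)
{
    return (const char *)first + elem_size * index;
}
```

For an int array a, element_address(a, sizeof a[0], 3) yields the same address as &a[3]; the C compiler performs this scaling automatically, while in assembly it must be written out.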

5.1.1 Defining arrays

Defining arrays in the data and bss segments

To define an initialized array in the data segment, use the normal db, dw, etc. directives. NASM also provides a useful directive named TIMES that can be used to repeat a statement many times without having to duplicate the statements by hand. Figure 5.1 shows several examples of these.

To define an uninitialized array in the bss segment, use the resb, resw, etc. directives. Remember that these directives have an operand that specifies how many units of memory to reserve. Figure 5.1 also shows examples of these types of definitions.

    segment .data
    ; define array of 10 double words initialized to 1,2,..,10
    a1      dd   1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    ; define array of 10 words initialized to 0
    a2      dw   0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ; same as before using TIMES
    a3      times 10 dw 0
    ; define array of bytes with 200 0's and then 100 1's
    a4      times 200 db 0
            times 100 db 1

    segment .bss
    ; define an array of 10 uninitialized double words
    a5      resd  10
    ; define an array of 100 uninitialized words
    a6      resw  100

Figure 5.1: Defining arrays

Defining arrays as local variables on the stack

There is no direct way to define a local array variable on the stack. As before, one computes the total bytes required by all local variables, including arrays, and subtracts this from ESP (either directly or using the ENTER instruction). For example, if a function needed a character variable, two double word integers and a 50 element word array, one would need 1 + 2 x 4 + 50 x 2 = 109 bytes. However, the number subtracted from ESP should be a multiple of four (112 in this case) to keep ESP on a double word boundary. One could arrange the variables inside these 109 bytes in several ways. Figure 5.2 shows two possible ways. The unused part of the first ordering is there to keep the double words on double word boundaries to speed up memory accesses.

    First arrangement            Second arrangement
    EBP - 1     char             EBP - 100   word array
                unused           EBP - 104   dword 1
    EBP - 8     dword 1          EBP - 108   dword 2
    EBP - 12    dword 2          EBP - 109   char
    EBP - 112   word array       EBP - 112   unused

Figure 5.2: Arrangements of the stack

5.1.2 Accessing elements of arrays

There is no [ ] operator in assembly language as in C. To access an element of an array, its address must be computed. Consider the following two array definitions:

    array1  db   5, 4, 3, 2, 1      ; array of bytes
    array2  dw   5, 4, 3, 2, 1      ; array of words

Here are some examples using these arrays:

     1      mov    al, [array1]         ; al = array1[0]
     2      mov    al, [array1 + 1]     ; al = array1[1]
     3      mov    [array1 + 3], al     ; array1[3] = al
     4      mov    ax, [array2]         ; ax = array2[0]
     5      mov    ax, [array2 + 2]     ; ax = array2[1] (NOT array2[2]!)
     6      mov    [array2 + 6], ax     ; array2[3] = ax
     7      mov    ax, [array2 + 1]     ; ax = ??

In line 5, element 1 of the word array is referenced, not element 2. Why? Words are two byte units, so to move to the next element of a word array, one must move two bytes ahead, not one. Line 7 will read one byte from the first element and one from the second. In C, the compiler looks at the type of a pointer in determining how many bytes to move in an expression that uses pointer arithmetic so that the programmer does not have to. However, in assembly, it is up to the programmer to take the size of array elements into account when moving from element to element.
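What line 7 actually loads can be worked out by writing the word array as the bytes it occupies in memory. The sketch below is an illustration, not code from the text: read_word is a hypothetical helper, and the byte table spells out the little-endian layout the 80x86 uses for array2.

```c
#include <assert.h>
#include <stdint.h>

/* array2 from the text (words 5, 4, 3, 2, 1) written out byte by byte
   in little-endian order: low byte first, then high byte. */
static const uint8_t array2_bytes[10] = { 5, 0,  4, 0,  3, 0,  2, 0,  1, 0 };

/* Read a 16-bit little-endian word starting at a byte offset, as
   "mov ax, [array2 + offset]" would. */
uint16_t read_word(unsigned byte_offset)
{
    return (uint16_t)(array2_bytes[byte_offset] |
                      (array2_bytes[byte_offset + 1] << 8));
}
```

With this layout read_word(2) gives 4 (array2[1]), while read_word(1) combines the high byte of element 0 with the low byte of element 1 and gives 0x0400.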

Figure 5.3 shows a code snippet that adds all the elements of array1 in the previous example code. In line 7, AX is added to DX. Why not AL? First, the two operands of the ADD instruction must be the same size. Secondly, it would be easy to add up bytes and get a sum that was too big to fit into a byte. By using DX, sums up to 65,535 are allowed. However, it is important to realize that AH is being added also. This is why AH is set to zero[1] in line 3.

Figures 5.4 and 5.5 show two alternative ways to calculate the sum. In each, the instructions between the lp label and the INC EBX instruction replace lines 6 and 7 of Figure 5.3.

[1] Setting AH to zero is implicitly assuming that AL is an unsigned number. If it is signed, the appropriate action would be to insert a CBW instruction between lines 6 and 7.

     1      mov    ebx, array1      ; ebx = address of array1
     2      mov    dx, 0            ; dx will hold sum
     3      mov    ah, 0            ; ?
     4      mov    ecx, 5
     5  lp:
     6      mov    al, [ebx]        ; al = *ebx
     7      add    dx, ax           ; dx += ax (not al!)
     8      inc    ebx              ; ebx++
     9      loop   lp

Figure 5.3: Summing elements of an array (Version 1)
            mov    ebx, array1      ; ebx = address of array1
            mov    dx, 0            ; dx will hold sum
            mov    ecx, 5
    lp:
            add    dl, [ebx]        ; dl += *ebx
            jnc    next             ; if no carry goto next
            inc    dh               ; inc dh
    next:
            inc    ebx              ; ebx++
            loop   lp

Figure 5.4: Summing elements of an array (Version 2)
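The ideas behind the summing loops of Figures 5.3 to 5.5 can be compared in C. This is a sketch under the assumption that the bytes are unsigned; the function names are hypothetical. The first function widens each byte before adding, as Version 1 does with AH = 0. The second adds into a low byte and propagates the carry into a high byte by hand, as Versions 2 and 3 do with DL and DH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Version 1 idea: zero-extend each byte to 16 bits, then add. */
uint16_t sum_widen(const uint8_t *a, size_t n)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint16_t)(sum + a[i]);
    return sum;
}

/* Versions 2 and 3 idea: add into the low byte and push any carry
   into the high byte. */
uint16_t sum_carry(const uint8_t *a, size_t n)
{
    uint8_t lo = 0, hi = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t old = lo;
        lo = (uint8_t)(lo + a[i]);
        if (lo < old)               /* wrapped around: the add produced a carry */
            hi++;
    }
    return (uint16_t)((hi << 8) | lo);
}
```

Both functions return the same total; the second simply makes explicit what the carry flag does for the assembly programmer.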
5.1.3 More advanced indirect addressing

Not surprisingly, indirect addressing is often used with arrays. The most general form of an indirect memory reference is:

    [ base reg + factor*index reg + constant ]

where:

base reg is one of the registers EAX, EBX, ECX, EDX, EBP, ESP, ESI or EDI.

factor is either 1, 2, 4 or 8. (If 1, factor is omitted.)

index reg is one of the registers EAX, EBX, ECX, EDX, EBP, ESI, EDI. (Note that ESP is not in the list.)

constant is a 32-bit constant. The constant can be a label (or a label expression).

            mov    ebx, array1      ; ebx = address of array1
            mov    dx, 0            ; dx will hold sum
            mov    ecx, 5
    lp:
            add    dl, [ebx]        ; dl += *ebx
            adc    dh, 0            ; dh += carry flag + 0
            inc    ebx              ; ebx++
            loop   lp

Figure 5.5: Summing elements of an array (Version 3)
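The general form of an indirect reference is just an arithmetic expression evaluated by the processor. A small C model is sketched below; effective_address is a hypothetical name, and the function only mimics the arithmetic (with 32-bit wraparound), not any register restrictions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of [base reg + factor*index reg + constant]. */
uint32_t effective_address(uint32_t base, uint32_t factor, uint32_t index,
                           uint32_t constant)
{
    /* only the factors 1, 2, 4 and 8 are allowed */
    assert(factor == 1 || factor == 2 || factor == 4 || factor == 8);
    return base + factor * index + constant;
}
```

For a double word array starting at address 0x1000, element 3 is at effective_address(0x1000, 4, 3, 0), that is 0x100C.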

5.1.4 Example

Here is an example that uses an array and passes it to a function. It uses the array1c.c program (listed below) as a driver, not the driver.c program.
array1.
    %define ARRAY_SIZE 100
    %define NEW_LINE 10

    segment .data
    FirstMsg        db   "First 10 elements of array", 0
    Prompt          db   "Enter index of element to display: ", 0
    SecondMsg       db   "Element %d is %d", NEW_LINE, 0
    ThirdMsg        db   "Elements 20 through 29 of array", 0
    InputFormat     db   "%d", 0

    segment .bss
    array           resd ARRAY_SIZE

    segment .text
            extern  _puts, _printf, _scanf, _dump_line
            global  _asm_main
    _asm_main:
            enter   4,0                 ; local dword variable at EBP - 4
            push    ebx
            push    esi

    ; initialize array to 100, 99, 98, 97, ...

            mov     ecx, ARRAY_SIZE
            mov     ebx, array
    init_loop:
            mov     [ebx], ecx
            add     ebx, 4
            loop    init_loop

            push    dword FirstMsg      ; print out FirstMsg
            call    _puts
            pop     ecx

            push    dword 10
            push    dword array
            call    _print_array        ; print first 10 elements of array
            add     esp, 8

    ; prompt user for element index
    Prompt_loop:
            push    dword Prompt
            call    _printf
            pop     ecx

            lea     eax, [ebp-4]        ; eax = address of local dword
            push    eax
            push    dword InputFormat
            call    _scanf
            add     esp, 8
            cmp     eax, 1              ; eax = return value of scanf
            je      InputOK

            call    _dump_line          ; dump rest of line and start over
            jmp     Prompt_loop         ; if input invalid

    InputOK:
            mov     esi, [ebp-4]
            push    dword [array + 4*esi]
            push    esi
            push    dword SecondMsg     ; print out value of element
            call    _printf
            add     esp, 12

            push    dword ThirdMsg      ; print out elements 20-29
            call    _puts
            pop     ecx

            push    dword 10
            push    dword array + 20*4  ; address of array[20]
            call    _print_array
            add     esp, 8

            pop     esi
            pop     ebx
            mov     eax, 0              ; return back to C
            leave
            ret

    ;
    ; routine _print_array
    ; C-callable routine that prints out elements of a double word array as
    ; signed integers.
    ; C prototype:
    ; void print_array( const int *a, int n);
    ; Parameters:
    ;   a - pointer to array to print out (at ebp+8 on stack)
    ;   n - number of integers to print out (at ebp+12 on stack)

    segment .data
    OutputFormat    db   "%-5d %5d", NEW_LINE, 0

    segment .text
            global  _print_array
    _print_array:
            enter   0,0
            push    esi
            push    ebx

            xor     esi, esi            ; esi = 0
            mov     ecx, [ebp+12]       ; ecx = n
            mov     ebx, [ebp+8]        ; ebx = address of array
    print_loop:
            push    ecx                 ; printf might change ecx!

            push    dword [ebx + 4*esi] ; push array[esi]
            push    esi
            push    dword OutputFormat
            call    _printf
            add     esp, 12             ; remove parameters (leave ecx!)

            inc     esi
            pop     ecx
            loop    print_loop

            pop     ebx
            pop     esi
            leave
            ret
array1.

array1c.c
    #include <stdio.h>

    int asm_main( void );
    void dump_line( void );

    int main()
    {
      int ret_status;
      ret_status = asm_main();
      return ret_status;
    }

    /*
     * function dump_line
     * dumps all chars left in current line from input buffer
     */
    void dump_line()
    {
      int ch;

      while( (ch = getchar()) != EOF && ch != '\n')
        /* null body */ ;
    }
array1c.c

The LEA instruction revisited

The LEA instruction can be used for other purposes than just calculating addresses. A fairly common one is for fast computations. Consider the following:

    lea    ebx, [4*eax + eax]

This effectively stores the value of 5 x EAX into EBX. Using LEA to do this is both easier and faster than using MUL. However, one must realize that the expression inside the square brackets must be a legal indirect address. Thus, for example, this instruction can not be used to multiply by 6 quickly.

5.1.5 Multidimensional Arrays

Multidimensional arrays are not really very different from the plain one dimensional arrays already discussed. In fact, they are represented in memory as just that, a plain one dimensional array.

Two Dimensional Arrays

Not surprisingly, the simplest multidimensional array is a two dimensional one. A two dimensional array is often displayed as a grid of elements. Each element is identified by a pair of indices. By convention, the first index is identified with the row of the element and the second index the column.

Consider an array with three rows and two columns defined as:

    int a[3][2];

The C compiler would reserve room for a 6 (= 2 x 3) integer array and map the elements as follows:

    Index      0        1        2        3        4        5
    Element    a[0][0]  a[0][1]  a[1][0]  a[1][1]  a[2][0]  a[2][1]

What the table attempts to show is that the element referenced as a[0][0] is stored at the beginning of the 6 element one dimensional array. Element a[0][1] is stored in the next position (index 1) and so on. Each row of the two dimensional array is stored contiguously in memory. The last element of a row is followed by the first element of the next row. This is known as the rowwise representation of the array and is how a C/C++ compiler would represent the array.

How does the compiler determine where a[i][j] appears in the rowwise representation? A simple formula will compute the index from i and j. The formula in this case is 2i + j. It's not too hard to see how this formula is derived. Each row is two elements long; so, the first element of row i is at position 2i. Then the position of column j is found by adding j to 2i. This analysis also shows how the formula is generalized to an array with N columns: N x i + j. Notice that the formula does not depend on the number of rows.

    mov    eax, [ebp - 44]          ; ebp - 44 is i's location
    sal    eax, 1                   ; multiply i by 2
    add    eax, [ebp - 48]          ; add j
    mov    eax, [ebp + 4*eax - 40]  ; ebp - 40 is the address of a[0][0]
    mov    [ebp - 52], eax          ; store result into x (at ebp - 52)

Figure 5.6: Assembly for x = a[i][j]

As an example, let us see how gcc compiles the following code (using the array a defined above):

    x = a[i][j];

Figure 5.6 shows the assembly this is translated into. Thus, the compiler essentially converts the code to:

    x = *(&a[0][0] + 2*i + j);

and in fact, the programmer could write it this way with the same result.
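The equivalence between a[i][j] and the flat position 2i + j can be checked with a short C sketch. The helper names are hypothetical and the array contents are made up for illustration; the flat pointer access mirrors the expression shown above.

```c
#include <assert.h>

/* Hypothetical helper: position of element [i][j] in a rowwise
   array with n_cols columns. */
int row_major_index(int n_cols, int i, int j)
{
    return n_cols * i + j;
}

/* Fetch a[i][j] from a 3x2 array through a flat pointer to a[0][0]. */
int fetch_flat(int a[3][2], int i, int j)
{
    const int *flat = &a[0][0];
    return *(flat + row_major_index(2, i, j));
}
```

For any valid i and j, fetch_flat(a, i, j) returns the same value as a[i][j], and the number of rows never enters the calculation.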

There is nothing magical about the choice of the rowwise representation of the array. A columnwise representation would work just as well:

    Index      0        1        2        3        4        5
    Element    a[0][0]  a[1][0]  a[2][0]  a[0][1]  a[1][1]  a[2][1]

In the columnwise representation, each column is stored contiguously. Element [i][j] is stored at position i + 3j. Other languages (FORTRAN, for example) use the columnwise representation. This is important when interfacing code with multiple languages.

Dimensions Above Two

For dimensions above two, the same basic idea is applied. Consider a three dimensional array:

    int b[4][3][2];

This array would be stored like it was four two dimensional arrays each of size [3][2] consecutively in memory. The table below shows how it starts out:

    Index      0           1           2           3           4           5
    Element    b[0][0][0]  b[0][0][1]  b[0][1][0]  b[0][1][1]  b[0][2][0]  b[0][2][1]

    Index      6           7           8           9           10          11
    Element    b[1][0][0]  b[1][0][1]  b[1][1][0]  b[1][1][1]  b[1][2][0]  b[1][2][1]

The formula for computing the position of b[i][j][k] is 6i + 2j + k. The 6 is determined by the size of the [3][2] arrays. In general, for an array dimensioned as a[L][M][N] the position of element a[i][j][k] will be M x N x i + N x j + k. Notice again that the first dimension (L) does not appear in the formula.

For higher dimensions, the same process is generalized. For an n dimensional array of dimensions D_1 to D_n, the position of the element denoted by the indices i_1 to i_n is given by the formula:

$$D_2 \times D_3 \cdots \times D_n \times i_1 + D_3 \times D_4 \cdots \times D_n \times i_2 + \cdots + D_n \times i_{n-1} + i_n$$

or for the über math geek, it can be written more succinctly as:

$$\sum_{j=1}^{n} \left( \prod_{k=j+1}^{n} D_k \right) i_j$$

(This is where you can tell the author was a physics major. Or was the reference to FORTRAN a give-away?)

The first dimension, D_1, does not appear in the formula.

For the columnwise representation, the general formula would be:

$$i_1 + D_1 \times i_2 + \cdots + D_1 \times D_2 \times \cdots \times D_{n-2} \times i_{n-1} + D_1 \times D_2 \times \cdots \times D_{n-1} \times i_n$$

or in über math geek notation:

$$\sum_{j=1}^{n} \left( \prod_{k=1}^{j-1} D_k \right) i_j$$

In this case, it is the last dimension, D_n, that does not appear in the formula.
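Both general formulas can be evaluated without computing the products explicitly, by folding in one dimension at a time. The following C sketch uses hypothetical function names and zero-based C arrays, so dims[0] plays the role of the first dimension and dims[n-1] the last.

```c
#include <assert.h>
#include <stddef.h>

/* Rowwise position: start from the first index and work toward the
   last. dims[0] is only ever multiplied by zero, so it has no effect. */
size_t rowwise_position(const size_t *dims, const size_t *idx, size_t n)
{
    size_t pos = 0;
    for (size_t j = 0; j < n; j++)
        pos = pos * dims[j] + idx[j];
    return pos;
}

/* Columnwise position: start from the last index and work toward the
   first. Here it is dims[n-1] that has no effect. */
size_t columnwise_position(const size_t *dims, const size_t *idx, size_t n)
{
    size_t pos = 0;
    for (size_t j = n; j-- > 0; )
        pos = pos * dims[j] + idx[j];
    return pos;
}
```

For the array b[4][3][2], the rowwise loop computes ((0*4 + i)*3 + j)*2 + k = 6i + 2j + k, so b[1][2][1] lands at position 11, matching the table.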

Passing Multidimensional Arrays

that does

as

Parameters inC

The

rowwise

representation

of

arrays has a direct effect in C


programming. For one dimensional arrays, the
size of the array is not required to compute where
any specific element is located in memory. This is
not true for multidimensional arrays
To access
the elements of these arrays, the compiler must
multidimensional

know all but the first dimension.

apparent

when considering

function that takes

This becomes

a
array as a

the prototype

a multidimensional

of

parameter. The following will not compile:


void

f(

int

information /

[][]);

no dimension


However, the following does compile:

    void f( int a[ ][2] );

Any two dimensional array with two columns can be passed to this function. The first dimension is not required [2].

Do not be confused by a function with this prototype:

    void f( int * a[ ] );

This defines a single dimensional array of integer pointers (which incidentally can be used to create an array of arrays that acts much like a two dimensional array).

For higher dimensional arrays, all but the first dimension of the array must be specified for parameters. For example, a four dimensional array parameter might be passed like:

    void f( int a[ ][4][3][2] );

5.2 Array/String Instructions

The 80x86 family of processors provides several instructions that are designed to work with arrays. These instructions are called string instructions. They use the index registers (ESI and EDI) to perform an operation and then automatically increment or decrement one or both of the index registers. The direction flag (DF) in the FLAGS register determines whether the index registers are incremented or decremented. There are two instructions that modify the direction flag:

CLD  clears the direction flag. In this state, the index registers are incremented.

STD  sets the direction flag. In this state, the index registers are decremented.

A very common mistake in 80x86 programming is to forget to explicitly put the direction flag in the correct state. This often leads to code that works most of the time (when the direction flag happens to be in the desired state), but does not work all the time.

5.2.1 Reading and writing memory

The simplest string instructions either read or write memory or both. They may read or write a byte, word or double word at a time. Figure 5.7 shows these instructions with a short pseudo-code description of what they do. There are several points to notice here. First, ESI is used for reading and EDI for writing. It is easy to remember this if one remembers that SI stands for Source Index and DI stands for Destination Index. Next, notice that the register that holds the data is fixed (either AL, AX or EAX). Finally, note that the storing instructions use ES to determine the segment to write to, not DS. In protected mode programming this is not usually a problem, since there is only one data segment and ES should be automatically initialized to reference it (just as DS is). However, in real mode programming, it is very important for the programmer to initialize ES to the correct segment selector value [3]. Figure 5.8 shows an example use of these instructions that copies an array into another.

    LODSB   AL = [DS:ESI]          STOSB   [ES:EDI] = AL
            ESI = ESI ± 1                  EDI = EDI ± 1
    LODSW   AX = [DS:ESI]          STOSW   [ES:EDI] = AX
            ESI = ESI ± 2                  EDI = EDI ± 2
    LODSD   EAX = [DS:ESI]         STOSD   [ES:EDI] = EAX
            ESI = ESI ± 4                  EDI = EDI ± 4

Figure 5.7: Reading and writing string instructions

     1  segment .data
     2  array1  dd 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
     3
     4  segment .bss
     5  array2  resd 10
     6
     7  segment .text
     8        cld                   ; don't forget this!
     9        mov    esi, array1
    10        mov    edi, array2
    11        mov    ecx, 10
    12  lp:
    13        lodsd
    14        stosd
    15        loop   lp

Figure 5.8: Load and store example

The combination of a LODSx and STOSx instruction (as in lines 13 and 14 of Figure 5.8) is very common. In fact, this combination can be performed by a single MOVSx string instruction. Figure 5.9 describes the operations that these instructions perform. Lines 13 and 14 of Figure 5.8 could be replaced with a single MOVSD instruction with the same effect. The only difference would be that the EAX register would not be used at all in the loop.

    MOVSB   byte [ES:EDI] = byte [DS:ESI]
            ESI = ESI ± 1
            EDI = EDI ± 1
    MOVSW   word [ES:EDI] = word [DS:ESI]
            ESI = ESI ± 2
            EDI = EDI ± 2
    MOVSD   dword [ES:EDI] = dword [DS:ESI]
            ESI = ESI ± 4
            EDI = EDI ± 4

Figure 5.9: Memory move string instructions

[2] A size can be specified here, but it is ignored by the compiler.

[3] Another complication is that one can not copy the value of the DS register into the ES register directly using a single MOV instruction. Instead, the value of DS must be copied to a general purpose register (like AX) and then copied from that register to ES using two MOV instructions.

5.2.2 The REP instruction prefix

The 80x86 family provides a special instruction prefix [4] called REP that can be used with the above string instructions. This prefix tells the CPU to repeat the next string instruction a specified number of times. The ECX register is used to count the iterations (just as for the LOOP instruction). Using the REP prefix, the loop in Figure 5.8 (lines 12 to 15) could be replaced with a single line:

      rep movsd

Figure 5.10 shows another example that zeroes out the contents of an array.

    segment .bss
    array   resd 10

    segment .text
          cld                   ; don't forget this!
          mov    edi, array
          mov    ecx, 10
          xor    eax, eax
          rep stosd

Figure 5.10: Zero array example

[4] An instruction prefix is not an instruction, it is a special byte that is placed before a string instruction that modifies the instruction's behavior. Other prefixes are also used to override segment defaults of memory accesses.

    CMPSB   compares byte [DS:ESI] and byte [ES:EDI]
            ESI = ESI ± 1
            EDI = EDI ± 1
    CMPSW   compares word [DS:ESI] and word [ES:EDI]
            ESI = ESI ± 2
            EDI = EDI ± 2
    CMPSD   compares dword [DS:ESI] and dword [ES:EDI]
            ESI = ESI ± 4
            EDI = EDI ± 4
    SCASB   compares AL and [ES:EDI]
            EDI ± 1
    SCASW   compares AX and [ES:EDI]
            EDI ± 2
    SCASD   compares EAX and [ES:EDI]
            EDI ± 4

Figure 5.11: Comparison string instructions

5.2.3 Comparison string instructions

Figure 5.11 shows several new string instructions that can be used to compare memory with other memory or a register. They are useful for comparing or searching arrays. They set the FLAGS register just like the CMP instruction. The CMPSx instructions compare corresponding memory locations and the SCASx instructions scan memory locations for a specific value.

Figure 5.12 shows a short code snippet that searches for the number 12 in a double word array. The SCASD instruction on line 10 always adds 4 to EDI, even if the value searched for is found. Thus, if one wishes to find the address of the 12 found in the array, it is necessary to subtract 4 from EDI (as line 16 does).

     1  segment .bss
     2  array   resd 100
     3
     4  segment .text
     5        cld
     6        mov    edi, array    ; pointer to start of array
     7        mov    ecx, 100      ; number of elements
     8        mov    eax, 12       ; number to scan for
     9  lp:
    10        scasd
    11        je     found
    12        loop   lp
    13   ; code to perform if not found
    14        jmp    onward
    15  found:
    16        sub    edi, 4        ; edi now points to 12 in array
    17   ; code to perform if found
    18  onward:

Figure 5.12: Search example

5.2.4 The REPx instruction prefixes

There are several other REP-like instruction prefixes that can be used with the comparison string instructions. Figure 5.13 shows the two new prefixes and describes their operation. REPE and REPZ are just synonyms for the same prefix (as are REPNE and REPNZ). If the repeated comparison string instruction stops because of the result of the comparison, the index register or registers are still incremented and ECX decremented; however, the FLAGS register still holds the state that terminated the repetition. Thus, it is possible to use the Z flag to determine if the repeated comparisons stopped because of a comparison or ECX becoming zero.

    REPE, REPZ      repeats instruction while Z flag is set or at most ECX times
    REPNE, REPNZ    repeats instruction while Z flag is cleared or at most ECX times

Figure 5.13: REPx instruction prefixes

(Margin note: Why can one not just look to see if ECX is zero after the repeated comparison?)

Figure 5.14 shows an example code snippet that determines if two blocks of memory are equal. The JE on line 7 of the example checks to see the result of the previous instruction. If the repeated comparison stopped because it found two unequal bytes, the Z flag will still be cleared and no branch is made; however, if the comparisons stopped because ECX became zero, the Z flag will still be set and the code branches to the equal label.

     1  segment .text
     2        cld
     3        mov    esi, block1   ; address of first block
     4        mov    edi, block2   ; address of second block
     5        mov    ecx, size     ; size of blocks in bytes
     6        repe   cmpsb         ; repeat while Z flag is set
     7        je     equal         ; if Z set, blocks equal
     8   ; code to perform if blocks are not equal
     9        jmp    onward
    10  equal:
    11   ; code to perform if equal
    12  onward:

Figure 5.14: Comparing memory blocks

5.2.5 Example

This section contains an assembly source file with several functions that implement array operations using string instructions. Many of the functions duplicate familiar C library functions.

memory.asm

global _asm_copy, _asm_find, _asm_strlen, _asm_strcpy

segment .text
; function _asm_copy
; copies blocks of memory
; C prototype
; void asm_copy( void * dest, const void * src, unsigned sz);
; parameters:
;   dest - pointer to buffer to copy to
;   src  - pointer to buffer to copy from
;   sz   - number of bytes to copy

; next, some helpful symbols are defined

%define dest [ebp+8]
%define src  [ebp+12]
%define sz   [ebp+16]
_asm_copy:
        enter   0, 0
        push    esi
        push    edi

        mov     esi, src        ; esi = address of buffer to copy from
        mov     edi, dest       ; edi = address of buffer to copy to
        mov     ecx, sz         ; ecx = number of bytes to copy

        cld                     ; clear direction flag
        rep     movsb           ; execute movsb ECX times

        pop     edi
        pop     esi
        leave
        ret


; function _asm_find
; searches memory for a given byte
; void * asm_find( const void * src, char target, unsigned sz);
; parameters:
;   src    - pointer to buffer to search
;   target - byte value to search for
;   sz     - number of bytes in buffer
; return value:
;   if target is found, pointer to first occurrence of target in buffer
;   is returned
;   else
;     NULL is returned
; NOTE: target is a byte value, but is pushed on stack as a dword value.
;       The byte value is stored in the lower 8-bits.
;
%define src    [ebp+8]
%define target [ebp+12]
%define sz     [ebp+16]

_asm_find:
        enter   0,0
        push    edi

        mov     eax, target     ; al has value to search for
        mov     edi, src
        mov     ecx, sz
        cld

        repne   scasb           ; scan until ECX == 0 or [ES:EDI] == AL

        je      found_it        ; if zero flag set, then found value
        mov     eax, 0          ; if not found, return NULL pointer
        jmp     short quit
found_it:
        mov     eax, edi
        dec     eax             ; if found return (DI - 1)
quit:
        pop     edi
        leave
        ret


; function _asm_strlen
; returns the size of a string
; unsigned asm_strlen( const char * );
; parameter:
;   src - pointer to string
; return value:
;   number of chars in string (not counting, ending 0) (in EAX)

%define src [ebp + 8]
_asm_strlen:
        enter   0,0
        push    edi

        mov     edi, src        ; edi = pointer to string
        mov     ecx, 0FFFFFFFFh ; use largest possible ECX
        xor     al,al           ; al = 0
        cld

        repnz   scasb           ; scan for terminating 0

;
; repnz will go one step too far, so length is FFFFFFFE - ECX,
; not FFFFFFFF - ECX
;
        mov     eax,0FFFFFFFEh
        sub     eax, ecx        ; length = 0FFFFFFFEh - ecx

        pop     edi
        leave
        ret


; function _asm_strcpy
; copies a string
; void asm_strcpy( char * dest, const char * src);
; parameters:
;   dest - pointer to string to copy to
;   src  - pointer to string to copy from
;
%define dest [ebp + 8]
%define src  [ebp + 12]
_asm_strcpy:
        enter   0,0
        push    esi
        push    edi

        mov     edi, dest
        mov     esi, src
        cld
cpy_loop:
        lodsb                   ; load AL & inc si
        stosb                   ; store AL & inc di
        or      al, al          ; set condition flags
        jnz     cpy_loop        ; if not past terminating 0, continue

        pop     edi
        pop     esi
        leave
        ret

memory.asm

memex.c

#include <stdio.h>

#define STR_SIZE 30
/* prototypes */

void asm_copy( void *, const void *, unsigned ) __attribute__((cdecl));
void * asm_find( const void *, char target, unsigned ) __attribute__((cdecl));
unsigned asm_strlen( const char * ) __attribute__((cdecl));
void asm_strcpy( char *, const char * ) __attribute__((cdecl));

int main()
{
  char st1[STR_SIZE] = "test string";
  char st2[STR_SIZE];
  char * st;
  char ch;

  asm_copy(st2, st1, STR_SIZE);   /* copy all 30 chars of string */
  printf("%s\n", st2);

  printf("Enter a char: ");       /* look for byte in string */
  scanf("%c%*[^\n]", &ch);
  st = asm_find(st2, ch, STR_SIZE);
  if ( st )
    printf("Found it: %s\n", st);
  else
    printf("Not found\n");

  st1[0] = 0;
  printf("Enter string:");
  scanf("%29s", st1);             /* width keeps input within st1 */
  printf("len = %u\n", asm_strlen(st1));

  asm_strcpy( st2, st1);          /* copy meaningful data in string */
  printf("%s\n", st2 );

  return 0;
}

memex.c

Chapter 6

Floating Point

6.1 Floating Point Representation

6.1.1 Non-integral binary numbers

When number systems were discussed in the first chapter, only integer values were discussed. Obviously, it must be possible to represent non-integral numbers in other bases as well as decimal. In decimal, digits to the right of the decimal point have associated negative powers of ten:

$$0.123 = 1 \times 10^{-1} + 2 \times 10^{-2} + 3 \times 10^{-3}$$

Not surprisingly, binary numbers work similarly:

$$0.101_2 = 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} = 0.625$$

This idea can be combined with the integer methods of Chapter 1 to convert a general number:

$$110.011_2 = 4 + 2 + 0.25 + 0.125 = 6.375$$

Converting from decimal to binary is not very difficult either. In general, divide the decimal number into two parts: integer and fraction. Convert the integer part to binary using the methods from Chapter 1. The fractional part is converted using the method described below.

Consider a binary fraction with the bits labeled a, b, c, ... The number in binary then looks like:

    0.abcdef...

Multiply the number by two. The binary representation of the new number will be:

    a.bcdef...

    0.5625 × 2 = 1.125    first bit  = 1
    0.125  × 2 = 0.25     second bit = 0
    0.25   × 2 = 0.5      third bit  = 0
    0.5    × 2 = 1.0      fourth bit = 1

Figure 6.1: Converting 0.5625 to binary

    0.85 × 2 = 1.7
    0.7  × 2 = 1.4
    0.4  × 2 = 0.8
    0.8  × 2 = 1.6
    0.6  × 2 = 1.2
    0.2  × 2 = 0.4
    0.4  × 2 = 0.8
    0.8  × 2 = 1.6

Figure 6.2: Converting 0.85 to binary

Note that the first bit is now in the ones place. Replace the a with 0 to get:

    0.bcdef...

and multiply by two again to get:

    b.cdef...

Now the second bit (b) is in the ones position. This procedure can be repeated until as many bits as are needed are found. Figure 6.1 shows a real example that converts 0.5625 to binary. The method stops when a fractional part of zero is reached.

As another example, consider converting 23.85 to binary. It is easy to convert the integral part ($23 = 10111_2$), but what about the fractional part (0.85)? Figure 6.2 shows the beginning of this calculation. If one looks at the numbers carefully, an infinite loop is found! This means that 0.85 is a repeating binary (as opposed to a repeating decimal in base 10) [1]. There is a pattern to the numbers in the calculation. Looking at the pattern, one can see that $0.85 = 0.11\overline{0110}_2$. Thus, $23.85 = 10111.11\overline{0110}_2$.

One important consequence of the above calculation is that 23.85 can not be represented exactly in binary using a finite number of bits. (Just as 1/3 can not be represented in decimal with a finite number of digits.) As this chapter shows, float and double variables in C are stored in binary. Thus, values like 23.85 can not be stored exactly into these variables. Only an approximation of 23.85 can be stored.

To simplify the hardware, floating point numbers are stored in a consistent format. This format uses scientific notation (but in binary, using powers of two, not ten). For example, 23.85 or $10111.11011001100110\ldots_2$ would be stored as:

$$1.011111011001100110\ldots \times 2^{100}$$

(where the exponent (100) is in binary). A normalized floating point number has the form:

$$1.ssssssssssssssss \times 2^{eeeeeee}$$

where 1.sssssssssssss is the significand and eeeeeeee is the exponent.

6.1.2 IEEE floating point representation

The IEEE (Institute of Electrical and Electronic Engineers) is an international organization that has designed specific binary formats for storing floating point numbers. This format is used on most (but not all!) computers made today. Often it is supported by the hardware of the computer itself. For example, Intel's numeric (or math) coprocessors (which are built into all its CPUs since the Pentium) use it. The IEEE defines two different formats with different precisions: single and double precision. Single precision is used by float variables in C and double precision is used by double variables.

Intel's math coprocessor also uses a third, higher precision called extended precision. In fact, all data in the coprocessor itself is in this precision. When it is stored in memory from the coprocessor it is converted to either single or double precision automatically [2]. Extended precision uses a slightly different general format than the IEEE float and double formats and so will not be discussed here.

[1] It should not be so surprising that a number might repeat in one base, but not another. Think about 1/3: it repeats in decimal, but in ternary (base 3) it would be $0.1_3$.

[2] The long double type of some compilers (such as Borland) uses this extended precision. However, other compilers use double precision for both double and long double. (This is allowed by ANSI C.)

    bit 31: s      bits 30-23: e      bits 22-0: f

    s   sign bit - 0 = positive, 1 = negative
    e   biased exponent (8-bits) = true exponent + 7F (127 decimal). The values 00 and FF have special meaning (see text).
    f   fraction - the first 23-bits after the 1. in the significand.

Figure 6.3: IEEE single precision

IEEE single precision

Single precision floating point uses 32 bits to encode the number. It is usually accurate to 7 significant decimal digits. Floating point numbers are stored in a much more complicated format than integers. Figure 6.3 shows the basic format of an IEEE single precision number. There are several quirks to the format. Floating point numbers do not use the two's complement representation for negative numbers. They use a signed magnitude representation. Bit 31 determines the sign of the number as shown.

The binary exponent is not stored directly. Instead, the sum of the exponent and 7F is stored from bit 23 to 30. This biased exponent is always non-negative.

The fraction part assumes a normalized significand (in the form 1.sssssssss). Since the first bit is always a one, the leading one is not stored! This allows the storage of an additional bit at the end and so increases the precision slightly. This idea is known as the hidden one representation.

How would 23.85 be stored? First, it is positive so the sign bit is 0. Next the true exponent is 4, so the biased exponent is $7F + 4 = 83_{16}$. Finally, the fraction is 01111101100110011001100 (remember the leading one is hidden). Putting this all together (to help clarify the different sections of the floating point format, the sign bit, biased exponent and fraction are separated by vertical bars and the bits have been grouped into 4-bit nibbles):

    0 | 100 0001 1 | 011 1110 1100 1100 1100 1100  (binary)  =  41BECCCC (hex)

This is not exactly 23.85 (since it is a repeating binary). If one converts the above back to decimal, one finds that it is approximately 23.849998474. This number is very close to 23.85, but it is not exact. Actually, in C, 23.85 would not be represented exactly as above. Since the left-most bit that was truncated from the exact representation is 1, the last bit is rounded up to 1. So 23.85 would be represented as 41 BE CC CD in hex using single precision. Converting this to decimal results in 23.850000381 which is a slightly better approximation of 23.85.

(Margin note: One should always keep in mind that the bytes 41 BE CC CD can be interpreted different ways depending on what a program does with them! As a single precision floating point number, they represent 23.850000381, but as a double word integer, they represent 1,103,023,309! The CPU does not know which is the correct interpretation!)

    e = 0  and f = 0     denotes the number zero (which can not be normalized). Note that there is a +0 and -0.
    e = 0  and f ≠ 0     denotes a denormalized number. These are discussed in the next section.
    e = FF and f = 0     denotes infinity (∞). There are both positive and negative infinities.
    e = FF and f ≠ 0     denotes an undefined result, known as NaN (Not a Number).

Table 6.1: Special values of f and e

    bit 63: s      bits 62-52: e      bits 51-0: f

Figure 6.4: IEEE double precision

How would -23.85 be represented? Just change the sign bit: C1 BE CC CD. Do not take the two's complement!

Certain combinations of e and f have special meanings for IEEE floats. Table 6.1 describes these special values. An infinity is produced by an overflow or by division by zero. An undefined result is produced by an invalid operation such as trying to find the square root of a negative number, adding two infinities of opposite sign, etc.

Normalized single precision numbers can range in magnitude from $1.0 \times 2^{-126}$ ($\approx 1.1755 \times 10^{-38}$) to $1.11111\ldots \times 2^{127}$ ($\approx 3.4028 \times 10^{38}$).

Denormalized numbers

Denormalized numbers can be used to represent numbers with magnitudes too small to normalize (i.e. below $1.0 \times 2^{-126}$). For example, consider the number $1.001_2 \times 2^{-129}$ ($\approx 1.6530 \times 10^{-39}$). In the given normalized form, the exponent is too small. However, it can be represented in the unnormalized form: $0.01001_2 \times 2^{-127}$. To store this number, the biased exponent is set to 0 (see Table 6.1) and the fraction is the complete significand of the number written as a product with $2^{-127}$ (i.e. all bits are stored including the one to the left of the decimal point). The representation of $1.001_2 \times 2^{-129}$ is then:

    0 | 000 0000 0 | 001 0010 0000 0000 0000 0000

IEEE double precision

IEEE double precision uses 64 bits to represent numbers and is usually accurate to about 15 significant decimal digits. As Figure 6.4 shows, the basic format is very similar to single precision. More bits are used for the biased exponent (11) and the fraction (52) than for single precision.

The larger range for the biased exponent has two consequences. The first is that it is calculated as the sum of the true exponent and 3FF (1023) (not 7F as for single precision). Secondly, a large range of true exponents (and thus a larger range of magnitudes) is allowed. Double precision magnitudes can range from approximately $10^{-308}$ to $10^{308}$.

It is the larger field of the fraction that is responsible for the increase in the number of significant digits for double values.

As an example, consider 23.85 again. The biased exponent will be 4 + 3FF = 403 in hex. Thus, the double representation would be:

    0 | 100 0000 0011 | 0111 1101 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010

or 40 37 D9 99 99 99 99 9A in hex. If one converts this back to decimal, one finds 23.8500000000000014 (there are 12 zeros!) which is a much better approximation of 23.85.

The double precision has the same special values as single precision [3]. Denormalized numbers are also very similar. The only main difference is that double denormalized numbers use $2^{-1023}$ instead of $2^{-127}$.

6.2 Floating Point Arithmetic

Floating point arithmetic on a computer is different than in continuous mathematics. In mathematics, all numbers can be considered exact. As shown in the previous section, on a computer many numbers can not be represented exactly with a finite number of bits. All calculations are performed with limited precision. In the examples of this section, numbers with an 8-bit significand will be used for simplicity.

6.2.1 Addition

To add two floating point numbers, the exponents must be equal. If they are not already equal, then they must be made equal by shifting the significand of the number with the smaller exponent. For example, consider 10.375 + 6.34375 = 16.71875 or in binary:

$$\begin{array}{r} 1.0100110 \times 2^3 \\ +\; 1.1001011 \times 2^2 \\ \hline \end{array}$$

[3] The only difference is that for the infinity and undefined values, the biased exponent is 7FF not FF.

These two numbers do not have the same exponent so shift the significand to make the exponents the same and then add:

$$\begin{array}{r} 1.0100110 \times 2^3 \\ +\; 0.1100110 \times 2^3 \\ \hline 10.0001100 \times 2^3 \end{array}$$

Note that the shifting of $1.1001011 \times 2^2$ drops off the trailing one and after rounding results in $0.1100110 \times 2^3$. The result of the addition, $10.0001100 \times 2^3$ (or $1.00001100 \times 2^4$) is equal to $10000.110_2$ or 16.75. This is not equal to the exact answer (16.71875)! It is only an approximation due to the round off errors of the addition process.

It is important to realize that floating point arithmetic on a computer (or calculator) is always an approximation. The laws of mathematics do not always work with floating point numbers on a computer. Mathematics assumes infinite precision which no computer can match. For example, mathematics teaches that $(a + b) - b = a$; however, this may not hold true exactly on a computer!
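6.2.2 Subtraction

Subtraction works very similarly and has the same problems as addition. As an example, consider 16.75 − 15.9375 = 0.8125:

$$\begin{array}{r} 1.0000110 \times 2^4 \\ -\; 1.1111111 \times 2^3 \\ \hline \end{array}$$

Shifting $1.1111111 \times 2^3$ gives (rounding up) $1.0000000 \times 2^4$:

$$\begin{array}{r} 1.0000110 \times 2^4 \\ -\; 1.0000000 \times 2^4 \\ \hline 0.0000110 \times 2^4 \end{array}$$

$0.0000110 \times 2^4 = 0.11_2 = 0.75$ which is not exactly correct.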

6.2.3 Multiplication and division

For multiplication, the significands are multiplied and the exponents are added. Consider 10.375 × 2.5 = 25.9375:

$$\begin{array}{r} 1.0100110 \times 2^3 \\ \times\; 1.0100000 \times 2^1 \\ \hline 10100110 \\ +\; 10100110\phantom{00} \\ \hline 1.10011111000000 \times 2^4 \end{array}$$

Of course, the real result would be rounded to 8-bits to give:

$$1.1010000 \times 2^4 = 11010.000_2 = 26$$

Division is more complicated, but has similar problems with round off errors.

6.2.4 Ramifications for programming

The main point of this section is that floating point calculations are not exact. The programmer needs to be aware of this. A common mistake that programmers make with floating point numbers is to compare them assuming that a calculation is exact. For example, consider a function named f(x) that makes a complex calculation and a program is trying to find the function's roots [4]. One might be tempted to use the following statement to check to see if x is a root:

    if ( f(x) == 0.0 )

But, what if f(x) returns $1 \times 10^{-30}$? This very likely means that x is a very good approximation of a true root; however, the equality will be false. There may not be any IEEE floating point value of x that returns exactly zero, due to round off errors in f(x).

A much better method would be to use:

    if ( fabs(f(x)) < EPS )

where EPS is a macro defined to be a very small positive value (like $1 \times 10^{-10}$). This is true whenever f(x) is very close to zero. In general, to compare a floating point value (say x) to another (y) use:

    if ( fabs(x − y)/fabs(y) < EPS )

6.3 The Numeric Coprocessor

6.3.1 Hardware

The earliest Intel processors had no hardware support for floating point operations. This does not mean that they could not perform float operations. It just means that they had to be performed by procedures composed of many non-floating point instructions. For these early systems, Intel did provide an additional chip called a math coprocessor. A math coprocessor has machine instructions that perform many floating point operations much faster than using a software procedure (on early processors, at least 10 times faster!). The coprocessor for the 8086/8088 was called the 8087. For the 80286, there was a 80287 and for the 80386, a 80387. The 80486DX processor integrated the math coprocessor into the 80486 itself [5]. Since the Pentium, all generations of 80x86 processors have a builtin math coprocessor; however, it is still programmed as if it was a separate unit. Even earlier systems without a coprocessor can install software that emulates a math coprocessor. These emulator packages are automatically activated when a program executes a coprocessor instruction and run a software procedure that produces the same result as the coprocessor would have (though much slower, of course).

[4] A root of a function is a value x such that f(x) = 0

The numeric coprocessor has eight floating point registers. Each register holds 80 bits of data. Floating point numbers are always stored as 80-bit extended precision numbers in these registers. The registers are named ST0, ST1, ST2, ... ST7. The floating point registers are used differently than the integer registers of the main CPU. The floating point registers are organized as a stack. Recall that a stack is a Last-In First-Out (LIFO) list. ST0 always refers to the value at the top of the stack. All new numbers are added to the top of the stack. Existing numbers are pushed down on the stack to make room for the new number.

There is also a status register in the numeric coprocessor. It has several flags. Only the 4 flags used for comparisons will be covered: C0, C1, C2 and C3. The use of these is discussed later.

6.3.2 Instructions

To make it easy to distinguish the normal


CPU instructions from copro-cessor

coprocessor

ones, all the

mnemonics start with an F.

Loading and storing

are several instructions that load data


copro-cessor register stack:
FLD source
loads a floating point number
from memory onto the top of
There

onto the top of the

the stack. The

source may

be a single, double

or extended
or a coprocessor register.
reads an integer from memory,

precision number
FILD

source

converts it to floating point

on top of the stack. The


source may be
a word, double word or quad word.
stores a one on the top of the

and stores the result

either
FLD1

stack.

stores

FLDZ

a zero on the top of the

stack.

[5] However, the 80486SX did not have an integrated coprocessor. There was a separate 80487SX chip for these machines.

There are also several instructions that store data from the stack into memory. Some of these instructions also pop (i.e. remove) the number from the stack as it stores it.

FST dest
    stores the top of the stack (ST0) into memory. The destination may either be a single or double precision number or a coprocessor register.

FSTP dest
    stores the top of the stack into memory just as FST; however, after the number is stored, its value is popped from the stack. The destination may either be a single, double or extended precision number or a coprocessor register.
FIST dest
    stores the value of the top of the stack converted to an integer into memory. The destination may either be a word or a double word. The stack itself is unchanged. How the floating point number is converted to an integer depends on some bits in the coprocessor's control word. This is a special (non-floating point) word register that controls how the coprocessor works. By default, the control word is initialized so that it rounds to the nearest integer when it converts to integer. However, the FSTCW (Store Control Word) and FLDCW (Load Control Word) instructions can be used to change this behavior.

FISTP dest
    Same as FIST except for two things. The top of the stack is popped and the destination may also be a quad word.
There are two other instructions that can move or remove data on the stack itself.

FXCH STn
    exchanges the values in ST0 and STn on the stack (where n is a register number from 1 to 7).

FFREE STn
    frees up a register on the stack by marking the register as unused or empty.

Addition and subtraction

Each of the addition instructions computes the sum of ST0 and another operand. The result is always stored in a coprocessor register.

FADD src
    ST0 += src. The src may be any coprocessor register or a single or double precision number in memory.

FADD dest, ST0
    dest += ST0. The dest may be any coprocessor register.

FADDP dest or FADDP dest, ST0
    dest += ST0 then pop stack. The dest may be any coprocessor register.

FIADD src
    ST0 += (float) src. Adds an integer to ST0. The src must be a word or double word in memory.
There are twice as many subtraction instructions as addition instructions because the order of the operands is important for subtraction (i.e. a + b = b + a, but a - b ≠ b - a!). For each instruction, there is an alternate one that subtracts in the reverse order. These reverse instructions all end in either R or RP. Figure 6.5 shows a short code snippet that adds up the elements of an array of doubles. On lines 10 and 13, one must specify the size of the memory operand. Otherwise the assembler would not know whether the memory operand was a float (dword) or a double (qword).

FSUB src
    ST0 -= src. The src may be any coprocessor register or a single or double precision number in memory.

FSUBR src
    ST0 = src - ST0. The src may be any coprocessor register or a single or double precision number in memory.

FSUB dest, ST0
    dest -= ST0. The dest may be any coprocessor register.

FSUBR dest, ST0
    dest = ST0 - dest. The dest may be any coprocessor register.

FSUBP dest or FSUBP dest, ST0
    dest -= ST0 then pop stack. The dest may be any coprocessor register.

FSUBRP dest or FSUBRP dest, ST0
    dest = ST0 - dest then pop stack. The dest may be any coprocessor register.

FISUB src
    ST0 -= (float) src. Subtracts an integer from ST0. The src must be a word or double word in memory.

FISUBR src
    ST0 = (float) src - ST0. Subtracts ST0 from an integer. The src must be a word or double word in memory.

 1  segment .bss
 2  array   resq    SIZE
 3  sum     resq    1
 4
 5  segment .text
 6          mov     ecx, SIZE
 7          mov     esi, array
 8          fldz                    ; ST0 = 0
 9  lp:
10          fadd    qword [esi]     ; ST0 += *(esi)
11          add     esi, 8          ; move to next double
12          loop    lp
13          fstp    qword [sum]     ; store result into sum

Figure 6.5: Array sum example
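For comparison, the loop of Figure 6.5 corresponds roughly to the C function below. The name sum_doubles is chosen for this sketch. Note that in C the pointer type already says that each element is an 8-byte double, which is the information the qword size specifier supplies in the assembly version.

```c
#include <stddef.h>

/* C counterpart of Figure 6.5 (sum_doubles is a name chosen for this sketch).
   The coprocessor accumulates in an 80-bit register, so its result may
   differ from this version in the last bits for long arrays. */
double sum_doubles(const double *array, size_t size) {
    double sum = 0.0;                 /* fldz */
    for (size_t i = 0; i < size; i++)
        sum += array[i];              /* fadd qword [esi] ; add esi, 8 */
    return sum;                       /* fstp qword [sum] */
}
```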

Multiplication and division

The multiplication instructions are completely analogous to the addition instructions.

FMUL src
    ST0 *= src. The src may be any coprocessor register or a single or double precision number in memory.

FMUL dest, ST0
    dest *= ST0. The dest may be any coprocessor register.

FMULP dest or FMULP dest, ST0
    dest *= ST0 then pop stack. The dest may be any coprocessor register.

FIMUL src
    ST0 *= (float) src. Multiplies ST0 by an integer. The src must be a word or double word in memory.

Not surprisingly, the division instructions are analogous to the subtraction instructions. Division by zero results in an infinity.

FDIV src
    ST0 /= src. The src may be any coprocessor register or a single or double precision number in memory.

FDIVR src
    ST0 = src / ST0. The src may be any coprocessor register or a single or double precision number in memory.

FDIV dest, ST0
    dest /= ST0. The dest may be any coprocessor register.

FDIVR dest, ST0
    dest = ST0 / dest. The dest may be any coprocessor register.

FDIVP dest or FDIVP dest, ST0
    dest /= ST0 then pop stack. The dest may be any coprocessor register.

FDIVRP dest or FDIVRP dest, ST0
    dest = ST0 / dest then pop stack. The dest may be any coprocessor register.

FIDIV src
    ST0 /= (float) src. Divides ST0 by an integer. The src must be a word or double word in memory.

FIDIVR src
    ST0 = (float) src / ST0. Divides an integer by ST0. The src must be a word or double word in memory.
Comparisons

The coprocessor also performs comparisons of floating point numbers. The FCOM family of instructions does this operation.

FCOM src
    compares ST0 and src. The src can be a coprocessor register or a float or double in memory.

FCOMP src
    compares ST0 and src, then pops stack. The src can be a coprocessor register or a float or double in memory.

FCOMPP
    compares ST0 and ST1, then pops stack twice.

FICOM src
    compares ST0 and (float) src. The src can be a word or dword integer in memory.

FICOMP src
    compares ST0 and (float) src, then pops stack. The src can be a word or dword integer in memory.

FTST
    compares ST0 and 0.

These instructions change the C0, C1, C2 and C3 bits of the coprocessor status register. Unfortunately, it is not possible for the CPU to access these bits directly. The conditional branch instructions use the FLAGS register, not the coprocessor status register. However, it is relatively simple to transfer the bits of the status word into the corresponding bits of the FLAGS register using some new instructions:

FSTSW dest
    Stores the coprocessor status word into either a word in memory or the AX register.

SAHF
    Stores the AH register into the FLAGS register.

LAHF
    Loads the AH register with the bits of the FLAGS register.

Figure 6.6 shows a short example code snippet. Lines 5 and 6 transfer the C0, C1, C2 and C3 bits of the coprocessor status word into the FLAGS register. The bits are transferred so that they are analogous to the result of a comparison of two unsigned integers. This is why line 7 uses a JNA instruction.

 1  ;       if ( x > y )
 2  ;
 3          fld     qword [x]       ; ST0 = x
 4          fcomp   qword [y]       ; compare ST0 and y
 5          fstsw   ax              ; move C bits into FLAGS
 6          sahf
 7          jna     else_part       ; if x not above y, goto else_part
 8  then_part:
 9          ; code for then part
10          jmp     end_if
11  else_part:
12          ; code for else part
13  end_if:

Figure 6.6: Comparison example
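The transfer performed by FSTSW AX and SAHF can be sketched in C as plain bit manipulation. The bit positions used below (C0 in bit 8, C2 in bit 10 and C3 in bit 14 of the status word; CF in bit 0, PF in bit 2 and ZF in bit 6 of FLAGS) are the standard x86 positions; the macro and function names are invented for this sketch.

```c
#include <stdint.h>

/* Positions of the condition bits in the 16-bit coprocessor status word. */
#define FPU_C0 0x0100u   /* bit 8  */
#define FPU_C2 0x0400u   /* bit 10 */
#define FPU_C3 0x4000u   /* bit 14 */

/* Positions of the matching flags in the low byte of FLAGS. */
#define FLAG_CF 0x01u
#define FLAG_PF 0x04u
#define FLAG_ZF 0x40u

/* Model of FSTSW AX followed by SAHF: AH (the high byte of the status
   word) is copied over the low byte of FLAGS, so C0 lands on CF,
   C2 on PF and C3 on ZF. Only those three flags are returned here. */
uint8_t flags_after_sahf(uint16_t status_word) {
    uint8_t ah = (uint8_t)(status_word >> 8);
    return ah & (FLAG_CF | FLAG_PF | FLAG_ZF);
}

/* JA is taken when CF = 0 and ZF = 0; JNA is taken otherwise. */
int jna_taken(uint8_t flags) {
    return (flags & (FLAG_CF | FLAG_ZF)) != 0;
}
```

A comparison that finds ST0 greater than the source leaves C3, C2 and C0 all clear, so CF and ZF are both 0 and JNA falls through. ST0 less than the source sets C0 (CF) and equality sets C3 (ZF). These are the same flag patterns an unsigned integer CMP would produce, which is why the unsigned jumps are the ones to use.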

The Pentium Pro (and later processors (Pentium II and III)) support two new comparison operators that directly modify the CPU's FLAGS register.

FCOMI src
    compares ST0 and src. The src must be a coprocessor register.

FCOMIP src
    compares ST0 and src, then pops stack. The src must be a coprocessor register.

Figure 6.7 shows an example subroutine that finds the maximum of two doubles using the FCOMIP instruction. Do not confuse these instructions with the integer comparison functions (FICOM and FICOMP).

Miscellaneous instructions

This section covers some other miscellaneous instructions that the coprocessor provides.

FCHS
    ST0 = -ST0. Changes the sign of ST0.

FABS
    ST0 = |ST0|. Takes the absolute value of ST0.

FSQRT
    ST0 = sqrt(ST0). Takes the square root of ST0.

FSCALE
    ST0 = ST0 * 2^floor(ST1). Multiplies ST0 by a power of 2 quickly. ST1 is not removed from the coprocessor stack. Figure 6.8 shows an example of how to use this instruction.
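The effect of FSCALE can be imitated in C with repeated doubling or halving. The helper fscale_sketch below is hypothetical and is meant only to show the arithmetic. It discards the fractional part of ST1 by truncating toward zero, which agrees with the floor written above whenever ST1 is not negative.

```c
/* Sketch of FSCALE's effect: st0 * 2^n, where n is st1 with its
   fractional part discarded. fscale_sketch is a hypothetical helper. */
double fscale_sketch(double st0, double st1) {
    int n = (int)st1;              /* C conversion truncates toward zero */
    while (n > 0) { st0 *= 2.0; n--; }
    while (n < 0) { st0 /= 2.0; n++; }
    return st0;
}
```

With the operands of Figure 6.8, fscale_sketch(2.75, 5.0) gives 2.75 * 32 = 88.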

6.3.3 Examples

6.3.4 Quadratic formula

The first example shows how the quadratic formula can be encoded in assembly. Recall that the quadratic formula computes the solutions to the quadratic equation:

$$ax^2 + bx + c = 0$$

The formula itself gives two solutions for x: x1 and x2.

$$x_1, x_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

The expression inside the square root ($b^2 - 4ac$) is called the discriminant. Its value is useful in determining which of the following three possibilities are true for the solutions.

1. There is only one real degenerate solution. $b^2 - 4ac = 0$

2. There are two real solutions. $b^2 - 4ac > 0$

3. There are two complex solutions. $b^2 - 4ac < 0$

Here is a small C program that uses the assembly routine:

quadt.c

#include <stdio.h>

int quadratic( double, double, double, double *, double * );

int main()
{
  double a, b, c, root1, root2;

  printf("Enter a, b, c: ");
  scanf("%lf %lf %lf", &a, &b, &c);
  if (quadratic( a, b, c, &root1, &root2) )
    printf("roots: %.10g %.10g\n", root1, root2);
  else
    printf("No real roots\n");
  return 0;
}

Here is the assembly routine:

quad.asm

; function quadratic
; finds solutions to the quadratic equation:
;       a*x^2 + b*x + c = 0
; C prototype:
;   int quadratic( double a, double b, double c,
;                  double *root1, double *root2 )
; Parameters:
;   a, b, c - coefficients of powers of quadratic equation (see above)
;   root1   - pointer to double to store first root in
;   root2   - pointer to double to store second root in
; Return value:
;   returns 1 if real roots found, else 0

%define a               qword [ebp+8]
%define b               qword [ebp+16]
%define c               qword [ebp+24]
%define root1           dword [ebp+32]
%define root2           dword [ebp+36]
%define disc            qword [ebp-8]
%define one_over_2a     qword [ebp-16]

segment .data
MinusFour       dw      -4

segment .text
        global  _quadratic
_quadratic:
        push    ebp
        mov     ebp, esp
        sub     esp, 16             ; allocate 2 doubles (disc & one_over_2a)
        push    ebx                 ; must save original ebx

        fild    word [MinusFour]    ; stack: -4
        fld     a                   ; stack: a, -4
        fld     c                   ; stack: c, a, -4
        fmulp   st1                 ; stack: a*c, -4
        fmulp   st1                 ; stack: -4*a*c
        fld     b
        fld     b                   ; stack: b, b, -4*a*c
        fmulp   st1                 ; stack: b*b, -4*a*c
        faddp   st1                 ; stack: b*b - 4*a*c
        ftst                        ; test with 0
        fstsw   ax
        sahf
        jb      no_real_solutions   ; if disc < 0, no real solutions
        fsqrt                       ; stack: sqrt(b*b - 4*a*c)
        fstp    disc                ; store and pop stack
        fld1                        ; stack: 1.0
        fld     a                   ; stack: a, 1.0
        fscale                      ; stack: a * 2^(1.0) = 2*a, 1
        fdivp   st1                 ; stack: 1/(2*a)
        fst     one_over_2a         ; stack: 1/(2*a)
        fld     b                   ; stack: b, 1/(2*a)
        fld     disc                ; stack: disc, b, 1/(2*a)
        fsubrp  st1                 ; stack: disc - b, 1/(2*a)
        fmulp   st1                 ; stack: (-b + disc)/(2*a)
        mov     ebx, root1
        fstp    qword [ebx]         ; store in *root1
        fld     b                   ; stack: b
        fld     disc                ; stack: disc, b
        fchs                        ; stack: -disc, b
        fsubrp  st1                 ; stack: -disc - b
        fmul    one_over_2a         ; stack: (-b - disc)/(2*a)
        mov     ebx, root2
        fstp    qword [ebx]         ; store in *root2
        mov     eax, 1              ; return value is 1
        jmp     short quit

no_real_solutions:
        mov     eax, 0              ; return value is 0

quit:
        pop     ebx
        mov     esp, ebp
        pop     ebp
        ret
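The case analysis on the discriminant that the routine relies on can be written out in C. The function count_real_roots is a name invented for this sketch. Unlike the assembly routine, which only distinguishes "real roots" from "no real roots", it separates the degenerate case as well.

```c
/* Returns the number of distinct real solutions of a*x^2 + b*x + c = 0
   (assuming a != 0), decided only from the sign of the discriminant.
   count_real_roots is a name invented for this sketch. */
int count_real_roots(double a, double b, double c) {
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return 0;   /* two complex solutions, no real ones */
    if (disc == 0.0) return 1;  /* one real degenerate solution */
    return 2;                   /* two real solutions */
}
```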

6.3.5 Reading array from file

In this example, an assembly routine reads doubles from a file. Here is a short C test program:

readt.c

/*
 * This program tests the 32-bit read_doubles() assembly procedure.
 * It reads the doubles from stdin. (Use redirection to read from file.)
 */
#include <stdio.h>
extern int read_doubles( FILE *, double *, int );
#define MAX 100

int main()
{
  int i, n;
  double a[MAX];

  n = read_doubles(stdin, a, MAX);

  for( i = 0; i < n; i++ )
    printf("%3d %g\n", i, a[i]);
  return 0;
}

Here is the assembly routine:

read.asm

segment .data
format  db      "%lf", 0                ; format for fscanf()

segment .text
        global  _read_doubles
        extern  _fscanf

%define SIZEOF_DOUBLE   8
%define FP              dword [ebp + 8]
%define ARRAYP          dword [ebp + 12]
%define ARRAY_SIZE      dword [ebp + 16]
%define TEMP_DOUBLE     [ebp - 8]

;
; function _read_doubles
; C prototype:
;   int read_doubles( FILE *fp, double *arrayp, int array_size );
; This function reads doubles from a text file into an array, until
; EOF or array is full.
; Parameters:
;   fp         - FILE pointer to read from (must be open for input)
;   arrayp     - pointer to double array to read into
;   array_size - number of elements in array
; Return value:
;   number of doubles stored into array (in EAX)

_read_doubles:
        push    ebp
        mov     ebp, esp
        sub     esp, SIZEOF_DOUBLE      ; define one double on stack

        push    esi                     ; save esi
        mov     esi, ARRAYP             ; esi = ARRAYP
        xor     edx, edx                ; edx = array index (initially 0)

while_loop:
        cmp     edx, ARRAY_SIZE         ; is edx < ARRAY_SIZE?
        jnl     short quit              ; if not, quit loop
;
; call fscanf() to read a double into TEMP_DOUBLE
; fscanf() might change edx so save it
;
        push    edx                     ; save edx
        lea     eax, TEMP_DOUBLE
        push    eax                     ; push &TEMP_DOUBLE
        push    dword format            ; push &format
        push    FP                      ; push file pointer
        call    _fscanf
        add     esp, 12
        pop     edx                     ; restore edx
        cmp     eax, 1                  ; did fscanf return 1?
        jne     short quit              ; if not, quit loop

;
; copy TEMP_DOUBLE into ARRAYP[edx]
; (The 8-bytes of the double are copied by two 4-byte copies)
;
        mov     eax, [ebp - 8]
        mov     [esi + 8*edx], eax      ; first copy lowest 4 bytes
        mov     eax, [ebp - 4]
        mov     [esi + 8*edx + 4], eax  ; next copy highest 4 bytes

        inc     edx
        jmp     while_loop

quit:
        pop     esi                     ; restore esi

        mov     eax, edx                ; store return value into eax

        mov     esp, ebp
        pop     ebp
        ret

6.3.6 Finding primes

This final example looks at finding prime numbers again. This implementation is more efficient than the previous one. It stores the primes it has found in an array and only divides by the previous primes it has found instead of every odd number to find new primes.

One other difference is that it computes the square root of the guess for the next prime to determine at what point it can stop searching for factors. It alters the coprocessor control word so that when it stores the square root as an integer, it truncates instead of rounding. This is controlled by bits 10 and 11 of the control word. These bits are called the RC (Rounding Control) bits. If they are both 0 (the default), the coprocessor rounds when converting to integer. If they are both 1, the coprocessor truncates integer conversions. Notice that the routine is careful to save the original control word and restore it before it returns.
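The manipulation of the RC bits is ordinary masking, which the following C sketch spells out. The names RC_MASK, rc_bits and with_truncation are invented here. The sample value 0x037F used in the note below is the control word that the FINIT instruction loads and is included only as an illustrative input.

```c
#include <stdint.h>

#define RC_MASK 0x0C00u   /* bits 10 and 11 of the control word */

/* Rounding Control field as a number from 0 to 3. */
unsigned rc_bits(uint16_t control_word) {
    return (control_word & RC_MASK) >> 10;
}

/* Same operation as "or ax, 0C00h": set both RC bits (truncate). */
uint16_t with_truncation(uint16_t control_word) {
    return (uint16_t)(control_word | RC_MASK);
}
```

For the sample word 0x037F, rc_bits gives 0 (round to nearest) and with_truncation gives 0x0F7F, whose RC field is 3. All the other bits of the control word are left as they were, which is why the routine modifies a copy of the original word instead of loading a constant.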


Here is the C driver program:

fprime.c

#include <stdio.h>
#include <stdlib.h>
/*
 * function find_primes
 * finds the indicated number of primes
 * Parameters:
 *   a - array to hold primes
 *   n - how many primes to find
 */
extern void find_primes( int *a, unsigned n );

int main()
{
  int status;
  unsigned i;
  unsigned max;
  int *a;

  printf("How many primes do you wish to find? ");
  scanf("%u", &max);

  a = calloc( max, sizeof(int) );

  if ( a ) {

    find_primes(a, max);

    /* print out the last 20 primes found */
    for( i = ( max > 20 ) ? max - 20 : 0; i < max; i++ )
      printf("%3u %d\n", i+1, a[i]);

    free(a);
    status = 0;
  }
  else {
    fprintf(stderr, "Can not create array of %u ints\n", max);
    status = 1;
  }

  return status;
}

Here is the assembly routine:

prime2.asm

segment .text
        global  _find_primes
;
; function find_primes
; finds the indicated number of primes
; Parameters:
;   array  - array to hold primes
;   n_find - how many primes to find
; C Prototype:
;extern void find_primes( int *array, unsigned n_find )
;
%define array           ebp + 8
%define n_find          ebp + 12
%define n               ebp - 4         ; number of primes found so far
%define isqrt           ebp - 8         ; floor of sqrt of guess
%define orig_cntl_wd    ebp - 10        ; original control word
%define new_cntl_wd     ebp - 12        ; new control word

_find_primes:
        enter   12,0                    ; make room for local variables

        push    ebx                     ; save possible register variables
        push    esi

        fstcw   word [orig_cntl_wd]     ; get current control word
        mov     ax, [orig_cntl_wd]
        or      ax, 0C00h               ; set rounding bits to 11 (truncate)
        mov     [new_cntl_wd], ax
        fldcw   word [new_cntl_wd]

        mov     esi, [array]            ; esi points to array
        mov     dword [esi], 2          ; array[0] = 2
        mov     dword [esi + 4], 3      ; array[1] = 3
        mov     ebx, 5                  ; ebx = guess = 5
        mov     dword [n], 2            ; n = 2
;
; This outer loop finds a new prime each iteration, which it adds to the
; end of the array. Unlike the earlier prime finding program, this function
; does not determine primeness by dividing by all odd numbers. It only
; divides by the prime numbers that it has already found. (That's why they
; are stored in the array.)
;
while_limit:
        mov     eax, [n]
        cmp     eax, [n_find]           ; while ( n < n_find )
        jnb     short quit_limit

        mov     ecx, 1                  ; ecx is used as array index
        push    ebx                     ; store guess on stack
        fild    dword [esp]             ; load guess onto coprocessor stack
        pop     ebx                     ; get guess off stack
        fsqrt                           ; find sqrt(guess)
        fistp   dword [isqrt]           ; isqrt = floor(sqrt(guess))
;
; This inner loop divides guess (ebx) by earlier computed prime numbers
; until it finds a prime factor of guess (which means guess is not prime)
; or until the prime number to divide is greater than floor(sqrt(guess))
;
while_factor:
        mov     eax, dword [esi + 4*ecx]    ; eax = array[ecx]
        cmp     eax, [isqrt]                ; while ( array[ecx] <= isqrt
        jnbe    short quit_factor_prime
        mov     eax, ebx
        xor     edx, edx
        div     dword [esi + 4*ecx]
        or      edx, edx                    ; && guess % array[ecx] != 0 )
        jz      short quit_factor_not_prime
        inc     ecx                         ; try next prime
        jmp     short while_factor

;
; found a new prime !
;
quit_factor_prime:
        mov     eax, [n]
        mov     dword [esi + 4*eax], ebx    ; add guess to end of array
        inc     eax
        mov     [n], eax                    ; inc n

quit_factor_not_prime:
        add     ebx, 2                      ; try next odd number
        jmp     short while_limit

quit_limit:

        fldcw   word [orig_cntl_wd]         ; restore control word
        pop     esi                         ; restore register variables
        pop     ebx

        leave
        ret
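For readers who want to check the logic of the routine separately from the coprocessor details, here is a C sketch of the same strategy. The function find_primes_c is not part of the original program. It replaces the FSQRT and FISTP pair with the integer test p * p <= guess, which stops at the same point without needing a square root, and it assumes n is at least 2 because the first two primes are stored unconditionally.

```c
#include <assert.h>

/* C sketch of the same strategy as find_primes: trial division by the
   primes already stored, stopping once a prime exceeds sqrt(guess).
   The square root test is written as p * p <= guess to stay in integers;
   the assembly routine uses FSQRT and FISTP instead. Requires n >= 2. */
void find_primes_c(int *a, unsigned n) {
    assert(n >= 2);
    a[0] = 2;
    a[1] = 3;
    unsigned found = 2;
    int guess = 5;
    while (found < n) {
        int is_prime = 1;
        /* start at a[1]: guess is odd, so dividing by 2 is pointless */
        for (unsigned i = 1; a[i] * a[i] <= guess; i++) {
            if (guess % a[i] == 0) { is_prime = 0; break; }
        }
        if (is_prime)
            a[found++] = guess;
        guess += 2;               /* try next odd number */
    }
}
```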

global _dmax

segment .text
; function _dmax
; returns the larger of its two double arguments
; C prototype
; double dmax( double d1, double d2 )
; Parameters:
;   d1 - first double
;   d2 - second double
; Return value:
;   larger of d1 and d2 (in ST0)
%define d1   ebp+8
%define d2   ebp+16
_dmax:
        enter   0, 0

        fld     qword [d2]
        fld     qword [d1]          ; ST0 = d1, ST1 = d2
        fcomip  st1                 ; ST0 = d2
        jna     short d2_bigger
        fcomp   st0                 ; pop d2 from stack
        fld     qword [d1]          ; ST0 = d1
        jmp     short exit
d2_bigger:                          ; if d2 is max, nothing to do
exit:
        leave
        ret

Figure 6.7: FCOMIP example

segment .data
x       dq      2.75            ; converted to double format
five    dd      5               ; dword, to match the fild dword below

segment .text
        fild    dword [five]    ; ST0 = 5
        fld     qword [x]       ; ST0 = 2.75, ST1 = 5
        fscale                  ; ST0 = 2.75 * 32, ST1 = 5

Figure 6.8: FSCALE example

Chapter 7

Structures and C++

7.1 Structures

7.1.1 Introduction

Structures are used in C to group together related data into a composite variable. This technique has several advantages:

1. It clarifies the code by showing that the data defined in the structure are intimately related.

2. It simplifies passing the data to functions. Instead of passing multiple variables separately, they can be passed as a single unit.

3. It increases the locality[1] of the code.

From the assembly standpoint, a structure can be considered as an array with elements of varying size. The elements of real arrays are always the same size and type. This property is what allows one to calculate the address of any element by knowing the starting address of the array, the size of the elements and the desired element's index.

A structure's elements do not have to be the same size (and usually are not). Because of this each element of a structure must be explicitly specified and is given a tag (or name) instead of a numerical index.

In assembly, the element of a structure will be accessed in a similar way as an element of an array. To access an element, one must know the starting address of the structure and the relative offset of that element from the beginning of the structure. However, unlike an array where this offset can be calculated by the index of the element, the element of a structure is assigned an offset by the compiler.

[1] See the virtual memory management section of any Operating System text book for discussion of this term.

For example, consider the following structure:

struct S {
  short int x;    /* 2-byte integer */
  int       y;    /* 4-byte integer */
  double    z;    /* 8-byte float   */
};

Offset  Element
0       x
2       y
6       z

Figure 7.1: Structure S

Offset  Element
0       x
2       unused
4       y
8       z

Figure 7.2: Structure S

Figure 7.1 shows how a variable of type S might look in the computer's memory. The ANSI C standard states that the elements of a structure are arranged in the memory in the same order as they are defined in the struct definition. It also states that the first element is at the very beginning of the structure (i.e. offset zero). It also defines another useful macro in the stddef.h header file named offsetof(). This macro computes and returns the offset of any element of a structure. The macro takes two parameters, the first is the name of the type of the structure, the second is the name of the element to find the offset of. Thus, the result of offsetof(S, y) would be 2 from Figure 7.1.

struct S {
  short int x;    /* 2-byte integer */
  int       y;    /* 4-byte integer */
  double    z;    /* 8-byte float   */
} __attribute__((packed));

Figure 7.3: Packed struct using gcc

7.1.2 Memory alignment

If one uses the offsetof macro to find the offset of y using the gcc compiler, they will find that it returns 4, not 2! Why? Because gcc (and many other compilers) align variables on double word boundaries by default. (Recall that an address is on a double word boundary if it is divisible by 4.) In 32-bit protected mode, the CPU reads memory faster if the data starts at a double word boundary. Figure 7.2 shows how the S structure really looks using gcc. The compiler inserts two unused bytes into the structure to align y (and z) on a double word boundary. This shows why it is a good idea to use offsetof to compute the offsets instead of calculating them oneself when using structures defined in C.
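A short C sketch of how offsetof can be used to ask the compiler for the real offsets is shown below. The wrapper functions offset_of_x, offset_of_y and offset_of_z are names made up for this example. Since S is declared here without a typedef, the type must be written as struct S in C.

```c
#include <stddef.h>

struct S {
    short int x;   /* 2-byte integer on the compilers discussed here */
    int       y;   /* 4-byte integer */
    double    z;   /* 8-byte float */
};

/* Offsets as the compiler actually laid them out. */
size_t offset_of_x(void) { return offsetof(struct S, x); }
size_t offset_of_y(void) { return offsetof(struct S, y); }
size_t offset_of_z(void) { return offsetof(struct S, z); }
```

The only value the standard fixes is offset_of_x(), which is 0. The others depend on the compiler's alignment rules, so printing them (for example with printf("%zu\n", offset_of_y());) is the dependable way to learn which of Figure 7.1 or Figure 7.2 a given compiler produces.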

Of course, if the structure is only used in assembly, the programmer can determine the offsets himself. However, if one is interfacing C and assembly, it is very important that both the assembly code and the C code agree on the offsets of the elements of the structure! One complication is that different C compilers may give different offsets to the elements. For example, as we have seen, the gcc compiler creates an S structure that looks like Figure 7.2; however, Borland's compiler would create a structure that looks like Figure 7.1. C compilers provide ways to specify the alignment used for data. However, the ANSI C standard does not specify how this will be done and thus, different compilers do it differently.

The gcc compiler has a flexible and complicated method of specifying the alignment. The compiler allows one to specify the alignment of any type using a special syntax. For example, the following line:

typedef short int unaligned_int __attribute__((aligned(1)));

defines a new type named unaligned_int that is aligned on byte boundaries. (Yes, all the parentheses after __attribute__ are required!) The 1 in the aligned parameter can be replaced with other powers of two to specify other alignments. (2 for word alignment, 4 for double word alignment, etc.) If the y element of the structure was changed to be an unaligned_int type, gcc would put y at offset 2. However, z would still be at offset 8 since doubles are also double word aligned by default. The definition of z's type would have to be changed as well for it to be put at offset 6.

#pragma pack(push)    /* save alignment state */
#pragma pack(1)       /* set byte alignment */

struct S {
  short int x;    /* 2-byte integer */
  int       y;    /* 4-byte integer */
  double    z;    /* 8-byte float   */
};

#pragma pack(pop)     /* restore original alignment */

Figure 7.4: Packed struct using Microsoft or Borland

The gcc compiler also allows one to pack a structure. This tells the compiler to use the minimum possible space for the structure. Figure 7.3 shows how S could be rewritten this way. This form of S would use the minimum bytes possible, 14 bytes.

Microsoft's and Borland's compilers both support the same method of specifying alignment using a #pragma directive.

#pragma pack(1)

The directive above tells the compiler to pack elements of structures on byte boundaries (i.e., with no extra padding). The one can be replaced with two, four, eight or sixteen to specify alignment on word, double word, quad word and paragraph boundaries, respectively. The directive stays in effect until overridden by another directive. This can cause problems since these directives are often used in header files. If the header file is included before other header files with structures, these structures may be laid out differently than they would by default. This can lead to a very hard to find error. Different modules of a program might lay out the elements of the structures in different places!

There is a way to avoid this problem. Microsoft and Borland support a way to save the current alignment state and restore it later. Figure 7.4 shows how this would be done.
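The effect of the save, pack and restore sequence can be checked with sizeof and offsetof. The sketch below assumes a compiler that accepts the #pragma pack forms of Figure 7.4 (current versions of gcc and clang accept them as well as Microsoft's compiler). The structure and function names are invented for the example.

```c
#include <stddef.h>

#pragma pack(push)   /* save alignment state */
#pragma pack(1)      /* set byte alignment */
struct packed_S {
    short int x;
    int       y;
    double    z;
};
#pragma pack(pop)    /* restore original alignment */

/* Declared after the pop, so it gets the compiler's default layout. */
struct default_S {
    short int x;
    int       y;
    double    z;
};

size_t packed_size(void)     { return sizeof(struct packed_S); }
size_t default_size(void)    { return sizeof(struct default_S); }
size_t packed_y_offset(void) { return offsetof(struct packed_S, y); }
```

With 2-byte short, 4-byte int and 8-byte double, packed_size() is the 14 bytes mentioned above and packed_y_offset() is 2, while default_size() is larger because of the padding. Because the pop comes before default_S is declared, the packing does not leak into later declarations, which is the problem the save and restore pair is meant to prevent.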

7.1.3 Bit Fields

Bit fields allow one to specify members of a struct that only use a specified number of bits. The size of bits does not have to be a multiple of eight. A bit field member is defined like an unsigned int or int member with a colon and bit size appended to it. Figure 7.5 shows an example.

struct S {
  unsigned f1 : 3;     /* 3-bit field  */
  unsigned f2 : 10;    /* 10-bit field */
  unsigned f3 : 11;    /* 11-bit field */
  unsigned f4 : 8;     /* 8-bit field  */
};

Figure 7.5: Bit Field Example

This defines a 32-bit variable that is decomposed in the following parts:

| 8 bits | 11 bits | 10 bits | 3 bits |
|   f4   |   f3    |   f2    |   f1   |

The first bitfield is assigned to the least significant bits of its double word.[2]

However, the format is not so simple if one looks at how the bits are actually stored in memory. The difficulty occurs when bitfields span byte boundaries, because the bytes on a little endian processor will be reversed in memory. For example, the S struct bitfields will look like this in memory:

| 5 bits | 3 bits || 3 bits | 5 bits || 8 bits || 8 bits |
|  f2l   |   f1   ||  f3l   |  f2m   ||  f3m   ||   f4   |

The f2l label refers to the last five bits (i.e., the five least significant bits) of the f2 bit field. The f2m label refers to the five most significant bits of f2. The double vertical lines show the byte boundaries. If one reverses all the bytes, the pieces of the f2 and f3 fields will be reunited in the correct place.

The physical memory layout is not usually important unless the data is being transferred in or out of the program (which is actually quite common with bit fields). It is common for hardware device interfaces to use odd numbers of bits that bitfields could be useful to represent.

Byte \ Bit |  7   6   5  |  4   3   2   1   0
    0      |       Operation Code (08h)
    1      | Logical Unit # |   msb of LBA
    2      |   middle of Logical Block Address
    3      |    lsb of Logical Block Address
    4      |          Transfer Length
    5      |              Control

Figure 7.6: SCSI Read Command Format

[2] Actually, the ANSI/ISO C standard gives the compiler some flexibility in exactly how the bits are laid out. However, common C compilers (gcc, Microsoft and Borland) will lay the fields out like this.
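The 32-bit layout of the S bit fields in Figure 7.5 (f1 in the least significant bits, f4 in the most significant) can also be reproduced with explicit shifts and masks. This is a sketch under that layout assumption; the helper names are made up for the example.

```c
#include <stdint.h>

/* Field positions for the S bit field layout, with f1 in the least
   significant bits: f1 = bits 0-2, f2 = bits 3-12, f3 = bits 13-23,
   f4 = bits 24-31. The helper names are made up for this sketch. */
unsigned get_f1(uint32_t w) { return  w        & 0x7u;   }
unsigned get_f2(uint32_t w) { return (w >> 3)  & 0x3FFu; }
unsigned get_f3(uint32_t w) { return (w >> 13) & 0x7FFu; }
unsigned get_f4(uint32_t w) { return (w >> 24) & 0xFFu;  }

/* Combine four field values into one 32-bit word. */
uint32_t pack_fields(unsigned f1, unsigned f2, unsigned f3, unsigned f4) {
    return  ((uint32_t)f1 & 0x7u)
         | (((uint32_t)f2 & 0x3FFu) << 3)
         | (((uint32_t)f3 & 0x7FFu) << 13)
         | (((uint32_t)f4 & 0xFFu)  << 24);
}
```

Unlike the bit field version, the positions here are fixed by the code itself and not by the compiler, so every compiler produces the same 32-bit value.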

 1  #define MS_OR_BORLAND (defined(__BORLANDC__) \
 2                          || defined(_MSC_VER))
 3
 4  #if MS_OR_BORLAND
 5  #  pragma pack(push)
 6  #  pragma pack(1)
 7  #endif
 8
 9  struct SCSI_read_cmd {
10    unsigned opcode : 8;
11    unsigned lba_msb : 5;
12    unsigned logical_unit : 3;
13    unsigned lba_mid : 8;    /* middle bits */
14    unsigned lba_lsb : 8;
15    unsigned transfer_length : 8;
16    unsigned control : 8;
17  }
18  #if defined(__GNUC__)
19     __attribute__((packed))
20  #endif
21  ;
22
23  #if MS_OR_BORLAND
24  #  pragma pack(pop)
25  #endif

Figure 7.7: SCSI Read Command Format Structure

One example is SCSI

3.
A direct read command

for a SCSI device is spec-ified by sending

message to the

a six byte

device in the format specified in

Figure 7.6. The diffi culty representing

this using

bitfields is the logical block address which


different

7.6,

bytes of the command.

one sees

spans

that the data is stored

endian format.

Figure

From Figure

7.7 shows

in big

definition

that attempts to work with all compilers. The

first two lines define

a macro

that is true if the

or

code is compiled with the Microsoft


compilers.

The potentially

lines 11to 14. First

one

as a

are

might wonder why the

lba mid and lba lsb fields


and not

Borland

parts

confusing

are

defined separately

single 16-bit field? The

reason

is

that the data is inbig endian order. A 16-bit field


would be stored in little endian order by the
compiler.

fields
3

Next,

the

lba msb

appear to be reversed;

and

logical unit

however,

Small Computer Systems Interface,

an industry

7.1. STRUCTURES

149

8 bits
3 bits
control

8 bits
5 bits

transfer length

8 bits

8 bits

8 bits
lba lsb

lba mid

logical unit

lba msb

opcode

Figure 7.8: Mapping of SCSI read cmd fields


1

struct SCSI read cmd {

unsigned char opcode;

unsigned char lba msb

unsigned char logical unit

unsigned char lba mid;

unsigned char lba lsb;

unsigned char transfer length

unsigned char control;

10

:
5;
:
3;
/ middle bits /

#if defined( GNUC )

attribute

11

12

#endif

13

((packed))

Figure 7.9: Alternate SCSI Read Command

Format Structure

this is not the

case.

They have to be put in

this order. Figure 7.8 shows


mapped

are again

as a

how the fields

are

48-bit entity. (The byte boundaries

denoted by the double lines.) When this

is stored in memory inlittle endian order, the bits

are arranged

inthe desired format (Figure 7.6).

To complicate matters more, the definition for the SCSI_read_cmd does not quite work correctly for Microsoft C. If the sizeof(SCSI_read_cmd) expression is evaluated, Microsoft C will return 8, not 6! This is because the Microsoft compiler uses the type of the bitfield in determining how to map the bits. Since all the bit fields are defined as unsigned types, the compiler pads two bytes at the end of the structure to make it an integral number of double words. This can be remedied by making all the fields unsigned short instead. Now, the Microsoft compiler does not need to add any pad bytes since six bytes is an integral number of two-byte words.[4] The other compilers also work correctly with this change. Figure 7.9 shows yet another definition that works for all three compilers. It avoids all but two of the bit fields by using unsigned char.

The reader should not be discouraged if he found the previous discussion confusing. It is confusing! The author often finds it less confusing to avoid bit fields altogether and use bit operations to examine and modify the bits manually.

[4] Mixing different types of bit fields leads to very confusing behavior! The reader is invited to experiment.
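As a rough illustration of the bit-operation approach, the following C++ sketch packs and unpacks the six command bytes by hand. The helper names (make_read_cmd, read_cmd_lba, read_cmd_logical_unit) are hypothetical and not part of the book's code, and the byte layout is an assumption taken from the field order of Figures 7.8 and 7.9: the logical unit in the top 3 bits of byte 1 and the 21-bit logical block address stored big endian across bytes 1 to 3.

```cpp
#include <array>
#include <cstdint>

// Six raw command bytes, in the order they are sent to the device.
using ScsiReadCmd = std::array<std::uint8_t, 6>;

// Hypothetical helper: builds the command with shifts and masks only,
// so the result does not depend on how a compiler lays out bit fields.
ScsiReadCmd make_read_cmd(std::uint8_t opcode, unsigned logical_unit,
                          std::uint32_t lba, std::uint8_t transfer_length,
                          std::uint8_t control)
{
    ScsiReadCmd cmd{};
    cmd[0] = opcode;
    // byte 1: logical unit in the upper 3 bits, top 5 bits of the LBA below
    cmd[1] = static_cast<std::uint8_t>(((logical_unit & 0x7u) << 5) |
                                       ((lba >> 16) & 0x1Fu));
    cmd[2] = static_cast<std::uint8_t>((lba >> 8) & 0xFFu);  // middle bits
    cmd[3] = static_cast<std::uint8_t>(lba & 0xFFu);         // low bits
    cmd[4] = transfer_length;
    cmd[5] = control;
    return cmd;
}

// Reassembles the 21-bit logical block address from bytes 1..3.
std::uint32_t read_cmd_lba(const ScsiReadCmd& cmd)
{
    return (static_cast<std::uint32_t>(cmd[1] & 0x1Fu) << 16) |
           (static_cast<std::uint32_t>(cmd[2]) << 8) |
           static_cast<std::uint32_t>(cmd[3]);
}

// Extracts the 3-bit logical unit number from byte 1.
unsigned read_cmd_logical_unit(const ScsiReadCmd& cmd)
{
    return static_cast<unsigned>(cmd[1] >> 5);
}
```

Because every byte is written explicitly, the size of the command is always six bytes and no packing attribute or compiler-specific macro is needed.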

7.1.4 Using structures in assembly

As discussed above, accessing a structure in assembly is very much like accessing an array. For a simple example, consider how one would write an assembly routine that would zero out the y element of an S structure. Assuming the prototype of the routine would be:

    void zero_y( S * s_p );

the assembly routine would be:

    %define y_offset 4
    _zero_y:
          enter  0,0
          mov    eax, [ebp + 8]                ; get s_p (struct pointer) from stack
          mov    dword [eax + y_offset], 0
          leave
          ret
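The same address arithmetic can be written out in C++ as a hedged sketch. The definition of S below is an assumption (a short int, an int and a double, as in the structure used earlier in the chapter), and zero_y_by_offset is a hypothetical name. It computes the member address as the structure address plus the member offset, just as the assembly routine does.

```cpp
#include <cstddef>
#include <cstring>

// Assumed layout of the S structure used by the chapter's examples.
struct S {
    short int x;
    int       y;
    double    z;
};

// Mirrors the assembly routine: take the address of the structure,
// add the offset of y, and store a zero at the resulting address.
void zero_y_by_offset(S* s_p)
{
    unsigned char* base = reinterpret_cast<unsigned char*>(s_p);
    const int zero = 0;
    std::memcpy(base + offsetof(S, y), &zero, sizeof zero);
}
```

Using offsetof instead of a hard-coded constant keeps the sketch correct even if a compiler pads the structure differently; an assembly routine has no such safety net and must be kept in sync with the C definition by hand.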

C allows one to pass a structure by value to a function; however, this is almost always a bad idea. When passed by value, the entire data in the structure must be copied to the stack and then retrieved by the routine. It is much more efficient to pass a pointer to a structure instead.

C also allows a structure type to be used as the return value of a function. Obviously a structure can not be returned in the EAX register. Different compilers handle this situation differently. A common solution that compilers use is to internally rewrite the function as one that takes a structure pointer as a parameter. The pointer is used to put the return value into a structure defined outside of the routine called.

Most assemblers (including NASM) have built-in support for defining structures in your assembly code. Consult your documentation for details.

7.2 Assembly and C++

The C++ programming language is an extension of the C language. Many of the basic rules of interfacing C and assembly language also apply to C++. However, some rules need to be modified. Also, some of the extensions of C++ are easier to understand with a knowledge of assembly language. This section assumes a basic knowledge of C++.

    #include <stdio.h>

    void f( int x )
    {
      printf("%d\n", x);
    }

    void f( double x )
    {
      printf("%g\n", x);
    }

    Figure 7.10: Two f() functions

7.2.1 Overloading and Name Mangling

C++ allows different functions (and class member functions) with the same name to be defined. When more than one function shares the same name, the functions are said to be overloaded. If two functions are defined with the same name in C, the linker will produce an error because it will find two definitions for the same symbol in the object files it is linking. For example, consider the code in Figure 7.10. The equivalent assembly code would define two labels named _f, which will obviously be an error.

C++ uses the same linking process as C, but avoids this error by performing name mangling or modifying the symbol used to label the function. In a way, C already uses name mangling, too. It adds an underscore to the name of the C function when creating the label for the function. However, C will mangle the name of both functions in Figure 7.10 the same way and produce an error. C++ uses a more sophisticated mangling process that produces two different labels for the functions. For example, the first function in Figure 7.10 would be assigned by DJGPP the label _f__Fi and the second function, _f__Fd. This avoids any linker errors.
Unfortunately, there is no standard for how to mangle names in C++ and different compilers mangle names differently. For example, Borland C++ would use the labels @f$qi and @f$qd for the two functions in Figure 7.10. However, the rules are not completely arbitrary. The mangled name encodes the signature of the function. The signature of a function is defined by the order and the type of its parameters. Notice that the function that takes a single int argument has an i at the end of its mangled name (for both DJGPP and Borland) and that the one that takes a double argument has a d at the end of its mangled name. If there was a function named f with the prototype:

    void f( int x, int y, double z);

DJGPP would mangle its name to be _f__Fiid and Borland to @f$qiid.
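A small sketch can make the role of the signature concrete. The three functions below share the name f but differ in the order and type of their parameters, so a C++ compiler treats them as three separate functions and gives each its own mangled label. The functions return a description of themselves instead of printing, purely to keep the example easy to check; that choice is an assumption of this sketch and not something taken from Figure 7.10.

```cpp
#include <string>

// Three overloads of f: same name, three different signatures.
std::string f(int)              { return "f(int)"; }
std::string f(double)           { return "f(double)"; }
std::string f(int, int, double) { return "f(int, int, double)"; }

// The compiler chooses the overload from the argument types alone,
// which is the same information it encodes into the mangled name.
std::string call_with_int()    { return f(1); }
std::string call_with_double() { return f(1.0); }
std::string call_with_three()  { return f(1, 2, 3.0); }
```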


The return type of the function is not part of a function's signature and is not encoded in its mangled name. This fact explains a rule of overloading in C++. Only functions whose signatures are unique may be overloaded. As one can see, if two functions with the same name and signature are defined in C++, they will produce the same mangled name and will create a linker error. By default, all C++ functions are name mangled, even ones that are not overloaded. When it is compiling a file, the compiler has no way of knowing whether a particular function is overloaded or not, so it mangles all names. In fact, it also mangles the names of global variables by encoding the type of the variable in a similar way as function signatures. Thus, if one defines a global variable in one file as a certain type and then tries to use it in another file as the wrong type, a linker error will be produced. This characteristic of C++ is known as typesafe linking. It also exposes another type of error, inconsistent prototypes. This occurs when the definition of a function in one module does not agree with the prototype used by another module. In C, this can be a very difficult problem to debug. C does not catch this error. The program will compile and link, but will have undefined behavior as the calling code will be pushing different types on the stack than the function expects. In C++, it will produce a linker error.
When the C++ compiler is parsing a function call, it looks for a matching function by looking at the types of the arguments passed to the function.[5] If it finds a match, it then creates a CALL to the correct function using the compiler's name mangling rules.

[5] The match does not have to be an exact match, the compiler will consider matches made by casting the arguments. The rules for this process are beyond the scope of this book. Consult a C++ book for details.

Since different compilers use different name mangling rules, C++ code compiled by different compilers may not be able to be linked together. This fact is important when considering using a precompiled C++ library! If one wishes to write a function in assembly that will be used with C++ code, she must know the name mangling rules for the C++ compiler to be used (or use the technique explained below).

The astute student may question whether the code in Figure 7.10 will work as expected. Since C++ name mangles all functions, the printf function will be mangled and the compiler will not produce a CALL to the label _printf. This is a valid concern! If the prototype for printf was simply placed at the top of the file, this would happen. The prototype is:

    int printf( const char *, ...);

DJGPP would mangle this to be _printf__FPCce. (The F is for function, P for pointer, C for const, c for char and e for ellipsis.) This would not call the regular C library's printf function! Of course, there must be a way for C++ code to call C code. This is very important because there is a lot of useful old C code around. In addition to allowing one to call legacy C code, C++ also allows one to call assembly code using the normal C mangling conventions.

C++ extends the extern keyword to allow it to specify that the function or global variable it modifies uses the normal C conventions. In C++ terminology, the function or global variable uses C linkage. For example, to declare printf to have C linkage, use the prototype:

    extern "C" int printf( const char *, ... );

This instructs the compiler not to use the C++ name mangling rules on this function, but instead to use the C rules. However, by doing this, the printf function may not be overloaded. This provides the easiest way to interface C++ and assembly: define the function to use C linkage and then use the C calling convention.

For convenience, C++ also allows the linkage of a block of functions and global variables to be defined. The block is denoted by the usual curly braces.

    extern "C" {
      /* C linkage global variables and function prototypes */
    }

If one examines the ANSI C header files that come with C/C++ compilers today, they will find the following near the top of each header file:

    #ifdef __cplusplus
    extern "C" {
    #endif

And a similar construction near the bottom containing a closing curly brace. C++ compilers define the __cplusplus macro (with two leading underscores). The snippet above encloses the entire header file within an extern "C" block if the header file is compiled as C++, but does nothing if compiled as C (since a C compiler would give a syntax error for extern "C"). This same technique can be used by any programmer to create a header file for assembly routines that can be used with either C or C++.
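As a sketch of that technique, a header for an assembly routine might look like the first part of the block below. The file name asm_routines.h, the include guard and the routine add_ints are all hypothetical. The definition at the end is only a C++ stand-in so that the sketch links without an assembler; in real use the body would live in an assembly file that exports the label the C convention expects.

```cpp
// --- contents of a hypothetical shared header, asm_routines.h ---
#ifndef ASM_ROUTINES_H
#define ASM_ROUTINES_H

#ifdef __cplusplus
extern "C" {            /* seen only by C++ compilers */
#endif

/* Prototype of a routine that would be written in assembly. */
int add_ints(int a, int b);

#ifdef __cplusplus
}                       /* closes the extern "C" block */
#endif

#endif /* ASM_ROUTINES_H */

// --- stand-in for the assembly routine (assumption of this sketch) ---
// Defined with C linkage so that it matches the unmangled prototype above.
extern "C" int add_ints(int a, int b)
{
    return a + b;
}
```

A C compiler that includes the header sees only the plain prototype, while a C++ compiler sees the same prototype wrapped in an extern "C" block, so both produce a call to the same unmangled label.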

7.2.2 References

References are another new feature of C++. They allow one to pass parameters to functions without explicitly using pointers. For example, consider the code in Figure 7.11. Actually, reference parameters are pretty simple, they really are just pointers. The compiler just hides this from the programmer (just as Pascal compilers implement var parameters as pointers). When the compiler generates assembly for the function call on line 7, it passes the address of y. If one was writing function f in assembly, they would act as if the prototype was:[6]

    void f( int * xp);

     1  void f( int & x )      // the & denotes a reference parameter
     2  { x++; }
     3
     4  int main()
     5  {
     6    int y = 5;
     7    f(y);                // reference to y is passed, note no & here!
     8    printf("%d\n", y);   // prints out 6!
     9    return 0;
    10  }

    Figure 7.11: Reference example

References are just a convenience that is especially useful for operator overloading. This is another feature of C++ that allows one to define meanings for common operators on structure or class types. For example, a common use is to define the plus (+) operator to concatenate string objects. Thus, if a and b were strings, a + b would return the concatenation of the strings a and b. C++ would actually call a function to do this (in fact, this expression could be rewritten in function notation as operator +(a,b)). For efficiency, one would like to pass the address of the string objects instead of passing them by value. Without references, this could be done as operator +(&a,&b), but this would require one to write in operator syntax as &a + &b. This would be very awkward and confusing. However, by using references, one can write it as a + b, which looks very natural.
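The claim that a reference parameter is a pointer in disguise can be checked with a short sketch. The names f_ref, f_ptr and refers_to are hypothetical; f_ref is written the way Figure 7.11 writes f, and f_ptr is the pointer form that an assembly programmer would have in mind.

```cpp
// Reference version: the caller writes f_ref(y), with no & at the call.
void f_ref(int& x) { x++; }

// Pointer version: the caller must write f_ptr(&y) explicitly.
void f_ptr(int* xp) { (*xp)++; }

// Taking the address of a reference yields the address of the object
// it refers to, which is what one expects if the reference is passed
// as a pointer behind the scenes.
bool refers_to(const int& r, const int* p) { return &r == p; }
```

Both functions modify the caller's variable in the same way; only the syntax at the call site differs.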

7.2.3 Inline functions

Inline functions are yet another feature of C++.[7] Inline functions are meant to replace the error-prone, preprocessor-based macros that take parameters. Recall from C that writing a macro that squares a number might look like:

    #define SQR(x) ((x)*(x))

Because the preprocessor does not understand C and does simple substitutions, the parentheses are required to compute the correct answer in most cases. However, even this version will not give the correct answer for SQR(x++).

[6] Of course, they might want to declare the function with C linkage to avoid name mangling as discussed in Section 7.2.1.
[7] C compilers often support this feature as an extension.

     1  inline int inline_f( int x )
     2  { return x*x; }
     3
     4  int f( int x )
     5  { return x*x; }
     6
     7  int main()
     8  {
     9    int y, x = 5;
    10    y = f(x);
    11    y = inline_f(x);
    12    return 0;
    13  }

    Figure 7.12: Inlining example

Macros are used because they eliminate the overhead of making a function call for a simple function. As the chapter on subprograms demonstrated, performing a function call involves several steps. For a very simple function, the time it takes to make the function call may be more than the time to actually perform the operations in the function! Inline functions are a much more friendly way to write code that looks like a normal function, but that does not CALL a common block of code. Instead, calls to inline functions are replaced by code that performs the function. C++ allows a function to be made inline by placing the keyword inline in front of the function definition. For example, consider the functions declared in Figure 7.12. The call to function f on line 10 does a normal function call (in assembly, assuming x is at address ebp-8 and y is at ebp-4):

    push   dword [ebp-8]
    call   _f
    pop    ecx
    mov    [ebp-4], eax

However, the call to function inline_f on line 11 would look like:

    mov    eax, [ebp-8]
    imul   eax, eax
    mov    [ebp-4], eax

In this case, there are two advantages to inlining. First, the inline function is faster. No parameters are pushed on the stack, no stack frame is created and then destroyed, no branch is made. Secondly, the inline function call uses less code! This last point is true for this example, but does not hold true in all cases.
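The difference between a parameterized macro and an inline function shows up as soon as the argument has a side effect. The sketch below uses a hypothetical helper, next_value(), that counts how often it is called; it is used in place of x++ so that the result is well defined. The SQR macro is the one from the text, while sqr, calls and the two counting functions are assumptions of this sketch.

```cpp
#define SQR(x) ((x)*(x))

// Inline replacement for the macro: the argument is evaluated once.
inline int sqr(int x) { return x * x; }

static int calls = 0;                 // how many times next_value() ran

int next_value() { ++calls; return 3; }

// The macro pastes its argument twice, so next_value() runs twice.
int macro_call_count()
{
    calls = 0;
    int r = SQR(next_value());
    (void)r;                          // result is 9 either way
    return calls;
}

// The inline function receives a value, so next_value() runs once.
int inline_call_count()
{
    calls = 0;
    int r = sqr(next_value());
    (void)r;
    return calls;
}
```

The inline function keeps normal function-call semantics for its argument, and the compiler remains free to substitute the body at the call site as in the assembly shown above.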
The main disadvantage of inlining is that inline code is not linked and so the code of an inline function must be available to all files that use it. The previous example assembly code shows this. The call of the non-inline function only requires knowledge of the parameters, the return value type, calling convention and the name of the label for the function. All this information is available from the prototype of the function. However, using the inline function requires knowledge of all the code of the function. This means that if any part of an inline function is changed, all source files that use the function must be recompiled. Recall that for non-inline functions, if the prototype does not change, often the files that use the function need not be recompiled. For all these reasons, the code for inline functions is usually placed in header files. This practice is contrary to the normal hard and fast rule in C that executable code statements are never placed in header files.

7.2.4 Classes

A C++ class describes a type of object. An object has both data members and function members.[8] In other words, it's a struct with data and functions associated with it. Consider the simple class defined in Figure 7.13. A variable of Simple type would look just like a normal C struct with a single int member. The functions are not stored in memory assigned to the structure. However, member functions are different from other functions. They are passed a hidden parameter. This parameter is a pointer to the object that the member function is acting on. (Actually, C++ uses the this keyword to access the pointer to the object acted on from inside the member function.)

    class Simple {
    public:
      Simple();                 // default constructor
      ~Simple();                // destructor
      int get_data() const;     // member functions
      void set_data( int );
    private:
      int data;                 // member data
    };

    Simple::Simple()
    { data = 0; }

    Simple::~Simple()
    { /* null body */ }

    int Simple::get_data() const
    { return data; }

    void Simple::set_data( int x )
    { data = x; }

    Figure 7.13: A simple C++ class

For example, consider the set_data method of the Simple class of Figure 7.13. If it was written in C, it would look like a function that was explicitly passed a pointer to the object being acted on, as the code in Figure 7.14 shows. The -S switch on the DJGPP compiler (and the gcc and Borland compilers as well) tells the compiler to produce an assembly file containing the equivalent assembly language for the code produced. For DJGPP and gcc the assembly file ends in an .s extension and unfortunately uses AT&T assembly language syntax, which is quite different from NASM and MASM syntaxes.[9] (Borland and MS compilers generate a file with a .asm extension using MASM syntax.) Figure 7.15 shows the output of DJGPP converted to NASM syntax and with comments added to clarify the purpose of the statements. On the very first line, note that the set_data method is assigned a mangled label that encodes the name of the method, the name of the class and the parameters. The name of the class is encoded because other classes might have a method named set_data and the two methods must be assigned different labels. The parameters are encoded so that the class can overload the set_data method to take other parameters just as normal C++ functions. However, just as before, different compilers will encode this information differently in the mangled label.

    void set_data( Simple * object, int x )
    {
      object->data = x;
    }

    Figure 7.14: C Version of Simple::set_data()

     1  _set_data__6Simplei:             ; mangled name
     2        push   ebp
     3        mov    ebp, esp
     4
     5        mov    eax, [ebp + 8]      ; eax = pointer to object (this)
     6        mov    edx, [ebp + 12]     ; edx = integer parameter
     7        mov    [eax], edx          ; data is at offset 0
     8
     9        leave
    10        ret

    Figure 7.15: Compiler output of Simple::set_data( int )

Next, on lines 2 and 3, the familiar function prologue appears. On line 5, the first parameter on the stack is stored into EAX. This is not the x parameter! Instead it is the hidden parameter[10] that points to the object being acted on. Line 6 stores the x parameter into EDX and line 7 stores EDX into the double word that EAX points to. This is the data member of the Simple object being acted on, which, being the only data in the class, is stored at offset 0 in the Simple structure.

[8] Often called member functions in C++ or more generally methods.
[9] The gcc compiler system includes its own assembler called gas. The gas assembler uses AT&T syntax and thus the compiler outputs the code in the format for gas. There are several pages on the web that discuss the differences in INTEL and AT&T formats. There is also a free program named a2i (http://www.multimania.com/placr/a2i.html), that converts AT&T format to NASM format.
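The hidden parameter can be made visible with a sketch that places a method next to an ordinary function doing the same job. The Simple class below is a condensed version of the one in Figure 7.13, with the method bodies written inside the class; SimpleC and set_data_c are hypothetical names for the explicit-pointer form.

```cpp
// Condensed form of the Simple class: one int of data plus methods.
class Simple {
public:
    Simple() : data(0) {}
    int  get_data() const { return data; }
    // 'this' is the hidden pointer parameter; writing this->data makes
    // explicit what the compiler does for a plain 'data = x'.
    void set_data(int x) { this->data = x; }
private:
    int data;
};

// The same thing with the object pointer passed explicitly.
struct SimpleC { int data; };

void set_data_c(SimpleC* object, int x) { object->data = x; }
```

Since the methods are not stored in the object, one would expect sizeof(Simple) to equal sizeof(SimpleC) on typical compilers: both hold a single int at offset 0.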

[10] As usual, nothing is hidden in the assembly code!

Example

This section uses the ideas of the chapter to create a C++ class that represents an unsigned integer of arbitrary size. Since the integer can be any size, it will be stored in an array of unsigned integers (double words). It can be made any size by using dynamic allocation. The double words are stored in reverse order[11] (i.e. the least significant double word is at index 0). Figure 7.16 shows the definition of the Big_int class.[12] The size of a Big_int is measured by the size of the unsigned array that is used to store its data. The size_ data member of the class is assigned offset zero and the number_ member is assigned offset 4.

[11] Why? Because addition operations will then always start processing at the beginning of the array and move forward.
[12] See the code example source for the complete code for this example. The text will only refer to some of the code.

     1  class Big_int {
     2  public:
     3    /*
     4     * Parameters:
     5     *   size          - size of integer expressed as number of
     6     *                   normal unsigned int's
     7     *   initial_value - initial value of Big_int as a normal unsigned int
     8     */
     9    explicit Big_int( size_t   size,
    10                      unsigned initial_value = 0);
    11    /*
    12     * Parameters:
    13     *   size          - size of integer expressed as number of
    14     *                   normal unsigned int's
    15     *   initial_value - initial value of Big_int as a string holding
    16     *                   hexadecimal representation of value.
    17     */
    18    Big_int( size_t       size,
    19             const char * initial_value );
    20
    21    Big_int( const Big_int & big_int_to_copy );
    22    ~Big_int();
    23
    24    // returns size of Big_int (in terms of unsigned int's)
    25    size_t size() const;
    26
    27    const Big_int & operator = ( const Big_int & big_int_to_copy );
    28    friend Big_int operator + ( const Big_int & op1,
    29                                const Big_int & op2 );
    30    friend Big_int operator - ( const Big_int & op1,
    31                                const Big_int & op2);
    32    friend bool operator == ( const Big_int & op1,
    33                              const Big_int & op2 );
    34    friend bool operator < ( const Big_int & op1,
    35                             const Big_int & op2);
    36    friend ostream & operator << ( ostream &       os,
    37                                   const Big_int & op );
    38  private:
    39    size_t     size_;      // size of unsigned array
    40    unsigned * number_;    // pointer to unsigned array holding value
    41  };

    Figure 7.16: Definition of Big_int class

To simplify this example, only object instances with the same size arrays can be added to or subtracted from each other.

The class has three constructors: the first (line 9) initializes the class instance by using a normal unsigned integer; the second (line 18) initializes the instance by using a string that contains a hexadecimal value. The third constructor (line 21) is the copy constructor.

This discussion focuses on how the addition and subtraction operators work since this is where the assembly language is used. Figure 7.17 shows the relevant parts of the header file for these operators. They show how the operators are set up to call the assembly routines. Since different compilers use radically different mangling rules for operator functions, inline operator functions are used to set up calls to C linkage assembly routines. This makes it relatively easy to port to different compilers and is just as fast as direct calls. This technique also eliminates the need to throw an exception from assembly!

    // prototypes for assembly routines
    extern "C" {
      int add_big_ints( Big_int &       res,
                        const Big_int & op1,
                        const Big_int & op2);
      int sub_big_ints( Big_int &       res,
                        const Big_int & op1,
                        const Big_int & op2);
    }

    inline Big_int operator + ( const Big_int & op1, const Big_int & op2)
    {
      Big_int result(op1.size());
      int res = add_big_ints(result, op1, op2);
      if (res == 1)
        throw Big_int::Overflow();
      if (res == 2)
        throw Big_int::Size_mismatch();
      return result;
    }

    inline Big_int operator - ( const Big_int & op1, const Big_int & op2)
    {
      Big_int result(op1.size());
      int res = sub_big_ints(result, op1, op2);
      if (res == 1)
        throw Big_int::Overflow();
      if (res == 2)
        throw Big_int::Size_mismatch();
      return result;
    }

    Figure 7.17: Big_int Class Arithmetic Code

Why is assembly used at all here? Recall that to perform multiple precision arithmetic, the carry must be moved from one dword to be added to the next significant dword. C++ (and C) do not allow the programmer to access the CPU's carry flag. Performing the addition could only be done by having C++ independently recalculate the carry flag and conditionally add it to the next dword. It is much more efficient to write the code in assembly, where the carry flag can be accessed, and using the ADC instruction, which automatically adds the carry flag in, makes a lot of sense.

For brevity, only the add_big_ints assembly routine will be discussed here. Below is the code for this routine (from big_math.asm):

    big_math.asm

     1  segment .text
     2          global  add_big_ints, sub_big_ints
     3  %define size_offset 0
     4  %define number_offset 4
     5
     6  %define EXIT_OK 0
     7  %define EXIT_OVERFLOW 1
     8  %define EXIT_SIZE_MISMATCH 2
     9
    10  ; Parameters for both add and sub routines
    11  %define res ebp+8
    12  %define op1 ebp+12
    13  %define op2 ebp+16
    14
    15  add_big_ints:
    16          push    ebp
    17          mov     ebp, esp
    18          push    ebx
    19          push    esi
    20          push    edi
    21          ;
    22          ; first set up esi to point to op1
    23          ;              edi to point to op2
    24          ;              ebx to point to res
    25          mov     esi, [op1]
    26          mov     edi, [op2]
    27          mov     ebx, [res]
    28          ;
    29          ; make sure that all 3 Big_int's have the same size
    30          ;
    31          mov     eax, [esi + size_offset]
    32          cmp     eax, [edi + size_offset]
    33          jne     sizes_not_equal              ; op1.size_ != op2.size_
    34          cmp     eax, [ebx + size_offset]
    35          jne     sizes_not_equal              ; op1.size_ != res.size_
    36
    37          mov     ecx, eax                     ; ecx = size of Big_int's
    38          ;
    39          ; now, set registers to point to their respective arrays
    40          ;      esi = op1.number_
    41          ;      edi = op2.number_
    42          ;      ebx = res.number_
    43          ;
    44          mov     ebx, [ebx + number_offset]
    45          mov     esi, [esi + number_offset]
    46          mov     edi, [edi + number_offset]
    47
    48          clc                                  ; clear carry flag
    49          xor     edx, edx                     ; edx = 0
    50          ;
    51          ; addition loop
    52  add_loop:
    53          mov     eax, [edi+4*edx]
    54          adc     eax, [esi+4*edx]
    55          mov     [ebx + 4*edx], eax
    56          inc     edx                          ; does not alter carry flag
    57          loop    add_loop
    58
    59          jc      overflow
    60  ok_done:
    61          xor     eax, eax                     ; return value = EXIT_OK
    62          jmp     done
    63  overflow:
    64          mov     eax, EXIT_OVERFLOW
    65          jmp     done
    66  sizes_not_equal:
    67          mov     eax, EXIT_SIZE_MISMATCH
    68  done:
    69          pop     edi
    70          pop     esi
    71          pop     ebx
    72          leave
    73          ret

    big_math.asm

Hopefully, most of this code should be straightforward to the reader by now. Lines 25 to 27 store pointers to the Big_int objects passed to the function into registers. Remember that references really are just pointers. Lines 31 to 35 check to make sure that the sizes of the three objects' arrays are the same. (Note that the offset of size_ is added to the pointer to access the data member.) Lines 44 to 46 adjust the registers to point to the array used by the respective objects instead of the objects themselves. (Again, the offset of the number_ member is added to the object pointer.)

The loop in lines 52 to 57 adds the integers stored in the arrays together by adding the least significant dword first, then the next least significant dwords, etc. The addition must be done in this sequence for extended precision arithmetic (see Section 2.1.5). Line 59 checks for overflow; on overflow the carry flag will be set by the last addition of the most significant dword. Since the dwords in the array are stored in little endian order, the loop starts at the beginning of the array and moves forward toward the end.

Figure 7.18 shows a short example using the Big_int class. Note that Big_int constants must be declared explicitly as on line 16. This is necessary for two reasons. First, there is no conversion constructor that will convert an unsigned int to a Big_int. Secondly, only Big_ints of the same size can be added. This makes conversion problematic since it would be difficult to know what size to convert to. A more sophisticated implementation of the class would allow any size to be added to any other size. The author did not wish to over complicate this example by implementing this here. (However, the reader is encouraged to do this.)
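For comparison with the ADC loop, here is a sketch of what the same addition looks like when C++ has to recalculate the carry itself. It is not the book's code: add_big is a hypothetical function, it works on std::vector instead of Big_int objects, and it assumes a 64-bit intermediate type is available to hold each partial sum. The return values mirror the EXIT_OK, EXIT_OVERFLOW and EXIT_SIZE_MISMATCH codes of the assembly routine.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds op1 and op2 into res, least significant dword first (index 0).
// Returns 0 on success, 1 on overflow, 2 if the sizes do not match.
int add_big(std::vector<std::uint32_t>& res,
            const std::vector<std::uint32_t>& op1,
            const std::vector<std::uint32_t>& op2)
{
    if (op1.size() != op2.size() || op1.size() != res.size())
        return 2;                                   // size mismatch

    std::uint32_t carry = 0;                        // plays the role of CF
    for (std::size_t i = 0; i < op1.size(); ++i) {
        // Widen to 64 bits so the carry out of bit 31 is not lost.
        std::uint64_t sum = static_cast<std::uint64_t>(op1[i]) + op2[i] + carry;
        res[i] = static_cast<std::uint32_t>(sum);   // low 32 bits
        carry  = static_cast<std::uint32_t>(sum >> 32);
    }
    return carry ? 1 : 0;                           // carry out of the top dword
}
```

Each iteration needs a widening add, a truncation and a shift where the assembly version needs a single ADC, which is the inefficiency the text refers to.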

     1  #include "big_int.hpp"
     2  #include <iostream>
     3  using namespace std;
     4
     5  int main()
     6  {
     7    try {
     8      Big_int b(5,"8000000000000a00b");
     9      Big_int a(5,"80000000000010230");
    10      Big_int c = a + b;
    11      cout << a << " + " << b << " = " << c << endl;
    12      for( int i=0; i < 2; i++ ) {
    13        c = c + a;
    14        cout << "c = " << c << endl;
    15      }
    16      cout << "c-1 = " << c - Big_int(5,1) << endl;
    17      Big_int d(5, "12345678");
    18      cout << "d = " << d << endl;
    19      cout << "c == d " << (c == d) << endl;
    20      cout << "c > d " << (c > d) << endl;
    21    }
    22    catch( const char * str ) {
    23      cerr << "Caught: " << str << endl;
    24    }
    25    catch( Big_int::Overflow ) {
    26      cerr << "Overflow" << endl;
    27    }
    28    catch( Big_int::Size_mismatch ) {
    29      cerr << "Size mismatch" << endl;
    30    }
    31    return 0;
    32  }

    Figure 7.18: Simple Use of Big_int

    #include <cstddef>
    #include <iostream>
    using namespace std;

    class A {
    public:
      void __cdecl m() { cout << "A::m()" << endl; }
      int ad;
    };

    class B : public A {
    public:
      void __cdecl m() { cout << "B::m()" << endl; }
      int bd;
    };

    void f( A * p )
    {
      p->ad = 5;
      p->m();
    }

    int main()
    {
      A a;
      B b;
      cout << "Size of a: " << sizeof(a)
           << " Offset of ad: " << offsetof(A,ad) << endl;
      cout << "Size of b: " << sizeof(b)
           << " Offset of ad: " << offsetof(B,ad)
           << " Offset of bd: " << offsetof(B,bd) << endl;
      f(&a);
      f(&b);
      return 0;
    }

    Figure 7.19: Simple Inheritance

    _f__FP1A:                          ; mangled function name
          push   ebp
          mov    ebp, esp
          mov    eax, [ebp+8]          ; eax points to object
          mov    dword [eax], 5        ; using offset 0 for ad
          mov    eax, [ebp+8]          ; passing address of object to A::m()
          push   eax
          call   _m__1A                ; mangled method name for A::m()
          add    esp, 4
          leave
          ret

    Figure 7.20: Assembly Code for Simple Inheritance

7.2.5 Inheritance and Polymorphism

Inheritance allows one class to inherit the data and methods of another. For example, consider the code in Figure 7.19. It shows two classes, A and B, where class B inherits from A. The output of the program is:

    Size of a: 4 Offset of ad: 0
    Size of b: 8 Offset of ad: 0 Offset of bd: 4
    A::m()
    A::m()

Notice that the ad data members of both classes (B inherits it from A) are at the same offset. This is important since the f function may be passed a pointer to either an A object or any object of a type derived (i.e. inherited) from A. Figure 7.20 shows the (edited) asm code for the function (generated by gcc).

Note in the output that A's m method was called for both the a and b objects. From the assembly, one can see that the call to A::m() is hard-coded into the function. For true object-oriented programming, the method called should depend on what type of object is passed to the function. This is known as polymorphism.

C++ turns this feature off by default. One uses the virtual keyword to enable it. Figure 7.21 shows how the two classes would be changed. None of the other code needs to be changed. Polymorphism can be implemented many ways. Unfortunately, gcc's implementation is in transition at the time of this writing and is becoming significantly more complicated than its initial implementation. In the interest of simplifying this discussion, the author will only cover the implementation of polymorphism which the Windows based Microsoft and Borland compilers use. This implementation has not changed in many years and probably will not change in the foreseeable future.

    class A {
    public:
      virtual void __cdecl m() { cout << "A::m()" << endl; }
      int ad;
    };

    class B : public A {
    public:
      virtual void __cdecl m() { cout << "B::m()" << endl; }
      int bd;
    };

    Figure 7.21: Polymorphic Inheritance

With these changes, the output of the program changes:

Size of a: 8 Offset of ad: 4
Size of b: 12 Offset of ad: 4 Offset of bd: 8
A::m()
B::m()

Now the second call to f calls the B::m() method because it is passed a B object. This is not the only change, however. The size of an A is now 8 (and B is 12). Also, the offset of ad is 4, not 0. What is at offset 0? The answers to these questions are related to how polymorphism is implemented.
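The size and offset changes reported above can be observed directly. The following is a minimal sketch, not the program from Figure 7.19: the names PlainA, VirtualA, offset_of_ad and report_layout are hypothetical, and the exact numbers printed depend on the compiler and on the pointer size of the target (the 8 and 4 in the text assume a 32-bit compiler).

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// Hypothetical stand-ins for class A, without and with a virtual method.
struct PlainA   { void m() {}         int ad; };
struct VirtualA { virtual void m() {} int ad; };

// Byte offset of the ad member inside an object of type T, measured at run time.
template <typename T>
std::size_t offset_of_ad() {
    T obj;
    obj.ad = 0;
    const char* base  = reinterpret_cast<const char*>(&obj);
    const char* field = reinterpret_cast<const char*>(&obj.ad);
    return static_cast<std::size_t>(field - base);
}

// Print size and offset for both versions of the class.
void report_layout() {
    std::cout << "PlainA:   size " << sizeof(PlainA)
              << ", offset of ad " << offset_of_ad<PlainA>() << '\n';
    std::cout << "VirtualA: size " << sizeof(VirtualA)
              << ", offset of ad " << offset_of_ad<VirtualA>() << '\n';
}
```

Calling report_layout() from main should show the virtual version growing by at least one pointer. On common compilers the extra space sits in front of ad, but the language standard does not promise where it goes.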

A C++ class that has any virtual methods is given an extra hidden field that is a pointer to an array of method pointers [13]. This table is often called the vtable. For the A and B classes this pointer is stored at offset 0. The Windows compilers always put this pointer at the beginning of the class at the top of the inheritance tree. Looking at the assembly code (Figure 7.22) generated for function f (from Figure 7.19) for the virtual method version of the program, one can see that the call to method m is not to a label. Line 9 finds the address of the vtable from the object. The address of the object is pushed on the stack in line 11. Line 12 calls the virtual method by branching to the first address in the vtable [14]. This call does not use a label; it branches to the code address pointed to by EDX. This type of call is an example of late binding. Late binding delays the decision of which method to call until the code is running. This allows the code to call the appropriate method for the object. The normal case (Figure 7.20) hard-codes a call to a certain method and is called early binding (since here the method is bound early, at compile time).

 1  ?f@@YAXPAVA@@@Z:
 2        push   ebp
 3        mov    ebp, esp
 4
 5        mov    eax, [ebp+8]
 6        mov    dword [eax+4], 5   ; p->ad = 5;
 7
 8        mov    ecx, [ebp + 8]     ; ecx = p
 9        mov    edx, [ecx]         ; edx = pointer to vtable
10        mov    eax, [ebp + 8]     ; eax = p
11        push   eax                ; push "this" pointer
12        call   dword [edx]        ; call first function in vtable
13        add    esp, 4             ; clean up stack
14
15        pop    ebp
16        ret

Figure 7.22: Assembly Code for f() Function

[13] For classes without virtual methods, C++ compilers always make the class compatible with a normal C struct with the same data members.
[14] Of course, this value is already in the ECX register. It was put there in line 8, so line 10 could be removed and the next line changed to push ECX. The code is not very efficient because it was generated without compiler optimizations turned on.
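The difference between the two kinds of binding is visible from portable C++ alone. The sketch below uses hypothetical class names (EarlyA, EarlyB, LateA, LateB) and returns strings instead of printing, so the result of each call is easy to check. It only illustrates the language-level behavior and says nothing about how a particular compiler implements it.

```cpp
#include <cassert>
#include <string>

// Non-virtual method: the call is chosen from the pointer's static type.
struct EarlyA { std::string m() const { return "EarlyA::m()"; } };
struct EarlyB : EarlyA { std::string m() const { return "EarlyB::m()"; } };

// Virtual method: the call is chosen from the object's actual type.
struct LateA {
    virtual ~LateA() = default;
    virtual std::string m() const { return "LateA::m()"; }
};
struct LateB : LateA {
    std::string m() const override { return "LateB::m()"; }
};

// Early binding: always calls EarlyA::m(), whatever object is passed.
std::string call_early(const EarlyA* p) { return p->m(); }

// Late binding: the method is looked up when the code runs.
std::string call_late(const LateA* p) { return p->m(); }
```

Passing the address of an EarlyB to call_early still yields "EarlyA::m()", while passing a LateB to call_late yields "LateB::m()". A qualified call such as obj.LateA::m() names the method explicitly and so is bound early even though m is virtual.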
The attentive reader will be wondering why the class methods in Figure 7.21 are explicitly declared to use the C calling convention by using the __cdecl keyword. By default, Microsoft uses a different calling convention for C++ class methods than the standard C convention. It passes the pointer to the object acted on by the method in the ECX register instead of using the stack. The stack is still used for the other explicit parameters of the method. The __cdecl modifier tells it to use the standard C calling convention. Borland C++ uses the C calling convention by default.

Next let's look at a slightly more complicated example (Figure 7.23). In it, the classes A and B each have two methods: m1 and m2. Remember that since class B does not define its own m2 method, it inherits the A class's method. Figure 7.24 shows how the b1 object appears in memory. Figure 7.25 shows the output of the program. First, look at the address of the vtable for each object. The two B objects' vtable addresses are the same and thus, they share the same vtable. A vtable is a property of the class, not of an object (like a static data member). Next, look at the addresses in the vtables. From looking at assembly output, one can determine that the method m1 pointer is at offset 0 (or dword 0) and m2 is at offset 4 (dword 1). The m2 method pointers for the A and B class vtables are the same because class B inherits the m2 method from the A class.

 1  class A {
 2  public:
 3    virtual void __cdecl m1() { cout << "A::m1()" << endl; }
 4    virtual void __cdecl m2() { cout << "A::m2()" << endl; }
 5    int ad;
 6  };
 7
 8  class B : public A {   // B inherits A's m2()
 9  public:
10    virtual void __cdecl m1() { cout << "B::m1()" << endl; }
11    int bd;
12  };
13  /* prints the vtable of given object */
14  void print_vtable( A * pa )
15  {
16    // p sees pa as an array of dwords
17    unsigned * p = reinterpret_cast<unsigned *>(pa);
18    // vt sees vtable as an array of pointers
19    void ** vt = reinterpret_cast<void **>(p[0]);
20    cout << hex << "vtable address = " << vt << endl;
21    for( int i=0; i < 2; i++ )
22      cout << "dword " << i << ": " << vt[i] << endl;
23
24    // call virtual functions in EXTREMELY non-portable way!
25    void (*m1func_pointer)(A *);   // function pointer variable
26    m1func_pointer = reinterpret_cast<void (*)(A*)>(vt[0]);
27    m1func_pointer(pa);            // call method m1 via function pointer
28
29    void (*m2func_pointer)(A *);   // function pointer variable
30    m2func_pointer = reinterpret_cast<void (*)(A*)>(vt[1]);
31    m2func_pointer(pa);            // call method m2 via function pointer
32  }
33
34  int main()
35  {
36    A a; B b1; B b2;
37    cout << "a: " << endl; print_vtable(&a);
38    cout << "b1: " << endl; print_vtable(&b1);
39    cout << "b2: " << endl; print_vtable(&b2);
40    return 0;
41  }

Figure 7.23: More complicated example

Offset   b1                  vtable
0        vtablep  ------>    0: &B::m1()
4        ad                  4: &A::m2()
8        bd

Figure 7.24: Internal representation of b1

a:
vtable address = 004120E8
dword 0: 00401320
dword 1: 00401350
A::m1()
A::m2()
b1:
vtable address = 004120F0
dword 0: 004013A0
dword 1: 00401350
B::m1()
A::m2()
b2:
vtable address = 004120F0
dword 0: 004013A0
dword 1: 00401350
B::m1()
A::m2()

Figure 7.25: Output of program in Figure 7.23
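The vtable arrangement described above (one table per class, a hidden pointer in each object, and an inherited slot that holds the same address in both tables) can be imitated by hand in portable C++. The sketch below is only a model of the idea: Obj, Method, A_vtable, B_vtable, make_A, make_B and call_slot are hypothetical names, and no real compiler is obliged to lay things out this way.

```cpp
#include <cassert>
#include <string>

struct Obj;                                  // forward declaration
using Method = std::string (*)(const Obj*);  // each slot takes an explicit "this"

struct Obj {
    const Method* vtable;  // hand-written stand-in for the hidden vtable pointer
    int ad;
};

std::string A_m1(const Obj*) { return "A::m1()"; }
std::string A_m2(const Obj*) { return "A::m2()"; }
std::string B_m1(const Obj*) { return "B::m1()"; }

// One table per "class", shared by every object of that class.
const Method A_vtable[] = { A_m1, A_m2 };
const Method B_vtable[] = { B_m1, A_m2 };    // slot 1 is inherited from A

Obj make_A() { return Obj{ A_vtable, 0 }; }
Obj make_B() { return Obj{ B_vtable, 0 }; }

// Late binding: fetch the slot from the object's table, then call through it.
std::string call_slot(const Obj* p, int slot) { return p->vtable[slot](p); }
```

Two objects built with make_B() hold the same vtable pointer, and slot 1 of both tables holds the same function address. That is the pattern visible in the output of Figure 7.25.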

Lines 25 to 32 show how one could call a virtual function by reading its address out of the vtable for the object [15]. The method address is stored into a C-type function pointer with an explicit this pointer. From the output in Figure 7.25, one can see that it does work. However, please do not write code like this! This is only used to illustrate how the virtual methods use the vtable.
There are some practical lessons to learn from this. One important fact is that one would have to be very careful when reading and writing class variables to a binary file. One can not just use a binary read or write on the entire object, as this would read or write out the vtable pointer to the file! This is a pointer to where the vtable resides in the program's memory and will vary from program to program. This same problem can occur in C with structs, but in C, structs only have pointers in them if the programmer explicitly puts them in. There are no obvious pointers defined in either the A or B classes.
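One common way to avoid writing the hidden pointer is to save and load the data members one at a time. The following is a minimal sketch under that assumption: the Record class and the round_trip helper are hypothetical, and the example ignores byte order and error handling.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical class with virtual methods, so it carries a hidden pointer.
class Record {
public:
    virtual ~Record() = default;

    // Write only the data member, never the whole object.
    virtual void save(std::ostream& out) const {
        out.write(reinterpret_cast<const char*>(&ad), sizeof ad);
    }

    // Read the data member back into an already constructed object.
    virtual void load(std::istream& in) {
        in.read(reinterpret_cast<char*>(&ad), sizeof ad);
    }

    std::int32_t ad = 0;
};

// Save a record to an in-memory binary stream and load it into a new object.
Record round_trip(const Record& r) {
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    r.save(buf);
    Record copy;
    copy.load(buf);
    return copy;
}
```

Only sizeof(ad) bytes reach the stream, although sizeof(Record) is larger. The object that loads the data is constructed normally, so its hidden pointer is set up by the running program rather than read from the file.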

Again, it is important to realize that different compilers implement virtual methods differently. In Windows, COM (Component Object Model) class objects use vtables to implement COM interfaces [16]. Only compilers that implement virtual method vtables as Microsoft does can create COM classes. This is why Borland uses the same implementation as Microsoft and one of the reasons why gcc can not be used to create COM classes.
The code for the virtual method looks exactly like a non-virtual one. Only the code that calls it is different. If the compiler can be absolutely sure of what virtual method will be called, it can ignore the vtable and call the method directly (i.e., use early binding).

[15] Remember this code only works with the MS and Borland compilers, not gcc.
[16] COM classes also use the __stdcall calling convention, not the standard C one.

7.2.6 Other C++ features

The workings of other C++ features (e.g., RunTime Type Information, exception handling and multiple inheritance) are beyond the scope of this text. If the reader wishes to go further, a good starting point is The Annotated C++ Reference Manual by Ellis and Stroustrup and The Design and Evolution of C++ by Stroustrup.

Appendix A

80x86 Instructions

A.1 Non-floating Point Instructions

This section lists and describes the actions and formats of the non-floating point instructions of the Intel 80x86 CPU family.

The formats use the following abbreviations:

R      general register
R8     8-bit register
R16    16-bit register
R32    32-bit register
SR     segment register
M      memory
M8     byte
M16    word
M32    double word
I      immediate value

These can be combined for the multiple operand instructions. For example, the format R, R means that the instruction takes two register operands. Many of the two operand instructions allow the same operands. The abbreviation O2 is used to represent these operands: R,R R,M R,I M,R M,I. If an 8-bit register or memory can be used for an operand, the abbreviation R/M8 is used.
The table also shows how various bits of the FLAGS register are affected by each instruction. If the column is blank, the corresponding bit is not affected at all. If the bit is always changed to a particular value, a 1 or 0 is shown in the column. If the bit is changed to a value that depends on the operands of the instruction, a C is placed in the column. Finally, if the bit is modified in some undefined way, a ? appears in the column. Because the only instructions that change the direction flag are CLD and STD, it is not listed under the FLAGS columns.

Name            Description                         Formats
ADC             Add with Carry                      O2
ADD             Add Integers                        O2
AND             Bitwise AND                         O2
BSWAP           Byte Swap                           R32
CALL            Call Routine                        R M I
CBW             Convert Byte to Word
CDQ             Convert Dword to Qword
CLC             Clear Carry
CLD             Clear Direction Flag
CMC             Complement Carry
CMP             Compare Integers                    O2
CMPSB           Compare Bytes
CMPSW           Compare Words
CMPSD           Compare Dwords
CWD             Convert Word to Dword into DX:AX
CWDE            Convert Word to Dword into EAX
DEC             Decrement Integer                   R M
DIV             Unsigned Divide                     R M
ENTER           Make stack frame                    I,0
IDIV            Signed Divide                       R M
IMUL            Signed Multiply                     R16,R/M16
                                                    R32,R/M32
                                                    R16,I
                                                    R32,I
                                                    R16,R/M16,I
                                                    R32,R/M32,I
INC             Increment Integer                   R M
INT             Generate Interrupt                  I
JA              Jump Above                          I
JAE             Jump Above or Equal                 I
JB              Jump Below                          I
JBE             Jump Below or Equal                 I
JC              Jump Carry                          I
JCXZ            Jump if CX = 0                      I
JE              Jump Equal                          I
JG              Jump Greater                        I
JGE             Jump Greater or Equal               I
JL              Jump Less                           I
JLE             Jump Less or Equal                  I
JMP             Unconditional Jump                  R M I
JNA             Jump Not Above                      I
JNAE            Jump Not Above or Equal             I
JNB             Jump Not Below                      I
JNBE            Jump Not Below or Equal             I
JNC             Jump No Carry                       I
JNE             Jump Not Equal                      I
JNG             Jump Not Greater                    I
JNGE            Jump Not Greater or Equal           I
JNL             Jump Not Less                       I
JNLE            Jump Not Less or Equal              I
JNO             Jump No Overflow                    I
JNS             Jump No Sign                        I
JNZ             Jump Not Zero                       I
JO              Jump Overflow                       I
JPE             Jump Parity Even                    I
JPO             Jump Parity Odd                     I
JS              Jump Sign                           I
JZ              Jump Zero                           I
LAHF            Load FLAGS into AH
LEA             Load Effective Address              R32,M
LEAVE           Leave Stack Frame
LODSB           Load Byte
LODSW           Load Word
LODSD           Load Dword
LOOP            Loop                                I
LOOPE/LOOPZ     Loop If Equal                       I
LOOPNE/LOOPNZ   Loop If Not Equal                   I
MOV             Move Data                           O2
                                                    SR,R/M16
                                                    R/M16,SR
MOVSB           Move Byte
MOVSW           Move Word
MOVSD           Move Dword
MOVSX           Move Signed                         R16,R/M8
                                                    R32,R/M8
                                                    R32,R/M16
MOVZX           Move Unsigned                       R16,R/M8
                                                    R32,R/M8
                                                    R32,R/M16
MUL             Unsigned Multiply                   R M
NEG             Negate                              R M
NOP             No Operation
NOT             1's Complement                      R M
OR              Bitwise OR                          O2
POP             Pop From Stack                      R/M16 R/M32
POPA            Pop All
POPF            Pop FLAGS
PUSH            Push to Stack                       R/M16 R/M32 I
PUSHA           Push All
PUSHF           Push FLAGS
RCL             Rotate Left with Carry              R/M,I R/M,CL
RCR             Rotate Right with Carry             R/M,I R/M,CL
REP             Repeat
REPE/REPZ       Repeat If Equal
REPNE/REPNZ     Repeat If Not Equal
RET             Return
ROL             Rotate Left                         R/M,I R/M,CL
ROR             Rotate Right                        R/M,I R/M,CL
SAHF            Copies AH into FLAGS
SAL             Shifts to Left                      R/M,I R/M,CL
SBB             Subtract with Borrow                O2
SCASB           Scan for Byte
SCASW           Scan for Word
SCASD           Scan for Dword
SETA            Set Above                           R/M8
SETAE           Set Above or Equal                  R/M8
SETB            Set Below                           R/M8
SETBE           Set Below or Equal                  R/M8
SETC            Set Carry                           R/M8
SETE            Set Equal                           R/M8
SETG            Set Greater                         R/M8
SETGE           Set Greater or Equal                R/M8
SETL            Set Less                            R/M8
SETLE           Set Less or Equal                   R/M8
SETNA           Set Not Above                       R/M8
SETNAE          Set Not Above or Equal              R/M8
SETNB           Set Not Below                       R/M8
SETNBE          Set Not Below or Equal              R/M8
SETNC           Set No Carry                        R/M8
SETNE           Set Not Equal                       R/M8
SETNG           Set Not Greater                     R/M8
SETNGE          Set Not Greater or Equal            R/M8
SETNL           Set Not Less                        R/M8
SETNLE          Set Not Less or Equal               R/M8
SETNO           Set No Overflow                     R/M8
SETNS           Set No Sign                         R/M8
SETNZ           Set Not Zero                        R/M8
SETO            Set Overflow                        R/M8
SETPE           Set Parity Even                     R/M8
SETPO           Set Parity Odd                      R/M8
SETS            Set Sign                            R/M8
SETZ            Set Zero                            R/M8
SAR             Arithmetic Shift to Right           R/M,I R/M,CL
SHR             Logical Shift to Right              R/M,I R/M,CL
SHL             Logical Shift to Left               R/M,I R/M,CL
STC             Set Carry
STD             Set Direction Flag
STOSB           Store Byte
STOSW           Store Word
STOSD           Store Dword
SUB             Subtract                            O2
TEST            Logical Compare                     R/M,R R/M,I
XCHG            Exchange                            R/M,R R,R/M
XOR             Bitwise XOR                         O2

A.2 Floating Point Instructions

In this section, many of the 80x86 math coprocessor instructions are described. The description section briefly describes the operation of the instruction. To save space, information about whether the instruction pops the stack is not given in the description.

The format column shows what type of operands can be used with each instruction. The following abbreviations are used:

STn    A coprocessor register
F      Single precision number in memory
D      Double precision number in memory
E      Extended precision number in memory
I16    Integer word in memory
I32    Integer double word in memory
I64    Integer quad word in memory

Instructions requiring a Pentium Pro or better are marked with an asterisk (*).

Instruction          Description                    Format
FABS                 ST0 = |ST0|
FADD src             ST0 += src                     STn F D
FADD dest, ST0       dest += ST0                    STn
FADDP dest[,ST0]     dest += ST0                    STn
FCHS                 ST0 = -ST0
FCOM src             Compare ST0 and src            STn F D
FCOMP src            Compare ST0 and src            STn F D
FCOMPP               Compares ST0 and ST1
FCOMI* src           Compares into FLAGS            STn
FCOMIP* src          Compares into FLAGS            STn
FDIV src             ST0 /= src                     STn F D
FDIV dest, ST0       dest /= ST0                    STn
FDIVP dest[,ST0]     dest /= ST0                    STn
FDIVR src            ST0 = src/ST0                  STn F D
FDIVR dest, ST0      dest = ST0/dest                STn
FDIVRP dest[,ST0]    dest = ST0/dest                STn
FFREE dest           Marks as empty                 STn
FIADD src            ST0 += src                     I16 I32
FICOM src            Compare ST0 and src            I16 I32
FICOMP src           Compare ST0 and src            I16 I32
FIDIV src            ST0 /= src                     I16 I32
FIDIVR src           ST0 = src/ST0                  I16 I32
FILD src             Push src on Stack              I16 I32 I64
FIMUL src            ST0 *= src                     I16 I32
FINIT                Initialize Coprocessor
FIST dest            Store ST0                      I16 I32
FISTP dest           Store ST0                      I16 I32 I64
FISUB src            ST0 -= src                     I16 I32
FISUBR src           ST0 = src - ST0                I16 I32
FLD src              Push src on Stack              STn F D E
FLD1                 Push 1.0 on Stack
FLDCW src            Load Control Word Register     I16
FLDPI                Push π on Stack
FLDZ                 Push 0.0 on Stack
FMUL src             ST0 *= src                     STn F D
FMUL dest, ST0       dest *= ST0                    STn
FMULP dest[,ST0]     dest *= ST0                    STn
FRNDINT              Round ST0
FSCALE               ST0 = ST0 × 2^⌊ST1⌋
FSQRT                ST0 = √ST0
FST dest             Store ST0                      STn F D
FSTP dest            Store ST0                      STn F D E
FSTCW dest           Store Control Word Register    I16
FSTSW dest           Store Status Word Register     I16 AX
FSUB src             ST0 -= src                     STn F D
FSUB dest, ST0       dest -= ST0                    STn
FSUBP dest[,ST0]     dest -= ST0                    STn
FSUBR src            ST0 = src - ST0                STn F D
FSUBR dest, ST0      dest = ST0 - dest              STn
FSUBRP dest[,ST0]    dest = ST0 - dest              STn
FTST                 Compare ST0 with 0.0
FXCH dest            Exchange ST0 and dest          STn
