
1. INTRODUCTION TO THE X86 ARCHITECTURE AND THE EMU8086 EMULATOR

1.1 The block diagram of a microcomputer


A microcomputer is a general-purpose device that can be programmed to carry out a set of arithmetic or logical operations. Since the sequence of basic operations can be changed, the microcomputer can solve more than one kind of problem.

Figure 1. The block diagram of a microcomputer

Conventionally, a computer consists of at least one processing element, typically a central processing unit (CPU), some form of memory, and input/output devices, all interconnected by several types of buses:
- CPU: the hardware block that controls the system and processes data.
- memory: the hardware block that stores data and instructions in a sequence of memory locations. Each memory location is defined by its address and its content.
- I/O ports: hardware blocks that form the interface between the microcomputer and the external world.
- buses: the connections between the other three hardware blocks.

1.2 The x86 architecture. CPU Internals


In the Microprocessor Architecture laboratory you will study Intel's x86 architecture. This section presents a summary of the registers and flags inside the CPU.

A register is a small amount of storage available as part of the CPU. Registers are addressed by mechanisms other than those used for the main memory and can be accessed more quickly. The typical usage scenario is the following:
a) the CPU loads data from a larger memory into registers,
b) the data inside the registers is used by arithmetic and logic instructions, manipulated, or tested, and
c) in the end the results are stored back in the main memory, either by the same instruction or by a subsequent one.

The x86 architecture provides several 16-bit registers. Four of them (AX, BX, CX, DX) are general-purpose registers (GPRs), which means they can potentially be used for any operation. However, some instructions use one or several specific general-purpose registers by default. For example:
- the multiplication and division instructions implicitly use AX and DX,
- among the four general-purpose registers, only BX can store an effective address for indirect memory addressing (the pointer and index registers BP, SI and DI, presented below, can also hold addresses),
- only CX can be used as a counter with the loop instruction.
Each general-purpose register can also be accessed as two separate bytes (e.g. BX's high byte can be accessed as BH and its low byte as BL).

There are two pointer registers with special roles: SP (stack pointer) points to the "top" of the stack, and BP (base pointer) is often used to point at some other place in the stack, typically above the local variables. Because these registers are used as pointers, the information they store is typically interpreted as an effective address.

The registers SI (source index) and DI (destination index) are address registers and may also be used for array indexing. They store effective addresses or indexes. Special instructions, such as the array manipulation instructions, use these registers to point to the current element in the source array (SI) and to the current element in the destination array (DI).

The four segment registers, CS (code segment register), DS (data segment register), SS (stack segment register) and ES (extra segment register), are used to form a physical address (in the memory). They always store segment addresses. More information on memory addressing will be provided in Laboratory 3.

The FLAGS register (also called the status register) contains all the flags of the CPU:
- CF (carry flag): signals an arithmetic carry or borrow for unsigned numbers (8-bit or 16-bit, depending on the operand size)
- PF (parity flag): set when the least significant byte of the result contains an even number of 1 bits
- AF (adjust flag): signals an arithmetic carry or borrow out of the least significant nibble
- ZF (zero flag): signals that the result is 0
- SF (sign flag): signals that the most significant bit of the result is set (this is the sign bit in two's complement representation)
- DF (direction flag): controls the left-to-right or right-to-left direction of array processing
- OF (overflow flag): signals an arithmetic overflow for signed numbers (8-bit or 16-bit, depending on the operand size)

Finally, the instruction pointer (IP) stores the effective address of the next instruction that will be fetched from memory and then executed. This register cannot be directly accessed (read or written) by a program.
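The relationship between a 16-bit general-purpose register and its two byte halves can be observed directly in the emulator. The following is a minimal sketch in emu8086 syntax; the values 1234h and 0FFh are arbitrary and chosen only for illustration.

```asm
org 100h            ; .COM program, loaded at offset 100h

mov AX, 1234h       ; AX = 1234h, so AH = 12h and AL = 34h
mov BL, AL          ; BL = 34h (copy of the low byte of AX)
mov BH, AH          ; BH = 12h (copy of the high byte of AX), so BX = 1234h
mov AL, 0FFh        ; only the low byte of AX changes: AX = 12FFh

int 20h             ; return control to the operating system
```

Single-stepping through this program shows that writing AL leaves AH untouched, and that BX ends up holding the original 16-bit value assembled from its two halves.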

1.3 Memory organization


In general, any memory block can be regarded as a sequence of memory locations. Typically, each memory location stores an 8-bit number, a byte (this is the content of the memory location). Each memory location is identified by a unique number called an address or, more specifically, a physical address. The size of the memory is directly linked to the size of the physical address through the following equation:

$\text{memorySize} = 2^{\text{physicalAddressSize [bits]}}$    (1)

Example 1: using a physical address of 2 bits, one can form 4 different physical addresses: 00, 01, 10, and 11, corresponding to 4 different memory locations. Consequently, a memory with a physical address of 2 bits will comprise 4 memory locations (4 bytes).

Example 2: using a physical address of 20 bits, one can form 2^20 different physical addresses, corresponding to 2^20 different memory locations. Consequently, a memory with a physical address of 20 bits will comprise 2^20 memory locations (1 MB).
physical address    contents
FFFFFh              88h
FFFFEh              73h
...                 ...
00010h              09h
0000Fh              1Bh
0000Eh              ACh
...                 ...
00001h              17h
00000h              24h

Each memory location has a unique physical address and stores an 8-bit value.

Figure 2. The memory is a sequence of memory locations with unique addresses

Note: the numbers in Figure 2 are written in hexadecimal. For more information on numbering bases see Section 1.7.

1.4 Understanding some x86 instructions


Microprocessors are programmed using a set of instructions. Although every architecture comes with a specific set of instructions, the types of instructions are typically the same:
- data transfer instructions
   o set a register or a memory location to a fixed constant value
   o copy data from a memory location to a register, or vice versa
   o read and write data from I/O devices
- data processing instructions
   o arithmetic operations (add, subtract, multiply, divide, etc.)
   o logic operations (and, or, exclusive or, shift, rotate, etc.)
   o bitwise logic operations
   o compare operations
- flow control instructions
   o branch to another location in the program and execute instructions there
   o conditional branch to another location if a certain condition holds
   o branch to another location, while saving the location of the next instruction as a point to return to (a call)

Some typical x86 instructions are presented below.

MOV Move (Copy) Data


Usage: MOV dest, src
Arguments:
   dest - general-purpose register, segment register (except CS) or memory location
   src - immediate value, general-purpose register, segment register or memory location
Effects: Copies the source to the destination, overwriting the destination's value: (dest) <- (src).
Flags: none
Miscellaneous: The arguments must be the same size (byte or word).
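As an illustration of the operand combinations listed above, here is a short sketch in emu8086 syntax. The constant 1234h and the memory offset 0200h are assumptions made only for this example; any free location in the data segment would do.

```asm
org 100h

mov AX, 1234h       ; immediate -> register: AX = 1234h
mov BX, AX          ; register -> register: BX = 1234h, AX is unchanged
mov [0200h], AX     ; register -> memory: the word at DS:0200h becomes 1234h
mov CX, [0200h]     ; memory -> register: CX = 1234h
mov DL, AH          ; byte -> byte: DL = 12h
; mov DL, AX        ; would be rejected: the operands differ in size

int 20h             ; return control to the operating system
```

None of these instructions modifies the flags, which can be checked in the Flags window while single-stepping.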


ADD Integer Addition


Usage: ADD d, s
Arguments:
   d - register or memory location
   s - immediate, register or memory location (two memory operands cannot be used)
Effects: Adds the source to the destination: (d) <- (d) + (s).
Flags: The CF, ZF, OF, SF, AF, and PF flags are set according to the result.
Misc: The instruction makes no difference between signed and unsigned operands. CF indicates a carry for unsigned values, while OF indicates an overflow for signed values. SF indicates the sign of the result for signed values.
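The difference between CF (unsigned carry) and OF (signed overflow) is easiest to see with 8-bit operands. The sketch below, in emu8086 syntax, uses values picked only for illustration; the flag values in the comments follow from the definitions given above.

```asm
org 100h

mov AL, 0F0h        ; AL = F0h (240 as unsigned, -16 as signed)
add AL, 20h         ; F0h + 20h = 110h does not fit in 8 bits: AL = 10h
                    ; CF = 1 (unsigned carry), OF = 0 (-16 + 32 = 16 fits)
                    ; ZF = 0, SF = 0

mov BL, 70h         ; BL = 70h (112 as unsigned or signed)
add BL, 20h         ; 70h + 20h = 90h fits in 8 bits: BL = 90h
                    ; CF = 0 (no unsigned carry), OF = 1 (112 + 32 > 127)
                    ; ZF = 0, SF = 1

int 20h             ; return control to the operating system
```

The same ADD instruction is used in both cases; it is up to the program to look at CF or OF, depending on whether it treats the operands as unsigned or signed.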


ADC Add with Carry


Usage: ADC d, s
Arguments:
   d - register or memory location
   s - immediate, register or memory location (two memory operands cannot be used)
Effects: Adds the carry flag (CF) and the source to the destination: (d) <- (d) + (s) + (CF).
Flags: The CF, ZF, OF, SF, AF, and PF flags are set according to the result.
Misc: The instruction makes no difference between signed and unsigned operands. CF indicates a carry for unsigned values, while OF indicates an overflow for signed values. SF indicates the sign of the result for signed values.
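A typical use of ADC is to add numbers that are wider than one register. The following sketch, in emu8086 syntax, adds two 32-bit constants using the register pair DX and AX (high word in DX, low word in AX); the operands 0001FFFFh and 00000001h are assumptions chosen so that the low-word addition produces a carry.

```asm
org 100h

mov AX, 0FFFFh      ; low word of the first operand
mov DX, 0001h       ; high word of the first operand
add AX, 0001h       ; low words: FFFFh + 0001h = 10000h, so AX = 0000h, CF = 1
adc DX, 0000h       ; high words plus carry: 0001h + 0000h + 1, so DX = 0002h

int 20h             ; the 32-bit sum 00020000h is held in DX (high) and AX (low)
```

If the ADC were replaced by a plain ADD, the carry out of the low word would be lost and the high word would remain 0001h.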


DIV Unsigned Division


Usage: DIV src
Arguments:
   src - 8-bit or 16-bit register or memory location
Effects:
   if src is an 8-bit value: divides the value stored in AX by src and stores the remainder in AH and the quotient in AL: (AH) <- (AX) mod (src), (AL) <- (AX) div (src).
   if src is a 16-bit value: divides the value stored in DX concatenated with AX (written DX:AX) by src and stores the remainder in DX and the quotient in AX: (DX) <- (DX:AX) mod (src), (AX) <- (DX:AX) div (src).
Flags: The CF, ZF, OF, SF, AF, and PF flags are undefined.
Misc: If the quotient does not fit in 8 bits (16 bits) and therefore cannot be stored in AL (AX), a divide overflow error will be raised. The same error is raised if src is 0.
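Both forms of DIV can be tried with the sketch below, written in emu8086 syntax. The dividends and divisors are assumptions chosen for illustration so that both quotients fit in the destination register.

```asm
org 100h

; 8-bit divisor: AX is divided by BL
mov AX, 0064h       ; dividend = 100
mov BL, 07h         ; divisor = 7
div BL              ; AL = 0Eh (quotient 14), AH = 02h (remainder 2)

; 16-bit divisor: DX:AX is divided by BX
mov DX, 0001h       ; high word of the dividend 00012345h
mov AX, 2345h       ; low word of the dividend
mov BX, 0010h       ; divisor = 16
div BX              ; AX = 1234h (quotient), DX = 0005h (remainder)

int 20h             ; return control to the operating system
```

Note that DX must be set before a 16-bit division, because it always supplies the high word of the dividend.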


1.5 The emu8086 Emulator


The emu8086 emulator, freely available for download at http://www.emu8086.com, will be used throughout the Microprocessor Architecture laboratories to exemplify the architectural attributes of the x86 architecture. This emulator will be used to write assembly language programs, compile them and execute them, with the purpose of understanding how the microprocessor operates.

The emu8086 emulator has many options, but the basic usage scenario for the laboratory will be the following:
1. Start the emulator. The Source Code window (without any assembly code) will be displayed.
2. Use the Source Code window to write an assembly language program. Save the program by clicking on the File menu -> Save As submenu.
3. Optionally, if the program is provided by your instructor, load the .asm file by clicking the Open button and selecting the corresponding source file.
4. Compile the program:
   4.1. Click the Compile button to compile the program.
   4.2. If the Assembler Status dialog, displayed after compilation, lists any compilation errors, double click on the error text to return to the Source Code window, fix the error and compile the program again.
   4.3. If there are no assembly errors, proceed to the next step.
5. Load the executable program in the emulator.
   5.1. In the Assembler Status dialog, displayed after the compilation, click the Run button.
6. Execute the program step-by-step, watch the status change of the registers, memory locations, flags, etc. and write down observations.
   6.1. Click the Reload button to reload the executed program.
   6.2. View and inspect the registers in the Emulator window.
   6.3. View the Source window by clicking on the Source button.
   6.4. Optionally, if needed, view the Flags window by clicking on the Flags button.
   6.5. Optionally, if needed, view the Memory window by clicking on the View menu -> Memory submenu.
   6.6. Optionally, if needed, view the Stack window by clicking on the Stack button.
   6.7. Click the Single Step button and note how the current, highlighted instruction is executed. Note the status change of the registers, memory locations, flags, etc.
   6.8. Repeat step 6.7 above until a Message dialog (saying that the program has returned control to the operating system) is displayed.
7. Draw conclusions regarding the effect of the various instructions on the registers, memory locations, flags, etc.

1.6 Exercises
1.6.1 Exercise 1
Objective. The purpose of this exercise is to get used to the emu8086 emulator and to the MOV and ADD instructions of the x86 instruction set.
Requirement. Write a program that computes the average of three unsigned 16-bit numbers.
Solution.
1. Start the emulator.
2. Use the Source Code window to write the following program:

      org 100h
      mov AX, 0h
      mov DX, 0h
      mov AX, 1234h
      add AX, 8017h
      adc DX, 0h
      add AX, 9423h
      adc DX, 0h
      mov BX, 3h
      div BX
      mov CX, AX
      int 20h

3. Understand the program!
   3.1. The first line of this program (org 100h) is not an instruction. This is an assembly directive specifying that the next instruction (and consequently the whole program) will be loaded in the memory (in the code segment) starting with address 100h.
   3.2. The following two instructions initialize the registers that will store the sum of the three numbers (and, after the division, the average). Note that the sum of three unsigned 16-bit numbers is a larger number, which might not fit into 16 bits. This is why we use two 16-bit registers (AX and DX) to store the result. DX will store the most significant two bytes and AX the least significant two bytes.
   3.3. Going further, the first number (1234h) is loaded into AX (mov AX, 1234h). Then, the second number is added to AX (add AX, 8017h).
   3.4. After the previous instruction the carry flag (CF) might have the value 1 (if the sum does not fit into 16 bits). The value of this bit is added to DX (adc DX, 0h).
   3.5. Going further, the third number (9423h) is added to AX (add AX, 9423h). Then, the carry flag is again added to DX (adc DX, 0h).
   3.6. The instruction mov BX, 3h loads the value 3h into BX, and the instruction div BX divides the 32-bit value stored in DX:AX (DX concatenated with AX) by the 16-bit value stored in BX. After the division, the quotient is stored in AX and the remainder is stored in DX.
   3.7. Finally, the result is copied from AX to CX.
   3.8. The instruction int 20h is a software interrupt. It ends the current program and returns control to the operating system.
4. Save the program (File menu -> Save As submenu) with the name lab1_prog1.asm.
5. Compile the program:
   5.1. Click the Compile button to compile the program.

   5.2. You will be prompted to save the executable file. Save it with the recommended name (lab1_prog1.com).
   5.3. View the compilation status in the Assembler Status dialog. If the program was edited correctly, the message should be "lab1_prog1.com is assembled successfully into 30 bytes."
6. Load the executable program in the emulator.
   6.1. Click the Run button to load the program in the emulator and execute it.
7. Execute the program step-by-step, watch the status change of the registers, memory locations, flags, etc. and write down observations.
   7.1. Click the Reload button to reload the executed program.

   7.2. Inspect the Emulator window and note that:
      7.2.1. The current instruction (mov AX, 00000h) is highlighted. This is the first instruction in the program and was loaded at the logical address 0700:0100 (segment address : effective address). The effective address was imposed by the org 100h assembly directive.
      7.2.2. The value in the register IP (the register that stores the effective address of the next instruction to be executed) is 0100h.
   7.3. Click the Single Step button to execute the first instruction. Note that:
      7.3.1. The value in the register IP has changed (to 0103h) because IP now points to the second instruction (highlighted), which is stored in the memory at address 0700:0103.
      7.3.2. The value stored in the AX register has not changed because it was already 0000h.
   7.4. Click the Single Step button again to execute the second instruction. Note that:
      7.4.1. The value in the register IP has changed (to 0106h) because IP now points to the third instruction (highlighted), which is stored in the memory at address 0700:0106.

      7.4.2. The value stored in the DX register has not changed because it was already 0000h.
   7.5. Execute the next instruction (mov AX, 1234h) and note that:
      7.5.1. Register AX was loaded with the value 1234h.
      7.5.2. The value in the register IP has changed again.
   7.6. Click on the Flags button and view the status of the flags. Note that for the moment all the flags are 0.
   7.7. Execute the next instruction (add AX, 8017h) and note that:
      7.7.1. Register AX was loaded with the sum of its previous value (1234h) and the value 8017h. The new value in AX is now 924Bh.
      7.7.2. The carry flag (CF) is still 0, because the sum of the two values fits into 16 bits.
      7.7.3. The zero flag (ZF) is still 0, because the result is a non-null value.
      7.7.4. The sign flag (SF) is 1 and the overflow flag (OF) is 0. These will be discussed in Laboratory 2.
      7.7.5. The parity flag (PF) is 1, because the least significant byte of the result (4Bh = 01001011 in binary) contains an even number of 1 bits.
      7.7.6. The value in the register IP has changed again.
   7.8. Execute the next instruction (adc DX, 0h) and note that:
      7.8.1. The value in register DX remains unchanged because the carry flag is 0.
      7.8.2. The value in the register IP has changed again.
   7.9. Execute the next instruction (add AX, 9423h) and note that:
      7.9.1. Register AX was loaded with the sum of its previous value (924Bh) and the value 9423h. The new value in AX is now 266Eh, which represents only the least significant 16 bits of the 17-bit result (1266Eh). The most significant bit, which is 1, is stored in the carry flag.
      7.9.2. The carry flag (CF) is 1, because the sum of the two values resulted in an arithmetic carry. In this case, the carry flag stores the most significant bit of the result.
      7.9.3. The zero flag (ZF) is still 0, because the result is a non-null value.
      7.9.4. The sign flag (SF) is 0 and the overflow flag (OF) is 1. These will be discussed in Laboratory 2.
      7.9.5. The parity flag (PF) is 0, because the least significant byte of the result (6Eh = 01101110 in binary) contains an odd number of 1 bits.
      7.9.6. The value in the register IP has changed again.
   7.10. Execute the next instruction (adc DX, 0h) and note that:
      7.10.1. The value in register DX is incremented by 1 (the value of the carry flag was added to the previous value of DX). Consequently, the value of DX is now 1h.
      7.10.2. All the arithmetic flags are updated again. Among them, the carry flag (CF) becomes 0, because the sum of DX, 0h and CF did not produce an arithmetic carry.
      7.10.3. The value in the register IP has changed again.
   7.11. Execute the next instruction (mov BX, 3h) and note that:
      7.11.1. Register BX was loaded with the value 3h.
      7.11.2. The value in the register IP has changed again.
   7.12. Execute the next instruction (div BX) and note that:
      7.12.1. The 32-bit value stored in DX:AX (0001266Eh) is divided by the 16-bit value stored in BX (3h). After the division, the quotient (6224h) is stored in AX and the remainder (2h) is stored in DX.
   7.13. Execute the next instruction (mov CX, AX) and note that:
      7.13.1. Register CX is loaded with the value stored in register AX. The value of AX does not change.
   7.14. The current instruction is int 20h. Click the Single Step button twice and note that a Message dialog, saying that the program has returned control to the operating system, is displayed. Click Ok.
8. Write down conclusions regarding the effect of the various instructions on the registers and memory locations.

1.7 Appendix 1. Numbering bases: 2, 10, 16


Any number can be represented as:
- a decimal number (number in base 10), written as a sequence of digits (0, 1, ..., 9),
- a binary number (number in base 2), written as a sequence of binary digits or bits (0 and 1),
- a hexadecimal number (number in base 16), written as a sequence of hexadecimal digits (0, 1, ..., 9, A, B, C, D, E and F). A hexadecimal number is identified using the h suffix (1A44h) or the 0x prefix (0x1A44).

In computer systems several other notions, such as byte, word and double word, are used.

                   Definition             Binary range             Unsigned decimal range   Hexadecimal range
byte (B)           sequence of 8 bits     00000000 ... 11111111    0 ... 255                0x00 ... 0xFF
word (w)           sequence of 16 bits                             0 ... 65535              0x0000 ... 0xFFFF
double word (dw)   sequence of 32 bits                                                      0x00000000 ... 0xFFFFFFFF

1.7.1 Unsigned-numbers base conversion


Decimal -> Binary (Hexadecimal)
- divide repeatedly by 2 (16), keeping track of the remainders
- write all the remainders in reverse order

Binary (Hexadecimal) -> Decimal
- multiply each binary (hexadecimal) digit by 2 (16) raised to the power of the digit's index (counting from 0 at the rightmost digit) and sum the products
- example (binary): decimalValue(1101_2) = 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0 = 13_10
- example (hexadecimal): decimalValue(3A_16) = 3*16^1 + 10*16^0 = 58_10

Binary <-> Hexadecimal
- note that a sequence of 4 bits converts into a single hexadecimal digit and vice versa:
   o 0000 -> 0h, 0001 -> 1h, ..., 1111 -> Fh
- a sequence of more than 4 bits can be converted to hexadecimal by converting sequences of 4 bits to hexadecimal (starting from right to left):
   o 0101101101101011110111 -> 01 0110 1101 1010 1111 0111 -> 1 6 D A F 7 -> 0x16DAF7
- a sequence of hexadecimal digits is converted digit by digit into sequences of 4 bits:
   o 0x3F9 -> 0011 1111 1001 -> 1111111001
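
The repeated-division method for Decimal -> Binary (Hexadecimal) can be illustrated with a short worked sketch. It reuses the values 13 and 58 from the examples in this section; each line writes the current value as divisor times quotient plus remainder, and the digits are obtained by reading the remainders from the last line to the first.

```latex
\begin{align*}
13 &= 2 \cdot 6 + 1 \\
6  &= 2 \cdot 3 + 0 \\
3  &= 2 \cdot 1 + 1 \\
1  &= 2 \cdot 0 + 1 \\
   &\Rightarrow 13_{10} = 1101_{2} \\[4pt]
58 &= 16 \cdot 3 + 10 \qquad (10 = \mathrm{A}_{16}) \\
3  &= 16 \cdot 0 + 3 \\
   &\Rightarrow 58_{10} = \mathrm{3A}_{16}
\end{align*}
```

The division stops when the quotient reaches 0, and the last remainder obtained becomes the most significant digit.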

For more information on numbering bases and conversions please see this website: http://www.purplemath.com/modules/numbbase.htm.