
A Brief x86 Assembler Tutorial

Assembly language programming for the Intel x86 chips is not necessarily a difficult task. However, it can be made more difficult than it need be, depending upon a number of factors. One of the most critical of these factors is the programmer's choice of assembler. There are three main assemblers in use today (I stand open to correction on this, as I have only ever heard of these three): MASM (Microsoft's Macro Assembler), TASM (Borland's Turbo Assembler) and A86. (It has recently been pointed out to me that, as well as these, there is the GNU assembler, which is released under the GNU General Public License and is freely available for use with a number of operating systems. It comes as part of the binutils package on Linux distributions.) Of these assemblers, I use A86, which is an excellent assembler and, though not free software, is shareware and so can be downloaded from the internet. It is written by Eric Isaacson, and comes with a hefty manual in the shape of 19 text files. I am including the entire package on my web-site, zipped, as well as the complete manual.

Tutorial 1 - The 8086 Chip


Before beginning to write programs in assembler, you need to know a few things about the chip for which you are writing the program. This tutorial will assume that all programs are being run on the 8086 chip and, as such, they will all run on any IBM compatible PC, as the 80386, 80486 and Pentium (I, II and III) chips are all designed to run 8086 code. The 8086 chip uses registers for performing operations. It has the following registers:

General Registers

AX BX CX DX

Segment Registers

CS    Points to start of the Code Segment
DS    Points to start of the Data Segment
ES    Extra Segment pointer
SS    Points to start of Stack Segment

Pointer Registers

IP    Instruction Pointer
SP    Stack Pointer
BP    Base Pointer

Data Transfer Registers

SI    Source Index
DI    Destination Index

Set of flags

When writing long or complicated programs in assembler one would need to use all of these, but when using A86 one can, for the most part, forget about all but the first four. These four general purpose registers can be used alone to perform most of the tasks one would want.

A86 by default assembles programs into .com files instead of .exe files. Com programs are small programs, less than 64 kilobytes in size, which is the size of one segment in memory, so in a .com program the data, code and stack all fit into one segment. Hence there is no need to worry about the four segment registers CS, DS, ES and SS, as DOS automatically sets them to point to the only segment in use when it loads the program - so forget about them for a few years! As for the pointer and data transfer registers, you can forget about them for a few years as well, as they will not be required when doing only simple assembler. The possible exception is the flags register, as we will be using the zero flag in some programs.

Now that we have limited ourselves to just four registers we can look at these more closely. The four registers have names ending with X and starting with the first four letters of the alphabet, so their names should not be hard to remember. Each of the registers is 16 bits wide (as the 8086 is a 16 bit processor). However, as one will frequently be working with bytes, each of the registers can be accessed one half at a time. To access the top half of a register simply replace the X in the name by an H, for high, and to access the lower half replace the X by an L, for low. So CH is the top eight bits of the CX register and AL is the lower eight bits of the AX register. For example, if AX holds 1234h then AH holds 12h and AL holds 34h. With that basic information known, we can now start writing programs.

Tutorial 2 - The MOV and INT Instructions


In this tutorial we will write our very first program in assembler and assemble it. We will also meet the two most basic instructions in assembler, from our point of view. These are the MOV instruction, which is used to transfer data, and the INT instruction, which calls a DOS interrupt.

The MOV Instruction


The MOV instruction is the instruction which will appear more than any other in an assembler program. All that it does is copy a piece of data from one location to another. It is similar in concept to the MOVE operator in the COBOL language, but it is used far more frequently. Here are a few examples of MOV instructions.

MOV BX,AX     ; This copies the contents of the AX register into the BX register.
MOV CH,DH     ; This copies the top byte of the DX register into the top byte of the CX register.
MOV BH,DL     ; This copies the bottom byte of the DX register into the top byte of BX.
MOV AH,12     ; This puts the value 12 decimal into the top half of the AX register.
MOV AH,0Ch    ; This does the same as the above except that the number is given in hexadecimal.
              ; Hexadecimal numbers MUST begin with a digit and end with an "h".
MOV DL,"*"    ; This puts the character "*" into DL (lower half of DX).
MOV DL,42     ; This does the same thing, as characters are stored as numbers. ( ASCII char 42 = "*" )

The above are all perfectly legal assembler statements (notice that comments are preceded by a ";". This is the same as "//" in C++ or "*" in COBOL). In fact you could type the above statements into a text file and it would assemble with A86. (If you do try this, do NOT run the .com file generated, as it will not terminate!) There are a number of things that you cannot do with the MOV instruction:

MOV AX,BH       ; Invalid operation, as you cannot move an 8 bit quantity to a 16 bit one.
MOV CH,BX       ; Similar to above, you cannot put a 16 bit quantity into an 8 bit one.
MOV 12,DL       ; You cannot put a value into the number 12. If you see this in a program, what
                ; is probably meant is MOV DL,12 - put 12 into DL.
MOV DL,AL,CL    ; You cannot have 3 operands!
MOV AH          ; Neither can you have only 1! You must have exactly 2: destination and source.

The INT Instruction
The INT instruction is the instruction which does the most work in any assembler program. What it does is call a DOS interrupt (like a function) to perform a special task. When one wants to read from the keyboard, disk or mouse, or write to the screen, one uses an interrupt. When using DOS, there are over 50 different interrupts available. Of these the programmer will only use a few. Each interrupt, though, has a number of subfunctions which select the individual task that the interrupt has to do. For example, there is just one interrupt for accessing the mouse, INT 33h, but there are separate subfunctions available to see if a button has been clicked, to see how far the mouse has moved, to display or hide the mouse pointer, etc.

An assembler programmer's best friend is a list of interrupts and their subfunctions, as whenever you want to do some input or output you can simply go down the list until you find the interrupt subfunction which does what you want, and use it. I, being the helpful chap that I am, have provided a brief interrupt list here which should be sufficient for most of your needs.

By now I'm sure you are asking, how do I use these wonderful interrupts? Thankfully, it is not difficult. One goes down the list until one finds the appropriate interrupt subfunction and moves the subfunction number to AH. One then looks at the input required by the function and moves the appropriate values to the registers stated. Example: if you go down the list you will see that interrupt 21h (the DOS interrupt), subfunction 2, outputs a character. So let us write a code extract which will output the character "!".

MOV AH,02     ; To select subfunction 2, move the appropriate number, 2, to AH.
MOV DL,"!"    ; In the interrupt list, it says that the character to output should be
              ; in register DL. So we move the character to DL.
INT 21h       ; Finally when all the registers are set as required, we call the interrupt.

Perhaps the most important of all the interrupt subfunctions is INT 21h, subfunction 4Ch. This is the function which terminates the program and returns the user to the operating system. Every assembler program you write should end with the following lines.

MOV AH,4Ch    ; Select the subfunction
MOV AL,00     ; Select a return value (optional but recommended)
INT 21h       ; Call the DOS interrupt.

Now we can write, assemble and run our first assembler program. Using a text editor such as MS-DOS Edit or Windows Notepad, type in the following lines (the same as above) and save the file as prog1.asm in the same directory as A86:
MOV AH,02      ; Function to output a char
MOV DL,"!"     ; Character to output
INT 21h        ; Call the interrupt to output "!"
MOV AH,04Ch    ; Select exit function
MOV AL,00      ; Return 0
INT 21h        ; Call the interrupt to exit

At the DOS command line, type in the following command: "A86 prog1.asm". The A86 assembler should start up (I am assuming you are already in the directory where A86.com, A86.lib and prog1.asm are) and assemble your program. If you then type "dir" you will see that the file prog1.com has been generated. Type "prog1" to run the program, and voila!
Tutorial 3 - Labels and Jumps.
In this third tutorial we will meet the idea of labels in assembler and see how to use them for conditional execution. In this section we will meet four new instructions, and encounter the zero flag for the first time.

Labels
Labels are names which are used to identify various pieces of code. In effect they give a name to a particular location in an assembler program. In assembler, a label consists of a name followed immediately by a colon. Any letter (upper or lower case) or digit, as well as the underscore, may be used in label names. Label names, like the rest of assembler, are not case sensitive (mov Ah,dL is the same as MOV AH,DL). The following are all valid label names:

start:
loop1:
read_a_key:
ANY_Label_3:
L1:

To put a label in your code is simple: just put it in the middle of your code with instructions either side of it. Remember, though, that no two labels can have the same name, and reserved words cannot be used as label names, e.g. you can't have a label "mov:", use "move:" instead. Below is our program from Tutorial 2 with labels inserted:

Output_char:      ; This point in the code is now called "Output_char".
MOV AH,02         ; To select subfunction 2, move the appropriate number, 2, to AH.
MOV DL,"!"        ; In the interrupt list, it says that the character to output should be
                  ; in register DL. So we move the character to DL.
INT 21h           ; Finally when all the registers are set as required, we call the interrupt.
Exit:             ; Labels should be relevant to the code after them.
MOV AH,4Ch        ; Select the subfunction.
MOV AL,00         ; Select a return value (optional but recommended).
INT 21h           ; Call the DOS interrupt.

Jumps and Conditional Execution


The principal use of labels in assembler is to perform conditional execution, the equivalent of IF statements in most high level languages. In assembler one needs two instructions to have conditional execution. The first instruction is nearly always the CMP instruction, which compares two values. If they are equal, one of the CPU's flags, known as the zero flag, is set (basically, the CMP instruction gets the difference between the two quantities and sees if it is zero or not). The second instruction necessary for conditional execution is the JMP instruction or a derivative thereof. These instructions shall now be examined individually.

JMP
The JMP instruction in assembler causes the program execution to continue from a certain point. The JMP instruction has just one operand which is either the address of the point in the program where execution is to start (very very rare) or a label. Consider the following piece of code:

start:
mov ah,08       ; sub-function 8 - read a character
int 21h         ; call interrupt
mov bl,al       ; save the key read in bl.
JMP output      ; a jump instruction causes the program to start running now from the
                ; output label, skipping out the next two lines.
mov ah,01       ; These never get executed...
int 21h
output:         ; execution continues here
mov dl,"("      ; output a "("
mov ah,02
int 21h
mov dl,bl       ; Then output the character read still held in bl
int 21h
mov dl,")"      ; Last output a ")"
int 21h
exit:           ; Terminate the program
mov ah,4ch
mov al,00
int 21h

The code executes in a linear fashion until it gets to the jmp instruction, from which point execution continues at the output label with the statement mov dl,"(". The two intermediate lines never get executed.

JZ
The JZ instruction is a form of the JMP instruction except that the jump occurs only when the zero flag is set. The instruction is read as "Jump if Zero".

JNZ
The JNZ instruction is the opposite of the JZ instruction in that the jump occurs when the zero flag is NOT set. It is read as "Jump if Not Zero".

CMP
The CMP (compare) instruction is used to compare two values and to act upon the result of that comparison. For now, we shall concern ourselves with the two most basic results of the comparison, whether the quantities are equal or not. The compare instruction essentially subtracts the two values passed to it and sets the zero flag if the difference is zero, i.e. the two quantities are equal. A combination of the CMP instruction and one of the jump instructions can be used to implement the assembler equivalent of a basic if statement. Consider the following example, which will read in a key and tell the user if he pressed escape (ASCII code 27):

start:
mov ah,08         ; again sub-function 8
int 21h           ; read the character
CMP al,27         ; compare it to the escape character
JNZ not_escape    ; if it is not equal (difference is not zero (NZ)) then go to not_escape
is_escape:        ; otherwise this code gets executed.
mov ah,02         ; subfunction 2 - output a character.
mov dl,"E"        ; output letter "E"
int 21h
mov dl,"S"        ; then output "S". (NOTE: 02 remains in ah register so no need to keep moving it)
int 21h
mov dl,"C"        ; finally output "C"
int 21h
not_escape:       ; if any other key is pressed execution continues here
mov ah,4ch        ; exit the program.
mov al,00
int 21h
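The example above only uses JNZ. As a complementary sketch (it is not part of the original tutorial, and the label names read_key and finished are made up for illustration), the following program combines JMP, CMP and JZ to echo each key pressed until escape is pressed:

```asm
; Sketch only: echo keys until escape is pressed.
; The label names read_key and finished are illustrative assumptions.
read_key:
mov ah,08         ; subfunction 8 - read a character (returned in al)
int 21h
cmp al,27         ; compare it to the escape character
jz finished       ; zero flag set: the key was escape, so leave the loop
mov dl,al         ; otherwise copy the key to dl...
mov ah,02         ; ...select subfunction 2...
int 21h           ; ...and output the character
jmp read_key      ; unconditional jump back to read the next key
finished:
mov ah,4ch        ; exit the program.
mov al,00
int 21h
```

Here the conditional jump (JZ) decides when to leave the loop, while the unconditional JMP is what makes the code repeat.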

Tutorial 4 - Variables and Strings.


In this, the fourth tutorial, we will cover how to allocate space in memory for variables in our programs and also how to output messages on the screen using strings. This whole topic is very basic and very simple, as well as being essential to do anything useful in assembler.

Unlike other assemblers, A86 does not require a special area in the program where variables are declared, filled with keywords to remember. However, it is good practice to have all your variables grouped together at the start of your program, so as to have a clear separation between variables and code. In assembler there are two "keywords" of sorts to remember: DB and DW. DB tells the assembler to allocate a byte of space (8 bits) for a variable, while DW allocates two bytes (16 bits) for the variable. Characters and small numbers are allocated with db, while most numbers for arithmetic are dw. Once variables are declared, they can then be used like registers, with the mov instruction to transfer data. Here is an example of the program from tutorial 3 with a variable to hold each bracket:

jmp start                ; First instruction should be a command, so jump over the data
;=======================
                         ; Data declarations clearly separated from code
leftbr db "("            ; The variables declared to hold the brackets are
rightbr db ")"           ; given them as initial values.
key db ?                 ; Variable declared to hold the key pressed (no initial value)
;=======================
start:                   ; Program execution starts here...
mov ah,08
int 21h                  ; Read a keypress
mov key,al               ; Store the key in the variable
output:
mov dl,leftbr            ; Move the variable to dl for output
mov ah,02
int 21h                  ; Output "("
mov dl,key
int 21h                  ; Output key
mov dl,rightbr
int 21h                  ; Output ")"
exit:
mov ah,4ch
mov al,00                ; Exit code 0
int 21h                  ; Terminate program

Strings and arrays can be declared in assembler too, by a number of methods. The easiest method involves simply assigning more than one character to a data item, and voila - a string. This allows us to output messages easily, like in a basic "hello world" program:
jmp start                       ; Start program...
;============================
msg db "Hello World.$"          ; A string variable with a value.
;============================
start:
mov ah,09                       ; subfunction 9 output a string
mov dx,offset msg               ; DX points to (holds the address of) the string
int 21h                         ; Output the message
exit:
mov ah,4ch
mov al,00                       ; Exit code 0
int 21h                         ; Terminate program

There are a couple of things to notice about the above program, and about outputting strings in general. Firstly, when using interrupt 21h, subfunction 9, to output strings, one must finish the string with a "$". Otherwise the computer will continue outputting characters from memory past the end of the string until a "$" is reached. The second thing to notice is that you do not move the string to DX, but you move the address of the string to the register. In the actual specification, it says that DS:DX must point to the string. This means that DS must contain the segment in which the string is, and DX holds the offset, or the address within that segment. However, as I mentioned in previous tutorials, when writing basic programs using A86, all the data and code is in one segment, so you can forget about the DS requirement. (If you don't follow this don't worry, you can still write simple assembler code without knowing it!)

When one outputs a string in assembler, the computer does not automatically move onto the next line before the next output. It will continue outputting on the same line until that line is full, which means until 80 characters have been output. To force a line break, insert the line feed and carriage return characters in your string - characters 10 and 13. To display "Hello World" on screen and move onto the next line, declare the variable msg as follows:
msg db "Hello World.",10,13,"$"

Non-printing characters, like the carriage return, have to be added to a string by way of their character numbers. Character numbers are not placed in inverted commas in the definition, and must be separated by commas.
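To see the line break in action, here is a minimal sketch of a complete program that prints two messages on separate lines. The variable names msg1 and msg2 and the message texts are assumptions chosen for illustration, not part of the original tutorial.

```asm
jmp start                         ; jump over the data
;============================
msg1 db "First line.",13,10,"$"   ; text, carriage return, line feed, terminator
msg2 db "Second line.",13,10,"$"
;============================
start:
mov ah,09                         ; subfunction 9 - output a string
mov dx,offset msg1                ; DX holds the address of the first string
int 21h
mov ah,09                         ; select subfunction 9 again to be safe
mov dx,offset msg2                ; DX holds the address of the second string
int 21h
exit:
mov ah,4ch
mov al,00                         ; Exit code 0
int 21h                           ; Terminate program
```

Without the 13,10 pair at the end of msg1, the second message would simply follow the first on the same line.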
Tutorial 5 - Mathematical Operators.
In assembler, more so than in high level programming languages, mathematical operations are essential. Even to perform the simplest things, like reading in or printing out a decimal number requires a surprisingly large number of mathematical operators. This is not a long or difficult tutorial because each of the mathematical operators is contained within one instruction and all one has to do is learn the appropriate instruction mnemonics. The instructions I have divided into four categories, and I include a nice sample program at the end.

Increment and Decrement


These are two of the most basic and useful instructions in the instruction set. The instruction "inc" adds one to the parameter, while the instruction "dec" subtracts one from the parameter. These operations are generally faster than using an add instruction to add one to the value.
inc ax          ;add one to contents of ax register
dec b[bx]       ;subtract one from byte pointed to by bx register
inc var_name    ;increment the variable var_name

Basic Arithmetic Operators


There are assembler instructions for all four basic arithmetic operators: addition, subtraction, multiplication and division. The important thing about these instructions is that the latter two, multiplication and division, are slow to carry out in comparison to other operations, particularly compared to bit operations such as the left and right shift given below. For this reason, when multiplying by a constant value, it can be quicker to perform the operation using a sequence of shifts and adds rather than a multiplication. Below are some example operations using adds and subtracts. Note, however, that a register or numeric literal must be one argument of the instruction - memory to memory adds are not allowed in one instruction.
add ax,bx      ;add value in bx to value in ax (result in ax)
sub bx,1       ;subtract 1 from the bx value
add [bx],ax    ;add value in ax to memory _word_ pointed to by bx
add [bx],al    ;add value in al to memory _byte_ pointed to by bx
sub num,cx     ;subtract cx value from variable "num"
add cx,num     ;add num value to cx value
sub num,5      ;subtract 5 from variable num

The multiplication and division operators are much more limited in their parameters. Each instruction takes only one parameter and the other is always the AX (and/or AL and AH) register. Here are some very simple example instructions (Only covering 8 bit multiply and divides to avoid using multiple registers):
mul bl       ;multiply bl * al giving result in ax.
mul ch       ;multiply ch * al giving result in ax.
mul num      ;multiply variable "num" by al giving ax (num is byte)
mov cl,7     ;mul cannot take a literal number, so put 7 in a register first
mul cl       ;ax = 7 * al
mul b[bx]    ;ax = value-pointed-to-by-bx * al. ("b" specifies 8 bit (byte value) mul)
div bl       ;divide ax by value in bl. result in al, remainder in ah
div ch       ;al = ax / ch, ah = ax % ch (% = modulus operator in C = "mod" in Pascal)
div num      ;al = ax / num, ah = ax % num
mov cl,7     ;as with mul, a literal divisor must first be put in a register
div cl       ;al = ax / 7, ah = ax % 7
div b[bx]    ;al = ax / [bx], ah = ax % [bx]
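As a small worked sketch of the 8 bit divide (not from the original tutorial), the program below splits a two digit number into its digits and prints them. The variable name value and its contents, 42, are assumptions for illustration; the value must be below 100 for this to work.

```asm
jmp start
value db 42        ; assumed example value (must be below 100)
start:
mov ah,00          ; clear the top half so that ax = value
mov al,value       ; ax = 42
mov bl,10
div bl             ; al = 42 / 10 = 4, ah = 42 % 10 = 2
mov bh,ah          ; save the remainder, as ah is needed for the subfunction number
mov dl,al
add dl,48          ; 48 = ascii('0'), so dl is now the character "4"
mov ah,02          ; subfunction 2 - output a character
int 21h
mov dl,bh
add dl,48          ; dl is now the character "2"
int 21h            ; 02 is still in ah
mov ah,4ch
mov al,00
int 21h
```

The quotient gives the tens digit and the remainder gives the units digit, which is the same idea the sample program at the end of this tutorial uses in a loop.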

Bit Shifting Operators


The bit shifting operators are operators which take the binary representation of a value and move the bits either left or right. With a left shift, a zero is added onto the right of the number and the leftmost bit is removed. This effectively multiplies the number by two, and it is very fast. The right shift is performed in the opposite way, and divides the number by two. The shift operators take two parameters, the data to be shifted and the amount it is to be shifted by. The second parameter is either a literal number or the cl register. (Strictly speaking, the original 8086 only accepts a literal count of 1; literal counts greater than 1 need an 80186 or later chip, so on a true 8086 use cl or repeat the shift.)
shl ax,2     ;multiply ax by 4 (2^2)
shr bl,1     ;divide bl by 2 (2^1)
shl ch,3     ;multiply ch by 8 (2^3)
shr dx,cl    ;divide dx by 2^value-in-cl
shr dl,4     ;clear the lower 4 bits...
shl dl,4     ;...of the dl register
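The earlier remark about replacing a multiplication by a constant with shifts and adds can be sketched as follows. This is an illustration rather than part of the original tutorial: the starting value 13 is an assumption, picked because 13 * 5 = 65 is the character code of "A" and so the result can be printed directly.

```asm
mov ax,13      ; example value x = 13 (assumed)
mov bx,ax      ; keep a copy of x in bx
shl ax,1       ; ax = x*2
shl ax,1       ; ax = x*4 (two single shifts, so this also runs on a plain 8086)
add ax,bx      ; ax = x*4 + x = x*5 = 65
mov dl,al      ; 65 is the character code of "A"...
mov ah,02      ; ...so output it with subfunction 2
int 21h
mov ah,4ch
mov al,00
int 21h
```

Any constant can be broken down in the same way, e.g. x*10 = x*8 + x*2, which is the form used in the sample program at the end of this tutorial.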

Logical Operators
As well as shifting bits left and right, the x86 instruction set also contains instructions for performing logical operations on the bits in numbers: and, or, not, and xor. Each of these except the not operator takes two parameters (not takes one). The two parameter operators are used in the same way and accept the same parameter types as add and subtract (that is, at least one parameter must be a literal or a register - no memory to memory operations are allowed). The not operator takes one parameter of any non-literal type, memory or register.

For a complete list of the operators and all possible legal parameters to them, consult the A86 Manual Chapter 6.
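As a brief sketch of how these operators behave (the starting value 6Ch is an arbitrary assumption, and the comments show the contents of bl in binary after each instruction):

```asm
mov bl,6Ch      ; bl = 0110 1100
and bl,0Fh      ; keep only the lower four bits:  bl = 0000 1100 (0Ch)
or bl,80h       ; set the top bit:                bl = 1000 1100 (8Ch)
xor bl,0FFh     ; flip every bit:                 bl = 0111 0011 (73h)
not bl          ; flip every bit back again:      bl = 1000 1100 (8Ch)
and bl,0F0h     ; clear the lower four bits:      bl = 1000 0000 (80h)
xor bl,bl       ; xor of a value with itself:     bl = 0000 0000
mov ah,4ch      ; terminate the program
mov al,00
int 21h
```

Note that and bl,0F0h clears the lower four bits in a single instruction, which has the same effect as the shr/shl pair shown in the bit shifting section.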


Sample Program
This is a simple sample program which reads in two numbers and outputs their sum. Simple, one would think, but not in assembler, as the inputs and outputs have to be converted to and from character values into their numeric equivalents, i.e. we read in the characters '1' and '2' but we have to convert this to the number 12. This makes the program longer.
jmp start
;****************************
;* Program to read in two   *
;* numbers and add them     *
;* and print out the result *
;****************************
number db 8 dup 0          ; string which will store input and output
                           ; (8 bytes, so that number+7 below stays inside it)
n1 dw 0                    ; two input variables
n2 dw 0
res dw 0                   ; one output variable
cr db 13,10,"$"            ; carriage return, line feed (bytes, so db not dw)
start:
mov dx,offset number
mov bx,dx
mov b[bx],5                ; maximum 5 characters to read
mov ah,0ah
int 21h                    ; read in a string from keyboard
mov bx,offset number +1
mov cx,00
mov cl,[bx]                ; cl now contains number of digits
mov ax,00                  ; ax will contain the number input
usedigit:
inc bx                     ; get next digit
shl ax,1                   ; multiply by 10 using 2 shift ops and an add...
mov dx,ax                  ; ... x*8 + x*2 = x*10 is the principle.
shl ax,2
add ax,dx                  ; ax is now multiplied by 10
mov dx,00
mov dl,[bx]                ; dl has new character
sub dx,48                  ; subtract 48 = ascii('0') to get number value
add ax,dx                  ; add to ax
loop usedigit              ; loop statement= jmp if cx > 0
cmp n1,00                  ; see if this is first or second number read
jnz second
mov n1,ax                  ; assign it to the first variable
jmp start                  ; read in another number
second:
mov n2,ax                  ; or assign to second variable and continue
print_cr:
mov ah,09
mov dx,offset cr           ; print out a carriage return character
int 21h
addnos:
mov ax,n1                  ; move numbers to registers ...
mov bx,n2
add ax,bx                  ; ...and add
mov res,ax                 ; store the result
mov cx,00
setup_string:
mov bx,offset number+7     ; put a $ at end of buffer.
mov b[bx],'$'              ; we will fill buffer from back forwards
dec bx
mov ax,res
convert_decimal:
mov dx,10
div dl                     ; divide by 10 (8 bit divide, so the quotient must fit
                           ; in al: this only works for results below 2560)
add ah,48                  ; convert remainder to character
mov [bx],ah                ; and move to buffer for output
dec bx
mov ah,00                  ; quotient becomes new value
cmp ax,00                  ; if we haven't got all digits divide again
jnz convert_decimal
printout:
mov dx,bx
inc dx                     ; we decremented once too many, go forward one.
mov ah,09
int 21h                    ; output the string
close:
mov ah,4ch
mov al,00
int 21h                    ; end program

Tutorial 6 - Some Basic Graphics


Not many application programs are written entirely in assembler these days. It is most usual to find the assembler code embedded in code in a high level language, such as C++ or Pascal. (If I use assembler at all these days, I use it embedded inside programs for Turbo Pascal.) The principal reason for using assembler is the increased speed of execution which it gives, and one area where this speed is most appreciated is computer graphics. This single tutorial will cover how to set up a graphics mode, how to return to text mode, and also how to place a single pixel on the screen. This can all be done using various interrupts, but for faster pixel plotting it is best to use direct memory accesses. The trouble with this is that each different screen resolution requires a different method of plotting pixels.


Switching Screen Modes


The interrupt used for switching between screen modes, and for all graphics work, is interrupt 10h. Subfunction 0 of this interrupt sets the screen mode, depending upon the value of the number in the AL register. A list of the basic graphics modes is given below (the mode numbers are in decimal).
Mode  Type      Text Res  Graphics Res  Colours
0     text      25 x 40   320 x 200     16
1     text      25 x 40   320 x 200     16
2     text      25 x 80   640 x 200     16
3     text      25 x 80   640 x 200     16
4     graphics  25 x 40   320 x 200     4
5     graphics  25 x 40   320 x 200     4
6     graphics  25 x 80   640 x 200     mono
13    graphics  25 x 40   320 x 200     16
14    graphics  25 x 80   640 x 200     16
15    graphics  25 x 80   640 x 350     mono
16    graphics  25 x 80   640 x 350     16
17    graphics  30 x 80   640 x 480     mono
18    graphics  30 x 80   640 x 480     16
19    graphics  25 x 40   320 x 200     256

When writing programs which use graphics, one should remember to return the display to text mode just before the program finishes. Mode 3 is a standard mode, which is appropriate for most programs to switch to before ending. The following is a stub of code which switches the display to a graphics mode (640 x 480 x 16) and then back to text mode again before ending.
;=========================================
; Basic program to change graphics modes
;=========================================
mov ah,00      ;subfunction 0
mov al,18      ;select mode 18 (or 12h if prefer)
int 10h        ;call graphics interrupt
;==== Graphics code here ====
mov ah,00      ;again subfunc 0
mov al,03      ;text mode 3
int 10h        ;call int
mov ah,04ch
mov al,00      ;end program normally
int 21h

Displaying and Reading Back Pixels - Simply


The displaying and reading back of pixels on the screen is again simply done using interrupts. The interrupt in question is again int 10h, this time subfunctions 0Ch and 0Dh, or decimal 12 and 13. The first of these displays a pixel on the screen at any resolution (provided the display is in a graphics mode) at the co-ordinates specified by the values in the cx (x co-ordinate) and dx (y co-ordinate) registers. The colour value is specified in the al register. The second function reads the value of the pixel at the co-ordinates again given by cx and dx, except this time it returns the colour in al. Below is a sample program which will display the outline of a rectangle in blue in the middle of the screen.
jmp start
;=========================================
; Basic program to draw a rectangle
;=========================================
mode db 18                 ;640 x 480
x_start dw 100
y_start dw 100
x_end dw 540
y_end dw 380
colour db 1                ;1=blue
;=========================================
start:
mov ah,00                  ;subfunction 0
mov al,mode                ;select mode 18 (or 12h if prefer)
int 10h                    ;call graphics interrupt
;==========================
mov al,colour              ;colour goes in al
mov ah,0ch
mov bh,00                  ;display page 0 (bh selects the page for this subfunction)
mov cx, x_start            ;start drawing lines along x
drawhoriz:
mov dx, y_end              ;put point at bottom
int 10h
mov dx, y_start            ;put point on top
int 10h
inc cx                     ;move to next point
cmp cx, x_end              ;but check to see if its end
jnz drawhoriz
drawvert:                  ;(y value is already y_start)
mov cx, x_start            ;plot on left side
int 10h
mov cx, x_end              ;plot on right side
int 10h
inc dx                     ;move down to next point
cmp dx, y_end              ;check for end
jnz drawvert
;==========================
readkey:
mov ah,00
int 16h                    ;wait for keypress
;==========================
end:
mov ah,00                  ;again subfunc 0
mov al,03                  ;text mode 3
int 10h                    ;call int
mov ah,04ch
mov al,00                  ;end program normally
int 21h
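The program above only writes pixels. As a hedged sketch of the read-back subfunction (0Dh), the program below plots one pixel, reads its colour back, and reports "Y" or "N" in text mode depending on whether the colour read matches the colour written. It is not from the original tutorial; the variable name result, the label report and the co-ordinates (100,100) are assumptions for illustration.

```asm
jmp start
colour db 1            ;1=blue
result db 0            ;will hold the colour read back (assumed name)
start:
mov ah,00              ;subfunction 0
mov al,18              ;select mode 18, 640 x 480
int 10h
mov ah,0ch             ;subfunction 0Ch - display a pixel
mov al,colour          ;colour goes in al
mov bh,00              ;display page 0
mov cx,100             ;x co-ordinate
mov dx,100             ;y co-ordinate
int 10h
mov ah,0dh             ;subfunction 0Dh - read a pixel
mov bh,00              ;display page 0
mov cx,100             ;same co-ordinates again
mov dx,100
int 10h                ;colour is returned in al
mov result,al          ;save it before al is reused
mov ah,00              ;again subfunc 0
mov al,03              ;text mode 3
int 10h
mov dl,"N"             ;assume the colours differ
mov al,result
cmp al,colour          ;compare colour read with colour written
jnz report
mov dl,"Y"             ;they match
report:
mov ah,02              ;output the answer
int 21h
mov ah,04ch
mov al,00              ;end program normally
int 21h
```

The result is stored in a variable rather than left in a register because the mode switch back to text mode needs al for the mode number.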


Tutorial 7 - Graphics with Direct Memory Access.


Drawing pictures on the screen using the BIOS interrupts is all very easy, but when push comes to shove, it's also very, very slow, as the BIOS routines are built to cope with every graphics mode. A faster way of plotting pixels is to place the bits directly in video memory, using, for example, a mov instruction. This is very, very fast, but it does limit you to the resolution for which the routine was written.

The simplest video mode for demonstrating this principle is mode 19 (13h), which has 320 x 200 pixels and uses 256 colours. 256 is 2^8, meaning that the colour value takes up exactly one byte per pixel. This makes the actual setting of pixels very easy: move the colour value into the appropriate memory byte. The video memory for this mode begins at segment A000h, and the pixels are then stored linearly in memory, row by row. The memory location to write to for pixel (x,y) is therefore at offset (y * 320) + x within segment A000h.

Unfortunately, this involves a previously unencountered complication. With the programs we have written so far, the data and program have all been placed in the one segment. However, the graphics memory is not within that segment, so we need to name a different segment, namely the one at the start of video memory. To access an address we now use two parts, the segment (stored in the es register) and the offset (stored in the di register). The whole address is referenced as es:[di].

Drawing horizontal and vertical lines in this mode is easy: to draw a horizontal line, simply fill all memory addresses from the starting point to the end point; to draw a vertical line, add 320 to the current pixel position and this gives the next point. Below is a sample program demonstrating this.
jmp start
;==============================
; Draws a horiz and vert line
;==============================
startaddr dw 0a000h        ;start of video memory
colour db 1
;==============================
start:
mov ah,00
mov al,19
int 10h                    ;switch to 320x200 mode
;=============================
horiz:
mov es, startaddr          ;put segment address in es
mov di, 32000              ;row 101 (320 * 100)
add di, 75                 ;column 76
mov al,colour              ;cannot do mem-mem copy so use reg
mov cx, 160                ;loop counter
hplot:
mov es:[di],al             ;set pixel to colour
inc di                     ;move to next pixel
loop hplot
vert:
mov di, 16000              ;row 51 (320 * 50)
add di, 160                ;column 161
mov cx, 100                ;loop counter
vplot:
mov es:[di],al
add di, 320                ;mov down a pixel
loop vplot
;=============================
keypress:
mov ah,00
int 16h                    ;await keypress
end:
mov ah,00
mov al,03
int 10h
mov ah,4ch
mov al,00                  ;terminate program
int 21h
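The line program uses offsets worked out by hand. As a sketch of how the offset (y * 320) + x might be computed at run time for an arbitrary pixel, the program below plots a single pixel using shifts and adds (y*320 = y*64 + y*256). It is an illustration, not part of the original tutorial: the names vidseg, xpos and ypos and the chosen values are assumptions.

```asm
jmp start
vidseg dw 0a000h       ;segment of video memory
xpos dw 160            ;assumed x co-ordinate (0 to 319)
ypos dw 100            ;assumed y co-ordinate (0 to 199)
colour db 4            ;4=red in the default palette
start:
mov ah,00
mov al,19
int 10h                ;switch to 320x200 mode
mov es,vidseg          ;put segment address in es
mov ax,ypos
mov cl,6
shl ax,cl              ;ax = y*64
mov di,ax              ;di = y*64
mov cl,2
shl ax,cl              ;ax = y*256
add di,ax              ;di = y*64 + y*256 = y*320
add di,xpos            ;di = (y*320) + x
mov al,colour
mov es:[di],al         ;set pixel to colour
mov ah,00
int 16h                ;await keypress
mov ah,00
mov al,03
int 10h                ;back to text mode
mov ah,4ch
mov al,00              ;terminate program
int 21h
```

With the values above the offset works out as 100 * 320 + 160 = 32160. The shift counts are passed in cl so that the code does not rely on literal shift counts greater than 1.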

That, basically, is all there is to it. Note how for switching to and from graphics mode we still use the int calls. This is because the change generally occurs only once per program and so, unlike pixel plotting, is not a bottleneck. The primary use of assembler for graphics is frequently to embed the code in a higher level language. Given below, then, is an implementation of a few basic graphics primitives created in assembler but embedded within Pascal functions (these will work with Borland/Inprise's Turbo Pascal compilers or can easily be converted to equivalent C/C++ functions).
const vga : word = $A000;
var oldmode : byte;

Procedure setMCGA; assembler;
{Sets the graphics mode, including saving the previous graphics mode}
asm
  mov ax,0F00h
  int 10h
  mov oldmode,al
  mov ax,0013h
  int 10h
end;

Procedure settext; assembler;
{Returns the program to the graphics mode it was previously in}
asm
  mov ah,00h
  mov al,oldmode
  int 10h
end;

procedure putpixel( x,y : word; colour : byte);
{sets the pixel at (x,y) to the colour given by colour. Calculations are done using shifts and additions not multiplications}
begin
  if (x>319) or (y>199) then exit;
  asm
    push ds          {save these two registers...}
    push di          {...by putting values on stack}
    mov ax,y
    shl ax,1         {ax=y*2}
    mov bx,ax        {bx=y*2}
    shl ax,2         {ax=y*8}
    add ax,bx        {ax=(y*8)+(y*2)=y*10}
    shl ax,5         {ax=(y*10)*2^5=y*320}
    mov bx,x         {ax has y offset, bx has x}
    add ax,bx        {add the offsets}
    mov di,ax        {di now has correct offset}
    mov ah,colour
    mov ds,vga       {ds now has segment}
    mov ds:[di],ah   {plot the pixel}
    pop di           {restore reg values}
    pop ds
  end;
end;
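
The shift-and-add trick in putpixel can be checked against plain multiplication. The sketch below is an illustration in C, not a translation of the Pascal unit; the function names are made up, and the second variant (320 = 256 + 64) is a common alternative split that does not appear in the routine above.

```c
#include <assert.h>

/* y * 320 the way the putpixel routine does it: (y*2 + y*8) << 5. */
unsigned mul320_shift_add(unsigned y)
{
    unsigned twice = y << 1;        /* y * 2 */
    unsigned eight = twice << 2;    /* y * 8 */
    return (eight + twice) << 5;    /* y * 10 * 32 = y * 320 */
}

/* An alternative split, assumed here for comparison: 320 = 256 + 64. */
unsigned mul320_two_shifts(unsigned y)
{
    return (y << 8) + (y << 6);
}

/* Check both forms against plain multiplication for every screen row. */
void check_mul320(void)
{
    for (unsigned y = 0; y < 200; y++) {
        assert(mul320_shift_add(y) == y * 320);
        assert(mul320_two_shifts(y) == y * 320);
    }
}

/* Offset of pixel (x, y), with the same bounds check as putpixel.
   Returns -1 when the pixel is off screen. */
long putpixel_offset(unsigned x, unsigned y)
{
    if (x > 319 || y > 199)
        return -1;
    return (long)(mul320_shift_add(y) + x);
}
```

The largest offset, for pixel (319,199), is 63999, so the result always fits in a 16-bit register such as di.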

Mixing Assembly and C-code


by Gregor Brunmar
Why mix programming languages?
After the last tutorial, you now feel like king of the world! =) You're eager to jump into the action, but there's one problem. Even though assembly is a powerful language, it takes time to read, write and understand. This is the main reason there ARE more programming languages than just assembly =). Now that we have a working 32-bit boot sector, we want to be able to continue our development in a higher language, whenever possible. C is my main choice, because it's common and powerful. If you think C is old and want to use C++ instead, I'm not stopping you. The choice is yours to make.


Say that we want a print() function instead of addressing the video memory directly. Also, we want a clrscr() to clear the screen. This could easily be done by making a for-loop in C. We can't make function calls from a binary file (e.g. our boot sector). For this purpose, we create another file, from which we will operate after the boot sector is done. So now we need to create a file, called 'main.c'. It will contain the main() function - yes, even operating systems can't escape main() =). As I said, a boot sector can't call functions. Instead, we read the following sector(s) from the boot disk, load it/them into memory and finally we jump to the memory address. We can do this the hard way using ports or the easy way using the BIOS interrupts (when we're still in Real mode). I choose the easy way, as always.

How do I do this?
We start as always, by creating a file (tutor3.asm) and typing:
[BITS 16]
[ORG 0x7C00]

When the BIOS jumps to our boot sector, it doesn't leave us empty handed. For example, to read a sector from the disk, we have to know what disk we are resident on. Probably a floppy disk, but it could as well be one of the hard drives. To let us know this, the BIOS is kind enough to leave that information in the DL register. To read a sector, the INT 13h is used. First of all, we have to 'reset' the drive for some reason. This is just for security. Just put 0 in AH for the RESET-command. DL specifies the drive and this is already filled in by our friend, the BIOS. The INT 13h returns an error code in the AH register. This code is 0 if everything went OK. We assume that the only thing that can go wrong, is that the drive was not ready. So if something went wrong, just try again.
reset_drive:
        mov ah, 0
        int 13h
        or ah, ah
        jnz reset_drive

The INT 13h has a lot of parameters when it comes to reading and loading a sector from the disk to the memory. This table should clarify them a bit.
Register   Function
ah         Command - 02h for 'Read sector from disk'
al         Number of sectors to read
ch         Disk cylinder
cl         Disk sector (starts with 1, not 0)
dh         Disk head
dl         Drive (same as the RESET-command)
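
To see how the cylinder, head and sector values in the table relate to "the sector right after the boot sector", here is a small sketch in C. It assumes the geometry of a 1.44 MB floppy (18 sectors per track, 2 heads, 80 cylinders); other disks differ, and the names lba_to_chs and struct chs are made up for this illustration.

```c
#include <assert.h>

/* Assumed geometry of a 1.44 MB floppy; other disks use other values. */
#define SECTORS_PER_TRACK 18
#define HEADS 2
#define TOTAL_SECTORS 2880

struct chs {
    unsigned cylinder;   /* goes in ch */
    unsigned head;       /* goes in dh */
    unsigned sector;     /* goes in cl, counted from 1 */
};

/* Convert a sector index counted from 0 (the boot sector) to CHS values. */
struct chs lba_to_chs(unsigned lba)
{
    struct chs r;
    assert(lba < TOTAL_SECTORS);
    r.cylinder = lba / (SECTORS_PER_TRACK * HEADS);
    r.head = (lba / SECTORS_PER_TRACK) % HEADS;
    r.sector = lba % SECTORS_PER_TRACK + 1;
    return r;
}
```

With this geometry the boot sector (index 0) is cylinder 0, head 0, sector 1, and the sector right after it (index 1) is cylinder 0, head 0, sector 2.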

Now, where shall we put the sectors we read? We have the whole memory to ourselves. Well, not the reserved parts, but almost the whole memory. Remember, we placed our stack in 090000h-09FFFFh. I choose 01000h for our 'kernel code'. In real mode (we haven't switched yet), this is represented by 0000:1000. This address is read from es:bx by the INT 13h. We read two sectors, just in case our code happens to get bigger than 512 bytes (likely).
        mov ax, 0
        mov es, ax
        mov bx, 0x1000
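
As a quick check of the segment:offset notation used here, the following C sketch computes the physical address that a real-mode pair refers to (segment times 16 plus offset). The function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode physical address: segment * 16 + offset. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Last byte occupied by a run of 512-byte sectors loaded at `address`. */
uint32_t end_of_load(uint32_t address, unsigned sectors)
{
    assert(sectors > 0);
    return address + sectors * 512u - 1;
}
```

So 0000:1000 is physical address 01000h, and two sectors loaded there occupy 01000h through 013FFh, well clear of the boot sector at 0000:7C00.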

Followed by the INT 13h parameters and the interrupt call itself.
        mov ah, 02h
        mov al, 02h
        mov ch, 0
        mov cl, 02h
        mov dh, 0
        int 13h
        or ah, ah
        jnz reset_drive

Now, we should have the next two sectors on the disk at memory address 01000h. Just continue with the code from tutorial 2 with two little adjustments. First, now that we're going to clear the screen, we don't need our 'P' at the top right corner anymore. And instead of hanging the computer, we will now jump to our new C-code.
        cli
        xor ax, ax
        . . .
        mov ss, ax
        mov esp, 090000h

Now, we want to jump to our code segment (08h) and offset 01000h. Remember, we didn't want our 'P' either. Change the following four lines:
        mov 0B8000, 'P'
        mov 0B8001, 1Bh
hang:
        jmp hang

To:
        jmp 08h:01000h

Don't forget to fill the rest of the file...


gdt:
gdt_null:
        . . .
times 510-($-$$) db 0
        dw 0AA55h
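
What the last two lines do to the file can be sketched in C: the code is padded with zeros up to byte 510 and the two signature bytes follow, low byte first. This is only an illustration; the function names are made up, and nothing here replaces the assembler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Copy `len` bytes of code into a zero-filled sector and append the signature.
   Returns 0 on success, -1 if the code does not leave room for the signature. */
int build_boot_sector(uint8_t sector[SECTOR_SIZE], const uint8_t *code, size_t len)
{
    assert(sector != NULL && code != NULL);
    if (len > SECTOR_SIZE - 2)
        return -1;
    memset(sector, 0, SECTOR_SIZE);   /* times 510-($-$$) db 0 */
    memcpy(sector, code, len);
    sector[510] = 0x55;               /* dw 0AA55h is stored low byte first */
    sector[511] = 0xAA;
    return 0;
}

/* True when the sector ends with the boot signature. */
int has_boot_signature(const uint8_t sector[SECTOR_SIZE])
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

If the code grows past 510 bytes there is no room left for the signature, which is why the sketch refuses such input.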

Moving on to actually writing the second sector =). This should be our main(). Our main() function should be declared as void and not as int. What should it return the integer to? We must declare the constant message string here, because I don't know how to relocate constant strings within a file (anyone know how to do this?). This works, but it's kind of ugly...
const char *tutorial3;

I always put the word const in, whenever possible. That's because it keeps me from making mistakes. Sometimes it's good and sometimes it ain't. Most of the time it's good to have it. First of all, we want to clear the screen, then we print our message and go into an infinite loop (hang). Simple as that.
void main()
{
    clrscr();
    print(tutorial3);
    for(;;);
}

But wait a minute?! You haven't declared clrscr() or print() anywhere? What's up with that? No, that's true. Because of my lack of knowledge of the linker, I don't know how to do that. This way, if we spelled everything right, the linker finds the appropriate function. If not, our OS will triple fault and die/reset. Ideas are welcome here... After main(), we place our string. After that, main.c is complete!
const char *tutorial3 = "MuOS Tutorial 3";
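
On the question of declaring clrscr() and print(): one common approach, offered here as a suggestion and not as the way the tutorial's files are written, is to put prototypes ahead of the code that calls them, usually in a shared header (a hypothetical video.h). A prototype only tells the compiler the function's signature and emits no code itself. The sketch below can be compiled on an ordinary machine because it replaces the real functions with stubs; kernel_body, clrscr_calls and last_message are made-up names, and the final for(;;) is left out so the function returns.

```c
#include <assert.h>
#include <string.h>

/* Prototypes: these would normally live in a shared header (hypothetical video.h). */
void clrscr(void);
void print(const char *message);

const char *tutorial3 = "MuOS Tutorial 3";

/* Stand-in for main() without the final for(;;) hang, so it can return. */
void kernel_body(void)
{
    clrscr();
    print(tutorial3);
}

/* Stub implementations that only record what was called. */
int clrscr_calls = 0;
char last_message[64];

void clrscr(void)
{
    clrscr_calls++;
}

void print(const char *message)
{
    assert(strlen(message) < sizeof last_message);
    strcpy(last_message, message);
}
```

With the prototypes in place, a misspelled call such as clrsrc() is reported by the compiler as an undeclared function (an error or a warning, depending on the compiler version and flags) instead of only showing up at link time.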


Now for our other functions. We place them in a file called 'video.c'. clrscr() is the easy one, so let's start with that.
void clrscr() {

We know that the video memory is resident at 0xB8000. So we start by assigning a pointer to that location.
unsigned char *vidmem = (unsigned char *)0xB8000;

To clear the screen, we just set the ASCII character at each position in the video memory to 0. A standard VGA console is initialized to 80x25 characters. As I told you in tutorial 2, the even memory addresses contain the ASCII code and the odd addresses the color attribute. By default, our color attributes should be 0Fh: white on black background, non-blinking. All we have to do is to make a simple for-loop.
    const long size = 80*25;
    long loop;
    for (loop=0; loop<size; loop++) {
        *vidmem++ = 0;
        *vidmem++ = 0xF;
    }

Now for the cursor position. If we cleared the screen, we also want our cursor to be in the top left corner. To change the cursor position, we have to use two assembly commands: in and out. The computer has ports, which are a way to communicate with the hardware. If you want to learn more, have a look at Chapter 9 in Intel's first manual (1.1MB PDF). It's a little tricky to change the cursor position. We have two ports: 0x3D4 and 0x3D5. The first one is an index register and the second a data register. This means that we specify what we want to read/write with 0x3D4 and then do the actual reading and/or writing from/to 0x3D5. This controller is called the CRTC and contains registers to move the cursor position, scroll the screen and some other things. The cursor position is divided into two registers, 14 and 15 (0xE and 0xF in hex). This is because one register is just 8 bits long and with that, you could only specify 256 different positions. 80x25 is larger than that, so the position was divided into two registers. Register 14 is the MSB of the cursor offset (from the start of the video memory) and 15 the LSB. We call a function out(unsigned short _port, unsigned char _data). This doesn't exist yet, but we'll write it later.
    out(0x3D4, 14);
    out(0x3D5, 0);
    out(0x3D4, 15);
    out(0x3D5, 0);
}
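
The memory layout and the split of the cursor offset into two bytes can be tried out on an ordinary machine with an array in place of the real video memory. This is only a sketch under that assumption; text_memory and the function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define COLS 80
#define ROWS 25

/* Stand-in for the text-mode memory at 0xB8000: two bytes per cell. */
uint8_t text_memory[COLS * ROWS * 2];

/* Same loop as clrscr(), writing into the array instead of 0xB8000. */
void clear_text_memory(void)
{
    uint8_t *vidmem = text_memory;
    for (long cell = 0; cell < COLS * ROWS; cell++) {
        *vidmem++ = 0;      /* even address: character code */
        *vidmem++ = 0x0F;   /* odd address: white on black  */
    }
}

/* Cursor offset in cells, counted from the top left corner. */
uint16_t cursor_offset(unsigned row, unsigned col)
{
    assert(row < ROWS && col < COLS);
    return (uint16_t)(row * COLS + col);
}

/* The two bytes that go into CRTC registers 14 (high) and 15 (low). */
uint8_t cursor_high(uint16_t offset) { return (uint8_t)(offset >> 8); }
uint8_t cursor_low(uint16_t offset)  { return (uint8_t)(offset & 0xFF); }
```

The last cell on the screen has offset 1999 (7CFh), so register 14 would receive 07h and register 15 would receive CFh. One byte alone could not hold that value.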


Now, to write the out() and in() functions, we need some assembly again. This time, we can stick to C and use inline assembly. We put them in a separate file called 'ports.c'. First, we have the in() function.
unsigned char in(unsigned short _port) {

This is just one assembly line, so if you want to know more about the in command, look in Intel's second manual (2.6MB PDF). Inline assembly is kind of special in GCC. First you program all your assembly stuff and then you specify inputs and outputs. We have one input and one output. The input is our port and the output is our value received from in.
    unsigned char result;
    __asm__ ("in %%dx, %%al" : "=a" (result) : "d" (_port));
    return result;
}

This looks rather messy, but I'll try to explain. The two %% say that this is a register. If we don't have any inputs or outputs, only one % is required. After the first ':', the outputs are lined up. The "=a" (result) tells the compiler to put result = EAX. If I'd write "=b" instead, then result = EBX. You get the point. If you want more than one output, just put a ',' and write the next and so on. Now to the inputs. "d" specifies that EDX = _port. Same as output, but without the '='. Plain and simple =). Now to the out(). Same as for in(), but with no outputs and two inputs instead. I hope this speaks for itself.
void out(unsigned short _port, unsigned char _data)
{
    __asm__ ("out %%al, %%dx" : : "a" (_data), "d" (_port));
}

Then we have the print(). Three variables are needed. One pointer to the video memory, one to hold the offset of the cursor position and one to use in our print-loop.
void print(const char *_message)
{
    unsigned char *vidmem = (unsigned char *)0xB8000;
    unsigned short offset;
    unsigned long i;

We want print() to write at the cursor position. This is read from the CRTC registers with the in() function. Remember that register 14 holds bits 8-15, so there we need to left shift the bits we read. We increase the vidmem pointer by two times offset, because every character has both ASCII code and a color attribute.


    out(0x3D4, 14);
    offset = in(0x3D5) << 8;
    out(0x3D4, 15);
    offset |= in(0x3D5);
    vidmem += offset*2;

With a correct vidmem pointer, we're all set to start printing our message. First we initialize our loop variable i. The loop should execute as long as the value we are next to print, is non-zero. Then we simply copy the value into vidmem and increase vidmem by two (we don't want to change the color attribute).
    i = 0;
    while (_message[i] != 0) {
        *vidmem = _message[i++];
        vidmem += 2;
    }

Our message is printed and all that is left to do is to change the cursor position. Again, this is done with out() calls.
    offset += i;
    out(0x3D5, (unsigned char)(offset));        /* index 15 is still selected from the read above */
    out(0x3D4, 14);
    out(0x3D5, (unsigned char)(offset >> 8));
}
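
The whole print() sequence can be exercised without real hardware by simulating the two CRTC ports. The sketch below is a stand-in for testing on an ordinary machine, not a replacement for ports.c: text_cells, crtc_regs and crtc_index are made-up names, in() and out() only touch those arrays, and the bounds checks are additions that the tutorial's version does not have. It also selects register 15 explicitly before writing the low byte, which is a small deviation from the code above.

```c
#include <assert.h>
#include <stdint.h>

uint8_t text_cells[80 * 25 * 2];   /* stand-in for the memory at 0xB8000 */
uint8_t crtc_regs[256];            /* stand-in for the CRTC register file */
uint8_t crtc_index;                /* last value written to port 0x3D4 */

/* Simulated port I/O: 0x3D4 selects a register, 0x3D5 reads or writes it. */
void out(unsigned short port, unsigned char data)
{
    if (port == 0x3D4)
        crtc_index = data;
    else if (port == 0x3D5)
        crtc_regs[crtc_index] = data;
}

unsigned char in(unsigned short port)
{
    return port == 0x3D5 ? crtc_regs[crtc_index] : 0;
}

void print(const char *message)
{
    unsigned short offset;
    unsigned long i = 0;

    /* Read the cursor offset: register 14 holds bits 8-15, register 15 bits 0-7. */
    out(0x3D4, 14);
    offset = (unsigned short)(in(0x3D5) << 8);
    out(0x3D4, 15);
    offset |= in(0x3D5);

    assert(offset < 80 * 25);
    uint8_t *vidmem = text_cells + offset * 2;
    while (message[i] != 0 && offset + i < 80 * 25) {
        *vidmem = (uint8_t)message[i++];
        vidmem += 2;                    /* skip the attribute byte */
    }

    /* Write the new cursor offset back, low byte then high byte. */
    offset = (unsigned short)(offset + i);
    out(0x3D4, 15);
    out(0x3D5, (unsigned char)offset);
    out(0x3D4, 14);
    out(0x3D5, (unsigned char)(offset >> 8));
}
```

Printing "Hi" with the cursor at offset 0 puts 'H' in byte 0 and 'i' in byte 2 of the array, and leaves the simulated cursor at offset 2.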

To compile, we start with the boot sector.


nasmw -f bin tutor3.asm -o bootsect.bin

For the rest of the C-files, we first compile each file separately and then link them together.
gcc -ffreestanding -c main.c -o main.o
gcc -c video.c -o video.o
gcc -c ports.c -o ports.o
ld -e _main -Ttext 0x1000 -o kernel.o main.o video.o ports.o
ld -i -e _main -Ttext 0x1000 -o kernel.o main.o video.o ports.o
objcopy -R .note -R .comment -S -O binary kernel.o kernel.bin

'-i' says that the build should be incremental. First link without it, because when '-i' is used, the linker doesn't report unresolved symbols (misspelled function names for example). When it links without errors, add '-i' to reduce the size. '-e _main' specifies the entry symbol. '-Ttext 0x1000' tells the linker that we are running this code at memory address 0x1000. Then we just specify what output format we want, the output file name and list our .o-files, starting with main.o (important!). The objcopy line turns the .o-file into a plain binary file by removing some sections.

We're not done yet. We have our boot sector and our kernel. The boot sector assumes that the kernel resides in the two following sectors on the same disk. So, we need to make them into one file. For this, I've made a special program in C. I'm not going into any details about it, but I'll include the source code. The program is called 'makeboot' and takes at least three parameters. The first one is the output file name. This can be 'a.img' in our case. The rest of the parameters are input files, read in order. We want our boot sector to be placed first and then our kernel.
makeboot a.img bootsect.bin kernel.bin
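
The makeboot source is not reproduced in this section. For readers who want to see the idea, the following is a guess at what such a tool has to do, namely copy the input files one after another into the output file. It is a sketch, not the author's program: the function names are made up, and it does no padding to a sector boundary or to a full floppy size.

```c
#include <assert.h>
#include <stdio.h>

/* Append the whole of `in_path` to the open stream `out`.
   Returns the number of bytes copied, or -1 on error. */
long append_file(FILE *out, const char *in_path)
{
    FILE *in = fopen(in_path, "rb");
    if (in == NULL)
        return -1;

    char buffer[512];
    long total = 0;
    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n) {
            fclose(in);
            return -1;
        }
        total += (long)n;
    }
    fclose(in);
    return total;
}

/* Write the inputs, in order, into one image file.
   Returns the total size of the image, or -1 on error. */
long make_image(const char *out_path, const char *inputs[], int count)
{
    assert(count > 0);
    FILE *out = fopen(out_path, "wb");
    if (out == NULL)
        return -1;

    long total = 0;
    for (int i = 0; i < count; i++) {
        long copied = append_file(out, inputs[i]);
        if (copied < 0) {
            fclose(out);
            return -1;
        }
        total += copied;
    }
    return fclose(out) == 0 ? total : -1;
}
```

Because the boot sector is exactly 512 bytes, putting it first means the kernel starts at byte 512 of the image, which is sector 2 as the boot code expects.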

Code:
[BITS 16]                       ; We need 16-bit instructions for Real mode
[ORG 0x7C00]                    ; The BIOS loads the boot sector into memory location 0x7C00

reset_drive:
        mov ah, 0               ; RESET-command
        int 13h                 ; Call interrupt 13h
        or ah, ah               ; Check for error code
        jnz reset_drive         ; Try again if ah != 0

        mov ax, 0
        mov es, ax
        mov bx, 0x1000          ; Destination address = 0000:1000

        mov ah, 02h             ; READ SECTOR-command
        mov al, 02h             ; Number of sectors to read = 2
        mov ch, 0               ; Cylinder = 0
        mov cl, 02h             ; Sector = 2
        mov dh, 0               ; Head = 0
        int 13h                 ; Call interrupt 13h
        or ah, ah               ; Check for error code
        jnz reset_drive         ; Try again if ah != 0

        cli                     ; Disable interrupts, we want to be alone

        xor ax, ax
        mov ds, ax              ; Set DS-register to 0 - used by lgdt

        lgdt [gdt_desc]         ; Load the GDT descriptor

        mov eax, cr0            ; Copy the contents of CR0 into EAX
        or eax, 1               ; Set bit 0
        mov cr0, eax            ; Copy the contents of EAX into CR0

        jmp 08h:clear_pipe      ; Jump to code segment, offset clear_pipe

[BITS 32]                       ; We now need 32-bit instructions
clear_pipe:
        mov ax, 10h             ; Save data segment identifier
        mov ds, ax              ; Move a valid data segment into the data segment register
        mov ss, ax              ; Move a valid data segment into the stack segment register
        mov esp, 090000h        ; Move the stack pointer to 090000h

        jmp 08h:01000h          ; Jump to section 08h (code), offset 01000h

gdt:                            ; Address for the GDT

gdt_null:                       ; Null Segment
        dd 0
        dd 0

gdt_code:                       ; Code segment, read/execute, nonconforming
        dw 0FFFFh
        dw 0
        db 0
        db 10011010b
        db 11001111b
        db 0

gdt_data:                       ; Data segment, read/write, expand up
        dw 0FFFFh
        dw 0
        db 0
        db 10010010b
        db 11001111b
        db 0

gdt_end:                        ; Used to calculate the size of the GDT

gdt_desc:                       ; The GDT descriptor
        dw gdt_end - gdt - 1    ; Limit (size)
        dd gdt                  ; Address of the GDT

times 510-($-$$) db 0           ; Fill up the file with zeros

        dw 0AA55h               ; Boot sector identifier
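
The six values under gdt_code and gdt_data follow the standard 8-byte segment descriptor layout. As a cross-check, the C sketch below packs a base, a 20-bit limit, an access byte and the 4 flag bits into that layout. The function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack base, 20-bit limit, access byte and 4-bit flags into the 8-byte layout. */
void encode_descriptor(uint8_t d[8], uint32_t base, uint32_t limit,
                       uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF && flags <= 0xF);
    d[0] = limit & 0xFF;                    /* dw: limit bits 0-15  */
    d[1] = (limit >> 8) & 0xFF;
    d[2] = base & 0xFF;                     /* dw: base bits 0-15   */
    d[3] = (base >> 8) & 0xFF;
    d[4] = (base >> 16) & 0xFF;             /* db: base bits 16-23  */
    d[5] = access;                          /* db: access byte      */
    d[6] = (uint8_t)((flags << 4) | ((limit >> 16) & 0x0F)); /* db: flags, limit bits 16-19 */
    d[7] = (base >> 24) & 0xFF;             /* db: base bits 24-31  */
}

/* Highest valid offset: the limit counts 4 KB pages when the G flag (bit 3) is set. */
uint64_t segment_last_byte(uint32_t limit, uint8_t flags)
{
    return (flags & 0x8) ? (((uint64_t)limit + 1) << 12) - 1 : limit;
}

/* A selector's upper 13 bits are the index into the GDT. */
unsigned selector_index(uint16_t selector)
{
    return selector >> 3;
}
```

Encoding base 0, limit FFFFFh, access 9Ah and flags Ch gives the bytes FF FF 00 00 00 9A CF 00, which matches gdt_code; with access 92h it matches gdt_data. Selector 08h is entry 1 (the code segment) and 10h is entry 2 (the data segment), and both segments cover the full 4 GB.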
