
While learning ASM, I found many tutorials to be very confusing, and they did not cover assembly in the detail that's necessary for such a complicated programming language as this one. So, I wrote this rudimentary tutorial in order to ease the pain others may have learning ASM. The problem with most beginner-level tutorials is that they assume the reader has previous programming knowledge in one language or another. While I'll make comments that draw connections between programming in BASIC and ASM, I hope to write this in such a way that you can skip these remarks without affecting your learning, therefore making this a completely newbie-level tutorial.

First off, I believe it is very difficult to learn programming without programming as you learn. So, I suggest you have a copy of TASM, a necessary utility for writing assembly programs. Also, before you start, it's important that you understand hexadecimal and binary.

2.1 - Introduction to programming
[Those with programming experience in any other language may want to ignore this section] So what is programming anyway? Well, the basic idea is that a computer program is made up of a bunch of "instructions" that a computer follows. For the most part, a program is made by typing in a bunch of instructions that make much more sense to us than they do to the computer. Then, they are translated ("compiled" or "assembled") into a program that the computer can understand. This is why you need to download and install the software higher up on this page. For our purposes, we can type these commands into a simple, standard text editor such as Notepad. Actually, this is preferred - if you use a more advanced program like Microsoft Word, you'll have to make sure that you save it as "text only". So, if you can, use Notepad. It's standard with all versions of Windows.

2.2 - Your first program
Open up Notepad, or whatever you happen to have decided to type with. For a start, your programs should always have this skeleton:

.MODEL SMALL
.STACK 200H
.CODE
START:

END START

That is, all your programs should include these lines. Your whole program will go in the lines between "start" and "end start".

If you type these lines into your file by hand instead of using Copy+Paste, it's very important that you notice the periods at the beginning of the first few lines. And, notice the colon after START. Even the smallest dot is a very important piece in programming, so never overlook them. Now, start and end start don't really mean much to a computer. But, to us, start is the beginning of something. And end start doesn't make a lot of logical sense to us, but that's how it goes, so just grin and bear it. End Start tells where the end of the main program is. But right now, our program does absolutely nothing! So, we may want to learn about the different commands we can use in assembly.

2.3 - Interrupts
We can write a very simple program that puts just a character of text on the screen using just "interrupts". If you're familiar with any higher level languages, you can think of interrupts as essentially commands. Interrupts each have some complicated operation(s) they perform, and all they require is that you give them a small amount of information. In this case, we'll be using an interrupt that can put text characters on the screen. Because one interrupt may have many other functions it can perform, we must tell it which one to do. Then, we give it the required information, and tell it to do whatever it may do. We can therefore do very complicated operations while being totally oblivious to how they work. Here is our example program
.MODEL SMALL
.STACK 200H
.CODE
START:

Mov ah, 2
Mov dl, 1
Int 21h

mov ah, 4ch
mov al, 00h
int 21h

END START

It may seem like a collection of completely arbitrary words and numbers, but only at first. We soon realize that it is a very concrete concept. Every part along the way does its important part. The tiny pieces of code result in one big program that does exactly what we expected. Here's a breakdown, line by line, of what the program does:

1. We put the number 2 in a specific location in the computer's memory. Later, the computer will look at this number and, in this case, this number tells which "function number" the interrupt should do. As mentioned before, most interrupts can do a variety of functions. So, we must tell it which one to do. In this case, we want the DISPLAY OUTPUT function. This is function number 2. So, we put the number 2 in a specific place, just waiting for the computer to look it up later.

2. We put the number 1 in a different specific place in memory. We've already specified that we want to use function 2 of the specific interrupt, which is DISPLAY OUTPUT. But what should it display? Well, different characters of text have different number codes assigned to them (this is unrelated to the base-whatever numbering stuff we talked about earlier, just to let you know). This code is called "ASCII". So, if we're going to be displaying text, we should specify what text. The number 1 in the ASCII code happens to correspond to a little smiley face. After all of this, we've so far established that we want the computer to do some text displaying, and that the text we want to display is a smiley face.

3. The two pieces of information we gave the computer would be worthless if we didn't do something with them. In this 3rd line, we tell the computer to use interrupt #21. As soon as this happens, it looks at the place in memory called "ah" and sees what's there, because it must know which of its numerous functions it should do. It ends up figuring out that it should display text, and ultimately, it should display a smiley face. Note that there's an "h" after the number 21. If we put an h after a number, it means that the number is not 21 in decimal. It's 21 in hexadecimal. Remember, don't think that this means 21 in both cases. Think of this hexadecimal number as "two one"; if we convert it to decimal, we find that it's 33. But, it's common programming practice to use hexadecimal when referring to interrupts, rather than their decimal equivalents. So, unless you're a devout non-conformist, make it easy on yourself and think of this as "Int twenty-one h", not "Int thirty three". It'll make it much easier for you to communicate about assembly, as everyone else calls interrupts by their hexadecimal numbers.

4. The final commands end the program. This is necessary at the end of all your programs, unless you want awful things to happen. If you forget this, random effects, which will more than likely freeze up the computer, will result.

There we have it! We've effectively written a program doing exactly what we expected from the outset. A couple of things you should note:

Firstly, the blank lines are just my style of separating code to make it easier to read. The assembler, which we'll explain using in just a second, doesn't care one way or the other if there are blank lines, as long as they don't actually hurt the code in some way, and they generally don't. You can take them out if you like, or add more at will. It doesn't matter, because the only important part is the code: the commands involved in our program.

Secondly, in "Mov ah", ah is not a hexadecimal number (A in hexadecimal, which is 10 in decimal). In this instance, it's the name of a place in memory. The resemblance is just a coincidence. More is explained in the next section.
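If you'd like to double-check the hexadecimal claims above without a calculator, here is a minimal sketch in Python (an assumption on my part; nothing else in this tutorial needs Python). The helper name hex_to_decimal is made up for this illustration. It reads assembler-style numbers such as 21h and converts them to decimal.

```python
def hex_to_decimal(text):
    """Convert assembler-style hex such as '21h' or '4ch' to a decimal integer.

    hex_to_decimal is a hypothetical helper written for this example only.
    """
    digits = text.lower().rstrip("h")  # drop the trailing 'h' marker
    return int(digits, 16)             # read the remaining digits in base 16


print(hex_to_decimal("21h"))  # 33, the decimal value of interrupt 21h
print(hex_to_decimal("4ch"))  # 76, the decimal value of function number 4ch
print(hex_to_decimal("ah"))   # 10, which is why the register name "ah" merely looks like a hex number
```

The last line shows the coincidence mentioned above: read as a number, "ah" would be A in hexadecimal, but in "Mov ah, 2" it is only a name.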

2.4 - The Registers
In the last section there was a strange unexplained part. Primarily, these two lines:

Mov ah, 2
Mov dl, 1

First, let's explain "Mov". It's a very important instruction in ASM. It appears to be shorthand for the word "Move". It is. This command, unlike an interrupt, does a tiny, simple job. The first one takes the number 2 and "moves" it into a place the computer explicitly calls "ah". So we can deduce that the next command moves the number 1 into a place called "dl". This makes a lot of sense. The MOV instruction can be used in other ways, though. For example, this would work:

Mov ah, dl

The computer would take whatever is in dl and move it into ah. However, to say "move" is misleading, because it's not actually moved. Whatever is in dl stays there; it's also in ah.

So what are ah and dl anyway? We know from many previous mentions that they're specific places in memory. They're called registers. The ones we're mainly concerned with right now are AX, BX, CX, and DX. They're made up of two 'pieces' each, hence smaller registers. Ah, which we've already encountered, is one of the parts of ax. Ax also has another part called al. They make up the higher and lower parts of the register ax. And the h and l in ah and al mean "High" and "Low". Likewise, bx is made of bh and bl, and so on:

ah + al = ax
bh + bl = bx
ch + cl = cx
dh + dl = dx

So, we conclude that many registers, or at least the ones we care about, are made of 2 smaller parts, combined into the bigger register. For example, if we did this:

Mov ah, 1
Mov al, 0FFh

Then, if we looked at what is in ax, we would see it contained:

01FF

Why? Because the "High" part contains 1, or 01, and the "low" part contains FF. So, to find the value of ax, we combine them (don't add them, though: 01 + FF = 100, not 01FF). When combined to make ax, they make 01FF.

One final thing to mention: ah, al, bh, bl, and so on can each have a value of 0 to 255. But when combined into the bigger register, the total value possible is 0 to 65,535.
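To make "combine, don't add" concrete, here is a small sketch in Python (again an assumption, only used as a calculator here). The function names combine and split are made up for this example. Combining means the high part is shifted up by 8 bits and the low part fills the bottom 8 bits.

```python
def combine(high, low):
    """Build a 16-bit value from two 8-bit halves, the way ah and al make up ax."""
    return (high << 8) | low


def split(value):
    """Return the (high, low) 8-bit halves of a 16-bit value."""
    return value >> 8, value & 0xFF


ax = combine(0x01, 0xFF)
print(format(ax, "04X"))             # 01FF: the halves sit side by side
print(format(0x01 + 0xFF, "X"))      # 100: plain addition gives a different number
print(combine(0xFF, 0xFF))           # 65535: the largest value two bytes can hold
print(split(0x01FF))                 # (1, 255): back to the ah and al parts
```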

2.5 - Compiling our programs
We left off part 2.3 with a finished program. However, we never actually made a program out of it. Well, to make a program is really quite simple. First, save your program as something like "First.asm". Then, go to the folder where you have TASM and type this into the address bar:

>Tasm.exe First First.obj

As long as your program has no problems, this will make a file in the same directory called "first.obj". Then, type this into the address bar:

>Tlink First.obj

Finally, this will make a program called "First.exe"! Hoorah! Our first successful compile! (hopefully). Now, click on it to run it. You may have problems seeing it run because it opens and closes itself too fast. Enjoy!

3.1 - Memory
True, MOV, interrupts, and registers are very important. However, there's not a whole lot that can be done using only them. To move on, we'll need to understand a little bit about the computer's memory. This can become quite complex, so just read through slowly, and go back over it if something confuses you.

We'll first start with how memory is divided up. And to do this, we also need to just know about memory in general. Basically, a computer's memory is a piece of circuitry, most of the time many pieces. It has small points in the circuits called "transistors" that can either have an electric charge of 5v, or no charge. The millions of these that the computer has are where it stores everything. Taking into account our previous knowledge of binary, we remember that in binary a digit can only be either 0 or 1. So, we could think of a transistor with a charge, or one with no charge, as the same as 1 and 0 in binary. This turns out to be true. In binary, 1 and 0 represent each transistor of memory. Each transistor is called a "bit". This is short for "BInary digiT".

You see, hexadecimal is also important in our discussion. If we wanted to look at memory and it was all in binary form, it would be very cryptic:

1010111010000111001100011100011100111110...

So, to make memory easier to read, we can read it in hexadecimal numbers. Well, recall that in hexadecimal the highest digit is F, which has a decimal equivalent of 15. In binary, that would take up four digits to show: 1111 is the same as F in hexadecimal.
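Since four binary digits always match exactly one hexadecimal digit, a long run of bits can be read four at a time. Here is a minimal sketch of that idea in Python (an assumption; it is only a checking tool here). The name bits_to_hex is made up for this example.

```python
def bits_to_hex(bits):
    """Read a string of 0s and 1s four digits at a time as hexadecimal."""
    # Pad on the left with zeros so the length is a multiple of 4.
    width = (len(bits) + 3) // 4 * 4
    padded = bits.zfill(width)
    groups = [padded[i:i + 4] for i in range(0, width, 4)]
    # Each group of four bits becomes exactly one hex digit (0-F).
    return "".join(format(int(group, 2), "X") for group in groups)


print(bits_to_hex("1111"))      # F
print(bits_to_hex("11111111"))  # FF
# The cryptic example from the text, read four bits at a time:
print(bits_to_hex("1010111010000111001100011100011100111110"))
```

Ten hex digits are a lot easier on the eyes than forty binary ones, which is the whole point of reading memory in hexadecimal.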

Since our previous unit was called a "Bit", to keep in the same naming theme, 4 bits are called a "Nibble". Then, it just goes up from there:

8 bits = Byte
2 Bytes = Word
2 Words = Double Word (DWORD for short)

Then, for really big numbers, there's these:

1024 bytes = kilobyte (KB)
1024 KB = megabyte (MB)
1024 MB = gigabyte (GB)
1024 GB = terabyte
1024 terabytes = petabyte
1024 petabytes = exabyte

Terabyte can be TB, petabyte PB, etc., but these are not in common use. As of 2007, terabyte is only starting to come into common use as hard drives get larger. In any case, we just need to deal with the terms "bit", "byte", and "word". They're the ones that'll come up most often in low-level programming.

As I mentioned briefly in the last section, the registers ah, al, and so on could only have a maximum value of 255. This may seem arbitrary at first - why not 999? Because they can only hold one byte. One byte is 8 bits, and the highest number we can make in binary with 8 bits is this:

11111111

So when we put together ah and al, the highest number is 65535. Why? Well, each register can hold 1 byte, or 2 nibbles, or 8 bits - it's all the same amount. So, with two registers we have 2 bytes, or 4 nibbles, or 16 bits. Assuming we made the highest possible hexadecimal number with 4 nibbles, it would look like this:

FFFF

Punch that into your computer's calculator and convert it to decimal, and, surprise surprise: it equals 65535.

3.2 - Addressing
In order to use the computer's memory - e.g. to store numbers, text, etc. - we have to understand how the computer goes about organizing it. Specifically, where things should be put and where they should be gotten from. It does this by something called Segments and Offsets. These are used to communicate between ourselves and the computer. Whenever we want to read or write to memory, we must use numbers pointing to the exact location of the BYTE we want to read. Well, the computer has millions of bytes of memory, so we must have some way of specifying what part of memory they're in. This is called their "Address". For a real life address, we use the street name and the number. Essentially the street name is what "part" of the city you live in. We do the same with computer memory, but both are numbers: 2 numbers, called the Segment and Offset. Usually these two numbers are WORDs (16 bits each). One points to the general area, or part, of memory - the "Segment". And the other, known as the "offset", says how many bytes into that segment. This way, we can use up to 64KB of memory at once (65536 bytes).

For example, say these numbers (hexadecimal) were stored somewhere in memory:

00 AB D2 AC 98 4E 67

and so on. Now, say we wanted to read those numbers. Well, that data above may be located at:

FE00:0000

In that case, FE00 would be the "segment". And 0000 would be how far in the data starts. So the address FE00:0000 would "point" to the hexadecimal number 00. FE00:0001 would point to the hexadecimal number AB, FE00:0002 points to D2, and so on. Bear this in mind as we cover just one more section before making use of what we now know about memory.

3.3 - The Register DS
The registers we've covered: AX, BX, CX, DX, and their smaller parts, are all called General Purpose Registers. There is another kind of register, called Segment Registers. In this case we're discussing DS - not to be confused with DX - which stands for "Data Segment". Segment registers are used, not surprisingly, to point to segments of memory. They aren't usually used for holding data like the general purpose registers are.

Going back to the previous sections, say we wanted to print some text to the screen. We learned one method, but that would require that we print each individual character one after another! There's a better way. There is another function of Int 21 that will print an entire string (a string is a bunch of text characters one after another). For the sake of simplicity, say that the text we want is stored at FE00:0000. This program will allow us to print it out to the screen:

.MODEL SMALL
.STACK 200H
.CODE
START:

Mov ax, 0fe00h
Mov ds, ax
Mov dx, 0
Mov ah, 09
Int 21h

mov ah, 4ch
mov al, 00h
int 21h

END START

So, what does this program do? Well, first, it puts fe00, the segment of the text, into AX. We use the MOV instruction to do this, which in this case 'moves' (or rather copies) fe00 into the ax register. Next, we move it into ds. Why the detour? Well, that's one quirk of the segment registers - you're not allowed to change them directly. For example, you can't just put a number right into DS. You can, however, put another register into them. So, since we wanted ds to have the segment, we put the segment number first into ax, and then moved it into ds. Then we put 0 into dx. For this interrupt, it requires that we have the segment of the text in DS and the offset in DX. Since the offset is 0, we put 0 into dx. Since int 21 has a lot of different functions it can do, we must specify which one we want. The one to print text is #9. So, we put 9 into ah. Finally, we use int 21 again, but this time to end the program.

In theory, this is just great. But, memory doesn't work like that. We generally don't just put whatever we want, where we want. At least not at this stage. For example, when you run a program like this one (if you were to compile and run it, which I don't recommend), the computer picks out a free space in memory to load the program itself. You don't specify this. So, what have we accomplished then with segments and offsets if we can't use them? We can, as you will see.

3.4 - Variables
DS was important to introduce in the previous section, because when you write a program, you can have things called "variables". And whatever you put in these variables is usually put in the "Data Segment", which is what DS points to. So, what are variables and how do we use them? A variable is where you can store data: strings (text), numbers, etc. They're called variables because, well, they can vary. Not only can they contain a number or something like that, but you can change them as much as you need during your program. This makes programs much more versatile and useful. When the compiler/assembler is done changing your program into something the computer can actually read, it doesn't actually use variables; it works by specifying a segment and offset. But variables make life a lot easier for programmers. So, let's rewrite that last program so that it does actually work:

.MODEL SMALL
.STACK 200H
.DATA                          ; This is a new part! Make sure to include it
Textstring db "I'm a string$"
.CODE
START:

Mov ax, SEG Textstring
Mov ds, ax
Mov dx, OFFSET Textstring
Mov ah, 09
Int 21h

mov ah, 4ch
mov al, 00h
int 21h

END START

Wow. A lot of things to explain here. Let's start from the top downward. You'll notice there's a new part that should be included in the beginning. The part called .DATA declares what variables we have. .CODE says that everything after it is part of the code. As always, the period in front of DATA is very important. Also, make sure that .DATA comes before .CODE.

What's with that line after .DATA?

Textstring db "I'm a string$"

Well, we must specify the name we want to call the variable first. In this case, Textstring is the name of the variable. Next, db stands for "Define Byte" (you can think of it as "Declare Byte(s)"). It can be used either if we want our variable to be one byte long, or multiple bytes. In this case, it's multiple bytes, because each character of text takes up one byte. Finally, we tell the compiler what we want to be in the variable. This can be changed by your program, but we just tell what we want it to start at.

One more unexplained part: SEG and OFFSET. One very convenient feature of the assembler is that we don't have to figure out the segment and offset that our variable is at. Which is good, because, as we said, the computer decides quite randomly where things get loaded, which would make it tough to find where our variables are in memory. So, by saying SEG Textstring, we move the segment of that variable into ax instead of what's actually in the variable. Again, we put the segment into ax first, since we can't move it straight into ds. The same goes for OFFSET Textstring. It puts the offset of the variable textstring into the register, instead of the actual variable.

One more little detail of int 21, function 9 is that the text you're printing must have a dollar sign at the end. It doesn't actually print a dollar sign on the screen; it just indicates where the text ends.
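To picture what the dollar sign does, here is a rough sketch in Python (an assumption; this is a simulation, not DOS itself). The function name dos_print_string and the idea of passing memory as a bytes object are made up for this illustration. It starts at a given offset and collects characters until it meets the "$", which is never shown.

```python
def dos_print_string(memory, offset):
    """Mimic the behaviour described for int 21h, function 9.

    memory stands in for the data segment (DS) and offset for DX.
    Returns the text that would appear on screen, stopping before '$'.
    """
    end = memory.index(ord("$"), offset)   # find the terminator at or after the offset
    return memory[offset:end].decode("ascii")


data_segment = b"I'm a string$"
print(dos_print_string(data_segment, 0))   # I'm a string
```

Notice that the length of the text is never passed in. The terminator is the only thing that tells the routine where to stop, which is why forgetting it would make the printing run on into whatever happens to follow in memory.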

Go ahead and compile and run this program. Unlike the last one, it should work.

3.5 - Special Segments
We know that all your variables, and all the code that makes up your final program, reside in memory. Variables are usually in the Data Segment and code, surprisingly enough, in the Code Segment [btw - just like the register DS points to the Data segment, CS points to the code segment. But, let's not worry about that right now]. Well, there are other various specific segments of memory, though they don't have registers associated with them. One of these segments of memory holds all the data for what's stored on the screen. This segment has an Absolute Address. This means that, unlike variables that may change around their address every single time you run a program, an absolute address is always in the same place. The only catch in this case is that with the screen mode, the location may change around. Actually, it's a different absolute address for different screen modes, depending on whether you're using only text, or using graphics. For simplicity right now, we'll talk about text.

In all our previous programs, we didn't specify what screen mode we wanted to use. So by default, it uses a text screen mode. This means that the screen is set up so you can only put text onto it. So how do we specify the screen mode? There's an interrupt that'll change the screen mode for us. The change screen mode function is function 0 of interrupt 10. Therefore, to use it, we must put 0 in ah. And this function requires that you put the screen mode you want in al. Here, we'll call the screen mode function, and change to screen mode 3. Take a look at this program:

.MODEL SMALL
.STACK 200H
.CODE
START:

Mov ax, 0003h
int 10h

mov ax, 4c00h
int 21h

END START

You may notice this is a bit different than how the previous programs used interrupts. Here, I just moved a value straight into ax. Since ax consists of ah and al, we can put a value only in ax. Now ah should contain 00, and al should contain 03. Screen mode 3, however, is the default screen mode, so this doesn't accomplish much.

Now that we're absolutely sure we're using the screen mode we want, we can write stuff to the segment where what's on screen is stored. For the screen mode we're using, the segment is B800 [That's hexadecimal of course]. As an example, say we had run that previous program that prints "I'm a string" on the screen. The letter "I" would be stored at B800:0000, or offset 0 in segment B800. Actually, it would be a number code for the letter I. Every letter on the keyboard, along with numerous other things, has a numeric code assigned to it. We saw this in an earlier example that put a smiley face on the screen - its code was the number 1. It would be difficult to remember everything in this code - all 256 of them - so it's easiest to just look at an ASCII chart. Anyway, the code for the letter I is 73. Not to be confused with a lowercase i, which has code number 105.

Well, the code for "I" would be at B800:0000 - 73 [49 hexadecimal]
The code for apostrophe would be at B800:0002 - 39 [27h]

But why isn't the code for apostrophe at B800:0001? It is only one byte long after all. And therefore, being the second character on the screen, it should be the second byte. The answer is that text can have different colors. After each byte containing a numeric code for a character, there's a byte with the numeric code of what color that character should be. Since by default the print string function of int 21 prints with the color grey, B800:0001 should have the number 7 stored at it - 7 is the numeric color code for grey. Now let's put all this information to use in this next program:

.MODEL SMALL
.STACK 200H
.CODE
START:

mov ax, 0003h
int 10h

mov bx, 0b800h
mov es, bx
mov bx, 0
mov ah, 1
mov es:[bx], ah

mov ax, 0100h
int 21h

mov ax, 4c00h
int 21h

END START

The first two lines are to change to the text screen mode. We covered this further up on this page.

We want to put text on the screen, so we want to put things into the segment B800. We're gonna need to get a Segment Register to point to the segment we want. We're going to use ES, which is the 'extra segment'. I'm not sure exactly, but I don't think it's used for anything specific, unlike DS and CS which point to your data and your code - I think it's just an extra segment register used to point to whatever segment you want. But since you can't move numbers directly into a segment register, we must put it into a register first. So, we put it into bx with:

mov bx, 0b800h

You have to put that first 0 on there: a hexadecimal number that starts with a letter needs a leading 0 so the assembler doesn't mistake it for a name. Rest assured, that does actually mean B800 - it sort of drops that first 0, but it's required. After the segment's in bx, we put it into ES.

Now it gets a little bit tricky:

mov es:[bx], ah

What exactly does this mean? Well, this is another way that we can move things around in memory. This is a "Mov" instruction. Firstly, we know that a segment and its offset are separated by a colon, so it must have something to do with a segment and offset. ES, since it's on the left side of the colon, is the segment. Then so far we've deduced that we're trying to put a number at the segment represented by whatever is in es. Since [bx] is after the colon, it must be the offset that we're moving to. From here, it gets a little complicated. Instead of writing out a number for the offset, we can actually put it in a register, and tell the computer to look at what's in the register to find what offset we want. This is similar to what we did with ES, where instead of saying B800 we put B800 in a register. But for registers that aren't segment registers, to specify that we want to use them as an offset, we must put them in brackets - []. So if we just left bx without brackets, we'd be saying we actually want to put something in bx.

And the top left corner of the screen, which is essentially the 'beginning' of the screen, is stored at offset 0. In fact, two lines before this was this line:

mov bx, 0

The offset of the upper-left corner of the screen is 0. And if bx is acting as our offset, we should make bx equal 0. And ah contains 1 - this is the numeric code for the smiley face character of text. So all in all, we now know what this command means:

mov es:[bx], ah

It means 'move' what's in ah to the segment and offset that es and bx point to.

These next 2 lines are also new:

mov ax, 0100h
int 21h

This interrupt waits for you to hit a key. That way you can have a chance to see what happens when you run the program.

4.1 - Loops & Line Labels
Line Labels are a very simple idea. When you use a label, you give a name to a specific part of the program. As you'll see later, you can use this name to jump around in your program. It'll also come in useful right now.

What if in the last program you didn't want to print just 1 smiley face on the screen? Say you wanted to print 100. Well, it would be a very long program, because you would constantly have to change bx - you'd have to add 2 to it every time you wanted to print the smiley face in a different place. This isn't necessarily true. By using what's called a "loop" we can print those 100 smiley faces by only adding a couple of lines of code. Let's see what this new program would look like:

.MODEL MEDIUM
.STACK 200H
.CODE
START:

mov ax, 0003h
int 10h

mov bx, 0b800h
mov es, bx
mov bx, 0
mov ah, 1
mov cx, 100

startloop:
mov es:[bx], ah
add bx, 2
loop startloop

mov ax, 0100h
int 21h

mov ax, 4c00h
int 21h

END START

This new program makes use of the 2 new things we're learning here. Firstly, startloop: means that that line in the program is called startloop. Like a variable, you can call a label almost anything you want. Make sure that you're aware of the colon after it - this is what clarifies for the compiler that it's a label. This will be the start of the loop (BTW: a label doesn't always have to start a loop, it can just be a label for a part in your program. But in this case, it does start the loop).

What a loop basically does is run a set of instructions over and over again. To make a loop we start with a label. Then we put the instructions to be repeated on the lines after the label. When we've typed all the lines that should go in the loop, we need one more line to close the loop. This is the command LOOP. Notice that here it says Loop startloop. We must tell where we want to loop back to. Since startloop is the beginning of the loop in this case, we should use Loop startloop.

We probably don't want the loop going on forever, so there must be some way to specify how long the loop lasts - there is. CX is used as the loop counter. Before the start of the loop, you must put a number in cx. Then, every time your program runs across the command LOOP, it subtracts one from cx before looping. If cx is 0, then the loop ends. And this loop is going to be done 100 times. That's really all there is to the loop command.

Now, just one more thing we added to this program. Inside the loop is this:

add bx, 2

This pretty much explains itself - it adds the number 2 to bx. Recall that bx will point to the offset 0 at the beginning of the program. We don't want it to put the smiley face character at offset 0 100 times. By adding 2, bx points to the offset of the next character on the screen.

4.2 - Doing something useful: Graphics
Up until this point, we've used only the text mode for output. This is all well and fine for learning purposes but not particularly useful. So now it's time that we used one of the screen modes suited towards graphics. This is mode 13h. It has a resolution of 320x200x256. That means 320 pixels wide, 200 pixels tall, and 256 colors on screen at once. Though not great, it can do some pretty nice graphics. It's a start anyway. DOS has some interrupts for dealing with graphics, but there's no point in using them because drawing pixels to the screen is very easy. It's very much like the last section.
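A quick arithmetic check of what 320x200x256 implies, sketched in Python (an assumption; it is only doing the sums). With 256 colors, every color number from 0 to 255 fits in a single byte, so one byte per pixel is enough, and the whole screen fits inside the 64KB we can reach through one segment.

```python
WIDTH = 320          # pixels across
HEIGHT = 200         # pixels down
COLORS = 256         # distinct colors on screen at once
SEGMENT_SIZE = 65536 # bytes reachable through one segment (64KB)

highest_color = COLORS - 1            # 255, the largest value one byte can hold
bytes_per_pixel = 1                   # so a single byte can name any color
screen_bytes = WIDTH * HEIGHT * bytes_per_pixel

print(highest_color)                  # 255
print(screen_bytes)                   # 64000
print(screen_bytes <= SEGMENT_SIZE)   # True: the whole screen fits in one segment
```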

/ for division.. 0013h int 10h mov bx.CODE START: mov ax. 1 mov cx. Likewise.because. Also. Also in the previous section. the segment for mode 13 starts at offset 0 of segment A000. the loop itself is much the same.. do the math: 320 * 200 = 64000 [note that * is used as a symbol for multiplication.In the previous section. 64000 startloop: mov es:[bx]. 4c00h int 21h END START Surprised? It's almost the exact same as before but with minor changes for mode 13. bx because the screen starts at segment A000. we potentially had to write 2 bytes per character. the screen started at offset 0 of segment B800. Move a byte to A000. 64000 That's because this program is intended to fill the whole screen with dots. loop again. FYI. all data is only a color . add to bx to go to the next offset. So. when typing * usually denotes multiplication. Then. each byte of data written to the screen will draw only one pixel [a dot].]. the loop counter has been changed to mov cx. and ^ for exponents: 2^3=8.STACK 200H . In this mode. Since there's 320x200 pixels. Notice that inc bx . 0100h int 21h mov ax. 0 mov ah. a dot always looks like a dot. 0A000h mov es. there's nothing else to store but the color of the dot. 0A000h mov es. Let's see an example program for drawing some pixels: MODEL MEDIUM . ah inc bx loop startloop mov ax. and so on. Now we use mov bx. bx mov bx.

0A000h mov es. Inc bx then.3 . if it helps. 4. To use it. bx xor di. Simply add this after inc bx: inc ah So.hence. first time through the loop we draw a blue pixel. i know that that blue screen is ultra exciting. cleaner way at filling the screen with pixels.that it loops back around to 0. but i think inc is faster. the next with color 3. adds only one to used in place of add bx. This is stored in ES:DI. 64000 Startloop: stosb . Then we move to the next pixel and draw one with color 2. Let's try drawing the whole screen. add bx.storing bytes at a location in memory. and so on. Then it's just a matter of putting a value in al and calling STOSB. we must first set where to store the bytes. I'm fairly sure that when you use an inc for a register that's already at it's max value .A faster way Now we'll look at a quicker. START: mov ax. using inc when ah = 255 . but why when you can just use add!? Well. This is how it turned out for me. anyway. al mov cx. and it's just good programming technique to do the more logical thing. this would work: inc bx inc bx in place of the add bx. You should see a colorful pattern on your screen when you run this. di xor al. 2 because we want to put a byte in every single offset. 1 would have been valid here too. 2 in the previous example. Stosb is used for exactly what we did in the last program . We do this with the command STOSB. INC can be thought of as INCrememnt or even INCrease. but with all 256 colors at once to make it more interesting. since every single offset corresponds to a pixel. Likewise. 0013h int 10h mov bx.

Well, same exact thing, but faster (I believe). It's not too noticeably faster, but when used over and over as part of a program it would pay off.

4.4 - Graphics from file
Before we go on to getting graphics from a file, note that this tutorial, as you can maybe tell, is leading towards the parts of ASM programming that will help you design games, which is mostly what I use it for, so it's the easiest for me to write about. Next, we'll get into drawing "sprites". A sprite is basically just a little image, like a person, ship, enemy, etc depending on what kind of game it's in. Our sprites will be stored in files, so we'll need to load them in. This means learning how to open and read files. Files aren't too hard to use. I have a sprite made for our program to load onto the screen. Download it before moving on.

First we must open a file. Of course, we'll need the filename to do that. For simplicity, this should be in the DATA part of the program:

Filename db "spryte1.grh",0

Recall that db can be thought of as declare byte(s). Each character is a byte, and so is the number zero at the end. 0 is used as a terminator for the string. Much like printing text needs a $ at the end of the string, a filename must end in a 0 or 'null' character. It's therefore referred to as 'null terminated'. Notice that that line is NOT

Filename db "spryte1.grh0"

When the 0 is inside the quotes it becomes text. Now it's part of the filename instead of a terminator of the string. It's no longer a terminator because instead of being 00h in hex, it's 30h (the ASCII code for the character "0"). Our program so far should look like this:

.MODEL MEDIUM
.STACK 200H
.DATA
Filename db "spryte1.grh",0
.CODE
START:
mov ax, @data
mov ds, ax
mov ax, 0013h
int 10h

From there, opening the file is just a matter of passing a few things to an interrupt.
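The difference between a terminating 0 byte and the character "0" can be checked with a small C sketch. C string literals are also null terminated, which makes the comparison easy. The function name name_length is made up for this illustration, and the character codes assume ASCII.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a null-terminated string: count bytes until the 0 terminator. */
size_t name_length(const char *name)
{
    size_t n = 0;

    assert(name != NULL);
    while (name[n] != 0)   /* stops at the byte 00h, not at the character '0' */
        ++n;
    return n;
}
```

With this, "spryte1.grh" measures 11 characters, while "spryte1.grh0" measures 12, because the "0" inside the quotes is an ordinary character with code 30h rather than a terminator.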

Next, add these two lines to the data part:

Filehandle dw ?
Filebuffer db 256 dup (?)

When we open a file, it'll give us a number called a 'handle'. We can 'open' many files at once, meaning we have access to them and no other programs do. This way, whenever we want to read, write, etc with the file, we just use the number associated with the open file instead of giving it the entire filename again. We don't get to pick the handle; it decides for us. We technically could have many different handles, one for each file. For now though, we only need this one.

Filebuffer is where the contents are stored. We have this variable in order to actually load the file, rather than reading the file every time we need data from it, making it quicker and easier to look at its contents. You may wonder though what 'dup' is. Dup can be thought of as DUPlicate, an idea we're familiar with already. Since we want a 256 byte long chunk of memory, we would normally have to write:

Filebuffer db 0,0,0,0,0...

and so on, 256 times. dup says duplicate the byte (in this case we don't specify exactly what value, we just put a ? to say that it doesn't matter, and the assembler I think will reserve the space leaving whatever used to be there) 256 times. Notice that the 256 comes before DUP, and the value to DUP after it in parentheses ().

Well, we have a place to put the handle and data, so let's get to it and open the file. The interrupt for opening a file is again int 21h, and it takes a few parameters.

AH = 3Dh specifies that we want to open a file.
AL = the mode to open it in. Read only, write only, or both read and write. We'll make AL=0, meaning read only. We can't change it while it's open for read only, but that's okay because we only want to load it.
DS:DX = Seg and offset to string holding the filename.

So, just a few simple lines to open it:

mov ax, 3d00h
mov dx, OFFSET filename
int 21h
mov filehandle, ax

And it 'returns' the file handle to ax, meaning that it puts it in a register after it's done. Since we'll obviously need to use ax many more times throughout our program, the handle can't stay there. So, we just move ax, which now contains the handle, into our variable filehandle.

Next, it's just another interrupt to read from our newly opened file.

AH = 3Fh specifies we want to read from the file.
BX = Handle. We must put the handle here to tell which file to read from.
CX = how many bytes to read, in this case 256 (16 pixels across, 16 down, 1Bpp - byte per pixel).

DS:DX = seg + offset of place to load to. DS already points to the right segment, no need to change it.

So, just another little segment of code:

mov ax, 3f00h
mov bx, filehandle
mov cx, 256
mov dx, OFFSET filebuffer
int 21h

Now that it's loaded, we'll move from the 'buffer' to the screen. So, we'll introduce you to something very similar to what we just covered. It's a command called MOVSB. It's like STOSB, but it MOVeS Bytes around in memory. To use this, we give an address to move to, and one to move from. DS:SI points to the source, or where to move from. ES:DI will point to the destination, or where to move to. So, we'll point DS:SI to Filebuffer, and ES:DI to the screen:

mov si, dx
mov ax, 0a000h
mov es, ax
xor di, di

The first line is just a little shortcut, because DX still contains the offset of Filebuffer from before. xor di, di points di to the first pixel of the screen (segment 0a000h) by making it 0. This is also a shortcut to mov di, 0.

Now, this bit of code is a bit tricky:

mov cx, 16
startloop:
push cx
mov cx, 16
rep movsb
add di, 304
pop cx
loop startloop

I know I haven't introduced the stack, PUSH, and POP. I'll go into detail on this later, but just bear with it because they're necessary for this. Sometimes it's better to omit a few smaller details until later in order to move to the bigger stuff quicker. For now, just think of PUSH as saving a register temporarily, and POP as getting that value back. Things that are PUSHed go onto the 'stack', and they're 'POPped' off of there as well.

Anyway, our loop draws one line each time through, so we start by setting the loop counter in cx to 16. Then inside the loop we need to use cx again as a different counter, but it's already the loop counter! Well, we use PUSH CX to save it on the stack. So, we can get back what value it had before the end of the loop so the loop will work. We want to move 16 bytes (one line) from the buffer to the screen. We set CX again to 16, and use REP MOVSB. What is rep? It means, "repeat the next instruction however many times CX says to". Now this is a little tricky as well: each time we do MOVSB, it moves a byte, and automatically increases DI and SI by one. So, after 16 times, DI has changed by 16, and we want to point it to the next row before we draw it. There's 320 pixels in a row, so let's do the math: 320 - 16 = 304.
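The row arithmetic behind add di, 304 can be sketched in C. This is an illustration only, not the tutorial's code: the function name draw_sprite and the array-based "screen" are assumptions, with si and di playing the roles of the SI and DI registers.

```c
#include <assert.h>
#include <stddef.h>

#define SCREEN_W 320   /* bytes per screen row */
#define SPRITE_W 16    /* sprite width in pixels */
#define SPRITE_H 16    /* sprite height in pixels */

/* Copy a 16x16 sprite from a 256-byte buffer to the top-left of the screen. */
void draw_sprite(unsigned char *screen, const unsigned char *buffer)
{
    size_t si = 0;   /* index into the buffer, like SI */
    size_t di = 0;   /* index into the screen, like DI */
    int row, col;

    assert(screen != NULL && buffer != NULL);
    for (row = 0; row < SPRITE_H; ++row) {       /* outer LOOP, counter 16 */
        for (col = 0; col < SPRITE_W; ++col)     /* REP MOVSB, counter 16 */
            screen[di++] = buffer[si++];
        di += SCREEN_W - SPRITE_W;               /* add di, 304 */
    }
}
```

The two nested counters are the reason the assembly version has to PUSH and POP CX: C gets two separate variables for free, while the assembly loop has to share one register.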

it's pushed as 16. blue. then i saved it in the file we used which has no junk in it like a "bitmap" does. blue. However. and loop. has many different ports you must interface with from time to time. which there's a good chance of since there's only 256 in it. CX is decreased by one.16 = 304. we point it to the first pixel of the next row down. use IN) But. So. why must we even use the palette at all? Well. 0100h int 21h mov ax. not what the pixel actually looks like. By adding 304 to di. it's quite easy. green and blue. And that's what we'll discuss next. which has control over the palette. When you write a byte to the screen. So. Things which emit their own light such as the electron gun in a TV or monitor have base colors red. you can think of defining the look of a color as mixing colors. That's just how it happens. and yellow.959 actually).The palette Mode 13h is called a 'paletted' graphics mode. making it have whatever value it had last time it was needs to be used as two separate loop counters. you should see a little Mario sprite in the top left corner of the screen. since MS Paint in .5 . be sure to remember that when loop. blue. there's something wrong with him. We need this because your video card. Well. So. The final command POPs CX back. you just mix your own values of red. the first time through CX would be 0 at the end of the loop because REP brought it down to 0. then we pop it. That is. REP also decreases CX by one every time it REPeats the instruction. This command puts a byte out to a 'port' in your computer. If we didn't PUSH it. It's decreased before we loop. finally. how do you tell what a pixel/color looks like. His colors aren't right. You can't just write bytes to them since they're ports. much like you probably did early in school with paint or something. you have to use OUT (and to get stuff from a port. You'll notice though that it's POPped. still 16. 
you can use 256 colors at once but there's something like 4 million possible colors (4. But you must be familiar with yet another command: OUT. Well. So. we have just these lines to wait for a key allowing us to see the sprite. you're only telling which number of the palette that pixel is. What the color looks like is something saved in the palette. that's another topic VERY VERY important to graphics called the 'Palette'. and green and make the exact color you want. 4c00h int 21h END START And that's it! When you run it. Just the pure bytes that make the image. and so on and so on. the colors in that image didn't look the same as the original bitmap. So. sometimes the default palette doesn't have the exact shade of color you want.144. and green. The sprite in our last program was saved as an image file through windows. you mix 3 basic colors: Red. then we loop and immediately PUSH it again! What good does this accomplish? Well. The problem is. and then to exit: finish: mov ax. At the start of the loop then. 4. while other natural [reflected?] light has red. that's why we must store and retrieve CX . and pushed as 15.

That's because MS Paint in Windows uses one palette, and our DOS based program uses another. I did happen to save the palette and put it in a file, so we can now load the palette in our program beforehand, so that it's loaded correctly, making all the colors look right. This way, we can make the colors in our program look the same as the original image.

Before diving into the big main program, let's go over some simple palette stuff. For simplicity's sake, let's just change the look of one color. We'll change color 0 (by default, completely black) to white. This is just a couple of out commands:

.MODEL MEDIUM
.STACK 200H
.CODE
START:
mov ax, 0013h
int 10h
xor al, al
mov dx, 3c8h
out dx, al
inc dx
mov al, 63
out dx, al
out dx, al
out dx, al
mov ax, 0100h
int 21h
mov ax, 4c00h
int 21h
end start

To use OUT we give it a port and a value. The port goes in DX and the value in al; other registers won't work for this. The port doesn't have to be in a register though: a port number below 256 can be written directly in the instruction. To write to the palette, first we send a byte to port 3c8h telling it which color we want to change. We did xor al, al because we want to change black, color 0. Then, it takes the next 3 values at port 3c9h. So we just INCed dx. The 3 values you give it are the amounts of red, green, and blue your new color will have. Each one has a value from 0 to 63. Giving it 3 0's would result in black, and at the opposite extreme, 3 63's results in pure white. So, we just do mov al, 63 and put it out to port 3c9h 3 times. The result should be the black screen changing to white. Don't be confused though - every byte in the screen's memory was 0 before, and it still is. We just changed what color 0 looks like.
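The 0 to 63 range can be explored with a small C sketch. It is only an illustration: the struct dac_color type and the function names are made up, and the idea of starting from 8-bit (0 to 255) components, as an ordinary image file might store them, is an assumption rather than something this tutorial does.

```c
#include <assert.h>

/* One palette entry as it would be sent to port 3C9h: three values, each 0-63. */
struct dac_color {
    unsigned char r, g, b;
};

/* Scale 8-bit components (0-255) down to the 0-63 range by dropping 2 bits. */
struct dac_color make_dac_color(unsigned r8, unsigned g8, unsigned b8)
{
    struct dac_color c;

    assert(r8 <= 255 && g8 <= 255 && b8 <= 255);
    c.r = (unsigned char)(r8 >> 2);
    c.g = (unsigned char)(g8 >> 2);
    c.b = (unsigned char)(b8 >> 2);
    return c;
}

/* Number of distinct colors when red, green and blue each take 64 values. */
unsigned long palette_color_count(void)
{
    return 64UL * 64UL * 64UL;
}
```

Full-intensity white (255, 255, 255) scales to the three 63's used in the program above, and three components of 64 levels each give 262,144 possible colors.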

Why do people learn Assembly Language?

Speed
It is possible to hand-optimize your code by using assembly language instead of relying on the generic optimization techniques of a High Level Language compiler. A well crafted assembly language routine can usually beat the one generated by a compiler (even the better optimizing compilers such as Intel C++ or Visual C++ .NET). Such optimization usually requires writing code that is specifically geared towards the target architecture. Optimizing for speed is a skill that is difficult to master. A good place to start learning how to optimize code is Agner Fog's website [1].

Size
As a result of generic optimization methods, High Level Languages tend to bloat the executable file with useless code. When coding in Assembly, you have total control over what ends up in the actual executable file, and thus the capability to optimize and eliminate all unnecessary instructions.

Necessity
Certain code cannot be conceived when using a High Level Language. Such "Low-Level" code is usually available only through the use of Assembly Language. A big example is during Operating System Development, where you need to design Interrupt Handlers that have no other dependencies.

Understanding
Learning Assembly Language is the stepping stone to learning general Computer Architecture. This learning process is accelerated when you approach subjects such as Operating System Development.

Assemblers
You can take your pick from the List of Assemblers. In general, Intel-based Assembly Language syntax will be used. If you learn the syntax of one Assembler, it is pretty easy to learn another.

Debuggers
DEBUG is a DOS/Windows command-line utility program that you can use to debug MS-DOS (16-bit) programs. We will use DEBUG initially for our 16-bit examples, but for the 32-bit examples, we will use an open-source command-line debugger called GRDB (Get Real Debugger), which is available for download at [2].

Disassemblers
A disassembler is a program designed to create an assembly listing from a compiled Executable. These can be powerful resources in the hands of someone who knows how to use them. They tend to be used by hackers and such for any reason from bypassing software protection, to getting rid of pesky bugs that they've tracked down themselves.

Though they can help if you know enough assembly to wade through everything that a higher level language gets compiled into, they are certainly not a beginner's choice to learn assembly from.

Emulators
- Microsoft Virtual PC
- VMWare PC Emulator

The need
"Eek! Why should I know this stuff?", you might ask. Well, here are some reasons:
- It's extremely simple
- This is what you will need in order to learn and program in assembly language.

Introduction
A program consists of two fundamental things: data and instructions. Loosely speaking, a computer represents these data and instructions in the form of numbers. Several number systems, including binary, octal, decimal, and hexadecimal, are used by different computer systems. Therefore, it is apparent that a programmer should have a good understanding of the underlying number systems being used by the computer system.

Number Systems
A number system is a way of representing a number. A number system does NOT change the value of the number, but only the manner in which it is represented. What we mean to say is that the value of the number remains the same, but the digits we use and how we use them decide the representation of that number. (You will understand what we mean as we progress along the chapter.) For now, just remember, we are only playing with the representation of the number, not its value.

Before we dive into the other number systems, we would like to cover the most common of them all: the decimal number system. But first, some terminology. Every number system has a base (the number of digits available).

Base

The base, also called radix or scale, is the fundamental building block of a number system. The base of a number system represents the number of digits it makes available for use. The decimal number system, for example, has 10 digits and is called a base-10 number system. For a number system with base b, the digits 0, 1, 2, ..., b - 1 are used. Base-0 does not exist and you cannot do much with base-1. A discussion about them is irrelevant to us, so we will avoid them. A table of common bases with the digits they provide follows:

Base  Name         Digits                                          Last Digit  Letter Suffix
2     Binary       0, 1                                            1           b
8     Octal        0, 1, 2, 3, 4, 5, 6, 7                          7           o
10    Decimal      0, 1, 2, 3, 4, 5, 6, 7, 8, 9                    9           (none)
16    Hexadecimal  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F  F           h

Base representation
When you are using several number systems together, it is easy to confuse one number system for another. This is precisely the reason why a subscript suffix is added to a number representation. This makes the number system being used for that particular sequence of digits clear. There are two common ways of writing the suffix:
- A decimal number representing the base (10 for the decimal number system), or
- a letter of the alphabet representing that base (for example, d for the decimal number system). The letter is usually the initial character of the name of the base in use.
If a base is not specified, it is usually safe to assume that the number uses the base-10 or decimal number system. Please refer to the above table for information on Bases and Letter Suffixes.

A number with base b and the sequence of digits $a_n a_{n-1} \ldots a_0 . a_{-1} \ldots$ is represented as:

$(a_n a_{n-1} \ldots a_0 . a_{-1} \ldots)_b$

Example (the decimal number 44934 in different number systems):

$1010111110000110_2$  -- binary
$127606_8$            -- octal
$44934_{10}$          -- decimal
af86h                 -- hexadecimal

If you look carefully, you will notice the number representation "shrinking" in width as we use a higher number system.
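To reproduce the 44934 example, here is a minimal sketch in C. It uses repeated division by the base, which is one common technique chosen for this illustration rather than anything prescribed above; the function name to_base and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Write value in the given base (2 to 16) into out; returns the digit count. */
size_t to_base(unsigned long value, unsigned base, char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[sizeof(unsigned long) * 8];   /* enough room even for base 2 */
    size_t n = 0, i;

    assert(base >= 2 && base <= 16);
    do {
        tmp[n++] = digits[value % base];   /* remainder is the next digit, right to left */
        value /= base;                     /* quotient carries on to the next place */
    } while (value != 0);

    assert(n + 1 <= out_size);
    for (i = 0; i < n; ++i)                /* digits were produced in reverse order */
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

Converting 44934 this way takes 16 digits in base 2, 6 in base 8, 5 in base 10 and only 4 in base 16, which is the "shrinking" effect described above.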

Hexadecimal numbers are commonly used in code for precisely this property and because it makes representing numbers simpler (as we shall see later). The more digits we have, the narrower the number representation becomes, as we have plenty of digits to represent the number. The fewer digits we have, the wider the number representation becomes, as we have to represent the number using fewer digits.

The Decimal Number System
The decimal number system is the most commonly used number system. You use it every day and you have been taught to work with this number system since your childhood. Also called the base-10 number system, the decimal number system offers 10 digits (0 through 9) that one can use to represent numbers.

A number consists of a sequence of one or more digits. Each digit occupies a place in the number, with each digit having a weight and a place value. The decimal number 65535, for example, has 5 digits and 5 corresponding weights, each associated with one digit. Starting from the right in 65535, the digit 5 has the least weight (i.e. 10^0 = 1), that of the digit 3 is 10^1 = 10, that of the next digit 5 is 10^2 = 100, and so on toward the left. The place value is the weight of the digit times the digit. For the digit 3, for example, the place value is 3 * 10^1 = 30.

For any number with base b and the sequence of digits $(a_n a_{n-1} \ldots a_0)$, the weight $w_n$ of each digit is given by $w_n = b^n$, where n is the place of the digit in the number. The place value $p_n$ of each digit $a_n$ is given by $p_n = a_n w_n = a_n b^n$.

weight of a digit = radix raised to the power of the position of the digit
place value = value of digit * weight of the digit

The Decimal Odometer
If you peek into the dashboard of a vehicle, you will notice a distance measurement device showing the number of miles (or kilometers, in metric units) traveled by that vehicle. That measurement indicator is called an odometer. The next time you go out for a drive, notice how the digits change with each mile you cover. Every time the vehicle covers a new mile, that number is incremented by one. Consider, for example, a vehicle that has traveled 49,748 miles. Try visualizing it using this illustration as a guide.

4 9 7 4 8 -- starting
4 9 7 4 9 -- one more mile covered, the rightmost digit is incremented by one
4 9 7 4 0 -- one more mile covered, the gears move and the digit 0 is brought into position
4 9 7 5 0 -- then for the same mile, the gears controlling the second to rightmost digit move and the digit is incremented by 1 from 4 to 5.

If you want to see it for yourself on your computer, try building and running this C program. You will need GCC installed on your system to try this example (for both UNIX and Windows). (If you have the MinGW Compiler system, you don't really need the WIN32-specific part, but we have included it just in case you use a different compiler. Don't forget to define the appropriate preprocessor macro properly when compiling.)

/* odometer.c */
#include <stdio.h>

/* choose platform */
#if defined(__WIN32__)
#include <windows.h>
#define sleep(_x) Sleep ((_x)*1000)
#elif defined(__UNIX__)
#include <unistd.h>
#endif

int main (void)
{
    register int i = 0;

    for (i = 9985; i <= 10000; ++i)
    {
        /* display a number */
        printf ("%05d\r", i);
        fflush (stdout);
        /* sleep for 1 second */
        sleep (1);
    }
    return 0;
}

Steps to making and running the above program:
1. Put this code in a file named odometer.c
2. To build the executable and run it, do this

For a UNIX system:
% gcc -g -pedantic -Wall -std=c89 -D__UNIX__ -o odometer odometer.c
% ./odometer

For a Windows system:
> gcc -g -pedantic -Wall -std=c89 -D__WIN32__ -o odometer odometer.c
> odometer

Now, see how the digits change. To stop the running program press Ctrl+C (on both Windows and UNIX).

You can also use the following Makefile to build this program.

# -------------------------------------------------------------------------
# GNU Makefile.
# You need the following software to use this Makefile:
#   GNU Compiler Collection and GNU Make
#     UNIX    - Your UNIX distribution should come with this.
#     Windows - MinGW or Cygwin
#   rm
#     Windows - UnxUtils
# -------------------------------------------------------------------------
name=odometer
platform=-D__WIN32__
CC=gcc
CFLAGS=-g -pedantic -Wall -std=c89
RM=rm

.PHONY: all clean

all: $(name)

$(name): $(name).c
	@ echo "\n>>> Building program\n"
	$(CC) $(CFLAGS) $(platform) -o $(name) $(name).c

clean:
	@ echo "\n>>> Cleaning build\n"
	- $(RM) -f $(name) $(name).exe *.o

Note: The tabs in the makefile are important!

Put the above text in a file called Makefile in the same directory as the odometer.c file and at the command prompt type the commands for your platform:

on UNIX:
% cd source_directory_that_contains_the_code_and_the_makefile
% make platform=-D__UNIX__
% ./odometer

on Windows:
> cd source_directory_that_contains_the_code_and_the_makefile
> mingw32-make
> odometer

Binary and hexadecimal
Binary and hexadecimal are two different yet similar number systems which are extremely important to any programmer. Binary is a base 2 number system, while hexadecimal is a base 16 number system.

The reason binary is important to a programmer: in the electronic world, the only practical means to store data is by way of on and off (there is no practical way to tell exactly how high or how low the voltage is). Data is stored as on or off (1 or 0), so learning binary is learning machine code. In that sense binary evolved. (As some people say, "Real men code in binary".) Oh yes, there are 10 types of people in the world: those who can read binary and those who cannot.

The reason hexadecimal is important to a programmer: it makes no sense to type in 1s and 0s, and it is more practical to adopt a base 16 number system than to use a base 10 number system, thus hexadecimal evolved. (Some programmers say "Real men code in hex".) By the way, hexadecimal used to be called sexadecimal, but for some reason the people at IBM decided to call it hexadecimal instead.

Binary -> Decimal
Think of reading binary as reading a normal number, except that the digits mean something else. The last digit is multiplied by 2^0, the second last digit by 2^1, and so forth.

Example:
0000b = 0
0001b = 1
0010b = 2
0011b = 3
0100b = 4

0101b = 5
0110b = 6
0111b = 7
1000b = 8
1001b = 9
1010b = 10

11011010b = 1*2^7 + 1*2^6 + 1*2^4 + 1*2^3 + 1*2^1
          = 128 + 64 + 16 + 8 + 2
          = 218

01010111b = 1*2^6 + 1*2^4 + 1*2^2 + 1*2^1 + 1*2^0
          = 64 + 16 + 4 + 2 + 1
          = 87

(*Note: the b which ends every binary number is used to inform people that the number is in binary; it is just a notation.)

Hexadecimal -> Decimal
By this point you should be able to understand binary, and since you understand how base 2 works, you should be able to understand how hexadecimal works. Well, the conversion of hexadecimal to decimal is almost the same as the conversion of binary to decimal, applying the same concept:

Example:
01h = 1
02h = 2
03h = 3
04h = 4
05h = 5
06h = 6
07h = 7
08h = 8
09h = 9
0Ah = 10
0Bh = 11
0Ch = 12
0Dh = 13
0Eh = 14
0Fh = 15
10h = 16

F4h = 15*16^1 + 4*16^0
    = 240 + 4
    = 244

F34Ah = 15*16^3 + 3*16^2 + 4*16^1 + A*16^0
      = 15*4096 + 3*256 + 4*16 + 10
      = 61440 + 768 + 64 + 10
      = 62282

(*Note: the h which ends every hexadecimal number is used to inform people that the number is in hexadecimal; it is just a notation. Some HLL programmers prefer hexadecimal to be prefixed with 0x, but I prefer it to end with h.)

Binary -> Hexadecimal
Now comes one of the most important sections of this tutorial. What is the point of knowing binary and hexadecimal exist when you do not know how to convert from one to the other?

Example 1:
0111 1010b
 |    |
 7h   Ah
Therefore 01111010b -> 7Ah

Example 2:
0100 0111b
 |    |
 4h   7h
Therefore 01000111b -> 47h

Data Units
Humans can view data in a number of ways, such as the time of day, a friend's name, or a picture of a tree. A computer, on the other hand, has only three ways of interpreting data: either as a quantity that is processed, machine code that is executed, or having no meaning at all. In addition, the computer relies upon you, the programmer, to tell it how to interpret that data.
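Returning to the conversions worked through above, the same steps can be sketched in C. The function names parse_binary and nibble_to_hex are made up for this illustration; parse_binary computes the value digit by digit (equivalent to summing the powers of 2), and nibble_to_hex maps one group of four bits to its hexadecimal digit.

```c
#include <assert.h>
#include <stddef.h>

/* Value of a string of '0'/'1' characters, e.g. "01111010" gives 122. */
unsigned long parse_binary(const char *bits)
{
    unsigned long value = 0;
    size_t i;

    assert(bits != NULL);
    for (i = 0; bits[i] != '\0'; ++i) {
        assert(bits[i] == '0' || bits[i] == '1');
        value = value * 2 + (unsigned long)(bits[i] - '0');   /* shift left, add new bit */
    }
    return value;
}

/* Hexadecimal digit for one group of four bits (a value from 0 to 15). */
char nibble_to_hex(unsigned nibble)
{
    assert(nibble < 16);
    return "0123456789ABCDEF"[nibble];
}
```

For Example 1, the groups 0111 and 1010 map to 7 and A, and the whole byte 01111010b has the value 7Ah.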

Bit
The smallest "unit" of data in a binary machine is the bit. A bit has only two states, 0 and 1. The binary nature of the bit is inherited from the underlying electronics. The digital circuits that run a computer only have two levels, high and low. The meanings of high and low are tied to the hardware in use, and will not be covered in this book.

Setting, clearing, and toggling bits
To set a bit means changing its value to 1. To clear a bit means changing its value to 0. To toggle a bit means inverting its value (if it is currently 1, then toggling it will change it to 0).

Grouping Bits
One bit has two possible states: 0 and 1. If a single bit is used to represent a number, that number will only have two possible values. Usually, a single bit on its own in the 0 state will represent the number 0, and in the 1 state will represent the number 1. If you want to be able to remember a higher number, you need to use more bits.

If you have two bits, Bit A and Bit B, the total amount of combinations you can form is 4--you have the two states of Bit A while Bit B is 0, plus the two states of Bit A while Bit B is 1. If you add another bit, Bit C, you have 8 possible states--you have the 4 combinations of Bits A and B while Bit C is 0, plus the 4 combinations of Bits A and B while Bit C is 1. If you add a Bit D, there are 16 combinations--the same reasoning applies. Mathematically describing the amount of combinations formable by a group of 4 bits:

2*2*2*2 or 2^1 * 2^1 * 2^1 * 2^1 = 2^4 = 16

A group of n bits will have a total of 2^n possible states.

Usually, the state of a group of bits is used to represent an integer. (An integer is a number that has no fractional portion--that is, if you write out the number on paper, there will only be zeroes after the decimal point.) Each possible state of the group of bits will correspond to one unique integer, without any gaps in the possible values. That is, if you know the state of the bits, you can figure out what integer is represented. Also, given an integer, you can figure out what state the bits must be in to represent that value (if it is possible to represent that value with those bits).

If you wanted to use the state of a group of n bits to remember the value of a nonnegative (that is, zero or positive) integer, then the lowest number you can remember is 0, and the highest number you can remember is 2^n - 1. That's a total of 2^n possible numbers.
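Setting, clearing and toggling, along with the 2^n - 1 limit, can be sketched in C with the bitwise operators. The function names here are hypothetical, and bit positions are counted from 0 at the rightmost bit, which is an assumption of this sketch.

```c
#include <assert.h>

/* Set bit n: force it to 1, leave the others alone. */
unsigned set_bit(unsigned value, unsigned n)
{
    assert(n < 16);
    return value | (1u << n);
}

/* Clear bit n: force it to 0, leave the others alone. */
unsigned clear_bit(unsigned value, unsigned n)
{
    assert(n < 16);
    return value & ~(1u << n);
}

/* Toggle bit n: 1 becomes 0 and 0 becomes 1. */
unsigned toggle_bit(unsigned value, unsigned n)
{
    assert(n < 16);
    return value ^ (1u << n);
}

/* Highest nonnegative integer that fits in n bits: 2^n - 1. */
unsigned long max_for_bits(unsigned n)
{
    assert(n >= 1 && n < 32);
    return (1UL << n) - 1;
}
```

For a group of 4 bits, max_for_bits(4) gives 15, matching the 16 possible states (0 through 15) counted above.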

Use of groups of bits
For ease of communication, special terms are used for the most commonly used bit groupings.

Byte
The byte is the smallest unit of data that a microprocessor can manipulate. The 80x86 byte is a group of 8 bits clubbed together to represent a total of 2^8 = 256 bit states. If a byte is used to represent a nonnegative integer, the smallest number that it can represent is 0 (00h). The highest number it can represent is 2^8 - 1 = 255 (FFh). If you wanted to represent 256 or any higher number, you would need to use a larger-sized unit of data.

Octet
On most existing microprocessors, a byte is 8 bits. But the size of a byte depends on the microprocessor in question. That is why you are bound to come across the term "octet", which means "a group of 8 bits", regardless of whatever the size of a byte may be.

Nibble (or Nybble)
There are two nibbles in a byte. One nibble is comprised of the first half of the bits in the byte, and the other nibble is comprised of the last half of the bits in the byte. On an 80x86, a nibble is 4 bits in size.

Word
'Word' is probably the most confusing term in data representation. There are actually two meanings. The original definition refers to the machine word, which is the maximum number of bits that a processor can work with at a time. 32-bit microprocessors have machine words of 32 bits, whereas 16-bit microprocessors have words of 16 bits. The other definition of 'word', which is what the 80x86 assembly community and this book use, is a group of 16 bits.

Wyde (Knuth)
D. Knuth (author of The Art of Computer Programming) has invented a new term for 16-bit data to get away from the ambiguity of the term 'word'. 16-bit characters are currently known as "wide" characters in the C and C++ languages. Because of the use of 'w' in functions handling 16-bit characters, he adopted a term that starts with 'w'.

Double-word
A group of 32 bits.

Tetra (Knuth)
Another D. Knuth term, which is short for tetrabyte. Four 8-bit bytes is 32 bits.

Octa (Knuth)
Another D. Knuth term, which is short for octabyte. Eight 8-bit bytes is 64 bits.

What can you do with a word? For instance, you can represent the states of 16 lights, one bit per light. We can declare that if a bit is set, then its corresponding light is turned on. If the bit is cleared, then the light is off. And since each bit is independent of the others, some lights can be on while others are off.

A word can also represent a quantity. Let's assume you only care about how many lights are on. This way, you can find out how many lights are turned on by counting the number of bits that are set. If 5 lights are on, then 1+1+1+1+1 = 5 bits are set. If none of the lights are on, then zero bits are set. Any quantity of 'on' lights, from 0 through 16, can be expressed with the word: 17 distinct values. Since you are using 65536 states to represent 17 values, the above use of a word is a waste.

Representing integers as binary numbers

You're already familiar with at least one numbering system: decimal. A decimal digit, the base-10 counterpart of the bit, can have a value from 0 through 9. What if you need to represent a number larger than 9? Add another digit.

Suppose the value of a number were simply the sum of its digits. Fifteen can be represented as 78 = 7 + 8 = fifteen. Forty three might be 99997! (Kind of resembles Roman numerals.) The problem with this system of simply adding the digits is that there are many redundant numbers. There is not one unique number representing each value. For example, 69, 78, 87, 96 all represent the value 15. The most efficient way to represent a value is by assigning weights to each bit position.

Each digit in a sequence of digits that form a number can be assigned a unique number to identify that digit, called a "place". The rightmost digit is said to be in place "0", and each other digit has a place that is one higher than its neighbor to the right. For example

    6502
    ||||
    |||\-> Place 0
    ||\--> Place 1
    |\---> Place 2
    \----> Place 3

Next, we can assign place values to each place. Let's assume that each digit has a place value that is 3 times greater than its neighbor to the right. For example, 11 would equal 1*(place value of digit in place 1) + 1*(place value of digit in place 0) = 1*3^1 + 1*3^0 = 3 + 1 = 4. To represent 43, we could write 421, which would equal 4*3^2 + 2*3^1 + 1*3^0 = thirty-six + six + one. We were able to use 3 digits instead of 5. In the above example, fewer digits are required to represent the same value.
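The two schemes discussed here, adding the digits versus weighting each place, can be compared with a short Python sketch. The function names `additive_value` and `weighted_value` are made up for this illustration and are not part of the tutorial.

```python
def additive_value(digits):
    """Value of a number when its digits are simply added together."""
    return sum(digits)


def weighted_value(digits, factor):
    """Value of a number when each place is worth `factor` times its right neighbor.

    `digits` is written the usual way, leftmost digit first.
    """
    total = 0
    for place, digit in enumerate(reversed(digits)):
        total += digit * factor ** place   # place 0 is the rightmost digit
    return total


print(additive_value([7, 8]))          # 15
print(additive_value([9, 6]))          # 15, a redundant way to write fifteen
print(weighted_value([1, 1], 3))       # 4
print(weighted_value([4, 2, 1], 3))    # 43
```

With the additive scheme many digit strings collapse onto the same value. With weights, the three digits 421 reach 43 where the additive scheme needed five.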

There is still room for improvement. It turns out that the best factor to increase place values by is the radix of the numbering system in use. The radix is the first value that cannot be represented by a digit. In decimal, the highest digit is 9, so the radix is 10. Conveniently, this also equals the number of possible digits. By using the radix as the factor to calculate place values, each digit can be used to represent a multiple of the first value that cannot be represented by the digits to the right of it. In this way, there are no duplicate representations of any value.

The place value of a place is (r^p), and the value of a digit in a place is (r^p)d where r=radix, p=place, d=digit number. Because of this equation, the radix of a numbering system is also called the base of the numbering system. For example, decimal is referred to as base 10, binary is referred to as base 2. To illustrate the equation, we will calculate the value of 6502

    6502
    ||||
    |||\-> (10^0)2 =    1*2 =    2
    ||\--> (10^1)0 =   10*0 =    0
    |\---> (10^2)5 =  100*5 =  500
    \----> (10^3)6 = 1000*6 = 6000

6000+500+0+2 = 6502.

Now an example in which binary is translated to decimal (binary values end in "b" to show that they are binary)

    1101b
    ||||
    |||\-> (2^0)1b = 1*1 = 1
    ||\--> (2^1)0b = 2*0 = 0
    |\---> (2^2)1b = 4*1 = 4
    \----> (2^3)1b = 8*1 = 8

1+0+4+8=13. Thus, 1101b is the way to write 13 in binary.

To convert a decimal number to binary, just keep dividing the number by 2-- the remainder equals the digit, and the quotient is the number to use to get the next digit to the left. For example, let's convert 6502 to binary

    6502 / 2 = 3251 REM 0
    3251 / 2 = 1625 REM 1
    1625 / 2 =  812 REM 1
     812 / 2 =  406 REM 0
     406 / 2 =  203 REM 0
     203 / 2 =  101 REM 1
     101 / 2 =   50 REM 1
      50 / 2 =   25 REM 0
      25 / 2 =   12 REM 1
      12 / 2 =    6 REM 0
       6 / 2 =    3 REM 0
       3 / 2 =    1 REM 1
       1 / 2 =    0 REM 1

Reading the remainders from the last division up to the first gives the digits from left to right. So, 6502 expressed as a binary number is 1100101100110b.

Binary numbers and groups of bits

Until now, two separate topics have been discussed: groups of bits, and binary numbers. Now we want to know how to use a group of bits to represent binary numbers. Bits in a group of bits are assigned unique numbers to identify them, in exactly the same way as places in a number are assigned place numbers. For example

    0000 <- a bunch of bits set to 0
    ||||
    |||\-> Bit 0
    ||\--> Bit 1
    |\---> Bit 2
    \----> Bit 3

For any two bits A and B, where bit A has a higher bit ID number than bit B (for example, if bit A is Bit 4 and bit B is Bit 2), we say that bit A is a "higher" bit than bit B. We also say that bit A is "to the left" of bit B. When you write out a sequence of bits, you write the bits in descending order-- high bits to the left, low bits to the right.

Each bit can represent one binary digit-- therefore, a binary number of n digits can be represented by a group of n or more bits. If we want a group of bits to represent a binary number, we usually use bit n to hold the digit in place n of the binary number. So, bit 0 holds the value in place 0, bit 1 holds the value in place 1, etc. For example

    1101 <- A bunch of bits representing the value 1101b
    ||||
    |||\-> Bit 0, holding the digit in place 0, which is 1
    ||\--> Bit 1, holding the digit in place 1, which is 0
    |\---> Bit 2, holding the digit in place 2, which is 1
    \----> Bit 3, holding the digit in place 3, which is 1
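The repeated-division method, the (r^p)d rule, and the bit numbering described above can be tried out in Python. This is only a sketch: the helper names `to_binary`, `from_digits` and `bit` are invented for the example.

```python
def to_binary(n):
    """Convert a non-negative integer to binary digits by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)    # quotient feeds the next step
        digits.append(str(remainder))  # remainder is the next digit, right to left
    return "".join(reversed(digits))


def from_digits(text, radix):
    """Add up (radix ** place) * digit over every place; place 0 is the rightmost."""
    return sum(radix ** place * int(d) for place, d in enumerate(reversed(text)))


def bit(value, n):
    """Return bit n of value, where bit 0 is the lowest (rightmost) bit."""
    return (value >> n) & 1


print(to_binary(6502))                        # 1100101100110
print(from_digits("1101", 2))                 # 13
print(from_digits("6502", 10))                # 6502
print([bit(0b1101, n) for n in range(4)])     # [1, 0, 1, 1]
```

The last line lists bits 0 through 3 of 1101b in ascending order, which is why it reads as the written number reversed.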

Groups of groups of bits

It may be that you are working with a system that only supports 8-bit bytes. How could you store a number higher than 255? You can combine bytes (or other groups of bits) in the same way as you combine digits. The radix of a byte is 256 (the first value that can't be represented by a byte). So, given a Byte 1 which stores the value 152 and a Byte 0 which stores the value 74, the combined value represented by these bytes can be considered to equal 152*256^1 + 74*256^0 = 152*256 + 74 = 38986. The maximum value you can store with 2 bytes is 255*256^1 + 255*256^0 = 255*256 + 255 = 65535, which is also the maximum value you can store in a group of 16 bits.

Representing negative numbers

The above method is only one possible way to represent a number with bits. While there are always 2^n possible values represented by a group of n bits, you can assign numeric values to each combination of bit values as you like, so the range of values isn't always from 0 to 2^n - 1. For example, with 8 bits you can represent values in the set {0, 2, 4, 6, ..., 508, 510}. You can also represent values in the range -128 to 127.

If a value cannot be negative, it is called an unsigned number. If a value can be negative or nonnegative, it is called a signed number. When the number is signed, the highest bit is not considered to be a binary digit, but is considered to be the "sign" of the number. This is because you need to remember the negative sign associated with the number--that is, whether the value is negative or not. If the bit is 1, the number is negative--otherwise it is nonnegative (zero or positive).

The method of storing signed numbers on an 80x86 is called the "two's complement" method. For such a signed number, there is one sign bit, and the remaining bits represent an unsigned integer. When the sign bit is 0, the value is equal to the value of that unsigned integer. When the sign bit is 1, the value is equal to the value of the unsigned integer minus the radix of that unsigned integer.

For example, with an 8 bit signed number, the highest (leftmost) bit is the sign bit, and the remaining 7 bits are the unsigned integer. This unsigned integer is in the range 0 to 127. The radix of that unsigned integer is 128. So, when the number is negative, the value 128 is subtracted from the value of the unsigned integer to get the value of the entire byte. Thus, when the value is negative, the byte has the range 0-128 to 127-128, or -128 to -1, and when the value is nonnegative, the byte has the range 0 to 127. The total range of possible values is -128 to 127.
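Both ideas in this part, combining bytes with a radix of 256 and reading a byte as a sign bit plus a 7-bit unsigned integer, are easy to check in Python. The names `combine_bytes` and `signed_byte` are hypothetical helpers written for this sketch.

```python
def combine_bytes(byte1, byte0):
    """Treat two bytes as two 'digits' in radix 256; byte1 is the higher one."""
    return byte1 * 256 ** 1 + byte0 * 256 ** 0


def signed_byte(raw):
    """Read an 8-bit pattern (0..255) as a signed value, using the rule above."""
    sign_bit = raw >> 7          # highest (leftmost) bit
    unsigned_part = raw & 0x7F   # remaining 7 bits, range 0 to 127
    if sign_bit:
        return unsigned_part - 128   # subtract the radix of the 7-bit integer
    return unsigned_part


print(combine_bytes(152, 74))     # 38986
print(combine_bytes(255, 255))    # 65535
print(signed_byte(0b01111111))    # 127
print(signed_byte(0b10000000))    # -128
print(signed_byte(0b11111111))    # -1
```

The three signed examples show the largest value, the smallest value, and the all-ones pattern.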

Another way of describing two's complement signed integers is to say that the place value of the leftmost bit is negated. That is, instead of the place value of the leftmost bit of a byte equalling 128, it equals -128. So, if that bit is set, -128 is added to the value of the byte. If all bits are 1 (that is, the sign bit is 1 indicating that the value is negative and the unsigned integer is equal to 127), the value will equal -1. Also, if the sign bit is 1 and all other bits are 0, then the value is equal to the lowest representable value. This is true regardless of how large the group of bits is. The actual process of storing numbers in this way is automatically done by the assembler and the microprocessor.

80x86 conventions

On the 80x86, it's presumed by the instructions that a group of n bits represents either any unsigned integer in the range 0 to 2^n - 1 or any signed integer in the range -2^(n-1) to 2^(n-1) - 1, using the methods described above. So, you will usually store integers in this way to make the most use of the instructions.

Footnotes

1. The word Bit originates from the term "Binary digIT".

2. In traditional digital circuits, a high is represented by a +5 volt signal, and a low with ground (0V) on the line. The device that is driving the signal is either sourcing +5V or sinking to ground. In RS-232, a serial communications protocol, 1's are represented by a negative voltage, nominally -12V (-3V to -15V), and 0's by a positive voltage, nominally +12V (+3V to +15V).

(I've removed note 3. Of course computers have concept of "left and right". For example, look at the instructions SHL and SHR. It's not literal left and right spatial positioning, but the words "left" and "right" are clearly defined to indicate the relationship of bits in groupings of bits, so it is acceptable to use those terms. It's not like you can't tell what direction "left" is.)

Boolean Logic

Boolean logic was developed by George Boole in the mid-1800s. At its core, boolean logic is simple to master and will be useful later in programming. Along the lines of electronics, a gate operates by on and off states (0=off or 1=on). Boolean logic describes both logic gates and the binary operation on two numbers, represented as binary numbers of course.

The following boolean examples will be represented using truth tables. The inputs to the gate (A, B) are on the left and the output (C) is on the right. Following along in the part of the 'AND' example below, reading from the 'Truth Table', the inputs A and B are both required to be 'on' in order for C to be on (1). It could be read as the following

    IF A=0 AND B=0 then C=0
    IF A=0 AND B=1 then C=0
    IF A=1 AND B=0 then C=0
    IF A=1 AND B=1 then C=1

Notice that the results from NAND and NOR are the exact complement of the results from AND and OR, respectively.
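The truth tables for the four gates mentioned here can be generated rather than memorized. The following Python sketch models each gate as a small function on the values 0 and 1. The `GATES` table and `truth_table` helper are names chosen for this example only.

```python
from itertools import product

# Each gate takes two inputs (0 or 1) and returns one output (0 or 1).
GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),   # complement of AND
    "NOR":  lambda a, b: 1 - (a | b),   # complement of OR
}


def truth_table(gate):
    """Return a list of (A, B, C) rows for the named gate."""
    return [(a, b, GATES[gate](a, b)) for a, b in product((0, 1), repeat=2)]


for name in GATES:
    print(name)
    for a, b, c in truth_table(name):
        print(f"  IF A={a} AND B={b} then C={c}")
```

Comparing the AND rows with the NAND rows, or OR with NOR, shows every output flipped.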

Uses of the operations

Combinations of the logic

One or more of these logical operations can be grouped together to form complex logic. Combining an AND and an OR, for example, in pseudocode:

    IF (A=1 AND B=1) OR G=1 then C=1

(A AND B must both be true for C to be true on their own. G, an additional input, is optional: when G is true, C is true regardless of A and B.) This has created a somewhat complex logical gate on which a reaction can be based, be it from software or other.

Coding the logic

FOR MORE BIT OPERATIONS, SEE Bit Operations.

Notes

Boolean logic is the basis for much of the internal processor logic. Addition and subtraction are actually simulated using Boolean logic!

The symbols shown above are those used in the US, but in Sweden, the UK, and other IEC-conforming countries another set of symbols is used, which according to some people (myself included) is more descriptive, Booleanly thinking, than the US versions: OR, for instance, is drawn with [[>=1]] (A OR B <=> A+B). A page on the geekcoalition site shows both in comparison.

Computer Memory

In a computer system, memory is where the central processor directly stores data, or directly retrieves data and instructions. According to Webster, memory, or core, is a device or a component of a device in which information can be inserted and stored and from which it may be extracted when wanted. More generally, it is the capacity to store information.

Memory can be loosely understood as storage. However, storage and memory are considered distinct objects in computer terminology. In general, the definition changes according to context and one should be able to comfortably extract it. Computers need memory as we do too. The term memory (slang) is also used interchangeably with RAM (Random Access Memory).

NOTE: "core" is obsolete, as it actually refers to an old technology -- ferrite core storage. This is much like the modern use of RAM or DRAM to refer to primary memory (see Scientica's comment in RAM). Memory has also been known as main storage, or the main store.

NOTE: Memory was an American term for some types of computer storage. Internationally, manufacturers have adopted this term for semiconductor storage designed as arrays of addressable bits.

Computer memory is as vital for program execution as our memory is for us to remember. Having a whopping 3 GHz microprocessor with a meagre 32 MB of memory (RAM) on board kind of limits the capabilities of the computer system.

Computer memory cannot store "ideas" or information as we do, but it certainly can store electrical voltage states either permanently or temporarily. Each bit-cell of memory can record either a high voltage state or a low voltage state. Computer memory can be either static (non-volatile) or volatile. Static memory can save its state even if it lacks a supply of electricity. For example, ROM (Read-Only Memory) chips save programs permanently. The most common form of volatile memory is a RAM (Random Access Memory) module.

NOTE: Electronic digital systems are built using three kinds of digital circuits: combinatorial circuits for implementing Boolean logic, storage circuits for maintaining state, and pulse generators for generating synchronizing signals. In addition to this, interface circuits allow connections to other systems, electronic (digital or nondigital) or nonelectronic. Sequential circuits are digital systems containing all three kinds of circuits.

A combinatorial circuit depends only on its input states and gives a 'result' for each possible combination of inputs, which is one reason it is called combinatorial. When the input states are 'lost', the state of the circuit is lost. A sequential circuit also depends on the state of its inputs, but when the inputs are lost, its last state is not lost as it is in a combinatorial circuit. An example of a combinatorial circuit is a selector for which register to use or which memory location to access, and an example of a sequential circuit is the hard drive or the RAM. Note that RAM loses its last state when it loses its power supply, but while it has power, the other locations keep the last state that was 'computed' for them even if you do not enter any more information into the memory.

Note that a CD or a floppy disk is not a circuit of either of these two types. It can be considered a medium that preserves a state and can be modified by an external factor (magnetism or laser light). It does not accept inputs or produce results from them; it only changes its state in response to the external factor.
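The difference between the two kinds of circuit can be imitated in software. In this Python sketch, `selector` stands in for a combinatorial circuit and `Latch` for a one-bit storage element. Both names are invented for the illustration and do not model any particular chip.

```python
def selector(select, reg_a, reg_b):
    """Combinatorial: the output depends only on the inputs present right now."""
    return reg_b if select else reg_a


class Latch:
    """Sequential: remembers the last bit written until it is written again."""

    def __init__(self):
        self.state = 0

    def step(self, write_enable, data):
        if write_enable:
            self.state = data   # new input is captured
        return self.state       # otherwise the old state is held


print(selector(0, 5, 9))   # 5
print(selector(1, 5, 9))   # 9

latch = Latch()
print(latch.step(1, 1))    # 1, the bit is stored
print(latch.step(0, 0))    # 1, input removed but the state is kept
```

The selector has no memory of earlier calls, while the latch keeps returning the stored bit after its input has gone away.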

RAM

Short for Random Access Memory. It is a sequential circuit that can hold its states and can be read and written, but by itself it cannot decode the address or location. The term is often used to mean the primary memory (because this type of memory is used there); it's a similar mistake to the "CMOS circuit" (Complementary Metal-Oxide-Semiconductor). A longer more detailed explanation of RAM can be found here http://en2.wikipedia.org/wiki/RAM

Like A Finite Set

Why talk of a finite set? The memory in the computer is not infinite, so it is important to understand that you will have a finite set of cells; in each cell you can save a block of bits (binary units) called a byte. The size of the memory is a power of 2: normally you have 8, 16, 32, 64, 128, 256, 512 or 1024 MB.

When an operating system loads your program to memory or creates a process, it gives the process a specific space in the memory, a subset of the whole memory. A simple process may fill only a few bytes, like 200, or 0 when it is not using the memory. It is therefore important that you do the correct computations to manage the space that you will use. The memory that is not available to you directly is the memory of the other programs, or the memory used specifically by the OS. There normally exists a subspace that is not assigned to any process; if you need more space, you can request a little space from it through the OS. But you must always stay within your current set of memory: normally, if you access memory outside of your set, you get a page fault or strange results in the computations.

Dividing the set Or Making partitions of the set

A partition of a set is a part of the set. A partition does not overlap any other partition; that is, the intersection of any two partitions in the set is null.

The memory can be partitioned in a general way like this (a partition here refers to a space of memory): a partition for the OS, a partition for each process, and a free space partition, that is, a partition that doesn't hold any type of process. The OS can be considered a process, but it is not redundant to treat it here as a separate partition. Why? The partition of the OS is normally always loaded at the same place in memory, unlike the other processes, which are not always loaded at the same position in memory. The partition of the free space is important because when a process needs to be executed or needs more memory, the space is taken from this partition. When this happens, the combined partition of all processes grows and the free space partition shrinks.

Let's examine the partition of a specific process. Let p be a process that belongs to the union of all process partitions, that is, any p in the current executable process tree controlled by the operating system. This p will have data in its space: the data is the instructions and the data proper (note that when a program is not executing it can be considered simple data). The data in p can grow, or p can free some memory, reducing its space. The addressable memory is inside this space and is in its own partition.

Computing the location address

The memory only reads and writes at a specific location, and this location is calculated by the microprocessor. The x86 family of processors has different modes of addressing a location.

The Memory Bus

The memory bus is the set of connections between the processor and memory. The memory bus can be divided into three parts.

1) The address bus, which carries the memory address provided by the processor.
2) The data bus, which carries the data being transmitted back and forth between the processor and memory.
3) The control bus, which carries the timing and direction signals for controlling the read and write transactions.

To keep things simple, most data buses have a fixed width, which is the number of data bits that can be transmitted simultaneously. It may or may not correspond with the data width associated with the architecture.

For example: Intel designed the 8086 as a 16-bit architecture, and built a chip with a 16-bit data bus. However, nothing prevents the designer from building a software compatible chip with an 8-bit data bus. And indeed nothing did: Intel modified the 8086 to support an 8-bit data bus, calling it the 8088.

Another example: The first 32-bit x86 architecture designed by Intel was the 80386, which had a 32-bit data bus. The Pentium supports the same instruction set. The Pentium has more instructions, but they are considered to be part of a 32-bit architecture. However, the Pentium data bus is 64 bits in width.

The processor is responsible for handling any mismatches of data size and alignment between the data bus and what the processor instruction requests.

Reading Memory

Ok. How do you read the memory of a computer? Memory cells have states that can be held and then read. First we need the location, which is computed as described in the previous section. Then this address is sent through the address bus and is received by the memory. Remember that reading memory means reading the state at a specific location of memory with a specific size. Note that in this case the read/write of memory is implicit in the addressing modes of the processor.

A short description of the process can be

- The processor decodes an instruction that needs to read memory, and computes the address using address modes.
- The processor determines how many reads will be required to retrieve the data.
- The address is sent via the address bus, for read, to all memory and address decoders.
- The address decoders (part of motherboard chipset) use part of the address to select (activate) a specific memory bank.
- The selected memory bank uses the rest of the address to retrieve data, and to send the data to the processor.
- If more reads are needed, the previous three steps are repeated with the next address.

Writing Memory

The process is nearly the same as reading a location of memory. Writing memory is important because it can change the states. Consider an iteration that counts from 10 to 0, and stops when it reaches 0. The data that holds the counter is in memory. What happens if you cannot change the initial value 10? Answer: an infinite loop.

A short description can be

- The processor decodes an instruction that needs to write to memory, and it computes the address using address modes.
- The processor determines how many writes will be needed to store the data.
- The address is sent via the address bus, for write, to all memory and address decoders.
- The address decoders (part of motherboard chipset) use part of the address to select (activate) a specific memory bank.
- The selected memory bank uses the rest of the address to write the data.
- If more writes are needed, the previous three steps are repeated with the next address.

For What Is Used

The memory in a computer is used to hold states, with data, and then to request these old states in order to make computations with this data. In a computation like a simple addition, 234+8467, it is important to remember the two addends 234 and 8467, and it is important to save or write the result 8701. Something similar happens with the memory of a computer, and we talk about states in the memory that are read or written. Here we only talk about how the computer gets an address of memory and what is done in the process of reading and writing; the rest will be covered in Data Representation.

Advanced Memory Topics

Hardware topics, and the segmentation and paging topics in The Microprocessor.

How Semiconductor RAM Works

A RAM chip has address pins and data pins. The address pins select one of many memory cells within the chip. The memory cell may be as small as 1 bit. It may be larger; for example, it may hold 8 or 16 bits. The data pins provide the means to put data into the cell, and to get the data back out.
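As a rough model of the read and write steps and of address pins selecting a cell, here is a toy memory in Python. Everything about its shape is an assumption made for the sketch: four banks of sixteen one-byte cells, the top address bits choosing the bank, and an 8-bit data bus so that a 16-bit value takes two transfers. The low byte goes first, as x86 processors store it.

```python
BANK_BITS = 2     # assumed: the top 2 address bits select one of 4 banks
OFFSET_BITS = 4   # assumed: the low 4 address bits select a cell in the bank


class Memory:
    def __init__(self):
        self.banks = [[0] * (1 << OFFSET_BITS) for _ in range(1 << BANK_BITS)]

    def decode(self, address):
        """Split an address into (bank number, cell offset inside the bank)."""
        bank = address >> OFFSET_BITS
        offset = address & ((1 << OFFSET_BITS) - 1)
        return bank, offset

    def read(self, address):
        bank, offset = self.decode(address)
        return self.banks[bank][offset]

    def write(self, address, value):
        bank, offset = self.decode(address)
        self.banks[bank][offset] = value & 0xFF   # each cell holds one byte


def write_word(mem, address, value):
    """Store a 16-bit value over an 8-bit data bus: two writes, low byte first."""
    mem.write(address, value & 0xFF)
    mem.write(address + 1, value >> 8)


def read_word(mem, address):
    """Fetch a 16-bit value with two reads from consecutive addresses."""
    return mem.read(address) | (mem.read(address + 1) << 8)


mem = Memory()
write_word(mem, 0x1F, 8701)
print(mem.decode(0x1F), mem.decode(0x20))   # (1, 15) (2, 0)
print(read_word(mem, 0x1F))                 # 8701
```

The two bytes of the value land in different banks here, which is the kind of mismatch the processor has to handle by issuing one transfer per address.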

There are two control pins. One pin tells the RAM if we want to access it or not -- it is called "chip select" or "chip enable". The other pin tells the RAM whether we wish to read (fetch) data from the RAM or to write (store) data into the RAM -- it is often just called the "write enable"; if it's not writing, it's reading. This is the minimum configuration. There may be an extra enable pin, "output enable", to further control a data read.

This is sufficient for what manufacturers call static RAM. Static RAM uses a circuit called a flip-flop as its storage element. In the absence of stimuli, static RAM will retain its contents as long as it has power.

Dynamic RAM, or DRAM, uses a capacitor as its storage element. The capacitor is charged up to hold data. Because of "leakage", it will lose charge, leading to loss of data. The data must be "refreshed" before the data is lost.

Because of the high density of DRAM, it can require a lot of address bits to access every memory cell. To keep the number of pins low, modern DRAMs receive the address in two parts, called row and column, from the same set of address pins. One more pin, Row Address Strobe (RAS), tells the DRAM to capture the row portion of the address. Yet another pin, Column Address Strobe (CAS), tells the DRAM to capture the column portion of the address. When both strobes have been activated, the DRAM will store or output data, depending on the state of the read/write pin. DRAMs are designed to refresh a whole row of memory cells when RAS is strobed.

At this writing, SDRAMs seem to be the most popular form of DRAM, used in PC-100 and PC-133 boards. They have a few extra features which I will not go into. Check the Intel web site for more information about SDRAMs.

The Microprocessor

The microprocessor is the most central part of a computer. Everything a computer can do is determined by the capabilities of the microprocessor inside it. A microprocessor, generally, reads data from memory, works on it, and writes the result back to memory. It also performs many additional operations including arithmetic, logic, and input-output.

Microprocessors have quite a history. The invention of the integrated circuit (IC), the containment of an entire CPU on a single chip, and, lastly, the introduction of the IBM PC are events of particularly great importance from the historical viewpoint. Building an entire CPU on a single chip for the first time ever was a great achievement for Intel Corporation (founded by Dr. Robert Noyce and Gordon Moore) and also one of the main reasons why this CPU-on-a-chip was being trade-named the microprocessor.

A microprocessor is manufactured by placing extremely tiny transistors on extremely small semiconductor integrated circuits. Older CPUs were made of vacuum tubes and also of separate transistors, which resulted in large sizes of computers. Chips used in more recent microprocessors are silicon dies with features of incredibly small size, on the order of 10^-9 m. Your microprocessor comes to be made from beach sand!

Intel Corporation has had a major hand in the development of the microprocessor, and its efforts should be well applauded. The name "Intel" derives from Integrated Electronics just in case you wanted to know. However, Intel is not the only company manufacturing microprocessors; there are other competent companies like AMD (Advanced Micro Devices), Motorola, and Cyrix that also manufacture microprocessors. Some of these companies also roll out Intel-compatible microprocessors.

There are many families and generations of microprocessors, but we are going to study only those from the 80x86 family. When we use the term "80x86," it refers to both Intel-manufactured and Intel-compatible 3rd-party manufactured microprocessors. Details about any particular company otherwise will be specifically noted.

Basic Architecture

The architecture that the 80x86 microprocessor-based computers use is based on a fundamental architecture first proposed by Dr. John von Neumann in 1946. This basic architecture is, therefore, known as the von Neumann architecture. Many computers even today are based on this architecture, but several have additional enhancements made to them. Although based on this architecture, the 80x86 microprocessor architectures are highly enhanced over it. As a result of technological innovations and clever marketing, the Intel 80x86 architecture has become the de facto industry standard.

The von Neumann Machine

A von Neumann machine is a stored-program computer that uses a single store for both data and executable instructions. This store in our computers is mostly semiconductor-based memory. A von Neumann machine has 5 parts: arithmetic-logic unit (ALU), control unit (CU), memory, input-output, and a bus. The ALU, CU and the bus are generally considered to form the CPU. Since von Neumann computers spend a lot of time moving data between memory and the CPU (slowing down processing considerably), the bus is usually replaced by a bus unit (made of multiple separate busses).

Digital computers based on the von Neumann architecture loosely follow this pattern of operation

1. Fetch instruction at current code location (pointed by the program counter).
2. Update current code location (add length of the fetched instruction to program counter).
3. Calculate addresses, if any.
4. Fetch any operands.
5. Perform the requested operation.
6. Store any results.
7. Go back to step 1, to execute the next instruction.

Inside the Microprocessor

For reasons that will soon become clearer, we begin our discussion of the workings of a microprocessor by highlighting a simple analogy between us and microprocessors.

A simple analogy

For a rough analogy, consider yourself and compare your brain with the microprocessor. When you were a toddler, a symbol such as µ wouldn't have made much sense to you except that it was a picture. As you grew up to become a kindergarten kid, you started identifying things, and learning the alphabet and the digits. Pictures started coming to life.

As you grew older, you came to know about how these individual picture symbols were grouped together to form words of various sizes and different meanings. Later on, you learnt about simple sentences, and then complex ones. You may also have used your index finger to point to words to easily locate them while slowly reading sentences. With age you began reading and comprehending entire paragraphs, and all this while you only got quicker and quicker at doing it.

A microprocessor works in a similar manner. It contains a bus interface unit that enables it to communicate with external devices and an execution unit that executes the instructions fed to it.

Bus Interface Unit

The microprocessor has a part called the bus interface unit (BIU), which establishes the communication link between the microprocessor and external devices. "Bus" is a general computer term for a pathway consisting of a number of electronic signal lines through which data and signals are transferred. The number of path-lines in a bus determines its size. Each signal-line can carry only one of two voltage values (high or low) at a time, thus signaling either a logic-1 state or a logic-0 state (a sort of yes or no) to the microprocessor. The BIU is primarily made up of three busses: an address bus, a data bus, and a control bus.

Address Bus

The address bus is much like your index finger, which you can use to locate words while reading them. It tells the microprocessor where to fetch data from or where to send it to. The location can either be in memory or it can be an input/output port (connecting to an external device).

The address bus in an 8086 microprocessor has 20 signal-lines, and therefore, can only hold an address of size 20 bits. Since each bit can have only one of two possible states and a group of n bits can have only 2^n total possible states, the number of different locations that the 8086 can address is 2^20 = 1,048,576 or 1 M (one Meg). The Pentium, on the other hand, has a 32-bit address bus, and can address up to 2^32 = 4096 M = 4 Gig locations.

Data Bus

The data bus is responsible for getting data into and sending it outside the microprocessor. The size of the data bus decides how much data can be transferred through it at a time.

The microprocessor has two types of data bus: an internal data bus and an external system data bus. The internal bus transfers data to and fro between the ALU, the registers, and the instruction decoder. A 16-bit microprocessor has an internal data bus width of 16 bits, a 32-bit microprocessor internal bus has a width of 32 bits, and a 64-bit microprocessor internal bus has a width of 64 bits. The system data bus of the microprocessor communicates with the external devices and transfers data to the internal bus.
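The arithmetic behind bus sizes is short enough to check directly. This Python sketch computes how many locations an address bus can reach and how many transfers a data bus needs for a given amount of data. The function names are made up for the example, and `transfers_needed` ignores alignment.

```python
def addressable_locations(address_lines):
    """Each line carries one bit, so n lines can form 2**n different addresses."""
    return 2 ** address_lines


def transfers_needed(data_bits, bus_width_bits):
    """Number of bus transfers to move data_bits over a bus of the given width."""
    return -(-data_bits // bus_width_bits)   # ceiling division


print(addressable_locations(20))               # 1048576 (1 Meg, the 8086)
print(addressable_locations(32))               # 4294967296
print(addressable_locations(32) // 2 ** 20)    # 4096 (Meg, i.e. 4 Gig)
print(transfers_needed(16, 16))                # 1
print(transfers_needed(16, 8))                 # 2
print(transfers_needed(32, 64))                # 1
```

The middle case shows why a narrower external data bus costs time: the same 16-bit value needs two trips instead of one.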

Control Bus

This particular bus is the one that the Bus Interface Unit uses to notify the memory of its intentions. For example, if the microprocessor wanted to write to memory, a write line on the Control Bus would activate, letting the memory know that the microprocessor's intention is to write a value to main memory. (ADD MORE: Read, Cache, etc.)

Execution Unit

To operate on data using instructions, a microprocessor contains an execution unit (EU). You cannot feed just about anything to the microprocessor and tell it to execute it. Binary instructions that a microprocessor understands as executable define its instruction set. 80x86 microprocessors follow the CISC (Complex Instruction Set Computer) design approach and have large instruction sets. However, later microprocessors in this family are fully compatible with earlier ones, so that the newer instruction set overlays and extends the previous ones. This simply means that programs written for an 80386 will run comfortably on a Pentium-based computer, but may not on an 80286 one.

A square wave oscillator or clock circuit generates the timing signals based on which the processor synchronizes all its activities. It also determines the speed with which instructions are fetched and executed. The more instructions executed per clock cycle, the faster the processor.

Registers

Within the execution unit are the registers we will use to program the x86 microprocessors.

Basic operation

The processor interface

The CPU (Central Processor Unit) is the part of the computer system that contains the logic for fetching instructions, deciphering them, and executing them. Attached to the CPU are storage for code and data, also known as memory, and various device controllers. (Although memory acts as a store for code and data, there are clear distinctions between storage and memory in computer hardware terminology.) Coprocessors, and other units known as peripherals, can also be attached to the CPU. Data is transferred between these units via data paths. The number and configuration of units and data paths vary depending on who designs the system.

Pentium-class processors provide one data path for data transfers between the processor chip and all other system units. Off-chip units include the memory shared by data and executable code. The processor chip will read data from memory or a device, and write data to memory or a device. The processor chip also provides an address to select which device register or memory location to write to or read from.

A Pentium processor (chip) writes data by placing an address on the set of signal lines known as the address bus, and the data on the set of signal lines known as the data bus. The processor reads data by placing the address on the address bus, and capturing the data appearing on the data bus. Timing signals control the data transfer.

Memory management features
For computational purposes, the following memory management features are unnecessary. However, the use of these features explains why your program cannot easily alter or read the data of another program in multitasking systems such as Windows and Linux.

Protected mode and segmented memory (it ain't where you think it is)
Intel defined at least three operating modes for their 32-bit microprocessors: real, protected, and virtual-8086. We are primarily interested in protected mode because that is the mode our 32-bit programs in Windows and Linux operate under.

Under protected mode, we can define segments, blocks of contiguous memory that hold code and data. Segments are allowed to overlap. They are managed by segment descriptors. Two types of segments are defined: code and data. Segment descriptors control read, write, and execute permissions. The following table shows all of the possible permission combinations.

    Segment Type    Execute    Read    Write
    code            Yes        -       -
    code            Yes        Yes     -
    data            -          Yes     -
    data            -          Yes     Yes

It shows that executable code must be in code segments, and writable data locations must be in data segments.

We can designate whether each segment is 16-bit or 32-bit. If a code segment is 32-bit, by default, instructions in it use 32-bit addressing and 32-bit operands (when instructions need more than one byte). If a code or data segment is 32-bit, the segment can be as large as 4G (allowing full 32-bit addressing).

Segment descriptors also hold base addresses that will be added to the effective addresses to get linear addresses. To access segments, you use a value called a selector. The selector contains an index into the descriptor table where segment descriptors are stored. When you load a segment register (CS, DS, ES, FS, GS, SS) with a selector, the indexed descriptor is loaded into a hidden register (effectively a cache) associated with the segment register. When your program runs, every memory access uses a segment register, whether you specify a register or not; every memory access, code or data, is tested for permissions; and every memory access is modified by a base address.

Windows does not take much advantage of segment registers. Except for execute and write privileges, the segments associated with the four primary segment registers CS, DS, ES, and SS are set to the same base address. Thus, the four segments are effectively the same single segment. This is the flat memory model. An effective address will be converted to the same linear address regardless of whether you are modifying it with CS (jumps), DS (most data accesses), ES (some string instructions), or SS (stack instructions).

Paging and virtual memory (it still ain't where you think it is)
In protected mode, the memory paging feature can be enabled. When discussing this feature, a "page" is no longer a 256 byte block of memory, as MS-DOS assembler programmers know it. Whereas the segmentation feature gathers memory into segments of varying sizes, the paging feature breaks up memory into pages of fixed size (4096 bytes on a Win32 platform).

When paging is enabled, a set of page tables are used to change the address again. Part of a linear address is treated as a page number, which is used to index into page tables to retrieve a page base address. The page base address is added to the rest of the linear address to create a physical address. This is the last possible alteration of the address before it goes out onto the address bus. The most recent Pentiums can generate 36-bit physical addresses with this feature.

The page base addresses allow the pages to be randomly distributed throughout physical (true, real, actual) memory, without crashing the code in them! Because software can update the page tables, we can make two programs occupy the "same memory" by making two sets of page tables. We use one set when executing one program, and we use the other set to execute the other program. This is why addresses in one program are normally invalid in another program -- the addresses map to different pages!

Each page table entry also has a "present" bit, indicating if the page is loaded with page data. This bit is maintained by software, which allows us to implement "page swapping", the heart of virtual memory.
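The two address changes described above (segment base, then page table) can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: a single flat dictionary stands in for the page tables (real x86 paging uses multi-level tables), and the names `linear_address`, `physical_address`, `PageFault`, and the table contents are made up for illustration.

```python
PAGE_SIZE = 4096  # bytes per page, the Win32 figure quoted above

class PageFault(Exception):
    """Raised when the page table entry is missing or marked not present."""

def linear_address(segment_base, effective_address):
    # Segmentation step: base + effective address, wrapped to 32 bits.
    return (segment_base + effective_address) & 0xFFFFFFFF

def physical_address(page_table, linear):
    # Paging step: the upper part selects a page, the rest is the offset.
    page_number, offset = divmod(linear, PAGE_SIZE)
    entry = page_table.get(page_number)
    if entry is None or not entry["present"]:
        raise PageFault(hex(linear))
    return entry["base"] + offset

# Two hypothetical programs using the same linear address, different pages.
table_a = {0x400: {"base": 0x00012000, "present": True}}
table_b = {0x400: {"base": 0x0009F000, "present": True},
           0x401: {"base": 0x00000000, "present": False}}

lin = linear_address(0, 0x00400010)   # flat model: segment base is 0
phys_a = physical_address(table_a, lin)
phys_b = physical_address(table_b, lin)
print(hex(lin), hex(phys_a), hex(phys_b))
```

The same linear address lands at different physical addresses depending on which table is active, which is the reason an address from one program means nothing in another.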

When the processor attempts to access a "not present" page, it generates a page fault exception. The OS decides if the memory is allocated. If not, the OS signals a bad memory access. Otherwise, the OS finds a suitable place to reload the "swapped out" page. If there is no room, the OS chooses a page to "swap out" to the hard disk, and then replaces it with the desired page from the hard disk. Then the page is marked as "present".

Optimization note: The page table entry also has a dirty bit, which is set when a loaded page has been written to. A page that isn't dirty does not need to be swapped out, because a copy of the page already exists on the hard disk.

Performance enhancements
Memory cache
Overlapped instruction execution (aka out-of-order execution)

What are Registers?
Since computers are not magic, there must be some way to physically manipulate data in the real world as indicated by computer program instructions. Registers are such data areas that are physically located on the processor. Almost every form of data transfer and data manipulation is processed through registers.

Why are Registers used?
As stated in the last paragraph, registers are used for data manipulation. This includes mathematical operations, logic operations, program control and other various operations.

How are Registers used?
Registers are used by simply utilizing instructions that involve their use. Such instructions load/store data from/to RAM, other types of Memory or even I/O Devices. Please reference Intel's Processor documentation for specific information about instructions, their purpose and their usage.

Types of Registers
There are various types of registers that, of course, have various purposes. The following are brief descriptions of all the major types of registers found in the x86 architecture. More detailed information can be found in Intel's Processor documentation manuals.

General Purpose Registers (GPR)
General Purpose Registers are hopefully self-explanatory: registers which are used for general programming purposes.

Accumulator Register (AL/AH/AX/EAX/RAX)
The Accumulator Register was initially designed to hold results from arithmetic operations, to send and receive data during I/O operations and to identify BIOS function calls.

Base Register (BL/BH/BX/EBX/RBX)
The Base Register was initially designed to be a base pointer for addressing memory locations.

Counter Register (CL/CH/CX/ECX/RCX)
The Counter Register was initially designed to perform as a counter for programmed loops and as an index number for shift operations.

Data Register (DL/DH/DX/EDX/RDX)
The Data Register was initially designed to assist in arithmetic operations and to be a pointer to I/O port addresses during I/O operations.

Source Index Register (SI/ESI/RSI)
The Source Index Register was initially designed to act as a pointer to the source of memory and string operations.

Destination Index Register (DI/EDI/RDI)
The Destination Index Register was initially designed to act as a pointer to the destination of memory and string operations.

Base Pointer Register (BP/EBP/RBP)
The Base Pointer Register was initially designed to hold the base address of the stack.

Stack Pointer Register (SP/ESP/RSP)
The Stack Pointer Register was initially designed to hold the limit (top) address of the stack.
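
The slash-separated names in the headings above (for example AL/AH/AX/EAX/RAX) are overlapping views of one register rather than separate registers. The following Python sketch models that for the accumulator; the function names `views` and `write_al` are made up for illustration, and the high-byte view (AH) exists only for the A, B, C and D registers.

```python
def views(rax):
    """Return the named sub-register views of a 64-bit accumulator value."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "EAX": rax & 0xFFFFFFFF,          # low 32 bits
        "AX":  rax & 0xFFFF,              # low 16 bits
        "AH":  (rax >> 8) & 0xFF,         # bits 8..15
        "AL":  rax & 0xFF,                # low 8 bits
    }

def write_al(rax, value):
    """Model 'mov al, value': only the low byte changes."""
    return (rax & ~0xFF) | (value & 0xFF)

v = views(0x1122334455667788)
updated = write_al(0x1122334455667788, 0xFF)
print({name: hex(val) for name, val in v.items()}, hex(updated))
```

Writing to AL leaves the upper bytes alone, which is why code that only sets AL may still see stale data in the rest of EAX.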

Segment Registers
Segment Registers act as base address pointers to memory "segments" during operations that address any part of memory. These registers are a part of the x86 "Segmentation Memory Model" and are rarely used directly due to the advent of "flat" memory space. Despite their deprecation during the evolution of the x86 architecture, these registers are still required to hold valid values during normal CPU operation.

Code Segment Register (CS)
The Code Segment Register was initially designed to act as a pointer to the code segment in which a program is currently running.

Data Segment Register (DS)
The Data Segment Register was initially designed to act as a pointer to the data segment in which a program's variables and data structures were being accessed.

Auxiliary Segment Registers (ES/FS/GS)
These Auxiliary Segment Registers were initially designed to assist programs with addressing various segments of memory due to the 64KB limitation of segments during 16-bit Real Mode operation.
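
To make the 64KB limit concrete, here is a small Python model of how a real-mode address is formed. The formula (segment shifted left by four bits, plus a 16-bit offset, wrapped to 20 bits as on the 8086) is not spelled out in the text above, so treat it as background added for illustration; the function name `real_mode_address` is made up.

```python
def real_mode_address(segment, offset):
    """8086 real mode: physical = segment * 16 + offset, wrapped to 20 bits."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF

# A 16-bit offset can only span 0x0000..0xFFFF, so one segment covers 64KB.
segment_size = 0xFFFF + 1

# Different segment:offset pairs can name the same physical byte.
a = real_mode_address(0x1000, 0x0010)
b = real_mode_address(0x1001, 0x0000)
print(segment_size, hex(a), hex(b))
```

Reaching data beyond 64KB from a given base therefore means loading a different value into a segment register, which is the job the auxiliary segment registers help with.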

Extended Architecture Registers
The following registers are used with certain instructions, in which the support of those instructions varies depending on the release time of the processor. Please read Intel and AMD's Documentation Manuals for more information about their instruction-sets.

Floating Point Registers (ST)
These Floating Point Registers were introduced with the x87 Floating Point Unit (FPU), which began as a separate coprocessor chip (such as the 80387). An FPU is available when a 387 or 487 coprocessor is installed, and is built into processors that are 586+ (Pentium and above). The FPU registers are used to store data during floating-point operations of the FPU.

Matrix Math Extension Registers (MMX)
These Matrix Math Extension Registers were initially designed during the addition of MMX to the Pentium processor series. The MMX registers are used to store data during MMX operations.

System Registers
The following registers are used to control system operation and assess system/program status.

Instruction Pointer (IP/EIP/RIP)
The Instruction Pointer was initially designed to help guide program control.

Prior to instruction execution, the Instruction Pointer (IP) points to the location of that instruction in memory. During standard operation, the IP is automatically increased after the execution of each instruction.
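
A toy fetch-and-execute loop shows the role of the Instruction Pointer. This is a sketch only: the addresses, instruction lengths and the tiny three-mnemonic "instruction set" are invented for illustration and do not correspond to real encodings.

```python
# Toy program: address -> (mnemonic, length in bytes, jump target or None).
PROGRAM = {
    0x100: ("mov", 3, None),
    0x103: ("jmp", 2, 0x108),
    0x105: ("mov", 3, None),   # never reached: the jump skips it
    0x108: ("hlt", 1, None),
}

def run(program, start):
    ip = start
    trace = []
    while True:
        mnemonic, length, target = program[ip]
        trace.append(ip)       # IP pointed at this instruction before it ran
        ip += length           # default: advance to the next instruction
        if mnemonic == "jmp":
            ip = target        # a jump overrides the default advance
        elif mnemonic == "hlt":
            return trace, ip

trace, final_ip = run(PROGRAM, 0x100)
print([hex(address) for address in trace], hex(final_ip))
```

The IP grows by the length of each instruction, not by one, and control-transfer instructions are simply the ones that write a different value into it.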

Flags Register (FLAGS/EFLAGS/RFLAGS)
The Flags Register was initially designed to hold the state of the processor, including certain data pertaining to the currently running process.

Control Registers (CR0/CR2/CR3/CR4)
The following Control Registers were initially designed to support the enabling and/or disabling of various processor features in a programmable fashion.

General Outline
Lexical Issues
    Whitespace
    Comments
    Identifiers
    Literals
        Integers
        Characters
            ASCII
            Unicode
            DBCS
        Strings
            ASCII
            Unicode
            Types
                Null-Terminated (Zero-Terminated)
                Dollar-Terminated
                Length-Prefixed
                Descriptor-based
                Mixed-mode
                HLA Strings
                etc...
    Keywords
    Separators
Instruction Syntax
    General Instruction Syntax
    Operands
        Registers
        Memory variables
        Literals
        Expressions
Labels, Variables and Data Definition
    Data Definition and Types
        Simple Types
            BYTE or DB
            WORD or DW
            DWORD or DD
            QWORD
            etc...
        Packed Data Types
            BCD
            etc...
Operators
    Separators
        Comma (,)
        Period (.)
        Line-Extension (\)
    Arithmetic
        * / + - MOD
    Bitwise
        Bitwise Logical: AND OR NOT XOR etc...
        Bit Manipulation: SHR SHL etc...
    Relational
        == < > != ! => <=
    Grouping
        Parentheses () Brackets [] Braces {} Quotes "", '' Angled-Brackets <> etc...
    Assignment
        = EQU :=
    Special
        ? etc...
Assembler Directives (TODO)
Layout and Style
    Code
        Traditional
        Linear
        Indented
        Mixed
    Comments
        Procedure Details
        Line Details
            Single-Line
            Multi-line
    Labels

Instruction Syntax
There are many kinds of assembly language that differ from one another in many ways. The assembly syntax of various 80x86 assemblers may vary somewhat but all of them are essentially subsets of the Intel Architecture assembly language. We will be using the Intel Architecture 32-bit (IA-32) assembly language syntax throughout. So, whenever we refer to assembly language, in general, it will mean we are referring to the IA-32 syntax.

An instruction in the IA-32 syntax format looks like this:

    label: mnemonic operand1, operand2, operand3 ; Comment

Example:

    mylabel: mov eax, 01 ; Copies 01 into the eax register.

where "MOV" is the mnemonic and "MOV EAX, 01" is the instruction. So, whenever we say MOV instruction, we are referring to the complete statement and not just "MOV" itself.

To keep the syntax above clean and simple, we have not used any special markup to indicate optional components, but since it is necessary to do so, we provide another version of the above syntax to make things clearer. Remember, we will be using curly braces to mark out optional parts, only in syntax definitions like this one.

    {label:} mnemonic {operand1} {, operand2} {, operand3} {; Comment}

MASM Specific
Microsoft Macro Assembler (MASM) follows this syntax closely, so all our examples that are not marked as specific to any assembler otherwise will mean they are for MASM.

HLA Specific
In general, HLA's syntax is quite a bit different than MASM's.

    label: mnemonic( operand1, operand2, operand3 ); // Comment

An Example

    mylabel: mov( 01, eax ); // Copies 01 into the eax register.

Though HLA fully supports labels like any other assembler, the use of high-level control structures usually obviates the need for such labels in actual source code. Nevertheless, for those who prefer to eschew the high-level control structures and write "low-level" assembly code, labels use the same basic syntax in HLA as in other assemblers.

Also note that HLA uses a functional syntax for instructions, treating operands as though they were parameters to a function that does the operation. Generally, you'll find that the operands are reversed (that is, the source operand is first and the destination operand is second, opposite MASM's (dest, src) organization).

HLA supports an interesting feature known as instruction composition. This allows you to specify one instruction as an operand of another, e.g.,

    mov( mov( 5, ebx ), eax ); // Copies 5 into the ebx and eax registers.

Whenever an instruction appears as the operand to another, HLA will emit the interior instruction first and then substitute the destination (second) operand in place of the interior instruction when processing the outer instruction. Generally, you won't find instruction composition used as in this example, but it is quite useful when expanding macros, when calling procedures, and in high-level control structures, e.g.,

    if( mov( i, eax ) ) then <> endif; // "true" if "i" contains a non-zero value.

FASM Specific
(todo...)

GoASM Specific

NASM Specific

Building Programs
Assembly language allows you to code programs using mnemonics, but the computer doesn't understand these. What the computer does understand is simply a sequence of high and low voltage fluctuations represented by binary digits. So, we need some kind of program to convert our code into a form the computer can understand and execute. High-level languages make use of compilers for this purpose. Assembly language, however, requires the use of an assembler, which is a sort of compiler itself. Before we begin building programs using assemblers, we will first demonstrate a few things.

1 A crude "Hello, World!"
    1.1 Using Debug
2 Resource Compilation (Windows specific)
3 Assembling

A crude "Hello, World!"
Using Debug
This "Hello, World!" example is unlike the programs that you may have written in other languages or assembly itself, because it does not use any direct function calls to print it to the screen nor does it print to the screen. It is very crude because it is only a binary file made of a series of bytes used to represent the characters in the string "Hello, World!". One important point to remember here is that a program is essentially a series of bytes. We use the command 'type' to display the contents of the resulting file. First, run DEBUG by typing 'DEBUG' at the command-prompt, and at the debug prompt (-), type as in the following:

    C:\>DEBUG
    -A
    0B18:0100 db "Hello, World!"
    0B18:010D                      ; press ENTER here
    -R CX                          ; CX = 'number' of bytes we want to output to file
    CX 0000                        ; initial value of CX
    :000D                          ; D = 13, the length of "Hello, World!", ENTER
    -N hello.bin                   ; name of the output file
    -W                             ; save the data "Hello, World!" to file
    -Q                             ; Quit DEBUG
    C:\>type hello.bin
    Hello, World!
    C:\>

'db', or define byte, is simply a directive to DEBUG to tell it to define some bytes for us.
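
The point that a program (or any file) is just a series of bytes can be checked without DEBUG. The Python sketch below writes the same thirteen bytes to a file and confirms the count that was loaded into CX; the temporary directory is an assumption made so the example does not touch the current folder.

```python
import os
import tempfile

message = b"Hello, World!"          # the bytes that 'db' placed at offset 0100

path = os.path.join(tempfile.mkdtemp(), "hello.bin")
with open(path, "wb") as output:
    output.write(message)           # same effect as DEBUG's W command

size = os.path.getsize(path)
end_offset = 0x0100 + size          # where DEBUG's next prompt address landed
print(size, hex(size), hex(end_offset))
```

Thirteen bytes is 0D in hexadecimal, which is both the value typed into CX and the distance between the two addresses DEBUG displayed (0100 and 010D).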

The entire compilation process, called building, is not a single step though. The process is basically divided into three steps: resource compiling, assembling, and linking.

Resource Compilation (Windows specific)
Resources are objects, such as strings, bitmaps, icons, video, audio and menus, that you will use in your programs. Information concerning the resources that you want your program to use is contained in a text file, called a _resource script_ (.RC). The resource compiler program reads this resource script and combines all the resources referenced in it to form a packed resource file. Generally, for the Windows platform this resource file has the extension .RES. There are resource compilers, however, that also generate Windows object files (.OBJ) instead of or in addition to resource files (GoRC is such an example). Also, utilities to convert between the two formats exist. A Windows resource file (.RES) contains a series of packed resource entries, with no headers, footers, padding, etc. On the other hand, a Windows object file (.OBJ) contains more than just resource bytes: relocation lists, symbol lists, modules, segment data, checksum bytes, etc. You do not need to worry about these to start building programs--the assemblers do all the dirty work for you. (However, that shouldn't stop you from diggin' in and building the next big assembler.) Information regarding specific resource compilers can be found a little later.

Assembling
Assembly language source code files are plain-text files, conventionally given the extension ".ASM".

Data Transfer Instructions


   

1 The MOV Opcode
2 Loading memory or register with a constant
3 So how does mov work?
4 Moving data from a memory location to another

The MOV Opcode
So far, you have read about registers and memory, but there was no mention of how to transfer data to a memory location or register. So let us embark on the journey to learn how to move data from memory to register, from register to memory, and from register to register, and how to set the value in registers and memory. Of course I will describe some tricks (i.e. some size optimization) along the way.

Loading memory or register with a constant
To load a register with a constant you do the following:

    mov eax, 10

In the above example, the opcode mov moves the constant 10 to eax. This means that the value in eax is now 10 after the instruction. To load a memory location with a constant you do the following:

    mov [memory], 10

In the above example, the opcode mov moves the constant 10 to [memory]. For most assemblers, the brackets tell the assembler that the label refers to a memory location (though some assemblers ignore them; refer to the assembler manuals for more details). This means that the value in [memory] is now 10 after the instruction.

So how does mov work?
The opcode mov works in the following method (simplified):

    mov dest, source ; dest = source

Where dest can be a register or memory and source can be a register, memory or a constant. However, do note that both the source and dest cannot be memory at the same time. You cannot use the mov opcode to copy from memory to memory directly. Moving data from a memory location to another will be discussed later. This convention might look strange to HLL coders but fear not. One will get used to it after a while. Most opcodes are in the form opcode dest, source. Of course if you do not like the above convention, you can use other assemblers like HLA or GAS. To move data at a certain memory location to a register:

    mov eax, [memory] ; or any other register

To move data from a register to a memory location:

    mov [memory], eax ; or any other register

To move data from register to register:

    mov eax, ecx ; or any other register

Moving data from a memory location to another
As mentioned above, the mov opcode cannot move data from one memory location to another memory location. Don't panic, you still can move data from a memory location to another by another method. You either temporarily copy to an available register and then copy it to the memory location, or you make use of the stack.

    mov eax, [memory]  ; or any other available register
    mov [memory2], eax

OR

    push [memory]
    pop [memory2]

Conditional Statements
Conditional statements are the "if...else...endif", "switch...", "while...wend", "do...while" etc. The statements that allow for branching (deviation to separate choices) based on conditions are called conditional branching statements. Those that allow for repetition of statements of code without rewriting code based on conditions are called conditional looping statements. However, assembly language does not natively support high-level representation of these conditional statements. Specialized instructions use the EFlags register to determine a condition and then a jump is executed based on the state. High-level constructs can be implemented for these instructions, but they are mostly either macros or incorporated assembler directives. This chapter focuses on teaching you how we implement these constructs using plain-vanilla assembly instructions.

Contents
1 The Flags
    1.1 IF statement
    1.2 FOR statement (C version)
    1.3 IF-THEN-ELSE statement
    1.4 Advanced IF statements
    1.5 SWITCH-CASE statements

The Flags
In assembler, conditional statements revolve around one thing and that is the EFLAGS register (more commonly known as the flag register). The most important flags are the carry flag (CF), zero flag (ZF), sign flag (SF), overflow flag (OF) and parity flag (PF). Another somewhat important flag is the direction flag (DF), but it is only used by string opcodes and is normally changed with cld (clear direction flag) and std (set direction flag). All the opcodes can be classified into 2 groups, the first being opcodes that modify EFLAGS, and the second those that do not. The former group can be further classified according to which flags each opcode modifies.

Opcodes relating to Conditional Statements
The opcodes most commonly seen in conditional statements in assembler are the following:

    Instruction      Description
    JMP              Unconditional jump
    Jcc              Jump if conditions met
    JCXZ/JECXZ       Jump if cx/ecx equals 0
    LOOP             Loop count
    LOOPZ/LOOPE      Loop count while zero/equal
    LOOPNZ/LOOPNE    Loop count while not zero/equal
    CMOVcc           Conditional move
    TEST             Logical compare
    CMP              Compare 2 operands

For Jcc, CMOVcc and SETcc, there is actually a whole range of opcodes. The "cc" in Jcc, CMOVcc and SETcc represents the tttn (condition test fields). Some of the condition test fields have aliases, so some differently named opcodes are actually the same (for example, JZ is the same as JE). The tttn values are as follows (listed according to their encoding):

    O (Overflow)                                          OF = 1
    NO (No overflow)                                      OF = 0
    C/B/NAE (Carry, Below, Not above or equal)            CF = 1
    NC/NB/AE (No carry, Not below, Above or equal)        CF = 0
    E/Z (Equal, Zero)                                     ZF = 1
    NE/NZ (Not equal, Not zero)                           ZF = 0
    BE/NA (Below or equal, Not above)                     CF = 1 or ZF = 1
    NBE/A (Not below or equal, Above)                     CF = 0 and ZF = 0
    S (Sign)                                              SF = 1
    NS (Not sign)                                         SF = 0
    P/PE (Parity, Parity even)                            PF = 1
    NP/PO (Not parity, Parity odd)                        PF = 0
    L/NGE (Less than, Not greater than or equal to)       SF <> OF
    NL/GE (Not less than, Greater than or equal to)       SF = OF
    LE/NG (Less than or equal to, Not greater than)       ZF = 1 or SF <> OF
    NLE/G (Not less than or equal to, Greater than)       ZF = 0 and SF = OF

One would wonder what is the difference between ja and jg. Well, the difference is that ja is jump if above (intended for unsigned numbers), while jg is jump if greater (intended for signed numbers). So the above list can be classified into conditional jumps for signed numbers, conditional jumps for unsigned numbers, and others.

Conditional Jumps for signed numbers
    JL/JNGE
    JNL/JGE
    JLE/JNG
    JNLE/JG

Conditional Jumps for unsigned numbers
    JC/JB/JNAE
    JNC/JNB/JAE
    JBE/JNA
    JNBE/JA

Others
    JO
    JNO
    JE/JZ
    JNE/JNZ
    JS
    JNS
    JP/JPE
    JNP/JPO

"JMP" is an unconditional jump. For JMP, there are 2 types of jump commonly used: one is jump near, relative, with the displacement relative to the next instruction; the other is jump near, absolute indirect, with the address given in the operand.

Jcc is almost the same as JMP, except that the jump is only taken if the condition holds (for example, for JZ label, the processor will only jump to label if ZF = 1).

JCXZ/JECXZ is a jump if cx/ecx (dependent on the opcode used) is zero. But take note that the displacement for JCXZ and JECXZ is only 1 byte, i.e. the target must be within -128 to +127 bytes of the next instruction.

The LOOP/LOOPxx instructions make use of ecx or cx as the counter. Each time a LOOP instruction is executed, ecx or cx (depending on address size) is decremented, then if counter != 0, the code will jump to the label. LOOP also has a displacement of 1 byte, so the jump must be within a displacement of -128 to +127. So in short, LOOP label is the same as the following:

    dec ecx     ; decrement counter in count register
    jnz label   ; go back to label if ecx is not zero

LOOPZ and LOOPNZ are similar to LOOP but the jump also depends on the value of the Zero Flag. For LOOPZ, the code will jump to the label if counter != 0 and the zero flag is set to 1. For LOOPNZ, the code will jump to the label if counter != 0 and the zero flag is 0 (or rather, is cleared). Do take note that Intel does not recommend LOOP/LOOPZ/LOOPNZ because they say it is a complex instruction and it would be much better to use the above code to replace LOOP.

For CMOVcc, if the condition is met, the code will move data from register to register or from memory to register (the destination must be a register). Do take note that the conditional move is only for 32-bit and 16-bit operands. Conditional move for 8-bit registers and memory is not supported. CMOVcc is only available on 686 and later, but I personally think it is more useful than SETcc.

SETcc will set the byte to 1 if the condition is met and to 0 otherwise. Please bear in mind that SETcc only accepts an 8-bit register or 8-bit memory operand and nothing else; there is no support for 32-bit or 16-bit memory or registers. Though, if you wish to generate a 32-bit result from SETcc, that can be done by zero extending the 8-bit register using the MOVZX instruction.

The CMP instruction compares the first operand with the second (source) operand and sets the status flags in the EFLAGS register according to the result. The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction, but the result itself is not stored.
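
To see how the flag conditions in the tttn list separate signed from unsigned comparisons, here is a small Python model of a 32-bit CMP. It is a sketch, not processor-accurate in every detail: only CF, ZF, SF and OF are modelled, and the helper names (`cmp_flags`, `above`, `greater`, `below`, `less`) are invented for illustration.

```python
MASK = 0xFFFFFFFF

def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement signed number."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value

def cmp_flags(a, b):
    """Model 'cmp a, b': compute a - b, keep only the flags."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "CF": int(a < b),                  # unsigned borrow
        "ZF": int(result == 0),
        "SF": result >> 31,                # top bit of the result
        "OF": int(to_signed(a) - to_signed(b) != to_signed(result)),
    }

def above(f):    # JA / JNBE, unsigned
    return f["CF"] == 0 and f["ZF"] == 0

def greater(f):  # JG / JNLE, signed
    return f["ZF"] == 0 and f["SF"] == f["OF"]

def below(f):    # JB / JC / JNAE, unsigned
    return f["CF"] == 1

def less(f):     # JL / JNGE, signed
    return f["SF"] != f["OF"]

f = cmp_flags(0xFFFFFFFF, 1)   # 4294967295 unsigned, -1 signed
print(f, above(f), greater(f))
```

With 0FFFFFFFFh in the first operand, `above` is true (4294967295 is above 1) while `greater` is false (-1 is not greater than 1): the same bits and the same CMP, read through different flag conditions.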

The TEST instruction computes the bit-wise logical AND of the first operand and the second operand and sets the SF, ZF and PF status flags according to the result. The result is then discarded.

How to implement conditional statement in assembler
Now that the more commonly used opcodes for conditional statements are introduced, let's dive into the topic itself. In this section, I will give some pseudo code and then show how it would look in assembler.

IF statement
Pseudo code:

    IF eax < 25
        //do something here
    ENDIF

Assembler:

    cmp eax, 25
    jnc _out
    ;do something here
    _out:

HLA (low-level syntax):

    cmp( eax, 25 );
    jnb _out;
    // do something here
    _out:

MASM/TASM (high-level syntax):

    .if eax < 25
        ;do something here
    .endif

HLA (high-level syntax):

    if( eax < 25 ) then
        // do something here
    endif;

Comment
I generally prefer jnc to jnb because jnc means jump if carry flag is not set as opposed to jnb which means jump if not below. (Both names assemble to the same opcode.)

When using high-level control structures like HLA's "if" and MASM's .if, you have to be careful when comparing registers against signed values. By default, most high-level assemblers assume you're using unsigned comparisons. The following rarely does what the author expects:

    if( eax > -1 ) then
        // do something
    endif;

    // equivalent low-level code:
    cmp( eax, -1 );
    jna endOfIf;
    // do something
    endOfIf:

The problem is that -1 is equivalent to $ffff_ffff (0ffffffffh) and EAX, when treated as an unsigned value, is never greater than this value (hence the expression above is always false). You'll have to explicitly tell the assembler if you want to do a signed comparison, e.g.,

    if( (type int32 eax) > -1 ) then
        // do something
    endif;

    // equivalent low-level code:
    cmp( eax, -1 );
    jng endOfIf;
    // do something
    endOfIf:

Always be aware of the differences between signed and unsigned comparisons!

Pseudo code:

    IF eax == 0
        //do something here
    ENDIF

HLA high-level syntax:

    if( !eax ) then
        // do something here
    endif;

Assembler:

    test eax, eax   ;set the flags
    jnz _out
    ;do something here
    _out:

HLA low-level syntax:

    test( eax, eax );
    jnz _out;
    // do something here
    _out:

or

    or eax, eax     ;set the flags
    jnz _out
    ;do something here
    _out:

HLA syntax:

    or( eax, eax );
    jnz _out;
    // do something here
    _out:

or

    xchg eax, ecx
    jecxz _out
    ;do something here
    _out:

HLA syntax:

    xchg( eax, ecx );
    jecxz _out;
    // do something here
    _out:

Comment
This is probably one of the more common pieces of code seen in assembler (quite a number of Windows API functions return 0 in eax on error). One reason why "cmp eax, 0" is not used is that test eax, eax and or eax, eax are shorter than the cmp. The last example is 1 byte smaller than the test and or variants because xchg eax, reg is only 1 byte. However, the drawbacks are that the displacement must be within -128 to +127, that eax and ecx are swapped, and that jecxz jumps when the value is zero, so as written the block after it runs only when eax was not 0. Call it size optimisation.

The test instruction can also be used to test whether a bit is set. For example:

Pseudo code:

    IF eax is odd
        edx++
    ENDIF

Assembler code:

    test eax, 1
    jz _even
    inc edx
    _even:

HLA syntax:

    test( 1, eax );
    jz _even;
    inc( edx );
    _even:

or

    shr eax, 1
    jnc _even
    inc edx
    _even:

HLA syntax:

    shr( 1, eax );
    jnc _even;
    inc( edx );
    _even:

or

    shr eax, 1
    adc edx, 0

HLA syntax:

    shr( 1, eax );
    adc( 0, edx );

or

    bt eax, 0
    jnc _even
    inc edx
    _even:

HLA syntax:

    bt( 0, eax );
    jnc _even;
    inc( edx );
    _even:

or

    bt eax, 0
    adc edx, 0

HLA syntax:

    bt( 0, eax );
    adc( 0, edx );

Comment
The above codes are just examples of testing for "even-ness", i.e. whether the last bit is set or not. The test version should be easy to understand; the shr versions make use of the fact that the carry flag contains the last bit shifted out; the bt versions make use of the instruction bt, which tests the bit and sets the carry flag according to whether the bit is set or not. All in all, the test and bt versions do not destroy the value in eax, but the shr versions do. If you want to preserve the value, go for the test version or the bt version.

Pseudo code:

    IF eax > 47
        edx = eax
    ENDIF

MASM syntax (high-level):

    .if eax > 47
        mov edx, eax
    .endif

HLA Syntax (high-level):

    if( eax > 47 ) then
        mov( eax, edx );
    endif;

Assembler:

    cmp eax, 47
    jna @F
    mov edx, eax
    @@:

HLA syntax:

    cmp( eax, 47 );
    jna atF;
    mov( eax, edx );
    atF:

or

    cmp eax, 47
    cmova edx, eax

HLA syntax:

    cmp( eax, 47 );
    cmova( eax, edx );

Comment
This is just an example of how the instruction CMOVcc could be used (sweet and short, huh?). It could be replaced by some conditional jumps (as shown in the first version), but mispredicted jumps take up a lot of cycles. Just do take note that the CMOVcc instructions were introduced in the P6 family processors, and may not be supported on all IA-32 processors.

mov( eax. _loopstart: mov( array2[ ecx*4 ]. ecx <= 9. cl HLA syntax: cmp( eax. ecx ). Comment This is just an example of how the instruction SETcc could be used. movzx( cl. ecx ). endfor.ecx<=9. 9 jbe _loopstart HLA syntax: xor( ecx. Assembler: xor ecx.Pseudo code: IF eax == 9 ecx = 1 ENDIF Assembler: cmp eax. eax ).ecx++){ array[ecx] = array2[ecx] } HLA high-level syntax: for( xor( ecx. array[ ecx*4 ] ). inc( ecx )) do mov( array2[ ecx*4 ]. FOR statement (C version) Pseudo code: FOR (ecx==0. . eax inc ecx cmp ecx. ecx _loopstart: mov eax. 9 ). array2[ecx*4] mov array[ecx*4]. ecx ). eax ). The movzx zero extend the value in cl (which is set depending on the value in eax) to ecx. 9 setz cl movzx ecx. setz( cl ).

    mov( eax, array[ ecx*4 ] );
    inc( ecx );
    cmp( ecx, 9 );
    jbe _loopstart;

or

    mov ecx, 9
_loopstart:
    mov eax, array2[ecx*4]
    mov array[ecx*4], eax
    dec ecx
    jns _loopstart      ; loop until ecx drops below 0, so element 0 is copied too

HLA syntax:
    mov( 9, ecx );
_loopstart:
    mov( array2[ ecx*4 ], eax );
    mov( eax, array[ ecx*4 ] );
    dec( ecx );
    jns _loopstart;

Comment
Both examples do the same thing in this context, but the second example is one instruction shorter than the other, because it counts down and lets dec set the flags. Also in both examples, ecx is used as the counter. This need not be the case; you can use any of the other registers as the counter.

IF-THEN-ELSE statement

Pseudo code:
    IF (ecx<eax)
        edx = 8
    ELSE
        edx = 16
    ENDIF

Assembler:
    cmp ecx, eax
    sbb edx, edx        ; edx = -1 if ecx < eax (carry set), else 0
    and edx, -8         ; edx = -8 or 0
    add edx, 16         ; edx = 8 or 16

HLA syntax:
    cmp( ecx, eax );
    sbb( edx, edx );
    and( -8, edx );
    add( 16, edx );

or

    cmp ecx, eax
    jnc @F
    mov edx, 8
    jmp _done
@@:
    mov edx, 16
_done:

HLA syntax:
    cmp( ecx, eax );
    jnb notLessThan;
    mov( 8, edx );
    jmp endOfIF;
notLessThan:
    mov( 16, edx );
endOfIF:

Comment
Personally I prefer the first code because there is no conditional jump. Instead the carry flag is used to set edx to -1 or 0. The and instruction then masks that down to a constant or 0. Then finally the add instruction fixes the number to the desired number.

Advanced IF statements

Pseudo code:
    IF EAX>="0" && EAX<="9"
        ; do something
    ENDIF

HLA high-level syntax:
    if( eax >= '0' && eax <= '9' ) then
        // do something
    endif;

also:
    if( eax in '0'..'9' ) then
        // do something
    endif;

Assembler:
    cmp eax, "0"
    jc @F
    cmp eax, "9"
    ja @F
    ; do something
@@:

HLA low-level syntax:
    cmp( eax, '0' );
    jnae atF;
    cmp( eax, '9' );
    jnbe atF;
    // do something
atF:

or

    lea ecx, [eax-"0"]
    cmp ecx, "9"-"0"
    ja @F
    ; do something
@@:

HLA syntax:
    lea( ecx, [eax-@byte('0')] );
    cmp( ecx, 9 );
    jnbe atF;
    // do something
atF:

or

    sub( '0', eax );    // or xor( '0', eax ); both overwrite eax
    cmp( eax, 9 );
    jnbe atF;
    // do something
atF:

Comment
The second code makes use of one more register, but there is only one conditional jump, while the first code makes use of two conditional jumps. Generally the best optimised code is when you do not need conditional jumps at all. Kudos to Nexo for coming up with the second code.

SWITCH-CASE statements

Pseudo code:
    SWITCH eax
    case 0:
        mov ecx, 7
        break
    case 1:
        mov edx, 8
        break
    case 2:
        mov ecx, 9
        break
    case 3:

        mov edx, 10
        break
    case 4:
        mov ecx, 11
        break
    default:
        or ecx, -1
        break
    END SWITCH

HLA switch macro (from the HLA standard library):
    switch( eax )
        case( 0 )
            mov( 7, ecx );
        case( 1 )
            mov( 8, edx );
        case( 2 )
            mov( 9, ecx );
        case( 3 )
            mov( 10, edx );
        case( 4 )
            mov( 11, ecx );
        default
            or( -1, ecx );
    endswitch;

Assembler:
    .data
    jmptable dd offset _0, offset _1, offset _2, offset _3, offset _4

    .code
    cmp eax, 4
    ja _default
    jmp jmptable[eax*4]
_0: mov ecx, 7
    jmp @F
_1: mov edx, 8
    jmp @F
_2: mov ecx, 9
    jmp @F
_3: mov edx, 10
    jmp @F
_4: mov ecx, 11
    jmp @F

_default:
    or ecx, -1
@@:

HLA Syntax:
    static
        jmptable: dword[] := [ &_0, &_1, &_2, &_3, &_4 ];
    endstatic;

    cmp( eax, 4 );
    ja _default;
    jmp( jmptable[eax*4] );
_0: mov( 7, ecx );
    jmp atF;
_1: mov( 8, edx );
    jmp atF;
_2: mov( 9, ecx );
    jmp atF;
_3: mov( 10, edx );
    jmp atF;
_4: mov( 11, ecx );
    jmp atF;
_default:
    or( -1, ecx );
atF:

Comment
The above example just introduces you to the idea of a jump table. One may wonder why there is only one conditional jump to test whether the value is within the range. This is because negative numbers are "bigger" than normal numbers when we use an unsigned comparison (which is also the concept behind Nexo's code in the previous example).

Bit Operations

Bitwise operations

There are certain operations that you can use on individual bits or groups of bits. These operations are called bitwise operations and the operators we use to perform them are called bitwise operators.

There is a clear distinction between bitwise operators and bitwise instructions. Bitwise operators are symbolic operators that we shall use for clarifying operational concepts. Bitwise instructions are the equivalent machine instructions provided by the microprocessor to simulate these operators. The logical bitwise instructions for 80386 and higher microprocessors are NOT, AND, OR, XOR, TEST, BT, BTC, BTR, BTS, BSF, BSR, and SETcc. These aid in manipulating bit values. Additional instructions, called the bit manipulation instructions, that aid in moving bits include SHL, SHR, SAL, SAR, SHLD, SHRD, ROL, ROR, RCL, and RCR. Since their uses are important to every programmer, a good grasp of the bitwise operations, operators, and their machine-equivalent instructions is essential to learning ASM.

WHY USE BITWISE OPERATORS

There are several reasons why we need to or should use bitwise operators. One of the most popular ones is: most programmers prefer using the individual bits of a variable as boolean options for their code rather than using separate boolean variables, as shown in the example (not asm code) below. This saves memory space and speeds up the application program. Instead of using separate variables like this

    bShowAmbientLight = TRUE
    bNoSound = FALSE
    bNoVideo = TRUE
    bNoSeek = TRUE
    ...

we could create a variable dwCodeOptions and assign options to its bits like this

    bit 0 to bShowAmbientLight
    bit 1 to bNoSound
    bit 2 to bNoVideo
    bit 3 to bNoSeek
    and so on.

So, whenever bit 2 in dwCodeOptions is set to 1, the video preview will not be shown. If it were cleared to 0, then videos will be enabled. The same applies for the other options.
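To make the dwCodeOptions idea concrete, here is a minimal C sketch (C rather than assembly, so it can be compiled and checked anywhere). The bit positions follow the list above; the OPT_* names and the two helper functions are hypothetical, invented only for this illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* One mask per option; bit numbers follow the list in the text.
   The OPT_* names are made up for this sketch. */
#define OPT_SHOW_AMBIENT_LIGHT (1u << 0)
#define OPT_NO_SOUND           (1u << 1)
#define OPT_NO_VIDEO           (1u << 2)
#define OPT_NO_SEEK            (1u << 3)

/* Packs the four booleans from the text (TRUE, FALSE, TRUE, TRUE)
   into a single 32-bit dwCodeOptions value. */
uint32_t default_options(void)
{
    return OPT_SHOW_AMBIENT_LIGHT | OPT_NO_VIDEO | OPT_NO_SEEK;
}

/* The video preview is shown only while bit 2 (bNoVideo) is clear. */
bool video_enabled(uint32_t dwCodeOptions)
{
    return (dwCodeOptions & OPT_NO_VIDEO) == 0;
}
```

Four separate variables collapse into one dword here, and testing an option is a single AND against its mask.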

Number of Options

The number of bits you can use as boolean options in your code is limited by the size of the variable you use for storing them. If you create a 32-bit (DWORD-sized) variable, then 32 bits are available for use as boolean options. Creating a 16-bit (WORD-sized) variable will only allow for 16 bits to be used. Similarly, 8-bit (BYTE-sized) variables will restrict your options to 8 bits.

Setting, clearing, and toggling bits

Setting a bit means changing its value to 1, while clearing a bit means changing its value to 0. By toggling a bit, we mean changing its current boolean value to the other boolean value. If it is currently 1, then toggling it will change it to 0, and vice versa. Knowing these terms is quite essential for understanding how to apply the operators discussed below.

Bit Masks

A bit mask is a temporary value that we use while setting, clearing, and toggling bits. As an example, see the one discussed under the AND Operator heading.

THE LOGICAL OPERATORS

NOT bitwise operator

NOT is probably the easiest bitwise operator to understand. NOTing a value results in changing it to its opposite: NOT toggles values. If you NOT 1, you get 0. If you NOT 0, you get 1.

    NOT(0) = 1 ; NOT changes "I am not going" to "I am going"
    NOT(1) = 0 ; NOT changes "I am going" to "I am not going"

The NOT operator is a unary operator, meaning that it takes only one operand. The parentheses only suggest that NOT is unary. It should be noted, however, that NOT toggles the values of all the bits its operand contains. If you want to toggle the values of specific bits in an operand directly, you can use the XOR operator instead, as discussed a little later.

Example:

    NOT 1100 1011b
    --------------
      = 0011 0100b

    NOT 0101 0111b
    --------------
      = 1010 1000b
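To see setting, clearing and toggling in action, here is a small C sketch on 8-bit values. It uses C's |, &, ^ and ~ operators, which correspond to the OR, AND, XOR and NOT operators of this chapter. The function names are hypothetical, and the bit number n is assumed to be between 0 and 7.

```c
#include <stdint.h>

/* Each helper builds a one-bit mask (1 << n) and applies one operator.
   n is assumed to be in the range 0..7. */
uint8_t set_bit(uint8_t v, unsigned n)    { return (uint8_t)(v |  (1u << n)); }
uint8_t clear_bit(uint8_t v, unsigned n)  { return (uint8_t)(v & ~(1u << n)); }
uint8_t toggle_bit(uint8_t v, unsigned n) { return (uint8_t)(v ^  (1u << n)); }

/* NOT on a byte: every one of the 8 bits is flipped. */
uint8_t not8(uint8_t v) { return (uint8_t)~v; }
```

With these, not8(0xCB) reproduces the first NOT example above (1100 1011b becomes 0011 0100b, i.e. 0x34).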

Assembly Syntax

    MASM: NOT reg/mem
    HLA:  NOT( reg/mem );

Use the NOT instruction to:
* Toggle all the bits in a value to get its 1's complement.

Don't do this:
* NOT is not equal to NEG. Never confuse the NOT instruction with the NEG instruction in assembly language. NOTing a value doesn't negate it but gives us the 1's complement (all bits flipped). The 2's complement notation is used to represent negative numbers by x86-based systems. To get the 2's complement use the NEG instruction or add 1 to the 1's complement. The relationship between the two operations can be shown as below:

    NOT a = NEG a - 1
    or
    NEG a = NOT a + 1
    => 2's complement = 1's complement + 1

    NEG(0101 1100b) = NOT(0101 1100b) + 1

AND BITWISE OPERATOR

AND is quite simple to understand, in that when you AND a bit (one of two states) with another, if one of the bits is 0, the result would be zero. However if both bits are 1, the result would be 1. "To result in TRUE, AND requires that BOTH of its operands be TRUE."

    0 AND 0 = 0 ; If both Mary AND Julia are not going to the party, then I am not going.
    0 AND 1 = 0 ; If Mary is not ready to go AND Julia is, then I am not going.
    1 AND 0 = 0 ; If Mary is ready to go AND Julia is not, then I am also not going.
    1 AND 1 = 1 ; If both Mary AND Julia are going, then I am going.

reg. reg/mem AND reg/mem. 41h). const HLA: AND( AND( AND( AND( src. reg ). Each uppercase letter value has a 0 as its 5th bit. clears it. which is 'A' OR bitwise operator . To convert 'a' (0110 0001b. ANDing a bit with 0. 0110 0001b AND 1101 1111b -------------= 0100 0001 . so that it ANDs with the bit 5 (0) in 'a' to result in 'A'. src AND reg/mem. 'a' . 61h) to 'A' (0100 0001b. The lowercase letters have binary representations wherein the 5th bit of any value is a 1. bit mask with bit 5 = 0 . const. We use a temporary value called a bit mask for this purpose. reg/mem. dest ). reg AND reg. Consider the conversion of lowercase letters to uppercase. Use the AND operator to:  Clear bits. reg/mem). reg/mem ).Example: 0101 1110b AND 1010 1010b ------------= 0000 1010b 1011 1110b AND 0001 0101b ------------= 0001 0100b Assembly Syntax MASM: AND dest. we use a bit mask with bit 5 as a 0 and the rest as 1s.

OR is somewhat the opposite of AND. When you OR a bit with another, if one of the bits is 1, the result will be 1. However if both are 0, the result would be 0. "To result in TRUE, OR requires that ONE OR BOTH of its operands be TRUE."

    0 OR 0 = 0
    0 OR 1 = 1
    1 OR 0 = 1
    1 OR 1 = 1

Example:

       11100101b
    OR 01010111b
    ------------
       11110111b

       00000100b
    OR 10101010b
    ------------
       10101110b

Assembly Syntax

    MASM:
        OR dest, src
        OR reg/mem, reg
        OR reg, reg/mem
        OR reg/mem, const
    HLA:
        OR( src, dest );
        OR( reg, reg/mem );
        OR( reg/mem, reg );
        OR( const, reg/mem );

Use the OR instruction/operator to:
* Set a bit. ORing a bit with 1 will set it. You'll use this operator very much when programming Windows. For example, when setting options for Window styles you use the OR operator. For more information, see the Windows ASM volume.
* Compare the value in a register to 0. Although this is another little trick, it may prove useful and you may also see it used sometimes. It works because of the way the OR instruction affects the bits in the flags register. For example

    MASM
        OR eax, eax      ; equivalent to cmp eax, 0
    HLA
        OR( eax, eax );  // equivalent to cmp eax, 0

This instruction is equivalent to "cmp eax, 0", but since it produces shorter code, it is preferable to use it.

    cmp eax, 0 ; compiles into 4 bytes ; 66 83 F8 00
    or eax, eax ; compiles into 3 bytes ; 66 0B C0

(The 66h byte is the operand-size prefix that shows up when these instructions are assembled in a 16-bit segment. In 32-bit code each encoding is one byte shorter, and the OR version is still the shorter one.)

"cmp eax, 0" compares eax with 0 and sets the value of the Zero Flag bit in the flags register accordingly. You can then use a ZF-testing conditional jump (JNZ, JZ) to proceed with control flow.

XOR bitwise operator

XOR, or exclusive OR, is a binary operator like OR but with a slight difference. Frame what is said below, hang it on your favorite wall, and don't ever ask again. "To result in TRUE, XOR requires that ONLY ONE of its two operands be TRUE." So, when you XOR two bits that are both valued 1, the result will ALWAYS be 0. XOR is used in some encryption technology such as XOR encryption (XOR encryption is weak) and for the XOR swap. XOR allows you to swap the contents of two variables without using a third variable; how we can do that is shown in actual code a little later.

    0 XOR 0 = 0
    0 XOR 1 = 1
    1 XOR 0 = 1
    1 XOR 1 = 0

Example

        0011 1000b
    XOR 0011 1000b
    --------------
      = 0000 0000b

        0101 0101b
    XOR 1111 1000b
    --------------
      = 1010 1101b

Assembly Syntax

    MASM:
        XOR dest, src
        XOR reg/mem, reg
        XOR reg, reg/mem
        XOR reg/mem, const
    HLA:
        XOR( src, dest );
        XOR( reg, reg/mem );
        XOR( reg/mem, reg );
        XOR( const, reg/mem );

Use the XOR instruction to:
* Swap values. The x86 assembly XOR instruction allows us to swap two values without using a third placeholder. For example, to swap the values of the microprocessor registers eax and edx, you use the following XOR instructions.

    XOR eax, edx
    XOR edx, eax
    XOR eax, edx

Although this is a neat trick, the x86 already provides an XCHG instruction for swapping registers. This instruction is covered in the Data Transfer Instructions chapter. The XCHG instruction, when you supply a memory operand, has an implicit LOCK associated with it, which slows the execution of the instruction by a tremendous amount. Using XOR is a possibility when you don't want a bus lock associated with the XCHG operation. Note also that the XOR instruction affects the x86 flags, whereas the XCHG instruction does not.

* Toggle bits in a value. To toggle specific bits, you use a bit mask wherein the toggling bits should be valued 1, while the others should be 0. XORing the operand with -1 does exactly what NOTing it does. To toggle all the bits in 1000 1011b (to 0111 0100b) you can use NOT or you can XOR the value with -1 (1111 1111b). For example, to toggle bit 5 and bit 3 in 1011 0101b, we use 0010 1000b as the bit mask.

        1011 0101b
    XOR 0010 1000b

    --------------
      = 1001 1101b ; operator only, not code

In code the value sits in a register or in memory, so with al = 10110101b you would code it like this:

    XOR al, 00101000b

* Clear a register to 0. XORing a register with itself will clear it to 0. Here's an example

    XOR eax, eax ; same as "MOV eax, 0"

The above instruction has "MOV eax, 0" as its equivalent: it has the same effect as assigning 0 to the register. However, "xor eax, eax" is generally used (mostly to return 0 in conjunction with a RET instruction). The reason it is used is that reg32 instructions are a little faster and much smaller. For example, "XOR eax,eax" compiles as 66 33 C0 while "mov eax,0" compiles as 66 B8 00 00 00 00. It occupies only 3 bytes as opposed to the 6 of the MOV version. (The 66h operand-size prefix is there because these were assembled in a 16-bit segment; in 32-bit code the sizes are 2 and 5 bytes.) The advantage to smaller instructions is that you can hold more in the instruction cache, speeding up program execution. Also, they require fewer memory fetches, which slow down programs as well.

* Detect errors and compute parity. As interesting as it is, you can use the XOR operator to perform a very subtle operation that may not be very self-evident at first but is truly another marvel. Tricky XOR can "undo" itself. Simply stated: if you have N values stored, you use these values to calculate an extra value (parity information) so that the number of values stored now becomes N + 1. If you happen to lose ANY ONE of the N + 1 values, you can recalculate it using the remaining N. Exactly how it is done is explained a little later, but first, let's answer: "What is parity?" Parity is "the state of being odd or even used as the basis of a method of detecting errors in binary-coded data" according to the Merriam-Webster Dictionary. Parity information is simply redundant information that is calculated from an actual set of values. Parity is calculated by first counting the number of 1s in a unit of binary data, and then adding a 0 or a 1 (parity value) to make the count odd or even. Parity can be odd or even. In even parity the total number of 1s (including the parity value) should be even. In odd parity, the total number of 1s (including the parity value) should be odd.
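Both ideas, the parity bit and XOR "undoing" itself, fit in a few lines of C. This is only a sketch under simple assumptions: values are 32-bit, the parity value is the plain XOR of all N values, and the function names are invented for the illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Even-parity bit for one byte: 1 when the byte holds an odd number of
   1s, so that data bits plus parity bit contain an even number of 1s. */
unsigned even_parity_bit(uint8_t v)
{
    unsigned ones = 0;
    for (unsigned i = 0; i < 8; i++)
        ones += (v >> i) & 1u;
    return ones & 1u;
}

/* The extra (N+1)th value: the XOR of all n stored values. */
uint32_t xor_parity(const uint32_t *values, size_t n)
{
    uint32_t p = 0;
    for (size_t i = 0; i < n; i++)
        p ^= values[i];
    return p;
}

/* Rebuilds the value at index `lost` from the parity value and the
   surviving values. Every survivor cancels itself out of the parity,
   which leaves exactly the missing value. */
uint32_t recover(const uint32_t *values, size_t n, size_t lost, uint32_t parity)
{
    uint32_t v = parity;
    for (size_t i = 0; i < n; i++)
        if (i != lost)
            v ^= values[i];
    return v;
}
```

The recovery works because x XOR x is 0 for any x, so XORing the parity value with every surviving value strips them away one by one. The odd-parity bit is simply the inverse of even_parity_bit.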

Example: Consider the bits in 1001b:

1. 1001 has 2 1-bits.
2. We now add a parity bit to 1001.
   a. For even parity, the number of 1s (incl. parity bit) should be even. So, if we want even parity, the parity bit for 1001 in this case is 0, resulting in:
          1001 0   <--- parity bit = 0
   b. For odd parity, the number of 1s (incl. parity bit) should be odd. So, if we want odd parity, the parity bit for 1001 in this case is 1, resulting in:
          1001 1   <--- parity bit = 1

Take another example, 1101b:

1. 1101 has 3 1-bits.
2. Add a parity bit:
   a. For even parity:
          1101 1   <--- parity bit = 1
   b. For odd parity:
          1101 0   <--- parity bit = 0

SHR bitwise operator

SHR means shift right. Taking a number as a binary, when SHR n is performed on it, all the bits move right by n places. The use for shift right is that when you shift an unsigned number right by n, it is like dividing the number by 2^n (the remainder is thrown away). Dividing on a computer is much slower than using shifts. Thus, it is better to replace divides by powers of two with shift rights.

Example:

    01011010b SHR 3 = 00001011b
    90 SHR 3 = 11

    10101110b SHR 2 = 00101011b
    174 SHR 2 = 43
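The two SHR examples above can be checked with a one-line C model of the instruction. It assumes a 32-bit operand; the function name shr is just a label for this sketch. The count is masked to 5 bits because that is what the 80386 and later processors do with shift counts on 32-bit operands.

```c
#include <stdint.h>

/* Logical shift right, as SHR does it: bits move toward bit 0 and the
   vacated high bits are filled with 0. The count is masked to 5 bits,
   mirroring how 386+ processors treat shift counts on 32-bit operands. */
uint32_t shr(uint32_t value, unsigned n)
{
    return value >> (n & 31u);
}
```

shr(90, 3) yields 11 and shr(174, 2) yields 43, the same as the unsigned divisions 90 / 8 and 174 / 4 with the remainder dropped. For signed values, the arithmetic shift SAR from the instruction list earlier in this chapter is the usual counterpart, since SHR would shift a 0 into the sign bit.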

SHR( const. const SHL reg/mem. cnt SHL reg/mem. reg/mem ). reg/mem ). CL HLA: SHR( cnt. and when SHL n is performed on it. SHL( const. Thus. dest ).Assembly Syntax MASM: SHR dest. Multiplication on a computer is much slower than using shifts. it is better to replace multiplication with shift lefts. Addressing Modes Contents [hide]  1 Building memory addresses . reg/mem ). SHR( CL. Example: 10010101b SHL 2 = 1001010100b 149 SHL 2 = 596 00011110b SHL 3 = 11110000b 30 SHL 3 = 240 Assembly Syntax MASM: SHL dest. cnt SHR reg/mem. dest ). The uses for shift left are that when you shift left a number by n. all the bits in the number would move left by n times. it is like multiplying the number by 2^n. Taking a number as a binary. CL HLA: SHL( cnt. const SHR reg/mem. SHL bitwise operator SHL means shift left. SHL( CL. reg/mem ).

    2 Varying the address
    3 Scaling the index
    4 The CPU doesn't care
    5 Doing it in assembly
        5.1 Constant (static) base only -- this is also known as "direct addressing"
        5.2 Constant (static) base + constant displacement -- this is also "direct addressing"
        5.3 Constant (static) base + scaled index
        5.4 Constant (static) base + double indexing
        5.5 Variable base only -- this is also known as "indirect addressing"
        5.6 Variable base + constant displacement -- this is also known as "based addressing"
        5.7 Variable base + scaled index
        5.8 Variable base + scaled index + constant displacement

Building memory addresses

We often work with blocks of data spanning several memory addresses. For example, a dword is stored in memory as four consecutive bytes. We pick one of the addresses as a reference point - the base address. This is usually the lowest address of the data block. (For an example where this is not true, see the stack frame in The Stack.) If we have a large data block (say a 20-byte data structure), but only need to look at a few bytes of data embedded in it, then we can locate the data by adding an offset or a displacement to the base address. Thus the calculated address (or effective address) is the sum of a base address and a displacement.

Varying the address

Both the base address and displacement can be constant. The assembler will combine these two into a single value, the direct address. Either the base or the displacement (or both) can be varied. We do this on the x86 by loading a register with the variable part. A register loaded with a base address acts as a base register. This is the basis for indirect and based addressing. A register loaded with a displacement is often called an index register. This is the basis for array or indexed addressing.
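The "base address plus displacement" idea maps directly onto pointer arithmetic. Here is a minimal C sketch of it, using a 20-byte block like the one mentioned above; the function names are hypothetical and the block is just an ordinary byte array.

```c
#include <stdint.h>
#include <string.h>

/* Effective address = base address + displacement (both in bytes). */
const uint8_t *effective_address(const uint8_t *base, size_t displacement)
{
    return base + displacement;
}

/* Fetches the dword (four consecutive bytes) stored at
   base + displacement, much like "mov eax, [ebx+8]" would with the
   base in ebx. memcpy is used so the read is valid at any offset. */
uint32_t load_dword(const uint8_t *base, size_t displacement)
{
    uint32_t value;
    memcpy(&value, effective_address(base, displacement), sizeof value);
    return value;
}
```

Whether base and displacement are constants or come from registers makes no difference to the sum; it only decides which addressing mode the assembler has to encode.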

Scaling the index

In a HLL, an array index with a value n is used to access the n-th array item (or element). At the machine level, we need to convert this index into a displacement. We do this by multiplying the index by the item size. This computation is called scaling. The x86 has the built-in capacity to scale the value of one register (by 1, 2, 4, or 8) before computing the effective address. (See All About Arrays for added details.)

The CPU doesn't care

The x86 is capable of adding together three numbers (one constant, two variable) to create an address. The CPU doesn't care which number is the base address. It only cares that the final value is a valid address. And in the case of the LEA instruction, the address doesn't need to be valid at all. The last case is the reason you may see code that performs non-address arithmetic with the LEA instruction.

Doing it in assembly

Constant (static) base only -- this is also known as "direct addressing"

    MASM: mov eax, dword_data
    FASM: mov eax, [dword_data]
    HLA:  mov( dword_data, eax );

Constant (static) base + constant displacement -- this is also "direct addressing"

    MASM: mov eax, dword_data+4
    FASM: mov eax, [dword_data+4]
    HLA:  mov( dword_data[4], eax );

Constant (static) base + scaled index

    MASM: mov eax, dword_data[ecx*4]
    FASM: mov eax, [dword_data+ecx*4]
    HLA:  mov( dword_data[ecx*4], eax );

Constant (static) base + double indexing

    MASM: mov eax, dword_data[ebx+ecx*4]

    FASM: mov eax, [dword_data+ebx+ecx*4]
    HLA:  mov( dword_data[ebx+ecx*4], eax );

Variable base only -- this is also known as "indirect addressing"

    MASM/FASM: mov eax, [ebx]
    HLA:       mov( [ebx], eax );

Variable base + constant displacement -- this is also known as "based addressing"

    MASM: mov eax, 4[ebx]
          mov eax, [ebx+4]
    FASM: mov eax, [ebx+4]
    HLA:  mov( [ebx+4], eax );

Variable base + scaled index

    MASM/FASM: mov eax, [ebx+ecx*4]
    HLA:       mov( [ebx+ecx*4], eax );

Variable base + scaled index + constant displacement

    MASM: mov eax, 24[ebx+ecx*4]
          mov eax, [ebx+ecx*4+24]
    FASM: mov eax, [ebx+ecx*4+24]
    HLA:  mov( [ebx+ecx*4+24], eax );

Now compare these two:

    A.  xor eax, eax
        mov eax, [ebx*4+eax]
    or
    B.  mov eax, [ebx*4]

Most people will think that A will be longer in size, but in fact that is wrong. A is 5 bytes (xor eax, eax = 33C0, mov eax, [ebx*4+eax] = 8B0498) while B is 7 bytes (mov eax, [ebx*4] = 8B049D 00000000). This is

because when a SIB byte is encoded without a base register (a scaled index on its own), a 4-byte displacement always has to follow, even if it is zero. For more about it you have to learn the opcode format.

The Stack

Stack

While programming in assembly language, you will often need to save the contents of a register to free it for other purposes. You could either copy the contents of the register to another available register or to a memory location reserved for such a purpose. An array of such reserved memory locations is called a stack. It is used for a number of things but mainly for local variables. It is important to make sure that everything that you push onto the stack is popped off it as well. This is called balancing the stack and if it is not done your program will crash in short order.

The stack is a linear data structure: an array of allocatable locations in memory. Memory allocations and deallocations in the stack occur on a last-in-first-out (LIFO) basis. The first data to come into the stack becomes the last one to go out. You can think of the stack as a closed box of fixed size with only one of the sides (top) open. Because the size of the box is fixed, you can put only an allowable number of CDs inside. If the box can contain 16 CDs, you can't put a 17th CD inside. Also, the last one you put in will be the first one to be picked out. So, the one that you put in first will always be the last one to come out. This simply means that you'll have to pick out the CDs in the reverse order of placement. So, whenever you PUSH some data into the stack, remember to POP it out in the reverse order. For example, say you PUSHed in

    1 2 3 4

one after another. You'll have to POP them out in the reverse

    4 3 2 1

The structure of the stack

In the early DOS times, the .COM (COre iMage) executable format was limited to 65,536 bytes, and saving room was high priority. The code, data, and the stack had to be fit into this tiny space. Code and data occupied space downward, and the stack was organized backward, from byte 0FFFF. The code was usually placed first, from byte 0100 (origin), then the static data, then what we

call the virtual data, eventually growing upward, while the stack was placed proceeding backward, eventually growing down. Stacks in executables are still placed this way. (TODO)

Consider a stack with a capacity of 16 bytes. It can hold only 16 bytes, or 8 words, or 4 dwords of temporary data. A pointer called the stack base (SB) points to the first entry that you put into the stack. The offset address to the most recently allocated stack location, called the top of the stack (TOS), is held in another register called the stack pointer (SP). The 16-bit SP register is used only in the 16-bit environment; in a 32-bit environment, the 32-bit extended stack pointer (ESP) register is used to point to the TOS. Adding an entry into the stack fills the next allocatable location (lower memory address) and changes (decrements) the stack pointer (SP) to point to that location. Removing an entry empties the current location and changes (increments) the stack pointer to point to the previous allocated location (higher memory address).

* Each entry pushed into the stack makes the allocated section of the stack grow downward (toward lower memory addresses) and decreases SP by unit size (4 bytes for 32-bit, 2 for 16-bit; the x86 cannot push a single byte).
* Each entry popped out shortens the allocated section of the stack upward (toward higher memory addresses) and increments SP by unit size.

Comment
To picture a stack growing or shrinking, think of 'icicles'. Just as icicles grow from top to bottom, so does the stack.

Caution: The stack is a fixed number of memory locations for temporary use. Allocations and deallocations made in a stack don't make the stack grow or shorten. Only the number of allocations made increases or decreases. The data in a stack doesn't move and shouldn't move; data moving or a stack growing or shortening defeats its purpose. A variable stack or a corrupted stack is no reliable place to save data, and saving data is, after all, what the stack is primarily used for. If you try to allocate memory any further in the stack after it is full, a stack overflow occurs. The most recent data that you tried to push into the stack is lost.

There can be many stacks present at a time in memory, but there's only one current stack. Each stack is located in memory in its own segment to avoid overwriting other parts of memory. The current stack segment is pointed to by the stack segment (SS) register.

Example:
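A 16-byte stack like the one described above can be modelled in a few lines of C. This is a sketch, not how the CPU implements it: the Stack type and the function names are invented for the illustration, sp is an offset into the array rather than a real address, and the overflow and underflow cases are caught with assert instead of losing data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 16u     /* room for 4 dwords, as in the text */

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t sp;            /* offset of the top of stack; STACK_BYTES when empty */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_BYTES; }

/* PUSH: decrement SP by 4, then store the dword at the new top of stack. */
void push_dword(Stack *s, uint32_t value)
{
    assert(s->sp >= 4);                 /* a full stack would overflow here */
    s->sp -= 4;
    memcpy(&s->mem[s->sp], &value, 4);
}

/* POP: load the dword at the top of stack, then increment SP by 4. */
uint32_t pop_dword(Stack *s)
{
    uint32_t value;
    assert(s->sp <= STACK_BYTES - 4);   /* popping an empty stack is a bug */
    memcpy(&value, &s->mem[s->sp], 4);
    s->sp += 4;
    return value;
}
```

Pushing 1, 2, 3, 4 leaves sp at 0 (the stack is full), and four pops return 4, 3, 2, 1 while sp climbs back to 16. The array itself never moves or changes size; only sp does.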

Using the stack

We use the PUSH instruction to put data into the next allocatable stack location and the POP instruction to remove a data entry from the current stack location (SP). A common question is: "Where is the stack located?" The stack is located in memory and is reserved for use by your program. Besides local variables there are two other very important functions of the stack. The first is to pass data to procedures including Windows API functions. For example, the following INVOKE call

    INVOKE SendMessage, [hWnd], WM_CLOSE, 0, 0

is actually assembled as follows

    push 0
    push 0
    push WM_CLOSE
    push [hWnd]
    call [SendMessage]

The parameters are pushed in reverse order because Windows uses the STDCALL convention, in which parameters are pushed from right to left and the called function balances the stack for you on return. For calling conventions that you can use in Windows, refer to Win CallingConventions.

The second function of the stack is to hold the return addresses of procedures. When you call a procedure, the address of the instruction following the call is pushed onto the stack, then the program jumps to the procedure. When the CPU encounters a RET it will pop the return address off the stack and jump back to that address. It is, therefore, critical to balance the stack: if the return address is not where it is expected to be on RET the program will crash. For the most part, if you do not manipulate the stack directly, you never have to worry about this, but you will find that the stack is a very powerful tool and eventually you will need to keep these things in mind.

esp and ebp are stack-related pointers. esp is the stack pointer, while ebp is the base pointer. When you enter/step into most functions, usually a stack frame would be created, and you can use ebp to access parameters and local variables. However functions can be created without creating a stack frame.

An important point to note is that the stack should be aligned to DWORD (align to 4); if not, it may raise a general protection fault (or simply GPF), and Windows NT is extremely touchy about stack alignment issues. According to fodder, one of our friends on the Win32 ASM board, "Misaligned stack doesn't necessarily give GPF, but it does have weird effects ... and it's true that especially NT is very picky."

Example (MASM):

    .386
    .model flat, stdcall
    option casemap:none

    include /masm32/include/user32.inc
    include /masm32/include/kernel32.inc

    includelib /masm32/lib/user32.lib
    includelib /masm32/lib/kernel32.lib

    .code
    start:
        jmp @F
    testing db "Stack needs to be aligned to dword", 0
    @@:
        sub esp, 2      ; remove dword align to crash app on Windows NT
        invoke MessageBox, 0, OFFSET testing, 0, 0
        invoke ExitProcess, 0
    end start

Now, more about stacks and its related opcodes. The most common opcodes related to the stack are 'push' and 'pop'. The usage for push is something like push eax, as in you push the data in eax to the stack. The esp (which holds the pointer to the stack) is then decremented by the size of the data you pushed onto the stack. Similarly, when you pop eax, the data on the stack is moved to eax. The esp is then incremented by the size of the data moved from the stack.

Example:

    push eax    ; push decrements esp by the operand size in bytes
                ; => sub esp, 4
                ;    mov [esp], eax
    pop eax     ; pop increments esp by the operand size in bytes
                ; => mov eax, [esp]
                ;    add esp, 4

However, push saves bytes, while mov is much faster than pushes and pops since it requires fewer clock cycles to execute. Thus some members (stryker/arkane) at the forums (win32asm) have come up with the xcall macro, which is supposed to be faster than invoke, as it replaces all the pushes with mov and sub (note, however, that the result is much larger than if you use push instructions). Of course there are some limitations, which are that the macro cannot handle direct memory operands and cannot handle BYTE, WORD, QWORD, or TBYTE size parameters.

    ; stryker xcall MACRO
    ; gfalen @str MACRO
    @str MACRO _str:VARARG
        LOCAL @@1
        IF @InStr(1, <_str>, <!"> )
            .DATA
            @@1 DB _str, 0
            .CODE
            EXITM <OFFSET @@1>
        ELSE
            EXITM <_str>
        ENDIF
    ENDM

    xcall MACRO function, parameters:VARARG

        LOCAL psize, paddr, plen
        IFNB <parameters>
            psize = 0
            FOR param, <parameters>
                psize = psize + 4
            ENDM
            IF psize EQ 4
                push parameters
            ELSE
                sub esp, psize
                psize = 0
                FOR param, <parameters>
                    IF @SizeStr(<param>) GT 4
                        paddr SUBSTR <param>, 1, 5
                        IFIDNI paddr, <ADDR >
                            paddr SUBSTR <param>, 6, @SizeStr(<param>) - 5
                            lea eax, paddr
                            mov DWORD PTR [esp+psize*4], eax
                        ELSE
                            mov DWORD PTR [esp+psize*4], @str(<param>)
                        ENDIF
                    ELSE
                        mov DWORD PTR [esp+psize*4], @str(<param>)
                    ENDIF
                    psize = psize + 1
                ENDM
            ENDIF
        ENDIF
        call function
    ENDM

The uses of push and pop are to store data temporarily (store data on the stack) and to pass parameters (pop is not used for that, though). There are some opcodes that help to store and later restore values in the registers. They are namely pushad (pusha being the 16bit version), popad (popa being the 16bit version), pushfd (pushf being the 16bit version) and popfd (popf being the 16bit version). For pushad, all general registers are pushed onto the stack in the following order: eax, ecx, edx, ebx, esp, ebp, esi and edi. For popad, the registers are popped off the stack in the following order: edi, esi, ebp, esp, ebx, edx, ecx, eax (the saved esp value is skipped rather than loaded). For pushfd, the Flags register (EFLAGS) is transferred onto the stack. For popfd, the data from the stack are popped into the Flags register (EFLAGS).

Stack frame

Earlier on, I mentioned that ebp is the base pointer and that its use is to access the local variables and the parameters passed to the function. The following code shows how parameters can be accessed, and how a stack frame is created so as to access the parameters with ebp. Below I have listed a sample MASM code (MASM has certain internal macros, one of which creates the stack frame) and the code produced (viewed from a disassembler).

MASM example:

    test47 proc par1:DWORD, par2:DWORD, par3:DWORD, par4:DWORD
        mov eax, par1
        mov ecx, par2

        mov edx, par3
        mov ebx, par4
        ret
    test47 endp

    // HLA example
    procedure test47( par1:dword; para2:dword; para3:dword; para4:dword );
        @nodisplay; @stdcall; @nostackalign;
    begin test47;
        mov( par1, eax );
        mov( para2, ecx );
        mov( para3, edx );
        mov( para4, ebx );
    end test47;

becomes this after compiling (due to some MASM internal macros, which set up the stack frame)

    test47:
        push ebp                ; store value of ebp on stack
        mov ebp, esp            ; copy value of esp to ebp
                                ; original value of ebp is now at [ebp], return address at [ebp+04h]
        mov eax, [ebp+08h]      ; DWORD PTR [ebp+08h] = par1
        mov ecx, [ebp+0Ch]      ; DWORD PTR [ebp+0Ch] = par2
        mov edx, [ebp+10h]      ; DWORD PTR [ebp+10h] = par3
        mov ebx, [ebp+14h]      ; DWORD PTR [ebp+14h] = par4
        leave
        ret 10h                 ; sizeof parameters * number of parameters = 4*4

Actual MASM code output from the HLA compiler:

    L1_test47__hla_ proc near32
        push ebp
        mov ebp, esp
        mov eax, dword ptr [ebp+8]      ;/* par1 */
        mov ecx, dword ptr [ebp+12]     ;/* para2 */
        mov edx, dword ptr [ebp+16]     ;/* para3 */
        mov ebx, dword ptr [ebp+20]     ;/* para4 */
    xL1_test47__hla___hla_:
        leave
        ret 16
    L1_test47__hla_ endp

The code "push ebp" and "mov ebp, esp" creates a stack frame. There is an opcode that does the opposite of leave, but it is not used as it is slow. The instruction "leave" removes the stack frame by restoring esp and ebp back to their condition before the stack frame was initialized (mov esp, ebp / pop ebp). The "ret 10h" tells the processor to transfer control from the procedure back to the instruction address saved on the stack (surprise, surprise: the stack is used to store the value of the instruction pointer when "calling" a function; on call, the address of the function is loaded into eip and code continues with execution according to eip). "ret

The "ret 10h" (something like add esp, 16, thereby removing the function parameters from the stack) is there because of the STDCALL calling convention, while the C calling convention would only do "ret" and leave it to the caller to adjust esp.

The above code shows how a stack frame is created and how ebp is used to access the parameters passed to the function. One may ask why the first parameter is stored in DWORD PTR [ebp+08h] and not in DWORD PTR [ebp+04h]. This is due to the fact that two things sit between ebp and the parameters: the call pushed the return address and then ebp was pushed onto the stack, thus DWORD PTR [ebp] contains the original value of ebp and DWORD PTR [ebp+04h] contains the return address. Parameters could be accessed via DWORD PTR [ebp+4+4*positionofparameter].

The following code (MASM & HLA) shows how ebp can be used to access local variables (local variables are actually data stored on the stack).

test124 proc par1:DWORD, para2, para3, para4
    LOCAL buffer[32]:BYTE
    LOCAL dd1:DWORD
    LOCAL dd2:DWORD
    mov eax, dd1            ; this is just an example
    mov dd2, eax
    lea eax, buffer
    ret
test124 endp

becomes this after compiling (due to some MASM internal macros, which set up the stack frame)

    push ebp
    mov ebp, esp
    add esp, -28h                   ; reserve stack space for local variables
    mov eax, DWORD PTR [ebp-24h]    ; DWORD PTR [ebp-24h] = dd1
    mov DWORD PTR [ebp-28h], eax    ; DWORD PTR [ebp-28h] = dd2
    lea eax, [ebp-20h]              ; [ebp-20h] = address of first byte in the array
    leave
    ret 10h                         ; sizeof parameters * number of parameters = 4*4

// HLA example:
procedure test124( par1:dword; para2:dword; para3:dword; para4:dword ); @nodisplay; @stdcall; @noalignstack; @leave;
var
    buffer: byte[32];
    dd1: dword;
    dd2: dword;
begin test124;
    mov( dd1, eax );
    mov( eax, dd2 );
    lea( eax, buffer );
end test124;

Actual HLA compiler output:
L2_test124__hla_ proc near32
    push ebp
    mov ebp, esp
    sub esp, 40
    mov eax, dword ptr [ebp-36]     ;/* dd1 */
    mov dword ptr [ebp-40], eax     ;/* dd2 */
    lea eax, byte ptr [ebp-32]      ;/* buffer */
xL2_test124__hla___hla_:
    leave
    ret 16
L2_test124__hla_ endp

Okay, so the code is almost similar to the above code, creating a stack frame. Local variables differ from parameters in the fact that they are accessed by negative displacement (remember that when you push something, the value of esp decreases). I think it would be easier to understand how to access local variables by examining how the displacement of each local variable is calculated in the above example than by reading my explanation.

The instruction "add esp, -28h" (sub esp, 40 in the HLA output) might seem weird, but it has its purpose. It ensures that the values stored in local variables are not corrupted by any data when something is pushed onto the stack. Also, remember that pushing data causes a change in the value of esp. (Hopefully I do make some sense.) However, I cannot comprehend why MASM produces "add esp, -28h" instead of "sub esp, 28h". Maybe it is due to some macro defined deep inside MASM.

Some code gurus definitely care about how big their code is and how fast it runs. To optimise their code, they might not even have a stack frame in their functions (yes, it is possible and I will show you how). Removing the stack frame can shave off some clocks and some bytes (push ebp = 1 byte, mov ebp, esp = 2 bytes, leave = 1 byte, total bytes saved = 4).

According to f0dder, "Usually, the reason for not using a stack frame is either that you don't need it, or that you want to use EBP as a general purpose register, not so much to save the push ebp. Push/pop ebp would still be needed if you want to use EBP as a general purpose register, since you have to return it in its original state". He further states that "If you don't use a stack frame, you cannot use local variables (in the automated masm way), nor can you access function parameters the usual way, you have to handcode (well, unless somebody has macros) all ESP references. You need to manually adjust the offsets from esp, and remember to further adjust these if you do push/pop. Also, if you don't have a stack frame you don't want 'leave'".

The following are ways to create functions without a stack frame.

    call function1
    ...
function1:
    nop                             ; ... to represent whatever code present
    ret 4*numberofparameter

or

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
function2 proc par1:DWORD, par2, par3, par4
    nop                             ; ... to represent whatever code present
    ret 4*4
function2 endp
OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF

or

par1 equ <esp+4>
par2 equ <esp+8>
par3 equ <esp+12>
par4 equ <esp+16>
function3 proc
    nop                             ; ... to represent whatever code present
    ret 4*4
function3 endp

In HLA, you can tell the compiler to skip the generation of the stack frame by using the @noframe procedure option and the @basereg and @parmoffset compile-time variables. Note that when using this option you cannot use the @stdcall scheme, so the parameters will be passed on the stack in the opposite order.

// Tell HLA to use ESP as the base register.
?@basereg := esp;
// Parameters start at offset 4 since we're not pushing EBP anymore.
?@parmoffset := 4;
procedure nostk( par1:dword; para2:dword; para3:dword; para4:dword );
    @nodisplay;
    @noframe;       // Don't automatically generate code
                    // for the stack frame.
                    // Note: no @stdcall option!
begin nostk;
    mov( par1, eax );
    mov( para2, ebx );
    mov( para3, ecx );
    mov( para4, edx );
    // "_parms_" is a constant HLA creates that specifies
    // the number of bytes of parameters.
    ret( _parms_ );
end nostk;
// Now switch back to EBP as the base register.
?@basereg := ebp;

Here's the code the HLA compiler emits
L3_nostk__hla_ proc near32
    mov eax, dword ptr [esp+16]     ;/* par1 */
    mov ebx, dword ptr [esp+12]     ;/* para2 */
    mov ecx, dword ptr [esp+8]      ;/* para3 */
    mov edx, dword ptr [esp+4]      ;/* para4 */
    ret 16
xL3_nostk__hla___hla_:

L3_nostk__hla_ endp

If you really need to use the stdcall calling sequence with HLA without building a stack frame, you can use equates, just as you do with MASM, e.g.,

procedure nostk2( _par1:dword; _para2:dword; _para3:dword; _para4:dword ); @nodisplay; @noframe;
const
    par1  :text := "(type dword [esp+4])";
    para2 :text := "(type dword [esp+8])";
    para3 :text := "(type dword [esp+12])";
    para4 :text := "(type dword [esp+16])";
begin nostk2;
    mov( par1, eax );
    mov( para2, ebx );
    mov( para3, ecx );
    mov( para4, edx );
    ret( _parms_ );
end nostk2;

Output from HLA compiler:
L4_nostk2__hla_ proc near32
    mov eax, dword ptr [esp+4]      ;/* (type dword [esp+4]) */
    mov ebx, dword ptr [esp+8]      ;/* (type dword [esp+8]) */
    mov ecx, dword ptr [esp+12]     ;/* (type dword [esp+12]) */
    mov edx, dword ptr [esp+16]     ;/* (type dword [esp+16]) */
    ret 16
xL4_nostk2__hla___hla_:
L4_nostk2__hla_ endp

Further notes on the stack
Anyone who has seriously messed around with the stack will have realised that Windows sometimes mysteriously terminates the program, or raises a weird GPF (general protection fault), all depending on the variant of Windows. So what is the limiting factor, one would ask. Let me tell you: the stack is bounded by 2 things, namely the lower stack boundary and the upper stack boundary. The lower stack boundary is located at fs:[8] and the upper stack boundary at fs:[4]. For example

    mov dword ptr fs:[4], 0ffffffffh
    mov dword ptr fs:[8], 0

Setting fs:[4] and fs:[8] like this will make sure that when you enter some part of the kernel (one that checks ESP against fs:[4] and fs:[8]) the OS won't kill your program.

The above example allows the stack to be located at any memory location as long as the memory is committed and accessible. Actually, I would strongly suggest that the values in fs:[4] and fs:[8] are restored on the exit of your program. It is only a few instructions and hopefully it ensures compatibility across the different OSes. In short, you can create your own stack.

Alignment
Writing efficient code is an art. Although hand optimizations can squeeze the juice out of the microprocessor, a little alertness and a few precautions here and there while coding can also save you a fortune. Misalignment of data is one of the problems that you need to take care of when writing efficient code.

Misaligned data is data located at an address that the processor cannot access efficiently. The processor is unable to access misaligned data in a way "natural" to it. A 32-bit microprocessor "naturally" accesses data positioned at address boundaries evenly divisible by 4. This is called natural alignment. Whether a piece of data is aligned depends not only on the address where it's located, but also on its size. 1-BYTE data is always aligned, 2-BYTE (WORD) data is aligned when located at addresses evenly divisible by 2, and 4-BYTE (DWORD) data is aligned when located at address boundaries evenly divisible by 4.

The CPU "feels" better when data is aligned on 4-BYTE boundaries or in some cases 16-BYTE boundaries. So, alignment makes code run faster and, in some cases, the data must be aligned in order to use certain CPU features. Also, some operating systems require some structures to be aligned to DWORD boundaries.

A Simple Example
Boundaries are evenly divisible memory addresses. For example, an address that is aligned on a 4-BYTE (DWORD) boundary is evenly divisible by 4. The processor will always get its data from DWORD boundaries and in DWORD sizes. So, if you had the following

1111 2222 3333 4444, and you wanted to get the second DWORD, the processor would find it on an address divisible by 4 (a boundary) and get it in one fetch. However, if the data was misaligned like this 1122 2233 3344 4400, and you wanted the same DWORD, the processor would
1. Get the first DWORD (FETCH 1)
2. Chop off the leftmost 2 bytes
3. Get the second DWORD (FETCH 2)
4. Chop off the rightmost 2 bytes
5. Then put them both together.
This requires 2 memory fetches and takes longer to execute. This is how it applies to DWORDs, as in the previous example. For word size values, there would be no change because the 2 bytes are available in the first pass in both cases.

The effect misalignment has depends on the architecture of the microprocessor. Common aftereffects of misalignment are
* More fetches required to access data, hence slowing down execution.
* General protection faults, or GPFs.
Note that on modern processors with decent cache designs, the effects of misaligned data access are somewhat mitigated. In particular, misaligned accesses within a cache line generally do not require additional cycles. However, misaligned accesses across a cache line incur the penalty. There are certain instructions that work better on 16-BYTE boundaries (such as movsd) and some that require it (some FPU instructions). Detecting misalignment at debug-time is difficult.

Causes of Misalignment
(TODO)

Improper Structures
(TODO)

Data type organization
(TODO: Strings and data types order)
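Going back to the simple example, the number of fetches can be worked out from the address and the size alone. The following is a minimal sketch in Python, assuming a processor that always reads whole DWORDs from addresses divisible by 4 (exactly the simplified model used in the example, not a description of any particular CPU). The function names are made up for illustration.

```python
def is_naturally_aligned(address, size):
    """True when `size`-byte data at `address` sits on its natural boundary."""
    return address % size == 0


def dword_fetches(address, size):
    """How many aligned 4-byte fetches are needed to read `size` bytes
    starting at `address`."""
    first_dword = address // 4
    last_dword = (address + size - 1) // 4
    return last_dword - first_dword + 1


print(dword_fetches(4, 4))   # second DWORD, aligned
print(dword_fetches(6, 4))   # same DWORD shifted by 2 bytes
print(dword_fetches(6, 2))   # a WORD at the same shifted address
```

The aligned DWORD needs one fetch, the shifted DWORD needs two, and the WORD still fits in a single fetch, which matches the remark about word size values.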

Misaligned Stack Data
(TODO)

Aligning Data
It is worth aligning code labels that are frequent jump targets, because speed increases are often observed. With data, it is important to at least align it to 4-byte (DWORD) boundaries, otherwise the processor has to make 2 reads to get the value, which slows down processing considerably. Some of the SIMD (Single Instruction Multiple Data) instructions require memory aligned at 16-BYTE or 32-BYTE boundaries, which usually means allocating a bit more memory than needed and aligning the start position to read and write to.

(TODO: Structure and stack Alignment)
The stack should always be aligned to 4 in Windows-based programs because misalignment often causes some API functions to fail.

As a general rule, you should try to define the larger-sized data first. For example, you should define DWORDs before WORDs, and WORDs before BYTEs. You should make it a point to align data after you've defined your strings.

Begin MASM Specific
To align data using MASM, use the ALIGN directive. The ALIGN directive aligns the next instruction or data to the boundary specified.

Syntax:
ALIGN [[boundary]]

Example:
ALIGN 4     ; align next data or instruction to DWORD boundary.
ALIGN 16    ; align next data or instruction to 16-BYTE boundary.

These are the two most common alignment directives but, generally, you can use any power of 2 from 2 through 16. MASM, however, will complain if you ask for an alignment that is greater than the segment alignment. If you use full segment definitions and specify "page", you can specify up to "ALIGN 256". (This page is not the same 4 KB page that the 80x86 microprocessor uses for paging with segment descriptors defined as in Windows. Rather, it is 256 bytes.) To align labels, the ALIGN directive places NOP (no operation) instructions wherever needed.

MASM will properly align variables declared with LOCAL to their natural boundaries up to DWORD. QWORDs are not properly aligned, however. With a 16-bit stack, it seems to do the same alignment of variables, but it makes no effort to align the stack properly, so its alignment of the DWORD variables will not be of much use half the time on average.

With a 16-bit stack, or with a 32-bit stack that extensively uses variables larger than 32 bits, efficiency will be improved by forgoing the convenience of proc and manually assigning variables and aligning the stack. I have found this to be somewhat awkward to do, especially if you want to preserve all registers on entry.

Example: Aligning after defining strings.

string1 db "this is a string",0     ; a 17-byte string, one reason to cause misalignment
ALIGN 4                             ; Align next piece of data at the next 4-byte boundary.
dwValue dd 0                        ; This data is now aligned.

Here, the string is 17 bytes long. If you do not use the ALIGN 4 directive, the next piece of data gets deposited at the next byte (byte 18). In the above example, in order to get the value of dwValue, the microprocessor would then fetch data twice. You don't want it to do that, do you? We guess not.
End MASM Specific

Begin HLA Specific
To align data using HLA, use the ALIGN directive or procedure option. The ALIGN directive aligns the next instruction or data to the boundary specified.

Syntax:
ALIGN( <<boundary>> );

Example:
ALIGN( 4 );     // align next data or instruction to DWORD boundary.
ALIGN( 16 );    // align next data or instruction to 16-BYTE boundary.

These are the two most common alignment directives. In theory, HLA supports alignments of any value, though in general you should use a power of two, and in certain circumstances you may not be allowed to use values greater than 16. Also, there may be additional restrictions based on the assembler you're using with HLA, as HLA's alignment capabilities depend on the underlying assembler that processes HLA's output. To align labels, the ALIGN directive places NOP (no operation) instructions wherever needed.

To force the first instruction of a procedure to begin on some boundary, you may use the HLA align procedure option as follows

procedure procName( <<OptionalParameters>> ); align(4); <<otherOptions>>
begin procName;
    <<statements>>
end procName;

This aligns the first instruction of the procedure on the specified boundary. HLA automatically pads all procedure parameters to 32 bits (a requirement of Windows). HLA does not, however, provide this padding to local variables.

If you want to align the addresses of your local variables on the stack to some particular boundary, you can use the align directive for this purpose:

procedure procName( <<OptionalParameters>> ); <<otherOptions>>
var
    b:byte;
    align(4);
    d:dword;
begin procName;
    <<statements>>
end procName;

Note that the alignment is only within the activation record; true address alignment depends on the stack being properly aligned upon entry into the procedure. Most of the time you can count on the stack being aligned on a double-word boundary upon entry into your procedure. However, it's possible to mess with the stack prior to calling a procedure and invalidate this assumption. To help overcome this problem, HLA, by default, emits some extra code to align the stack upon entry into a procedure. For example, compiling the following HLA code

procedure TestProc(parameter: dword); @nodisplay;
var
    b:byte;
    w:word;
    d:dword;
begin TestProc;
    mov( b, al );
    mov( w, ax );
    mov( d, eax );
end TestProc;

produces the following MASM code

L1_TestProc__hla_ proc near32
    push ebp
    mov ebp, esp
    sub esp, 8                      ; Allocate storage for 7 bytes + 1 byte padding.
    and esp, 0fffffffch             ; Align stack to four-byte boundary!
    mov al, byte ptr [ebp-1]        ;/* b */
    mov ax, word ptr [ebp-3]        ;/* w */
    mov eax, dword ptr [ebp-7]      ;/* d */
    mov esp, ebp
    pop ebp
    ret 4
L1_TestProc__hla_ endp

Unfortunately, the "and esp, 0fffffffch" instruction does not align the current activation record to a four-byte boundary, but if TestProc calls any other procedures, those procedures' stacks will be dword aligned (unless TestProc also messes with the stack before calling those procedures).
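The effect of the "and esp, 0fffffffch" instruction can be checked with a little arithmetic. The sketch below is Python, not assembler output, and the value chosen for ebp is a made-up, deliberately misaligned address. It replays the prologue and shows that esp ends up on a four-byte boundary while the local variable d at [ebp-7] does not.

```python
MASK32 = 0xFFFFFFFF


def align_down(value, boundary=4):
    """Model of 'and reg, -boundary' for a power-of-two boundary."""
    return value & ~(boundary - 1) & MASK32


ebp = 0x0012FF7E                # hypothetical ebp, misaligned by 2 on entry
esp = (ebp - 8) & MASK32        # sub esp, 8
esp = align_down(esp)           # and esp, 0fffffffch
d_address = ebp - 7             # the local variable d lives at [ebp-7]

print(hex(esp), esp % 4)                # esp is now dword aligned
print(hex(d_address), d_address % 4)    # d itself is still misaligned
```

Anything pushed after the prologue, including the frames of procedures called from here, starts from the aligned esp, which is the only thing the instruction guarantees.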

If your program doesn't mess up the dword alignment of the stack, you can use the @noalignstack procedure option to tell HLA not to bother emitting the "and esp, 0fffffffch" instruction, thus making your code a tiny bit more efficient

program t;
procedure TestProc(parameter: dword); @nodisplay; @noalignstack;
var
    b:byte;
    w:word;
    d:dword;
begin TestProc;
    mov( b, al );
    mov( w, ax );
    mov( d, eax );
end TestProc;
begin t;
end t;

Emits the following MASM code

L1_TestProc__hla_ proc near32
    push ebp
    mov ebp, esp
    sub esp, 8
    mov al, byte ptr [ebp-1]        ;/* b */
    mov ax, word ptr [ebp-3]        ;/* w */
    mov eax, dword ptr [ebp-7]      ;/* d */
    mov esp, ebp
    pop ebp
    ret 4
L1_TestProc__hla_ endp

Note in the examples to this point that the w and d local variables have been misaligned in the activation record. This is easy to fix with an align directive in the VAR section of the procedure

program t;
procedure TestProc(parameter: dword); @nodisplay; @noalignstack;
var
    b:byte;
    align(2);
    w:word;
    align(4);
    d:dword;
begin TestProc;
    mov( b, al );
    mov( w, ax );
    mov( d, eax );
end TestProc;
begin t;

end t;

MASM code generated by the HLA compiler

L1_TestProc__hla_ proc near32
    push ebp
    mov ebp, esp
    sub esp, 8
    mov al, byte ptr [ebp-1]        ;/* b */
    mov ax, word ptr [ebp-4]        ;/* w */
    mov eax, dword ptr [ebp-8]      ;/* d */
    mov esp, ebp
    pop ebp
    ret 4
L1_TestProc__hla_ endp

Note that HLA always guarantees that literal string constants you create in an HLA program are stored in memory aligned to a four-byte boundary and always consume a multiple of four bytes. For example, consider the following HLA string constants appearing in a program

program t;
static
    s1: string := "Hello World";
    s2: string := "Hello World.";
    s3: string := "Hello World..";
    s4: string := "Hello World...";
begin t;
end t;

Note the code that HLA emits for this string data (keep in mind that HLA prefixes string data with the maximum length and current length of the string)

    align 4                         ; align to dword boundary
L2_len__hla_ label dword
    dword 0bh                       ; maximum length
    dword 0bh                       ; current length
L2_str__hla_ label byte
    db "Hello World"
    db 0                            ; terminating byte
    align 4
L4_len__hla_ label dword
    dword 0ch
    dword 0ch
L4_str__hla_ label byte
    db "Hello World."
    db 0
    byte 0                          ; Extra padding to ensure that string
    byte 0                          ; object is a multiple of four bytes long
    byte 0
    align 4
L6_len__hla_ label dword

    dword 0dh
    dword 0dh
L6_str__hla_ label byte
    db "Hello World.."
    db 0
    byte 0                          ; Extra padding to ensure that string
    byte 0                          ; object is a multiple of four bytes long
    align 4
L8_len__hla_ label dword
    dword 0eh
    dword 0eh
L8_str__hla_ label byte
    db "Hello World..."
    db 0
    byte 0                          ; Extra padding for dword alignment.
End HLA Specific

Begin FASM Specific
(TODO)
End FASM Specific

Begin GoASM Specific
Achieving correct data alignment in GoAsm
Good alignment can usually be achieved automatically by declaring data in size sequence in the data section. So you would declare all qwords first, then dwords, then words, then bytes and strings. Twords, being 10 bytes, would upset the sequence, so you could do them all first and then correct the alignment using ALIGN.

Example:
DATA
TWORDINTEGER DT 0.0                 ; for floating point operations
TWORDRESULT  DT 0.0
ALIGN 8                             ; re-align data to 8-byte boundary
QWORD_DATA1  DQ 0
QWORD_DATA2  DQ 0
COUNTD1      DD 0
COUNTD2      DD 0
COUNTW1      DW 0
COUNTW2      DW 0
COUNTB       DB 0
Mess1        DB 'Input message',0
Mess2        DB 'Output message',0

Here ALIGN is used to pad the DATA section with zeroes to bring it back into alignment for the qwords. The same can be done in a CONST section or for uninitialized data (using ? as the initializer).
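How many zero bytes ALIGN has to add follows directly from the current offset. The sketch below (Python, with hypothetical helper names) lays out the declarations of the example above, assuming the DATA section itself starts at an offset that is a multiple of 8, and reports where each item lands.

```python
def padding_for(offset, boundary):
    """Bytes of padding needed to bring `offset` up to the next boundary."""
    return -offset % boundary


def layout(items):
    """items is a list of (name, size); ("ALIGN", n) pads to an n-byte boundary.
    Returns {name: offset from the start of the section}."""
    offsets, offset = {}, 0
    for name, size in items:
        if name == "ALIGN":
            offset += padding_for(offset, size)
        else:
            offsets[name] = offset
            offset += size
    return offsets


data = [
    ("TWORDINTEGER", 10), ("TWORDRESULT", 10),
    ("ALIGN", 8),
    ("QWORD_DATA1", 8), ("QWORD_DATA2", 8),
    ("COUNTD1", 4), ("COUNTD2", 4),
    ("COUNTW1", 2), ("COUNTW2", 2),
    ("COUNTB", 1),
]
offsets = layout(data)
for name, offset in offsets.items():
    print(f"{name:13} at offset {offset}")
```

The two twords take 20 bytes, so ALIGN 8 inserts 4 bytes of padding and the first qword lands at offset 24. After that, every item sits on its natural boundary simply because the sizes only get smaller.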

For Win32, GoAsm automatically aligns structures on a dword boundary, both when they are declared as local data and in the data section. For Win64, GoAsm automatically aligns structures and structure members to suit the natural boundary of the structure and its members. GoAsm also pads the size of the structure to suit. GoAsm also automatically aligns the stack pointer (RSP) ready for an API call. There are some speed tests in TestBug which show what difference correct alignment can make when reading from, writing to or comparing the contents of memory.

Code alignment in GoAsm
Correct code alignment will differ between processors. When you use ALIGN in a CODE section, GoAsm pads with the instruction NOP (opcode 90h), which performs no operation. See the GoAsm help file for more details.
End GoASM Specific

Alignment to x (if x is a power of 2) is simple. For example, alignment to 16

    add esi, 16-1
    and esi, -16                    ; esi = the pointer to be aligned

The Floating Point Unit (FPU)
The x87 coprocessor, or the floating-point unit (FPU), executes approximately 70 instructions. This chapter will describe the instruction set of the FPU up to the Pentium processor. It includes the data transfer, arithmetic, comparison, transcendental, and constant instructions. The FPU can be very useful for calculation and mathematics heavy applications such as 3D graphics, audio processing, and the like.

The FPU uses a different system for moving data around in the processor. Instead of using named registers, like the CPU, it uses a stack. The stack is referred to as ST(x), where x is a position on the stack. Throughout this article, ST(0) will be used to represent the top of the FPU stack, though many people refer to it as ST without a number. This causes a lot of confusion to newcomers because, unlike programming on the main CPU, where eax is always eax, when you move data into a register on the FPU, everything below it moves down, and you need to keep track of your register stack at all times.

The FPU also contains a status register, whose purpose is to hold status flags for operations like comparison, bit test operations, exception detection and response, etc. The FSTSW instruction copies the status word register into the CPU (for example with fstsw ax), so you can use it for flow control.
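The "everything below it moves down" behaviour is easier to follow with a toy model. The class below is only a sketch in Python: the class name is made up, only three instructions are modelled, and a real FPU flags a stack fault instead of raising an error.

```python
class X87StackModel:
    """Toy model of the 8-entry x87 register stack; regs[0] plays ST(0)."""

    def __init__(self):
        self.regs = []

    def fld(self, value):
        """Load: the new value becomes ST(0), old ST(0) becomes ST(1), etc."""
        if len(self.regs) == 8:
            raise OverflowError("all 8 FPU registers are in use")
        self.regs.insert(0, float(value))

    def faddp(self):
        """ST(1) = ST(1) + ST(0), then pop, so the sum ends up in ST(0)."""
        top = self.regs.pop(0)
        self.regs[0] += top

    def fstp(self):
        """Store and pop: returns ST(0); old ST(1) becomes the new ST(0)."""
        return self.regs.pop(0)

    def st(self, i):
        return self.regs[i]


fpu = X87StackModel()
fpu.fld(1.5)                    # ST(0) = 1.5
fpu.fld(2.0)                    # ST(0) = 2.0, the 1.5 has moved to ST(1)
moved_down = fpu.st(1)
fpu.faddp()                     # ST(0) = 3.5, the stack is one entry shorter
result = fpu.fstp()             # the stack is empty again
```

The value loaded first is never addressed by a fixed name; its ST(x) number changes with every load and pop, which is the bookkeeping the text warns about.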

FPU Data Formats
The FPU uses 3 different types of data: signed integer, BCD, and floating point. The following table shows the various data types used by the FPU along with their sizes and approximate ranges.

Type            Length   Range
Word Integer    16-bit   -32,768 to 32,767
Short Integer   32-bit   -2.14e9 to 2.14e9
Long Integer    64-bit   -9.22e18 to 9.22e18
Single Real     32-bit   1.18e-38 to 3.40e38
Double Real     64-bit   2.23e-308 to 1.79e308
Extended Real   80-bit   3.37e-4932 to 1.18e4932
Packed BCD      80-bit   -1e18 to 1e18

Signed Integers
Positive signed integers are stored in normal format, with the left-most sign bit set to 0. Negative numbers are stored in 2's-complement form, with the left-most sign bit equal to 1. The following shows how to define the various integer types, and what their binary representations are

; +------------------+------------------+
; | Definition       | Hexadecimal      |
; +------------------+------------------+
var1 dw 24          ; 0018
var2 dw -2          ; FFFE
var3 dd 1234        ; 000004D2
var4 dd -123        ; FFFFFF85
var5 dq 9876        ; 0000000000002694
var6 dq -321        ; FFFFFFFFFFFFFEBF
; +------------------+------------------+
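The hexadecimal column can be recomputed by masking the value to the width of the type, which is what 2's-complement storage amounts to. A small sketch in Python (the function name is made up for illustration):

```python
def twos_complement_hex(value, bits):
    """Hexadecimal image of a signed integer stored in `bits` bits."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking maps negative values onto their 2's-complement bit pattern.
    return format(value & ((1 << bits) - 1), f"0{bits // 4}X")


for definition, value, bits in [
    ("dw 24", 24, 16), ("dw -2", -2, 16),
    ("dd 1234", 1234, 32), ("dd -123", -123, 32),
    ("dq 9876", 9876, 64), ("dq -321", -321, 64),
]:
    print(f"{definition:10} {twos_complement_hex(value, bits)}")
```

Note that decimal 24 comes out as 0018; the digits 0024 would be the image of dw 24h.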

Binary-Coded Decimal (BCD)
BCD numbers are 10 bytes in size. Each number is stored as 18 digits, with 2 digits per byte. The highest order byte stores the sign of the number, with the highest order bit of this byte being the sign bit. Note that both positive and negative numbers are stored in true form, and not in complemented form.

+----------+-----+-----+-----+-----+-----+-----+-----+-----+----+----+----+----+----+----+----+----+----+----+
| Sign bit | D17 | D16 | D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
+----------+-----+-----+-----+-----+-----+-----+-----+-----+----+----+----+----+----+----+----+----+----+----+
79                                                                                                          0

List of FPU Instructions
This list of instructions was compiled from Ray Filiatreault's online floating point tutorial.

F2XM1       2 to the X power minus 1
FABS        Absolute value of ST(0)
FADD        Add two floating point values
FADDP       Add two floating point values and pop ST(0)
FBLD        Load BCD data from memory
FBSTP       Store BCD data to memory and pop ST(0)
FCHS        Change the sign of ST(0)
FCLEX       Clear exceptions
FCMOVcc*    Conditional move based on CPU flags
FCOM        Compare ST(0) to a floating point value
FCOMI       Compare ST(0) to ST(i) and set CPU flags
FCOMIP      Compare ST(0) to ST(i) and set CPU flags and pop ST(0)
FCOMP       Compare ST(0) to a floating point value and pop ST(0)
FCOMPP      Compare ST(0) to ST(1) and pop both registers
FCOS        Cosine of the angle value in ST(0)
FDECSTP     Decrease stack pointer
FDIV        Divide two floating point values
FDIVP       Divide two floating point values and pop ST(0)
FDIVR       Divide in reverse two floating point values
FDIVRP      Divide in reverse two floating point values and pop ST(0)
FFREE       Free a data register
FIADD       Add an Integer located in memory to ST(0)
FICOM       Compare ST(0) to an integer value
FICOMP      Compare ST(0) to an integer value and pop ST(0)
FIDIV       Divide ST(0) by an Integer located in memory
FIDIVR      Divide an Integer located in memory by ST(0)
FILD        Load integer from memory
FIMUL       Multiply ST(0) by an Integer located in memory

FINCSTP     Increase stack pointer
FINIT       Initialize the FPU
FIST        Store integer to memory
FISTP       Store integer to memory and pop ST(0)
FISUB       Subtract an Integer located in memory from ST(0)
FISUBR      Subtract ST(0) from an Integer located in memory
FLD         Load real number
FLD1        Load the value of 1
FLDCW       Load control word
FLDENV      Load environment
FLDL2E      Load the log base 2 of e (Napierian constant)
FLDL2T      Load the log base 2 of Ten
FLDLG2      Load the log base 10 of 2 (common log of 2)
FLDLN2      Load the log base e of 2 (natural log of 2)
FLDPI       Load the value of PI
FLDZ        Load the value of Zero
FMUL        Multiply two floating point values
FMULP       Multiply two floating point values and pop ST(0)
FNCLEX      Clear exceptions (no wait)
FNINIT      Initialize the FPU (no wait)
FNOP        No operation
FNSAVE      Save state of FPU (no wait)
FNSTCW      Store control word (no wait)
FNSTENV     Store environment (no wait)
FNSTSW      Store status word (no wait)
FPATAN      Partial arctangent of the ratio ST(1)/ST(0)
FPREM       Partial remainder
FPREM1      Partial remainder 1
FPTAN       Partial tangent of the angle value in ST(0)
FRNDINT     Round ST(0) to an integer
FRSTOR      Restore all registers
FSAVE       Save state of FPU
FSCALE      Scale ST(0) by ST(1)
FSIN        Sine of the angle value in ST(0)
FSINCOS     Sine and cosine of the angle value in ST(0)
FSQRT       Square root of ST(0)
FST         Store real number
FSTCW       Store control word
FSTENV      Store environment
FSTP        Store real number and pop ST(0)
FSTSW       Store status word
FSUB        Subtract two floating point values
FSUBP       Subtract two floating point values and pop ST(0)
FSUBR       Subtract in reverse two floating point values
FSUBRP      Subtract in reverse two floating point values and pop ST(0)
FTST        Test ST(0) by comparing it to +0.0
FUCOM       Unordered Compare ST(0) to a floating point value
FUCOMI      Unordered Compare ST(0) to ST(i) and set CPU flags

FUCOMIP     Unordered Compare ST(0) to ST(i) and set CPU flags and pop ST(0)
FUCOMP      Unordered Compare ST(0) to a floating point value and pop ST(0)
FUCOMPP     Unordered Compare ST(0) to ST(1) and pop both registers
FWAIT       Wait while FPU is busy
FXAM        Examine the content of ST(0)
FXCH        Exchange the top data register with another data register
FXTRACT     Extract exponent and significand
FYL2X       Y*Log2X
FYL2XP1     Y*Log2(X+1)

* cc refers to any of these variations
FCMOVB      Move if below (CF=1)
FCMOVE      Move if equal (ZF=1)
FCMOVBE     Move if below or equal (CF=1 or ZF=1)
FCMOVU      Move if unordered (PF=1)
FCMOVNB     Move if not below (CF=0)
FCMOVNE     Move if not equal (ZF=0)
FCMOVNBE    Move if not below or equal (CF=0 and ZF=0)
FCMOVNU     Move if not unordered (PF=0)
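As a companion to FBLD and FBSTP, the packed BCD layout described in the Binary-Coded Decimal section can be built by hand. This is a minimal sketch in Python with a made-up function name; it produces the 10 bytes in memory order (lowest address first) and leaves the unused low 7 bits of the sign byte at zero.

```python
def to_packed_bcd(value):
    """Encode an integer as the 10-byte packed BCD image used by the FPU."""
    if abs(value) >= 10 ** 18:
        raise ValueError("packed BCD holds at most 18 decimal digits")
    digits = f"{abs(value):018d}"           # digits[0] is D17, digits[17] is D0
    # Two digits per byte; reading a digit pair as hex puts the higher digit
    # in the high nibble. The byte with D1 D0 comes first in memory.
    body = bytes(int(digits[i:i + 2], 16) for i in range(16, -1, -2))
    sign = 0x80 if value < 0 else 0x00      # bit 79 is the sign bit
    return body + bytes([sign])


image = to_packed_bcd(-1234)
print(image.hex())
```

For -1234 the first two bytes are 34h and 12h, the next seven are zero, and the last byte is 80h. The digits are stored in true form and only the sign bit differs from the image of +1234.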