A S S E M B L Y   P R O G R A M M I N G   J O U R N A L
Issue 1                                                              Oct/Nov 98
asmjournal@mailcity.
::::::::::::::::::::::.........................................................


T A B L E   O F   C O N T E N T S
----------------------------------------------------------------------
Introduction...................................................mammon_
"VGA Programming in Mode 13h".............................Lord Lucifer
"SMC Techniques: The Basics"...................................mammon_
"Going Ring0 in Windows 9x".....................................Halvar

Column: Win32 Assembly Programming
    "The Basics"..............................................Iczelion
    "MessageBox"..............................................Iczelion

Column: The C standard library in Assembly
    "_itoa, _ltoa and _ultoa"...................................Xbios2

Column: The Unix World
    "x86 ASM Programming for Linux"............................mammon_

Column: Issue Solution
    "11-byte Solution"..........................................Xbios2
----------------------------------------------------------------------
+++++++++++++++++++++++Issue Challenge++++++++++++++++++++
Write a program that displays its command line in 11 bytes
----------------------------------------------------------------------

:::\_____\:::::::::::..............................................INTRODUCTION
                                                                     by mammon_

Welcome to the first issue of Assembly Programming Journal. Assembly language has attracted renewed interest from a lot of programmers, in what must be a backlash to the surge of poor-quality RAD-developed programs (from Delphi, VB, etc) released as free/shareware over the past few years. Assembly language code is tight, fast, and often well-coded -- you tend to find fewer inexperienced coders writing in assembly language than you do writing in, say, Visual Basic.

The selection of articles is somewhat eclectic and should demonstrate the focus of this magazine: it targets the assembly-language programming community, not any particular type of coding such as Win32, virus, or demo programming. As the magazine is newly born and much of its purpose may seem unclear, I will devote the rest of this column to the most common questions I have received via email regarding the mag.

How often will an issue be released?
------------------------------------
Barring hazard, an issue will be released every other month.

What types of articles will be accepted?
----------------------------------------
Anything to do with assembly language. Obviously repeats of previously presented material are not necessary unless they enhance or clarify the earlier material. The focus will be on Intel x86 instruction sets; however, coding for other processors is acceptable (though out of courtesy it would be good to point to an x86 emulator for the processor you write on).

Personally I am looking for articles on the areas of assembly language that interest me: code optimization, demo/graphics programming, virus coding, unix and other-OS asm coding, and OS-internals. Demos (with source) and quality ASCII art (for issue covers, column logos, etc) are especially welcome.

For what level of coding experience is the mag intended?
--------------------------------------------------------
The magazine is intended to appeal to asm coders of all levels. Each issue will contain mostly beginner and intermediate level code/techniques, as these will by nature be in the greatest demand; however, one of the goals of APJ is to include enough advanced material to make the magazine appeal to "pros" as well.

How will the mag be distributed?
--------------------------------
Assembly Programming Journal has its own web page, which will contain the current issue and an archive of previous issues. The page also contains a guestbook and a discussion board for article writers and readers. An email subscription may be obtained by sending an email with the subject "SUBSCRIBE"; starting with the next issue, Assembly Programming Journal will be emailed to the address you sent the mail from.

Wrap-up
-------
That's the bulk of the "faq". Enjoy the mag!

:::\_____\:::::::::::...........................................FEATURE.ARTICLE
VGA Programming in Mode 13h
by Lord Lucifer

This article describes how to program VGA graphics Mode 13h using assembly language. Mode 13h is the 320x200x256 graphics mode, and is fast and very convenient from a programmer's perspective.

The video buffer begins at address A000:0000 and ends at address A000:F9FF. This means the buffer is 64000 bytes long and that each pixel in mode 13h is represented by one byte.

It is easy to set up mode 13h and the video buffer in assembly language:

        mov     ax,0013h        ; Int 10 - Video BIOS Services
        int     10h             ;   ah = 00 - Set Video Mode
                                ;   al = 13 - Mode 13h (320x200x256)
        mov     ax,0A000h       ; point segment register es to A000h
        mov     es,ax           ; we can now access the video buffer as
                                ; offsets from register es
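The buffer bounds quoted above follow from simple arithmetic, which can be checked with a short sketch. It is written in Python purely for illustration (it is not part of the article's assembly), and the names `linear_address`, `WIDTH`, `HEIGHT` and `BUFFER_SIZE` are made up for this example.

```python
# Real-mode address arithmetic for the Mode 13h frame buffer (illustration only).
WIDTH, HEIGHT = 320, 200          # Mode 13h resolution
BUFFER_SIZE = WIDTH * HEIGHT      # one byte per pixel


def linear_address(segment, offset):
    """Physical address of segment:offset in real mode (segment * 16 + offset)."""
    return (segment << 4) + offset


first = linear_address(0xA000, 0x0000)
last = linear_address(0xA000, BUFFER_SIZE - 1)

print(f"buffer size : {BUFFER_SIZE} bytes")
print(f"last offset : {BUFFER_SIZE - 1:04X}h")
print(f"linear range: {first:05X}h-{last:05X}h")
```

Because the whole buffer fits inside one 64K segment, a single 16-bit offset from es is enough to reach every pixel.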

At the end of your program, you will probably want to restore the text mode. Here's how:

        mov     ax,0003h        ; Int 10 - Video BIOS Services
        int     10h             ;   ah = 00 - Set Video Mode
                                ;   al = 03 - Mode 03h (80x25x16 text)

Accessing a specific pixel in the buffer is also very easy:

        ; bx = x coordinate
        ; ax = y coordinate
        mov     dx,320          ; mul takes no immediate operand, so load 320
        mul     dx              ; multiply y coord by 320 to get row
        add     ax,bx           ; add this with the x coord to get offset
        mov     di,ax           ; ax cannot address memory in 16-bit code
        mov     cl,es:[di]      ; now pixel x,y can be accessed as es:[di]

Hmm... That was easy, but that multiplication is slow and we should get rid of it. That's easy to do too, simply by using bit shifting instead of multiplication. Shifting a number to the left by one bit is the same as multiplying by 2. We want to multiply by 320, which is not a power of 2, but 320 = 256 + 64, and 256 and 64 are both powers of 2. So a faster way to access a pixel is:

        ; bx = x coordinate
        ; ax = y coordinate
        mov     di,ax           ; copy y to di, to save it temporarily
        shl     di,8            ; shift left by 8, which is the same as
                                ; multiplying by 2^8 = 256
        shl     ax,6            ; now shift left by 6, which is the same as
                                ; multiplying by 2^6 = 64
        add     di,ax           ; now add those two together, which is
                                ; effectively multiplying by 320
        add     di,bx           ; finally add the x coord to this value
        mov     cl,es:[di]      ; now pixel x,y can be accessed as es:[di]
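The identity behind the shift trick, y*320 = (y << 8) + (y << 6), can be checked exhaustively for every pixel on the screen. The following is a small Python sketch for illustration only; the function names are invented here, and the `& 0xFFFF` mask is an assumption standing in for 16-bit register width.

```python
def offset_mul(x, y):
    """Pixel offset computed with a multiplication: y * 320 + x."""
    return (y * 320 + x) & 0xFFFF


def offset_shift(x, y):
    """Same offset using shifts only: y*256 + y*64 + x."""
    return ((y << 8) + (y << 6) + x) & 0xFFFF


def shifts_match_multiply():
    """True if both methods agree for every pixel of the 320x200 screen."""
    return all(offset_mul(x, y) == offset_shift(x, y)
               for y in range(200) for x in range(320))


print(shifts_match_multiply())
```

Note that it is the y coordinate that gets scaled by 320; the x coordinate is only added at the end.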

Well, the shift-based code is a little bit longer and looks more complicated, but it avoids the slow MUL instruction and is much faster.

To plot colors, we use a color look-up table. This look-up table is a 768-byte (3x256) array. Entry n of the table starts at offset n*3. The 3 bytes of each entry hold the values (0-63) of the red, green, and blue components. This gives a total of 64^3 = 262144 possible colors. However, since the table holds only 256 entries, only 256 different colors are available at a given time.

Changing the color palette is accomplished through the I/O ports of the VGA card:

        Port 03C7h is the Palette Register Read port
        Port 03C8h is the Palette Register Write port
        Port 03C9h is the Palette Data port

Here is how to change the color palette:

        ; al = palette index
        ; bl = red component   (0-63)
        ; cl = green component (0-63)
        ; ch = blue component  (0-63)  (dl is not usable: dx holds the port)
        mov     dx,03C8h        ; 03C8h = Palette Register Write port
        out     dx,al           ; choose index
        mov     dx,03C9h        ; 03C9h = Palette Data port
        mov     al,bl
        out     dx,al           ; set red value
        mov     al,cl
        out     dx,al           ; set green value
        mov     al,ch
        out     dx,al           ; set blue value

That's all there is to it. Reading the color palette is similar:

        ; al = palette index
        ; on return:
        ; bl = red component   (0-63)
        ; cl = green component (0-63)
        ; ch = blue component  (0-63)
        mov     dx,03C7h        ; 03C7h = Palette Register Read port
        out     dx,al           ; choose index
        mov     dx,03C9h        ; 03C9h = Palette Data port
        in      al,dx
        mov     bl,al           ; get red value
        in      al,dx
        mov     cl,al           ; get green value
        in      al,dx
        mov     ch,al           ; get blue value
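The layout of the look-up table (256 entries of 3 bytes, each component limited to 0-63) can be modelled in a few lines. This Python sketch is only an illustration of the arithmetic; `to_dac` and `set_entry` are hypothetical helpers, and the conversion from 8-bit RGB values is an assumption added here, not something the article covers.

```python
def to_dac(r8, g8, b8):
    """Scale 8-bit components (0-255) down to the 6-bit range (0-63)."""
    return (r8 >> 2, g8 >> 2, b8 >> 2)


def set_entry(table, index, red, green, blue):
    """Store one colour in a 768-byte table; entry n starts at offset n * 3."""
    for value in (red, green, blue):
        if not 0 <= value <= 63:
            raise ValueError("components must be in the range 0-63")
    base = index * 3
    table[base:base + 3] = bytes((red, green, blue))


palette = bytearray(3 * 256)            # 768 bytes, all black to start with
set_entry(palette, 255, *to_dac(255, 128, 0))

TOTAL_COLORS = 64 ** 3                  # 6 bits for each of three components
print(len(palette), TOTAL_COLORS)
```

A table built this way could be sent to the card one entry at a time using the port sequence shown in the assembly listing.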

Now all we need to know is how to plot a pixel of a certain color at a certain location. It's very easy, given what we already know:

        ; bx = x coordinate
        ; ax = y coordinate
        ; dl = color (0-255)
        mov     di,ax           ; copy y to di, to save it temporarily
        shl     di,8            ; shift left by 8, which is the same as
                                ; multiplying by 2^8 = 256
        shl     ax,6            ; now shift left by 6, which is the same as
                                ; multiplying by 2^6 = 64
        add     di,ax           ; now add those two together, which is
                                ; effectively multiplying by 320
        add     di,bx           ; finally add the x coord to this value
        mov     es:[di],dl      ; copy color dl into memory location
                                ; that's all there is to it
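To see the plotting routine in terms of plain data, the frame buffer can be modelled as a 64000-byte array. The sketch below is Python and purely illustrative; `plot`, `get_pixel` and `screen` are invented names, and the bounds check is an addition that the assembly routine does not perform.

```python
WIDTH, HEIGHT = 320, 200


def plot(buffer, x, y, color):
    """Write one colour byte at (x, y), using the shift-based offset."""
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        raise ValueError("coordinate outside the 320x200 screen")
    buffer[(y << 8) + (y << 6) + x] = color & 0xFF


def get_pixel(buffer, x, y):
    """Read back the colour byte stored at (x, y)."""
    return buffer[y * WIDTH + x]


screen = bytearray(WIDTH * HEIGHT)      # stands in for the memory at A000:0000
plot(screen, 319, 199, 15)              # bottom-right corner
plot(screen, 0, 1, 7)                   # first pixel of the second row
print(get_pixel(screen, 319, 199), get_pixel(screen, 0, 1))
```

Only a single byte is stored per pixel, which is why the assembly version must write an 8-bit register rather than a 16-bit one.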

Ok, we now know how to set up Mode 13h, set up the video buffer, plot a pixel, and edit the color palette.

My next article will go on to show how to draw lines, utilize the vertical retrace for smoother rendering, and anything else I can figure out by that time...


:::\_____\:::::::::::...........................................FEATURE.ARTICLE
SMC Techniques: The Basics
by mammon_

One of the benefits of coding in assembly language is that you have the option to be as tricky as you like: the binary gymnastics of viral code demonstrate this above all else. One of the viral "tricks" that has made its way into standard protection schemes is SMC: self-modifying code.

In this article I will not be discussing polymorphic viruses or mutation engines; I will not go into any specific software protection scheme, or cover any anti-debugger/anti-disassembler tricks, or even touch on the matter of the PIQ. This is intended to be a simple primer on self-modifying code, for those new to the concept and/or implementation.

Episode 1: Opcode Alteration
----------------------------
One of the purest forms of self-modifying code is to change the value of an instruction before it is executed...sometimes as the result of a comparison, and sometimes to hide the code from prying eyes. This technique essentially has the following pattern:

        mov reg1, code-to-write
        mov [addr-to-write-to], reg1

where 'reg1' would be any register, and where '[addr-to-write-to]' would be a pointer to the address to be changed.
Note that 'code-to-write' would ideally be an instruction in hexadecimal format, but by placing the code elsewhere in the program--in an uncalled subroutine, or in a different segment--it is possible to simply transfer the compiled code from one location to another via indirect addressing, as follows:

        call changer
        mov  dx, offset string        ;this will be performed but ignored
patch:  mov  ah, 09h                  ;this will never be performed
        int  21h                      ;this will exit the program
        ....
changer:
        mov  di, offset to_write      ;load address of code-to-write in DI
        mov  ax, cs:[di]              ;fetch both bytes of 'mov ah, 4Ch'
        mov  word ptr cs:[patch], ax  ;write code to location 'patch:'
        ret                           ;return from call
to_write:
        mov  ah, 4Ch                  ;terminate to DOS function
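What the patch does to the bytes in memory can be modelled outside of assembly. The following Python sketch treats the code as a byte array; the opcode bytes (B4h for 'mov ah, imm8' and CDh for 'int') are the standard x86 encodings, while the layout of the array, the NOP filler and all names are assumptions made for this illustration.

```python
MOV_AH_09 = bytes([0xB4, 0x09])     # mov ah, 09h  (display string)
MOV_AH_4C = bytes([0xB4, 0x4C])     # mov ah, 4Ch  (terminate)
INT_21 = bytes([0xCD, 0x21])        # int 21h
FILLER = bytes([0x90] * 4)          # stands in for the "...." in the listing

image = bytearray(MOV_AH_09 + INT_21 + FILLER + MOV_AH_4C)
patch_offset = 0                    # where the instruction to replace lives
source_offset = len(image) - 2      # where the replacement instruction lives


def copy_code(code, dst, src, count):
    """Copy 'count' bytes of code from offset src to offset dst, in place."""
    code[dst:dst + count] = code[src:src + count]


one_byte = bytearray(image)
copy_code(one_byte, patch_offset, source_offset, 1)

two_bytes = bytearray(image)
copy_code(two_bytes, patch_offset, source_offset, 2)

print(one_byte[:2].hex(), two_bytes[:2].hex())
```

Both instructions start with the same opcode byte, so copying a single byte changes nothing; the function number lives in the second byte, and the patch has to cover both bytes (or just the second) to take effect.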

The small 'changer' routine above will cause the program to exit, though in a disassembler it at first appears to be a simple print string routine. Note that by combining indirect addressing with loops, entire subroutines--even programs--can be overwritten, and the code to be written--which may be stored in the program as data--can be encrypted with a simple XOR to disguise it from a disassembler.

The following is a complete asm program to demonstrate patching "live" code; it asks the user for a password, then changes the string to be printed depending on whether or not the password is correct:

; smc1.asm ==================================================================
.286
.model small
.stack 200h
.DATA
;buffer for Keyboard Input, formatted for easy reference:
MaxKbLength  db 05h
KbLength     db 00h
KbBuffer     db 05h dup (00h)    ;4 characters plus the terminating CR
;strings: note the password is not encrypted, though it should be...
szGuessIt    db 'Care to guess the super-secret password?',0Dh,0Ah,'$'
szString1    db 'Congratulations! You solved it!',0Dh,0Ah, '$'
szString2    db 'Ah, damn, too bad eh?',0Dh,0Ah,'$'
secret_word  db "this"
.CODE
;===========================================
start:
        mov  ax,@data                   ; set segment registers:
        mov  ds, ax                     ; DS and ES both point at the data
        mov  es, ax
        call Query                      ; prompt user for password
        mov  ah, 0Ah                    ; DOS 'Get Keyboard Input' function
        mov  dx, offset MaxKbLength     ; start of buffer
        int  21h
        call Compare                    ; compare passwords and patch
exit:
        mov  ah,4ch                     ; 'Terminate to DOS' function
        int  21h
;===========================================
Query proc
        mov  dx, offset szGuessIt       ; Prompt string
        mov  ah, 09h                    ; 'Display String' function
        int  21h
        ret
Query endp
;===========================================
Reply proc
PatchSpot:
        mov  dx, offset szString2       ; 'You failed' string
        mov  ah, 09h                    ; 'Display String' function
        int  21h
        ret
Reply endp
;===========================================
Compare proc
        mov  cx, 4                      ; # of bytes in password
        mov  si, offset KbBuffer        ; start of password-input in Buffer
        mov  di, offset secret_word     ; location of real password
        cld                             ; compare upward through memory
        repe cmpsb                      ; compare them
        jne  bad_guess                  ; last byte compared differs: no patch
        mov  word ptr cs:PatchSpot[1], offset szString1  ;patch to GoodString
bad_guess:
        call Reply                      ; output string to display result
        ret
Compare endp
end start
; EOF =======================================================================

Episode 2: Encryption
---------------------
Encryption is undoubtedly the most common form of SMC code used today. It is used by packers and exe-encryptors to either compress or hide code, by viruses to disguise their contents, and by protection schemes to hide data. The basic format of encryption SMC would be:

        mov reg1, addr-to-write-to
        mov reg2, [reg1]
        manipulate reg2
        mov [reg1], reg2

where 'reg1' would be a register containing the address (offset) of the location to write to, and 'reg2' would be a temporary register which loads the contents of that location and then modifies them via rotation (ROL) or logical (XOR) operations. The address to be patched is stored in reg1, its contents are modified within reg2, and the result is then written back to the original location still stored in reg1.

The program given in the preceding section can be modified so that it decrypts the password by overwriting it (so that it remains decrypted until the program is terminated) by first changing the 'secret_word' value as follows:

secret_word db 06Ch, 04Dh, 082h, 0D0h    ;"this" XORed with magic_key

and then by changing the 'Compare' routine to patch the 'secret_word' location in the data segment:

;===========================================
magic_key db 18h, 25h, 0EBh, 0A3h        ;not very secure!
Compare proc                             ;Step 1: Unencrypt password
        mov  al, [magic_key]             ; put byte1 of XOR mask in al
        mov  bl, [secret_word]           ; put byte1 of password in bl
        xor  al, bl
        mov  byte ptr secret_word, al    ; patch byte1 of password
        mov  al, [magic_key+1]           ; put byte2 of XOR mask in al
        mov  bl, [secret_word+1]         ; put byte2 of password in bl
        xor  al, bl
        mov  byte ptr secret_word[1], al ; patch byte2 of password
        mov  al, [magic_key+2]           ; put byte3 of XOR mask in al
        mov  bl, [secret_word+2]         ; put byte3 of password in bl
        xor  al, bl
        mov  byte ptr secret_word[2], al ; patch byte3 of password
        mov  al, [magic_key+3]           ; put byte4 of XOR mask in al
        mov  bl, [secret_word+3]         ; put byte4 of password in bl
        xor  al, bl
        mov  byte ptr secret_word[3], al ; patch byte4 of password
        mov  cx, 4                       ;Step 2: Compare
        mov  si, offset KbBuffer         ; (no changes from here)
        mov  di, offset secret_word
        cld
        repe cmpsb
        jne  bad_guess                   ; last byte compared differs
        mov  word ptr cs:PatchSpot[1], offset szString1
bad_guess:
        call Reply
        ret
Compare endp

Note the addition of the 'magic_key' location which contains the XOR mask for the password. This whole thing could have been made more sophisticated with a loop, but with only four bytes the above speeds debugging time (and, thereby, article-writing time). Note how the password is loaded, XORed, and re-written one byte at a time; using 32-bit code, the whole (dword) password could be loaded, XORed and re-written at once.

Episode 3: Fooling with the stack
---------------------------------
This is a trick I learned while decompiling some of SunTzu's code. What happens here is pretty interesting: the stack is moved into the code segment of the program, such that the top of the stack is set to the first address to be patched (which, BTW, should be the one closest to the end of the program due to the way the stack works); the byte at this address is then POPed into a register, manipulated, and PUSHed back to its original location. The stack pointer (SP) is then decremented so that the next address to be patched (1 byte lower in memory) is now at the top of the stack.

In addition, the bytes are being XORed with a portion of the program's own code, which disguises somewhat the actual value of the XOR mask. In the following code, I chose to use the bytes from Start: (200h when compiled) up to --but not including-- Exit: (214h when compiled; Exit-1 = 213h). However, as with SunTzu's original code I kept the "reverse" sequence of the XOR mask such that byte 213h is the first byte of the XOR mask, and byte 200h is the last.
After some experimentation I found the reverse mask was the easiest way to sync a patch program--or a hex editor--to the stack-manipulative code; since the stack moves backwards (a forward-moving stack is more trouble than it is worth), using a "reverse" XOR mask allows both filepointers in a patcher to be INCed or DECed in sync.

Why is this an issue? Unlike the previous two examples, the following does not contain the encrypted version of the code-to-be-patched. It simply contains the source code which, when compiled, results in the unencrypted bytes which are then run through the XOR routine, encrypted, and then executed (which, if you have followed thus far, you will immediately recognize to be no good... though it is a fantastic way of crashing the DOS VM!). Once the program is compiled you must either patch the bytes-to-be-decrypted manually, or write a patcher to do the job for you. The former is more expedient; the latter is more certain and is a must if you plan on maintaining the code. In the following example I have embedded 2 CCh's (Int3) in the code at the fore and aft end of the bytes-to-be-decrypted section; a patcher need simply search for these, count the bytes in between, and then XOR with the bytes between 200-213h.

Once again, this sample is a continuation of the previous example. In it, I have written a routine to decrypt the entire 'Compare' routine of the previous section by XORing it with the bytes between 'Start' and 'Exit'. This is accomplished by setting the stack segment equal to the code segment, then setting the stack pointer equal to the end (highest) address of the code to be modified. A byte is POPed from the stack (i.e. its original location), XORed, and PUSHed back to its original location. The next byte is loaded by decrementing the stack pointer. Once all of the code is decrypted, control is returned to the newly-decrypted 'Compare' routine and normal execution resumes.

;===========================================
magic_key db 18h, 25h, 0EBh, 0A3h
Compare proc
        mov  cx, offset EndPatch[1]      ;start addr-to-write-to + 1
        sub  cx, offset patch_pwd        ;end addr-to-write-to
        mov  ax, cs
        mov  dx, ss                      ;save stack segment--important!
        mov  ss, ax                      ;set stack segment to code segment
        mov  bx, sp                      ;save stack pointer
        mov  sp, offset EndPatch         ;start addr-to-write-to
        mov  si, offset Exit-1           ;start addr of XOR mask
XorLoop:
        pop  ax                          ;get byte-to-patch into AL
        xor  al, cs:[si]                 ;XOR al with XorMask (mask is in CS)
        push ax                          ;write byte-to-patch back to memory
        dec  sp                          ;load next byte-to-patch
        dec  si                          ;load next byte of XOR mask
        cmp  si, offset Start            ;end addr of XOR mask
        jae  GoLoop                      ;if not at end of mask, keep going
        mov  si, offset Exit-1           ;start XOR mask over
GoLoop:
        loop XorLoop                     ;XOR next byte
        mov  sp, bx                      ;restore stack pointer
        mov  ss, dx                      ;restore stack segment
        jmp  patch_pwd
        db   0CCh,0CCh                   ;Identification mark: START
patch_pwd:                               ;no changes from here
        mov  al, [magic_key]
        mov  bl, [secret_word]
        xor  al, bl
        mov  byte ptr secret_word, al
        mov  al, [magic_key+1]
        mov  bl, [secret_word+1]
        xor  al, bl
        mov  byte ptr secret_word[1], al
        mov  al, [magic_key+2]
        mov  bl, [secret_word+2]
        xor  al, bl
        mov  byte ptr secret_word[2], al
        mov  al, [magic_key+3]
        mov  bl, [secret_word+3]
        xor  al, bl
        mov  byte ptr secret_word[3], al
        ;compare password
        mov  cx, 4
        mov  si, offset KbBuffer
        mov  di, offset secret_word
        rep  cmpsb
        or   cx, cx
        jnz  bad_guess
        mov  word ptr cs:PatchSpot[1], offset szString1
bad_guess:
        call Reply
        ret
Compare endp
EndPatch:
        db   0CCh, 0CCh                  ;Identification Mark: END

This kind of program is very hard to debug. For testing, I substituted 'xor al, [si]' first with 'xor al, 00h', which would cause no encryption and is useful for testing code for final bugs, and then with 'xor al, 0EBh', which allowed me to verify that the correct bytes were being encrypted (it never hurts to check, after all).

Episode 4: Summation
--------------------
That should demonstrate the basics of self-modifying code. There are a few techniques to consider to make development easier, though really any SMC programs will be tricky.

The most important thing is to get your program running completely before you start overwriting any of its code segments. Next, always create a program that performs the reverse of any decryption/encryption code--not only does this speed up compilation and testing by automating the encryption of code areas that will be decrypted at runtime, it also provides a good tool for error checking using a disassembler (i.e. encrypt the code, disassemble, decrypt the code, disassemble, compare). In fact, it is a good idea to encapsulate the SMC portion of your program in a separate executable and test it on the compiled "release product" until all of the bugs are out of the decryption routine, and only then add the decryption routine to your final code. The CCh 'landmarks' (codemarks?) are extremely useful as well.

Finally, for DOS applications do your debugging in a Windows DOS box--the debugger is quick, small, and if it crashes you simply lose the DOS box. The ability to view the program address space after the program has terminated but before it is unloaded is another distinct advantage.

More complex examples of SMC programs can be found in Dark Angel's code, the Rhince engine, or in any of the permutation engines used in polymorphic viruses.
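The companion patcher recommended above (find the two CCh marks, count the bytes between them, XOR them against a mask walked in reverse) might look roughly like the Python sketch below. This is an illustration under assumptions rather than the author's tool: the mask is passed in as a plain byte string instead of being read from offsets 200h-213h of the executable, the region is assumed not to contain a CCh CCh pair of its own, and all names are invented.

```python
MARK = b"\xCC\xCC"                  # the two Int3 bytes used as landmarks


def find_region(image):
    """Return (start, end) offsets of the bytes between the two marks."""
    first = image.index(MARK)
    start = first + len(MARK)
    end = image.index(MARK, start)
    return start, end


def xor_region(image, mask):
    """XOR the marked region in place, highest address first.

    The mask is also consumed from its last byte down to its first and
    starts over at its last byte, mirroring the stack-based XorLoop.
    Returns the number of bytes processed.
    """
    start, end = find_region(image)
    m = len(mask) - 1
    for addr in range(end - 1, start - 1, -1):
        image[addr] ^= mask[m]
        m = m - 1 if m > 0 else len(mask) - 1
    return end - start


demo = bytearray(b"\x90\x90" + MARK + bytes([1, 2, 3, 4, 5]) + MARK + b"\x90")
demo_mask = bytes([0x10, 0x20, 0x30])
count = xor_region(demo, demo_mask)     # "encrypt"
encrypted = bytes(demo[4:9])
xor_region(demo, demo_mask)             # running it again "decrypts"
print(count, encrypted.hex(), bytes(demo[4:9]).hex())
```

Because XOR is its own inverse, the same routine serves as both the encryptor and the decryptor, which is what makes the encrypt/disassemble/decrypt/compare check cheap to automate.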
Acknowledgements go to Sun-Tzu for the stack technique used in his ghf-crackme program.


:::\_____\:::::::::::...........................................FEATURE.ARTICLE
Going Ring0 in Windows 9x
by Halvar Flake

This article gives a short overview of two ways to go Ring0 in Windows 9x in an undocumented way, exploiting the fact that none of the important system tables in Win9x are on pages which are protected from low-privilege access. A basic knowledge of Protected Mode and OS Internals is required; refer to your Assembly Book for that :-) The techniques presented here are in no way a good/clean way to get to a higher privilege level, but since they require only a minimal coding effort, they are sometimes more desirable to implement than a full-fledged VxD.

1. Introduction
---------------
Under all modern Operating Systems, the CPU runs in protected mode, taking advantage of the special features of this mode to implement virtual memory, multitasking etc. To manage access to system-critical resources (and to thus provide stability) an OS is in need of privilege levels, so that a program can't just switch out of protected mode etc. These privilege levels are represented on the x86 (I refer to x86 meaning 386 and following) CPU by 'Rings', with Ring0 being the most privileged and Ring3 being the least privileged level. Theoretically, the x86 is capable of 4 privilege levels, but Win32 uses only two of them, Ring0 as 'Kernel Mode' and Ring3 as 'User Mode'.

Since Ring0 is not needed by 99% of all applications, the only documented way to use Ring0 routines in Win9x is through VxDs. But VxDs, while being the only stable and recommended way, are large and laborious to write, so in a couple of specialized situations, other ways to go Ring0 are useful.

The CPU itself handles privilege level transitions in two ways: through Exceptions/Interrupts and through Callgates. Callgates can be put in the LDT or GDT; Interrupt-Gates are found in the IDT. We'll take advantage of the fact that these tables can be freely written to from Ring3 in Win9x (NOT IN NT !).

2. The IDT method
-----------------
If an exception occurs (or is triggered), the CPU looks in the IDT for the corresponding descriptor. This descriptor gives the CPU an Address and Segment to transfer control to.
An Interrupt Gate descriptor looks like this:

+--------------------------------+--+-----+--+---------+-------+-----------+
|       1. Offset (16-31)        | P| DPL | 0| 1 1 1 0 | 0 0 0 | R R R R R | +4
+--------------------------------+--+-----+--+---------+-------+-----------+
|      2. Segment Selector       |            3. Offset (0-15)             |  0
+--------------------------------+-----------------------------------------+

DPL == Two bits containing the Descriptor Privilege Level
P   == Present bit
R   == Reserved bits

The first word (Nr.3) contains the lower word of the 32-bit address of the Exception Handler. The word at +6 contains the high-order word. The word at +2 is the selector of the segment in which the handler resides. The word at +4 identifies the descriptor as an Interrupt Gate and contains its privilege level and the present bit.

Now, to use the IDT to go Ring0, we'll create a new Interrupt Gate which points to our Ring0 procedure, save an old one and replace it with ours. Then we'll trigger that exception. Instead of passing control to Windows' own handler, the CPU will now execute our Ring0 code. As soon as we're done, we'll restore the old Interrupt Gate.
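The descriptor layout can be made concrete by packing one in a few lines. The sketch below is Python, for illustration only: the field positions follow the layout described above, while the handler address 12345678h, the base address 1000h and all function names are made-up placeholders. The selector 0028h is the one this article uses for its Ring0 code segment.

```python
import struct

INTERRUPT_GATE = 0b1110     # 4-bit type field of a 32-bit interrupt gate
CALL_GATE = 0b1100          # 4-bit type field of a 32-bit call gate


def gate_flags(present, dpl, gate_type):
    """Build the word at +4: P in bit 15, DPL in bits 14-13, type in bits 11-8."""
    return (present << 15) | (dpl << 13) | (gate_type << 8)


def pack_gate(offset, selector, flags):
    """Lay out the 8-byte gate: offset low, selector, flags, offset high."""
    return struct.pack("<HHHH", offset & 0xFFFF, selector, flags, offset >> 16)


def descriptor_address(table_base, vector):
    """Each descriptor is 8 bytes long, so vector n lives at base + 8 * n."""
    return table_base + 8 * vector


flags = gate_flags(present=1, dpl=3, gate_type=INTERRUPT_GATE)
gate = pack_gate(0x12345678, 0x0028, flags)
print(f"{flags:04X}h", gate.hex(), hex(descriptor_address(0x1000, 9)))
```

For a present gate that may be invoked from Ring3 (DPL 3) the flags word comes out as EE00h; the same helper with the call gate type gives EC00h.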

In Win9x, the selector 0028h always points to a Ring0-Code Segment, which spans the entire 4 GB address range. We'll use this as our Segment selector. The DPL has to be 3, as we're calling from Ring3, and the present bit must be set. So the word at +4 will be 1110111000000000b => EE00h. These values can be hardcoded into our program; we just have to add the offset of our Ring0 Procedure to the descriptor. As exception, you should preferably use one that rarely occurs, so do not use int 0Eh (exception 14, the page fault) ;-) I'll use int 9h, since it is (to my knowledge) not used on 486+.

Example code follows (to be compiled with TASM 5):

-------------------------------- bite here -----------------------------------
.386P
LOCALS
JUMPS
.MODEL FLAT, STDCALL

EXTRN ExitProcess : PROC

.data
IDTR        df 0            ; This will receive the contents of the IDTR
                            ; register
SavedGate   dq 0            ; We save the gate we replace in here
OurGate     dw 0            ; Offset low-order word
            dw 028h         ; Segment selector
            dw 0EE00h       ; P=1, DPL=3, 32-bit Interrupt Gate
            dw 0            ; Offset high-order word

.code
Start:
        mov     eax, offset Ring0Proc
        mov     [OurGate], ax           ; Put the offset words
        shr     eax, 16                 ; into our descriptor
        mov     [OurGate+6], ax
        sidt    fword ptr IDTR
        mov     ebx, dword ptr [IDTR+2] ; load IDT Base Address
        add     ebx, 8*9                ; Address of int9 descriptor in ebx
        mov     edi, offset SavedGate
        mov     esi, ebx
        movsd                           ; Save the old descriptor
        movsd                           ; into SavedGate
        mov     edi, ebx
        mov     esi, offset OurGate
        movsd                           ; Replace the old handler
        movsd                           ; with our new one
        int     9h                      ; Trigger the exception, thus
                                        ; passing control to our Ring0
                                        ; procedure
        mov     edi, ebx
        mov     esi, offset SavedGate
        movsd                           ; Restore the old handler
        movsd
        call    ExitProcess, LARGE -1

Ring0Proc PROC
        mov     eax, CR0
        iretd
Ring0Proc ENDP
end Start
-------------------------------- bite here -----------------------------------

3. The LDT Method
-----------------
Another possibility of executing Ring0-Code is to install a so-called callgate in either the GDT or LDT. Under Win9x it is a little bit easier to use the LDT, since the first 16 descriptors in it are always empty, so I will only give source for that method here.

A Callgate is similar to an Interrupt Gate and is used in order to transfer control from a low-privileged segment to a high-privileged segment using a CALL instruction. The format of a callgate is:

+--------------------------------+--+-----+--+---------+-------+-----------+
|       1. Offset (16-31)        | P| DPL | 0| 1 1 0 0 | 0 0 0 |    DWC    | +4
+--------------------------------+--+-----+--+---------+-------+-----------+
|      2. Segment Selector       |            3. Offset (0-15)             |  0
+--------------------------------+-----------------------------------------+

P   == Present bit
DPL == Descriptor Privilege Level
DWC == Dword Count, number of arguments copied to the ring0 stack

So all we have to do is to create such a callgate, write it into one of the first 16 descriptors, then do a far call to that descriptor to execute our Ring0 code.

Example Code:

-------------------------------- bite here -----------------------------------
.386P
LOCALS
JUMPS
.MODEL FLAT, STDCALL

EXTRN ExitProcess : PROC

.data

GDTR        df 0            ; This will receive the contents of the GDTR
                            ; register
CallPtr     dd 00h          ; As we're using the first descriptor (8) and
            dw 0Fh          ; it's located in the LDT and the privilege level
                            ; is 3, our selector will be 000Fh.
                            ; That is because the low-order two bits of the
                            ; selector are the privilege level, and the 3rd
                            ; bit is set if the selector is in the LDT.
OurGate     dw 0            ; Offset low-order word
            dw 028h         ; Segment selector
            dw 0EC00h       ; P=1, DPL=3, 32-bit Callgate, no parameters
            dw 0            ; Offset high-order word

.code
Start:
        mov     eax, offset Ring0Proc
        mov     [OurGate], ax           ; Put the offset words
        shr     eax, 16                 ; into our descriptor
        mov     [OurGate+6], ax
        xor     eax, eax
        sgdt    fword ptr GDTR
        mov     ebx, dword ptr [GDTR+2] ; load GDT Base Address
        sldt    ax
        add     ebx, eax                ; Address of the LDT descriptor in ebx
        mov     al, [ebx+4]             ; Load the base address
        mov     ah, [ebx+7]             ; of the LDT itself into
        shl     eax, 16                 ; eax, refer to your pmode
        mov     ax, [ebx+2]             ; manual for details
        add     eax, 8                  ; Skip NULL Descriptor
        mov     edi, eax
        mov     esi, offset OurGate
        movsd                           ; Move our custom callgate
        movsd                           ; into the LDT
        call    fword ptr [CallPtr]     ; Execute the Ring0 Procedure
        xor     eax, eax                ; Clean up the LDT
        sub     edi, 8
        stosd
        stosd
        call    ExitProcess, LARGE -1

Ring0Proc PROC
        mov     eax, CR0
        retf
Ring0Proc ENDP
end Start
-------------------------------- bite here -----------------------------------
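Two pieces of arithmetic in the listing are easy to get wrong: composing the selector 000Fh, and reassembling the LDT base address from the scattered bytes of its GDT descriptor (the al/ah/shl sequence). The Python sketch below illustrates both. It is an illustration only; the function names and the sample descriptor bytes are invented for the example.

```python
def make_selector(index, in_ldt, rpl):
    """Selector = descriptor index * 8, plus the table bit, plus the RPL."""
    return (index << 3) | (int(in_ldt) << 2) | rpl


def descriptor_base(desc):
    """Reassemble the 32-bit base from an 8-byte segment descriptor.

    Bits 0-15 of the base are stored at +2, bits 16-23 at +4 and
    bits 24-31 at +7, the same bytes the assembly code picks out.
    """
    return desc[2] | (desc[3] << 8) | (desc[4] << 16) | (desc[7] << 24)


# Hypothetical LDT descriptor: limit FFFFh, base 80012345h, present, type LDT
sample = bytes([0xFF, 0xFF, 0x45, 0x23, 0x01, 0x82, 0x00, 0x80])

selector = make_selector(index=1, in_ldt=True, rpl=3)
first_free_slot = descriptor_base(sample) + 8    # skip the NULL descriptor
print(f"{selector:04X}h", f"{first_free_slot:08X}h")
```

The same `make_selector` helper with `in_ldt=False` would give the selector for a gate placed in the GDT instead.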

Well, that's all for now folks. This method can be easily changed to use the GDT instead, which would save a few bytes in case you have to optimize heavily. Anyways, do use these methods with care: they will NOT run on NT and are generally not exactly a clean or stable way to do these things.

Credits & Thanks
----------------
The IDT-Method was taken from the CIH virus & Stone's example source. The LDT-Method was done by me, but without IceMans & The_Owls help I would still be stuck, so all credits go to them.


:::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING
Win32 ASM: The Basics
by Iczelion

The required tools:

-Microsoft Macro Assembler 6.1x : MASM support of Win32 programming starts from version 6.1. The latest version is 6.13, which is a patch to the previous version, 6.11. The Win98 DDK includes MASM 6.11d, which you can download from Microsoft. But be warned, this monstrosity is huge, 18.5 MB in size. The MASM 6.13 patch can also be downloaded.

-Microsoft import libraries : You can use the import libraries from Visual C++. Some are included in the Win98 DDK.

-Win32 API Reference : You can download it from Borland's site.

Here's a brief description of the assembly process. MASM 6.1x comes with two essential tools: ml.exe and link.exe. ml.exe is the assembler. It takes in the assembly source code (.asm) and produces an object file (.obj). An object file is an intermediate file between the source code and the executable file. It needs some address fixups, which are among the services provided by link.exe. Link.exe makes an object file into an executable file by several means, such as adding the code from other modules to the object files, providing the address fixups, adding resources, etc. For example:

        ml skeleton.asm    ---> this produces skeleton.obj
        link skeleton.obj  ---> this produces skeleton.exe

The above lines are a simplification of course. In the real world, you must add several switches to ml.exe and link.exe to customize your application. Also there will be several files you must link with the object file in order to create your application.

Win32 programs run in protected mode, which has been available since the 80286. But the 80286 is now history. So we only have to concern ourselves with the 80386 and its descendants. Windows runs each Win32 program in a separate virtual address space. That means each Win32 program will have its own 4 GB address space. Each program is alone in its address space. This is in contrast to the situation in Win16. All Win16 programs can *see* each other. Not so in Win32. This feature helps reduce the chance of one program writing over another program's code/data.

The memory model is also drastically different from the old days of the 16-bit world. Under Win32, we need not be concerned with memory models or segments anymore! There's only one memory model: the flat memory model. There are no more 64K segments. The memory is a large continuous space of 4 GB. That also means you don't have to play with segment registers. You can use any segment register to address any point in the memory space. That's a GREAT help to programmers. This is what makes Win32 assembly programming as easy as C.

We will examine a minimal skeleton of a Win32 assembly program. We'll add more flesh to it later. Here's the skeleton program. If you don't understand some of the code, don't panic. I'll explain each part later.

        .386
        .MODEL Flat, STDCALL
        .DATA
            <Your initialized data>
            ......
        .DATA?
            <Your uninitialized data>
            ......
        .CONST
            <Your constants>
            ......
        .CODE
        <label>
            <Your code>
            .....
        end <label>

That's all! Let's analyze this skeleton program.

.386
This is an assembler directive, telling the assembler to use the 80386 instruction set. You can also use .486 or .586, but the safest bet is to stick to .386.

.MODEL FLAT, STDCALL
.MODEL is an assembler directive that specifies the memory model of your program. Under Win32, there's only one model, the FLAT model. STDCALL tells MASM about the parameter passing convention. The parameter passing convention specifies the order of parameter passing, left-to-right or right-to-left, and also who will balance the stack frame after the function call.
Under Win16, there are two types of calling convention, C and PASCAL. The C
calling convention passes parameters to the function from right to left, that
is, the rightmost parameter is pushed on the stack first. The caller is
responsible for balancing the stack frame after the call. For example, in order
to call a function named foo(int first_param, int second_param, int
third_param) in the C calling convention, the asm code will look like this:

    push [third_param]      ; Push the third parameter
    push [second_param]     ; Followed by the second
    push [first_param]      ; And the first
    call foo
    add  sp, 12             ; The caller balances the stack frame

The PASCAL calling convention is the reverse of the C calling convention. It
pushes parameters on the stack from left to right and the callee is responsible
for balancing the stack after the call. Win16 adopts the PASCAL convention
because it produces smaller code. The C convention is useful when you don't
know how many parameters will be passed to the function, as in the case of
wsprintf(). In the case of wsprintf(), the function has no way to determine
beforehand how many parameters will be pushed on the stack, so it cannot
balance the stack correctly. The caller is the one who knows how many bytes are
pushed on the stack, so it's right and proper that it's also the one who
balances the stack frame after the call.

STDCALL is the hybrid of the C and PASCAL conventions. It pushes parameters on
the stack from right to left, but the callee is responsible for balancing the
stack after the call. The Win32 platform uses STDCALL exclusively. Except in
one case: wsprintf(). You must use the C calling convention with wsprintf().

    .DATA
    .DATA?
    .CONST
    .CODE

All four directives are what are called sections. You don't have segments in
Win32 anymore, remember? But you can divide your entire address space into
logical sections. The start of one section denotes the end of the previous
section. There are two groups of sections: data and code. Data sections are
divided into 3 categories:

 * .DATA   This section contains the initialized data of your program.
 * .DATA?  This section contains the uninitialized data of your program.
           Sometimes you just want to preallocate some memory but don't want
           to initialize it. This section exists for that purpose.
 * .CONST  This section contains declarations of constants used by your
           program. Constants in this section can never be modified in your
           program. They are just *constant*.

You don't have to use all three sections in your program. Declare only the
section(s) you want to use.

There's only one section for code: .CODE. This is where your code resides.
Example:

    <label>
    end <label>

...where <label> is an arbitrary label used to specify the extent of your
code. Both labels must be identical. All your code must reside between <label>
and end <label>.

::/ \::::::.
:/___\:::::::.
/ \::::::::.
: _/\:::::::::.
: _ \ \::::::::::.
:::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING
MessageBox Display
by Iczelion

We will create a fully functional Windows program that displays a message box
saying "Win32 assembly is great!".

Windows prepares a wealth of resources for use by Windows programs. Central to
this is the Windows API (Application Programming Interface). The Windows API
is a huge collection of very useful functions that reside in Windows itself,
ready to be used by any Windows program. These functions are stored in several
dynamic-link libraries (DLLs) such as kernel32.dll, user32.dll and gdi32.dll,
to name a few. Kernel32.dll contains API functions that deal with memory and
process management. User32.dll controls the user interface aspects of your
programs. Gdi32.dll is responsible for graphics operations. Other than "the
main three", there are other DLLs that your program can make use of, provided
you have enough information about the desired API functions stored in them.

Windows programs dynamically link to these DLLs, i.e. the code of the API
functions is not included in the executable file. This is very different from
what's called static linking, in which the actual code from software libraries
is included in the executable file. In order for programs to know where to find
the desired API functions at runtime, enough information must be embedded into
the executable file for it to be able to select the correct DLLs and the
correct functions. That information is in import libraries. You must link your
programs with the correct import libraries or they will not be able to locate
the desired API functions.

There are two types of API functions: one for ANSI and the other for Unicode.
The names of API functions for ANSI are postfixed with "A", e.g. MessageBoxA.
Those for Unicode are postfixed with "W" (for Wide Char, I think). Windows 95
natively supports ANSI and Windows NT Unicode. But most of the time, you will
use an include file which can determine and select the appropriate API
functions for your platform. Just refer to the API function name without the
postfix.

I'll present the bare program skeleton below. We will fill it out later.
    .386
    .model flat, stdcall
    .data
    .code
    Main:
    end Main

Every Windows program must call an API function, ExitProcess, when it wants to
quit to Windows. In this respect, ExitProcess is equivalent to int 21h, ah=4Ch
in DOS. Here's the function prototype of ExitProcess from winbase.h:

    void WINAPI ExitProcess(UINT uExitCode);

 -void means the function does not return any value to the caller.
 -WINAPI is an alias of the STDCALL calling convention.
 -UINT is a data type, "unsigned integer", which is a 32-bit value under Win32
  (it's a 16-bit value under Win16).
 -uExitCode is the 32-bit return code to Windows. This value is not used by
  Windows as of now.

In order to call ExitProcess from an assembly program, you must first declare
the function prototype for ExitProcess.

    .386
    .model flat, stdcall
    ExitProcess PROTO :DWORD
    .data
    .code
    Main:
        invoke ExitProcess, 0
    end Main

That's it. Your first working Win32 program. Save it under the name
msgbox.asm. Assuming ml.exe is in your path, assemble msgbox.asm with:

    ml /c /coff /Cp msgbox.asm

 /c     tells MASM to assemble the source file into an object file only and
        not to invoke Link.exe automatically.
 /coff  tells MASM to create the .obj file in COFF format.
 /Cp    tells MASM to preserve the case of user identifiers.

Then go on with link:

    link /SUBSYSTEM:WINDOWS /LIBPATH:c:\masm\lib msgbox.obj kernel32.lib

 /SUBSYSTEM:WINDOWS  informs Link.exe on which platform the executable is
                     intended to run.
 /LIBPATH:<path to import library>  tells Link where the import libraries
                     are. On my PC, they're located in c:\masm\lib.

Now you have msgbox.exe. Go on, run it. You'll find that it does nothing.
Well, we haven't put anything interesting in it yet. But it's a Windows program
nonetheless. And look at its size! On my PC, it is 1,536 bytes.

The line:

    ExitProcess PROTO :DWORD

is a function prototype. You create one by declaring the function name followed
by the keyword "PROTO" and a list of the data types of the parameters, each
prefixed by a colon. MASM uses function prototypes for type checking, which
prevents nasty stack errors that might otherwise pass unnoticed.

The best place for function prototypes is in an include file. You can create an
include file full of frequently used function prototypes and data structures
and include it at the beginning of your asm source code.

You call the API function by using the "invoke" keyword:

    invoke ExitProcess, 0

INVOKE is really a kind of high-level call. It checks the number and types of
the parameters and pushes them on the stack according to the specified calling
convention (in this case, stdcall). By using INVOKE instead of a normal call,
you can prevent stack errors from incorrect parameter passing. Very useful. The
syntax is:

    INVOKE expression [,arguments]

where expression is a label or function name.

Next we're going to put a message box in our program. Its function declaration
is:

    int WINAPI MessageBoxA(HWND hwnd, LPCSTR lpText, LPCSTR lpCaption,
                           UINT uType);

 -hwnd is the handle to the parent window
 -lpText is a pointer to the text you want to display in the client area of
  the message box
 -lpCaption is a pointer to the caption of the message box
 -uType specifies the icon and the number and type of buttons on the message
  box

Under Win32, HWND, LPCSTR, and UINT are all 32 bits in size.

Let's modify msgbox.asm to include the message box.

    .386
    .model flat, stdcall
    ExitProcess PROTO :DWORD
    MessageBoxA PROTO :DWORD, :DWORD, :DWORD, :DWORD
    .data
    MsgBoxCaption db "Our First Program",0
    MsgBoxText    db "Win32 Assembly is Great!",0
    .const
    NULL  equ 0
    MB_OK equ 0
    .code
    Main:
        INVOKE MessageBoxA, NULL, ADDR MsgBoxText, ADDR MsgBoxCaption, MB_OK
        INVOKE ExitProcess, NULL
    end Main

Assemble it by:

    ml /c /coff /Cp msgbox.asm
    link /SUBSYSTEM:WINDOWS /LIBPATH:c:\masm\lib msgbox kernel32.lib user32.lib

You have to include user32.lib in your Link parameters, since the link info of
MessageBoxA is in user32.lib. You'll see a message box displaying the text
"Win32 Assembly is Great!".

Let's look again at the source code. We define two zero-terminated strings in
the .data section. Remember that all strings in Windows must be terminated with
zero (ASCIIZ). We define two constants in the .const section. We use constants
to improve the clarity of the source code.

Look at the parameters of MessageBoxA. The first parameter is NULL. This means
that there's no window that *owns* this message box. The operator "ADDR" is
used to pass the address of the label to the function. This operator is
specific to MASM. No TASM-equivalent exists. It functions like the "OFFSET"
operator but with some differences:

 1. It doesn't accept forward references. If you want to use "ADDR foo", you
    have to declare "foo" before using the ADDR operator.
 2. It can be used with a local variable. A local variable is a variable that
    is created on the stack. The OFFSET operator cannot be used in this
    situation because the assembler doesn't know the true address of the local
    variable at assemble time.

::/ \::::::.
:/___\:::::::.
/ \::::::::.
: _/\:::::::::.
: _ \ \::::::::::.
:::\_____\:::::::::::........................THE.C.STANDARD.LIBRARY.IN.ASSEMBLY
The _itoa, _ltoa and _ultoa functions
by Xbios2

ATTENTION I: This is based on Borland's C++ 4.02. Whenever possible I've
checked it with any other library / program containing the specific functions,
but differences may exist between this and your version of C. Also this is
strictly 32-bit code, Windows compiler. No DOS or UNIX.

ATTENTION II: Size comparisons are extremely easy to do. Speed comparisons
aren't. The differences in speed I give are based on RDTSC timings, but they
DON'T take into account extreme cases. That's why I don't give exact clock
cycles. Of course if you need exact clock cycles for your Pentium II, you can
always buy me one :)

The C language offers three functions to convert an integer to ASCII:

    char *itoa(int value, char *string, int radix);
    char *ltoa(long value, char *string, int radix);
    char *ultoa(unsigned long value, char *string, int radix);

_itoa and _ltoa do _exactly_ the same thing. This is because an integer _is_ a
long in 32-bit code. Yet they are different: _itoa has some _completely_
useless code in it (in 16-bit this code would sign-extend value if radix=10).
The result is always the same, though, so _ltoa from here on means both _ltoa
and _itoa. _ultoa is exactly the same as _ltoa and _itoa, except when radix=10
and value < 0.

Anyway, all these functions call this function:

    ___longtoa(value, *string, radix, signed, char10)

The first three parameters are passed 'as is'. signed is set to 1 by _ltoa if
radix=10, else it is set to 0. char10 is the character that corresponds to 10
if radix>10, and is always set to 'a' (___longtoa is also used by printf, which
has an option to have uppercase chars in Hex).
___longtoa does the following (and it does it with badly written code):

 1. Checks that 2<=radix<=36; if it isn't, returns '0'
 2. If signed=1 and value<0, adds a '-' to the string and negates the value
 3. Loop1: creates a pseudo-string in the stack, reversed
 4. Loop2: converts and copies the pseudo-string into string

The check on radix is necessary because:

    radix=0   would generate an INT0 (divide by zero)
    radix=1   would put the program in an infinite loop, destroying the stack
    radix=37  for value=36 would return '{', the character after 'z'
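As a rough illustration of the four steps above, here is a minimal C sketch of
the same two-loop conversion. It is not Borland's code: the name
longtoa_sketch and its parameter list are made up for this example, and an
out-of-range radix is assumed to yield an empty string (only the terminator is
written).

```c
#include <stddef.h>

/* Hypothetical helper, not a library function: converts value to text in
   the given radix. is_signed and char10 mirror the two extra parameters
   described in the text. Returns string. */
char *longtoa_sketch(long value, char *string, int radix,
                     int is_signed, char char10)
{
    char tmp[sizeof(unsigned long) * 8];  /* enough digits for radix 2 */
    size_t n = 0;
    char *out = string;
    unsigned long u = (unsigned long)value;

    /* Step 1: reject radix values that would divide by zero, loop
       forever, or run past 'z'. Assumption: yield an empty string. */
    if (radix < 2 || radix > 36) {
        *out = '\0';
        return string;
    }

    /* Step 2: signed negative values get a '-' and are negated. */
    if (is_signed && value < 0) {
        *out++ = '-';
        u = 0UL - u;  /* unsigned negate, also safe for the minimum value */
    }

    /* Loop 1: digits come out least significant first. */
    do {
        tmp[n++] = (char)(u % (unsigned long)radix);
        u /= (unsigned long)radix;
    } while (u != 0);

    /* Loop 2: copy them back in reverse, converting to characters. */
    while (n > 0) {
        char d = tmp[--n];
        *out++ = (char)(d < 10 ? '0' + d : char10 + (d - 10));
    }
    *out = '\0';
    return string;
}
```

For instance, longtoa_sketch(-255, buf, 10, 1, 'a') leaves "-255" in buf, and
passing 'A' as char10 gives uppercase digits for radix 16.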

The two loops are necessary because of the way the conversion is done (see
code later). To implement a single-loop conversion, the number of digits would
have to be calculated in advance, which results in less efficient code (the
number of digits in value is n=(int)(log(value)/log(radix))+1, but using one
more loop is much faster).

Including the disassembly of C's functions would create a really large
article, and anyway they're just examples of really bad code. So straight to
the result:

ltoa    proc
        cmp     dword ptr [esp+0Ch], 10
        sete    ch
        mov     cl, 'a'-'0'-10
        jmp     short longtoa
ultoa:  mov     cx, 'a'-'0'-10          ; _ultoa entry: signed=0 in CH
longtoa:
        push    ebx
        push    edi
        push    esi
        sub     esp, 24h
        mov     ebx, [esp+3Ch]          ; radix
        mov     eax, [esp+34h]          ; value
        mov     edi, [esp+38h]          ; string
        cmp     ebx, 2
        jl      short _ret
        cmp     ebx, 36
        jg      short _ret
        or      eax, eax
        jge     short skip
        cmp     byte ptr ch, 0          ; _ltoa ?
        jz      short skip
        mov     byte ptr [edi], '-'
        inc     edi
        neg     eax
skip:   mov     esi, esp
loop1:  xor     edx, edx
        div     ebx
        mov     [esi], dl
        inc     esi
        or      eax, eax
        jnz     loop1
loop2:  dec     esi
        mov     al, [esi]
        cmp     al, 10
        jl      short nochar
        add     al, cl
nochar: add     al, '0'
        stosb
        cmp     esi, esp
        jg      short loop2
_ret:   mov     byte ptr [edi], 0
        mov     eax, [esp+38h]
        add     esp, 24h
        pop     esi
        pop     edi
        pop     ebx
        ret
ltoa    endp
This is a 3 into 1 procedure. ltoa and ultoa take the same parameters as the
standard C functions. longtoa was changed to take from the stack the same
parameters as ltoa and ultoa, while signed and char10 are passed in CH and CL
respectively. This way ltoa and ultoa 'see' longtoa as 'their' code, not as a
different procedure (this is to avoid a common problem in C, procedures that
just 'forward' their parameters to another function).

This code compiles to 102 bytes (and it could be optimized to gain some more
bytes) whereas the standard C code takes 270 bytes. Specifically:

    function    C size    Asm size
    ------------------------------
    itoa            60           0
    ltoa            40          12
    ultoa           27           4
    longtoa        143          86
                ----------
    total          270         102

It also runs 2x faster than ltoa. And of course, this is a fully C-compatible
version of ltoa and ultoa. Of course it can be changed from C-compatible to
suit specific needs (e.g. make it stdcall instead of cdecl, or if speed and
size are needed remove the check for the radix, and so on...)

Anyway, it is rather unlikely that you'll ever use values of radix other than
2, 8, 10 or 16. So if speed or size is of the essence, a better, more specific
routine can be written. For example, consider this routine, which stores the
value of EAX as a binary number at the address specified by EDI:

ultob   proc
        mov     ecx, 32
more1:  shl     eax, 1
        dec     ecx
        jc      more2
        jnl     more1
more2:  setc    dl
        add     dl, '0'
        shl     eax, 1
        mov     [edi], dl
        inc     edi
        dec     ecx
        jnl     more2
        mov     [edi], al
        ret
ultob   endp
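To make the logic of ultob easier to follow, here is a minimal C sketch of the
same idea: skip the leading zero bits, then emit one character per remaining
bit. The name ultob_sketch and the use of uint32_t are choices made for this
example only.

```c
#include <stdint.h>

/* Hypothetical C counterpart of ultob: writes value as a binary string
   without leading zeros (a lone "0" for zero) and returns dest.
   dest must have room for 33 bytes. */
char *ultob_sketch(uint32_t value, char *dest)
{
    char *out = dest;
    int bits = 32;

    /* Like the more1 loop: drop leading zero bits, but keep the last bit
       so that zero still prints as "0". */
    while (bits > 1 && (value & 0x80000000u) == 0) {
        value <<= 1;
        bits--;
    }

    /* Like the more2 loop: emit the top bit, then shift the next one up. */
    while (bits-- > 0) {
        *out++ = (value & 0x80000000u) ? '1' : '0';
        value <<= 1;
    }
    *out = '\0';
    return dest;
}
```

ultob_sketch(5, buf) produces "101" and ultob_sketch(0, buf) produces "0". The
speed and size figures quoted in this article apply to the assembly routines,
not to this sketch.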

The ultob routine runs 14x faster than C ltoa, and 7x faster than Asm ltoa,
and is only 29 bytes long. But this article is long enough, so wait for another
article on specific 'ltoa' functions (who knows, maybe if I decide to write a
'printf' function in Asm, which would use them...).

::/ \::::::.
:/___\:::::::.
/ \::::::::.
: _/\:::::::::.
: _ \ \::::::::::.
:::\_____\:::::::::::............................................THE.UNIX.WORLD
x86 ASM Programming for Linux
by mammon_

Essentially this article is an excuse to combine two of my favorite coding
interests: the Linux operating system and assembly language programming. Both
of these need (or should need) no introduction; like Win32 assembly, Linux
assembly runs in 32-bit protected mode...however it has the distinct advantage
of allowing you to call the C standard library functions as well as any of the
usual Linux "shared" library functions. I have begun with a brief introduction
on compiling assembly language programs in Linux; for greater readability you
may want to skip over this to the "Basics" section.

Compiling And Linking
---------------------
The two main assemblers for Linux are Nasm, the (free) Netwide Assembler, and
GAS, the (also free) Gnu Assembler which is integrated into GCC. I will focus
on Nasm in this article and leave GAS for a later date, as it uses the AT&T
syntax and thus would require a lengthy introduction.

Nasm should be invoked with the ELF format option ("nasm -f elf hello.asm");
the resulting object is linked with GCC ("gcc hello.o") to produce the final
ELF binary. The following script can be used to compile ASM modules; I wrote it
to be very simple, so all it does is take the first filename passed to it (I
recommend naming it with a ".asm" extension), compile it with nasm, and link it
with gcc.

#!/bin/sh
# =========================================================
outfile=${1%%.*}
tempfile=asmtemp.o
nasm -o $tempfile -f elf $1
gcc $tempfile -o $outfile
rm $tempfile -f
#EOF ==================================================================

The Basics
----------
It is best, of course, to start off with an example before launching into the
OS details.
Here is a very basic, "hello-world"-style program:

; asmhello.asm ========================================================
global main
extern printf

section .data
        msg     db      "Helloooooo, nurse!",0Dh,0Ah,0
section .text
main:
        push dword msg
        call printf
        pop eax
        ret
; EOF =================================================================

A quick rundown: "main" must be declared global for the OS loader, and since
we are using the GCC linker, the entry point must be named "main". The "extern
printf" is simply a declaration for the call later in the program; note that
this is all that is needed; the parameter sizes do not need to be declared. I
have sectioned this example into the standard .data and .text sections, though
this is not strictly necessary--one could get by with only a .text segment,
just as in DOS.

In the body of the code, note that you must push the parameters to the call,
and in Nasm you must declare the size of all ambiguous (i.e. non-register)
data: hence the "dword" qualifier. Note that Nasm assumes that any memory/label
reference is intended to mean the address of the memory location or label, not
its contents. Thus, to specify the address of the string 'msg' you would use
'push dword msg', while to specify the contents of the string 'msg' you would
use 'push dword [msg]' (note this will only contain the first 4 bytes of
'msg'). As printf requires a pointer to a string, we will specify the address
of 'msg'.

The call to printf is pretty straightforward. Note that you must clean up the
stack after every call you make (see below); thus, having PUSHed a dword, I POP
a dword from the stack into a "throwaway" register. Linux programs end simply
with a RET to the OS, as each process is spawned from the shell (or PID 1 ;)
and ends by returning control to it.

Notice that in Linux you use the standard shared libraries that are shipped
with the OS in lieu of an "API" or Interrupt Services. All external references
will be taken care of by the GCC linker, which takes a lot of the workload off
the asm coder. Once you get used to the basic quirks, coding assembly in Linux
is actually easier than on a DOS-based machine!

The C Calling Syntax
--------------------
Linux uses the C calling convention--meaning that arguments are pushed onto
the stack in reverse order (last arg first), and that the caller must clean up
the stack.
You can do this either by popping values from the stack:

        push dword szText
        call puts
        pop ecx

or by directly modifying ESP:

        push dword szText
        call puts
        add esp, 4

Results from the call are returned in eax, or in edx:eax if the value is
larger than 32 bits. EBP, ESI, EDI, and EBX are all saved and restored by the
called function. Note that you must preserve any other registers you use, as
the following will illustrate:

; loop.asm =================================================================
global main
extern printf

section .text
        msg     db      "HoodooVoodoo WeedooVoodoo",0Dh,0Ah,0

main:
        mov ecx, 0Ah
        push dword msg
looper:
        call printf
        loop looper
        pop eax
        ret
; EOF ======================================================================

At first glance this looks pretty simple: since you are going to use the same
string on the 10 printf() calls, you do not need to clean up the stack. Yet
when you compile this, the loop never stops. Why? Because somewhere in the
printf() call ECX is being used and isn't saved. So to make your loop work
properly you must save the count value in ECX before the call and restore it
afterwards, like so:

; loop.asm ================================================================
global main
extern printf

section .text
        msg     db      "HoodooVoodoo WeedooVoodoo",0Dh,0Ah,0

main:
        mov ecx, 0Ah
looper:
        push ecx                ;save Count
        push dword msg
        call printf
        pop eax                 ;cleanup stack
        pop ecx                 ;restore Count
        loop looper
        ret
; EOF ======================================================================

I/O Port Programming
--------------------
But what about direct hardware access? In Linux you need a kernel-mode driver
to do anything really tricky...this means your program will end up being two
parts, one kernel-mode that provides the direct-hardware functionality, the
other user-mode to provide an interface. The good news is that you can still
access ports using the IN/OUT commands from a user-mode program.

To access the I/O ports your program must be granted permission by the OS; to
do that, you must make an ioperm() call. This function can only be called by a
user with root access, so you must either setuid() the program to root or run
the program as root. The ioperm() call has the following syntax:

        ioperm( long StartingPort#, long #Ports, BOOL ToggleOn-Off)

which means that 'StartingPort#' specifies the first port number to access (0
is port 0h, 40h is port 40h, etc), '#Ports' specifies how many ports to access
(i.e., 'StartingPort# = 30h' and '#Ports = 10' would provide access to ports
30h-39h), and 'ToggleOn-Off' enables access if TRUE (1) or disables access if
FALSE (0).
Once the call to ioperm() is made, the requested ports may be accessed as
normal. The program can call ioperm() any number of times and does not need to
make a subsequent ioperm() call to disable access (though the example below
does so) as the OS will take care of this.

; io.asm ====================================================================
BITS 32
GLOBAL main
EXTERN printf
EXTERN ioperm

SECTION .data
        szText1  db 'Enabling I/O Port Access',0Ah,0Dh,0
        szText2  db 'Disabling I/O Port Acess',0Ah,0Dh,0
        szDone   db 'Done!',0Ah,0Dh,0
        szError  db 'Error in ioperm() call!',0Ah,0Dh,0
        szEqual  db 'Output/Input bytes are equal.',0Ah,0Dh,0
        szChange db 'Output/Input bytes changed.',0Ah,0Dh,0

SECTION .text
main:
        push dword szText1
        call printf
        pop ecx
enable_IO:
        push dword 1    ; enable mode (a full dword, like any C int argument)
        push dword 04h  ; four ports
        push dword 40h  ; start with port 40h
        call ioperm     ; Must be SUID "root" for this call!
        add ESP, 12     ; cleanup stack (method 1)
        cmp eax, 0      ; check ioperm() results
        jne Error
;---------------------------------------Port Programming Part--------------
SetControl:
        mov al, 96h     ; R/W low byte of Counter2, mode 3
        out 43h, al     ; port 43h = control register
WritePort:
        mov bl, 0EEh    ; value to send to speaker timer
        mov al, bl
        out 42h, al     ; port 42h = speaker timer
ReadPort:
        in al, 42h
        cmp al, bl      ; byte should have changed--this IS a timer :)
        jne ByteChanged
BytesEqual:
        push dword szEqual
        call printf
        pop ecx
        jmp disable_IO
ByteChanged:
        push dword szChange
        call printf
        pop ecx
;---------------------------------------End Port Programming Part----------
disable_IO:
        push dword szText2
        call printf
        pop ecx
        push dword 0    ; disable mode
        push dword 04h  ; four ports
        push dword 40h  ; start with port 40h
        call ioperm
        pop ecx         ;cleanup stack (method 2)
        pop ecx
        pop ecx
        cmp eax, 0      ; check ioperm() results
        jne Error
        jmp Exit
Error:
        push dword szError
        call printf
        pop ecx
Exit:
        ret
; EOF ======================================================================

Using Interrupts In Linux
-------------------------
Linux is a shared-library environment running in protected mode, meaning there
are no interrupt services. Right? Wrong. I noticed an INT 80 call on some GAS
sample source code with the comment "sys_write(ebx, ecx, edx)". This function
is part of the Linux syscall interface, which means that interrupt 80 must be
a gate into the syscall services. Poking around in the Linux source code (and
ignoring warnings to NEVER use the INT 80 interface as the function numbers may
be changed at any time), I found the "system call numbers" --that is, what
function # to pass on to INT 80 for each syscall routine-- in the file
UNISTD.H. There are 189 of them, so I will not list them here...but if you are
going to be doing Linux assembly, do yourself a favor and print this file out.

When calling INT 80h, eax must be set to the desired function number. Any
parameters to the syscall routine must be placed in the following registers in
order:

        ebx, ecx, edx, esi, edi

so that parameter one is placed in ebx, parameter 2 in ecx, etc. Note that
there is no stack used to pass values to a syscall routine. The result of the
call will be returned in eax.

Other than that, the INT 80 interface is the same as regular calls (only a bit
more fun ;). The following program demonstrates a simple INT 80h call in which
a program checks and displays its own PID. Note the use of the printf() format
string--it is best to pseudocode this as a C call first, then make the format
string a DB and push each variable passed (%s, %d, etc). The C structure for
this call would be

        printf( "%d\n", curr_PID);

Note also that the escape sequences ("\n") are not all that reliable in
assembly; I had to use the hex values (0Ah,0Dh) for the CR\LF.
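Following the advice above to pseudocode the call in C first, a minimal sketch
might look like the following. It assumes a Linux system with glibc, whose
syscall() wrapper loads the function number and arguments into registers for
you; the helper names current_pid_via_syscall and print_current_pid are made
up for this example.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: asks the kernel for our PID through the raw
   syscall interface rather than the getpid() library wrapper. */
long current_pid_via_syscall(void)
{
    return syscall(SYS_getpid);
}

/* The C form of the printf call that the assembly version builds by hand. */
void print_current_pid(void)
{
    long curr_PID = current_pid_via_syscall();
    printf("%ld\n", curr_PID);
}
```

Calling print_current_pid() from main() prints the process ID on a line of its
own; the value should match what getpid() reports.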
;pid.asm====================================================================
BITS 32
GLOBAL main
EXTERN printf

SECTION .data
        szText1  db 'Getting Current Process ID...',0Ah,0Dh,0
        szDone   db 'Done!',0Ah,0Dh,0
        szError  db 'Error in int 80!',0Ah,0Dh,0
        szOutput db '%d',0Ah,0Dh,0      ;weird formatting is for printf()

SECTION .text
main:
        push dword szText1      ;opening message
        call printf
        pop ecx
GetPID:
        mov eax, dword 20       ; getpid() syscall
        int 80h                 ; syscall INT
        cmp eax, 0              ; there will never be PID 0 ! :)
        jbe Error
        push eax                ; pass return value to printf
        push dword szOutput     ; pass format string to printf
        call printf
        pop ecx                 ; cleanup stack
        pop ecx
        push dword szDone       ; ending message
        call printf
        pop ecx
        jmp Exit
Error:
        push dword szError
        call printf
        pop ecx
Exit:
        ret
; EOF =====================================================================

Final Words
-----------
Most of the trouble is going to come from getting used to Nasm itself. While
nasm does come with a man page, it does not by default install it, so you must
move it (cp or mv) from

        /usr/local/bin/nasm-0.97/

to

        /usr/local/man/man1/

The formatting is a little messed up, but that is easily fixed using the nroff
directives. It still does not give you the entire Nasm documentation, however;
for that, copy nasmdoc.txt from

        /usr/local/bin/nasm-0.97/doc/nasmdoc.txt

to

        /usr/local/man/man1/

Now you can invoke the nasm man page with 'man nasm' and the nasm
documentation with 'man nasmdoc'.

For further information, check out the following:

        Linux Assembly Language HOWTO
        Linux I/O Port Programming Mini-HOWTO
        Jan's Linux & Assembler HomePage

Also I owe a bit of thanks to Jeff Weeks at code^x software for forwarding me
a couple of GAS hello-world's in the dark days before I found Jan's page.

::/ \::::::.
:/___\:::::::.
/ \::::::::.
: _/\:::::::::.
: _ \ \::::::::::.
:::\_____\:::::::::::...........................................ISSUE.CHALLENGE
11-byte Program Displays Its Command-Line
by Xbios2

The Challenge ------------Write an 11-byte program that displays its command line. The Solution -----------Before saying that these programs won't work, try them. Some of them work only after you've run them twice. Anyway, they' ve been tested both under Windows and plain DOS and they work. Believe it or not, these are the first programs I've ever written in DOS, so I just tried various ideas until some worked, even thought I thought they wouldn't... :) The command line in DOS is found in the PSP (Program Segment Prefix) which in .COM files occupies the first 100h bytes in the segment. At offset 80h, a <count, char> string (first byte is length of string, and n bytes follow) contains everything typed after the filename. The last character in this string is a CR (carriage return). The requested program should be composed of three parts: 1. set up pointers to data 2. display data 3. exit Actually all the following programs DON'T include part 3, but read on. The data (command line) can be printed either as a single string, or character by character. APPROACH 1: Print single string ------------------------------For the first approach there are two interrupts: 1. INT 21, 9 ; write $ terminated string 2. INT 21, 40 ; write to file using handle For the first case, part 2 would be: mov ah, 9 mov dx, 81h int 21h that makes 7 bytes, leaving only 4 bytes to replace the last CR with a '$', which are too few. (Actually, if the user would type a $ as the last character in the comand line, this would make the smallest possible program.) The shortest program I managed to write is: shr si,1 ; D1 EE lodsb ; AC push si ; 56 add si,ax ; 03 F0 mov byte ptr [si],'$' ; C6 04 24 xcgh bp,ax ; 95 pop dx ; 5A int 21 ; CD 21 For the second case, the smallest program would be this: ; Solution I mov dx, 81h ; BA 81 00 mov cl, ds:[80h] ; 8A 0E 80 00 mov ah, 40h ; B4 40 int 21h ; CD 21

The first two lines are part 1 (set up pointers) and the other two are part 2
(display string). If you think that something is missing you're right: we don't
set BX (the handle).

APPROACH 2: Print char by char
------------------------------
For the second approach there are two interrupts:

 1. INT 21, 2           ; write char in dl
 2. INT 29              ; write char in al

Of course the second interrupt is better, since there is no need to load ah
with a function value. In addition, INT 29 reads the char from AL, so it can be
used together with LODSB.

The first way to implement this approach is to minimize part 2 (display loop).
A program that does this is the following:

        ; Solution II
        mov si, 80h             ; BE 80 00
        lodsb                   ; AC
        mov cl, al              ; 8A C8
more:   lodsb                   ; AC
        int 29h                 ; CD 29
        loop more               ; E2 FB

This program prints CX characters. The second way to print the string is to
print up to the CR. Here is how:

        ; Solution III
        mov si, 81h             ; BE 81 00
more:   lodsb                   ; AC
        int 29h                 ; CD 29
        cmp al, 13              ; 3C 0D
        jne more                ; 75 F9
        nop                     ; 90

Yes, the last instruction IS a NOP. So we have an 11-byte program that works,
and even has a NOP in it. Removing the NOP creates an even crazier program that
is 10 bytes long, displays its command line AND waits for a key press before
terminating... Actually solution II, by substituting MOV SI,80h with SHR SI,1,
does the same thing (10 bytes that display the command line and wait for the
user to press a key).

BTW: I really don't know why these programs work, though I have one or two
theories...

Next Issue Challenge
--------------------
Write the smallest possible PE program (win32) that outputs its command line.

::/ \::::::.
:/___\:::::::.
/ \::::::::.
: _/\:::::::::.
: _ \ \::::::::::.
:::\_____\:::::::::::.......................................................FIN
