22/10/2016

Windows x64 Shellcode | McDermott Cybersecurity

January 11, 2011

Introduction
RIP-Relative Addressing
API Lookup Overview
API Lookup Demo
The Code
Building
Testing
Comments
Mitigations

Introduction

Shellcode refers to a chunk of executable machine code (along with any associated data) which is executed after being
injected into the memory of a process, usually by means of a buffer-overflow type of security vulnerability. The term comes
from the fact that in early exploits against Unix platforms, an attacker would typically execute code that would start a
command shell listening on a TCP/IP port, to which the attacker could then connect and have full access to the system. For
the common web-browser and application exploits on Windows today, the shellcode is more likely to download and
execute another program than spawn a command shell, but the term remains.
In general, shellcode can be thought of as any code that is capable of being executed from an arbitrary location in memory
and without relying on services provided by the operating system loader as with traditional executables. Depending on the
exploit, additional requirements for shellcode may include small size and avoiding certain byte patterns in the code. In any
case, there are two tasks performed by the loader which shellcode must take care of itself:
1. Getting the addresses of data elements (such as strings referenced by the code)
2. Getting the addresses of system API functions used
This article describes a shellcode implementation of the x64 assembly program from my Windows Assembly Languages
article (refer to that article for general x64 assembly programming issues such as calling conventions and stack usage). As
you'll see, the main program code doesn't look much different. Task #1 above actually turns out to be a non-issue on x64
platforms due to a new feature called RIP-relative addressing. Task #2 is what comprises the bulk of the effort. In fact, the
code for looking up API functions is significantly larger and more complex than the main program itself. The only other
difference between the vanilla and shellcode versions of x64 hello world is that the shellcode does not use a .data section,
instead placing the strings in the .code section after the main procedure. This is because sections are a feature of the
executable file format, whereas shellcode needs to be just a single block of code and data.

http://mcdermottcybersecurity.com/articles/windows-x64-shellcode

RIP-Relative Addressing

RIP refers to the instruction pointer register on x64, and RIP-relative addressing means that references to memory addresses
being read or written can be encoded as offsets from the currently-executing instruction (strictly, from the address of the
instruction that follows it). This is not a completely new concept, as jmp and call instructions have always supported
relative targets on x86, but the ability to access memory using relative addressing is new with x64.
On x86, the labels referring to data variables would be replaced with actual hard-coded memory addresses when the program
was assembled and linked, under the assumption that the program would be loaded at a specific base address. If at runtime
the program needed to load at a different base address, the loader would perform relocation by updating all of those
hard-coded addresses. Because shellcode needed to run from anywhere in memory, it needed to determine these addresses
dynamically and typically used a trick where the call instruction would push the address just past itself onto the stack as the
return address. This return address could then be popped off the stack to get a pointer to the string at runtime:

    call skip
    db 'Hello world', 0
skip:
    pop esi                 ;esi now points to 'Hello world' string

On x64 we do not need this trick. RIP-relative addressing is not only supported but is in fact the default, so we can simply
refer to strings using labels as with ordinary code and it Just Works.
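
To make the idea concrete, here is a minimal C sketch of the arithmetic behind a RIP-relative operand. It computes the 32-bit displacement an assembler would encode and shows that the value stays the same when code and data move together. The function names, the addresses, and the 7-byte instruction length (typical of an instruction such as lea rcx, [rip+disp32]) are illustrative assumptions and are not taken from the shellcode in this article.

```c
#include <stdint.h>

/* Displacement stored in the instruction: target minus the address of the
 * NEXT instruction. Assumes the target lies within +/-2 GB of the code. */
static int32_t rip_disp32(uint64_t insn_addr, uint32_t insn_len, uint64_t target)
{
    return (int32_t)(int64_t)(target - (insn_addr + insn_len));
}

/* Effective address the CPU computes at run time from that displacement. */
static uint64_t rip_target(uint64_t insn_addr, uint32_t insn_len, int32_t disp32)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

If an instruction at 0x1000 refers to a string at 0x1100, and the whole block is later copied so that the instruction sits at 0x6000 and the string at 0x6100, rip_disp32() yields the same displacement in both cases. That is why no relocation step is needed.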

API Lookup Overview

Even the most trivial programs generally need to call various operating system API functions to perform some type of
input/output (I/O): displaying things to the user, accessing files, making network connections, etc. On Windows these API
functions are implemented in various system DLLs, and in standard application development these API functions can simply
be referred to by name. When the program is compiled and linked, the linker puts information in the resulting executable
indicating which functions from which DLLs are required. When the program is run, the loader ensures that the necessary
DLLs are loaded and that the addresses of the called functions are resolved.
Windows also provides another facility that can be used by applications to load additional DLLs and look up functions on
demand: the LoadLibrary() and GetProcAddress() APIs in kernel32.dll. Not having the benefit of the loader, shellcode
needs to use LoadLibrary() and GetProcAddress() for all API functions it uses. This unfortunately presents a Catch-22: how
does the shellcode get the addresses of LoadLibrary() and GetProcAddress()?
It turns out that an equivalent to GetProcAddress() can be implemented by traversing the data structures of a loaded DLL in
memory. Also, kernel32.dll is always loaded in the address space of every process on Windows, so LoadLibrary() can be
found there and used to load other DLLs.
Developing shellcode using this technique requires a solid understanding of the Portable Executable (PE) file format used on
Windows for EXE and DLL files, and the next section of this article assumes some familiarity. The following references and
tools may be helpful:
Matt Pietrek's two-part article on the PE file format (part1 and part2). Note that this only covers 32-bit and not 64-bit PE files, but the differences are very minor, mostly just widening some memory address fields to 64 bits
The official Microsoft Portable Executable and Common Object File Format Specification
Daniel Pistelli's CFF Explorer is a nice GUI tool for viewing and editing PE files, with 64-bit support
The dumpbin utility included with Visual C++ (including Express Edition); the most useful switches for our purposes are /headers and /exports
Many of the PE data structures are documented in MSDN under ImageHlp Structures
Definitions of the data structures can be found in winnt.h in the include directory of the Windows SDK
The dt command in WinDbg is able to display many of these structures

API Lookup Demo

This demonstration of how to find the address of a function in a loaded DLL can be followed by attaching WinDbg to any
64-bit process (I'm using notepad.exe). Note that the particular values seen here may be different on your system.
First we'll get the address of the Thread Environment Block (TEB), sometimes also referred to as the Thread Information
Block (TIB). The TEB contains a large number of fields pertaining to the current thread, and on x64 the fields can be accessed
as offsets from the GS segment register during program execution (the FS register was used on x86). In WinDbg, the pseudo
register $teb contains the address of the TEB.

0:001>
$teb=000007fffffdb000
0:001>
ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
   +0x038 EnvironmentPointer : (null)
   +0x040 ClientId         : _CLIENT_ID
   +0x050 ActiveRpcHandle  : (null)
   +0x058 ThreadLocalStoragePointer : (null)
   +0x060 ProcessEnvironmentBlock : _PEB
   +0x068 LastErrorValue   : 0
[...]

The only field from the TEB we are interested in is the pointer to the Process Environment Block (PEB). Note that WinDbg
also has a $peb pseudo-register, but in the shellcode implementation we will have to use the GS register to go through the
TEB first.

0:001>
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
   +0x003 BitField         : 0x8 ''
   +0x003 ImageUsesLargePages : 0y0
   +0x003 IsProtectedProcess : 0y0
   +0x003 IsLegacyProcess  : 0y0
   +0x003 IsImageDynamicallyRelocated : 0y1
   +0x003 SkipPatchingUser32Forwarders : 0y0
   +0x003 SpareBits        : 0y000
   +0x008 Mutant           : 0xffffffff`ffffffff Void
   +0x010 ImageBaseAddress : 0x00000000`ff8b0000 Void
   +0x018 Ldr              : _PEB_LDR_DATA
[...]

The PEB contains numerous fields with process-specific data and we are interested in the Ldr field at offset 0x18 which
points to a structure of type PEB_LDR_DATA.

0:001>
ntdll!_PEB_LDR_DATA
   +0x000 Length           : 0x58
   +0x004 Initialized      : 0x1 ''
   +0x008 SsHandle         : (null)
   +0x010 InLoadOrderModuleList : _LIST_ENTRY [ 0x00000000`00373040 - 0x39a3b0 ]
   +0x020 InMemoryOrderModuleList : _LIST_ENTRY [ 0x00000000`00373050 - 0x39a3c0 ]
   +0x030 InInitializationOrderModuleList : _LIST_ENTRY [ 0x00000000`00373150 - 0x39a3d0 ]
   +0x040 EntryInProgress  : (null)
   +0x048 ShutdownInProgress : 0 ''
   +0x050 ShutdownThreadId : (null)

The PEB_LDR_DATA structure contains three linked lists of loaded modules: InLoadOrderModuleList,
InMemoryOrderModuleList, and InInitializationOrderModuleList. A module or image refers to any PE file in memory: the
main program executable as well as any currently-loaded DLLs. All three lists contain the same elements just in a different
order, with the one exception that InInitializationOrderModuleList only contains DLLs and excludes the main executable.
The elements of these lists are of type LDR_DATA_TABLE_ENTRY, though you can't tell from the previous output because
they are only shown as LIST_ENTRY, which is the generic linked list header datatype used throughout Windows. A
LIST_ENTRY simply consists of a forward and back pointer for creating circular, doubly-linked lists. The address of the
_LIST_ENTRY within the _PEB_LDR_DATA structure represents the list head. When traversing the circular list, arriving
back at the list head is the way to know when the traversal is complete.
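
The traversal rule just described (follow the forward pointer until you are back at the list head) can be sketched in portable C. The struct and function names below are hypothetical stand-ins chosen for illustration; they mimic the shape of LIST_ENTRY but are not the Windows definitions, and the module names are plain C strings rather than UNICODE_STRING values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LIST_ENTRY: forward and back pointers only. */
struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
};

/* Hypothetical element type. The links are the first member, so a pointer
 * to the links is also a pointer to the element (as with InLoadOrderLinks). */
struct module_entry {
    struct list_entry links;
    const char *name;
};

/* An empty circular list has a head that points to itself. */
static void list_init(struct list_entry *head)
{
    head->flink = head->blink = head;
}

static void list_append(struct list_entry *head, struct list_entry *node)
{
    node->flink = head;
    node->blink = head->blink;
    head->blink->flink = node;
    head->blink = node;
}

/* Walk forward from the head; arriving back at the head ends the loop. */
static size_t list_count(const struct list_entry *head)
{
    size_t n = 0;
    const struct list_entry *p;
    for (p = head->flink; p != head; p = p->flink)
        n++;
    return n;
}

static const struct module_entry *find_module(const struct list_entry *head,
                                              const char *name)
{
    const struct list_entry *p;
    for (p = head->flink; p != head; p = p->flink) {
        const struct module_entry *m = (const struct module_entry *)p;
        if (strcmp(m->name, name) == 0)
            return m;
    }
    return NULL;   /* reached the head again without a match */
}
```

Note that the head itself is never treated as an element: it lives inside the owning structure (PEB_LDR_DATA in the real case) and only marks where the list starts and ends.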

0:001>
ntdll!_LIST_ENTRY
   +0x000 Flink            : Ptr64 _LIST_ENTRY
   +0x008 Blink            : Ptr64 _LIST_ENTRY

The !list command provides the ability to traverse these types of lists and execute a specific command for each element in the
list (in this case displaying the element as an LDR_DATA_TABLE_ENTRY data structure). WinDbg commands can get
nasty-looking sometimes but are quite powerful. Here we display the InLoadOrderModuleList with list head at offset 0x10
from the beginning of the PEB_LDR_DATA structure (very long output truncated to show just part of one element):

0:001>
[...]
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY [ 0x00000000`00333620 - 0x333130 ]
   +0x010 InMemoryOrderLinks : _LIST_ENTRY [ 0x00000000`00333630 - 0x333140 ]
   +0x020 InInitializationOrderLinks : _LIST_ENTRY [ 0x00000000`003344e0 - 0x333640 ]
   +0x030 DllBase          : 0x00000000`77650000 Void
   +0x038 EntryPoint       : 0x00000000`7766eff0 Void
   +0x040 SizeOfImage      : 0x11f000
   +0x048 FullDllName      : _UNICODE_STRING "C:\Windows\system32\kernel32.dll"
   +0x058 BaseDllName      : _UNICODE_STRING "kernel32.dll"
   +0x068 Flags            : 0x84004
[...]

Interesting fields for us within an LDR_DATA_TABLE_ENTRY structure are DllBase at 0x30 and BaseDllName at 0x58.
Note that BaseDllName is a UNICODE_STRING, which is an actual data structure and not simply a null-terminated
Unicode string. The pointer to the actual string data (Buffer) is at offset 0x8 in the structure, for a total of 0x60 from the
start of the LDR_DATA_TABLE_ENTRY.

0:001>
ntdll!_UNICODE_STRING
   +0x000 Length           : Uint2B
   +0x002 MaximumLength    : Uint2B
   +0x008 Buffer           : Ptr64 Uint2B
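
As a rough illustration of how a module name stored this way can be compared against a requested DLL name, here is a C sketch. The struct is a hypothetical mirror of the layout shown above, not the Windows header definition, and the function name is made up. It folds lowercase ASCII letters to uppercase and compares against an uppercase 8-bit string, in the same spirit as the assembly later in this article. One deliberate difference: this sketch uses the Length field and requires the whole name to match, whereas the assembly stops as soon as the requested name's terminator is reached.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mirror of UNICODE_STRING (Buffer lands at offset 8 on x64). */
struct unicode_string {
    uint16_t length;          /* length in BYTES, not counting any terminator */
    uint16_t maximum_length;  /* size of the buffer in bytes */
    uint16_t *buffer;         /* 16-bit characters */
};

/* Returns 1 if the 16-bit name equals the given uppercase ASCII name,
 * ignoring the case of ASCII letters in the 16-bit name; otherwise 0. */
static int dll_name_matches(const struct unicode_string *name,
                            const char *upper_ascii)
{
    size_t nchars = name->length / 2;
    size_t i;

    for (i = 0; i < nchars; i++) {
        uint16_t c = name->buffer[i];
        if (c >= 'a' && c <= 'z')
            c = (uint16_t)(c - 0x20);           /* fold to uppercase */
        if (upper_ascii[i] == '\0' || c != (unsigned char)upper_ascii[i])
            return 0;
    }
    return upper_ascii[i] == '\0';              /* both strings ended together */
}
```

With a name of "kernel32.dll" (Length 24), a request for "KERNEL32.DLL" matches, while "KERNEL32" and "USER32.DLL" do not.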

Armed with this knowledge, we now have the ability to obtain the base address of any DLL given its name. Once we have
the base address we can traverse the DLL in memory to locate any function exported by the DLL. Also note that the return
value of LoadLibrary() is in fact a DLL base address. The base address of a loaded DLL can also be obtained in WinDbg with
the lm command. Let's take a look at kernel32.dll:

0:001>
start             end                 module name
00000000`77650000 00000000`7776f000   kernel32   (deferred)

An interesting feature of the PE file and loader is that the PE file format in memory is exactly the same as it is on disk, at least
as far as the headers. It's not exactly true that the entire file is read verbatim into memory, because each section is loaded at a
certain byte alignment in memory (typically a multiple of 4096, the virtual memory page size) that may be different from
where it falls in the file. Also, some sections (like a debug data section) may not be read into memory at all. However, when
we look at the DLL base address in memory, we can expect to find what we see at the beginning of any PE file: a DOS MZ
header. That's an IMAGE_DOS_HEADER structure to be exact:

0:001>
ntdll!_IMAGE_DOS_HEADER
   +0x000 e_magic          : 0x5a4d
   +0x002 e_cblp           : 0x90
   +0x004 e_cp             : 3
   +0x006 e_crlc           : 0
   +0x008 e_cparhdr        : 4
   +0x00a e_minalloc       : 0
   +0x00c e_maxalloc       : 0xffff
   +0x00e e_ss             : 0
   +0x010 e_sp             : 0xb8
   +0x012 e_csum           : 0
   +0x014 e_ip             : 0
   +0x016 e_cs             : 0
   +0x018 e_lfarlc         : 0x40
   +0x01a e_ovno           : 0
   +0x01c e_res            : [4] 0
   +0x024 e_oemid          : 0
   +0x026 e_oeminfo        : 0
   +0x028 e_res2           : [10] 0
   +0x03c e_lfanew         : 224

The e_lfanew field at 0x3c (which for some reason is displayed as a decimal number even though everything else is hex)
contains the byte offset to the NT header (IMAGE_NT_HEADERS64). Converting 224 to hex (0xe0) and adding to the image
base will point to the NT header at 0x776500e0. We can use the -r option (recursive) to expand the embedded
OptionalHeader field (which is a misnomer as it is required and always present):
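
The same step can be expressed in a few lines of C operating on a byte buffer that holds the start of an image. This is only a sketch: the helper names are invented for illustration, the buffer stands in for memory at the DLL base, and the only checks performed are the "MZ" and "PE\0\0" signatures.

```c
#include <stdint.h>
#include <stddef.h>

/* Little-endian reads that do not depend on host alignment or byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset of the NT headers within the image, or 0 on failure. */
static uint32_t find_nt_headers(const uint8_t *image, size_t size)
{
    uint32_t e_lfanew;

    if (size < 0x40 || rd16(image) != 0x5a4d)       /* "MZ" */
        return 0;
    e_lfanew = rd32(image + 0x3c);                  /* IMAGE_DOS_HEADER.e_lfanew */
    if (e_lfanew > size || size - e_lfanew < 4)
        return 0;
    if (rd32(image + e_lfanew) != 0x00004550)       /* "PE\0\0" */
        return 0;
    return e_lfanew;
}
```

For the kernel32.dll image in this session the function would return 0xe0, and adding that to the base 0x77650000 gives the NT header address 0x776500e0.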

0:001>
ntdll!_IMAGE_NT_HEADERS64
   +0x000 Signature        : 0x4550
   +0x004 FileHeader       : _IMAGE_FILE_HEADER
      +0x000 Machine          : 0x8664
      +0x002 NumberOfSections : 6
      +0x004 TimeDateStamp    : 0x4a5bdfdf
      +0x008 PointerToSymbolTable : 0
      +0x00c NumberOfSymbols  : 0
      +0x010 SizeOfOptionalHeader : 0xf0
      +0x012 Characteristics  : 0x2022
   +0x018 OptionalHeader   : _IMAGE_OPTIONAL_HEADER64
      +0x000 Magic            : 0x20b
      +0x002 MajorLinkerVersion : 0x9 ''
      +0x003 MinorLinkerVersion : 0 ''
[...]
      +0x068 LoaderFlags      : 0
      +0x06c NumberOfRvaAndSizes : 0x10
      +0x070 DataDirectory    : [16] _IMAGE_DATA_DIRECTORY
[...]

The DataDirectory field is located a total of 0x88 bytes from the NT headers (offset 0x70 from OptionalHeader which is 0x18
from the NT headers). This is an array of 16 elements corresponding to the various types of data in a PE file.
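
The offset arithmetic is easy to get wrong by hand, so here is a small C sketch of it. The constant and function names are invented for illustration; the offsets themselves are the ones stated above, and each array element is 8 bytes (two DWORDs, VirtualAddress and Size).

```c
#include <stdint.h>

enum {
    OPTIONAL_HEADER_OFFSET    = 0x18, /* from start of IMAGE_NT_HEADERS64 */
    DATA_DIRECTORY_OFFSET     = 0x70, /* from start of IMAGE_OPTIONAL_HEADER64 */
    DATA_DIRECTORY_ENTRY_SIZE = 8     /* VirtualAddress + Size */
};

/* Address of DataDirectory[index] for an image loaded at image_base. */
static uint64_t data_directory_entry(uint64_t image_base, uint32_t e_lfanew,
                                     unsigned index)
{
    return image_base + e_lfanew
         + OPTIONAL_HEADER_OFFSET + DATA_DIRECTORY_OFFSET
         + (uint64_t)index * DATA_DIRECTORY_ENTRY_SIZE;
}

/* An RVA becomes a usable address by adding the module's base address. */
static uint64_t rva_to_va(uint64_t image_base, uint32_t rva)
{
    return image_base + rva;
}
```

With the values from this session (base 0x77650000, e_lfanew 0xe0), entry 0 comes out at 0x77650168 and entry 15 at 0x776501e0, matching the addresses WinDbg prints for the array.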

0:001>
ntdll!_IMAGE_DATA_DIRECTORY
[0] @ 0000000077650168 +0x000 VirtualAddress 0xa0020 +0x004 Size
[1] @ 0000000077650170 +0x000 VirtualAddress 0xf848c +0x004 Size 0x1f4
[2] @ 0000000077650178 +0x000 VirtualAddress 0x116000 +0x004 Size 0x520
[3] @ 0000000077650180 +0x000 VirtualAddress 0x10c000 +0x004 Size 0x9810
[4] @ 0000000077650188 +0x000 VirtualAddress 0 +0x004 Size 0
[5] @ 0000000077650190 +0x000 VirtualAddress 0x117000 +0x004 Size 0x7a9c
[6] @ 0000000077650198 +0x000 VirtualAddress 0x9b7dc +0x004 Size 0x38
[7] @ 00000000776501a0 +0x000 VirtualAddress 0 +0x004 Size 0
[8] @ 00000000776501a8 +0x000 VirtualAddress 0 +0x004 Size 0
[9] @ 00000000776501b0 +0x000 VirtualAddress 0 +0x004 Size 0
[10] @ 00000000776501b8 +0x000 VirtualAddress 0 +0x004 Size 0
[11] @ 00000000776501c0 +0x000 VirtualAddress 0x2d8 +0x004 Size 0x408
[12] @ 00000000776501c8 +0x000 VirtualAddress 0x9c000 +0x004 Size 0x1c70
[13] @ 00000000776501d0 +0x000 VirtualAddress 0 +0x004 Size 0
[14] @ 00000000776501d8 +0x000 VirtualAddress 0 +0x004 Size 0
[15] @ 00000000776501e0 +0x000 VirtualAddress 0 +0x004 Size 0

We are interested in the Export Directory, which is the first entry in the list (index 0), located at VirtualAddress 0xa0020
with its length given by Size.
See the MSDN documentation of the IMAGE_DATA_DIRECTORY structure for a reference on which type of data goes with
each array element.
A virtual address, also called a relative virtual address (RVA), is an offset from the base load address of the module. RVAs are
used extensively in PE files, including for the pointers to the function names and function addresses in the export table. To
get the actual memory address pointed to by an RVA, simply add the base address of the module.
(For convenience, note that the !dh command can be used to automatically display much of the PE header information we've
extracted manually so far.)

Given that the Export Directory begins at RVA 0xa0020, we add the base address 0x77650000 and should therefore expect
to find an IMAGE_EXPORT_DIRECTORY structure at 0x776f0020. Unfortunately IMAGE_EXPORT_DIRECTORY is not
understood by the dt command or documented in MSDN, so we will have to refer to the structure definition in winnt.h:
typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Name;
    DWORD   Base;
    DWORD   NumberOfFunctions;
    DWORD   NumberOfNames;
    DWORD   AddressOfFunctions;     // RVA from base of image
    DWORD   AddressOfNames;         // RVA from base of image
    DWORD   AddressOfNameOrdinals;  // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

The best we can do in WinDbg is display the structure as an array of DWORDs and count where things fall using the above
structure as a reference.

0:001>
00000000`776f0020  00000000 4a5bc32c 00000000 000a366c
00000000`776f0030  00000001 0000056a 0000056a 000a0048
00000000`776f0040  000a15f0 000a2b98 000aa10b 000aa12c
[...]

Beginning with the 8th DWORD within the structure we will find AddressOfFunctions (0xa0048), followed by
AddressOfNames (0xa15f0) and AddressOfNameOrdinals (0xa2b98). These values are RVAs; when we add the DLL base
address we will get the memory address of the array. When working with RVAs a lot it can be handy to stash the DLL base
address in a pseudo-register because it will be used so frequently. Here is AddressOfNames:

0:001>
0:001>
00000000`776f15f0  000a3679 000a3691 000a36a6 000a36b5
00000000`776f1600  000a36be 000a36c7 000a36d8 000a36e9
00000000`776f1610  000a370f 000a372e 000a374d 000a375a
[...]

This is an array of RVAs pointing to the function name strings (the size of the array is given by the NumberOfNames field in
IMAGE_EXPORT_DIRECTORY). Take a look at the first one (adding DLL base address of course) and we see the name of a
function exported from kernel32.dll.

0:001>
00000000`776f3679  "AcquireSRWLockExclusive"

We can ultimately find the address of a function based on the array index of where the name is found in this array. The
AddressOfNameOrdinals array is a parallel array to AddressOfNames, which contains the ordinal value associated with
each name. An ordinal value is the index which is finally used to look up the function address in the AddressOfFunctions
array. (DLLs have the option of exporting functions by ordinal only without even having a function name, and in fact the
GetProcAddress() API can be called with a numeric ordinal instead of a string name).
More often than not, the value in each slot of the AddressOfNameOrdinals array has the same value as its array index but this
is not guaranteed. Note that AddressOfNameOrdinals is an array of WORDs, not DWORDs. In this case it appears to follow
the pattern of each element having the same value as its index.

0:001>
00000000`776f2b98  0000 0001 0002 0003 0004 0005 0006 0007
00000000`776f2ba8  0008 0009 000a 000b 000c 000d 000e 000f
00000000`776f2bb8  0010 0011 0012 0013 0014 0015 0016 0017
[...]

Once we have the ordinal number of a function, the ordinal is used as an index into the AddressOfFunctions array:

0:001>
00000000`776f0048  000aa10b 000aa12c 000044b0 00066b20
00000000`776f0058  00066ac0 0006ad90 0006ae00 0004b7d0
00000000`776f0068  000956e0 0008fbb0 00048cc0 0004b800
[...]

The interpretation of the values in this array depends on whether the function is forwarded. Forwarding is a
mechanism by which a DLL can declare that an exported function is actually implemented in a different DLL. If the function
is not forwarded, the value is an RVA pointing to the actual function code. If the function is forwarded, the RVA points to an
ASCII string giving the target DLL and function name. You can tell in advance if a function is forwarded based on the range
of the RVA: the function is forwarded if the RVA falls within the export directory (as given by the VirtualAddress and Size in
the IMAGE_DATA_DIRECTORY entry).
You can practically see at a glance which RVAs above are in the vicinity of the export directory addresses we've been
working with. The first element in the array corresponds to our old friend AcquireSRWLockExclusive which we can see is
forwarded to another function in NTDLL:

0:001>
00000000`776fa10b  "NTDLL.RtlAcquireSRWLockExclusive"
00000000`776fa12b  ""

The third array element, on the other hand, is not forwarded and points directly to the executable code of ActivateActCtx:

0:001>
kernel32!ActivateActCtx:
00000000`776544b0 4883ec28        sub     rsp,28h
00000000`776544b4 4883f9ff        cmp     rcx,0FFFFFFFFFFFFFFFFh
[...]
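
For the forwarded case, the forwarder string has to be split into a DLL file name and a function name before the lookup can be repeated. The following C sketch shows one way to do that. The function name and buffer handling are illustrative assumptions; like the assembly later in this article, it splits at the first period and appends "DLL" to form the file name.

```c
#include <stddef.h>
#include <string.h>

/* Splits a forwarder string such as "NTDLL.RtlAcquireSRWLockExclusive".
 * On success writes "NTDLL.DLL" into dll_out and returns a pointer to the
 * function name inside fwd. Returns NULL if there is no period or the
 * output buffer is too small. */
static const char *split_forwarder(const char *fwd, char *dll_out,
                                   size_t dll_out_size)
{
    const char *dot = strchr(fwd, '.');
    size_t n;

    if (dot == NULL)
        return NULL;
    n = (size_t)(dot - fwd);              /* length of the DLL base name */
    if (n + 5 > dll_out_size)             /* name + ".DLL" + terminator */
        return NULL;
    memcpy(dll_out, fwd, n + 1);          /* copy name and the period */
    memcpy(dll_out + n + 1, "DLL", 4);    /* append extension and terminator */
    return dot + 1;
}
```

The resulting DLL name can be handed to LoadLibrary() and the returned function name used for a fresh export lookup in that module.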

We now have all of the understanding we need to get the address of a function and it's just a matter of implementing the
above steps in code.
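
Before moving to assembly, it may help to see the lookup steps condensed into C. This is a sketch under stated assumptions, not the article's implementation: the function and helper names are invented, the image is treated as a plain byte buffer, the caller supplies the export directory RVA and size from DataDirectory[0], and no bounds checking is done. The structure offsets (0x18, 0x1c, 0x20, 0x24) are those of NumberOfNames, AddressOfFunctions, AddressOfNames and AddressOfNameOrdinals in the winnt.h definition shown earlier.

```c
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the RVA stored in AddressOfFunctions for the named export, or 0
 * if the name is not found. *forwarded is set to 1 when that RVA falls
 * inside the export directory (it then points to a forwarder string). */
static uint32_t find_export(const uint8_t *base, uint32_t dir_rva,
                            uint32_t dir_size, const char *wanted,
                            int *forwarded)
{
    const uint8_t *dir       = base + dir_rva;
    uint32_t       num_names = rd32(dir + 0x18);
    const uint8_t *functions = base + rd32(dir + 0x1c);
    const uint8_t *names     = base + rd32(dir + 0x20);
    const uint8_t *ordinals  = base + rd32(dir + 0x24);
    uint32_t i;

    for (i = 0; i < num_names; i++) {
        /* names[i] is an RVA to a null-terminated ASCII string */
        const char *name = (const char *)(base + rd32(names + 4 * i));
        if (strcmp(name, wanted) != 0)
            continue;
        {
            /* parallel array of WORDs gives the index into functions[] */
            uint16_t ordinal = rd16(ordinals + 2 * i);
            uint32_t rva = rd32(functions + 4 * (uint32_t)ordinal);
            *forwarded = (rva >= dir_rva && rva < dir_rva + dir_size);
            return rva;
        }
    }
    return 0;
}
```

A non-forwarded result is turned into a callable address by adding the module base. A forwarded result points at a string of the form DLL.Function, which requires loading the target DLL and repeating the lookup there.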

The Code

;shell64.asm
;License: MIT (http://www.opensource.org/licenses/mit-license.php)

.code

;note: ExitProcess is forwarded
main proc
    sub rsp, 28h                    ;reserve stack space for called functions
    and rsp, 0fffffffffffffff0h     ;make sure stack 16-byte aligned

    lea rdx, loadlib_func
    lea rcx, kernel32_dll
    call lookup_api                 ;get address of LoadLibraryA
    mov r15, rax                    ;save for later use with forwarded exports

    lea rcx, user32_dll
    call rax                        ;load user32.dll

    lea rdx, msgbox_func
    lea rcx, user32_dll
    call lookup_api                 ;get address of MessageBoxA

    xor r9, r9                      ;MB_OK
    lea r8, title_str               ;caption
    lea rdx, hello_str              ;Hello world
    xor rcx, rcx                    ;hWnd (NULL)
    call rax                        ;display message box

    lea rdx, exitproc_func
    lea rcx, kernel32_dll
    call lookup_api                 ;get address of ExitProcess

    xor rcx, rcx                    ;exit code zero
    call rax                        ;exit
main endp

kernel32_dll    db 'KERNEL32.DLL', 0
loadlib_func    db 'LoadLibraryA', 0
user32_dll      db 'USER32.DLL', 0
msgbox_func     db 'MessageBoxA', 0
hello_str       db 'Hello world', 0
title_str       db 'Message', 0
exitproc_func   db 'ExitProcess', 0

;look up address of function from DLL export table
;rcx=DLL name string, rdx=function name string
;DLL name must be in uppercase
;r15=address of LoadLibraryA (optional, needed if export is forwarded)
;returns address in rax
;returns 0 if DLL not loaded or exported function not found in DLL
lookup_api proc
    sub rsp, 28h            ;set up stack frame in case we call loadlibrary

start:
    mov r8, gs:[60h]        ;peb
    mov r8, [r8+18h]        ;peb loader data
    lea r12, [r8+10h]       ;InLoadOrderModuleList (list head) - save for later
    mov r8, [r12]           ;follow _LIST_ENTRY->Flink to first item in list
    cld

for_each_dll:               ;r8 points to current _ldr_data_table_entry
    mov rdi, [r8+60h]       ;UNICODE_STRING at 58h, actual string buffer at 60h
    mov rsi, rcx            ;pointer to dll we're looking for

compare_dll:
    lodsb                   ;load character of our dll name string
    test al, al             ;check for null terminator
    jz found_dll            ;if at the end of our string and all matched so far, found it

    mov ah, [rdi]           ;get character of current dll
    cmp ah, 61h             ;lowercase 'a'
    jl uppercase
    sub ah, 20h             ;convert to uppercase

uppercase:
    cmp ah, al
    jne wrong_dll           ;found a character mismatch - try next dll

    inc rdi                 ;skip to next unicode character
    inc rdi
    jmp compare_dll         ;continue string comparison

wrong_dll:
    mov r8, [r8]            ;move to next _list_entry (following Flink pointer)
    cmp r8, r12             ;see if we're back at the list head (circular list)
    jne for_each_dll

    xor rax, rax            ;DLL not found
    jmp done

found_dll:
    mov rbx, [r8+30h]       ;get dll base addr - points to DOS "MZ" header

    mov r9d, [rbx+3ch]      ;get DOS header e_lfanew field for offset to "PE" header
    add r9, rbx             ;add to base - now r9 points to _image_nt_headers64
    add r9, 88h             ;18h to optional header + 70h to data directories
                            ;r9 now points to _image_data_directory[0] array entry
                            ;which is the export directory

    mov r13d, [r9]          ;get virtual address of export directory
    test r13, r13           ;if zero, module does not have export table
    jnz has_exports

    xor rax, rax            ;no exports - function will not be found in dll
    jmp done

has_exports:
    lea r8, [rbx+r13]       ;add dll base to get actual memory address
                            ;r8 points to _image_export_directory structure (see winnt.h)

    mov r14d, [r9+4]        ;get size of export directory
    add r14, r13            ;add base rva of export directory
                            ;r13 and r14 now contain range of export directory
                            ;will be used later to check if export is forwarded

    mov ecx, [r8+18h]       ;NumberOfNames
    mov r10d, [r8+20h]      ;AddressOfNames (array of RVAs)
    add r10, rbx            ;add dll base

    dec ecx                 ;point to last element in array (searching backwards)
for_each_func:
    lea r9, [r10 + 4*rcx]   ;get current index in names array
    mov edi, [r9]           ;get RVA of name
    add rdi, rbx            ;add base
    mov rsi, rdx            ;pointer to function we're looking for

compare_func:
    cmpsb
    jne wrong_func          ;function name doesn't match

    mov al, [rsi]           ;current character of our function
    test al, al             ;check for null terminator
    jz found_func           ;if at the end of our string and all matched so far, found it

    jmp compare_func        ;continue string comparison

wrong_func:
    loop for_each_func      ;try next function in array
                            ;(note: loop exits when rcx reaches 0, so index 0 is never tested)

    xor rax, rax            ;function not found in export table
    jmp done

found_func:                 ;ecx is array index where function name found
                            ;r8 points to _image_export_directory structure
    mov r9d, [r8+24h]       ;AddressOfNameOrdinals (rva)
    add r9, rbx             ;add dll base address
    mov cx, [r9+2*rcx]      ;get ordinal value from array of words

    mov r9d, [r8+1ch]       ;AddressOfFunctions (rva)
    add r9, rbx             ;add dll base address
    mov eax, [r9+rcx*4]     ;Get RVA of function using index

    cmp rax, r13            ;see if func rva falls within range of export dir
    jl not_forwarded
    cmp rax, r14            ;if r13 <= func < r14 then forwarded
    jae not_forwarded

    ;forwarded function address points to a string of the form <DLL name>.<function>
    ;note: dll name will be in uppercase
    ;extract the DLL name and add ".DLL"

    lea rsi, [rax+rbx]      ;add base address to rva to get forwarded function name
    lea rdi, [rsp+30h]      ;using register storage space on stack as a work area
    mov r12, rdi            ;save pointer to beginning of string

copy_dll_name:
    movsb
    cmp byte ptr [rsi], 2eh ;check for '.' (period) character
    jne copy_dll_name

    movsb                   ;also copy period
    mov dword ptr [rdi], 004c4c44h  ;add "DLL" extension and null terminator

    mov rcx, r12            ;r12 points to "<DLL name>.DLL" string on stack
    call r15                ;call LoadLibraryA with target dll

    mov rcx, r12            ;target dll name
    mov rdx, rsi            ;target function name
    jmp start               ;start over with new parameters

not_forwarded:
    add rax, rbx            ;add base addr to rva to get function address
done:
    add rsp, 28h            ;clean up stack
    ret

lookup_api endp

end

Building

In the past I had developed 32-bit shellcode using the free and open-source Netwide Assembler (NASM), but when going
through the exercise of learning the 64-bit variety I figured I would try it out with the Microsoft Assembler (MASM) instead.
One problem quickly became apparent: MASM offers no way (that I know of) to generate raw binary machine code as
opposed to an .exe file! All is not lost though: the code bytes can be extracted from the .exe file easily enough (but in the
future I might go back to NASM).
First build a regular executable (note that no /defaultlib arguments are required; this code does not directly import any
functions from DLLs because it looks them up itself):
ml64 shell64.asm /link /entry:main

Then use dumpbin to display the section headers, and take note of the virtual size and file pointer to raw data for the .text
section:

dumpbin /headers shell64.exe

SECTION HEADER #1
   .text name
     1B2 virtual size
    1000 virtual address (0000000140001000 to 00000001400011B1)
     200 size of raw data
     200 file pointer to raw data (00000200 to 000003FF)
[...]

Converting these numbers to decimal, this means we need to extract 434 (0x1B2) bytes beginning at offset 512 (0x200) in the
file. This can be done with a hex editor, or with the following command if you have a Windows version of dd lying around
(I'm using Cygwin):
dd if=shell64.exe of=shell64.bin bs=1 count=434 skip=512

Now we have a file shell64.bin containing our shellcode. I like to open it in IDA Pro the first time and make sure it looks
right.

Testing

The following test program simply loads data from a file into memory and then transfers execution to it. It supports an
optional argument which will insert a debugger breakpoint prior to calling the shellcode. All of the error-handling code is
long and tedious, yes, but debugging shellcode can be difficult enough without having to worry about whether the test
program is working correctly. There is also a free tool called testival available for testing shellcode, which supposedly has
some nice features but I have not personally tried it.
Note the call to VirtualProtect() to enable execute permission on the allocated memory. This is necessary because the
process heap memory is non-executable by default on 64-bit Windows. This is called Data Execution Prevention (DEP) and
was designed specifically as a security measure. Without the VirtualProtect() call, the program will crash with an Access
Violation on the first instruction of the shellcode (debugging note: WinDbg can display the memory permissions for a given
address, for example with !vprot or !address). Bypassing DEP involves a technique called Return-Oriented Programming
(ROP) which is beyond the scope of this article (see mitigations section at the end).
Also note the use of __debugbreak() to insert the debugger breakpoint. Inline assembly language is not allowed by the
x64 Visual C++ compiler, so we can no longer write __asm int 3 to trigger a debugger as in x86 and must use the
__debugbreak() macro instead (it produces the same int 3 opcode). Take a look through intrin.h; there are numerous such
macros available.
//runbin.c
#include <windows.h>
#include <stdio.h>
#include <io.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>
#include <fcntl.h>
#include <intrin.h>

typedef void (*FUNCPTR)();

int main(int argc, char **argv)
{
    FUNCPTR func;
    void *buf;
    int fd, len;
    int debug;
    char *filename;
    DWORD oldProtect;

    //parse command line: optional -d switch followed by the file name
    if (argc == 3 && strlen(argv[1]) == 2 && strncmp(argv[1], "-d", 2) == 0) {
        debug = 1;
        filename = argv[2];
    } else if (argc == 2) {
        debug = 0;
        filename = argv[1];
    } else {
        fprintf(stderr, "usage: runbin [-d] <filename>\n");
        fprintf(stderr, " -d        insert debugger breakpoint\n");
        return 1;
    }

    fd = _open(filename, _O_RDONLY | _O_BINARY);
    if (-1 == fd) {
        perror("Error opening file");
        return 1;
    }

    len = _filelength(fd);
    if (-1 == len) {
        perror("Error getting file size");
        return 1;
    }

    buf = malloc(len);
    if (NULL == buf) {
        perror("Error allocating memory");
        return 1;
    }

    //heap memory is not executable by default, so enable execute permission
    if (0 == VirtualProtect(buf, len, PAGE_EXECUTE_READWRITE, &oldProtect)) {
        fprintf(stderr, "Error setting memory executable: error code %lu\n", GetLastError());
        return 1;
    }

    if (len != _read(fd, buf, len)) {
        perror("error reading from file");
        return 1;
    }

    func = (FUNCPTR)buf;

    if (debug) {
        __debugbreak();
    }

    func();

    return 0;
}

Build the test program with:


cl runbin.c

Then test the shellcode as follows:


runbin shell64.bin

If all goes well the message box should be seen:

If you want to step through it in a debugger, add the -d option:

runbin -d shell64.bin

For this to work, a Just-In-Time (JIT) debugger (also known as postmortem debugger) must be configured on the system. To
enable WinDbg as the JIT debugger, run windbg -I from the command line. For more information see Configuring
Automatic Debugging.

This shellcode was written from scratch with the goal of making it easy to understand (as much as shellcode can be anyway)
and to demonstrate how everything works. It is not the smallest or most optimized code possible. There are many other
published shellcode examples out there, and the Metasploit source code is particularly worth a look (the path is
/external/source/shellcode/windows/x64/src/).
Most shellcode does not handle forwarded exports as in this example, because it bloats and complicates the code and can
be worked around by determining in advance if the function is forwarded and just writing your code to call the ultimate
target instead. (The only catch is that whether an export is forwarded can change between operating system versions or
even service packs, so supporting forwarded exports does in fact make the shellcode more portable.)
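The check for a forwarded export can be sketched in portable C. This is a minimal illustration rather than the article's assembly: the function names are hypothetical, and it assumes the caller has already read the export directory's RVA and size from the PE headers. An export is a forwarder when its RVA points back inside the export directory, where a string of the form "DLL.Function" is stored instead of code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if func_rva lies inside the export
   directory [dir_rva, dir_rva + dir_size), which marks a forwarder. */
int is_forwarded_export(uint32_t func_rva, uint32_t dir_rva, uint32_t dir_size)
{
    return func_rva >= dir_rva && (func_rva - dir_rva) < dir_size;
}

/* Hypothetical helper: splits "DLL.Function" at the first dot.
   Copies the DLL part into dll (capacity dll_cap) and points *func at
   the function part. Returns 0 on success, -1 if the string is
   malformed or the buffer is too small. Assumes the module name
   itself contains no dot. */
int split_forwarder(const char *fwd, char *dll, size_t dll_cap, const char **func)
{
    const char *dot = strchr(fwd, '.');
    size_t n;

    if (dot == NULL || dot == fwd || dot[1] == '\0')
        return -1;
    n = (size_t)(dot - fwd);
    if (n + 1 > dll_cap)
        return -1;
    memcpy(dll, fwd, n);
    dll[n] = '\0';
    *func = dot + 1;
    return 0;
}
```

Shellcode that skips forwarder support simply omits this check and calls the ultimate target directly, at the cost of the portability noted above.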
A common variation on the technique for locating a function is to iterate through the export table computing a hash of
each function name, and then comparing it to a pre-computed hash value of the name of the function we're interested in.
This has the advantage of making the shellcode smaller, particularly if it uses many API functions with lengthy names, as
the code only needs to contain short hash values rather than full strings like ExitProcess. The technique also serves to
obscure which functions are being called and has even been used by stand-alone malicious executables for this purpose.
Metasploit goes even further and computes a single hash that covers both the function name and DLL name.
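The hash comparison can be sketched in C as follows. The rotate-right-by-13 additive hash used here is an assumption chosen because it is a commonly seen variant; it is not necessarily the exact hash Metasploit or any other particular shellcode uses. The function names and the plain array of names standing in for the export name table are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n must be 1..31). */
uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* Assumed hash: for each character, rotate right by 13 and add it. */
uint32_t name_hash(const char *name)
{
    uint32_t h = 0;
    while (*name) {
        h = ror32(h, 13);
        h += (unsigned char)*name++;
    }
    return h;
}

/* Walk a table of names and return the index of the first one whose
   hash equals wanted, or -1 if none matches. */
int find_by_hash(const char *const *names, size_t count, uint32_t wanted)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (name_hash(names[i]) == wanted)
            return (int)i;
    }
    return -1;
}
```

In real shellcode the value passed as wanted would be computed ahead of time and embedded as a 4-byte constant, so a string such as ExitProcess never appears in the code.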
It is also common practice to encrypt or encode the shellcode (typically with just a simple XOR type of algorithm
rather than true strong encryption), for the purpose of obfuscation and/or avoiding particular byte values in the code (such
as zeroes) that could prevent an exploit from working. The encrypted code is then prepended with a decoder stub that
decrypts and executes the main code.
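As a rough illustration of the encoding step only, the sketch below picks a single-byte XOR key that leaves no zero bytes in the encoded data and shows that applying the same XOR again restores the original. The function names are hypothetical, the single-byte key is an assumption (real encoders vary widely), and the decoder stub itself, which would be written in assembly, is not shown.

```c
#include <stddef.h>

/* Pick a nonzero key that does not occur in the data, so that no
   encoded byte (data[i] ^ key) comes out as zero.
   Returns the key, or -1 if every nonzero byte value is present. */
int pick_xor_key(const unsigned char *data, size_t len)
{
    unsigned char seen[256] = {0};
    size_t i;
    int k;

    for (i = 0; i < len; i++)
        seen[data[i]] = 1;
    for (k = 1; k < 256; k++) {
        if (!seen[k])
            return k;
    }
    return -1;
}

/* XOR is its own inverse, so this one routine both encodes and decodes. */
void xor_buffer(unsigned char *data, size_t len, unsigned char key)
{
    size_t i;
    for (i = 0; i < len; i++)
        data[i] ^= key;
}
```

The same idea extends to avoiding other forbidden byte values: a key is acceptable only if XORing it with every data byte never produces a forbidden value.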
Most shellcode does not bother with the error handling I put in place to return zero if the DLL or function cannot be
found, again because it makes the code larger and is not necessary once everything is tested.
The lookup_api function does not entirely behave itself according to the x64 calling conventions: in particular, it does
not bother to save and restore all of the registers that are deemed nonvolatile. (A function is allowed to modify rax, rcx,
rdx, r8, r9, r10, and r11, but should preserve the values of all others.) It also assumes that r15 will point to
LoadLibraryA if needed for forwarded functions.
Metasploit and others use NASM instead of MASM as the assembler (probably a good call given the aforementioned
limitation of MASM for outputting raw binary; NASM is also open source and runs on Linux and other platforms).
Metasploit uses decimal numbers for the various offsets into the data structures whereas I prefer hex (You might be a geek
if...).

Unfortunately for exploit developers and fortunately for PC users, the latest versions of Windows employ a variety of
effective exploit mitigation technologies. None of these features truly eliminate vulnerabilities but they can make it
significantly more difficult to execute arbitrary code via an exploit as opposed to simply crashing the program. For more
information on many of these mitigations and techniques for bypassing them, the Corelan exploit writing tutorials are
excellent (32-bit centric but still mostly applicable to x64).
Data Execution Prevention (DEP): This was discussed earlier regarding the VirtualProtect() call in the test program. By
default the stack and heap are configured to use non-executable memory pages which trigger an Access Violation if code
attempts to execute there. DEP can be bypassed using Return-Oriented Programming (ROP), where snippets of existing
executable code on the system are executed in sequence to accomplish a particular task.
Address Space Layout Randomization (ASLR): Rather than loading DLLs and EXEs at constant base addresses, the
operating system randomly varies the load address (at least across reboots, not necessarily between every invocation of a
program). ASLR does not prevent shellcode from executing (this example code runs just fine with it), but it makes it more
difficult to transfer execution to the shellcode in the first place. It also makes bypassing DEP using ROP much more
difficult. There are several approaches to bypassing ASLR, including the use of a secondary information-disclosure
vulnerability to obtain the base address of a module.
Stack cookies: Compiler-generated code is inserted before and after functions to detect if the return address on the stack
has been overwritten, making it more difficult to exploit stack-based buffer overflow vulnerabilities.
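The idea behind a stack cookie can be simulated in plain C. This is only a model, not what the compiler actually emits: the struct stands in for a stack frame, the function name is hypothetical, and the secret is passed in as a parameter, whereas a real compiler uses a value chosen at run time. The point is that an overflow running from the local buffer toward the return address has to pass through the cookie first, so the check on function exit notices it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated frame layout: local buffer, then cookie, then return address. */
struct frame {
    unsigned char buf[16];
    uint64_t cookie;
    uint64_t return_addr;
};

/* Copies n bytes into the frame starting at buf without checking n
   against the buffer size (the simulated bug), then performs the
   epilogue check. Returns 1 if the cookie is intact, 0 if corruption
   was detected. */
int copy_and_check(const unsigned char *src, size_t n, uint64_t secret)
{
    struct frame f;

    f.cookie = secret;                      /* simulated prologue */
    f.return_addr = 0x1122334455667788ULL;  /* placeholder value */

    if (n > sizeof f)
        n = sizeof f;       /* keep the simulation itself memory-safe */
    memcpy(&f, src, n);     /* may run past buf into cookie and beyond */

    return f.cookie == secret;              /* simulated epilogue */
}
```

A copy of 16 bytes or fewer leaves the cookie untouched, while any copy long enough to reach the return address must overwrite the cookie on the way.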
Structured Exception Handler (SEH) overwrite protection: This is not applicable to x64 because exception handlers are
not stored on the stack.
Export Address Table Access Filtering (EAF): This is a new option released as part of the Enhanced Mitigation Experience
Toolkit (EMET) in November 2010. It is designed to block shellcode from looking up API addresses by accessing DLL
export tables, and works by setting a hardware breakpoint on memory access to certain data structures. Microsoft
acknowledges that it can be easily bypassed but argues that it will break almost all shellcode in use today, and
that EMET can be updated in response to new attack techniques at much more frequent intervals than new releases of
Windows are possible. See this article on bypassing EAF for details.
