Professional Documents
Culture Documents
CURRENT
Peter Ferrie, Senior Anti-Virus Researcher, Microsoft Corporation
Abstract Unpackers are as old as the packers themselves, but restored to its non-present state. In either case, the page
anti-unpacking tricks are a more recent development. These address is kept in a list. In the event of exceptions triggered by
anti-unpacking tricks have developed quickly in number and, in execution of non-executable or non-present pages, the page
some cases, complexity. In this paper, we will describe some of address is compared to the entries in that list. A match
the most common anti-unpacking tricks, along with some indicates the execution of newly-written code, and is a possible
countermeasures. host entrypoint.
debugger, an emulator, a code-buffer, or a W-X interceptor. It The simplest of anti-dumping tricks is to change the
can be a tool in a virtual machine. There are corresponding SizeOfImage value in the Process Environment Block (PEB).
tricks for each of these, and they will be discuss ed separately. This interferes with process access, including preventing a
debugger from attaching to the process. It also breaks tools
- A memory-dumper dumps the process memory of the such as LordPE in default mode, among others.
running process, without regard to the code inside it. Example code looks like this:
- A debugger attaches to the process, allowing single- mov eax, fs:[30h] ;PEB
stepping, or the placing of breakpoints at key locations, in mov eax, [eax+0ch] ;LdrData
order to stop execution at the right place. The process can
;get InLoadOrderModuleList
then be dumped with more precision than a memory -dumper
mov eax, [eax+0ch]
alone.
;adjust SizeOfImage
- An emulator, as used within this paper, is a purely add dw [eax+20h], 1000h
software-based environment, most commonly used by anti-
malware software. It places the file to execute inside the The technique is used by many packers now. However,
environment and watches the execution for particular events of the technique is easily defeated, even by user-mode code.
interest. We can simply ignore the SizeOfImage value, and call the
VirtualQuery() function instead. The VirtualQuery() function
- A code-buffer is similar to, but different from, a debugger. returns the number of sequential pages whose attributes are
It also attaches to a process, but instead of executing the same. Since there cannot be gaps between sections in
instructions in-place, it copies each instruction into a private memory, the ranges can be enumerated by querying the first
buffer and executes it from there. It allows fine-grained control page after the end of the previous range. The enumeration
over execution as a result. It is also more transparent than a would begin with the ImageBase page and continue while
debugger, and faster than an emulator. the MEM_IMAGE type is returned. A page that is not of
the MEM_IMAGE type did not come from the file.
- A W-X interceptor uses page-level tricks to watch for
write-then-execute sequences. Typically, an executable region
b. Erasing the header
is marked as read-only and executable, and everything else is
marked as read-only and non-executable (or simply non-
present, depending on the hardware capabilities). Then the Some unpackers examine the section table to gather
code is allowed to execute freely. The interceptor intercepts interesting information about the image. Erasing or altering
exceptions that are triggered by writes to read-only pages, or that section table in the PE header can interfere with the
execution from non-executable or non-present pages. If the gathering of that information. This is typically used as to
hardware supports it, a read-only page will be replaced by a defeat ProcDump-style tools, which rely on the section table
writable but non-executable page, and the write will be allowed to dump the image.
to continue. Otherwise, the single-step exception will be used Example code looks like this:
to allow the write to complete, after which the page will be
;get image base start of the stolen bytes in the host, to point to the start of
push 0 the relocated code. A jump instruction is placed at the end
call GetModuleHandleA of the relocated code, to point to the end of the stolen bytes.
push eax The rest of the opcodes in the s tolen region in the host are
push esp then replaced with garbage. The relocated code can also be
push 4 ;PAGE_READWRITE interspersed with garbage instructions, in order to make it
;rounded up to hardware page size more difficult to determine the real instructions from the fake
push 1 instructions. This complicates the restoration of the original
push eax code. This technique was introduced in ASProtect.
xchg edi, eax
call VirtualProtect e. Guard Pages
xor ecx, ecx
mov ch, 10h ;assume 4kb pages Guard pages act as a one-shot access alarm. The first
;store VirtualProtect return value time that a guard page is accessed for any reason, an
rep stosb EXCEPTION_GUARD_PAGE (0x80000001) exception will be
raised. This can be used for a variety of things, but overall it
This technique is used by Yoda's Crypter, among others. acts as a demand-paging system for ring 3 code. The
As above, the VirtualQuery() function can be used to technique is achieved by intercepting the
recover the image size, and some of the layout (i.e. which EXCEPTION_GUARD_PAGE (0x80000001) exception,
pages are executable, which are writable, etc), but there is no checking if the page is within a particular range (for example,
way to recover the section table once it has been erased. within the process image space), then mapping in some
appropriate content if so.
c. Nanomites
This technique is used by Shrinker to perform on-
Nanomites are a more advanced method of anti-dumping. demand decompression. By decompressing only the pages
They were introduced in Armadillo. They work by replacing that are accessed, the startup time is reduced significantly.
branch instructions with an "int 3" instruction, and using The committed memory consumption can be reduced, since
tables in the unpacking code to determine the details. The any pages that are not accessed do not need any physical
details in this case are whether or not the "int 3" is a memory to back them. The overall application performance
nanomite or a debug break; whether or not the branch can also be increased, when compared to other packers that
should be taken, if it is a nanomite; the address of the decompress the entire application immediately. Shrinker
destination, if the branch is taken; and how large the works by hooking the ntdll KiUserExceptionDispatcher()
instruction is, if the branch is not taken. function, and watching for the EXCEPTION_GUARD_PAGE
(0x80000001) exception. If the exception occurs within the
A process that is protected by nanomites requires self- process image space, then Shrinker will load from disk the
debugging (known as "Debug Blocker" in Armadillo, see individual page that is being accessed, decompress it, and
Anti-Debugging:Self-Debugging section below), which uses then resume execution. If an access spans two pages, then
a copy of the same process. This allows the debugger to upon resuming, an exception will occur for the next page,
intercept the exceptions that are generated by the debuggee and Shrinker will load and decompress that page, too.
when the nanomite is hit. When the exception occurs in the
debuggee, the debugger recovers the exception address and A variation of this technique is used by Armadillo, to
searches for it in an address table. If a match is found, then perform on-demand decryption (known as "CopyMem2").
the nanomite type is retrieved from a type table. If the CPU However, as with nanomites, it requires the use of self-
flags match the type, then the branch will be taken. When debugging. This is in contrast to Shrinker, which is entirely
that happens, the destination address is retrieved from a self-contained. Armadillo decompresses all of the pages
destination table, and execution resumes from that address. into memory at once, rather than loading them from disk
Otherwise, the size of the branch is retrieved from the size when they are accessed. Armadillo uses the debugger to
table, in order to skip the instruction. intercept the exceptions in the debuggee, and watches for
the EXCEPTION_GUARD_PAGE (0x80000001) exception. If
d. Stolen Bytes the exception occurs within the process image space, then
Armadillo will decrypt the individual page that is being
Stolen bytes are opcodes that are taken from the host and accessed, and then resume execution. If an access spans
placed in dynamically allocated memory, where they will be two pages, then upon resuming, an exception will occur for
executed separately. A jump instruction is placed at the the next page, and Armadillo will decrypt that page, too.
a. PEB fields
If performance were not a concern, a protection method of
this type could also remember the last page that was loaded, i. NtGlobalFlag
and discard it when an exception occurs for another page
(unless the exception address suggests an access that The NtGlobalFlag field exists at offset 0x68 in the PEB.
spanned them). That way, no more than two pages will ever The value in that field is zero by default. On Windows
be in the clear in memory at the same time. In fact, that 2000 and later, there is a particular value that is typically
Armadillo does not do this could be considered a weakness stored in the field when a debugger is running. The
in the implementation, because by simply touching all of the presence of that value is not a reliable indication that a
pages in the image, Armadillo will decrypt them all, and then debugger is really running (especially since it is entirely
the process can be dumped entirely. absent on Windows NT). However, it is often used for
that purpose. The field is composed of a set of flags. The
f. Imports value that suggests the presence of a debugger is
composed of the following flags:
The list of imported functions can be very useful to get at
least some idea of what a program does. To combat this, FLG_HEAP_ENABLE_TAIL_CHECK (0x10)
some packers alter the import table after the imports have FLG_HEAP_ENABLE_FREE_CHECK (0x20)
been resolved. The alteration typically takes the form of FLG_HEAP_VALIDATE_PARAMETERS (0x40)
completely erasing the import table, but there are variations
that include changing the imported address to point to a Example incorrect code looks like this:
private buffer that is allocated dynamically. Within the
buffer is a jump to the real function address. This buffer is mov eax, fs:[30h] ;PEB
usually not dumped by default, so when the process exits, ;check NtGlobalFlag
the information is lost as to the real function addresses. cmp b [eax+68h], 70h
jne being_debugged
g. Virtual machines
This technique is used by ExeCryptor, among others.
Virtual machines are perhaps the ultimate in anti-dumping
technology, because at no point is the directly executable The "cmp" instruction above is a common mistake.
code ever visible in memory. Further, the import table might The assumption is that no other flags can be set, which is
contain only the absolutely required functions not true. Those three flags alone are usually set for a
(LoadLibrary() and GetProcAddress()), leaving no clue as to process that is created by a debugger, but not for a
what the program does. Additionally, the p-code might be process to which a debugger attaches afterwards.
encoded in some way, such that two behaviourally identical However, there are three further exceptions.
samples might have very different-looking contents. This
technique is used by VMProtect. The first exception is that additional flags can be set for
all processes, by a registry value. The registry value is
The p-code itself can be polymorphic, where do-nothing the "GlobalFlag" string value of the
instructions are inserted into the code flow, in the same way "HKLM\System\CurrentControlSet\Control\Session
as is often done for native code. This technique is used by Manager" registry key.
Themida.
The second exception is that all of the flags can be
The p-code can contain anti-debugging routines, such as controlled on a per-process basis, by a different registry
checking specific memory locations for specific values (see value. The registry value is the also the "GlobalFlag"
Anti-Debugging section below). This technique is used by string value (note that "Windows Anti-Debug Reference"
HyperUnpackMe2 i. by Nicolas Falliere iii incorrectly calls it "GlobalFlags") of
the "HKLM\Software\Microsoft\Windows
The p-code interpreter can be obfuscated, such that the NT\CurrentVersion\Image File Execution
method for interpretation is not immediately obvious. This Options\<filename>" registry key. The "<filename>" must
technique is used by Themida and Virtual CPUii. be replaced by the name of the executable file (not a DLL)
to which the flags will be applied when the file is
executed. An empty "GlobalFlag" string value will result
II. ANT I-UNPACKING BY ANT I-DEBUGGING
in no flags being set.
The third exception is that, on Windows 2000 and later, the second one (ForceFlags) is at offset 0x10 in the heap.
all of the flags can be controlled on a per-process basis, The Flags field indicates the settings that were used for the
by the Load Configuration Structure. The Load current heap block. The ForceFlags field indicates the
Configuration Structure has existed since Windows NT, settings that will be used for subsequent heap manipulation.
but the format was not documented by Microsoft in the The value in the first field is two by default, the value in the
PE/COFF Specification until 2006 (and incorrectly). The second field is zero by default. There are particular values
structure was extended to support Safe Exception that are typically stored in those fields when a debugger is
Handling in Windows XP, but it also contains two fields running, but the presence of those values is not a reliable
of relevance to this paper: GlobalFlagsClear and indication that a debugger is really running. However, they
GlobalFlagsSet. As their names imply, they can be used are often used for that purpose.
to clear and/or set any combination of bits in the PEB-
>NtGlobalFlag field. The flags specified by the The fields are composed of a set of flags. The value in
GlobalFlagsClear field are cleared first, then the flags the first field that suggests the presence of a debugger is
specified by the GlobalFlagsSet field are set. This means composed of the following flags:
that even if all of the flags are specified by the
GlobalFlagsClear field, any flags that are specified by the HEAP_GROWABLE (2)
GlobalFlagsSet field will still be set. No current packer HEAP_TAIL_CHECKING_ENABLED (0x20)
supports this structure. HEAP_FREE_CHECKING_ENABLED (0x40)
HEAP_SKIP_VALIDATION_CHECKS (0x10000000)
If the FLG_USER_STACK_TRACE_DB (0x1000) is HEAP_VALIDATE_PARAMETERS_ENABLED
specified to be set, either by the "GlobalFlag" registry (0x40000000)
value, or in the GlobalFlagsSet field, the
FLG_HEAP_VALIDATE_PARAMETERS will Example code looks like this:
automatically be set, even if it is specified in the
GlobalFlagsClear field. mov eax, fs:[30h] ;PEB
;get process heap base
Thus, the correct implementation to detect the default mov eax, [eax+18h]
value is this one: mov eax, [eax+0ch] ;Flags
dec eax
mov eax, fs:[30h] ;PEB dec eax
mov al, [eax+68h] ; NtGlobalFlag jne being_debugged
and al, 70h
cmp al, 70h The value in the second field that suggests the presence
je being_debugged of a debugger is composed of the following flags:
The process default heap is another place to find Example code looks like this:
debugging artifacts. The base heap pointer can be retrieved
by the kernel32 GetProcessHeap() function. Some packers mov eax, fs:[30h] ;PEB
avoid using the API and look directly at the PEB instead. ;get process heap base
mov eax, [eax+18h]
Example code looks like this: cmp [eax+10h], 0 ;ForceFlags
jne being_debugged
mov eax, fs:[30h] ;PEB
;get process heap base The "tail" flags are set in the heap fields if the
mov eax, [eax+18h] FLG_HEAP_ENABLE_TAIL_CHECK flag is set in the PEB-
>NtGlobalFlags field. The "free" flags are set in the heap
Within the heap are two fields of interest. The PEB- fields if the FLG_HEAP_ENABLE_FREE_CHECK flag is set
>NtGlobalFlags field forms the basis for the values in those in the PEB->NtGlobalFlags field. The validation flags are set
fields. The first field (Flags) exists at offset 0x0c in the heap, in the heap fields if the
FLG_HEAP_VALIDATE_PARAMETERS flag is set in the mov eax, fs:[30h] ;PEB
PEB->NtGlobalFlags field. However, the heap flags can be ;check BeingDebugged
controlled on a per-process basis, through the cmp b [eax+2], 0
"PageHeapFlags" value, in the same manner as "GlobalFlag" jne being_debugged
above.
To defeat these methods requires only setting the PEB-
c. The Heap >BeingDebugged flag to FALSE. A common
convenience while debugging is to place a breakpoint at
The problem with simply clearing the heap flags is that the first instruction in the kernel32 IsDebuggerPresent()
the initial heap will have been initialised with the flags function. Some unpackers check explicitly for this
active, and that leaves some artifacts that can be detected. breakpoint.
Specifically, at the end of the heap block will one definite Example code looks like this:
value, and one possible value. The
HEAP_TAIL_CHECKING_ENABLED flag causes the push offset l1
sequence 0xABABABAB to always appear twice at the call GetModuleHandleA
exact end of the allocated block. The push offset l2
HEAP_FREE_CHECKING_ENABLED flag causes the push eax
sequence 0xFEEEFEEE (or a part thereof) to appear if call GetProcAddress
additional bytes are required to fill in the slack space until cmp b [eax], 0cch
the next block. je being_debugged
Example code looks like this: ...
l1: db "kernel32", 0
mov eax, <heap ptr> l2: db "IsDebuggerPresent", 0
;get unused_bytes
movzx ecx, b [eax-2] Some packers check that the first byte in the function is
movzx edx, w [eax-8] ;size the "64" opcode ("FS:" prefix).
sub eax, ecx Example code looks like this:
lea edi, [edx*8+eax]
mov al, 0abh push offset l1
mov cl, 8 call GetModuleHandleA
repe scasb push offset l2
je being_debugged push eax
call GetProcAddress
These values are checked by Themida. cmp b [eax], 64h
jne being_debugged
d. Special APIs ...
l1: db "kernel32", 0
i. IsDebuggerPresent l2: db "IsDebuggerPresent", 0
The ntdll NtQueryInformationProcess() function has This technique is used by HyperUnpackMe2, among
these parameters: HANDLE ProcessHandle, others. Since this information comes from the kernel,
PROCESSINFOCLASS ProcessInformationClass, PVOID there is no easy way for user-mode code to prevent this
ProcessInformation, ULONG ProcessInformationLength, call from revealing the presence of the debugger.
PULONG ReturnLength. Windows Vista supports 45
classes of ProcessInformationClass information (up from The undocumented ProcessDebugFlags class returns
38 in Windows XP), but only four of them are documented the inverse value of the EPROCESS->NoDebugInherit bit.
by Microsoft so far. One of them is the That is, the return value is FALSE if a debugger is
ProcessDebugPort. It is possible to query for the present.
existence (not the value) of the port. The return value is Example code looks like this:
0xffffffff if the process is being debugged. Internally, the
function queries for the non-zero state of the EPROCESS- push eax
>DebugPort field. mov eax, esp
Example code looks like this: push 0
push 4 ;ProcessInformationLength
push eax push eax
mov eax, esp push 1fh ;ProcessDebugFlags
push 0 push -1 ;GetCurrentProcess()
push 4 ;ProcessInformationLength call NtQueryInformationProcess
push eax pop eax
push 7 ;ProcessDebugPort test eax, eax
push -1 ;GetCurrentProcess() je being_debugged
call NtQueryInformationProcess
pop eax This technique is used by HyperUnpackMe2, among
test eax, eax others. Since this information comes from the kernel,
jne being_debugged there is no easy way for user-mode code to prevent this
call from revealing the presence of the debugger.
This technique is used by MSLRH, among others.
Since this information comes from the kernel, there is no v. NtQuerySystemInformation
easy way for user-mode code to prevent this call from
revealing the presence of the debugger. The ntdll NtQuerySystemInformation() function has
these parameters: SYSTEM_INFORMATION_CLASS
iv. Debug Objects SystemInformationClass, PVOID SystemInformation,
ULONG SystemInformationLength, PULONG
Windows XP introduced a "debug object". When a ReturnLength. Windows Vista supports 106 classes of
debugging session begins, a debug object is created, and SystemInformationClass information (up from 72 in
a handle is associated with it. It is possible to query for Windows XP), but only nine of them are documented by
the value of this handle, using the undocumented Microsoft so far. None of them is the
ProcessDebugObjectHandle class. SystemKernelDebuggerInformation class, which has
Example code looks like this: existed since Windows NT.
push 3
The SystemKernelDebuggerInformation class returns push ebx
the value of two flags: KdDebuggerEnabled in al, and call NtQueryObject
KdDebuggerNotPresent in ah. Thus, the return value in pop ebp
ah is FALSE if a debugger is present. push 4 ;PAGE_READWRITE
Example code looks like this: push 1000h ;MEM_COMMIT
push ebp
push eax push ebx
mov eax, esp call VirtualAlloc
push 0 push ebx
push 2 ;SystemInformationLength ;ObjectInformationLength
push eax push ebp
;SystemKernelDebuggerInformation push eax
push 23h ;ObjectAllTypesInformation
call NtQuerySystemInformation push 3
pop eax push ebx
test ah, ah xchg esi, eax
je being_debugged call NtQueryObject
lodsd ;handle count
This technique is used by SafeDisc. Since this xchg ecx, eax
information comes from the kernel, there is no easy way to l1: lodsd ;string lengths
prevent this call from revealing the presence of the movzx edx, ax ;length
debugger. ;pointer to TypeName
lodsd
vi. NtQueryObject xchg esi, eax
;sizeof(L"DebugObject")
The ntdll NtQueryObject() function has these ;avoids superstrings
parameters: HANDLE Handle, ;like "DebugObjective"
OBJECT_INFORMATION_CLASS cmp edx, 16h
ObjectInformationClass, PVOID ObjectInformation, jne l2
ULONG ObjectInformationLength, PULONG xchg ecx, edx
ReturnLength. Windows NT-based platforms support five mov edi, offset l3
classes of ObjectInformationClass information, but only repe cmpsb
two of them are documented by Microsoft so far. Neither xchg ecx, edx
of them is the ObjectAllTypesInformation, which we jne l2
require. ;TotalNumberOfObjects
cmp [eax], edx
As noted above, when a debugging session begins on jne being_debugged
Windows XP, a debug object is created, and a handle is ;point to trailing null
associated with it. It is possible to query for the list of l2: add esi, edx
existing objects, and check the number of debug objects ;round down to dword
that exist. This API is supported by Windows NT-based and esi, -4
platforms, but only Windows XP and later will return a ;skip trailing null
debug object in the list. ;and any alignment bytes
Example code looks like this: lodsd
loop l1
xor ebx, ebx ...
push ebx l3: dw "D","e","b","u","g"
push esp ;ReturnLength dw "O","b","j","e","c","t"
;ObjectInformationlength of 0
;to receive required size Since this information comes from the kernel, there is
push ebx no easy way for user-mode code to prevent this call from
push ebx revealing the presence of the debugger.
;ObjectAllTypesInformation
vii. Thread hiding level denial-of-service, by causing the CSRSS.EXE
process to perform an illegal operation. One method is
Windows 2000 introduced an explicitly anti-debugging the creation of a thread at an invalid memory address, or a
API extension, in the form of an information class called thread that executes an infinite loop. However, since the
HideThreadFromDebugger. It can be applied on a per- control is complete, an application can inject a thread into
thread basis, using the ntdll SetInformationThread() the CSRSS.EXE process space and perform some
function. meaningful action, which results in a privilege elevation.
Example code looks like this: However, this is of only minor concern, since usually only
Administrators will be able to acquire the debug privilege,
push 0 and Administrators are highly privileged already. This
push 0 technique was described publicly by Piotr Bania iv in 2005.
;HideThreadFromDebugger
push 11h Both OllyDbg and WinDbg acquire the debug privilege,
push -2 ;GetCurrentThread() but Turbo Debug does not. The best way to defeat this
call NtSetInformationThread technique is to not acquire the privilege arbitrarily, and
keep it for only as long as truly necessary.
When the function is called, the thread will continue to
run but a debugger will no longer receive any events ix. CloseHandle
related to that thread. Among the missing events are that
the process has terminated, if the main thread is the If an invalid handle is passed to the kernel32
hidden one. This technique is used by CloseHandle() function (or directly to the ntdll NtClose()
HyperUnpackMe2, among others. function), and no debugger is present, then an error code
is returned. However, if a debugger is present, an
viii. OpenProcess EXCEPTION_INVALID_HANDLE (0xc0000008) exception
will be raised. This exception can be intercepted by an
When a process acquires the SeDebugPrivilege, it exception handler, and is an indication that a debugger is
gains full control of the CSRSS.EXE, even though running.
CSRSS.EXE is a system process. The reason for that is Example code looks like this:
because SeDebugPrivilege overrides all of the restrictions
for that process alone. Further, the privilege is passed to xor eax, eax
child processes, such as the ones created by a debugger. push offset being_debugged
The result is if a debugged application can obtain the push dw fs:[eax]
process ID for CSRSS.EXE, it can open the process via mov fs:[eax], esp
the kernel32 OpenProcess() function. The process ID can ;any illegal value will do
be obtained by the kernel32 CreateToolhelp32Snapshot() ;must be dword-aligned on Vista
function and a kernel32 Process32Next() function push esp
enumeration; or the ntdll NtQuerySystemInformation call CloseHandle
(SystemProcessInformation (5)) function (and the ntdll
NtQuerySystemInformation() function is how the kernel32 To defeat this method is easiest on Windows XP, where
CreateToolhelp32Snapshot() function gets its information a FirstHandler Vectored Exception Handler can be
on Windows NT-based platforms). Alternatively, registered by the debugger to hide the exception and
Windows XP introduced the ntdll CsrGetProcessId() silently resume execution. Of course, there is the problem
function, which simplifies things greatly. of transparently hooking the kernel32
Example code looks like this: AddVectoredExceptionHandler() function, in order to
prevent another handler from registering as the first
call CsrGetProcessId handler. However, it is still better than the problem of
push eax transparently hooking the ntdll NtClose() on Windows NT
push 0 and Windows 2000, in order to register a Structured
push 1f0fffh ;PROCESS_ALL_ACCESS Exception Handler to hide the exception.
call OpenProcess
test eax, eax x. OutputDebugString
jne being_debugged
The kernel32 OutputDebugString() function can
This opens (no pun intended) the way to a system- demonstrate different behaviour, depending on whether
or not a debugger is present. The most obvious
difference in behaviour that the kernel32 GetLastError() The kernel32 WriteProcessMemory() function
function will return zero if a debugger is present. technique is a simple variation on the kernel32 ReadFile()
Example code looks like this: function technique above, but it requires that the data to
write are already present in the process memory space.
push 0 Example code looks like this:
push esp
call OutputDebugStringA push 1
call GetLastError push offset l1
test eax, eax push offset l2
je being_debugged push -1 ;GetCurrentProcess()
call WriteProcessMemory
xi. ReadFile l1: nop
l2: int 3
The kernel32 ReadFile() function can be used as a
technique for self-modification, by reading file content This technique is used by NsAnti. The way to defeat
into the code stream. It is also an effective method for this technique is to use hardware breakpoints instead of
removing software breakpoints that a debugger might software breakpoints after the API call.
place. This is a technique that I discussed privately in
1999, but it was described publicly by Piotr Bania v in 2007. xiii. UnhandledExceptionFilter
Example code looks like this:
When an exception occurs, and no registered
xor ebx, ebx Structured Exception Handlers (neither Safe nor Legacy)
mov ebp, offset l2 or Vectored Exception Handlers exist, or none of the
push 104h ;MAX_PATH registered handlers handles the exception, the kernel32
push ebp UnhandledExceptionFilter() function will be called as a
push ebx ;self filename last resort. Within that function is a call to the handler
call GetModuleFileNameA that was registered by the kernel32
push ebx SetUnhandledExceptionFilter() function, but that call will
push ebx not reached if a debugger is present. Instead, the
push 3 ;OPEN_EXISTING exception will be passed to the debugger. The presence
push ebx of a debugger is determined by a call to the ntdll
push 1 ;FILE_SHARE_READ NtQueryInformationProcess (ProcessDebugPort class)
push 80000000h ;GENERIC_READ function. So, for applications that do not know about the
push ebp ntdll NtQueryInformationProcess (ProcessDebugPort
call CreateFileA class) function, the missing exception can be used to infer
push ebx the presence of the debugger.
push esp Example code looks like this:
;more bytes might be more useful
push 1 push offset l1
push offset l1 call SetUnhandledExceptionFilter
push eax ;force an exception to occur
call ReadFile int 3
;replaced by "M" jmp being_debugged
;from the MZ header l1: ...
l1: int 3
... xiv. Block Input
l2: db 104h dup (?);MAX_PATH
The user32 BlockInput() function blocks mouse and
The way to defeat this technique is to use hardware keyboard events from reaching applications. It is a very
breakpoints instead of software breakpoints after the API effective way to disable debuggers.
call. Example code looks like this:
The kernel32 SuspendThread() function can be another Windows NT-based platforms support multiple
very effective way to disable user-mode debuggers like desktops per session. It is possible to select a different
OllyDbg and Turbo Debug. This can be achieved by active desktop, which has the effect of hiding the
enumerating the processes, as described above, then windows of the previously active desktop, and with no
suspending the main thread of the parent process, if it obvious way to switch back to the old desktop.
does not match "Explorer.exe". This technique is used by
Yoda's Protector. Example code looks like this:
Other common device names include these: The interrupt 1 descriptor normally has a descriptor
privilege level (DPL) of 0, which means that the "cd 01"
\\.\REGVXG opcode ("int 1" instruction) cannot be issued from ring 3.
\\.\REGSYS An attempt to execute this interrupt directly will result in a
general protection fault ("int 0x0d" exception) being
These names belong to RegMon. The first name is for issued by the CPU, eventually resulting in an
Windows 9x-based platforms, the second name is for EXCEPTION_ACCESS_VIOLATION (0xc0000005)
Windows NT-based platforms. exception being raised by Windows.
OllyDbg can be found by calling the user32 WinDbg can be found by calling the user32
FindWindow() function, and passing "OLLYDBG" as the FindWindow() function, and passing
class name to find. "WinDbgFrameClass" as the class name to find.
Example code looks like this: Example code looks like this:
vulnerability, whereby an attacker will produce a sample
push 0 which intentionally delays its main execution, usually via a
push offset l1 dummy loop, in an attempt to force an emulator to give up.
call FindWindowA Example code looks like this:
test eax, eax
jne being_debugged mov ecx, 400000h
... l1: loop l1
l1: db "WinDbgFrameClass", 0
In some cases, such dummy loops can be recognized and
l. Miscellaneous tools skipped, but in that case, care must be taken to adjust the
values of any internal timers, and also the CPU registers that
i. FindWindow are involved. Otherwise, the arbitrary skipping of the loop
might be detected.
There are several less common tools that are of interest Example code looks like this:
to some packers, such as window name of "Import
REConstructor v1.6 FINAL (C) 2001-2003 MackT/uCF", or call GetTickCount
a class name of "TESTDBG", "kk1, "Eew57", or xchg ebx, eax
"Shadow". These names are checked by MSLRH. mov ecx, 400000h
l1: loop l1
call GetTickCount
III. ANT I-UNPACKING BY ANT I-EMULAT ING
sub eax, ebx
Some methods to detect emulators and virtual machines cmp eax, 1000h
have been described elsewhere ix. Some additional methods are jbe being_debugged
described here. A companion paper (Anti-Unpacking Tricks -
Future) will describe some further methods. Further, the loop might not be a dummy one at all, in the
sense that the results might be used for a real purpose, even
a. Software Interrupts though they could have been calculated without resorting to
a loop.
i. Interrupt 3 Real-world example code looks like this:
b. Time-locks Many APIs return error codes when they receive invalid
parameters. The problem for anti-malware emulators is that,
Time-locks are a very effective anti-emulation technique. for simplicity, such error checking is not implemented. This
Most anti-malware emulators intentionally contain a limit to leads to a vulnerability, whereby an attacker will
the amount of time and/or the number of CPU instructions intentionally pass known invalid parameters to the function,
that can be emulated, before the emulator will exit with no and expecting an error code to be returned. In s ome cases,
detection. This behavior is almost a requirement, since a this error code is used as a key for decryption. Any
user will typically not be patient enough to wait for an emulator that fails to return the error code will not be able to
emulated application to exit on its own (if it ever would), decrypt the data.
before being able to access it normally. This leads to a Example code looks like this:
push 1 e. GetProcAddress(internal)
push 1
call Beep Some anti-malware emulators export special APIs, which
call GetLastError can be used to communicate with the host environment, for
;ERROR_INVALID_PARAMETER (0x57) example. This technique has been published elsewherex.
push 5 ;sizeof(l2) Example code looks like this:
pop ecx
xchg edx, eax push offset l1
mov esi, offset l2 call GetModuleHandleA
mov edi, esi push offset l2
l1: lodsb push eax
xor al, dl call GetProcAddress
stosb test eax, eax
loop l1 jne being_debugged
... ...
l2: db 3fh, 32h, 3bh, 3bh, 38h l1: db "kernel32", 0
;secret message l2: db "Aaaaaa", 0
There are many known file-format tricks, yet occasionally v. Non-aligned PointerToRawData
a new one will appear. This can be a significant problem for
anti-malware emulators, since if the emulator is responsible The PointerToRawData field in the section table is a
for parsing the file-format, then incompatibilities can appear field that is subject to automatic rounding down by
because of differences in the emulated operating system. Windows. By relying on this behaviour, it is possible to
For example, Windows 9x-based platforms use a hard-coded produce a section whose entrypoint appears to point to
value for the size of the Optional Header, and ignore the PE- data other than what will actually be executed.
>SizeOfOptionalHeader field. They also allow gaps in the
virtual memory described by the section table. Windows NT- vi. No section table
based platforms honour the value in the PE-
>SizeOfOptionalHeader field, and do not allow any gaps. An interesting thing happens if the value in the PE-
Typical tricks include: >SectionAlignment field is reduced to less than 4kb.
Normally, the section that contains the PE header is
i. Non-aligned SizeOfImage neither writable nor executable, since there is no section
table entry that describes it. However, if the value in the
The file-format documentation states that the value in PE->SectionAlignment field is less than 4kb, then the PE
header is marked internally as both writable and to a specified user-mode address, but it also returns the
executable. Further, the contents of the section table original value of the page attributes. Any change in the
become optional. That is, the entire section table can be attributes is an indication that an intercepter is running.
zeroed out, and the file will be mapped as though it were Example code looks like this:
one section whose size is equal to the value in the PE-
>SizeOfImage field. ;sizeof(MEMORY_BASIC_INFORMATION)
push 1ch
mov ebx, offset l1
IV. ANT I-UNPACKING BY ANT I-INT ERCEPT ING
push ebx
push ebx
a. Write->Exec call VirtualQuery
;PAGE_EXECUTE_READWRITE
Some unpacking tools work by intercepting the execution cmp b [ebx+14h], 40h
of newly written pages, to guess when the unpacker has jne being_debugged
completed its main function and transferred control to the ...
host. By writing then executing a dummy instruction, an ;sizeof(MEMORY_BASIC_INFORMATION)
unpacker can cause an intercepter to exit early. l1: db 1ch dup (?)
Example code looks like this:
The kernel32 VirtualProtect() function is another way to
mov [offset dest], 0c3h query the page attributes, since the previous attributes are
call dest returned by the function. Any change in the attributes is an
indication that an intercepter is running.
This technique is used by ASPack , among others. Example code looks like this:
However, it is probably for an entirely different reason,
which is to force a CPU queue flush for multiprocessor l1: push eax
environments. push esp
push 40h ;PAGE_EXECUTE_READWRITE
b. Write^Exec push 1
push offset l1
Some unpacking tools work by changing the previously call VirtualProtect
writable-executable page attributes to either writable or pop eax
executable, but not both. These changes can be detected ;PAGE_EXECUTE_READWRITE
indirectly. The easier method to achieve this is to use a cmp al, 40h
function that uses the kernel to write to a specified user- jne being_debugged
mode address. The function will return an error if it fails to
write to the address. In this case, the address to specify is
one in which the page attributes are likely to have been V. MISCELLANEOUS
altered. A good candidate function is the kernel32
VirtualQuery() function. a. Fake signatures
Example code looks like this:
Some packers emit the startup code for other popular
;sizeof(MEMORY_BASIC_INFORMATION) packers and tools, in an attempt to fool unpackers into
push 1ch misidentifying the wrapper. Among the most popular of
mov ebx, offset l1 these fake signatures is the startup code for Microsoft
push ebx Visual C, which was written to fool PEiD. This technique is
push ebx used by RLPack Professional, among others.
call VirtualQuery
test eax, eax
je being_debugged CONCLUSION
... There are many different classes of anti-unpacking
;sizeof(MEMORY_BASIC_INFORMATION) techniques, and this paper has attempted to describe a subset
l1: db 1ch dup (?) of the known ones. A companion paper (Anti-Unpacking
Tricks - Future) will describe some of the possible future ones,
Not only does the kernel32 VirtualQuery() function write
so that we can, where possible, construct defenses against
them.
Final note:
i
http://crackmes.de/users/thehyper/hyperunpackme2/
ii
http://www.honeynet.org/scans/scan33/index.html
iii
http://www.securityfocus.com/infocus/1893
iv
http://www.piotrbania.com/all/articles/antid.txt
v
http://piotrbania.com/all/articles/bypassing_the_breakpoints.tx
t
vi
http://www.defendion.com/EliCZ/infos/TlsInAsm.zip
vii
http://pferrie.tripod.com/papers/chiton.pdf
viii
http://vx.eof-project.net/viewtopic.php?id=142
ix
http://pferrie.tripod.com/papers/attacks2.pdf
x
http://pferrie.tripod.com/papers/attacks2.pdf
xi
http://www.symantec.com/enterprise/security_response/weblo
g/2007/02/x86_fetchdecode_anomalies.html