
Win32API Interceptor Final Report

Win32API Interceptor
A Microsoft Windows API function calls
Interception application

Final Report


Table of contents
Abstract
Introduction
The Goal – Project scope
Technologies used in the solution’s architecture
    Microsoft Research “Detours” Technology
        Loading a DLL into a process’s context
        Detouring a function
    COM
        DllInjectionAppLoader
        InterceptLogger
    Microsoft Access Data-Base
The Solution
    Architecture
        Win32API Interceptor
        InterceptLogger
        DllInjectionAppLoader
        \\.\pipe\Win32APIInterceptor
        TraceAPI.DLL
        Spawned Process
    Code snippets
        The DETOUR_TRAMPOLINE macro
        DetourGenMovEax function
        DetourFunctionWithTrampolineEx function
        detour_insert_detour function
        Creating a trampoline function for the "CreateProcessW" function
        Instrumenting the "CreateProcessW" function
        The "CreateProcessW" Detour function
        InjectLibrary function
    The Application Guide
        Win32APIInterceptor installation
        GUI guide
        Hands-on example
    Known Issues
Appendices
    Microsoft Research “Detours”

Abstract
This project introduces a novel approach to intercepting Win32API function calls.
It is based on a Microsoft Research technology called Detours.
The final product of this project is an MS Access-based application that logs all the
Win32API function calls that are issued by an application of the user's choice.

This project was created by:

Dr. Ilana David    ilana@ee.technion.ac.il      Instructor and EE Software Lab chief engineer
Ben Bernstein      bbern@microsoft.com          Instructor and a developer in the Microsoft R&D Haifa center
Kfir Karmon        skarmon@t2.technion.ac.il    Student of the Computer Science department
Polina Shafran     shpolina@t2.technion.ac.il   Student of the Computer Science department

Introduction
Innovative systems research hinges on the ability to easily instrument and extend
existing operating system and application functionality. With access to appropriate
source code, it is often trivial to insert new instrumentation or extensions by
rebuilding the OS or application. However, in today’s world of commercial software,
researchers seldom have access to all relevant source code.
In this project we use "Detours", which is a library for instrumenting arbitrary Win32
functions on x86 machines. Detours intercepts Win32 functions by re-writing target
function images.
While prior researchers have used binary rewriting to insert debugging and profiling
instrumentation, to our knowledge, Detours is the first package on any platform to
logically preserve the un-instrumented target function (callable through a trampoline)
as a subroutine for use by the instrumentation. Using the unique trampoline design
is crucial for extending existing binary software.
Since the project’s scope is bounded by an academic course, we did not implement a
complete solution. We concentrated on understanding the technologies involved and
on producing a working prototype of an infrastructure that intercepts
Windows API functions (for NT-family OSs).

The Goal – Project scope


The goal we set was to build a non-intrusive framework that intercepts
Windows API function calls issued by a basic Win32 application.
Our main interest in this project was to become familiar with the Detours technology and
to develop a working prototype of a real-world, usable application.
Therefore, some shortcuts were taken: several easy, though inefficient, methods
were used so that we could focus on the main goal that we set for this
project.


Technologies used in the solution’s architecture
Microsoft Research “Detours” Technology
The Detours technology was conceived in the Microsoft Research labs.
It is yet another method for intercepting function calls. Many techniques exist for
this task, but this mechanism is non-intrusive: the executable file is not
altered; only its in-memory image is changed. This way you do not need to recompile
the code (no sources are needed).
Furthermore, you can run several instances of the executable and intercept only the
instances that interest you.
In this section we’ll explain how this is achieved.
It should be noted that Detours can in fact intercept any function call, not just
Windows API function calls.
We describe below the way Detours works, and include code snippets from the
Detours code for further explanation.

Loading a DLL into a process’s context


Let’s assume we magically have a DLL that contains all the interception code, so that
once it is loaded into a process’s memory space it intercepts all
Win32API function calls. (In the next section we shall describe how to make that
magical DLL.)
What we need to do is force the process that we want to intercept to
call LoadLibrary() with the given DLL.
The following steps and code snippets describe how to achieve this. (A
diagram following this explanation might help clarify the process.)
1) First of all, we create a new process for the application we want to
intercept.
Since we want it to start suspended, so that we can inject our DLL, we
add the “CREATE_SUSPENDED” creation flag in the call to the
CreateProcess() function, like so:
    DWORD dwMyCreationFlags = (dwCreationFlags | CREATE_SUSPENDED);
    if (!CreateProcess(lpApplicationName,
                       lpCommandLine,
                       lpProcessAttributes,
                       lpThreadAttributes,
                       bInheritHandles,
                       dwMyCreationFlags,
                       lpEnvironment,
                       lpCurrentDirectory,
                       lpStartupInfo,
                       &pi)) {
        return FALSE;
    }


2) Now we hold an instance of the PROCESS_INFORMATION structure, named pi, that
contains the created process’s handle (and the main thread’s handle).
3) The next step is to acquire the main thread’s context (a CONTEXT structure; our
instance is named cxt). We need it in order to place the assembly code that calls
LoadLibrary() on the stack and to update the EIP register.
This is done like so (the ContextFlags member of cxt must be set beforehand, for
example to CONTEXT_FULL, to select which registers are retrieved):
    GetThreadContext(hThread, &cxt);
(hThread is a handle to the main thread of the suspended process.)
4) Now we create a structure that holds all the generated assembly code
and its parameters. The structure looks like this:
    struct Code {
        BYTE rbCode[128];
        CHAR szLibFile[512];
    } code;
rbCode will hold the assembly code and szLibFile will hold a copy of the
injected DLL’s name.
5) Let’s calculate the address at which the code structure will begin in the target
process, so we can generate assembly code for that location:
    nCodeBase = (cxt.Esp - sizeof(code));
We first write the assembly code into the code structure, which resides
in our own process’s memory space; only when we are done do we copy
the structure (containing the code) into the memory of the newly spawned,
suspended process. The target address will be nCodeBase.
All relative addresses and values are calculated using the target
location (nCodeBase).
6) Now let’s copy the name of the injected DLL to code.szLibFile, like so:
    CopyMemory(code.szLibFile, pDLLName, strlen(pDLLName)+1);
Where pDLLName is a pointer to the injected DLL’s name.
Note that the string is copied into the code structure because the structure will be
copied to the target process’s memory space; this way the DLL’s name is available
inside the target process.
7) The next step is to write a “push” instruction into the code structure that pushes a
pointer to the DLL’s name; this will be the parameter for the
LoadLibrary() function call.
This is done using an internal function called DetourGenPush(). We will not dig
into this function, but basically it writes the “push”
instruction’s opcode followed by the address to be pushed.
    pbCode = DetourGenPush(code.rbCode,
                           nCodeBase + offsetof(Code, szLibFile));
Where offsetof() is a macro that retrieves the offset of a member of a
structure from the beginning of the structure.
8) Generating a “call” instruction is the next step. This too is done by
an internal function, called DetourGenCall(). Again, we will not list this
function; in short, it writes the “call” instruction’s opcode
and the relative address to call. As with DetourGenPush(), the returned pointer is
kept in pbCode so that the next instruction is appended after this one.
    pbCode = DetourGenCall(pbCode, pfLoadLibrary,
                           (PBYTE)nCodeBase + (pbCode - code.rbCode));
Where pfLoadLibrary is the address of the LoadLibrary() function.


9) Now we add the last generated instruction.
It is an unconditional jump to the address currently held in the EIP
register (the address of the next instruction the thread would have executed had we
not suspended the process).
This is necessary because after we resume the thread we want it to load
our DLL and then continue along its regular code path, as if we had never
interrupted it.
As you probably guessed, this is done by an internal function called
DetourGenJmp(), which writes the JMP opcode and the relative
address to jump to:
    DetourGenJmp(pbCode, (PBYTE)cxt.Eip,
                 (PBYTE)nCodeBase + (pbCode - code.rbCode));
10) Now we need to change the thread’s context so that EIP points to our
first generated instruction and ESP points just below our
generated code (so that the stack activity of LoadLibrary() will not overwrite it).
This is done by editing cxt’s members and calling the SetThreadContext()
function, like so:
    cxt.Esp = nCodeBase - 4;
    cxt.Eip = nCodeBase;
    SetThreadContext(hThread, &cxt);
Note: code is executed from low addresses to
high addresses, whereas the stack grows the other way around, toward low addresses.
Since we appended each generated instruction at the pbCode pointer (advancing it
every time), the code was written from its initial position (code.rbCode) toward
higher addresses, while anything pushed on the stack from the new ESP goes toward
lower addresses, away from our code. Consult the illustration below for more details.
11) We’re nearly there… Now let’s unprotect the target process’s memory and
copy the code structure to the base address we calculated (nCodeBase):
    VirtualProtectEx(hProcess, (PBYTE)nCodeBase,
                     sizeof(Code), PAGE_EXECUTE_READWRITE,
                     &nProtect);
    WriteProcessMemory(hProcess, (PBYTE)nCodeBase,
                       &code, sizeof(Code), &nWritten);
12) One last thing: after writing code to memory at run time, one must
call the FlushInstructionCache() function so that the CPU discards any stale
cached instructions for that range:
    FlushInstructionCache(hProcess, (PBYTE)nCodeBase, sizeof(Code));
13) That’s it! Let’s resume the thread and let it load our DLL; thereafter it will
continue on its original course:
    ResumeThread(hThread);
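
The code-generation steps (6 to 9) can be illustrated without touching a real process. The following is a minimal, portable sketch and not the Detours implementation: the helper names (emit32, genPush, genBranch, buildInjectionStub) are hypothetical, addresses are modelled as plain 32-bit integers, and the opcodes used are the standard x86 encodings for PUSH imm32 (0x68), CALL rel32 (0xE8) and JMP rel32 (0xE9). It shows the point made in step 5: every relative displacement is computed against the final location of the code (nCodeBase), not against the local buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same layout as the structure in step 4.
struct Code {
    std::uint8_t rbCode[128];    // generated x86 instructions
    char         szLibFile[512]; // copy of the DLL name
};

// Write a 32-bit little-endian value; return the advanced pointer.
static std::uint8_t* emit32(std::uint8_t* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) *p++ = static_cast<std::uint8_t>(v >> (8 * i));
    return p;
}

// PUSH imm32.
static std::uint8_t* genPush(std::uint8_t* p, std::uint32_t value) {
    *p++ = 0x68;
    return emit32(p, value);
}

// CALL rel32 (0xE8) or JMP rel32 (0xE9). `src` is the address the instruction
// will have in the target process; the displacement is measured from the end
// of the 5-byte instruction.
static std::uint8_t* genBranch(std::uint8_t* p, std::uint8_t opcode,
                               std::uint32_t dst, std::uint32_t src) {
    *p++ = opcode;
    return emit32(p, dst - (src + 5));
}

// Build "push <name>; call LoadLibrary; jmp oldEip" locally, as if the block
// were already located at nCodeBase. Returns the number of code bytes written.
std::size_t buildInjectionStub(Code& code, std::uint32_t nCodeBase,
                               std::uint32_t loadLibraryAddr,
                               std::uint32_t oldEip, const char* dllName) {
    std::memset(&code, 0, sizeof(code));
    std::strncpy(code.szLibFile, dllName, sizeof(code.szLibFile) - 1);

    std::uint8_t* p = code.rbCode;
    p = genPush(p, nCodeBase + static_cast<std::uint32_t>(offsetof(Code, szLibFile)));
    p = genBranch(p, 0xE8, loadLibraryAddr,
                  nCodeBase + static_cast<std::uint32_t>(p - code.rbCode));
    p = genBranch(p, 0xE9, oldEip,
                  nCodeBase + static_cast<std::uint32_t>(p - code.rbCode));
    return static_cast<std::size_t>(p - code.rbCode);
}
```

With made-up addresses (code base 0x1000, LoadLibrary at 0x2000, old EIP 0x3000) the stub is 15 bytes long: one 5-byte instruction per step. In the real flow the filled structure would then be written to the target with WriteProcessMemory(), as in step 11.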


This illustration shows how the memory looks right before we resume the thread
(step 13):

Our Process                      New Spawned (Soon to be intercepted) Process

0xFF....F                        0xFF....F
                                                          <- Old ESP
                                 DLL-name (copy)      \
struct Code {                    JMP Old(EIP)          |  Copy of the
    ....                         CALL LoadLibrary      |  Code structure
} code;                          PUSH DLL-name        /   <- New EIP = Old(ESP) - sizeof(Code)
UINT32 nCodeBase                                          <- New ESP
CreateProcess() (Suspended)
0x00....0                        0x00....0
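
The addresses in the illustration all follow from the suspended thread's original ESP. As a small worked example, here is a portable sketch of that arithmetic; the InjectionLayout type, the computeLayout function and the sample ESP value are hypothetical and only mirror steps 5, 7 and 10.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same layout as the structure in step 4 (128 + 512 = 640 bytes).
struct Code {
    std::uint8_t rbCode[128];
    char         szLibFile[512];
};

// Addresses inside the target process, modelled as 32-bit integers.
struct InjectionLayout {
    std::uint32_t codeBase;    // nCodeBase: where the Code structure is copied
    std::uint32_t libNameAddr; // address of szLibFile inside that copy
    std::uint32_t newEip;      // first generated instruction
    std::uint32_t newEsp;      // just below the copied structure
};

InjectionLayout computeLayout(std::uint32_t oldEsp) {
    InjectionLayout l;
    l.codeBase    = oldEsp - static_cast<std::uint32_t>(sizeof(Code));
    l.libNameAddr = l.codeBase + static_cast<std::uint32_t>(offsetof(Code, szLibFile));
    l.newEip      = l.codeBase;
    l.newEsp      = l.codeBase - 4;
    return l;
}
```

For an assumed old ESP of 0x0012FFC0, the structure lands at 0x0012FD40, the DLL name at 0x0012FDC0, and the new ESP is 0x0012FD3C, so pushes performed by LoadLibrary() move away from the injected code.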

Detouring a function
Now, for the real thing. In this section we'll describe how the Detours mechanism
works and how it was incorporated into our project.

As we described above, the Detours mechanism is a Microsoft Research technology
that was created to allow "hooking" binary function calls at run time. The
application itself is not changed, nor do you need to recompile it (as
opposed to code coverage tools, for example).

In general, Detours accomplishes this by changing the application's
machine code after it has been loaded into memory, so that instead of running the real
function's code, execution jumps to the detouring code.
We say "jumps" because this is precisely how Detours does the trick: it
takes the first 5 bytes of the function you want to detour (assuming it has at least 5
bytes; this is the biggest restriction of the method) and copies them into a
"Trampoline function". In place of those 5 bytes it writes an unconditional jump
to the "Detour function" (see below).
The "Detour function" is a new block of code that contains
the user's interception code (anything he wants to do before the real function
runs), followed by a call to the "Trampoline
function".
As you might recall, the "Trampoline function" contains the 5 bytes that were taken
from the original function. At the end of the "Trampoline function", Detours appends
an unconditional jump to the rest of the original function's code.


And now for the unwinding: when the original function reaches its end it "returns" to
its caller, and that is… the "Detour function". Because the "Trampoline
function" entered the original function with an unconditional jump rather than a call,
the return address on the stack is still the one inside the "Detour function".
When the "Detour function" completes, it "returns" to… the function that called
the original function to begin with (and not to the original function itself, since, as you
recall, the original function was left through an unconditional jump to the "Detour
function", which pushes no return address).

That’s it!


Easy, huh? Well, we're well aware that the explanation above is a "bit" obscure. To
remedy this, here is a diagram that expresses the idea.
The diagram below is based on a diagram from a PowerPoint
presentation that is included in the Detours archive file, which can be downloaded from
the web.

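Before Detours:

    1. Call:    Calling function -> Called function ("Original Function")
    2. Return:  Called function ("Original Function") -> Calling function

After Detours:

    1. Call:    Calling function -> Called function ("Original Function")
    2. Jump:    Called function ("Original Function") -> Detour Function
    3. Call:    Detour Function -> Trampoline Function
    4. Jump:    Trampoline Function -> Original Function
    5. Return:  Original Function -> Detour Function
    6. Return:  Detour Function -> Calling function

Diagram: How Detours changes the original function's calling sequence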

Talk about "a picture is worth a thousand words…"


Now we'll show the code behind this magic.

The following diagram illustrates the change in the assembly code that occurs after
you apply the Detours mechanism to a function.
This diagram, too, is based on a slide from a PowerPoint presentation that
accompanies the Detours archive.
Before Detours:

Target:
    push ebp            [1 byte]
    mov ebp,esp         [2 bytes]
    push ebx            [1 byte]
    push esi            [1 byte]
    push edi
    ....

After Detours:

Target:
    jmp Detour          [5 bytes]
    push edi
    ....

Detour:
    ...Your code...
    call Trampoline
    ...More of your code...

Trampoline:
    push ebp
    mov ebp,esp
    push ebx
    push esi
    jmp Target+5

Diagram: The code beneath the Detours mechanism
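
The rewrite shown in the diagram can be reproduced on plain byte buffers. The sketch below is a portable simulation, not the Detours code: installDetour and writeJmp are hypothetical names, the "functions" are byte vectors with made-up addresses, and the caller is assumed to pass a prologue length that is at least 5 and ends on an instruction boundary (Detours determines this itself by decoding instructions).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t  kJmpSize = 5;    // size of JMP rel32
const std::uint8_t kOpJmp   = 0xE9; // opcode of JMP rel32

// Encode "JMP dst" into out[0..4], as if the instruction lived at address src.
static void writeJmp(std::uint8_t* out, std::uint32_t src, std::uint32_t dst) {
    const std::uint32_t rel = dst - (src + static_cast<std::uint32_t>(kJmpSize));
    out[0] = kOpJmp;
    for (int i = 0; i < 4; ++i) out[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
}

// Patch `target` (the bytes of the function at targetAddr) so that it jumps to
// detourAddr. Returns the trampoline bytes, assumed to live at trampolineAddr:
// the displaced prologue followed by a jump back to targetAddr + prologueLen.
// An empty result means the function could not be patched.
std::vector<std::uint8_t> installDetour(std::vector<std::uint8_t>& target,
                                        std::uint32_t targetAddr,
                                        std::uint32_t detourAddr,
                                        std::uint32_t trampolineAddr,
                                        std::size_t prologueLen) {
    std::vector<std::uint8_t> trampoline;
    if (prologueLen < kJmpSize || prologueLen > target.size()) return trampoline;

    const std::uint32_t len = static_cast<std::uint32_t>(prologueLen);
    // 1. Save the prologue in the trampoline.
    trampoline.assign(target.begin(),
                      target.begin() + static_cast<std::ptrdiff_t>(prologueLen));
    // 2. Append a jump from the trampoline back into the rest of the target.
    trampoline.resize(prologueLen + kJmpSize);
    writeJmp(&trampoline[prologueLen], trampolineAddr + len, targetAddr + len);
    // 3. Overwrite the start of the target with a jump to the detour.
    writeJmp(&target[0], targetAddr, detourAddr);
    return trampoline;
}
```

Applied to the prologue from the diagram (push ebp; mov ebp,esp; push ebx; push esi, 5 bytes in total), the target keeps its sixth byte (push edi) untouched and the trampoline becomes the 5 saved bytes plus a 5-byte jump to Target+5.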

Now that we understand how the mechanism works, we need to understand
how to create the "Detour functions" and how to connect them to the "Original
functions" we want to detour.
The Detours library comes with code that creates this connection: given a
function you want to detour and a "Detour function" that contains the code
you want to inject, it instruments the "Original function" as described above.
Instrumenting a function, which is done in the DLL that we
described in the section above, is divided into two steps:
1. Create a "Trampoline function" and store in it the address of the original
function. (Done at compile time.)
For this task Detours provides a C macro:
"DETOUR_TRAMPOLINE(<Trampoline function signature>, <Original
function's name>)". This macro generates the code of a Trampoline function and
stores the address of the Original function in it.
The macro can be found in the "Detours.h" file.
2. Connect the Trampoline with the Detour function and the Original function,
using the address of the Original function stored in the Trampoline. (Done at
run time.) This is done with the function:
"DetourFunctionWithTrampoline(<Trampoline function name>, <Detour
function name>)". The Trampoline function name is the name
declared in the macro in the first step, and the Detour function
name is the name of the function that you want to be called instead of the
Original function.


To sum it up, for every "Original function" that you want to detour, you invoke
the "DETOUR_TRAMPOLINE" macro with the original function and the signature of the
trampoline function (which should be the same as the signature of the Original
function). Then you add a call to the "DetourFunctionWithTrampoline" function,
which binds the Trampoline to your function, the Detour function, in which you
put the code that you want to run before the original function.
One important thing to remember: if you want the original function to run, the
Detour function you write must call the Trampoline function (which, as you
recall, holds the first few instructions of the original function and a jump to the
rest of it).
Calling the Trampoline function is not mandatory, however. If you don't call it, there
will be no run-time error; the Detour function simply returns
to the calling function without running any of the original function's code. This, in fact,
is a way to replace the original function with your own implementation.
Furthermore, you can add code after the call to the Trampoline function (in the
Detour function), and that code will run after the original implementation.
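
The three options just described (run code before the original, run code after it, or skip it altogether) can be illustrated with ordinary function pointers. This is only a sketch of the control flow under simplifying assumptions: the trampoline is a plain pointer to the original function rather than generated code, and all function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_trace; // records the order in which code runs

// Stand-in for the un-instrumented target function.
int OriginalFunction(int x) {
    g_trace.push_back("original");
    return x * 2;
}

// Stand-in for the trampoline. In Detours this is generated code that runs
// the displaced prologue and jumps into the target; here it is just a pointer.
int (*Trampoline)(int) = OriginalFunction;

// A detour that runs code before and after the original implementation.
int LoggingDetour(int x) {
    g_trace.push_back("before");
    int result = Trampoline(x); // the original implementation runs here
    g_trace.push_back("after");
    return result;
}

// A detour that never calls the trampoline, so the original code never runs.
int ReplacingDetour(int x) {
    g_trace.push_back("replaced");
    return -x;
}
```

Calling LoggingDetour(21) records "before", "original", "after" and returns the original result, while ReplacingDetour(21) records only "replaced" and returns its own value.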

Since we wrap all this code in a DLL, we want the instrumentation to happen in
the DllMain() function, when it is called with the "reason" parameter set to
"DLL_PROCESS_ATTACH". (To be specific, we want the calls to
"DetourFunctionWithTrampoline" to be made from DllMain().)
This ensures that when the injected LoadLibrary() call runs in the
instrumented process (as we saw in the first section), the calls to
"DetourFunctionWithTrampoline" run as soon as the process
resumes execution.

COM
Describing the COM technology is far beyond the scope of this document. Furthermore,
COM is used here only as a supporting tool; it is not the main technology of the
project.
Nevertheless, we will describe in general terms how this technology helped us in the
project.
COM is a way to share objects created in one language with code written in another language.
It wraps the object in a binary capsule that can be used from several languages.
COM is more complicated than this description suggests, and there is more to it
than what we stated.
The main reason we used COM objects is that the code implementing the Detours
functionality was written in C/C++, while the application we wrote was based on
Visual Basic for Applications.
Both C++ and VB handle COM objects, so this was a good way to run the needed
functionality from within the VB application's memory space rather than in separate
processes.
In the following sections we'll describe the two COM objects we created.

DllInjectionAppLoader
This COM object spawns a new process for the requested application and
injects into it a DLL that includes the function-instrumentation code (as described in the
Microsoft Research "Detours" Technology section).


This object implements the IDllInjectionAppLoader interface.
The IDllInjectionAppLoader interface declares the following methods and properties:
• HRESULT LoadApplication(
      [in] BSTR pszExePath,
      [in] BSTR pszDllPath);
  This method loads the application that resides at the pszExePath location and
  injects the Detours DLL located at pszDllPath into it.
• HRESULT KillApplication();
  This method terminates the loaded application (if one is loaded).
  The application should have been loaded using the LoadApplication() method.
• HRESULT IsAppLoaded([out, retval] VARIANT_BOOL* pVal);
  This property returns true if and only if an application was loaded using the
  LoadApplication() method and the application has not terminated (either by
  itself or through the KillApplication() method).

InterceptLogger
This COM object logs the Win32API functions that are called while the
application that was spawned with the injected DLL is running.
This object implements the IInterceptLogger interface.
The IInterceptLogger interface declares the following methods and properties:
• HRESULT StartLogging(
      [in] VARIANT_BOOL bBlocking,
      [in] BSTR ODBC_DSN);
  This method opens the OS pipe and logs every message from it to the DB.
  bBlocking should be true if and only if you wish the method to block. This is
  useful if you write a VB script that plays the role of the listener.
  ODBC_DSN is the name of the ODBC DSN that the logger should write to.
• HRESULT Shutdown();
  This method should be called if you want to close the connection to the
  database.
  After calling this method no logging messages will be inserted into the
  database until StartLogging() is called again.
• HRESULT IsConnected([out, retval] VARIANT_BOOL* pVal);
  This property returns true if and only if both of the following conditions hold:
  o StartLogging() was called previously and the connection was
    made successfully
  o Shutdown() was not called after StartLogging() succeeded
• HRESULT AddFunctionToFilter([in] BSTR FunctionName);
  This method adds the function name passed in the parameter
  FunctionName to a list of filtered functions.
• HRESULT SetFilterType([in] long FilterType);
  This method sets the way the InterceptLogger object filters the messages
  according to the function names that were added by the AddFunctionToFilter()
  method.
  FilterType can be any one of the following values:
  o 0 – No filtering is done; the functions in the filter list are ignored
  o 1 – Log if and only if the log message is of a function that exists in the
    filter list
  o 2 – Log if and only if the log message is of a function that doesn't exist
    in the filter list
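
The three filter types can be summarized in a few lines of portable C++. This is a hypothetical stand-in for the logger's filtering decision, not the COM object's actual code: the class name FilterSketch and the ShouldLog method are invented for illustration, and treating unknown FilterType values like 0 is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the filtering state kept by InterceptLogger.
class FilterSketch {
public:
    FilterSketch() : type_(0) {}

    void AddFunctionToFilter(const std::string& name) { names_.insert(name); }
    void SetFilterType(long type) { type_ = type; }

    // Decide whether a log message for `name` should be written.
    bool ShouldLog(const std::string& name) const {
        const bool listed = names_.count(name) != 0;
        switch (type_) {
            case 1:  return listed;   // log only functions in the list
            case 2:  return !listed;  // log only functions not in the list
            default: return true;     // 0: the list is ignored (assumed for other values too)
        }
    }

private:
    long type_;
    std::set<std::string> names_;
};
```

For example, after adding "CreateFileW" to the list, type 1 logs only CreateFileW calls, type 2 logs everything except CreateFileW, and type 0 logs both.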


Microsoft Access Data-Base


We used Microsoft Access as a repository for the function calls that were issued
during the instrumented application's lifetime.
Furthermore, we used MS Access to create an application that allows the user to
easily start an instrumented application and display the results.
Since an MS Access application can contain VB code, and since VB can use COM
objects, we were able to use the COM objects described in the section above to
achieve this goal.
The database was not the main priority of this project, so we used a simple database
structure that includes only one table.
One clear limitation imposed by this database design is that we could not save all
the arguments passed to every function; therefore we limited the
arguments that are logged to the first five.
Besides these fields, the table includes fields for the function's name, its return
address and a timestamp of when the function was called.
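
To make the single-table design concrete, here is a sketch of what one logged row might look like in memory. The field names and types are assumptions made for illustration (the actual table schema is not shown in this report); only the set of fields and the five-argument limit come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

const std::size_t kMaxLoggedArgs = 5; // only the first five arguments are kept

// Hypothetical in-memory image of one row of the log table.
struct CallRecord {
    std::string   functionName;
    std::uint32_t returnAddress;
    std::string   timestamp;
    std::uint32_t args[kMaxLoggedArgs]; // assumed: arguments stored as 32-bit values
    std::size_t   argCount;             // number of valid entries in args
};

// Build a record, dropping every argument after the fifth.
CallRecord makeRecord(const std::string& name, std::uint32_t returnAddress,
                      const std::string& timestamp,
                      const std::vector<std::uint32_t>& allArgs) {
    CallRecord r;
    r.functionName  = name;
    r.returnAddress = returnAddress;
    r.timestamp     = timestamp;
    r.argCount = allArgs.size() < kMaxLoggedArgs ? allArgs.size() : kMaxLoggedArgs;
    for (std::size_t i = 0; i < kMaxLoggedArgs; ++i)
        r.args[i] = i < r.argCount ? allArgs[i] : 0;
    return r;
}
```

A call such as CreateProcessW, which takes ten parameters, would therefore be stored with only its first five arguments.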


The Solution
Architecture
In this section we'll describe the pieces that make up our solution, the technology we
used for each of them and how we put all the pieces together.
The following diagram displays the big picture.

Components:
    Win32API Interceptor (MS Access Data Base)
    InterceptLogger (COM Object)
    DllInjectionAppLoader (COM Object)
    \\.\pipe\Win32APIInterceptor (OS Pipe)
    TraceAPI.DLL (Binary DLL with the Detours functions)
    Spawned process (with the TraceAPI.DLL injected)

Flow:
    1.  Create COM Object: Win32API Interceptor -> InterceptLogger
    2.  Create COM Object: Win32API Interceptor -> DllInjectionAppLoader
    3a. Spawn: DllInjectionAppLoader -> Spawned process
    3b. Inject: TraceAPI.DLL -> Spawned process
    4.  Post function call logging data: Spawned process -> \\.\pipe\Win32APIInterceptor
    5.  Extract logging data: \\.\pipe\Win32APIInterceptor -> InterceptLogger
    6.  Log function call data: InterceptLogger -> Win32API Interceptor

All the elements displayed in the diagram above have already been described
in this document.


Win32API Interceptor
This is an MS Access-based application that runs the show.
As soon as the database is opened, a form appears. Using this form the user can
choose which application to intercept and can change the filter that controls
which functions are logged.
All the logged data is displayed on the screen, and a graph displays the
most frequently called functions. (See the MS Access sub-section in the Technologies
section.)

InterceptLogger
This is a COM object that contains the code that extracts the logging data
from the \\.\pipe\Win32APIInterceptor OS pipe and inserts it into the database.
(See the COM sub-section in the Technologies section.)

DllInjectionAppLoader
This, too, is a COM object. It spawns the user-selected application, which is
instrumented with the TraceAPI.DLL detour functions. (See the COM sub-section in
the Technologies section.)

\\.\pipe\Win32APIInterceptor
This is the name of the Windows pipe to which TraceAPI.DLL's functions send logging
data and from which the InterceptLogger extracts it.
This is the way the spawned process communicates with the Win32API Interceptor
application.

TraceAPI.DLL
This DLL file includes the detour functions and the code that changes the Win32
API functions so that they first detour through our code, which sends logging data via
the pipe. (See the Microsoft Research "Detours" Technology section.)

Spawned Process
This is the process whose Win32API function calls the user wants to log.
It is spawned by the DllInjectionAppLoader COM object, which injects
TraceAPI.DLL into its memory space. Every call to a Win32API function
issued by this process is posted to \\.\pipe\Win32APIInterceptor.
Thereafter the message is extracted from the pipe by the InterceptLogger and
stored by it in the Win32API Interceptor database.


Code snippets

The DETOUR_TRAMPOLINE macro


This macro is used to declare the trampoline function and to store the original
function's address. (See the Detouring a function section.)
The following code was excerpted from the Detours.h file.

#define DETOUR_TRAMPOLINE(trampoline,target) \
static PVOID __fastcall _Detours_GetVA_##target(VOID) \
{ \
return &target; \
} \
\
__declspec(naked) trampoline \
{ \
__asm { nop };\
__asm { nop };\
__asm { call _Detours_GetVA_##target };\
__asm { jmp eax };\
__asm { ret };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
}

DetourGenMovEax function
This function writes, at the given pointer, the machine code for a
"MOV eax, nValue" instruction.
There are many other functions of this sort, so we present only this one as a teaser.
The following code was excerpted from the Detours.h file.

inline PBYTE DetourGenMovEax(PBYTE pbCode, UINT32 nValue)
{
    *pbCode++ = 0xB8;                 // opcode of MOV eax, imm32
    *((UINT32*&)pbCode)++ = nValue;   // 32-bit immediate operand
    return pbCode;                    // points just past the generated instruction
}

DetourFunctionWithTrampolineEx function
This function was used to change the original function so it'll jump to the detour
function, and to fill the trampoline (See the Detouring a function section)
The following code was excerpted from the Detours.cpp file.
BOOL WINAPI DetourFunctionWithTrampolineEx(PBYTE pbTrampoline,
                                           PBYTE pbDetour,
                                           PBYTE *ppbRealTrampoline,
                                           PBYTE *ppbRealTarget)
{
    PBYTE pbTarget = NULL;

    // Kfir: Get the address of the first line of code
    //       (perhaps a function has some header bytes that we
    //       should ignore?)
    pbTrampoline = DetourGetFinalCode(pbTrampoline, TRUE);
    pbDetour = DetourGetFinalCode(pbDetour, FALSE);

    // Kfir: Set the return values
    if (ppbRealTrampoline)
        *ppbRealTrampoline = pbTrampoline;
    if (ppbRealTarget)
        *ppbRealTarget = NULL;

    if (pbTrampoline == NULL || pbDetour == NULL)
        return FALSE;

    // Kfir: Check that the trampoline that was passed has the
    //       structure we expect. This trampoline should have been
    //       constructed using the "DETOUR_TRAMPOLINE" macro
    //       (located in the Detours.h file).
    if (pbTrampoline[0] != OP_NOP ||
        pbTrampoline[1] != OP_NOP ||
        pbTrampoline[2] != OP_CALL ||
        pbTrampoline[7] != OP_PREFIX ||
        pbTrampoline[8] != OP_JMP_EAX) {

        return FALSE;
    }

    PVOID (__fastcall * pfAddr)(VOID);

    // Kfir: Calculate MAGICALLY the location of the
    //       "original function".
    //       pbTrampoline is a pointer to the trampoline function
    //       code (see the DETOUR_TRAMPOLINE macro).
    //       The 4 bytes at pbTrampoline[3] hold the relative offset
    //       of the call to the precompiler-generated function named
    //       "_Detours_GetVA_##target" (see the DETOUR_TRAMPOLINE
    //       macro); adding that offset to the address of the next
    //       instruction gives the address of that function.
    pfAddr = (PVOID (__fastcall *)(VOID))(pbTrampoline +
                                          SIZE_OF_NOP + SIZE_OF_NOP +
                                          SIZE_OF_JMP +
                                          *(LONG *)&pbTrampoline[3]);

    pbTarget = DetourGetFinalCode((PBYTE)(*pfAddr)(), FALSE);

    if (ppbRealTarget)
        *ppbRealTarget = pbTarget;

    // Kfir: This function will copy the code from the original
    //       code to the trampoline and add the needed jmp opcodes
    //       in the trampoline and in the original function.
    //       pbTarget is the pointer to the ORIGINAL FUNCTION
    //       pbDetour is the pointer to the DETOUR FUNCTION (the
    //       code you want to run before the original one)
    //       pbTrampoline is the pointer to where we will dump the
    //       first instructions (at least 5 bytes) of the original
    //       code.
    return detour_insert_detour(pbTarget, pbTrampoline, pbDetour);
}
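
The address arithmetic used to find the "_Detours_GetVA_" function can be illustrated on a plain byte buffer. The following is a minimal sketch and not Detours code: the helper name CallTargetOffset and the constant names are assumptions made for illustration (0x90 and 0xE8 are the standard x86 encodings of NOP and CALL rel32). Buffer offsets stand in for addresses, and a little-endian host is assumed, as on x86. The excerpt adds SIZE_OF_JMP where the sketch adds the size of the call; both are 5 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative constants; the values are the standard x86 encodings.
constexpr std::uint8_t kOpNop  = 0x90;   // NOP
constexpr std::uint8_t kOpCall = 0xE8;   // CALL rel32
constexpr std::size_t  kSizeOfNop  = 1;
constexpr std::size_t  kSizeOfCall = 5;  // opcode + 4-byte displacement

// Hypothetical helper: given code that starts with NOP, NOP, CALL rel32,
// return the offset (relative to code[0]) that the CALL transfers to.
// Returns -1 if the prologue does not have the expected shape.
std::int64_t CallTargetOffset(const std::uint8_t* code, std::size_t size)
{
    assert(code != nullptr);
    if (size < 2 * kSizeOfNop + kSizeOfCall) return -1;
    if (code[0] != kOpNop || code[1] != kOpNop || code[2] != kOpCall) return -1;

    std::int32_t rel32 = 0;
    std::memcpy(&rel32, &code[3], sizeof(rel32));  // little-endian host assumed

    // A rel32 displacement is relative to the instruction after the CALL,
    // which starts 1 + 1 + 5 = 7 bytes into the buffer.
    return static_cast<std::int64_t>(2 * kSizeOfNop + kSizeOfCall) + rel32;
}
```

With the bytes 90 90 E8 10 00 00 00, the helper returns 7 + 0x10 = 23: the call continues 16 bytes past the instruction that follows it.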

detour_insert_detour function
This is an internal helper function. DetourFunctionWithTrampolineEx calls it in its
last statement (see the end of the previous excerpt).
The following code was excerpted from the Detours.cpp file.

// Kfir: this function will copy the code from the original code to
//       the trampoline and add the needed jmp opcodes in the
//       trampoline and in the original function.
//       pbTarget is the pointer to the ORIGINAL FUNCTION
//       pbDetour is the pointer to the DETOUR FUNCTION (the code you
//       want to run before the original one)
//       pbTrampoline is the pointer to where we will dump the first
//       5 bytes of the original code.
static BOOL detour_insert_detour(PBYTE pbTarget,
                                 PBYTE pbTrampoline,
                                 PBYTE pbDetour)
{
    PBYTE pbCont = pbTarget;
    // Kfir: First we want to *check* what kind of instructions exist
    //       at the beginning of the function. Generally we want to
    //       remove at least 5 bytes (SIZE_OF_TRP_OPS), but we don't
    //       want to break an instruction in the middle. (I don't know
    //       exactly how they drew the line, but some opcodes need to
    //       be glued together and some don't, so if the first one
    //       moves so will the rest.)
    for (LONG cbTarget = 0; cbTarget < SIZE_OF_TRP_OPS;) {
        PBYTE pbOp = pbCont;
        BYTE bOp = *pbOp;
        pbCont = DetourCopyInstruction(NULL, pbCont, NULL);
        cbTarget = pbCont - pbTarget;

        if (bOp == OP_JMP ||
            bOp == OP_JMP_EAX ||
            bOp == OP_RET_POP ||
            bOp == OP_RET) {

            break;
        }
        if (bOp == OP_PREFIX && pbOp[1] == OP_JMP_SEG) {
            break;
        }
        if ((bOp == OP_PRE_ES ||
             bOp == OP_PRE_CS ||
             bOp == OP_PRE_SS ||
             bOp == OP_PRE_DS ||
             bOp == OP_PRE_FS ||
             bOp == OP_PRE_GS) &&
            pbOp[1] == OP_PREFIX &&
            pbOp[2] == OP_JMP_SEG) {
            break;
        }
    } // Kfir: End of FOR!!!

    if (cbTarget < SIZE_OF_TRP_OPS) {
        // Too few instructions.
        return FALSE;
    }
    if (cbTarget > (DETOUR_TRAMPOLINE_SIZE - SIZE_OF_JMP - 1)) {
        // Too many instructions.
        return FALSE;
    }

    CDetourEnableWriteOnCodePage ewTrampoline(pbTrampoline,
                                              DETOUR_TRAMPOLINE_SIZE);
    CDetourEnableWriteOnCodePage ewTarget(pbTarget, cbTarget);
    if (!ewTrampoline.SetPermission(PAGE_EXECUTE_READWRITE))
        return FALSE;
    if (!ewTarget.IsValid())
        return FALSE;

    PBYTE pbSrc = pbTarget;
    PBYTE pbDst = pbTrampoline;
    // Kfir: Now really *move* the code we discovered in the
    //       for-loop above.
    for (LONG cbCopy = 0; cbCopy < cbTarget;) {
        pbSrc = DetourCopyInstruction(pbDst, pbSrc, NULL);
        cbCopy = pbSrc - pbTarget;
        pbDst = pbTrampoline + cbCopy;
    }
    if (cbCopy != cbTarget) // Count came out different!
        return FALSE;

    // Kfir: Add a jump in the trampoline to the rest of the
    //       original code (after copying the first 5 bytes we now
    //       add the jump).
    if (!detour_insert_jump(pbDst, pbTarget + cbTarget, SIZE_OF_JMP))
        return FALSE;

    pbTrampoline[DETOUR_TRAMPOLINE_SIZE-1] = (BYTE)cbTarget;

    // Kfir: Add a jump in the original function code to the DETOUR
    //       FUNCTION (after backing up the first 5 bytes of the
    //       original code in the trampoline we can now overwrite
    //       them with a jump to the "injected code").
    if (!detour_insert_jump(pbTarget, pbDetour, cbTarget))
        return FALSE;

    return TRUE;
}
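
The patching order described in the comments above can be sketched on ordinary byte arrays. This is a simplified illustration under stated assumptions, not the Detours implementation: BytesToRelocate, WriteJump and InsertDetourSketch are hypothetical names, the "addresses" are plain numbers supplied by the caller, the instruction lengths are given rather than disassembled, and padding the leftover bytes with INT 3 (0xCC) is a choice made for this sketch. 0xE9 is the standard x86 encoding of JMP rel32, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint8_t kOpJmp = 0xE9;    // JMP rel32 (standard x86 encoding)
constexpr std::uint8_t kOpBrk = 0xCC;    // INT 3, used here only as padding
constexpr std::size_t  kSizeOfJmp = 5;   // opcode + 4-byte displacement

// Hypothetical stand-in for the disassembler loop: given the lengths of the
// first instructions of the target, return how many bytes of whole
// instructions must be moved to free at least kSizeOfJmp bytes,
// or 0 if the listed instructions are too short in total.
std::size_t BytesToRelocate(const std::size_t* lengths, std::size_t count)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < count && total < kSizeOfJmp; ++i)
        total += lengths[i];
    return total >= kSizeOfJmp ? total : 0;
}

// Write "JMP rel32" at `at`, pretending that `at` lives at address `atAddr`
// and must transfer control to `destAddr`.
void WriteJump(std::uint8_t* at, std::uint32_t atAddr, std::uint32_t destAddr)
{
    // The displacement is measured from the end of the 5-byte instruction.
    const std::uint32_t rel32 =
        destAddr - (atAddr + static_cast<std::uint32_t>(kSizeOfJmp));
    at[0] = kOpJmp;
    std::memcpy(&at[1], &rel32, sizeof(rel32));  // little-endian host assumed
}

// Save the first cbTarget bytes of the target in the trampoline, append a
// jump back to the rest of the target, then overwrite the start of the
// target with a jump to the detour. cbTarget must end on an instruction
// boundary (see BytesToRelocate).
bool InsertDetourSketch(std::uint8_t* target, std::uint32_t targetAddr,
                        std::uint8_t* trampoline, std::size_t trampolineSize,
                        std::uint32_t trampolineAddr,
                        std::uint32_t detourAddr, std::size_t cbTarget)
{
    assert(target != nullptr && trampoline != nullptr);
    if (cbTarget < kSizeOfJmp) return false;                   // too few bytes
    if (cbTarget + kSizeOfJmp > trampolineSize) return false;  // does not fit

    const std::uint32_t moved = static_cast<std::uint32_t>(cbTarget);
    std::memcpy(trampoline, target, cbTarget);
    WriteJump(trampoline + cbTarget, trampolineAddr + moved, targetAddr + moved);

    WriteJump(target, targetAddr, detourAddr);
    for (std::size_t i = kSizeOfJmp; i < cbTarget; ++i)
        target[i] = kOpBrk;  // bytes of relocated instructions left over
    return true;
}
```

For a target that begins with push ebp; mov ebp,esp; push ebx; push esi (lengths 1, 2, 1, 1), BytesToRelocate returns 5, so exactly those four instructions move to the trampoline and the fifth instruction stays in place at target + 5.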


Creating a trampoline function for the "CreateProcessW" function


This is a call to the DETOUR_TRAMPOLINE C macro, which creates a trampoline function for
the CreateProcessW Win32API function.
Note how the signature of the trampoline function is the same as that of the original
CreateProcessW function.
The following code was excerpted from the _win32.cpp file.

DETOUR_TRAMPOLINE(BOOL __stdcall Real_CreateProcessW(LPCWSTR a0,
                                                     LPWSTR a1,
                                                     LPSECURITY_ATTRIBUTES a2,
                                                     LPSECURITY_ATTRIBUTES a3,
                                                     BOOL a4,
                                                     DWORD a5,
                                                     LPVOID a6,
                                                     LPCWSTR a7,
                                                     struct _STARTUPINFOW* a8,
                                                     LPPROCESS_INFORMATION a9),
                  CreateProcessW);

Instrumenting the "CreateProcessW" function


This call to the DetourFunctionWithTrampoline function was taken from a function
that is run in the DllMain of TraceAPI.dll.
The call instruments the original CreateProcessW using the detour function
(Mine_CreateProcessW) and the trampoline function (Real_CreateProcessW).
The following code was excerpted from the _win32.cpp file.

DetourFunctionWithTrampoline((PBYTE)Real_CreateProcessW,
                             (PBYTE)Mine_CreateProcessW);

The "CreateProcessW" Detour function


This is the detour function that will be run instead of the original "CreateProcessW"
function.
Notice that it calls the trampoline function.
The following code was excerpted from the _win32.cpp file.

BOOL __stdcall Mine_CreateProcessW(LPCWSTR a0,
                                   LPWSTR a1,
                                   LPSECURITY_ATTRIBUTES a2,
                                   LPSECURITY_ATTRIBUTES a3,
                                   BOOL a4,
                                   DWORD a5,
                                   LPVOID a6,
                                   LPCWSTR a7,
                                   struct _STARTUPINFOW* a8,
                                   LPPROCESS_INFORMATION a9)
{
    // Kfir: Log the fact that this function was called and its
    //       parameters' values
    DWORD nMsgID =
        _PrintEnter(-1,
            "CreateProcessW(%ls,%ls,%lx,%lx,%lx,%lx,%lx,%ls,%lx,%lx)\n",
            a0, a1, a2, a3, a4, a5, a6, a7, a8, a9);

    BOOL rv = 0;
    __try {
        // Kfir: Here we call the trampoline function, which will
        //       eventually jump to the rest of the original code
        rv = Real_CreateProcessW(a0, a1, a2, a3, a4,
                                 a5, a6, a7, a8, a9);
    } __finally {
        // Kfir: Log the return value
        _PrintExit(nMsgID, "CreateProcessW(,,,,,,,,,)>%lx\n", rv);
    };
    return rv;
}
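
The enter/call/exit pattern of the detour function above can be approximated in portable C++. The sketch below is only an analogy under stated assumptions: Real_Add stands in for the trampoline, Mine_Add for the detour, g_log for the _PrintEnter/_PrintExit logging, and a destructor plays the role of the __finally block (structured exception handling and C++ destructors are different mechanisms, so this is not an exact equivalent). All of these names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory log standing in for _PrintEnter/_PrintExit.
std::vector<std::string> g_log;

// Stand-in for the trampoline: the uninstrumented operation.
int Real_Add(int a, int b) { return a + b; }

// RAII guard: its destructor runs on normal return and during stack
// unwinding, so the exit record is written in both cases.
class ExitLogger {
public:
    ExitLogger(const char* name, const int& result)
        : name_(name), result_(result) {}
    ~ExitLogger()
    {
        g_log.push_back(std::string(name_) + " -> " + std::to_string(result_));
    }
private:
    const char* name_;
    const int& result_;  // refers to the caller's result variable
};

// Detour-style wrapper: log the call, forward to the real function,
// and let the guard log the return value.
int Mine_Add(int a, int b)
{
    g_log.push_back("Add(" + std::to_string(a) + "," + std::to_string(b) + ")");
    int rv = 0;                   // declared before the guard, so it outlives it
    ExitLogger guard("Add", rv);
    rv = Real_Add(a, b);
    return rv;
}
```

Calling Mine_Add(2, 3) returns 5 and leaves two records in g_log: "Add(2,3)" followed by "Add -> 5".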

InjectLibrary function
This function injects a LoadLibrary() function call into a process.
It is used to inject the TraceAPI.dll file into the process whose Win32API calls the
user asked to intercept. (See the Loading a DLL into a process’s context section for
more details.)
The following code was excerpted from the creatwth.cpp file.

static BOOL InjectLibrary(HANDLE hProcess,
                          HANDLE hThread,
                          PBYTE pfLoadLibrary,
                          PBYTE pbData,
                          DWORD cbData)
{
    BOOL fSucceeded = FALSE;
    DWORD nProtect = 0;
    DWORD nWritten = 0;
    CONTEXT cxt;
    UINT32 nCodeBase;
    PBYTE pbCode;

    struct Code
    {
        BYTE rbCode[128];
        union
        {
            WCHAR wzLibFile[512];
            CHAR szLibFile[512];
        };
    } code;

    //Kfir: Suspend the thread so we can change its context & stack
    //Kfir: hThread is the main running thread
    SuspendThread(hThread);

    ZeroMemory(&cxt, sizeof(cxt));
    cxt.ContextFlags = CONTEXT_FULL;

    //Kfir: Get the thread's context
    if (!GetThreadContext(hThread, &cxt)) {
        goto finish;
    }

    //Kfir: Calculate where the new code should be inserted
    nCodeBase = (cxt.Esp - sizeof(code))
                & ~0x1fu; // Cache-line align.

    pbCode = code.rbCode;

    if (pbData) {
        CopyMemory(code.szLibFile, pbData, cbData);
        //Kfir: "DetourGenPush" probably emits a "push" opcode
        //      followed by the address of the DLL name
        pbCode = DetourGenPush(pbCode, nCodeBase +
                               offsetof(Code, szLibFile));
        //Kfir: Probably emits a "call" opcode to the 'LoadLibrary'
        //      function located in the kernel DLL (uses the internal
        //      GetLoadLibraryA())
        pbCode = DetourGenCall(pbCode, pfLoadLibrary,
                               (PBYTE)nCodeBase + (pbCode -
                                                   code.rbCode));
    }

    //Kfir: Probably emits opcodes that restore the current
    //      registers' values
    pbCode = DetourGenMovEax(pbCode, cxt.Eax);
    pbCode = DetourGenMovEbx(pbCode, cxt.Ebx);
    pbCode = DetourGenMovEcx(pbCode, cxt.Ecx);
    pbCode = DetourGenMovEdx(pbCode, cxt.Edx);
    pbCode = DetourGenMovEsi(pbCode, cxt.Esi);
    pbCode = DetourGenMovEdi(pbCode, cxt.Edi);
    pbCode = DetourGenMovEbp(pbCode, cxt.Ebp);
    pbCode = DetourGenMovEsp(pbCode, cxt.Esp);
    //Kfir: Continue running what we suspended
    pbCode = DetourGenJmp(pbCode, (PBYTE)cxt.Eip,
                          (PBYTE)nCodeBase + (pbCode - code.rbCode));

    //Kfir: Update the next instruction and the stack pointer
    cxt.Esp = nCodeBase - 4;
    cxt.Eip = nCodeBase;

    //Kfir: Add write permissions to the process's page where
    //      the code should be injected (nCodeBase)
    if (!VirtualProtectEx(hProcess, (PBYTE)nCodeBase, sizeof(Code),
                          PAGE_EXECUTE_READWRITE, &nProtect)) {
        goto finish;
    }

    //Kfir: Write the injected code formed above
    if (!WriteProcessMemory(hProcess, (PBYTE)nCodeBase,
                            &code, sizeof(Code), &nWritten)) {
        goto finish;
    }

    //Kfir: After writing new instructions to memory, this
    //      function must be called (see MSDN)
    if (!FlushInstructionCache(hProcess, (PBYTE)nCodeBase,
                               sizeof(Code))) {
        goto finish;
    }

    //Kfir: Update the thread's context
    if (!SetThreadContext(hThread, &cxt)) {
        goto finish;
    }

    fSucceeded = TRUE;

finish:
    //Kfir: Resume the thread
    //Kfir: This will cause the LoadLibrary function to run and
    //      load the wanted DLL,
    //Kfir: restore the registers,
    //Kfir: and finally jump back to where we suspended the thread
    //      in the first place
    ResumeThread(hThread);
    return fSucceeded;
}
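
The nCodeBase expression in InjectLibrary reserves room below the current stack pointer and rounds the result down to a 32-byte boundary. The following minimal sketch reproduces only that arithmetic; the helper name CodeBaseBelowStack is an assumption made for illustration, and the sample stack pointer value is arbitrary. With a 2-byte WCHAR, the Code structure above would occupy 128 + 1024 = 1152 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the expression used for nCodeBase:
// reserve `blockSize` bytes below the stack pointer, then clear the low
// five bits so that the result is a multiple of 32.
std::uint32_t CodeBaseBelowStack(std::uint32_t esp, std::uint32_t blockSize)
{
    assert(blockSize <= esp);
    return (esp - blockSize) & ~0x1Fu;
}
```

For example, with esp = 0x0012FFC4 and a 1152-byte block, the helper returns 0x0012FB40. Rounding down rather than up keeps the whole block below the original stack pointer, so the injected bytes never overlap stack data that the suspended thread is still using.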


The Application Guide


Win32APIInterceptor installation

System Requirements
1. x86 version of Windows NT/Windows 2000/ Windows XP
2. Microsoft Visual C++ .Net 2003
3. Microsoft Office Access 2003
4. Administrator permissions

Adding an ODBC DSN for the project


An ODBC data source must be defined prior to the installation.
To do that, go to the Start button and select Control Panel (as shown in
illustration 4.1).

illustration 4.1 illustration 4.2

Now open the ODBC Data Source Administrator and select the User DSN section (illustration
4.2). Then press the Add button, select Driver do Microsoft Access (*.mdb) and press the
Finish button. The ODBC Microsoft Access Setup screen now appears. Type
Win32APIInterceptor in the Data Source Name field, press the Select button and select the
Win32APIInterceptor.mdb file from the Win32APIInterceptor directory. Press OK.

Once the ODBC definition is complete, you must compile the source files. These files can be
found on the project’s website.

Compilation
To build the application, open the Win32APIInterceptor -> Win32APIIntercept
directory and open the Win32APIIntercept.sln file with Microsoft Visual C++ .NET.

illustration 4.3

The next step is to build the solution. Select the Rebuild Solution option from the
Build menu, as shown in the previous illustration, and wait until the build process
completes.

Win32APIInterceptor is now ready for use.


GUI guide
Main Screen
To start Win32APIInterceptor, open the Win32APIInterceptor.mdb file. The main
window appears, as follows:

illustration 4.4

Intercepted Executable section


In the Executable location field, the user specifies the application to intercept.
One can simply type in the path of the desired executable, or press the … button to
browse the disk.
Pressing the Start Interception button starts the interception process, and
pressing the Stop Interception button halts it.

Illustration 4.5


Log section
All intercepted API calls for a certain application are listed in this section. There is a
separate record for each API call, which consists of the function name, call time,
return value and the first five arguments (at most).
While intercepting, all these details are saved in a database. To display them
on the screen, press the Refresh button.
There is a record counter at the bottom of the section.

illustration 4.6

Tools section
The user can change some interception properties:

• To clear the database, use the Clear Database button. After the database is
cleared, the Log list is refreshed and all the data is removed from it.

• There is an option to refresh the Log list automatically. To do that,
check the AutoRefresh checkbox and then enter the timeout in seconds.

• In the Filters section the user can manage the Log list. One can select the
functions to display (or those to screen out). The default condition is
Unfiltered, which means that all API functions will be displayed.
To display only certain functions, select the Wanted option and add all of
the desired functions to the list below. To block some functions from being
displayed, select the Unwanted option and add those functions to the list in
the same way. The functions list can be managed according to the user’s needs
(add/remove functions, clear the list). All changes in the Tools section can
be performed dynamically, without stopping the interception process.

illustration 4.7
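
The three filter conditions described in the Tools section amount to a small decision rule. The report does not show how the application implements it, so the following is only a sketch under assumptions: FilterMode and ShouldDisplay are hypothetical names, and the function list is modelled as a set of function names.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical names; they only mirror the options described in the text.
enum class FilterMode { Unfiltered, Wanted, Unwanted };

// Decide whether a logged call should be displayed in the Log list.
bool ShouldDisplay(FilterMode mode,
                   const std::set<std::string>& functions,
                   const std::string& name)
{
    const bool listed = functions.count(name) != 0;
    switch (mode) {
    case FilterMode::Wanted:     return listed;   // show only listed functions
    case FilterMode::Unwanted:   return !listed;  // hide listed functions
    case FilterMode::Unfiltered: break;
    }
    return true;                                  // show everything
}
```

With GetLocaleInfoW as the only entry in the list, the Unwanted mode hides GetLocaleInfoW and shows every other function, while the Wanted mode does the opposite.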


Top Functions section


This chart allows the user to see the most frequently called functions for a certain
application.

illustration 4.8
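
The ranking behind such a chart can be sketched as a frequency count over the logged function names. This is an illustration only; the application presumably derives the chart from its database, and the helper name TopFunctions, its parameters and the alphabetical tie-break are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using FunctionCount = std::pair<std::string, std::size_t>;

// Hypothetical helper: count how often each function name occurs in the
// log and return the `limit` most frequent names, most frequent first.
// Ties are broken alphabetically so that the result is deterministic.
std::vector<FunctionCount>
TopFunctions(const std::vector<std::string>& loggedCalls, std::size_t limit)
{
    std::map<std::string, std::size_t> counts;
    for (const std::string& name : loggedCalls)
        ++counts[name];

    std::vector<FunctionCount> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const FunctionCount& a, const FunctionCount& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    if (ranked.size() > limit)
        ranked.resize(limit);
    return ranked;
}
```

Given a log containing GetLocaleInfoW three times, CreateFileW twice and CloseHandle once, TopFunctions(log, 2) yields GetLocaleInfoW (3) followed by CreateFileW (2).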


Hands-on example
Let’s examine the API calls that occur while running the command line interpreter
(cmd.exe, in our case).

1. Open Win32APIInterceptor and press the Clear Database button for your own
convenience. (This will delete the saved information from the database and
remove all the records from the Log section.)
2. Type cmd in the Executable location field, and then press the Start
Interception button.
3. As you can see, nothing happens and the Log section remains empty. The
reason is that we didn’t press the Refresh button: the information about
API calls is stored in the database, but is not shown on the screen. Press the
Refresh button and the functions’ details will appear. Wait a few seconds until
all API calls have occurred. (You can press the Refresh button at the end of the
interception to ensure that all functions are displayed.)
4. Press the Stop Interception button.
5. The counter at the bottom of the Log section shows that 282 API calls have
occurred. We can see that the most frequently called function was
GetLocaleInfoW (138 times, according to the chart).
6. Clear the function list by pressing the Clear button in the Filters section. Add
the GetLocaleInfoW function to the list and select the Unwanted option.
7. This will cause the interceptor not to display the GetLocaleInfoW function in
the Log section.
8. Repeat steps 1-3.
9. The GetLocaleInfoW function doesn’t appear in the Log section, and the function
counter equals 144. As you can see, 144 + 138 = 282.
10. Note that instead of using the Refresh button, you can use the AutoRefresh
option.


Known Issues
• Compiling the project depends on the MS Visual Studio 2003 (.Net) IDE; there
is no full makefile that builds the whole project. There is a makefile that
builds everything except the COM objects.


Appendices
Microsoft Research “Detours”

Abstract
Innovative systems research hinges on the ability to easily instrument and extend existing
operating system and application functionality. With access to appropriate source code, it
is often trivial to insert new instrumentation or extensions by rebuilding the OS or
application. However, in today’s world of commercial software, researchers seldom have
access to all relevant source code.
We present Detours, a library for instrumenting arbitrary Win32 functions on x86
machines. Detours intercepts Win32 functions by re-writing target function images. The
Detours package also contains utilities to attach arbitrary DLLs and data segments (called
payloads) to any Win32 binary.
While prior researchers have used binary rewriting to insert debugging and profiling
instrumentation, to our knowledge, Detours is the first package on any platform to
logically preserve the un-instrumented target function (callable through a trampoline) as
a subroutine for use by the instrumentation. Our unique trampoline design is crucial for
extending existing binary software.
We describe our experiences using Detours to create an automatic distributed
partitioning system, to instrument and analyze the DCOM protocol stack, and to create a
thunking layer for a COM-based OS API. Micro-benchmarks demonstrate the efficiency of the
Detours library.

Introduction
Innovative systems research hinges on the ability to easily instrument and extend existing
operating system and application functionality whether in an application, a library, or the
operating system DLLs. Typical reasons to intercept functions are to add functionality,
modify returned results, or insert instrumentation for debugging or profiling. With access
to appropriate source code, it is often trivial to insert new instrumentation or extensions
by rebuilding the OS or application. However, in today’s world of commercial development
and binary-only releases, researchers seldom have access to all relevant source code.
Detours is a library for intercepting arbitrary Win32 binary functions on x86 machines.
Interception code is applied dynamically at runtime. Detours replaces the first few
instructions of the target function with an unconditional jump to the user-provided detour
function. Instructions from the target function are preserved in a trampoline function. The
trampoline function consists of the instructions removed from the target function and an
unconditional branch to the remainder of the target function. The detour function can
either replace the target function or extend its semantics by invoking the target function
as a subroutine through the trampoline.
Detours are inserted at execution time. The code of the target function is modified in
memory, not on disk, thus facilitating interception of binary functions at a very fine
granularity. For example, the procedures in a DLL can be detoured in one execution of an
application, while the original procedures are not detoured in another execution running at
the same time. Unlike DLL re-linking or static redirection, the interception techniques
used in the Detours library are guaranteed to work regardless of the method used by
application or system code to locate the target function.
While others have used binary rewriting for debugging and to inline instrumentation,
Detours is a general-purpose package. To our knowledge, Detours is the first package on any
platform to logically preserve the un-instrumented target function as a subroutine callable
through the trampoline. Prior systems logically prepended the instrumentation to the
target, but did not make the original target’s functionality available as a general
subroutine. Our unique trampoline design is crucial for extending existing binary software.


In addition to basic detour functionality, Detours also includes functions to edit the DLL
import table of any binary, to attach arbitrary data segments to existing binaries, and to
inject a DLL into either a new or an existing process. Once injected into a process, the
instrumentation DLL can detour any Win32 function, whether in the application or the system
libraries.
The following section describes how Detours works. Section 0 outlines the usage of the
Detours library. Section 0 describes alternative function-interception techniques and
presents a micro-benchmark evaluation of Detours. Section 0 details the usage of Detours to
produce distributed applications from local applications, to quantify DCOM overheads, to
create a thunking layer for a new COM-based Win32 API, and to implement first chance
exception handling. We compare Detours with related work in Section 0 and summarize our
contributions in Section 0.

Implementation
Detours provides three important sets of functionality: the ability to intercept arbitrary
Win32 binary functions on x86 machines, the ability to edit the import tables of binary
files, and the ability to attach arbitrary data segments to binary files. We will describe
the implementation of each of these functionalities.

Interception of Binary Functions
The Detours library facilitates the interception of function calls. Interception code is
applied dynamically at runtime. Detours replaces the first few instructions of the target
function with an unconditional jump to the user-provided detour function. Instructions from
the target function are preserved in a trampoline function. The trampoline consists of the
instructions removed from the target function and an unconditional branch to the remainder
of the target function.
When execution reaches the target function, control jumps directly to the user-supplied
detour function. The detour function performs whatever interception preprocessing is
appropriate. The detour function can return control to the source function or it can call
the trampoline function, which invokes the target function without interception. When the
target function completes, it returns control to the detour function. The detour function
performs appropriate postprocessing and returns control to the source function. Figure 1
shows the logical flow of control for function invocation with and without interception.

Invocation without interception:
    Source Function --1--> Target Function
    Source Function <--2-- Target Function

Invocation with interception:
    Source Function --1--> Detour Function --2--> Trampoline Function --3--> Target Function
    Source Function <--5-- Detour Function <--4-- Target Function

Figure 1. Invocation with and without interception.

The Detours library intercepts target functions by rewriting their in-process binary
image. For each target function, Detours actually rewrites two functions: the target
function and the matching trampoline function. The trampoline function can be allocated
either dynamically or statically. A statically allocated trampoline always invokes the
target function without the detour. Prior to insertion of a detour, the static trampoline
contains a single jump to the target. After insertion, the trampoline contains the initial
instructions from the target function and a jump to the remainder of the target function.
Statically allocated trampolines are extremely useful for instrumentation programmers.
For example, in Coign [7], invoking the Coign_CoCreateInstance trampoline is equivalent to
invoking the original CoCreateInstance function without instrumentation. Coign internal
functions can call Coign_CoCreateInstance at any time to create a new component instance
without concern for whether or not the original function has been rerouted with a detour.


Before insertion of the detour:

    ;; Target Function
    …
    TargetFunction:
        push ebp
        mov ebp,esp
        push ebx
        push esi
        push edi
        …
    ;; Trampoline
    …
    TrampolineFunction:
        jmp TargetFunction
        …

After insertion of the detour:

    ;; Target Function
    …
    TargetFunction:
        jmp DetourFunction
    TargetFunction+5:
        push edi
        …
    ;; Trampoline
    …
    TrampolineFunction:
        push ebp
        mov ebp,esp
        push ebx
        push esi
        jmp TargetFunction+5
        …

Figure 2. Trampoline and target functions, before and after insertion of the detour (left
and right).

Figure 2 shows the insertion of a detour. To detour a target function, Detours first
allocates memory for the dynamic trampoline function (if no static trampoline is provided)
and then enables write access to both the target and the trampoline. Starting with the
first instruction, Detours copies instructions from the target to the trampoline until at
least 5 bytes have been copied (enough for an unconditional jump instruction). If the
target function is fewer than 5 bytes, Detours aborts and returns an error code. To copy
instructions, Detours uses a simple table-driven disassembler. Detours adds a jump
instruction from the end of the trampoline to the first non-copied instruction of the
target function. Detours writes an unconditional jump instruction to the detour function as
the first instruction of the target function. To finish, Detours restores the original page
permissions on both the target and trampoline functions and flushes the CPU instruction
cache with a call to FlushInstructionCache.

Payloads and DLL Import Editing
While a number of tools exist for editing binary files [10, 12, 13, 17], most systems
research doesn’t require such heavy-handed access to binary files. Instead, it is often
sufficient to add an extra DLL or data segment to an application or system binary file. In
addition to detour functions, the Detours library also contains fully reversible support
for attaching arbitrary data segments, called payloads, to Win32 binary files and for
editing DLL import tables.
Figure 3 shows the basic structure of a Win32 Portable Executable (PE) binary file. The
PE format for Win32 binaries is an extension of COFF (the Common Object File Format). A
Win32 binary consists of a DOS compatible header, a PE header, a text section containing
program code, a data section containing initialized data, an import table listing any
imported DLLs and functions, an export table listing functions exported by the code, and
debug symbols. With the exception of the two headers, each of the other sections of the
file is optional and may not exist in a given binary.

    Start of File
        DOS Header
        PE (w/COFF) Header
        .text Section: Program Code
        .data Section: Initialized Data
        .idata Section: Import Table
        .edata Section: Export Table
        Debug Symbols
    End of File

Figure 3. Format of a Win32 PE binary file.

To modify a Win32 binary, Detours creates a new .detours section between the export table
and the debug symbols. Note that debug symbols must always reside last in a Win32 binary.
The new section contains a detours header record and a copy of the original PE header. If
modifying the import table, Detours creates the new import table, appends it to the copied
PE header, then modifies the original PE header to point to the new import table. Finally,
Detours writes any user payloads at the end of the .detours section and appends the debug
symbols to finish the file. Detours can reverse modifications to the Win32 binary by
restoring the original PE header from the .detours section and removing the .detours
section. Figure 4 shows the format of a Detours-modified Win32 binary.
Creating a new import table serves two purposes. First, it preserves the original import
table in case the programmer needs to reverse all modifications to the Win32 file.


Second, the new import table can contain renamed import DLLs and functions or entirely new
DLLs and functions. For example, Coign [7] uses Detours to insert an initial entry for
coignrte.dll into each instrumented application. As the first entry in the application’s
import table, coignrte.dll always is the first DLL to run in the application’s address
space.

    Start of File
        DOS Header
        PE (w/COFF) Header
        .text Section: Program Code
        .data Section: Initialized Data
        .idata Section: unused Import Table
        .edata Section: Export Table
        .detours Section: detour header, original PE header, new import table, user payloads
        Debug Symbols
    End of File

Figure 4. Format of a Detours-modified binary file.

Detours provides functions for editing import tables, adding payloads, enumerating
payloads, removing payloads, and rebinding binary files. Detours also provides routines for
enumerating the binary files mapped into an address space and locating payloads within
those mapped binaries. Each payload is identified by a 128-bit globally unique identifier
(GUID). Coign uses Detours to attach per-application configuration data to application
binaries.
In cases where instrumentation need be inserted into an application without modifying
binary files, Detours provides functions to inject a DLL into either a new or an existing
process. To inject a DLL, Detours writes a LoadLibrary call into the target process with
the VirtualAllocEx and WriteProcessMemory APIs then invokes the call with the
CreateRemoteThread API.

Using Detours
The code fragment in Figure 5 illustrates the usage of the Detours library. User code must
include the detours.h header file and link with the detours.lib library.

    #include <windows.h>
    #include <detours.h>

    VOID (*DynamicTrampoline)(VOID) = NULL;

    DETOUR_TRAMPOLINE(
        VOID WINAPI SleepTrampoline(DWORD),
        Sleep
    );

    VOID WINAPI SleepDetour(DWORD dw)
    {
        return SleepTrampoline(dw);
    }

    VOID DynamicDetour(VOID)
    {
        return DynamicTrampoline();
    }

    void main(void)
    {
        VOID (*DynamicTarget)(VOID) = SomeFunction;

        DynamicTrampoline
            =(FUNCPTR)DetourFunction(
                (PBYTE)DynamicTarget,
                (PBYTE)DynamicDetour);

        DetourFunctionWithTrampoline(
            (PBYTE)SleepTrampoline,
            (PBYTE)SleepDetour);

        // Execute the remainder of program.

        DetourRemoveTrampoline(SleepTrampoline);
        DetourRemoveTrampoline(DynamicTrampoline);
    }

Figure 5. Sample Instrumentation Program.

Trampolines may be created either statically or dynamically. To intercept a target
function with a static trampoline, the application must create the trampoline with the
DETOUR_TRAMPOLINE macro. DETOUR_TRAMPOLINE takes two arguments: the prototype for the
static trampoline and the name of the target function.
Note that for proper interception the prototype, target, trampoline, and detour functions
must all have exactly the same call signature including number of arguments and calling
convention. It is the responsibility of the detour function to copy arguments when invoking
the target function through the trampoline. This is intuitive as the target function is
just a subroutine callable by the detour function.
Using the same calling convention ensures that registers will be properly preserved and
that the stack will be properly aligned between detour and target functions.


Interception of the target function is enabled by invoking the
DetourFunctionWithTrampoline function with two arguments: the trampoline and the pointer to
the detour function. The target function is not given as an argument because it is already
encoded in the trampoline.
A dynamic trampoline is created by calling DetourFunction with two arguments: a pointer
to the target function and a pointer to the detour function. DetourFunction allocates a new
trampoline and inserts the appropriate interception code in the target function.
Static trampolines are extremely easy to use when the target function is available as a
link symbol. When the target function is not available for linking, a dynamic trampoline
can be used. Often a function pointer to the target function can be acquired from a second
function. For those times, when a pointer to the target function is not readily available,
DetourFindFunction can find the pointer to a function when it is either exported from a
known DLL, or if debugging symbols are available for the target function’s binary (see
footnote 1). DetourFindFunction accepts two arguments, the name of the binary and the name
of the function. DetourFindFunction returns either a valid pointer to the function or NULL
if the symbol for the function could not be found. DetourFindFunction first attempts to
locate the function using the Win32 LoadLibrary and GetProcAddress APIs. If the function is
not found in the export table of the DLL, DetourFindFunction uses the ImageHlp library to
search available debugging symbols. The function pointer returned by DetourFindFunction can
be given to DetourFunction to create a dynamic trampoline.

Footnote 1: Microsoft ships debugging symbols for the entire Windows NT operating system as
part of the retail release. These symbols can be found in the \support\symbols directory on
the OS distribution media.

Interception of a target function can be removed by invoking the DetourRemoveTrampoline
function.
Note that because the functions in the Detours library modify code in the application
address space, it is the programmer’s responsibility to ensure that no other threads are
executing in the address space while a detour is inserted or removed. An easy way to ensure
single-threaded execution is to call functions in the Detours library from a DllMain
routine.

Evaluation
Several alternative techniques exist for intercepting function calls. Alternative
interception techniques include:
Call replacement in application source code. Calls to the target function are replaced
with calls to the detour function by modifying application source code. The major drawback
of this technique is that it requires access to source code.
Call replacement in application binary code. Calls to the target function are replaced
with calls to the detour function by modifying application binaries. While this technique
does not require source code, replacement in the application binary does require the
ability to identify all applicable call sites. This requires substantial symbolic
information that is not generally available for binary software.
DLL redirection. If the target function resides in a DLL, the DLL import entries in the
binary can be modified to point to a detour DLL. Redirection to the detour DLL can be
achieved by either replacing the name of the original DLL in the import table before load
time or replacing the function addresses in the indirect import jump table after load [2].
Unfortunately, redirecting to the detour DLL through the import table fails to intercept
DLL internal calls and calls on pointers obtained from the LoadLibrary and GetProcAddress
APIs early in an application’s execution.
Breakpoint trapping. Rather than replace the DLL, the target function can be intercepted
by inserting a debugging breakpoint into the target function. The debugging exception

handler can then invoke the detour function. The major drawback to
breakpoint trapping is that debugging exceptions suspend all application
threads. In addition, the debug exception must be caught in a second
operating-system process. Interception via breakpoint trapping has a
high performance penalty.

Table 1 lists times for intercepting either an empty function or the
CoCreateInstance API. Times are on a 200 MHz Pentium Pro. Rows list the
time to invoke the functions without interception, with interception
through call replacement, with interception through DLL redirection,
with interception using the Detours library, or with interception
through breakpoint trapping. As can be seen, function interception with
the Detours library has only minimal overhead (less than 400 ns in
either case).

                          Intercepted Function
Interception Technique    Empty Function    CoCreateInstance
Direct                    0.113µs           14.836µs
Call Replacement          0.143µs           15.193µs
DLL Redirection           0.143µs           15.193µs
Detours Library           0.145µs           15.194µs
Breakpoint Trap           229.564µs         265.851µs

Table 1. Comparison of Interception Techniques.

Experience

The Detours package has been used extensively in Microsoft Research over
the last two years to instrument and extend Win32 applications and the
Windows NT operating system.

Detours was originally developed for the Coign Automatic Distributed
Partitioning System [7]. Coign converts local desktop applications built
from COM components into distributed client-server applications. During
profiling, Coign uses Detours to intercept calls to COM instantiation
functions such as CoCreateInstance. The detour functions invoke the
original library functions through trampolines, then wrap output
interface pointers in an additional instrumentation layer (for more
details see [8]). The instrumentation layer measures inter-component
communication to determine how application components should be
partitioned across a network. During distributed executions, new Coign
detour functions intercept calls to COM instantiation functions and
re-route those calls to distributed machines. In essence, Coign extends
the COM library to support intelligent remote invocation. Whereas DCOM
supports remote invocation of a few COM instantiation functions, Coign
supports remote invocation for approximately 50 COM functions through
detour extensions. Coign uses Detours' DLL redirection functions to
attach a runtime loader and the payload functions to attach profiling
data to application binaries.

Our colleagues have used Detours to instrument the user-mode portion of
the DCOM protocol stack including marshaling proxies, DCOM runtime, RPC
runtime, WinSock runtime, and marshaling stubs [11]. The resultant
detailed analysis was then used to drive a re-architecture of DCOM for
fast user-mode networks. While they could have used source code
modifications to produce a special profiling version of DCOM, the
source-based instrumentation would have been version dependent and
shared by all DCOM applications on the profiling machine. With binary
instrumentation based on Detours, the profiling tool can be attached to
any Windows NT 4 build of DCOM and affects only the process being
profiled.

In another extension exercise, Detours was used to create a thunking
layer for COP (the Component-based Operating System Proxy) [14]. COP is
a COM-based version of the Win32 API. COP-aware applications access
operating system functionality through COM interfaces, such as
IWin32FileHandle. Because the COP interfaces are distributable with
DCOM, a COP application can use OS resources, including file systems,
keyboards, mice, displays, registries, etc., from any machine in a
network. To provide support for legacy applications, COP uses detour
functions to intercept all application calls to the Win32 APIs. Native
application API calls are converted to calls on COP interfaces. At the
bottom, the COP implementation communicates with the underlying
operating system through trampoline functions. COP requires no
modifications to application binaries. At load time, the COP DLL is
injected into the application's address space with Detours' injection
functions. Through its
simple interception, Detours has facilitated this massive extension of
the Win32 API.

Finally, to support Software Distributed Shared Memory (SDSM) systems,
we have implemented a first-chance exception filter for Win32 structured
exception handling. The Win32 API contains an API,
SetUnhandledExceptionFilter, through which an application can specify an
exception filter to execute should no other filter handle an application
exception. For applications such as SDSM systems, the programmer would
like to insert a first-chance exception filter to remove page faults
caused by the SDSM's manipulation of VM page permissions. Windows NT
does not provide such a first-chance exception filter mechanism. A
simple detour intercepts the exception entry point from kernel mode to
user mode (KiUserExceptionDispatcher). With only a few lines of code,
the detour function calls a user-provided first-chance exception filter
and then forwards the exception, if unhandled, to the default exception
mechanism through a trampoline.

Related Work

Detours are an extension of the general technique of code patching. To
intercept execution, an unconditional branch or jump is inserted into
the desired point of interception in the target function. Code
overwritten by the unconditional branch is moved to a code patch. The
code patch consists of either the instrumentation code or a call to the
instrumentation code followed by the instructions moved to insert the
unconditional branch and a jump to the first instruction in the target
function after the unconditional branch. Logically, a code patch can be
prepended to the beginning of a function, inserted at some arbitrary
point in a function, or appended to the end of a function.

Whereas a code patch invokes instrumentation then continues the target
function, our technique transfers control completely to the detour
function, which can invoke the original target function through the
trampoline at its leisure. The trampoline gives instrumentation complete
freedom to invoke the semantics of the original function as a callable
subroutine at any time.

Techniques for code patching have existed since the dawn of digital
computing [3-5, 9, 15]. Code patching has been applied to insert
debugging or profiling code. In the distant past, code patching was
generally considered to be a much more practical update method than
re-compiling the entire application. In addition to debugging and
profiling, Detours has also been used to resourcefully extend the
functionality of existing systems [7, 14].

While recent systems have extended code patching to parallel
applications [1] and system kernels [16], Detours is to our knowledge
the only code patching system that preserves the semantics of the target
function as a callable subroutine. The detour function replaces the
target function, but can invoke its functionality at any point through
the trampoline. Our unique trampoline design makes it trivial to extend
the functionality of existing binary functions.

Recent research has produced a class of detailed binary rewriting tools
including Atom [13], Etch [12], EEL [10], and Morph [17]. In general,
these tools take as input an application binary and an instrumentation
script. The instrumentation script passes over the binary inserting code
between instructions, basic blocks, or functions. The output of the
script is a new, instrumented binary. In a departure from earlier
systems, DyninstAPI [6] can modify applications dynamically.

Detours' primary advantage over detailed binary rewriters is its size.
Detours adds less than 18KB to an instrumentation package whereas
detailed binary rewriters add at least a few hundred KB. The cost of
Detours' small size is an inability to insert code between instructions
or basic blocks. Detailed binary rewriters can insert instrumentation
around any instruction through sophisticated features such as free
register discovery. Detours relies on adherence to calling conventions
in order to preserve register values. While detailed binary rewriters
support insertion of code before or after any basic instruction unit,
they do not preserve the semantics of the uninstrumented target function
as a callable subroutine.

Conclusions

The Detours library provides an important set of tools to the arsenal of
the systems researcher. Detour functions are fast, flexible, and
friendly. A detour of the CoCreateInstance function has less than a 3%
overhead, which is an order of magnitude
smaller than the penalty for breakpoint trapping. The Detours library is
very small. The runtime consists of less than 40KB of compiled code,
although typically less than 18KB of code is added to the user's
instrumentation.

Unlike DLL redirection, the Detours library intercepts both statically
and dynamically bound invocations. Finally, the Detours library is much
more flexible than DLL redirection or application code modification.
Interception of any function can be selectively enabled or disabled for
each process individually at execution time.

Our unique trampoline preserves the semantics of the original,
uninstrumented target function for use as a subroutine of the detour
function. Using detour functions and trampolines, it is trivial to
produce compelling system extensions without access to system source
code and without recompiling the underlying binary files. Detours makes
possible a whole new generation of innovative systems research on the
Windows NT platform.

Bibliography

[1] Aral, Ziya, Illya Gertner, and Greg Schaffer. Efficient Debugging
Primitives for Multiprocessors. Proceedings of the Third International
Conference on Architectural Support for Programming Languages and
Operating Systems, pp. 87-95. Boston, MA, April 1989.

[2] Balzer, Robert and Neil Goldman. Mediating Connectors. Proceedings
of the 19th IEEE International Conference on Distributed Computing
Systems Workshop, pp. 73-77. Austin, TX, June 1999.

[3] Digital Equipment Corporation. DDT Reference Manual, 1972.

[4] Evans, Thomas G. and D. Lucille Darley. DEBUG - An Extension to
Current Online Debugging Techniques. Communications of the ACM, 8(5),
pp. 321-326, May 1965.

[5] Gill, S. The Diagnosis of Mistakes in Programmes on the EDSAC.
Proceedings of the Royal Society, Series A, 206, pp. 538-554, May 1951.

[6] Hollingsworth, Jeffrey K. and Bryan Buck. DyninstAPI Programmer's
Guide, Release 1.2. Computer Science Department, University of Maryland,
College Park, MD, September 1998.

[7] Hunt, Galen C. and Michael L. Scott. The Coign Automatic Distributed
Partitioning System. Proceedings of the Third Symposium on Operating
System Design and Implementation (OSDI '99), pp. 187-200. New Orleans,
LA, February 1999. USENIX.

[8] Hunt, Galen C. and Michael L. Scott. Intercepting and Instrumenting
COM Applications. Proceedings of the Fifth Conference on Object-Oriented
Technologies and Systems (COOTS'99), pp. 45-56. San Diego, CA, May 1999.
USENIX.

[9] Kessler, Peter. Fast Breakpoints: Design and Implementation.
Proceedings of the ACM SIGPLAN '90 Conference on Programming Language
Design and Implementation, pp. 78-84. White Plains, NY, June 1990.

[10] Larus, James R. and Eric Schnarr. EEL: Machine-Independent
Executable Editing. Proceedings of the ACM SIGPLAN Conference on
Programming Language Design and Implementation, pp. 291-300. La Jolla,
CA, June 1995.

[11] Li, Li, Alessandro Forin, Galen Hunt, and Yi-Min Wang.
High-Performance Distributed Objects over a System Area Network.
Proceedings of the Third USENIX NT Symposium. Seattle, WA, July 1999.

[12] Romer, Ted, Geoff Voelker, Dennis Lee, Alec Wolman, Wayne Wong,
Hank Levy, Brian Bershad, and J. Bradley Chen. Instrumentation and
Optimization of Win32/Intel Executables Using Etch. Proceedings of the
USENIX Windows NT Workshop 1997, pp. 1-7. Seattle, WA, August 1997.
USENIX.

[13] Srivastava, Amitabh and Alan Eustace. ATOM: A System for Building
Customized Program Analysis Tools. Proceedings of the SIGPLAN '94
Conference on Programming Language Design and Implementation, pp.
196-205. Orlando, FL, June 1994.

[14] Stets, Robert J., Galen C. Hunt, and Michael L. Scott.
Component-based Operating System APIs: A Versioning and Distributed
Resource Solution. IEEE Computer, 32(7), July 1999.

[15] Stockham, T.G. and J.B. Dennis. FLIT - Flexowriter Interrogation
Tape: A Symbolic Utility Program for the TX-0. Department of Electrical
Engineering, MIT, Cambridge, MA, Memo 5001-23, July 1960.

[16] Tamches, Ariel and Barton P. Miller. Fine-Grained Dynamic
Instrumentation of Commodity Operating System Kernels. Proceedings of
the Third Symposium on Operating Systems Design and Implementation (OSDI
'99), pp. 117-130. New Orleans, LA, February 1999. USENIX.

[17] Zhang, Xiaolan, Zheng Wang, Nicholas Gloy, J. Bradley Chen, and
Michael D. Smith. System Support for Automated Profiling and
Optimization. Proceedings of the Sixteenth ACM Symposium on Operating
System Principles. Saint-Malo, France, October 1997.
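
The overhead figures quoted with Table 1 ("less than 400 ns in either
case") and in the Conclusions ("less than a 3% overhead" for
CoCreateInstance) can be checked by simple subtraction. The following is
a minimal sketch in C using the Table 1 timings; the macro and function
names are illustrative assumptions and are not part of the Detours
library.

```c
#include <assert.h>

/* Timings copied from Table 1, in microseconds (200 MHz Pentium Pro). */
#define DIRECT_EMPTY_US      0.113
#define DETOURS_EMPTY_US     0.145
#define DIRECT_COCREATE_US   14.836
#define DETOURS_COCREATE_US  15.194

/* Added cost of interception in nanoseconds (hypothetical helper). */
double overhead_ns(double direct_us, double intercepted_us)
{
    assert(direct_us > 0.0 && intercepted_us >= direct_us);
    return (intercepted_us - direct_us) * 1000.0;
}

/* Added cost as a percentage of the uninstrumented call
   (hypothetical helper). */
double overhead_percent(double direct_us, double intercepted_us)
{
    assert(direct_us > 0.0 && intercepted_us >= direct_us);
    return (intercepted_us - direct_us) / direct_us * 100.0;
}
```

With the Table 1 values, the Detours overhead comes to roughly 32 ns for
the empty function and roughly 358 ns (about 2.4%) for CoCreateInstance,
which is consistent with both statements in the text.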

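
The relationship between a target function, a detour function, and a
trampoline can be illustrated with a small portable sketch. Note that
this is not how Detours works internally: Detours rewrites the target's
machine code, whereas the sketch below swaps a function pointer, which
is closer to the import jump-table form of DLL redirection described in
the Evaluation section. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by the target, the detour, and the trampoline. */
typedef int (*add_fn)(int, int);

/* The "target": stands in for an API function to be intercepted. */
int target_add(int a, int b) { return a + b; }

/* Call sites go through this slot, much like an import jump-table entry. */
add_fn add_slot = target_add;

/* The "trampoline": keeps the original reachable from the detour. */
add_fn add_trampoline = NULL;

/* Counts how many calls the detour has observed. */
int detour_calls = 0;

/* The detour: records the call, then invokes the original. */
int detour_add(int a, int b)
{
    detour_calls++;
    assert(add_trampoline != NULL);
    return add_trampoline(a, b);
}

/* Enable interception at run time. */
void attach_detour(void)
{
    if (add_trampoline == NULL) {
        add_trampoline = add_slot;
        add_slot = detour_add;
    }
}

/* Disable interception at run time. */
void remove_detour(void)
{
    if (add_trampoline != NULL) {
        add_slot = add_trampoline;
        add_trampoline = NULL;
    }
}

/* What application code would call. */
int call_add(int a, int b) { return add_slot(a, b); }
```

After attach_detour(), calls made through call_add() reach detour_add()
first and still return the original result. A direct call to
target_add() bypasses the slot and is not counted, which mirrors the
limitation of import-table redirection noted in the text; patching the
target function itself, as Detours does, avoids that gap.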