Undocumented Windows NT
Authors: Prasad Dabak
Sandeep Phadke
Milind Borate
Publisher: Hungry Minds
ISBN: 0764545698
Published: 1999
Table of Contents
Chapter 1: Windows NT: An Inside Look
Chapter 2: Writing Windows NT Device Drivers
Chapter 3: Win32 Implementations: A Comparative Look
Chapter 4: Memory Management
Chapter 5: Reverse Engineering Techniques
Chapter 6: Hooking Windows NT System Services
Chapter 7: Adding New System Services to the Windows NT Kernel
Chapter 8: Local Procedure Call
Chapter 9: Hooking Software Interrupts
Chapter 10: Adding New Software Interrupts
Chapter 11: Portable Executable File Format
Appendix A: Details of System Calls with Parameters
Appendix B: What's on the CD-ROM
EVALUATING WINDOWS NT
The qualities of an operating system are the result of the way in which the operating
system is designed and implemented. For an operating system to be portable, extensible,
and compatible with previous releases, the basic architecture has to be well designed. In
the following sections, we evaluate Windows NT in light of these issues.
http://www.left-brain.com/DesktopModules/EngagePublish/printerfriendly.aspx?itemId... 2010/5/24
Undocumented Windows NT Page 2 of 226
Portability
As you know, Windows NT is available on several platforms, namely, Intel, MIPS, PowerPC,
and DEC Alpha. Many factors contribute to Windows NT’s portability. Probably the most
important factor of all is the language used for implementation. Windows NT is mostly
coded in C, with some parts coded in C++. Assembly language, which is platform-specific,
is used only where necessary. The Windows NT team also isolated the hardware-dependent
sections of the operating system in HAL.DLL. As a result, the hardware-independent
portions of Windows NT can be coded in a high-level language, such as C, and easily
ported across platforms.
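The following is a minimal sketch of the idea behind HAL.DLL, assuming a simplified model: hardware-independent C code reaches the hardware only through a table of function pointers supplied for the current platform. The HalOps structure, the fake register array, and all function names are hypothetical illustrations, not the actual HAL.DLL interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical HAL-style interface: every hardware-dependent operation sits
   behind a function pointer. These are NOT the real HAL.DLL exports. */
typedef struct HalOps {
    const char *platform;
    uint32_t (*read_register)(uint16_t reg);
    void (*write_register)(uint16_t reg, uint32_t value);
} HalOps;

/* A fake platform whose "hardware registers" are just an array. */
static uint32_t fake_registers[256];

static uint32_t fake_read_register(uint16_t reg)
{
    return fake_registers[reg & 0xFF];
}

static void fake_write_register(uint16_t reg, uint32_t value)
{
    fake_registers[reg & 0xFF] = value;
}

static const HalOps fake_hal = { "fake", fake_read_register, fake_write_register };

/* Hardware-independent code: written once in C, it touches the hardware only
   through whichever HalOps table the platform provides. */
static uint32_t increment_counter_register(const HalOps *hal, uint16_t reg)
{
    assert(hal != NULL);
    uint32_t value = hal->read_register(reg) + 1;
    hal->write_register(reg, value);
    return value;
}
```

Porting to another platform then means supplying a different HalOps instance; increment_counter_register itself does not change.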
Extensibility
Windows NT is highly extensible, but because of a lack of documentation, its extensibility
features are rarely explored. The list of undocumented features starts with the
subsystems. The subsystems provide multiple operating system interfaces in one operating
system. You can extend Windows NT to have a new operating system interface simply by
adding a new subsystem program. Windows NT provides Win32, OS/2, POSIX, Win16, and
DOS interfaces using the subsystem concept, but Microsoft keeps mum when it comes to
documenting the procedure for adding a new subsystem.
Another example of Windows NT’s extensibility is its implementation of the system call
interface. Developers commonly modify operating system behavior by hooking or adding
system calls. The Windows NT development team designed the system call interface to
facilitate easy hooking and adding of system calls, but again Microsoft has not documented
these mechanisms.
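To make the idea concrete, here is a minimal user-mode sketch, assuming a simplified model in which service numbers index an array of handler pointers, much as NT's system call dispatcher looks up a handler by service number (Chapters 6 and 7 cover the real mechanism). All names are hypothetical, and the table layout is not the kernel's. Hooking saves the original pointer and installs a wrapper that forwards to it. Adding appends a new entry and hands back its service number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a service number indexes a table of handler pointers. */
typedef long (*ServiceFn)(long arg);

static long svc_double(long arg) { return arg * 2; }
static long svc_negate(long arg) { return -arg; }
static long svc_square(long arg) { return arg * arg; }

#define MAX_SERVICES 8
static ServiceFn service_table[MAX_SERVICES] = { svc_double, svc_negate };
static size_t service_count = 2;

/* What the system call trap handler does once it has the service number. */
static long dispatch_service(size_t number, long arg)
{
    if (number >= service_count)
        return -1; /* stands in for an "invalid service" status */
    return service_table[number](arg);
}

/* Hooking: remember the original handler, then install the hook in its slot. */
static void hook_service(size_t number, ServiceFn hook, ServiceFn *saved)
{
    assert(number < service_count);
    *saved = service_table[number];
    service_table[number] = hook;
}

/* Adding: append a handler and return the new service number (-1 if full). */
static long add_service(ServiceFn fn)
{
    if (service_count >= MAX_SERVICES)
        return -1;
    service_table[service_count] = fn;
    return (long)service_count++;
}

/* Example hook: counts calls, then forwards to the saved original handler. */
static ServiceFn original_double;
static long hook_calls;

static long hooked_double(long arg)
{
    hook_calls++;
    return original_double(arg);
}
```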
Compatibility
Downward compatibility has been a long-standing characteristic of Intel’s microprocessors
and Microsoft’s operating systems, and a key to the success of these two giants. Windows
NT had to allow programs for DOS, Win16, and OS/2 to run unaltered. Compatibility is
another reason the NT development team went for the subsystem concept. Apart from
binary compatibility, where the executable has to be allowed to run unaltered, Windows
NT also provides source compatibility for POSIX-compliant applications. In another
attempt to increase compatibility, Windows NT supports other file systems, such as the
file allocation table (FAT) file system from DOS and the High Performance File System
(HPFS) from OS/2, in addition to the native NT file system (NTFS).
Maintainability
Windows NT is a big piece of code, and maintaining it is a big job. The NT development
team has achieved maintainability through an object-oriented design. Also, the breakup of
the operating system functionality into various layers improves maintainability. The
topmost layer, which is the one that is seen by the users of the operating system, is the
subsystems layer. The subsystems use the system call interface to provide the application
programming interface (API) to the outside world. Below the system call interface layer
lies the NT executive, which in turn rests on the kernel, which ultimately relies on the
hardware abstraction layer (HAL) that talks directly with the hardware.
Security
Windows NT is a secure operating system based on the following characteristics: A user
needs to log in to the system before he or she can access it. The resources in the system
are treated as objects, and every object has a security descriptor associated with it. A
security descriptor contains access control lists that dictate which users can access the
object and with what access rights.
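As a rough illustration of how an access control list decides access, here is a simplified sketch. It assumes users are small integers instead of security identifiers (SIDs), and it ignores groups, privileges, and ownership. The Ace type and the access_check function are hypothetical. The logic follows the ordered evaluation NT applies to a discretionary ACL: a matching deny entry covering any still-requested right refuses access, and matching allow entries accumulate rights until every requested right is granted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ACE: a user number stands in for a SID. */
typedef enum { ACE_ALLOW, ACE_DENY } AceType;

typedef struct {
    AceType type;
    int user;       /* stand-in for a SID */
    uint32_t mask;  /* access rights covered by this entry */
} Ace;

#define RIGHT_READ  0x1u
#define RIGHT_WRITE 0x2u

/* Walk the entries in order; rights never explicitly granted are denied. */
static bool access_check(const Ace *dacl, size_t count, int user, uint32_t desired)
{
    assert(desired != 0);
    uint32_t granted = 0;
    for (size_t i = 0; i < count; i++) {
        if (dacl[i].user != user)
            continue;
        /* A deny entry only matters for rights not already granted. */
        if (dacl[i].type == ACE_DENY && (dacl[i].mask & desired & ~granted))
            return false;
        if (dacl[i].type == ACE_ALLOW)
            granted |= dacl[i].mask & desired;
        if (granted == desired)
            return true;
    }
    return false;
}
```

Because evaluation is ordered, placing deny entries ahead of allow entries is what lets a deny entry override a broader allow entry for the same user.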
All this being said, a secure operating system cannot be complete without a secure file
system, and the FAT file system from the days of DOS does not have any provision for
security. DOS, being a single-user operating system, did not care about security.
In response to this shortcoming, the Windows NT team came up with a new file system
based on the HPFS, which is the native file system for OS/2. This new native file system
for Windows NT, known as NTFS, has support for access control. A user can specify the
access rights for a file or directory being created under NTFS, and NTFS allows only the
processes with proper access rights to access that file or directory.
Caution: Keep in mind that no system is 100 percent secure. Windows NT, although remarkably secure, is not DoD
compliant. (For the latest news on DoD compliance, check out http://www.fcw.com/pubs/fcw/1998/0727/fcw-newsdodsec-7-27-98.htm.)
Multiprocessing
Windows NT supports symmetric multiprocessing. The workstation version of Windows NT
can support two processors, and the server version can support up to four processors.
The operating system needs special synchronization constructs for supporting
multiprocessing. On a single-processor system, critical portions of code can be executed
without interruption by disabling all the hardware interrupts. This is required to maintain
Note: Multiprocessing can be classified as asymmetric and symmetric. In asymmetric multiprocessing, a single
processor acts as the master processor and the other processors act as slaves. Only the master processor runs the
kernel code, while the slaves can run only the user threads. Whenever a thread running on a slave processor invokes a
system service, the master processor takes over the thread and executes the requested kernel service. The scheduler,
being kernel code, runs only on the master processor. Thus, the master processor acts as the scheduler, dispatching
user mode threads to the slave processors. Naturally, the master processor is heavily loaded and the system is not
scalable. Compare this with symmetric multiprocessing, where any processor can run the kernel code as well as the
user code.
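On a multiprocessor, disabling interrupts on one processor does not stop code running on another, so the kernel protects shared data with spin locks instead. The sketch below is a minimal test-and-set spin lock built on C11 atomics, assuming a single shared flag. It models the idea only. It is not the implementation of KeAcquireSpinLock, which also raises the processor's IRQL, and the SpinLock type and spin_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool locked; /* true while some processor owns the lock */
} SpinLock;

static void spin_init(SpinLock *lock)
{
    atomic_init(&lock->locked, false);
}

static void spin_acquire(SpinLock *lock)
{
    /* atomic_exchange returns the old value: true means another processor holds it. */
    while (atomic_exchange_explicit(&lock->locked, true, memory_order_acquire)) {
        /* Wait with read-only loads until the lock looks free, then retry. */
        while (atomic_load_explicit(&lock->locked, memory_order_relaxed))
            ;
    }
}

static bool spin_try_acquire(SpinLock *lock)
{
    return !atomic_exchange_explicit(&lock->locked, true, memory_order_acquire);
}

static void spin_release(SpinLock *lock)
{
    assert(atomic_load_explicit(&lock->locked, memory_order_relaxed)); /* must be held */
    atomic_store_explicit(&lock->locked, false, memory_order_release);
}
```

The acquire and release memory orders ensure that writes made while holding the lock are visible to the next processor that acquires it.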
Multiprogramming
Windows NT 3.51 and Windows NT 4.0 lack an important feature of a server operating
system: support for remote login, such as Telnet. Both these versions of Windows NT
can operate as file servers because they support the Common Internet File System (CIFS)
protocol. But they cannot act as CPU servers because logging into a Windows NT machine
over the network is not possible. Consequently, only one user can log in to a Windows NT
machine at a time. Windows 2000 plans to overcome this deficiency by providing a Telnet
server along with the operating system. This will enable multiple programmers to log in to
the machine at the same time, making Windows 2000 a true server operating system.
Note: Third-party Telnet servers are available for Windows NT 3.51 and Windows NT 4.0. However, Microsoft’s own
Telnet server comes only with Windows 2000.
architecture of the operating system serves yet another purpose: It allows multiple APIs
for the same operating system. This is achieved by implementing the APIs through the
server processes.
The MACH operating system kernel provides a very simple set of interface functions. A
server process implementing a particular API uses these interface functions to provide a
more complex set of interface functions. Windows NT borrows this idea from the MACH
operating system. The server processes in Windows NT are called subsystems. NT’s
choice of the client-server architecture shows its commitment to good software
management principles such as modularity and structured programming. Windows NT had
the option to implement the required APIs in the kernel. Also, the NT team could have
added different layers on top of the Windows NT kernel to implement different APIs. The
NT team voted in favor of the subsystem approach for purposes of maintainability and
extensibility.
The Subsystems
There are two types of subsystems in Windows NT: integral subsystems and environment
subsystems. The integral subsystems, such as the security manager subsystem, perform
some essential operating system task. The environment subsystems enable different types
of APIs to be used on a Windows NT machine. Windows NT comes with subsystems to
support the following APIs:
Win32 Subsystem. The Win32 subsystem provides the Win32 API. The applications
conforming to the Win32 API are supposed to run unaltered on all the 32-bit
platforms provided by Microsoft–that is, Windows NT, Windows 95, and Win32s.
Unfortunately, as you will see later in this book, this is not always the case.
WOW Subsystem. The Windows on Windows (WOW) subsystem provides backward
compatibility for 16-bit Windows applications, enabling Win16 applications to run
on Windows NT. These applications can run on Windows NT unless they use some
of the undocumented API functions from Windows 3.1 that are not defined in
Windows NT.
NTVDM Subsystem. The NT Virtual DOS Machine (NTVDM) provides a text-based
environment where DOS applications can run.
OS/2 Subsystem. The OS/2 subsystem enables OS/2 applications to run. WOW,
NTVDM, and OS/2 are available only on Intel platforms because they provide
binary compatibility to applications. One cannot run the executable files or binary
files created for one type of processor on another type of processor because of
the differences in machine code format.
POSIX Subsystem. The POSIX subsystem provides an API that complies with the POSIX
1003.1 standard.
Applications are unaware that the API calls they invoke are processed by the
corresponding subsystem. This is hidden from them by each subsystem's client-side DLL,
which translates the API call into a local procedure call (LPC). LPC is similar to the
remote procedure call (RPC) facility available on networked Unix machines. Using RPC, a
client application can invoke a function residing in a server process running on another
machine over the network. LPC is optimized for the case in which the client and the server
run on the same machine.
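The following user-mode sketch models the client-side DLL's role under stated assumptions: a stub packs the API number and arguments into a fixed-size message, and the "server" unpacks it, does the work, and replies. The Message layout, the API numbers, and all function names are hypothetical. send_and_wait is a direct call standing in for a real LPC port, which would copy the message into the subsystem process and block the client until the reply arrives.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical API numbers and message layout (real LPC messages differ). */
enum { API_ADD = 1, API_STRLEN = 2 };

typedef struct {
    int api_number;
    long args[2];
    char text[64];
    long result;
    int status; /* 0 = success, -1 = unknown API */
} Message;

/* Server side: dispatch on the API number carried in the message. */
static void server_handle(Message *msg)
{
    switch (msg->api_number) {
    case API_ADD:
        msg->result = msg->args[0] + msg->args[1];
        msg->status = 0;
        break;
    case API_STRLEN:
        msg->result = (long)strlen(msg->text);
        msg->status = 0;
        break;
    default:
        msg->status = -1;
    }
}

/* Stand-in for the LPC transport: here it is simply a direct call. */
static void send_and_wait(Message *msg)
{
    server_handle(msg);
}

/* Client-side stubs: the application calls these like ordinary functions. */
static long client_add(long a, long b)
{
    Message msg = { .api_number = API_ADD, .args = { a, b } };
    send_and_wait(&msg);
    assert(msg.status == 0);
    return msg.result;
}

static long client_strlen(const char *s)
{
    Message msg = { .api_number = API_STRLEN };
    strncpy(msg.text, s, sizeof msg.text - 1); /* fixed-size buffer, still terminated */
    send_and_wait(&msg);
    return msg.status == 0 ? msg.result : -1;
}
```

The application sees only client_add and client_strlen; whether the work happens in-process, in another process, or on another machine is the transport's concern.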
Device Drivers
This chapter covers the software requirements for building Windows NT device drivers, the procedure for building device
drivers, and the structure of a typical device driver.
Most of the samples in this book are Windows NT kernel-mode device drivers. This chapter
contains the information you need to build device drivers and understand the samples in
this book. This chapter is not a complete guide to writing device drivers. The best sources
of information for detailed coverage of the topic are Art Baker's The Windows NT Device
Driver Book: A Guide for Programmers and the documentation that ships with the
Windows NT Device Driver Kit (DDK).
Windows NT Device Driver Kit (DDK) from Microsoft. To develop device drivers, you need
to install the Device Driver Kit on your machine. The Device Driver Kit is available with
the MSDN Level 2 subscription. The kit consists of header files, libraries, and tools that
make device driver development easier.
32-bit compiler. You need a 32-bit compiler to compile the device drivers. We strongly
recommend using the Microsoft compiler to build the samples in this book.
Win32 Software Development Kit (SDK). Although it is not necessary for compiling the
samples from this book, we recommend installing the latest version of the Win32 SDK on
your machine. Also, when you build device drivers using the DDK tools, you should set the
environment variable MSTOOLS to point to the location where the Win32 SDK is installed.
You can fake the installation of the Win32 SDK by adding the environment variable
MSTOOLS with the System applet in the Control Panel.
%SystemRoot%\System32\cmd.exe /k E:\DDK40\bin\setenv.bat E:\DDK40 free
The Checked Build Environment shortcut, on the other hand, refers to this command line:
checked
Both shortcuts spawn CMD.EXE and ask it to execute the SETENV.BAT file with appropriate
parameters. After executing the command, CMD.EXE still keeps running because of the
presence of the /k switch. The SETENV.BAT file sets the environment variables, which are
added to the CMD.EXE process's environment variable list. The DDK tools, which are
spawned from CMD.EXE, refer to these environment variables. SETENV.BAT sets several
environment variables, including BUILD_DEFAULT, BUILD_DEFAULT_TARGETS,
BUILD_MAKE_PROGRAM, and DDKBUILDENV.
The drivers are compiled using the utility called BUILD.EXE, which is shipped with the
DDK. This utility takes as input a file named SOURCES. This file contains the list of source
files to be compiled to build the driver. This file also contains the name of the target
executable, the type of the target executable (for example, DRIVER or PROGRAM), and the
path of the directory where the target executable is to be created.
Each sample device driver included with the DDK contains a makefile. However, this is not
the actual makefile for the device driver sample. Instead, the makefile for each sample
device driver includes a common makefile, named MAKEFILE.DEF, which is present in the
INC directory of the DDK installation directory.
# DO NOT EDIT THIS FILE!!! Edit .\sources. if you want to add a new source
# file to this component. This file merely indirects to the real make file
!INCLUDE $(NTMAKEENV)\makefile.def
Some of the driver samples in this book have Assembly language files (.ASM files). You
cannot list .ASM files directly in the SOURCES file. Instead, you have to create a
directory called I386 in the directory where the source files for the driver are kept. All
the .ASM files for the driver must be kept in the I386 directory. The BUILD.EXE utility
automatically uses ML.EXE to assemble these .ASM files.
BUILD.EXE generates the appropriate driver or application based on the settings specified
in the SOURCES file and using the platform-dependent environment variables. If there are
any errors during the BUILD process, the errors are logged to a file called BUILD.ERR. If
there are any warnings, they are logged to the BUILD.WRN file. Also, the BUILD utility
generates a file called BUILD.LOG, which contains lists of commands invoked by the BUILD
utility and the messages given by these tools.
the device drivers. Hence, the DriverEntry of each device driver is called in the context of
the SYSTEM process. Each device driver is represented by a device name in the system, so
each driver has to create a device name for its device. This is done with the
IoCreateDevice function. If Win32 applications need to open a handle to a device driver,
the driver needs to create a symbolic link for its device in the DosDevices object
directory. This is done using a call to IoCreateSymbolicLink. Typically, in the DriverEntry
routine of a device driver, the device object and the symbolic link object are created for
a device and some driver or device-specific initialization is performed.
Most of the device driver samples in this book involve pseudo device drivers. These drivers
do not control any physical device. Instead, they perform tasks that can be performed
only from a device driver, which runs in the most privileged mode of the processor (Ring 0
on Intel processors). In addition, DriverEntry is supposed to provide entry points for
other functions, such as OPEN, CLOSE, DEVICEIOCONTROL, and so on. These entry points are
provided by filling in fields of the driver object (such as its MajorFunction array),
which is passed as a parameter to the DriverEntry function.
Because most of the drivers in this book are pseudo device drivers, the DriverEntry routine
is the same for all of them. Only the device driver–specific initialization is different.
Instead of repeating the same piece of code in each of the driver samples, we wrote a
macro called MYDRIVERENTRY:
/* MYDRIVERENTRY expands to the entire body of DriverEntry. It assumes that the
   DriverEntry parameter is named DriverObject and that DriverDispatch and
   DriverUnload are defined elsewhere in the driver. DriverName must be a wide
   string literal, such as L"MyDriver", so it concatenates with the prefixes. */
#define MYDRIVERENTRY(DriverName, DeviceId, DriverSpecificInit)               \
    PDEVICE_OBJECT deviceObject = NULL;                                        \
    NTSTATUS ntStatus;                                                         \
    /* Adjacent string literals concatenate into one wide string. */          \
    WCHAR deviceNameBuffer[] = L"\\Device\\" DriverName;                       \
    UNICODE_STRING deviceNameUnicodeString;                                    \
    WCHAR deviceLinkBuffer[] = L"\\DosDevices\\" DriverName;                   \
    UNICODE_STRING deviceLinkUnicodeString;                                    \
                                                                               \
    RtlInitUnicodeString(&deviceNameUnicodeString, deviceNameBuffer);          \
    ntStatus = IoCreateDevice(DriverObject,                                    \
                              0,          /* no device extension */            \
                              &deviceNameUnicodeString,                        \
                              DeviceId,   /* device type */                    \
                              0,          /* device characteristics */         \
                              TRUE,       /* exclusive access */               \
                              &deviceObject);                                  \
    if (!NT_SUCCESS(ntStatus))                                                 \
        return ntStatus;                                                       \
                                                                               \
    /* Symbolic link in the DosDevices directory, for Win32 applications. */  \
    RtlInitUnicodeString(&deviceLinkUnicodeString, deviceLinkBuffer);          \
    ntStatus = IoCreateSymbolicLink(&deviceLinkUnicodeString,                  \
                                    &deviceNameUnicodeString);                 \
    if (!NT_SUCCESS(ntStatus)) {                                               \
        IoDeleteDevice(deviceObject);                                          \
        return ntStatus;                                                       \
    }                                                                          \
                                                                               \
    /* Driver-specific initialization; undo both objects if it fails. */      \
    ntStatus = (DriverSpecificInit);                                           \
    if (!NT_SUCCESS(ntStatus)) {                                               \
        IoDeleteSymbolicLink(&deviceLinkUnicodeString);                        \
        IoDeleteDevice(deviceObject);                                          \
        return ntStatus;                                                       \
    }                                                                          \
                                                                               \
    /* Route CREATE, CLOSE, and DEVICE_CONTROL IRPs to a single routine. */   \
    DriverObject->MajorFunction[IRP_MJ_CREATE] =                               \
    DriverObject->MajorFunction[IRP_MJ_CLOSE] =                                \
    DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = DriverDispatch;       \
    DriverObject->DriverUnload = DriverUnload;                                 \
    return STATUS_SUCCESS;
initialization function specified by the third parameter. If that function fails, the
macro deletes the symbolic link and the device object and returns the function's error
code. If the function succeeds, the macro fills in the DriverObject with pointers to the
other functions supported by the driver. Once you use this macro in the DriverEntry
function, you need to write the DriverDispatch and DriverUnload functions, because the
macro refers to them.
All requests to a device driver are sent in the form of an I/O request packet (IRP). The
system calls the driver function that corresponds to each request, based on the function
pointers filled in during DriverEntry. In the following discussion, we assume that all
these function pointers are set to the address of the DriverDispatch function.
The DriverDispatch function is called with an IRP containing the command code of
IRP_MJ_CREATE whenever an application opens a handle to a device driver using the
CreateFile API call. The DriverDispatch function is called with an IRP containing the
command code of IRP_MJ_CLOSE whenever an application closes its handle to a device
driver using the CloseHandle API function. The DriverDispatch function is called with an
IRP containing the command code of IRP_MJ_DEVICE_CONTROL whenever the application
uses the DeviceIoControl API function to send data to or receive data from a device driver. If the
driver functionality is being used by multiple processes, the driver can use the CREATE and
CLOSE entry points to perform per-process initialization.
Because all these requests end up calling DriverDispatch, you need a way to identify the
actual function requested. You can do this by looking at the MajorFunction field of the
IRP's current I/O stack location, which you obtain by calling IoGetCurrentIrpStackLocation.
The stack location contains the function code and any additional parameters required to
complete the request. The DriverUnload routine is called when the device driver is
unloaded from the system. Just like DriverEntry, the DriverUnload function is called in
the context of the SYSTEM process.
Typically, in a DriverUnload routine, the device driver deletes the symbolic link and the
device name created during DriverEntry and performs some device-specific
uninitialization.
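The dispatch logic described above can be modeled in user mode as a switch on the major function code. This is a sketch under stated assumptions: MOCK_IRP is a hypothetical structure that flattens the IRP and its current I/O stack location into one struct, and IOCTL_MODEL_GET_OPEN_COUNT is a made-up control code. The IRP_MJ_* values and the two status codes match the DDK headers. A real DriverDispatch receives a PDEVICE_OBJECT and a PIRP, reads the stack location with IoGetCurrentIrpStackLocation, and completes the IRP with IoCompleteRequest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Major function codes and status values as defined in the DDK headers. */
#define IRP_MJ_CREATE          0x00
#define IRP_MJ_CLOSE           0x02
#define IRP_MJ_DEVICE_CONTROL  0x0e

typedef int32_t NTSTATUS; /* a 32-bit signed LONG, as in the DDK */
#define STATUS_SUCCESS                ((NTSTATUS)0x00000000)
#define STATUS_INVALID_DEVICE_REQUEST ((NTSTATUS)0xC0000010)

/* Hypothetical control code, equivalent to
   CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS). */
#define IOCTL_MODEL_GET_OPEN_COUNT 0x222000u

/* Hypothetical flattened IRP: not the DDK layout. */
typedef struct {
    uint8_t  major_function;  /* stack location's MajorFunction in a real driver */
    uint32_t io_control_code; /* Parameters.DeviceIoControl.IoControlCode */
    NTSTATUS status;          /* Irp->IoStatus.Status */
    uint32_t information;     /* Irp->IoStatus.Information */
} MOCK_IRP;

static int open_handles; /* state maintained through CREATE and CLOSE */

static NTSTATUS ModelDriverDispatch(MOCK_IRP *irp)
{
    assert(irp != NULL);
    NTSTATUS status = STATUS_SUCCESS;
    irp->information = 0;

    switch (irp->major_function) {
    case IRP_MJ_CREATE:          /* CreateFile on the device */
        open_handles++;
        break;
    case IRP_MJ_CLOSE:           /* CloseHandle on that handle */
        open_handles--;
        break;
    case IRP_MJ_DEVICE_CONTROL:  /* DeviceIoControl */
        if (irp->io_control_code == IOCTL_MODEL_GET_OPEN_COUNT)
            irp->information = (uint32_t)open_handles;
        else
            status = STATUS_INVALID_DEVICE_REQUEST;
        break;
    default:
        status = STATUS_INVALID_DEVICE_REQUEST;
    }

    /* A real driver would store this in Irp->IoStatus.Status and call
       IoCompleteRequest(Irp, IO_NO_INCREMENT) before returning it. */
    irp->status = status;
    return status;
}
```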
SUMMARY
In this chapter, we covered the software requirements for building Windows NT device
drivers, the procedure for building device drivers, and the structure of a typical device
driver. Along the way, we explained a simple macro that you can use to generate the
driver entry code for a typical device driver.
Chapter 3: Win32 Implementations: A Comparative Look
This chapter covers the Win32 implementation on Windows 95/98 and Windows NT. The authors discuss the differences
between these two implementations with respect to address space, process startup, toolhelp functions, multitasking,
Presently, there are four Win32 API implementations available from Microsoft:
Windows 95/98
Windows NT
Win32s
Windows CE
Of these, Win32s is very limited due to bugs and the restrictions of the underlying
operating system. Presently, Win32 API implementations on Windows 95/98 and Windows
NT are very popular among developers. Windows CE is meant for palmtop computers. The
Win32 API was first implemented on the Windows NT operating system. Later, the same
API was made available in Windows 95. Ideally, an application written using the standard
Win32 API should work on any operating system that supports the Win32 API
implementation. (However, this is not necessarily true due to the differences between the
implementations.) The Win32 API should hide all the details of the underlying
implementations and provide a consistent view to the outside world.
In this chapter, we focus on the differences between the implementations of the Win32
API under Windows NT and Windows 95. As developers, you should be aware of these
differences while you develop applications that can run on both of these operating
systems.
The major design goal for Windows 95 was backward compatibility. Hence, instead of
porting all the 16-bit functions to 32-bit, Microsoft decided to reuse the existing 16-bit
code (from the Windows 3.x operating system) by wrapping it in 32-bit code. This 32-bit
code would in turn call the 16-bit functions. This was a good approach because the tried-
and-true 16-bit code was already running on millions of machines all over the world. In
this Win32 API implementation, most of the functions from KERNEL32 thunk down to
KRNL386, USER32 thunks down to USER.EXE, and GDI32 thunks down to GDI.EXE.
On Windows NT also, the Win32 API is provided in the form of the famous trio of the
KERNEL32, USER32, and GDI32 DLLs. However, this implementation is done completely
from scratch without using any existing 16-bit code, so it is purely a 32-bit implementation
of Win32 API. Even 16-bit applications end up calling this 32-bit API. Windows NT’s 16-bit
subsystem uses universal thunking to achieve this.
Note: Universal thunking is a way of calling 32-bit functions from 16-bit applications. (More on thunking later in this
chapter.)
KRNL386.EXE, USER.EXE, and GDI.EXE, which are used to support 16-bit applications,
thunk up to KERNEL32, USER32, and GDI32 through the WOW (Windows on Windows) layer.
Most of the functions provided by KERNEL32.DLL call one or more native system services
to do the actual work. The native system services are available through a DLL called
NTDLL.DLL.
As far as USER32 and GDI32 are concerned, the implementation differs between NT 3.51
and later versions. Under Windows NT 3.51, a separate subsystem process implements the
USER32 and GDI32 calls. The DLLs USER32 and GDI32 contain stubs, which pass the
function parameters to the Win32 subsystem (CSRSS.EXE) and get the results back. The
communication between the client application and the Win32 subsystem is achieved by
using the local procedure call facility provided by the NT executive.
XREF: Chapter 8 covers the details of the local procedure call (LPC) mechanism.
Under Windows NT 4.0 and Windows 2000, USER32 and GDI32 call the system services
provided by a kernel-mode device driver called WIN32K.SYS. USER32 and GDI32 contain
stubs that invoke these system services through interrupt 2Eh. Hence, most of the
functionality of the Win32 subsystem process (CSRSS.EXE) has been taken over by the
kernel-mode driver (WIN32K.SYS). The CSRSS process still exists in NT 4.0 and Windows 2000;
however, its role is mainly limited to supporting console I/O.
It is interesting to note that the Win32 API completely hides NTDLL.DLL from the
developer. Actually, most of the functions provided by the Win32 API ultimately call one
or more system services. This system service layer is very powerful and often provides
functions that have no equivalent Win32 API functions. Most of the Windows NT Resource
Kit utilities link to NTDLL.DLL implicitly.
Address Space
Both Windows 95 and Windows NT deal with flat, 32-bit linear addresses that give 4GB of
virtual address space. Of this, the upper 2GB (hereafter referred to as the shared address
space) is reserved for operating system use, and the lower 2GB (hereafter referred to as
the private address space) is used by the running process. Each process has its own
private address space: the same virtual address in the private address space of two
processes may map to different physical pages. By contrast, an address in the shared
address space maps to the same physical page in every process.
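The split is simple arithmetic: 4GB is 2^32 bytes, so the 2GB boundary sits at 2^31 = 0x80000000. The minimal sketch below, assuming the default 2GB/2GB split described above, classifies a 32-bit virtual address. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 4GB = 2^32 bytes; the default split puts the boundary at 2GB = 2^31. */
#define ADDRESS_SPACE_SIZE (UINT64_C(1) << 32)
#define SHARED_BASE        ((uint32_t)(ADDRESS_SPACE_SIZE / 2))

static_assert(SHARED_BASE == 0x80000000u, "the 2GB boundary is 0x80000000");

typedef enum { PRIVATE_REGION, SHARED_REGION } Region;

/* Lower 2GB: private to each process. Upper 2GB: shared, reserved for the OS. */
static Region classify_address(uint32_t va)
{
    return va >= SHARED_BASE ? SHARED_REGION : PRIVATE_REGION;
}
```

For example, 0x7FFFFFFF is the last private address and 0x80000000 is the first shared one.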
Under Windows 95/98, the operating system DLLs, such as KERNEL32, USER32, and GDI32,
reside in the shared address space, whereas in Windows NT these DLLs are loaded in the
process’s private address space. Hence, under Windows 95/98, it is possible for one
application to interfere with the working of another application. For example, one
application can accidentally overwrite memory areas occupied by these DLLs and affect
the working of all the other processes.
Note: Although the shared address space is protected at the page table level, a kernel-mode component (for example,
a VxD) can write to any location in the 4GB address space.
In addition, under Windows 95/98, it is possible to load a dynamic link library in the
shared address space. Such a DLL suffers from the same problem described previously when
it is used by multiple applications in the system.
Windows NT loads all the system DLLs, such as KERNEL32, USER32, and GDI32, in the
private address space. As a result, it is never possible for one application to interfere with
the other applications in the system without intending to do so. If one application
accidentally overwrites these DLLs, it will affect only that application. Other applications
will continue to run without any problems.
Memory-mapped files are loaded in the shared address space under Windows 95/98,
whereas they are loaded in the private address space in Windows NT. In Windows 95/98, it
is possible for one application to create and map a memory-mapped file, pass its address
to another application, and have the other application use this address to share memory.
This is not possible under Windows NT. You have to explicitly create and map a named
memory-mapped file in one application and open and map the memory-mapped file in
another application in order to share it.
The address space differences strongly affect global API hooking. The topic of global API
hooking has been covered many times in different articles and books, but there is still no
common API hooking solution for both Windows NT and Windows 95/98. The root of the
difference is that under Windows 95/98, it is possible to load a DLL in shared memory, and
all the system DLLs reside in shared memory. Hooking an API call there amounts to
overwriting the first few instructions of the function with a simple JMP instruction that
transfers control to a function in a shared DLL. This does not work under Windows NT:
because the function resides in the private address space, patching the bytes at the start
of the function changes them only in your own address space.
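The patch itself is small. On Intel processors, a common choice is the 5-byte JMP rel32 instruction (opcode E9), whose 32-bit displacement is measured from the end of the JMP, that is, target - (source + 5). The sketch below builds that patch and applies it to a buffer standing in for a function's first bytes, saving the overwritten bytes so that the hook can be removed. The function names are hypothetical, and 32-bit addresses are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_SIZE 5

/* Build "JMP rel32" from 'source' (where the patch goes) to 'target'. */
static void build_jmp_patch(uint32_t source, uint32_t target, uint8_t patch[JMP_REL32_SIZE])
{
    /* Unsigned arithmetic wraps, which yields the right encoding for backward jumps. */
    uint32_t rel = target - (source + JMP_REL32_SIZE);
    patch[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        patch[1 + i] = (uint8_t)(rel >> (8 * i)); /* little-endian displacement */
}

/* Overwrite the first bytes of 'code', keeping a copy so the hook can be undone. */
static void apply_patch(uint8_t *code, const uint8_t patch[JMP_REL32_SIZE],
                        uint8_t saved[JMP_REL32_SIZE])
{
    assert(code != NULL && saved != NULL);
    memcpy(saved, code, JMP_REL32_SIZE);
    memcpy(code, patch, JMP_REL32_SIZE);
}
```

A real hook must first make the code page writable (for example, with VirtualProtect). If it needs to call the original function, it must also copy whole instructions, not a fixed 5 bytes, into a trampoline. Under Windows NT, such a patch affects only the process in which it is applied.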
To do any kind of global API hooking under Windows NT, you have to make sure that the
hooking is performed in each of the running processes. For this, you need to manipulate
the address spaces of other processes. In addition, the same hooking needs to be done in
newly started processes. Windows NT provides a way to automatically load a particular
DLL into each process that loads USER32.DLL, through the AppInit_DLLs registry value.
Process Startup
There are several differences in the way the process is started under Windows 95/98 and
Windows NT. Although the same CreateProcess API call is used in Windows 95/98 and
Windows NT, the implementation is quite different. In this section, we look at just one
example of these differences. Ideally, both of the CreateProcess
implementations should give the same view to the outside world. When somebody says
that a particular API call is standard, this means that given a specific set of parameters to
a function, the function should behave exactly the same on all the implementations of this
API call. In addition, the function should return the same error codes based on the type of
error.
Consider a simple problem such as detecting the successful start of an application. If you
try to spawn a program that has some startup problem (for example, implicitly linked DLLs
are missing), it should return an appropriate error code. The Windows 95/98
implementation returns an appropriate error code such as STATUS_DLL_NOT_FOUND,
whereas Windows NT does not return any error. Windows NT’s implementation returns
an error only if the file spawned is not present at the expected location. This happens
mainly because of the way the CreateProcess call is implemented under Windows NT and
Windows 95/98. When you spawn a process in Windows 95/98, the complete loading and
startup of the process is performed as part of the CreateProcess call itself. That is, when
the CreateProcess call returns, the spawned process is already running. Under Windows NT,
CreateProcess returns once the process object and its initial thread have been created.
The implicitly linked DLLs are loaded afterward by the loader running in the context of the
new process, so a missing DLL is detected only after CreateProcess has already reported
success.
Toolhelp Functions
Win32 implementation on Windows 95/98 provides some functions that enable you to
enumerate the processes running in the system, the modules loaded in a process, and so on.
These functions are provided by KERNEL32.DLL. The functions are CreateToolhelp32Snapshot,
Process32First, Process32Next, and others. These functions are not implemented in
Windows NT’s implementation of KERNEL32, so programs that link to them implicitly will
not start at all under Windows NT. The Windows NT 4.0 SDK comes with a new DLL called
PSAPI.DLL, which provides comparable functionality. Its header file, PSAPI.H, is also
included with the Windows NT 4.0 SDK. Windows 2000 has this toolhelp functionality built
into KERNEL32.DLL.
Note: A function is implicitly linked if the program calls the function directly by name and includes the
appropriate .LIB file in the project. That is, it does not use GetProcAddress to get the address of the function.
Multitasking
Both Windows 95 and Windows NT use time slice–based preemptive multitasking. However,
because the Windows 95 implementation of the WIN32 API depends largely on 16-bit code,
it has a few inherent drawbacks. The major one is the Win16Mutex. Because the existing
16-bit code is not well suited for multitasking, the easiest choice for Microsoft was to
ensure that the 16-bit code is not entered from multiple tasks. To achieve this, Microsoft
came up with the Win16Mutex solution.
Before entering the 16-bit code, the operating system acquires the Win16Mutex, and it
releases the Win16Mutex when returning from the 16-bit code. Because the Win16Mutex is
held whenever a 16-bit application is running, other tasks that need the 16-bit code must
wait, which reduces multitasking. Windows NT does not have this problem because its
system code is entirely 32-bit and well suited for time slice–based preemptive multitasking.
Also, in the case of Windows NT, the 16-bit code thunks up to 32-bit code.
Thunking
Thunking enables 16-bit applications to run in a 32-bit environment and vice versa. It is a
way of calling a function written for one bitness from code running at a different
bitness. Bitness determines how the processor decodes instructions, in particular whether
the default operand and address sizes are 16 or 32 bits; on the 80386, it is set by the D bit
in the descriptor of the code segment being executed.
There are two different types of thunking available:
Universal thunking
Generic thunking
Generic thunking enables you to call a 32-bit function from 16-bit code, whereas universal
thunking enables you to call a 16-bit function from 32-bit code. Windows 95/98 supports
thunking in both directions, but Windows NT supports only generic thunking, that is, calls
from 16-bit code into 32-bit code. As you saw earlier in this chapter, thunking from 32-bit
code down to 16-bit code is used extensively in the WIN32 API implementation of Windows
95/98. For example, a 32-bit USER32.DLL calls functions from a 16-bit USER.EXE, and a
32-bit GDI32.DLL calls functions from a 16-bit GDI.EXE. Thunking involves various issues,
such as converting 16:16 far pointers in 16-bit code to flat 32-bit addresses and
manipulating the stack so that code running at one bitness can make a proper call to code
running at a different bitness. Microsoft provides tools such as thunk compilers to
automate most of these tasks.
Many vendors who write code for Windows 95/98 use 32-bit-to-16-bit thunking to avoid a
major redesign of their applications. For example, say a particular vendor has a product
for Windows 3.1 and would like to port it to Windows 95. Instead of rewriting the code for
Windows 95, an easier solution is to reuse the majority of the existing 16-bit code and call
it from 32-bit applications through thunking. However, these applications need to be
rewritten for Windows NT, because Windows NT does not support thunking from 32-bit
applications down to 16-bit code.
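The far pointer conversion mentioned above can be sketched in C. This is a minimal model under stated assumptions, not Microsoft’s thunk code: a 16:16 far pointer is treated as a 32-bit value with the selector in the high word and the offset in the low word, and the segment base comes from a hypothetical table, seg_base, indexed by the selector’s descriptor index. A real thunk layer would read the base from the actual descriptor and would also check the offset against the segment limit, which this sketch omits.

```c
#include <stdint.h>

/* Hypothetical table of segment base addresses, indexed by descriptor index.
   A real thunk layer would read the base from the LDT descriptor instead. */
#define MAX_DESCRIPTORS 8192
uint32_t seg_base[MAX_DESCRIPTORS];

/* Convert a 16:16 far pointer (selector:offset packed into 32 bits)
   to a flat 32-bit linear address. */
uint32_t far16_to_flat(uint32_t farptr)
{
    uint16_t selector = (uint16_t)(farptr >> 16);     /* high word */
    uint16_t offset   = (uint16_t)(farptr & 0xFFFFu); /* low word  */
    uint16_t index    = (uint16_t)(selector >> 3);    /* drop the TI and RPL bits */

    return seg_base[index] + offset;                  /* segment base + offset */
}
```

For example, with seg_base[0x1F] set to 0x00450000, the far pointer 0x00FF1234 (selector 0x00FF, offset 0x1234) converts to the flat address 0x00451234.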
Device Drivers
Device drivers are trusted components of the operating system that have full access to the
entire hardware. There are no restrictions on what device drivers can do. Each operating
system provides some way of adding new device drivers to the system. The device drivers
need to be written according to the semantics imposed by the operating system. The
device drivers are called virtual device drivers (VXDs) in Windows 95/98 and kernel-mode
device drivers in Windows NT. Windows 95 uses the LE file format for virtual device
drivers, whereas Windows NT uses the PE format. As a result, applications that use VXDs
cannot run on Windows NT; their VXDs need to be ported to Windows NT kernel-mode
device drivers.
Microsoft has come up with a Common Driver Model in Windows 98 and Windows 2000. At
this point, however, you need to port all the applications that use VXDs to Windows NT by
writing an equivalent kernel-mode driver.
Security
The major WIN32 API implementation difference between Windows 95/98 and Windows NT
is security. Windows 95/98’s implementation does not have any support for security. In all
the Win32 API functions that have SECURITY_ATTRIBUTES as one of the parameters,
Windows 95/98’s implementation simply ignores these parameters. This has some impact on
the way a developer programs. Registry APIs such as RegSaveKey and RegRestoreKey work
fine under Windows 95/98. However, under Windows NT, you need to do a few things
before you can use these functions. Windows NT has a concept of privileges, and there
are different kinds of privileges, such as Shutdown, Backup, and Restore. Before calling a
function such as RegSaveKey, you need to enable the Backup privilege in your process’s
access token. To use RegRestoreKey, you need to enable the Restore privilege, and to use
the InitiateSystemShutdown function, you need to enable the Shutdown privilege.
Under Windows 95/98, anybody can install a VXD. To install a kernel-mode device driver
under Windows NT, you need administrator privilege for security purposes. As mentioned
previously, device drivers are trusted components of the operating system and have access
to the entire hardware. By requiring privileges to install a device driver, Windows NT
prevents a guest account holder from installing a device driver that could bring the whole
system to its knees.
SUMMARY
This chapter covered the WIN32 API implementation on Windows 95/98 and Windows NT.
We discussed the differences between these two implementations with respect to address
space, process startup, toolhelp functions, multitasking, thunking, device drivers,
security, and newly added API calls.
MEMORY MANAGEMENT HAS ALWAYS been one of the most important and interesting aspects of any
operating system for serious developers. It is an aspect that no kernel developer can ignore.
Memory management, in essence, provides a thumbnail impression of any operating
system.
Microsoft has introduced major changes in the memory management of each new
operating system it has produced. Microsoft had to make these changes because it
developed all of its operating systems for Intel microprocessors, and Intel introduced
major changes in memory management support with each new microprocessor. This
chapter is a journey through the various Intel microprocessors and the memory
management changes that each one brought to the operating systems that used it.
In the segmented model, the address space is divided into segments. Proponents of the
segmented model claim that it matches the programmer’s view of memory. They claim
that a programmer views memory as different segments containing code, data, stack, and
heap. Intel 8086 supports very primitive segmentation. A segment, in the 8086 memory
model, has a predefined base address. The length of each segment is also fixed and is
equal to 64K. Some programs find a single segment insufficient. Hence, there are a
number of memory models under DOS: for example, the tiny model, which supports a single
segment for code, data, and stack together, and the small model, which allows two
segments, one for code and the other for data plus stack. This example shows how the
memory management provided by an operating system directly affects the programming
environment.
The Intel 80286 (which followed the Intel 8086) could address more than 640K of RAM.
Hence, programmers got new interface standards for accessing extended and expanded
memory from DOS. Microsoft’s second-generation operating system, Windows 3.1, could
run on the 80286 in standard mode and used the segmented model of the 80286. The 80286
provided better segmentation than the 8086. In the 80286’s model, segments can have a
programmable base address and size limit. Windows 3.1 had another mode of operation,
the enhanced mode, which required the Intel 80386 processor. In the enhanced mode,
Windows 3.1 used the paging mechanism of the 80386 to support virtual memory. The
virtual 8086 mode was also used to implement multiple DOS boxes on which DOS programs
could run.
Windows 3.1 does not make full use of the 80386’s capabilities. Windows 3.1 is a 16-bit
operating system, meaning that 16-bit addresses are used to access the memory and the
default data size is also 16 bits. To make full use of 80386’s capabilities, a 32-bit
operating system is necessary. Microsoft came up with a 32-bit operating system, Windows
NT. The rest of this chapter examines the details of Windows NT memory management.
Microsoft also developed Windows 95 after Windows NT. Since both these operating
systems run on 80386 and compatibles, their memory management schemes have a lot in
common. However, you can best appreciate the differences between Windows NT and
Windows 95/98 after we review Windows NT memory management. Therefore, we defer
this discussion until a later section of this chapter.
Windows NT is a protected operating system; that is, the behavior (or misbehavior) of one
process should not affect another process. This requires that no two processes are able to
see each other’s address space. Thus, Windows NT should provide each process with a
separate address space. Of the 4GB address space available to each process (the 80386
uses 32-bit addresses), Windows NT reserves the upper 2GB as kernel address space and
the lower 2GB as user address space, which holds the user-mode code and data. However,
not all of the address space is separate for each process. The kernel code and kernel data
space (the upper 2GB) is common to all processes; that is, the kernel-mode address space
is shared by all processes. The kernel-mode address space is protected from being
accessed by user-mode code. The system DLLs (for example, KERNEL32.DLL, USER32.DLL,
and so on) and other DLLs are mapped in user-mode space. It is inefficient to have a
separate copy of a DLL for each process. Hence, all processes using a DLL share the DLL
code, and, likewise, processes running the same executable module share its code. Such a
shared code region is protected from being modified because a process modifying shared
code can adversely affect other processes using the code.
Sharing of the kernel address space and the DLL code can be called implicit sharing.
Sometimes two processes need to share data explicitly. Windows NT enables explicit
sharing of address space through memory-mapped files. A developer can map a named file
onto a region of the address space, and further accesses to this memory area are
transparently directed to the underlying file. If two or more processes want to share some
data, they can map the same file in their respective address spaces. Processes that simply
want to share memory do not need to create a file on the hard disk; the mapping can be
backed by the paging file instead.
We now examine the 80386’s addressing capabilities and how Windows NT memory
management fits them. The Intel 80386 is a 32-bit processor: its address bus is 32 bits
wide, and its default data size is 32 bits as well. Hence, the microprocessor can address
4GB (2^32 bytes) of physical RAM. The microprocessor supports segmentation
as well as paging. To access a memory location, you need to specify a 16-bit segment
selector and a 32-bit offset within the segment. The segmentation scheme is more
advanced than that in 8086. The 8086 segments start at a fixed location and are always
64K in size. With 80386, you can specify the starting location and the segment size
separately for each segment.
Segments may overlap; that is, two segments can share address space. The necessary
information (the base address, size, and so forth) is conveyed to the processor via
segment tables. A segment selector is an index into a segment table. At any time, only
two segment tables can be active: a Global Descriptor Table (GDT) and a Local Descriptor
Table (LDT). A bit in the selector indicates whether the processor should refer to the LDT
or the GDT. Two special registers, GDTR and LDTR, point to the GDT and the LDT,
respectively. The instructions to load these registers are privileged, which means that only
operating system code can execute them.
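To make the selector layout concrete, here is a small C sketch that splits a selector into its fields. It relies on the standard 80386 layout: bits 3-15 hold the descriptor index, bit 2 is the table indicator (0 for the GDT, 1 for the LDT), and bits 0-1 hold the requested privilege level. Because each descriptor is 8 bytes long, clearing the low 3 bits also yields the descriptor’s byte offset from the start of its table. The struct and function names are our own.

```c
#include <stdint.h>

/* Fields of an 80386 segment selector. */
struct selector_fields {
    unsigned index;     /* bits 3-15: index into the descriptor table */
    unsigned uses_ldt;  /* bit 2 (TI): 0 = GDT, 1 = LDT */
    unsigned rpl;       /* bits 0-1: requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index    = sel >> 3;
    f.uses_ldt = (sel >> 2) & 1u;
    f.rpl      = sel & 3u;
    return f;
}

/* Descriptors are 8 bytes each, so index * 8 is the descriptor's byte offset
   within its table; that equals the selector with the TI and RPL bits cleared. */
uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel & 0xFFF8u);
}
```

For example, decode_selector(0x1B) yields index 3 in the GDT with RPL 3; 0x1B is the value Windows NT loads into CS for user-mode code.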
The processor compares the segment’s Descriptor Privilege Level (DPL), which is stored in
the segment descriptor, with the Requested Privilege Level (RPL) before granting access to
a segment. The RPL is given by the 2 low-order bits of the segment selector used to
specify the address. The Current Privilege Level (CPL) also plays an important role here.
The CPL is the DPL of the code segment being executed. For a data segment, the
processor grants access only if the DPL of the segment is numerically greater than or
equal to both the RPL and the CPL, that is, only if the requester is at least as privileged
as the segment. This serves as a protection mechanism for the operating system. The CPL
of the processor can vary between 0 and 3 (because 2 bits are assigned for CPL). The
operating system code generally runs at CPL=0, also called ring 0, while user processes
run at ring 3. In addition, all the segments belonging to the operating system are allotted
DPL=0. This arrangement ensures that user-mode code cannot access the operating
system’s memory segments.
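The data-segment rule can be written as a one-line check. This sketch covers only data-segment access; transfers to code segments and calls through callgates follow different rules that it does not model. Privilege level 0 is the most privileged, so the effective level of a request is the numerically larger of CPL and RPL.

```c
/* Privilege levels run from 0 (most privileged) to 3 (least privileged). */

/* Data-segment access check: the request's effective privilege level is the
   less privileged (numerically larger) of CPL and RPL, and it must not be
   larger than the segment's DPL. */
int data_segment_access_ok(unsigned cpl, unsigned rpl, unsigned dpl)
{
    unsigned effective = (cpl > rpl) ? cpl : rpl;
    return effective <= dpl;
}
```

A call such as data_segment_access_ok(0, 3, 0) returns 0, which shows why RPL exists: kernel code that uses a selector handed to it by a user-mode caller can keep that caller’s RPL of 3, so the access is checked as if the caller had made it.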
Consulting the segment tables, which are stored in main memory, on every memory access
would badly hurt performance. The processor solves this problem by caching segment
descriptors in special CPU registers, namely, CS (Code Selector), DS (Data Selector), SS
(Stack Selector), and the additional selector registers ES, FS, and GS; the descriptor is
loaded into a hidden part of the register whenever a selector is loaded. The first three
selector registers in this list, that is, CS, DS, and SS, act as default registers for code
access, data access, and stack access, respectively.
To access a memory location, you specify the segment and offset within that segment.
The first step in address translation is to add the base address of the segment to the
offset. This 32-bit address is the physical memory address if paging is not enabled.
Otherwise, this address, called the logical or linear address, is converted to a
physical RAM address using the page address translation mechanism (refer to Figure 4-1).
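The first translation step, base plus offset, can be sketched as follows. The segment structure is a simplified stand-in holding only a base and a limit (real descriptors also carry type, DPL, and granularity bits), and an out-of-limit offset is reported through the return value instead of the general-protection fault the processor would raise.

```c
#include <stdint.h>

/* Simplified view of a segment descriptor: only base and limit. */
struct segment {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* highest valid offset within the segment */
};

/* Segment-level translation: linear = base + offset. Returns 1 and stores the
   linear address, or returns 0 if the offset exceeds the segment limit. */
int segment_to_linear(const struct segment *seg, uint32_t offset, uint32_t *linear)
{
    if (offset > seg->limit)
        return 0;                   /* the processor would raise a fault here */
    *linear = seg->base + offset;   /* 32-bit arithmetic wraps modulo 2^32 */
    return 1;
}
```

With a flat segment (base 0, limit 0xFFFFFFFF), the linear address equals the offset, so code in such a segment can treat offsets directly as linear addresses.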
The memory management scheme is popularly known as paging because the memory is
divided into fixed-size regions called pages. On Intel processors (80386 and higher), the
size of one page is 4 kilobytes. The 32-bit address bus can access up to 4GB of RAM.
Hence, there are one million (4GB/4K) pages.
Page address translation is a logical-to-physical address mapping. Some bits in the
logical/linear address are used as an index in the page table, which provides a
logical-to-physical mapping for pages. The page translation mechanism on Intel platforms
has two levels: above the page tables sits a structure called the page table directory. As
the name suggests, a page table directory is an array of pointers to page tables. Some bits
in the linear address are used as an index in the page table directory to get the
appropriate page table to be used for address translation.
The page address translation mechanism in the 80386 requires two important data
structures to be maintained by the operating system, namely, the page table directory
and the page tables. A special register, CR3, points to the current page table directory.
This register is also called Page Directory Base Register (PDBR). A page table directory is a
4096-byte page with 1024 entries of 4 bytes each. Each entry in the page table directory
points to a page table. A page table is a 4096-byte page with 1024 entries of 4 bytes (32
bits) each. Each Page Table Entry (PTE) points to a physical page. Since there are 1 million
(2^20) pages to be addressed, 20 of the 32 bits in a PTE supply the upper 20 bits of the
physical page address. The remaining 12 bits are used to maintain attributes of the page.
Some of these attributes are access permissions. For example, you can denote a page as
read-write or read-only. A page also has an associated security bit, the user/supervisor
bit, which specifies whether the page can be accessed from user-mode code or only from
kernel-mode code. If this bit is clear, the page is a supervisor page and can be accessed
only from kernel mode. Two other bits, namely, the accessed bit and the dirty bit, indicate
the status of the page. The processor sets the accessed bit whenever the page is accessed
and sets the dirty bit whenever the page is written to. Some bits are available for
operating system use. For example, Windows NT uses one such bit for implementing
copy-on-write protection. You can also mark a page as invalid, in which case you need not
specify the physical page address. Accessing such a page generates a page fault exception.
An exception is similar to a software interrupt. The operating system can install an
exception handler and service the page faults. You’ll read more about this in the
following sections.
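The attribute bits can be decoded with a few masks. This sketch uses the 80386 bit positions (bit 0 present, bit 1 read/write, bit 2 user/supervisor, bit 5 accessed, bit 6 dirty) and checks only what a ring-3 access requires. On the 80386 itself, ring-0 code may write even to read-only pages (the write-protect control in CR0 arrived with the 80486), so kernel-mode checks are left out. The macro and function names are illustrative.

```c
#include <stdint.h>

/* 80386 page table entry flag bits (low 12 bits of a PTE). */
#define PTE_PRESENT   0x001u  /* bit 0: entry is valid */
#define PTE_WRITE     0x002u  /* bit 1: read-write if set, read-only if clear */
#define PTE_USER      0x004u  /* bit 2: user-accessible if set, supervisor-only if clear */
#define PTE_ACCESSED  0x020u  /* bit 5: set by the processor on any access */
#define PTE_DIRTY     0x040u  /* bit 6: set by the processor on a write */

/* Would a user-mode (ring 3) access to this page succeed?
   Returns 0 where the processor would raise a page fault instead. */
int user_access_ok(uint32_t pte, int is_write)
{
    if (!(pte & PTE_PRESENT))
        return 0;                       /* invalid page */
    if (!(pte & PTE_USER))
        return 0;                       /* supervisor page */
    if (is_write && !(pte & PTE_WRITE))
        return 0;                       /* read-only page */
    return 1;
}

/* Mimic the processor's bookkeeping after a successful access. */
uint32_t mark_accessed(uint32_t pte, int is_write)
{
    pte |= PTE_ACCESSED;
    if (is_write)
        pte |= PTE_DIRTY;
    return pte;
}
```

A present, user-accessible, read-only entry (like a shared DLL code page) passes user_access_ok(pte, 0) but fails user_access_ok(pte, 1), which is what keeps one process from modifying code that others are using.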
A 32-bit linear address breaks down as follows. The upper 10 bits of the linear address
are used as the page directory index, and a pointer to the corresponding page table is
obtained. The next 10 bits from the linear address are used as an index in this page table
to get the base address of the required physical page. The remaining 12 bits are used as
offset within the page and are added to the page base address to get the physical address.
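The two-level walk can be simulated in C. This is a toy model, not how Windows NT or the hardware is programmed: simulated RAM is an array of 4K pages, the value passed as cr3 is the physical address of the page directory, and only the present bit is checked. For example, linear address 0x00401234 splits into directory index 1, table index 1, and offset 0x234.

```c
#include <stdint.h>

#define ENTRIES     1024u          /* entries per directory or table */
#define PRESENT     0x1u           /* bit 0 of a PDE or PTE */
#define FRAME_MASK  0xFFFFF000u    /* upper 20 bits: physical page address */
#define SIM_FRAMES  16u            /* size of the simulated RAM, in 4K pages */

/* Simulated physical RAM: each row is one 4K page viewed as 1024 DWORDs.
   Only the page directory and the page tables need to live here. */
uint32_t sim_ram[SIM_FRAMES][ENTRIES];

/* Return the simulated page at a physical address
   (toy model: frame numbers wrap modulo SIM_FRAMES). */
static uint32_t *page_at(uint32_t phys_addr)
{
    return sim_ram[(phys_addr >> 12) % SIM_FRAMES];
}

/* Translate a linear address as the 80386 does. cr3 is the physical address of
   the page directory. Returns 1 and stores the physical address in *phys, or
   returns 0 where the processor would raise a page fault. */
int translate(uint32_t cr3, uint32_t linear, uint32_t *phys)
{
    uint32_t dir_index   = linear >> 22;            /* upper 10 bits */
    uint32_t table_index = (linear >> 12) & 0x3FFu; /* next 10 bits  */
    uint32_t offset      = linear & 0xFFFu;         /* low 12 bits   */
    uint32_t pde, pte;

    pde = page_at(cr3)[dir_index];
    if (!(pde & PRESENT))
        return 0;                                   /* no page table */

    pte = page_at(pde & FRAME_MASK)[table_index];
    if (!(pte & PRESENT))
        return 0;                                   /* page not present */

    *phys = (pte & FRAME_MASK) | offset;
    return 1;
}
```

Passing a different cr3 value gives a completely different view of memory, just as reloading the CR3 register does on a context switch.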
Process Isolation
The next question that comes to mind is, “How does Windows NT keep processes from
seeing each other’s address space?” Again, the mechanism for achieving this design goal is
simple. Windows NT maintains a separate page table directory for each process and,
based on the process in execution, switches to the corresponding page table directory.
The page table directories for different processes point to different page tables, these
page tables point to different physical pages, and only one directory is active at a time;
hence, no process can see any other process’s memory. When Windows NT switches the
execution context, it also sets the CR3 register to point to the appropriate page table
directory. The kernel-mode address space is mapped for all processes, and all page table
directories have entries for the kernel address space. However, another feature of the
80386 is used to disallow user-mode code from accessing kernel address space: all the
kernel pages are marked as supervisor pages, so user-mode code cannot access them.
point to the same physical page. Figure 4-2 shows two processes sharing a page via the
same page table entries. The DLL pages are marked as read-only so that a process
inadvertently attempting to write to this area will not cause other processes to crash.
Note: This is guaranteed to be the case when xxxx==yyyy. However, if xxxx!=yyyy, the physical page might not be the
same. We will discuss the reason behind this later in the chapter.
Kernel address space is shared using a similar technique. Because the entire kernel space
is common to all processes, Windows NT can share page tables directly. Figure 4-3 shows
how processes share physical pages by using the same page tables. Consequently, the
entries in the upper half of the page table directory are the same for all processes.
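#include <windows.h>
#include <string.h>
#include <stdio.h>
#include "gate.h"               /* prototypes for the callgate functions */

void GetPageDirectory(void);    /* ring-0 stub, defined in RING0.ASM */

DWORD PageDirectory[1024];      /* user-mode copy of the 1024 page directory entries */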
This initial portion of the SHOWDIR.C file contains, apart from the header inclusion, the
global definition for the array to hold the page directory. The inclusion of the header file
GATE.H is of interest. This header file prototypes the functions for using the callgate
mechanism. Using the callgate mechanism, you can execute your code in the kernel mode
without writing a new device driver.
For this sample program, we need this mechanism because the page directory is not
accessible to the user-mode code. For now, it’s sufficient to know that the mechanism
allows a function inside a normal executable to be executed in kernel mode. Turning to
the definition of the page directory array, recall that the size of each
directory entry is 4 bytes and a page directory contains 1024 entries. Hence, the
PageDirectory is an array of 1024 DWORDs. Each DWORD in the array represents the
corresponding directory entry.
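void __stdcall CFuncGetPageDirectory(void) {     /* called from the ring-0 stub */
    DWORD *PageDir = (DWORD *)0xC0300000; int i; /* kernel-mode mapping of the page directory */
    for (i = 0; i < 1024; i++) PageDirectory[i] = PageDir[i];
}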
CFuncGetPageDirectory() is the function that is executed in kernel mode using the
callgate mechanism. This function simply makes a copy of the page directory in user-mode
memory so that the rest of the program, running in user mode, can access it. The page
directory is mapped at virtual address 0xC0300000 in every process’s address space. This
address is not accessible from user mode. The CFuncGetPageDirectory() function copies
1024 DWORDs from address 0xC0300000 to the global PageDirectory variable, which is
accessible to the user-mode code in the program.
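void DisplayPageDirectory()
{
    int i;
    int ctr=0;
    printf("Page directory of process %lx", GetCurrentProcessId()); /* header text is illustrative */
    for (i=0; i<1024; i++) {
        if (PageDirectory[i]&0x01) {          /* skip invalid entries */
            if ((ctr%3)==0) {
                printf("\n");                 /* three entries per line */
            }
            /* base of the 4MB logical range : physical address of its page table */
            printf("%08lx:%08lx ", (DWORD)i << 22, PageDirectory[i] & 0xFFFFF000);
            ctr++;
        }
    }
    printf("\n");
}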
The DisplayPageDirectory() function operates in user mode and prints the PageDirectory
array that is initialized by the CFuncGetPageDirectory() function. The function checks the
Least Significant Bit (LSB) of each of the entries. A page directory entry is valid only if the
last bit, the LSB, is set. The function skips printing invalid entries. The function prints
three entries on every line; in other words, it prints a newline character before every
third entry. Each directory entry is printed as a logical address and the address of the
corresponding page table as obtained from the page directory. As described earlier, the
first 10 bits (the 10 Most Significant Bits [MSBs]) of the logical address are used as an
index in the page directory. In other words, the directory entry at index i represents the
logical addresses that have i as their first 10 bits. The function prints the base of the
logical address range for each directory entry. The base address (that is, the lowest
address in the range) has the last 22 bits (the 22 LSBs) as zeros, so the function obtains
it by shifting i left by 22 bits. The address of the page table corresponding to the logical
address is stored in the first 20 bits (the 20 MSBs) of the page directory entry. The 12
LSBs are the flags for the entry. The function calculates the page table address by
masking off the flag bits.
int main(void)
{
    WORD CallGateSelector;
    int rc;
    static WORD farcall[3];   /* far pointer: 32-bit offset, then 16-bit selector */

    /* Create a callgate so that GetPageDirectory() can be called
     * from Ring 3 */
    rc = CreateCallGate(GetPageDirectory, 0,
                        &CallGateSelector);
    if (rc == SUCCESS) {
        farcall[2] = CallGateSelector;   /* offset is ignored for callgates */
        _asm {
            call fword ptr [farcall]
        }
        DisplayPageDirectory();
        getchar();

        rc=FreeCallGate(CallGateSelector);
        if (rc!=SUCCESS) {
            printf("FreeCallGate failed, "   /* message text is illustrative */
                   "CallGateSelector=%x, rc=%x\n",
                   CallGateSelector, rc);
        }
    } else {
        printf("CreateCallGate failed, rc=%x\n", rc);   /* illustrative */
    }
    return 0;
}
The main() function starts by creating a callgate that sets up the GetPageDirectory()
function to be executed in kernel mode. The GetPageDirectory() function is written in
assembly language and is part of the RING0.ASM file. The CreateCallGate() function,
used by the program to create the callgate, is provided by CALLGATE.DLL. The function
returns a callgate selector.
XREF: The mechanism of calling the desired function through a callgate is explained in Chapter 10.
We’ll quickly mention a few important points here. The callgate selector returned by
CreateCallGate() is a segment selector for the given function: in this case,
GetPageDirectory(). To invoke the function pointed to by the callgate selector, you need to
issue a far call instruction. The far call instruction expects a 16-bit segment selector and a
32-bit offset within the segment. When you are calling through a callgate, the offset does
not matter; the processor always jumps to the start of the function pointed to by the
callgate. Hence, the program initializes only the third member of the farcall array, which
holds the segment selector (the first two members form the 32-bit offset). Issuing a call
through the callgate transfers execution control to the GetPageDirectory() function. This
function calls the CFuncGetPageDirectory() function, which copies the page directory into
the PageDirectory array. After the callgate call returns, the program prints the copy in
PageDirectory by calling the DisplayPageDirectory() function. The program frees the
callgate before exiting.
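The 6-byte operand that a far call through memory reads can be laid out by hand, which shows why setting only the third 16-bit element of farcall is enough. This sketch assumes the standard m16:32 operand layout used by a 32-bit far call (a 4-byte offset followed by a 2-byte selector, little-endian); the function name is hypothetical.

```c
#include <stdint.h>

/* Build the 6-byte m16:32 far pointer that "call fword ptr [...]" reads:
   bytes 0-3 hold the 32-bit offset and bytes 4-5 hold the 16-bit selector,
   both little-endian. Viewed as three 16-bit words, the selector is word [2]. */
void make_far_pointer(unsigned char out[6], uint32_t offset, uint16_t selector)
{
    out[0] = (unsigned char)(offset & 0xFFu);
    out[1] = (unsigned char)((offset >> 8) & 0xFFu);
    out[2] = (unsigned char)((offset >> 16) & 0xFFu);
    out[3] = (unsigned char)((offset >> 24) & 0xFFu);
    out[4] = (unsigned char)(selector & 0xFFu);
    out[5] = (unsigned char)(selector >> 8);
}
```

With a zero offset, bytes 0 through 3 stay zero and only bytes 4 and 5 carry information, which matches the program leaving farcall[0] and farcall[1] untouched because the callgate supplies the entry point.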
.386
.model small
.code
include ..\include\undocnt.inc
public _GetPageDirectory
extrn _CFuncGetPageDirectory@0:near
_GetPageDirectory proc
Ring0Prolog
call _CFuncGetPageDirectory@0
Ring0Epilog
retf
_GetPageDirectory endp
END
The function to be called from the callgate needs to be written in assembly language for a
couple of reasons. First, the function needs to execute a prolog and an epilog, both of
which are assembly macros, to allow paging in kernel mode. Second, the function needs to
issue a far return at the end. The function leaves the rest of the job to the
CFuncGetPageDirectory() function written in C.
If you compare the output of the showdir program for two different processes, you find
that the upper half of the page table directories for the two processes is exactly the same
except for two entries. In other words, the corresponding kernel address space for these
two entries is not shared by the two processes.
Output of showdir for two processes (final line of each listing):
ff800000:0039a000 ffc00000:00031000
ff800000:0039a000 ffc00000:00031000
Let’s analyze, one step at a time, why the two entries are different. The page tables
themselves need to be mapped onto some linear address range, which Windows NT uses
whenever it needs to access them. To represent 4GB of memory divided into 1M pages of
4K each, we need 1K page tables, each having 1K entries. To map these 1K page tables
(1K × 4K = 4MB), Windows NT reserves 4MB of linear address space in each process. As
we saw earlier, each process has a different set of page tables. Whatever the process,
Windows NT maps the page tables on the linear address range from 0xC0000000 to
0xC03FFFFF. Let’s call this linear address range the page table address range. The page
table address range therefore maps to different page tables, that is, to different physical
pages, for different processes. As you may have noticed, the page table address range
falls in the kernel address space. Windows NT cannot map this crucial system data
structure in the user address space and allow user-mode processes to tamper with it. The
result is that two processes cannot share pages in the page table address range even
though the addresses lie in the kernel-mode address range.
Exactly one page table is required to map a 4MB address space because each page table
has 1K entries and each entry corresponds to a 4K page. Consequently, Windows NT cannot
share the page table corresponding to the page table address range. This accounts for one
of the two mysterious entries in the page table directory. However, the entry’s mystery
does not end here; there is one more subtle twist to this story. The physical address
specified in this entry matches the physical address of the page table directory. The
obvious conclusion is that the page table directory also acts as the page table for the
page table address range. This is possible because the formats of the page table directory
entry and the PTE are the same on the 80386.
The processor carries out an interesting sequence of actions when the linear address
within the page table address range is translated to a physical address. Let’s say that the
CR3 register points to page X. As the first step in the address translation process, the
processor treats the page X as the page table directory and finds out the page table for
the given linear address. The page table happens to be page X again. The processor now
treats page X as the required page table and finds out the physical address from it. A more
interesting case occurs when the operating system is accessing the page table directory
itself. In this case, the physical address also falls in page X!
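The self-mapping arithmetic can be checked with a few lines of C. The constants follow the layout described above (page tables visible starting at 0xC0000000, one 4-byte PTE per 4K page); the function names are our own, and the functions only compute addresses without touching memory. Note that the page directory lands at 0xC0300000, the address CFuncGetPageDirectory() reads from.

```c
#include <stdint.h>

#define PTE_BASE 0xC0000000u   /* start of the page table address range */

/* Page directory index of the self-mapping entry: the entry that covers
   PTE_BASE and points back at the page directory itself. */
uint32_t self_map_index(void)
{
    return PTE_BASE >> 22;
}

/* Linear address at which the PTE for va is visible: the page tables appear
   back to back, one 4-byte PTE per 4K page of linear address space. */
uint32_t pte_address(uint32_t va)
{
    return PTE_BASE + (va >> 12) * 4u;
}

/* The page directory doubles as the page table for the page table address
   range, so its own linear address is the PTE address of PTE_BASE. */
uint32_t page_directory_address(void)
{
    return pte_address(PTE_BASE);
}

/* Linear address of the page directory entry that covers va. */
uint32_t pde_address(uint32_t va)
{
    return page_directory_address() + (va >> 22) * 4u;
}
```

Applying pte_address() twice to any address lands on the page directory entry for that address, for example 0xC0300004 for 0x00401234. This is the address-level counterpart of the processor treating page X first as the directory and then as the page table.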
Let’s now turn to the second mysterious entry. The 4MB area covered by this page
directory entry is internally referred to as hyperspace. This area is used for mapping
physical pages belonging to other processes into the current virtual address space. For
example, a function such as MmMapPageInHyperSpace() uses the virtual addresses in this
range. This area is also used during the early stages of process creation. For example,
when a parent process such as PROGMAN.EXE spawns a child process such as
NOTEPAD.EXE, PROGMAN.EXE has to create the address space for NOTEPAD.EXE. This is
done as part of the MmCreateProcessAddressSpace() function. Before any process can
start, an address space must be created for it, and an address space is essentially a page
directory. The upper-half entries of the page directory are common to all processes except
for the two entries that we have already discussed, and these entries need to be created
for the process being spawned. The MmCreateProcessAddressSpace() function allocates
three pages of memory: the first page for the page directory, the second page for holding
the hyperspace page table entries, and the third page for holding the working set
information for the process being spawned.
Once these pages are allocated, the function maps the first physical page (the new page
directory) into hyperspace using the MmMapPageInHyperSpace() function. Note that
MmMapPageInHyperSpace() runs in the context of PROGMAN.EXE. The function then
copies the entries in the upper half of the current page directory to the mapped
hyperspace virtual address. In short, PROGMAN.EXE creates the page directory for
NOTEPAD.EXE.
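The directory-building step can be modeled in C. This is a hypothetical sketch, not the MmCreateProcessAddressSpace() code: the array child stands in for the new page-directory page as seen through its hyperspace mapping, the physical addresses are passed in rather than allocated, the hyperspace entry is assumed to be index 0x301 (hyperspace at 0xC0400000, an assumption not stated in the text), and PDE flags are reduced to present plus read/write.

```c
#include <stdint.h>
#include <string.h>

#define ENTRIES         1024u
#define KERNEL_FIRST    512u      /* upper half: linear addresses 0x80000000 and above */
#define SELF_MAP_INDEX  0x300u    /* covers 0xC0000000, the page table address range */
#define HYPER_INDEX     0x301u    /* assumption: hyperspace starts at 0xC0400000 */
#define PDE_VALID_RW    0x003u    /* present + read/write (simplified flags) */

/* Hypothetical model: fill in a child's page directory from the parent's. */
void build_child_directory(const uint32_t parent[ENTRIES], uint32_t child[ENTRIES],
                           uint32_t child_dir_phys, uint32_t child_hyper_pt_phys)
{
    /* The user half starts empty; it is filled in later as the child maps its image. */
    memset(child, 0, KERNEL_FIRST * sizeof(uint32_t));

    /* The kernel half is shared: copy the parent's entries verbatim. */
    memcpy(child + KERNEL_FIRST, parent + KERNEL_FIRST,
           (ENTRIES - KERNEL_FIRST) * sizeof(uint32_t));

    /* The two per-process kernel entries must point at the child's own pages. */
    child[SELF_MAP_INDEX] = child_dir_phys | PDE_VALID_RW;
    child[HYPER_INDEX]    = child_hyper_pt_phys | PDE_VALID_RW;
}
```

Comparing the child and parent directories afterward reproduces what the showdir comparison found: the upper halves differ only at the self-mapping and hyperspace entries.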
Windows NT supports memory-mapped files. When two processes map the same file, they
share the same set of physical pages. Hence, memory-mapped files can be used for
sharing memory. In fact, Windows NT itself uses memory-mapped files to load DLLs and
executables. If two processes map the same DLL, they automatically share the DLL pages.
Memory-mapped files are implemented using section objects under Windows NT. A data
structure called PROTOPTE is associated with each section object. This is a
variable-length structure whose size depends on the size of the section: it contains a
4-byte entry for each page in the virtual address space mapped by the section object.
Each 4-byte entry has the same structure as a PTE. When the page is not being used by
any of the processes, the PROTOPTE entry is invalid and contains enough information to
get the page back. In this case, the CPU PTE contains the fixed value 0xFFFFF480, which
indicates that accessing this page will be treated as a PROTOPTE fault.
Now comes the toughest of all questions: "How can Windows NT give away 4GB of memory
to each process when there is far less physical RAM available on the board?" Windows NT,
as well as all other operating systems that allow more address space than actual physical
memory, uses a technique called virtual memory to achieve this. In the next section, we
discuss virtual memory management in Windows NT.
If a page is not mapped onto physical RAM, Windows NT marks the page as invalid. Any
access to this page causes a page fault, and the page fault handler can bring in the page
from the secondary storage. To be more specific, when the page contains DLL code or
executable module code, the page is brought in from the DLL or executable file. When the
page contains data, it is brought in from the swap file. When the page represents a
memory-mapped file area, it is brought in from the corresponding file. Windows NT needs
to keep track of free physical RAM so that it can allocate space for a page brought in from
secondary storage in case of a page fault. This information is maintained in a kernel data
structure called the Page Frame Database (PFD). The PFD also maintains a FIFO list of in-
memory pages so that it can decide on pages to throw out in case of a space crunch.
Before throwing out a page, Windows NT must ensure that the page is not dirty.
Otherwise, it needs to write that page to secondary storage before throwing it out. If the
page is not shared, the PFD contains the pointer to PTE so that if the operating system
decides to throw out a particular page, it can then go back and mark the PTE as invalid. If
the page is shared, the PFD contains a pointer to the corresponding PROTOPTE entry. In
this case, the PFD also contains a reference count for the page. A page can be thrown out
only if its reference count is 0. In general, the PFD maintains the status of every physical
page.
The PFD is an array of 24-byte entries, one for each physical page. Hence, the number of
entries in this array equals the number of physical pages, which is stored in the kernel
variable MmNumberOfPhysicalPages. The pointer to this array is stored in the kernel
variable MmPfnDatabase. A physical page can be in one of several states: for example, it
can be in use, free, free but dirty, and so on. Each PFD entry is linked into a doubly linked
list that depends on the state of the physical page it represents. For example, the PFD
entry representing a free page is linked into the free pages list. Figure 4-4 shows these
lists linked through the PFD. The forward links are shown on the left side of the PFD, and
the backward links are shown on the right side.
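The size of the PFD follows directly from the entry size. Here is a small C sketch of the arithmetic, assuming 4KB x86 pages and ignoring any memory the firmware reserves; the function names are hypothetical.

```c
#include <stdint.h>

#define NT_PAGE_SIZE   4096u   /* x86 page size */
#define PFD_ENTRY_SIZE 24u     /* one PFD entry, as described in the text */

/* Number of physical pages, i.e. what MmNumberOfPhysicalPages would
   hold for this much RAM (firmware-reserved memory ignored) */
uint32_t PhysicalPageCount(uint64_t ramBytes)
{
    return (uint32_t)(ramBytes / NT_PAGE_SIZE);
}

/* Bytes occupied by the PFD: one 24-byte entry per physical page */
uint64_t PfdSizeBytes(uint64_t ramBytes)
{
    return (uint64_t)PhysicalPageCount(ramBytes) * PFD_ENTRY_SIZE;
}
```

For a machine with 64MB of RAM, PhysicalPageCount() gives 16,384 pages and PfdSizeBytes() gives 393,216 bytes (384KB), so the PFD costs 24/4096, or roughly 0.6 percent, of physical memory regardless of the amount installed.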
There are six kinds of lists in all. The heads of these lists are stored in the following
kernel variables:
MmStandbyPageListHead
MmModifiedNoWritePageListHead
MmModifiedPageListHead
MmFreePageListHead
MmBadPageListHead
MmZeroedPageListHead
All these list heads are structures of 16 bytes each. Here is the structure
definition:
typedef struct PageListHead {
    DWORD NumberOfPagesInList;
    DWORD TypeOfList;
    DWORD FirstPage;
    DWORD LastPage;
} PageListHead_t;
The FirstPage field can be used as an index into the PFD. Each PFD entry contains the
index of the next page in the list, so starting from FirstPage you can traverse any of the
lists. Here is the structure definition for the PFD entry:
typedef struct PfdEntry {
    DWORD NextPage;
    void *PteEntry;          /* or PpteEntry: PROTOPTE entry for a shared page */
    DWORD PrevPage;
    DWORD PteReferenceCount;
    void *OriginalPte;
    DWORD Flags;
} PfdEntry_t;
Using this, you can easily write a program to dump the PFD. However, there is one
problem: kernel variables, such as list heads, MmPfnDatabase, and
MmNumberOfPhysicalPages, are not exported. Therefore, you have to deal with absolute
addresses, which makes the program dependent on the Windows NT version and build
type.
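The list traversal itself can be sketched in user mode against a copy of the PFD. This C sketch reuses the two structure layouts shown earlier, with the pointer members stored as 32-bit values because they are addresses in the 32-bit kernel. WalkPageList() is a hypothetical helper, and it stops after NumberOfPagesInList entries because the value that terminates a list is not described here.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DWORD;

/* Layouts from the text; kernel pointers kept as 32-bit values */
typedef struct {
    DWORD NumberOfPagesInList;
    DWORD TypeOfList;
    DWORD FirstPage;
    DWORD LastPage;
} PageListHead_t;

typedef struct {
    DWORD NextPage;
    DWORD PteEntry;          /* PTE or PROTOPTE entry address */
    DWORD PrevPage;
    DWORD PteReferenceCount;
    DWORD OriginalPte;
    DWORD Flags;
} PfdEntry_t;                /* 6 x 4 = 24 bytes */

/* Follows FirstPage -> NextPage -> ... through a copy of the PFD and
   stores the page frame numbers visited in out[]. Stops after the
   number of pages recorded in the list head, when out[] is full, or
   on an out-of-range index. Returns the number of pages stored. */
size_t WalkPageList(const PfdEntry_t *pfd, size_t pfdCount,
                    const PageListHead_t *head,
                    DWORD *out, size_t maxOut)
{
    size_t n = 0;
    DWORD page = head->FirstPage;

    while (n < head->NumberOfPagesInList && n < maxOut && page < pfdCount) {
        out[n++] = page;
        page = pfd[page].NextPage;
    }
    return n;
}
```

In a real dump, the list head and the PFD array would first have to be copied from their absolute kernel addresses, for example from kernel mode through a callgate, before a function like this could walk them.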
#define _X86_
#include <ntddk.h>
#include <string.h>
#include <stdio.h>
#include "undocnt.h"
#include "gate.h"

/* Define the WIN32 calls we are using, since we cannot include both
   NTDDK.H and WINDOWS.H in the same source file */
typedef struct _OSVERSIONINFO {
    ULONG dwOSVersionInfoSize;
    ULONG dwMajorVersion;
    ULONG dwMinorVersion;
    ULONG dwBuildNumber;
    ULONG dwPlatformId;
    char szCSDVersion[128];   /* required: GetVersionEx() checks the size */
} OSVERSIONINFO, *LPOSVERSIONINFO;

/* Windows NT major version and the version-dependent offsets */
ULONG NtVersion;
ULONG PebOffset;
ULONG VadRootOffset;

/* One dumped VAD node: its kernel address and a copy of the node */
#pragma pack(1)
typedef struct _VADINFO {
    void *VadLocation;
    VAD Vad;
} VADINFO, *PVADINFO;
#pragma pack()

VADINFO VadInfoArray[MAX_VAD_ENTRIES];
int VadInfoArrayIndex;
PVAD VadTreeRoot;
The initial portion of the VADDUMP.C file has a few definitions apart from the header
inclusion. In this program, we use the callgate mechanism as we did in the SHOWDIR
program, hence the inclusion of the GATE.H header file. After the header inclusion, the
file defines the maximum number of VAD entries that we’ll process: although there is no
limit on the number of nodes in a VAD tree, our array has a fixed size. We use the callgate
mechanism for kernel-mode execution of a function that dumps the VAD tree into an array
accessible from user mode. This array can hold up to MAX_VAD_ENTRIES entries. Each entry
in the array is of type VADINFO. The VADINFO structure has two members: the address of
the VAD tree node and a copy of the VAD tree node itself. The VAD tree node structure is
defined in the UNDOCNT.H file as follows:
typedef struct vad {
    void *StartingAddress;
    void *EndingAddress;
    struct vad *ParentLink;
    struct vad *LeftLink;
    struct vad *RightLink;
    DWORD Flags;
} VAD, *PVAD;
The first two members dictate the address range represented by the VAD node. Each VAD
tree node maintains a pointer to the parent node and pointers to the left child and the
right child. The VAD tree is a binary search tree ordered by address: for every node in the
tree, the left subtree consists of nodes representing lower address ranges, and the right
subtree consists of nodes representing higher address ranges. The last member in the
VAD node holds the flags for the address range.
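This ordering is what makes the VAD tree useful: to find the node that describes a given address, you compare the address against each node’s range and descend left or right, discarding a whole subtree at each step. Here is a C sketch on a simplified node type; VadNode and FindVad() are hypothetical, addresses are plain integers, and EndingAddress is treated as inclusive.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified VAD node for illustration only */
typedef struct VadNode {
    uintptr_t StartingAddress;
    uintptr_t EndingAddress;          /* inclusive */
    struct VadNode *ParentLink;
    struct VadNode *LeftLink;
    struct VadNode *RightLink;
    uint32_t Flags;
} VadNode;

/* Returns the node whose range contains address, or NULL if no VAD
   describes it. Lower ranges live in the left subtree and higher
   ranges in the right subtree, so one comparison picks the branch. */
const VadNode *FindVad(const VadNode *node, uintptr_t address)
{
    while (node != NULL) {
        if (address < node->StartingAddress)
            node = node->LeftLink;
        else if (address > node->EndingAddress)
            node = node->RightLink;
        else
            return node;
    }
    return NULL;
}
```

An address that falls between two VAD ranges ends the search at a NULL child, which is how such a lookup can tell that the address is not part of any reserved range.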
The VADDUMP.C file has a few other global variables apart from the VadInfoArray. A
couple of global variables are used while locating the root of the VAD tree. The PEB of a
process points to the VAD tree root for that process. The offset of this pointer inside the
PEB varies with the Windows NT version. We set the VadRootOffset to the appropriate
offset value of the VAD root pointer depending on the Windows NT version. There is a
similar problem of Windows NT version dependency while accessing the PEB for the
process. We use the Thread Environment Block (TEB) to get to the PEB. One field in TEB
points to the PEB, but the offset of this field inside the TEB structure varies with the
Windows NT version. We set the PebOffset variable to the appropriate offset value of the
PEB pointer inside the TEB structure depending on the Windows NT version. Another global
variable, NtVersion, stores the version of Windows NT running on the machine.
That leaves us with two more global variables, namely, VadInfoArrayIndex and
VadTreeRoot. The VadInfoArrayIndex is the number of initialized entries in the
VadInfoArray. The VadInfoArray entries after VadInfoArrayIndex are free. The
VadTreeRoot variable stores the root of the VAD tree.
The sample has been tested on Windows NT 3.51, Windows NT 4.0, and Windows 2000 beta 2.
It should run on other versions of Windows 2000, provided the offsets of the VAD root and
the PEB remain the same.
/*
 * Walks the VAD tree in order and copies each node into the next
 * free VadInfoArray entry
 */
void VadTreeWalk(PVAD VadNode)
{
    if (VadNode == NULL) {
        return;
    }
    if (VadInfoArrayIndex >= MAX_VAD_ENTRIES) {
        return;
    }
    VadTreeWalk(VadNode->LeftLink);

    /* The left subtree may have filled up the array */
    if (VadInfoArrayIndex >= MAX_VAD_ENTRIES) {
        return;
    }
    VadInfoArray[VadInfoArrayIndex].VadLocation = VadNode;
    VadInfoArray[VadInfoArrayIndex].Vad.StartingAddress =
        VadNode->StartingAddress;
    VadInfoArray[VadInfoArrayIndex].Vad.EndingAddress =
        VadNode->EndingAddress;
    if (NtVersion == 5) {
        /* Windows 2000 stores page numbers (4KB pages): convert the
           starting page to its first byte and the ending page to its
           last byte */
        VadInfoArray[VadInfoArrayIndex].Vad.StartingAddress =
            (void *)((DWORD)VadNode->StartingAddress << 12);
        VadInfoArray[VadInfoArrayIndex].Vad.EndingAddress =
            (void *)((((DWORD)VadNode->EndingAddress + 1) << 12) - 1);
    }
    VadInfoArray[VadInfoArrayIndex].Vad.ParentLink =
        VadNode->ParentLink;
    VadInfoArray[VadInfoArrayIndex].Vad.LeftLink =
        VadNode->LeftLink;
    VadInfoArray[VadInfoArrayIndex].Vad.RightLink =
        VadNode->RightLink;
    VadInfoArray[VadInfoArrayIndex].Vad.Flags =
        VadNode->Flags;
    VadInfoArrayIndex++;

    VadTreeWalk(VadNode->RightLink);
}
The VadTreeWalk() function is executed in kernel mode using the callgate mechanism.
The function traverses the VAD tree in order and fills up the VadInfoArray. The function
simply returns if the node pointer parameter is NULL or the VadInfoArray is full.
Otherwise, the function recursively calls itself for the left subtree; the recursion
terminates when the left child pointer is NULL. The function then fills up the next free
entry in the VadInfoArray and increments the VadInfoArrayIndex to point to the next free
entry. Windows 2000 stores page numbers instead of the actual addresses in the VAD.
Hence, for Windows 2000, we need to calculate the starting address and the ending
address from the page numbers stored in these fields: the starting address is the
starting page number multiplied by the 4KB page size, and the ending address is the last
byte of the ending page, that is, (ending page number + 1) * 4096 - 1. As the last step in
the in-order traversal, the function calls itself recursively to process the right subtree.
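/* Called from DumpVad() as _CFuncDumpVad@4, that is, __stdcall with
   one 4-byte argument */
void __stdcall CFuncDumpVad(PVAD VadRoot)
{
    VadTreeRoot = VadRoot;
    VadInfoArrayIndex = 0;
    VadTreeWalk(VadRoot);
}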
The CFuncDumpVad() function is the caller of the VadTreeWalk() function. It just
initializes the global variables used by VadTreeWalk() and calls VadTreeWalk() for the
root of the VAD tree.
void VadTreeDisplay()
{
    int i;

    /* Output strings reconstructed; apart from the Vad@ column and the
       LeftLink/RightLink titles, their exact text is assumed */
    printf("VAD tree root is at %p\n\n",
           (void *)VadTreeRoot);
    printf("Vad@\t Starting\t Ending\t Parent\t "
           "LeftLink\t RightLink\n");
    for (i = 0; i < VadInfoArrayIndex; i++) {
        printf("%p %p %p %p %p %p\n",
               VadInfoArray[i].VadLocation,
               VadInfoArray[i].Vad.StartingAddress,
               VadInfoArray[i].Vad.EndingAddress,
               (void *)VadInfoArray[i].Vad.ParentLink,
               (void *)VadInfoArray[i].Vad.LeftLink,
               (void *)VadInfoArray[i].Vad.RightLink);
    }
    printf("\n\n");
}
The VadTreeDisplay() function is a very simple function that is executed in user mode. The
function iterates through all the entries initialized by the VadTreeWalk() function and
prints them. Because VadTreeWalk() stores the nodes in in-order (infix) sequence, the
entries are printed in ascending order of address.
void SetDataStructureOffsets()
{
    switch (NtVersion) {
    case 3:
        PebOffset = 0x40;
        VadRootOffset = 0x170;
        break;
    case 4:
        PebOffset = 0x44;
        VadRootOffset = 0x170;
        break;
    case 5:
        PebOffset = 0x44;
        VadRootOffset = 0x194;
        break;
    }
}
As we described earlier, the offset of the PEB pointer within TEB and the offset of the VAD
root pointer within the PEB are dependent on the Windows NT version. The
SetDataStructureOffsets() function sets the global variables indicating these offsets
depending on the Windows NT version.
int main()
{
    WORD CallGateSelector;
    int rc;
    short farcall[3];
    void DumpVad(void);
    void *ptr;
    OSVERSIONINFO VersionInfo;

    VersionInfo.dwOSVersionInfoSize = sizeof(VersionInfo);
    if (GetVersionEx(&VersionInfo) == TRUE) {
        NtVersion = VersionInfo.dwMajorVersion;
    } else {
        return 0;
    }
    SetDataStructureOffsets();

    /*
     * Create a callgate so that DumpVad() executes in kernel mode
     */
    rc = CreateCallGate(DumpVad, 0, &CallGateSelector);
    if (rc != SUCCESS) {
        printf("CreateCallGate failed, rc=%x\n", rc);
        return 1;
    }

    /* Far call through the callgate: only the selector is used */
    farcall[2] = CallGateSelector;
    _asm {
        call fword ptr [farcall]
    }
    VadTreeDisplay();

    /* Allocate memory at 0x300000 and dump the VAD tree again; the
       size and allocation type are assumed */
    ptr = VirtualAlloc((void *)0x300000, 0x10000, MEM_RESERVE | MEM_COMMIT,
                       PAGE_READONLY);
    if (ptr == NULL) {
        printf("VirtualAlloc failed\n");
        goto Quit;
    }
    _asm {
        call fword ptr [farcall]
    }
    VadTreeDisplay();

Quit:
    rc = FreeCallGate(CallGateSelector);
    if (rc != SUCCESS) {
        printf("FreeCallGate failed, CallGateSelector=%x, rc=%x\n",
               CallGateSelector, rc);
    }
    return 0;
}
The main() function starts by getting the Windows NT version and calling
SetDataStructureOffsets() to set the global variables storing the offsets for the PEB and
the VAD tree root. It then creates a callgate in the same manner as in the SHOWDIR
sample program. Issuing a call through this callgate ultimately results in the execution of
the VadTreeWalk() function that fills up the VadInfoArray. The main() function then calls
the VadTreeDisplay() function to print the VadInfoArray entries.
The sample program also shows how a memory allocation changes the VAD tree. After
printing the VAD tree once, the program allocates a chunk of memory. Then, the program
issues the callgate call again and prints the VAD tree after returning from the call. You
can observe the updates that the memory allocation made to the VAD tree. The program
frees the callgate before exiting.
.386
.model small
.code
public _DumpVad
extrn _CFuncDumpVad@4:near
extrn _PebOffset:dword
extrn _VadRootOffset:dword
include ..\include\undocnt.inc
_DumpVad proc
Ring0Prolog
; In kernel mode, FS:[124h] holds the current thread pointer
MOV EAX,FS:[00000124h]
; Load the PEB pointer stored PebOffset bytes into the thread block
ADD EAX,_PebOffset
MOV EAX,[EAX]
; Load the VAD root pointer stored VadRootOffset bytes into the PEB
ADD EAX,_VadRootOffset
MOV EAX,[EAX]
; Pass the VAD root to the __stdcall C function
PUSH EAX
CALL _CFuncDumpVad@4
Ring0Epilog
RETF
_DumpVad endp
END
The function to be called through the callgate needs to be written in assembly language
for reasons already described. The DumpVad() function gets hold of the VAD root pointer
and calls the CFuncDumpVad() function, which dumps the VAD tree into the VadInfoArray.
The function first gets the PEB from the TEB and then reads the VAD root from the PEB. In
kernel mode, FS:[124h] always holds a pointer to the thread structure of the currently
executing thread, and the code starts from this pointer. As described earlier, the offset of
the VAD root pointer inside the PEB and the offset of the PEB pointer inside the TEB vary
with the Windows NT version. The DumpVad() function uses the offset values stored in the
global variables by the SetDataStructureOffsets() function.
Listing 4-7 presents the output from an invocation of the VADDUMP program. Note that the
VAD tree printed after allocating memory at address 0x300000 shows an additional entry
for that address range.
The output of the VADDUMP program does not really look like a tree. You have to trace
through the output to get the tree structure. The entry with a null parent link is the root
of the tree. Once you find the root, you can follow the child pointers. To follow a child
pointer, search the pointer in the first column, named Vad@, in the output. The Vad entry
with the same Vad@ is the entry for the child that you are looking for. An all-zero entry
for a left/right child pointer indicates that there is no left/right subtree for the node.
Figure 4-5 shows a partial tree constructed from the output shown previously.
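The manual tracing described above can also be automated. The following C sketch assumes the dump has already been parsed into rows whose fields mirror the printed columns (VadRow, FindRow(), FindRootRow(), and PrintVadTree() are all hypothetical). It finds the root by its zero parent link and resolves each child pointer by searching the Vad@ column, exactly as you would by hand.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed row of the dump: the node's own address (Vad@), its
   range, and its links; 0 means "no link" */
typedef struct {
    uint32_t VadAt;
    uint32_t Start;
    uint32_t End;
    uint32_t Parent;
    uint32_t Left;
    uint32_t Right;
} VadRow;

/* Returns the index of the row whose Vad@ equals address, or -1 */
int FindRow(const VadRow *rows, int count, uint32_t address)
{
    if (address == 0)
        return -1;                    /* all-zero link: no subtree */
    for (int i = 0; i < count; i++)
        if (rows[i].VadAt == address)
            return i;
    return -1;
}

/* The root is the row with a null parent link */
int FindRootRow(const VadRow *rows, int count)
{
    for (int i = 0; i < count; i++)
        if (rows[i].Parent == 0)
            return i;
    return -1;
}

/* Prints the subtree at row i, indenting two spaces per level;
   the right subtree is printed first, the left subtree last */
void PrintVadTree(const VadRow *rows, int count, int i, int depth)
{
    if (i < 0)
        return;
    PrintVadTree(rows, count, FindRow(rows, count, rows[i].Right), depth + 1);
    printf("%*s%08x-%08x\n", depth * 2, "",
           (unsigned)rows[i].Start, (unsigned)rows[i].End);
    PrintVadTree(rows, count, FindRow(rows, count, rows[i].Left), depth + 1);
}
```

Calling PrintVadTree(rows, count, FindRootRow(rows, count), 0) prints the tree turned on its side: the root sits at the left margin, right subtrees appear above it, and left subtrees appear below it.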
IMPACT ON HOOKING
Now we’ll look at the impact of the memory management scheme explained in the last
section on hooking DLL API calls. To hook a function from a DLL, you need to change the
first few bytes of the function code. As you saw earlier, the DLL code is shared by all
processes and is write-protected so that a misbehaving process cannot affect other
processes. Does this mean that you cannot hook a function in Windows NT? The answer is,
“Hooking is possible under Windows NT, but you need to do a bit more work to comply with
stability requirements.” Windows NT provides a Win32 API function, VirtualProtect(), that
you can use to change page attributes. Hence, hooking becomes a two-step process:
change the attributes of the page containing the DLL code to read-write, and then change
the code bytes.
Copy-on-Write
“Eureka!” you might say, “I violated Windows NT security. I wrote to a shared page that
other processes also use.” No! You did not do that. You changed only your copy of the DLL
code. The DLL code page was shared only as long as you did not write to it. The moment
you wrote to that page, a separate copy of it was made, and the writes went to this copy.
All other processes are safely using the original copy of the page. This is how Windows NT
protects processes from each other while consuming as few resources as possible.
The VirtualProtect() function does not actually mark the page as read-write; it keeps the
page read-only. Nevertheless, to distinguish this page from normal read-only pages, it
marks the page for copy-on-write. Windows NT uses one of the available PTE bits for doing
this. When the page is written to, the processor raises a page fault exception because the
page is read-only. The page fault handler makes a copy of the page and modifies the page
table of the faulting process accordingly. The new copy is marked as read-write so that the
process can write to it.
Windows NT itself uses the copy-on-write mechanism for various purposes. The DLL data
pages are shared with the copy-on-write mark. Hence, whenever a process writes to a
data page, it gets a personal copy of it. Other processes keep sharing the original copy,
thus maximizing the sharing and improving memory usage.
A DLL may be loaded at different linear addresses in different processes. The memory
references in the DLL, for example, the target address of a call instruction or the
address used by a memory-to-register move instruction, need to be adjusted (patched)
depending on the linear address where the DLL gets loaded. This process is called
relocating the DLL. Obviously, relocation has to be done separately for each process.
While relocating, Windows NT temporarily marks the DLL code pages as copy-on-write. Thus,
only the pages that contain relocations are copied per process. Other pages that do not
have memory references in them are shared by all processes.
This is the reason Microsoft recommends that a DLL be given a preferred base address and
be loaded at that address. A DLL that is loaded at its preferred base address does not
need to be relocated. Hence, if all processes load the DLL at the preferred base address,
all of them can share the same copy of the DLL code.
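The arithmetic behind relocation is simple: every absolute address in the image is shifted by the difference between the actual and the preferred base address. This is a simplified C sketch, assuming the relocation information has already been reduced to a list of offsets of 32-bit absolute addresses (the real PE relocation format is more involved); ApplyRelocations() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Patches each 32-bit absolute address at image + fixupOffsets[i] for
   an image linked at preferredBase but loaded at actualBase. Returns
   the number of locations written (0 when no relocation is needed). */
size_t ApplyRelocations(uint8_t *image, const uint32_t *fixupOffsets,
                        size_t fixupCount, uint32_t preferredBase,
                        uint32_t actualBase)
{
    /* Unsigned arithmetic wraps, so a lower load address also works */
    uint32_t delta = actualBase - preferredBase;

    if (delta == 0)
        return 0;                    /* preferred base: nothing to patch */

    for (size_t i = 0; i < fixupCount; i++) {
        uint32_t value;
        memcpy(&value, image + fixupOffsets[i], sizeof value);
        value += delta;
        memcpy(image + fixupOffsets[i], &value, sizeof value);
    }
    return fixupCount;
}
```

When the DLL is loaded at its preferred base, the delta is zero and no byte is written, so no page has to be copied. Otherwise, only the pages that contain at least one fixup are written, and therefore only those pages stop being shared.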
The POSIX subsystem of Windows NT uses the copy-on-write mechanism to implement the
fork system call. The fork system call creates a new process as a child of the calling
process. The child process is a replica of the parent process, and it has the same state of
code and data pages as the parent. Since these are two different processes, the data
pages should not be shared by them. However, making a copy of the parent’s data pages is
generally wasteful because in most cases the child immediately invokes the exec system
call. The exec system call discards the current memory image of the process, loads a new
executable module, and starts executing it. To avoid copying the data pages, the fork
system call marks them as copy-on-write. Hence, a data page is copied only if the parent
or the child writes to it.
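The fork() semantics can be observed directly. In this C sketch, which runs on any POSIX system rather than specifically on the Windows NT POSIX subsystem, the child increments a global variable and reports its value through its exit status, while the parent’s copy stays unchanged. Note that a program can observe only the semantics: whether the kernel copied the page lazily on the first write, as described above, is invisible to it. ForkAndCompare() is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int g_Counter = 100;

/* Forks a child that modifies g_Counter. Stores the child's value
   (returned as its exit status) in *childValue and the parent's value
   after the child exits in *parentValue. Returns 0 or -1 on error. */
int ForkAndCompare(int *childValue, int *parentValue)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: this write gets the child its own copy of the page */
        g_Counter += 1;
        _exit(g_Counter & 0xFF);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    *childValue = WEXITSTATUS(status);
    *parentValue = g_Counter;
    return 0;
}
```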
The following sample program demonstrates how copy-on-write works. By running two
instances of the program, you can see the concepts described in this section in action.
The application loads a DLL that contains two functions and two data variables. One
function does not refer to the outside world, so no relocations are required for it. The
other function accesses a global variable, so it contains instructions that need
relocation. One data variable is put in a shared data section so that it is shared across
multiple instances of the DLL. The other variable is put in the default data section. The
two functions are put in separate code sections just to make them page aligned, so that
each function lies on its own page.
When you run the first instance of the application, it loads the DLL and prints the
physical addresses of the two functions and the two data variables. Then you run the
second instance of the same application. The second instance arranges to load the DLL at
a different base address than that of the first instance and prints the physical addresses
of the two functions and the two data variables. Next, the application arranges to load the
DLL at the same base address as that of the first instance. In this case, all physical pages
are seen to be shared. Finally, the application modifies the shared and nonshared
variables, modifies the first few bytes of one function, and prints the physical addresses of
the two functions and the two variables again. We first discuss the code for this sample
program and then describe how its output demonstrates memory sharing and the effects
of the copy-on-write mechanism.
#include <windows.h>
#include <stdio.h>
#include "gate.h"
#include "getphys.h"

HANDLE hFileMapping;
HINSTANCE hDllInstance;

/* Pointers to the four kinds of memory sections in MYDLL.DLL
   (declared as void * for use with VirtualLock() and friends) */
void *NonRelocatableFunction;
void *RelocatableFunction;
void *SharedVariable;
void *NonSharedVariable;
The initial portion of the file contains the header inclusion and global variable definitions.
The program demonstrates the use of various page attributes, especially to implement the
copy-on-write mechanism. As described earlier, the program uses four different types of
memory sections. The pointers to the four different types of memory sections are defined
as global variables. The hDllInstance variable stores the instance handle of the DLL that
contains the different kinds of memory sections used in this demonstration. The
hFileMapping variable holds the named file mapping that the program uses to detect
whether another instance is already running.
/*
 * Loads MYDLL.DLL and initializes the pointers to its memory sections
 */
int LoadDllAndInitializeVirtualAddresses()
{
    hDllInstance = LoadLibrary("MYDLL.DLL");
    if (hDllInstance == NULL) {
        return -1;
    }
    /* The module handle is the base address of the loaded DLL */
    printf("MYDLL.DLL is loaded at %p\n", (void *)hDllInstance);

    NonRelocatableFunction =
        (void *)GetProcAddress(GetModuleHandle("MYDLL"),
                               "_NonRelocatableFunction@0");
    RelocatableFunction =
        (void *)GetProcAddress(GetModuleHandle("MYDLL"),
                               "_RelocatableFunction@0");
    SharedVariable =
        (void *)GetProcAddress(GetModuleHandle("MYDLL"),
                               "SharedVariable");
    NonSharedVariable =
        (void *)GetProcAddress(GetModuleHandle("MYDLL"),
                               "NonSharedVariable");
    if ((!NonRelocatableFunction) ||
        (!RelocatableFunction) ||
        (!SharedVariable) ||
        (!NonSharedVariable)) {
        FreeLibrary(hDllInstance);
        hDllInstance = 0;
        return -1;
    }

    /* Lock a byte of each section so that its page stays in memory
       and its page table entry remains valid */
    VirtualLock(NonRelocatableFunction, 1);
    VirtualLock(RelocatableFunction, 1);
    VirtualLock(SharedVariable, 1);
    VirtualLock(NonSharedVariable, 1);
    return 0;
}
The four different types of memory sections that we use for the demonstration reside in
MYDLL.DLL. The LoadDllAndInitializeVirtualAddresses() function loads MYDLL.DLL in the
calling process’s address space and initializes the global variables to point to the different
types of memory sections in the DLL. The function uses the GetProcAddress() function to
get hold of pointers to the exported functions and variables in MYDLL.DLL. The function
stores the instance handle for MYDLL.DLL in a global variable so that the FreeDll() function
can later use it to unload the DLL. The function also locks the different memory sections
so that the pages are loaded in memory and the page table entries are valid. Generally,
Windows NT does not set up a valid page table entry until the virtual address is actually
accessed. In other words, the memory won’t be paged in unless it is accessed. Also, the
system can page out memory that has not been used for some time, again marking the
page table entries as invalid. We use the VirtualLock() function to ensure that the pages of
interest stay loaded and the corresponding page table entries remain valid.
/*
 * Unlocks the memory sections and unloads MYDLL.DLL
 */
void FreeDll()
{
    VirtualUnlock(NonRelocatableFunction, 1);
    VirtualUnlock(RelocatableFunction, 1);
    VirtualUnlock(SharedVariable, 1);
    VirtualUnlock(NonSharedVariable, 1);
    FreeLibrary(hDllInstance);
    hDllInstance = 0;
    NonRelocatableFunction = NULL;
    RelocatableFunction = NULL;
    SharedVariable = NULL;
    NonSharedVariable = NULL;
}
The FreeDll() function uses the VirtualUnlock() function to unlock the memory locations
locked by the LoadDllAndInitializeVirtualAddresses() function. The function unloads
MYDLL.DLL after unlocking the memory locations from the DLL. As the DLL is unloaded,
the global pointers to the memory sections in the DLL become invalid. The function sets
all these pointers to NULL as a matter of good programming practice.
*/
strcpy(buffer, "");
return buffer;
The GetPageAttributesString() function returns a string with characters showing the page
attributes given the page attribute flags. The LSB in the page attributes indicates whether
the page is present in memory or the page table entry is invalid. This information is
printed as P or NP, which stands for present or not present. Similarly, R or RW means a
read-only or read-write page; S or U means a supervisor-mode or a user-mode page; and D
means a dirty page. The various page attributes are represented by different bits in the
PageAttr parameter to this function. The function checks the bits and determines whether
the page possesses the particular attributes.
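The attribute decoding comes down to testing individual bits of the x86 PTE: bit 0 (present), bit 1 (read/write), bit 2 (user/supervisor), and bit 6 (dirty). Here is a C sketch of such a decoder. DescribePteBits() is a hypothetical helper, not the sample’s GetPageAttributesString(), and its output format is only an assumption modeled on the P/NP, R/RW, S/U, and D letters described above.

```c
#include <stdint.h>
#include <string.h>

/* x86 PTE bits behind the P/NP, R/RW, S/U and D flags */
#define PTE_PRESENT 0x001u   /* bit 0 */
#define PTE_WRITE   0x002u   /* bit 1 */
#define PTE_USER    0x004u   /* bit 2 */
#define PTE_DIRTY   0x040u   /* bit 6 */

/* Formats the attribute bits as space-separated flags into buf,
   which must hold at least 12 bytes. Returns buf. */
char *DescribePteBits(uint32_t pte, char *buf)
{
    strcpy(buf, (pte & PTE_PRESENT) ? "P" : "NP");
    strcat(buf, (pte & PTE_WRITE) ? " RW" : " R");
    strcat(buf, (pte & PTE_USER) ? " U" : " S");
    if (pte & PTE_DIRTY)
        strcat(buf, " D");
    return buf;
}
```

Other bits, such as the accessed bit (bit 5), are simply ignored by this decoder.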
/*
 * Prints the virtual address, physical address, and page attributes
 * of each of the four memory sections in MYDLL.DLL
 */
int DisplayVirtualAndPhysicalAddresses()
{
    /* p... = physical address, a... = page attributes */
    DWORD pNonRelocatableFunction = 0;
    DWORD pRelocatableFunction = 0;
    DWORD pSharedVariable = 0;
    DWORD pNonSharedVariable = 0;
    DWORD aNonRelocatableFunction = 0;
    DWORD aRelocatableFunction = 0;
    DWORD aSharedVariable = 0;
    DWORD aNonSharedVariable = 0;

    /* Column titles and row formats are assumed */
    printf("\n------------------------------------\n");
    printf("Section                 Virtual   Physical  Attributes\n");
    printf("--------------------------------------\n");
    GetPhysicalAddressAndPageAttributes(
        NonRelocatableFunction,
        &pNonRelocatableFunction, &aNonRelocatableFunction);
    GetPhysicalAddressAndPageAttributes(
        RelocatableFunction,
        &pRelocatableFunction, &aRelocatableFunction);
    GetPhysicalAddressAndPageAttributes(
        SharedVariable,
        &pSharedVariable,
        &aSharedVariable);
    GetPhysicalAddressAndPageAttributes(
        NonSharedVariable,
        &pNonSharedVariable,
        &aNonSharedVariable);
    printf("NonRelocatableFunction  %p  %08lx  %s\n",
           NonRelocatableFunction,
           pNonRelocatableFunction,
           GetPageAttributesString(
               aNonRelocatableFunction));
    printf("RelocatableFunction     %p  %08lx  %s\n",
           RelocatableFunction,
           pRelocatableFunction,
           GetPageAttributesString(
               aRelocatableFunction));
    printf("SharedVariable          %p  %08lx  %s\n",
           SharedVariable,
           pSharedVariable,
           GetPageAttributesString(
               aSharedVariable));
    printf("NonSharedVariable       %p  %08lx  %s\n",
           NonSharedVariable,
           pNonSharedVariable,
           GetPageAttributesString(
               aNonSharedVariable));
    printf("------------------------------------\n\n");
    return 0;
}
int FirstInstance()
{
    if (LoadDllAndInitializeVirtualAddresses() != 0) {
        return -1;
    }
    DisplayVirtualAndPhysicalAddresses();

    /* Keep MYDLL.DLL loaded while the second instance runs */
    printf("Run another instance of this program, then press Enter\n");
    getchar();
    FreeDll();
    return 0;
}
We want to demonstrate the sharing of memory sections by the DLL loaded by two
different processes. You need to run two instances of the demonstration program. The
FirstInstance() function is executed when you run the first instance of the program. The
first instance loads the DLL and prints the physical addresses and page attributes for the
various memory sections in the DLL. Then, the function asks you to run another instance
of the program. Now there are two processes that loaded MYDLL.DLL. You can compare
the outputs from these two instances to check how the memory sections are shared. More
on this when we explain the output from this sample program.
int NonFirstInstance()
{
    DWORD OldAttr;
    HINSTANCE hJunk;

    printf("***Second instance of the program***\n\n");

    /* JUNK.DLL occupies MYDLL.DLL's preferred base address, so
       MYDLL.DLL is loaded (and relocated) elsewhere */
    hJunk = LoadLibrary("JUNK.DLL");
    if (hJunk == NULL) {
        return -1;
    }
    if (LoadDllAndInitializeVirtualAddresses() != 0) {
        FreeLibrary(hJunk);
        return -1;
    }
    FreeLibrary(hJunk);
    DisplayVirtualAndPhysicalAddresses();
    FreeDll();

    /* Load MYDLL.DLL again, this time at its preferred base address */
    if (LoadDllAndInitializeVirtualAddresses() != 0) {
        return -1;
    }
    DisplayVirtualAndPhysicalAddresses();

    printf("Modifying the variables and the first byte of "
           "NonRelocatableFunction\n");
    VirtualProtect(NonRelocatableFunction, 1, PAGE_READWRITE,
                   &OldAttr);
    *(char *)NonRelocatableFunction = 0x10;   /* value is arbitrary */
    *(char *)SharedVariable = 0x10;
    *(char *)NonSharedVariable = 0x10;
    DisplayVirtualAndPhysicalAddresses();
    FreeDll();
    return 0;
}
The second instance of the program does a lot more work than the first instance. The
sharing of the DLL memory sections depends on the way the instance loads the DLL and
accesses the memory locations in the DLL. In more concrete terms, the sharing depends on
whether the second instance loads the DLL at the same base address as the first instance.
It also depends on whether the instances only read the memory sections or any of the
instances write to the memory sections. To demonstrate this, the NonFirstInstance()
function first loads the DLL at a different base address than the first instance. The
function ensures that the DLL is loaded at a different base address by loading JUNK.DLL
before loading MYDLL.DLL. JUNK.DLL has the same preferred base address as that of
MYDLL.DLL. The first instance loads MYDLL.DLL at its preferred base address by default. In
the second instance, MYDLL.DLL cannot be loaded at its preferred base address because
the address range is already occupied by JUNK.DLL. After MYDLL.DLL is loaded at a
different base address, there is no reason for the program to keep JUNK.DLL loaded, and
so it frees the JUNK.DLL instance. Next, the function prints the physical addresses and
page attributes of the memory sections in MYDLL.DLL using the
DisplayVirtualAndPhysicalAddresses() function. The information printed here can be
compared with the output of the first instance of the program to get an idea of how the
DLLs loaded at different base addresses share the memory sections.
int DecideTheInstanceAndAct()
{
    /* Create (or open) a named 4KB mapping backed by the paging file */
    hFileMapping = CreateFileMapping(
        INVALID_HANDLE_VALUE,
        NULL,
        PAGE_READWRITE,
        0,
        0x1000,
        "MyFileMapping");
    if (hFileMapping == NULL) {
        FreeDll();
        return -1;
    }

    /* An existing mapping means another instance is already running */
    if (GetLastError() == ERROR_ALREADY_EXISTS) {
        NonFirstInstance();
    } else {
        FirstInstance();
    }
    return 0;
}
The sample program does not accept any parameter that indicates whether it is the first instance. Instead, it uses a simple trick: it creates a named file mapping. If a mapping with the same name already exists, the CreateFileMapping() API function opens the existing mapping and sets the last error to ERROR_ALREADY_EXISTS. This indicates that an instance that created the file mapping is already running. In other words, if the program creates a new named file mapping, it is the first instance of the program. Otherwise, another instance (that is, the first instance) is already running, and the current instance is the second instance. Depending on the outcome, the DecideTheInstanceAndAct() function calls either the FirstInstance() function or the NonFirstInstance() function. The operating system automatically destroys a file mapping when its reference count drops to zero. The sample program does not explicitly close its handle to the mapping; the handle is closed, and the reference count for the mapping is decremented, when the program exits. The mapping is therefore freed when the last instance of the program exits.
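The first-instance check relies on two properties of named kernel objects: creating an object whose name already exists opens the existing object instead (and reports ERROR_ALREADY_EXISTS), and the object lives until its last handle is closed. The following portable C sketch models both properties with a small in-process table so the logic can be tried anywhere. The table and the functions create_named(), close_named(), and named_exists() are hypothetical illustrations, not Windows APIs.

```c
#include <assert.h>
#include <string.h>

/* Toy named-object table: a hypothetical stand-in for the NT object
 * manager's namespace, NOT a real API. */
#define MAX_NAMED 8
#define NAME_LEN  32

typedef struct {
    char name[NAME_LEN];
    int  refcount;              /* open handles; the object dies at zero */
} NamedObject;

static NamedObject g_objects[MAX_NAMED];

/* Create-or-open by name. *already_exists plays the role of
 * GetLastError() == ERROR_ALREADY_EXISTS. Returns a slot, or -1 if full. */
static int create_named(const char *name, int *already_exists)
{
    int free_slot = -1;

    assert(name != NULL && strlen(name) < NAME_LEN);
    for (int i = 0; i < MAX_NAMED; i++) {
        if (g_objects[i].refcount > 0 &&
            strcmp(g_objects[i].name, name) == 0) {
            g_objects[i].refcount++;          /* open the existing object */
            *already_exists = 1;
            return i;
        }
        if (g_objects[i].refcount == 0 && free_slot < 0)
            free_slot = i;
    }
    *already_exists = 0;
    if (free_slot >= 0) {
        strcpy(g_objects[free_slot].name, name);
        g_objects[free_slot].refcount = 1;    /* first handle */
    }
    return free_slot;
}

/* Closing a handle drops one reference; the last close destroys it. */
static void close_named(int slot)
{
    assert(slot >= 0 && slot < MAX_NAMED && g_objects[slot].refcount > 0);
    g_objects[slot].refcount--;
}

static int named_exists(const char *name)
{
    for (int i = 0; i < MAX_NAMED; i++)
        if (g_objects[i].refcount > 0 && strcmp(g_objects[i].name, name) == 0)
            return 1;
    return 0;
}
```

Calling create_named() twice with "MyFileMapping" mirrors the first and second instances: the second call reports that the name already exists, and the object disappears only after both handles are closed.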
int main()
{
    int rc;

    /* Create the callgate used to read the page tables from this
     * application */
    rc = CreateRing0CallGate();
    if (rc != SUCCESS) {
        return -1;
    }
    DecideTheInstanceAndAct();
    FreeRing0CallGate();
    return 0;
}
The main() function starts with a call to the CreateRing0CallGate() function, which is located in the GETPHYS.C file. The sample program uses the callgate mechanism to access the page tables. As described earlier, the page tables reside in kernel memory and are not accessible to user-mode code. The CreateRing0CallGate() function sets up a callgate through which a function that reads the page tables can be executed in kernel mode. The DisplayVirtualAndPhysicalAddresses() function later uses this function to get hold of the physical address and the page attributes for a given virtual address. After creating the callgate, main() passes control to the DecideTheInstanceAndAct() function. The program frees the callgate before exiting.
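Once the callgate puts the code in ring 0, the lookup itself is plain bit arithmetic on the 32-bit page table entry (PTE): bits 12 through 31 hold the physical page frame, the low 12 bits of the virtual address select the byte within the page, and the low 12 bits of the PTE hold the attributes. The following sketch reproduces the masks used by CFuncGetPhysicalAddressAndPageAttributes() in portable C so they can be tried without a callgate; the helper name pte_decode() and the PTE values in the test are our own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_VALID        0x00000001u  /* bit 0: present */
#define PTE_FRAME_MASK   0xFFFFF000u  /* bits 12-31: page frame address */
#define PAGE_OFFSET_MASK 0x00000FFFu  /* bits 0-11 of the VA: byte in page */

/* Returns 1 and fills *phys and *attrs if the PTE is valid;
 * otherwise returns 0 and leaves both outputs zeroed. */
static int pte_decode(uint32_t pte, uint32_t va,
                      uint32_t *phys, uint32_t *attrs)
{
    assert(phys != NULL && attrs != NULL);
    *phys = 0;
    *attrs = 0;
    if (!(pte & PTE_VALID))
        return 0;
    *phys  = (pte & PTE_FRAME_MASK) + (va & PAGE_OFFSET_MASK);
    *attrs = pte & 0x00000FFFu;
    return 1;
}
```

For example, a PTE of 0x01234067 and a virtual address of 0x10000ABC give the physical address 0x01234ABC and the attributes 0x067.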
#include <windows.h>
#include <stdio.h>
#include "..\cgate\dll\gate.h"

/* Selector of the callgate; 0 means that no callgate exists */
WORD CallGateSelector = 0;
The GETPHYS.C file implements the function that accesses the page tables using the callgate mechanism. The GATE.H file is included because it contains the prototypes for the functions that manipulate callgates. The segment selector of the callgate used by the program is stored in the global variable CallGateSelector.
BOOL _stdcall
CFuncGetPhysicalAddressAndPageAttributes(
    DWORD VirtualAddress,
    DWORD *PhysicalAddress,
    DWORD *PageAttributes)
{
    DWORD *PageTableEntry;
    *PhysicalAddress = 0;
    *PageAttributes = 0;
    /* The PTE for VirtualAddress lives at 0xC0000000 + (VirtualAddress >> 12) * 4;
     * this assumes that the page table itself is present */
    PageTableEntry = (DWORD *)(0xC0000000U +
                     ((VirtualAddress&0xFFFFF000U) >> 10));
    if ((*PageTableEntry)&0x01) {       /* PTE valid? */
        *PhysicalAddress =
            ((*PageTableEntry)&0xFFFFF000U) +
            (VirtualAddress&0x00000FFFU);
        *PageAttributes = (*PageTableEntry)&0x00000FFFU;
        return TRUE;
    } else {
        return FALSE;
    }
}
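BOOL GetPhysicalAddressAndPageAttributes(
    void *VirtualAddress, DWORD *PhysicalAddress, DWORD *PageAttributes)
{
    BOOL rc;
    WORD farcall[3];    /* 32-bit offset (ignored by a callgate) + selector */
    if (!CallGateSelector) {
        return FALSE;
    }
    farcall[2] = CallGateSelector;
    _asm {
        ; reconstructed: the ring-0 stub passes EDX, ECX, EAX as arguments 1-3
        mov edx, VirtualAddress
        mov ecx, PhysicalAddress
        mov eax, PageAttributes
        call fword ptr [farcall]
        mov rc, eax
    }
    return rc;
}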
/* Ring-0 stub implemented in the assembly file below */
void _GetPhysicalAddressAndPageAttributes();

int CreateRing0CallGate()
{
    DWORD rc;

    rc = CreateCallGate(
            _GetPhysicalAddressAndPageAttributes,
            0,
            &CallGateSelector);
    return rc;
}

int FreeRing0CallGate()
{
    DWORD rc;

    rc = FreeCallGate(CallGateSelector);
    if (rc == SUCCESS) {
        CallGateSelector = 0;
    }
    return rc;
}
The FreeRing0CallGate() function is another utility function that destroys the callgate
created by the CreateCallGate() function. It uses the FreeCallGate() interface function
provided by GATE.DLL.
.386
.model small
.code
public __GetPhysicalAddressAndPageAttributes
extrn _CFuncGetPhysicalAddressAndPageAttributes@12:near
include ..\include\undocnt.inc
__GetPhysicalAddressAndPageAttributes proc
    Ring0Prolog
    ; Push the three arguments for the stdcall C function from right to
    ; left, so that EDX becomes its first argument
    push eax
    push ecx
    push edx
    call _CFuncGetPhysicalAddressAndPageAttributes@12  ; callee pops 12 bytes
    Ring0Epilog
    retf                    ; far return through the callgate
__GetPhysicalAddressAndPageAttributes endp
__GetPhysicalAddressAndPageAttributes endp
END
Listing 4-11 presents the output from the previous sample program. Note the differences
between the physical addresses and page attributes printed by the first instance and the
second instance. See if you can explain the output and match your findings with our
description that comes after this output.
[Listing 4-11: Virtual Address, Physical Address, and Page Attributes of the memory sections in MYDLL.DLL, as printed by the first instance and by the second instance]
Note the page attributes in the output of the first instance. The functions are marked read-only, as expected. The unshared variable is also marked read-only, because Windows NT tries to share data pages as well. As described earlier, such pages are marked copy-on-write: as soon as a process modifies any location in such a page, the process gets a private copy of the page to write to. The other page attributes show that the PTE is valid, the page is a user-mode page, and nobody has modified the page so far.
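The attribute bits discussed here are defined by the processor, so reading them is a matter of testing individual bits of the low 12 bits returned by the sample. The following sketch decodes the hardware-defined bits (present, read/write, user/supervisor, accessed, dirty); the structure and function names are ours, and the values in the test are made up rather than taken from Listing 4-11.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P   0x001u  /* present (valid) */
#define PTE_RW  0x002u  /* 1 = writable, 0 = read-only */
#define PTE_US  0x004u  /* 1 = user mode may access the page */
#define PTE_A   0x020u  /* accessed: set by the CPU on any access */
#define PTE_D   0x040u  /* dirty: set by the CPU on a write */

typedef struct {
    int valid, writable, user, accessed, dirty;
} PteAttributes;

/* Decodes the attribute bits of a 32-bit x86 PTE */
static PteAttributes decode_pte_attributes(uint32_t attrs)
{
    PteAttributes a;

    assert((attrs & ~0xFFFu) == 0);   /* only the low 12 bits are attributes */
    a.valid    = (attrs & PTE_P)  != 0;
    a.writable = (attrs & PTE_RW) != 0;
    a.user     = (attrs & PTE_US) != 0;
    a.accessed = (attrs & PTE_A)  != 0;
    a.dirty    = (attrs & PTE_D)  != 0;
    return a;
}
```

A value such as 0x025 would read as valid, user-mode, read-only, accessed, and not dirty, which is the combination described for the shared, unmodified pages.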
Now, compare the output from the first instance with the output from the second instance after it loaded MYDLL.DLL at a base address different from that in the first instance. As expected, the virtual addresses of all the memory sections differ from those in the first instance. The physical addresses are the same, except for the physical address of the relocatable function. This demonstrates that the code pages are marked copy-on-write: when the loader modifies a code page while performing relocation, the process gets a private copy of that page. Our nonrelocatable function does not need any relocation, so the corresponding pages are not modified. The second instance can share these pages with the first instance, and hence they have the same physical page address.
To cancel out the effects of relocation, the second instance then loads MYDLL.DLL at the same base address as that in the first instance. Yup! Now the virtual addresses match those from the first instance. Note that the physical address of the relocatable function also matches that in the output from the first instance. The loader need not relocate the function because the DLL is loaded at its preferred base address. This allows more memory sharing and provides optimal performance, which is reason enough to assign proper, nonclashing preferred base addresses to your DLLs.
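Why relocation breaks sharing: rebasing an image means rewriting every absolute address embedded in it, and each rewritten location dirties the page that holds it. The following sketch models that with a hypothetical apply_fixups() helper that adds the load delta to 32-bit absolute addresses and reports which pages it touched. It is a simplified model: the fixup list is a plain array of image offsets, not the per-page block format that real PE base relocations use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Applies 32-bit absolute fixups for an image loaded `delta` bytes away
 * from its preferred base. Returns a bitmask of the pages written
 * (bit n = page n), i.e. the pages that would lose their shared
 * copy-on-write mapping. Assumes no fixup straddles a page boundary. */
static uint32_t apply_fixups(uint8_t *image, size_t image_size,
                             const uint32_t *fixups, size_t count,
                             uint32_t delta)
{
    uint32_t dirty_pages = 0;

    assert(image_size <= 32u * TOY_PAGE_SIZE);   /* mask has 32 bits */
    if (delta == 0)
        return 0;            /* loaded at the preferred base: nothing to do */
    for (size_t i = 0; i < count; i++) {
        uint32_t value;

        assert(fixups[i] + sizeof value <= image_size);
        memcpy(&value, image + fixups[i], sizeof value);
        value += delta;                          /* rebase the address */
        memcpy(image + fixups[i], &value, sizeof value);
        dirty_pages |= 1u << (fixups[i] / TOY_PAGE_SIZE);
    }
    return dirty_pages;
}
```

With a zero delta (the DLL loaded at its preferred base) no page is touched, so every page can stay shared; with any other delta, each page that contains a fixup gets written and would need a private copy.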
This ideal share-all situation ceases to exist as soon as a process modifies some memory location. Other processes must not see these modifications, so the modifying process gets its own copy of the page. The second instance of the sample program demonstrates this by modifying the data variables and a byte at the start of the nonrelocatable function. The output shows that the physical address of the nonrelocatable function no longer matches that in the first instance. The loader does not modify the nonrelocatable function, but our own modification has the same effect on sharing. The shared variable remains a shared variable. Its physical address matches that in the first instance because all the processes accessing a shared variable are allowed to see the modifications made by other processes. But the nonshared variable now has a different physical address. The second instance cannot share the variable with the first instance and gets its own copy. The system page fault handler created the copy when we tried to write to a read-only page that was also marked copy-on-write. Note that the page is now marked read-write, so further writes go through without causing page faults. Also, note that the processor has marked the modified pages as dirty.
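The copy-on-write sequence just described (fault on a write to a read-only copy-on-write page, copy the frame, remap it read-write, and let the write set the dirty bit) can be modeled in a few lines. The following sketch uses one toy PTE per process and tiny 16-byte pages; the structures and names are hypothetical, and it does not model reference counts on shared frames.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NFRAMES    8
#define PAGE_BYTES 16            /* tiny pages keep the model small */

/* Toy PTE, not the real NT structure */
typedef struct {
    int frame;                   /* physical frame index */
    int writable;
    int copy_on_write;
    int dirty;
} ToyPte;

static uint8_t phys[NFRAMES][PAGE_BYTES];
static int frame_used[NFRAMES];

static int alloc_frame(void)
{
    for (int f = 0; f < NFRAMES; f++)
        if (!frame_used[f]) {
            frame_used[f] = 1;
            return f;
        }
    return -1;
}

/* Writes one byte through a PTE, emulating the page fault handler's
 * copy-on-write path. Returns 0 on success, -1 on an access violation
 * or when no free frame is left. */
static int cow_write(ToyPte *pte, int offset, uint8_t value)
{
    assert(offset >= 0 && offset < PAGE_BYTES);
    if (!pte->writable) {
        if (!pte->copy_on_write)
            return -1;                    /* genuinely read-only */
        int f = alloc_frame();            /* "page fault": private copy */
        if (f < 0)
            return -1;
        memcpy(phys[f], phys[pte->frame], PAGE_BYTES);
        pte->frame = f;
        pte->writable = 1;                /* later writes do not fault */
        pte->copy_on_write = 0;
    }
    phys[pte->frame][offset] = value;
    pte->dirty = 1;                       /* the CPU sets D on a write */
    return 0;
}
```

After the second process writes, it owns a new frame marked writable and dirty, while the first process still sees the original frame and its original contents.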
SWITCHING CONTEXT
As we saw earlier, Windows NT can switch the memory context to another process by setting the appropriate page directory. The 80386 processor requires that the physical address of the current page directory be held in the CR3 register. Therefore, when the Windows NT scheduler switches to a thread of another process, it simply loads the CR3 register with the page directory of that process.
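A toy model makes the effect of reloading CR3 concrete: the same virtual address walks a different page directory and lands in a different physical page. The following sketch simulates physical memory as an array and performs the 80386 two-level lookup (a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset; 4KB pages, no PAE). The memory size and the helper names are our own choices.

```c
#include <assert.h>
#include <stdint.h>

#define PHYS_BYTES (16u * 4096u)            /* 16 toy physical pages */
static uint32_t phys_mem[PHYS_BYTES / 4];   /* word-addressed memory */

static uint32_t read_phys(uint32_t pa)
{
    assert(pa % 4 == 0 && pa < PHYS_BYTES);
    return phys_mem[pa / 4];
}

static void write_phys(uint32_t pa, uint32_t value)
{
    assert(pa % 4 == 0 && pa < PHYS_BYTES);
    phys_mem[pa / 4] = value;
}

/* Two-level translation. `cr3` is the physical address of a page
 * directory; "switching processes" means passing a different cr3.
 * Returns 1 and sets *pa on success, 0 if an entry is not present. */
static int translate(uint32_t cr3, uint32_t va, uint32_t *pa)
{
    uint32_t pde = read_phys((cr3 & 0xFFFFF000u) + (va >> 22) * 4);
    if (!(pde & 1))
        return 0;
    uint32_t pte = read_phys((pde & 0xFFFFF000u) + ((va >> 12) & 0x3FFu) * 4);
    if (!(pte & 1))
        return 0;
    *pa = (pte & 0xFFFFF000u) | (va & 0xFFFu);
    return 1;
}
```

If process A's page directory sits at physical 0x0000 and process B's at 0x2000, translating 0x00401234 with each value of cr3 yields two different physical addresses, which is all a memory context switch has to achieve.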
For some API calls, such as VirtualAllocEx(), Windows NT needs to change only the memory context. The VirtualAllocEx() API call allocates memory in the address space of a process other than the calling process. Other system calls that require a memory context switch are ReadProcessMemory() and WriteProcessMemory(), which read and write, respectively, memory blocks in a process other than the calling process. Debuggers use these functions to access the memory of the process being debugged. The subsystem server processes also use these functions to access a client process's memory. The undocumented KeAttachProcess() function from the NTOSKRNL module switches the memory context to the specified process. The undocumented KeDetachProcess() function switches it back. In addition to switching the memory context, KeAttachProcess() also changes the notion of the current process. For example, if you attach to a particular process and create a mutex, the mutex is created in the context of that process. The prototypes for KeAttachProcess() and KeDetachProcess() are as follows:
VOID NTAPI KeAttachProcess(PVOID Peb);
VOID NTAPI KeDetachProcess(VOID);
The following sample demonstrates how you can use the KeAttachProcess() and KeDetachProcess() functions. The sample prints the page directories of all the processes running in the system. The complete source code is not included; only the relevant portion of the code is given. Because these functions can be called only from a device driver, we have written a device driver that provides an IOCTL demonstrating their use. We give the function that is called in response to the DeviceIoControl() call from the application. The output of the program appears in a kernel-mode debugger's window (such as SoftICE's). Getting the information back to the application is left as an exercise for the reader.
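void DisplayPageDirectory(void *Peb)
{
    /* NT maps the current page directory at 0xC0300000 */
    unsigned int *PageDirectory = (unsigned int *)0xC0300000;
    int i;
    int ctr=0;
    KeAttachProcess(Peb);
    for (i = 0; i < 1024; i++) {
        if (PageDirectory[i]&0x01) {    /* print only valid entries */
            if ((ctr%8) == 0)
                DbgPrint(" \n");
            DbgPrint("%08x ", PageDirectory[i]);
            ctr++;
        }
    }
    DbgPrint("\n\n");
    KeDetachProcess();
}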
The DisplayPageDirectory() function accepts the PEB for the process whose page directory is to be printed. The function first calls the KeAttachProcess() function with the given PEB as the parameter, which switches to the desired page directory. The function can still access its local variables, because the kernel address space is shared by all the processes. Once the address space is switched, the address 0xC0300000 points to the page directory to be printed. The function walks all 1024 entries of the page directory, prints the valid ones, and then switches back to the original address space using the KeDetachProcess() function.
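The fixed address 0xC0300000 works because Windows NT (on x86 without PAE) maps the page tables into the 4MB region starting at 0xC0000000 by pointing page directory entry 0x300 back at the page directory itself. The following sketch computes, under that assumption, the virtual address of the PTE and of the page directory entry that map any given address; the function names are ours.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_BASE 0xC0000000u   /* page tables are mapped here */
#define PDE_BASE 0xC0300000u   /* ...so the page directory appears here */

/* Virtual address of the PTE that maps `va` */
static uint32_t pte_address(uint32_t va)
{
    return PTE_BASE + (va >> 12) * 4;
}

/* Virtual address of the page directory entry that maps `va` */
static uint32_t pde_address(uint32_t va)
{
    return PDE_BASE + (va >> 22) * 4;
}
```

The self-map shows up in the arithmetic: the PTE for the start of the page-table region, pte_address(0xC0000000), is 0xC0300000 itself, so the page directory doubles as the page table for that region.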
void DisplayPageDirectoryForAllProcesses()
{
    PLIST_ENTRY ProcessListHead, ProcessListPtr;
    ULONG BuildNumber;
    ULONG ListEntryOffset;
    ULONG NameOffset;

    /* Assumed: the build number is obtained with PsGetVersion() */
    PsGetVersion(NULL, NULL, &BuildNumber, NULL);
    if ((BuildNumber==0x421) || (BuildNumber==0x565)) { // NT 3.51 or NT 4.0
        ListEntryOffset=0x98;
        NameOffset=0x1DC;
    } else if (BuildNumber==NT50_BUILD_NUMBER) { // NT 5.0; placeholder macro name
        ListEntryOffset=0xA0;
        NameOffset=0x1FC;
    } else {
        DbgPrint("Unsupported NT Version\n");
        return;
    }
    ProcessListHead=ProcessListPtr=(PLIST_ENTRY)(((char *)PsInitialSystemProcess)+ListEntryOffset);
    while (ProcessListPtr->Flink!=ProcessListHead) {
        void *Peb;
        char ProcessName[16];

        /* The PEB starts ListEntryOffset bytes before its list node */
        Peb=(void *)(((char *)ProcessListPtr->Flink)-ListEntryOffset);
        memset(ProcessName, 0, sizeof(ProcessName));
        memcpy(ProcessName, ((char *)Peb)+NameOffset, sizeof(ProcessName)-1);
        DbgPrint("%s\n", ProcessName);
        DisplayPageDirectory(Peb);
        ProcessListPtr=ProcessListPtr->Flink;
    }
}
The DisplayPageDirectoryForAllProcesses() function calls the DisplayPageDirectory() function for each process in the system. All the processes running in a system are linked in a list. The function gets hold of the process list from the PEB of the initial system process; the PsInitialSystemProcess variable in NTOSKRNL holds the PEB for the initial system process. The process list node is located at offset 0x98 (0xA0 for Windows NT 5.0) inside the PEB. The process list is a circular linked list, so once you get hold of any node in the list, you can traverse the entire list. The DisplayPageDirectoryForAllProcesses() function traverses the list by following the Flink member, printing the page directory for the next PEB in the list each time, until it gets back to the PEB it started with. For every process, the function first prints the process name, which is stored at a version-dependent offset within the PEB, and then calls the DisplayPageDirectory() function to print the page directory.
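The traversal relies on two tricks: recovering the containing structure by subtracting the offset of the embedded list node, and stopping when the walk returns to its starting node. The following portable sketch shows both with a hypothetical, simplified process structure (its layout is invented; the real one varies by build, which is why the driver hard-codes ListEntryOffset). Like the driver's loop, it visits every node except the one it starts from.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ListEntry {
    struct ListEntry *Flink;   /* forward link */
    struct ListEntry *Blink;   /* backward link */
} ListEntry;

/* Hypothetical, simplified process object */
typedef struct {
    unsigned  Pid;
    ListEntry Links;
} ToyProcess;

/* Same as (char *)node - ListEntryOffset, with the offset from offsetof */
static ToyProcess *process_from_links(ListEntry *node)
{
    return (ToyProcess *)((char *)node - offsetof(ToyProcess, Links));
}

static void ring_init(ToyProcess *p)
{
    p->Links.Flink = p->Links.Blink = &p->Links;
}

/* Links `p` into the circular list right after `pos` */
static void ring_insert_after(ToyProcess *pos, ToyProcess *p)
{
    p->Links.Flink = pos->Links.Flink;
    p->Links.Blink = &pos->Links;
    pos->Links.Flink->Blink = &p->Links;
    pos->Links.Flink = &p->Links;
}

/* Follows Flink from `start` until the next node is `start` again and
 * collects the PIDs of the other processes; returns how many it saw. */
static size_t collect_other_pids(ToyProcess *start, unsigned *out, size_t max)
{
    ListEntry *head = &start->Links, *ptr = head;
    size_t n = 0;

    while (ptr->Flink != head) {
        ToyProcess *p = process_from_links(ptr->Flink);
        if (n < max)
            out[n] = p->Pid;
        n++;
        ptr = ptr->Flink;
    }
    return n;
}
```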
Here, we list partial output from the sample program. Please note a couple of things in the following output. First, every page directory has 50-odd valid entries, whereas a page directory has 1024 entries. The remaining entries are invalid, meaning that the corresponding page tables are either not used or are swapped out. In other words, the main memory overhead of storing page tables is small, because page tables exist only for the address ranges in use and can themselves be swapped out. Also, note that all the page directories have the same entries in their later portion. This is because that portion represents the kernel address range, which is shared across all processes by using the same set of page tables.
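The identical upper entries fall out naturally if a new process's page directory is built by copying the kernel half from an existing directory. The following minimal sketch assumes the default 2GB/2GB split (kernel space from 0x80000000, that is, page directory index 0x200) and the self-map at index 0x300; the function name and the flag value 0x3 (present, writable) are our choices, and the real memory manager does considerably more.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PD_ENTRIES       1024
#define KERNEL_FIRST_PDE 0x200   /* 0x80000000 >> 22 */
#define SELF_MAP_PDE     0x300   /* 0xC0000000 >> 22 */

/* Empty user half, kernel half shared by copying the entries (so all
 * processes use the same kernel page tables), and the self-map entry
 * pointed at the new directory's own physical page. */
static void build_page_directory(uint32_t *new_pd, uint32_t new_pd_phys,
                                 const uint32_t *existing_pd)
{
    assert((new_pd_phys & 0xFFFu) == 0);              /* page aligned */
    memset(new_pd, 0, KERNEL_FIRST_PDE * sizeof *new_pd);
    memcpy(new_pd + KERNEL_FIRST_PDE, existing_pd + KERNEL_FIRST_PDE,
           (PD_ENTRIES - KERNEL_FIRST_PDE) * sizeof *new_pd);
    new_pd[SELF_MAP_PDE] = new_pd_phys | 0x3u;        /* present, writable */
}
```

Apart from the self-map entry, every entry from index 0x200 upward is identical to the source directory, which matches the repeated tail seen in each process's page directory.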
All the shared DLLs are loaded in the shared region. All the system DLLs (for example, KERNEL32.DLL and USER32.DLL) are shared DLLs. Also, a DLL's code or data segment can be declared shared while compiling the DLL, and the DLL then gets loaded in the shared region. Shared memory blocks are also allocated space in the shared region. In Windows 95/98, once a process maps a shared section, the section is visible to all processes. Because the section is mapped in the shared region, other processes need not map it separately.
There are advantages as well as disadvantages to having such a shared region. Windows 95/98 need not map the system DLLs separately for each process; the corresponding page directory entries can simply be copied for each process. Also, the system DLLs loaded in the shared region can maintain global data about all the processes, so separate subsystem processes are not required. Moreover, most system calls turn out to be simple function calls into the system DLLs and, as a result, are very fast. In Windows NT, most system calls cause either a switch to kernel mode or a context switch to the subsystem process, both of which are costly operations. For developers, loading system DLLs in a shared region means that they can put global hooks on functions in system DLLs.
Windows 95/98 pays for all these advantages with security. In Windows 95/98, any process can access all the shared data even if it has not mapped it. It can also corrupt the system DLLs and affect all processes.
SUMMARY
In this chapter, we discussed the memory management of Windows NT from three
different perspectives. Memory management offers programmers a 32-bit flat address
space for every process. A process cannot access another process’s memory or tamper
with it, but two processes can share memory if they need to. Windows NT builds its
memory management on top of the memory management facilities provided by the
microprocessor. The 386 (and above) family of Intel microprocessors provides support for
segmentation plus paging. The address translation mechanism first calculates the virtual
address from the segment descriptor and the specified offset within the segment. The
virtual address is then converted to a physical address using the page tables. The
operating system can restrict access to certain memory regions by using the security
mechanisms that are provided both at the segment level and the page level.
Windows NT memory management provides the programmer with a flat address space, data sharing, and so forth by selectively using the memory management features of the microprocessor. The virtual memory manager takes care of paging and provides 4GB of virtual address space for each process, even when the entire system has much less physical memory at its disposal. The virtual memory manager keeps track of all the physical pages in the system through the page frame database (PFD). The system also keeps track of the virtual address space of each process using the virtual address descriptor (VAD) tree. Windows NT uses the copy-on-write mechanism for various purposes, especially for sharing DLL data pages. The memory manager plays an important part in switching the processor context when a process is scheduled for execution. Windows 95/98 memory management is similar to Windows NT memory management; the differences arise mainly because Windows 95/98 is not as security conscious as Windows NT.
THIS CHAPTER DIFFERS greatly from the other chapters in the book. It does not contain any undocumented Windows NT information. Instead, it provides some general tips on how to reverse engineer on your own to explore the undocumented Windows NT world. This chapter teaches you how to reverse engineer Windows NT given the raw Assembly code and the useful symbolic information that Microsoft provides in the form of .DBG files. You can find these .DBG files on the Windows NT distribution CD-ROM. This chapter does not provide a complete guide to reverse engineering, for the simple reason that there is no clearly defined way of approaching this problem. Reverse engineering is like panning for gold; you have to sift through tons of Assembly code to find a little information. But this chapter contains some useful tricks we used to uncover undocumented Windows NT. Reverse engineering is an art, and it requires a lot of intuition, patience, and logical deduction.
We divided this chapter into sections, each describing a step in reverse engineering. We conclude the chapter by illustrating the reverse engineering of a sample undocumented function. The best tool for reverse engineering is NuMega's excellent SoftICE. This book would not have been possible without SoftICE. This chapter assumes that the reader has used debuggers. We recommend trying out SoftICE to get the most out of this chapter. Although the concepts explained here apply specifically to reverse engineering NTOSKRNL (the NT Executive image) using SoftICE, they apply equally to reverse engineering any piece of operating system code.
XREF: See the NuMega Web site at http://www.numega.com/ for up-to-date version information on SoftICE.
You need the following .DBG files to explore the KERNEL component:
KERNEL32.DBG
NTDLL.DBG
NTOSKRNL.DBG
You need the following .DBG files to explore the USER and GDI components:
USER32.DBG
GDI32.DBG
CSRSS.DBG
CSRSRV.DBG
WIN32K.DBG
Copy these .DBG files onto your hard drive, and then use the symbol loader to convert them into .NMS files (the native symbol format of SoftICE). Then, add these files to SoftICE's initialization settings using the SoftICE Initialize Settings/Symbols option in the symbol loader. This ensures that the symbols get loaded when SoftICE loads. Now, reboot the machine. SoftICE now shows symbolic names rather than hex addresses, making the Assembly code much more readable. The Windows 2000 symbolic information comes in .DBG and .PDB files instead of just .DBG files. You need the MSPDB60.DLL file from Visual C++ to convert these files into the native symbol format of SoftICE (.NMS).
In the stdcall calling convention, the caller pushes the parameters from right to left, and the called function pops them off the stack. The advantage of the stdcall calling convention is that it generates compact code, because the code for popping the parameters off the stack resides in only one place (the function itself). The disadvantage is that, because the function always pops a fixed number of parameters, this calling convention cannot support a variable number of arguments. To have a variable number of arguments, you must follow the cdecl calling convention, in which the caller pops the parameters after the call returns.
The fastcall calling convention resembles stdcall, except that its first two parameters are passed in registers (ECX and EDX with Microsoft's compiler) instead of on the stack. This results in faster code because register access is much faster than memory access.
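The cdecl requirement for variable argument lists is easy to see from the callee's side: a variadic function cannot know how many bytes its caller pushed, so it cannot end with a RET n that removes them (the n in RET n is a constant); only the caller knows, so the caller must clean up. The following sketch is ordinary standard C using stdarg.h; sum_n() and its leading count argument are our own illustration.

```c
#include <assert.h>
#include <stdarg.h>

/* The callee learns the argument count from `count`; on 32-bit x86 the
 * compiler uses caller cleanup (cdecl) for a function like this. */
static int sum_n(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_n(3, 0x10, 0x20, 0x30) returns 0x60, and the same function can be called with any number of arguments, which a fixed RET n could not accommodate.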
Let us take a sample C function that follows the stdcall calling convention and look at the corresponding Assembly code generated by the compiler. In this example, we also see how compiler-generated Assembly code accesses the parameters passed to the function and how local variables are implemented. The concepts explained here form the basis for the reverse engineering discussed later in this chapter.
Listing 5-1: A sample C function that uses the stdcall calling convention
int _stdcall sum(int x, int y, int z)
{
    int sum;

    sum=x+y+z;
    return sum;
}

int main()
{
    sum(0x10, 0x20, 0x30);
    return 0;
}
Listing 5-2: Compiler-generated Assembly code for the C function in Listing 5-1
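;sum
PUSH EBP            ; save the caller's frame pointer
MOV EBP,ESP         ; EBP now marks this function's stack frame
SUB ESP,04          ; room for the local variable sum
PUSH EBX            ; preserve callee-saved registers
PUSH ESI
PUSH EDI
MOV EAX,[EBP+10]    ; z
ADD EAX,[EBP+0C]    ; + y
ADD EAX,[EBP+08]    ; + x
MOV [EBP-04],EAX    ; sum = x+y+z
MOV EAX,[EBP-04]    ; return value in EAX
POP EDI
POP ESI
POP EBX
LEAVE               ; MOV ESP,EBP followed by POP EBP
RET 000C            ; return and pop 12 bytes of parameters (stdcall)
;main
PUSH 30             ; parameters pushed right to left
PUSH 20
PUSH 10
CALL _sum@12        ; no stack cleanup needed after the call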
If you look at the Assembly code, the compiler generates code that sets the EBP register to the start of the stack frame. The parameters passed to the function start at EBP+8, because [EBP] holds the caller's EBP value saved by PUSH EBP and [EBP+4] holds the return address pushed by the CALL instruction. Therefore, the generated Assembly code accesses the first parameter x as [EBP+8], and the parameters y and z as [EBP+C] and [EBP+10]. To implement local variables, compilers typically generate code that decrements the ESP register by the total number of bytes required to hold all the local variables defined in the function. In the previous code, there is only one local variable, sum; therefore, the compiler allocates 4 bytes (1 DWORD) on the stack by generating the instruction SUB ESP,04. Local variables are accessed at negative offsets from the EBP register; the variable sum is accessed as [EBP-4]. The LEAVE instruction at the end (equivalent to MOV ESP,EBP followed by POP EBP) cleans up the local variables and restores the EBP register. Finally, RET 000C returns to the caller and removes the 12 bytes of parameters from the stack, as the stdcall convention requires.
Just before the CALL instruction executes, the stack looks like this (higher addresses at the top):

    30                  <- Last parameter
    20                  <- Second parameter
    10                  <- First parameter

After the standard stack frame is set up (PUSH EBP, MOV EBP,ESP) and space is created for the local variables, the stack looks like this:

    30                  <- Last parameter (EBP+10)
    20                  <- Second parameter (EBP+C)
    10                  <- First parameter (EBP+8)
    Return address      <- (EBP+4) Address of the instruction
                           following the CALL _sum@12 instruction
    Saved EBP           <- (EBP)
    sum                 <- Local variable (EBP-4)
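To check these offsets mechanically, the following sketch replays Listing 5-2 against a simulated 32-bit stack: the caller's pushes, the CALL, the prologue, the body, LEAVE, and RET 000C. It is a toy model under stated simplifications: the return address is a made-up constant, the EBX/ESI/EDI saves are omitted, and all names are ours.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Toy stack: addresses are byte offsets into mem[], and the stack grows
 * toward lower addresses, as on the x86. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp;
} Stack;

static void push(Stack *s, uint32_t v)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = v;
}

static uint32_t pop(Stack *s)
{
    assert(s->esp + 4 <= STACK_BYTES);
    uint32_t v = s->mem[s->esp / 4];
    s->esp += 4;
    return v;
}

static uint32_t load(const Stack *s, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return s->mem[addr / 4];
}

static void store(Stack *s, uint32_t addr, uint32_t v)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    s->mem[addr / 4] = v;
}

/* Replays "PUSH 30 / PUSH 20 / PUSH 10 / CALL _sum@12" and _sum@12 */
static uint32_t replay_sum(Stack *s, uint32_t x, uint32_t y, uint32_t z)
{
    push(s, z);                  /* parameters pushed right to left */
    push(s, y);
    push(s, x);
    push(s, 0x00401000u);        /* CALL pushes a return address (made up) */

    push(s, s->ebp);             /* PUSH EBP */
    s->ebp = s->esp;             /* MOV EBP,ESP */
    s->esp -= 4;                 /* SUB ESP,04: room for the local sum */

    uint32_t eax = load(s, s->ebp + 0x10);   /* MOV EAX,[EBP+10] (z) */
    eax += load(s, s->ebp + 0x0C);           /* ADD EAX,[EBP+0C] (y) */
    eax += load(s, s->ebp + 0x08);           /* ADD EAX,[EBP+08] (x) */
    store(s, s->ebp - 4, eax);               /* MOV [EBP-04],EAX */
    eax = load(s, s->ebp - 4);               /* MOV EAX,[EBP-04] */

    s->esp = s->ebp;             /* LEAVE: MOV ESP,EBP ... */
    s->ebp = pop(s);             /* ... then POP EBP */
    (void)pop(s);                /* RET 000C: pop the return address ... */
    s->esp += 0x0C;              /* ... and discard 12 bytes of parameters */
    return eax;
}
```

After replay_sum() returns, ESP is back where it was before the first PUSH and EBP holds the caller's value, which is exactly the stdcall contract: the callee removed its own parameters.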
Most of the functions in NTOSKRNL access the parameters and local variables in the same way: they set up the frame using the EBP register, access the parameters at positive offsets from EBP, and access the local variables at negative offsets from EBP. But a few functions do not set up this standard stack frame; instead, they access the parameters directly using the ESP register (for example, [ESP+8]). In this case, reverse engineering becomes much more difficult: ESP changes with every PUSH and POP, so the same parameter is accessed using different offsets from ESP at different places. The advantage of omitting the frame is faster and more compact code.
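Without an EBP frame, the offset of a given parameter depends on how many DWORDs have been pushed since function entry, and you have to track that count by hand while reading the code. The following hypothetical helper captures the bookkeeping; it assumes DWORD-sized parameters and that a SUB ESP,n counts as n/4 pushes.

```c
#include <assert.h>

/* ESP-relative offset of 1-based parameter `n` in a function without an
 * EBP frame, after `pushed` DWORDs have been pushed since entry (at
 * entry, [ESP] holds the return address). */
static unsigned esp_param_offset(unsigned n, unsigned pushed)
{
    assert(n >= 1);
    return 4 + 4 * (n - 1) + 4 * pushed;
}
```

The first parameter is [ESP+4] at entry but [ESP+8] after a single PUSH, which is why the same parameter shows up under different offsets in such functions.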
PATTERNS
Because compilers are themselves software programs, they follow recognizable patterns when generating Assembly code.
MOV ECX, 6
REP STOSD
This piece of code fills 6 DWORDs (0x18 bytes) of memory, starting at location EBP-24, with the value in EAX (EDI points to EBP-24 at this point). This suggests that a local structure of 0x18 bytes is probably defined and initialized in the function.
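The following sketch spells out what REP STOSD does when the direction flag is clear (the usual state in compiled code): it stores EAX at [EDI], advances EDI by 4, and decrements ECX until ECX reaches zero. In C source, a memset() or a zero initializer on a 0x18-byte local structure is a typical, though not the only, origin of this pattern; the helper name rep_stosd() is ours.

```c
#include <assert.h>
#include <stdint.h>

/* REP STOSD with DF = 0: stores eax at *edi, *ecx times, advancing edi
 * by one DWORD each time. Returns the final EDI; *ecx ends at 0. */
static uint32_t *rep_stosd(uint32_t *edi, uint32_t eax, uint32_t *ecx)
{
    while (*ecx != 0) {
        *edi++ = eax;
        (*ecx)--;
    }
    return edi;
}
```

With ECX = 6, exactly 6 * 4 = 0x18 bytes are written, and the DWORDs on either side of the range are left untouched.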
JZ BitNotSet
..
..
BitNotSet:
This piece of code tests the fifteenth bit of the fifth parameter passed to the function (located at [EBP+18], assuming that a standard stack frame is set up for the function) and branches based on the result of the bit test.
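In C source, this pattern typically comes from a test such as if (Param5 & 0x8000), assuming that "fifteenth bit" counts from bit 0 (mask 0x8000). The following sketch models the TEST/JZ pair: TEST sets the zero flag when the AND of its operands is zero, and JZ branches when that flag is set. The function name is ours.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when "TEST operand, mask" followed by "JZ" takes the jump,
 * that is, when none of the bits in mask are set in operand. */
static int jz_taken_after_test(uint32_t operand, uint32_t mask)
{
    return (operand & mask) == 0;
}
```

So the jump to BitNotSet is taken exactly when bit 15 of the parameter is clear, and the code that follows the JZ runs only when the bit is set.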
This statement fills the EAX register with a pointer to the current thread object. Note that in kernel mode, the FS register points to the Processor Control Region (PCR).
This piece of code fills the EAX register with a pointer to the current process object under Windows NT 3.51. Under Windows NT 4.0 and Windows 2000, this instruction looks like MOV EAX,[EAX+44], because the offset of the process pointer within the thread object changed from 3.51 to 4.0.
information and they want to figure out the problem. But Microsoft probably turns these bits off when it ships the release. By doing this, Microsoft hides a wealth of information from operating system reverse engineers. We expose a part of this wealth here; there could be many other such flags. Pieces of hidden debug-message code inside NTOSKRNL look like this:
JZ HideFromReverseEngineering
PUSH ..
PUSH ..
PUSH ..
CALL DbgPrint
HideFromReverseEngineering:
Whenever you come across such a piece of code, just set the required bit from SoftICE, and you will see all the hidden messages.
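At the C level, the TEST/JZ/PUSH/CALL DbgPrint sequence above typically corresponds to a flag check wrapped around a debug print. The following sketch shows that shape in portable C, with printf() standing in for DbgPrint(); the flag variable, the bit assignment, and trace_alloc() are hypothetical stand-ins for variables such as ExpEchoPoolCalls, not NTOSKRNL code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag variable and bit; real ones live inside NTOSKRNL */
static unsigned long DebugFlags = 0;
static int messages_emitted = 0;

#define DBG_TRACE_ALLOC 0x1u

/* The message is built and printed only when the bit is set, which is
 * what compiles into TEST / JZ skip / PUSH args / CALL DbgPrint. */
static void trace_alloc(void *address, unsigned long size)
{
    if (DebugFlags & DBG_TRACE_ALLOC) {
        printf("alloc %p size %lu\n", address, size);
        messages_emitted++;
    }
}
```

Setting the bit from a debugger has the same effect as the statement DebugFlags |= DBG_TRACE_ALLOC: every later call takes the printing path.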
Here are some of the known variables in NTOSKRNL and the debug messages shown by the
operating system when these variables or bits of these variables are turned on. Most of the
variables appear only in the checked builds of the operating system.
ExpEchoPoolCalls
By setting this variable to 1, you can get information about each memory allocation and deallocation performed using functions such as ExAllocatePoolWithTag and ExFreePool. The information shown includes the address where the memory was allocated, the size of the region allocated, the type of pool used (paged/nonpaged), and the type of memory (cache, aligned, and so on). The information displays as follows:
ObpShowAllocAndFree
By setting this variable to 1, you can get information about each executive object when it
is created/destroyed. The information includes the memory address where the object was
created and the type of the object (Key, Semaphore, and so on). The information appears
like this:
LpcpTraceMessages
This variable proves very useful in reverse engineering the local procedure call mechanism (LPC) used by Windows NT for implementing various subsystems. By setting this variable to 1, you can get tons of information about how LPC functions. The information displays as follows:
LPC[ 55.54 ]: Explorer.exe Send Request (LPC_REQUEST) Msg e1118b08 (853) [000000
LPC[ 1a.52 ]: csrss.exe Receive Msg e1118b08 (853) from Port e11a6dc0 (csrss.exe
LPC[ 1a.52 ]: csrss.exe Sending Reply Msg e1118b08 (853.0, 0) [00000000 00010001
LPC[ 55.54 ]: Explorer.exe Got Reply Msg e1118b08 (853) [00000000 00010001 00000
MmDebug
By setting different bits of this variable, you can see different messages generated by the memory management module. The following list shows the bits of this variable that enable messages, along with the messages the operating system then generates.
Bit 2
Bit 3
index 0 c0300403
index 1 c0301403
index 2 c0502403
index 3 c01ff401
....
....
Bit 4
csrss.exe file: \MMFAULT: va: 8018cd7e size: 1000 process: SystemVa file: \
MMFAULT: va: 77d9bd10 size: 1000 process: progman.exe file: \MMFAULT: va: c1ec00
....
....
Bit 10
Bit 28
zero bits 0
Bit 30
ObDebugFlags
Two bits of this variable (the fifth and sixth bits) control the operating system debug messages. These bits control the security-descriptor-related messages.
Bit 6
Bit 7
NtGlobalFlag
One bit of this variable enables debug messages. Other bits control the validations performed by the operating system and the general operation of the operating system. Take a look at the GFLAGS utility in the resource kit for a description of the individual bits of NtGlobalFlag. During process startup, NTDLL.DLL copies the value of this variable into a variable of its own. NTDLL.DLL uses the second bit of this variable to show the loading of a process: during process startup, NTDLL gets the value of this flag and sets its internal variable ShowSnap to 1 if the second bit is set. Once this bit is set, you can watch the behavior of the PE executable/DLL loader. Windows NT shows the names of all the imported DLLs, plus the actual set of DLLs required to start an application. It also shows you the address of the initialization function of each of these DLLs, as well as a lot of other information. Look at the following messages, displayed by the operating system after we turned on just one bit of the NtGlobalFlag variable. Here, we started pstat.exe and terminated it immediately:
T40;C:\WINNT40\system32;C:\WINNT40;c:\winnt35;c:\winnt35\system32;c:\msdev\bin;C:\DOS
...
...
...
...
...
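The ShowSnap behavior described above amounts to a single bit test at process startup. The following sketch models it; FLG_SHOW_LDR_SNAPS (0x2) is the name the GFLAGS documentation uses for this bit, while init_show_snap() and the standalone ShowSnap variable are illustrative, not NTDLL's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define FLG_SHOW_LDR_SNAPS 0x00000002u   /* second bit of NtGlobalFlag */

static int ShowSnap;   /* models NTDLL's internal variable */

/* Models the startup step: derive ShowSnap from the global flag value */
static void init_show_snap(uint32_t nt_global_flag)
{
    ShowSnap = (nt_global_flag & FLG_SHOW_LDR_SNAPS) != 0;
}
```

Every other bit of the flag is irrelevant to ShowSnap; only the 0x2 bit turns the loader messages on.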
SepDumpSD
When this variable is set to 1, the operating system dumps security descriptors from its security-handling code.
SECURITY DESCRIPTOR
Revision = 1
Dacl present
Self relative
Owner S-1-5-32-544
Sacl@ 0
Dacl@ e11f71fc
AceSize = 20
Ace Flags =
AceSize = 24
Ace Flags =
Sid = S-1-5-32-544
TokenGlobalFlag
When this variable is set to 1, the operating system dumps messages related to security
tokens.
0xe11826f0
0xe11826f0
NtOpenKey
DesiredAccess=80000000 RootHandle=00000000
Name='\Registry\Machine\Software\Microsoft\Windows
CmpParseKey:
CompleteName = '\Registry\Machine\Software\Microsoft\Windows
CmpFindSubKeyByName:
CmpFindSubKeyInLeaf:
CmpFindSubKeyByName:
CmpFindSubKeyInLeaf:
CmpFindSubKeyByName:
XREF: You can use the DUMPBIN utility to find the complete list of functions (documented as well as undocumented)
that an application imports. For example, DUMPBIN PROGMAN.EXE /IMPORTS displays all the functions imported by
Program Manager.
To start DRWTSN32, run an application that faults (causes a GPF), or write one that faults
explicitly. If you do not know of an application that uses the undocumented function you
are interested in, try to find an equivalent Win32 API call. If you find such a call, write
an application that calls it. For example, if you want to decipher the parameters passed to
the NtAllocateVirtualMemory system service, you can write an application that calls
VirtualAlloc(). Once the breakpoint for the function that you want to decipher is
triggered, you can look at the details of the function implementation. You can use some
general tricks to decipher the parameters; we discuss a few of them in the sections that
follow.
JZ 8019D397
..
..
8019D397:
JZ 8019D3B3
From this Assembly code, you can easily see that [EBP+C], the second parameter, contains
the InfoClass, and [EBP+14], the fourth parameter, contains the size of the buffer that
holds the mutant information.
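The offset-to-parameter mapping used here follows from the standard stack frame on 32-bit x86: after PUSH EBP and MOV EBP, ESP, [EBP] holds the saved EBP and [EBP+4] the return address, so the first parameter is at [EBP+8] and each later one is 4 bytes further. The helper below is a sketch that assumes every parameter occupies exactly one DWORD, which holds for handles, pointers, and ULONGs but not for 64-bit values passed by value.

```c
/* Returns the 1-based parameter number for an [EBP+offset] operand,
   or 0 if the offset does not name a DWORD parameter slot.
   Assumes a standard EBP frame and DWORD-sized parameters. */
int param_index_from_ebp_offset(unsigned offset)
{
    if (offset < 8 || (offset - 8) % 4 != 0)
        return 0;                 /* saved EBP, return address, or misaligned */
    return (int)((offset - 8) / 4) + 1;
}
```

For the listing above, offsets 0xC and 0x14 map to parameters 2 and 4, matching the analysis of InfoClass and the buffer size.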
PUSH 00
LEA EAX,[EBP-20]
PUSH EAX
MOV EAX,[_ExMutantObjectType]
PUSH EAX
PUSH 01
CALL _ObReferenceObjectByHandle
MOV [EBP-24],EAX
TEST EAX,EAX
JL 8019D435
CALL _KeReadStateMutant
Looking at this code, you can clearly see that the first parameter to the NtQueryMutant()
function is the mutex object handle, because the same parameter is passed as the first
parameter to the documented ObReferenceObjectByHandle() function, whose first parameter is
an object handle. Hence, knowing that the function is named NtQueryMutant and that its
first parameter is passed as is to ObReferenceObjectByHandle() as an object handle, we can
conclude that the first parameter is most likely a handle to a mutex object.
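This inference rests on the argument order of _stdcall calls: arguments are pushed right to left, so the value pushed last before the CALL is the first parameter. The sketch below reverses a recorded list of PUSH operands to recover parameter order. Only the documented parameter order of ObReferenceObjectByHandle() (Handle, DesiredAccess, ObjectType, AccessMode, Object, HandleInformation) is taken as given; the push sequence in the test is hypothetical, because the listing above does not show every push (some arguments may have been pushed earlier in the function).

```c
#include <stddef.h>

/* Given PUSH operands in the order they executed, store them in args[]
   in parameter order (args[0] = first parameter). Under _stdcall, as
   under cdecl, arguments are pushed right to left, so the last push is
   the first parameter. */
void pushes_to_args(const char *const pushes[], size_t n, const char *args[])
{
    for (size_t i = 0; i < n; i++)
        args[i] = pushes[n - 1 - i];
}
```

With a complete hypothetical sequence PUSH 0, PUSH &Object, PUSH ExMutantObjectType, PUSH PreviousMode, PUSH access, PUSH MutantHandle, the last value lands in the Handle slot, which is the inference drawn for NtQueryMutant().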
While executing in kernel mode, FS:[124] always points to the currently executing thread
(TEB). Under Windows NT 3.51, [TEB+40] points to the current process; under Windows NT
4.0 and Windows 2000, [TEB+44] points to the current process.
SHR EAX, 0A
SHR EAX, 14
SUB EAX,3FD00000
The preceding two pieces of code compute the addresses of the page table entry and the
page directory entry, respectively, for the virtual address in the ESI register. The
registers used might change; however, the pattern remains the same. You may have seen this
code in many memory management-related functions. At first it looks odd; however, it is
highly optimized and relies on 2's complement arithmetic. As an exercise, try to determine
how this works. Hint: Page tables are mapped starting at the virtual address 0xC0000000,
and the page directory is mapped starting at the virtual address 0xC0300000.
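One way to check your answer to the exercise is to compute the same addresses in C. The sketch assumes the non-PAE x86 layout implied by the hint: 4KB pages, 4-byte entries, page tables mapped at 0xC0000000, and the page directory at 0xC0300000. Because unsigned 32-bit arithmetic wraps around, subtracting 0x3FD00000 yields the same result as adding 0xC0300000, which is why the compiler can emit SUB EAX,3FD00000.

```c
#include <stdint.h>

#define PTE_BASE 0xC0000000u   /* page tables mapped here (from the hint) */
#define PDE_BASE 0xC0300000u   /* page directory mapped here (from the hint) */

/* PTE address for va: PTE_BASE + (va >> 12) * 4.
   (va >> 10) with the low two bits cleared is the same byte offset. */
uint32_t pte_address(uint32_t va)
{
    return PTE_BASE + ((va >> 10) & 0x003FFFFCu);
}

/* PDE address for va: PDE_BASE + (va >> 22) * 4,
   computed as (va >> 20) with the low two bits cleared. */
uint32_t pde_address(uint32_t va)
{
    return PDE_BASE + ((va >> 20) & 0x00000FFCu);
}

/* The SHR/SUB form: modulo 2^32, x - 0x3FD00000 == x + 0xC0300000. */
uint32_t pde_address_sub_form(uint32_t va)
{
    return (uint32_t)(((va >> 20) & 0x00000FFCu) - 0x3FD00000u);
}
```

For example, the PTE for 0x80000000 is at 0xC0200000 and its PDE is at 0xC0300800; the PTE address computed for 0xC0300000 equals the PDE address for 0xC0000000, which reflects the page directory being mapped inside the page table region.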
PUSH 00
LEA EAX,[EBP-20]
PUSH EAX
PUSH ECX
PUSH 08
CALL _ObReferenceObjectByHandle
MOV [EBP-24],EAX
TEST EAX,EAX
JL .....
MOV EAX,FS:[00000124]
MOV ECX,[EBP-20]
CMP [EAX+40],ECX
JZ ...
PUSH ECX
CALL _KeAttachProcess
Here, the code works with another process: it wants to perform some work on behalf of a
process other than the current one. The code receives a handle to a Process object as a
parameter. Using this handle, it obtains the actual object and then compares the address
of that Process object with the address of the current Process object, stored at
[TEB+40] in Windows NT 3.51 and at [TEB+44] in Windows NT 4.0 and Windows 2000. If the
Process object being dealt with is not the current Process object, the code attaches to
the desired Process object using KeAttachProcess(). The code that follows then executes
in the context of the attached process. You will see similar code in the system services
that can operate on other processes. For example, the NtAllocateVirtualMemory system
service can allocate memory for a process other than the current one, and you will find
this kind of code in the NtAllocateVirtualMemory() function. Other places where you can
find this code are NtFreeVirtualMemory() and NtLockVirtualMemory().
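The check-then-attach pattern can be modeled in portable C. In this sketch the thread object is just a byte buffer, the offset of the current-process pointer is chosen per version as described above (0x40 for Windows NT 3.51, 0x44 for Windows NT 4.0 and Windows 2000), and attach_process() is a hypothetical stand-in for KeAttachProcess() that only records its argument.

```c
#include <stdint.h>
#include <string.h>

enum nt_version { NT_351, NT_40, NT_2000 };

/* Offset of the current-process pointer in the thread object (from the text). */
size_t current_process_offset(enum nt_version v)
{
    return v == NT_351 ? 0x40 : 0x44;
}

/* Hypothetical stand-in for KeAttachProcess(): records the target only. */
static uint32_t attached_to;
static void attach_process(uint32_t process) { attached_to = process; }

/* Models MOV EAX,FS:[124] / CMP [EAX+40],ECX / JZ: attach only when the
   target is not the current process. Returns 1 if it attached. */
int attach_if_needed(const unsigned char *thread, enum nt_version v,
                     uint32_t target_process)
{
    uint32_t current;   /* 32-bit, as pointers are on x86 Windows NT */
    memcpy(&current, thread + current_process_offset(v), sizeof current);
    if (current == target_process)
        return 0;                      /* already in the right context */
    attach_process(target_process);    /* later code runs in the target's context */
    return 1;
}

uint32_t last_attached(void) { return attached_to; }
```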
XREF: You can study the example we chose in Chapter 10, “Adding New Software Interrupts.”
In Chapter 10, we discuss the callgate implementation on Windows NT (for running ring 0
code from a ring 3 application). When we decided to design the callgate mechanism, we
were looking for a way to allocate selectors, the basic requirement for creating
callgates. We knew that the Win32 application did not have a Local Descriptor Table
(LDT), so we wanted to allocate selectors from the Global Descriptor Table (GDT). First,
we looked at the symbols of NTOSKRNL using SoftICE's SYM *Selector* command, which
returned the entries matching the pattern *Selector*.
_KeI386AllocateGdtSelectors
0008:80125D10 JB 80125D5E
0008:80125D2D JZ 80125D47
0008:80125D41 DEC SI
STATUS_ABIOS_SELECTOR_NOT_AVAILABLE
Looking at the last instruction, RET 8, we can see that the function follows the _stdcall
calling convention and takes two parameters (RET 8 removes 8 bytes, that is, two DWORD
parameters, from the stack). Next, we had to decipher what those parameters were. Because
the compiler generated the standard stack frame (PUSH EBP, MOV EBP, ESP), EBP+8 refers to
the first parameter and EBP+C refers to the second parameter.
The following instruction sequence suggests that the second parameter represents the
number of selectors to be allocated:
0008:80125D10 JB 80125D5E
...
...
STATUS_ABIOS_SELECTOR_NOT_AVAILABLE
This code moves the second parameter into the SI register and compares it with the
kernel variable KiNumberFreeSelectors$S10229. If KiNumberFreeSelectors$S10229 is less
than the value in the SI register (that is, fewer selectors are free than were
requested), the code jumps to a label that fills the EAX register with the error code
STATUS_ABIOS_SELECTOR_NOT_AVAILABLE. Clearly, the second parameter to the function is the
number of selectors to allocate.
The next two instructions acquire the GDT lock. Locks are used extensively to prevent
multiple threads from accessing shared kernel data structures at the same time. Most of
the time, you can ignore these pieces of code, because they have nothing to do with the
actual logic of the function.
Then, the function loads the EDX register with the value of the kernel variable
_KiFreeGdtListHead$S10230. The name of this variable suggests that free selectors are
kept in a free list.
Next, the function checks whether the number of selectors to be allocated is zero. If so,
it jumps to a label where the lock is released and the EAX register is zeroed out to
indicate success, and the function returns.
0008:80125D2D JZ 80125D47
....
....
STATUS_ABIOS_SELECTOR_NOT_AVAILABLE
Now, let's see what happens when the number of selectors to allocate is nonzero:
0008:80125D41 DEC SI
The code fills the ECX register with the first parameter. Then, it loads the EDI register
with the value of the EDX register (_KiFreeGdtListHead$S10230) and subtracts the value of
the kernel variable KiAbiosGdt. The value of KiAbiosGdt matches the base address of the
Global Descriptor Table. Hence, this piece of code extracts the selector value into the
DI register. Next, the code copies the selector value into the location pointed to by the
ECX register and then adds 2 to ECX. From this, we deduced that the first parameter points
to a buffer that receives the allocated selector values, with each entry being 2 bytes
long. Therefore, the first parameter must be an array of short integers. The code moves to
the next free selector using the instruction:
MOV EDX,[EDX]
From this, we can see that the free selectors are maintained in a linked list in which
each free descriptor holds a pointer to the next free descriptor. The SI register is
decremented each time through the loop. Initially, SI contains the number of selectors to
be allocated; in the end, it reaches 0. At this point, the buffer pointed to by the first
parameter contains the list of selectors allocated.
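NTSTATUS _stdcall
KeI386AllocateGdtSelectors(
    PUSHORT SelectorArray,   /* first parameter: receives the selectors */
    USHORT nSelectors)       /* second parameter: number to allocate */
{
    KIRQL OldIrql;
    PULONG DescriptorEntry;
    ULONG i = 0;

    if (KiNumberFreeSelectors$S10229 < nSelectors) {
        return STATUS_ABIOS_SELECTOR_NOT_AVAILABLE;
    }
    OldIrql = KfAcquireSpinLock(&KiAbiosGdtLock);
    KiNumberFreeSelectors$S10229 -= nSelectors;
    if (nSelectors == 0) {
        goto CommonExit;
    }
    DescriptorEntry = KiFreeGdtListHead$S10230;
    while (nSelectors != 0) {
        /* Selector = byte offset of the descriptor from the GDT base */
        SelectorArray[i] = (USHORT)((ULONG)DescriptorEntry - (ULONG)KiAbiosGdt);
        i++;
        nSelectors--;
        /* A free descriptor holds a pointer to the next free one (MOV EDX,[EDX]) */
        DescriptorEntry = (PULONG)*DescriptorEntry;
    }
    /* Assumed: the list head advances past the allocated entries */
    KiFreeGdtListHead$S10230 = DescriptorEntry;
CommonExit:
    KfReleaseSpinLock(&KiAbiosGdtLock, OldIrql);
    return STATUS_SUCCESS;
}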
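To watch the free-list walk run, the following portable sketch simulates it in user mode. Everything here is a model, not kernel code: the GDT is an array of 8-byte descriptors, the first DWORD of a free descriptor holds the byte offset of the next free descriptor (the kernel stores an address there; an offset keeps the sketch host-independent), 0 terminates the list, and there is no lock. As in the reconstruction above, a selector is the descriptor's byte offset from the GDT base, and -1 stands in for STATUS_ABIOS_SELECTOR_NOT_AVAILABLE.

```c
#include <stdint.h>
#include <string.h>

#define GDT_ENTRIES 16

static unsigned char gdt[GDT_ENTRIES * 8];  /* simulated GDT, 8 bytes per descriptor */
static uint32_t free_head;                  /* offset of first free descriptor, 0 = empty */
static unsigned free_count;                 /* like KiNumberFreeSelectors */

/* The first DWORD of a free descriptor links to the next free one. */
static uint32_t next_free(uint32_t off)
{
    uint32_t next;
    memcpy(&next, gdt + off, sizeof next);
    return next;
}

/* Puts descriptors first..GDT_ENTRIES-1 on the free list in ascending order.
   first must be >= 1, because offset 0 is used as the end-of-list marker. */
void gdt_init_free_list(unsigned first)
{
    free_head = 0;
    free_count = 0;
    for (unsigned i = GDT_ENTRIES; i-- > first; ) {
        uint32_t off = i * 8;
        memcpy(gdt + off, &free_head, sizeof free_head);
        free_head = off;
        free_count++;
    }
}

/* Mirrors the reconstructed routine: returns 0 on success, -1 if too few are free. */
int alloc_gdt_selectors(uint16_t *selectors, uint16_t n)
{
    if (free_count < n)
        return -1;
    free_count -= n;
    for (uint16_t i = 0; i < n; i++) {
        selectors[i] = (uint16_t)free_head;  /* selector = offset from GDT base */
        free_head = next_free(free_head);    /* like MOV EDX,[EDX] */
    }
    return 0;
}
```

After gdt_init_free_list(4), allocating three selectors yields 0x20, 0x28, and 0x30, and the list head moves on to 0x38.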
SUMMARY
In this chapter, we described how to use symbolic information supplied with Windows NT
using SoftICE. We also discussed some general techniques used for reverse engineering,
such as how to understand the compiler code generation patterns. Next, we showed how
Windows NT can assist in reverse engineering by enabling some debugging flags in the
kernel. We also discussed various ways of deciphering the parameters for undocumented
functions. Next, we reviewed some typical Assembly language patterns found throughout
the Windows NT kernel code. The chapter concluded with an example showing the
THIS CHAPTER DISCUSSES hooking Windows NT system services. Before we begin, let's first review
what we mean by a system service. A system service refers to a set of functions (primitive
or elaborate) provided by the operating system. Application programming interfaces (APIs)
enable developers to call several system services, directly or indirectly. The operating
system provides APIs in the form of a dynamic link library (DLL) or a static compiler
library. These APIs are often based on system services provided by the operating system.
Some API calls are based directly on a corresponding system service, some depend on making
multiple system service calls, and some may not call any system services at all. In short,
there is no one-to-one mapping between API functions and system services. Figure 6-1
demonstrates this in the context of Windows NT.
Note: Under Windows NT 3.51, the system services are provided by a kernel-mode component called NTOSKRNL.EXE.
Most of the KERNEL32.DLL calls—such as those related to memory management and kernel objects management—are
handled by these system services. The USER32 and GDI32 calls are handled by a separate subsystem process called
CSRSS. Starting with Windows NT 4.0, Microsoft moved most of the functionality of CSRSS into a kernel-mode driver
called WIN32K.SYS. The functionality moved into WIN32K.SYS is made available to the applications in the form of
system services. These are not true native system services, since they are specific to the user
interface and are not used by all subsystems. This chapter and the next focus only on the system services provided
by NTOSKRNL.EXE.
that hooking does not have any undesirable side effects on the operating system.
Hooking the Registry system services makes it easy to protect Registry keys from
modification. This has several applications, since little protection is provided for the
Registry settings that applications create.
Debugging
Complex programs could make use of system-service hooking to debug the stickiest
problems. For example, a few days back, we had a problem installing a piece of software:
we could not create the folders and shortcuts for the application. Using a systemwide
hook, we quickly figured out that the installation program was looking for a Registry
value that indicated where to install the folders (which happened to be the Start menu).
We hooked the NtQueryValueKey() call and found out which value the installation program
was looking for. We created that value and solved our problem.
Life without hooking is unthinkable for most Windows developers in today’s Microsoft-
dominated world of operating systems. Windows NT system services lie at the center of
the NT universe, and having the ability to hook these can prove extremely handy.
TYPES OF HOOKS
The following sections explore two types of hooking.
Kernel-Level Hooking
You can achieve kernel-level hooking by writing a VXD or device driver. In this method,
essential functions provided by the kernel are hooked. The advantage of this type of
hooking is that you get one central place from which you can monitor the events occurring
as a result of a user-mode call or a kernel-mode call. The disadvantage of this method is
that you need to decipher the parameters of the call passed from kernel mode, since these
services are often undocumented. Also, the data passed to the kernel-mode call might
differ from the data passed in a user-mode call. Moreover, a user-level API call might be
implemented using multiple calls to the kernel, in which case hooking becomes far more
difficult. In general, this type of hooking is harder to achieve, but it can produce
more rewarding results.
User-Level Hooking
You can perform this type of hooking with some help from a VXD or device driver. In this
method, the functions provided by the user-mode DLLs are hooked. The advantage of this
method is that these functions are usually well documented. Therefore, you know the
parameters to expect. This makes it easy to write the hook function. This type of hooking
limits your field of vision to user mode only and does not extend to kernel mode.
IMPLEMENTATIONS OF HOOKS
The following sections detail the implementation of hooks under various Microsoft
platforms.
DOS
In the DOS world, system services are implemented as an interrupt handler routine (INT
21h). The compiler library routines typically call this interrupt handler to provide API
functions to the programmer. It is trivial to hook this handler using the GetVect (INT 21h,
AH=35h) and SetVect (INT 21h, AH=25h) services. Hence, hooking system services is fairly
straightforward. DOS does not have separate user and kernel modes.
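The get, save, replace, and chain pattern behind an INT 21h hook can be shown without DOS itself. In this sketch an array of function pointers stands in for the interrupt vector table, get_vect() and set_vect() are hypothetical equivalents of the AH=35h and AH=25h services, and the hook counts calls before chaining to the handler it replaced. A real DOS hook is a far interrupt handler that preserves registers, which this model leaves out.

```c
typedef int (*handler_t)(int ah);

static handler_t vector_table[256];        /* stand-in for the interrupt vector table */

handler_t get_vect(unsigned char n) { return vector_table[n]; }       /* like AH=35h */
void set_vect(unsigned char n, handler_t h) { vector_table[n] = h; }  /* like AH=25h */

/* Simplified "DOS" dispatcher: returns the function number it handled. */
static int dos_int21(int ah) { return ah; }

static handler_t old_int21;                /* saved before the vector is replaced */
static int hook_calls;

/* The hook sees every call, then chains to the previous handler. */
static int hook_int21(int ah)
{
    hook_calls++;
    return old_int21(ah);
}

void install_dos(void) { set_vect(0x21, dos_int21); }

void install_hook(void)
{
    old_int21 = get_vect(0x21);
    set_vect(0x21, hook_int21);
}

void remove_hook(void) { set_vect(0x21, old_int21); }

int int21(int ah) { return vector_table[0x21](ah); }   /* "INT 21h" */
int hook_call_count(void) { return hook_calls; }
```

Saving the old vector before installing the new one is what lets the hook both chain to the original handler and restore it on removal.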
Windows 3.x
In the Windows 3.x world, system services are implemented in DLLs. The compiler library
routines are stubs that jump to the DLL code (this is called dynamic linking of DLLs).
Also, because all applications share a common address space, hooking amounts to getting
the address of that particular system service and changing a few bytes at that address.
Changing these bytes sometimes requires simple selector aliasing, because a code segment
cannot be written through its own code selector.
XREF: Refer to the MSDN article in Microsoft Systems Journal (Vol. 9, No. 1) entitled, “Hook and Monitor Any 16-bit
Windows(tm) Function With Our ProcHook DLL,” by James Finnegan.
Windows 95 and 98
In the Windows 95/98 world, system services are implemented in a DLL as in Windows 3.1.
However, under Windows 95/98, all 32-bit applications run in separate address spaces.
Because of this, you cannot easily hook any unshared DLL. It is fairly easy to hook a shared
DLL such as KERNEL32.DLL. You simply modify a few code bytes at the start of the system
service you want to hook and write your hook function in a DLL that is loaded in shared
memory. Modifying the code bytes may involve writing a VXD, because KERNEL32.DLL is
loaded in the upper 2GB of the address space and protected by the operating system.
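Modifying "a few code bytes" at the start of a function commonly means overwriting them with a 5-byte near JMP (opcode E9) to the hook function; that choice of instruction is an assumption here, since the text does not say which bytes are written. The rel32 operand is measured from the end of the JMP, so it equals target - (site + 5). The sketch only builds the patch bytes. Writing them into KERNEL32.DLL's protected pages is the part that may need a VXD, and the overwritten bytes must be saved if the hook is to call the original function.

```c
#include <stdint.h>

/* Builds a 5-byte x86 "JMP rel32" that, placed at address site, transfers
   control to target. The displacement is stored little-endian. */
void build_jmp_rel32(uint32_t site, uint32_t target, uint8_t patch[5])
{
    uint32_t rel = target - (site + 5u);   /* wraps correctly for backward jumps */
    patch[0] = 0xE9;
    patch[1] = (uint8_t)(rel);
    patch[2] = (uint8_t)(rel >> 8);
    patch[3] = (uint8_t)(rel >> 16);
    patch[4] = (uint8_t)(rel >> 24);
}
```

For a jump from 0x1000 back to 0x0F00, the displacement is -0x105, encoded as FB FE FF FF.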
Windows NT
In the Windows NT world, system services are implemented in the kernel component of NT
(NTOSKRNL.EXE). The APIs supported by various subsystems (Win32, OS/2, and POSIX) are
implemented by using these system services. There is no documented way of hooking
these system services from kernel mode. There are several documented ways of hooking
user-level API calls.
XREF: Refer to the MSDN articles in Microsoft Systems Journal entitled, “Learn System-Level Win32(r) Coding
Techniques by Writing an API Spy Program,” by Matt Pietrek (Vol.9, No.12), and “Load Your 32-bit DLL into Another
Process’s Address Space Using INJLIB,” by Jeffrey Richter (Vol.9, No.5).
We will present one way of achieving hooking of NT system services in kernel mode in this
chapter. We also provide the code for this on the CD-ROM accompanying this book.
Windows programmers, when they link with the KERNEL32, USER32, and GDI32 DLLs, are
completely unaware of the existence of the NT system services supporting the various
Win32 calls they make. Similarly, POSIX clients using the POSIX API end up using more or
less the same set of NT system services to get what they want from the kernel. Thus, NT
system services represent the fundamental interface for any user-mode application or
subsystem to the kernel.
For example, when a Win32 application calls CreateProcess() or a POSIX application calls
fork(), both ultimately call the NtCreateProcess() system service of the NT executive.
NT system services are routines that run entirely in kernel mode. For those familiar with
the Unix world, NT system services can be considered the equivalent of system calls in
Unix.
Currently, Windows NT system services are not completely documented. The only place
where you can find some documentation of the NT system services is the Windows NT DDK
CD-ROM from Microsoft. The DDK discusses about 25 different system services and covers the
parameters passed to them in some detail. You'll see from Appendix A that this is only the
tip of the iceberg. Windows NT 3.51 has 0xC4 (196) different system services, Windows NT
4.0 has 0xD3 (211), and Windows 2000 Beta-2 has 0xF4 (244).
We deciphered the parameters of 90% of the system services. Prototypes for all these
system services can be found in UNDOCNT.H on the CD-ROM included with this book. We
also provide detailed documentation of some of the system services in Appendix A.
In the following section, you will learn how to hook these system services.
The easiest way to put a hook into the system services is to locate the System Service
Dispatch Table used by the operating system and change the function pointers to point to
some other function inserted by the developer. You can do this only from a kernel-mode
device driver because this table is protected by the operating system at the page table
level. The page attribute for these pages is set so that only kernel-mode components can
read from and write to this table. User-level applications cannot read or write these
memory locations.
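PVOID ServiceTableBase;
PVOID ServiceCounterTable;   /* 0 except in checked builds */
unsigned int NumberOfServices;
PVOID ParamTableBase;
where
ServiceTableBase Base address of the System Service Dispatch Table (SSDT), which holds the address of each
system service.
ServiceCounterTable This field is used only in checked builds of the operating system and contains a counter of how
many times each service in the SSDT is called. The INT 2Eh handler (KiSystemService) updates these counters.
NumberOfServices Number of services described by the table.
ParamTableBase Base address of the table (the SSPT) containing the number of parameter bytes for each of the system
services.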
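The hook itself is a pointer swap in the dispatch table, with the replacement function calling through the saved original. The following user-mode model shows that flow using the descriptor fields described above. The service functions, table contents, and the dispatch() helper are hypothetical, the parameter-byte table is filled in but not used, and in the real system the table lives in protected kernel memory, which is why only a kernel-mode driver can patch it.

```c
#include <stddef.h>

typedef long (*SERVICE)(int arg);        /* simplified service signature */

/* Model of a service descriptor entry (user-mode simulation). */
struct service_descriptor {
    SERVICE       *ServiceTableBase;     /* SSDT: one function pointer per service */
    unsigned long *ServiceCounterTable;  /* NULL here, as on free builds */
    unsigned int   NumberOfServices;
    unsigned char *ParamTableBase;       /* SSPT: parameter bytes (unused here) */
};

static long svc_open(int arg)  { return arg + 100; }   /* hypothetical services */
static long svc_close(int arg) { return arg + 200; }

static SERVICE ssdt[] = { svc_open, svc_close };
static unsigned char sspt[] = { 4, 4 };
static struct service_descriptor table = { ssdt, NULL, 2, sspt };

/* Simplified dispatcher: validate the ID, then call through the table. */
long dispatch(unsigned int id, int arg)
{
    if (id >= table.NumberOfServices)
        return -1;                       /* invalid service ID */
    return table.ServiceTableBase[id](arg);
}

static SERVICE old_open;                 /* original entry, saved by the hook */
static int hook_hits;

/* Replacement: do the monitoring work, then chain to the original. */
static long new_open(int arg)
{
    hook_hits++;                         /* e.g., log the file name here */
    return old_open(arg);
}

void hook_open(void)
{
    old_open = table.ServiceTableBase[0];
    table.ServiceTableBase[0] = new_open;
}

void unhook_open(void) { table.ServiceTableBase[0] = old_open; }

int open_hook_hits(void) { return hook_hits; }
```

Callers never notice the hook: dispatch() returns the same result before, during, and after it, while the hook observes every call made in between.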
The following program provides an example of hooking system services under Windows NT.
It hooks the NtCreateFile() system service and prints the name of the file being created
or opened whenever the hook is invoked. We encourage you to insert code for hooking any
other system service of your choice; the places for inserting new hooks are the
HookServices() and UnHookServices() functions in the following code.
Here are the steps to try out the sample (assuming that the sample binaries are copied in
C:\SAMPLES directory):
1. Run “instdrv hooksys c:\samples\hooksys.sys.” This will install the hooksys.sys driver.
The driver will hook the NtCreateFile system service.
2. Try to access the files on your hard disk. For each accessed file, hooksys.sys traps
the call and displays the name of the file in the debugger window. You can see these
messages in SoftICE or with a debug message-capturing tool.
#include "ntddk.h"
#include "stdarg.h"
#include "stdio.h"
#include "hooksys.h"

#define DRIVER_SOURCE
#include "..\..\include\wintype.h"
#include "..\..\include\undocnt.h"

/* Prototype of the NtCreateFile() system service */
typedef NTSTATUS (*NTCREATEFILE)(
    PHANDLE FileHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    PIO_STATUS_BLOCK IoStatusBlock,
    PLARGE_INTEGER AllocationSize,
    ULONG FileAttributes,
    ULONG ShareAccess,
    ULONG CreateDisposition,
    ULONG CreateOptions,
    PVOID EaBuffer,
    ULONG EaLength
    );

/* SSDT slot for a Zw function: the service ID is the 32-bit operand of the
   MOV EAX, imm32 instruction at the start of the Zw stub */
#define SYSTEMSERVICE(_function) \
    ((PULONG)KeServiceDescriptorTable.ServiceTableBase)[ \
        *(PULONG)((PUCHAR)_function+1)]

NTCREATEFILE OldNtCreateFile;

NTSTATUS NewNtCreateFile(
    PHANDLE FileHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    PIO_STATUS_BLOCK IoStatusBlock,
    PLARGE_INTEGER AllocationSize,
    ULONG FileAttributes,
    ULONG ShareAccess,
    ULONG CreateDisposition,
    ULONG CreateOptions,
    PVOID EaBuffer,
    ULONG EaLength)
{
    NTSTATUS rc;
    char ParentDirectory[1024];
    PUNICODE_STRING Parent=NULL;

    ParentDirectory[0]='\0';
    if (ObjectAttributes->RootDirectory!=0) {
        PVOID Object;

        /* ObQueryNameString() fills the buffer with a structure that starts
           with a UNICODE_STRING, so the buffer can be read as one */
        Parent=(PUNICODE_STRING)ParentDirectory;
        rc=ObReferenceObjectByHandle(ObjectAttributes->RootDirectory,
                                     0,
                                     0,
                                     KernelMode,
                                     &Object,
                                     NULL);
        if (rc==STATUS_SUCCESS) {
            extern NTSTATUS ObQueryNameString(void *, void *, int, int *);
            int BytesReturned;

            rc=ObQueryNameString(Object,
                                 ParentDirectory,
                                 sizeof(ParentDirectory),
                                 &BytesReturned);
            ObDereferenceObject(Object);
            if (rc!=STATUS_SUCCESS)
                RtlInitUnicodeString(Parent,
                                     L"Unknown\\");
        } else {
            RtlInitUnicodeString(Parent,
                                 L"Unknown\\");
        }
    }

    /* Print parent directory (if any), separator, and file name
       (format string assumed) */
    DbgPrint("%S%S%S\n",
             Parent?Parent->Buffer:L"",
             Parent?L"\\":L"",
             ObjectAttributes->ObjectName->Buffer);

    /* Chain to the original system service */
    rc=((NTCREATEFILE)(OldNtCreateFile)) (
            FileHandle,
            DesiredAccess,
            ObjectAttributes,
            IoStatusBlock,
            AllocationSize,
            FileAttributes,
            ShareAccess,
            CreateDisposition,
            CreateOptions,
            EaBuffer,
            EaLength);
    return rc;
}

NTSTATUS HookServices()
{
    /* Save the original SSDT entry, then point the slot at the hook.
       cli/sti mask interrupts on the current processor only. */
    OldNtCreateFile=(NTCREATEFILE)(SYSTEMSERVICE(ZwCreateFile));
    _asm cli
    SYSTEMSERVICE(ZwCreateFile)=(ULONG)NewNtCreateFile;
    _asm sti
    return STATUS_SUCCESS;
}

void UnHookServices()
{
    /* Restore the original SSDT entry */
    _asm cli
    SYSTEMSERVICE(ZwCreateFile)=(ULONG)OldNtCreateFile;
    _asm sti
    return;
}

NTSTATUS
DriverEntry(
    IN PDRIVER_OBJECT DriverObject,
    IN PUNICODE_STRING RegistryPath
    )
{
    MYDRIVERENTRY(DRIVER_DEVICE_NAME,
                  FILE_DEVICE_HOOKSYS,
                  HookServices());
    return ntStatus;
}

NTSTATUS
DriverDispatch(
    IN PDEVICE_OBJECT DeviceObject,
    IN PIRP Irp
    )
{
    Irp->IoStatus.Status = STATUS_SUCCESS;
    IoCompleteRequest (Irp,
                       IO_NO_INCREMENT
                       );
    return Irp->IoStatus.Status;
}

VOID
DriverUnload(
    IN PDRIVER_OBJECT DriverObject
    )
{
    WCHAR deviceLinkBuffer[] =
        L"\\DosDevices\\"DRIVER_DEVICE_NAME;
    UNICODE_STRING deviceLinkUnicodeString;

    UnHookServices();
    RtlInitUnicodeString (&deviceLinkUnicodeString,
                          deviceLinkBuffer
                          );
    IoDeleteSymbolicLink (&deviceLinkUnicodeString);
    IoDeleteDevice (DriverObject->DeviceObject);
}
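The SYSTEMSERVICE macro in the listing relies on the layout of the ZwCreateFile() stub exported by NTOSKRNL: on x86, each Zw stub starts with MOV EAX, imm32 (opcode B8), so the four bytes at offset 1 are the service ID used to index ServiceTableBase. The sketch below decodes that field from a byte buffer. The stub bytes and the ID 0x17 in the test are illustrative rather than taken from a particular build, and checking for the B8 opcode first is an extra safety step that the macro itself does not perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the service ID held in a "MOV EAX, imm32" stub (opcode B8,
   little-endian immediate), or -1 if the stub does not start that way. */
int64_t service_id_from_stub(const uint8_t *stub, size_t len)
{
    if (len < 5 || stub[0] != 0xB8)
        return -1;
    return (int64_t)((uint32_t)stub[1]
                   | ((uint32_t)stub[2] << 8)
                   | ((uint32_t)stub[3] << 16)
                   | ((uint32_t)stub[4] << 24));
}
```

A driver that hooks services by ID rather than through the Zw exports would need to hard-code different IDs for each Windows NT version; reading the ID from the stub avoids that.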
SUMMARY
In this chapter, we explored system services under DOS, Windows 3.x, Windows 95/98,
and Windows NT. We discussed the need for hooking these system services, as well as
kernel-level and user-level hooks. We described the data structures used during a system
call and the mechanism used for hooking Windows NT system services. The chapter concluded
with an example that hooked the NtCreateFile() system service.
CUSTOMIZING THE KERNEL for specific purposes was popular among developers long before
Windows NT. Ancient Unix gurus and developers alike practiced the art. In Unix, for
example, kernel developers can modify the kernel in several ways, such as adding new
device drivers, kernel extensions, system calls, and kernel processes. In Windows NT, the
DDK provides the means to add new device drivers. However, one of the most effective ways
of modifying the kernel, adding new system services to it, is not documented. This method
proves more efficient than adding device drivers for several reasons discussed later in
this chapter. Here, we focus on the detailed implementation of a system service inside the
Windows NT kernel and explain, with examples, how new system services can be added to
Windows NT.
In Inside Windows NT, Helen Custer mentions the design of system services and the
possibility of adding new system services to the kernel:
Using a system service dispatch table provides an opportunity to make native NT system
services extensible. The kernel can support new system services simply by expanding the
table without requiring changes to the system or to applications. After a code is written
for a new system service, a system administrator could simply run a utility program that
dynamically creates a new dispatch table. The new table will contain another entry that
points to a new system service.
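The expansion Custer describes, building a new dispatch table with one more entry, can be sketched as allocate, copy, append. In this user-mode model the table type and extend_table() are hypothetical, malloc() stands in for kernel pool allocation, and the dispatch table and parameter-byte table are kept the same length; making the kernel actually use the new table is the part this chapter goes on to address.

```c
#include <stdlib.h>
#include <string.h>

typedef long (*SERVICE)(void);

/* A dispatch table (SSDT) and its parameter-byte table (SSPT), kept in step. */
struct service_table {
    SERVICE       *services;
    unsigned char *param_bytes;
    unsigned int   count;
};

/* Builds a new table holding the old entries plus one new service, whose
   ID is old->count. Returns 0 on success or -1 on allocation failure,
   in which case *out is not modified. */
int extend_table(const struct service_table *old, SERVICE fn,
                 unsigned char param_bytes, struct service_table *out)
{
    SERVICE *s = malloc((old->count + 1) * sizeof *s);
    unsigned char *p = malloc(old->count + 1);

    if (s == NULL || p == NULL) {
        free(s);
        free(p);
        return -1;
    }
    memcpy(s, old->services, old->count * sizeof *s);
    memcpy(p, old->param_bytes, old->count);
    s[old->count] = fn;
    p[old->count] = param_bytes;
    out->services = s;
    out->param_bytes = p;
    out->count = old->count + 1;
    return 0;
}
```

Copying rather than growing in place leaves the original table intact, so existing callers keep working until the new table is switched in.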
The capability to add new system services exists in Windows NT, but it is not documented.
Very little changed in this area between NT 3.51 and later versions of Windows NT. The
only change is that some of the data structures involved in implementing a system service
are located at different offsets in the later versions of the operating system. We feel
that our method of adding new system services may hold, possibly with very minor
modifications, in future releases of Windows NT.
At the end of this chapter, we try to shed some light on the possible thought that went
into the design of this portion of the operating system.
implementation of the SSDT and SSPT occurs similarly in all versions of Windows NT to
date. We present the two implementations separately for clarity, one for Windows NT
3.51 and one for the later versions of the operating system such as Windows NT 4.0 and
Windows 2000.
The following table shows the service ID mappings for all versions of Windows NT to
date.
Windows NT 3.51
    NTOSKRNL services: service IDs 0x0 through 0xC3 inside NTOSKRNL.
    USER32/GDI32 services: processed by the Win32 subsystem, a user-mode process. The
    kernel provides no system services for handling these directly; these calls reach the
    Win32 subsystem through the kernel's LPC system services.
Windows NT 4.0 (up to Service Pack 5)
    NTOSKRNL services: service IDs 0x0 through 0xD2 inside NTOSKRNL.
    USER32/GDI32 services: service IDs 0x1000 through 0x120A inside WIN32K.SYS. The
    kernel-mode driver WIN32K.SYS takes over the functionality of the Win32 subsystem and
    supports these services.
Windows 2000 (beta-2)
    NTOSKRNL services: service IDs 0x0 through 0xF3 inside NTOSKRNL.
    USER32/GDI32 services: service IDs 0x1000 through 0x1285 inside WIN32K.SYS. The
    kernel-mode driver WIN32K.SYS takes over the functionality of the Win32 subsystem and
    supports these services.
In Windows NT 3.51, only the KERNEL32 and ADVAPI32 functions of the operating system
route through NTDLL.DLL to NTOSKRNL. The USER32 and GDI32 functions are implemented as
part of the Win32 subsystem process (CSRSS). USER32.DLL and GDI32.DLL provide wrappers
that call the CSRSS process using the local procedure call (LPC) facility.
instruction implements the system services. The INT 2Eh handler is internally named
KiSystemService; hereafter we refer to it as the handler. Before entering the handler,
the EAX register is loaded with the service ID and the EDX register with a pointer to the
stack frame required for implementation of a particular service. The handler gets to the
current TEB (Thread Environment Block) through the Processor Control Region (PCR), where
the current TEB is stored at offset 0x124. The handler then gets the address of the
System Service Descriptor Table from the TEB; you can locate it at offset 0x124 in the
TEB. Chapter 6 explains the format of the Service Descriptor Table.
The handler refers to the first entry in the Service Descriptor Table for service IDs less
than 0x1000 and to the second entry for service IDs greater than or equal to 0x1000. The
handler checks the validity of service IDs. If a service ID is valid, the handler extracts
the addresses of the SSDT and SSPT. It copies the number of bytes described by the SSPT
for the service (the total size of the parameter list) from the user-mode stack to the
kernel-mode stack, and then calls the function pointed to by the SSDT for that service.
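The handler's table selection and validity check can be modeled as follows. The sketch assumes that the descriptor entry is chosen by bits 12 and 13 of the service ID and the index by the low 12 bits, which agrees with the 0x1000 boundary described above; the structures and decode_service() are hypothetical, and where the real handler copies the parameter bytes from the user-mode stack, this model only reports how many there are.

```c
#include <stddef.h>
#include <stdint.h>

#define STATUS_SUCCESS                0x00000000u
#define STATUS_INVALID_SYSTEM_SERVICE 0xC000001Cu

struct descriptor_entry {
    const void *const   *ServiceTableBase;  /* SSDT (not used by this model) */
    unsigned int         NumberOfServices;
    const unsigned char *ParamTableBase;    /* SSPT: parameter bytes per service */
};

/* Four entries, as in KeServiceDescriptorTable. */
struct descriptor_table {
    struct descriptor_entry entry[4];
};

/* Selects the entry from bits 12-13 and the index from bits 0-11, validates
   the index, and reports how many parameter bytes the handler would copy. */
uint32_t decode_service(const struct descriptor_table *t, uint32_t id,
                        unsigned *param_bytes)
{
    const struct descriptor_entry *e = &t->entry[(id >> 12) & 0x3];
    uint32_t index = id & 0xFFF;

    if (e->ParamTableBase == NULL || index >= e->NumberOfServices)
        return STATUS_INVALID_SYSTEM_SERVICE;  /* the real handler jumps to a label here */
    *param_bytes = e->ParamTableBase[index];
    return STATUS_SUCCESS;
}
```

With a table whose second entry is empty, as in KeServiceDescriptorTable, every ID of 0x1000 or above fails this check, which is what sends the real handler to the labels discussed next.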
Initially, when any thread is started, the TEB contains a pointer to the Service Descriptor
Table, identified internally as KeServiceDescriptorTable. KeServiceDescriptorTable
contains four entries. Only the first entry in this table is used; it describes the service
IDs for some of the KERNEL32 and ADVAPI32 calls. Another Service Descriptor Table,
internally named KeServiceDescriptorTableShadow, is identical to KeServiceDescriptorTable
under NT 3.51. However, under later versions of the operating system, the second entry in
the shadow table is not NULL. It points to another SSDT and SSPT, which are part of the
WIN32K.SYS driver. The WIN32K.SYS driver creates this entry during its initialization (in
its DriverEntry routine) by calling KeAddSystemServiceTable. (We provide more information
on this later in this chapter.) This second entry describes the services that WIN32K.SYS
exports for the USER32 and GDI32 modules.
Note that in all versions of Windows NT, KeServiceDescriptorTable contains only one used entry, and every newly started thread has its TEB point to KeServiceDescriptorTable. This continues as long as the thread calls only services belonging to the first entry in KeServiceDescriptorTable. When a thread calls a service beyond these limits (unlikely in NT 3.51, but very likely in later versions of Windows NT, because USER and GDI service IDs start at 0x1000), KiSystemService jumps to a label: _KiEndUnexpectedRange under NT 3.51, _KiErrorMode under NT 4.0, and KiBBTEndUnexpectedRange in Windows 2000. Let's see what role the code at each label plays.
..............
    return STATUS_INVALID_SYSTEM_SERVICE;
..............
    if (PsConvertToGuiThread() != STATUS_SUCCESS) {
        return STATUS_INVALID_SYSTEM_SERVICE;
    }
..............

PsConvertToGuiThread()
{
    if (PspW32ProcessCallout) {   /* under NT 3.51, always = 0 */
        /* This is only invoked for the later versions of the operating system */
        ..............
    } else {
        return STATUS_ACCESS_DENIED;
    }
}
Under Windows NT 3.51, both KeServiceDescriptorTable and the Shadow Table point to
the same SSDT and SSPT and contain only one entry. Now, ask yourself this logical
question: “Why do we have the Shadow Table at all when apparently it does not provide
much help in NT 3.51?” We attempt to answer this question later in the chapter.
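The fallback path at these labels can be modeled as follows. This is a hypothetical user-mode sketch: SIM_TABLE, SIM_THREAD, and the sim_ functions are illustrative names, a table records only how many services each of its four entries holds, and the guiCalloutPresent flag stands in for PspW32ProcessCallout being non-NULL (assumed to be 0 under NT 3.51, so the conversion fails there).

```c
/* Hypothetical model: only the per-entry service counts matter here. */
typedef struct {
    unsigned int NumberOfServices[4];   /* 0 means the entry is empty (NULL) */
} SIM_TABLE;

typedef struct {
    const SIM_TABLE *ServiceTable;      /* the table this thread dispatches through */
} SIM_THREAD;

enum { SIM_OK, SIM_INVALID_SYSTEM_SERVICE, SIM_ACCESS_DENIED };

/* Stand-in for PsConvertToGuiThread(): switches the thread to the Shadow
 * Table only when the GUI callout (PspW32ProcessCallout) is present. */
static int sim_convert_to_gui_thread(SIM_THREAD *thread, const SIM_TABLE *shadow,
                                     int guiCalloutPresent)
{
    if (!guiCalloutPresent)
        return SIM_ACCESS_DENIED;
    thread->ServiceTable = shadow;
    return SIM_OK;
}

/* Assumed ID layout: bits 12-13 select the entry, the low 12 bits the service. */
static int sim_in_range(const SIM_TABLE *table, unsigned int serviceId)
{
    return (serviceId & 0xFFF) < table->NumberOfServices[(serviceId >> 12) & 0x3];
}

/* Returns SIM_OK if the call would be dispatched. */
int sim_system_service(SIM_THREAD *thread, const SIM_TABLE *shadow,
                       int guiCalloutPresent, unsigned int serviceId)
{
    if (sim_in_range(thread->ServiceTable, serviceId))
        return SIM_OK;                                  /* normal dispatch */
    /* "Unexpected range" path: convert to a GUI thread, then retry once. */
    if (sim_convert_to_gui_thread(thread, shadow, guiCalloutPresent) != SIM_OK)
        return SIM_INVALID_SYSTEM_SERVICE;
    return sim_in_range(thread->ServiceTable, serviceId) ? SIM_OK
                                                         : SIM_INVALID_SYSTEM_SERVICE;
}
```

In this model, once a thread has been switched it keeps dispatching through the Shadow Table for all later calls.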
Note: Once a process makes a USER32/GDI32 call, it permanently stops using the original KeServiceDescriptorTable and switches entirely to a copy of KeServiceDescriptorTableShadow.
Adding new system services involves the following steps:
1. Allocate a block of memory large enough to hold the existing SSDT and SSPT plus the extensions to each table.
2. Copy the existing SSDT and SSPT into this block of memory.
3. Append the new entries to the new copies of the two tables, as shown in Figure 7-2.
4. Update KeServiceDescriptorTable and KeServiceDescriptorTableShadow to point to the newly allocated SSDT and SSPT.
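The four steps can be sketched in user-mode C as follows. This is a model under stated assumptions rather than driver code: malloc() stands in for kernel pool allocation, SIM_TABLE_ENTRY and sim_add_services are hypothetical names, and the first new service ID is taken to be the old NumberOfServices, matching how the sample driver computes StartingServiceId.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one descriptor-table entry (counter table omitted). */
typedef struct {
    unsigned long *ServiceTableBase;   /* SSDT: service addresses */
    unsigned int NumberOfServices;
    unsigned char *ParamTableBase;     /* SSPT: parameter bytes per service */
} SIM_TABLE_ENTRY;

/* Extends 'table' with 'count' new services and points both the table and
 * its shadow at the enlarged copies. Returns the first new service ID
 * (the old NumberOfServices), or -1 if allocation fails. */
long sim_add_services(SIM_TABLE_ENTRY *table, SIM_TABLE_ENTRY *shadow,
                      const unsigned long *newServices,
                      const unsigned char *newParams, unsigned int count)
{
    unsigned int oldCount = table->NumberOfServices;
    unsigned int newCount = oldCount + count;
    unsigned long *ssdt;
    unsigned char *sspt;

    if (count == 0)
        return (long)oldCount;            /* nothing to add */

    /* Step 1: allocate room for the existing tables plus the extensions. */
    ssdt = malloc(newCount * sizeof *ssdt);
    sspt = malloc(newCount);
    if (ssdt == NULL || sspt == NULL) {
        free(ssdt);
        free(sspt);
        return -1;
    }
    /* Step 2: copy the existing SSDT and SSPT. */
    memcpy(ssdt, table->ServiceTableBase, oldCount * sizeof *ssdt);
    memcpy(sspt, table->ParamTableBase, oldCount);
    /* Step 3: append the new entries. */
    memcpy(ssdt + oldCount, newServices, count * sizeof *ssdt);
    memcpy(sspt + oldCount, newParams, count);
    /* Step 4: update both descriptor tables. The old arrays are not freed,
     * because other threads may still be dispatching through them. */
    table->ServiceTableBase = shadow->ServiceTableBase = ssdt;
    table->ParamTableBase = shadow->ParamTableBase = sspt;
    table->NumberOfServices = shadow->NumberOfServices = newCount;
    return (long)oldCount;
}
```

Updating the shadow descriptor in the same function is the point of step 4: a thread that has already switched to the Shadow Table sees the new services too.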
In NT 3.51, because the Shadow Table is never used, you could get away without updating it. In NT 4.0 and Windows 2000, however, the Shadow Table takes the leading role once a GDI32 or USER32 call has been made, so it is important to update both KeServiceDescriptorTable and KeServiceDescriptorTableShadow. If you fail to update KeServiceDescriptorTableShadow in NT 4.0 or Windows 2000, the newly added services stop working once a GDI32 or USER32 call is made. We recommend updating both tables in all versions of Windows NT so that the same code works with every version of the operating system.
KeServiceDescriptorTableShadow is not exported by NTOSKRNL, so its address has to be found indirectly. The method we used is as follows. NTOSKRNL contains a function called KeAddSystemServiceTable, which the WIN32K.SYS driver uses to add the USER32- and GDI32-related functions. This function references KeServiceDescriptorTableShadow, and the first entry is the same in KeServiceDescriptorTable and KeServiceDescriptorTableShadow. We therefore walk through the KeAddSystemServiceTable code, reading the DWORD at each byte position. For every DWORD that is a valid address, we compare the 16 bytes at that address (the size of one descriptor table entry) with the first entry in KeServiceDescriptorTable. If they match, and the address is not that of KeServiceDescriptorTable itself, we take it to be the address of KeServiceDescriptorTableShadow. This method seems to work in all Windows NT versions.
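The search can be simulated over a byte buffer. In this hypothetical sketch, memory stands in for the kernel address space, addresses are 32-bit offsets into it, a bounds check replaces MmIsAddressValid(), and the 16-byte entry size assumes 32-bit NT; sim_find_shadow is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_ENTRY_SIZE 16   /* one descriptor-table entry on 32-bit NT */

/* Scans codeLength bytes starting at codeOffset, reading the DWORD at every
 * byte position. A DWORD counts as a valid address if the 16 bytes it
 * points to lie inside 'memory' (standing in for MmIsAddressValid()).
 * Returns the first such address whose 16 bytes equal the entry at 'sdt'
 * and that is not 'sdt' itself, or 0 if none is found. */
uint32_t sim_find_shadow(const unsigned char *memory, size_t memorySize,
                         size_t codeOffset, size_t codeLength, uint32_t sdt)
{
    for (size_t i = 0; i + 4 <= codeLength; i++) {
        uint32_t candidate;
        memcpy(&candidate, memory + codeOffset + i, sizeof candidate);
        if (candidate == 0 || candidate > memorySize ||
            memorySize - candidate < SIM_ENTRY_SIZE)
            continue;                       /* not a valid address */
        if (candidate == sdt)
            continue;                       /* KeServiceDescriptorTable itself */
        if (memcmp(memory + candidate, memory + sdt, SIM_ENTRY_SIZE) == 0)
            return candidate;               /* the Shadow Table */
    }
    return 0;
}
```

Skipping the original table matters because KeAddSystemServiceTable may reference KeServiceDescriptorTable as well, and that address trivially matches its own first entry.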
provides an interface for the services called by KERNEL32.DLL). An application links to this wrapper DLL and calls the newly added services. The new services print a debug message saying “kernel service .... Called” along with the parameters passed to them, and return the values 0, 1, and 2, respectively. The AddServices() function isolates the code for the mechanism of adding new system services.
Assuming that the sample binaries are copied to the C:\SAMPLES directory, here are the steps to try out the sample:
#include "ntddk.h"
#include "stdarg.h"
#include "stdio.h"
#include "extnddrv.h"
#define DRIVER_SOURCE
#include "..\..\include\wintype.h"
#include "..\..\include\undocnt.h"
NTSTATUS SampleService0(void);
/* ..............
   .............. */

/* Table of the addresses of the new services (the SSDT extension) */
unsigned int ServiceTableBase[]={
    (unsigned int)SampleService0,
    (unsigned int)SampleService1,
    (unsigned int)SampleService2,
    /* ..............
       .............. */
};
/* Table describing the parameter bytes required for the new services */
unsigned char ParamTableBase[]={
    0,
    4,
    8,
    /* ..............
       .............. */
};
NTSTATUS SampleService0(void)
return STATUS_SUCCESS;
trace(("param1=%x\n", param1));
return STATUS_SUCCESS+1;
return STATUS_SUCCESS+2;
/* ..............
   .............. */
int i;
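    /* Body of the loop that scans the KeAddSystemServiceTable code */
    __try {
        ..............
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        return 0;
    }
    if (MmIsAddressValid((PVOID)dwordatbyte)) {
        if (memcmp((PVOID)dwordatbyte,
                   &KeServiceDescriptorTable, 16)==0) {
            if ((PVOID)dwordatbyte==&KeServiceDescriptorTable) {
                continue;              /* skip KeServiceDescriptorTable itself */
            }
            return dwordatbyte;        /* found KeServiceDescriptorTableShadow */
        }
    }
..............
return 0;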
NTSTATUS AddServices()
PServiceDescriptorTableEntry_t KeServiceDescriptorTableShadow;
NumberOfServices=sizeof(ServiceTableBase)/sizeof(ServiceTableBase[0]);
trace(("KeServiceDescriptorTable=%x\n", &KeServiceDescriptorTable));
KeServiceDescriptorTableShadow=(PServiceDescriptorTableEntry_t)GetAddrssofShadowTable();
if (KeServiceDescriptorTableShadow==NULL) {
return STATUS_UNSUCCESSFUL;
trace(("KeServiceDescriptorTableShadow=%x\n",
KeServiceDescriptorTableShadow));
NewNumberOfServices=KeServiceDescriptorTable.NumberOfServices+NumberOfServices;
StartingServiceId=KeServiceDescriptorTable.NumberOfServices;
NewNumberOfServices*sizeof(unsigned int));
if (NewServiceTableBase==NULL) {
return STATUS_INSUFFICIENT_RESOURCES;
NewNumberOfServices);
if (NewParamTableBase==NULL) {
ExFreePool(NewServiceTableBase);
return STATUS_INSUFFICIENT_RESOURCES;
memcpy(NewServiceTableBase, KeServiceDescriptorTable.ServiceTableBase,
KeServiceDescriptorTable.NumberOfServices*sizeof(unsigned int));
memcpy(NewParamTableBase, KeServiceDescriptorTable.ParamTableBase,
KeServiceDescriptorTable.NumberOfServices);
memcpy(NewServiceTableBase+KeServiceDescriptorTable.NumberOfServices,
ServiceTableBase, sizeof(ServiceTableBase));
memcpy(NewParamTableBase+KeServiceDescriptorTable.NumberOfServices,
ParamTableBase, sizeof(ParamTableBase));
KeServiceDescriptorTable.ServiceTableBase=NewServiceTableBase;
KeServiceDescriptorTable.ParamTableBase=NewParamTableBase;
KeServiceDescriptorTable.NumberOfServices=NewNumberOfServices;
/* Make KeServiceDescriptorTableShadow point to the new SSDT and SSPT */
KeServiceDescriptorTableShadow->ServiceTableBase=NewServiceTableBase;
KeServiceDescriptorTableShadow->ParamTableBase=NewParamTableBase;
KeServiceDescriptorTableShadow->NumberOfServices=NewNumberOfServices;
/* Return Success */
DbgPrint("Returning success\n");
return STATUS_SUCCESS;
NTSTATUS
DriverDispatch(
IN PDEVICE_OBJECT DeviceObject,
IN PIRP Irp
);
VOID
DriverUnload(
IN PDRIVER_OBJECT DriverObject
);
NTSTATUS
DriverEntry(
IN PDRIVER_OBJECT DriverObject,
IN PUNICODE_STRING RegistryPath
return ntStatus;
NTSTATUS
DriverDispatch(
IN PDEVICE_OBJECT DeviceObject,
IN PIRP Irp
PIO_STACK_LOCATION irpStack;
PVOID ioBuffer;
ULONG inputBufferLength;
ULONG outputBufferLength;
NTSTATUS ntStatus;
irpStack = IoGetCurrentIrpStackLocation(Irp);
Irp->IoStatus.Status = STATUS_SUCCESS;
Irp->IoStatus.Information = 0;
switch (irpStack->MajorFunction)
case IRP_MJ_DEVICE_CONTROL:
trace(("EXTNDDRV.SYS: IRP_MJ_DEVICE_CONTROL\n"));
switch (irpStack->Parameters.DeviceIoControl.IoControlCode)
case IOCTL_EXTNDDRV_GET_STARTING_SERVICEID:
trace(("EXTNDDRV.SYS: IOCTL_EXTNDDRV_GET_STARTING_SERVICEID\n"));
outputBufferLength = irpStack->Parameters.DeviceIoControl.OutputBufferLength;
if (outputBufferLength<sizeof(StartingServiceId)) {
Irp->IoStatus.Status = STATUS_BUFFER_TOO_SMALL;
} else {
ioBuffer = (PULONG)Irp->AssociatedIrp.SystemBuffer;
memcpy(ioBuffer, &StartingServiceId,
sizeof(StartingServiceId));
Irp->IoStatus.Information = sizeof(StartingServiceId);
break;
break;
ntStatus = Irp->IoStatus.Status;
IoCompleteRequest (Irp,
IO_NO_INCREMENT
);
return ntStatus;
VOID
DriverUnload(
IN PDRIVER_OBJECT DriverObject
UNICODE_STRING deviceLinkUnicodeString;
RtlInitUnicodeString (&deviceLinkUnicodeString,
deviceLinkBuffer
);
IoDeleteSymbolicLink (&deviceLinkUnicodeString);
IoDeleteDevice (DriverObject->DeviceObject);
trace(("EXTNDDRV.SYS: unloading\n"));
/* MYNTDLL.C
*/
#include <windows.h>
#include <stdio.h>
#include <winioctl.h>
#include "..\sys\extnddrv.h"
int ServiceStart;
_asm {
int 2eh
__declspec(dllexport) NTSTATUS
SampleService1(int param)
void **stackframe=&param;
_asm {
add eax, 1
int 2eh
__declspec(dllexport) NTSTATUS
char **stackframe=&param1;
_asm {
add eax, 2
int 2eh
__declspec(dllexport) NTSTATUS
char **stackframe=&param1;
_asm {
add eax, 3
int 2eh
__declspec(dllexport) NTSTATUS
char **stackframe=&param1;
_asm {
add eax, 4
int 2eh
__declspec(dllexport) NTSTATUS
int param5)
char **stackframe=&param1;
_asm {
add eax, 5
int 2eh
__declspec(dllexport) NTSTATUS
char **stackframe=&param1;
_asm {
add eax, 6
int 2eh
BOOL SetStartingServiceId()
HANDLE hDevice;
BOOL ret;
hDevice = CreateFile (
"\\\\.\\extnddrv",
GENERIC_READ | GENERIC_WRITE,
0,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL
);
if (hDevice == ((HANDLE)-1))
"Error", MB_OK);
ret = FALSE;
else
DWORD BytesReturned;
ret=DeviceIoControl(
hDevice,
IOCTL_EXTNDDRV_GET_STARTING_SERVICEID,
NULL,
NULL,
&ServiceStart,
sizeof(ServiceStart),
&BytesReturned,
NULL);
if (ret) {
if (BytesReturned!=sizeof(ServiceStart)) {
"Error", MB_OK);
ret=FALSE;
} else {
ret = TRUE;
} else {
"Error", MB_OK);
CloseHandle (hDevice);
return ret;
BOOL WINAPI
DllMain(HANDLE hModule,
DWORD Reason,
LPVOID lpReserved)
switch (Reason) {
case DLL_PROCESS_ATTACH:
//
//
return SetStartingServiceId();
default:
return TRUE;
* system services.
*/
#include <windows.h>
#include <stdio.h>
#include "..\dll\myntdll.h"
int main()
SampleService0());
SampleService1(0x10));
SampleService2(0x10, 0x20));
return 0;
KeAddSystemServiceTable
The WIN32K.SYS driver calls this function during its DriverEntry under Windows NT 4.0 and Windows 2000. The function looks somewhat odd. It expects five parameters: an index in the Service Descriptor Table where the new entry is to be added, the SSDT, the SSPT, the number of services, and one parameter that is used only in checked builds. This last parameter points to a table of DWORDs that count how many times each service is called.
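A hypothetical sketch of what a routine taking these parameters has to do follows. It models only what the text states (an index, an SSDT, an SSPT, a service count, and a checked-build counter table); the real function's parameter order and exact checks are not given here, so the "entry already in use" check and all SIM_/sim_ names are assumptions.

```c
#include <stddef.h>

#define SIM_NUM_ENTRIES 4   /* a Service Descriptor Table has four entries */

/* Hypothetical model of one descriptor-table entry. */
typedef struct {
    const unsigned long *ServiceTableBase;   /* SSDT */
    unsigned long *ServiceCounterTableBase;  /* checked builds only; may be NULL */
    unsigned int NumberOfServices;
    const unsigned char *ParamTableBase;     /* SSPT */
} SIM_SDT_SLOT;

/* Installs a service table in entry 'index'. Returns 1 on success, 0 if the
 * index is out of range or the entry is already used (assumed check). */
int sim_add_system_service_table(SIM_SDT_SLOT table[SIM_NUM_ENTRIES],
                                 unsigned int index,
                                 const unsigned long *ssdt,
                                 const unsigned char *sspt,
                                 unsigned int count,
                                 unsigned long *counters)
{
    if (index >= SIM_NUM_ENTRIES || table[index].ServiceTableBase != NULL)
        return 0;
    table[index].ServiceTableBase = ssdt;
    table[index].ServiceCounterTableBase = counters;
    table[index].NumberOfServices = count;
    table[index].ParamTableBase = sspt;
    return 1;
}

/* What the counter table is for: in a checked build, each call to a
 * service increments that service's counter. */
void sim_count_call(SIM_SDT_SLOT *entry, unsigned int serviceIndex)
{
    if (entry->ServiceCounterTableBase != NULL)
        entry->ServiceCounterTableBase[serviceIndex]++;
}
```

WIN32K.SYS would correspond to a call with index 1, the second entry that describes the USER32 and GDI32 services.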
SUMMARY
In this chapter, we discussed in detail the system service implementation of Windows NT.
We explored some code fragments from a system service interrupt handler, using
KiSystemService() as an example. Next, we detailed the mechanism for adding new system
services to the Windows NT kernel. We also used an example that adds three new system
services to the Windows NT kernel. We compared extending the kernel with device drivers
with extending the kernel by adding system services.
Microsoft designed the local procedure call (LPC) facility to enable efficient communication with what Windows NT calls the subsystems. Although you do not need to know about subsystems to understand the LPC mechanism, doing so is certainly interesting and advisable. In this chapter, we discuss the subsystems and then shed some light on the
DOS and Unix variants dominated the operating systems world in the 1980s. DOS has a
monolithic architecture, composed of a single lump of code. Unix follows the layered
architecture, where the operating system divides into layers such that each layer uses
only the interface provided by the lower layers. The MACH operating system follows a new
client-server approach. The initial versions of MACH were based on BSD Unix 4.3.
The MACH team focused on two major goals. First, they wanted to have a more structured
code than BSD 4.3. Second, they wanted to support different variants of the Unix API.
They achieved both these goals by pushing the execution of kernel code to user-mode
processes, which acted as servers. The MACH kernel is very small, providing only the basic system services common to all Unix APIs; hence it is called a micro-kernel. The server processes run in user mode and provide a sophisticated API. Ordinary application processes are clients of these server processes. When a client process invokes an API function, the emulation library, which is linked with the client code, transparently passes the call on to the server process by means of a facility similar to RPC (remote procedure call). The server process carries out any necessary processing and returns the results to the client.
To support a new API in the MACH environment, you need to write a server process and an emulation library that support the new API. Not all server processes provide a different API; some provide generic functionality such as memory management or TTY management.
The Windows NT design team had goals similar to those of the MACH developers. They wanted to support the Win32, OS/2, and POSIX APIs while leaving room for future APIs, so a client-server architecture was a natural choice.
In Windows NT, the servers are called protected subsystems. Subsystems are user-mode processes running in the local system security context. They are called protected subsystems because they are separate processes operating in separate address spaces and hence are protected from access or modification by clients. There are two types of subsystems:
Integral subsystems
Environment subsystems
Integral Subsystems
An integral subsystem performs some essential operating system task. For Windows NT,
this group includes the Local Security Authority (lsass.exe), the Security Accounts
Manager, the Session Manager (smss.exe), and the network server. The Local Security
Authority (LSA) subsystem manages security access tokens for users. The Security Accounts
Manager (SAM) subsystem maintains a database of information on user accounts, including
passwords, any account groups a given user belongs to, the access rights each user is
allowed, and any special privileges a given user has. The Session Manager subsystem starts
and keeps track of NT logon sessions and serves as an intermediary among protected
subsystems.
Environment Subsystems
An environment subsystem is a server that appears to perform operating system functions for its native applications by calling system services. An environment subsystem runs in user mode, and its interface to end users emulates another operating system, such as OS/2 or POSIX, on top of Windows NT. Under Windows NT 3.51, even the Win32 API is implemented through a subsystem process.
Note: Not all the API functions in the client-side DLLs need to pass the call to the subsystem process. For example, most KERNEL32.DLL calls map directly onto system services provided by the kernel; such API functions invoke the system services via NTDLL.DLL. Most USER32.DLL and GDI32.DLL functions pass the call on to the subsystem process. (In Windows NT 4.0, Microsoft moved the USER and GDI parts of the Win32 subsystem into the kernel, in the WIN32K.SYS driver, for performance reasons.)
The system call interface provided by the Windows NT kernel is called the native API. The Win32 subsystem uses the native API to implement the Win32 API. Generally, user programs call an API provided by some subsystem rather than the cumbersome native API. We refer to these user programs as the clients of the subsystem that provides the API they use.
The client processes and the subsystem communicate through a mechanism called local procedure call (LPC), which Microsoft designed specifically for that purpose. For unknown reasons, Microsoft keeps the LPC interface undocumented. Nothing prevents LPC from serving as a general Inter-Process Communication (IPC) mechanism. Microsoft does provide an RPC kit for client-server communication across machines, and Windows NT optimizes RPCs by converting them to LPCs when the client and the server reside on the same machine. However, RPC has its own overheads. LPC is most efficient in its raw form, which is the form the subsystems use. In addition, RPC does not provide access to the fastest form of LPC, Quick LPC. For these reasons, we provide you with useful information on the LPC interface.
The stub function in the DLL waits for the subsystem to return the results and, in turn, passes the results to the caller. To the client process, this looks just like calling a normal procedure in its own code. In the case of RPC, the client actually calls a procedure sitting in some remote server over the network, hence the name remote procedure call. In Windows NT, the server runs on the same machine; hence the mechanism is called a local procedure call.
There are three types of LPC. The first type sends small messages of up to 304 bytes. The second type sends larger messages by means of a shared memory section. The third type is called Quick LPC and is used by the Win32 subsystem in Windows NT 3.51.
The first two types of LPC use port objects for communication. Ports resemble sockets or named pipes in Unix: a port is a bidirectional communication channel between two processes. However, unlike stream sockets, ports do not stream the data passed through them; they preserve message boundaries. Simply put, you send and receive messages using ports. The subsystems create ports with well-known names. Client processes that need to invoke services from a subsystem open the corresponding port using its well-known name. After opening the port, the client can communicate with the server over it.
The client sends a connection request to a waiting subsystem using the NtConnectPort() function. When the subsystem receives the connection request, it returns from the NtListenPort() function and accepts the connection using the NtAcceptConnectPort() function. NtAcceptConnectPort() returns a new port handle specific to the client requesting the connection; the server can break the communication link with that client by closing this handle. The subsystem completes the connection protocol using the NtCompleteConnectPort() function. The client then returns from NtConnectPort() with a handle to the communication port. This handle is private to the client process: child processes do not inherit port handles, so a child needs to open the subsystem port again.
After completing this connection protocol, the client and the subsystem can communicate over the port. The client sends a request to the subsystem using the NtRequestPort() function, which sends a datagram message; the client does not receive any acknowledgment for it. If the client expects a reply to its request, it can use the NtRequestWaitReplyPort() function, which sends the request to the subsystem and waits for a reply. The subsystem receives request messages using the NtReplyWaitReceivePort() function (passing NULL as the reply message) and sends reply messages using the NtReplyPort() function. As an optimization, the subsystem can reply to the previous request and wait for the next request with a single call to NtReplyWaitReceivePort(). Figure 8-1 displays this entire process of communication.
A subsystem may receive and reply to messages from more than one client using the same port. Each message contains fields that identify the client process and thread. Because the kernel, not the client, fills in the process ID and thread ID, the subsystem can rely on this information; LPC is a secure and reliable communication mechanism in the sense that the sender of each message can be reliably identified.
Generally, as part of the port message, the client specifies the server-space address of the shared section and the offset of the copied parameters within the shared section. If the server uses this information, it should validate it first, because the client process may be unreliable. After processing the request, the server also sends back the results via the shared section. Apart from this additional processing, shared section LPC uses the same set of port APIs as short message communication. The sequence of operations is also the same, with one exception: in addition to handling the message port, the client must create the shared section and copy the parameters into it. The sequence of operations shown in Figure 8-1 applies to shared section LPC as well.
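The address bookkeeping for shared section LPC can be sketched as below. SIM_SECTION_VIEW and the helper names are hypothetical; the fields mirror the SectionSize, ClientBaseAddress, and ServerBaseAddress values of the LPCSECTIONINFO structure described later in this section, and the sketch assumes the client sends an offset plus a length. The server-side check rejects any range that falls outside the section, which is the validation the text recommends.

```c
#include <stdint.h>

/* Hypothetical view of a shared section as seen by both sides. */
typedef struct {
    uint32_t SectionSize;
    uintptr_t ClientBaseAddress;   /* where the section is mapped in the client */
    uintptr_t ServerBaseAddress;   /* where the section is mapped in the server */
} SIM_SECTION_VIEW;

/* Client side: offset of the copied parameters within the section. */
uint32_t sim_client_offset(const SIM_SECTION_VIEW *view, uintptr_t clientAddress)
{
    return (uint32_t)(clientAddress - view->ClientBaseAddress);
}

/* Server side: validate the (offset, length) pair sent by a possibly
 * unreliable client, then convert it to a server-space address.
 * Returns 0 if the range does not lie entirely inside the section. */
uintptr_t sim_server_address(const SIM_SECTION_VIEW *view,
                             uint32_t offset, uint32_t length)
{
    /* Written as two comparisons so that offset + length cannot overflow. */
    if (offset > view->SectionSize || length > view->SectionSize - offset)
        return 0;
    return view->ServerBaseAddress + offset;
}
```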
PORT-RELATED FUNCTIONS
In this section, we discuss the port-related functions and the parameters passed to them in detail. We also prepared sample programs demonstrating short message passing and shared section message passing, which we discuss next.
NtCreatePort
int _stdcall
NtCreatePort(
PHANDLE PortHandle,
POBJECT_ATTRIBUTES ObjectAttributes,
DWORD MaxConnectInfoLength,
DWORD MaxDataLength,
DWORD Unknown);
This function creates a new port for communication. The name of the port and its parent directory in the object hierarchy are passed through the ObjectAttributes parameter. The MaxConnectInfoLength parameter specifies the maximum size of the information that can accompany a connection request. (Later in this section, we discuss the connection information.) The MaxDataLength parameter is the maximum size of a message that can pass through the port. In practice, both parameters are ignored: the operating system always sets the connection information length to 260 bytes and the data length to 328 bytes, which are the maximum allowed values for these parameters. Just make sure that the values you pass do not exceed these maximums, because otherwise the function returns an error. The unknown fifth parameter can be passed as zero. A handle to the newly created port is returned in PortHandle. The server process uses this port handle to accept connection requests from clients.
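The observed length handling can be summarized in a small model. sim_create_port is a hypothetical stand-in, not NtCreatePort() itself; it encodes only the behavior described above, and it assumes that passing exactly 260 or 328 is accepted since those are the maximum allowed values.

```c
#include <stdint.h>

/* Maximum values reported in the text for NtCreatePort(). */
#define SIM_MAX_CONNECT_INFO_LENGTH 260u
#define SIM_MAX_DATA_LENGTH         328u

typedef struct {
    uint32_t MaxConnectInfoLength;
    uint32_t MaxDataLength;
} SIM_PORT;

/* Returns 0 on success, -1 if a requested length exceeds its maximum.
 * On success the requested values are ignored and the maximums are used. */
int sim_create_port(SIM_PORT *port, uint32_t maxConnectInfoLength,
                    uint32_t maxDataLength)
{
    if (maxConnectInfoLength > SIM_MAX_CONNECT_INFO_LENGTH ||
        maxDataLength > SIM_MAX_DATA_LENGTH)
        return -1;
    port->MaxConnectInfoLength = SIM_MAX_CONNECT_INFO_LENGTH;
    port->MaxDataLength = SIM_MAX_DATA_LENGTH;
    return 0;
}
```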
NtConnectPort
int _stdcall
NtConnectPort(
PHANDLE PortHandle,
PUNICODE_STRING PortName,
PVOID Unknown1,
PLPCSECTIONINFO sectionInfo,
PLPCSECTIONMAPINFO mapInfo,
PVOID Unknown2,
PVOID ConnectInfo,
PDWORD pConnectInfoLength);
The client uses this function to establish LPC communication with the server. The name of the port to connect to is specified as a Unicode string in the PortName parameter. The third parameter, Unknown1, is not understood at this time, but it cannot be passed as NULL, because otherwise the function fails its validation checks. The fourth parameter, sectionInfo, is used only for shared section LPC. It is a pointer to a structure, described as follows:
typedef struct {
DWORD Length;
HANDLE SectionHandle;
DWORD Param1;
DWORD SectionSize;
DWORD ClientBaseAddress;
DWORD ServerBaseAddress;
} LPCSECTIONINFO, *PLPCSECTIONINFO;
The Length field in this structure specifies the size of the structure; it is always set to 24. The caller of this function (the client) fills in the SectionHandle and SectionSize fields in addition to Length. The CreateFileMapping() function can create a shared section of the required size. Upon return from NtConnectPort(), the ClientBaseAddress and ServerBaseAddress fields of the LPCSECTIONINFO structure contain the addresses where the section is mapped in the client address space and the server address space, respectively.
The next parameter to the NtConnectPort() function, mapInfo, is also used only for shared section LPC. This parameter is a pointer to a structure described as follows:
typedef struct {
DWORD Length;
DWORD SectionSize;
DWORD ServerBaseAddress;
} LPCSECTIONMAPINFO, *PLPCSECTIONMAPINFO;
This structure duplicates information in the LPCSECTIONINFO structure. The client needs to fill in only the Length field, which it always sets to 12, the size of the structure. We have not been able to decipher the significance of passing this structure to the NtConnectPort() function. Still, you have to pass a valid structure; if you pass a NULL pointer, the function fails. We have observed that the other two members of the structure, SectionSize and ServerBaseAddress, are zeroed on return from the function.
We do not know the purpose of the next parameter to the NtConnectPort() function (Unknown2), so set it to NULL.
The client can send some information to the server with the connection request. The server receives this information in the LPC message that NtReplyWaitReceivePort() returns for a connection request. The ConnectInfo parameter points to this connection information, and its size is passed through the pConnectInfoLength parameter, a pointer to a DWORD. The server can also send some information back to the client at connection time. This information is returned in the same ConnectInfo buffer, and the DWORD that pConnectInfoLength points to is set to the length of the returned connection information.
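The two structures can be checked against the sizes given above (24 and 12 bytes) using 32-bit layouts. This sketch assumes a 32-bit NT target, so HANDLE and the address fields are modeled as uint32_t; the SIM_ type names and sim_prepare_connect are illustrative, not part of any real API.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit model of LPCSECTIONINFO (HANDLE and addresses as uint32_t). */
typedef struct {
    uint32_t Length;
    uint32_t SectionHandle;
    uint32_t Param1;
    uint32_t SectionSize;
    uint32_t ClientBaseAddress;   /* filled in by NtConnectPort() */
    uint32_t ServerBaseAddress;   /* filled in by NtConnectPort() */
} SIM_LPCSECTIONINFO;

/* 32-bit model of LPCSECTIONMAPINFO. */
typedef struct {
    uint32_t Length;
    uint32_t SectionSize;
    uint32_t ServerBaseAddress;
} SIM_LPCSECTIONMAPINFO;

/* Fills in what the client must supply before calling NtConnectPort(). */
void sim_prepare_connect(SIM_LPCSECTIONINFO *info, SIM_LPCSECTIONMAPINFO *map,
                         uint32_t sectionHandle, uint32_t sectionSize)
{
    memset(info, 0, sizeof *info);
    memset(map, 0, sizeof *map);
    info->Length = (uint32_t)sizeof *info;   /* 24 */
    info->SectionHandle = sectionHandle;     /* e.g. from CreateFileMapping() */
    info->SectionSize = sectionSize;
    map->Length = (uint32_t)sizeof *map;     /* 12: the only field the client sets */
}
```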
NtReplyWaitReceivePort
int _stdcall
NtReplyWaitReceivePort(
HANDLE PortHandle,
PDWORD Unknown,
PLPCMESSAGE pLpcMessageOut,
PLPCMESSAGE pLpcMessageIn);
The server side of LPC uses this function to receive requests from clients and reply to them. The first parameter is the port handle obtained from the NtCreatePort() function. The second parameter, currently unknown, can be passed as NULL. The third parameter is the message that serves as the reply to the previous client request. This parameter can be NULL, in which case the function simply receives a request from a client. The fourth parameter is a pointer to an LpcMessage structure that is filled in with the request information on return from the function. Both the third and the fourth parameters are pointers to the LpcMessage structure, shown here:
typedef struct {
WORD ActualMessageLength;
WORD TotalMessageLength;
DWORD MessageType;
DWORD ClientProcessId;
DWORD ClientThreadId;
DWORD MessageId;
DWORD SharedSectionSize;
BYTE MessageData[MAX_MESSAGE_DATA];
} LPCMESSAGE, *PLPCMESSAGE;
The ActualMessageLength field is set to the size of the actual message stored in the MessageData field, whereas TotalMessageLength is set to the size of the entire LpcMessage structure, that is, the header fields plus the actual message in MessageData. The system, not the client or the server, sets the MessageType field. There are several message types; we detail the important ones:
LPC_REQUEST The server receives this type of message when a client sends a request using the
NtRequestWaitReplyPort() function. The server should reply to this message using the
NtReplyPort() function or the NtReplyWaitReceivePort() function. The server should not
reply to any messages other than the LPC_REQUEST messages. The
NtRequestWaitReplyPort() function waits until it gets the reply from the server and then
returns the reply message to the client. Effectively, the client thread that calls the
NtRequestWaitReplyPort() function hangs if the server does not send a reply message.
LPC_REPLY The client receives this type of message from the NtRequestWaitReplyPort() function,
when the server replies to the request.
LPC_DATAGRAM The server receives this type of message when a client sends a request using the
NtRequestPort() function. As the name of the message type implies, the client does not
get a reply from the server for this kind of message. If the server tries to reply to this
message using the NtReplyPort() function or the NtReplyWaitReceivePort() function, the
function fails and returns an error.
LPC_PORT_CLOSED The server receives this type of message when a client closes the port handle. If a client
dies without closing the port handle, the operating system closes the handle on behalf of
the client. Thus, the server gets the LPC_PORT_CLOSED message in any case and can use
it to free the per-client resources it allocates.
LPC_CLIENT_DIED The server receives this type of message when a client dies. Refer to the description of
the NtRegisterThreadTerminatePort() function for more information.
LPC_CONNECTION_REQUEST The corresponding server receives this type of message when a client tries to connect to
a port using the NtConnectPort() function.
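A server loop typically decides what to do with each received message based on MessageType. The sketch below encodes the rules from the list above; the SIM_ enumerators are hypothetical stand-ins whose numeric values are arbitrary, not the system's actual message-type codes.

```c
/* Message types from the list above; numeric values are arbitrary here. */
typedef enum {
    SIM_LPC_REQUEST,
    SIM_LPC_REPLY,
    SIM_LPC_DATAGRAM,
    SIM_LPC_PORT_CLOSED,
    SIM_LPC_CLIENT_DIED,
    SIM_LPC_CONNECTION_REQUEST
} SIM_LPC_TYPE;

typedef enum {
    SIM_ACTION_NONE,          /* nothing to send back */
    SIM_ACTION_REPLY,         /* reply via NtReplyPort() or NtReplyWaitReceivePort() */
    SIM_ACTION_DECIDE_ACCEPT, /* call NtAcceptConnectPort() with acceptIt 0 or nonzero */
    SIM_ACTION_FREE_CLIENT    /* release per-client resources */
} SIM_ACTION;

SIM_ACTION sim_server_action(SIM_LPC_TYPE type)
{
    switch (type) {
    case SIM_LPC_REQUEST:
        return SIM_ACTION_REPLY;            /* the only type a server may reply to */
    case SIM_LPC_CONNECTION_REQUEST:
        return SIM_ACTION_DECIDE_ACCEPT;
    case SIM_LPC_PORT_CLOSED:
        return SIM_ACTION_FREE_CLIENT;      /* arrives even if the client died */
    case SIM_LPC_DATAGRAM:                  /* replying would fail with an error */
    case SIM_LPC_REPLY:                     /* only clients receive replies */
    case SIM_LPC_CLIENT_DIED:               /* see NtRegisterThreadTerminatePort() */
    default:
        return SIM_ACTION_NONE;
    }
}
```

In a loop built on NtReplyWaitReceivePort(), a non-NULL reply message would be passed on the next call only after SIM_ACTION_REPLY.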
The next two fields in the LpcMessage structure are set by the system to the client's process ID and thread ID, respectively. The field after them is set to a unique message ID generated by the system. The server can rely on these fields because the operating system, not the client, sets them. These fields are meaningless in the messages received by the client and are therefore set to zero in the messages returned by the NtRequestWaitReplyPort() function.
Only shared-section LPC uses the SharedSectionSize field. The system sets this field to
the size of the shared section when it passes an LPC_CONNECTION_REQUEST type of
message to the server.
The last field holds the actual message and has a variable length. The client and the server can
choose to allocate only enough memory to hold the header fields and the actual message.
When passing a pointer to this structure for receiving a message, however, you must
allocate enough memory to fit the largest message that the process at the other end of the
port can send. Otherwise, you will receive an “Invalid Access” or similar fault. To be on the
safe side, always allocate room for the maximum-sized message when passing a pointer
for receiving a message.
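To make the length fields concrete, the following minimal sketch models the message layout with the fields in the order described in this section. The struct name LPC_MESSAGE_MODEL, the MAX_MESSAGE_DATA limit, and the 16-bit width of the two length fields are assumptions made for illustration (the book’s own definition lives in UNDOCNT.H); fixed-width types stand in for WORD and DWORD so that the sketch compiles without windows.h. Under this assumed layout the header occupies 0x18 bytes, which is consistent with the sample client setting TotalMessageLength to 0x20 for an 8-byte payload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed payload limit for this sketch only. */
#define MAX_MESSAGE_DATA 0x130

/* Hypothetical model of the LPC message layout described in the text. */
typedef struct {
    uint16_t ActualMessageLength;  /* bytes of MessageData in use */
    uint16_t TotalMessageLength;   /* header size + ActualMessageLength */
    uint32_t MessageType;          /* LPC_REQUEST, LPC_REPLY, ... */
    uint32_t ClientProcessId;      /* set by the system */
    uint32_t ClientThreadId;       /* set by the system */
    uint32_t MessageId;            /* set by the system */
    uint32_t SharedSectionSize;    /* used only by shared-section LPC */
    uint8_t  MessageData[MAX_MESSAGE_DATA];
} LPC_MESSAGE_MODEL;

/* Bytes that precede the variable-length message data. */
#define LPC_HEADER_SIZE offsetof(LPC_MESSAGE_MODEL, MessageData)

/* Fill in an outgoing message the way the sample client does: only the
 * two length fields and the data; the system-set fields stay zero. */
int lpc_model_prepare(LPC_MESSAGE_MODEL *msg, const void *data, size_t len)
{
    if (len > MAX_MESSAGE_DATA)
        return -1;                          /* payload would not fit */
    memset(msg, 0, sizeof(*msg));
    memcpy(msg->MessageData, data, len);
    msg->ActualMessageLength = (uint16_t)len;
    msg->TotalMessageLength = (uint16_t)(LPC_HEADER_SIZE + len);
    return 0;
}
```

Because a receive buffer must hold the largest message the peer can send, a receiver in this model would declare a whole LPC_MESSAGE_MODEL rather than just the header plus the payload it expects.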
NtAcceptConnectPort
int _stdcall
NtAcceptConnectPort(
PHANDLE PortHandle,
DWORD Unknown1,
PLPCMESSAGE pLpcMessage,
DWORD acceptIt,
DWORD Unknown3,
PLPCSECTIONMAPINFO mapInfo);
The first parameter receives a handle to the new, client-specific communication port. We have
not been able to decipher the second parameter, which is generally set to zero. The third
parameter is the LPC message returned to the client as the connection information from
the server. The fourth parameter, named acceptIt, is passed as 0 if the server cannot
accept the connection request and as a nonzero value if it can. The fifth parameter, not
yet deciphered, can be set to zero. The last parameter is a pointer to the LpcSectionMapInfo
structure, which is filled with the appropriate data upon return. We already explained the
members of this structure. This structure supplies shared-section information for future use
by the server when communicating with the client.
NtCompleteConnectPort
int _stdcall
NtCompleteConnectPort(
HANDLE PortHandle);
The server finishes the connection procedure with the NtCompleteConnectPort() function.
The only parameter to this function is the port handle returned by the previous call to the
NtAcceptConnectPort() function. The client waits in the NtConnectPort() function until
the server completes the connection procedure by calling the NtCompleteConnectPort()
function.
NtRequestWaitReplyPort
int _stdcall
NtRequestWaitReplyPort(
HANDLE PortHandle,
PLPCMESSAGE pLpcMessageIn,
PLPCMESSAGE pLpcMessageOut);
The client uses this function to send a request to the server and wait for the reply.
The first parameter is the port handle obtained via a previous call to the NtConnectPort()
function. The pLpcMessageIn parameter is a pointer to an LPC request message sent to the
server. The last parameter is a pointer to another LPC message structure that is filled with
the server’s reply message on return from the function.
NtListenPort
int _stdcall
NtListenPort(
HANDLE PortHandle,
PLPCMESSAGE pLpcMessage);
This very small function internally uses the NtReplyWaitReceivePort() function. Here we
present the pseudocode of this function:
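NtListenPort(HANDLE PortHandle,
             PLPCMESSAGE pLpcMessage)
{
    while (1) {
        rc = NtReplyWaitReceivePort(
                 PortHandle,
                 NULL,
                 NULL,
                 pLpcMessage);
        if (rc != 0)
            break;      /* receive failed: return the error */
        if (pLpcMessage->MessageType ==
            LPC_CONNECTION_REQUEST)
            break;      /* connection request received */
        /* any other message is silently discarded */
    }
    return rc;
}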
As you can see from this pseudocode, the NtListenPort() function ignores all messages
except connection requests. Therefore, you cannot use this function in a server that services
multiple clients. Such a server gets a mix of connection requests and other client requests,
and NtListenPort() would silently discard the requests from clients that are already connected.
The server needs to separate the connection requests from the other requests and process
each kind appropriately. If only a single client can connect at a time, the server can get the
connection request using the NtListenPort() function and then start
a loop to accept and process other client requests.
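The filtering behavior of NtListenPort() can be tried out without a kernel. In the following sketch, a hypothetical in-memory queue (FAKE_PORT) and receive function (fake_receive()) stand in for the port and for NtReplyWaitReceivePort(); the message-type enumerators are placeholders, not the real constant values, and the sketch assumes that a failed receive ends the loop.

```c
#include <stddef.h>

/* Placeholder message types for the simulation (not the real NT values). */
enum { MSG_REQUEST, MSG_DATAGRAM, MSG_PORT_CLOSED, MSG_CONNECTION_REQUEST };

/* Hypothetical stand-in for a port: a fixed queue of incoming message types. */
typedef struct {
    const int *types;
    size_t count;
    size_t next;
} FAKE_PORT;

/* Stand-in for NtReplyWaitReceivePort(): 0 on success, nonzero when empty. */
static int fake_receive(FAKE_PORT *port, int *type)
{
    if (port->next >= port->count)
        return 1;
    *type = port->types[port->next++];
    return 0;
}

/* Mirrors the NtListenPort() loop: drop everything but connection requests. */
int fake_listen(FAKE_PORT *port, int *type)
{
    int rc;

    for (;;) {
        rc = fake_receive(port, type);
        if (rc != 0)
            return rc;                  /* receive failed */
        if (*type == MSG_CONNECTION_REQUEST)
            return 0;                   /* found a connection request */
        /* any other message is silently lost */
    }
}
```

With a queue holding a request, a datagram, and then a connection request, the first two messages are consumed and lost, which is exactly why a server with several connected clients should not call NtListenPort().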
NtRequestPort
int _stdcall
NtRequestPort(
HANDLE PortHandle,
PLPCMESSAGE pLpcMessage);
This function just sends a message on the port and returns without waiting. The server thread
waiting on this port gets the message and does the required processing, but it does not
send any results back to the caller. In this case, the message type in the header is
LPC_DATAGRAM. A message sent using this function resembles a datagram in the sense
that the sender does not receive an acknowledgment.
NtReplyPort
int _stdcall
NtReplyPort(
HANDLE PortHandle,
PLPCMESSAGE pLpcMessage);
The server uses this function to send a reply to the client without blocking while it waits for
the next request. The first parameter to this function is the port handle, and the second
parameter is the reply message sent to the client.
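The rules for answering each message type can be summarized as a small decision function. In this sketch the function name required_answer() and the enumerators are hypothetical; the mapping itself follows this section: only LPC_REQUEST messages are answered with a reply, connection requests are answered through NtAcceptConnectPort() and NtCompleteConnectPort(), and datagrams and notifications get no answer (replying to a datagram fails).

```c
/* Placeholder message types (not the real NT constant values). */
enum { MSG_REQUEST, MSG_REPLY, MSG_DATAGRAM, MSG_PORT_CLOSED,
       MSG_CLIENT_DIED, MSG_CONNECTION_REQUEST };

/* How a server answers a received message. */
typedef enum {
    ANSWER_NONE,    /* no answer expected (datagrams, notifications) */
    ANSWER_REPLY,   /* NtReplyPort() or NtReplyWaitReceivePort() */
    ANSWER_ACCEPT   /* NtAcceptConnectPort() + NtCompleteConnectPort() */
} ANSWER;

ANSWER required_answer(int message_type)
{
    switch (message_type) {
    case MSG_REQUEST:
        return ANSWER_REPLY;
    case MSG_CONNECTION_REQUEST:
        return ANSWER_ACCEPT;
    default:
        return ANSWER_NONE;  /* replying to a datagram would fail */
    }
}
```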
NtRegisterThreadTerminatePort
int _stdcall
NtRegisterThreadTerminatePort(
HANDLE PortHandle);
If a client calls this function after connecting to a port, the operating system sends
the LPC_CLIENT_DIED message to the server when the calling client thread terminates. The
system also keeps its own reference to the port, so the port stays open even if the client
closes the port handle and keeps running. Therefore, the operating system sends the
LPC_PORT_CLOSED message after the LPC_CLIENT_DIED message, not when the client closes the port handle.
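The ordering of the two notifications follows from the extra reference the system holds. The following reference-counting model is a hypothetical simplification, not the kernel’s actual bookkeeping: registering adds one reference besides the client’s handle, LPC_CLIENT_DIED is delivered when the registered thread exits, and LPC_PORT_CLOSED is delivered when the last reference goes away. All names are illustrative.

```c
#include <stddef.h>

enum { NOTE_CLIENT_DIED = 1, NOTE_PORT_CLOSED = 2 };

/* Hypothetical model of one client connection as seen by the server. */
typedef struct {
    int registered;   /* NtRegisterThreadTerminatePort() was called */
    int handle_open;  /* client still holds its port handle */
    int port_refs;    /* client handle + system reference if registered */
    int notes[4];     /* notifications delivered to the server, in order */
    size_t nnotes;
} CLIENT_MODEL;

static void drop_ref(CLIENT_MODEL *c)
{
    if (--c->port_refs == 0)
        c->notes[c->nnotes++] = NOTE_PORT_CLOSED;  /* last reference gone */
}

void client_connect(CLIENT_MODEL *c, int register_terminate_port)
{
    c->registered = register_terminate_port;
    c->handle_open = 1;
    c->port_refs = register_terminate_port ? 2 : 1;
    c->nnotes = 0;
}

void client_close_handle(CLIENT_MODEL *c)
{
    if (c->handle_open) {
        c->handle_open = 0;
        drop_ref(c);
    }
}

void client_thread_exit(CLIENT_MODEL *c)
{
    if (c->registered) {
        c->notes[c->nnotes++] = NOTE_CLIENT_DIED;
        c->registered = 0;
        drop_ref(c);              /* system releases its reference */
    }
    client_close_handle(c);       /* the OS closes a handle left open */
}
```

In the model, a registered client that closes its handle early produces no notification until its thread exits, at which point LPC_CLIENT_DIED and then LPC_PORT_CLOSED arrive together.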
NtSetDefaultHardErrorPort
int _stdcall
NtSetDefaultHardErrorPort(
HANDLE PortHandle);
The CSRSS subsystem calls this function during its initialization. The NtRaiseHardError()
function, called in case of serious system errors, sends a message to the registered hard
error port; this is how the CSRSS subsystem can pop up a message when application startup
problems appear. The kernel keeps the hard error port in a single set of global variables, so
only one process can capture system errors. On Windows NT, this process is the Win32
subsystem. Calling this function requires a special privilege.
NtSetDefaultHardErrorPort(HANDLE PortHandle)
{
    if (PrivilegeNotHeld)
        return STATUS_PRIVILEGE_NOT_HELD;
    if (ExReadyForErrors == 0) {
        ExpDefaultErrorPort = PortHandle;
        ExpDefaultErrorPortProcess = CurrentProcess;
        ExReadyForErrors = 1;
    } else {
        return STATUS_UNSUCCESSFUL;
    }
    return STATUS_SUCCESS;
}
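The first-caller-wins behavior in this pseudocode can be expressed as a small user-mode model. The variables and the function are hypothetical stand-ins for the kernel globals and the system service, and the privilege check is reduced to a flag; the status values are the standard NTSTATUS codes STATUS_SUCCESS (0x00000000), STATUS_UNSUCCESSFUL (0xC0000001), and STATUS_PRIVILEGE_NOT_HELD (0xC0000061).

```c
#include <stdint.h>

#define STATUS_SUCCESS             0x00000000u
#define STATUS_UNSUCCESSFUL        0xC0000001u
#define STATUS_PRIVILEGE_NOT_HELD  0xC0000061u

/* Hypothetical stand-ins for ExReadyForErrors, ExpDefaultErrorPort,
 * and ExpDefaultErrorPortProcess. */
static int       ready_for_errors;
static uintptr_t default_error_port;
static uint32_t  default_error_port_process;

uint32_t model_set_default_hard_error_port(uintptr_t port,
                                           uint32_t caller_process,
                                           int caller_has_privilege)
{
    if (!caller_has_privilege)
        return STATUS_PRIVILEGE_NOT_HELD;
    if (ready_for_errors)
        return STATUS_UNSUCCESSFUL;   /* a port is already registered */
    default_error_port = port;
    default_error_port_process = caller_process;
    ready_for_errors = 1;
    return STATUS_SUCCESS;
}
```

A second registration fails even when the caller holds the privilege, which is why only one process (the Win32 subsystem on Windows NT) can capture hard errors.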
NtImpersonateClientOfPort
int _stdcall
NtImpersonateClientOfPort(
HANDLE PortHandle,
PLPCMESSAGE pLpcMessage);
A subsystem may need to perform some processing in the security context of the calling
thread. The NtImpersonateClientOfPort() function enables the server thread to assume the
security context of the client thread. The function uses the pLpcMessage parameter to
identify the process ID and thread ID of the client thread.
On the CD: The sample program can be found in the PORT.C file on the accompanying CD-ROM. The data prototypes
and structure definitions for port-related functions can be found in UNDOCNT.H, which is also on the CD-ROM.
PORT.C is a sample program demonstrating short message communication. When the program is
invoked without any parameters, it acts as the server. If invoked with any parameter, it acts
as a client (the parameter is a dummy and is ignored). You should start the program in
server mode first. The server-mode program first creates a port and then enters a
“receive request, process request, send reply” loop. It uses the
NtReplyWaitReceivePort() function to accept requests. Connection requests are
treated differently from other requests: for a connection request, the server thread
has to accept the connection and complete the connection sequence. For all other
requests, the server prints the message, inverts all the bits in the message data,
and sends the inverted message back as the reply.
Once the server is ready to accept connections, you can run another instance of the
program–this time in client mode. The client-mode program connects to the port created
by the server-mode instance. It first demonstrates the use of the NtRequestPort() function
to send a datagram. Then, the client sends a request and waits for a reply in a loop. You
can start multiple client sessions; the server portion of the program can handle multiple
client requests.
/***************************************************
 * port object
 ***************************************************/
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include "undocnt.h"
#include "print.h"
Apart from the regular header inclusions, the initial portion of the PORT.C file defines the
name of the message port used by the sample program. The name is a complete path
starting from the root of the object directory. Note that the name is a wide-character
(Unicode) string rather than an ANSI string, because we invoke the system services
directly and the system services understand only Unicode strings.
/*
 * Dummy server processing: inverts all the bits of
 * the message data, one DWORD at a time
 */
void
ProcessMessageData(PLPCMESSAGE pLpcMessage)
{
    DWORD *ptr;
    DWORD i;

    ptr = (DWORD *)pLpcMessage->MessageData;
    for (i = 0;
         i < pLpcMessage->ActualMessageLength / sizeof(DWORD);
         i++) {
        ptr[i] = ~ptr[i];
    }
    return;
}
This is a dummy processing function on the server side. It is passed the LPC request
message received by the server and returns the reply message in the same memory space.
The function simply inverts every bit of the message data, one DWORD at a time. Because
we only want to demonstrate how LPC works, we do not provide any intricate server
functionality. You can modify this function to implement the
functionality provided by your server.
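The following portable sketch performs the same transformation as ProcessMessageData() on a raw payload, so its effect can be checked without Windows. The function name invert_message_data() is hypothetical, and uint32_t stands in for DWORD. Like the original, it complements only whole 32-bit words, leaving any trailing bytes beyond a multiple of four unchanged; applying it twice restores the original data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Complement each whole 32-bit word of the payload in place. */
void invert_message_data(uint8_t *data, size_t actual_length)
{
    size_t i, words = actual_length / sizeof(uint32_t);

    for (i = 0; i < words; i++) {
        uint32_t w;
        memcpy(&w, data + i * sizeof w, sizeof w);  /* no unaligned access */
        w = ~w;
        memcpy(data + i * sizeof w, &w, sizeof w);
    }
}
```

For the datagram words 0xBABABABA and 0xCACACACA, the inverted values are 0x45454545 and 0x35353535.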
BOOL
ProcessConnectionRequest(
    PLPCMESSAGE LpcMessage,
    PHANDLE pAcceptPortHandle)
{
    HANDLE AcceptPortHandle;
    int rc;

    *pAcceptPortHandle = NULL;
    PrintMessage(LpcMessage);
    ProcessMessageData(LpcMessage);

    rc = NtAcceptConnectPort(
             &AcceptPortHandle,
             0,
             LpcMessage,
             1,       /* acceptIt: accept every request */
             0,
             NULL);   /* no shared section in this sample */
    if (rc != 0) {
        return FALSE;
    }
    printf("AcceptPortHandle=%x\n", AcceptPortHandle);

    rc = NtCompleteConnectPort(AcceptPortHandle);
    if (rc != 0) {
        CloseHandle(AcceptPortHandle);
        printf("NtCompleteConnectPort failed, rc=%x\n", rc);
        return FALSE;
    }
    *pAcceptPortHandle = AcceptPortHandle;
    return TRUE;
}
The server part of the program calls this function when it receives a connection request
from a client. This function receives the message containing the connection request and
returns the port handle specific to the client. The function first prints the message and then
calls the ProcessMessageData() function. As described earlier, the message data in a
connection request consists of nothing but the ConnectInfo passed to the NtConnectPort()
function by the client. Because the same message is then passed to NtAcceptConnectPort(),
the client gets the inverted ConnectInfo back as the connection information from the server.
In this function, we accept all the connection requests. You may want to modify this
function to selectively accept connection requests. For example, you might permit the
connection only for certain users or only if the client provides certain connection
information. If your server can accept only a single client at a time, you need to reject all
further connection requests. As described earlier, you can reject connection requests by
passing the acceptIt parameter as zero.
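A minimal sketch of such a policy follows. It computes the acceptIt value from the connect information and the number of clients already connected. The limit MAX_CLIENTS, the token EXPECTED_TOKEN, and the function name are hypothetical; a server that accepts only a single client at a time would simply set MAX_CLIENTS to 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CLIENTS    4             /* hypothetical connection limit */
#define EXPECTED_TOKEN 0x12345678u   /* hypothetical value a client must send */

/* Returns the value to pass as acceptIt: 1 to accept, 0 to reject. */
uint32_t decide_accept_it(const void *connect_info, size_t info_length,
                          unsigned connected_clients)
{
    uint32_t token;

    if (connected_clients >= MAX_CLIENTS)
        return 0;                    /* server is full */
    if (info_length < sizeof token)
        return 0;                    /* no token supplied */
    memcpy(&token, connect_info, sizeof token);
    return token == EXPECTED_TOKEN ? 1 : 0;
}
```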
BOOL
ProcessLpcRequest(
    HANDLE PortHandle,
    PLPCMESSAGE LpcMessage)
{
    int rc;

    PrintMessage(LpcMessage);
    ProcessMessageData(LpcMessage);
    rc = NtReplyPort(PortHandle, LpcMessage);
    if (rc != 0) {
        return FALSE;
    }
    return TRUE;
}
In this program, we chose to use two function calls to reply to a message and receive the
next message, instead of a single call to the NtReplyWaitReceivePort() function. The
ProcessLpcRequest() function, a small utility function, prints the received message,
processes it (inverts the bits by calling the ProcessMessageData() function), and sends
back the processed data as the reply using the NtReplyPort() function.
int
server(POBJECT_ATTRIBUTES ObjectAttr)
{
    BOOL RetVal;
    HANDLE PortHandle;
    int rc;
    LPCMESSAGE LpcMessage;

    rc = NtCreatePort(&PortHandle, ObjectAttr,
                      0x0, 0x00000);
    if (rc != 0) {
        return -1;
    }
    memset(&LpcMessage, 0, sizeof(LpcMessage));
    while (1) {
        HANDLE AcceptPortHandle;

        rc = NtReplyWaitReceivePort(PortHandle,
                                    NULL,
                                    NULL,
                                    &LpcMessage);
        if (rc != 0) {
            printf("NtReplyWaitReceivePort failed\n");
            CloseHandle(PortHandle);
            return -1;
        }
        RetVal = TRUE;
        switch (LpcMessage.MessageType) {
        case LPC_CONNECTION_REQUEST:
            RetVal = ProcessConnectionRequest(
                         &LpcMessage,
                         &AcceptPortHandle);
            break;
        case LPC_REQUEST:
            RetVal = ProcessLpcRequest(
                         PortHandle,
                         &LpcMessage);
            break;
        default:
            PrintMessage(&LpcMessage);
            break;
        }
        if (RetVal == FALSE) {
            break;
        }
    }
    return 0;
}
As described earlier, the same LPC demonstration program acts as the server and the
client. The main() function calls the server() function when the program is invoked
without any parameters. The server() function is passed a pointer to the
OBJECT_ATTRIBUTES structure that contains the object name of the communication port.
The function creates a port with this name and gets back a handle to the port.
As described earlier, the MaxConnectInfoLength and MaxDataLength parameters to the
NtCreatePort() function are ignored so we simply pass them as zero. The NtCreatePort()
function returns a zero on success and a nonzero value on failure.
After successfully creating the port, the server() function enters a receive-process-reply
loop. It uses the NtReplyWaitReceivePort() function to receive requests
from clients. Since we use this function only to receive requests, the pLpcMessageOut
parameter is passed as NULL. The NtReplyWaitReceivePort() function returns zero on
success, and the pLpcMessageIn buffer then contains the client request. This request can be
an LPC_CONNECTION_REQUEST, an LPC_DATAGRAM, an LPC_REQUEST, and so on. The
server processes each type of request differently. It processes an
LPC_CONNECTION_REQUEST by performing the connection protocol, which it accomplishes
by calling the ProcessConnectionRequest() function. For an LPC_REQUEST message, the
server needs to do the requested processing and reply to the request. Since we are not
implementing any significant functionality in the server, we just print the message, invert
the message bits, and return a reply. We do this in the ProcessLpcRequest() function.
For LPC_DATAGRAM messages, no reply is expected. These messages and all other
message types are simply printed.
The server side of the program loops continuously, receiving, processing, and replying to
client requests. We did not program an exit for the server part. This is typical of servers;
in Unix terminology, such long-running server processes are called daemons.
Servers generally start up with the system boot and continue processing client requests
until the system shuts down. You can stop our server by pressing Ctrl+C in its
command window or by using the Task Manager.
int
client(PUNICODE_STRING uString)
{
    HANDLE PortHandle;
    DWORD i;
    DWORD Value=0xFFFFFFFF;
    int rc;
    LPCMESSAGE LpcMessage;
    DWORD *ptr;

    printf("ClientProcessId=%x, ClientThreadId=%x\n",
           GetCurrentProcessId(),
           GetCurrentThreadId());

    rc = NtConnectPort(&PortHandle,
                       uString,
                       &Param3,
                       0,
                       0,
                       0,
                       ConnectDataBuffer,
                       &Size);
    if (rc != 0) {
        return -1;
    }
    printf("\n\n");

    rc = NtRegisterThreadTerminatePort(PortHandle);
    if (rc != 0) {
        CloseHandle(PortHandle);
        return -1;
    }

    /*
     * NtRequestPort
     */
    memset(&LpcMessage, 0, sizeof(LpcMessage));
    LpcMessage.ActualMessageLength=0x08;
    LpcMessage.TotalMessageLength=0x20;
    ptr=(DWORD *)LpcMessage.MessageData;
    ptr[0]=0xBABABABA;
    ptr[1]=0xCACACACA;
    rc=NtRequestPort(PortHandle, &LpcMessage);

    while (1) {
        memset(&LpcMessage, 0, sizeof(LpcMessage));
        LpcMessage.ActualMessageLength=0x08;
        LpcMessage.TotalMessageLength=0x20;
        ptr[0] = Value;
        ptr[1] = Value-1;

        /* The user is asked whether to quit; 'Y' ends the session */
        fflush(stdin);
        if (toupper(getchar()) == 'Y') {
            CloseHandle(PortHandle);
            break;
        }

        /*
         * NtRequestWaitReplyPort
         */
        rc = NtRequestWaitReplyPort(PortHandle,
                                    &LpcMessage,
                                    &LpcMessage);
        if (rc != 0) {
            printf("NtRequestWaitReplyPort failed, rc=%x\n", rc);
            return -1;
        }
        PrintMessage(&LpcMessage);
        Value -= 2;
    }
    return 0;
}
The client() function implements the client-side portion of the LPC sample. The function
prints its process ID and thread ID; you can match them with the process ID and thread ID
that the server prints from the messages it receives.
The client() function starts its job by connecting to the port created by the server process.
It passes six double words as the connectInfo. You can verify that the server receives
these words as the message data with the LPC_CONNECTION_REQUEST. Upon return from
the NtConnectPort() function, the client gets a handle to the port. Also, the connectInfo
buffer is filled with the message data passed to the NtAcceptConnectPort() function by the
server.
Next, the client calls the NtRegisterThreadTerminatePort() function, with the newly
acquired port handle as the parameter, so that the operating system sends an
LPC_CLIENT_DIED message over the port when the client terminates. A client needs to call
this function only if the server must know about the client’s death. We call this function
here to demonstrate the mechanism.
The client also demonstrates datagram communication via LPC. As described
earlier, the NtRequestPort() function sends LPC_DATAGRAM type requests. Note that the
client fills in only the message length fields and the actual message data; the operating
system fills in the remaining fields in the LPCMESSAGE structure before the message
passes to the server. The client() function sends two double words as the message data,
which the server prints upon receiving the message.
After demonstrating datagram communication, the client enters a “send request, wait for
reply” loop. Every time, before sending the request, it asks the user whether to
continue or quit. If the user wants to continue with the demonstration, the client sends a
sample request over the port using the NtRequestWaitReplyPort() function. The message data
consists of two double words, which the server inverts and sends back as the reply. The
NtRequestWaitReplyPort() function returns to the client after it gets the reply message from
the server. In this program, we use the same buffer to pass the request message and to
receive the reply message. You can use different buffers for this purpose.
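The data exchanged in the request loop follows a simple pattern: request n carries Value and Value-1, with Value starting at 0xFFFFFFFF and dropping by 2 each time, and the server complements both words. The following sketch, built around the hypothetical helper expected_reply(), computes the reply data the client should print for request n. Because ~(0xFFFFFFFF - k) equals k for 32-bit values, the replies simply count upward from zero.

```c
#include <stdint.h>

/* Fill reply[0..1] with what the sample server returns for request n
 * (n = 0 for the first request in the client loop). */
void expected_reply(uint32_t n, uint32_t reply[2])
{
    uint32_t value = 0xFFFFFFFFu - 2u * n;   /* Value after n decrements */
    uint32_t request[2] = { value, value - 1u };

    reply[0] = ~request[0];                  /* the server's inversion */
    reply[1] = ~request[1];
}
```

So the first reply carries 0x00000000 and 0x00000001, the second 0x00000002 and 0x00000003, and so on, which makes it easy to spot a lost or duplicated reply in the client’s output.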
int
main(int argc, char **argv)
{
    OBJECT_ATTRIBUTES ObjectAttr;
    UNICODE_STRING uString;
    int rc;

    memset(&ObjectAttr, 0, sizeof(ObjectAttr));
    ObjectAttr.Length = sizeof(ObjectAttr);
    RtlInitUnicodeString(&uString, PORTNAME);
    ObjectAttr.ObjectName = &uString;

    if (argc == 1) {
        /* No parameters: act as the server */
        rc = server(&ObjectAttr);
    } else {
        /* Any parameter: act as the client */
        rc = client(&uString);
    }
    return rc;
}
The main() function is simply a control function that calls either the server part
or the client part, depending on whether the user specifies a parameter. Before passing
control to one of these functions, the main() function initializes a UNICODE_STRING
and an OBJECT_ATTRIBUTES structure with the port name. These are passed as parameters to
the server() and client() functions.
Apart from the PORT.C file, the sample program contains a PRINT.C file and a PRINT.H
file. The PRINT.C file contains utility routines to print the LPCMESSAGE structure, and the
PRINT.H file contains the prototypes for these functions.
As with the short message LPC sample, the same program works as either the server or
the client, depending on whether a parameter is passed when invoking the program. You
should start the program in server mode first and, when the server is ready, start the
same program in client mode from another command window. The client creates a shared
section for passing parameters and receiving results. The client then establishes
communication with the server and asks the user for a string to send to the server as the
parameter. The client copies the string to the shared section and sends a message to the
server. Upon receiving the message, the server reverses the string in the shared section and
sends a reply. The client prints the reversed string after receiving the reply. The server lets
you run multiple client sessions simultaneously.
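#include <windows.h>
#include <stdio.h>
#include <string.h>
#include "..\port\print.h"

typedef struct {
    DWORD ServerBaseAddress;  /* section base in the server's address space */
    DWORD MessageOffset;      /* offset of the parameters within the section */
} SHAREDLPCMESSAGE, *PSHAREDLPCMESSAGE;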
This initial portion of the file contains, apart from the required include directives, a
couple of important definitions. The client creates the section and therefore determines
the size of the shared section. The server is informed of the section size at the
time of connection: the operating system sets the SharedSectionSize field in the LPC
message to the size of the shared section when it passes an LPC_CONNECTION_REQUEST
message to the server. The server might choose to reject the connection request if it
disagrees with the section size chosen by the client. For example, the section might
prove too small for the replies from the server.
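A server that enforces a minimum section size could make the decision as in the following sketch. It assumes, for illustration, that SharedSectionSize is zero when the client supplied no section at all; MIN_SECTION_SIZE, ACCEPT_PLAN, and plan_connection() are hypothetical. A section-less client is accepted with a NULL mapInfo (a non-NULL mapInfo would make NtAcceptConnectPort() fail for such a client), a large enough section is accepted with a mapInfo pointer, and a section that is too small is rejected.

```c
#include <stdint.h>

#define MIN_SECTION_SIZE 0x1000u  /* hypothetical: room for the largest reply */

typedef struct {
    uint32_t accept_it;      /* value for the acceptIt parameter */
    int      pass_map_info;  /* nonzero: pass an LPCSECTIONMAPINFO pointer */
} ACCEPT_PLAN;

/* Decide how to answer a connection request, given the SharedSectionSize
 * reported by the system (assumed to be 0 when no section was sent). */
ACCEPT_PLAN plan_connection(uint32_t shared_section_size)
{
    ACCEPT_PLAN plan = { 0, 0 };

    if (shared_section_size == 0) {
        plan.accept_it = 1;          /* short-message client: mapInfo NULL */
    } else if (shared_section_size >= MIN_SECTION_SIZE) {
        plan.accept_it = 1;          /* big enough: request the mapping info */
        plan.pass_map_info = 1;
    }                                /* otherwise reject: section too small */
    return plan;
}
```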
The section size definition is followed by the definition of the message that the client
sends over the port when it wants to invoke some service from the server. As
described earlier, the actual parameters are passed via the shared section; the message simply
indicates to the server that the client wants to invoke some service. In this sample
program, the port message carries the server-side base address of
the shared section and the offset of the copied parameters within the shared section. The
server in this sample program does not keep track of the shared-section information for
the connected clients. (Remember that the server is informed of the details of the shared
section when it accepts the connection request via NtAcceptConnectPort().) The server
depends solely on the shared-section information passed by the client with every LPC
request. In a production environment, where clients may be unreliable, the server should
either keep track of the shared-section information itself or verify the information
sent by the client.
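Following that advice, a server that records each client’s section at connection time can check every request before touching the section. The sketch below is one way to do it; the type and function names are hypothetical, and the server’s mapped view is represented by an ordinary char array. The check confirms the base address, keeps the offset inside the section, and requires a terminating NUL before the end of the section, so that a string function such as strrev() cannot run past it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What the client puts in the port message (mirrors SHAREDLPCMESSAGE). */
typedef struct {
    uint32_t ServerBaseAddress;
    uint32_t MessageOffset;
} SHARED_REQUEST;

/* What the server recorded from LPCSECTIONMAPINFO at connection time. */
typedef struct {
    uint32_t ServerBaseAddress;
    uint32_t SectionSize;
} SECTION_RECORD;

/* Returns 1 if the request names the recorded section and a NUL-terminated
 * string lies entirely inside it; `view` is the server's mapped view. */
int validate_shared_request(const SHARED_REQUEST *req,
                            const SECTION_RECORD *rec,
                            const char *view)
{
    if (req->ServerBaseAddress != rec->ServerBaseAddress)
        return 0;                       /* not this client's section */
    if (req->MessageOffset >= rec->SectionSize)
        return 0;                       /* offset past the end */
    return memchr(view + req->MessageOffset, '\0',
                  rec->SectionSize - req->MessageOffset) != NULL;
}
```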
/*
 * Locate the string in the shared section
 * and reverse it
 */
void
ProcessMessageData(PLPCMESSAGE pLpcMessage)
{
    PSHAREDLPCMESSAGE SharedLpcMessage;
    char *ServerView;

    SharedLpcMessage =
        (PSHAREDLPCMESSAGE)(pLpcMessage->MessageData);
    ServerView =
        ((char *)SharedLpcMessage->ServerBaseAddress) +
        SharedLpcMessage->MessageOffset;
    strrev(ServerView);
}
BOOL
ProcessConnectionRequest(
    PLPCMESSAGE LpcMessage,
    PHANDLE pAcceptPortHandle)
{
    LPCSECTIONMAPINFO mapInfo;
    HANDLE AcceptPortHandle;
    int rc;

    PrintMessage(LpcMessage);

    /*
     * Ask for the shared-section mapping information
     */
    memset(&mapInfo, 0, sizeof(mapInfo));
    mapInfo.Length=0x0C;
    rc = NtAcceptConnectPort(
             &AcceptPortHandle,
             0,
             LpcMessage,
             1,
             0,
             &mapInfo);
    if (rc != 0) {
        return FALSE;
    }
    printf("AcceptPortHandle=%x\n", AcceptPortHandle);
    printf("mapInfo.SectionSize=%x\n",
           mapInfo.SectionSize);
    printf("mapInfo.ServerBaseAddress=%x\n",
           mapInfo.ServerBaseAddress);

    rc = NtCompleteConnectPort(AcceptPortHandle);
    if (rc != 0) {
        printf("NtCompleteConnectPort failed, rc=%x\n", rc);
        return FALSE;
    }
    *pAcceptPortHandle = AcceptPortHandle;
    return TRUE;
}
The ProcessConnectionRequest() function here closely resembles the one in the short message
LPC sample. The only difference between the two functions is the value they pass for the
mapInfo parameter to NtAcceptConnectPort(). If the server passes a non-NULL value for
the mapInfo parameter and the client has not sent the shared section information with the
connection request, the call fails. Therefore, the ProcessConnectionRequest() function in
the short message LPC sample passes NULL as the mapInfo parameter. Here, the
ProcessConnectionRequest() function passes a pointer to the LPCSECTIONMAPINFO
structure, where it receives the information about the shared section for use in parameter
passing. The sample program does not use this information. A real server might keep track
of the shared-section information per client; for example, it can maintain a hash table
indexed by the client thread ID. The server can later retrieve the shared-section
information from the hash table whenever it receives an LPC request. In this sample
program, the client sends the shared section information, with every LPC request, as a
part of the message sent over the port.
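A per-client table like the one just suggested could look like the following sketch: a fixed-size, open-addressing hash table keyed by client thread ID, filled from the LPCSECTIONMAPINFO data when a connection is accepted and consulted for each LPC_REQUEST (the ClientThreadId field, set by the system, supplies the key). The table size, the Fibonacci hash, and all names are hypothetical choices; thread ID 0 marks an empty slot, and removing entries (for example, on LPC_PORT_CLOSED) is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SLOTS 64   /* hypothetical capacity; must be 2^6 for hash_tid() */

typedef struct {
    uint32_t ClientThreadId;     /* 0 marks an empty slot */
    uint32_t ServerBaseAddress;  /* from LPCSECTIONMAPINFO */
    uint32_t SectionSize;
} SECTION_ENTRY;

static SECTION_ENTRY section_table[TABLE_SLOTS];

/* Fibonacci hashing: the top 6 bits of the product pick one of 64 slots. */
static size_t hash_tid(uint32_t tid)
{
    return (uint32_t)(tid * 2654435761u) >> 26;
}

/* Insert or update; returns 0 on success, -1 if the table is full. */
int section_put(uint32_t tid, uint32_t base, uint32_t size)
{
    size_t i, slot;

    for (i = 0; i < TABLE_SLOTS; i++) {
        slot = (hash_tid(tid) + i) & (TABLE_SLOTS - 1);  /* linear probing */
        if (section_table[slot].ClientThreadId == 0 ||
            section_table[slot].ClientThreadId == tid) {
            section_table[slot].ClientThreadId = tid;
            section_table[slot].ServerBaseAddress = base;
            section_table[slot].SectionSize = size;
            return 0;
        }
    }
    return -1;
}

/* Look up the section recorded for a client thread; NULL if unknown. */
const SECTION_ENTRY *section_get(uint32_t tid)
{
    size_t i, slot;

    for (i = 0; i < TABLE_SLOTS; i++) {
        slot = (hash_tid(tid) + i) & (TABLE_SLOTS - 1);
        if (section_table[slot].ClientThreadId == tid)
            return &section_table[slot];
        if (section_table[slot].ClientThreadId == 0)
            return NULL;                 /* reached an empty slot */
    }
    return NULL;
}
```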
int
server(POBJECT_ATTRIBUTES ObjectAttr)
{
    HANDLE PortHandle;
    int rc;
    LPCMESSAGE LpcMessage;
    HANDLE AcceptPortHandle;
    BOOL FirstTime=TRUE;

    rc = NtCreatePort(&PortHandle, ObjectAttr,
                      0x0, 0x00000);
    if (rc != 0) {
        return -1;
    }
    memset(&LpcMessage, 0, sizeof(LpcMessage));
    while (1) {
        if ((FirstTime) ||
            (LpcMessage.MessageType != LPC_REQUEST)) {
            /*
             * No reply is due: only wait for the next
             * message.
             */
            rc = NtReplyWaitReceivePort(
                     PortHandle,
                     NULL,
                     NULL,
                     &LpcMessage);
            FirstTime=FALSE;
        } else {
            /*
             * Reply to the previous LPC_REQUEST and wait
             * for the next message in a single call.
             */
            rc = NtReplyWaitReceivePort(
                     PortHandle,
                     0,
                     &LpcMessage,
                     &LpcMessage);
        }
        if (rc != 0) {
            printf("NtReplyWaitReceivePort failed, rc=%x\n", rc);
            return -1;
        }
        if (LpcMessage.MessageType ==
            LPC_CONNECTION_REQUEST) {
            ProcessConnectionRequest(&LpcMessage,
                                     &AcceptPortHandle);
        } else if (LpcMessage.MessageType == LPC_REQUEST) {
            /*
             * Reverse the string in the shared section; the
             * reply goes out in the next iteration.
             */
            PrintMessage(&LpcMessage);
            ProcessMessageData(&LpcMessage);
        }
    }
    return 0;
}
The server() function implements the server-side functionality of the sample program. It
starts by creating a port object. After successfully creating the port, the function enters
a “receive request, process request, send reply” loop. The server continues in the loop
until you terminate it by pressing Ctrl+C or with the help of the Task Manager.
The server() function receives a new request and replies to the previous request using a
single call to the NtReplyWaitReceivePort() function. A reply needs to be sent only if the
previous request is of type LPC_REQUEST. Hence, the function calls the
NtReplyWaitReceivePort() function with a NULL pLpcMessageOut parameter when it receives
the first request or when the previous request is not of type LPC_REQUEST. Otherwise, the
message received from the client is passed as the pLpcMessageOut parameter. In both cases,
upon return from the NtReplyWaitReceivePort() function, the LpcMessage structure contains
the next request sent by the client.
The server handles only LPC_REQUEST and LPC_CONNECTION_REQUEST type messages;
other messages are ignored. For LPC_CONNECTION_REQUEST messages, the server
establishes a communication channel with the client by calling the
ProcessConnectionRequest() function. For LPC_REQUEST messages, the server prints the
message and calls the ProcessMessageData() function, which reverses the string passed
as a parameter in the shared section. The reply to the LPC_REQUEST message is sent by a
call to the NtReplyWaitReceivePort() function in the next iteration.
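The reply-on-next-receive scheme can be checked with a small simulation. In the sketch below, passes_reply() reproduces the test that server() makes before each NtReplyWaitReceivePort() call, and simulate_replies() runs it over a hypothetical sequence of received message types (placeholder enumerators, not the real constants), modeling one extra receive call at the end. Every LPC_REQUEST ends up replied to exactly once, one iteration after it arrives.

```c
#include <stddef.h>

/* Placeholder message types (not the real NT constant values). */
enum { MSG_REQUEST, MSG_DATAGRAM, MSG_CONNECTION_REQUEST };

/* The test server() makes before each receive: pass the previous message
 * as the reply only if it was a request. */
int passes_reply(int first_time, int previous_type)
{
    return !first_time && previous_type == MSG_REQUEST;
}

/* Run the loop over `count` received messages and return how many receive
 * calls also carried a reply. One extra call is modeled at the end, since
 * the last request is answered by the call that waits for the next message. */
size_t simulate_replies(const int *types, size_t count)
{
    size_t replies = 0, i;
    int first_time = 1, previous = -1;

    for (i = 0; i <= count; i++) {
        if (passes_reply(first_time, previous))
            replies++;
        first_time = 0;
        if (i < count)
            previous = types[i];   /* message received by call i */
    }
    return replies;
}
```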
int
client(PUNICODE_STRING uString)
{
    HANDLE hFileMapping;
    LPCSECTIONINFO sectionInfo;
    LPCSECTIONMAPINFO mapInfo;
    DWORD ServerBaseAddress;
    DWORD ClientBaseAddress;
    char *ClientView;
    HANDLE PortHandle;
    int rc;
    LPCMESSAGE LpcMessage;

    /* Unnamed section backed by the paging file ((HANDLE)-1) */
    hFileMapping = CreateFileMapping(
                       INVALID_HANDLE_VALUE,
                       NULL,
                       PAGE_READWRITE,
                       0,
                       SHARED_SECTION_SIZE,
                       NULL);
    if (hFileMapping == NULL) {
        return -1;
    }
    memset(&sectionInfo, 0, sizeof(sectionInfo));
    memset(&mapInfo, 0, sizeof(mapInfo));
    sectionInfo.Length = 0x18;
    sectionInfo.SectionHandle = hFileMapping;
    sectionInfo.SectionSize = SHARED_SECTION_SIZE;
    mapInfo.Length = 0x0C;

    printf("ClientProcessId=%x, ClientThreadId=%x\n",
           GetCurrentProcessId(),
           GetCurrentThreadId());

    rc = NtConnectPort(
             &PortHandle,
             uString,
             &Param3,
             &sectionInfo,
             &mapInfo,
             NULL,
             NULL,
             NULL);
    if (rc != 0) {
        return -1;
    }
    printf("PortHandle=%x\n", PortHandle);
    printf("ClientBaseAddress=%x\n",
           sectionInfo.ClientBaseAddress);
    printf("ServerBaseAddress=%x\n",
           sectionInfo.ServerBaseAddress);
    ServerBaseAddress =
        sectionInfo.ServerBaseAddress;
    ClientBaseAddress =
        sectionInfo.ClientBaseAddress;

    while (1) {
        int MessageOffset = 0;
        PSHAREDLPCMESSAGE SharedLpcMessage;

        printf("Enter a string (\"quit\" to exit): ");
        gets(MessageString);
        if (stricmp(MessageString, "quit") == 0) {
            CloseHandle(PortHandle);
            return 0;
        }
        printf("Enter the offset within the shared section: ");
        fflush(stdin);
        scanf("%d", &MessageOffset);
        if ((MessageOffset+strlen(MessageString)) >=
            SHARED_SECTION_SIZE) {
            printf("The string does not fit in the shared "
                   "memory window\n");
            return -1;
        }
        memset(&LpcMessage, 0, sizeof(LpcMessage));
        LpcMessage.ActualMessageLength=0x08;
        LpcMessage.TotalMessageLength=0x20;
        SharedLpcMessage =
            (PSHAREDLPCMESSAGE)(LpcMessage.MessageData);
        printf("ServerBaseAddress=%x\n",
               ServerBaseAddress);
        SharedLpcMessage->ServerBaseAddress =
            ServerBaseAddress;
        SharedLpcMessage->MessageOffset =
            MessageOffset;
        /* The same offset, seen through the client's view */
        ClientView = (char *)ClientBaseAddress +
            MessageOffset;
        strcpy(ClientView, MessageString);

        printf("Sending request and waiting for "
               "reply....\n");
        rc=NtRequestWaitReplyPort(
               PortHandle,
               &LpcMessage,
               &LpcMessage);
        if (rc != 0) {
            printf("NtRequestWaitReplyPort failed, rc=%x\n", rc);
            return -1;
        }
        PrintMessage(&LpcMessage);
        // The server has reversed the string in place
        printf("Reversed string: %s\n", ClientView);
    }
    return 0;
}
The client() function, which encompasses the client-side functionality of the sample
program, differs substantially from the client() function in the short message LPC sample,
because the client performs most of the shared-section handling.
The client() function starts by creating a shared section with the CreateFileMapping()
API function. Note that the section is created with read+write permissions. Also note that
passing -1 (INVALID_HANDLE_VALUE) as the file handle creates a section backed by the system
paging file rather than by a disk file; because the name parameter is NULL, the section is
also unnamed. You can create a section by mapping a disk file, but it is not necessary. The
function passes the section handle, returned by the CreateFileMapping() function, to the
NtConnectPort() function via the sectionInfo parameter. The NtConnectPort() function
maps the shared section into the client’s as well as the server’s address space before sending a
connection request to the server. The NtConnectPort() function returns after successfully
establishing a communication channel with the server. Upon return, the sectionInfo
structure contains the information about the shared-section mapping. The function also
returns the handle to the LPC port, which the client uses for issuing requests.
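Because the section is mapped at different addresses in the two processes, the only value that means the same thing on both sides is an offset from the start of the section. The following sketch shows the translation the sample relies on; the function name and the addresses in the test are hypothetical, and addresses are plain 32-bit integers, as in the sample’s DWORD fields.

```c
#include <stdint.h>

/* Translate an address inside the client's view of the section into the
 * matching address in the server's view. Returns 0 on success, -1 if the
 * address lies outside the section. */
int client_to_server_address(uint32_t client_address,
                             uint32_t client_base, uint32_t server_base,
                             uint32_t section_size, uint32_t *server_address)
{
    uint32_t offset;

    if (client_address < client_base)
        return -1;
    offset = client_address - client_base;   /* same on both sides */
    if (offset >= section_size)
        return -1;
    *server_address = server_base + offset;
    return 0;
}
```

This is why the sample sends the server-side base address plus an offset rather than a client pointer: a raw client address would point to unrelated memory in the server.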
After the connection is established, the client enters a “send request, wait for
reply” loop. The client asks the user for a string that it sends to the server as the
parameter. (If you enter “quit,” the client exits.) The client also asks for the offset within
the shared section at which to place the string. After receiving these inputs, the client
copies the given string at the specified offset in the shared section. It then fills in an LPC
message with the base address of the shared section in the server’s address space and the
offset of the string within the shared section. The client sends the LPC message to the
server over the port by calling the NtRequestWaitReplyPort() function. Upon receiving the
message, the server reverses the string and sends a reply message. The client prints the
reversed string upon
return from the NtRequestWaitReplyPort() function.
int main(int argc, char **argv)
{
    OBJECT_ATTRIBUTES ObjectAttr;
    UNICODE_STRING uString;
    int rc;

    memset(&ObjectAttr, 0, sizeof(ObjectAttr));
    ObjectAttr.Length=sizeof(ObjectAttr);
    ObjectAttr.ObjectName=&uString;
    RtlInitUnicodeString(&uString, PORTNAME);
    if (argc == 1) {
        /* No command line parameters: run as the server */
        rc = server(&ObjectAttr);
    } else {
        /* Command line parameter present: run as the client */
        rc = client(&uString);
    }
    return rc;
}
As in the short message LPC sample, the main() function in this sample program does
not contain much code. It simply acts as a control function that calls either the
server() function or the client() function depending on whether the program is invoked
with command line parameters. The program also uses the PRINT.H and PRINT.C files for
printing the LPC messages.
QUICK LPC
Quick LPC is the fastest form of LPC. Apart from that, Quick LPC has some peculiarities.
For one, Quick LPC does not use port objects. Second, Quick LPC is used exclusively by
the Win32 subsystem. The Windows NT kernel supports only a single server (per client)
using Quick LPC; the Win32 subsystem occupies this slot. Therefore, if you want to use
Quick LPC, you need to modify the kernel a bit. (Note that until now, we presented only
user-level code in this chapter.) However, these peculiarities can seem puzzling without
some background, so we now describe Quick LPC in detail.
Another problem with the regular LPC is that the context switching between the client
thread and the server thread happens in an “uncontrolled” manner. Typically, a client
sends a request on the port and waits for a response from the server (except while sending
datagrams using the NtRequestPort() function). While the client thread waits on the port
for a reply, the thread scheduler searches for the most eligible thread for execution. More
often than not, the thread it selects is not the server thread. In other words, the server
thread is not necessarily scheduled as soon as the request arrives on the port. Similarly,
the client thread may not be scheduled immediately after the subsystem sends a reply.
Quick LPC overcomes both of these disadvantages. The first disadvantage is
overcome by creating a dedicated server thread per client thread. The second
disadvantage is overcome by using a kernel object named an event pair, which serves as
the backbone of Quick LPC. As its name implies, an event pair consists of two event
objects, called the high event and the low event. The NT kernel provides functions that
allow a thread to signal one event of the pair and wait on the other in a single atomic
operation. The event pair object also guarantees that the thread waiting on the signaled
event is the next thread to be scheduled.
Note: Two sets of functions operate on the event pair. One set, the NtSetHighWaitLowEventPair() and
NtSetLowWaitHighEventPair() functions, implements the regular sleep-wakeup protocol and does not
guarantee immediate thread scheduling. In this chapter, we discuss the other set of functions, which
guarantee the immediate scheduling of the signaled thread.
The event pair object takes care of the “controlled” thread switching, but it provides no
mechanism for passing parameters and return values. Quick LPC achieves this with a
dedicated shared section for each client thread: the Win32 subsystem creates a
dedicated section object and maps it into the address space of both the client and the
subsystem processes. The client thread fills in the parameters in the shared area before
passing control to the server thread, and similarly the server thread copies the results
into the shared area before returning control to the client thread.
Naturally, you may ask, “Why is Quick LPC restricted to the Win32 subsystem? Why
can’t it operate as a general-purpose Inter-Process Communication mechanism?” The
reason is that you cannot call the functions KiSetLowWaitHighThread() and
KiSetHighWaitLowThread() from a user-mode process directly. Windows NT reserves two
software interrupts for this purpose. Interrupt 0x2C calls the function
KiSetLowWaitHighThread() and interrupt 0x2B calls the function KiSetHighWaitLowThread().
These two interrupt routines operate on a default event pair object. The Thread
Environment Block (TEB) maintains a pointer to this default event pair. You can use the
NtSetInformationThread() function to set this pointer. Since each thread can be associated
with only one event pair object, only one server thread can use Quick LPC with a given
client thread; for most applications, that server thread belongs to the Win32 subsystem.
However, non-Win32 applications, or for that matter non-GUI applications, can still use
Quick LPC for general-purpose communication.
Following the usual practice in this chapter, the same sample program acts as the server
or the client depending on whether you pass a command line parameter to the program.
You should first start the program in client mode. The client prints its own process ID
and thread ID. The server needs this information to set up the event pair object. After
you start the program in server mode, it asks you for the process ID and the thread ID
of the client. After setting up the event pair, the server issues INT 2CH and waits for
a client request. Meanwhile, the client waits for a user keystroke. After getting a keystroke
from the user, the client issues an INT 2BH, which switches execution from the client
thread to the server thread. The server prints a message indicating that it is
scheduled and then waits for a keystroke. Upon receiving the keystroke, it switches
control back to the client by issuing INT 2CH again. This continues until you kill the
server and the client by pressing Ctrl+C or using the Task Manager.
The implementation, for both client and server, resides in a single file, QLPC.C, which we
describe in detail in the next section.
#include <windows.h>
#include <stdio.h>
#include "..\include\undocnt.h"
#define EVENTPAIRNAME L"\\MyEventPair"
Apart from the usual header inclusions, the initial portion of the QLPC.C file defines the
name of the event pair used by the sample program to demonstrate “controlled” thread
switching. We create the event pair at the root of the object directory. If you want to
create several objects in the object tree, we suggest you create these objects under an
application-specific directory.
int server()
{
    HANDLE EventPairHandle;
    HANDLE ClientProcessHandle;
    HANDLE ClientThreadHandle;
    HANDLE ClientEventPairHandle;
    DWORD ClientPid, ClientTid;
    OBJECT_ATTRIBUTES ObjectAttr;
    UNICODE_STRING uString;
    DWORD OpenThreadParam[2];
    int rc;

    /* Create the named event pair */
    memset(&ObjectAttr, 0, sizeof(ObjectAttr));
    ObjectAttr.Length = sizeof(ObjectAttr);
    RtlInitUnicodeString(&uString, EVENTPAIRNAME);
    ObjectAttr.ObjectName = &uString;
    rc = NtCreateEventPair(
            &EventPairHandle,
            STANDARD_RIGHTS_ALL,
            &ObjectAttr);
    if (rc == 0) {
        printf("EventPairHandle=%x\n", EventPairHandle);
    } else {
        printf("NtCreateEventPair failed, rc=%x\n", rc);
        return -1;
    }

    /* Associate the event pair with the server thread
     * (ThreadInformationClass 8) */
    rc = ZwSetInformationThread(
            GetCurrentThread(),
            8,
            &EventPairHandle,
            4);
    if (rc != 0) {
        printf("ZwSetInformationThread failed, rc=%x\n", rc);
        return -1;
    }

    /* The user enters the IDs printed by the client */
    printf("Enter client process ID: ");
    scanf("%lu", &ClientPid);
    printf("Enter client thread ID: ");
    scanf("%lu", &ClientTid);

    ClientProcessHandle = OpenProcess(
            PROCESS_ALL_ACCESS,
            FALSE,
            ClientPid);
    if (ClientProcessHandle == NULL) {
        rc = GetLastError();
        printf("OpenProcess failed, rc=%x\n", rc);
        return -1;
    }

    /* NtOpenThread identifies the thread by (process ID, thread ID) */
    memset(&ObjectAttr, 0, sizeof(ObjectAttr));
    ObjectAttr.Length = sizeof(ObjectAttr);
    OpenThreadParam[0] = ClientPid;
    OpenThreadParam[1] = ClientTid;
    rc = NtOpenThread(
            &ClientThreadHandle,
            THREAD_ALL_ACCESS,
            &ObjectAttr,
            OpenThreadParam);
    if (rc != 0) {
        printf("NtOpenThread failed, rc=%x\n", rc);
        return -1;
    }
    printf("ClientProcessHandle = %x\n",
            ClientProcessHandle);
    printf("ClientThreadHandle = %x\n",
            ClientThreadHandle);

    /* Give the client process its own handle to the event pair */
    rc = DuplicateHandle(
            GetCurrentProcess(),
            EventPairHandle,
            ClientProcessHandle,
            &ClientEventPairHandle,
            0,
            FALSE,
            DUPLICATE_SAME_ACCESS);
    if (rc == FALSE) {
        rc = GetLastError();
        printf("DuplicateHandle failed, rc=%x\n", rc);
        return -1;
    }
    printf("ClientEventPairHandle = %x\n",
            ClientEventPairHandle);

    /* Associate the event pair with the client thread */
    rc = ZwSetInformationThread(
            ClientThreadHandle,
            8,
            &EventPairHandle,
            4);
    if (rc != 0) {
        printf("ZwSetInformationThread failed, rc=%x\n", rc);
        return -1;
    }

    while (1) {
        DWORD ret_val;

        /* Wake the client and wait for its next request;
         * the interrupt routine returns its status in EAX */
        _asm {
            int 2Ch
            mov ret_val, eax
        }
        if (ret_val != 0) {
            printf("int 2C failed, rc=%x\n", ret_val);
        } else {
            printf("int 2C returned\n");
        }
        getchar();
    }
    return 0;
}
The server() function creates a named event pair object. It receives a handle to the newly
created event pair upon successful creation of the object. Next, it establishes an
association between the event pair and the server thread (the current thread). The server
uses the ZwSetInformationThread() function to associate the event pair with the thread.
This function is documented in the Windows NT DDK, but you can also call it from a
user-mode, nondriver application. Its prototype is as follows:
NTSTATUS
ZwSetInformationThread(
    HANDLE ThreadHandle,
    THREADINFOCLASS ThreadInformationClass,
    PVOID ThreadInformation,
    ULONG ThreadInformationLength);
As described earlier, each thread points to its associated event pair object, and the
INT 2BH/INT 2CH issued by a thread operates on that event pair object. The
operating system stores the pointer to the associated event pair in the Thread
Environment Block of the thread, and you can set it using the ZwSetInformationThread()
function. The ThreadInformationClass for the event pair pointer is 8. The actual
information to set is the handle of the event pair object. We pass 4 as the
ThreadInformationLength parameter because that is the size of a handle in (32-bit)
Windows NT.
The server also needs to associate the event pair with the client thread, but this is not as
simple as setting up the association for the current thread. First, the server gets hold of
handles to the client process and the client thread. For this, it needs the client’s process
ID and thread ID, which the user enters. The function uses the OpenProcess() API
function to get a handle to the client process and the NtOpenThread() function to get a
handle to the client thread. It then duplicates the event pair handle into the client process
with DuplicateHandle() and calls ZwSetInformationThread() on the client thread handle.
Note: The server process should have security rights to open the client process.
After setting up the Quick LPC channel, the server can accept requests from the
client. It goes into a loop, blocking in INT 2CH and notifying the user whenever
it gets a request from the client. The server then waits for a keystroke and issues INT 2CH
again. This blocks the server thread and releases the client thread for execution.
We use inline assembly to issue the software interrupt. Note that the interrupt routine
for interrupt 0x2C stores the return value in the EAX register.
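int client()
{
    printf("Client process ID = %lu\n",
            GetCurrentProcessId());
    printf("Client thread ID = %lu\n",
            GetCurrentThreadId());
    /* Wait until the server has finished initialization */
    getchar();
    while (1) {
        DWORD ret_val;

        /* Wake the server and wait for its reply;
         * the interrupt routine returns its status in EAX */
        _asm {
            int 2Bh
            mov ret_val, eax
        }
        if (ret_val != 0) {
            printf("int 2B failed, rc=%x\n", ret_val);
        } else {
            printf("int 2B returned\n");
        }
        getchar();
    }
    return 0;
}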
The client() function is much simpler than the server() function because the server
does all of the Quick LPC initialization. The client just prints its process ID and thread
ID, which you then enter into the server. After the initialization is complete,
the server waits for a client request in INT 2CH. You indicate the end of
initialization to the client by pressing a key. After receiving the keystroke, the client
issues INT 2BH, releasing the server thread for execution. The client then blocks and is
rescheduled only when the server issues INT 2CH. The client waits for a keystroke from
the user before issuing another INT 2BH.
We use inline assembly to issue the software interrupt. Note that the interrupt routine
for interrupt 0x2B stores the return value in the EAX register.
int main(int argc, char **argv)
{
    int rc;

    if (argc == 1) {
        rc = server();
    } else {
        rc = client();
    }
    return rc;
}
The main() function in this sample program is the control center. It calls the server()
function if you invoke the program without any parameters; otherwise, it calls the
client() function.
Whenever a client starts, it connects to the server over the port and sends an LPC request
containing its process ID and thread ID. The server, upon receiving the request, initializes
an event pair object and creates a new thread to handle the new client. A shared section
also needs to be created and mapped in the server address space, as well as the client
address space. The server can do it explicitly, or it can use the shared-section LPC so that
the client creates the section and the system itself takes care of the mapping.
After setting up the communication channel like this, the main server thread sends a reply
message to the client indicating that everything is set up. Now, the main server thread
can freely accept more connection requests from clients. The newly created thread waits
for the client requests by issuing INT 2CH. After the Quick LPC channel is established, the
client can copy the parameters to the shared area and issue INT 2BH whenever it needs to
invoke some service from the server.
As a result of the software interrupt, the server thread is scheduled for execution. The
server thread reads the parameters from the shared area, processes the request, copies
the results to the shared area, and invokes INT 2CH. The software interrupt causes the
server thread to sleep, and the client thread is scheduled for execution. This continues
until the client thread closes the port handle or dies. At that point, the main server thread
receives an LPC_HANDLE_CLOSED message over the port. Upon receiving the message, the main
thread releases all resources allocated for the client; in other words, it destroys the
shared-section mapping, kills the thread handling the particular client, destroys the event
pair handle, and so on.
The sample program presented in the previous section works for console applications
under Windows NT 3.51. The program does not work for GUI applications because the
Win32 subsystem also sets the event pair handle in the Thread Environment Block (TEB),
overwriting the event pair handle set by our program. The Win32 subsystem sets the event
pair handle in the TEB when the thread makes the first GUI call. One fact in our favor is
that the event pair handle is maintained per thread. Therefore, you can work around this
problem easily by dedicating a separate client thread to communicating with the server.
The other threads in the application can be GUI threads that access the GUI
functions offered by the Win32 subsystem and use Quick LPC to talk to the Win32
subsystem. Just make sure that the thread using Quick LPC to talk to your
own server does not make any GUI calls.
Note: Our sample program does not work in Windows NT 4.0 because interrupt 0x2B serves a different purpose there.
In Windows NT 4.0, the Win32 subsystem functionality moves into the kernel-mode driver WIN32K.SYS, and the
Win32 GUI calls are processed as system calls. Therefore, the Win32 subsystem no longer needs the Quick LPC
interface, which also removes the need for interrupts 0x2C and 0x2B.
Note: Surprisingly, the Win32 subsystem under Windows NT 3.51 does not call the NTDLL.DLL functions; it invokes
interrupts 0x2B and 0x2C directly. Performance seems the most likely reason behind this “bypassing” act, because it
avoids the system call interface. The overhead of system call setup, that is, indexing the system call ID to find
the number of parameters and the kernel function to be invoked, might have proved unacceptable. Hence, the two
functions in question are reached through special interrupts instead of the normal system call interrupt 0x2E.
Of course, this required modifying the kernel to handle the two new software interrupts. We still don’t understand
why the Win32 subsystem bypasses the NTDLL.DLL functions.
You cannot use these interrupts to access Quick LPC on Windows NT 4.0. Instead, you
need to invoke the corresponding system calls yourself, which is fairly easy. On
Windows NT 4.0, you need to change the INT 2Bh instruction to the following sequence of
instructions that invoke the NtSetHighWaitLowThread() system call:
INT 2Eh
You cannot use INT 2CH under Windows NT 4.0 either, even though its interrupt handler
is still in place. (You would expect both interrupt handlers to be gone if the
Win32 subsystem no longer requires them, wouldn’t you?) This is because the interrupt
handler returns a STATUS_NO_EVENT_PAIR error even if the TEB of the calling thread
points to a proper event pair. Therefore, you need to use the corresponding system call to
achieve the same effect as the KiSetLowWaitHighThread() function. You can replace the
INT 2CH instruction with the following instructions that invoke the
NtSetLowWaitHighThread() system call:
INT 2Eh
The system call interface exists and works even under Windows NT 3.51. You might
choose to use the same interface for both versions of Windows NT so that the same
code works on both. However, this is not quite straightforward, because the service IDs
changed from Windows NT 3.51 to Windows NT 4.0. In Windows NT 3.51, the service ID for
the NtSetLowWaitHighThread() system call is 0xA3, and the NtSetHighWaitLowThread()
SUMMARY
A local procedure call (LPC) is the communication mechanism used by Windows NT
subsystems. In this chapter, we gave you a brief introduction to subsystems followed by a
detailed discussion on the undocumented LPC mechanism.
There are three types of LPC. The short message LPC passes small messages up to 304
bytes in length. The shared section LPC uses shared memory and passes larger messages.
Both the short message LPC and the shared section LPC are based on a kernel object
called a port. The functions to manipulate ports are not documented. In this chapter, we
documented the parameters and use of these functions with demonstration programs.
The Quick LPC, the fastest form of LPC, is used exclusively by the Win32 subsystem. The
Quick LPC is faster because it ensures controlled scheduling of the client and server
threads. In contrast with the other two forms of LPC, the Quick LPC requires a dedicated
server thread per client thread. The Quick LPC mechanism uses another kernel object, the
event pair, to optimize the context switches between the client thread and the
corresponding dedicated server thread.
Hardware interrupts come from the physical devices in the machine. For example,
whenever there is a character waiting on the COM port, a hardware interrupt will be
triggered. When an I/O operation completes, a hardware interrupt also will be triggered.
Software interrupts occur as a result of an explicit INT nn request from the application.
Applications typically use this mechanism to get different services from the operating
system. Exceptions occur as a result of an application’s attempt to perform illegal
operations, such as dividing by zero.
The next sections detail how processors handle software interrupts in real, protected, and
V86 modes.
In real mode, the lower 1K of memory holds a data structure known as the Interrupt
Vector Table (IVT). There are nominally 256 entries in this table. (Since the 80286, the IVT
is not required to have 256 entries or start at physical address 0. The base address and
length of the IVT are determined by the Interrupt Descriptor Table Register.) Each entry
contains a far pointer to an Interrupt Service Routine. Any type of interrupt is routed to
the appropriate Interrupt Service Routine through this table. The processor uses the
interrupt number as an index into this table; pushes the current flags, CS, and IP on the
stack; and transfers control to the far pointer specified in the IVT. The handler processes
the interrupt and then executes an IRET instruction to return control to the place where
the processor was executing at the time of the interrupt.
Interrupt gates interest us. The important fields of an interrupt gate are the code
segment selector and the offset of the code to execute for this interrupt, as well as the
descriptor privilege level (DPL). Interrupt processing closely resembles that in real mode.
When the interrupt occurs, the processor uses the interrupt number as an index into the
IDT, pushes EFLAGS, CS, and EIP onto the stack (preceded by SS and ESP when the privilege
level changes), and calls the handler specified in the IDT. When the handler finishes
executing, it should execute the IRET instruction to return control. Depending upon the
type of interrupt, an error code may also be pushed on the stack; the handler must clear
this error code from the stack before returning. The DPL field in the interrupt gate
controls access to software interrupts: the current privilege level must be at least as
privileged as the DPL to issue these software interrupts. If not, a General Protection
Fault is triggered. This protection feature permits the operating system to reserve certain
software interrupts for its own use. Hardware interrupts and exceptions are processed
without regard to the current privilege level.
interrupts are also provided, such as multiplex interrupt 2F. Applications fill in the
parameters in various registers and execute the INT nn instruction to access these services
from the operating system. Various compiler libraries provide wrappers around these
interrupt interfaces and provide useful C functions, such as _open, _read, _write, and
others.
Not much changes in the way software interrupts are used in Windows 95/98 and Windows
NT. Windows NT provides user-callable software interrupts. The following table lists the
important ones.
Interrupt   Purpose
2Bh, 2Ch    Used by the CSRSS subsystem to force an immediate thread switch as part of the LPC mechanism, which we discussed in Chapter 8. These interrupts are used only in Windows NT 3.51, since in later versions of Windows NT most of the functionality in CSRSS is moved to the kernel-mode driver WIN32K.SYS.
2Dh         Debugging service. This service, used by driver writers, outputs debugging messages to the debugger window. The DbgPrint() function provided in the DDK calls this interrupt to output debug messages.
2Eh         Used extensively for calling the system services provided by Windows NT. The system services are provided by two components, namely NTOSKRNL and WIN32K.SYS. The services provided by WIN32K.SYS are present only in Windows NT versions later than 3.51. We discuss system services in detail in Chapters 6 and 7.
MS-DOS provides system services to hook software interrupts through INT 21h functions
25h (set interrupt vector) and 35h (get interrupt vector). Compiler libraries provide
wrapper functions such as _dos_getvect and _dos_setvect to hook software interrupts.
Windows 95 provides a mechanism to hook software interrupts by means of the
Set_PM_Int_Vector and Hook_V86_Int_Chain VxD services. However, Windows NT does not
officially support any way to hook software interrupts. The DDK does provide functions
such as HalGetInterruptVector() and IoConnectInterrupt() to hook hardware interrupts.
Once we understand Intel data structures such as the IDT and interrupt gates, we can
easily hook software interrupts in Windows NT. Hooking a software interrupt basically
amounts to changing the code selector and offset fields in its interrupt gate descriptor.
However, this technique is platform dependent: it works only on the Intel implementation
of Windows NT.
You can apply the same technique for hooking software interrupts to hook hardware
The sample application that we write in this chapter hooks INT 2Eh (the System Service
Interrupt) and counts how many times each system service is called. The sample
maintains counters only for the system services provided by NTOSKRNL.EXE. The
user-level application calls DeviceIoControl() on this driver to obtain the service usage
statistics. As we already saw in Chapter 7, NTOSKRNL.EXE provides a total of 0xC4 system
services in NT 3.51, 0xD3 services in NT 4.0, and 0xF4 services in Windows 2000. This
sample works on all versions of Windows NT to date.
HOOKINT.C
#include "ntddk.h"
#include "stdarg.h"
#include "stdio.h"
#include "Hookint.h"
#define TEST_PAGING
#define DRIVER_SOURCE
#include "..\..\include\intel.h"
#include "..\..\include\wintype.h"
#include "..\..\include\undocnt.h"
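/* Interrupt to be hooked */
#define HOOKINT 0x2E
/* Address of the original handler; HANDLER.ASM chains to it and
 * DriverUnload() restores it */
int OldHandler;
/* Element 0 holds NumberOfServices; element ServiceId+1 holds the
 * number of calls to service ServiceId */
ULONG *ServiceCounterTable;
ULONG ServiceCounterTableSize;
int NumberOfServices;
#ifdef TEST_PAGING
void *PagedData;
#endif
/* Receives the 6-byte result of the sidt instruction */
char buffer[6];
/* Pointer used to access the limit and
 * base of IDTR
 */
PIdtr_t Idtr=(PIdtr_t)buffer;
#pragma pack()

/* Called from NewHandler (HANDLER.ASM) with the service ID from EAX */
void __stdcall NewHandlerCFunc(int ServiceId)
{
    /* Count only NTOSKRNL services: valid IDs run from 0 to
     * NumberOfServices-1 (WIN32K.SYS IDs start at 0x1000) */
    if (ServiceId < 0 || ServiceId >= NumberOfServices)
        return;
#ifdef TEST_PAGING
    /* Touch pageable memory to check that page faults can be
     * serviced inside the hook */
    memset(PagedData, 0, 100000);
#endif
    ServiceCounterTable[ServiceId+1]++;
    return;
}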
NTSTATUS DriverSpecificInitialization()
{
    PIdtEntry_t IdtEntry;
    extern PServiceDescriptorTableEntry_t
        KeServiceDescriptorTable;

    NumberOfServices =
        KeServiceDescriptorTable->NumberOfServices;
    ServiceCounterTableSize =
        (NumberOfServices+1)*sizeof(ULONG);
    ServiceCounterTable = ExAllocatePool(PagedPool,
        ServiceCounterTableSize);
    if (!ServiceCounterTable) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }
#ifdef TEST_PAGING
    PagedData=ExAllocatePool(PagedPool, 100000);
    if (!PagedData) {
        ExFreePool(ServiceCounterTable);
        return STATUS_INSUFFICIENT_RESOURCES;
    }
#endif
    memset(ServiceCounterTable,0,
        ServiceCounterTableSize);
    /* Element 0 tells the application how many counters follow */
    *ServiceCounterTable=NumberOfServices;
    trace(("NumberOfServices=%x, "
        "ServiceCounterTableSize=%x, @%x\n",
        NumberOfServices, ServiceCounterTableSize,
        ServiceCounterTable));

    /* Reach to IDT */
    _asm sidt buffer
    IdtEntry=(PIdtEntry_t)Idtr->Base;

    /* Save the old
     * handler’s address
     */
    OldHandler =
        ((unsigned int)IdtEntry[HOOKINT].OffsetHigh<<16U)|
        (IdtEntry[HOOKINT].OffsetLow);

    /* Patch the IDT entry with interrupts disabled */
    _asm cli
    IdtEntry[HOOKINT].OffsetLow =
        (unsigned short)NewHandler;
    IdtEntry[HOOKINT].OffsetHigh =
        (unsigned short)((unsigned int)NewHandler>>16);
    _asm sti
    return STATUS_SUCCESS;
}
NTSTATUS
DriverEntry(
    IN PDRIVER_OBJECT DriverObject,
    IN PUNICODE_STRING RegistryPath
    )
{
    MYDRIVERENTRY(L"hookint",
        FILE_DEVICE_HOOKINT,
        DriverSpecificInitialization());
    return ntStatus;
}
NTSTATUS
DriverDispatch(
    IN PDEVICE_OBJECT DeviceObject,
    IN PIRP Irp
    )
{
    PIO_STACK_LOCATION irpStack;
    PVOID ioBuffer;
    ULONG inputBufferLength;
    ULONG outputBufferLength;
    ULONG ioControlCode;
    NTSTATUS ntStatus;

    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = 0;
    irpStack = IoGetCurrentIrpStackLocation(Irp);
    ioBuffer = Irp->AssociatedIrp.SystemBuffer;
    inputBufferLength = irpStack->Parameters.
        DeviceIoControl.InputBufferLength;
    outputBufferLength = irpStack->Parameters.
        DeviceIoControl.OutputBufferLength;

    switch (irpStack->MajorFunction)
    {
    case IRP_MJ_DEVICE_CONTROL:
        trace(("HOOKINT.SYS: IRP_MJ_DEVICE_CONTROL\n"));
        ioControlCode = irpStack->Parameters.
            DeviceIoControl.IoControlCode;
        switch (ioControlCode)
        {
        case IOCTL_HOOKINT_SYSTEM_SERVICE_USAGE:
        {
            int i;
            /* Check that the buffer can hold the counters of system
             * service usage
             */
            if (outputBufferLength >=
                ServiceCounterTableSize) {
                /* Dump the counters to the debugger */
                for (i=1; i<=NumberOfServices; i++)
                    DbgPrint("%x ", ServiceCounterTable[i]);
                DbgPrint("\n");
                /* Copy the counters to the user
                 * supplied buffer
                 */
                memcpy(ioBuffer, ServiceCounterTable,
                    ServiceCounterTableSize);
                /* Number of bytes returned to the caller */
                Irp->IoStatus.Information =
                    ServiceCounterTableSize;
            } else {
                Irp->IoStatus.Status =
                    STATUS_INSUFFICIENT_RESOURCES;
            }
            break;
        }
        default:
            Irp->IoStatus.Status =
                STATUS_INVALID_PARAMETER;
            trace(("HOOKINT.SYS: unknown "
                "IRP_MJ_DEVICE_CONTROL\n"));
            break;
        }
        break;
    }

    ntStatus = Irp->IoStatus.Status;
    IoCompleteRequest (Irp,IO_NO_INCREMENT);
    return ntStatus;
}
VOID
DriverUnload(
    IN PDRIVER_OBJECT DriverObject
    )
{
    WCHAR deviceLinkBuffer[]=L"\\DosDevices\\hookint";
    UNICODE_STRING deviceLinkUnicodeString;
    PIdtEntry_t IdtEntry;

    /* Reach to IDT */
    IdtEntry=(PIdtEntry_t)Idtr->Base;

    /* Restore the original handler before freeing the memory
     * that the hook writes to
     */
    _asm cli
    IdtEntry[HOOKINT].OffsetLow =
        (unsigned short)OldHandler;
    IdtEntry[HOOKINT].OffsetHigh =
        (unsigned short)((unsigned int)OldHandler>>16);
    _asm sti

    ExFreePool(ServiceCounterTable);
#ifdef TEST_PAGING
    ExFreePool(PagedData);
#endif
    RtlInitUnicodeString (&deviceLinkUnicodeString,
        deviceLinkBuffer
        );
    IoDeleteSymbolicLink (&deviceLinkUnicodeString);
    IoDeleteDevice (DriverObject->DeviceObject);
    trace(("HOOKINT.SYS: unloading\n"));
}
HANDLER.ASM
.386
.model small
.code
include ..\..\include\undocnt.inc
public _NewHandler
extrn _OldHandler:near
extrn _NewHandlerCFunc@4:near
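_NewHandler proc near
    Ring0Prolog
    STI
    push eax                        ; service ID, the C function's argument
    call _NewHandlerCFunc@4         ; __stdcall: the callee pops the argument
    CLI
    Ring0Epilog
    jmp dword ptr cs:[_OldHandler]  ; chain to the original INT 2Eh handler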
_NewHandler endp
END
SUMMARY
In this chapter, we discussed interrupt processing in various modes of Intel processors.
Then, we saw how the operating system makes use of interrupts. Next, we discussed the
need for hooking software interrupts. We also explored a mechanism for hooking software
interrupts. We concluded the chapter with an example that hooks INT 2Eh (the system
service interrupt) in Windows NT.
As we saw in the previous chapter, software interrupts are one of the mechanisms used for
calling system services. We have also seen that INT 2E is used for getting the system
services from the Windows NT kernel. By adding new software interrupts, it is possible to
add new system services to the Windows NT kernel. We have already seen one way to add
new system services to the Windows NT kernel, and this is just one more method. In this
chapter, we will not be playing with the operating system data structures as we did in
Chapter 7. Instead, we will use Intel data structures to add new system services.
If you examine the descriptor entry for INT 2Eh in a debugger such as SoftICE, you will
notice that its descriptor privilege level (DPL) is 3. That is why NTDLL.DLL can call INT 2Eh on
} InterruptGate_t;
There are a few unused interrupts in Windows NT, including INT 20h and INT 22h through 29h. You
can use these interrupts to add new software interrupts. Following are the steps for
adding new software interrupts:
1. Get the base address of the interrupt descriptor table using the assembly instruction
“sidt.” This instruction stores the base address and limit of IDT at the specified
memory location.
2. Treat this base address as a pointer to an array of “InterruptGate_t” structures.
3. Use the number of the interrupt to be added as an index into this table.
4. Fill in the “InterruptGate_t” entry at that index according to the requirements of an
interrupt gate. That is, set the “SegmentType” field to 0Eh, meaning a 32-bit interrupt
gate; set the “SystemSegmentFlag” to 0, meaning a system segment; set the “Selector”
field to the kernel code segment selector and the “OffsetLow” and “OffsetHigh” fields to
the address of the interrupt handler; set the descriptor privilege level to 3 so that ring 3
code can issue the interrupt. Finally, set the “Present” field to 1.
5. Establish some mechanism for passing parameters to the interrupt service routine.
For example, INT 2Eh uses the EDX register to point to the user stack frame and the
EAX register for the service ID.
XREF: We have already seen mechanisms used by INT 2Eh handler in Chapter 6.
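The descriptor that steps 1 through 4 fill in can be illustrated with a small, portable C sketch. It packs the eight bytes of a 32-bit interrupt gate into a 64-bit value following the Intel IA-32 layout, instead of using the book's InterruptGate_t bit-fields; the function name make_interrupt_gate is hypothetical. Writing the value into the live IDT still has to happen in ring 0, bracketed by cli and sti, as in the driver that follows.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 32-bit interrupt gate descriptor (Intel IA-32 layout):
 *   bits  0-15  handler offset, low word   (OffsetLow)
 *   bits 16-31  code segment selector      (Selector)
 *   bits 32-39  reserved, must be 0
 *   bits 40-43  type, 0xE = 32-bit interrupt gate
 *   bit  44     S flag, 0 = system descriptor
 *   bits 45-46  DPL
 *   bit  47     present
 *   bits 48-63  handler offset, high word  (OffsetHigh) */
uint64_t make_interrupt_gate(uint32_t handler, uint16_t selector, unsigned dpl)
{
    uint64_t desc = 0;

    assert(dpl <= 3);
    desc |= (uint64_t)(handler & 0xFFFFu);    /* OffsetLow          */
    desc |= (uint64_t)selector << 16;         /* Selector           */
    desc |= (uint64_t)0xE << 40;              /* interrupt gate     */
    desc |= (uint64_t)dpl << 45;              /* who may issue INT  */
    desc |= (uint64_t)1 << 47;                /* Present            */
    desc |= (uint64_t)(handler >> 16) << 48;  /* OffsetHigh         */
    return desc;
}
```

For an arbitrary handler address, make_interrupt_gate(0x80461234, 8, 3) yields 0x8046EE0000081234: the 0xEE byte is type 0xE with DPL 3 and the present bit set. With a DPL of 0, a ring 3 INT instruction through this gate would raise a general protection fault instead.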
The sample application that illustrates this method adds INT 22h to the Windows NT
kernel. The interrupt handler expects that the EDX register points to the buffer, which
will be filled by the handler with the “Newly added interrupt called” string. The buffer
should be at least 29 bytes long.
Following is the device driver that adds a new software interrupt to the Windows NT
kernel. The driver adds the interrupt in its DriverEntry routine and removes the interrupt
in its DriverUnload routine. The full source code for the application that issues this newly
added interrupt is not given; only the part that issues the interrupt is shown here.
#include "ntddk.h"
#include "stdarg.h"
#include "stdio.h"
#include "addint.h"
#include "..\include\intel.h"
#include "..\include\undocnt.h"
IdtEntry_t OldIdtEntry;
/* Interrupt Handler */
char buffer[6];
PIdtr_t Idtr=(PIdtr_t)buffer;
NTSTATUS AddInterrupt()
PIdtEntry_t IdtEntry;
IdtEntry=(PIdtEntry_t)Idtr->Base;
/* Fail if the IDT slot is already in use */
if ((IdtEntry[ADDINT].OffsetLow!=0)||(IdtEntry[ADDINT].OffsetHigh!=0))
return STATUS_UNSUCCESSFUL;
_asm cli
/* Fill in the IDT entry according to the interrupt gate requirement */
IdtEntry[ADDINT].OffsetLow=(unsigned short)InterruptHandler;
IdtEntry[ADDINT].Selector=8;      /* kernel code segment selector */
IdtEntry[ADDINT].Reserved=0;
IdtEntry[ADDINT].Type=0xE;        /* 32-bit interrupt gate */
IdtEntry[ADDINT].Always0=0;
IdtEntry[ADDINT].Dpl=3;           /* callable from ring 3 */
IdtEntry[ADDINT].Present=1;
IdtEntry[ADDINT].OffsetHigh=(unsigned short)((unsigned int)InterruptHandler>>16);
_asm sti
return STATUS_SUCCESS;
NTSTATUS
DriverEntry(
IN PDRIVER_OBJECT DriverObject,
IN PUNICODE_STRING RegistryPath
MYDRIVERENTRY(DRIVER_DEVICE_NAME, FILE_DEVICE_ADDINT,
AddInterrupt());
return ntStatus;
void RemoveInterrupt()
PIdtEntry_t IdtEntry;
/* Reach to IDT */
IdtEntry=(PIdtEntry_t)Idtr->Base;
_asm cli
_asm sti
NTSTATUS
DriverDispatch(
IN PDEVICE_OBJECT DeviceObject,
IN PIRP Irp
Irp->IoStatus.Status = STATUS_SUCCESS;
IoCompleteRequest (Irp,
IO_NO_INCREMENT
);
return Irp->IoStatus.Status;
VOID
DriverUnload(
IN PDRIVER_OBJECT DriverObject
WCHAR deviceLinkBuffer[] =
L"\\DosDevices\\"DRIVER_DEVICE_NAME;
UNICODE_STRING deviceLinkUnicodeString;
RemoveInterrupt();
RtlInitUnicodeString (&deviceLinkUnicodeString,
deviceLinkBuffer
);
IoDeleteSymbolicLink (&deviceLinkUnicodeString);
IoDeleteDevice (DriverObject->DeviceObject);
trace(("ADDINT.SYS: unloading\n"));
.386
.model small
.code
public _InterruptHandler
extrn _CFunc:near
include ..\include\undocnt.inc
_InterruptHandler proc
Ring0Prolog
jz NullPointer
repz movsb
NullPointer:
call _CFunc
Ring0Epilog
iretd
messagelen dd $-message
_InterruptHandler endp
End
#include <windows.h>
#include <stdio.h>
#include "addint.h"
int main(void)
char buffer[100];
__try {
_asm {
int 22h
__except (EXCEPTION_EXECUTE_HANDLER) {
return 0;
buffer);
return 0;
Callgates are mechanisms that facilitate controlled and secure transfer of control from a
lower privilege level to a higher privilege level. Here we consider only the transfer from
ring 3 to ring 0, since Windows NT uses only these two privilege levels. It is as if you have
ring 3 and ring 0 code on two sides of a fence, with the callgate acting as the gateway
between the two: ring 3 code can enter ring 0 only at the entry point that the callgate
specifies.
When creating a callgate, you have to specify the ring 0 entry point (a code segment
selector and an offset) and the number of parameters to be copied from the ring 3 stack to
the ring 0 stack. The descriptor privilege level (DPL) of the callgate dictates which code has
access to it; a callgate with a DPL of 3 can be used from ring 3. When control is transferred
through the callgate, the processor switches to the ring 0 stack. This stack is selected by
looking at the task state segment (TSS), which holds a stack pointer for each of the more
privileged rings. After this, the processor pushes the ring 3 SS:ESP on this new stack. Then
the processor copies the number of parameters specified by the callgate from the ring 3
stack to the ring 0 stack. The parameter count is expressed in DWORDs for 32-bit callgates
and in WORDs for 16-bit callgates. Further, the processor pushes the ring 3 CS:EIP onto the
stack and jumps to the address specified in the callgate. The function at ring 0 is
responsible for cleaning the parameters off the stack once it has finished executing. In
the end, the ring 0 code should execute a retf nn instruction, where nn is the number of
parameter bytes, to clean up the stack and return control to the ring 3 code.
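As an illustration of the fields described above, the following C sketch packs a 32-bit call gate descriptor using the Intel IA-32 layout. The names make_call_gate and retf_bytes are hypothetical and are not part of the sample, which fills in a callgate_desc structure with bit-fields instead.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 32-bit call gate descriptor (Intel IA-32 layout):
 *   bits  0-15  target offset, low word
 *   bits 16-31  target code segment selector
 *   bits 32-36  parameter count (DWORDs copied to the ring 0 stack)
 *   bits 37-39  reserved, must be 0
 *   bits 40-43  type, 0xC = 32-bit call gate
 *   bit  44     S flag, 0 = system descriptor
 *   bits 45-46  DPL, the least privileged ring allowed to use the gate
 *   bit  47     present
 *   bits 48-63  target offset, high word */
uint64_t make_call_gate(uint32_t target, uint16_t code_selector,
                        unsigned param_dwords, unsigned dpl)
{
    uint64_t desc = 0;

    assert(param_dwords <= 31 && dpl <= 3);
    desc |= (uint64_t)(target & 0xFFFFu);
    desc |= (uint64_t)code_selector << 16;
    desc |= (uint64_t)param_dwords << 32;
    desc |= (uint64_t)0xC << 40;
    desc |= (uint64_t)dpl << 45;
    desc |= (uint64_t)1 << 47;
    desc |= (uint64_t)(target >> 16) << 48;
    return desc;
}

/* Operand for the "retf nn" that ends the ring 0 routine of a 32-bit gate. */
unsigned retf_bytes(unsigned param_dwords)
{
    return param_dwords * 4;
}
```

For example, make_call_gate(0x00401000, 0x0008, 2, 3) produces 0x0040EC0200081000, and the ring 0 routine behind such a gate would return with retf 8. The sample passes zero parameters, which is why its ring 0 routine ends with a plain retf.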
The sample accompanying this technique is based on the sample program PHYS.EXE
demonstrated in Matt Pietrek’s Windows 95 Programming Secrets (IDG Books Worldwide).
The sample shows you how you can use the same trick under Windows NT. It uses three
undocumented functions in NTOSKRNL.EXE. These functions enable you to allocate and
release selectors from the Global Descriptor Table (GDT) and modify the descriptor entries
corresponding to the selectors. Using the following undocumented functions removes the
need to directly manipulate Intel data structures such as the GDT.
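The selectors these functions hand out are 16-bit selector values rather than raw GDT indexes; the sample writes them straight into descriptor fields. A short sketch of the selector encoding follows; make_selector and the selector_* helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A segment selector is a 16-bit value:
 *   bits 0-1   RPL, the requested privilege level
 *   bit  2     TI, table indicator (0 = GDT, 1 = LDT)
 *   bits 3-15  index of the descriptor in that table */
uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

unsigned selector_index(uint16_t sel) { return sel >> 3; }
unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
unsigned selector_rpl(uint16_t sel)   { return sel & 3u; }
```

Selector 8, which the ADDINT driver stores in its gate, is GDT index 1 with RPL 0, while a selector such as 0x1B decodes to index 3 with RPL 3. Because the index field has 13 bits, a descriptor table can hold at most 8192 entries.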
NTSTATUS
KeI386AllocateGdtSelectors(
int NumberOfSelectors);
The function allocates the specified number of selectors from the GDT and fills in the
SelectorArray with the allocated selector values. NTOSKRNL keeps a linked list of free
selectors in the descriptor itself. Also, NTOSKRNL keeps track of the number of free
selectors. The function checks whether the specified number of selectors is present. If
enough selectors are available, the function removes those selectors from the free list and
gives the list to the caller. Interestingly, these functions are exported from the
NTOSKRNL.EXE file, so any driver can use them. Other functions also enable descriptor
queries and other tasks, but they are not exported.
NTSTATUS
KeI386ReleaseGdtSelectors(
int NumberOfSelectors);
The function releases the specified number of selectors. The selectors are specified in the
array SelectorArray. The function updates the variable that keeps track of the number of
selectors and inserts these selectors in the free list of selectors.
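The text says NTOSKRNL threads its list of free selectors through the free descriptors themselves and keeps a count of them. The following user-mode toy model illustrates that idea only; it is not NTOSKRNL's actual data layout, and the table size, the END_OF_LIST marker, and all gdt_model_* names are hypothetical. Note how allocation checks the free count first, so a request that cannot be satisfied allocates nothing.

```c
#include <assert.h>
#include <stdint.h>

#define GDT_ENTRIES 16
#define END_OF_LIST 0xFFFFu

/* Toy model: each free slot stores the index of the next free slot,
 * so the free list lives inside the table itself. */
static uint32_t gdt[GDT_ENTRIES];
static unsigned free_head = END_OF_LIST;
static unsigned free_count = 0;

/* Put slots first..GDT_ENTRIES-1 on the free list, lowest index at the head. */
void gdt_model_init(unsigned first)
{
    assert(first <= GDT_ENTRIES);
    free_head = END_OF_LIST;
    free_count = 0;
    for (unsigned i = GDT_ENTRIES; i-- > first; ) {
        gdt[i] = free_head;
        free_head = i;
        free_count++;
    }
}

/* Allocate n selectors; return 0 on success, -1 if not enough are free. */
int gdt_model_alloc(uint16_t *selectors, unsigned n)
{
    if (free_count < n)
        return -1;                        /* check first, allocate nothing */
    for (unsigned k = 0; k < n; k++) {
        unsigned idx = free_head;
        free_head = gdt[idx];             /* unlink from the free list */
        gdt[idx] = 0;                     /* caller fills in the descriptor */
        selectors[k] = (uint16_t)(idx << 3);
    }
    free_count -= n;
    return 0;
}

/* Push released selectors back onto the head of the free list. */
void gdt_model_release(const uint16_t *selectors, unsigned n)
{
    for (unsigned k = 0; k < n; k++) {
        unsigned idx = selectors[k] >> 3;
        gdt[idx] = free_head;
        free_head = idx;
    }
    free_count += n;
}
```

Because released entries go back to the head of the list, the next allocation returns them in reverse order of release.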
NTSTATUS
This function fills in the descriptor corresponding to a particular selector. The second
parameter should be a pointer to a descriptor entry.
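To make the code segment descriptor that the sample passes to KeI386SetGdtSelector concrete, here is a portable sketch that packs a code or data segment descriptor from its fields. The book's bit-field names map onto it as follows: code_data, conforming, and readable form the type nibble together with the accessed bit; app_system is the S flag; dpl and present complete the access byte; seg_16_32 and granularity are the D/B and G flags. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a code/data segment descriptor (Intel IA-32 layout).
 * access = P | DPL | S | type (byte 5); flags = G, D/B, L, AVL (upper nibble of byte 6). */
uint64_t make_segment_descriptor(uint32_t base, uint32_t limit20,
                                 uint8_t access, uint8_t flags)
{
    uint64_t desc = 0;

    assert(limit20 <= 0xFFFFFu && flags <= 0xFu);
    desc |= (uint64_t)(limit20 & 0xFFFFu);             /* limit_0_15          */
    desc |= (uint64_t)(base & 0xFFFFFFu) << 16;        /* base_0_15, 16_23    */
    desc |= (uint64_t)access << 40;                    /* access byte         */
    desc |= (uint64_t)((limit20 >> 16) & 0xFu) << 48;  /* limit_16_19         */
    desc |= (uint64_t)flags << 52;                     /* granularity, 16_32  */
    desc |= (uint64_t)(base >> 24) << 56;              /* base_24_31          */
    return desc;
}

/* Access byte of a present, non-conforming, readable code segment
 * (type 0xA: code, readable, not accessed; S = 1). */
uint8_t code_access(unsigned dpl)
{
    assert(dpl <= 3);
    return (uint8_t)(0x80 | (dpl << 5) | 0x10 | 0xA);
}
```

With the values the sample uses (base 0, limit 0xFFFFF, DPL 0, 4 KB granularity, 32-bit), make_segment_descriptor(0, 0xFFFFF, code_access(0), 0xC) gives 0x00CF9A000000FFFF, a flat 4 GB ring 0 code segment.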
and extended memory size from CMOS data. The application also prints the contents of
CPU control registers such as CR0 and CR2. The instructions for accessing these registers are
privileged.
#include "ntddk.h"
#include "stdarg.h"
#include "stdio.h"
#include "callgate.h"
#include "..\include\intel.h"
#include "..\include\undocnt.h"
NTSTATUS rc;
rc=KeI386AllocateGdtSelectors(SelectorArray, 0x02);
if (rc!=STATUS_SUCCESS) {
return rc;
SelectorArray[1]));
/* Fill in the ring 0 code segment descriptor */
ring0_desc.limit_0_15 = 0xFFFF;
ring0_desc.base_0_15 = 0;
ring0_desc.base_16_23 = 0;
ring0_desc.readable = 1;
ring0_desc.conforming = 0;
ring0_desc.code_data = 1;
ring0_desc.app_system = 1;
ring0_desc.dpl = 0;
ring0_desc.present = 1;
ring0_desc.limit_16_19 = 0xF;
ring0_desc.always_0 = 0;
ring0_desc.seg_16_32 = 1;
ring0_desc.granularity = 1;
ring0_desc.base_24_31 = 0;
/* Fill in the callgate descriptor */
->FunctionLinearAddress );
callgate_desc.selector = SelectorArray[0];
callgate_desc.param_count = CallGateInfo->NumberOfParameters;
callgate_desc.some_bits = 0;
callgate_desc.present = 1;
callgate_desc.offset_16_31 = HIWORD(CallGateInfo->FunctionLinearAddress);
/* Return the selectors to the caller */
CallGateInfo->CodeSelector=SelectorArray[0];
CallGateInfo->CallGateSelector=SelectorArray[1];
rc=KeI386SetGdtSelector(SelectorArray[0], &ring0_desc);
if (rc!=STATUS_SUCCESS) {
trace(("SetGdtSelector=%x\n", rc));
KeI386ReleaseGdtSelectors(SelectorArray, 0x02);
return rc;
rc=KeI386SetGdtSelector(SelectorArray[1], &callgate_desc);
if (rc!=STATUS_SUCCESS) {
trace(("SetGdtSelector=%x\n", rc));
KeI386ReleaseGdtSelectors(SelectorArray, 0x02);
return rc;
/* Return success */
return STATUS_SUCCESS;
int rc;
SelectorArray[0]=CallGateInfo->CodeSelector;
SelectorArray[1]=CallGateInfo->CallGateSelector;
rc=KeI386ReleaseGdtSelectors(SelectorArray, 0x02);
if (rc!=STATUS_SUCCESS) {
return rc;
NTSTATUS
DriverEntry(
IN PDRIVER_OBJECT DriverObject,
IN PUNICODE_STRING RegistryPath
return ntStatus;
NTSTATUS
DriverDispatch(
IN PDEVICE_OBJECT DeviceObject,
IN PIRP Irp
PIO_STACK_LOCATION irpStack;
PVOID ioBuffer;
ULONG inputBufferLength;
ULONG outputBufferLength;
ULONG ioControlCode;
NTSTATUS ntStatus;
Irp->IoStatus.Status = STATUS_SUCCESS;
Irp->IoStatus.Information = 0;
irpStack = IoGetCurrentIrpStackLocation (Irp);
ioBuffer = Irp->AssociatedIrp.SystemBuffer;
inputBufferLength = irpStack->Parameters.DeviceIoControl.InputBufferLength;
outputBufferLength = irpStack->Parameters.DeviceIoControl.OutputBufferLength;
switch (irpStack->MajorFunction)
case IRP_MJ_DEVICE_CONTROL:
trace(("CALLGATE.SYS: IRP_MJ_DEVICE_CONTROL\n"));
ioControlCode = irpStack->Parameters.DeviceIoControl.IoControlCode;
switch (ioControlCode)
case IOCTL_CALLGATE_CREATE:
{
PCallGateInfo_t CallGateInfo;
CallGateInfo=(PCallGateInfo_t)ioBuffer;
Irp->IoStatus.Status=CreateCallGate(CallGateInfo);
if (Irp->IoStatus.Status==STATUS_SUCCESS) {
Irp->IoStatus.Information = sizeof(CallGateInfo_t);
}
break;
}
case IOCTL_CALLGATE_RELEASE:
{
PCallGateInfo_t CallGateInfo;
CallGateInfo=(PCallGateInfo_t)ioBuffer;
/* Store the result in the IRP so that a failure reaches the caller */
Irp->IoStatus.Status=ReleaseCallGate(CallGateInfo);
break;
}
default:
Irp->IoStatus.Status = STATUS_INVALID_PARAMETER;
trace(("CALLGATE.SYS: unknown IRP_MJ_DEVICE_CONTROL\n"));
break;
break;
ntStatus = Irp->IoStatus.Status;
IoCompleteRequest (Irp,
IO_NO_INCREMENT
);
return ntStatus;
VOID
DriverUnload(
IN PDRIVER_OBJECT DriverObject
WCHAR deviceLinkBuffer[] =
L"\\DosDevices\\"DRIVER_DEVICE_NAME;
UNICODE_STRING deviceLinkUnicodeString;
RtlInitUnicodeString (&deviceLinkUnicodeString,
deviceLinkBuffer
);
IoDeleteSymbolicLink (&deviceLinkUnicodeString);
IoDeleteDevice (DriverObject->DeviceObject);
trace(("CALLGATE.SYS: unloading\n"));
#include <windows.h>
#include <winioctl.h>
#include "callgate.h"
#include "gate.h"
HANDLE hCallgateDriver=INVALID_HANDLE_VALUE;
WORD CodeSelectorArray[8192];
void OpenCallgateDriver()
strcpy (completeDeviceName,
"\\\\.\\callgate"
);
hCallgateDriver = CreateFile(completeDeviceName,
GENERIC_READ | GENERIC_WRITE,
0,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL
);
void CloseCallgateDriver()
if (hCallgateDriver!=INVALID_HANDLE_VALUE) {
CloseHandle(hCallgateDriver);
int
NumberOfParameters,
PWORD pSelector)
CallGateInfo_t CallGateInfo;
DWORD BytesReturned;
if (hCallgateDriver==INVALID_HANDLE_VALUE) {
return ERROR_DRIVER_NOT_FOUND;
if (!pSelector)
return ERROR_BAD_PARAMETER;
memset(&CallGateInfo, 0, sizeof(CallGateInfo));
CallGateInfo.FunctionLinearAddress=FunctionAddress;
CallGateInfo.NumberOfParameters=NumberOfParameters;
if (!DeviceIoControl(hCallgateDriver,
(DWORD)IOCTL_CALLGATE_CREATE,
&CallGateInfo,
sizeof(CallGateInfo),
&CallGateInfo,
sizeof(CallGateInfo),
&BytesReturned,
NULL)) {
return ERROR_IOCONTROL_FAILED;
*pSelector=CallGateInfo.CallGateSelector;
CodeSelectorArray[CallGateInfo.CallGateSelector]=CallGateInfo.CodeSelector;
return SUCCESS;
CallGateInfo_t CallGateInfo;
DWORD BytesReturned;
if (hCallgateDriver==INVALID_HANDLE_VALUE) {
return ERROR_DRIVER_NOT_FOUND;
if (CallGateSelector>=sizeof(CodeSelectorArray)/sizeof(CodeSelectorArray[0])) {
return ERROR_BAD_PARAMETER;
memset(&CallGateInfo, 0, sizeof(CallGateInfo));
CallGateInfo.CallGateSelector=CallGateSelector;
CallGateInfo.CodeSelector=CodeSelectorArray[CallGateSelector];
if (!DeviceIoControl(hCallgateDriver,
(DWORD)IOCTL_CALLGATE_RELEASE,
&CallGateInfo,
sizeof(CallGateInfo),
&CallGateInfo,
sizeof(CallGateInfo),
&BytesReturned,
NULL)) {
return ERROR_IOCONTROL_FAILED;
return SUCCESS;
switch (Reason) {
case DLL_PROCESS_ATTACH:
OpenCallgateDriver();
return TRUE;
case DLL_PROCESS_DETACH:
CloseCallgateDriver();
return TRUE;
default:
return TRUE;
/*
CGATEAPP.C
Copyright (C) 1997 Prasad Dabak and Sandeep Phadke and Milind Borate
*/
#include <windows.h>
#include <stdio.h>
#include "gate.h"
void DumpBaseMemory()
void DumpExtendedMemory()
void DumpControlRegisters()
_asm {
_asm {
_asm {
void cfunc()
DumpBaseMemory();
DumpExtendedMemory();
DumpControlRegisters();
void func(void);
int main(void)
WORD CallGateSelector;
int rc;
short farcall[3];
__try {
cfunc();
__except (EXCEPTION_EXECUTE_HANDLER) {
/* Create a callgate for func, which takes no parameters */
rc=CreateCallGate(func, 0, &CallGateSelector);
if (rc==SUCCESS) {
/* Prepare for making the far call. The offset part of the far pointer is ignored for a callgate; only the selector matters. */
farcall[2]=CallGateSelector;
_asm {
rc=FreeCallGate(CallGateSelector);
if (rc!=SUCCESS) {
rc=%x\n",
CallGateSelector, rc);
} else {
return 0;
.386
.model small
.code
public _func
extrn _cfunc:near
include ..\include\undocnt.inc
_func proc
Ring0Prolog
call _cfunc
Ring0Epilog
retf
_func endp
END
PAGING ISSUES
While writing the callgate sample, we observed certain issues with accessing paged-out
(swapped-out) data in an interrupt routine and in a function called through a callgate. All
the existing interrupt handlers, such as the INT 2Eh handler, execute certain entry and exit
code before performing any real work. Some of the tasks performed by the entry code were:
Out of all these steps, the first step is absolutely necessary and is related to the logic used
by the page fault handler of the operating system. The page fault handler does some
arithmetic on the current stack pointer and the stack pointer at the time of the ring transition
from ring 3 to ring 0, and makes decisions based on the result. If at least a specific amount
of stack space is not found between these two stack pointer values, the system crashes with
a Blue Screen.
It is essential that you follow this step while writing interrupt handlers or functions executed
through a callgate if they are to access paged-out data successfully. The fourth step, setting
the FS register to 0x30, is also necessary, since the system expects the FS register to point to
the Processor Control Region when a thread is executing in ring 0, and selector 0x30 refers to
the descriptor whose base address is the address of the Processor Control Region.
Note: You have to follow the same steps while hooking software interrupts.
The second and third steps seem to be only for bookkeeping.
All the samples in this book that use callgates or interrupt handlers use two macros defined in
the UNDOCNT.INC file, Ring0Prolog and Ring0Epilog. These macros implement the code that
takes care of these paging issues.
SUMMARY
In this chapter, we detailed how interrupts are executed under Windows NT. Then we
discussed a mechanism for adding new software interrupts. Along the way, we discussed
some processor data structures used while processing the interrupt and presented an
example that adds a software interrupt (0x22) to Windows NT. We also showed an
example of an application that calls the newly added interrupt. After that, we discussed
callgates, used for running ring 0 code from ring 3. This was followed by an example that
demonstrated how to use callgates to read processor control registers such as CR0 and CR3
and to do direct port I/O from ring 3. The chapter concluded with a discussion of the
paging issues involved in executing functions through callgates and interrupt handlers.
MICROSOFT INTRODUCED A NEW executable file format with Windows NT. This format is called the
Portable Executable (PE) format because it is supposed to be portable across all 32-bit
operating systems from Microsoft. The same PE format executable can be executed on any
version of Windows NT, Windows 95, and Win32s. Also, the same format is used for
executables for Windows NT running on processors other than Intel x86, such as MIPS,
Alpha, and Power PC. The 32-bit DLLs and Windows NT device drivers also follow the same
PE format.
It is helpful to understand the PE file format because PE files are almost identical on disk
and in RAM, so knowing the file layout also tells you much about how an image is laid out
in memory. Learning about the PE format is also helpful for understanding many operating
system concepts: for example, how the operating system loader supports dynamic linking
of DLL functions, and the data structures involved in dynamic linking, such as the import
table and the export table.
The PE format is not really undocumented. The WINNT.H file has several structure
definitions representing the PE format. The Microsoft Developer's Network (MSDN) CD-
ROMs contain several descriptions of the PE format. However, these descriptions are in
bits and pieces, and are by no means complete. In this chapter, we try to give you a
comprehensive picture of the PE format.
Microsoft also provides a DLL with the SDK that has utility functions for interpreting PE
files. We also discuss these functions and correlate them with other information about the
PE format.
OVERVIEW OF A PE FILE
In this section, we discuss the overall structure of a PE file. In the sections that follow, we
go into detail about the PE format. A PE file comprises various sections. Because
Microsoft’s 32-bit operating systems follow the flat memory model, an executable no
longer contains segments. Still, different parts of an executable, such as code and data,
have different characteristics. These different parts of an executable are stored as
different sections. Thus, a PE file is a concatenation of data stored in sections.
A few sections are always present in a PE file generated by the Microsoft linker. Other
linkers may generate similar sections with different names. A PE file generated with the
Microsoft linker has a .text section that contains the code bytes concatenated from all the
object files. As for the data, it can be classified into different categories. The .data
section contains all the initialized global and static data, while the .bss section contains
the uninitialized data. The read-only data, such as string literals and constants, is stored
in the .rdata section. This section also contains some other read-only structures, such as
the debug directory, the Thread Local Storage (TLS) directory, and so on, which we
explain later in this chapter. The .edata section contains information about the functions
exported from a DLL, while the .idata section stores information about the functions
imported by an executable or a DLL. The .rsrc section contains various resources, such as
menus and dialog boxes. The .reloc section stores the information required for relocating
the image while loading.
The names of the sections do not have any significance. As mentioned earlier, different
linkers may use different names for the sections. Programmers can also create new
sections of their own; with the Microsoft compiler, the #pragma code_seg and #pragma
data_seg directives can be used for this purpose. The operating system loader does not rely
on section names; it locates the required pieces of information through the data directories
present in the file headers. Shortly, we will present an overview of the file headers and then
look at them in more detail.
STRUCTURE OF A PE FILE
Apart from the sections consisting of the actual data, a PE file contains various headers
that describe the sections and the important information present in the sections.
If you look at the hex dump of a PE file, the first 2 bytes might look familiar. Aren’t they
M and Z? Yes, a PE file starts with the DOS executable header. It is followed by a small
program that prints an error message saying that the program cannot be run in DOS mode.
It’s the same idea that was used in 16-bit Windows executables: this stub program is
executed if the PE image is run under DOS.
After the DOS header and the DOS executable stub comes the PE header. A field in the
DOS header (e_lfanew, at offset 0x3C) holds the file offset of this new header. The PE
header starts with the 4-byte signature “PE” followed by two nulls. The PE format is based
on the Common Object File Format (COFF) used by Unix. The PE signature is followed by the
object file header borrowed from COFF. This header is also present in the object files
produced by Microsoft’s 32-bit compilers. This header contains some general information about the file, such as the
target machine ID, the number of sections in the file, and so forth. The COFF style header
is followed by the optional header. This header is optional in the sense that it is not
required for the object files. As far as executables and DLLs are concerned, this header is
mandatory. The optional header has two parts. The first part is inherited from COFF and
can be found in all COFF files. The second part is an NT-specific extension of COFF. Apart
from other NT-specific information, such as the subsystem type, this part also contains the
data directory. The data directory is an array in which each entry points to some
important piece of information. One of the entries in the data directory points to the
import table of the executable or DLL, another entry points to the export table of the
XREF: We will look at the detailed formats of the different pieces of information later in this chapter.
The data directory is followed by the section table. The section table is an array of
section headers. A section header summarizes the important information about the
respective section. Finally, the section table is followed by the sections themselves.
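The header chain described above can be followed with a few lines of C over a file image held in memory, for example one mapped with the Win32 memory mapping API. This sketch is an illustration, not IMAGEHLP code: find_pe_header and the other helpers are hypothetical, and they read fields at their fixed offsets (e_lfanew at 0x3C; Machine, NumberOfSections, and SizeOfOptionalHeader at offsets 0, 2, and 16 of the COFF header) rather than through the WINNT.H structures, so the code also compiles outside Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PE files are little endian; read fields byte by byte so the code
 * does not depend on the host's byte order or alignment. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the file offset of the PE header ("PE\0\0" signature), or -1.
 * Requires room for the signature plus the 20-byte COFF file header. */
long find_pe_header(const uint8_t *file, size_t size)
{
    uint32_t e_lfanew;

    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return -1;                        /* no DOS header */
    e_lfanew = rd32(file + 0x3C);         /* offset of the PE header */
    if (e_lfanew > size - 24 || memcmp(file + e_lfanew, "PE\0\0", 4) != 0)
        return -1;
    return (long)e_lfanew;
}

/* COFF file header fields, at fixed offsets after the 4-byte signature. */
uint16_t pe_machine(const uint8_t *file, long pe)       { return rd16(file + pe + 4); }
uint16_t pe_section_count(const uint8_t *file, long pe) { return rd16(file + pe + 6); }

/* The section table follows the optional header, whose size is stored
 * in SizeOfOptionalHeader (offset 16 of the COFF header). */
long pe_section_table(const uint8_t *file, long pe)
{
    return pe + 4 + 20 + rd16(file + pe + 20);
}
```

A real tool would also validate that the whole section table lies inside the file before reading it.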
We hope that this gives you an overview of the organization of a PE file. Before diving into
the details of the PE format, let’s discuss a concept that is vital in interpreting a PE file.
Because the PE format always talks in terms of relative virtual addresses (RVAs), that is, offsets from the address at which the image is loaded, it’s difficult to find the location of
the required information within a file. A common practice while accessing a PE file is to
map the file in memory using the Win32 memory mapping API. It’s a bit complicated to
calculate the address for the given RVA in this memory-mapped file. You first need to find
out the section in which the given RVA lies. You can accomplish this by iterating through
the section table. Each section header stores the starting RVA for the section and the size
of the section. A section is guaranteed to be contiguously loaded in memory. Hence, the
offset from the start of the section for a particular piece of data is bound to be the same
whether the file is memory mapped or loaded by the operating system loader for
execution. Hence, to find out the address in a memory-mapped file, you simply need to
add this offset to the base address of the section in the memory-mapped file. Now, this
base address can be calculated from the file offset of the section, which is also
stored in the respective section header. Quite an easy procedure, isn’t it?
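The manual procedure can be sketched as follows. SectionInfo is a hypothetical stand-in for IMAGE_SECTION_HEADER that keeps only the fields the translation needs (in WINNT.H the in-memory size is Misc.VirtualSize). Falling back to SizeOfRawData when VirtualSize is 0 is an assumption for headers that leave that field empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the section header fields used below. */
typedef struct {
    uint32_t VirtualAddress;    /* RVA where the section starts        */
    uint32_t VirtualSize;       /* size of the section in memory       */
    uint32_t PointerToRawData;  /* file offset of the section's data   */
    uint32_t SizeOfRawData;     /* size of the section's data on disk  */
} SectionInfo;

/* Translate an RVA into a file offset by finding the section that
 * contains it. Returns -1 if no section contains the RVA, or if the RVA
 * lies in the part of the section that has no data in the file. */
long rva_to_file_offset(const SectionInfo *sec, size_t nsec, uint32_t rva)
{
    for (size_t i = 0; i < nsec; i++) {
        uint32_t start = sec[i].VirtualAddress;
        uint32_t vsize = sec[i].VirtualSize ? sec[i].VirtualSize
                                            : sec[i].SizeOfRawData;
        if (rva >= start && rva - start < vsize) {
            uint32_t delta = rva - start;    /* same offset on disk and in memory */
            if (delta >= sec[i].SizeOfRawData)
                return -1;                   /* e.g. uninitialized data */
            return (long)(sec[i].PointerToRawData + delta);
        }
    }
    return -1;
}
```

The address inside a memory-mapped file is then the base address of the mapping plus the returned offset, which is the result ImageRvaToVa() computes for you.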
ImageRvaToVa()
Don’t worry, there is an easier way out. Microsoft comes to our rescue here with
IMAGEHLP.DLL. This DLL exports a function that computes the address in the memory-
mapped file, given an RVA.
LPVOID ImageRvaToVa(
PIMAGE_NT_HEADERS NtHeaders,
LPVOID Base,
DWORD Rva,
PIMAGE_SECTION_HEADER *LastRvaSection
);
PARAMETERS
NtHeaders Pointer to an IMAGE_NT_HEADERS structure. This structure represents the PE header and is
defined in the WINNT.h file. A pointer to the PE header within a PE file can be obtained using
the ImageNtHeader() function exported by IMAGEHLP.DLL.
Base Base address where the PE file is mapped into memory using the Win32 API for the memory
mapping of files.
Rva Relative virtual address to be translated.
LastRvaSection Pointer to a variable that holds the last RVA section. This is an optional parameter, and you can pass
NULL. When specified, the variable contains the last section used for the specified image to
translate an RVA to a VA. This is used for optimizing the section search, in case the given RVA
also falls within the same section as the one for the previous call to the function. The section
in *LastRvaSection is checked first, and the regular sequential search for the section is carried
out only if the given RVA does not fall within that section.
RETURN VALUES
If the function succeeds, the return value is the virtual address in the mapped file;
otherwise, it is NULL. The error number can be retrieved using the GetLastError()
function.
ImageNtHeader()
The ImageRvaToVa() function needs a pointer to the PE header. The ImageNtHeader() function
exported from IMAGEHLP.DLL can provide you this pointer.
PIMAGE_NT_HEADERS ImageNtHeader(
LPVOID ImageBase
);
PARAMETERS
ImageBase Base address where the PE file is mapped into memory using the Win32 API for the memory
mapping of files.
RETURN VALUES
If the function succeeds, the return value is a pointer to the IMAGE_NT_HEADERS structure
within the mapped file; otherwise, it returns NULL.
MapAndLoad()
The IMAGEHLP.DLL can also take care of memory mapping a PE file for you. The
MapAndLoad() function maps the requested PE file in memory and fills in the
LOADED_IMAGE structure with some useful information about the mapped file.
BOOL MapAndLoad(
LPSTR ImageName,
LPSTR DllPath,
PLOADED_IMAGE LoadedImage,
BOOL DotDll,
BOOL ReadOnly
);
PARAMETERS
ImageName Name of the PE file that is loaded.
DllPath Path used to locate the file if the name provided cannot be found. If NULL is passed, then
normal rules for searching using the PATH environment variable are applied.
LoadedImage The structure LOADED_IMAGE is defined in the IMAGEHLP.H file. The structure has the
following members:
Sections Pointer to the first section header within the mapped file.
Characteristics Characteristics of the PE file (this is explained in more detail later in this chapter).
The function sets the members in the structure appropriately after loading the PE file.
DotDll If the file needs to be searched and does not have an extension, then either the .exe or
the .dll extension is used. If the DotDll flag is set to TRUE, the .dll extension is used;
otherwise, the .exe extension is used.
ReadOnly If TRUE, the file is mapped for read access only; if FALSE, it is mapped for read and
write access.
RETURN VALUES
If the function succeeds, the return value is TRUE; otherwise, it is FALSE.
UnMapAndLoad()
After you are done with the mapped file, you should call the UnMapAndLoad() function. This function unmaps the PE file
and deallocates the resources allocated by the MapAndLoad() function.
BOOL UnMapAndLoad(
PLOADED_IMAGE LoadedImage
);
PARAMETERS
LoadedImage Pointer to a LOADED_IMAGE structure that is returned from a call to the MapAndLoad()
function.
RETURN VALUES
If the function succeeds, the return value is TRUE; otherwise, it is FALSE.
We will discuss the other useful functions from this DLL as we continue in this chapter.
The IMAGE_NT_HEADERS structure that represents the PE header is defined as follows in the WINNT.H file:
typedef struct _IMAGE_NT_HEADERS {
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER OptionalHeader;
} IMAGE_NT_HEADERS, *PIMAGE_NT_HEADERS;
The signature is PE followed by two nulls, as mentioned earlier. The COFF style header is represented by the
IMAGE_FILE_HEADER structure and is followed by the optional header represented by the IMAGE_OPTIONAL_HEADER
structure. The fields in the COFF style header are as follows:
Machine Target machine ID. Various values are defined in the WINNT.H file; for example, 0x14C is used for the Intel
80386 (and compatibles) and 0x184 is used for the Alpha AXP.
PointerToSymbolTable Offset to the COFF symbol table. This field is used only for COFF style
object files and PE files with COFF style debug information.
NumberOfSymbols Number of entries in the COFF symbol table. This data can be used in locating
the string table that immediately follows the symbol table.
SizeOfOptionalHeader Size, in bytes, of the optional header that follows this header. This field
is set to 0 for object files because the optional header is absent in them.
Characteristics Attributes of the file. The flag values are defined in the WINNT.H file.
This field contains an OR of these flags. The important flags are as
follows:
IMAGE_FILE_LINE_NUMS_STRIPPED Indicates that the COFF line numbers have been removed from the file.
IMAGE_FILE_LOCAL_SYMS_STRIPPED Indicates that the COFF symbol table entries for local symbols have been removed from the file.
IMAGE_FILE_DEBUG_STRIPPED Indicates that the debugging information has been removed from the file.
IMAGE_FILE_RELOCS_STRIPPED Indicates that the base relocation information is stripped from this file,
and the file can be loaded only at the preferred base address. If the
loader cannot load such an image at the preferred base address, it fails
because it cannot relocate the image.
IMAGE_FILE_BYTES_REVERSED_LO Little endian: the least significant byte (LSB) precedes the most significant
byte (MSB) in memory; that is, the bytes of a machine word are stored in reverse order.
IMAGE_FILE_BYTES_REVERSED_HI Big endian: the MSB precedes the LSB in memory; again, the bytes of a machine
word are stored in reverse order.
IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP If this flag is set and the file is run from removable media, such as a
floppy, the loader copies the file to the swap area and runs it from there.
IMAGE_FILE_NET_RUN_FROM_SWAP Similar to the previous flag, but it applies when the file is run from a
network drive: the file is copied to the swap area and run from there.
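The flags are tested with ordinary bit masks. The sketch below reproduces a few of the WINNT.H values so that it compiles on its own, and encodes two of the rules just described: an image with IMAGE_FILE_RELOCS_STRIPPED cannot be moved from its preferred base, and each run-from-swap flag applies only to the matching kind of medium. The media enum and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in WINNT.H (reproduced so the sketch compiles anywhere). */
#define IMAGE_FILE_RELOCS_STRIPPED         0x0001
#define IMAGE_FILE_LINE_NUMS_STRIPPED      0x0004
#define IMAGE_FILE_LOCAL_SYMS_STRIPPED     0x0008
#define IMAGE_FILE_BYTES_REVERSED_LO       0x0080
#define IMAGE_FILE_DEBUG_STRIPPED          0x0200
#define IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP 0x0400
#define IMAGE_FILE_NET_RUN_FROM_SWAP       0x0800
#define IMAGE_FILE_BYTES_REVERSED_HI       0x8000

/* Hypothetical classification of where the file is being run from. */
enum media { MEDIA_FIXED, MEDIA_REMOVABLE, MEDIA_NETWORK };

/* An image without base relocations can be loaded only at its preferred base. */
static bool can_load_elsewhere(uint16_t characteristics)
{
    return (characteristics & IMAGE_FILE_RELOCS_STRIPPED) == 0;
}

/* Copy to swap only when the flag matching the medium is set. */
static bool should_run_from_swap(uint16_t characteristics, enum media m)
{
    switch (m) {
    case MEDIA_REMOVABLE:
        return (characteristics & IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP) != 0;
    case MEDIA_NETWORK:
        return (characteristics & IMAGE_FILE_NET_RUN_FROM_SWAP) != 0;
    default:
        return false;
    }
}
```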
Note: The COFF style header is followed by the optional header. The optional header is absent in the object files. The
format of the optional header is defined as the IMAGE_OPTIONAL_HEADER structure in the WINNT.H file. The first few
fields in this structure are inherited from COFF.
SizeOfCode Size of the code section. If there are multiple code sections, this field contains the sum of
sizes of all these sections.
SizeOfInitializedData Size of the initialized data section. If there are multiple initialized data sections, this field
contains the sum of sizes of all these sections.
SizeOfUninitializedData Same as SizeOfInitializedData, but for the uninitialized data (BSS) section.
Microsoft added some NT-specific fields to the optional header. These fields are as follows:
ImageBase If the file is loaded at this address in memory, the loader need not do any base relocations.
This is because the linker resolves all the base relocations at the time of linking, assuming that
the file will be loaded at this address. We discuss this in more detail in the section on the
relocation table. For now, it is enough to know that the loading time is reduced if a file gets
loaded at the preferred base address. A file may not get loaded at the preferred base address
because that address range is already in use. This happens, for example, when more than one DLL
used by an executable has the same preferred base address. The default preferred base address is
0x400000 for executables (Microsoft's linker uses 0x10000000 for DLLs). You may want to give your
DLL a different preferred base address so that it does not clash with that of any other DLL used
by your application. You can change the preferred base address using a linker switch (/BASE). You
can also change the base address of a file using the rebase utility that comes with the Win32 SDK.
ReBaseImage()
The ReBaseImage() function from the IMAGEHLP.DLL also enables you to change the preferred base address.
BOOL ReBaseImage(
LPSTR CurrentImageName,
LPSTR SymbolPath,
BOOL fReBase,
BOOL fRebaseSysfileOk,
BOOL fGoingDown,
DWORD CheckImageSize,
LPDWORD OldImageSize,
LPDWORD OldImageBase,
LPDWORD NewImageSize,
LPDWORD NewImageBase,
DWORD TimeStamp
);
PARAMETERS
SymbolPath If the symbolic debug information is stored in a separate file, the path used to find the
corresponding symbol file. This is required to update the header information and timestamp of
the symbol file.
fRebaseSysfileOk If the file is a system file with the preferred base address above 0x80000000, it is rebased only
if this flag is TRUE.
fGoingDown If you want the loaded image of the file to lie entirely below the given address, set this flag to
TRUE. For example, if the loaded size of a DLL is 0x2000 and you call the function with the
fGoingDown flag as TRUE and give the address as 0x600000, the DLL is rebased at an address no higher
than 0x5FE000 (0x600000 - 0x2000), so that the whole image ends at or below 0x600000.
CheckImageSize Rebasing might change the loaded image size of the file because of the section alignment
requirements. If this parameter is nonzero, the file is rebased only if the changed size is less
than this parameter.
OldImageSize Original image size before the rebase operation is returned here.
OldImageBase Original image base before the rebase operation is returned here.
NewImageSize New loaded image size after the rebase operation is returned here.
NewImageBase New base address. Upon return, it contains the actual address where the file is rebased.
RETURN VALUES
FileAlignment In the file, a section always starts at an offset that is a multiple of the file
alignment. This value is some multiple of the sector size.
MajorImageVersion, MinorImageVersion A developer can use these fields to version his or her files. They can be
specified with a linker flag (/VERSION with Microsoft's linker).
SizeOfImage Size of the image after considering the section alignment. This amount of
virtual memory needs to be reserved for loading the file.
SizeOfHeaders Total size of the headers, including the DOS header, the PE header, and the
section table. The sections containing the actual data start at this offset in
the file.
CheckSum This is used only for the kernel-mode drivers/DLLs. It can be set to 0 for
user-mode executables/DLLs.
Subsystem Subsystem used by the file. The following values are defined in the
WINNT.H file:
IMAGE_SUBSYSTEM_NATIVE Image doesn’t require a subsystem. The kernel-mode drivers and native
applications such as CSRSS.EXE have this value for the field.
DllCharacteristics Obsolete.
SizeOfStackReserve Address space to be reserved for the stack. Only the virtual address space is
marked–the swap space is not allocated.
SizeOfStackCommit Actual memory committed for the stack. This much swap space is initially
allocated. The committed stack size is increased on demand until it reaches
the SizeOfStackReserve.
LoaderFlags Obsolete.
NumberOfRvaAndSizes Number of entries in the data directory that follows this field. It is generally
set to 16 (IMAGE_NUMBEROF_DIRECTORY_ENTRIES).
DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES] As mentioned earlier, each entry in the data directory points
to some important piece of information. Each of these entries is of the type IMAGE_DATA_DIRECTORY, which is
defined as follows in the WINNT.H file:
typedef struct _IMAGE_DATA_DIRECTORY {
DWORD VirtualAddress;
DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
ImageDirectoryEntryToData()
The VirtualAddress field contains the RVA of the respective piece of information, and the Size field contains the size of
the data. To get to the actual data, you need to convert the RVA to the actual address in the memory-mapped PE file.
This can be accomplished with the ImageDirectoryEntryToData() function exported by IMAGEHLP.DLL.
PVOID ImageDirectoryEntryToData(
LPVOID Base,
BOOLEAN MappedAsImage,
USHORT DirectoryEntry,
PULONG Size
);
PARAMETERS
MappedAsImage Set this flag to TRUE if the system loader maps the file. Otherwise, set the flag to FALSE.
Size Upon return, the size from the data directory is filled here.
RETURN VALUES
If the function succeeds, the return value is the address in the memory-mapped file where the required data resides.
Otherwise, the function returns NULL.
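When the system loader maps the file (MappedAsImage is TRUE), the data for an RVA is simply at the base address plus the RVA. When the file is mapped as a plain data file, the RVA must first be translated through the section table, which is described later in this chapter. The following sketch shows one way to do that translation under simplified assumptions; it is not the IMAGEHLP implementation, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The IMAGE_SECTION_HEADER fields this sketch needs (hypothetical struct). */
struct section {
    uint32_t virtual_size;        /* VirtualSize */
    uint32_t virtual_address;     /* VirtualAddress: RVA where the section is mapped */
    uint32_t size_of_raw_data;    /* SizeOfRawData */
    uint32_t pointer_to_raw_data; /* PointerToRawData: file offset of the data */
};

/* Translates an RVA into a file offset for a file mapped as plain data.
   Returns -1 if no section covers the RVA, or if the RVA falls in the
   zero-filled tail of a section, which has no bytes in the file. */
static int64_t rva_to_file_offset(const struct section *secs, size_t n, uint32_t rva)
{
    for (size_t i = 0; i < n; i++) {
        const struct section *s = &secs[i];
        uint32_t span = s->virtual_size > s->size_of_raw_data
                            ? s->virtual_size : s->size_of_raw_data;
        if (rva >= s->virtual_address && rva - s->virtual_address < span) {
            uint32_t delta = rva - s->virtual_address;
            if (delta >= s->size_of_raw_data)
                return -1;
            return (int64_t)s->pointer_to_raw_data + delta;
        }
    }
    return -1;
}
```

The address in the mapped view is then the view's base address plus the returned offset.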
Export Directory
The data directory entry at the IMAGE_DIRECTORY_ENTRY_EXPORT index points to the export directory for the file. The
RVA in this directory entry points to the .edata section. The information about the functions exported by the file
(generally a DLL) is stored here. The data directory entry points to the export directory that is defined as the
IMAGE_EXPORT_DIRECTORY structure in the WINNT.H file. The fields in this structure are as follows:
Base Starting ordinal for the exported functions–that is, the least of the ordinals. Generally, this
field is 1.
NumberOfNames Number of functions that are exported by name. Some functions may be exported only by
ordinal, so this number may be less than NumberOfFunctions.
AddressOfFunctions RVA of an array (let’s call it the export-functions array) that has an entry for each function
exported from the DLL. Hence, the size of this array is equal to the NumberOfFunctions field.
The entry at index i corresponds to the function exported with ordinal i + Base. Each entry in
this array is also an RVA. If the RVA for a particular array entry points within the export
section, then it is a forwarder. Forwarder means that the function is not present in this DLL,
but it is a forwarder reference to some function in another DLL. In such a case, the RVA points
to an ASCIIZ string that stores the name of the other DLL and the function name separated by
a period. In case the target DLL exports the function by ordinal, the function name is formed
as # followed by the ordinal printed in decimal. For example, the KERNEL32.DLL for Windows
NT forwards the HeapAlloc() function to the RtlAllocateHeap() function in the NTDLL.DLL.
Hence, the corresponding RVA in this case points to a location within the export section that
holds the string NTDLL.RtlAllocateHeap. The Win32 applications can import the HeapAlloc()
function from the KERNEL32.DLL without worrying about all these details. When the
application runs on Windows 95, the loader resolves the import reference to the function in
the KERNEL32.DLL. When the same application runs on Windows NT, the loader finds that the
function is forwarded to the NTDLL.DLL. Hence, the loader automatically loads the NTDLL.DLL
and resolves the imported function to the RtlAllocateHeap() function.
When an export-functions array entry is not a forwarder–that is, the RVA does not lie within the export section–the RVA
points to the entry point of the function or to the location of the exported variable.
The export-functions array may have gaps, because some ordinals in the range might be left unused when functions are
exported. The array entry for such an unused ordinal is set to 0.
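The ordinal lookup and the forwarder test described above fit in a few lines of C. In this sketch the caller is assumed to have already mapped the export-functions array into memory, and "within the export section" is interpreted, as an assumption, as lying inside the range given by the IMAGE_DIRECTORY_ENTRY_EXPORT data directory entry. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of one export-functions array entry. */
enum export_kind { EXPORT_UNUSED, EXPORT_CODE_OR_DATA, EXPORT_FORWARDER };

/* base, funcs, nfuncs: Base, the mapped AddressOfFunctions array, NumberOfFunctions.
   dir_rva, dir_size: the export data directory entry, used for the forwarder test. */
static enum export_kind resolve_ordinal(uint32_t ordinal, uint32_t base,
                                        const uint32_t *funcs, uint32_t nfuncs,
                                        uint32_t dir_rva, uint32_t dir_size,
                                        uint32_t *rva_out)
{
    if (ordinal < base || ordinal - base >= nfuncs)
        return EXPORT_UNUSED;                 /* ordinal outside the array */
    uint32_t rva = funcs[ordinal - base];     /* entry i <-> ordinal i + Base */
    *rva_out = rva;
    if (rva == 0)
        return EXPORT_UNUSED;                 /* gap: ordinal not used */
    if (rva >= dir_rva && rva - dir_rva < dir_size)
        return EXPORT_FORWARDER;              /* RVA of a "DLL.Function" string */
    return EXPORT_CODE_OR_DATA;               /* entry point or exported variable */
}
```

With Base set to 1, ordinal 3 selects array index 2; if that entry's RVA falls inside the export directory range, it points to a forwarder string such as NTDLL.RtlAllocateHeap rather than to code.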
AddressOfNames RVA of an array, called the export-names array, that has an entry for every function that is
exported by name. Hence, the size of this array is equal to the NumberOfNames field. Each
entry in this array is an RVA pointing to an ASCIIZ string containing the export name. The
array is sorted in lexical order to allow a binary search.
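AddressOfNameOrdinals RVA of an array of ordinals, henceforth called the export-ordinals array. This array has the
same size as the export-names array. All three arrays, namely, export-names, export-ordinals, and export-functions,
work together: the entry at index i of the export-ordinals array is the index, in the export-functions array, of the
function whose name is at index i of the export-names array.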
Import Directory
The next index in the data directory, IMAGE_DIRECTORY_ENTRY_IMPORT, is reserved for the import directory of an
executable/DLL. The RVA in this data directory entry points to the import directory, which is nothing but a variable-
sized array of IMAGE_IMPORT_DESCRIPTORs, one for each imported DLL. The first field in this structure is a union. If the
Characteristics field in this union is 0, it indicates the end of the variable-sized import descriptors array. Otherwise, the
union is interpreted using the other member, OriginalFirstThunk.
OriginalFirstThunk This is an RVA of what Microsoft calls the Import Lookup Table (ILT). Each entry in the ILT is
a 32-bit number. If the MSB of this number is set, it is treated as an import by ordinal, and the
lower 16 bits hold the ordinal of the imported function. If the MSB is not set, the
number is treated as an RVA to the IMAGE_IMPORT_BY_NAME structure. The first member of
this structure is a hint for searching for the imported name in the export directory of the
imported DLL. The loader uses this hint as the starting index in the export-names array when
it does a binary search while resolving the import reference. The hint is followed by an ASCIIZ
name of the import reference.
The WINNT.H file provides the IMAGE_SNAP_BY_ORDINAL macro to determine whether it’s an import by ordinal. It also
provides the IMAGE_ORDINAL macro to get the ordinal from the 32-bit number in the ILT. The ILT is a variable-sized
array. The end of the ILT is marked with a 0.
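A loader-like routine first decides how each ILT entry is to be resolved and, for imports by name, searches the exporting DLL's sorted export-names array. The sketch below is hypothetical: its two macros perform the same tests as the 32-bit forms of IMAGE_SNAP_BY_ORDINAL and IMAGE_ORDINAL in WINNT.H, and the name search is a simple variant of the hint-assisted search described above (try the hint slot first, then fall back to an ordinary binary search). It is not the actual Windows NT loader code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same tests as the 32-bit IMAGE_SNAP_BY_ORDINAL and IMAGE_ORDINAL macros. */
#define SNAP_BY_ORDINAL32(e) (((e) & 0x80000000u) != 0)
#define ORDINAL32(e)         ((e) & 0xFFFFu)

/* Looks up an imported name in a sorted export-names array, trying the hint first.
   Returns the index into the export-names array, or -1 if the name is not exported. */
static int find_export_name(const char *const *names, int count,
                            const char *wanted, int hint)
{
    if (hint >= 0 && hint < count && strcmp(names[hint], wanted) == 0)
        return hint;                          /* hint was exact: no search needed */
    int lo = 0, hi = count - 1;
    while (lo <= hi) {                        /* binary search on the sorted names */
        int mid = lo + (hi - lo) / 2;
        int c = strcmp(names[mid], wanted);
        if (c == 0)
            return mid;
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

The index returned by find_export_name() would then be used with the export-ordinals array to reach the export-functions entry.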
TimeDateStamp This field is set to 0, unless the imports are bound. We discuss what’s meant by binding
the imports of a PE file shortly.
Name RVA of the ASCIIZ string that stores the name of the imported DLL.
FirstThunk RVA of the Import Address Table (IAT). The IAT is another array parallel to the ILT, unless the
image is bound. The IAT also has ordinals or pointers to the IMAGE_IMPORT_BY_NAME
structures. When the loader resolves the import references, it replaces the entries in the IAT
with the actual addresses of the corresponding functions. Astonishingly, that is all it needs to
do to achieve dynamic linking–everything else is already set in place by the linker and import
librarian. Let’s see how all these components work together to achieve dynamic linking.
Every DLL has an import library that can either be created using an import librarian or may be generated by the linker
itself while creating the DLL. The import library has stub functions with names the same as those of the functions
exported from the DLL. The import library also has a .idata section containing an import table that has entries for all the
functions from the DLL. Each stub function is an indirect jump that refers to the appropriate entry in the IAT in
the .idata section. When an executable is linked with the import library, the linker resolves the imported function calls
to the stub functions in the import library. The linker also concatenates the .text section from the import library that
contains the stub functions with the .text section of the generated executable. The .idata sections and, incidentally, the
import directories are also concatenated. The stage is now set for loading. While loading, the entries in the IAT are
replaced by the actual function addresses, and that’s it. Now when the function is called, the control is transferred to
the stub function that performs an indirect jump. As the IAT entry contains the address of the actual function from the
DLL, the control is transferred to the required function.
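The mechanism can be imitated in plain C. In the sketch below, an array of function pointers plays the role of the IAT, the stub_ functions play the role of the import-library stubs (in a real image each stub is a single indirect JMP instruction, not a C function), and simulate_loader_binding() stands in for the loader overwriting the IAT entries. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for two functions exported by some DLL. */
static int dll_add(int a, int b) { return a + b; }
static int dll_mul(int a, int b) { return a * b; }

typedef int (*import_fn)(int, int);

/* The "IAT": one slot per imported function. A real IAT initially holds
   ordinals or name RVAs; here NULL stands in for "not yet resolved". */
static import_fn iat[2];

/* The import-library stubs: an indirect transfer through the IAT slot. */
static int stub_add(int a, int b) { return iat[0](a, b); }
static int stub_mul(int a, int b) { return iat[1](a, b); }

/* The loader's part: replace each slot with the actual function address. */
static void simulate_loader_binding(void)
{
    iat[0] = dll_add;
    iat[1] = dll_mul;
}
```

With __declspec(dllimport), the compiler would instead emit the equivalent of calling iat[0](2, 3) directly, which avoids the extra jump through the stub.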
The situation is a bit different if you use the new __declspec(dllimport) directive while prototyping an imported
function. In that case, the compiler itself generates an import table. In addition, it generates an indirect call referring
to the appropriate location in the generated IAT. This method does away with the overhead of an extra jump.
A major portion of loading time is spent on resolving the imports. The loader has to search each imported symbol in the
export directory of the imported DLL to find out the virtual address of the symbol. The loading time can be drastically
reduced if the IAT contains the actual address of the symbol instead of the name or ordinal. Such a PE file is called a
bound image. The imported symbol addresses are calculated assuming that the imported DLL will be loaded at its
preferred base address. The IMAGE_IMPORT_DESCRIPTORs in a bound PE file are also modified: the TimeDateStamp field
stores the timestamp of the imported DLL. At load time, if this timestamp does not match that of the DLL actually
loaded, or if the DLL is not loaded at its preferred base address, the imports need to be resolved again. Because the IAT
has been overwritten and no longer contains the symbol names or ordinals, the ILT is used in this case to resolve the imports.
The forwarded functions pose another problem with binding. The addresses of the forwarded functions cannot be
calculated at bind time, and so these functions have to be resolved at load time. A list of all the forwarded functions for
an imported DLL is maintained through the ForwarderChain member in the corresponding IMAGE_IMPORT_DESCRIPTOR.
This member stores the index of a forwarded function in the IAT. The IAT entry at this index stores the index of the next
forwarded function, and so on, forming a list of forwarded functions. The list is terminated by an entry of -1 (0xFFFFFFFF).
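Walking the chain is a simple linked-list traversal over IAT indexes. In the sketch below, the resolved array is a hypothetical stand-in for the addresses that re-resolving each import through the ILT would produce at load time. The chain ends at an entry of -1 (0xFFFFFFFF), and a visit counter guards against a malformed, cyclic chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_END 0xFFFFFFFFu   /* -1: end of the forwarder chain */

/* Walks the forwarder chain of one bound import descriptor.
   iat: the descriptor's IAT; forwarder_chain: its ForwarderChain member;
   resolved[i]: address obtained by resolving ILT entry i at load time
   (hypothetical stand-in for the real lookup).
   Returns the number of forwarded entries patched. */
static size_t walk_forwarder_chain(uint32_t *iat, size_t iat_len,
                                   uint32_t forwarder_chain,
                                   const uint32_t *resolved)
{
    size_t visited = 0;
    uint32_t index = forwarder_chain;
    while (index != CHAIN_END && index < iat_len && visited < iat_len) {
        uint32_t next = iat[index];   /* read the link before overwriting it */
        iat[index] = resolved[index]; /* patch in the real address */
        visited++;
        index = next;
    }
    return visited;
}
```

The entries that are not on the chain keep their bound addresses untouched.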
BindImage()
The bind utility that is shipped with the Win32 SDK enables binding of PE files. Also, the BindImage() and BindImageEx()
functions in the IMAGEHLP.DLL provide this functionality.
BOOL BindImage(
LPSTR ImageName,
LPSTR DllPath,
LPSTR SymbolPath
);
PARAMETERS
ImageName The filename of the file to be bound. This can contain only a filename, a partial path, or a full
path.
DllPath A root path to search for ImageName if the filename contained in ImageName cannot be
opened.
SymbolPath A root path to search for the corresponding symbol file. If the symbol file is stored separately,
the header of the symbol file is changed to reflect the changes in the PE file.
RETURN VALUES
BindImageEx()
This function is similar to the BindImage() function, except that it provides more control, such as a periodic
callback that reports the progress of the binding process.
BOOL BindImageEx(
IN DWORD Flags,
IN LPSTR ImageName,
IN LPSTR DllPath,
IN LPSTR SymbolPath,
IN PIMAGEHLP_STATUS_ROUTINE StatusRoutine
);
PARAMETERS
Flags This field controls the behavior of the function. It is set to an OR of flag values
defined in the IMAGEHLP.H file, which include the following:
BIND_ALL_IMAGES Bind all images that are in the call tree for this file.
StatusRoutine Pointer to a status routine. The status routine is called during the progress of the image
binding process.
RETURN VALUES
Calling BindImage is equivalent to calling BindImageEx with Flags as 0 and StatusRoutine as NULL. That is, calling
BindImage(ImageName, DllPath, SymbolPath) is equivalent to calling BindImageEx(0, ImageName, DllPath, SymbolPath,
NULL).
Resource Directory
The next index in the data directory, IMAGE_DIRECTORY_ENTRY_RESOURCE, refers to the resource directory for a PE
file. The resource directory and the resources themselves are generally stored in a section named .rsrc. The
resources are maintained in a tree structure similar to that in a file system. The root directory contains subdirectories. A
subdirectory can contain subdirectories or resource data. The subdirectories can be nested to any level, but Windows NT
uses only a three-level structure. At each level, the resource directory branches according to certain characteristics of
the resources. At the first level, the type of the resource–bitmap, menu, and so on–is considered. All the bitmaps are
stored under one subtree, all the menus are stored under another subtree, and so on. At the next level, the name of the
resource is considered, and the third level classifies the resource according to the language ID. The third-level resource
directory points to a leaf node that stores the actual resource data.
A resource directory consists of summary information about the directory followed by the directory entries. Each
directory entry has a name or ID that is interpreted as a type ID, a name ID, or a language ID, depending on the level of
the directory. A directory entry can point either to the resource data or to a subdirectory that has a similar format.
The format of the resource directory is defined as the IMAGE_RESOURCE_DIRECTORY structure in WINNT.H.
TimeDateStamp Date and time when the resource was generated by the resource compiler.
NumberOfNamedEntries Number of directory entries having string names. These entries immediately follow
the directory summary information and are sorted.
NumberOfIdEntries Number of directory entries that use integer IDs as the names. These entries
follow the ones having string names.
This summary information is followed by the directory entries. Each directory entry has a format as defined by the
IMAGE_RESOURCE_DIRECTORY_ENTRY structure in WINNT.H. This structure is composed of two unions. The first union
stores the ID of the entry. If the MSB is set, then the lower 31 bits of this field are the offset of the Unicode string that
stores the name of the entry. The Unicode string consists of the length of the string followed by the 16-bit Unicode
characters. If the MSB is not set, then the union stores the integer ID of the resource. This first union stores the type ID,
the name ID, or the language ID, depending on the level of the directory. The second union, in the
IMAGE_RESOURCE_DIRECTORY_ENTRY structure, points either to another resource directory or to the resource data,
depending on the MSB. If the bit is set, the lower 31 bits are the offset of another subdirectory. If the MSB is not set,
they are the offset of the resource data entry that forms a leaf node of the resource directory tree structure. All of these
offsets are measured from the start of the resource directory (the root), not from the image base, so they are not RVAs.
The format of the resource data entry is defined as the IMAGE_RESOURCE_DATA_ENTRY structure in the WINNT.H file and
has the following members:
CodePage Code page used to decode code point values within the resource data. Typically, the code
page would be the Unicode code page.
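Decoding a directory entry comes down to testing and masking the high bit of each union. The sketch below works on the two raw DWORDs of an IMAGE_RESOURCE_DIRECTORY_ENTRY and, following the PE specification, treats the masked values as offsets from the start of the resource directory. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HIGH_BIT 0x80000000u

/* The two DWORDs of an IMAGE_RESOURCE_DIRECTORY_ENTRY, read as raw values. */
struct res_entry {
    uint32_t name;            /* first union: string-name offset or integer ID */
    uint32_t offset_to_data;  /* second union: subdirectory or data-entry offset */
};

static bool entry_has_string_name(struct res_entry e) { return (e.name & HIGH_BIT) != 0; }

/* Offset of the length-prefixed Unicode name, from the start of the resource directory. */
static uint32_t entry_name_offset(struct res_entry e) { return e.name & ~HIGH_BIT; }

/* Integer type, name, or language ID (meaningful only when the high bit is clear). */
static uint16_t entry_id(struct res_entry e) { return (uint16_t)(e.name & 0xFFFFu); }

static bool entry_is_subdirectory(struct res_entry e) { return (e.offset_to_data & HIGH_BIT) != 0; }

/* Offset of the subdirectory or of the IMAGE_RESOURCE_DATA_ENTRY leaf. */
static uint32_t entry_target_offset(struct res_entry e) { return e.offset_to_data & ~HIGH_BIT; }
```

A resource reader would start at the root, follow a type-level entry's subdirectory offset to the name level, then to the language level, and finally read the IMAGE_RESOURCE_DATA_ENTRY at the leaf offset.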
Relocation Table
A PE file needs only based relocations. The linker resolves all the relative relocations, assuming that the file will get
loaded at the preferred base address. For example, if a function foo has the RVA as 0x100 and the preferred base
address is 0x400000, the linker resolves the call to foo as a call to address 0x400100. At run time, if the file is loaded at
the preferred base address of 0x400000, then no relocation needs to be performed. If, for some reason, the file cannot
be loaded at the base address of 0x400000, the loader needs to patch the call. If the loader manages to load the file at a
base address of 0x600000, it needs to change the call address to 0x600100. In general, it needs to add the difference
between the actual and the preferred base address (here, 0x600000 - 0x400000 = 0x200000) to all the to-be-patched
locations. This process is called based relocation. The list of the to-be-patched
locations, also called fixups, is maintained in the relocation table that is generally present in the .reloc section and is
pointed to by the data directory entry at the IMAGE_DIRECTORY_ENTRY_BASERELOC index. The relocation table is
nothing but a series of relocation blocks, each representing the fixups for a 4K page. Each relocation block has a header
followed by the relocation entries for the corresponding page. The relocation block format is defined as the
IMAGE_BASE_RELOCATION structure in the WINNT.H file, and it has the following fields:
SizeOfBlock Total size of the relocation block, including the header and the relocation entries.
Each relocation entry is a 16-bit word. The higher 4 bits indicate the type of relocation, and the lower 12 bits are the
offset of the fixup location within the 4K page. The address to be patched is calculated by adding the base address for
loading, the RVA of the page to be patched, and the 12-bit offset within the page. The relocation types are defined in
the WINNT.H file; only two of them are used on Intel machines:
IMAGE_REL_BASED_ABSOLUTE The relocation is skipped. This type can be used to pad a relocation block so that
the next block starts at a 4-byte boundary.
IMAGE_REL_BASED_HIGHLOW The relocation adds the base-address difference to the 32-bit double word at the
location denoted by the 12-bit offset.
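The procedure can be sketched in C as follows. This is a simplified illustration, not the Windows NT loader: it walks the relocation blocks of an image held in a byte buffer indexed by RVA, skips IMAGE_REL_BASED_ABSOLUTE padding entries, adds the base-address difference for IMAGE_REL_BASED_HIGHLOW entries, and rejects any other type. The function and helper names are hypothetical; the type values 0 and 3 are the WINNT.H ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_REL_BASED_ABSOLUTE 0
#define IMAGE_REL_BASED_HIGHLOW  3

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Applies the relocation table `reloc` to `image` (indexed by RVA).
   delta = actual load address - preferred ImageBase (unsigned, wraps mod 2^32).
   Returns 0 on success, -1 on a malformed table or an unsupported type. */
static int apply_base_relocs(uint8_t *image, size_t image_size,
                             const uint8_t *reloc, size_t reloc_size,
                             uint32_t delta)
{
    size_t pos = 0;
    while (reloc_size - pos >= 8) {
        uint32_t page_rva = rd32(reloc + pos);        /* VirtualAddress of the page */
        uint32_t block_size = rd32(reloc + pos + 4);  /* SizeOfBlock */
        if (block_size < 8 || block_size > reloc_size - pos)
            return -1;
        size_t count = (block_size - 8) / 2;          /* 16-bit entries */
        for (size_t i = 0; i < count; i++) {
            const uint8_t *e = reloc + pos + 8 + 2 * i;
            uint16_t entry = (uint16_t)(e[0] | (e[1] << 8));
            unsigned type = entry >> 12;              /* high 4 bits */
            uint32_t offset = entry & 0x0FFFu;        /* low 12 bits */
            if (type == IMAGE_REL_BASED_ABSOLUTE)
                continue;                             /* padding entry */
            if (type != IMAGE_REL_BASED_HIGHLOW)
                return -1;
            size_t at = (size_t)page_rva + offset;
            if (at > image_size || image_size - at < 4)
                return -1;
            wr32(image + at, rd32(image + at) + delta); /* add the base difference */
        }
        pos += block_size;
    }
    return 0;
}
```

With the numbers from the example above, a call operand holding 0x400100 in a file preferred at 0x400000 becomes 0x600100 when apply_base_relocs() is called with a delta of 0x200000. Because the arithmetic is unsigned, a load below the preferred base works too: the delta simply wraps modulo 2^32.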
Debug Directory
The operating system is not concerned with the debug information present in a PE file. The debugging tools access the
debug information in a PE file. There are various debugging tools, which expect the debug information in different
formats. The corresponding compilers/linkers also store the debug information in different formats. The PE format
allows the debug information to be stored in different formats, such as COFF, Frame Pointer Omission (FPO), CodeView
(CV4), and so on. A single file may contain debug information in more than one format. The debug directory pointed to
by the IMAGE_DIRECTORY_ENTRY_DEBUG entry in the data directory is an array of debug directory entries, one for each
debug information format. The IMAGE_DEBUG_DIRECTORY structure in the WINNT.H file represents the format of a
debug directory entry.
TimeDateStamp Date and time when the debug data was created.
Of the different debug information formats, three are frequently encountered in PE files. The first one is the format
used by the popular CodeView debugger. This format is defined in the CV4 specification. The FPO format is used to
describe nonstandard stack frames. Not every function in a PE file needs an FPO debug entry. The functions
without one are assumed to have a normal stack frame. The third important format is COFF, which is the native debug
information format for PE files. The PE header itself points to the COFF symbol table. The COFF debug information
consists of symbols and line numbers.
In such a case, each thread gets a private copy of i. Whenever a particular thread is running, its own private copy of i
should be automatically activated. This is achieved in Windows NT using the Thread Local Storage (TLS) mechanism.
Let’s see how it works.
Do not confuse the local data of a thread with the local variables that are created on stack. Each thread has a separate
stack and local variables that are created and destroyed separately for each thread as the stack grows and shrinks. In
this section, the phrase local data means global variables that have a separate copy for each thread.
The operating system maintains a structure called the Thread Environment Block (TEB) for every thread running in the
system. The FS segment register is always set such that the address FS:0 points to the TEB of the thread being executed.
The TEB contains a pointer to the TLS array. The TLS array is an array of 4-byte DWORDs. Similar to the TEB, a separate
TLS array is present for each thread. A thread can store its local data in the TLS array. Generally, programs store
pointers to local data in some slot in the TLS array. The slot allocation for the TLS array is controlled by the API
functions TlsAlloc() and TlsFree(). The Win32 API also provides functions, TlsSetValue() and TlsGetValue(), to set and get
the value at a particular index in the TLS array.
It is cumbersome to access the thread-specific data using the API functions. An easier way is to use the __declspec(thread)
specification while declaring global variables that need to have a private copy for each thread. All such
variables are gathered by the compiler/linker, and a single TLS array index is automatically allotted to this bunch of
data. The TLS array entry at this index contains the pointer to a local data buffer that stores all these variables. These
variables are accessed as any other normal variable in the program. Whenever such a variable is accessed, the compiler
takes care to generate the code to access the TLS array entry and the data at a proper offset within the local data
buffer.
This discussion is a bit off track. However, it is necessary before discussing the IMAGE_DIRECTORY_ENTRY_TLS data
directory entry. The TLS directory structure is defined as IMAGE_TLS_DIRECTORY in the WINNT.H. Let’s have a look at
this structure and see how it fits in the TLS mechanism.
StartAddressOfRawData Each time a new thread is created, the operating system allocates a new local data buffer for
the thread and initializes the buffer with the data that is pointed to by this field. Note that
this address is not an RVA, but it is a proper virtual address that has a relocation entry in
the .reloc section.
EndAddressOfRawData Virtual address of the end of the initialization data. The rest of the local data buffer is filled
with zeros.
AddressOfIndex Address in the data section where the loader should store the automatically allotted TLS
index. The code accessing TLS variables accesses the index from this location.
AddressOfCallBacks Pointer to a null-terminated array of TLS callback functions. Each function in this array is
called whenever a new thread is created. These functions can perform additional
initialization (for example, calling constructors) for the TLS data. The TLS callback has the
same parameters as the DLL entry-point function.
SizeOfZeroFill Size of the local data that is to be initialized to zero. The total size of the local data is
(EndAddressOfRawData - StartAddressOfRawData) + SizeOfZeroFill.
Characteristics Reserved.
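The per-thread initialization that the TLS directory describes can be sketched as follows: allocate (EndAddressOfRawData - StartAddressOfRawData) + SizeOfZeroFill bytes, copy the raw-data template, then zero-fill the rest. In this hypothetical sketch the two addresses are ordinary pointers into a template buffer rather than virtual addresses in a mapped image, and the TLS callbacks are not invoked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The TLS directory fields used here (hypothetical struct). */
struct tls_dir {
    const uint8_t *start_of_raw_data;  /* StartAddressOfRawData */
    const uint8_t *end_of_raw_data;    /* EndAddressOfRawData */
    uint32_t size_of_zero_fill;        /* SizeOfZeroFill */
};

/* Allocates and initializes one thread's local data buffer.
   The caller frees the result; *size_out receives its size. */
static uint8_t *new_thread_tls_buffer(const struct tls_dir *d, size_t *size_out)
{
    size_t raw = (size_t)(d->end_of_raw_data - d->start_of_raw_data);
    size_t total = raw + d->size_of_zero_fill;
    uint8_t *buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, d->start_of_raw_data, raw);        /* initialized part */
    memset(buf + raw, 0, d->size_of_zero_fill);    /* zero-filled part */
    *size_out = total;
    return buf;
}
```

The loader would store a pointer to such a buffer in the thread's TLS array at the index it wrote to AddressOfIndex.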
Section Table
We’ve roamed through the PE format without bothering about the section formats. This is possible because of the data
directory that directly locates the important pieces of information within a PE file. You need not know about the
sections at all to interpret a PE file. Nevertheless, in case you need to modify a PE file, you may be required to know
about the sections and section headers. For example, you may want to add, remove, or extend a particular section, and
this requires changes to the section table, among other things.
As mentioned earlier, the PE header is followed by the section table. The section table is an array of section headers.
The format of the section header is defined by the IMAGE_SECTION_HEADER structure in the WINNT.H file. The members
of a section header are as follows:
SizeOfRawData Size of the section as stored in the file. This is equal to the VirtualSize rounded
to the next file alignment multiple.
PointerToRawData File offset of the section data. If you memory map a PE file as a data file, this field
needs to be used to get to the section data.
PointerToLinenumbers File offset of the COFF style line number information.
IMAGE_SCN_LNK_REMOVE Section will not become part of the loaded image. The .debug section may
have this flag set.
IMAGE_SCN_MEM_DISCARDABLE Section can be discarded. The relocation table and debug information can be
discarded after the loading process is over. Hence, the .debug and .reloc sections typically have this flag set.
IMAGE_SCN_MEM_SHARED Section can be shared in memory. If a DLL has the data section with this flag
set, all the instances of the DLL in different processes share the same data.
IMAGE_SCN_MEM_EXECUTE Section can be executed. For the code sections, both the
IMAGE_SCN_CNT_CODE and IMAGE_SCN_MEM_EXECUTE flags are set.
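Because PointerToRawData and SizeOfRawData record where each section's bytes live in the file, a tool that reads a PE file as a flat data file (instead of letting the loader map it) has to translate RVAs into file offsets itself. The C sketch below shows one way to do that by scanning the section table. SECTION_HEADER is a hypothetical local mirror of the WINNT.H IMAGE_SECTION_HEADER structure (its VirtualSize corresponds to the Misc.VirtualSize union member there), and rva_to_file_offset is an illustrative helper, not an API from WINNT.H or IMAGEHLP.DLL. For simplicity, RVAs in the header area below the first section are reported as not found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of IMAGE_SECTION_HEADER from WINNT.H.
   VirtualSize corresponds to the Misc.VirtualSize union member there. */
typedef struct {
    uint8_t  Name[8];
    uint32_t VirtualSize;          /* size of the section in memory */
    uint32_t VirtualAddress;       /* RVA of the section in memory */
    uint32_t SizeOfRawData;        /* size of the section's data in the file */
    uint32_t PointerToRawData;     /* file offset of the section's data */
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
} SECTION_HEADER;

/* Translates an RVA into a file offset by finding the section that contains
   it. Returns 0 and stores the offset on success; returns -1 if no section
   has file data at that RVA (header area, zero-filled tail, or out of range). */
int rva_to_file_offset(const SECTION_HEADER *sections, size_t count,
                       uint32_t rva, uint32_t *offset)
{
    assert(offset != NULL);
    assert(sections != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        const SECTION_HEADER *s = &sections[i];
        if (rva < s->VirtualAddress)
            continue;
        uint32_t delta = rva - s->VirtualAddress;

        /* Only the first min(VirtualSize, SizeOfRawData) bytes of the section
           come from the file; a VirtualSize of 0 means "use the raw size". */
        uint32_t backed = s->SizeOfRawData;
        if (s->VirtualSize != 0 && s->VirtualSize < backed)
            backed = s->VirtualSize;

        if (delta < backed) {
            *offset = s->PointerToRawData + delta;
            return 0;
        }
    }
    return -1;
}
```

Bytes between SizeOfRawData and VirtualSize have no backing in the file (the loader zero-fills them), which is why the lookup is bounded by both sizes rather than by VirtualSize alone.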
LOADING PROCEDURE
Let’s see how the loader interprets a PE file and prepares a memory image for execution. First, the loader needs to find
free virtual address space in which to map the file. It tries to load the image at the preferred base address; if that
range is not available, the image is mapped elsewhere. Next, the loader maps the sections in memory. It goes through the
section table and maps each section at the address obtained by adding the section’s RVA to the base address. The page
attributes are set according to each section’s Characteristics flags. After mapping the sections, the loader performs
based relocation if the actual base address differs from the preferred base address. Then, the import table is checked
and the required DLLs are loaded. The same procedure (mapping sections, based relocation, resolving imports, and so on)
is applied while loading a DLL. After loading each DLL, the loader fixes up the IAT so that each entry holds the actual
address of the imported function.
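Based relocation is the step that lets an image run away from its preferred base. The C sketch below assumes that the image has already been mapped into a buffer laid out by RVA and that the relocation table has been located through the data directory. It walks the table block by block: each block starts with a page RVA and a block size (the IMAGE_BASE_RELOCATION header in WINNT.H), followed by 16-bit entries whose top 4 bits give the relocation type and whose low 12 bits give the offset within the page. For every 32-bit HIGHLOW fixup, it adds the difference between the actual and the preferred base. The function and macro names are hypothetical (WINNT.H calls the constants IMAGE_REL_BASED_ABSOLUTE and IMAGE_REL_BASED_HIGHLOW), and only these two types, the ones typical of 32-bit x86 images, are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; WINNT.H calls these IMAGE_REL_BASED_ABSOLUTE and
   IMAGE_REL_BASED_HIGHLOW. */
#define REL_BASED_ABSOLUTE 0 /* padding entry, nothing to fix */
#define REL_BASED_HIGHLOW  3 /* add the full 32-bit delta to a DWORD */

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static void write_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Applies the relocation table `reloc` (reloc_size bytes) to `image`, a buffer
   holding the mapped image laid out by RVA. Returns 0 on success, -1 on a
   malformed table or an unsupported relocation type. */
int apply_base_relocations(uint8_t *image, size_t image_size,
                           const uint8_t *reloc, size_t reloc_size,
                           uint32_t preferred_base, uint32_t actual_base)
{
    assert(image != NULL && reloc != NULL);

    /* Unsigned subtraction wraps modulo 2^32, so a lower actual base works too. */
    uint32_t delta = actual_base - preferred_base;
    if (delta == 0)
        return 0; /* loaded at the preferred base: nothing to do */

    size_t pos = 0;
    while (pos + 8 <= reloc_size) {
        /* Block header: RVA of the 4K page, then total block size in bytes. */
        uint32_t page_rva = read_u32(reloc + pos);
        uint32_t block_size = read_u32(reloc + pos + 4);
        if (block_size < 8 || block_size > reloc_size - pos)
            return -1;

        size_t entries = (block_size - 8) / 2;
        for (size_t i = 0; i < entries; i++) {
            uint16_t entry = read_u16(reloc + pos + 8 + 2 * i);
            unsigned type = entry >> 12;                  /* top 4 bits */
            uint32_t rva = page_rva + (entry & 0x0FFFu);  /* low 12 bits */

            if (type == REL_BASED_ABSOLUTE)
                continue;
            if (type != REL_BASED_HIGHLOW)
                return -1; /* other types are outside this sketch */
            if ((size_t)rva > image_size || image_size - rva < 4)
                return -1;
            write_u32(image + rva, read_u32(image + rva) + delta);
        }
        pos += block_size;
    }
    return 0;
}
```

As in the loader description, no fixups are applied when the image lands at its preferred base; the early return on a zero delta reflects that.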
SUMMARY
Microsoft introduced the Portable Executable (PE) file format with Windows NT. The PE format serves as the executable
file format for all the 32-bit Microsoft operating systems (that is, the various versions of Windows NT and Windows
95/98), though these operating systems still support older executable file formats, including the DOS executable file
format.
Various components in a PE file are addressed using relative virtual addresses (RVAs). IMAGEHLP.DLL provides utility
functions to memory map a PE file and to find the in-memory address that corresponds to an RVA specified in the file. A
PE file is composed of the file headers, the data directory, the section table, and the various sections. The data
directory points to the important parts of the PE file: the export directory, the import directory, the relocation
table, the debug directory, and the Thread Local Storage (TLS) directory. The export directory lists the symbols
exported from the PE file, which is most likely a DLL. The import directory lists all the symbols imported by the PE
file. When a PE file is loaded in memory for execution, the loader resolves the imported symbols to actual virtual
addresses in the DLLs that export them. This process is termed dynamic linking.
The PE headers are followed by the section table, which points to all the sections, including the ones pointed to by
the various data directory entries. The loader reads the section table and maps the various sections of a PE file in
memory. Then it prepares the image for execution by relocating it to the address at which it is mapped and by resolving
the imported symbols after loading the required DLLs.