Allocating Memory the Newfangled Way: The new Operator

Dale E. Rogerson
Microsoft Developer Network Technology Group

Created: August 6, 1992
Revised: January 21, 1993

"The business side is easy—easy! ...if you're any good at math at all, you understand business. It's not its own deep, deep subject. It's not like C++."
William Gates, Upside Magazine, April 1992

There are two sample applications associated with this technical article: OWNER and NEWOPR.

Many developers ask the question, "Do I need to overload the new operator for Windows™–based applications?" when they start programming in C++ with the Microsoft® C/C++ version 7.0 compiler. These developers want to conserve selectors while allocating memory from the global heap. Fortunately, the C/C++ version 7.0 runtime library allows developers to reduce selector consumption without overloading the new operator. This article examines the behavior of the new operator in a Windows-based program. It provides an overview of new, discusses whether you should overload new, examines the C++ ambient memory model, and discusses large-model C++ programming and dynamic-link library (DLL) ownership issues. Two sample applications, newopr and owner, illustrate the concepts in this technical article. A bibliography of suggested reading material is included at the end of the article.

This section provides an overview of the new operator, the _fmalloc function, and the _nmalloc function.


The new operator calls malloc directly. In small or medium model, it calls the near version of malloc, which is _nmalloc. In large model, it calls _fmalloc. Alarms are probably ringing in the heads of experienced programmers: in the past, Microsoft recommended against using malloc because it was incompatible with Windows real mode. In C/C++ version 7.0, malloc is designed for Windows protected-mode programming, and real mode is no longer a concern because Microsoft® Windows™ version 3.1 does not support it. In most cases, calling _fmalloc is now better than calling GlobalAlloc directly.

_fmalloc is better than GlobalAlloc because of subsegment allocation. Instead of calling GlobalAlloc directly for each memory request, _fmalloc tries to satisfy as many requests as possible with only one GlobalAlloc call, and uses GlobalReAlloc to increase the size of a segment. Reducing the calls to GlobalAlloc cuts down on the overhead, time, and selectors required by an application. Reducing selectors is particularly important for C++ programs. Most programs allocate lots of small objects on the heap. If new called GlobalAlloc directly, each small object would use a selector, and the program would reach the system limit of 8192 selectors (4096 in standard mode) too quickly.

Although _fmalloc is fine and dandy, _nmalloc is not nearly (no pun intended) as sophisticated. _nmalloc allocates fixed memory with LocalAlloc directly, which may fragment the local heap. It performs no subsegment allocation, and the local heap must share a maximum of 64K with the stack and static data.

Here's another gotcha: _nmalloc is the default for the new operator in the small and medium models. Because the local heap shares the default data segment with the static data and the stack, a lot of things compete for only 64K of space, and it is rather easy to run out of memory in the local heap. For example, a simple phone book that requires 200 bytes of data per entry could store fewer than 330 names (65,536 / 200 is about 327), and fewer still once the stack and static data take their share.

Heap Walker can help you determine where memory is being allocated. Memory allocated with LocalAlloc (through _nmalloc) expands the segment labeled DGroup. Memory allocated with GlobalAlloc (through _fmalloc) is labeled as a private segment. For more information on _fmalloc, see the "Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for Windows" technical article on the Microsoft Developer Network CD (Technical Articles, C/C++ Articles).
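The phone-book arithmetic can be made explicit. The sketch below assumes a 64K default data segment shared by static data, the stack, and the local heap, and it ignores LocalAlloc's per-block overhead, so it overstates real capacity. The function maxLocalHeapEntries and the sample sizes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Toy arithmetic for the phone-book example; not a Windows API.
// Assumptions: DGROUP is exactly 64K and is shared by static data, the
// stack, and the local heap; LocalAlloc's per-block overhead is ignored,
// so the real capacity is somewhat lower than what this returns.
constexpr std::size_t kDGroupBytes = 64 * 1024;  // 65,536 bytes

// How many fixed-size entries fit in the space left for the local heap.
std::size_t maxLocalHeapEntries(std::size_t staticBytes,
                                std::size_t stackBytes,
                                std::size_t entryBytes) {
    assert(entryBytes > 0);
    const std::size_t reserved = staticBytes + stackBytes;
    if (reserved >= kDGroupBytes) return 0;          // no room left for a heap
    return (kDGroupBytes - reserved) / entryBytes;   // whole entries only
}

// Usage (values are hypothetical):
//   maxLocalHeapEntries(0, 0, 200)        -> 327, the impossible best case
//   maxLocalHeapEntries(4096, 8192, 200)  -> 266 with 4K statics, 8K stack
```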

Overloading the new Operator
Many developers want to overload the new operator as soon as they learn that new calls _nmalloc. You can overload the new operator to perform specialized memory management, but overloading new to call _fmalloc instead of _nmalloc will not work. The new operator has four versions. In this article, we are concerned only with the following two:
void __near *operator new( size_t size );
void __far *operator new( size_t size );

In small and medium models, the compiler calls the near version of the new operator, and this version then calls _nmalloc. If we try to overload this function by calling _fmalloc, we would get a far-to-near pointer conversion error:
void __near *operator new( size_t size )
{
    return _fmalloc(size); // ERROR: Lost segment in far/near conversion.
}

A memory management scheme that overloads the near version of the new operator can return only near pointers, so using GlobalAlloc or GlobalAllocPtr will not work either. Overloading the new operator to call _fmalloc instead of _nmalloc is obviously not the answer.
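To see why a near function cannot hand back a far address, it helps to model the two pointer kinds. The sketch below is a toy in portable C++, not 16-bit compiler code: FarPtr, NearPtr, toFar, and toNear are hypothetical names, and a near pointer is modeled as a bare 16-bit offset that is implicitly relative to the data segment (DS). Widening a near pointer is always safe because DS supplies the selector; narrowing a far pointer only works when it already points into DS, which memory obtained from _fmalloc or GlobalAlloc generally does not.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model of 16-bit segmented addressing (hypothetical types).
struct FarPtr {
    std::uint16_t selector;   // which segment
    std::uint16_t offset;     // position inside that segment
};
using NearPtr = std::uint16_t;   // offset only; DS is implied

// Near -> far always works: the missing selector is simply DS.
FarPtr toFar(NearPtr p, std::uint16_t ds) { return FarPtr{ds, p}; }

// Far -> near keeps the address only if the far pointer already points
// into DS; otherwise the selector would be lost, so report failure.
std::optional<NearPtr> toNear(FarPtr p, std::uint16_t ds) {
    if (p.selector != ds) return std::nullopt;   // "lost segment" case
    return p.offset;
}

// Usage (selector values are hypothetical):
//   assert(!toNear(FarPtr{0x2000, 0x0010}, 0x1000).has_value());
```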

Ambient Memory Models
Asking the proper question will lead to a useful solution. The proper question is: "How do I get the far version of the new operator compiled in my code?" There are three ways to do this:
• Specify an ambient memory model.
• Override the ambient memory model.
• Use the large memory model.

The following sections describe each method in turn.

Specifying an Ambient Memory Model
You can think of the ambient memory model as the default memory model. Normally, the ambient memory model of a class is identical to the data model you specify at compilation time. If the data model is near (for example, in small or medium models), the ambient memory model is near. You can specify the ambient memory model for a class explicitly by using __near or __far; for example:
class __far CFoo { };

Using the new operator on the CFoo class, as defined above, allocates the CFoo object on the global heap using _fmalloc.

Note: The ambient memory model of a class must be identical to the memory model of all of its base classes. For example, if your class inherits from a Microsoft Foundation class, your class must have the same memory model as the Foundation class. If you use the small or medium memory model, the ambient memory model of a Foundation class is near. We discuss the large model in the "Large-Model Programs" section.

Overriding the Ambient Memory Model
You can override the ambient memory model on a per-object-instance basis:
class CBar { };

void main()
{
    CBar __far *pBar = new __far CBar;
}

At first glance, the code above looks very straightforward. However, nonstatic member functions have a hidden parameter called the this pointer, through which an object instance references its data. If the member function is near, it expects the this pointer to be near. Passing a far this pointer results in an error because a far pointer cannot be converted to a near pointer. The following code generates an error because the compiler cannot find a default constructor that accepts a far this pointer:
class CBar {
public:
    CBar();
};

CBar::CBar() { }

void main()
{
    CBar __far *pBar = new __far CBar;   // ERROR C2512: 'CBar': An appropriate
                                         // default constructor is not available.
}


To compile the code above, you must overload the constructor on the addressing type of the this pointer. This results in the following correct code:
class CBar {
public:
    CBar();
    CBar() __far;   // Overload the constructor to take far this pointers.
};

CBar::CBar() { }

// Overloaded constructor.
CBar::CBar() __far { }

void main()
{
    CBar __far *pBar = new __far CBar;
}

Only member functions that are actually called through a far pointer need far versions:
class CBar {
private:
    int value;
    void buildIt() __far { value = 0; }   // Must be far: CBar() __far calls it.
public:
    CBar();
    CBar() __far;   // Overload the constructor to take far this pointers.
    // inc is called through a far pointer.
    void inc() __far { value++; }
    // dec is not called through a far pointer.
    void dec() { value--; }
};

CBar::CBar() { buildIt(); }

// Overloaded constructor.
CBar::CBar() __far { buildIt(); }

void main()
{
    CBar *npBar = new CBar;              // Allocated in default data segment.
    CBar __far *pBar = new __far CBar;   // Allocated in global heap.
    pBar->inc();    // Far addressing.
    npBar->dec();   // Near addressing.
    npBar->inc();   // Converts near pointer to a far pointer.
}

Confusion Is Nothing New
The use of the __far modifier can make programs very difficult to understand and debug. For example, let's assume that the following code is compiled in the small or medium memory models:
class __far CFoo {
public:
    CFoo();
    ~CFoo();
    ...
};

class CBar {
public:
    CBar();
    ~CBar();
    ...
};

CFoo aFoo;            // Allocated in a far data segment.
CBar aBar;            // Allocated in default data segment.
CFoo *pFoo;           // Far pointer.
CBar *pBar;           // Near pointer.
CFoo __near *npFoo;   // Near pointer.
CBar __far *fpBar;    // Far pointer.

void main()
{
    CFoo anotherFoo;           // Allocated on stack (default data segment).
    pFoo = new CFoo;           // Allocated in global heap.
    pBar = new CBar;           // Allocated in default data segment.

    fpBar = new __far CBar;    // Error: No appropriate default constructor.
    npFoo = new CFoo;          // Error: Cannot convert from a far pointer to a near pointer.
    npFoo = new __near CFoo;   // Error: Cannot convert from a far class to a near class.
    npFoo = &aFoo;             // Error: Cannot convert from a far pointer to a near pointer.
}

You can see how complex an application can get when it mixes near objects and far objects. Again, Heap Walker can help you determine whether memory is being allocated in the default data segment or in the global heap. For additional information on the new operator, see Chapter 5 of the Microsoft C/C++ version 7.0 Programming Techniques manual on the Microsoft Developer Network CD.

Large-Model Programs
As we discussed in the previous section, mixing near and far addressing is even more of a nightmare in C++ than it is in C and can offset many C++ benefits such as ease of maintenance and readability. The solution is to use the large model. Although the large model has not been recommended in the past, the combination of Microsoft C/C++ version 7.0 and Windows version 3.1 now makes large model the memory model of choice. When a C or C++ program is compiled with the large memory model, malloc is mapped to its model-independent or far version known as _fmalloc. Because the new operator calls malloc, heap objects are allocated in global memory. The two issues associated with using the large model involve speed and creating multiple instances. The time you save by not worrying whether an object is near or far can be used to run a profiler and to optimize the application, thus compensating for any speed losses caused by the large model.

Multiple Instances
The new /Gx option in C/C++ version 7.0 simplifies the creation of multiple-instance, large-model applications. Make sure to use the following compiler options:

/Gt65500 /Gx

Programs with multiple read/write data segments cannot have multiple instances. By default, the Microsoft C compiler places initialized and uninitialized static data in two separate segments, and it places each static object that is 32,767 bytes or larger into its own segment. The /Gx and /Gt options override this behavior.

The /Gx option forces all initialized and uninitialized static data into the same segment. The /Gt[n] option places any object larger than n bytes in a new segment. (n is optional, as indicated by the square brackets.) If n is not specified, it defaults to 256 bytes. If n is large (for example, 65,500 bytes), most objects remain in the default data segment.

Because a multiple-instance application can have only one read/write data segment, the application is limited to 64K for all statics, the local heap, and the stack. However, C++ promotes the use of the heap through the new operator, which in the large model allocates memory from the global heap instead of the local heap, so the 64K local heap limit should not be a problem. Moreover, multiple-instance, small-model and medium-model applications also have only one read/write data segment.

Warning: A bug in Microsoft C/C++ version 7.0 causes the compiler to place uninitialized global instances of classes and structures in a far data segment (FAR_DATA) when the /Gx option is used, resulting in two data segments. For this reason, you must declare global class objects and structures as near. To illustrate, most Microsoft Foundation Class Library programs have a global object declared as follows:
CTheApp theApp;

To get multiple instances of this program, you must change the line to:
CTheApp __near theApp;

We recommend that you use the NEAR define:
CTheApp NEAR theApp;

The EXEHDR utility reports how many data segments a program contains. In the sample EXEHDR output below, the segment table lists two DATA segments: segment 4, which is DGROUP, and segment 3, an extra data segment.
Microsoft (R) EXE File Header Utility  Version 3.00
Copyright (C) Microsoft Corp 1985-1992.  All rights reserved.

Module:                   NEWOPR
Description:              newopr - demonstrates new operator in medium v. large model
Data:                     NONSHARED
Initial CS:IP:            seg   1 offset e392
Initial SS:SP:            seg   4 offset 0000
Extra stack allocation:   1000 bytes
DGROUP:                   seg   4
Heap allocation:          0400 bytes
Application type:         WINDOWAPI
Runs in protected mode only

no. type address  file  mem   flags
  1 CODE 00000600 0ff8f 0ff8f PRELOAD, (movable), (discardable)
  2 CODE 00010a00 013b0 013b1 PRELOAD, (movable), (discardable)
  3 DATA 00000000 00000 00038 PRELOAD, (movable)
  4 DATA 00012000 0159f 01fee PRELOAD, (movable)

Exports:
ord seg offset name
  1   1   e358 _AFX_VERSION exported
  2   1   f718 ___EXPORTEDSTUB exported

The MAP file helps determine which data is placed in the FAR_DATA segment instead of the default data segment. To get a MAP file, specify a MAP filename and the /MAP option on the link line. In the abridged MAP file below, the publics to look at are ?spaceholder and ?theApp at 0003:0000 and 0003:000E: they lie in segment 3, not in DGROUP (segment 4).
 Start     Length     Name                   Class
 0001:0000 004CFH     NEWOPR_TEXT            CODE
 . . .
 0001:E378 01C17H
 0002:0000 013B1H
 0003:0000 00038H
 0004:0000 00010H
 0004:0010 011E0H
 0004:11F0 00000H
 . . .
 0004:140E 0000CH
 0004:141A 00004H
 0004:141E 00008H
 0004:1426 00163H
 0004:1589 00015H
 0004:159E 00001H
 0004:15A0 009B6H
 . . .

 Origin   Group
 0004:0   DGROUP

 Address         Export                  Alias
 0001:E358       _AFX_VERSION            _AFX_VERSION
 0001:F718       ___ExportedStub         ___ExportedStub

 Address         Publics by Name
 . . .
 0003:0000       ?spaceholder@@3VCObArray@@E
 0001:7BF2       ?Store@CRuntimeClass@@RECXAEVCArchive@@@Z
 0002:01F0       ?TextOut@CDC@@RECHHHPFDH@Z
 0003:000E       ?theApp@@3VCTheApp@@E
 . . .

 Address         Publics by Value
 . . .
 0002:1380       ?GetStartPosition@CMapPtrToWord@@RFCPEXXZ
 0003:0000       ?spaceholder@@3VCObArray@@E
 0003:000E       ?theApp@@3VCTheApp@@E
 0004:0004       rsrvptrs
 . . .
 0004:1FEE       __end
 0004:1FEE       _end

 Program entry point at 0001:E392

Multiple-Instance, Large-Model Foundation Class Programs
Multiple-instance, large-model programs that use the Microsoft Foundation classes must build special versions of the Microsoft Foundation Class Library using the /Gt and /Gx options. Use the following command line:

nmake MODEL=L TARGET=W DEBUG=1 OPT="/Gt65500 /Gx"

Warning: This variant of the Microsoft Foundation Class Library has not been tested extensively by Microsoft.

For additional information on using large-model programs with Windows, see the "Programming at Large" technical article on the Microsoft Developer Network CD (Technical Articles, C/C++ Articles).

The NEWOPR Sample
newopr is a rather simple application that demonstrates some of the issues presented in this technical article. newopr tries to allocate 128 blocks of memory, 1024 bytes per block. When newopr is compiled as a medium-model program, it cannot allocate 128 blocks because it runs out of memory in the default data segment. In fact, the Microsoft Foundation Class Library raises an exception when the new operator fails, and newopr handles this exception gracefully.
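newopr's outcome follows from simple arithmetic: 128 blocks of 1024 bytes is 131,072 bytes, twice the entire 64K default data segment. The sketch below models both heaps with a hypothetical HeapModel class that throws std::bad_alloc when it runs out, standing in for the Foundation Class Library's exception. The capacities are assumptions, not measurements of newopr.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Toy heap with a fixed capacity (hypothetical, not the real local or
// global heap). It throws std::bad_alloc on exhaustion, echoing how the
// Foundation Class Library reports a failed new with an exception.
class HeapModel {
public:
    explicit HeapModel(std::size_t capacity) : capacity_(capacity) {}
    void allocate(std::size_t bytes) {
        if (bytes > capacity_ - used_) throw std::bad_alloc();
        used_ += bytes;
    }
private:
    std::size_t capacity_;
    std::size_t used_ = 0;
};

// Tries to allocate `blocks` blocks of `blockBytes` bytes each and returns
// how many succeeded, handling the failure gracefully as newopr does.
std::size_t allocateBlocks(HeapModel& heap, std::size_t blocks,
                           std::size_t blockBytes) {
    std::size_t done = 0;
    try {
        for (; done < blocks; ++done) heap.allocate(blockBytes);
    } catch (const std::bad_alloc&) {
        // Out of memory: stop and report the partial count.
    }
    return done;
}

// Usage (capacities are hypothetical):
//   HeapModel local(64 * 1024);      // at most the whole 64K DGROUP
//   assert(allocateBlocks(local, 128, 1024) == 64);
//   HeapModel global(1024 * 1024);   // plenty of global memory
//   assert(allocateBlocks(global, 128, 1024) == 128);
```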

When newopr is compiled as a large-model program, it can allocate all 128 blocks because the memory is allocated from the global heap instead of the local heap. The best way to use newopr is to compile it as a medium-model program, run it, and examine the heap with Heap Walker. Then run NMAKE with the CLEAN option, compile the large-model version, run it, and re-examine the heap with Heap Walker.

The following parameters control how newopr gets built:

DEBUG=[0|1]   Setting of 1 enables debugging information.
LARGE=[0|1]   Setting of 1 compiles newopr as a large-model program.
MINST=[0|1]   Setting of 1 compiles with the /Gt and /Gx options to allow multiple instances. LARGE must be set to 1.
CLEAN         Deletes .exe, .res, and .obj files.

Sample nmake command lines are shown below:

Command line              Makes
nmake                     Medium-model version.
nmake DEBUG=1             Medium-model debug version.
nmake LARGE=1             Large-model version.
nmake MINST=1             Medium-model version. MINST is ignored.
nmake LARGE=1 MINST=1     Multi-instance, large-model version. The Foundation class large-model library must be compiled with /Gx and /Gt for this to work. Adding DEBUG=1 builds the same version with debugging enabled.


DLLs and Memory Ownership
As discussed in the "Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for Windows" technical article on the Microsoft Developer Network CD, _fmalloc called from a DLL behaves differently than _fmalloc called from an application. If you call _fmalloc from a DLL, it calls GlobalAlloc with the GMEM_SHARE flag, which changes the ownership of the allocated memory from the calling application to the DLL.

Ownership determines when the system will clean up the memory:
• If the application owns the memory, exiting the application releases the memory.
• If the DLL owns the memory, unloading the DLL from memory releases the memory. If multiple applications or multiple instances of an application use a DLL, the DLL is unloaded only after all applications that use it are unloaded.

The key point here is that memory owned by a DLL (for example, memory allocated with GMEM_SHARE) can exist even after your application exits. The Smart Alloc sample application, which accompanies "Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for Windows," illustrates this issue.
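The two cleanup rules can be captured in a small model. In the sketch below, OwnershipModel is hypothetical: it tracks bytes per owner, treats the name "DLL" as the single shared library and every other name as an application that uses it, frees an application's memory when it exits, and frees the DLL's memory only when the last user exits. It ignores everything else Windows does at unload time.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of Windows 3.x ownership cleanup. "DLL" names the one
// shared library; any other owner name is an application that uses it.
class OwnershipModel {
public:
    // An application starts and loads the DLL (one more user).
    void startApp(const std::string& app) { dllUsers_.insert(app); }

    // Records `bytes` allocated on behalf of `owner` ("DLL" or an app).
    void allocate(const std::string& owner, long bytes) { owned_[owner] += bytes; }

    // Exiting an application frees the memory it owns; when the last user
    // exits, the DLL is unloaded and the memory it owns is freed as well.
    void exitApp(const std::string& app) {
        owned_.erase(app);
        if (dllUsers_.erase(app) == 1 && dllUsers_.empty())
            owned_.erase("DLL");
    }

    long bytesOwnedBy(const std::string& owner) const {
        auto it = owned_.find(owner);
        return it == owned_.end() ? 0 : it->second;
    }

private:
    std::set<std::string> dllUsers_;      // applications using the DLL
    std::map<std::string, long> owned_;   // bytes still owned, per owner
};

// Usage: after startApp("A"), startApp("B"), allocate("DLL", 4096) and
// exitApp("A"), the DLL's 4096 bytes survive until exitApp("B").
```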

Ambiguous Memory Ownership
A DLL owns the memory allocated as GMEM_SHARE from within the DLL (in C++ or C). A DLL also owns the memory allocated by new in the DLL. Determining when and where memory is allocated can become very confusing in C++. The code samples below are from the owner sample application and its associated OWNERDLL.DLL. The DLL contains the following class:
class __export CContainedClass {
public:
    char aMessage[1024];
};

class __export CFooInDLL {
public:
    CFooInDLL();
    void yourString();
    void myString();
    CContainedClass aContainedClass;
    char aBuffer[1024];
    char *aString;
};

CFooInDLL::CFooInDLL()
{
    aString = new char[1024];
}

/////// INLINE FUNCTION ////////
inline void CFooInDLL::yourString()
{
    if (aString) delete [] aString;   // Array form matches new char[1024].
    aString = new char[1024];
}

/////// OUTLINE FUNCTION ///////
void CFooInDLL::myString()
{
    if (aString) delete [] aString;
    aString = new char[1024];
}

The .EXE for the program contains the following code fragment:
// Code in .EXE
CFooInDLL aFoo;

void somefunc()
{
    aFoo.yourString();   // Now application owns aString.
    aFoo.myString();     // Now DLL owns aString.
    aFoo.yourString();   // Now application owns aString.
}

Given these code fragments (where the object is defined in a DLL and declared in an application), the following rules apply:

• The application owns the memory for objects declared in the application. Therefore, the application owns the memory for aFoo.
• Space for an object and its contained objects is allocated where the object is declared. Therefore, during the construction of the aFoo object, memory for aContainedClass is allocated, and aContainedClass is also located in the application's memory space.
• The module (application or DLL) that executes the new operator owns the memory for the object (see figure below).
  • The CFooInDLL constructor calls the new operator to allocate space for aString; therefore, the DLL owns the memory for aString.
  • yourString is an inline function and executes inside the application; therefore, the application owns the memory allocated by yourString.
  • myString executes inside the DLL; therefore, the DLL owns memory allocated by myString.

The debug versions of the Foundation classes track the allocation of memory. An assertion in the Microsoft Foundation Class Library source file MEMORY.CPP fails when yourString tries to free memory allocated by the DLL. As a result, the retail versions of owner and OWNERDLL run fine, but the debug versions fail.
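The debug-build failure comes from bookkeeping of roughly this kind: every block remembers which module allocated it, and a free from the other module is flagged. The sketch below is a hypothetical OwnerCheckingHeap, not the Foundation Class Library's MEMORY.CPP code; block IDs stand in for pointers, and a mismatch returns false where a debug build would assert.

```cpp
#include <cassert>
#include <map>

// Which module executed the allocation or the free.
enum class Module { Application, Dll };

// Hypothetical owner-tracking heap: each block is tagged with the module
// that allocated it, and frees from any other module are rejected.
class OwnerCheckingHeap {
public:
    int allocate(Module owner) {
        const int id = nextId_++;
        owners_[id] = owner;
        return id;   // stands in for a pointer
    }

    // Returns false (where a debug build would assert) if `caller` did
    // not allocate the block; otherwise frees it and returns true.
    bool release(int id, Module caller) {
        auto it = owners_.find(id);
        if (it == owners_.end() || it->second != caller) return false;
        owners_.erase(it);
        return true;
    }

private:
    int nextId_ = 1;
    std::map<int, Module> owners_;
};

// Usage, replaying somefunc: the constructor (in the DLL) allocates
// aString, then the inline yourString (in the application) frees it:
//   OwnerCheckingHeap heap;
//   int s = heap.allocate(Module::Dll);
//   assert(!heap.release(s, Module::Application));   // ownership mismatch
```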

In most cases, it is best to design classes exported from a DLL so that memory ownership will not bounce between the application and the DLL. Using the debug versions of the Foundation class libraries helps track this problem.

Figure. Memory ownership for CFooInDLL object

The problem of determining memory ownership is just one more reason not to export C++ class interfaces from a DLL. In most cases, it is much better to export a C interface from a DLL.

There is no need to overload the new operator to make it compatible with the Windows environment. The new operator calls malloc. The model-independent version of malloc, _fmalloc, is designed to manage subsegment allocation under Windows. However, in the small and medium memory models, malloc calls _nmalloc instead of _fmalloc, and _nmalloc allocates memory through LocalAlloc. The best way to get the new operator to call _fmalloc is to use the large memory model. The ambient memory model can be specified for a class or overridden for a class instance, but both of these methods can quickly lead to confusing and complex code.

The following technical articles on the Microsoft Developer Network CD (Technical Articles, C/C++ Articles) are good sources of information on memory management in C++:

• "Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for Windows"
• "Programming at Large"
• "Exporting with Class"

We also recommend the Microsoft C/C++ version 7.0 Programming Techniques manual, also available on the Microsoft Developer Network CD. Chapter 5 of this manual discusses memory management in C++.

Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for Windows
Dale Rogerson
Microsoft Developer Network Technology Group

Created: July 10, 1992

The Smart Alloc sample application accompanies this technical article.

One of the most shocking things that a first-time programmer for Windows has to learn is not to use malloc but to use special Microsoft® Windows™ memory allocation functions such as GlobalAlloc, GlobalReAlloc, GlobalLock, GlobalUnlock, and GlobalFree. The reasons for requiring special memory allocation functions have mostly gone away with the demise of real mode. In fact, Microsoft C/C++ version 7.0 brings us almost full circle, because the preferred method for memory allocation is the large-model version of malloc, or _fmalloc. Even the C startup code now uses malloc to allocate space for the environment. This article discusses the behavior of the malloc supplied with Microsoft C/C++ version 7.0. The article focuses on programming for the protected modes (standard and enhanced) of Microsoft Windows version 3.1. The following topics are discussed:
• _nmalloc: Why _fmalloc is not the same
• History: Why _fmalloc was bad
• Subsegment Allocation: Why _fmalloc is good
• _ffree: Why _fmalloc is not perfect
• DLLs: Why _fmalloc may not do what you want
• Versatility: Why _fmalloc is not for everything

The information for this article was gleaned from the C/C++ version 7.0 compiler runtime library source code.

To interactively explore the behavior of _fmalloc, the Smart Alloc (SMART.EXE) sample application is provided. Smart Alloc is best used in conjunction with Heap Walker, which shows the exact state of the global segments allocated. Segments allocated with GlobalAlloc (or _fmalloc) are listed by Heap Walker as having a type designation of "Private." Smart Alloc has a dynamic-link library (DLL) that intercepts all calls to GlobalAlloc, GlobalFree, and GlobalReAlloc made by Smart Alloc or the C run-time library and prints messages with OutputDebugString to the debugging console. It is usually most convenient to use DBWIN.EXE to view these messages.

_nmalloc: Why _fmalloc Is Not the Same
When compiling with the large data model libraries (compact-, large-, and huge-model programs), malloc is automatically mapped to _fmalloc. In other memory models, the programmer must explicitly call _fmalloc, because malloc maps to _nmalloc in these memory models. _nmalloc functions differently from _fmalloc. _nmalloc directly maps to LocalAlloc with the LMEM_NODISCARD | LMEM_FIXED flags. _nfree directly calls LocalFree. Because _nmalloc allocates fixed memory blocks, it can lead to fragmentation of the local heap.
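Why fixed blocks fragment can be shown with a toy heap. The FixedHeap class below is hypothetical: it does first-fit allocation in a single arena, never moves a block (as with LMEM_FIXED), and can merge a freed block only with free neighbors. It ignores LocalAlloc's real overhead and layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap of FIXED blocks: blocks never move, so free space
// separated by live blocks can never be combined into one larger hole.
class FixedHeap {
public:
    explicit FixedHeap(std::size_t bytes) { spans_.push_back(Span{0, bytes, true}); }

    // First fit; returns the block's offset, or -1 if no free span is big enough.
    long allocate(std::size_t bytes) {
        for (std::size_t i = 0; i < spans_.size(); ++i) {
            const Span s = spans_[i];
            if (!s.isFree || s.size < bytes) continue;
            spans_[i] = Span{s.offset, bytes, false};
            if (s.size > bytes)   // keep the unused tail as a free span
                spans_.insert(spans_.begin() + static_cast<std::ptrdiff_t>(i) + 1,
                              Span{s.offset + bytes, s.size - bytes, true});
            return static_cast<long>(s.offset);
        }
        return -1;
    }

    // Frees the block at `offset`; it can merge only with adjacent free spans.
    void release(std::size_t offset) {
        std::size_t i = 0;
        while (i < spans_.size() && spans_[i].offset != offset) ++i;
        assert(i < spans_.size() && !spans_[i].isFree);
        spans_[i].isFree = true;
        if (i + 1 < spans_.size() && spans_[i + 1].isFree) {   // merge with next
            spans_[i].size += spans_[i + 1].size;
            spans_.erase(spans_.begin() + static_cast<std::ptrdiff_t>(i) + 1);
        }
        if (i > 0 && spans_[i - 1].isFree) {                    // merge with previous
            spans_[i - 1].size += spans_[i].size;
            spans_.erase(spans_.begin() + static_cast<std::ptrdiff_t>(i));
        }
    }

    std::size_t totalFree() const {
        std::size_t n = 0;
        for (const Span& s : spans_) if (s.isFree) n += s.size;
        return n;
    }

private:
    struct Span { std::size_t offset, size; bool isFree; };
    std::vector<Span> spans_;   // contiguous spans in address order
};
```

For example, fill a 1,000-byte FixedHeap with ten 100-byte blocks and free every other one: 500 bytes are free, yet a 200-byte request fails because no two free spans are adjacent.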

History: Why _fmalloc Was Bad
Before Microsoft® Windows™ version 3.1, programmers had to worry about compatibility with Windows real mode, which required locking and unlocking memory handles to support movable memory. A locked block in real mode is fixed in memory, and leaving blocks locked would degrade performance. The way _fmalloc is defined meant that an allocated block would have to be locked throughout its lifetime. When Microsoft C version 6.0 was released, real mode was the only mode in Windows; therefore, _fmalloc was designed to work under real mode.

Microsoft C/C++ version 7.0 was designed to develop protected-mode applications for Windows. In protected mode, there is no penalty for locking a memory handle and leaving it locked. It is not even necessary to retain the handle returned from GlobalAlloc, because the GlobalHandle function recovers the handle from the selector returned by GlobalLock. Macros defined in WINDOWSX.H simplify the process of getting a pointer to a block of memory: the GlobalAllocPtr and GlobalFreePtr macros automatically lock and unlock a memory block. Microsoft C/C++ version 7.0 takes advantage of the new freedom allowed by protected mode: _fmalloc can now leave memory blocks locked with no penalty under the two protected modes of Windows version 3.x.

Subsegment Allocation: Why _fmalloc Is Good

One of the current limitations of Windows version 3.x is the systemwide limit of 8192 selectors (4096 for standard mode). Each call to GlobalAlloc uses one selector and has an overhead of 32 bytes, which makes GlobalAlloc inappropriate for allocating many small blocks of memory. For example, take a flat file database that reads a list of names and addresses from the hard disk and puts them in a binary tree. If GlobalAlloc is called for each name and each address, this program would not be able to store more than 4096 names, and many companies have more than 4096 employees. In fact, the actual number of available selectors is far less than 8192, because all Windows-based applications and libraries must share the same pool of selectors.

_fmalloc makes much more intelligent use of selectors. Instead of allocating a new segment for each memory request, _fmalloc tries to satisfy as many requests as possible using a single segment. _fmalloc expands the segment as needed and returns pointers to areas of memory within the segment. This process of managing memory within a segment is called subsegment allocation.

In the first call, _fmalloc allocates a segment with GlobalAlloc using GMEM_MOVEABLE. (GMEM_SHARE, also set when compiling dynamic-link libraries [DLLs], will be examined in the section on DLLs.) The block allocated by _fmalloc is, therefore, not fixed in memory; it is movable, but the selector associated with it will not change. However, because malloc returns a pointer to a location within the segment, the pointer will not have an offset of zero (selector:0), as a pointer to a GlobalAlloc block does. In the next call, _fmalloc first tries to satisfy the request without allocating any memory. If this is not possible, it attempts a GlobalReAlloc instead of a GlobalAlloc. This reduces the number of selectors used by the program.
If the segment size must grow larger than the _HEAP_MAXREQ constant defined in malloc.h to meet the allocation request, GlobalAlloc is called again. _HEAP_MAXREQ is defined to be 0x0FFE6 or 65,510 bytes. This leaves enough room for the overhead needed to manage the heap and not have any memory crossing a segment boundary. If more than _HEAP_MAXREQ memory is requested, the _fmalloc call returns a null pointer. Figures 1 and 2 illustrate the behavior of _fmalloc.
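The selector savings can be estimated with a toy model. The SubsegmentHeap class below is hypothetical, not the run-time library's code: each segment stands for one selector and grows until it would pass _HEAP_MAXREQ, and per-block overhead, alignment, and freeing are all ignored. Under those assumptions, 10,000 allocations of 200 bytes fit in 31 segments (327 blocks per segment), whereas one GlobalAlloc per object would need 10,000 selectors, more than the 8192 the whole system has.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of _fmalloc-style subsegment allocation (hypothetical class).
// Assumptions: each segment (one selector) grows until it would exceed
// _HEAP_MAXREQ; overhead, alignment, and freeing are not modeled.
constexpr std::size_t kHeapMaxReq = 0xFFE6;   // _HEAP_MAXREQ = 65,510 bytes

class SubsegmentHeap {
public:
    // Returns false when the request can never be met; the real _fmalloc
    // returns a null pointer for requests larger than _HEAP_MAXREQ.
    bool allocate(std::size_t bytes) {
        if (bytes > kHeapMaxReq) return false;
        for (std::size_t& used : segments_) {
            if (used + bytes <= kHeapMaxReq) {   // fits: reuse or grow this segment
                used += bytes;
                return true;
            }
        }
        segments_.push_back(bytes);              // needs a new segment (GlobalAlloc)
        return true;
    }

    std::size_t selectors() const { return segments_.size(); }

private:
    std::vector<std::size_t> segments_;   // bytes in use per segment
};

// Usage:
//   SubsegmentHeap h;
//   for (int i = 0; i < 10000; ++i) h.allocate(200);
//   // h.selectors() == 31
```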

Figure 1. _fmalloc vs. GlobalAlloc

Figure 1 illustrates how _fmalloc satisfies several memory requests with one segment, consuming only one selector, when the requested blocks are smaller than _HEAP_MAXREQ. Each call to GlobalAlloc, on the other hand, uses up a selector.

Figure 2. _fmalloc Subsegment Allocation

Figure 2 shows how _fmalloc allocates a new segment when it cannot satisfy a request with the old segment because the requested block would cause the segment to grow larger than _HEAP_MAXREQ.

Notice that neither GlobalAlloc nor _fmalloc allocates exactly the number of bytes requested. Both functions have some overhead. The current version of _fmalloc requires 22 bytes of overhead on top of the overhead of GlobalAlloc. It also defines the smallest segment size to be 26 bytes. Future versions of _fmalloc may require more or less overhead. _fmalloc also returns a pointer that is guaranteed to be aligned on a double-word boundary.

_fmalloc attempts to be more efficient than GlobalAlloc by allocating memory from Windows in chunks, hoping to satisfy several memory requests while using only one selector and without needing to call GlobalAlloc or GlobalReAlloc again. In some cases, this can lead to faster speeds. The amount of memory that _fmalloc initially allocates to a new segment is rounded up to the nearest 4K boundary. If 4070 bytes (4096 - 26) or fewer are requested, 4K is allocated; if 4071 bytes (4096 - 26 + 1) are requested, 8K is allocated. This behavior differs from the explanation in the Microsoft C/C++ version 7.0 Run-Time Library Reference, which states that the initial requested size for a segment is just enough to satisfy the allocation request.

When _fmalloc can satisfy a request by growing the segment, it calls GlobalReAlloc. The global variable _amblksiz determines the amount by which the segment is grown: _fmalloc grows the segment in enough multiples of _amblksiz to satisfy the request. The default value of _amblksiz is 4K for Windows, instead of the 8K used by MS-DOS®. You can set _amblksiz to any value, but _fmalloc rounds it up to the nearest whole power of two before using it.

The sample application, Smart Alloc (SMART.EXE), can be used to explore the behavior of _fmalloc in detail.
Examine Smart Alloc's Help file for more information on using it. Try allocating 1 byte of memory. _fmalloc calls GlobalAlloc with a size of 4K. Try allocating 4070 bytes and 4071 bytes. Smart Alloc also lets you experiment with different values of _amblksiz. The frugal behavior of _fmalloc makes it suited to allocating bunches of small memory objects. However, as will be shown in the next section, _fmalloc is not suitable for all uses.
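The sizing rules described above reduce to a little rounding arithmetic. In the sketch below, the function names are hypothetical; the 26-byte figure comes from the text's 4096 - 26 boundary, and the results match the 1-, 4070-, and 4071-byte experiments suggested for Smart Alloc. Treat it as an approximation of the documented behavior, not the run-time library's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers restating the sizing rules; constants from the text.
constexpr std::size_t kSegmentOverhead = 26;   // from the 4096 - 26 boundary
constexpr std::size_t kFourK = 4096;

// Initial size of a new segment: the request plus 26 bytes, rounded up to 4K.
std::size_t initialSegmentSize(std::size_t request) {
    const std::size_t needed = request + kSegmentOverhead;
    return (needed + kFourK - 1) / kFourK * kFourK;
}

// _amblksiz is rounded up to the next power of two before it is used.
std::size_t effectiveAmblksiz(std::size_t amblksiz) {
    assert(amblksiz > 0);
    std::size_t p = 1;
    while (p < amblksiz) p <<= 1;
    return p;
}

// Bytes added when a segment must grow by `extra` bytes: enough whole
// multiples of the rounded _amblksiz to cover the request.
std::size_t growthIncrement(std::size_t extra, std::size_t amblksiz) {
    const std::size_t step = effectiveAmblksiz(amblksiz);
    return (extra + step - 1) / step * step;
}

// Usage:
//   initialSegmentSize(1)    -> 4096
//   initialSegmentSize(4070) -> 4096
//   initialSegmentSize(4071) -> 8192
//   effectiveAmblksiz(5000)  -> 8192
```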

_ffree: Why _fmalloc Is Not Perfect

While the subsegment allocation scheme employed by _fmalloc is very good, the behavior of _ffree is not as straightforward as that of GlobalFree. Knowledge of this behavior is very important to avoid wasting large amounts of memory. The following example illustrates the behavior of _ffree.

Note: In Figures 3 through 7, it is possible for Selector 3 to have a lower or higher value than Selector 1. The number indicates the order in which the selectors were allocated.

Figure 3. Freed Segments Are Not GlobalFree'd

In Figure 3, the last block allocated has been freed. However, its memory is not returned to the system.

Figure 4. Freed Blocks Are Not Reallocated

In Figure 4, the first and fourth blocks of memory are freed in addition to Block 5. Again, no memory is returned to Windows with GlobalFree. If _fmalloc returned the memory for the first block to Windows, the pointer to Block 2 would have to change. It would be possible, however, for _fmalloc to GlobalReAlloc the memory associated with Selector 2 and GlobalFree the memory associated with Selector 3. The C/C++ run-time library can do this, as explained in conjunction with Figure 7.

Figure 5. Figure 4 Followed with an _fmalloc(x/2)

In Figure 5, a new block has been allocated. Because this block is half the size of the freed first block, _fmalloc places it in that empty space in Selector 1's segment.

Figure 6. Figure 5 Followed with an _fmalloc(2 * x)

In Figure 6, another block of memory is allocated, this time twice the size of the previous blocks. Because this block is too large to fit into the heap associated with Selector 2, the memory associated with Selector 3 is reallocated to hold it.

Figure 7. Figure 4 Followed by _heapmin

If memory is set up as in Figure 4, calling _heapmin leaves memory in the state shown in Figure 7. _heapmin performs the following actions to achieve this state:

• Memory associated with Selector 1 is GlobalReAlloc'ed to remove the padding.
• Selector 2's memory is GlobalReAlloc'ed to remove the freed block and padding.
• GlobalFree releases Selector 3 and all of its memory.

To recreate the previous examples with Smart Alloc, use 22,000 bytes for the size x. Note that Smart Alloc sorts allocated memory by handle (that is, by selector), not by the order in which it was allocated.

In addition to _heapmin, the C run-time library contains many other functions to help manage the heap created by _fmalloc. Descriptions of these functions are in the Microsoft C/C++ version 7.0 Run-Time Library Reference. Like _heapmin, most of these functions are unique to C/C++ version 7.0 and are not ANSI C compatible. These functions are:

Reallocation functions:
• _fexpand: Expands or shrinks a block of memory without moving its location.
• _frealloc: Reallocates a block to a new size. Might move the block of memory.
• _heapadd: Adds memory to a heap.
• _heapmin: Releases unused memory in a heap.

Information functions:
• _fmsize: Returns the size of an allocated block.
• _fheapwalk: Returns information about each entry in a heap.

Debugging functions:
• _fheapset: Fills free heap entries with a specified value.

All programmers who decide to use _fmalloc must be aware that _ffree does not return memory to the operating system. For example, an application might read an entire text file and display it on the screen. Suppose the application keeps a linked list of lines and calls malloc for each line in the file. If the user selects a large file of about 1 megabyte (MB), the application allocates at least 1 MB of memory. The user then closes the file, and the application faithfully calls _ffree for each line. Even though the application no longer needs the memory, it is still hogging it from the system. This application needs to call _heapmin or one of the other heap management functions. Why doesn't _ffree call GlobalFree? There are two main reasons:

• Speed. It is faster to keep the memory allocated than to repeatedly call GlobalAlloc, GlobalReAlloc, and GlobalFree. _fmalloc calls can be extremely fast when _fmalloc only has to return a pointer to an existing block of memory.
• Pointers. _fmalloc returns pointers to offsets inside a segment. To call GlobalFree and actually release the memory, _fmalloc would have to move the memory these pointers refer to, and it is not possible for _fmalloc or _ffree to update all the pointers into its heap.
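The two reasons above can be seen in a small model. The following C++ sketch is a deliberately simplified, hypothetical stand-in for _fmalloc, _ffree, and _heapmin (SubAllocator, Block, and their members are invented for illustration). Each Segment represents one GlobalAlloc'ed selector, blocks never move, freed blocks go on a free list for reuse, and only heapmin gives segments back, and then only segments with no live blocks. The real _heapmin also shrinks partially used segments with GlobalReAlloc, as described for Figure 7; this sketch does not attempt that, and it never splits free blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A block handed out by the model allocator. Blocks never move.
struct Block {
    std::size_t segment;  // index of the owning segment (one "selector")
    std::size_t offset;   // offset of the block inside the segment
    std::size_t size;     // usable size of the block
};

// Simplified, hypothetical model of the _fmalloc/_ffree/_heapmin policy.
class SubAllocator {
public:
    explicit SubAllocator(std::size_t segmentSize) : segmentSize_(segmentSize) {}

    Block allocate(std::size_t size) {
        assert(size > 0 && size <= segmentSize_);
        // 1. Reuse a previously freed block that is large enough.
        for (std::size_t i = 0; i < freeList_.size(); ++i) {
            if (freeList_[i].size >= size) {
                Block b = freeList_[i];
                freeList_.erase(freeList_.begin() + static_cast<std::ptrdiff_t>(i));
                ++segments_[b.segment].liveBlocks;
                return b;
            }
        }
        // 2. Carve a block from the unused tail of a segment still held.
        for (std::size_t s = 0; s < segments_.size(); ++s) {
            Segment& seg = segments_[s];
            if (!seg.released && segmentSize_ - seg.used >= size) {
                Block b = {s, seg.used, size};
                seg.used += size;
                ++seg.liveBlocks;
                return b;
            }
        }
        // 3. Otherwise "GlobalAlloc" a new segment.
        Segment fresh = {size, 1, false};
        segments_.push_back(fresh);
        Block b = {segments_.size() - 1, 0, size};
        return b;
    }

    // Like _ffree: the block goes on the free list; no segment is released.
    void deallocate(const Block& b) {
        --segments_[b.segment].liveBlocks;
        freeList_.push_back(b);
    }

    // Like a much simplified _heapmin: release segments with no live blocks.
    void heapmin() {
        for (Segment& seg : segments_)
            if (seg.liveBlocks == 0) seg.released = true;
        std::vector<Block> kept;
        for (const Block& b : freeList_)
            if (!segments_[b.segment].released) kept.push_back(b);
        freeList_.swap(kept);
    }

    // Number of segments (selectors) still held from "Windows".
    std::size_t segmentsHeld() const {
        std::size_t n = 0;
        for (const Segment& seg : segments_)
            if (!seg.released) ++n;
        return n;
    }

private:
    struct Segment {
        std::size_t used;        // bytes handed out from the front
        std::size_t liveBlocks;  // blocks not yet freed
        bool released;           // true once returned to "Windows"
    };
    std::size_t segmentSize_;
    std::vector<Segment> segments_;
    std::vector<Block> freeList_;
};
```

For example, with 4096-byte segments, three 3000-byte allocations occupy three segments; freeing all three still leaves three segments held, and only heapmin reduces the count.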

Note All memory (freed and unfreed) is returned to the system as part of the Windows kernel's normal clean-up process when the application exits.

DLLs: Why _fmalloc May Not Do What You Want
As mentioned above, when _fmalloc must allocate a segment, it calls GlobalAlloc. For applications, it allocates the segment as GMEM_MOVEABLE. For DLLs, _fmalloc calls GlobalAlloc with the GMEM_SHARE | GMEM_MOVEABLE flags. _fmalloc maintains only one heap for a DLL, and that heap is shared by all applications that use the DLL. In most cases, programmers do not really want the memory allocated from a DLL to be marked as GMEM_SHARE. The GMEM_SHARE flag tells Windows that the memory is going to be shared by several programs. The most immediate consequence of using GMEM_SHARE in a DLL is that the memory is not released until the DLL is unloaded from memory, and the DLL is not always unloaded when the application that loaded it exits. Because multiple applications, or multiple instances of an application, can use a DLL, the DLL and its memory are not unloaded until all applications using the DLL have exited. Memory is freed at the following times:

• If an application allocates memory and does not free it, Windows frees the memory when the application exits.
• If an application calls a DLL that allocates memory (via GlobalAlloc) without the GMEM_SHARE flag, the memory is owned by the application and is freed when the application exits.
• If an application calls a DLL that allocates memory with the GMEM_SHARE flag, the memory is owned by the DLL, not by the application. The memory is released when the DLL is unloaded, not when the application exits.

If a programmer is not careful, using _fmalloc in a DLL can lead to large pools of allocated but unneeded memory. It is usually best to use the GMEM_SHARE flag only when memory must be shared or must exist for the lifetime of the DLL. This means that, in many cases, a DLL should use GlobalAlloc instead of _fmalloc. Remember, calling _ffree does not generate a call to GlobalFree, so memory can be wasted even if the DLL frees memory before it returns to the application. Refer to the previous section on _ffree for more information.

The situations listed above can be demonstrated with the Smart Alloc sample application. Perform the following steps:

1. Run Heap Walker (HEAPWALK.EXE).
2. Run an instance of Smart Alloc (SMART.EXE).
3. GlobalAlloc 1000 bytes of movable memory from a DLL. (See the Smart Alloc Help file for details on how to do this.)
4. Walk the global heap using Heap Walker and examine the listing. The memory allocated in step 3 should be owned by Smart Alloc. Its size will differ slightly because of the overhead and padding added by GlobalAlloc.
5. GlobalAlloc 2000 bytes of shared memory from a DLL.
6. Walk the global heap using Heap Walker and examine the listing. The memory allocated in step 5 should be owned by SMARTDLL.DLL. Its size will differ slightly because of the overhead and padding added by GlobalAlloc.
7. Run a second instance of Smart Alloc. Do not exit the first instance.
8. GlobalAlloc 3000 bytes of movable memory from a DLL using the second instance of Smart Alloc.
9. GlobalAlloc 4000 bytes of shared memory from a DLL using the second instance of Smart Alloc.
10. Walk the global heap in Heap Walker and examine the listing. The memory allocated in steps 8 and 9 should be owned and allocated like the memory allocated by the first instance in steps 3 and 5. In fact, the memory allocated in step 9 will be allocated in the same segment as the memory allocated for the first instance of Smart Alloc in step 5.
11. Exit the second instance of Smart Alloc.

12. Walk the global heap using Heap Walker and examine the listing. Windows will have freed the 3000-byte block, but the 4000-byte block owned by SMARTDLL.DLL will still exist.

Figures 8 and 9 illustrate the above sequence. Figure 8 shows the state of memory after steps 1 through 10 in the list above.

Figure 8. State of Memory After Step 10

Figure 9 illustrates what is freed after Instance 2 is closed.

Figure 9. State of Memory After Closing Instance 2

Remember that, in a DLL, _fmalloc allocates memory with the GMEM_SHARE option set.
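The ownership rules listed above, and the outcome of the Smart Alloc steps, can be summarized in a small bookkeeping model. The C++ sketch below is hypothetical and does not call Windows: GlobalHeapModel, dllAlloc, appExit, and dllUnload are invented names, sizes ignore GlobalAlloc overhead and padding, and the model does not unload the DLL automatically when its last client exits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One global allocation and the module that owns it.
struct Allocation {
    std::string owner;   // application instance name or DLL name
    std::size_t bytes;
};

// Hypothetical model of who owns memory that a DLL allocates.
class GlobalHeapModel {
public:
    // Without GMEM_SHARE the calling application owns the memory;
    // with GMEM_SHARE the DLL owns it.
    void dllAlloc(const std::string& app, const std::string& dll,
                  std::size_t bytes, bool share) {
        allocations_.push_back(Allocation{share ? dll : app, bytes});
    }

    // Windows frees everything an application owns when it exits.
    void appExit(const std::string& app) { freeOwnedBy(app); }

    // GMEM_SHARE memory lives until the DLL itself is unloaded.
    void dllUnload(const std::string& dll) { freeOwnedBy(dll); }

    std::size_t bytesOwnedBy(const std::string& owner) const {
        std::size_t total = 0;
        for (const Allocation& a : allocations_)
            if (a.owner == owner) total += a.bytes;
        return total;
    }

private:
    void freeOwnedBy(const std::string& owner) {
        std::vector<Allocation> kept;
        for (const Allocation& a : allocations_)
            if (a.owner != owner) kept.push_back(a);
        allocations_.swap(kept);
    }
    std::vector<Allocation> allocations_;
};
```

Replaying steps 3, 5, 8, 9, and 11 (instances "Smart1" and "Smart2", DLL "SMARTDLL") leaves 1000 bytes owned by the first instance and 6000 bytes owned by the DLL after the second instance exits.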

Versatility: Why _fmalloc Is Not for Everything
While the subsegment allocation makes _fmalloc better for general use, it does not provide the same kind of versatility that GlobalAlloc does. Below is a list of some of the things that GlobalAlloc can do that _fmalloc cannot:
• Allocate memory with the GMEM_SHARE flag in an application.
• Allocate nonshared memory from a DLL.
• Allocate more than 64K. GlobalAlloc takes a DWORD size, while _fmalloc takes a size_t, which is a 16-bit unsigned int. (_halloc can also be used to allocate a block of memory larger than 64K.)
• Allocate fixed memory, discardable memory, or memory with the other GMEM_* attributes.

Although most programmers do not think of general protection faults as a positive thing, they can help locate places where a program writes outside a memory block. Because _fmalloc returns a pointer into a segment shared with other blocks, a program can write past the end of a block without writing past the end of the segment, so no fault occurs and the overrun goes unnoticed.
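The following C++ sketch simulates this situation in portable code. A 16-byte array stands in for one segment that a subsegment allocator has divided into two adjacent 8-byte blocks; the array, the block layout, and the function name overrunIntoNeighbor are illustrative assumptions. Every write stays inside the array, so nothing traps, yet the overrun of block A silently changes block B. With one GlobalAlloc'ed selector per block, the same overrun would be more likely to run past the segment limit and cause a general protection fault.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Returns the first 7 characters of block B after block A is overrun.
std::string overrunIntoNeighbor()
{
    char segment[16];                    // one "segment" holding two blocks
    char* blockA = segment;              // block A: bytes 0 through 7
    char* blockB = segment + 8;          // block B: bytes 8 through 15
    std::memcpy(blockB, "BBBBBBB", 8);   // 7 letters plus the terminator

    // The bug being illustrated: 12 bytes copied into an 8-byte block.
    // The copy stays inside 'segment', so nothing traps.
    std::memcpy(blockA, "AAAAAAAAAAAA", 12);

    return std::string(blockB, 7);       // now "AAAABBB", not "BBBBBBB"
}
```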


In most cases, _fmalloc and _ffree utilize system resources better than directly calling GlobalAlloc and GlobalFree. The subsegment allocation scheme used by _fmalloc reduces the number of selectors needed and also reduces the amount of system overhead. While the subsegment allocation scheme is a boon to programmers for Windows, _fmalloc is not without its limitations. The most important one to remember is that memory is not returned to Windows when _ffree is called. Also keep in mind that calling _fmalloc from a DLL allocates memory with the GMEM_SHARE attribute set, which is usually not what is wanted because memory is not freed until the DLL is unloaded.

Calling All Members: Member Functions as Callbacks
Dale Rogerson Microsoft Developer Network Technology Group Created: April 30, 1992 Click to open or copy the files in the CALLB sample application for this technical article.

Microsoft® Windows™ version 3.1 has over 30 callback functions that applications can use to enumerate objects, hook into the hardware, and perform a variety of other activities. Due to the prevalence of callbacks, it is only natural to want to handle callbacks with C++ member functions. However, callbacks are prototyped as C functions and, therefore, do not associate data with operations on that data, making the handling of callbacks less straightforward when you use C++ than it initially might appear. This article explains why normal member functions cannot be used as callback functions, gives several techniques for handling callbacks, and illustrates these techniques with code fragments. The code fragments are included as the CALLB sample program on the Microsoft Developer Network CD. The article and source code are targeted toward Microsoft C/C++ version 7.0, but the ideas presented apply to all C++ compilers, including those by Borland and Zortech. The reader should be familiar with Windows callbacks and with C++. A bibliography is supplied at the end of the article.

The Hidden Parameter, the this Pointer

Every callback function has its own prototype, which determines the parameters that the Microsoft® Windows™ operating system passes to the function. For example, EnumObjects is a Windows function that enumerates objects inside of Windows, such as pens and brushes (these objects should not be confused with C++ objects). EnumObjectsProc is the callback for EnumObjects and is prototyped this way:
int FAR PASCAL __export EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;

Note: CALLBACK can be used in place of FAR PASCAL above.

When Windows calls the EnumObjectsProc function, it passes the two parameters, lpLogObject and lpData, to the function. The following code attempts to set up a member function as a callback. The code compiles and links successfully but causes a protection fault at run time.
// See CProg1.cpp
// Run nmake -fmake1
class CProg1
{
private:
    int nCount ;
    // Incorrect callback declaration
    // Use a static or nonmember function.
    int FAR PASCAL EXPORT EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
public:
    // Constructor
    CProg1() : nCount(0) {};
    // Member function
    void enumIt(CDC& dc) ;
};

void CProg1::enumIt(CDC& dc)
{
    // Register callback
    dc.EnumObjects(OBJ_BRUSH, EnumObjectsProc, NULL) ;
}

// Callback handler
int FAR PASCAL EXPORT CProg1::EnumObjectsProc( LPSTR lpLogObject, LPSTR pData)
{
    // Process the callback.
    nCount++ ;
    MessageBeep(0) ;
    return 1 ;
}

If the Windows ::EnumObjects function is called instead of CDC::EnumObjects, as in this line:

::EnumObjects(hdc, OBJ_BRUSH, (FARPROC)EnumObjectsProc, NULL) ;

the following error would occur:
cprog1.cpp(13) : error C2643: illegal cast from pointer to member

The reason for the above error and protection fault is that C++ member functions have a hidden parameter known as the this pointer. C++ is able to associate a function with a particular instance of an object by means of the this pointer. When C++ compiles the following line:
dc.EnumObjects(OBJ_BRUSH, EnumObjectsProc, NULL) ;

it generates a call equivalent to:
CDC::EnumObjects(OBJ_BRUSH, EnumObjectsProc, NULL, (CDC *)&dc) ;

The last parameter, (CDC *)&dc, is the this pointer. Member functions access an object's data through the this pointer, and C++ handles the this pointer implicitly when accessing member data. In the CProg1::EnumObjectsProc function, the line:
nCount++ ;

is actually compiled this way:
this->nCount++ ;

Windows passes only two parameters to EnumObjectsProc. It does not call functions through objects and cannot send a this pointer to the callback function. However, as compiled above, EnumObjectsProc expects three parameters instead of two. The result is that a random value on the stack is used as the this pointer, causing a crash. To handle EnumObjectsProc as a member function, the compiler must be told not to expect a this pointer as the last parameter.
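The difference between the two kinds of function shows up directly in the type system. In the following standard C++ sketch, CExample, LPSTR_T, EnumProcPtr, and callbackFor are hypothetical names, and EnumProcPtr only stands in for the callback type Windows expects. A pointer to a nonstatic member function does not convert to an ordinary function pointer, which is why the cast above is rejected, while a pointer to a static member function does.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the Windows types.
typedef char* LPSTR_T;
typedef int (*EnumProcPtr)(LPSTR_T lpLogObject, LPSTR_T lpData);

class CExample {
public:
    int nCount = 0;
    // Nonstatic: has a hidden this parameter.
    int memberProc(LPSTR_T, LPSTR_T) { return ++nCount; }
    // Static: no this pointer, so its type is an ordinary function type.
    static int staticProc(LPSTR_T, LPSTR_T) { return 1; }
};

// A pointer to a nonstatic member is not a plain function pointer...
static_assert(!std::is_convertible<decltype(&CExample::memberProc), EnumProcPtr>::value,
              "pointer to member cannot be used as a plain callback");
// ...whereas a static member function converts implicitly.
static_assert(std::is_convertible<decltype(&CExample::staticProc), EnumProcPtr>::value,
              "static member function is an ordinary function");

// Usable wherever a plain callback pointer is expected.
EnumProcPtr callbackFor() { return &CExample::staticProc; }
```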

Avoiding the this Pointer
Two function types in C++ do not have a this pointer:
• Nonmember functions
• Static member functions

Nonmember Functions
A nonmember function is not part of a C++ class and, therefore, does not have a this pointer. A nonmember function does not have access to the private or protected members of a class. However, a nonmember friend function can access the private and protected members of any class that declares it a friend. Using nonmember functions to handle a callback is similar to handling a callback in C.

Static Member Functions
Static member functions are class member functions that do not receive this pointers. As a result:

• An object does not have to be created before a static member function is called or static member data is accessed. Static members can be accessed through the class scope operator (::) without an object.

• A static member function cannot access a nonstatic member of its class without an object instance. In other words, all access must go explicitly through an object, such as:
object.nonStatFunc(someValue); // NOT: nonStatFunc(someValue) ;

or through an object pointer, such as:
ptrObject->nonStatFunc(someValue); // NOT: nonStatFunc(someValue) ;

The last point above is the kicker. Unlike a nonstatic member function, a static member function is not bound to an object. A static function cannot implicitly access nonstatic members. For more information on static member functions, see the bibliography at the end of this article.
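A short sketch can make these rules concrete. The class CCounter below is hypothetical: instanceCount is called through the class scope operator without any object, and the static functions bump and bumpPtr can change nCount only through the object or object pointer passed to them.

```cpp
#include <cassert>

class CCounter {
public:
    static int instances;          // one copy shared by every object
    int nCount = 0;                // one copy per object

    CCounter() { ++instances; }

    // Callable as CCounter::instanceCount() with no object at all.
    static int instanceCount() { return instances; }

    // No this pointer: the object must be named explicitly.
    static void bump(CCounter& object) { object.nCount++; }           // NOT: nCount++ ;
    static void bumpPtr(CCounter* ptrObject) { ptrObject->nCount++; } // NOT: nCount++ ;
};

// Static data members must be defined outside the class.
int CCounter::instances = 0;
```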

Techniques for Handling Callbacks
The rest of this article demonstrates techniques for handling callbacks with static member functions. The main concern is linking the callback routine with a particular object by providing a pointer to the object—kind of a pseudo-this pointer. In other words, our goal is to make a static function act like a nonstatic function. You can use the following techniques to achieve this goal:
• Not providing a pointer
• Providing a pointer in a static member variable
• Passing a pointer in a parameter for application-supplied data
• Keeping a pointer in a collection indexed by a return value

The callback being handled will determine the technique to use. Many callbacks do not have a parameter for application-supplied data, nor do they return a unique value.

Not Providing a Pointer
In some cases, object pointers are unnecessary because the callback does not need to access member data. In these cases, the callback operates only on static data. The following code fragment demonstrates the technique.
// See CProg3.cpp
// Run nmake -fmake3
class CProg3
{
private:
    static int statCount ;
    int nCount ;
    // Use a static member function for callbacks.
    static int FAR PASCAL EXPORT EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
public:
    // Constructor
    CProg3() : nCount(0) {};
    // Member function
    void enumIt(CDC& dc) ;
};

// Static data members must be defined.
int CProg3::statCount = 0 ;

// Enumerate the Windows DC objects.
void CProg3::enumIt(CDC& dc)
{
    // Register callback and start enumerating.
    dc.EnumObjects(OBJ_BRUSH, EnumObjectsProc, NULL) ;
}

// Callback handler
int FAR PASCAL EXPORT CProg3::EnumObjectsProc( LPSTR lpLogObject, LPSTR pData)
{
    // Process the callback.
    statCount++;
    // nCount++; This line would cause an error if not commented.
    MessageBeep(0) ;
    return 1 ;
}

Note that all objects of the CProg3 class above will share the statCount variable. Whether this is good or bad depends on what the application is trying to accomplish. The following code fragment illustrates how the outcome might not be what is expected.
void someFunc(CDC& aDC, CDC& bDC, CDC& cDC)
{
    // Assume that aDC has a = 3 objects.
    // Assume that bDC has b = 4 objects.
    // Assume that cDC has c = 7 objects.

    // Create some objects.
    CProg3 aObject;
    CProg3 bObject;
    CProg3 cObject;

    aObject.enumIt(aDC) ;   // statCount = a = 3
    bObject.enumIt(bDC) ;   // statCount = a + b = 7
    cObject.enumIt(cDC) ;   // statCount = a + b + c = 14
}
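The sharing can be reproduced without Windows. In this C++ sketch, a hypothetical enumerate function plays the role of EnumObjects, and a DC is reduced to a count of objects; CProg3Model mirrors CProg3 with a single static statCount.

```cpp
#include <cassert>

// Hypothetical callback type and enumerator standing in for EnumObjects.
typedef int (*EnumProc)(int objectIndex);

// Calls proc once per object; a zero return stops the enumeration.
void enumerate(int objectCount, EnumProc proc)
{
    for (int i = 0; i < objectCount; ++i)
        if (proc(i) == 0) break;
}

class CProg3Model {
public:
    static int statCount;                 // shared by every object
    void enumIt(int dcObjects) { enumerate(dcObjects, EnumObjectsProc); }
private:
    // Static, so it has no this pointer and can only touch static data.
    static int EnumObjectsProc(int) { ++statCount; return 1; }
};

int CProg3Model::statCount = 0;
```

Running aObject.enumIt(3), bObject.enumIt(4), and cObject.enumIt(7) in turn leaves statCount at 3, then 7, then 14, even though each object enumerated only its own DC.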

There are several ways to avoid the sharing of data between instances of a class. The next sections describe techniques that link the callback function to a particular object by providing a pseudo-this pointer.

Providing a Pointer in a Static Member Variable
The main reason to have a callback as a member function is for accessing class members unique to a particular object (that is, nonstatic members). A callback member function must be a static function and, therefore, can only access static members without using "." or "->". The next listing shows how to use a static member variable to pass an object's this pointer to the callback. The callback can then use the pointer to access object members. To simplify the code, the callback calls a helper function that performs all the work. The helper function is nonstatic and can implicitly access member data through its this pointer.
// See CProg5.cpp
// Run nmake -fmake1
class CProg5
{
private:
    int nCount ;
    // Use a static variable to pass the this pointer.
    static CProg5 * pseudoThis ;
    // Use a static member function for callbacks.
    static int FAR PASCAL EXPORT EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
    // Use a nonstatic member function as a helper.
    int EnumObjectsHelper( LPSTR lpLogObject, LPSTR lpData);
public:
    CProg5() : nCount(0) {};
    void enumIt(CDC& dc) ;
};

// Static data members must be defined.
CProg5 * CProg5::pseudoThis = NULL;

// Enumerate the objects.
void CProg5::enumIt(CDC& dc)
{
    pseudoThis = this ;
    // Register callback and start enumerating.
    dc.EnumObjects(OBJ_BRUSH, EnumObjectsProc, NULL) ;
    pseudoThis = NULL ;
}

// Callback handler
int FAR PASCAL EXPORT CProg5::EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData)
{
    if (pseudoThis != (CProg5 *)NULL)
        return pseudoThis->EnumObjectsHelper(lpLogObject, lpData) ;
    else
        return 0 ;
}

int CProg5::EnumObjectsHelper( LPSTR lpLogObject, LPSTR lpData)
{
    // Process the callback.
    nCount++;
    MessageBeep(0) ;
    return 1 ;
}
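The same pattern can be expressed in portable C++. In this sketch, enumerate is a hypothetical stand-in for a callback API with no application-data parameter, and CProg5Model mirrors CProg5: enumIt publishes this in the static pseudoThis for the duration of one enumeration, so each object now keeps its own count.

```cpp
#include <cassert>

// Hypothetical callback API with no room for application data.
typedef int (*EnumProc)(int objectIndex);

void enumerate(int objectCount, EnumProc proc)
{
    for (int i = 0; i < objectCount; ++i)
        if (proc(i) == 0) break;
}

class CProg5Model {
public:
    int nCount = 0;
    void enumIt(int dcObjects) {
        pseudoThis = this;                 // publish the object for the callback
        enumerate(dcObjects, EnumObjectsProc);
        pseudoThis = nullptr;              // no object outside enumIt
    }
private:
    static CProg5Model* pseudoThis;
    static int EnumObjectsProc(int index) {
        if (pseudoThis != nullptr)
            return pseudoThis->EnumObjectsHelper(index);
        return 0;                          // no object: stop enumerating
    }
    // Nonstatic helper: may use nCount through its own this pointer.
    int EnumObjectsHelper(int) { ++nCount; return 1; }
};

CProg5Model* CProg5Model::pseudoThis = nullptr;
```

Because pseudoThis is a single static variable, this sketch is safe only when one enumeration runs at a time; a nested call from another object would overwrite it.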

While the above technique works fine in many cases, the objects must coordinate the use of the callback. For callbacks (such as EnumObjects) that do their work and then exit, coordination is not much of a problem. For other callbacks, it may be. The techniques described in the next two sections require less coordination but work only with certain callbacks.

Passing a Pointer in a Parameter for Application-Supplied Data
A close examination of the EnumObjects function reveals that it has an extra 32-bit parameter, lpData, for supplying data to the callback routine. This is a great place to pass a pointer to an object. The following overworked sample demonstrates this technique.
// See CProg6.cpp
// Run nmake -fmake1
class CProg6
{
private:
    int nCount ;
    // Use a static member function for callbacks.
    static int FAR PASCAL EXPORT EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
    // Use a nonstatic member function as a helper.
    int EnumObjectsHelper( LPSTR lpLogObject) ;
public:
    CProg6() : nCount(0) {};
    void enumIt(CDC& dc) ;
};

// Enumerate the objects.
void CProg6::enumIt(CDC& dc)
{
    // Register callback and start enumerating.
    dc.EnumObjects(OBJ_BRUSH, EnumObjectsProc, (LPSTR)this) ;
}

// Callback handler
int FAR PASCAL EXPORT CProg6::EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData)
{
    CProg6 * pseudoThis = (CProg6 *)lpData ;
    if ( pseudoThis != (CProg6 *)NULL )
        return pseudoThis->EnumObjectsHelper(lpLogObject) ;
    else
        return 0 ;
}

// Callback helper function.
int CProg6::EnumObjectsHelper( LPSTR lpLogObject)
{
    // Process the callback.
    nCount++;
    MessageBeep(0) ;
    return 1;
}
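In portable C++, the application-data parameter is usually a void pointer. The sketch below assumes a hypothetical enumerateWithData function that, like EnumObjects, forwards an opaque value to every callback; CProg6Model passes this through it and recovers the object with static_cast, so no static variable and no coordination between objects is needed.

```cpp
#include <cassert>

// Hypothetical enumerator that forwards application-supplied data.
typedef int (*EnumProcData)(int objectIndex, void* lpData);

void enumerateWithData(int objectCount, EnumProcData proc, void* lpData)
{
    for (int i = 0; i < objectCount; ++i)
        if (proc(i, lpData) == 0) break;
}

class CProg6Model {
public:
    int nCount = 0;
    void enumIt(int dcObjects) {
        // Pass this through the application-data parameter.
        enumerateWithData(dcObjects, EnumObjectsProc, this);
    }
private:
    static int EnumObjectsProc(int index, void* lpData) {
        CProg6Model* pseudoThis = static_cast<CProg6Model*>(lpData);
        return pseudoThis != nullptr ? pseudoThis->EnumObjectsHelper(index) : 0;
    }
    int EnumObjectsHelper(int) { ++nCount; return 1; }
};
```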

This technique will, of course, only work with callbacks that take application-supplied data. The following list shows those callbacks:
• EnumChildProc
• EnumChildWindows
• EnumFontFamProc
• EnumFontFamilies
• EnumFontsProc
• EnumMetaFileProc
• EnumObjectsProc
• EnumPropFixedProc
• EnumPropMovableProc
• EnumTaskWndProc
• EnumWindowsProc
• LineDDAProc

Keeping a Pointer in a Collection Indexed by a Return Value
Another technique for linking an object pointer with a callback uses the return value of the function that sets up the callback. This return value is used as an index into a collection of object pointers. In the following example, SetTimer sets up a TimerProc callback and returns a unique timer ID. The timer ID is passed to TimerProc each time the function is called. The CTimer class uses the timer ID to find the object pointer in a CMapWordToPtr collection. The CTimer class is an abstract class designed to be inherited by other classes.
// See CTimer.h
// Run nmake -ftmake

// Declaration
class CTimer
{
private:
    UINT id ;
    static CMapWordToPtr timerList ;
    static void stopTimer(int id) ;
    static void FAR PASCAL EXPORT timerProc(HWND hwnd, UINT wMsg, int timerId, DWORD dwTime);
protected:
    virtual void timer(DWORD dwTime) = 0 ;
public:
    // Constructor
    CTimer() : id(NULL) {};
    // Destructor
    virtual ~CTimer() {stop();};
    // Use
    BOOL start(UINT msec) ;
    void stop() ;
};

// Define statics.
CMapWordToPtr CTimer::timerList ;

// Implementation
BOOL CTimer::start (UINT msecs)
{
    id = SetTimer(NULL,0,msecs,(FARPROC)timerProc);
    if (id != NULL)
    {
        timerList.SetAt(id, this);
        return TRUE ;
    }
    else
        return FALSE;
}

void CTimer::stop()
{
    if (id != NULL)
    {
        stopTimer(id) ;
        id = NULL ;
    }
}

// The static keyword appears only in the class declaration, not here.
void CTimer::stopTimer(int timerId)
{
    KillTimer(NULL,timerId) ;
    timerList.RemoveKey(timerId) ;
}

void FAR PASCAL EXPORT CTimer::timerProc(HWND hwnd, UINT wMsg, int timerId, DWORD dwTime)
{
    CTimer * pseudoThis ;
    if ( timerList.Lookup(timerId, (void*&)pseudoThis))
    {
        if ( pseudoThis != (CTimer *)NULL)
            pseudoThis->timer(dwTime) ;
        else
            stopTimer(timerId) ;
    }
    else
        KillTimer(NULL,timerId) ;
}

// Inherit CTimer class in order to use it.
class CMyTimer : public CTimer
{
protected:
    void timer(DWORD dwTimer) { MessageBeep(0); } ;
};
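The following portable C++ sketch shows the same indexing idea without Windows or MFC. FakeTimerService is a hypothetical stand-in for SetTimer and KillTimer that reports expirations by ID only, std::map replaces CMapWordToPtr, and CTimerModel and CCountingTimer mirror CTimer and CMyTimer. The static timerProc looks up the object by ID and calls the virtual timer function through the recovered pointer.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for SetTimer/KillTimer: set() returns a fresh ID,
// and fire() reports one expiration by ID only.
typedef void (*TimerCallback)(unsigned id);

class FakeTimerService {
public:
    unsigned set(TimerCallback proc) { procs_[nextId_] = proc; return nextId_++; }
    void kill(unsigned id) { procs_.erase(id); }
    void fire(unsigned id) {                   // simulate one timer tick
        std::map<unsigned, TimerCallback>::iterator it = procs_.find(id);
        if (it != procs_.end())
            it->second(id);
    }
private:
    std::map<unsigned, TimerCallback> procs_;
    unsigned nextId_ = 1;                      // 0 means "no timer"
};

FakeTimerService timerService;                 // hypothetical global service

class CTimerModel {
public:
    virtual ~CTimerModel() { stop(); }
    unsigned id() const { return id_; }
    bool start() {
        stop();                                // at most one timer per object
        id_ = timerService.set(timerProc);
        timerList()[id_] = this;               // index the object by timer ID
        return id_ != 0;
    }
    void stop() {
        if (id_ != 0) {
            timerService.kill(id_);
            timerList().erase(id_);
            id_ = 0;
        }
    }
protected:
    virtual void timer() = 0;
private:
    unsigned id_ = 0;
    static std::map<unsigned, CTimerModel*>& timerList() {
        static std::map<unsigned, CTimerModel*> list;  // built on first use
        return list;
    }
    static void timerProc(unsigned id) {
        std::map<unsigned, CTimerModel*>::iterator it = timerList().find(id);
        if (it != timerList().end())
            it->second->timer();               // call through the pseudo-this
        else
            timerService.kill(id);             // unknown ID: stop a stray timer
    }
};

// Inherit CTimerModel in order to use it.
class CCountingTimer : public CTimerModel {
public:
    int ticks = 0;
protected:
    void timer() override { ++ticks; }
};
```

Keeping the map in a function-local static means it is constructed on first use, so the sketch does not depend on the initialization order of global objects.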

Static member functions are used in C++ to handle callbacks because they do not have this pointers, and callback functions are not designed to accept this pointers. Because it is often desirable for the callback to have access to an object, this article has described four techniques for handling callbacks with static member functions, three of which provide the static member function with a pseudo-this pointer.

For more information on C++ topics such as the this pointer, friend functions, or static functions, see:

Stroustrup, Bjarne. The C++ Programming Language. 2d ed. Addison-Wesley, 1991.

Ellis, Margaret A., and Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison-Wesley, 1990.

Lippman, Stanley B. C++ Primer. 2d ed. Addison-Wesley, 1991.

Microsoft C/C++ version 7.0 C++ Language Reference. Microsoft Corporation, 1991.

Microsoft C/C++ version 7.0 C++ Class Libraries User's Guide. Microsoft Corporation, 1991.

For information on callbacks, see:

Microsoft Windows version 3.1 Software Development Kit (SDK) Programmer's Reference, Volume 1: Overview. Microsoft Corporation, 1987-1992.

Microsoft Windows version 3.1 SDK Programmer's Reference, Volume 2: Functions. Microsoft Corporation, 1987-1992.

Microsoft Windows version 3.1 SDK Guide to Programming. Microsoft Corporation, 1987-1992.

Petzold, Charles. Programming Windows. 2d ed. Microsoft Press, 1990.

Norton, Peter, and Paul Yao. Peter Norton's Windows 3.0 Power Programming Techniques. Bantam Computer Books, 1990.

The C/C++ Compiler Learns New Tricks
Dale Rogerson Microsoft Developer Network Technology Group Created: August 28, 1992 Revised: January 27, 1993 The section on simplified building was removed. (This method links all programs with the /NOI option enabled, which causes problems.) Click to open or copy the files in the Back sample application for this technical article.

WinMain, GlobalAlloc, and mixed-model programming: these are just some of the conventions C programmers had to accept when they started programming for the Microsoft® Windows™ operating system. Microsoft C/C++ version 7.0 can now hide these conventions so that programmers can use standard C practices, which makes applications much easier to develop and port. This article provides an overview of programming conventions that C/C++ programmers no longer need and discusses the new programming practices in C/C++ version 7.0. A bibliography of suggested reading material is included at the end of this article. A sample application called Back (BACK.EXE) and its accompanying dynamic-link library (DLL) called Trace (TRACE.DLL) demonstrate many of the ideas in this article. See the "Notes on the Sample Application" section for more information about Back and Trace.

Note: The information in this article is valid only for Microsoft Windows version 3.x standard and enhanced modes.

The Microsoft® C/C++ version 7.0 compiler and run-time libraries were designed for the Microsoft Windows™ operating system. For this reason, programmers no longer have to follow many of the conventions that differentiated Windows-based programs from MS-DOS®-based programs. For example, C/C++ programmers can now use:

• Large-model programming instead of mixed-model programming.
• The main function instead of the WinMain function.
• _fmalloc instead of GlobalAlloc.
• GlobalAllocPtr instead of GlobalAlloc.
• Dynamic-link libraries (DLLs) with default LibMain and WEP.

The following sections discuss each of these topics in detail.

Large Model vs. Mixed Model
One of the first weird conventions that programmers moving to Windows face is mixed-model programming. Mixed-model programming brings out the worst in segmented processor architectures. Some pointers are near while others are far. Some variables default to near while others default to far. Source code becomes a confused mass with near and far casts strewn throughout. In Windows protected modes, large model is now the model of choice.

Single Instances
The behavior of Microsoft C version 6.0 was one reason programmers were reluctant to use the large model. C version 6.0 built large-model applications with multiple read/write data segments. Windows forces an application that uses multiple read/write data segments to be single instance; therefore, applications built by C version 6.0 would run only single instance. If you want to build a single-instance application, the Microsoft C/C++ compiler's large model gives it to you for free. There is no need to check hPrevInstance; Windows does all the work for you, including displaying an informative dialog box that tells the user that only one instance can run.

Note: If you are not using the Microsoft C/C++ compiler, check the documentation for your C compiler to see which options generate multiple read/write data segments.

Multiple Instances
It is possible to get multiple instances with a large-model application. If you use Borland® C++ or Microsoft C/C++ version 7.0, it is easy to get a single 64K read/write data segment. For the Microsoft C/C++ compiler, the /Gx and /Gtnnn options will do the trick; for the Borland C++ compiler, a single data segment is the default. For more information, see the "Programming at Large" and "Allocating Memory the Newfangled Way: The new Operator" technical articles on the Microsoft Developer Network CD (Technical Articles, C/C++ Articles).

Many programmers are concerned about the amount of overhead in a large-model application compared with a small-model or mixed-model application. Performance is never free. Simply using the mixed model instead of the large model never makes an application significantly faster. The best method is to use a profiler to determine which code is executed most often and to optimize that code. It is preferable to optimize code using portable techniques. If you spend a week making functions near and applying other optimizations specific to a segmented architecture, the optimizations (and your week of work) will be lost when you port the code to Windows NT™. Instead, you could spend the week reworking the algorithms in the most frequently executed code. These improvements will affect performance more than your choice of language, compiler, or compiler options. However, if your marketing department changes specifications faster than an 80486 can prefetch an instruction, algorithms often change overnight. In this situation, a programmer must often use the compiler (sometimes blindly) to try to speed up code instead of optimizing the code itself.

main vs. WinMain

The Microsoft C/C++ startup code first checks for a function labeled main in a Windows-based program. If it cannot find main, it tries to locate a function called WinMain. The gist of this wonderful information is that a Windows-based application can use main instead of WinMain as its entry point, just like an MS-DOS C program. One of the standard ways to declare main is:
void main(int argc, char *argv[], char **envp) { }

Why would a program want to use main? Possibly for portability, or to use common source code for Windows and MS-DOS or UNIX®. Using main also allows programmers to build upon their MS-DOS knowledge for handling the command line and the environment. Not to be outdone by any old application, DLLs can also use main instead of LibMain as an entry point. However, in C/C++ version 7.0, the Windows libraries include a default LibMain, so most DLLs will not need a main or LibMain function. This is covered later in the "Using DLLs" section. The above information is on plain and public display in the DETAILS.TXT file, which is provided with the Microsoft C/C++ compiler. Those interested in reading code should check out the STUBMAIN.ASM and CRT0.ASM files in the SOURCE\STARTUP\WIN directory.

Getting to hInstance
The careful reader will be wondering where the program is going to get its instance handle. Why, from _hInstance, of course! _hInstance is an undocumented feature of the C/C++ startup code. When Windows calls the startup code, the instance handle is passed to the startup code in the DI register, as documented in the Microsoft Windows version 3.1 Software Development Kit (SDK) Programmer's Reference, Volume 1: Overview, in Chapter 22. The instance handle is then placed in a global variable called _hInstance. To access this variable, you must declare it first:
extern const HINSTANCE _hInstance;

The startup code also includes the following global variables for the other parameters normally passed to WinMain:
• _hPrevInstance
• _lpszCmdLine
• _cmdShow


You can access these variables by using the following declarations:
extern const HINSTANCE _hPrevInstance;
extern const LPSTR _lpszCmdLine;
extern const int _cmdShow;

The parameters passed to a DLL are different from parameters passed to an application. The following global variables are defined in the startup code for a DLL:
• _hModule
• _lpszCmdLine
• _wDataSeg
• _wHeapSize

The following declarations will give you access to these variables:
extern const HINSTANCE _hModule ;
extern const LPSTR _lpszCmdLine ;
extern const WORD _wDataSeg ;
extern const WORD _wHeapSize ;

A quick look into the startup code uncovered the above information. The startup code is included with Microsoft C/C++ version 7.0; look in the SOURCE\STARTUP directory. For more information on the startup code and what it does, see "A Comprehensive Examination of the Microsoft C Version 6.0 Startup Code" in the Microsoft Systems Journal, Vol. 7, No. 1, on the Microsoft Developer Network CD. The article examines C version 6.0 startup code for MS-DOS, but most of the information is also valid for version 7.0. This article explains the work the startup code must perform and provides background information for reading the source code. Note The startup source code is subject to change between compiler releases. The inclusion of specific startup variables or functions is not guaranteed in future releases.

_fmalloc vs. GlobalAlloc
The big problem with GlobalAlloc is that it consumes a selector for each call. A selector is a limited resource in Windows version 3.x, so GlobalAlloc is inappropriate for allocating small blocks of memory such as nodes in a linked list. The solution is to implement a subsegment allocation scheme in which one segment is allocated with GlobalAlloc and divided up into small blocks.

Fortunately, Microsoft C/C++ version 7.0 includes a subsegment allocation scheme called _fmalloc. _fmalloc is the large-model or model-independent version of malloc. When you compile with the large model, malloc is mapped to _fmalloc. In other memory models, you must call _fmalloc explicitly. _fmalloc manages its own heap on top of the Windows global heap. When _fmalloc is called, it first checks whether it can satisfy the memory request by simply returning a pointer to an unused block inside its heap. If it can't, _fmalloc takes one of the following actions:

• If no memory has been allocated yet, or if the 64K limit of the segment has been reached, _fmalloc allocates a new segment with GlobalAlloc.
• If more room is needed, _fmalloc enlarges a segment with GlobalReAlloc.

You use _ffree to free the memory blocks allocated by _fmalloc. However, _ffree does not call GlobalFree. Instead, _ffree marks a block as unused, and _fmalloc reuses these unused blocks to satisfy future requests for memory. The _heapmin function releases unused blocks back to Windows. For more information on using malloc in a Windows-based program, see the "Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for Windows" technical article on the Microsoft Developer Network CD (Technical Articles, C/C++ Articles).
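The subsegment idea can be sketched in portable C. This is a minimal fixed-size pool under stated assumptions, not the run-time library's _fmalloc: one malloc call stands in for a single GlobalAlloc (one selector), the block is carved into equal slots, freeing a slot only marks it unused (as _ffree does), and the whole block is released at once. The names subheap_init, subheap_alloc, subheap_free, and subheap_destroy are hypothetical; a real allocator would also handle variable sizes and grow its segment, as _fmalloc does with GlobalReAlloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Alignment unit: every slot starts on a boundary suitable for any of
   these types (a C89-friendly stand-in for max_align_t). */
typedef union {
    void *p;
    long l;
    double d;
} align_unit;

/* One "segment" carved into equal-sized slots. */
typedef struct subheap {
    unsigned char *segment;   /* the single large allocation */
    size_t slot_size;         /* bytes per slot, rounded for alignment */
    void *free_list;          /* linked list threaded through unused slots */
} subheap;

/* Obtains one block big enough for slot_count slots and links every
   slot onto the free list. Returns 0 if the block cannot be obtained. */
int subheap_init(subheap *h, size_t slot_size, size_t slot_count)
{
    size_t i;

    if (slot_size < sizeof(void *))
        slot_size = sizeof(void *);   /* a free slot must hold a link */
    h->slot_size = (slot_size + sizeof(align_unit) - 1)
                   / sizeof(align_unit) * sizeof(align_unit);
    h->free_list = NULL;
    h->segment = malloc(h->slot_size * slot_count);  /* one "GlobalAlloc" */
    if (h->segment == NULL)
        return 0;
    for (i = 0; i < slot_count; i++) {
        void *s = h->segment + i * h->slot_size;
        *(void **)s = h->free_list;
        h->free_list = s;
    }
    return 1;
}

/* Hands out an unused slot, or NULL when the segment is full. */
void *subheap_alloc(subheap *h)
{
    void *s = h->free_list;

    if (s != NULL)
        h->free_list = *(void **)s;
    return s;
}

/* Like _ffree: the slot is only marked unused and kept for reuse. */
void subheap_free(subheap *h, void *p)
{
    *(void **)p = h->free_list;
    h->free_list = p;
}

/* Returns the whole segment to the system at once. */
void subheap_destroy(subheap *h)
{
    free(h->segment);
    h->segment = NULL;
    h->free_list = NULL;
}
```

For 1,000 linked-list nodes, a pool like this consumes one selector, whereas calling GlobalAlloc once per node would consume 1,000.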

GlobalAllocPtr vs. GlobalAlloc
If you don't want to use _fmalloc, at least use GlobalAllocPtr instead of GlobalAlloc. GlobalAllocPtr is a macro defined in WINDOWSX.H that allocates the memory, locks the handle, and returns a pointer to the allocated memory. To free the memory, use GlobalFreePtr. There is no need to retain memory handles or lock and unlock memory blocks. What makes this possible is the GlobalHandle function, which takes a pointer and returns the handle to it. GlobalHandle removes the need for saving and tracking handles, resulting in incredible savings in time, memory, and complexity. Other convenient memory macros in WINDOWSX.H are:
• GlobalPtrHandle
• GlobalLockPtr
• GlobalUnlockPtr
• GlobalReAllocPtr

If these macros were C functions, they would be prototyped as follows:
void FAR * GlobalAllocPtr(UINT flags, DWORD size) ;
    // Allocates and locks a block of size bytes with the
    // flags set.
BOOL GlobalFreePtr(void FAR* lp) ;
    // Unlocks and frees the block pointed to by lp;
    // returns zero on success (GlobalFree returns NULL on success).
void FAR * GlobalReAllocPtr(void FAR* lp, DWORD size, UINT flags) ;
    // Reallocates the block pointed to by lp to size bytes
    // with the flags set. The return value is the pointer
    // to the reallocated block.
HGLOBAL GlobalPtrHandle(void FAR* lp) ;
    // Gets the global handle of the block lp points to.
BOOL GlobalLockPtr(void FAR* lp) ;
    // Locks the block lp points to.
    // If successful, returns a non-zero value.
BOOL GlobalUnlockPtr(void FAR* lp) ;
    // Unlocks the block lp points to.
    // If successful, returns a non-zero value.

For the curious, here are the definitions of GlobalAllocPtr and GlobalFreePtr:
#define GlobalAllocPtr(flags, cb) \
        (GlobalLock(GlobalAlloc((flags), (cb))))
#define GlobalFreePtr(lp) \
        (GlobalUnlockPtr(lp),(BOOL)GlobalFree(GlobalPtrHandle(lp)))

Using DLLs
Microsoft C/C++ version 7.0 run-time libraries provide better support for building DLLs. Two changes that simplify building DLLs are:
• A default LibMain function
• A default WEP function

Most of the information for this section can be found in the DETAILS.TXT file, which is included with the Microsoft C/C++ compiler. Note The library files that do not include the C run-time functions (for example, xNOCRTDW.LIB, where x is the memory model) do not have a default LibMain or WEP function. You must provide your own LibMain and WEP functions if you use these libraries.

Many DLLs are collections of functions that do not need to perform initialization and therefore do nothing in the LibMain function. If a function does not do anything, it would be nice if the developer did not have to worry about it. The C run-time libraries now include a version of LIBENTRY.OBJ and a default LibMain function. So, if the DLL links to the C run-time functions, it does not have to link to LIBENTRY.OBJ or provide its own LibMain function. The default LibMain function is very clever. It does nothing.

It is no longer necessary to include a dummy WEP function in your DLL code. The C run-time libraries now include a default version of the WEP function. The default WEP performs the following steps:
1. Calls the optional user termination function _WEP (described below).
2. Performs C exit processing (calls _cexit).
3. Returns to Windows the value returned by _WEP.

Placing WEP in a fixed segment ensures that it will exist in memory in case of error. For proper placement of WEP, include the following lines in the .DEF file:

The source code for the default WEP function is included with the Microsoft C/C++ version 7.0 compiler. Look in the SOURCE\STARTUP\WIN directory for a file called WEP.ASM.

To add your own processing to the default WEP function, add a _WEP function to your DLL. (Note the leading underscore character.) Here is an example:
int FAR PASCAL _WEP(int nExitType);

// Put _WEP code into same fixed segment as the WEP function.
#pragma alloc_text(WEP_TEXT, _WEP)

int FAR PASCAL _WEP(int nExitType)
{
    //
    // Exit cleanup code goes here.
    //
    return nExitType ;
}

The _WEP function is optional; use this function for cleanup tasks that you want done when the DLL is unloaded. If you do not provide a _WEP function, the default WEP function calls the default _WEP function, which simply returns 1. To verify this yourself, check the source in the STUBWEP.ASM file included with the Microsoft C/C++ compiler in the SOURCE\STARTUP\WIN directory. Avoid the following in a _WEP function:
• Do not use deep stacks (that is, do not use recursion or deeply nested function calls).
• Do not use operating system requests.
• Do not use file I/O.
• Do not call functions that are not in a FIXED segment.
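The three steps the default WEP performs can be modeled in portable C. This is a sketch of the sequence only: here the optional user _WEP is passed as a function pointer, whereas the real library picks up your _WEP (or its default, which returns 1) at link time, and simulated_cexit merely counts calls in place of _cexit. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by the default and user termination hooks. */
typedef int (*exit_hook)(int exit_type);

static int cexit_calls = 0;   /* how many times the simulated _cexit ran */

/* Stand-in for the library's default _WEP, which returns 1. */
static int default_user_wep(int exit_type)
{
    (void)exit_type;
    return 1;
}

/* Stand-in for _cexit: the real one performs C exit processing. */
static void simulated_cexit(void)
{
    cexit_calls++;
}

/* Example user hook: cleanup would go here; it reports the exit type. */
static int sample_user_wep(int exit_type)
{
    return exit_type;
}

/* Models the default WEP: 1. call _WEP, 2. run C exit processing,
   3. return _WEP's value to the caller (Windows, in the real DLL). */
int model_wep(exit_hook user_wep, int exit_type)
{
    int result;

    if (user_wep == NULL)
        user_wep = default_user_wep;
    result = user_wep(exit_type);
    simulated_cexit();
    return result;
}
```

Note the ordering: the user hook runs before C exit processing, which is one reason a _WEP should stay small and avoid the operations listed above.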

Building DLLs becomes much easier with the default WEP and LibMain functions. It is almost possible to cut functions from an application and simply recompile them to get a DLL. Using the large model for both the DLL and the application simplifies this process.

Notes on the Sample Application
The sample application, Back, demonstrates some of the concepts presented in this article. Back lists command-line options and environment variables. It can be built for MS-DOS or for Windows. The MKWIN.BAT batch file builds the Windows version, while the MKDOS.BAT batch file builds the MS-DOS version. The code for the sample application is simple and straightforward. To display the output, the MS-DOS version of Back uses printf and the Windows version uses trace, which is a function exported from a DLL called TRACE.DLL. trace performs printf-like printing to the debug monitor. TRACE.DLL demonstrates how to export a CDECL variable argument function from a DLL and shows how simple a DLL can be. To view the BACK.C and TRACE.C files, click the sample application button at the beginning of this article.

Microsoft C/C++ version 7.0 introduces new programming practices that facilitate the development of applications for Windows version 3.1 in protected mode. Programmers can now:

Use the large memory model. Large-model programs are compatible with the protected modes of Windows version 3.1 and can have multiple instances.

Use _fmalloc. With C/C++, _fmalloc, which is the large or model-independent version of malloc, performs subsegment allocation and conserves selector usage.

Use the run-time version of LibMain. DLLs no longer have to link to LIBMAIN.OBJ because the C/C++ run-time libraries include a default LibMain.

Use the run-time version of WEP. You no longer need a dummy WEP function because the C/C++ run-time libraries include a default WEP function. To add your own exit processing, use a _WEP function.

Technical Articles
All of the articles below are available on the Microsoft Developer Network CD (Technical Articles, C/C++ Articles):
• "Microsoft Windows and the C Compiler Options"
• "Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for Windows"
• "Allocating Memory the Newfangled Way: The new Operator"
• "Programming at Large"

Product Documentation
On the Microsoft Developer Network CD, you can find the following books under C/C++ 7.0 in the Product Documentation section of the Source index:
• Programming Techniques
• Environment and Tools

See the following book under Windows 3.1 SDK in the Product Documentation section of the Source index:

Programmer's Reference, Volume 1: Overview


DETAILS.TXT included with Microsoft C/C++ version 7.0

How to Pass Parameters Between COBOL and C
Michael Hendrick Systems Support Engineer, Languages February 1992

This article explains how Microsoft® COBOL programs can pass parameters to and receive parameters from Microsoft C programs. It assumes you have a basic understanding of the COBOL and C languages. Microsoft COBOL supports calls to routines written in Microsoft C, FORTRAN, Pascal, and Assembler. This article describes the necessary syntax for calling Microsoft C routines and contains a series of examples demonstrating the interlanguage capabilities between COBOL and C. The sample programs apply to the following Microsoft products:

• Microsoft COBOL Professional Development System (PDS) versions 4.0 and 4.5 for MS-DOS® and OS/2®
• Microsoft C Optimizing Compiler version 6.0 for MS-DOS and OS/2

Mixed-Language Programming with COBOL and C
The C Interface to COBOL
The C interface to COBOL uses the standard C extern statement. The following are the recommended steps for using this statement to execute a mixed-language CALL from C:
1. In the C code, include an extern statement for each COBOL routine CALLed. The extern statement should be at the beginning of the C program, before any CALLs to the COBOL routine.
Note: When compiling, if the /Gc compiler directive is used (the /Gc option causes all functions in the module to use the FORTRAN/Pascal naming and CALLing conventions), then the cdecl keyword should be used when the COBOL function is declared (because COBOL uses the C CALLing convention, not the Pascal CALLing convention).
2. To pass an argument by reference, pass a pointer to the object (all parameters must be passed by reference to COBOL). C automatically translates array names into addresses. Therefore, arrays are automatically passed by reference and don't need the & (address-of) operator.
3. Once a routine has been properly declared with an extern statement, CALL it just as you would CALL a C function.
4. If passing structures between COBOL and C, compile the C routine with the /Zp1 compiler option to pack structure members.
5. Always compile the C module in large model.
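Step 4 matters because COBOL records have no padding between fields. A minimal sketch, using the CobRec record from the _COBPROC.CBL sample later in this article: #pragma pack(push, 1) has the same effect as compiling with /Zp1 on current Microsoft, GCC, and Clang compilers, and int16_t and int32_t stand in for the 16-bit int and long of Microsoft C 6.0 (an assumption about the target sizes, also noted in the code).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of the CobRec record in the _COBPROC.CBL sample:
     03 COBInt    pic s9(4) comp-5.   2-byte binary integer
     03 COBLong   pic s9(8) comp-5.   4-byte binary integer
     03 COBString pic x(21).          21 bytes, no NULL terminator
   int16_t and int32_t stand in for 16-bit int and long (assumption). */

/* Packed: fields back to back, as /Zp1 produces. */
#pragma pack(push, 1)
struct PackedCobRec {
    int16_t varInt;
    int32_t varLong;
    char    szString[21];
};
#pragma pack(pop)

/* Same fields with the compiler's default alignment, for comparison. */
struct NaturalCobRec {
    int16_t varInt;
    int32_t varLong;
    char    szString[21];
};
```

With default alignment the compiler typically inserts padding after varInt and at the end of the struct, so COBOL would read COBLong and COBString from the wrong offsets; the packed layout matches the record byte for byte (27 bytes, with varLong at offset 2).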

C Arguments
The default for C is to pass all arrays by reference (near or far, depending on the memory model) and all other data types by value. C uses far data pointers for compact, large, and huge models, and near data pointers for small and medium models.

Passing C Arguments by Value
The C default is to pass everything except arrays by value. Arrays can be passed by value only if they are declared as the only member of a structure. The following example passes all 100 elements of x directly to the C function test():
struct x_struct { int x[100]; } xs;
. . .
test(xs);
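The following sketch contrasts the two defaults, using the x_struct wrapper from the example above; test_by_value and test_by_reference are hypothetical names. The struct parameter receives its own copy of all 100 elements, while the bare array parameter receives only an address.

```c
#include <assert.h>

struct x_struct {
    int x[100];
};

/* Receives a private copy of the whole array: changes are not seen
   by the caller. */
void test_by_value(struct x_struct xs)
{
    xs.x[0] = -1;
}

/* The array parameter decays to a pointer to the caller's array:
   changes are visible to the caller. */
void test_by_reference(int x[100])
{
    x[0] = -1;
}
```

The copy costs stack space proportional to the array size, which is worth keeping in mind with the small stacks of segmented programs.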

Passing C Arguments by Reference (Near or Far)
In C, passing a pointer to an object is the equivalent of passing the object itself by reference. Within the CALLed function, each reference to the parameter itself is prefixed by an * (asterisk). Note: To pass a pointer to an object, prefix the parameter in the CALL statement with &. To receive a pointer to an object, prefix the parameter's declaration with *. In the latter case, this may mean adding a second * to a parameter that already has an *. For example, to receive a pointer by value, declare it as follows:
int *ptr;

But to receive the same pointer by reference, declare it as the following:
int **ptr;

The default for arrays is to pass by reference.
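A short sketch of the difference between receiving a pointer by value (int *ptr) and by reference (int **ptr); by_value and by_reference are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer received by value: the function can change the int it
   points to, but not the caller's pointer variable. */
void by_value(int *ptr)
{
    *ptr = 10;        /* visible to the caller */
    ptr = NULL;       /* changes only the local copy */
}

/* Pointer received by reference: the extra * lets the function
   redirect the caller's pointer. */
void by_reference(int **ptr, int *target)
{
    *ptr = target;    /* the caller's pointer now points at target */
}
```

The caller writes by_value(p) in the first case and by_reference(&p, ...) in the second, matching the & and * rules described above.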

Effect of C Memory Models on Size of Reference
Near reference is the default for passing pointers in small and medium model C. Far reference is the default for the compact, large, and huge models. Note All C programs that are linked with COBOL must be compiled with the large memory model.

Restrictions on CALLs from COBOL
The COBOL to C interface does not support the near heap in the C run time. This means you should not use functions that access the near heap in your C programs. These include the following functions:
• _nfree()
• _nheapchk()
• _nheapset()
• _nheapwalk()
• _nmalloc()
• _nmsize()

To work around this, compile and link with C as the initial program. After the main C program begins, the COBOL routine can be CALLed. The COBOL code can then CALL back and forth with C. Since the C support modules are not used, there are no special restrictions on the near heap functions.

Special Note on C Strings
C stores strings as simple arrays of bytes (like COBOL) but also uses a null character [ASCII NULL (0)] as the delimiter to show the end of the string. For example, consider the string declared as follows:

char str[] = "String of text";

The string is stored in 15 bytes of memory as follows:
|S|t|r|i|n|g| |o|f| |t|e|x|t|\0|
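A COBOL field such as pic x(15), by contrast, is exactly 15 bytes, padded with trailing spaces and with no terminating NULL. One common C-side defense is to copy the field into a buffer one byte larger and add the terminator yourself. cobol_to_cstr is a hypothetical helper, and trimming the trailing spaces is a design choice of this sketch, not something COBOL requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies a fixed-length, space-padded COBOL field of len bytes into
   dest as a C string. dest must hold at least len + 1 bytes. Trailing
   spaces are trimmed and a NULL terminator is added. Returns dest. */
char *cobol_to_cstr(char *dest, const char *field, size_t len)
{
    while (len > 0 && field[len - 1] == ' ')
        len--;                     /* drop the COBOL padding */
    memcpy(dest, field, len);
    dest[len] = '\0';              /* add the terminator C expects */
    return dest;
}
```

After the copy, printf, strlen, and the other string routines can be used safely on dest, while the original COBOL field is left untouched.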

When passing a string from COBOL to C, the string will normally not have a NULL appended to the end. Because of this, none of the C routines that deal directly with a string (printf, sprintf, scanf, and so on) can be used with these strings unless a NULL is appended to the end. A NULL can be put at the end of a COBOL string by using the following declaration:

Compiling and LINKing
Several compile and link options need to be used when interfacing C and COBOL. The standard C compile line is as follows:
CL /c /Aulf CProgName ;

Option   Description
/c       Compiles without linking (produces only an .OBJ file).
/Aulf    Sets up a customized large memory model:
  u      SS not equal to DS. DS is reloaded on function entry.
  l      Selects large memory model far (32-bit) code pointers.
  f      Selects large memory model far (32-bit) data pointers.

The standard LINK line for COBOL CALLing C is as follows: For MS-DOS®

For OS/2®

The standard LINK line for C CALLing COBOL is as follows: For DOS

For OS/2

Note that the order in which the libraries are specified in the LINK line is important. Microsoft® COBOL versions 4.0 and 4.5 introduced the shared run-time system. Although it is generally more useful to link your applications using the static run-time system (LCOBOL.LIB), you may also choose to link the applications with the shared run-time library (COBLIB.LIB) to take advantage of its more efficient methods of utilizing memory. In order to do this and link your applications to Microsoft C, you must SET the COBPOOL environment variable as referenced in the Microsoft COBOL Operating Guide.

Common Pitfalls
This list supplies a simple checklist to go over when you encounter problems doing mixed-language programming:

• Make certain the version numbers of the two languages are compatible. Microsoft COBOL versions 4.0 and 4.5 are compatible with the C versions 5.1 and 6.x.
• Use the /NOD switch when LINKing to avoid duplicate definition errors. If duplicate definition errors still occur, use the /NOE switch in addition to the /NOD switch when LINKing.
• Watch for incompatible functions such as _nfree() and _nheapchk().
• Make certain the C program is compiled in the large memory model and the /Aulf compile options are used.
• If passing structures (records) to and from COBOL, use the /Zp1 compile option. (/Zp1 means that structure members will be packed on one-byte boundaries.)
• When COBOL is the main module and there are some C functions that are not working correctly, make the C routine the main routine and then CALL the COBOL routine. The COBOL routine can then in turn CALL back into the C routines. When this method is used, the COBOL/C support modules do not have to be used. This can correct some incompatibilities.

Batch Files
The following batch files can be helpful when using the sample programs below. The CBC6.BAT file can be used to set your environment table correctly, but think of it as a convenience rather than a necessity. You should already have these parameters set in your environment when using both languages in tandem.
CBC6.BAT





Sample Code
The following sample code demonstrates how to pass common numeric types to a C routine by reference and by value.
COBNUMS.CBL
      * Passing Common Numeric Types to C by Reference and by Value
       working-storage section.
       01 field1 pic 9(4) comp-5 value 123.
       01 field2 pic 9(8) comp-5 value 123456.
       01 field3 pic 9(4) comp-5 value 456.
       01 field4 pic 9(8) comp-5 value 456789.
       procedure division.
      * Fields 1 and 2 (below) are passed BY REFERENCE. The keywords
      * are omitted here since BY REFERENCE is the default method.
           call "_CFUNC" using field1, field2,
               by value field3,
               by value field4.
           display "Returned pic 9(4): " field1.
           display "Returned pic 9(8): " field2.
           stop run.


#include <stdio.h>

void CFunc(int *RefInt, long *RefLong, int ValInt, long ValLong)
{
    printf("By Reference: %i %li\r\n", *RefInt, *RefLong);
    printf("By Value : %i %li\r\n", ValInt, ValLong);
    *RefInt = 321;
    *RefLong = 987654;
}

Returned PIC 9(4): 00321
Returned PIC 9(8): 000987654
By Reference: 123 123456
By Value : 456 456789

The following sample code demonstrates how to pass an alphanumeric string from C to COBOL.
_COBPROG.CBL
       program-id. "_cobprog".
       data division.
       linkage section.
       01 field1 pic x(6).
       procedure division using field1.
           display "String from C: " field1.
           stop run.

#include <stdio.h>

extern cdecl cobprog(char *Cptr);

char Cptr[] = "ABCDEF";

void main()
{
    cobprog(Cptr);
}

String from C: ABCDEF

The following sample code demonstrates how to pass a record from COBOL to a C data struct.
STRUCT.CBL
      $set vsc2 rtncode-size(4)
      * Passing a Record from COBOL to a C struct
       data division.
       working-storage section.
       01 rec-1.
           02 var1  pic X(8)  value "HELLO".
           02 var2  pic X(12) value "W O R LD".
           02 varc2 pic 9(04) comp-5 value 2.
           02 varc3 pic 9(04) comp-5 value 3.
           02 varc4 pic 9(04) comp-5 value 4.
           02 varc5 pic 9(04) comp-5 value 5.
           02 varc1 pic 9(04) comp-5 value 1.
       procedure division.
           call "C_FUNCTION1" using by reference rec-1.
           display "CBL varC--> " varC1.
           display "CBL varC--> " varC2.
           display "CBL varC--> " varC3.
           display "CBL varC--> " varC4.
           display "CBL varC--> " varC5.
           display "CBL var1--> " var1.
           display "CBL var2--> " var2.
           stop run.

#include <stdio.h>

struct struct1 {
    unsigned char var1[8];
    unsigned char var2[12];
    unsigned int var3[5];
};

void function1(struct struct1 far *p1)
{
    int a;

    for (a=0; a<5; a++)
        printf("%u\n", p1->var3[a]);
    for (a=0; a<8; a++)
        printf("%c", p1->var1[a]);
    printf("\n");
    for (a=0; a<12; a++)
        printf("%c", p1->var2[a]);
    printf("\n");
}

2
3
4
5
1
HELLO
W O R LD
CBL VARC--> 00001
CBL VARC--> 00002
CBL VARC--> 00003
CBL VARC--> 00004
CBL VARC--> 00005
CBL VAR1--> HELLO
CBL VAR2--> W O R LD

The following sample code demonstrates how to pass a struct from C to a COBOL record.
_COBPROC.CBL
       identification division.
       environment division.
       data division.
       working-storage section.
       01 Integer pic 9(4).
       01 Long pic 9(8).
       linkage section.
       01 CobRec.
           03 COBInt pic s9(4) comp-5 value zero.
           03 COBLong pic s9(8) comp-5 value zero.
           03 COBString pic x(21) value spaces.
       procedure division using CobRec.
           move COBInt to Integer.
           move COBLong to Long.
           display "Integer from C: " Integer.
           display "Long integer from C: " Long.
           display "String from C: " COBString.
           exit program.

#include <stdio.h>
#include <malloc.h>
#include <stdlib.h>
#include <string.h>

struct CobRec {                 // defines data type CobRec
    int varInt;                 // integer variable (COBOL s9(4) comp-5)
    long varLong;               // long int (COBOL s9(8) comp-5)
    char szString[21];          // string variable (COBOL x(21), no NULL)
};

/* COBOL routines are cdecl; this means the name must be prefixed
 * with '_'. Alternatively, you can manually reverse the
 * parameters.
 */
extern far cdecl COBPROC(struct CobRec *cPtr);

main()
{
    struct CobRec *cPtr;        // declare pointer to struct

    // get memory to hold struct
    cPtr = (struct CobRec *) _fmalloc(sizeof(struct CobRec));
    if (cPtr == NULL)
        return 1;

    /* NOTE: COBOL will be the main program unless BP is nonzero.
     * BP is zero until some local variables are allocated and used.
     *
     * In this example, we do use some local variables; therefore, this
     * is taken care of already.
     */
    printf("Positive Integers and String\n");
    cPtr->varInt = 32767;       // refer to member of struct and
    cPtr->varLong = 60000;      // assign values
    // COBString holds exactly 21 characters and no NULL, so copy at
    // most 21 bytes (strcpy would write past the end of the field).
    strncpy(cPtr->szString, "This is a test string", sizeof(cPtr->szString));
    COBPROC(cPtr);              // CALL to COBOL procedure
    printf("\n\n\n");
    printf("Negative Integers and String\n");
    cPtr->varInt = -32765;
    cPtr->varLong = -987654;
    strncpy(cPtr->szString, "Here's another string", sizeof(cPtr->szString));
    COBPROC(cPtr);              // CALL to COBOL procedure
    _ffree(cPtr);
    return 0;
}

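Positive Integers and String
Integer from C: 2767
Long integer from C: 00060000
String from C: This is a test string

Negative Integers and String
Integer from C: 2765
Long integer from C: 00987654
String from C: Here's another string

Integer and Long are unsigned DISPLAY fields (pic 9(4) and pic 9(8)), so the MOVE drops the sign and any high-order digits that do not fit: 32767 becomes 2767, and -32765 becomes 2765.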

The following sample code demonstrates how to pass an array of integers from COBOL to C.
INTARRAY.CBL
      * Passing an Array of Integers from COBOL to C
       data division.
       working-storage section.
       01 t-count pic 99.
       01 t-table.
           05 the-table pic 9(4) comp-5 occurs 5 times.
       procedure division.
           perform varying t-count from 1 by 1 until t-count > 5
               move t-count to the-table(t-count)
           end-perform.
           call "C_CProc" using t-table.
           stop run.

#include <stdio.h>

void CProc(int IntTable[5])
{
    int count;

    for (count = 0; count < 5; count++)
        printf("Array [%i]: %i\r\n", count, IntTable[count]);
}

Array [0]: 1
Array [1]: 2
Array [2]: 3
Array [3]: 4
Array [4]: 5

The following sample code demonstrates how to pass a two-dimensional array of long integers from COBOL to C.
LINT.CBL
      * Passing long integers from COBOL to C
      $set bound
       data division.
       working-storage section.
       01 I1 pic 9.
       01 J1 pic 9.
       01 t-table.
           02 t-field occurs 2 times.
               05 the-table pic 9(8) comp-5 occurs 3 times.
       procedure division.
           perform varying I1 from 1 by 1 until I1 > 2
               perform varying J1 from 1 by 1 until J1 > 3
                   move J1 to the-table(I1, J1)
               end-perform
           end-perform.
           call "_CProc" using t-table.
           stop run.

#include <stdio.h>

void CProc(long IntTable[2][3])
{
    int i, j;

    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++)
            printf("Array [%i,%i]: %ld\r\n", i, j, IntTable[i][j]);
}


Array [0,0]: 1
Array [0,1]: 2
Array [0,2]: 3
Array [1,0]: 1
Array [1,1]: 2
Array [1,2]: 3
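The output above shows that COBOL and C agree on layout but not on numbering: both store the table row by row (row-major), but COBOL subscripts start at 1 and C subscripts start at 0. A small sketch of the mapping for this 2-by-3 table; cobol_offset and c_element are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 3 };   /* occurs 2 times / occurs 3 times */

/* Position of COBOL element (i1, j1), both 1-based, counted in
   elements from the start of the row-major table. */
size_t cobol_offset(size_t i1, size_t j1)
{
    return (i1 - 1) * COLS + (j1 - 1);
}

/* The same element addressed with 0-based C subscripts. */
long *c_element(long table[ROWS][COLS], size_t i1, size_t j1)
{
    return &table[i1 - 1][j1 - 1];
}
```

So COBOL the-table(2, 1) and C IntTable[1][0] name the same long, the fourth in memory.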

The following sample code demonstrates how to pass a two-dimensional array of records from C to COBOL.
_COBPROC.CBL
       program-id. "_CobProc".
       data division.
       working-storage section.
       01 I1 pic 9.
       01 J1 pic 9.
       linkage section.
       01 the-table.
           02 t-table occurs 2 times.
               05 t-field occurs 3 times.
                   10 field1 pic 9(4) comp-5.
                   10 field2 pic x(6).
       procedure division using the-table.
           perform varying I1 from 1 by 1 until I1 > 2
               perform varying J1 from 1 by 1 until J1 > 3
                   display "table[" I1 "][" J1 "]: " field1(I1, J1)
                   display " " field2(I1, J1)
               end-perform
           end-perform.
           stop run.

#include <stdio.h>
#include <string.h>

struct TableStruc {             /* define structure */
    int TheInt;
    char String[6];             /* COBOL pic x(6): no room for a NULL */
} TheTable[2][3];

extern void far cdecl CobProc(struct TableStruc TheTable[2][3]);

void main()
{
    int i, j;
    char buf[7];                /* "[i][j]" plus the terminating NULL */

    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++) {   /* initialize structure */
            TheTable[i][j].TheInt = j;
            sprintf(buf, "[%1i][%1i]", i, j);
            memcpy(TheTable[i][j].String, buf, 6);  /* copy without the NULL */
        }
    CobProc(TheTable);          /* CALL COBOL routine */
}

table[1][1]: 00000
 [0][0]
table[1][2]: 00001
 [0][1]
table[1][3]: 00002
 [0][2]
table[2][1]: 00000
 [1][0]
table[2][2]: 00001
 [1][1]
table[2][3]: 00002
 [1][2]

The following sample code demonstrates how to pass integers by reference from COBOL to C.
COBINT.CBL
      * Passing integers by reference from COBOL to C
       working-storage section.
       01 passvar1 pic 9(4) comp-5 value 16384.
       01 passvar2 pic 9(4) comp-5 value 33.
       procedure division.
           display "Before the call to the C swapping function...".
           display "Passvar1 is equal to: " passvar1.
           display "Passvar2 is equal to: " passvar2.
           call "_SwapFunc" using by reference passvar1
                                  by reference passvar2.
           display "After the call to the C swapping function...".
           display "Passvar1 is equal to: " passvar1.
           display "Passvar2 is equal to: " passvar2.
           stop run.

/* Manipulates integers passed from a COBOL program */
#include <stdio.h>

void SwapFunc(int *var1, int *var2)
{
    int tmp;    /* Temporary value for use in swap */

    tmp = *var1;
    *var1 = *var2;
    *var2 = tmp;
    return;
}

Before the call to the C swapping function...
PassVar1 is equal to: 16384
PassVar2 is equal to: 00033
After the call to the C swapping function...
PassVar1 is equal to: 00033
PassVar2 is equal to: 16384

The following sample code demonstrates how to pass an integer from COBOL to C.
CBLINT.CBL
       working-storage section.
       01 pass-var pic 9(4) comp-5 value 3.
       procedure division.
           call "_Circum" using by value pass-var.
           display "Radius of circle: " pass-var.
           display "Circumference of circle: " return-code.
           stop run.

#include <stdio.h>

int Circum(int Radius)
{
    float cir;

    cir = 2 * 3.14159 * Radius;     /* circumference = 2 * pi * r */
    return((int) cir);
}

Radius of circle: 00003
Circumference of circle: +0018

The following sample code demonstrates returning a long integer from C to COBOL.
LINT.CBL
      $set rtncode-size(2)
       working-storage section.
       01 pass-var pic 9(4) comp-5.
       procedure division.
           display "Radius of circle?".
           accept pass-var.
           call "_Area" using by value pass-var.
           display "Area of circle: " return-code.
           stop run.

#include <stdio.h>

long Area(int Radius)
{
    float cir;

    cir = 3.14159 * Radius * Radius;
    return((long) cir);
}

Radius of circle?
1
Area of circle: +0003

The following sample code demonstrates how to pass a string from COBOL to C.
COBSTR.CBL
      * Passing a string from COBOL to C
       identification division.
       program-id. cobstr.
       data division.
       working-storage section.
       01 passvar pic x(15) value "Replace this".
       procedure division.
           display "This is what is passed: " passvar.
           call "_Funct" using passvar.
           display "This is what comes back: " passvar.
           stop run.

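#include <ctype.h>

#define FIELD_LEN 15    /* length of passvar: pic x(15) */

void Funct(char *Rvalue)
{
    char *cp;
    int i;

    /* The COBOL field has no NULL terminator, so stop after
       FIELD_LEN characters (or at a NULL, if one is present). */
    for (cp = Rvalue, i = 0; i < FIELD_LEN && *cp != '\0'; i++, ++cp)
        *cp = (char) toupper((unsigned char) *cp);
}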


This is what is passed: Replace this
This is what comes back: REPLACE THIS

The following samples demonstrate how to call a C 6.x routine from a COBOL 4.5 program, where the C function, in turn, spawns another COBOL 4.5 executable. Note: The COBOL program titled COB2.CBL must be compiled and linked as a standalone executable module. Use the following lines to compile and link this program:

      * How to call a C function that executes another COBOL
      * program.
       program-id. main.
       working-storage section.
       01 commandL.
           05 filler pic x(01) value 'S'.
           05 cmdlin pic x(124) value "COB2.EXE".
       procedure division.
           display "In COBOL program 1".
           call "_pcexec" using commandL.
           display "End of COBOL program 1".
           stop run.

#include <stdio.h>
#include <process.h>

pcexec (commandL)
char far commandL[125];
{
    printf ("Prior to C call of COB2.EXE \n");
    spawnl (P_WAIT, "COB2.EXE", "COB2", "spawnl", NULL);
    printf("After C call to COB2.EXE \n");
}

      * This program must be a separate executable.
       program-id. cob2.
       procedure division.
           display "Inside COBOL program 2".
           stop run.

In COBOL program 1
Prior to C call of COB2.EXE
Inside COBOL program 2
After C call to COB2.EXE
End of COBOL program 1

The following samples demonstrate how a COBOL 4.5 QuickWin application can call a Windows™-based DLL written in C 6.x.
MAIN.CBL
       working-storage section.
       77 Var1 pic 9(4) comp-5.
       77 Char pic x.
       procedure division.
           move 1 to Var1.
           display "Prior to DLL call: " at 0101
           display Var1 at 0120.
           call 'cdll' using by reference Var1.
           display "After DLL call: " at 0201.
           display Var1 at 0217.
           call "cbl_read_kbd_char" using Char.
           stop run.


#include <windows.h>

int FAR PASCAL LibMain(HANDLE hInstance, WORD wDataSeg,
                       WORD cbHeapSize, LPSTR lpszCmdLine)
{
    // Additional DLL initialization fits here
    if (cbHeapSize != 0)
        UnlockData(0);
    return (1);
}

int FAR PASCAL cdll(int _far *piIntPointer)
{
    if ((*piIntPointer >= -32768) && (*piIntPointer < 32767))
    {
        (*piIntPointer)++;
        return (1);
    }
    else
    {
        return (0);
    }
}

int FAR PASCAL WEP (int nParameter)
{
    if (nParameter == WEP_SYSTEM_EXIT)
    {
        return (1);
    }
    else
    {
        if (nParameter == WEP_FREE_DLL)
        {
            return (1);
        }
        else
        {
            return (1);
        }
    }
}


The Microsoft Overlay Virtual Environment (MOVE)
Microsoft Corporation Created: March 20, 1992

This article explains how the Microsoft® overlay virtual environment (MOVE) helps overcome memory limitations for programs that run in the MS-DOS® operating system. The article compares MOVE technology to conventional overlays and to paged virtual memory systems, and explains the basics of the technology.

Along with death and taxes, all programmers eventually share another misery: insufficient memory. Since the beginning of their profession, programmers have needed to cram too-big programs into too-little random-access memory (RAM). Programmers for MS-DOS® are further restricted by the infamous 640K limit; a program running on a 4 MB computer, for example, can directly execute only in the first 640K of RAM. Many techniques have been employed to overcome this limitation: optimizing compilers, interpreters, MS-DOS extenders, and so on. The most commonly used technique, overlays, is also one of the most cumbersome to use. The new Microsoft overlay virtual environment (MOVE) is a significant advance over previous overlay methods. MOVE is both easier to use and more effective than conventional overlay systems. In many ways, the MOVE technology combines the benefits of overlays and virtual memory. Some of the advantages of MOVE over conventional overlays are:

• The MOVE system keeps multiple overlays in memory at the same time. This makes devising efficient overlay structures much easier.
• Discarded overlays can be cached in extended memory (XMS) or expanded memory (EMS).
• MOVE supports pointers to functions.
• You do not need to modify your source code.
• The memory allocated for overlays can be set at program startup. Your program can adapt to different memory situations.

The MOVE technology can be used only in MS-DOS operating system programs. Programs in the Microsoft Windows™ graphical environment automatically take advantage of a similar mechanism built into Windows. The next three sections cover the basics of conventional overlays and virtual memory. If you're already familiar with these concepts, you can skip ahead to "MOVE Basics."

Overlay Basics
If you're not using overlays or other techniques, your program size cannot exceed available memory. When loading your program, MS-DOS copies the program's code and data segments into memory, starting at the first available memory location and continuing to the end of the program (see Figure 1).

Figure 1. Memory Map for a Nonoverlaid Application

With overlays, however, the entire program need not fit into memory at one time. A portion of the program, called the root, occupies the lower portion of available memory and works just like a nonoverlaid program. The other portions of the program, called overlays, have overlapping memory addresses. This trick is accomplished by keeping only one or a subset of these overlays in memory at one time. When you use overlays, the linker automatically includes a routine called the overlay manager in your program's EXE file. When the program calls a function located in another overlay, the overlay manager loads the necessary overlay into memory, overwriting the previous overlay (Figure 2).

Figure 2. Memory Map for an Overlaid Application

This way, a program can be many times larger than available memory; it only needs sufficient memory to hold the root and the largest overlay. In some overlay systems the overlays are included within the EXE file, whereas in others the overlays are separate files, usually with the OVL extension. You need not keep track of which overlay is in memory or which function is in which overlay; the overlay manager automatically handles loading the appropriate overlay when necessary.

Well, if overlays sound too good to be true, you're right; they have some drawbacks. They slow your program down, sometimes considerably. All that reloading of overlays from the disk can gum up the works. Reading an instruction from an overlay on the disk can be several thousand times slower than reading the instruction from an already-loaded overlay, so the speed of your program depends heavily on how the overlays are structured.

Ideal candidates for overlays are functions that are called only once during a program's execution, like initialization or error-handling routines. Routines that are used together should be grouped into the same overlay so that multiple overlays needn't be loaded to accomplish a task. The worst situation is caused by a tight inner loop calling routines in two different overlays. In cases like this, the computer spends more time loading overlays from disk than executing instructions. This phenomenon, called thrashing, is accompanied by grinding from your user's hard disk and groaning from your users.

Determining an efficient overlay structure is fiendishly difficult, an activity closer to art than to science. Your intuitions about who calls what, particularly in a large program, are often dead wrong. Even when you know which functions are involved in a particular task, it's still difficult to balance the performance hit with the need to reduce the required memory.
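To make the cost of thrashing concrete, here is a minimal sketch that models a conventional overlay manager with a single overlay region and counts how often it must read from disk. It is not taken from any real overlay manager; ConventionalOverlayMgr, mgr_call, and count_loads are hypothetical names chosen for illustration.

```c
/* Hypothetical model of a conventional overlay manager: there is only one
   overlay region, so loading an overlay evicts whatever was there before. */
typedef struct {
    int resident;          /* overlay currently in memory, -1 if none */
    unsigned long loads;   /* number of overlay loads from disk so far */
} ConventionalOverlayMgr;

void mgr_init(ConventionalOverlayMgr *m)
{
    m->resident = -1;
    m->loads = 0;
}

/* Called whenever the program calls a function located in overlay ovl. */
void mgr_call(ConventionalOverlayMgr *m, int ovl)
{
    if (m->resident != ovl) {
        m->resident = ovl;   /* read from disk, overwriting the old overlay */
        m->loads++;
    }
}

/* A tight loop that calls a routine in overlay a, then one in overlay b.
   Returns the number of disk loads the loop causes. */
unsigned long count_loads(int a, int b, unsigned long iterations)
{
    ConventionalOverlayMgr m;
    unsigned long i;

    mgr_init(&m);
    for (i = 0; i < iterations; i++) {
        mgr_call(&m, a);
        mgr_call(&m, b);
    }
    return m.loads;
}
```

With both routines in different overlays, count_loads(1, 2, 1000) returns 2000, one disk read per call; grouping them into the same overlay, count_loads(1, 1, 1000) returns 1.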

Example of Overlay Structure

Most programmers structure large projects into several source files, using one file for each major system in the program. For example, a hypothetical calendar program that allows the user to add appointments, view a date page, and print the calendar might be composed of the files listed below.

Source file   Purpose                                    Key routines
DATABASE.C    Read, write database appointment records   DatabaseInit, DatabaseReadRecord, DatabaseWriteRecord, DatabaseExit
DATAFORM.C    Show, get appointment data entry           DataFormEnter, DataFormShow
DATEUTIL.C    Various date routines                      DateDifference, DateGet, DateShow
INIT.C        Main initialization routine                InitializeApplication
MAIN.C        Main program file                          main, ShowMenu
PRINTER.C     Print appointments                         PrinterInit, PrinterWrite
STRUTIL.C     Various string routines                    StringGet, StringShow

An obvious overlay structure for the program keeps MAIN.C in the root and places each of the other source files in an overlay of its own.


Although this structure reduces memory requirements to a bare minimum, it is probably very slow. For example, the primitives in DATEUTIL.C and STRUTIL.C are used throughout the code, so these routines should be placed in the root.

As you analyze the call tree and optimize the overlay structure, you may find yourself putting more and more routines in the program's root. However, if you put too many routines in the root, your program will need nearly as much memory as the nonoverlaid version.

The initialization routines in INIT.C call the hypothetical routines DatabaseInit in DATABASE.C and PrinterInit in PRINTER.C. Although these routines thematically belong in DATABASE.C and PRINTER.C, they should be included in the INIT overlay for best performance. If you move too many routines from where they belong to where they are used, your program may run faster, but its source code will be harder to read and maintain.

A more balanced overlay structure is shown below.

Root
    MAIN.C
    DATEUTIL.C
    STRUTIL.C
Overlays
    1: DATABASE.C (except DatabaseInit)
    2: DATAFORM.C
    3: INIT.C (plus DatabaseInit from DATABASE.C)
              (plus PrinterInit from PRINTER.C)
       PRINTER.C (except PrinterInit)

Producing a good overlay structure requires lengthy and tedious trial-and-error work. As new capabilities are added to your program, the structure quickly becomes obsolete. Programmers working on a large system that contains hundreds of source files and thousands of functions often spend as much time tuning the overlay structure as they do writing code.

Paged Virtual Memory
Because working with overlays is so difficult, computer designers have come up with a radically different approach called paged virtual memory (VM). In a paged virtual memory system, the entire address space of the computer is divided into fixed-size blocks called pages. The address range of the processor can be significantly larger than the memory physically contained in the computer; therefore, only a fraction of the page addresses represent actual memory addresses. The programmer doesn't have to worry about the amount of memory in a computer that has VM.

All addresses used in a VM program are virtual addresses. The computer's virtual memory manager maps virtual page addresses to the physical addresses of memory. When a program needs a virtual memory page that is not mapped to a physical page in memory, the virtual memory manager copies the contents of that page from disk to a page of physical memory. The operating system maps the virtual address of the page to the physical address of the page's contents. This way, when the program reads from a particular virtual address, the computer's VM mapping scheme ensures that the program reads from the appropriate physical page. Physical memory need not hold all of a program's pages at once. The more physical pages available, the less disk activity needed and the faster the program runs. The operating system's VM manager handles loading pages from the disk, swapping modified pages to the disk, and translating virtual addresses to physical addresses.

Virtual memory has several advantages over overlays. First, it does not require programmer effort and eliminates the tedious process of creating overlay structures. Second, the program performs efficiently regardless of the amount of memory the user's computer contains. Most of the program's execution time is spent in a small fraction of the code. As the program executes, pages containing this core code replace pages with less critical code. The set of pages that make up the often-used code is called the program's working set. If the working set can fit in the computer's physical memory, the program executes efficiently and swaps pages only occasionally for infrequently used routines. If the working set cannot fit in the computer's memory, the computer thrashes, spending more time loading code from the disk than executing the program.

Of course, VM is no panacea either. First, the virtual memory manager and the address translation scheme must be part of the computer hardware. The more powerful members of the Intel® CPU family, particularly the 80386 and higher, support address translation. Less powerful CPUs, however, do not support this feature. Second, the virtual memory manager must realistically be an integral part of the operating system. MS-DOS does not support virtual memory.
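The translation step described above can be sketched in a few lines. This is a simplified model, assuming 4K pages and a hypothetical machine with 16 virtual pages and 4 physical frames; the names Vm, vm_init, and vm_translate are illustrative, and the first-in, first-out replacement policy is an arbitrary choice for brevity (the article does not specify one). Real hardware such as the 80386 uses multilevel page tables.

```c
#include <stdint.h>

#define PAGE_SIZE      4096u   /* assumed page size */
#define VIRTUAL_PAGES  16u     /* hypothetical virtual address space */
#define PHYSICAL_PAGES 4u      /* hypothetical physical memory */

typedef struct {
    int present;               /* 1 if the virtual page is mapped to a frame */
    unsigned frame;            /* physical frame number when present */
} PageTableEntry;

typedef struct {
    PageTableEntry table[VIRTUAL_PAGES];
    int owner[PHYSICAL_PAGES]; /* virtual page held by each frame, -1 if free */
    unsigned next_victim;      /* FIFO replacement pointer */
    unsigned long faults;      /* pages read from disk so far */
} Vm;

void vm_init(Vm *vm)
{
    unsigned i;
    for (i = 0; i < VIRTUAL_PAGES; i++)
        vm->table[i].present = 0;
    for (i = 0; i < PHYSICAL_PAGES; i++)
        vm->owner[i] = -1;
    vm->next_victim = 0;
    vm->faults = 0;
}

/* Translate a virtual address to a physical one, handling a page fault
   if needed. Returns UINT32_MAX for an address outside the space. */
uint32_t vm_translate(Vm *vm, uint32_t vaddr)
{
    uint32_t page = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;

    if (page >= VIRTUAL_PAGES)
        return UINT32_MAX;

    if (!vm->table[page].present) {
        unsigned frame = vm->next_victim;
        vm->next_victim = (vm->next_victim + 1) % PHYSICAL_PAGES;
        if (vm->owner[frame] >= 0)            /* evict the frame's old page */
            vm->table[vm->owner[frame]].present = 0;
        /* A real VM manager would write the old page to disk if modified
           and read the new page's contents from disk at this point. */
        vm->owner[frame] = (int)page;
        vm->table[page].present = 1;
        vm->table[page].frame = frame;
        vm->faults++;
    }
    return (uint32_t)vm->table[page].frame * PAGE_SIZE + offset;
}
```

The first access to virtual page 5 faults and maps it to frame 0, so 0x5123 translates to 0x0123; later accesses within the same page translate without touching the disk.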

MOVE Basics
Microsoft's new MOVE overlay technology has the best of both the overlay and virtual memory worlds. MOVE is an overlay system but has significant advantages over conventional overlays. Unlike conventional overlays, MOVE allows more than one overlay to reside in memory simultaneously. Like virtual memory, the MOVE memory manager keeps resident as many overlays as will fit. Each overlay need not fully cover a single task; two or three overlays can cooperate to complete the task.

When loading a new overlay, MOVE discards the least recently used (LRU) overlay. If there is still insufficient room for the new overlay, MOVE discards the next least recently used overlay, and so on. With MOVE you can make your overlays smaller and more modular, letting the LRU algorithm determine which overlays stay in memory. Some of your overlays may remain in memory because they are needed for the normal operation of the program. This working set of overlays is similar to the working set of pages in a virtual memory system.

Like virtual memory, MOVE programs naturally configure themselves for efficient operation on a given computer. Unlike virtual memory, however, you are not limited to fixed-size pages; you can group functions for better control. For example, if function A is called each time function B is called and only when function B is called, you can group A and B in the same overlay to save the disk time of loading them separately.

MOVE Mechanics
You don't need to modify your C source code to create a MOVE application, but you do need to modify your CL and LINK command lines. These changes are described in the "Creating Overlaid Programs" section. Like a nonoverlaid program, a MOVE application has a single EXE file. The EXE file contains the root and all overlays. The file also contains the overlay manager routines (about 5K), which are automatically added by the linker. When a MOVE application is launched, the program's startup routine allocates a memory area to store the overlays. This area, called the overlay heap, is distinct from the regular heap used for malloc. When your application calls a function in an overlay that is not currently loaded in RAM, the MOVE manager must read the overlay from disk and copy its contents to the overlay heap before program execution can continue. If the heap does not have enough free space to hold the requested overlay, the MOVE manager discards one or more of the currently resident overlays. The least recently used overlay is discarded first. Because overlays can vary in size, the MOVE manager may have to discard multiple overlays to make sufficient room for the requested overlay. If your program is running on a computer with EMS or XMS memory, the MOVE manager can create an overlay cache for copying discarded overlays. The program cannot execute overlays directly from this cache because the cache resides above the 640K limit. If a discarded overlay is needed again, the manager copies it from the overlay cache to the overlay heap rather than reading it from the disk. Because reading from the cache is much faster than reading from the disk, the space for your working set is effectively the cache size plus the heap size. The overlay manager routines maintain the overlay cache with an LRU algorithm in a manner similar to the overlay heap.
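The load path described above (heap first, then cache, then disk, with LRU discards) can be simulated in plain C. This is a hypothetical model, not MOVE's source: MoveSim, move_init, and move_call are invented names, sizes are in paragraphs, heap fragmentation is ignored (only total free space is checked, whereas the real manager needs contiguous space), and the cache is assumed never to fill up.

```c
#define MAX_OVL 16

typedef struct {
    int count;                       /* number of overlays (<= MAX_OVL) */
    unsigned size[MAX_OVL];          /* size of each overlay, in paragraphs */
    int in_heap[MAX_OVL];            /* 1 if resident in the overlay heap */
    int in_cache[MAX_OVL];           /* 1 if a copy is in the EMS/XMS cache */
    unsigned long stamp[MAX_OVL];    /* time of last use, for LRU */
    unsigned heap_cap, heap_used;    /* overlay heap capacity and usage */
    int cache_enabled;               /* 1 if EMS or XMS memory is present */
    unsigned long clock;             /* logical time, bumped on every call */
    unsigned long disk_loads, cache_loads;
} MoveSim;

void move_init(MoveSim *s, const unsigned *sizes, int count,
               unsigned heap_cap, int cache_enabled)
{
    int i;
    s->count = count;
    for (i = 0; i < count; i++) {
        s->size[i] = sizes[i];
        s->in_heap[i] = 0;
        s->in_cache[i] = 0;
        s->stamp[i] = 0;
    }
    s->heap_cap = heap_cap;
    s->heap_used = 0;
    s->cache_enabled = cache_enabled;
    s->clock = 0;
    s->disk_loads = 0;
    s->cache_loads = 0;
}

/* A call into overlay ovl. Returns 0 on success, -1 if the overlay
   cannot fit even in an empty heap. */
int move_call(MoveSim *s, int ovl)
{
    s->clock++;
    if (!s->in_heap[ovl]) {
        /* Discard least recently used overlays until the new one fits. */
        while (s->heap_cap - s->heap_used < s->size[ovl]) {
            int i, victim = -1;
            for (i = 0; i < s->count; i++)
                if (s->in_heap[i] &&
                    (victim < 0 || s->stamp[i] < s->stamp[victim]))
                    victim = i;
            if (victim < 0)
                return -1;                 /* heap smaller than the overlay */
            s->in_heap[victim] = 0;
            s->heap_used -= s->size[victim];
            if (s->cache_enabled)
                s->in_cache[victim] = 1;   /* an existing cached copy is kept */
        }
        /* Copy from the cache if possible; otherwise read from disk. */
        if (s->in_cache[ovl])
            s->cache_loads++;
        else
            s->disk_loads++;
        s->in_heap[ovl] = 1;
        s->heap_used += s->size[ovl];
    }
    s->stamp[ovl] = s->clock;
    return 0;
}
```

With three 4-paragraph overlays called in rotation and an 8-paragraph heap, two rounds of calls produce 3 disk loads and 3 cache loads; without a cache, the same calls need 6 disk loads, and with a 12-paragraph heap all three overlays stay resident after the first round.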

Heap and Cache Management
The MOVE overlay manager is responsible for loading requested overlays from the disk or cache and copying them to the heap. If there is insufficient contiguous heap space for the requested overlay, the MOVE manager discards the LRU overlay from the heap and checks for contiguous space again. If space is still insufficient, the MOVE manager discards the next LRU overlay and repeats these steps until sufficient contiguous space is available.

At program startup, the MOVE manager attempts to allocate an overlay heap equal to the sum of the program's three largest overlays. If space is insufficient or there are fewer than four program overlays, MOVE allocates a heap that is the size of the largest overlay. The remaining free memory is retained for the conventional (malloc) heap. (This is the default initialization behavior; you can replace it with another scheme if desired.)
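The default startup sizing rule above can be expressed as a small function. This sketch follows the prose, not the actual MOVEINIT.C source; default_heap_request is a hypothetical name, sizes and available are in 16-byte paragraphs, and available stands for the free conventional memory at startup.

```c
/* Sketch of the default heap-sizing rule. Returns the requested heap size
   in paragraphs. (If even the largest overlay does not fit, MOVE ends with
   a run-time error; that case is not modeled here.) */
unsigned default_heap_request(const unsigned *sizes, int count,
                              unsigned available)
{
    unsigned top[3] = {0, 0, 0};   /* three largest sizes, descending */
    unsigned long sum;
    int i;

    for (i = 0; i < count; i++) {
        unsigned s = sizes[i];
        if (s > top[0])      { top[2] = top[1]; top[1] = top[0]; top[0] = s; }
        else if (s > top[1]) { top[2] = top[1]; top[1] = s; }
        else if (s > top[2]) { top[2] = s; }
    }
    sum = (unsigned long)top[0] + top[1] + top[2];

    /* Fewer than four overlays, or not enough room: use the largest one. */
    if (count < 4 || sum > available)
        return top[0];
    return (unsigned)sum;
}
```

For overlays of 100, 300, 200, 50, and 250 paragraphs with ample memory, the request is 300 + 250 + 200 = 750 paragraphs; if only 500 paragraphs are free, it falls back to 300.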

If the program is running on a computer with EMS or XMS memory, the MOVE manager attempts to allocate an overlay cache three times the size of the overlay heap. If there is not enough memory for a cache this size, all EMS or XMS memory is used. When the MOVE manager discards an overlay from the heap, it does not copy the overlay to the cache if a copy of the overlay is already in the cache.

Individual overlays can be up to 64K in size but are usually much smaller. Overlays can be individual OBJ files, as in a conventional overlay system, or they may contain a list of functions. With large overlays, your program's performance will suffer the problems associated with conventional overlays; on the other hand, your overlays should be large enough to justify the time it takes to load them from disk. Specifics vary depending on your program, and experimentation will help you find the optimal overlay size and organization. For most programs, an optimal overlay size is about 4K.

If your overlaid program temporarily needs the EMS or XMS memory occupied by the cache, you can call the _movepause function of the MOVE application programming interface (API) to release the cache memory and _moveresume to restore the cache. This is particularly useful if your program spawns another program that needs EMS or XMS memory to function. The MOVE API functions are described in Appendix A.

How Does MOVE Work?
One aspect of MOVE seems quite mysterious until you know how it works. How does the overlay manager know when it needs to load an overlay? How do calls to overlaid functions know where to branch in the overlay heap? This magic is accomplished by inserting an additional link between the function and its callers. This link, called a thunk, works like an additional function call. One thunk data structure is created in the root for each far function contained in the overlays. The thunk data structure contains the overlay number containing the function and the offset of the function's entry point within the overlay. The linker modifies all function calls to overlaid functions so that they call the thunk instead of directly calling the function. When a function calls the thunk, the MOVE manager locates the appropriate overlay in the heap or loads the overlay from the cache or disk and jumps to the offset specified in the thunk.
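The thunk mechanism can be modeled in portable C to show the indirection. In this hypothetical sketch, an "overlay image" is an array of function pointers and a thunk's offset is an index into it, whereas real MOVE thunks hold a byte offset and jump to it; Thunk, call_through_thunk, and ensure_loaded are invented names, and discarding is not modeled.

```c
#include <stddef.h>

typedef int (*OverlayFn)(int);

/* One thunk per far function in an overlay; the table lives in the root. */
typedef struct {
    int overlay;   /* overlay number containing the function */
    int offset;    /* entry point within the overlay (an index here) */
} Thunk;

static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

/* Two tiny "overlays" as they would look once loaded into the heap. */
static OverlayFn overlay1[] = { twice };
static OverlayFn overlay2[] = { square };
static OverlayFn *overlay_image[] = { NULL, overlay1, overlay2 };

int loaded[3];       /* 1 if overlay n is resident (index 0 unused) */
int overlay_loads;   /* how many times the manager had to load an overlay */

/* Stand-in for the overlay manager: make sure the overlay is resident.
   Real MOVE would find it in the heap or copy it from cache or disk. */
static void ensure_loaded(int overlay)
{
    if (!loaded[overlay]) {
        loaded[overlay] = 1;
        overlay_loads++;
    }
}

/* The linker redirects every call to an overlaid function to its thunk. */
int call_through_thunk(const Thunk *t, int arg)
{
    ensure_loaded(t->overlay);
    return overlay_image[t->overlay][t->offset](arg);
}

const Thunk thunk_twice  = { 1, 0 };
const Thunk thunk_square = { 2, 0 };
```

A caller that once wrote twice(21) now effectively executes call_through_thunk(&thunk_twice, 21), which loads overlay 1 on first use and then jumps to the entry point.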

Creating Overlaid Programs
You create a MOVE application by following the same edit-compile-link development cycle used for all C programs. (The old syntax, link a+(b)+(c), is also supported.) You will need to create an additional file, called a DEF file, for each application. The DEF file is used by the linker and specifies the makeup of the root and of each overlay. A sample DEF file for the hypothetical calendar program is shown below:
EXETYPE DOS
;FUNCTIONS:init DatabaseInit PrinterInit

; Place main, strutil, and dateutil in the root.
FUNCTIONS:0 _main
FUNCTIONS:0 _strutil
FUNCTIONS:0 _dateutil
FUNCTIONS:1 _database
FUNCTIONS:2 _dataform
FUNCTIONS:3 _init
FUNCTIONS:3 _printer

For more information on the syntax of DEF files, see "Creating Overlaid MS-DOS Programs" and "Creating Module Definition Files" in the C/C++ Environment and Tools manual.

The first step in creating a MOVE application is to determine an appropriate overlay structure. For most programs, a good starting point is to place each OBJ file in its own overlay. The program entry point must be in the root; the normal startup sequence is _astart followed by main. OBJ files containing universally called primitives should be placed in the root as well.

MOVE gives you control over the placement of individual functions. Instead of moving a function's source code physically to another file, you specify the function in a FUNCTIONS statement in your application's DEF file. A function can be specified in this way only if it is a packaged function. Functions are packaged by specifying the /Gy switch during compilation. For more information on packaging functions, see "CL Command Reference" and "Creating Overlaid MS-DOS Programs" in the C/C++ Environment and Tools manual.

Optimizing Overlaid Programs
After you've created a MOVE program, you can run it under different memory conditions, assess its performance, and compare the performance of different overlay sizes and structures. A MOVE feature called tracing can help you optimize your overlays. Tracing a MOVE application generates a log file during program execution. The log file contains an entry for each load and discard of an overlay. A separate MS-DOS utility called TRACE reports and summarizes the information in trace log files. The TRACE utility is discussed in Appendix C. For more information on tracing, see "Creating Overlaid MS-DOS Programs" in the C/C++ Environment and Tools manual. Future versions of MOVE will include enhanced tools that make designing and optimizing the overlay structure easier. You can modify some of the characteristics of the MOVE manager. For example, you can change the amount of memory MOVE allocates for the overlay heap and cache by changing the constants and heuristics in the MOVEINIT.C file. For more information, see "Creating Overlaid MS-DOS Programs" in the C/C++ Environment and Tools manual.

Appendix A: The MOVE API
The MOVE API is provided in a library called MOVE.LIB. This library is a component of the C combined libraries for medium and large models. (Another form of the library, MOVETR.LIB, also contains the MOVE API; see Appendix C.) The MOVE API is declared in the MOVEAPI.H file, which is available on disk. This appendix describes MOVE routines and functionality.

The _moveinit Function
MOVE begins an overlaid program with a call to _moveinit, which calculates the heap and cache needed for the overlays and allocates memory for the heap and cache. You can use the default _moveinit function provided in MOVE.LIB, or you can write your own version of _moveinit and link it to your program. The source code for the default _moveinit function is available in the MOVEINIT.C file. The _moveinit call occurs before the call to _astart that begins a C program and performs initialization. For this reason, do not call C run-time routines from any version of _moveinit. The following functions are called from _moveinit:
• _movesetheap
• _movegetcache
• _movesetcache
• _movetraceon (only in MOVETR.LIB)

The functions are described in the sections below. In addition, LINK creates several variables that begin with $$; these variables are described in the "LINK Variables" section.

Heap Allocation
The _movesetheap function sets the overlay heap size.

extern unsigned short __far __cdecl _movesetheap(
    unsigned short maxovl,
    unsigned short minheap,
    unsigned short reqheap );

where:

maxovl    is the maximum number of overlays. The $$COVL variable always contains this value.
minheap   is the minimum heap size, specified in 16-byte paragraphs. The heap must be at least the size of the largest overlay. To calculate overlay sizes, use $$MPOVLSIZE as in MOVEINIT.C.
reqheap   is the requested heap size, specified in 16-byte paragraphs. The default _moveinit function requests the sum of the sizes of the three largest overlays.


MOVE attempts to allocate the requested amount of memory. If that much memory is not available, MOVE tries to allocate as much as possible. If the amount of available memory is less than the minimum heap requested, MOVE ends the program and issues a run-time error.
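The allocation rule just described can be restated as a pure function, together with the byte-to-paragraph conversion the minheap and reqheap parameters require. Both names are hypothetical and this is not MOVE code; a return value of 0 stands for the case in which MOVE ends the program with a run-time error.

```c
/* Paragraphs granted for the overlay heap under the rule above, or 0 when
   less than minheap is available. All values are 16-byte paragraphs. */
unsigned heap_grant(unsigned available, unsigned minheap, unsigned reqheap)
{
    if (available >= reqheap)
        return reqheap;      /* the full request is satisfied */
    if (available >= minheap)
        return available;    /* as much as possible */
    return 0;                /* below the minimum: run-time error */
}

/* Convert a size in bytes to 16-byte paragraphs, rounding up. */
unsigned long bytes_to_paragraphs(unsigned long bytes)
{
    return (bytes + 15) / 16;
}
```

A 64K overlay, for example, occupies bytes_to_paragraphs(65536) = 4096 paragraphs.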

Cache Allocation
The _movegetcache function determines the amount of memory available for a cache.

extern void __far __cdecl _movegetcache(
    unsigned short __far *expmem,
    unsigned short __far *extmem );

where:

*expmem   is available expanded memory, in kilobytes.
*extmem   is available extended memory, in kilobytes.

The _movesetcache function allocates expanded and extended memory for an overlay cache.

extern unsigned short __far __cdecl _movesetcache(
    unsigned short expmem,
    unsigned short extmem );

where:

expmem    is the requested amount of expanded memory, specified in kilobytes.
extmem    is the requested amount of extended memory, specified in kilobytes.

The default _moveinit function requests a cache equal to the sum of all overlays. If _movesetcache cannot allocate the requested amount of memory, it sets a bit in the return value. MOVEAPI.H defines the following constants to represent bits in the return value.

Constant                 Bit   Description
__MOVESETCACHE_ERR_NO    0     No error
                               Cannot allocate extended memory
                               Cannot allocate expanded memory


The _movesetcache function sets the following global variables when the overlay cache is allocated:
extern unsigned short __far __cdecl _moveckbxms;
extern unsigned short __far __cdecl _moveckbems;

The _moveckbxms variable is set to the size of the allocated extended memory. The _moveckbems variable is set to the size of the allocated expanded memory.

Freeing and Reallocating Cache Memory
You can temporarily release and then restore the memory allocated for the overlay cache. This is useful when your program spawns another program that uses extended or expanded memory or when you want to prepare for a possible abnormal exit from your program.

The _movepause function frees the cache memory and closes the executable file.

extern void __far __cdecl _movepause( void );

The _moveresume function reallocates memory for the overlay cache and reopens the executable file.

extern void __far __cdecl _moveresume( void );

MOVEAPI.H defines the following variables for use by these functions:

extern unsigned short __far __cdecl _movefpause;
extern unsigned short __far __cdecl _movefpaused;

MOVEAPI.H also defines constants to represent bits in _movefpause and _movefpaused as follows.

Constant             Bit   Description
__MOVE_PAUSE_DISK    2     Represents the executable file
__MOVE_PAUSE_CACHE   4     Represents the cache memory

The _movepause function reads the value in _movefpause and sets _movefpaused to the value of the action taken by _movepause. Before you call _movepause, set the __MOVE_PAUSE_DISK bit in _movefpause to close the file, and set the __MOVE_PAUSE_CACHE bit to free the cache, as in:

_movefpause |= __MOVE_PAUSE_DISK;
_movefpause |= __MOVE_PAUSE_CACHE;
_movepause();

The _moveresume function reads the value in _movefpaused and then clears _movefpaused. The overlays that were in the heap and cache are not restored. Therefore, after a call to _moveresume, the program may at first run slightly more slowly as it makes calls to routines in overlays.
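The request/result flag protocol can be illustrated without the MOVE library. The following sketch defines local stand-ins (PAUSE_DISK, PAUSE_CACHE, sim_pause, sim_resume, all hypothetical) that reuse the bit values from the table above; it models only the bookkeeping and assumes that _movefpaused records which requested actions were actually performed. In a real MOVE program you would set _movefpause, call _movepause before spawning a child process, and call _moveresume after it returns.

```c
#define PAUSE_DISK  2u    /* like __MOVE_PAUSE_DISK: executable file */
#define PAUSE_CACHE 4u    /* like __MOVE_PAUSE_CACHE: cache memory */

unsigned sim_fpause;      /* requested actions, like _movefpause */
unsigned sim_fpaused;     /* actions taken, like _movefpaused */
int exe_open = 1;         /* executable file currently open */
int cache_allocated = 1;  /* EMS/XMS cache currently allocated */

/* Modeled on _movepause: read the request and record what was done. */
void sim_pause(void)
{
    sim_fpaused = 0;
    if ((sim_fpause & PAUSE_DISK) && exe_open) {
        exe_open = 0;                 /* close the executable file */
        sim_fpaused |= PAUSE_DISK;
    }
    if ((sim_fpause & PAUSE_CACHE) && cache_allocated) {
        cache_allocated = 0;          /* free the cache memory */
        sim_fpaused |= PAUSE_CACHE;
    }
}

/* Modeled on _moveresume: undo what was paused, then clear the flags.
   Overlays that were in the heap and cache are not restored. */
void sim_resume(void)
{
    if (sim_fpaused & PAUSE_DISK)
        exe_open = 1;
    if (sim_fpaused & PAUSE_CACHE)
        cache_allocated = 1;
    sim_fpaused = 0;
}
```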

LINK Variables
LINK creates the following variables:

Variable        Description
$$MAIN          Entry point to an overlaid program. In a C program, this is defined to be __astart.
$$OVLTHUNKBEG   Beginning of the interoverlay call (thunk) table.
$$OVLTHUNKEND   End of the interoverlay call table.
$$CGSN          Number of global segments. Each object file contributing to an overlay takes up one global segment number (GSN). Each COMDAT (packaged function) segment takes up one GSN.
$$COVL          Number of overlays. Each overlay can contain several GSNs.
                Map of GSNs to segment displacements in an overlay.
                Map of GSNs to overlay numbers.
                Map of overlay numbers to logical file addresses of overlays in the executable file.
$$MPOVLSIZE     Map of overlay numbers to overlay image sizes (the size of the code actually loaded into the overlay heap).
                Overlay interrupt number.

Appendix B: MOVE Environment Variables
You can use environment variables at run time to specify the size of the requested overlay heap and overlay cache and the maximum number of overlays. The _moveinit function given in MOVEINIT.C provides environment support; you can compile this function and link it with your program. (MOVETR.LIB includes a version of _moveinit that already contains environment support.) First, enable environment support by compiling MOVEINIT.C with MOVE_ENV defined. Then specify the resulting MOVEINIT.OBJ when linking your program. With MOVE_ENV defined, MOVEAPI.H declares the following variable:
extern unsigned short __far __cdecl _movesegenv;

Compiling for environment support causes MOVEINIT.C to define a function called _movegetenv. The environment-support version of _moveinit uses _movegetenv to get the values of the following environment variables:

Variable    Description
MOVE_HEAP   Requested heap (paragraphs)
MOVE_COVL   Maximum number of overlays
MOVE_EMS    Requested expanded-memory cache (paragraphs)
MOVE_XMS    Requested extended-memory cache (paragraphs)

To use these variables, set them to strings that represent the desired settings. Each string must consist of exactly four hexadecimal digits.
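The four-hex-digit rule is strict, so a validating parser is a useful mental model. This sketch is not the MOVEINIT.C code; parse_move_setting and get_move_heap are hypothetical names, and only standard C library calls (isxdigit, isdigit, toupper, getenv) are used.

```c
#include <ctype.h>
#include <stdlib.h>

/* Parse a MOVE_* setting, which must be exactly four hexadecimal digits.
   Returns 1 and stores the value on success, 0 if the string is malformed. */
int parse_move_setting(const char *s, unsigned *value)
{
    unsigned v = 0;
    int i;

    if (s == NULL)
        return 0;                    /* variable not set */
    for (i = 0; i < 4; i++) {
        int c = (unsigned char)s[i];
        if (!isxdigit(c))
            return 0;                /* also rejects strings shorter than 4 */
        v = v * 16 + (unsigned)(isdigit(c) ? c - '0' : toupper(c) - 'A' + 10);
    }
    if (s[4] != '\0')
        return 0;                    /* longer than four digits */
    *value = v;
    return 1;
}

/* Read MOVE_HEAP, a requested heap size in paragraphs. */
int get_move_heap(unsigned *paragraphs)
{
    return parse_move_setting(getenv("MOVE_HEAP"), paragraphs);
}
```

For example, SET MOVE_HEAP=0800 at the MS-DOS prompt requests 0x800 = 2,048 paragraphs, or 32K; a value such as 800 is rejected because it has only three digits.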

Appendix C: The TRACE Utility
You can optimize the overlays in your program with the help of the tracing form of the MOVE library (MOVETR.LIB) and the Microsoft MOVE trace utility (TRACE) version 1.0. MOVETR.LIB contains MOVE.LIB and additional routines for tracing overlay behavior. Create a tracing version of your program as described in the following sections. When you run your program, the tracing functions create a binary file called MOVE.TRC in the directory from which the program is run. After your program ends, use TRACE to read MOVE.TRC. If the tracing results indicate that some functions cause overlays to be swapped frequently, you can reorganize the functions in the overlays by using statements in the module definition file.

Creating a Tracing Version of an Overlaid Program
To create a program that will trace overlay performance, specify MOVETR.LIB in LINK's libraries field. This causes LINK to use the MOVETR.LIB library instead of the MOVE.LIB component of the default combined library. Use LINK's /NOE option to prevent conflicts between MOVETR.LIB and the combined library. If you explicitly specify the combined library in the libraries field, list MOVETR.LIB before the combined library.

The Trace Functions
By default, tracing is in effect during the entire run of your program. You do not need to make any changes in your program to enable tracing. However, MOVETR.LIB provides two functions that you can use to turn tracing on and off within your program.

The _movetraceon function turns on tracing.

extern void __far __cdecl _movetraceon( void );

This function opens the MOVE.TRC file and activates tracing. During tracing, information about overlay behavior is written to MOVE.TRC. The default _moveinit function calls _movetraceon at the start of the program if MOVE_PROF is defined; this definition is in MOVETR.LIB.

The _movetraceoff function turns off tracing and closes MOVE.TRC.

extern void __far __cdecl _movetraceoff( void );

The tracing functions are declared in MOVEAPI.H. They are defined only in MOVETR.LIB.

Running TRACE
To run TRACE, use the following syntax:

TRACE [options] [tracefile]

The tracefile is the MOVE.TRC file created during a tracing session. You can specify a path with the filename. If tracefile is not specified, TRACE looks in the current directory for a file called MOVE.TRC.

An option is preceded by an option specifier, either a forward slash (/) or a dash (–). Options are not case sensitive. An option can be abbreviated to its initial letter. Options can appear anywhere on the command line. TRACE options are:

/SUM    Displays a summary of the program's performance. If /SUM is not specified, TRACE displays the entire tracing session. For details, see the "TRACE Performance Summary" section. If /SUM is specified, /EXE and /MAP have no effect.

/EXE    Allows TRACE to read the executable file that was traced and to extract function names for use in the trace output. Specify the filename of the executable file that generated the MOVE.TRC file. You can specify a path with the filename. If /EXE is not specified, the trace output refers to functions by overlay number and offset. The program must contain Microsoft Symbolic Debugging Information that is compatible with Microsoft CodeView® version 4.0. To include debugging information, create the object file using the /Zi option and link the program using the /CO option.

/HELP   Displays a usage statement.

/?      Displays a usage statement.

TRACE Output
TRACE displays information on the tracing session to the standard output device. You can use the redirection operator (>) to save the output in a file. The output is in table format. Each line of output represents an interoverlay transaction. A line of information is organized into the following fields:

• The overlay to which to return from the current transaction. (If blank, the overlay in the previous line is implied.)
• The physical return address in segment:offset form. (If blank, the address in the previous line is implied.)
• The transaction type, which is one of the following:
    • Present
    • Load from disk
    • Load from expanded memory
    • Load from extended memory
    • Discard from heap
    • Cache to expanded memory
    • Cache to extended memory
    • Discard from cache
    • Invalid
• The overlay that is the object of the transaction.
• The segment in memory where the transaction overlay is loaded.
• The interoverlay operation, which is one of the following:
    • Call function, in which function is:
        • An overlay number and an offset in default output
        • A function name if /EXE is used
        • A decorated function name if /EXE and /MAP are used
    • Return
  If blank, the Call in the previous line is implied.

TRACE Performance Summary
When you run TRACE with the /SUM option, TRACE displays a summary of overlay performance to the standard output device. The full session is not displayed. You can use the redirection operator (>) to save the output in a file. The summary information is organized into the following fields.

OVERALL
    calls                        Sum of Call operations
    returns                      Sum of Return operations
HEAP
    discards                     Sum of "Discard from heap" transactions
    discards / entries           Discards as percent of (calls + returns)
    loads from disk              Sum of "Load from disk" transactions
    loads from expanded memory   Sum of "Load from expanded memory" transactions
    loads from extended memory   Sum of "Load from extended memory" transactions
CACHE
    discards                     Sum of "Discard from cache" transactions
    discards / entries           Discards as percent of (calls + returns)
    caches to expanded memory    Sum of "Cache to expanded memory" transactions
    caches to extended memory    Sum of "Cache to extended memory" transactions
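The "discards / entries" figure is simple arithmetic on the other totals. This sketch shows how it is derived; TraceTotals and discard_percent are hypothetical names, not part of TRACE. A high heap percentage suggests that overlays are being evicted and reloaded often, which is the thrashing the overlay structure should be tuned to avoid.

```c
/* Totals as they might be read from a TRACE summary. */
typedef struct {
    unsigned long calls;
    unsigned long returns;
    unsigned long heap_discards;
    unsigned long cache_discards;
} TraceTotals;

/* Discards as a percentage of (calls + returns). */
double discard_percent(unsigned long discards, const TraceTotals *t)
{
    unsigned long entries = t->calls + t->returns;
    if (entries == 0)
        return 0.0;              /* nothing was traced */
    return 100.0 * (double)discards / (double)entries;
}
```

With 150 calls, 150 returns, and 30 heap discards, the heap "discards / entries" value is 30 / 300 = 10 percent.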

TRACE Errors
TRACE issues the following errors and warnings.

TR1001  Invalid filename for /EXE
        The string specified with the /EXE option was not a valid filename.

TR1005  Missing filename for /EXE
        The /EXE option must be followed by a colon and a filename, with no spaces in between.

TR1007  Unrecognized option
        The command line contained an option specifier, either a forward slash (/) or a dash (–), followed by a string that was not recognized as a TRACE option.

TR1010  Cannot find trace file
        One of the following occurred:
        • A trace file was specified on the command line, but the specified file does not exist.
        • No trace file was specified on the command line and TRACE assumed a trace file called MOVE.TRC, but MOVE.TRC does not exist.

TR1011  Error opening/reading .EXE file
        TRACE either failed to find the executable file specified with /EXE or encountered an error while opening the file.

TR1012  Out of memory
        The available memory is insufficient for the size of the program being traced.

TR1013  Invalid debugging information
        The debugging information contained in the executable file was not packed using CVPACK version 4.0.

TR4001  Cannot find function name
        TRACE could not find a function name to display. TRACE continues to generate output without displaying the function name. Function names are displayed when the /EXE option is specified. Either the executable file contains corrupt debugging information or a module in the executable file was compiled without the /Zi option for including debugging information.

TR4002  Missing debugging information for module
        TRACE could not find a symbol to correspond to a given physical address. A module may have been compiled without the /Zi option for including debugging information.

Microsoft Windows and the C Compiler Options
Dale Rogerson
Microsoft Developer Network Technology Group
Mr. Rogerson is widely known for having reported the largest number of duckbilled platypus sightings in the greater Seattle area.
Created: May 5, 1992


One of the key issues in the development and design of commercial applications is optimization—how to make an application run quickly while taking up as little memory as possible. Although optimization is a goal for all applications, the Microsoft® Windows™ graphical environment presents some unique challenges. This article provides tips and techniques for using the Microsoft C version 6.0 and C/C++ version 7.0 compilers to optimize applications for Windows. It discusses the following optimization techniques:
• Using compiler options
• Optimizing the prolog and epilog code
• Optimizing the calling convention
• Aliasing (using the /Ow and /Oa options)

General Optimization Strategies
Optimization is a battle between two forces: small size and fast execution. As with most engineering problems, deciding which side to take is never easy. The following guidelines will help you optimize your applications for the Microsoft® Windows™ graphical environment.

If your application runs in real mode, always optimize for size. Memory is the limiting resource in real mode, and using too much of it costs both speed and available memory, degrading overall performance. Memory is not as scarce in protected mode (that is, in standard and enhanced modes), so you must decide whether to optimize for speed or for size. Even so, memory becomes scarce as users run more programs at the same time.

The rule of thumb, for Windows and for other operating environments, is to optimize for speed the 10 percent of the code that runs 90 percent of the time. Tools such as the Microsoft Source Code Profiler help determine where optimizations should be made. Because Windows is a visual, interactive environment, a few shortcuts help identify code that needs speed optimization. Any code that displays information directly on the screen, including code that responds to WM_PAINT, WM_CREATE, and WM_INITDIALOG messages, should be optimized: a dialog box does not appear until WM_INITDIALOG processing is complete, so the user must wait. Speed is less critical elsewhere because the user can move the mouse only so fast; in most situations, the code behind the selection processes in a dialog box need not be optimized.

Note The Microsoft C version 6.0 compiler spells most function modifiers with a single leading underscore (_), for example, _loadds, _export, _near, _far, _pascal, and _cdecl. The Microsoft C/C++ version 7.0 compiler uses two underscores (__) for ANSI C compatibility but recognizes the single underscore for backward compatibility. This article uses C version 6.0 compiler syntax except when discussing features available only in C/C++ version 7.0.

The Sample Application: Zusammen
The sample application, Zusammen, illustrates the use of the compiler options. Zusammen, which means together in German, scrolls two different windows simultaneously. To scroll, the user selects the windows with the mouse and clicks Zusammen's scroll bars. This makes it easy to compare data in two different windows or applications. Zusammen consists of a program generated by MAKEAPP and a dynamic-link library (DLL) called Picker. MAKEAPP is a sample program included in the Windows version 3.1 Software Development Kit (SDK). The Picker DLL selects the windows to be scrolled. The make files for Zusammen and Picker are combined for simplicity. All functions are classified as local, global, entry point, or DLL entry point and declared with an appropriate #define statement, for example:
    void LOCAL cleanup(HWND hwndDlg);
    BOOL DLL_ENTRY Picker_Do(HWND, LP_PICKER_STRUCT);

• A local function is a function called from within a segment.
• A global function is a function called from outside a segment.
• An entry point is a function that Windows calls.
• A DLL entry point is a DLL function that a client application calls.

For demonstration purposes, the symbols are defined in the make files. Using symbols makes it easier to switch memory models and optimize applications. You can also port applications to flat-model environments easily by using the NEAR and FAR macros from WINDOWS.H instead of __near and __far. Some possibilities are:


The Solution

Tables 1 through 3 show options recommended for general use. These options can be used as defaults in make files because they do not require changes to the source code to compile correctly. Each table shows the options for building an application and a DLL and differentiates between the debugging (development) phase and the released product. The options in Table 1 apply to applications or libraries that run in real mode; the options in Tables 2 and 3 apply to applications or libraries that run only in protected mode. Table 3 is for C/C++ version 7.0 use only.

The developer must choose either the /Ot option to optimize for speed (time) or the /Os option to optimize for size. The C version 6.0 compiler defaults to /Ot. The C/C++ version 7.0 compiler defaults to /Od, which disables all optimizations and enables fast compiling (/f).

The /Oa and /Ow options do not appear in the tables; both options assume no aliasing and require that the C source meet certain conditions to work properly. These two options are discussed in the "Aliasing and Windows" section. In general, use /Ow instead of /Oa for Windows-based applications. You can turn the no-aliasing assumption on and off using #pragma optimize with the a or w switch.

Another option that is not included in the tables is the optimized prolog/epilog option /GW. In C version 6.0, this option generates code that does not work in real mode; the problem is fixed in C/C++ version 7.0. For backward compatibility, the C/C++ version 7.0 /Gq option generates the same prolog/epilog as the C version 6.0 /GW switch. Although the fixed /GW option results in a smaller prolog for non-entry-point functions, better optimizations are available for protected-mode applications, as discussed in the next section.

Table 1. Compiler Options for Real Mode (C 6.0 and C/C++ 7.0)

The General Solution for Protected Mode
If your application runs only in protected mode, you can use the additional optimization options shown in the second row of Table 2. Make1 demonstrates the use of these options, which are safe for all modules in a protected-mode application. You can realize additional savings in space and time by compiling modules without entry points separately from those with entry points. Use the options in the third row of Table 2 for modules without entry points. Make2 demonstrates the use of both sets of options. The Zusammen sample application is already set up with far calls and entry points in separate C files. This application should run only in protected mode, so you should compile with the resource compiler (RC) /T option to ensure that the application never runs in real mode.

DLLs can benefit from the techniques presented in the "Optimized DLL Prolog and Epilog" section. These techniques work with both C version 6.0 and C/C++ version 7.0.

Table 2. Compiler Options for Protected Mode Only (C 6.0 and C/C++ 7.0)

The General Solution for Protected Mode and C/C++ 7.0
The C/C++ version 7.0 compiler includes special optimizations for protected-mode Windows programs (see Table 3). These special optimizations include /GA (for applications), /GD (for DLLs), and /GEx (to customize the prolog) and help reduce the amount of overhead the prolog/epilog code causes. The /GA and /GD options add the prolog and epilog code only to far functions marked with __export instead of compiling all far functions with the extra code. With __export, entry points need not be placed in a separate file as required by C version 6.0. Applications that do not mark far functions with __export can use the /GA /GEf or /GD /GEf options to generate the prolog/epilog code for all far functions. /GEe causes the compiler to export the functions by emitting a linker EXPDEF record. By default, /GD emits the EXPDEF record but /GA does not. Applications compiled with /GA usually do not need the EXPDEF record. Only real-mode applications need /GEr and /GEm; protected-mode applications have no use for these options. The following options generate equivalent prolog/epilog code:
• /GA is equivalent to /GA /GEs /D_WINDOWS.
• /GD is equivalent to /GD /GEd /GEe /Aw /D_WINDOWS /D_WINDLL.

Table 3. Compiler Options for Protected Mode (C/C++ 7.0 Only)

Overview of Compiler Options
Generate Intrinsic Functions (/Oi)
The /Oi option replaces often-used C library functions with equivalent inline versions. This replacement saves time by removing the function overhead but increases program size because it expands the functions. In C version 6.0, the /Oi option is not recommended for general use because it causes bugs in some situations, especially when DS != SS. Using #pragma intrinsic to selectively optimize functions reduces the chance of encountering a bug.

The ZUSAMMEN.C module of the sample application demonstrates the use of #pragma intrinsic. Although this particular use does not drastically increase program speed, it does demonstrate the right ideas: It speeds up the WM_PAINT function and is used on a function that is called three times per WM_PAINT message. The best savings occur when the intrinsic function is in a loop or is called frequently.

Pack Structure Members (/Zp)
The /Zp option controls storage allocation for structures and structure members. To save as much memory as possible, Windows packs all structures on a 1-byte boundary. Although this saves memory, it can degrade performance: Intel® processors work more efficiently when word-sized data is placed at even addresses. An application must pack Windows structures to communicate successfully with Windows, but it need not pack its own structures. Because Windows structures are prevalent, it is better to compile with the /Zp option and use #pragma pack to give internal data structures word alignment. Passing an improperly packed structure to Windows can lead to problems that are difficult to debug. Both Zusammen and Picker use #pragma pack on their internal data structures. (See the FRAME.H, APP.H, and PACK_DLL.H modules.) Note that PICKER.DLL packs PICKER_STRUCT. Because most Windows-based applications pack structures, it is safer to leave DLL structures packed. In most cases, the speed gain is not worth the extra trouble of documenting unpacked structures, especially if the DLL will be used with other languages or products, such as Microsoft Visual Basic™ or Microsoft Word for Windows.

Set Warning Level (/W3)
All Windows-based programs should be compiled at warning level 3. You can fix many hard-to-detect bugs by removing the causes of the warnings that appear during compilation. It is less expensive to fix a warning than to ship a bug-fix release to unsatisfied users. All applications should also be run under the debugging version of Windows before release.

Compile for Debugging (/Zi) and Disable Optimizations (/Od)
It is often easier to debug a module with optimizations turned off. Some optimizations can introduce bugs into otherwise correct programs, or hide bugs that show up in an unoptimized build. For this reason, an application must be fully tested with release options, and all developers and testers should be aware of the options used.

Stack Checking (/Gs)
By default, the compiler generates code to "check the stack"; that is, each time a function is called, chkstk (actually _aNchkstk) compares the available stack space with the additional amount the function needs. If the function requires more space than is available, the program generates a run-time error message. Table 4 (below, under "Examining the Prolog and Epilog Code") shows the call to chkstk, which is removed by compiling with /Gs. Stack checking adds significant overhead, so it is usually disabled with the /Gs option after sufficient testing. It is usually a good idea to re-enable stack checking on recursive functions with the check_stack pragma.

#define Statements (/DSTRICT, /D_WINDOWS, /D_WINDLL)
The /DSTRICT and /D_WINDOWS options, which define symbols just as #define statements would, are recommended for all Windows-based applications, and /D_WINDLL is recommended for DLLs as well. Using /DSTRICT with WINDOWS.H results in a more robust and type-safe application. STRICT also works with the macros in WINDOWSX.H, which replace Windows functions such as GetStockObject with type-safe versions such as GetStockBrush and GetStockPen. The C header files use _WINDOWS and _WINDLL to determine the correct prototypes and typedefs to include. Defining _WINDLL ensures that using a library function that is invalid in a DLL generates an error. The C/C++ version 7.0 compiler /GA option automatically sets /D_WINDOWS; the /GD option sets both /D_WINDOWS and /D_WINDLL.

Optimizing the Prolog and Epilog
Programs designed for Windows, unlike those designed for MS-DOS®, have special sections of code called the prolog and epilog added to entry points, so they must be compiled with Windows-specific compiler options. When you compile a program with the /Gw option, all far functions receive the extra prolog and epilog code and increase in size by about 10 bytes. You can take the following steps to reduce this overhead, especially for protected-mode-only applications:

• Reduce the number of far calls.
• Reduce the prolog and epilog code.

Reducing the Number of Far Calls
Because /Gw adds the extra code only to far functions, reducing the number of far functions is a good way to trim program size. In the small memory model, all functions are near unless explicitly labeled as far, so reducing far calls is not a problem. In the medium memory model, all functions default to far and therefore receive the extra prolog and epilog code. In C version 6.0, you can use two methods to reduce this overhead:

• Organize source modules: label all functions explicitly as either near or far, and compile with the medium model.
• Use mixed-model programming with small model as the base.

C/C++ version 7.0 users do not need either of these methods; they can use the /GA and /GD options to add prolog/epilog code only to far functions marked with __export. Other far functions are compiled without additional overhead. To add the prolog and epilog code to all far functions, use /GA /GEf or /GD /GEf.

Organizing source modules
To reduce the number of far calls, you must organize source modules carefully. Each module is divided into internal functions and external functions. Internal functions are called only from within the module; external functions are called from outside the module. With this arrangement, internal functions can be marked near and external functions far. The Zusammen sample application is arranged in this manner. Each module has a header file that prototypes all external functions as far. Each source file prototypes its internal functions as near because they are not needed outside the module. For large applications, you can use a tool such as MicroQuill's Segmentor to determine the best segmentation to use. You can also organize source modules manually, but the process must be repeated whenever the source file changes.

Another method for reducing far-call overhead is to use the FARCALLTRANSLATION and PACKCODE linker options. This method works only for protected-mode-only applications and should not be used in real mode. PACKCODE combines code segments. You can specify the size of the segments to pack on the command line (for example, /PACKCODE:8000). The default size limit is 65530 bytes. C/C++ version 7.0 turns PACKCODE on by default for all segmented executables. If a far function is called from the same segment, FARCALLTRANSLATION replaces the far call with a near call.

Mixed-model programming
In mixed-model programming, the small model acts as the base. All far functions are explicitly labeled as in the previous method. Each module is compiled with the /NT option, which places the module in a different segment, for example:
    cl /c /Gw /Od /Zp /W3 /NT _MOD1 mod1.c
    cl /c /Gw /Od /Zp /W3 /NT _MOD2 mod2.c

Because the small model is used, all other functions default to near, and presto! No far-call overhead. The SDK Multipad sample application uses this method for compiling, although many of its near functions are labeled as such anyway. Make3 compiles Zusammen using this method. In practice, this method does not save much work; it only eliminates the need to label near functions explicitly. However, labeling near functions is useful for documenting local and global functions.

In mixed-model programming, only functions in the default _TEXT code segment can call the C run-time library. Multipad avoids this limitation by not calling any C run-time library functions. Mixed-model programming uses the small-model C library, which is placed in the _TEXT segment. Because these library routines are built for the small model, they assume all code is near. If a C library function is called from a different segment, a linker fixup error occurs because the linker cannot resolve a near jump into another segment. There is no convenient way to avoid this restriction.

Removing the C run-time library
Because the C run-time library is not used, you need not link to it. The Windows version 3.1 SDK includes libraries named xNOCRTW.LIB that do not contain any C run-time functions. Each memory model has one such library containing the minimum amount of code needed to resolve all compiler references. Using this library saves about 1.5K from the _TEXT code segment size and about 500 bytes from the default data segment size. Linking time also improves slightly. When using the xNOCRTW.LIB libraries, keep in mind that the standard C libraries also supply routines for some operations that look ordinary in C source (such as long multiplication).

Examining the Prolog and Epilog Code
Decreasing the number of far functions is only part of the battle. Not all far functions need the full prolog and epilog code, as the existence of the /GW, /GA, and /GD options shows. The C/C++ version 7.0 /GA and /GD options provide the best achievable optimizations of the prolog and epilog code. The C version 6.0 /GW option provides an optimized version of the prolog/epilog code for far functions that are not entry points. However, when armed with a little knowledge, the C version 6.0 compiler user can generate better results for protected-mode applications than those the /GW option provides, as discussed in the following sections.

What does the prolog/epilog code do anyway?
The prolog/epilog code sets the DS register to the correct value to compensate for the existence of multiple data segments and their movements. The second column of Table 4 shows the assembly-language listing of the prolog/epilog code that every far function receives when it is compiled with /Gw. The last column shows the prolog/epilog code that near functions receive. This is the same code that far functions contain when they are not compiled with /Gw.

Table 4. Assembly Listing of Prolog and Epilog Code (C 6.0)

C/C++ version 7.0 provides additional optimizations for real mode, even if you use the /Gw and /GW options. These optimizations include:

• Using mov ax,ds instead of a push/pop sequence in the Preamble phase.
• Using lea sp,WORD PTR -2[bp] for the Release Frame phase.

Table 5 shows the compiler output for these options.

Table 5. Assembly Listing of Prolog and Epilog Code (C/C++ 7.0)

Most of the prolog/epilog code is not needed in protected mode but is essential for real mode. The /GW option lacks the push ds instruction that all far functions require in real mode to save the data segment; for this reason, /GW does not work in real mode. Not much can be done to optimize the prolog/epilog code that C version 6.0 generates for real-mode applications, so this article focuses on optimization in protected mode only. For more information on what happens during real mode, see Programming Windows by Charles Petzold (Redmond, Wash.: Microsoft Press, 1990). For the compiler writer's viewpoint, see the Windows version 3.1 SDK Help file.

The order of phases in the C/C++ version 7.0 compiler options /GA and /GD differs slightly from that of /Gw: The Alloc Frame phase occurs before the Save DS and Load DS phases (when compiling without /G2). As a result, the /GA and /GD options remove the two dec bp instructions from the Release Frame phase. The compiler output for the /GA and /GD options is shown in Table 6.

Table 6. Assembly Listing of Prolog and Epilog Code (C/C++ 7.0)

Protected mode only
The Mark Frame and Unmark Frame phases are not needed in protected mode and can be ignored. The prolog/epilog code for a near function and the prolog/epilog compiled with /Gw differ in four phases: Preamble, Save DS, Load DS, and Restore DS. The other phases (Link Frame, Alloc Frame, Release Frame, and Unlink Frame) are the same; they set up the stack frame for the function. (See Figure 1.)

Figure 1. Stack Frame Creation

The compiler generates code to access the parameters passed to the function using positive offsets from BP ([BP + XXXX]). Negative offsets from BP ([BP - XXXX]) access the function's local variables. This happens for all C functions: near functions, far functions, and functions compiled with the /Gw option.

Optimizing for 80286 processors (/G2)
Because protected mode requires at least an 80286 processor, you should use some of the special 80286 instructions through the /G2 option. Two instructions, enter and leave, are relevant to our current discussion. Enter performs the same function as the Link Frame and Alloc Frame phases, and leave performs the same function as the Release Frame and Unlink Frame phases. Table 7 shows the prolog/epilog code for near and far functions compiled with the /G2s option and without the /Gw option.

Table 7. Assembly Listing of Prolog/Epilog Code Compiled with /G2s

Unfortunately, the /Gw option overrides the /G2 option in C version 6.0 and generates the prolog/epilog code without the enter and leave instructions. The C/C++ version 7.0 compiler corrects this limitation; it generates Windows prolog/epilog code with the enter and leave instructions when it compiles with /GA or /GD and /G2. Table 8 shows the prolog/epilog code for functions compiled with C/C++ version 7.0 options.

Table 8. Assembly Listing of Prolog and Epilog Code for C/C++ 7.0 (Protected Mode Only)

The prolog preamble's purpose
The Preamble, Save DS, Load DS, and Restore DS phases exist only when you compile a far function with a Windows option (/Gw, /GW, /GD, or /GA). Programs developed for Windows, unlike those developed for MS-DOS, can have multiple instances, each with its own movable default data segment. When control is transferred from Windows to an application or from an application to a DLL, a mechanism is needed for changing DS to point to the correct default data segment. This mechanism consists of the prolog/epilog code, the Windows program loader, the EXPORTS section of the DEF file (or _export), and the MakeProcInstance function.

Nothing seems to happen in the Preamble, Save DS, and Load DS phases:
    push ds       ; move ds into ax
    pop  ax       ; now ax = ds
    nop
    push ds       ; save ds
    mov  ds,ax    ; ds = ax, but ax = ds, therefore ds = ds

It seems like a lot of work to set DS equal to itself. However, a lot happens behind the scenes. Examining the code with the Microsoft CodeView® debugger reveals three variations of the Preamble phase, all different from the code listing that the /Fc compiler option generates (see Make4). The Client_WinProc (in WCLIENT.C), Client_Initialize (in CLIENT.C), and Picker_Do (in PICKER.C) functions demonstrate these variations, which are listed in Table 9.

Table 9. Preamble Variations

The Windows program loader magically changes the Preamble phase of the prolog. When it loads a program, the loader first examines the list of exported functions. When it finds an entry-point function with the /Gw preamble, it changes the preamble. If the function is not exported or the preamble is different, the loader leaves it alone, and DS retains its value. For example, Client_Initialize does not need to change the DS register, so its preamble is left unchanged.

If the function is part of a single-instance application, the value can be set directly because single-instance applications have only one data segment. Because DLLs are always single instance, they belong to this group: AX is set directly to DGROUP. In the Load DS phase, DS is loaded with the DGROUP value from AX, resulting in a correct DS value for the function.

In exported far functions, as demonstrated by Client_WinProc, Windows removes the entire preamble but still loads DS from AX during the Load DS phase. So where does AX get its value? It depends on how Windows calls the function. For all window procedures, including Client_WinProc, Windows sets up AX correctly before calling the procedure. That leaves callbacks such as those used with the EnumFontFamilies function. You can set up an EnumFontFamilies callback as follows:
    FARPROC lpCallBack;

    lpCallBack = MakeProcInstance((FARPROC)CallBack, hInstance);
    EnumFontFamilies(hdc, NULL, (FONTENUMPROC)lpCallBack, 0L);
    FreeProcInstance(lpCallBack);

MakeProcInstance creates an instance thunk, which is basically a jump table with an added capability: setting AX. Instance thunks appear as follows:
    mov  ax,XXXX                     ; load this instance's data segment
    jmp  <actual function address>   ; jump to actual function

The return value of MakeProcInstance is the address of the instance thunk. This address is passed to EnumFontFamilies, which calls the instance thunk instead of the function itself. The instance thunk sets up AX with the current address of the data segment (in real mode, Windows updates this address each time it moves the data segment) and then jumps to the function, which loads DS with the value in AX. And presto! chango! DS has the correct value. This discussion leads to some interesting conclusions:

• An application cannot call an exported far function directly; it must use the result of MakeProcInstance as a function pointer instead.
• An application should not use MakeProcInstance when calling a function in a DLL.
• DLLs should not call MakeProcInstance on any exported far function that resides inside the DLL.
• Nonexported far functions do not need the prolog/epilog code.
• Windows sets up the AX register as part of its message-passing mechanism. Window procedures do not have instance thunks.
• There are no obvious optimizations.

FixDS (/GA and /GEs)
FixDS by Michael Geary is a public domain program, available on CompuServe®, that brings insight and imagination to the optimization process. Borland® C++ and Microsoft C/C++ version 7.0 both incorporate this feature. Under Microsoft C/C++ version 7.0, you can use /GA to perform the same function as FixDS (see Tables 6 and 8).

So far, we have not discussed the SS stack segment register. The prolog code does not set SS anywhere, so the Windows Task Manager must set SS before the function is executed. Because a Windows-based application is not normally compiled with the /Au or /Aw option, SS == DS, so there is no reason why DS cannot be loaded simply from SS. Instead of copying DS into AX, FixDS modifies the prolog to put SS into AX, which is eventually placed in DS (see the fourth column of Table 10). This preamble differs from the standard Windows preamble, so the Windows loader does not modify it. This method has two convenient side effects:

• You no longer need MakeProcInstance.
• You do not have to export entry points.

FixDS does not work for DLLs because DS != SS.

Table 10. Assembly Listing of Optimized Prolog and Epilog

The C/C++ version 7.0 compiler extends the ideas of FixDS by letting the programmer specify where DS gets its value. You can use the /GEx option in conjunction with the /GA and /GD options to load DS. The following options are available:
• /GEa: Load DS from AX. This is equivalent to /Gw and /GW.
• /GEd: Load DS from DGROUP. This is the default behavior for /GD and is useful for DLLs, as explained in the next section.
• /GEs: Load DS from SS. This is equivalent to FixDS and is the default behavior for /GA.

When you compile an application with /GA, the functions marked with __export are not really exported (you can look at the exported functions with EXEHDR). If you compile the program with /GA /GEe, the EXEHDR listing shows all exported functions. A program that you compile with /GA loads DS from SS and does not need to export its entry points, as mentioned above. A program compiled with /GA /GEa should normally be compiled with /GEe. The /GD and /GA options work differently. The /GD option exports functions marked with __export. To stop the compiler from exporting functions in a DLL, use /GA /GEd /D_WINDLL /Aw instead of /GD.

Optimized DLL Prolog and Epilog
Although the previous recommendations (excluding FixDS) work fine with DLLs, a better optimization method exists. To optimize a DLL with C version 6.0, compile all DLL modules with the options listed in Table 2 for modules without entry points:

    /Aw /G2 /Gs /Oclgne /W3 /Zp

This compilation does not generate prolog or epilog code because the /Gw option is not used. To load DS correctly, mark all entry-point functions with _loadds, and list the functions that the client application calls in the DEF file. This changes the prolog/epilog code to match the second column of Table 10. _loadds basically adds the same lines that the Windows program loader patches into the Preamble of a DLL function. See Make5 for an example of this method. Again, this method is for protected-mode-only applications.

The /GD option in C/C++ version 7.0 defaults to loading DS from the default data segment (see the third column of Table 10). The /GD option also sets _WINDLL and /Aw. Notice that the compiler options include /Aw but not /Au. The /Aw option informs the compiler that DS != SS. The /Au option is equivalent to /Aw plus a _loadds on every function, far and near. This is not an optimization because even near functions receive the three lines of code that set up the DS register.

Using _loadds does not work for applications that have multiple instances and therefore multiple DGROUPs. It does, however, work for single-instance applications. A single-instance application need not export functions because the application passes function addresses to Windows. The application should make sure that a second instance cannot start, for example by checking the hPrevInstance parameter of WinMain. If a second instance did start, Windows would create a new data segment for it, but the code contains hard-coded references to the first instance's data segment. The application should also set up a single data segment in the DEF file as:

Otherwise, the _loadds function modifier will generate warnings. There is no need to use MakeProcInstance because the _loadds function modifier sets up the DS register correctly.

EXPORT vs. _export
In the previous examples, the functions are exported in the DEF file. You can also use the _export keyword to export DLL functions. This approach has some drawbacks, depending on the method you use to link the application with the DLL. There are three methods:
• Including an IMPORTS line in the DEF file
• Using the IMPLIB utility
• Linking explicitly at run time

Including an IMPORTS line in the DEF file Including an IMPORTS line in the DEF file of the application, for example:

although inconvenient for DLLs with many functions, allows you to rename functions, for example:

Now the application can call PickIt instead of Picker_Do. This is useful when DLLs from different vendors use the same function name and when you import a function directly by its ordinal number. The linker gives each exported function an ordinal number to speed up linking by eliminating the need to search for the function. You can override the default ordinal number by specifying a number after an "at" sign (@) in the DLL's DEF file, for example:
; DLL .DEF
EXPORTS
    Picker_Do @1

An application can import this function with the following DEF file entry:

DLLs should always include ordinal numbers on exported functions.

Using the IMPLIB utility
Most programmers use the IMPLIB utility instead of an IMPORTS line in their DEF files. IMPLIB takes the DEF file of a DLL or, if _export is used, the DLL itself, and builds a LIB file. The application links with the LIB file to resolve the calls to the DLL, so the IMPORTS line is not needed. One of the drawbacks of _export is that it assumes linking by name instead of linking by ordinal number. As a result, the linker assigns the function an arbitrary ordinal number and places the function name in the Resident Name Table. The linker is not likely to assign the same number each time it links the program. For example, the output of the EXEHDR program for a program with two exported functions may originally look like this:
Exports:
ord seg offset name
  1   1   07a1 WEP exported, shared data
  4   1   0e06 ___EXPORTEDSTUB exported, shared data
  3   1   00ac PICKER_OLDDLGPROC exported, shared data
  2   1   0061 PICKER_DO exported, shared data

Adding a third exported function to the program may change all the ordinals in the EXEHDR output, for example:
Exports:
ord seg offset name
  1   1   07a1 WEP exported, shared data
  3   1   0e06 ___EXPORTEDSTUB exported, shared data
  4   1   0f00 NewFunction exported, shared data
  2   1   00ac PICKER_OLDDLGPROC exported, shared data
  5   1   0061 PICKER_DO exported, shared data

Applications that use any method of ordinal linking must now be rebuilt to use the new ordinals. You may also have to rebuild if you use the EXPORTS statement without explicitly giving ordinal numbers. Having to rebuild an application each time the DLL changes cancels many of the advantages of using DLLs.

Linking by name also places function names in the Resident Name Table, which is an array of function addresses indexed by function name. The Resident Name Table stays in memory for the life of the DLL. When linking by ordinal number, the function names reside on disk in the Non-Resident Name Table, while an array of function addresses indexed by ordinal number resides in memory. For a large DLL, the Resident Name Table could consume a significant amount of memory. Linking by name is also much slower than linking by ordinal number, because Windows must perform a series of string comparisons to find the function in the table.

Linking explicitly at run time
Run-time dynamic linking occurs when a function call is resolved at run time instead of load time. For example:
HANDLE  hLib ;
FARPROC lpfnPick ;
// Get library handle.
hLib = LoadLibrary("PICKER.DLL") ;
// Get address of function.
lpfnPick = GetProcAddress(hLib, "Picker_Do") ;
// Call the function.
(*lpfnPick) (hwnd, &aPicker ) ;
// Free the library.
FreeLibrary( hLib) ;

When you link explicitly by name, the ordinal number of the function is not used, and the lookup is much faster if the function name is in the Resident Name Table. Looking up the function by ordinal number, however, is still faster and uses less memory. For example:
#define PICKER_DO 3
HANDLE  hLib ;
FARPROC lpfnPick ;
// Get library handle.
hLib = LoadLibrary("PICKER.DLL") ;
// Get address of function by ordinal number.
lpfnPick = GetProcAddress(hLib, MAKEINTRESOURCE(PICKER_DO)) ;
// Call the function.
(*lpfnPick) (hwnd, &aPicker ) ;
// Free the library.
FreeLibrary( hLib) ;
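To make the cost difference concrete, here is a portable C sketch of the two kinds of lookup. It is only a model: `export_entry`, `lookup_by_ordinal`, and `lookup_by_name` are hypothetical names, and the real Windows loader lays out its name tables and entry table differently. The ordinals and offsets are taken from the first EXEHDR listing above. A lookup by ordinal is a single array access, while a lookup by name costs one string comparison per entry until a match is found.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical export entry: the name and segment offset of an exported
   function. Values mirror the first EXEHDR listing in the text. */
typedef struct {
    const char *name;
    unsigned    offset;
} export_entry;

/* Index 0 is unused so that ordinal n lives at index n. */
static const export_entry exports[] = {
    { "",                  0x0000 },
    { "WEP",               0x07a1 },
    { "PICKER_DO",         0x0061 },
    { "PICKER_OLDDLGPROC", 0x00ac },
    { "___EXPORTEDSTUB",   0x0e06 },
};

#define EXPORT_COUNT (sizeof exports / sizeof exports[0])

/* By ordinal: one bounds check and one array access.
   Returns -1 if the ordinal is not exported. */
long lookup_by_ordinal(unsigned ordinal)
{
    if (ordinal == 0 || ordinal >= EXPORT_COUNT)
        return -1;
    return (long)exports[ordinal].offset;
}

/* By name: one string comparison per entry until a match is found.
   *comparisons reports how many strcmp calls were needed. */
long lookup_by_name(const char *name, unsigned *comparisons)
{
    size_t i;

    assert(name != NULL && comparisons != NULL);
    *comparisons = 0;
    for (i = 1; i < EXPORT_COUNT; i++) {
        ++*comparisons;
        if (strcmp(exports[i].name, name) == 0)
            return (long)exports[i].offset;
    }
    return -1;
}
```

In this table, `lookup_by_name("PICKER_DO", &n)` needs two comparisons, while `lookup_by_ordinal(2)` needs none; the gap grows with the number of exports.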

The fastest, most flexible method, regardless of the linking method you use, is to explicitly list the functions with ordinal numbers in the EXPORTS section of the DEF file. The C/C++ version 7.0 /GD option encourages the use of __export to mark entry points. If you use this option, we recommend that you add an entry to the EXPORTS section of the DEF file for every function that an application calls.

DS != SS Issues
Some problems can arise within a DLL because DS != SS. A common problem occurs when a DLL calls the standard C run-time library. For example, if you compile the following code with the /Aw option:
void Foo()
{
    char str[10];          // allocates str on stack,
    strcpy(str,"BAR");     // passing the far pointer as a
                           // near pointer
}

the compiler generates a near/far mismatch error because strcpy expects str to be in the default data segment (a near pointer). However, str is allocated on the stack (making it a far pointer) because the stack segment does not equal the data segment. The following examples show how to avoid this situation.

You can place the array in the data segment by making it static:
void Foo2()
{
    static char str[10];   // allocate str in data segment
    strcpy(str,"BAR");
}

You can place the array in the data segment by making it global:
char str[10];              // allocate str in data segment
void Foo3()
{
    strcpy(str,"BAR");
}

Instead of linking with the small-model version of strcpy, you can use the large-model (also called the model-independent) version:
void Foo4()
{
    char str[10];
    _fstrcpy(str,"BAR");   // accept far pointers
}

This version expects far pointers instead of near pointers, so the compiler converts both arguments into far pointers that carry the correct segment.

You can also use the following functions from the Windows library:
• lstrcat
• lstrcmp
• lstrcmpi
• lstrcpy
• lstrlen
• wsprintf
• wvsprintf

If you use one of these functions, the previous example becomes:
void Foo4()
{
    char str[10];
    lstrcpy(str,"BAR");    // accept far pointers
}

The following code fragment:
void Foo5()
{
    char str[10];          // allocated on stack
    char *pstr ;           // near pointer based on DS
    pstr = str ;           // loss of segment
    strcpy(pstr,"BAR");
}

causes the compiler to generate the warning message:
warning C4758: address of automatic (local) variable taken. DS != SS.

In this example, pstr is set to the offset of str, and the segment is lost because pstr is a near pointer. Declaring pstr as a far pointer eliminates this problem. However, you cannot pass a far pointer to strcpy, so you must use _fstrcpy, which results in the following corrected code:
void Foo6()
{
    char str[10];
    char FAR *pstr ;       // far pointer
    pstr = str ;           // no segment loss
    _fstrcpy(pstr,"BAR");
}

The following code also prevents the segment loss:
void Foo7()
{
    static char str[10];
    char *pstr ;           // DS-based pointer
    pstr = str ;           // no segment loss
    strcpy(pstr,"BAR");
}
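To see why the segment is lost, it helps to model segmented addressing in portable C. In this hypothetical model (none of these names come from Windows or the C run-time library), memory is an array of two 64K segments, a far pointer carries a segment and an offset, and a near pointer carries only an offset that is always interpreted relative to DS. Because DS and SS name different segments in a DLL, storing the address of a stack variable in a near pointer silently retargets it at the data segment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of 16-bit segmented memory: two 64K segments. */
enum { SEG_DS = 0, SEG_SS = 1, SEG_COUNT = 2, SEG_SIZE = 0x10000 };

static unsigned char memory[SEG_COUNT][SEG_SIZE];

typedef struct { unsigned seg; uint16_t off; } far_ptr;  /* segment:offset */
typedef uint16_t near_ptr;                               /* offset only */

/* A near pointer is always dereferenced relative to DS. */
unsigned char *deref_near(near_ptr p) { return &memory[SEG_DS][p]; }

unsigned char *deref_far(far_ptr p)
{
    assert(p.seg < SEG_COUNT);
    return &memory[p.seg][p.off];
}

/* Storing a far address in a near pointer keeps only the offset. */
near_ptr far_to_near(far_ptr p) { return p.off; }

/* Copy a string through a far pointer (like _fstrcpy) or through a
   near pointer (like strcpy in the small model). */
void far_copy(far_ptr dst, const char *src)
{
    memcpy(deref_far(dst), src, strlen(src) + 1);
}

void near_copy(near_ptr dst, const char *src)
{
    memcpy(deref_near(dst), src, strlen(src) + 1);
}
```

With `str` at SS:0100, `near_copy(far_to_near(str), "BAR")` writes into DS:0100 and leaves the stack copy untouched, which is the silent failure that warning C4758 guards against; `far_copy(str, "BAR")` keeps the segment and writes where intended.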

What happens if the C run-time function does not have a far version? For example, in the Picker DLL, the picker_OnMouseUp function calls _splitpath, which requires near pointers. Using static or global structures poses problems for multiple applications that use Picker simultaneously. To avoid these problems, Picker allocates memory from the local heap with the LocalAlloc(LMEM_FIXED,size) function, which returns a local pointer. This is exactly what Picker needs to call _splitpath.

Follow these guidelines to avoid DS != SS problems:
• Be sure that all pointers you pass to a DLL are far pointers.
• Declare pointers to stack variables as far pointers.
• Declare arrays as static or global.
• Avoid storing arrays on the stack.
• Avoid storing variables referenced by pointers on the stack.
• Use the local heap for storing data.
• Use far versions of C run-time functions (such as _fstrcpy).
• Use equivalent Windows functions (such as wsprintf or lstrcpy).
• Use prototypes on all functions.

Reminders about DLLs:
• FixDS does not work with DLLs because DS != SS.
• Avoid using _export in DLLs with C version 6.0.
• Use the DEF file to override the default behavior of functions marked with _export.
• Always assign ordinal numbers to all exported DLL functions.
• /Au introduces a considerable amount of overhead; use /Aw and _loadds instead.
• Replace /Gw with _loadds on exported functions.

Optimizing the Calling Convention
Several calling conventions can be used for optimization, including _cdecl (/Gd), PASCAL (/Gc), and _fastcall (/Gr):

• _cdecl is the default C calling convention and is slightly slower than PASCAL and _fastcall.
• PASCAL (defined in WINDOWS.H as _pascal) is used to communicate between Windows and an application. It is faster than _cdecl but does not allow variable argument functions such as wsprintf.
• _fastcall is the fastest method. It places some of the parameters in registers but does not support variable argument functions and cannot be used with _export or PASCAL, so entry points cannot use the _fastcall modifier.

Under C/C++ version 7.0, the __fastcall modifier can conflict with the Windows prolog/epilog code if used in the following combinations:
• __fastcall, __far, /Gw (also invalid in C version 6.0)
• __fastcall, __far, __export, /GA
• __fastcall, __far, __export, /GD
• __fastcall, __far, /GA, /GEf
• __fastcall, __far, /GD, /GEf
• __fastcall, __far, __export, /GA, /GEf
• __fastcall, __far, __export, /GD, /GEf

Because the C run-time library is compiled with the _cdecl convention, you must include header files such as STDLIB.H and STRING.H when you use a different calling convention. These header files explicitly mark each function as _cdecl to simplify changing the default convention. When you use a third-party library, you may have to add the _cdecl function modifier to the header files.

You can use any calling convention as the default convention for applications, as long as you declare all entry points FAR PASCAL and declare the WinMain function PASCAL. Marking callback functions as PASCAL is usually safer, even if you use the /Gc Pascal convention option, because it avoids problems if the calling convention changes inadvertently. It is also a good form of code commenting. Summary of calling conventions:
• WinMain should use the PASCAL calling convention.
• Entry points that Windows calls must be FAR PASCAL.
• Only _cdecl allows variable arguments.
• _fastcall is incompatible with _export or PASCAL and is therefore incompatible with Windows prolog/epilog code.

DLLs and _cdecl
A DLL, unlike an application, can use any calling convention, even for application-called entry points. An application that calls a DLL must know which calling convention the DLL expects and must use that convention. A DLL may need to implement a variable argument function. Because _cdecl is the only convention that supports variable arguments, it is the convention of choice. If you want a DLL function to use variable arguments, use the _cdecl convention instead of the PASCAL convention. Note the following caveats when using variable argument lists in DLLs:

The variable argument macros from STDARG.H use the default pointer size to point to the arguments that are on the stack. In the small or medium model, the pointers are near pointers. Because DS != SS, these pointers do not point to the correct value and must be changed to far pointers before you can use these macros, as shown in the modified STDARG.H below:
/****************************************************************
 * File: wstdarg.h
 * Remarks: Macro definitions for variable argument lists
 *          used in DLLs.
 ****************************************************************/
typedef char _far *wva_list ;

#define wva_start( ap, v )  (ap = (wva_list) &v + sizeof( v ))
#define wva_arg( ap, t )    (((t _far *)(ap += sizeof( t )))[-1])
#define wva_end( ap )       (ap = NULL)


When passing arguments by reference, always use far pointer declarations. The compiler synthesizes a far pointer by pushing DS and the offset of the memory location onto the stack. This provides the DLL with the information it needs to access the application's data segment.

Because functions with variable arguments are defined using _cdecl, pointer arguments that are not declared in the parameter list must be typecast in the function call; otherwise, the omission of the function parameter prototype causes unpredictable results. For example:
void FAR _cdecl DebugPrint( LPSTR lpStr, LPSTR lpFmt, ... ) ;

DebugPrint( szValue, "%s, value passed: %d\r\n",
            (LPSTR) "DebugPrint() called", (int) 10 ) ;
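The shape of such a function can be sketched in standard C. This is a hypothetical portable stand-in, not the DLL's actual DebugPrint: it adds a buffer-size parameter, formats with the standard vsnprintf instead of the Windows wvsprintf, and uses the ordinary <stdarg.h> macros, which is safe here only because a flat-memory program has no DS != SS split. The explicit casts in the call follow the advice above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for DebugPrint: appends the formatted text to the
   string already in log. The "..." parameter list works only with a
   convention in which the caller removes the arguments, such as _cdecl. */
int DebugPrint(char *log, size_t size, const char *fmt, ...)
{
    va_list args;
    size_t used;
    int written;

    assert(log != NULL && fmt != NULL);
    used = strlen(log);          /* append after any existing text */
    assert(used < size);

    va_start(args, fmt);
    written = vsnprintf(log + used, size - used, fmt, args);
    va_end(args);
    return written;              /* length of the formatted text, or negative on error */
}
```

A call mirroring the one above is `DebugPrint(log, sizeof log, "%s, value passed: %d\r\n", (const char *)"DebugPrint() called", (int)10);`.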


When you import or export a function, you must declare it with an underscore (_) prefix in the DEF file. You must also preserve case sensitivity in the function name. For example, you can declare the function above as follows:


_cdecl functions must either be linked by ordinal number or have all-uppercase names. Unlike PASCAL functions, which are converted to uppercase before they are exported, _cdecl functions retain their case when exported. The Windows dynamic-linking mechanism always converts the requested function name to uppercase before it looks in the DLL for the function, but it does not convert the names exported from the DLL, which it expects to be uppercase already. The result is a comparison between an uppercase function name and a mixed-case function name, which of course fails. The solution is to declare the function name in all uppercase or to link by ordinal number and avoid the comparison altogether.
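The mismatch can be reproduced with a small portable model. `resolve_import` and the export list are hypothetical: the function uppercases the requested name, as the text describes the dynamic-linking mechanism doing, and then compares it byte for byte against the exported names, which are left unconverted. `_DEBUGDUMP` is an invented all-uppercase _cdecl export used for contrast.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Names as they might appear in a DLL's export table. A PASCAL function
   is exported in uppercase; a _cdecl function keeps its leading
   underscore and its original case. */
static const char *const exported_names[] = {
    "PICKER_DO",      /* FAR PASCAL Picker_Do */
    "_DebugPrint",    /* FAR _cdecl DebugPrint, mixed case */
    "_DEBUGDUMP",     /* hypothetical _cdecl function named in uppercase */
};

/* Hypothetical model of import resolution: uppercase the requested name,
   then compare it case-sensitively with each exported name.
   Returns the table index, or -1 if no export matches. */
int resolve_import(const char *requested)
{
    char upper[64];
    size_t i, n = strlen(requested);

    assert(n < sizeof upper);
    for (i = 0; i <= n; i++)     /* also copies the terminating '\0' */
        upper[i] = (char)toupper((unsigned char)requested[i]);

    for (i = 0; i < sizeof exported_names / sizeof exported_names[0]; i++)
        if (strcmp(exported_names[i], upper) == 0)
            return (int)i;
    return -1;
}
```

Here `resolve_import("Picker_Do")` succeeds, `resolve_import("_DebugPrint")` can never succeed because the export keeps its mixed case, and the all-uppercase `_DEBUGDUMP` resolves normally.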

Variable argument C run-time library functions such as vsprintf and vfprintf do not take DS != SS into account. These functions are not available in DLLs. Compile with /D_WINDLL instead of /D_WINDOWS to detect functions that DLLs do not support. The C/C++ version 7.0 compiler option /GD does this automatically.

If the DLL will be used with different languages such as Visual Basic, Borland C++, Microsoft Excel, Zortech C++, or Microsoft FORTRAN, you should use the PASCAL convention. The registers used by the _fastcall convention can change between compiler versions and are not compatible between compilers by different vendors.

Aliasing and Windows (/Ow and /Oa)
An alias is a second name that refers to a memory location. For example, in:
int i ;
int *p ;
p = &i ;

pointer p is an alias of variable i. You can use aliases to perform tasks while keeping the original pointer around, for example:
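// No error checking.
int     i ;
LPSTR   ptr = (LPSTR)GlobalLock(GlobalAlloc(GHND, 1000)) ;  // get a pointer
LPSTR   ptr_alias = ptr ;                                    // alias the pointer
HGLOBAL hMem ;
for ( i = 0 ; i < 1000 ; i++)
    *(ptr_alias++) = foo(i) ;                                // use the alias
// Recover the handle from the original pointer and free the memory.
hMem = (HGLOBAL)LOWORD(GlobalHandle(SELECTOROF(ptr))) ;
GlobalUnlock(hMem) ;
GlobalFree(hMem) ;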

The compiler makes the following assumptions if there is no aliasing:
• If a variable is used directly, no pointers reference that variable.
• If a pointer references a variable, that variable is not used directly.
• If a pointer modifies a memory location, no other pointers access the same memory location.
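The third assumption is the one that aliasing breaks. In the following portable sketch (the function names are illustrative), the result depends on whether the two pointers refer to the same object, so a compiler must reread memory unless it is told otherwise. /Oa asks the compiler to assume no aliasing throughout a compilation; the C99 `restrict` qualifier makes a similar promise in standard C for one particular set of pointers.

```c
#include <assert.h>

/* Writes through two pointers, then reads the first one back.
   If a and b point to the same int, the second store overwrites the
   first, so the function must reload *a instead of assuming it is 1. */
int store_and_read(int *a, int *b)
{
    *a = 1;
    *b = 2;
    return *a;     /* 1 if a and b differ, 2 if b aliases a */
}

/* Same logic with restrict: the caller promises that a and b never alias,
   so the compiler may return the constant 1 without rereading memory.
   Calling this with aliased pointers is undefined behavior. */
int store_and_read_noalias(int *restrict a, int *restrict b)
{
    *a = 1;
    *b = 2;
    return *a;
}
```

Calling `store_and_read(&x, &x)` returns 2, which is exactly the case a no-aliasing optimization would get wrong.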

Global Register Allocation (/Oe)
Although aliasing is a common and acceptable practice, the compiler can improve optimizations if it can assume that there is no aliasing, because it can place more memory locations into registers. By default, the compiler uses registers:
• To hold temporary copies of variables.
• To hold variables declared with the register keyword.
• To pass arguments to functions declared with _fastcall or compiled with /Gr.

The /Ow and /Oa options signal the compiler that it has more freedom to place variables or memory locations into registers; these options do not cause the compiler to keep variables in registers. The global register allocation option /Oe, on the other hand, allocates register storage to variables, memory locations, or common subexpressions. Instead of using registers only for temporary storage or for producing intermediate results, the /Oe option places the most frequently used variables into registers. For example, /Oe places a window handle, hWnd, in a register if a function is likely to use hWnd repeatedly. Because the no-aliasing options increase the compiler's opportunities to place a variable in a register, it makes sense to use these options with /Oe. In many cases, the /Ow and /Oa options do not optimize without the /Oe option. In some cases, you can eliminate problems with /Ow or /Oa by turning off /Oe optimization.

Using /Ow Instead of /Oa
What is the difference between /Ow (Windows version) and /Oa? Basically, /Ow is a relaxed version of /Oa. It assumes aliasing will occur across function calls, so a memory location placed in a register is reloaded after a function call. For example, in:
void foobar( int * p)
{
    // Compiler puts the value that p points to into a register.
    *p = 5 ;
    foo() ;
    // If compiled with /Ow, the compiler reloads the register
    // with *p.
    (*p)++ ;
}

the compiler places the value that pointer p references into a register. If the /Ow option is set, the compiler reloads the register after the call to foo. If the /Oa option is set, *p is not reloaded after the function call. Thus, /Ow tells the compiler to forget everything about pointed-to values after function calls. Compiling the code fragment above with /Ox and /Oa results in the following code:
mov   si,WORD PTR [bp+4]   ; pointer p is passed in at [bp+4]
mov   WORD PTR [si],5
call  _foo
mov   WORD PTR [si],6      ; compiler assumes that *p cannot
                           ; change and generates *p=6 instead
                           ; of (*p)++

Notice how the compiler optimized away the increment: instead of incrementing *p, it stores the constant 6. Compiling the code with /Ox and /Ow results in the following correct version:
mov   si,WORD PTR [bp+4]   ; p
mov   WORD PTR [si],5
call  _foo
inc   WORD PTR [si]        ; compiler assumes that
                           ; *p might change.
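Why the reload matters can be shown in portable C. Here `foo` is given a body (an assumption; the text leaves it unspecified) that changes the same int through a second, global pointer, so `*p` really does change across the call. A correct compile must leave 16 in the variable, which matches what the /Ow code above computes by reloading and incrementing; the /Oa code would store 6.

```c
#include <assert.h>

/* Hypothetical global alias: a second route to the caller's int. */
static int *g_alias;

/* Stand-in for foo(): modifies the int behind the caller's back. */
void foo(void)
{
    *g_alias += 10;
}

/* The foobar() example: *p must be reloaded after foo() returns. */
void foobar(int *p)
{
    *p = 5;
    foo();
    (*p)++;        /* 5 + 10 + 1 = 16, not 6 */
}
```

Setting `g_alias = &value` before calling `foobar(&value)` creates exactly the hidden alias that /Oa assumes cannot exist.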

To understand the benefit this technique adds to a Windows-based program, look at the following code fragment:
void Foo(HWND hwnd)
{
    char ach[80];
    // Zero terminate the string in case of error.
    //
    ach[0] = 0;
    SendMessage(hwnd, WM_GETTEXT, sizeof(ach), (LONG)(LPSTR)ach);
    // If some text is returned, do something with it.
    //
    if (ach[0] != 0)
    {
        Bar(ach);
    }
}

If you compile this code fragment with /Oa and C version 6.0, Bar is never called. If you use C/C++ version 7.0, Bar is called. The C version 6.0 compiler assumes that ach does not change in the SendMessage call and optimizes away the if block because ach[0] is still zero. If you compile the code with /Ow, the compiler expects ach to change after any function call, including SendMessage. The C version 6.0 compiler appears to be pretty dumb: it does not realize that the ach pointer was passed to SendMessage. However, as far as the compiler can tell, a LONG was passed, not a pointer. If a pointer had been passed, /Oa would have worked. For example, in the following code:
void SomeFunc(HWND hwnd, LPSTR astr, int asize)
{
    SendMessage(hwnd, WM_GETTEXT, asize, (LONG)astr);
}

void Foo(HWND hwnd)
{
    char ach[80];
    ach[0] = 0;
    // Pass a pointer.
    SomeFunc(hwnd, (LPSTR)ach, sizeof(ach));
    if (ach[0] != 0)
    {
        Bar(ach);
    }
}

the compiler knows that the pointer is being passed and that the data it points to can be changed. This problem can occur in any function that takes a pointer as a DWORD (lParam) or a WORD (wParam). The C/C++ version 7.0 compiler corrects this behavior. You can also solve this problem by simply declaring ach volatile, which tells the compiler not to keep its contents in registers across the call. However, /Ow usually generates better code than the volatile keyword. Although /Ow is the easiest solution, the code it generates is not as efficient as the code /Oa generates, as illustrated by the hWnd window handle in the previous example. Window handles are commonly used in functions. They are perfect examples of variable types that are meant to be placed into registers; however, with the /Ow option they are reloaded after any function call. Using #pragma optimize at strategic locations to turn /Ow and /Oa off prevents problems associated with reloading. A profiler can help determine the placement of such statements.

The STRICT macros defined in the Windows version 3.1 SDK WINDOWS.H file also reduce the need for the /Ow option. WINDOWSX.H includes macros that make most window functions type-safe, so a pointer is passed as a pointer instead of being passed as a LONG. The STRICT macros can make an application more robust and should be used even if the /Oa option is not in effect.
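The difference between passing a pointer disguised as a LONG and passing it as a real pointer can be sketched portably. Everything here is hypothetical: `send_message` stands in for SendMessage with its lParam modeled as an intptr_t, and `get_window_text` is a typed wrapper in the spirit of the WINDOWSX.H macros, not one of them. Because the wrapper's parameter is a char pointer, callers hand over a visible pointer, and the integer cast lives in exactly one place.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MSG_GETTEXT = 1 };   /* hypothetical message number */

/* Hypothetical SendMessage stand-in: the buffer arrives disguised as an
   integer, so nothing in the signature says that memory will be written. */
long send_message(int msg, int wparam, intptr_t lparam)
{
    if (msg == MSG_GETTEXT && wparam > 0) {
        char *buf = (char *)(void *)lparam;     /* undo the disguise */
        strncpy(buf, "Picker", (size_t)wparam - 1);
        buf[wparam - 1] = '\0';
        return (long)strlen(buf);
    }
    return 0;
}

/* Typed wrapper: callers pass a real pointer and a size. */
long get_window_text(char *buf, int size)
{
    assert(buf != NULL);
    return send_message(MSG_GETTEXT, size, (intptr_t)(void *)buf);
}
```

With an 80-byte buffer, `get_window_text(ach, sizeof ach)` returns 6 and leaves "Picker" in `ach`; a 4-byte buffer receives the truncated "Pic".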

Avoiding Undocumented Features
Undocumented "features" are rarely necessary or useful, with the exception of file functions such as _lcreate that were not documented before Windows version 3.x. For example, an undocumented feature that saves neither time nor effort is demonstrated by the following code segment.
HANDLE h = LocalAlloc(LMEM_MOVEABLE, cb);
HANDLE h2;
char*  p;
// WARNING: Undocumented Hack.
// Dereference the handle without locking it.
//
p = *((char**)h);
// Use *p for a bit.
*p = 0;
h2 = LocalAlloc(LMEM_MOVEABLE, cb);
// Hmm... It could have moved, so dereference it again.
//
p = *((char**)h);
if (*p == 0)
{
    // Do something.
}

You should not use this undocumented feature for two reasons:
• Future versions of Windows will have a flat memory model and will not support this type of memory access.
• The code will not compile as expected if you use the /Oa option. The p pointer is not passed to the LocalAlloc function; therefore, the compiler assumes that p will not change as a result of this function call. The programmer has tried to outsmart the compiler by dereferencing the handle again after the function call, so the program appears to be safe. Not quite. The compiler removes the second dereference statement because it assumes that p did not change as a result of the function call, which is exactly what a programmer maintaining the code might also do.

To avoid this problem:
• Do not use this undocumented feature. (This is the best solution.)
• Use /Ow instead of /Oa.
• Always lock handles to memory before using them.
• Use #pragma optimize to selectively turn the /Ow option on and off. You can also turn /Oe off.
• Use the volatile keyword to ensure that variables are not placed in registers.
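The safe pattern behind "always lock handles" can be modeled in portable C. This is a hypothetical sketch, not the Windows local heap (discardable blocks and most flags are not modeled): `heap_alloc` returns a handle that indexes a table of master pointers, `heap_compact` may move any unlocked block, and `heap_lock` is the only sanctioned way to turn a handle into a pointer. Code that fetches the pointer again through `heap_lock` after anything that might move memory keeps working, and a locked block never moves.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BLOCKS 8

typedef int HBLOCK;   /* hypothetical handle type; 0 means "no block" */

/* Master pointer table: only the table knows where each block lives. */
static struct {
    char  *ptr;
    size_t size;
    int    locks;
} table[MAX_BLOCKS + 1];

HBLOCK heap_alloc(size_t size)
{
    HBLOCK h;
    for (h = 1; h <= MAX_BLOCKS; h++) {
        if (table[h].ptr == NULL) {
            table[h].ptr = calloc(1, size);
            if (table[h].ptr == NULL)
                return 0;
            table[h].size = size;
            table[h].locks = 0;
            return h;
        }
    }
    return 0;                      /* table full */
}

/* Pins the block and returns its current address. */
char *heap_lock(HBLOCK h)
{
    assert(h >= 1 && h <= MAX_BLOCKS && table[h].ptr != NULL);
    table[h].locks++;
    return table[h].ptr;
}

void heap_unlock(HBLOCK h)
{
    assert(h >= 1 && h <= MAX_BLOCKS && table[h].locks > 0);
    table[h].locks--;
}

/* Moves every unlocked block to a fresh address, as a compacting heap
   might during any allocation. Old addresses become invalid. */
void heap_compact(void)
{
    HBLOCK h;
    for (h = 1; h <= MAX_BLOCKS; h++) {
        char *moved;
        if (table[h].ptr == NULL || table[h].locks > 0)
            continue;              /* empty slot or pinned block */
        moved = malloc(table[h].size);
        if (moved == NULL)
            continue;              /* leave the block where it is */
        memcpy(moved, table[h].ptr, table[h].size);
        free(table[h].ptr);
        table[h].ptr = moved;
    }
}
```

A pointer obtained before `heap_compact` must not be reused afterward; calling `heap_lock` again returns the block's current address with its contents intact.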

Programming at Large
Dale Rogerson Microsoft Developer Network Technology Group Created: April 13, 1992

Microsoft® Windows™ version 3.1 signals the death of Windows real mode. With the release of Windows version 3.1, only standard and enhanced modes are supported. The end of real mode is the beginning of new programming freedoms, such as writing large-model applications. This article explains why the large model is valid for protected-mode applications and discusses two limitations of large-model applications, single instances and the Windows version 3.0 page-locking bug, along with their solutions.

The Large Memory Model and Protected Mode
For large-model applications running under real mode, the Microsoft® Windows™ version 3.0 graphical environment fixed the data segments. Fixed segments cannot move, reducing the ability of Windows to manage memory effectively. Under protected mode, Windows can move fixed data segments. Therefore, protected mode does not suffer the performance degradation that real mode does.

The difference between real mode's inability to move memory and protected mode's ability to move memory lies in the way the two modes address memory. Large-model data pointers default to 32-bit far pointers. In real mode, a far pointer consisted of a segment address and a segment offset, both 16 bits in length. If Windows moved the segment, the segment address would change. Windows had no efficient method for tracking and updating all pointers to a segment. In protected mode, the processor provides a mechanism, the segment selector, that removes the need to track and update individual pointers. All far pointers in protected mode consist of a 16-bit segment selector and a 16-bit segment offset. The segment selector does not refer directly to a physical address; instead, it indexes into a table. The value in this table is a segment address. When a segment moves, the segment selector does not change, but the value in the table is updated. The maintenance of the segment selector and the selector tables is supported directly by the Intel® 80x86 microprocessor. While the segment selector solves many of the old problems caused by using the large model, it does not resolve two limitations. One limitation requires applications with multiple data segments to have only a single instance. The other limitation is a bug in Windows version 3.0 that caused multiple data segments to be page-locked in memory. These limitations do not affect dynamic-link libraries.
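The indirection that makes this work can be sketched in portable C. This is a simplified, hypothetical model: here the selector is used directly as a table index, whereas a real selector also encodes a table indicator and a privilege level, and a real descriptor also holds a limit and access rights. A far pointer stores a selector and an offset; the selector indexes a table holding the segment's current base address, so moving a segment updates one table entry rather than every pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SELECTOR_COUNT 16

/* Simulated linear address space and descriptor table (base only). */
static unsigned char linear_memory[0x40000];          /* 256K */
static uint32_t descriptor_base[SELECTOR_COUNT];

typedef struct { uint16_t selector; uint16_t offset; } far_ptr;

/* Translate selector:offset to a location in linear memory. */
unsigned char *resolve(far_ptr p)
{
    uint32_t linear;
    assert(p.selector < SELECTOR_COUNT);
    linear = descriptor_base[p.selector] + p.offset;
    assert(linear < sizeof linear_memory);
    return &linear_memory[linear];
}

/* Move a segment: copy its bytes, then update only the table entry.
   Every far pointer that uses this selector stays valid. */
void move_segment(uint16_t selector, uint32_t new_base, uint32_t size)
{
    assert(selector < SELECTOR_COUNT);
    assert(descriptor_base[selector] + size <= sizeof linear_memory);
    assert(new_base + size <= sizeof linear_memory);
    memmove(&linear_memory[new_base],
            &linear_memory[descriptor_base[selector]], size);
    descriptor_base[selector] = new_base;
}
```

In real mode, by contrast, the segment value in each pointer is the base address itself (multiplied by 16), which is why moving a segment would have required finding and patching every pointer.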

Single Instances
Windows version 3.1 cannot run multiple instances of applications with multiple read-write data segments. If a large-model application has a single read-write data segment, it can run multiple instances. A read-only segment can also be safely shared by multiple instances because the instances cannot change the segment. Most large-model applications, however, have multiple data segments and, therefore, cannot run multiple instances. While there are several methods for getting only one data segment in a large-model program, remember that the application can then have only 64 kilobytes (K) of static data, local heap, and stack combined. This is the same as a medium-model program. For this reason, when porting from a flat-model 32-bit environment, it is probably best to use a compiler that supports development of 32-bit applications under Windows. These compilers, such as Watcom C 9.0, MetaWare 32-Bit Windows Application Development Kit, or MicroWay NPD C-386, use WINMEM32.DLL to get a full 32-bit flat memory model.

The Reason
In a multiple-instance application, all instances share the same code segments but have unique default data segments. Small- and medium-model applications have only one data segment. Most large-model applications have multiple data segments, but the current Windows kernel cannot resolve fixups to multiple data segments. Consider the following code fragment found in large-model applications that establishes the DS register:

mov ax,_data_01
mov ds,ax

This code is shared by all instances of the application. When the code is loaded, _data_01 can hold only one value. Windows has no way to associate other data segments with a given instance of an application. The program loader determines if only one instance is allowed after examining the .EXE header. If it discovers more than one data segment, it limits an application to one instance. If an application has less than 64K of data, stack, and local heap, it is possible to collapse the data into one data segment.

Gaining Multiple Instances
To get multiple instances, there must be only one read-write data segment. Under Microsoft C/C++ version 7.0, follow these guidelines to allow for multiple instances:
• Do not use /ND to name extra data segments unless the segment is READONLY.
• Use the .DEF file to mark extra data segments READONLY.
• Do not use __far or FAR to mark data items.
• Use /PACKDATA to combine data segments.
• Use /Gt65500 /Gx to force all data into the default data segment.

All of the above guidelines apply to Microsoft C version 6.0, except for the last one. Microsoft C version 6.0 and C/C++ version 7.0 usually generate two read-write data segments: one for initialized static data (DATA) and one (FAR_BSS) for uninitialized static data. The Borland® C compilers default to generating only one data segment. You can verify whether a program called SOMEPROG.EXE has multiple data segments with the following command:
c:\> EXEHDR -v someprog.exe | more
Microsoft C version 6.0 does not have the /Gx option to stop the generation of FAR_BSS and to combine initialized and uninitialized data. While there are ways to stop the creation of FAR_BSS with C version 6.0, in most cases it is easier to use C/C++ version 7.0. To eliminate FAR_BSS with C version 6.0:
• Initialize all uninitialized static variables, and mark all extern variables as NEAR.
• Mark all variables as NEAR, forcing the variables into the DATA segment.

For large programs, these ways of eliminating FAR_BSS can be very time-consuming.

The big problem with all methods for gaining multiple instances is that the application still has only one read-write data segment. It does not have more data space than a medium- or small-model program. A large-model program can have either multiple instances or multiple read-write data segments, but not both.

Windows Version 3.0 Page-Locking Bug
Multiple data segments do not cause any problems in Windows version 3.1, except for requiring an application to run a single instance. In Windows version 3.0, however, there is a bug in the memory manager that page-locks fixed segments of an application. When a segment is page-locked, it becomes a dam in memory because it cannot be moved in physical memory nor paged to disk. This is of great concern for applications compiled with a large model, because large-model applications can have more than one data segment that is fixed. Under Windows version 3.1, fixed segments in a DLL are still page-locked to support interrupt service routines.

Page-Lock Fix
To get around the page-lock problem, follow these steps:
1. Compile your application normally, and generate a map file during linking. Examine the map file and find the names of the FAR_DATA and FAR_BSS segments.
2. Write one or more assembly language routines that return handles to the FAR_DATA and FAR_BSS segments found in step 1. The following functions return handles to the data segments named MYSEGMENT and FAR_BSS:
title simhan.asm
;****************************************************************
?WIN = 1
?PLM=1          ; PASCAL calling convention is DEFAULT
?WIN=1          ; Windows calling convention
                ; Use 386 code?
.MODEL LARGE
include cmacros.inc

sBegin DATA
sEnd DATA

MYSEGMENT SEGMENT MEMORY 'FAR_DATA'
MYSEGMENT ENDS
FAR_BSS SEGMENT MEMORY 'FAR_BSS'
FAR_BSS ENDS

sBegin CODE
assumes CS,CODE
assumes DS,DATA
;**************************************************************
cProc gethandle,<PUBLIC,FAR,PASCAL>
cBegin
        mov ax,MYSEGMENT
cEnd gethandle
;**************************************************************
cProc gethandle2,<PUBLIC,FAR,PASCAL>
cBegin
        mov ax,FAR_BSS
cEnd gethandle2
sEnd CODE
end

3. Add a call to the following function in your application's InitInstance function after testing the success of your CreateWindow call:
// Implemented in simhan.asm.
HGLOBAL FAR PASCAL gethandle(void) ;
HGLOBAL FAR PASCAL gethandle2(void) ;
void unlockExtra(HGLOBAL hExtraSeg) ;

void unlockAll()
{
    // This fix is only needed for Windows version 3.0, so check
    // the version.
    if (LOWORD(GetVersion()) == 0x0003)
    {
        // Un-pagelock MYSEGMENT
        unlockExtra(gethandle()) ;
        // Un-pagelock FAR_BSS
        unlockExtra(gethandle2()) ;
    }
}

void unlockExtra(HGLOBAL hExtraSeg)
{
    BOOL fRet ;
    // Unfix segment in logical memory
    GlobalReAlloc(hExtraSeg, 0, GMEM_MODIFY | GMEM_MOVEABLE);
    // Only discardable memory can be GlobalPageUnlock'ed
    GlobalReAlloc(hExtraSeg, 0, GMEM_MODIFY | GMEM_DISCARDABLE);
    // Unfix in physical (protected mode) memory
    GlobalPageUnlock(hExtraSeg);

    // Reset the lock count to 0 because Windows happens to lock
    // it multiple times.
    do
    {
        fRet = GlobalUnlock(hExtraSeg);
    } while (fRet);

    // Modify the flags to moveable
    GlobalReAlloc(hExtraSeg, 0, GMEM_MODIFY | GMEM_MOVEABLE);
}

4. Modify your make file to assemble and link your procedures that return handles to your fixed data segments.
5. Recompile your program, and check results using the Microsoft Windows 80386 Debugger (WDEB386.EXE).
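The version test in unlockAll relies on the layout of the low-order word that GetVersion returns: the major version number is in the low-order byte and the minor version number in the high-order byte, so Windows version 3.0 yields 0x0003 and Windows version 3.1 yields 0x0A03 (minor version 10). The portable sketch below decodes such a word; `decode_windows_version` and `needs_page_lock_fix` are hypothetical helpers, not SDK functions.

```c
#include <assert.h>

typedef struct { unsigned major, minor; } win_version;

/* Splits the low-order word of a GetVersion-style value: major version
   in the low-order byte, minor version in the high-order byte. */
win_version decode_windows_version(unsigned short low_word)
{
    win_version v;
    v.major = low_word & 0xFFu;
    v.minor = (low_word >> 8) & 0xFFu;
    return v;
}

/* True only for Windows 3.0, the version with the page-locking bug. */
int needs_page_lock_fix(unsigned short low_word)
{
    win_version v = decode_windows_version(low_word);
    return v.major == 3 && v.minor == 0;
}
```

Comparing the whole word against 0x0003, as unlockAll does, is equivalent to checking for major version 3 and minor version 0.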

Testing the Page-Lock Fix
It is a good idea to test the fix under Windows version 3.0. A program that reports the page-lock status of segments is needed. Microsoft CodeView® for Windows and the 3.0 version of the Windows Heap Walker utility do not report the page-lock status. Also, the 3.1 version of Heap Walker does not run reliably under Windows version 3.0. WDEB386, however, does report the page-lock status of segments.

You can use WDEB386 to get page-lock information, as follows:
1. Install the debugging version of WIN386.EXE and WIN386.SYM.
2. Run WDEB386.EXE.
3. Issue the DL selector command to dump the local descriptor table (LDT) entry for the selector in which you are interested.
4. Take the Base linear address from the DL command and issue the .ML linear address command.
5. Take the PFT address from the .ML command and issue the .MS PFT address command. This lists the lock count for that page.
For more information on WDEB386.EXE, refer to Chapter 5, "Advanced Debugging: 80386 Debugger," in the Microsoft Windows version 3.1 Software Development Kit (SDK) Programming Tools.

Words of Warning
It is important to keep the following points in mind when deciding to use the large model:

• A bug in Microsoft C/C++ version 7.0 causes C++ objects to be placed outside the default data segment, ignoring the /Gx compiler option. To avoid this bug, declare the object as near. For example:
CTheApp NEAR theApp ;
• To get multiple-instance, large-model Microsoft Foundation Class (MFC) applications, a special variant of the large-model libraries must be built. Use the following make line:
nmake MODEL=L TARGET=W DEBUG=1 OPT="/Gt65500 /Gx"
The above variant of the MFC library has not been extensively tested.
• Large-model applications run more slowly than medium- and small-model applications.
• Basically, a multiple-instance, large-model application differs from a medium-model application only in the size of its default data pointers.
• Multiple-instance, large-model applications have only one read-write data segment.
• Multiple-instance, large-model applications can have only 64K total of stack, local heap, and static data.

It is easier to build multiple-instance, large-model applications with Microsoft C/C++ version 7.0 and Borland C compilers than with Microsoft C version 6.0. When porting from a flat-model 32-bit environment, it is probably best to use a compiler that supports development of 32-bit applications under Windows. These compilers, such as Watcom C 9.0, MetaWare 32-Bit Windows Application Development Kit, or MicroWay NPD C-386, use WINMEM32.DLL to get a full 32-bit flat-memory model. Another option is to wait for the release of Win32s™, a subset of the Win32™ Application Programming Interface that lets you develop 32-bit applications for Windows version 3.1.

On a more positive note, large-model DLLs work very well, because large-model code already treats SS and DS as separate segments (its data pointers are far), which is exactly the situation in a DLL. Also, a DLL is always a single instance. The Microsoft Foundation Classes recommend using a large model for DLLs.
