High-Performance Tuning for Win32

May 9, 1995

Your cool Win32-based application is up and running and you are pretty darn excited about the 32-bit world. But then doubt starts to creep into your mind and you say, "Hey, I know this app isn't performing to its potential. There must be some things I can do to make this faster."

So, how do you get the most bang for your bits? That's where this article comes in. Here are six easy tuning tips from Microsoft's performance gang.

#1. Do page tuning

The wstune utility is a valuable tool that runs on Windows NT and has been in the Win32 SDK since its earliest version. Using wstune to page tune your application on Windows NT can also help your application's boot time on Windows 95.

A program's working set is the collection of recently referenced pages in its virtual address space. As the working set shrinks, so does the program's demand for memory. Because lower memory demand is better, the working set tuner's dynamic-link library (DLL) measures the code working set and produces an ordering of the functions (procedures) within the code that keeps the working set small, if not minimal, in a paging environment.
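
To see why function ordering affects the working set, here is a minimal sketch in C. It does not show how wstune works internally. It only counts how many 4K pages hold at least one referenced ("hot") function for a given layout order. The names func_info and pages_touched, and the 4K page size, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical description of one function in the code section. */
struct func_info {
    size_t size;  /* size of the function's code in bytes */
    int hot;      /* nonzero if the function is referenced in the scenario */
};

/*
 * Lay the functions out back to back in the order given by order[] and
 * count the distinct pages that contain at least one byte of a hot function.
 */
size_t pages_touched(const struct func_info *funcs, const size_t *order,
                     size_t count)
{
    size_t pages = 0;
    size_t offset = 0;     /* byte offset of the current function */
    size_t next_page = 0;  /* lowest page index not yet counted */
    size_t i;

    assert(count == 0 || (funcs != NULL && order != NULL));

    for (i = 0; i < count; ++i) {
        const struct func_info *f = &funcs[order[i]];
        if (f->hot && f->size > 0) {
            size_t first = offset / PAGE_SIZE;
            size_t last = (offset + f->size - 1) / PAGE_SIZE;
            if (first < next_page)
                first = next_page;  /* page already counted for an earlier function */
            if (last >= first) {
                pages += last - first + 1;
                next_page = last + 1;
            }
        }
        offset += f->size;
    }
    return pages;
}
```

For example, take four functions of 3000, 4000, 3000, and 4000 bytes, where only the first and third are hot. Laid out in the order 0, 1, 2, 3 the hot code touches three pages. Laid out in the order 0, 2, 1, 3 it touches two.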

#2. Prebase your DLLs

Make sure your DLLs are prebased contiguously in memory. This lets the system avoid the necessity of calculating an appropriate load address for the DLLs at load time. You can look in your map files to see how big the DLLs are, and from that calculate their load addresses manually if you have to. Or you could write the appropriate makefiles to do this automatically.
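
The manual calculation described above can be sketched in C as follows. This is an illustration under stated assumptions, not the method any particular tool uses. It assumes base addresses are aligned to 64K, and the function name assign_bases and the starting address in the example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: DLL base addresses are aligned to 64K. */
#define BASE_ALIGN 0x10000UL

/* Round an image size up to the next multiple of BASE_ALIGN. */
static unsigned long round_up_to_align(unsigned long size)
{
    return (size + BASE_ALIGN - 1) & ~(BASE_ALIGN - 1);
}

/*
 * Fill bases[] with back-to-back load addresses starting at first_base.
 * image_sizes[] holds each DLL's image size in bytes (for example, taken
 * from the map files). Returns the first free address after the last DLL.
 */
unsigned long assign_bases(unsigned long first_base,
                           const unsigned long *image_sizes,
                           unsigned long *bases, size_t count)
{
    unsigned long next = first_base;
    size_t i;

    assert(first_base % BASE_ALIGN == 0);
    assert(count == 0 || (image_sizes != NULL && bases != NULL));

    for (i = 0; i < count; ++i) {
        bases[i] = next;                             /* this DLL loads here */
        next += round_up_to_align(image_sizes[i]);   /* next DLL follows it */
    }
    return next;
}
```

For example, starting at a hypothetical base of 0x60000000 with image sizes of 0x12345, 0x8000, and 0x20000 bytes, the sketch yields bases of 0x60000000, 0x60020000, and 0x60030000, with the next free address at 0x60050000.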

If you are using the Microsoft Visual C++ tools, you can just run the Rebase applet that comes with it. Rebase automatically calculates the appropriate load addresses for your DLLs if you feed them to the applet.

#3. Properly prebind

Make sure your DLLs are properly prebound by using the "bind" utility that comes with your favorite compiler. This marks the import address table (IAT) as read-only, so the loader does not have to privatize a new page for the IAT for each LoadLibrary call.

The IAT provides the system with the specific addresses for all APIs called by a third-party DLL. These addresses are then used to jump to the appropriate API in the system code. The IAT is a minimum of one page (4K) of page-locked data per client of the DLL.

Without prebinding, if a particular DLL has n clients, the loader would need to create and fill in n private copies of the IAT, one for each client. This is a much more expensive operation than a simple page fault.

Note that you must prebind your DLLs against a set of specific target DLLs. The bind utility will automatically calculate the sizes of your DLLs versus the system DLLs. Because binding targets one specific set of DLLs, this technique lets you optimize for only one platform. It probably makes most sense for developers to prebind against Windows 95 rather than Windows NT, because the latter is more likely to be running on a system with more RAM.

#4. Turn on page-fault tracing in Windows 95

You can use page-fault tracing by installing the debug version of Windows 95 and then running SWITCH.BAT to install the retail GUI (graphical user interface). SWITCH.BAT is a batch file supplied with the Win32 SDK. It switches from debug to retail and back, depending upon the options you give. To turn on page-fault tracing, type ".mff" (without the quotation marks) into the wdeb386 kernel debugger. This will dump all faults to your debug terminal. An analysis of this data can tell you all sorts of things, such as how many page faults you're taking, as well as when you're faulting due to reading in code versus data.

It is important to finish one complete initialization before starting something else. In some cases an application calls the OleInitialize routines too early. This can cause thrashing when the application is started in low memory conditions because both the application and the OLE libraries are trying to finish initialization, but neither one can fully fit into memory simultaneously.

#5. Load later

To help application boot time specifically, try to delay loading certain components, such as MPR (network), SHELL.DLL, COMMDLG.DLL, printer drivers, the spooler, and so on, until they are actually needed. Delaying the loading of just one DLL doesn't help as much, but when you can do it for multiple DLLs on an 8-MB (megabyte) computer, your wins start increasing geometrically.
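
One way to delay loading is to wrap each optional DLL in a small "load on first use" helper. The sketch below is only an illustration. LoadLibraryA is the real Win32 call, but lazy_module, lazy_module_get, platform_load, and load_calls are hypothetical names. Off Windows, a stand-in handle replaces the real load so the pattern can still be exercised.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Counts how many real loads happened; for demonstration only. */
int load_calls = 0;

/* Load a module by name. The non-Windows branch is a stand-in. */
static void *platform_load(const char *name)
{
    ++load_calls;
#ifdef _WIN32
    return (void *)LoadLibraryA(name);
#else
    static int stand_in_module;
    (void)name;
    return &stand_in_module;
#endif
}

/* Hypothetical record for a DLL that is loaded only when first needed. */
struct lazy_module {
    const char *name;  /* for example "COMMDLG.DLL" */
    void *handle;      /* NULL until the first successful load */
};

/*
 * Return the module handle, loading the module on the first call only.
 * If the load fails, NULL is returned and the next call tries again.
 */
void *lazy_module_get(struct lazy_module *m)
{
    assert(m != NULL && m->name != NULL);
    if (m->handle == NULL)
        m->handle = platform_load(m->name);
    return m->handle;
}
```

With this pattern, nothing is loaded at boot. The first call to lazy_module_get, made from the code path that actually needs the DLL (for example, when the user first opens a File dialog), performs the load, and later calls reuse the same handle.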

Are you doing small file reads? Sequences of 8-byte file reads at bootup can hinder performance. Do you do unnecessary disk writes at bootup? Look at any large allocations or files you create at boot time that could be moved to later on. Make sure you aren't loading any system DLLs you don't really need, such as the messaging API (MAPI) or the multimedia components.
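
As an illustration of avoiding many small reads, the sketch below pulls a block of a file into memory with one read request and then picks 8-byte records out of the buffer. The function names read_block and record_at and the 8-byte record layout are assumptions for this example. Note that the C stdio layer typically buffers small reads itself, so the pattern matters most when you call unbuffered file APIs directly.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

#define RECORD_SIZE 8  /* assumed fixed record size in bytes */

/*
 * Read up to capacity bytes from the start of the file at path into buf
 * using a single read request. Returns the number of bytes read, or 0 if
 * the file could not be opened.
 */
size_t read_block(const char *path, void *buf, size_t capacity)
{
    FILE *fp;
    size_t got;

    assert(path != NULL && buf != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    got = fread(buf, 1, capacity, fp);
    fclose(fp);
    return got;
}

/*
 * Return a pointer to record number i inside an in-memory block, or NULL
 * if the block does not contain a complete record at that index.
 */
const unsigned char *record_at(const unsigned char *block, size_t block_len,
                               size_t i)
{
    if (i >= block_len / RECORD_SIZE)
        return NULL;
    return block + i * RECORD_SIZE;
}
```

A caller would read the block once at startup and then call record_at for each record it needs, instead of issuing one 8-byte read per record.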

#6. Use Windows 95 call tracing

Use call tracing. Make sure you aren't passing in any bogus search paths for DLLs, or possibly looking for malformed DLLs. In some cases searching the path for DLLs is more expensive than explicitly changing directories to the application directory, loading the DLL, and then changing back to the working directory.

Often applications load six or seven of their own random DLLs, thus allocating and touching huge chunks of memory at bootup, and opening/closing temp files.

You can turn on call tracing in wdeb386 using this sequence of commands:

.c <return> 3 <return> 4 <return> y <return> n <return> ESC

Then watch what calls are made on your behalf down to the file system.

Tuning on Windows 95

Remember that it is especially important to tune your applications on Windows 95, because you may be running on computers with less memory than those running Windows NT.