Under the Hood

Matt Pietrek

Matt Pietrek is the author of Windows 95 System Programming Secrets (IDG Books, 1995). He works at Nu-Mega Technologies Inc., and can be reached at 71774.362@compuserve.com.

QWay back in your July 1994 column, you wrote about shrinking the size of a 16-bit executable for Borland C++ 4.0. Do you have any suggestions for doing the same for version 5 with a 32-bit executable written in straight C? You can imagine my horror when I compiled a program consisting only of a MessageBeep call and discovered it was approximately 32KB!

Fred Bulback

Via CompuServe

AIn my 32-bit Liposuction article this month, I describe some techniques for reducing the size of executable (EXE and DLL) files. However, Fred's question addresses a different issue. Specifically, even the smallest of C/C++ programs ends up bigger than necessary because of the overhead imposed by the runtime library (RTL).

Often a simple program doesn't need all of the compiler's RTL functions to run and you end up with unneeded code in your executables. For these programs, it's possible to replace various pieces of the RTL code and save quite a bit of space. In this column I'll show where many 32-bit executables are bigger than they have to be, and show you a way to slash their sizes dramatically.

I don't mean to be hard on the runtime libraries. They do a large amount of grunt work for you. My point in this column is to show that there's nothing magical about the runtime library. It's just code, and as such, you can replace it with other code. By writing your own custom initialization code, you can remove references to many of the routines that are linked into all C or C++ executables. When you remove the references to those functions, the linker in turn won't include them in the final executable, and your executable will be smaller as a result.

Size Does Matter

Whenever I charge the windmills of executable file bloat, someone inevitably tells me that executable size doesn't matter. After all, hard-drive space is cheap. More importantly, on most hard drives the minimum space that a file takes up is 8KB or 16KB. Thus (as the logic goes), it's not worth trying to make your programs smaller than 8KB.

I have three rebuttals to these arguments. First, in small programs most of this space is taken up by RTL code. If you don't need all this code, why waste the time to load and execute it? Second, while hard drives have large cluster sizes, floppies don't. If you distribute your code on floppy disk, cutting down the size of your executables may let you ship on fewer disks. Third, more and more software is being distributed via the Internet. Why force users with 14.4K modems to wait for useless code to download?

So Why Are Executables Fat?

Let's look at the simplest possible C program and see where unnecessary code is being pulled in. The smallest possible program is arguably the following:


 // TEST.C
int main()
{
    return 0;
}

First, let's compile the program with Visual C++¨ 4.1 and try to make it as small as possible. The command line


 CL /O1 /Fm TEST.C

should produce an executable that's optimized for size (the /01 switch) and that doesn't contain any debug information or padding from incremental linking. The /Fm switch tells the compiler to generate a MAP file. The resultant EXE from this compile is 11,264 bytes. If you were to examine TEST.OBJ, you'd see that function main only contributes three bytes of code.

Obviously, something else is taking up 11KB in the file. Figure 1 shows a portion of TEST.MAP, which gives you a clue what's included in TEST.EXE and using up 11KB.

Figure 1 TEST.MAP from Visual C++ 4.1


  0001:00000000     _main                         00401000 f test.obj
 0001:00000010     _mainCRTStartup               00401010 f LIBC:crt0.obj
 0001:00000120     __amsg_exit                   00401120 f LIBC:crt0.obj
 0001:00000150    __cinit                       00401150 f LIBC:crt0dat.obj
 0001:00000180    _exit                         00401180 f LIBC:crt0dat.obj
 0001:000001a0    __exit                        004011a0 f LIBC:crt0dat.obj
 0001:00000290   __global_unwind2              00401290 f LIBC:exsup.obj
 0001:000002d2   __local_unwind2               004012d2 f LIBC:exsup.obj
 0001:0000032a    __NLG_Return2                 0040132a   LIBC:exsup.obj
 0001:0000033a   __abnormal_termination        0040133a f LIBC:exsup.obj
 0001:0000035d   __NLG_Notify1                  0040135d f LIBC:exsup.obj
 0001:00000366   __NLG_Notify                   00401366 f LIBC:exsup.obj
 0001:00000379    __NLG_Dispatch                 00401379   LIBC:exsup.obj
 0001:00000380    __XcptFilter                   00401380 f LIBC:winxfltr.obj
 0001:00000510   __setenvp                      00401510 f LIBC:stdenvp.obj
 0001:000005f0    __setargv                      004015f0 f LIBC:stdargv.obj
 0001:00000870   ___crtGetEnvironmentStringsA     00401870 f LIBC:aw_env.obj
 0001:00000a10    __setmbcp                      00401a10 f LIBC:mbctype.obj
 0001:00000cd0    ___initmbctable                00401cd0 f LIBC:mbctype.obj
 0001:00000ce0   __ioinit                       00401ce0 f LIBC:ioinit.obj
 0001:00000ec0   __heap_init                    00401ec0 f LIBC:heapinit.obj
 0001:00000ee0    __except_handler3              00401ee0 f LIBC:exsup3.obj
 0001:00000f9d    __seh_longjmp_unwind@4         00401f9d f LIBC:exsup3.obj
 0001:00000fc0   __FF_MSGBANNER                 00401fc0 f LIBC:crt0msg.obj
 0001:00001000    __NMSG_WRITE                   00402000 f LIBC:crt0msg.obj
 0001:000011f0    _free                          004021f0 f LIBC:free.obj
 0001:00001210   _malloc                        00402210 f LIBC:malloc.obj
 0001:00001230   __nh_malloc                    00402230 f LIBC:malloc.obj
 0001:00001270   __heap_alloc                   00402270 f LIBC:malloc.obj
 0001:00001290    ___crtMessageBoxA              00402290 f LIBC:crtmbox.obj
 0001:00001330    _strncpy                       00402330 f LIBC:strncpy.obj
 0001:00001430    __callnewh                     00402430 f LIBC:handler.obj
 0001:00001450   _RtlUnwind@16                  00402450 f kernel32:KERNEL32.dll

I won't describe everything in TEST.MAP, but a couple of space wasters are worth noting. All the functions with "unwind" in the name are related to C++ exception handling or Win32¨ structured exception handling. The same goes for the __XcptFilter and __except_handler3 functions. I didn't see any exception handling in my function main, did you? Also in TEST.MAP, you'll find _malloc, _free, and various other heap routines. Did you see any memory allocation in function main? I didn't think so. The __crtGetEnvironmentStringsA function retrieves the process's environment and breaks it into individual components (for instance, "PATH=..."). Once again, function main didn't do anything with environment strings, yet code to manipulate them ended up in the EXE. Hmm....

Let's look at Borland C++ 5.0 and see if it's any better. On the same TEST.C program, let's use this command line:


 BCC32 -O1 -M TEST.C

This tells the compiler to produce an executable optimized for size, without any debug information, and with a MAP file. This time, the resultant TEST.EXE is 32,768 bytes. Wow! Figure 2 shows just a sampling of lines from the TEST.MAP created with Borland C++.

Figure 2 TEST.MAP from Borland C++ 5.0


   Address         Publics by Value

 0001:0000007C       _main
 0001:00000084       _memchr
 0001:000000A4       _memcpy
 0001:000000C8       _memmove
 0001:00000114       _memset
 0001:00000144       _strcmp
 0001:00000170       _strlen
 0001:0000018C       _memcmp
 0001:000001D0       _strrchr
 0001:00000BBE       _ThrowException(...
 0001:00000D57       _CatchCleanup()
 0001:0000117E       __Local_unwind
 0001:000011FB       __ExceptionHandler
 0001:00001B81       __CurrExcContext
 0001:00001F72       __tpdsc__[typeinfo*]
 0001:00001F89       __tpdsc__[Bad_typeid]
 0001:00001FA5       __tpdsc__[Bad_cast]
 0001:00001FC1       __tpdsc__[typeinfo]
 0001:0000200C       @__InitExceptBlock
 0001:00002077       ___call_terminate
 0001:0000215B       ___call_unexpected
 0001:000021CF       __GetPolymorphicDTC(void*,unsigned int)
 0001:000021EE       __ExceptInit
 0001:000022D0       __GetExceptDLLinfoInternal
 0001:0000230C       __lockDebuggerData()
 0001:00002334       __unlockDebuggerData()
 0001:00002340       ___DefHandler
 0001:00002460       __InitDefaultHander
 0001:0000247A       ___doGlobalUnwind
 0001:0000248F       invokeHnd()
 0001:00002498       __SetExceptionHandler
 0001:000024AD       __UnsetExceptionHandler
 0001:000024E0       _flushall
 0001:000024E8       __allocbuf
 0001:00002568       _fflush
 0001:000025F0       __flushout
 0001:00002650       __initfmode
 0001:00002660       ___fputn
 0001:00002734       _fputs
 0001:00002774       _fputc
 0001:0000290C       _sprintf
 0001:0000298C       __init_streams
 0001:00002B1C       ___vprinter
 0001:0000317C       __xfflush
 0001:000031BC       __flushall
 0001:000031F4       __umask
 0001:000032C4       ___write
 0001:000033A4       __get_handle
 0001:000033E4       __dup_handle
 0001:00003504       __init_handles
 0001:00003648       ___IOerror
 0001:0000368C       ___DOSerror
 0001:000036AC       ___NTerror
 0001:000036C0       __rtl_write
 0001:00003724       ___close
 0001:00003770       ___isatty
 0001:000037A0       ___isatty_osfhandle
 0001:000037B8       ___lseek
 0001:00003830       ___open
 0001:00003A28       __clear87
 0001:00003A3C       __control87
 0001:00003A64       ___realcvt
 0001:00003A6A       ___nextreal
 0001:00003A70       __scantod
 0001:00003A76       __scanrslt
 0001:00003A7C       __fpreset
 0001:00003AAC       ___longtoa
 0001:00003B20       ___utoa
 0001:00003B3C       __matherr
 0001:00003B68       __matherrl
 0001:00003B98       __initmatherr
 0001:000041B8       _free
 0001:000042C0       ___org_malloc
 0001:000042C0       _malloc
 0001:00004754       _realloc
 0001:000047EC       operator delete(void*)
 0001:000047FC       operator delete[](void*)
 0001:0000480C       __virt_reserve
 0001:00004878       __virt_commit
 0001:000048CC       __virt_decommit
 0001:000048F0       __virt_release
 0001:00004978       __ErrorMessage
 0001:00004A68       __ErrorExit
 0001:00004AE0       __ErrorMessageHelper
 0001:00004B90       __abort
 0001:00004BA4       _abort
 0001:00004C10       _exit
 0001:00004C28       __exit
 0001:00004C60       __initwild
 0001:00004E5C       __addarg
 0001:00005180       _raise
 0001:000051F8       __create_shmem
 0001:00005200       __init_exit_proc
 0001:000052AC       __startup
 0001:00005414       __cleanup
 0001:0000548C       __terminate
 0001:000054A0       _getdate

Looking at Borland C++'s TEST.MAP, you'll see several of the same space wasters that I found in the Visual C++ executable, like routines for exception handling and memory allocation. What's more, the Borland C++ executable includes code for C++ runtime type information (RTTI), as well as many STDIO.H "FILE *" routines (fputs, fflush, and so on). Also, there is code for sprintf-style string formatting and floating-point math support. With all these routines, it's not hard to see why the executable is 32KB. The bigger question is, "Why do you need all this runtime library code for such a simple program?"

Some of you may be thinking "Why not use the runtime library DLL that comes with the compiler?" Microsoft names this MSVCRTxx.DLL, while Borland's is named CW32xxxx.DLL. The xs correspond to the particular version number of the DLL. These DLLs remove the RTL code from your executables and put it in a DLL shared by other executables.

While using the RTL DLL is certainly a viable solution for medium to large programs, it's overkill for tiny programs. For starters, code is code. It doesn't matter where you put the RTL code, it's still going to load and execute and take time to do so. More importantly, by linking to the RTL DLL, your executable won't run if the DLL isn't around. If you plan to distribute your program, you'll almost certainly need to also distribute this large RTL DLL. You can't assume that the DLL will be present on everybody's system.

Breaking It Down

"Wait," you say, "the standard RTL code that my compiler supplies performs many important tasks like calling static constructors and destructors. It also sets up things so that you can use functions like malloc transparently. If I remove that initialization code, won't critical RTL functions stop working?" Yes. However, this isn't necessarily the end of the world. Let's break up the RTL code into categories to see why not.

Internal initialization and shutdown functions. In some cases (such as argc/argv processing), you can replace the standard code with your own smaller version that doesn't reference other RTL functions. In other cases, you may not need a particular language feature. For instance, if you don't have any static constructors or destructors, it doesn't matter if your replacement startup code doesn't call them.

User-callable functions that require initialization before they can be used, but are easily replaced. For instance, malloc assumes that the startup code has initialized the heap. However, the malloc function can easily be replaced with the Win32 HeapAlloc function, which doesn't require any initialization.

Large routines that have direct (or nearly direct) equivalents in the Win32 API. For example, in many cases the Win32 wsprintf function can be used as a replacement for the C sprintf function, saving quite a bit of space in your executable. The heap functions fall into this category as well as the preceding category.

Relatively small RTL functions that don't reference other functions. An example of such a function would be strchr, which finds the first occurrence of a character within a NULL-terminated string. It's perfectly OK to use the compiler-supplied versions of these functions. Why reinvent the wheel?

Large routines with no equivalent Win32 functionality. If your code needs to use functions like scanf, you'll have to bite the bullet and absorb whatever size hit they impose.

My point in breaking apart the RTL is to show that, for moderately simple programs, there are various types of RTL code that you can replace to save space in your executable. The key is to do as little work as possible. If it looks like a major hassle to replace something, or if required functionality won't be there, don't do it—go ahead and use the standard RTL code. I've found that many of my programs make few demands on the RTL, and are surprisingly easy to shrink using a library I've written.

The TINYCRT Library

To show you how easy it is to shrink executables, I've created what I call TINYCRT. With just a little effort, I was able to make both Borland and Microsoft versions. A friend of mine has even used the Microsoft version of TINYCRT on the DEC Alpha.

I want to point out up front that TINYCRT isn't something that you'll always want to use. It has significant restrictions and limitations. If your program can live within these confines, though, you can achieve serious size reductions.

As currently implemented, TINYCRT doesn't initialize your static constructors or destructors. If you use threads and runtime library routines that need per-thread synchronization (such as strtok), these routines won't work properly. Based on reading the runtime library code, it's possible that some types of exception handling may not work, although I didn't have any trouble with exception handling in my testing.

In simple terms, TINYCRT is good primarily for small, single-threaded applications that don't use static constructors and destructors. If your program doesn't work with TINYCRT, don't use it! On the other hand, quite a few programs work well with TINYCRT. I went back and switched several of my sample programs from prior MSJ columns and articles to use TINYCRT and found a 12KB savings to be the norm. Put another way, I was routinely creating 2KB or 3KB executables out of my old utilities.

The heart of TINYCRT lies in replacing the standard RTL startup code that comes with your compiler. The compiler-supplied startup code goes through all sort of gyrations to initialize various things, which in turn causes the linker to bring in other RTL routines. This is what causes the 11KB Microsoft executables or the 32KB Borland executables. The TINYCRT startup code does the bare minimum: just enough to call your main or WinMain routine.

Figure 3 shows CRT0TWIN.C, which is the startup code for GUI applications. Note that the only routine in the file is called WinMainCRTStartup. This is the name that the Microsoft linker uses when deciding where a GUI executable's entry point will be. The first byte of WinMainCRTStartup is where the operating system transfers control to begin execution of the program. In WinMainCRTStartup, the code first retrieves the process's command line and scans past the filename portion at the beginning. This pointer becomes the lpszCommandLine argument to your WinMain function. Next, the code calls GetStartupInfo to get the information needed for the nCmdShow argument to WinMain. A call to GetModuleHandle retrieves the executable's HMODULE.

Figure 3 CRT0TWIN.C


 //==========================================
// TINYCRT - Matt Pietrek 1996
// Microsoft Systems Journal, October 1996
// FILE: CRT0TWIN.C
//==========================================
#include <windows.h>

#ifndef __BORLANDC__
// Force the linker to include KERNEL32.LIB
#pragma comment(linker, "/defaultlib:kernel32.lib")
#endif

//
// Modified version of the Visual C++ 4.1 startup code.  Simplified to
// make it easier to read.  Only supports ANSI programs.
//

void __cdecl WinMainCRTStartup( void )

{
    int mainret;
    char *lpszCommandLine;
    STARTUPINFO StartupInfo;

    lpszCommandLine = GetCommandLine();

    // Skip past program name (first token in command line).

    if ( *lpszCommandLine == '"' )  //Check for and handle quoted program name
    {
        // Scan, and skip over, subsequent characters until  another
        // double-quote or a null is encountered

        while( *lpszCommandLine && (*lpszCommandLine != '"') )
            lpszCommandLine++;

        // If we stopped on a double-quote (usual case), skip over it.

        if ( *lpszCommandLine == '"' )
            lpszCommandLine++;
    }
    else    // First token wasn't a quote
    {
        while ( *lpszCommandLine > ' ' )
            lpszCommandLine++;
    }

    // Skip past any white space preceeding the second token.

    while ( *lpszCommandLine && (*lpszCommandLine <= ' ') )
        lpszCommandLine++;

    StartupInfo.dwFlags = 0;
    GetStartupInfo( &StartupInfo );

    mainret = WinMain( GetModuleHandle(NULL),
                       NULL,
                       lpszCommandLine,
                       StartupInfo.dwFlags & STARTF_USESHOWWINDOW
                            ? StartupInfo.wShowWindow : SW_SHOWDEFAULT );

    ExitProcess(mainret);
}

With all this information in hand, WinMainCRTStartup calls whatever WinMain function you've defined. After WinMain returns, WinMainCRTStartup code calls ExitProcess, passing whatever value WinMain returned as the exit code. Pretty easy! If you've ever looked at the startup code for 16-bit Windows¨ apps, you might remember that they had to call quasi-documented system functions like InitTask, InitApp, and WaitEvent in the right sequence before you could do anything. In comparison, the bare startup code for a Win32 app is wonderfully simple.

For console applications, the startup code is a little different. Figure 4 shows CRT0TCON.C, which is my minimal version. For C and C++ console applications, the Microsoft linker looks for a function called mainCRTStartup and uses that as the entry point. The only real job for mainCRTStartup is to call your main function. However, main takes three arguments: argc, argv, and env. Strange as it may seem, the operating system doesn't hand you argc, argv, and env on a silver platter. It's the job of the runtime library code to parse the command line and environment into nice tokens. Since I rarely see the env ("environment") argument used, I punted and didn't implement any support for it. For argc and argv processing, mainCRTStartup calls a routine I wrote called _ConvertCommandLineToArgcArgv. I'll describe this function momentarily.

Figure 4 CRT0TCON.C


 //==========================================
// TINYCRT - Matt Pietrek 1996
// Microsoft Systems Journal, October 1996
// FILE: CRT0TCON.C
//==========================================
#include <windows.h>
#include "argcargv.h"

#ifndef __BORLANDC__
// Force the linker to include KERNEL32.LIB
#pragma comment(linker, "/defaultlib:kernel32.lib")
#endif

int __cdecl main(int, char **, char **);    // In user supplied code

void __cdecl mainCRTStartup( void )
{
    int mainret, argc;

    argc = _ConvertCommandLineToArgcArgv( );

    mainret = main( argc, _ppszArgv, 0 );

    ExitProcess(mainret);
}

To finish describing my minimal startup code, I had to cheat a tiny bit for Borland C++. Borland's TLINK32 uses a different method to locate the address of the executable's entry point, meaning that it won't automatically use mainCRTStartup or WinMainCRTStartup as the entry point. It would be messy to explain exactly what TLINK32 does and how I dealt with it. Suffice it to say that I wrote two very minimal assembler files, one for GUI apps and another for console applications. C032CON.ASM contains just a JMP to mainCRTStartup, while C032WIN.ASM is just a JMP to WinMainCRTStartup. For the benefit of Borland C++ users without an assembler, I've included the corresponding OBJ files in the download file.

Turning to the argc and argv processing, you'll find my minimal version in ARGCARGV.C (see Figure 5). I could have used the argc and argv processing that comes with the compiler's RTL. However, these versions call several other RTL routines that cause the linker to bring in quite a bit of additional code. I was able to write my own version in a little over 256 bytes of code. It won't always give you the identical results to what you'll get from your compiler's RTL, but it's pretty good for your typical command line requirements.

Figure 5 ARGCARGV.C


 //==========================================
// TINYCRT - Matt Pietrek 1996
// Microsoft Systems Journal, October 1996
// FILE: ARGCARGV.C
//==========================================
#include <windows.h>
#include "argcargv.h"

#define _MAX_CMD_LINE_ARGS  128

char * _ppszArgv[_MAX_CMD_LINE_ARGS+1];

int __cdecl _ConvertCommandLineToArgcArgv( void )
{
    int cbCmdLine;
    int argc;
    PSTR pszSysCmdLine, pszCmdLine;
    
    // Set to no argv elements, in case we have to bail out
    _ppszArgv[0] = 0;

    // First get a pointer to the system's version of the command line, and
    // figure out how long it is.
    pszSysCmdLine = GetCommandLine();
    cbCmdLine = lstrlen( pszSysCmdLine );

    // Allocate memory to store a copy of the command line.  We'll modify
    // this copy, rather than the original command line.  Yes, this memory
    // currently doesn't explicitly get freed, but it goes away when the
    // process terminates.
    pszCmdLine = HeapAlloc( GetProcessHeap(), 0, cbCmdLine+1 );
    if ( !pszCmdLine )
        return 0;

    // Copy the system version of the command line into our copy
    lstrcpy( pszCmdLine, pszSysCmdLine );

    if ( '"' == *pszCmdLine )   // If command line starts with a quote ("),
    {                           // it's a quoted filename.  Skip to next quote.
        pszCmdLine++;
    
        _ppszArgv[0] = pszCmdLine;  // argv[0] == executable name
    
        while ( *pszCmdLine && (*pszCmdLine != '"') )
            pszCmdLine++;

        if ( *pszCmdLine )      // Did we see a non-NULL ending?
            *pszCmdLine++ = 0;  // Null terminate and advance to next char
        else
            return 0;           // Oops!  We didn't see the end quote
    }
    else    // A regular (non-quoted) filename
    {
        _ppszArgv[0] = pszCmdLine;  // argv[0] == executable name

        while ( *pszCmdLine && (' ' != *pszCmdLine) && ('\t' != *pszCmdLine) )
            pszCmdLine++;

        if ( *pszCmdLine )
            *pszCmdLine++ = 0;  // Null terminate and advance to next char
    }

    // Done processing argv[0] (i.e., the executable name).  Now do th
    // actual arguments

    argc = 1;

    while ( 1 )
    {
        // Skip over any whitespace
        while ( *pszCmdLine && (' ' == *pszCmdLine) || ('\t' == *pszCmdLine) )
            pszCmdLine++;

        if ( 0 == *pszCmdLine ) // End of command line???
            return argc;

        if ( '"' == *pszCmdLine )   // Argument starting with a quote???
        {
            pszCmdLine++;   // Advance past quote character

            _ppszArgv[ argc++ ] = pszCmdLine;
            _ppszArgv[ argc ] = 0;

            // Scan to end quote, or NULL terminator
            while ( *pszCmdLine && (*pszCmdLine != '"') )
                pszCmdLine++;
                
            if ( 0 == *pszCmdLine )
                return argc;
            
            if ( *pszCmdLine )
                *pszCmdLine++ = 0;  // Null terminate and advance to next char
        }
        else                        // Non-quoted argument
        {
            _ppszArgv[ argc++ ] = pszCmdLine;
            _ppszArgv[ argc ] = 0;

            // Skip till whitespace or NULL terminator
            while ( *pszCmdLine && (' '!=*pszCmdLine) && ('\t'!=*pszCmdLine) )
                pszCmdLine++;
            
            if ( 0 == *pszCmdLine )
                return argc;
            
            if ( *pszCmdLine )
                *pszCmdLine++ = 0;  // Null terminate and advance to next char
        }

        if ( argc >= (_MAX_CMD_LINE_ARGS) )
            return argc;
    }
}

As a side benefit to writing my own argc/argv processing, you can use the _ConvertCommandLineToArgcArgv in your GUI applications. That's right! If you've ever wanted argc/argv-style arguments in a GUI program, include ARGCARGV.H in your source and call the _ConvertCommandLineToArgcArgv function. The return value is the number of arguments (that is, argc). The _ppszArgv global variable is an array of string pointers to each argument.

RTL Routines

The remainder of the TINYCRT files are replacements for some common RTL routines that come with your compiler. Why replace your compiler's RTL routines? In many cases, you can take advantage of Win32 system functions to write smaller versions of the routine in question. In other cases, the compiler-supplied version of a routine may be overkill, and you can replace it with something that's good enough for your requirements. For example, many functions that work with characters and strings (such as strcmpi) are actually multibyte-enabled (MBCS). The code to implement true multibyte support in the RTL is relatively large. If your code will always use single-byte (ANSI) characters, it's worthwhile to prevent the multibyte code from being linked in.

If you use TINYCRT, you may want to replace additional RTL functions beyond what I've implemented here. To replace a function, simply write your own version, add it to the list of OBJ files in the appropriate makefile (TINYCRT.MS or TINYCRT.BOR) and rebuild. The TINYCRT sources provide numerous examples. I won't describe every routine that I've replaced, but I will mention a few of the more interesting replacements here.

The first routines I replaced when writing TINYCRT were printf and sprintf, since this family of functions takes up several KB of memory. The Win32 API has the sprintf and wvsprintf functions, which in most cases are perfectly adequate. The Win32 version of these functions doesn't handle floating point values or some of the more esoteric formatting that the ANSI specification allows. But, the Win32 routines have almost always been good enough for the programs I write.

Implementing printf is a straightforward affair. The code in PRINTF.C (see Figure 6) passes its input arguments to wvsprintf, which formats them into a buffer. Next, I call WriteFile, passing it the string that wvsprintf has formatted. Since printf writes its output to stdout, I need the file handle for stdout. A call to the function GetStdHandle returns exactly the handle I need. The printf replacement is an excellent example of how a standard RTL can be replaced with a series of Win32 system calls and a little glue code.

Figure 6 PRINTF.C


 #include <windows.h>
#include <stdio.h>
#include <stdarg.h>

#ifndef __BORLANDC__
// Force the linker to include USER32.LIB
#pragma comment(linker, "/defaultlib:user32.lib")
#endif

int __cdecl printf(const char * format, ...)
{
    char szBuff[1024];
    int retValue;
    DWORD cbWritten;
    va_list argptr;
          
    va_start( argptr, format );
    retValue = wvsprintf( szBuff, format, argptr );
    va_end( argptr );

    WriteFile(  GetStdHandle(STD_OUTPUT_HANDLE), szBuff, retValue,
                &cbWritten, 0 );

    return retValue;
}

The next set of routines that TINYCRT replaces are the standard memory allocation functions (malloc, free, new, delete, and so on). Like printf and friends, the memory- management functions typically take up several KB of memory. The Win32 API's HeapAlloc family of functions substitutes quite nicely for the C/C++ heap routines. For instance, my implementation of malloc is dead simple:


 return HeapAlloc( GetProcessHeap(), 0, size );

It's important to point out here that using HeapAlloc will almost always be slower than using the compiler's built-in memory allocator. If performance is an issue, by all means use your compiler's built-in heap routines. The purpose of TINYCRT is to produce the smallest executables possible, not the fastest. Incidentally, Visual C++ 4.1 uses HeapAlloc to implement malloc and new, but Visual C++ 4.2 went to a different scheme to improve allocation performance.

I replaced the remaining functions (strupr, strlwr,
atol, atoi, strcmpi, stricmp, and so on) to eliminate what I call the "locale" hit. All of these functions (as well as many others that I didn't replace) are locale-enabled, which means they'll work with double-byte character sets. The code to implement locale support takes up a good chunk of memory. If you only use single-byte characters (that is, the ANSI character set), this code is never executed and is wasted space.

By implementing these locale-enabled functions with either Win32 equivalents or by rewriting them from scratch, you prevent the locale code from being linked in. In the case of Visual C++, an internal routine ( _isctype ) is used by quite a few RTL functions and drags in the locale code. The TINYCRT version of _isctype (ISCTYPE.C, in Figure 7) doesn't use locales so it can make the executable much smaller.

Figure 7 ISCTYPE.C


 // 1996 - John Robbins, with Matt Pietrek
#include <ctype.h>

int __cdecl _isctype ( int c , int mask )
{
    /* c valid between -1 and 255 */
    if (((unsigned)(c + 1)) <= 256)
    {
        return ( _pctype[c] & mask ) ;
    }
    else
        return 0;
}

The selection of routines that I replaced is entirely arbitrary. I came up with the list by linking many of my programs with TINYCRT and determining which RTL routines took up the most space in my executables. The linker MAP file is your friend in determining how much space each function uses. As I identified routines that could be replaced (for instance, printf, malloc, and strcmpi), I wrote replacements and added them to TINYCRT. No scientific method here.

Without a doubt, there are additional functions that could be added to TINYCRT. For instance, the entire set of stdio FILE * functions (fread, fwrite, fprintf, and so on) would be a nice addition. Bear in mind that the compiler-supplied versions of these functions implement sophisticated caching to improve performance. Anything you write in a couple of lines of code most likely won't be as fast. Again, this comes back to my point that TINYCRT is supposed to create small executables. If it's not fast enough for your needs, or if it can't handle all of your program's requirements, don't use it. You're no worse off than you were in the first place.

Using TINYCRT with Visual C++

If you use Visual C++ (or a compatible compiler), TINYCRT is extremely easy to use. I deliberately wrote it so that you don't have to change any of your source code. It's very easy to switch between using and not using TINYCRT—this way, you can make sure everything is working correctly, and also see if you succeeded in decreasing the size of your EXE. The TINYCRT library for Visual C++ is called LIBCTINY.LIB and is built from TINYCRT.MS. If you want a debug version of the code, just define DEBUG=1 on the NMAKE command line and rebuild.

Using LIBCTINY.LIB couldn't be simpler. Just add it to the list of library files in your project or makefile. If you explicitly link to LIBC.LIB or LIBCMT.LIB, make sure LIBCTINY.LIB appears before those files in the library list. For ease of use on my system, I've put LIBCTINY.LIB in the LIB directory where the other Visual C++ libraries live. If you're a dinosaur like me and still build programs entirely from the command line, LIBCTINY.LIB can be specified on the compiler command line:


 CL /O1 TEST.CPP LIBCTINY.LIB

If you don't see the size improvements that you'd expect from TINYCRT with Visual C++, I'll let you in on something I learned the hard way. Both LIBCTINY.LIB and the compiler-supplied LIBCxx.LIB implement versions of printf, malloc, and so forth. The linker has to pick just one version of each referenced function and include it in the executable. When resolving references, LINK searches the libraries in the order they were presented to the linker. Default libraries like LIBC.LIB are searched last. Once LINK selects a routine from a particular library, that library effectively moves to the head of the library search order. (See KnowledgeBase article Q31998 for a more detailed description.) Thus, if your program uses routines from the default compiler RTL, the default RTL version of other functions may also be used even though LIBCTINY.LIB has a version of the function.

If you don't live and breathe linkers, the above paragraph probably sounded like Greek. If it made sense to you, you know your linker rules and there's some additional help that LINK can offer when you're trying to figure out which version of a function was linked in, and why. Check out the /VERBOSE linker switch. It may take a little while to figure out exactly what the output tells you, but I found it very helpful in understanding why the standard LIBC version of functions were being used even though my LIBCTINY had its own version of the function.

When using LIBCTINY.LIB, you may encounter a linker error like this:


 << start output >>
LIBC.lib(crt0.obj) : error LNK2005: _mainCRTStartup  
  already defined in libctiny.lib(CRT0TCON.OBJ)
test.exe : fatal error LNK1169: one or more multiply 
  defined symbols found
<<end output >>

This is to be expected, given the Byzantine library search process described above. You can work around this by using the /FORCE:MULTIPLE switch. The things you gotta do sometimes.

To show a small example of linking with LIBCTINY.LIB I wrote the TEST program, which exercises some of the functionality that TINYCRT implements. The source file is TEST.CPP, and the TEST.MS file is a small makefile that builds TEST using LIBCTINY.LIB. These files aren't listed here, but can be downloaded from http://www.msj.com.

Using TINYCRT with Borland C++

The Borland C++ version of TINYCRT is called CW32TINY.LIB and is built from TINYCRT.BOR. A debug version can be built by defining DEBUG=1 on the MAKE command line. The CW32TINY.LIB file should go ahead of CW32.LIB or CW32MT.LIB in TLINK32's library list.

In addition to including CW32TINY.LIB in the library list, you'll also need to replace C0W32.OBJ or C0X32.OBJ. If you have a GUI application, replace C0W32.OBJ with C032WIN.OBJ. Otherwise, for console mode programs, replace C0X32.OBJ with C032CON.OBJ.

To demonstrate a minimal example of a Borland C++ program using TINYCRT, I made the TEST program (mentioned above) buildable using Borland C++ 5.0. To build it, run MAKE on the TEST.BOR makefile. To see how CW32TINY.LIB and the C032xxx.OBJ files are used, check out the contents of TEST.BOR.

Wrap Up

TINYCRT isn't a panacea. It has a very specific goal: to make those little programs you create as small as they possibly can be. In writing TINYCRT, size and ease of use always took precedence over speed and 100 percent ANSI compatibility. However, I found that many of my everyday programs are able to use TINYCRT without a hitch. The beauty of TINYCRT is that you're free to implement and add whatever you want to it!

I'd like to thank John "Senior Boy" Robbins for his help with TINYCRT. John wrote a couple of the RTL replacement routines and was my beta tester. He also was my sounding board and offered numerous suggestions.

Have a question about programming in Windows? Send it to Matt at 71774.362@compuserve.com

This article is reproduced from Microsoft Systems Journal. Copyright © 1995 by Miller Freeman, Inc. All rights are reserved. No part of this article may be reproduced in any fashion (except in brief quotations used in critical articles and reviews) without the prior consent of Miller Freeman.

To contact Miller Freeman regarding subscription information, call (800) 666-1084 in the U.S., or (303) 447-9330 in all other countries. For other inquiries, call (415) 358-9500.