Porting Applications from Windows NT/X86 to Windows NT/Alpha AXP

Digital Equipment Corporation
Maynard, Massachusetts

October 1994

Abstract

This article describes how to port Windows NT™ applications on the Intel® X86 platform to DEC® Alpha AXP™. Issues of interest to Windows NT version 3.5 developers, such as compiler descriptions, porting methodologies, and Alpha AXP optimization, are discussed in this article.

This article contains the following chapters and appendixes:

Chapter 1. Introduction. Provides an overview of the porting process from Intel X86 to Alpha AXP.
Chapter 2. Compiler Descriptions. Provides an overview of the CLAXP, DEC FORTRAN, and ASAXP compilers.
Chapter 3. Porting Overview. Outlines the steps to port your Windows NT version 3.5 application to Alpha AXP.
Chapter 4. Architectural Issues. Itemizes the architectural differences between X86 and Alpha AXP.
Chapter 5. Public Domain Tools for Alpha AXP. Provides a list of public domain tools available for development on Alpha AXP.
Chapter 6. Alpha AXP Application Tuning and Optimization. Describes the features for tuning and optimizing applications on Alpha AXP.
Appendix A. Compiler Options Comparison.
Appendix B. Data Type Comparison.
Appendix C. Data Type Natural Alignment.

Chapter 1. Introduction

This chapter provides an overview of porting applications from Microsoft® Windows NT™ version 3.5 on Intel® X86 to Windows NT version 3.5 on DEC® Alpha AXP™ machines. This chapter briefly describes Alpha AXP and helps you assess the effort needed to port your applications from Intel X86 to Alpha AXP.

1.1. Windows NT Version 3.5 for Alpha AXP

Alpha AXP is a high-performance, 64-bit, operating-system-neutral architecture developed by Digital Equipment Corporation ("DEC," or "Digital"). Alpha AXP has a 64-bit physical and virtual address space and processes 64-bit integers and floating-point numbers.

The Alpha AXP architecture is a scalable architecture that will support faster and faster chip designs over its long design lifetime. First-generation chips (DECchip 21064 family) are available with clock speeds from 133MHz to 200MHz. Second-generation chips (DECchip 21064A family) are available with clock speeds from 225MHz to 275MHz. These chips provide industry-leading performance using dual-issue designs, meaning that two instructions can be processed each clock cycle.

Digital is currently shipping client systems for Windows NT for Alpha AXP using a 150MHz 21064 and server systems that support up to 4-way SMP and that support 190MHz 21064 processor boards. Both give excellent performance, and because they have the same byte ordering as Intel X86, they interoperate very efficiently with Intel systems.

Windows NT is a Microsoft operating system that has been ported to and optimized for the Alpha AXP architecture through collaborative engineering work between Digital and Microsoft. It is one of three operating systems (the others being OSF/1 UNIX® and OpenVMS) currently shipping on the Alpha AXP platforms.

Windows NT is a modern, 32-bit, multitasking operating system that exploits some advanced features, such as 64-bit registers, of the Alpha AXP architecture. Windows NT supports a 32-bit virtual address space on Alpha AXP, as it does when implemented on Intel X86 architectures.

1.2. Execution of X86 DOS and Win16 Executables on Alpha AXP

Part of Windows NT is a subsystem or facility that allows MS-DOS® and Win16 Intel executables to run on Alpha AXP and interoperate with native Win32® applications. These applications run more slowly than native applications for many reasons, but for personal productivity tools, mail, and many other low-performance applications, performance may be very satisfactory.

In Windows NT version 3.5, OLE 2.01 allows enhanced interoperability between Win16 (16-bit Microsoft Windows®–based applications) and Win32 (32-bit Windows-based) applications. Each Win16 application can run in its own address space, improving system robustness. Performance of these Intel X86 executables on RISC microprocessors such as Alpha AXP is also significantly improved.

The features described above are important to your clients, who will want to interoperate between native, high-performance Win32 Alpha AXP applications and legacy X86 Win16 executables, such as lower-performance applications and personal productivity tools. Developers may also find value in this feature. Your support staff will therefore need to understand how Win16 and Win32 applications interoperate and share data on both Intel X86 and Alpha AXP platforms.

1.3. Porting Applications from Windows NT/X86 to Windows NT/Alpha AXP

In contrast to the support for Intel Win16 executables in Windows NT, native Win32 Intel programs will not execute on Alpha AXP. Recompilation of these applications is the proper method to provide support for Alpha AXP. This recompilation is usually very easy; some developers port significant applications in less than one working day.

Many popular CASE tools and compilers, including the Microsoft SDK, are supported on both Intel X86 Windows NT and on Alpha AXP Windows NT. These common tools ease migration and allow Alpha AXP to be a development platform for deployment on Intel NT, and vice versa.

The following table lists the basic porting steps:

Step	Procedure
1	Set up the source and make files on Alpha AXP.
2	Ensure the components that make up the development environment are available on Alpha AXP.
3	Modify the make files as necessary, so they work in the Alpha AXP nmake environment.
4	Compile and link your application.
5	Correct any errors and re-compile and/or re-link your application.
6	Test your application.
7	Optimize your application.

Chapter 2. Compiler Descriptions

This chapter describes the CLAXP compiler, as well as other Digital compilers available on Alpha AXP. With the exception of the compiler, the Win32 SDK is the same on both Intel X86 and Alpha AXP. The compiler on the Win32 SDK for Alpha AXP is called CLAXP, not CL386. An alias, CL, is now used on all platforms, making make files more uniform.

2.1. CLAXP

CLAXP is Digital's and Microsoft's compiler on Windows NT for Alpha AXP. The CLAXP compiler generates Alpha AXP–specific optimized machine code. The compiler front end, or code interpreter, looks very similar to Microsoft's CL386 compiler. The interpreter accepts and parses most of the CL386 switches. The back end is Digital’s code optimizer that generates code optimized specifically for the Alpha AXP architecture.

For further details on the CLAXP compiler, see the CLAXP Compiler Specifications Version 8.00 document. This document is located in \mstools\bin\claxp.txt. For more information on designing your application to take advantage of the Alpha AXP architecture, see the Windows NT for Alpha AXP Calling Standard document.

2.2. DEC FORTRAN

The DEC FORTRAN compiler for Alpha AXP is an implementation of full language FORTRAN-77 conforming to American National Standard FORTRAN, ANSI X3.9-1978. The DEC FORTRAN compiler will compile any source code pool that strictly adheres to ANSI FORTRAN-77.

The compiler also supports extensions to the ANSI standard, including a number of extensions defined by the DEC FORTRAN compiler that runs on OpenVMS, DEC OSF/1, and ULTRIX systems. The following list describes some of the more significant extensions:

Additional statements such as DO WHILE, ENDDO, ACCEPT, and TYPE.
Support for dynamic memory allocation and POINTER data type.
Support for reading and writing binary data files in big-endian and VAX, IBM®, and CRAY® floating point formats.
Support for 64-bit signed integers using INTEGER*8 and LOGICAL*8.

Other vendors' FORTRAN compiler extensions may differ. However, because Digital's FORTRAN extensions have often become de facto standards, compatibility is likely even when syntax extensions are used.

Certain FORTRAN extensions specific to Microsoft FORTRAN that are not yet supported by DEC FORTRAN are:

Dynamically allocated arrays.
SELECT CASE, INTERFACE TO, CYCLE, and EXIT statements.

These extensions (except the INTERFACE TO statement) will be supported in the next major release of DEC FORTRAN.

In addition to language extensions, the DEC FORTRAN run-time library provides a number of built-in utility routines beyond the ANSI-defined intrinsic functions. Other compilers are likely to differ in what utility routines are available.

The development kit provided with DEC FORTRAN supports a command line interface and the nmake utility. Source code debugging with a graphical user interface is provided via the windbg utility.

DEC FORTRAN uses GEM as its back-end on all Alpha AXP platforms. The DEC FORTRAN compiler provides a multiphased optimizer that is capable of performing optimizations across entire programs. Although builds may take more time and memory compared to compilers that optimize less thoroughly, the improved performance of highly optimized code at run time is worth the added time.

DEC FORTRAN is run-time compatible with CLAXP. This capability allows you to mix and match FORTRAN and C/C++ modules to meet your application needs. Run-time compatibility with other compilers has not been tested, though it may work if proper calling standards are followed.

The DEC FORTRAN kit includes dynamic-link library (DLL) versions of its Run-Time Libraries (RTLs). These RTLs can be reproduced and distributed royalty-free worldwide.

2.3. Windows NT Assembler (ASAXP)

ASAXP compiles source files written in Alpha AXP assembly language. You need to be familiar with both the Alpha AXP 21064 DECchip architecture and the Alpha AXP assembly language. For assembler syntax, see the file \mstools\bin\asaxp.txt.

Chapter 3. Porting Overview

In most cases, porting your application simply means recompiling and relinking the application on Alpha AXP. If you use the Win32 SDK to develop your application, you only have to make minor changes to your development environment. This chapter describes how to port your application to Alpha AXP.

3.1. Modifying the nmake Environment

If you use the nmake utility on your Intel X86 system, your application's build environment will need very few changes to work on Alpha AXP. If you use another build system, be sure that the system is available on Alpha AXP. Otherwise, you need to modify your make file(s) to work with nmake. The following sections describe the changes you need to be aware of.

3.1.1. Modify host machine symbols

The _ALPHA_ symbol defines the Alpha AXP target environment. The make file should pass -D_ALPHA_=1 to the CLAXP compiler. If the make file includes <ntwin32.mak>, this is automatically ensured.

Additionally, ensure that the symbols _MIPS_ and _X86_ are not defined.

3.1.2. Modify compiler options

Most of the compiler options are the same on both Intel X86 and Alpha AXP. There are two compiler flag changes on Alpha AXP to be aware of:

/Gn flags (These flags do not support Intel X86 optimization.)
/On flags (Certain flags are different from Intel X86.)

See Appendix A for a list of compiler options. The \mstools\bin\claxp.txt file provides detailed information about the compiler flags.

3.1.3. Other build environments

If you use other build environments and your build software is not ported to Alpha AXP, you must rebuild your dependency files. The nmake compiler, linker, and resource macros are predefined in a file named \mstools\h\ntwin32.mak. Use this file as a start for rebuilding your application. The Building Applications chapter in the Windows NT SDK Programming Techniques document provides further information about the build process.

3.2. Porting References to Header Files

In most cases, you do not need to modify the header files. Compile the application first and investigate any header file errors generated by the compiler. The following sections describe some common header file changes that may be required.

3.2.1. Machine-specific #ifdef statements

Applications developed on multiple platforms may contain preprocessor conditionals for different machine types (for example, #ifdef _X86_ or #ifdef _MIPS_). These conditionals must be extended to cover Alpha AXP: add an #ifdef _ALPHA_ section containing the appropriate architecture-specific code. These symbols are defined in ntwin32.mak.
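
A minimal sketch of such a conditional section follows. It assumes the _X86_, _MIPS_, and _ALPHA_ symbols are supplied by the make file as described above; the function name target_architecture and the "unknown" fallback branch are hypothetical and exist only for illustration.

```c
/* Hypothetical helper: reports which machine symbol the make file
   defined. Real applications would place their architecture-specific
   code in the corresponding branch. */
const char *target_architecture(void)
{
#if defined(_ALPHA_)
    return "Alpha AXP";      /* Alpha AXP-specific code goes here */
#elif defined(_MIPS_)
    return "MIPS";           /* existing MIPS-specific code */
#elif defined(_X86_)
    return "Intel X86";      /* existing X86-specific code */
#else
    return "unknown";        /* no machine symbol was defined */
#endif
}
```

Keeping an explicit fallback branch makes a missing machine symbol visible at run time instead of silently selecting the wrong code path.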

3.3. Recompiling

Once the necessary changes have been made to the build environment, executing the nmake utility will compile your application. You can also compile the application from the command line by invoking the CLAXP compiler directly. Most of the CL386 options are available on the CLAXP compiler. Appendix A contains a list of Alpha AXP compiler options.

3.4. Relinking

The link command on Alpha AXP is the same as on Intel X86 and understands the same flags.

3.5. Debugging

There are three debuggers currently available on Alpha AXP systems to help you debug code at the application and kernel levels: windbg, NTSD, and KD. The following sections provide a brief introduction to these debuggers.

3.5.1. windbg

The WinDebug debugger, windbg, is a GUI-based debugging tool. It is located in \mstools\bin or in the Win32 SDK Tools program group. The debugger allows you to set breakpoints; examine values of local variables, registers, and assembly-language instructions; and so on.

To debug your application, compile the code with the /Zi and /Od options. Then link with debug:full and debugtype:cv (CodeView®-style debugging information), or with debug:full and debugtype:coff (for global only, non-static symbols). The windbg debugger also has full 64-bit register support on Alpha AXP NT systems.

3.5.2. NTSD

You can use the NT Symbolic Debugger (NTSD) for assembler programs that have been compiled and linked with the debugtype:coff and debug:partial options using the link32 command. There are two commands (rL, rF) available on Windows NT for AXP that are not available on Intel X86. These commands enable you to examine large integers and floating point registers.

For information on commands and how to use NTSD, see the Tools User's Guide in Microsoft Win32 SDK.

3.5.3. KD

The Kernel Debugger (KD) allows you to debug kernel-mode executables and device drivers. KD can also be used to perform remote driver debugging between different architectures. For example, you can use KD to debug an Alpha AXP driver from an Intel machine and vice versa.

Chapter 4. Architectural Issues

This chapter briefly discusses the architectural issues you need to consider when porting applications from Windows NT for Intel X86 to Windows NT for Alpha AXP.

For a detailed description of the architectural considerations that need to be addressed when porting your application, see the AXP Notes document. You can locate this document in the Win32 SDK Tools program group. The \mstools\bin\claxp.txt file is also a good reference for compiler-specific features.

4.1. Variable Argument Lists

For defining variable argument lists, the CLAXP compiler supports two header files:

ANSI standard <stdarg.h>
Traditional <varargs.h>

Use the macros va_start(list,v), va_arg(list,mode), and va_end(list) as defined in the header file <stdarg.h>. All programs that properly use the varargs macros for variable argument list processing will port unchanged to Windows NT on Alpha AXP.

The following example illustrates the proper use of the varargs macros:

// Example: VARARGS.C
    /* VARARGS.C illustrates passing a variable number of 
       arguments using the following macros:
          va_start        va_arg      va_end
    *
    * Also the ANSI and UNIX type:
    *      va_list
    * and the UNIX types:
    *      va_alist        va_dcl
    */
    
    #include <stdio.h>
    #include <stdarg.h>
    int average( int first, ... );
    
    int main( void )
    {
        /* Call with 3 integers (-1 is used as terminator). */
        printf( "Average is: %d\n", average( 2, 3, 4, -1 ) ); 
    
        /* Call with 4 integers. */
        printf( "Average is: %d\n", average( 5, 7, 9, 11, -1 ) );
     
        /* Call with just -1 terminator. */
        printf( "Average is: %d\n", average( -1 ) );
        return 0;
    }
    
    /* Returns the average of a variable list of integers. */
    int average( int first, ... )
    {
        int count = 0, sum = 0, i = first;
        va_list marker;
    
        va_start( marker, first );      /* Initialize variable arguments */
        while( i != -1 )                /* Stop at the -1 terminator */
        {
            sum += i;
            count++;
            i = va_arg( marker, int);
        }
        va_end( marker );               /* Reset variable arguments */
        return( sum ? (sum / count) : 0 );
    }

4.2. Uninitialized Variables

An error message, "Memory Access Errors", is displayed for uninitialized variables due to inappropriate values on the high order bits of those variables. Use the Disassembly option in windbg to locate the improperly initialized variables.

When using the Disassembly option, a listing from the compiler is very helpful; create a .COD file using the /FAcs CLAXP (compiler) option. For problems that appear only in optimized builds, specify the -O* and -Zi options and use the listing file that is created.

If you notice behavior such as a floating point exception, invalid access, and so on, examine the variables involved and investigate the call stack until you find the offending variable.

4.3. Data Alignment and the UNALIGNED Keyword

The Alpha AXP memory architecture naturally aligns and references data in 2-, 4-, or 8-byte quantities. In Windows NT version 3.5, there is a switch to enable the alignment fault exceptions. (Automatic fixups are disabled by default.) Turning off automatic fixups by default is desirable because it allows Alpha AXP application developers to locate alignment faults in their own applications.

For maximum performance, align your data structure components on natural boundaries. Also, avoid using byte and word length integers in favor of longwords or quadwords wherever possible. (For example, avoid using chars and shorts; use ints or int64s instead.)
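
As a rough illustration of natural alignment, the sketch below compares two orderings of the same structure members. The structure names and the is_naturally_aligned helper are hypothetical; the exact sizes depend on the compiler, so the code computes offsets with offsetof rather than assuming them.

```c
#include <stddef.h>

/* Members in this order typically force padding before 'value'
   and after 'flag2' so that 'value' stays naturally aligned. */
struct padded {
    char flag1;
    int  value;
    char flag2;
};

/* Same members with the largest one first; the two chars share
   the space that would otherwise be padding. */
struct reordered {
    int  value;
    char flag1;
    char flag2;
};

/* Returns nonzero if a member of 'size' bytes located at 'offset'
   sits on a natural boundary (offset is a multiple of its size). */
int is_naturally_aligned(size_t offset, size_t size)
{
    return (offset % size) == 0;
}
```

Calling is_naturally_aligned(offsetof(struct padded, value), sizeof(int)) checks a member's placement, and comparing sizeof(struct padded) with sizeof(struct reordered) shows how much padding the ordering costs.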

If unaligned access to data is required, you can use the UNALIGNED pointer qualifier defined in the Windows NT system include files. Performance with UNALIGNED is better than letting the kernel handle alignment faults (though not as good as using aligned data). When so instructed, the compiler inserts routines to handle the unaligned data and thus avoids operating system traps, resulting in significantly improved performance.

The UNALIGNED pointer is portable across all Windows NT platforms, regardless of whether the machine has alignment restrictions or not.

The following is a table of UNALIGNED pointer examples:

UNALIGNED int	*ip;	// pointer to unaligned int
int UNALIGNED	*ip;	// pointer to unaligned int
int * UNALIGNED	ip;	// wrong
typedef struct _FOO UNALIGNED	*PFOO;	// pointer to unaligned struct
typedef SYMBOLS UNALIGNED	*psymbols;	// pointer to unaligned symbol
UNALIGNED PLONG	NextEntry;	// wrong
PLONG UNALIGNED	NextEntry;	// wrong
LONG UNALIGNED	*NextEntry;	// correct

There are routines available to help locate the source of alignment traps. Performance gains of up to 20 percent have been observed when the programmer instructs the compiler in such a way that an alignment trap will not occur.

See Appendix C for details on data type natural alignment.

4.4. Structure Packing

The #pragma pack directive can be used to pack structure members together more tightly than the default packing the compiler would use. In some cases this is necessary to map structures to preexisting data. In other cases, it is to reduce memory use. It results in structure members that no longer have their natural alignment; thus CLAXP treats access to these structure members as unaligned. The code generated is bigger and slower than aligned access because the compiler will load the 16-bit, 32-bit, or 64-bit object byte by byte, or by checking the alignment dynamically. This is still much faster than a hardware trap. /Zp8 is the default packing CLAXP would use.

Use the #pragma pack directive only when necessary, such as when compiling a data structure that will be read from a disk file. For example:

#pragma pack(1)
 typedef struct
{
...
}
#pragma pack()  //resume default

If you use the #pragma pack directive, it is necessary to appropriately declare pointers as UNALIGNED. Note that you may incur a large performance penalty for UNALIGNED access.
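
The following sketch combines the two techniques. The DISK_RECORD type and the record_length function are hypothetical. On Windows NT, UNALIGNED comes from the system include files; the fallback definition here is an assumption that only lets the sketch compile where those files are not included.

```c
#include <stddef.h>

/* Assumption: the Windows NT system include files normally define
   UNALIGNED. Define it as empty when they have not been included. */
#ifndef UNALIGNED
#define UNALIGNED
#endif

#pragma pack(1)
typedef struct {                /* hypothetical on-disk record layout */
    char tag;
    int  length;                /* offset 1: not naturally aligned */
} DISK_RECORD;
#pragma pack()                  /* resume default packing */

/* The pointer is declared UNALIGNED so the compiler generates safe
   access code for 'length' instead of relying on an alignment trap. */
int record_length(DISK_RECORD UNALIGNED *rec)
{
    return rec->length;
}
```

Restoring the default packing immediately after the structure limits the performance cost to the one type that actually needs the packed layout.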

4.5. Data Structures: 64-Bit Data Types

On Alpha AXP platforms, LARGE_INTEGER types are treated as one naturally aligned quadword. CLAXP adds 64-bit integer support. CLAXP handles type long double as a 64-bit floating point type (rather than 80-bit). See Appendix B for a data types comparison list.

4.6. Integer Division by Zero Exception

By default, integer division by zero is reported as an exception on Alpha AXP. The exception may occur on Alpha AXP but not on Intel X86. This is considered a latent bug in the original code and should be corrected.
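
One way to correct such code is to test the divisor before dividing. The helper below is a hypothetical sketch, not part of any Windows NT library; how the caller handles the zero case is left to the application.

```c
/* Hypothetical helper: divides only when the divisor is nonzero.
   Returns 1 and stores the result in *quotient on success;
   returns 0 and leaves *quotient untouched otherwise. */
int checked_divide(int dividend, int divisor, int *quotient)
{
    if (divisor == 0)
        return 0;               /* caller decides how to recover */
    *quotient = dividend / divisor;
    return 1;
}
```

The average function in section 4.1 applies the same idea inline: it returns 0 rather than dividing when no values were supplied.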

4.7. Floating Point Behavior

All Windows NT platforms use identical IEEE floating point formats. For finite floating point values, Alpha AXP floating point behavior is identical to that of MIPS® and Intel X86. However, for non-finite floating point values (for example, infinity, denormals, and NaNs) the Alpha AXP will raise a floating point exception when such values are encountered. The floating point exceptions are:

STATUS_FLOAT_DIVIDE_BY_ZERO	0xC000008E
STATUS_FLOAT_INVALID_OPERATION	0xC0000090
STATUS_FLOAT_OVERFLOW	0xC0000091
STATUS_FLOAT_UNDERFLOW	0xC0000093
STATUS_FLOAT_INEXACT_RESULT	0xC000008F

Note that if you use default options, floating point operations that do not report exceptions under Intel X86 may raise floating point exceptions on Alpha AXP.

At the present time, the CLAXP compiler supports the following IEEE related options:

/QAieee (Same as /QAieee1, described below.)
/QAieee0
IEEE floating point NaNs, Infinities, and denormals are not supported in the compiled code. Underflows are quickly forced to zero, and the use of a NaN or Infinity raises an exception. This is the default value, and should be used for all applications except those that require IEEE floating point exception behavior, because it produces the fastest execution speed.
Run-time library routines may still produce NaNs and denormals, however, so the use of the _matherr routine to handle those situations is recommended. If an application does require support for IEEE NaNs and denormals, use the /QAieee option (equivalently, /QAieee1).
/QAieee1
IEEE floating point NaNs, Infinities, and denormals are supported. Use this value for applications that expect IEEE-compliant, masked response handling of non-finite operands.
/QAieee2
Same as /QAieee1, but IEEE Inexact Operation exceptions are also enabled. Use this value only for applications requiring the IEEE inexact operation exception to be raised (this is almost never needed).

4.8. Imprecise Location of Exceptions

For normal compile modes, if one of the exceptions listed in the previous section does occur, it is likely that the exception PC does not point to the instructions that actually cause the trap. If using a debugger, look for the offending floating point instruction a few instructions prior to the Fir (continuation address). If you are looking near the beginning of a function, look at the last few instructions in the calling frame.

Compile your code using the /QAieee1 option if you want the exceptions to be precise.

4.9. CONTEXT Structure Definition

The CONTEXT structure is an architecture-dependent data structure that contains register data. If you have code that accesses fields in this structure, you will need to modify the application for Alpha AXP.

4.10. Page Size Assumptions

The page size is architecture-dependent (4 KB on Intel and 8 KB on Alpha AXP) and should not be hard-coded into applications. If the application assumes the page size to be 4 KB, it will not work correctly on Windows NT on Alpha AXP.
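
Instead of hard-coding the value, an application can ask the operating system for it. The sketch below uses the Win32 GetSystemInfo function and the dwPageSize field of SYSTEM_INFO. The function names system_page_size and round_up_to_page are hypothetical, and the sysconf branch is an assumption added only so the sketch also builds outside Windows.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>     /* assumption: POSIX fallback for non-Windows builds */
#endif

/* Returns the page size reported by the operating system. */
unsigned long system_page_size(void)
{
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return (unsigned long)info.dwPageSize;
#else
    return (unsigned long)sysconf(_SC_PAGESIZE);
#endif
}

/* Rounds 'bytes' up to a whole number of pages without assuming
   any particular page size. */
unsigned long round_up_to_page(unsigned long bytes)
{
    unsigned long page = system_page_size();
    return ((bytes + page - 1) / page) * page;
}
```

Code written this way gives the same logical result on both platforms even though the page size differs.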

4.11. LARGE_INTEGER & Quadword Types

Although the LARGE_INTEGER data type is a 64-bit integer, it is not a quadword. A quadword is a 64-bit integer data type and is not supported on Windows NT for Alpha AXP. LARGE_INTEGER is created from an array of two longwords. The LARGE_INTEGER data type can only be used in conjunction with a set of run-time library functions. (For example, LargeIntegerAdd, and so on.)
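
To show why a value built from two longwords needs helper routines, the sketch below adds two such values by hand. The TwoLongwords type and two_longwords_add function are hypothetical stand-ins; they are not the Windows NT LARGE_INTEGER definition or the implementation of any run-time library routine.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit value stored as two longwords. */
typedef struct {
    uint32_t low;       /* low-order longword */
    int32_t  high;      /* high-order longword (carries the sign) */
} TwoLongwords;

/* Adds two values, propagating the carry out of the low longword
   into the high longword. */
TwoLongwords two_longwords_add(TwoLongwords a, TwoLongwords b)
{
    TwoLongwords sum;
    uint32_t carry;

    sum.low = a.low + b.low;                /* wraps modulo 2^32 */
    carry = (sum.low < a.low) ? 1u : 0u;    /* a wrap means a carry out */
    sum.high = (int32_t)((uint32_t)a.high + (uint32_t)b.high + carry);
    return sum;
}
```

Because the carry must be handled explicitly, ordinary arithmetic operators cannot be applied to the structure directly, which is why a function-call interface is used.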

4.12. Multithreaded Granularity of Access

Ideally, you would like to have atomic load/store of shared data. An atomic load/store requires a single instruction. On the Intel X86 platform, the size of shared data is 1, 2, or 4 bytes, but on Alpha AXP it is 4 or 8 bytes.

To ensure portability across platforms, it is necessary to protect multithreaded access to shared data structures with locks. (For example, EnterCriticalSection, LeaveCriticalSection.)

4.13. setjmp/longjmp Jump Buffer

The jump buffer is compiler-specific and version-specific. If you use setjmp/longjmp, do not link objects produced by various compilers or versions.

4.14. Assembler Source Files

The CLAXP compiler does not produce any assembler source files.

Chapter 5. Public Domain Tools for Alpha AXP

This chapter lists some of the common utilities that are publicly available on the Internet for Alpha AXP.

5.1. Alpha AXP Developer Support Home Page

You can access Digital’s Alpha AXP Developer Support home page on the World Wide Web by specifying the following Universal Resource Locator (URL):

http://www.digital.com/www-swdev/

From the Alpha AXP Developer Support home page, follow the steps below to access the public domain tools:

	To reach...	Click on...
1	Alpha AXP Technical Support page	TECH button (on the Alpha AXP Support home page)
2	Microsoft Windows NT Software page	Microsoft Windows NT (in the Software Area)
3	Windows NT Public Domain page (this is where the tools are stored)	Unsupported Software Tools Built for Alpha AXP (in the Public Domain Software area)

5.2. List of Public Domain Tools

Most of the public domain utilities also contain the executables for Intel-based Windows NT machines. Note that all of these tools were collected from the Internet and are copyrighted as per the agreements in the individual source code. Digital makes no warranties, either written or implied, concerning this software. The following table lists the public domain tools available on the Internet.

Tool	Description
bsdcmpat.lib	Contains a library of routines to help in porting to NT. Routines include bcmp, bcopy, bstring, bzero, getopt, index, and isctype. Headers include ctype, getopt, paths, string, strings, and unistd.
cal.exe	Prints a calendar. If you specify a number between 1 and 12 for month, only that month is printed for that year. Year can be between 1 and 9999.
cat.exe	Reads each file in sequence and displays it on the standard output.
cmp.exe	Compares two files. With no options, cmp makes no comment if the files are the same. If they differ, it reports the byte and line number at which the difference occurred to the standard output.
color.exe	Changes the color of the foreground and background. The available colors are black, blue, green, cyan, red, and magenta.
comm.exe	Compares sorted data.
compress.exe	Compresses files using a modified Lempel-Ziv algorithm. This command is compatible with the compress and decompress programs used on UNIX systems. This is version 4 and supports up to 16-bit compression.
egrep.exe	Searches a file for regular expressions. Egrep patterns are full regular expressions.
grep.exe	Searches a file for regular expressions. Grep patterns are limited regular expressions in the style of 'ex'.
flex.exe	Generates output as C code source file via programs that recognize lexical patterns in text, Fast Lexical Analyzer Generator (FLEX).
fold.exe	Folds the contents of the specified files, or the standard input if none are specified, breaking the lines to have a maximum of 80 characters.
head.exe	Gives the first n lines of the specified files or the standard input.
ls.exe	Acts as a UNIX ls work-alike. Some options are different or absent. Use the -? command line argument for help. The executable is compiled for I386 and should be UNIX-code–compatible, but hasn't been tested.
mawk.exe	Implements the AWK programming language.
mewinnt.exe	Invokes the MicroEMACS windows editor.
par.exe	Copies, by way of a filter, its input to its output, changing all white characters (except newlines) to spaces, and reformatting each paragraph. Paragraphs are delimited by vacant lines, which are lines containing no more than a prefix, suffix, and intervening spaces.
perl.exe, perlglob.exe	Combines, using the Perl language, some of the features of C, sed, awk, and shell.
sed.exe	Uses regular-expression routines from EMACS (may not be fast). GNU sed is a batch stream editor. For speed, use Perl.
soss.exe	Runs SOSS, a file server conforming to SUN Microsystems' NFS protocol version 2.
tar.exe	Executes a version of the tape archiver command available on most UNIX machines based upon the GNU tar utility. It does not yet utilize the tape drive available.
uniq.exe	Reports repeated lines in a file. The repeated lines must be adjacent in order to be found.
unshar.exe	Extracts files from the SHELL archive.
viewps.exe	PostScript text extractor.
win100.exe	Invokes Kermit and terminal emulator.
winvn.exe	Runs the Visual Usenet news reader for Microsoft Windows.
xstr.exe	Extracts and hashes strings in a C program.
xvi.exe	Runs a portable multiwindow version of the UNIX editor, 'vi'.
yacc.exe	Runs Berkeley Yacc, an LALR parser generator. It has been made as compatible as possible with AT&T® Yacc.
zip.exe, unzip.exe	Packages and compresses (archive) files. Lists/test/extracts from a ZIP archive file.

Chapter 6. Alpha AXP Application Tuning and Optimization

This chapter contains a summary of tips and hints for optimizing your application on Windows NT for Alpha AXP. For detailed performance analysis and optimization techniques on Windows NT, see the Optimizing Windows NT (Volume 3 of the Windows NT Resource Kit) book by Russ Blake from Microsoft Press®.

6.1. Compiler Options

An important tool in optimizing your application is the compiler. Once your application is working, recompile the application with optimization turned on (using the /Ox switch). The performance gain will vary from application to application but typically you can expect a 20–30 percent gain in performance. Also consider specifying some of the other CLAXP optimization switches summarized below for any potential gain in performance.

Since you often develop your application initially in debug mode to facilitate testing and debugging of your application (using the -Zi -Od switches), remember that the compiler has turned off optimization. You can quickly turn off debugging in nmake by passing nmake a nodebug flag—for example, >nmake nodebug=1.

6.1.1. CLAXP

Use the following options for optimization:

Option	Description
/Ox	full optimization (except in-lining)
/O2	full optimization (same as /Ox /Ob2)
/d2O3 /Ob2 /Oi	full optimization with byte vectorization
/d2Gt64	use of Global Pointer

You can also use the UNALIGNED keyword for unaligned data. See the Alignment Fixups section below.

6.1.2. DEC F77

If you are using the DEC F77 compiler for your FORTRAN application, the following table lists the optimization switches that are not enabled by default:

Optimization Switch	Description	Notes
/align:dcommons	Aligns COMMON data blocks on natural boundaries up to eight bytes.	These switches may be enabled en masse via /fast.
/assume:noaccuracy_sensitive	Allows floating point operations to be reordered.
/math_library:fast	Uses versions of some intrinsics that trade a small amount of accuracy for improved performance.
/inline:all	Inlines every possible routine.	Use these switches carefully. By default, the compiler automatically inlines and unrolls according to its heuristics.
/unroll:<count>	Specifies how many times loops are unrolled.

6.2. Tools

Key tools for performance are described below. For more detailed information on the use of these tools, see the Optimizing Windows NT book.

PerfMon (Performance Monitor)

Performance Monitor (PerfMon) analyzes the performance of your system or application. Use this tool to analyze all the key areas for potential bottlenecks—CPU usage, disk I/O activity, memory statistics, network traffic, and so forth.

WAP (Windows API Profiler)

WAP profiles calls to the Windows API, showing, for example, how much time is spent in each Win32 API call. It can be run without recompiling your application and is part of the Windows NT SDK.

CAP (Call Attributed Profiler)

CAP profiles your entire application (how much time is spent in each function). Use it to identify the "hot spots" in your program so you can focus on those areas for optimization. CAP requires that you recompile your application with the -Gh option on the C compiler. (Currently only C programs are supported.) It is part of the Windows NT SDK.

WST (Working Set Tuner)

WST can improve the speed of your program by reducing processor cache bottlenecks. It does this by telling the linker to reorder the functions in your program in the order that most reduces paging. Using WST involves several steps. See the Optimizing Windows NT book for more details.

6.3. Alignment Fixups

In both versions of Windows NT (3.1 and 3.5), the operating system automatically resolves Alpha AXP alignment faults at run time by default. In most cases this is a desirable feature because the alternative would be for an application to unexpectedly terminate with a data misalignment exception if any alignment fault occurred.

However, allowing the operating system to resolve alignment faults can degrade the performance of your application if there are hundreds or thousands of alignment fixups per second. The rate of alignment fixups can be monitored using the PerfMon or wperf performance tools.

To eliminate alignment errors in your application on Alpha AXP, change the default operating system control for alignment exceptions so that alignment faults become visible to your application. You can then use the debugger to locate the source of the alignment faults.

6.3.1. Changing operating system defaults

You can change operating system defaults in the registry either with the new SDK tool axpalign (the easier, preferred method) or with regedt32. The two methods are discussed below.

To enable alignment fault exceptions (using axpalign) enter:
axpalign /enable
To disable alignment fault exceptions enter:
axpalign /disable
To enable Alpha AXP alignment fault exceptions (using regedt32) add the following value in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\:
EnableAlignmentFaultExceptions : REG_DWORD : 0x1
To disable alignment fault exceptions and revert to the default operating system behavior, enter the following value:
EnableAlignmentFaultExceptions : REG_DWORD : 0x0

After using either method, you will need to reboot your system. The changes apply system-wide and affect all applications. Use this method only while locating alignment faults in your own system or application. Otherwise, older applications that still contain alignment errors may terminate with data misalignment exceptions.

If your application requires that the operating system handle alignment faults regardless of the operating system alignment exception control, insert the following statement early in your program:

SetErrorMode(SEM_NOALIGNMENTFAULTEXCEPT);

This statement must be executed before any alignment error can occur. The effect of the statement is to set a flag that causes the operating system to handle alignment faults for your program.
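
As a minimal sketch of where this call might sit, the helper below wraps it in a function meant to be called first thing in main. The name RequestKernelAlignmentFixups is hypothetical. Two details are assumptions added for illustration and are not part of the original statement: the second call, which preserves any error-mode bits that were already set, and the #ifdef _WIN32 guard, which only lets the fragment compile on other systems, where it does nothing.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not part of the SDK): ask the operating system to
 * fix up alignment faults for this process, regardless of the system-wide
 * alignment exception control.
 * Returns 1 if the request was made, 0 if the call is not available. */
int RequestKernelAlignmentFixups(void)
{
#ifdef _WIN32
    /* SetErrorMode returns the previous mode; the second call restores
     * those bits and keeps the alignment flag set. */
    UINT previous = SetErrorMode(SEM_NOALIGNMENTFAULTEXCEPT);
    SetErrorMode(previous | SEM_NOALIGNMENTFAULTEXCEPT);
    return 1;
#else
    /* Assumption: on non-Windows builds there is nothing to request. */
    return 0;
#endif
}
```

Calling the helper before any other work keeps to the rule that the flag must be set before the first alignment error can occur.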

6.3.2. Displaying alignment fixups per second

Use the PerfMon or wperf tools to display the number of alignment fixups per second. If this value is zero, you do not need to debug your system or application for alignment faults. A rate below roughly 100 to 500 alignment fixups per second is generally not considered a performance problem. At higher rates, however, addressing alignment faults is essential for obtaining good performance on RISC architectures.

6.3.3. Techniques for addressing alignment faults

There are several techniques for addressing alignment faults:

Use the UNALIGNED compiler keyword if your application must access unaligned data. It is faster than letting the kernel fix up unaligned data. See Chapter 4 for a description of the UNALIGNED type qualifier.
Do not use the #pragma pack(1) directive unless absolutely necessary. This will cause more alignment faults on all RISC architectures, including Alpha AXP. See Chapter 4 for a description of the #pragma pack directive.
Avoid misleading casts. Pointer casts are used to cast one pointer type to another. One subtle and undetectable problem with this can occur when casting to a pointer with stronger alignment requirements. For example,
```
long *ip;
...
percent = *(double *)ip / 100.0;
```
will cause an alignment trap if the value of pointer ip is not a multiple of 8. Since ip is merely a pointer to a 4-byte long type, there is no reason to believe it will be a multiple of 8. The compiler cannot detect this at compile time. Another example is when the address for some data is obtained from a memory allocation function that makes no guarantee about the alignment of the address it returns.
For example, if MyAlloc() does not round up addresses to 8-byte multiples, then:
```
TimePtr = MyAlloc(8);
...
MyQuerySystemTime((PLARGE_INTEGER)TimePtr);
```
may result in an alignment trap because MyQuerySystemTime is expecting the address of a properly aligned LARGE_INTEGER type.
Note that on MIPS platforms this particular code example will work as expected even though the code is wrong, because LARGE_INTEGER types are treated as two naturally aligned longwords. On Alpha AXP platforms the code may result in an alignment trap because LARGE_INTEGER types are treated as one naturally aligned quadword.
Do not pass unaligned pointers to another function. If you take the address of a member of an unaligned or packed structure and pass it as an argument to a function, with or without a cast, it is likely that an alignment trap will occur. For example:
```
struct _FOO {
    ...
    long count;
};
struct _FOO UNALIGNED *pFoo;
void SetCounter(long *ip);
...
pFoo->count++;               // OK: the compiler knows
                             // the long may be unaligned.
SetCounter(&pFoo->count);    // WRONG! Function
                             // SetCounter is expecting
                             // a pointer to a normal,
                             // aligned long.
```
One workaround in this case is to declare SetCounter to receive an UNALIGNED pointer rather than a normal pointer. Another workaround is to copy the count into a local variable, pass the address of the local variable to SetCounter, and then copy the result back.
Note that the Windows NT system services do not explicitly enforce quadword alignment of quadword pointer parameters.
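
Two of the techniques above can be sketched in portable C. This is an illustration under stated assumptions, not the SDK's own method: LoadDoubleUnaligned uses memcpy in place of the UNALIGNED qualifier, which is a swapped-in technique for compilers that lack that keyword, and PackedFoo, SetCounter, and ResetFooCounter are hypothetical stand-ins for the _FOO example.

```c
#include <string.h>

/* Read a double from an address that may not be a multiple of 8.
 * memcpy makes no alignment assumption, so no aligned quadword load
 * is issued against the source address. */
double LoadDoubleUnaligned(const void *p)
{
    double value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Hypothetical packed record: the one-byte tag pushes count onto an
 * odd offset, so &pFoo->count is not a safe "long *". */
#pragma pack(push, 1)
struct PackedFoo {
    char tag;
    long count;
};
#pragma pack(pop)

/* Stand-in for a routine that expects a normal, aligned long. */
void SetCounter(long *ip)
{
    *ip = 100;
}

/* Local-copy workaround: never pass &pFoo->count to SetCounter. */
void ResetFooCounter(struct PackedFoo *pFoo)
{
    long local = pFoo->count;   /* compiler knows the member is packed */
    SetCounter(&local);         /* local variable is naturally aligned */
    pFoo->count = local;        /* copy the result back */
}
```

The local copy costs two extra moves, but it keeps the called routine free of any special pointer qualifier.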

Appendix A. Compiler Options Comparison

CLAXP	CL386	Description
/?,/help	/?,/help	print compiler options
/batch	/batch	batch compiler mode
/Bd	/Bd	verbose, shows all default macros and include files
/c	/c	compile only, no link
/C	/C	preserve comments during preprocessing
/D<name>{=|#}<text>	/D<name>{=|#}<text>	define macro or constant
/E	/E	preprocess to stdout
/EP	/EP	preprocess to stdout, but no #line
/F<num>	/F<num>	set stack size
/Fa[file]	/Fa[file]	name assembly listing file
/FA<s|c>	/FA<a|s|c>	configure assembly listing; in CLAXP, /FAa is the same as /FAc
/Fd[file]	/Fd[file]	specify program database .PDB file
/Fe<file>	/Fe<file>	specify executable file
/FI[file],/Fc[file]	/FI[file]	specify forced include file, use /FAc, /FAcs
/Fm[file]	/Fm[file]	specify linker map file
/Fo<file>	/Fo<file>	specify object file
/Fp<file>	/Fp<file>	specify precompiled header file
/Fr[file]	/Fr[file]	specify source browser file
/FR[file]	/FR[file]	specify extended .SBR file
not available	/G3	optimize for 80386
not available	/G4	optimize for 80486 (default)
not available	/G5	optimize for Pentium
not available	/Gd	__cdecl calling convention
/Ge	/Ge	enable stack checking calls (default)
/Gf	/Gf	enable string pooling
/Gh	/Gh	enable hook __penter function call
not available	/Gr	__fastcall calling convention
/Gs[num]	/Gs[num]	disable stack checking calls
/Gt<n>	/Gt<n>	threshold for gp-relative data
/Gy	/Gy	separate functions for linker
/Gz	/Gz	__stdcall calling convention
/H<num>	/H<num>	max external name length
/I<dir>	/I<dir>	add to include search path
/J	/J	default char type is unsigned
/link	/link	linker control options
not available	/LD	create .DLL
/MD	/MD	link with MSVCRT.LIB
/ML	/ML	link with LIBC.LIB
/nologo	/nologo	suppress copyright message
/O	not available	maximum speed (same as /Oi /Ob2, or /O2)
/O1	/O1	minimize space
/O2	/O2	maximize speed (same as /O)
not available	/Oa	assume no aliasing
/Ob<0|1|2>	/Ob<0|1|2>	inline expansion (default n=0)
/Od	/Od	disable optimizations (default if /Zi and no /O*)
not available	/Og	enable global optimization
/Oi[-]	/Oi[-]	enable intrinsic functions (default=/Oi-)
/Op[-]	/Op[-]	improve floating-pt consistency (decrease performance)
not available (see /O1)	/Os	minimize space
not available (see /O2)	/Ot	maximize speed
not available	/Ow	assume cross-function aliasing
/Ox	/Ox	maximum opts (/Ogityb1 /Gs for CL386, /Oi /Ob2 for CLAXP)
not available	/Oy[-]	enable frame pointer omission
/P	/P	preprocess to file
not available	/QmipsGx	generate MIPS-specific instructions
/QAgl	not available	generate fetches and stores in units of longword
/QAgq	not available	generate fetches and stores in units of quadword
/QAieee, /QAieee1	not available	IEEE floating point NaNs, infinities, and denormals support
/QAieee0	not available	disable IEEE floating point support
/QAieee2	not available	/QAieee1 and IEEE Inexact Operation exception support
/Tc<source file>	/Tc<source file>	compile file as .c
/Tp<source file>	/Tp<source file>	compile file as .cpp
/u	/u	remove all predefined macros
/U<name>	/U<name>	remove predefined macro
/V<string>	/V<string>	set version string
/vd<0|1>	/vd<0|1>	disable/enable vtordisp
/vmb	/vmb	best case for pointers to class members
/vmg	/vmg	full generality for pointers to class members
/vms	/vms	define single inheritance
/vmm	/vmm	define multiple inheritance
/vmv	/vmv	define virtual inheritance
/w,/W0	/w,/W0	disable all warnings
/W<n>	/W<n>	set warning level (default n=1)
/WX	/WX	treat warnings as errors
/X	/X	ignore standard include directories
/Yc[file]	/Yc[file]	create .PCH file
/Yd	/Yd	put debug info in every .OBJ
/Yu	/Yu	use .PCH file
/YX[file]	/YX[file]	automatic .PCH
/Z7	/Z7	C7 style CodeView information
/Za	/Za	ANSI compatibility (implies /Op)
/Zd	/Zd	debugging information
/Ze	/Ze	enable extensions (default)
/Zg	/Zg	generate function prototypes
/Zh	/Zh	home arguments (low-level debugging)
/Zi	/Zi	prepare for debugging (CodeView I information for windbg)
not available	/Zl	omit default library name in .OBJ
/Zn	/Zn	turn off SBRPACK for .SBR files
/Zp[n]	/Zp[n]	pack structs on n-byte boundary
/Zs	/Zs	syntax check only

Appendix B. Data Type Comparison

Data Type	Alpha AXP (bytes)	Intel X86 (bytes)
char	1	1
unsigned char	1	1
short	2	2
unsigned short	2	2
int	4	4
unsigned int	4	4
long	4	4
unsigned long	4	4
void *	4	4
char *	4	4
float	4	4
double	8	8
long double	8	10

All data types are identical in size on the two platforms except for long double.
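
If you want to confirm these sizes on the compiler you are actually using, a small lookup table built from sizeof is one way to do it. The sketch below is only an illustration; the names TypeSize, kTypeSizes, and TypeSizeOf are hypothetical, and the values it reports depend on the compiler and target, so they may differ from the table above on other platforms.

```c
#include <stddef.h>
#include <string.h>

/* One row of the comparison table: type name and its size in bytes
 * as seen by the compiler building this file. */
struct TypeSize {
    const char *name;
    size_t bytes;
};

static const struct TypeSize kTypeSizes[] = {
    { "char",           sizeof(char) },
    { "unsigned char",  sizeof(unsigned char) },
    { "short",          sizeof(short) },
    { "unsigned short", sizeof(unsigned short) },
    { "int",            sizeof(int) },
    { "unsigned int",   sizeof(unsigned int) },
    { "long",           sizeof(long) },
    { "unsigned long",  sizeof(unsigned long) },
    { "void *",         sizeof(void *) },
    { "char *",         sizeof(char *) },
    { "float",          sizeof(float) },
    { "double",         sizeof(double) },
    { "long double",    sizeof(long double) }
};

/* Return the size in bytes of the named type, or 0 if the name is
 * not one of the rows above. */
size_t TypeSizeOf(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof kTypeSizes / sizeof kTypeSizes[0]; i++) {
        if (strcmp(kTypeSizes[i].name, name) == 0)
            return kTypeSizes[i].bytes;
    }
    return 0;
}
```

Printing each row on both platforms and comparing the output is a quick check that code depending on the size of long double has been found.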

Appendix C. Data Type Natural Alignment

Data Type	Alignment Starting Position
8-bit character string	Byte boundary
16-bit integer	Address that is a multiple of 2 (word alignment)
32-bit integer	Address that is a multiple of 4 (longword alignment)
64-bit integer	Address that is a multiple of 8 (quadword alignment)
IEEE floating single S	Address that is a multiple of 4 (longword alignment)
IEEE floating double T	Address that is a multiple of 8 (quadword alignment)
IEEE floating extended X	Address that is a multiple of 16 (octaword alignment)
IEEE floating single S complex	Address that is a multiple of 4 (longword alignment)
IEEE floating double T complex	Address that is a multiple of 8 (quadword alignment)
IEEE floating extended X complex	Address that is a multiple of 16 (octaword alignment)
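
The rule in this table reduces to a single test: an item is naturally aligned when its address is a multiple of its size. As a rough sketch, a helper such as the following could be used in debug builds to check pointers before they are cast to a type with a stronger alignment requirement. The name IsNaturallyAligned is hypothetical, and the use of uintptr_t assumes a compiler that provides <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if address p is a multiple of size (2 for a word, 4 for a
 * longword, 8 for a quadword, 16 for an octaword), 0 otherwise.
 * A size of 0 is rejected and reported as not aligned. */
int IsNaturallyAligned(const void *p, size_t size)
{
    if (size == 0)
        return 0;
    return ((uintptr_t)p % size) == 0;
}
```

For example, a pointer whose value is 0x1004 passes the check for a longword but fails it for a quadword.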

Additional Reading

In addition to this article, the following documents will help you in porting your application from Windows NT for Intel X86 to Alpha AXP:

Microsoft Win32 Software Development Kit for Alpha AXP, on \mstools\bin\axpnotes.txt.

Microsoft Win32 Software Development Kit for Alpha AXP, on \mstools\bin\claxp.txt.

Microsoft Win32 Software Development Kit for Alpha AXP, on \mstools\bin\asaxp.txt.

Microsoft Win32 Software Development Kit for Alpha AXP, on \mstools\bin\wap.txt.

Windows NT for Alpha AXP Calling Standards, Digital Equipment Corporation, Rev 1.7, January 1994.

Russ Blake, Optimizing Windows NT, Microsoft Corporation, Summer 1993.

Microsoft Win32 SDK version 3.5, Tools User's Guide, Microsoft Corporation, 1993.

********************

Copyright 1994 Digital Equipment Corporation. All rights reserved. Restricted rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013.

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.
The software described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license.
No responsibility is assumed for the use or reliability of software on equipment that is not supplied by Digital or its affiliated companies.

Alpha AXP, DEC C, DEC FORTRAN, DEC OSF/1, OpenVMS, ULTRIX, and VAX are trademarks of Digital Equipment Corporation.
Intel is a trademark of Intel Corporation.
CodeView, Microsoft, Microsoft Press, MS-DOS, and Windows are registered trademarks and Windows NT and Visual C++ are trademarks of Microsoft Corporation.
All other trademarks and registered trademarks are the property of their respective holders.