The Dr. Watson Diagnostic Tool

May 1994

Abstract

The Dr. Watson utility is a diagnostic tool for the Microsoft® Windows™ operating system. It detects system and application failures and can store information about them in a disk file. This file can help you determine which application caused the problem. This article explains how you can configure Dr. Watson to meet your programming needs. It also includes a sample Dr. Watson listing with comments and a list of the GP fault messages.

Introduction

The Dr. Watson utility is a diagnostic tool for the Microsoft® Windows™ operating system. It detects system and application failures and can store information about them in a disk file. This file can help you determine which application caused the problem.

You can run only one instance of Dr. Watson at a time. Dr. Watson cannot trap faults in a Windows MS-DOS session. You can configure Dr. Watson to meet your needs by including settings for any of the following entries in the [Dr. Watson] section of your WIN.INI file (note the space between Dr. and Watson):

  SkipInfo
  ShowInfo
  DisLen
  TrapZero
  GPContinue
  DisStack
  LogFile

These entries are described in the sections below.

The SkipInfo Entry

The SkipInfo entry controls which parts of the failure report are actually sent to disk. You can set the following values to disable parts of the failure report:

Value        Meaning
32bitregs    Disable values of 32-bit registers and of FS and GS on 386/486 processors.
clues        Disable the dialog box titled “Dr. Watson's Clues.”
information  Disable system information, such as Windows version, processor type, and available memory.
registers    Disable 16-bit registers.
segments     Disable segment contents, base addresses, length, and flags.
stack        Disable stack backtrace.
summary      Disable the 6-line summary at the beginning of the error report.
tasks        Disable list of all active tasks (running applications).
time         Disable Dr. Watson start and stop times.

Tip   Each SkipInfo value listed above can be abbreviated, using the first three letters of the value name. For example, the following entry disables the Dr. Watson's Clues dialog box and the stack backtrace:

[Dr. Watson]
SkipInfo=clu sta

The ShowInfo Entry

Some parts of the Dr. Watson failure report are disabled by default. These sections can be enabled with the ShowInfo entry. You can set the following values to enable parts of the failure report:

Value        Description
disassembly  Enable separate disassembly of the fault address. This does not affect disassembly of stack frames. (See “The DisLen Entry” section below.)
errorlog     Enable error logging.
locals       Enable stack dump of local variable and parameter values.
modules      Enable list of all loaded modules, including dynamic-link libraries (DLLs) and font files.
paramlog     Enable parameter-validation error logging.
sound        Enable audible warnings.

Tip   Each ShowInfo value listed above can be abbreviated, using the first three letters of the value name. The following example sets all six values for the ShowInfo entry, enabling those six parts of the failure report:

[Dr. Watson]
ShowInfo=dis err loc mod par sou
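
As a rough illustration of the abbreviation rule in the tips above, the following Python sketch builds SkipInfo and ShowInfo lines from full value names. The helper name build_entry and the two value sets are assumptions made for this example and are not part of Dr. Watson; only the value names and the three-letter rule come from the tables above.

```python
# Hypothetical helper for illustration only; not part of Dr. Watson.
SKIPINFO_VALUES = {"32bitregs", "clues", "information", "registers",
                   "segments", "stack", "summary", "tasks", "time"}
SHOWINFO_VALUES = {"disassembly", "errorlog", "locals", "modules",
                   "paramlog", "sound"}


def build_entry(entry, names, allowed):
    """Return a WIN.INI line such as 'SkipInfo=clu sta'."""
    unknown = [n for n in names if n not in allowed]
    if unknown:
        raise ValueError(f"unknown {entry} value(s): {unknown}")
    # Abbreviate each value to its first three letters.
    return f"{entry}=" + " ".join(n[:3] for n in names)


print("[Dr. Watson]")
print(build_entry("SkipInfo", ["clues", "stack"], SKIPINFO_VALUES))
print(build_entry("ShowInfo",
                  ["disassembly", "errorlog", "locals",
                   "modules", "paramlog", "sound"],
                  SHOWINFO_VALUES))
```

Rejecting unknown names before abbreviating them helps catch a misspelled value that would otherwise be written to WIN.INI unnoticed.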

The DisLen Entry

The DisLen entry controls the number of instructions that are disassembled in stack traces and the disassembly portion of the failure report. The default value is 8. The following example sets the value to 4:

[Dr. Watson]
DisLen=4

The TrapZero Entry

By default, Dr. Watson does not trap divide overflow exceptions, because many applications provide their own handling. You can use the TrapZero entry to enable trapping of divide overflow exceptions, as shown in the following example:

[Dr. Watson]
TrapZero=1

The GPContinue Entry

One of the most advanced features of Dr. Watson lets an application continue even after a general-protection (GP) fault. A GP fault means that a bug has been encountered, so continuing is dangerous. However, some application developers requested the ability to continue running an application after a GP fault. When a GP fault occurs, Dr. Watson performs the following four tests and allows the application to continue only if the answer to each one is yes:

  1. Is bit 0 of GPContinue set?

  2. Is the faulting instruction an instruction that can be allowed to continue?

    For example, the following instruction, which reads beyond the end of a segment, can be allowed to continue:

    mov   ax,[ffff]

    The following instruction, which involves an invalid address, is not allowed to continue:

    jmp   seg:offs
    
  3. Is the fault in an area outside Kernel or User? (If the fault is in Kernel or User, you may set the appropriate bit in GPContinue to continue in spite of the risk.)

  4. Does the user want to continue? Dr. Watson displays a dialog box and lets the user decide. In Windows 3.0, the dialog box has Cancel (default) and Ignore buttons and says:

    RECOVERABLE APPLICATION ERROR
    <APPNAME> failed in <MODNAME>. Ignoring fault is risky.

    In Windows 3.1, the dialog box says:

    <APPNAME>
    An error has occurred while running this application.
    If you choose to ignore it, you should save your work in a new file. Otherwise this application will terminate.

    If you choose Close from this dialog box, you will see the normal Application Error dialog box.

Although it is very risky, you can also allow continuation in Kernel or User by setting GPContinue as appropriate. The GPContinue entry has the following bit values:

Bit  Value  Meaning
0    1      Allow continuation. (This is the default setting.)
1    2      Write only 3-line reports.
2    4      Continue even if the fault is in Kernel.
3    8      Continue even if the fault is in User.

To enable more than one option, add the corresponding values together. The following entry (1 + 8) allows an application to continue after a GP fault in User:

[Dr. Watson]
GPContinue=9
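
The arithmetic behind a GPContinue setting can be sketched in a few lines of Python. The bit values come from the table above; the constant and function names are invented for this example and do not correspond to anything in Dr. Watson itself.

```python
# Bit values from the GPContinue table; the names are made up for this sketch.
ALLOW_CONTINUE = 1      # bit 0
SHORT_REPORTS = 2       # bit 1: write only 3-line reports
CONTINUE_IN_KERNEL = 4  # bit 2
CONTINUE_IN_USER = 8    # bit 3


def gpcontinue_value(*flags):
    """Combine flag values into the number written after GPContinue=."""
    value = 0
    for flag in flags:
        value |= flag
    return value


def describe(value):
    """List which options a GPContinue value turns on."""
    names = {
        ALLOW_CONTINUE: "allow continuation",
        SHORT_REPORTS: "write only 3-line reports",
        CONTINUE_IN_KERNEL: "continue in Kernel",
        CONTINUE_IN_USER: "continue in User",
    }
    return [text for bit, text in names.items() if value & bit]


print(gpcontinue_value(ALLOW_CONTINUE, CONTINUE_IN_USER))  # 9
print(describe(9))
```

Because each option occupies its own bit, adding the values and combining them with a bitwise OR give the same result.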

The DisStack Entry

The DisStack entry controls how many levels back on the stack get disassembled. The default value is 2. The following example sets the value to 100:

[Dr. Watson]
DisStack=100

The LogFile Entry

By default, Dr. Watson writes its log to DRWATSON.LOG in the Windows directory. You can set the LogFile entry to any valid filename, including the device name of a printer or debugging terminal. For example, use the following setting to write to a terminal on COM1:

[Dr. Watson]
LogFile=com1

Sample Dr. Watson Log File with Comments

To save disk space, Dr. Watson generates a complete report only for the first three errors. The next 17 errors generate a report summary. After 20 errors, Dr. Watson stops writing to the log file. If you close Dr. Watson and rerun it, writing to the log file resumes. You can determine the number of error reports generated in the current session by selecting the Dr. Watson icon.

When the log file reaches 100K, Dr. Watson displays a warning message. After you have analyzed the error reports in the log file, you should delete the log file.
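
The reporting limits described above can be restated as a small Python function. This is only a model of the documented behavior, written to make the boundaries explicit; the function and constant names are assumptions, and the sketch says nothing about how Dr. Watson implements the limits internally.

```python
FULL_REPORTS = 3     # the first three errors get a complete report
LOGGED_ERRORS = 20   # after 20 errors nothing more is written


def report_kind(error_number):
    """Return the kind of log entry for the Nth error (1-based) in a session."""
    if error_number < 1:
        raise ValueError("error_number starts at 1")
    if error_number <= FULL_REPORTS:
        return "complete"
    if error_number <= LOGGED_ERRORS:
        return "summary"
    return "none"


for n in (1, 3, 4, 20, 21):
    print(n, report_kind(n))
```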

The following sample Dr. Watson log file includes comments, each identified by a pound sign (#). These comments do not appear in a normal Dr. Watson log; they have been added here to explain the sections of the log. Each section begins with a heading line, and the word in parentheses on that line is the WIN.INI value that controls whether the section appears in the log file.

Start Dr. Watson 0.80 - Thu Sep 26 10:51:28 1991

# This line is inserted each time you start Dr. Watson. You can disable it with SkipInfo=time.

************************************************************************
# This line marks the beginning of a Dr. Watson report.

    Dr. Watson 0.80 Failure Report - Thu Sep 26 10:51:36 1991
# Version 0.80 of Dr. Watson - date report was generated

    BICHO had a 'Exceed Segment Bounds (Read)' fault at BICHO _DoCommand+006b
# Application 'BICHO' had an 'Exceed Segment Bounds' fault while reading memory. The actual code that failed was also in BICHO, 0x6b bytes past the start of the DoCommand() function. See the "GP Fault Messages" section for more information.

    $tag$BICHO$Exceed Segment Bounds (Read)$BICHO _DoCommand+006b
        $push word ptr [fffe]$Thu Sep 26 10:51:36 1991
# This line repeats the previous information in a format easier for automatic code to parse. It also includes the actual faulting instruction (a push instruction here).

    CPU Registers (regs)

    ax=1e54  bx=0014  cx=0d7f  dx=0111  si=1e54  di=0111
# The 16-bit CPU registers. This can be useful for decoding what address an instruction was modifying.

    ip=02fd  sp=230c  bp=237a  O- D- I+ S- Z- A+ P+ C-
# The IP is the instruction pointer (Program Counter). SP and BP are the stack pointer and base pointer. The last 8 items show the state of the flag bits. In this case, Overflow, Direction, Sign, Zero, and Carry bits are Clear (0); the Interrupt, AuxCarry, and Parity bits are Set (1).

    cs = 0e57  8059fbc0:083f Code Ex/R
# Code segment selector is 0e57, linear address is 8059fbc0 (enhanced-mode linear addresses often start with 8xxx), and the limit is 83f. Accessing code and data segments beyond their limits is a common cause of GP faults.

    ss = 0d7f  8059d5e0:25df Data R/W
# Stack selector

    ds = 0d7f  8059d5e0:25df Data R/W
# Data selector--note that the limit is 25df, and we tried to read the value at fffe, beyond the limit.

    es = 0d7f  8059d5e0:25df Data R/W

    CPU 32 bit Registers (32bit)

    eax = 00001e54  ebx = 00000014  ecx = ffff0d7f  edx = 00000111
    esi = 00001e54  edi = 00000111  ebp = 0000237a  esp = 800422fc
    fs = 0000         0:0000 Null Ptr
# If the selector is 0, it indicates a null pointer. Trying to use a null pointer is another common cause of GP faults.

    gs = 0000         0:0000 Null Ptr
    eflag = 00000002

    System Info (info)

    Windows version 3.10
    Debug build
# The debug version of Windows (from the SDK) was running.

    Windows Build 3.1.048
# This is an internal Microsoft build of Windows, #48.

    Username Unknown User
# Your Name Here

    Organization Unknown Organization
# Your Org Here

    System Free Space 7131008
    Stack base 1122, top 9164, lowest 7504, size 8042
# Stack size of current task

    System resources:  USER: 87% free, seg 0777  GDI: 85% free, seg 05d7
    LargestFree 6594560, MaxPagesAvail 1610, MaxPagesLockable 267
# These stats are for informational purposes.

    TotalLinear 1948, TotalUnlockedPages 274, FreePages 52
    TotalPages 614, FreeLinearSpace 1611, SwapFilePages 7158
    Page Size 4096
    4 tasks executing.
    WinFlags -
      Math coprocessor
      80386 or 80386 SX
      Enhanced mode
      Protect mode

    Stack Dump (stack)

# We dump the stack to see who called the routine that failed.

    Stack Frame 0 is BICHO _DoCommand+006b        ss:bp 0d7f:237a
# The failure occurred in BICHO, 0x6b bytes past the start of DoCommand().

# If the function offset is hard to match to your source, view the function in CodeView for Windows (CVW) in mixed source and assembly mode and locate the faulting instruction there. In this case, you would go to DoCommand() and find offset 006b (address 0e57:02fd).

    0e57:02f0  e9 02b9               jmp    near 05ac
    0e57:02f3  6a 00                 push   00
    0e57:02f5  9a 8db0 0477          callf  0477:8db0
    0e57:02fa  e9 02af               jmp    near 05ac
    (BICHO:_DoCommand+006b)
# The failure happened on the following instruction:

    0e57:02fd  ff 36 fffe            push   word ptr [fffe]
# We tried to read a value from memory at DS:FFFE and push it on the stack. However, the limit of the DS segment is 25df.

    0e57:0301  68 0110               push   0110
    0e57:0304  e8 fe5d               call   near 0164
    0e57:0307  83 c4 04              add    sp, 04

    Stack Frame 1 is BICHO MAINWNDPROC+0027       ss:bp 0d7f:2388
# The Bicho MainWndProc probably called DoCommand().

    0e57:0670  eb 16                 jmp    short 0688
    0e57:0672  ff 76 0a              push   word ptr [bp+0a]
    0e57:0675  56                    push   si
    0e57:0676  e8 fc19               call   near 0292
    (BICHO:MAINWNDPROC+0027)
    0e57:0679  83 c4 04              add    sp, 04
    0e57:067c  99                    cwd
    0e57:067d  eb 1f                 jmp    short 069e
    0e57:067f  6a 00                 push   00

    Stack Frame 2 is USER IDISPATCHMESSAGE+007e   ss:bp 0d7f:239e
# USER is the Windows USER.EXE.  It is what calls your window and dialog box procedures.  In this case, it called the BICHO MainWndProc().

    Stack Frame 3 is BICHO WINMAIN+0050           ss:bp 0d7f:23bc
# Here is the BICHO WinMain, which called DispatchMessage(), which called MainWndProc().

    Stack Frame 4 is BICHO 1:00a3                 ss:bp 0d7f:23ca
# Here is where the startup code calls WinMain.

    System Tasks (tasks)

    Task  WINEXIT, Handle 0daf, Flags 0001, Info    9248 08-09-90 16:52
      FileName C:\MS\WIN\DON\WINEXIT.EXE
    Task DRWATSON, Handle 0ea7, Flags 0001, Info   26256 09-23-91 12:00
      FileName C:\WIN31\DRWATSON.EXE
# This task will always be listed.

    Task  PROGMAN, Handle 060f, Flags 0001, Info  110224 09-23-91 12:02
      FileName C:\WIN31\PROGMAN.EXE
# This task (or whatever shell you use) will always be listed.

    Task    BICHO, Handle 0da7, Flags 0001, Info   16537 09-11-91  8:45
      FileName D:\BICHO.EXE
# This is the name of the program that caused the failure.

    1> I ran a test app that accessed a value
    2> beyond the limits of the segment bounds.
# Anything you type in the Dr. Watson's Clues dialog box is added to the log file, so you can write what you want to remember.

    Stop Dr. Watson 0.80 - Thu Sep 26 10:52:10 1991

# We write this line each time Dr. Watson terminates.
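
The $tag$ record in the sample log is intended for programs to read. The following Python sketch shows one way such a record might be split into fields. The field names (task, fault, location, instruction, timestamp) are inferred from the sample and its comment rather than from a documented format, and the sketch assumes that no field contains a dollar sign.

```python
from typing import NamedTuple


class TagRecord(NamedTuple):
    # Field names are assumptions based on the sample log.
    task: str
    fault: str
    location: str
    instruction: str
    timestamp: str


def parse_tag(lines):
    """Parse a $tag$ record that may be split across several log lines."""
    joined = "".join(line.strip() for line in lines)
    parts = joined.split("$")
    # Expected layout: '', 'tag', task, fault, location, instruction, time
    if len(parts) != 7 or parts[0] != "" or parts[1] != "tag":
        raise ValueError("not a $tag$ record")
    return TagRecord(*parts[2:])


sample = [
    "    $tag$BICHO$Exceed Segment Bounds (Read)$BICHO _DoCommand+006b",
    "        $push word ptr [fffe]$Thu Sep 26 10:51:36 1991",
]
record = parse_tag(sample)
print(record.task, "|", record.fault, "|", record.instruction)
```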

GP Fault Messages

This section provides a list of the GP fault error messages that are found in the Dr. Watson log file.

Divide by zero error

This error was caused because your program tried to divide by zero.

Exceed segment bounds

This error was caused because your program tried to access memory outside of the current segment. Check the following:

  1. Are you trying to read past an array?

  2. Is the starting address of your read operation valid?
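
To make the bounds check concrete, the following Python sketch applies it to the values from the sample log, where the faulting instruction reads a word at offset fffe from a data segment whose limit is 25df. The function name is hypothetical, and the sketch assumes that the limit shown in the log is the last valid offset in the segment.

```python
def exceeds_segment(offset, size, limit):
    """True if an access of `size` bytes at `offset` passes the segment limit.

    `limit` is assumed to be the last valid offset in the segment, as shown
    after the colon in the log's segment lines (for example 25df for DS).
    """
    return offset + size - 1 > limit


# Values from the sample log: push word ptr [fffe] with a DS limit of 25df.
print(exceeds_segment(0xFFFE, 2, 0x25DF))  # True
# A word read that ends exactly at the limit stays inside the segment.
print(exceeds_segment(0x25DE, 2, 0x25DF))  # False
```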

Invalid selector

This error was caused because your program tried to free a segment that wasn't there or was someone else's segment. You need to make sure you are using a valid selector.

Null selector

This error was caused because your program tried to access memory with a NULL selector. Check the following:

  1. Failed GlobalAlloc or GlobalLock.

  2. NULL value in a long pointer.

Read from execute-only code segment

This error was caused because your program tried to read a code segment. Check for an invalid pointer.

Segment not present

This error was caused because the segment your program needed was not present. Check for the following:

  1. A code segment that was discarded when it should have been marked as FIXED.

  2. A GlobalAlloc of 0 bytes.

Segment wrap-around

This error was caused because your program tried to access a value that runs past the end of a 64K segment, for example a word at offset FFFF. Check for an invalid pointer.

Write to read-only data

This error was caused because your program tried to write to a read-only data segment. Check for the following:

  1. Are you trying to write past an array?

  2. An invalid pointer.

Write to code segment

This error was caused because your program tried to write to a code segment. Check the following:

  1. Are you trying to write past an array?

  2. An invalid pointer.