Battling Backup Anxiety with Win32

Eric Bergman-Terrell

What does backup reliability have to do with Win32 directory recursion and memory-mapped file I/O? Plenty, if you use some neat features of Win32 to answer the question: Are the files on my backup tape really the same as those on my hard disk? This article explores a handy console application that provides the answers. You'll learn about tree traversal and memory-mapped file I/O in the process.

I'm paranoid. I back up my hard disk to tape habitually, and I store the tapes in a safe deposit box at my bank. The thought of losing my files in a hard disk crash is more than I can bear, so I take a lot of comfort in knowing my files are safely tucked away in a 50-ton bank vault. So when I started hearing reports that some backups that passed my backup software's verify phase couldn't be reliably restored, I panicked. Apparently I couldn't trust my backup software's verify feature.

Having a strong urge to know the truth about my backups, I decided to write a program that compares all the files in two directory trees. With this program, I could back up my files, restore them to a temporary directory, and then run a comparison to determine, once and for all, whether my backups were good or bad. Since this directory comparison program uses some interesting Win32 programming techniques, I thought it would be fun to describe them in this article. In a nutshell, it's a console application that uses the Win32 directory tree traversal API to find files and memory-mapped file I/O to compare them. So let's take a look.

The sample application

The directory comparison program is named DIRCMP, and the complete source code is included on the Developer's Disk. DIRCMP is a Win32 console application that can be compiled with MSVC++ 4.1. A console application is a character-mode program that runs in a Windows 95 or NT command shell. Because console applications are full Win32 applications, they can't run under Win32s. DIRCMP is simple to run. Just open a command shell and type DIRCMP <directory 1> <directory 2>.

For example, if you want to compare the C:\SOURCE and C:\TMP directories, you would enter this command:


 C:\>DIRCMP  C:\SOURCE  C:\TMP

If you want more detailed output, you can use the optional /verbose mode:


 C:\>DIRCMP  C:\SOURCE  C:\TMP  /VERBOSE

When DIRCMP runs, it creates a data structure containing the names of the files in both directories. It then displays two lists: the files that exist in the first directory but not in the second directory, and the files that exist in the second directory but not in the first directory. Next, the program displays the number of files in both directories. Finally, the program compares the files that exist in both directories and displays a list of like-named files that have different sizes or different contents. I used the VC++ AppWizard to create the application shell.

I was surprised to find that AppWizard doesn't create a skeleton console application. I expected it to create a file containing a skeleton main() function something like the one shown here. Instead, Visual C++ 4.1 forces you to create your main() manually. Perhaps a future version will automate this process. After all, how hard could it be to enhance AppWizard to create less than 10 lines of code?


#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])

  {
  // TODO:  Add your code here

  return EXIT_SUCCESS;
  }

Exploring console applications

The main function's parameters correspond to the arguments entered on the command line by the user. The argc parameter is the number of arguments plus one. The parameter argv is an array of character pointers that contains the program's executable path plus each argument. For example, let's say you enter the following command:


 C:\>C:\BIN\DIRCMP C:\SOURCE C:\TMP

In this situation, the value of the argc parameter is three and the array contains these values:


argv[0] == "C:\BIN\DIRCMP.EXE"
argv[1] == "C:\SOURCE"
argv[2] == "C:\TMP"

While a console application isn't built around an MFC CWinApp object, it can still use CStrings and the MFC collection classes. DIRCMP's main function first checks the argument count. If the count is valid, it calls CDirectoryCompare::Compare to compare the two directories. If there are three arguments (in other words, if argc == 4), it assumes that the user has specified the /verbose mode. Here's the source code for the main() function:


#include <stdio.h>
#include <stdlib.h>
#include <afxwin.h>
#include <afxext.h>
#include "dircmp.h"


int main(int argc, char *argv[])

  {
  // Check the argument count and give a usage
  // message if it's incorrect.
  if (argc != 3 && argc != 4)
    {
    printf("usage: dircmp <directory 1> "
           "<directory 2> {/verbose}\n");
    exit(EXIT_FAILURE);
    }

  // Compare the directories.
  CDirectoryCompare DirCmp;
  DirCmp.Compare(argv[1], argv[2], argc == 4);

  printf("\ndircmp completed successfully\n");

  return EXIT_SUCCESS;
  }
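
Note that main treats any fourth argument as a request for verbose output. A stricter check might look something like the following sketch. The helper name IsVerboseFlag is hypothetical (it isn't part of DIRCMP), and the case-insensitive match is an assumption based on the /VERBOSE example shown earlier.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of DIRCMP: returns true only
// if the argument is "/verbose", ignoring case.
bool IsVerboseFlag(const std::string& arg)

  {
  const std::string expected = "/verbose";

  if (arg.size() != expected.size())
    return false;

  for (std::string::size_type i = 0; i < arg.size(); i++)
    {
    const unsigned char ch =
        static_cast<unsigned char>(arg[i]);

    // Compare the lowercased character with the expected one.
    if (std::tolower(ch) != expected[i])
      return false;
    }

  return true;
  }
```

With a helper like this, main could call IsVerboseFlag(argv[3]) when argc == 4 and print the usage message if it returns false.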

Walking the directory tree

Unlike Win16 programs that must call MS-DOS functions to traverse directory trees, Win32 console applications can call native Win32 functions. Let's examine the calls we'll use:


 HANDLE FindFirstFile(LPCTSTR lpFileName, 
                     LPWIN32_FIND_DATA lpFindFileData);

The parameter lpFileName must be either a valid directory or a path and filename. It can contain wildcard characters (* and ?). Under Windows 95, lpFileName can't have more than MAX_PATH characters. The parameter lpFindFileData is the address of a WIN32_FIND_DATA structure that receives information about the found file or subdirectory. FindFirstFile's return value, if successful, is a search handle for use in calls to FindNextFile and FindClose. Otherwise, it returns INVALID_HANDLE_VALUE. The search continues with FindNextFile:


 BOOL FindNextFile(HANDLE hFindFile, LPWIN32_FIND_DATA 
                 lpFindFileData);

In FindNextFile, the hFindFile parameter is the search handle returned by the call to FindFirstFile. The lpFindFileData parameter is the address of a WIN32_FIND_DATA structure that receives information about the found file or subdirectory. FindNextFile's return value is TRUE if the function call succeeds and FALSE if it doesn't. When no more matching files remain, it returns FALSE and GetLastError reports ERROR_NO_MORE_FILES. When the search is finished, the handle must be closed:


 BOOL FindClose(HANDLE hFindFile);

Our last traversal function is FindClose. It takes a single parameter, hFindFile, which is the search handle returned by the original call to FindFirstFile. This function releases the search handle. Its return value is TRUE on success and FALSE on failure. We'll use these functions in CDirectoryCompare::AddFilesToList.

DIRCMP keeps track of which files are in which directories by using an MFC CMapStringToOb collection named m_FileList. Items in m_FileList are stored with their filenames as the retrieval key. Each item is a CFileListItem that specifies which directories contain the file. Here's the declaration of the CFileListItem class:


class CFileListItem : public CObject

  {
  DECLARE_DYNAMIC(CFileListItem)

  public:
    CFileListItem();

    enum { m_nElements = 2 };
    BOOL m_bWhichDir[m_nElements];
  };

If a CFileListItem object has a value of TRUE in the first element of its m_bWhichDir array, the file is stored in the first directory specified in the DIRCMP command line. If the second element has a value of TRUE, the file is stored in the second directory specified in the DIRCMP command.
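
To make the bookkeeping concrete, here's a minimal sketch of the same idea in standard C++. It substitutes std::map for CMapStringToOb and a small struct for CFileListItem, so the names FileFlags, FileList, MarkFile, and FilesOnlyIn are illustrative assumptions rather than DIRCMP's actual code.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for CFileListItem: one flag per directory.
struct FileFlags

  {
  bool inDir[2];
  FileFlags() { inDir[0] = inDir[1] = false; }
  };

// Stand-in for m_FileList: filename -> flags.
typedef std::map<std::string, FileFlags> FileList;

// Record that the file exists in directory nIndex (0 or 1).
void MarkFile(FileList& list, const std::string& name,
              int nIndex)

  {
  // operator[] creates an all-false entry if the name is new.
  list[name].inDir[nIndex] = true;
  }

// Return the files found in directory nIndex but not in
// the other directory.
std::vector<std::string> FilesOnlyIn(const FileList& list,
                                     int nIndex)

  {
  std::vector<std::string> result;

  for (FileList::const_iterator it = list.begin();
       it != list.end(); ++it)
    if (it->second.inDir[nIndex] &&
        !it->second.inDir[1 - nIndex])
      result.push_back(it->first);

  return result;
  }
```

Calling FilesOnlyIn with 0 and then 1 would produce the two lists that DIRCMP displays first. Entries with both flags set are the files that go on to the content comparison.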

CDirectoryCompare::AddFilesToList uses FindFirstFile and FindNextFile to find all files and subdirectories in the directory specified by DirName. If the FILE_ATTRIBUTE_DIRECTORY bit in dwFileAttributes is clear, the item is a file and its filename is stored in the cFileName field of the WIN32_FIND_DATA structure. If the filename isn't already stored in m_FileList, a new CFileListItem object is created and added to m_FileList. Otherwise a pointer to the existing CFileListItem is retrieved using the Lookup function. In either case, the m_bWhichDir array is updated to record which directory contains the file.

When the program encounters a subdirectory, it calls CDirectoryCompare::AddFilesToList recursively to process the files in that subdirectory. When every entry in the current directory has been processed, the program calls FindClose to close the search handle. Here's the code for CDirectoryCompare::AddFilesToList:


void CDirectoryCompare::AddFilesToList(
                         const CString& RootDir,
                         const CString& DirName,
                         int nIndex)

// Add all files in the directory specified by
// RootDir and DirName to m_FileList.

  {
  WIN32_FIND_DATA FindFileData;

  if (DirName.GetLength() >= MAX_PATH)
    printf("File or Directory Name is too "
           "long: \"%s\"\n", (LPCSTR) DirName);
  else
    {
    // Get first file or directory in the 
    // directory.
    const HANDLE hFindFile = 
        FindFirstFile(DirName + "\\*.*",
                      &FindFileData);
    ASSERT(hFindFile != INVALID_HANDLE_VALUE);

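    // FileName is assigned inside the loop below.
    // Don't read FindFileData here, because it's
    // only valid if FindFirstFile succeeded.
    CString FileName;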

    // If there are files and/or directories
    // in the directory, add all of them to
    // m_FileList.
    if (hFindFile != INVALID_HANDLE_VALUE)
      {
      do
        {
        // Get filename of file or directory.
        FileName = FindFileData.cFileName;

        // If the item is a file, add it to 
        // m_FileList.
        if (!(FindFileData.dwFileAttributes & 
              FILE_ATTRIBUTE_DIRECTORY))
          {
          const CString NewLine = 
                  DirName + "\\" + FileName;
          CString NewFileName = 
            NewLine.Right(NewLine.GetLength() - 
                          RootDir.GetLength());

          NewFileName.MakeUpper();

          CObject       *pValue = NULL;
          CFileListItem *pItem  = NULL;

          // If a new item must be created...
          if (nIndex == 0 || 
              !m_FileList.Lookup(NewFileName,
                                 pValue))
            {
            // Create a new item.
            pItem = new CFileListItem;
            ASSERT(pItem);

            pItem->m_bWhichDir[nIndex] = TRUE;

            // Add new item to m_FileList.
            m_FileList.SetAt(NewFileName, pItem);
            }
          else
            {
            ASSERT(pValue && pValue->IsKindOf(
                  RUNTIME_CLASS(CFileListItem)));

            pItem = (CFileListItem *) pValue;

            // Update the item which is already
            // in m_FileList.
            pItem->m_bWhichDir[nIndex] = TRUE;
            }
          }

        // If item is a directory, add its
        // files.
        if (FindFileData.dwFileAttributes &
            FILE_ATTRIBUTE_DIRECTORY &&
            FileName != "." && FileName != "..")
          AddFilesToList(RootDir, DirName + 
                         "\\" + FileName, 
                         nIndex);
        } while (FindNextFile(hFindFile, 
                              &FindFileData));

      FindClose(hFindFile);
      }
    }
  }
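
The key that AddFilesToList stores in m_FileList is the file's full path with the RootDir prefix removed and the remainder converted to uppercase, which is why the same file in two different trees ends up under a single key. Here's a minimal sketch of that step in standard C++. MakeKey is a hypothetical name, std::string stands in for CString, and the sketch assumes plain ASCII filenames.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper mirroring the Right()/MakeUpper()
// steps above: strip the root directory prefix, then
// uppercase what remains. Assumes fullPath starts with
// rootDir.
std::string MakeKey(const std::string& rootDir,
                    const std::string& fullPath)

  {
  std::string key = fullPath.substr(rootDir.size());

  for (std::string::size_type i = 0; i < key.size(); i++)
    key[i] = static_cast<char>(
        std::toupper(static_cast<unsigned char>(key[i])));

  return key;
  }
```

For example, C:\SOURCE\sub\file.txt under the root C:\SOURCE and C:\TMP\sub\File.TXT under the root C:\TMP would both map to the key \SUB\FILE.TXT.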

DIRCMP calls CDirectoryCompare::AddFilesToList for both directories specified on the DIRCMP command line. The directory names are stored in an array named m_Dirs. Here's how the lists are managed:


 
void CDirectoryCompare::PopulateFileList()

// Add the names of all files in either directory
// to m_FileList.

  {
  ClearFileList();

  for (int i = 0; i < 2; i++)
    AddFilesToList(m_Dirs[i], m_Dirs[i], i);
  }

Let's compare

Once all the filenames have been added to m_FileList by CDirectoryCompare::PopulateFileList, it's time to compare the files that exist in both directories specified in the DIRCMP command line. The Win32 memory-mapped file I/O feature makes it easy to compare files. Memory-mapped file I/O lets Win32 programs associate a region of virtual memory with a disk file so that any access of the memory will access the associated file data. If the files to be compared are the same size, each file can be mapped to a virtual memory address. The files can then be compared by simply passing the addresses to the memcmp function.

Memory-mapped file I/O is supported by Windows 95, NT, and even Win32s. Table 1 (see page 12) provides a summary of the memory-mapped file I/O functions and their parameters.

Here's how CDirectoryCompare::CompareFiles works. First, CreateFile opens the files. Then GetFileSize verifies that the files are the same size. If so, CreateFileMapping and MapViewOfFile memory map the files. The actual comparison of the file contents is done by memcmp. Finally, UnmapViewOfFile and CloseHandle clean up.

CDirectoryCompare::CompareFiles is limited to files smaller than 4 GB because memcmp's third argument, which specifies the number of bytes to compare, is a 32-bit value. In practice the limit is lower still, because the views of both files must fit in the process's virtual address space at the same time. If you need to compare larger files, you'll need to compare them in smaller chunks. Here's the code for running the comparison:


CDirectoryCompare::ComparisonType 
CDirectoryCompare::CompareFiles(
                    const CString& Path,
                    DWORD& dwBytesCompared) const

// The file specified by Path is in both 
// directories. Compare both copies and
// determine if they are identical.
//
// Return Value:
//
// DifferentSizes     Files are different sizes
//
// DifferentContents  Files are the same size but
//                    have different contents.
//
// Identical          Files are the same size and
//                    have identical contents.

  {
  dwBytesCompared = 0;

  CString Paths[2];
  int i;  // Loop index shared by the loops below.

  // Determine full filenames of both copies
  // of the file.
  for (i = 0; i < 2; i++)
    Paths[i] = m_Dirs[i] + Path;

  HANDLE FileHandle[2];
  DWORD dwSizeLow[2], dwSizeHigh[2];

  // Attempt to open both copies of the file.
  for (i = 0; i < 2; i++)
    {
    FileHandle[i] = CreateFile(
                         (LPCSTR) Paths[i], 
                         GENERIC_READ, 
                         FILE_SHARE_READ,
                         NULL, 
                         OPEN_EXISTING, 
                         FILE_ATTRIBUTE_NORMAL,
                         NULL);

    if (FileHandle[i] == INVALID_HANDLE_VALUE)
      {
      printf("Cannot open file %s\n",
             (LPCSTR) Paths[i]);
      exit(EXIT_FAILURE);
      }

    // Determine size of both copies of the file.
    dwSizeLow[i] = GetFileSize(FileHandle[i], 
                               &dwSizeHigh[i]);

    // If the file is larger than 4 GB, exit.
    if (dwSizeHigh[i] != 0)
      {
      printf("File %s is too large\n",
             (LPCSTR) Paths[i]);
      exit(EXIT_FAILURE);
      }
    }  

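  // Determine if the copies are different sizes.
  // Close the file handles before returning early
  // so they aren't leaked.
  if (dwSizeLow[0] != dwSizeLow[1])
    {
    CloseHandle(FileHandle[0]);
    CloseHandle(FileHandle[1]);
    return DifferentSizes;
    }

  // At this point, the files are identical in 
  // size.
  const DWORD dwFileSize = dwSizeLow[0];

  // No need to compare empty files
  if (dwFileSize == 0)
    {
    CloseHandle(FileHandle[0]);
    CloseHandle(FileHandle[1]);
    return Identical;
    }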

  HANDLE Maps[2];
  const void *FileContents[2];

  // Map the files using memory-mapped file I/O.
  for (i = 0; i < 2; i++)
    {
    Maps[i] = CreateFileMapping(FileHandle[i],
                                NULL,
                                PAGE_READONLY,
                                0,
                                dwFileSize,
                                NULL);

    if (Maps[i] == NULL)
      {
      printf("Cannot map file %s\n", 
             (LPCSTR) Paths[i]);
      exit(EXIT_FAILURE);
      }

    FileContents[i] = MapViewOfFile(
                            Maps[i],
                            FILE_MAP_READ,
                            0,
                            0,
                            dwFileSize);

    if (FileContents[i] == NULL)
      {
      printf("Cannot map file %s\n", 
             (LPCSTR) Paths[i]);
      exit(EXIT_FAILURE);
      }
    }

  // Compare the memory-mapped files with memcmp.
  const BOOL bResult = memcmp(FileContents[0], 
                              FileContents[1], 
                              dwFileSize) == 0;

  dwBytesCompared = dwFileSize;

  // Unmap and close the files.
  for (i = 0; i < 2; i++)
    {
    if (!UnmapViewOfFile(FileContents[i]))
      {
      printf("Cannot unmap file %s\n",
             (LPCSTR) Paths[i]);
      exit(EXIT_FAILURE);
      }

    if (!CloseHandle(Maps[i]))
      {
      printf("Cannot close file mapping "
             "for file %s\n", (LPCSTR) Paths[i]);
      exit(EXIT_FAILURE);
      }

    if (!CloseHandle(FileHandle[i]))
      {
      printf("Cannot close file handle "
             "for file %s\n", (LPCSTR) Paths[i]);
      exit(EXIT_FAILURE);
      }
    }

  return bResult ? Identical : DifferentContents;
  }
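
For files too large to map in one piece, the chunked approach mentioned earlier could map one window of each file at a time. The following sketch shows only the offset arithmetic, in portable C++. ViewChunk and PlanChunks are hypothetical names that aren't part of DIRCMP. The three fields correspond to the dwFileOffsetHigh, dwFileOffsetLow, and dwNumberOfBytesToMap arguments of MapViewOfFile, and the chunk size is assumed to be a multiple of the system's allocation granularity (reported by GetSystemInfo), since view offsets must be aligned to it.

```cpp
#include <cstdint>
#include <vector>

// One window of a file to be mapped and compared.
struct ViewChunk

  {
  std::uint32_t offsetHigh;  // high 32 bits of view offset
  std::uint32_t offsetLow;   // low 32 bits of view offset
  std::uint32_t length;      // number of bytes to map
  };

// Split a file of fileSize bytes into windows of at most
// chunkSize bytes each. chunkSize must be nonzero.
std::vector<ViewChunk> PlanChunks(std::uint64_t fileSize,
                                  std::uint32_t chunkSize)

  {
  std::vector<ViewChunk> chunks;

  for (std::uint64_t offset = 0; offset < fileSize;
       offset += chunkSize)
    {
    const std::uint64_t remaining = fileSize - offset;

    ViewChunk chunk;
    chunk.offsetHigh =
        static_cast<std::uint32_t>(offset >> 32);
    chunk.offsetLow =
        static_cast<std::uint32_t>(offset & 0xFFFFFFFFu);
    // The last window may be shorter than chunkSize.
    chunk.length = static_cast<std::uint32_t>(
        remaining < chunkSize ? remaining : chunkSize);

    chunks.push_back(chunk);
    }

  return chunks;
  }
```

Each chunk would then be mapped in both files, compared with memcmp, and unmapped before moving on to the next one, and the comparison can stop at the first chunk that differs. Only the size checks and the MapViewOfFile calls in CompareFiles would need to change.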

Conclusion

In addition to providing the simplicity of a flat memory model, the Win32 API provides several other great conveniences. Console applications are useful for utilities that are best run from the command line. The native Win32 directory tree traversal API allows you to search for files without using MS-DOS functions. And finally, memory-mapped file I/O makes working with files as easy as manipulating pointers. Incidentally, DIRCMP provided proof that my backup software and hardware were actually working perfectly. Still, I'm glad I wrote the program so that I can ensure the reliability of my backups. After all, even paranoids have enemies.

Eric Bergman-Terrell, author of Vault and Astronomy Lab, writes Windows travel agency applications at Galileo International. CompuServe 73667,3517.

 


This article is reproduced from the August 1996 issue of Visual C++ Developer. Copyright 1996, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. Visual C++ Developer is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call (800)788-1900 or (206)251-1900.