Chapter 7 Structured Storage and Compound Files

Through their history, computer storage technologies have evolved to allow more and more separate agents—applications and components, for example—to simultaneously share a common storage device such as a disk drive or a database. Where once an application owned the entire computer and all its resources, operating systems have for some time now allowed multiple applications to share those resources using the concept of files. In the next stage of computing, as end users understand component software, a need will develop for many components to share a single file. The first section of this chapter justifies this need more clearly.

Assuming that the need to share the contents of a file among components truly exists, OLE defines an architecture called Structured Storage, which enables this kind of sharing. OLE also implements this architecture in a service called Compound Files.[1] All of this effectively creates a "file system within a file," in which two types of named elements, storages and streams, encapsulate the functionality that you find today in directories and files on existing file systems. A storage acts like a directory in that it manages other storages and streams but holds no data itself; a stream acts like a file in that it can hold information but not other storage elements.
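
To make the directory and file analogy concrete, here is a minimal in-memory sketch of the two element types. It is a toy model only, not the OLE API: the Storage and Stream types, their member functions, and the element names "Contents", "Page1", and "Text" are all hypothetical, chosen to mirror the roles described above.

```cpp
// Toy in-memory model of a "file system within a file".
// NOT the OLE API: every type and member name here is hypothetical.
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// A stream holds bytes, like a file; it cannot contain other elements.
struct Stream {
    std::vector<unsigned char> bytes;
};

// A storage manages named child storages and streams, like a directory;
// it holds no data of its own.
class Storage {
public:
    // Create the named child storage, or return it if it already exists.
    Storage& createStorage(const std::string& name) {
        auto& slot = storages_[name];
        if (!slot) slot = std::make_unique<Storage>();
        return *slot;
    }

    // Create the named child stream, or return it if it already exists.
    Stream& createStream(const std::string& name) {
        auto& slot = streams_[name];
        if (!slot) slot = std::make_unique<Stream>();
        return *slot;
    }

    // Append the path of every element beneath this storage to `out`.
    // Storages are listed (with a trailing '/') before streams.
    void list(const std::string& prefix, std::vector<std::string>& out) const {
        for (const auto& kv : storages_) {
            out.push_back(prefix + kv.first + "/");
            kv.second->list(prefix + kv.first + "/", out);
        }
        for (const auto& kv : streams_)
            out.push_back(prefix + kv.first);
    }

private:
    std::map<std::string, std::unique_ptr<Storage>> storages_;
    std::map<std::string, std::unique_ptr<Stream>> streams_;
};

// Build a small example hierarchy and return its listing.
std::vector<std::string> exampleListing() {
    Storage root;
    root.createStream("Contents").bytes = {'h', 'i'};
    Storage& page = root.createStorage("Page1");
    page.createStream("Text");

    std::vector<std::string> out;
    root.list("", out);
    return out;
}
```

Note that the listing walks names only and never interprets the bytes inside a stream, which is the same division of labor the architecture draws between the hierarchy and the information stored in it.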

The Structured Storage and Compound Files implementations support many features that can simplify the way an application, especially one created from arbitrary components, deals with its underlying storage. For example, storage elements can be direct (changes are permanent when written) or transactioned (changes are not permanent until explicitly committed to storage). Incremental access, including saving and loading, is also the default mode of operation: if all you want to read is the information from one stream at a particular point in the storage (directory) tree, you need only navigate the hierarchy and open that stream; there is no need to search through the whole file yourself. To provide these features, OLE requires control over the absolute positioning (seek offsets) of information within the bounds of a file, just as a file system takes over the absolute positioning (sectors) of file contents on a storage device.
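
The difference between direct and transactioned elements can be sketched with a small model. This is an illustration under stated assumptions, not how Compound Files is implemented: the Backing map stands in for the permanent contents of a file, and the Element class, the Mode enumeration, and the commit and revert member names are hypothetical.

```cpp
// Toy model of direct versus transactioned access.
// NOT the OLE API: Backing, Mode, and Element are hypothetical names.
#include <cassert>
#include <map>
#include <string>

// Stand-in for the permanent contents of a file: stream name -> data.
using Backing = std::map<std::string, std::string>;

enum class Mode { Direct, Transactioned };

class Element {
public:
    // The scratch copy starts out identical to the permanent contents.
    Element(Backing& backing, Mode mode)
        : backing_(backing), mode_(mode), pending_(backing) {}

    // Direct mode: the write reaches the backing store at once.
    // Transactioned mode: the write lands in the scratch copy only.
    void write(const std::string& name, const std::string& data) {
        pending_[name] = data;
        if (mode_ == Mode::Direct) backing_[name] = data;
    }

    // Make all pending changes permanent. In direct mode the two
    // copies already match, so this changes nothing.
    void commit() { backing_ = pending_; }

    // Discard every change made since the last commit.
    void revert() { pending_ = backing_; }

    // Read through this element's own view; empty if the name is absent.
    std::string read(const std::string& name) const {
        auto it = pending_.find(name);
        return it == pending_.end() ? std::string() : it->second;
    }

private:
    Backing& backing_;
    Mode mode_;
    Backing pending_;
};
```

In this sketch a transactioned writer sees its own changes immediately, while anyone looking at the backing store sees them only after commit. Copying the whole map is a simplification that keeps the example short; a real implementation would track changes far more economically.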

"What? Microsoft is asking me to change my file format? Have those people had too much espresso or something?" This is a response I heard often when Microsoft first presented Structured Storage. Surely we can all heartily agree that file systems are a good and powerful innovation, right? Thus, in this chapter, I intend to convince you that doing the same thing to files as we know them, that is, creating a "file system within a file," is a good and powerful innovation itself. OLE doesn't intend to dictate what information you store in a file or the elements within that file; it merely intends to standardize the means of accessing units of information, whatever the internal format of that information happens to be. In other words, OLE takes control of managing any hierarchy of storage and stream elements, in the same way that a file system manages any hierarchy of directories and files. In exchange, you're given the freedom to build any sort of hierarchy you want and to store any information you want with however many streams you want.

The general and most fundamental implication of such standardization is that information stored in a compound file can be browsed by code other than the application that originally created the file. Most file formats today are proprietary—only the application that wrote the file (or one with intimate knowledge of that application) can examine its contents. On the other hand, because OLE, a central service, maintains the hierarchy of elements in a compound file, any arbitrary agent can browse the storage and stream elements within that file. That is, anyone else can examine the hierarchy; however, they cannot crack the proprietary information contained inside individual streams. (Not yet at least, but further technology is under development to make such information browsable as well.) If a stream, however, has a standard name and contains information in a standard format, anyone can find that stream in a file and extract its information regardless of the presence of the code that originally wrote that information.

As we'll see in Chapter 16, there is a standard for a stream named SummaryInformation that contains a document's title, subject, author, keywords, and so forth. This standard allows an end user to issue a query to a search tool, for example "find all documents that I wrote after 14 September 1994 with the word vegetarian in the title or keywords," and the shell goes off and searches for all files, regardless of origin, whose summary information matches the criteria. For the end user, this eliminates a host of application-specific features used to search through files, which usually apply exclusively to that application's files. The standards that Structured Storage brings to the scene allow consolidation of such features in the system shell, a boon for the end user and the developer alike—end users need not learn so many different user interfaces, and developers need not write them in the first place (unless they want to improve on the system shell, a legitimate market).

Just as the invention of file systems paid off big for the computer industry, you can expect that OLE's Compound Files, if put into practice, will pay off big once again. As we'll see in this chapter, simple use of the technology is quite similar to what you understand about files today. In the same manner as a file system makes disparate sectors on a disk appear as a contiguous byte array through the infamous file handle, OLE makes disparate fragments within a file appear as a contiguous byte array through a stream element. Indeed, there is almost a one-to-one correlation between how you work with a file handle and how you work with a stream.
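
The idea of presenting scattered fragments as one contiguous byte array can be sketched in a few lines. Again, this is a toy under stated assumptions and says nothing about the actual Compound Files layout: the Fragment and FragmentStream types and their members are hypothetical, and the "file" is simply a string held in memory.

```cpp
// Toy stream that presents scattered fragments of a file as one
// contiguous byte array. NOT the OLE API: all names are hypothetical.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One fragment: where it lives in the file and how many bytes it holds.
struct Fragment {
    std::size_t fileOffset;
    std::size_t length;
};

class FragmentStream {
public:
    // `frags` lists the fragments in logical (stream) order.
    FragmentStream(std::string file, std::vector<Fragment> frags)
        : file_(std::move(file)), frags_(std::move(frags)) {}

    // Move the logical position, much as one would seek on a file handle.
    void seek(std::size_t pos) { pos_ = pos; }

    // Read up to `count` bytes starting at the logical position,
    // crossing fragment boundaries without the caller noticing.
    std::string read(std::size_t count) {
        std::string out;
        std::size_t start = 0;  // logical offset where this fragment begins
        for (const Fragment& f : frags_) {
            const std::size_t end = start + f.length;
            if (count > 0 && pos_ >= start && pos_ < end) {
                const std::size_t n = std::min(count, end - pos_);
                out.append(file_, f.fileOffset + (pos_ - start), n);
                pos_ += n;
                count -= n;
            }
            start = end;
        }
        return out;  // shorter than requested if the stream ran out
    }

private:
    std::string file_;
    std::vector<Fragment> frags_;
    std::size_t pos_ = 0;
};
```

The caller works only with logical positions and byte counts, exactly as with a file handle; where the bytes physically sit within the file is the stream's business alone.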

After we examine the benefits and features of Structured Storage and Compound Files, we'll take a look at storage and stream elements more closely. These elements are objects with specific interfaces, namely IStorage and IStream. We'll spend some time learning how you access these objects and obtain your first interface pointer to them, showing examples in code as we apply the technology to the Patron and Cosmo samples.

This discussion will set the stage for the many uses of Structured Storage in other parts of OLE, such as Persistent Objects (Chapter 8), Monikers (Chapter 9), Uniform Data Transfer (Chapter 10, as well as Clipboard and Drag and Drop in Chapters 12 and 13), Property Sets (Chapter 16), OLE Documents (Chapters 17 through 23), and OLE Controls (Chapters 24 and 25). Obviously, Structured Storage plays a very supportive role in much of OLE, and just as storage has played a key role in the evolution of computers and software in general, this part of OLE is central to the evolution of component software.

1 Microsoft licenses the ANSI C++ source code for Compound Files as a reference implementation for those who want to use the technology on other platforms such as UNIX and OS/2. Microsoft provides the implementation for Windows and Macintosh as part of OLE.