Deep C++: Don’t Let These Words Scare You

by Brian Overland

In ancient Greece, the Pythagoreans met as a secret brotherhood guarding the truths of mathematics. Understanding these truths was only for the elite. One member was even booted out for revealing a holy of holies: the knowledge that the square root of two is irrational.

In a manner perhaps not quite so deliberate, an arcane priesthood seems to have developed around C++. If people merely had to understand the syntax and practical benefits of the language, perhaps the truths of C++ might be available to all. But too often, people feel that unless they master a set of strange, multisyllabic mantras—words such as "polymorphism," "abstraction," and "encapsulation"—they don’t really understand the essence of C++.

Polymorphism

Polymorphism is a useful term if you’re going to talk about comparative languages and systems, because the concept extends beyond the C++ mechanism (virtual functions). It can take years to understand this term fully and (more significantly) to comprehend when it could possibly make a difference. In the context of C++ itself, you can replace "polymorphism" with the phrase "using virtual functions." That’s all it really means.

In general, polymorphism is a technique for making objects act more like independent, autonomous units. Polymorphism de-centralizes control.

In fact, the best way to understand the concept is to look at the practical requirements of a system such as Windows. Windows is a message-based architecture that involves polymorphism. When the system sends a notification to an application window ("respond to a mouse click," for instance), the system cannot anticipate all the possible responses. If it could—if there were a finite list of responses to choose from—Windows would be an incredibly limited system.

Polymorphism (literally, "many forms" from Greek) really means unlimited possible responses. Because there are an unlimited number of possible responses to the same message, responses to Windows messages have to be polymorphic. In particular, new applications can be continually developed in the future, each bringing with it new message-handling code.

The innovation of object-oriented programming is simply to build this flexibility into the structure of a programming language. In C++, the virtual-function capability lets you apply this kind of mechanism wherever you choose. In C, you’d have to use callback functions.

Consider a call to a virtual Print() function (an admittedly clichéd example):

   Shape *pShape;
   // Assign pShape to point to an object
   // of a class derived from Shape
   pShape->Print();

The point of this example is that the call to Print() will always call the derived class’s implementation, assuming that Print() is virtual. And because the potential number of derived classes is unlimited, the number of possible responses is unlimited. This works correctly even when new classes are added after this code is compiled. New responses can be added without recompiling the main program, just as new Windows applications can be written without recompiling Windows itself.

Virtual

Virtual is a word greatly in vogue these days. The name virtual is potentially misleading, by the way: It means something that isn’t quite real but has the behavior of (that is, provides the virtues of) the real thing. But in truth, virtual functions are completely real. They are declared, defined, and used exactly as other member functions are, except the virtual keyword is used one time: the first time the function is declared. (This occurs in the base-class declaration.)

   class Shape {
   public:
   // ...
   virtual void Print(void);
   };

Moreover, except in the case of pure virtual functions (a subject for another column), a virtual function call doesn’t look or act differently from a "real" function call. The difference is that with a virtual function, the precise type of the object determines the actual function to be called, even though this type may not be known until run time. As shown in the previous section, this may happen because a call is made through a pointer of base-class type. At run time, the pointer pShape may be assigned to point to an object of a derived class (a square or a circle, for example), which provides its own version of Print().

This deferring of the function-call address to run time is also called late binding. It happens not through magic but through an indirect function call (using a pointer to a function). Understanding this helps you understand the trade-offs involved.

Not all functions should be made virtual, because there is a performance penalty: each call goes through a pointer, and in typical implementations each object carries a hidden pointer to a per-class table of function addresses. This detail is invisible to you at the source-code level, but it means that you should make a function virtual only if it is going to be overridden.

Encapsulation

The term encapsulation is more straightforward and easy to understand. (The major intimidation factor is the number of syllables!) Encapsulation means "to shield from the outside world." It’s probably more accurate, though, to say that the outside world gets protected from the inside of an object. Consider another common example: a string class. It can be manipulated through functions—the preferred interface here—but in this example, the data member is incorrectly made public as well:

   class String {
   public:
   char array[256];
   char *getstr(void);
   char *copystr(char *s);
   };

Given this declaration, there’s no reason for a programmer not to refer directly to the data member array, seeing such a reference as a shortcut. Now, we might not want a programmer to do this, but there’s nothing to stop him, and we know people don’t always read the documentation.

A problem arises when the designers update this class to make its use of storage more flexible and efficient. They may rewrite the class as follows:

   class String {
   public:
   char *p;
   int length;
   char *getstr(void);
   char *copystr(char *s);
   };

Now other programmers who have been using this class all through their code, while innocently accessing the (now nonexistent) data member array, have a serious problem. Every direct reference to array is now an error, and the user’s code is seriously broken.

Encapsulation, therefore, is primarily a way of protecting the user of a class. Errors would not have arisen in this case if the designers had decided to designate a certain part of the class as its interface and made the rest private. The private portion can be safely changed.

The term interface, by the way, has different meanings in different contexts, but it has a fairly consistent definition among programmers. An interface is the channel through which two entities can interact. (So, for example, a user interface is the part of a system that a user can interact with.)

C++, much more than C, encourages well-defined and well-enforced interfaces between parts of a program. As long as interfaces are clearly understood and do not change, programmers should be able to make changes to the rest of a program much more safely. The result, in theory, should be fewer bugs.

In C++, you encapsulate a member by making it private:

   class String {
   // Internals (encapsulated)
   private:
   char *p;
   int length;

   // Interface
   public:
   char *getstr(void);
   char *copystr(char *s);
   };

Yet if "encapsulation" is clear, data abstraction is the source of endless confusion among C++ programmers. The problem is inconsistency in the way people use the term, although the general notion is widely understood, if somewhat foggily.

Abstract is the opposite of concrete, but all data types are ultimately concrete in C++, just as they are in other languages. A data type without a representation is a data type that doesn’t exist in any real program, whether the language is Fortran, Basic, Pascal, or C++.

"Data abstraction" is meaningful as a general goal of programming: to require the user of a data type to know nothing about the specific structure of the type. In this regard, data abstraction has some connection to encapsulation. Abstraction is the goal; encapsulation is a tool.

There is no absolute link between the two concepts, however. For example, a FILE* pointer is a good representation of an abstract data type. A programmer using C can use such a pointer without knowing what the layout of a FILE structure is. Moreover, code that uses a FILE* pointer is generally portable between different compilers and libraries, each of whose concrete, internal representation of FILE is different. Yet nothing prevents a programmer using C from looking at this information and accessing it directly, ultimately breaking the code when there’s an attempt to port it. Encapsulation isn’t present in that case, though it might very well be helpful.

For example, the following code should always be portable, even though the actual layout of the FILE structure may change dramatically from one implementation of C/C++ to the next:

   FILE *fp;
   if ((fp = fopen("DATA", "w")) != NULL) {
      fprintf(fp, "Hello, file.");
      fclose(fp);
   }

Ah, ha!

C++ concepts, ultimately, are tools and techniques designed to solve practical problems. If you program long enough, you’ll come across all these problems yourself, so hopefully, when the terms are explained, your reaction should be, "Ah ha! That’s what it’s for!"

Such terms are not really forms of mysticism imposed on you from above, like Pythagoras instructing students at his feet.

Brian Overland has published several books on C and C++, and has written on programming topics for many groups at Microsoft. Before coming to Microsoft as a technical writer, he was a professional programmer, actor, and drama critic.