Deep C++: Don't Let These Words Scare You
by Brian Overland
In ancient Greece, the Pythagoreans met as a secret brotherhood guarding the truths of mathematics. Understanding these truths was only for the elite. One member was even booted out for revealing a holy of holies: the knowledge that the square root of two is irrational.
In a manner perhaps not quite so deliberate, an arcane priesthood seems to have developed around C++. If people merely had to understand the syntax and practical benefits of the language, perhaps the truths of C++ might be available to all. But too often, people feel that unless they master a set of strange, multisyllabic mantras (words such as "polymorphism," "abstraction," and "encapsulation"), they don't really understand the essence of C++.
Polymorphism
Polymorphism is a useful term if you're going to talk about comparative languages and systems, because the concept extends beyond the C++ mechanism (virtual functions). It can take years to understand this term fully and (more significantly) to comprehend when it could possibly make a difference. In the context of C++ itself, you can replace "polymorphism" with the phrase "using virtual functions." That's all it really means.
In general, polymorphism is a technique for making objects act more like independent, autonomous units. Polymorphism de-centralizes control.
In fact, the best way to understand the concept is to look at the practical requirements of a system such as Windows. Windows is a message-based architecture that involves polymorphism. When the system sends a notification to an application window ("respond to a mouse click," for instance), the system cannot anticipate all the possible responses. If it could (if there were a finite list of responses to choose from), Windows would be an incredibly limited system.
Polymorphism (literally, "many forms" from Greek) really means unlimited possible responses. Because there is an unlimited number of possible responses to the same message, responses to Windows messages have to be polymorphic. In particular, new applications can be continually developed in the future, each bringing with it new message-handling code.
The innovation of object-oriented programming is simply to build this flexibility into the structure of a programming language. In C++, the virtual-function capability lets you apply this kind of mechanism wherever you choose. In C, you'd have to use callback functions.
Consider a call to a virtual Print() function (an admittedly clichéd example):
Shape *pShape;
// Assign pShape to point to an object of
// a class derived from Shape
pShape->Print();
The point of this example is that the call to Print() will always call the derived class's implementation, assuming that Print() is virtual. And because the potential number of derived classes is unlimited, the number of possible responses is unlimited. This works correctly even when new classes are added after this code is compiled. New responses can be added without recompiling the main program, just as new Windows applications can be written without recompiling Windows itself.
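To make the idea concrete, here is a minimal sketch of the Shape example filled out into complete classes. The derived classes Circle and Square and the helper PrintAll() are hypothetical names chosen for illustration, and Print() is assumed to return its text rather than write it, so the result is easy to inspect.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Base class: Print() is virtual, so a call made through a Shape*
// is resolved at run time to the derived class's version.
class Shape {
public:
    virtual ~Shape() {}
    // Assumption for this sketch: Print() returns text instead of writing it.
    virtual std::string Print() const { return "generic shape"; }
};

// Hypothetical derived classes, each with its own response.
class Circle : public Shape {
public:
    std::string Print() const override { return "circle"; }
};

class Square : public Shape {
public:
    std::string Print() const override { return "square"; }
};

// This function knows only about Shape. It keeps working, unchanged,
// for any class derived from Shape, including ones written later.
std::string PrintAll(const std::vector<const Shape*>& shapes) {
    std::string out;
    for (const Shape* pShape : shapes) {
        assert(pShape != nullptr);
        out += pShape->Print();   // late-bound call
        out += '\n';
    }
    return out;
}
```

Notice that PrintAll() never mentions Circle or Square. Adding a Triangle class later would require no change to it, which is the practical payoff the term "polymorphism" is pointing at.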
Virtual
Virtual is a word greatly in vogue these days. The name virtual is potentially misleading, by the way: It means something that isn't quite real but has the behavior of (that is, provides the virtues of) the real thing. But in truth, virtual functions are completely real. They are declared, defined, and used exactly as other member functions are, except the virtual keyword is used one time: the first time the function is declared. (This occurs in the base-class declaration.)
class Shape
{
public:
    // ...
    virtual void Print(void);
};
Moreover, except in the case of pure virtual functions (a subject for another column), a virtual function call doesn't look or act differently from a "real" function call. The difference is that with a virtual function, the precise type of the object determines the actual function to be called, even though this type may not be known until run time. As shown in the previous section, this may happen because a call is made through a pointer of base-class type. At run time, the pointer pShape may be assigned to point to an object of a derived class (a square or a circle, for example), which provides its own version of Print().
This deferring of the function-call address to run time is also called late binding. It happens not through magic but through an indirect function call (using a pointer to a function). Understanding this helps you understand the trade-offs involved.
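If the indirect call sounds mysterious, it may help to see roughly what you would write by hand without the virtual keyword. The sketch below is only an illustration of the idea: ShapeC, MakeCircle(), MakeSquare(), and CallPrint() are hypothetical names, and real compilers typically keep a table of function pointers per class rather than one pointer per object. This is also, in essence, the callback technique a C programmer would use.

```cpp
#include <cassert>
#include <string>

// Hand-rolled "late binding": each object carries a pointer to the
// function that should handle Print for it.
struct ShapeC {
    std::string (*print)(const ShapeC* self);  // stand-in for the hidden pointer
    double size;
};

std::string PrintCircle(const ShapeC*) { return "circle"; }
std::string PrintSquare(const ShapeC*) { return "square"; }

// "Constructors" that wire each object to its own implementation.
ShapeC MakeCircle(double radius) { return ShapeC{&PrintCircle, radius}; }
ShapeC MakeSquare(double side)   { return ShapeC{&PrintSquare, side}; }

// The caller does not know which function it is calling. The address
// is read from the object at run time and then called indirectly.
std::string CallPrint(const ShapeC* pShape) {
    assert(pShape != nullptr && pShape->print != nullptr);
    return pShape->print(pShape);
}
```

The extra pointer and the extra level of indirection on every call are the cost being traded for flexibility.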
Not all functions should be made virtual, because there is a performance penalty: every virtual call is made indirectly, and each class must maintain a pointer to its own implementation of the virtual function. This detail is invisible to you at the source-code level, but it means that you should make a function virtual only if it is likely to be overridden.
Encapsulation
The term encapsulation is more straightforward and easier to understand. (The major intimidation factor is the number of syllables!) Encapsulation means "to shield from the outside world." It's probably more accurate, though, to say that the outside world gets protected from the inside of an object. Consider another common example: a string class. It can be manipulated through functions (the preferred interface here), but in this example, the data member is incorrectly made public as well:
class String {
public:
    char array[256];
    char *getstr(void);
    char *copystr(char *s);
};
Given this declaration, there's no reason for a programmer not to refer directly to the data member array, seeing such a reference as a shortcut. Now, we might not want a programmer to do this, but there's nothing to stop them, and we know people don't always read the documentation.
A problem arises when the designers update this class to make its use of storage more flexible and efficient. They may rewrite the class as follows:
class String {
public:
    char *p;
    int length;
    char *getstr(void);
    char *copystr(char *s);
};
Now other programmers who have been using this class all through their code, while innocently accessing the (now nonexistent) data member array, have a serious problem. Code that uses the String class is now riddled with errors; the user's code is seriously broken.
Encapsulation, therefore, is primarily a way of protecting the user of a class. Errors would not have arisen in this case if the designers had decided to designate a certain part of the class as its interface and made the rest private. The private portion can be safely changed.
The term interface, by the way, has different meanings in different contexts, but it has a fairly consistent definition among programmers. An interface is the channel through which two entities can interact. (So, for example, a user interface is the part of a system that a user can interact with.)
C++, much more than C, encourages well-defined and well-enforced interfaces between parts of a program. As long as interfaces are clearly understood and do not change, programmers should be able to make changes to the rest of a program much more safely. The result, in theory, should be fewer bugs.
In C++, you encapsulate a member by making it private:
class String {
// Internals (encapsulated)
private:
    char *p;
    int length;
// Interface
public:
    char *getstr(void);
    char *copystr(char *s);
};
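Here is one way the encapsulated class might be fleshed out so that it compiles and runs. Treat it as a sketch under assumptions: the article never defines what copystr() does, so this version assumes it replaces the string's contents with a copy of s; it takes a const char * so that string literals can be passed; and getlength() is a hypothetical accessor added only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class String {
// Internals (encapsulated): free to change in a later version.
private:
    char *p;
    int length;

// Interface: the only part that calling code can depend on.
public:
    String() : p(new char[1]), length(0) { p[0] = '\0'; }
    ~String() { delete[] p; }
    String(const String&) = delete;             // copying is out of scope here
    String& operator=(const String&) = delete;

    char *getstr(void) { return p; }

    // Assumed behavior: replace the contents with a copy of s.
    char *copystr(const char *s) {
        assert(s != nullptr);
        std::size_t n = std::strlen(s);
        char *q = new char[n + 1];
        std::memcpy(q, s, n + 1);               // includes the terminator
        delete[] p;
        p = q;
        length = static_cast<int>(n);
        return p;
    }

    // Hypothetical accessor, not part of the article's class.
    int getlength(void) const { return length; }
};
```

Code that calls only getstr() and copystr() would be unaffected if the designers later went back to a fixed array, or switched to some other storage scheme entirely. Code that tries to touch p directly simply will not compile.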
Data Abstraction
Yet if "encapsulation" is clear, data abstraction is the source of endless confusion among C++ programmers. The problem is inconsistency in the way people use the term, although the general notion is widely understood, if somewhat foggily.
Abstract is the opposite of concrete, but all data types are ultimately concrete in C++, just as they are in other languages. A data type without a representation is a data type that doesn't exist in any real program, whether the language is Fortran, Basic, Pascal, or C++.
"Data abstraction" is meaningful as a general goal of programming: to let the user of a data type work with it while knowing nothing about the specific structure of the type. In this regard, data abstraction has some connection to encapsulation. Abstraction is the goal; encapsulation is a tool.
There is no absolute link between the two concepts, however. For example, a FILE* pointer is a good representation of an abstract data type. A programmer using C can use such a pointer without knowing what the layout of a FILE structure is. Moreover, code that uses a FILE* pointer is generally portable between different compilers and libraries, each of which has a different concrete, internal representation of FILE. Yet nothing prevents a programmer using C from looking at this information and accessing it directly, ultimately breaking the code when there's an attempt to port it. Encapsulation isn't present in that case, though it might very well be helpful.
For example, the following code should always be portable, even though the actual layout of the FILE structure may change dramatically from one implementation of C/C++ to the next:
FILE *fp;
if ((fp = fopen("DATA", "w")) != NULL) {   // "w": open for writing
    fprintf(fp, "Hello, file.");
    fclose(fp);
}
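You can build the same kind of abstract type yourself. The sketch below imitates the FILE* pattern with an invented Counter type; every name in it is hypothetical. Callers work only through a pointer and a handful of functions. In a real project, the struct definition would sit in its own source file so that callers never see the layout at all.

```cpp
#include <cassert>

// Public part: callers see a declaration, never the layout.
struct Counter;                        // hypothetical opaque type
Counter *counter_open(int start);
void counter_bump(Counter *c);
int counter_value(const Counter *c);
void counter_close(Counter *c);

// Private part: would normally live in a separate source file,
// where it could change without affecting any caller.
struct Counter {
    int n;
};

Counter *counter_open(int start) { return new Counter{start}; }

void counter_bump(Counter *c) {
    assert(c != nullptr);
    ++c->n;
}

int counter_value(const Counter *c) {
    assert(c != nullptr);
    return c->n;
}

void counter_close(Counter *c) { delete c; }
```

A caller that sticks to the four functions has no stake in whether Counter holds an int, a long, or something more elaborate. That is data abstraction as a goal, reached here by convention rather than by the private keyword.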
Ah, ha!
C++ concepts, ultimately, are tools and techniques designed to solve practical problems. If you program long enough, you'll come across all these problems yourself, so hopefully, when the terms are explained, your reaction should be, "Ah ha! That's what it's for!"
Such terms are not really forms of mysticism imposed on you from above, like Pythagoras instructing students at his feet.
Brian Overland has published several books on C and C++, and has written on programming topics for many groups at Microsoft. Before coming to Microsoft as a technical writer, he was a professional programmer, actor, and drama critic.