References are like constant
pointers that are automatically dereferenced by the compiler.
Although references
also exist in Pascal, the C++ version was taken from the
Algol language. They are essential in C++ to support the syntax of operator
overloading (see Chapter 12),
but they are also a general convenience to control the way arguments are passed
into and out of functions.
This chapter will first look briefly at
the differences between pointers in C and C++, then
introduce references. But the bulk of the chapter will delve into a rather
confusing issue for the new C++ programmer: the
copy-constructor, a special
constructor (requiring references) that makes a new object from an existing
object of the same type. The copy-constructor is used by the compiler to pass
and return objects by value
into and out of
functions.
The most important difference between
pointers in C and those in C++ is that
C++ is a
more strongly typed language. This stands out where
void*
is concerned. C doesn’t let you casually assign a pointer of one type to
another, but it does allow you to accomplish this through a void*.
Thus,
bird* b;
rock* r;
void* v;
v = r;
b = v;
Because this “feature” of C
allows you to quietly treat any type like any other type, it leaves a big
hole in the type system. C++ doesn’t allow this;
the compiler gives you an error message, and if you really want to treat one
type as another, you must make it explicit, both to the compiler and to the
reader, using a cast. (Chapter 3 introduced C++’s improved
“explicit” casting
syntax.)
A reference
(&) is like a constant pointer that is
automatically dereferenced. It is usually used for function argument lists
and function return
values. But you can also make a
free-standing reference. For
example,
//: C11:FreeStandingReferences.cpp
#include <iostream>
using namespace std;

// Ordinary free-standing reference:
int y;
int& r = y;
// When a reference is created, it must
// be initialized to a live object.
// However, you can also say:
const int& q = 12;  // (1)
// References are tied to someone else's storage:
int x = 0;          // (2)
int& a = x;         // (3)
int main() {
  cout << "x = " << x << ", a = " << a << endl;
  a++;
  cout << "x = " << x << ", a = " << a << endl;
}
///:~
In line (1), the compiler allocates a
piece of storage, initializes it with the value 12, and ties the reference to
that piece of storage. The point is that any reference must be tied to someone
else’s piece of storage. When you access a reference, you’re
accessing that storage. Thus, if you write lines like (2) and (3), then
incrementing a is actually incrementing x, as is shown in
main( ). Again, the easiest way to think about a reference is as a
fancy pointer. One advantage of this “pointer” is that you never
have to wonder whether it’s been initialized (the compiler enforces it)
or how to dereference it (the compiler does it).
The most common place you’ll see
references is as function arguments and return values. When a reference is used
as a function argument, any modification to the
reference inside the function will cause changes to the argument
outside the function. Of course, you could do the same thing by passing a
pointer, but a reference has much cleaner syntax. (You can think of a reference
as nothing more than a syntax convenience, if you want.)
If you return a
reference from a function, you must take the same care as if you return a
pointer from a function. Whatever the reference is connected to shouldn’t
go away when the function returns; otherwise you’ll be referring to
unknown memory.
Here’s an example:
//: C11:Reference.cpp
// Simple C++ references

int* f(int* x) {
  (*x)++;
  return x; // Safe, x points outside this scope
}

int& g(int& x) {
  x++; // Same effect as in f()
  return x; // Safe, outside this scope
}

int& h() {
  int q;
//!  return q;  // Dangling: q dies when h() returns
  static int x;
  return x; // Safe, x lives outside this scope
}

int main() {
  int a = 0;
  f(&a); // Ugly (but explicit)
  g(a);  // Clean (but hidden)
}
///:~
The call to f( )
doesn’t have the convenience and cleanliness of using references, but
it’s clear that an address is being passed. In the call to
g( ), an address is being passed (via a reference), but you
don’t see it.
The reference argument in
Reference.cpp works only when the argument is a non-const object.
If it is a const object, the function g( ) will not accept
the argument, which is actually a good thing, because the function does
modify the outside argument. If you know the function will respect the
constness of an object, making the argument a
const reference will allow the function to be
used in all situations. This means that, for built-in types, the function will
not modify the argument, and for user-defined types, the function will call only
const member functions, and won’t modify any public data
members.
The use of const references in
function arguments is especially important because your function may receive a
temporary
object. This might have been
created as a return value of another function or explicitly by the user of your
function. A temporary cannot be bound to a non-const reference, so if you don’t use a
const reference, that argument won’t be accepted by the compiler.
As a very simple example,
//: C11:ConstReferenceArguments.cpp
// Passing references as const

void f(int&) {}
void g(const int&) {}

int main() {
//!  f(1); // Error
  g(1);
}
///:~
The call to f(1) causes a
compile-time error because the compiler must first create a reference. It does
so by allocating storage for an int, initializing it to one and producing
the address to bind to the reference. The storage must be treated as const
because changing it would make no sense; you can never get your hands on
it again. With all temporary objects you must make the same assumption: that
they’re inaccessible. It’s valuable for the compiler to tell you
when you’re changing such data because the result would be lost
information.
In C, if you want to modify the
contents of the pointer rather than what it points to, your function
declaration looks like:
void f(int**);
and you’d have to take the address
of the pointer when passing it in:
int i = 47;
int* ip = &i;
f(&ip);
With references in C++, the syntax is
cleaner. The function argument becomes a reference to a pointer, and you no
longer have to take the address of that pointer. Thus,
//: C11:ReferenceToPointer.cpp
#include <iostream>
using namespace std;

void increment(int*& i) { i++; }

int main() {
  int arr[2] = { 0, 0 };
  int* i = arr; // Point at real storage; incrementing a null pointer is undefined
  cout << "i = " << i << endl;
  increment(i);
  cout << "i = " << i << endl;
}
///:~
By running this program, you’ll
prove to yourself that the pointer is incremented, not what it points
to.
Your normal habit when passing an
argument to a function should be to pass by const reference. Although at
first this may seem like only an efficiency
concern (and you normally
don’t want to concern yourself with efficiency tuning while you’re
designing and assembling your program), there’s more at stake: as
you’ll see in the remainder of the chapter, a copy-constructor is required
to pass an object by value, and this isn’t always
available.
The efficiency savings can be substantial
for such a simple habit: passing an object by value requires a copy-constructor call and a
destructor call, but if you’re not going to modify the argument then
passing by const reference only needs an address pushed on the
stack.
In fact, virtually the only time passing
an address isn’t preferable is when you’re going to do such
damage to an object that passing by value is the only safe approach (rather than
modifying the outside object, something the caller doesn’t usually
expect). This is the subject of the next
section.
Now that you understand the basics of the
reference in C++, you’re ready to tackle one of the more confusing
concepts in the language: the
copy-constructor, often called
X(X&) (“X of X ref”). This constructor is essential to
control passing and returning of user-defined types by value during function
calls. It’s so important, in fact, that the compiler will automatically
synthesize a copy-constructor if you don’t provide one yourself, as you
will
see.
To understand the need for the
copy-constructor, consider the way C handles passing and returning variables by
value
during
function calls. If you declare a function and make a function
call,
int f(int x, char c);
int g = f(a, b);
how does the compiler know how to pass
and return those variables? It just knows! The range of the types it must deal
with is so small – char, int, float, double,
and their variations – that this information is built into the compiler.
If you figure out how to generate
assembly
code with your compiler and determine the statements generated by the function
call to f( ), you’ll get the equivalent of:
push  b
push  a
call  f()
add  sp,4
mov  g, register a
This code has been cleaned up
significantly to make it generic; the expressions for b and a will
be different depending on whether the variables are global (in which case they
will be _b and _a) or local (the compiler will index them off the
stack pointer). This is also true for the expression for g. The
appearance of the call to f( ) will depend on your name-decoration
scheme, and “register a” depends on how the CPU registers are named
within your assembler. The logic behind the code, however, will remain the
same.
With the traditional C calling convention shown here, arguments are first pushed
on the stack from right to left, then the function call is made. The calling
code is responsible for cleaning the arguments off the stack (which accounts for
the add sp,4). But notice that to pass the arguments by value, the
compiler simply pushes copies on the stack – it knows how big they are and
that pushing those arguments makes accurate copies of them.
The return value of f( ) is
placed in a register. Again, the compiler knows everything there is to know
about the return value type because that type is built into the language, so the
compiler can return it by placing it in a register. With the primitive data
types in C, the simple act of copying the bits of the value is equivalent to
copying the object.
But now consider user-defined types. If
you create a class and you want to pass an object of that class by value, how is
the compiler supposed to know what to do? This is not a type built into the
compiler; it’s a type you have created.
To investigate this, you can start with a
simple structure that is clearly too large to return in
registers:
//: C11:PassingBigStructures.cpp
struct Big {
  char buf[100];
  int i;
  long d;
} B, B2;

Big bigfun(Big b) {
  b.i = 100; // Do something to the argument
  return b;
}

int main() {
  B2 = bigfun(B);
}
///:~
Decoding the assembly output is a little
more complicated here because most compilers use “helper” functions
instead of putting all functionality inline. In main( ), the call to
bigfun( ) starts as you might guess – the entire contents of
B is pushed on the stack. (Here, you might see some compilers load
registers with the address of the Big and its size, then call a helper
function to push the Big
onto the stack.)
In the previous code fragment, pushing
the arguments onto the stack was all that was required before making the
function call. In PassingBigStructures.cpp, however, you’ll see an
additional action: the address of B2 is pushed before making the call,
even though it’s obviously not an argument. To comprehend what’s
going on here, you need to understand the constraints on the compiler when
it’s making a function call.
When the compiler generates code for a
function call, it first pushes all the arguments on the stack, then makes the
call. Inside the function, code is generated to move the stack pointer down even
farther to provide storage for the function’s local variables.
(“Down” is relative here; your machine may increment or decrement
the stack pointer during a push.) But during the assembly-language
CALL, the CPU pushes the address
in the program code where the function call came from, so the
assembly-language RETURN can use
that address to return to the calling point. This address is of course sacred,
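because without it your program will get completely lost. After the CALL and
the allocation of local variable storage in the function, the stack frame holds,
in the order they were placed on the stack, the function arguments, then the
return address, then the local variables.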
The code generated for the rest of the
function expects the memory to be laid out exactly this way, so that it can
carefully pick from the function arguments and local variables without touching
the return address. I shall call this block of memory, which is everything used
by a function in the process of the function call, the function
frame.
You might think it reasonable to try to
return values on the stack. The compiler could simply push the return value, and the function
could return an offset to indicate how far down in the stack the return value
begins.
The problem occurs because functions in C
and C++ support interrupts; that is, the languages are
re-entrant. They also support recursive function
calls. This means that at any point in the execution of a program an interrupt
can occur without breaking the program. Of course, the person who writes the
interrupt service routine (ISR) is responsible for
saving and restoring all the registers that are used in the ISR, but if the ISR
needs to use any memory further down on the stack, this must be a safe thing to
do. (You can think of an ISR as an ordinary function with no arguments and
void return value that saves and restores the CPU state. An ISR function
call is triggered by some hardware event instead of an explicit call from within
a program.)
Now imagine what would happen if an
ordinary function tried to return values on the stack. You can’t touch any
part of the stack that’s above the return address, so the function would
have to push the values below the return address. But when the assembly-language
RETURN is executed, the stack pointer must be pointing to the return address (or
right below it, depending on your machine), so right before the RETURN, the
function must move the stack pointer up, thus clearing off all its local
variables. If you’re trying to return values on the stack below the return
address, you become vulnerable at that moment because an interrupt could come
along. The ISR would move the stack pointer down to hold its return address and
its local variables and overwrite your return value.
To solve this problem, the caller
could be responsible for allocating the extra storage on the stack for
the return values before calling the function. However, C was not designed this
way, and C++ must be compatible. As you’ll see shortly, the C++ compiler
uses a more efficient scheme.
Your next idea might be to return the
value in some global data area, but this doesn’t work either. Reentrancy
means that any function can be an interrupt routine for any other function,
including the same function you’re currently inside. Thus, if you
put the return value in a global area, you might return into the same function,
which would overwrite that return value. The same logic applies to
recursion.
The only safe place to return values is
in the registers, so you’re back to the problem of what to do when the
registers aren’t large enough to hold the return value. The answer is to
push the address of the return value’s destination on the stack as one of
the function arguments, and let the function copy the return information
directly into the destination. This not only solves all the problems, it’s
more efficient. It’s also the reason that, in
PassingBigStructures.cpp, the compiler pushes the address of B2
before the call to bigfun( ) in main( ). If you look at
the assembly output for bigfun( ), you can see it expects this
hidden argument and performs the copy to the destination inside the
function.
So far, so good. There’s a workable
process for passing and returning large simple structures. But notice that all
you have is a way to copy the bits from one place to another, which certainly
works fine for the primitive way that C looks at variables. But in C++ objects
can be much more sophisticated than a patch of bits; they have meaning. This
meaning may not respond well to having its bits copied.
Consider a simple example: a class that
knows how many objects of its type exist at any one time. From Chapter 10, you
know the way to do this is by including a static data
member:
//: C11:HowMany.cpp
// A class that counts its objects
#include <fstream>
#include <string>
using namespace std;
ofstream out("HowMany.out");

class HowMany {
  static int objectCount;
public:
  HowMany() { objectCount++; }
  static void print(const string& msg = "") {
    if(msg.size() != 0) out << msg << ": ";
    out << "objectCount = "
        << objectCount << endl;
  }
  ~HowMany() {
    objectCount--;
    print("~HowMany()");
  }
};

int HowMany::objectCount = 0;

// Pass and return BY VALUE:
HowMany f(HowMany x) {
  x.print("x argument inside f()");
  return x;
}

int main() {
  HowMany h;
  HowMany::print("after construction of h");
  HowMany h2 = f(h);
  HowMany::print("after call to f()");
}
///:~
The class HowMany contains a
static int objectCount and a static member function
print( ) to report the value of that objectCount, along with
an optional message argument. The constructor increments the count each time an
object is created, and the destructor decrements it.
The output, however, is not what you
would expect:
after construction of h: objectCount = 1
x argument inside f(): objectCount = 1
~HowMany(): objectCount = 0
after call to f(): objectCount = 0
~HowMany(): objectCount = -1
~HowMany(): objectCount = -2
After h is created, the object
count is one, which is fine. But after the call to f( ) you would
expect to have an object count of two, because h2 is now in scope as
well. Instead, the count is zero, which indicates something has gone horribly
wrong. This is confirmed by the fact that the two destructors at the end make
the object count go negative, something that should never
happen.
Look at the point inside
f( ), which occurs after the argument is passed by value. This means
the original object h exists outside the function frame, and
there’s an additional object inside the function frame, which is
the copy that has been passed by value. However, the argument has been passed
using C’s primitive notion of bitcopying, whereas the C++ HowMany
class requires true initialization to maintain its integrity, so the default
bitcopy fails to produce the desired effect.
When the local object goes out of scope
at the end of the call to f( ), the destructor is called, which
decrements objectCount, so outside the function, objectCount is
zero. The creation of h2 is also performed using a bitcopy, so the
constructor isn’t called there either, and when h and h2 go
out of scope, their destructors cause the negative values of
objectCount.
The problem occurs because the compiler
makes an assumption about how to create a new object from an existing
object.
When you pass an object by
value, you create a new object, the passed object inside the function frame,
from an existing object, the original object outside the function frame. This is
also often true when returning an object from a function. In the expression
HowMany h2 = f(h);
h2, a previously unconstructed
object, is created from the return value of f( ), so again a new
object is created from an existing one.
The compiler’s assumption is that
you want to perform this creation using a bitcopy, and in many cases this may
work fine, but in HowMany it doesn’t fly because the meaning of
initialization goes beyond simply copying. Another common example occurs if the
class contains pointers – what do they point to, and should you copy them
or should they be connected to some new piece of memory?
Fortunately, you can intervene in this
process and prevent the compiler from doing a bitcopy. You do this by defining
your own function to be used whenever the compiler needs to make a new object
from an existing object. Logically enough, you’re making a new object, so
this function is a constructor, and also logically enough, the single argument
to this constructor has to do with the object you’re constructing from.
But that object can’t be passed into the constructor by value because
you’re trying to define the function that handles passing by value,
and syntactically it doesn’t make sense to pass a pointer because, after
all, you’re creating the new object from an existing object. Here,
references come to the rescue, so you take the reference of the source object.
This function is called the
copy-constructor and is
often referred to as X(X&), which is its appearance for a class
called X.
If you create a copy-constructor, the
compiler will not perform a bitcopy when creating a new object from an existing
one. It will always call your copy-constructor. So, if you don’t create a
copy-constructor, the compiler will do something sensible, but you have the
choice of taking over complete control of the process.
Now it’s possible to fix the
problem in HowMany.cpp:
//: C11:HowMany2.cpp
// The copy-constructor
#include <fstream>
#include <string>
using namespace std;
ofstream out("HowMany2.out");

class HowMany2 {
  string name; // Object identifier
  static int objectCount;
public:
  HowMany2(const string& id = "") : name(id) {
    ++objectCount;
    print("HowMany2()");
  }
  ~HowMany2() {
    --objectCount;
    print("~HowMany2()");
  }
  // The copy-constructor:
  HowMany2(const HowMany2& h) : name(h.name) {
    name += " copy";
    ++objectCount;
    print("HowMany2(const HowMany2&)");
  }
  void print(const string& msg = "") const {
    if(msg.size() != 0)
      out << msg << endl;
    out << '\t' << name << ": "
        << "objectCount = "
        << objectCount << endl;
  }
};

int HowMany2::objectCount = 0;

// Pass and return BY VALUE:
HowMany2 f(HowMany2 x) {
  x.print("x argument inside f()");
  out << "Returning from f()" << endl;
  return x;
}

int main() {
  HowMany2 h("h");
  out << "Entering f()" << endl;
  HowMany2 h2 = f(h);
  h2.print("h2 after call to f()");
  out << "Call f(), no return value" << endl;
  f(h);
  out << "After call to f()" << endl;
}
///:~
There are a number of new twists thrown
in here so you can get a better idea of what’s happening. First, the
string name acts as an object identifier when information about
that object is printed. In the constructor, you can put an identifier string
(usually the name of the object) that is copied to name using the
string constructor. The default = "" creates an empty
string. The constructor increments the objectCount as before, and
the destructor decrements it.
Next is the copy-constructor,
HowMany2(const HowMany2&). The copy-constructor can create a new
object only from an existing one, so the existing object’s name is copied
to name, followed by the word “copy” so you can see where it
came from. If you look closely, you’ll see that the call
name(h.name) in the constructor initializer list is actually calling the
string copy-constructor.
Inside the copy-constructor, the object
count is incremented just as it is inside the normal constructor. This means
you’ll now get an accurate object count when passing and returning by
value.
The print( ) function has
been modified to print out a message, the object identifier, and the object
count. It must now access the name data of a particular object, so it can
no longer be a static member
function.
Inside main( ), you can see
that a second call to f( ) has been added. However, this call uses
the common C approach of ignoring the return value. But now that you know how
the value is returned (that is, code inside the function handles the
return process, putting the result in a destination whose address is passed as a
hidden argument), you might wonder what happens when the return value is
ignored. The output of the program will shed some light on
this.
Before showing the output, here’s a
little program that uses iostreams to add line numbers to any
file:
//: C11:Linenum.cpp
//{T} Linenum.cpp
// Add line numbers
#include "../require.h"
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
#include <cmath>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1, "Usage: linenum file\n"
    "Adds line numbers to file");
  ifstream in(argv[1]);
  assure(in, argv[1]);
  string line;
  vector<string> lines;
  while(getline(in, line)) // Read in entire file
    lines.push_back(line);
  if(lines.size() == 0) return 0;
  int num = 0;
  // Number of lines in file determines width:
  const int width =
    int(log10((double)lines.size())) + 1;
  for(int i = 0; i < lines.size(); i++) {
    cout.setf(ios::right, ios::adjustfield);
    cout.width(width);
    cout << ++num << ") " << lines[i] << endl;
  }
}
///:~
The entire file is read into a
vector<string>, using the same code that you’ve seen earlier
in the book. When printing the line numbers, we’d like all the lines to be
aligned with each other, and this requires adjusting for the number of lines in
the file so that the width allowed for the line numbers is consistent. We can
easily determine the number of lines using vector::size( ), but what
we really need to know is whether there are more than 10 lines, 100 lines, 1,000
lines, etc. If you take the logarithm, base 10, of the
number of lines in the file, truncate it to an int and add one to the
value, you’ll find out the maximum width that your line count will
be.
You’ll notice a couple of strange
calls inside the for loop:
setf( ) and
width( ). These are
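ostream calls that allow you to control, in this case, the justification
and width of the output. The justification set with setf( ) stays in effect until it is changed, but width( ) applies only to the next formatted output and is then reset, so it must be called each time a line is output;
that is why these calls are inside the for loop. Volume 2 of this book has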
an entire chapter explaining iostreams that will tell you more about these calls
as well as other ways to control iostreams.
When Linenum.cpp is applied to
HowMany2.out, the result is
 1) HowMany2()
 2)     h: objectCount = 1
 3) Entering f()
 4) HowMany2(const HowMany2&)
 5)     h copy: objectCount = 2
 6) x argument inside f()
 7)     h copy: objectCount = 2
 8) Returning from f()
 9) HowMany2(const HowMany2&)
10)     h copy copy: objectCount = 3
11) ~HowMany2()
12)     h copy: objectCount = 2
13) h2 after call to f()
14)     h copy copy: objectCount = 2
15) Call f(), no return value
16) HowMany2(const HowMany2&)
17)     h copy: objectCount = 3
18) x argument inside f()
19)     h copy: objectCount = 3
20) Returning from f()
21) HowMany2(const HowMany2&)
22)     h copy copy: objectCount = 4
23) ~HowMany2()
24)     h copy: objectCount = 3
25) ~HowMany2()
26)     h copy copy: objectCount = 2
27) After call to f()
28) ~HowMany2()
29)     h copy copy: objectCount = 1
30) ~HowMany2()
31)     h: objectCount = 0
As you would expect, the first
thing that happens is that the normal constructor is called for h, which
increments the object count to one. But then, as f( ) is entered,
the copy-constructor is quietly called by the compiler to perform the
pass-by-value. A new object is created, which is the copy of h (thus the
name “h copy”) inside the function frame of f( ), so the
object count becomes two, courtesy of the copy-constructor.
Line eight indicates the beginning of the
return from f( ). But before the local variable “h copy”
can be destroyed (it goes out of scope at the end of the function), it must be
copied into the return value, which happens to be h2. A previously
unconstructed object (h2) is created from an existing object (the local
variable inside f( )), so of course the copy-constructor is used
again in line nine. Now the name becomes “h copy copy” for
h2’s identifier because it’s being copied from the copy that
is the local object inside f( ). After the object is returned, but
before the function ends, the object count becomes temporarily three, but then
the local object “h copy” is destroyed. After the call to
f( ) completes in line 13, there are only two objects, h and
h2, and you can see that h2 did indeed end up as “h copy
copy.”
Line 15 begins the call to f(h),
this time ignoring the return value. You can see in line 16 that the
copy-constructor is called just as before to pass the argument in. And also, as
before, line 21 shows the copy-constructor is called for the return value. But
the copy-constructor must have an address to work on as its destination (a
this pointer). Where does this address come
from?
It turns out the compiler can create a
temporary object whenever it needs one to properly evaluate an expression. In
this case it creates one you don’t even see to act as the destination for
the ignored return value of f( ). The lifetime of this temporary
object is as short as possible so the landscape
doesn’t get cluttered up with temporaries waiting to be destroyed and
taking up valuable resources. In some cases, the temporary might immediately be
passed to another function, but in this case it isn’t needed after the
function call, so as soon as the function call ends by calling the destructor
for the local object (lines 23 and 24), the temporary object is destroyed (lines
25 and 26).
Finally, in lines 28-31, the h2
object is destroyed, followed by h, and the object count goes correctly
back to
zero.
Because the copy-constructor implements
pass and return by value, it’s important that the compiler creates one for
you in the case of simple structures – effectively, the same thing it does
in C. However, all you’ve seen so far is the default primitive behavior: a
bitcopy.
When more complex types are involved, the
C++ compiler will still automatically create a copy-constructor if you
don’t make one. Again, however, a bitcopy
doesn’t make sense, because it doesn’t
necessarily implement the proper meaning.
Here’s an example to show the more
intelligent approach the compiler takes. Suppose you create a new class composed
of objects of several existing classes. This is called, appropriately enough,
composition,
and it’s one of the ways you can make new classes from existing classes.
Now take the role of a naive user who’s trying to solve a problem quickly
by creating a new class this way. You don’t know about copy-constructors,
so you don’t create one. The example demonstrates what the compiler does
while creating the default copy-constructor for your new class:
//: C11:DefaultCopyConstructor.cpp
// Automatic creation of the copy-constructor
#include <iostream>
#include <string>
using namespace std;

class WithCC { // With copy-constructor
public:
  // Explicit default constructor required:
  WithCC() {}
  WithCC(const WithCC&) {
    cout << "WithCC(WithCC&)" << endl;
  }
};

class WoCC { // Without copy-constructor
  string id;
public:
  WoCC(const string& ident = "") : id(ident) {}
  void print(const string& msg = "") const {
    if(msg.size() != 0) cout << msg << ": ";
    cout << id << endl;
  }
};

class Composite {
  WithCC withcc; // Embedded objects
  WoCC wocc;
public:
  Composite() : wocc("Composite()") {}
  void print(const string& msg = "") const {
    wocc.print(msg);
  }
};

int main() {
  Composite c;
  c.print("Contents of c");
  cout << "Calling Composite copy-constructor"
       << endl;
  Composite c2 = c;  // Calls copy-constructor
  c2.print("Contents of c2");
}
///:~
The class WithCC contains a
copy-constructor, which simply announces that it has been called, and this
brings up an interesting issue. In the class Composite, an object of
WithCC is created using a default constructor. If there were no
constructors at all in WithCC, the compiler would automatically create a
default constructor, which would
do nothing in this case. However, if you add a copy-constructor, you’ve
told the compiler you’re going to handle constructor creation, so it no
longer creates a default constructor for you and will complain unless you
explicitly create a default constructor as was done for
WithCC.
The class WoCC has no
copy-constructor, but its constructor will store a message in an internal
string that can be printed out using print( ). This
constructor is explicitly called in Composite’s constructor
initializer list (briefly introduced in Chapter 8 and covered fully in Chapter
14). The reason for this becomes apparent later.
The class Composite has member
objects of both WithCC and WoCC (note the embedded object
wocc is initialized in the constructor-initializer list, as it must be),
and no explicitly defined copy-constructor. However, in main( ) an
object is created using the copy-constructor in the definition:
Composite c2 = c;
The copy-constructor for Composite
is created automatically by the compiler, and the output of the program
reveals the way that it is created:
Contents of c: Composite()
Calling Composite copy-constructor
WithCC(WithCC&)
Contents of c2: Composite()
To create a copy-constructor for a class
that uses composition (and
inheritance,
which is introduced in Chapter 14), the compiler recursively calls the
copy-constructors for all the member objects and base classes. That is, if the
member object also contains another object, its copy-constructor is also called.
So in this case, the compiler calls the copy-constructor for WithCC. The
output shows this constructor being called. Because WoCC has no
copy-constructor, the compiler creates one for it that copies each member in
turn (for WoCC, that means copying id with string’s own copy-constructor rather
than with a raw bitcopy), and calls that inside the Composite copy-constructor.
The call to Composite::print( ) in main shows that this happens because the
contents of c2.wocc are identical to the contents of c.wocc. The
process the compiler goes through to synthesize a copy-constructor is called
memberwise
initialization.
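Memberwise initialization can be observed directly by giving a member type a copy-constructor that counts its calls. This is a minimal sketch under C++11; Tracked, Holder, and memberwiseCopyWorks are hypothetical names. Holder declares no copy-constructor, so the synthesized one copies t with Tracked’s copy-constructor, name with std::string’s, and count by plain copy.

```cpp
#include <string>

// Hypothetical member type whose copy-constructor records that it ran.
struct Tracked {
  static int copies;  // number of copy-constructions so far
  std::string label;
  explicit Tracked(const std::string& s = "") : label(s) {}
  Tracked(const Tracked& other) : label(other.label) { ++copies; }
};
int Tracked::copies = 0;

// Hypothetical composite with no copy-constructor of its own.
struct Holder {
  Tracked t;
  std::string name;  // copied by std::string's copy-constructor
  int count;         // copied bit for bit
};

// Copies a Holder and reports whether every member was copied.
bool memberwiseCopyWorks() {
  Holder h1{Tracked("inner"), "outer", 3};
  int before = Tracked::copies;  // measured after h1 is fully built
  Holder h2 = h1;                // synthesized copy-constructor
  return Tracked::copies == before + 1  // Tracked's copy-constructor ran once
      && h2.t.label == "inner"
      && h2.name == "outer"
      && h2.count == 3;
}
```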
It’s always best to create your own
copy-constructor instead of letting the compiler do it for you. This guarantees
that it will be under your
control.
At this point your head may be swimming,
and you might be wondering how you could have possibly written a working class
without knowing about the copy-constructor. But remember: You need a
copy-constructor only if you’re going to pass an object of your class
by value. If that never happens, you don’t need a
copy-constructor.
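The difference between passing by value and passing by reference is easy to see by counting copy-constructor calls. In this minimal sketch (Counted, byValue, and byConstRef are hypothetical names), passing an existing object by value copy-constructs the parameter, while passing it by const reference binds to the caller’s object and makes no copy.

```cpp
// Hypothetical class that counts how many times its copy-constructor runs.
class Counted {
public:
  static int copies;
  Counted() {}
  Counted(const Counted&) { ++copies; }
};
int Counted::copies = 0;

void byValue(Counted) {}            // parameter is copy-constructed from the argument
void byConstRef(const Counted&) {}  // binds to the caller's object, no copy

// Number of copies made by one call to byValue.
int copiesForByValue() {
  Counted c;
  int before = Counted::copies;
  byValue(c);
  return Counted::copies - before;
}

// Number of copies made by one call to byConstRef.
int copiesForByConstRef() {
  Counted c;
  int before = Counted::copies;
  byConstRef(c);
  return Counted::copies - before;
}
```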
“But,” you say, “if I
don’t make a copy-constructor, the compiler will create one for me. So how
do I know that an object will never be passed by value?”
There’s a simple technique for
preventing pass-by-value: declare a private
copy-constructor. You don’t even need to create a
definition, unless one of your member functions or a friend function
needs to perform a pass-by-value. If the user tries to pass or return the object
by value, the compiler will produce an error message because the
copy-constructor is private. It can no longer create a default
copy-constructor because you’ve explicitly stated that you’re taking
over that job.
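The exception mentioned above, where a member function still needs to pass by value, works because access to the private copy-constructor is checked from inside the class. The following minimal sketch assumes a hypothetical Ticket class whose private copy-constructor is defined so that its own member functions can use it; code outside the class still cannot copy a Ticket.

```cpp
// Hypothetical class: copying is private but defined, so members may still copy.
class Ticket {
  int number;
  Ticket(const Ticket& other) : number(other.number) {}  // private copy-constructor
  static int numberOf(Ticket t) { return t.number; }      // takes a Ticket by value
public:
  explicit Ticket(int n) : number(n) {}
  // A member function may pass *this by value because it has access
  // to the private copy-constructor.
  int checkedNumber() const { return numberOf(*this); }
};

int ticketDemo() {
  Ticket t(42);
  //! Ticket t2 = t;  // Error outside the class: copy-constructor is private
  return t.checkedNumber();
}
```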
Here’s an example:
//: C11:NoCopyConstruction.cpp
// Preventing copy-construction
class NoCC {
  int i;
  NoCC(const NoCC&); // No definition
public:
  NoCC(int ii = 0) : i(ii) {}
};
void f(NoCC);
int main() {
  NoCC n;
//! f(n); // Error: copy-constructor called
//! NoCC n2 = n; // Error: c-c called
//! NoCC n3(n); // Error: c-c called
} ///:~
Notice the use of the more general form
NoCC(const NoCC&);
with const: a copy-constructor that takes a const reference can copy from const
objects as well as non-const ones.
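Why the const form is more general can be checked with C++11 type traits. In this minimal sketch, ConstCC and NonConstCC are hypothetical classes: a copy-constructor taking a non-const reference cannot copy from a const source, and because it still counts as a user-declared copy-constructor, the compiler does not add a const version for you.

```cpp
#include <type_traits>

class ConstCC {
public:
  ConstCC() {}
  ConstCC(const ConstCC&) {}  // accepts const and non-const sources
};

class NonConstCC {
public:
  NonConstCC() {}
  NonConstCC(NonConstCC&) {}  // accepts only modifiable lvalues
};

// Copying from a const object requires the const form.
static_assert(std::is_constructible<ConstCC, const ConstCC&>::value, "");
static_assert(!std::is_constructible<NonConstCC, const NonConstCC&>::value, "");
// Both forms can copy from a modifiable lvalue.
static_assert(std::is_constructible<NonConstCC, NonConstCC&>::value, "");
```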
Reference syntax is nicer to use than
pointer syntax, yet it clouds the meaning for the reader. For example, in the
iostreams library one overloaded version of the
get( ) function
takes a char& as an argument, and the whole point of the function is
to modify its argument by inserting the result of the get( ).
However, when you read code using this function it’s not immediately
obvious to you that the outside object is being modified:
char c; cin.get(c);
Instead, the function call looks like a
pass-by-value, which suggests the outside object is not
modified.
Because of this, it’s probably
safer from a code maintenance standpoint to use pointers when you’re
passing the address of an argument to modify. If you always pass
addresses as const references
except
when you intend to modify the outside object via the address, where you pass by
non-const pointer, then your code is far easier for the reader to
follow.
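The convention of const references for inputs and non-const pointers for outputs makes modification visible at the call site. This is a minimal sketch; countVowels and vowelsIn are hypothetical helpers written for illustration (C++11 range-for), not part of any library.

```cpp
#include <string>

// Hypothetical helper: read-only input by const reference,
// output by non-const pointer.
void countVowels(const std::string& text, int* result) {
  int n = 0;
  for (char ch : text) {
    switch (ch) {
      case 'a': case 'e': case 'i': case 'o': case 'u':
        ++n;
        break;
      default:
        break;
    }
  }
  *result = n;
}

int vowelsIn(const std::string& word) {
  int n = 0;
  countVowels(word, &n);  // the & tells the reader that n will be modified
  return n;
}
```

Compare the call countVowels(word, &n) with cin.get(c): only the former shows, without looking up the declaration, that the second argument is an output.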
A pointer is a variable that holds the
address of some location. You can change what a pointer selects at runtime, and
the destination of the pointer can be either data or a function. The C++
pointer-to-member follows this same concept, except that what it selects
is a location inside a class. The dilemma here is that a pointer needs an
address, but there is no “address” inside a class; selecting a
member of a class means offsetting into that class. You can’t produce an
actual address until you combine that offset with the starting address of a
particular object. The syntax of pointers to members requires that you select an
object at the same time you’re dereferencing the pointer to
member.
To understand this syntax, consider a
simple structure, with a pointer sp and an object so for this
structure. You can select members with the syntax shown:
//: C11:SimpleStructure.cpp
struct Simple { int a; };
int main() {
  Simple so, *sp = &so;
  sp->a;
  so.a;
} ///:~
Now suppose you have an ordinary pointer
to an integer, ip. To access what ip is pointing to, you
dereference the pointer with a ‘*’:
*ip = 4;
Finally, consider what happens if you
have a pointer that points to something inside a class object, when what it
actually represents is an offset into the object. To access what it’s
pointing at, you must dereference it with *. But it’s an offset
into an object, so you must also refer to that particular object. Thus, the
* is combined with the object dereference. So the new syntax
becomes ->* for a pointer to an object,
and .* for the object or a reference, like
this:
objectPointer->*pointerToMember = 47;
object.*pointerToMember = 47;
Now, what is the syntax for defining
pointerToMember? Like any pointer, you have to say what type it’s
pointing at, and you use a * in the definition. The only difference is
that you must say what class of objects this pointer-to-member is used with. Of
course, this is accomplished with the name of the class and the scope resolution
operator. Thus,
int ObjectClass::*pointerToMember;
defines a pointer-to-member variable
called pointerToMember that points to any int inside
ObjectClass. You can also initialize the pointer-to-member when you
define it (or at any other time):
int ObjectClass::*pointerToMember = &ObjectClass::a;
There is actually no
“address” of ObjectClass::a because you’re just
referring to the class and not an object of that class. Thus,
&ObjectClass::a can be used only as pointer-to-member
syntax.
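Because &ObjectClass::a is an offset rather than an address, the same pointer-to-member selects a different int depending on which object it is bound to. This minimal sketch defines a hypothetical ObjectClass with the member a used in the text, plus an extra member b for illustration.

```cpp
struct ObjectClass {  // hypothetical class matching the name used above
  int a;
  int b;
};

// One pointer-to-member, bound to two different objects.
int sumThroughMemberPointer() {
  int ObjectClass::*pointerToMember = &ObjectClass::a;  // an offset, not an address
  ObjectClass first = {1, 2};
  ObjectClass second = {10, 20};
  ObjectClass* sp = &second;
  // .* binds to an object, ->* to a pointer to an object.
  return first.*pointerToMember + sp->*pointerToMember;  // 1 + 10
}
```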
Here’s an example that shows how to
create and use pointers to data members:
//: C11:PointerToMemberData.cpp
#include <iostream>
using namespace std;
class Data {
public:
  int a, b, c;
  void print() const {
    cout << "a = " << a << ", b = " << b
         << ", c = " << c << endl;
  }
};
int main() {
  Data d, *dp = &d;
  int Data::*pmInt = &Data::a;
  dp->*pmInt = 47;
  pmInt = &Data::b;
  d.*pmInt = 48;
  pmInt = &Data::c;
  dp->*pmInt = 49;
  dp->print();
} ///:~
Obviously, these are too awkward to use
anywhere except for special cases (which is exactly what they were intended
for).
Also, pointers to members are quite
limited: they can be assigned only to a specific location inside a class. You
could not, for example, increment them or order them with < and > as you can
with ordinary pointers; the only comparisons allowed are == and !=.
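The comparisons that do work on pointers to data members are equality tests, including a test against a null pointer-to-member. This minimal sketch assumes C++11 for nullptr and uses a hypothetical Data struct with the same members as the class above; the commented-out lines show operations the compiler rejects.

```cpp
struct Data {  // same data members as the Data class above, minus print()
  int a, b, c;
};

// Pointers to members support == and !=, but not <, >, or arithmetic.
bool selectsB(int Data::*pm) {
  return pm == &Data::b;
}

bool isNull(int Data::*pm) {
  return pm == nullptr;
}

//! int Data::*next = &Data::a + 1;    // Error: no arithmetic on pointers to members
//! bool before = &Data::a < &Data::b; // Error: no ordering comparisons
```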
A similar exercise produces the
pointer-to-member syntax for member functions. A pointer to a function
(introduced at the end of Chapter 3) is defined like this:
int (*fp)(float);
The parentheses around (*fp) are
necessary to force the compiler to evaluate the definition properly. Without
them this would appear to be a function that returns an int*.
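The effect of the parentheses can be confirmed by asking the compiler for the declared types. In this minimal sketch, halve, fp, and notAPointer are hypothetical names, and decltype and std::is_same assume C++11. notAPointer is only declared, never called, so it needs no definition.

```cpp
#include <type_traits>

int halve(float x) { return static_cast<int>(x / 2); }

int (*fp)(float) = halve;  // pointer to a function taking float, returning int
int* notAPointer(float);   // without the parentheses: a function returning int*

static_assert(std::is_same<decltype(fp), int (*)(float)>::value, "");
static_assert(std::is_same<decltype(notAPointer), int*(float)>::value, "");

int callThroughPointer() {
  return fp(9.0f);  // same as (*fp)(9.0f); 4.5 truncates to 4
}
```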
Parentheses also play an important role
when defining and using pointers to member functions. If you have a function
inside a class, you define a pointer to that member function by inserting the
class name and scope resolution operator into an ordinary function pointer
definition:
//: C11:PmemFunDefinition.cpp
class Simple2 {
public:
  int f(float) const { return 1; }
};
int (Simple2::*fp)(float) const;
int (Simple2::*fp2)(float) const = &Simple2::f;
int main() {
  fp = &Simple2::f;
} ///:~
In the definition for fp2 you can
see that a pointer to member function can also be initialized when it is
created, or at any other time. Unlike non-member functions, the & is
not optional when taking the address of a member function. However, you
can give the function identifier without an argument list, because overload
resolution can be determined by the type of the pointer to member.
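When a member function is overloaded, the declared type of the pointer-to-member is what chooses the overload, so no argument list is needed. This minimal sketch uses a hypothetical Simple3 class with two overloads of f; the commented-out line shows that omitting the & is an error for member functions.

```cpp
class Simple3 {  // hypothetical class with an overloaded member function
public:
  int f(int) const { return 1; }
  int f(double) const { return 2; }
};

// The type of each pointer picks the matching overload.
int (Simple3::*pickInt)(int) const = &Simple3::f;
int (Simple3::*pickDouble)(double) const = &Simple3::f;
//! int (Simple3::*bad)(int) const = Simple3::f;  // Error: & is required

int whichOverloads() {
  Simple3 s;
  return (s.*pickInt)(0) * 10 + (s.*pickDouble)(0.0);  // 1 * 10 + 2
}
```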
The value of a pointer is that you can
change what it points to at runtime, which provides an important flexibility in
your programming because through a pointer you can select or change
behavior at runtime. A pointer-to-member is no different; it allows you
to choose a member at runtime. Typically, your classes will only have member
functions publicly visible (data members are usually considered part of the
underlying implementation), so the following example selects member functions at
runtime.
//: C11:PointerToMemberFunction.cpp
#include <iostream>
using namespace std;
class Widget {
public:
  void f(int) const { cout << "Widget::f()\n"; }
  void g(int) const { cout << "Widget::g()\n"; }
  void h(int) const { cout << "Widget::h()\n"; }
  void i(int) const { cout << "Widget::i()\n"; }
};
int main() {
  Widget w;
  Widget* wp = &w;
  void (Widget::*pmem)(int) const = &Widget::h;
  (w.*pmem)(1);
  (wp->*pmem)(2);
} ///:~
Of course, it isn’t particularly
reasonable to expect the casual user to create such complicated expressions. If
the user must directly manipulate a pointer-to-member, then a typedef is
in order. To really clean things up, you can use the pointer-to-member as part
of the internal implementation mechanism. Here’s the preceding example
using a pointer-to-member inside the class. All the user needs to do is
pass a number in to select a
function.[48]
//: C11:PointerToMemberFunction2.cpp
#include <iostream>
using namespace std;
class Widget {
  void f(int) const { cout << "Widget::f()\n"; }
  void g(int) const { cout << "Widget::g()\n"; }
  void h(int) const { cout << "Widget::h()\n"; }
  void i(int) const { cout << "Widget::i()\n"; }
  enum { cnt = 4 };
  void (Widget::*fptr[cnt])(int) const;
public:
  Widget() {
    fptr[0] = &Widget::f; // Full spec required
    fptr[1] = &Widget::g;
    fptr[2] = &Widget::h;
    fptr[3] = &Widget::i;
  }
  void select(int i, int j) {
    if(i < 0 || i >= cnt) return;
    (this->*fptr[i])(j);
  }
  int count() { return cnt; }
};
int main() {
  Widget w;
  for(int i = 0; i < w.count(); i++)
    w.select(i, 47);
} ///:~
In the class interface and in
main( ), you can see that the entire implementation, including the
functions, has been hidden away. The code must even ask for the
count( ) of functions. This way, the class implementer can change
the quantity of functions in the underlying implementation without affecting the
code where the class is used.
The initialization of the
pointers-to-members in the constructor may seem overspecified. Shouldn’t
you be able to say
fptr[1] = &g;
because the name g occurs in the
member function, which is automatically in the scope of the class? The problem
is this doesn’t conform to the pointer-to-member syntax, which is required
so everyone, especially the compiler, can figure out what’s going on.
Similarly, when the pointer-to-member is dereferenced, it seems
like
(this->*fptr[i])(j);
is also overspecified; the explicit this-> looks
redundant. Again, the syntax requires that a pointer-to-member always be bound
to an object when it is
dereferenced.
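For code that does manipulate pointers-to-members directly, as in PointerToMemberFunction.cpp, the typedef suggested earlier keeps declarations readable. This minimal sketch uses a hypothetical Widget variant whose members return strings so the selection can be checked; Action, run, and runBoth are illustrative names. Note that run still has to bind the pointer with this->* even inside the class.

```cpp
#include <string>

class Widget {
public:
  std::string f(int) const { return "f"; }
  std::string g(int) const { return "g"; }
  // A typedef names the pointer-to-member-function type once.
  typedef std::string (Widget::*Action)(int) const;
  // Even inside the class, the pointer must be bound to an object.
  std::string run(Action act, int arg) const { return (this->*act)(arg); }
};

std::string runBoth() {
  Widget w;
  Widget::Action act = &Widget::f;
  std::string out = w.run(act, 0);
  act = &Widget::g;    // change the selected member at runtime
  out += (w.*act)(0);  // the caller can also bind it directly with .*
  return out;          // "fg"
}
```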
Pointers in C++ are almost identical to
pointers in C, which is good. Otherwise, a lot of C code wouldn’t compile
properly under C++. The only compile-time errors you will produce occur with
dangerous assignments. If these are in fact what are intended, the compile-time
errors can be removed with a simple (and explicit!) cast.
C++ also adds the reference from
Algol and Pascal, which is like a constant pointer that is automatically
dereferenced by the compiler. A reference holds an address, but you treat it
like an object. References are essential for clean syntax with operator
overloading (the subject of the next chapter), but they also add syntactic
convenience for passing and returning objects for ordinary
functions.
The copy-constructor takes a reference to
an existing object of the same type as its argument, and it is used to create a
new object from an existing one. The compiler automatically calls the
copy-constructor when you pass or return an object by value. Although the
compiler will automatically create a copy-constructor for you, if you think one
will be needed for your class, you should always define it yourself to ensure
that the proper behavior occurs. If you don’t want the object passed or
returned by value, you should create a private
copy-constructor.
Pointers-to-members have the same
functionality as ordinary pointers: You can choose a particular region of
storage (data or function) at runtime. Pointers-to-members just happen to work
with class members instead of with global data or functions. You get the
programming flexibility that allows you to change behavior at
runtime.
Solutions to selected exercises
can be found in the electronic document The Thinking in C++ Annotated
Solution Guide, available for a small fee from www.BruceEckel.com.
[48] Thanks to Owen Mortensen for this example.