C++ Advanced Tutorial - Lesson 1
Related Blog Items
- C++ Tutorial Part 2 - Advanced
- C++ Basics - Tutorial
- C++ Advanced Tutorial - Lesson 11
- C++ Advanced Tutorial - Lesson 8
- C++ Advanced Tutorial - Lesson 3
Please refer to C++ Basics before reading this tutorial.
1. VPTR AND VTABLE
1.1 A Class and its Objects in the Memory
1.2 Linking of a Method Call
1.3 Memory Footprint of a Non-polymorphic Object
1.4 Non-virtual Function Call
1.5 Virtual Function Call Using vPtr and vtable
1.6 A Language Needs Compiler and Type Information to Utilize vtable
1.7 Polymorphic Base Classes are placed at the front
1.8 Inheritance of Base-class vPtrs
1.9 De-referencing the vPtr
1. VPTR AND VTABLE
1.1 A Class and its Objects in the Memory
In a compiled C++ program, the code of a class’s methods exists only once: it is part of the executable image and is loaded into memory with it, no matter how many objects of the class are created. Each method has its own memory address.
Every time an object of the class is created, a copy of the class’s data members is created and put in a block of memory. The data members are placed in a fixed order, one after another. The address of an object is the address of the first byte of its memory block.
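As a small illustration of the point above, the following sketch checks that an object stores only its data members and that its address is the address of its first byte. The members `age` and `height` and the helper functions are hypothetical names chosen for this example; they are not part of the tutorial's classes.

```cpp
#include <cstddef>

// Hypothetical class: two data members and one non-virtual method.
struct Woman {
    int age;
    int height;
    // The code of this method exists once in the program, not once per object.
    int AgePlusHeight() const { return age + height; }
};

// Size of one object: only the data members are stored in it.
std::size_t ObjectSize() { return sizeof(Woman); }

// True if the object's address equals the address of its first data member.
bool StartsAtFirstMember() {
    Woman w{30, 170};
    return static_cast<const void *>(&w) == static_cast<const void *>(&w.age);
}

// True if two objects live in two different memory blocks.
bool DistinctBlocks() {
    Woman a{30, 170};
    Woman b{25, 165};
    return &a != &b;
}
```

On common compilers `ObjectSize()` returns `2 * sizeof(int)`: adding more non-virtual methods to `Woman` would not change it.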
1.2 Linking of a Method Call
When the compiler sees a method call such as
    Woman * pw = new Woman;
    pw->MakeUp();
it needs to link the object’s individual copy of the data to the unique implementation of the method. If the method is non-virtual, this is done at compile time; if the method is virtual, it is done at run time.
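The difference between the two kinds of linking can be seen in a short sketch. The base class `Person` and the methods `Name` and `Describe` are assumptions introduced only for this example.

```cpp
#include <string>

struct Person {
    std::string Name() const { return "Person"; }              // non-virtual
    virtual std::string Describe() const { return "Person"; }  // virtual
    virtual ~Person() {}
};

struct Woman : Person {
    std::string Name() const { return "Woman"; }               // hides Person::Name
    std::string Describe() const override { return "Woman"; }  // overrides
};

// Linked at compile time: the static type (Person) decides which Name runs.
std::string CallNonVirtual(const Person &p) { return p.Name(); }

// Linked at run time: the real type of the object decides which Describe runs.
std::string CallVirtual(const Person &p) { return p.Describe(); }
```

Passing a `Woman` object to both functions gives "Person" from `CallNonVirtual` and "Woman" from `CallVirtual`.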
1.3 Memory Footprint of a Non-polymorphic Object
Suppose class Derived inherits from Base1, Base2, Base3, and none of them has virtual functions:
    #include <cstdio>

    typedef unsigned char BYTE;  // BYTE normally comes from <windows.h>

    class Base1 {
    public:
        void Hi1() { printf("Hi from Base1!\n"); }
        BYTE a1[100];
    };

    class Base2 {
    public:
        void Hi2() { printf("Hi from Base2!\n"); }
        BYTE a2[100];
    };

    class Base3 {
    public:
        void Hi3() { printf("Hi from Base3!\n"); }
        BYTE a3[100];
    };

    class Derived : public Base1, public Base2, public Base3 {
    public:
        void Hi() { printf("Hi from Derived!\n"); }
        BYTE a[100];
    };

    int main()
    {
        Derived * pDerived = new Derived;

        Base1 * pBase1 = (Base1 *)pDerived;  // points to offset 0
        pBase1->Hi1();

        Base2 * pBase2 = (Base2 *)pDerived;  // points to offset 100
        pBase2->Hi2();

        Base3 * pBase3 = (Base3 *)pDerived;  // points to offset 200
        pBase3->Hi3();

        delete pDerived;
        return 0;
    }
The memory footprint of an object of class Derived and the addresses held by the pointers will look like this:
pDerived
pBase1     pBase2     pBase3
|          |          |
+----------+----------+----------+----------+
|    a1    |    a2    |    a3    |    a     |
+----------+----------+----------+----------+
0         100        200        300        400
Fig 1. Memory footprint of a non-polymorphic type
From this footprint you can see some important facts:
1) An object of a non-polymorphic class doesn’t need to carry any information about the addresses of its methods, because the linking of non-virtual functions has already been done by the compiler at compile time. So the object contains purely data members. Pointers such as pBase1, pBase2 and pBase3 are needed only to locate the data members, not the methods.
2) The memory of the base classes is placed in front of the derived class’s own members, which conforms to the sequence of object construction.
3) The address of the FIRST base class is the same as the address of the derived object.
4) When a pointer to the derived class is cast to a base class, the result points to that base class’s part of the memory.
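Facts 3) and 4) can be checked by measuring how far each base-class pointer is from the start of the object. This is only a sketch: the helper `BaseOffset` is a hypothetical name, and the exact offsets depend on the compiler, although mainstream compilers lay out these classes as in Fig 1.

```cpp
#include <cstddef>

struct Base1 { unsigned char a1[100]; };
struct Base2 { unsigned char a2[100]; };
struct Base3 { unsigned char a3[100]; };
struct Derived : Base1, Base2, Base3 { unsigned char a[100]; };

// Byte distance from the start of the Derived object to its BaseT part.
template <typename BaseT>
std::ptrdiff_t BaseOffset(Derived &d) {
    BaseT *pBase = &d;  // the derived-to-base conversion adjusts the address
    return reinterpret_cast<char *>(pBase) - reinterpret_cast<char *>(&d);
}

// Casting back from a base part restores the original object address.
bool RoundTrip(Derived &d) {
    Base2 *pBase2 = &d;
    return static_cast<Derived *>(pBase2) == &d;
}
```

With the layout of Fig 1, `BaseOffset<Base1>`, `BaseOffset<Base2>` and `BaseOffset<Base3>` would give 0, 100 and 200.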
1.4 Non-virtual Function Call
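When the compiler sees a call to a non-virtual method, it links the call directly to the address of that method at compile time. It can do so because a non-virtual method cannot be overridden: only the class which declares the method can implement it, so the declared type of the pointer fully determines which function runs.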
1.5 Virtual Function Call Using vPtr and vtable
When the compiler sees a call to a virtual method:
    void Go(Vehicle * pv)
    {
        pv->StartEngine();
        pv->Accelerate();
    }
it cannot know the addresses of the virtual methods, because when Go is called at run time, the parameter Vehicle * pv may be passed a pointer to an object of any derived class, such as FamilyCar, 4WD, PickUpTruck or Van, each with its own implementations of StartEngine and Accelerate at different memory locations.
Therefore, there must be a mechanism for a program to figure out the locations of the virtual methods at run time.
Now consider the following virtual function call:
pBase2->Hi2();
A late-binding process involves the following activities:
1) The compiler adds a hidden vPtr member to the class, and generates one unique vtable for the class.
At compile time, when the compiler sees the definition of a class with virtual methods, it builds a virtual table (vtable) for the class, which is an array of function pointers to the implementations of all the virtual methods, and adds a hidden data member, the vPtr, to the class definition as the FIRST data member.
Now suppose the methods of the classes in Fig. 1 (Hi, Hi1, Hi2, Hi3) are all virtual functions. Assuming 4-byte pointers, the memory footprint of an object of class Derived becomes:
pDerived
pBase1              pBase2               pBase3
|                   |                    |
+------+------------+-------+------------+-------+------------+------------+
| vptr |     a1     | vptr2 |     a2     | vptr3 |     a3     |     a      |
+------+------------+-------+------------+-------+------------+------------+
0      4           104     108          208     212          312          412
Fig. 2. Memory footprint of a polymorphic type object
As you can see in the memory footprint, if you use a Base2 pointer to hold the address of a Derived object, for example, this pointer will point to memory offset 104, as pBase2 does.
Note that each Derived object has its own memory footprint, with the same structure but at a different memory location. However, the corresponding vPtrs of all the objects point to the same vtables, and therefore to the same method implementations; in other words, the vPtr2 members of two instances contain the same address.
The derived class and the first base class share the same vPtr, which points to their merged vtable (see the section “Inheritance of Base-class vPtrs” below for details). The rest of the base classes have their own vPtrs.
Note that no matter how complicated the inheritance hierarchy is, a function pointer in the vtable always points to the most-derived (lowest) implementation of the virtual function in the inheritance hierarchy.
2) The compiler generates code to do dynamic binding using the vtable.
At compile time, when the compiler sees a call to a virtual method through a pointer (pBase2->Hi2()), it knows that the address of the function will only be known at run time, so it does not try to find the implementation of the function. Instead, it knows that the pointer (pBase2) will be pointing to a vPtr at run time. So it generates code that goes through the vPtr to find the vtable (whose layout is already known from the type of the pointer), goes to a certain entry of that vtable, fetches that function pointer, and makes the call.
3) At run time, when an object of this class is created, its vPtr member is assigned the address of the class’s vtable.
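The three steps above can be imitated by hand with plain structs and function pointers. This is only a model of the mechanism, not what any particular compiler emits; the names `VehicleVTable`, `MakeCar`, `MakeVan` and the returned strings are hypothetical.

```cpp
#include <string>

struct Vehicle;  // forward declaration for the function-pointer types

// Hand-written "vtable": one function pointer per virtual method.
struct VehicleVTable {
    std::string (*StartEngine)(const Vehicle *self);
    std::string (*Accelerate)(const Vehicle *self);
};

// The hidden vPtr is modeled as an explicit first data member.
struct Vehicle {
    const VehicleVTable *vptr;
};

std::string CarStart(const Vehicle *) { return "car engine on"; }
std::string CarAccelerate(const Vehicle *) { return "car speeds up"; }
std::string VanStart(const Vehicle *) { return "van engine on"; }
std::string VanAccelerate(const Vehicle *) { return "van speeds up"; }

// Step 1: one table per "class", shared by all objects of that class.
const VehicleVTable kCarVTable = {CarStart, CarAccelerate};
const VehicleVTable kVanVTable = {VanStart, VanAccelerate};

// Step 3: "construction" stores the address of the class's table in the object.
Vehicle MakeCar() { return Vehicle{&kCarVTable}; }
Vehicle MakeVan() { return Vehicle{&kVanVTable}; }

// Step 2: the kind of code generated for pv->StartEngine(); pv->Accelerate();
// follow the vPtr, pick a fixed entry of the table, call through it.
std::string Go(const Vehicle *pv) {
    return pv->vptr->StartEngine(pv) + ", " + pv->vptr->Accelerate(pv);
}
```

`Go` never names a concrete vehicle type, yet calling it with the address of a `MakeCar()` result and of a `MakeVan()` result runs different functions, because each object carries the address of its own table.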
1.6 A Language Needs Compiler and Type Information to Utilize vtable
To link the address of a function’s implementation directly at compile time (for a non-virtual function), or to generate code that finds it at run time (for a virtual one), the compiler needs the class definition of the pointer’s type, and that type definition is all it looks at. Because of this, even if pBase2 is actually pointing to an object of class Derived, it cannot be used to call class Derived’s method Hi, because the code generated by the compiler only knows the vtable of class Base2. For the same reason, the following function call does not compile:
void * pvoid = new Derived;
pvoid->Hi();
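A sketch of how the type of the pointer limits what can be called, and how a cast restores the missing type information. The classes are cut down to a single base for brevity, and the function names `ViaBase`, `ViaCast` and `ViaVoid` are hypothetical.

```cpp
#include <string>

struct Base2 {
    virtual std::string Hi2() const { return "Hi from Base2!"; }
    virtual ~Base2() {}
};

struct Derived : Base2 {
    std::string Hi2() const override { return "Hi2 from Derived!"; }
    virtual std::string Hi() const { return "Hi from Derived!"; }
};

// Through a Base2 pointer only Base2's members are visible.
std::string ViaBase(const Base2 *pBase2) {
    // pBase2->Hi();  // would not compile: Hi is not a member of Base2
    return pBase2->Hi2();
}

// dynamic_cast checks the real type at run time before exposing Derived's members.
std::string ViaCast(const Base2 *pBase2) {
    const Derived *pDerived = dynamic_cast<const Derived *>(pBase2);
    return pDerived ? pDerived->Hi() : std::string("not a Derived");
}

// A void pointer carries no type at all, so the type must be supplied by a cast.
std::string ViaVoid(void *pvoid) {
    // pvoid->Hi();  // would not compile: void has no members
    return static_cast<Derived *>(pvoid)->Hi();  // valid only if pvoid came from a Derived *
}
```

The `static_cast` in `ViaVoid` is not checked, so it is the caller's responsibility that the pointer really came from a `Derived *`.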
Script languages such as VBScript, JavaScript and early versions of VB are, as traditionally implemented, interpreted: they have no compilation step, and they interpret and run the code line by line at run time. Therefore, they cannot generate code ahead of time to access the vtable, so they cannot use the vtable and cannot directly use this kind of polymorphism.
1.7 Polymorphic Base Classes are placed at the front
Now suppose only Base2 and Base3 have virtual methods while Base1 doesn’t. The memory footprint of Derived will become:
pDerived
pBase2              pBase3              pBase1
|                   |                   |
+------+------------+------+------------+------------+------------+
| vptr |     a2     | vptr |     a3     |     a1     |     a      |
+------+------------+------+------------+------------+------------+
0      4           104    108          208          308          408
Fig. 3. Memory footprint of a polymorphically mixed type
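You can see that the memory blocks of the polymorphic base classes (Base2 and Base3) are moved to the front, while that of the non-polymorphic base class (Base1) is moved to the back. The exact order of the remaining parts is decided by the compiler and may differ from one compiler to another.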
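The reordering can be observed by measuring base-class offsets again. This sketch assumes a mainstream compiler and only checks the part that such compilers agree on: the first polymorphic base ends up at offset 0 and the non-polymorphic Base1 does not. The helper `BaseOffset` is a hypothetical name, and virtual destructors stand in for the virtual methods.

```cpp
#include <cstddef>

struct Base1 { unsigned char a1[100]; };                      // no virtual functions
struct Base2 { virtual ~Base2() {} unsigned char a2[100]; };  // polymorphic
struct Base3 { virtual ~Base3() {} unsigned char a3[100]; };  // polymorphic
struct Derived : Base1, Base2, Base3 { unsigned char a[100]; };

// Byte distance from the start of the Derived object to its BaseT part.
template <typename BaseT>
std::ptrdiff_t BaseOffset(Derived &d) {
    BaseT *pBase = &d;
    return reinterpret_cast<char *>(pBase) - reinterpret_cast<char *>(&d);
}
```

Although Base1 is declared first, `BaseOffset<Base2>` is expected to be 0 and `BaseOffset<Base1>` to be greater than 0. Where exactly Base1 and Base3 land relative to each other is compiler-specific and may not match Fig. 3 on every platform.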
1.8 Inheritance of Base-class vPtrs
From Fig. 2 you can see that the vPtr members of the second and third base classes are inherited, but the vPtr of the first base class is not: the derived class shares it. Suppose class D inherits only from C, C only from B, and B only from A. When the compiler constructs the vtable for D, it directly merges the vtables of A, B and C into D’s, so that class D has only one vPtr pointing to one vtable. The vtable contains function pointers to the lowest implementations of all virtual functions across the inheritance hierarchy, no matter where they are defined.
However, with multiple inheritance, a class may indirectly inherit from many classes. If the vtables of all the base classes were merged into one, the vtable could become very big. To avoid this, instead of discarding the vPtrs and vtables of all base classes and merging all vtables into one, the compiler does so only for the FIRST base classes, and retains the vPtrs and vtables of all the subsequent base classes and their base classes.
In other words, in an object’s memory footprint, you can find the vPtrs of all its base classes throughout the hierarchy, except for the “first-borns”, whose vPtr is shared with the class that derives from them.
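One way to see how many vPtrs an object carries is to give the classes no data members at all, so that the object size is made up of vPtrs only. This is a sketch under the assumption of a mainstream compiler without virtual inheritance; the class names A to D, X to Z and M, and the two helper functions, are hypothetical.

```cpp
#include <cstddef>

// Single-inheritance chain: D -> C -> B -> A. Every class is a "first-born".
struct A { virtual void FA() {} virtual ~A() {} };
struct B : A { virtual void FB() {} };
struct C : B { virtual void FC() {} };
struct D : C { virtual void FD() {} };

// Multiple inheritance from three independent polymorphic bases.
struct X { virtual void FX() {} virtual ~X() {} };
struct Y { virtual void FY() {} virtual ~Y() {} };
struct Z { virtual void FZ() {} virtual ~Z() {} };
struct M : X, Y, Z { virtual void FM() {} };

// With no data members, the object size is just the space taken by the vPtrs.
std::size_t VptrCountChain() { return sizeof(D) / sizeof(void *); }
std::size_t VptrCountMulti() { return sizeof(M) / sizeof(void *); }
```

On such compilers the chain needs one vPtr however deep it is, while M needs three: one shared with its first base X, plus one each for Y and Z.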
In Fig. 2, virtual function Hi3 is defined in class Base3. When it is called:
pDerived->Hi3();
even if it is implemented in class Derived, the program will still go to the vPtr of Base3 (offset 208 in the footprint), then to Base3’s vtable, then to Hi3’s function pointer, which points back to the implementation in class Derived. We can demonstrate this by setting Base3’s vPtr at offset 208 to 0. The address of Base3’s vtable is then lost, and the next call crashes the program. (Overwriting a vPtr is undefined behavior, so this is a debugging experiment only, and it assumes the 32-bit layout of Fig. 2.)
    pDerived->Hi3();  // virtual function Hi3 is defined in Base3. Works fine here
    ::memset((char *)pDerived + 208, 0, 4);  // wipe Base3's vPtr at offset 208
    pDerived->Hi3();  // Error happens!
In a typical case, the derived class inherits from a group of base classes, which can be interfaces (classes with only public pure virtual functions) or classes with some implementations. The derived class does not define any new virtual functions; it only implements the virtual functions declared in the base classes, or simply groups the services of the base classes together. In such a case, the vtable of the derived class has the same layout as the vtable of the first base class. Therefore, the first vPtr in the derived-class object points to a vtable laid out like the first base class’s, the second vPtr to one laid out like the second base class’s, and so on. Everything is clean and neat.
1.9 De-referencing the vPtr
Comparing Fig. 2 with Fig. 1: because the binding of virtual methods is done only at run time, the object contains not only data members but also vPtrs that lead to the addresses of all the virtual functions. When you have a pointer, say pBase2, you know that the first four bytes at the given address are the vPtr of Base2 (assuming 4-byte pointers, as in Fig. 2), followed by the data members of Base2.
    Base2 * pBase2 = new Derived;
    pBase2->Hi2();            // OK!
    cout << pBase2->a2[3];    // OK!
    pBase2->Hi3();            // Compile error: 'Hi3' : is not a member of 'Base2'
Because of this structure, if you only want to access virtual functions defined in Base2 and its own part of data members, not the virtual functions and data members of other classes, what you really need is the address of Base2’s vPtr.
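A short sketch of what "the address of Base2’s vPtr" means in practice: a Base2 pointer to a Derived object holds the address of the Base2 part, not of the whole object, and the start of the whole object can be recovered with `dynamic_cast<void *>`. The classes are reduced to two bases here, virtual destructors stand in for the virtual methods, and the function names are hypothetical.

```cpp
struct Base1 { virtual ~Base1() {} unsigned char a1[100]; };
struct Base2 { virtual ~Base2() {} unsigned char a2[100]; };
struct Derived : Base1, Base2 { unsigned char a[100]; };

// True if the Base2 pointer holds a different address than the whole object.
bool Base2PartIsShifted(Derived &d) {
    Base2 *pBase2 = &d;
    return static_cast<void *>(pBase2) != static_cast<void *>(&d);
}

// dynamic_cast<void *> goes from any polymorphic base part back to the
// first byte of the complete object.
bool CanRecoverObjectStart(Derived &d) {
    Base2 *pBase2 = &d;
    return dynamic_cast<void *>(pBase2) == static_cast<void *>(&d);
}
```

Both functions are expected to return true on mainstream compilers; the second result is guaranteed by the language, while the first depends on Base1 being placed in front of Base2.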