Bad Casts and wrong assumptions about class hierarchies

Lately here at work, I have been looking into solving a certain crash in the product. This particular crash was occurring in a certain function that had many crashes over the years.

The situation involved code in a base class that that was doing work for a derived class. In fact the base class was calling methods on the derived class. To do that, it had to cast itself to a pointer of the derived class (using the ‘this’ pointer). But since there were was more derived classes than one, the cast was sometimes performing an invalid cast when called with an instance of the other derived class.

So, sometimes when fixing a mysterious crash, just look for a bad cast. 🙂

The code had a class hierarchy roughly like this:

Class Base
	Class A
	Class B

So here, class A inherits from class Base. Similarly, class B also inherits from Base.

In this example case, some code in class Base was always casting the ‘this’ pointer to A. However called with an instance of B, the assumption won’t hold, and a cast will simply corrupt the pointer. Furthermore, it did this with a C style cast, which offers no warning nor protection.

This code demonstrates the problem:

void B::foo()
{
	A* temp = (A*)this; // B is not A!
	// crash attempting to do something with temp
}

So if called like this,

B* b = new B();
b->foo();

we should expect a crash calling foo since class B is not class A.

There were a few things wrong with this.

  1. A base class should not implement work that rightly belongs to a derived class. For instance it would be bad if a shape class should be in charge of drawing a box and a sphere. To be polymorphic a shape class would offer a virtual or abstract draw method, and leave the implementation details to the derived box and sphere classes. Thus a derived class pointer could stand in for a base class pointer (demonstrating polymorphism) and with no switch statements just implement the needed functionality.
    So in my case here, the whole concept of polymorphism here seems to have been violated. If a derived class performs an operation, then the base class should not be calling it. In fact, the base class should always be completely ignorant of the implementation details of a derived class. In this case, the operations being performed didn’t involve any virtual methods…. Thus without a virtual method, polymorphism cannot be implemented. Hint: using C++ it is possible to replace switch statements and if statements with simple virtual method calls. Thus this code was pure C code masquerading as C++ code.
  2. An incomplete understanding of the class hierarchy that derives from this base class. While the assumptions may have once held, once other derived classes were added later on, the assumptions would fail. Thus casting everything to one particular derived class to perform operations was very fragile, and asking for trouble.
  3. Use of C style casts here is plain wrong in C++. A cast like this should fail, but in C, no cast ever fails. A C style casts may be fast, but it offers no protection. Thankfully the Microsoft Visual C compiler offers a measure of protection however for C++. By turning on Run Time Type Identification (RTTI) in C++ and using any of the standard casts (static_cast<>, dynamic_cast<>, reinterpret_cast<>) the operation will fail with invalid data and return NULL (or nullptr in VC 10.0) . Thus B::foo could have been written like this:
void B::foo()
{
	A* temp = dynamic_cast(this);
	if (temp)
	{
		// do something with temp
	}
}

Which in this case as shown, the code inside the if statement would never get executed.

Finally in fixing the bug, I did a search through the entire source file looking for these C style casts to the wrong data type. And for almost every function that used this bad cast, we had lots of crashes in that function from customers over the years.

Advertisements