A very subtle bug in class Tab

Over the last two days I’ve been trying to reproduce and fix a very mysterious crash in 3dsmax. The reports from the field reported a crash with the following partial stack frame:

	msvcr90.dll!free(0x20736563)  Line 110	C
 	geom.dll!BitArray::SetSize(0, 0)  Line 179	C++

The problem is it was crashing while attempting to free a invalid pointer. The problem was completely mysterious at first, for when you look at the code, SetSize doesn’t call free directly. Also the memory getting free’d was completely private, with no outside access.

To tackle the problem, I wrote some unit tests, hoping I could learn more about how class BitArray works, and therefore hoping to learn how to reproduce the problem. So after two days, a lot of unit tests, and looking at a lot of stack frames I came upon the answer.

The code listing below shows a unit test that demonstrates the crash.

void BitArrayTestSuite::TestCrash()
{
    Tab<BitArray> ar;
    int size = 10;
    // Calling SetCount here only allocates memory using malloc. It does NOT
    // construct instances of the item in the array.
    ar.SetCount(size);
    for (int i = 0; i < size; i++)
    {
        // The bit array is completely invalid here.
        // 1. It has not been constructed.
        // 2. It is full of garbage memory.
        // Use at your own risk.
        BitArray b = ar[i];
        // Take a gamble here an hope it doesn't crash.
        b.SetSize(0);
    }
}

Here a templated container class called Tab (For those who don’t know the 3dsmax SDK) calls SetCount to allocate memory for 10 items. It does this using only malloc. The Tab class does not call any constructors for BitArray. Therefore when the memory is accessed, it is interpreted as valid array of BitArray’s. All this without so much as a warning or anything to signal to the programmer that they have done something wrong. This is really dangerous: Lying to the compiler, surely brings consequences.

The final problem is exhibited as the SetSize method eventually deletes a pointer to its ‘bits’. But since the BitArray contains nothing but garbage, it was deleting a pointer to random memory.

This isn’t a problem with class BitArray. This class is actually quite robust, up to a point. The problem is the Tab class allows you to shoot your foot off so extremely easily!! Meaning by calling SetCount, you can lie to the compiler, fooling it into thinking you have legitimately constructed elements when you really have NOT.

This problem will occur not only with BitArray, but with anything else that Tab may or can hold.

The particular crash I was investigating: 1597131, mostly came from the following callstack:



msvcr90.dll!free(0x20736563) Line 110 C

geom.dll!BitArray::SetSize(0, 0) Line 179 C++

MNMath.dll!MNNormalFace::Init() Line 31 C++

> EPoly.dlo!CreateOnlyRestore::After() Line 1497 C++

EPoly.dlo!CreateOnlyRestore::Restore(1) Line 1549 C++

The place that contained the Tab abuse was inside of CreateOnlyRestore::After, where there was this code:

MNNormalSpec *pNorm = mpEditPoly->mm.GetSpecifiedNormals();
if (pNorm)
{
	mNewNormFaces.SetCount (nfnum);
	for (int i=0; iFace(i+ofnum);
	}
}

Notice how a Tab of MNNormalFace elements was magically ‘constructed’ using the SetCount method. Then the Init method was called on this phantom element, which eventually boiled down to freeing a dangling pointer.

Luckily for us, back in August, Bernard fixed a bug that went right through the code above. His fix actually completely removed the above abuse of Tab in

d:stageRenoir_MAX_R106_RL_Stagedevel3dswinsrcmaxsdksamplesmesheditablepolyrestore.cpp

and thus the principle place that leads to bucket 1597131.

However, there are six other stack frames that lead to bucket 1597131. Each one will have to be investigated. Meanwhile, there is a good lesson to be learned here:

The method Tab::SetCount does NOT construct your elements for you. Use Tab at your own risk.

Advertisements

2 thoughts on “A very subtle bug in class Tab

  1. Chris Johnson

    Ah… The HTML must have eaten up the template parameter…

    It should say:

    Tab [ BitArray ]  ar;

    (Just replace the square brackets with angle brackets…)

    I’ll see about reconstructing the code entry when I get home.

    THanks for spotting that.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s