The incredible expanding layer bug

So a few months ago at work (January), a bug came across my desk for a performance problem in 3dsmax. I have always heard of bugs where someone in an architect’s office would load a file, and it would take 45-50 minutes. Perhaps the file was an autocad file with hundreds or thousands of layers in it. I’ve even had a guy at an architect’s office tell me they had files that took an hour to load before…. Just incredible. Anyways, this bug was that it would take almost a half hour to create 1000 layers in max. The guy who logged the defect, even gave us a maxscript to help reproduce the problem:

 

(–resetmaxfile(#noprompt)clearlistener()layerCount=1000

for i = 1 to layerCount do

(

t = timestamp()

mLayer=LayerManager.newLayerFromName (“blarg_” + i as string)

t += timestamp()

format “%,%n” i t

)

)

I first gave the script a run through in order to reproduce the problem, and indeed I was able to see it took a long long time. I ran it the first time, and it seemed to take an hour. But I after all wanted better numbers than that. So I modified my script to give me the total time.

The final result was 35 minutes to complete everything. During the course of which, the 3dsmax maxscript listener simply stopped responding. Finally it finished and I dumped the printed results into an Excel spread sheet and plotted the results.

The following chart plots the time (in blue) that it takes to create 1000 layers. Each Nth layer from 1 to 1000 is on the X-axis on the bottom. The time (y-axis) is the vertical axis and is plotted in milliseconds.

 

layer_creation_time_chart

By the time the 1000th layer was created, it took nearly 5 seconds. *Ouch*. The blue graph is a class parabolic shape and is in fact some form of an N squared polynomial. This performance degradation is a classic non-linear form. Contrast that with the red line, the expected performance result. Anyways, finding the bug was the real problem at hand. Why was it so slow?

My experiments were of course ridiculously hard to test. After all, you make a change and wait 35 minutes to test it. Finally I stumbled upon a function call to update some UI. Okay, I commented it out, and ran it again. My results astounded me: 4 seconds! The code I removed was simply that, when-ever a layer was created, it would update that little layer dropdown list that is on the layer toolbar:

LayerToolbar

 

Remember that little guy? That UI tool that no one likes nor uses anymore? Well the problem was that little layer dropdown list would add the new layer to some data structure, and then resort all the layers. This was a classic n squared iteration over all the layers. The more layers, the more sorting you have to do. Obviously a performance nightmare.

Once I temporarily removed that UI date function call, the time per layer was so low, that it wouldn’t even register on that graph shown above. But after all, creating layers should update that UI dropdown list eventually right? So if we remove the function call, how will it get updated? To fix that, I simply put in place a call to suspend UI updates, and another to resume UI updates for that UI drop down list. So before creating the 1000 layers, I call that suspend function, and afterwards call the resume function. So that in the end, the Layer UI dropdown list gets updated only once.

My favorite blogger, Joel Spolsky, wrote about this in a classic piece: writing about “Shlemiel the painter’s algorithm”

http://www.joelonsoftware.com/articles/fog0000000319.html

Advertisements