Don’t stomp on the Build!!

I’ve seen a lot of code at the different companies I’ve worked for, and in a few open source projects on the web.

And a pattern I have seen in some of them is that no matter what build configuration you specify, the build gets written to the same place.

For instance:

If you build a 32 bit build of something it goes to c:/build/exe.
If you build a 64 bit build of something it goes to c:/build/exe.
If you build a Debug build of something it goes to c:/build/exe.
If you build a Release build of something it goes to c:/build/exe.

Are you starting to notice a pattern here? No matter which configuration you choose, all build results go to the exact same folder. This is an especially grievous type of sin committed by those of the lowest caste of programmers: The disorderly ones.

This can be easily solved in any build system with just a little bit of common sense and organization:

If you build a 32 bit build of something it goes to c:/build/x32/*.
If you build a 64 bit build of something it goes to c:/build/x64/*.
If you build a Debug, 32 bit build of something it goes to c:/build/x32/debug/*.
If you build a Release, 32 bit build of something it goes to c:/build/x32/release/*.
If you build a Debug, 64 bit build of something it goes to c:/build/x64/debug/*.
If you build a Release, 64 bit build of something it goes to c:/build/x64/release/*.

Notice a pattern here?

Different build configuration artifacts should be able to sit side by side without interfering with each other.

  • You don’t want half of your release build binaries to accidentally have debug symbols in them (inflating their size).
  • You don’t want a mixed build where memory is allocated in a debug heap and freed in an incompatible release heap, or vice versa. This corrupts memory at best and crashes at worst.
  • You don’t want crashes at runtime robbing developers of their time chasing down mysterious heisenbugs that they could have simply solved by having a consistent build.
  • You absolutely have to keep your intermediate files separated from each other too. This is where subtle bugs start to creep in, so intermediate files need to be guarded as well.
  • I could go on and on about how stupid that is.

So for example, here is a sample layout for a project that keeps source code and build artifacts highly organized:

  • c:/source_code
  • c:/builds/bin/win32/debug
  • c:/builds/bin/win32/release
  • c:/builds/bin/win64/debug
  • c:/builds/bin/win64/release
  • c:/builds/obj/win32/debug/<projectname>
  • c:/builds/obj/win32/release/<projectname>
  • c:/builds/obj/win64/debug/<projectname>
  • c:/builds/obj/win64/release/<projectname>
  • c:/builds/cmakefiles/etc…
  • c:/builds/TestResults
  • etc…

Using an organization like the above makes a few things easier for the developer:

  1. They won’t have to clean the build as often. They can work all day long on a particular configuration and not interfere with other configurations.
  2. Cleaning the build is simple. Just select a directory and hit the delete key. No need for a complicated build system invocation to clean the build.

Some other thoughts on the subject:

  1. Specify one well known, easily accessible and recognizable location for build artifacts. Put everything in there, and don’t make any exceptions. Don’t hide your build artifacts in a dozen different places. Don’t expect your developers to manually assemble the final product after the build is complete.
  2. Document that build location in a readme file found at the root of the project. If you haven’t moved on to readme files yet, then the 1970’s is calling, and wants its code back.
  3. Tell others to read the readme file.
  4. Even better, make the build artifacts go to some location outside of the source control folder. This is beneficial for a few reasons. First, it does not commingle the source code with the build artifacts. Second, your .gitignore file doesn’t have to be massive with entries for files created as a byproduct of the build. I’ve worked on huge projects, and because we followed this principle, I’ve been able to keep the .gitignore files down to a few lines. A win-win for everyone.

How to count Lambda costs through CloudWatch


How to calculate the costs of running multiple Lambda functions by using CloudWatch logs.

AWS Lambda functions write their logs to CloudWatch. If you run thousands of Lambda invocations, you might want to comb through all those logs to find the total cost of running them.

This article describes a script that makes it easy to find those costs.


When a Lambda function executes, it writes all standard output to AWS CloudWatch. At the end of the execution, Lambda writes a summary to CloudWatch indicating the resources consumed, which looks like this:

REPORT RequestId: 42a2245b-4237-56ad-ab71-37f37a265b12 Duration: 21342.35 ms Billed Duration: 2300 ms Memory Size: 2048 MB Max Memory Used: 200 MB Init Duration: 740.19 ms

Now if you ran your Lambda function thousands of times, you will have thousands of these summaries littering your logs. The goal is to dive into those logs and assemble all that text into something useful for estimating the costs of those Lambda executions.

For instance, the best approach would be to get all that nice cost information into an Excel spreadsheet, then let Excel crunch some numbers to tally the costs.

The Script

Using Python and the AWS Python API (boto3), I wrote a script to do just that. It crawls through CloudWatch log groups and streams, parses the report summaries, and outputs the results to a .csv file that can then be opened in Microsoft Excel.

import boto3
import os
import json
import time
import re
import sys
import datetime
from timeit import default_timer as timer

test = "REPORT RequestId: 41f11e5b-4237-56ad-ab71-48b46e176e08	Duration: 45777.35 ms	Billed Duration: 45800 ms	Memory Size: 3008 MB	Max Memory Used: 195 MB	Init Duration: 740.19 ms"
p = re.compile(r"REPORT RequestId:\s(.*)\sDuration: (\d+\.\d+).*Billed Duration:\s(\d+).*Memory Size:\s(\d+).*Memory Used: (\d+).*")

test_match = p.match(test)

# print(test_match.group(0)) # The entire string matched
# print(test_match.group(1)) # Request id
# print(test_match.group(2)) # Duration
# print(test_match.group(3)) # Billed Duration
# print(test_match.group(4)) # Memory Allocated
# print(test_match.group(5)) # Memory Used

def write_to(output_file, events):
    """Writes output events to the .csv file

    Only if the event matches the report regular expression"""
    count = 0
    for event in events:
        log_text = event['message'].rstrip()
        match = p.match(log_text)
        if match:
            output_file.write("%s," % match.group(1))  # Request id
            output_file.write("%s," % match.group(2))  # Duration
            output_file.write("%s," % match.group(3))  # Billed Duration
            output_file.write("%s," % match.group(4))  # Memory Size
            output_file.write("%s\n" % match.group(5)) # Max Memory Used, and end the csv row
            count += 1
    return count

def get_stream_events(log_client, group_name, stream_name, output_file):
    """Gets all the events in a given stream

    Sends all events to a write function for processing."""
    kwargs = {'logGroupName' : group_name, 'logStreamName' : stream_name}

    while True:
        response = log_client.get_log_events(**kwargs)

        events = response['events']
        write_to(output_file, events)

        # get_log_events pages with nextForwardToken; when the returned token
        # equals the one we sent, there are no more events to read.
        next_token = response.get('nextForwardToken')
        if next_token is None or next_token == kwargs.get('nextToken'):
            break
        kwargs['nextToken'] = next_token

def get_log_group(group_name, stream_prefix):
    """Iterates through all the log streams in a group

    Looks for all streams whose name starts with the stream_prefix parameter.
    This is slow, so it should not be used when a time range is known."""
    print("getting group: %s" % group_name)
    logs = boto3.client('logs') # cloudwatch

    with open('output.csv', 'w') as output_file:
        output_file.write("RequestId, Duration, Billed Duration, Memory Size, Max Memory Used\n")
        kwargs = {'logGroupName' : group_name, 'limit': 50, }
        stream_count = 0
        while True:
            response = logs.describe_log_streams(**kwargs)
            streams = response['logStreams']
            for stream in streams:
                stream_name = stream['logStreamName']
                print("stream: %s" % stream_name)
                if not stream_name.startswith(stream_prefix):
                    continue # not one of the streams we are interested in
                stream_count += 1
                get_stream_events(logs, group_name, stream_name, output_file)

            time.sleep(0.21) # this method is throttled if called more than 5 times per second.
            if 'nextToken' in response:
                kwargs['nextToken'] = response['nextToken']
            else:
                break
        print("Stream Count: %s" % stream_count)

def get_filter_log_events(group_name, stream_prefix, start_time, end_time):
    """Gets CloudWatch logs filtered by a few parameters

    :param group_name: The fully qualified name of the CloudWatch group
    :param stream_prefix: The beginning of the stream names to query in
    :param start_time: The starting time to filter for events.
    :param end_time: The ending time to filter for events.

    It is critical to get start and end times in the correct format that Amazon uses. Even on their own Lambda
    log pages, times are listed in different formats, reporting times in different timezones.

    The time that must be used is the time reported under CloudWatch, and is the time that the streams show
    for their first and last time stamps.
    The timezone must be listed in UTC time.
    The final format is like this:
    AMAZON_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ%z"
    where %z is of the form +HHMM or -HHMM

    For example:

    year: 2020
    month: 03
    day: 23
    hour: 10 AM (24 hour format)
    minute: 15
    second: 00
    milliseconds: 00
    time zone: -6 hours after Greenwich mean time
    """
    print("getting group: %s" % group_name)
    logs = boto3.client('logs') # cloudwatch

    with open('output.csv', 'w') as output_file:
        output_file.write("RequestId, Duration (ms), Billed Duration (ms), Memory Size (MB), Max Memory Used (MB)\n")
        kwargs = {'logGroupName' : group_name, 'logStreamNamePrefix' : stream_prefix, 'startTime': start_time, 'endTime': end_time}
        count = 0
        while True:
            response = logs.filter_log_events(**kwargs)
            if 'events' in response:
                events = response['events']
                count += write_to(output_file, events)

            if 'nextToken' in response:
                kwargs['nextToken'] = response['nextToken']
            else:
                break
        print("Number of matched events: %s" % count)

if __name__ == '__main__':
    if test_match is None:
        raise RuntimeError("The regular expression did not work: %s" % p.pattern)

    if len(sys.argv) == 3: # argv includes the name of this script as the first argument
        log_group_name = sys.argv[1]
        stream_prefix = sys.argv[2]

        print("Looking in Log Group: %s" % log_group_name)
        print("Looking for streams starting with: %s" % stream_prefix)
        get_log_group(log_group_name, stream_prefix)

    elif len(sys.argv) == 5:
        start = timer()
        log_group_name = sys.argv[1]
        stream_prefix = sys.argv[2]
        start_time = sys.argv[3]
        end_time = sys.argv[4]
        AMAZON_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ%z"
        # e.g. 2020-03-23T16:14:47.637000Z-0600
        # The date contains year, month, day, hour, minute, second, and microseconds.

        start_time = datetime.datetime.strptime(start_time, AMAZON_TIME_FORMAT)
        end_time   = datetime.datetime.strptime(end_time  , AMAZON_TIME_FORMAT)

        print("Start time : %s" % start_time)
        print("  End time : %s"   % end_time)

        # Convert to milliseconds since the epoch, which is what the API expects
        start_time = int(start_time.timestamp() * 1000)
        end_time   = int(end_time.timestamp() * 1000)
        print("epoch start time : %s" % start_time)
        print("  epoch end time : %s"   % end_time)

        get_filter_log_events(log_group_name, stream_prefix, start_time, end_time)
        end = timer()
        print("Elapsed time: %s" % (end - start))
    else:
        print("wrong number of arguments")

How to make a Nuget package for C++

Making a nuget package for managed code is really straightforward, since it is so extensively documented on Microsoft’s various websites.

But if you want to make a nuget package that contains native code, like libraries (*.lib) and headers (*.h), you are almost out of luck! Microsoft will give you about 3 minuscule paragraphs full of cryptic junk for documentation!

The first key to understanding how to put C++ stuff into a nuget package is that a nuget package (*.nupkg) is really just a zip file that has been renamed. Therefore you should be able to stick anything in there that you like.

The second key is that we will just be using Nuget to download and unzip the nuget package for us. After that we are on our own.

The third key is that none of the Visual Studio versions offer any aid at all in hooking up the nuget package to the project that needs it. It is up to you to break open your text editor and modify your Visual Studio project files (*.vcxproj etc.).


Here is a high-level summary of what needs to be done; each step is explained in detail below:

  1. Gather or stage your native library files into a folder of your choosing.
  2. Create a *.nuspec file in that folder.
  3. Edit the *.nuspec file to include the files you want to include in your package.
  4. Create a *.props file.
  5. Call nuget pack to create the package.
  6. Push the nuget package to a feed somewhere.
  7. Create a packages.config file.
  8. Edit the visual studio project file to point to where the restored nuget package is.

NOTE: In this document I will be using the Google Filament renderer library as my demonstration example, since I had to do that here at work recently.

Stage the native files

This should be easy: copy all the files you need for your native library to a convenient folder. For example:

  • bin/*
  • docs/*
  • include/*
  • lib/*

Create the *.nuspec file

I like to put this *.nuspec file inside the directory that has the code I’m packaging up. That simplifies the paths we will put inside the *.nuspec file. For my example, mine is filament.nuspec:

  • bin/*
  • docs/*
  • include/*
  • lib/*
  • filament.nuspec

Edit the *.nuspec file

A nuspec file is an XML file and hence must follow the syntax requirements documented on Microsoft’s website.

I chose to have the package contents follow the same layout as how Google ships them to me. Thus my nuspec file looks like this:

<?xml version="1.0"?>
<package >
		<description>Google Filament Renderer</description>
		<tags>Native, native</tags>
		<file src="lib\**\*.*"       target="native\lib"     />
		<file src="include\**\*.*"   target="native\include" />
		<file src="docs\*"           target="native\docs"    />
		<file src="bin\*"            target="native\bin"     />
		<file src=""        target="native"      />
		<file src="filament.props"   target="native" />

The most important part here is the <tags> element that contains the text ‘native’. To be doubly sure I got it right, I added it twice, the first time with a capital ‘N’. This is useful when the package is hosted on nuget.org: you can search for native packages by using the search term tag:native.

Each <file> xml element has two attributes: a ‘src’ attribute and a ‘target’ attribute. Source is where the files come from, and target is where the files will be placed when the nuget package is restored or unzipped. What you specify here is completely up to you. As you can see in my example above, I have also created a filament.props file. I will be using this later to make it simpler for any project to consume this nuget package.

Also, I added a root folder called ‘native’. It worked with one; I haven’t tested it without a root folder. But you can give this root folder any arbitrary name you want.

Create a filament.props file

This will aid us later on in consuming the nuget package from another C++ project. Here we will create an MSBuild file (NOT a .vcxproj file) that describes where the include files and library files are. This step requires a good understanding of MSBuild XML syntax and especially a good knowledge of C++ project file (i.e. *.vcxproj) syntax. If you don’t know that, just copy and paste my code here:

<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="" 
    <LibraryType Condition="'$(Configuration)'=='Debug'">mdd</LibraryType>
    <LibraryType Condition="'$(Configuration)'=='Release'">md</LibraryType>
  <FilamentLibs Include="$(MSBuildThisFileDirectory)\lib\x86_64\$(LibraryType)\*.lib" />
    <!-- Expland the items to a property -->
    <ClCompile>	<AdditionalIncludeDirectories>$(MSBuildThisFileDirectory)\include</AdditionalIncludeDirectories>

This specifies all the library files using a wildcard pattern (*.lib), and it points to where the include directory is too. It uses the special reserved MSBuild property $(MSBuildThisFileDirectory), which anchors the paths used inside this file to the file’s own location.

Call Nuget Pack

Now comes the fun part, to create the nuget package. I run a simple batch script like this:

NuGet.exe pack filament.nuspec -OutputDirectory builds\nuget\x86_64

Thereupon I can look in my builds\nuget\x86_64 directory and see a nuget package named Google.Filament.2019.8.8.nupkg

Call Nuget Push

Now that the package is created, it’s time to push it up to a nuget feed. Which nuget feed you push to is up to you, and is none of my business, but nuget.org is the de facto source. If you have a private feed you use for your company, that works too. I run a simple batch script like this:

NuGet.exe push builds\nuget\x86_64\*.nupkg -Source https://<some nuget url> -Apikey <some api key>

Create a packages.config file

This part is super easy. Create a packages.config file in the same directory as the Visual Studio project file (*.vcxproj) that will be consuming the nuget package.

Place the following xml snippet into the file:

<?xml version="1.0" encoding="utf-8"?>
  <package id="Google.Filament" version="2019.8.8" />

Notice how the version matches what was put inside the *.nuspec file.

Edit the visual studio project file

This step involves a text editor (I prefer Notepad++ or Visual Studio Code). This step is also markedly different from what we would do if we were using Visual Studio to add a managed .NET nuget package to a managed .NET project.

But since this is a native project, we don’t get the big boy tools, and have to settle for the hand-me-downs from Microsoft.

First we will open the *.vcxproj file that will be consuming this package, and simply add the following line of XML at the end of the file:

<Import Project="<Your package directory>\Google.Filament.2019.8.8\native\filament.props" />

Where your packages go when you do a nuget restore is up to you. So wherever that is, you will have to change the snippet <Your package directory> to point to wherever nuget unpacked the files. This can be a relative or an absolute path. But in the end, MSBuild needs to be able to find that *.props file, otherwise the project won’t load in Visual Studio.


And that is it. The package is done, and you should be able to do a nuget restore and build your native project. Many of the steps were the same as creating a managed .NET nuget package, with the exception of manually editing the native project files.

If you are looking for more examples of nuget packages holding native code, do a search on nuget.org and find a package. Browse to the page for the nuget package and find the link to its project code page, usually on GitHub. Click on the link to open the project and then hunt around for a *.nuspec file.

The Chris Test: 10 steps to better code

There are usually a set of things that a software shop does to ensure high quality software, no matter whether it is a web, desktop or mobile shop. The principles are the same.

Years ago, Joel Spolsky wrote a similar set of rules: the Joel Test.

This article was very influential to me when I first read it. However….

It is more focused on the high level issues that affect the company as a whole, not rules for good code. For instance, one rule is: do you use source control? In this day, every software shop should and probably does. Maybe they didn’t 13 years ago? I know of plenty of software projects and products that use source control and still have bad code, so that’s not a real test of how good the code is. For the actual coders writing the software, some of the items in Joel’s excellent blog do not really affect them on a daily basis. Programmers live in their own little bubble, under a set of realities that affects only them. That is what I’m going to write about here.

  1. Do you compile at the highest warning level possible?
  2. Do you check for memory leaks?
  3. Do you have overly large functions?
  4. Do you have overly large code files?
  5. Do you test with debug builds?
  6. Do you have regression tests?
  7. Do you measure code coverage?
  8. Do you stop everything when tests fail?
  9. Do you do code reviews?
  10. Do you use a real build system?

Do you compile at the highest warning level possible?

This can be interpreted to mean different things for different types of languages. For instance, a web shop developing in JavaScript, CSS and Ruby will not have compiler warnings. While Ruby doesn’t have compiler warnings, there are linters and other static analysis tools available to help supplement enforcing the rules of the language. But a product written in C, C++, C# or Java will live in the land of warnings every minute of every day.

Compiling code and witnessing a long list of warnings spew out is a special kind of demoralization. It’s even worse when those build warnings go unfixed for months or years. That state means one thing: management and the leads don’t care that the product has subtle problems. In fact it means that nobody cares. Or it could be that someone cares, but feels actively discouraged by peers or management, which is the worst state of all.

What I really don’t get is how a developer could let a warning get into the build in the first place. How could a developer NOT look at build output? It leads one to ask: if they aren’t looking at the build output, are there other things they are also not looking at? They have to know if the build succeeded or not, right? They obviously are not blind, but it does indicate something is lacking: either competence or desire. Either one is programmer malpractice.

The end result is that there will be very subtle bugs in the software that manifest themselves in very costly ways over the long term. There could be a problem that a customer support person spends 4 hours resolving with a customer. That then gets logged to a developer, who spends a day trying to figure out what went wrong. The fix then gets distributed to customers 9 months later. Meanwhile everyone loses: the customer loses productivity, and the software shop pays for labor on something that could have been discovered in a few seconds by a compiler warning. Imagine the cost of such failures to act properly.

Now, most modern compilers and linters have a range of options, from the basic, to the important, all the way to the absurd. So not all compiler/linter warnings are created equal. When you start getting warnings that border on absurd, you can create one (not many) list of exceptions to those warnings, or a list of warnings that are turned off. This list should be in one place, checked into source control, and contain clearly worded, strong reasons why those warnings are turned off. This list should never be large, and no one person should ever be allowed to change it unilaterally without code review.
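
For example, with MSVC such a list can live in a single, reviewed header; a sketch (the warning numbers and reasons here are illustrative, not a recommendation):

// suppressed_warnings.h -- the one list of disabled warnings, checked into source control.
// Changing this file requires a code review.

// C4514: unreferenced inline function has been removed.
// Reason: emitted for nearly every header-only utility; pure noise for this code base.
#pragma warning(disable : 4514)

// C4710: function not inlined.
// Reason: an optimizer report, not a defect.
#pragma warning(disable : 4710)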

The compiler (For languages where it applies) is the first line of defense against bugs. Software shops who refuse to use it to its fullest potential are putting out sub-standard products.

Do you check for memory leaks?

Failing to correct (or even care about) memory leaks is a certain kind of wrong. Memory leaks are like lung cancer: they slowly and insidiously suffocate an application of its memory. Applications live on memory, and when the memory goes, so does the application. And customers aren’t happy when their application dies. It’s even worse when it dies and takes all their data with it.

This fundamentally begins with the programmer who writes the feature. They are the one who originally wrote the code that allocated the memory. They should have immediately considered where the memory would be deallocated, and who would own the memory in the meantime. And second, the original programmer should have tested their code.

There are tools out there to help identify memory leaks, and they should be used where they exist. Failing to check whether memory leaks exist is just plain sloppy.
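
On Windows, for example, the debug CRT can do some of this checking automatically. A minimal, MSVC-specific sketch (debug builds only):

#define _CRTDBG_MAP_ALLOC // include file and line info in CRT allocation reports
#include <crtdbg.h>

int main()
{
    // Dump any allocations still outstanding when the process exits.
    _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);

    int* leaked = new int[16]; // deliberately never freed; reported at exit
    (void)leaked;
    return 0;
}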

Do you have overly large functions?

These are functions that are more than two screens in length, which seems to be the unwritten standard in this industry today. (It wasn’t always an accepted standard, however.) Ruby developers seem to go by the rule that functions should never be larger than 5-10 lines of code, which is even better. Large functions quickly turn a program into an unreadable mess that makes it extremely difficult, or almost impossible, for another human to decipher. The problem is that such functions cannot be easily refactored without a real possibility of introducing bugs. I have personally seen functions that were thousands and thousands of lines long. The cyclomatic complexity of such a function shoots through the stratosphere, and any modification can have real possibilities of adverse, unintended side effects. Such a function is essentially abandoned code: a dumping ground for changes that no one cared about.

Do you have overly large code files?

Large functions lay the foundation for large code files: files that contain 10,000 to 25,000 lines of code, hundreds of functions and dozens of class definitions. Code files that try to be a God to everything. These files cannot be easily read nor browsed through. They remain a black box to all but the author.

Now, there are tools to help browse through large code files, tools to help list the structure of the application. But when code gets so big that you need a browsing tool for the browsing tool, then you definitely have a problem. When this happens, discoverability nosedives and mindshare in the product evaporates. At that point programmers stop reading the code and start searching it. And when programmers search the code instead of reading it, they are not really understanding it. They are in essence looking at a painting through a microscope instead of standing back and looking at the big picture.

Another really bad side effect is the risk of duplicate code. A huge haystack may be hiding the needle you are searching for; not finding it, you create another needle. This in a profession where needlessly reinventing the needle is a cardinal sin.

Case in point: I once came across someone modifying a rather small class, about 40 lines of code at most. I looked around and saw that we had a few other classes with the same name, and informed them about it during a code review. It turned out there were about 10 to 20 different copies of that class floating around the code base. Apparently different programmers over the years had used that class a lot. OK, that’s a good thing. But instead of simply including a header file and linking against a library, they copied the class into their code. So I pushed back in the code review and asked them to remove the duplicate code. All this because the code base was so large that no one knew where anything was, or whether a particular problem had already been solved.

Do you test with debug builds?

This really applies only to C and C++ based code. Sorry, all you script people out there; your languages have evolved past this little sharp edge of history. C and C++ based projects have, by convention, two different build configurations: a release build and a debug build. A release build is what you ship to your customer. It is compiled for speed with little to no error checking built in. A debug build is slower; it is not shipped to the customer and has lots of error checking.

Software should be tested to be correct first, and tested for speed and performance second. A good software quality assurance division of a company will understand that and not require testing only on release builds.

The reason is that in the world of C and C++ based languages, a ‘debug’ build has certain extra safety checks built into the product that help identify problems before they escalate into major disasters. Such code has assertions, poison memory patterns, bounds checking, extra type checking and so on. These things make the code much safer and the product more enjoyable for the customer. A ‘release’ build, on the other hand, simply compiles all those nice safety nets away and instead plops the code right in the middle of a speedway, with no safety guards whatsoever. Testing only on release builds is like training a child to ride a bike in the middle of a busy road: with no safety precautions, something is going to break, and it will be difficult to decipher post-mortem what happened.
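
As a concrete illustration, consider the humble assert, a safety net that exists only in debug builds (a minimal sketch):

#include <cassert>
#include <cstddef>
#include <vector>

float GetSample(const std::vector<float>& samples, size_t index)
{
    // In a debug build a bad index halts right here, at the source of the problem.
    // In a release build NDEBUG compiles the check away entirely.
    assert(index < samples.size());
    return samples[index];
}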

Do you have regression tests?

A set of regression tests is the second line of defense against bugs. These tests must be run by developers as they write code, not just afterwards. At the very least, tests must be run before code is merged or submitted to the repository; otherwise the tests serve no purpose. Imagine the havoc if code that breaks things left and right is submitted on a daily basis, and no one knows. Once it is discovered that the tests are broken and that a feature is broken, it’s usually much later in the development process. By that time the programmer may have moved on to work on something else. Fixing it can be time consuming, not to mention costly. So the entire software development process becomes less efficient.

But the ultimate havoc wreaked by not having regression tests is that bugs get through to customers. And customers get annoyed and buy someone else’s software.

Do you measure Code Coverage?

Testing is great, but you also have to know when to stop. Most programming languages, except the newest ones, have tools for measuring code coverage. They should be used. In fact, one company I know of runs its tests in a continuous integration server. The coverage of the product is recorded, and any commit whose test coverage is less than 95% is rejected. Way to go!!

Interestingly enough, measuring code coverage is good insurance for when a programmer leaves. The programmer may leave, but the tests, guaranteed by high code coverage, ensure the code can be maintained by someone else. So when the new guy or girl comes in and starts maintaining the code, they can be reasonably sure that their changes won’t break the world.

Code coverage also helps to ensure the quality of the code. When code is examined in the cold truth of coverage, you start to discover a few things. You can discover unused parameters, or branches of code that never get executed. You may also discover expensive, slow code that perhaps you didn’t need to call, because its results weren’t needed. The good reasons go on and on, and will vary. You will also discover bugs: lots of them. I have never run code through code coverage and not discovered a bug. It is as simple as this: untested code is buggy code.

Do you stop everything when tests fail?

This should be a no-brainer: of course all development should stop if a regression test fails. All hands should be called on deck when the tests fail. Everyone should know not to sync code submitted during the broken period. Everyone should know not to build upon similarly infected bad code. Everyone should also feel a responsibility to look, hunt, search for and fix the problem. In such a culture, everyone is pretty much guaranteed a clean working environment, where they can trust that the code they are basing new features on is stable, solid and bug free.

But if the culture says that failing regression tests are ignored and swept under the rug, then no one can be sure of anything about the product. What if the test failure count keeps creeping upwards and no one is held accountable? Then the product as a whole gains technical debt. Someone traded a short term gain that will not last for a debt that grows over time, and there is no debt so small that it cannot grow into something huge. The situation can eventually descend into a farce where certain groups refuse to run the regression tests because they are broken, but then refuse to help fix them in the first place. Such downward spirals always result in the same thing: a spectacular collapse that offends some customers really badly.

Do you do code reviews?

No developer is immune from writing bugs into their code, especially those who are unfamiliar with the software, the language or the platform. It can take many years to master each of those 3 areas. In the meantime, code submitted by someone who has not mastered those areas should be reviewed by other people.

But this rule should not be applied with no regard for the circumstances. Code reviews slow down development speed a lot. As in drastically. Ever had a code review languish for weeks at a time? I have, and it’s not fun. Productivity nosedives, morale plummets and old bugs languish. So if a startup company absolutely required code reviews for everything, it would probably never actually ship a product, at least not before going bankrupt and running out of investor money.

So, here are a few common sense solutions to the problem:

  1. Code reviews are optional for those who originally wrote the code. Those who wrote the code are always more informed about what and how the code works than anyone else. Any changes made by them to code they themselves wrote should be optional.
  2. Conversely, modifying any code you yourself didn’t write should always require a code review. Especially if the person who wrote the code is still available.
  3. Have an expiration date. Any code review not reviewed in a certain amount of time expires and the developer is free to submit the code.
  4. Require unit tests and regression tests instead of code reviews. This works especially well for those who originally wrote the code in the first place.

Requiring a code review of all code is an ideal that should be sought after. But remember, customers buy your software, not your code reviews. Until there is first shipping software, no amount of code reviews will pay the bills. So, until then, don’t sacrifice the new startup for a feature that customers don’t buy.

Do you use a real build system?

A slow or inefficient build is a tax that costs real time and real money. A build is the process of converting the source code into something else which you send to your customer. Applications written by programmers are also built by programmers, and usually built using custom code that is slow and very inefficient. It doesn’t have to be this way. I once worked at a company whose archaic build system took 45 minutes to compile and link; I fixed it up so that it took a minute and a half. Another product I once saw had no build system at all, but rather a collection of bash scripts that did a few things in a certain order, whether they were required or not. Such a build system was actually fast considering it was building 8 million lines of C/C++ code. But still it did only one thing: build the product, with little to no configurability.

Each programming language and environment has its own build system. C has the antique Make system (good riddance). C++ has none really, unless you are using Microsoft’s Visual Studio and MSBuild. Java has Maven. Ruby has Rake. These tools should be used instead of re-inventing the wheel by writing custom scripts for your build. Once a build system is written properly, it can be executed differently according to the needs of the user.

A full clean build can be executed and be guaranteed to work due to no preconditions polluting the source code.

An incremental build can be executed based off of localized changes. These builds don’t start from scratch but rather build only what is needed based off of the actual changes.

A clean build simply deletes everything that a build creates, and returns the source code to the state it was in when it was first retrieved from the source code repository.

If a build cannot do this, then it is in the stone ages and not helping anyone nor anything.

Over engineered build systems from Hell

While I was at Autodesk, years ago, we went through various build systems. When I first started, the build was written in Perl, and dependencies were specified using a .csv file. I had no idea how it worked, nor did I care, since I was interested in other things during those years. The build could routinely take an hour and 45 minutes, and was mind-numbingly boring since we usually had to do it every day, and if you were unlucky, multiple times per day. Even worse, on the build servers the build would routinely take almost 4 hours!

What a horrible tax.

Later on, a hot-shot programmer came along and rewrote the build system to use Ruby and Rake. It was supposed to be faster, which it kind of was. But the complexity was just as bad, and no one knew Ruby nor how Rake worked. Then the developer left the company, leaving behind a black box. That silo of information was gone, and no one knew how the build system worked. It took 2 developers about a year and a half to learn the build system well enough to work on it at the level the original developer had.

To be sure, there were still problems with the build. It still took a long time to build the product: somewhere near an hour. About the only thing we did gain was the ability to do incremental builds.

But modifying the build was the main problem. The build, written in Ruby, completely reinvented the wheel in many different areas. To understand this better, you have to understand that the product at the time was built using Microsoft tools, because it was based solely on the Microsoft platform. Thus the source project files were in a build language that Microsoft created, built into Visual Studio, called MSBuild. Instead of using Microsoft tools to create the build, Ruby and Rake were used. Instead of using Microsoft tools to parse the XML project files, a Ruby gem was used. Instead of using anything from Microsoft to help with the build, everything was re-invented from scratch. Parsing Visual Studio .vcproj (and eventually .vcxproj) files was done tediously, laboriously and mind-numbingly using Rake and some XML gem. Talk about recreating the wheel! For instance, lots and lots of code was written to duplicate a simple function Microsoft already offered: an API call that retrieved a fully instantiated project object with all its properties intact.

Copying files to the build directory was another disaster. It would take around 10 to 12 minutes to copy 7000~14000 files. It originally was somewhere near 7000 files, but grew over time. All written in Ruby code that no one knew how to debug, except by putting print statements in.

Another problem was adding build properties. If you wanted to add a build property (a key-value pair), you had to add it to multiple places in the Ruby build, knowing exactly what to modify (in duplicate). It was horrible.

Mixing Ruby with MSBuild was like mixing iron and clay: they don’t mix well at all. It was a Ruby straitjacket that hindered the build and the Visual Studio projects upon which it was based.

There had to be a better way.

Eventually, when the frustrations with the build boiled over, I learned MSBuild and figured out how to build Max without Ruby. It took over a year, from when I first got a working prototype, to get something into the main branch of Max, simply due to bureaucratic inertia. There are lots of people in positions of power who simply say no before learning about a subject, which was something all too common there. The first liberation was freeing the list of things to get built from Ruby. Yes, the list of DLLs and EXEs to get built was specified in some arcane Ruby syntax somewhere. The first step was moving that list to a democratic XML file; now any build language could parse it and find out what to build. The second was moving the list of files to copy into an XML file as well; now any build system could know what files to copy.

Once those two things were in place, it was time to put in the build system that I originally came up with (during a particular Christmas break).

It was written in pure XML, with one MSBuild extension that was written using C#. All the tools were native to visual studio, and what you built on the command line was what was built in visual studio. They both used the same properties (using native property sheets) and built in the same way.

What’s more, I found that using native MSBuild tools to copy those 7000+ files was incredibly fast. In fact, while once debugging through that old Ruby copying code, I found the reason for the 10 minute copy task: it was using an N factorial algorithm! Given directory A with subdirectories B through Z, it would iterate through the directories n! times. Each directory was parsed not once, but n! times according to the number of sub-directories that existed. It was an unholy mess that proves that re-inventing the wheel is usually a disaster waiting to happen. Now back to the improvement: with the new MSBuild copy mechanism it took 20 seconds to copy all those files. 20 seconds versus 10 minutes is a big improvement.

Incremental builds also greatly improved. Go ahead and build your product from scratch; now don’t change a thing and rebuild. If you have a smart build system, it should take just a few seconds and nothing will happen: the build system will be smart enough to report that nothing changed and therefore do no work. My new build did just that in a matter of seconds. The old build system used to take about 5 minutes to do that (and it still did work anyways…).

Speaking of performance: the time to compile and link the actual files didn’t change much, because that was always in Visual Studio’s corner and not Ruby’s. The improvement in performance came from the copying actions that now took 20 seconds. Also noticeable was the shorter time from when the build started to when the first CPP file was getting compiled: in Ruby/Rake it took quite a few minutes; in the new build it took a few seconds. Eventually, when I got a new SSD, I was able to get the build down to 22 minutes on my machine.

The build at Century Software

Later on I moved to Century Software, a company local to where I live. That was such a fun place. Anyways, their build system for their Windows product was written in Make! Yes, Make: the original, ancient build system that you can’t even find documentation for anymore. I mean literally, I found (I think) one page of documentation somewhere in some professor’s lecture notes. The docs were horrible. The Make setup there was so old it built one C file at a time. No multi-threading, no parallel builds, nothing. Slow was the operative word, that and an incomprehensibly verbose build output. The only good thing about it was that it immediately stopped on the first error.

So eventually I rebuilt that using MSBuild too. It took me a few months in my spare time. No bureaucratic inertia, no one telling me no. I just worked on it in my spare time, and eventually I had a complete and fully functioning build for the TinyTerm product. This build was the best I’ve ever done, with zero duplication, extremely small project files and a build that was very fast. It went from 45 minutes to a minute and a half.

When writing a product, the build system should be done using the tools that the platform provides. There should be no transmogrifying of the code or the build scripts (like OpenSSL does) before doing the build. When writing Ruby on Rails, use Rake for your build processes. When targeting Microsoft platforms, use MSBuild. Using Java? Then use Maven. Data that must be shared between build platforms should be in XML so that anything can parse it. And most important of all, distrust must go, and developers and managers must have an open mind to new things. Otherwise the development process will take so long, be so costly, and exert such a tax that new features will suffer, bugs will not get fixed, customers will not be served, and they will take their money elsewhere.

Now working at….

I no longer work at Autodesk. In fact I have not worked there since last spring, when I, another developer, and almost the entire documentation department were laid off. That spring the entire Softimage product was discontinued too.

My highlights of working at Autodesk were these:

1. It was my first job at a big company (before that I was self-employed). I went through a big company, but the big company didn’t go through me. Once a company gets big enough, it is hard for it to listen to customers. Especially when all are clamoring for bug fixes that should have been fixed last year, or 18 years ago.
2. I learned the craft of software development, for which I’ll always be grateful.
3. I learned how to make good code, and saw lots of examples of bad code.
4. I learned that software quality is like a river of fire: if it gets out of control and passes through the banks guiding the river, it will burn everything. Therefore bug counts have to be kept under strict control for a software product. If they are allowed to proliferate beyond the ability to get fixed, then the product suffers.
5. I almost single-handedly saved 3ds Max. When I started, the average time (in days) between crashes was in the mid single digits. When I finished, the Max 2014 release was more stable than Maya: in the low 40’s. That was mostly my effort, with a bit of help in the last year for the 2014 release. Still, the stability of 3ds Max was nowhere near the rock-solid stability of AutoCAD. In all the years I used AutoCAD (at work and personally) I don’t think it ever crashed on me; at least not that I can remember. My personal opinion is that Max will never again have a sustained effort to fix crashes like it had when I was there, simply because no one cared like I did.
6. I think it is time for a new 3D product to be built from the ground up. To avoid the problems that a mountain of code imposes. To have a modern take on a 3D application that is not hobbled by the 30 year old windows win32 API, and that is easy to develop for. If anyone is interested in such an effort please contact me privately.

Anyways, after I left Autodesk, I worked at a local boutique software house called Century Software. It was the opposite of Autodesk: small, with fewer than 10 people. They also have a very old product called TinyTerm, a terminal emulator. While it is old, it is a great product. The team there was a lot of fun; it was a breath of fresh air. I re-learned old C skills, for which I’m grateful, but I also learned how to use a source control package named git, for which I’m even more grateful.

So what am I doing now?

As of January of 2015 I work at a company called Instructure. The environment is terrific, and the people there are young and very smart. It is an awesome place for developers. I can’t say enough good things about this place. I work on a beautiful product called Bridge, which is a learning management system for businesses.

Writing Stable 3ds Max Plugins

I found this document while looking through old files today, and thought I’d share it. It was from a lecture I gave at Autodesk University back in 2012. It applies to 3ds Max, but has some points that are applicable to software development in general.

[Note] I wrote this a long time ago, and today I saw this blog post again. I thought, in fairness, I should add a few things. These were guidelines I came up with based on years of experience writing code and fixing bugs in 3ds Max. While I firmly believe in every single last one of them, unfortunately, hardly any of these things ever entered the thoughts of most 3ds Max developers. Most coded all day long blissfully unaware of warning level 4, and except for two people, no one ever showed an interest in static analysis. In fact management most of the time was completely unsympathetic to these ideas. Even the development managers who used to be programmers simply didn’t care; they just wanted bugs fixed fast. No attention or interest was given to systematically fixing the fence at the top of the cliff. All thoughts were on getting ambulances to the dead bodies at the bottom of the cliff as fast as possible. As a result the fences at the top were always full of holes. By the time I left Autodesk in the spring of 2014, only a dozen or so projects compiled at warning level 4, and no systematic static analysis was being done by anyone. I could go on, but that’s a thought for another blog post.


Preventing crashes and increasing stability in software is a difficult task. There is no practice nor set of practices that will completely prevent all crashes. However, there are a few things that will help reduce errors. With that short introduction, let us get started.

Basic Responsibilities

These are basic practices that would apply no matter where you worked and no matter which product you worked on.

Compile at warning level 4

You should compile your plugins at warning level 4. Level 4 warnings can help point out subtle programming errors that can lead to bugs that can be absurdly difficult to discover later on.  This is a free and practically instantaneous way to find bugs early in the development process.

Code compiled at level 4 is better than code compiled at anything less. Level 4 warnings should be turned on, and no warnings should be suppressed.

The 3ds Max SDK compiles cleanly at warning level 4, and has been that way for at least 3 years now.

Case in Point:

We turned on warning level 4 for an old project recently. One level 4 warning pointed to some unreachable code, caused by a break statement that was left in a loop. This problem eventually resulted in an entire feature not working.
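
A minimal sketch of that kind of bug (ProcessItem stands in for whatever the loop was supposed to do):

void ProcessItem(int item); // hypothetical worker function

void ProcessAll(int itemCount)
{
    for (int i = 0; i < itemCount; ++i)
    {
        break;          // the leftover break: the loop exits on its first iteration
        ProcessItem(i); // level 4 flags this as unreachable code; the feature silently never runs
    }
}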

Compile with Static Analysis

The higher-end editions of Visual Studio come with a static analyzer called Code Analysis. This feature can be turned on for native or managed code in Visual Studio. Static analysis does a deep scrutinization of the code and can help spot bugs that are more complex than what level 3 or 4 warnings can catch. Yet the problems it finds are usually so fundamental that they can be likened to level 1 or 2 warnings.

Case in Point:

The static analyzer can detect allocation/de-allocation mismatches. For instance, we turned it on and found places where memory was allocated with new[] but was de-allocated with delete instead of delete[]. We found lots of these scattered throughout our large source code base. The advantage is that this is so easy to detect; without static analysis it would take a special tool like BoundsChecker to reveal a memory allocation mismatch, and then only after exhaustive testing.
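
A minimal sketch of the mismatch the analyzer catches (the exact warning number varies by Visual Studio version):

void Mismatch()
{
    int* values = new int[100]; // array allocation
    delete values;              // flagged: allocated with new[], must be: delete [] values;
}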

Check Pointers for NULL

By far the most common fix I have seen for known crashes in 3ds Max is to check a pointer for NULL. This is the most persistent problem that I have ever seen in C/C++ code. Get into the habit now of checking every pointer for NULL before using it. A corollary to this is to initialize all pointers to NULL before and after they are used.
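
A minimal sketch of the habit (plain C-style code, nothing 3ds Max specific):

#include <cstring>

size_t NameLength(const char* name)
{
    if (name == NULL)   // check the pointer before using it
        return 0;
    return strlen(name);
}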

Case in Point:

The Visual Studio static analysis tool emits various warnings for possible dereferencing of null pointers. Consequently I have rarely seen this problem in code that compiles at level 4 with static analysis.

For the Rampage release, the fourth-highest crash in 64 bit Max was a crash using Ray Traced Shadows. The shadow code contained a buffer of floating point values that was uninitialized. It was extremely difficult to track down, as it only manifested when the debugger was NOT attached.

Check before casting

If you lie to the compiler, your application will come back and bite you. C is a language that is seemingly built on casts, where anything can be cast to anything else. This ability to so easily lie to the compiler and misrepresent your types is dangerous and risky. Therefore prefer C++ style casts. By turning on RTTI and using C++ style casts, the results of a cast can be checked for validity.
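
A sketch of the safe pattern, assuming the 3ds Max SDK headers (MtlBase and Mtl are the SDK classes discussed in the case below):

void UseAsMtl(MtlBase* base)
{
    // With RTTI on, dynamic_cast returns NULL when base is not actually a Mtl,
    // instead of handing back a miscast pointer.
    Mtl* mtl = dynamic_cast<Mtl*>(base);
    if (mtl != NULL)
    {
        // Safe: base really is a Mtl, not one of the other derived classes.
    }
}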

Case in Point:

In the SDK header file imtl.h is a class called MtlBase, which has 4 derived classes. One of those classes is class Mtl. I found functions in MtlBase that blindly assumed the instance (i.e. this) was an instance of class Mtl, ignoring the fact that there were 3 other classes derived from MtlBase. Thus they cast the ‘this’ pointer to class Mtl and then did work on that miscast pointer.

Avoid stack based strings

A very common way to crash the application is over-reliance on stack based C strings. This code for instance is very dangerous:

void foo() {
    TCHAR buffer[64];                   // stack based string buffer
    _tcscpy(buffer, GetSomeUserText()); // overflows the stack if the text is 64 characters or longer
}                                       // (GetSomeUserText is a stand-in for any source of text)

One of the problems with stack based strings is operating on a string that is bigger than the buffer, which can corrupt the callstack. This is almost impossible to debug afterwards and usually makes reading minidump crash files an exercise in frustration. The danger can be minimized by using the newer safe string functions. For instance, instead of using strcat, which can easily run over the end of a buffer, you can use strcat_s, which is a safer version.

When possible use TSTR or MSTR instead, where the actual string buffer is stored on the heap, and not the stack. Then if anything does go wrong, it will not corrupt the callstack.

Now a disclaimer: Max has a lot of stack based strings all over the place (it has been around a long time, of course). But their usage is getting reduced as we now favor TSTR or MSTR.

Case in Point:

The code for the customization dialog contained a for loop that concatenated a string into a stack based buffer of limited size. The for loop iterated too many times and the buffer overflowed, corrupting other items on the stack. The corrupted buffer belonged to a frame several levels up the stack; when that stack frame was cleaned up, it crashed. Diagnosing the problem was difficult since the symptom was several function calls away from the source of the problem.

Avoid using catch(…)

If at all possible avoid using catch(…). Prefer to handle more specific exceptions, like catching an out of memory exception (std::bad_alloc). While using catch(…) may prevent the application from crashing, it can also hide bugs and make it more difficult to solve crashes. It is useful for debugging to actually remove a catch(…) and let the program crash exactly where the cause of the crash is located. You should generally catch only those errors that you can handle, and let the ones that you cannot pass through, so that the larger system can handle them if possible, or crash in the “correct” place rather than delay it.

Now, catch(…) can be used when it does something to correct the program state. This should be done only after careful consideration, usually with multiple developers. Side effects need to be considered as well. If a catch is used to wrap a call to thousands of 3ds Max functions, then it probably shouldn’t be used. However, wrapping a call to a 3rd party library is acceptable. Everything needs to be balanced of course.
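
A sketch of the narrow form (ParseThirdPartyFile and ReportOutOfMemory are hypothetical stand-ins):

#include <new> // std::bad_alloc

void ParseThirdPartyFile(const char* path); // a wrapper around a 3rd party library
void ReportOutOfMemory();                   // some error reporter

void LoadSettings(const char* path)
{
    try
    {
        ParseThirdPartyFile(path); // wrapping a 3rd party library call is acceptable
    }
    catch (const std::bad_alloc&)
    {
        // The one failure we know how to handle: report it and bail out cleanly.
        ReportOutOfMemory();
    }
    // Anything else propagates, so a crash lands near its real cause.
}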

Regular expressions can easily be written to help search for empty catch statements. The static analyzer PVS-Studio will also help identify these.

Case in Point:

I regularly review the usage of catch(…) in the source code, and have over the years taken out a few of them. As a result, the clarity of crashes reported from customers in the field has increased.

Use Debug Builds

When debug builds are available, they should be used for testing and development. 3ds Max is the only M&E product that provides debug builds to ADN partners, although they may be slow in delivery. However, despite the delays, a debug build provides a great resource for validating your plugins.

[Note: It turns out to be very ironic that I put this here, since the 3dsmax team does not use debug builds. Sure, the devs do, but in all my years there I could never get management to have the QA testers use debug builds. Nevertheless I believe in debug builds, and that they are far superior for testing than release builds.]

Watch log file, asserts and the debug output

Log File

3dsmax has a log file that it writes to <max install>\network\max.log. This file is mainly used for debugging network rendering, which was its original purpose. However, it has grown to become a popular logging mechanism for max. This log can provide useful information, but it is still under-utilized and cannot be expected to report program state consistently across the product.


Asserts

Do not ignore asserts (remember debug builds?). Use asserts liberally in your own code, and don’t suppress asserts unless they are logged and checked afterwards (for example, using automated testing). The assert system will automatically log all asserts (whether they are suppressed or not) to the file: <max install>\3dsmax.assert.log.
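A one-line example, assuming the SDK’s DbgAssert macro (assert1.h) and a hypothetical node pointer:

DbgAssert(node != nullptr);   // checked in debug builds; suppressed or not,
                              // it ends up in 3dsmax.assert.log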

Debug output window

The Visual Studio debug output window provides significant output and can be useful to watch during debugging sessions. Be sure to turn on display for all types of events, including exceptions (very important) and regular program messages. If you want to check debug output without attaching a debugger, then you can use DbgView, a Microsoft tool available from the Sysinternals website.

Disclaimer: The MetaSL system parses a lot of files when 3ds Max starts up. This will generate a lot of exceptions that are benign, so not to worry. The reason is that the MetaSL system, from Mental Images, uses a 3rd party library (antlr) to parse files, which in turn uses exceptions for program flow.

Enable Break On Exception

Visual Studio has debugging options that allow it to break execution when an exception is thrown. This should be used as often as possible. It is the corollary to the “Avoid using catch(…)” advice above. There are a few places where max actually does use catch(…), for example in the maxscript kernel. By enabling this feature, exceptions are immediately brought to the attention of the developer.

Max Specific Problems

Do not hold naked simple pointers to ReferenceTarget’s

A class that is not a ReferenceMaker should not hold a plain old pointer to a ReferenceTarget (or to a class that derives from ReferenceTarget) without some mechanism to ensure the validity of the pointer before use (i.e. AnimHandles). Instead, replace the simple pointer with a SingleRefMaker class instance, and have that observe the ReferenceTarget.

Good:

class Good {
	…
	SingleRefMaker mObserve;    // observes the target and keeps the pointer valid
};

Bad:

class Risky {
	…
	ReferenceTarget* mObserve;  // naked pointer: can dangle when the target is deleted
};

Do not write dynamic arrays of ReferenceTarget’s

Do not write a class that holds an array of ReferenceTarget’s, especially when that array grows and shrinks at runtime.

A class like this usually has a container that holds pointers to ReferenceTargets. It usually overrides ReferenceMaker::NumRefs like this:

int NumRefs() { return myArray.Count(); }

Instead of a fixed number of items:

int NumRefs() { return 3; }

This cannot be done correctly without considering undo and redo (subclassing class RestoreObj). The fundamental weakness of the reference system is that it expects references to be at fixed positions. That reference index is an internal implementation detail of the ReferenceMaker that should be invisible to clients. However, clients routinely use the reference index to get at a certain Target, and one of those clients is the undo system.

One complication is that the undo system usually expects that internal array to never shrink in size. If a ReferenceTarget is removed from the internal array, a RestoreObj may still point to its old reference slot. The reference system has no idea that the internal array shrank, so if an undo action occurs it may stick that reference back into the wrong slot. To avoid that, a common practice is to make dynamic reference arrays grow but never shrink, which wastes memory.

For example: undo and redo can change the size of the internal array via SetReference. So if you have an array with 10 ReferenceTarget’s and your undo/redo object happens to ‘redo’ and stick a reference back in at slot 5, all your other pointers from index 5 to 10 have now had their indexes bumped up by one. Anything dependent on, or holding on to, those moved ReferenceTarget pointers is now dangling.

There are a few alternatives to this:

  • Use class IRefTargContainer.
  • Use an array of AnimHandle’s.
  • Use a ParameterBlock.
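To contrast with the dynamic-array approach, here is a minimal sketch of the fixed-slot pattern the reference system was designed around (the class and member names are illustrative; NumRefs, GetReference and SetReference are the standard ReferenceMaker overrides):

class MyMaker : public ReferenceMaker {
	ReferenceTarget* mTargets[3];   // fixed slot count, fixed meaning per slot
public:
	int NumRefs() override { return 3; }
	RefTargetHandle GetReference(int i) override { return mTargets[i]; }
protected:
	void SetReference(int i, RefTargetHandle rtarg) override { mTargets[i] = rtarg; }
};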

Do not access the Reference System after NOTIFY_SYSTEM_SHUTDOWN

The notification message NOTIFY_SYSTEM_SHUTDOWN (See notify.h) is broadcast before plugins are unloaded. It is critically important to drop all references to plugins in response to this message. There are many plugin modules that define ReferenceTargets that will then get unloaded shortly afterwards. Once the plugin module is unloaded, trying to access a ReferenceTarget defined in that module can result in a crash.

Do minimal work in DllMain

The MSDN docs state that minimal work should be done in DllMain. Specifically it warns against loader lock, among other things. DllMain can be called as a result of LoadLibrary, and while your DllMain is active the loader holds a critical section. If you do work that, for example, needs another DLL to get loaded, it can deadlock the application. Instead of doing work in DllMain, there are a few other ways to do plugin initialization and uninitialization. For example:

  • You can do uninitialization work in response to NOTIFY_SYSTEM_SHUTDOWN (see notify.h).
  • You can and should use the LibInitialize and LibShutdown functions.

A similar warning: do not do heavy work in the constructors of static variables, because a static variable gets constructed close in time to when DllMain is called. When the static variable is constructed, the DLL may not be fully loaded, and types needed by the constructor may not be available yet.
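Put together, a minimal sketch of a well behaved DllMain (bookkeeping only):

#include <windows.h>

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID /*lpvReserved*/)
{
	if (fdwReason == DLL_PROCESS_ATTACH) {
		// bookkeeping only: no LoadLibrary, no GetCOREInterface(), no heavy work
		DisableThreadLibraryCalls(hinstDLL);
	}
	return TRUE;   // real setup and teardown belong in LibInitialize / LibShutdown
}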

Do not violate party etiquette

Uninvited guests should not crash the 3ds Max party. When the party is over: go home.

Uninvited guests

Every plugin has an appropriate time in which it should be initialized, do its work and shutdown. For example:

  • A plugin for a color picker should not instantiate one when 3ds max starts up.
  • A plugin for a scene browser should be active ONLY when its UI is active.

It is entirely possible and probable that users can start max and NEVER use your plugin. Therefore do not waste memory and resources on a feature that may not get used; do the work when users actually invoke your feature. In other words, when 3ds Max starts up, the plugin should not invite itself to the 3ds Max party, it should wait for an invitation.

This rule is violated on startup by loading 3rd party libraries, instantiating plugin classes, holding pointers to the node scene graph and registering callbacks to common scene events (my favorite pet peeve: “Hey, max crashed in this function even though I never used this feature?”). When max loads a plugin, the major things 3ds Max requires from it are:

  • The number of class descriptors.
  • A way to get those class descriptors.
  • Pointers to the LibInitialize and LibShutdown functions.

Therefore class descriptors really are the only things that should be instantiated on module load or startup. There should be no static instances of the actual plugin class, whether it is a material plugin, shadow, utility, or renderer. Of course there are exceptions, such as function published interfaces and parameter block descriptors that often are statically defined: but I’m not talking about those.
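For example, a minimal sketch of those exports (GetMyClassDesc is a hypothetical accessor returning a statically defined class descriptor):

extern "C" __declspec(dllexport) const TCHAR* LibDescription() { return _T("My plugin"); }
extern "C" __declspec(dllexport) int LibNumberClasses()        { return 1; }
extern "C" __declspec(dllexport) ClassDesc* LibClassDesc(int i){ return (i == 0) ? GetMyClassDesc() : nullptr; }
extern "C" __declspec(dllexport) ULONG LibVersion()            { return VERSION_3DSMAX; }
extern "C" __declspec(dllexport) int LibInitialize()           { return TRUE; }   // keep this light
extern "C" __declspec(dllexport) int LibShutdown()             { return TRUE; }   // mirror of LibInitialize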

No loitering

When 3ds Max shuts down, it sends out the most important broadcast notification in all of 3ds Max (found in notify.h): NOTIFY_SYSTEM_SHUTDOWN. This means the 3ds Max party is over. The plugin should completely shut itself down and disassociate itself from all max data. For example: all References should be dropped, all arrays holding pointers to INode’s should be cleared out, etc… And most common and most dangerous: all callback functions that were registered should be unregistered.

When NOTIFY_SYSTEM_SHUTDOWN is broadcast, the entire max scene is still intact and in a completely valid state. As 3ds Max progresses in its shutdown sequence, less and less of the max scene remains valid during any later callbacks or notifications. For instance, the other shutdown notification, NOTIFY_SYSTEM_SHUTDOWN2, is sent merely when the main 3dsmax window (think HWND) is destroyed. No plugin should be responding to that message to (for example) iterate through the scene graph. Likewise the LibShutdown function should not be iterating the scene graph.
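A minimal sketch, assuming the RegisterNotification / UnRegisterNotification API from notify.h (the cleanup body is illustrative):

static void OnMaxShutdown(void* /*param*/, NotifyInfo* /*info*/)
{
	// the scene is still fully valid here: drop all References, clear any
	// arrays of INode pointers, and unregister every other callback
}

// register once the plugin is actually initialized (not in DllMain):
RegisterNotification(OnMaxShutdown, nullptr, NOTIFY_SYSTEM_SHUTDOWN);

// and pair it with:
UnRegisterNotification(OnMaxShutdown, nullptr, NOTIFY_SYSTEM_SHUTDOWN);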

Case In Point

Say that a plugin depends on another library like this:

plugin.dll -> library.dll

When the plugin is loaded by max, the dependent library will also (automatically) get loaded. But when the plugin is unloaded, the dependent library will not get unloaded unless its reference count is decremented to zero, which will not happen unless FreeLibrary is specifically called on library.dll (which is neither a common nor a recommended practice). Thus the library will get freed or shut down long after WinMain exits and max has uninitialized and is gone. Therefore the dependent library should not contain any dependencies on anything in the 3ds Max SDK. For example, GetCOREInterface() should never be called in the DllMain of a module that a plugin depends on (i.e. library.dll).

Quality Testing

Developers can implement the following practices in their software development processes:

Automated regression testing

All good production pipelines should have regression testing that occurs automatically after a build. This is critical to help catch bugs before they get to customers in the field. The developers should also have access to these automated tests, so that they can run them before submitting their code.

Dynamic Memory Analysis

This means using 3rd party tools to profile, analyze, check and verify memory during runtime of the application.

The following list of tools is a partial example of what is available:

  • MicroFocus BoundsChecker: checks for memory leaks and memory allocation mismatches, among a host of other things.
  • Microsoft’s Application Verifier: checks for various memory problems during runtime, such as accessing an array out of bounds.
  • Visual Leak Detector: an open source tool that checks for memory leaks. It is fast, efficient and stable.

Code coverage

This is using a tool to measure how much of your application or plugin was actually exercised during testing. This helps a developer know when they have tested the product enough, and can help them find areas they have not tested. Simply put, untested code is buggy code, and a code coverage tool helps in this regard. The best tool I have ever seen for this is Bullseye. It works for native C++, is easy to use, and is very fast. It requires instrumentation of the code during the build, which can double the build time, but runtime performance is excellent.

Beware the hook hiding under the clutter

A few years ago, while working on a release of 3dsMax (2014?), I was fixing warnings that came from a static analysis tool (PVS-Studio). The analyzer had identified code in the status panel that could lead to possible memory corruption. As I started to look around in the code related to the status panel, I found some code for unrelated features and some dead code. In other words, code that was not where it was supposed to be.

These features used to be on the status panel, but have since been moved or removed. For instance, the snaps used to be on the status panel (remember, in max 3?), but have since moved to the toolbar at the top of the app. What I found was that all the old features that had ever been on the status panel were still there! When the UI got updated over the years, the programmer(s) just turned off the UI features and simply left the old code. They didn’t even comment out the code, or indicate that a particular UI widget was not used… It was just abandoned. It was hard to know what was still relevant, and what was not.

The snaps at least were still relevant: I put in a breakpoint, executed some snaps and hit that code! Great! But wait! The snaps were now in a different functional area of the UI than the status panel. To be exact, for people not familiar with 3dsmax, the snaps are at the top of the screen on the main toolbar, and the status panel is the row of UI controls at the bottom of the screen. So why should toolbar code be hiding in the status panel code? What a zig zag! Who cares? The computer doesn’t care where the code is; as long as the appropriate function gets compiled to an object file and is available to the linker, that code can be anywhere. So why should humans care? Because humans are much more expensive to operate than computers. Humans read the code and have to understand it first. So as long as that feature was hiding in the status panel, it was as good as taking a random book in the library, ripping off the catalogue ID, and placing it in a random location in the library.

A disorganized library. In code, this is typified by dead code and features implemented in the wrong files.

The tax associated with disorganization in code is immense. I’ve seen it first hand. I remember the first time I tried to find the code for the material editor. It took me an hour. I think I had to find some text that appeared somewhere in the material editor, and do a brute force text search of every file in the source code. Then look up some more things, and do more brute force text searches, until I found the code I was looking for. It was the equivalent of taking a book in the library, ripping off the cover with its catalogue ID, putting on a different cover with an obfuscated name (generally with no vowels), and putting the book somewhere else in the library. Good luck finding it.

For an example of that, try finding the code for the editable mesh modifier in the public SDK of 3dsmax. You will have a hard time if you look for editablemesh.vcxproj, since it’s actually called pack1.vcxproj. Thankfully it is in a folder called /samples/mesh/editablemesh. (btw, the SDK sample code is only organized because of work I did way back in 2005).

Which leads us to another problem: a feature will be called one thing in the documentation, have a completely different name for its source code files, and have completely different class names in the code. Imagine a code file named foo.cpp, where the class names in the file are variants of the name baz, and the feature as exposed to the customer is named wizz. It all makes learning the lay of the land a confusing job. Maps obviously don’t exist in this land, and only through experience do you learn where things are. Browsing the code is simply useless.

So I started to move the misplaced code. I used a code coverage tool called Bullseye to aid me in my testing. At one point I had moved or removed most of the code and hit around 85% functional coverage, and something slightly less for MCDC coverage. (MCDC coverage is the real indicator of code coverage, btw.) But there was this stubborn windows hook code that I could just not hit. Doing a brute force text search for the hook, I could not find where it was used. It appeared to be dead code. So I removed it.

Fast forward a few months, and a bug report came that the cross hair feature was broken. That is where the cursor in the viewport can be replaced by cross hair lines. Ok, what caused it? My removal of the apparently dead hook. Ok, sorry about that.

Now if you know anything about hooks, you must know that they should only be used as a last resort. And even then, I and a few colleagues agree that using a hook is still generally the wrong approach. Hooks are an evil API that literally traps every single message that gets passed through your message pump (millions in a typical session). And for each message the system will execute your callback, and allow you to do who knows what with the message. There is no way that hooks can have a beneficial effect on performance. In fact, with dozens (or more?) of messages getting sent every second, they can only have a negative effect on performance.

So the cross hair bug was reported, and the dev who was fixing it (not me) asked to revert the code changes by putting back the hook to fix the cross hair cursor. Not so fast, I said. Why was the hook code in the status panel in the first place? Viewport code had its own place in the code, and this hook code was far, far removed from that! Finding this hook here was like finding a gas line in the glove compartment of an automobile: in other words, a complete and utter hack. This cross hair business had no place in status panel code. Something smelled fishy here. So I suggested that the other developer research the problem in greater depth and find another solution to the problem of the hook and crosshair. They did, and in a week or so they came back with a simple fix consisting of setting a variable (or something trivial like that) to turn on the cross hair. And best of all, that fix was in its proper location: the viewport code. Excellent, a great fix! Congratulations to the developer who found the real place to fix it.

Meanwhile once I found out what the hook was for, I researched it a little more and found out how really bad the hook was. That hook got installed and activated when max first started up, and stuck around until max shut down. It didn’t matter if you used the 3D viewport window cross hair or not, that hook was installed and running 100% of the time. Sucking CPU time for every message it trapped.

The incredible expanding layer bug

So a few months ago at work (January), a bug came across my desk for a performance problem in 3dsmax. I had always heard of bugs where someone in an architect’s office would load a file, and it would take 45-50 minutes. Perhaps the file was an autocad file with hundreds or thousands of layers in it. I’ve even had a guy at an architect’s office tell me they had files that took an hour to load…. Just incredible. Anyway, this bug was that it would take almost a half hour to create 1000 layers in max. The guy who logged the defect even gave us a maxscript to help reproduce the problem:

	layerCount = 1000
	for i = 1 to layerCount do
	(
		t = timestamp()
		mLayer = LayerManager.newLayerFromName ("blarg_" + i as string)
		t = timestamp() - t
		format "%,%\n" i t
	)

I first gave the script a run in order to reproduce the problem, and indeed it took a long, long time. The first run seemed to take an hour, but I wanted better numbers than that. So I modified the script to give me the total time.

The final result was 35 minutes to complete everything, during the course of which the 3dsmax maxscript listener simply stopped responding. Finally it finished, and I dumped the printed results into an Excel spreadsheet and plotted them.

The following chart plots the time (in blue) it takes to create 1000 layers. The Nth layer, from 1 to 1000, is on the horizontal X-axis. The time per layer, in milliseconds, is on the vertical Y-axis.


By the time the 1000th layer was created, each new layer took nearly 5 seconds. *Ouch*. The blue curve is a classic parabolic shape, in fact some form of an N squared polynomial. This performance degradation is a classic non-linear form. Contrast that with the red line, the expected performance result. Anyway, finding the bug was the real problem at hand. Why was it so slow?

My experiments were of course ridiculously slow to run: make a change, then wait 35 minutes to test it. Finally I stumbled upon a function call to update some UI. I commented it out and ran the script again. The result astounded me: 4 seconds! The code I removed simply meant that, whenever a layer was created, it would update that little layer dropdown list on the layer toolbar:


Remember that little guy? That UI tool that no one likes or uses anymore? The problem was that the little layer dropdown list would add the new layer to some data structure, and then re-sort all the layers. This was a classic n squared iteration over all the layers: the more layers, the more sorting you have to do. Obviously a performance nightmare.

Once I temporarily removed that UI update function call, the time per layer was so low that it wouldn’t even register on the graph shown above. But creating layers should update that UI dropdown list eventually, right? So if we remove the function call, how will it get updated? To fix that, I simply put in place a call to suspend UI updates, and another to resume UI updates, for that UI dropdown list. Before creating the 1000 layers, I call the suspend function, and afterwards I call the resume function. So in the end, the layer UI dropdown list gets updated only once.
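In spirit, the fix looks like this (a minimal sketch; SuspendUpdates, ResumeUpdates and CreateLayer are hypothetical stand-ins for the actual internal calls):

layerDropdown->SuspendUpdates();    // hypothetical: stop the per-layer re-sorting
for (int i = 0; i < 1000; ++i)
	CreateLayer(i);                 // hypothetical layer creation call
layerDropdown->ResumeUpdates();     // hypothetical: sort and redraw exactly once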

My favorite blogger, Joel Spolsky, wrote about this pattern in a classic piece on “Shlemiel the painter’s algorithm”.

How to use Perforce’s P4.NET API for basic tasks

So recently I had a task to rename some files in Perforce. Ok, one or two files is easy; that can be done by hand. But this task required renaming around 800 files based on data from an external source. So I started looking for a .NET API that would allow me to access Perforce using C# rather than the command line.

Thankfully Perforce has one.

The only problem is that the documentation is basically no good.

The help docs have one measly page that gives a tutorial on how it works, and that is only for connecting and disconnecting. Most of the rest of the reference guide contains no description of what anything does, what anything is, or how to use it. The docs are basically just a skeleton of a help file. And on every page is this disclaimer: [This is preliminary documentation and is subject to change.]. Indeed it is. The docs are so bare, they are basically useless. Given a function signature that has no description, like this:


public List<FileSpec> IntegrateFiles(IList<FileSpec> toFiles, FileSpec fromFile, Options options)

toFiles
Type: System.Collections.Generic.IList(FileSpec)

fromFile
Type: Perforce.P4.FileSpec

options
Type: Perforce.P4.Options

Return Value
(no description given)
You basically have to guess how to use this API!

To add insult to the whole comical affair, Perforce did add documentation for this particular method, not for the parameters as you see above, but for the method in general. But it was docs for the command line! As if the command line syntax and idioms could possibly apply to C#! In stunned disbelief I had to press forward, as this was my only option for accessing Perforce using a .NET API. I had used a different Perforce .NET API years ago, but it was so old that it wasn’t compatible with .NET 4.0, or with the new perforce servers we use now.

So the point of this blog post is not to complain, but to offer a solution by showing how I figured all this stuff out.

First off I wrote a class to encapsulate the credentials needed to log on to the Perforce Server. It contained data like the server name and port, the username and the client spec:

public class PerforceID
{
	public PerforceID(String serverURI, String user_name, String client_spec)
	{
		mServerURI = serverURI;
		mUserName = user_name;
		mClientSpec = client_spec;
	}

	private String mServerURI;
	public System.String ServerURI
	{
		get { return mServerURI; }
	}

	private String mUserName;
	public System.String UserName
	{
		get { return mUserName; }
	}

	private String mClientSpec;
	public System.String ClientSpec
	{
		get { return mClientSpec; }
	}
}

This instance is constructed with the appropriate data and used later.

I kept all my methods in a static class, with just a few static data members.

How to Connect to a Perforce Server

Obviously connecting to the perforce server was of paramount importance. Thankfully the Perforce documentation did describe how to connect (at least). So here I basically borrow from their documentation, except I am using my own static variables. And as you can see I am using my own PerforceID instance to provide the server address (id.ServerURI), the user name (id.UserName) and the client spec I want to use (id.ClientSpec).

private static Perforce.P4.Connection mPerforceConnection;
private static Perforce.P4.Repository mRepository;

public static void Init(PerforceID id)
{
	// initialize the connection variables
	// note: this is a connection without using a password

	// define the server, repository and connection
	Server server = new Server(new ServerAddress(id.ServerURI));
	mRepository = new Repository(server);
	mPerforceConnection = mRepository.Connection;

	// use the connection variables for this connection
	mPerforceConnection.UserName = id.UserName;
	mPerforceConnection.Client = new Client();
	mPerforceConnection.Client.Name = id.ClientSpec;

	// connect to the server (no options needed for a password-less connection)
	mPerforceConnection.Connect(null);
}
As you can see above, the Repository and Connection instances are needed later, which is why I keep a hold of them.

How to Disconnect from a Perforce Server

This part is easy (again borrowed from their docs).

public static void UnInit()
{
	if (mPerforceConnection != null)
	{
		mPerforceConnection.Disconnect();
		mPerforceConnection = null;
	}
}

How to open a file for edit

Editing a file is the most basic operation I could hope to do. Unfortunately it was not straightforward at all. Given a string that contains the full file path, this method will open the file for editing in Perforce:

public static bool Edit(String filename)
{
	Perforce.P4.Options options = new Options();
	mPerforceConnection.Client.EditFiles(options, new FileSpec[] { new FileSpec(new ClientPath(filename)) });
	return true;
}

It was not my intention to open it for editing in any particular changelist. Thus after calling this method, the file will be opened for editing in the default changelist.

I found that the Perforce.P4.Client class contained a lot of the methods I would need to do various familiar operations. Operations that you could logically imagine a user doing manually such as Adding files to a changelist, deleting files, integrating files, shelving and unshelving, merging, moving, locking files etc…

How to integrate a file

Integration or branching (versus copying) is important because it retains the file history. For this operation you need two arguments: the old and new file name. Both must be fully qualified path names containing the directory and file.

using Perforce.P4;

public static bool Integrate(String old_name, String new_name)
{
	bool result = false;
	try
	{
		var from = new FileSpec(new ClientPath(old_name), VersionSpec.Head);
		var to = new FileSpec(new ClientPath(new_name), VersionSpec.None);

		mPerforceConnection.Client.IntegrateFiles(from, options, to);
		result = true;
	}
	catch (Exception e)
	{
		Console.WriteLine("Unknown exception calling Perforce: " + e.Message);
		result = false;
	}
	return result;
}

// instantiated once, outside the method, since Integrate gets called hundreds of times
static IntegrateFilesCmdOptions options = new IntegrateFilesCmdOptions(IntegrateFilesCmdFlags.None, -1, 0, null, null, null);

The reason I instantiated the IntegrateFilesCmdOptions instance outside of the function is that I had to call this function hundreds of times, so it didn’t make sense to redo that instantiation every time. Also, like all my Perforce calls, I wrapped it in an exception handler, since I was guessing at how the API worked. For instance, I had to guess what parameters I could pass into my options instance above. This will integrate the files and put them in the default changelist. The last 3 parameters I still have no idea what they do, but passing in null works!

This method works; the only problem is that calling it 800 times is very slow. After about 30 seconds of this I decided I needed a faster approach. I needed to integrate 800 files in a few seconds.

How to integrate lots of files (fast)

So given the requirement that I needed to integrate hundreds of files, here is the fast way to do it. I needed to replace 800 calls to the server with just a few. This is done by first creating a branch spec and then integrating the branch. The branch spec describes which files get branched to where, and the integrate command is given the branch spec as its main specification. Basically two commands get sent to the perforce server instead of a few hundred. The result is spectacularly fast compared to the old way.

At the heart of a branch definition is a mapping of old files to the new files. The old file is always on the left, and the new file is on the right. So given that I want to integrate or branch file A.cpp to a new location and rename it to B.cpp, the mapping would look like this:

//depot/foo/A.cpp       //depot/new/output/B.cpp

How to Create a branch

Given those file specifications, creating a branch spec also requires giving it a name, and specifying the user name:

using Perforce.P4;

public static bool CreateBranch(String branch_id, String username, Dictionary<String, String> fileList)
{
	try
	{
		// build the view map: old file on the left, new file on the right
		var view_map = new ViewMap();
		foreach (var pair in fileList)
		{
			var from = new ClientPath(pair.Key);
			var to = new ClientPath(pair.Value);
			view_map.Add(new MapEntry(MapType.Include, from, to));
		}

		var msg = "Created programmatically via the Perforce .NET api. Used for integrating lots of files at once.";
		bool locked = true;
		var branch = new BranchSpec(branch_id, username, DateTime.Now, DateTime.Now, msg, locked, view_map, null, null);
		var created = mRepository.CreateBranchSpec(branch);
		Debug.Assert(created != null);
	}
	catch (Exception e)
	{
		Console.WriteLine("Unknown exception calling Perforce: " + e.Message);
		return false;
	}
	return true;
}

Here the first argument to the method, branch_id, is the unique identifying name of the branch. The username argument is required for some reason. The last argument, the Dictionary, contains the file mappings from old to new names. The method returns true if it succeeded and false otherwise. Again I wrapped it in an exception handler, because I had to figure out how to use this API on my own, by guessing and a lot of trial and error.

To verify that the branch was created, I opened perforce and browsed the branch specs that were owned by me, and indeed the new branch spec showed up in the list.

How to Integrate a branch spec

Now that my branch spec is defined, I can easily integrate it using just the unique name I gave it when the branch was created:

using Perforce.P4;

public static void IntegrateBranch(String branch_id)
{
	try
	{
		// the -1 values were found by trial and error (see above)
		var change_list = -1;
		var max_files = -1;
		var branch_options = new IntegrateFilesCmdOptions(IntegrateFilesCmdFlags.None, change_list, max_files, branch_id, null, null);
		var created = mPerforceConnection.Client.IntegrateFiles(branch_options, null);
		Debug.Assert(created != null);
	}
	catch (Exception e)
	{
		Console.WriteLine("Unknown exception calling Perforce: " + e.Message);
	}
}

So the two methods above could be called in succession like this:

CreateBranch("mybranch", "John_Doe", ….);
IntegrateBranch("mybranch");