Over-engineered build systems from Hell

While I was at Autodesk years ago, we went through various build systems. When I first started, the build was written in Perl, with dependencies specified in a .csv file. I had no idea how it worked, nor did I care, since I was interested in other things during those years. The build could routinely take an hour and 45 minutes, and it was mind-numbingly boring since we usually had to do it every day, and if you were unlucky, multiple times per day. Worse, on the build servers the build would routinely take almost 4 hours!

What a horrible tax.

Later on, a hot-shot programmer came along and rewrote the build system to use ruby and rake. It was supposed to be faster, which it kind of was. But the complexity was just as bad, and no one else knew ruby or how rake worked. Then the developer left the company, leaving behind a black box. That silo of information was gone, and no one knew how the build system worked. It took two developers about a year and a half to learn the build system well enough to work on it at the level the original developer had.

To be sure, there were still problems with the build. It still took a long time to build the product, somewhere near an hour. About the only thing we gained was the ability to do incremental builds.

But modifying the build was the main problem. The build, written in ruby, completely reinvented the wheel in so many different areas. To understand this better, you have to understand that the product at the time was built entirely on the Microsoft platform, using Microsoft tools. The source project files were in Microsoft's own build language, MSBuild, which is built into visual studio. But instead of using Microsoft tools to drive the build, ruby and rake were used. Instead of letting MSBuild parse the xml project files, a ruby gem was used. Instead of leveraging anything from Microsoft, everything was re-invented from scratch. Parsing visual studio .vcproj (and eventually .vcxproj) files was done tediously, laboriously and mind-numbingly using rake and some xml gem. Talk about recreating the wheel! So much code was written to duplicate what a single call to a Microsoft API could have done: return a fully instantiated project object with all of its properties intact.

Copying files to the build directory was another disaster. It took around 10 to 12 minutes to copy somewhere between 7,000 and 14,000 files (it started near 7,000 and grew over time). All of it was written in ruby code that no one knew how to debug except by putting in print statements.

Another problem was adding build properties. If you wanted to add a build property (a key-value pair), you had to add it to multiple places in the ruby build, knowing exactly what to modify (in duplicate) in each. It was horrible.

Mixing ruby with MSBuild was like mixing iron and clay: they don't mix at all. It was a ruby straitjacket that hindered both the build and the visual studio projects it was wrapped around.

There had to be a better way.

Eventually, when the frustrations with the build boiled over, I learned MSBuild and figured out how to build max without ruby. It took over a year, from when I first got a working prototype, to get something into the main branch of max, simply due to bureaucratic inertia. There are lots of people in positions of power who simply say no before learning about a subject, which was all too common there. The first liberation was freeing the list of things to build from ruby. Yes, the list of DLLs and EXEs to build was specified in some arcane ruby syntax somewhere. The first step was moving that list into a democratic XML file, so any build language could parse it and find out what to build. The second was moving the list of files to copy into an XML file as well, so any build system could know what files to copy.

Once those two things were in place, it was time to put in the build system that I originally came up with during a particular Christmas break.

It was written in pure MSBuild XML, with one MSBuild extension written in C#. All the tools were native to visual studio, and what you built on the command line was exactly what was built inside visual studio. Both used the same properties (via native property sheets) and built in the same way.

What's more, I found that using the native MSBuild copy mechanism to stage those 7,000+ files was incredibly fast. In fact, while once debugging through the old ruby code responsible for copying, I found the source of the 10-minute copy task: it was using an N-factorial algorithm (the sketch below shows the general shape). Given a directory A with subdirectories B through Z, it would re-walk the directories over and over; each directory was visited not once, but again and again in proportion to the number of sub-directories that existed. It was an unholy mess, and proof that re-inventing the wheel is usually a disaster waiting to happen. Back to the improvement: with the new msbuild copy mechanism it took 20 seconds to copy all those files. 20 seconds versus 10 minutes was a big improvement.
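
The original ruby is long gone, so this is only a rough C++ illustration of the shape of the problem (directory walking via std::filesystem; the function names are made up): the tree gets re-walked for every entry it finds, instead of being walked once.

#include <filesystem>
#include <vector>
namespace fs = std::filesystem;

// Pathological: every entry found triggers another full walk of the same tree.
void CollectSlow(const fs::path& root, std::vector<fs::path>& out)
{
	for (const auto& entry : fs::recursive_directory_iterator(root))
	{
		for (const auto& check : fs::recursive_directory_iterator(root))
			(void)check;                 // a redundant "verification" pass per entry
		out.push_back(entry.path());
	}
}

// Sane: one pass over the tree.
void CollectFast(const fs::path& root, std::vector<fs::path>& out)
{
	for (const auto& entry : fs::recursive_directory_iterator(root))
		out.push_back(entry.path());
}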

Incremental builds also greatly improved. Go ahead and build your product from scratch. Now don't change a thing and rebuild. If you have a smart build system, it should take just a few seconds and do nothing: the build system should be smart enough to report that nothing changed and therefore it did no work. My new build did exactly that in a matter of seconds. The old build system used to take about 5 minutes to do that (and it still did work anyway…).

Speaking of performance: the time to compile and link the actual files didn't change much, because that was always in visual studio's corner and not ruby's. The improvement came from the copying, which now took 20 seconds, and from the much shorter time between starting the build and the first CPP file getting compiled. In ruby/rake that took quite a few minutes; in the new build it took a few seconds. Eventually, when I got an SSD, I was able to get the build down to 22 minutes on my machine.

Better yet was the removal of duplication. Everything was native: iron was mixed with iron and clay with clay. Sort of…

One developer (who we called Doctor No, because he said no to everything good), holding on to the fantasy that max would be multi-platform someday, would not let ruby go. So in essence there were two build systems that could do the same thing. In fact he wanted an option to invoke one build system from the other, so I had to put in an option to invoke msbuild from ruby/rake! This hobbled msbuild with an old clunker, kind of like buying a new car and towing the old one around everywhere you go. Yes, extremely stupid and frustrating.

Which goes to show that old ways of thinking die hard, or don’t die at all.

The build at Century Software

Later on I moved to Century Software, a local company close to where I live. That was such a fun place. Anyway, their build system for their windows product was written in Make! Yes, Make: the original, ancient build system that you can't even find documentation for anymore. I mean literally; I found (I think) one page of documentation somewhere in some professor's lecture notes. The docs were horrible. The Make implementation here was so old it built one C file at a time. No multi-threading, no parallel builds, nothing. Slow was the operative word, that and a build output so verbose it was almost impossible to comprehend. The only good thing about it was that it immediately stopped on the first error.

So eventually I rebuilt that using MSBuild too. It took me a few months in my spare time. No bureaucratic inertia, no one telling me no; I just worked on it in my spare time and eventually had a complete, fully functioning build for the tinyterm product. This build was the best I've ever done, with zero duplication, extremely small project files and a build that was very fast. It went from 45 minutes to a minute and a half.

When writing a product, the build system should be done using the tools that the platform provides. There should be no transmogrifying the code or the build scripts (like openssl does) before doing the build. When writing ruby on rails, use rake for your build processes. When targeting Microsoft platforms, use msbuild. If you are using java, use maven. Data that must be shared between build platforms should be in xml so that anything can parse it. And most important of all, distrust must go, and developers and managers must have an open mind to new things. Otherwise the development process will take so long, be so costly, and exert such a tax that new features will suffer, bugs will not get fixed, customers will not be served, and they will take their money elsewhere.


Writing Stable 3ds Max Plugins

I found this document while looking through old files today, and thought I’d share it. It was from a lecture I gave at Autodesk University back in 2012. It applies to 3ds max, but has some points that would be applicable to software development in general.

[Note] I wrote this a long time ago, and today I saw this blog post again. In fairness, I thought I should add a few things. These were guidelines I came up with based on years of experience and years of fixing bugs in 3dsmax. While I firmly believe in every single one of them, unfortunately hardly any of these things ever entered the thoughts of most 3dsmax developers. Most coded all day long blissfully unaware of warning level 4, and only two people ever showed an interest in static analysis. Management most of the time was completely unsympathetic to these ideas; even the development managers who used to be programmers simply didn't care. They just wanted bugs fixed fast. No time or interest was given to systematically fixing the fence at the top of the cliff. All thought went to getting ambulances to the bodies at the bottom of the cliff as fast as possible. As a result, the fences at the top were always full of holes. By the time I left Autodesk in the spring of 2014, only a dozen or so projects compiled at warning level 4, and no systematic static analysis was being done by anyone. I could go on, but that's a thought for another blog post.

Introduction

Preventing crashes and increasing stability in software is a difficult task. There is no practice nor set of practices that will completely prevent all crashes. However there are a few things that will help reduce errors. With that short introduction let us get started.

Basic Responsibilities

These are basic practices that would apply no matter where you worked and no matter which product you worked on.

Compile at warning level 4

You should compile your plugins at warning level 4. Level 4 warnings can help point out subtle programming errors that lead to bugs which are absurdly difficult to discover later on. This is a free and practically instantaneous way to find bugs early in the development process.

Code compiled at level 4 is better than code compiled at anything less. Level 4 warnings should be turned on, and no warnings should be suppressed.
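
As a small illustration (a made-up function, not code from max), this is the kind of subtle mistake that stays silent at lower levels but gets flagged at level 4 (warning C4706, assignment within a conditional expression):

bool IsReady(int state)
{
	int ready = 0;
	if (ready = state)     // meant ==, wrote =; /W4 reports C4706, /W3 stays quiet
		return true;
	return false;
}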

The 3ds Max SDK compiles cleanly at warning level 4, and has been that way for at least 3 years now.

Case in Point:

We turned on warning level 4 for an old project recently. One level 4 warning pointed to some unreachable code, caused by a break statement that had been left in a loop. This problem eventually resulted in a complete feature not working.
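
The bug looked roughly like this (the names are made up; this is just a sketch of the pattern, which warning level 4 flags as unreachable code, C4702):

void ApplyToAll(int* items, int count)
{
	for (int i = 0; i < count; ++i)
	{
		break;            // debugging leftover
		items[i] *= 2;    // unreachable: nothing after the break ever executes, so the feature silently does nothing
	}
}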

Compile with Static Analysis

The highest-end edition of visual studio comes with a static analyzer called Code Analysis. This feature can be turned on for native or managed code in visual studio. Static analysis does a deep scrutinization of the code and can help spot bugs that are more complex than what level 3 or 4 warnings can find. Yet the problems it flags are usually so fundamental that they can be likened to level 1 or 2 warnings.

Case in Point:

The static analyzer can detect allocation/de-allocation mismatches. For instance, when we turned it on we found places where memory was allocated with new[] but de-allocated with delete instead of delete[]. We found lots of these scattered throughout our large source code base. The advantage is that it is so easy to detect this way; without static analysis it would take a special tool like BoundsChecker to reveal a memory allocation mismatch, and only after exhaustive testing.
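
The pattern it flags is as simple as this (a made-up snippet; the analyzer reports it as C6283 or a related warning depending on the type):

void Broken()
{
	float* samples = new float[64];
	// ... use samples ...
	delete samples;       // mismatch: allocated with new[], freed with scalar delete
}

void Fixed()
{
	float* samples = new float[64];
	// ... use samples ...
	delete[] samples;     // matching array delete
}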

Check Pointers for NULL

By far the most common fix I have seen for known crashes in 3dsmax is to check a pointer for NULL. This is the most persistent problem I have ever seen in C/C++ code. Get into the habit now of checking every pointer for NULL before using it. A corollary to this is to initialize all pointers to NULL before and after they are used.
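
In its simplest form the habit is just this (Widget and GetWidget are made-up names, not max SDK types):

struct Widget { void Update(); };   // made-up type
Widget* GetWidget();                // made-up accessor; may return NULL

void Refresh()
{
	Widget* w = GetWidget();
	if (w != NULL)        // check before use
	{
		w->Update();
	}
}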

Case in Point:

The visual studio static analysis tool emits various warnings for possible dereferencing of null pointers. Consequently, I have rarely seen this problem in code that compiles cleanly at level 4 with static analysis.

For the Rampage release, the 4th highest crash in 64 bit max was a crash using Ray Traced Shadows. The shadow code contained a buffer of floating point values that was uninitialized. It was extremely difficult to track down, as it only manifested when the debugger was NOT attached.

Check before casting

If you lie to the compiler, your application will come back and bite you. C is a language seemingly built on casts, where anything can be cast to anything else. This ability to so easily lie to the compiler and misrepresent your types is dangerous and risky. Therefore prefer C++ style casts. By turning on RTTI and using dynamic_cast, the result of a cast can be checked for validity.

Case in Point:

In the sdk header file imtl.h is a class called MtlBase, which has 4 derived classes, one of which is class Mtl. I found functions in MtlBase that blindly assumed the instance (i.e. this) was an instance of class Mtl, ignoring the fact that there were 3 other classes derived from MtlBase. The code was casting the 'this' pointer to class Mtl and then doing work on that miscast pointer.
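
The safe version of that pattern is only a few lines (DoMtlWork is a made-up helper; MtlBase and Mtl come from imtl.h in the SDK):

void DoMtlWork(MtlBase* base)
{
	Mtl* mtl = dynamic_cast<Mtl*>(base);   // checked cast, requires RTTI
	if (mtl != NULL)
	{
		// base really is an Mtl; safe to use mtl here
	}
	// otherwise it is one of the other MtlBase-derived classes: leave it alone
}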

Avoid stack based strings

A very common way to crash the application is over-reliance on stack based C strings. This code, for instance, is very dangerous:

void foo(const TCHAR* input) {
	TCHAR buf[SIZE];         // fixed-size buffer on the stack
	_tcscpy(buf, input);     // overflows buf (and the stack) if input holds SIZE or more characters
}
One of the problems with stack based strings is operating on a string that is bigger than the buffer, which of course can corrupt the callstack. This is almost impossible to debug afterwards and usually makes reading minidump crash files an exercise in frustration. The danger can be minimized by using the newer safe string functions. For instance, instead of strcat, which can easily run over the end of a buffer, you can use strcat_s, which is a safer version.
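
A small sketch of the difference (the buffer size and appended string are arbitrary):

#include <tchar.h>
#include <stdlib.h>

void AppendName(const TCHAR* name)
{
	TCHAR buf[32] = _T("Hello ");
	// _tcscat could run past the end of buf; the _s variant knows the buffer
	// size and reports an error instead of silently corrupting the stack.
	_tcscat_s(buf, _countof(buf), name);
}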

When possible use TSTR or MSTR instead, where the actual string buffer is stored on the heap and not the stack. Then if anything does go wrong, it will not corrupt the callstack.

Now a disclaimer: Max has a lot of stack based strings all over the place (it has been around a long time, of course), but their usage is getting reduced as we now favor TSTR or MSTR.

Case in Point:

The code for the customization dialog contained a for loop that concatenated a string into a stack based buffer of limited size. The loop iterated too many times and the buffer overflowed, corrupting other items on the stack. The corrupted buffer belonged to a frame several levels up the callstack, and when that stack frame was cleaned up, it crashed. Diagnosing the problem was difficult since the symptom was several function calls away from the source of the problem.

Avoid using catch(…)

If at all possible avoid using catch(…). Prefer to handle specific exceptions, like catching an out of memory exception (std::bad_alloc). While using catch(…) may prevent the application from crashing, it can also hide bugs and make crashes harder to solve. For debugging it is actually useful to remove a catch(…) and let the program crash exactly where the cause of the crash is located. You should generally catch only those errors that you can handle, and let the ones you cannot pass through so the larger system can handle them if possible, or crash in the "correct" place rather than delay it.

Now, catch(…) can be used when it does something to correct the program state. This should be done only after careful consideration, usually with multiple developers, and the side effects need to be considered as well. If a catch wraps a call into thousands of 3ds Max functions, then it probably shouldn't be used; however, wrapping a call to a 3rd party library is acceptable. Everything needs to be balanced, of course.
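
A sketch of the preferred shape (the two helper functions are made up):

#include <new>

void AllocateLotsOfVertices();   // hypothetical: may throw std::bad_alloc
void ReportOutOfMemory();        // hypothetical error handler

void BuildHugeMesh()
{
	try
	{
		AllocateLotsOfVertices();
	}
	catch (const std::bad_alloc&)
	{
		ReportOutOfMemory();     // a failure we actually know how to handle
	}
	// anything else propagates, so a crash lands close to its real cause
}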

Certain regular expressions can easily be written to help search for empty catch statements. The static analyzer PVS-Studio can also help identify these.

Case in Point:

I regularly review the usage of catch(…) in the source code, and have taken a few of them out over the years. As a result, crashes reported from customers in the field point more clearly at their real cause.

Use Debug Builds

When debug builds are available, they should be used for testing and development. 3ds Max is the only M&E product that provides debug builds to ADN partners, although they may be slow in delivery. Despite the delays, a debug build is a great resource for validating your plugins.

[Note: It turns out to be very ironic that I put this here, since the 3dsmax team does not use debug builds. Sure, the devs do, but in all my years there I could never get management to have the QA testers use debug builds. Nevertheless, I believe in debug builds and that they are far superior for testing than release builds.]

Watch the log file, asserts and the debug output

Log File

3dsmax has a log file written to <max install>\Network\Max.log. This file was originally used for debugging network rendering, but it has grown to become a popular logging mechanism for max in general. The log can provide useful information, but it is still under-utilized and cannot be expected to report program state consistently across the product.

Asserts

Do not ignore asserts (remember debug builds?). Use asserts liberally in your own code, and don't suppress asserts unless they are logged and checked afterwards (for example, by automated testing). The assert system automatically logs all asserts (whether they are suppressed or not) to the file <max install>\3dsmax.assert.log.

Debug output window

The Visual Studio debug output window provides significant output and is useful to watch during debugging sessions. Be sure to turn on display for all types of events, including exceptions (very important) and regular program messages. If you want to check debug output without attaching a debugger, then you can use DbgView, a Microsoft tool from Sysinternals. See the following website for details: http://technet.microsoft.com/en-US/sysinternals

Disclaimer: The MetaSL system parses a lot of files when 3ds Max starts up. This will generate a lot of exceptions that are benign, so don't worry. The reason is that the MetaSL system, from Mental Images, uses a 3rd party library (antlr) to parse files, which in turn uses exceptions for program flow.

Enable Break On Exception:

Visual Studio has debugging options that make it break execution when an exception is thrown. This should be used as often as possible; it is the corollary to the "avoid catch(…)" advice above. There are a few places where max does legitimately use catch(…), for example in the maxscript kernel. By enabling this feature, exceptions are immediately brought to the attention of the developer.

Max Specific Problems

Do not hold naked simple pointers to ReferenceTarget’s

A class that is not a ReferenceMaker should not hold a plain old pointer to a ReferenceTarget, or to a class that derives from ReferenceTarget, without some mechanism to ensure validity of the pointer before use (e.g. AnimHandles). Instead, replace the simple pointer with a SingleRefMaker class instance and have it observe the ReferenceTarget.

Good:

class Good
{
	...
	SingleRefMaker mObserve;
};

Bad:

class Risky
{
	...
	ReferenceTarget* mObserve;
};

Do not write dynamic arrays of ReferenceTarget’s.

Do not write a class that holds an array of ReferenceTargets, especially when that array grows and shrinks at runtime.

A class like this usually has a container that holds pointers to ReferenceTargets. It usually overrides ReferenceMaker::NumRefs like this:

int NumRefs() { return myArray.Count(); }

Instead of a fixed number of items:

int NumRefs() { return 3; }

This cannot be done correctly without considering undo and redo (subclassing class RestoreObj). The fundamental weakness of the reference system is that it expects references to stay at a fixed index. That reference index is an internal implementation detail of the ReferenceMaker that should be invisible to clients; however, clients routinely use the reference index to get at a certain target, and one of those clients is the undo system. One complication of such an implementation is that the undo system usually expects that internal array to never shrink in size. If a ReferenceTarget is removed from the internal array, a RestoreObj may still point at its old reference slot. The reference system has no idea that the internal array shrank, so if an undo action occurs it may stick that reference back into the wrong slot. To avoid that, a common practice is to make dynamic reference arrays grow but never shrink, which wastes memory.

For example: undo and redo can change the size of the internal array via SetReference. So if you have an array with 10 ReferenceTargets and your undo/redo object happens to 'redo' and stick a reference back in at slot 5, all your other pointers from index 5 to 10 have now had their indexes bumped up by one. Now anything depending on or holding those moved ReferenceTarget pointers is dangling.

There are a few alternatives to this:

  • Use class IRefTargContainer.
  • Use an array of AnimHandle’s.
  • Use a ParameterBlock.

Do not access the Reference System after NOTIFY_SYSTEM_SHUTDOWN

The notification message NOTIFY_SYSTEM_SHUTDOWN (see notify.h) is broadcast before plugins are unloaded. It is critically important to drop all references to plugin-defined objects in response to this message. Many plugin modules define ReferenceTargets and will get unloaded shortly afterwards; once a plugin module is unloaded, trying to access a ReferenceTarget defined in that module can result in a crash.

Do minimal work in DllMain

The MSDN docs state that minimal work should be done in DllMain; specifically they warn against loader lock, among other things. DllMain can be called as a result of LoadLibrary, and while your DllMain is active a critical section (the loader lock) is held. If you try to do work that, for example, needs another DLL to get loaded, the application can deadlock. Instead of doing work in DllMain, there are better places to do plugin initialization and uninitialization. For example:

  • You can do uninitialization work in response to NOTIFY_SYSTEM_SHUTDOWN. (see notify.h)
  • You can and should use a LibInitialize and LibShutdown functions.

A similar warning: do not do heavy work in the constructors of static variables, because a static variable gets constructed close in time to when DllMain is called. When the static variable is constructed, the DLL may not be fully loaded and types needed by the constructor may not be available yet.
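
As a rough sketch of that split (written from memory, not copied from any shipping plugin; check the SDK headers for the exact signatures):

#include <windows.h>

BOOL WINAPI DllMain(HINSTANCE /*hinst*/, DWORD /*reason*/, LPVOID /*reserved*/)
{
	// keep this essentially empty: no LoadLibrary, no SDK calls, no heavy statics
	return TRUE;
}

__declspec(dllexport) int LibInitialize(void)
{
	// the module is fully loaded by now, so real startup work is safe here
	return TRUE;
}

__declspec(dllexport) int LibShutdown(void)
{
	// mirror of LibInitialize: release whatever was created there
	return TRUE;
}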

Do not violate party etiquette

Uninvited guests should not crash the 3ds Max party. When the party is over: go home.

Uninvited guests

Every plugin has an appropriate time in which it should be initialized, do its work and shutdown. For example:

  • A plugin for a color picker should not instantiate one when 3ds max starts up.
  • A plugin for a scene browser should be active ONLY when its UI is active.

It is entirely possible, and probable, that users can start max and NEVER use your plugin. Therefore do not waste memory and resources for a feature that may not get used. Do the work when users actually invoke your feature. In other words, when 3ds Max starts up, the plugin should not invite itself to the 3ds Max party; it should wait for an invitation.

This rule is violated on startup by loading 3rd party libraries, instantiating plugin classes, holding pointers to the scene graph, and registering callbacks for common scene events (my favorite pet peeve: "Hey, max crashed in this function even though I never used this feature?"). When max loads a plugin, the major things 3ds Max requires from it are:

  • The number of class descriptors
  • A way to get those class descriptors.
  • Some pointers to LibInitialize and LibShutdown functions.

Therefore, class descriptors really are the only things that should be instantiated on module load or startup. There should be no static instances of the actual plugin class, whether it is a material, shadow, utility, or renderer plugin. Of course there are exceptions, such as function published interfaces and parameter block descriptors, which often are statically defined: but I'm not talking about those.

No loitering

When 3ds Max shuts down, it sends out the most important broadcast notification in all of 3ds Max (found in notify.h): NOTIFY_SYSTEM_SHUTDOWN. This means the 3ds Max party is over. The plugin should completely shut itself down and disassociate itself from all max data. For example: all references should be dropped, all arrays holding pointers to INodes should be cleared out, and (most common and most dangerous) all callbacks that were registered should be unregistered.

When NOTIFY_SYSTEM_SHUTDOWN is broadcast, the entire max scene is still completely intact and in a valid state. During any callbacks or notifications after that, 3ds Max contains less and less of a valid state to work with; as 3ds Max progresses through its shutdown sequence, less and less of the scene remains valid. For instance, the other shutdown notification, NOTIFY_SYSTEM_SHUTDOWN2, is sent merely when the main 3dsmax window (think HWND) is destroyed. No plugin should be responding to that message by, for example, iterating through the scene graph. Likewise, the LibShutdown function should not be iterating the scene graph.
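
Roughly what going home looks like (the callback name is made up; RegisterNotification, UnRegisterNotification and NotifyInfo come from notify.h in the SDK):

// requires notify.h from the 3ds Max SDK
static void OnSystemShutdown(void* /*param*/, NotifyInfo* /*info*/)
{
	// drop references, clear any cached INode pointers, release SDK objects here,
	// then unregister so nothing fires later in the shutdown sequence
	UnRegisterNotification(OnSystemShutdown, NULL, NOTIFY_SYSTEM_SHUTDOWN);
}

// registered when the feature is first used, not at DLL load time:
// RegisterNotification(OnSystemShutdown, NULL, NOTIFY_SYSTEM_SHUTDOWN);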

Case In Point

Say that a plugin depends on another library, like this:
plugin.dll -> library.dll
When the plugin is loaded by max, the dependent library will also (automatically) get loaded. But when the plugin is unloaded, the dependent library will not get unloaded unless its reference count is decremented to zero, which will not happen unless FreeLibrary is specifically called on library.dll (not a common nor recommended practice). Instead, the library will get freed or shut down long after WinMain exits and max has uninitialized and is gone. Therefore the dependent library should not contain any dependencies on anything in the 3ds Max SDK. For example, GetCOREInterface() should never be called in the DllMain of a module that a plugin depends on (i.e. library.dll).

Quality Testing

Developers can implement the following practices in their software development processes:

Automated regression testing

All good production pipelines should have regression testing that occurs automatically after a build. This is critical to help catch bugs before they get to customers in the field. The developers should also have access to these automated tests so they can run them before submitting their code.

Dynamic Memory Analysis

This means using 3rd party tools to profile, analyze, check and verify memory during runtime of the application.

The following list of tools is a partial example of what is available:

  • MicroFocus BoundsChecker: checks for memory leaks and memory allocation mismatches, among a host of other things.
  • Microsoft's Application Verifier: checks for various memory problems at runtime, such as accessing an array out of bounds.
  • Visual Leak Detector (open source on codeplex.com): checks for memory leaks. It is fast, efficient and stable.

Code coverage

This means using a tool to measure how much of your application or plugin was actually exercised during testing. It helps a developer know when they have tested the product enough, and it can help find areas they have not tested. Simply put, untested code is buggy code, and a code coverage tool helps in this regard. The best tool I have ever seen for this is Bullseye (bullseye.com). It works for native C++, is easy to use, and is very fast. It requires instrumentation of the code during the build, which can double the build time, but runtime performance is excellent.

How to use Perforces P4.NET API for basic tasks

So recently I had a task to rename some files in Perforce. One or two files is easy; that can be done by hand. But this task required renaming around 800 files based on data from an external source. So I started looking for a .NET API that would let me access Perforce using C# rather than the command line.

Thankfully Perforce has one.

http://www.perforce.com/product/components/apis

The only problem is that the documentation is basically no good:

http://www.perforce.com/perforce/doc.current/manuals/p4api.net/p4api.net_reference/Index.html

The help docs have one measly page that gives a tutorial on how it works, and that is only for connecting and disconnecting. Most of the rest of the reference guide contains no description of what anything does, what anything is, or how to use it. The docs are basically just a skeleton of a help file, and on every page is this disclaimer: [This is preliminary documentation and is subject to change.] Indeed it is. The docs are so bare they are basically useless. Given a function signature with no description, like this:

Syntax

C#
public List<FileSpec> IntegrateFiles(IList<FileSpec> toFiles,FileSpec fromFile,

Options options

)

Parameters

toFiles

Type: System.Collections.Generic.IList(FileSpec)

fromFile

Type: Perforce.P4.FileSpec

options

Type: Perforce.P4.Options

Return Value

 

You basically have to guess how to use this API!

To add insult to the whole comical affair, Perforce did add documentation for this particular method: not for the parameters as you see above, but for the method in general. But it was documentation for the command line! As if the command line syntax and idioms could possibly apply to C#! In stunned disbelief I had to press forward, as this was my only option for accessing Perforce from .NET. I had used a different Perforce .NET API years ago, but it was so old that it wasn't compatible with .NET 4.0 or the new Perforce servers we use now.

So the point of this blog post is not to complain, but to offer a solution and show how I figured all this stuff out.

First off, I wrote a class to encapsulate the credentials needed to log on to the Perforce server. It contains data like the server name and port, the username and the client spec:

public class PerforceID
{
	public PerforceID(String serverURI, String user_name, String client_spec)
	{
		mServerURI = serverURI;
		mUserName = user_name;
		mClientSpec = client_spec;
	}
	private String mServerURI;
	public System.String ServerURI
	{
		get { return mServerURI; }
	}
	private String mUserName;
	public System.String UserName
	{
		get { return mUserName; }
	}
	private String mClientSpec;
	public System.String ClientSpec
	{
		get { return mClientSpec; }
	}
}

This instance is constructed with the appropriate data and used later.

I kept all my methods in a static class, with just a few static data members.

How to Connect to a Perforce Server

Obviously connecting to the perforce server was of paramount importance. Thankfully the Perforce documentation did describe how to connect (at least). So here I basically borrow from their documentation, except I am using my own static variables. And as you can see I am using my own PerforceID instance to provide the server address (id.ServerURI), the user name (id.UserName) and the client spec I want to use (id.ClientSpec).

private static Perforce.P4.Connection mPerforceConnection;
private static Perforce.P4.Repository mRepository;
public static void Init(PerforceID id)
{
	// initialize the connection variables
	// note: this is a connection without using a password

	// define the server, repository and connection
	Server server = new Server(new ServerAddress(id.ServerURI));
	mRepository = new Repository(server);
	mPerforceConnection = mRepository.Connection;

	// use the connection variables for this connection
	mPerforceConnection.UserName = id.UserName;
	mPerforceConnection.Client = new Client();
	mPerforceConnection.Client.Name = id.ClientSpec;

	// connect to the server
	mPerforceConnection.Connect(null);
}

As you can see above, the Repository and Connection instances need to be used later, hence why I keep hold of them.

How to Disconnect from a Perforce Server

This part is easy (again borrowed from their docs).

public static void UnInit()
{
	if (mPerforceConnection != null)
	{
		mPerforceConnection.Disconnect();
		mPerforceConnection = null;
	}
}

How to open a file for edit

Editing a file is the most basic operation I could hope to do. Unfortunately it was not straightforward at all. Given a string that contains the full file path, this method will open the file for editing in Perforce.

public static bool Edit(String filename)
{
	Perforce.P4.Options options = new Options();
	mPerforceConnection.Client.EditFiles(options, new FileSpec[] { new FileSpec( new ClientPath( filename) ) });
	return true;
}

It was not my intention to open it for editing in any particular changelist. Thus after calling this method, the file will be opened for editing in the default changelist.

I found that the Perforce.P4.Client class contained a lot of the methods I would need to do various familiar operations. Operations that you could logically imagine a user doing manually such as Adding files to a changelist, deleting files, integrating files, shelving and unshelving, merging, moving, locking files etc…

How to integrate a file

Integration or branching (versus copying) is important because it retains the file history. For this operation you need two arguments: the old and the new file name. Both must be fully qualified path names containing the directory and the file.

using Perforce.P4;
public static bool Integrate(String old_name, String new_name)
{
	bool result = false;
	try
	{
		var from = new FileSpec(new ClientPath(old_name), VersionSpec.Head);
		var to = new FileSpec(new ClientPath(new_name), VersionSpec.None);

		mPerforceConnection.Client.IntegrateFiles(from, options, to);
		result = true;
	}
	catch (Exception e)
	{
		Console.WriteLine("Unknown exception calling Perforce.");
		Console.Write(e.Message);
		result = false;
	}

	return result;
}
static IntegrateFilesCmdOptions options = new IntegrateFilesCmdOptions(IntegrateFilesCmdFlags.None, -1, 0, null, null, null);

The reason I instantiated the IntegrateFilesCmdOptions instance outside of the function is that I had to call this function hundreds of times, so it didn't make sense to redo that instantiation every time. Also, like all my Perforce calls, I wrapped it in an exception handler since I was guessing how the API worked. For instance, I had to guess what parameters I could pass into my options instance above. This will integrate the files and put them in the default changelist. The last 3 parameters I still have no idea what they do, but passing in null works!

This method works, but the only problem is that calling it 800 times is very slow. After about 30 seconds of this I decided I needed a faster approach: I needed to integrate 800 files in a few seconds.

How to integrate lots of files (fast)

So given the requirement to integrate hundreds of files, here is the fast way to do it. I needed to replace 800 calls to the server with just a few. This is done by first creating a branch spec and then integrating the branch. The branch spec contains which files get branched to where, and the integrate command is given the branch spec as its main specification. Basically, two commands get sent to the perforce server instead of a few hundred. The result is spectacularly fast compared to the old way.

At the heart of a branch definition is a mapping of old files to the new files. The old file is always on the left, and the new file is on the right. So given that I want to integrate or branch file A.cpp to a new location and rename it to B.cpp, the mapping would look like this:

//depot/foo/A.cpp       //depot/new/output/B.cpp

How to Create a branch

Given those file specifications, creating a branch spec also requires giving it a name, and specifying the user name:

using Perforce.P4;
public static bool CreateBranch(String branch_id, String username, Dictionary<String,String> fileList)
{
	try
	{
		var view_map = new ViewMap();
		foreach( var pair in fileList )
		{
			var from = new ClientPath(pair.Key);
			var to = new ClientPath(pair.Value);

			view_map.Add(new MapEntry(MapType.Include, from, to));
		}

		var msg = "Created programmatically via the Perforce .NET api. Used for integrating lots of files at once.";
		bool locked = true;
		var branch = new BranchSpec(branch_id, username, DateTime.Now, DateTime.Now, msg, locked, view_map, null, null);
		var created = mRepository.CreateBranchSpec(branch);
		Debug.Assert(created != null);
	}
	catch (Exception e)
	{
		Console.WriteLine("Unknown exception calling Perforce.");
		Console.Write(e.Message);
		return false;
	}

	return true;
}

Here the first argument, branch_id, is the unique identifying name of the branch. The username argument is required for some reason. The last argument, the Dictionary, contains the file mappings from old to new names. The method returns true if it succeeded and false otherwise. Again I wrapped it in an exception handler because I had to figure out how to use this API by guessing, with a lot of trial and error.

To verify that the branch was created, I opened perforce and browsed the branch specs that were owned by me, and indeed the new branch spec showed up in the list.

How to Integrate a branch spec

Now that my branch spec is defined, I can easily integrate it using just the unique name I gave it when the branch was created:

using Perforce.P4;
public static void IntegrateBranch(String branch_id)
{
	try
	{
		var change_list = -1;
		var max_files = -1;
		var branch_options = new IntegrateFilesCmdOptions(IntegrateFilesCmdFlags.None, change_list, max_files, branch_id, null, null);
		var created = mPerforceConnection.Client.IntegrateFiles(branch_options, null);
		Debug.Assert(created != null);
	}
	catch (Exception e)
	{
		Console.WriteLine("Unknown exception calling Perforce.");
		Console.Write(e.Message);
	}
}

So the two methods above could be called in succession like this:


CreateBranch("mybranch", "John_Doe", ….);
IntegrateBranch("mybranch");



Bad Casts and wrong assumptions about class hierarchies

Lately here at work, I have been looking into solving a certain crash in the product. This particular crash was occurring in a function that has had many crashes over the years.

The situation involved code in a base class that was doing work for a derived class. In fact, the base class was calling methods on the derived class, and to do that it had to cast itself (the 'this' pointer) to a pointer of the derived class. But since there was more than one derived class, the cast was sometimes invalid: it was performed even when the instance was actually of the other derived class.

So, sometimes when fixing a mysterious crash, just look for a bad cast. 🙂

The code had a class hierarchy roughly like this:

Class Base
	Class A
	Class B

So here, class A inherits from class Base. Similarly, class B also inherits from Base.

In this case, some code in class Base was always casting the 'this' pointer to A. However, when called on an instance of B, that assumption does not hold, and the cast simply produces a pointer of the wrong type. Furthermore, it did this with a C style cast, which offers no warning and no protection.

This code demonstrates the problem:

void B::foo()
{
	A* temp = (A*)this; // B is not A!
	// crash attempting to do something with temp
}

So if called like this,

B* b = new B();
b->foo();

we should expect a crash calling foo since class B is not class A.

There were a few things wrong with this.

  1. A base class should not implement work that rightly belongs to a derived class. For instance, a shape class should not be in charge of drawing a box and a sphere. To be polymorphic, a shape class would offer a virtual or abstract draw method and leave the implementation details to the derived box and sphere classes. Then a derived class pointer could stand in for a base class pointer (that is polymorphism) and, with no switch statements, the right functionality would simply run.
    So in my case the whole concept of polymorphism seems to have been violated. If a derived class performs an operation, the base class should not be calling it; in fact, the base class should be completely ignorant of the implementation details of its derived classes. In this case the operations being performed didn't involve any virtual methods, and without a virtual method, polymorphism cannot work. Hint: in C++ it is possible to replace switch statements and if statements with simple virtual method calls. This code was pure C masquerading as C++.
  2. An incomplete understanding of the class hierarchy that derives from this base class. While the assumptions may once have held, they failed as soon as other derived classes were added later on. Casting everything to one particular derived class to perform operations was very fragile, and asking for trouble.
  3. Use of C style casts here is plain wrong in C++. A cast like this should fail, but in C no cast ever fails. A C style cast may be fast, but it offers no protection. Thankfully the Microsoft Visual C++ compiler offers a measure of protection: by turning on Run Time Type Identification (RTTI) and using dynamic_cast<>, a cast to the wrong type fails and returns NULL (or nullptr in VC 10.0). Thus B::foo could have been written like this:
void B::foo()
{
	A* temp = dynamic_cast<A*>(this);
	if (temp)
	{
		// do something with temp
	}
}

In this case, as shown, the code inside the if statement would never get executed.

Finally, in fixing the bug, I did a search through the entire source file looking for these C style casts to the wrong data type. And almost every function that used this bad cast had produced lots of crash reports from customers over the years.

Developers who litter in the code

Over the years an old code base accumulates a lot of junk left behind by programmers. Some of this is what I call vacation signs: a small explicit statement left in the code in the form of a comment (or a bug) stating that the developer was there. For instance:

// mjm - 06.12.00 - begin

...

// mjm - end

So now, not only do I know where his changes started, thankfully he informed the entire world where his changes ended. This is equivalent to taking a vacation to Italy and pounding a small sign into the ground when you arrive in a town stating "I <insert name here> arrived in Milan on Feb 1 2004". And imagine when you left, you pounded another small sign into the ground (perhaps at the other end of town?) stating your name and the date you left. This is cute for the pictures sent to your mom, but sooner or later the mayor of Milan will get offended at stupid tourists who do this, and will send the constables to arrest the idiot foreigner who is littering his town. This of course is utter stupidity. And these vacation signs left in our code are an utter nuisance, especially when there are thousands of these little signs.

Solving linker error ‘LNK2022: metadata operation failed’

If you are compiling managed C++ projects and have ever encountered this idiotic linker error:

1>xxx.obj : error LNK2022: metadata operation failed (8013118D) : Inconsistent layout information in duplicated types (_PROPSHEETPAGEW): (0x020001f8).

1>xxx.obj : error LNK2022: metadata operation failed (8013118D) : Inconsistent layout information in duplicated types (_PROPSHEETPAGEA): (0x02000206).

then this post is for you.

For some reason in our code this linker error always comes in pairs. No matter; in our code base it usually manifests itself after rearranging header files in a file that is compiled with the /clr compiler switch.

The problem is caused by windows.h being included after other header files:

#pragma unmanaged
#include "file.h"
#include "otherfile.h"
#include <windows.h>

#pragma managed
….

The solution is to make sure that windows.h is included before anything else.

#pragma unmanaged
#include <windows.h>
#include <windows.h>
#include "file.h"
#include "otherfile.h"

#pragma managed

….

Pointer Truncation

Pointer truncation is an evil that all developers of native C++ code need to worry about. It is simply storing a wide value (a pointer) in a narrower one (a 32 bit integer). This becomes an issue in applications ported from 32 bit to 64 bit. In 32 bit applications pointers are of course 32 bits wide, so it is no big deal to do something like this:

int* foo = new int[45];
int bad = (int)foo;

However, in 64 bit applications that is bad. The pointer could be using the high bits of its memory address, and when cast (truncated) to a 32 bit int it will lose information. The high bits of a pointer are the bits that address regions higher than 0xFFFFFFFF. For instance, a 32 bit pointer has 8 digits (in hexadecimal notation) and can look like this:

0x12345678

But a 64 bit pointer has 16 digits (in hex) and can look like this:

0x1111111122222222

The bits above that are 1s are the high bits. Since a 64 bit app can address more than 2^32 bytes (4 GB) of memory, these bits will get used if the app allocates enough. There is a way to force pointers to be assigned to the higher regions of memory without the expense of actually allocating all that memory: use VirtualAlloc from the Windows API to reserve (but not commit) the lower 4 GB of the address space.

The results of a debugging session with the code below show that the high bits are completely lost when the value is stored in the variable bad.

// a pointer, which is 64 bits wide on a 64 bit platform
int* foo = new int[45];
// this will truncate the memory pointer
int bad = (int)foo; // PURE Evil! Integers are 32 bits wide

Notice how the integer loses the leading part of the pointer's value; in a debugger watch window, bad holds only the low 32 bits of foo.

If the coder really gets adventurous, they can try converting that integer back to a pointer again, with especially disastrous results:

// a pointer, which is 64 bits wide on a 64 bit platform
int* foo = new int[45];
// this will truncate the memory pointer
int bad = (int)foo; // PURE Evil! Integers are 32 bits wide

// an accident waiting to happen
int* worse = (int*)bad; // catastrophic!
// worse now looks like this: 0xffffffffa0040140
// now the high bits of 'worse' are filled with 0xff, which automatically
// makes it point to invalid regions of memory
// Any accessing of this pointer will result in an access violation.

In the debug watch window, worse shows up as the sign-extended value noted in the comment above.

Notice how the high bits of the pointer worse now point to regions of memory that the windows kernel has flagged as off limits. Any attempt to do anything with this pointer will result in an access violation, and if that is not correctly handled, the termination of the application.

One good way to find these truncations is to call VirtualAlloc with very big values when your application begins (I'm simplifying greatly here, trust me). This reserves large chunks of the address space below 4 GB, so by the time your application makes actual memory allocations, all or most of the pointers will be above the 4 GB boundary. Then you have to test your application quite thoroughly to flush out these hot spots.
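
A minimal sketch of that idea (the step size and the decision to ignore failures are arbitrary choices, not a hardened implementation):

#include <windows.h>

void ReserveLow4GB()
{
	// Reserve (not commit) the low address space in 64 MB chunks so later real
	// allocations land above the 4 GB line, where truncated pointers break fast.
	const SIZE_T step = 64 * 1024 * 1024;
	for (ULONG_PTR addr = 0x10000; addr < 0xFFFFFFFFull; addr += step)
	{
		VirtualAlloc(reinterpret_cast<LPVOID>(addr), step, MEM_RESERVE, PAGE_NOACCESS);
		// failures are fine: some ranges are already in use by the loader
	}
}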

Well, you might ask, shouldn't the compiler aid you in this situation? Not the MS VC++ 10.0 compiler. Even at warning level 4, nothing is emitted to warn about the truncation.

Also, Code Analysis will not help, as it is not enabled on 64 bit builds, i.e.

1>—— Build started: Project: VirtualMemTest, Configuration: Debug x64 ——
1>cl : Command line warning D9040: ignoring option ‘/analyze’; Code Analysis warnings are not available in this edition of the compiler

And on 32 bit builds of the same code, code analysis still emits no warning.

Experimenting with Live Writer and code plugins

I have a wordpress blog, but I find its text editor to be somewhat limiting. The big limitation I have found is the problem of pasting code into the blog and making the code display nicely. So far I have had no success with wordpress plugins, so I'm going to experiment with Live Writer. For my first experiment, I downloaded the "Paste As Visual Studio Code" plugin from Lavernock. Their website link is here.

So, here is some C# code with very little eye candy:

public class SymbolUseOptions
{
    public SymbolUseOptions()
    {
        // empty constructor
    }
    private FileReference mMethod;
    public FileReference Method
    {
        get { return mMethod; }
        set { mMethod = value; }
    }

    private FileReference mSymbol;
    public FileReference Symbol
    {
        get { return mSymbol; }
        set { mSymbol = value; }
    }
}

So that worked very easily.

Here is some C++ code (This time with line numbers and alternating line colors):

Code Snippet
  1. bool Database::Open( String^ file_name )
  2. {
  3.     bool result = false;
  4.     if (!mOpened)
  5.     {
  6.         const char* filename = Util::StringToChar(file_name);
  7.         UdbStatus status = UdbStatus::Udb_statusOkay;
  8.         
  9.         status = (UdbStatus)udbDbOpen( filename );
  10.         
  11.         Util::FreeNativeString(filename);
  12.  
  13.         Debug::Assert(status == UdbStatus::Udb_statusOkay);
  14.  
  15.         if (status == UdbStatus::Udb_statusOkay)
  16.         {
  17.             result = true;
  18.             mOpened = true;
  19.         }
  20.     }
  21.  
  22.     return result;
  23. }