Over engineered build systems from Hell

While I was at Autodesk years ago we went through various build systems. When I first started the build was written in Perl. Dependencies were specified using a .csv file. I had no idea how it worked, nor did I care how it worked since I was interested in other things during those years. The build could routinely take an hour and 45 minutes, and was mind-numbingly boring since we usually had to do it everyday. And if you were unlucky multiple times per day. Even worse In fact on the build servers the build would routinely take almost 4 hours!

What a horrible tax.

Later on, a hot-shot programmer came along and rewrote the build system to use ruby and rake. It was supposed to be faster, which it kind of was. But the complexity was just as bad, and no one knew ruby nor how rake worked. Then the developer left the company, leaving behind a black box. So that silo of information was gone, and no one knew how the build system worked. It took 2 developers about a year and a half or so to learn the build system well. To really be able to work on it at the level the original developer had done.

To be sure there were problems with the build. It still took a long time to build the product. Somewhere near an hour. About the only thing we did gain was the ability to do incremental builds.

But modifying the build was the main problem. The build, written in ruby, completely reinvented the wheel on so many different areas. To understand this better, you have to understand that the product at the time was built using Microsoft tools because it was based solely on the Microsoft platform. Thus the source project files were in a build language that Microsoft created. That build language was built into visual studio and was called MSBuild. So instead of using Microsoft tools to create the build, ruby and rake were used instead. So instead of using Microsoft tools to parse the xml project files, a ruby gem was used. So instead of using anything from Microsoft to help with the build, everything was re-invented from scratch. Parsing visual studio .vcproj (and eventually .vcxproj) files was done tediously and laboriously and mind numbingly using rake and some xml gem. Talk about recreating the wheel! So much code written to duplicate a simple function call to a Microsoft function could have retrieved a fully instantiated object with full properties all intact.

Copying files to the build directory was another disaster too. It would take around 10 to 12 minutes to copy 7000~14000 files. It originally was somewhere near 7000 files, but grew over time. All written in ruby code that no one knew how to debug except to put print statements in.

Another problem was adding build properties. If you wanted to add a build property (a key value pair), you had to add it to multiple places in the ruby build, knowing exactly what to modify (in duplicate) and such. It was horrible.

Mixing ruby with MSBuild was like mixing iron and clay. They don’t mix well at all. It was alike a ruby straight jacket that hindered the build and visual studio upon which it was based.

There had to be a better way.

Eventually when the frustrations with the build boiled over, I learned MSBuild and figured out how to build max without ruby. It took over a year, from when I first got a working prototype, to get something into the main branch of max. Simply due to bureaucratic inertia. There are lots of people in positions of power who simply say no before learning about a subject. Which was something all too common there. The first liberation was freeing the list of things to get built from ruby. Yes the list of DLL’s and EXE’s to get built was specified in some arcane ruby syntax somewhere. The first thing was moving that list to a democratic XML file. Now any build language could parse it and find out what to build. The second thing was moving the list of files to copy to an XML file. Now any build system could know what files to copy as well.

Once those two things were in place, it was time to put in the build system that I originally came up with during a particular Christmas break.

It was written in pure XML, with one MSBuild extension that was written using C#. All the tools were native to visual studio, and what you built on the command line was what was built in visual studio. They both used the same properties (using native property sheets) and built in the same way.

What’s more I found that using native MSBuild tools to copy those 7000+ files to build now was incredibly fast. In fact, I while once debugging through that old ruby code responsible for copying, I found the source of the 10 minute copy task. Or why it took 10 minutes. It was using an N factorial algorithm! So given directory A with B thru Z subdirectories, it would iterate through all directories n! times. Each directory was not parsed once, but N! times according to the amount of sub-directories that existed. It was an unholy mess that proves that re-inventing the wheel is usually a disaster waiting to happen. Now back to the improvement: With the new msbuild copy mechanism it took 20 seconds to copy all those files. 20 seconds versus 10 minutes was a big improvement.

Incremental builds also greatly improved. Go ahead and build your product from scratch. Now don’t change a thing and rebuild. If you have a smart build system, it should just take a few seconds and nothing will happen. The build system will be smart enough to essentially report that nothing changed and therefore it did no work. My new build did just that in a matter of seconds. The old build system used to take about 5 minutes to do that (And it still did work anyways…).

Speaking of performance. The time to compile and link the actual files didn’t change much, because that was always in visual studio’s corner and not ruby. The improvement in performance came from the copying actions that now took 20 seconds. Also noticeable was the shorter time involved from when the build started to when the first CPP file was getting compiled. In ruby/rake it took quite a few minutes. In the new build it took a few seconds. Eventually when I got a new SSD hard-drive, I was able to get the build down to 22 minutes on my machine.

Better yet was the removal of duplication of anything. Everything was native and iron was mixed with iron and clay was mixed with clay. Sort of….

One developer, (who we called doctor No, because he said no to everything good), holding on to the fantasy that max would be multi-platform someday would not let ruby go. So there were in essence two build systems that could do the same thing. In fact he wanted an option to invoke one build system from another! So I had to put in an option to invoke msbuild from ruby/rake! This was hobbling msbuild with an old clunker. Kind of like buying a new car and towing the old one around everywhere you go. Yes extremely stupid and frustrating.

Which goes to show that old ways of thinking die hard, or don’t die at all.

The build at Century Software

Later on I moved to Century Software, a local company to where I live. That was such a fun place. Anyways their build system for their windows product was written in Make! Yes Make the original, ancient build system that you can’t even find documentation for anymore. I mean literally, I found (I think) one page of documentation somewhere on some professors lecture notes. The docs were  horrible. Make implemented here was so old it built one C file at a time. No multi-threading, no parallel builds nothing. Slow was the operative word here. That and a incomprehensible built output that was so verbose it was almost impossible to comprehend. The only good thing about it was that it immediately stopped on the first error.

So eventually I rebuilt that using MSBuild too. It took me a few months in my spare time. No bureaucratic inertia, no one telling me no. I just worked on it in my spare time and eventually I had a complete and fully functioning system for the tinyterm product. This build was the best I’ve ever done, with zero duplication, extremely small project files and a build that was very fast. It went from 45 minutes to a minute and a half.

When writing a product, the build system should be done using the tools that the platform provides. There should be no transmogrifying the code, or the build scripts (like openssl) before doing the build. When writing ruby on rails use rake for your build process’s. When targeting Microsoft platforms use msbuild. Using java, then use maven. Data that must be shared between build platforms should be in xml so that anything can parse them. And most important of all, distrust must go, and developers and managers must have an open mind to new things. Otherwise the development process’s will take so long, and be so costly, and exert such a tax that new features will suffer, bugs will not get fixed and customers will not be served and they will take their money elsewhere.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s