Benjamin Pollack of Fog Creek Software argues against weekend-at-midnight deployments:
Because, you see, you know how no one else works on the weekend? You know how that's why it seems so alluring? Well, that applies to you, too. So, for the same reason that your customers are less likely to be hitting your website, you have a drastically decreased ability to handle things if something actually does go wrong.
I'm not entirely sure Benjamin and the Kiln team have drawn the right lessons from their experience. The point that Benjamin's story really makes is quite different, if you take a step back.
The real take-away from this disaster story is: checking that the deployment works by "logging in and kicking the tyres" is kind of like checking that your car is roadworthy by, well, turning the ignition once and kicking the tyres. It might do the job when you want to drive down to the garage or deploy your tiny web app that has 200 free users, but if you're deploying an application that's mission-critical to your users, who are paying you money for it, it's just not good enough.
Two better approaches come to mind to resolve this:
- Continuous Deployment, which involves getting rid of the "big release" altogether by automating the "kicking the tyres" test and building it up into a full production monitoring suite which will roll back releases when they screw things up.
- Full-on, properly managed releases like they do in large IT corporations, such as banks, where a "release" is not something you kick off from home via SSH on a Saturday night, but a properly planned effort that involves critical members of the dev team as well as the QA team being present and ready to both test the production system thoroughly and fix any issues that may occur.
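To make the first option a bit more concrete, here is a minimal sketch of what an automated "kicking the tyres" step with rollback could look like. It is not Fog Creek's procedure or any particular tool's API: `Check`, `failed_checks`, and `deploy_with_rollback` are names I made up for illustration, and the deploy, rollback, and check steps are passed in as plain callables so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    """One automated post-deployment check (hypothetical structure)."""
    name: str
    run: Callable[[], bool]  # returns True when the check passes


def failed_checks(checks: List[Check]) -> List[str]:
    """Run every check and return the names of those that failed.

    A check that raises an exception is counted as a failure.
    """
    failures = []
    for check in checks:
        try:
            ok = check.run()
        except Exception:
            ok = False
        if not ok:
            failures.append(check.name)
    return failures


def deploy_with_rollback(
    deploy: Callable[[], None],
    rollback: Callable[[], None],
    checks: List[Check],
) -> Tuple[bool, List[str]]:
    """Deploy, run the checks, and roll back if any of them fail.

    Returns (succeeded, names_of_failed_checks).
    """
    deploy()
    failures = failed_checks(checks)
    if failures:
        rollback()
        return False, failures
    return True, []


if __name__ == "__main__":
    # Stand-in steps; a real setup would call the actual deploy tooling
    # and probe the production system here.
    ok, failures = deploy_with_rollback(
        deploy=lambda: print("deploying new release"),
        rollback=lambda: print("rolling back to previous release"),
        checks=[
            Check("can log in", lambda: True),
            Check("can open a repository", lambda: False),
        ],
    )
    print("release kept" if ok else f"release rolled back, failed: {failures}")
```

The point of the design is that the checks are data: every time a release breaks something the existing checks missed, you add another `Check`, and over time the list grows into the kind of production monitoring suite described above.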
As long as your startup is running on a shoestring, informal cowboy-style deployments are, quite simply, all you can afford, but I'm somewhat surprised to find that a mature company like Fog Creek handles the deployment of a mission-critical product in such a way.
Certainly, deploying mid-week instead will cover up the problem a bit better, but really, Fog Creek should make a choice: either move towards fully automated, continuous deployments, or move away from informal cowboy deployments towards formal release procedures with sign-offs, rollback procedures, and other typical processes.
Update: Benjamin has described the full deployment procedures here. Those seem much more reliable than what was apparent from the article.