Our Takeaway after Basecamp’s System Outage: Prevention Is Always Better than Cure

As a communications agency, staying ahead of the game is our daily bread. That's why our entire staff gathers around a huge table every Thursday morning for a weekly meeting to discuss the status of all the clients we work for: reviewing goals, adjusting strategies and resolving emerging issues immediately is crucial to keeping our clients' projects on track and making great strides. But what if the project management tool used to coordinate all of this goes down exactly when it is needed most? Well, that's exactly what we – among many others – faced two weeks ago.


What happened?

On November 9, Basecamp 3 was stuck in read-only mode for nearly five hours straight. Although existing messages, to-do lists and files remained fully accessible, albeit not editable, no new data could be added to the team communication software whatsoever.


What exactly caused the outage in the first place?

Every type of activity, from posting a message and updating a to-do list to merely liking one of your co-workers' comments, is meticulously tracked and archived in a large table of events. As it transpired, Basecamp's "database hit the ceiling of 2,147,483,647 on [this] very busy events table," according to a later statement by David Heinemeier Hansson, co-founder of the software. That number is the largest value a signed 32-bit integer can hold, so the capacity to record any new events was exhausted. To put it in layman's terms: Basecamp had run out of ID numbers to assign to new events.
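To illustrate where that oddly specific ceiling comes from: assuming the events table used a standard signed 32-bit integer for its auto-incrementing ID column (a common default in older database schemas; the exact schema is not described in Basecamp's statement), the largest ID it can store is 2^31 − 1 = 2,147,483,647. A minimal Python sketch of the failure mode, with a hypothetical `next_event_id` helper:

```python
# The ceiling Basecamp hit: the largest value a signed 32-bit integer can hold
# (1 bit for the sign, 31 bits for the magnitude).
INT32_MAX = 2**31 - 1  # 2,147,483,647


def next_event_id(current_id: int) -> int:
    """Hypothetical helper: assign the next event ID in a table whose
    ID column is a signed 32-bit integer."""
    if current_id >= INT32_MAX:
        # Once the column is full, every further insert fails.
        raise OverflowError("events table ID column is full")
    return current_id + 1


# Widening the column to a signed 64-bit integer raises the ceiling to
# 2**63 - 1 (over nine quintillion), which is the standard remedy.
INT64_MAX = 2**63 - 1
```

This is why the fix is small but the consequence is large: the table itself was fine, only the ID counter had nowhere left to go.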


How did service users cope with this downtime?

It didn’t take long until customers took to Twitter asking for updates on the issue. Some affected users also didn’t shy away from expressing their frustration about the service failure:


On the flip side, a number of users took it in stride, appreciating a somewhat unforeseen coffee break:


What to learn from it?

In light of most users’ positive encouragement online, Basecamp certainly did a great job of handling the issue as quickly and, more importantly, as transparently as possible. As we have repeatedly highlighted in previous blog posts about thought leadership, there’s a golden rule in crisis communications: validate concern and show action – and that’s precisely what happened. After informing users about the failure on Twitter early on, Basecamp made a great effort to provide consistent updates and reassure customers by specifying an exact time by which the system would be back online, even reminding users to account for time differences depending on their location. Needless to say, this proactive engagement helped the software company, which prides itself on offering a reliable service with an uptime record of 99.998%, to maneuver through this incident adeptly.

This week, Basecamp finally closed the book on the outage by presenting a concluding failure report to its customers, alongside another apology from the leadership.

The crux of the whole matter is, as Basecamp itself acknowledges repeatedly, that this failure was, above all, very much avoidable: an easy fix could have averted a fairly disproportionate outcome, had it only been applied a little earlier. With all that said, the takeaway of the story is this: carving out some time for ancillary meetings to re-evaluate what’s on your plate is of the utmost importance. As we have seen, prevention is always better than cure!