From 2:13am GMT March 13 / 9:13pm Central March 12 until around 4:10am GMT / 11:10pm Central, Basecamp 3 was mostly offline and Basecamp 2 unable to process file uploads and downloads, as our cloud storage provider had a severe, sustained outage. We continued to have minor disruptions in service from 4:10am GMT / 11:10pm Central until everything was cleared at 6:53am GMT / 1:53am Central.
This is the second time in a week that I’m forced to write “I’m so sorry”. That’s incredibly painful. Both because we’re failing our customers for the second time in a week, and because it’s showing us just how unprepared we’ve been as an organization to deal with these cloud challenges, despite our belief otherwise.
I’m not going to bother you with platitudes about “lessons to be learnt”, because I’ve already done that just a few days ago. This goes much deeper than just a few lessons. It has called into question our entire risk management and operational structure at Basecamp.
It’s also been a mighty fall. From reaching for 99.999% in uptime – the hallowed five nines! – we’re now scrambling for two of them. From riches of reliance to rags of shambles. To say this is humbling is an epic understatement.
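To put those nines in perspective, here is a back-of-the-envelope sketch of how much downtime each availability target leaves room for over a year. The function name and the 365-day year are illustrative assumptions.

```python
def allowed_downtime_minutes(nines: int, days: int = 365) -> float:
    """Downtime budget, in minutes, for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # five nines -> 0.00001, two nines -> 0.01
    return days * 24 * 60 * unavailability


for n in (5, 2):
    minutes = allowed_downtime_minutes(n)
    print(f"{n} nines: {minutes:,.1f} minutes per year ({minutes / 60:,.1f} hours)")
```

Five nines leaves a little over five minutes of downtime a year. Two nines leaves nearly 88 hours.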
We’re stopping all major product development at Basecamp for the moment, and dedicating all our attention to fixing these single points of failure that the recent cloud outages have revealed. We’re also going to pull back from our big migration to the cloud for a while, until we’re able to comfortably commit to a multi-region, multi-provider setup that’s more resilient against these outages.
From 4:30am GMT March 7 / 10:30pm Central March 6 until 1:02pm GMT / 7:02am Central, Basecamp 2 and the search feature in Basecamp 3 were mostly offline due to a catastrophic network failure with our cloud provider. Our primary network link, our backup network link, and several additional ad-hoc network links between critical services needed to run Basecamp 2 were all forced offline, as the cloud provider sought to deal with underlying network problems they were having.
Both Basecamp 2 and the search feature in Basecamp 3 are now fully back online.
But this was one of the worst outages we’ve had in the history of Basecamp. We’re incredibly sorry about just how long and broad of an interruption this caused, especially for our European customers of Basecamp 2. We’re so very sorry about this. We know this caused real and deep interruption to many people’s workflow from the early morning to the early afternoon on the main European timezones. And of course to any other customers around the world, including the US, who were also affected.
We’ve learned some hard lessons about network availability and the limitations of redundant, and double-redundant, backup connections. We’ll be working diligently to change how we work with cloud providers in the future, and how we can insulate ourselves and our customers from any future incidents like this. While this incident may have been triggered by network issues outside of our immediate control, it’s always within our control how we architect our systems, how we prepare for disasters, and how we ensure something like this never has the power to inflict such a traumatic outage.
So I want to make absolutely clear that this is our failure. Even in this new world of cloud services, it’s still always our fault when Basecamp isn’t available. Whatever the underlying problem for an outage, there’s always something you could have done to prevent it. And our list in this case includes a number of both obvious and not-so-obvious steps we could have taken. We will now take them.
Once again, I’m deeply sorry for this terrible outage. We will work as diligently as we can to ensure that this doesn’t happen to any version of Basecamp again, past or present. Thank you for understanding, thank you for your patience, and thank you for being a customer, even if you with all justification ran out of both understanding and patience during this utterly unacceptable outage.
Basecamp 3’s to-do lists keep you in the loop when you’re working closely with other members of your team. You get notified when someone assigns you a to-do and that person gets notified when you check it off.
This works great when it’s clear who needs to be assigned, but that’s not always the case. Sometimes you don’t know who should do the work, other times a to-do isn’t for anyone in particular and just needs to be logged.
Take a bug list, for example. People across your company might log software bugs, carefully documenting what’s broken and how to recreate it. In the moment, they might not be sure who to assign, so they log the bug and move on. But if you didn’t want those bugs to fall through the cracks, you’d have to monitor the list yourself.
It shouldn’t be on you to reload a to-do list every hour to see what’s changed. Starting today, you won’t have to!
A new notification
If you’d like to receive notifications when to-dos are added to a specific list, just go to that list in Basecamp. Inside the right-hand menu, you’ll see a new option to receive these notifications:
Once you’ve turned on notifications, you’ll see messages in your Hey! menu every time someone adds a new to-do to that list:
Prefer to get email notifications? No problem — we’ll bundle up these notifications so your inbox doesn’t get clobbered every time someone adds a to-do:
Just for you
These notifications are only for you and only for a particular list. Other people will have to opt in if they’d like to receive notifications, too. Want to stop receiving notifications? Just visit the to-do list, open the menu, and turn off the notification.
Give it a try!
We hope this update makes it easier to keep track of bugs, QA issues, and other unassigned tasks. Let us know what you think!
A great thing about the Home screen in Basecamp 3 is that your drafts, bookmarks, and assignments are all easily accessible. Load up Home, click a link, and see all of your assignments or your drafts in one place.
There’s just one problem with this approach: You have to leave whatever you’re doing to get to these links. We wanted to make it easier to find and access these “My…” links, and now you can!
Inspired by our mobile apps, we’ve removed these links from Home and added a dedicated menu called “My Stuff.” Now, no matter where you are, you can access your important links:
Open the menu and you’ll see your links up top plus a small collection of pages you’ve recently visited inside Basecamp:
While we were building this new menu, we found it so useful that we wanted to use it without taking our hands off the keyboard. Just like the Find menu, we’ve added a keyboard shortcut to open My Stuff:
My Stuff: ⌘/Ctrl + ;
Find: ⌘/Ctrl + /
Once open, you can use your up & down arrows to navigate the links. Choose the link you want and hit Enter to visit that page — no mouse required!
Give it a try!
We hope this update makes it easier to access things like your unpublished drafts, assignments, and Boosts. Let us know what you think!
Last Thursday, November 8th, Basecamp 3 was in read-only mode for almost five hours starting 7:21am CST and ending 12:11pm CST. That meant users could access existing messages, todo lists, and files, but no new information could be entered, and no existing information could be altered. Everything was frozen in place.
The root cause was that our database hit the ceiling of 2,147,483,647 on our very busy events table. Almost every single activity in Basecamp is tracked in this table. When you post a message, update a todo list, or applaud a comment, we track that activity in the events table. So when we became unable to write new events to that table, every attempt to do practically anything in Basecamp was halted.
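To make that ceiling concrete, here is a small sketch that uses Python’s struct module as a stand-in for a fixed-width column. It is not database code, and the helper name is made up for illustration. A signed 32-bit slot holds 2,147,483,647 but not the number after it, while a signed 64-bit slot has room to spare.

```python
import struct

INT32_MAX = 2_147_483_647  # the ceiling quoted above, i.e. 2**31 - 1


def fits_in_signed(value: int, fmt: str) -> bool:
    """True if `value` can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


print(fits_in_signed(INT32_MAX, "<i"))      # "<i" is a signed 32-bit integer
print(fits_in_signed(INT32_MAX + 1, "<i"))  # one past the ceiling
print(fits_in_signed(INT32_MAX + 1, "<q"))  # "<q" is a signed 64-bit integer
```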
This was an avoidable problem. We were actively working on expanding the capacity of the events table in the days prior to this outage, but we failed to properly account for how quickly we were running out of headroom.
To compound the avoidable factor, we should have been aware of the general issue much sooner. The programming framework we use, Ruby on Rails (which was originally extracted from Basecamp!), moved to a new default for database tables in version 5.1 that was released in 2017. That change lifted the headroom for records from 2,147,483,647 to 9,223,372,036,854,775,807 on all tables. Which ended up being the same root-cause fix that we applied to our tables.
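The two limits are the largest signed 32-bit and 64-bit integers, and the kind of headroom estimate we failed to make in time is simple arithmetic. A minimal sketch follows; the current id and the daily insert rate are invented numbers for illustration, not our real figures, and the function name is hypothetical.

```python
INT_MAX = 2**31 - 1     # 2,147,483,647
BIGINT_MAX = 2**63 - 1  # 9,223,372,036,854,775,807


def days_of_headroom(current_max_id: int, new_rows_per_day: int, ceiling: int) -> float:
    """Days until ids reach `ceiling` if rows keep arriving at a constant rate."""
    return (ceiling - current_max_id) / new_rows_per_day


# Invented example values: a table near the 32-bit ceiling, growing steadily.
current_max_id = 2_100_000_000
new_rows_per_day = 5_000_000

print(f"integer: {days_of_headroom(current_max_id, new_rows_per_day, INT_MAX):.1f} days left")
print(f"bigint:  {days_of_headroom(current_max_id, new_rows_per_day, BIGINT_MAX):.3g} days left")
```

With these made-up inputs the 32-bit column has under ten days left, which is the sort of number that should set off alarms well before the last id is handed out.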
It’s bad enough that we had the worst outage at Basecamp in probably 10 years, but to know that it was avoidable is hard to swallow. And I cannot express my apologies clearly or deeply enough.
We pride ourselves at Basecamp on being “boring software” because it just works and it’s always available. Since Basecamp 3 was launched, and up until this outage, we’ve had an uptime record of 99.998%. This near five-hour outage has taken that impressive statistic down to a more humbling 99.978%.
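As a rough sanity check on those two percentages, the sketch below works out how long a measurement window has to be for a 4 hour 50 minute outage to cost 0.020 percentage points. Both percentages are rounded, so the result is only approximate, and the function names are illustrative.

```python
from datetime import timedelta

# Figures quoted in the post; both percentages are rounded.
OUTAGE = timedelta(hours=4, minutes=50)  # 7:21am to 12:11pm CST
BEFORE_PCT = 99.998
AFTER_PCT = 99.978


def implied_window_hours(outage: timedelta, before_pct: float, after_pct: float) -> float:
    """Total hours of service over which `outage` lowers uptime from before to after."""
    drop_fraction = (before_pct - after_pct) / 100.0
    return (outage.total_seconds() / 3600.0) / drop_fraction


def uptime_pct(total_hours: float, downtime_hours: float) -> float:
    """Uptime percentage for a window with the given amount of downtime."""
    return 100.0 * (1.0 - downtime_hours / total_hours)


window = implied_window_hours(OUTAGE, BEFORE_PCT, AFTER_PCT)
print(f"implied window: {window:,.0f} hours (about {window / 24 / 365:.1f} years)")
```

With these rounded inputs the window comes out at roughly 24,000 hours, a little under three years.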
Some companies might choose to weasel around an outage like ours by claiming that it was only a “partial outage”, because the application remained available in read-only mode for the majority of this time. But that’s not what we’re going to do at Basecamp. We’re going to take the scar in our uptime record as a reminder to do better.
Because we owe it to everyone using Basecamp to do better. It’s embarrassing and humbling to have suffered the biggest outage at Basecamp in a decade from an issue that we should have addressed years ago, and that we were actively working on addressing, but failed to complete in time.
As the CTO of Basecamp and the creator of Ruby on Rails, I accept full responsibility for our failures. I should have been more vigilant with our own database schema when Rails 5.1 announced the new default, and I should have followed up and asked the right questions when we finally did start work on remediation. I’m really sorry to have failed you 😢
If you have any questions, or if we can help in any way, please reach out to our wonderful support crew who’ve been dealing with each report individually.
I also want to express my deep gratitude to everyone who’s been so gracious with their kind words of encouragement and support during and after this ordeal. I don’t know if we’ve earned such understanding, given our clear culpability, but we are extremely grateful nonetheless.
On a personal note, I want to apologize for not posting this postmortem until today. The plan was to have this final summary ready on Friday, but then the Woolsey fire hit, and our family was forced to evacuate our home in Malibu. It’s been a crazy week 😬
You could always Add a To-do, Upload a File, Post a Message, or Add an Event right from the Home Screen. Most people, however, just needed to browse projects or the Hey! menu without the “Big Green Add Button” in the way.
Now you can simply swipe up on the Home Screen navigation to reveal these Quick Add options. We’ve also added a list of Recently Visited sections for easy reference. Just tap on one of these to jump right back to it.
2. Comments with Image Galleries
We’ve improved the interface for commenting on Basecamp Messages. Now you can format your comments using Bold, Italic, and Bullets. You can also select multiple images to attach to form an Image Gallery in Basecamp.
Tap the Paperclip icon. Then select the images one at a time in the order you want them to appear. Tap Upload Files, and they’ll be grouped together into an Image Gallery. You can add and edit captions too.
3. Reply Directly inside a Notification
If your phone supports it (Android 8.0 and above) you can now have Basecamp conversations without opening the app. Just reply to a Ping or Message notification. You’ll see a running history of what’s been said.
We hope these features help you get around Basecamp more easily, give detailed suggestions, and reply to discussions without losing focus. We’ve got more planned! Stay tuned. Until then, get the latest on Google Play.
Basecamp 3 is now back online for reading and writing. All data was confirmed to be fully safe and intact. No emails that were sent to Basecamp during the outage were dropped. We may still have some backlogs on processing things like incoming emails, and you may still see some slowdowns here and there as we catch up. But we are back, and we are safe.
We will be following up with a detailed and complete postmortem soon. All in, we were stuck in read-only mode for almost five hours. That’s the most catastrophic failure we’ve had at Basecamp in maybe as much as a decade, and we could not be more sorry. We know that Basecamp customers depend on being able to get to their data and carry on the work, and today we failed you on that.
We’ve let you down on an avoidable issue that we should have been on top of. We will work hard to regain your trust, and to get back to our normal, boring schedule of 99.998% uptime.
Note: If you were in the middle of posting something new to Basecamp, and you got an error, that data is most likely saved in our browser-based autosave system. If it doesn’t appear automatically, we can help you recover that data. Please contact support if you’re in this situation, and we’ll have a team ready to assist.
Below is the timeline for today:
At 7:21am CST, we first got alerted that we had run out of ID numbers on an important tracking table in the database. This was because the column in the database was configured as an integer rather than a big integer. The integer runs out of numbers at 2147483647. The big integer can grow until 9223372036854775807.
At 7:29am CST, the team diagnosed the problem and started working on the fix. This meant writing what’s called a database migration where you change the column type from the regular integer to the big integer type. Changing a production database is serious business, so we had to test this fix on a staging database to make sure it was safe.
At 7:52am CST, we had verified that the fix was correct and tested it on a staging database, so we commenced making the change to the production database table. That table in the database is very large, of course. That’s why it ran out of regular integers. So the migration was estimated to take about one hour and forty minutes.
At 10:56am CST, we completed the upgrade to the databases. This was the largest part of the fix we needed to address the problem. But we still have to verify all the data, update our configurations, and ensure that we won’t have more problems when we go back online. We’re working on this as fast as we can.
At 11:33am CST, we’re still verifying that all data is as it should be for Basecamp 3. The database migration has finished, but the verification process is still ongoing. We’re working as fast as we can and hope to be back fully shortly.
At 11:52am CST, verification of the databases is taking longer than expected. We have 4 databases per datacenter and we have two datacenters with databases. So a total of 8 databases. We need to be absolutely certain that all the data is in proper sync before we can go back online. It’s looking good, but 99% sure isn’t good enough. Need 100%.
At 12:22pm CST, Basecamp came back online after we successfully verified that all data was 100% intact.
At 12:33pm CST, Basecamp had another issue dealing with the intense load of the application being back online. This caused a caching server to get overwhelmed. So Basecamp is down again while we get this sorted.
At 12:41pm CST, Basecamp came back online after we switched over to our backup caching servers. Everything is working as of this moment, but we’re obviously not entirely out of the woods yet. We remain on red alert.
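The timeline doesn’t go into how the verification across the 8 databases was done. One generic way to make “in proper sync” concrete, sketched here with made-up names and toy data rather than our actual tooling, is to compute a digest of the same table on every database and flag any copy that disagrees with the rest.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def table_digest(rows: Iterable[Tuple]) -> str:
    """SHA-256 over the rows in the order given (callers should sort by id)."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def out_of_sync(digests: Dict[str, str]) -> List[str]:
    """Names of databases whose digest differs from the most common digest."""
    reference, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, value in digests.items() if value != reference)


# Toy data: 2 datacenters x 4 databases, all holding the same two rows.
rows = [(2147483647, "message posted"), (2147483648, "todo completed")]
digests = {f"dc{dc}-db{db}": table_digest(rows) for dc in (1, 2) for db in (1, 2, 3, 4)}
print(out_of_sync(digests))  # an empty list when every copy matches
```

Only an empty result counts as 100%. A single name in that list means staying offline until it is explained.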
I will continue to update this post with more information, and we will provide a full postmortem after this has completed.
We should have known better. We should have done our due diligence when this improvement was made to the framework two years ago. I accept full responsibility for failing to heed that warning, and by extension for causing the multi-hour outage today. I’m really, really sorry 😢
Basecamp 3’s Message Board is a central place for your team to post updates and gather feedback on the record. It’s great for announcements, internal pitches, and just bouncing ideas back and forth.
Since Basecamp 3 first launched, the Message Board has been sorted so new posts appear at the top with older ones below. That’s great most of the time, but many of you have asked for other ways to sort your posts.
New ways to sort
With this update, we’ve added a new sort order setting to Basecamp 3’s Message Boards. You can access this setting on your computer, tablet, or phone from the menu in the upper right corner of the Message Board:
Now, you can sort your posts three ways:
By original post date: Messages posted recently will always be shown first. This is still the default setting.
By latest comment: Messages with new comments will be shown first. This keeps the most active discussions right up at the top.
Alphabetically A-Z: Messages will be sorted based on their title. If you use the Message Board like a table of contents for your team or company, this option will come in handy.
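For the curious, the three orderings can be sketched as three sort keys. This is a minimal illustration and not Basecamp’s code: the Message fields, the order names, and the choice to rank a message without comments by its post date are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Message:
    title: str
    posted_at: datetime
    last_comment_at: Optional[datetime] = None


def sort_messages(messages: List[Message], order: str) -> List[Message]:
    """Return a new list of messages in the requested (hypothetical) sort order."""
    if order == "original_post_date":
        return sorted(messages, key=lambda m: m.posted_at, reverse=True)
    if order == "latest_comment":
        # Assumption: a message with no comments is ranked by its post date.
        return sorted(messages, key=lambda m: m.last_comment_at or m.posted_at, reverse=True)
    if order == "alphabetical":
        # casefold() makes the A-Z comparison case-insensitive.
        return sorted(messages, key=lambda m: m.title.casefold())
    raise ValueError(f"unknown sort order: {order}")


board = [
    Message("Welcome", datetime(2019, 1, 1), datetime(2019, 3, 1)),
    Message("Budget", datetime(2019, 2, 1)),
    Message("agenda", datetime(2019, 1, 15), datetime(2019, 2, 10)),
]
for order in ("original_post_date", "latest_comment", "alphabetical"):
    print(order, [m.title for m in sort_messages(board, order)])
```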
Applies to everyone on the project
Whatever you choose, this will affect everyone on the project. That way, everyone will know where to put things and where to find things—you’ll all see posts in the same order.
Different projects, different settings
Each project has its own setting. If you prefer organizing your Company HQ alphabetically, your client projects by latest comment, and your marketing team by original post date, that’ll work great!
What’s more, Message Board posts on your project’s home screen will remember your sort order, too:
We hope this update gives you more flexibility and makes the Basecamp Message Board even more useful. Let us know what you think!
At the same time, we’ve gone back and retrofitted existing features and interactions for better accessibility. Today I’m excited to announce that we just completed some significant improvements to the Basecamp 3 Jump Menu!
The jump menu has always been the quickest way for getting to a person, project, recently visited page, and My assignments/bookmarks/schedule/drafts/latest activity. Here’s a look at it in action:
In setting out to make the jump menu more accessible we identified a few specific areas in need of help.
1. Provide an alternate way to trigger the menu
The ⌘/Ctrl + J shortcut for opening the jump menu isn’t communicated in a non-visual way, and initiating multi-key commands can be difficult for people who have motor function challenges.
To improve this, we added a button-based trigger, implemented as an invisible button that appears when someone first presses their tab key after loading up Basecamp. This technique is very similar to the common “Skip Navigation” link technique used around the web (we added one to Basecamp at the end of last year).
2. Clear non-visual instructions for how to interact with it
As a visual user it’s fairly obvious how the jump menu works: We show the placeholder “Jump to project, person, or recently visited page…” with a blinking cursor, and a list of entries below it that filters down as you type.
To clarify this interaction for customers using a screen reader, we created a visually hidden <span> element with more verbose instructions, “Type to filter and use the up and down arrow keys to navigate this list of people, projects, and recently visited pages.”
3. Announcing the selected item and number of results as you filter
If you’re using a screen reader to filter through a list, how do you know how many items are listed as your search term increases? And which item is selected as you arrow up/down or tab to navigate through the list of results?
The first step in making a complex element like this one accessible is doing some research. We look for examples of similar elements from around the web for inspiration and guidance on the proper markup to use. The W3C WAI-ARIA examples site (get ready for a long one! “World Wide Web Consortium’s Web Accessibility Initiative (for) Accessible Rich Internet Applications”) is a great place to start. The second example on their Combobox with Listbox Popup Examples page, “List Autocomplete with Automatic Selection,” seemed most similar to the behavior of our Basecamp jump menu.
Authoritative as this site may seem, it’s worth testing the examples on real screen readers. There’s an abundance of quirks across screen reader + web browser combos that means these examples often don’t work quite as expected. When that happens, additional code is often required to get screen reader announcements to fire in the way you’d like. Expect lots of trial and error 😊
The implementation we settled on uses the aria-activedescendant property. This technique provides a way to keep DOM focus on the <input> while updating your selection as you move through the list of results. This is the key that allows the screen reader software to understand what’s happening on the screen. Here’s a look at the final product in action, followed by all of the dynamic and static attributes we used to get this working. For further reading about these attributes check out the W3C article linked above where many of the following definitions are borrowed from.
1. On the combobox container <div>, our <bc-content-filter> element:
role="combobox": This identifies the element as a combobox.
aria-haspopup="listbox": This indicates that the combobox is associated with a pop up list of suggested values.
aria-owns="jump-menu__results": This associates the combobox with the results container.
aria-expanded="true": This indicates that the associated results listbox popup element is displayed. Since in our case the list of results is always shown when the jump menu is shown, we don’t need to toggle this attribute. If it only appeared after some text was entered, we would need to toggle the attribute between this and aria-expanded="false".
2. On the text box <input>:
aria-autocomplete="list": Indicates that the autocomplete behavior of the string that’s entered is to suggest a list of possible values in a popup.
aria-labelledby="a-jump-menu__description": A sort of backup label for instructions on how to use the jump menu.
aria-controls="jump-menu__results": Points to the popup element that lists the suggested values.
3. A non-visible status <span> to communicate the number of results (e.g. “Home, 1 of 14”). Making it an aria live region with role="status" and aria-live="assertive" ensures that the screen reader will immediately speak any new text content that gets pushed into it. Just make sure the <span> is present in the DOM before pushing text into it, or it won’t work!
role="status": A type of aria live region used for conveying advisory information.
aria-live="assertive": This makes sure that when the selection changes, announcing it takes priority over anything else the screen reader might be saying.
Dynamic attribute: When the jump menu is first rendered we inject the name of the auto-selected first item in the list followed by the directions for using the widget (“Type to filter and use the up and down arrow keys to navigate this list of people, projects, and recently visited pages”). As you arrow/tab through the list of entries, we use a helper to update the contents of the span to again communicate the current selection, followed by your current location in the list, for example “Management team project – Match 2 of 3”.
4. Another hidden description <span>, referenced by aria-labelledby, provides a better description for how to use the jump menu than the visual placeholder:
Text content: “Type to filter and use the up and down arrow keys to navigate this list of people, projects, and recently visited pages”
5. On the listbox results container <div>:
id="jump-menu__results": Used as a reference by the combobox element.
role="listbox": Defines it as a container for the list of results.
6. On each <article> element in the list of results:
A unique id for each result in the list.
role="option": This defines the element as a listbox option.
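The moving parts behind those attributes (the filtered list, the selected option, and the text pushed into the status region) can be modelled without any DOM at all. Here is a minimal sketch of that state logic, written in Python purely for illustration; our real implementation lives in the browser. The class name, the jump-menu__result-N id scheme, the “No matches” text, and the wrap-around at the ends of the list are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JumpMenuModel:
    entries: List[str]
    query: str = ""
    active: int = 0  # index of the selected item within matches()

    def matches(self) -> List[str]:
        """Entries that contain the typed text, ignoring case."""
        needle = self.query.casefold()
        return [entry for entry in self.entries if needle in entry.casefold()]

    def type_query(self, query: str) -> None:
        """Typing filters the list and auto-selects the first result."""
        self.query = query
        self.active = 0

    def move(self, step: int) -> None:
        """Arrow down (+1) or up (-1); wrapping around is an assumption."""
        count = len(self.matches())
        if count:
            self.active = (self.active + step) % count

    def active_descendant(self) -> Optional[str]:
        """Value for aria-activedescendant on the <input> (hypothetical id scheme)."""
        found = self.matches()
        if not found:
            return None
        return f"jump-menu__result-{self.entries.index(found[self.active])}"

    def status(self) -> str:
        """Text to push into the role="status" live region."""
        found = self.matches()
        if not found:
            return "No matches"
        return f"{found[self.active]} – Match {self.active + 1} of {len(found)}"


menu = JumpMenuModel(["Home", "Marketing team", "Management team project", "Team retreat"])
menu.type_query("team")
menu.move(1)
print(menu.status())
print(menu.active_descendant())
```

Keeping the selection as plain state like this is what lets DOM focus stay on the <input>: only the aria-activedescendant value and the status text change as you arrow through the results.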
Basecamp is hiring a data analyst to help us make better decisions in all areas of the business. This includes everything from running A/B tests with statistical rigor to forecasting revenue for the year to tracing performance problems to analyzing usage patterns.
We’re looking for an experienced candidate who’s done similar work elsewhere (as you’ll be the only one at Basecamp with this specialty). But nobody hits the ground running. You won’t be able to answer every question immediately or know how all the systems work on day one — and we don’t expect you to.
We want strong, diverse teams built from different backgrounds, experiences and identities. We’re ready for the ongoing work that goes into building an inclusive, supportive place for you to do the best work of your career. That starts with working no more than 40 hours a week on a regular basis and getting 8+ hours of sleep a night. Our workplace and our benefits are designed to support a sustainable, healthy relationship with your work. (We literally just wrote a book on the topic!)
Today, our team works from 32 different cities spread across 6 countries. You can work from anywhere in the world, so long as you can design a normal working day with 4 hours or more overlap with Chicago time (CST/UTC-6). Nomads welcome.
About the job
Data informs almost everything we do at Basecamp, but we’re not a “data-driven organization” in the sense that data dictates decisions. Data is there to clear the head, but ultimately we drive the company with our heart.
This means the job isn’t about maximizing revenue or minimizing costs. Yes, we want to make money and we don’t want to be wasteful, but we also want to be kind, considerate, fair, flexible, and calm. You won’t be looking for ways to squeeze the last sour drop out of the lemon at Basecamp.
But you will help us make sense of the data. Establish the facts. Put a price on the choices we make. Help us understand the business, our software, and its customers.
Here are some examples of projects you might work on:
Analyzing the performance of a new marketing page. Track the cohort that signed up with this variation. Keep us patient for a statistically significant result. Compute the value of the change.
Identify when a brute-force login attack started, quarantine the IP addresses involved, work with technical operations to bolster our defenses, and write up the forensics report at the end.
Analyze our purchase records to locate transactions within states that are starting to collect sales tax on software like ours, work with our accounting company to document that sourcing method, and help evaluate whether we should buy or build a sales-tax engine.
Help product strategy analyze usage data to figure out whether a certain feature is working as intended, and if it is, who it’s important to.
Illuminate how we’re spending money on cloud computing today, and estimate how much we’ll be spending next year, given our growth patterns.
Answer the question: Has Basecamp 3 gotten slower in the last 6 months? Compare aggregate performance data to find the high-level trends, then help us pinpoint data tipping points or code regressions.
Answering these questions usually means formulating and running queries against our big data infrastructure. But it also means just doing the basic math, and ensuring we’re being statistically rigorous. You should be able to do both the technical and statistical work to answer questions like the ones in the examples above.
That’s a lot of different areas of responsibility! So you probably won’t be an expert in all of them, and that’s fine. A solid fundamental approach to analysis will pave the way.
And you’ll have plenty of help! Basecamp has a Security, Infrastructure, and Performance (SIP) group that’s responsible for managing the data pipeline, storage, and analytical interfaces. And an Operations (Ops) group that’s responsible for running our servers, network, and cloud services. It’s a plus if you’re able to help evolve these systems, but by no means a requirement.
In broad strokes, Managers of One thrive at Basecamp. We’re committed generalists, eager learners, conscientious workers, and curators of what’s essential. We’re quick to trust. We see things through. We’re kind to each other, look up to each other, and support each other. We achieve together. We are colleagues, here to do our best work.
You’ll probably have a degree that has exposed you to the rigor of the analytical work. Social scientists welcome. If you don’t have a degree in Theoretical Statistics, that’s not a showstopper — and it’s not what we’re looking for, anyway! We care about what you can do and how you do it, not about how you got there.
While we currently have an office in Chicago, you should be comfortable working remotely — most of the company does! This means that the bulk of our work is written, whether that be in the form of long reports or short chats. We value good writers.
We also value people who can take a stand yet commit even when they disagree. We subject ideas to rigorous debate, but we all remember that we’re here for the same purpose: to do good work together. Charging the trust battery is part of the work.
About our pay & benefits
Our pay is within the top 10% of the industry, for the matched role and experience, based on San Francisco rates. This comes to a range at hiring of between $115,000 and $141,000, depending on your seniority. No matter where you live. Plus, with two years under your belt, you’ll participate in our profit-growth sharing program.
Our benefits at Basecamp are all about helping you lead a healthy life away from work. While we have a lovely office in Chicago, it’s not where you’ll find foosball tables constantly spinning, paid lunches, or any of the other trappings that companies use to lure employees into staying ever longer at work.
Work can wait. Our benefits include 4-day Summer Weeks, a yearly paid vacation, a one-month sabbatical every three years, and allowances for CSA, fitness, massage, and continuing education. We have top-shelf health insurance and a retirement plan with a generous match. See the full list.
How to apply
Please send an application tailored to this position that speaks to us. Introduce yourself as a colleague. Show us that future. As we said, we value great writers, so please do take your time with the application. Forget that generic resume. There’s no prize for being the first to submit!
We’d like to hear about how you’d approach some of the example projects outlined in the description about the job. Imagine you’re doing the work and walk us through your thinking.
All that being said, don’t send in a copy of War & Peace. We hire rarely at Basecamp, so when we do, there are usually hundreds of applicants. Be kind to the people doing application triage and keep your cover letter to fewer than 800 words and the thoughts on project approaches below the same ceiling.
Go for it!
We are accepting applications for this position until Friday, October 12. We’ll let you know that we’ve received your application. After that, you probably shouldn’t expect to hear back from us until after the application deadline has passed. We want to give everyone a fair chance to apply and be evaluated.
As mentioned in the introduction, we’re eager to assemble a more diverse team. In fact, we’re not afraid of putting extra weight on candidates from underrepresented groups at Basecamp.
We can’t wait to hear from you!
(And again, imposters: We are too. Take heart. Step up.)