On current events

It’s easy to say what a year, what a week.

But that’s a shortsighted, privileged point of view. I’m guilty of occasionally holding that perspective. It’s moments like these that jolt me into recognizing the deeper reality.

What we’re seeing is the culmination of years – decades, generations, and centuries – of unjust treatment against black people, minorities, and other marginalized communities.

This country’s racist history is shameful, and so is its present.

Deep systemic racism + the militarization of police (both physically in terms of gear, and mentally in terms of mindset) is a powder keg. We’ve seen sparks before, now we’re seeing the explosion.

If you’re surprised, you’re not paying attention.

I don’t like the violence, but I get it. This is what happens when people are squeezed, compressed, and backed into a corner with no way out. For years, for generations. We’re all humans – if your lot in life was different you just might do the same.

I support peaceful protests, I support the fight against racism, against oppression, and against injustice – wherever it hides.

There’s exceptionally hard work ahead. I recognize this work has been happening for years, often ignored or unappreciated by many people, including me. How frustrating it must be to work so hard, and see such little progress, on something so elemental.

Change will require a massive, sustained effort by millions over many years. A change in perspective, mindset, and approach. And that work will certainly be met with future setbacks, which is why change requires optimism, too (which is in short supply in moments like these). I hope we can find it, and support those who need it.

I’ll be working to educate myself, and break my own patterns of ignorance. This sense of urgency is, embarrassingly, new to me, so I have a lot to learn – which organizations to support, what books to read, what history to absorb, and who to listen to. I’m starting on that today. If you’re like me, I hope you’ll do the same.

-Jason Fried, CEO, Basecamp

Employee-surveillance software is not welcome to integrate with Basecamp

We’ve been teaching people how to do remote work well for the better part of two decades. We wrote a whole book about the topic in 2013, called REMOTE: Office Not Required. Basecamp has been a remote company since day one, and our software is sold as an all-in-one toolkit for remote work. Yeah, we’re big on remote work!

So now that COVID-19 has forced a lot of companies to move to remote work, it’s doubly important that we do our part to help those new to the practice settle in. We’ve hosted a variety of online seminars, appeared on podcasts, and advocated for healthy ways to do remote work right.

Unfortunately, the move to remote work has also turbo-charged interest in employee surveillance software. Drew Harwell’s harrowing report for The Washington Post should make anyone’s skin crawl, but it seems some managers are reading about these disgusting tools and thinking “oh, what a great idea, where can I buy?”.

And as fate would have it, some of those managers would then visit these employee surveillance vendors and see a Basecamp logo! 😱 These vendors were promoting their wares by featuring integrations with Basecamp, usually under the banner of “time tracking”. Yikes!

We’ve decided it’s our obligation to resist the normalization of employee surveillance software. It is not right, it is not human, and unless we speak up now, we might well contribute to this cancer of mistrust and control spreading even after the COVID-19 crisis is behind us. That is not something we in good conscience could let happen.

Keep reading “Employee-surveillance software is not welcome to integrate with Basecamp”

Hiring programmers with a take-home test

There’s no perfect process for hiring great programmers, but there are plenty of terrible ways to screw it up. We’ve rejected the industry staples of grilling candidates in front of a whiteboard or needling them with brain teasers since the start at Basecamp. But you’re not getting around showing real code when applying for a job here.

In the early days of the company, we hired programmers almost exclusively from the open source community. I simply tapped people I’d been working with on the Ruby on Rails project, and I knew that their code would be good, because I’d seen so much of it! This is how we hired Jamis, Jeremy, Sam, Josh, Pratik, Matthew, and Eileen.

But if you only consider candidates from the open source community, you’re going to miss out on plenty of great programmers who just don’t have the time or the inclination to contribute code to open source.

And unfortunately, it’s rarely an option for candidates to submit code from a previous job with an application to a new job. Unlike design, which is at least somewhat out in the open, commercial code is often closely guarded. And even if it wasn’t, it’s often hard to tease out what someone was actually personally responsible for (which can also be a challenge with design!).

So what we’ve started to do instead at Basecamp is level the playing field by asking late-stage candidates to complete a small programming assignment as part of the final evaluation process. I’m going to show you two examples of these projects, and the submissions from the candidates that ended up being hired.

Keep reading “Hiring programmers with a take-home test”

Seamless branch deploys with Kubernetes

Basecamp’s newest product HEY has lived on Kubernetes since development first began. While our applications are majestic monoliths, a product like HEY has numerous supporting services that run alongside the main app, like our mail pipeline (Postfix and friends), Resque (and Resque Scheduler), and nginx, making Kubernetes a great orchestration option for us.

As you work on code changes or new feature additions for an application, you naturally want to test them somewhere — either in a unique environment or in production via feature flags. For our other applications like Basecamp 3, we make this happen via a series of numbered environments called betas (beta1 through betaX). A beta environment is essentially a mini production environment — it uses the production database but everything else (app services, Resque, Redis) is separate. In Basecamp 3’s case, we have a claim system via an internal chatbot that shows the status of each beta environment (here, none of them are claimed):

(prior to starting work on HEY, we were running 8 beta environments for BC3)

Our existing beta setup is fine, but what if we could do something better with the new capabilities afforded to us by Kubernetes? Indeed we can! After reading about GitHub’s branch-lab setup, I was inspired to come up with a better solution for beta environments than our existing claims system. The result is what’s in use today for HEY: a system that (almost) immediately deploys any branch to a branch-specific endpoint that you can access right away to test your changes, without having to use the claims system or talk to anyone else (along with an independent job-processing fleet and Redis instance to support the environment).

Let’s walk through the developer workflow

  • A dev is working on a feature addition to the app, aptly named new-feature.
  • They make their changes in a branch (called new-feature) and push them to GitHub, which automatically triggers a CI run in Buildkite:
  • The first step in the CI pipeline builds the base Docker image for the app (all later steps depend on it). If the dev hasn’t made a change to Gemfile/Gemfile.lock, this step takes ~8 seconds. Once that’s complete, it’s off to the races for the remaining steps, but most importantly for this blog post: Beta Deploy.
  • The “Beta Deploy” step runs bin/deploy within the built base image, making a POST to GitHub’s Deployments API (there’s a sketch of that call just after this list). In the repository settings for our app, we’ve configured a webhook that responds solely to deployment events — it’s connected to a separate Buildkite pipeline. When GitHub receives a new deployment request, it sends a webhook over to Buildkite, causing another build to be queued that handles the actual deploy (known as the deploy build).
  • The “deploy build” is responsible for building the remainder of the images needed to run the app (nginx, etc.) and actually carrying out the Helm upgrades to both the main app chart and the accompanying Redis chart (that supports Resque and other Redis needs of the branch deploy):
  • From there, Kubernetes starts creating the deployments, statefulsets, services, and ingresses needed for the branch, and a minute or two later the developer can access their beta at https://new-feature.corp.com. (If this isn’t the first time the branch has been deployed, there’s no initializing step and the deploy just swaps out the images running in the deployment.)
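
For the curious, the deployment-creation step doesn’t need anything exotic — it’s a single POST to GitHub’s Deployments API. Here’s a rough shell sketch of that call (the repo slug, token handling, and payload are illustrative assumptions, not our actual bin/deploy):

branch=$(git rev-parse --abbrev-ref HEAD)

# one POST per deploy; GitHub answers with a deployment id and then fires the
# "deployment" webhook that the repo settings point at the separate Buildkite pipeline
curl --silent --request POST \
  --header "Authorization: token $GITHUB_TOKEN" \
  --header "Accept: application/vnd.github.v3+json" \
  --data "{\"ref\": \"$branch\", \"environment\": \"$branch\", \"auto_merge\": false, \"required_contexts\": []}" \
  https://api.github.com/repos/new-company/great-new-app/deployments

Setting auto_merge to false keeps GitHub from trying to merge the default branch into your ref first, and an empty required_contexts list means the deployment isn’t held up waiting on status checks.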

What if a developer wants to manage the deploy from their local machine instead of having to check Buildkite? No problem, the same bin/deploy script that’s used in CI works just fine locally:

$ bin/deploy beta
[✔] Queueing deploy
[✔] Waiting for the deploy build to complete : https://buildkite.com/new-company/great-new-app-deploys/builds/13819
[✔] Kubernetes deploy complete, waiting for Pumas to restart

Deploy success! App URL: https://new-feature.corp.com

(bin/deploy also takes care of verifying that the base image has already been built for the commit being deployed. If it hasn’t, it’ll wait for the initial CI build to make it past that step before continuing on to queueing the deploy.)

Remove the blanket!

Sweet, so the developer workflow is easy enough, but there’s got to be more going on below the covers, right? Yes, a lot. But first, story time.

HEY runs on Amazon EKS — AWS’ managed Kubernetes product. While we wanted to use Kubernetes, we don’t have enough bandwidth on the operations team to deal with running a bare-metal Kubernetes setup currently (or relying on something like Kops on AWS), so we’re more than happy to pay AWS a few dollars per month to handle managing our cluster masters for us.

While EKS is a managed service and relatively integrated with AWS, you still need a few other pieces installed to do things like create Application Load Balancers (what we use for the front-end of HEY) and touch Route53. For those two pieces, we rely on the aws-alb-ingress-controller and external-dns projects.

Inside the app Helm chart we have two Ingress resources (one external, and one internal for cross-region traffic that stays within the AWS network) that have all of the right annotations to tell alb-ingress-controller to spin up an ALB with the proper settings (health-checks so that instances are marked healthy/unhealthy, HTTP→HTTPS redirection at the load balancer level, and the proper SSL certificate from AWS Certificate Manager) and also to let external-dns know that we need some DNS records created for this new ALB. Those annotations look something like this:

Annotations:
  kubernetes.io/ingress.class:                             alb
  alb.ingress.kubernetes.io/listen-ports:                  [{"HTTP": 80},{"HTTPS": 443}]
  alb.ingress.kubernetes.io/scheme:                        internet-facing
  alb.ingress.kubernetes.io/ssl-policy:                    ELBSecurityPolicy-TLS-1-2-2017-01
  alb.ingress.kubernetes.io/actions.ssl-redirect:          {"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}
  alb.ingress.kubernetes.io/certificate-arn:               arn:aws:acm:us-east-1:############:certificate/########-####-####-####-############
  external-dns.alpha.kubernetes.io/hostname:               new-feature.us-east-1.corp.com.,new-feature.corp.com.

alb-ingress-controller and external-dns are both Kubernetes controllers that constantly watch cluster resources for annotations they know how to handle. In this case, external-dns knows it shouldn’t create a record for this Ingress resource until the Ingress has been issued an Address, which alb-ingress-controller takes care of in its own control loop. Once an ALB has been provisioned, alb-ingress-controller tells the Kubernetes API that the Ingress has an Address, and external-dns carries on creating the appropriate records in the appropriate Route53 zones. Here that means an ALIAS record pointing to the Ingress’s Address plus a TXT ownership record, both in a Route53 zone that lives in the same AWS account as our EKS cluster and has been delegated from the main app domain just for these branch deploys.
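
If you want to watch that hand-off happen for a fresh branch, a couple of read-only commands tell the story (the Ingress and host names below reuse the hypothetical new-feature example rather than our real chart’s names):

# ADDRESS stays empty until alb-ingress-controller has finished provisioning the ALB
$ kubectl get ingress new-feature-external
$ kubectl get ingress new-feature-external -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'

# once external-dns has done its part, the branch hostname resolves to the ALB,
# and the TXT ownership record external-dns uses to track what it manages sits alongside it
$ dig +short new-feature.corp.com
$ dig +short TXT new-feature.corp.com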

These things cost money, right, what about the clean-up!?

Totally, and at the velocity that our developers are working on this app, it can rack up a small bill in EC2 spot instance and ALB costs if we have 20-30 of these branches deployed at once running all the time! We have two methods of cleaning up branch-deploys:

  • a GitHub Actions-triggered clean-up run
  • a daily clean-up run

Both of these run the same code each time, but they’re targeting different things. The GitHub Actions-triggered run goes after deploys for branches that have just been deleted — it is triggered whenever a delete event occurs in the repository. The daily clean-up run goes after deploys that are more than five days old (we do this by comparing the current time with the last deployed time from Helm). We’ve experimented with different lifespans for branch deploys, but five days works for us — three is too short, seven is too long; it’s a balance.

When a branch is found and marked for deletion, the clean-up build runs the appropriate helm delete commands against the main app release and the associated Redis release. That cascades through everything: the Kubernetes resources are cleaned up and deleted, the ALBs are de-provisioned, and external-dns removes the records it created (we run external-dns in full-sync mode so that it can delete records it owns).
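
Stripped of the Buildkite plumbing, the daily pass amounts to looping over Helm releases and deleting the stale ones. Here’s a rough sketch of that loop — the namespace, the release-naming convention, and the Helm 3 / GNU date / jq usage are my assumptions for illustration, not the exact clean-up build:

# delete branch releases whose last deploy is more than five days old
cutoff=$(date --date='5 days ago' +%s)

for release in $(helm ls --namespace beta --short | grep -v -- '-redis$'); do
  # Helm records when each release was last upgraded; compare that to the cutoff
  last_deployed=$(helm status "$release" --namespace beta --output json | jq -r '.info.last_deployed')
  if [ "$(date --date="$last_deployed" +%s)" -lt "$cutoff" ]; then
    helm delete "$release" --namespace beta            # the branch's app release
    helm delete "${release}-redis" --namespace beta    # its companion Redis release
  fi
done

(The GitHub Actions-triggered run is the same idea, just scoped to whichever branch the delete event names.)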

Other bits

  • We’ve also run this setup using Jetstack’s cert-manager for issuing certs with Let’s Encrypt for each branch deploy, but dropped it in favor of wildcard certs managed in AWS Certificate Manager because hell hath no fury like me opening my inbox every day to find 15 cert expiration emails in it. It also added several extra minutes to the deploy provisioning timeline for new branches — rather than just having to wait for the ALB to be provisioned and the new DNS records to propagate, you also had to wait for the certificate verification record to be created and propagate, for Let’s Encrypt to issue your cert, etc etc etc.
  • DNS propagation can take a while, even if you remove the costly certificate issuance step. This was particularly noticeable if you used bin/deploy locally, because the last step of the script is to hit the endpoint for your deploy over and over again until it’s healthy. That meant you could end up caching an empty DNS result, since external-dns may not have created the record yet (likely, in fact, for new branches). We mitigate this by setting a low negative-caching TTL on the Route53 zone that we use for these deploys.
  • There’s a hard limit on the number of security groups that you can attach to an ENI, and there’s only so much tweaking you can do with AWS support to maximize the number of ALBs that you can have attached to the nodes in an EKS cluster. For us this means limiting the number of branch deploys in a cluster to 30. HOWEVER, I have a stretch goal to fix this by writing a custom controller that will play off of alb-ingress-controller and create host-based routing rules on a single ALB that can serve all beta instances. This would increase the number of deploys per cluster to 95ish per ALB (since an ALB has a limit on the number of rules attached), and reduce the cost of the entire setup significantly, because each ALB costs a minimum of $16/month and each deploy currently has two ALBs (one external and one internal).
  • We re-use the same Helm chart for production, beta, and staging — the only changes are the database endpoints (between production/beta and staging), some resource requests, and a few environment variables. Each branch deploy is its own Helm release.
  • We use this setup to run a full mail pipeline for each branch deploy, too. This makes it easy for devs to test their changes if they involve mail processing, allowing them to send mail to <their username>@new-feature.corp.com and have it appear in their account as if they sent it through the production mail pipeline.
  • Relying on GitHub’s Deployments API means that we get nice touches in PRs like this:
(complete with a direct link to the temporary deploy environment)

If you’re interested in HEY, check out hey.com and learn about our take on email.

Blake is a Senior System Administrator on Basecamp’s Operations team who spends most of his time working with Kubernetes and AWS in some capacity. When he’s not deep in YAML, he’s out mountain biking. If you have questions, send them over on Twitter – @t3rabytes.

We’ve refreshed our policies

Spring is emerging in the US and as part of our company spring cleaning, we took a peek at our product policies, noticed some cobwebs, and got out the duster.

You can read our current product policies here. Besides rewriting sections to be more readable, we made four substantive changes:

1. We’ve consolidated our policies across all products owned and maintained by Basecamp, LLC.
That includes all versions of Basecamp, Highrise, Campfire, Backpack, and the upcoming HEY. This change mostly affects our legacy application customers, bringing their (stale) terms and privacy policies up-to-date.

2. We’ve added more details to our privacy policy.
Our customers deserve to be able to easily and clearly understand what data we collect and why. We’ve restructured and fleshed out our privacy policy to do just that, while also adding more details on your rights with regard to your information. Just as important are the things that haven’t changed: that we take the privacy of your data seriously; that we do not, have not, and never will sell your data; and that we take care to not collect sensitive data that aren’t necessary.

3. We’ve introduced a Use Restrictions policy.
We are proud to help our customers do their best work. We also recognize that technology is an amplifier: it can enable the helpful and the harmful. There are some purposes we staunchly stand against. Our Use Restrictions policy fleshes out what used to be a fairly vague clause in our Terms of Service, clearly describing what we consider abusive usage of our products. In addition, we outline how we investigate and resolve abusive usage, including the principles of human oversight, balanced responsibilities, and focus on evidence that guide us in investigations.

4. We’ve adjusted how you can find out about policy changes.
In 2018, we open-sourced our policies by publishing them as a public repository on GitHub. One of the nice things about this repository is that it tracks all the revisions we make to our policies, so you can see what changed, when, and why. For instance, you can see every change we made to our policies in this refresh. You can also decide whether you want to get an email notification when changes are made by watching the repository. We’ll also be announcing any substantive changes here on SvN; if you prefer email updates, you can subscribe here.

As always, customers can reach us at support@basecamp.com with questions or suggestions about our policies. You can also open an issue in our policies repository if you’d like to contribute!

The Majestic Monolith can become The Citadel

The vast majority of web applications should start life as a Majestic Monolith: A single codebase that does everything the application needs to do. This is in contrast to a constellation of services, whether micro or macro, that tries to carve up the application into little islands each doing a piece of the overall work.

And the vast majority of web applications will continue to be served well by The Majestic Monolith for their entire lifespan. The limits that constrain this pattern are high. Much higher than most people like to imagine when they fantasize about being capital-A Architects.

But. Even so, there may well come a day when The Majestic Monolith needs a little help. Maybe you’re dealing with very large teams that constantly have people tripping over each other (although, bear in mind that many very large organizations use the monorepo pattern!). Or you end up having performance or availability issues under extreme load that can’t be resolved easily within the confines of The Majestic Monolith’s technology choices. Your first instinct should be to improve the Majestic Monolith until it can cope, but, having done that and failed, you may look to the next step.

Keep reading “The Majestic Monolith can become The Citadel”

Why HEY had to wait

We had originally planned to release HEY, our new email service, in April. There was the final cycle to finish the features, there was a company meetup planned for the end of the month to celebrate together, we’d been capacity testing extensively, and the first step of a marketing campaign was already under way.

But then the world caught a virus. And suddenly it got pretty hard to stay excited about a brand new product. Not because that product wasn’t exciting, but because its significance was dwarfed by world events.

A lack of excitement, though, you could push through. The prospect of a stressful launch alongside the reality of a stressful life? No.

That’s not because we weren’t ready to work remotely, or because we had to scramble to find new habits or tools to be productive. We’ve worked remotely for the past twenty years. We wrote a book on working remotely. Basecamp is a through-and-through remote company (and an all-in-one toolkit for remote work!).

But what’s going on right now is about more than just whether work can happen — it’s about to what degree it should. We’re fortunate to work in software, where the show doesn’t have to stop, as it has in many other industries, but the show shouldn’t just carry on like nothing happened either.

About half the people who work at Basecamp have kids. They’re all at home now, finding a new rhythm with remote learning, more cramped quarters, more tension from cooped-up siblings. You can’t put in 100% at work when life asks for 150%. Something’s gotta give, and that something, for us, had to be HEY.

And it’s not like life is daisies even if you don’t have kids. This is a really stressful time, and it’s our obligation at Basecamp to help everyone get through that the best we can. Launching a new product in the midst of that just wasn’t the responsible thing to do, so we won’t.

Remember, almost all deadlines are made up. You can change your mind when the world changes around you.

HEY is going to launch when the world’s got a handle on this virus. When we either find a new normal, living within long-running restrictions, or we find a way to beat this thing. We’re not going to put a date on that, because nobody knows when that might be. And we’re not going to pretend that we do either.

In the meantime, we’ll keep making HEY better. We’re also going to put in time to level up Basecamp in a number of significant ways that have long been requested. The work doesn’t stop, it just bends.

If you wrote us an email to iwant@hey.com, you’re on the list, and we’ll let that list know as soon as we open up. If you think you might be interested in a better email experience when that’s something we all have the mental space to think about again, please do send us a story about how you feel about email to iwant@hey.com.

Stay home, stay safe!

Working remotely builds organizational resiliency

For many, moving from everyone’s-working-from-the-office to everyone’s-working-at-home isn’t so much a transition as it is a scramble. A very how the fuck? moment.

That’s natural. And people need time to figure it out. So if you’re in a leadership position, bake in time. You can’t expect people to hit the ground running when everything’s different. Yes, the scheduled show must go on, but for now it’s live TV and it’s running long. Everything else is bumped out.

This also isn’t a time to try to simulate the office. Working from home is not working from the office. Working remotely is not working locally. Don’t try to make one the other. If you have meetings all day at the office, don’t simply simulate those meetings via video. This is an opportunity not to have those meetings. Write it up instead, disseminate the information that way. Let people absorb it on their own time. Protect their time and attention. Improve the way you communicate.

Ultimately this major upheaval is an opportunity. This is a chance for your company, your teams, and individuals to learn a new skill. Working remotely is a skill. When this is all over, everyone should have a new skill.

Being able to do the same work in a different way is a skill. Being able to take two paths instead of one builds resiliency. Resiliency is a super power. Being more adaptable is valuable.

This is a chance for companies to become more resilient. To build freedom from worry. Freedom from the worry that without an office, without those daily meetings, without all that face-to-face, the show can’t go on. Or that it can’t work as well. Get remote right, build this new resiliency, and not only can remote work work, it’ll prove to work better than the way you worked before.

Live Q&A on remote working, working from home, and running a business remotely

In this livestream, David and I answer audience questions about how to work remotely. At Basecamp we’ve been working remotely for nearly 20 years, so we have a lot of experience to share. This nearly two-hour video goes into great detail on a wide variety of topics. Highly recommended if you’re trying to figure out how to work remotely.

A live tour of how Basecamp uses Basecamp to run Basecamp

David and I spent nearly two hours giving a livestream tour of our very own Basecamp account. We wanted to show you how Basecamp uses Basecamp to run projects, communicate internally, share announcements, know what everyone’s working on, build software, keep up socially, and a whole bunch more. Our entire company runs on Basecamp, and this video shows you how.