Data scientists mostly just do arithmetic and that’s a good thing

Hi, I’m Noah. I work at Basecamp. Sometimes I’m called a “data scientist.” Mostly, I just do arithmetic, and I’m okay with that.

Here are a few of the things I worked on in the last couple of weeks, each of them in response to a real problem facing the business:

  • I analyzed conversion, trial completion, and average invoice amounts for users in different countries.
  • I identified the rate at which people accidentally sign up for Basecamp when they mean to sign in to an existing account and how that’s changed over time.
  • I analyzed and reported on financial performance of a few of our products.
  • I ran and analyzed a survey of account owners.
  • I analyzed an A/B test we ran that affected the behavior of a feature within Basecamp.

In the last two weeks, the most “sophisticated” math I’ve done has been a few power analyses and significance tests. Mostly what I’ve done is write SQL queries to get data, perform basic arithmetic on that data (computing differences, percentiles, etc.), graph the results, and write paragraphs of explanation or recommendation.
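Even the “sophisticated” part is mostly arithmetic. As a rough illustration (not Basecamp’s actual analysis code), here is a minimal Python sketch, assuming a simple conversion-rate A/B test: a standard two-proportion z-test for significance, plus the usual normal-approximation sample-size formula used in a power analysis. The function names and the example numbers are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Hypothetical helper: compare conversion rates of groups A and B.

    Returns (difference in rates, z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p_b - p_a, z, p_value


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Hypothetical helper: users needed per group to detect p_base -> p_target."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = STD_NORMAL.inv_cdf(power)           # desired statistical power
    p_bar = (p_base + p_target) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_base * (1 - p_base) + p_target * (1 - p_target))
    ) ** 2
    return ceil(numerator / (p_target - p_base) ** 2)


if __name__ == "__main__":
    diff, z, p = two_proportion_z_test(100, 1000, 130, 1000)
    print(f"difference={diff:.3f} z={z:.2f} p={p:.4f}")
    print(sample_size_per_group(0.10, 0.12))
```

With these made-up numbers, detecting a lift from 10% to 12% conversion at the conventional 5% significance level and 80% power takes about 3,800 users per group. Everything above is square roots, ratios, and a lookup in a normal table.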

I haven’t coded up any algorithms, built any recommendation engines, deployed a deep learning system, or built a neural net.

Why not? Because Basecamp doesn’t need those things right now.

The dirty little secret of the ongoing “data science” boom is that most of what people talk about as being data science isn’t what businesses actually need. Businesses need accurate and actionable information to help them make decisions about how they spend their time and resources. Only a very small subset of business problems are best solved by machine learning; most just need good data and an understanding of what that data means, which is best gained using simple methods.

Some people will argue that what I’ve described as being valuable isn’t “data science”, but is instead just “business intelligence” or “data analytics”. I can’t argue with that arbitrary definition of data science, but it doesn’t matter what you call it — it’s still the most valuable way for most people who work with data to spend their time.

I get a fair number of emails from people who want to get into “data science” asking for advice. Should they get a master’s degree? Should they do a bunch of Kaggle competitions?

My advice is simple: no. What you should probably do is make sure you understand how to do basic math, know how to write a basic SQL query, and understand how a business works and what it needs to succeed. If you want to be a valuable contributor to a business, instead of spending your weekend working on a data mining competition, go work in a small business. Talk to customers. Watch which products sell and which ones don’t. Think about the economics that drive the business and how you can help it succeed more.

Knowing what matters is the real key to being an effective data scientist.