
The Myth of the Data Science Trap

Why you should focus on providing business value 📈 instead of training models 🤖

Andrei Pascanean
Jul 27, 2022 4:23:29 PM

A couple of weeks ago I read a post on Reddit’s r/datascience forum where a user complained that the current data landscape is skewed more towards analytics than machine learning. He argued that companies often set high requirements (some even ask for a PhD) only to have the candidate do data analytics and BI.

The rant ended with the user concluding that overqualified data experts are repeatedly tricked into accepting data analytics jobs that they don’t want.

So why do companies do this?

There’s levels to it

First of all, different organisations come with different levels of data maturity:

  • “Carol from accounting can mail you her Excel spreadsheet, quite some big data with about 20k rows!”
  • “This is our marketing dashboard, we use it to generate some reports based on the data in our database.”
  • “We build machine learning models. They make some interesting predictions and we plot them in our dashboard”
  • “Our models are constantly updated, with their predictions influencing decision making in every step of our workflow”

Each level corresponds to a certain tech stack; a combination of tools and methods that suit a company’s business processes.

An organisation that just started collecting and using data could be perfectly satisfied with a relational database and a simple dashboard. Not every company will be looking for an ML model from the get go.

Of course, as the company evolves and its data maturity improves, modelling will become more and more relevant.

More to life than models

OK, but what about companies that are relatively data mature? Why do they still require data experts to write tests for production code or manage container repositories?

Well you probably guessed it already — machine learning projects require more than just training a model. Data has to be cleaned and preprocessed in a robust pipeline, models have to be placed in production and predictions need to be served to end users.

[Figure: ML engineering model process. Source: Machine Learning Engineering by Andriy Burkov]
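To make that concrete, here is a minimal Python sketch of the stages around a model. Everything in it is made up for illustration: the `RAW_ROWS` data, the function names and the "model" itself (just an average used as a threshold). It is not taken from the book or from any real project; the only point is that training is one small function among several.

```python
from statistics import mean

# Hypothetical raw records: values arrive as strings and one is missing.
RAW_ROWS = [
    {"customer": "a", "monthly_spend": "120.5"},
    {"customer": "b", "monthly_spend": None},
    {"customer": "c", "monthly_spend": "80"},
]


def clean(rows):
    """Drop rows with a missing spend and cast the rest to float."""
    return [
        {"customer": r["customer"], "monthly_spend": float(r["monthly_spend"])}
        for r in rows
        if r["monthly_spend"] is not None
    ]


def train(rows):
    """'Train' a trivial model: the average spend becomes the threshold."""
    return {"threshold": mean(r["monthly_spend"] for r in rows)}


def predict(model, row):
    """Flag customers who spend more than the learned threshold."""
    return row["monthly_spend"] > model["threshold"]


def serve(model, rows):
    """Stand-in for a serving layer: predictions keyed by customer."""
    return {r["customer"]: predict(model, r) for r in rows}


cleaned = clean(RAW_ROWS)
model = train(cleaned)
predictions = serve(model, cleaned)
print(predictions)
```

Even in this toy version, `train` is a single line of logic, while the cleaning and serving code around it is what would grow in a real project.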

Most companies can’t afford to keep a data scientist working only on modelling, while remaining idle for the rest of the project. That’s why data professionals are also expected to work on data engineering, model deployment and MLOps tasks.

Most data teams — save for maybe R&D departments — spend only a fraction of their time building models.

By the way, that image I used is from one of my favourite ML books: Machine Learning Engineering by Andriy Burkov.

Great Expectations

So companies are looking for professionals that can do more than just train a model. Then why is there such a disparity between what companies offer and what applicants expect?

Learning data science often happens in a lab-like environment where datasets are squeaky clean, project scopes are clearly defined and there is no end user to cater to. This kind of environment makes sense if you’re only learning statistical concepts, but not so much if you’re learning to apply your craft in a business setting.

[Image: clean data meme] Don't be like Milton - be prepared.

This way of learning is common in online bootcamps and university courses, which leads to plenty of candidates having an unrealistic expectation of what working in the industry is actually like.

Always Be Adding Value

To recap: companies have different levels of data maturity, and they run projects that require more than just training a model. Meanwhile, applicants often have a completely different idea of what their day-to-day responsibilities will look like.

So how do we bridge this gap?

It’s important to realise that any project in data science has only one goal: adding business value.

Data science is about solving problems. There is no such thing as a data science problem — only business problems with data solutions.

The business-facing stakeholders on your project don’t care about how you arrive at your solution. The model you train doesn’t matter. Whether it’s a simple business rules model or a complicated neural network, the only thing that matters is how your solution improves business workflows. Adding this value is important.

So focus on finding value in providing value, whether that means building a dashboard for a less data-mature company, or writing API endpoints for deploying an already-built model.

TL;DR

It’s not a trap.

Working in this industry is all about providing value for business stakeholders by solving problems (using data).

For some problems the solution will be building a dashboard to visualise data, while other solutions will require more complicated models. It differs per maturity level and data science project.

Sure, sometimes university professors and online courses can give the impression that all you do as a data scientist is train models, but the skills they teach are still crucial.

You may not need to build a neural net each month — but when you do, you better know how.

