What's hard about hiring data roles

Roy Keyes

9 Aug 2021


Hiring is hard and data hiring has special challenges

I recently released a book about hiring called Hiring Data Scientists and Machine Learning Engineers. Some of the questions I've fielded related to my book are:

These questions are at the core of why I wrote my book. Hiring in general is difficult, but hiring for data roles comes with some special challenges. In this post I will lay out my view on why data roles are specifically hard to hire for.

Hype and hope

Data science and machine learning have emerged in the last ten years as particularly promising approaches to a number of business problems. Organizations have access to more data than ever, and the hope that business value can be extracted from that data has reached a fever pitch.

The hype around data science and machine learning (and something called "artificial intelligence") has driven all sorts of organizations to build out data teams. The hype has also led a huge number of people to enter the field in the hopes of landing an interesting and highly compensated career (remember the "sexiest job of the 21st century"?).

These are still the early days of this era of data science and machine learning. While roles and titles have evolved somewhat, the field is extremely broad and there is still a lot of disagreement on what data scientists, machine learning engineers, and related roles actually do. A data scientist at one company may have a very different skill set and set of duties than a data scientist at another company, despite the shared title.

Hiring amid the hype

The newness, hype, and lack of clarity create a number of challenges for people trying to hire for these roles.

Because these fields are relatively new, there are few really experienced data scientists and machine learning engineers, and also few people who are really experienced in hiring for these roles. The easiest thing to do has been to simply use or adapt hiring strategies from more established roles, such as software engineering. This creates challenges, as the wrong assessment strategies will lead to more false positives (hiring people who turn out to be a poor fit) and false negatives (rejecting people who would have been a good fit) in the process.

The newness also means that experienced candidates can be very difficult to find and, because the supply is so low, very expensive when you do find them. Organizations are faced with the fact that experienced talent may simply be outside their budgets.

The wide breadth of topics, skills, and technologies that fall under data science and machine learning means that there is often a mismatch between the open role and the candidates. It's frequently not clear to either side what the other brings to the hiring table, which can lead to confusion and frustration.

"I thought this was a data science role!"

"It is!"

This lack of alignment on basic role definitions is a common source of issues.

The hype has led to a huge influx of people joining the field, leading to a glut of early-career people applying for data scientist (DS) and machine learning engineer (MLE) roles. This means that a hiring manager looking to fill more junior-level roles is likely to be overwhelmed by the volume of applicants. A team looking to fill both junior-level and senior-level positions may be faced with two extremely different sourcing problems: too many and too few applicants at the same time.

Overcoming these challenges

To hire effectively and efficiently, you need to understand these challenges and design a hiring strategy to address them. Core aspects of such a strategy are:

You can read more in depth on these issues in my book, which is available at dshiring.com.