Should Data Scientists Be Licensed?

Licensing could lead to increased public safety, but at the cost of slowing down innovation

Andrei Lyskov
Towards Data Science

--

Every day your life is affected by machine learning algorithms. Some are innocuous, such as movie recommendations on Netflix. Others, such as those used in loan approval and in bail and sentencing decisions, could cause serious harm if they aren’t developed properly. The growing influence of these models raises the question of whether data scientists should be licensed, as lawyers and doctors are.

A formal profession would require data scientists to maintain a certain level of technical competency, adhere to a code of conduct, and self-regulate through a ratified board composed of professionals in the field. Since data science requires special training, and malpractice can have severe consequences, there is a strong case to be made for the formation of a data science profession.

However, data science is still in its infancy relative to other licensed fields, and it’s expected to segment and change over the coming years. Despite the field’s infancy, the negligence of one data scientist can have far more impact than the negligence of one doctor, because a single model may affect many people at once. As such, there needs to be a way to punish bad actors and maintain public trust, whether through a professional licensing board that restricts the employment of non-licensed data scientists or through a certificate that attests to a data scientist’s integrity. The end goal remains the same: protect the public from professional incompetence.

Benefits of Licensing

The most obvious reason for licensing is to protect the public. When you go to a doctor, you have certain expectations of the service. You know that doctors are held to standards, and that if they misbehave there will be repercussions, such as losing their ability to practice medicine. Data scientists who misbehave, however, can simply find another job and continue their careers.

The other benefit of licensing is that it standardizes the education and expectations for data scientists. Right now there is a lot of confusion about the role of a data scientist, which leads many unqualified individuals to rebrand themselves as data scientists to take advantage of higher pay and prestige. With a standard created through licensing, low-quality workers who cannot meet the new entry requirements would be forced out, while the more driven ones would have to engage in job-related training to meet the new expectations. In the UK, for example, the introduction of occupational licensing for security guards and care workers led to a rise in qualification levels and job-related training.

Drawbacks of Licensing

Yet the reasons against licensing are just as compelling. For example, it’s been shown that excessive levels of licensing can hinder job creation, especially for individuals with lower levels of education, which could deepen inequality.

Deciding who is qualified can also be a difficult task, particularly because data scientists come from all kinds of backgrounds. One small study (n=1001) explored the differences in data scientists’ backgrounds in areas such as academic studies (20% computer science, 19% math, 19% economics) as well as the fields they work in (42% technology, 37% industrial, 16% financial, 5% healthcare), illustrating a sharp contrast in backgrounds. Additionally, since the field is so new, it’s likely that the data scientists of the future will have different skills from the data scientists of today.

Another issue is that a high barrier to entry can curtail innovation. This is especially true for immigrant talent: people who are already competent may need to repeat training they received in their home countries. With data science already experiencing a shortage of practitioners, any form of professional licensing would very likely hurt job growth and significantly reduce the rate at which new practitioners enter the field. These issues would be a direct result of the monopoly created in the form of a data science association, which would restrict the supply of labor and reduce competition.

Putting It All Together

At its core, the issue of licensing a profession is about protecting the public from bad actors and ensuring consistent quality. Other benefits include standardizing the education and expectations for data scientists and reducing the number of low-quality workers. On the other hand, because data science is such a new field, licensing may hinder job creation and slow down innovation. It may also lead immigrant talent to seek employment in countries that don’t have a high barrier to entry.

While the mechanics of implementing licensing may prove difficult, certificates can be a less restrictive alternative. A professional association can administer exams, and companies that want to ensure the quality of their data scientists may choose to prioritize candidates with certificates. Another alternative is to license and regulate only those data scientists who work with sensitive data or large-scale models. In all likelihood, as the field matures and the role of a data scientist becomes more standardized, it may make sense to revisit the question of licensing, or at the very least to create some sort of check and balance. As it stands now, implementing any form of restrictive licensing process may end up doing more harm than good.

--

Data Scientist at Coinbase writing about Data Science, Quantified Self, Philosophy and other topics I find interesting