What does it take to be a data scientist in 2022?
The breakneck pace of technological change means data scientists are contending with shifting expectations, technologies, tools and strategies.
There may be no such thing as a typical data scientist. As expectations of analytics have grown and the quantity of available data has increased, the data scientist’s role has become diverse, exciting and in demand.
For a long time the role of ‘data scientist’ lacked definition. It was a buzzy label that data-savvy people could give themselves on LinkedIn, with little push-back. For employers, it became a catch-all term for ideal candidates of impossibly numerous talents – with expectations set for ultra-technical statisticians, strategic leaders, and operations specialists all at once.
“We have to blow up this idea of the data scientist unicorn,” says Dr Usama Fayyad, director for the Institute of Experiential AI. Instead, candidates and employers should set realistic expectations, with this all-consuming ideal model sliced into its various components. Some people may be better at technically complex tasks like probabilistic modelling; others may be better at data translation – communicating results and fitting insights into a strategic, business setting – which may not be the best use of highly technical engineering or science skills.
There is movement towards resetting these expectations, though, a shift that is perhaps inevitable given the breakneck pace of technological change.
When data science was a smaller field than it is today (there were 250,000 job vacancies in 2020, according to some reports), the expectation was that data scientists had to know everything about all types of models and how they worked.
Now, there’s a broader recognition that the field just moves too quickly for that to be realistic, and it’s more commonly acknowledged that specialisations, such as NLP or forecasting, can be extremely useful, says Michael Shores, senior director of data science at Vista.
“This also applies to the languages data scientists work with – a few years ago, data scientists used to be polyglots, knowing R, SAS, Python, and possibly a few others,” Shores explains.
“But the field has coalesced around Python. This poses an interesting labour problem for highly regulated industries where ‘old’ languages like SAS are still being used, but there’s scant new talent that already know, or are willing to learn, these ‘dated’ languages.”
While there may be a convergence in the common languages used by data scientists, the opposite is true for the types of data they analyse: reams of unstructured data in all of its ‘modalities’ – whether audio, visual, table or text – and the new ways of interpreting it opened up by deep learning technologies.
“Deep learning is here to stay and it’s gone beyond simple optical character recognition,” says Edgar Meij, head of AI discovery at Bloomberg. “It’s become the multi-tool pocket knife for machine learning, so understanding the possibilities – and what the capabilities are not – as a data scientist is critically important.”
Additionally, says Meij, while there have been enormous advances in automated machine learning (AutoML) or automatic hyperparameter tuning, keeping humans in the loop will always be essential – for interpreting and making sense of everything from inputs to analysis.
The same principles apply to annotation, he adds: “It’s one thing as a data scientist to look at some data, query it, transform it, analyse it, model it and present the results. It’s another to get those results validated. So if you have a model that makes predictions – look at some of those predictions, and work towards getting a continuous annotation feedback loop going, allowing data scientists to lean on human judgements.”
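As a rough illustration of the feedback loop Meij describes, the sketch below picks a batch of model predictions for human review and then measures how often the annotators agree with the model. It is a minimal sketch under stated assumptions, not a description of how Bloomberg or anyone else does this: the `Prediction` record, the `select_for_review` and `agreement_rate` helpers, and the choice to review the least confident predictions alongside a random sample are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    """Hypothetical record of one model prediction awaiting validation."""
    item_id: str
    label: str                          # label predicted by the model
    confidence: float                   # model confidence in [0, 1]
    human_label: Optional[str] = None   # filled in later by an annotator


def select_for_review(predictions: List[Prediction], k: int, seed: int = 0) -> List[Prediction]:
    """Pick up to k predictions to send to human annotators.

    Assumption: half the batch is the least confident predictions,
    the rest is a random sample of everything else.
    """
    rng = random.Random(seed)
    by_confidence = sorted(predictions, key=lambda p: p.confidence)
    uncertain = by_confidence[: k // 2]
    remainder = by_confidence[k // 2:]
    sampled = rng.sample(remainder, min(k - len(uncertain), len(remainder)))
    return uncertain + sampled


def agreement_rate(reviewed: List[Prediction]) -> Optional[float]:
    """Share of annotated predictions where the human agreed with the model."""
    labelled = [p for p in reviewed if p.human_label is not None]
    if not labelled:
        return None  # nothing has been annotated yet
    return sum(p.label == p.human_label for p in labelled) / len(labelled)
```

In use, each review cycle would call `select_for_review` on fresh predictions, collect the annotators' labels, and track `agreement_rate` over time; a falling rate is a prompt to look more closely at the model or the data.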
Keeping abreast of the latest developments on the horizon of data science is one way that data practitioners can help to safeguard their skills for the rapidly changing future.
But it’s hardly just technical change on the horizon. Data scientists must grapple with responsible AI, and the concepts of fairness, accountability, transparency, ethics, and sustainability – FATES, for short – as uses of the technology become more common in our daily lives.
“Once seen as fringe, more people are paying attention to these aspects, especially when data science is used for more automated decision-making that involves people,” comments Professor Paul Clough at the University of Sheffield. “Linked to this is the ethical use of data, ensuring people are aware of how their data is used and have provided appropriate consent. Related to this are topics like explainable AI, seeking to make clear how algorithms, especially neural networks, arrive at an outcome.”
This is just one reason why it’s so vital that data scientists have a seat at the leadership table.
While most businesses will be cognisant of the transformational power of data, they will need people who can translate this most technical of subjects into the language of business, inform decision-making, and bring the know-how to embed transparency and ethics into the day-to-day.
After all, says Meij, quoting a motto frequently mentioned by his boss, Michael Bloomberg: “If you can’t measure it, you can’t manage it.”