Why I disagree with the term "Data Scientist"

....and other grouchy grumblings from an old IT guy :-D

Okay, first I'll say, I've been knocking around with IT since the early days, yes, yes, yes, that means I'm biased, IT racist (if you will) and very set in my ways about how things should and should not work with IT.....

In fact, I'm not.  I'm actually quite the opposite.  I'm very flexible and open-minded.  That's how I've managed to stay relevant over the past 25+ years.  I've evolved with IT......where it makes sense.

I've been cautious when I needed to be, and on the odd occasion I've thrown caution to the wind and just gone with the flow to see where we'd end up.

However, all this new nonsense about A.I. (Artificial Intelligence) and the term "Data Scientist" as a job role just rubs salt into a paper cut for me.

1st, you are not a scientist.  Yes, okay, you probably were very good at maths (correct spelling) and statistics at school, did that weird class you could do at A-level (if they even do that anymore) and then moved on to University where you gorged on how to process data using Analytical thought (there's the nugget, right there!) and then, to feed your addiction, you wanted more, so you moved on to doing a Masters degree....oh, the addiction knows no bounds, you bask in your glory...and then it looks like you may actually have to stop doing this now and do a "real" job....hang on, one last saving grace awaits you....yes, you manage to convince someone you NEEED to do a Ph.D..... you are saved, 4 more years of rolling around half-naked with your numbers, stats and all those equations, you are a god in human form.  And, as they say, "then that also ends and you have to get a real job".

But, you now have more letters after your name than most people have in their names, therefore you can command a high salary and demand superiority and aloofness on the strength of being able to do some sums.  (Oh, I'm going to get some flak for that one, but stick with me!).

The pretty clueless people (okay, I'll say it, mostly about my age + 5 years) who are in charge of departments are being told "we NEEED these people for A.I., we NEEED the Data Scientists, the IT Industry is telling us we do".  So, they hire these people.

Then they throw them at client projects that come along.....and they fail.  They fail badly.  But, it's okay, "it was the client's fault, they didn't define their requirements properly", "they didn't set realistic goals", "they didn't really know what they wanted"...."we did our best".... blah blah blah...

So, something is a little wrong and no-one really wants to admit it.  Data Scientists are just Data Analysts.  We've always had these people, heck, back in 1996-2001 I was paid quite nicely for doing that very task.  Analysing and making sense of millions of rows of data relating to ticket prices and revenue sharing for a major European railway company....yes, it was mind-blowing....but, and I think this is the real point, being able to extract meaningful information from the massive amount of data is only 1 (yes ONE) tiny part of the equation.  You still NEEED to know about IT.  About writing applications (do NOT even dare to throw the "but I can code in python" sentence at me!), about deploying applications, about scaling....basically about all the "IT Infrastructure" items such as Load Balancers, firewalls, security, etc... all the things that you are taught when you learn about "proper IT".

The problem here is that companies are "ass-u-ming" that the "Data Scientists" know how to do all the "other IT stuff", because, well, because they are paying them a LOT of money.  Some will point blank refuse (I know, I've worked with some), some will give it a go, "what's the worst that can happen"...."(see above^^^^)".... others just swan off to some other company to carry on Sciencing some data and getting paid handsomely for it.

So, I suppose there's my disagreement.  There is an assumption that the "Data Scientist" knows all.  They don't.  They know half a dozen ML (Machine Learning) models that they can invoke with some python scripts and make some good sense out of massive amounts of data.  COVID-19 has proven that.  But it has also shown that garbage-in, garbage-out still applies.  We're still not at the A.I. level that people assume, else an A.I. would have worked everything out for us and developed a vaccine by now.


With that in mind, lie back and relax and I'll take you on a little journey back, okay, maybe only 20 years or so, and I promise I'll be brief.  It involves my old mate, Dion Ridley.  Dion's great.  Love him to bits & I hope he's being safe over in NY.

Dion is/was a Software Engineer.  I, on the other hand, am and probably always will be a Software Developer.  "What's the difference?", I hear you say.

Well, Dion will obsess over making widgets, making them as good and as fast as they possibly can be, with the most efficient code, so that they do the job and just the job they need to do, and they will do it bloody well (cos he's a great coder, just don't tell him that, his head is big enough as it is).  I, on the other hand, can make some widgets, they'll work, they'll do the job, but they'll mostly be for a Proof of Concept, just to give a skeleton of what really needs to be done.  They'll probably run slow and will use a ton of external libraries whose functions I have no real understanding of internally, but they'll work.  A bit like the black boxes you have under the bonnet in your car - they do their job and you don't argue.

And that is where I think the problem lies with "Data Scientists", they are the "Software Engineer" mindset people, they make algorithms that do specific things, they make ML widgets and they are REALLY good at it.

But, you need a "Software Developer" to be able to step back, look at the bigger picture of what a client/customer needs to achieve and then decide where those widgets can be placed to deliver maximum benefit.  The widgets are crucial, but they are just components 3, 7, 11, 19, 21 & 23 of a 30-component IT solution.

Here is a diagram that I believe shows visually what I'm trying to explain.  "AI" (or "ML" for the purists), is just a component piece of the bigger whole of a solution that you will deliver for a customer......"Data Scientists"....it's not "all" about you ;-)



Whilst I give "Data Scientists" a bad rap, it's not their fault, it's the resource managers and the IT depts hiring them who misunderstand their value and then assume they are something they are not.  Vice-versa is also true.  For instance, I'm NOT a "Data Scientist", many bosses have tried to push me into it, I've given it a go...I've done online courses, I even wrote a few python brains (50 lines of code) to drive a "car" around a course on my screen without crashing, but you know, it just didn't fit.  It wasn't applied.  It wasn't useful.  It was a widget.


Then I stumbled over this guy's couple of articles and I totally 100% agree with him.

https://towardsdatascience.com/dont-learn-machine-learning-8af3cf946214

Get to know what widgets are available, there are a lot of open source ones out there, then, if you need to drill down and do something specialist, go get your "Data Scientist" involved and they'll revel in it and you'll get an awesome application made and delivered.

Just make sure that the right people are doing the right work to get the best possible outcome.  If you keep asking that plumber to do auto-electrical work, soon you'll have a steam powered car on your driveway :-)

Amusingly, I've just been making "ESRI Map widgets", but that's a different thing entirely!


-------
UPDATE: Okay, so, as we all say, "It's my opinion, so don't go getting your nerve endings all sensitive and do the usual internet keyboard warrior thing".  But, alas, some of us have been knocking around in the IT industry for longer than some of you have been alive (oh, another sentence guaranteed to wind up the youngsters), and what that has given us is 'insight'.  What you think is new and amazing is usually just a rehash of something we all worked on 20 years ago, but hey, you weren't there, so you wouldn't know, and it's not the sort of stuff that gets taught or even passed down from master to apprentice (now I am showing my age).

This person just published this article, and to be fair he did a much more eloquent job with his words than I can - maybe he's a "proper internet writer" or something, but in essence he's saying the same thing as I was above.  Software Engineer.  Data Scientist. etc..etc...

https://towardsdatascience.com/machine-learning-engineers-will-not-exist-in-10-years-c9cbbf4472f3

This does raise a point though.  Back in 2000 the job title "Internet Strategist" existed (yes, I had that for a while!) and it sort of morphed into a new title because things moved on.  What I am REALLY hoping is that Machine Learning does just become an API.  A blackbox.  Something that you just call from your code and it does a thing.  Do you need to know what 20 python libraries it executed in the background to get the result?  Nope.  You just care about the output and how it relates to solving your problem.

So, yes, there will still be Machine Learning / Data Science people but they will be doing just that and only that, working on that blackbox and making it very smart.  That'll leave the Software Developers freed up to know they can plug these blackbox components in to meet certain AI-esque requirements, but they don't have to actually know how to do ML themselves.  Then the balance will return.
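
To make the "blackbox" idea a bit more concrete, here is a rough sketch in python.  Everything in it is made up for illustration: the names SentimentBox, KeywordSentimentBox and route_ticket are hypothetical, and the keyword-matching "model" is only a stand-in for whatever the ML people would really build.  The point is just that the calling code only cares about the output.

```python
from typing import Protocol


class SentimentBox(Protocol):
    """The only thing the developer needs to know about the blackbox (hypothetical interface)."""

    def predict(self, text: str) -> str:
        ...


class KeywordSentimentBox:
    """Stand-in for the real ML widget; a proper model would replace this class."""

    POSITIVE = {"great", "good", "love"}
    NEGATIVE = {"bad", "awful", "hate"}

    def predict(self, text: str) -> str:
        # Count positive and negative keywords; a real model would do something smarter.
        words = set(text.lower().split())
        score = len(words & self.POSITIVE) - len(words & self.NEGATIVE)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"


def route_ticket(message: str, box: SentimentBox) -> str:
    """The developer's part: decide what the application does with the output."""
    if box.predict(message) == "negative":
        return "escalate"
    return "standard queue"


if __name__ == "__main__":
    print(route_ticket("I hate this awful service", KeywordSentimentBox()))
```

Swap KeywordSentimentBox for something far smarter and route_ticket does not change at all, which is the split of work I'm hoping for.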

However, and this is the usual thing, the rush of companies being led by the slightly-out-of-touch will demand that they need all these Data Scientists and not employ Software Developers or Infrastructure Engineers, etc... or they'll let them go.  Then when the axe swings the other way - and it will - they'll cut down on the Data Scientists and will then have a panic rush to get the Software Developers back...but, oh no!  Then they'll claim there is a "skills shortage" for these people, so then they'll have to outsource them from some place else, except they don't know where from as they've never really had good candidates before, so they'll find yet another country out there that is willing to charge peanuts for these services and the skills gap will be filled.  Yay, pat those bosses on the back.  (Oh hang on, did I just describe 2000-2015 there?).  As I say, the IT Industry is actually very new and it makes a lot of mistakes, frequently, and it makes them over and over and over again.  No single person can make it stop, it takes proper leadership, but y'know, as they say, if you've been around long enough, you end up learning to go with the flow rather than disrupting it.  (Oh, that is so not me, but I unfortunately lack the power to influence).

So if you are a Software Developer, you stick at it, learn "just enough to get by", don't put your eggs in one basket, not that you would have done anyway, it's not in your nature.  Would you, in today's world, commit to ONLY coding in ONE language?  (Okay, there are a lot of people I know who only code in Java, as it's got them through the past 10+ years, but even they are now realising their shortfall.)  You "keep your options open".  Be flexible and adapt.  That way you'll survive and not get sucked into every new trendy / shiny thing that comes along.

(I'm not saying I always make the right choices there, I recall telling the head of SCEE (Sony Computer Entertainment Europe) in a job interview that I didn't see the point of connecting Playstation 1 devices up to the internet, why would you do that?....I'll just point out, it was 1996)

------------
FURTHER UPDATE:
So, this opinion article appeared on Medium today:

https://towardsdatascience.com/dont-become-a-data-scientist-ee4769899025

and yep, you guessed it, the title is "Don't become a Data Scientist, become a Software Engineer instead".

'nuff said.  ;-)
