Stop Expecting Data Scientists to Be Magical: Analytics Is a Team Sport
Many organizations put unreasonable expectations on data scientists. Their job descriptions and requirements often call for superhuman abilities. “They” say (and who are “they,” anyway?) that modern-day data scientists must be good at absolutely everything. Okay, then, what’s “everything” in this case?
First, data scientists must have a deep understanding of mathematics and statistics, covering regression models, machine learning, decision trees, clustering, forecasting, optimization, etc. Basically, if you don’t have a post-graduate degree in statistics, you will fail at “hello.” The really bad news is that even people with statistics degrees are not well-versed in every technique and subject. Each has their own specialty, much like medical doctors.
Then data scientists must have advanced programming skills and deep knowledge of database technologies. They must be fluent in multiple programming languages in any setting, easily handling all types of structured and unstructured databases and files in any condition. This alone is a full-time job requiring expert-level experience, as most databases are NOT in analytics-ready form. It is routinely claimed that most data scientists spend over 80% of their time fixing the data. I am certain these folks didn’t earn advanced degrees in statistics to do data plumbing and hygiene work all the time. But that is how it is, as they won’t see what we call a “perfect” dataset outside of school.
Data scientists must also have excellent communication and data visualization skills, being able to explain complex ideas in plain English. It is hard enough to derive useful insights from mounds of data; now they have to construct interesting stories out of them, complete with exciting punchlines and actionable recommendations at the end. Because most mortals don’t understand technical text and numbers very well (many don’t even try, and some openly say they don’t want to think), data scientists must also develop eye-popping charts and graphs using the popular visualization tool du jour. (Whatever that tool is, they’d better learn it fast.)
Finally, to construct the “right” data strategies and solutions for the business in question, the data scientist should have really deep domain and industry knowledge, at the level of a management or marketing consultant. On top of all that, most job requirements also mention soft skills, since “they” don’t want data geeks with nerdy attitudes. In other words, data scientists must come with kind and gentle bedside manners, while being passionate about the business and about boring stuff like mathematics. Some even ask for childlike curiosity and the ability to learn extremely fast. At the same time, they must carry the authority of a professor, able to influence non-believers and evangelize the mind-numbing subject of analytics. This last part about business acumen, by the way, is the single most important factor that separates excellent data scientists, who add value every time they touch data, from data plumbers, who just move data around all day long. It is all about being able to assign themselves the right homework.
Now, let me ask you: Do you know anyone like this, with all of these skills and qualities in “one” body? If you do, how many of them do you personally know? I am asking this question in the sincerest manner (though I am quite sarcastic by nature), as I keep hearing that we need tens of thousands of such data scientists right now.
There are musicians who can write music and lyrics, determine the musical direction as a producer, arrange the music, play all necessary instruments, sing the song, record, mix and master it, publish it, and promote the product, all by themselves. It is not impossible to find such talents. But if you insist that only such geniuses can enter the field of music, there won’t be much music to listen to. The data business is the same way.
So, how do we divide the task? I have long used this three-way division of labor, created by my predecessors, as it has worked very well in every circumstance:
- A Statistical Analyst will have deep knowledge of statistical modeling and machine learning. They are at the core of what we casually call analytics, which goes well beyond simple rule-based decision-making. But these smart people need help.
- A Master Data Manipulator will have excellent coding skills. These folks will provide analytics-ready datasets on silver platters for the analysts. They will essentially take care of all of the “before” and “after” steps around statistical modeling and other advanced analytics. It is important to remember that most projects go wrong in data preparation and post-analytics application stages.
- A Business Analyst will need a deep understanding of business challenges and the industry landscape, as well as functional knowledge of modeling and database technologies. These are the folks who will prescribe solutions to business challenges, create tangible projects out of vague requests, evaluate data sources and data quality, develop model specifications, apply the results to the business, and present all of this in the form of stories, reports, and data visualization.
Now, achieving master-level expertise in even one of these areas is really difficult. People who are great in two of these three areas are indeed rare, and they will already hold “chief” or “head” titles somewhere, or run their own analytics practices. Do you insist on procuring only data scientists who are great at everything? Good luck to you.
Too many organizations trying to jump on the data bandwagon hire just one or two data scientists, dump all kinds of unorganized and unstructured data on them, and ask them to produce something of value, all on their own. Figuring out what type of data or analytics activity will bring monetary value to the organization isn’t a simple task. Many math geeks won’t be able to clear that first hurdle by themselves. Most business goals are not in the form of logical expressions, and most of the data they will encounter on that analytics journey won’t be ready for analytics, either.
Then again, strategic consultants who develop a data and analytics roadmap may not be well-versed in actual modeling, machine learning implementation, or database constructs. But such strategists should operate on a different plane, by design. Evaluating them on coding or math skills would be like judging an architect by how well they handle building materials. Should they be aware of the value and limitations of data-related technologies and toolsets? Absolutely. But that is not the same as being hands-on, at a professional level, in every area.
Analytics has always been a team sport. It was like that when datasets were smaller and computers were much slower, and it is still like that now that databases are huge and computing is lightning fast. What remains constant is that, in data play, someone must see through the business goals and the data assets around them to find the best way to create business value. In executing such plans, they will inevitably encounter many technical challenges, and they will need expert-level technicians to plow through the data firsthand.
Like any creative work, such as music production or movie-making, data and analytics work must start with a vision, tangible business goals, and project specifications. If these elements are misaligned, no amount of mathematical genius will save the day. Even the best rifle is useless if the target is hung in the wrong place.
Technical aspects of the work matter only when all stakeholders share an understanding of what the project is all about. Simple statements like “maximizing customer value” need to be translated by someone who knows both business and technology, as value can be expressed in dollars, visits, transactions, dates, intervals, status, or any combination of these variables. These seemingly simple decisions must be made methodically and with a clear purpose, as a few wrong assumptions by the analyst at hand (who may never have met the end user) can easily send the project in the wrong direction.
Yes, there are people who can see through everything and singlehandedly take care of it all. But if your business plan requires such superheroes and nothing but such people, you must first examine your team development roadmap, org chart, and job descriptions. Pushing those poor recruiters to find unicorns within your budget won’t get you anywhere; that is not how you’re supposed to play this data game in the first place.
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology worlds with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.”