What Is Data Science and Why Do We Need More Data Scientists?
If you take a quick look at Glassdoor.com and their “50 Best Jobs in America” for 2017, you will notice that Data Scientist ranks at #1, alongside several other “data/analytics” titles in the list. Although this is evidently a highly desirable job, there appears to be a large talent gap in the field. Research by McKinsey Global Institute has projected that by 2018, the US could face a shortage of 140,000 to 190,000 professionals with deep analytical skills, i.e., advanced statistical training and knowledge of machine learning.
What exactly is data science? While not an entirely new concept, the terms “data science” and “data scientist” have exploded in popularity in the past 5-10 years. At its core, data science is about using scientific methods along with new technology systems to gather insights from massive amounts of heterogeneous data.
For the Federal Government in particular, there’s much to be done in order to attract and retain top data scientists who will work with some of the largest databases in existence. The Networking and Information Technology Research and Development (NITRD) Program has come up with a plan to address this issue. One of their strategies is to “Improve the national landscape for Big Data education and training to fulfill increasing demand for both deep analytical talent and analytical capacity for the broader workforce.” So there is clearly a focus on growing data science programs and preparing the next generation of super-savvy data nerds… but why?
- The importance of data-driven decision making cannot be overstated
- Data scientists are skilled at asking the right questions
- Data scientists are uniquely equipped to make sense of big data
1. Data-Driven Decisions
- In these fiscally constrained times, more and more agencies are being assessed on how effectively they are achieving their mission.
- Decisions regarding how to deploy the government’s biggest resources – people and money – for maximum efficiency must be evidence-based.
- However, there are many challenges to becoming a fully data-driven organization. A recent article by Deloitte University Press describes these obstacles and offers solutions and success stories for each one.
2. Defining the Data Question/Need
- Any worthwhile analysis begins with a question. The type of question determines what kind of data and which analyses are required.
- Data scientists have a strong background in research, which allows them to help frame the question and select the appropriate method of data analysis.
- It’s important to note that data can be used to answer many questions, but not all of them.
“The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.” (John Tukey)
3. Big Data
- Almost any data-related article from the past 5 years refers to the exponential growth in both the depth and breadth of data available (“big data”).
- This is perhaps best illustrated by IBM’s estimate that 90 percent of the world’s data has been generated in the past two years. That assertion was made in 2013, and the volume of data has only continued to grow since.
- We can no longer brush off the notion that big data is a global phenomenon; its potential benefits are now well-documented. It has already begun to shape the way many businesses and governments work, as well as how we experience everyday life.
- Data scientists are able to apply their knowledge of math, statistics, and computer science to all industries that collect and consume data on a large scale.
Who has been in one of these situations?
- Not enough data/information to answer the question you’re interested in or make the decision you need to make
- Too much data, so that you must sift through the irrelevant “noise” to focus on what is actionable
Data scientists are trained to deal with these challenges, all the while navigating messy, unstructured, and remote datasets. Using techniques such as machine learning, data mining, predictive analytics, deep learning, and cognitive computing, data scientists can begin to answer questions that organizations have not even thought to ask.