
A New Definition of Data Science in Academic Programs
It’s time to get more specific about what Data Science entails.

Thu Vu
Feb 23 · 5 min read

Data science has been used in the past few years to refer to almost everything related to data (data analysis, data mining, machine learning, etc.). More and more people seek a data science education, and more and more universities and online platforms rush to develop such programs. However, the lack of clarity in how data science and the data scientist role are defined hurts everyone, however fashionable the terms may sound. Everyone projects onto the role what they want it to be:

Applicant: “I’m interested in doing machine learning on massive data sets. So I’m applying for a job as a data scientist!”

Business: “I need someone who can build a nice management dashboard from these Excel workbooks, so I’ll hire a data scientist!”

As a result, people who applied for a data scientist job are unhappy because they end up doing data extraction and dashboarding instead of any machine learning. Meanwhile, businesses realize they gain little more value than they would from a few good data analysts.

I’m personally very hesitant to call myself a data scientist, even though I do a lot of “stuff” with data.

What do data scientists do? Source.

So how should we define data science?

An MIT article published a few weeks ago (at the time of writing this post) proposes a new definition of data science and a new design for data science programs. The authors argue that data science is NOT a single discipline. Rather, it is an umbrella term that describes a complex process carried out by a team of data scientists with non-overlapping skills. Given the broad spectrum of activities and the many steps involved in extracting value from data nowadays, it is not plausible to assume that one data scientist possesses all the necessary expertise.

More clarity on what data science entails will not only help academic programs design better curricula, but also help learners and businesses understand exactly what to look for in, and expect from, such programs.

Backend and frontend data science

The article argues that there needs to be a clear distinction between backend and frontend data science. I summarize the idea below:
Backend and Frontend Data Science (illustrated by author based on MIT article)
The main roles involved in the data science pipeline are:

Data engineers, who deal with hardware, efficient computing, and data storage.

Data analysts, who wrangle and explore data, assess its quality, fit models to it, perform statistical inference, and develop prototypes.

Machine learning engineers, who build and assess prediction algorithms and make the solution scalable and robust for many users.

Data science software developers, who are not directly involved in producing data science pipelines but instead develop the software tools that facilitate data science. Examples include the developers of Hadoop, R, RStudio, IPython notebooks, TensorFlow, D3, pandas, and the tidyverse.

Each of these roles requires very different expertise that deserves its own track in a data science academic program.

I would also argue that, in reality, there are many more possible data science roles. One example is a data science translator/communicator (?) who can bridge the gap between the management group and the data science team. This person can explain complex data science concepts to lay people through visualizations or presentations. Many data science projects get postponed or never get funded because management does not fully grasp the idea behind them. Another role could be someone (a data science business developer?) who has strong domain expertise and, at the same time, understands data science concepts at a high level. This person is good at connecting the dots and spotting valuable data science opportunities that may benefit the business.

What to look for in a data science education

With the overview above, you can see that data science is in fact very broad, and that machine learning and modelling are a rather small piece of the puzzle. That means academia needs to define more precisely what its programs offer, learners need to define their goals more precisely, and businesses need to better understand what they are looking for.
The article suggests that academia prepare learners better by:

Three different tracks: offer specific tracks corresponding to the different aspects of data science, namely data engineer, data analyst, and machine learning engineer (and possibly others, such as data science software developer).

Bring applications to the forefront: emphasize the importance of real-world applications and of the subject matter of the problems. Courses need to be linked to actual
Real-world experience: for those who are interested in becoming data science software developers, capstone project courses on developing software packages are what to look for in a program. It is also of great importance that learners can produce reliable and reproducible code, because the data science pipeline or application will need to work for many users. This is an often neglected aspect of training.

Practical programming skills: strong programming training in the languages appropriate for specific roles and tasks:

Illustration by author based on MIT article.

Focus on graduate-level programs: the article recommends that data science degrees target the master’s or Ph.D. level rather than the undergraduate level.

If you are a learner looking for a data science education, it’s very important to ask yourself:

What specific role do you see yourself doing?

What are the relevant skills you would then need to gain?

Once you have answers to these questions, look at the curriculum and see whether it fits your needs and expectations.

If you are hiring for a business, it’s important to be careful with the term “data scientist” and to be as specific as possible in job descriptions. In the long term, everyone can benefit from this.

. . .
Thank you for reading. Enjoy





Thu Vu

Data analytics consultant. I write about data science, art and personal

Towards Data Science
A Medium publication sharing concepts, ideas, and codes.

