
4 Reasons Why Economists Make Great Data Scientists (And Why No One Tells Them)
Matúš Lupták
Oct 23, 2015 · 7 min read

As far as job titles go, data scientist is one of the biggest buzzwords of
the last few years. It's also one of the more nebulous ones. What
actually is data science? Can you even study this? What do data
scientists do?

Yes, you can now study data science at some universities (Edinburgh's
Data Science program is one of the better ones), but most data
scientists come from other fields. Mathematics. Computer Science.
Statistics. Physics.

You know, the usual suspects — math-heavy courses that also expose
you to a lot of programming and algorithms.

But I want to suggest that economics is, surprisingly perhaps, a great
background for data science.

Yes yes yes. Please, hear me out. I know I am biased, but I really believe
there aren’t many degrees that give you better training for working in
data science than economics.

Economics?
As a graduate of economics, I've committed possibly the greatest sin of
the profession. I switched sides. To machine learning and data science.

Gasp.

I don't think I really switched sides, but the world — at least the
economics world — around you would have you believe that
econometricians and data scientists are sets without an intersection.
Data mining is somewhat of a bad word in econometrics, a field almost
religiously seeking causal inference and interpretability of results.

But when you actually look into what data science usually is, the
boundaries between more traditional econometrics/statistics and the
hip and cool machine learning become less and less clear (this
infographic is a great illustration of it: source).

But, really?
Reading through common data science job descriptions, you may get
the idea that economics is the worst training to have. Most economics
programs don't teach programming or databases, nor do they
come even close to machine learning. WTF is Hadoop? And Hive and
Pig? Is this a joke?

Specific skills aren't the most important thing, though. A solid background
is: one that will let you learn the specific skills quickly. And a good
economics education is indeed a solid background to have.

So here are 4 reasons why economists make great data scientists:

You already know machine learning!


Before you stop reading, thinking that I must've gone to a very weird
economics school to have learned machine learning there, read this:

Machine learning is really just a very fancy term for statistical/predictive
modelling that programmers invented to keep the uninitiated away from their
elite club (hey, they do know some economics after all: scarcity drives
prices up!).

In fact, the first two modules in the most popular machine learning
course on Coursera are, wait for it, linear regression and logistic
regression.

For the 99% of economists who took introductory econometrics, this may come
as a surprise, but you probably have a deeper knowledge of linear regression
than the average data scientist. Just as you may be freaked out by names like
“neural networks” or “support vector machines”, you'd have to work very hard
to find the term “heteroskedasticity” anywhere in machine learning syllabi.

And even the terms you may not know are often just examples of
skilful copywriting. Neural networks are a great example. It's
something that sounds incredibly complicated (are we modelling the
brain or what?), but on a (basic) fundamental level, they just combine
layers of logistic(-like) regressions to model more complex non-linear
relationships that a single regression may not capture (for great
primers on neural nets, see http://karpathy.github.io/neuralnets/ or
http://iamtrask.github.io/2015/07/12/basic-python-network/).
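
To make the "layers of logistic regressions" idea concrete, here is a minimal sketch in plain Python. It is not taken from either primer: the function names (`sigmoid`, `logistic_unit`, `tiny_network`) are made up for illustration, and the weights are hand-picked rather than learned. Two logistic units feed a third, and together they reproduce XOR, a pattern that no single logistic regression can fit because the two classes are not linearly separable.

```python
import math


def sigmoid(z):
    """Logistic function, the same link used in logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_unit(inputs, weights, bias):
    """One 'neuron': a logistic regression with fixed coefficients."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def tiny_network(x1, x2):
    """Two hidden logistic units feeding one output logistic unit.

    The weights are hand-picked assumptions (not estimated from data):
    the first hidden unit behaves like OR, the second like NAND, and the
    output unit behaves like AND of the two, which together gives XOR.
    """
    h_or = logistic_unit([x1, x2], [20.0, 20.0], -10.0)
    h_nand = logistic_unit([x1, x2], [-20.0, -20.0], 30.0)
    return logistic_unit([h_or, h_nand], [20.0, 20.0], -30.0)


# Round the output probability to a 0/1 prediction for each input pair.
predictions = {
    (x1, x2): round(tiny_network(x1, x2))
    for x1 in (0, 1)
    for x2 in (0, 1)
}
print(predictions)
```

In a real network the weights would be estimated from data rather than typed in by hand, but the building block stays the same.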

Granted, neural networks can go deep, far deeper than what I've just
described. Recurrent nets, convolutional nets, deep learning are all
much more complex topics — and much more powerful algorithms. But
for most machine learning applications, you should do just fine with far
simpler models: basic neural nets, decision trees, regressions, SVMs…
And with the statistical background from most econometrics courses, you
are not going to have any trouble grasping these concepts quickly (I
highly recommend that Coursera course).

You have higher standards


Hands up if you can still recite all OLS assumptions. And all the possible
threats to internal and external validity of your analysis.

Of course you can, you nerds.

At least in my experience, econometrics was obsessed with finding causal
relationships, and with making it really clear how difficult this is
without randomized controlled trials. And how sensitive most models
are to their basic assumptions. A lecture wouldn't pass without
someone mentioning yet another possible source of bias. Attenuation
bias. Survivorship bias. Selection bias. Measurement error. Reverse
causality. Truncation. Censoring.

For every problem there was another — more complicated — model that
was to deal with it. A model that also introduced its own bag of
assumptions and issues.

The world of econometrics was messy, uncertain and frustratingly limiting.

Warning: gross exaggeration ahead.

Compared to this, machine learning is beautifully straightforward.
Instead of solving models explicitly (relying on strict assumptions to
be able to do so), models are estimated iteratively with gradient
descent (and its variants). Instead of figuring out what the theory is
behind the relationship you are trying to study, and carefully selecting
explanatory variables and the appropriate model, you try all you can
think of and see if it sticks. Get used to cross-validation and testing.
Instead of t-statistics, why not try some bootstrapping?
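
Here is a small sketch of the "explicit solution versus iteration" contrast, using the one model every economist knows. It fits the same simple regression twice: once with the textbook closed-form OLS formulas, and once by plain gradient descent on the mean squared error. The function names, the five data points, the learning rate and the number of steps are all illustrative assumptions, chosen so the iteration converges on this toy example.

```python
def ols_closed_form(xs, ys):
    """Textbook simple OLS: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def ols_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimise mean squared error by repeatedly stepping against the gradient.

    learning_rate and steps are assumptions tuned for this toy data set;
    other data would generally need different values.
    """
    n = len(xs)
    intercept, slope = 0.0, 0.0
    for _ in range(steps):
        residuals = [intercept + slope * x - y for x, y in zip(xs, ys)]
        grad_intercept = 2.0 * sum(residuals) / n
        grad_slope = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope


# Made-up data, roughly y = 1 + 2x with some noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

exact = ols_closed_form(xs, ys)
iterative = ols_gradient_descent(xs, ys)
print("closed form:     ", exact)
print("gradient descent:", iterative)
```

For OLS the iteration is pointless, since the closed form exists. The appeal is that the same loop still works for models where no closed form is available.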

To econometricians, this may seem blasphemous. But that's only
because you are expecting the same from ML that you expected from
econometrics. Inference and causal interpretation. For the most part,
ML strives for prediction and discovering patterns, not causality. For
some models, you can't even say which variables are the most
important in predicting the results.
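
If prediction is the goal, the natural yardstick is error on data the model has not seen, and that is what cross-validation measures. Below is a minimal sketch of k-fold cross-validation for a simple regression, in plain Python. The names `fit_line`, `mean_squared_error` and `k_fold_mse`, the simulated data and the choice of five folds are assumptions for illustration only; in practice you would more likely reach for a library routine.

```python
import random


def fit_line(xs, ys):
    """Closed-form simple OLS; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared prediction error of the fitted line on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds.

    Each fold is held out once; the line is fitted on the remaining
    observations and scored only on the held-out ones.
    """
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(
            mean_squared_error(
                [xs[i] for i in held_out], [ys[i] for i in held_out], intercept, slope
            )
        )
    return sum(fold_errors) / k


# Simulated data (an assumption): y = 1 + 2x plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

in_sample_mse = mean_squared_error(xs, ys, *fit_line(xs, ys))
cv_mse = k_fold_mse(xs, ys)
print("in-sample MSE:      ", in_sample_mse)
print("cross-validated MSE:", cv_mse)
```

The in-sample number flatters the model because it is scored on the same observations it was fitted to. The cross-validated number is the more honest estimate of how well it predicts.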

Yes, neural networks may not be used in explaining the causal effect of
minimum wage on unemployment. But neither can you really expect
(multinomial) logit to be used to recognize handwriting. It's all about
using the right tools in the right applications — and I think
econometrics taught you a lot about that.

You actually know how to write coherent sentences


Data science isn't just fancy algorithms, though. Unless you are an
academic researcher who only writes theoretical papers (in which case
you probably wouldn't be reading this anyway), presentation and
writing are big parts of data science. Just as they are in economics.

If you work as a data scientist anywhere in the “real world”, you'll have
to present your results to non-technical audiences — managers,
marketers and copywriters, customers and clients. And you'll have to be
able to show why your results matter and how normal folk can use them
and act on them.

As economists, I'd wager you've written your fair share of papers, essays,
reports, presentations and dissertations in your time at university. Don't
underestimate this skill. In fact, it probably puts you well ahead of most
computer scientists and mathematicians when it comes to presenting and
explaining your work clearly, and to putting together longer pieces of text
that have structure and logic behind them.

Python isn't hard


Alas, you will probably also have to write code, not just words, if you
want to work in data science. But it's not like economists don't have to
write code, too. True, Stata isn't a “proper” programming language, but
it's a great introduction to statistical computing. And if you go on to
graduate studies, many economics programs have you learn other
languages anyway: Python is very common, as are R and Matlab.

Fortunately, Python's become the programming “lingua franca” of data
science. Not only has it got a great selection of libraries (NumPy, SciPy,
scikit-learn, statsmodels, pandas, Matplotlib, Seaborn…), but it's also
a very legible and easy-to-learn language and you've probably come
across it anyway.

And if you haven't, just learn Python. R may be powerful too, but the
syntax is an abomination and it's kind of slow with bigger datasets.
Matlab is commercial software, and while it is great (and fast) at
mathematical computing and it has an open-source alternative
(Octave), it's not that common. Julia is too obscure and still a bit too
young.

So why does no one tell you this?


Apparently, economists should make great data scientists. So why does no
one tell them at university that this is a very real career choice? For
one, it's all relatively young. And course prospectuses are slow to
change, favoring more traditional options in finance, academia,
government…

But I also think there is a bit of prejudice in the economics world
against data science. That it's beneath an economist to go into data
science. That they are concerned with greater issues.

Which is a shame. Because economics gives its graduates a unique blend of
technical/statistical and soft/human skills that is much harder to come by
in mathematics and CS departments. And perhaps data science positions would
benefit from having careful econometricians do the job: people aware of all
the possible shortcomings of data mining and of just trying all that might
work. Just as econometricians might learn from ML when it comes to testing
and cross-validation and algorithmic approaches to estimation.

So give it a try. Follow the links in this article. See if it catches your
fancy. And don't think that just because you don't know what Hessians
are, you can't go into machine learning.

(This isn't meant to be a guide for economists on how to become data
scientists. But it should give you plenty of things to think about and
expand your range of possible career options. I may write more specific
“tutorial” articles later.)
