As far as job titles go, data scientist is one of the biggest buzzwords of
the last few years. It's also one of the more nebulous ones. What
actually is data science? Can you even study this? What do data
scientists do?
Yes, you can now study data science at some universities (Edinburgh's
Data Science program is one of the better ones), but most data
scientists come from other fields. Mathematics. Computer Science.
Statistics. Physics.
You know, the usual suspects — math-heavy courses that also expose
you to a lot of programming and algorithms.
Economics?
As a graduate of economics, I've committed possibly the greatest sin of
the profession. I switched sides. To machine learning and data science.
Gasp.
I don't think I really switched sides, but the world — at least the
economics world — around you would have you believe that
econometricians and data scientists are sets without an intersection.
Data mining is somewhat of a bad word in econometrics, a field almost
religiously seeking causal inference and interpretability of results.
But when you actually look into what data science usually is, the
boundaries between more traditional econometrics/statistics and the
hip and cool machine learning become less and less clear (this
infographic is a great illustration of it: source).
But, really?
Reading through common data science job descriptions, you may get
the idea that economics is the worst training to have. Most economics
programs don't teach programming or databases, nor do they
come even close to machine learning. WTF is Hadoop? And Hive and
Pig? Is this a joke?
In fact, the first two modules in the most popular machine learning
course on Coursera are, wait for it, linear regression and logistic
regression.
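To make that concrete, here is a minimal sketch of a logit fitted the way a machine learning course typically presents it: by taking small gradient steps on the log-likelihood instead of calling a canned estimator. The toy data, the function name fit_logit, the learning rate and the step count are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y=1|x) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals y - p, with p the current predicted probability.
        residuals = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        # Score (gradient of the mean log-likelihood) for intercept and slope.
        grad_a = sum(residuals) / n
        grad_b = sum(r * x for r, x in zip(residuals, xs)) / n
        a += learning_rate * grad_a
        b += learning_rate * grad_b
    return a, b

# Hypothetical toy data: hours studied and whether the exam was passed.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_logit(hours, passed)
print("intercept:", a, "slope:", b)
print("P(pass | 4 hours):", sigmoid(a + b * 4.0))
```

The model is the same logit you know from econometrics. Only the fitting procedure and the vocabulary ("training", "learning rate") are different.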
And even the terms you may not know are often just examples of
skilful copywriting. Neural networks are a great example. It's
something that sounds incredibly complicated (are we modelling the
brain or what?), but on a (basic) fundamental level, they just combine
layers of logistic(-like) regressions to model more complex non-linear
relationships that a single regression may not capture (for great
primers on neural nets, see http://karpathy.github.io/neuralnets/ or
http://iamtrask.github.io/2015/07/12/basic-python-network/).
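As a rough sketch of that idea, the snippet below wires three logistic units into a tiny two-layer network. The weights are picked by hand for illustration (nothing is trained here), and the names logistic_unit and tiny_network are made up. The point is only that the stacked units reproduce XOR, a pattern no single logistic regression can separate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_unit(inputs, weights, bias):
    """One logistic regression: sigmoid of a weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def tiny_network(x1, x2):
    # Hidden layer: two logistic regressions on the raw inputs.
    # Hand-picked (not estimated) weights, chosen so the units act like logic gates.
    h_or = logistic_unit([x1, x2], [20.0, 20.0], -10.0)   # close to 1 if x1 OR x2
    h_and = logistic_unit([x1, x2], [20.0, 20.0], -30.0)  # close to 1 if x1 AND x2
    # Output layer: one more logistic regression, fed by the hidden units.
    return logistic_unit([h_or, h_and], [20.0, -20.0], -10.0)  # OR but not AND

# Prints the XOR truth table: 0, 1, 1, 0 in the last column.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(tiny_network(x1, x2)))
```

In a real network the weights would be estimated from data, typically by gradient descent, rather than set by hand.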
Granted, neural networks can go deep, far deeper than what I've just
described. Recurrent nets, convolutional nets, deep learning are all
much more complex topics — and much more powerful algorithms. But
for most machine learning applications, you should do just fine with far
simpler models: basic neural nets, decision trees, regressions, SVMs…
And with the statistical background from most econometrics courses, you
are not going to have any trouble grasping these concepts quickly (I
highly recommend that Coursera course).
For every problem there was another — more complicated — model that
was supposed to deal with it. A model that also introduced its own bag of
assumptions and issues.
Yes, neural networks may not be used in explaining the causal effect of
minimum wage on unemployment. But neither can you really expect
(multinomial) logit to be used to recognize hand-writing. It's all about
using the right tools in the right applications — and I think
econometrics taught you a lot about that.
If you work as a data scientist anywhere in the “real world”, you'll have
to present your results to non-technical audiences — managers,
marketers and copywriters, customers and clients. And you'll have to be
able to show why your results matter and how normal folk can use them
and act on them.
And if you haven't, just learn Python. R may be powerful too, but the
syntax is an abomination and it's kind of slow with bigger datasets.
Matlab is commercial software, and while it is great (and fast) at
mathematical computing and it has an open-source alternative
(Octave), it's not that common. Julia is too obscure and still a bit too
young.
So give it a try. Follow the links in this article. See if it catches your
fancy. And don't think that just because you don't know what Hessians
are, you can't go into machine learning.