http://gilesd-j.com/2019/01/07/7-reasons-for-policy-professionals-to-get-pumped-about-r-programming-in-2019/

January 7, 2019

7 Reasons for policy professionals to get into R programming in 2019


By gdickens in Economics, Opinion, R-Programming Tag Data Science, Public Policy
Analysis, R
Note: A version of this article was also published via LinkedIn here and on Medium
here. 

With the rise of ‘Big Data’, ‘Machine Learning’ and the ‘Data Scientist’ has come an
explosion in the popularity of using open-source programming tools for data analysis.

This article provides a short summary of some of the evidence of these tools overtaking
commercial alternatives and why, if you work with data, adding an open programming
language, like R or Python, to your professional repertoire is likely to be a worthwhile
career investment for 2019 and beyond.

Like most faithful public policy wonks, I’ve spent more hours than I can count
dragging numbers across a screen to understand, analyse or predict whatever
segment of the world I have data on.

Exploring where the money was flowing in the world’s youngest democracy;
analysing which government program was delivering the biggest impact; or
predicting which roads were likely to disappear first as a result of climate change.

New policy questions, new approaches to answer them and a fresh set of data.

Yet every silver lining has a cloud. And in my experience with data, that cloud is
often the need to scale a new learning curve to adhere to legacy systems and fulfil
an organization’s fetish for its statistical software of choice.

Excel, SAS, SPSS, Eviews, Minitab, Stata and the list goes on.

Which is why I’ve decided this article needed to be written:

Not only am I tired of talking to fellow analytical wonks about why they’re
limiting themselves by only being able to work on data with spreadsheets, but
there are also distinct professional advantages to unshackling yourself from the
tyranny of proprietary tools:

1. Open-Source Statistics is Becoming the Global Standard
Firstly, if you haven’t been watching, the world is increasingly full of data. So much
data, that the world is chasing after nerds to analyse it. As a result, the demand for a
‘jack of all trades’ data person, or “data scientist” has been outstripping that of a
more vanilla-flavoured ‘statistician’:

% Job Advertisements with term “data scientist” vs. “statistician”

(Credit: Bob Muenchen – r4stats.com)

And although you might not have aspirations to work in what the Harvard Business
Review called the ‘Sexiest Job of the 21st Century’, the data gold rush has had
implications far beyond the sex appeal of nerds.

For one, online communities like Stack Overflow, Kaggle and Data for
Democracy have flourished, providing practical avenues for learning how to do
some science with data and driving demand for tools that make applying this
science accessible to everyone, like R and Python.

So much so that some of the best evidence suggests that not only is demand for
quants with R and Python skills booming, but the practical use of open-source
statistical tools like R and Python is starting to eclipse that of their proprietary relatives:

Statistical software by Google Scholar Hits:


(More credit to Bob Muenchen – r4stats.com)

Of course, I’m not here to conclusively make the point that a particular piece of
software is a ‘silver bullet’. Only that something has happened in the world of data
that the quantitatively inclined shouldn’t ignore: Not only are R and Python
becoming programming languages for the masses, but they’re increasingly proving
themselves as powerful complements to more traditional business analysis tools
like Excel and SAS.

2. R is for Renaissance Wo(Man)


For those watching the news, you’ll no doubt have heard of the great battle being waged
between the R and Python languages that has tragically left the internet strewn with the
blood of programmers and their pocket protectors.

But I’m going to goosestep right over the issue, as in my opinion much of what I say for R
is increasingly applicable to Python.

For those of you unfamiliar with R, in essence it’s a programming language made to
use computers to do stuff with numbers.

Enter: “10*10” and it will tell you ‘100’ 

Enter: “print(‘Sup?’)” and the computer will speak to you like that kid loitering on
your lawn.

Developed around 25 years ago, the idea behind R was in essence to develop a
simpler, more open and extendible programming language for statisticians.
Something which allowed you greater power and flexibility than a ‘point and click’
interface, but that was quicker than punch cards or manually keying in 1s and 0s to
tell the computer what to do.

The result: R, a free statistical tool whose sustained growth has made it one
of the most flexible in existence.

So much growth in fact, that in 2014 enough new functionality was added to R by
the community that “R added more functions/procs than the SAS Institute has written in
its entire history.” And while it’s not the quantity of your software packages that
counts, the speed of development is impressive and a good indication of the likely
future trajectory of R’s functionality. Particularly as many heavy hitters including the
likes of Microsoft, IBM and Google are already using R and making their own
contributions to the ecosystem:  

Using R for Analytics – Get in Before George Clooney Does:


Image source. Also, see here

Not only that, but with much of this growth being driven by user contributions, it is
also a great reminder of the active and supportive community you have access to as
an R and Python user, which makes it easier to get help, access free resources and find
example code to base your analysis on.

3. R is Data and Discipline Agnostic

(Source: xkcd)

One of the first things that motivated me to learn R, was the observation that many
of the most interesting questions I encountered went unanswered because they
crossed disciplines, involved obscure analytical techniques, or were locked away in a
long-forgotten format. It therefore seemed logical to me that if I could become a
data analytics “MacGyver”, I’d have greater opportunities to work on interesting
problems.

Which is how I discovered R. You see, as somebody who is interested
in almost everything, I found R’s adoption by such a diverse range of fields nearly
impossible to overlook. With extensions freely available to work with a wide
variety of data formats (proprietary or otherwise) and apply a range of nerdy
methods, R made a lot of sense.

I think it was Richard Branson that once said “If somebody offers you a problem but
you are not sure you can do it, say yes. R probably has a package for it” (!):
• Want to forecast sales?
• Need to create a beautiful graph to impress that guy you like?
• Want to see your data on a map so you can see if it relates to geography?
• Want to create a regression model to test whether Bill from accounting is stealing your lunch?
• Want to streamline that statistical summary your boss keeps asking for?
• Need to turn your analysis into an interactive web dashboard?
• Want to radically shift a nation’s budget transparency?

Then R (and increasingly Python) has you covered.

Yet there is perhaps a subtler reason adopting R made sense, and that’s the simple
fact that by being ‘discipline agnostic’ it’s well-suited for multidisciplinary teams,
applied multipotentialites and anyone uncertain about exactly where their career
might take them.

4. R Helps Avoid Fitting the Problem to the Tool

As an economist, I love a good echo chamber. Not only does everybody speak my
language and get my jokes, but my diagnosis of the problem is always spot-on.
Unfortunately, thanks to the errors of others, I’m aware that such cosy teams of
specialists aren’t always a good idea: homogeneous specialist teams risk
developing solutions which aren’t fit for purpose by defining a problem too narrowly
and misunderstanding the scope of the system it’s embedded in.
(Source: chainsawsuit.com)

While good organizations are doing their best to address this, creating teams that
are multidisciplinary and have more diverse networks can be a useful means to
protect against these risks while also driving better performance. This
is another advantage of using more general statistical tools with a
diverse user base like R: you can collaborate more fluidly across disciplines while
being better able to pick the right technique for your problem, reducing the risk
that everything looks like a nail merely because you have a hammer.

5. Programming Encourages Reproducibility

Yet programming languages also hold an additional advantage over more typical ‘point
and click’ interfaces for conducting analysis: transparency and reproducibility.

For instance, because software like R encourages you to write down each step in
your analysis, your work is more likely to be ‘reproducible’ than had it been done
using more traditional ‘point and click’ solutions. With every step needed to
achieve the final result on record, it is easier for your
colleagues to understand what the hell you’re doing, and more likely that
you (or somebody else) will be able to reproduce the results when needed.

In addition to this being practically useful for tracing your journey down the
data-analysis maze, for analytical teams it can also serve as a means for encouraging
collaboration by allowing others to more easily understand your work and replicate your
results. This assists with organizational knowledge retention and provides an
incentive for ensuring analysis is accurate, as it is often easier to spot errors
before they impact your analysis or soil your reputation.
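
To make the idea of "writing down each step" concrete, here is a minimal sketch of a scripted analysis. It is written in Python, which this article treats as largely interchangeable with R for these purposes; an R script would follow the same shape. The data, column names and function names are all hypothetical and exist only for illustration.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a real CSV file on disk.
RAW_CSV = """program,spend_musd,outcome_score
A,1.2,54
B,3.4,61
C,,47
D,2.1,58
"""


def load_rows(text):
    """Step 1: read the raw data exactly as delivered."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Step 2: drop rows with a missing spend figure and convert types."""
    return [
        {
            "program": r["program"],
            "spend_musd": float(r["spend_musd"]),
            "outcome_score": float(r["outcome_score"]),
        }
        for r in rows
        if r["spend_musd"].strip()
    ]


def summarise(rows):
    """Step 3: compute the summary figures the report needs."""
    spend = [r["spend_musd"] for r in rows]
    score = [r["outcome_score"] for r in rows]
    return {
        "n_programs": len(rows),
        "mean_spend_musd": round(statistics.mean(spend), 2),
        "mean_outcome_score": round(statistics.mean(score), 2),
    }


def run_analysis(text=RAW_CSV):
    """Run every recorded step in order, from raw data to final summary."""
    return summarise(clean_rows(load_rows(text)))


if __name__ == "__main__":
    print(run_analysis())
```

Because the decision to drop the row with missing spend is written into the script rather than made by hand in a spreadsheet, a colleague can see it, question it, and rerun the whole analysis to get the same numbers.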

Finally, while the use of scripting isn’t unique to open-source programming
languages, by being free, R and Python come with an additional advantage: in
the instance you decide to release your analysis, the potential audience is likely to
be greater and more diverse than had it been written using proprietary software.
Which is why in a world of the “Open Government Partnership” open-source
programming languages make a lot of sense, providing a means of easing the
transition towards governments publicly releasing their policy models.

6. R Helps Make Bytes Beautiful  


As data-driven-everything becomes all the rage, making data pretty is becoming an
increasingly important skill. R is great at this, with virtually unlimited options for
unleashing your creativity on the world and communicating your results to the
masses. Bar graphs, scatter diagrams, histograms and heat maps. Easy.

Just not pie graphs. They’re terrible.  

But R’s visualization tools don’t finish at your desk, with the ‘Shiny’ package allowing
you to take your pie graphs to the big time by publishing interactive dashboards for
the web. Boss asking you to redo a graph 20 times each day? Outsource your work
to the web by automating it through a dashboard and send them a link while you
sip cocktails at the beach.

7. R and Python are free, but the Cost of Ignoring the Trend Towards Open-Source Statistics Won’t Be

Finally, R and Python are free, meaning not only can you install them wherever you
want, but that you can take them with you throughout your career:
• Statistics lecturers prescribing you textbooks that are trying to get you hooked on expensive software that likely won’t exist when you graduate? Tell them it’s not 1999 and send them a link to this.
• Working for a not-for-profit organization that needs statistical software but can’t afford the costs of proprietary software? Install R and show them how to install Swirl’s free interactive lessons.
• Want to install it at home? No problem. You can even give a copy to your cat.
• Got a promotion and been gifted a new team of statisticians? Swap the Christmas bonuses for the gift that keeps giving: R!

But I’m not here to tell you R (or Python) is perfect. After all, there are good
reasons some companies are reluctant to switch their analysis to R or Python. Nor
am I interested in convincing you that it can, or should, replace every proprietary
tool you’re using. I’m an avid spreadsheeter, and programs like Excel have distinct
advantages.

Rather, I’d like to suggest that for all the immediate costs involved in learning an
open-source programming language, whether it be R or Python, the long-term
benefits are more than likely to surpass them.

(Source)

Not only that, but as a new generation of data scientists continues to push for the
use of open-source tools, it’s reasonable to expect R and Python will become as
pervasive a business tool as the spreadsheet and as important to your career as
laughing at your boss’ terrible jokes.

Interested in learning R? Check out this link here for a range of free resources.

You can also read my review of the online specialization I took to scale the R learning
curve here.
