Published by terrabyte
There is no consensus on what being a data scientist actually means. Data scientists come from the ranks of statisticians, mathematicians, programmers, and many other disciplines. This blog presents a typology, akin to the MBTI, for characterizing data scientists. The typology includes five sets of descriptors representing spectra of preferences, skills, or predominant activities.
Published by: terrabyte on May 06, 2013
Copyright: Attribution Non-commercial


Read any popular business magazine and you’re sure to find an article about how data science is the wave of the future. Since 2011, after fifty years of wandering through the halls of academia, real-world employment of data scientists has skyrocketed. Last year, the Harvard Business Review declared the job of Data Scientist to be the Sexiest Job of the 21st Century.

Unfortunately, there is no consensus on what being a data scientist actually means. Data scientists come from the ranks of statisticians, mathematicians, accountants, data engineers, programmers, database managers, data miners, business analysts, risk assessors, and specialists in visualization, machine learning, pattern recognition, simulation, predictive modeling, and quality management (e.g., Six Sigma Green and Black Belts). Programmers might specialize in any of a dozen different languages. Even within statistics, there are data scientists from very disparate branches based on domain expertise and methods, such as: survey statistics; mathematical statistics; biostatistics; chemometrics; geostatistics; epidemiology; business statistics; psychometrics; and econometrics. No individual data scientist knows and uses more than a small fraction of the hundreds of methods available for managing and analyzing data. Is it any wonder, then, that it’s impossible to define what a data scientist does except in very general terms? Even universities can’t agree on what a curriculum for training data scientists should look like.

So how might a data scientist describe his or her interests and skills succinctly in a field in which practitioners come from so many different backgrounds? This is a problem akin to typologies for personality assessment, like the Myers-Briggs Type Indicator (MBTI). Such typologies don’t cover every possible characteristic, but rather, summarize a few key dimensions.

In this typology of data scientists, there are five sets of descriptors representing spectra of preferences, skills, or predominant activities. Each data scientist chooses the categories that best describe his or her skills and preferences. An individual data scientist might have many skills and preferences, but only certain of them would predominate. They might also change over time. A typology of data scientists would be a simple way to identify key characteristics that others can quickly understand and use to facilitate working together.

So, here’s what a typology of data scientists might look like.
Method Base
Every data scientist has a set of methods he or she is familiar with, usually based on training and reinforced by experience. The methods a data scientist uses could be said to fall into two categories, organization and analysis, although there is overlap. Data scientists tend to use methods that are predominately one type or the other.
Organizers favor methods involving programming, data warehousing, database management, data parsing, merging, extracting, sorting, filtering, clustering, and classification. They tend to be computer scientists, programmers, database managers, data miners, and mathematicians. They often work with Big Data that is compiled, validated, and processed in real time.
Analyzers favor methods involving data description, hypothesis testing, and predictive modeling. They tend to be statisticians, business analysts, quality managers, risk assessors, and predictive modelers. They usually work with static datasets that have been extensively scrubbed in preparation for analysis.
There are many different methods a data scientist can rely on, be they programming languages or analysis techniques. In practice, each data scientist tends to have a set of core methods that he or she uses routinely. Usually, the methods are what they learned in school or have found to be successful in their work. Sometimes, the methods are research favorites or specialties they offer for an advantage over business competitors. That leads to two types of data scientist on a spectrum of work practice: generalists and specialists.
Generalists will use a variety of methods and software, even going so far as to learn new analysis techniques or programming languages that might be applicable to a given dataset. Specialists will rely on techniques they know well and have used extensively in the past, modifying design elements and method specifications to find the best result for a dataset.
Data scientists also have a tendency to focus on either the domain of the data or a method’s fundamental characteristics.
Domain experts honor the sources, meanings, and limitations of the data elements they are studying. They tend to be goal oriented and methodologically flexible. They are often willing to “bend the rules” a bit in order to conduct an analysis. They will use data transformations and other model optimization techniques. They will examine violations of assumptions to assess the severity of impact and possible corrective measures before forgoing a planned analysis. They’ll even consider using unconventional and controversial approaches if they believe the action is warranted.
Method experts understand the mathematical foundation of their analysis technique and how it is implemented by software. They often write their own code, even for routine tasks. They tend to follow rigorous plans and procedures. They “play by the rules.” They avoid deleting outliers and using transformations and stepwise techniques that might capitalize on chance. They will switch to alternative analytical methods upon violation of a method’s assumptions.
Credentials are embodied in education and experience, the more the better, at least in general. Beyond that, it’s impossible to quantify credentials. Education stresses theory; experience stresses application. A good education involves brief exposure to a wide variety of ideas; experience involves a much longer exposure to fewer ideas. A degree represents a package of learning that may or may not be relevant to a job. On-the-job experience is always relevant but may represent either a continually advancing set of skills or the same set of skills repeated again and again.

Data scientists almost always have a relevant degree to begin in the profession. After that, each additional year in school is probably worth two to four years of experience, some say more and some say less. Experience has to be progressive. The first five years are often spent learning about the working world, such as what to do when the boss tells you to make a pie chart. The next five to fifteen years are spent learning how data scientists solve problems. From fifteen to thirty years, you lead projects using data science to solve problems. You also get to bedevil recent graduates by telling them to summarize their regression-on-principal-components model in a single PowerPoint slide. After thirty years, you just tell stories about how you used to do ANOVAs with a pencil and paper.

So, characterize credentials with the combination of education and experience. Recognize, though, that credentials are difficult to express in a word like the other characteristics of data scientists. Furthermore, degrees and experience are different for every type of data scientist. A BS+1 programmer or mathematician might be on the verge of a major breakthrough. Bill Gates, for example, didn’t have credentials when he started Microsoft. In contrast, an MS+5 applied statistician is just starting out.
The final characteristic of a data scientist is how they communicate, or at least prefer to communicate, not in terms of media, but rather, in terms of audience and content. Think of the audience as either:
Inward communications involve your peers in school, at work, and in the data science profession. These are people to whom you could show computer code or matrix formulas and expect that they would be interested.
Outward communications involve people who aren’t data scientists but may be interested in what you do, though not in your code or formulas. They may be co-workers, a class you teach, an invisible audience on the internet, or the general public.

Content can be categorized as:
High-level, top-down communications that usually involve ideas, concepts, trends, patterns, summaries, mathematical laws, and other general information.
Communications involving specific methods, formulas, code, data structures, programming practices, data elements, and other details of data science.