The Data Dozen


Data can take a variety of forms. Some are readily amenable to statistical analysis and some are better suited to other methods of analysis. When you’re trying to solve some problem or research question, though, you need to use whatever is available that fits. Here are twelve types of data to think about using in your next analysis.

Data Type | Description | Generation | Examples
Automatic Measurements | Information generated by devices, usually electronic or mechanical, that operate without human involvement (other than calibration and sample introduction). | Experimenter-Device | Thermocouples, strain-gage scales, electronic meters
Manual Measurements | Information generated by devices that require human involvement to carry out the measurement. | Experimenter-Device | Rulers, calipers, thermometers, balance-beam scales
Archived Records | Information generated by an identifiable person or organization. | Known individual or organization | Government records, financial data, personal diaries, logs, notes
Directed Responses | Information received as the result of a specific direct inquiry. | Experimenter-Subject | Surveys, focus groups, interrogations
Electronic Recordings | Information stored on audiovisual devices. | Experimenter-Device | Videos, audio recordings, photos, false-color images
Metadata | Data about data: their origins, qualities, scales, and so on. | Data Generator | Time, location, and method of data generation
Transformations | Information created from other information. | Data Analyst | Percentages, sums, z-scores, ratios, and so on
Analog Data | Information from a source that resembles in some respect a phenomenon under investigation. | Experimenter | Experimental lab animals, models
First Person Reports | Descriptive, qualitative information derived from a first-person encounter. | Individual | Eyewitness accounts
Secondhand Reports | Information summarized or retold by a second party based on first-person accounts. | Known individual or organization | News stories
Unverified Reports | Information, written or retold, which cannot be disproven or verified. | Unknown individual or organization | Anecdotes, stories, legends
Conjectures | Information created from thought experiments rather than physical experiments. | Known individual or organization | Expert opinions
Automatic and manual measurements are commonly used in statistical analysis when they can be generated in large numbers at reasonable cost. Furthermore, they are often measured on continuous, or at least quantitative, scales. These measurements are usually easy to reproduce but may be time- or location-dependent.

Archived records are also commonly used in statistical analyses, usually as government records and financial data, when they are measured on quantitative scales. These data are often considered “official” because they have been verified, even though they are not reproducible. Archived records may also provide qualitative information, usually in small amounts, such as personal diaries, logs, notes, and so on. These can be used to support statistical analyses and are a mainstay of scientific investigations.

Directed responses, information received as the result of specific questions, include the results of surveys and focus groups, which are commonly analyzed with statistics. Directed response data are also generated by direct and cross-examinations in court and by military and law enforcement interrogations. Directed response data come from individuals, so their responses may not always be true and consistent.

Two types of data that are used in almost all data analyses are metadata and transformations. Metadata are data about data, such as descriptions of their origins, qualities, scales, and so on. Transformations are data created from other data, which include percentages, z-scores, sums, ratios, mathematical functions, and so on (http://statswithcats.wordpress.com/2010/11/21/fifty-ways-to-fix-your-data/).

Analogs are data sources that substitute for the actual phenomenon of interest. Models are a type of analog, as are animals used in medical experiments (much to their and my displeasure). Statistics is all about models (http://statswithcats.wordpress.com/2010/08/08/the-zen-of-modeling/), from basing test probabilities on the Normal distribution to creating regression models from data.
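Metadata, transformations, and the Normal model often show up together in a single analysis. The Python sketch below is a minimal illustration under stated assumptions, not a method from this post: the temperature readings, the `Reading` record, and its fields are hypothetical, invented for the example. Each reading carries its metadata (time, location, method), the values are transformed into z-scores, and a standard Normal distribution is used as a model to turn each z-score into a two-sided probability. Only the standard library's `statistics` module is used.

```python
from dataclasses import dataclass
from statistics import NormalDist, mean, stdev


@dataclass(frozen=True)
class Reading:
    """One measured value plus metadata about how it was generated (hypothetical record)."""
    value: float
    units: str
    time: str      # when the reading was taken (ISO 8601 text)
    location: str  # where it was taken
    method: str    # how it was taken, e.g. "thermocouple"


def z_scores(readings):
    """Transformation: each value expressed in standard deviations from the mean.

    Uses the sample standard deviation, so at least two readings are needed.
    """
    values = [r.value for r in readings]
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]


def two_sided_p(z):
    """Chance of a |Z| at least this large if Z follows the standard Normal model."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical readings, invented for illustration only.
readings = [
    Reading(10.0, "degC", "2024-01-01T09:00", "Lab A", "thermocouple"),
    Reading(12.0, "degC", "2024-01-01T10:00", "Lab A", "thermocouple"),
    Reading(14.0, "degC", "2024-01-01T11:00", "Lab A", "thermocouple"),
    Reading(16.0, "degC", "2024-01-01T12:00", "Lab A", "thermocouple"),
    Reading(18.0, "degC", "2024-01-01T13:00", "Lab A", "thermocouple"),
]

if __name__ == "__main__":
    for reading, z in zip(readings, z_scores(readings)):
        # The metadata travels alongside each transformed value.
        print(f"{reading.time} {reading.location} {reading.value} {reading.units} "
              f"z={z:+.2f} p={two_sided_p(z):.3f}")
```

Keeping the metadata in the same record as the value is one design choice among many; the point is that a transformed number is only interpretable if you still know where, when, and how the original was generated.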
Data? I thought you said tuna.
Electronic recordings, like videos and audio recordings, would seem to be a good type of data to analyze. Recordings have a great data density, though it can be laborious to extract individual data elements from the qualitative recording source. They can be faked, but so too can all the other types of data.

Reports come from witnesses. First-person reports come from eyewitnesses. The information is typically descriptive and qualitative; it may be verifiable but typically isn’t reproducible and may not even be true. Secondhand reports are eyewitness reports that are summarized or retold by a second party, such as news agencies. Unverified reports, such as anecdotes, stories, and legends, may be written or retold and come from unknown sources. These reports usually cannot be disproven or verified. Reports don’t often provide data elements for statistical analyses but may provide supporting evidence or metadata.

Finally, conjectures are data produced by experts through thought experiments rather than physical experiments. The Delphi process (http://en.wikipedia.org/wiki/Delphi_method) is a good example of the use of conjecture. Usually conjecture is used in situations in which data cannot be collected, such as forecasting the future.

Data analysts use all these data types. Statisticians want to use data types that provide many observations so they can assess variability. Scientists and engineers may be satisfied with the results of a single, albeit well-controlled, experiment. They are truly deterministic breeds. Courts want every piece of evidence to be attested to by an individual, whether an eyewitness or an expert witness. They want to be able to cross-examine witnesses. Historians don’t usually have eyewitnesses, so they rely on reports, especially secondhand and even unverified reports. They’ll use whatever they can find.

Certainly, this classification is not the only way to look at data. For example, the U.S. legal system classifies courtroom evidence as one of four types:
physical objects, like a weapon.
illustrations of evidence, like a map of the crime scene.
items that contain human language, like contracts and newspaper articles.
oral or written evidence from witnesses (http://people.howstuffworks.com/inadmissible-evidence1.htm).

To be admissible in court, these types of evidence have to be relevant (i.e., they prove or disprove a fact), material (i.e., they are essential to the case), and competent (i.e., they are proven to be reliable). Trial lawyers use witnesses to tell compelling stories that will keep judges and juries attentive, something non-testimony evidence may not do. In contrast, noted scientist and lecturer Neil deGrasse Tyson counters that
“In courts, eyewitness testimony is considered great evidence. In science it’s considered worthless.”
That’s not quite true if the observation can be witnessed by others, as in the case of astronomical observations and replicated experiments. UFO eyewitnesses don’t fare so well with scientists. Statisticians want more, though. Our analyses aren’t based on
Did you see that?
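Statisticians’ preference for many observations, rather than a single eyewitness account or a single well-controlled experiment, comes down to being able to estimate variability. The Python sketch below illustrates that idea under stated assumptions and is not a method from this post: it draws samples of increasing size from a hypothetical process (true mean 100, true standard deviation 15; the `draw_sample` helper and these values are invented for the example) and reports the estimated standard error of the mean, s/√n, which should shrink roughly in proportion to 1/√n. A single observation gives no estimate at all.

```python
import math
import random
from statistics import StatisticsError, stdev


def standard_error(sample):
    """Estimated standard error of the mean, s / sqrt(n).

    stdev() needs at least two observations: a single observation says
    nothing about variability, so it raises StatisticsError.
    """
    return stdev(sample) / math.sqrt(len(sample))


def draw_sample(n, rng):
    """Hypothetical process with true mean 100 and true standard deviation 15."""
    return [rng.gauss(100.0, 15.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so reruns give the same numbers
    for n in (2, 10, 100, 1000):
        print(f"n={n:5d}  standard error = {standard_error(draw_sample(n, rng)):.2f}")

    try:
        standard_error([100.0])
    except StatisticsError:
        print("n=    1  variability cannot be estimated from one observation")
```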

