
3-1. Types of Data Analysis


Learning Goal

Build a foundation for doing data analysis using R

Things to Learn

Base-R (Weeks 1, 4, 5, 6)
▪ R programming language
- Variable, assignment, data structure, operation, function, iteration, conditional statement

Tidyverse (Weeks 7, 8, 10, 11, 12)
▪ Tidyverse package
- ggplot2 (data visualization)
- dplyr (data transformation)
- tidyr (data tidying)
- purrr (iteration)

Data Analysis (Weeks 2, 3, 13, 14)
▪ Types of Data Analysis
- Descriptive, exploratory, inferential, predictive, causal data analysis
▪ Workflow of data analysis
- Import, transform, visualize, model, report
Why do we need to know the types of data analysis?
▪ Leek & Peng (2015) categorized data analysis into six types and emphasized that the most common error in data analysis is mistaking the type of question being considered.

▪ Descriptive
▪ Exploratory
▪ Inferential
▪ Predictive
▪ Causal
▪ Mechanistic

Leek, J. T., & Peng, R. D. (2015). What is the question?. Science, 347(6228), 1314-1315.
Descriptive Data Analysis

▪ Descriptive Data Analysis is used to summarize data without further interpretation.

<Korean Census, 2021 Aug>
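As a rough illustration of what "summarizing without further interpretation" can look like, here is a minimal Python sketch using only the standard library. The `ages` list is hypothetical data made up for this example; it is not taken from the census figure.

```python
import statistics

# Hypothetical ages of eight survey respondents (illustrative data only).
ages = [23, 35, 41, 29, 35, 52, 47, 35]

# Descriptive statistics describe the data at hand and claim nothing beyond it.
summary = {
    "n": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Nothing here says anything about people outside these eight respondents; that step would belong to inferential analysis.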

Exploratory Data Analysis

▪ Exploratory Data Analysis (EDA) is used to discover patterns in data, such as trends and correlations, in order to generate ideas and hypotheses.
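One common exploratory step is to check whether two variables move together. The Python sketch below computes a Pearson correlation by hand; the `study_hours` and `exam_scores` values are hypothetical, and `pearson_r` is a helper written for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied and exam score for six students.
study_hours = [1, 2, 3, 4, 5, 6]
exam_scores = [52, 55, 61, 64, 70, 74]

r = pearson_r(study_hours, exam_scores)
print(f"Pearson r = {r:.3f}")
```

A strong correlation found this way is a hypothesis to test later, not yet a conclusion about the population or about cause.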

Inferential Data Analysis

▪ Inferential Data Analysis is used to decide whether the patterns found in a sample would still hold in the population.
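One way to make such a decision is a permutation test, sketched below in Python. This is only one of many inferential techniques and is chosen here because it needs no extra libraries. The two groups are hypothetical measurements, and `permutation_p_value` is a helper written for this example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle group labels: under the null hypothesis they are arbitrary.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements from two samples.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.9]
group_b = [4.2, 4.8, 4.5, 4.1, 4.9, 4.4, 4.6, 4.3]

p_value = permutation_p_value(group_a, group_b)
print(f"p-value = {p_value:.4f}")
```

A small p-value suggests that the difference seen in the sample is unlikely to be a product of chance alone.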

Predictive Data Analysis

▪ Predictive Data Analysis is used to predict outcomes for a single person or unit.
▪ Ex) Netflix recommends movies to us based on predictions.
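To make "predicting for a single unit" concrete, the Python sketch below fits a straight line to past observations and predicts the outcome for one new unit. It is a deliberately simple stand-in and not how a real recommender system works. The data and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past observations: one input x and one outcome y per unit.
past_x = [1, 2, 3, 4, 5]
past_y = [2.1, 3.9, 6.2, 8.1, 9.8]

slope, intercept = fit_line(past_x, past_y)

# Predict the outcome for a single new unit with x = 6.
new_x = 6
predicted_y = intercept + slope * new_x
print(f"predicted outcome for x={new_x}: {predicted_y:.2f}")
```

The goal is an accurate prediction for the new unit; the fitted slope is not, by itself, evidence about why y changes.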

Causal Data Analysis

▪ Causal Data Analysis is used to find out what happens to one measurement, on average, if you make another measurement change.
▪ Ex) Data analysis to identify the causal relationship between smoking and lung cancer.

Mechanistic Data Analysis

▪ Mechanistic Data Analysis is used to show that changing one measurement always and exclusively leads to a specific, deterministic behavior in another.
▪ Ex) Data analysis showing how a change in wing design alters air flow over the wing, leading to decreased drag.

Common Mistakes in Data Analysis

Mistake: interpreting inferential data analysis as causal data analysis
▪ Correlation does not imply causation.
▪ Ex) The correlation between ice cream consumption and the murder rate does not imply a causal relationship between them.

[Figure: diagram linking ice cream, temperature, and crime rate]
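The ice cream example can be simulated. In the Python sketch below, temperature is assumed to drive both ice cream sales and crime, while the two have no direct link. All coefficients and noise levels are made up for illustration, and `pearson_r` and `residuals` are helpers written for this example.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(42)
n = 2000

# Assumed data-generating process: temperature causes both variables.
temperature = [rng.uniform(0, 35) for _ in range(n)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
crime = [1.5 * t + rng.gauss(0, 5) for t in temperature]

raw_r = pearson_r(ice_cream, crime)
# Correlation after holding temperature fixed (a partial correlation).
partial_r = pearson_r(
    residuals(ice_cream, temperature), residuals(crime, temperature)
)

print(f"raw correlation:            {raw_r:.3f}")
print(f"controlling for temperature: {partial_r:.3f}")
```

The raw correlation is strong even though neither variable affects the other; once temperature is accounted for, the association largely disappears.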
Mistake: interpreting exploratory data analysis as inferential data analysis
▪ Data dredging refers to a data mining practice in which large volumes of data are analyzed in search of any possible relationship between variables.
▪ Data dredging is also called data fishing or p-hacking.
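The danger of data dredging can be shown with a small simulation. The Python sketch below tests 200 predictors that are pure noise against an outcome that is also pure noise, and counts how many look "significant". The sample sizes are arbitrary choices for this example, and the cutoff 0.361 is the approximate two-sided 5% critical value of |r| for 30 observations.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(1)
n_obs = 30           # observations per variable (assumed)
n_predictors = 200   # unrelated candidate predictors (assumed)
CRITICAL_R = 0.361   # approx. |r| needed for p < 0.05 when n = 30

# The outcome is pure noise, so no predictor is truly related to it.
outcome = [rng.gauss(0, 1) for _ in range(n_obs)]

spurious = 0
for _ in range(n_predictors):
    predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
    if abs(pearson_r(predictor, outcome)) > CRITICAL_R:
        spurious += 1

print(f"{spurious} of {n_predictors} noise predictors look 'significant'")
```

Roughly 5% of the tests are expected to pass the cutoff by chance alone, which is why a relationship found by searching many variables needs a separate confirmatory test.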
Mistake: interpreting descriptive data analysis as inferential data analysis
▪ Patterns from a few cases should not be generalized without proper inferential data analysis.

Case Study
Mistake: interpreting exploratory data analysis as predictive data analysis
▪ The patterns found in exploratory data analysis might reflect errors and noise specific to the current sample.
▪ There is no guarantee that the patterns found in our sample will hold for new, unseen data.
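A hold-out check is one way to see this. In the Python sketch below, the "best" predictor is picked from 200 noise variables using a small training sample, then evaluated on fresh data. Everything is simulated noise and the sizes are arbitrary assumptions, so any pattern found in training is spurious by construction.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def noise(rng, n):
    return [rng.gauss(0, 1) for _ in range(n)]

rng = random.Random(7)
n_train, n_test, n_predictors = 30, 300, 200  # assumed sizes

y_train = noise(rng, n_train)
y_test = noise(rng, n_test)

best_train_r = 0.0
best_test_r = 0.0
for _ in range(n_predictors):
    x_train = noise(rng, n_train)
    x_test = noise(rng, n_test)
    r_train = pearson_r(x_train, y_train)
    # Keep whichever predictor looks strongest in the training sample.
    if abs(r_train) > abs(best_train_r):
        best_train_r = r_train
        best_test_r = pearson_r(x_test, y_test)

print(f"best predictor, training sample: r = {best_train_r:.3f}")
print(f"same predictor, unseen data:     r = {best_test_r:.3f}")
```

The correlation that looked impressive in the training sample shrinks toward zero on unseen data, which is the failure the bullets above describe.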
Why is it important to understand the types of data analysis?
▪ Mistaking the type of question being considered in each type of data analysis leads to wrong and unreliable conclusions.
Causality in Social Science

▪ In social science, scientists build theories about the causal mechanisms that explain various phenomena, and test those theories using data.
▪ Data analysis for solving social problems aims to present solutions to specific problems based on data. Therefore, some causal mechanism must be posited in order to discuss the problems and their solutions.
▪ The fundamental difficulty of data analysis and research in social science is that the data and statistical tools used in social science are limited in their ability to show causal relationships.
Causality

▪ X is a cause of Y if
1. X precedes Y in time (temporal precedence).
2. Some mechanism whereby this causal effect operates can be posited (causal mechanism).
3. A change in the value of X is accompanied by a change in the value of Y on the average (correlation).
4. The effects of X on Y can be isolated from the effects of other potential variables on Y (isolation, or lack of confounders).

Cohen, P., West, S. G., & Aiken, L. S. (2014). Applied multiple regression/correlation analysis for the behavioral sciences. Psychology Press.
Types of Research

▪ Experimental study
  ▪ Three conditions for experimental study
    ▪ Random assignment of subjects
      ▪ Subjects should be randomly assigned to experimental and control groups.
    ▪ Manipulation of independent variables
      ▪ Treatment/intervention is given to an experimental group.
    ▪ Isolation of extraneous variables
      ▪ The effects of all variables other than the independent variable should be isolated.
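The role of random assignment can be sketched with a simulation. In the Python example below, subjects are split at random into treated and control groups, and the difference in group means is used to estimate the average effect of the treatment. The baseline distribution, the sample size, and the true effect of 5.0 are assumptions made up for this illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

rng = random.Random(3)
n = 10_000
TRUE_EFFECT = 5.0  # assumed effect of the treatment on every subject

# Each subject has a baseline outcome that varies for unobserved reasons.
baseline = [rng.gauss(50, 10) for _ in range(n)]

# Random assignment: shuffle subject ids and treat the first half.
ids = list(range(n))
rng.shuffle(ids)
treated_ids = set(ids[: n // 2])

# Manipulation: only the treated group receives the intervention.
treated = [baseline[i] + TRUE_EFFECT for i in range(n) if i in treated_ids]
control = [baseline[i] for i in range(n) if i not in treated_ids]

# With random assignment the groups are comparable on average, so the
# difference in means estimates the average treatment effect.
estimated_effect = mean(treated) - mean(control)
print(f"estimated average effect: {estimated_effect:.2f}")
```

Because assignment is random, baseline differences are balanced across the groups on average, so the estimate lands close to the assumed true effect.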
Types of Research

▪ Experimental study
  ▪ Three variables
    ▪ Dependent variable
      ▪ A variable that we want to explain
    ▪ Independent variable
      ▪ A variable that is manipulated
      ▪ A variable that explains the dependent variable
    ▪ Extraneous variable
      ▪ All variables other than the independent variable that influence the dependent variable
      ▪ Confounding variable
Types of Research

▪ Why an experimental study can establish causality
  ▪ The three conditions of an experimental study allow us to eliminate alternative explanations and to explain the changes in the dependent variable solely by the changes in the independent variable.
Types of Research

▪ Non-experimental study
  ▪ In social science, it is not easy to conduct experimental studies, for ethical and practical reasons.
  ▪ Quasi-experimental research
    ▪ Research in which not all three conditions of experimental research are met (especially random assignment).
  ▪ Observational study
    ▪ Research in which researchers can only observe, with none of the three conditions for an experimental study in place.
The key task in Social Science Research
▪ Most studies in social science collect data through observational studies and use correlation-based models such as regression. Therefore, it is very important to make a strong argument using both correlations and theory when discussing causality.

[Figure: Causality, Theory, and Correlation (observational, correlational)]

Summary

▪ Data analysis can be categorized into descriptive, exploratory, inferential, predictive, causal, and mechanistic data analysis.

Types of Data Analysis       Goal
Descriptive Data Analysis    Summary
Exploratory Data Analysis    Discovery
Inferential Data Analysis    Decision making, generalization
Predictive Data Analysis     Prediction
Causal Data Analysis         Causality (on the average)
Mechanistic Data Analysis    Causality
Summary

▪ Mistaking the type of question being considered in each type of data analysis leads to wrong and unreliable conclusions.
▪ The conditions for causality are 1) temporal precedence, 2) causal mechanism, 3) correlation, and 4) isolation.
▪ In social science, it is very important to make a strong argument using both correlations and theory when discussing causality.
