
Preparation for Multivariate Analysis

Outline
• Graphical examination of data
• Types and potential impacts of missing data
• Identify outliers
• Test the assumptions underlying multivariate
techniques
What is data analysis?
• Data analysis is the process of organizing and
arranging data so that the results of the study
can be interpreted.
Graphical examination of data
• Univariate profile: shape of distribution
• Bivariate profile: examination of relationship
• Bivariate profile: examination of group differences
Identify Missing Data
• What is missing data?
– Valid values on one or more variables are not
available for analysis.
• What are the impacts of missing data?
– Practical impacts: the reduction of sample size for
the analysis.
– Substantive impacts: results of statistical analysis
on data with a nonrandom missing data process
could be biased.
Imputation Techniques for Missing Data
Identify Outliers
Test the assumptions underlying multivariate
techniques
Incorporating Non-metric Data with Dummy
Variables
Data Preparation (1)
• The first step of data preparation is coding. Coding is the process
of transferring verbal answers or observational categories into
numbers that can be recognized by the computers.
• Coding is time-consuming, expensive, and a source of error.
Precoded responses to a question are therefore necessary. For example,
instead of asking “What is your occupation?” we ask “What is
your occupation? Are you:
1) A professional
2) A government official
3) An employee
4) Self-employed
5) Temporarily unemployed”
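As a rough illustration of coding, the sketch below turns verbal occupation answers into the numeric codes of the precoded question. The label spellings, the helper name `code_occupation`, and the choice to return `None` for answers that cannot be coded are assumptions made for this example.

```python
# Codes follow the precoded occupation question; the exact label spellings
# and the helper name are assumptions made for this sketch.
OCCUPATION_CODES = {
    "professional": 1,
    "government official": 2,
    "employee": 3,
    "self-employed": 4,
    "temporarily unemployed": 5,
}


def code_occupation(answer):
    """Return the numeric code for a verbal answer, or None if it cannot be coded."""
    if answer is None:
        return None
    # Ignore surrounding spaces and letter case before looking up the code.
    return OCCUPATION_CODES.get(answer.strip().lower())


answers = ["Employee", " professional ", "Astronaut", None]
coded = [code_occupation(a) for a in answers]
```

Answers that fall outside the precoded list stay uncoded here, so they can be reviewed by hand rather than being forced into a category.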
Data Preparation (2)
• Technological advances have reduced the
potential for error in time-consuming data
preparation. Still, data should be checked for
double codes, illegible numerals, and simple
mistakes.
• History of data entry:
1) Punch card
2) Direct data entry (DDE)
3) Optical scanning
Data Handling (1)
• Data handling prior to analysis may include:
– Handling missing values
– Recoding variables
– Creating new variables
• All the above procedures can be performed
using software programs (SPSS, SAS, etc.).
Data Handling (2)
Handling missing data
• Missing data may arise when a respondent or interviewer
overlooks a question and leaves it unanswered, or when some
observational data have been lost.
• If there is a large amount of missing data for a case, the case might
be dropped from analysis.
• If only a small amount of data is missing, we can declare a
“missing value” for each variable so that the analysis includes only
cases with valid data.
• Some software programs have a procedure to “replace missing data
with a measure of central tendency” so that all the cases with
missing data can be included in the analysis.
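The three options above can be sketched in plain Python. The records, the variable names, and the 50% cut-off for "a large amount" of missing data are all hypothetical; the source does not give a threshold.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing value.
cases = [
    {"age": 34, "income": 52000, "children": 2},
    {"age": None, "income": 61000, "children": 1},
    {"age": 45, "income": None, "children": None},
    {"age": 29, "income": 48000, "children": 0},
]

MAX_MISSING_SHARE = 0.5  # assumed cut-off, not a rule from the source


def missing_share(case):
    """Fraction of variables in one case that have no valid value."""
    return sum(value is None for value in case.values()) / len(case)


# 1) Drop cases with a large amount of missing data.
kept = [c for c in cases if missing_share(c) <= MAX_MISSING_SHARE]

# 2) Analyze a variable using only the cases with valid data.
valid_ages = [c["age"] for c in kept if c["age"] is not None]


def impute_mean(rows, variable):
    """3) Replace missing values of one variable with the mean of its valid values."""
    fill = mean(r[variable] for r in rows if r[variable] is not None)
    return [
        {**r, variable: fill if r[variable] is None else r[variable]}
        for r in rows
    ]


imputed = impute_mean(kept, "age")
```

`impute_mean` returns new records and leaves `kept` untouched, so the raw data stay available for comparison.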
Data Handling (3)
Recoding variables
• It is strongly recommended that data be entered in
their raw form.
• For a certain purpose, a researcher may recode a
variable, e.g. instead of “actual number of children”,
the following salient categories are meaningful:
1) No or only 1 child (small families)
2) 2 or 3 children (roughly average)
3) 4 or more (large families)
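A minimal sketch of this recoding, keeping the raw counts and deriving the three categories from them. The function name and the handling of missing values are assumptions for illustration.

```python
def recode_family_size(children):
    """Map an actual number of children to the three categories listed above."""
    if children is None:
        return None  # a missing raw value stays missing after recoding
    if children <= 1:
        return 1  # no or only 1 child (small families)
    if children <= 3:
        return 2  # 2 or 3 children (roughly average)
    return 3      # 4 or more (large families)


raw_children = [0, 1, 2, 3, 4, 7, None]  # hypothetical raw data
family_size = [recode_family_size(n) for n in raw_children]
```

Because the raw variable is kept alongside the recoded one, the categories can be redefined later without re-entering data.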
Data Handling (4)
Constructing variables
• There are two types of constructed variables:
1) Composite indices
2) Rates or ratios
• Composite indices are combinations of several items
intended to measure a construct.
• Rates are mathematical combinations of variables that are
intended to operationalize a construct. Rates are the
number of occurrences per unit of the population.
The mathematical operations may range from simple addition to very
complex weighting schemes.
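Both kinds of constructed variable can be sketched in a few lines. The function names, the example item scores, the weights, and the "per 1,000" base are hypothetical choices, not values from the source.

```python
def composite_index(item_scores, weights=None):
    """Combine several item scores into one index.

    With no weights this is a simple additive index; with weights it is a
    weighted sum (one possible weighting scheme among many).
    """
    if weights is None:
        weights = [1] * len(item_scores)
    if len(weights) != len(item_scores):
        raise ValueError("need exactly one weight per item")
    return sum(w * s for w, s in zip(weights, item_scores))


def rate(occurrences, population, per=1000):
    """Number of occurrences per `per` units of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return occurrences * per / population


simple_index = composite_index([4, 3, 5])
weighted_index = composite_index([4, 3, 5], weights=[0.5, 0.25, 0.25])
events_per_1000 = rate(30, 2000)
```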
Data Analysis (1)
• Qualitative data analysis includes studying
notes and transcripts, classifying occurrences
and relating examples.
• Quantitative data analysis includes data
preparation, data entry, data handling, and
statistical analyses.
• Quantitative data analysis consists of univariate
analysis, bivariate analysis, and multivariate
analysis.
Data Analysis (2)
Data Analysis (3)
Univariate analyses
• Univariate = one variable.
• Univariate statistics = descriptive statistics =
describe only one variable.
• Descriptive statistics include:
1) Frequency distributions
2) Central tendency
3) Dispersion
Data Analysis (4)
Frequency distributions:
• Table, histogram, frequency polygon.
• Percentages are usually more informative than raw counts.
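A frequency table with both counts and percentages can be built with the standard library. The occupation data and the function name are made up for this sketch.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]


# Hypothetical nominal data: occupation of 10 respondents.
occupations = ["employee"] * 5 + ["professional"] * 3 + ["self-employed"] * 2
table = frequency_table(occupations)

for category, n, pct in table:
    print(f"{category:<15}{n:>3}{pct:>7.1f}%")
```

Reporting the percentage next to the count makes groups of different sizes comparable.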
Data Analysis (5)
Central tendency:
• Express how the scores tend to cluster around a central or
average value.
• Three measures of central tendency:
a) Mode: the most frequently occurring score. The mode is appropriate
for nominal-level data.
b) Median: the middle score; half of the scores are above it and half
below it. The median is used if there are extreme scores, e.g. income.
c) Mean: the arithmetic average obtained by summing all scores and
dividing by the number of scores. The mean is preferred for
continuous data (interval- and ratio-level data) without outlying
scores.
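The three measures can be compared on a small made-up income sample with one extreme score, using Python's standard `statistics` module. The data are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical incomes (in thousands) with one extreme score.
incomes = [20, 22, 22, 25, 30, 31, 200]

income_mode = mode(incomes)      # most frequently occurring score
income_median = median(incomes)  # middle score of the sorted data
income_mean = mean(incomes)      # sum of scores divided by their number

# The mode also works on nominal-level data.
occupation_mode = mode(["employee", "professional", "employee"])
```

Here the single extreme income pulls the mean well above the median, which is why the median is the safer summary for such data.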
Data Analysis (6)
Dispersion:
• Dispersion is a measure of how spread out
scores are.
• Dispersion includes:
– Range: the distance between the lowest and highest
scores.
– Standard deviation: the typical deviation of raw
scores from the mean, computed as the square root of
the average squared deviation. The larger the SD, the
more spread out the scores are from the mean.
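Both measures of dispersion are easy to compute with the standard library. The scores are hypothetical, and the sketch shows the population form (divide by n) alongside the sample form (divide by n - 1), since the source does not say which is intended.

```python
from statistics import pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores

# Range: distance between the lowest and highest scores.
score_range = max(scores) - min(scores)

# Population standard deviation: square root of the average squared
# deviation from the mean.
population_sd = pstdev(scores)

# Sample standard deviation: same idea, but divides by n - 1.
sample_sd = stdev(scores)
```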
Data Analysis (7)
Bivariate Analyses
• Bivariate = 2 variables.
• Bivariate analysis = 2 variables are considered
simultaneously = bivariate relationships.
• The strength of the relationship is measured
by the correlation coefficient (CC).
• CC + = the variables vary in the same direction.
• CC - = the variables vary in the opposite direction.
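One common correlation coefficient is Pearson's r; the source does not name a specific coefficient, so using it here is an assumption. The sketch below computes it from scratch on made-up data to show how the sign reflects the direction of the relationship.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / sqrt(ss_x * ss_y)


hours = [1, 2, 3, 4, 5]          # hypothetical data
score_up = [2, 4, 6, 8, 10]      # rises with hours
score_down = [10, 8, 6, 4, 2]    # falls as hours rise

r_positive = pearson_r(hours, score_up)
r_negative = pearson_r(hours, score_down)
```

The coefficient ranges from -1 to +1; values near 0 indicate a weak linear relationship.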
Data Analysis (8)
Multivariate Analyses
• Multivariate = 3 or more variables.
• Examples of MA:
1) Factor analysis
2) Cluster analysis
3) Path analysis
Summary
• Data analyses are planned and carried out to
answer specific research questions.
• Statistical analyses usually begin with simple
one-variable (univariate) description, move on to
two-variable (bivariate) relationships, and finish
with multivariate techniques to approach the
reality of population studies.
