Multivariate Analysis or Survey Data Analysis by Tim Fry

October 2001
exPERT STATS
STATISTICS WORKSHOPS FOR RESEARCH STUDENTS
Multivariate Analysis or Survey Data Analysis
Dr. Tim R.L. Fry

Introduction.
• This workshop is concerned with analysing data collected using sample surveys.
• I will present an overview of techniques that form part of a wider class known as Multivariate Analysis.
• J.F. Hair, R.E. Anderson, R.L. Tatham and W.C. Black, Multivariate Data Analysis, 5th Edition, Prentice Hall, 1998.
• ETC/MKC3500 Survey data analysis: http://webct3.buseco.monash.edu.au:8900/
• (My choice of) Software: S.P.S.S.

About the Data.
Measurement Scales.
• Non-metric measurements are made on a nominal or ordinal scale. They are qualitative (or discrete) and reflect characteristics or attributes.
• Metric measurements are made on an interval or ratio scale. They are quantitative and reflect relative quantity or distance.
• Need to “match” the technique and the measurement scale.

A Classification 1.
Dependence Techniques.
• The explanation or prediction of dependent variable(s) by other independent variable(s).
• Examples include:
– (various types of) Regression,
– Canonical Correlation,
– Structural Equations,
– (Chi-squared) Automatic Interaction Detection [(CH)AID],
– Multivariate Analysis of Variance,
– Conjoint Analysis.

A Classification 2.
Interdependence Techniques.
• The identification of the structure of inter-relationships. Variables are analysed simultaneously with none being designated as either dependent or independent.
• Examples include:
– Factor Analysis,
– Cluster Analysis,
– Multidimensional Scaling,
– Correspondence Analysis.
• Other approaches to analysing survey data that are not “multivariate analysis” include data mining, genetic algorithms and neural networks.

A “Structured” Approach 1.
• Definition Stage.
– Research problem. Objectives. Multivariate technique. “Conceptual Model”.
• Analysis Plan.
– Implementation issues. Sample size. Measurement scales.
• Validate Assumptions.
– Are the statistical and conceptual assumptions met? Preliminary data analysis.

A “Structured” Approach 2.
• Estimation.
– Choice of any estimation method and any options. Estimation of specified model. Model “fit”.
• Interpretation.
– Identify the empirical evidence.
• Validation.
– Are the results generalisable? Diagnostic analysis. Outliers.
• These last stages, especially the last three, tend to interact with each other.

General Guidelines.
• Establish practical as well as statistical significance.
• Sample size affects all results.
• Know your data.
• Strive for model parsimony.
• Look at your errors.
• Validate your results.

A Key Assumption.
• Central to the use of many of these techniques is the assumption that a set of variables has a multivariate Normal distribution.
• A byproduct of this assumption is that we often seek transformations that result in Normality.
• Transformations are also used to achieve constant variance (homoscedasticity) or linearity.
• Note: Any underlying assumption should be stated and, if possible, tested.
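
As an illustration of seeking a transformation that moves a variable towards Normality, the sketch below compares the skewness of a right-skewed variable before and after a log transformation. The data values and the helper name `skewness` are hypothetical and are not taken from the workshop; the workshop's own software is S.P.S.S., and Python is used here only for illustration.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed variable (for example incomes in $000s); not workshop data.
incomes = [12, 15, 18, 20, 22, 25, 30, 40, 65, 150]
log_incomes = [math.log(v) for v in incomes]

raw_skew = skewness(incomes)
log_skew = skewness(log_incomes)
print(f"skewness before: {raw_skew:.2f}, after log transform: {log_skew:.2f}")
```

A skewness closer to zero after the transformation suggests the transformed variable is nearer to symmetric, although symmetry alone does not establish Normality.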

Preliminary Data Analysis 1.
• Before undertaking more “complicated” analysis we should conduct some preliminary data analysis.
• Frequency distributions:
– We can tabulate and cross-tabulate variables.
– We can plot cumulative frequencies.
» Histograms
– Stem and Leaf plots, Box plots.
– Produce 2D and 3D scatterplots, scatterplot matrices.
• These enable us to begin to investigate our data and “test” any assumptions.

Preliminary Data Analysis 2.
• We can use χ² tests of association to begin to investigate any (inter)-relationships in the data.
– Non-metric variables.
• We can test whether our metric variables can be assumed to have Normal distributions.
– Normal Probability Plots compare the cumulative distribution of the observed data with that of a Normally distributed variable [P-P Plots].
– Formal tests also exist based upon either skewness or kurtosis or both.

Graphical Analysis 1.
• Graphical analysis is a common exploratory tool.
• SYSTAT is very good for graphical analysis.
• Two popular techniques are Andrews plots and Chernoff Faces.
– Chernoff faces represent the observation on the p variables by a two dimensional face whose characteristics (shape, nose length, eye size, …) are determined by the standardized measurements of the variables.
– Early work used only 10 variables. However, recent implementations have used up to 20 variables.

Graphical Analysis 2.
• Also check the “interactive” WWW site: http://www.hesketh.com/schampeo/Faces/interactive.html

Outliers.
• An outlier is an observation that is substantially different from the other observations.
– Is it “representative” of the population?
• Detection:
– Univariate [plots, Z scores].
– Bivariate [plots, scatterplots].
– Multivariate [distance measures based upon Mahalanobis D²].
• Retention or Deletion?
– Is it a “valid” observation from the population or not?
» If YES then retain the observation.

Missing Data 1.
• Cannot be avoided!
– Can be for factors external to the respondent (data collection or entry) or for actions by the respondent (refusal/non-response).
– Key questions: is the missing data randomly spread, and how much data is missing?
– Missing data has two main impacts: potential “bias” in the analysis and a reduction of the effective sample size.
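
The univariate and multivariate outlier detection measures listed under Outliers can be sketched for two metric variables as follows. The data are hypothetical, and the function names `z_scores` and `mahalanobis_d2` are illustrative. The 2×2 covariance matrix is inverted by hand so that no external library is needed.

```python
def mean(xs):
    return sum(xs) / len(xs)

def z_scores(xs):
    """Standardise a variable using the sample standard deviation (n - 1)."""
    m = mean(xs)
    sd = (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5
    return [(x - m) / sd for x in xs]

def mahalanobis_d2(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the sample centroid."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # d' S^{-1} d written out for the 2x2 case.
    return [(syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my)
             + sxx * (y - my) ** 2) / det for x, y in zip(xs, ys)]

# Hypothetical data: the last case breaks the otherwise linear pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.0, 2.0]

zx, zy = z_scores(xs), z_scores(ys)
d2 = mahalanobis_d2(xs, ys)
most_extreme = d2.index(max(d2))
print("largest D² at case", most_extreme, "value", round(d2[most_extreme], 2))
```

In this made-up example no single Z score looks extreme, yet the last case has by far the largest D², which is the situation a multivariate measure is intended to catch.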

Missing Data 2.
• Examine the missing data.
– Is it possible to determine the reason(s) for the missing data? Can we determine a missing data process?
• Dealing with missing data.
– Delete respondent(s) [case(s)] and/or variable(s) from the analysis. Extreme version is to use only those respondents with complete data. This is acceptable when data is missing at random and the sample size is large.
– Replacement of the missing values by imputation methods. These methods are either based on all respondents with complete data or on all non-missing data.

An Example
We want a Republic, God Save the Queen
• Current, ongoing research, joint with Sinclair Davidson at RMIT.
• Here we look at analysing some data in The Australian Election Survey, 1998.
– In particular, we will look at questions relating to the “republic issue”.
– Later, we will be analysing The Constitutional Referendum Study, 1999.
– Both data files available from Social Science Data Archives at A.N.U.
» http://ssda.anu.edu.au/
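
The two ways of dealing with missing data described under Missing Data 2 can be sketched as follows: using only respondents with complete data, and replacing missing values with the mean of all non-missing values for that variable. Mean imputation is only one simple imputation method, chosen here for brevity. The records and variable names are hypothetical.

```python
# Hypothetical survey records; None marks a missing response.
rows = [
    {"age": 34, "income": 52.0},
    {"age": 41, "income": None},
    {"age": None, "income": 38.0},
    {"age": 29, "income": 45.0},
    {"age": 55, "income": 61.0},
]
variables = ["age", "income"]

# Option 1: keep only respondents with complete data (listwise deletion).
complete_cases = [r for r in rows if all(r[v] is not None for v in variables)]

# Option 2: impute each missing value with the mean of the non-missing values.
def observed_mean(var):
    observed = [r[var] for r in rows if r[var] is not None]
    return sum(observed) / len(observed)

means = {v: observed_mean(v) for v in variables}
imputed = [{v: (r[v] if r[v] is not None else means[v]) for v in variables}
           for r in rows]

print(len(rows), "respondents,", len(complete_cases), "complete cases")
print("imputed income for respondent 2:", imputed[1]["income"])
```

Deletion shrinks the effective sample size, while mean imputation keeps every respondent but understates the variability of the imputed variable.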

Frequencies

Frequency Table

Importance of Queen
                                        Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Very important                       183      9.6            9.9                 9.9
         Fairly important                     381     20.1           20.5                30.4
         Not very important                  1292     68.1           69.6               100.0
         Total                               1856     97.8          100.0
Total                                        1897    100.0

Referendum vote on republic
                                        Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Definitely vote for republic         690     36.4           37.2                37.2
         Probably vote for republic           427     22.5           23.0                60.3
         Undecided                            218     11.5           11.8                72.0
         Probably vote against republic       235     12.4           12.7                84.7
         Definitely vote against republic     283     14.9           15.3               100.0
         Total                               1853     97.7          100.0
Missing  Missing                               44      2.3
Total                                        1897    100.0
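
The Percent, Valid Percent and Cumulative Percent columns of the “Referendum vote on republic” frequency table can be reproduced from the counts alone. The sketch below uses the frequencies reported in that table. The variable names are illustrative, and rounding to one decimal place is assumed to match the S.P.S.S. display.

```python
# Frequencies from the "Referendum vote on republic" table.
counts = {
    "Definitely vote for republic": 690,
    "Probably vote for republic": 427,
    "Undecided": 218,
    "Probably vote against republic": 235,
    "Definitely vote against republic": 283,
}
missing = 44

valid_total = sum(counts.values())    # respondents who answered the question
grand_total = valid_total + missing   # all respondents in the file

table = []
running = 0
for label, freq in counts.items():
    running += freq
    table.append((
        label,
        freq,
        round(100 * freq / grand_total, 1),     # Percent: base is all respondents
        round(100 * freq / valid_total, 1),     # Valid Percent: base excludes missing
        round(100 * running / valid_total, 1),  # Cumulative Percent of valid cases
    ))

for row in table:
    print(row)
```

The only difference between Percent and Valid Percent is the denominator, which is why the gap between them grows with the amount of missing data.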

Australia a republic
                                        Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Strongly for republic                629     33.2           34.3                34.3
         Favour republic                      577     30.4           31.5                65.8
         Favour retaining Queen               462     24.4           25.2                91.0
         Strongly for retaining Queen         165      8.7            9.0               100.0
         Total                               1833     96.6          100.0
Missing  Missing                               64      3.4
Total                                        1897    100.0

Head of republic from voters
                                             Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Strongly for election by voters           972     51.2           52.8                52.8
         For election by voters                    492     25.9           26.7                79.6
         For election by Parliament                294     15.5           16.0                95.5
         Strongly for election by Parliament        82      4.3            4.5               100.0
         Total                                    1840     97.0          100.0
Missing  Missing                                    57      3.0
Total                                             1897    100.0

Change or retain Aust flag
                                        Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Strongly for flag change             277     14.6           15.1                15.1
         For flag change                      443     23.4           24.1                39.2
         For retaining flag                   509     26.8           27.7                67.0
         Strongly for retaining flag          606     31.9           33.0               100.0
         Total                               1835     96.7          100.0
Total                                        1897    100.0

Crosstabs

Referendum vote on republic * Head of republic from voters Crosstabulation
Count
                                                 Head of republic from voters
                                    Strongly for   For         For           Strongly for
                                    election by    election    election by   election by
Referendum vote on republic         voters         by voters   Parliament    Parliament     Total
Definitely vote for republic             400          122          131            35          688
Probably vote for republic               186          152           77            10          425
Undecided                                 97           84           27             6          214
Probably vote against republic           112           74           38             8          232
Definitely vote against republic         174           60           21            23          278
Total                                    969          492          294            82         1837

Chi-Square Tests
                                   Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square               104.695(a)  12   .000
Likelihood Ratio                 107.135     12   .000
Linear-by-Linear Association       1.427      1   .232
N of Valid Cases                    1837
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.55.
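
The Pearson Chi-Square value in the Chi-Square Tests output can be reproduced from the crosstabulation counts. The sketch below computes the expected count for each cell as (row total × column total) / N and sums (observed − expected)² / expected over all cells. The counts are those reported in the crosstabulation; the variable names are illustrative.

```python
# Rows: referendum vote (definitely for ... definitely against).
# Columns: head of republic (strongly for election by voters ... strongly for
# election by Parliament). Counts from the crosstabulation.
observed = [
    [400, 122, 131, 35],
    [186, 152, 77, 10],
    [97, 84, 27, 6],
    [112, 74, 38, 8],
    [174, 60, 21, 23],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence of the two variables.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)
min_expected = min(min(row) for row in expected)

print(f"chi-square = {chi2:.3f} on {df} df, N = {n}")
print(f"minimum expected count = {min_expected:.2f}")
```

The minimum expected count is worth checking because the χ² approximation is usually regarded as unreliable when many cells have expected counts below 5, which is what footnote a. of the output reports on.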

Correspondence Analysis
Credit: CORRESPONDENCE Version 1.0 by Data Theory Scaling System Group (DTSS), Faculty of Social and Behavioral Sciences, Leiden University, The Netherlands
[Plot: Dimension 1 against Dimension 2, with points for Referendum Vote on Republic and Head of Republic from voters]

Multiple Correspondence Analysis (Multiple Scaling)
Credit: HOMALS Version 1.0 by Data Theory Scaling System Group (DTSS), Faculty of Social and Behavioral Sciences, Leiden University, The Netherlands
[Plot: Dimension 1 against Dimension 2, with points for Vote in House of Representatives, Referendum vote on Republic and Head of Republic from voters]

Logistic Regression

Case Processing Summary
Unweighted Cases(a)                           N   Percent
Selected Cases     Included in Analysis    1746      92.0
                   Missing Cases            151       8.0
                   Total                   1897     100.0
Unselected Cases                              0        .0
Total                                      1897     100.0
a. If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding
Original Value      Internal Value
Not For Republic    0
For Republic        1

Categorical Variables Codings
                                                                   Parameter coding
                                                       Frequency   (1)    (2)    (3)    (4)    (5)    (6)    (7)
Vote in House of     Liberal                                 661   .000   .000   .000   .000   .000   .000   .000
Representatives      Labor                                   701  1.000   .000   .000   .000   .000   .000   .000
                     National (Country)                       84   .000  1.000   .000   .000   .000   .000   .000
                     Australian Democrats                     97   .000   .000  1.000   .000   .000   .000   .000
                     Greens                                   31   .000   .000   .000  1.000   .000   .000   .000
                     One Nation                              113   .000   .000   .000   .000  1.000   .000   .000
                     Other Party                              35   .000   .000   .000   .000   .000  1.000   .000
                     Informal/Didn't vote                     24   .000   .000   .000   .000   .000   .000  1.000
Change or retain     Strongly for flag change                263  1.000   .000   .000
Aust flag            For flag change                         429   .000  1.000   .000
                     For retaining flag                      482   .000   .000  1.000
                     Strongly for retaining flag             572   .000   .000   .000
Head of republic     Strongly for election by voters         911  1.000   .000   .000
from voters          For election by voters                  469   .000  1.000   .000
                     For election by Parliament              287   .000   .000  1.000
                     Strongly for election by Parliament      79   .000   .000   .000
Importance of        Very important                          166   .000   .000
Queen                Fairly important                        363  1.000   .000
                     Not very important                     1217   .000  1.000
Sex                  Male                                    869   .000
                     Female                                  877  1.000

Omnibus Tests of Model Coefficients
                  Chi-square   df   Sig.
Step 1   Step        896.732   16   .000
         Block       896.732   16   .000
         Model       896.732   16   .000

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1               1442.129                   .402                  .544

Classification Table(a)
                                                     Predicted
                                                  Referendum Vote
                                              Not For        For        Percentage
Observed                                      Republic     Republic       Correct
Step 1   Referendum Vote   Not For Republic        464          221          67.7
                           For Republic            106          955          90.0
         Overall Percentage                                                  81.3
a. The cut value is .500

Variables in the Equation
                              B    S.E.      Wald   df   Sig.   Exp(B)
Step 1(a)  G1                               268.017    2   .000
           G1(1)          1.315    .385     11.656    1   .001    3.725
           G1(2)          3.546    .368     92.934    1   .000   34.676
           G3                                 5.817    3   .121
           G3(1)           .613    .338      3.290    1   .070    1.846
           G3(2)           .374    .347      1.160    1   .282    1.454
           G3(3)           .678    .367      3.416    1   .065    1.970
           G4                               160.619    3   .000
           G4(1)          2.621    .282     86.254    1   .000   13.747
           G4(2)          2.008    .191    110.493    1   .000    7.451
           G4(3)           .802    .156     26.446    1   .000    2.230
           B11REPS                           31.041    7   .000
           B11REPS(1)      .290    .153      3.579    1   .059    1.337
           B11REPS(2)     -.580    .321      3.261    1   .071     .560
           B11REPS(3)     -.180    .289       .388    1   .533     .835
           B11REPS(4)      .306    .545       .315    1   .575    1.358
           B11REPS(5)     -.802    .283      8.025    1   .005     .448
           B11REPS(6)     -.640    .445      2.070    1   .150     .527
           B11REPS(7)    -1.469    .510      8.289    1   .004     .230
           I1(1)          -.409    .135      9.180    1   .002     .664
           Constant      -3.592    .486     54.525    1   .000     .028
a. Variable(s) entered on step 1: G1, G3, G4, B11REPS, I1.
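
Several of the summary figures in the logistic regression output can be recovered from other reported numbers, which is a useful check when reading such tables. The sketch below recomputes the percentages correct from the Classification Table counts, Exp(B) as the exponential of B for three coefficients, and the two R Square measures from the model chi-square and the -2 Log likelihood. The R Square formulas are the standard Cox & Snell and Nagelkerke definitions, assumed here to be the ones S.P.S.S. applies. All names are illustrative.

```python
import math

# Classification Table counts (cut value .500): observed by predicted.
obs_not_for = {"pred_not_for": 464, "pred_for": 221}
obs_for = {"pred_not_for": 106, "pred_for": 955}

n = sum(obs_not_for.values()) + sum(obs_for.values())
pct_not_for = 100 * obs_not_for["pred_not_for"] / sum(obs_not_for.values())
pct_for = 100 * obs_for["pred_for"] / sum(obs_for.values())
pct_overall = 100 * (obs_not_for["pred_not_for"] + obs_for["pred_for"]) / n

# Exp(B) is the odds ratio implied by each coefficient B.
coefficients = {"G1(1)": 1.315, "I1(1)": -0.409, "Constant": -3.592}
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

# Pseudo R-squares from the model chi-square and the final -2 log likelihood.
model_chi2 = 896.732
minus2ll_model = 1442.129
minus2ll_null = minus2ll_model + model_chi2   # -2LL of the constant-only model
cox_snell = 1 - math.exp(-model_chi2 / n)
nagelkerke = cox_snell / (1 - math.exp(-minus2ll_null / n))

print(f"correct: {pct_not_for:.1f}% / {pct_for:.1f}% / overall {pct_overall:.1f}%")
print({name: round(value, 3) for name, value in odds_ratios.items()})
print(f"Cox & Snell = {cox_snell:.3f}, Nagelkerke = {nagelkerke:.3f}")
```

Because the printed B values are rounded to three decimals, recomputed odds ratios for large coefficients can differ from the reported Exp(B) in the last digit.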
