Slide 1
Outliers
Multicollinearity
Validation
Sample problem
Slide 2
of variance
The ability of discriminant analysis to extract discriminant functions
that are capable of producing accurate classifications is enhanced
when the assumptions of normality, linearity, and homogeneity of
variance are satisfied.
We will use the script to test for normality, and we will test substituting
the log, square root, or inverse transformation when one of them induces
normality in a variable that fails to satisfy the criteria for normality.
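As a rough illustration of the three transformations named above, the sketch below applies each one to a small set of hypothetical, positively skewed scores and prints the skewness before and after. Skewness is used here only as an illustrative measure; the criteria applied by the course script are not reproduced in this text, and the base-10 logarithm is an assumption.

```python
import math

def skewness(values):
    """Sample skewness (third standardized moment, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical, positively skewed scores (all > 0 so every transformation is defined).
scores = [1, 2, 2, 3, 3, 3, 4, 5, 8, 20]

transformed = {
    "original": scores,
    "log": [math.log10(v) for v in scores],       # base 10 is an assumption
    "square root": [math.sqrt(v) for v in scores],
    "inverse": [1 / v for v in scores],
}

for name, values in transformed.items():
    print(f"{name:12s} skewness = {skewness(values):6.3f}")
```

A transformation would be substituted only if it brings the variable within whatever normality criteria are in use; otherwise the original variable is kept.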
Slide 3
Slide 4
If we reject the null hypothesis and conclude that the variances are
heterogeneous, we substitute separate covariance matrices in the
classification, and evaluate whether or not our classification accuracy
is improved.
SW388R7
Slide 5
Slide 6
The strategy that we will use for detecting outliers is testing each case
as a multivariate outlier, and omitting those cases where the
probability of the Mahalanobis distances is less than or equal to 0.001.
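The outlier rule above can be sketched in a few lines. This is a minimal illustration, not the course script: it handles only two variables, so the chi-square probability with 2 degrees of freedom has the exact closed form exp(-D²/2), and the data are hypothetical. With more predictors, the degrees of freedom equal the number of variables and a statistics library would normally supply the probability.

```python
import math

def mahalanobis_d2_two_vars(cases):
    """Squared Mahalanobis distance of each (x, y) case from the sample centroid."""
    n = len(cases)
    mean_x = sum(x for x, _ in cases) / n
    mean_y = sum(y for _, y in cases) / n
    # Sample variances and covariance (n - 1 in the denominator).
    var_x = sum((x - mean_x) ** 2 for x, _ in cases) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in cases) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in cases) / (n - 1)
    det = var_x * var_y - cov_xy ** 2
    distances = []
    for x, y in cases:
        dx, dy = x - mean_x, y - mean_y
        # d' S^-1 d, using the closed-form inverse of the 2x2 covariance matrix.
        distances.append((var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det)
    return distances

def chi_square_upper_tail_df2(value):
    """P(chi-square with 2 df >= value); for 2 df this is exactly exp(-value / 2)."""
    return math.exp(-value / 2)

# Hypothetical (hours worked, years of school) pairs: 24 ordinary cases plus one extreme case.
ordinary = [(h, e) for h in (35, 40, 45) for e in (12, 14, 16) if (h, e) != (40, 14)] * 3
cases = ordinary + [(90, 14)]

d2 = mahalanobis_d2_two_vars(cases)
p_values = [chi_square_upper_tail_df2(v) for v in d2]

# Cases whose probability is less than or equal to 0.001 are treated as outliers.
flagged = [i for i, p in enumerate(p_values) if p <= 0.001]
print(flagged)
```

Only the last case (90 hours) is flagged in this hypothetical data set; the ordinary cases have probabilities far above 0.001.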
Multicollinearity
Data Analysis &
Computers II
Slide 7
Validation
Slide 8
Slide 9
Problem 1
Slide 10
In the dataset GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Use a
level of significance of 0.05 for the statistical analysis. Use a level of significance of 0.01 for evaluating missing
data and assumptions.
From the list of variables "number of hours worked in the past week" [hrs1], "self-employment" [wrkslf],
"highest year of school completed" [educ], and "income" [rincom98], the most useful predictors for
distinguishing among groups based on responses to "opinion about spending on welfare" [natfare] are "number
of hours worked in the past week" [hrs1], "self-employment" [wrkslf], and "highest year of school completed"
[educ]. These predictors differentiate survey respondents who thought we spend too little money on welfare
from survey respondents who thought we spend about the right amount of money on welfare who, in turn, are
differentiated from survey respondents who thought we spend too much money on welfare.
The most important predictor of groups based on responses to opinion about spending on welfare was number
of hours worked in the past week. The second most important predictor of groups based on responses to
opinion about spending on welfare was self-employment. The third most important predictor of groups based
on responses to opinion about spending on welfare was highest year of school completed.
Survey respondents who thought we spend about the right amount of money on welfare worked fewer hours in
the past week than survey respondents who thought we spend too little or too much money on welfare. Survey
respondents who thought we spend about the right amount of money on welfare had completed more years of
school than survey respondents who thought we spend too little or too much money on welfare. Survey
respondents who thought we spend too much money on welfare were more likely to be self-employed than
survey respondents who thought we spend too little money on welfare.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 1 - 1
Slide 11
Dissecting problem 1 - 2
Slide 12
Dissecting problem 1 - 3
Slide 13
Dissecting problem 1 - 4
Slide 14
In a stepwise analysis, we only interpret the independent variables that are entered in the
stepwise analysis.
The importance of individual predictors is based on order of entry in the analysis.
Dissecting problem 1 - 5
Slide 15
The specific relationships listed in the problem indicate how the independent variable relates
to groups of the dependent variable, e.g., the mean for hours worked in the past week will
be lower for respondents who think we spend the right amount of money versus respondents
who think we spend too much or too little.
LEVEL OF MEASUREMENT - 1
Slide 16
Discriminant analysis requires that the dependent variable be non-metric and the independent
variables be metric or dichotomous.
"Opinion about spending on welfare" [natfare] is an ordinal level variable, which satisfies
the level of measurement requirement.
LEVEL OF MEASUREMENT - 2
Slide 17
"Number of hours worked in the past week" [hrs1] and "highest year of school completed" [educ]
are interval level variables, which satisfies the level of measurement requirements for
discriminant analysis.
"Income" [rincom98] is an ordinal level variable. If we follow the convention of treating
ordinal level variables as metric variables, the level of measurement requirement for
discriminant analysis is satisfied. Since some data analysts do not agree with this
convention, a note of caution should be included in our interpretation.
"Self-employment" [wrkslf] is a dichotomous or dummy-coded nominal variable which may be
included in discriminant analysis.
Slide 18
Run the script to check missing data. Move the variables included in the analysis, mark the
option for missing data, specify that the dependent variable is nonmetric, and click the OK
button.
Be sure to specify that the dependent variable is nonmetric.
Slide 19
Statistics
Slide 20
Cases who had missing data for the variable "number of hours
worked in the past week" [hrs1] had an average score on the
variable "highest year of school completed" [educ] that was 1.87
units lower than the average for cases who had valid data (t=-5.194,
p<0.001) and had an average score on the variable "income"
[rincom98] that was 5.32 units lower than the average for cases who
had valid data (t=-4.758, p<0.001).
Slide 21
Slide 22
Slide 23
Slide 24
Slide 25
Second, type in 3 in the Maximum text box.
Third, click on the Continue button to close the dialog box.
Slide 26
Slide 27
Slide 28
Slide 29
Slide 30
Slide 31
Slide 32
Slide 33
Slide 34
Slide 35
Click on the OK
button to request the
output for the
discriminant
analysis.
Classification accuracy before transformations or removing outliers
Slide 36
Classification Resultsb,c
ASSUMPTION OF NORMALITY
Slide 37
Slide 38
highest year of school completed
Descriptives
Slide 39
highest year of school completed
Slide 40
number of hours worked in the past week
Descriptives
Slide 41
income
Descriptives
Slide 42
Click on the OK
button to produce
the output.
Slide 43
Omitting outliers
Slide 44
Slide 45
excluding outliers
Classification Resultsb,c
SAMPLE SIZE - 1
Slide 46
SAMPLE SIZE - 2
Slide 47
Slide 48
VARIABLE GROUPS
Slide 49
VARIABLE GROUPS
Slide 50
VARIABLE GROUPS
Slide 51
Slide 52
MULTICOLLINEARITY
Slide 53
Slide 54
relationship of functions to groups
Functions at Group Centroids
WELFARE   Function 1   Function 2
1           -.220         .235
2            .446        -.031
3           -.311        -.362
Unstandardized canonical discriminant functions evaluated at group means
Function 1 separates survey respondents who thought we spend about the right amount of money
on welfare (the positive value of 0.446) from survey respondents who thought we spend too
much (negative value of -0.311) or little money (negative value of -0.220) on welfare.
Function 2 separates survey respondents who thought we spend too little money on welfare
(positive value of 0.235) from survey respondents who thought we spend too much money
(negative value of -0.362) on welfare. We ignore the second group (-0.031) in this comparison
because it was distinguished from the other two groups by function 1.
Independent variables and group membership: which predictors to interpret
Slide 55
When we use the stepwise method of variable inclusion, we limit our interpretation of
independent variable predictors to those listed as statistically significant in the table
of Variables Entered/Removed.
We will interpret the impact on membership in groups defined by the dependent variable by
the independent variables:
• number of hours worked in the past week
• self-employment
• highest year of school completed
Had we used simultaneous entry of all variables, we would not have imposed this limitation.
Variables Entered/Removed (a, b, c, d)
Step  Entered                            Min. D Squared Statistic  Between Groups  Exact F Statistic  df1  df2      Sig.
1     NUMBER OF HOURS WORKED LAST WEEK   .023                      1 and 3         .475               1    135.000  .492
2     R SELF-EMP OR WORKS FOR SOMEBODY   .251                      1 and 2         3.289              2    134.000  .040
3     HIGHEST YEAR OF SCHOOL COMPLETED   .364                      1 and 3         2.433              3    133.000  .068
At each step, the variable that maximizes the Mahalanobis distance between the two closest
groups is entered.
a. Maximum number of steps is 8.
b. Maximum significance of F to enter is .05.
c.
Independent variables and group membership: predictor loadings on functions
Slide 56
Structure Matrix
Variable                            Function 1   Function 2
HIGHEST YEAR OF SCHOOL COMPLETED      .687*        .136
NUMBER OF HOURS WORKED LAST WEEK     -.582*        .345
R SELF-EMP OR WORKS FOR SOMEBODY      .223         .889*
RESPONDENTS INCOME (a)                .101         .292*
Pooled within-groups correlations between discriminating variables and standardized
canonical discriminant functions.
Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function
a. This variable not used in the analysis.
Based on the structure matrix, the predictor variables strongly associated with discriminant
function 1 which distinguished between survey respondents who thought we spend about the
right amount of money on welfare and survey respondents who thought we spend too much or
little money on welfare were number of hours worked in the past week (r=-0.582) and highest
year of school completed (r=0.687).
Based on the structure matrix, the predictor variable strongly associated with discriminant
function 2 which distinguished between survey respondents who thought we spend too little
money on welfare and survey respondents who thought we spend too much money on welfare was
self-employment (r=0.889).
Independent variables and group membership: predictors associated with first function - 1
Slide 57
Group Statistics (N columns are Valid N, listwise)
WELFARE         Variable                            Mean    Std. Deviation   Unweighted N   Weighted N
1 TOO LITTLE    NUMBER OF HOURS WORKED LAST WEEK    43.96   13.240           56             56.000
                HIGHEST YEAR OF SCHOOL COMPLETED    13.73    2.401           56             56.000
                R SELF-EMP OR WORKS FOR SOMEBODY     1.93     .260           56             56.000
                RESPONDENTS INCOME                  13.70    5.034           56             56.000
2 ABOUT RIGHT   NUMBER OF HOURS WORKED LAST WEEK    37.90   13.235           50             50.000
                HIGHEST YEAR OF SCHOOL COMPLETED    14.78    2.558           50             50.000
                R SELF-EMP OR WORKS FOR SOMEBODY     1.90     .303           50             50.000
                RESPONDENTS INCOME                  14.00    5.503           50             50.000
3 TOO MUCH      NUMBER OF HOURS WORKED LAST WEEK    42.03   10.456           32             32.000
                HIGHEST YEAR OF SCHOOL COMPLETED    13.38    2.524           32             32.000
                R SELF-EMP OR WORKS FOR SOMEBODY     1.75     .440           32             32.000
                RESPONDENTS INCOME                  14.75    5.304           32             32.000
Total           NUMBER OF HOURS WORKED LAST WEEK    41.32   12.846          138            138.000
The average number of hours worked in the past week for survey respondents who thought we
spend about the right amount of money on welfare (mean=37.90) was lower than the average
number of hours worked in the past week for survey respondents who thought we spend too
little money on welfare (mean=43.96) and survey respondents who thought we spend too much
money on welfare (mean=42.03).
This supports the relationship that "survey respondents who thought we spend about the right
amount of money on welfare worked fewer hours in the past week than survey respondents who
thought we spend too little or much money on welfare."
Independent variables and group membership: predictors associated with first function - 2
Slide 58
Group Statistics (rows for HIGHEST YEAR OF SCHOOL COMPLETED; N columns are Valid N, listwise)
WELFARE         Mean    Std. Deviation   Unweighted N   Weighted N
1 TOO LITTLE    13.73   2.401            56             56.000
2 ABOUT RIGHT   14.78   2.558            50             50.000
3 TOO MUCH      13.38   2.524            32             32.000
The average highest year of school completed for survey respondents who thought we spend
about the right amount of money on welfare (mean=14.78) was higher than the average highest
year of school completed for survey respondents who thought we spend too little money on
welfare (mean=13.73) and survey respondents who thought we spend too much money on welfare
(mean=13.38).
This supports the relationship that "survey respondents who thought we spend about the right
amount of money on welfare had completed more years of school than survey respondents who
thought we spend too little or much money on welfare."
Independent variables and group membership: predictors associated with second function
Slide 59
Group Statistics (rows for R SELF-EMP OR WORKS FOR SOMEBODY; N columns are Valid N, listwise)
WELFARE         Mean   Std. Deviation   Unweighted N   Weighted N
1 TOO LITTLE    1.93   .260             56             56.000
2 ABOUT RIGHT   1.90   .303             50             50.000
3 TOO MUCH      1.75   .440             32             32.000
Since self-employment is a dichotomous variable, the mean is not directly interpretable. Its
interpretation must take into account the coding by which 1 corresponds to self-employed and
2 corresponds to working for someone else. The lower mean for survey respondents who thought
we spend too much money on welfare (mean=1.75), when compared to the mean for survey
respondents who thought we spend too little money on welfare (mean=1.93), implies that the
group contained more survey respondents who were self-employed and fewer survey respondents
who were working for someone else.
This supports the relationship that "survey respondents who thought we spend too much money
on welfare were more likely to be self-employed than survey respondents who thought we spend
too little money on welfare."
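One way to make the mean of a variable coded 1 and 2 easier to read is to convert it to the share of cases coded 1. If p is the proportion coded 1, the mean is 1·p + 2·(1 − p) = 2 − p, so p = 2 − mean. The sketch below applies this to the group means reported for self-employment; the function name is hypothetical.

```python
def share_coded_one(mean_of_1_2_variable):
    """Proportion of cases coded 1 when a variable takes only the values 1 and 2."""
    return 2 - mean_of_1_2_variable

# Group means for self-employment (1 = self-employed, 2 = works for someone else).
group_means = {"too little": 1.93, "about right": 1.90, "too much": 1.75}

for group, mean in group_means.items():
    print(f"{group:12s} self-employed: {share_coded_one(mean):.0%}")
```

Read this way, roughly 25% of the "too much" group were self-employed, compared with roughly 7% of the "too little" group, which is the same relationship the means describe.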
CLASSIFICATION USING THE DISCRIMINANT MODEL: by chance accuracy rate
Slide 60
Slide 61
criteria for classification accuracy
Classification Resultsb,c
Slide 62
VALIDATION OF THE DISCRIMINANT ANALYSIS
Classification Resultsb,c
Slide 63
From the list of variables "number of hours worked in the past week" [hrs1], "self-
employment" [wrkslf], "highest year of school completed" [educ], and "income" [rincom98],
the most useful predictors for distinguishing among groups based on responses to "opinion
about spending on welfare" [natfare] are "number of hours worked in the past week"
[hrs1], "self-employment" [wrkslf], and "highest year of school completed" [educ]. These
predictors differentiate survey respondents who thought we spend too much money on welfare
from survey respondents who thought we spend about the right amount of money on welfare
who, in turn, are differentiated from survey respondents who thought we spend too little
money on welfare.
The stepwise discriminant analysis included the three variables identified as the most
useful predictors.
The most important predictor of groups based on responses to opinion about spending on
welfare was number of hours worked in the past week. The second most important predictor of
groups based on responses to opinion about spending on welfare was self-employment. The
third most important predictor of groups based on responses to opinion about spending on
welfare was highest year of school completed.
Survey respondents who thought we spend about the right amount of money on welfare worked
fewer hours in the past week than survey respondents who thought we spend too much or little
money on welfare. Survey respondents who thought we spend about the right amount of money
on welfare had completed more years of school than survey respondents who thought we
spend too much or little money on welfare. Survey respondents who thought we spend too
much money on welfare were more likely to be self-employed than survey respondents who
thought we spend too little money on welfare.
Slide 64
From the list of variables "number of hours worked in the past week" [hrs1], "self-employment"
[wrkslf], "highest year of school completed" [educ], and "income" [rincom98], the most useful
predictors for distinguishing among groups based on responses to "opinion about spending on
welfare" [natfare] are "number of hours worked in the past week" [hrs1], "self-employment"
[wrkslf], and "highest year of school completed" [educ]. These predictors differentiate survey
respondents who thought we spend too much money on welfare from survey respondents
who thought we spend about the right amount of money on welfare who, in turn, are
differentiated from survey respondents who thought we spend too little money on welfare.
We found two statistically significant discriminant functions, making it possible to
distinguish among the three groups defined by the dependent variable.
Moreover, the cross-validated classification accuracy surpassed the by chance accuracy
criteria, supporting the utility of the model.
The most important predictor of groups based on responses to opinion about spending on
welfare was number of hours worked in the past week. The second most important predictor of
groups based on responses to opinion about spending on welfare was self-employment. The
third most important predictor of groups based on responses to opinion about spending on
welfare was highest year of school completed.
Survey respondents who thought we spend about the right amount of money on welfare worked
fewer hours in the past week than survey respondents who thought we spend too much or little
money on welfare. Survey respondents who thought we spend about the right amount of money
on welfare had completed more years of school than survey respondents who thought we
spend too much or little money on welfare. Survey respondents who thought we spend too
much money on welfare were more likely to be self-employed than survey respondents who
thought we spend too little money on welfare.
Slide 65
From the list of variables "number of hours worked in the past week" [hrs1], "self-employment"
[wrkslf], "highest year of school completed" [educ], and "income" [rincom98], the most useful
predictors for distinguishing among groups based on responses to "opinion about spending on
welfare" [natfare] are "number of hours worked in the past week" [hrs1], "self-employment"
[wrkslf], and "highest year of school completed" [educ]. These predictors differentiate survey
respondents who thought we spend too much money on welfare from survey respondents who
thought we spend about the right amount of money on welfare who, in turn, are differentiated
from survey respondents who thought we spend too little money on welfare.
The order of importance matched the order of entry in the table of "Variables
Entered/Removed."
The most important predictor of groups based on responses to opinion about spending on
welfare was number of hours worked in the past week. The second most important
predictor of groups based on responses to opinion about spending on welfare was self-
employment. The third most important predictor of groups based on responses to opinion
about spending on welfare was highest year of school completed.
Survey respondents who thought we spend about the right amount of money on welfare worked
fewer hours in the past week than survey respondents who thought we spend too much or little
money on welfare. Survey respondents who thought we spend about the right amount of money
on welfare had completed more years of school than survey respondents who thought we
spend too much or little money on welfare. Survey respondents who thought we spend too
much money on welfare were more likely to be self-employed than survey respondents who
thought we spend too little money on welfare.
Slide 66
The most important predictor of groups based on responses to opinion about spending on
welfare was number of hours worked in the past week. The second most important predictor of
groups based on responses to opinion about spending on welfare was self-employment. The
third most important predictor of groups based on responses to opinion about spending on
welfare was highest year of school completed.
We verified that each statement about the relationship between predictors and groups was
correct.
Survey respondents who thought we spend about the right amount of money on welfare
worked fewer hours in the past week than survey respondents who thought we spend too
much or little money on welfare. Survey respondents who thought we spend about the right
amount of money on welfare had completed more years of school than survey respondents
who thought we spend too much or little money on welfare. Survey respondents who
thought we spend too much money on welfare were more likely to be self-employed than
survey respondents who thought we spend too little money on welfare.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
The answer to the question is true with caution. A caution is added because of the inclusion
of ordinal level variables. A caution is added because of a violation of discriminant
analysis assumptions.
Complete discriminant analysis: level of measurement
Slide 67
Yes
Complete discriminant analysis: analyzing missing data
Slide 68
No
Probability of t-tests or chi-square tests <= level of significance?
No
Yes
Slide 69
assumption of normality
No
Yes
Add caution for violation of normality
Use transformation in revised model
No
Slide 71
picking discriminant model for interpretation
Cross-validated accuracy for revised discriminant analysis > accuracy of baseline by 2% or more?
Yes
No
Slide 72
sample size
Yes
Number of cases in smallest group greater than number of independent variables?
No: Inappropriate application of a statistic
Yes
Complete discriminant analysis: assumption of equal dispersion
Slide 73
No
Accuracy rate at least 2% higher using separate-groups covariance matrices?
No
Yes
Slide 74
usable discriminant model
Sufficient statistically significant functions to distinguish DV groups?
No: False
Yes
Yes
Complete discriminant analysis: relationships between IVs and DV
Slide 75
No
Entry order of variables interpreted correctly?
No: False
Yes
Relationships between individual IVs and DV groups interpreted correctly?
No: False
Yes
Complete discriminant analysis: classification accuracy
Slide 76
Cross-validated accuracy is 25% higher than proportional by chance accuracy rate?
No: False
Yes
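The accuracy check in this step can be sketched as follows. The proportional by chance accuracy rate is the sum of the squared proportions of cases in each group, and the criterion is that rate increased by 25%. The group sizes (56, 50, 32) are taken from the Valid N column of the Group Statistics table; the rate computed in the original output is not reproduced in this text, so treat the numbers as an illustration. The function names are hypothetical, and treating "25% higher" as "at least 1.25 times the rate" is an assumption.

```python
def proportional_by_chance_accuracy(group_sizes):
    """Sum of squared group proportions."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)

def meets_accuracy_criterion(cross_validated_accuracy, group_sizes):
    """True when accuracy is at least 25% above the by chance rate (assumed reading)."""
    return cross_validated_accuracy >= 1.25 * proportional_by_chance_accuracy(group_sizes)

# Group sizes from the Group Statistics table: too little, about right, too much.
group_sizes = [56, 50, 32]

chance_rate = proportional_by_chance_accuracy(group_sizes)
criterion = 1.25 * chance_rate
print(f"by chance accuracy rate: {chance_rate:.3f}")
print(f"criterion (25% higher):  {criterion:.3f}")
```

With these group sizes the by chance rate is about 0.350 and the criterion is about 0.437, so a cross-validated accuracy below roughly 43.7% would lead to the "False" branch.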
Complete discriminant analysis: adding cautions to solution
Slide 77
Yes
Yes
Yes
True