Linear Discriminant Analysis for Education Data.

Oct 02, 2013

© Attribution Non-Commercial (BY-NC)


Kyle N. Payne Group 3

Classification of Schools by Academic Achievement Measures
Stat 448 Final Project
Kyle N. Payne

INTRODUCTION

In many applications, it makes logical and practical sense to dichotomize continuous variables. In educational policy, we could practically describe academic performance in terms of high academic achievement and low academic achievement. While it is reasonable to assume that dichotomizing continuous variables causes a considerable loss of information (Cohen, 1983), a dichotomy is also considerably easier to interpret, which could help lawmakers, policy specialists, etc. in the development of suitable educational policy. From an applied perspective, it is also logical to investigate the extent to which demographic variables predict the classification of schools in terms of academic achievement, and such is the subject of the following analysis.

The data set under study consists of math and reading scores from standardized tests administered annually to 3rd and 5th graders in the state of Illinois, as well as several demographic and economic variables. The standardized test in question, the Illinois Standards Achievement Test (ISAT), is intended to assess individual student achievement relative to the Illinois Learning Standards. The data set contains data for cohorts of students measured at both 3rd and 5th grade from 1999 to 2011. Measurements are at the school level, with averages taken across students. The entire dataset consists of 69466 observations across 109 variables, of which 10 were created over the course of the analysis. These created variables consist of coding variables and averages of other variables across similar groups (such as 3rd and 5th grade). The cohort 1 data (training set) consists of 1783 observations across 109 variables, as does the cohort 2 data (test set). The data were compiled by faculty and staff at the University of Illinois department of Labor and Employment Relations.

Note that some analyses are placed in the appendix for ease of reading.

METHODS

For my analysis, I chose to use a quadratic discriminant function analysis to model the membership of elementary schools in Illinois in two classes: schools that obtain High Academic Achievement (HAA) and those that obtain Low Academic Achievement (LAA). The criterion is decided in advance: for cohort 1, a school is coded 1 for HAA if the proportion of students that exceeded expectations in ISAT scores (averaged across math and reading and across grades for each school) is at or above 15%, and 0 for LAA if it is below 15%. The scales for each grade and test subject were equal, which allowed for easy averaging across grades 3 and 5 for each school, as well as across the two test types. The test scores are standardized, meaning that all schools are assessed in the same manner, such that the test scores are relative to an Illinois state standard.

The discriminant analysis was performed using the SAS 9.2 and SAS 9.3 platforms with the stepdisc and discrim procedures. I considered cohort 1 as the training set, and used a stepwise model selection procedure in order to select the appropriate model out of the space of possible models. Predictors selected are general demographic variables of interest, including the average number of low-income students per school, student teacher ratio, etc. For fitting the discriminant function, the variable on which the classification depends is academ_achieve, the proportion of students that exceed expectations on the ISAT averaged across math and reading and grades 3 and 5. The coding variable AA is of the form

$$AA = \begin{cases} 0, & \text{academ\_achieve} < 0.15 \\ 1, & \text{academ\_achieve} \ge 0.15 \end{cases}$$

Here academ_achieve is a measure of the average school-wise score on the ISAT. While each class is not multivariate normally distributed, the quadratic discriminant function is relatively robust to non-normality. However, to assess the performance of the discriminant analysis relative to other methods, I have also used a logistic regression to model the probability of schools being assigned to the two classes. This secondary analysis was done using the SAS 9.2 platform with the logistic procedure.

RESULTS

Section 1

The stepdisc procedure was initially utilized for the following predictors:

avg_stud_lowincome - The average number of low-income students per school
chronic_truant_rate - The average proportion of chronic truancy per school
avg_dist_tch_salary - The average teacher salary per district
avg_perc_dist_tch_badegree - The average percent of teachers with bachelor's degrees per district
avg_perc_dist_tch_madegree - The average percent of teachers with master's degrees per district
bamaxpay_sched - The bachelor's degree maximum pay schedule per school
mamaxpay_sched - The master's degree maximum pay schedule per school

The procedure was carried out with a .05 selection level and .05 significance level. Table 1.1 below demonstrates the first part of the analysis, in which the predictors are entered into the model based upon their significance.

Statistics for Entry, DF = 1, 1708
R-Square   F Value   Pr > F   Tolerance
0.5345     1961.05   <.0001
0.1279     250.52    <.0001
0.0001     0.11      0.7412
0.0137     23.70     <.0001
0.0151     26.16     <.0001

avg_stud_lowincome, chronic_truant_rate, avg_perc_dist_tch_badegree, and avg_perc_dist_tch_madegree are statistically significant at the 0.05 level. We can see that the variable that accounts for the vast majority of the variance explained in the model is avg_stud_lowincome. avg_stud_lowincome is significant when entered into the model, and the multivariate statistics below indicate improvement over the null model.

Multivariate Statistics
Statistic                               Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda                                             1
Pillai's Trace                                            1
Average Squared Canonical Correlation

Table 1.2
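As a rough check on the entry statistics, the F value for a single predictor with two classes can be recovered from its R-square and the denominator degrees of freedom. The sketch below is in Python rather than SAS, and the function name `f_from_r_square` is a hypothetical helper, not part of the original analysis. The R-square and F pairs are those readable from the Statistics for Entry output with DF = 1, 1708.

```python
def f_from_r_square(r_square: float, den_df: int, num_df: int = 1) -> float:
    """F statistic implied by an R-square with the given degrees of freedom."""
    return (r_square / num_df) / ((1.0 - r_square) / den_df)


# (R-square, reported F) pairs read from the first entry step (DF = 1, 1708).
reported = [(0.5345, 1961.05), (0.1279, 250.52), (0.0137, 23.70), (0.0151, 26.16)]

for r2, f_reported in reported:
    f_implied = f_from_r_square(r2, den_df=1708)
    print(f"R-square {r2:.4f}: implied F = {f_implied:8.2f}, reported F = {f_reported:8.2f}")
```

The implied and reported values agree up to the rounding of the printed R-square, which also illustrates why a highly significant F can accompany a very small R-square when the sample is this large.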

However, Table 1.3 shows that at the second step of the stepwise selection process, all other terms have dropped below any practical significance in terms of partial $R^2$:

Statistics for Entry, DF = 1, 1707
Partial R-Square   F Value   Pr > F   Tolerance
                   3.41      0.0652
                   24.60     <.0001
                   12.31     0.0005
                   11.87     0.0006
                   5.96      0.0147
                   1.60      0.2065

Table 1.3

Therefore, while the stepwise process finishes after 4 steps with the significant predictors listed in Table 1.4, we can call into question the practical significance of the other predictors given their very small partial $R^2$ values.

Stepwise Selection Summary
Step   Number In   Entered               Removed   Partial R-Square   F Value   Pr > F   Wilks' Lambda   Pr < Lambda
1      1           avg_stud_lowincome
2      2           avg_dist_tch_salary                                24.60     <.0001   0.45890121      <.0001
3                                                                     9.80      0.0018   0.45628123      <.0001
4                                                                     11.70     0.0006   0.45317070      <.0001

Table 1.4

Thus, I fit the discriminant function with only the avg_stud_lowincome variable as a predictor. The discrim procedure was utilized, with the classification performed on the coded variable AA = {0 for LAA, 1 for HAA}.
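To make the univariate quadratic rule concrete, the following Python sketch computes the posterior probability of HAA from two normal densities with separate variances. It is only an illustration under stated assumptions: the class means and standard deviations are the values listed in Appendix A1, the function names are hypothetical, and the exact computations of the SAS discrim procedure may differ in detail.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_haa(x: float, prior_haa: float = 0.5) -> float:
    """Posterior probability of HAA for a school with x low-income students."""
    # Class moments of avg_stud_lowincome taken from Appendix A1 (assumed inputs).
    laa_mean, laa_sd = 205.1981, 84.2104
    haa_mean, haa_sd = 59.7203, 53.6671
    laa = (1.0 - prior_haa) * normal_pdf(x, laa_mean, laa_sd)
    haa = prior_haa * normal_pdf(x, haa_mean, haa_sd)
    return haa / (laa + haa)


for x in (25, 100, 150, 250):
    print(x, round(posterior_haa(x), 3))
```

A school is assigned to HAA when this posterior exceeds 0.5. Passing `prior_haa=0.527762` instead of 0.5 corresponds to proportional priors and shifts the cut-off slightly in favour of HAA.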

Class Level Information
AA   Variable Name   Frequency   Weight     Proportion   Prior Probability
0    _0              842         842.0000   0.472238     0.500000
1    _1              941         941.0000   0.527762     0.500000

Table 1.5

The two classes are close to evenly split, with roughly 47% of the schools in the LAA category and 53% in the HAA category. As seen in Table 1.7, the overall classification error rate is 0.1611, which consists of a misclassification rate of 0.2138 for the LAA class and 0.1084 for the HAA class.

Number of Observations and Percent Classified into AA
From AA   LAA            HAA             Total
LAA       662 (78.62)    180 (21.38)     842 (100.00)
HAA       102 (10.84)    839 (89.16)     941 (100.00)
Total     764 (42.85)    1019 (57.15)    1783 (100.00)
Priors    0.5            0.5

Table 1.6
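The error rates quoted above follow directly from the counts in Table 1.6. A short Python sketch of the arithmetic, with hypothetical helper names and assuming the overall rate is the prior-weighted average of the per-class rates:

```python
# Counts from Table 1.6: outer key is the true class, inner key the assigned class.
counts = {
    "LAA": {"LAA": 662, "HAA": 180},
    "HAA": {"LAA": 102, "HAA": 839},
}
priors = {"LAA": 0.5, "HAA": 0.5}


def class_error_rates(table):
    """Share of each true class that was assigned to the other class."""
    rates = {}
    for true_class, row in table.items():
        total = sum(row.values())
        rates[true_class] = (total - row[true_class]) / total
    return rates


def overall_error_rate(table, class_priors):
    """Per-class error rates averaged with the prior probabilities as weights."""
    rates = class_error_rates(table)
    return sum(class_priors[c] * rates[c] for c in rates)


rates = class_error_rates(counts)
print({c: round(r, 4) for c, r in rates.items()})    # {'LAA': 0.2138, 'HAA': 0.1084}
print(round(overall_error_rate(counts, priors), 4))  # 0.1611
```

With equal priors the overall rate is simply the mean of the two class rates, which is why it differs from the raw share of misclassified schools.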

Refitting the model with proportional priors, I obtained the same result of non-homogeneous variance between the two groups, and therefore the quadratic discriminant function analysis was used, as seen in Table 1.8. The MANOVA results are similar to those of the non-proportional prior analysis (Table 1.9).

Chi-Square   DF   Pr > ChiSq
                  <.0001

Multivariate Statistics and Exact F Statistics
S=1 M=-0.5 N=889.5
Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.47971133   1931.65   1
Pillai's Trace           0.52028867   1931.65   1
Hotelling-Lawley Trace   1.08458699   1931.65   1
Roy's Greatest Root      1.08458699   1931.65   1

Table 1.9

The use of proportional priors increased the misclassification rate for the LAA class and decreased the misclassification rate for the HAA class. However, these changes were very slight: the analysis with proportional priors resulted in an overall misclassification rate of 0.1621 (Table 1.11), a very slight increase.

Number of Observations and Percent Classified into AA
From AA   LAA   HAA   Total
LAA
HAA
Total
Priors

Error Count Estimates for AA
         LAA   HAA   Total
Rate
Priors

The cross-validated error rate estimates (Table 1.12) are slightly higher than the resubstitution estimates, which are typically less accurate because they are computed on the same data used to fit the model.

Cross Validated Error Count Estimates for AA
         LAA   HAA   Total
Rate
Priors
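The difference between resubstitution and cross-validated error estimates can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the data are synthetic draws whose means and spreads only loosely echo Appendix A1, cross-validation is taken to be leave-one-out, the classifier is a simple univariate normal rule with separate variances, and the error rates are raw proportions rather than prior-weighted estimates.

```python
import math
import random
import statistics


def fit(values):
    """Return the mean and standard deviation of one class."""
    return statistics.mean(values), statistics.stdev(values)


def log_density(x, mean, sd):
    """Log of the normal density, up to an additive constant."""
    return -math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def classify(x, params, priors):
    """Assign x to the class with the largest prior-weighted normal density."""
    return max(params, key=lambda c: math.log(priors[c]) + log_density(x, *params[c]))


def error_rates(data, priors):
    """Return (resubstitution, leave-one-out) error rates for labelled data."""
    classes = sorted({label for _, label in data})
    full = {c: fit([x for x, label in data if label == c]) for c in classes}
    resub_errors = sum(classify(x, full, priors) != label for x, label in data)
    loo_errors = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # refit without the held-out observation
        held_out = {c: fit([v for v, lab in rest if lab == c]) for c in classes}
        loo_errors += classify(x, held_out, priors) != label
    return resub_errors / len(data), loo_errors / len(data)


# Synthetic data only, not the real school records.
rng = random.Random(448)
data = [(rng.gauss(205, 84), "LAA") for _ in range(60)]
data += [(rng.gauss(60, 54), "HAA") for _ in range(60)]
resub, loo = error_rates(data, {"LAA": 0.5, "HAA": 0.5})
print(f"resubstitution = {resub:.3f}, leave-one-out = {loo:.3f}")
```

Because each held-out observation is classified by a rule that never saw it, the leave-one-out estimate is usually a little higher and less optimistic than the resubstitution estimate.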

Because the purpose of the discriminant analysis is to use the training data to classify future data, I treated cohort 1 as a training set and cohort 2 as a test set. While neither data set is completely randomly sampled, we can view cohort 2 as a test set for classification under the assumption that there is no systematic (non-stochastic) difference between the cohorts in the number of low-income students or in ISAT test scores. Therefore, using the cohort 1 data as the training set with proportional priors, the result of the classification of cohort 2 is shown in Table 1.13 below. We can see that a larger proportion of cohort 2 is classified into the HAA class compared with cohort 1.

Number of Observations and Percent Classified into AA
         LAA   HAA   Total
Total

Due to the univariate nature of the discriminant analysis, we can also view the classification visually. Figure 1.1 describes the predicted probability of being classified into the HAA group as a function of the average number of low-income students per school. The blue represents the HAA class, and red represents the LAA class.

Figure 1.1

Reviewing the assumptions for quadratic discriminant analysis, it is clear that there are several violations in this particular analysis. The distributions of the average number of low-income students for the LAA and HAA classes are both highly non-normal (Figure 1.2), which is a consequence of splitting the data into the two classes. However, I proceeded in the face of this because not all violations of assumptions are equally detrimental: while some make an analysis completely invalid, others only affect the precision and accuracy of the analysis to a degree. The robustness of LDA and QDA to violations of normality has been investigated by Sever, Lajovic & Rajer (2005). Their results indicate that the largest effect of non-normality on the discriminant analysis is the increased bias of error count estimates. Skewness in the distribution appears to have little to no effect on the discriminant analysis using LDA or QDA.

Figure 1.2 (panels: AA=0 and AA=1; vertical axis: Percent)
Section 2

Because the classification scheme under study involves classifying data into dichotomous classes, I also used logistic regression, regressing the log odds of a school being classified into either of the AA classes on the average number of low-income students per school. Logistic regression is competitive with discriminant analysis for classification because of its relatively small set of assumptions, and thus the non-normality of the classes is not a violation. The generalized logit link function was utilized as suggested in (Der & Everitt, 2002) due to the ordinal nature of the scale of the response. The test of the global null hypothesis (Table 2.1) and the maximum likelihood estimates (Table 2.2) are all significant. The asymptotic Wald chi-square value should be precise due to the large sample size.

Testing Global Null Hypothesis: BETA=0
Test: Likelihood Ratio, Score, Wald
Chi-Square   DF   Pr > ChiSq
1134.8846    1
927.6747     1

Analysis of Maximum Likelihood Estimates
Parameter            AA   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept            1    1    2.9648     0.1345           485.7581          <.0001
avg_stud_lowincome   1                                                       <.0001

Table 2.2
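The odds ratio estimate for the average number of low-income students on HAA is equal to 0.977 (Table 2.3). This implies that each additional low-income student per school multiplies the estimated odds of the school being classified as HAA by 0.977, so schools with more low-income students are more likely to be in the LAA class.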

Odds Ratio Estimates
Effect               AA   Point Estimate   95% Wald Confidence Limits
avg_stud_lowincome   1    0.977            0.975   0.979

Table 2.3

Viewing the diagnostics (Figures 2.1 and 2.2), there are no obvious violations of the assumption of homogeneity of residual variance. However, we do see that the classes are completely separated in their residuals, which is likely due to the artificiality of the classification scheme.

Figure 2.1

Figure 2.2

Due to the univariate nature of the analysis, we can also view the logistic regression in terms of the effect of the average number of low-income students on the probability of a school being classified as an HAA school. Figure 2.3 describes the predicted probability of a school being classified into the HAA class by the average number of low-income students per school.

Figure 2.3
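The shape of the predicted probability curve can be sketched from the reported estimates. This Python example is only an approximation under stated assumptions: the intercept is the value from Table 2.2, the slope is not taken from the SAS output but recovered as the natural log of the rounded odds ratio 0.977 from Table 2.3, and the fitted model is assumed to describe the log odds of HAA. The function names are hypothetical.

```python
import math

INTERCEPT = 2.9648            # intercept estimate from Table 2.2
ODDS_RATIO = 0.977            # odds ratio per additional low-income student, Table 2.3
SLOPE = math.log(ODDS_RATIO)  # assumption: slope recovered from the rounded odds ratio


def prob_haa(low_income_students: float) -> float:
    """Predicted probability of HAA under the assumed intercept and slope."""
    logit = INTERCEPT + SLOPE * low_income_students
    return 1.0 / (1.0 + math.exp(-logit))


def odds_ratio_for(extra_students: float) -> float:
    """Multiplicative change in the odds of HAA for a given increase in students."""
    return ODDS_RATIO ** extra_students


for x in (0, 50, 100, 200):
    print(x, round(prob_haa(x), 3))
print(round(odds_ratio_for(50), 3))
```

Because the odds ratio compounds multiplicatively, a value that looks close to 1 per student still implies a large change in the odds across the observed range of several hundred low-income students.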

We can also view measures of the association between the predicted probabilities and the observed response. The percent concordant is the percent of pairs of schools with different observed classes in which the school observed in the HAA class also has the higher predicted probability of HAA. The c measure corresponds to the area under the ROC curve. It ranges from 0.5 to 1, where 0.5 reflects a model randomly predicting the response, and 1 a model perfectly classifying the response (Table 2.4). It appears as if the classification is relatively accurate.

Association of Predicted Probabilities and Observed Responses
Percent Concordant   90.8   Somers' D
Percent Discordant
Percent Tied
Pairs
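The association measures can be computed by comparing every pair of observations with different observed classes. The Python sketch below uses hypothetical probabilities for five schools and a hypothetical function name. It treats only exactly equal probabilities as ties, which is an assumption, so SAS output on real data may differ slightly.

```python
def association(probs, events):
    """Percent concordant, discordant, tied, and the c statistic.

    probs  -- predicted probability of the event for each observation
    events -- True where the event was observed, False otherwise
    """
    event_probs = [p for p, e in zip(probs, events) if e]
    other_probs = [p for p, e in zip(probs, events) if not e]
    concordant = discordant = tied = 0
    for pe in event_probs:
        for po in other_probs:
            if pe > po:
                concordant += 1
            elif pe < po:
                discordant += 1
            else:
                tied += 1
    pairs = len(event_probs) * len(other_probs)
    return {
        "pairs": pairs,
        "percent_concordant": 100.0 * concordant / pairs,
        "percent_discordant": 100.0 * discordant / pairs,
        "percent_tied": 100.0 * tied / pairs,
        "c": (concordant + 0.5 * tied) / pairs,
    }


# Hypothetical predicted probabilities of HAA for five schools.
result = association([0.9, 0.8, 0.4, 0.4, 0.1], [True, True, True, False, False])
print(result)
```

In this toy example five of the six pairs are concordant and one is tied, so c counts the tie as half a concordant pair.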

Section 3

In comparing the two models, it appears that the discriminant analysis may give relatively biased predictions when compared to the logistic regression. This reflects the possible bias of the model due to the violations of normality. While the two models do deviate from each other in their predictions of the probability of being classified into the HAA class, they are roughly similar (Figure 3.1).

Figure 3.1

Conclusion

From the two analyses, we can paint a very convincing picture: an increase in the average number of low-income students per school is associated with a decrease in the probability of said school being classified into the High Academic Achievement class. Both models predict that schools with a high number of low-income students have a high probability of being classified as LAA, and therefore the models predict that those schools have a lower proportion of students that exceed expectations on the ISAT.

Not only did the average number of low-income students per school classify schools well, it did so above any other demographic predictor. The model selection process described in Section 1 of the results is evidence towards this point, as avg_stud_lowincome had a partial $R^2$ = 0.5345. This could provide a useful perspective on budgetary decisions, as the average number of low-income students explained much more variance than the average teacher salary per district (although this is a messy comparison, as there is variance in average teacher salary within a district). While this effect size may seem relatively small, it is actually quite high with regard to effect sizes commonly expected in social science. This also speaks to the general noisiness of the data.

Further analysis could look at the relative performance of the discriminant model across each of the cohorts, or use a more sophisticated multivariate regression model where ISAT scores for math and reading are multiple responses. Other types of classification schemes could also be applied to the data, such as K-means clustering, non-parametric discriminant analyses, etc.

Reference

Cohen, J. (1983). Cost of dichotomization. Applied Psychological Measurement, 7(3), 249-250.

Der, G. & Everitt, B. S. (2002). A handbook of statistical analyses using SAS (2nd ed., p. 292). Boca Raton, FL: Chapman & Hall/CRC.

Sever, M., Lajovic, J., & Rajer, B. (2005). Robustness of the Fisher's discriminant. Metodološki zvezki, 2(2), 239-242.

Appendix:

A1. Some univariate results for avg_stud_lowincome:

LAA:

Moments
N                 842          Sum Weights        842
Mean              205.1981     Sum Observations   172776.8
Std Deviation     84.2103863   Variance           7091.38915
Skewness          -0.6552029   Kurtosis           -0.8303315
Uncorrected SS    41417329.4   Corrected SS       5963858.28
Coeff Variation   41.0385799   Std Error Mean     2.90208156

Basic Statistical Measures
Location              Variability
Mean     205.1981     Std Deviation         84.21039
Median   231.7000     Variance              7091
Mode     279.2000     Range                 300.00000
                      Interquartile Range   140.90000

Goodness-of-Fit Tests for Normal Distribution
Test                 Statistic           p Value
Kolmogorov-Smirnov   D      0.1670009    Pr > D      <0.010
Cramer-von Mises     W-Sq   5.4448388    Pr > W-Sq   <0.005
Anderson-Darling     A-Sq   33.6363182   Pr > A-Sq   <0.005

Plot for AA=0: avg_stud_lowincome against Normal Percentiles

HAA:

Moments
N                 941          Sum Weights        941
Mean              59.7202976   Sum Observations   56196.8
Std Deviation     53.6670837   Variance           2880.15587
Skewness          1.18972537   Kurtosis           1.4619666
Uncorrected SS    6063436.14   Corrected SS       2707346.52
Coeff Variation   89.8640595   Std Error Mean     1.74949693

Basic Statistical Measures
Location              Variability
Mean     59.72030     Std Deviation         53.66708
Median   46.40000     Variance              2880
Mode     0.00000      Range                 282.30000
                      Interquartile Range   74.10000

Goodness-of-Fit Tests for Normal Distribution
Test                 Statistic           p Value
Kolmogorov-Smirnov   D      0.1328989    Pr > D
Cramer-von Mises     W-Sq   3.7635999    Pr > W-Sq
Anderson-Darling     A-Sq   24.7062620   Pr > A-Sq

Plot for AA=1: avg_stud_lowincome against Normal Percentiles

Statistics for Removal, DF = 1, 1707
Variable              Partial R-Square   F Value   Pr > F
avg_stud_lowincome    0.5411             2012.52   <.0001
avg_dist_tch_salary   0.0142             24.60     <.0001
No variables can be removed.

Statistics for Entry, DF = 1, 1706
Variable                     Partial R-Square   F Value   Pr > F   Tolerance
chronic_truant_rate          0.0018             3.02      0.0826   0.7843
avg_perc_dist_tch_badegree   0.0057             9.80      0.0018   0.9771
avg_perc_dist_tch_madegree   0.0055             9.38      0.0022   0.9753
bamaxpay_sched               0.0036             6.19      0.0129   0.7578
mamaxpay_sched               0.0011             1.89      0.1690   0.9789
Variable avg_perc_dist_tch_badegree will be entered.

Variable(s) That Have Been Entered
avg_stud_lowincome   avg_dist_tch_salary   avg_perc_dist_tch_badegree

Multivariate Statistics
Statistic                               Value      F Value   Num DF   Den DF   Pr > F
Wilks' Lambda                           0.456281   677.64    3        1706     <.0001
Pillai's Trace                          0.543719   677.64    3        1706     <.0001
Average Squared Canonical Correlation   0.543719
