Discriminant Analysis
Example 9.1: Consider the following data on financial ratios for solvent and bankrupt companies.
Financial Ratios of Bankrupt and Solvent Companies, Altman (1968)
Source: Morrison (1990). Multivariate Statistical Methods,
3rd ed. McGraw-Hill
X1 = Working Capital / Total Assets
X2 = Retained Earnings / Total Assets
X3 = Earnings Before Interest and Taxes / Total Assets
X4 = Market Value of Equity / Total Value of Liabilities
X5 = Sales / Total Assets
Group: 1 = Bankrupt, 2 = Solvent

Group      X1      X2      X3      X4    X5
1        36.7   -62.8   -89.5    54.1   1.7
1        24.0     3.3    -3.5    20.9   1.1
1       -61.6  -120.8  -103.2    24.7   2.5
1        -1.0   -18.1   -28.8    36.2   1.1
1        18.9    -3.8   -50.6    26.4   0.9
1       -57.2   -61.2   -56.6    11.0   1.7
1         3.0   -20.3   -17.4     8.0   1.0
1        -5.1  -194.5   -25.8     6.5   0.5
1        17.9    20.8    -4.3    22.6   1.0
1         5.4  -106.1   -22.9    23.8   1.5
1        23.0   -39.4   -35.7    69.1   1.2
1       -67.6  -164.1   -17.7     8.7   1.3
1      -185.1  -308.9   -65.8    35.7   0.8
1        13.5     7.2   -22.6    96.1   2.0
1        -5.7  -118.3   -34.2    21.7   1.5
1        72.4  -185.9  -280.0    12.5   6.7
1        17.0   -34.6   -19.4    35.5   3.4
1       -31.2   -27.9     6.3     7.0   1.3
1        14.1   -48.2     6.8    16.6   1.6
1       -60.6   -49.2   -17.2     7.2   0.3
1        26.2   -19.2   -36.7    90.4   0.8
1         7.0   -18.1    -6.5    16.5   0.9
1        53.1   -98.0   -20.8    26.6   1.7
1       -17.2  -129.0   -14.2   267.9   1.3
1        32.7    -4.0   -15.8   177.4   2.1
1        26.7    -8.7   -36.3    32.5   2.8
1        -7.7   -59.2   -12.8    21.3   2.1
1        18.0   -13.1   -17.6    14.6   0.9
1         2.0   -38.0     1.6     7.7   1.2
1       -35.3   -57.9     0.7    13.7   0.8
1         5.1    -8.8    -9.1   100.9   0.9
1         0.0   -64.7    -4.0     0.7   0.1
1        25.2   -11.4     4.8     7.0   0.9
2        35.2    43.0    16.4    99.1   1.3
2        38.8    47.0    16.0   126.5   1.9
2        14.0    -3.3     4.0    91.7   2.7
2        55.1    35.0    20.8    72.3   1.9
2        59.3    46.7    12.6   724.1   0.9
2        33.6    20.8    12.5   152.8   2.4
2        52.8    33.0    23.6   475.9   1.5
2        45.6    26.1    10.4   287.9   2.1
2        47.4    68.6    13.8   581.3   1.6
2        40.0    37.3    33.4   228.8   3.5
2        69.0    59.0    23.1   406.0   5.5
2        34.2    49.6    23.8   126.6   1.9
2        47.0    12.5     7.0    53.4   1.8
2        15.4    37.3    34.1   570.1   1.5
2        56.9    35.3     4.2   240.3   0.9
2        43.8    49.5    25.1   115.0   2.6
2        20.7    18.1    13.5    63.1   4.0
2        33.8    31.4    15.7   144.8   1.9
2        35.8    21.5   -14.4    90.0   1.0
2        24.4     8.5     5.8   149.1   1.5
2        48.9    40.6     5.8    82.0   1.8
2        49.9    34.6    26.4   310.0   1.8
2        54.8    19.9    26.7   239.9   2.3
2        39.0    17.4    12.6    60.5   1.3
2        53.0    54.7    14.6   771.7   1.7
2        20.1    53.5    20.6   307.5   1.1
2        53.7    35.6    26.4   289.5   2.0
2        46.1    39.4    30.5   700.0   1.9
2        48.3    53.1     7.1   164.4   1.9
2        46.7    39.8    13.8   229.1   1.2
2        60.3    59.5     7.0   226.6   2.0
2        17.9    16.3    20.4   105.6   1.0
2        24.7    21.7    -7.8   118.6   1.6
Descriptive statistics by group

Statistic            Group           X1        X2        X3         X4      X5
Mean                 Bankrupt     -2.83    -62.51    -31.78      40.05    1.50
                     Solvent      41.40     35.24     15.32     254.67    1.94
Median               Bankrupt      5.40    -39.40    -17.70      21.70    1.20
                     Solvent      45.60     35.60     14.60     164.40    1.80
Standard Deviation   Bankrupt     45.88     71.31     51.35      54.94    1.16
                     Solvent      14.21     16.51     10.87     206.57    0.93
Sample Variance      Bankrupt   2104.57   5085.48   2637.18    3018.22    1.35
                     Solvent     201.99    272.50    118.11   42669.19    0.86
Kurtosis             Bankrupt      6.95      3.31     17.55       9.51   12.30
                     Solvent      -0.63     -0.33      0.71       0.72    6.29
Skewness             Bankrupt     -2.09     -1.69     -3.82       2.91    3.03
                     Solvent      -0.37     -0.18     -0.56       1.31    2.18
Range                Bankrupt    257.50    329.70    286.80     267.20    6.60
                     Solvent      55.00     71.90     48.50     718.30    4.60
Minimum              Bankrupt   -185.10   -308.90   -280.00       0.70    0.10
                     Solvent      14.00     -3.30    -14.40      53.40    0.90
Maximum              Bankrupt     72.40     20.80      6.80     267.90    6.70
                     Solvent      69.00     68.60     34.10     771.70    5.50
Count                Bankrupt        33        33        33         33      33
                     Solvent         33        33        33         33      33

[Figure: Histogram of EBIT / Total Assets (X3), frequency by group (Bankrupt, Solvent). Solvent frequencies for the bins < -51, -35, -20, -5, 10, 25, 40 and above: 0, 0, 0, 2, 7, 17, 7, 0.]
F-test for the equality of variances, X5 (Sales / Total Assets)

                       Bankrupt     Solvent
Mean                    1.50303    1.939394
Variance               1.350928    0.864962
Observations                 33          33
df                           32          32
F                      1.561835
P(F<=f) one-tail       0.106347
F Critical one-tail     1.80448
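The F statistic in this comparison is the ratio of the two sample variances of X5. A minimal Python check using only the reported figures follows; the variable names are illustrative, and reading the reported critical value as a 5% one-tail bound is an assumption.

```python
# Variance-ratio F statistic for X5 (Sales / Total Assets),
# using the sample variances reported in the output above.
var_bankrupt = 1.350928   # group 1, n = 33
var_solvent = 0.864962    # group 2, n = 33

f_stat = var_bankrupt / var_solvent   # larger variance in the numerator
df_num = 33 - 1
df_den = 33 - 1
f_critical = 1.80448      # reported one-tail critical value (level assumed to be 5%)

reject_equal_variances = f_stat > f_critical
print(f"F = {f_stat:.6f} on ({df_num}, {df_den}) df; reject H0: {reject_equal_variances}")
```

Since F stays below the critical value, equal variances of X5 are not rejected at that level.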
More complete use of group separation information, however, can be given by discriminant analysis (DA).
p variables
q exclusive groups
2. The first new variable has the best discriminating power w.r.t. the given groups.
The second new variable has the second best discriminating power and is uncorrelated with the first one, the third has the third best discriminating power and is uncorrelated with the previous ones, etc.
$y_j = a_{j1}x_1 + \cdots + a_{jp}x_p,$
the variables.
The idea in deriving the discriminant functions is to divide the total variation into between-group and within-group variation
$T = B + W,$ (2)
$BW^{-1}$.
The resulting eigenvectors form the coefficients for the discriminant functions $y_j$, $j = 1, \ldots, k$, with $k = \min(q - 1, p)$.
The functions are called canonical discriminant functions.
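A minimal pure-Python sketch of the decomposition T = B + W and of the single discriminant direction for q = 2 groups. The toy data are hypothetical, and the sketch works with W^{-1}B (the form SAS labels INV(E)*H), which has the same eigenvalues as BW^{-1}.

```python
# Toy data (hypothetical values, for illustration only): two groups, two variables.
groups = {
    1: [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)],
    2: [(5.0, 6.0), (6.0, 8.0), (7.0, 7.0)],
}

def mean(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]

def sscp(rows, center):
    """Sum of squares and cross-products of rows about `center` (2 x 2)."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                m[i][j] += d[i] * d[j]
    return m

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(a, v):
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]

all_rows = [r for rows in groups.values() for r in rows]
grand = mean(all_rows)

T = sscp(all_rows, grand)        # total variation
W = [[0.0, 0.0], [0.0, 0.0]]     # within-group variation
B = [[0.0, 0.0], [0.0, 0.0]]     # between-group variation
for rows in groups.values():
    m = mean(rows)
    W = add(W, sscp(rows, m))
    # each group contributes n_g * (m_g - grand)(m_g - grand)'
    B = add(B, sscp([m] * len(rows), grand))

# Inverse of the 2 x 2 matrix W.
det_w = W[0][0] * W[1][1] - W[0][1] * W[1][0]
W_inv = [[W[1][1] / det_w, -W[0][1] / det_w],
         [-W[1][0] / det_w, W[0][0] / det_w]]

M = matmul(W_inv, B)             # W^{-1} B
# With q = 2 groups there is k = min(q - 1, p) = 1 discriminant function;
# its coefficient vector is proportional to W^{-1}(m_1 - m_2).
m1, m2 = mean(groups[1]), mean(groups[2])
a = matvec(W_inv, [m1[0] - m2[0], m1[1] - m2[1]])
eigenvalue = M[0][0] + M[1][1]   # trace = the single nonzero eigenvalue (B has rank 1)
```

With more than two groups a general eigenvalue routine would be needed; the closed form above only covers the two-group case.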
65 DF Total
64 DF Within Classes
1 DF Between Classes

Group   Frequency    Weight   Proportion
1              33   33.0000     0.500000
2              33   33.0000     0.500000
Within-class covariance matrices

GROUP = 1, DF = 32
             X1          X2          X3          X4         X5
X1    2104.5659   1834.1637   -266.4029    249.8980    18.0357
X2    1834.1637   5085.4767   1632.2018    177.7665   -15.6653
X3    -266.4029   1632.2018   2637.1822    168.3066   -46.6066
X4     249.8980    177.7665    168.3066   3018.2188     1.6108
X5      18.0357    -15.6653    -46.6066      1.6108     1.3509

GROUP = 2, DF = 32
             X1          X2          X3          X4         X5
X1      201.986     117.413      16.740     974.165      1.921
X2      117.413     272.496      52.076    1630.092      0.879
X3       16.740      52.076     118.108     814.591      2.762
X4      974.165    1630.092     814.591   42669.190    -14.529
X5        1.921       0.879       2.762     -14.529      0.865
Simple statistics

Total sample
Variable    N         Mean     Variance     Std Dev
X1         66     19.28485         1632    40.39972
X2         66    -13.63485         5064    71.15836
X3         66     -8.23182         1920    43.81308
X4         66    147.35909        34186   184.89362
X5         66      1.72121      1.13924     1.06735

GROUP = 1
Variable    N         Mean     Variance     Std Dev
X1         33     -2.83030         2105    45.87555
X2         33    -62.51212         5085    71.31253
X3         33    -31.78182         2637    51.35350
X4         33     40.04545         3018    54.93832
X5         33      1.50303      1.35093     1.16229

GROUP = 2
Variable    N         Mean     Variance     Std Dev
X1         33     41.40000    201.98563    14.21216
X2         33     35.24242    272.49627    16.50746
X3         33     15.31818    118.10841    10.86777
X4         33    254.67273        42669   206.56522
X5         33      1.93939      0.86496     0.93003
F statistics, Num DF= 1 Den DF= 64

Variable   Total STD   Pooled STD   Between STD   R-Squared   RSQ/(1-RSQ)         F   Pr > F
X1           40.3997      33.9599       31.2755    0.304266        0.4373   27.9892   0.0001
X2           71.1584      51.7589       69.1229    0.479063        0.9196   58.8555   0.0001
X3           43.8131      37.1166       33.3047    0.293363        0.4152   26.5698   0.0001
X4          184.8936     151.1413      151.7644    0.342055        0.5199   33.2726   0.0001
X5            1.0673       1.0526        0.3086    0.042428        0.0443    2.8357   0.0971

Canonical     Adjusted Canonical   Approx Standard   Squared Canonical
Correlation   Correlation          Error             Correlation
0.793876      0.781803             0.045863          0.630239

Eigenvalues of INV(E)*H = CanRsq/(1-CanRsq)

Eigenvalue   Difference   Proportion   Cumulative
    1.7045                    1.0000       1.0000

Likelihood Ratio   Approx F   Num DF   Den DF   Pr > F
      0.36976078    20.4534        5       60   0.0001

Multivariate statistics, M=1.5 N=29

Statistic                      Value         F   Num DF   Den DF   Pr > F
Wilks' Lambda            0.369760775   20.4534        5       60   0.0001
Pillai's Trace           0.630239225   20.4534        5       60   0.0001
Hotelling-Lawley Trace   1.704451275   20.4534        5       60   0.0001
Roy's Greatest Root      1.704451275   20.4534        5       60   0.0001
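With a single canonical function, the reported quantities are tied together by simple identities. A short Python check, using only figures from the output above (variable names are illustrative):

```python
# Values reported in the output above (bankruptcy data, one canonical function).
can_rsq = 0.630239225        # squared canonical correlation (equals Pillai's trace here)
num_df, den_df = 5, 60       # p variables; n - p - 1 = 66 - 5 - 1

eigenvalue = can_rsq / (1.0 - can_rsq)     # CanRsq / (1 - CanRsq)
wilks_lambda = 1.0 / (1.0 + eigenvalue)    # equals 1 - CanRsq with one function
f_stat = eigenvalue * den_df / num_df      # exact F in the two-group case

print(round(eigenvalue, 4), round(wilks_lambda, 6), round(f_stat, 4))
```

This is why all four multivariate statistics give the same F value when there are only two groups.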
Canonical structure (correlations between CAN1 and the variables)

Variable   Total      Between    Pooled Within
X1         0.694823   1.000000   0.506539
X2         0.871854   1.000000   0.734533
X3         0.682260   1.000000   0.493528
X4         0.736708   1.000000   0.552283
X5         0.259462   1.000000   0.161231

The between canonical structure tells how the means of the variables and the means of the discriminant functions are correlated.

Canonical coefficients (CAN1)

Variable   Raw            Total-Sample Standardized   Pooled Within-Class Standardized
X1         0.0034765558   0.1404518774                0.1180635365
X2         0.0084720383   0.6028563830                0.4385036080
X3         0.0152812900   0.6695203123                0.5671902048
X4         0.0030378872   0.5616859665                0.4591503359
X5         0.4984713894   0.5320432994                0.5246858501

Class means on the canonical variable

GROUP   CAN1
1       -1.285613175
2        1.285613175
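The standardized coefficients can be reproduced by scaling each raw coefficient with the corresponding standard deviation. A small Python sketch, assuming the coefficient list 0.00348, ..., 0.49847 in the output above is the raw canonical coefficient vector and using the Total STD and Pooled STD columns reported earlier:

```python
# Raw canonical coefficients for X1..X5 (assumed interpretation of the listed values).
raw = [0.0034765558, 0.0084720383, 0.0152812900, 0.0030378872, 0.4984713894]
# Standard deviations of X1..X5 as reported in the output.
total_std = [40.39972, 71.15836, 43.81308, 184.89362, 1.06735]
pooled_std = [33.9599, 51.7589, 37.1166, 151.1413, 1.0526]

# Standardized coefficient = raw coefficient * standard deviation of the variable.
total_standardized = [c * s for c, s in zip(raw, total_std)]
pooled_standardized = [c * s for c, s in zip(raw, pooled_std)]

for name, t, w in zip(["X1", "X2", "X3", "X4", "X5"],
                      total_standardized, pooled_standardized):
    print(f"{name}: total {t:.4f}, pooled within {w:.4f}")
```

The scaling explains why X5 has a tiny correlation with CAN1 but a large standardized weight: its raw coefficient is large relative to its small standard deviation.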
Example 9.3: (Continued) From the within canonical structure we observe that X2 (Retained Earnings / Total Assets) has the highest correlation with the discriminant function. Next come X4 (Market Value of Equity / Total Value of Liabilities), X1 (Working Capital / Total Assets) and X3 (Earnings Before Interest and Taxes / Total Assets), whereas the correlation of X5 (Sales / Total Assets) is small, although X5 has a large standardized coefficient.
Summing up, profitability and a high market value are the properties that protect a company from bankruptcy.
Example 9.4. Testing for the equality of the population covariance matrices.
$H_0: \Sigma_1 = \Sigma_2,$ (4)
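The notes do not show which test statistic is used here. As an illustration only, the following pure-Python sketch applies one common choice, Box's M test with its chi-square approximation, to the two within-class covariance matrices of the bankruptcy data (n = 33 per group). The choice of Box's M and all function names are assumptions.

```python
import math

def log_det(a):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))

# Within-class covariance matrices as reported (GROUP = 1 and GROUP = 2).
S1 = [[2104.5659, 1834.1637, -266.4029, 249.8980, 18.0357],
      [1834.1637, 5085.4767, 1632.2018, 177.7665, -15.6653],
      [-266.4029, 1632.2018, 2637.1822, 168.3066, -46.6066],
      [249.8980, 177.7665, 168.3066, 3018.2188, 1.6108],
      [18.0357, -15.6653, -46.6066, 1.6108, 1.3509]]
S2 = [[201.986, 117.413, 16.740, 974.165, 1.921],
      [117.413, 272.496, 52.076, 1630.092, 0.879],
      [16.740, 52.076, 118.108, 814.591, 2.762],
      [974.165, 1630.092, 814.591, 42669.190, -14.529],
      [1.921, 0.879, 2.762, -14.529, 0.865]]
sizes = [33, 33]
mats = [S1, S2]

p = len(S1)                 # number of variables
g = len(mats)               # number of groups
N = sum(sizes)

# Pooled covariance matrix, weighted by the degrees of freedom n_i - 1.
pooled = [[sum((n - 1) * S[i][j] for n, S in zip(sizes, mats)) / (N - g)
           for j in range(p)] for i in range(p)]

# Box's M and the scale factor of its chi-square approximation.
box_m = (N - g) * log_det(pooled) - sum((n - 1) * log_det(S)
                                        for n, S in zip(sizes, mats))
correction = 1.0 - ((2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (g - 1))) * (
    sum(1.0 / (n - 1) for n in sizes) - 1.0 / (N - g))
chi2_stat = correction * box_m
df = p * (p + 1) * (g - 1) // 2

print(f"Box's M = {box_m:.2f}, chi-square = {chi2_stat:.2f} on {df} df")
```

The statistic is compared with a chi-square distribution with df degrees of freedom; a large value speaks against equal covariance matrices.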
Variables: Sepal Length, Sepal Width, Petal Length, Petal Width
149 DF Total
147 DF Within Classes
2 DF Between Classes

Species      Frequency    Weight   Proportion
SETOSA              50   50.0000     0.333333
VERSICOLOR          50   50.0000     0.333333
VIRGINICA           50   50.0000     0.333333
Multivariate statistics, M=0.5 N=71

Statistic                      Value         F   Num DF   Den DF   Pr > F
Wilks' Lambda            0.023438631   199.145        8      288   0.0001
Pillai's Trace           1.191898825   53.4665        8      290   0.0001
Hotelling-Lawley Trace   32.47732024   580.532        8      286   0.0001
Roy's Greatest Root       32.1919292   1166.96        4      145   0.0001
     Canonical     Adjusted Canonical   Approx Standard   Squared Canonical
     Correlation   Correlation          Error             Correlation
1    0.984821      0.984508             0.002468          0.969872
2    0.471197      0.461445             0.063734          0.222027

Eigenvalues of INV(E)*H = CanRsq/(1-CanRsq)

     Eigenvalue   Difference   Proportion   Cumulative
1       32.1919      31.9065       0.9912       0.9912
2        0.2854            .       0.0088       1.0000

     Likelihood Ratio   Approx F   Num DF   Den DF   Pr > F
1          0.02343863   199.1453        8      288   0.0001
2          0.77797337    13.7939        3      145   0.0001
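As a worked check of the relations behind these figures, the eigenvalues, the proportion explained by the first function and Wilks' lambda follow from the squared canonical correlations $r_1^2 = 0.969872$ and $r_2^2 = 0.222027$ (rounded inputs, so the results agree only approximately):

```latex
\begin{align*}
\lambda_1 &= \frac{r_1^2}{1-r_1^2} = \frac{0.969872}{0.030128} \approx 32.19, &
\lambda_2 &= \frac{r_2^2}{1-r_2^2} = \frac{0.222027}{0.777973} \approx 0.2854,\\
\frac{\lambda_1}{\lambda_1+\lambda_2} &= \frac{32.1919}{32.4773} \approx 0.9912, &
\Lambda &= (1-r_1^2)(1-r_2^2) = 0.030128 \times 0.777973 \approx 0.0234 .
\end{align*}
```

The second likelihood ratio, 0.77797337, is the same product restricted to the remaining function, $1 - r_2^2$.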
Variables: SEPALLEN = Sepal Length in mm., SEPALWID = Sepal Width in mm., PETALLEN = Petal Length in mm., PETALWID = Petal Width in mm.

Between canonical structure
Variable         CAN1        CAN2
SEPALLEN     0.991468    0.130348
SEPALWID    -0.825658    0.564171
PETALLEN     0.999750    0.022358
PETALWID     0.994044    0.108977

Pooled within canonical structure
Variable         CAN1        CAN2
SEPALLEN     0.222596    0.310812
SEPALWID    -0.119012    0.863681
PETALLEN     0.706065    0.167701
PETALWID     0.633178    0.737242

Total canonical structure
Variable         CAN1        CAN2
SEPALLEN     0.791888    0.217593
SEPALWID    -0.530759    0.757989
PETALLEN     0.984951    0.046037
PETALWID     0.972812    0.222902

Total-sample standardized canonical coefficients
Variable            CAN1            CAN2
SEPALLEN    -0.686779533     0.019958173
SEPALWID    -0.668825075     0.943441829
PETALLEN     3.885795047    -1.645118866
PETALWID     2.142238715     2.164135931

Pooled within-class standardized canonical coefficients
Variable             CAN1             CAN2
SEPALLEN    -0.4269548486     0.0124075316
SEPALWID    -0.5212416758     0.7352613085
PETALLEN     0.9472572487    -0.4010378190
PETALWID     0.5751607719     0.5810398645

Raw canonical coefficients
Variable             CAN1             CAN2
SEPALLEN    -0.0829377642     0.0024102149
SEPALWID    -0.1534473068     0.2164521235
PETALLEN     0.2201211656    -0.0931921210
PETALWID     0.2810460309     0.2839187853

Class means on canonical variables
Species              CAN1            CAN2
SETOSA       -7.607599927     0.215133017
VERSICOLOR    1.825049490    -0.727899622
VIRGINICA     5.782550437     0.512766605
$H_1$: More is needed (5)
On the basis of the within-matrices the first discriminator indicates that the species differ with respect to the overall size of the leaves and the second discriminator that species differ also with respect to the width of the leaves.
Example 9.6: Bankruptcy risk and signal to reorganization of a company (Laitinen, Luoma, Pynnönen 1996, UV, Discussion Papers 200)
Sample statistics:

             B1 (n=20)         B2 (n=20)         N3 (n=17)         N4 (n=23)       F for eq
Variable    Mean  Std Dev     Mean  Std Dev     Mean  Std Dev     Mean  Std Dev    of means
ROI       -10.24     8.60     3.52     5.59     2.27     7.14    12.02     5.96    37.66***
TCF       -13.32    10.83     0.13     2.31     0.97     5.00     6.47     5.67    32.48***
QRA         0.58     0.39     0.57     0.55     1.14     0.70     0.85     0.42     4.95**
SCA        -0.61    20.22    -4.75    18.79    13.62    13.19    23.13    19.55    10.39***
DSR         1.09     0.55     0.69     0.25     0.88     0.34     0.57     0.28     7.62***

**=significant at level 0.01
***=significant at level 0.001
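The F statistics for equality of means can be approximately reproduced from the group sizes, means and standard deviations alone. A minimal Python sketch of the one-way ANOVA computation (the function name is illustrative; results differ slightly from the table because the inputs are rounded):

```python
def anova_f(n, means, sds):
    """One-way ANOVA F statistic from group sizes, means and standard deviations."""
    total_n = sum(n)
    k = len(n)
    grand_mean = sum(ni * mi for ni, mi in zip(n, means)) / total_n
    ss_between = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, means))
    ss_within = sum((ni - 1) * si ** 2 for ni, si in zip(n, sds))
    return (ss_between / (k - 1)) / (ss_within / (total_n - k))

n = [20, 20, 17, 23]  # group sizes of B1, B2, N3, N4
f_roi = anova_f(n, [-10.24, 3.52, 2.27, 12.02], [8.60, 5.59, 7.14, 5.96])
f_tcf = anova_f(n, [-13.32, 0.13, 0.97, 6.47], [10.83, 2.31, 5.00, 5.67])
print(f"ROI: F = {f_roi:.2f}, TCF: F = {f_tcf:.2f}")
```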
The results indicate that also the third canonical discriminant function is statistically significant.
Table 11. Canonical structure and Standardized canonical coefficients both as pooled within.
Variable CAN1
ROI 0.702
TCF 0.643
QRA 0.101
SCA 0.252
DSR -0.306
Group differences:
CAN1, the financial performance, shows that financial performance is the main characteristic differentiating healthy and bankrupt firms (as expected).
CAN2, the contrast between dynamic liquidity and static ratios, is the characteristic differentiating reorganizable non-bankrupt and reorganizable bankrupt firms.
CAN3, the contrast between liquidity and other ratios, differentiates reorganizable non-bankrupt firms and healthy firms. The distinction is probably due to the fact that non-bankrupt firms may have cash reserves (high liquidity) but do not use them profitably.
Classification
The other main use of discriminant analysis is to predict which of the given classes an observation comes from (disease diagnostics, bankruptcy prediction, etc.).
The goal is to minimize the misclassification rate (two groups labeled 1 and 2)
$P(E) = p_1 P(2|1) + p_2 P(1|2),$ (6)
where $P(E)$ denotes the misclassification probability, $p_i$ is the probability that an observation is from group $i$, and $P(j|i)$ denotes the probability that an observation coming from group $i$ is classified to group $j$, $i, j = 1, 2$, and $p_1 + p_2 = 1$.
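The misclassification probability is a prior-weighted sum of the two conditional error probabilities. A rough Python sketch that estimates it from a confusion table, using the counts of the logistic classification table shown later in this section (31, 2, 1, 32); this is the apparent (resubstitution) error rate, and the function name is illustrative:

```python
# Rows: observed group, columns: predicted group
# (counts taken from the classification table later in this section).
counts = {1: {1: 31, 2: 2}, 2: {1: 1, 2: 32}}
priors = {1: 0.5, 2: 0.5}

def misclassification_probability(counts, priors):
    """Prior-weighted sum of the estimated conditional error probabilities."""
    p_error = 0.0
    for group, row in counts.items():
        n_group = sum(row.values())
        wrong = n_group - row[group]          # observations classified to the other group
        p_error += priors[group] * wrong / n_group
    return p_error

p_error = misclassification_probability(counts, priors)
print(f"estimated P(E) = {p_error:.4f}, correct = {100 * (1 - p_error):.1f}%")
```

Because the same observations are used for fitting and for counting errors, this estimate tends to be optimistic.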
Group   Frequency    Weight   Proportion   Prior Probability
1              33   33.0000     0.500000            0.500000
2              33   33.0000     0.500000            0.500000
The probabilities pi indicate the prior probabilities or the population proportion of the
group i.
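A minimal Python sketch of how a classification rule can be applied, using the constants and coefficients listed for j = 1 and j = 2 in the discriminant analysis output later in this section. Reading those two columns as the linear classification functions of the groups is an assumption, and the two example companies are simply the first values listed for each ratio in Group 1 and Group 2.

```python
# Constants and coefficients as listed for j = 1 (bankrupt) and j = 2 (solvent).
# Interpreting them as linear classification functions is an assumption.
functions = {
    1: {"const": -1.76280, "coef": [0.01113, -0.03003, 0.01810, 0.00266, 1.42947]},
    2: {"const": -4.67181, "coef": [0.02007, -0.00825, 0.05739, 0.01047, 2.71115]},
}

def score(x, j):
    """Value of the j-th linear function at the ratio vector x = (X1, ..., X5)."""
    f = functions[j]
    return f["const"] + sum(b * v for b, v in zip(f["coef"], x))

def classify(x):
    # Equal priors (0.5, 0.5) add the same term to both scores, so they are omitted.
    return max(functions, key=lambda j: score(x, j))

bankrupt_like = [36.7, -62.8, -89.5, 54.1, 1.7]
solvent_like = [35.2, 43.0, 16.4, 99.1, 1.3]
print(classify(bankrupt_like), classify(solvent_like))
```

With unequal priors, the logarithm of each prior would shift the corresponding score, moving the boundary toward the less likely group.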
Discriminant Analysis

Covariance Matrix Rank: 5

Squared distance between groups
From Group   To Group 1   To Group 2
1                     0      6.61120
2               6.61120            0

Linear function coefficients by group j
GROUP          j = 1       j = 2
CONSTANT    -1.76280    -4.67181
X1           0.01113     0.02007
X2          -0.03003    -0.00825
X3           0.01810     0.05739
X4           0.00266     0.01047
X5           1.42947     2.71115

gence problems).

proc logistic data = a.bankruptcy;
  * wcta (x1) reta (x2) ebitta (x3) mvetvl (x4) sta (x5);
  model group = reta / ctable;
run;

Response Profile
Ordered Value   Group   Total Frequency
1               1       33
2               2       33

Criterion   Intercept Only   Intercept and Covariates
AIC                 93.495                     19.804
SC                  95.685                     24.183
-2 Log L            91.495                     15.804
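The fit criteria are linked by simple formulas: AIC adds 2 per estimated parameter to -2 Log L, and SC adds ln(n) per parameter. A short Python check against the reported values, assuming n = 66 observations and 1 versus 2 estimated parameters (intercept only; intercept and reta):

```python
import math

n_obs = 66
neg2_logl_null = 91.495     # intercept only
neg2_logl_model = 15.804    # intercept and covariate (reta)

aic_null = neg2_logl_null + 2 * 1
aic_model = neg2_logl_model + 2 * 2
sc_null = neg2_logl_null + 1 * math.log(n_obs)
sc_model = neg2_logl_model + 2 * math.log(n_obs)

# Likelihood ratio chi-square for the covariate: drop in -2 Log L (1 df).
lr_chi_square = neg2_logl_null - neg2_logl_model

print(f"AIC: {aic_null:.3f} / {aic_model:.3f}")
print(f"SC:  {sc_null:.3f} / {sc_model:.3f}")
print(f"LR chi-square: {lr_chi_square:.3f}")
```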
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio      75.6917    1       <.0001
Score                 31.6182    1       <.0001
Wald                   9.5805    1       0.0020

                              Standard         Wald
Parameter   DF   Estimate        Error   Chi-Square   Pr > ChiSq
Intercept    1     1.1666       0.8165       2.0417       0.1530
reta         1    -0.1767       0.0571       9.5805       0.0020

Classification Table

           Correct            Incorrect                     Percentages
Prob               Non-               Non-              Sensi-   Speci-   False   False
Level     Event   Event     Event   Event    Correct   tivity   ficity     POS     NEG
0.500        31      32         1       2       95.5     93.9     97.0     3.1     5.9

SPSS results:

Classification Table(a)
==========================================
                  Predicted
                  Group        Percentage
                  1       2    Correct
------------------------------------------
Observed   1      31      2    93.9
Group      2       1     32    97.0
------------------------------------------
Overall Percentage             95.5
==========================================
a The cut value is .500
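A minimal Python sketch of how the fitted logistic model turns a value of reta into a predicted probability and a classification at the cut value 0.5. It uses the reported estimates (intercept 1.1666, reta -0.1767) and assumes that the modelled event is Group = 1 (bankrupt), the first ordered value in the response profile; the function names are illustrative.

```python
import math

intercept = 1.1666      # reported estimate
slope_reta = -0.1767    # reported estimate for reta (X2)

def prob_event(reta):
    """Predicted probability of the modelled event (assumed: Group = 1, bankrupt)."""
    eta = intercept + slope_reta * reta
    return 1.0 / (1.0 + math.exp(-eta))

def classify(reta, cut=0.5):
    return 1 if prob_event(reta) >= cut else 2

# Value of reta at which the predicted probability equals 0.5.
boundary_reta = -intercept / slope_reta

print(f"P(event | reta = -62.8) = {prob_event(-62.8):.4f}")
print(f"P(event | reta = 43.0)  = {prob_event(43.0):.4f}")
print(f"boundary at reta = {boundary_reta:.2f}")
```

Under this reading, companies with retained earnings to total assets below the boundary value are classified as bankrupt.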
ables.