
BANKRUPTCY DATA

Principal Component Analysis

Step 1: We should find the principal components of the dataset because, if all the attributes
are used, the model gives a squared correlation of 100%, which is not plausible for any model.

To find the principal components, follow these steps:


1) Drag the Bankruptcy file into the process (excluding the NO column).
2) Drag Select Attributes (to select all attributes except YR and D).
3) Normalize the data.
4) Drag the PCA operator and play the process.
After the process has run, the top principal components are selected from the eigenvectors:
for each component, the attribute with the highest value is taken, irrespective of its
positive or negative sign.
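The normalization and selection steps above can be sketched in Python. This is a minimal illustration, not the RapidMiner implementation: it assumes z-score normalization with the population standard deviation, and the loadings in `pc1_loadings` are hypothetical placeholders, since the actual Eigenvectors table is not reproduced in this report.

```python
import statistics

def z_score(values):
    """Normalize a column to mean 0 and standard deviation 1 (assumed method)."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return [(v - mean) / spread for v in values]

def dominant_attribute(loadings):
    """Return the attribute whose loading is largest in absolute value."""
    return max(loadings, key=lambda name: abs(loadings[name]))

# Hypothetical loadings for one component (NOT the values from the report).
pc1_loadings = {"R2": 0.12, "R5": -0.31, "R20": 0.27, "R23": -0.58}

normalized = z_score([1.0, 2.0, 3.0])
chosen = dominant_attribute(pc1_loadings)
print(chosen)
```

The sign of the loading is ignored on purpose: a large negative loading ties the attribute to the component just as strongly as a large positive one.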

PRINCIPAL COMPONENTS

PC1 = R23, PC2 = R5, PC3 = R2, PC4 = R20, PC5 = R11, PC6 = R19, PC7 = R8, PC8 = R7.

The picture above is the graphical representation of the cumulative variance of the principal
components.
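For reference, cumulative variance is the running share of total variance explained by the first k components. The sketch below shows the arithmetic; the eigenvalues are hypothetical, since the PCA output values are not listed in this report.

```python
from itertools import accumulate

def cumulative_variance(eigenvalues):
    """Running proportion of total variance, components in descending order."""
    total = sum(eigenvalues)
    return [running / total for running in accumulate(eigenvalues)]

def components_needed(cumulative, threshold):
    """Smallest number of components whose cumulative share reaches threshold."""
    for count, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return count
    return len(cumulative)

# Hypothetical eigenvalues (NOT taken from the Bankruptcy data).
eigenvalues = [4.0, 2.0, 1.0, 1.0]
cumulative = cumulative_variance(eigenvalues)
print(cumulative)  # [0.5, 0.75, 0.875, 1.0]
print(components_needed(cumulative, 0.8))  # 3
```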
We have to build two models for this data (Logistic Regression and Decision Tree).
Before moving ahead, we change the values of D to DYN {Failure (when D = 0) and
Healthy (when D = 1)} in the Excel sheet, because the logistic regression operator does not
accept a numeric label and requires a non-numeric one. Also, before importing the new file, we
exclude all the attributes except the principal component attributes and D.
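The same preparation could be done in code instead of by hand in Excel. The sketch below is one possible way, assuming each row is a dictionary keyed by column name; the sample row is invented for illustration, and the helper name `prepare_row` is hypothetical.

```python
# Attributes chosen as principal components in the report.
PC_ATTRIBUTES = ["R23", "R5", "R2", "R20", "R11", "R19", "R8", "R7"]

# Recoding of the numeric outcome D into the nominal label DYN.
D_TO_DYN = {0: "Failure", 1: "Healthy"}

def prepare_row(row):
    """Keep only the selected attributes and replace D with its DYN label."""
    kept = {name: row[name] for name in PC_ATTRIBUTES}
    kept["DYN"] = D_TO_DYN[int(row["D"])]
    return kept

# Invented sample row (values are placeholders, not real data).
sample = {"NO": 1, "D": 0, "YR": 78}
sample.update({f"R{i}": 0.1 * i for i in range(1, 24)})

prepared = prepare_row(sample)
print(prepared["DYN"])  # Failure
```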
LOGISTIC REGRESSION

Logistic regression is used when the outcome variable has only two values. In this example
the outcome variable is “Firm Category”, which takes only two values: “Healthy” or “Failed”.
The input variables are continuous.

Steps involved in logistic regression:


1) Drag the Bankruptcy file into a blank process.
2) Drag Select Attributes (not necessary in this case, because the non-required data has
already been excluded).
3) Drag Set Role to set DYN as the label attribute.
4) Drag Split Data to divide the data in the ratio 0.7 to 0.3.
5) Next, drag Logistic Regression.
6) Drag Apply Model and connect the mod port of Logistic Regression to the mod port of
Apply Model, and a par port of Split Data to the unl port.
7) Drag Performance (Binominal Classification) and tick classification error and
accuracy.
8) Play the process.
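As an illustration of step 4, a 0.7/0.3 split can be written as follows. This is only a sketch: it assumes shuffled sampling with a fixed seed, which may differ from the sampling type used by the Split Data operator in the actual process, and `split_data` is a hypothetical helper.

```python
import random

def split_data(rows, train_ratio=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))  # stand-in for ten example rows
train, test = split_data(rows)
print(len(train), len(test))  # 7 3
```

The model is fitted on the first partition only, and the second partition is what Apply Model scores.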

After playing the process we get the performance vector, the example set from Apply Model and
the Logistic Regression tab.
The example set (Apply Model) shows the actual value and the predicted value of DYN.
The fitted equation is:
Logit = 5.036*R2 + 0.226*R5 + 6.398*R7 + 1.142*R8 + 4.937*R9 + 7.0669*R11 -
24.782*R17 - 0.065*R19 + 0.757*R20 + 2.390*R21 + 24.043*R23 - 23.212 + ℇ
These variables have the greatest effect on the model. Variables with a “+” sign affect the
logit positively and variables with a “-” sign affect it negatively.
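To show how the equation is used, the sketch below evaluates the logit for one firm and converts it to a probability with the logistic function. The coefficients are copied from the equation above; the error term is dropped, and the report does not state which class the resulting probability refers to or whether the inputs are normalized, so treat both as open assumptions. The firm values are invented.

```python
import math

# Coefficients and intercept as reported in the equation above.
COEFFICIENTS = {
    "R2": 5.036, "R5": 0.226, "R7": 6.398, "R8": 1.142, "R9": 4.937,
    "R11": 7.0669, "R17": -24.782, "R19": -0.065, "R20": 0.757,
    "R21": 2.390, "R23": 24.043,
}
INTERCEPT = -23.212

def logit(firm):
    """Linear predictor: intercept plus the weighted sum of the attributes."""
    return INTERCEPT + sum(c * firm[name] for name, c in COEFFICIENTS.items())

def probability(firm):
    """Logistic transform of the logit (class orientation is an assumption)."""
    return 1.0 / (1.0 + math.exp(-logit(firm)))

# Invented firm: every attribute 0 except R23.
firm = {name: 0.0 for name in COEFFICIENTS}
firm["R23"] = 1.0
print(logit(firm), probability(firm))
```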
The performance vector shows that the accuracy of the model is 67.31%, and the table shows
that:
➢ A predicted failure was actually a failure 16 times out of 23
(69.57% precision)
➢ A predicted healthy firm was actually healthy 19 times out of 29
(65.52% precision)
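These figures follow directly from the counts in the table. A short check of the arithmetic, using only the four numbers reported above:

```python
# Counts from the performance table: correct predictions / total predictions.
fail_correct, fail_predicted = 16, 23
healthy_correct, healthy_predicted = 19, 29

def percent(part, whole):
    """Ratio expressed as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

fail_precision = percent(fail_correct, fail_predicted)
healthy_precision = percent(healthy_correct, healthy_predicted)

total_correct = fail_correct + healthy_correct
total_cases = fail_predicted + healthy_predicted
accuracy = percent(total_correct, total_cases)
classification_error = percent(total_cases - total_correct, total_cases)

print(fail_precision, healthy_precision)  # 69.57 65.52
print(accuracy, classification_error)     # 67.31 32.69
```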
DECISION TREE MODEL

Steps involved in decision tree:


1) Drag the Bankruptcy file into a blank process.
2) Drag Select Attributes (not necessary in this case, because the non-required data has
already been excluded).
3) Drag Set Role to set DYN as the label attribute.
4) Drag Split Data to divide the data in the ratio 0.7 to 0.3.
5) Next, drag Decision Tree.
6) Drag Apply Model and connect the mod port of Decision Tree to the mod port of Apply
Model, and a par port of Split Data to the unl port.
7) Drag Performance (Classification) and tick classification error and accuracy.
8) Play the process.
The performance vector shows that the accuracy of the model is 67.50%, and the table shows
that:
➢ A predicted failure was actually a failure 14 times out of 21
(66.67% precision)
➢ A predicted healthy firm was actually healthy 13 times out of 19
(68.42% precision)

If R21 ≤ 1.750 and R9 ≤ 1.945 and R8 > 11.070 and R8 > 87.760, then the firm category is
“Fail”.

If R21 ≤ 1.750 and R9 ≤ 1.945 and R8 > 11.070 and R8 ≤ 87.760, then the firm category is
“Healthy”.

Interpretation: If R21 > 1.750, the next split is on R11, which according to the description
is Current Assets/Assets. If Current Assets/Assets > 0.500, the firm is healthy. If Current
Assets/Assets is less than or equal to 0.500, the next split is on R17, which according to the
description is INCDEP/Assets. If INCDEP/Assets > 0.105, the firm is healthy; if it is less
than or equal to 0.105, the firm fails.
If R21 is less than or equal to 1.750, the next split is on R9, which is Current
Assets/Current Debts. If Current Assets/Current Debts is greater than 1.945, the firm is
healthy.
This information is read from the decision tree by following its branches. A healthy firm
should maintain an optimal asset ratio to sustain its day-to-day operations.
Tree
R21 > 1.750
| R11 > 0.500: Healthy {Fail=0, Healthy=29}
| R11 ≤ 0.500
| | R17 > 0.105: Healthy {Fail=0, Healthy=6}
| | R17 ≤ 0.105: Fail {Fail=2, Healthy=0}
R21 ≤ 1.750
| R9 > 1.945: Healthy {Fail=0, Healthy=4}
| R9 ≤ 1.945
| | R8 > 11.070
| | | R8 > 87.760: Fail {Fail=2, Healthy=0}
| | | R8 ≤ 87.760: Healthy {Fail=0, Healthy=3}
| | R8 ≤ 11.070
| | | R23 > 0.075
| | | | R17 > 0.135: Fail {Fail=5, Healthy=0}
| | | | R17 ≤ 0.135: Healthy {Fail=1, Healthy=3}
| | | R23 ≤ 0.075: Fail {Fail=36, Healthy=1}
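The tree above translates directly into nested conditions. The sketch below mirrors its thresholds and leaf labels; the function name `classify` and the example firms are invented for illustration.

```python
def classify(firm):
    """Follow the branches of the decision tree above and return the leaf label."""
    if firm["R21"] > 1.750:
        if firm["R11"] > 0.500:
            return "Healthy"
        return "Healthy" if firm["R17"] > 0.105 else "Fail"
    if firm["R9"] > 1.945:
        return "Healthy"
    if firm["R8"] > 11.070:
        return "Fail" if firm["R8"] > 87.760 else "Healthy"
    if firm["R23"] > 0.075:
        return "Fail" if firm["R17"] > 0.135 else "Healthy"
    return "Fail"

# Invented firms, one per branch of interest (values are placeholders).
base = {"R21": 1.0, "R11": 0.3, "R17": 0.1, "R9": 1.0, "R8": 5.0, "R23": 0.05}
print(classify({**base, "R21": 2.0, "R11": 0.6}))  # Healthy
print(classify({**base, "R8": 100.0}))             # Fail
print(classify({**base, "R8": 50.0}))              # Healthy
print(classify(base))                              # Fail
```

Each leaf returns the majority class shown in the tree, so the two mixed leaves (R17 ≤ 0.135 and R23 ≤ 0.075) still misclassify the minority cases listed in their counts.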

CONCLUSION:

Model                  Accuracy   Classification error
Logistic Regression    67.31%     32.69%
Decision Tree          67.50%     32.50%

The two models gave accuracy rates of 67.31% and 67.50%. Since there is not much
difference between them, either can be used, but the Decision Tree gives a more detailed
interpretation of the data.
