Step 1: We should find the principal components of the dataset, because when all the attributes are used the model gives a squared correlation (R²) of 100%, which is not a realistic result for any model.
PRINCIPAL COMPONENTS
The figure above shows the cumulative variance explained by the principal components.
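The cumulative variance curve can be reproduced in outline by taking the eigenvalues of the attributes' correlation matrix, sorting them in descending order and accumulating them. The sketch below uses NumPy with random stand-in data. Standardizing the attributes first and the 95% cut-off for choosing how many components to keep are assumptions for illustration, not settings taken from this report.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Cumulative share of total variance explained by each principal component.

    X is an (n_samples, n_features) array. Columns are standardized first
    (an assumption: the report does not say whether the data was scaled).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each attribute to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigenvalues of the covariance of Z (i.e. the correlation matrix of X)
    # are the variances along the principal components
    cov = np.cov(Z, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh is ascending; reverse it
    return np.cumsum(eigvals) / eigvals.sum()

# Hypothetical data: 52 rows x 6 attributes of random values (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(52, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=52)  # make two attributes strongly correlated

cum = cumulative_explained_variance(X)
print(np.round(cum, 3))

# Smallest number of components whose cumulative variance reaches 95% (assumed threshold)
k = int(np.searchsorted(cum, 0.95) + 1)
print(k)
```

Because two of the six stand-in attributes are nearly duplicates, fewer than six components are enough to pass the threshold, which is the kind of redundancy the principal components are meant to remove.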
We have to build two models for this data: Logistic Regression and Decision Tree.
Before moving ahead, we recode the values of D into DYN in the Excel sheet: Failure (when D = 0) and Healthy (when D = 1). This is needed because the logistic regression requires a non-numeric (nominal) label rather than a numeric one; the input attributes themselves stay numeric. Also, before importing the new file, we exclude all the attributes except the principal component attributes and the label (D, now recoded as DYN).
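The recoding and column filtering can be sketched in plain Python, treating each spreadsheet row as a dictionary. The retained attribute list is an assumption: it is taken from the attributes that appear in the logistic regression equation later in this section. The function name `prepare_rows` and the sample values are hypothetical.

```python
# Assumed set of retained "principal component" attributes
# (taken from the logistic regression equation in this section).
PC_ATTRIBUTES = ["R2", "R5", "R7", "R8", "R9", "R11",
                 "R17", "R19", "R20", "R21", "R23"]

# D = 0 -> Failure, D = 1 -> Healthy, as described in the text
LABEL_MAP = {0: "Failure", 1: "Healthy"}

def prepare_rows(rows, keep=PC_ATTRIBUTES):
    """Return new rows holding only the kept attributes plus the nominal label DYN."""
    prepared = []
    for row in rows:
        new_row = {name: row[name] for name in keep}  # drop every other attribute
        new_row["DYN"] = LABEL_MAP[int(row["D"])]      # numeric label -> nominal label
        prepared.append(new_row)
    return prepared

# Hypothetical sheet: two firms with made-up ratios R1..R24 and the 0/1 label D
rows = [
    {**{f"R{i}": i / 10 for i in range(1, 25)}, "D": 0},
    {**{f"R{i}": i / 5 for i in range(1, 25)}, "D": 1},
]
prepared = prepare_rows(rows)
print(prepared[0]["DYN"], prepared[1]["DYN"])
```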
LOGISTIC REGRESSION
Logistic Regression is used when the outcome variable has only two values. In this example the outcome variable was “Firm Category”, which had only two values: “Healthy” or “Failed”. The input variables had continuous values.
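As a sketch of what fitting such a model involves, the code below estimates a binary logistic regression by batch gradient descent on the mean log-loss, with continuous inputs and a 0/1 outcome. It is not the solver used by the tool in this report; the learning rate, iteration count and synthetic data are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps a logit (log-odds) to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit binary logistic regression by gradient descent on the mean log-loss.

    X: (n, p) continuous inputs; y: 0/1 labels. Returns (intercept, coefficients).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = sigmoid(X @ w + b)   # predicted P(y = 1) for every row
        err = prob - y              # gradient of the log-loss w.r.t. the logit
        w -= lr * (X.T @ err) / n
        b -= lr * err.mean()
    return b, w

# Synthetic data: two continuous inputs, one pushing the outcome up, one down
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
true_logit = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5
y = (rng.random(200) < sigmoid(true_logit)).astype(int)

b, w = fit_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(b, w, accuracy)
```

The fitted coefficients come out with the same signs as the ones used to generate the data, which is the sense in which a “+” or “-” coefficient is read later in this section.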
After running the process, we get the performance vector, the example set from Apply Model, and the Logistic Regression tab.
The example set (Apply Model) shows the actual value and the predicted value of DYN for each firm.
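The fitted equation is:
Logit = 5.036*R2 + 0.226*R5 + 6.398*R7 + 1.142*R8 + 4.937*R9 + 7.0669*R11 - 24.782*R17 - 0.065*R19 + 0.757*R20 + 2.390*R21 + 24.043*R23 - 23.212
where Logit = ln(p / (1 - p)) is the log-odds of one of the two DYN classes. Unlike linear regression, the logistic model has no additive error term.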
These are the variables with the greatest effect on the model. A variable with a “+” coefficient increases the logit (the log-odds), and a variable with a “-” coefficient decreases it.
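To score a firm with the fitted equation, the attribute values are plugged into the logit and the result is passed through the logistic function p = 1 / (1 + e^(-Logit)). The sketch below does this with the coefficients copied from the equation. It assumes the logit is the log-odds of one DYN class; the text does not say whether that class is Failure or Healthy, so a probability above 0.5 means “more likely the modelled class” rather than a specific label. The firm values are made up.

```python
import math

# Coefficients and intercept copied from the fitted equation in the text
COEFFICIENTS = {
    "R2": 5.036, "R5": 0.226, "R7": 6.398, "R8": 1.142, "R9": 4.937,
    "R11": 7.0669, "R17": -24.782, "R19": -0.065, "R20": 0.757,
    "R21": 2.390, "R23": 24.043,
}
INTERCEPT = -23.212

def logit(firm):
    """Linear predictor (log-odds) for one firm given a dict of attribute values."""
    return INTERCEPT + sum(coef * firm[name] for name, coef in COEFFICIENTS.items())

def probability(firm):
    """Probability of the modelled class, via a numerically stable sigmoid."""
    z = logit(firm)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical firm: all ratios 0 except R23 = 1.0
firm = {name: 0.0 for name in COEFFICIENTS}
firm["R23"] = 1.0
print(logit(firm), probability(firm))

# Raising R17 by 0.1 lowers the logit by 0.1 * 24.782, because its coefficient is negative
firm_higher_r17 = dict(firm, R17=0.1)
print(logit(firm_higher_r17) - logit(firm))
```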
The performance vector shows that the accuracy of the model is 67.31% (35 of 52 predictions correct), and the confusion matrix shows that:
➢ Predicted failure was actually a failure 16 times out of 23 (69.57% precision).
➢ Predicted healthy was actually healthy 19 times out of 29 (65.52% precision).
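The accuracy and precision figures follow directly from the confusion-matrix counts. In the sketch below the off-diagonal counts (7 and 10) are not stated in the text; they are derived as 23 - 16 and 29 - 19. The function names are illustrative.

```python
# Confusion-matrix counts keyed by (predicted, actual)
CONFUSION = {
    ("Failure", "Failure"): 16,
    ("Failure", "Healthy"): 7,    # 23 - 16
    ("Healthy", "Healthy"): 19,
    ("Healthy", "Failure"): 10,   # 29 - 19
}

def accuracy(cm):
    """Share of all predictions where the predicted class equals the actual class."""
    correct = sum(n for (pred, actual), n in cm.items() if pred == actual)
    return correct / sum(cm.values())

def precision(cm, label):
    """Share of predictions of `label` that were actually `label`."""
    predicted_as_label = sum(n for (pred, _), n in cm.items() if pred == label)
    return cm[(label, label)] / predicted_as_label

print(f"accuracy:          {accuracy(CONFUSION):.2%}")               # 67.31%
print(f"precision Failure: {precision(CONFUSION, 'Failure'):.2%}")   # 69.57%
print(f"precision Healthy: {precision(CONFUSION, 'Healthy'):.2%}")   # 65.52%
```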
DECISION TREE MODEL
The decision tree gives the following rules:
If R21 < 1.750, R9 < 1.945, R8 > 11.070 and R8 > 87.760, then the firm category is “Fail”.
If R21 < 1.750, R9 < 1.945, R8 > 11.070 and R8 < 87.760, then the firm category is “Healthy”.
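The two rules can be written as nested conditions, which mirrors how a single path through the tree is followed. This is a sketch: the function name `classify` is hypothetical, it covers only the two rules listed (any other branch returns None), and it also returns None when R8 is exactly 87.760, because the text does not say which side of the split that value falls on.

```python
def classify(r21, r9, r8):
    """Apply the two decision-tree rules from the text; None for paths not listed."""
    if r21 < 1.750 and r9 < 1.945 and r8 > 11.070:
        # Second split on R8 within the R8 > 11.070 branch
        if r8 > 87.760:
            return "Fail"
        if r8 < 87.760:
            return "Healthy"
    return None

print(classify(1.0, 1.0, 100.0))  # Fail
print(classify(1.0, 1.0, 50.0))   # Healthy
print(classify(2.0, 1.0, 50.0))   # None (R21 branch not described in the text)
```

Because R8 > 87.760 already implies R8 > 11.070, the second test on R8 only decides between the two leaves beneath the R8 > 11.070 split.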
CONCLUSION:
The logistic regression and decision tree models gave accuracy rates of 67.31% and 67.50% respectively. Since there is not much difference between the two models, either can be used, but the decision tree gives a more detailed interpretation of the data.