3. A major advantage is that both measures of attribute significance are included in the statistic. A major
disadvantage is that a highly predictive attribute value will appear insignificant if its corresponding
predictability score is low.
4. 99.62%
b. Because we are working with a sample of all possible data, we can never be 100% certain that our
results apply to the entire instance population. Therefore, statistical testing never proves
anything; it can only provide levels of confidence in the conclusions of our experiments.
Bootstrap
Part a: None of the three t-test scores is significant. The screenshot of the t-test results follows:
Part b: The ANOVA confirms the results in Part a (Prob = 0.120). The screenshot of the ANOVA follows:
Part c:
The mikro scores are to be used as the values for model accuracy. Recall that the mikro score is the
model accuracy when the model is applied to all instances.
4. Implement the process model given in Section 7.9 illustrating the Pareto lift chart but change the target
class parameter to loyal. Take a screen shot of the chart and summarize what the chart is telling you.
The Pareto chart tells us that if we use 0.81 as the cutoff for classifying customers as loyal, we will be
correct 136 out of 156 times. If we use 0.72 as the cutoff value, we will be correct 164 out of 197
times. With 0.18 as the cutoff value, our correctness in identifying loyal customers drops
significantly, to 183 out of 456.
The right vertical axis is used to interpret the four connected points. Consider the point that lies above,
and midway along, the 0.72 to 0.81 confidence interval. The position of this point tells us that if
we include the instances within this confidence interval, we select approximately 90% of all loyal
customers.
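The cutoff figures quoted above can be turned into proportions to make the drop-off easier to see. The following is a minimal sketch in Python, not part of the RapidMiner process model; the names `cutoff_counts` and `precision` are illustrative assumptions, and the counts are the ones stated in the answer.

```python
# Counts quoted in the answer: (confidence cutoff, correct, selected).
cutoff_counts = [
    (0.81, 136, 156),
    (0.72, 164, 197),
    (0.18, 183, 456),
]


def precision(correct: int, selected: int) -> float:
    """Fraction of selected customers that are actually loyal."""
    if selected <= 0:
        raise ValueError("selected must be positive")
    return correct / selected


# Map each cutoff to the proportion of correct 'loyal' classifications.
precisions = {cutoff: precision(c, s) for cutoff, c, s in cutoff_counts}

for cutoff, p in precisions.items():
    print(f"cutoff {cutoff:.2f}: {p:.1%} of selected customers are loyal")
```

Under these counts the proportion of correct classifications is roughly 87% at the 0.81 cutoff, 83% at 0.72, and 40% at 0.18.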
5. Implement the process model in Section 7.7 using the credit screening dataset.
a. Does the t-test show any significant differences in model performance?
b. Does the ANOVA help confirm the results of the t-tests?
c. Use the mikro value within each performance vector to compute the 95% confidence error (or
accuracy) interval for each model.
Part a: The accuracy of the decision tree of maximum depth 1 is significantly less than the accuracy
of the other models. Here is a screenshot of the t-test computations.
Part b: The ANOVA confirms the results in Part a (Prob = 0.000). The screenshot of the ANOVA follows:
Part c:
The mikro scores are to be used as the values for model accuracy. The mikro score is the model accuracy
when the model is applied to all instances.
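One common way to compute such an interval is the normal approximation to the binomial, sketched below in Python. This is an assumption about the method, not a reproduction of the text's own computation: the function name `accuracy_confidence_interval`, the multiplier z = 1.96, and the example values (accuracy 0.80 on 100 instances) are all hypothetical and should be replaced with the mikro score and instance count from each performance vector.

```python
import math


def accuracy_confidence_interval(accuracy: float, n: int, z: float = 1.96):
    """Normal-approximation confidence interval for a model accuracy.

    accuracy: observed accuracy as a fraction in [0, 1] (e.g. the mikro score)
    n:        number of instances the accuracy was measured on
    z:        standard-normal multiplier; 1.96 corresponds to roughly 95%
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    # Standard error of a proportion.
    standard_error = math.sqrt(accuracy * (1.0 - accuracy) / n)
    margin = z * standard_error
    # Clip to [0, 1] because an accuracy cannot fall outside that range.
    return max(0.0, accuracy - margin), min(1.0, accuracy + margin)


# Hypothetical values for illustration only, not results from the exercise.
low, high = accuracy_confidence_interval(0.80, 100)
print(f"95% confidence interval: {low:.4f} to {high:.4f}")
```

The same function gives the error interval if the error rate (1 minus accuracy) is passed in place of the accuracy, since the standard error is identical for both.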