Complete each section. When you are ready, save your file as a PDF document and submit it
here: https://classroom.udacity.com/nanodegrees/nd008/parts/11a7bf4c-2b69-47f3-9aec-108ce847f855/project
Key Decisions:
Answer these questions
● In your cleanup process, which fields did you remove or impute? Please justify why you
removed or imputed these fields. Visualizations are encouraged.
Removed: purpose, installment, duration, current type of apartment, occupation
These fields have no significant impact on the results, contain missing values, or do not
differentiate between credit decisions.
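The removal criteria above (mostly missing, or too uniform to differentiate between decisions) can be sketched in a few lines of Python. This is only an illustration, not the workflow used for the project: the function name `fields_to_drop`, the thresholds `max_missing` and `max_dominant`, and the sample rows are all assumptions made for the example.

```python
from collections import Counter


def fields_to_drop(rows, max_missing=0.5, max_dominant=0.8):
    """Flag fields that are mostly missing or nearly constant.

    rows is a list of dicts; None marks a missing value.
    Both thresholds are illustrative assumptions, not project values.
    """
    flagged = {}
    for field in rows[0].keys():
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        missing_share = 1 - len(present) / len(values)
        if not present or missing_share > max_missing:
            flagged[field] = "mostly missing"
            continue
        # Share of the single most common value among non-missing entries.
        dominant_share = Counter(present).most_common(1)[0][1] / len(present)
        if dominant_share > max_dominant:
            flagged[field] = "low variability"
    return flagged


# Hypothetical sample data, for illustration only.
sample_rows = [
    {"Occupation": 1, "Age Years": 30, "Mostly_Empty_Field": None},
    {"Occupation": 1, "Age Years": 45, "Mostly_Empty_Field": None},
    {"Occupation": 1, "Age Years": None, "Mostly_Empty_Field": 1},
    {"Occupation": 1, "Age Years": 52, "Mostly_Empty_Field": None},
    {"Occupation": 1, "Age Years": 28, "Mostly_Empty_Field": None},
]
drop_report = fields_to_drop(sample_rows)
print(drop_report)
```

A field with only a few missing values (such as the hypothetical Age Years column here) is not flagged, so it remains a candidate for imputation rather than removal.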
Logistic Regression
Account Balance, Purpose and Credit Amount are the top 3 most significant variables, each with
a p-value below 0.05.
Overall accuracy is around 76.0%. Accuracy for creditworthy is higher than for non-creditworthy,
at 80.0% and 62.9% respectively. The model is biased towards predicting customers as
non-creditworthy.
Decision Tree
Account Balance, Value Savings Stocks and Duration of Credit Month are the top 3 most
important variables. The overall accuracy is 74.7%.
Accuracy for creditworthy is 79.1% while accuracy for non-creditworthy is 60.0%. The
model seems to be biased towards predicting customers as non-creditworthy.
Forest Model
Credit Amount, Age Years and Duration of Credit Month are the 3 most important variables.
Overall accuracy is 80.0%. The model isn’t biased as the accuracies for creditworthy and non-
creditworthy are 79.1% and 85.7% respectively, which are comparable.
Boosted Model
Account Balance and Credit Amount are the most significant variables according to figure 10.
Overall accuracy is 76.7%. Accuracies for creditworthy and non-creditworthy are 76.7% and 78.3%
respectively, which indicates a lack of bias in predicting the creditworthiness of customers.
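The overall and per-segment accuracies quoted for each model can be computed from validation-set predictions roughly as follows. This is a minimal sketch under one assumption: segment accuracy is taken to be the share of records in each actual segment that were predicted correctly, which may differ from how the modelling tool reports it. The function name and the sample labels are hypothetical.

```python
def segment_accuracies(actual, predicted,
                       labels=("Creditworthy", "Non-Creditworthy")):
    """Return (overall accuracy, {label: accuracy within that actual segment})."""
    pairs = list(zip(actual, predicted))
    overall = sum(a == p for a, p in pairs) / len(pairs)
    per_segment = {}
    for label in labels:
        # Keep only the records whose actual class is this label.
        segment = [(a, p) for a, p in pairs if a == label]
        per_segment[label] = sum(a == p for a, p in segment) / len(segment)
    return overall, per_segment


# Hypothetical validation labels, for illustration only.
actual = ["Creditworthy"] * 6 + ["Non-Creditworthy"] * 4
predicted = (["Creditworthy"] * 5 + ["Non-Creditworthy"]
             + ["Non-Creditworthy"] * 2 + ["Creditworthy"] * 2)

overall, per_segment = segment_accuracies(actual, predicted)
print(overall, per_segment)
```

A large gap between the two segment accuracies is the signal used above to call a model biased; comparable values suggest it treats both segments similarly.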
You should have four sets of questions answered. (500 word limit)
Step 4: Writeup
Decide on the best model and score your new customers. For reviewing consistency, if
Score_Creditworthy is greater than Score_NonCreditworthy, the person should be labeled as
“Creditworthy”
Write a brief report on how you came up with your classification model and write down how
many of the new customers would qualify for a loan. (250 word limit)
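The labelling rule stated above can be written as a short Python sketch. Score_Creditworthy and Score_NonCreditworthy come from the project brief; the function names, the handling of ties (labelled Non-Creditworthy, since the rule only covers "greater than") and the sample scores are assumptions for illustration.

```python
def label_customer(score_creditworthy, score_non_creditworthy):
    """Apply the review rule: Creditworthy only if its score is strictly greater."""
    if score_creditworthy > score_non_creditworthy:
        return "Creditworthy"
    # Assumption: ties fall through to Non-Creditworthy.
    return "Non-Creditworthy"


def count_qualified(scores):
    """Count customers labelled Creditworthy from (creditworthy, non-creditworthy) score pairs."""
    return sum(label_customer(c, n) == "Creditworthy" for c, n in scores)


# Hypothetical scores, for illustration only.
sample_scores = [(0.9, 0.1), (0.4, 0.6), (0.5, 0.5), (0.7, 0.3)]
qualified = count_qualified(sample_scores)
print(qualified)
```

Applied to the scored output of the chosen model, the count returned by `count_qualified` is the number of new customers who would qualify for a loan.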
Answer these questions:
● Which model did you choose to use? Please justify your decision using all of the
following techniques. Please only use these techniques to justify your decision:
○ Overall Accuracy against your Validation set
The forest model offers the highest accuracy = 80% against the validation set.
○ Accuracies within “Creditworthy” and “Non-Creditworthy” segments
The forest model's accuracies for creditworthy and non-creditworthy (79.1% and 85.7%) are
among the highest of all the models.
○ ROC graph
Note: Remember that your boss only cares about prediction accuracy for Creditworthy and
Non-Creditworthy segments.
408
Please check your answers against the requirements of the project dictated by the rubric here.
Reviewers will use this rubric to grade your project.