
Installment 1 SMMD DHM-2, 2020

Indian School of Business


Credit Risk Project, Installment 1
Your answers to the following questions are to be submitted in a single report document.
The report file must have a separate cover page that identifies the team (e.g., J-1) and lists
the members of the team who are participating in the project. Number the subsequent
pages and format them to have 1-inch margins all around. Include only plots that are
discussed in your report. Reports are to be submitted via the LMS by 23:55hrs on
Monday, 1nd Aug.

These questions use your team project dataset. For the sake of checking conditions,
assume that the cases represent a simple random sample from the population of loans
described in the project description. You do not have all of the loans, just a small sample
out of the many thousands handled by this lender. For these questions, use only the 628
loans with complete descriptions in your data table.¹

A key variable in the remainder of the analysis is defined as follows. The lender uses a
performance metric known as PRSM, performance ratio at six months. To construct
PRSM, divide two times the amount repaid at six months by the total amount to be
repaid:
$$\mathrm{PRSM} = 2 \times \frac{\text{Amount repaid at six months}}{\text{Total amount to be repaid}}$$
PRSM should be approximately equal to 1 if the payments at 6 months are on track to
fulfill the debt at the end of the year. Values of PRSM < 1 indicate a loan for which the
payments are currently coming in slower than expected; PRSM > 1 indicates a loan that is
being paid off faster than expected. You will need to create this column in your JMP
dataset using the formula calculator. The formula calculator manipulates columns in the
data table.²
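
Outside JMP, the same column can be sketched in a few lines of Python. This is only an illustration of the definition above: the function name `prsm` and the loan amounts are hypothetical and are not taken from the project dataset.

```python
def prsm(amount_repaid_at_six_months: float, total_to_be_repaid: float) -> float:
    """Performance ratio at six months: 2 * (amount repaid) / (total to be repaid)."""
    if total_to_be_repaid <= 0:
        raise ValueError("total amount to be repaid must be positive")
    return 2 * amount_repaid_at_six_months / total_to_be_repaid


# Hypothetical loans: (amount repaid at six months, total amount to be repaid)
loans = [(6000, 12000), (4500, 12000), (7200, 12000)]
scores = [prsm(repaid, total) for repaid, total in loans]
print(scores)  # on track (1.0), behind schedule (0.75), ahead of schedule (1.2)
```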

1. (a) Using both graphics and descriptive statistics, describe the shape and form of the
distribution of the PRSM score. Does it appear reasonable to use the Empirical Rule with
this variable?
(b) If you were to remedy any anomalies in this variable, how well would the Empirical
Rule work then?
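
One way to check the Empirical Rule numerically is to compare the observed share of values within 1, 2, and 3 standard deviations of the mean against the benchmarks of roughly 68%, 95%, and 99.7%. The sketch below assumes the PRSM scores are available as a plain list; the function name `empirical_rule_coverage` is hypothetical.

```python
from statistics import mean, stdev


def empirical_rule_coverage(values):
    """Return the fraction of values within 1, 2 and 3 sample SDs of the mean."""
    m, s = mean(values), stdev(values)
    return {
        k: sum(abs(v - m) <= k * s for v in values) / len(values)
        for k in (1, 2, 3)
    }


# Hypothetical scores, for illustration only
scores = [0.4, 0.7, 0.8, 0.9, 0.95, 1.0, 1.0, 1.05, 1.1, 1.9]
for k, share in empirical_rule_coverage(scores).items():
    print(f"within {k} SD: {share:.0%}")
```

Large gaps between these shares and the benchmarks suggest that the bell-shaped assumption behind the rule is doubtful for the variable.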

2. (a) The variable Years in Business may ultimately be useful in forecasting the PRSM
score. Comment on the shape of the distribution of this predictor.
(b) It is common (but not a requirement) to take the log of a variable that displays a shape
such as this variable does. Further, the log transform is not defined for the value of zero,

1 Select the incomplete rows, which are the last 1,038 in your data table, and use the commands Rows >
Exclude and Rows > Hide to remove them from these analyses.
2 Use Cols > New column… to add a column in the data table. Then right-click on the top line (with the name of the
new column) and define the PRSM formula. Define the formula by picking variables from the list at the top left inside
the formula editor.


so when a variable contains zero values, we modify the log transform to include an offset
term. The most common offset is 1. Comment on the distribution of log(1 + Years in
Business).
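
The offset transform can be sketched as follows. The natural logarithm is assumed here, since the text does not state a base, and the function name `log_offset` and the sample values are hypothetical.

```python
import math


def log_offset(x: float, offset: float = 1.0) -> float:
    """Log transform with an offset so that x = 0 is defined (natural log assumed)."""
    if x + offset <= 0:
        raise ValueError("x + offset must be positive")
    return math.log(x + offset)


# Hypothetical values of Years in Business, including a zero
years = [0, 1, 3, 10, 25]
print([round(log_offset(y), 3) for y in years])
```

A zero maps to log(1) = 0, so no rows are lost, and large values are pulled in much more strongly than small ones.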

3. (a) If 9 loans are given, each with a Total Amount to be Repaid of $12,000, then what is
the approximate probability that the average PRSM score of these 9 loans is less than
0.76? State any assumptions you have used for your probability calculation.
(b) What is the probability that less than $41,040 in total across the 9 loans has been paid
back after 6 months?
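
A minimal sketch of the calculation, assuming the average of 9 PRSM scores is approximately normal with standard error sigma / sqrt(n). The values of `mu` and `sigma` below are hypothetical placeholders for the sample mean and standard deviation of PRSM in your own data, and both function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_below(threshold: float, mu: float, sigma: float, n: int) -> float:
    """P(sample mean < threshold) under a normal approximation for the mean."""
    standard_error = sigma / sqrt(n)
    return NormalDist(mu, standard_error).cdf(threshold)


def total_to_mean_prsm(total_repaid: float, n_loans: int, total_per_loan: float) -> float:
    """Convert a total repaid across equal-sized loans into the average PRSM."""
    return 2 * total_repaid / (n_loans * total_per_loan)


# Part (b) restated as an average PRSM: 2 * 41,040 / (9 * 12,000)
print(total_to_mean_prsm(41040, 9, 12000))  # 0.76

# Hypothetical mu and sigma; replace with the values from your dataset
print(prob_mean_below(0.76, mu=0.9, sigma=0.3, n=9))
```

Because every loan has the same total to be repaid, a bound on the total repaid translates directly into a bound on the average PRSM.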

4. For this question, recode the PRSM score into a two-level categorical variable, labeled as
“Above” or “Below” depending on whether the PRSM score is above or below the mean
PRSM score in your dataset. (You will find the “If” function under the Conditional menu
in the formula dialog and the “Greater than” function in the Comparison menu.) It has
been suggested that loans given to Repeat customers (Loan Type identifies this variable)
perform better than those given to first-time customers. To what extent is this assertion
supported by the data? Give a brief discussion to support your answer.
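
The recoding and the group comparison can be sketched as below. The data are hypothetical, the function names are illustrative, and labeling a score exactly equal to the mean as “Below” is an assumption, since the question does not say how to treat ties.

```python
from collections import Counter
from statistics import mean


def recode_above_below(scores):
    """Label each score 'Above' if it exceeds the mean of all scores, else 'Below'."""
    m = mean(scores)
    return ["Above" if s > m else "Below" for s in scores]


def proportion_above_by_group(groups, labels):
    """Share of 'Above' labels within each group."""
    totals = Counter(groups)
    above = Counter(g for g, lab in zip(groups, labels) if lab == "Above")
    return {g: above[g] / totals[g] for g in totals}


# Hypothetical PRSM scores and loan types
scores = [0.6, 0.9, 1.1, 1.2, 0.7, 1.0]
loan_type = ["Original", "Repeat", "Repeat", "Repeat", "Original", "Original"]

labels = recode_above_below(scores)
print(proportion_above_by_group(loan_type, labels))
```

The same function applies to any grouping column, for example a list of ISO names in place of `loan_type`.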

5. Does your answer to the previous question mean that Original/Repeat loan status is the
root cause of an expected improved PRSM score or might there be another explanation?
If so, suggest a possible “lurking” variable that could influence the comparison (the
lurking variable can be hypothetical; it doesn’t have to exist in the data set).
Otherwise, explain briefly why it is not possible. (Reading Section 5.2 of Stine and Foster
could be helpful here.)

6. Compare the estimated probabilities that a PRSM score falls into the “Above” category
across the different Independent Sales Organizations (ISO Name). Summarize your
conclusions briefly and identify which ISO appears to be the best performer and which
appears to be the worst.

7. Management is concerned that the population mean PRSM score from which this sample
has been drawn is less than 0.85. Identify two conditions that need to be met for a
confidence interval derived from the sample data to be valid.

8. Given that the conditions from Question 7 hold true, find a 95% CI for the mean PRSM score
based on your dataset and use it to make a statement addressing management’s concerns.
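
A minimal sketch of the interval, assuming the PRSM scores are available as a list. It uses a normal quantile in place of the t quantile, which is a simplification that matters little for a sample of several hundred loans; JMP will report the t-based interval. The function name `mean_ci` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, level: float = 0.95):
    """Approximate confidence interval for the mean (normal quantile, not t)."""
    n = len(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / sqrt(n)
    m = mean(values)
    return m - half_width, m + half_width


# Hypothetical scores, for illustration only
scores = [0.6, 0.9, 1.1, 1.2, 0.7, 1.0, 0.8, 0.95]
low, high = mean_ci(scores)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

If the whole interval lies above 0.85, the data do not support management's concern; if it lies entirely below 0.85, they do; if it contains 0.85, the sample is inconclusive.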

9. If you wanted to construct an approximate 95% CI for the proportion of Corporations in
the population of loans, and this CI were to have a margin of error of ±5%, then what
sample size would you recommend?
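
The usual planning formula is n = z² p(1 − p) / E², where E is the margin of error. The sketch below assumes the conservative guess p = 0.5 when nothing is known about the proportion; the function name `sample_size_for_proportion` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin: float, level: float = 0.95,
                               p_guess: float = 0.5) -> int:
    """Smallest n whose approximate CI half-width is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


print(sample_size_for_proportion(0.05))  # 385
```

Rounding z up to 2, a common shortcut, gives n = 1 / E² = 400 for the same margin.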

10. Based on your dataset, create a 95% CI for the proportion of Corporations in the loan
population and provide an interpretation of the interval for management.
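
An approximate interval for a proportion can be sketched with the standard formula p̂ ± z·sqrt(p̂(1 − p̂)/n). The counts below are hypothetical and the function name `proportion_ci` is illustrative; substitute the number of Corporations and the number of loans from your own data table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, level: float = 0.95):
    """Approximate (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical count: 50 Corporations among 100 loans
low, high = proportion_ci(50, 100)
print(f"95% CI for the proportion: ({low:.3f}, {high:.3f})")
```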
