
Uploaded by Abhee Raj

CREDIT RATING REPORT INSTALLMENT 1



Project

Installment – I

Group – 8

Abhishek Singh (173)

Girish Matada (194)

Jai Kishore Jangir (198)

Paras Kohli (210)

Tushar Rane (233)

Unnati Kandelwal (234)

Installment 1

1. (a) Describe the distribution of the PRSM scores using both graphics and

descriptive statistics.

(b) Does it appear reasonable to use the Empirical Rule to describe the variation

of PRSM? Briefly explain your reasoning.

(c) If you were to remedy any evident anomalies with this variable, how well

would the Empirical Rule work now?

Ans

(a) The PRSM score is defined as PRSM = (2 × amount repaid in 6 months) / total amount to be repaid.

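CODE: Descriptive statistics

getmode <- function(v) {  # assumed definition: base R has no statistical-mode function
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
mean(initial_PRSM$Amt.Repaid.at.6.Months)    # with outliers
median(initial_PRSM$Amt.Repaid.at.6.Months)
getmode(initial_PRSM$Amt.Repaid.at.6.Months)
mean(PRSM$Amt.Repaid.at.6.Months)            # outliers removed
median(PRSM$Amt.Repaid.at.6.Months)
getmode(PRSM$Amt.Repaid.at.6.Months)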

OUTPUT:

GRAPHICAL Representation

With Outliers:

Without Outliers:
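The report does not show how the outliers were identified. A minimal sketch, assuming the conventional 1.5 × IQR rule (values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are treated as outliers); `remove_outliers_iqr()` is a hypothetical helper and the toy vector is illustrative, not project data:

```r
# Hypothetical helper: keep only values inside the 1.5 * IQR fences
remove_outliers_iqr <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))  # Q1 and Q3
  fence <- 1.5 * (q[2] - q[1])                            # 1.5 * IQR
  keep <- !is.na(x) & x >= q[1] - fence & x <= q[2] + fence
  x[keep]
}

# Toy data with one extreme value (illustrative only)
x <- c(0.5, 0.7, 0.8, 0.9, 1.0, 1.1, 25)
clean <- remove_outliers_iqr(x)

# The mean is pulled far upward by the extreme value; the median barely moves
rbind(raw = c(mean = mean(x), median = median(x)),
      clean = c(mean = mean(clean), median = median(clean)))

# On the project data this would be, e.g.:
# PRSM_clean <- remove_outliers_iqr(initial_PRSM$Amt.Repaid.at.6.Months)
```

Calling `hist()` and `boxplot()` on the raw and cleaned vectors gives the with/without-outlier graphics.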

(b) Does it appear reasonable to use the Empirical Rule to describe the variation of

PRSM? Briefly explain your reasoning.

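Ans) No. The Empirical Rule (about 68%, 95% and 99.7% of values within one, two and three standard deviations of the mean) applies only to roughly bell-shaped data. As indicated in part (a), 99.84% of the values fall within … standard deviation of the mean, which does not fit that pattern, so it is not advisable to use the Empirical Rule.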

CODE:

mean(initial_PRSM$Amt.Repaid.at.6.Months)   # with outliers
sd(initial_PRSM$Amt.Repaid.at.6.Months)
mean(PRSM$Amt.Repaid.at.6.Months)           # outliers removed
sd(PRSM$Amt.Repaid.at.6.Months)

OUTPUT:

(c) If you were to remedy any evident anomalies with this variable, how well would

the Empirical Rule work now?

Ans) After removing the outliers, the Empirical Rule describes the variation of PRSM reasonably well.
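The Empirical Rule can be checked directly by counting how many values lie within one, two and three standard deviations of the mean. A minimal sketch; `empirical_rule_check()` is a hypothetical helper, and the toy data only illustrate how a single extreme value inflates the standard deviation so that almost every other observation falls within one SD:

```r
# Share of observations within k = 1, 2, 3 standard deviations of the mean
empirical_rule_check <- function(x) {
  m <- mean(x)
  s <- sd(x)
  shares <- sapply(1:3, function(k) mean(abs(x - m) <= k * s))
  setNames(shares, c("within_1sd", "within_2sd", "within_3sd"))
}

# Roughly normal data: expect about 0.68, 0.95, 0.997
set.seed(1)
empirical_rule_check(rnorm(10000))

# 99 identical values plus one extreme value: the inflated SD puts
# 99% of the data within one SD, far from the rule's 68%
empirical_rule_check(c(rep(1, 99), 100))

# On the project data, e.g.:
# empirical_rule_check(initial_PRSM$Amt.Repaid.at.6.Months)  # with outliers
# empirical_rule_check(PRSM$Amt.Repaid.at.6.Months)          # outliers removed
```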

2. Management is concerned that current lending procedures produce loans that,

on average, have PRSM scores below the target 1. After correcting the anomaly

noted in Question 1, does a confidence interval indicate that management should

indeed be concerned that average PRSM scores are lower than desired (i.e.,

lower than 1)? Be sure to justify the use of a confidence interval in this context.

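Ans) A confidence interval is appropriate because management's concern is the mean PRSM of all loans, which is unknown and must be estimated from our sample. Assuming the loans are a random sample, the Central Limit Theorem makes the sample mean approximately normal for a sample of this size even if individual PRSM scores are not, so a t-interval applies. If the entire 95% interval lies below 1, management has reason to be concerned.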

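Margin of error at the 95% confidence level:

x <- PRSM$Amt.Repaid.at.6.Months                 # PRSM scores, outliers removed
m <- mean(x)                                     # sample mean
s <- sd(x)                                       # sample standard deviation
n <- length(x)                                   # sample size
error <- qt(0.975, df = n - 1) * s / sqrt(n)     # t critical value times standard error
left <- m - error                                # lower 95% limit
right <- m + error                               # upper 95% limit
left
right
m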

OUTPUT:
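As a cross-check, base R's `t.test()` reports the same 95% t-interval as the manual calculation, and comparing its upper limit with 1 answers management's question directly. A minimal sketch; `ci_check()` is a hypothetical helper and the evenly spaced toy vector stands in for the PRSM scores:

```r
# Compare the manual 95% t-interval with t.test() and check whether the
# whole interval lies below the target PRSM of 1
ci_check <- function(x, target = 1, conf = 0.95) {
  n <- length(x)
  error <- qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)  # margin of error
  tt <- t.test(x, mu = target, conf.level = conf)
  list(manual = mean(x) + c(-1, 1) * error,
       t_test = as.numeric(tt$conf.int),
       entirely_below_target = tt$conf.int[2] < target)
}

x <- seq(0.6, 1.0, length.out = 21)   # toy data with mean 0.8
ci_check(x)

# On the project data, e.g.: ci_check(PRSM$Amt.Repaid.at.6.Months)
```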

3. Control charts can be used to measure the stability of many types of data,

including the performance of loans. Assume that the loans in your data table are

arranged in chronological order, starting from the first row through row 628.

Generate a control chart of the PRSM scores for your sample, completing the

JMP dialog as shown below. These choices set the process mean μ = 0.9, the

standard deviation σ = 0.24, and group the loans into batches of size 40. Be sure

to resolve the anomaly noted in the first question.

(a) Do the resulting x-bar and s-charts indicate that the lending process has been “in

control” over the sampled period? What are the implications, especially with regard

to the confidence interval in Question 2?

(b) Why are the control limits for the mean in the x-bar chart so much wider than

the confidence interval for the mean used in Question 2? Two reasons, please!

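Ans) There are two reasons. First, the control limits are placed at ±3 standard errors around the mean (a total width of 6 standard errors), whereas the 95% confidence interval in Question 2 uses about ±2 standard errors (a total width of about 4). Second, the control limits describe the mean of a single batch of 40 loans, with standard error σ/√40, while the confidence interval describes the mean of the whole sample, whose standard error is much smaller because it is based on all the loans rather than 40.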
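The two reasons can be made concrete by computing the limits. A minimal sketch using the standard textbook (Shewhart) formulas for a specified μ and σ: x-bar limits μ ± 3σ/√n, and s-chart limits c4σ ± 3σ√(1 − c4²) with the lower limit floored at 0, plugging in the values from the JMP dialog. JMP's own implementation details may differ, and `control_limits()` and `batch_stats()` are hypothetical helpers:

```r
# Shewhart control limits when the process mean and SD are specified
control_limits <- function(mu, sigma, n) {
  se <- sigma / sqrt(n)  # standard error of a batch mean
  # c4: expected sample SD divided by sigma, for normal batches of size n
  c4 <- sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))
  spread <- 3 * sigma * sqrt(1 - c4^2)
  list(
    xbar = c(LCL = mu - 3 * se, CL = mu, UCL = mu + 3 * se),
    s = c(LCL = max(0, c4 * sigma - spread), CL = c4 * sigma,
          UCL = c4 * sigma + spread)
  )
}

# Split a chronologically ordered series into consecutive batches and
# return each batch's mean and SD (the points plotted on the two charts)
batch_stats <- function(x, size) {
  batch <- ceiling(seq_along(x) / size)  # 1, ..., 1, 2, ..., 2, ...
  data.frame(batch = unique(batch),
             mean = as.numeric(tapply(x, batch, mean)),
             sd = as.numeric(tapply(x, batch, sd)))
}

limits <- control_limits(mu = 0.9, sigma = 0.24, n = 40)
limits$xbar  # LCL about 0.786, UCL about 1.014
limits$s

# On the project data, e.g.:
# b <- batch_stats(PRSM$Amt.Repaid.at.6.Months, 40)
# b$mean < limits$xbar["LCL"] | b$mean > limits$xbar["UCL"]  # out-of-control batches
```

The x-bar limits span roughly 0.23, whereas if the sample SD were near 0.24, a 95% interval from all 628 loans would have a half-width of only about 1.96 × 0.24/√628 ≈ 0.019.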

4. (a) The variable Years in Business may ultimately be useful in predicting the

PRSM score. Describe the distribution of this variable.

(b) Describe the distribution of the variable defined by log(1 + Years in

Business). In particular, is the variation in the transformed variable “nearly

normal”?

Ans 4. (a)

OUTPUT:

The plot shows that Years in Business is strongly right-skewed, following a (negative) exponential distribution rather than a normal distribution.

(b) Describe the distribution of the variable defined by log(1 + Years in Business). In

particular, is the variation in the transformed variable “nearly normal”?

Ans)

OUTPUT:
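A minimal sketch of the transformation in part (b): `log1p(x)` computes log(1 + x) accurately and maps businesses with 0 years to 0, and a moment-based skewness near 0 indicates a roughly symmetric shape. The `skewness()` helper is hypothetical (base R has none), and the simulated exponential data only mimic the shape described in part (a); they are not the project data:

```r
# Moment-based sample skewness: about 0 for symmetric data,
# positive for a long right tail, negative for a long left tail
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

set.seed(1)
years <- rexp(1000, rate = 1 / 10)  # simulated: exponential with mean 10 years
transformed <- log1p(years)         # log(1 + Years in Business)

c(raw = skewness(years), transformed = skewness(transformed))

# Visual check of "nearly normal" (points close to the line):
# qqnorm(transformed); qqline(transformed)

# On the project data, e.g. (column name assumed):
# skewness(log1p(df$Years.in.Business))
```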

5. The distribution of Average House Value in Zip Code reveals an unusual

feature that appears to be an artificial consequence of the data processing. If

this artificial feature could be corrected so that the data showed the actual

average house values, then:

(a) How would the mean and standard deviation of this variable change?

Would x-bar and s remain unchanged, increase or decrease?

(b) Would the median and IQR of this variable remain unchanged, increase or

decrease?

Ans 5. (a) Summary of Average House Value in Zip Code in the dataset:

OUTPUT:

As the summary shows, the mean is anomalous: Average House Value in Zip Code contains values far from the usual range, that is, outliers.

The value of s appears in the output above.

OUTLIERS:

BOXPLOT for OUTLIERS:

CODE: summaries of the variable before and after removing the outliers:

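summary(df_2$Average.House.Value.in.Zip.Code)                   # with outliers
sd(df_2$Average.House.Value.in.Zip.Code)                        # summary() does not report s
summary(df_2_without_Outliers$Average.House.Value.in.Zip.Code)  # outliers removed
sd(df_2_without_Outliers$Average.House.Value.in.Zip.Code)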

OUTPUT:

(b) Would the median and IQR of this variable remain unchanged, increase or decrease?

CODE:

OUTPUT:
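The general behaviour behind parts (a) and (b) can be illustrated: the mean and standard deviation are sensitive to extreme values, while the median and IQR depend only on the middle of the distribution. A toy sketch with hypothetical house values (not the project data), in which one artificial extreme value is replaced by a plausible one; `describe()` is a hypothetical helper:

```r
# Mean, SD, median and IQR of a vector
describe <- function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x), IQR = IQR(x))
}

# Hypothetical values (in thousands): one artificial extreme entry ...
with_artifact <- c(120, 135, 150, 160, 175, 190, 9999)
# ... and the same data after correcting that entry
corrected <- c(120, 135, 150, 160, 175, 190, 210)

round(rbind(with_artifact = describe(with_artifact),
            corrected = describe(corrected)), 1)
```

In this example the mean and SD drop sharply once the artificial value is corrected, while the median and IQR do not change, because the corrected value stays above the upper quartile. The direction of change in the real data depends on where the artificial values sit relative to the true ones.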

6. An analyst extracted a sample of 25 loans from the same population as your

sample of loans. Estimate the probability that the average PRSM in a sample of

25 from this population is greater than 0.9. Identify any relevant assumptions

and state why you believe them to be plausible.

Mean 0.8094

Std Dev 0.2067

Kurtosis -0.0216

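Ans 6.

Assumptions:

1. The given mean, standard deviation and kurtosis describe the population of loans.

2. The 25 loans are drawn independently, and their average is approximately normally distributed. This is plausible because the kurtosis of −0.0216 is close to the value of 0 expected for normally distributed data (excess kurtosis), and averaging 25 loans brings the sampling distribution even closer to normal (Central Limit Theorem).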

OUTPUT:

The probability that the mean of a sample of 25 loans is greater than 0.9 is 0.01420485, about 1.4%.
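The probability can be reproduced by treating 0.8094 and 0.2067 as the population mean and SD and the sample mean of 25 loans as approximately normal with standard error σ/√25. A minimal sketch; `prob_mean_exceeds()` is a hypothetical helper:

```r
# P(sample mean > threshold) for a sample of n loans, assuming the sample
# mean is approximately normal with standard error sigma / sqrt(n)
prob_mean_exceeds <- function(threshold, mu, sigma, n) {
  se <- sigma / sqrt(n)
  pnorm(threshold, mean = mu, sd = se, lower.tail = FALSE)
}

z <- (0.9 - 0.8094) / (0.2067 / sqrt(25))  # about 2.19 standard errors above the mean
p <- prob_mean_exceeds(0.9, mu = 0.8094, sigma = 0.2067, n = 25)
p  # about 0.0142
```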

7. Would you be surprised if the sample of 25 obtained by the analyst described

in the prior question contains fewer than 5 loans that were originated from

ISO named “Credit Divas”? Explain your answer and justify any assumptions

you have made.

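Ans) Yes, it would be surprising. In our data, 685 of the 1687 loans (about 41%) originated from Credit Divas, so a random sample of 25 loans would be expected to contain about 25 × 0.41 ≈ 10 such loans, whereas fewer than 5 means less than 20% of the sample. Treating the 25 loans as independent draws (reasonable because 25 is a small fraction of the 1687 loans), the count of Credit Divas loans is approximately binomial with n = 25 and p ≈ 0.41, and the probability of 4 or fewer is below 1%. Such a sample would therefore suggest that the loans were not selected at random; weighted (stratified) sampling can be used to avoid such biased samples.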
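A minimal sketch of the probability calculation, assuming the 25 loans behave like independent draws with p = 685/1687 (binomial model). The hypergeometric version, which samples without replacement from the 1687 loans, gives almost the same answer because 25 is small relative to 1687:

```r
p_divas <- 685 / 1687        # share of Credit Divas loans in the data
25 * p_divas                 # expected number in a sample of 25 (about 10)

# P(fewer than 5 Credit Divas loans) = P(X <= 4)
p_binom <- pbinom(4, size = 25, prob = p_divas)          # with replacement
p_hyper <- phyper(4, m = 685, n = 1687 - 685, k = 25)    # without replacement
c(binomial = p_binom, hypergeometric = p_hyper)          # both below 0.01
```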

8. For this question, recode the PRSM score into a two-level categorical variable,

labeled as “Above” or “Below” depending on whether the PRSM score is above

or below the average PRSM score in your dataset.

It has been suggested that loans given to repeat customers (Loan Type

identifies this variable) perform worse than those given to first time borrowers.

To what extent is this assertion supported by the data? Provide a brief

discussion using ideas from the first two lectures to support your answer.

In the cross table, Type 1 refers to original customers and Type 2 to repeat customers. "Below" and "Above" are defined relative to the target PRSM score of 1.

As the table shows, 85.5% of original customers fall below the target, whereas only 75.5% of repeat customers do. The data therefore do not support the assertion that loans to repeat customers perform worse; if anything, repeat loans perform better.
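A minimal sketch of the recoding and row-percentage cross table in base R. It follows the question's definition (Above/Below the sample average PRSM); the table discussed above used the target of 1 instead, which corresponds to setting `cutoff` to 1. The data frame and its column names (`PRSM`, `Loan.Type`) are hypothetical:

```r
# Hypothetical loan data; replace with the project data frame
loans <- data.frame(
  PRSM = c(0.6, 1.2, 0.8, 0.9, 1.1, 0.7, 0.5, 1.3),
  Loan.Type = c(1, 1, 1, 1, 2, 2, 2, 2)   # 1 = original, 2 = repeat customer
)

# Two-level recode relative to the sample average PRSM
cutoff <- mean(loans$PRSM)                # use 1 here for the target-based split
loans$PRSM.Level <- factor(ifelse(loans$PRSM > cutoff, "Above", "Below"),
                           levels = c("Above", "Below"))

# Counts and row proportions (share Above/Below within each loan type)
tab <- table(Loan.Type = loans$Loan.Type, PRSM.Level = loans$PRSM.Level)
row_pct <- prop.table(tab, margin = 1)
tab
round(100 * row_pct, 1)
```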

9. Does your answer to the previous question imply that Original/Repeat loan

status is the cause of an improved PRSM score or might there be another

explanation? If so, suggest a possible “lurking” variable that could influence

the comparison (the lurking variable can be a hypothetical one, it doesn’t have

to exist in the data set). Otherwise, explain briefly why it is not possible.

(Reading Section 5.2 of SF could be helpful here.)

The correlation matrix shows no significant correlation between PRSM and the other variables, so a new variable, created by transforming the existing ones, is required.
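A minimal sketch of how the correlations with PRSM and their significance could be tabulated; `correlations_with()` is a hypothetical helper, `cor.test()` supplies the p-value for each pair, and the toy data frame is illustrative only:

```r
# Correlation of each numeric column with a target column, plus the
# p-value from cor.test(); rows sorted by absolute correlation
correlations_with <- function(df, target) {
  num <- names(df)[sapply(df, is.numeric)]
  others <- setdiff(num, target)
  res <- t(sapply(others, function(v) {
    ct <- cor.test(df[[v]], df[[target]])
    c(r = unname(ct$estimate), p.value = ct$p.value)
  }))
  res[order(-abs(res[, "r"])), , drop = FALSE]
}

toy <- data.frame(
  PRSM = c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4),
  strong = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 18.0, 19.8),
  weak = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6)
)
round(correlations_with(toy, "PRSM"), 3)
```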

10. If you wanted to construct an approximate 95% confidence interval for the

proportion of future loans that originate from ISO Loan Masters, and this CI was to

have a margin of error of +/-2%, then what sample size would you recommend?

Ans)
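A minimal sketch of the standard sample-size formula for a proportion, n = z² p(1 − p) / E², with E = 0.02 and z ≈ 1.96 for 95% confidence. Using p = 0.5 is the conservative choice when no estimate of the Loan Masters share is available; substituting the share observed in the data would give a smaller n. `sample_size_prop()` is a hypothetical helper:

```r
# Smallest n whose approximate CI for a proportion has margin of error <= moe
sample_size_prop <- function(moe, conf = 0.95, p = 0.5) {
  z <- qnorm(1 - (1 - conf) / 2)        # about 1.96 for 95% confidence
  ceiling(z^2 * p * (1 - p) / moe^2)    # round up to a whole number of loans
}

sample_size_prop(0.02)  # 2401 with the conservative p = 0.5
```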
