
Week 2 Solutions

Student's Name

Institution Affiliation

Course Name/Number

Due Date

a) Use the Graph Builder to create visual explorations similar to Figure 3.9 in your

book. Please explore the relationships between BAD versus ALL the continuous

(numeric) predictors in this data set (hint: one predictor at a time). Do any of the

predictors appear to be related to BAD? Only include screenshots where a

relationship is apparent.

Correct answers will point out the relationship between BAD and DEROG, DELINQ, and NINQ.

The box plots for BAD = 0 and BAD = 1 are visibly different, and if students connect the

means, the means of these variables differ between the two groups. When BAD equals 1, the

values of DEROG, DELINQ, and NINQ tend to be higher. Students may use other visual

representations to support their answer, but they should emphasize the relationships between

BAD and DEROG, DELINQ, and NINQ.

BAD vs. DEROG


BAD vs. DELINQ

BAD vs. NINQ


b) Use the Graph Builder to recreate the graph in Figure 3.10, using BAD versus

REASON. Interpret this graph – does there appear to be a relationship between

BAD and REASON? Please explain your interpretation as if I were your manager.

The proportion of loans classified as Bad differs only slightly between the two REASON

codes, DebtCon and HomeImp. Roughly 22% of Home Improvement loans were Bad Risk,

compared with about 19% of Debt Consolidation loans.
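
The percentages quoted above are simply the share of BAD = 1 loans within each REASON group. A minimal sketch of that arithmetic follows; the records are made-up toy values chosen for illustration (they are not the course data set and will not reproduce the 22% and 19% figures), and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records of (REASON, BAD); NOT the real data set.
loans = [
    ("HomeImp", 1), ("HomeImp", 0), ("HomeImp", 0), ("HomeImp", 0), ("HomeImp", 0),
    ("DebtCon", 1), ("DebtCon", 0), ("DebtCon", 0), ("DebtCon", 0), ("DebtCon", 0),
    ("DebtCon", 0), ("DebtCon", 0), ("DebtCon", 0), ("DebtCon", 0), ("DebtCon", 0),
]


def bad_rate_by_reason(records):
    """Return the proportion of BAD = 1 loans within each REASON code."""
    totals = defaultdict(int)
    bads = defaultdict(int)
    for reason, bad in records:
        totals[reason] += 1
        bads[reason] += bad  # BAD is coded 0/1, so the sum counts bad loans
    return {reason: bads[reason] / totals[reason] for reason in totals}


rates = bad_rate_by_reason(loans)
for reason, rate in rates.items():
    print(f"{reason}: {rate:.0%} of loans are Bad")
```
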

c) Use Analyze > Fit Y by X to analyze the relationship between BAD (Y, Response)

and LOAN (X, Factor). Don’t be afraid to explore the options under the red

triangle.

I. Describe the relationship between BAD and LOAN.

The logistic plot shows that as the loan amount increases, the probability of BAD=0 rises

(and the probability of BAD=1 falls). This relationship is statistically significant, as

indicated by a p-value below 0.05.
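
To make the direction of this relationship concrete, here is a minimal sketch of a logistic curve with a negative slope on LOAN. The coefficients B0 and B1 are hypothetical values picked for illustration only; they are not taken from the JMP report.

```python
import math

# Hypothetical coefficients for illustration only; NOT from the JMP output.
B0 = -1.0       # intercept on the log-odds scale
B1 = -0.00003   # negative slope: the log-odds of BAD = 1 fall as LOAN rises


def prob_bad(loan, b0=B0, b1=B1):
    """P(BAD = 1 | LOAN) under a simple logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * loan)))


for loan in (5000, 20000, 50000):
    p1 = prob_bad(loan)
    print(f"LOAN={loan:>6}: P(BAD=1)={p1:.3f}  P(BAD=0)={1 - p1:.3f}")
```

With any negative slope, P(BAD=1) decreases and P(BAD=0) increases as the loan amount grows, which is the pattern described in the answer.
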


d) Use the Distribution platform to create a histogram and summary statistics for

DELINQ, VALUE, and MORTDUE. Describe the shapes of these distributions, as well

as any other important observations worth highlighting. Do you see anything

concerning?

All three distributions show a noticeable right skew. The box plots also show that both

VALUE and MORTDUE contain some extreme data points (outliers).
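
One common way to flag the kind of extreme points a box plot displays is the 1.5 x IQR rule. The sketch below applies that rule to a small made-up sample; the values are hypothetical (in thousands of dollars), and the exact quartile method a given tool uses may differ slightly from Python's default.

```python
import statistics


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical, right-skewed sample (in thousands); NOT the real VALUE column.
sample = [40, 55, 60, 62, 65, 70, 72, 75, 80, 90, 400]
print(iqr_outliers(sample))
```
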


e) Recreate (or adapt) the formula in Figure 3.21 to bin DELINQ into three groups.

I. After creating the formula, use the Distribution platform to graph DELINQ

and this new column entitled DELINQ Binned, and check your work (to

make sure the binning was done correctly). Please share your visualization

below.
II. Later in this course, we’ll create a model to predict BAD from the available predictors.

In this context, does binning DELINQ make sense? What impact will using the binned

data (over the original variable) have on our model?

Binning can be a useful way to handle a continuous numeric variable that is messy or hard to

work with. For DELINQ, it allows missing values to be incorporated, accommodates the fact

that most values are zero, and reduces the problem of a right-skewed distribution. However,

binning has a cost: once values are grouped into bins, the original values are no longer

used, so some information is inevitably lost. Binning should therefore be applied cautiously

and deliberately, so that its advantages are gained without giving up more detail than

necessary.
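
The trade-off can be sketched in a few lines. The three groups used here ("Missing", "0", "1 or more") are an assumption made for illustration; the actual cut points in Figure 3.21 may differ, and the function name and sample values are hypothetical.

```python
from collections import Counter


def bin_delinq(delinq):
    """Map a DELINQ count to one of three labels (ASSUMED cut points)."""
    if delinq is None:      # missing values get their own group
        return "Missing"
    if delinq == 0:         # most records are expected to fall here
        return "0"
    return "1 or more"      # all positive counts collapse into one bin


# Hypothetical DELINQ values, including missing entries (None).
delinq_values = [0, 0, 0, 1, 0, None, 2, 0, 5, None, 0, 1]
binned = [bin_delinq(v) for v in delinq_values]
print(Counter(binned))

# Information loss: 1 and 5 delinquencies become indistinguishable.
print(bin_delinq(1) == bin_delinq(5))
```

Cross-tabulating the original values against the binned labels, as the Distribution check in part I does visually, is a quick way to confirm every value landed in the intended group.
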

f) Refer to the distributions of VALUE and MORTDUE created in part d above. Use the

Graph Builder and dynamic transformations (shown in Figure 3.20) to explore

different transformations of these variables. This is intended to allow you to practice

applying transformations and interpreting the shape of the resulting distributions.

i. Which transformations, if any, appear to normalize the variables? Copy

& paste your visualizations below.

The Log transformation works well for VALUE.


The Square Root transformation works well for MORTDUE.

ii. From the context of modeling, explain why it might make sense to transform these

variables.

When developing models, it is important to understand why variables are transformed.

Transforming a variable can reduce the skewness of its distribution, which makes patterns in

the data easier to identify. It also helps satisfy the assumptions of certain statistical

modeling techniques. Many statistical models come with their own assumptions, and these

assumptions need to be met in order to obtain accurate and reliable results. Meeting them

leads to a more precise and well-constructed model, from which analysts can more easily

draw meaningful conclusions. In short, transformations are a valuable modeling tool: they

reduce skewness, make patterns easier to recognize, and help meet the requirements of

particular statistical methods, all of which contribute to a more effective model.
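
The effect on skewness can be checked numerically. The sketch below computes the moment-based sample skewness of a small made-up, right-skewed sample before and after Log and Square Root transformations; the data values are hypothetical and are not the VALUE or MORTDUE columns.

```python
import math
import statistics


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed dollar amounts; NOT the real data.
raw = [20000, 35000, 42000, 50000, 58000, 65000,
       72000, 90000, 120000, 250000, 400000]

skew_raw = skewness(raw)
skew_sqrt = skewness([math.sqrt(v) for v in raw])
skew_log = skewness([math.log(v) for v in raw])

print(f"raw:  {skew_raw:.2f}")
print(f"sqrt: {skew_sqrt:.2f}")
print(f"log:  {skew_log:.2f}")
```

On this sample both transformations pull the long right tail in, with the log doing so more strongly than the square root. Which one works best on a real column depends on how heavy its tail is, which is why it is worth trying several in Graph Builder.
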
