Week 2 Solutions

Student's Name
Institution Affiliation
Course Name/Number
Due Date
a) Use the Graph Builder to create visual explorations similar to Figure 3.9 in your
book. Please explore the relationships between BAD versus ALL the continuous
(numeric) predictors in this data set (hint: one predictor at a time). Do any of the
predictors appear to be related to BAD? Only include screenshots where a
relationship is apparent.
A correct answer identifies the relationship between BAD and DEROG, DELINQ, and NINQ. The
box plots for BAD = 0 and BAD = 1 are visibly different, and connecting the group means shows
that the means differ for each of these variables. When BAD equals 1, the values of DEROG,
DELINQ, and NINQ tend to be higher. Students may use other visual representations to support
their answer, but they should emphasize the relationships between BAD and DEROG, DELINQ,
and NINQ.
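The comparison of group means can also be checked outside JMP. The following is a minimal Python sketch, assuming a handful of invented rows; the values and the helper name group_means are hypothetical and are not taken from the course data set.

```python
from statistics import mean

# Hypothetical rows (BAD, DEROG, DELINQ, NINQ); invented for illustration,
# not taken from the course data set.
rows = [
    (0, 0, 0, 1),
    (0, 0, 1, 0),
    (0, 1, 0, 2),
    (1, 2, 3, 4),
    (1, 1, 2, 3),
    (1, 3, 1, 5),
]

def group_means(rows, column):
    """Return {BAD value: mean of the chosen column} for BAD = 0 and BAD = 1."""
    return {
        bad: mean(r[column] for r in rows if r[0] == bad)
        for bad in (0, 1)
    }

# A higher mean in the BAD = 1 group is what the box plots suggest visually.
for name, column in (("DEROG", 1), ("DELINQ", 2), ("NINQ", 3)):
    print(name, group_means(rows, column))
```

With real data, the same comparison would be run on the full columns rather than on six rows.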
BAD vs. DEROG

BAD vs. DELINQ
BAD vs. NINQ

b) Use the Graph Builder to recreate the graph in Figure 3.10, using BAD versus
REASON. Interpret this graph – does there appear to be a relationship between
BAD and REASON? Please explain your interpretation as if I were your manager.
The proportion of loans categorized as bad differs only slightly between the two REASON
codes, DebtCon and HomeImp. Roughly 22% of Home Improvement loans were bad risks,
compared with 19% of Debt Consolidation loans.
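The percentages are simply the share of BAD = 1 loans within each REASON code. The Python sketch below shows that arithmetic. The loan counts are hypothetical, chosen only so that the rates match the 22% and 19% quoted above, and bad_rate is an illustrative helper name.

```python
# Hypothetical (REASON, BAD) pairs: 100 loans per reason code, with counts
# chosen only to reproduce the 22% and 19% rates quoted in the answer.
loans = (
    [("HomeImp", 1)] * 22 + [("HomeImp", 0)] * 78
    + [("DebtCon", 1)] * 19 + [("DebtCon", 0)] * 81
)

def bad_rate(loans, reason):
    """Share of loans with BAD = 1 among loans with the given REASON code."""
    outcomes = [bad for r, bad in loans if r == reason]
    return sum(outcomes) / len(outcomes)

for reason in ("HomeImp", "DebtCon"):
    print(reason, f"{bad_rate(loans, reason):.0%}")
```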
c) Use Analyze > Fit Y by X to analyze the relationship between BAD (Y, Response)
and LOAN (X, Factor). Don’t be afraid to explore the options under the red
triangle.
I. Describe the relationship between BAD and LOAN.
The logistic plot shows that as the loan amount increases, the probability of BAD = 0
rises (and the probability of BAD = 1 falls). The relationship is statistically significant,
with a p-value below 0.05.
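The shape of this relationship can be illustrated with a logistic curve. The Python sketch below assumes made-up coefficients: the intercept and slope are hypothetical and are not the values estimated by JMP. The only feature carried over from the answer is that the slope on LOAN is negative for BAD = 1.

```python
import math

def prob_bad(loan, intercept=-0.8, slope=-0.00003):
    """P(BAD = 1) under a logistic model.

    The intercept and slope are hypothetical values for illustration,
    not estimates from the course data set.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * loan)))

# With a negative slope, the probability of BAD = 1 falls as LOAN rises.
for loan in (5_000, 20_000, 50_000):
    print(loan, round(prob_bad(loan), 3))
```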

d) Use the Distribution platform to create a histogram and summary statistics for
DELINQ, VALUE, and MORTDUE. Describe the shapes of these distributions, as well
as any other important observations worth highlighting. Do you see anything
concerning?
All three distributions are noticeably right-skewed. The box plots show that both VALUE and
MORTDUE contain extreme data points (outliers).
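Right skew and outliers can both be quantified. The Python sketch below assumes a small invented sample and uses two common conventions: the moment-based skewness coefficient and the 1.5 x IQR rule for flagging outliers. The sample values are hypothetical, and the quartile method in Python's statistics module may differ slightly from the one JMP uses.

```python
from statistics import mean, median, pstdev, quantiles

def skewness(values):
    """Moment-based skewness; a positive result indicates a right skew."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def iqr_outliers(values):
    """Values lying more than 1.5 * IQR beyond the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

# Hypothetical home values in thousands; invented for illustration only.
values = [40, 45, 50, 55, 60, 65, 70, 80, 95, 400]

print("mean:", mean(values), "median:", median(values))
print("skewness:", round(skewness(values), 2))
print("outliers:", iqr_outliers(values))
```

A mean well above the median is a quick sign of the same right skew that the histogram shows.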

e) Recreate (or adapt) the formula in Figure 3.21 to bin DELINQ into three groups.
I. After creating the formula, use the Distribution platform to graph DELINQ
and this new column entitled DELINQ Binned, and check your work (to
make sure the binning was done correctly). Please share your visualization
below.
II. Later in this course, we’ll create a model to predict BAD from the available predictors.
In this context, does binning DELINQ make sense? What impact will using the binned
data (over the original variable) have on our model?
Binning can be a useful way to handle a continuous numeric variable that is messy or hard to
work with. For DELINQ, it lets missing values be assigned to a bin, copes with the fact that
most values are zero, and reduces the effect of the right-skewed distribution. The drawback
is a loss of information: once values are grouped, the model no longer sees the original
values and cannot distinguish between cases that fall in the same bin. Binning should
therefore be used deliberately rather than by default.
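A binning rule of this kind can be sketched in Python as follows. The cut points (0, 1 to 2, 3 or more) and the separate "Missing" label are assumptions made for illustration; they are not the formula from Figure 3.21, and bin_delinq is a hypothetical helper name.

```python
from collections import Counter

def bin_delinq(delinq):
    """Map a DELINQ count to a bin label.

    The cut points and the "Missing" label are assumptions for
    illustration, not the formula from the textbook figure.
    """
    if delinq is None:
        return "Missing"
    if delinq == 0:
        return "0"
    if delinq <= 2:
        return "1-2"
    return "3+"

# Hypothetical DELINQ values, including a missing entry.
sample = [0, 0, 0, 1, 2, 3, 5, None, 0, 10]

# Cross-tabulate raw value against bin to check the binning, in the same
# spirit as comparing the two columns in the Distribution platform.
check = Counter((value, bin_delinq(value)) for value in sample)
for (value, label), count in check.items():
    print(value, "->", label, "x", count)
```

The values 3, 5, and 10 all land in the same bin, which is the loss of information described above.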
f) Refer to the distributions of VALUE and MORTDUE created in part d above. Use the
Graph Builder and dynamic transformations (shown in Figure 3.20) to explore
different transformations of these variables. This is intended to allow you to practice
applying transformations and interpreting the shape of the resulting distributions.
i. Which transformations, if any, appear to normalize the variables? Copy
& paste your visualizations below.
The Log transformation works well for VALUE.

The Square Root transformation works well for MORTDUE.
ii. From the context of modeling, explain why it might make sense to transform these
variables.
Transforming a variable can reduce the skewness of its distribution, which makes patterns in
the data easier to see. It also helps satisfy the assumptions of certain statistical modeling
techniques. Many models come with assumptions, and those assumptions need to be met to
obtain accurate and reliable results. Transformed variables can therefore lead to a
better-fitting model and make it easier to draw meaningful conclusions from the data.
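The effect of a transformation on skewness can be checked numerically. The Python sketch below assumes an invented right-skewed sample (a doubling sequence) and compares the moment-based skewness of the raw, square-root, and log versions. The sample is hypothetical, so it shows the mechanism rather than the results for VALUE or MORTDUE; which transformation works best depends on the variable.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness; values near 0 indicate a symmetric shape."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical right-skewed sample in which each value doubles the previous one.
raw = [10, 20, 40, 80, 160, 320, 640]

sqrt_values = [math.sqrt(v) for v in raw]
log_values = [math.log(v) for v in raw]

# Both transformations pull in the long right tail; the log does so more
# strongly and makes this particular sample symmetric.
print("raw :", round(skewness(raw), 2))
print("sqrt:", round(skewness(sqrt_values), 2))
print("log :", round(skewness(log_values), 2))
```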
