Student's Name
Institution Affiliation
Course Name/Number
Due Date
a) Use the Graph Builder to create visual explorations similar to Figure 3.9 in your
book. Please explore the relationships between BAD versus ALL the continuous
(numeric) predictors in this data set (hint: one predictor at a time). Do any of the
relationship is apparent.
A correct answer points out the relationship between BAD and DEROG, DELINQ, and NINQ. The
box plots are visually different, and connecting the means shows that the means of these
variables differ between BAD = 0 and BAD = 1. When BAD equals 1, the values of
DEROG, DELINQ, and NINQ are commonly higher. Students may also use other visual
representations to support their answer, but they should emphasize the relationships between
BAD and REASON? Please explain your interpretation as if I were your manager.
The proportion of loans categorized as Bad differs slightly between the two Reason
codes, DebtCon and HomeImp. Home Improvement loans had roughly a 22% Bad Risk loan
c) Use Analyze > Fit Y by X to analyze the relationship between BAD (Y, Response)
and LOAN (X, Factor). Don’t be afraid to explore the options under the red
triangle.
The logistic plot shows that as the loan amount goes up, the probability of BAD=0
rises (while the probability of BAD=1 falls). This relationship is significant, as indicated by a p-
DELINQ, VALUE, and MORTDUE. Describe the shapes of these distributions, as well
concerning?
Each distribution shows a noticeable right skew. The box-and-whisker plots show that both
VALUE and MORTDUE contain extreme data points, also known as outliers.
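The right skew and the outliers seen in the box plots can also be checked numerically. The sketch below is an illustration in Python rather than JMP: it compares the mean with the median (a mean well above the median suggests right skew) and flags points beyond 1.5 times the interquartile range, the usual box-plot rule. The home values are made up for the example and are not taken from the data set, and the quartile method is Python's default, which may differ slightly from the one JMP uses.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return points more than k * IQR beyond the quartiles (box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical home values, NOT taken from the actual data set
home_values = [60_000, 72_000, 80_000, 85_000, 90_000,
               95_000, 110_000, 130_000, 400_000]

mean_value = statistics.fmean(home_values)
median_value = statistics.median(home_values)
outliers = iqr_outliers(home_values)

print("mean:", round(mean_value), "median:", median_value)
print("right skew suggested:", mean_value > median_value)
print("outliers:", outliers)
```

A single very large value is enough to pull the mean far above the median, which is the same pattern the long right tail produces in the histograms.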
i. After creating the formula, use the Distribution platform to graph DELINQ
and this new column entitled DELINQ Binned, and check your work (to
make sure the binning was done correctly). Please share your visualization
below.
ii. Later in this course, we’ll create a model to predict BAD from the available predictors.
In this context, does binning DELINQ make sense? What impact will using the binned
Binning can be a valuable technique for handling continuous numeric data that is messy or hard
to manage. It allows missing values to be incorporated, accommodates the fact that the
majority of values are zero, and addresses the issue of right-
with excessively using the binning approach. Categorizing continuous data into bins inevitably
loses information, since the original values are no longer used. Therefore,
it is crucial to employ this technique cautiously and thoughtfully to maintain data integrity while
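The trade-off described above can be illustrated with a small sketch in Python rather than JMP. The cut points ("Missing", "0", "1", "2+"), the function name bin_delinq, and the sample values are all assumptions made for the example; they are not the binning formula from the assignment. The cross-check at the end mirrors the idea of comparing the raw and binned columns to confirm the binning worked.

```python
from collections import Counter


def bin_delinq(value):
    """Map a DELINQ count to a coarse category (hypothetical cut points)."""
    if value is None:
        return "Missing"  # missing values get their own bin
    if value == 0:
        return "0"
    if value == 1:
        return "1"
    return "2+"


# Made-up DELINQ values; None stands in for a missing entry
delinq = [0, 0, None, 1, 0, 3, 0, 7, None, 2, 0, 1]

binned = [bin_delinq(v) for v in delinq]
counts = Counter(binned)
print(counts)

# Cross-check: which raw values ended up in each bin
raw_by_bin = {}
for raw, label in zip(delinq, binned):
    raw_by_bin.setdefault(label, set()).add(raw)
print(raw_by_bin)
```

In this example the raw values 2, 3, and 7 all fall into the "2+" bin, so a model that uses the binned column can no longer tell them apart. That is the information loss the answer refers to.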
f) Refer to the distributions of VALUE and MORTDUE created in part d above. Use the
ii. From the context of modeling, explain why it might make sense to transform these
variables.
transforming variables. Altering data can effectively decrease the asymmetry found within our
identifying patterns present in the data. Additionally, it helps ensure that certain prerequisites
are met when employing specific statistical modeling techniques. Statistical models come with
their own assumptions, and these assumptions must be satisfied to obtain accurate and reliable
results. Data transformation plays a significant role in meeting these requirements, which leads
to a more precise and well-constructed model and makes it easier for analysts to derive
meaningful insights from the transformed data. In short, transformations are a valuable
modeling tool: they reduce skewness, make patterns easier to recognize, and help satisfy the
assumptions of particular statistical methods, all of which contributes to a more effective and
comprehensive model.
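The claim that a transformation reduces skewness can be illustrated with a small sketch in Python rather than JMP. It applies a natural log transform, one common choice for right-skewed dollar amounts, and compares a simple moment-based skewness before and after. The mortgage amounts are made up, the choice of a log transform is an assumption rather than something the assignment prescribes, and the skewness formula here is the plain population form, which may not match the adjusted statistic that JMP reports.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the average of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical mortgage balances, NOT taken from the actual data set
mortdue = [20_000, 35_000, 40_000, 48_000, 55_000,
           62_000, 75_000, 90_000, 150_000, 390_000]

# A log transform compresses large values more than small ones
log_mortdue = [math.log(v) for v in mortdue]

raw_skew = skewness(mortdue)
log_skew = skewness(log_mortdue)
print("skewness before:", round(raw_skew, 2))
print("skewness after: ", round(log_skew, 2))
```

On these values the skewness stays positive after the transform but is noticeably smaller, because the largest balance no longer dominates the distribution.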