1. A company manager says that the average balance on their credit cards is $500. Do you think that this
assertion is justified? Use a one-sample t-test to draw your conclusion.
Balance
Mean 520.015
Variance 211378.2253
Observations 400
Hypothesized Mean Difference 500
df 399
t Stat 0.870673781
P(T<=t) one-tail 0.192227914
t Critical one-tail 1.648681534
P(T<=t) two-tail 0.384455827
t Critical two-tail 1.965927296
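Both the one-tail p-value (0.192) and the two-tail p-value (0.384) are greater than our significance level of 0.05, so the null hypothesis (mean balance = $500) cannot be rejected. The manager's assertion is consistent with the data.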
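The t statistic in the table can be reproduced from the summary figures alone. The sketch below is only an illustration, not the spreadsheet's own procedure: the function name `one_sample_t` is hypothetical, and the two-tail critical value is copied from the table rather than computed.

```python
import math


def one_sample_t(sample_mean, sample_var, n, mu0):
    """Return the one-sample t statistic for H0: population mean == mu0."""
    standard_error = math.sqrt(sample_var / n)
    return (sample_mean - mu0) / standard_error


# Summary figures taken from the table above.
t_stat = one_sample_t(520.015, 211378.2253, 400, 500)

# Two-tail critical value (alpha = 0.05, df = 399), copied from the table.
T_CRIT_TWO_TAIL = 1.965927296
reject_null = abs(t_stat) > T_CRIT_TWO_TAIL

print(f"t = {t_stat:.4f}, reject H0: {reject_null}")
```

Because |t| stays below the critical value, the check reaches the same conclusion as the p-value comparison.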
2. Is there a difference between men and women as far as average balance is concerned? Use a two-
sample t-test to draw your conclusion.
Null Hypothesis: The average credit card balance does not differ between men and women
Alternate Hypothesis: The average credit card balance differs between men and women
Male Female
Mean 509.8031088 529.5362319
Variance 213554.5652 210187.1043
Observations 193 207
Hypothesized Mean Difference 0
df 396
t Stat -0.42838443
P(T<=t) one-tail 0.334302083
t Critical one-tail 1.648710601
P(T<=t) two-tail 0.668604165
t Critical two-tail 1.965972608
Both the one-tail p-value (0.334) and the two-tail p-value (0.669) are greater than our significance level of 0.05, so the null hypothesis cannot be rejected.
• There is no significant difference between men and women as far as average balance is
concerned.
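The reported df of 396 (rather than 193 + 207 - 2 = 398) suggests that the unequal-variances (Welch) form of the two-sample t-test was used. Under that assumption, the following sketch reproduces the t statistic and degrees of freedom from the summary figures; the function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    a = var1 / n1  # squared standard error contribution of group 1
    b = var2 / n2  # squared standard error contribution of group 2
    t_stat = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_stat, df


# Male vs. female summary figures from the table above.
t_stat, df = welch_t(509.8031088, 213554.5652, 193,
                     529.5362319, 210187.1043, 207)

print(f"t = {t_stat:.4f}, df = {df:.2f} (rounds to {round(df)})")
```

The same function can be applied to any other pair of groups for which the mean, variance, and count are reported.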
Saravanan Varadharajan 1
Business Report - DADM
3. Is there a difference between students and non-students as far as average balance is concerned? Use a
two-sample t-test to draw your conclusion.
Null Hypothesis: The average credit card balance does not differ between students and non-students
Alternate Hypothesis: The average credit card balance differs between students and non-students
The one-tail p-value is less than our significance level of 0.05, so the null hypothesis that
“the average credit card balance does not differ between students and non-students” is rejected.
• Yes, there is a significant difference between students and non-students as far as average
balance is concerned.
4. It is generally assumed that if there are more credit cards then the balance on the cards will be more.
Based on this dataset, do you think this is true? Calculate a correlation coefficient and show a scatter plot
to support your answer.
Correlation Coefficient:
Cards Balance
Cards 1
Balance 0.086456 1
• The correlation coefficient (0.086) is very low, almost equal to zero, hence there is almost no linear relationship between the number of
cards and the balance on the cards.
Scatter plot:
[Scatter plot: Balance (y-axis) vs. Cards (x-axis), with linear trend line y = 28.987x + 434.29]
• Based on the scatter plot, the trend line is almost flat and the points do not follow a linear pattern, so the average balance barely changes as the number of credit cards increases or decreases.
• Hence the assumption is not supported by this dataset: the correlation coefficient is very low, so there is no meaningful correlation between the two.
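Whether a correlation of 0.086 differs from zero can be checked with the standard t statistic for a correlation coefficient. This is a sketch under stated assumptions: n = 400 is taken from the observation counts reported elsewhere in this report, the critical value of about 1.966 (two-tail, alpha = 0.05, df = 398) is an approximation, and the function name is hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: the true correlation is zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.086456  # Cards vs. Balance, from the correlation table above
n = 400       # assumed: the number of observations used throughout the report

t_stat = correlation_t(r, n)
share_explained = r ** 2  # fraction of balance variance linked to card count

T_CRIT = 1.966  # assumed approximate two-tail critical value for df = 398
significant = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, r^2 = {share_explained:.4f}, significant: {significant}")
```

On its own, the number of cards accounts for well under 1% of the variation in balance, and the correlation is not significant at the 0.05 level.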
5. Examine whether the following demographic variables influence balance: (a) age, (b) years of education,
(c) marital status. For age and years of education, use scatter plots to depict their relationship with
balance and calculate the correlation coefficient. For the relationship between marital status and
balance, use a two-sample t-test to draw your conclusion
Correlation Coefficient:
Scatter plots:
[Scatter plot: Balance (y-axis) vs. Age (x-axis), with linear trend line]
[Scatter plot: Balance against a second variable, presumably years of education (both axis titles were left as the default "Axis Title"); one data point is labeled (15, 790)]
The trends show no correlation, hence credit card balance does not depend on either
variable.
• The demographic variables (years of education, age, and marital status) had no significant influence on balance.
Null Hypothesis: Average balance of credit card for single and married is same
Alternate Hypothesis: Average balance of credit card for single and married is different
Married Single
Mean 517.9428571 523.2903226
Variance 205696.7262 221735.0385
Observations 245 155
Hypothesized Mean Difference 0
df 319
t Stat -0.112233601
P(T<=t) one-tail 0.455354389
t Critical one-tail 1.649644319
P(T<=t) two-tail 0.910708777
t Critical two-tail 1.967428387
The two-tail p-value (0.911) is greater than our significance level of 0.05, so the null hypothesis cannot be rejected: there is no significant difference in average balance between married and single cardholders.
6. “Ethnicity of the cardholder does not matter as far as balance is concerned.” Carry out an analysis of
variance (ANOVA) and discuss whether this statement is supported by the data or not.
Null Hypothesis: Ethnicity of the cardholder does not matter as far as balance is concerned
Alternate Hypothesis: Ethnicity of the cardholder matters as far as balance is concerned
SUMMARY
Groups Count Sum Average Variance
African American 99 52569 531 235839.1633
Asian 102 52256 512.3137255 231748.3362
Caucasian 199 103181 518.4974874 190922.4129
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 18454.20047 2 9227.100236 0.043442783 0.957491888 3.018451995
Within Groups 84321457.71 397 212396.6189
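• The p-value (0.957) is greater than 0.05 and F (0.043) is below F crit (3.018), so the null hypothesis cannot be rejected: ethnicity of the cardholder does not matter as
far as balance is concerned, and the statement is supported by the data.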
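The ANOVA table can be rebuilt from the SUMMARY block alone, which is a useful cross-check on the spreadsheet output. The sketch below assumes a standard one-way ANOVA; the variable names are illustrative, and the critical value is copied from the table rather than computed.

```python
# (count, sum, variance) per group, copied from the SUMMARY table above.
groups = {
    "African American": (99, 52569, 235839.1633),
    "Asian": (102, 52256, 231748.3362),
    "Caucasian": (199, 103181, 190922.4129),
}

total_n = sum(count for count, _, _ in groups.values())
grand_mean = sum(total for _, total, _ in groups.values()) / total_n

# Between-group sum of squares: spread of group means around the grand mean.
ss_between = sum(count * (total / count - grand_mean) ** 2
                 for count, total, _ in groups.values())
# Within-group sum of squares: pooled spread inside each group.
ss_within = sum((count - 1) * variance for count, _, variance in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRIT = 3.018451995  # copied from the ANOVA table (alpha = 0.05)
print(f"F = {f_stat:.6f}, reject H0: {f_stat > F_CRIT}")
```

An F statistic this far below 1 means the group means differ by less than the variation within groups would produce by chance.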
7. A general principle that credit card companies often follow is to assign a higher credit limit to people
with a higher credit rating. Does the data show that this principle is being followed?
Correlation Coefficient:
Limit Rating
Limit 1
Rating 0.99688 1
Scatter plots:
[Scatter plot: Rating (y-axis) vs. Limit (x-axis)]
• Yes, the data show that credit card companies assign a higher credit limit to people with a higher
credit rating. This is justified by the very strong positive correlation (0.997) between limit and rating.
8. Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and the balance is
the Y). Report the coefficients and the R-squared. Show a scatter plot. State inference
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.861697267
R Square 0.74252218
Adjusted R Square 0.741875251
Standard Error 1172.703019
Observations 400
ANOVA
df SS MS F Significance F
Regression 1 1578442502 1578442502 1147.764214 2.5306E-119
Residual 398 547342484 1375232.372
Total 399 2125784986
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 2485.956595 88.58573292 28.06271973 2.28634E-96 2311.802153 2660.111036 2311.802153 2660.111036
Balance 4.326112527 0.127694288 33.87866901 2.5306E-119 4.075072921 4.577152132 4.075072921 4.577152132
Scatter Plot:
[Scatter plot titled Balance, with linear trend line; R² = 0.7425. The x-axis (0 to 16000) is also labeled Balance in the original chart.]
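In the regression output for this question, the predictor row is labeled Balance and the Total SS (2125784986) differs from the 84339911.91 reported in the Balance regressions of questions 9, 11, and 12. This suggests that the output has Limit as the dependent variable and Balance as the predictor. If that reading is right, the coefficients for Balance (Y) on Limit (X) can still be recovered from the reported figures, because in simple regression the two slopes multiply to R² and both fitted lines pass through the point of means. The sketch below applies those two identities; it is not a rerun of the regression.

```python
# Values copied from the regression output above (read as Limit on Balance).
r_squared = 0.74252218
slope_limit_on_balance = 4.326112527
intercept_limit_on_balance = 2485.956595
mean_balance = 520.015  # sample mean of Balance reported in question 1

# Identity 1: slope(Y on X) * slope(X on Y) = R^2 in simple regression.
slope_balance_on_limit = r_squared / slope_limit_on_balance

# Identity 2: a least-squares line passes through (mean X, mean Y).
mean_limit = intercept_limit_on_balance + slope_limit_on_balance * mean_balance
intercept_balance_on_limit = mean_balance - slope_balance_on_limit * mean_limit

print(f"Balance = {intercept_balance_on_limit:.2f} "
      f"+ {slope_balance_on_limit:.4f} * Limit")
```

The recovered slope of about 0.17 per dollar of limit is in line with the Limit coefficient reported in question 11, while R² is the same in either direction.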
9. Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients and R-squared.
Show a scatter plot. State inference
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.863625161
R Square 0.745848418
Adjusted R Square 0.745209846
Standard Error 232.0713048
Observations 400
ANOVA
df SS MS F Significance F
Regression 1 62904789.88 62904789.88 1167.994581 1.8989E-120
Residual 398 21435122.03 53857.09053
Total 399 84339911.91
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -390.8463418 29.06851463 -13.44569362 3.07318E-34 -447.993365 -333.6993186 -447.993365 -333.6993186
Rating 2.566240327 0.075089102 34.1759357 1.8989E-120 2.418619483 2.713861171 2.418619483 2.713861171
Scatter Plot:
[Scatter plot: Balance vs. Rating, with linear trend line y = 2.5662x - 390.85; R² = 0.7458]
10. Consider your findings in questions 8-9. Discuss business mechanisms to increase or decrease the
balance on credit cards. Try to quantify your answers. In this context, focus on possible specific
strategies using variables in Q8 and Q9 that the business could adopt to increase the balance on credit
cards.
• Credit rating and credit limit both have a significant impact on credit card balance.
• Both are strongly correlated with balance (Multiple R of about 0.86 in each regression).
• The balance is high for cardholders whose credit rating and credit limit are high.
• Both rating and limit are significant predictors of balance; from Q9, each additional rating point is associated with about $2.57 more balance.
• For cardholders with a higher rating and a higher credit limit the balance can be increased, whereas for those with a
lower rating and a lower credit limit the balance must be decreased, based on the analysis.
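To quantify the rating effect, the slope and its 95% confidence interval from question 9 can be scaled by a planned change in rating. This is a sketch only: the 50-point change is a hypothetical scenario, the function name is illustrative, and the calculation treats the fitted association as if it carried over to an intervention, which the regression alone does not establish.

```python
# Slope and 95% confidence bounds for Rating, copied from question 9.
RATING_SLOPE = 2.566240327
RATING_SLOPE_CI = (2.418619483, 2.713861171)


def balance_change(rating_change):
    """Expected change in balance, with a 95% interval, for a rating change."""
    low, high = sorted(bound * rating_change for bound in RATING_SLOPE_CI)
    return RATING_SLOPE * rating_change, (low, high)


# Hypothetical scenario: ratings rise by 50 points.
change, (low, high) = balance_change(50)
print(f"Expected change: {change:.2f} (95% interval {low:.2f} to {high:.2f})")
```

Passing a negative rating change gives the corresponding expected decrease in balance.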
11. The credit limit is provided as a consolidated amount for all the credit cards the cardholder has. Run a
multiple linear regression of Balance (Y) on Limit and Cards as two X variables. Report the coefficients.
Discuss the effect on the balance of (a) increasing the credit limit on the same number of cards and (b)
increasing the number of cards without altering the total credit limit.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.865188295
R Square 0.748550786
Adjusted R Square 0.74728404
Standard Error 231.1247525
Observations 400
ANOVA
df SS MS F Significance F
Regression 2 63132707.37 31566353.68 590.9238244 9.7585E-120
Residual 397 21207204.54 53418.65124
Total 399 84339911.91
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -369.0359554 36.16414657 -10.20447018 7.22692E-22 -440.133128 -297.9387828 -440.133128 -297.9387828
Cards 26.03375427 8.438363509 3.085166246 0.002176819 9.444290848 42.62321769 9.444290848 42.62321769
Limit 0.171479037 0.005013136 34.20593861 2.0023E-120 0.161623424 0.18133465 0.161623424 0.18133465
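The two effects asked about follow directly from the coefficients, since each coefficient is the change in predicted balance when its variable rises by one unit and the other is held fixed. The sketch below illustrates this with a hypothetical cardholder (limit 5000, 3 cards); the function name and the scenario values are assumptions, not figures from the dataset.

```python
# Coefficients copied from the regression output above.
COEF = {"intercept": -369.0359554, "cards": 26.03375427, "limit": 0.171479037}


def predicted_balance(limit, cards):
    """Predicted balance from the fitted two-variable model."""
    return COEF["intercept"] + COEF["cards"] * cards + COEF["limit"] * limit


base = predicted_balance(limit=5000, cards=3)  # hypothetical cardholder

# (a) Raise the total limit by 1000 on the same number of cards.
effect_limit = predicted_balance(limit=6000, cards=3) - base
# (b) Add one card without altering the total limit.
effect_card = predicted_balance(limit=5000, cards=4) - base

print(f"(a) +1000 limit: {effect_limit:.2f}   (b) +1 card: {effect_card:.2f}")
```

Because the model is linear, these two differences do not depend on the starting values chosen for the hypothetical cardholder.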
12. Run a simple linear regression equation with Income as X and Balance as Y. Report the coefficients. Is
the coefficient of Income significantly different from zero? What does this say about the effect of income
on balance?
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.463656457
R Square 0.21497731
Adjusted R Square 0.213004891
Standard Error 407.8647195
Observations 400
ANOVA
df SS MS F Significance F
Regression 1 18131167.4 18131167.4 108.9917152 1.03089E-22
Residual 398 66208744.51 166353.6294
Total 399 84339911.91
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 246.5147506 33.19934735 7.425289058 6.90344E-13 181.2467485 311.7827527 181.2467485 311.7827527
Income 6.048363409 0.579350163 10.43990973 1.03089E-22 4.909394402 7.187332415 4.909394402 7.187332415
Income Balance
Income 1
Balance 0.463656457 1
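Whether the Income coefficient differs from zero can be read off the output in two equivalent ways: the t statistic (coefficient divided by its standard error) and the 95% confidence interval. The short check below uses only figures copied from the tables above; the variable names are illustrative.

```python
# Income row of the coefficient table above.
coef = 6.048363409
std_err = 0.579350163
ci_low, ci_high = 4.909394402, 7.187332415

t_stat = coef / std_err
ci_excludes_zero = not (ci_low <= 0 <= ci_high)

# In simple regression, R Square is the squared correlation.
r = 0.463656457
r_squared = r ** 2

print(f"t = {t_stat:.4f}, CI excludes zero: {ci_excludes_zero}, "
      f"R^2 = {r_squared:.4f}")
```

The coefficient is therefore significantly different from zero, although income on its own explains only about 21% of the variation in balance.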
13. Based on the equation derived in question 12, what is the estimated balance for a person with an
income of USD 100k per year?
[Scatter plot: Balance vs. Income, with linear trend line y = 6.0484x + 246.51; R² = 0.215; the income axis runs from 0 to 200]
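The estimate follows from substituting the income into the fitted equation from question 12. The sketch assumes that Income is recorded in thousands of USD, so that USD 100k corresponds to x = 100; the regression output does not state the unit, and the function name is hypothetical.

```python
# Coefficients copied from the regression output in question 12.
INTERCEPT = 246.5147506
INCOME_SLOPE = 6.048363409


def estimated_balance(income_thousands):
    """Estimated balance for an income given in thousands of USD (assumed unit)."""
    return INTERCEPT + INCOME_SLOPE * income_thousands


estimate = estimated_balance(100)
print(f"Estimated balance at USD 100k income: {estimate:.2f}")
```

Under that unit assumption the estimated balance is about 851.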
14. Based on the dataset, explore the relationship between credit card balance (Y) and (a) Income, (b) Age,
(c) Education, (d) Limit, and (e) Rating as X variables. Estimate a multiple linear regression model and
report the statistical significance of each of these variables.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.936702578
R Square 0.87741172
Adjusted R Square 0.875856031
Standard Error 161.9917647
Observations 400
ANOVA
df SS MS F Significance F
Regression 5 74000827.17 14800165.43 564.0020686 4.5908E-177
Residual 394 10339084.74 26241.33183
Total 399 84339911.91
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -473.2514026 55.10833546 -8.587655545 2.08837E-16 -581.5945666 -364.9082387 -581.5945666 -364.9082387
Income -7.608832003 0.381931562 -19.92197755 1.37077E-61 -8.359710677 -6.85795333 -8.359710677 -6.85795333
Limit 0.07901642 0.044791005 1.764113581 0.078487737 -0.009042839 0.167075679 -0.009042839 0.167075679
Rating 2.773843725 0.667079559 4.158190261 3.93909E-05 1.462363177 4.085324273 1.462363177 4.085324273
Age -0.860030445 0.478700493 -1.796594023 0.073165937 -1.801157147 0.081096257 -1.801157147 0.081096257
Education 1.967791521 2.605290902 0.755305874 0.450516748 -3.154218733 7.089801776 -3.154218733 7.089801776
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.93547739
R Square 0.875117948
Adjusted R Square 0.874488819
Standard Error 162.8813393
Observations 400
ANOVA
df SS MS F Significance F
Regression 2 73807370.62 36903685.31 1390.999823 4.5212E-180
Residual 397 10532541.29 26530.33071
Total 399 84339911.91
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -534.8121502 21.60269845 -24.75672896 1.66359E-82 -577.2821357 -492.3421648 -577.2821357 -492.3421648
Income -7.672124366 0.378462026 -20.2718472 3.1071E-63 -8.416164597 -6.928084134 -8.416164597 -6.928084134
Rating 3.949264832 0.086209035 45.81033566 1.4482E-160 3.77978154 4.118748125 3.77978154 4.118748125
• Based on the p-values in the multiple regression analysis, income and rating are the two statistically significant
factors.
• All five variables together, i.e. income, education, age, limit, and rating, explain 87.7% of the variation in
the credit card balance.
• The regression was then run again retaining only the Xs with low p-values, i.e. income and rating.
• This two-variable regression explains 87.5% of the variation in the credit card
balance, which is almost the same R-squared value as before.
• Based on that, it is very clear that income and rating are the two statistically significant variables.
• The errors (residuals) were also examined for patterns.
• More residuals fall on the negative side, specifically for the lower income groups, and the
fit is not linear.
• Residuals against rating are positive for lower and higher ratings and negative elsewhere.
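Dropping limit, age, and education can also be checked jointly with a partial F-test that compares the residual sums of squares of the two regressions above. This is a sketch of that standard test, not something the report itself ran; the critical value of about 2.63 for F(0.05; 3, 394) is an approximation and is labeled as such in the code.

```python
# Residual SS and residual df, copied from the two ANOVA tables above.
rss_full, df_full = 10339084.74, 394        # five-variable model
rss_reduced, df_reduced = 10532541.29, 397  # income and rating only

dropped = df_reduced - df_full  # number of variables removed (3)

# Partial F: extra residual error per dropped variable, relative to the
# full model's residual mean square.
f_stat = ((rss_reduced - rss_full) / dropped) / (rss_full / df_full)

F_CRIT_APPROX = 2.63  # assumed approximate critical value, F(0.05; 3, 394)
print(f"partial F = {f_stat:.4f}, "
      f"dropped variables jointly significant: {f_stat > F_CRIT_APPROX}")
```

A partial F below the critical value is consistent with the bullets above: removing the three variables costs little explanatory power.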
_________________________________________________________________________________________
Concluding Remarks:
• In conclusion, income and rating are the two important
variables contributing to the change in balance.
• Limit, age, and education are not significant variables for the balance at the 0.05 level.
___________________________________________________________________________________