
Business Report

Data Analysis for Decision Making

Saravanan Varadharajan 8/7/22 PGPMex


Business Report - DADM

1. A company manager says that the average balance on their credit cards is $500. Do you think that this
assertion is justified? Use a one-sample t-test to draw your conclusion.

Null Hypothesis: Average balance of credit card is $500


Alternate Hypothesis: Average balance of credit card is not $500

t-Test: One-Sample (Excel output from the "Two-Sample Assuming Unequal Variances" tool, hypothesized mean 500)

Balance
Mean 520.015
Variance 211378.2253
Observations 400
Hypothesized Mean Difference 500
df 399
t Stat 0.870673781
P(T<=t) one-tail 0.192227914
t Critical one-tail 1.648681534
P(T<=t) two-tail 0.384455827
t Critical two-tail 1.965927296

The two-tail P value (0.384) is greater than our significance level of 0.05, so the null hypothesis cannot be rejected.

• The sample mean of $520.02 is not significantly different from $500. Hence, the assertion is justified.
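
The t Stat in the table can be re-derived from the summary figures alone. The following is a minimal Python sketch, assuming only the mean, variance and observation count reported above; the function name `one_sample_t` is illustrative and not part of the workbook.

```python
import math

def one_sample_t(sample_mean, sample_variance, n, hypothesized_mean):
    """Return the one-sample t statistic and its degrees of freedom."""
    standard_error = math.sqrt(sample_variance / n)
    t_stat = (sample_mean - hypothesized_mean) / standard_error
    return t_stat, n - 1

# Summary figures copied from the Q1 table.
t_stat, df = one_sample_t(520.015, 211378.2253, 400, 500)
t_critical_two_tail = 1.965927296  # from the Q1 table, df = 399

# Two-sided decision: reject only if |t| exceeds the critical value.
reject_null = abs(t_stat) > t_critical_two_tail
print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_null}")
```

Comparing |t| with the two-tail critical value is equivalent to comparing the two-tail P value with 0.05.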

-----------------------Saravanan Varadharajan DADM.xlsx – Sheet Q1----------------------------------

2. Is there a difference between men and women as far as average balance is concerned? Use a two-
sample t-test to draw your conclusion.

Null Hypothesis: Average balance of credit card for men and women has no difference
Alternate Hypothesis: Average balance of credit card for men and women has difference

t-Test: Two-Sample Assuming Unequal Variances

Male Female
Mean 509.8031088 529.5362319
Variance 213554.5652 210187.1043
Observations 193 207
Hypothesized Mean Difference 0
df 396
t Stat -0.42838443
P(T<=t) one-tail 0.334302083
t Critical one-tail 1.648710601
P(T<=t) two-tail 0.668604165
t Critical two-tail 1.965972608

The two-tail P value (0.669) is greater than our significance level of 0.05, so the null hypothesis cannot be rejected.

• There is no significant difference between men and women as far as average balance is concerned.
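
The t Stat and df in the table can be checked from the group means, variances and counts. The sketch below assumes the unequal-variance (Welch) form named in the table title; `welch_t` is an illustrative helper, not something taken from the workbook.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se1_sq = var1 / n1
    se2_sq = var2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t_stat, df

# Male vs. Female summary figures copied from the Q2 table.
t_stat, df = welch_t(509.8031088, 213554.5652, 193,
                     529.5362319, 210187.1043, 207)
print(f"t = {t_stat:.4f}, df = {df:.1f}")
```

The same function applies to the student / non-student and married / single comparisons by substituting the figures from those tables; the df printed in the report is the rounded value.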

-----------------------Saravanan Varadharajan DADM.xlsx – Sheet Q2--------------------------------------


3. Is there a difference between students and non-students as far as average balance is concerned? Use a
two-sample t-test to draw your conclusion.

Null Hypothesis: Average balance of credit card for students and non-students has no difference
Alternate hypothesis: Average balance of credit card for students and non-students has difference

t-Test: Two-Sample Assuming Unequal Variances

Student Non Student


Mean 876.825 480.3694444
Variance 240101.9429 193085.1361
Observations 40 360
Hypothesized Mean Difference 0
df 46
t Stat 4.902778661
P(T<=t) one-tail 6.08619E-06
t Critical one-tail 1.678660414
P(T<=t) two-tail 1.21724E-05
t Critical two-tail 2.012895599

The two-tail P value (1.2E-05) is less than our significance level of 0.05, so the null hypothesis that “average balance of credit card for students and non-students has no difference” is rejected.

• Yes, there is a significant difference between students and non-students as far as average
balance is concerned

-------------------Saravanan Varadharajan DADM.xlsx – Sheet Q3 ------------------------------------------

4. It is generally assumed that if there are more credit cards then the balance on the cards will be more.
Based on this dataset, do you think this is true? Calculate a correlation coefficient and show a scatter plot
to support your answer.

Correlation Coefficient:
Cards Balance
Cards 1
Balance 0.086456 1

• The correlation coefficient (0.086) is very small and close to zero, hence there is almost no linear relationship between the number of cards and the balance on the cards.

Scatter plot:
[Scatter plot: Balance (Y) vs. Cards (X), linear trend line y = 28.987x + 434.29]

• In the scatter plot the trend line is almost flat and the points do not follow a linear pattern, so the average balance barely changes as the number of credit cards increases or decreases.
• Hence the assumption is not supported: the correlation coefficient is very small, so there is no meaningful correlation between them.
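
Whether a correlation this small differs from zero can be checked with the usual t test for a correlation coefficient. The sketch below is an illustration only: the critical value is an assumption read off the neighbouring tables in this report, not a figure from Sheet Q4.

```python
import math

def correlation_t(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r_cards_balance = 0.086456   # from the Q4 correlation table
n = 400                      # observations in the dataset
t_stat = correlation_t(r_cards_balance, n)

# Assumption: two-tail 5% critical value for df = 398, taken as roughly 1.966
# (the tables in this report give 1.9659 for df = 399 and 1.9660 for df = 396).
t_critical = 1.966
print(f"t = {t_stat:.3f}, significant at 5%: {abs(t_stat) > t_critical}")
```

Under that assumption the statistic stays below the critical value, which is consistent with the conclusion that the data do not support the claim.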

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q4-------------------------------


5. Examine whether the following demographic variables influence balance: (a) age, (b) years of education,
(c) marital status. For age and years of education, use scatter plots to depict their relationship with
balance and calculate the correlation coefficient. For the relationship between marital status and
balance, use a two-sample t-test to draw your conclusion

Correlation Coefficient:

Age Education Balance


Age 1
Education 0.003619 1
Balance 0.001835 -0.00806 1

Scatter plots:
[Scatter plot: Balance (Y) vs. Age (X), with linear trend line]
[Scatter plot: Balance (Y) vs. Years of education (X), with linear trend line]

The trend lines show no correlation, hence credit balance does not depend on age or years of education.

• The demographic variables years of education, age, marital status had no influence on balance.

Null Hypothesis: Average balance of credit card for single and married is same
Alternate Hypothesis: Average balance of credit card for single and married is different


t-Test: Two-Sample Assuming Unequal Variances

Married Single
Mean 517.9428571 523.2903226
Variance 205696.7262 221735.0385
Observations 245 155
Hypothesized Mean Difference 0
df 319
t Stat -0.112233601
P(T<=t) one-tail 0.455354389
t Critical one-tail 1.649644319
P(T<=t) two-tail 0.910708777
t Critical two-tail 1.967428387

• The two-tail P value (0.911) is greater than 0.05, so the null hypothesis cannot be rejected; there is no significant difference in balance due to marital status.

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q5---------------------------------

6. “Ethnicity of the cardholder does not matter as far as balance is concerned.” Carry out an analysis of
variance (ANOVA) and discuss whether this statement is supported by the data or not.

Null Hypothesis: Ethnicity of cardholder does not matter as far as balance is concerned
Alternate Hypothesis: Ethnicity of cardholder matters as far as balance is concerned

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
African American 99 52569 531 235839.1633
Asian 102 52256 512.3137255 231748.3362
Caucasian 199 103181 518.4974874 190922.4129

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 18454.20047 2 9227.100236 0.043442783 0.957491888 3.018451995
Within Groups 84321457.71 397 212396.6189

Total 84339911.91 399

• The P value (0.957) is greater than 0.05, so the null hypothesis cannot be rejected: ethnicity of the cardholder does not matter as far as balance is concerned.
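
The F value in the ANOVA table follows directly from the SUMMARY rows. The sketch below rebuilds it from the group counts, sums and variances, assuming nothing beyond the figures printed above; the variable names are illustrative.

```python
# (count, sum, variance) per group, copied from the ANOVA SUMMARY table.
groups = {
    "African American": (99, 52569, 235839.1633),
    "Asian": (102, 52256, 231748.3362),
    "Caucasian": (199, 103181, 190922.4129),
}

total_n = sum(n for n, _, _ in groups.values())
grand_mean = sum(s for _, s, _ in groups.values()) / total_n

# Between-group and within-group sums of squares.
ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups.values())
ss_within = sum((n - 1) * var for n, _, var in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

f_critical = 3.018451995  # F crit from the ANOVA table
print(f"F = {f_stat:.4f}, reject H0: {f_stat > f_critical}")
```

Because F is far below F crit, the variation between ethnic groups is small relative to the variation within them.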

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q6--------------------------------------


7. A general principle that credit card companies often follow is to assign a higher credit limit to people
with a higher credit rating. Does the data show that this principle is being followed?

• Yes, that principle is being followed.

Correlation Coefficient:

Limit Rating
Limit 1
Rating 0.99688 1

Scatter plot:
[Scatter plot: Rating (Y) vs. Limit (X)]

• It is true that credit card companies follow the principle of assigning a higher credit limit to people with a higher credit rating. This is justified by the very high correlation (0.997) between Limit and Rating.

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q7---------------------------------

8. Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and the balance is
the Y). Report the coefficients and the R-squared. Show a scatter plot. State inference

Simple linear regression:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.861697267
R Square 0.74252218
Adjusted R Square 0.741875251
Standard Error 1172.703019
Observations 400

ANOVA
df SS MS F Significance F
Regression 1 1578442502 1578442502 1147.764214 2.5306E-119
Residual 398 547342484 1375232.372
Total 399 2125784986

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 2485.956595 88.58573292 28.06271973 2.28634E-96 2311.802153 2660.111036 2311.802153 2660.111036
Balance 4.326112527 0.127694288 33.87866901 2.5306E-119 4.075072921 4.577152132 4.075072921 4.577152132


Scatter Plot:
[Scatter plot: Balance vs. credit limit, with linear trend line; R² = 0.7425]

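• Credit limit is a significant factor: the regression gives R² = 0.7425 with a P-value of 2.5E-119. Note that the coefficient row in the output above is labelled Balance, so the reported intercept (2485.96) and slope (4.326) describe Limit as a function of Balance; R² is the same in either direction.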
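
Since the coefficient row in the Q8 output is labelled Balance rather than Limit, the intercept and slope shown appear to come from regressing Limit on Balance. For a simple regression the two slopes multiply to R², and the fitted line passes through the point of means, so the Balance-on-Limit equation can be recovered from figures already in this report. The following is a sketch under that assumption; the variable names are illustrative and the mean balance is taken from Q1.

```python
# Figures copied from the Q8 output (read as Limit regressed on Balance) and Q1.
r_squared = 0.74252218
slope_limit_on_balance = 4.326112527
intercept_limit_on_balance = 2485.956595
mean_balance = 520.015

# The fitted line passes through the means, which gives the mean credit limit.
mean_limit = intercept_limit_on_balance + slope_limit_on_balance * mean_balance

# The product of the two simple-regression slopes equals R squared.
slope_balance_on_limit = r_squared / slope_limit_on_balance
intercept_balance_on_limit = mean_balance - slope_balance_on_limit * mean_limit

print(f"Balance = {slope_balance_on_limit:.4f} * Limit "
      f"{intercept_balance_on_limit:+.2f}")
```

Under this reading, each additional dollar of credit limit is associated with roughly 0.17 more in balance, in line with the Limit coefficient reported in Q11.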

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q8--------------------------------------

9. Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients and R-squared.
Show a scatter plot. State inference

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.863625161
R Square 0.745848418
Adjusted R Square 0.745209846
Standard Error 232.0713048
Observations 400

ANOVA
df SS MS F Significance F
Regression 1 62904789.88 62904789.88 1167.994581 1.8989E-120
Residual 398 21435122.03 53857.09053
Total 399 84339911.91

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -390.8463418 29.06851463 -13.44569362 3.07318E-34 -447.993365 -333.6993186 -447.993365 -333.6993186
Rating 2.566240327 0.075089102 34.1759357 1.8989E-120 2.418619483 2.713861171 2.418619483 2.713861171

Scatter Plot:
[Scatter plot: Balance (Y) vs. Rating (X), linear trend line y = 2.5662x - 390.85, R² = 0.7458]

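• Credit rating influences the credit balance: each additional rating point is associated with about 2.57 more in balance.
• The two are strongly correlated (Multiple R = 0.864, R² = 0.7458).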

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q9--------------------------------------


10. Consider your findings in questions 8-9. Discuss business mechanisms to increase or decrease the
balance on credit cards. Try to quantify your answers. In this context, focus on possible specific
strategies using variables in Q8 and Q9 that the business could adopt to increase the balance on credit
cards.

• The credit rating and the credit limit both have a significant impact on credit card balance.
• Both are strongly correlated with balance (R² of about 0.74 to 0.75 in Q8 and Q9).
• The balance is high for those who have a high credit rating and a high credit limit.
• Both rating and limit are significant predictors of balance.

• From Q9, each additional rating point is associated with about 2.57 more in balance. Based on the analysis, the balance of cardholders with a higher rating and higher credit limit can be increased, whereas the balance of cardholders with a lower rating and lower credit limit should be decreased.

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q10--------------------------------------

11. The credit limit is provided as a consolidated amount for all the credit cards the cardholder has. Run a
multiple linear regression of Balance (Y) on Limit and Cards as two X variables. Report the coefficients.
Discuss the effect on the balance of (a) increasing the credit limit on the same number of cards and (b)
increasing the number of cards without altering the total credit limit.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.865188295
R Square 0.748550786
Adjusted R Square 0.74728404
Standard Error 231.1247525
Observations 400

ANOVA
df SS MS F Significance F
Regression 2 63132707.37 31566353.68 590.9238244 9.7585E-120
Residual 397 21207204.54 53418.65124
Total 399 84339911.91

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -369.0359554 36.16414657 -10.20447018 7.22692E-22 -440.133128 -297.9387828 -440.133128 -297.9387828
Cards 26.03375427 8.438363509 3.085166246 0.002176819 9.444290848 42.62321769 9.444290848 42.62321769
Limit 0.171479037 0.005013136 34.20593861 2.0023E-120 0.161623424 0.18133465 0.161623424 0.18133465

• Credit limit and number of cards are both significant predictors of credit balance (P-values 2.0E-120 and 0.002).
• Both have a significant positive impact on the balance.
• Multiple R = 0.865 and R-square = 0.748
• (a) An increase of one unit ($) in the credit limit on the same number of cards increases the balance by 0.17 (credit limit is measured on a much bigger scale than cards; its t Stat is 34.2).
• (b) An increase of one card without altering the total credit limit increases the balance by 26.03.
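
The two effects can be illustrated by plugging values into the fitted equation. In the sketch below the coefficients are copied from the Q11 output, while the cardholder (limit 5,000 on 3 cards) is a hypothetical example chosen only for illustration.

```python
# Coefficients copied from the Q11 regression output.
INTERCEPT = -369.0359554
COEF_CARDS = 26.03375427
COEF_LIMIT = 0.171479037

def predicted_balance(limit, cards):
    """Predicted balance from the fitted two-variable model."""
    return INTERCEPT + COEF_LIMIT * limit + COEF_CARDS * cards

# Hypothetical cardholder used only for illustration.
base = predicted_balance(limit=5000, cards=3)

# (a) Raise the total limit by 1,000 on the same number of cards.
effect_limit = predicted_balance(limit=6000, cards=3) - base
# (b) Add one card without altering the total limit.
effect_card = predicted_balance(limit=5000, cards=4) - base

print(f"(a) +1,000 limit: {effect_limit:+.2f}")
print(f"(b) +1 card:      {effect_card:+.2f}")
```

Because the model is linear, these differences do not depend on the starting limit or number of cards.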

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q11--------------------------------------


12. Run a simple linear regression equation with Income as X and Balance as Y. Report the coefficients. Is
the coefficient of Income significantly different from zero? What does this say about the effect of income
on balance?

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.463656457
R Square 0.21497731
Adjusted R Square 0.213004891
Standard Error 407.8647195
Observations 400

ANOVA
df SS MS F Significance F
Regression 1 18131167.4 18131167.4 108.9917152 1.03089E-22
Residual 398 66208744.51 166353.6294
Total 399 84339911.91

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 246.5147506 33.19934735 7.425289058 6.90344E-13 181.2467485 311.7827527 181.2467485 311.7827527
Income 6.048363409 0.579350163 10.43990973 1.03089E-22 4.909394402 7.187332415 4.909394402 7.187332415

Income Balance
Income 1
Balance 0.463656457 1

• Correlation coefficient for the two variables = 0.46

• The regression coefficient of income is 6.048.
• Yes, it is significantly different from zero: its 95% confidence interval runs from 4.91 to 7.19 and its P-value is 1.03E-22.
• Adding one unit of income increases the balance by 6.05, so income is a significant predictor.
• Its t Stat shows that the coefficient is 10.4 standard errors away from zero.
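
The t Stat is simply the coefficient divided by its standard error, and in a simple regression it also ties together the F and R Square figures in the same output. The following is a small sketch of those identities, using only numbers from the Q12 table; the variable names are illustrative.

```python
# Figures copied from the Q12 regression output.
coef_income = 6.048363409
std_error = 0.579350163
residual_df = 398

# Number of standard errors the coefficient lies away from zero.
t_stat = coef_income / std_error

# In a simple regression, F equals t squared and R squared = t^2 / (t^2 + df).
f_stat = t_stat ** 2
r_squared = f_stat / (f_stat + residual_df)

print(f"t = {t_stat:.4f}, F = {f_stat:.4f}, R^2 = {r_squared:.4f}")
```

A coefficient can therefore be highly significant while the model still explains only about a fifth of the variation in balance.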

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q12--------------------------------------

13. Based on the equation derived in question 12, what is the estimated balance for a person with an
income of USD 100k per year?

[Scatter plot: Balance (Y) vs. Income (X), linear trend line y = 6.0484x + 246.51, R² = 0.215]

• Based on the equation derived, Y = 6.0484 (X) + 246.51
• X = Income (in thousands of USD, so USD 100k corresponds to X = 100)
• Y = 6.0484 (100) + 246.51 = 851.35 USD
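
The same calculation as a reusable function, in case other income levels need to be estimated. This is a minimal sketch; `estimated_balance` is an illustrative name, and income is assumed to be expressed in thousands of USD as in the calculation above.

```python
def estimated_balance(income_thousands):
    """Estimated balance from the Q12 trend line y = 6.0484x + 246.51."""
    return 6.0484 * income_thousands + 246.51

# Income is expressed in thousands of USD, so USD 100k is x = 100.
balance_100k = estimated_balance(100)
print(f"Estimated balance: {balance_100k:.2f} USD")
```

With R² = 0.215 and a regression standard error of about 408 (Q12), individual balances can differ widely from this point estimate.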

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q13--------------------------------------


14. Based on the dataset, explore the relationship between credit card balance (Y) and (a) Income (b) Age
(c) Education (d) Limit, and (e) Rating as X variables? Estimate a multiple linear regression model and
report the statistical significance of each of these variables.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.936702578
R Square 0.87741172
Adjusted R Square 0.875856031
Standard Error 161.9917647
Observations 400

ANOVA
df SS MS F Significance F
Regression 5 74000827.17 14800165.43 564.0020686 4.5908E-177
Residual 394 10339084.74 26241.33183
Total 399 84339911.91

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -473.2514026 55.10833546 -8.587655545 2.08837E-16 -581.5945666 -364.9082387 -581.5945666 -364.9082387
Income -7.608832003 0.381931562 -19.92197755 1.37077E-61 -8.359710677 -6.85795333 -8.359710677 -6.85795333
Limit 0.07901642 0.044791005 1.764113581 0.078487737 -0.009042839 0.167075679 -0.009042839 0.167075679
Rating 2.773843725 0.667079559 4.158190261 3.93909E-05 1.462363177 4.085324273 1.462363177 4.085324273
Age -0.860030445 0.478700493 -1.796594023 0.073165937 -1.801157147 0.081096257 -1.801157147 0.081096257
Education 1.967791521 2.605290902 0.755305874 0.450516748 -3.154218733 7.089801776 -3.154218733 7.089801776

Income Limit Rating Age Education Balance


Income 1
Limit 0.792088341 1
Rating 0.791377625 0.996879737 1
Age 0.175338403 0.100887922 0.103164996 1
Education -0.027691982 -0.023548534 -0.030135627 0.003619285 1
Balance 0.463656457 0.861697267 0.863625161 0.001835119 -0.008061576 1


Regression with only Income and Rating as X variables

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.93547739
R Square 0.875117948
Adjusted R Square 0.874488819
Standard Error 162.8813393
Observations 400

ANOVA
df SS MS F Significance F
Regression 2 73807370.62 36903685.31 1390.999823 4.5212E-180
Residual 397 10532541.29 26530.33071
Total 399 84339911.91

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -534.8121502 21.60269845 -24.75672896 1.66359E-82 -577.2821357 -492.3421648 -577.2821357 -492.3421648
Income -7.672124366 0.378462026 -20.2718472 3.1071E-63 -8.416164597 -6.928084134 -8.416164597 -6.928084134
Rating 3.949264832 0.086209035 45.81033566 1.4482E-160 3.77978154 4.118748125 3.77978154 4.118748125

• Based on the multiple regression analysis, income and rating are the two statistically significant factors at the 0.05 level, based on the P-values.
• All the variables together, i.e. income, education, age, limit, and rating, explain 87.7% of the variation in the credit card balance.
• Retaining the Xs with low P-values, i.e. only income and rating, the regression analysis was done again.
• This regression with two variables explains 87.5% of the variation in the credit card balance, which is almost the same R-square value as before.
• Based on that, it is very clear that income and rating are the two statistically significant variables.
• The errors (residuals) were also examined and their pattern studied.
• More of the residuals are on the negative side, specifically for the lower income groups, and the line of fit is also not linear.
• The residuals against rating are on the positive side for lower and higher ratings and on the negative side in between.
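
One way to compare the five-variable and two-variable models is adjusted R Square, which penalises extra predictors. The sketch below recomputes it from the R Square values in the two outputs above; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R squared, which penalises extra predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# R Square values copied from the two Q14 outputs.
adj_five_vars = adjusted_r_squared(0.87741172, 400, 5)
adj_two_vars = adjusted_r_squared(0.875117948, 400, 2)

print(f"5 predictors: {adj_five_vars:.4f}")
print(f"2 predictors: {adj_two_vars:.4f}")
```

The two values differ only in the third decimal place, which supports keeping the simpler model with income and rating.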

----------------------------Saravanan Varadharajan DADM.xlsx – Sheet Q14--------------------------------------

_________________________________________________________________________________________

Concluding Remarks:

• In conclusion, income and rating are the two important variables contributing to the change in balance.

• Limit, age, and education are not significant variables for the balance once income and rating are in the model.

___________________________________________________________________________________
