Question 1:
1. A company manager says that the average balance on their credit cards is $500. Do you
think that this assertion is justified? Use a one-sample t-test to draw your conclusion.
Solution:
Balance
Mean 520.015
Variance 211378.2
Observations 400
Hypothesized Mean 500
df 399
t Stat 0.870674
P(T<=t) one-tail 0.192228
t Critical one-tail 1.648682
P(T<=t) two-tail 0.384456
t Critical two-tail 1.965927
Inference:
The two-tail p-value (0.384) is greater than 0.05, so we cannot reject the null hypothesis that the
average balance is $500. The data are consistent with the manager's assertion.
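The t statistic in the table can be reproduced from the summary figures alone. The sketch below is a minimal check, assuming the reported mean, variance and count are the sample statistics of the Balance column; the function name `one_sample_t` is illustrative and not part of the original solution.

```python
import math

def one_sample_t(sample_mean, sample_var, n, hypothesized_mean):
    """t statistic for H0: population mean equals hypothesized_mean."""
    standard_error = math.sqrt(sample_var / n)
    return (sample_mean - hypothesized_mean) / standard_error

# Summary values copied from the Question 1 table.
t_stat = one_sample_t(520.015, 211378.2, 400, 500)

# Two-tail critical value for df = 399, also copied from the table.
t_critical_two_tail = 1.965927
reject_null = abs(t_stat) > t_critical_two_tail

print(f"t = {t_stat:.6f}, reject H0: {reject_null}")
```

Comparing |t| with the two-tail critical value leads to the same decision as comparing the two-tail p-value with 0.05.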
Question 2:
Is there a difference between men and women as far as average balance is concerned?
Use a two-sample t-test to draw your conclusion.
Solution:
Null Hypothesis: There is no difference in average balance between men and women.
Alternative Hypothesis: There is a difference in average balance between men and women.
Balance Men Women
Mean 509.8031088 529.5362319
Variance 213554.5652 210187.1043
Observations 193 207
Hypothesized Mean Difference 0
df 396
t Stat -0.42838443
P(T<=t) one-tail 0.334302083
t Critical one-tail 1.648710601
P(T<=t) two-tail 0.668604165
t Critical two-tail 1.965972608
Inference:
The two-tail p-value (0.669) is greater than 0.05, so we cannot reject the null hypothesis. The data
show no significant difference in average balance between men and women.
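The reported df of 396, rather than 193 + 207 - 2 = 398, suggests the unequal-variance (Welch) form of the two-sample t-test; that is an inference from the numbers, not something the output states. Under that assumption, the sketch below recomputes the t statistic and degrees of freedom from the summary table. The function name `welch_t` is illustrative.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    a = var1 / n1  # squared standard error contribution of group 1
    b = var2 / n2  # squared standard error contribution of group 2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Summary values copied from the Question 2 table (men first, women second).
t_stat, df = welch_t(509.8031088, 213554.5652, 193,
                     529.5362319, 210187.1043, 207)

print(f"t = {t_stat:.6f}, df = {df:.2f} (rounded: {round(df)})")
```

The rounded degrees of freedom agree with the df row of the table, which supports the assumption that the unequal-variance test was used.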
Question 3:
Solution:
Null Hypothesis: There is no difference in average balance between students and non-students.
Alternative Hypothesis: There is a difference in average balance between students and
non-students.
Inference:
The p-value is less than 0.05, so we can reject the null hypothesis. There is a significant difference
in average balance between students and non-students.
Question 4:
It is generally assumed that if there are more credit cards then the balance on the cards will be
more. Based on this dataset, do you think this is true? Calculate a correlation coefficient and show a
scatter plot to support your answer.
Solution:
The correlation coefficient between the number of cards and the balance on cards is 0.08645635.
This indicates only a weak positive linear relationship between the number of cards and the balance,
so the data give little support to the assumption.
[Scatter plot: number of cards (vertical axis, 0 to 10) against balance on card (horizontal axis, 0 to 2500)]
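One way to judge how weak a correlation of 0.086 is: convert it to a t statistic with n - 2 degrees of freedom. The sketch below assumes n = 400 (the observation count used throughout) and uses 1.966 as an approximate two-tail 5% critical value, in line with the critical values the t-tables in this document report for roughly 400 degrees of freedom. The function name `correlation_t` is illustrative.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: population correlation is zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Correlation between number of cards and balance, from Question 4.
t_cards = correlation_t(0.08645635, 400)

# Approximate two-tail 5% critical value for about 400 df (an assumption).
approx_critical = 1.966
significant = abs(t_cards) > approx_critical

print(f"t = {t_cards:.3f}, significant at 5%: {significant}")
```

Under these assumptions the correlation does not reach significance at the 5% level, which is consistent with calling the relationship weak.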
Question 5:
Examine whether the following demographic variables influence balance: (a) age, (b) years of
education, (c) marital status. For age and years of education, use scatter plots to depict their
relationship with balance and calculate the correlation coefficient. For the relationship between
marital status and balance, use a two-sample t-test to draw your conclusion.
Solution 5A:
The correlation coefficient between age and the balance on cards is 0.001835. This is almost zero,
which indicates practically no linear relationship between age and balance on card.
[Scatter plot: age (vertical axis, 0 to 120) against balance on card (horizontal axis, 0 to 2500)]
Solution 5B:
The correlation coefficient between years of education and the balance on cards is -0.008061576.
This is almost zero, which indicates practically no linear relationship between years of education
and balance on card.
[Scatter plot: years of education (vertical axis, 0 to 25) against balance on card (horizontal axis, 0 to 2500)]
Solution 5C:
Null Hypothesis: There is no difference in average balance between married and unmarried
cardholders.
Alternative Hypothesis: There is a difference in average balance between married and unmarried
cardholders.
Balance on card Married Unmarried
Mean 517.9428571 523.2903226
Variance 205696.7262 221735.0385
Observations 245 155
Hypothesized Mean Difference 0
Df 319
t Stat -0.112233601
P(T<=t) one-tail 0.455354389
t Critical one-tail 1.649644319
P(T<=t) two-tail 0.910708777
t Critical two-tail 1.967428387
Inference:
The two-tail p-value (0.911) is greater than 0.05, so we cannot reject the null hypothesis. The data
show no significant difference in average balance between married and unmarried cardholders.
Question 6:
“Ethnicity of the cardholder does not matter as far as balance is concerned.” Carry out an analysis of
variance (ANOVA) and discuss whether this statement is supported by the data or not.
Solution:
Null Hypothesis: There is no difference in average balance among the ethnic groups.
Alternative Hypothesis: At least one ethnic group has a different average balance.
Results of Analysis of Variance (ANOVA):
SUMMARY
Groups Count Sum Average Variance
African American- Balance on card 99 52569 531 235839.2
Asian- Balance on card 102 52256 512.3137 231748.3
Caucasian- Balance on card 199 103181 518.4975 190922.4
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 18454.2 2 9227.1 0.043443 0.957492 3.018452
Within Groups 84321458 397 212396.6
Inference:
The p-value (0.957) is greater than 0.05, so we cannot reject the null hypothesis. The data show no
significant difference in average balance among the ethnic groups, which supports the statement.
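The F statistic can be rebuilt from the SUMMARY table alone, since the between-group and within-group sums of squares depend only on each group's count, sum and variance. The sketch below is a minimal check under that assumption; `anova_f` is an illustrative name, and small differences from the printed SS values come from the rounded variances.

```python
def anova_f(groups):
    """One-way ANOVA F from per-group (count, total, variance) tuples."""
    n_total = sum(count for count, _, _ in groups)
    grand_mean = sum(total for _, total, _ in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(count * (total / count - grand_mean) ** 2
                     for count, total, _ in groups)
    # Within-group sum of squares: (n_i - 1) times each sample variance.
    ss_within = sum((count - 1) * variance for count, _, variance in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Count, Sum and Variance copied from the SUMMARY table.
groups = [
    (99, 52569, 235839.2),    # African American
    (102, 52256, 231748.3),   # Asian
    (199, 103181, 190922.4),  # Caucasian
]
f_stat = anova_f(groups)
f_critical = 3.018452  # F crit from the ANOVA table

print(f"F = {f_stat:.6f}, reject H0: {f_stat > f_critical}")
```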
Question 7:
A general principle that credit card companies often follow is to assign a higher credit limit to people
with a higher credit rating. Does the data show that this principle is being followed?
Solution:
The correlation coefficient between credit limit and credit rating is 0.99687. A value this close
to 1 indicates a very strong, almost perfectly linear, positive relationship between credit limit and
credit rating, so the data show that the principle is being followed.
[Scatter plot: credit rating (vertical axis, 0 to 1200) against credit limit (horizontal axis, 0 to 16000)]
Question 8:
Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and the
balance is the Y). Report the coefficients and the R-squared. Show a scatter plot.
Solution:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.861697
R Square 0.742522
Adjusted R Square 0.741875
Standard Error 233.585
Observations 400
ANOVA
df SS MS F Significance F
Regression 1 62624255 62624255 1147.764 2.5E-119
Residual 398 21715657 54561.95
Total 399 84339912
[Scatter plot: balance (vertical axis) against credit limit (horizontal axis, 0 to 16000)]
INFERENCE:
Each one-unit increase in credit limit is associated with an increase of about 0.2 units in balance,
on average. Significance F is far below 0.05, so the relationship between the two variables is
significant. The R-squared value shows that credit limit explains about 74.3% of the variance in balance.
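R Square and F in the summary output follow directly from the ANOVA sums of squares. The sketch below recomputes both from the Regression and Residual rows; `r_squared_and_f` is an illustrative name, not part of the original output.

```python
def r_squared_and_f(ss_regression, ss_residual, df_regression, df_residual):
    """R-squared and overall F statistic from an ANOVA decomposition."""
    ss_total = ss_regression + ss_residual
    r_squared = ss_regression / ss_total
    f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r_squared, f_stat

# SS and df copied from the Question 8 ANOVA table.
r2, f_stat = r_squared_and_f(62624255, 21715657, 1, 398)

print(f"R squared = {r2:.6f}, F = {f_stat:.3f}")
```

R-squared is the share of the total sum of squares captured by the regression, which is why it is read as the proportion of variance in balance explained by credit limit.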
Question 9:
Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients and R-
squared. Show a scatter plot.
Solution:
Simple Linear Regression Results:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.863625
R Square 0.745848
Adjusted R Square 0.74521
Standard Error 232.0713
Observations 400
ANOVA
df SS MS F Significance F
Regression 1 62904790 62904790 1167.994581 1.8989E-120
Residual 398 21435122 53857.09
Total 399 84339912
[Scatter plot: balance (vertical axis) against credit rating (horizontal axis, 0 to 1200), with fitted line y = 2.5662x - 390.85, R² = 0.7458]
INFERENCE:
Each one-point increase in credit rating is associated with an increase of about 2.57 units in balance,
on average. Significance F is far below 0.05, so the relationship between the two variables is
significant. The R-squared value shows that credit rating explains about 74.6% of the variance in balance.
QUESTION 10:
Consider your findings in questions 8-9. Discuss business mechanisms to increase or decrease the
balance on credit cards. Try to quantify your answers. In this context, focus on possible specific
strategies using variables in Q8 and Q9 that the business could adopt to increase the balance on
credit cards
SOLUTION:
To increase the balance, the following strategies could be adopted:
Increase the credit limit for individuals: for every one-unit increase in credit limit, balance
increases by about 0.2 units on average.
Acquire customers with a higher credit rating: this raises the average credit rating, and for every
one-point increase in credit rating, balance increases by about 2.57 units on average.
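To put numbers on these two strategies, the fitted slopes can be multiplied by a planned change in the X variable. The sketch below uses the slope 2.5662 from the Question 9 trend line and the rounded slope 0.2 quoted for Question 8. The changes of 50 rating points and 1000 units of credit limit are hypothetical inputs chosen for illustration, and the calculation assumes the fitted slope carries over to a deliberate change, which the regression alone does not establish.

```python
def expected_balance_change(slope, change_in_x):
    """Expected change in balance for a given change in X under a linear fit."""
    return slope * change_in_x

rating_slope = 2.5662  # from the Question 9 trend line
limit_slope = 0.2      # rounded value quoted in the Question 8 inference

# Hypothetical scenarios, not taken from the dataset.
gain_from_rating = expected_balance_change(rating_slope, 50)
gain_from_limit = expected_balance_change(limit_slope, 1000)

print(f"+50 rating points: about {gain_from_rating:.2f} more balance")
print(f"+1000 credit limit: about {gain_from_limit:.2f} more balance")
```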
QUESTION 11:
The credit limit is provided as a consolidated amount for all the credit cards the cardholder has. Run
a multiple linear regression of Balance (Y) on Limit and Cards as two X variables. Report the
coefficients. Discuss the effect on the balance of (a) increasing the credit limit on the same number
of cards and (b) increasing the number of cards without altering the total credit limit.
SOLUTION:
Multiple Linear Regression Results:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.865188295
R Square 0.748550786
Adjusted R Square 0.74728404
Standard Error 231.1247525
Observations 400
ANOVA
df SS MS F Significance F
Regression 2 63132707 31566354 590.9238 9.8E-120
Residual 397 21207205 53418.65
Total 399 84339912
INFERENCE:
(a) Holding the number of cards constant, each one-unit increase in the credit limit is associated
with an increase of about 0.17 units in balance. (b) Holding the credit limit constant, each additional
card is associated with an increase of about 26 units in balance. Significance F is far below 0.05, so
the regression as a whole is significant. The R-squared value shows that credit limit and number of
cards together explain about 74.9% of the variance in balance.
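With more than one X variable, Adjusted R Square is the fairer figure to compare against the simple regressions, because it penalises each added predictor. The sketch below recomputes it from R Square, the number of observations and the number of predictors; `adjusted_r_squared` is an illustrative name.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R-squared: penalises R-squared for the number of predictors."""
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1 - (1 - r_squared) * penalty

# R Square and Observations from the Question 11 output; two predictors
# (Limit and Cards).
adj_r2 = adjusted_r_squared(0.748550786, 400, 2)

print(f"Adjusted R squared = {adj_r2:.8f}")
```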
QUESTION 12:
Run a simple linear regression equation with Income as X and Balance as Y. Report the coefficients. Is
the coefficient of Income significantly different from zero? What does this say about the effect of
income on balance?
SOLUTION:
Simple Linear Regression Results:
Regression Statistics
Multiple R 0.463656457
R Square 0.21497731
Adjusted R Square 0.213004891
Standard Error 407.8647195
Observations 400
ANOVA
df SS MS F Significance F
Regression 1 18131167 18131167 108.9917 1.03E-22
Residual 398 66208745 166353.6
Total 399 84339912
[Scatter plot: balance (vertical axis, 0 to 2500) against income (horizontal axis, 0 to 200), with fitted line y = 6.0484x + 246.51]
INFERENCE:
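The coefficient of income (6.0484) is significantly different from zero: Significance F is 1.03E-22,
far below 0.05. Each one-unit increase in income is associated with an increase of about 6.05 units in
balance, on average. The R-squared value shows that income explains about 21.5% of the variance in
balance, so income has a real but modest effect on balance.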
QUESTION 13:
Based on the equation derived in question 12, what is the estimated balance for a person with an
income of USD 100k per year?
SOLUTION:
The regression equation is y = 6.0484x + 246.51. With income expressed in thousands of dollars
(x = 100), the estimated balance is 6.0484 × 100 + 246.51 = 851.35 dollars.
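The estimate is a direct substitution into the fitted line. The sketch below assumes Income is recorded in thousands of dollars, so that USD 100k corresponds to x = 100; `predict_balance` is an illustrative name.

```python
def predict_balance(income_thousands, slope=6.0484, intercept=246.51):
    """Predicted balance from the Question 12 line y = 6.0484x + 246.51."""
    return slope * income_thousands + intercept

# Assumption: income is measured in thousands of dollars, so 100k -> 100.
estimate = predict_balance(100)

print(f"Estimated balance: {estimate:.2f}")
```

If Income were recorded in plain dollars instead, x would be 100000 and the estimate would be very different, so the unit of the Income column should be confirmed before relying on the figure.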
Question 14:
Based on the dataset, explore the relationship between credit card balance (Y) and (a) Income, (b)
Age, (c) Education, (d) Limit, and (e) Rating as X variables. Estimate a multiple linear regression model
and report the statistical significance of each of these variables.
Solution:
Correlation matrix:
Income Age Education Limit Rating Balance
Income 1
Age 0.175338 1
Education -0.02769 0.003619 1
Limit 0.792088 0.100888 -0.02355 1
Rating 0.791378 0.103165 -0.03014 0.99688 1
Balance 0.463656 0.001835 -0.00806 0.861697 0.863625 1
Regression Statistics
Multiple R 0.936703
R Square 0.877412
Adjusted R Square 0.875856
Standard Error 161.9918
Observations 400
ANOVA
df SS MS F Significance F
Regression 5 74000827 14800165 564.0021 4.6E-177
Residual 394 10339085 26241.33
Total 399 84339912
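Y = -473.25 - 7.6 X1 - 0.86 X2 + 1.967 X3 + 0.079 X4 + 2.773 X5
with X1 to X5 taken in the order listed in the question: Income, Age, Education, Limit, Rating.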