
1) A company manager says that the average balance on their credit cards
is $500. Do you think that this assertion is justified? Use a one-sample
t-test to draw your conclusion.

Solution:
In this case:
Null hypothesis: H0 : x = 500 (where x is the population average balance on
the credit cards)
Alternate hypothesis: H1 : x ≠ 500

One-Sample t-Test

Variable 1 Variable 2
Mean 520.015 0
Variance 211378.2 0
Observations 400 2
Hypothesized Mean 500
df 399
t Stat 0.870674
P(T<=t) one-tail 0.192228
t Critical one-tail 1.648682
P(T<=t) two-tail 0.384456
t Critical two-tail 1.965927

From the table above, which is also shown in “Ans 1” of the Excel sheet, the
two-tail P value is 0.384456, which is greater than 0.05. Equivalently, the t
statistic of 0.870674 is smaller than the two-tail critical value of 1.965927.
We therefore fail to reject the null hypothesis at the 5% significance level.
The sample mean of 520.015 is not significantly different from $500, so the
data are consistent with the manager's assertion.
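
The t statistic in the table can be reproduced from the summary figures alone. The following is a minimal Python sketch, assuming only the sample mean, variance and number of observations reported above; the function name `one_sample_t` is illustrative and is not part of the Excel workbook.

```python
import math

def one_sample_t(mean, variance, n, hypothesized_mean):
    """Return the one-sample t statistic computed from summary statistics."""
    standard_error = math.sqrt(variance / n)
    return (mean - hypothesized_mean) / standard_error

# Figures taken from the One-Sample t-Test table above.
t_stat = one_sample_t(mean=520.015, variance=211378.2, n=400, hypothesized_mean=500)
t_critical_two_tail = 1.965927  # two-tail critical value reported for df = 399

# Reject H0 only if |t| exceeds the two-tail critical value.
reject_h0 = abs(t_stat) > t_critical_two_tail
print(round(t_stat, 4), "reject H0" if reject_h0 else "fail to reject H0")
```

Comparing |t| with the critical value leads to the same decision as comparing the two-tail P value with 0.05.
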
2) Is there a difference between men and women as far as average balance
is concerned? Use a two-sample t-test to draw your conclusion.

Solution
In regard to this case,
Let us assume: Average balance for men = x1 and Average balance for
women = x2.
Null hypothesis: - H0 : x1 = x2
Alternate Hypothesis is: - H1 : x1 ≠ x2

Two-Sample t-Test

Variable 1 Variable 2
Mean 509.8031 529.5362
Variance 213554.6 210187.1
Observations 193 207
Hypothesized Mean Difference 0
df 396
t Stat -0.42838
P(T<=t) one-tail 0.334302
t Critical one-tail 1.648711
P(T<=t) two-tail 0.668604
t Critical two-tail 1.965973

From the table above, which is also shown in “Ans 2” of the Excel sheet, the
two-tail P value is 0.668604, which is greater than 0.05. We therefore fail to
reject the null hypothesis at the 5% significance level and conclude that
there is no significant difference between men and women as far as average
balance is concerned.
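
The t statistic can again be checked from the summary figures. The sketch below assumes the unequal-variances (Welch) form of the two-sample test; this is an inference from the reported df of 396 (rather than 193 + 207 - 2 = 398) and is not stated in the source. The function name `welch_t` is illustrative.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic that does not assume equal variances."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error

# Men (Variable 1) versus women (Variable 2), from the table above.
t_men_women = welch_t(509.8031, 213554.6, 193, 529.5362, 210187.1, 207)
print(round(t_men_women, 5))
```
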
3) Is there a difference between students and non-students as far as
average balance is concerned? Use a two-sample t-test to draw your
conclusion.

Solution:
In regard to this case,
Let us assume: Average balance for students = x1 and Average balance for
non-students = x2.
Null hypothesis: - H0 : x1 = x2
Alternate Hypothesis is: - H1 : x1 ≠ x2

Two-Sample t-Test

Variable 1 Variable 2
Mean 480.3694444 876.825
Variance 193085.1361 240101.9
Observations 360 40
Hypothesized Mean Difference 0
df 46
t Stat -4.90277866
P(T<=t) one-tail 6.08619E-06
t Critical one-tail 1.678660414
P(T<=t) two-tail 0.00001217
t Critical two-tail 2.012895599

From the table above, which is also shown in “Ans 3” of the Excel sheet, the
two-tail P value is 0.00001217, which is less than 0.05. There is also a large
difference between the two sample means (480.37 for variable 1 and 876.83 for
variable 2). We therefore reject the null hypothesis at the 5% significance
level and conclude that there is a significant difference between students
and non-students as far as average balance is concerned.
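
The df of 46 in this table is far below 360 + 40 - 2 = 398. That is what the Welch-Satterthwaite approximation gives when the two groups differ greatly in size. The sketch below assumes the unequal-variances test was used and that the result is rounded to the nearest whole number; both are assumptions inferred from the reported figures, and `welch_df` is an illustrative name.

```python
def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    a = var1 / n1  # squared standard error contributed by group 1
    b = var2 / n2  # squared standard error contributed by group 2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Variances and counts from the Two-Sample t-Test table above.
df_students = welch_df(193085.1361, 360, 240101.9, 40)
print(round(df_students))
```

The small group of 40 observations dominates the standard error, so the effective df stays close to that group's own n - 1.
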
4) It is generally assumed that if there are more credit cards then the balance on the
cards will be more. Based on this dataset, do you think this is true? Calculate a
correlation coefficient and show a scatter plot to support your answer.

Solution:
With reference to this case,
we calculate the correlation coefficient between the number of credit cards
and the balance; the result is given below. The coefficient is positive but
close to zero, so the data give only weak support to the assumption that more
credit cards go with a higher balance.

           Column 1   Column 2
Column 1   1
Column 2   0.086456   1

The Correlation coefficient between credit cards and balance is 0.086456.

Scatter Plot

[Scatter plot: Balance (vertical axis, 0 to 2500) against number of credit
cards (horizontal axis, 0 to 10)]

From this scatter plot, no clear relationship between the number of cards and
the balance is visible. (Kindly refer to “Ans 4” of the Excel sheet.)
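
Whether a correlation this small differs from zero can be checked with the usual t test for a correlation coefficient. The sketch below is an addition to the original analysis; it assumes n = 400 observations (as in the other tables) and uses 1.966 as an approximate two-tail 5% critical value. The function name `correlation_t` is illustrative.

```python
import math

def correlation_t(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r_cards_balance = 0.086456  # correlation between cards and balance reported above
n_obs = 400                 # assumed number of observations in the dataset
t_corr = correlation_t(r_cards_balance, n_obs)

# 1.966 approximates the two-tail 5% critical value for df = 398
# (the t-test tables report 1.965927 for df = 399).
significant = abs(t_corr) > 1.966
print(round(t_corr, 3), significant)
```

Under these assumptions the correlation is not significant at the 5% level, which agrees with the reading of the scatter plot.
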
5) Examine whether the following demographic variables influence
balance: (a) age, (b) years of education, (c) marital status. For age and
years of education, use scatter plots to depict their relationship with
balance and calculate the correlation coefficient. For the relationship
between marital status and balance, use a two-sample t-test to draw
your conclusion

Solution:
With regard to this case,
We find the correlation coefficient of balance with age and, separately, of
balance with education. From “Ans 5.1” of the Excel sheet, the correlation
coefficient of balance and age is 0.001835119 and the correlation coefficient
of balance and education is -0.008061576. The scatter plots for balance-age
and balance-education are shown below as well as in “Ans 5.1” of the Excel
sheet.

Scatter Plot of Balance-Age

[Scatter plot: Balance (vertical axis, 0 to 2500) against age (horizontal
axis, 0 to 120)]

From the above calculation and the scatter plot, we can infer that the
correlation coefficient of age and balance is very close to zero. Thus age and
balance are not meaningfully correlated.

Scatter Plot of Balance-Education

[Scatter plot: Education (vertical axis, 0 to 25) against balance (horizontal
axis, 0 to 2500)]

From the above calculation and the scatter plot, we can infer that the
correlation coefficient of education and balance is very close to zero. Thus
education and balance are not meaningfully correlated.

Relationship of Balance and Marital Status


In regard to this case,
Let us assume: Average balance for married cardholders = x1 and average
balance for unmarried cardholders = x2.
Null hypothesis: H0 : x1 = x2
Alternate hypothesis: H1 : x1 ≠ x2

Two-Sample t-Test

Variable 1 Variable 2
Mean 517.9429 523.2903
Variance 205696.7 221735
Observations 245 155
Hypothesized Mean Difference 0
df 319
t Stat -0.11223
P(T<=t) one-tail 0.455354
t Critical one-tail 1.649644
P(T<=t) two-tail 0.910709
t Critical two-tail 1.967428

From the table above, which is also shown in “Ans 5” of the Excel sheet, the
two-tail P value is 0.910709, which is greater than 0.05. We therefore fail to
reject the null hypothesis at the 5% significance level and conclude that
there is no significant difference in average balance between married and
unmarried cardholders; marital status does not significantly influence
balance.

6) “Ethnicity of the cardholder does not matter as far as balance is
concerned.” Carry out an analysis of variance (ANOVA) and discuss
whether this statement is supported by the data or not.

Solution:
In regard to this case,
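Let us assume the average balance for African American cardholders to be x1,
for Asian cardholders to be x2 and for Caucasian cardholders to be x3.
Therefore,
Null hypothesis: H0 : x1 = x2 = x3
Alternate hypothesis: H1 : at least one of x1, x2, x3 differs from the others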

ANOVA Test

SUMMARY
Groups Count Sum Average Variance
Column 1 99 52569 531 235839.2
Column 2 102 52256 512.3137 231748.3
Column 3 199 103181 518.4975 190922.4

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 18454.2 2 9227.1 0.043443 0.957492 3.018452
Within Groups 84321458 397 212396.6

Total 84339912 399

From the table above, which is also shown in “Ans 6” of the Excel sheet, the
P value of the F test is 0.957492, which is greater than 0.05; equivalently,
F = 0.043443 is below F crit = 3.018452. We therefore fail to reject the null
hypothesis at the 5% significance level and conclude that there is no
significant difference in average balance across the three ethnic groups. The
data support the statement that the ethnicity of the cardholder does not
matter as far as balance is concerned.
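
The F statistic can be rebuilt from the SUMMARY table alone. The sketch below is a minimal check, assuming that Columns 1 to 3 follow the order African American, Asian, Caucasian given in the hypotheses; that mapping is an assumption, and the variable names are illustrative.

```python
# (count, sum, variance) for each group, copied from the SUMMARY table.
groups = {
    "African American": (99, 52569, 235839.2),   # Column 1 (assumed mapping)
    "Asian": (102, 52256, 231748.3),             # Column 2 (assumed mapping)
    "Caucasian": (199, 103181, 190922.4),        # Column 3 (assumed mapping)
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(s for _, s, _ in groups.values()) / n_total

# Between-groups sum of squares: spread of group means around the grand mean.
ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups.values())
# Within-groups sum of squares: (n - 1) * sample variance, summed over groups.
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(ss_between, 1), round(f_stat, 6))
```

Because the variances in the table are rounded, the within-groups sum of squares differs slightly from the 84321458 shown, but the F statistic agrees to the reported precision.
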
7) A general principle that credit card companies often follow is to assign a
higher credit limit to people with a higher credit rating. Does the data
show that this principle is being followed?

Solution:
With reference to this case,
We first calculate the correlation coefficient of limit and rating. The
result is 0.99688, which indicates a very strong positive correlation. To
support this further, the scatter plot is shown below as well as in “Ans 7”
of the Excel sheet.

Scatter Plot of Limit and Rating

[Scatter plot: Rating (vertical axis, 0 to 1200) against limit (horizontal
axis, 0 to 15000)]

From the scatter plot above, which is also shown in “Ans 7” of the Excel
sheet, customers with a higher credit rating have a higher credit limit.
Together with the correlation coefficient of 0.99688, the data show that the
general principle of assigning a higher credit limit to people with a higher
credit rating is being followed.
8) Run a simple linear regression of balance on the credit limit. (Here credit
limit is the X and the balance is the Y). Report the coefficients and the
R-squared. Show a scatter plot.

Solution:
In this case, we have assumed X to be credit limit and Y to be the balance.
The result of R squared and the coefficients are mentioned below as well as
in “Ans 8” of the excel sheet.

Regression Statistics
Multiple R 0.861697267
R Square 0.74252218
Adjusted R Square 0.741875251
Standard Error 233.5849982
Observations 400

ANOVA
df SS MS F Significance F
Regression 1 62624255 62624255 1147.764 2.5E-119
Residual 398 21715657 54561.95
Total 399 84339912

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -292.7904955 26.68341 -10.9728 1.18E-24 -345.249 -240.332 -345.249 -240.332
X Variable 1 0.171637278 0.005066 33.87867 2.5E-119 0.161677 0.181597 0.161677 0.181597

With reference to the output above and the calculations in “Ans 8” of the
Excel sheet, R square is 0.742522, the intercept is -292.7904955 and the
coefficient of X is 0.171637278.
Regression Equation: Y= -292.7904955 + 0.171637278*X
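
R square and F in the output follow directly from the sums of squares in the ANOVA block. The sketch below is a quick consistency check using only those reported figures; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the ANOVA block above.
ss_regression = 62624255
ss_residual = 21715657
ss_total = ss_regression + ss_residual  # matches the Total row, 84339912
df_regression, df_residual = 1, 398

# R squared: share of the total variation in balance explained by the model.
r_squared = ss_regression / ss_total
# F statistic: mean square of regression over mean square of residuals.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
print(round(r_squared, 6), round(f_stat, 2))
```

An R square of about 0.74 means that credit limit alone accounts for roughly 74% of the variation in balance in this dataset.
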

Scatter Plot

[Scatter plot: Balance (vertical axis, 0 to 2500) against credit limit
(horizontal axis, 0 to 15000)]
9) Run a simple linear regression of balance (Y) on credit rating (X). Report
the coefficients and R-squared. Show a scatter plot.

Solution:
In this case, we have assumed X to be credit rating and Y to be the balance.
The R squared and the coefficients are shown below as well as in “Ans 9” of
the Excel sheet.

Regression Statistics
Multiple R 0.863625161
R Square 0.745848418
Adjusted R Square 0.745209846
Standard Error 232.0713048
Observations 400

ANOVA
df SS MS F Significance F
Regression 1 62904790 62904790 1167.995 1.9E-120
Residual 398 21435122 53857.09
Total 399 84339912

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -390.8463418 29.06851 -13.4457 3.07E-34 -447.993 -333.699 -447.993 -333.699
X Variable 1 2.566240327 0.075089 34.17594 1.9E-120 2.418619 2.713861 2.418619 2.713861

With reference to the output above and the calculations in “Ans 9” of the
Excel sheet, R square is 0.745848418, the intercept is -390.8463418 and the
coefficient of X is 2.566240327.
Regression Equation: Y= -390.8463418 + 2.566240327*X

Scatter Plot

[Scatter plot: Balance (vertical axis, 0 to 2500) against credit rating
(horizontal axis, 0 to 1200)]
10) Consider your findings in questions 8-9. Discuss business mechanisms
to increase or decrease the balance on credit cards. Try to quantify your
answers. In this context, focus on possible specific strategies using
variables in Q8 and Q9 that the business could adopt to increase the
balance on credit cards.

Solution:
In relation to this case, let credit rating be X1, credit limit be X2 and the
balance be Y. From the regressions in questions 8 and 9, both rating and limit
have a strong positive linear relationship with balance. In question 8, each
additional USD 1 of credit limit is associated with about USD 0.17 of
additional balance, so a USD 1,000 higher limit corresponds to roughly USD 172
more balance. In question 9, each additional rating point is associated with
about USD 2.57 of additional balance. The two variables are themselves very
highly correlated (0.99688 in question 7): a higher credit rating goes with a
higher credit limit, which in turn goes with a higher balance, and the
opposite holds for a lower credit rating. Therefore, to increase the balance
on credit cards, the business could raise credit limits, particularly for
customers whose credit rating supports it; lowering limits would be expected
to reduce balances. These figures describe associations in the data, so the
actual effect of changing a limit may differ.

11) The credit limit is provided as a consolidated amount for all the credit
cards the cardholder has. Run a multiple linear regression of Balance
(Y) on Limit and Cards as two X variables. Report the coefficients.
Discuss the effect on the balance of (a) increasing the credit limit on the
same number of cards and (b) increasing the number of cards without
altering the total credit limit.

Solution:
In this case, we have assumed X1 to be the limit, X2 to be the number of cards
and Y to be the balance. The R squared and the coefficients are shown below as
well as in “Ans 11” of the Excel sheet.
Regression Statistics
Multiple R 0.865188295
R Square 0.748550786
Adjusted R Square 0.74728404
Standard Error 231.1247525
Observations 400

ANOVA
df SS MS F Significance F
Regression 2 63132707 31566354 590.9238 9.8E-120
Residual 397 21207205 53418.65
Total 399 84339912

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -369.0359554 36.16415 -10.2045 7.23E-22 -440.133 -297.939 -440.133 -297.939
X Variable 1 (Limit) 0.171479037 0.005013 34.20594 2E-120 0.161623 0.181335 0.161623 0.181335
X Variable 2 (Card) 26.03375427 8.438364 3.085166 0.002177 9.444291 42.62322 9.444291 42.62322

From the output above, which is also calculated in “Ans 11” of the Excel
sheet, the intercept is -369.0359554, the coefficient of limit is 0.171479037
and the coefficient of cards is 26.03375427. Therefore,
Regression Equation is: Y = -369.0359554 + 0.171479037*X1 +
26.03375427*X2.

Both coefficients are positive and both are significant at the 5% level (P
values of 2E-120 and 0.002177). From this, we can conclude two things.
(a) Increasing the credit limit on the same number of cards: each additional
USD 1 of limit is associated with about USD 0.17 more balance. (b) Increasing
the number of cards without altering the total credit limit: each additional
card is associated with about USD 26 more balance.
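
The two effects can be read off the fitted equation by changing one variable at a time. The sketch below uses the reported coefficients; the cardholder with a limit of 5000 and 3 cards is a hypothetical example chosen for illustration, and `predict_balance` is an illustrative name.

```python
# Coefficients from the multiple regression output above.
INTERCEPT = -369.0359554
COEF_LIMIT = 0.171479037
COEF_CARDS = 26.03375427

def predict_balance(limit, cards):
    """Predicted balance from the fitted equation Y = b0 + b1*limit + b2*cards."""
    return INTERCEPT + COEF_LIMIT * limit + COEF_CARDS * cards

base = predict_balance(limit=5000, cards=3)  # hypothetical cardholder

# (a) Raise the limit by 1000 while keeping the number of cards fixed.
effect_limit = predict_balance(limit=6000, cards=3) - base
# (b) Add one card while keeping the total limit fixed.
effect_card = predict_balance(limit=5000, cards=4) - base
print(round(effect_limit, 2), round(effect_card, 2))
```

Because the model is linear, these differences do not depend on the starting values chosen for the hypothetical cardholder.
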

12) Run a simple linear regression equation with Income as X and Balance
as Y. Report the coefficients. Is the coefficient of Income significantly
different from zero? What does this say about the effect of income on
balance?

Solution:
In this case, we have assumed X to be income and Y to be the balance. The
result of R squared and the coefficients are mentioned below as well as in
“Ans 12” of the excel sheet.
Regression Statistics
Multiple R 0.463656457
R Square 0.21497731
Adjusted R Square 0.213004891
Standard Error 407.8647195
Observations 400

ANOVA
df SS MS F Significance F
Regression 1 18131167 18131167 108.9917 1.03E-22
Residual 398 66208745 166353.6
Total 399 84339912

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 246.5147506 33.19935 7.425289 6.9E-13 181.2467 311.7828 181.2467 311.7828
X Variable 1 6.048363409 0.57935 10.43991 1.03E-22 4.909394 7.187332 4.909394 7.187332

With reference to the output above and the calculations in “Ans 12” of the
Excel sheet, R square is 0.21497731, the intercept is 246.5147506 and the
coefficient of X is 6.048363409.
Regression Equation: Y= 246.5147506 + 6.048363409*X
Null hypothesis: H0 : c1 = 0 (where c1 = coefficient of X)
Alternate hypothesis: H1 : c1 ≠ 0

The P value of the coefficient is 1.03088580258906E-22, which is less than
0.05, so we reject the null hypothesis at the 5% significance level. The
coefficient of income is significantly different from zero: income has a
significant positive effect on balance, although with an R square of about
0.21 it explains only about 21% of the variation in balance.
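
The t statistic and the 95% interval for the income coefficient can be recomputed from the coefficient and its standard error. The sketch below assumes a two-tail critical value of 1.965927, borrowed from the df = 399 row of the t-test tables as an approximation for df = 398; the variable names are illustrative.

```python
coef_income = 6.048363409  # coefficient of X from the regression output
se_income = 0.57935        # its standard error
t_critical = 1.965927      # assumed: approximate two-tail 5% critical value

# t statistic for H0: coefficient = 0.
t_stat = coef_income / se_income

# 95% confidence interval: estimate plus or minus critical value times SE.
ci_lower = coef_income - t_critical * se_income
ci_upper = coef_income + t_critical * se_income

# The coefficient differs significantly from zero if the interval excludes 0.
excludes_zero = ci_lower > 0 or ci_upper < 0
print(round(t_stat, 3), round(ci_lower, 3), round(ci_upper, 3), excludes_zero)
```
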

13) Based on the equation derived in question 12, what is the estimated
balance for a person with an income of USD 100k per year?

Solution:
In this case we assume income to be X and balance to be Y.
Regression Equation: Y= 246.5147506 + 6.048363409*X
Income is taken to be measured in thousands of USD, so an income of USD 100k
corresponds to X = 100.
Therefore, the estimated balance for this person is:

246.5147506 + 6.048363409 × 100 = 851.351092.

The estimated balance is about USD 851.35.


14) Based on the dataset, explore the relationship between credit card
balance (Y) and (a) Income (b) Age (c) Education (d) Limit, and (e) Rating
as X variables. Estimate a multiple linear regression model and report
the statistical significance of each of these variables.

Solution:
In this case, let us assume X1 to be income, X2 to be age, X3 to be
education, X4 to be limit and Y to be balance. Rating is not included in the
model reported here. The coefficients are shown below as well as in “Ans 14”
of the Excel sheet.

Regression Statistics
Multiple R 0.933826525
R Square 0.872031978
Adjusted R Square 0.870736099
Standard Error 165.298439
Observations 400

ANOVA
df SS MS F Significance F
Regression 4 73547100 18386775 672.9272 7.8E-175
Residual 395 10792812 27323.57
Total 399 84339912

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -356.4394673 48.3784 -7.36774 1.02E-12 -451.551 -261.328 -451.551 -261.328
X Variable 1 (Income) -7.560341603 0.389546 -19.4081 2.08E-59 -8.32618 -6.7945 -8.32618 -6.7945
X Variable 2 (Age) -0.803431365 0.488275 -1.64545 0.100672 -1.76337 0.15651 -1.76337 0.15651
X Variable 3 (Education) 1.055668552 2.649032 0.398511 0.690469 -4.1523 6.263633 -4.1523 6.263633
X Variable 4 (Limit) 0.263715465 0.005885 44.80999 5.9E-157 0.252145 0.275286 0.252145 0.275286

With reference to the output above and the calculations in “Ans 14” of the
Excel sheet, the intercept is -356.4394673, the coefficient of X1 is
-7.560341603, the coefficient of X2 is -0.803431365, the coefficient of X3 is
1.055668552 and the coefficient of X4 is 0.263715465. The P values of income
and limit are less than 0.05, so for these two variables we reject the null
hypothesis of a zero coefficient at the 5% significance level; their
coefficients are significant and they affect the balance. On the other hand,
the P values of age (0.100672) and education (0.690469) are higher than 0.05,
so we fail to reject the null hypothesis for them; their coefficients are not
statistically significant and there is no evidence that they affect the
balance.
Y= -356.4394673 - 7.560341603*X1 - 0.803431365*X2 + 1.055668552*X3 +
0.263715465*X4
(where X1, X2, X3 and X4 are as defined above).
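
The adjusted R square and the significance decisions can be checked from the reported figures. The sketch below assumes a 5% significance level, as used throughout this report; the variable names are illustrative.

```python
# Figures from the Regression Statistics block above.
r_squared = 0.872031978
n_obs, n_predictors = 400, 4

# Adjusted R squared penalizes R squared for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# P values of the four X variables from the coefficients table.
p_values = {
    "Income": 2.08e-59,
    "Age": 0.100672,
    "Education": 0.690469,
    "Limit": 5.9e-157,
}
alpha = 0.05  # assumed significance level

significant = [name for name, p in p_values.items() if p < alpha]
not_significant = [name for name, p in p_values.items() if p >= alpha]
print(round(adjusted_r_squared, 6), significant, not_significant)
```
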
