
Assessment No 2 | Individual Assessment | Credit Card Study

Question 1
A company manager says that the average balance on their credit cards is $500. Do you think that this
assertion is justified? Use a one-sample t-test to draw your conclusion.

Answer

Approach: The question calls for a one-sample t-test, but the data have only one variable to compare. We therefore run a "t-Test: Two-Sample Assuming Unequal Variances" that compares the Balance column with a dummy variable that is zero for every row. Because the dummy column has zero variance, this test reduces to a one-sample t-test on Balance.

Let us state the hypotheses:

(Null Hypothesis) H0: Mean credit card balance = $500

(Alternative Hypothesis) H1: Mean credit card balance ≠ $500 (it can be less or greater, but not equal)

• The key setting in this test is the hypothesized mean difference, entered as 500. Since the dummy variable is zero for every row, testing for a mean difference of 500 between Balance and the dummy variable is the same as testing whether the mean balance equals $500.

Two-sample t-test against a dummy variable (the dummy column of zeros only serves as the baseline)

t-Test: Two-Sample Assuming Unequal Variances

Balance Dummy variable
Mean 520.015 0
Variance 211378.2253 0
Observations 400 400
Hypothesized Mean Difference 500
df 399
t Stat 0.870673781
P(T<=t) one-tail 0.192227914
t Critical one-tail 1.648681534
P(T<=t) two-tail 0.384455827
t Critical two-tail 1.965927296

Analysis
The two-tail p-value (0.384) is greater than the 0.05 significance level, so we fail to reject the null hypothesis.
Likewise, the t statistic (0.871) is smaller than the two-tail critical value (1.966), which leads to the same decision.

Conclusion
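There is not enough evidence to conclude that the average credit card balance differs from $500, so the manager's assertion is consistent with the data. Failing to reject H0 does not prove that the mean is exactly $500; it only means that the observed mean of $520.02 is within the range expected from sampling variation.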
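The dummy-column workaround gives the same t statistic as a direct one-sample t-test. A minimal Python sketch that recomputes it from the summary statistics in the table (not the raw data); the function name `one_sample_t` is our own, and the normal-approximation p-value is an assumption that is reasonable only because df is large here.

```python
import math
from statistics import NormalDist


def one_sample_t(mean, variance, n, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    standard_error = math.sqrt(variance / n)  # standard error of the sample mean
    return (mean - mu0) / standard_error, n - 1


# Summary statistics of the Balance column, taken from the Excel output above.
t_stat, df = one_sample_t(mean=520.015, variance=211378.2253, n=400, mu0=500)

# Assumption: with df = 399 the t distribution is very close to the standard
# normal, so a normal tail area approximates the two-tail p-value.
p_two_tail_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"t = {t_stat:.4f}, df = {df}, approximate two-tail p = {p_two_tail_approx:.3f}")
```

Because the dummy column has zero variance, the unequal-variances formula collapses to this one-sample formula with df = n - 1 = 399, matching the Excel output.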
Question 2
Is there a difference between men and women as far as average balance is concerned? Use a two-
sample t-test to draw your conclusion.

Approach: First filter the balances of men and women into two separate columns, then run a t-test: two-sample assuming unequal variances.

Let us state the hypotheses:

(Null Hypothesis) H0: The mean balance of men (M1) equals the mean balance of women (F1), M1 = F1
(Alternative Hypothesis) H1: The mean balances of men (M1) and women (F1) are not equal (M1 can be less or greater than F1, but not equal)

t-Test: Two-Sample Assuming Unequal Variances

Male M1 Female F1
Mean 509.8031088 529.5362319
Variance 213554.5652 210187.1043
Observations 193 207
Hypothesized Mean Difference 0
df 396
t Stat -0.42838443
P(T<=t) one-tail 0.334302083
t Critical one-tail 1.648710601
P(T<=t) two-tail 0.668604165
t Critical two-tail 1.965972608

Analysis
The two-tail p-value is greater than the 0.05 significance level: 0.6686 > 0.05. The t statistic (-0.428) also lies well inside the two-tail critical values of ±1.966.

Conclusion
Because the p-value is greater than 0.05, we cannot reject the null hypothesis. We do not have enough evidence to conclude that the mean credit card balance differs between men and women.
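Excel's "Two-Sample Assuming Unequal Variances" tool performs Welch's t-test. A minimal Python sketch that reproduces the t Stat and df above from the summary statistics alone; the function name `welch_t_test` is our own, and the same calculation applies to the comparisons in Questions 3 and 5(C).

```python
import math


def welch_t_test(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = var1 / n1  # squared standard error of mean 1
    b = var2 / n2  # squared standard error of mean 2
    t_stat = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_stat, df


# Male (M1) vs. female (F1) summary statistics from the Excel output above.
t_stat, df = welch_t_test(509.8031088, 213554.5652, 193,
                          529.5362319, 210187.1043, 207)
print(f"t = {t_stat:.4f}, df = {df:.1f}")
```

The unrounded Welch df comes out at about 395.6; Excel appears to report it rounded to the nearest whole number (396).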
Question 3
Is there a difference between students and non-students as far as average balance is concerned? Use a
two-sample t-test to draw your conclusion.

Approach: Again, filter the balance data from the sheet into two groups, students and non-students, and run a t-test: two-sample assuming unequal variances.

Let us state the hypotheses:

(Null Hypothesis) H0: The mean balance of non-students (M1) equals the mean balance of students (M2), M1 = M2
(Alternative Hypothesis) H1: The mean balances of non-students (M1) and students (M2) are not equal (M1 can be less or greater than M2, but not equal)

t-Test: Two-Sample Assuming Unequal Variances

Not students (M1) Students (M2)
Mean 480.3694444 876.825
Variance 193085.1361 240101.9429
Observations 360 40
Hypothesized Mean Difference 0
df 46
t Stat -4.902778661
P(T<=t) one-tail 6.09E-06
t Critical one-tail 1.678660414
P(T<=t) two-tail 1.21724E-05
t Critical two-tail 2.012895599

Analysis

Inference: The two-tail p-value (1.22E-05) is far smaller than the 0.05 significance level, and |t| = 4.90 exceeds the two-tail critical value of 2.013.

Conclusion
The two-sample t-test gives a p-value well below the 0.05 significance level, so we reject the null hypothesis of no difference. There is strong evidence that the mean credit card balance differs between students and non-students: students carry a higher average balance ($876.83) than non-students ($480.37).
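Why is df only 46 when there are 400 observations? A worked check of the Welch formulas, using the summary statistics above rounded to two decimals (so the last digit can differ slightly from Excel's), shows that the degrees of freedom are dominated by the small, high-variance student group (n = 40):

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}
  = \frac{480.37 - 876.83}{\sqrt{193085.14/360 + 240101.94/40}}
  = \frac{-396.46}{\sqrt{536.35 + 6002.55}}
  \approx \frac{-396.46}{80.86} \approx -4.90

\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
    = \frac{6538.90^2}{536.35^2/359 + 6002.55^2/39}
    \approx \frac{4.2757 \times 10^{7}}{9.2466 \times 10^{5}} \approx 46.2
```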
Question 4
It is generally assumed that if there are more credit cards then the balance on the cards will be more.
Based on this dataset, do you think this is true? Calculate a correlation coefficient and show a scatter
plot to support your answer.

This is a regression-type question in which one variable is assumed to affect the other: the number of credit cards is the explanatory variable X, and the credit card balance is the response variable Y. We calculate the correlation coefficient to measure the strength of the linear relationship between the two.

Correlation Matrix

No of cards Balance
No of cards 1
Balance 0.086456347 1

The correlation coefficient between the number of cards per person and the balance on their cards is 0.0864.

Scatter plot: Balance (Y) vs. number of cards (X), with linear trend line y = 28.987x + 434.29

The regression equation shown on the scatter plot is

Y = 28.987x + 434.29

where 434.29 is the intercept and 28.987 is the slope coefficient: on average, each additional card is associated with about $28.99 more balance.

Inference
1. The correlation (0.0864) is close to 0, so the relationship between the number of cards and the balance is very weak.
2. The trend line in the scatter plot is nearly flat, which again shows a weak relationship.
Therefore, the data give only weak support to the assumption: balance tends to rise slightly with the number of cards, but the number of cards explains less than 1% of the variation in balance (r² ≈ 0.0075).
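Both numbers in this question come from the same sums of squares: Pearson's r measures how tightly the points follow a straight line, while the trend-line slope equals r × (s_Y / s_X). A minimal Python sketch of both calculations; the function name `pearson_and_ols` and the toy data are hypothetical and are not the assessment dataset.

```python
import math


def pearson_and_ols(x, y):
    """Return Pearson's r plus the least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)   # correlation: strength of the linear relation
    slope = sxy / sxx                # least-squares slope, as in a linear trendline
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical toy data (NOT the credit card dataset): number of cards vs. balance.
cards = [1, 2, 3, 4, 5]
balance = [400, 520, 430, 610, 480]
r, slope, intercept = pearson_and_ols(cards, balance)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, y = {slope:.2f}x + {intercept:.2f}")
```

This is why a weak correlation can still come with a visible slope (28.987 in the real data): when balances vary widely, the points scatter far from the line even though the line itself rises.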
Question 5
Examine whether the following demographic variables influence balance: (a) age, (b) years of
education, (c) marital status. For age and years of education, use scatter plots to depict their
relationship with balance and calculate the correlation coefficient. For the relationship between
marital status and balance, use a two-sample t-test to draw your conclusion.

A. Let’s take the first demographic variable which is age and let’s see the relationship between age and
the balance of the credit cards.

For this we’ll use correlation.

Correlation Matrix
Age Balance
Age 1
Balance 0.001835119 1

From the correlation matrix, the correlation between age and credit card balance is only 0.0018, which is essentially zero: there is practically no linear relationship between age and balance.

Let's draw a scatter plot and calculate the regression equation to check this relationship again.

Scatter plot: Balance (Y) vs. age (X), with linear trend line y = 0.0489x + 517.29

The fitted equation is:

Y = 0.0489x + 517.29

Conclusion
The coefficient of age is 0.0489, which means that one additional year of age is associated with a change in balance of only about $0.05. This is negligible, so age is not a good predictor of Y (the credit card balance).
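To put the age slope in perspective, take a hypothetical 50-year age gap (chosen only for illustration). The predicted balance changes by only about $2.45, a negligible amount next to the average balance of about $520:

```latex
\Delta \hat{Y} = 0.0489 \times \Delta x = 0.0489 \times 50 \approx 2.45
```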
B. Next, let's look at the effect of years of education on the credit card balance (Y), starting with the correlation matrix.

Correlation Matrix

Years of Education Balance
Years of Education 1
Balance -0.008061576 1

According to the correlation matrix, the correlation between years of education and credit card balance is negative but essentially zero: -0.0081.

Now let's create a scatter plot and look at the fitted equation relating the two.

Scatter plot: Balance (Y) vs. years of education (X), with linear trend line y = -1.186x + 535.97

Inference
The equation derived from the scatter plot (Y = -1.186x + 535.97) shows a slightly negative, almost zero relationship between years of education and credit card balance: each extra year of education is associated with about $1.19 less balance. There is no meaningful relationship between years of education and balance.
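Squaring each reported correlation gives the share of the variation in balance that a straight-line fit on that single variable explains, which puts the "weak relationship" conclusions of Questions 4 and 5 into numbers. A minimal Python sketch using only the correlation coefficients reported above:

```python
# Correlation coefficients reported in the correlation matrices of Questions 4 and 5.
reported_r = {
    "Number of cards": 0.086456347,
    "Age": 0.001835119,
    "Years of education": -0.008061576,
}

# For a simple linear regression on one X variable, R-squared equals r squared:
# the share of the variation in balance that the straight-line fit explains.
r_squared = {variable: r * r for variable, r in reported_r.items()}

for variable, r in reported_r.items():
    print(f"{variable:<20} r = {r:+.4f}   r^2 = {r_squared[variable]:.2%}")
```

Even the strongest of the three (number of cards) explains well under 1% of the variation in balance.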
C. Let's see how the credit card balance is affected by marital status.

Let us state the hypotheses:

(Null Hypothesis) H0: The mean balance of married (M1) and not married (M2) cardholders is the same, M1 = M2
(Alternative Hypothesis) H1: The mean balances of married (M1) and not married (M2) cardholders are not equal (M1 can be less or greater than M2, but not equal)

To check whether the mean credit card balance differs between married and unmarried cardholders, we use a t-test: two-sample assuming unequal variances.

t-Test: Two-Sample Assuming Unequal Variances

Not Married (M2) Married (M1)
Mean 523.2903226 517.9428571
Variance 221735.0385 205696.7262
Observations 155 245
Hypothesized Mean Difference 0
df 319
t Stat 0.112233601
P(T<=t) one-tail 0.455354389
t Critical one-tail 1.649644319
P(T<=t) two-tail 0.910708777
t Critical two-tail 1.967428387

Conclusion
The two-tail p-value (0.911) is far greater than 0.05, so we fail to reject the null hypothesis. We do not have significant evidence that the mean balances of married and unmarried cardholders differ, so the data give no indication that marital status affects the card balance.
Question 6
Does the ethnicity of the cardholder matter as far as balance is concerned?

Approach: To analyze this, we test whether the mean balances of all the ethnicities present in the data are equal.

Let us state the hypotheses:

(Null Hypothesis) H0: The average credit card balance is equal for all 3 ethnicities, E1 = E2 = E3

(Alternative Hypothesis) H1: The average credit card balance is not equal for all 3 ethnicities (at least one mean differs from the others)

To compare the means of all three ethnicities at once, we use ANOVA (analysis of variance).

ANOVA (analysis of variance) test

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
African American 99 52569 531 235839.1633
Asian 102 52256 512.3137255 231748.3362
Caucasian 199 103181 518.4974874 190922.4129

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 18454.20047 2 9227.100236 0.043442783 0.957492 3.018452
Within Groups 84321457.71 397 212396.6189

Total 84339911.91 399

Conclusion
The p-value (0.957) is greater than the 0.05 significance level, so we cannot reject the null hypothesis. The mean credit card balances of the three ethnicities are close to equal, and we do not have significant evidence that ethnicity affects the balance.
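The F statistic in the ANOVA table can be reproduced from the SUMMARY table alone. A minimal Python sketch of single-factor ANOVA from group counts, averages, and variances; the function name `one_way_anova` is our own, and the p-value is not computed here because it needs the F distribution (for example SciPy's `scipy.stats.f.sf`, if SciPy is available).

```python
def one_way_anova(counts, means, variances):
    """One-way ANOVA sums of squares and F statistic from per-group summaries."""
    n_total = sum(counts)
    k = len(counts)
    grand_mean = sum(n * m for n, m in zip(counts, means)) / n_total
    # Between groups: how far each group average is from the overall average.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(counts, means))
    # Within groups: each sample variance times its degrees of freedom.
    ss_within = sum((n - 1) * v for n, v in zip(counts, variances))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# African American, Asian, Caucasian: counts, averages, variances from the SUMMARY table.
ss_b, ss_w, f_stat = one_way_anova(
    counts=[99, 102, 199],
    means=[531, 512.3137255, 518.4974874],
    variances=[235839.1633, 231748.3362, 190922.4129],
)
print(f"SS between = {ss_b:.2f}, SS within = {ss_w:.2f}, F = {f_stat:.4f}")
```

The between-group sum of squares is tiny compared with the within-group one, which is why F is far below the critical value of 3.018.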
Question 7
A general principle that credit card companies often follow is to assign a higher credit limit to people
with a higher credit rating. Does the data show that this principle is being followed?

Approach: This is a regression-type question in which the X variable predicts or affects the Y variable. Here X is the credit rating and Y is the credit limit. To check their relationship, we use a correlation matrix.

Rating Limit
Rating 1
Limit 0.99688 1

The correlation between rating and limit is 0.99688, an extremely strong positive relationship, so rating is a very good predictor of the limit (Y).

To find the regression equation, we use a scatter plot.

Scatter plot: Limit (Y) vs. rating (X), with linear trend line y = 14.872x - 542.93

There is a very tight relationship between rating and limit.

The equation is Y = 14.872x - 542.93

Inference
According to the equation, the credit limit increases by about $14.87 for every 1-point increase in rating. The strong positive correlation and the tight linear trend show that the data are consistent with the principle: people with higher credit ratings are assigned higher credit limits.
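The fitted line can be used to read off predicted limits. A minimal Python sketch; the function name `predicted_limit` and the ratings 300, 400, and 500 are hypothetical inputs chosen only to illustrate the slope. With r = 0.99688, r² ≈ 0.994, so the line fits these data very closely.

```python
def predicted_limit(rating):
    """Credit limit predicted by the trend line above: y = 14.872x - 542.93."""
    return 14.872 * rating - 542.93


# Hypothetical ratings, chosen only to illustrate the slope.
for rating in (300, 400, 500):
    print(rating, round(predicted_limit(rating), 2))

# Every extra 100 rating points adds 100 * 14.872 = 1487.2 to the predicted limit.
step = predicted_limit(400) - predicted_limit(300)
```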
Question 8
Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and the balance
is the Y). Report the coefficients and the R-squared. Show a scatter plot.

We will run a simple linear regression to see how well X (card limit) predicts Y (balance).

Regression model

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8616973
R Square 0.7425222
Adjusted R Square 0.7418753
Standard Error 233.585
Observations 400

ANOVA
df SS MS F Significance F
Regression 1 62624255 62624255 1147.7642 2.53E-119
Residual 398 21715657 54561.951
Total 399 84339912

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -292.7905 26.683415 -10.972752 1.184E-24 -345.24855 -240.33244 -345.24855 -240.33244
Limit (X) 0.1716373 0.0050662 33.878669 2.53E-119 0.1616774 0.1815972 0.1616774 0.1815972

From the regression model, the p-value of the limit coefficient (2.53E-119) is far below 0.05, so the credit limit is a significant predictor of card balance.
- The limit coefficient of 0.1716 means that each $1 increase in credit limit is associated with a $0.17 increase in balance on average.
- The R-squared of 0.7425 means that about 74% of the variation in credit card balance is explained by the credit limit; the remaining 26% is due to other factors.
Let's draw a scatter plot to see the regression equation.

Scatter plot: Card balance (Y) vs. credit limit (X), with linear trend line y = 0.1716x - 292.79

Inference
The regression model's p-value below 0.05 shows that the card limit is a strong predictor of the balance, and the scatter plot shows a clear positive linear relationship between the two. If the limit goes up by $1, the balance goes up by $0.1716 on average.
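The R Square, Adjusted R Square, and F values in the output follow directly from the ANOVA sums of squares. A short Python check using only the reported table values:

```python
# Sums of squares and degrees of freedom from the Question 8 ANOVA table.
ss_regression = 62624255
ss_residual = 21715657
df_regression, df_residual = 1, 398

ss_total = ss_regression + ss_residual          # 84339912, matching "Total"
r_squared = ss_regression / ss_total            # share of variation explained
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# Adjusted R-squared penalises for the number of predictors (k = df_regression).
n = df_regression + df_residual + 1             # 400 observations
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - df_regression - 1)

print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}, F = {f_stat:.2f}")
```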
Question 9
Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients and R-
squared. Show a scatter plot

We will run a simple linear regression to see how well X (card rating) predicts Y (balance).

Regression model

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.863625161
R Square 0.745848418
Adjusted R Square 0.745209846
Standard Error 232.0713048
Observations 400

ANOVA
df SS MS F Significance F
Regression 1 62904789.88 62904789.88 1167.994581 1.8989E-120
Residual 398 21435122.03 53857.09053
Total 399 84339911.91

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -390.8463418 29.06851463 -13.44569362 3.07318E-34 -447.993365 -333.6993186 -447.993365 -333.6993186
Rating (X) 2.566240327 0.075089102 34.1759357 1.8989E-120 2.418619483 2.713861171 2.418619483 2.713861171

From the regression model, the p-value of the rating coefficient (1.8989E-120) is far below 0.05, so the credit rating is a significant predictor of card balance.
- The rating coefficient of 2.566 means that each 1-point increase in rating is associated with a $2.57 increase in balance on average.
- The R-squared of 0.7458 means that about 74.6% of the variation in credit card balance is explained by the credit rating; the remaining 25.4% is due to other factors.
Scatter plot: Balance (Y) vs. credit rating (X), with linear trend line y = 2.5662x - 390.85

Inference
The p-value below 0.05 shows that the card rating is a strong predictor of the balance, and the scatter plot shows a clear positive linear relationship between the two. If the rating goes up by 1 point, the balance goes up by $2.566 on average.
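The t Stat and the 95% confidence limits in the coefficients table follow from the coefficient and its standard error. A minimal Python sketch; the critical value 1.96594 for df = 398 is an assumption hard-coded here rather than computed (SciPy's `scipy.stats.t.ppf(0.975, 398)` would compute it, if SciPy is available).

```python
# Rating coefficient and its standard error from the Question 9 output.
coef, se = 2.566240327, 0.075089102

t_stat = coef / se  # tests H0: coefficient = 0

# Assumed two-tail 5% critical value of the t distribution with df = 398.
t_crit = 1.96594
lower, upper = coef - t_crit * se, coef + t_crit * se
print(f"t = {t_stat:.2f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The interval lies far from zero, which is the same conclusion as the tiny p-value.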
Q10. Consider your findings in questions 8-9. Discuss business mechanisms to increase or decrease the balance on credit cards. Try to quantify your answers. In this context, focus on possible specific strategies using variables in Q8 and Q9 that the business could adopt to increase the balance on credit cards.

Based on Question 8

The p-value is less than 0.05, therefore the credit limit is a predictor of balance.

Coefficients: y = 0.1716x - 292.79, where y = balance on the credit card and x = credit card limit.

Interpretation of the slope (coefficient of the limit): The coefficient of the credit limit is positive; with a $1 increase in credit card limit, the average credit card balance increases by $0.1716.

Interpretation of the intercept: When the credit card limit is zero, the predicted balance on the credit card is -$292.79. This does not make much sense from a business perspective, because a balance does not apply when the credit limit is zero.

R² interpretation: About 74.25% of the variation in credit card balance is explained by the credit card limit, and the remaining 25.75% is due to other factors.

Based on Question 9

The p-value (1.8989E-120) is less than 0.05, therefore the credit rating is a predictor of balance.

Coefficients: y = 2.5662x - 390.85, where y = balance on the credit card and x = credit rating.

Interpretation/business implication of the credit rating: The coefficient of the credit rating is positive; with a 1-point increase in credit rating, the average balance on the credit card increases by $2.5662.

Interpretation/business implication of the intercept: When the credit rating is zero, the predicted balance on the credit card is -$390.85.

R² interpretation: About 74.58% of the variation in credit card balance is explained by the credit rating, and the remaining 25.42% is due to other factors.
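To quantify possible strategies, the slopes from Questions 8 and 9 can be turned into predicted changes in average balance. This is an illustration under strong assumptions: the scenarios ($1,000 higher limit, 50 points higher rating) and the function names are hypothetical, and the slopes describe associations in this dataset, not proven causal effects of a policy change.

```python
# Slopes from the simple regressions in Questions 8 and 9.
BALANCE_PER_LIMIT_DOLLAR = 0.1716   # Q8: y = 0.1716x - 292.79
BALANCE_PER_RATING_POINT = 2.5662   # Q9: y = 2.5662x - 390.85


def extra_balance_from_limit(limit_increase):
    """Change in average balance predicted for a given increase in credit limit."""
    return BALANCE_PER_LIMIT_DOLLAR * limit_increase


def extra_balance_from_rating(rating_increase):
    """Change in average balance predicted for a given increase in credit rating."""
    return BALANCE_PER_RATING_POINT * rating_increase


# Hypothetical policy scenarios, for illustration only.
print(extra_balance_from_limit(1000))   # raise a cardholder's limit by $1,000
print(extra_balance_from_rating(50))    # target customers rated 50 points higher
```

Under these assumptions, a $1,000 limit increase corresponds to about $171.60 of extra average balance per cardholder, and a 50-point higher rating to about $128.31. The limit is the lever more directly under the company's control; the rating slope is more useful for deciding which customers to target.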
Question 11
The credit limit is provided as a consolidated amount for all the credit cards the cardholder has. Run a
multiple linear regression of Balance (Y) on Limit and Cards as two X variables. Report the coefficients.
Discuss the effect on the balance of (a) increasing the credit limit on the same number of cards and (b)
increasing the number of cards without altering the total credit limit.

To see the effect of two X variables on Y, we run a multiple linear regression of balance on the number of cards and the credit limit, and report the coefficients.
Regression Model
Regression Statistics
Multiple R 0.865188295
R Square 0.748550786
Adjusted R Square 0.74728404
Standard Error 231.1247525
Observations 400

ANOVA
df SS MS F Significance F
Regression 2 63132707.37 31566353.68 590.9238244 9.7585E-120
Residual 397 21207204.54 53418.65124
Total 399 84339911.91

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0%
Intercept -369.0359554 36.16414657 -10.20447018 7.22692E-22 -440.133128 -297.9387828 -440.133128
No. of cards X1 26.03375427 8.438363509 3.085166246 0.002176819 9.444290848 42.62321769 9.444290848
Limit X2 0.171479037 0.005013136 34.20593861 2.0023E-120 0.161623424 0.18133465 0.161623424
Coefficients: the coefficient of the number of cards (X1) is 26.034, and the coefficient of the limit (X2) is 0.1715.

Approach: These coefficients differ from those of the simple linear regressions (for example, the number of cards had a coefficient of 28.99 on its own). The reason is that the model now has two predictors estimated simultaneously: the cards coefficient measures the effect of more cards with the limit held constant, and the limit coefficient measures the effect of a higher limit with the number of cards held constant.
(a) Increasing the credit limit on the same number of cards: if the limit goes up by $1, the balance goes up by $0.1715 (keeping cards constant).
(b) Increasing the number of cards without altering the total credit limit: if the number of cards goes up by 1, the balance goes up by $26.03 (keeping the limit constant).
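The two effects asked about in (a) and (b) can be read off the fitted model by changing one input at a time. A minimal Python sketch using the reported coefficients; the function name `predicted_balance` and the cardholder profile (3 cards, $5,000 total limit) are hypothetical.

```python
# Coefficients from the Question 11 multiple regression (Balance on Cards and Limit).
INTERCEPT = -369.0359554
COEF_CARDS = 26.03375427
COEF_LIMIT = 0.171479037


def predicted_balance(cards, limit):
    """Predicted balance from the fitted two-variable model."""
    return INTERCEPT + COEF_CARDS * cards + COEF_LIMIT * limit


# Hypothetical cardholder with 3 cards and a $5,000 total limit.
base = predicted_balance(3, 5000)
more_limit = predicted_balance(3, 6000) - base   # (a) +$1,000 limit, same cards
more_cards = predicted_balance(4, 5000) - base   # (b) one more card, same total limit
print(f"base = {base:.2f}, (a) = {more_limit:.2f}, (b) = {more_cards:.2f}")
```

Because the model is linear, the two differences do not depend on the starting profile: (a) is always 1,000 × 0.1715 and (b) is always 26.03.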
Question 12
Run a simple linear regression equation with Income as X and Balance as Y. Report the coefficients. Is
the coefficient of Income significantly different from zero? What does this say about the effect of
income on balance?

Let us run a simple linear regression to see the effect of income

Linear Regression Model

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.463656457
R Square 0.21497731
Adjusted R Square 0.213004891
Standard Error 407.8647195
Observations 400

ANOVA
df SS MS F Significance F
Regression 1 18131167.4 18131167.4 108.9917152 1.03089E-22
Residual 398 66208744.51 166353.6294
Total 399 84339911.91

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 246.5147506 33.19934735 7.425289058 6.90344E-13 181.2467485 311.7827527 181.2467485 311.7827527
Income X 6.048363409 0.579350163 10.43990973 1.03089E-22 4.909394402 7.187332415 4.909394402 7.187332415

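(A) The coefficients of the regression of balance on income are:

Intercept 246.514
Income X 6.048

(B) The coefficient of income (6.048) is significantly different from zero: its t statistic is 10.44, its p-value (1.03E-22) is far below 0.05, and its 95% confidence interval (4.909 to 7.187) does not contain zero.

(C) The coefficient of 6.048 means that the balance is expected to increase by $6.048 for every one-unit increase in income (income is recorded in thousands of dollars, as the USD 100k example in Question 13 uses X = 100). Income therefore has a positive and statistically significant effect on balance, although it explains only about 21.5% of the variation in balance (R Square = 0.215).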
Question 13
Based on the equation derived in question 12, what is the estimated balance for a person with an
income of USD 100k per year?

Let’s create a scatter plot for the above question to get the regression equation

Scatter plot: Balance (Y) vs. income (X), with linear trend line y = 6.0484x + 246.51

The regression equation we get from the scatterplot is Y = 6.0484x + 246.51


From the question, the income is USD 100k, so X = 100 (income is measured in thousands of dollars). The estimated balance is:

Y = 6.0484(100) + 246.51

Y = 851.35

So the estimated balance for a person with an income of USD 100k is $851.35.


Question 14
Based on the dataset, explore the relationship between credit card balance (Y) and (a) Income (b) Age
(c) Education (d) Limit, and (e) Rating as X variables? Estimate a multiple linear regression model and
report the statistical significance of each of these variables.

Regression Statistics
Multiple R 0.936702578
R Square 0.87741172
Adjusted R Square 0.875856031
Standard Error 161.9917647
Observations 400

ANOVA
df SS MS F Significance F
Regression 5 74000827.17 14800165.43 564.0020686 4.5908E-177
Residual 394 10339084.74 26241.33183
Total 399 84339911.91

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -473.2514026 55.10833546 -8.587655545 2.08837E-16 -581.5945666 -364.9082387 -581.5945666 -364.9082387
Income X1 -7.608832003 0.381931562 -19.92197755 1.37077E-61 -8.359710677 -6.85795333 -8.359710677 -6.85795333
Limit X2 0.07901642 0.044791005 1.764113581 0.078487737 -0.009042839 0.167075679 -0.009042839 0.167075679
Rating X3 2.773843725 0.667079559 4.158190261 3.93909E-05 1.462363177 4.085324273 1.462363177 4.085324273
Age X4 -0.860030445 0.478700493 -1.796594023 0.073165937 -1.801157147 0.081096257 -1.801157147 0.081096257
Education X5 1.967791521 2.605290902 0.755305874 0.450516748 -3.154218733 7.089801776 -3.154218733 7.089801776

Statistical significance of the coefficients:


A. Income: The relationship between balance and income is statistically significant, as the p-value (1.37E-61) is less than the 0.05 significance level.
B. Age: The relationship between balance and age is not statistically significant, as the p-value (0.073) is not less than the 0.05 significance level.
C. Education: The relationship between balance and education is not statistically significant, as the p-value (0.451) is not less than the 0.05 significance level.
D. Limit: The relationship between balance and limit is not statistically significant, as the p-value (0.078) is not less than the 0.05 significance level.
E. Rating: The relationship between balance and rating is statistically significant, as the p-value (3.94E-05) is less than the 0.05 significance level.
How credit balance depends on each of the variables:
A. Income: If income increases by 1 unit, balance decreases by 7.6088 (keeping other X variables constant).
B. Age: If age increases by 1 year, balance decreases by 0.8600 (keeping other X variables constant).
C. Education: If education increases by 1 year, balance increases by 1.9678 (keeping other X variables constant).
D. Limit: If limit increases by $1, balance increases by 0.0790 (keeping other X variables constant).
E. Rating: If rating increases by 1 point, balance increases by 2.7738 (keeping other X variables constant).
Only the income and rating effects are statistically significant at the 5% level.
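The significance calls above apply one mechanical rule: a coefficient is significant when its p-value is below 0.05. A minimal Python sketch of that rule, using the p-values reported in the Question 14 output:

```python
# P-values from the Question 14 multiple regression output.
p_values = {
    "Income": 1.37077e-61,
    "Limit": 0.078487737,
    "Rating": 3.93909e-05,
    "Age": 0.073165937,
    "Education": 0.450516748,
}
ALPHA = 0.05  # significance level used throughout this assessment

significant = sorted(name for name, p in p_values.items() if p < ALPHA)
not_significant = sorted(name for name, p in p_values.items() if p >= ALPHA)
print("Significant at 5%:", significant)
print("Not significant at 5%:", not_significant)
```

One likely reason Limit is not significant here, even though it was highly significant on its own in Question 8, is that Limit and Rating are almost perfectly correlated (r = 0.997 in Question 7). When two predictors move together this closely, the regression cannot separate their effects well, which inflates their standard errors.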

Presented by
ARPAN BHATIA
PGPMex ‘21
