
DADM ASSESSMENT-2

Question 1:

1. A company manager says that the average balance on their credit cards is $500. Do you
think that this assertion is justified? Use a one-sample t-test to draw your conclusion.
Solution:

Null Hypothesis: Average balance on credit card is $500.

Alternative Hypothesis: Average balance on credit card is not $500.

Results of One Sample T-test:

Balance
Mean 520.015
Variance 211378.2
Observations 400
Hypothesized Mean 500
df 399
t Stat 0.870674
P(T<=t) one-tail 0.192228
t Critical one-tail 1.648682
P(T<=t) two-tail 0.384456
t Critical two-tail 1.965927

Inference:
The two-tailed p-value (0.384) is greater than 0.05, so we cannot reject the null hypothesis. The data
are consistent with the manager's assertion that the average balance on credit cards is $500.
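
As a rough check, the t statistic in the table can be recomputed from the reported mean, variance and number of observations. The sketch below uses only the summary figures from the table; the function name is illustrative and not part of the original analysis.

```python
import math

def one_sample_t(mean, variance, n, mu0):
    """t statistic for H0: population mean equals mu0, from summary statistics."""
    standard_error = math.sqrt(variance / n)
    return (mean - mu0) / standard_error

# Summary statistics taken from the one-sample t-test table
t_stat = one_sample_t(mean=520.015, variance=211378.2, n=400, mu0=500)

# Two-tailed critical value reported in the table (df = 399, alpha = 0.05)
t_critical_two_tail = 1.965927

# Reject H0 only if |t| exceeds the critical value
reject_null = abs(t_stat) > t_critical_two_tail
print(round(t_stat, 4), reject_null)
```

Because |t| is well below the critical value, the check agrees with the p-value based decision.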

Question 2:

Is there a difference between men and women as far as average balance is concerned?
Use a two-sample t-test to draw your conclusion.

Solution:

Null Hypothesis: There is no difference in average balance between men and women.
Alternative Hypothesis: There is a difference in average balance between men and women.

Results of two Sample T-test:

Balance (Men) Balance (Women)
Mean 509.8031088 529.5362319
Variance 213554.5652 210187.1043
Observations 193 207
Hypothesized Mean Difference 0
df 396
t Stat -0.42838443
P(T<=t) one-tail 0.334302083
t Critical one-tail 1.648710601
P(T<=t) two-tail 0.668604165
t Critical two-tail 1.965972608

Inference:
The two-tailed p-value (0.669) is greater than 0.05, so we cannot reject the null hypothesis. There is
no statistically significant difference in average balance between men and women.
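
The reported df of 396 (rather than 193 + 207 - 2 = 398) suggests that an unequal-variance (Welch) t-test was used; this is an inference from the table, not something the solution states. Under that assumption, a minimal sketch reproducing the t statistic and degrees of freedom from the summary figures could look like this:

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se1 = var1 / n1  # squared standard error of group 1 mean
    se2 = var2 / n2  # squared standard error of group 2 mean
    t_stat = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df

# Men vs women, using the means, variances and counts from the table
t_stat, df = welch_t(509.8031088, 213554.5652, 193,
                     529.5362319, 210187.1043, 207)
print(round(t_stat, 4), round(df))
```

The same function can be applied to any of the two-sample tables in this report by substituting the corresponding means, variances and observation counts.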

Question 3:

Is there a difference between students and non-students as far as average balance is
concerned? Use a two-sample t-test to draw your conclusion.

Solution:

Null Hypothesis: There is no difference in average balance between students and non-students.
Alternative Hypothesis: There is a difference in average balance between students and non-students.

Results of two Sample T-test:

Balance of Students Balance of non-Students
Mean 876.825 480.3694444
Variance 240101.9429 193085.1361
Observations 40 360
Hypothesized Mean Difference 0
df 46
t Stat 4.902778661
P(T<=t) one-tail 6.08619E-06
t Critical one-tail 1.678660414
P(T<=t) two-tail 1.21724E-05
t Critical two-tail 2.012895599

Inference:
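The two-tailed p-value (1.22E-05) is less than 0.05, so we reject the null hypothesis. There is a
statistically significant difference in average balance between students and non-students: students
carry the higher average balance (876.83 against 480.37).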

Question 4:

It is generally assumed that if there are more credit cards then the balance on the cards will be
more. Based on this dataset, do you think this is true? Calculate a correlation coefficient and show a
scatter plot to support your answer.

Solution:
The correlation coefficient between the number of cards and the balance on cards is 0.08645635.
This indicates only a very weak positive linear relationship between the number of cards and the
balance, so the data give little support to the assumption.
[Figure: scatter plot of No of Cards (y-axis, 0 to 10) against Balance on card (x-axis, 0 to 2500)]
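
One way to judge whether a correlation this small differs from zero is the t-test for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below assumes all 400 observations were used (as in the other tables) and uses 1.966 as an approximate two-tailed 5% critical value for df = 398; both are assumptions, not figures from the solution.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: population correlation is zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.08645635  # correlation between number of cards and balance
n = 400         # assumed number of observations

t_stat = correlation_t(r, n)

# Assumed approximate critical value for df = 398, two-tailed, alpha = 0.05
T_CRITICAL = 1.966
significant = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), significant)
```

Under these assumptions the correlation is not significant at the 5% level, which is in line with calling the relationship weak.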

Question 5:

Examine whether the following demographic variables influence balance: (a) age, (b) years of
education, (c) marital status. For age and years of education, use scatter plots to depict their
relationship with balance and calculate the correlation coefficient. For the relationship between
marital status and balance, use a two-sample t-test to draw your conclusion.

Solution 5A:
The correlation coefficient between age and the balance on cards is 0.001835. This number is almost
equal to zero, which indicates that there is practically no linear relationship between age and
balance on card.

[Figure: scatter plot of Age (y-axis, 0 to 120) against Balance on card (x-axis, 0 to 2500)]

Solution 5B:
The correlation coefficient between years of education and the balance on cards is -0.008061576.
This number is almost equal to zero, which indicates that there is practically no linear relationship
between years of education and balance on card.
[Figure: scatter plot of Years of education (y-axis, 0 to 25) against Balance on card (x-axis, 0 to 2500)]

Solution 5C:
Null Hypothesis: There is no difference in average balance between married and unmarried
cardholders.
Alternative Hypothesis: There is a difference in average balance between married and unmarried
cardholders.

Results of two Sample T-test:

Married (Balance on card) Unmarried (Balance on card)
Mean 517.9428571 523.2903226
Variance 205696.7262 221735.0385
Observations 245 155
Hypothesized Mean Difference 0
Df 319
t Stat -0.112233601
P(T<=t) one-tail 0.455354389
t Critical one-tail 1.649644319
P(T<=t) two-tail 0.910708777
t Critical two-tail 1.967428387

Inference:
The two-tailed p-value (0.911) is greater than 0.05, so we cannot reject the null hypothesis. There is
no statistically significant difference in average balance between married and unmarried cardholders.

Question 6:
“Ethnicity of the cardholder does not matter as far as balance is concerned.” Carry out an analysis of
variance (ANOVA) and discuss whether this statement is supported by the data or not.

Solution:
Null Hypothesis: There is no difference in average balance across the ethnic groups.
Alternative Hypothesis: At least one ethnic group has a different average balance.
Results of Analysis of Variance (ANOVA):

SUMMARY
Groups Count Sum Average Variance
African American- Balance on card 99 52569 531 235839.2
Asian- Balance on card 102 52256 512.3137 231748.3
Caucasian- Balance on card 199 103181 518.4975 190922.4

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 18454.2 2 9227.1 0.043443 0.957492 3.018452
Within Groups 84321458 397 212396.6

Total 84339912 399

Inference:
The p-value (0.957) is greater than 0.05, so we cannot reject the null hypothesis. There is no
statistically significant difference in average balance across the ethnic groups, so the data support
the statement.
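
The F statistic in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. A small sketch, using only the sums of squares and degrees of freedom reported above, recomputes it and compares it with the reported F crit:

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """One-way ANOVA F statistic from sums of squares and degrees of freedom."""
    ms_between = ss_between / df_between  # mean square between groups
    ms_within = ss_within / df_within     # mean square within groups
    return ms_between / ms_within

# Values taken from the ANOVA table
f_stat = anova_f(ss_between=18454.2, df_between=2,
                 ss_within=84321458, df_within=397)

f_critical = 3.018452  # F crit from the table (alpha = 0.05)
reject_null = f_stat > f_critical
print(round(f_stat, 6), reject_null)
```

Since F is far below F crit, the critical-value approach leads to the same decision as the p-value.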

Question 7:
A general principle that credit card companies often follow is to assign a higher credit limit to people
with a higher credit rating. Does the data show that this principle is being followed?

Solution:
The correlation coefficient between credit limit and the credit rating is 0.99687. The correlation
coefficient is very close to 1, which indicates a very strong positive linear relationship between the
credit limit and credit rating. The data therefore show that the principle is being followed.

[Figure: scatter plot of Rating (y-axis, 0 to 1200) against Credit Limit (x-axis, 0 to 16000)]

Question 8:
Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and the
balance is the Y). Report the coefficients and the R-squared. Show a scatter plot.

Solution:

Simple Linear Regression Results:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.861697
R Square 0.742522
Adjusted R Square 0.741875
Standard Error 233.585
Observations 400

ANOVA
            df   SS        MS        F         Significance F
Regression  1    62624255  62624255  1147.764  2.5E-119
Residual    398  21715657  54561.95
Total       399  84339912

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  -292.79       26.68341        -10.9728  1.18E-24  -345.249   -240.332   -345.249     -240.332
Limit      0.171637      0.005066        33.87867  2.5E-119  0.161677   0.181597   0.161677     0.181597

[Figure: scatter plot of Balance (y-axis, -500 to 2500) against Credit limit (x-axis, 0 to 16000) with
fitted line y = 0.1716x - 292.79, R² = 0.7425]

INFERENCE:
For every one-unit increase in the credit limit, the balance increases by about 0.17 units (slope
0.1716). The p-value of the slope is less than 0.05, so there is a statistically significant relationship
between the two variables. As per the R-squared value, about 74.3% of the variation in balance is
explained by the credit limit.
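
The R-squared figures can be traced back to the ANOVA block of the regression output: R-squared is the regression sum of squares divided by the total sum of squares, and adjusted R-squared corrects it for the number of predictors. A minimal sketch with the values reported above (function names are illustrative):

```python
def r_squared(ss_regression, ss_total):
    """Share of the total variation explained by the regression."""
    return ss_regression / ss_total

def adjusted_r_squared(r2, n, k):
    """Adjust R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Sums of squares from the ANOVA block of the balance-on-limit regression
r2 = r_squared(ss_regression=62624255, ss_total=84339912)
adj_r2 = adjusted_r_squared(r2, n=400, k=1)
print(round(r2, 6), round(adj_r2, 6))
```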
Question 9:
Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients and R-
squared. Show a scatter plot.

Solution:
Simple Linear Regression Results:
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.863625
R Square 0.745848
Adjusted R Square 0.74521
Standard Error 232.0713
Observations 400

ANOVA
            df   SS        MS        F            Significance F
Regression  1    62904790  62904790  1167.994581  1.8989E-120
Residual    398  21435122  53857.09
Total       399  84339912

           Coefficients  Standard Error  t Stat    P-value      Lower 95%    Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  -390.846      29.06851        -13.4457  3.07318E-34  -447.993365  -333.699   -447.993     -333.699
Rating     2.56624       0.075089        34.17594  1.8989E-120  2.418619483  2.713861   2.418619     2.713861

[Figure: scatter plot of Balance (y-axis, -500 to 2500) against Credit Rating (x-axis, 0 to 1200) with
fitted line y = 2.5662x - 390.85, R² = 0.7458]

INFERENCE:
For every one-point increase in the credit rating, the balance increases by about 2.57 units (slope
2.5662). The p-value of the slope is less than 0.05, so there is a statistically significant relationship
between the two variables. As per the R-squared value, about 74.6% of the variation in balance is
explained by the credit rating.
QUESTION 10:
Consider your findings in questions 8-9. Discuss business mechanisms to increase or decrease the
balance on credit cards. Try to quantify your answers. In this context, focus on possible specific
strategies using variables in Q8 and Q9 that the business could adopt to increase the balance on
credit cards

SOLUTION:
To increase the balance, the following strategies could be adopted:
• Increase the credit limit for individuals: for every one-unit increase in credit limit, the balance
increases by about 0.17 units (Q8).
• Acquire customers with a higher credit rating: this will raise the average credit rating. For every
one-point increase in credit rating, the balance increases by about 2.57 units (Q9).
To decrease the balance, the same levers can be moved in the opposite direction.
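
To quantify these strategies, the slopes from Q8 and Q9 can be multiplied by a planned change in the corresponding variable. The increments below (1,000 units of credit limit, 50 rating points) are hypothetical values chosen only for illustration.

```python
LIMIT_SLOPE = 0.171637   # slope of balance on credit limit (Q8)
RATING_SLOPE = 2.56624   # slope of balance on credit rating (Q9)

def balance_change(slope, delta_x):
    """Estimated change in balance for a change delta_x in the predictor."""
    return slope * delta_x

# Hypothetical increments, not taken from the assignment
limit_effect = balance_change(LIMIT_SLOPE, 1000)  # raise limit by 1,000
rating_effect = balance_change(RATING_SLOPE, 50)  # raise rating by 50 points
print(round(limit_effect, 2), round(rating_effect, 2))
```

The two estimates come from separate simple regressions, and limit and rating are almost perfectly correlated (0.99687 in Q7), so they should be read as alternative views of the same effect rather than added together.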

QUESTION 11:
The credit limit is provided as a consolidated amount for all the credit cards the cardholder has. Run
a multiple linear regression of Balance (Y) on Limit and Cards as two X variables. Report the
coefficients. Discuss the effect on the balance of (a) increasing the credit limit on the same number
of cards and (b) increasing the number of cards without altering the total credit limit.

SOLUTION:
Multiple Linear Regression Results:
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.865188295
R Square 0.748550786
Adjusted R Square 0.74728404
Standard Error 231.1247525
Observations 400

ANOVA
            df   SS        MS        F         Significance F
Regression  2    63132707  31566354  590.9238  9.8E-120
Residual    397  21207205  53418.65
Total       399  84339912

           Coefficients   Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  -369.0359554   36.16415        -10.2045  7.23E-22  -440.133   -297.939   -440.133     -297.939
Limit      0.171479037    0.005013        34.20594  2E-120    0.161623   0.181335   0.161623     0.181335
Cards      26.03375427    8.438364        3.085166  0.002177  9.444291   42.62322   9.444291     42.62322

INFERENCE:
(a) For every one-unit increase in the credit limit, the balance increases by about 0.17 units, keeping
the number of cards constant. (b) For every additional card, the balance increases by about 26 units,
keeping the total credit limit constant. The p-values of both coefficients are less than 0.05, so both
credit limit and number of cards have a statistically significant relationship with balance. As per
the R-squared value, about 74.9% of the variation in balance is explained by credit limit and number
of cards together.
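
The two scenarios in the question can be compared by plugging values into the fitted equation. The sketch below uses the coefficients from the table; the baseline cardholder (limit 5,000 with 3 cards) is a hypothetical example, not a case from the dataset.

```python
INTERCEPT = -369.0359554
B_LIMIT = 0.171479037   # coefficient of Limit
B_CARDS = 26.03375427   # coefficient of Cards

def predicted_balance(limit, cards):
    """Predicted balance from the two-variable regression."""
    return INTERCEPT + B_LIMIT * limit + B_CARDS * cards

# Hypothetical baseline cardholder
base = predicted_balance(limit=5000, cards=3)

# (a) raise the limit by 1,000 on the same number of cards
effect_limit = predicted_balance(limit=6000, cards=3) - base

# (b) add one card without changing the total limit
effect_cards = predicted_balance(limit=5000, cards=4) - base
print(round(effect_limit, 2), round(effect_cards, 2))
```

Because the model is linear, these differences do not depend on the baseline chosen.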
QUESTION 12:
Run a simple linear regression equation with Income as X and Balance as Y. Report the coefficients. Is
the coefficient of Income significantly different from zero? What does this say about the effect of
income on balance?

SOLUTION:
Simple Linear Regression Results:

Regression Statistics
Multiple R 0.463656457
R Square 0.21497731
Adjusted R Square 0.213004891
Standard Error 407.8647195
Observations 400

ANOVA
            df   SS        MS        F         Significance F
Regression  1    18131167  18131167  108.9917  1.03E-22
Residual    398  66208745  166353.6
Total       399  84339912

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  246.5147506   33.19935        7.425289  6.9E-13   181.2467   311.7828   181.2467     311.7828
Income     6.048363409   0.57935         10.43991  1.03E-22  4.909394   7.187332   4.909394     7.187332

[Figure: scatter plot of Balance (y-axis, 0 to 2500) against Income (x-axis, 0 to 200) with fitted line
y = 6.0484x + 246.51]

INFERENCE:
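The coefficient of income is significantly different from zero (p-value 1.03E-22, less than 0.05). For
every one-unit increase in income, the balance increases by about 6.05 units. As per the R-squared
value, about 21.5% of the variation in balance is explained by income, so income has a significant
but only moderate effect on balance.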
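
The significance of the income coefficient can be checked by dividing the coefficient by its standard error and by building a 95% confidence interval around it. The sketch assumes 1.966 as an approximate two-tailed critical value for df = 398; that value is an assumption, not a figure from the output.

```python
coef = 6.048363409  # Income coefficient from the table
std_err = 0.57935   # its standard error

# t statistic for H0: coefficient equals zero
t_stat = coef / std_err

# Assumed approximate critical value for df = 398, two-tailed, alpha = 0.05
T_CRITICAL = 1.966
ci_low = coef - T_CRITICAL * std_err
ci_high = coef + T_CRITICAL * std_err

# The coefficient differs from zero if the interval excludes zero
excludes_zero = ci_low > 0 or ci_high < 0
print(round(t_stat, 3), round(ci_low, 3), round(ci_high, 3), excludes_zero)
```

The interval closely matches the Lower 95% and Upper 95% columns of the table and does not contain zero.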
QUESTION 13:
Based on the equation derived in question 12, what is the estimated balance for a person with an
income of USD 100k per year?

SOLUTION:
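Regression equation is y = 6.0484x + 246.51. Income is entered in thousands of USD (the income axis
of the scatter plot runs from 0 to 200), so x = 100. Estimated balance for a person with an income of
USD 100K per year is 6.0484 * 100 + 246.51 = 851.35 dollars.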
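
The substitution can be written as a one-line function. It assumes, as the income axis of the Q12 scatter plot suggests, that income is recorded in thousands of USD, so an income of USD 100k is entered as 100.

```python
def estimated_balance(income_thousands):
    """Predicted balance from the Q12 equation y = 6.0484x + 246.51."""
    return 6.0484 * income_thousands + 246.51

# Assumption: income is measured in thousands of USD, so 100k -> 100
estimate = estimated_balance(100)
print(round(estimate, 2))
```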

Question 14:
Based on the dataset, explore the relationship between credit card balance (Y) and (a) Income, (b)
Age, (c) Education, (d) Limit, and (e) Rating as X variables. Estimate a multiple linear regression model
and report the statistical significance of each of these variables.

Solution:
Correlation matrix:
Income Age Education Limit Rating Balance
Income 1
Age 0.175338 1
Education -0.02769 0.003619 1
Limit 0.792088 0.100888 -0.02355 1
Rating 0.791378 0.103165 -0.03014 0.99688 1
Balance 0.463656 0.001835 -0.00806 0.861697 0.863625 1

Multiple linear regression results:


SUMMARY OUTPUT

Regression Statistics
Multiple R 0.936703
R Square 0.877412
Adjusted R Square 0.875856
Standard Error 161.9918
Observations 400

ANOVA
            df   SS        MS        F         Significance F
Regression  5    74000827  14800165  564.0021  4.6E-177
Residual    394  10339085  26241.33
Total       399  84339912

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  -473.251      55.10834        -8.58766  2.09E-16  -581.595   -364.908   -581.595     -364.908
Income     -7.60883      0.381932        -19.922   1.37E-61  -8.35971   -6.85795   -8.35971     -6.85795
Age        -0.86003      0.4787          -1.79659  0.073166  -1.80116   0.081096   -1.80116     0.081096
Education  1.967792      2.605291        0.755306  0.450517  -3.15422   7.089802   -3.15422     7.089802
Limit      0.079016      0.044791        1.764114  0.078488  -0.00904   0.167076   -0.00904     0.167076
Rating     2.773844      0.66708         4.15819   3.94E-05  1.462363   4.085324   1.462363     4.085324

EQUATION OF THE MULTIPLE REGRESSION MODEL:

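Y = -473.25 - 7.609 X1 - 0.860 X2 + 1.968 X3 + 0.079 X4 + 2.774 X5

where Y = Balance, X1 = Income, X2 = Age, X3 = Education, X4 = Limit and X5 = Rating.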

STATISTICAL SIGNIFICANCE OF EACH VARIABLE:

• INCOME: The correlation coefficient between income and balance is 0.4637 and the p-value in the
regression (1.37E-61) is less than 0.05. Income therefore has a statistically significant
relationship with balance.
• AGE: The correlation coefficient between age and balance is 0.001835 and the p-value in the
regression (0.0732) is more than 0.05. Age therefore does not have a statistically significant
relationship with balance at the 5% level.
• EDUCATION: The correlation coefficient between education and balance is -0.00806 and the
p-value in the regression (0.4505) is more than 0.05. Education therefore does not have a
statistically significant relationship with balance.
• LIMIT: The correlation coefficient between credit limit and balance is 0.8617, but the p-value in
the regression (0.0785) is more than 0.05. Although limit is strongly correlated with balance on its
own, it is not statistically significant at the 5% level once the other variables are in the model.
• RATING: The correlation coefficient between credit rating and balance is 0.8636 and the p-value
in the regression (3.94E-05) is less than 0.05. Rating therefore has a statistically significant
relationship with balance.
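
The significance decisions follow one rule: a variable is significant at the 5% level when its regression p-value is below 0.05. A short sketch applying that rule to the p-values reported in the coefficient table:

```python
ALPHA = 0.05  # significance level used throughout this report

# p-values taken from the multiple regression coefficient table
p_values = {
    "Income": 1.37e-61,
    "Age": 0.073166,
    "Education": 0.450517,
    "Limit": 0.078488,
    "Rating": 3.94e-05,
}

# A variable is significant when its p-value is below ALPHA
significant = {name: p < ALPHA for name, p in p_values.items()}

for name, is_significant in significant.items():
    print(name, "significant" if is_significant else "not significant")
```

Limit and Rating are correlated at 0.99688 in the correlation matrix, so the two variables carry nearly the same information; this may be why Limit loses significance when Rating is also in the model.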
