
DADM ASSESSMENT 2

Q1. A company manager says that the average balance on their credit cards is $500. Do you think
that this assertion is justified? Use a one sample t-test to draw your conclusion.

Fig. 1

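To find out whether the average balance on credit cards is equal to $500 or not, a one sample t-test
has been conducted, with H0: average balance = $500 and H1: average balance is not equal to $500.
As can be seen from the above table (Fig. 1), since the p-value > 0.05, we fail to reject the null
hypothesis (H0), which means that we do not have enough evidence to show that the average balance
differs from $500.
Hence, the data are consistent with the manager's claim that the average balance on credit cards is
$500, although failing to reject H0 does not prove that the average is exactly $500.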
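The test statistic behind a one sample t-test can be sketched as follows. This is a minimal illustration in Python, not the tool used to produce Fig. 1; the function name `one_sample_t` and the sample values are made up for the example. It returns only the t statistic and the degrees of freedom; the p-value would then be looked up in a t-distribution table or obtained from a statistics package.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    # stdev() is the sample standard deviation (n - 1 denominator)
    standard_error = stdev(sample) / math.sqrt(n)
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1

# Made-up balances for illustration only (NOT the assignment dataset)
balances = [420.0, 610.0, 480.0, 530.0, 390.0, 560.0]
t_stat, df = one_sample_t(balances, 500.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

A t statistic close to zero corresponds to a large p-value, which is the "fail to reject H0" situation described above.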
Q2. Is there a difference between men and women as far as average balance is concerned? Use a
two sample t-test to draw your conclusion.

Fig. 2

To find out whether there is any difference in the average credit card balance of men and women, a
two sample t-test has been conducted. As can be inferred from the above table (Fig. 2), since the
p-value > 0.05, we fail to reject the null hypothesis (H0), which means that we do not have enough
evidence to show that the average balances of men and women differ.
Hence, we find no statistically significant difference in the average credit card balance of men and
women.
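The two sample comparison can be sketched in the same way. The report does not say whether the equal-variance or the unequal-variance form of the test was used, so the sketch below assumes the unequal-variance (Welch) form; the function name `welch_t` and the data are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite df) for H0: the two means are equal."""
    # Squared standard error of each group mean (sample variance / group size)
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up balances for two groups, for illustration only
group_1 = [510.0, 430.0, 620.0, 480.0, 550.0]
group_2 = [490.0, 560.0, 450.0, 600.0, 520.0]
t_stat, df = welch_t(group_1, group_2)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The same calculation applies to the student versus non-student and married versus unmarried comparisons later in the report; only the grouping variable changes.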
Q3. Is there a difference between students and non-students as far as average balance is
concerned? Use a two sample t-test to draw your conclusion.

Fig. 3

To find out whether there is any difference in the average credit card balance of students and
non-students, a two sample t-test has been conducted. As can be seen from the above table (Fig. 3),
since the p-value < 0.05, we reject the null hypothesis (H0) in favour of the alternative hypothesis
(H1), which means that we have enough evidence to conclude that the average credit card balances of
students and non-students differ.
Q4. It is generally assumed that if there are more credit cards then the balance on the credit cards
will be more. Based on this dataset, do you think this is true? Calculate a correlation coefficient
and show a scatter plot to support your answer.

Fig. 4

[Scatter plot: Relation between Credit Balance and No. of Cards. X-axis: No. of Cards (0 to 10);
Y-axis: Credit Balance (0 to 2000); trendline: f(x) = 28.99 x + 434.29]

Fig. 5

As we can infer from the above table (Fig. 4), since the correlation coefficient of cards and balance
is 0.086456, we can conclude that the 'no. of cards' and 'credit balance' have a negligible positive
linear relationship.
Also, the trendline in the above scatter plot (Fig. 5) shows only a very slight upward trend, which,
in turn, supports our conclusion.
Hence, we can say that, in this dataset, having more credit cards is not associated with a
meaningfully higher credit balance.
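The correlation coefficient used here is the Pearson coefficient, which can be computed from first principles as sketched below. The function name `pearson_r` and the data are made up for illustration and are not taken from the assignment dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # co-variation
    sxx = sum((xi - mx) ** 2 for xi in x)                     # variation in x
    syy = sum((yi - my) ** 2 for yi in y)                     # variation in y
    return sxy / math.sqrt(sxx * syy)

# Made-up values for illustration only
cards = [1, 2, 3, 4, 5, 6]
balance = [400.0, 650.0, 300.0, 700.0, 450.0, 520.0]
print(f"r = {pearson_r(cards, balance):.3f}")
```

Values of r near 0 (such as the 0.086 reported above) indicate almost no linear relationship, while values near +1 or -1 indicate a strong one.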
Q5. Examine whether the following demographic variables influence balance: (a) age, (b) years of
education, (c) marital status. For age and years of education, use scatter plots to depict their
relationship with balance and calculate the correlation coefficient. For the relationship between
marital status and balance, use a two-sample t-test to draw your conclusion.

Fig. 6

Correlation Analysis of Age and Balance :

Fig. 7

As we can infer from the above table (Fig. 6), since the correlation coefficient of age and balance is
0.001835, we can conclude that 'age' and 'balance' have essentially no linear relationship.
Also, the trendline in the above scatter plot (Fig. 7) is almost flat, which, in turn, indicates no
relationship between the two variables.

Correlation Analysis of Education and Balance :

Fig. 8
As we can infer from the above table (Fig. 6), since the correlation coefficient of education and
balance is -0.00806, we can conclude that 'education' and 'balance' have a negligible negative
linear relationship.
Also, the trendline in the above scatter plot (Fig. 8) shows only a very slight downward trend, which,
in turn, supports our conclusion.

Two Sample T-Test to find out influence of Marital Status on balance :

Fig. 9

To find out whether marital status influences balance or not, a two sample t-test has been conducted.
As can be seen from the above table (Fig. 9), since the p-value > 0.05, we fail to reject the null
hypothesis (H0), which means that we do not have enough evidence to show that the average credit
balances of married and unmarried cardholders differ.
Hence, we find no statistically significant influence of marital status on balance.
Q6. “Ethnicity of the cardholder does not matter as far as a balance is concerned”. Carry out an
analysis of variance (ANOVA) and discuss whether this statement is supported by the data or not.

Fig. 10

To find out whether the ethnicity of the cardholder matters or not, an ANOVA test has been conducted.
As can be inferred from the above table (Fig. 10), since the p-value > 0.05, we fail to reject the
null hypothesis (H0), which means that we do not have enough evidence to show that the average credit
balance differs across ethnic groups.
Hence, the data support the statement that the ethnicity of the cardholder does not matter as far as
balance is concerned.
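The F statistic reported by a one-way ANOVA compares the variation between group means with the variation within groups. A minimal sketch is given below; the function name `one_way_anova_f`, the group labels and the values are hypothetical, and the p-value would be read from an F-distribution table or a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group means versus the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations versus their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up balances for three hypothetical groups, for illustration only
group_a = [520.0, 480.0, 610.0, 450.0]
group_b = [500.0, 530.0, 470.0, 560.0]
group_c = [490.0, 540.0, 510.0, 580.0]
f_stat, df_b, df_w = one_way_anova_f([group_a, group_b, group_c])
print(f"F = {f_stat:.3f} on ({df_b}, {df_w}) degrees of freedom")
```

An F statistic close to 1 or below means the group means differ by about as much as random variation alone would produce, which corresponds to a large p-value.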
Q7. A general principle that credit card companies often follow is to assign higher credit limit to
people with a higher credit rating. Does the data show that this principle is being followed?

Fig. 11

Fig. 12

As can be inferred from the above table (Fig. 11), since the correlation coefficient of rating and
limit is 0.996748, we can conclude that there is a very strong, nearly perfect positive relationship
between the two.
Also, the above scatter plot (Fig. 12) supports our conclusion, as the points lie very close to a
steeply upward-sloping trendline.
Hence, we can say that the credit card companies assign higher credit limits to people with higher
credit ratings.
Q8. Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and the
balance is the Y). Report the coefficients and the R-squared. Show a scatter plot.

Fig. 13

Fig. 14

As we can infer from the above table (Fig. 13) as well as the scatter plot (Fig. 14), the value of
R-squared is 0.7268, which means that around 73% of the variation in balance is explained by the
credit limit.
The simple linear regression equation is generally of the form y = a + bx, where:
y = dependent variable
a = y-intercept
b = slope
x = independent variable

Referring again to the above table, the intercept coefficient is -312.89 and the slope coefficient is
0.18. So, our equation in this case would be: Balance = 0.18 * Limit - 312.89
From this equation, we can say that if the limit increases by $1, then the predicted balance
increases by about $0.18.
Also, by looking at the above scatter plot, we can say that credit limit and balance have a strong
positive relationship, as the trendline shows a clear upward trend.
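The intercept, slope and R-squared of a simple linear regression can be computed by ordinary least squares as sketched below. This is an illustration only: the function name `fit_line` and the data points are made up, and the output will not match Fig. 13.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x. Returns (a, b, R-squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared = 1 - (unexplained variation / total variation)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Made-up (limit, balance) pairs for illustration only
limit = [2000.0, 3500.0, 5000.0, 6500.0, 8000.0]
balance = [100.0, 320.0, 540.0, 900.0, 1100.0]
a, b, r2 = fit_line(limit, balance)
print(f"Balance = {a:.2f} + {b:.4f} * Limit, R-squared = {r2:.4f}")
```

For a regression with a single X variable, R-squared equals the square of the correlation coefficient between X and Y.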
Q9. Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients and R-
squared. Show a scatter plot.

Fig. 15

Fig. 16

As we can infer from the above table (Fig. 15) as well as the scatter plot (Fig. 16), the value of
R-squared is 0.7302, which means that around 73% of the variation in balance is explained by the
credit rating.
The simple linear regression equation is generally of the form y = a + bx, where:
y = dependent variable
a = y-intercept
b = slope
x = independent variable

Referring again to the above table, the intercept coefficient is -410.897 and the slope coefficient is
2.685. So, our equation in this case would be: Balance = 2.69 * Rating - 410.9
From this equation, we can say that if the rating increases by 1, then the predicted balance
increases by about $2.69.
Also, by looking at the above scatter plot, we can say that rating and balance have a strong positive
relationship, as the trendline shows a clear upward trend.
Q 10. Consider your findings in questions 8-9.  Discuss business mechanisms to increase or
decrease the balance on credit cards. Try to quantify your answers. In this context, focus on
possible specific strategies using variables in Q8 and Q9 that the business could adopt to increase
the balance on credit cards.

Ans. On the basis of questions 8 & 9, the following business mechanisms can be adopted to increase or
decrease the credit balance:
• The company should offer a higher limit to customers with a higher rating than to those with
a lower rating. An increase in the limit is likely to lead to an increase in balance, as
customers tend to buy more on credit if they have a higher limit on their credit card. For the
same reason, a decrease in the limit is likely to result in a decrease in credit balance as
well.
• The companies, in collaboration with different outlets, can offer discounts and other
lucrative offers on purchases made with their credit cards, as this will motivate and
encourage customers to buy more than they otherwise would. This ultimately increases the
credit balance of the customers.
• The companies can encourage customers to make more purchases by credit card than in cash by
giving increased credit points on every purchase they make.
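To quantify the first mechanism, the slopes reported in questions 8 and 9 (0.18 for limit and 2.69 for rating) can be applied to a proposed change. The sketch below assumes the fitted linear relationships continue to hold after the change, which a regression on observational data does not guarantee; the size of each change ($1,000 of limit, 10 rating points) is a hypothetical value chosen for illustration.

```python
# Slopes as reported in the answers to Q8 and Q9
SLOPE_LIMIT = 0.18    # dollars of balance per extra dollar of credit limit
SLOPE_RATING = 2.69   # dollars of balance per extra credit rating point

def predicted_change(slope, delta_x):
    """Predicted change in balance for a change of delta_x in the X variable."""
    return slope * delta_x

# Hypothetical policy changes, chosen only for illustration
limit_effect = predicted_change(SLOPE_LIMIT, 1000)    # limit raised by $1,000
rating_effect = predicted_change(SLOPE_RATING, 10)    # rating higher by 10 points
print(f"Limit +$1,000 -> balance change of about ${limit_effect:.2f}")
print(f"Rating +10    -> balance change of about ${rating_effect:.2f}")
```

Under these assumptions, raising the limit by $1,000 corresponds to a predicted increase in balance of about $180, and a rating that is 10 points higher corresponds to about $27. Lowering the limit by the same amount would give the same predicted change with the opposite sign.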
Q 11. The credit limit is provided as a consolidated amount for all the credit cards the cardholder
has. Run a multiple linear regression of Balance (Y) on Limit and Cards as two X variables. Report
the coefficients. Discuss the effect on the balance of (a) increasing the credit limit on the same
number of cards and (b) increasing the number of cards without altering the total credit limit.  

Fig. 17

Fig. 18

The multiple linear regression equation is generally of the form
y = b0 + b1*x1 + b2*x2 + ... + bp*xp, where:
y = dependent variable
b0 = y-intercept
b1, b2, ..., bp = slope coefficients
x1, x2, ..., xp = independent variables

As can be inferred from Fig. 17, the intercept coefficient is -401.748 and the slope coefficients of
limit and cards are 0.179 and 30.756 respectively.
As can be seen from Fig. 18, the p-values indicate that both limit and no. of credit cards are
statistically significant predictors of balance.
Our multiple regression equation would be: Balance = 0.18 * Limit + 30.76 * Cards - 401.75
(a) If we increase the limit by $1 while the number of cards stays the same, then the predicted
balance increases by about $0.18. (b) If we increase the no. of cards by 1 while the total credit
limit stays the same, then the predicted balance increases by about $30.76.
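Coefficients of a multiple linear regression can be estimated by solving the normal equations (X'X)b = X'y. The sketch below does this with plain Gauss-Jordan elimination; the function name `fit_ols` and the data are made up for illustration, and this is not necessarily how Fig. 17 was produced.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp] from the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y]
    aug = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

# Made-up (limit, cards) rows and balances, for illustration only
rows = [(3000, 2), (4500, 3), (6000, 1), (7500, 4), (5200, 2), (8100, 5)]
balance = [210.0, 480.0, 690.0, 1050.0, 560.0, 1190.0]
b0, b_limit, b_cards = fit_ols(rows, balance)
print(f"Balance = {b0:.2f} + {b_limit:.4f} * Limit + {b_cards:.2f} * Cards")
```

Each slope is interpreted as the change in predicted balance for a one-unit change in that variable while the other variable is held fixed, which is how parts (a) and (b) of the question are read.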
Q 12. Run a simple linear regression equation with Income as X and Balance as Y. Report the
coefficients. Is the coefficient of Income significantly different from zero? What does this say about
the effect of income on balance? 

Fig. 19
Assumptions :
Income : Assuming its unit as thousand US dollars per year
Balance : Assuming its unit as dollars

The simple linear regression equation is generally of the form y = a + bx, where:


a = y-intercept
b = slope
x = independent variable

As can be inferred from the above table (Fig. 19), the intercept coefficient is 241.529 and the slope
coefficient is 6.456. The simple linear regression equation in this case would be:
Balance = 241.53 + 6.46 * Income
Since Income is measured in thousands of dollars, this equation says that if income increases by
$1,000 (one unit of Income), then the predicted balance increases by about $6.46.

Q 13. Based on the equation derived in question 12, what is the estimated balance for a person
with an income of USD 100k per year?  

Ans. The equation in question 12 was: Balance = 241.53 + 6.46 * Income

Since Income is measured in thousands of dollars, an income of USD 100k per year corresponds to
Income = 100. Substituting this value in the above equation gives the estimated balance:
Balance = 241.53 + 6.46 * 100
= 241.53 + 646
= $887.53
Q 14. Based on the dataset, explore the relationship between credit card balance (Y) and (a)
Income (b) Age (c) Education (d) Limit, and (e) Rating as X variables. Estimate a multiple linear
regression model and report the statistical significance of each of these variables.

Fig. 20

Fig. 21

Fig. 22
The multiple linear regression equation is generally of the form
y = b0 + b1*x1 + b2*x2 + ... + bp*xp, where:
y = dependent variable
b0 = y-intercept
b1, b2, ..., bp = slope coefficients
x1, x2, ..., xp = independent variables

As can be inferred from Fig. 20, the value of R-squared is 0.8638, which means that about 86% of the
variation in balance is explained by income, age, education, limit and rating together.
The intercept coefficient is -543.638 and the slope coefficients of income, age, education, limit and
rating are -8.486, -0.695, 5.614, 0.07 and 3.089 respectively.
Fig. 21 shows the hypothesis test for each coefficient. Out of the five variables, i.e., income, age,
education, limit and rating, only income and rating have a statistically significant relationship
with balance.
Also, Fig. 22 shows the significant and insignificant variables when each p-value is compared with
0.05. Income and rating are significant variables as their p-values are less than 0.05, whereas age,
education and limit are insignificant variables.
Retaining only the significant variables, with their coefficients from the model in Fig. 20, the
multiple linear equation in this case would be:
Balance = -543.64 - 8.49 * Income + 3.09 * Rating
Since Income is measured in thousands of dollars, this equation says that if income decreases by
$1,000 (one unit of Income) while all the other variables are held constant, then the predicted
balance increases by about $8.49, and if rating increases by 1 unit while all the other variables
remain unchanged, then the predicted balance increases by about $3.09.
