BUSINESS ANALYTICS

ASSIGNMENT 3

XAVIER INSTITUTE OF SOCIAL SERVICE,


PURULIA ROAD, RANCHI

Submitted by:
RADHIKA PADIA (ROLL NO 15)

THE DATA TABLE:

CONCEPT  CIS  AGE  MARTIAL  NUMCAR  AVAGE  NUMTRIP  CONCEPT1  NUMCAR1  GROUPS
      3    0   24        0       4    2.5        1         0        1       1
      1    0   22        0       1      3        1         0        0       1
      2    0   27        0       1    2.5        0         0        0       1
      2    0   26        0       2      2        2         0        1       1
      2    0   29        0       1    2.5        1         0        0       1
      1    0   23        0       1      3        0         0        0       1
      2    0   24        0       2      3        0         0        1       1
      1    0   22        0       2      3        1         0        1       1
      2    0   26        0       1      2        1         0        0       1
      5    0   48        0       1    0.5        3         1        0       1
      4    0   37        0       1    1.5        2         1        0       1
      2    0   32        1       3      3        0         0        1       2
      2    0   32        1       1      1        0         0        0       2
      1    0   37        1       1      1        0         0        0       2
      2    0   41        1       1      2        3         0        0       2
      3    0   23        1       1      2        0         0        0       2
      1    0   34        1       5      3        1         0        1       2
      2    0   38        1       2      2        0         0        1       2
      3    0   32        1       1    1.5        0         0        0       2
      2    0   29        1       3    2.5        0         0        1       2
      3    0   36        1       1    1.5        0         0        0       2
      3    0   34        1       1    1.5        0         0        0       2
      3    0   32        1       2      2        1         0        1       2
      1    0   28        1       1    2.5        0         0        0       2
      4    0   29        1       1      2        3         1        0       2
      4    0   47        1       2      1        5         1        1       2
      5    0   24        1       1    0.5        9         1        0       2
      4    0   43        1       2      1        2         1        1       2
      4    0   42        1       3    1.5        2         1        1       2
      5    0   44        1       1    0.5        7         1        0       2
      2    1   24        0       3      1        2         0        1       3
      2    1   24        0       4      2        0         0        1       3
      1    1   23        0       2    2.5        0         0        1       3
      3    1   36        0       1      2        0         0        0       3
      3    1   39        0       1      2        0         0        0       3
      1    1   30        0       1      2        0         0        0       3
      1    1   28        0       2    4.5        2         0        1       3
      4    1   42        0       1      2        2         1        0       3
      3    1   39        1       1    1.5        1         0        0       4
      1    1   27        1       1      2        3         0        0       4
      3    1   36        1       2    2.5        1         0        1       4
      3    1   47        1       1      2        1         0        0       4
      3    1   42        1       2    3.5        1         0        1       4
      4    1   42        1       1    0.5        3         1        0       4
      5    1   47        1       1      1        4         1        0       4
      4    1   43        1       2    1.5        4         1        1       4
      5    1   62        1       1    0.5        6         1        0       4
      5    1   55        1       2    0.5        4         1        1       4
      4    1   39        1       3    1.5        5         1        1       4
      5    1   58        1       1      2        6         1        0       4
      4    1   43        1       1    0.4        2         1        0       4
      5    1   59        1       1    0.5        7         1        0       4
      4    1   43        1       3    1.5        3         1        1       4
      4    1   42        1       1    1.5        4         1        0       4
      4    1   47        1       4    1.5        4         1        1       4
      4    1   38        1       2      1        3         1        1       4
      5    1   37        1       2    0.8        8         1        1       4
      4    1   51        1       1      1        2         1        0       4
      4    1   47        1       2    1.5        1         1        1       4
      5    1   51        1       1      2        6         1        0       4

QUESTION 1:
Divide the sample into two groups:
a. Those showing high interest (a rating of 4 or 5 on CONCEPT)
b. Those showing low interest (a rating of 1, 2 or 3 on CONCEPT)
Cross tabulate high versus low interest with CIS. How strong is the association between
interest in the policy and the current insurance supplier? Is the association statistically
significant? What does it tell you?
SOLUTION:
CONCEPT IS RECODED INTO A NEW VARIABLE, CONCEPT1, LABELLED AS FOLLOWS:

0: LOW INTEREST GROUP
1: HIGH INTEREST GROUP

Cross tabulation: Two-Way

Counts
CONCEPT1 (rows) by CIS (columns)

CONCEPT1    CIS = 0   CIS = 1   Total
0                22        12      34
1                 8        18      26
Total            30        30      60

Chi-Square Tests of Association for CONCEPT1 and CIS

Test Statistic        Value   df   p-Value
Pearson Chi-Square    6.787    1     0.009

Number of Valid Cases: 60

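INTERPRETATION:
Let us assume,
Ho: there is no association between interest in the policy and the current insurance supplier.
H1: there is an association between interest in the policy and the current insurance supplier.
From the above analysis, the p value is 0.009, which is less than 0.05 (the 5% level of
significance).
Therefore Ho is rejected and H1 is accepted.
We can conclude that the association between interest in the policy and the current insurance
supplier is statistically significant. The chi-square test shows that an association exists but
does not measure its strength; the phi coefficient, sqrt(6.787 / 60) = 0.34, points to a moderate
association. In the cross tabulation, 18 of the 30 respondents with CIS = 1 (60%) show high
interest, against 8 of the 30 respondents with CIS = 0 (27%).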
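
The chi-square statistic can be recomputed by hand from the four cell counts. The sketch below is only an illustration in Python, not the package that produced the output above. It assumes the cell count 8 for high interest with CIS = 0, which follows from the column total (30 - 22), and it uses the fact that for 1 degree of freedom the chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math

# Observed counts: rows = CONCEPT1 (0 = low, 1 = high), columns = CIS (0, 1).
# The 8 is derived from the column total: 30 - 22.
observed = [[22, 12],
            [8, 18]]


def pearson_chi_square(table):
    """Return (chi-square, phi) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (count - expected) ** 2 / expected
    phi = math.sqrt(chi2 / n)  # strength of association for a 2x2 table
    return chi2, phi


chi2, phi = pearson_chi_square(observed)
# Valid for 1 degree of freedom only.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}, phi = {phi:.3f}")
```

The phi value is the quantity to quote when the question asks how strong the association is; the p value only answers whether it is significant.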

QUESTION 2:
We can consider the concept rating (CONCEPT) as the dependent variable and the
remaining 6 variables as predictor variables. Regress CONCEPT on the other variables:

a. Interpret the regression equation and indicate the extent to which the variations in the
predictor variables explain the variation in the dependent variable.
b. Is each of the predictor variables significant at the 0.05 level? Can a simpler model (involving
fewer predictors) be developed? If so, what is the model and what is the percentage
improvement of the simpler model over the full model?
SOLUTION:
PART a:
OLS Regression

Dependent Variable             CONCEPT
N                              60
Multiple R                     0.869
Squared Multiple R             0.755
Adjusted Squared Multiple R    0.727
Standard Error of Estimate     0.705

Regression Coefficients B = (X'X)^-1 X'Y

Effect     Coefficient  Standard Error  Std. Coefficient  Tolerance       t  p-Value
CONSTANT         1.360           0.588             0.000          .   2.315    0.025
CIS             -0.067           0.211            -0.025      0.746  -0.318    0.752
MARTIAL         -0.098           0.239            -0.034      0.673  -0.409    0.684
AGE              0.056           0.013             0.418      0.451   4.132    0.000
NUMCAR           0.033           0.099             0.023      0.914   0.328    0.744
AVAGE           -0.442           0.140            -0.282      0.579  -3.160    0.003
NUMTRIP          0.226           0.051             0.383      0.627   4.458    0.000

Analysis of Variance

Source           SS  df  Mean Squares  F-Ratio  p-Value
Regression   81.359   6        13.560   27.248    0.000
Residual     26.375  53         0.498

INTERPRETATION:
The regression equation is:

CONCEPT = 1.360 - 0.067 CIS - 0.098 MARTIAL + 0.056 AGE + 0.033 NUMCAR - 0.442 AVAGE + 0.226 NUMTRIP

Let us assume,
Ho: the regression equation is not significant in predicting the dependent variable.
H1: the regression equation is significant in predicting the dependent variable.
From the above analysis, we can see that:

The p value of the F-ratio is 0.000, which is less than 0.05 (the 5% level of significance), therefore
Ho is rejected and H1 is accepted, i.e. the regression equation is significant in predicting the
dependent variable.
Also, the squared multiple R is 0.755, i.e. the six predictors together explain 75.5% of the
variation in CONCEPT (72.7% after adjusting for the number of predictors), which indicates a
fairly good fit.

The p value of the constant is 0.025, which only shows that the intercept differs significantly
from zero. In the full model AGE (p = 0.000), AVAGE (p = 0.003) and NUMTRIP (p = 0.000) are
significant at the 0.05 level, whereas CIS (p = 0.752), MARTIAL (p = 0.684) and NUMCAR (p = 0.744)
are not.
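
The goodness-of-fit figures quoted above all follow from the two sums of squares in the Analysis of Variance table. A minimal sketch of that arithmetic, assuming 6 regression degrees of freedom (one per predictor) and 53 residual degrees of freedom as reported; the variable names are illustrative only:

```python
# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_regression = 81.359
ss_residual = 26.375
df_regression = 6    # assumed: one per predictor
df_residual = 53     # 60 cases - 6 predictors - 1 constant

ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1

# Share of the variation in CONCEPT explained by the predictors.
r_squared = ss_regression / ss_total
# Penalise the fit for the number of predictors used.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
# Ratio of the regression mean square to the residual mean square.
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R-squared          = {r_squared:.3f}")
print(f"Adjusted R-squared = {adjusted_r_squared:.3f}")
print(f"F-ratio            = {f_ratio:.3f}")
```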

PART b:
1. OLS Regression

Dependent Variable             CONCEPT
N                              60
Multiple R                     0.323
Squared Multiple R             0.105
Adjusted Squared Multiple R    0.089
Standard Error of Estimate     1.290

Regression Coefficients B = (X'X)^-1 X'Y

Effect     Coefficient  Standard Error  Std. Coefficient  Tolerance       t  p-Value
CONSTANT         2.633           0.235             0.000          .  11.184    0.000
CIS              0.867           0.333             0.323      1.000   2.603    0.012

Analysis of Variance

Source           SS  df  Mean Squares  F-Ratio  p-Value
Regression   11.267   1        11.267    6.774    0.012
Residual     96.467  58         1.663

INTERPRETATION:
The regression equation is:

CONCEPT = 2.633 + 0.867 CIS

H0: the regression equation is not a significant predictor of the dependent variable (CONCEPT).
H1: the regression equation is a significant predictor of the dependent variable (CONCEPT).
The p value is 0.012, which is less than 0.05 (the 5% level of significance), so H0 is
rejected and H1 is accepted.
Therefore CIS, taken on its own, is a significant predictor of CONCEPT at the 0.05 level of significance.

2. OLS Regression

Regression Coefficients B = (X'X)^-1 X'Y

Effect     Coefficient  Standard Error  Std. Coefficient  Tolerance       t  p-Value
CONSTANT        -0.549           0.449             0.000          .  -1.222    0.227
AGE              0.098           0.012             0.739      1.000   8.348    0.000

Analysis of Variance

Source           SS  df  Mean Squares  F-Ratio  p-Value
Regression   58.796   1        58.796   69.685    0.000
Residual     48.937  58         0.844

INTERPRETATION:
The regression equation is:

CONCEPT = -0.549 + 0.098 AGE

H0: the regression equation is not a significant predictor of the dependent variable (CONCEPT).
H1: the regression equation is a significant predictor of the dependent variable (CONCEPT).
The p value is 0.000, which is less than 0.05 (the 5% level of significance), so H0 is
rejected and H1 is accepted.
Therefore AGE, taken on its own, is a significant predictor of CONCEPT at the 0.05 level of significance.

3. OLS Regression

Regression Coefficients B = (X'X)^-1 X'Y

Effect     Coefficient  Standard Error  Std. Coefficient  Tolerance       t  p-Value
CONSTANT         2.211           0.282             0.000          .   7.851    0.000
MARTIAL          1.253           0.341             0.435      1.000   3.679    0.001

Analysis of Variance

Source           SS  df  Mean Squares  F-Ratio  p-Value
Regression   20.380   1        20.380   13.532    0.001
Residual     87.353  58         1.506

INTERPRETATION:
The regression equation is:

CONCEPT = 2.211 + 1.253 MARTIAL

H0: the regression equation is not a significant predictor of the dependent variable (CONCEPT).
H1: the regression equation is a significant predictor of the dependent variable (CONCEPT).
The p value is 0.001, which is less than 0.05 (the 5% level of significance), so H0 is
rejected and H1 is accepted.
Therefore MARTIAL, taken on its own, is a significant predictor of CONCEPT at the 0.05 level of significance.

4. OLS Regression

Regression Coefficients B = (X'X)^-1 X'Y

Effect     Coefficient  Standard Error  Std. Coefficient  Tolerance       t  p-Value
CONSTANT         3.395           0.352             0.000          .   9.633    0.000
NUMCAR          -0.195           0.182            -0.139      1.000  -1.073    0.288

Analysis of Variance

Source           SS  df  Mean Squares  F-Ratio  p-Value
Regression    2.095   1         2.095    1.150    0.288
Residual    105.638  58         1.821

INTERPRETATION:
The regression equation is:

CONCEPT = 3.395 - 0.195 NUMCAR

H0: the regression equation is not a significant predictor of the dependent variable (CONCEPT).
H1: the regression equation is a significant predictor of the dependent variable (CONCEPT).
The p value is 0.288, which is MORE than 0.05 (the 5% level of significance), so H0
cannot be rejected.
Therefore NUMCAR is NOT a significant predictor of CONCEPT at the 0.05 level of significance.

5. OLS Regression

Regression Coefficients B = (X'X)^-1 X'Y

Effect     Coefficient  Standard Error  Std. Coefficient  Tolerance       t  p-Value
CONSTANT         4.968           0.295             0.000          .  16.856    0.000
AVAGE           -1.074           0.150            -0.685      1.000  -7.164    0.000

Analysis of Variance

Source           SS  df  Mean Squares  F-Ratio  p-Value
Regression   50.576   1        50.576   51.321    0.000
Residual     57.158  58         0.985

INTERPRETATION:
The regression equation is:

CONCEPT = 4.968 - 1.074 AVAGE

H0: the regression equation is not a significant predictor of the dependent variable (CONCEPT).
H1: the regression equation is a significant predictor of the dependent variable (CONCEPT).
The p value is 0.000, which is less than 0.05 (the 5% level of significance), so H0 is
rejected and H1 is accepted.
Therefore AVAGE, taken on its own, is a significant predictor of CONCEPT at the 0.05 level of significance.

6. OLS Regression

Regression Coefficients B = (X'X)^-1 X'Y

Effect     Coefficient  Standard Error  Std. Coefficient  Tolerance       t  p-Value
CONSTANT         2.136           0.166             0.000          .  12.843    0.000
NUMTRIP          0.430           0.053             0.729      1.000   8.116    0.000

Analysis of Variance

Source           SS  df  Mean Squares  F-Ratio  p-Value
Regression   57.286   1        57.286   65.863    0.000
Residual     50.447  58         0.870

INTERPRETATION:
The regression equation is:

CONCEPT = 2.136 + 0.430 NUMTRIP

H0: the regression equation is not a significant predictor of the dependent variable (CONCEPT).
H1: the regression equation is a significant predictor of the dependent variable (CONCEPT).
The p value is 0.000, which is less than 0.05 (the 5% level of significance), so H0 is
rejected and H1 is accepted.
Therefore NUMTRIP, taken on its own, is a significant predictor of CONCEPT at the 0.05 level of significance.
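
The six single-predictor tests above all apply the same rule: divide the slope by its standard error and compare the result with the critical t value. The sketch below repeats that check in Python. The critical value 2.002 (two-sided, 5% level, 58 degrees of freedom) is an assumption taken from standard t tables, not from the output above, and the helper name `t_ratio` is illustrative.

```python
# (coefficient, standard error) of the slope in each single-predictor regression.
slopes = {
    "CIS": (0.867, 0.333),
    "AGE": (0.098, 0.012),
    "MARTIAL": (1.253, 0.341),
    "NUMCAR": (-0.195, 0.182),
    "AVAGE": (-1.074, 0.150),
    "NUMTRIP": (0.430, 0.053),
}

# Assumed two-sided 5% critical value of Student's t with 58 degrees of freedom.
T_CRITICAL = 2.002


def t_ratio(coefficient, standard_error):
    """t statistic for the null hypothesis that the slope is zero."""
    return coefficient / standard_error


significant = {
    name: abs(t_ratio(b, se)) > T_CRITICAL for name, (b, se) in slopes.items()
}
not_significant = [name for name, flag in significant.items() if not flag]

for name, (b, se) in slopes.items():
    print(f"{name:8s} t = {t_ratio(b, se):7.3f}  significant: {significant[name]}")
```

Because the coefficients and standard errors are rounded to three decimals, the recomputed t values differ slightly from the printed ones, but the significance decisions are unchanged.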

A SIMPLER MODEL (involving fewer predictors) can be developed. This is as follows:

Best Subset Regression:
DEPENDENT VARIABLE: CONCEPT

Model No  R-Sq   Adjusted R-Sq  Mallows' Cp  MSE    Variables
1         0.546  0.538          42.339       0.844  AGE
2         0.532  0.524          45.374       0.870  NUMTRIP
3         0.707  0.696           9.524       0.555  AGE, NUMTRIP
4         0.662  0.650          19.222       0.639  AVAGE, AGE
5         0.754  0.741           1.295       0.474  AVAGE, AGE, NUMTRIP
6         0.709  0.693          11.100       0.561  CIS, AGE, NUMTRIP
7         0.754  0.736           3.190       0.481  AVAGE, MARTIAL, AGE, NUMTRIP
8         0.754  0.736           3.230       0.482  AVAGE, NUMCAR, AGE, NUMTRIP
9         0.755  0.732           5.101       0.489  AVAGE, MARTIAL, NUMCAR, AGE, NUMTRIP
10        0.755  0.732           5.108       0.489  AVAGE, MARTIAL, CIS, AGE, NUMTRIP
11        0.755  0.727           7.000       0.498  NUMTRIP, AVAGE, NUMCAR, MARTIAL, AGE, CIS

Model No  AIC      AICC     BIC      Variables
1         164.044  164.473  170.327  AGE
2         165.868  166.296  172.151  NUMTRIP
3         139.824  140.551  148.201  AGE, NUMTRIP
4         148.348  149.076  156.726  AVAGE, AGE
5         131.290  132.401  141.762  AVAGE, AGE, NUMTRIP
6         141.423  142.534  151.894  CIS, AGE, NUMTRIP
7         133.172  134.756  145.738  AVAGE, MARTIAL, AGE, NUMTRIP
8         133.217  134.802  145.783  AVAGE, NUMCAR, AGE, NUMTRIP
9         135.071  137.225  149.731  AVAGE, MARTIAL, NUMCAR, AGE, NUMTRIP
10        135.078  137.232  149.739  AVAGE, MARTIAL, CIS, AGE, NUMTRIP
11        136.956  139.780  153.711  NUMTRIP, AVAGE, NUMCAR, MARTIAL, AGE, CIS

The Best Models:

Criteria         Value    Best Subset Model
Adjusted R-Sq    0.741    AVAGE, AGE, NUMTRIP
AIC              131.290  AVAGE, AGE, NUMTRIP
AIC (Corrected)  132.401  AVAGE, AGE, NUMTRIP
Schwarz's BIC    141.762  AVAGE, AGE, NUMTRIP
INTERPRETATION:
From the tables above, we see that the model with AVAGE, AGE and NUMTRIP has the highest
adjusted R-Sq, i.e. 0.741, which makes it the best of the simpler subset models.
Its AIC (131.290), corrected AIC (132.401) and BIC (141.762) are also the lowest, so every
criterion selects the same subset.
So, the percentage improvement of the simpler model over the full model, measured on the
adjusted R-Sq, is as follows:
[(0.741 - 0.727) / 0.727] * 100 = 1.9257%
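
The selection rule used above (highest adjusted R-Sq, lowest AIC) and the percentage improvement can be written out as a short Python sketch. The figures are copied from the best-subset tables; the tuple layout and the variable names are illustrative assumptions.

```python
# (variables, adjusted R-squared, AIC) for each model in the best-subset output.
models = [
    (("AGE",), 0.538, 164.044),
    (("NUMTRIP",), 0.524, 165.868),
    (("AGE", "NUMTRIP"), 0.696, 139.824),
    (("AVAGE", "AGE"), 0.650, 148.348),
    (("AVAGE", "AGE", "NUMTRIP"), 0.741, 131.290),
    (("CIS", "AGE", "NUMTRIP"), 0.693, 141.423),
    (("AVAGE", "MARTIAL", "AGE", "NUMTRIP"), 0.736, 133.172),
    (("AVAGE", "NUMCAR", "AGE", "NUMTRIP"), 0.736, 133.217),
    (("AVAGE", "MARTIAL", "NUMCAR", "AGE", "NUMTRIP"), 0.732, 135.071),
    (("AVAGE", "MARTIAL", "CIS", "AGE", "NUMTRIP"), 0.732, 135.078),
    (("NUMTRIP", "AVAGE", "NUMCAR", "MARTIAL", "AGE", "CIS"), 0.727, 136.956),
]

full_model = models[-1]  # the model with all six predictors

# Higher adjusted R-squared is better; lower AIC is better.
best_by_adj_r2 = max(models, key=lambda m: m[1])
best_by_aic = min(models, key=lambda m: m[2])

# Relative gain in adjusted R-squared of the reduced model over the full model.
improvement = (best_by_adj_r2[1] - full_model[1]) / full_model[1] * 100

print("Best by adjusted R-Sq:", ", ".join(best_by_adj_r2[0]))
print("Best by AIC:          ", ", ".join(best_by_aic[0]))
print(f"Improvement over full model: {improvement:.4f}%")
```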

QUESTION 3:
Divide the sample into 4 groups: Rushmore Single, Rushmore Married, Other Company
Single, and Other Company Married. Run a single factor ANOVA to test the null hypothesis
that the means of CONCEPT for the four groups are the same at the 5% level of significance.
If not, which group has the highest average rating?
SOLUTION:
Analysis of Variance
Effects coding used for categorical variables in model.
The categorical values encountered during processing are

Variables            Levels
GROUPS (4 levels)    1.000  2.000  3.000  4.000

Dependent Variable    CONCEPT
N                     60
Multiple R            0.563
Squared Multiple R    0.317

Analysis of Variance

Source  Type III SS  df  Mean Squares  F-Ratio  p-Value
GROUPS       34.150   3        11.383    8.663    0.000
Error        73.583  56         1.314

INTERPRETATION:
LET US ASSUME:
Ho: the means of CONCEPT for the four groups are the same.
H1: the means of CONCEPT for the four groups are not all the same.
From the analysis of variance, the p value is 0.000, which is less than 0.05 (the 5% level of
significance). Therefore the null hypothesis is rejected and the alternate hypothesis is
accepted.
We interpret that at least one of the means is different.
Now, to find out which mean is different, we need to perform the pairwise comparison.
PAIRWISE COMPARISON:
Post Hoc Test of CONCEPT
Using least squares means.
Using model MSE of 1.314 with 56 df.

Tukey's Honestly-Significant-Difference Test

GROUPS(i)  GROUPS(j)  Difference  p-Value  95% Confidence Interval
                                             Lower     Upper
1          2              -0.569    0.560   -1.719     0.581
1          3               0.148    0.992   -1.263     1.558
1          4              -1.727    0.001   -2.848    -0.606
2          3               0.717    0.454   -0.562     1.996
2          4              -1.158    0.011   -2.109    -0.207
3          4              -1.875    0.001   -3.128    -0.622

INTERPRETATION:
We compare the p values of all 6 pairs of groups and select those pairs whose p value is less
than 0.05.
From the above table, the p values of pairs (1,4), (2,4) and (3,4) are less than 0.05.
Group 4 (Rushmore Married) is common to all the pairs having a p value less than 0.05.
Therefore the mean of RUSHMORE MARRIED (4) is different from the other three. Since every
difference against group 4 is negative, group 4 also has the highest average rating.
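
The group means behind the ANOVA table and the pairwise differences can be recomputed directly from the CONCEPT ratings in the data table. The following Python sketch splits the 60 ratings by GROUPS code and computes the single factor ANOVA by hand; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the original output.

```python
# CONCEPT ratings from the data table, split by the GROUPS code (1 to 4).
ratings = {
    1: [3, 1, 2, 2, 2, 1, 2, 1, 2, 5, 4],
    2: [2, 2, 1, 2, 3, 1, 2, 3, 2, 3, 3, 3, 1, 4, 4, 5, 4, 4, 5],
    3: [2, 2, 1, 3, 3, 1, 1, 4],
    4: [3, 1, 3, 3, 3, 4, 5, 4, 5, 5, 4, 5, 4, 5, 4, 4, 4, 4, 5, 4, 4, 5],
}


def one_way_anova(groups):
    """Return (group means, SS between, SS within, F-ratio)."""
    values = [x for g in groups.values() for x in g]
    n = len(values)
    grand_mean = sum(values) / n
    means = {k: sum(g) / len(g) for k, g in groups.items()}
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (means[k] - grand_mean) ** 2 for k, g in groups.items()
    )
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - means[k]) ** 2 for k, g in groups.items() for x in g)
    df_between = len(groups) - 1
    df_within = n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return means, ss_between, ss_within, f_ratio


means, ss_between, ss_within, f_ratio = one_way_anova(ratings)
highest_group = max(means, key=means.get)

for group, mean in means.items():
    print(f"GROUPS = {group}: n = {len(ratings[group]):2d}, mean = {mean:.3f}")
print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}, F = {f_ratio:.3f}")
print("Group with the highest mean:", highest_group)
```

Printing the means alongside the F-ratio answers both parts of the question: whether the means differ, and which group rates the concept highest.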

QUESTION 4:
Divide NUMCAR into two groups: One, and More than One. Now, using CONCEPT and
the 4 groups (as developed in 3 above), run a 2 way ANOVA with the NUMCAR groups as the
second factor. Is there any difference between the results obtained in 3 above and this
new 2 way ANOVA at the 5% level of significance?

SOLUTION:
Analysis of Variance
Effects coding used for categorical variables in model.
The categorical values encountered during processing are

Variables             Levels
GROUPS (4 levels)     1.000  2.000  3.000  4.000
NUMCAR1 (2 levels)    0.000  1.000

Dependent Variable    CONCEPT
N                     60
Multiple R            0.593
Squared Multiple R    0.351

Analysis of Variance

Source            Type III SS  df  Mean Squares  F-Ratio  p-Value
GROUPS                 34.538   3        11.513    8.568    0.000
NUMCAR1                 2.614   1         2.614    1.945    0.169
GROUPS * NUMCAR1        2.430   3         0.810    0.603    0.616
Error                  69.873  52         1.344

INTERPRETATION:
We run a two way ANOVA on CONCEPT with the 4 GROUPS and NUMCAR1 as the factors.
Let us assume,
Ho g: the mean of CONCEPT is the same across the four GROUPS.
H1 g: the mean of CONCEPT differs across the four GROUPS.
Ho n: the mean of CONCEPT is the same across the two NUMCAR1 groups.
H1 n: the mean of CONCEPT differs across the two NUMCAR1 groups.
From the p values, we can see that the null hypothesis (Ho g) is rejected because its p value
(0.000) is less than 0.05, which means that the mean of CONCEPT differs across the GROUPS.
The p value of NUMCAR1 (0.169) is greater than 0.05, therefore we cannot reject the null
hypothesis (Ho n), i.e. the mean of CONCEPT does not differ significantly between the
NUMCAR1 groups.
The GROUPS * NUMCAR1 interaction (p = 0.616) is not significant either.
Now, to find out which of the GROUPS has a different mean, we perform a pairwise
comparison of the GROUPS.
PAIRWISE COMPARISON
Post Hoc Test of CONCEPT
Using least squares means.
Using model MSE of 1.344 with 52 df.

Tukey's Honestly-Significant-Difference Test

GROUPS(i)  GROUPS(j)  Difference  p-Value  95% Confidence Interval
                                             Lower     Upper
1          2              -0.615    0.529   -1.781     0.550
1          3               0.089    0.998   -1.340     1.519
1          4              -1.786    0.001   -2.922    -0.650
2          3               0.705    0.483   -0.592     2.001
2          4              -1.170    0.012   -2.134    -0.207
3          4              -1.875    0.001   -3.145    -0.605

INTERPRETATION:
We compare the p values of all 6 pairs of groups and select those pairs whose p value is less
than 0.05.
From the above table, the p values of pairs (1,4), (2,4) and (3,4) are less than 0.05.
Group 4 (Rushmore Married) is common to all the pairs having a p value less than 0.05.
Therefore the mean of RUSHMORE MARRIED (4) is different from the other three.
Therefore, there is no difference between the results obtained in question 3 above and this new
2 way ANOVA at the 5% level of significance: GROUPS remains significant, the same three pairs
differ, and NUMCAR1 adds no significant effect.
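
The comparison between the two analyses comes down to checking whether the same pairs of groups fall below the 0.05 threshold in both Tukey tables. A minimal Python sketch of that check follows. It assumes the six rows of each table are in the usual order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), which agrees with the pairs named in the interpretations; the names `one_way`, `two_way` and `significant_pairs` are illustrative.

```python
ALPHA = 0.05

# Tukey p-values keyed by (group i, group j).
one_way = {
    (1, 2): 0.560, (1, 3): 0.992, (1, 4): 0.001,
    (2, 3): 0.454, (2, 4): 0.011, (3, 4): 0.001,
}
two_way = {
    (1, 2): 0.529, (1, 3): 0.998, (1, 4): 0.001,
    (2, 3): 0.483, (2, 4): 0.012, (3, 4): 0.001,
}


def significant_pairs(p_values, alpha=ALPHA):
    """Pairs of groups whose means differ at the given significance level."""
    return {pair for pair, p in p_values.items() if p < alpha}


same_conclusion = significant_pairs(one_way) == significant_pairs(two_way)

print("One way ANOVA:", sorted(significant_pairs(one_way)))
print("Two way ANOVA:", sorted(significant_pairs(two_way)))
print("Same conclusion:", same_conclusion)
```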

QUESTION 5:
Factor analyse the full 60x7 data matrix using principal component analysis with Varimax
rotation. Apply Kaiser's criterion (eigenvalue > 1) to extract the principal components. How
will you interpret each set of rotated factor loadings?
Factor Analysis

Latent Roots (Eigenvalues)

    1      2      3      4      5      6      7
3.413  1.065  0.913  0.663  0.432  0.353  0.161

Component Loadings

               1       2
CONCEPT    0.906  -0.014
CIS        0.460   0.496
AGE        0.855   0.100
MARTIAL    0.619   0.075
NUMCAR    -0.207   0.853
AVAGE     -0.779   0.269
NUMTRIP    0.786   0.048

Variance Explained by Components

    1      2
3.413  1.065

Percent of Total Variance Explained

     1       2
48.754  15.217

Rotated Loading Matrix (VARIMAX, Gamma = 1.000000)

               1       2
CONCEPT    0.904  -0.051
CIS        0.481   0.477
AGE        0.858   0.065
MARTIAL    0.621   0.049
NUMCAR    -0.171   0.861
AVAGE     -0.767   0.301
NUMTRIP    0.788   0.015

"Variance" Explained by Rotated Components

    1      2
2.791  1.063

Percent of Total Variance Explained

     1       2
39.871  15.182

INTERPRETATION:
Two components have eigenvalues greater than 1, so there are two latent factors at work. Together
they explain 63.97% (48.754% + 15.217%) of the total variance in the seven variables.
P1 = 0.906 CONCEPT + 0.460 CIS + 0.855 AGE + 0.619 MARTIAL - 0.207 NUMCAR - 0.779 AVAGE + 0.786 NUMTRIP

P2 = -0.014 CONCEPT + 0.496 CIS + 0.100 AGE + 0.075 MARTIAL + 0.853 NUMCAR + 0.269 AVAGE + 0.048 NUMTRIP

Each variable is assigned to the component on which its rotated loading is largest in absolute
value. CIS loads almost equally on both components (0.481 and 0.477).

FACTORS    PRINCIPAL COMPONENT
CONCEPT    1
CIS        1
AGE        1
MARTIAL    1
NUMCAR     2
AVAGE      1
NUMTRIP    1

Nomenclature of PC1, consisting of the factors CONCEPT, CIS, AGE, MARTIAL, AVAGE and
NUMTRIP: SELF CHARACTERISTICS
Nomenclature of PC2, consisting of the factor NUMCAR: USAGE CHARACTERISTICS
Therefore we can say that self characteristics account for 48.754% of the total variance in the
data and car usage characteristics account for 15.217%.
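
Kaiser's criterion and the assignment of variables to components are both mechanical rules, so they can be sketched in a few lines of Python. The eigenvalues and rotated loadings are copied from the output above; the helper name `dominant_component` is an illustrative assumption.

```python
# Latent roots (eigenvalues) of the 7 x 7 correlation matrix.
eigenvalues = [3.413, 1.065, 0.913, 0.663, 0.432, 0.353, 0.161]

# Varimax-rotated loadings on components 1 and 2.
rotated_loadings = {
    "CONCEPT": (0.904, -0.051),
    "CIS": (0.481, 0.477),
    "AGE": (0.858, 0.065),
    "MARTIAL": (0.621, 0.049),
    "NUMCAR": (-0.171, 0.861),
    "AVAGE": (-0.767, 0.301),
    "NUMTRIP": (0.788, 0.015),
}

# Kaiser's criterion: keep components whose eigenvalue exceeds 1.
retained = [ev for ev in eigenvalues if ev > 1]
total_variance = sum(eigenvalues)  # equals the number of variables (7)
percent_explained = [100 * ev / total_variance for ev in retained]


def dominant_component(loadings):
    """1-based index of the component with the largest absolute loading."""
    strongest = max(range(len(loadings)), key=lambda k: abs(loadings[k]))
    return strongest + 1


assignment = {var: dominant_component(l) for var, l in rotated_loadings.items()}

print("Components retained:", len(retained))
print("Cumulative % of variance:", round(sum(percent_explained), 2))
for var, component in assignment.items():
    print(f"{var:8s} -> component {component}")
```

The percentages computed from the rounded eigenvalues can differ from the printed ones in the third decimal place.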
RECOMMENDATIONS:
1. WISDOM OF OFFERING:

From the CHI SQUARE test of association, we understood that the association between
interest in the policy and the current insurance supplier is statistically significant. Overall,
26 of the 60 respondents (43%) show high interest, so the policy would be acceptable to a
sizeable share of respondents, though not to a majority.
2. TARGET MARKET:

The factors influencing the respondent to buy the insurance are:

The age of the respondents,
The marital status,
The average age of the cars owned,
The number of trips taken by the car owned.
Tukey's Honestly-Significant-Difference Test

AGE1(i)  AGE1(j)  Difference  p-Value  95% Confidence Interval
                                         Lower     Upper
1        2            -0.932    0.022   -1.762    -0.103
1        3            -1.991    0.000   -2.821    -1.162
1        4            -2.883    0.000   -4.054    -1.713
2        3            -1.059    0.010   -1.921    -0.196
2        4            -1.951    0.000   -3.145    -0.757
3        4            -0.892    0.208   -2.086     0.302

The coding of age was as follows:

20-30: 1
30-40: 2
40-50: 3
50 and above: 4

In the pairwise comparison every difference (group i minus group j) is negative, so the mean
rating rises with age. The age groups 40-50 and 50 and above show the highest interest in the
insurance policy, and the difference between these two groups is not significant (p = 0.208).
Also, from the above analysis (question 4), we concluded that the RUSHMORE MARRIED group
has a significantly higher mean rating than the other three groups, and therefore this segment
can serve as the target market of the insurance company.

From the factor analysis, we saw that self characteristics explain 48.754% of the total
variance and car usage characteristics explain 15.217%.
3. FURTHER RESEARCH REQUIRED:
Cluster analysis could be done to segment the market further.