
Case 2: Cross Tabulation

1. For each of the 11 variables, examine its nature (metric, interval, ordinal, nominal).

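Variable          Nature
Salary            Metric (continuous)
Size_Family       Metric (discrete count)
Education         Metric (years of education)
Region            Nominal (two categories)
Lifestyle         Ordinal
Cars              Metric (discrete count)
Credit            Nominal (yes/no)
Stataion_Wagon    Nominal (yes/no)
Foriegn_Car       Nominal (yes/no)
Van               Nominal (yes/no)
Other             Nominal (yes/no)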

Table 1: Nature of Variables

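2. Carry out the data cleaning steps, identify the possible outliers or extreme values and remove them if necessary.
The data were checked and contain no missing values, as the Case Processing Summary below shows. No values were removed as outliers; note, however, that income is strongly right-skewed, with a maximum of 304200 against a median of 35400 (see question 3).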

Case Processing Summary
                                                Cases
                                                Valid            Missing          Total
                                                N     Percent    N    Percent     N     Percent
Size of the Family                              100   100.0%     0    0.0%        100   100.0%
Years of Education of the head of the family    100   100.0%     0    0.0%        100   100.0%
Area (Northern or Southern)                     100   100.0%     0    0.0%        100   100.0%
Life Style                                      100   100.0%     0    0.0%        100   100.0%
Number of the cars in possession                100   100.0%     0    0.0%        100   100.0%
Buy cars on credit                              100   100.0%     0    0.0%        100   100.0%
Have a station wagon?                           100   100.0%     0    0.0%        100   100.0%
Have a foreign economic car?                    100   100.0%     0    0.0%        100   100.0%
Have a Van?                                     100   100.0%     0    0.0%        100   100.0%
Have another type of a car?                     100   100.0%     0    0.0%        100   100.0%
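The screening in step 2 can be sketched in Python as below. This is a minimal illustration, not the procedure used for this report: it assumes missing values are stored as None and flags extreme values with Tukey's 1.5 × IQR fences, a common convention. The helper name `screen` and the income figures are hypothetical.

```python
from statistics import quantiles


def screen(values):
    """Return (number missing, (lower fence, upper fence), values outside the fences)."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    # Quartiles by linear interpolation; SPSS and other packages may define them slightly differently.
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < lower or v > upper]
    return missing, (lower, upper), outliers


# Hypothetical incomes (not the case data); None marks a missing value.
incomes = [0, 12000, 18500, 24000, 30000, 35400, 41000, 52000, 61000, 304200, None]
missing, fences, outliers = screen(incomes)
print(f"missing = {missing}, fences = {fences}, outliers = {outliers}")
```

With these figures one value is missing and only 304200 lies outside the fences. Because quartile definitions differ between packages, fences computed this way may not match SPSS boxplot limits exactly.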
3. Examine the distribution of income. Is income normally distributed? If not, which
transformation would you recommend to make it normal?

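To check whether income is normally distributed, its descriptive statistics (skewness and kurtosis) and the Kolmogorov-Smirnov and Shapiro-Wilk tests of normality were examined, together with a one-sample t-test. The following results were observed.
Interpretations

There are 100 respondents. Mean income is 43105, with a standard deviation of 37918.172, as shown in the One-Sample Statistics table. The 95% confidence interval for the mean runs from 35581.21 to 50628.79, and the one-sample t-test gives t = 11.368 with df = 99 and Sig. = .000 (p < .001). This test only shows that mean income differs from 0; it does not test normality.

Normality is assessed in Tables 2 and 3. The mean (43105) lies well above the median (35400), skewness is 3.843 (Std. Error .241) and kurtosis is 22.614 (Std. Error .478), and both the Kolmogorov-Smirnov test (.201, p < .001) and the Shapiro-Wilk test (.675, p < .001) reject normality. The income of the family is therefore not normally distributed: it is strongly right-skewed with a heavy right tail. A logarithmic transformation is the usual remedy for this shape; because the minimum income is 0, log(income + 1) should be used (a square-root transformation is a milder alternative), and the tests of normality should be repeated on the transformed variable.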

                        Cases
                        Valid            Missing          Total
                        N     Percent    N    Percent     N     Percent
Income of the family    100   100.0%     0    0.0%        100   100.0%
Table 1: Case Processing Summary

Income of the family                                Statistic         Std. Error
Mean                                                43105.00          3791.817
95% Confidence Interval for Mean    Lower Bound     35581.21
                                    Upper Bound     50628.79
5% Trimmed Mean                                     38484.44
Median                                              35400.00
Variance                                            1437787752.525
Std. Deviation                                      37918.172
Minimum                                             0
Maximum                                             304200
Range                                               304200
Interquartile Range                                 37350
Skewness                                            3.843             .241
Kurtosis                                            22.614            .478
Table 2: Descriptives

                        Kolmogorov-Smirnov        Shapiro-Wilk
                        Statistic   df    Sig.    Statistic   df    Sig.
Income of the family    .201        100   .000    .675        100   .000
Table 3: Tests of Normality
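Two checks behind this answer can be reproduced with a short Python sketch: the ratio of skewness and kurtosis to their standard errors from Table 2, and a comparison of candidate transformations by how much each reduces skewness. The ±1.96 cut-off and the three candidate transformations are common conventions assumed here, the helper name `skewness` is ours, and the income list is hypothetical rather than the case data.

```python
import math
from statistics import mean, stdev


def skewness(xs):
    """Adjusted Fisher-Pearson skewness G1, the sample skewness SPSS reports."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


# Statistic / standard error from Table 2; values far beyond +/-1.96 point to non-normality.
z_skew = 3.843 / 0.241
z_kurt = 22.614 / 0.478

# Hypothetical right-skewed incomes (illustration only, not the case data).
incomes = [8000, 12000, 18000, 25000, 30000, 35400, 42000, 55000, 80000, 304200]
candidates = {
    "raw": incomes,
    "sqrt": [math.sqrt(x) for x in incomes],
    "log1p": [math.log1p(x) for x in incomes],  # log(x + 1) is defined for an income of 0
}
skews = {name: skewness(vals) for name, vals in candidates.items()}
best = min(skews, key=lambda name: abs(skews[name]))

print(f"z(skewness) = {z_skew:.2f}, z(kurtosis) = {z_kurt:.2f}")
for name, g1 in skews.items():
    print(f"{name:>5}: skewness = {g1:.3f}")
print("least skewed:", best)
```

On the real data, the transformation chosen this way would still need to be checked with the Kolmogorov-Smirnov and Shapiro-Wilk tests.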

4. After preparing the data, apply the cross tabulation and chi-square test of independence.

To describe the relationship between two categorical variables, we use a special type of table
called a cross-tabulation. This type of table is also known as a Crosstab. In a crosstab, the
categories of one variable determine the rows of the table, and the categories of the other variable
determine the columns.

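A cross-tabulation and a chi-square test of independence were applied between Area and each of the other variables, with the following hypotheses:
H0: There is no relation between Area and the other variable.
H1: There is a relation between Area and the other variable.
H0 is rejected when the p-value (Asymp. Sig.) is below the .05 significance level.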
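Each test below compares the observed counts O of a crosstab with the counts E expected under H0, where E = row total × column total / N, and sums (O − E)²/E over the cells. A minimal Python sketch of that computation (the function name `chi_square` is ours, not an SPSS routine), with Yates' continuity correction as an option for 2 × 2 tables, checked against the Have a Van? × Area counts reported in question 7:

```python
def chi_square(table, yates=False):
    """Pearson chi-square statistic for an r x c table of observed counts.

    Returns (statistic, degrees of freedom, expected counts). yates=True applies
    the continuity correction, which is only meaningful for 2 x 2 tables.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            diff = max(0.0, abs(o - e) - 0.5) if yates else o - e
            stat += diff ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Have a Van? (rows 1, 2) by Area (columns 1, 2): counts from the crosstab in question 7.
van_by_area = [[57, 23],
               [3, 17]]

stat, df, expected = chi_square(van_by_area)
stat_cc, _, _ = chi_square(van_by_area, yates=True)
min_expected = min(min(row) for row in expected)
print(f"Pearson chi-square = {stat:.3f}, df = {df}")
print(f"Continuity-corrected = {stat_cc:.3f}")
print(f"Minimum expected count = {min_expected:.2f}")
```

The printed values agree with the Pearson Chi-Square (21.094), Continuity Correction (18.815) and minimum expected count (8.00) that SPSS reports for this table.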

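a) Area (Northern or Southern) * Income of the family:

In the table below, the p-value (.558) is greater than the .05 significance level, so H0 is not rejected. This result is not reliable, however: income was entered as a categorical variable with 89 distinct values, so all 178 cells (100%) have expected counts below 5, and the likelihood-ratio test (p = .027) disagrees with the Pearson test.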
Chi-Square Tests
                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              85.417a    88   .558
Likelihood Ratio                115.194    88   .027
Linear-by-Linear Association    .074       1    .786
N of Valid Cases                100
a. 178 cells (100.0%) have expected count less than 5. The minimum expected count is .40.

b) Area (Northern or Southern) * Size of the Family

In the table below, the p-value (.882) is greater than the .05 significance level, so H0 is not rejected.

Chi-Square Tests
                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              4.418a     9    .882
Likelihood Ratio                5.114      9    .824
Linear-by-Linear Association    .000       1    1.000
N of Valid Cases                100
a. 14 cells (70.0%) have expected count less than 5. The minimum expected count is .40.

c) Area (Northern or Southern) * Years of Education of the head of the family

In the table below, the p-value (.498) is greater than the .05 significance level, so H0 is not rejected.

Chi-Square Tests
                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              10.368a    11   .498
Likelihood Ratio                12.844     11   .304
Linear-by-Linear Association    .764       1    .382
N of Valid Cases                100
a. 19 cells (79.2%) have expected count less than 5. The minimum expected count is .40.
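d) Area (Northern or Southern) * Life Style
In the table below, the p-value (.040) is smaller than the .05 significance level, so H0 is rejected: life style is associated with area. The evidence is borderline, since the continuity-corrected test (p = .065) and Fisher's exact test (p = .064) are not significant at .05.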
Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              4.209a     1    .040
Continuity Correction b         3.409      1    .065
Likelihood Ratio                4.220      1    .040
Fisher's Exact Test                                           .064         .032
Linear-by-Linear Association    4.167      1    .041
N of Valid Cases                100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 18.00.
b. Computed only for a 2x2 table
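For the 2 × 2 tables in this section (Area by a two-category variable), the p-values and the linear-by-linear statistic can be checked by hand. The Python sketch below relies on two standard identities: a chi-square statistic with 1 df is the square of a standard normal Z, so p = erfc(√(χ²/2)); and in a 2 × 2 table the linear-by-linear statistic (N − 1)r² equals (N − 1)/N times Pearson's χ². The function names are illustrative, and the inputs are the Area × Life Style values from the table above.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 df the statistic is Z**2 for a standard normal Z, so
    P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


def linear_by_linear_2x2(pearson_chi2, n):
    """Linear-by-linear statistic (N - 1) * r**2 for a 2 x 2 table.

    In a 2 x 2 table Pearson's chi-square equals N * r**2, so the
    linear-by-linear value is (N - 1) / N times the Pearson value.
    """
    return pearson_chi2 * (n - 1) / n


# Area x Life Style values from the table above (N = 100).
pearson, corrected, n = 4.209, 3.409, 100
lbl = linear_by_linear_2x2(pearson, n)
print(f"Pearson p = {chi2_p_value_df1(pearson):.3f}")
print(f"Continuity-corrected p = {chi2_p_value_df1(corrected):.3f}")
print(f"Linear-by-linear = {lbl:.3f}, p = {chi2_p_value_df1(lbl):.3f}")
```

The same two functions reproduce the Asymp. Sig. and Linear-by-Linear Association rows of the other 2 × 2 tables in this question.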

e) Area (Northern or Southern) * Number of the cars in possession

In the table below, the p-value (.819) is greater than the .05 significance level, so H0 is not rejected.

Chi-Square Tests
                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              .400a      2    .819
Likelihood Ratio                .402       2    .818
Linear-by-Linear Association    .111       1    .739
N of Valid Cases                100
a. 2 cells (33.3%) have expected count less than 5. The minimum expected count is .80.
f) Area (Northern or Southern) * Buy cars on credit
In the table below, the p-value (.181) is greater than the .05 significance level, so H0 is not rejected.

Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              1.786a     1    .181
Continuity Correction b         1.240      1    .265
Likelihood Ratio                1.826      1    .177
Fisher's Exact Test                                           .265         .132
Linear-by-Linear Association    1.768      1    .184
N of Valid Cases                100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 12.00.
b. Computed only for a 2x2 table

g) Area (Northern or Southern) * Have a station wagon?

In the table below, the p-value (.755) is greater than the .05 significance level, so H0 is not rejected.
Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              .097a      1    .755
Continuity Correction b         .003       1    .959
Likelihood Ratio                .098       1    .754
Fisher's Exact Test                                           .801         .484
Linear-by-Linear Association    .096       1    .756
N of Valid Cases                100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 7.60.
b. Computed only for a 2x2 table
h) Area (Northern or Southern) * Have a foreign economic car?
In the table below, the p-value (.117) is greater than the .05 significance level, so H0 is not rejected.

Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              2.451a     1    .117
Continuity Correction b         1.536      1    .215
Likelihood Ratio                2.697      1    .101
Fisher's Exact Test                                           .192         .105
Linear-by-Linear Association    2.427      1    .119
N of Valid Cases                100
a. 1 cell (25.0%) has expected count less than 5. The minimum expected count is 4.40.
b. Computed only for a 2x2 table
i) Area (Northern or Southern) * Have a Van?
In the table below, the p-value (.000, i.e. p < .001) is smaller than the .05 significance level, so H0 is rejected in favour of H1.
Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              21.094a    1    .000
Continuity Correction b         18.815     1    .000
Likelihood Ratio                21.710     1    .000
Fisher's Exact Test                                           .000         .000
Linear-by-Linear Association    20.883     1    .000
N of Valid Cases                100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 8.00.
b. Computed only for a 2x2 table
j) Area (Northern or Southern) * Have another type of a car?
In the table below, the p-value (.002) is smaller than the .05 significance level, so H0 is rejected in favour of H1.

Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              9.557a     1    .002
Continuity Correction b         8.203      1    .004
Likelihood Ratio                9.472      1    .002
Fisher's Exact Test                                           .003         .002
Linear-by-Linear Association    9.461      1    .002
N of Valid Cases                100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 11.20.
b. Computed only for a 2x2 table
5. After analyzing the data, try to make a simplified table to present all the results to the managers (professionals).

Variable          p-value (Pearson Chi-Square)   Associated with Area at .05?
Size_Family       0.882                          No
Education         0.498                          No
Lifestyle         0.040                          Yes
Cars              0.819                          No
Credit            0.181                          No
Stataion_Wagon    0.755                          No
Foriegn_Car       0.117                          No
Van               < 0.001                        Yes
Other             0.002                          Yes
Salary            0.558                          No
(professionals).
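If the summary has to be regenerated when the data change, the sorting and labelling can be scripted. A minimal Python sketch, assuming the p-values are typed in from the Asymp. Sig. column of the Pearson tests in question 4 and the .05 significance level used there; the names `p_values`, `ranked` and `significant` are illustrative.

```python
ALPHA = 0.05  # significance level used throughout question 4

# Asymp. Sig. (2-sided) of the Pearson chi-square test of each variable against Area.
p_values = {
    "Size_Family": 0.882, "Education": 0.498, "Lifestyle": 0.040, "Cars": 0.819,
    "Credit": 0.181, "Stataion_Wagon": 0.755, "Foriegn_Car": 0.117,
    "Van": 0.000, "Other": 0.002, "Salary": 0.558,
}

# Strongest evidence first; SPSS prints p < .0005 as .000, shown here as "< 0.001".
ranked = sorted(p_values.items(), key=lambda item: item[1])
for name, p in ranked:
    shown = "< 0.001" if p < 0.001 else f"{p:.3f}"
    verdict = "associated with Area" if p < ALPHA else "no evidence of association"
    print(f"{name:<15} p = {shown:<8} {verdict}")

significant = [name for name, p in ranked if p < ALPHA]
```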
6. Does the number of cars possessed depend more on the income of the
family or the size of the family? Does there exist an interaction between
these two factors?
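The dependence of the number of cars possessed on income and on family size is analyzed with two simple regression analyses. The interaction between the two factors is not tested in this output; testing it would require a single regression with income, family size and their product (income × family size) as predictors.
First, the number of cars in possession is regressed on the income of the family, with the following results.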

Model Summary b
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .245a   .060       .050                .477
a. Predictors: (Constant), Income of the family
b. Dependent Variable: Number of the cars in possession

ANOVA a
Model               Sum of Squares   df   Mean Square   F        Sig.
1   Regression      1.422            1    1.422         6.254    .014b
    Residual        22.288           98   .227
    Total           23.710           99
a. Dependent Variable: Number of the cars in possession
b. Predictors: (Constant), Income of the family

Coefficients a
                            95.0% Confidence Interval for B
Model                       Lower Bound   Upper Bound
1   (Constant)              .990          1.277
    Income of the family    .000          .000
a. Dependent Variable: Number of the cars in possession

Residuals Statistics a
                        Minimum   Maximum   Mean   Std. Deviation   N
Predicted Value         1.13      2.10      1.27   .120             100
Residual                -1.095    1.553     .000   .474             100
Std. Predicted Value    -1.137    6.886     .000   1.000            100
Std. Residual           -2.297    3.257     .000   .995             100
a. Dependent Variable: Number of the cars in possession

Next, the number of cars in possession is regressed on the size of the family, with the following results.

Variables Entered/Removed a
Model   Variables Entered      Variables Removed   Method
1       Size of the Family b   .                   Enter
a. Dependent Variable: Number of the cars in possession
b. All requested variables entered.

Model Summary b
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .587a   .344       .338                .398
a. Predictors: (Constant), Size of the Family
b. Dependent Variable: Number of the cars in possession

ANOVA a
Model               Sum of Squares   df   Mean Square   F        Sig.
1   Regression      8.161            1    8.161         51.438   .000b
    Residual        15.549           98   .159
    Total           23.710           99
a. Dependent Variable: Number of the cars in possession
b. Predictors: (Constant), Size of the Family
Coefficients a
                            95.0% Confidence Interval for B
Model                       Lower Bound   Upper Bound
1   (Constant)              .481          .851
    Size of the Family      .111          .195
a. Dependent Variable: Number of the cars in possession

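Residuals Statistics a
                        Minimum   Maximum   Mean   Std. Deviation   N
Predicted Value         .97       2.35      1.27   .287             100
Residual                -.737     1.569     .000   .396             100
Std. Predicted Value    -1.039    3.756     .000   1.000            100
Std. Residual           -1.849    3.940     .000   .995             100
a. Dependent Variable: Number of the cars in possession

Family size explains far more of the variation in the number of cars (R Square = .344, F = 51.438, p < .001) than income does (R Square = .060, F = 6.254, p = .014), so the number of cars possessed depends more on the size of the family than on its income.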
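Both models' fit statistics follow directly from their ANOVA tables: R Square = SS(regression) / SS(total), adjusted R Square = 1 − (1 − R²)(n − 1)/(n − k − 1), and F = (SS(regression)/k) / (SS(residual)/(n − k − 1)) for k predictors. A short Python check using the sums of squares printed above (the helper name `regression_fit` is ours; small differences from SPSS come from the rounding of the printed sums of squares):

```python
def regression_fit(ss_reg, ss_res, n, k):
    """R Square, adjusted R Square and F from an ANOVA table (n cases, k predictors)."""
    ss_total = ss_reg + ss_res
    r2 = ss_reg / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f = (ss_reg / k) / (ss_res / (n - k - 1))
    return r2, adj_r2, f


# Sums of squares from the two ANOVA tables above (n = 100, one predictor each).
models = {
    "Income of the family": regression_fit(1.422, 22.288, n=100, k=1),
    "Size of the Family": regression_fit(8.161, 15.549, n=100, k=1),
}
for name, (r2, adj_r2, f) in models.items():
    print(f"{name:<22} R2 = {r2:.3f}  adjusted R2 = {adj_r2:.3f}  F = {f:.2f}")
```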
7. Does there exist an association between the fact of having a Van and the life style? What happens to this association when the area is considered? Finally, is the possession of a Van related to the area or to the life style?
Does there exist an association between the fact of having a Van and the life style?
To find out whether having a van is related to life style, a cross-tabulation analysis was done, with the following results. The Pearson chi-square p-value (.315) is greater than .05, so there is no association between having a van and life style.

Case Processing Summary
                            Cases
                            Valid            Missing          Total
                            N     Percent    N    Percent     N     Percent
Have a Van? * Life Style    100   100.0%     0    0.0%        100   100.0%

Have a Van? * Life Style Crosstabulation
Count
                    Life Style
                    1       2       Total
Have a Van?    1    46      34      80
               2    9       11      20
Total               55      45      100

Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              1.010a     1    .315
Continuity Correction b         .568       1    .451
Likelihood Ratio                1.005      1    .316
Fisher's Exact Test                                           .329         .225
Linear-by-Linear Association    1.000      1    .317
N of Valid Cases                100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 9.00.
b. Computed only for a 2x2 table
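Fisher's exact test in the table above keeps the row and column totals fixed and sums hypergeometric probabilities of tables at least as extreme as the observed one. A Python sketch using only `math.comb` (Python 3.8+), applied to the Have a Van? × Life Style counts; the function name is ours, and the two-sided rule assumed here (sum of all tables no more probable than the observed one) is the common definition, which should reproduce the .225 and .329 reported by SPSS up to rounding.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Returns (one-sided p in the direction of the observed deviation,
    two-sided p summing every table no more probable than the observed one).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    probs = {x: prob(x) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    if a >= row1 * col1 / n:
        one_sided = sum(p for x, p in probs.items() if x >= a)
    else:
        one_sided = sum(p for x, p in probs.items() if x <= a)
    # Small relative tolerance so tables tied with the observed one are counted.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return one_sided, min(two_sided, 1.0)


# Have a Van? (rows 1, 2) by Life Style (columns 1, 2): counts from the crosstab above.
one, two = fisher_exact_2x2(46, 34, 9, 11)
print(f"one-sided p = {one:.3f}, two-sided p = {two:.3f}")
```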

What happens to this association when the area is considered?

To find out whether having a van is related to area, a cross-tabulation analysis was done, with the following results. The Pearson chi-square p-value (.000, i.e. p < .001) is less than .05, so there is an association between having a van and area.

Case Processing Summary
                                             Cases
                                             Valid            Missing          Total
                                             N     Percent    N    Percent     N     Percent
Have a Van? * Area (Northern or Southern)    100   100.0%     0    0.0%        100   100.0%

Have a Van? * Area (Northern or Southern) Crosstabulation
Count
                    Area (Northern or Southern)
                    1       2       Total
Have a Van?    1    57      23      80
               2    3       17      20
Total               60      40      100
Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              21.094a    1    .000
Continuity Correction b         18.815     1    .000
Likelihood Ratio                21.710     1    .000
Fisher's Exact Test                                           .000         .000
Linear-by-Linear Association    20.883     1    .000
N of Valid Cases                100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 8.00.
b. Computed only for a 2x2 table

Finally, the possession of a Van is related to the area (p < .001) rather than to the life style (p = .315).

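8. Does the possession of economic foreign cars depend on the size of the family?
The crosstabulation and chi-square test below give a Pearson chi-square of 24.970 (df = 9, p = .003), which suggests that the possession of economic foreign cars depends on the size of the family. This result should be treated with caution: 16 of the 20 cells (80%) have expected counts below 5, and the likelihood-ratio test is not significant at the .05 level (p = .051).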
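Whether the chi-square approximation can be trusted for this table depends on its expected counts. A Python sketch (standard library only; helper names are illustrative) that rebuilds them from the Size of the Family × foreign-car crosstab, recomputes Pearson's statistic and counts the cells below 5. The rule of thumb that no more than 20% of cells should have expected counts below 5 is a common convention, not something stated in this report.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson's statistic: sum over cells of (observed - expected)**2 / expected."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))


# Rows: Size of the Family = 2, 3, ..., 11; columns: foreign economic car = 1, 2.
family_by_foreign = [
    [17, 2], [28, 2], [28, 1], [3, 2], [6, 1],
    [3, 0], [2, 1], [2, 0], [0, 1], [0, 1],
]

cells = [e for row in expected_counts(family_by_foreign) for e in row]
small = sum(1 for e in cells if e < 5)
chi2 = pearson_chi_square(family_by_foreign)
df = (len(family_by_foreign) - 1) * (len(family_by_foreign[0]) - 1)
print(f"Pearson chi-square = {chi2:.3f}, df = {df}")
print(f"{small} of {len(cells)} cells ({100 * small / len(cells):.0f}%) have expected count < 5; "
      f"minimum expected count = {min(cells):.2f}")
```

The output matches the SPSS footnote (16 cells, 80.0%, minimum .11), far above the 20% guideline.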

Size of the Family * Have a foreign economic car? Crosstabulation
Count
                      Have a foreign economic car?
Size of the Family    1       2       Total
2                     17      2       19
3                     28      2       30
4                     28      1       29
5                     3       2       5
6                     6       1       7
7                     3       0       3
8                     2       1       3
9                     2       0       2
10                    0       1       1
11                    0       1       1
Total                 89      11      100

Chi-Square Tests
                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              24.970a    9    .003
Likelihood Ratio                16.830     9    .051
Linear-by-Linear Association    7.011      1    .008
N of Valid Cases                100
a. 16 cells (80.0%) have expected count less than 5. The minimum expected count is .11.

9. Does the possession of a station wagon have any association with the size of the family? Does this conclusion remain when one adds the effect of the income?

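H0: There is no association between having a station wagon and the size of the family.

H1: There is an association between the variables.

In the table below, the p-value (.000, i.e. p < .001) is less than the .05 significance level, so H0 is rejected. Because 14 cells (70%) have expected counts below 5, the chi-square approximation is rough, but the likelihood-ratio test (p < .001) leads to the same conclusion.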
Chi-Square Tests
                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              59.589a    9    .000
Likelihood Ratio                53.811     9    .000
Linear-by-Linear Association    37.268     1    .000
N of Valid Cases                100
a. 14 cells (70.0%) have expected count less than 5. The minimum expected count is .19.

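When the effect of income is added:

A regression of station-wagon possession on family size and income is significant overall (F = 29.841, df = 2 and 97, p < .001), so the association between having a station wagon and the size of the family appears to remain once income is taken into account. The individual coefficients are not reproduced here, so this output alone does not show how much of the fit is due to income.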

Descriptive Statistics
                         Mean       Std. Deviation   N
Have a station wagon?    1.19       .394             100
Size of the Family       3.95       1.877            100
Income of the family     43105.00   37918.172        100

ANOVA a
Model               Sum of Squares   df   Mean Square   F        Sig.
1   Regression      5.862            2    2.931         29.841   .000b
    Residual        9.528            97   .098
    Total           15.390           99
a. Dependent Variable: Have a station wagon?
b. Predictors: (Constant), Income of the family, Size of the Family
10. Can the use of a credit for the purchase of a car be explained by the level of education? What happens if one considers simultaneously the effect of the income?

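H0: There is no relationship between the level of education and buying a car on credit.
H1: There is a relationship between the variables.

In the table below, the p-value (.181) is greater than the .05 significance level, so H0 is not rejected: there is no evidence of a relationship between these two variables. Note that this table (df = 1, minimum expected count 12.00) is identical to the Area * Buy cars on credit table in 4 f); since Years of Education has 12 categories (df = 11 in 4 c), the Education * credit crosstab should be checked before relying on this conclusion.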

Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              1.786a     1    .181
Continuity Correction b         1.240      1    .265
Likelihood Ratio                1.826      1    .177
Fisher's Exact Test                                           .265         .132
Linear-by-Linear Association    1.768      1    .184
N of Valid Cases                100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 12.00.
b. Computed only for a 2x2 table
When the effect of income is added, the regression of buying cars on credit on education and income is not significant (F = 1.873, p = .159), so the conclusion remains the same.
ANOVA a
Model               Sum of Squares   df   Mean Square   F        Sig.
1   Regression      .781             2    .391          1.873    .159b
    Residual        20.219           97   .208
    Total           21.000           99
a. Dependent Variable: Buy cars on credit
b. Predictors: (Constant), Income of the family, Years of Education of the head of the family
11.Is the possession of a station-wagon affected by the area in which the
family lives? Is this conclusion upheld when one adds the size of the
family?

A cross-tabulation and a chi-square test of independence were applied, with H0: possession of a station wagon is independent of the area.

Area (Northern or Southern) * Have a station wagon?

In the table below, the p-value (.755) is greater than the .05 significance level, so H0 is not rejected: possession of a station wagon does not depend on the area.
Case Processing Summary
                                                       Cases
                                                       Valid            Missing          Total
                                                       N     Percent    N    Percent     N     Percent
Have a station wagon? * Area (Northern or Southern)    100   100.0%     0    0.0%        100   100.0%

Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              .097a      1    .755
Continuity Correction b         .003       1    .959
Likelihood Ratio                .098       1    .754
Fisher's Exact Test                                           .801         .484
Linear-by-Linear Association    .096       1    .756
N of Valid Cases                100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 7.60.
b. Computed only for a 2x2 table

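When the size of the family is added, a regression of station-wagon possession on area and family size is significant overall (F = 29.402, p < .001), as the following table shows. Since area alone showed no association with having a station wagon (p = .755), this significance is most plausibly due to family size, which is strongly associated with station-wagon possession (question 9); the conclusion that area does not affect station-wagon possession is therefore upheld.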

ANOVA a
Model               Sum of Squares   df   Mean Square   F        Sig.
1   Regression      5.809            2    2.904         29.402   .000b
    Residual        9.581            97   .099
    Total           15.390           99
a. Dependent Variable: Have a station wagon?
b. Predictors: (Constant), Area (Northern or Southern), Size of the Family

Have a station wagon? * Area (Northern or Southern) Crosstabulation
Count
                              Area (Northern or Southern)
                              1       2       Total
Have a station wagon?    1    48      33      81
                         2    12      7       19
Total                         60      40      100