
Unit 7

Hypothesis Testing

Hypothesis – it represents a researcher's expectation about what is true of the population parameter.

Formulation of Hypothesis

Null Hypothesis – H0 – A statement in which no difference or effect is expected. If the null hypothesis is not rejected, no changes will be made. It is a statement of no association.

In formulating the null hypothesis, words such as "no", "not", "same", or "independent" will usually be part of the stated hypothesis.

Alternative Hypothesis – H1 – A statement that some difference or effect is expected. Accepting the alternative hypothesis will lead to changes in opinions or actions. It is a statement of association.

Significance level – It is the level of risk, or probability, a researcher accepts of rejecting the null hypothesis when it is in fact true. Commonly used significance levels are 1%, 5%, and 10%. A 5% significance level is interpreted as (100 – 95)/100 = 0.05, where 95% is the researcher's confidence level about the population parameter.

Degrees of freedom (df) – the amount of information available to estimate population parameters from sample statistics; that is, the number of values in a sample that are free to vary once something is known about the sample.

 Importance of Hypothesis

 Provides the direction to the study

 Identifies relevant facts for the study

Facilitates interpretation and the drawing of conclusions from the study

 Useful in framing the entire research process

 It shows the extent of knowledge gained by the researcher

Tests of Hypotheses

Parametric Tests – When the data are on a continuous scale (interval or ratio scale), parametric tests can be used.

Ex: Parametric tests

t-test – for small samples (n < 30)

Z-test – for large samples (n > 30)

ANOVA – Analysis of Variance (both one-way and two-way)

Non-Parametric Tests – When the data are on a categorical scale (nominal or ordinal), non-parametric tests can be used.

Ex: Non-parametric tests

Chi-square test, Kolmogorov–Smirnov D test, Wilcoxon matched-pairs test, etc.
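To illustrate how this choice of test translates into practice, here is a minimal Python sketch (assuming scipy is available; the sample data are invented for illustration, apart from the frequencies reused from the chi-square example later in these notes):

```python
# Minimal sketch: a parametric and a non-parametric test with scipy.
from scipy import stats

# Parametric: one-sample t-test on interval/ratio data (small sample, n < 30)
scores = [52, 48, 55, 60, 47, 53, 51, 58]          # hypothetical test scores
t_stat, t_p = stats.ttest_1samp(scores, popmean=50)
print(f"t = {t_stat:.3f}, p = {t_p:.3f}")

# Parametric: one-way ANOVA comparing three hypothetical groups
g1, g2, g3 = [52, 55, 58], [49, 47, 50], [60, 62, 59]
f_stat, f_p = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.3f}, p = {f_p:.3f}")

# Non-parametric: chi-square goodness-of-fit on categorical counts
observed = [15, 32, 9, 12, 2]                      # frequencies per category
chi2_stat, chi_p = stats.chisquare(observed)       # equal expected frequencies
print(f"chi-square = {chi2_stat:.2f}, p = {chi_p:.4f}")
```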

Chi Square Analysis

Karl Pearson introduced the chi-square test in 1900.

Chi-Square is used to determine how the observed frequencies differ from expected
frequencies.

Chi-square is a non-parametric test of hypothesis. It can be used for both nominal and ordinal data.

Classification of Chi-Square test

Chi-Square goodness-of-fit test[for one variable]

Chi-Square test of independence[for two variables]
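The test of independence is only named here and is not worked through later in these notes, so the following is a minimal hedged sketch of how it could be run in Python with scipy; the 2×3 contingency table is entirely hypothetical:

```python
# Minimal sketch of the chi-square test of independence (two variables).
# The 2x3 contingency table below is hypothetical, for illustration only.
from scipy.stats import chi2_contingency

# Rows: gender (male, female); columns: preference (brand A, B, C)
observed = [[20, 30, 10],
            [25, 15, 20]]

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2_stat:.3f}, df = {dof}, p = {p_value:.4f}")
# If p < 0.05, reject H0 that the two variables are independent.
```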

Chi-Square goodness-of-fit test (one-dimensional chi-square, for one variable)

The chi-square goodness-of-fit test is used to analyze the distribution of frequencies across the categories of a single variable. It is applied to categorized data, such as age groups or the number of bank arrivals per time interval, to determine whether the observed distribution of frequencies is the same as some hypothesized or expected distribution. However, the goodness-of-fit test cannot be used to analyze two variables simultaneously.

Ex: of one-dimensional chi-square test

To test the hypothesis that training is significant in improving the skills of salesmen at the 5% significance level [table value at 5% significance level is 9.488]

Dimension            Strongly Agree   Agree   Can't say   Disagree   Strongly Disagree
No. of respondents   15               32      9           12         2

Solution:

Null Hypothesis [H0]: Training is not significant in improving the skills of the salesmen

Alternative Hypothesis [H1]: Training is significant in improving the skills of the salesmen

Formula for calculating Chi-Square:

χ² = ∑(Oi − Ei)²/Ei

where Oi = observed frequencies and Ei = expected frequencies

Determination of expected frequencies

Expected frequency = total number of respondents / number of categories (dimensions) = 70/5 = 14

Calculation of observed and expected frequencies

Observed frequencies (Oi)   Expected frequencies (Ei)   (Oi − Ei)   (Oi − Ei)²   (Oi − Ei)²/Ei
15                          14                            1            1           0.071
32                          14                           18          324          23.143
 9                          14                           -5           25           1.786
12                          14                           -2            4           0.286
 2                          14                          -12          144          10.286
Total                                                                              35.57

Formula for 2 = ∑(Oi-Ei)2/Ei = 35.56

Degrees of freedom = (n-1) = (5-1) = 4

(where n is the number of dimensions)

Expected frequencies = number of respondents/number of dimensions = 70/14 = 5

Formula for chi-square = ∑(Oi-Ei)2/Ei = 35.56

The calculated value of Chi-square test is tested at degrees of freedom

Degrees of freedom = (n-1) = (5-1) = 4, where n is the number of dimensions

Table value at 5% significance level for 4 degrees of freedom from chi-square table
= 9.488

Conclusion

Since the calculated value (35.57) is greater than the table or critical value (9.488), we reject the null hypothesis and conclude that training is significant in improving the skills of salesmen.
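As an optional check of this worked example, here is a minimal Python sketch (assuming scipy is available) that reproduces the calculated statistic and the critical value:

```python
# Minimal check of the goodness-of-fit worked example using scipy.
from scipy.stats import chisquare, chi2

observed = [15, 32, 9, 12, 2]                     # responses per dimension
n = sum(observed)                                 # 70 respondents
expected = [n / len(observed)] * len(observed)    # 70 / 5 = 14 per category

stat, p_value = chisquare(observed, f_exp=expected)
critical = chi2.ppf(0.95, df=len(observed) - 1)   # 5% level, df = 4

print(f"chi-square = {stat:.2f}")                 # ~35.57
print(f"critical value = {critical:.3f}")         # ~9.488
print("Reject H0" if stat > critical else "Fail to reject H0")
```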

Practice Problems in One Dimensional Chi-Square Test

1. Test at the 5% significance level whether Price is an important indicator in influencing customers or not: [table value at 5% = 9.488]

Dimension Number of Respondents

Very Important 50

Somewhat Important 60

Neither Important nor Unimportant 20

Somewhat Unimportant 40

Very Unimportant 30

2. Test at the 5% significance level whether Brand Image is an important component in promoting cosmetics or not: [table value at 5% = 12.592]

Dimension               Number of Respondents

Extremely Important     150

Important               90

Somewhat Important      30

Neutral                 40

Somewhat Unimportant    30

Unimportant             50

Extremely Unimportant   60

3. The demand for a particular spare part in a factory was found to vary from day to day. In a sample study, the following information was obtained:

Day:                        Mon    Tue    Wed    Thu    Fri    Sat

Number of parts demanded:   1124   1125   1110   1120   1126   1115

Test the hypothesis that the number of parts demanded does not depend on the day of the week. Test this at the 5% significance level [table value at 5% = 11.070]

4. Test whether the sales of milk are uniformly distributed over the 12 months [table value at 5% significance level = 19.675]

Month Gallons of Milk

January 1610

February 1585

March 1649

April 1590

May 1540

June 1397

July 1410

August 1350

September 1495

October 1564

November 1602

December 1655

Unit 9

Correlation Analysis

Correlation analysis is a statistical tool used to ascertain the association between two variables. It determines the nature, strength, and direction of the relationship between the variables.

Ex: Income – Investment ( as Income increases, the level of investment also increases),
IQ – Productivity

Ex : correlation between salary, marks, IQ etc

Correlation is denoted by 'r', called the correlation coefficient. If r = 0, there is no correlation; if r = -1, there is perfect negative correlation; if r = +1, there is perfect positive correlation.

If |r| lies between 0.75 and 1, it is a high correlation

If |r| lies between 0.5 and 0.74, it is a moderate correlation

If |r| is less than 0.5, it is a low correlation
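As a small illustration, a helper function like the one below (using exactly the cut-offs listed above) could be used to label a computed coefficient; the function name is just an illustrative choice:

```python
def describe_correlation(r: float) -> str:
    """Label a correlation coefficient using the cut-offs given in these notes."""
    if r == 0:
        return "no correlation"
    strength = "high" if abs(r) >= 0.75 else "moderate" if abs(r) >= 0.5 else "low"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

print(describe_correlation(0.84))   # high positive correlation
print(describe_correlation(-0.6))   # moderate negative correlation
```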

Methods of correlation

The Methods of Correlation are classified as Graphic and Algebraic Methods

Graphic Methods – graph & Scatter diagram

Algebraic Methods – product moment or covariance method (the Karl Pearson product moment formula), rank method or Spearman rank correlation, and the concurrent deviation method

The product moment, Karl Pearson coefficient, or covariance method is a measure of the joint variation between the variables.

Bivariate correlation – Pearson product moment formula and the assumed mean (short-cut) method

Rank Correlation –

The Karl Pearson coefficient of correlation cannot be calculated when the series are ranked according to size. Rank correlation is a convenient method for such ranked series and is used for qualitative phenomena.

Ex : Intelligence tests

Spearman rank correlation (for individual ranks) and the Edward Spearman formula (for tied or similar ranks)
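Since the rank method is only described verbally here, below is a minimal sketch of how a Spearman rank correlation could be computed in Python (assuming scipy; the two sets of test scores are hypothetical, and scipy applies the tie correction automatically when ranks are tied):

```python
# Minimal sketch of Spearman rank correlation on hypothetical data.
from scipy.stats import spearmanr

# Hypothetical scores of 8 candidates on two intelligence tests
test_a = [86, 92, 71, 65, 80, 74, 90, 68]
test_b = [88, 90, 70, 72, 78, 73, 94, 66]

rho, p_value = spearmanr(test_a, test_b)   # ranks are computed internally
print(f"Spearman rho = {rho:.3f}, p = {p_value:.4f}")
```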

Find the value of r using the Pearson product moment formula for the following data:

X : 25 18 32 21 35 29

Y : 16 11 20 15 26 28

Solution

The Pearson product moment formula, also called the covariance method, determines the joint variation between the variables:

r = [n∑XY − ∑X∑Y] / [√(n∑X² − (∑X)²) × √(n∑Y² − (∑Y)²)]

X          Y          X²           Y²           XY
25         16         625          256          400
18         11         324          121          198
32         20         1024         400          640
21         15         441          225          315
35         26         1225         676          910
29         28         841          784          812
∑X = 160   ∑Y = 116   ∑X² = 4480   ∑Y² = 2462   ∑XY = 3275

r = [n∑XY − ∑X∑Y] / [√(n∑X² − (∑X)²) × √(n∑Y² − (∑Y)²)]

r = (6 × 3275 − 160 × 116) / [√(6 × 4480 − 160²) × √(6 × 2462 − 116²)]

r = 1090 / (√1280 × √1316) = 0.84

Since the value of r lies between 0.75 and 1, it is a high positive correlation, and it can be concluded that X and Y are strongly correlated.
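As a quick optional check of this worked example (assuming scipy is available), the following sketch recomputes r both from the product moment formula and with scipy.stats.pearsonr:

```python
# Minimal check of the Pearson worked example.
from math import sqrt
from scipy.stats import pearsonr

X = [25, 18, 32, 21, 35, 29]
Y = [16, 11, 20, 15, 26, 28]
n = len(X)

# Product moment formula, as given in the notes
num = n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)
den = sqrt(n * sum(x * x for x in X) - sum(X) ** 2) * \
      sqrt(n * sum(y * y for y in Y) - sum(Y) ** 2)
print(f"r (by formula) = {num / den:.2f}")      # ~0.84

# Cross-check with scipy
r, _ = pearsonr(X, Y)
print(f"r (scipy)      = {r:.2f}")              # ~0.84
```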

Problem

Determine the correlation coefficient:

Age of cars (years)      2    4    6    7    8    10   2

Maintenance cost ('00)   16   15   18   19   17   21   20

Problem

Productivity        50   55   60   65   52   60   47   36   75

Experience (yrs)    49   72   74   44   58   66   50   30   35

Problem

Determine the correlation coefficient

Fertilizers (metric tonnes)    15   18   20   24   30   35   40   50

Productivity (metric tonnes)   85   93   95   105  120  130  150  160
