
Unit 7

Hypothesis Testing

Hypothesis – it represents a researcher's expectation about what is true of the population parameter.

Formulation of Hypothesis

Null Hypothesis – H0 – A statement in which no difference or effect is expected. If the null hypothesis is not rejected, no changes will be made. It is a statement of no association.

In formulating the null hypothesis, words such as "no", "not", "same", or "independent" will usually be part of the stated hypothesis.

Alternative Hypothesis – H1 – A statement that some difference or effect is expected. Accepting the alternative hypothesis will lead to changes in opinions or actions. It is a statement of association.

Significance level – It is the level of risk, or probability, a researcher accepts of rejecting the null hypothesis when it is in fact true. Commonly used significance levels are 1%, 5%, and 10%. A 5% significance level is interpreted as (100 – 95)/100 = 0.05, where 95% is the researcher's confidence level about the population parameter.

Degrees of freedom (df) – the amount of information available to estimate population parameters from sample statistics; that is, the number of values in a sample that are free to vary once something is known about the sample.

 Importance of Hypothesis

 Provides the direction to the study

 Identifies relevant facts for the study

Facilitates interpretation and the drawing of conclusions from the study

 Useful in framing the entire research process

 It shows the extent of knowledge gained by the researcher

Tests of Hypotheses

Parametric Tests – When the data are on a continuous scale (interval or ratio scale), parametric tests can be used.

Ex: Parametric tests

t-test – for small samples (n < 30)

Z-test – for large samples (n > 30)

ANOVA – Analysis of Variance (both one-way and two-way)

Non-Parametric Tests – When the data are on a categorical scale (nominal or ordinal), non-parametric tests can be used.

Ex: Non-parametric tests

Chi-square test, Kolmogorov–Smirnov D test, Wilcoxon matched-pairs test, etc.
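To illustrate how this choice of test translates into practice, here is a minimal Python sketch (assuming scipy is available; the sample data are invented for illustration, apart from the frequencies reused from the chi-square example later in these notes):

```python
# Minimal sketch: a parametric and a non-parametric test with scipy.
from scipy import stats

# Parametric: one-sample t-test on interval/ratio data (small sample, n < 30)
scores = [52, 48, 55, 60, 47, 53, 51, 58]          # hypothetical test scores
t_stat, t_p = stats.ttest_1samp(scores, popmean=50)
print(f"t = {t_stat:.3f}, p = {t_p:.3f}")

# Parametric: one-way ANOVA comparing three hypothetical groups
g1, g2, g3 = [52, 55, 58], [49, 47, 50], [60, 62, 59]
f_stat, f_p = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.3f}, p = {f_p:.3f}")

# Non-parametric: chi-square goodness-of-fit on categorical counts
observed = [15, 32, 9, 12, 2]                      # frequencies per category
chi2_stat, chi_p = stats.chisquare(observed)       # equal expected frequencies
print(f"chi-square = {chi2_stat:.2f}, p = {chi_p:.4f}")
```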

Chi Square Analysis

Karl Pearson introduced the chi-square test in 1900.

Chi-Square is used to determine how the observed frequencies differ from expected
frequencies.

Chi-square is a non-parametric test of hypothesis. It can be used for both nominal and ordinal data.

Classification of Chi-Square test

Chi-Square goodness-of-fit test[for one variable]

Chi-Square test of independence[for two variables]
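The test of independence is only named here and is not worked through later in these notes, so the following is a minimal hedged sketch of how it could be run in Python with scipy; the 2×3 contingency table is entirely hypothetical:

```python
# Minimal sketch of the chi-square test of independence (two variables).
# The 2x3 contingency table below is hypothetical, for illustration only.
from scipy.stats import chi2_contingency

# Rows: gender (male, female); columns: preference (brand A, B, C)
observed = [[20, 30, 10],
            [25, 15, 20]]

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2_stat:.3f}, df = {dof}, p = {p_value:.4f}")
# If p < 0.05, reject H0 that the two variables are independent.
```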

Chi-Square goodness-of-fit test (one-dimensional chi-square, for one variable)

The chi-square goodness-of-fit test is used to analyze the distribution of frequencies across the categories of a single variable. It is applied to categorized data, such as age groups or the number of bank arrivals per time interval, to determine whether the observed distribution of frequencies is the same as some hypothesized or expected distribution. However, the goodness-of-fit test cannot be used to analyze two variables simultaneously.

Ex: of one-dimensional chi-square test

To test the hypothesis that training is significant in improving the skills of salesmen at the 5% significance level [table value at 5% significance level is 9.488]

Dimension            Strongly Agree   Agree   Can't say   Disagree   Strongly Disagree
No. of respondents   15               32      9           12         2

Solution:

Null Hypothesis [H0]: Training is not significant in improving the skills of the salesmen

Alternative Hypothesis [H1]: Training is significant in improving the skills of the salesmen

Formula for calculating Chi-Square:

χ² = ∑(Oi − Ei)²/Ei

where Oi = observed frequencies and Ei = expected frequencies

Determination of expected frequencies

Expected frequency = total number of respondents / number of categories (dimensions) = 70/5 = 14

Calculation of observed and expected frequencies

Observed frequencies (Oi)   Expected frequencies (Ei)   (Oi − Ei)   (Oi − Ei)²   (Oi − Ei)²/Ei
15                          14                            1            1           0.071
32                          14                           18          324          23.143
 9                          14                           -5           25           1.786
12                          14                           -2            4           0.286
 2                          14                          -12          144          10.286
Total                                                                              35.57

Formula for 2 = ∑(Oi-Ei)2/Ei = 35.56

Degrees of freedom = (n-1) = (5-1) = 4

(where n is the number of dimensions)

Expected frequencies = number of respondents/number of dimensions = 70/14 = 5

Formula for chi-square = ∑(Oi-Ei)2/Ei = 35.56

The calculated value of Chi-square test is tested at degrees of freedom

Degrees of freedom = (n-1) = (5-1) = 4, where n is the number of dimensions

Table value at 5% significance level for 4 degrees of freedom from chi-square table
= 9.488

Conclusion

Since the calculated value (35.57) is greater than the table or critical value (9.488), we reject the null hypothesis and conclude that training is significant in improving the skills of salesmen.
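As an optional check of this worked example, here is a minimal Python sketch (assuming scipy is available) that reproduces the calculated statistic and the critical value:

```python
# Minimal check of the goodness-of-fit worked example using scipy.
from scipy.stats import chisquare, chi2

observed = [15, 32, 9, 12, 2]                     # responses per dimension
n = sum(observed)                                 # 70 respondents
expected = [n / len(observed)] * len(observed)    # 70 / 5 = 14 per category

stat, p_value = chisquare(observed, f_exp=expected)
critical = chi2.ppf(0.95, df=len(observed) - 1)   # 5% level, df = 4

print(f"chi-square = {stat:.2f}")                 # ~35.57
print(f"critical value = {critical:.3f}")         # ~9.488
print("Reject H0" if stat > critical else "Fail to reject H0")
```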

Practice Problems in One Dimensional Chi-Square Test

1. Test at the 5% significance level whether Price is an important indicator in influencing customers or not: [table value at 5% = 9.488]

Dimension Number of Respondents

Very Important 50

Somewhat Important 60

Neither Important nor Unimportant 20

Somewhat Unimportant 40

Very Unimportant 30

2. Test at the 5% significance level whether Brand Image is an important component in promoting cosmetics or not: [table value at 5% = 12.592]

Dimension               Number of Respondents

Extremely Important     150

Important               90

Somewhat Important      30

Neutral                 40

Somewhat Unimportant    30

Unimportant             50

Extremely Unimportant   60

3. The demand for a particular spare part in a factory was found to vary from day to day. In a sample study, the following information was obtained:

Day:                        Mon    Tue    Wed    Thu    Fri    Sat

Number of parts demanded:   1124   1125   1110   1120   1126   1115

Test the hypothesis that the number of parts demanded does not depend on the day of the week. Test this at the 5% significance level [table value at 5% = 11.070]

4. Test whether the sales of milk are uniformly distributed over the 12 months [table value at 5% significance level = 19.675]

Month Gallons of Milk

January 1610

February 1585

March 1649

April 1590

May 1540

June 1397

July 1410

August 1350

September 1495

October 1564

November 1602

December 1655

Unit 9

Correlation Analysis

Correlation analysis is a statistical tool used to ascertain the association between two variables. It determines the nature, strength, and direction of the relationship between the variables.

Ex: Income – Investment ( as Income increases, the level of investment also increases),
IQ – Productivity

Ex : correlation between salary, marks, IQ etc

Correlation is denoted by 'r', called the correlation coefficient. If r = 0, there is no correlation; if r = -1, there is perfect negative correlation; if r = +1, there is perfect positive correlation.

If |r| lies between 0.75 and 1, it is a high correlation

If |r| lies between 0.5 and 0.74, it is a moderate correlation

If |r| is less than 0.5, it is a low correlation
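As a small illustration, a helper function like the one below (using exactly the cut-offs listed above) could be used to label a computed coefficient; the function name is just an illustrative choice:

```python
def describe_correlation(r: float) -> str:
    """Label a correlation coefficient using the cut-offs given in these notes."""
    if r == 0:
        return "no correlation"
    strength = "high" if abs(r) >= 0.75 else "moderate" if abs(r) >= 0.5 else "low"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

print(describe_correlation(0.84))   # high positive correlation
print(describe_correlation(-0.6))   # moderate negative correlation
```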

Methods of correlation

The Methods of Correlation are classified as Graphic and Algebraic Methods

Graphic Methods – graph & Scatter diagram

Algebraic Methods – product moment or covariance method (the Karl Pearson product moment formula), rank method or Spearman rank correlation, and the concurrent deviation method

The product moment, Karl Pearson coefficient, or covariance method is a measure of the joint variation between the variables.

Bivariate correlation – Pearson product moment formula and the assumed mean (short-cut) method

Rank Correlation –

The Karl Pearson coefficient of correlation cannot be calculated when the series are ranked according to size. Rank correlation is a convenient method for such ranked series and is used for qualitative phenomena.

Ex : Intelligence tests

Spearman rank correlation (for individual ranks) and the Edward Spearman formula (for tied or similar ranks)
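Since the rank method is only described verbally here, below is a minimal sketch of how a Spearman rank correlation could be computed in Python (assuming scipy; the two sets of test scores are hypothetical, and scipy applies the tie correction automatically when ranks are tied):

```python
# Minimal sketch of Spearman rank correlation on hypothetical data.
from scipy.stats import spearmanr

# Hypothetical scores of 8 candidates on two intelligence tests
test_a = [86, 92, 71, 65, 80, 74, 90, 68]
test_b = [88, 90, 70, 72, 78, 73, 94, 66]

rho, p_value = spearmanr(test_a, test_b)   # ranks are computed internally
print(f"Spearman rho = {rho:.3f}, p = {p_value:.4f}")
```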

Find the value of r using the Pearson product moment formula for the following data:

X : 25 18 32 21 35 29

Y : 16 11 20 15 26 28

Solution

The Pearson product moment formula, also called the covariance method, determines the joint variation between the variables:

r = [n∑XY − ∑X∑Y] / [√(n∑X² − (∑X)²) × √(n∑Y² − (∑Y)²)]

X          Y          X²           Y²           XY
25         16         625          256          400
18         11         324          121          198
32         20         1024         400          640
21         15         441          225          315
35         26         1225         676          910
29         28         841          784          812
∑X = 160   ∑Y = 116   ∑X² = 4480   ∑Y² = 2462   ∑XY = 3275

r = [n∑XY − ∑X∑Y] / [√(n∑X² − (∑X)²) × √(n∑Y² − (∑Y)²)]

r = (6 × 3275 − 160 × 116) / [√(6 × 4480 − 160²) × √(6 × 2462 − 116²)]

r = 1090 / (√1280 × √1316) = 0.84

Since the value of r lies between 0.75 and 1, it is a high positive correlation, and it can be concluded that X and Y are strongly correlated.
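As a quick optional check of this worked example (assuming scipy is available), the following sketch recomputes r both from the product moment formula and with scipy.stats.pearsonr:

```python
# Minimal check of the Pearson worked example.
from math import sqrt
from scipy.stats import pearsonr

X = [25, 18, 32, 21, 35, 29]
Y = [16, 11, 20, 15, 26, 28]
n = len(X)

# Product moment formula, as given in the notes
num = n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)
den = sqrt(n * sum(x * x for x in X) - sum(X) ** 2) * \
      sqrt(n * sum(y * y for y in Y) - sum(Y) ** 2)
print(f"r (by formula) = {num / den:.2f}")      # ~0.84

# Cross-check with scipy
r, _ = pearsonr(X, Y)
print(f"r (scipy)      = {r:.2f}")              # ~0.84
```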

Problem

Determine the correlation coefficient:

Age of cars (years)      2    4    6    7    8    10   2

Maintenance cost ('00)   16   15   18   19   17   21   20

Problem

Productivity        50   55   60   65   52   60   47   36   75

Experience (yrs)    49   72   74   44   58   66   50   30   35

Problem

Determine the correlation coefficient

Fertilizers (metric tonnes)    15   18   20   24   30   35   40   50

Productivity (metric tonnes)   85   93   95   105  120  130  150  160
