
Hypothesis

INTRODUCTION

Hypothesis
An assumption about a population parameter; a supposition made on some basis.

Hypothesis
According to Prof. Morris Hamburg, a hypothesis is a quantitative statement about a population.
According to Roger Mc Danial, it is an assumption that a researcher makes about some characteristic of a population under study.
According to Guy, it is a preliminary or tentative explanation by the researcher regarding the outcome of an investigation.
According to Mouton, it is a statement postulating a possible relationship between two or more phenomena or variables.

Hypothesis and Research Question (RQ)
Both generate and contribute to the body of knowledge that supports or refutes an existing theory. A hypothesis differs from a problem, however: a problem is formulated in the form of a question and serves as the basis and origin from which a hypothesis is derived, whereas a hypothesis is a suggested solution to the problem. A problem cannot be tested, whereas a hypothesis can be tested and verified. A hypothesis is formulated after defining the problem, reviewing the theoretical and empirical background, and studying analytical models.

Hypothesis Testing - Procedure


The procedure consists of five steps.

1. Setting up the hypothesis. Set up a hypothesis about a population parameter, collect sample data, compute sample statistics, and use this information to decide how likely it is that the hypothesized population parameter is correct. Say we have been given, or have assumed, a certain value of a population mean. To test the validity of this assumption, we gather sample data and determine the difference between the hypothesized value and the actual value of the sample mean.

Hypothesis Testing
Then we judge whether the difference is significant. The smaller the difference, the greater the likelihood that our hypothesized value of the mean is correct; the larger the difference, the smaller the likelihood. The conventional approach is not to construct a single hypothesis about the population parameter, but to set up two different hypotheses, constructed so that if one is accepted the other is rejected, and vice versa. The two hypotheses in a statistical test are: 1. the null hypothesis and 2. the alternate hypothesis.

Null Hypothesis: represented by H0, where the subscript 0 indicates "no difference":
$H_0: \mu \le \mu_0$, $H_0: \mu = \mu_0$ or $H_0: \mu \ge \mu_0$
Only one of the signs ≤, =, ≥ appears at a time. The null hypothesis is a very useful tool for testing the significance of a difference. In its simplest form it asserts that there is no real difference between the sample and the population in the particular characteristic or variable under study; thus the word "null" means invalid, void or amounting to nothing. E.g., when testing whether extra coaching has benefited the students, the null hypothesis is that extra coaching has not benefited them; similarly, when testing a drug, the null hypothesis is that the drug is not effective in curing the disease.

Alternate Hypothesis: represented by Ha:
$H_a: \mu < \mu_0$, $H_a: \mu \ne \mu_0$ or $H_a: \mu > \mu_0$
The alternate hypothesis specifies the values that the researcher believes to hold true; it is accepted when the sample data lead to rejection of the null hypothesis. E.g., a psychologist wishes to test whether or not a certain class of people has a mean IQ of 100:
$H_0: \mu = 100$ (null hypothesis)
$H_a: \mu \ne 100$ (alternate hypothesis)

Hypothesis
For testing the difference between the means of two groups, the null hypothesis states that the two groups have equal means ($\mu_1 - \mu_2 = 0$), and the alternate hypothesis states that the means are not equal ($\mu_1 - \mu_2 \ne 0$):
$H_0: \mu_1 - \mu_2 = 0$ (null hypothesis)
$H_a: \mu_1 - \mu_2 \ne 0$ (alternate hypothesis)

Hypothesis Testing Procedure


2. Setting a suitable significance level. The next step is to test the validity of H0 against Ha at a certain level of significance, usually denoted by alpha (α). The confidence with which an experimenter rejects or retains the null hypothesis depends on the significance level adopted. α is the probability of rejecting the null hypothesis when it is in fact true.

Hypothesis
It is expressed as a percentage, such as 5 per cent. When a hypothesis is tested at the 5 per cent level, the statistician runs the risk of rejecting a true null hypothesis about 5 per cent of the time. By testing at 1 per cent, he reduces the chance of this false judgment, but some risk remains (1 in 100 occasions). Statistical theory requires this probability to be small. Traditionally, α = 0.05 for consumer research, α = 0.01 for quality assurance and α = 0.10 for political research.
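The meaning of α can be checked with a small simulation. This is a sketch only, assuming a normal population with μ0 = 100 and σ = 15 and samples of n = 30 (illustrative values, not from the text); `type_one_error_rate` is a hypothetical helper. Because every sample is drawn from the hypothesized population, H0 is true, so every rejection is a Type I error, and the rejection rate should settle near 0.05 for a critical value of 1.96 and near 0.01 for 2.58.

```python
import math
import random


def type_one_error_rate(trials=10_000, n=30, mu0=100.0, sigma=15.0,
                        z_crit=1.96, seed=1):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)              # standard error of the mean
    rejections = 0
    for _ in range(trials):
        # Every sample comes from the hypothesized population, so H0 is true.
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / se
        if abs(z) > z_crit:                # falls in the rejection region
            rejections += 1                # ...which here is a Type I error
    return rejections / trials


rate_5 = type_one_error_rate(z_crit=1.96)  # alpha = 0.05
rate_1 = type_one_error_rate(z_crit=2.58)  # alpha = 0.01
print(rate_5, rate_1)
```

Lowering α from 0.05 to 0.01 shrinks the rejection region, so the rate of wrong rejections falls accordingly.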

Hypothesis
3. Setting the test criterion: select the appropriate probability distribution. Some commonly used probability distributions are t, F and chi-square.
4. Doing the computations: compute the test statistic from a random sample of size n.
5. Making the decision: accept or reject the null hypothesis, according to whether the computed value of the test statistic falls in the region of rejection or of acceptance at the chosen significance level.
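The five steps can be traced in a minimal Python sketch of a two-tailed z test for a mean. It assumes the population SD is known (so the normal distribution is the test criterion) and uses hypothetical numbers for the IQ example (n = 36, sample mean 104.5, σ = 15); `z_statistic` is an illustrative name, while `statistics.NormalDist` is part of Python's standard library.

```python
import math
from statistics import NormalDist

# Step 1: hypotheses for the IQ example: H0: mu = 100, Ha: mu != 100
mu0 = 100.0
# Step 2: significance level
alpha = 0.05
# Step 3: test criterion; with sigma known, use the standard normal distribution
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a two-tailed test


def z_statistic(sample_mean, mu0, sigma, n):
    # Step 4: compute the test statistic from the sample
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


# Hypothetical sample summary (assumed): n = 36, mean = 104.5, sigma = 15
z = z_statistic(104.5, mu0, 15.0, 36)

# Step 5: decision
decision = "reject H0" if abs(z) > z_crit else "accept H0"
print(f"z = {z:.2f}, critical = {z_crit:.2f}: {decision}")
```

Here z = 4.5/2.5 = 1.80 lies inside ±1.96, so H0 is accepted at the 5 per cent level.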

Errors in Hypothesis Testing


When a statistical hypothesis is tested, there are four possibilities:
1. The hypothesis is true but our test rejects it (Type I error).
2. The hypothesis is false but our test accepts it (Type II error).
3. The hypothesis is true and our test accepts it (correct decision).
4. The hypothesis is false and our test rejects it (correct decision).



Decision      H0 is true          H0 is false
Accept H0     Correct decision    Type II error
Reject H0     Type I error        Correct decision

Two-Tailed Test of Hypothesis


A two-tailed test of hypothesis rejects the null hypothesis if the sample statistic is significantly higher or lower than the hypothesized population parameter. Thus in a two-tailed test the rejection region is located in both tails. If we are testing the hypothesis at the 5 per cent level of significance, the size of the acceptance region on each side of the mean is 0.475 and the size of the rejection region in each tail is 0.025. From the table of areas under the normal curve, an area of 0.475 corresponds to 1.96 standard errors on each side of the hypothesized mean.



If the sample mean falls within this area, the hypothesis is accepted. If it falls beyond 1.96 standard errors, the hypothesis is rejected, because it falls in the rejection region. At the 1 per cent level of significance, the acceptance region of 0.495 (half of 0.99) on each side corresponds to 2.58 standard errors. For example, when testing the null hypothesis that the average income per household is Rs. 1000/- against the alternate hypothesis that it is not Rs. 1000/-, the rejection region lies on both sides, because we would reject the null hypothesis if the mean income were either more or less than Rs. 1000/-.
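The critical values 1.96 and 2.58 can be recovered from the standard normal distribution instead of being read from a table. The sketch below uses Python's standard library; the income figures (n = 100 households, mean Rs. 1030, σ = Rs. 150) are assumed for illustration. It also shows that the same sample can lead to rejection at the 5 per cent level but acceptance at the 1 per cent level.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

for alpha in (0.05, 0.01):
    # Two-tailed: alpha/2 in each tail, so the acceptance area on each side
    # of the mean is (1 - alpha) / 2, i.e. 0.475 or 0.495.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    print(f"alpha = {alpha}: area each side = {(1 - alpha) / 2:.3f}, z = {z_crit:.2f}")

# Hypothetical sample for the income example (assumed values):
# n = 100 households, mean Rs. 1030, population SD Rs. 150.
z = (1030 - 1000) / (150 / math.sqrt(100))          # = 2.0 standard errors
reject_at_5 = abs(z) > std_normal.inv_cdf(0.975)    # beyond 1.96?
reject_at_1 = abs(z) > std_normal.inv_cdf(0.995)    # beyond 2.58?
print(z, reject_at_5, reject_at_1)
```

Here z = 2.0 lies beyond 1.96 but within 2.58, so H0 is rejected at 5 per cent and accepted at 1 per cent.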

One-Tailed Test of Hypothesis
As distinguished from a two-tailed test, a one-tailed test is so called because the rejection region is located in only one tail, either left or right, depending on the alternate hypothesis formulated. E.g., when testing the hypothesis that the average income per household is Rs. 1000/- against the alternate that it is less than Rs. 1000/-, the rejection region lies on the left side, and the test is a one-sided, left-tailed test.

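Null hypothesis: $H_0: \mu = \mu_0$
Alternate hypothesis:
i) $H_1: \mu \ne \mu_0$, i.e. $\mu > \mu_0$ or $\mu < \mu_0$ (two-tailed test)
ii) $H_1: \mu > \mu_0$ (right-tailed test)
iii) $H_1: \mu < \mu_0$ (left-tailed test)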
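The three alternate hypotheses differ only in where the rejection region lies. In this sketch `z_test` is a hypothetical helper (not a library function), and the income data are assumed: a sample of 100 households with mean Rs. 975 and σ = Rs. 150. The one-tailed 5 per cent critical value is about −1.645, so z ≈ −1.67 falls in the left-tail rejection region, while a two-tailed test at the same level would accept H0.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (z, reject) for a z test of H0: mu = mu0.

    alternative: "two-sided" (mu != mu0), "greater" (mu > mu0) or "less" (mu < mu0).
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":      # rejection region split between both tails
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":      # rejection region in the right tail
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":         # rejection region in the left tail
        reject = z < std_normal.inv_cdf(alpha)
    else:
        raise ValueError("unknown alternative")
    return z, reject


# Hypothetical data (assumed): n = 100, mean Rs. 975, sigma Rs. 150
print(z_test(975, 1000, 150, 100, alternative="less"))       # left-tailed
print(z_test(975, 1000, 150, 100, alternative="two-sided"))  # two-tailed
```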

Power of Hypothesis Test


It is important to know how well a hypothesis test is working; the measure of this is called the power of the test. In hypothesis testing, α and β are the probabilities of Type I and Type II errors. A Type I error occurs when we reject a null hypothesis that is true, and α (the significance level of the test) is the probability of making a Type I error. A Type II error occurs when we accept a null hypothesis that is false; its probability is β. The smaller β is, the better. Equivalently, the power (1 − β), i.e. the probability of rejecting the null hypothesis when it is false, should be as large as possible.
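For a z test the power can be computed directly. The sketch assumes a right-tailed test of H0: μ = μ0 with known σ; if the true mean is μ1, the z statistic is centred at (μ1 − μ0)/(σ/√n) instead of 0, so β = Φ(z_crit − shift), where Φ is the standard normal CDF. The numbers (μ0 = 100, μ1 = 105, σ = 15) are illustrative, and `power_right_tailed` is a hypothetical name.

```python
import math
from statistics import NormalDist


def power_right_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of a right-tailed z test of H0: mu = mu0
    when the true mean is mu1 > mu0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)       # boundary of the rejection region
    shift = (mu1 - mu0) / (sigma / math.sqrt(n)) # where z is centred when H0 is false
    beta = std_normal.cdf(z_crit - shift)        # P(accept H0 | H0 false): Type II error
    return 1 - beta


power_36 = power_right_tailed(100, 105, 15, 36)
power_100 = power_right_tailed(100, 105, 15, 100)
print(round(power_36, 3), round(power_100, 3))
```

Increasing n from 36 to 100 raises the power from about 0.64 to about 0.95, i.e. β falls as the sample grows.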

Standard Error and Sampling Distribution


The standard error is of fundamental importance in testing hypotheses. The standard deviation of the sampling distribution of a statistic is called its standard error; for the sample mean it is σ/√n. It is so called because it measures the sampling variability due to chance or random forces. Even if the population (universe) distribution is not normal, the sampling distribution of sample means approaches normality as the sample size increases. The word "error" is used in place of "deviation" to emphasize that the variation among sample means is due to sampling errors.
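The definition can be checked empirically: draw many samples, compute each sample mean, and take the SD of those means. This sketch assumes an exponential population with mean 1 and SD 1 (deliberately non-normal) and samples of n = 25, so σ/√n = 0.2; the simulated standard error should come out close to that value.

```python
import math
import random
import statistics

rng = random.Random(42)
n = 25             # sample size
reps = 20_000      # number of repeated samples

# Exponential population with mean 1 and SD 1: clearly NOT normal.
means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
         for _ in range(reps)]

observed_se = statistics.pstdev(means)   # SD of the sampling distribution of the mean
theoretical_se = 1.0 / math.sqrt(n)      # sigma / sqrt(n) = 0.2
print(round(observed_se, 3), theoretical_se)
```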

Test of Significance of Attributes.


As distinguished from variables, where quantitative measurement of a phenomenon is possible, in the case of attributes we can only find out the presence or absence of a particular characteristic. Sampling of attributes may therefore be regarded as drawing samples from a population whose members either possess attribute A or do not (not A). For example, in a study of the attribute literacy, a sample may be taken and its members classified as literates and illiterates.



With such data a binomial type of problem may be formed. The selection of an individual in sampling may be called an event; the appearance of attribute A may be taken as a success and its non-appearance as a failure. Suppose that of 1000 people selected, 100 are found literate and 900 illiterate. We would say the sample consists of 1000 units, of which 100 are successes and 900 failures. Then the probability of success is p = 100/1000 = 0.1 and the probability of failure is q = 900/1000 = 0.9, so that p + q = 1.



The various tests of significance of attributes are:
1. Tests for number of successes
2. Tests for proportion of successes
3. Tests for difference between proportions

Tests for number of successes


The sampling distribution of the number of successes follows a binomial probability distribution. Hence its standard error is given by:
$\text{S.E. of no. of successes} = \sqrt{npq}$
where n = size of sample, p = probability of success in each trial and q = 1 − p = probability of failure. The ratio (observed number of successes − expected number np)/S.E. is compared with the critical value 1.96 at the 5 per cent level of significance, or 2.58 at the 1 per cent level.
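A minimal sketch of this test, using a hypothetical example not in the text: 216 heads in 400 tosses of a coin assumed to be fair, so p = q = 0.5, the expected number of successes is np = 200 and S.E. = √(npq) = 10. `z_number_of_successes` is an illustrative name.

```python
import math


def z_number_of_successes(successes, n, p):
    """Difference between observed and expected successes, in standard errors."""
    q = 1 - p                        # probability of failure
    expected = n * p                 # expected number of successes
    se = math.sqrt(n * p * q)        # S.E. of number of successes
    return (successes - expected) / se


# Hypothetical example: 216 heads in 400 tosses, H0: the coin is fair
z = z_number_of_successes(216, 400, 0.5)
print(z, "significant" if abs(z) > 1.96 else "not significant at 5%")
```

Since z = 1.6 < 1.96, the difference is not significant at the 5 per cent level.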

Tests for proportion of successes


Instead of recording the number of successes in each sample, we record the proportion of successes, i.e. 1/n of the number of successes. The mean proportion of successes is then p, and the standard deviation of the proportion of successes is √(pq/n). Thus:
$\text{S.E.}_p = \sqrt{pq/n}$
where n = size of sample, p = probability of success in each trial and q = 1 − p = probability of failure.
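Applied to the literacy sample above (100 literates among 1000), the test compares the observed proportion 0.1 with a hypothesized population proportion. The value p = 0.13 below is an assumed claim for illustration only; `z_proportion` is a hypothetical helper, and the S.E. is computed from the hypothesized p.

```python
import math


def z_proportion(successes, n, p0):
    p_hat = successes / n                   # observed proportion of successes
    se = math.sqrt(p0 * (1 - p0) / n)       # S.E. of proportion, sqrt(pq/n), under H0
    return (p_hat - p0) / se


z = z_proportion(100, 1000, 0.13)   # 0.13 is an assumed claimed literacy rate
print(round(z, 2))
```

|z| ≈ 2.82 exceeds both 1.96 and 2.58, so the difference would be significant at both levels.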

Tests for difference b/w proportions

If two samples are drawn from different populations, we may be interested in finding out whether the difference between their proportions of successes is significant. In such a case we take the hypothesis that the difference between p1 and p2 (the proportions of success in the two samples) is due to fluctuations of random sampling. If |p1 − p2|/S.E. is less than 1.96 (5% significance level), the difference is regarded as due to random sampling variations, i.e. as not significant.

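$\text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$, where $q = 1 - p$ and the pooled proportion is

$p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ or $p = \dfrac{x_1 + x_2}{n_1 + n_2}$

with $x_1, x_2$ the numbers of successes in the two samples.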
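The pooled formula translates directly into code. In this sketch the counts (60 of 200 in sample 1 and 45 of 250 in sample 2 possessing attribute A) are hypothetical, and `z_two_proportions` is an illustrative name.

```python
import math


def z_two_proportions(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2               # sample proportions of success
    p = (x1 + x2) / (n1 + n2)               # pooled proportion
    q = 1 - p
    se = math.sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical data
z = z_two_proportions(60, 200, 45, 250)
print(round(z, 2), "significant" if abs(z) > 1.96 else "not significant")
```

Here z ≈ 2.99 > 1.96, so the difference would be regarded as significant at the 5 per cent level.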

Test of Significance for Small Samples


A sample is usually regarded as small when its size is less than 30. Small samples behave differently from large samples, where the random sampling distribution is approximately normal and the values given by the sample data are sufficiently close to the population values. Estimates from small samples vary widely from sample to sample. For small samples, the assumption is made that the parent population is normal. Under these conditions the sampling distribution of a sample statistic such as the mean (x̄) or the proportion (p) is normal, but the critical values of x̄ and p depend on whether or not the population SD is known.



When the population SD σ is not known, its value is estimated by computing the SD of the sample, s, and the standard error of the mean is calculated as $\hat\sigma_{\bar x} = s/\sqrt{n}$. When we do this, the resulting statistic $(\bar x - \mu)/(s/\sqrt{n})$ is no longer normally distributed, even if sampling is done from a normally distributed population. Instead, its sampling distribution is Student's t distribution with n − 1 degrees of freedom.
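Why the normal table is not enough for small samples can be seen by simulation. This sketch assumes a normal population (μ = 50, σ = 10) and samples of n = 5, estimates the SD from each sample, and counts how often |t| exceeds the normal 5 per cent value 1.96. The rate comes out near 0.12 rather than 0.05, because t with 4 degrees of freedom has heavier tails than the normal curve.

```python
import math
import random

rng = random.Random(7)
n, reps, mu = 5, 20_000, 50.0
too_extreme = 0
for _ in range(reps):
    sample = [rng.gauss(mu, 10.0) for _ in range(n)]     # normal population, H0 true
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))  # sample SD
    t = (x_bar - mu) / (s / math.sqrt(n))
    if abs(t) > 1.96:                                     # normal-table 5% cut-off
        too_extreme += 1

rate = too_extreme / reps
print(rate)
```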

Hypothesis Testing for Small Samples (n < 30): t Test


William Sealy Gosset developed this method of hypothesis testing in the early 1900s while employed at the Guinness Brewery in Dublin. The brewery did not allow him to publish his method and findings under his own name, so they were published in 1908 under his pen name, "Student". The t test is used for hypothesis testing about a single population mean, about the difference between two population means (independent samples), and about the mean difference between paired observations (dependent samples).

t test for single population mean

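$t = \dfrac{\bar x - \mu}{s/\sqrt{n}}$, where $s = \sqrt{\dfrac{\sum (x - \bar x)^2}{n - 1}}$

x̄ = mean of the sample; μ = actual or hypothetical mean; n = sample size; s = SD of the sample. The statistic has n − 1 degrees of freedom.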
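A minimal sketch of the single-mean t test, using a hypothetical sample of eight observations (not from the text) tested against μ = 48. `one_sample_t` is an illustrative name; `statistics.stdev` already uses the n − 1 divisor.

```python
import math
import statistics


def one_sample_t(sample, mu):
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # n - 1 divisor, as in the formula for s
    return (x_bar - mu) * math.sqrt(n) / s, n - 1


data = [48, 52, 50, 47, 53, 49, 51, 50]   # hypothetical sample
t, df = one_sample_t(data, mu=48)
print(round(t, 3), df)                     # compare |t| with the t table for df = 7
```

For these data x̄ = 50, s = 2 and t = √8 ≈ 2.83 with 7 df; this exceeds the 5 per cent two-tailed table value for 7 df (2.365), so H0 would be rejected.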

Difference between means of two samples (independent samples)

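$t = \dfrac{\bar X_1 - \bar X_2}{S} \times \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}$

$S = \sqrt{\dfrac{\sum (X_1 - \bar X_1)^2 + \sum (X_2 - \bar X_2)^2}{n_1 + n_2 - 2}}$

with $n_1 + n_2 - 2$ degrees of freedom.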
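The pooled formula can be applied to the marks of the two groups in the first class test at the end of this section. The helper `pooled_t` is a hypothetical name; the arithmetic follows the formulas for t and S above.

```python
import math
import statistics


def pooled_t(x1, x2):
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    ss1 = sum((x - m1) ** 2 for x in x1)   # sum of squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in x2)   # sum of squared deviations, sample 2
    s = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2


group1 = [18, 20, 36, 50, 49, 36, 34, 49, 41]
group2 = [29, 28, 26, 35, 30, 44, 46]
t, df = pooled_t(group1, group2)
print(round(t, 3), df)
```

Here x̄1 = 37, x̄2 = 34, S ≈ 10.42 and t ≈ 0.57 with 14 df, well below the given table value 2.14, so the difference between the groups is not significant at the 5 per cent level.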

Difference between means of two samples (dependent samples)

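$t = \dfrac{\bar d}{S/\sqrt{n}}$ (equivalently $t = \bar d\sqrt{n}/S$)

$S = \sqrt{\dfrac{\sum (d - \bar d)^2}{n - 1}}$ or $S = \sqrt{\dfrac{\sum d^2 - n\bar d^2}{n - 1}}$

where d is the difference within each pair, $\bar d$ is the mean difference and n is the number of pairs; df = n − 1.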
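For paired data the same computation runs on the differences. This sketch uses the before/after marks from the second class test at the end of this section, taking d = after − before; `paired_t` is an illustrative name.

```python
import math
import statistics


def paired_t(before, after):
    d = [a - b for a, b in zip(after, before)]   # differences d = after - before
    n = len(d)
    d_bar = statistics.fmean(d)
    s = statistics.stdev(d)                      # sqrt(sum((d - d_bar)^2) / (n - 1))
    return d_bar * math.sqrt(n) / s, n - 1


before = [18, 20, 36, 50, 49, 36, 34, 49, 41]
after = [29, 28, 26, 35, 30, 44, 46, 45, 34]
t, df = paired_t(before, after)
print(round(t, 3), df)
```

Here d̄ ≈ −1.78, S ≈ 11.81 and t ≈ −0.45 with 8 df; since |t| < 2.306, the coaching shows no significant effect at the 5 per cent level.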

Chi-Square Distribution and Test Statistic (χ²)


The symbol χ is the Greek letter chi, and the test statistic is written χ². The sampling distribution of χ² is called the χ² distribution. As in other hypothesis-testing procedures, the calculated value of the test statistic is compared with its critical value to decide whether the null hypothesis should be rejected. The decision to accept the null hypothesis is based on how close the sample results are to the expected results.



The calculated value $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where O and E are the observed and expected frequencies, is compared with the table value of χ² for the given degrees of freedom at a specified level of significance. If the calculated value is greater than the table value, the difference between theory and observation is significant, i.e. it has not arisen from fluctuations of sampling. If the calculated value is less than the table value, the difference is not significant, i.e. it may be due to fluctuations of sampling and is hence ignored. Since χ² is derived from observations, it is a statistic, not a parameter (there is no parameter corresponding to it); the test is therefore termed non-parametric.



While comparing the calculated value of χ² with the table value, we have to determine the degrees of freedom (df). By df we mean the number of classes to which values can be assigned freely without violating the restrictions or limitations placed. For example, if we choose five numbers that total 100, we may choose four freely, as the fifth must be 100 minus the sum of the four. Thus we have only one restraint on our freedom, and df = 5 − 1 = 4. As another example, if frequencies are assigned to 10 classes so that the number of cases, the mean and the SD agree with the original distribution, there are 3 restraints and df = 10 − 3 = 7. In general, ν = n − k, where k is the number of restraints.

Chi-square test

Before calculating χ², the expected frequencies may be given or must be calculated. In general, the expected frequency of a cell can be calculated from:
E = (RT × CT)/N
where RT = total of the row containing the cell, CT = total of the column containing the cell and N = total number of observations.

For example, a drug was administered to 812 persons out of a total population of 3248. The numbers of fever cases are:

Treatment   Fever   No Fever   Total
Drug           20        792     812
No Drug       220       2216    2436
Total         240       3008    3248

The expected frequency of the (Drug, Fever) cell is E = (812 × 240)/3248 = 60. The table of expected frequencies is:

Treatment   Fever   No Fever   Total
Drug           60        752     812
No Drug       180       2256    2436
Total         240       3008    3248

df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1
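The expected-frequency rule and the statistic χ² = Σ(O − E)²/E can be computed for the drug example in a few lines of Python. This is a sketch; the variable names are illustrative.

```python
observed = [
    [20, 792],     # drug:    fever, no fever
    [220, 2216],   # no drug: fever, no fever
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E = RT x CT / N for every cell
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected)      # [[60.0, 752.0], [180.0, 2256.0]]
print(round(chi2, 2), df)
```

The result, χ² ≈ 38.39 with 1 df, is far above the 5 per cent table value of 3.84, so the difference between observed and expected frequencies is significant.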

χ² Test as a Test of Goodness of Fit


It is so called because it enables us to ascertain how well a theoretical distribution fits an empirical distribution, i.e. one obtained from a sample; in other words, how well an ideal frequency curve fits the observed facts. A test of the concurrence of the two is a test of goodness of fit. Strictly, χ² is a test of badness of fit: a large value leads to the conclusion that the fit of the theoretical distribution to the observed distribution is bad, while if the evidence of badness is not convincing, the fit may be said to be good.
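A goodness-of-fit sketch, using hypothetical counts (not from the text) from 120 throws of a die, tested against the hypothesis that the die is fair, so each face has an expected frequency of 20. The only restraint is that the expected total equals the observed total, so df = 6 − 1 = 5.

```python
observed = [25, 17, 15, 23, 24, 16]          # hypothetical counts for faces 1..6
n = sum(observed)                            # 120 throws
expected = [n / 6] * 6                       # H0: the die is fair, E = 20 per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                       # one restraint: totals must agree
print(chi2, df)
```

χ² = 5.0 is below the 5 per cent table value for 5 df (11.07), so the fit would be considered good.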

Steps in testing goodness of fit

1. Establish the null and alternate hypotheses and the significance level.
2. Draw a sample of observations from the relevant statistical population.
3. Derive a set of expected (theoretical) frequencies, assuming that the null hypothesis is true.
4. Compare the observed frequencies with the expected frequencies.
5. If the calculated value of χ² is less than the table value at a certain level of significance (generally 5%) and for the given degrees of freedom, the fit is considered good; otherwise it is poor or bad.

F-Test (Variance Ratio Test)

The F test is named after the great statistician R.A. Fisher. The object of the test is to find out whether two independent estimates of population variance differ significantly, or whether the two samples may be regarded as drawn from normal populations having the same variance. To carry out the test of significance, we calculate the ratio F as follows.

$F = \dfrac{s_1^2}{s_2^2}$, where $s_1^2 = \dfrac{\sum (x_1 - \bar x_1)^2}{n_1 - 1}$ and $s_2^2 = \dfrac{\sum (x_2 - \bar x_2)^2}{n_2 - 1}$



Note that $s_1^2$ is always the larger estimate of variance. $\nu_1$ = df for the sample having the larger variance; $\nu_2$ = df for the sample having the smaller variance. The calculated value of F is compared with the table value for $\nu_1$ and $\nu_2$ at the 5% or 1% level of significance. If the calculated value is greater than the table value, the F ratio is considered significant and the null hypothesis is rejected.
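The variance ratio can be sketched with the two groups from the first class test at the end of this section. `f_ratio` is a hypothetical helper; `statistics.variance` uses the n − 1 divisor, matching the formulas for the two variance estimates, and the larger estimate is placed in the numerator.

```python
import statistics


def f_ratio(x1, x2):
    v1, v2 = statistics.variance(x1), statistics.variance(x2)   # n - 1 divisor
    # Put the larger variance estimate in the numerator; its df becomes nu1.
    if v1 >= v2:
        return v1 / v2, len(x1) - 1, len(x2) - 1
    return v2 / v1, len(x2) - 1, len(x1) - 1


group1 = [18, 20, 36, 50, 49, 36, 34, 49, 41]
group2 = [29, 28, 26, 35, 30, 44, 46]
F, df1, df2 = f_ratio(group1, group2)
print(round(F, 3), df1, df2)   # compare with the F table for (df1, df2)
```

Here F ≈ 2.20 with ν1 = 8 and ν2 = 6; the result would then be compared with the tabulated F for those degrees of freedom.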

Analysis of Variance
Analysis of Variance, frequently referred to as ANOVA, is a statistical technique specially designed to test whether the means of more than two quantitative populations are equal. ANOVA, developed by R.A. Fisher in the 1920s, is capable of testing whether specified classifications differ significantly. ANOVA enables us to analyse the total variation of the data into components that may be attributed to different causes of variation. The t test is a procedure for testing the null hypothesis for two samples only; when we have three or more samples and wish to test whether they come from populations with the same mean, ANOVA provides the solution.

For the sake of clarity, the technique of analysis of variance is discussed for (1) one-way classification and (2) two-way classification.

One-way Analysis of Variance


In one-way classification, the data are classified according to only one criterion. The null hypothesis is:
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
against $H_1$: not all of $\mu_1, \mu_2, \dots, \mu_k$ are equal.
That is, the arithmetic means of the populations from which the k samples were drawn are equal (H0), or not all equal (H1).



Steps in carrying out the analysis are:
1. Calculate the variance between the samples:
a. Calculate the mean of each sample, i.e. $\bar X_1, \bar X_2$, etc.
b. Calculate the grand average $\bar{\bar X}$ (pronounced "X double bar"):
$\bar{\bar X} = \dfrac{\sum X_1 + \sum X_2 + \sum X_3 + \cdots}{N_1 + N_2 + N_3 + \cdots}$
c. Take the difference between the mean of each sample and the grand average.
d. Square these deviations, multiply each by the number of items in the corresponding sample, and add; the total gives the sum of squares between the samples.
e. Divide the total in step d by the degrees of freedom. The degrees of freedom are one less than the number of samples, e.g. for 4 samples df = 4 − 1 = 3; for k samples, ν = k − 1.
2. Calculate the variance within the samples:
a. Calculate the mean of each sample, i.e. $\bar X_1, \bar X_2$, etc.
b. Take the deviations of the items in each sample from the mean of that sample.
c. Square these deviations and add; the total gives the sum of squares within the samples.
d. Divide the total in step c by the degrees of freedom, obtained by deducting the number of samples from the total number of items: ν = N − k, where k is the number of samples and N is the total number of observations.
3. Calculate the ratio
$F = \dfrac{\text{Between-column variance}}{\text{Within-column variance}}$, i.e. $F = S_1^2/S_2^2$
4. Compare the calculated value of F with the table value for the given df at a certain significance level, generally 5%. If the calculated F is greater than the table value, the difference is significant.

It is customary to summarize the calculations in the form of an ANOVA table:

Source of variation   SS (sum of squares)   df            MS (mean square)     F
Between samples       SSC                   ν1 = c − 1    MSC = SSC/(c − 1)    F = MSC/MSE
Within samples        SSE                   ν2 = n − c    MSE = SSE/(n − c)
Total                 SST                   n − 1

SST = total sum of squares of variations; SSC = sum of squares between samples (columns); SSE = sum of squares within samples; MSC = mean sum of squares between samples; MSE = mean sum of squares within samples. Here c is the number of samples and n the total number of observations.
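The one-way steps can be sketched in Python. The three samples below are hypothetical and `one_way_anova` is an illustrative name; note that the between-samples sum of squares weights each squared deviation by the size of its sample.

```python
import statistics


def one_way_anova(samples):
    k = len(samples)                                   # number of samples (c)
    n_total = sum(len(s) for s in samples)             # total observations (n)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [statistics.fmean(s) for s in samples]
    # Between samples: squared deviation of each sample mean from the grand
    # mean, weighted by that sample's size.
    ssc = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within samples: squared deviations of items from their own sample mean.
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    msc = ssc / (k - 1)
    mse = sse / (n_total - k)
    return {"SSC": ssc, "SSE": sse, "SST": ssc + sse,
            "df": (k - 1, n_total - k), "F": msc / mse}


# Hypothetical data: three samples of three items each
result = one_way_anova([[8, 10, 12], [12, 14, 16], [16, 18, 20]])
print(result)
```

For these data SSC = 96, SSE = 24, SST = 120 and F = 48/4 = 12 with (2, 6) df.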

Two-way Analysis of Variance


Steps in carrying out the analysis are:
1. Prepare the data:
a. To simplify the calculations, an assumed (hypothetical) mean may be subtracted from each value, and the calculations carried out on these differences.
b. Calculate the totals of the rows and columns and also the grand total T.
c. Calculate the correction factor T²/N, where N is the number of items.
2. Calculate the sum of squares between columns:
a. Square each column total, divide it by the number of items in that column, add these figures and then subtract the correction factor (T²/N).
b. The df in this case is (c − 1).
3. Calculate the sum of squares between rows:
a. Square each row total, divide it by the number of items in that row, add these figures and then subtract the correction factor (T²/N).
b. The df in this case is (r − 1).
4. Calculate the total sum of squares:
a. Add the squares of all the items in the table and subtract the correction factor.
b. The df in this case is N − 1.
c. The residual (error) sum of squares is SSE = SST − SSC − SSR, with (c − 1)(r − 1) df.
5. Present the above information in an ANOVA table in the following form:

Source of variation   SS     df               MS                           F
Between columns       SSC    c − 1            MSC = SSC/(c − 1)            MSC/MSE
Between rows          SSR    r − 1            MSR = SSR/(r − 1)            MSR/MSE
Residual or error     SSE    (c − 1)(r − 1)   MSE = SSE/[(c − 1)(r − 1)]
Total                 SST    N − 1

SST = total sum of squares of variations; SSC = sum of squares between columns; SSR = sum of squares between rows; SSE = sum of squares due to error. The total df is N − 1, i.e. cr − 1.
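The correction-factor method for two-way ANOVA with one observation per cell can be sketched as follows. The 3 × 4 table is hypothetical and `two_way_anova` is an illustrative name. The sketch skips the optional step of subtracting an assumed mean, since subtracting a constant leaves all sums of squares unchanged and only eases hand arithmetic.

```python
def two_way_anova(table):
    """table: list of rows, one observation per cell."""
    r, c = len(table), len(table[0])
    n = r * c
    t = sum(sum(row) for row in table)                       # grand total T
    cf = t ** 2 / n                                          # correction factor T^2 / N
    sst = sum(x ** 2 for row in table for x in row) - cf
    ssc = sum(sum(col) ** 2 / r for col in zip(*table)) - cf # each column has r items
    ssr = sum(sum(row) ** 2 / c for row in table) - cf       # each row has c items
    sse = sst - ssc - ssr                                    # residual (error)
    df_c, df_r, df_e = c - 1, r - 1, (c - 1) * (r - 1)
    mse = sse / df_e
    return {"SSC": ssc, "SSR": ssr, "SSE": sse, "SST": sst,
            "F_columns": (ssc / df_c) / mse, "F_rows": (ssr / df_r) / mse}


# Hypothetical 3 x 4 table (rows x columns)
data = [[2, 4, 6, 8],
        [3, 5, 6, 10],
        [4, 6, 9, 9]]
res = two_way_anova(data)
print(res)
```

Here T = 72, T²/N = 432, SSC = 60, SSR = 8, SST = 72 and SSE = 4, giving F = 30 for columns (3 and 6 df) and F = 6 for rows (2 and 6 df).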

Class Test for 5 marks


Note: Attempt all questions; each carries equal marks, i.e. 2.5 marks.

Q.N. 1. What is a hypothesis? Discuss its different types with suitable examples. Also explain the procedure of testing a hypothesis.
Q.N. 2. In a test given to two groups of students, the marks obtained are:
Group 1: 18 20 36 50 49 36 34 49 41
Group 2: 29 28 26 35 30 44 46
Examine the significance of the difference between the marks secured by the students of the two groups. Given: for $\alpha = 0.05$, $\nu = 14$, $t_{\alpha/2} = 2.14$; $\nu = 15$, $t_{\alpha/2} = 2.13$; $\nu = 16$, $t_{\alpha/2} = 2.12$.

Class Test for 5 marks


Note: Attempt all questions; each carries equal marks, i.e. 2.5 marks.

Q.N. 1. What is a hypothesis, and what are the different types of errors? Also explain the test of significance in hypothesis testing.
Q.N. 2. In a test given to a group of students, the marks obtained before and after coaching are:
Before: 18 20 36 50 49 36 34 49 41
After:  29 28 26 35 30 44 46 45 34
Examine whether the course was useful. Given: for $\alpha = 0.05$, $\nu = 8$, $t_{\alpha/2} = 2.306$.