
Top University Preparation // Sampling and Hypothesis Testing

Joseph Andreas December 13, 2013

1

Sampling

1.1

Definition

A population is the collection of all items that you want to examine. A sample is a subset of the population.

1.2

Four Types of Sampling

There are four types of sampling:

1. Random Sampling is the type of sampling where each item in the population has an equal chance of being selected. The selection is random: for example, write each item's number on a ball, put all the balls in a box, and then draw balls at random.
2. Systematic Sampling selects items according to a fixed rule. For example, instead of drawing balls at random, we can always choose the numbers that are divisible by 5.
3. Stratified Sampling can be used when the population in question is split into groups with different patterns of behaviour. For example, if we were trying to find the nation's favourite radio programme, most children would probably prefer different stations or programmes from most adults. Each group is sampled separately and the results are put together. In this example, if children make up 25% of the population, we would make sure that 25% of the sample is children.
4. Quota Sampling involves splitting the population into groups and sampling a given number of people from each group. For example, one samples exactly 50 men and 50 women. How the 50 men are chosen is not important and is not based on probability, unlike stratified sampling.
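The four methods can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the notes: the population is the numbers 1 to 100, the strata are made-up groups of 25 children and 75 adults, and the function names are hypothetical. The quota version simply takes the first members of each group, to stress that the choice within a group is not random.

```python
import random

def random_sample(population, k, rng):
    """Simple random sampling: every item is equally likely to be chosen."""
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Systematic sampling: take every `step`-th item, beginning at index `start`."""
    return population[start::step]

def stratified_sample(strata, k, rng):
    """Stratified sampling: draw randomly from each group in proportion to its size."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(k * len(members) / total)  # rounding may need care in general
        sample.extend(rng.sample(members, share))
    return sample

def quota_sample(strata, quotas):
    """Quota sampling: a fixed number per group, chosen here without any randomness."""
    return [item for name, members in strata.items() for item in members[:quotas[name]]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))
strata = {
    "child": [("child", i) for i in range(25)],  # 25% of the population
    "adult": [("adult", i) for i in range(75)],
}

picked_random = random_sample(population, 10, rng)
picked_systematic = systematic_sample(population, step=5, start=4)  # numbers divisible by 5
picked_stratified = stratified_sample(strata, 20, rng)              # 5 children, 15 adults
picked_quota = quota_sample(strata, {"child": 10, "adult": 10})
```

In the stratified sample, 5 of the 20 people are children, matching the 25% share in the population, whereas the quota sample fixes 10 per group regardless of the group sizes.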

1.3

Advantages and Disadvantages of Different Sampling Methods


1. Random Sampling
   (a) Advantages
       i. Easy to calculate.
       ii. The ideal method, as it involves less bias.
   (b) Disadvantages
       i. Not possible without a complete list of population members.
       ii. A minority group may not be properly represented.

2. Systematic Sampling
   (a) Advantages
       i. The sample is easy to choose.
       ii. The sample is evenly spread over the population.
   (b) Disadvantages
       i. The sample may be biased if there is a hidden periodicity in the population that coincides with the selection rule.
       ii. As a result, some patterns may be over- or under-represented.

3. Stratified Sampling
   (a) Advantages
       i. Makes sure that each group is properly represented.
   (b) Disadvantages
       i. More complex.
       ii. The strata must be carefully defined.

4. Quota Sampling
   (a) Advantages
       i. Easier than stratified sampling while retaining its advantages.
   (b) Disadvantages
       i. Biased, as the selection is not random.
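The hidden-periodicity problem of systematic sampling can be seen in a small sketch. The population below is hypothetical: its values repeat with period 5, and the systematic rule takes every 5th item, so the rule and the pattern coincide.

```python
# Hypothetical population whose values repeat with period 5: 0, 1, 2, 3, 4, 0, 1, ...
population = [i % 5 for i in range(100)]
population_mean = sum(population) / len(population)

# Systematic sample taking every 5th item; the step coincides with the period.
sample = population[4::5]
sample_mean = sum(sample) / len(sample)

print(population_mean)  # 2.0
print(sample_mean)      # 4.0
```

Every sampled item has the value 4, so the sample mean is far from the population mean even though the sample is spread evenly over the list.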

1.4

Sample Mean and Variance

Suppose we have sample data $X_1, X_2, \ldots, X_n$. From the sample data, we would like to estimate the population mean and variance. Let $\hat{\mu}$ and $\hat{\sigma}^2$ be the estimated mean and variance respectively. Then we have

$$\hat{\mu} = \frac{X_1 + X_2 + \ldots + X_n}{n}$$

$$\hat{\sigma}^2 = \frac{(X_1 - \hat{\mu})^2 + (X_2 - \hat{\mu})^2 + \ldots + (X_n - \hat{\mu})^2}{n - 1} = \frac{X_1^2 + X_2^2 + \ldots + X_n^2 - n\hat{\mu}^2}{n - 1}$$

The denominator $n - 1$ may seem strange; just remember that if you are estimating the population variance from sample data, you divide by $n - 1$ instead of $n$.
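As a quick check of the two forms of the variance estimate, here is a minimal Python sketch. The marks are hypothetical data, and the function names are chosen only for this illustration.

```python
import statistics

def sample_mean(xs):
    """Estimated mean: the sum of the data divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Estimated variance: squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Equivalent form: (sum of squares - n * mean^2) / (n - 1)."""
    n = len(xs)
    m = sample_mean(xs)
    return (sum(x * x for x in xs) - n * m * m) / (n - 1)

marks = [82, 85, 88, 79, 91]  # hypothetical sample data

print(sample_mean(marks))               # 85.0
print(sample_variance(marks))           # 22.5
print(sample_variance_shortcut(marks))  # 22.5
print(statistics.variance(marks))       # 22.5, the standard library also divides by n - 1
```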

1.5

Mean and Variance of the Sample Mean Taken from a Normal Distribution

Suppose $X$ follows the normal distribution $N(\mu, \sigma^2)$. Let $Y$ be the mean of a sample of $n$ values of $X$. Then

$$Y \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Proof. Let $X_1, X_2, \ldots, X_n$ be samples of $X$, each following the normal distribution. Note that $Y = \frac{1}{n}(X_1 + X_2 + X_3 + \ldots + X_n)$. Calculating the mean and variance of $Y$ gives the result.

Note that this does not require the sample size or the population to be large. What if the population is not normally distributed? (See the next part.)
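The calculation mentioned in the proof can be written out as follows. It assumes that the samples $X_1, \ldots, X_n$ are independent, which the notes do not state explicitly, and it uses the standard facts that expectation is linear and that $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.

```latex
\begin{align*}
E(Y) &= \frac{1}{n}\bigl(E(X_1) + E(X_2) + \ldots + E(X_n)\bigr)
      = \frac{1}{n} \cdot n\mu = \mu, \\
\mathrm{Var}(Y) &= \frac{1}{n^2}\bigl(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \ldots + \mathrm{Var}(X_n)\bigr)
      = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}.
\end{align*}
```

Since a sum of independent normal variables is again normal, $Y$ is normal with this mean and variance.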

1.6

Central Limit Theorem (simplified)

Suppose $X$ follows any distribution with mean $\mu$ and variance $\sigma^2$. Let $Y$ be the mean of a sample of $n$ values of $X$. If $n$ is large enough, then approximately

$$Y \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

where $Y$ is the mean of the sample data.

Note: no continuity correction is needed.
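A small simulation can make the theorem plausible. The sketch below is only an illustration with assumed settings: $X$ is uniform on $[0, 1]$ (mean $1/2$, variance $1/12$), each sample has $n = 30$ values, and 20000 samples are drawn with a fixed seed.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 30                   # size of each sample
trials = 20000           # number of samples drawn

# X is uniform on [0, 1]: mean 1/2, variance 1/12, clearly not normal.
sample_means = [
    statistics.fmean([rng.random() for _ in range(n)])
    for _ in range(trials)
]

observed_mean = statistics.fmean(sample_means)
observed_var = statistics.variance(sample_means)

expected_mean = 0.5
expected_var = (1 / 12) / n  # sigma^2 / n

print(observed_mean, expected_mean)
print(observed_var, expected_var)
```

The observed mean and variance of the sample means should land close to $\mu$ and $\sigma^2 / n$; a histogram of `sample_means` would show the familiar bell shape.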

2

Hypothesis Testing

A statistical hypothesis is a claim about a population parameter. Hypothesis testing means testing how plausible the claim is.

Example: Suppose you find that the marks of typical year 2 math students are normally distributed, $N(81, 40)$ (mean 81, variance 40). You then use a new method of teaching math in the year 2 class, and the marks improve, say to an average of 85. Suppose the size of the class is 40. What can you conclude? Is it because of the new method? Or is it just a random year in which you have a bunch of very good students?

2.1

Null and Alternative Hypothesis

Suppose we have a population with mean $\mu$ and standard deviation $\sigma$. Consider a sample of size $n$ that has been altered in some way. Our null hypothesis about the mean is that the mean of the sample still follows the mean of the population. Our alternative hypothesis is that the mean of the sample does not follow the mean of the population.

Example: A null hypothesis means nothing happened. There is no change in the mean. In our example, it means that the increase in our year 2 math grades is accounted for by chance, nothing more. In this case, we write

$$H_0 : \mu = \mu_0$$

In this example, $\mu_0 = 81$.

An alternative hypothesis means something happened. We could instead conclude that the increase is due to the effectiveness of our new method. In this case, we claim that the new method is more effective. We write

$$H_1 : \mu > \mu_0$$

In this example, $\mu > 81$.

2.2

One- and Two-Tailed Tests

Let us give some more varied examples of how to choose $H_1$. Our earlier example showed the first type, a one-tailed test for an increase. Let us introduce two more.

1. One-tailed test for an increase, $H_1 : \mu > \mu_0$.
2. One-tailed test for a decrease, $H_1 : \mu < \mu_0$. For this type, suppose that the marks are actually decreasing, and the average of a class of 40 students is now 78. Then $H_1 : \mu < 81$, since the hypothesis is about the population mean $\mu_0 = 81$ and 78 is the observed sample mean.
3. Two-tailed test, $H_1 : \mu \neq \mu_0$. Take a look at this example. Suppose a company produces cookies. The weight of each box of cookies follows $N(80, 1)$, in grams. One day, it is suggested that a machine is defective, and it produces 20 boxes of cookies with an average weight of 84 grams. Is the machine defective, so that the weight of a box of cookies has now changed?

2.3

Carrying Out the Test

Let us look back at our example. Is the increase just due to chance? Or is the new method actually doing good? One way to decide is to use a confidence interval. Consider sample data of size $n$, taken from a population with mean $\mu$ and standard deviation $\sigma$. By the central limit theorem, the mean of the sample data (let's just say $Y$) follows the distribution $N\left(\mu, \frac{\sigma^2}{n}\right)$. We can therefore see where the mean of our sample data (the sample mean) lies on this distribution. For example, if we carry out the test with 95% confidence, then we accept the original (null) hypothesis if our sample mean lies within the 95% confidence interval. Otherwise, we reject the null hypothesis and accept the alternative.

Going back to our example, we reject $H_0$ if, under $H_0$, the chance that the mean mark of a sample of size 40 reaches $\mu_1 = 85$ is less than $100\% - 95\% = 5\%$.

How do we do that? It is simple. Let $\bar{X}$ be the sample mean, as a random variable. By the sampling part, we have

$$\bar{X} \sim N\left(81, \frac{40}{40}\right)$$

It follows that $\bar{X} \sim N(81, 1)$, so we just need to find $P(\bar{X} \geq 85)$. If $P(\bar{X} \geq 85) \geq 0.05$, then we accept $H_0$. If $P(\bar{X} \geq 85) < 0.05$, then we reject $H_0$.

There is also an alternative way to do this, which is to find the rejection region. Forget for a moment that the average is now 85. Ask instead: which average would give a probability of exactly 5%? In this case, $\Phi(z) = 0.95$, where $\Phi$ is the standard normal cumulative distribution function, so we have

$$\frac{\bar{X} - 81}{\sqrt{40/40}} = z$$

Here, $z = 1.645$, so $\bar{X} = 82.645$. Thus, the rejection region is given by $\bar{X} \geq 82.645$. Since the average is $85 > 82.645$, we conclude that the new method performed better.
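Both approaches for the year 2 marks example can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 81       # population mean under H0
variance = 40  # population variance
n = 40         # class size
observed = 85  # observed sample mean
alpha = 0.05   # significance level, 100% - 95%

# Distribution of the sample mean under H0: N(mu0, variance / n) = N(81, 1).
sampling_dist = NormalDist(mu=mu0, sigma=sqrt(variance / n))

# Approach 1: probability of a sample mean of at least 85 under H0.
p_value = 1 - sampling_dist.cdf(observed)

# Approach 2: the boundary of the one-tailed rejection region.
critical_value = sampling_dist.inv_cdf(1 - alpha)

reject_h0 = p_value < alpha
print(p_value)         # far below 0.05
print(critical_value)  # about 82.645
print(reject_h0)       # True
```

Both approaches agree: the probability is well under 5%, and 85 lies beyond the critical value, so $H_0$ is rejected.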

Note that if the test is two-tailed, then the 5% is split into 2.5% in each tail to find the two rejection regions. Do you see why?
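As an illustration of the two-tailed case, the cookie example can be worked through in the same way. The 5% significance level is an assumption here, since the notes do not state one for that example, and the second parameter of $N(80, 1)$ is read as the variance.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 80       # mean box weight under H0, in grams
variance = 1   # variance of the box weight
n = 20         # number of boxes
observed = 84  # observed average weight
alpha = 0.05   # assumed significance level

# Distribution of the sample mean under H0: N(80, 1 / 20).
sampling_dist = NormalDist(mu=mu0, sigma=sqrt(variance / n))

# Two-tailed test: 2.5% in each tail gives two rejection regions.
lower = sampling_dist.inv_cdf(alpha / 2)
upper = sampling_dist.inv_cdf(1 - alpha / 2)

reject_h0 = observed < lower or observed > upper
print(lower, upper)  # about 79.562 and 80.438
print(reject_h0)     # True
```

The observed average of 84 grams lies far above the upper boundary, so under these assumptions the test would conclude that the weight has changed.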

2.4

Other Cases
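There are other cases also. Let us take a look at two.

1. Unknown variance? In this case, estimate the variance using

$$s_{sample}^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \ldots + (X_n - \bar{X})^2}{n - 1}$$

where $\bar{X}$ is the sample mean.

2. What if $n$ is small? Use

$$T = \frac{\bar{X} - \mu}{\sqrt{s_{sample}^2 / n}} \sim t(n - 1)$$

i.e. the t-distribution with $n - 1$ degrees of freedom.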
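A minimal sketch of the small-sample case follows. The five marks are hypothetical data, the test is one-tailed for an increase over $\mu_0 = 81$, and the critical value 2.132 is taken from a standard t-table (4 degrees of freedom, 5% in one tail), since the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, variance

marks = [82, 85, 88, 79, 91]  # hypothetical small sample
mu0 = 81                      # population mean under H0

n = len(marks)
s2 = variance(marks)          # sample variance, divides by n - 1
t_stat = (mean(marks) - mu0) / sqrt(s2 / n)

t_critical = 2.132            # t-table value: 4 degrees of freedom, 5% one tail
reject_h0 = t_stat > t_critical

print(t_stat)     # about 1.886
print(reject_h0)  # False
```

With so few data points the statistic does not exceed the critical value, so this hypothetical sample would not be enough to reject $H_0$.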
