Revision: Hypothesis Testing
Niza Talukder
Normal Distribution
• Example: Suppose a large group of students take a test. Many of the students are likely to get scores near
the mean, and the number of scores in ranges of a fixed width is likely to tail off away from the mean. Let's
say that the average score was 60. We would expect to find more students in the range 55-65 than in the
range 85-95. These considerations suggest a probability density function that peaks at the mean and tails off
at its extremities.
(Refer to class discussions to see how the histogram transitions to a normal distribution)
(Figure: normal distribution curve centred at the mean µ)
Properties:
• The distribution is symmetrical about the mean.
• Mean, median and mode are all equal due to the symmetry of the distribution.
• The mean of the random variable is $E(X) = \mu$
• The variance of the random variable is $\mathrm{Var}(X) = E[(X - \mu)^2] = \sigma^2$
• Total area under the curve is 1
There are two parameters fundamental to this distribution: the mean $\mu$ and the variance $\sigma^2$. A random variable X having a normal
distribution is denoted $X \sim N(\mu, \sigma^2)$.
• There is no convenient closed-form formula for normal probabilities, so the area under the curve
cannot be calculated in the usual manner. Instead, probabilities for the standard normal distribution have been
calculated numerically and tabulated.
• Any normal distribution can be transformed into the standard normal distribution $N(0, 1)$. This transformation is
achieved by $Z = \frac{X - \mu}{\sigma}$
Example 3
A company produces lightbulbs whose lifetimes follow a normal
distribution with mean 1200 hours and standard deviation 250 hours.
If a lightbulb is chosen randomly from the company’s output, what is
the probability that its lifetime will be between 900 and 1300 hours?
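Example 3 can be worked by standardizing both endpoints and subtracting the standard normal CDF values. The sketch below assumes Python 3.8+ for `statistics.NormalDist`; the helper name `standardize` is our own illustration, not something from the notes.

```python
from statistics import NormalDist

# Example 3 (from the notes): lifetime X ~ N(mean 1200 h, sd 250 h)
MU, SIGMA = 1200, 250

def standardize(x, mu, sigma):
    """Hypothetical helper: z-score of x, i.e. Z = (X - mu) / sigma."""
    return (x - mu) / sigma

z_low = standardize(900, MU, SIGMA)    # (900 - 1200) / 250 = -1.2
z_high = standardize(1300, MU, SIGMA)  # (1300 - 1200) / 250 = 0.4

# P(900 < X < 1300) = P(-1.2 < Z < 0.4) = Phi(0.4) - Phi(-1.2)
phi = NormalDist().cdf  # standard normal CDF (mean 0, sd 1)
prob = phi(z_high) - phi(z_low)
print(f"z-scores: {z_low:.2f}, {z_high:.2f}; P(900 < X < 1300) = {prob:.4f}")
```

The result, about 0.540, agrees with the z-table calculation 0.6554 - 0.1151.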
Empirical Rule:
Recall that the standard deviation measures how spread out the numbers are. For a normal distribution, approximately:
68% of the values are within 1 standard deviation of the mean
95% of the values are within 2 standard deviations of the mean
99.7% of the values are within 3 standard deviations of the mean
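The 68/95/99.7 figures are rounded values of P(|Z| < k) for a standard normal Z. A short check, relying on the standard identity P(|Z| < k) = erf(k/√2) and Python's `math.erf`; the function name is illustrative only.

```python
import math

def prob_within_k_sd(k):
    """P(|Z| < k) for a standard normal Z, using P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {prob_within_k_sd(k):.4f}")
```

This prints 0.6827, 0.9545 and 0.9973.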
Central Limit Theorem
• It states that the distribution of the means of samples taken from a population with any
distribution (provided it has a finite expected value and variance) converges to a normal
distribution with mean $\mu$ and variance $\sigma^2/n$ as the sample size n increases.
• The CLT describes how the means of samples drawn from a non-normal distribution are increasingly
well approximated by the normal distribution as the sample size increases.
• The fact that the sampling distribution of the mean is approximately normal whatever the
underlying distribution is an extremely powerful result.
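The CLT can be illustrated (not proved) by simulation. This sketch assumes a skewed Exponential(1) population, chosen by us for illustration (mean 1, variance 1), draws many samples of size n = 30, and checks that the sample means have mean near µ = 1, variance near σ²/n = 1/30, and roughly 68% of them within one standard error of µ, as a normal distribution would predict. Function and variable names are hypothetical.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an Exponential(rate 1) population.
    This population is skewed (not normal) and has mean 1 and variance 1."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

rng = random.Random(42)          # fixed seed for a reproducible run
n, reps = 30, 20_000
means = simulate_sample_means(n, reps, rng)

mu, sigma2 = 1.0, 1.0            # population mean and variance
se = math.sqrt(sigma2 / n)       # standard error sigma / sqrt(n)
share_within_1se = sum(abs(m - mu) < se for m in means) / reps

print(f"mean of sample means:     {statistics.fmean(means):.4f} (CLT: {mu:.4f})")
print(f"variance of sample means: {statistics.pvariance(means):.4f} (CLT: {sigma2 / n:.4f})")
print(f"share within 1 SE of mu:  {share_within_1se:.3f} (normal: about 0.68)")
```

Even though individual exponential draws are far from normal, the averages of 30 of them already behave approximately normally.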
Sampling distribution of the mean
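• If the sample size is a small proportion of the population size, then subtracting the mean and dividing by
the standard error $\sigma/\sqrt{n}$ yields the random variable $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, which has a standard normal distribution (exactly for a normal population, approximately for large n by the CLT).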
• Estimator: an estimator of a population parameter is a random variable that depends on the sample
information and whose realizations provide approximations to this unknown parameter. A specific
realization of that random variable is called an estimate.
• Point estimator: a point estimator of a population parameter is a function of the sample information that
yields a single number. The corresponding realization is called the point estimate of the parameter.
Example: You want to find the average family income in your neighborhood based on a random sample
of 20 families. Suppose you find the sample mean to be $49,000. Here, the point estimator is the sample
mean and the point estimate is $49,000.
• In most practical situations, a point estimate alone is inadequate, since it gives no indication of how close it is
likely to be to the true value. In that case, you will want to use an interval estimate: a range of values within which
the quantity being estimated is likely to lie.
• Significance level: the probability of rejecting the null hypothesis given that it is true, i.e. rejecting a claim
that is in fact correct. It is denoted by $\alpha$.
• Confidence level: $1 - \alpha$
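E.g. a 95% confidence interval means that if we repeated the sampling many times and built an interval each time,
about 95% of those intervals would contain the true parameter; we are therefore 95% confident that our interval contains it.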
• In the figure, the central 95% of the area (the unshaded region) corresponds to the confidence interval. The
shaded region is the significance level, $\alpha = 5\%$.
• For a two-tailed interval or test, this 5% is divided by 2, since you consider two tails: 2.5% in the right
tail and 2.5% in the left tail.
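The effect of splitting α between two tails shows up directly in the critical values. A small sketch using `statistics.NormalDist.inv_cdf` (Python 3.8+); the helper `critical_values` is a name chosen for illustration.

```python
from statistics import NormalDist

def critical_values(alpha):
    """Return (one-tailed z_alpha, two-tailed z_{alpha/2}) for the standard normal."""
    z = NormalDist()                           # standard normal N(0, 1)
    one_tailed = z.inv_cdf(1 - alpha)          # all of alpha in one tail
    two_tailed = z.inv_cdf(1 - alpha / 2)      # alpha split equally between two tails
    return one_tailed, two_tailed

for alpha in (0.10, 0.05, 0.01):
    one, two = critical_values(alpha)
    print(f"alpha = {alpha:.2f}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

At α = 0.05, for example, this gives 1.645 with all of α in one tail versus 1.960 when 2.5% sits in each tail.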
Confidence Interval for the mean of a normal population:
In this section, we will estimate an unknown population parameter by establishing an interval of values.
Suppose that we have a random sample of n observations from a normal distribution with mean $\mu$ and
variance $\sigma^2$. If $\sigma^2$ is known, and the observed sample mean is $\bar{x}$, then a $100(1-\alpha)\%$ confidence interval for the
population mean is given by
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
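The interval above translates directly into code. A sketch assuming a normal population with known σ; the function name and the numbers in the usage line (x̄ = 50, σ = 10, n = 100) are hypothetical, not taken from the notes.

```python
import math
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """100(1 - alpha)% CI for mu: xbar -/+ z_{alpha/2} * sigma / sqrt(n).
    Assumes a normal population (or large n) and a known sigma."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, e.g. 1.96 at 95%
    margin = z_crit * sigma / math.sqrt(n)         # margin of error
    return xbar - margin, xbar + margin

lower, upper = mean_ci_known_sigma(xbar=50, sigma=10, n=100)
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

With these illustrative inputs the margin of error is 1.96 × 10/10 = 1.96, giving (48.04, 51.96).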
Using only a point estimate is like fishing in a murky lake with a spear. You can throw a spear where you saw
a fish, but you’ll probably miss it. On the other hand, if you toss a net in that area, you have a good chance of
catching the fish. A confidence interval is like a fishing net, and it represents a range of plausible values
where you are likely to find the population parameter.
Example 5
At 95% confidence, $\alpha = 0.05$.
Here you will be drawing a two-tailed figure (since you are dealing with a range), so $\alpha$ is divided by 2, giving
$\alpha/2 = 0.025$. Now find the critical value $z_{0.025} = 1.96$ from the z-table and do the calculation.
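Interpretation: If samples of 25 observations were drawn from this population repeatedly and an interval computed each
time, about 95% of those intervals would contain the true mean weight. For this sample, we are 95% confident that the
true mean weight lies between 19.33 and 20.27 ounces.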
Example 7
• Often we want to formulate a statement that can be either correct or incorrect. In such cases, we
use statistical evidence to judge whether the proposed statement is supported. For example:
• Suppose it is suggested that students perform better when the class size is smaller. Does the sample data
provide enough evidence to support this claim? This is where we use a hypothesis test to assess the
claim.
• Hypothesis test enables us to determine the validity of a claim/conjecture by using sample data.
• Alternative hypothesis: the statement that you want to prove/test/determine. It describes the situation when the
null hypothesis is not true (the opposite of $H_0$). Denoted as $H_1$.
EXAMPLES
State the appropriate null hypothesis and alternative hypothesis for each of the following cases.
(a) The mean area of the several thousand apartments in a new development is advertised to be 1250 square
feet. A tenant group thinks that the apartments are smaller than advertised. They hire an engineer to measure a
sample of apartments to test their suspicion.
$H_0: \mu = 1250$
$H_1: \mu < 1250$
(b) Larry's car averages 32 miles per gallon on the highway. He now switches to a new motor oil that
is advertised as increasing gas mileage. After driving 3000 highway miles with the new oil, he wants to
determine whether his gas mileage has actually increased.
$H_0: \mu = 32$
$H_1: \mu > 32$
(c) The diameter of a spindle in a small motor is supposed to be 5 millimetres. If the spindle is either too small or too
large, the motor will not perform properly. The manufacturer measures the diameter in a sample of motors to
determine whether the mean diameter has moved away from the target.
$H_0: \mu = 5$ (the claim)
$H_1: \mu \neq 5$
Note: Case (c) requires a two-sided hypothesis test.
• Hypothesis testing: A neurologist is testing the effect of a drug on response time by injecting 100 rats
with a unit of the drug, subjecting each to a neurological stimulus and recording its response time. The
neurologist knows that the mean response time of rats not injected with the drug is 1.2 seconds.
The mean response time of the 100 injected rats is 1.05 seconds, with a sample standard deviation of
0.5 seconds. Do you think that the drug has an effect on the response time? Use a 1% significance level.
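• State the hypotheses: $H_0: \mu = 1.2$ versus $H_1: \mu \neq 1.2$.
• Then find the z-score, which measures how many standard errors the sample mean lies from the hypothesized mean:
$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1.05 - 1.2}{0.5/\sqrt{100}} = \frac{-0.15}{0.05} = -3$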
• Draw the figure as shown in class
• Rejection rule
Since this is a two-sided test, you reject the null if
$z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
In this case, the z-score is -3, which is less than the critical value $-z_{0.005} \approx -2.576$.
The z-score lies in the rejection region.
• Conclusion:
At the 1% significance level, the z-score is less than the critical value, -2.576. Thus, we have enough evidence to reject
the null in favor of the alternative. Hence, we can conclude that the drug has an effect on the reaction time of
the rats.
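The two-sided test for the drug example can be sketched as follows. Like the notes, it uses the sample standard deviation s = 0.5 in place of σ, which is reasonable for n = 100; the helper name `z_statistic` is illustrative.

```python
import math
from statistics import NormalDist

def z_statistic(xbar, mu0, sd, n):
    """Test statistic z = (xbar - mu0) / (sd / sqrt(n))."""
    return (xbar - mu0) / (sd / math.sqrt(n))

# Drug example from the notes: H0: mu = 1.2 vs H1: mu != 1.2 at alpha = 0.01
alpha = 0.01
z = z_statistic(xbar=1.05, mu0=1.2, sd=0.5, n=100)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{0.005}, about 2.576
reject_h0 = z < -z_crit or z > z_crit          # two-sided rejection rule
print(f"z = {z:.2f}, critical values = +/-{z_crit:.3f}, reject H0: {reject_h0}")
```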
Finding the p-value
• Interpretation of p-value: the smallest significance level at which the null hypothesis can be rejected given
the single observed sample mean.
The sample mean is 3 standard errors below the hypothesized mean. So what is the probability of getting a result this extreme?
• Refer to the z-table and find the probability for -3. The value is 0.00135; since this is a two-tailed test, double
it (2 × 0.00135 = 0.0027). This 0.0027 is the p-value: the probability, assuming the drug has no effect ($H_0$ true),
of observing a sample mean at least this far from 1.2 seconds. Since the p-value is smaller than $\alpha = 0.01$, we reject the null.
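The table lookup and doubling can be reproduced with the standard normal CDF. A sketch; `two_sided_p_value` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) under H0, i.e. 2 * P(Z <= -|z|) for a standard normal Z."""
    return 2 * NormalDist().cdf(-abs(z))

p = two_sided_p_value(-3.0)
alpha = 0.01
print(f"p-value = {p:.4f}; reject H0 at alpha = {alpha}: {p < alpha}")
```

This prints a p-value of 0.0027, matching the table-based value.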
Guidelines for Hypothesis testing
Hypothesis testing is a proof by contradiction. The testing process has 5 steps:
For a left-tailed version of the drug test ($H_1: \mu < 1.2$), the z-score remains the same, but the figure and the critical value
change because this is a one-tailed test.
• Rejection rule for a left-tailed hypothesis test:
$z < -z_{\alpha}$ (Note: here we do not divide $\alpha$ by 2 because we are considering only one tail.) So $\alpha = 0.01$,
which gives a critical value of $-z_{0.01} \approx -2.33$.
The z-score (-3) falls in the rejection region, so we reject the null and conclude that the drug reduces the
response time.
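For the left-tailed version, only the critical value and the tail probability change. A sketch using the same numbers as the notes (z = -3, α = 0.01); variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.01
z = -3.0                                   # same z-score as in the two-sided test
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(alpha)         # left-tail critical value -z_{0.01}, about -2.326
p_left = std_normal.cdf(z)                 # one-tailed p-value P(Z <= z), not doubled
reject_h0 = z < z_crit
print(f"critical value = {z_crit:.3f}, p-value = {p_left:.5f}, reject H0: {reject_h0}")
```

Note that the one-tailed p-value (about 0.00135) is half the two-tailed one.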
• We can compare the p-value we calculate with a fixed value that we regard as decisive. This decisive
value is called the significance level (our $\alpha$), and we reject $H_0$ when the p-value is less than $\alpha$. The most common values for $\alpha$ are 0.1, 0.05 and 0.01.
Example
An air freight company wishes to test whether or not the mean weight
of parcels shipped on a particular route exceeds 10 pounds. A random
sample of 49 shipping orders was examined and found to have an
average weight of 11 pounds. Assume that the standard deviation of
the weight is 2.8 pounds. Conduct a hypothesis test at the 5%
significance level.
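Because the company asks whether the mean weight exceeds 10 pounds, this is a right-tailed test ($H_0: \mu = 10$ versus $H_1: \mu > 10$). A sketch of the calculation, treating the stated 2.8 pounds as the known population standard deviation; variable names are illustrative.

```python
import math
from statistics import NormalDist

# Data from the example: n = 49, xbar = 11, sigma = 2.8; H0: mu = 10 vs H1: mu > 10
n, xbar, mu0, sigma, alpha = 49, 11, 10, 2.8, 0.05

z = (xbar - mu0) / (sigma / math.sqrt(n))   # (11 - 10) / (2.8 / 7) = 2.5
z_crit = NormalDist().inv_cdf(1 - alpha)    # right-tail critical value z_{0.05}, about 1.645
p_value = 1 - NormalDist().cdf(z)           # right-tail p-value P(Z >= z)
reject_h0 = z > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```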
Question for next class