
Statistical Hypothesis

A statistical hypothesis is a claim or statement about a particular aspect of a population.

Hypothesis testing is a decision-making process in which we draw inferences about the population.
• For example, suppose that an engineer is designing an air crew escape system that
consists of an ejection seat and a rocket motor that powers the seat.
• The rocket motor contains a propellant, and for the ejection seat to function properly,
the propellant should have a mean burning rate of 50 cm/sec.
• If the burning rate is too low, the ejection seat may not function properly, leading to an
unsafe ejection and possible injury of the pilot.
• Higher burning rates may imply instability in the propellant or an ejection seat that is too
powerful, again leading to possible pilot injury.
• So the practical engineering question that must be answered is: Does the mean burning
rate of the propellant equal 50 cm/sec, or is it some other value (either higher or
lower)?
Burning rate is a random variable that can be described by a probability distribution.

Suppose that our interest focuses on the mean burning rate (a parameter of this distribution).

Specifically, we are interested in deciding whether or not the mean burning rate is 50 centimeters per second.

We may express this formally as

H0: μ = 50 centimeters per second

H1: μ ≠ 50 centimeters per second
The statement H0: μ = 50 centimeters per second is called the null hypothesis.

This is a claim that is initially assumed to be true.

The statement H1: μ ≠ 50 centimeters per second is called the alternative hypothesis
and it is a statement that contradicts the null hypothesis.

Because the alternative hypothesis specifies values of μ that could be either greater or
less than 50 centimeters per second, it is called a two-sided alternative hypothesis.

Remember that hypotheses are always statements about the population or distribution
under study, not statements about the sample
A one-sided alternative hypothesis may look like H1: μ > 50 centimeters per second or H1: μ < 50 centimeters per second.
Test of a hypothesis

It is a procedure leading to a decision about the null hypothesis.

Hypothesis-testing procedures rely on using the information in a random sample from the population of interest.

If this information is consistent with the null hypothesis, we will not reject it; however, if this information is inconsistent with the null hypothesis, we will conclude that the null hypothesis is false and reject it in favor of the alternative.
Test of a hypothesis….continued

The truth or falsity of a particular hypothesis can never be known with certainty unless we can examine the entire population. This is usually impossible in most practical situations.

Therefore, a hypothesis-testing procedure should be developed with the probability of reaching a wrong conclusion in mind.

Testing the hypothesis involves taking a random sample, computing a test statistic from the sample data, and then using the test statistic to make a decision about the null hypothesis.
TESTS OF STATISTICAL HYPOTHESES (propellant burning rate)

The null hypothesis is that the mean burning rate is 50 centimeters per second, and
the alternative is that it is not equal to 50 centimeters per second.

Test Statistic (x̄)

Take a sample of n = 10 specimens and find the sample mean burning rate x̄.
The sample mean is an estimate of the true population mean μ.
If the value of the sample mean x̄ is close to the hypothesized value μ = 50 centimeters per second, this is evidence that the null hypothesis is true.
If the sample mean x̄ is considerably different from 50 centimeters per second, this is evidence that the alternative hypothesis H1 is true.

Thus, the sample mean is the test statistic in this case.


Acceptance Region and Critical Values

Suppose that if 48.5 ≤ x̄ ≤ 51.5, we will not reject the null hypothesis H0: μ = 50, and if either x̄ < 48.5 or x̄ > 51.5, we will reject the null hypothesis in favor of the alternative hypothesis H1: μ ≠ 50.

The interval 48.5 ≤ x̄ ≤ 51.5 is called the acceptance region, the values below 48.5 and above 51.5 form the critical region, and 48.5 and 51.5 are the critical values.
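This decision rule can be sketched in Python; the sample values below are hypothetical, chosen only to illustrate the procedure:

```python
# Decision rule for H0: mu = 50 vs H1: mu != 50 with
# acceptance region 48.5 <= x_bar <= 51.5 (hypothetical sample data).
from statistics import mean

sample = [49.1, 51.2, 50.5, 48.9, 50.1, 49.8, 51.0, 50.3, 49.5, 50.6]  # n = 10
x_bar = mean(sample)

if 48.5 <= x_bar <= 51.5:
    decision = "fail to reject H0"
else:
    decision = "reject H0 in favor of H1"

print(round(x_bar, 2), decision)
```

Here the sample mean falls inside the acceptance region, so H0 is not rejected.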
• Type I Error

• This decision procedure can lead to either of two wrong conclusions.

• For example, the true mean burning rate of the propellant could be equal to 50 centimeters per second.

• However, for the randomly selected propellant specimens that are tested, we could observe a value of the test statistic x̄ that falls into the critical region.

• We would then reject the null hypothesis H0 in favor of the alternative H1 when, in fact, H0 is really true. This type of wrong conclusion is called a type I error.
Type II Error

Now suppose that the true mean burning rate is different from 50 centimeters
per second, yet the sample mean x_bar falls in the acceptance region.

In this case, we would fail to reject H0 when it is false.

This type of wrong conclusion is called a type II error.


What is α
Computing the Type I Error

Suppose the acceptance region is 48.5 ≤ x̄ ≤ 51.5.

Also, suppose that the standard deviation of burning rate is σ = 2.5 centimeters per second and that the burning rate has a distribution for which the conditions of the central limit theorem apply.

The distribution of the sample mean is approximately normal with mean μ = 50 and standard deviation σ/√n = 2.5/√10 = 0.79 (for n = 10).

The probability of making a type I error (or the significance level of our test) is equal to the sum of the areas that have been shaded in the tails of the normal distribution. We may find this probability as

α = P(x̄ < 48.5 when μ = 50) + P(x̄ > 51.5 when μ = 50)
  = P(Z < (48.5 − 50)/0.79) + P(Z > (51.5 − 50)/0.79)
  = P(Z < −1.90) + P(Z > 1.90)
  = 0.0287 + 0.0287 = 0.0574
This is the type I error probability. This
implies that 5.74% of all random
samples would lead to rejection of the
hypothesis H0: μ = 50 centimeters per
second when the true mean burning
rate is really 50 centimeters per second.
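This probability can be checked numerically with the standard library's `statistics.NormalDist`:

```python
# Type I error probability for the acceptance region 48.5 <= x_bar <= 51.5
# when the true mean is mu = 50, sigma = 2.5, and n = 10.
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50.0, 2.5, 10
sampling_dist = NormalDist(mu, sigma / sqrt(n))  # distribution of x_bar

alpha = sampling_dist.cdf(48.5) + (1 - sampling_dist.cdf(51.5))
# alpha is about 0.0578; the 0.0574 above comes from rounding
# the standard error sigma / sqrt(n) to 0.79.
print(round(alpha, 4))
```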
Hypothesis Tests in Simple Linear Regression

• This approach helps to assess the adequacy of a linear regression model and its parameters.
• It facilitates the construction of confidence intervals.

To test hypotheses about the slope and intercept of the regression model, we make the following assumption:

Errors (residuals) are normally and independently distributed with mean zero and variance σ², abbreviated NID(0, σ²).
Important and special hypotheses

A special case is the test of significance of regression: H0: β1 = 0 versus H1: β1 ≠ 0.

When the null hypothesis is true, it is equivalent to concluding that there is no linear relationship between x and Y, as shown in the Figure above.

This may imply either that x is of little value in explaining the variation in Y and that the best estimator of Y for any x is ŷ = Ȳ, or that the true relationship between x and Y is not linear, as seen below.

When the null hypothesis is false (rejected), this implies that x is of value in explaining the variability in Y and that the straight-line model is adequate (as seen in the Figure above).

It could also mean that although there is a linear effect of x, better results could be obtained with the addition of higher-order polynomial terms in x [see Fig. below].
EXAMPLE: Test the NULL hypothesis that the slope is ZERO

We will use α = 0.01. We have SSE = 21.25, so the estimate of σ² is

σ̂² = SSE / (n − 2) = 21.25 / 18 = 1.18

and the t-statistic becomes

t0 = β̂1 / √(σ̂² / Sxx)
ANALYSIS OF VARIANCE (ANOVA)
F-Distribution

Definition.
If U and V are independent chi-square random variables with r1 and r2 degrees of freedom, respectively, then:

F = (U / r1) / (V / r2)

follows an F-distribution with r1 numerator degrees of freedom and r2 denominator degrees of freedom.
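This construction can be checked by simulation: build each chi-square variate as a sum of squared standard normals, form the ratio, and compare the sample mean with the theoretical mean r2 / (r2 − 2). The choice r1 = 6, r2 = 10 is arbitrary here:

```python
# Simulation sketch of the F-distribution definition.
import random

random.seed(42)
r1, r2, N = 6, 10, 100_000

def chi_square(df):
    # A chi-square variate with df degrees of freedom is a sum of
    # df squared independent standard normal variates.
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

f_values = [(chi_square(r1) / r1) / (chi_square(r2) / r2) for _ in range(N)]
mean_f = sum(f_values) / N
print(round(mean_f, 2))  # close to the theoretical mean 10 / 8 = 1.25
```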
Characteristics of the F-Distribution

(1) F-distributions are generally skewed. The shape of an F-distribution depends on the values of r1 and r2, the numerator and denominator degrees of freedom, respectively, as this picture illustrates.

(2) The probability density function of an F random variable with r1 numerator degrees of freedom and r2 denominator degrees of freedom is:

f(x) = [Γ((r1 + r2)/2) (r1/r2)^(r1/2) x^(r1/2 − 1)] / [Γ(r1/2) Γ(r2/2) (1 + r1·x/r2)^((r1 + r2)/2)], for x > 0

The cumulative distribution function (CDF) is obtained by integrating this density; it has no simple closed form and is usually evaluated from tables or software.


The Gamma Function

The Gamma function is an extension of the concept of factorial numbers. We can input (almost) any real or complex number into the Gamma function and find its value. Such values will be related to factorial values. There is a special case where we can see the connection to factorial numbers.

If n is a positive integer, then the function Gamma (named after the Greek letter "Γ" by the mathematician Legendre) of n is:

Γ(n) = (n − 1)!

A formula that allows us to find the value of the Gamma function for any real value of n (with n > 0) is as follows:

Γ(n) = ∫₀^∞ x^(n − 1) e^(−x) dx
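The factorial connection is easy to verify with the standard library's `math.gamma`:

```python
# math.gamma implements the Gamma function; for a positive integer n,
# Gamma(n) equals (n - 1)!.
import math

print(math.gamma(5))                        # 24.0, i.e. 4!
print(math.gamma(5) == math.factorial(4))   # True
print(round(math.gamma(0.5) ** 2, 6))       # Gamma(1/2)^2 = pi
```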
Relationship between ALPHA, r1 and r2 with F
VALUES OF F for r1 = 6, r2 = 2 and specific values of ALPHA
ANALYSIS OF VARIANCE (ANOVA)
APPROACH TO TEST SIGNIFICANCE OF REGRESSION

The ANOVA approach works as under:

Partition the total variability in the response variable into meaningful components as the basis for the test. The analysis of variance identity is as follows:

SST = SSR + SSE

that is, the total sum of squares Σ(yi − ȳ)² equals the regression sum of squares Σ(ŷi − ȳ)² plus the error sum of squares Σ(yi − ŷi)².
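This partition can be sketched on a small data set; the x and y values below are hypothetical, chosen only to demonstrate that the identity holds and how the F-statistic is formed:

```python
# ANOVA decomposition for simple linear regression on hypothetical data;
# verifies SST = SSR + SSE and forms F = MSR / MSE.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = Sxy / Sxx                   # estimated slope
b0 = y_bar - b1 * x_bar          # estimated intercept
y_hat = [b0 + b1 * xi for xi in x]

SST = sum((yi - y_bar) ** 2 for yi in y)
SSR = sum((yh - y_bar) ** 2 for yh in y_hat)
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

F = (SSR / 1) / (SSE / (n - 2))  # MSR / MSE with 1 and n - 2 df
print(round(abs(SST - (SSR + SSE)), 10), round(F, 1))
```

Significance of regression is then tested by comparing F with the critical value f_{α, 1, n−2}.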
ANOVA TABLE

From the F-table, f0.01,1,18 = 8.29.

Since F = 128.86 > f0.01,1,18 = 8.29, we reject the NULL hypothesis H0.