IT3101: Foundations of Data Science
• Introduction to Hypothesis Testing
• Significance Tests: p-Value and Critical Value
• Z-Test
• T-Test
Hypothesis Testing: Introduction
• Hypothesis testing is a statistical procedure in which an analyst tests an
assumption about a population parameter. It uses sample data to assess how
plausible the hypothesis is.
• It is a form of statistical inference that uses data from a sample to
draw conclusions about a population parameter or a population
probability distribution.
• Analysts examine a random sample of the population to decide between
two competing hypotheses:
• Null hypothesis (H0)
• Alternative hypothesis (H1)
Hypothesis Testing: Example
A restaurant owner installed a new automated drink machine. The
machine is designed to dispense 530 mL into a medium-size glass. The
owner suspects that the machine may be dispensing too much into medium
glasses, so they decide to take a sample of 30 medium drinks to see whether
the average amount is significantly greater than 530 mL.

Null hypothesis?
H0: μ = 530
Alternate hypothesis?
H1: μ > 530
• Because hypothesis tests are based on sample information, the
possibility of errors must be considered.
➢ A type I error corresponds to rejecting H0 when H0 is actually true, and
➢ A type II error corresponds to failing to reject H0 when H0 is false.

• The probability of making a type I error is denoted by α, and the


probability of making a type II error is denoted by β.
• The maximum allowable probability of making a type I error, called
the level of significance for the test.
• Common choices for the level of significance are α = 0.05 and α = 0.01.
• “Confidence” can be measure as: 1 - significance level
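To make the meaning of α concrete, here is a minimal Monte Carlo sketch. It assumes the drink-machine scenario with a hypothetical population standard deviation of 10 mL (the example does not give one) and an upper-tailed one-sample z-test. Every simulated sample is drawn from a population where H0 (μ = 530) is true, so every rejection is a type I error; the long-run rejection rate should land close to α.

```python
import random
import statistics

def type_i_error_rate(mu0=530.0, sigma=10.0, n=30, alpha=0.05, trials=10_000, seed=1):
    """Estimate how often an upper-tailed one-sample z-test rejects a TRUE H0.

    sigma = 10 mL is a hypothetical value chosen for illustration only.
    """
    rng = random.Random(seed)
    # Critical value for an upper-tailed test at level alpha (about 1.645 for 0.05)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha)
    se = sigma / n ** 0.5  # standard error of the sample mean
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 (mu = mu0) really is true
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (statistics.fmean(sample) - mu0) / se
        if z > z_crit:
            rejections += 1  # rejecting a true H0 is a type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated type I error rate: {rate:.3f}")
```

Raising or lowering `alpha` in the call moves the estimated rate with it, which is exactly what "maximum allowable probability of a type I error" means.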
Hypothesis Testing: Example Scenario
• To interpret the result of a hypothesis test, we need a measure of how
likely the sample results would be if the null hypothesis were true.
• Significance tests are used for this purpose:
• p-value
• critical values
• A statistical hypothesis test may return a probability called the
p-value. We can use this quantity to interpret or quantify the result of
the test and either reject or fail to reject the null hypothesis.
• This is done by comparing the p-value to a threshold chosen
beforehand, called the significance level (α).
• If p-value > α: fail to reject the null hypothesis (i.e., not a significant result).
• If p-value ≤ α: reject the null hypothesis (i.e., a significant result).
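The p-value decision rule can be sketched in a few lines of Python. This assumes an upper-tailed z-test, so the p-value is P(Z ≥ z) under H0; the helper names and the z values in the loop are hypothetical.

```python
from statistics import NormalDist

def upper_tail_p_value(z):
    """p-value of an upper-tailed z-test: P(Z >= z) assuming H0 is true."""
    return 1 - NormalDist().cdf(z)

def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 if p <= alpha, otherwise fail to reject."""
    if p_value <= alpha:
        return "reject H0 (significant result)"
    return "fail to reject H0 (not significant)"

# Hypothetical test statistics, one on each side of the 0.05 threshold
for z in (1.2, 2.1):
    p = upper_tail_p_value(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```

Note that α is fixed before looking at the data; only the p-value depends on the sample.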
• A common misunderstanding is that the p-value is the probability of the null
hypothesis being true or false given the data. In probability notation, this would be
written as:
Pr(hypothesis | data)
This is incorrect.
• Instead, the p-value can be thought of as the probability of obtaining data at least
as extreme as the observed sample, given the assumption (the null hypothesis)
embedded in the statistical test.
• Again, using probability notation, this would be written as:
Pr(data | hypothesis)
• It allows us to reason about whether the data fit the hypothesis, not the other
way around.
• In short, the p-value measures how likely a sample at least as extreme as the
observed one would be if the null hypothesis were true.
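The Pr(data | hypothesis) reading can be checked by simulation. This sketch assumes a one-sample z statistic, which follows the standard normal distribution when H0 is true, and a hypothetical observed value z = 1.8. The simulation never estimates a probability of H0 itself; it only counts how often statistics at least as extreme as the observed one occur when H0 is assumed true, and that fraction approaches the analytical upper-tail p-value.

```python
import random
from statistics import NormalDist

def simulated_p_value(z_observed, trials=200_000, seed=7):
    """Estimate P(Z >= z_observed | H0) by simulating the test statistic under H0."""
    rng = random.Random(seed)
    # Under H0, a one-sample z statistic follows the standard normal distribution
    at_least_as_extreme = sum(rng.gauss(0.0, 1.0) >= z_observed for _ in range(trials))
    return at_least_as_extreme / trials

z_obs = 1.8  # hypothetical observed test statistic
print(f"simulated p = {simulated_p_value(z_obs):.4f}")
print(f"exact p     = {1 - NormalDist().cdf(z_obs):.4f}")
```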
Significance Test: Critical Value
Z-Test
Z-Test: Procedure (One Sample)
Z-Test: Example
A principal at a certain school claims that the students in his school are of
above-average intelligence. A random sample of thirty students has a mean IQ
score of 112.5. Is there sufficient evidence to support the principal’s claim?
The population mean IQ is 100, with a standard deviation of 15.
• Step 1: State the Null hypothesis.
The accepted fact is that the population mean is 100, so: H0: μ = 100
• Step 2: State the Alternate Hypothesis.
The claim is that the students have above average IQ scores, so: H1: μ > 100.
• Step 3: Draw a picture to help you visualize the problem.
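The calculation that follows the picture can be sketched numerically. The example does not state a significance level, so α = 0.05 is an assumption here. With σ = 15 and n = 30 the standard error is 15/√30 ≈ 2.74, giving z ≈ 4.56, well beyond the upper-tailed critical value of about 1.645, so under this assumption H0 is rejected and the data support the principal's claim.

```python
from math import sqrt
from statistics import NormalDist

# Data from the IQ example; alpha = 0.05 is an ASSUMPTION (the example does not state one)
mu0, sigma = 100, 15      # population mean and standard deviation under H0
x_bar, n = 112.5, 30      # sample mean and sample size
alpha = 0.05

# One-sample z statistic: distance of the sample mean from mu0 in standard errors
z = (x_bar - mu0) / (sigma / sqrt(n))

# Upper-tailed test because H1 is mu > 100
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p = {p_value:.2g}")
print("Reject H0" if z > z_crit else "Fail to reject H0")
```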
Z-Table: Rejection Region
Decision Rule:
• The decision rule is a statement that tells us under what circumstances to reject the
null hypothesis. It is based on specific values of the test statistic.
• The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed
test is proposed. In an upper-tailed test, investigators reject H0 if the test statistic
is larger than the critical value. In a lower-tailed test, they reject H0 if the test
statistic is smaller than the critical value. In a two-tailed test, they reject H0 if the
test statistic is extreme: either larger than an upper critical value or smaller than a
lower critical value.
• The other factor is the level of significance: the chosen α dictates the critical value.
For example, in an upper-tailed Z-test with α = 0.05, the critical value is Z = 1.645.
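The three decision rules can be written as one small function. This is a sketch using the standard normal quantile function from Python's `statistics` module; the function names and the `tail` labels ("upper", "lower", "two") are hypothetical choices for illustration.

```python
from statistics import NormalDist

def z_critical_values(alpha, tail):
    """Critical value(s) of a z-test for the given significance level and tail type."""
    std_normal = NormalDist()
    if tail == "upper":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        return std_normal.inv_cdf(alpha)
    if tail == "two":
        # alpha is split equally between the two tails
        return std_normal.inv_cdf(alpha / 2), std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'upper', 'lower' or 'two'")

def reject_h0(z, alpha, tail):
    """Decision rule: reject H0 when z falls in the rejection region."""
    if tail == "upper":
        return z > z_critical_values(alpha, "upper")
    if tail == "lower":
        return z < z_critical_values(alpha, "lower")
    lower, upper = z_critical_values(alpha, "two")
    return z < lower or z > upper

print(round(z_critical_values(0.05, "upper"), 3))             # 1.645
print([round(c, 3) for c in z_critical_values(0.05, "two")])  # [-1.96, 1.96]
```

For the same α, the two-tailed critical values (±1.96) sit further out than the one-tailed 1.645 because α is split across both tails.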
Let’s say we need to determine whether girls, on average, score higher than 600 on an exam.
We know that the standard deviation of girls’ scores is 100. We collect the marks of
20 randomly sampled girls and set our α value (significance level) to 0.05.
• Mean score for the girls is 641
• Sample size is 20
• Population mean is 600
• Population standard deviation is 100
• Compute the z-score.
• Should we reject or fail to reject the null hypothesis?
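A minimal sketch of the calculation, assuming H0: μ = 600 and H1: μ > 600 (upper-tailed, because the question asks whether girls score higher than 600). The standard error is 100/√20 ≈ 22.36, so z = 41/22.36 ≈ 1.83, which exceeds the critical value 1.645; the null hypothesis is rejected at α = 0.05.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 600      # population mean under H0
sigma = 100    # known population standard deviation
x_bar = 641    # sample mean score of the girls
n = 20         # sample size
alpha = 0.05   # significance level

z = (x_bar - mu0) / (sigma / sqrt(n))      # one-sample z statistic
z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tailed critical value
reject = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```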
Z-Test: Example (Two Sample)
T-Test
• If you only care whether the two populations are different from one
another, perform a two-tailed t-test.
• If you want to know whether one population mean is greater than or
less than the other, perform a one-tailed t-test.
• Example question: your company wants to improve sales. Past sales data indicate
that the average sale was $100 per transaction. After training your sales force,
recent sales data (taken from a sample of 25 salespeople) indicate an average sale of
$130, with a standard deviation of $15. Did the training work? Test your hypothesis
at the 5% significance level (α = 0.05).
• Step 1: Write your null hypothesis statement. The accepted hypothesis is that
there is no difference in sales, so:
• H0: μ = $100
• Step 2: Write your alternate hypothesis. This is the one you’re testing in the
one-sample t-test. You think that there is a difference (that the mean sales
increased), so:
• H1: μ > $100
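As a sketch of the remaining steps, the one-sample t statistic is t = (x̄ - μ0) / (s / √n) with n - 1 degrees of freedom. Here t = 30 / (15/5) = 10 with 24 degrees of freedom. To stay within the standard library, the code hard-codes the upper-tailed critical value 1.711 for df = 24 at α = 0.05, taken from a standard t-table; if SciPy is available, `scipy.stats.t.ppf(0.95, 24)` gives the same value (about 1.711).

```python
from math import sqrt

mu0 = 100     # H0 mean sale per transaction ($)
x_bar = 130   # sample mean sale after training ($)
s = 15        # sample standard deviation ($)
n = 25        # sample size (salespeople)

t = (x_bar - mu0) / (s / sqrt(n))   # one-sample t statistic
df = n - 1                          # degrees of freedom

# Upper-tailed critical value for alpha = 0.05 and df = 24, from a standard t-table
t_crit = 1.711

print(f"t = {t:.2f} with {df} degrees of freedom; critical value = {t_crit}")
print("Reject H0: the training appears to have increased sales" if t > t_crit
      else "Fail to reject H0")
```

A t-test is used here rather than a z-test because only the sample standard deviation ($15) is known, not the population one.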
T-Test (One Sample)