Chapter 3 Part 1 The Logic of Hypothesis Testing v2

Chapter 3.
Hypothesis testing
Chapter 3 – Part 1
Begin Able to demonstrate relationships
Part I: Hypothesis Testing: fundamentals
A few remarks… Book: chapter 9, part 1
• This chapter is probably the most important one in this lecture. It’s also probably the most important one in ALL lectures
related to quantitative methods.
• Almost all undergraduate students in the world cover this topic. Being proficient in hypothesis testing is a
prerequisite in many fields (and yes, even in finance, medicine, marketing and HRM…).
• As it’s a completely new field for you, it might be challenging as you discover methods and vocabulary.
So what is it about?
The objective of hypothesis testing is to decide whether a claim, an assertion, a theory or a hypothesis about the
population can be rejected. It’s the most fundamental and common method in science when we need to draw inferences from a sample.
• Until now, we built confidence intervals to answer questions such as “can we find an interval inside which we are confident
that the parameter lies?”.
• Now we ask slightly different questions, such as:
  - “We think that our customers watch TV 2 hours per day. Is it true?”
  - “Can we be sure that this colour is more effective when it comes to selling cars?”
  - “Can we say that investing in SMB is more profitable?”
  - “Is it possible to demonstrate that this vaccine is working?”
  - “Can we conclude that he’s guilty?”
Some examples…
Medicine
This table has been used to show that the Pfizer vaccine protected against COVID.
  - The (oversimplified) question is “can we state that the vaccine protects?”
  - A sample was selected in two different nursing facilities.
  - Vaccinated and non-vaccinated persons were compared in both of them.
  - Conclusions rely completely on hypothesis testing.
The idea is to rely on facts, not on opinions. It’s good practice to provide detailed computations and raw
data so that readers can check the relevance of the conclusion themselves.
Where will you see it?…
Finance
Investment controlling evaluation has an effect on ROE! By the end of the chapter, you will be able to read this table.
Quality Control
Your company sells toys procured from a famous supplier, and you need to know whether they are defective. After
a meeting with the Marketing, CRM and Quality Control departments, you define a good batch as one with fewer than 0.5% defective
toys.
Inspection is expensive: you need to open the package, test the toy and so on. You can’t test all of them! The smaller the
sample, the less expensive it is... but the higher the risk of not detecting that there are too many defective toys.
How many toys should be selected? What is the maximum percentage of defective toys in the sample that can be accepted while
still concluding that less than 0.5% of the whole batch is defective?
Marketing / Management
You manage a fast-food restaurant. You want to determine whether the mean waiting time to place an order has changed in the past
month compared to its previous value of 3 minutes. How can you confirm or refute this assertion?
A few remarks…
• The four parts of this chapter cover:
  - Part 1: a global introduction to the topic and a fully detailed explanation of the most common case.
  - Part 2: selecting the best test for a given problem.
  - Part 3: practice on important tests.
  - Part 4: specific tests.
• Each part is detailed on IESEG-Online, each with some specific videos (that you’ll happily watch).
• You’ll find thousands (millions?) of documents and videos on this topic online. Be careful, as every single word is
important. A significant number of those documents or videos skip important parts or assume that you already know
them.
• The textbook is your best ally. Part 1 is covered in chapter 9.
Now let’s begin…
1 Fundamentals of Hypothesis Testing
1.1 Introductory example
The mean age of your customers was 30 last year. It’s an important factor for your market and you would like this value not
to change over time. Your marketing department paid a consultancy company specialized in communication to work on this
problem. You want to check whether the mean age of your customers is still 30 this year. You can’t ask every customer,
so you randomly select a sample of customers. Can you conclude that the mean age is still the same?
• There is a claim, an assertion about a parameter in the population: the consultancy company claims that µ (the mean
age in the population) is 30.
• We face two cases called hypotheses: “the mean is 30” and “the mean is not 30”. They are mutually exclusive and
opposite: they can’t both be true at the same time. They are given names:
  - H0 is called the “null hypothesis”: “the mean age in the population of customers is 30”, which means “the mean
age of my customers has not changed”.
  - H1 is called the “alternative hypothesis”: “the mean age in the population of customers is not equal to 30”.
  - Formulating hypotheses can be tricky, so we keep things simple for now. We could also have tested whether the mean
was larger than 30.
• To reject or not reject this claim, you select a sample. From this sample you obtain a statistic. In the example, it’s $\bar{X}$,
the mean age in the sample.
Caution: for clarity, some terms are not defined precisely here. They will be detailed in the next part.
• Depending on the value of this statistic $\bar{X}$, you’ll reject the claim or not.
[Figure: two sampling distributions of $\bar{X}$, one centred on µ = 30 and one centred on µ = 40]
• For the sake of clarity, we assume that the mean can only be equal to 30 or 40, but we could have compared 30 to 25, 45,
50, 53…
• If the real mean in the population is 30, then we are likely to obtain a value close to 30 in the sample. If the real value is
40, we expect to observe a value close to 40.
• If the sample mean $\bar{X}$ is close to 30, then it’s likely that the mean in the population is 30. If it’s 42, then it’s likely that the mean is 40.
• But what about a value halfway between 30 and 40? Should we conclude H0 or H1? Is it close enough to 30? As you see, the whole problem is to set up a
decision rule to choose between the two alternatives.
• The value that we select for this decision rule (called a bound or a critical value) must be chosen to control the risk
of concluding H1 when H0 is true, or concluding H0 when H1 is true.
• Let’s focus on the question: can we conclude that the mean in the population is 30?
[Figure: distribution centred on µ = 30 with the bounds (critical values); between the bounds we conclude H0]
• We define (arbitrarily in this example) the bounds to be 34 and 26. They must be selected according to a given risk
level (such as the risk of rejecting H0 when H0 is true). From this we deduce the decision rule:
  - If the observed value in the sample is above 34 or below 26, then we REJECT H0. We conclude that the mean age is
not 30. We can say that “the observed value in the sample is so far from 30 that it’s really unlikely that the mean
value in the population is 30”. Notice that maybe the real value is 30 but you’ve been unlucky when you selected
the sample.
  - If the observed value in the sample is between 26 and 34, then we FAIL TO REJECT H0. Does that mean the mean
age in the population is exactly 30? NO!!!! Maybe the mean age in the population is 31.
• Notice that we’ve NEVER said “H0 is true” or “H1 is true”. We should just say “given the observed values, we can reject H0”
or “we fail to reject H0”. Why? Because we are working with samples, and we might have selected a faulty one!
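The decision rule above can be written as a tiny function. This is a minimal sketch using the arbitrary bounds 26 and 34 from the example; the function name `decide` is hypothetical. Note that it can only answer “reject H0” or “fail to reject H0”, never “H0 is true”.

```python
def decide(sample_mean: float, lower: float = 26.0, upper: float = 34.0) -> str:
    """Two-sided decision rule with (arbitrary) bounds.

    Outside [lower, upper] -> reject H0; inside -> fail to reject H0.
    The rule never concludes that H0 is true.
    """
    if sample_mean < lower or sample_mean > upper:
        return "reject H0"
    return "fail to reject H0"


for x_bar in (25.0, 31.0, 34.5):
    print(x_bar, "->", decide(x_bar))
```

A value such as 31 leads to “fail to reject H0”, which, as explained above, does not prove that the mean is exactly 30.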
Many books summarize the whole process with the following steps:
1. Set up hypotheses H0 and H1.

2. Understand what you want to work on. A mean? A proportion? Compare one or many groups? Compare qualitative
variables?
3. Deduce which test might be applicable to your case (and there are many!).
4. Check that assumptions to use this test are applicable (or find a backup solution).
5. Decide on the significance level α…. Basically, select how confident you want to be in the conclusion.
6. Deduce the critical value (the bound) and compare it to the observed value from your sample.
7. Make a decision: do you reject H0 or not?
8. State a global conclusion.
Let's cover them one by one in the next slides!

1.2 Null and alternative hypothesis
1.2.1 Vocabulary
• The null hypothesis is what we assume to be true. It’s also called the status quo, the given situation or the state of
nature. It’s what we “expect to see if nothing specific happens”.
  - We constantly assume things that we don’t even test. We expect to find something written in a book. We expect to
be able to drink coffee in a bar… When we don’t test them, we accept them as facts.
  - Justice: we assume that you’re innocent unless proven guilty → H0 is “innocent”.
  - Pregnancy: we assume that a woman is not pregnant, as the common “state of nature” is not to be pregnant → H0
is “not pregnant”.
  - Vaccine: it’s more prudent to assume that a new vaccine is not effective. There is no good reason to assume, ex
ante, that a vaccine is effective → H0 is “the vaccine has no effect”.
  - Marketing: we can ask French and Spanish consumers to test a new smartphone. The conservative assumption is that they are
willing to spend the same amount for this product → H0 is “the mean amount that French and Spanish consumers are willing to
spend is the same”.
  - Finance: we can try to predict the price of Amazon on the stock exchange and test whether past sales have an effect on
valuation → H0 is “there is no correlation between the current price and past sales”.
• The alternative hypothesis is… the opposite. We make an unexpected assertion, a claim about something we are
not sure of. We can’t assume it to be true, but it might be true.
• If we don’t observe a result compatible with the alternative hypothesis, we maintain the null hypothesis. To be more
accurate: “if we observe something FAR from H0, then we reject H0”.
  - Justice: you’re guilty if we have sufficient evidence to reject the null hypothesis.
  - Pregnancy: H1 is “pregnant”. We conclude pregnancy if some observations just can’t happen without pregnancy. Ex:
giving birth.
  - Vaccine: H1 is “the vaccine is effective”. We conclude that the vaccine has an effect if what has been observed can’t
happen if it has no effect. Ex: among those with the vaccine, 1% die; among those without the vaccine, 20% die. The
gap between the two percentages is so large that we can reject H0. Notice that we haven’t said “the vaccine is
perfect”, just that we can reject its “non-effect”.
  - Marketing: H1 is “the mean amount that French and Spanish consumers are willing to spend is not the same”. If, for instance,
the French are willing to spend 200€ and the Spanish 600€, the gap between the two is so large that we can be confident
that H0 is not valid.
  - Finance: H1 is “there is a correlation between the current price and past sales”… but it does not tell us whether the
correlation is positive or negative.
| The null hypothesis H0 | The alternative hypothesis H1 or HA |
|---|---|
| It’s the assumption accepted as true: what we currently believe or assume. We can call it “state of nature”, “status quo” or “historical value”. If we can’t demonstrate that it’s wrong, we keep this statement. | It’s the opposite of the null hypothesis: a claim. It’s not something that we can assume to be true. |
| Most of the time, it’s a claim that past experiments have not been able to reject, or “a claim that does not ask us to believe that something really unexpected happened”. | It challenges H0. |
| The equality “=” (or “≥”, “≤”) is always in H0 (if we work on quantitative variables). | It never contains “=”, “≥” or “≤”. |
| The null hypothesis is rejected when there is sufficient evidence from the sample data that it’s false. | Many times, it’s the research question, or “what the analyst wants to prove”. |
| If we don’t reject the null hypothesis, we keep believing it. We have not proven that it is true. We only say “there is insufficient evidence to warrant the rejection of H0”. | If we reject H0, then we don’t reject H1 (obviously!). It does not mean that H1 is correct but that, given your data, H1 is more believable. |

We say that we reject or fail to reject the null hypothesis. We don’t say that we “accept” the null hypothesis. We don’t say that a hypothesis is
“true”. We can only say “the data lead us to believe in this conclusion” or “we decide this hypothesis”.
1.2.2 Examples
The jam has disappeared. The kitchen was empty, save for a 4-year-old girl. According to her, there is a strong possibility that a
suspect called Mister Bunny stole the jam.
Step 1: the claim. Our initial assumption is that the defendant (Mister Bunny) is innocent. The null hypothesis (H0) and the alternative
hypothesis (H1) are:
H0: Defendant is innocent, as “the defendant is innocent until proven guilty”, so H0 is our initial assumption.
H1: Defendant is guilty.
Step 2: collecting data
We collect data, called here “evidence”: there is no jam on Mister Bunny, he does not like jam and, as some of you might have noticed, he’s
a puppet. In statistics, data are the evidence.
Step 3: decision based on data
If the jury finds insufficient evidence, it does not reject the null hypothesis: we think that Mister Bunny is innocent (i.e. somebody else is
guilty). We can say that we “fail to reject the null hypothesis”. If the jury finds sufficient evidence, it does reject the null hypothesis: we
think that Mister Bunny is guilty.
Do we know if he’s really guilty or really innocent? No! And we’ll never know! We don’t “prove” or “demonstrate” that somebody is
innocent or guilty. We can just say that the evidence makes us believe in guilt or not. As we are using evidence (meaning collected data), we
will never be sure. We can be highly confident in the outcome, but we can’t be sure (maybe, even if it’s highly unlikely, Mister Bunny really
likes jam). Rejecting H0 does not mean that H1 is true. Failing to reject H0 does not mean that H0 is true.
• A company sells electric cars. They state on their website that the driving range (autonomy) is 400 km. You can assume that the range is
indeed 400 km. It’s your null hypothesis if you assume the company to be right:
H0: range = 400 km
H1: range ≠ 400 km

• Last year, average sales were €46 per customer. You expect them to increase each year. The null hypothesis is the status
quo. The claim is “can we say that the amount has increased?”. It gives us a hint: in the alternative hypothesis, we wonder whether
the value is larger than €46.
H0: amount ≤ 46
H1: amount > 46

• You wonder whether there is a link between origin and preferred colour. You select two groups (French and non-French) and ask
them which colour they prefer.
H0: no link between origin and colour (independence)
H1: link between origin and colour (dependence)
1.2.3 Some common hypotheses
• For two qualitative variables the most common case is:
H0 variables are independent
H1 variables are dependent
• For means and proportions, when we work on one sample, the three most common cases are:
| | Left-tailed test | Two-tailed test | Right-tailed test |
|---|---|---|---|
| H0 | value in pop ≥ expected value | value in pop = expected value | value in pop ≤ expected value |
| H1 | value in pop < expected value | value in pop ≠ expected value | value in pop > expected value |
| Name | left-tailed, as H1 is on the left | two-tailed, as H1 is on both sides | right-tailed, as H1 is on the right |
• The expected mean is commonly called µ0 and the expected proportion is called p0.
1.2.4 Practical case
Let’s come back to the wedding. Somebody told you “most of the time, guests are, on average, 30”. Can you check the
validity of this assumption?
Step 1: define hypotheses
The research question is to know if the mean age of guests at the wedding is equal to 30 or not. The first step is to create
the two hypotheses:
H0: $\mu = 30$
H1: $\mu \neq 30$, where the expected value is $\mu_0 = 30$
Step 2: understand what you want to work on

• We need a single sample taken from all guests. We don’t want to compare two subgroups taken from the population.
• We want to test the mean (a quantitative variable). We need a test designed to study the mean.
• We need the sample size n and the sample mean $\bar{X}$.
Step 3: select the relevant test

Let’s call, for the moment, the test that we need “Two tailed test on the sample mean”.
Step 4: check assumptions: we consider that they hold (well, we just need a large sample*).
1.3 Type I and Type II errors
1.3.1 Why do we need to consider errors?
• As we’re working with a sample, we’re never 100% sure that the conclusion will be the right one. Even if it’s unlikely, we
might be unlucky and select a sample that does not behave like the population. At the wedding, we might select the
children’s table.
• A good example to understand the concept is the “pregnancy case”. Your company produces pregnancy tests. The
objective is to detect pregnancy (!) for Anna.
  - Anna is pregnant or not. We can’t say “half pregnant” nor “40% pregnant”.
  - We define the status quo, H0, as “Anna is not pregnant”. The alternative hypothesis H1 is “Anna is pregnant”.
• The test, based on the presence of a given level of hormones, can conclude:
  - “pregnancy is not detected” or “Anna is not pregnant”. We do not reject H0, or “the quantity of hormones is such
that we can’t reject non-pregnancy”.
  - “pregnancy is detected” or “Anna is pregnant”. We reject H0 and conclude H1, or “the quantity of hormones is so
high that we reject non-pregnancy”.
But the test works on hormones… and biology is such that in some cases a pregnant woman might have a low
quantity of hormones, while in other cases a high level can be observed in a non-pregnant woman.
The two desired cases are:
  - the pregnancy test detects non-pregnancy when Anna is not pregnant: we conclude H0 when H0 is true.
  - the pregnancy test detects pregnancy when Anna is pregnant: we conclude H1 when H1 is true.

| Decision making (test conclusion) | Actual situation: H0 true (Anna is not pregnant) | Actual situation: H1 true / H0 false (Anna is pregnant) |
|---|---|---|
| Do not reject H0 / decide H0 (decides not pregnant) | CORRECT | |
1.3.2 Definitions
• If Anna is not pregnant but the test detects pregnancy, it’s a false positive, also called a Type I error or “false alarm”.
• The error is “we conclude H1 while the truth is H0”.
• The risk that this happens, i.e. the probability of concluding H1 when the truth is H0 (the false positive risk), is called α.
P(conclude H1 | H0 true) = α
[Figure: sampling distribution centred on µ = 30; the total shaded tail area is α, the “probability of observing a value in H1 while H0 is true”, i.e. P(conclude H1 | H0 true) = α]
• If Anna is pregnant but the test fails to detect pregnancy, it’s a false negative, also called a Type II error.
• The error is “we conclude H0 while the truth is H1”.
• The risk that this happens, or “the probability of concluding H0 when the truth is H1”, is called β.
P(conclude H0 | H1 true) = β
• In many real-world examples, a Type II error leads to much more severe consequences. Here, we fail to detect
pregnancy.
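| Decision making (test conclusion) | Actual situation: H0 true (Anna is not pregnant) | Actual situation: H1 true / H0 false (Anna is pregnant) |
|---|---|---|
| Do not reject H0 / decide H0 (decides not pregnant) | CORRECT | Type II error / false negative, probability = β |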
• There is a link between the false positive risk and the false negative risk. If we change the level of hormones needed to
detect pregnancy, we can increase the probability of detecting pregnancy… but the risk of a false alarm will increase too.
• The probability of a false positive, α, is called the significance level. A very common value is 5%.
• The probability of a true positive, 1 − β, is called the power of the test. Common values for β are 10% to 20%, i.e. a power of 80% to 90%.
• Right now, we learn how to care about α or Type I error. Power is NOT going to be covered this semester*.
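| Decision making (pregnancy test conclusion) | Actual situation: H0 true (Anna is not pregnant) | Actual situation: H1 true / H0 false (Anna is pregnant) |
|---|---|---|
| Do not reject H0 / decide H0 (decides not pregnant) | CORRECT, probability = 1 − α | Type II error / false negative, probability = β |
| Reject H0 / decide H1 (decides pregnant) | Type I error / false positive, probability = α | CORRECT, probability = 1 − β (power) |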
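The meaning of α can be checked by simulation: draw many samples from a population where H0 is true and count how often the decision rule rejects H0. This is a minimal sketch, assuming the wedding numbers used in this chapter (µ0 = 30, σ = 20, n = 100) and, for simplicity, normally distributed ages; the helper name `type_i_error_rate` is hypothetical. The share of (false) rejections should come out close to α = 5%.

```python
import random
import statistics

MU0, SIGMA, N = 30.0, 20.0, 100                    # H0 mean, population sd, sample size
SE = SIGMA / N ** 0.5                              # sd of the sample mean: 20 / 10 = 2
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)    # about 1.96 for alpha = 5%
LB, UB = MU0 - Z_CRIT * SE, MU0 + Z_CRIT * SE      # bounds around 30


def type_i_error_rate(n_experiments: int = 10_000, seed: int = 1) -> float:
    """Share of experiments in which H0 is rejected although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # Sample from a population where H0 is TRUE (the mean really is 30).
        sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
        x_bar = statistics.fmean(sample)
        if x_bar < LB or x_bar > UB:   # outside the bounds -> reject H0
            rejections += 1            # this rejection is a false positive
    return rejections / n_experiments


print(f"Estimated alpha: {type_i_error_rate():.3f}")   # should be close to 0.05
```

Lowering α (for instance moving Z_CRIT further out) reduces false positives, but, as with the hormone threshold, it makes the test miss more real effects.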
1.3.3 Practical case
[Figure: bounds (critical values) around the expected value; each tail has area α/2 = 2.5% (total area α); between the bounds we conclude H0]
Let’s come back to the wedding. Somebody told you “guests are, on average, 30”. Can you check the validity of this
assumption?
Step 5: define the significance level α
• Given the research question, there is no specific reason to require a very low risk of Type I error: it’s not a highly critical topic.
• Selecting α=5% for the level of significance is a common choice.
• If we observe a value INSIDE the bounds, we cannot reject H0. If we observe a value outside, then we reject H0, accepting a
5% risk of a false positive.
1.4 The complete example
• We explain the whole logic from the beginning to the end in the following slides. You do NOT have to learn those steps
as at the end we show you how to master a much more efficient method.
• All terms are still explained, as many other tools rely on the same definitions and the same logic.
Warning
You are not allowed not to understand the logic. This is the
most important part of the most important chapter…
Step 6: obtain the critical value and compare it to the observed value
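• If the mean age at the wedding is really 30, then the distribution of the sample mean is (approximately, for a large sample) normal and centred on 30.
• The dispersion of ages in the population is σ. Let’s say that it’s equal to 20.
• Let’s say that the sample size is n = 100.
• We know that for a sample of size n, the dispersion of the sample mean is $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 20/\sqrt{100} = 2$.
• µ0 is the expected value for the population mean: µ0 = 30.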
The test is:
H0: $\mu = 30$
H1: $\mu \neq 30$
• If we observe a sample mean close to 30, we do not reject H0.
• If we observe a sample mean far from 30, we reject H0. Far might be above or below.
[Figure: sampling distribution of $\bar{X}$ centred on µ0 = 30]
Let’s assume that the observed value in the sample is $\bar{X} = 25$. Is it close to 30 or not?
[Figure: observed value $\bar{X} = 25$ compared to µ0 = 30]
• To know whether it’s close or not, we need bounds. Inside them we cannot reject H0, and outside them we reject H0.
• In the figure, the observed value is drawn outside the bounds, but it’s just an example!!!
• But the problem is to know WHERE the bounds are.
[Figure: lower bound LB and upper bound UB around µ0 = 30; H1 lies outside the bounds and H0 between them. Where exactly are the bounds?]
• But we know that the risk of a false positive is 5%.
• So we know where the bounds are: they are such that the area outside them is 5%, i.e. α/2 = 2.5% in each tail.
[Figure: LB and UB around µ0 = 30, with dispersion $\sigma_{\bar{X}}$; area α/2 = 2.5% beyond each bound]
• But there is a major problem: there is no statistical table that gives us the bounds for a normal curve whose mean is 30
and whose standard deviation is 2…
• We need to STANDARDIZE.
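Software does not need tables, so the bounds can also be computed directly on the normal curve with mean 30 and standard deviation 2; standardizing gives the same bounds. This is a minimal sketch using Python's standard-library `statistics.NormalDist` (an assumption of this illustration; in the course, the value is read from a table after standardizing).

```python
from statistics import NormalDist

mu0, sigma, n, alpha = 30.0, 20.0, 100, 0.05
se = sigma / n ** 0.5                       # 20 / sqrt(100) = 2

# Option 1: work directly on N(30, 2); no table needed with software.
sampling_dist = NormalDist(mu=mu0, sigma=se)
lb = sampling_dist.inv_cdf(alpha / 2)       # area alpha/2 to the left of LB
ub = sampling_dist.inv_cdf(1 - alpha / 2)   # area alpha/2 to the right of UB

# Option 2: standardize, read z on N(0, 1), then convert back.
z = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96
lb_std, ub_std = mu0 - z * se, mu0 + z * se

print(f"Direct:       LB = {lb:.2f}, UB = {ub:.2f}")
print(f"Standardized: LB = {lb_std:.2f}, UB = {ub_std:.2f}")
```

Both routes give LB ≈ 26.08 and UB ≈ 33.92; standardizing only changes the scale on which the same problem is solved.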
• To standardize (which means transforming the problem so that we work on the standardized normal
distribution, whose mean is 0 and standard deviation 1), we first shift all terms by 30, i.e. subtract µ0:
$\bar{X} - \mu_0 = 25 - 30 = -5$
[Figure: after the shift, the distribution is centred on 0 instead of µ0 = 30; H1 lies beyond the shifted bounds LB and UB, H0 between them]
• Then, as the new dispersion has to be equal to 1, we divide all terms by the dispersion $\sigma_{\bar{X}} = \sigma/\sqrt{n}$:
$Z_{STAT} = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{25 - 30}{20/\sqrt{100}} = -2.5$
[Figure: Z distribution centred on 0; H1 in both tails, H0 between the standardized bounds]
This new value is the standardized value. Let’s give it a name. The standardized normal distribution is named Z and, as the
value is calculated from sample data (statistics), its name is ZSTAT.
• The same logic is repeated with the bounds LB and UB, which become $\dfrac{LB - \mu_0}{\sigma_{\bar{X}}}$ and $\dfrac{UB - \mu_0}{\sigma_{\bar{X}}}$.
[Figure: Z distribution with ZSTAT (from $\bar{X} = 25$) and the standardized bounds]
We all agree that such names are quite unpleasant. As these values tell us whether we need to reject H0 or
not, they are important, so let’s call them the critical values.
• Let’s change the names of the critical values…
[Figure: Z distribution with ZSTAT and the critical values −Zα/2 (standardized LB) and +Zα/2 (standardized UB); area α/2 = 2.5% in each tail]
We also notice that the green areas are still the same.
As each green area is equal to α/2, the right critical value is called +Zα/2.
As the Z distribution is symmetric, the left critical value is just −Zα/2.
• We are almost done!
• Technically, we changed where the problem is expressed, but not what the problem is.
[Figure: Z distribution with ZSTAT and the critical values ±Zα/2]
(Do not reject H0 ) ⇔ (observed value is between LB and UB) ⇔ (-Zα/2⩽ZSTAT⩽ Zα/2)
(Reject H0 ) ⇔ (observed value is NOT between LB and UB) ⇔ (ZSTAT< -Zα/2 or Zα/2 <ZSTAT)
• ZSTAT is obtained from the collected data: $Z_{STAT} = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{25 - 30}{2} = -2.5$.
Step 6: Zα/2 is obtained by reading a Normal distribution table (not needed this semester).
You already know the value: Zα/2 = 1.96.
Step 7: compare the observed value and the critical value
ZSTAT = −2.5 < −1.96 = −Zα/2 → the observed value is “far” from what we should have observed if the mean had been 30.
[Figure: ZSTAT = −2.5 lies to the left of −Zα/2, in the H1 region]
Step 8: state a global conclusion

At a 5% level of significance, we REJECT the null hypothesis and conclude that the mean age is NOT 30.
At a 5% level of significance, we have enough statistical evidence to reject the null hypothesis.
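Steps 6 to 8 can be sketched as a small function. This is a minimal sketch, assuming σ is known (as in this example) and using Python's standard-library `statistics.NormalDist` to obtain Zα/2 instead of reading a table; the function name `z_test_two_tailed` is hypothetical.

```python
from statistics import NormalDist


def z_test_two_tailed(x_bar: float, mu0: float, sigma: float, n: int, alpha: float = 0.05):
    """Two-tailed z test on a mean when the population sd sigma is known.

    Returns (z_stat, z_crit, decision).
    """
    z_stat = (x_bar - mu0) / (sigma / n ** 0.5)    # standardize the observed mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # +Z_{alpha/2}, about 1.96
    if -z_crit <= z_stat <= z_crit:
        decision = "do not reject H0"
    else:
        decision = "reject H0"
    return z_stat, z_crit, decision


z_stat, z_crit, decision = z_test_two_tailed(x_bar=25, mu0=30, sigma=20, n=100)
print(f"Z_STAT = {z_stat:.2f}, Z_crit = {z_crit:.2f} -> {decision}")
```

With the wedding numbers, ZSTAT = −2.5 falls below −1.96, so the function returns “reject H0”, matching Step 7.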
1.5 Some training….
• Just to see if you understood the logic, let’s repeat the steps on another problem:
You want to check the mean weight of a laptop: a supplier told you that the mean weight is 2 kg. In your sample
(size n = 40), the sample mean weight is 2.3 kg. The standard deviation in the population is equal to 5. Can we confirm what the
supplier told you? You will work at a 5% risk.
…. But something is missing….. Have you noticed what?

1.6 Sample and Standard deviation: Student’s distribution.
• Exactly as for confidence intervals, we don’t really know the value of σ, the standard deviation in the population. It’s
estimated by s, the standard deviation in the sample. The “margin of error” increases because uncertainty increases.
• To account for this added uncertainty, we need Student’s distribution, but the logic is the same:
[Figure: Student distribution T centred on 0, with TSTAT (from $\bar{X} = 25$) and the critical values −Tα/2 and +Tα/2]
(Do not reject H0 ) ⇔ (observed value is between LB and UB) ⇔ (-Tα/2⩽TSTAT⩽ Tα/2)
(Reject H0) ⇔ (observed value is NOT between LB and UB) ⇔ (TSTAT < −Tα/2 or Tα/2 < TSTAT)
• TSTAT is obtained from the collected data: $T_{STAT} = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} = \dfrac{25 - 30}{20/\sqrt{100}} = -2.5$ in the wedding problem (with s = 20).
• Tα/2 is obtained by reading a Student distribution table… and now we have a tiny problem…
• Tα/2 is obtained by reading a Student distribution table… an extract is displayed on the right.
• We always read the column 0.05 (the level of significance).
• There is an infinite number of Student tables, one for each
sample size. The larger the sample, the more the Student
table looks like the Normal table (look at the bottom of the
column: you’ll see the 1.96 of the normal table).
• Instead of the sample size, the table uses ν (the Greek letter
nu), the number of degrees of freedom.
• The number of degrees of freedom is a very, very tricky topic (see notes).
• In S3 it’s just defined as “the quantity of information
available to increase the accuracy of the analysis above the
minimum quantity of information needed to obtain a result”
or “the quantity of information that can freely change”.
• Here it’s n − 1 = 100 − 1 = 99 → read the row just above.
Step 6: Tα/2 is equal to 2.00

Step 7: As TSTAT = -2.5 < -2 = - Tα/2 we reject H0.
Step 8: the conclusion is “at a 5% risk level, we conclude that
the mean age at the wedding is not equal to 30”.
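Because Python's standard library has no Student distribution, the sketch below approximates the t CDF by numerically integrating the t density (Simpson's rule) and finds Tα/2 by bisection. This is an illustrative assumption, not how SPSS computes it, and the helper names (`t_pdf`, `t_cdf`, `t_critical`, `one_sample_t_test`) are hypothetical. With ν = 99 the critical value is about 1.98, slightly below the 2.00 read from the row just above in the table; the decision is the same.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids the overflow math.gamma would hit for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus the symmetry of T."""
    a = abs(x)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha: float, df: int) -> float:
    """Two-tailed critical value T_{alpha/2}, found by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid                    # critical value is above mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t_test(x_bar: float, mu0: float, s: float, n: int, alpha: float = 0.05):
    """Two-tailed one-sample T test from summary statistics."""
    t_stat = (x_bar - mu0) / (s / math.sqrt(n))
    t_crit = t_critical(alpha, n - 1)   # degrees of freedom = n - 1
    decision = "reject H0" if abs(t_stat) > t_crit else "do not reject H0"
    return t_stat, t_crit, decision


t_stat, t_crit, decision = one_sample_t_test(x_bar=25, mu0=30, s=20, n=100)
print(f"T_STAT = {t_stat:.2f}, T_crit = {t_crit:.3f} (df = 99) -> {decision}")
```

Calling `t_critical(0.05, df)` with a very large df gets close to 1.96, which is the convergence to the Normal table mentioned above.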
• The test that we’ve just explained is called the “One Sample
T test” (test de Student pour un échantillon).
• It’s the most famous one in the world as all lectures begin
with this one.
Good news: during exams you will not have to repeat all those
steps. SPSS will always calculate TSTAT for you.
Let’s show you a real example on SPSS!
1.7 Practical example with SPSS
• Open the file House.SAV (file / open file / House.SAV)

• Question: can we say that the mean price of houses in this city
is equal to 300.000€?
Step 1: define hypotheses

H0: $\mu = 300$ (in thousands of €)
H1: $\mu \neq 300$
Step 2: understand what you want to work on
• We need a single sample taken from all houses in the city.
• We want to test the mean (a quantitative variable).
• We need the sample size n and the sample mean $\bar{X}$.
Step 3: select the relevant test
We need the two-tailed “One Sample T test” (well, for the moment you don’t know any other one!)
Step 4: check assumptions: we consider that they hold (we just need the sample size to be above 30)
Step 5: select the level of significance. Let’s keep 5%
• Select Analyze / Compare Means / One-Sample T Test
• Select the variable “value”
• Select 300 for the test value

[Figure: annotated SPSS output showing the sample size n, the sample mean, the sample standard deviation s, the standard error of the mean, TSTAT, the number of degrees of freedom n − 1, and the difference between the observed sample mean (398) and the expected value 300]
Step 6: to obtain the critical value, we need to read the T table at row 86 (ν = n − 1 = 86).
TCRIT= 2.00
Step 7: compare TSTAT=7.670 and TCRIT= 2.00
Step 8: conclude.
At a 5% risk, we reject the null hypothesis: “we cannot say that the average price of houses in the city is equal to 300,
at a 5% level of significance”.
2 The P-value: the core of inferential statistics
2.1 Why another new method?
• As you’ve seen, you had to fetch a statistical table, to identify the row and column and to read it. Isn’t it a little bit
outdated? What about a tool automatically computed, the same for all tests and easy to read?
• This tool is called the p-value (or probability value).
• The global list of steps remains the same, save for step 6:
1. Set up hypotheses H0 and H1.

2. Understand what you want to work on. A mean? A proportion? Compare one or many groups? Compare
qualitative variables?
3. Deduce which test might be applicable to your case (and there are many!).
4. Check that assumptions to use this test are applicable (or find a backup solution).
5. Decide on the significance level α…. Basically, select how confident you want to be in the conclusion.
6. Read the p-value and compare it to the significance level α
7. Make a decision: do you reject H0 or not?
8. State a global conclusion.
2.2 Definition of the p-value
• We give you three definitions of the p-value: a geometric definition, a statistical definition and a practical definition.
They are, of course, equivalent.
• Let’s rely on a practical example to explain:
You want to know whether the mean age at the wedding is LESS than 30.
The sample size is n = 100, the sample mean is 25 and the sample standard deviation is s = 20. As before, $s/\sqrt{n} = 20/\sqrt{100} = 2$.
[Figure: left-tailed test with µ0 = 30; a single bound leaves an area α = 5% on the left (H1) side]

Step 1: define the test
H0: $\mu \ge 30$
H1: $\mu < 30$
This is a one-tailed test or left-tailed test as H1 is on the left.
Step 2: we work on the mean of one sample.
Step 3: the relevant test is called the “One Sided T-Test”
Step 4: assumptions are assumed to hold. Basically, the sample size has to be large enough (n>30)
Step 5: the level of significance is selected α=5%
2.2.1 P-value and geometry
[Figure: left-tailed test with µ0 = 30 and observed $\bar{X} = 25$; the green area α = 5% lies beyond the bound, the red p-value area lies beyond the observed value]
• The p-value is an area (a surface). It’s the area “further than the observed value”.
• This area is also a probability, so the p-value is always between 0 and 1.
• Notice that when the p-value is tiny, the observed value is far on the left, “deep” inside H1.
  → the smaller the p-value, the larger the distance between the expected value and the observed value.
  → the smaller the p-value, the more we tend to reject the null hypothesis. We are very confident that H0 is
not acceptable.
2.2.2 Interpretation
• The p-value is the probability, if H0 is true, that another random sample will give you a sample mean “further
than” the observed value 25.
• It’s also described as “the probability that another sample will produce an equal or more extreme value than the
observed one if H0 is true”.
The p-value is the probability that the test statistic equals or is more extreme than the observed value, under the
assumption that the null hypothesis is true. The p-value is sometimes interpreted (quite incorrectly) as the probability
that H0 is true.
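For the left-tailed example, the p-value is the area to the left of TSTAT under a Student distribution with ν = n − 1 = 99. This is a minimal sketch, assuming the same hypothetical numerical-integration helpers as for the critical value (standard-library Python only, not SPSS's algorithm); the result should come out close to the 0.0070 given at the end of the example.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus the symmetry of T."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


# Left-tailed test: H0: mu >= 30, H1: mu < 30
x_bar, mu0, s, n = 25.0, 30.0, 20.0, 100
alpha = 0.05
t_stat = (x_bar - mu0) / (s / math.sqrt(n))   # -2.5
p_value = t_cdf(t_stat, n - 1)                # area to the LEFT of t_stat (the H1 side)
print(f"T_STAT = {t_stat:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

For a right-tailed test the area would be taken on the right (`1 - t_cdf(t_stat, df)`), always on the side where H1 lies.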
• If the p-value is tiny, observing an even more extreme value than ours in another sample is very unlikely if
H0 is true.
• It means that the observed value is itself very unlikely if H0 is true. Two potential explanations. We observed a value
far from the expected value as…
 H0 is not true! The sample behaves like the population. Maybe the real population mean is 24, so it’s quite
logical to observe 25.
 H0 is true! By bad luck, we selected a faulty sample (e.g. the table with children at the wedding), which does not behave like the population. But randomly selecting such a faulty sample is unlikely*. It might happen, but the likelihood is small… so we don’t believe in this explanation.
• Conclusion: if the p-value is tiny, we observe something that should rarely happen if H0 is true. Thus we don’t trust H0 and reject it.
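The two explanations can be compared numerically. Assuming, only for illustration, a normal population with σ = 12 and samples of n = 40 (values not taken from the original example), the sketch computes how probable a sample mean of 25 or less is if the population mean is 30 (H0 true) versus 24 (the “H0 is not true” scenario mentioned above).

```python
from statistics import NormalDist

SIGMA, N, OBSERVED = 12, 40, 25          # hypothetical values for illustration
standard_error = SIGMA / N ** 0.5        # spread of the sample mean between samples

# 30: H0 is true; 24: H0 is false and the true mean is 24.
probs = {mu: NormalDist(mu, standard_error).cdf(OBSERVED) for mu in (30, 24)}
for mu, prob in probs.items():
    print(f"true mean {mu}: P(sample mean <= {OBSERVED}) = {prob:.4f}")
```

A sample mean of 25 is ordinary if the true mean is 24 but rare if it is 30; this is why a tiny p-value makes us doubt H0 rather than blame a faulty sample.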
• Now the question becomes: what is “tiny”?
[Figure: left-tailed test with µ0 = 30; the bound leaves a left-tail area α = 5% (green); the p-value is the red area beyond the observed value.]
• When the observed value is to the left of the bound, we conclude H1.
• This is the same as saying that the p-value (red area) is smaller than the green area, which is equal to α.
The decision rule is: if p-value ≥ α, then do not reject H0;
if p-value < α, then reject H0.
 If p-value ≥ α, we "fail to reject the null hypothesis" and conclude that there is not enough evidence of a difference in the population. This does not mean that H0 is true! It just means that we do not have sufficient evidence to say that it is likely false. These results are not statistically significant.
 If p-value < α, we "reject the null hypothesis" and conclude that there is a difference in the population. These results are statistically significant.
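The decision rule fits in a one-line function. To check the equivalence described above (observed value beyond the bound if and only if p-value < α), the sketch uses a left-tailed z-test, i.e. the normal approximation to the Student distribution for large samples; this is a simplification, and `decide` is a hypothetical helper name.

```python
from statistics import NormalDist

def decide(p_value, alpha=0.05):
    """Decision rule: reject H0 only when the p-value is smaller than alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

alpha = 0.05
std_normal = NormalDist()                  # normal approximation (large n)
bound = std_normal.inv_cdf(alpha)          # left-tailed critical value, about -1.645

for z_obs in (-2.5, -1.7, -1.0, 0.5):
    p_value = std_normal.cdf(z_obs)        # left-tail area beyond the observed value
    beyond_bound = z_obs < bound           # observed value on the H1 side of the bound
    # Comparing with the bound and comparing p-value with alpha give the same decision.
    assert beyond_bound == (p_value < alpha)
    print(f"z = {z_obs:+.1f}, p-value = {p_value:.4f} -> {decide(p_value, alpha)}")

print(decide(0.0070))                      # p-value of the wedding example
```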
2.2.3 End of the example
[Figure: left-tailed test with µ0 = 30 and α = 5%; the p-value is the area beyond the observed value.]
Step 6: calculate the p-value
In this example, the p-value is 0.0070 (SPSS computes it for you), which is lower than 5%, the level of significance.
Step 7: conclude
As the p-value is less than 5%, we reject H0.
Step 8: state a global conclusion
At a 5% risk level, we can conclude that the mean age is significantly below 30.
In a research article, we might read “The mean age at the wedding is less than 30 (p-value=0.007)”
2.3 Importance of the p-value
• You might not have noticed it, but the p-value plays a central role in research articles.
Example 1*: the p-value is routinely reported in almost all research articles.
This article studied factors influencing the use of Netflix.
Example 2: demonstrate that a vaccine has an effect
Example 3: why do you think that research does not rely on a single study?
A small p-value means that we are confident in rejecting H0. But this result might have been obtained by chance, as the sample might not behave like the whole population, so some studies that conflict with the facts are always to be expected. That is why research relies on meta-analysis, an “analysis of analyses”: the risk that many samples are all skewed in the same way is limited.
Science relies on reproducibility.
A single study might be defective. A single study is not a strong proof.
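Why a single study is not a strong proof can be illustrated by simulation: even when H0 is true, about α = 5% of studies reject it by chance (a Type I error). The sketch assumes a normal population with known σ, so it uses a two-tailed z-test instead of the t-test, and the values of µ0, σ and n are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

random.seed(7)
MU0, SIGMA, N, ALPHA = 30, 12, 30, 0.05   # hypothetical values; H0 is TRUE here
STUDIES = 2_000
std_normal = NormalDist()

def one_study():
    """Run one two-tailed z-test on a fresh sample; return True if H0 is rejected."""
    sample_mean = statistics.fmean(random.gauss(MU0, SIGMA) for _ in range(N))
    z = (sample_mean - MU0) / (SIGMA / N ** 0.5)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return p_value < ALPHA

false_rejections = sum(one_study() for _ in range(STUDIES))
rate = false_rejections / STUDIES
print(f"{false_rejections} of {STUDIES} studies rejected a true H0 ({rate:.1%})")
```

Roughly one study in twenty reaches a “significant” result although nothing is going on, which is why results are trusted once they are reproduced.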
Example 4: what do we say when we talk to somebody without any knowledge of the p-value?
 A tradition is to use the word “significance” (in French, “la significativité”) to mean “probably not due to chance”. It works like a code:
p-value larger than 10% → the result is not significant
p-value between 5% and 10% → the result is not very significant
p-value between 1% and 5% → the result is significant
p-value between 0.1% and 1% → the result is very significant
p-value less than 0.1% → the result is highly significant
The table is taken from an article that studies product placement in movies*. Each time, the question is “do countries differ in their views on product placement?”. Notice the “Sig” and the “significant”. The answer is “yes” each time… so it is very important for marketing teams to take this into account, as a movie is released in all countries.
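The verbal significance code can be written as a small lookup. The code does not say which label applies exactly at a boundary (0.1%, 1%, 5%, 10%); this sketch assumes each boundary belongs to the less significant category, and `significance_label` is a hypothetical name.

```python
def significance_label(p_value):
    """Map a p-value to the verbal code (a boundary value gets the weaker label)."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value is a probability between 0 and 1")
    if p_value < 0.001:
        return "highly significant"
    if p_value < 0.01:
        return "very significant"
    if p_value < 0.05:
        return "significant"
    if p_value < 0.10:
        return "not very significant"
    return "not significant"

for p in (0.0005, 0.007, 0.03, 0.08, 0.25):
    print(p, significance_label(p))
```

With this code, the wedding example (p-value = 0.007) would be described as “very significant”.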
2.4 Practical use on SPSS
• Open the file House.SAV (file / open file / House.SAV) as before.
• Question: can we say that the mean price of houses in this city is equal to 300,000€? We obtained the following
table:
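[Table: SPSS One-Sample Test output; the columns of interest are the p-value (“Sig. (2-tailed)”) and the 95% confidence interval for the mean difference.]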
• In SPSS, the p-value is called “Sig” (for “Significance”). “(2-tailed)” indicates that the test is… two-tailed (we check whether the mean is equal to 300K or not).
• The p-value is smaller than 0.001 so we reject H0 and conclude that “the mean price of houses in the population is
significantly different from 300K”.
• The confidence interval is a little bit strange. It means that we are 95% confident that the gap between the hypothesized population mean (300K) and the real population mean is between 72K and 123K, which is a very complicated way of saying “we are 95% confident that the population mean is between 372K and 423K”.
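The link between the two intervals is a simple shift by the test value. The sketch below uses the bounds quoted above (in thousands of euros); `ci_for_mean` is a hypothetical helper name, not an SPSS option.

```python
TEST_VALUE = 300                     # hypothesized mean price, in thousands of euros

def ci_for_mean(diff_lower, diff_upper, test_value=TEST_VALUE):
    """Turn SPSS's interval for (mean - test value) into an interval for the mean."""
    return diff_lower + test_value, diff_upper + test_value

# Bounds of the 95% interval for the mean difference reported above (in K€).
lower, upper = ci_for_mean(72, 123)
print(f"95% CI for the population mean: {lower}K to {upper}K")
```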
3 Examples and practice
3.1 What will you face at the exam?
• The objective is to make you understand how to state a correct test and how to read results. We intentionally do not focus on computations. It might sound easy… but it is not, as every single word matters.
• You need to be able to explain the meaning of your results. Basically, we need you to be able to explain what a test is and what the p-value is.
• What about a small auto-test to begin with?
 Are you able to list all steps in a test?
 Do you know the difference between a two tailed and a one tailed test?
 Do you know how to write the null and alternative hypothesis (for the moment, only for a mean on one sample)?
 Would you be able to define the p-value with your own words (and in a meaningful way)?
 Do you know what the expected value, observed value, standard deviation of the sample, standard deviation between samples, critical value, tSTAT, Type I error and alpha are?
 Do you know why we use the Student distribution?
 Would you be able to give (an imperfect) definition for the degree of freedom?
 Would you be able to read a table from SPSS?
 Would you be able to transform a question into something that you can test?
 Would you be able to explain the concept of significance?
3.2 Practice
 To be honest, the best practice is to play with the dataset customer_dbase and to test whether some values are as expected.
 If you want to practice, have a look at chapter 9, part 2:
Problem 30 questions a, b
Problem 32 question a
Problem 34 questions a and b.
The total number of problems is limited as we practice a lot in the third part.
