
Estimation

Prof : Xavier Boute


boute@hec.fr

Let’s do a survey :
Which picture do you prefer ?

Desert Bridge

Answer on blackboard

Let’s do another survey :


How many letters have you got in your last name ?

James Bond : 4

Definitions

• Population: the set of all the cases (individuals) I’d like to study
• Population size: N
• Sample: the part of the population I will actually study
• Sample size: n

Representative sample ?

POPULATION

Sample

Quota method

Gender          Age                            Origin
Man     60      Under 25 yrs            20     France                    25
Woman   40      Between 25 and 30 yrs   45     China                     20
                More than 30 yrs        35     India                     20
                                               Europe (except France)    15
                                               Other                     20

Estimation of a mean (quantitative)
What’s the difference between the estimate and the value I want to estimate ?

         POPULATION                                Sample
Size     N                                         n
Mean     $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$     $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Difference between x̄ and μ ?
Estimation of a proportion (qualitative)
What’s the difference between the estimate and the value I want to estimate ?

             POPULATION   Sample
Size         N            n
Proportion   p            p̂

Difference between p̂ and p ?
What is an estimator ?
- It’s a function
- It gives the result (the estimate) for each sample of size n

Ex : 5 students with their GPA

         John   Anna   Kim   Paul   Li
GPA      3      4      3     2      5

Estimator for the mean with sample size = 2

Sample   John-Anna   John-Kim   John-Paul   John-Li   Anna-Kim   Anna-Paul   Anna-Li   Kim-Paul   Kim-Li   Paul-Li
Mean     3.5         3          2.5         4         3.5        3           4.5       2.5        4        3.5

Bias of an estimator
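
The five-student example can be used to check the bias of the sample mean. The sketch below lists every sample of size 2, applies the estimator to each, and compares the average of all the estimates with the population mean. It assumes the five students are the whole population and that every pair is equally likely to be drawn; the variable names are illustrative only.

```python
from itertools import combinations
from statistics import fmean

# GPAs of the five students from the example (treated as the whole population).
gpa = {"John": 3, "Anna": 4, "Kim": 3, "Paul": 2, "Li": 5}

# The estimator maps each sample of size 2 to its sample mean.
estimates = {pair: fmean(gpa[name] for name in pair)
             for pair in combinations(gpa, 2)}

for pair, value in estimates.items():
    print(pair, value)

population_mean = fmean(gpa.values())
average_estimate = fmean(estimates.values())

# Bias = (average of the estimates over all samples) - (true value).
bias = average_estimate - population_mean
print("population mean:", population_mean)
print("average of the 10 estimates:", average_estimate)
print("bias:", bias)
```

Under these assumptions the average of the ten estimates matches the population mean (3.4), so the sample mean has no bias here, even though each individual estimate can be off.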

Central limit theorem

The estimator (for samples of size n) is a random variable (a function of the sample).
When n is large enough, this variable approximately follows a normal distribution.
Its parameters are mean = μ and standard deviation = $\sigma / \sqrt{n}$.

μ is the mean of the whole population
n is the sample size
σ is the standard deviation of the population.
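
A small simulation can make the theorem concrete. The sketch below is an illustration only: the population (an exponential distribution with μ = 1 and σ = 1), the sample size n = 50 and the number of samples are all assumptions chosen for the example, not values from the course.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so that the run is reproducible

n = 50            # sample size (assumed for the illustration)
n_samples = 5000  # number of samples drawn (assumed)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
# It is strongly skewed, i.e. clearly not normal.
mu, sigma = 1.0, 1.0

# One value of the estimator (the sample mean) per sample.
sample_means = [fmean(random.expovariate(1.0) for _ in range(n))
                for _ in range(n_samples)]

print("mean of the sample means:     ", fmean(sample_means))
print("expected (mu):                ", mu)
print("std. dev. of the sample means:", pstdev(sample_means))
print("expected (sigma / sqrt(n)):   ", sigma / sqrt(n))
```

Plotting a histogram of `sample_means` would show the bell shape, although the population itself is skewed.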
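x̄ is one value of the estimator, which follows a normal distribution with parameters μ and $\sigma / \sqrt{n}$.

So, there is a 95 % chance that :

$d(\bar{x}; \mu) \le 1.96 \frac{\sigma}{\sqrt{n}}$

[Figure: interval around μ, from $\mu - 1.96 \frac{\sigma}{\sqrt{n}}$ to $\mu + 1.96 \frac{\sigma}{\sqrt{n}}$, i.e. a distance of $1.96 \frac{\sigma}{\sqrt{n}}$ on each side of μ]
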
Confidence interval for the mean
So, there is a 95 % chance that μ is in

$\left[ \bar{x} - 1.96 \frac{\sigma}{\sqrt{n}} \; ; \; \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}} \right]$

μ is the mean of the whole population
x̄ is the mean of the sample
n is the sample size
σ is the standard deviation of the population.

Same formula for 99 % or 99.7 %, replacing 1.96 with 2.58 or 3.

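
The multipliers come from the standard normal distribution. As a quick check, the sketch below recomputes them with Python's standard library; the helper name `z_value` is made up for this example.

```python
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution.

    For a 95 % interval, 2.5 % is left in each tail, so we look up
    the 97.5 % quantile.
    """
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.95, 0.99, 0.997):
    print(f"{level:.1%} -> {z_value(level):.3f}")
```

The 95 % value is 1.96 and the 99 % value is close to 2.58. The exact 99.7 % value is slightly below 3, so using 3 is a convenient rounding.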
Problem with the standard deviation :

$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$

We have to estimate σ by using the sample.

We could imagine using $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$ as the estimator.

But this estimator is biased. The corrected estimate is :

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
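
The effect of dividing by n or by n − 1 can be checked on the five-student GPA example. The sketch below works with variances (the squared quantities) and makes one assumption that is not in the slides: samples of size 2 are drawn with replacement, so all 25 ordered pairs are equally likely. The function name `variance` is illustrative.

```python
from itertools import product
from statistics import fmean

population = [3, 4, 3, 2, 5]  # GPAs from the five-student example
mu = fmean(population)
sigma2 = fmean((x - mu) ** 2 for x in population)  # population variance

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over a chosen divisor."""
    xbar = fmean(sample)
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 2
# Assumption: sampling with replacement, so all 25 ordered pairs count.
samples = list(product(population, repeat=n))

avg_divide_by_n = fmean(variance(s, n) for s in samples)
avg_divide_by_n_minus_1 = fmean(variance(s, n - 1) for s in samples)

print("population variance:        ", sigma2)
print("average when dividing by n: ", avg_divide_by_n)
print("average when dividing by n-1:", avg_divide_by_n_minus_1)
```

Under this assumption, dividing by n underestimates the population variance on average (here by half, since n = 2), while dividing by n − 1 recovers it.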
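Confidence interval for the mean

So, there is a 95 % chance that μ is in

$\left[ \bar{x} - 1.96 \frac{s}{\sqrt{n}} \; ; \; \bar{x} + 1.96 \frac{s}{\sqrt{n}} \right]$

                 POPULATION                                                   Sample
Size             N                                                            n
Mean             $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$                        $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Std. deviation   $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$    $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Same formula for 99 % or 99.7 %, replacing 1.96 with 2.58 or 3.
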
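
As a minimal sketch, the interval can be computed from the three sample summaries. The function name and the numbers used below (x̄ = 5, s = 2, n = 100) are hypothetical and only serve to show the arithmetic.

```python
from math import sqrt

def mean_confidence_interval(xbar, s, n, z=1.96):
    """Interval [xbar - z*s/sqrt(n) ; xbar + z*s/sqrt(n)].

    z = 1.96 gives the 95 % interval; pass another multiplier
    for a different confidence level.
    """
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample summaries, chosen only for illustration.
low, high = mean_confidence_interval(xbar=5.0, s=2.0, n=100)
print(f"95 % interval: [{low:.3f} ; {high:.3f}]")
```

With these made-up values the half-width is 1.96 × 2 / 10 = 0.392, so the interval runs from about 4.61 to 5.39.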

« You cannot teach a man anything.
You can only help him discover what he has within himself »


Exercise : Cinema

               Kill Joe 4   The 6th Element   Quand la mer monte …
# Spectators   500          400               100
Average        5            5.5               8.5
Variance       6.26         3.9               0.5

The three films were rated by spectators on a scale from 1 (I don’t like the movie) to 10 (I totally enjoy this movie).
Assuming we have representative samples, what can you conclude ?

Estimation of a proportion (qualitative)
What’s the difference between the estimate and the value I want to estimate ?

             POPULATION   Sample
Size         N            n
Proportion   p            p̂

Difference between p̂ and p ?
Confidence interval for the proportion
So, there is a 95 % chance that p is in

$\left[ \hat{p} - 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \; ; \; \hat{p} + 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \right]$

p is the proportion in the whole population
p̂ is the proportion in the sample
n is the sample size

Same formula for 99 % or 99.7 %, replacing 1.96 with 2.58 or 3.

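
The same computation can be sketched for a proportion. The function name and the figures used (p̂ = 0.40 observed on n = 600 people) are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
from math import sqrt

def proportion_confidence_interval(p_hat, n, z=1.96):
    """Interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical poll result, chosen only for illustration.
low, high = proportion_confidence_interval(p_hat=0.40, n=600)
print(f"95 % interval: [{low:.4f} ; {high:.4f}]")
```

Here the square root is 0.02, the half-width is 1.96 × 0.02 = 0.0392, and the interval is roughly 36.1 % to 43.9 %.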
Exercise : election in France

We have several candidates for an election.
The two candidates with the best results after the first round will go to the second round.

Two days before the first round, we run an opinion poll on a representative sample of 997 people.
We collect the following results about voting intentions :
Candidate A : 12%
Candidate B : 8%
Candidate C : 19%
Candidate D : 16%
The other candidates each have less than 5%.

What can you predict about the results of the first round ?


Chi-square test (χ²)

Relation between 2 qualitative variables ?

Distribution by gender and category in a company

         Workers   Technicians   Managers
Men      20        40            40
Women    30        60            10

Is there a relation between gender and category ?

         Workers   Technicians   Managers   Sum
Men      20        40            40         100
Women    30        60            10         100
Sum      50        100           50         200

How many male workers should we expect if there is no relation ?

$50 \times \frac{100}{200} = 25$

Observed counts, with the counts expected under independence in parentheses :

         Workers   Technicians   Managers   Sum
Men      20 (25)   40 (50)       40 (25)    100
Women    30 (25)   60 (50)       10 (25)    100
Sum      50        100           50         200

$\chi^2 = \frac{(20-25)^2}{25} + \frac{(30-25)^2}{25} + \frac{(40-50)^2}{50} + \frac{(60-50)^2}{50} + \frac{(40-25)^2}{25} + \frac{(10-25)^2}{25}$

If χ² is significantly > 0 (i.e. ≠ 0), we reject the hypothesis that « Gender » and « Category » are independent.

[Figure: χ² distribution]

If the significance is less than .05, we reject the hypothesis that the 2 variables are independent.
So there is a relation between « Gender » and « Category ».

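
The computation on the gender/category table can be reproduced in a few lines. This is a sketch rather than a general-purpose test: the significance is obtained from the closed form exp(−χ²/2), which is valid only for 2 degrees of freedom, as is the case for this 2 × 3 table. For other table sizes a statistics package is needed.

```python
from math import exp

# Observed counts from the table.
observed = {
    "Men":   {"Workers": 20, "Technicians": 40, "Managers": 40},
    "Women": {"Workers": 30, "Technicians": 60, "Managers": 10},
}
categories = ["Workers", "Technicians", "Managers"]

row_totals = {g: sum(row.values()) for g, row in observed.items()}
col_totals = {c: sum(observed[g][c] for g in observed) for c in categories}
total = sum(row_totals.values())

chi2 = 0.0
for g in observed:
    for c in categories:
        # Count expected if gender and category were independent.
        expected = row_totals[g] * col_totals[c] / total
        chi2 += (observed[g][c] - expected) ** 2 / expected

df = (len(observed) - 1) * (len(categories) - 1)

# Closed form of the chi-square tail probability, valid for df == 2 only.
assert df == 2
p_value = exp(-chi2 / 2)

print("chi-square:", chi2)
print("degrees of freedom:", df)
print("significance (p-value):", p_value)
```

The six terms are 1 + 1 + 2 + 2 + 9 + 9, so χ² = 24. The resulting significance is far below .05, which matches the conclusion that there is a relation between « Gender » and « Category ».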

Chi-square with SPSS

Action SPSS: “Analyze”, “Descriptive Statistics”, “Crosstabs”, choose the variables
Statistics : chi-square
Cells : you can add counts or % if you want


Data Camp Assignment

1) Sport and Health Analysis
First, analyze the sport&health data from the MBA J22 sample, using SPSS, to give some feedback to the Sport&Health Department.
Then critique the survey and propose a new one that would be easier to analyze in the future.

Submit your answer as a ppt file on blackboard.

Submission dates: January 31, 2022 04:30:00 PM to February 06, 2022 11:30:00 PM

2) Peer evaluation
Give your feedback to 4 classmates (on blackboard)
Evaluation dates: February 06, 2022 11:59:00 PM to February 13, 2022 11:42:00 AM

Exercise : Brouilly and sulfites

A wine-grower has decided to produce some Brouilly wine without the
warning "includes sulfites" on the bottle label. For this purpose, the
quantity of sulfite contained in the wine must not be more than 10 mg
per litre on average. A consumer group carries out a chemical analysis
of 25 bottles of this production of Brouilly. The results obtained for
this sample are mean = 10.4 mg/litre and s = 0.75 mg/litre.

Questions

1. Compute a confidence interval for the average sulfite content in the
new production.
2. Calculate the sample size n needed to estimate the average sulfite
content with a precision of 0.1 mg/litre using a confidence interval.
In other words, you want a confidence interval of length 0.2 mg/litre.

Confidence interval for the proportion

There is a 95 % chance that p is in

$\left[ \hat{p} - 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \; ; \; \hat{p} + 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \right]$

Does p̂(1 − p̂) have a maximum ?
For this value, calculate the precision of the confidence interval for a sample size = 100; 400; 1000; 2000; 4000.

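
One way to approach the question: since p̂(1 − p̂) = 1/4 − (p̂ − 1/2)², the product is largest at p̂ = 0.5, where it equals 0.25. The sketch below computes the resulting worst-case precision (the half-width of the 95 % interval) for the sample sizes listed; the function name `precision` is made up for this example.

```python
from math import sqrt

def precision(n, p_hat=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion.

    p_hat = 0.5 is the worst case, because p_hat * (1 - p_hat)
    is then at its maximum of 0.25.
    """
    return z * sqrt(p_hat * (1 - p_hat) / n)

for n in (100, 400, 1000, 2000, 4000):
    print(f"n = {n:>4}: +/- {precision(n):.2%}")
```

Because the precision varies with 1/√n, the sample size has to be multiplied by 4 to halve the width of the interval.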
Compare the mean of FEV between smokers and non-smokers for the whole population

Compare the mean of AGE between smokers and non-smokers for the whole population