Introduction &
Refresher on (some) statistical
concepts
Giampiero Passaretta
giampiero.passaretta@upf.edu
Part 1
Logistics
When: September 27 – November 28 (no class on November 1)
Contact: giampiero.passaretta@upf.edu
Two parts
1. Lecture (~1.30–1.45 h)
Giampiero
• Population vs sample
• Probability
• Probability distribution
• Normal distribution
• Standard normal
• Sampling and sampling distribution
• Inferential statistics
What is this course about?
Bivariate linear
regression
Multiple linear
regression
Categorical independent
variables
Specification errors
& Inefficiencies
Generalized linear
models
Textbook & Readings
Why regression techniques?
The use of regression analysis in the social sciences
Part 2
Linear regression as a statistical tool…
Three fundamental uses
• Prediction: we may be interested in predicting a certain behavior
• Association
• Causation
Unit of analysis
• INDIVIDUALS
• COUNTRIES
• POLITICAL PARTIES
• FIRMS
POPULATION: dream data
SAMPLE: reality data
The inferential problem
POPULATION: what we are interested in
SAMPLE: what we (usually) work with
If the sample is random, probability theory allows inference on the population from the sample.
Probability: definition
Dichotomous variables
2 values; 2 probabilities (example: coin toss)
Continuous variables
Many values; probability assigned to intervals
Probability distribution
Continuous variables
The probabilities of all intervals sum to 100%.
Interval A [20+]: P(A) = .39 (39%)
Interval B [19–]: P(B) = 1 – P(A) = .61 (61%)
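The interval probabilities above follow the complement rule. As a minimal sketch, the snippet below computes the share of observations in each interval from a small list of ages. The ages are hypothetical and invented purely for illustration, so the resulting shares differ from the .39 and .61 on the slide.

```python
# Hypothetical ages, invented for illustration only (not course data).
ages = [18, 19, 19, 21, 22, 18, 20, 19, 23, 19]

# Interval A: age 20 or older. The share of observations in A estimates P(A).
p_a = sum(age >= 20 for age in ages) / len(ages)

# Interval B: age 19 or younger. A and B cover every value, so P(B) = 1 - P(A).
p_b = 1 - p_a

print(f"P(A) = {p_a:.2f}, P(B) = {p_b:.2f}")
```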
Normal distribution
Symmetric (bell-shaped): 50% of the values lie below the mean and 50% above
Described by two parameters: mean and variance
Variance: variability around the mean
Normal distribution: Facts
X ∼ N(μ, σ²)
«Empirical rule»
68–95–99.7
μ = 20, σ = 2
68% between 18 and 22 (μ ± 1σ)
95% between 16 and 24 (μ ± 2σ)
99.7% between 14 and 26 (μ ± 3σ)
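The empirical rule can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` with the slide's values (μ = 20, σ = 2). The helper `prob_between` is a name introduced here for illustration.

```python
from statistics import NormalDist

# Values taken from the slide: mean 20, standard deviation 2.
age = NormalDist(mu=20, sigma=2)


def prob_between(dist, low, high):
    """P(low <= X <= high) for a continuous distribution."""
    return dist.cdf(high) - dist.cdf(low)


within_1sd = prob_between(age, 18, 22)  # mean +/- 1 SD
within_2sd = prob_between(age, 16, 24)  # mean +/- 2 SD
within_3sd = prob_between(age, 14, 26)  # mean +/- 3 SD

print(f"{within_1sd:.4f} {within_2sd:.4f} {within_3sd:.4f}")
```

The three probabilities come out at roughly 0.68, 0.95 and 0.997, which is where the rule's name comes from.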
μ = 0, σ = 1
68% between -1 and 1
95% between -2 and 2
99.7% between -3 and 3
For all standardized distributions
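Standardizing means subtracting the mean and dividing by the standard deviation, z = (x – μ) / σ. As a rough sketch, the snippet below standardizes the bounds 18 and 22 from the earlier slide (μ = 20, σ = 2) and confirms that the probability is the same on both scales. The function name `standardize` is an assumption made for this example.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Return the z-score: how many standard deviations x lies from mu."""
    return (x - mu) / sigma


mu, sigma = 20, 2  # values from the earlier slide

z_low = standardize(18, mu, sigma)
z_high = standardize(22, mu, sigma)

# Same probability, computed on the original and on the standardized scale.
original = NormalDist(mu, sigma)
standard = NormalDist(0, 1)
p_original = original.cdf(22) - original.cdf(18)
p_standard = standard.cdf(z_high) - standard.cdf(z_low)

print(z_low, z_high, round(p_original, 4), round(p_standard, 4))
```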
(Standard) Normal distribution
Why is it important?
The inferential problem
POPULATION: what we are interested in
SAMPLE: what we (usually) work with
If the sample is random, probability theory allows inference on the population from the sample.
Population: population PARAMETER of interest
Sample: ESTIMATOR of the population parameter
Population of interest
UPF students
Thought experiment
Imagine we could draw many random samples
(n=100) of UPF students…
Thought experiment
We repeat the sampling! (constant size n)
[Figure: mean age in years in the population (Pop.) and in each repeated sample]
Standard error of the mean
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
σ: standard deviation in the population (UNKNOWN)
n: sample size (KNOWN)
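The thought experiment can be simulated. The sketch below builds a hypothetical population of ages (invented here, not actual UPF data), draws many random samples of n = 100, and compares the spread of the sample means with the population standard deviation divided by the square root of n. The population size, its mean and SD, and the number of repetitions are arbitrary assumptions.

```python
import random
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 ages (invented for illustration).
population = [random.gauss(21, 3) for _ in range(10_000)]
pop_mean = mean(population)
pop_sd = pstdev(population)

n = 100            # sample size, as in the thought experiment
n_samples = 2_000  # number of repeated samples (arbitrary choice)

# Draw repeated random samples of constant size n and keep each sample mean.
sample_means = [mean(random.sample(population, n)) for _ in range(n_samples)]

mean_of_means = mean(sample_means)  # should be close to the population mean
sd_of_means = pstdev(sample_means)  # empirical standard error of the mean
theoretical_se = pop_sd / n ** 0.5  # population SD divided by sqrt(n)

print(f"population mean {pop_mean:.2f}, mean of sample means {mean_of_means:.2f}")
print(f"SD of sample means {sd_of_means:.3f}, sigma / sqrt(n) {theoretical_se:.3f}")
```

The sample means cluster around the population mean, and their standard deviation lands close to the theoretical standard error.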
In practice
The standard deviation in the population (σ) is estimated by the standard deviation in the sample (s):
$$\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}$$
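With a single sample, the standard error is estimated as s divided by the square root of n. A minimal sketch, using a small hypothetical sample of ages invented for illustration:

```python
from statistics import stdev

# Hypothetical sample of ages, invented for illustration only.
sample = [19, 21, 20, 23, 18, 22, 20, 24, 19, 21]

n = len(sample)
s = stdev(sample)              # sample standard deviation (n - 1 denominator)
standard_error = s / n ** 0.5  # estimated standard error of the mean

print(f"s = {s:.3f}, estimated SE = {standard_error:.3f}")
```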
Example: Mean age at UPF
WHAT?!
Example: Mean age at UPF
In practice
Which z-value?
For a 95% confidence level, the error we accept is 5%
5% overall (2.5% in each tail)
Z-value = 1.96
(95% of the values fall within 1.96 SD of the mean)
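The 1.96 value can be recovered from the standard normal distribution and combined with a standard error to form a 95% interval around a sample mean. This is only a sketch: the sample mean, sample standard deviation and sample size below are hypothetical numbers, not the figures from the UPF example.

```python
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence  # 5% error overall

# 2.5% in each tail, so we need the 97.5th percentile of the standard normal.
z = NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical sample summary, invented for illustration only.
sample_mean, s, n = 21.0, 3.0, 100

standard_error = s / n ** 0.5
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"z = {z:.2f}, 95% interval = ({ci_low:.2f}, {ci_high:.2f})")
```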
Population
Sample
Probability theory
(inference on the population from the
sample)
(1) Student’s t-distribution
(2) F-distribution
Student’s t-distribution