Professional Documents
Culture Documents
Asset-V1 MITx+18.6501x+2T2021+Type@Asset+Block@Lectureslides Chap1 Annot Attrib
Asset-V1 MITx+18.6501x+2T2021+Type@Asset+Block@Lectureslides Chap1 Annot Attrib
1/37
Goals
Goals:
I To give you a solid introduction to the mathematical theory
behind statistical methods;
I To provide theoretical guarantees for the statistical methods
that you may use for certain applications.
At the end of this class, you will be able to
1. From a real-life situation, formulate a statistical problem in
mathematical terms
2. Select appropriate statistical methods for your problem
3. Understand the implications and limitations of various
methods
2/37
Why statistics?
4/37
In the press
https/:www.nytimes.com:interactive:2018:07:18:upshot:nike-vaporfly-shoe-strava.html
Citation/Attribution: Article (c) New York Times
tps://hbr.org/2018/06/how-vineyard-vines-uses-analytics-to-win-over-customers
Citation/Attribution -- Article and Image Copyright © 2019 Harvard Business School Publishing. All rights reserved
ttps://www.fastcompany.com/90188017/appnexus-is-key-to-atts-plans-to-use-hbo-for-more-consumer-data
Citation/Attribution -- Article by Jeff Beer on Fast Company & Inc © 2019 Mansueto Ventures,
4/37
In science and engineering
https://spectrum.ieee.org/tech-talk/biomedical/devices/measuring-tiny-magnetic-fields-with-an-intelligent-quantum-sensor
Citation/Attribution -- Article Image (c) International Journal of Electrical, Electronics and Data Communication.
4/37
On TV
4/37
Statistics, Data Science . . . and all that
5/37
Statistics, Data Science . . . and all that
5/37
Computational and statistical aspects of data science
6/37
Probability
Rolling 1 die:
I Alice gets $1 if # of dots 3
I Bob gets $2 if # of dots 2
Who do you want to be: Alice or Bob?
Rolling 2 dice:
I Choose a number between 2 and 12
I Win $100 if you chose the sum of the 2 dice
Which number do you choose?
7/37
Statistics and modeling
I Dice are well known random process from physics: 1/6 chance
of each side (no need for data!), dice are independent. We
can deduce the probability of outcomes, and expected $
amounts. This is probability.
I How about more complicated processes? Need to estimate
parameters from data. This is statistics
I Sometimes real randomness (random student, biased coin,
measurement error, . . . )
I Sometimes deterministic but too complex phenomenon:
statistical modeling
Complicated process “=” Simple process + random noise
I (good) Modeling consists in choosing (plausible) simple
process and noise distribution.
8/37
Probability
Statistics
8/37
Statistics vs. probability
9/37
What this course is about
10/37
What this course is about
10/37
Let’s do some statistics
11/37
The kiss
11/37
The kiss
11/37
Statistical experiment
p̂ =
12/37
Random intuition
13/37
A first estimator
14/37
Modelling assumptions
15/37
Discussion
Let us discuss these assumptions.
16/37
Population vs. Samples
I Assume that there is a total population of 5,000
“airport-kissing” couples
I Assume for the sake of argument that p = 35% or that
p = 65%.
I What do samples of size 124 look like in each case?
17/37
Why probability?
18/37
Probability redux
19/37
Averages of random variables: LLN & CLT
p X̄n µ (d)
n ! N (0, 1).
n!1
p (d) 2
(Equivalently, n (X̄n µ) ! N (0, ).)
n!1
20/37
Another useful tool: Hoe↵ding’s inequality
Then,
2n"2
IP[|X̄n µ| "] 2e (b a)2 . 8" > 0
21/37
Consequences
I The LLN’s tell us that
IP, a.s.
R̄n ! p.
n!1
I X ⇠ N (µ, 2)
I IE[X] = µ
I var(X) = 2 >0
23/37
Gaussian density (pdf)
✓ 2
◆
1 (x µ)
fµ, 2 (x) = p exp 2
2⇡ 2
a·X +b⇠N ,
Z= ⇠ N (0, 1)
IP(u X v) = IP Z
IP(|X| > x) =
25/37
Gaussian probability tables
negative Z positive Z
27/37
Quantiles
Definition
Let ↵ in (0, 1). The quantile of order 1 ↵ of a random variable
X is the number q↵ such that
IP(X q↵ ) = 1 ↵
↵ 2.5% 5% 10%
q↵ 1.65 1.28
I Convergence in probability:
IP
Tn !T i↵ IP [|Tn T| "] ! , 8" > 0.
n!1 n!1
I Convergence in distribution:
(d)
Tn !T i↵ IE[f (Tn )] ! IE[f (T )]
n!1 n!1
31/37
Exercises
32/37
Exercises
33/37
Addition, multiplication, division
Then,
a.s./IP
I Tn + U n ! T + U,
n!1
a.s./IP
I Tn U n ! TU,
n!1
Tn a.s./IP T
I If in addition, U =
6 0 a.s., then ! .
Un n!1 U
34/37
Slutsky’s theorem
Some partial results exist for convergence in distribution on the
form of Slutsky’s theorem.
...
35/37
Taking functions
Continuous functions (for all three types) . If f is a continuous
function:
a.s./IP/(d)
Tn !T ) f (Tn ) ! f (T ).
n!1 n!1
IP, a.s.
Example: Recall that by LLN, R̄n ! p. Therefore
n!1
IP, a.s.
f (R̄n ) ! for any continuous f
n!1
(d)
! f (Y ) Y ⇠ N (0, p(1 p))
n!1
p
B not the limit of n[f (R̄n ) f (p)] !! 36/37
Recap
37/37
Unit 1 --18.6501x Attribution Credits
1.
Unit 1 Section 2: What is Statistics
Slide # -- page 4
Object Source / URL: file://localhost/--
https/::www.nytimes.com:interactive:2018:07:18:upshot:nike-vaporfly-shoe-strava.html
Citation/Attribution: Article (c) New York Times
2.
Unit 1 Section 2: What is Statistics
Slide # page 4
Object Source/ URL: https://www.technologyreview.com/s/609760/data-mining-reveals-
the-way-humans-evaluate-each-other/
Citation/Attribution -- Article from the MIT Technology Review. (c) MIT
3.
Unit 1 Section 2: What is Statistics
Slide # page 5
Object Source/ URL:
https://hbr.org/2018/06/how-vineyard-vines-uses-analytics-to-win-over-customers
Citation/Attribution -- Article and Image Copyright © 2019 Harvard Business School
Publishing. All rights reserved.
4.
Unit 1 Section 2: What is Statistics
Slide # 5
Object Source / URL*
https://www.fastcompany.com/90188017/appnexus-is-key-to-atts-plans-to-use-hbo-for-
more-consumer-data
Citation/Attribution -- Article by Jeff Beer on Fast Company & Inc © 2019 Mansueto
Ventures, LLC
5.
Unit 1 Section 2: What is Statistics
Slide # 6
Object Source / URL*
https://www.theguardian.com/science/2017/oct/04/what-is-cryo-electron-microscopy-the-
chemistry-nobel-prize-winning-technique
Citation/Attribution -- Image (c) The Guardian.
6.
Unit 1 Section 2: What is Statistics
Slide # 6
Object Source / URL*
https://spectrum.ieee.org/tech-talk/biomedical/devices/measuring-tiny-magnetic-fields-
with-an-intelligent-quantum-sensor
Citation/Attribution -- Article Image (c) International Journal of Electrical, Electronics and
Data Communication.
7.
Unit 1 Section 2: What is Statistics
Slide # 7
Object Source / URL*
https://www.youtube.com/watch?v=0Rnq1NpHdmw&has_verified=1
Citation/Attribution
Photo of John Oliver © 2019 Home Box Office, Inc. All Rights Reserved.
8.
Unit 1 Section 2: What is Statistics
Slide # 7
Object Source / URL*
https://medium.com/netflix-techblog/studio-production-data-science-646ee2cc21a1
Citation/Attribution
Image on the Medium website (c) Netflix corporation.
9.
Unit 1
Slide # 18, 19
Object Source / URL*
http://www.musee-rodin.fr/en/collections/sculptures/kiss
Citation/Attribution
Photo (c) Musée Rodin.
10.
Unit 1
Slide # 20
Object Source / URL*
https://www.nature.com/articles/421711a
Citation/Attribution – SpringerNature.com
11.
Unit 1
Slide # 23
Object Source / URL*
http://mathshistory.st-andrews.ac.uk/PictDisplay/Gauss.html
Citation/Attribution
Image from the MacTutor History of Mathematics archive
(success)