
18.650 Statistics for Applications

Chapter 1: Introduction

Goals

▶ To give you a solid introduction to the mathematical theory behind statistical methods;
▶ To provide theoretical guarantees for the statistical methods that you may use for certain applications.

At the end of this class, you will be able to:

1. Formulate a statistical problem in mathematical terms, starting from a real-life situation;
2. Select appropriate statistical methods for your problem;
3. Understand the implications and limitations of various methods.

Instructors

▶ Associate Prof. of Applied Mathematics; IDSS; MIT Center for Statistics and Data Science.
▶ Instructor in Applied Mathematics; IDSS; MIT Center for Statistics and Data Science.

Logistics

▶ Optional recitation: TBD.
▶ Homework: weekly; 11 in total, 10 best kept (30%).
▶ Midterm: Nov. 8, in class, 1 hour and 20 minutes (30%). Closed books, closed notes; one cheat sheet allowed.
▶ Final: TBD, 2 hours (40%). Open books, open notes.

Miscellaneous

▶ Prerequisites: notions of linear algebra (matrix, vector, multiplication, orthogonality, …).
▶ Reading: there is no required textbook.
▶ Slides are posted on the course website: https://ocw.mit.edu/courses/mathematics/18-650-statistics-for-applications-fall-2016/lecture-slides. Attendance is still recommended.

Why statistics?


Not only in the press

▶ Flood protection: should be high enough for most floods, but should not be too expensive (too high).
▶ Insurance: what is a fair premium?
▶ Drug trials: 44 showed no improvement. Is the drug effective?

Randomness

RANDOMNESS

Associated questions:

▶ Notion of average (“fair premium”, …)
▶ Quantifying chance (“most of the floods”, …)
▶ Significance, variability, …

Probability

▶ Probability studies randomness (hence the prerequisite)
▶ Sometimes, the physical process is completely known: dice, cards, roulette, fair coins, …

Examples

Rolling 1 die:
▶ Alice gets $1 if # of dots ≤ 3
▶ Bob gets $2 if # of dots ≤ 2
▶ Who do you want to be: Alice or Bob?

Rolling 2 dice:
▶ Choose a number between 2 and 12
▶ Win $100 if you chose the sum of the 2 dice

Well-known random process from physics: 1/6 chance for each side, dice are independent. We can deduce the probability of outcomes and expected $ amounts. This is probability.
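The two games above can be settled by exact enumeration of the equally likely outcomes. The sketch below assumes the payoff rules as stated on this slide (Alice wins on ≤ 3 dots, Bob on ≤ 2):

```python
from fractions import Fraction

sides = range(1, 7)  # faces of a fair die, each with probability 1/6

# Alice: $1 if the number of dots is <= 3.
ev_alice = Fraction(sum(1 for d in sides if d <= 3), 6)

# Bob: $2 if the number of dots is <= 2.
ev_bob = Fraction(sum(2 for d in sides if d <= 2), 6)

# Two dice: exact probability that a chosen number s equals the sum.
def p_sum(s):
    return Fraction(sum(1 for a in sides for b in sides if a + b == s), 36)

best = max(range(2, 13), key=p_sum)  # 7 is the most likely sum
ev_best = 100 * p_sum(best)

print(ev_alice, ev_bob, best, ev_best)
```

Bob's event is rarer but pays double, so his expected winnings ($2/3) beat Alice's ($1/2); in the two-dice game, choosing 7 gives expected winnings of $100/6.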

Statistics and modeling

▶ Sometimes the process is only partially known, and we must estimate its parameters from data. This is statistics.
▶ Sometimes real randomness (random student, biased coin, measurement error, …)
▶ Sometimes deterministic but too complex phenomenon: statistical modeling

Complicated process “=” Simple process + random noise

▶ (Good) modeling consists in choosing a (plausible) simple process and noise distribution.

Statistics vs. probability

▶ Probability: if we knew that the drug was 80% effective, then we could anticipate that in a study on 100 patients, on average 80 will be cured, and at least 65 will be cured with 99.99% chances.
▶ Statistics: observing the outcomes of a study, we will (be able to) conclude that we are 95% confident that for other studies the drug will be effective on between 69.88% and 86.11% of patients.
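The probability-side claim can be checked numerically. The sketch below assumes patients respond independently, so that the number cured in a study of 100 patients with an 80% effective drug is Binomial(100, 0.8) (the independence assumption is ours, for illustration):

```python
from math import comb

n, p = 100, 0.8  # study size and assumed cure probability

def binom_pmf(k):
    # P(exactly k patients cured) under the Binomial(n, p) model
    return comb(n, k) * p**k * (1 - p)**(n - k)

mean_cured = n * p  # expected number of cured patients: 80
p_at_least_65 = sum(binom_pmf(k) for k in range(65, n + 1))

print(mean_cured, p_at_least_65)
```

The tail probability confirms that seeing at least 65 cured patients is overwhelmingly likely under this model.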

18.650

This course is about:

▶ Understand mathematics behind statistical methods
▶ Justify quantitative statements given modeling assumptions
▶ Describe interesting mathematics arising in statistics
▶ Provide a math toolbox to extend to other models.

This course is not about:

▶ Statistical thinking/modeling (applied stats, e.g. IDS.012)
▶ Implementation (computational stats, e.g. IDS.012)
▶ Laundry list of methods (boring stats, e.g. AP stats)

Let’s do some statistics


Heuristics (1)

▶ Is there a preference for turning the head to the right when kissing?
▶ Let p denote the (unknown) proportion of couples that turn their heads to the right when kissing.
▶ Let us design a statistical experiment and analyze its outcome.
▶ Observe n kissing couples and collect the value of each outcome (say 1 for RIGHT and 0 for LEFT);
▶ Estimate p with the proportion p̂ of RIGHT.
▶ Study: “Human behaviour: Adult persistence of head-turning asymmetry” (Nature, 2003): n = 124, 80 turned to the right, so

p̂ = 80/124 = 64.5%
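The experiment is easy to mimic by simulation. In the sketch below, the "true" p is of course unknown in reality; we plug in 0.645 purely for illustration, and compare a study-sized sample with a much larger one:

```python
import random

random.seed(0)
p = 0.645  # assumed true proportion, for illustration only

def estimate(n):
    # observe n couples: 1 for RIGHT (probability p), 0 for LEFT
    sample = [1 if random.random() < p else 0 for _ in range(n)]
    return sum(sample) / n

p_hat_small = estimate(124)      # sample of the same size as the study
p_hat_large = estimate(100_000)  # much larger sample: closer to p

print(p_hat_small, p_hat_large)
```

Rerunning with different seeds shows the small-sample estimate fluctuating noticeably around p, which is exactly the accuracy question the next slides address.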

Heuristics (2)

▶ 64.5% is much larger than 50%, so there seems to be a preference for turning right.
▶ What if our data were RIGHT, RIGHT, LEFT (n = 3)? That’s 66.7% to the right. Even better?
▶ Intuitively, we need a large enough sample size n to make a call. How large?
▶ How can we quantify the accuracy of this procedure?

Heuristics (3)

▶ For i = 1, …, n, define Ri = 1 if the ith couple turns RIGHT, Ri = 0 otherwise.
▶ The estimator of p is the sample average

p̂ = R̄n = (1/n) ∑_{i=1}^n Ri.

▶ To study the accuracy of this estimator, we need a statistical model that describes/approximates well the experiment.

Heuristics (4)

We need to model the observations Ri, i = 1, …, n in order to draw statistical conclusions. Here are the assumptions we make:

▶ Each Ri is a random variable;
▶ Each Ri is Bernoulli distributed with the same parameter p: Ri ∼ Ber(p);
▶ R1, …, Rn are mutually independent.

Heuristics (5)

Let us discuss these assumptions.

▶ Randomness as a model for lack of information: with perfect information about the conditions of kissing (including what goes in the kissers’ minds), physics or sociology would allow us to predict the outcome.
▶ The Ri’s take values in {0, 1}, hence are Bernoulli. They could still have a different parameter Ri ∼ Ber(pi) for each couple, but we don’t have enough information in the data to estimate the pi’s accurately. So we simply assume that our observations come from the same process: pi = p for all i.
▶ Independence is plausible: couples are observed at different locations and at different times.

Two important tools: LLN & CLT

Let X, X1, …, Xn be i.i.d. random variables with µ = IE[X] and σ² = Var[X].

▶ Law of large numbers (LLN): the sample average

X̄n := (1/n) ∑_{i=1}^n Xi

converges to µ in probability and almost surely as n → ∞.

▶ Central limit theorem (CLT):

√n (X̄n − µ)/σ → N(0, 1) in distribution as n → ∞.

(Equivalently, √n (X̄n − µ) → N(0, σ²) in distribution.)
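Both theorems can be seen at work in a short simulation. The sketch below uses Bernoulli(p) observations with p = 0.645 (an assumed value, matching the kiss example): the LLN says the sample means concentrate around p, and the CLT says the standardized means have standard deviation close to 1:

```python
import math
import random
import statistics

random.seed(1)
p, n, reps = 0.645, 500, 2000

def sample_mean(n):
    # mean of n Bernoulli(p) observations
    return sum(random.random() < p for _ in range(n)) / n

# LLN: each sample mean is close to p when n is large.
means = [sample_mean(n) for _ in range(reps)]

# CLT: sqrt(n) * (mean - p) / sigma should look standard normal,
# so its empirical standard deviation should be close to 1.
sigma = math.sqrt(p * (1 - p))
z = [math.sqrt(n) * (m - p) / sigma for m in means]

print(statistics.mean(means), statistics.stdev(z))
```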

Consequences (1)

▶ By the LLN,

R̄n → p in probability and almost surely as n → ∞.

▶ Hence, when the size n of the experiment becomes large, R̄n is a good (say “consistent”) estimator of p.
▶ The CLT refines this by quantifying how good this estimate is.

Consequences (2)

Let Φn(x) denote the cdf of √n (R̄n − p)/√(p(1 − p)), and Φ(x) the cdf of N(0, 1).

CLT: Φn(x) ≈ Φ(x) when n becomes large. Hence, for all x > 0,

IP[ |R̄n − p| ≥ x ] ≈ 2 ( 1 − Φ( x √n / √(p(1 − p)) ) ).

Consequences (3)

Consequences:

▶ Approximation of how R̄n concentrates around p;
▶ If q_{α/2} denotes the (1 − α/2)-quantile of N(0, 1), then with probability ≈ 1 − α (if n is large enough!),

R̄n ∈ [ p − q_{α/2} √(p(1 − p))/√n , p + q_{α/2} √(p(1 − p))/√n ].

Consequences (4)

▶ Note that no matter the (unknown) value of p, p(1 − p) ≤ 1/4.
▶ Hence, roughly with probability at least 1 − α,

R̄n ∈ [ p − q_{α/2}/(2√n) , p + q_{α/2}/(2√n) ].

▶ In other words, when n becomes large, the interval [ R̄n − q_{α/2}/(2√n) , R̄n + q_{α/2}/(2√n) ] contains p with probability ≥ 1 − α.
▶ This interval is called an asymptotic confidence interval for p.
▶ In the kiss example, we get

0.645 ± 1.96/(2√124) = [0.56, 0.73].

In the extreme case (n = 3) we would have [0.10, 1.23], but the CLT is not valid! Actually, we can make exact computations!
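The kiss-example interval is a one-liner to verify. The sketch below computes the conservative asymptotic 95% confidence interval, using the bound p(1 − p) ≤ 1/4 and the 97.5% normal quantile 1.96:

```python
import math

n, successes, q = 124, 80, 1.96  # study data and q_{alpha/2} for alpha = 5%

p_hat = successes / n                 # 0.645...
half_width = q / (2 * math.sqrt(n))   # q_{alpha/2} / (2 sqrt(n))
ci = (p_hat - half_width, p_hat + half_width)

print(round(ci[0], 2), round(ci[1], 2))  # the [0.56, 0.73] from the slide
```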

Another useful tool: Hoeffding’s inequality

What if n is not so large?

Hoeffding’s inequality (i.i.d. case): let n be a positive integer and X, X1, …, Xn be i.i.d. r.v. such that X ∈ [a, b] a.s. (a < b are given numbers). Let µ = IE[X]. Then, for all ε > 0,

IP[ |X̄n − µ| ≥ ε ] ≤ 2 exp( −2nε² / (b − a)² ).

Consequence:

▶ For α ∈ (0, 1), with probability ≥ 1 − α,

R̄n − √( log(2/α)/(2n) ) ≤ p ≤ R̄n + √( log(2/α)/(2n) ).

▶ This holds even for small sample sizes n.
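The price of a non-asymptotic guarantee is a wider interval. A minimal sketch, comparing the Hoeffding half-width with the conservative CLT half-width for the kiss-study sample size:

```python
import math

alpha, n, q = 0.05, 124, 1.96  # 95% level, study size, normal quantile

# Hoeffding-based half-width: valid for every n, no approximation.
hoeffding = math.sqrt(math.log(2 / alpha) / (2 * n))

# CLT-based conservative half-width: valid only asymptotically.
clt = q / (2 * math.sqrt(n))

print(hoeffding, clt)  # Hoeffding gives the larger (safer) half-width
```

For these numbers Hoeffding's half-width (about 0.122) exceeds the CLT one (about 0.088), reflecting that Hoeffding makes no large-n approximation.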

Review of different types of convergence (1)

Let (Tn)_{n≥1} be a sequence of random variables and T a random variable (T may be deterministic).

▶ Almost sure (a.s.) convergence:

Tn → T a.s. iff IP[ {ω : Tn(ω) → T(ω) as n → ∞} ] = 1.

▶ Convergence in probability:

Tn → T in probability iff IP[ |Tn − T| ≥ ε ] → 0 as n → ∞, for all ε > 0.

Review of different types of convergence (2)

▶ Convergence in Lp (p ≥ 1):

Tn → T in Lp iff IE[ |Tn − T|^p ] → 0 as n → ∞.

▶ Convergence in distribution:

Tn → T in distribution iff IP[Tn ≤ x] → IP[T ≤ x] as n → ∞, for all x ∈ IR at which the cdf of T is continuous.

Remark

These definitions extend to random vectors (i.e., random variables in IR^d for some d ≥ 2).

Review of different types of convergence (3)

The following statements are equivalent:

(i) Tn → T in distribution;

(ii) IE[f(Tn)] → IE[f(T)], for all continuous and bounded functions f;

(iii) IE[e^{ixTn}] → IE[e^{ixT}], for all x ∈ IR.

Review of different types of convergence (4)

Important properties

▶ If (Tn)_{n≥1} converges a.s., then it also converges in probability, and the two limits are equal a.s.
▶ If (Tn)_{n≥1} converges in Lp, then it also converges in Lq for all q ≤ p and in probability, and the limits are equal a.s.
▶ If (Tn)_{n≥1} converges in probability, then it also converges in distribution.
▶ If f is a continuous function:

Tn → T (a.s., in probability, or in distribution) ⟹ f(Tn) → f(T) (in the same sense).

Review of different types of convergence (6)

One can add, multiply, … limits almost surely and in probability. If Un → U and Vn → V (a.s. or in probability), then:

▶ Un + Vn → U + V (a.s. or in probability),
▶ Un Vn → U V (a.s. or in probability),
▶ If in addition V ≠ 0 a.s., then Un/Vn → U/V (a.s. or in probability).

In general, these rules do not hold for convergence in distribution unless the pair (Un, Vn) converges in distribution to (U, V).

Another example (1)

Observe inter-arrival times T1, …, Tn (e.g., between consecutive buses at a station), modeled as:

▶ Mutually independent
▶ Exponential random variables with common parameter λ > 0.

Another example (2)

Let us discuss the assumptions behind this exponential model:

▶ Mutual independence: plausible but not completely justified (often the case with independence).
▶ The exponential distributions of T1, …, Tn have the same parameter λ: on average, all the same inter-arrival time. True only for a limited period (rush hour ≠ 11pm).

Another example (3)

▶ Density of T1:

f(t) = λ e^{−λt}, for all t ≥ 0.

▶ IE[T1] = 1/λ.
▶ Hence, a natural estimate of 1/λ is

T̄n := (1/n) ∑_{i=1}^n Ti.

▶ A natural estimator of λ is

λ̂ := 1/T̄n.
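A minimal sketch of this estimator on simulated data, with an assumed true rate λ = 0.5 (so the mean inter-arrival time 1/λ is 2):

```python
import random

random.seed(3)
lam, n = 0.5, 50_000  # assumed true rate and sample size

# Simulated exponential inter-arrival times with rate lam.
times = [random.expovariate(lam) for _ in range(n)]

t_bar = sum(times) / n  # natural estimate of 1/lam = 2
lam_hat = 1 / t_bar     # natural estimator of lam = 0.5

print(t_bar, lam_hat)
```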

Another example (4)

▶ By the LLN,

T̄n → 1/λ (a.s. and in probability) as n → ∞.

▶ Hence,

λ̂ → λ (a.s. and in probability) as n → ∞.

▶ By the CLT,

√n (T̄n − 1/λ) → N(0, λ^{−2}) in distribution as n → ∞.

▶ How does the CLT transfer to λ̂? How can we obtain an asymptotic confidence interval for λ?

The Delta method

Let (Zn)_{n≥1} be a sequence of r.v. such that

√n (Zn − θ) → N(0, σ²) in distribution as n → ∞

(i.e., (Zn) is asymptotically normal around θ), and let g : IR → IR be continuously differentiable at θ. Then:

▶ (g(Zn))_{n≥1} is also asymptotically normal;
▶ More precisely,

√n ( g(Zn) − g(θ) ) → N(0, g′(θ)² σ²) in distribution as n → ∞.
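The Delta method is easy to check by simulation for g(x) = 1/x applied to the exponential sample mean. With θ = 1/λ and σ² = λ^{−2}, the limiting standard deviation of √n (λ̂ − λ) is |g′(θ)| σ = λ. The sketch below uses an assumed λ = 2:

```python
import math
import random
import statistics

random.seed(2)
lam, n, reps = 2.0, 1000, 1000  # assumed rate, sample size, repetitions

def lam_hat():
    # one replication: lambda_hat = 1 / T_bar on n Exp(lam) draws
    t_bar = sum(random.expovariate(lam) for _ in range(n)) / n
    return 1 / t_bar

# Delta method: sqrt(n) * (lam_hat - lam) should be roughly N(0, lam^2),
# so its empirical standard deviation should be close to lam = 2.
z = [math.sqrt(n) * (lam_hat() - lam) for _ in range(reps)]

print(statistics.mean(z), statistics.stdev(z))
```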

Consequence of the Delta method (1)

▶ Applying the Delta method with g(x) = 1/x gives

√n ( λ̂ − λ ) → N(0, λ²) in distribution as n → ∞.

▶ Hence, with probability ≈ 1 − α,

|λ̂ − λ| ≤ q_{α/2} λ/√n.

▶ Can [ λ̂ − q_{α/2} λ/√n , λ̂ + q_{α/2} λ/√n ] be used as an asymptotic confidence interval for λ?
▶ No! It depends on λ…

Consequence of the Delta method (2)

▶ In this case, we can solve for λ:

|λ̂ − λ| ≤ q_{α/2} λ/√n  ⟺  λ (1 − q_{α/2}/√n) ≤ λ̂ ≤ λ (1 + q_{α/2}/√n)

⟺  λ̂ (1 + q_{α/2}/√n)^{−1} ≤ λ ≤ λ̂ (1 − q_{α/2}/√n)^{−1}.

Hence, [ λ̂ (1 + q_{α/2}/√n)^{−1} , λ̂ (1 − q_{α/2}/√n)^{−1} ] is an asymptotic confidence interval for λ.

Slutsky’s theorem

Slutsky’s theorem

Let (Xn), (Yn) be two sequences of r.v. such that:

(i) Xn → X in distribution;

(ii) Yn → c in probability,

where X is a r.v. and c is a given real number. Then,

(Xn, Yn) → (X, c) in distribution.

In particular,

Xn + Yn → X + c in distribution,

Xn Yn → cX in distribution,

…

Consequence of Slutsky’s theorem (1)

▶ Thanks to the Delta method, we know that

√n (λ̂ − λ)/λ → N(0, 1) in distribution as n → ∞.

▶ Moreover, λ̂ → λ in probability, so λ/λ̂ → 1 in probability.
▶ Hence, by Slutsky’s theorem,

√n (λ̂ − λ)/λ̂ → N(0, 1) in distribution as n → ∞.

▶ This yields the asymptotic confidence interval

[ λ̂ − q_{α/2} λ̂/√n , λ̂ + q_{α/2} λ̂/√n ].
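The plug-in (Slutsky) interval is straightforward to compute on simulated data. The sketch below assumes a true rate λ = 2 for illustration and builds the 95% interval:

```python
import math
import random

random.seed(4)
lam, n, q = 2.0, 10_000, 1.96  # assumed true rate, sample size, quantile

# lambda_hat = 1 / T_bar = n / sum of the inter-arrival times.
lam_hat = n / sum(random.expovariate(lam) for _ in range(n))

# Slutsky plug-in interval: lam_hat +/- q * lam_hat / sqrt(n).
half = q * lam_hat / math.sqrt(n)
ci = (lam_hat - half, lam_hat + half)

print(ci)
```

Unlike the Delta-method interval of the previous slides, this one is computable from the data alone, since λ̂ replaces the unknown λ in the half-width.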

Consequence of Slutsky’s theorem (2)

Remark:

▶ In the kiss example, Slutsky’s theorem lets us plug R̄n in for p in the variance, instead of the conservative trick “p(1 − p) ≤ 1/4”.
▶ This yields the asymptotic confidence interval

[ R̄n − q_{α/2} √(R̄n(1 − R̄n))/√n , R̄n + q_{α/2} √(R̄n(1 − R̄n))/√n ].
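Revisiting the kiss data, the plug-in interval can be compared with the conservative one from earlier slides:

```python
import math

n, successes, q = 124, 80, 1.96  # study data and 95% normal quantile
p_hat = successes / n

# Plug-in (Slutsky) half-width: uses the estimated variance p_hat(1 - p_hat).
half_plugin = q * math.sqrt(p_hat * (1 - p_hat)) / math.sqrt(n)

# Conservative half-width: uses the bound p(1 - p) <= 1/4.
half_conserv = q / (2 * math.sqrt(n))

plugin = (p_hat - half_plugin, p_hat + half_plugin)
print(plugin, half_plugin < half_conserv)  # plug-in interval is narrower
```

Since p̂ = 0.645 is not far from 1/2, the two intervals nearly coincide here; the plug-in one is slightly narrower because p̂(1 − p̂) < 1/4.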

MIT OpenCourseWare

https://ocw.mit.edu

Fall 2016

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.
