© All Rights Reserved

2 views

© All Rights Reserved

- AAOC+C111(MATH+F113)
- null
- Normal Approximations for Hyper Geometric Distribution
- Normal Distributions
- 2007_PredictRandomEffects
- 1 Introduction to Or
- Statistical Corrosion Modelling
- AUTC_31215
- rpra3
- TME 601
- SOR Main Angol
- P537_Utilization of statistics to assess fire risk in buildings_K.Tillander.pdf
- MB0040 MQP Answer Keys
- HW2Solutions.pdf
- 12Outlier
- m-rel
- IJETR021666
- research-2013-lit-review[1].pdf
- Presentation on Reliability Basis of Current standards
- 05-StatisticalModels.pdf

You are on page 1of 26

Data Science

Statistical Models

Hanspeter Pfister, Joe Blitzstein, and Verena Kaynig

(observed) probability (unobserved)

This Week

HW1 due next Thursday - start last week!

Piazza (always search first), and avoid posting

code from your homework solutions (see

Andrews post https://piazza.com/class/

icf0cypdc3243c?cid=310 for more)

Drowning in data, but starved for information

source: http://extensionengine.com/drowning-in-data-the-biggest-hurdle-for-mooc-proliferation/

What is a statistical model?

a family of distributions, indexed by parameters

sharpens distinction between data and parameters,

and between estimators and estimands

parametric (e.g., based on Normal, Binomial) vs.

nonparametric (e.g., methods like bootstrap, KDE)

(observed) probability (unobserved)

What good is a statistical model?

All models are wrong, but some models are useful.

George Box (1919-2013)

All models are wrong, but some models are useful. George Box

Jorge Luis Borges,

On Exactitude in Science

In that Empire, the Art of Cartography attained such Perfection

that the map of a single Province occupied the entirety of a City,

and the map of the Empire, the entirety of a Province. In time,

those Unconscionable Maps no longer satisfied, and the

Cartographers

All models Guild struck

are wrong, a Map

but some of the

models Empirewhose

are useful. size was

George Box

that of the Empire, and which coincided point for point with it.

Big Data vs. Pig Data:

https://scensci.wordpress.com/2012/12/14/big-data-or-pig-

data/

Statistical Models: Two Books

Parametric vs. Nonparametric

parametric: finite-dimensional parameter space (e.g.,

mean and variance for a Normal)

nonparametric: infinite-dimensional parameter space

is there anything in between?

nonparametric is very general, but no free lunch!

remember to plot and explore the data!

Parametric Model Example:

Exponential Distribution

x

f (x) = e ,x > 0

1.2

1.0

1.0

0.8

0.8

0.6

pdf

0.6

cdf

0.4

0.4

0.2

0.2

0.0

0.0

0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0

x x

Exponential Distribution

x

f (x) = e ,x > 0

all models are wrong, but some are useful...

iterate between exploring, the data model-building,

model-fitting, and model-checking

key building block for more realistic models

The Weibull Distribution

Exponential has constant hazard function

Weibull generalizes this to a hazard that is t to a power

much more flexible and realistic than Exponential

representation: a Weibull is an Expo to a power

Family Tree of Parametric Distributions

HGeom

Limit

Conditioning

Bin

(Bern)

Limit

Conjugacy

Conditioning

Beta

Pois

(Unif)

Limit

Poisson process Bank - post oce

Conjugacy

Gamma NBin

Normal

(Expo, Chi-Square) Limit (Geom)

Limit

Limit

Student-t

(Cauchy)

Binomial Distribution

Figure 3.6 shows plots of the Binomial PMF for various values of n and p. Note that

the PMF of the Bin(10, 1/2) distribution is symmetric about 5, but when the success

story: X~Bin(n,p) is the number of successes in n

probability is not 1/2, the PMF is skewed. For a fixed number of trials n, X tends to be

larger when the success probability is high and lower when the success probability is low,

independent Bernoulli(p) trials.

as we would expect from the story of the Binomial distribution. Also recall that in any

PMF plot, the sum of the heights of the vertical bars must be 1.

Bin(10,1/2) Bin(10,1/8)

0.30

0.4

0.25

0.3

0.20

0.15

pmf

pmf

0.2

0.10

0.1

0.05

0.00

0.0

0 2 4 6 8 10 0 2 4 6 8 10

x x

Bin(100,0.03) Bin(9,4/5)

0.30

0.4

0.25

0.3

0.20

0.15

pmf

pmf

0.2

0.10

0.1

0.05

0.00

0.0

0 2 4 6 8 10 0 2 4 6 8 10

x x

Binomial Distribution

story: X~Bin(n,p) is the number of successes in n

independent Bernoulli(p) trials.

where each independently votes for A with probability p

E(X+Y)=E(X)+E(Y))

Var(X+Y)=Var(X)+Var(Y) if X,Y are uncorrelated)

(Doonesbury)

Poisson Distribution

Poisson

story: count number of events that occur, when there are a

last discrete distribution that well introduce in this chapter is the

large number of independent rare events.

h is an extremely popular distribution for modeling discrete data. We

its PMF, mean, and variance, and then discuss its story in more deta

Examples: # mutations in genetics, # of traffic accidents at a

nition 4.7.1 certain

(Poissonintersection, # of An

distribution). emails

r.v.inXanhas

inbox

the Poisson dis

parameter , where mean > =0,variance for the

if the PMF of XPoisson

is

e k

P (X = k) = , k = 0, 1, 2, . . . .

k!

write this as X Pois( ).

P1 k

is a valid PMF because of the Taylor series k=0 k! =e .

mple 4.7.2 (Poisson expectation and variance). Let X Pois( ).

w that the mean and variance are both equal to . For the mean, we h

Poisson PMF, CDF

1.0

1.0

0.8

0.8

0.6

0.6

PMF

CDF

Pois(2)

0.4

0.4

0.2

0.2

0.0

0.0

0 1 2 3 4 5 6 0 1 2 3 4 5 6

x x

1.0

1.0

0.8

0.8

0.6

0.6

PMF

CDF

Pois(5) 0.4

0.4

0.2

0.2

0.0

0.0

0 2 4 6 8 10 0 2 4 6 8 10

x x

FIGURE 4.7

dence of r.v.s) and of good approximations (there are various ways to m

Poisson

ood an approximation Approximation

is). A remarkable theorem is that, in the above n

Aj are independent, N Pois( ), and B is any set of nonnegative i

X

n

1 2

|P (X 2 B) P (N 2 B)| min 1, pj .

j=1

if X is the number of events that occur, where the ith event has probability

provides

pi , andanNupper

Pois( bound

), with on

thehow much

average error

number is incurred

of events afrom using

that occur.

ximation, not only for approximating the PMF of X, but also for appr

e probability that

P X is any set. Also, it makes more precise how sma

d be: weExample:

want nj=1 in pmatching problem,

2 to be very

j small, theatprobability

or least very of

small comp

no match

e result can be shown is approximately

using an 1/e. known as the Stei

advanced technique

d.

Poisson paradigm is also called the law of rare events. The interpret

is that the pj are small, not that is small. For example, in the email e

w probability of getting an email from a specific person in a particular

by the large number of people who could send you an email in that hou

Poisson Model Example

0.30

Poisson model

empirical distribution

q = 1.83)

0.20

Pr(Y i = y i )

Pr( Y i = y|q

0.10 0.00

0.00

0 2 4 6 8 0 10 20

number of children number

source: Hoff, A First Course in Bayesian Statistical Methods

Fig. 3.7. Poisson distributions. The first panel shows a Poisso

Extensions:mean

Zero-Inflated Poisson,

of 1.83, along with theNegative Binomial, of

empirical distribution the nu

women of age 40 from the GSS during the 1990s. The secon

distribution of the sum of 10 i.i.d. Poisson random variables w

Normal (Gaussian) Distribution

symmetry

central limit theorem

characterizations (e.g., via entropy)

68-95-99.7% rule

Wikipedia

Normal Approximation to Binomial

Wikipedia

The Magic of Statistics

The Evil Cauchy Distribution

http://www.etsy.com/shop/NausicaaDistribution

Bootstrap (Efron, 1979)

data: 3.142 2.718 1.414 0.693 1.618

distribution to approximate true distribution

Bootstrap World

source: http://pubs.sciepub.com/ijefm/3/3/2/

which is based on diagram in Efron-Tibshirani book

- AAOC+C111(MATH+F113)Uploaded byBharat Raghunathan
- nullUploaded byapi-25914596
- Normal Approximations for Hyper Geometric DistributionUploaded byRibomaphil Isagani Macuto Atwel
- Normal DistributionsUploaded byZaid Ibrahim
- 2007_PredictRandomEffectsUploaded byhubik38
- 1 Introduction to OrUploaded byRaghav Rohila
- Statistical Corrosion ModellingUploaded byWiwis Arie
- AUTC_31215Uploaded byMathi Gandhi
- rpra3Uploaded bymihai37
- TME 601Uploaded bydearsaswat
- SOR Main AngolUploaded byduytung125
- P537_Utilization of statistics to assess fire risk in buildings_K.Tillander.pdfUploaded byAda Darmon
- MB0040 MQP Answer KeysUploaded byajeet
- HW2Solutions.pdfUploaded byDavid Lee
- 12OutlierUploaded byAnurekha Prasad
- m-relUploaded byuamiranda3518
- IJETR021666Uploaded byerpublication
- research-2013-lit-review[1].pdfUploaded byAdina Tudor
- Presentation on Reliability Basis of Current standardsUploaded byKenneth Mensah
- 05-StatisticalModels.pdfUploaded byMatheus Silva
- 05-StatisticalModels.pdfUploaded byMatheus Silva
- Week 1.4 - Exponential and Normal DistributionsUploaded byPhilipp Kähler
- It Lecture 4 (Process Flow)Uploaded byaardg hyt
- Demand During Lead 00 WillUploaded byZahirovic Muhizin
- DTIC_ADA187220.pdfUploaded byGom
- Multipoints StatisticUploaded byzztannguyenzz
- 98r06.pdfUploaded byEduardo Gantes
- [Jianqing_Fan]_New_Developments_in_Biostatistics_a(BookFi).pdfUploaded byyesly mileyma suescun lozano
- StatUploaded byMaQsud AhMad SaNdhu
- EEE25Uploaded byjake

- Virgin-Case.pdfUploaded byNooray Malik
- Point of View Doc TmpltUploaded bygowtam_ravi
- 2006-2007 Michigan Ross CC Case Interview Preparation GuideUploaded byr_oko
- Case Text Mining Insurance CompanyUploaded bygowtam_ravi
- IQACsdsfghUploaded bygowtam_ravi
- 14-avariable-speedsensorlessinductionmotordrive-110928234501-phpapp02Uploaded bygowtam_ravi
- IntercontinentalUploaded bygowtam_ravi
- My Adwords Video Advertising Notes.docxUploaded bygowtam_ravi
- My Adwords Fundamental Notes.docxUploaded bygowtam_ravi
- HCL Tech - Event Update - Axis Direct - 05042016_05!04!2016_13Uploaded bygowtam_ravi
- 1478692755_Press Release 3Uploaded bygowtam_ravi
- Us About Deloitte Tbm SolutionUploaded bygowtam_ravi
- Carpe Diem_Rule Book 2016Uploaded bygowtam_ravi
- Tableau Notes TopicsUploaded bygowtam_ravi
- LtPtF week7Uploaded bygowtam_ravi
- 91704260 DSTATCOM With Multilevel InveterUploaded bygowtam_ravi
- td06_342-libreUploaded bySairam Sai
- 83521-20ghsdjkdskj2035-1-PBUploaded bygowtam_ravi
- 00025552Uploaded bygowtam_ravi
- dstacomUploaded bygowtam_ravi
- L-01(TB)(ET) ((EE)NPTEL)Uploaded byRishi Sharma

- Gaussian Observation HMM for EEGUploaded byDeetov
- Quantitative Research ProcessUploaded byParlindungan Pardede
- Artigo Broadstock Collins Hunt Greenhouse Disclosure 2018Uploaded byPertenceaofutebol
- fall 2017 map class reportUploaded byapi-253462692
- 25b S1 June 2015Uploaded byLamine
- 2018 Stam Loss Models DataUploaded byUsherAlexander
- Solution Hw 1Uploaded byNixon Patel
- v01-Analysis of VarianceUploaded byasg_rus
- ch17-StatisticalQualityUploaded byYusuf Sahin
- class6Uploaded byKhanh Nam
- 31_03-Newman-MitzelUploaded byraa2010
- Daly 2012Uploaded byCamila Rosero
- Analytical Properties of an Imperfect Repair ModelUploaded byEli Pale
- Dynamic Construction of Stimulus Values in the Ventromedial Prefrontal CortexUploaded bygeohawaii
- Sampling Techniques,PptUploaded byWilda Shofaa Rahmawati
- Test Bank for Business Statistics a First Course 1st Canadian Edition by SharpeUploaded bya243011001
- Dummy ScalesUploaded byMariahDeniseCarumba
- Cohen-1960-A Coefficient of Agreement for Nominal ScalesUploaded byNadson Pontes
- Factor Affecting Positive & Negative Word of Mouth in Restaurant IndustryUploaded byEditor IJTSRD
- Schmid 2015Uploaded byParadon Vongvanitkangwan
- 7. Novita SariUploaded byIcha Nuri
- Simulation and Modelling( R18 Syllabus)(19.06.2019)Uploaded bydvr
- Aesthetic Complexity Practice and PercepUploaded byMihaela Marinescu
- data-02-00008Uploaded byLantana Misai
- Mapping Globally BrandedUploaded byriko22
- Political HistoryUploaded byTrisha Ace Betita Elises-Kapunan
- binomial_distribution.docUploaded bysaras
- MAT 240 SyllabusUploaded byAbigail Marie
- Football Poisson ExponentialUploaded bySaurav Goyal
- SW_2e_ex_ch06Uploaded byccni