
Gaussian Process Regression

Rasmussen (Engineering, Cambridge)

November 11th - 16th, 2010

The Prediction Problem

[Figure: atmospheric CO2 concentration in ppm (320 to 420) plotted against year (1960 to 2020), with a question mark over the years ahead: the future values are to be predicted.]


The Gaussian Distribution

The Gaussian distribution is given by

$$p(\mathbf{x}\,|\,\boldsymbol{\mu}, \Sigma) = \mathcal{N}(\boldsymbol{\mu}, \Sigma) = (2\pi)^{-D/2}|\Sigma|^{-1/2}\exp\!\big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\big),$$

where $\boldsymbol{\mu}$ is the mean vector and $\Sigma$ the covariance matrix.


Conditionals and Marginals of a Gaussian

[Figure: contours of a bivariate Gaussian density, with a conditional distribution and a marginal distribution shown alongside.]

Both the conditionals and the marginals of a joint Gaussian are again Gaussian.


Conditionals and Marginals of a Gaussian

In algebra, if x and y are jointly Gaussian,

$$p(\mathbf{x}, \mathbf{y}) = \mathcal{N}\!\left(\begin{bmatrix}\mathbf{a}\\ \mathbf{b}\end{bmatrix},\; \begin{bmatrix}A & B\\ B^\top & C\end{bmatrix}\right),$$

then the marginal distribution of x is

$$p(\mathbf{x}) = \mathcal{N}(\mathbf{a}, A),$$

and the conditional distribution of x given y is

$$p(\mathbf{x}\,|\,\mathbf{y}) = \mathcal{N}\big(\mathbf{a} + BC^{-1}(\mathbf{y}-\mathbf{b}),\; A - BC^{-1}B^\top\big).$$
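A quick numerical check of these formulas (in Octave or MATLAB, the deck's convention; the block values below are made up for illustration):

a = 0; b = [0; 0];                          % mean blocks
A = 1; B = [0.5 0.3]; C = [1 0.2; 0.2 1];   % covariance blocks
y = [0.7; -0.4];                            % a hypothetical observed value of y
mu_cond  = a + B*(C\(y - b));               % conditional mean  a + B C^{-1}(y - b)
var_cond = A - B*(C\B');                    % conditional variance  A - B C^{-1} B'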


What is a Gaussian Process?

A Gaussian process is a generalization of the Gaussian distribution to infinitely many variables.

Definition: a Gaussian process is a collection of random variables, any finite number of which have (consistent) Gaussian distributions.

A Gaussian distribution is fully specified by a mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$:

$$\mathbf{f} = (f_1, \ldots, f_n)^\top \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma), \quad \text{indexes } i = 1, \ldots, n.$$

A Gaussian process is fully specified by a mean function m(x) and covariance function k(x, x'):

$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big), \quad \text{indexes } x.$$


The marginalization property

Thinking of a GP as a Gaussian distribution with an infinitely long mean vector and an infinite by infinite covariance matrix may seem impractical. . . luckily we are saved by the marginalization property.

Recall:

$$p(\mathbf{x}) = \int p(\mathbf{x}, \mathbf{y})\, d\mathbf{y}.$$

For Gaussians:

$$p(\mathbf{x}, \mathbf{y}) = \mathcal{N}\!\left(\begin{bmatrix}\mathbf{a}\\ \mathbf{b}\end{bmatrix},\; \begin{bmatrix}A & B\\ B^\top & C\end{bmatrix}\right) \;\Longrightarrow\; p(\mathbf{x}) = \mathcal{N}(\mathbf{a}, A).$$


Random functions from a Gaussian Process

To get an indication of what this distribution over functions looks like, focus on a finite subset of function values $\mathbf{f} = (f(x_1), f(x_2), \ldots, f(x_n))^\top$, for which

$$\mathbf{f} \sim \mathcal{N}(0, \Sigma), \quad \text{where } \Sigma_{ij} = k(x_i, x_j).$$


Some values of the random function

[Figure: randomly drawn function values, output f(x) against input x over [-5, 5].]


Joint Generation

To generate a random sample from a D-dimensional joint Gaussian with covariance matrix K and mean vector m: (in Octave or MATLAB)

z = randn(D,1);
y = chol(K)'*z + m;

% Note the transpose: chol(K) returns the upper triangular R with R'*R = K,
% so cov(R'*z) = R'*R = K as required.
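For example, a minimal sketch that assembles K from a squared exponential covariance (unit variance and length scale, both assumed) and draws one random function at inputs x:

x = linspace(-5, 5, 200)';            % inputs at which to sample the function
K = exp(-(x - x').^2 / 2);            % SE covariance, K(i,j) = k(x(i), x(j))
K = K + 1e-9*eye(numel(x));           % tiny jitter so chol succeeds numerically
m = zeros(numel(x), 1);               % zero mean vector
f = chol(K)'*randn(numel(x), 1) + m;  % one sample path; plot(x, f)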


Sequential Generation

Factorize the joint distribution:

$$p(f_1, \ldots, f_n\,|\,x_1, \ldots, x_n) = \prod_{i=1}^{n} p(f_i\,|\,f_{i-1}, \ldots, f_1, x_i, \ldots, x_1),$$

and generate the function values one at a time, using the Gaussian conditional:

$$p(\mathbf{x}, \mathbf{y}) = \mathcal{N}\!\left(\begin{bmatrix}\mathbf{a}\\ \mathbf{b}\end{bmatrix},\; \begin{bmatrix}A & B\\ B^\top & C\end{bmatrix}\right) \;\Longrightarrow\; p(\mathbf{x}\,|\,\mathbf{y}) = \mathcal{N}\big(\mathbf{a} + BC^{-1}(\mathbf{y}-\mathbf{b}),\; A - BC^{-1}B^\top\big).$$
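A minimal sketch of sequential generation, reusing x and K from the sketch above (zero mean, so a = b = 0 in the conditional):

n = numel(x);
f = zeros(n, 1);
for i = 1:n
  kpi = K(1:i-1, i);                              % cross-covariance with f_i
  mu = kpi' * (K(1:i-1, 1:i-1) \ f(1:i-1));       % B C^{-1} y  (empty when i = 1)
  v  = K(i,i) - kpi' * (K(1:i-1, 1:i-1) \ kpi);   % A - B C^{-1} B'
  f(i) = mu + sqrt(max(v, 0)) * randn();          % guard tiny negative round-off
end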


Function drawn at random from a Gaussian Process with Gaussian covariance

[Figure: a two-dimensional function drawn at random from a GP with Gaussian (squared exponential) covariance, plotted as a surface over inputs in [-6, 6] x [-6, 6].]


The Linear Model

In the linear model, the output is assumed to be a linear function of the input:

$$y = \mathbf{x}^\top\mathbf{w} + \varepsilon.$$

The least squares weights minimize the sum of squared residuals:

$$\hat{\mathbf{w}} = \operatorname*{argmin}_{\mathbf{w}} \sum_i \big(y^{(i)} - \mathbf{x}^{(i)\top}\mathbf{w}\big)^2.$$

Equivalently, under a Gaussian noise model the likelihood is

$$p(\mathbf{y}\,|\,X, \mathbf{w}) = \prod_i p\big(y^{(i)}\,|\,\mathbf{x}^{(i)}, \mathbf{w}\big) = \prod_i \frac{1}{\sqrt{2\pi\sigma_n^2}}\exp\!\Big(-\frac{(y^{(i)} - \mathbf{x}^{(i)\top}\mathbf{w})^2}{2\sigma_n^2}\Big),$$

which is maximized by

$$\hat{\mathbf{w}} = (XX^\top)^{-1}X\mathbf{y}.$$
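A minimal synthetic check of the maximum likelihood weights (inputs stored as the columns of X, matching the formula above; all numbers made up):

D = 2; n = 50;
X = randn(D, n);                      % n random D-dimensional inputs
w_true = [1.5; -0.7];                 % hypothetical true weights
y = X'*w_true + 0.1*randn(n, 1);      % noisy linear observations
w_ml = (X*X') \ (X*y);                % w = (X X')^{-1} X y, close to w_true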


Basis functions, or Linear-in-the-Parameters Models

The linear model is of little use if the data doesn't follow a linear relationship. But it can be generalized using basis functions:

$$y = \phi(\mathbf{x})^\top\mathbf{w} + \varepsilon,$$

for which the maximum likelihood weights are

$$\hat{\mathbf{w}} = \big(\Phi(X)\Phi(X)^\top\big)^{-1}\Phi(X)\mathbf{y},$$

where $\Phi(X)$ collects the basis function evaluations $\phi(\mathbf{x}^{(i)})$ as columns.
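The same one-liner with a hypothetical polynomial basis phi(x) = (1, x, x^2)' for scalar inputs:

x = linspace(-1, 1, 50)';
y = sin(2*x) + 0.1*randn(size(x));    % synthetic nonlinear data
Phi = [ones(size(x)), x, x.^2]';      % Phi(X), one column per case
w = (Phi*Phi') \ (Phi*y);             % ML weights in the basis function model
yhat = Phi'*w;                        % fitted values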


Maximum likelihood, parametric model

Supervised parametric learning:

data: x, y
model: $y = f_{\mathbf{w}}(x) + \varepsilon$

Gaussian likelihood:

$$p(\mathbf{y}\,|\,\mathbf{x}, \mathbf{w}, M_i) \propto \prod_c \exp\!\big(-\tfrac{1}{2}(y_c - f_{\mathbf{w}}(x_c))^2/\sigma^2_{\text{noise}}\big).$$

Maximize the likelihood:

$$\mathbf{w}_{\text{ML}} = \operatorname*{argmax}_{\mathbf{w}}\, p(\mathbf{y}\,|\,\mathbf{x}, \mathbf{w}, M_i),$$

and make predictions by plugging in the ML estimate: $p(y^*\,|\,x^*, \mathbf{w}_{\text{ML}}, M_i)$.


Bayesian Inference, parametric model

Supervised parametric learning:

data: x, y
model: $y = f_{\mathbf{w}}(x) + \varepsilon$

Gaussian likelihood:

$$p(\mathbf{y}\,|\,\mathbf{x}, \mathbf{w}, M_i) \propto \prod_c \exp\!\big(-\tfrac{1}{2}(y_c - f_{\mathbf{w}}(x_c))^2/\sigma^2_{\text{noise}}\big).$$

Parameter prior:

$$p(\mathbf{w}\,|\,M_i).$$

Posterior parameter distribution by Bayes' rule:

$$p(\mathbf{w}\,|\,\mathbf{x}, \mathbf{y}, M_i) = \frac{p(\mathbf{w}\,|\,M_i)\, p(\mathbf{y}\,|\,\mathbf{x}, \mathbf{w}, M_i)}{p(\mathbf{y}\,|\,\mathbf{x}, M_i)}.$$


Bayesian Inference, parametric model, cont.

Making predictions:

$$p(y^*\,|\,x^*, \mathbf{x}, \mathbf{y}, M_i) = \int p(y^*\,|\,\mathbf{w}, x^*, M_i)\, p(\mathbf{w}\,|\,\mathbf{x}, \mathbf{y}, M_i)\, d\mathbf{w}.$$

Marginal likelihood:

$$p(\mathbf{y}\,|\,\mathbf{x}, M_i) = \int p(\mathbf{w}\,|\,M_i)\, p(\mathbf{y}\,|\,\mathbf{x}, \mathbf{w}, M_i)\, d\mathbf{w}.$$

Model probability:

$$p(M_i\,|\,\mathbf{x}, \mathbf{y}) = \frac{p(M_i)\, p(\mathbf{y}\,|\,\mathbf{x}, M_i)}{p(\mathbf{y}\,|\,\mathbf{x})}.$$


Non-parametric Gaussian process models

In our non-parametric model, the parameters are the function itself!

Gaussian likelihood:

$$\mathbf{y}\,|\,\mathbf{x}, f(x), M_i \sim \mathcal{N}(\mathbf{f},\, \sigma^2_{\text{noise}} I).$$

(Zero mean) Gaussian process prior:

$$f(x)\,|\,M_i \sim \mathcal{GP}\big(m(x) \equiv 0,\; k(x, x')\big).$$

This leads to a Gaussian process posterior:

$$f(x)\,|\,\mathbf{x}, \mathbf{y}, M_i \sim \mathcal{GP}\big(m_{\text{post}}(x),\; k_{\text{post}}(x, x')\big),$$
$$m_{\text{post}}(x) = k(x, \mathbf{x})[K(\mathbf{x}, \mathbf{x}) + \sigma^2_{\text{noise}} I]^{-1}\mathbf{y},$$
$$k_{\text{post}}(x, x') = k(x, x') - k(x, \mathbf{x})[K(\mathbf{x}, \mathbf{x}) + \sigma^2_{\text{noise}} I]^{-1}k(\mathbf{x}, x'),$$

and a Gaussian predictive distribution:

$$y^*\,|\,x^*, \mathbf{x}, \mathbf{y}, M_i \sim \mathcal{N}\big(\,k(x^*, \mathbf{x})^\top[K + \sigma^2_{\text{noise}} I]^{-1}\mathbf{y},\;\; k(x^*, x^*) + \sigma^2_{\text{noise}} - k(x^*, \mathbf{x})^\top[K + \sigma^2_{\text{noise}} I]^{-1}k(x^*, \mathbf{x})\,\big).$$
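A minimal sketch of these predictive equations, assuming an SE covariance with unit variance and length scale, and toy data chosen purely for illustration:

k = @(a, b) exp(-(a - b').^2 / 2);    % covariance between two sets of inputs
sn2 = 0.01;                           % noise variance sigma^2_noise (assumed)
x = [-4; -2; 0; 1; 3];  y = sin(x);   % toy training set
xs = linspace(-5, 5, 101)';           % test inputs
L = chol(k(x, x) + sn2*eye(numel(x)))';      % lower Cholesky of K + sn2*I
Ks = k(x, xs);                        % k(x, x*) for all test points
mu = Ks' * (L' \ (L \ y));            % predictive mean
v = L \ Ks;                           % partial solve for the variance term
s2 = diag(k(xs, xs)) - sum(v.^2, 1)' + sn2;  % predictive variance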


Prior and Posterior

[Figure: functions drawn from the GP prior (left) and from the posterior after conditioning on a few observations (right), output f(x) against input x over [-5, 5].]

Predictive distribution:

$$p(y^*\,|\,x^*, \mathbf{x}, \mathbf{y}) = \mathcal{N}\big(\,k(x^*, \mathbf{x})^\top[K + \sigma^2_{\text{noise}} I]^{-1}\mathbf{y},\;\; k(x^*, x^*) + \sigma^2_{\text{noise}} - k(x^*, \mathbf{x})^\top[K + \sigma^2_{\text{noise}} I]^{-1}k(x^*, \mathbf{x})\,\big).$$


Factor Graph for Gaussian Process

A Factor Graph is a graphical representation of a multivariate distribution as a product of factors. The factors induce dependencies between the variables to which they have edges. Open nodes are stochastic (free) and shaded nodes are observed (clamped). Plates indicate repetitions.

[Figure: factor graph for a GP, with latent variables f(x_i), f(x_l), f(x_j) connected through a common factor; observations y(x_i), y(x_j) attached to their latent variables; plates over the training indices x_i in x_1, ..., x_n and other (test) indices x_j in x_1, ..., x_m.]

The predictive distribution for test case y(x_j) depends only on the corresponding latent variable f(x_j).

This explains why we can make inference using a finite amount of computation!


Some interpretation

Recall our main result:

$$\mathbf{f}^*\,|\,X^*, X, \mathbf{y} \sim \mathcal{N}\big(\,K(X^*, X)[K(X, X) + \sigma_n^2 I]^{-1}\mathbf{y},\;\; K(X^*, X^*) - K(X^*, X)[K(X, X) + \sigma_n^2 I]^{-1}K(X, X^*)\,\big).$$

The predictive mean is linear in the observations, and equally a linear combination of covariance functions, one centred on each training case:

$$\mu(x^*) = k(x^*, X)[K(X, X) + \sigma_n^2 I]^{-1}\mathbf{y} = \sum_{c=1}^{n}\beta_c\, y^{(c)} = \sum_{c=1}^{n}\alpha_c\, k(x^*, x^{(c)}).$$

For the predictive variance, the first term is the prior variance, from which we subtract a (positive) term telling how much the data X has explained. Note that the variance is independent of the observed outputs y.


The marginal likelihood

Log marginal likelihood:

$$\log p(\mathbf{y}\,|\,\mathbf{x}, M_i) = -\frac{1}{2}\mathbf{y}^\top K^{-1}\mathbf{y} - \frac{1}{2}\log|K| - \frac{n}{2}\log(2\pi)$$

is the combination of a data fit term and a complexity penalty. Occam's Razor is automatic.

The marginal likelihood can be optimized with respect to any unknown (hyper-)parameters $\theta$, using its gradient:

$$\frac{\partial \log p(\mathbf{y}\,|\,\mathbf{x}, \theta, M_i)}{\partial \theta_j} = \frac{1}{2}\mathbf{y}^\top K^{-1}\frac{\partial K}{\partial \theta_j}K^{-1}\mathbf{y} - \frac{1}{2}\operatorname{trace}\Big(K^{-1}\frac{\partial K}{\partial \theta_j}\Big).$$
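The log marginal likelihood is cheap to evaluate via a Cholesky factorization; a sketch continuing the toy example above (here K includes the noise term, as in the formula):

Ky = k(x, x) + sn2*eye(numel(x));     % noisy covariance of the targets
L = chol(Ky)';                        % lower Cholesky factor, L*L' = Ky
alpha = L' \ (L \ y);                 % Ky^{-1} y
lml = -0.5*y'*alpha - sum(log(diag(L))) - 0.5*numel(y)*log(2*pi);
% log|Ky| = 2*sum(log(diag(L))), which gives the middle term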


Example: Fitting the length scale parameter

Parameterized covariance function:

$$k(x, x') = v^2\exp\!\Big(-\frac{(x - x')^2}{2\ell^2}\Big) + \sigma_n^2\,\delta_{xx'}.$$

[Figure: observations together with the mean posterior predictive function for three length scales (too short, a good length scale, too long), inputs in [-10, 10].]

The mean posterior predictive function is plotted for 3 different length scales (the green curve corresponds to optimizing the marginal likelihood). Notice that an almost exact fit to the data can be achieved by reducing the length scale, but the marginal likelihood does not favour this!
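One simple (if crude) way to reproduce this experiment is to score a grid of length scales by their log marginal likelihood and keep the best; a sketch reusing the toy data above (v2 and sn2 assumed):

v2 = 1.0; sn2 = 0.01;                 % signal and noise variances (assumed)
ells = logspace(-1, 1, 50);           % candidate length scales
lml = zeros(size(ells));
for j = 1:numel(ells)
  Kj = v2*exp(-(x - x').^2/(2*ells(j)^2)) + sn2*eye(numel(x));
  Lj = chol(Kj)';
  lml(j) = -0.5*y'*(Lj'\(Lj\y)) - sum(log(diag(Lj))) - 0.5*numel(y)*log(2*pi);
end
[lml_best, best] = max(lml);  ell = ells(best);   % marginal-likelihood choice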


Why, in principle, does Bayesian Inference work?

Occam's Razor

[Figure: the marginal likelihood P(Y|M_i) plotted against all possible data sets Y. A model that is too simple assigns mass to few data sets; one that is too complex spreads its mass thinly over many; the "just right" model concentrates its mass on data sets like those observed.]


An illustrative analogous example

Imagine the simple task of fitting the variance, $\sigma^2$, of a zero-mean Gaussian to a set of n scalar observations. The log likelihood,

$$\log p(\mathbf{y}\,|\,\mu = 0, \sigma^2) = -\frac{1}{2\sigma^2}\sum_i y_i^2 - \frac{n}{2}\log\sigma^2 - \frac{n}{2}\log(2\pi),$$

again trades off a data fit term against a complexity penalty.


From random functions to covariance functions

Consider the class of linear functions:

$$f(x) = ax + b, \quad \text{where } a \sim \mathcal{N}(0, \alpha),\; b \sim \mathcal{N}(0, \beta).$$

We can compute the mean function:

$$\mu(x) = E[f(x)] = \iint f(x)\, p(a)\, p(b)\, da\, db = \int ax\, p(a)\, da + \int b\, p(b)\, db = 0,$$

and the covariance function:

$$k(x, x') = E[(f(x) - 0)(f(x') - 0)] = \iint (ax + b)(ax' + b)\, p(a)\, p(b)\, da\, db$$
$$= \int a^2 xx'\, p(a)\, da + \int b^2\, p(b)\, db + (x + x')\iint ab\, p(a)\, p(b)\, da\, db = \alpha xx' + \beta.$$


From random functions to covariance functions II

Consider the class of functions (sums of squared exponentials):

$$f(x) = \lim_{n\to\infty}\frac{1}{n}\sum_i \gamma_i\exp\big(-(x - i/n)^2\big), \quad \text{where } \gamma_i \sim \mathcal{N}(0, 1)\ \forall i$$
$$= \int_{-\infty}^{\infty}\gamma(u)\exp\big(-(x - u)^2\big)\, du, \quad \text{where } \gamma(u) \sim \mathcal{N}(0, 1)\ \forall u.$$

The mean function is

$$\mu(x) = E[f(x)] = \int \exp\big(-(x - u)^2\big)\int \gamma\, p(\gamma)\, d\gamma\, du = 0,$$

and the covariance function:

$$E[f(x)f(x')] = \int \exp\big(-(x - u)^2 - (x' - u)^2\big)\, du$$
$$= \int \exp\Big(-2\Big(u - \frac{x + x'}{2}\Big)^2 + \frac{(x + x')^2}{2} - x^2 - x'^2\Big)\, du \;\propto\; \exp\Big(-\frac{(x - x')^2}{2}\Big).$$

Thus, the squared exponential covariance function is equivalent to regression using infinitely many Gaussian shaped basis functions placed everywhere, not just at your training points!


Using finitely many basis functions may be dangerous!

[Figure: predictive mean and error bars from a model with finitely many Gaussian basis functions, inputs in [-10, 10]; the error bars collapse away from the basis function centres.]


Model Selection in Practice; Hyperparameters

There are two types of task: choosing the form of the covariance function, and setting its parameters.

Typically, our prior is too weak to quantify aspects of the covariance function, so we use a hierarchical model with hyperparameters. E.g., in Automatic Relevance Determination (ARD):

$$k(x, x') = v_0^2\exp\!\Big(-\sum_{d=1}^{D}\frac{(x_d - x'_d)^2}{2v_d^2}\Big), \qquad \text{hyperparameters } \theta = (v_0, v_1, \ldots, v_D, \sigma_n^2).$$
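A sketch of this ARD covariance for inputs stored as the rows of X (n x D) and Z (m x D), saved as k_ard.m; a large length scale v_d effectively switches dimension d off:

function K = k_ard(X, Z, v0, v)       % v = (v_1, ..., v_D) length scales
  r2 = 0;
  for d = 1:size(X, 2)
    r2 = r2 + (X(:,d) - Z(:,d)').^2 / v(d)^2;   % scaled squared distances
  end
  K = v0^2 * exp(-r2/2);
end
% e.g. K = k_ard(X, X, 1.0, [2.0, 0.5]) for D = 2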

[Figure: three sample functions f(x_1, x_2) over inputs in [-2, 2] x [-2, 2], drawn with different settings of the ARD length scales v_d.]


Rational quadratic covariance function

$$k_{\text{RQ}}(r) = \Big(1 + \frac{r^2}{2\alpha\ell^2}\Big)^{-\alpha}$$

with $\alpha, \ell > 0$ can be seen as a scale mixture (an infinite sum) of squared exponential (SE) covariance functions with different characteristic length-scales. Concretely, with $\tau = \ell^{-2}$ given a Gamma prior $p(\tau\,|\,\alpha, \beta) \propto \tau^{\alpha-1}\exp(-\alpha\tau/\beta)$ where $\beta^{-1} = \ell^2$:

$$k_{\text{RQ}}(r) = \int p(\tau\,|\,\alpha, \beta)\, k_{\text{SE}}(r\,|\,\tau)\, d\tau \;\propto\; \int \tau^{\alpha-1}\exp\!\Big(-\frac{\alpha\tau}{\beta}\Big)\exp\!\Big(-\frac{\tau r^2}{2}\Big)\, d\tau \;\propto\; \Big(1 + \frac{r^2}{2\alpha\ell^2}\Big)^{-\alpha}.$$
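As a function of the distance r this is a one-liner (alpha and ell assumed):

alpha = 2; ell = 1;
k_rq = @(r) (1 + r.^2/(2*alpha*ell^2)).^(-alpha);
% as alpha -> infinity this approaches the SE: exp(-r.^2/(2*ell^2))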


Rational quadratic covariance function II

[Figure: left, the rational quadratic covariance as a function of input distance (0 to 3) for alpha = 1/2 and alpha = 2; right, sample functions f(x) drawn from GPs with these covariances, x in [-5, 5].]


Matérn covariance functions

Stationary covariance functions can be based on the Matérn form:

$$k(x, x') = \frac{1}{\Gamma(\nu)2^{\nu-1}}\Big[\frac{\sqrt{2\nu}}{\ell}|x - x'|\Big]^{\nu} K_{\nu}\Big(\frac{\sqrt{2\nu}}{\ell}|x - x'|\Big),$$

where $K_\nu$ is the modified Bessel function of second kind of order $\nu$, and $\ell$ is the characteristic length scale.

Sample functions from Matérn forms are $\lceil\nu\rceil - 1$ times differentiable. Thus, the hyperparameter $\nu$ can control the degree of smoothness.

Special cases:

- $k_{\nu=1/2}(r) = \exp\big(-\frac{r}{\ell}\big)$: Ornstein-Uhlenbeck, not differentiable
- $k_{\nu=3/2}(r) = \big(1 + \frac{\sqrt{3}r}{\ell}\big)\exp\big(-\frac{\sqrt{3}r}{\ell}\big)$: once differentiable
- $k_{\nu=5/2}(r) = \big(1 + \frac{\sqrt{5}r}{\ell} + \frac{5r^2}{3\ell^2}\big)\exp\big(-\frac{\sqrt{5}r}{\ell}\big)$: twice differentiable
- $k_{\nu\to\infty}(r) = \exp\big(-\frac{r^2}{2\ell^2}\big)$: the SE, smooth (infinitely differentiable)
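The half-integer special cases above are simple closed forms; a sketch as functions of the distance r >= 0 (unit length scale assumed):

ell = 1;
k12 = @(r) exp(-r/ell);                                                  % nu = 1/2, OU
k32 = @(r) (1 + sqrt(3)*r/ell).*exp(-sqrt(3)*r/ell);                     % nu = 3/2
k52 = @(r) (1 + sqrt(5)*r/ell + 5*r.^2/(3*ell^2)).*exp(-sqrt(5)*r/ell);  % nu = 5/2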


Matérn covariance functions II

Univariate Matérn covariance function with unit characteristic length scale and unit variance:

[Figure: left, the Matérn covariance as a function of input distance (0 to 3) for nu = 1/2, nu = 1, nu = 2; right, corresponding sample functions f(x), x in [-5, 5].]


Periodic, smooth functions

To create a distribution over periodic functions of x, we can first map the inputs to $u = (\sin(x), \cos(x))^\top$, and then measure distances in the u space. Combined with the SE covariance function, with characteristic length scale $\ell$, we get:

$$k_{\text{periodic}}(x, x') = \exp\big(-2\sin^2(\pi(x - x'))/\ell^2\big)$$
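As code (unit period as in the formula; ell assumed):

ell = 1;
k_per = @(x, xp) exp(-2*sin(pi*(x - xp')).^2/ell^2);   % covariance of two input sets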

[Figure: random functions drawn from a GP with the periodic covariance, for two different length scales, x in [-2, 2].]


The Prediction Problem

[Figure: atmospheric CO2 concentration in ppm (320 to 420) plotted against year (1960 to 2020), with a question mark over the years ahead: the future values are to be predicted.]


Covariance Function

The covariance function consists of several terms, parameterized by a total of 11 hyperparameters:

- long-term smooth trend (squared exponential): $k_1(x, x') = \theta_1^2\exp\big(-(x - x')^2/(2\theta_2^2)\big)$,
- seasonal trend (quasi-periodic smooth): $k_2(x, x') = \theta_3^2\exp\big(-2\sin^2(\pi(x - x'))/\theta_5^2\big)\exp\big(-\tfrac{1}{2}(x - x')^2/\theta_4^2\big)$,
- short- and medium-term anomaly (rational quadratic): $k_3(x, x') = \theta_6^2\Big(1 + \frac{(x - x')^2}{2\theta_8\theta_7^2}\Big)^{-\theta_8}$,
- noise (independent Gaussian, and dependent): $k_4(x, x') = \theta_9^2\exp\big(-(x - x')^2/(2\theta_{10}^2)\big) + \theta_{11}^2\,\delta_{xx'}$.
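A sketch of the composite covariance as a function of r = x - x', with the four terms summed; the theta values below are placeholders, not the fitted hyperparameters:

th = ones(1, 11);                     % hypothetical hyperparameters theta_1..theta_11
k1 = @(r) th(1)^2*exp(-r.^2/(2*th(2)^2));                               % smooth trend
k2 = @(r) th(3)^2*exp(-2*sin(pi*r).^2/th(5)^2).*exp(-r.^2/(2*th(4)^2)); % seasonal
k3 = @(r) th(6)^2*(1 + r.^2/(2*th(8)*th(7)^2)).^(-th(8));               % anomalies
k4 = @(r) th(9)^2*exp(-r.^2/(2*th(10)^2)) + th(11)^2*(r == 0);          % noise
k  = @(r) k1(r) + k2(r) + k3(r) + k4(r);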


Long- and medium-term mean predictions

[Figure: long-term (left axis, ppm) and medium-term (right axis, ppm) mean predictions of the CO2 concentration against year.]


Mean Seasonal Component

[Figure: contour plot of the mean seasonal component of CO2 concentration (ppm) by month (J through D) and year (1960 to 2020); contour values range over roughly -3.6 to 3.6 ppm.]

Independent noise, magnitude $\theta_{11} = 0.19$ ppm.


Conclusions

Gaussian processes give a simple and flexible probabilistic framework for learning and prediction.

Many other models are (crippled versions of) GPs: Relevance Vector Machines (RVMs), Radial Basis Function (RBF) networks, splines, neural networks.


More on Gaussian Processes

Carl Edward Rasmussen and Christopher K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.

http://www.GaussianProcess.org/gpml

