Bayesian Methods of Parameter Estimation


Aciel Eshky

University of Edinburgh

School of Informatics

Introduction

In order to motivate the idea of parameter estimation, we first need to understand the notion of mathematical modeling.

What is the idea behind modeling real-world phenomena? Mathematically modeling an aspect of the real world enables us to better understand it and better explain it, and perhaps enables us to reproduce it, either on a large scale or on a simplified scale that characterizes only the critical parts of that phenomenon [1].

How do we model these real-life phenomena? They are captured by means of distribution models, which are extracted or learned directly from data gathered about them.

So, what do we mean by parameter estimation? Every distribution model has a set of parameters that need to be estimated. These parameters specify any constants appearing in the model and provide a mechanism for efficient and accurate use of data [2].

Before discussing the Bayesian approach to parameter estimation, it is important to understand the classical frequentist approach.

The frequentist approach is the classical approach to parameter estimation. It assumes that there is an unknown but objectively fixed parameter θ [3]. It chooses the value of θ which maximizes the likelihood of the observed data [4]; in other words, it makes the available data as likely as possible. A common example is the maximum likelihood estimator (MLE).

The frequentist approach is statistically driven, and defines probability as the frequency of successful trials over the total number of trials in an experiment. For example, in a coin toss experiment, we toss the coin 100 times and it comes out 25 times as heads and 75 times as tails. The probabilities are extracted directly from the given data as: P(heads) = 1/4 and P(tails) = 3/4.
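This frequency computation is simple enough to state as code; the following minimal Python sketch (not from the source) reproduces the numbers above:

    # Frequentist (maximum likelihood) estimate of a coin's bias from counts.
    # Data from the example above: 25 heads and 75 tails out of 100 tosses.
    heads, tails = 25, 75
    total = heads + tails

    # For a Bernoulli parameter, the MLE is simply the observed frequency.
    p_heads = heads / total
    p_tails = tails / total

    print(p_heads, p_tails)  # 0.25 0.75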

Distribution models that use the frequentist approach to estimate their parameters are classified as generative models [5], which model the distribution of the entire available data, assumed to have been generated with a fixed θ.

In contrast, the Bayesian approach allows probability to represent subjective uncertainty or subjective belief [3]. It fixes the data and instead assumes possible values for θ.

Taking the same coin toss example, the probabilities would represent our subjective belief, rather than the number of successful trials over the total trials. If we believe that heads and tails are equally likely, the probabilities would become: P(heads) = 1/2 and P(tails) = 1/2.

Distribution models that use the Bayesian approach to estimate their parameters are classified as conditional models, also known as discriminative models, which do not require us to model much of the data and are instead only interested in how a particular part of the data depends on the other parts [5].

Basics of Bayesian inference

This description is attributed to the following reference [6]. Bayesian inference grows out of the simple formula known as Bayes' rule. Assume we have two random variables A and B. A principal rule of probability theory, known as the chain rule, allows us to specify the joint probability of A and B taking on particular values a and b, P(a, b), as the product of the conditional probability that A will take on value a given that B has taken on value b, P(a|b), and the marginal probability that B takes on value b, P(b). This gives us:

Joint probability = Conditional probability × Marginal probability

Thus we have:

P(a, b) = P(a|b) P(b)

There is nothing special about our choice to condition on B rather than A, and thus equally we have:

P(a, b) = P(b|a) P(a)

When combining the two we get:

P(a|b) P(b) = P(b|a) P(a)

rearranged as:

P(a|b) = P(b|a) P(a) / P(b)

and equally written in a marginalized form, where the sum in the denominator runs over all values a' that the variable A can take:

P(a|b) = P(b|a) P(a) / Σ_{a'} P(b|a') P(a')

This expression is Bayes' rule, which indicates that we can compute the conditional probability of a variable A given the variable B from the conditional probability of B given A. This introduces the notion of prior and posterior knowledge.


A prior probability is the probability available to us beforehand, before making any additional observations. A posterior probability is the probability obtained from the prior probability after making an additional observation that updates the prior knowledge available [6]. In our example, the prior probability would be P(a) and the posterior probability would be P(a|b); the additional observation was observing that B takes on value b.
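As a concrete check of the rule, here is a small Python sketch that verifies the identity on a made-up joint distribution over two binary variables (the numbers are purely illustrative, not from the text):

    # Verify Bayes' rule on an arbitrary joint distribution P(a, b)
    # over two binary variables A and B.
    P = {(0, 0): 0.1, (0, 1): 0.3,
         (1, 0): 0.2, (1, 1): 0.4}

    def P_A(a):                  # marginal P(a) = sum_b P(a, b)
        return sum(P[(a, b)] for b in (0, 1))

    def P_B(b):                  # marginal P(b) = sum_a P(a, b)
        return sum(P[(a, b)] for a in (0, 1))

    def P_b_given_a(b, a):       # from the chain rule: P(b|a) = P(a, b) / P(a)
        return P[(a, b)] / P_A(a)

    # Bayes' rule: posterior P(a|b) from the prior P(a) and likelihood P(b|a),
    # with the denominator marginalized over all values a' of A.
    def bayes(a, b):
        num = P_b_given_a(b, a) * P_A(a)
        den = sum(P_b_given_a(b, ap) * P_A(ap) for ap in (0, 1))
        return num / den

    # The direct definition P(a|b) = P(a, b) / P(b) agrees with Bayes' rule.
    assert abs(bayes(1, 0) - P[(1, 0)] / P_B(0)) < 1e-12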

Bayes' rule obtains its strength from the assumptions we make about the random variables and the meaning of probability [7]. When dealing with parameter estimation, θ could be a parameter that needs to be estimated from some given evidence or data d. The probability of the data given the parameter, P(d|θ), is commonly referred to as the likelihood. And so, we would be computing the probability of a parameter given the likelihood of some data:

P(θ|d) = P(d|θ) P(θ) / Σ_{θ'} P(d|θ') P(θ')

Bayesian parameter estimation specifies how we should update our beliefs in the light of newly introduced evidence.
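As a concrete sketch of this computation, the following Python snippet evaluates the posterior over a small grid of candidate values of θ for a coin; the uniform prior and the ten illustrative tosses are assumptions made for the sketch, not taken from the text:

    # Posterior over a discrete grid of candidate values for a coin's bias
    # theta = P(heads), computed as P(theta|d) proportional to P(d|theta) P(theta).
    thetas = [0.1, 0.3, 0.5, 0.7, 0.9]   # candidate parameter values
    prior = [0.2] * len(thetas)          # uniform prior belief (assumption)

    heads, tails = 3, 7                  # illustrative observed data d

    # Likelihood of the data under each candidate theta (independent Bernoulli trials).
    likelihood = [t ** heads * (1 - t) ** tails for t in thetas]

    # Multiply prior by likelihood, then normalize so the posterior sums to 1.
    unnorm = [l * p for l, p in zip(likelihood, prior)]
    posterior = [u / sum(unnorm) for u in unnorm]

    print(posterior)  # most of the mass lands on theta = 0.3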

This summary is attributed to the following references [8, 4]. The Bayesian approach to parameter estimation works as follows:

1. Formulate our knowledge about a situation.

2. Gather data.

3. Obtain posterior knowledge that updates our beliefs.

How do we formulate our knowledge about a situation?

a. Define a distribution model which expresses qualitative aspects of our knowledge about the situation. This model will have some unknown parameters, which will be dealt with as random variables [4].

b. Specify a prior probability distribution which expresses our subjective beliefs and subjective uncertainty about the unknown parameters, before seeing the data.

After gathering the data, how do we obtain posterior knowledge?

c. Compute the posterior probability distribution, which estimates the unknown parameters using the rules of probability, given the observed data, presenting us with updated beliefs. A sketch of all three steps appears below.
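To make the three steps concrete for the coin example, here is a minimal Python sketch assuming a Beta prior over the coin's bias, a standard conjugate choice that the text does not prescribe:

    # The three steps applied to a coin whose bias is theta = P(heads).
    # Step 1: formulate our knowledge as a Beta(alpha, beta) prior over theta.
    alpha, beta = 2.0, 2.0         # mild prior belief that the coin is roughly fair

    # Step 2: gather data, here the observed counts from the coin toss example.
    heads, tails = 25, 75

    # Step 3: obtain the posterior. For a Beta prior and Bernoulli trials the
    # posterior is Beta(alpha + heads, beta + tails), by conjugacy.
    alpha_post, beta_post = alpha + heads, beta + tails

    posterior_mean = alpha_post / (alpha_post + beta_post)
    print(posterior_mean)          # about 0.26, pulled from 0.25 toward the prior mean 0.5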

To illustrate this Bayesian paradigm of parameter estimation, let us apply it to a simple example concerning visual perception. The example is attributed to the following reference [9].

The perception problem is modeled using observed image data, denoted as d. The observable scene properties, denoted as θ, constitute the parameters to be estimated for this model. We can define probabilities as follows:

P(θ) represents the probability distribution of the observable scene properties, which are the parameters we need to estimate, or in other words, update in the light of new data. This probability constitutes the prior probability.

P(d|θ) represents the probability distribution of the images given the observable scene properties. This probability constitutes the likelihood.

P(d) represents the probability of the images, which is a constant that can be normalized over.

P(θ|d) represents the probability distribution of the observable scene properties given the images. This probability constitutes the posterior probability of the estimated parameters.

By applying Bayes' theorem we arrive at:

P(θ|d) = P(d|θ) P(θ) / P(d)

And equally:

P(θ|d) = P(d|θ) P(θ) / Σ_{θ'} P(d|θ') P(θ')

Since P(d) does not depend on θ, this is often abbreviated with a normalizing constant k = 1/P(d) as:

P(θ|d) = k P(d|θ) P(θ)

An example

Consider the following problem: given the silhouette of an object, we need to infer what that object is.

The prior distribution over objects, P(Object) = P(θ), is:

Object      Probability
Cube        0.2
Cylinder    0.3
Sphere      0.1
Prism       0.4


The likelihood of a silhouette given an object, P(Silhouette|Object) = P(d|θ), is:

Object      Square    Circle    Trapezoid
Cube        1.0       0.0       0.0
Cylinder    0.6       0.4       0.0
Sphere      0.0       1.0       0.0
Prism       0.4       0.0       0.6

The posterior distribution over objects given the silhouette, P(Object|Silhouette) = P(θ|d), can then be computed. For example, given d = Square, the normalizing constant is k = 1/(1.0 × 0.2 + 0.6 × 0.3 + 0.0 × 0.1 + 0.4 × 0.4) = 1/0.54, and:

P(Cube|Square) = k × 1.0 × 0.2 ≈ 0.370

P(Cylinder|Square) = k × 0.6 × 0.3 ≈ 0.333

P(Sphere|Square) = k × 0.0 × 0.1 = 0.0

P(Prism|Square) = k × 0.4 × 0.4 ≈ 0.296

And thus we have updated our beliefs in the light of newly introduced data.
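These numbers can be reproduced in a few lines of Python; the following is a sketch of the table lookup and normalization, not code from the source:

    # Posterior over objects given an observed silhouette, P(object|silhouette).
    prior = {'Cube': 0.2, 'Cylinder': 0.3, 'Sphere': 0.1, 'Prism': 0.4}
    likelihood = {                      # P(silhouette | object), from the table
        'Cube':     {'Square': 1.0, 'Circle': 0.0, 'Trapezoid': 0.0},
        'Cylinder': {'Square': 0.6, 'Circle': 0.4, 'Trapezoid': 0.0},
        'Sphere':   {'Square': 0.0, 'Circle': 1.0, 'Trapezoid': 0.0},
        'Prism':    {'Square': 0.4, 'Circle': 0.0, 'Trapezoid': 0.6},
    }

    d = 'Square'                        # observed silhouette
    unnorm = {obj: likelihood[obj][d] * prior[obj] for obj in prior}
    k = 1.0 / sum(unnorm.values())      # normalizing constant, k = 1/P(d)
    posterior = {obj: k * u for obj, u in unnorm.items()}

    print(posterior)  # Cube: 0.370..., Cylinder: 0.333..., Sphere: 0.0, Prism: 0.296...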

References

[1] Amos Storkey. MLPR lectures: Distributions and models. http://www.inf.ed.ac.uk/teaching/courses/mlpr/lectures/distnsandmodelsprint4up.pdf, 2009. School of Informatics, University of Edinburgh.

[2] J.V. Beck and K.J. Arnold. Parameter Estimation in Engineering and Science. Wiley Series in Probability and Mathematical Statistics. J. Wiley, New York, 1977.

[3] Algorithms for graphical models (AGM): Bayesian parameter estimation. www-users.cs.york.ac.uk/~jc/teaching/agm/lectures/lect14/lect14.pdf, November 2006. University of York, Department of Computer Science.

[4] Chris Williams. PMR lectures: Bayesian parameter estimation. http://www.inf.ed.ac.uk/teaching/courses/pmr/slides/bayespe-2x2.pdf, 2008. School of Informatics, University of Edinburgh.

[5] Amos Storkey. MLPR lectures: Naive Bayes and Bayesian methods. http://www.inf.ed.ac.uk/teaching/courses/mlpr/lectures/naiveandbayesprint4up.pdf, 2009. School of Informatics, University of Edinburgh.

[6] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, August 2006.

[7] Thomas L. Griffiths, Charles Kemp, and Joshua B. Tenenbaum. Bayesian models of cognition. Technical report.

[8] Radford M. Neal. Bayesian methods for machine learning. www.cs.toronto.edu/pub/radford/bayes-tut.pdf, December 2004. NIPS tutorial, University of Toronto.

[9] Olivier Aycard and Luis Enrique Sucar. Bayesian techniques in vision and perception. http://homepages.inf.ed.ac.uk/rbf/IAPR.
