Exposure to Bayesian Stats


Warning

About 5 equations (hopefully not too hard!) and about 10 figures.

This tutorial should be accessible even if the equations might look hard.


Outline:

- The need for (statistical) modelling
- Two examples (a linear model, tractography)
- Introduction to (frequentist) statistical inference
- Introduction to the Bayesian approach to parameter estimation
- More examples and Bayesian inference in practice
- Conclusions


Examples include: sample size determination; comparison between two (or more) groups: t-tests, Z-tests, analysis of variance (ANOVA), tests for proportions, etc.


One of the best ways to describe some data is by fitting a (statistical) model. Examples include: (linear/logistic/log-linear) regression models; survival analysis; longitudinal data analysis; infectious disease modelling; image/shape analysis; ...


Perhaps we can fit a straight line? y = α + βx + error

[Figure: scatter plot of the response (y) against the explanatory variable (x), with a range of roughly 0.2 to 1.0 on the axes.]
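The straight-line fit above can be sketched in a few lines. The data below are simulated (α = 0.2, β = 0.6 and the noise level are illustrative values, not from the tutorial), and the closed-form least-squares estimates recover them:

```python
import random

# Simulate data from y = alpha + beta * x + error and recover the
# parameters with the closed-form least-squares estimates for a line.
# alpha = 0.2, beta = 0.6 and the noise scale are illustrative values.
random.seed(1)
alpha, beta = 0.2, 0.6
xs = [i / 100 for i in range(100)]
ys = [alpha + beta * x + random.gauss(0, 0.05) for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
beta_hat = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))
alpha_hat = mean_y - beta_hat * mean_x
```

With 100 points and little noise, the estimates land close to the values used to simulate the data.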

An Example in DW-MRI

Suppose that we are interested in tractography. We use the diffusion tensor to model local diffusion within a voxel. The (model) assumption made is that local diffusion can be modelled with a 3D Gaussian distribution whose variance-covariance matrix is proportional to the diffusion tensor, D.


The resulting diffusion-weighted signal μ_i along a gradient direction g_i with b-value b_i is modelled as:

μ_i = S0 exp{ −b_i g_i^T D g_i }    (1)

where

D = | D11 D12 D13 |
    | D21 D22 D23 |
    | D31 D32 D33 |

S0 is the signal with no diffusion-weighting gradients applied (i.e. b = 0). The eigenvectors of D give an orthogonal coordinate system and define the orientation of the ellipsoid axes. The eigenvalues of D give the lengths of these axes. If we sort the eigenvalues by magnitude we can derive the orientation of the major axis of the ellipsoid and the orientations of the minor axes.
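A small numerical sketch of equation (1), with a hypothetical diagonal tensor D (the tensor values, b-value and S0 below are illustrative assumptions, not numbers from the tutorial):

```python
import math

# Evaluate mu_i = S0 * exp(-b * g^T D g) for one gradient direction.
# The diagonal tensor below (units mm^2/s) is hypothetical, with its
# major axis along x; b = 1000 s/mm^2 and S0 = 1.0 are also assumed.
S0 = 1.0
b = 1000.0
D = [[1.7e-3, 0.0, 0.0],
     [0.0, 0.3e-3, 0.0],
     [0.0, 0.0, 0.3e-3]]

def signal(g):
    """Predicted signal along the unit gradient direction g."""
    Dg = [sum(D[r][c] * g[c] for c in range(3)) for r in range(3)]
    quad = sum(g[r] * Dg[r] for r in range(3))  # g^T D g
    return S0 * math.exp(-b * quad)

# Attenuation is strongest along the major (x) axis of the ellipsoid:
s_x = signal([1.0, 0.0, 0.0])   # exp(-1.7)
s_y = signal([0.0, 1.0, 0.0])   # exp(-0.3)
```

The stronger signal loss along x is exactly what tractography exploits: it marks the dominant fibre direction in the voxel.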


Although this may look a bit complicated, it can actually be written in terms of a linear model: taking logarithms of (1) gives ln(μ_i / S0) = −b_i g_i^T D g_i, which is linear in the six distinct elements of D.


Models have parameters, some of which (if not all) are unknown, e.g. α and β. In statistical modelling we are interested in inferring (e.g. estimating) the unknown parameters from data: inference. Parameter estimation needs to be done in a formal way. In other words, we ask ourselves the question: what are the best values for α and β such that the proposed model (straight line) best describes the observed data? Should we only look for a single estimate for (α, β)? No! Why? Because there may be many pairs (α, β) (often not very different from each other) which may describe the data equally well: uncertainty.


The likelihood function plays a fundamental role in statistical inference. In non-technical terms, the likelihood function, when evaluated at a particular point, say (α0, β0), gives the probability of observing the (observed) data given that the parameters (α, β) take the values α0 and β0. Let's think of a very simple example. Suppose we are interested in estimating the probability of success (denoted by θ) for one particular experiment. Data: out of 100 times we repeated the experiment, we observed 80 successes. What about L(0.1), L(0.7), L(0.99)?
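The likelihood for this example is binomial, so the values asked about on the slide can be computed directly:

```python
from math import comb

# Binomial likelihood for the slide's data: n = 100 trials, k = 80
# successes. L(theta) = C(n, k) * theta**k * (1 - theta)**(n - k).
n, k = 100, 80

def likelihood(theta):
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

# The sample proportion 0.8 gives a (much) higher likelihood than the
# values asked about on the slide:
ordering = likelihood(0.8) > likelihood(0.7) > likelihood(0.99) > likelihood(0.1)
```

L(0.1) and L(0.99) are vanishingly small: data with 80/100 successes are extremely unlikely under those parameter values, which is exactly the information the likelihood carries.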


Frequentist inference tells us that we should look for parameter values that maximise the likelihood function (the maximum likelihood estimator, MLE) and associate parameter uncertainty with the calculation of standard errors . . . which in turn enable us to construct confidence intervals for the parameters. What's wrong with that? Nothing, but . . . it is approximate, counter-intuitive (the data are assumed to be random, the parameter is fixed) and often mathematically intractable.
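For the binomial example the MLE and a Wald confidence interval are available in closed form (a standard result, not code from the tutorial):

```python
from math import sqrt

# MLE and 95% Wald confidence interval for the 80/100 example.
# theta_hat = k / n maximises the binomial likelihood; its standard
# error is sqrt(theta_hat * (1 - theta_hat) / n).
n, k = 100, 80
theta_hat = k / n                                   # 0.8
se = sqrt(theta_hat * (1 - theta_hat) / n)          # 0.04
ci_95 = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
```

The interval relies on a normal approximation to the sampling distribution of θ̂, which is the "approximate" part criticised on this slide.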


For instance, we cannot ask (or answer!) questions such as

1. What is the probability that the (unknown) probability of success in the previous experiment is greater than 0.6? I.e. compute the quantity P(θ > 0.6) . . .
2. Or something like P(0.3 < θ < 0.9);

Sometimes we are interested in (not necessarily simple) functions of parameters, e.g. θ1 + θ2, or [θ1/(1 − θ1)] / [θ2/(1 − θ2)].

Whilst in some cases the frequentist approach offers a solution which is not exact but approximate, there are others where it cannot, or where it is very hard to do so.


Bayesian Inference

When drawing inference within a Bayesian framework, the data are treated as a fixed quantity and the parameters are treated as random variables. That allows us to assign probabilities to parameters (and models), making the inferential framework far more intuitive and more straightforward (at least in principle!).


Denote by θ the parameters and by y the observed data. Bayes' theorem allows us to write:

π(θ | y) = π(y | θ) π(θ) / π(y),  where  π(y) = ∫ π(y | θ′) π(θ′) dθ′

where π(θ | y) denotes the posterior distribution of the parameters given the data; π(y | θ) = L(θ) is the likelihood function; π(θ) is the prior distribution of θ, which expresses our beliefs about the parameters before we see the data; and π(y), often called the marginal likelihood, plays the role of the normalising constant of the posterior density.
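For the binomial example, Bayes' theorem can be applied in closed form by pairing the binomial likelihood with a Beta prior (a standard conjugate choice; the flat Beta(1, 1) prior below is an illustrative assumption). The same sketch answers the earlier question P(θ > 0.6):

```python
import math

# Conjugate update: a Beta(a, b) prior plus k successes in n trials
# gives a Beta(a + k, b + n - k) posterior. The flat Beta(1, 1) prior
# is an illustrative choice.
a, b = 1.0, 1.0
n, k = 100, 80
a_post, b_post = a + k, b + (n - k)        # posterior Beta(81, 21)
post_mean = a_post / (a_post + b_post)     # 81/102

def posterior_pdf(theta):
    log_norm = (math.lgamma(a_post + b_post)
                - math.lgamma(a_post) - math.lgamma(b_post))
    return math.exp(log_norm + (a_post - 1) * math.log(theta)
                    + (b_post - 1) * math.log(1 - theta))

# P(theta > 0.6) by midpoint integration of the posterior density:
m = 20000
h = (1.0 - 0.6) / m
p_gt_06 = sum(posterior_pdf(0.6 + (i + 0.5) * h) for i in range(m)) * h
```

With 80 successes in 100 trials the posterior puts essentially all of its mass above 0.6, so the Bayesian answer to the frequentist's forbidden question is a probability very close to 1.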

Everything is assigned distributions (prior, posterior); we are allowed to incorporate prior information about the parameter . . . which is then updated by using the likelihood function . . . leading to the posterior distribution, which tells us everything we need about the parameter.


One of the biggest criticisms of the Bayesian paradigm is the use of the prior distribution: choose a very informative prior to come up with favourable results; I know nothing about the parameter, so what prior do I choose? Arguments against that criticism: priors should be chosen before we see the data, and it is very often the case that there is some prior information available (e.g. previous studies); if we know nothing about the parameter, then we can assign to it a so-called uninformative (or vague) prior; if there is a lot of data available then the posterior distribution will not be influenced (too much) by the prior, and vice versa.
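The last point can be illustrated numerically: the posterior mean under a Beta(a, b) prior after k successes in n trials is (a + k)/(a + b + n). The priors and sample sizes below are illustrative choices, not numbers from the tutorial:

```python
# Posterior mean under a Beta(a, b) prior after k successes in n trials
# is (a + k) / (a + b + n). All numbers below are illustrative.
def post_mean(a, b, k, n):
    return (a + k) / (a + b + n)

flat, strong = (1, 1), (50, 50)      # vague vs informative prior (both centred at 0.5)
small, large = (8, 10), (800, 1000)  # (successes, trials)

small_gap = abs(post_mean(*flat, *small) - post_mean(*strong, *small))
large_gap = abs(post_mean(*flat, *large) - post_mean(*strong, *large))
# With 100x more data, the two priors give almost the same answer.
```

With only 10 observations the two priors disagree by more than 0.2; with 1000 observations the disagreement shrinks below 0.03, so the data dominate the prior.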


Although Bayesian inference has been around for a long time, it is only in the last two decades that it has really revolutionised the way we do statistical modelling. Although, in principle, Bayesian inference is straightforward and intuitive, when it comes to computation it can be very hard to implement. Thanks to computational developments such as Markov chain Monte Carlo (MCMC), doing Bayesian inference is now a lot easier.
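A minimal random-walk Metropolis sampler for the 80/100 binomial example gives the flavour of what MCMC does (a sketch with assumed tuning values, not code from the tutorial):

```python
import math
import random

# Minimal random-walk Metropolis sampler for theta in the 80/100
# binomial example with a flat prior; purely a sketch of MCMC.
random.seed(42)
n, k = 100, 80

def log_post(theta):
    # Log posterior up to a constant (flat prior + binomial likelihood).
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

theta = 0.5
samples = []
for _ in range(20000):
    prop = theta + random.gauss(0, 0.05)            # random-walk proposal
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        theta = prop                                # accept the move
    samples.append(theta)

burned = samples[2000:]                             # discard burn-in
mcmc_mean = sum(burned) / len(burned)               # close to 81/102
```

The sampler never needs the normalising constant π(y): only ratios of posterior densities enter the accept/reject step, which is why MCMC made Bayesian computation practical.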


[Figures: posterior distributions of θ (the probability of success), plotted over θ ∈ [0, 1]; four of the plots are for 83/100 observed successes and one is for 8/10 observed successes.]

Suppose that we are interested in testing two competing model hypotheses, M1 and M2. Within a Bayesian framework, the model index M can be treated as an extra parameter (as well as the other parameters in M1 and M2). So, it is natural to ask: what is the posterior model probability given the observed data, i.e. P(M1 | y) or P(M2 | y)? Bayes' theorem:

P(M1 | y) = π(y | M1) π(M1) / π(y)

where π(y | M1) is the marginal likelihood (also called the evidence) and π(M1) is the prior model probability.

Given a model selection problem in which we have to choose between two models on the basis of observed data y, the plausibility of the two different models M1 and M2, parametrised by model parameter vectors θ1 and θ2, is assessed by the Bayes factor:

P(y | M1) / P(y | M2) = ∫ π(y | θ1, M1) π(θ1) dθ1 / ∫ π(y | θ2, M2) π(θ2) dθ2

Bayesian model comparison does not depend on the particular parameter values used by each model. Instead, it considers the probability of the model averaged over all possible parameter values. This is similar to a likelihood-ratio test, but instead of maximising the likelihood, we average over the parameters.
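As a concrete, illustrative sketch (not an example from the tutorial): compare M1, a binomial model with θ free under a flat Beta(1, 1) prior, against M2, which fixes θ = 0.5. Both marginal likelihoods are available in closed form, so the Bayes factor needs no MCMC here:

```python
from math import comb, lgamma, exp, log

n, k = 100, 80

def log_beta_fn(a, b):
    # log of the Beta function B(a, b).
    return lgamma(a) + lgamma(b) - lgamma(a + b)

# M1: theta ~ Beta(1, 1). The marginal likelihood integrates the
# binomial likelihood against the prior: C(n, k) * B(k + 1, n - k + 1),
# which simplifies to 1 / (n + 1).
log_m1 = log(comb(n, k)) + log_beta_fn(k + 1, n - k + 1)

# M2: theta fixed at 0.5, so the marginal likelihood is just the
# likelihood evaluated there.
log_m2 = log(comb(n, k)) + n * log(0.5)

bayes_factor = exp(log_m1 - log_m2)   # strongly favours M1 for 80/100
```

Note how the averaging works: M1's evidence is the likelihood integrated over the whole prior, not its maximised value, which is what builds in the automatic penalty for model complexity discussed next.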


Why bother? An advantage of the use of Bayes factors is that they automatically, and quite naturally, include a penalty for including too much model structure. They thus guard against overfitting. No free lunch! In practical situations, the calculation of Bayes factors relies on the employment of computationally intensive methods, such as reversible-jump Markov chain Monte Carlo (RJ-MCMC), which require a certain amount of expertise from the end-user.


We assume that the voxel's intensity can be modelled by assuming that S_i / S0 ~ N(μ_i, σ²), where we could consider (at least) two different models:

1. Diffusion tensor model (Model 1) assumes that: μ_i = exp{ −b_i g_i^T D g_i }
2. Simple partial volume model (Model 2) assumes that: μ_i = f exp{ −b_i d } + (1 − f) exp{ −b_i d g_i^T C g_i }


Suppose that we have some measurements (intensities) for each voxel. We could fit the two different models (to the same dataset). Question: how do we tell which model fits the data best, taking into account the uncertainty associated with the parameters in each model? Answer: calculate the Bayes factor!


Conclusions

Quantification of the uncertainty both in parameter estimation and in model choice is essential in any modelling exercise. A Bayesian approach offers a natural framework to deal with parameter and model uncertainty. It offers much more than a single best fit or any sort of sensitivity analysis. There is no free lunch, unfortunately: to do fancy things, one often has to write one's own computer programs. Software available: R, WinBUGS, BayesX . . .

