
A Gentle Tutorial in Bayesian Statistics

Theo Kypraios
http://www.maths.nott.ac.uk/~tk
School of Mathematical Sciences - Division of Statistics

**Division of Radiological and Imaging Sciences Away Day**

Warning

**This talk includes:**

- about 5 equations (hopefully not too hard!)
- about 10 figures

This tutorial should be accessible even if the equations might look hard.

Outline of the Talk

- The need for (statistical) modelling; two examples (a linear model / tractography)
- Introduction to statistical inference (frequentist)
- Introduction to the Bayesian approach to parameter estimation
- More examples and Bayesian inference in practice
- Conclusions

**Use of Statistics in Clinical Sciences (1)**

Examples include:

- Sample size determination
- Comparison between two (or more) groups: t-tests, Z-tests; analysis of variance (ANOVA); tests for proportions etc.
- Receiver Operating Characteristic (ROC) curves
- Clinical trials
- ...


Use of Statistics in Clinical Sciences (2)

One of the best ways to describe some data is by fitting a (statistical) model. Examples include:

- (linear/logistic/loglinear) regression models
- survival analysis
- longitudinal data analysis
- infectious disease modelling
- image/shape analysis
- ...


Aims of Statistical Modelling: A Simple Example

Perhaps we can fit a straight line?

y = α + β x + error

[Figure: scatter plot of response (y) against explanatory (x), with a fitted straight line.]
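The straight-line fit sketched above can be computed with ordinary least squares. The snippet below is a minimal illustration, not the talk's analysis: the true values α = 0.2 and β = 0.25, the noise level and the sample size are all made-up numbers.

```python
import math
import random

# Simulate data from y = alpha + beta*x + Gaussian noise, then recover the
# parameters by ordinary least squares (the classical "best fit" line).
random.seed(1)
alpha_true, beta_true = 0.2, 0.25
xs = [random.uniform(-2, 2) for _ in range(200)]
ys = [alpha_true + beta_true * x + random.gauss(0, 0.1) for x in xs]

# Closed-form least-squares estimates:
#   beta_hat  = cov(x, y) / var(x)
#   alpha_hat = mean(y) - beta_hat * mean(x)
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
beta_hat = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
alpha_hat = my - beta_hat * mx

print(round(alpha_hat, 2), round(beta_hat, 2))
```

With 200 points and little noise, the estimates land very close to the simulated truth, which is exactly the "single best estimate" the next slides will question.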

An Example in DW-MRI

Suppose that we are interested in tractography. We use the diffusion tensor D to model local diffusion within a voxel. The (model) assumption made is that local diffusion can be modelled with a 3D Gaussian distribution whose variance-covariance matrix is proportional to the diffusion tensor.

An Example in DW-MRI

The resulting diffusion-weighted signal µi along a gradient direction gi with b-value bi is modelled as:

µi = S0 exp{−bi giᵀ D gi}    (1)

where

      ( D11 D12 D13 )
  D = ( D21 D22 D23 )
      ( D31 D32 D33 )

and S0 is the signal with no diffusion-weighting gradients applied (i.e. b0 = 0). The eigenvectors of D give an orthogonal coordinate system and define the orientation of the ellipsoid axes; the eigenvalues of D give the lengths of these axes. If we sort the eigenvalues by magnitude we can derive the orientation of the major axis of the ellipsoid and the orientations of the minor axes.
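Equation (1) is easy to evaluate directly. The sketch below uses a hypothetical diagonal tensor with strong diffusion along x (as if a fibre ran along that axis); S0, the b-value and the tensor entries are illustrative numbers, not values from the talk.

```python
import math

# Sketch of equation (1): mu_i = S0 * exp(-b_i * g_i^T D g_i).
S0 = 1000.0
b = 1000.0  # a typical b-value in s/mm^2
D = [[1.7e-3, 0.0,    0.0],   # strong diffusion along x
     [0.0,    0.3e-3, 0.0],
     [0.0,    0.0,    0.3e-3]]

def signal(g, S0=S0, b=b, D=D):
    """Predicted diffusion-weighted signal along gradient direction g."""
    quad = sum(g[r] * D[r][c] * g[c] for r in range(3) for c in range(3))
    return S0 * math.exp(-b * quad)

# The signal is attenuated most along the direction of greatest diffusion:
print(signal([1.0, 0.0, 0.0]))  # gradient along x
print(signal([0.0, 1.0, 0.0]))  # gradient along y
```

The anisotropy of D is what tractography exploits: comparing signals across gradient directions recovers the orientation of the major axis.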

An Example in DW-MRI

Although this may look a bit complicated, it can actually be written in terms of a linear model.

Taken from Sotiropoulos, Jones, Bai + K (2010).

Aims of Statistical Modelling

Models have parameters, some of which (if not all) are unknown, e.g. α and β. In statistical modelling we are interested in inferring (e.g. estimating) the unknown parameters from data → inference. Parameter estimation needs to be done in a formal way. In other words we ask ourselves the question: what are the best values for α and β such that the proposed model (straight line) best describes the observed data?

Should we only look for a single estimate for (α, β)? No! Why? Because there may be many pairs (α, β) (often not very different from each other) which may equally well describe the data → uncertainty.


The likelihood function

The likelihood function plays a fundamental role in statistical inference. In non-technical terms, the likelihood function evaluated at a particular point, say (α0, β0), gives the probability of observing the (observed) data given that the parameters (α, β) take the values α0 and β0.

Let's think of a very simple example. Suppose we are interested in estimating the probability of success (denoted by θ) for one particular experiment. Data: out of 100 times we repeated the experiment, we observed 80 successes. What is L(0.1)? What is L(0.7)? What about L(0.99)?
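For this example the likelihood is just the binomial probability of 80 successes in 100 trials as a function of θ, so the slide's questions can be answered numerically:

```python
import math

# Binomial likelihood for the slide's example: 80 successes in 100 trials.
#   L(theta) = C(100, 80) * theta^80 * (1 - theta)^20
def likelihood(theta, n=100, k=80):
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

for theta in (0.1, 0.7, 0.8, 0.99):
    print(theta, likelihood(theta))
```

Evaluating at a grid of θ values shows L(0.1) and L(0.99) are vanishingly small while the function peaks at θ = 0.8, the observed proportion of successes.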

Classical (Frequentist) Inference

Frequentist inference tells us that:

- we should look for parameter values that maximise the likelihood function → the maximum likelihood estimator (MLE);
- we should associate the parameters' uncertainty with the calculation of standard errors, which in turn enable us to construct confidence intervals for the parameters.

What's wrong with that? Nothing, but... it is approximate, counter-intuitive (the data are assumed to be random, while the parameter is fixed) and often mathematically intractable.


Classical (Frequentist) Inference - Some Issues

For instance, we cannot ask (or answer!) questions such as:

1. "What is the probability that the (unknown) probability of success in the previous experiment is greater than 0.6?", i.e. compute the quantity P(θ > 0.6), or something like P(0.3 < θ < 0.9).
2. Sometimes we are interested in functions of the parameters, e.g. θ1 + θ2, or

   θ1/(1 − θ1)
   ───────────
   θ2/(1 − θ2)

Whilst in some cases the frequentist approach offers a solution which is not exact but approximate, there are others where it cannot, or where it is very hard to do so.

Bayesian Inference

When drawing inference within a Bayesian framework, the data are treated as a fixed quantity and the parameters are treated as random variables. That allows us to assign probabilities to parameters (and models), making the inferential framework far more intuitive and more straightforward (at least in principle!).

Bayesian Inference (2)

Denote by θ the parameters and by y the observed data. Bayes' theorem allows us to write:

π(θ|y) = π(y|θ)π(θ) / π(y) = π(y|θ)π(θ) / ∫ π(y|θ′)π(θ′) dθ′

where

- π(θ|y) denotes the posterior distribution of the parameters given the data;
- π(θ) is the prior distribution of θ, which expresses our beliefs about the parameters before we see the data;
- π(y|θ) = L(θ) is the likelihood function;
- π(y) is often called the marginal likelihood and plays the role of the normalising constant of the density of the posterior distribution.
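For the success-probability example, Bayes' theorem has a closed form: a Beta(a, b) prior is conjugate to the binomial likelihood, so the posterior is again Beta with the data counts added to the prior parameters. A minimal sketch of that standard result (with a = b = 1, a flat prior):

```python
# Conjugate Beta-binomial update: Beta(a, b) prior + (successes, failures)
# data -> Beta(a + successes, b + failures) posterior.
def posterior_params(a, b, successes, failures):
    """Return the Beta posterior parameters."""
    return a + successes, b + failures

# Flat Beta(1, 1) prior and the 80/100 data from the likelihood example:
a_post, b_post = posterior_params(1, 1, 80, 20)
post_mean = a_post / (a_post + b_post)
print(a_post, b_post, post_mean)
```

Here the normalising constant π(y) never has to be computed explicitly; conjugacy gives the posterior density directly, which is why this example is a favourite for illustration.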

Bayesian vs Frequentist Inference

Everything is assigned distributions (prior, posterior):

- we are allowed to incorporate prior information about the parameter...
- ...which is then updated by using the likelihood function...
- ...leading to the posterior distribution, which tells us everything we need about the parameter.


Bayesian Inference: The Prior

One of the biggest criticisms of the Bayesian paradigm is the use of the prior distribution:

- "Choose a very informative prior to come up with favourable results."
- "I know nothing about the parameter; what prior do I choose?"

Arguments against that criticism:

- priors should be chosen before we see the data, and it is very often the case that there is some prior information available (e.g. previous studies);
- if we know nothing about the parameter, then we could assign to it a so-called uninformative (or vague) prior;
- if there is a lot of data available then the posterior distribution will not be influenced by the prior (too much), and vice versa.
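The last point can be checked directly with the conjugate Beta-binomial setup: fix a fairly informative prior and compare the posterior mean for a small and a large data set with the same observed proportion. The prior Beta(2, 8) (centred at 0.2) and the sample sizes are illustrative choices, not from the slides.

```python
# Posterior mean under a Beta(a, b) prior with binomial data.
def post_mean(a, b, successes, failures):
    return (a + successes) / (a + b + successes + failures)

# An informative prior centred at 0.2, against data with proportion 0.8:
a, b = 2, 8
print(post_mean(a, b, 8, 2))      # 10 observations: the prior pulls hard
print(post_mean(a, b, 800, 200))  # 1000 observations: the data dominate
```

With 10 observations the posterior mean sits halfway between prior and data; with 1000 it is essentially the observed proportion, illustrating how the prior's influence fades as data accumulate.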

Bayesian Inference: The Posterior

Although Bayesian inference has been around for a long time, it is only in the last two decades that it has really revolutionised the way we do statistical modelling. Although, in principle, Bayesian inference is straightforward and intuitive, when it comes to computation it can be very hard to implement. Thanks to computational developments such as Markov chain Monte Carlo (MCMC), doing Bayesian inference is a lot easier.

Bayesian Inference: Some Examples

[Figures: prior, likelihood ("lik") and posterior densities for the probability of success θ, plotted over θ ∈ (0, 1) for 83/100 observed successes (under several priors) and for 8/10 observed successes.]
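Once the posterior is available, the questions the frequentist framework struggled with become direct computations. For the 83/100 example under a flat Beta(1, 1) prior, the posterior is Beta(84, 18), and P(θ > 0.6) can be estimated by Monte Carlo sampling; this is a sketch of the idea, not the talk's own computation.

```python
import random

# Posterior for 83/100 successes under a flat Beta(1, 1) prior: Beta(84, 18).
# Estimate P(theta > 0.6) by drawing from the posterior.
random.seed(0)
draws = [random.betavariate(84, 18) for _ in range(100_000)]
p_gt_06 = sum(d > 0.6 for d in draws) / len(draws)
print(p_gt_06)
```

The posterior mass sits almost entirely above 0.6, so the answer is essentially 1: exactly the kind of probability statement about a parameter that the Bayesian framework permits.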

Comparing Different Hypotheses: Bayesian Model Choice

Suppose that we are interested in testing two competing model hypotheses, M1 and M2. Within a Bayesian framework, the model index M can be treated as an extra parameter (as well as the other parameters in M1 and M2). So it is natural to ask "what is the posterior model probability given the observed data?", i.e. P(M1|y) or P(M2|y).

Bayes' theorem:

P(M1|y) = π(y|M1)π(M1) / π(y)

where π(y|M1) is the marginal likelihood (also called the evidence) and π(M1) is the prior model probability.

Bayesian Model Choice (2)

Given a model selection problem in which we have to choose between two models M1 and M2, parametrised by model parameter vectors θ1 and θ2, the plausibility of the two different models on the basis of observed data y is assessed by the Bayes factor:

P(y|M1) / P(y|M2) = ∫ π(y|θ1, M1)π(θ1) dθ1 / ∫ π(y|θ2, M2)π(θ2) dθ2

This is similar to a likelihood-ratio test, but instead of maximizing the likelihood, we average over all the parameters. The Bayesian model comparison does not depend on the parameters used by each model; instead, it considers the probability of the model considering all possible parameter values.
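For the coin example the two integrals have closed forms, so a Bayes factor can be computed exactly. As a hypothetical comparison (not one from the talk): M1 says θ = 0.5 exactly, while M2 gives θ a flat Beta(1, 1) prior, whose marginal likelihood is the binomial likelihood integrated against that prior.

```python
import math

n, k = 100, 80  # the 80/100-successes data from earlier

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

# P(y | M1): plain binomial probability with theta fixed at 0.5.
log_m1 = math.log(math.comb(n, k)) + n * math.log(0.5)
# P(y | M2) = C(n, k) * B(1 + k, 1 + n - k) / B(1, 1): the likelihood
# averaged over the flat prior (a standard closed-form marginal).
log_m2 = math.log(math.comb(n, k)) + log_beta(1 + k, 1 + n - k) - log_beta(1, 1)

bf_21 = math.exp(log_m2 - log_m1)
print(bf_21)
```

The factor comes out in the millions: averaging over all θ, the free-parameter model explains 80/100 successes far better than the fair-coin model, with no maximisation involved.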

Bayesian Model Choice (3)

Why bother? An advantage of the use of Bayes factors is that they automatically, and quite naturally, include a penalty for including too much model structure. This guards against overfitting.

No free lunch! In practical situations, the calculation of the Bayes factor relies on the employment of computationally intensive methods, such as Reversible-Jump Markov Chain Monte Carlo (RJ-MCMC), which require a certain amount of expertise from the end-user.

An Example in DW-MRI Analysis

We assume that the voxel's intensity can be modelled by assuming that Si/S0 ∼ N(µi, σ²), where we could consider (at least) two different models:

1. The Diffusion Tensor Model (Model 1) assumes that:

   µi = exp{−bi giᵀ D gi}

2. The Simple Partial Volume Model (Model 2) assumes that:

   µi = f exp{−bi d} + (1 − f) exp{−bi d giᵀ C gi}
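Model 2's predicted signal is also simple to evaluate. The sketch below assumes, for illustration only, that C is the rank-one outer product of a unit fibre direction with itself (so giᵀ C gi is the squared projection of gi onto the fibre); f, d and the b-value are made-up numbers, not fitted values from the talk.

```python
import math

# Sketch of the Simple Partial Volume Model: a fraction f of the signal
# diffuses isotropically, the rest along a single fibre direction.
def model2(g, f=0.6, d=1.0e-3, b=1000.0, fibre=(1.0, 0.0, 0.0)):
    # Assuming C = fibre fibre^T, so g^T C g = (g . fibre)^2.
    dot = sum(gi * vi for gi, vi in zip(g, fibre))
    return f * math.exp(-b * d) + (1 - f) * math.exp(-b * d * dot * dot)

print(model2((1.0, 0.0, 0.0)))  # gradient along the fibre: most attenuated
print(model2((0.0, 1.0, 0.0)))  # perpendicular: anisotropic part survives
```

Fitting both models to the same voxel data and comparing their marginal likelihoods is exactly the Bayes-factor calculation described on the next slide.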

An Example in DW-MRI Analysis (2)

Suppose that we have some measurements (intensities) for each voxel. We could fit the two different models (to the same dataset). Question: how do we tell which model fits the data best, taking into account the uncertainty associated with the parameters in each model? Answer: calculate the Bayes factor!


Conclusions

- Quantification of the uncertainty both in parameter estimation and model choice is essential in any modelling exercise.
- A Bayesian approach offers a natural framework to deal with parameter and model uncertainty. It offers much more than a single "best fit" or any sort of "sensitivity analysis".
- There is no free lunch: to do fancy things, unfortunately, one often has to write his/her own computer programs.
- Software available: R, WinBUGS, BayesX, ...

