
Physics 201 Spring 2019‑2020 Lecture 1

Introduction to Statistical Inference and Probability


Theory

Reading for this lecture: Gregory chapter 1


Reading for next lecture: Gregory chapter 2 (we won’t cover this in any detail, but it’s good
background material) and Gregory sections 3.1‑3.3

1 Statistical inference
This course is about statistical inference in physics. Consider this diagram of the scientific method:
[Diagram: the cycle of the scientific method. A hypothesis/theory leads, by deductive inference, to predictions; the predictions are tested by experiment or observation; and statistical inference uses the results to constrain and validate the hypothesis/theory.]

All science involves some interplay between theory and experiment. Theories make predictions that can be tested by experiment, but experiments can also inform theory. There are two kinds of inference that are needed in the scientific method:
• Deductive inference: reasoning from one proposition to another using strong syllogisms
of logic (for example, “if A, then B”)
• Statistical inference: reasoning from data to make conclusions about the plausibility of
a theory/model/hypothesis
Deductive inference is the core of most physics classes, which start from fundamental laws (such as Maxwell’s equations) and proceed to specific predictions. It is essential for all physicists, but particularly theorists, to master.
Statistical inference is not taught as widely within the physics curriculum, but it is just as important to the scientific method. It is how we test and validate (or reject) theories and discover new laws of nature. It is essential for all physicists, but particularly for experimentalists.

1.1 What we will cover in this course


In this course, we will cover the two principal frameworks for statistical inference: frequentist and Bayesian. The two frameworks start from different definitions of probability, but there is some overlap in the methods they use.


1.2 What we will not cover: data science and data exploration
We will not cover data science or machine learning, which focus on constructing predictive
models directly from data. In data science, the theory or model that the machine figures out is
used only for prediction. But in science we are concerned with theories that not only predict,
but also explain, in some mechanistic sense. For example, Newton’s laws explain and predict the
motion of projectiles in terms of force, a fundamental physical concept. If we were to measure a
bunch of projectile trajectories and train a machine to predict the motion, it would likely come
up with a model far more mathematically complex than Newton’s laws. That model might do a
very good job of predicting the motion of a projectile, but its parameters wouldn’t necessarily
correspond to familiar concepts like force.
Another way of looking at machine learning is that it “short‑circuits” the usual scientific method, since the focus is on going directly from data to predictions:
[Diagram: the same cycle of the scientific method as above, with an additional arrow labeled “machine learning (not our focus)” running directly from experiment or observation to predictions, bypassing the hypothesis/theory.]

That doesn’t mean data science and machine learning have no role in physics; they’re just not
the focus of this course.

Note: For the first part of the course, I will use the terms “theory” and “model”
interchangeably. As the course progresses, we’ll start to use the term “model” to
mean the combination of a physical theory and a statistical model for the results of
an experiment.

Similarly, we will not cover data exploration, which is where you poke, prod, process, and
plot data in various ways until it tells you how you should be modeling it. Data exploration
is focused on figuring out a hypothesis based on the data. It’s also important, but it’s not the
subject of this class. I don’t know of any systematic processes for data exploration—it’s mostly
a bunch of ad hoc techniques.
The reason we avoid these topics is that in the physical sciences, we do measurements to understand something about the physical world. And by “understand” I mean understand quantitatively—that is, through models that predict and explain. For example, we measure the mass of the Higgs boson not for its own sake, but so that we can validate or reject models of fundamental physics that go beyond the Standard Model. We measure the properties of complex fluids not so that we can put those properties in a handbook, but so that we can compare to the predictions of models that explain these properties at a microscopic level. In other words, a model or theory is central to everything we do as experimentalists. So our goal in this course is to understand how to do model‑based inference.


Note: We also won’t have much to say about design of experiments, which is an
entire subject in its own right. But the course will give you some basic principles
that are useful in design.

1.3 The two basic problems of statistical inference


1. Model selection or hypothesis testing: The first term is Bayesian, the second frequentist. In both cases, we are interested in assessing the plausibility or truth of a model.
• Model selection examples:
– Is the trajectory of a particle Brownian or ballistic?
– Does the data suggest the existence of an exoplanet orbiting a star or not? Two
planets?
• Hypothesis testing examples:
– Hypothesis: There exists a mass peak indicative of a new particle (for example,
the Higgs boson)
– Hypothesis: The mass of the observed star is less than one solar mass
2. Parameter estimation: Given a model and some data, with uncertainties, find the “best” estimates for the model parameters, and the uncertainties on these estimates. In other words, fit the model to the data.
Examples:
• Find best estimate and uncertainty for Hubble’s constant 𝐻0, given data and Hubble’s law
• Given data and a model 𝜃 = sin(𝜔𝑡 + 𝜙), determine the best estimate for 𝜔 and 𝜙
as well as the uncertainty in these parameters.

Note: I’ll try to use the word “uncertainty” rather than “error”. Error implies that we did something wrong, but you can do a measurement completely correctly and still end up with an uncertainty.

The second example might result in a fit with 𝜔 = 1.1 ± 0.1 and 𝜙 = 0.05 ± 0.01. Below I show an example of the best‑fit curve and the data, with uncertainties.

[Figure: the best‑fit sine curve plotted with the data points and their uncertainties, as a function of t.]
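As a preview of the kind of computation we’ll be doing, here is a minimal sketch of such a fit in Python, using scipy’s curve_fit. The data are simulated for illustration, and we’ll see later that there are caveats to uncertainties obtained this way:

    import numpy as np
    from scipy.optimize import curve_fit

    # Simulated data for illustration: noisy measurements of sin(1.1*t + 0.05)
    rng = np.random.default_rng(0)
    t = np.linspace(0, 10, 50)
    sigma = 0.1                      # assumed measurement uncertainty
    theta = np.sin(1.1 * t + 0.05) + rng.normal(0, sigma, t.size)

    def model(t, omega, phi):
        return np.sin(omega * t + phi)

    # popt holds the best estimates; pcov is the covariance matrix
    popt, pcov = curve_fit(model, t, theta, p0=[1.0, 0.0],
                           sigma=sigma * np.ones(t.size), absolute_sigma=True)
    perr = np.sqrt(np.diag(pcov))    # 1-sigma uncertainties on omega and phi
    print(f"omega = {popt[0]:.2f} +/- {perr[0]:.2f}")
    print(f"phi   = {popt[1]:.2f} +/- {perr[1]:.2f}")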

1.4 The role of uncertainties


Bayesian and frequentist frameworks differ in their approaches to the two problems above.
But in both cases we aim to be quantitative: how well does the model fit the data? What are the uncertainties on the parameters? In model selection/hypothesis testing, we want a value
that will help quantify the validity of the model. In parameter estimation, we want to know
how much to trust the estimate. In both cases, then, it is important to quantify uncertainty.
Consider parameter estimation. Let’s say I have two different methods to fit the same data above, using the same model in both cases. The first method gives 𝜔 = 1.1 ± 0.1, and the second gives 𝜔 = 1.1 ± 0.01. Which analysis method is better? It might seem like the second
one is better, since it gives a lower uncertainty. But remember that ultimately we want to use
that uncertainty to compare to a prediction from some other theory (recall that we would never
do a measurement for measurement’s sake). If our uncertainty is underestimated, we are less likely to obtain agreement with that theory; if it is overestimated, we are more likely to. Which is better? Neither—we want the most accurate estimate of the uncertainty on our parameters, given the uncertainty on our data.
In fact, if there is one central theme of this class, it’s learning to accurately quantify uncer‑
tainty. The goal of our work is not to minimize uncertainty (that’s the goal of experimental
design), but to accurately determine it, given a model and the uncertainty on our data. In your
undergrad lab classes, you probably spent a lot of time on formulas for propagating errors (or,
as noted above, propagating uncertainties). These formulas are deadly boring and generally
useless. We won’t spend any time on them. There are better ways to quantify uncertainty that
don’t require memorizing a bunch of formulas. But they do require a computer.
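To give you a taste of what I mean, here is a minimal sketch of one such computational approach, Monte Carlo propagation: instead of applying a formula, we draw many samples of the measured quantities from distributions describing their uncertainties, push each sample through the model, and look at the spread of the results. The pendulum measurements below are made up for illustration:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    # Hypothetical measurements for a pendulum: L = 1.000 +/- 0.005 m,
    # T = 2.007 +/- 0.010 s, modeled as Gaussian uncertainties
    L = rng.normal(1.000, 0.005, n)
    T = rng.normal(2.007, 0.010, n)

    # Evaluate g = 4*pi^2*L/T^2 (from T = 2*pi*sqrt(L/g)) for every sample
    g = 4 * np.pi**2 * L / T**2

    # The spread of the samples gives the propagated uncertainty
    print(f"g = {g.mean():.3f} +/- {g.std():.3f} m/s^2")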

1.5 Frequentist vs. Bayesian frameworks


All statistical inference requires a theory of probability. The two frameworks differ in their
definitions of probability.


1.5.1 Frequentist definition of probability 𝑝(𝐴)


In the frequentist definition, the argument 𝐴 of the probability 𝑝(𝐴) must be a random variable: a quantity that can take on different values in different runs of the same experiment (for example, a measurement of position). Then 𝑝(𝐴) is the long‑run relative frequency with which 𝐴 appears in identical repeats of the same experiment.

Sidenote: More formally, we would write 𝑝(𝑎) = 𝑝{𝐴 = 𝑎}, meaning the probability that the random variable 𝐴 (which can take on any value) takes on the specific value 𝑎.
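A quick simulation illustrates the idea: the relative frequency of heads in repeated flips of a fair coin approaches 1/2 as the number of repeats grows. (The numbers of repeats here are chosen arbitrarily.)

    import numpy as np

    rng = np.random.default_rng(2)
    for n in (10, 100, 10_000, 1_000_000):
        flips = rng.integers(0, 2, n)   # n flips of a fair coin (1 = heads)
        print(f"n = {n:>9}: relative frequency of heads = {flips.mean():.4f}")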

1.5.2 Bayesian definition of probability 𝑝(𝐴 ∣ 𝐵)


𝑝(𝐴 ∣ 𝐵) is a real number between 0 and 1 measuring the plausibility of proposition 𝐴 given the
truth of proposition 𝐵. 𝐴 can be any logical proposition—for example, “The newly discovered
object is a galaxy”. We read the vertical bar ∣ as “given” or “conditioned on.”
Note that 𝑝(𝐴 ∣ 𝐵) ≠ 𝑝(𝐵 ∣ 𝐴) in general. Imagine a classroom, and consider 𝐴 = “person
is a student” and 𝐵 = “person is in the classroom”. 𝑝(𝐴 ∣ 𝐵) is probably quite close to 1, but
𝑝(𝐵 ∣ 𝐴) is probably close to 0! We’ll see that the proper way to convert 𝑝(𝐴 ∣ 𝐵) to 𝑝(𝐵 ∣ 𝐴)
involves Bayes’s theorem.
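As a preview, Bayes’s theorem says 𝑝(𝐵 ∣ 𝐴) = 𝑝(𝐴 ∣ 𝐵) 𝑝(𝐵)/𝑝(𝐴). Here is the conversion for the classroom example, with made‑up numbers to show how a probability near 1 can turn into one near 0:

    # Made-up numbers for the classroom example
    p_A_given_B = 0.90   # p(person is a student | person is in the classroom)
    p_B = 0.001          # p(person is in the classroom)
    p_A = 0.10           # p(person is a student)

    # Bayes's theorem: p(B|A) = p(A|B) * p(B) / p(A)
    p_B_given_A = p_A_given_B * p_B / p_A
    print(f"p(in classroom | student) = {p_B_given_A:.4f}")   # 0.0090, close to 0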

1.5.3 Comparison
From the frequentist point of view, it makes no sense to discuss the probability of a hypothesis,
since a hypothesis is not a random variable—it doesn’t change from run to run of our experi‑
ment. But from the Bayesian point of view, we can talk about the probability of a hypothesis
being true, because 𝑝(𝐴 ∣ 𝐵) is a measure of our “degree of belief” in 𝐴, given our state of
knowledge 𝐵. So the two frameworks differ fundamentally in their assumptions.
Does this mean the Bayesian approach is somehow “subjective”? E.T. Jaynes, a physicist who did important work in the field of Bayesian inference, put it this way: “The only thing objectivity requires of a scientific approach is that experimenters with the same state of knowledge reach the same conclusions.”

Sidenote: There have been lots of arguments between proponents of both ap‑
proaches. I suggest staying away from these arguments. Throughout the course,
I’ll try to use the terms “frequentist” and “Bayesian” as labels for approaches, not
people. You don’t have to choose between being one or the other any more than
you have to choose between being a “Newtonist” or a “relativist.” Choose the best
tool for the problem—be pragmatic rather than dogmatic.

For historical reasons, the frequentist approach is more common. It is used in particle physics,
for example, but also in biology and the life sciences. Anytime you see p‑values or 𝑛‑sigma values for significance (for example, “5‑sigma significance”), you’re dealing with a frequentist


analysis. The Bayesian approach is more widely used in astrophysics and biophysics. It’s im‑
portant to be familiar with both frameworks. We’ll cover both.
However, the course will be biased toward Bayesian methods, because they are much easier
to teach and understand. Frequentist inference is a huge collection of statistics and tests. But a central focus of the course will be on developing statistical models that can be used with either approach, and learning this skill (called generative modeling) is just as important as learning the frameworks themselves.

Sidenote: For what it’s worth: when lives or money are at stake (war, gambling,
disease), Bayesian approaches are the norm. Alan Turing famously used Bayesian
methods to crack the Enigma, for example.

2 Course specifics


2.1 Computation and programming
Modern data analysis techniques require the use of a computer—even on small data sets. We’ll
see that even fitting a linear model to data is non‑trivial, primarily because we want to get
accurate uncertainties, as discussed above. So in general we’ll need to write programs to do
our analysis for us.
We’ll do all of our analysis in the Python programming language (specifically, Python 3), which
is free and open source, easy to learn, and has powerful tools for data analysis. All assignments
must be done in Python 3; since a big part of the course involves learning from each other,
it’s essential that we all speak the same language. It’s OK if you are not familiar with Python
or computer programming in general. We’ll start with the basics and move quickly to more
advanced computational concepts at the same time as we learn more advanced data analysis
techniques.

2.2 Learning objectives


See syllabus.

2.3 Audience (or, is this course for you?)


See syllabus.

2.4 Textbooks, assignments, grading


See syllabus.

2.5 Final projects


During the second half of the course, you will do an extended project involving Bayesian anal‑
ysis of actual data (either obtained by you or available elsewhere). You’ll work in teams. The
project will be structured so that you will get feedback at each step.


2.6 Section and schedule


See syllabus.

3 Recap
• The goal of statistical inference is to accurately quantify how much to trust a parameter
in a model or a model itself, based on measurements (and uncertainties in those mea‑
surements). In the first case, called parameter estimation, we will likely want to use the parameter to test or validate another model, so it is essential to quantify the uncertainty in the parameter as well as its best‑fit value.
• There are two frameworks for statistical inference: Bayesian and frequentist. In the
Bayesian approach, the argument of a probability can be any logical proposition. In the
frequentist approach, it must be a random variable.

4 Additional material (on course website)


• Syllabus

Copyright © 2018–2020 Vinothan N. Manoharan
