Week - 1

MATH2901 - Higher Theory of Statistics
Libo, Li
May 28, 2020
Week 1 - Lecture 1
Consultation Hours: TBA

Location: Blackboard Collaborate Rooms
Email: libo.li@unsw.edu.au
Requests from me
Please communicate using the university email.
Try your best to come to consultation hours.
Mute your microphone.
Assessments
Online Quiz - Week 4, Weighting 5%
Midterm test - Week 7, Weighting 20%
Assignment - Week 9, Weighting 15%
Statistics
Watch Youtube videos

How statistics can be misleading - Mark Liddell
https://www.youtube.com/watch?v=sxYrzzy3cq8
Chocolate, correlation and cat’s whiskers
https://www.youtube.com/watch?v=ZeCr3Jgh8r0
Interesting websites
Misleading Statistics - https://www.datapine.com/blog/misleading-statistics-and-data/
Correlation and Causation - http://www.tylervigen.com/spurious-correlations
Probability and Statistics
Probability: The formal study of probability in the West started in
the 17th century with Blaise Pascal, Pierre de Fermat and the
Dutchman Christiaan Huygens. Pascal’s triangle was known
long before in Iranian and Chinese culture.
Statistics: The birth of statistics is often dated to 1662, when
John Graunt, along with William Petty, developed early human
statistical and census methods that provided a framework for
modern demography. He produced the first life table, giving
probabilities of survival to each age, and gave the first
statistically based estimate of the population of London.
History
17th Century: Through the collaboration of Blaise Pascal,
Pierre de Fermat and the Dutchman Christiaan Huygens,
probability theory was given a mathematical treatment.
[Portraits: Blaise Pascal, Pierre de Fermat]
18th Century: Jacob Bernoulli and Abraham de Moivre put
probability on a sound mathematical footing. Bernoulli proved a
version of the fundamental law of large numbers, which states
that in a large number of trials, the average of the outcomes is
likely to be very close to the expected value.
19th Century: Probabilistic methods were used to correct
error-prone observations in astronomy. Carl Friedrich Gauss
determined the orbit of Ceres from a few observations. A
normal distribution of errors was used to determine the most
likely true value.
[Portraits: Gauss, Laplace]
Laplace’s Demon
"We may regard the present state of the universe as the effect
of its past and the cause of its future. An intellect which at a
certain moment would know all forces that set nature in motion,
and all positions of all items of which nature is composed, if this
intellect were also vast enough to submit these data to
analysis, it would embrace in a single formula the movements
of the greatest bodies of the universe and those of the tiniest
atom; for such an intellect nothing would be uncertain and the
future just like the past would be present before its eyes."
Pierre Simon Laplace, A Philosophical Essay on Probabilities
20th century
Statistics: The hypothesis testing of Fisher and Neyman is
now widely applied in biological and psychological experiments
and in clinical trials of drugs, as well as in economics.
Probability: The theory of stochastic processes broadened into
such areas as Markov processes and Brownian motion, which are used
to model the random movement of tiny particles suspended in a
fluid and fluctuations in stock markets.
In conclusion:
Probability: Deductive? From the model, deduce the probability.
Statistics: Inductive? From the data, induce the behaviour of
the black-box model.
Example: From Pascal’s triangle to the bell curve.
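n <- 1000
# Row n of Pascal's triangle divided by 2^n: the Binomial(n, 1/2) probabilities
plot(choose(n, 0:n) / 2^n)
# The probabilities sum to 1
sum(choose(n, 0:n) / 2^n)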
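To make the link to the bell curve explicit, the scaled row of Pascal's triangle can be compared with a normal density. This is only a sketch: the choice of mean n/2 and standard deviation sqrt(n)/2 (the mean and standard deviation of a Binomial(n, 1/2) count) and the names k, pmf, bell and max_gap are assumptions made here, not part of the slides.

```r
n <- 1000
k <- 0:n
# Row n of Pascal's triangle divided by 2^n: the Binomial(n, 1/2) probabilities
pmf <- choose(n, k) / 2^n
# Normal density with the same mean and standard deviation (assumed comparison)
bell <- dnorm(k, mean = n / 2, sd = sqrt(n) / 2)
# Plot the probabilities as spikes and overlay the normal curve
plot(k, pmf, type = "h", xlim = c(400, 600), ylab = "probability")
lines(k, bell, col = "red")
# Largest pointwise gap between the two curves
max_gap <- max(abs(pmf - bell))
max_gap
```

The gap is small compared with the height of the peak, which is why the spikes and the red curve are hard to tell apart by eye.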
Examples
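library(cluster)   # provides clusplot()
head(iris)
# k-means with 3 clusters on the four numeric measurements
fit <- kmeans(iris[1:4], 3)
clusplot(iris, fit$cluster, color = TRUE, shade = TRUE, labels = 4, lines = 0)
points(1.4, 0.3)
library(nnet)      # provides multinom()
# Multinomial logistic regression of species on two measurements
model <- multinom(Species ~ Sepal.Length + Petal.Width, data = iris)
expanded <- expand.grid(Sepal.Length = c(1.3, 3, 7.5),
                        Petal.Width = c(0.3, 1, 1.6))
# Renamed from `c` so that the base function c() is not shadowed
new_obs <- data.frame(Sepal.Length = 1.4, Petal.Width = 0.3)
expanded
# Predicted class probabilities on the grid, then for the single new observation
predict(model, expanded, type = "probs")
predicted <- predict(model, new_obs, type = "probs")
predicted
points(expanded, col = 'red', cex = 3)
points(new_obs, col = 'red', cex = 3)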
Experiments, Sample space and Events
Definition
An experiment is any process leading to recorded observations
Example
Some examples
Tossing a coin
Measuring the lifetime of a machine.
Counting the number of calls arriving at a telephone
exchange.
Probability Space
Definition
An outcome is a possible result of an experiment and the set of
all possible outcomes is called the sample space which is
denoted by Ω.
Example
The following are some examples of sample spaces
Cast two dice consecutively. The sample space is
Ω = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), . . . , (6, 6)}.
The number of arriving calls. The sample space is
Ω = {0, 1, 2, . . .} = N0
Definition
An event is a set of outcomes, i.e. a subset of Ω.
Example
The event that the sum of two dice throws is ten or more is
A = {(4, 6), (5, 5), (5, 6), (6, 4), (6, 5), (6, 6)}
Definition
Events are mutually exclusive (disjoint) if they have no
outcomes in common.
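The definitions of sample space, event and mutually exclusive events can be tried out in R on the two-dice experiment. This is a minimal sketch; the event B and the helper key() are hypothetical and introduced only for illustration.

```r
# Sample space for two consecutive dice throws: all ordered pairs
omega <- expand.grid(first = 1:6, second = 1:6)
nrow(omega)
# Event A: the sum of the two throws is ten or more
A <- omega[omega$first + omega$second >= 10, ]
A
# Event B (hypothetical): both throws show at most 2
B <- omega[omega$first <= 2 & omega$second <= 2, ]
# Encode each outcome as a string so that events can be compared as sets
key <- function(ev) paste(ev$first, ev$second)
# A and B are mutually exclusive if no outcome lies in both
disjoint <- length(intersect(key(A), key(B))) == 0
disjoint
```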
Revision in Set Operations
Lemma
(The associative law) If A, B, C are sets then
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)
(Distributive Law) If A, B, C are sets then
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Remark
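If you have trouble remembering the above rules, then one can
essentially replace ∩ by multiplication and ∪ by addition. The
analogy fits the associative laws and the first distributive law,
but not the second distributive law.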
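The associative and distributive laws can be spot-checked on small sets with R's built-in set functions. The three sets below are arbitrary choices made for this sketch, and a check on one example illustrates the laws rather than proving them.

```r
# Three small, arbitrary sets (assumed for illustration)
A <- c(1, 2, 3, 4)
B <- c(3, 4, 5)
C <- c(4, 5, 6, 7)
# Associative laws
assoc_union <- setequal(union(union(A, B), C), union(A, union(B, C)))
assoc_inter <- setequal(intersect(intersect(A, B), C), intersect(A, intersect(B, C)))
# Distributive laws
distrib_1 <- setequal(intersect(A, union(B, C)),
                      union(intersect(A, B), intersect(A, C)))
distrib_2 <- setequal(union(A, intersect(B, C)),
                      intersect(union(A, B), union(A, C)))
c(assoc_union, assoc_inter, distrib_1, distrib_2)
```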
σ-algebra
In order to define a probability function on the sample
space, we require the concept of a σ-algebra, which is
beyond the scope of this course.
One can think of the σ-algebra as the family of all possible
events associated with Ω. In the case where the sample
space Ω is finite or countably infinite, the σ-algebra can be
taken to be the power set of Ω.
Probability
Definition
A probability is a set function, usually denoted by P, that
maps events from the σ-algebra to [0, 1] and satisfies certain
properties.
Example
Consider the coin toss experiment. The sample space is given
by Ω = {T, H} and the σ-algebra is 𝒜 = {∅, {T}, {H}, Ω}.
We can define a probability P on the σ-algebra 𝒜 by setting
P(∅) = 0, P(Ω) = 1, P({H}) = p, P({T}) = 1 − p.
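For a finite sample space the power set can be listed explicitly, which makes the coin-toss example concrete. The sketch below assumes the value p = 0.3 purely for illustration; the names include, events, prob_singleton and probs are likewise hypothetical.

```r
omega <- c("T", "H")
# Power set of omega: each row says which outcomes are included in a subset
include <- expand.grid(rep(list(c(FALSE, TRUE)), length(omega)))
events <- lapply(seq_len(nrow(include)),
                 function(i) omega[unlist(include[i, ])])
length(events)
p <- 0.3   # assumed value of P({H}), for illustration only
prob_singleton <- c(T = 1 - p, H = p)
# P of an event is the sum of the probabilities of the outcomes it contains
P <- function(event) sum(prob_singleton[event])
probs <- sapply(events, P)
probs
```

The empty event receives probability 0 and the full sample space receives probability 1, as in the example above.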
Given the probability space (Ω, 𝒜, P), the probability
function P must satisfy:
1 For every set A ∈ 𝒜, P(A) ≥ 0
2 P(Ω) = 1
3 (Countably additive) Suppose the sets (A_i), i ∈ N, are
mutually exclusive, then
\[ P\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} P(A_i) \]
Week 1 - Lecture 2
We have introduced the probability space (Ω, 𝒜, P), where
Ω is the sample space.
𝒜 is the σ-algebra.
P is a probability function.
The axioms that the probability function P satisfies are
1 For every set A ∈ 𝒜, P(A) ≥ 0
2 P(Ω) = 1
3 Given a mutually exclusive family of sets (A_i), i ∈ N,
\[ P\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} P(A_i) \]
Lemma
1 Given a family of disjoint sets (A_i), i = 1, . . . , k,
\[ P\Big( \bigcup_{i=1}^{k} A_i \Big) = \sum_{i=1}^{k} P(A_i) \]
2 P(∅) = 0
3 For any A ∈ 𝒜, P(A) ≤ 1 and P(A^c) = 1 − P(A)
4 Suppose A, B ∈ 𝒜 and A ⊆ B, then P(A) ≤ P(B).
Example
(Tossing two fair dice consecutively) The sample space is
Ω = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), . . . , (6, 6)}.
Let 𝒜 be the power set of Ω. It is sufficient to define the
probability function P on the singletons, since every event in 𝒜
can be written as a disjoint union of singletons.
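The idea of defining P on singletons and summing can be sketched in R for two fair dice. The events A and B below and the representation of an event as a logical vector over the rows of omega are choices made for this sketch.

```r
omega <- expand.grid(first = 1:6, second = 1:6)
# Fair dice: every singleton gets the same probability
omega$prob <- 1 / nrow(omega)
# P(event) = sum of singleton probabilities; `event` is a logical vector over rows
P <- function(event) sum(omega$prob[event])
A <- omega$first + omega$second >= 10   # sum is ten or more
B <- omega$first + omega$second >= 8    # hypothetical larger event containing A
P(A)
c(P(!A), 1 - P(A))   # complement rule: the two numbers agree
P(A) <= P(B)         # monotonicity, since A is a subset of B
```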
Theorem
(Continuity from below) Given an increasing sequence of
events A_1 ⊂ A_2 ⊂ . . . , then
\[ P\Big( \bigcup_{n=1}^{\infty} A_n \Big) = \lim_{n \to \infty} P(A_n) \]
(Continuity from above) Given a decreasing sequence of
events A_1 ⊃ A_2 ⊃ . . . , then
\[ P\Big( \bigcap_{n=1}^{\infty} A_n \Big) = \lim_{n \to \infty} P(A_n) \]
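Continuity from below can be illustrated numerically. The example is an assumption made for this sketch and does not come from the slides: take Ω = N0 with P({k}) = (1/2)^(k+1) and the increasing events A_n = {0, 1, . . . , n}, whose union is Ω.

```r
# Assumed example: P({k}) = (1/2)^(k + 1) for k = 0, 1, 2, ... (these sum to 1)
p_singleton <- function(k) 0.5^(k + 1)
# Increasing events A_n = {0, 1, ..., n}; P(A_n) is a finite sum
P_An <- sapply(1:30, function(n) sum(p_singleton(0:n)))
head(P_An)
# Gap to P(union of all A_n) = P(Omega) = 1, which shrinks towards zero
1 - P_An[30]
```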
Proof.
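We prove continuity from above. By De Morgan’s law,
\[ \bigcap_{n=1}^{\infty} A_n = \Big( \bigcup_{n=1}^{\infty} A_n^c \Big)^c \]
Since (A_n) is decreasing, the complements (A_n^c) are increasing, so
continuity from below applies to them and
\[
\begin{aligned}
P\Big( \bigcap_{n=1}^{\infty} A_n \Big) &= P\Big( \Big( \bigcup_{n=1}^{\infty} A_n^c \Big)^c \Big) \\
&= 1 - P\Big( \bigcup_{n=1}^{\infty} A_n^c \Big) \\
&= 1 - \lim_{n \to \infty} P(A_n^c) \\
&= \lim_{n \to \infty} (1 - P(A_n^c)) = \lim_{n \to \infty} P(A_n)
\end{aligned}
\]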
Conditional Probability and Independence
Definition
The conditional probability that an event A occurs given that an
event B has occurred is
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \]
Definition
Events A and B are independent if P(A ∩ B) = P(A)P(B).
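Both definitions can be tried on the two-dice experiment. The events A, B, E1 and E2 below are hypothetical choices for this sketch; with fair dice, P(event) is simply the fraction of the 36 outcomes in the event.

```r
omega <- expand.grid(first = 1:6, second = 1:6)
# Fair dice: P(event) = proportion of outcomes in the event
P <- function(event) mean(event)
A <- omega$first + omega$second >= 10   # sum is ten or more
B <- omega$first == 6                   # first throw is a six
# Conditional probability from the definition
P_A_given_B <- P(A & B) / P(B)
c(P(A), P_A_given_B)   # conditioning on B changes the probability of A
# Two events that depend on different throws
E1 <- omega$first %% 2 == 0   # first throw is even
E2 <- omega$second == 6       # second throw is a six
independent <- isTRUE(all.equal(P(E1 & E2), P(E1) * P(E2)))
independent
```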
Lemma
Given two events A and B then P(A|B) = P(A) if and only if
P(B|A) = P(B).
Proof:
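The proof is not recorded on the slides. One possible argument, under the added assumption that P(A) > 0 and P(B) > 0 so that both conditional probabilities are defined, is the following chain of equivalences:

```latex
\begin{align*}
P(A \mid B) = P(A)
  &\iff \frac{P(A \cap B)}{P(B)} = P(A) \\
  &\iff P(A \cap B) = P(A)\,P(B) \\
  &\iff \frac{P(A \cap B)}{P(A)} = P(B) \\
  &\iff P(B \mid A) = P(B).
\end{align*}
```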
Definition
1 A countable sequence of events (A_i), i ∈ N, is pairwise
independent if P(A_i ∩ A_j) = P(A_i)P(A_j) for all i ≠ j.
2 A countable sequence of events (A_i), i ∈ N, is independent if
for any finite sub-collection A_{i_1}, . . . , A_{i_n} we have
\[ P(A_{i_1} \cap A_{i_2} \cap \dots \cap A_{i_n}) = \prod_{j=1}^{n} P(A_{i_j}) \]
Remark
Independence implies pairwise independence, but pairwise
independence does not imply independence.
Example
A ball is drawn at random from 4 balls labelled 1, 2, 3, 4. The
sample space is Ω = {1, 2, 3, 4} and we take P({i}) = 1/4.
Consider the events
A = {1} ∪ {2}, B = {1} ∪ {3}, C = {1} ∪ {4}.
We see that P(A ∩ B) = P({1}) = 1/4 and P(A) = P(B) = 1/2,
which implies P(A)P(B) = P(A ∩ B). That is, A and B are
independent. By similar arguments, we can show that the sets
A, B, C are pairwise independent, however
P(A ∩ B ∩ C) ≠ P(A)P(B)P(C)
since P(A ∩ B ∩ C) = P({1}) = 1/4 and P(A)P(B)P(C) = 1/2^3 = 1/8.
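The four-ball example is small enough to check directly in R. This is a minimal sketch that represents each event as a vector of ball labels; the helper P() and the names pairwise, triple_lhs and triple_rhs are introduced here for illustration.

```r
omega <- 1:4
# Each ball has probability 1/4, so P(event) = (number of balls in event) / 4
P <- function(event) length(event) / length(omega)
A <- c(1, 2)
B <- c(1, 3)
C <- c(1, 4)
# Pairwise independence: the product rule holds for every pair
pairwise <- c(
  AB = P(intersect(A, B)) == P(A) * P(B),
  AC = P(intersect(A, C)) == P(A) * P(C),
  BC = P(intersect(B, C)) == P(B) * P(C)
)
pairwise
# The product rule fails for all three events together
triple_lhs <- P(Reduce(intersect, list(A, B, C)))   # P(A and B and C)
triple_rhs <- P(A) * P(B) * P(C)
c(triple_lhs, triple_rhs)
```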
Week 1 - Lecture 3
Lemma
1 The multiplicative law: given events A and B, then
P(A ∩ B) = P(A|B)P(B),
and similarly, given events A_1, A_2, A_3, then
P(A_1 ∩ A_2 ∩ A_3) = P(A_3 | A_2 ∩ A_1)P(A_2 | A_1)P(A_1)
2 The additive law: let A and B be events, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
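Both laws can be checked on small examples. The card-drawing setup (three cards drawn without replacement from a standard 52-card deck, with A_i the event that the i-th card is an ace) and the dice events are assumptions made for this sketch.

```r
# Multiplicative law: probability that all three cards drawn are aces
chain <- (4 / 52) * (3 / 51) * (2 / 50)   # P(A1) * P(A2 | A1) * P(A3 | A1 and A2)
direct <- choose(4, 3) / choose(52, 3)    # counting unordered hands directly
c(chain, direct)

# Additive law on two fair dice
omega <- expand.grid(first = 1:6, second = 1:6)
P <- function(event) mean(event)
A <- omega$first == 6
B <- omega$second == 6
# Note: in R, `A | B` is elementwise OR (the union), not conditioning
c(P(A | B), P(A) + P(B) - P(A & B))
```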
Remark
The RHS of the multiplicative law is exactly multiplication down
the tree diagram.
Proof.
Law of Total Probability
Lemma
Suppose (A_i), i = 1, . . . , k, are mutually exclusive and exhaustive of Ω,
that is, A_1 ∪ . . . ∪ A_k = Ω, then for any event B, we have
\[ P(B) = \sum_{i=1}^{k} P(B \mid A_i) P(A_i) \]
Proof.
It is easy to see that B = B ∩ Ω and, by using the fact that
(A_i), i = 1, . . . , k, is exhaustive of Ω, we can write
\[ B = B \cap \Omega = B \cap \bigcup_{i=1}^{k} A_i = \bigcup_{i=1}^{k} (B \cap A_i) \]
Then by noticing that (B ∩ A_i), i = 1, . . . , k, are again disjoint sets and
using the definition of conditional probability, we have
\[ P(B) = \sum_{i=1}^{k} P(B \cap A_i) = \sum_{i=1}^{k} P(B \mid A_i) P(A_i) \]
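The law of total probability can be verified numerically on two fair dice. The partition used here (A_i is the event that the first throw shows i) and the event B are choices made for this sketch.

```r
omega <- expand.grid(first = 1:6, second = 1:6)
P <- function(event) mean(event)          # fair dice: proportion of outcomes
B <- omega$first + omega$second >= 10     # sum is ten or more
# Partition A_1, ..., A_6 by the value of the first throw
terms <- sapply(1:6, function(i) {
  A_i <- omega$first == i
  P_B_given_Ai <- P(B & A_i) / P(A_i)     # definition of conditional probability
  P_B_given_Ai * P(A_i)                   # one term of the sum
})
c(sum(terms), P(B))   # the two numbers agree
```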
Lemma
(Bayes Formula) Given sets B, A and a family of disjoint and
exhaustive sets (A_i), i = 1, . . . , k, then
\[ P(A \mid B) = \frac{P(B \mid A) P(A)}{\sum_{i=1}^{k} P(B \mid A_i) P(A_i)} \]
Proof.
From the definition of conditional probability,
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A) P(A)}{P(B)} \]
then by applying the law of total probability to P(B) in the
denominator, we have
\[ P(A \mid B) = \frac{P(B \mid A) P(A)}{\sum_{i=1}^{k} P(B \mid A_i) P(A_i)} \]
and this gives us the formula.
Example
(Applications of Bayes Formula) A diagnostic test for a certain
disease claims to be 90% accurate in the following sense.
If the patient has the disease, then the test will show
positive with probability 0.9.
If the patient does not have the disease, then the test will show
negative with probability 0.9.
Also we know that 1% of the population has the disease.
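The slides do not record the question asked about this test. A natural one, assumed here, is the probability that a patient who tests positive actually has the disease. The sketch below applies the law of total probability and Bayes formula to the three numbers given above; the variable names are hypothetical.

```r
p_disease <- 0.01        # P(D): 1% of the population has the disease
p_pos_given_d <- 0.9     # P(+ | D)
p_neg_given_nd <- 0.9    # P(- | not D)
p_pos_given_nd <- 1 - p_neg_given_nd   # P(+ | not D), by the complement rule
# Law of total probability for the denominator P(+)
p_pos <- p_pos_given_d * p_disease + p_pos_given_nd * (1 - p_disease)
# Bayes formula: P(D | +)
p_d_given_pos <- p_pos_given_d * p_disease / p_pos
p_d_given_pos
```

Under these numbers only about one in twelve positive results comes from a patient with the disease, because the 1% prevalence is small compared with the 10% false positive rate.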
Descriptive Statistics + R
Steps to data analysis
What is the research question? How can statistics provide
insight into the question?
What are the properties of the variable of interest? Different
variable types require different analysis.
Categorical - Data can be sorted into a finite set of (unordered)
categories, e.g. Gender.
Quantitative - Responses are measured on some sort of
scale, e.g. Weight.
Numerical summaries of the quantitative data
Given observations x = (x_1, . . . , x_n)
The sample mean (estimated mean) or average is given by
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
The sample variance (estimated variance) is given by
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
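The two formulas can be computed by hand in R and compared with the built-in functions mean() and var(), which uses the same n − 1 divisor. The observations below are made up for this sketch.

```r
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)   # hypothetical observations
n <- length(x)
# Sample mean from the formula
x_bar <- sum(x) / n
# Sample variance from the formula, with divisor n - 1
s2 <- sum((x - x_bar)^2) / (n - 1)
c(x_bar, mean(x))   # the two numbers agree
c(s2, var(x))       # the two numbers agree
```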
R-studio
r <- rexp(1000)    # exponential sample (right skewed)
n <- rnorm(1000)   # standard normal sample (symmetric)
hist(-r, freq = FALSE)
hist(r, freq = FALSE)
par(mfrow = c(1, 3))   # three plots side by side
plot(density(n), main = 'Symmetric Distribution')
plot(density(-r + 10), main = 'left skewed distribution')
plot(density(r + 10), main = 'right skewed distribution')