Revision – Elements of Probability
Definition
Random Experiment – Any experiment whose exact outcome cannot be predicted
with certainty.
Outcome – The result of a random experiment
Event – Set of possible outcomes.
Sample space – The set containing all possible outcomes.
Probability of an event E, P(E) – A number measuring how likely event E is to occur in a
random experiment.
Notation for events
Union A ∪ B "OR"
Intersection A ∩ B "AND"
Complement A^c "NOT"
Other
A ⊆ B – A implies B
A ∩ B = ∅ – mutually exclusive events
De Morgan's Laws:
(A ∪ B)^c = A^c ∩ B^c
(A ∩ B)^c = A^c ∪ B^c
Axioms of Probability
Kolmogorov’s probability axioms
The probability measure P(·) satisfies:
i) 0 ≤ P ( E )≤ 1
ii) P ( S )=1
iii) For an infinite sequence of mutually exclusive events E_1, E_2, …
P( ∪_{i=1}^{∞} E_i ) = ∑_{i=1}^{∞} P(E_i)
Thus
If A and B are mutually exclusive, P(A ∪ B) = P(A) + P(B)
a. P(A^c) = 1 − P(A)
b. Additive law: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Equally Likely Outcomes
If all outcomes are equally likely
P(E) = (number of outcomes in E) / (total number of possible outcomes)
Multiplication Rule – If an outcome is obtained in k steps and there are n_i ways of
completing step i, then the total number of outcomes is
n_1 × n_2 × … × n_k
Permutation – An ordered sequence of the elements in a set. The number of different
permutations of n elements is
P_n = n!
Combination – Subset of elements selected from a larger set. The number of
combinations of r elements that can be selected from a set of n is
C(n, r) = (n choose r) = n! / ( r! (n − r)! )
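The counting rules above can be checked numerically. This is a small Python sketch (the course tooling is MATLAB, but the arithmetic is identical); the values n = 5, r = 2 are illustrative only.

```python
import math

# Illustrative values (not from the notes)
n, r = 5, 2

# Permutations: P_n = n! orderings of n elements
permutations_all = math.factorial(n)

# Combinations: C(n, r) = n! / (r! (n - r)!)
combinations = math.comb(n, r)

print(permutations_all)  # 120
print(combinations)      # 10
```

`math.comb` applies the same formula as C(n, r) but avoids computing the large factorials directly.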
Conditional Probability Rules
If B⊆ A ,then P ( A|B )=1
Multiplication law:
P(A ∩ B) = P(A|B) × P(B)
P(A|B) = P(A ∩ B) / P(B)   (probability of A given B)
Law of total probability
P(A) = P(A|B) × P(B) + P(A|B^c) × P(B^c)
Bayes' rule: if P(A) > 0 and P(B) > 0
P(A|B) = P(B|A) × P(A) / P(B)
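The law of total probability and Bayes' rule can be combined in a worked example. A Python sketch with made-up numbers (a hypothetical diagnostic test; none of these figures come from the notes):

```python
# Hypothetical numbers: B = "has disease", A = "test positive"
p_B = 0.01            # P(B): prevalence
p_A_given_B = 0.99    # P(A|B): sensitivity
p_A_given_Bc = 0.05   # P(A|B^c): false-positive rate

# Law of total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

# Bayes' rule: P(B|A) = P(A|B) * P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A

print(round(p_A, 4))          # 0.0594
print(round(p_B_given_A, 4))  # 0.1667
```

Even with a very accurate test, the posterior P(B|A) is small because the prior P(B) is small, which is exactly what Bayes' rule captures.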
Independence of two events
Events are independent if and only if
P(A ∩ B) = P(A) × P(B)
This implies
P(A|B) = P(A) and P(B|A) = P(B)
Week 1: Introduction and descriptive statistics
Statistical Process
1. Formulate the research question.
2. Design the study then collect data.
3. Summarize the data in an efficient way.
4. Choose and apply appropriate statistical methods.
5. Draw conclusions.
Population
Population – Total collection of elements
Elements are called individuals.
Given the research question we then observe characteristics for each individual.
Sample – A subset of the population.
Data – measurements that are collected over the sample.
Random Sampling
The process of selecting a sample is called sampling.
The sample should be representative of the population.
Random sampling accomplishes this.
Research Question
Descriptive – Specific to the sample data at hand
Inference – Generalizing about a population from a sample.
Data Properties
Categorical (qualitative) – Takes a value that is one of several possible categories
e.g., gender, hair colour
Quantitative (numerical) – Naturally measured as a number on which meaningful
arithmetic operations can be performed
e.g., height, age, temperature
Descriptive Statistics
Describing and summarising data is called descriptive statistics
1. Graphical summaries
2. Numerical summaries
Graphical summaries
- Effective way to obtain a feel for the essential characteristics
- Plot often reveals useful info
- Highlight Outliers
Histogram
Classes – vertical bars in a histogram
Too few classes – loses information
Too many classes – obscures patterns
Number of classes ≈ √(number of observations)
Descriptions
Symmetric (mirrored around the middle)
Skewed (Shifted to the right/left)
Outliers
Range of data
Unimodal – One peak, Bimodal – Two Peaks
Bell-shaped – Symmetric and unimodal
Density Histogram
Relative frequency – proportion of observations in that class
Density = relative frequency / class width
Density histogram – rectangle heights are the density of each class
Relative frequency = rectangle area
Numerical summaries of quantitative variable
Location – A value most of the sample is centred around
Variability – how spread the values are around the centre
Location
Mean
x̄ = (1/n) ∑_{i=1}^{n} x_i
x̄ = (sum of observations) / (number of observations)
Median
x̃ = the value which divides the data into two equal parts (the middle value)
If n is odd: m = x_((n+1)/2)
If n is even: m = (1/2) ( x_(n/2) + x_(n/2+1) )
Quartiles and Percentiles
First or lower quartile – value that has 25% of observations below it
Third or upper quartile – value that has 75% of observations below it
Second quartile – the median, which splits the data into two equal halves
Finding Quartiles
First Quartile – Median of the lower half (including the median)
3rd Quartile – Median of the upper half (including the median)
Five number summaries
The 3 quartiles (Lower, Median, Upper) together with the maximum and the minimum
gives the 5-number summary
{ x_(1), q_1, m, q_3, x_(n) }
Or
{ min, 25% value, 50% value, 75% value, max }
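The median and quartile rules above can be sketched in a few lines of Python. The dataset is made up for illustration, and the quartile convention follows the notes (the halves include the median, as in the "Finding Quartiles" rule above); other software may use a different interpolation convention and give slightly different quartiles.

```python
# Made-up dataset for illustration; n = 7 (odd)
data = sorted([7, 1, 3, 9, 5, 4, 8])   # [1, 3, 4, 5, 7, 8, 9]

def median(xs):
    n = len(xs)
    mid = n // 2
    # odd n: middle value; even n: average of the two middle values
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

m = median(data)
q1 = median(data[: len(data) // 2 + 1])  # lower half, including the median
q3 = median(data[len(data) // 2 :])      # upper half, including the median

five_number = (data[0], q1, m, q3, data[-1])
print(five_number)  # (1, 3.5, 5, 7.5, 9)
```

The tuple printed is exactly the 5-number summary { x_(1), q_1, m, q_3, x_(n) } for this sample.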
Variability
Measure of spread: the sample variance (s²)
The average of the squared deviations from the mean:
s² = (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)²
Standard deviation: s = √(s²)
Interquartile Range
iqr = q_3 − q_1
The iqr is less sensitive to outliers than the sample variance.
o An outlier is an observation which is markedly different from the bulk of the data.
Rule: an observation is a suspected outlier if it lies more than 1.5 × iqr from the closest quartile.
Boxplot
Graph that describes the 5-number summary
Central box spans the quartiles
Line marks the median
Lines (whiskers) extend from the box out to the smallest and largest observations which
are not suspected outliers
Observations more than 1.5 × iqr outside the central box are plotted individually
Correlation / Regression
Lecture 2 – Random Variables
In this course we specifically look at random variables derived from numerical data, i.e.,
going from a single observation to a dataset.
Random Variables – A real-valued function defined over the sample space:
X : S → ℝ, ω ↦ X(ω)
Specific Properties
P(X ∈ S_X) = 1   (X falls in its set of possible values with probability 1)
P( (X = x_1) ∪ (X = x_2) ) = P(X = x_1) + P(X = x_2) for x_1 ≠ x_2   (the two events are mutually exclusive)
P(X < x) = 1 − P(X ≥ x)   (the events are complementary)
Cumulative Distribution Function
A random variable is often described by its CDF
F ( x )=P( X ≤ x)
Properties
P(a < X ≤ b) = F(b) − F(a)
F is a nondecreasing function
lim_{x→+∞} F(x) = F(+∞) = 1
lim_{x→−∞} F(x) = F(−∞) = 0
Continuous – values are infinitely precise; X ranges over an uncountable set (e.g., x can be any real number)
Discrete – X can take only values in a countable set (e.g., x can only be an integer)
Probability mass function
Defined only for discrete random variables
p ( x ) =P(X =x)
Sum of all possible p(x) = 1
Bernoulli random variable
Takes two values, S_X = {0, 1}, with probability parameter π (not 3.14…; just a probability):
p(1) = π
p(0) = 1 − π
for some value π with 0 < π < 1
Continuous Random Variables
Defined over an uncountable set of real numbers, usually an interval (e.g., any number
between 0 and 1).
A random variable is said to be continuous if there exists a nonnegative function f such that
P(X ∈ B) = ∫_B f(x) dx
f(x) is the probability density function (PDF), the continuous analogue of the PMF.
Properties
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y) dy
f(x) = dF(x)/dx = F′(x)
f(x) ≥ 0 for all x ∈ ℝ
P(a ≤ X ≤ b) = ∫_{a}^{b} f(x) dx = F(b) − F(a)
∫_{−∞}^{+∞} f(x) dx = 1
A continuous random variable takes any single exact value with probability zero:
P(X = x) = ∫_{x}^{x} f(t) dt = 0
Thus
P(X < x) = P(X ≤ x)
Expectation from PDF
Mean -Discrete
μ = E(X) = ∑_{x∈S_X} x p(x)
(the sum of x × p(x) over all possible values)

x      0     1     2
p(x)   0.81  0.18  0.01

E(X) = 0 × 0.81 + 1 × 0.18 + 2 × 0.01 = 0.2
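The PMF table above can be checked in a one-line Python computation (illustrative sketch; the PMF values are the ones from the table):

```python
# PMF from the table above: p(0) = 0.81, p(1) = 0.18, p(2) = 0.01
pmf = {0: 0.81, 1: 0.18, 2: 0.01}

# E(X) = sum of x * p(x) over all possible values
mean = sum(x * p for x, p in pmf.items())
print(round(mean, 2))  # 0.2
```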
Mean – Continuous
μ = E(X) = ∫_{S_X} x f(x) dx
Mean for a function Continuous and Discrete
For a function g(x) (e.g., x²):
Discrete
E(g(X)) = ∑_{x∈S_X} g(x) p(x)
Continuous
E(g(X)) = ∫_{S_X} g(x) f(x) dx
Properties
E ( aX +b )=a E ( X )+ b
Variance
Var(X) = E( (X − μ)² ) = σ²
Discrete random variable
Var(X) = ∑_{x∈S_X} (x − μ)² p(x)
Continuous random variable
Var(X) = ∫_{S_X} (x − μ)² f(x) dx
Alternative form
Var(X) = E(X²) − ( E(X) )² = E(X²) − μ²
Standard Deviation
σ =√ V ar (X )
Linear Transformation variance
Var(aX + b) = a² Var(X)
Standardization – a commonly used linear transformation:
Z = (X − μ) / σ
Var(Z)=1, E(Z)=0
Z is often called the z-score
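The z-score transform is a one-liner. A Python sketch with made-up values (μ = 100, σ = 15 are illustrative, not from the notes):

```python
# Illustrative parameters and observations
mu, sigma = 100.0, 15.0
xs = [85.0, 100.0, 130.0]

# Z = (X - mu) / sigma: each value expressed in standard deviations from the mean
zs = [(x - mu) / sigma for x in xs]
print(zs)  # [-1.0, 0.0, 2.0]
```

After standardization the values read directly as "one standard deviation below the mean", "at the mean", "two above", consistent with E(Z) = 0 and Var(Z) = 1.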
Joint Cumulative Distribution Function
Used with two random variables X and Y
Used when they are/maybe dependent on one another
o E.g height and weight
F_XY(x, y) = P(X ≤ x, Y ≤ y) for all (x, y) ∈ ℝ × ℝ
F_XY(x, y) = P( {X ≤ x} ∩ {Y ≤ y} )
Discrete Variables
p_XY(x, y) = P(X = x, Y = y)
Marginal PMFs of X and Y
p_X(x) = ∑_{y∈S_Y} p_XY(x, y)   and   p_Y(y) = ∑_{x∈S_X} p_XY(x, y)
Continuous
P(X ∈ A, Y ∈ B) = ∫_A ∫_B f_XY(x, y) dy dx
Marginal Densities
f_X(x) = ∫_{S_Y} f_XY(x, y) dy   and   f_Y(y) = ∫_{S_X} f_XY(x, y) dx
Expected function
Discrete: E( g(X, Y) ) = ∑_{x∈S_X} ∑_{y∈S_Y} g(x, y) p_XY(x, y)
Continuous: E( g(X, Y) ) = ∫_{S_X} ∫_{S_Y} g(x, y) f_XY(x, y) dy dx
E ( aX +bY )=a E ( X ) +b E (Y )
Independence
Independent if
P(X ≤ x, Y ≤ y) = P(X ≤ x) × P(Y ≤ y) for all x, y
Thus for the discrete case
p_XY(x, y) = p_X(x) × p_Y(y)
Continuous
f_XY(x, y) = f_X(x) × f_Y(y)
Expectation of independent event
E ( h ( X ) g ( Y ) )=E ( h ( X ) ) × E(g ( Y ) )
Covariance
Cov(X, Y) – the expected product of the deviation of X from its mean and the deviation of Y from its mean:
Cov(X, Y) = E( (X − E(X)) (Y − E(Y)) )
Properties
Cov(X, Y) = Cov(Y, X)   (symmetric)
Cov(X, X) = Var(X)
Cov(X, Y) = E(XY) − E(X) E(Y)
Cov(aX + b, cY + d) = ac Cov(X, Y)
Cov(X_1 + X_2, Y_1 + Y_2) = Cov(X_1, Y_1) + Cov(X_1, Y_2) + Cov(X_2, Y_1) + Cov(X_2, Y_2)
If X and Y are independent
C ov ( X , Y )=0
Variance of Sums
Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)
Correlation Coefficient
ρ = Cov(X, Y) / √( Var(X) Var(Y) )
−1 ≤ ρ ≤ 1
ρ positive → positive linear relationship
ρ negative → negative linear relationship
|ρ| close to 1 → strong linear relationship
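The covariance and correlation definitions above can be computed directly. A Python sketch on made-up paired data, chosen so that y is an exact linear function of x and ρ should come out as 1:

```python
# Made-up paired data: y = 2x exactly, so we expect rho = 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# Cov(X, Y) = E((X - E(X))(Y - E(Y))), here as an average over the data
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
vx = sum((x - mx) ** 2 for x in xs) / n
vy = sum((y - my) ** 2 for y in ys) / n

# rho = Cov(X, Y) / sqrt(Var(X) Var(Y))
rho = cov / (vx * vy) ** 0.5
print(rho)  # 1.0
```

A perfect positive linear relationship gives ρ = 1, the upper bound of −1 ≤ ρ ≤ 1.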
Special Random Variables
Binomial Distribution (Discrete)
Multiple independent trials, each with 2 outcomes, counting how many times one of the
outcomes occurs
X ~ Bin(n, π)
p(x) = (n choose x) π^x (1 − π)^(n−x)
Properties
If X_1 ~ Bin(n_1, π) and X_2 ~ Bin(n_2, π) are independent, then
X_1 + X_2 ~ Bin(n_1 + n_2, π)
If X ~ Bin(n, π)
μ = E(X) = nπ
σ² = Var(X) = nπ(1 − π)
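The binomial PMF, mean, and variance formulas can be verified numerically. A Python sketch with illustrative parameters n = 10, π = 0.3 (not from the notes):

```python
import math

# Illustrative parameters
n, p = 10, 0.3

# p(x) = C(n, x) * pi^x * (1 - pi)^(n - x) for x = 0, ..., n
pmf = [math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(round(sum(pmf), 6))  # 1.0
print(round(mean, 6))      # 3.0  (= n * pi)
print(round(var, 6))       # 2.1  (= n * pi * (1 - pi))
```

The brute-force sums over the PMF reproduce E(X) = nπ and Var(X) = nπ(1 − π) exactly (up to floating-point rounding).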
Poisson Distribution (Discrete)
Number of occurrences of a random phenomenon in a fixed period of time/space
o Number of lightning strikes in a park
o Occurrences are independent
X ~ Poisson(λ)
p(x) = e^{−λ} λ^x / x!
Properties
∑_{x∈S_X} p(x) = 1
E(X) = λ
Var(X) = λ
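The Poisson properties can be checked the same way. A Python sketch with λ = 2 (illustrative); the infinite sum is truncated at x = 50, where the remaining terms are negligible:

```python
import math

# Illustrative rate parameter
lam = 2.0

# p(x) = e^{-lambda} * lambda^x / x!, truncated at x = 50
pmf = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(51)]

total = sum(pmf)
mean = sum(x * p for x, p in enumerate(pmf))

print(round(total, 6))  # 1.0
print(round(mean, 6))   # 2.0  (E(X) = lambda)
```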
Uniform Distribution (Continuous – flat line)
In a uniform distribution the probability density is the same at every point of the interval,
so the PDF is constant:
f(x) = 1/(β − α) if x ∈ [α, β], 0 otherwise
CDF
F(x) = 0 if x < α
F(x) = (x − α)/(β − α) if α ≤ x ≤ β
F(x) = 1 if x > β
Properties
E(X) = (α + β)/2
Var(X) = (β − α)²/12
The probability that X lies in any subinterval [a, b] of [α, β] is P(a < X < b) = (b − a)/(β − α)
Any linear transformation of a uniform random variable is still uniform (it just rescales/shifts
the x-axis; the density remains rectangular)
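The subinterval rule agrees with computing F(b) − F(a) from the CDF. A Python sketch with illustrative endpoints α = 2, β = 6:

```python
# Illustrative Uniform(alpha, beta) parameters
alpha, beta = 2.0, 6.0

def cdf(x):
    # F(x) = (x - alpha) / (beta - alpha) on [alpha, beta]
    if x < alpha:
        return 0.0
    if x > beta:
        return 1.0
    return (x - alpha) / (beta - alpha)

# P(a < X < b) = (b - a) / (beta - alpha) for [a, b] inside [alpha, beta]
a, b = 3.0, 5.0
print(cdf(b) - cdf(a))           # 0.5
print((b - a) / (beta - alpha))  # 0.5
```

Both routes give the same probability, since F(b) − F(a) telescopes to (b − a)/(β − α).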
Exponential Distribution (Continuous – e.g., waiting times between Poisson events)
PDF: f(x) = (1/μ) e^{−x/μ} if x ≥ 0, 0 otherwise
CDF: F(x) = 0 if x < 0, F(x) = 1 − e^{−x/μ} if x ≥ 0
E(X) = μ
Var(X) = μ²
P(X < n) = ∫_{0}^{n} (1/μ) e^{−x/μ} dx
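Since the exponential CDF has the closed form F(x) = 1 − e^{−x/μ}, such probabilities need no numerical integration. A Python sketch with the illustrative mean μ = 2:

```python
import math

# Illustrative mean; F(x) = 1 - exp(-x / mu) for x >= 0
mu = 2.0

def cdf(x):
    return 1.0 - math.exp(-x / mu) if x >= 0 else 0.0

# P(X < mu): probability of falling below the mean
print(round(cdf(mu), 4))  # 0.6321
```

Note that P(X < μ) = 1 − e^{−1} ≈ 0.63 regardless of μ: the exponential is right-skewed, so most of the probability lies below the mean.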
Normal Distribution
Widely used in statistical inference.
X ~ N(μ, σ)   (mean μ, standard deviation σ)
PDF: f(x) = ( 1/(σ√(2π)) ) e^{−(x−μ)²/(2σ²)}
E(X) = μ
Var(X) = σ²
Standard Normal Distribution
μ = 0, σ = 1
f(x) = ( 1/√(2π) ) e^{−x²/2}
The standard normal CDF has no closed-form solution for the integral
To find the CDF you must use normcdf in MATLAB
Key Properties
The shape of the normal distribution is always the same (changing μ shifts it horizontally;
changing σ changes the scale)
Standardization – converting any normal to the standard normal:
Z = (X − μ)/σ ~ N(0, 1)
Key Probabilities
P(−1 < Z < 1) ≈ 0.6827
P(−2 < Z < 2) ≈ 0.9545
P(−3 < Z < 3) ≈ 0.9973
Finding the value that gives a certain probability
P(Z < z) = x
To find which z gives probability x, use norminv in MATLAB
Common Values
P ( Z >1.645 )=0.05
P ( Z >1.96 ) =0.025
P ( Z >2.575 ) =0.005
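The key probabilities and common values above can be reproduced without MATLAB: the standard normal CDF can be written in terms of the error function, Φ(z) = (1 + erf(z/√2))/2. A Python sketch (this plays the role of normcdf for the standard normal):

```python
import math

def phi(z):
    # Standard normal CDF via the error function:
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(phi(1.0) - phi(-1.0), 4))  # 0.6827
print(round(1.0 - phi(1.96), 4))       # 0.025
```

These match the tabulated values P(−1 < Z < 1) ≈ 0.6827 and P(Z > 1.96) = 0.025 used above.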
If X_1 ~ N(μ_1, σ_1) and X_2 ~ N(μ_2, σ_2) are independent:
aX_1 + bX_2 ~ N( aμ_1 + bμ_2, √(a²σ_1² + b²σ_2²) )
aX_1 − bX_2 ~ N( aμ_1 − bμ_2, √(a²σ_1² + b²σ_2²) )
Testing Normality
qqplot(x) in MATLAB (normal quantile–quantile plot)