Lecture 4
Theoretical Probability Distributions
2022-2 Fall Semester
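R code:
library(MASS)  # provides the Pima.tr data set
# Histogram of BMI on the density scale, with a kernel density estimate overlaid
hist(Pima.tr$bmi, col = "coral", main = "BMI of Pima Indian Women", xlab = "BMI", border = "burlywood", breaks = 30, freq = FALSE)
lines(density(Pima.tr$bmi, adjust = 0.7), col = "burlywood", lwd = 3)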
Theoretical Distributions
• Probabilities that are calculated from a finite amount of data are called empirical probabilities
• Probability distributions that are determined from theoretical considerations are called theoretical probability distributions
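To make the distinction concrete, here is a minimal sketch that compares a theoretical probability (heads for a fair coin, 0.5) with the empirical probability computed from a finite number of simulated flips. The number of flips and the seed are arbitrary choices for illustration, not values from the lecture.

```r
# Empirical vs. theoretical probability of heads for a fair coin (illustrative sketch)
set.seed(2022)                        # arbitrary seed, for reproducibility only
n_flips <- 1000                       # hypothetical number of flips
flips <- sample(c("H", "T"), size = n_flips, replace = TRUE)

theoretical_p <- 0.5                  # from theory: a fair coin
empirical_p   <- mean(flips == "H")   # proportion of heads in this finite sample

cat("Theoretical P(heads):", theoretical_p, "\n")
cat("Empirical   P(heads):", empirical_p, "\n")
```

The empirical value changes from sample to sample and approaches the theoretical value as the number of flips grows.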
Bernoulli Distribution
• Consider a dichotomous (two-level) random variable Y.
• By definition, Y must assume one of two possible values:
• Failure or success
• Dead or alive
• Male or Female
• Current smoker or not
• Heads or tails (coin flip)
• A random variable of this type is known as a Bernoulli random variable, and we describe the probability of a response using the parameter 𝜋
Bernoulli Random Variable
• Often coded so that 𝑌 = 1 is called an event or success, and 𝑌 = 0 is
called a failure
• 𝜋 is defined as the probability of success, 𝜋 = 𝑃(𝑌 = 1)
Coin flip: let 𝑌 = 1 if heads and 𝑌 = 0 if tails; then 𝜋 = 𝑃(𝑌 = 1) = 0.5 (and 𝑃(𝑌 = 0) = 1 − 𝜋 = 0.5)
Gender at birth in US: let 𝑌 = 1 if male and 𝑌 = 0 if female; then 𝜋 = 0.512 and 𝑃(𝑌 = 0) = 1 − 𝜋 = 0.488
Bernoulli Distribution
• Y takes value 1 with probability 𝜋 and 0 with probability 1 − 𝜋
• $P(Y = y) = \pi^{y}(1-\pi)^{1-y}$ for $y \in \{0, 1\}$. Calculate $P(Y = 0)$ and $P(Y = 1)$.
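A quick way to check the exercise is to evaluate the Bernoulli formula directly. The sketch below uses a small hypothetical helper `bernoulli_pmf()` and the value 𝜋 = 0.512 from the gender-at-birth example; the variable is named `p` because `pi` is already a built-in constant in R. Base R has no separate Bernoulli function, so the last line uses the binomial function `dbinom()` with `size = 1`, which is the same distribution.

```r
# Bernoulli pmf: P(Y = y) = p^y * (1 - p)^(1 - y), for y in {0, 1}
bernoulli_pmf <- function(y, p) p^y * (1 - p)^(1 - y)   # hypothetical helper

p <- 0.512                          # pi for "male at birth" from the example
print(bernoulli_pmf(1, p))          # P(Y = 1) = pi     -> 0.512
print(bernoulli_pmf(0, p))          # P(Y = 0) = 1 - pi -> 0.488

# A Bernoulli variable is a binomial variable with a single trial (size = 1)
print(dbinom(c(0, 1), size = 1, prob = p))   # 0.488 0.512
```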
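Binomial Distribution
• $X \sim \mathrm{Binomial}(n, \pi) \iff P(X = x) = \binom{n}{x}\pi^{x}(1-\pi)^{n-x}$, for $x = 0, 1, \ldots, n$
• $\pi^{x}(1-\pi)^{n-x}$ accounts for the probability of observing the $x$ successes (e.g., two smokers) in one particular order
• $\binom{n}{x}$ accounts for all the possible ways in which we can have $x$ successes (e.g., two smokers), regardless of order
[Figure: binomial probability distribution; x-axis: No. of successes (0 to 10), y-axis: Density (0.00 to 0.30)]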
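The binomial formula can be checked term by term in R. This sketch assumes hypothetical parameters n = 10 and 𝜋 = 0.2 (chosen only to match the 0 to 10 range of successes; the lecture does not state its parameters). It computes the pmf from the formula with `choose()`, compares it with R's `dbinom()`, and works through the case x = 2.

```r
# Binomial pmf: P(X = x) = choose(n, x) * p^x * (1 - p)^(n - x)
n <- 10      # hypothetical number of trials
p <- 0.2     # hypothetical success probability
x <- 0:n

by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)
by_dbinom  <- dbinom(x, size = n, prob = p)

print(all.equal(by_formula, by_dbinom))   # TRUE: both give the same pmf
print(sum(by_dbinom))                     # probabilities over 0..n sum to 1

# Worked case x = 2: choose(10, 2) = 45 orderings, each with probability 0.2^2 * 0.8^8
print(choose(10, 2) * 0.2^2 * 0.8^8)      # about 0.302

barplot(by_dbinom, names.arg = x, xlab = "No. of successes", ylab = "Probability")
```

The split mirrors the two bullets: `p^x * (1 - p)^(n - x)` is the probability of one specific ordering, and `choose(n, x)` counts the orderings.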
Exercise
R Workshop
• Random number generation
• Calculate a density given a value
• Calculate right tail and left tail areas
• Calculate quantiles
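R names its distribution functions with a shared prefix convention: `r` for random numbers, `d` for the density (pmf for discrete variables), `p` for cumulative (left-tail) probabilities and `q` for quantiles. The sketch below applies all four to a Binomial(10, 0.2) variable; the parameters and the seed are hypothetical. For a discrete variable, note that `lower.tail = FALSE` returns P(X > x), not P(X ≥ x).

```r
n <- 10; p <- 0.2            # hypothetical Binomial(10, 0.2) parameters
set.seed(1)                  # arbitrary seed

draws <- rbinom(5, size = n, prob = p)                       # 5 random numbers
dens  <- dbinom(2, size = n, prob = p)                       # P(X = 2)
left  <- pbinom(2, size = n, prob = p)                       # left tail P(X <= 2)
right <- pbinom(2, size = n, prob = p, lower.tail = FALSE)   # right tail P(X > 2)
q50   <- qbinom(0.5, size = n, prob = p)                     # smallest x with P(X <= x) >= 0.5

print(draws)
print(c(dens = dens, left = left, right = right, q50 = q50))
```

The same pattern (`rnorm`, `dnorm`, `pnorm`, `qnorm`, and so on) applies to the other distributions in base R.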
Continuous Distribution
• So far we have discussed discrete random variables
• Moving on to continuous random variables, we now consider the distribution of a continuous random variable 𝑋
• Suppose 𝑋 represents height
• An individual exactly 163 cm tall is rare
• Theoretically, 𝑋 can assume an infinite number of intermediate values, such as 163.0001 cm or 163.01 cm
• In reality, we measure only discrete values because of the limited precision of our measuring instruments
• Because 𝑋 can take infinitely many values, the distribution of a continuous random variable is represented by a smooth curve, called a (probability) density function
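One way to see why a single exact height is "rare" is to shrink an interval around 163 cm and watch its probability go to zero. This sketch assumes a purely hypothetical height model, X ~ N(160, 6²) in cm; the lecture does not specify one.

```r
# Hypothetical height model: X ~ N(mean = 160, sd = 6), in cm (illustrative only)
mu <- 160; sigma <- 6

# P(163 - h < X < 163 + h) for shrinking half-widths h
h <- c(1, 0.1, 0.01, 0.0001)
prob <- pnorm(163 + h, mean = mu, sd = sigma) - pnorm(163 - h, mean = mu, sd = sigma)
print(data.frame(half_width = h, probability = prob))
# The probability shrinks toward 0 as h -> 0: a single exact value has probability 0
```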
Continuous Distributions
• A continuous distribution describes the probabilities of the possible values of a continuous random variable, whose possible values are infinite and uncountable
• Density functions/curves, like histograms, can have any shape. The total area under a density curve is always 1.
• How do you find the area of interest under the curve?
• Integration!
[Figure: left panel, empirically observed frequency (count the number of values observed); right panel, analytical probability density (area under the curve)]
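The "Integration!" answer can be tried numerically. This sketch uses the standard normal density `dnorm` as an example curve: `integrate()` confirms that the total area is 1 and that the area over an interval matches the closed-form cumulative distribution function `pnorm()`.

```r
# Total area under the standard normal density, by numerical integration
total <- integrate(dnorm, lower = -Inf, upper = Inf)
print(total$value)                # 1, up to numerical error

# Area between -1 and 1: numerical integration vs. the closed-form CDF
area_int <- integrate(dnorm, lower = -1, upper = 1)$value
area_cdf <- pnorm(1) - pnorm(-1)
print(c(integrate = area_int, pnorm = area_cdf))   # both about 0.6827
```

In practice the CDF functions (`pnorm`, `pexp`, ...) are used because they already encode the integral.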
Normal Distributions
[Figure: normal density curves with σ = 1, 2, 3 around a common mean μ]
• σ affects the width of the curve
• μ affects the position of the center of the curve
• The SD measures the distance from the mean to the point of inflection
• About 68% of the data fall within 1 SD of the mean
• About 5% of the data lie more than 2 SD from the mean, i.e., roughly 2.5% in each tail
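These percentages can be checked with `pnorm()`. Because they are stated in SD units, the standard normal gives the same answers as any other normal distribution.

```r
# Share of a normal distribution within / beyond k SDs of the mean (any mu, sigma)
within_1sd <- pnorm(1) - pnorm(-1)   # about 0.683
beyond_2sd <- 2 * pnorm(-2)          # both tails together, about 0.0455
one_tail   <- pnorm(-2)              # a single tail, about 0.0228
print(round(c(within_1sd = within_1sd, beyond_2sd = beyond_2sd, one_tail = one_tail), 4))
```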
Finding tail probabilities (P given x)
P(X < x1) = …   P(X < x2) = …
[Figure: normal density curve with x1 and x2 marked on the horizontal axis]
In R, pnorm(q = x1, mean = …, sd = …)
Input: quantile and parameters
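A short sketch of `pnorm()` for left tails, right tails and the area between two points. The parameters (μ = 100, σ = 15) and the cut-offs x1 = 85, x2 = 130 are hypothetical values chosen so the answers are easy to recognize (x1 is 1 SD below the mean, x2 is 2 SD above). Note that `pnorm`'s first argument is named `q`.

```r
mu <- 100; sigma <- 15        # hypothetical parameters (not from the lecture)
x1 <- 85;  x2 <- 130          # hypothetical cut-off values

p_left_x1  <- pnorm(q = x1, mean = mu, sd = sigma)                      # P(X < x1)
p_left_x2  <- pnorm(q = x2, mean = mu, sd = sigma)                      # P(X < x2)
p_right_x2 <- pnorm(q = x2, mean = mu, sd = sigma, lower.tail = FALSE)  # P(X > x2)
p_between  <- p_left_x2 - p_left_x1                                     # P(x1 < X < x2)
print(round(c(p_left_x1, p_left_x2, p_right_x2, p_between), 4))
```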
Finding quantiles (x given P)
P(X < x…) = P1
[Figure: normal curves with shaded areas labelled P1, Pk, and 1 − Pk]
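Going the other way, `qnorm()` takes a probability and returns the quantile; it is the inverse of `pnorm()`. In this sketch the parameters, the left-tail area P1 = 0.90 and the tail area Pk = 0.025 are hypothetical values for illustration.

```r
mu <- 100; sigma <- 15                     # hypothetical parameters
P1 <- 0.90                                 # hypothetical left-tail area

x_left  <- qnorm(p = P1, mean = mu, sd = sigma)                          # x with P(X < x) = P1
x_right <- qnorm(p = 0.05, mean = mu, sd = sigma, lower.tail = FALSE)    # x with P(X > x) = 0.05

# Central interval leaving a hypothetical Pk = 0.025 in each tail
Pk <- 0.025
central <- qnorm(c(Pk, 1 - Pk), mean = mu, sd = sigma)

print(c(x_left = x_left, x_right = x_right))
print(central)
print(pnorm(x_left, mean = mu, sd = sigma))   # recovers P1 = 0.9
```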
• Z ~ N(0, 1)
• When we standardize by computing z-scores, $z = (x - \mu)/\sigma$, we transform the normal distribution by shifting its location (the mean moves to 0) and changing its scale (the SD becomes 1)
• Check Workshop!
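A minimal standardization sketch, with hypothetical values μ = 100, σ = 15 and x = 130. It shows that a left-tail probability is the same whether computed on the original scale or on the Z scale, and that standardizing a simulated sample gives mean 0 and SD 1.

```r
mu <- 100; sigma <- 15       # hypothetical parameters
x  <- 130                    # hypothetical observed value

z <- (x - mu) / sigma        # z-score: (130 - 100) / 15 = 2

# The same left-tail probability on the original scale and on the Z ~ N(0, 1) scale
p_original <- pnorm(x, mean = mu, sd = sigma)
p_standard <- pnorm(z)       # mean = 0 and sd = 1 are pnorm's defaults
print(c(z = z, p_original = p_original, p_standard = p_standard))

# Standardizing a (simulated) sample moves its mean to 0 and its SD to 1
set.seed(4)                  # arbitrary seed
sample_x <- rnorm(200, mean = mu, sd = sigma)
z_sample <- (sample_x - mean(sample_x)) / sd(sample_x)
print(c(mean = round(mean(z_sample), 10), sd = sd(z_sample)))
```

Here the sample is standardized with its own mean and SD, which is what base R's `scale()` does by default.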
Quantile-Quantile plot (Q-Q plot)
• A Q-Q plot compares two probability distributions by plotting their quantiles against each other.
• Many statistical methods are developed under the assumption of normality
• A Q-Q plot used to check normality is called a normal Q-Q plot
• Suppose we have data and plan to apply a statistical method that assumes normality
• We need to check whether the method is appropriate for our data
• Try a normal Q-Q plot: a scatterplot of the quantiles from the data against the theoretical quantiles of the normal distribution
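Base R draws normal Q-Q plots with `qqnorm()` and adds a reference line with `qqline()`. The sketch below uses simulated data only (a normal sample and a right-skewed exponential sample, with arbitrary parameters and seed) to contrast what the plot looks like when the normality assumption holds and when it does not; the same calls apply to any numeric data vector.

```r
set.seed(7)                                     # arbitrary seed
normal_data <- rnorm(100, mean = 50, sd = 10)   # simulated, roughly normal
skewed_data <- rexp(100, rate = 1)              # simulated, right-skewed

op <- par(mfrow = c(1, 2))
qqnorm(normal_data, main = "Normal data"); qqline(normal_data)   # points close to the line
qqnorm(skewed_data, main = "Skewed data"); qqline(skewed_data)   # systematic curvature
par(op)

# qqnorm() can also return the coordinates without drawing the plot
qq <- qqnorm(normal_data, plot.it = FALSE)   # qq$x: theoretical, qq$y: sample quantiles
print(cor(qq$x, qq$y))                       # close to 1 for normal-looking data
```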
Example: Annual Precipitation in US Cities
The average amount of rainfall in inches for each of 70 US cities