Probability deals with predicting the likelihood of future events. Statistics involves the analysis of the frequency of past events.
Example 4.1: Consider a drawer containing 100 socks: 30 red, 20 blue, and 50 black.
We can use probability to answer questions about the selection of a
random sample of these socks.
PQ1. What is the probability that we draw two blue socks or two red socks from
the drawer?
PQ2. What is the probability that, if we pull out three socks, we have a matching pair?
PQ3. What is the probability that we draw five socks and they are all black?
SQ1: A random sample of 10 socks from the drawer produced one blue, four red, and five
black socks. What is the total population of black, blue, and red socks in the drawer?
SQ2: We randomly sample 10 socks, record the number of black socks, and then return
the socks to the drawer. This process is repeated five times. The mean number of black
socks over these trials is 7. What is the true number of black socks in the drawer?
etc.
The probability that the random variable takes a given value can be computed
using the rules governing probability.
For example, the probability that 𝑋 = 1 (i.e., either the mother or the father, but not both,
has had measles) is 0.32. Symbolically, it is denoted as P(X = 1) = 0.32
@DSamanta, IIT Kharagpur Data Analytics (CS61061) 7
Probability Distribution
Definition 4.2: Probability distribution
A probability distribution is an assignment of probabilities to the values of a
random variable.
Example 4.4: Given that 0.2 is the probability that a person (aged between
20 and 40) has had childhood measles, the probability distribution is given
by
X    Probability
0    0.64
1    0.32
2    0.04

In general, a discrete probability distribution is tabulated as

𝒙                  𝒙𝟏      𝒙𝟐     ……     𝒙𝒏
𝑓(𝑥) = 𝑃(𝑋 = 𝑥)    𝑓(𝑥1)   𝑓(𝑥2)  ……     𝑓(𝑥𝑛)

Here,

𝒙       0      1      2
𝑓(𝑥)    0.64   0.32   0.04

[Figure: bar chart of f(x) against x with bars of height 0.64, 0.32, and 0.04]
Simulation studies can often eliminate the need for costly experiments, and they are
also often used to study problems where actual experimentation is impossible.
Examples 4.5:
1) In a study testing the effectiveness of a new drug, the number of cured
patients among all the patients who use the drug approximately follows a
binomial distribution.
Thus,

𝑃(𝑋 = 0) = (2!/(0!(2 − 0)!)) (0.2)^0 (0.8)^(2−0) = 0.64

𝑃(𝑋 = 1) = (2!/(1!(2 − 1)!)) (0.2)^1 (0.8)^(2−1) = 0.32

𝑃(𝑋 = 2) = (2!/(2!(2 − 2)!)) (0.2)^2 (0.8)^(2−2) = 0.04

X    Probability
0    0.64
1    0.32
2    0.04
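The three probabilities above can be reproduced with a short script; this is a sketch, and the helper name `binom_pmf` is ours, not from the slides (n = 2 parents, p = 0.2 as in the measles example):

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# n = 2 parents, p = 0.2 probability of childhood measles
for x in range(3):
    print(f"P(X = {x}) = {binom_pmf(x, 2, 0.2):.2f}")
```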
15 38 68 39 49 54 19 79 38 14
If the value of the digit is 0 or 1, the outcome is “had childhood measles”, otherwise,
(digits 2 to 9), the outcome is “did not”.
For example, the first pair (i.e., 15) represents a couple, and for this couple x = 1. The
frequency distribution for this sample is
x 0 1 2
f(x)=P(X=x) 0.7 0.3 0.0
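The random-digit scheme above can be simulated at scale. A sketch (the helper name `simulate_couples` is ours): draw one digit per parent, count digits 0 or 1 as "had childhood measles", and tally X over many couples.

```python
import random

random.seed(42)  # for reproducibility

def simulate_couples(n_couples, n_parents=2):
    """Empirical distribution of X = number of parents with childhood measles."""
    counts = [0] * (n_parents + 1)
    for _ in range(n_couples):
        # one random digit 0-9 per parent; digits 0 or 1 mean "had measles" (p = 0.2)
        x = sum(1 for _ in range(n_parents) if random.randrange(10) < 2)
        counts[x] += 1
    return [c / n_couples for c in counts]

freqs = simulate_couples(100_000)
print(freqs)  # approaches the theoretical distribution [0.64, 0.32, 0.04]
```

With a large number of simulated couples the empirical frequencies converge to the binomial probabilities computed earlier; the 10-couple sample in the slide (0.7, 0.3, 0.0) illustrates how small samples fluctuate around them.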
where (𝑛; 𝑥1, 𝑥2, ……, 𝑥𝑘) = 𝑛! / (𝑥1! 𝑥2! …… 𝑥𝑘!)
Example 4.9:
Probability of observing three red cards in 5 draws from an ordinary deck of 52
playing cards.
− You draw one card, note the result, and then return it to the deck of cards
− Reshuffle the deck well before the next draw is made
• The hypergeometric distribution does not require independence and is based on the
sampling done without replacement.
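The contrast between the two sampling schemes can be computed directly. A sketch: with replacement each draw is red with probability 26/52, so X is binomial; without replacement X is hypergeometric.

```python
from math import comb

# With replacement (Example 4.9): draws are independent, X ~ Binomial(n=5, p=26/52)
p_binom = comb(5, 3) * (26 / 52) ** 3 * (26 / 52) ** 2
print(f"{p_binom:.4f}")  # 0.3125

# Without replacement: X follows a hypergeometric distribution
p_hyper = comb(26, 3) * comb(26, 2) / comb(52, 5)
print(f"{p_hyper:.4f}")
```

The two answers are close here because 5 draws are a small fraction of 52 cards; they diverge as the sample size approaches the population size.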
Example 4.10:
Number of customers visiting a ticket selling
counter in a railway station.
𝑓(𝑥, 𝜆𝑡) = 𝑃(𝑋 = 𝑥) = 𝑒^(−𝜆𝑡) ∙ (𝜆𝑡)^𝑥 / 𝑥!,   𝑥 = 0, 1, ……
where 𝜆 is the average number of outcomes per unit time and 𝑒 = 2.71828 …
What is P(X = x) if t = 0?
Example 4.11:
The number of customers arriving at a grocery store can be modelled by a
Poisson process with intensity λ=10 customers per hour.
1. Find the probability that there are 2 customers between 10:00 and 10:20.
2. Find the probability that there are 3 customers between 10:00 and 10:20 and
7 customers between 10:20 and 11:00.
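Both parts of Example 4.11 follow from the Poisson pmf; part 2 additionally uses the fact that counts over disjoint intervals of a Poisson process are independent. A sketch (20 minutes = 1/3 hour, 40 minutes = 2/3 hour):

```python
from math import exp, factorial

def poisson_pmf(x, lam_t):
    """P(X = x) = e^(-lam*t) * (lam*t)^x / x!"""
    return exp(-lam_t) * lam_t**x / factorial(x)

lam = 10  # customers per hour

# 1) Two customers between 10:00 and 10:20: lam*t = 10 * (1/3)
p1 = poisson_pmf(2, lam / 3)
print(f"P(2 in 20 min) = {p1:.4f}")

# 2) Disjoint intervals are independent in a Poisson process:
#    3 customers in (10:00, 10:20] AND 7 customers in (10:20, 11:00]
p2 = poisson_pmf(3, lam / 3) * poisson_pmf(7, lam * 2 / 3)
print(f"P(3 in 20 min and 7 in 40 min) = {p2:.4f}")
```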
The Poisson Distribution Curves
[Figure: Poisson probability curves for different values of 𝜆𝑡]
Descriptive measures
1. Binomial distribution
𝜇 = 𝑛 ∙ 𝑝
𝜎^2 = 𝑛𝑝(1 − 𝑝)
2. Hypergeometric distribution
The hypergeometric distribution is characterized by the sample size (𝑛), the
population size (𝑁), and the number of items labelled success (𝑘). Then

𝜇 = 𝑛𝑘/𝑁

𝜎^2 = ((𝑁 − 𝑛)/(𝑁 − 1)) ∙ 𝑛 ∙ (𝑘/𝑁) ∙ (1 − 𝑘/𝑁)
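These two formulas can be checked numerically by summing over the hypergeometric pmf; the values N = 50, k = 20, n = 10 below are illustrative, not from the slides.

```python
from math import comb

def hyper_pmf(x, N, k, n):
    """P(X = x) when drawing n items without replacement from N items, k of which are successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

N, k, n = 50, 20, 10  # illustrative values
xs = range(max(0, n - (N - k)), min(n, k) + 1)  # support of X
mean = sum(x * hyper_pmf(x, N, k, n) for x in xs)
var = sum((x - mean) ** 2 * hyper_pmf(x, N, k, n) for x in xs)

print(mean)  # matches n*k/N
print(var)   # matches ((N-n)/(N-1)) * n * (k/N) * (1 - k/N)
```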
Descriptive measures
3. Poisson Distribution
The Poisson distribution is characterized by 𝜆𝑡, where 𝜆 is the mean number
of outcomes per unit time and 𝑡 is the time interval.

𝜇 = 𝜆𝑡

𝜎^2 = 𝜆𝑡

𝑃(𝑋 = 𝑥) = 𝑒^(−𝜇) ∙ 𝜇^𝑥 / 𝑥!
𝑓(𝑥) = 1/𝑛

where f(x) represents the probability mass function and 𝑛 is the number of equally likely values.

[Figures: a discrete probability distribution (probability masses at 𝑥1, 𝑥2, 𝑥3, 𝑥4) and a continuous probability distribution (a smooth density curve)]
Examples 4.13:
1. Tax to be paid for a purchase in a shopping mall. Here, the random
variable varies from 0 to +∞
2. Amount of rainfall in mm in a region.
3. Earthquake intensity in Richter scale.
4. Height of a point on the earth's surface. Here, the random variable varies from −𝑎
to +𝑏, 𝑎, 𝑏 ∈ 𝑅, where 𝑅 is the set of real numbers.
[Figure: uniform density 𝑓(𝑥) = 1/(𝐵 − 𝐴) over the interval (A, B), with a sub-interval (c, d) shaded]

Note:
a) ∫_𝐴^𝐵 𝑓(𝑥) 𝑑𝑥 = (1/(𝐵 − 𝐴)) × (𝐵 − 𝐴) = 1
b) 𝑃(𝑐 < 𝑥 < 𝑑) = (𝑑 − 𝑐)/(𝐵 − 𝐴), where both 𝑐 and 𝑑 are in the interval (A, B)
c) 𝜇 = (𝐴 + 𝐵)/2
d) 𝜎^2 = (𝐵 − 𝐴)^2/12
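A minimal numerical sketch of these four facts; the interval (2, 10) and sub-interval (3, 5) are illustrative choices, not from the slides.

```python
# Continuous uniform distribution on (A, B): f(x) = 1/(B - A)
A, B = 2.0, 10.0   # illustrative interval
c, d = 3.0, 5.0    # sub-interval inside (A, B)

p = (d - c) / (B - A)      # b) P(c < x < d)
mu = (A + B) / 2           # c) mean
var = (B - A) ** 2 / 12    # d) variance
total = (1 / (B - A)) * (B - A)  # a) total probability

print(p, mu, var, total)
```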
What is the probability of waiting exactly five minutes, that is, P(X = 5)?
Note: Probability is represented by the area under the curve. The probability of a
specific value of a continuous random variable is zero, because the area under a
single point is zero.
𝑓(𝑥) = (1/(𝜎√(2𝜋))) 𝑒^(−(𝑥 − 𝜇)^2 / (2𝜎^2)),   −∞ < 𝑥 < ∞
Note:
This f(x) is called the probability density function.
It gives a nonnegative value.
This is used to define the random variable’s probability of coming within a distinct
range of values, as opposed to taking on any one value.
Probability Distribution Function
𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥
Continuous Probability Distributions
Example 4.15:
Suppose bacteria of a certain species typically live 4 to 6 hours. The
probability that a bacterium lives exactly 5 hours is equal to zero. A
lot of bacteria live for approximately 5 hours, but there is no chance
that any given bacterium dies at exactly 5.0000000000... hours.
However, the probability that the bacterium dies between 5 hours and
5.01 hours is quantifiable.
Suppose, the answer is 0.02 (i.e., 2%). Then, the probability that the
bacterium dies between 5 hours and 5.001 hours should be about
0.002, since this time interval is one-tenth as long as the previous.
The probability that the bacterium dies between 5 hours and 5.0001
hours should be about 0.0002, and so on.
Note:
A PDF must be integrated over an interval to yield a probability.
1. 𝑓(𝑥) ≥ 0, for all 𝑥 ∈ 𝑅
2. ∫_{−∞}^{∞} 𝑓(𝑥) 𝑑𝑥 = 1
3. 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥
4. 𝜇 = ∫_{−∞}^{∞} 𝑥 𝑓(𝑥) 𝑑𝑥
5. 𝜎^2 = ∫_{−∞}^{∞} (𝑥 − 𝜇)^2 𝑓(𝑥) 𝑑𝑥

[Figure: density curve f(x) with the area between a and b shaded]
Note: Probability is represented by area under the curve.
The probability of a specific value of a continuous random variable is zero.
[Figure: a normal density curve with mean 𝜇 and standard deviation 𝜎]
[Figures: normal curves with µ1 < µ2 and σ1 = σ2; normal curves with µ1 = µ2 and σ1 < σ2; normal curves with µ1 < µ2 and σ1 < σ2]
Here, both drugs have the same mean, but drug B would be the better
medication, as fewer patients on this distribution have very high or very
low glucose levels.
Properties of Normal Distribution
The curve is symmetric about a vertical axis through the mean 𝜇.
The random variable 𝑥 can take any value from −∞ 𝑡𝑜 ∞.
The most frequently used descriptive parameters, the mean 𝜇 and the standard deviation 𝜎, define the curve itself.
The mode, which is the point on the horizontal axis where the curve is a
maximum, occurs at 𝑥 = 𝜇.
The total area under the curve and above the horizontal axis is equal to 1.
∫_{−∞}^{∞} 𝑓(𝑥) 𝑑𝑥 = (1/(𝜎√(2𝜋))) ∫_{−∞}^{∞} 𝑒^(−(𝑥 − 𝜇)^2/(2𝜎^2)) 𝑑𝑥 = 1

𝜇 = ∫_{−∞}^{∞} 𝑥 𝑓(𝑥) 𝑑𝑥 = (1/(𝜎√(2𝜋))) ∫_{−∞}^{∞} 𝑥 𝑒^(−(𝑥 − 𝜇)^2/(2𝜎^2)) 𝑑𝑥

𝜎^2 = (1/(𝜎√(2𝜋))) ∫_{−∞}^{∞} (𝑥 − 𝜇)^2 𝑒^(−(𝑥 − 𝜇)^2/(2𝜎^2)) 𝑑𝑥

𝑃(𝑥1 < 𝑥 < 𝑥2) = (1/(𝜎√(2𝜋))) ∫_{𝑥1}^{𝑥2} 𝑒^(−(𝑥 − 𝜇)^2/(2𝜎^2)) 𝑑𝑥

denotes the probability of x in the interval (𝑥1, 𝑥2). [Figure: area under the normal curve between 𝑥1 and 𝑥2]
𝑥 = 𝜇 + 𝑧. 𝜎
Approximately, 68% of the distribution falls within the ±1 standard deviation of the mean.
Approximately, 95% of the distribution falls within the ±2 standard deviation of the mean.
Approximately, 99.7% of the distribution falls within the ±3 standard deviation of the mean.
Note:
These proportions hold true for every normal distribution
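Because these proportions hold for every normal distribution, they can be checked once on the standard normal: P(|Z| < z) = erf(z/√2). A minimal sketch using only the standard library:

```python
from math import erf, sqrt

def prob_within(z):
    """P(mu - z*sigma < X < mu + z*sigma) for any normal X, i.e. P(|Z| < z)."""
    return erf(z / sqrt(2))

for z in (1, 2, 3):
    print(f"within ±{z} standard deviations: {prob_within(z):.4f}")
```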
𝑥 = 𝜇 + 𝑧. 𝜎
68% of the population will have a resting heart rate between 60 and 80.
95% of the population will have a heart rate between approximately 70 ± (2×10), that is, 50 and 90
beats/min.
• The nearest figure to 5% (0.05) in the table is 0.0495; the z-score corresponding to this is 1.65.
• The corresponding heart rate lies 1.65 standard deviations above the mean; that is, it equals
70 + 1.65 × 10 = 86.5
• We can conclude that 5% of this population has a heart rate above 86.5 beats/min.
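The heart-rate figures above (mean 70, standard deviation 10 beats/min, as given in the example) can be reproduced with the normal CDF via the error function:

```python
from math import erf, sqrt

mu, sigma = 70, 10  # resting heart rate: mean 70, sd 10 beats/min

def norm_cdf(x, mu, sigma):
    """Normal CDF evaluated through the z-transformation z = (x - mu)/sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

# Fraction of the population between 60 and 80 beats/min (±1 sd)
p_68 = norm_cdf(80, mu, sigma) - norm_cdf(60, mu, sigma)
print(p_68)  # ≈ 0.68

# Heart rate exceeded by only about 5% of the population: mu + 1.65*sigma
threshold = mu + 1.65 * sigma
print(threshold, 1 - norm_cdf(threshold, mu, sigma))  # 86.5 and ≈ 0.05
```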
Computing 𝑃(𝑥1 < 𝑥 < 𝑥2) directly from the normal density is computationally costly.
Instead, we transform 𝑥 to the standard normal variable:

z = (𝑥 − 𝜇)/𝜎   [z-transformation]
f(z; 0, 1) = (1/√(2𝜋)) 𝑒^(−𝑧^2/2)
[Figures: a normal density f(x; µ, σ) centred at x = µ, and the standard normal density f(z; 0, 1) with µ = 0 and σ = 1]
Standard Normal Distribution: Example
Example 4.19:
Hint: Find the probability that x is between 50 and 70 or P( 50< x < 70).
You can use both normal and z-distribution tables.
The probability that a patient will survive between 50 and 70 years is equal to 0.4082.
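The example's 𝜇 and 𝜎 are not shown in this excerpt; the values 𝜇 = 50 and 𝜎 = 15 in the sketch below are purely illustrative assumptions that approximately reproduce the stated answer of 0.4082.

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """Normal CDF via the z-transformation and the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Assumed (illustrative) parameters: mean 50 years, sd 15 years
mu, sigma = 50, 15
p = norm_cdf(70, mu, sigma) - norm_cdf(50, mu, sigma)
print(round(p, 4))  # close to the stated 0.4082
```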
Γ(𝛼) = (𝛼 − 1) ∫_0^∞ 𝑥^(𝛼−2) 𝑒^(−𝑥) 𝑑𝑥
     = (𝛼 − 1) Γ(𝛼 − 1)

For a positive integer 𝛼 = 𝑛, repeated application gives
Γ(𝑛) = (𝑛 − 1)(𝑛 − 2) ……… 3 ∙ 2 ∙ 1
     = (𝑛 − 1)!

Further, Γ(1) = ∫_0^∞ 𝑒^(−𝑥) 𝑑𝑥 = 1

Note:
Γ(1/2) = √𝜋   [An important property]
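All four gamma-function facts above can be checked with the standard library's `math.gamma`:

```python
from math import gamma, factorial, pi, sqrt, isclose

# Recurrence: Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1)
print(isclose(gamma(4.5), 3.5 * gamma(3.5)))   # True

# For a positive integer n, Gamma(n) = (n - 1)!
print(isclose(gamma(5), factorial(4)))         # True: both equal 24

# Gamma(1) = 1 and Gamma(1/2) = sqrt(pi)
print(gamma(1))                                # 1.0
print(isclose(gamma(0.5), sqrt(pi)))           # True
```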
[Figure: gamma density curves 𝑓(𝑥) for 𝛽 = 1 and shape parameter values 1, 2, and 4]
Note:
1) The mean and variance of gamma distribution are
𝜇 = 𝛼𝛽
𝜎 2 = 𝛼𝛽 2
2) The mean and variance of exponential distribution are
𝜇=𝛽
𝜎 2 = 𝛽2
For the chi-squared distribution with 𝑣 degrees of freedom, 𝜇 = 𝑣 and 𝜎^2 = 2𝑣.
𝑓(𝑥; 𝜇, 𝜎) = (1/(𝜎𝑥√(2𝜋))) 𝑒^(−(ln 𝑥 − 𝜇)^2/(2𝜎^2)),   𝑥 ≥ 0
𝑓(𝑥; 𝜇, 𝜎) = 0,   𝑥 < 0
𝜎^2 = 𝛼^(−2/𝛽) {Γ(1 + 2/𝛽) − [Γ(1 + 1/𝛽)]^2}
F distribution
Note:
A sample statistic is a random variable and, like any other random variable, a sample
statistic has a probability distribution.
Note: The name “probability distribution”, as used for a random variable, is not applied to a
sample statistic; its distribution is instead called a sampling distribution.
Sampling Distribution
More precisely, sampling distributions are probability distributions and are used to describe
the variability of sample statistics.
The probability distribution of the sample mean (hereafter denoted as X̄) is called
the sampling distribution of the mean (also referred to as the distribution of the sample
mean).
Like X̄, the sample variance has a sampling distribution, called the sampling
distribution of the variance (denoted as 𝑆^2).
Using the values of X̄ and 𝑆^2 for different random samples of a population, we
make inferences about the parameters 𝜇 and 𝜎^2 (of the population).
Theorem on Sampling Distribution
Famous theorem in Statistics
This can be further established with the famous “Central Limit Theorem”, which is
stated below.
If random samples, each of size 𝑛, are taken from any distribution with mean μ
and variance 𝜎^2, the sample mean X̄ will have a distribution approximately
normal with mean μ and variance 𝜎^2/𝑛.
[Figure: sampling distributions of X̄ for n = 1, small to moderate n, and large n]
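The theorem can be illustrated by simulation. A sketch, using a uniform population on [0, 1] as an (arbitrary, decidedly non-normal) choice; its mean is μ = 0.5 and variance σ² = 1/12, so for n = 30 the sample means should cluster around 0.5 with variance (1/12)/30:

```python
import random

random.seed(0)  # for reproducibility

# Population: uniform on [0, 1], with mu = 0.5 and sigma^2 = 1/12
n, trials = 30, 20_000
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

grand_mean = sum(means) / trials
var_of_means = sum((m - grand_mean) ** 2 for m in means) / trials

print(grand_mean)    # close to mu = 0.5
print(var_of_means)  # close to sigma^2 / n = (1/12)/30
```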
where 𝑍𝑖 = (𝑋𝑖 − μ)/𝜎.
This 𝜒 2 is also a random variable of a distribution and is called 𝜒 2 -
distribution (pronounced as Chi-square distribution).
𝜒^2 = (𝑛 − 1)𝑆^2/𝜎^2 = Σ_{𝑖=1}^{𝑛} ((𝑥𝑖 − 𝑥̄)/𝜎)^2
This expression for χ2 describes the distribution of the 𝑛 samples; it has
degrees of freedom 𝜐 = 𝑛 − 1 and is often written as 𝜒^2(𝜐), where 𝜐 is its only
parameter.
4. In such a situation, the only measure of the standard deviation available may be the
sample standard deviation 𝑆.
5. It is natural, then, to substitute 𝑆 for 𝜎. The problem is that the resulting statistic is not
necessarily normally distributed!
𝑡(𝑣) = 𝑍 / √(𝜒^2(𝑣)/𝑣)
Using this definition, we can develop the sampling distribution of the sample mean when
the population variance, 𝜎 2 is unknown.
That is,

z = (X̄ − 𝜇)/(𝜎/√𝑛) has the standard normal distribution.

𝜒^2 = (𝑛 − 1)𝑆^2/𝜎^2 has the 𝜒^2 distribution with (𝑛 − 1) degrees of freedom.

Thus, 𝑡 = [(X̄ − 𝜇)/(𝜎/√𝑛)] / √([(𝑛 − 1)𝑆^2/𝜎^2]/(𝑛 − 1))   or   𝑡 = (X̄ − 𝜇)/(𝑆/√𝑛)
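A minimal sketch of the final formula 𝑡 = (X̄ − 𝜇)/(𝑆/√𝑛); the sample values and the hypothesized mean 𝜇 = 5 below are hypothetical, chosen only for illustration:

```python
from math import sqrt

# Hypothetical sample and hypothesized population mean
sample = [4.8, 5.1, 5.4, 4.9, 5.3, 5.0, 5.2, 4.7]
mu = 5.0

n = len(sample)
x_bar = sum(sample) / n
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # sample variance S^2
t = (x_bar - mu) / (sqrt(s2) / sqrt(n))               # t with n - 1 = 7 degrees of freedom

print(x_bar, t)
```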
Corollary: Recall that 𝜒^2 = (𝑛 − 1)𝑆^2/𝜎^2 is the Chi-squared distribution with (𝑛 − 1) degrees
of freedom.
Therefore, if we assume that we have a sample of size 𝑛1 from a population with variance
𝜎1^2 and an independent sample of size 𝑛2 from another population with variance 𝜎2^2,
then the statistic

𝐹 = (𝑆1^2/𝜎1^2) / (𝑆2^2/𝜎2^2)
𝒕 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
It is used when the population variance is not known. In this case, we use the
variance of the sample as an estimate of the population variance.
𝝌𝟐 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
It is used for comparing a sample variance to a theoretical population
variance.
𝑭 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
It is used for comparing the variance of two or more populations.