SSD5 Oct 14

Descriptive Statistics
A More Rigorous Discussion

Statistic vs Statistics
• Statistic: Any function of sample observations

• Statistics (as a plural noun): numerical data of
any kind
• STATISTICS (as a singular noun): A collection of
specialized scientific methods for the collection,
analysis and interpretation of numerical data
Statistical Structures in Data, PGDBA Programme, ISI, 2022 October 14, 2022
Descriptive Statistics
• Based on sample moments
• Based on sample quartiles
Sample Moments
• Let X denote the random variable of interest defined over a population of interest
• Suppose X is measured on every member of a random sample of size n drawn from the population
• Let $x_1, x_2, \dots, x_n$ represent these n observations
• Sample moments are special types of statistics. They are of two kinds:
  • Raw moments
  • Central moments
Raw Moments
• For the given set of observations, the raw moment of order r is defined as
$$m'_r = \frac{1}{n}\sum_{i=1}^{n} x_i^r, \quad r = 0, 1, 2, \dots$$
• Note that $m'_0 = 1$ and $m'_1 = \bar{x}$, the sample mean
Central Moments
• For the given set of observations, the central moment of order r is defined as
$$m_r = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - m'_1\right)^r, \quad r = 0, 1, 2, \dots$$
• Note that $m_1 = 0$ and $m_2 = s^2$, the sample variance (with divisor n)
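The next snippet is a quick numerical check of these two definitions. It is a minimal Python sketch: the function names `raw_moment` and `central_moment` are hypothetical, and the data are made up for illustration. It uses `fractions.Fraction` so that the identities $m'_0 = 1$ and $m_1 = 0$ hold exactly rather than only up to rounding.

```python
from fractions import Fraction

def raw_moment(xs, r):
    """Sample raw moment m'_r = (1/n) * sum of x_i**r, computed exactly."""
    return sum(Fraction(x) ** r for x in xs) / len(xs)

def central_moment(xs, r):
    """Sample central moment m_r = (1/n) * sum of (x_i - m'_1)**r."""
    mean = raw_moment(xs, 1)
    return sum((Fraction(x) - mean) ** r for x in xs) / len(xs)

# Hypothetical data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(raw_moment(data, 0))      # 1
print(raw_moment(data, 1))      # 5  (the sample mean)
print(central_moment(data, 1))  # 0
print(central_moment(data, 2))  # 4  (the variance with divisor n)
```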
Moment-based Descriptive Measures or Statistics
• Measures of Central Tendency
  • Sample mean: $m'_1 = \bar{x}$
• Measures of Variation or Dispersion
  • Variance: $s^2 = m_2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - m'_1\right)^2$
  • Standard deviation: $s = \sqrt{m_2}$
  • Coefficient of variation: $c_v = \frac{\sqrt{m_2}}{m'_1} = \frac{s}{\bar{x}}$
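Moment-based Descriptive Measures or Statistics (contd.)
• Measures of Skewness
  • $b_1 = \frac{m_3^2}{m_2^3}$ and $g_1 = \frac{m_3}{m_2^{3/2}}$, so that $g_1^2 = b_1$ and $g_1$ has the sign of $m_3$
• Measures of Kurtosis
  • $b_2 = \frac{m_4}{m_2^2} = \frac{m_4}{s^4}$ and $g_2 = b_2 - 3$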
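These measures can be computed directly from the data. The sketch below assumes the divisor-n moments defined above, not the $n-1$ divisor that many software packages use for the variance. `moment_measures` is a hypothetical helper name.

```python
import math

def moment_measures(xs):
    """Moment-based descriptive measures; every moment uses divisor n."""
    n = len(xs)
    mean = sum(xs) / n                                        # m'_1
    m2, m3, m4 = (sum((x - mean) ** r for x in xs) / n for r in (2, 3, 4))
    s = math.sqrt(m2)
    return {
        "mean": mean,
        "variance": m2,
        "sd": s,
        "cv": s / mean,          # undefined when the mean is 0
        "b1": m3 ** 2 / m2 ** 3,
        "g1": m3 / m2 ** 1.5,
        "b2": m4 / m2 ** 2,
        "g2": m4 / m2 ** 2 - 3,
    }

print(moment_measures([2, 4, 4, 4, 5, 5, 7, 9]))
```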
Quartile-based Descriptive Statistics
• Measure of Central Tendency
  • Median: $Q_2$
• Measures of Dispersion
  • Interquartile range: $IQR = Q_3 - Q_1$
  • Mean absolute deviation about the median: $\frac{1}{n}\sum_{i=1}^{n}\left|x_i - Q_2\right|$
• Bowley's Measure of Skewness
$$S_k = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
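The snippet below is a sketch of the quartile-based measures using Python's standard `statistics.quantiles`. It assumes that function's default `method="exclusive"` convention for $Q_1$ and $Q_3$. Textbooks and packages interpolate quartiles in different ways, so results may differ slightly from hand calculations. `quartile_measures` is a hypothetical name.

```python
import statistics

def quartile_measures(xs):
    # statistics.quantiles uses method="exclusive" by default; other
    # quartile conventions give slightly different Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    return {
        "median": q2,
        "iqr": iqr,
        "mad_median": sum(abs(x - q2) for x in xs) / len(xs),
        "bowley": ((q3 - q2) - (q2 - q1)) / iqr,   # undefined if iqr == 0
    }

print(quartile_measures([1, 2, 3, 4, 5, 6, 7, 8, 9]))       # symmetric data
print(quartile_measures([1, 2, 3, 4, 10, 20, 30, 40, 50]))  # long right tail
```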
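Rationale for Bowley's Measure
[Figure: relative positions of $Q_1$, $Q_2$ and $Q_3$ for a negatively skewed, a symmetric (not skewed) and a positively skewed distribution]
In a symmetric distribution the median lies midway between the two quartiles, so Bowley's measure is 0. It is positive when $Q_3 - Q_2$ exceeds $Q_2 - Q_1$ and negative in the opposite case. Since $Q_1 \le Q_2 \le Q_3$, it always lies between $-1$ and $1$.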
Other Descriptive Measures
• Central Tendency
  • Mode
• Pearson's Measures of Skewness
  • First measure: $\frac{\text{Mean} - \text{Mode}}{s}$
  • Second measure: $\frac{3(\text{Mean} - \text{Median})}{s}$
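Here are Pearson's two measures in code. This sketch assumes that $s$ is the divisor-n standard deviation (`statistics.pstdev`) and that the sample has a single mode. For multimodal data, `statistics.mode` returns the first mode encountered. `pearson_skewness` is a hypothetical name.

```python
import statistics

def pearson_skewness(xs):
    mean = statistics.fmean(xs)
    s = statistics.pstdev(xs)           # divisor n, i.e. s = sqrt(m_2)
    first = (mean - statistics.mode(xs)) / s
    second = 3 * (mean - statistics.median(xs)) / s
    return first, second

print(pearson_skewness([1, 2, 2, 3, 3, 3, 4, 4, 5]))  # symmetric: both 0
print(pearson_skewness([1, 2, 2, 3, 4, 5, 9]))        # right tail: both positive
```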
Rationale for Pearson's Measures
Suggested Book
A. M. Gun, M. K. Gupta and B. Dasgupta, Fundamentals of Statistics (Volume I), World Press (2016).
Standard Probability Distributions
Univariate Discrete Distributions
General Notions
• Let X be a discrete random variable.
• Let $x_1, x_2, x_3, \dots$ be the values that X takes with non-zero probabilities.
• $x_1, x_2, x_3, \dots$ are called the mass points of X.
• Let $p_i$ be the probability of X taking the value $x_i$, for $i = 1, 2, 3, \dots$, with $0 < p_i \le 1$ for all $i$ and $\sum_{i=1}^{\infty} p_i = 1$.
• Then the probability mass function (p.m.f.) of X is defined as
$$f(x) = P(X = x) = \begin{cases} p_k & \text{if } x = x_k,\ k = 1, 2, 3, \dots \\ 0 & \text{otherwise} \end{cases}$$
General Notions (contd.)
• Cumulative distribution function (c.d.f.)
$$F(x) = P(X \le x) = \sum_{i:\, x_i \le x} f(x_i) = \sum_{i:\, x_i \le x} p_i$$
• Expectation or expected value
$$\mu = E(X) = \sum_{i=1}^{\infty} x_i f(x_i)$$
• Raw moments of order r, r = 1, 2, 3, …
$$\mu'_r = E(X^r) = \sum_{i=1}^{\infty} x_i^r f(x_i)$$
• Central moments of order r, r = 1, 2, 3, …
$$\mu_r = E(X - \mu)^r = \sum_{i=1}^{\infty} (x_i - \mu)^r f(x_i)$$
• Variance: $\sigma^2 = \mu_2$
• Median is defined to be that value $\tilde{\mu}$ for which $F(\tilde{\mu}) = \frac{1}{2}$. When no such value exists, as often happens for a discrete X, the median is any value satisfying $F(\tilde{\mu} - 0) \le \frac{1}{2} \le F(\tilde{\mu})$.
• Mode $\mu_0$ is defined to be that value of X for which
$$f(\mu_0) = \max_i f(x_i)$$
• Skewness
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \quad \gamma_1 = \frac{\mu_3}{\sigma^3}$$
• Kurtosis
$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \quad \gamma_2 = \beta_2 - 3.$$
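These definitions translate into a few lines of Python for a distribution with finitely many mass points. The sketch assumes the p.m.f. is supplied as a dictionary `{x_i: p_i}`, which is a hypothetical interface. For the median it returns the smallest mass point with $F(x) \ge 1/2$; this is one valid choice when the median is not unique.

```python
import math

def discrete_summary(pmf):
    """pmf: dict mapping each mass point x_i to p_i (probabilities sum to 1)."""
    mu = sum(x * p for x, p in pmf.items())
    mu2, mu3, mu4 = (sum((x - mu) ** r * p for x, p in pmf.items()) for r in (2, 3, 4))
    sigma = math.sqrt(mu2)
    # Median: smallest mass point with F(x) >= 1/2 (a value satisfying
    # F(x - 0) <= 1/2 <= F(x)); a tiny tolerance absorbs rounding error.
    cum, median = 0.0, None
    for x in sorted(pmf):
        cum += pmf[x]
        if cum >= 0.5 - 1e-12:
            median = x
            break
    return {
        "mean": mu, "variance": mu2,
        "gamma1": mu3 / sigma ** 3, "gamma2": mu4 / mu2 ** 2 - 3,
        "median": median,
        "mode": max(pmf, key=pmf.get),   # first mass point with the largest f(x_i)
    }

# A Bernoulli(p = 0.3) distribution written as a p.m.f. dictionary
print(discrete_summary({0: 0.7, 1: 0.3}))
```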
The Discrete Uniform Distribution
Let X be a discrete random variable that can take any of the values $a, a+h, a+2h, a+3h, \dots, a+(n-1)h$ with the same probability, where a is any real number, h is a positive real number and n is a positive integer.
• Then X is said to have a discrete uniform (or discrete rectangular) distribution with p.m.f.
$$f(x) = \begin{cases} \frac{1}{n} & \text{if } x = a, a+h, a+2h, a+3h, \dots, b, \\ 0 & \text{otherwise,} \end{cases}$$
where $b = a + (n-1)h$. In most cases $h = 1$.
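The Discrete Uniform Distribution (contd.)
The following results are stated for $h = 1$, so that $n = b - a + 1$.
• c.d.f.: $F(x) = \frac{\lfloor x - a \rfloor + 1}{b - a + 1}$ for $a \le x \le b$, with $F(x) = 0$ for $x < a$ and $F(x) = 1$ for $x > b$
• Expectation: $\mu = \frac{a+b}{2}$
• Variance: $\sigma^2 = \frac{(b-a+1)^2 - 1}{12} = \frac{n^2 - 1}{12}$
• Skewness: $\gamma_1 = 0$
• Kurtosis: $\gamma_2 = -\frac{6(n^2+1)}{5(n^2-1)}$
• Median: $\tilde{\mu} = \frac{a+b}{2}$
• Mode: every mass point $a, a+h, a+2h, \dots, a+(n-1)h$ is a mode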
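The closed forms can be checked by exact enumeration. In this sketch the function name is hypothetical, and a and h are assumed to be integers so that `Fraction` arithmetic stays exact. The variance is compared against $h^2(n^2-1)/12$, which reduces to $(n^2-1)/12$ when $h = 1$. The kurtosis is unaffected by the spacing h.

```python
from fractions import Fraction

def discrete_uniform_moments(a, n, h=1):
    """Exact mean and variance of the uniform distribution on a, a+h, ..., a+(n-1)h.

    a and h are integers here so that Fraction arithmetic is exact;
    n >= 2 is needed for the kurtosis formula.
    """
    xs = [a + k * h for k in range(n)]
    b = xs[-1]
    mu = Fraction(sum(xs), n)
    mu2 = sum((x - mu) ** 2 for x in xs) / n
    mu4 = sum((x - mu) ** 4 for x in xs) / n
    # Closed forms (variance generalised by the factor h**2)
    assert mu == Fraction(a + b, 2)
    assert mu2 == Fraction(h * h * (n * n - 1), 12)
    assert mu4 / mu2 ** 2 - 3 == Fraction(-6 * (n * n + 1), 5 * (n * n - 1))
    return mu, mu2

print(discrete_uniform_moments(1, 6))  # a fair die: (Fraction(7, 2), Fraction(35, 12))
```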
The Discrete Uniform Distribution (contd.)
[Figure: p.m.f. and c.d.f. of the discrete uniform distribution]
The Bernoulli Distribution
• Let X be a random variable that takes only two possible values, 0 and 1, with
$$P(X = 1) = p, \quad P(X = 0) = 1 - p.$$
• Then X is said to have a Bernoulli distribution with p.m.f.
$$f(x) = \begin{cases} p & \text{for } x = 1, \\ 1 - p & \text{for } x = 0, \\ 0 & \text{otherwise.} \end{cases}$$
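The Bernoulli Distribution (contd.)
• c.d.f.: $F(x) = \begin{cases} 0 & \text{for } x < 0, \\ 1 - p & \text{for } 0 \le x < 1, \\ 1 & \text{for } x \ge 1. \end{cases}$
• Expectation: $\mu = p$
• Variance: $\sigma^2 = p(1-p) = pq$, where $q = 1 - p$
• Raw moments of order r (r ≥ 1): $\mu'_r = p$
• Central moments of order r: $\mu_r = p(1-p)^r + (1-p)(-p)^r$
• Skewness: $\gamma_1 = \frac{1 - 2p}{\sqrt{p(1-p)}}$; Kurtosis: $\gamma_2 = \frac{1 - 6pq}{pq}$
• Median: $\tilde{\mu} = \begin{cases} 0 & \text{if } q > p, \\ 0.5 & \text{if } q = p, \\ 1 & \text{if } q < p. \end{cases}$ Mode: $\mu_0 = \begin{cases} 0 & \text{if } q > p, \\ 0 \text{ and } 1 & \text{if } q = p, \\ 1 & \text{if } q < p. \end{cases}$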
The Bernoulli Distribution (contd.)
[Figure: p.m.f. and c.d.f. of the Bernoulli distribution]
Bernoulli Trials
• Consider an experiment that has exactly two random outcomes:
  • Success, with probability p
  • Failure, with probability 1 − p
• Any repetition of such an experiment is called a Bernoulli trial
The Binomial Distribution
• Consider n independent Bernoulli trials, each with probability of success p.
• Define a random variable X to be the number of successes in the n Bernoulli trials.
• Then X is said to have a binomial distribution with parameters n and p, or a Binomial(n, p) distribution.
• Its p.m.f. is
$$f(x) = \begin{cases} \binom{n}{x} p^x q^{n-x}, & x = 0, 1, 2, \dots, n, \\ 0 & \text{otherwise,} \end{cases}$$
where $q = 1 - p$.
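The Binomial Distribution (contd.)
• c.d.f.: $F(x) = I_q(n - x, x + 1)$ for $x = 0, 1, \dots, n-1$ (and $F(n) = 1$), where
$$I_q(a, b) = \frac{\int_0^q t^{a-1}(1-t)^{b-1}\,dt}{\int_0^1 t^{a-1}(1-t)^{b-1}\,dt}$$
is the (regularized) incomplete beta function with parameters a and b.
• Expectation: $\mu = np$
• Variance: $\sigma^2 = np(1-p) = npq$
• Factorial moments of order r: $E[X(X-1)\cdots(X-r+1)] = (n)_r\, p^r$, where $(n)_r = n(n-1)\cdots(n-r+1)$
• Skewness: $\gamma_1 = \frac{q - p}{\sqrt{np(1-p)}}$
• Kurtosis: $\gamma_2 = \frac{1 - 6pq}{npq}$
• Median: $\tilde{\mu} = \lfloor np \rfloor$ or $\lceil np \rceil$
• Mode: $\mu_0 = \lfloor (n+1)p \rfloor$. If $(n+1)p$ is an integer, both $(n+1)p$ and $(n+1)p - 1$ are modes.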
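The binomial results can be checked numerically by direct summation over the p.m.f. In this sketch, `binomial_pmf` and `binomial_summary` are hypothetical names and the parameter values are arbitrary. The code also computes the second factorial moment $E[X(X-1)]$, which should equal $n(n-1)p^2$. It compares the mode with $\lfloor (n+1)p \rfloor$ for a case where $(n+1)p$ is not an integer.

```python
import math

def binomial_pmf(n, p):
    """List of f(x) = C(n, x) p^x q^(n-x) for x = 0, 1, ..., n."""
    return [math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

def binomial_summary(n, p):
    f = binomial_pmf(n, p)
    mu = sum(x * fx for x, fx in enumerate(f))
    mu2, mu3, mu4 = (sum((x - mu) ** r * fx for x, fx in enumerate(f)) for r in (2, 3, 4))
    return {
        "mean": mu,
        "variance": mu2,
        "gamma1": mu3 / mu2 ** 1.5,
        "gamma2": mu4 / mu2 ** 2 - 3,
        "factorial2": sum(x * (x - 1) * fx for x, fx in enumerate(f)),  # E[X(X-1)]
        "mode": max(range(n + 1), key=lambda x: f[x]),
    }

n, p = 10, 0.3
q = 1 - p
print(binomial_summary(n, p))
```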
The Binomial(n, p) Distribution (contd.)
[Figure: p.m.f. of the Binomial(n, p) distribution]
The Binomial(n, p) Distribution (contd.)
[Figure: c.d.f. of the Binomial(n, p) distribution]
The Geometric Distribution
• Consider an infinite sequence of independent Bernoulli trials with probability of success p.
• Define a random variable X as the number of trials required to get the first success.
• Then X is said to have a geometric distribution with parameter p.
• Its p.m.f. is
$$f(x) = \begin{cases} q^{x-1} p & \text{if } x = 1, 2, 3, \dots, \\ 0 & \text{otherwise.} \end{cases}$$
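The Geometric Distribution (contd.)
• c.d.f.: $F(x) = 1 - q^{x}$ for $x = 1, 2, 3, \dots$
• Expectation: $\mu = \frac{1}{p}$
• Variance: $\sigma^2 = \frac{q}{p^2}$
• Skewness: $\gamma_1 = \frac{2 - p}{\sqrt{1 - p}}$
• Kurtosis: $\gamma_2 = 6 + \frac{p^2}{1 - p}$
• Median: $\tilde{\mu} = \left\lceil \frac{-1}{\log_2(1 - p)} \right\rceil$
• Mode: $\mu_0 = 1$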
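For the geometric distribution that counts trials up to the first success, the mean $1/p$, the variance $q/p^2$ and the median formula can be checked numerically. The sketch truncates the infinite sums at x = 2000, an approximation whose neglected tail is negligible for the chosen p. It then compares $\lceil -1/\log_2(1-p) \rceil$ with a brute-force search for the smallest x with $F(x) \ge 1/2$. The helper names are hypothetical.

```python
import math
from itertools import count

def geometric_cdf(x, p):
    """F(x) = 1 - q^x for x = 1, 2, 3, ... (X = trials up to the first success)."""
    return 1 - (1 - p) ** x

def geometric_median(p):
    return math.ceil(-1 / math.log2(1 - p))

p = 0.2
q = 1 - p
xs = range(1, 2001)          # truncated support; the remaining tail is negligible here
mean = sum(x * q ** (x - 1) * p for x in xs)
var = sum((x - mean) ** 2 * q ** (x - 1) * p for x in xs)
brute_median = next(x for x in count(1) if geometric_cdf(x, p) >= 0.5)
print(mean, var, geometric_median(p), brute_median)  # about 5 and 20, then 4 and 4
```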
The Geometric(p) Distribution (contd.)
[Figure: p.m.f. and c.d.f. of the Geometric(p) distribution]
The Negative Binomial Distribution
• Consider an infinite sequence of independent Bernoulli trials with probability of success p.
• Suppose the trials are continued until r successes are observed, where r is a prespecified positive integer.
• Define a random variable X to be the number of failures preceding the r-th success.
• Then X is said to have a negative binomial distribution with parameters r and p.
• Its p.m.f. is
$$f(x) = \begin{cases} \binom{r + x - 1}{r - 1} p^r q^x, & x = 0, 1, 2, \dots \\ 0 & \text{otherwise.} \end{cases}$$
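The Negative Binomial Distribution (contd.)
These results take p as the probability of success and X as the number of failures before the r-th success, as in the p.m.f. above.
• c.d.f.: $F(x) = I_p(r, x + 1) = 1 - I_q(x + 1, r)$, for $x = 0, 1, 2, \dots$
• Expectation: $\mu = \frac{rq}{p}$
• Variance: $\sigma^2 = \frac{rq}{p^2}$
• Skewness: $\gamma_1 = \frac{1 + q}{\sqrt{rq}}$
• Kurtosis: $\gamma_2 = \frac{6}{r} + \frac{p^2}{rq}$
• Mode: $\mu_0 = \begin{cases} \left\lfloor \frac{q(r-1)}{p} \right\rfloor & \text{if } r > 1, \\ 0 & \text{if } r \le 1. \end{cases}$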
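The negative binomial moments can be checked numerically. The check takes the p.m.f. exactly as written, with $p$ the probability of success and $X$ the number of failures before the $r$-th success. Under that parameterisation the closed forms are $\mu = rq/p$, $\sigma^2 = rq/p^2$, $\gamma_1 = (1+q)/\sqrt{rq}$, $\gamma_2 = 6/r + p^2/(rq)$ and mode $\lfloor q(r-1)/p \rfloor$. The sketch below compares these with brute-force sums truncated at x = 1000; the names and parameter values are illustrative.

```python
import math

def nb_pmf(x, r, p):
    """P(X = x): X = failures before the r-th success, success probability p."""
    return math.comb(r + x - 1, r - 1) * p ** r * (1 - p) ** x

r, p = 4, 0.4
q = 1 - p
xs = range(0, 1000)         # truncated support; the tail beyond is negligible here
f = [nb_pmf(x, r, p) for x in xs]
mu = sum(x * fx for x, fx in zip(xs, f))
mu2, mu3, mu4 = (sum((x - mu) ** k * fx for x, fx in zip(xs, f)) for k in (2, 3, 4))
mode = max(xs, key=lambda x: f[x])

print(mu, r * q / p)                                  # mean vs rq/p
print(mu2, r * q / p ** 2)                            # variance vs rq/p^2
print(mu3 / mu2 ** 1.5, (1 + q) / math.sqrt(r * q))   # skewness
print(mu4 / mu2 ** 2 - 3, 6 / r + p ** 2 / (r * q))   # excess kurtosis
print(mode, math.floor(q * (r - 1) / p))              # mode
```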
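Relation with the Geometric(p) Distribution
• The geometric(p) distribution is a special case of the negative Binomial(r, p) distribution when r = 1. If X has the negative Binomial(1, p) distribution (failures before the first success), then X + 1 has the geometric(p) distribution (trials up to the first success).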
The Negative Binomial Distribution (contd.)
[Figure: p.m.f. of the negative binomial distribution; orange line: mean, green line: standard deviation]
The Hypergeometric Distribution
• Consider a finite population of size N.
• Suppose a proportion p of its members are successes, on account of possessing a certain characteristic.
• The remaining proportion q (= 1 − p) are designated failures.
• If n objects are drawn randomly without replacement from this population, let X be a random variable denoting the number of successes among these n objects.
• Then X is said to have a hypergeometric distribution with p.m.f.
$$f(x) = \begin{cases} \dfrac{\binom{Np}{x}\binom{Nq}{n-x}}{\binom{N}{n}} & \text{if } x = 0, 1, 2, \dots, n, \\ 0 & \text{otherwise,} \end{cases}$$
with the convention that $\binom{a}{b} = 0$ when $b > a$.
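The Hypergeometric Distribution (contd.)
• Expectation: $\mu = np$
• Variance: $\sigma^2 = npq\,\frac{N - n}{N - 1}$
• Skewness: $\gamma_1 = (q - p)\,a$, where $a = \frac{(N - 2n)\sqrt{N - 1}}{(N - 2)\sqrt{npq(N - n)}}$ is a function of N, n and p. When $n < N/2$ (in particular, when the sample is a small fraction of the population), $a > 0$, so that
$$\gamma_1 \begin{cases} = 0 & \text{if } p = \frac{1}{2}, \\ > 0 & \text{if } p < \frac{1}{2}, \\ < 0 & \text{if } p > \frac{1}{2}. \end{cases}$$
• Mode: $\mu_0 = \left\lfloor \frac{(n+1)(Np+1)}{N+2} \right\rfloor$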
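The hypergeometric results can be verified by enumerating the p.m.f. In this sketch, K = Np is the number of successes in the population (a naming chosen for illustration). The skewness is compared with $(q-p)\,a$, using $a = \frac{(N-2n)\sqrt{N-1}}{(N-2)\sqrt{npq(N-n)}}$.

```python
import math

def hypergeom_pmf(N, K, n):
    """Dict x -> P(X = x), with K = Np successes in a population of size N, sample size n."""
    lo_x, hi_x = max(0, n - (N - K)), min(n, K)
    total = math.comb(N, n)
    return {x: math.comb(K, x) * math.comb(N - K, n - x) / total
            for x in range(lo_x, hi_x + 1)}

N, K, n = 50, 20, 10
p = K / N
q = 1 - p
f = hypergeom_pmf(N, K, n)
mu = sum(x * fx for x, fx in f.items())
mu2, mu3 = (sum((x - mu) ** k * fx for x, fx in f.items()) for k in (2, 3))
a = (N - 2 * n) * math.sqrt(N - 1) / ((N - 2) * math.sqrt(n * p * q * (N - n)))

print(mu, n * p)                                   # mean
print(mu2, n * p * q * (N - n) / (N - 1))          # variance
print(mu3 / mu2 ** 1.5, (q - p) * a)               # skewness
print(max(f, key=f.get), math.floor((n + 1) * (N * p + 1) / (N + 2)))  # mode
```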
The Hypergeometric Distribution (contd.)
[Figure: p.m.f. of the hypergeometric distribution; in the figure, r denotes the sample size and n = Np]
Relation with the Binomial Distribution
The hypergeometric distribution with parameters (N, n, p) can be approximated by the binomial distribution with parameters (n, p)
• if n is very small compared to N,
• that is, if $\frac{n}{N}$ is negligibly small.
The Poisson Distribution
• Let X be a discrete random variable with mass points 0, 1, 2, 3, …
• X is said to have a Poisson distribution with parameter λ (> 0) if its p.m.f. is
$$f(x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!} & \text{if } x = 0, 1, 2, 3, \dots, \\ 0 & \text{otherwise.} \end{cases}$$
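The Poisson Distribution (contd.)
• c.d.f.: $F(x) = \frac{1}{x!}\int_{\lambda}^{\infty} e^{-t} t^x\,dt$, for $x = 0, 1, 2, \dots$
• Expectation: $\mu = \lambda$
• Variance: $\sigma^2 = \lambda$
• Skewness: $\gamma_1 = \frac{1}{\sqrt{\lambda}} > 0$ always
• Kurtosis: $\gamma_2 = \frac{1}{\lambda} > 0$ always
• Mode: $\mu_0 = \lfloor \lambda \rfloor$. If λ is an integer, both λ and λ − 1 are modes.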
The Poisson Distribution (contd.)
[Figure: p.m.f. of the Poisson distribution for λ = 1, 2, 5 and 10]
Relationship with the Binomial Distribution
The Poisson distribution with parameter λ is the limiting form of the binomial distribution with parameters (n, p) if the following conditions hold:
• n → ∞
• p → 0
• np = λ remains fixed and finite.
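The limit can be seen numerically. Holding $np = \lambda$ fixed while n increases, the largest absolute difference between the Binomial(n, λ/n) and Poisson(λ) probabilities over x = 0, …, 10 shrinks. The sketch below uses an arbitrary λ = 2 and hypothetical helper names.

```python
import math

def binom_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.0
errors = []
for n in (10, 100, 1000, 10000):
    p = lam / n                      # keep np = lambda fixed while n grows
    err = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))
    errors.append(err)
    print(n, err)
```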
Fitting Discrete Distributions to Data
The Objective
• Given a frequency table of the following type,

Value of X         | $x_1$ | $x_2$ | $x_3$ | $x_4$ | …
Observed frequency | $f_1$ | $f_2$ | $f_3$ | $f_4$ | …

• the objective is to identify a discrete probability distribution $f(x \mid a_1, a_2, \dots, a_k)$, k being the number of parameters, that fits the data well, that is, provides a very good approximation to the data.
• Such a distribution is useful as a probabilistic model of the data.
• A specific theory of statistical inference may exist for f that can be exploited to make inferences regarding these data.
Main Steps
• Estimate the k parameters $a_1, a_2, \dots, a_k$ by, say, the method of moments. That is, equate the first k sample moments with the first k theoretical moments respectively, and solve the k equations.
• Denote these estimates by $\hat{a}_1, \hat{a}_2, \dots, \hat{a}_k$.
• Estimate the theoretical (expected) frequencies for the different values of X by
$$\hat{f}_i = n\, f(x_i \mid \hat{a}_1, \hat{a}_2, \dots, \hat{a}_k),$$
where $n = \sum_{i=1}^{\infty} f_i$ is the total number of observations in the data set.
Example: Fitting a Binomial(m, p) Distribution
Estimation of parameters
• Equations to be solved:
$$m p = \bar{x}, \quad m p q = s^2 \iff \hat{p} = 1 - \frac{s^2}{\bar{x}}, \quad \hat{m} = \frac{\bar{x}}{\hat{p}}$$
• Generally, m is known from the definition of the variable, so that
$$\hat{p} = \frac{\bar{x}}{m}$$
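Example (contd.)
• Estimation of the expected frequencies for $x = 0, 1, 2, \dots, m$:
$$\hat{f}_x = n \binom{m}{x} \hat{p}^{x} \hat{q}^{\,m-x}, \quad \hat{q} = 1 - \hat{p}$$

Value of X         | 0           | 1           | 2           | … | m
Observed frequency | $f_0$       | $f_1$       | $f_2$       | … | $f_m$
Expected frequency | $\hat{f}_0$ | $\hat{f}_1$ | $\hat{f}_2$ | … | $\hat{f}_m$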
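The fitting steps for this binomial example can be sketched in code. The observed frequencies are hypothetical and the helper names are illustrative only. `fit_binomial` covers the usual case where m is known. `moment_estimates` solves the two moment equations when m is unknown; the resulting $\hat{m}$ need not be an integer, and rounding it would be a further modelling choice not made here.

```python
import math

def fit_binomial(freqs, m):
    """freqs[x] = observed frequency of X = x (x = 0..m), with m known.

    Returns p_hat = xbar / m and expected frequencies n * C(m, x) p^x q^(m-x).
    """
    n = sum(freqs)
    xbar = sum(x * fx for x, fx in enumerate(freqs)) / n
    p_hat = xbar / m
    q_hat = 1 - p_hat
    expected = [n * math.comb(m, x) * p_hat ** x * q_hat ** (m - x) for x in range(m + 1)]
    return p_hat, expected

def moment_estimates(freqs):
    """m unknown: solve m p = xbar and m p q = s^2 (s^2 with divisor n)."""
    n = sum(freqs)
    xbar = sum(x * fx for x, fx in enumerate(freqs)) / n
    s2 = sum(fx * (x - xbar) ** 2 for x, fx in enumerate(freqs)) / n
    p_hat = 1 - s2 / xbar
    return p_hat, xbar / p_hat        # m_hat is generally not an integer

observed = [10, 30, 35, 20, 5]        # hypothetical frequencies for x = 0..4
p_hat, expected = fit_binomial(observed, m=4)
print(p_hat)                          # 0.45
print(expected)
```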
Assessing Goodness of Fit
By quantile-quantile (Q-Q) plots
• Plot the sample quantiles against the theoretical quantiles of the fitted distribution
• Check how close the plot is to the straight line through the origin with unit slope (the line y = x)
Assessing Goodness of Fit (contd.)
Through statistical tests of significance
• Pearson's $\chi^2$ test of goodness of fit
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \;\overset{\text{approx.}}{\sim}\; \chi^2_{k-1-c}$$
where $O_i$ and $E_i$ are the observed and expected frequencies of the i-th non-empty cell, k is the number of cells, and c is the number of parameters estimated. The $\chi^2$ approximation holds for large n.
• Kolmogorov-Smirnov test (to be discussed later)
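Once the observed and expected frequencies are available, Pearson's statistic takes only a few lines. In this sketch the helper is hypothetical and the counts are made up. The degrees of freedom are taken as $k - 1 - c$: k cells, less one because the expected frequencies are constrained to the observed total, less one for each of the c estimated parameters. A p-value would additionally need the $\chi^2$ c.d.f., for example from a statistics library.

```python
def chi_square_gof(observed, expected, n_params):
    """Pearson's statistic sum (O_i - E_i)^2 / E_i and its degrees of freedom.

    Degrees of freedom = k - 1 - c: k cells, minus 1 for the fixed total,
    minus c estimated parameters.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_params
    return stat, df

# Hypothetical counts from 60 die rolls; fair-die hypothesis, so no parameter is estimated
stat, df = chi_square_gof([8, 9, 12, 11, 6, 14], [10] * 6, n_params=0)
print(stat, df)   # stat is about 4.2, df = 5
```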
Univariate Continuous Distributions
General Notions
• Let X be a continuous random variable whose values lie in the interval (a, b).
• Then the probability density function (p.d.f.) f(x) of X is defined to be such that, for any $x \in (a, b)$, the probability of X taking values in an infinitesimal interval $\left(x - \frac{\Delta x}{2}, x + \frac{\Delta x}{2}\right)$ of length Δx is approximately $f(x)\,\Delta x$, that is,
$$P\left(x - \frac{\Delta x}{2} \le X \le x + \frac{\Delta x}{2}\right) \approx f(x)\,\Delta x$$
• Cumulative distribution function (c.d.f.)
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
• Expectation or expected value
$$\mu = E(X) = \int_{-\infty}^{\infty} t\,f(t)\,dt$$
• Raw moments of order r, r = 1, 2, 3, …
$$\mu'_r = E(X^r) = \int_{-\infty}^{\infty} t^r f(t)\,dt$$
• Central moments of order r, r = 1, 2, 3, …
$$\mu_r = E(X - \mu)^r = \int_{-\infty}^{\infty} (t - \mu)^r f(t)\,dt$$
• Variance: $\sigma^2 = \mu_2$
• Median is defined to be that value $\tilde{\mu}$ for which $F(\tilde{\mu} - 0) \le \frac{1}{2} \le F(\tilde{\mu})$. Since F is continuous here, this means $\int_{-\infty}^{\tilde{\mu}} f(x)\,dx = \frac{1}{2}$.
• Mode $\mu_0$ is defined to be that value of X for which
$$f(\mu_0) = \max_x f(x)$$
• Skewness
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \quad \gamma_1 = \frac{\mu_3}{\sigma^3}$$
• Kurtosis
$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \quad \gamma_2 = \beta_2 - 3.$$
The Uniform Distribution
• Let a and b be two real numbers such that a < b.
• A continuous random variable X is said to have a uniform or rectangular distribution over (a, b) if its p.d.f. is
$$f(x) = \begin{cases} \frac{1}{b - a} & \text{if } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases}$$
The Continuous Uniform Distribution (contd.)
[Figure: p.d.f. and c.d.f. of the continuous uniform distribution]
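The Continuous Uniform Distribution (contd.)
• c.d.f.: $F(x) = \begin{cases} 0 & \text{if } x < a, \\ \frac{x - a}{b - a} & \text{if } a \le x \le b, \\ 1 & \text{if } x > b. \end{cases}$
• Expectation: $\mu = \frac{a+b}{2}$
• Variance: $\sigma^2 = \frac{(b-a)^2}{12}$
• Kurtosis: $\gamma_2 = -1.2$
• Median: $\tilde{\mu} = \frac{a+b}{2}$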
The Exponential Distribution
• The p.d.f. of a random variable X having an exponential distribution with parameter λ (> 0) is
$$f(x) = \lambda e^{-\lambda x}, \quad 0 \le x < \infty$$
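The Exponential(λ) Distribution (contd.)
• c.d.f.: $F(x) = 1 - e^{-\lambda x}$, $x \ge 0$
• Expectation: $\mu = \frac{1}{\lambda}$
• Variance: $\sigma^2 = \frac{1}{\lambda^2}$
• Kurtosis: $\gamma_2 = 6$
• Median: $\tilde{\mu} = \frac{\ln 2}{\lambda}$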
The Exponential(λ) Distribution (contd.)
[Figure: p.d.f. and c.d.f. of the exponential distribution]
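The Gamma Distribution
• The p.d.f. of a gamma random variable with shape parameter α and rate parameter β, where α, β > 0, is
$$f(x \mid \alpha, \beta) = \begin{cases} \dfrac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x} & \text{if } 0 \le x < \infty, \\ 0 & \text{otherwise.} \end{cases}$$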
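The Gamma(α, β) Distribution (contd.)
• c.d.f.: $F(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_0^x t^{\alpha-1} e^{-\beta t}\,dt$
• Expectation: $\mu = \frac{\alpha}{\beta}$
• Variance: $\sigma^2 = \frac{\alpha}{\beta^2}$
• Skewness: $\gamma_1 = \frac{2}{\sqrt{\alpha}}$
• Kurtosis: $\gamma_2 = \frac{6}{\alpha}$
• Mode: $\mu_0 = \frac{\alpha - 1}{\beta}$, for $\alpha \ge 1$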
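The gamma mean and variance can be checked by simulation. One pitfall is that Python's `random.gammavariate(alpha, beta)` treats its second argument as a *scale*, while the slide uses a *rate* β, so the sketch passes `1 / beta`. The seed and sample size are arbitrary choices.

```python
import random
import statistics

alpha, beta = 3.0, 2.0            # shape alpha, rate beta (the slide's parameterisation)
rng = random.Random(2022)         # fixed seed for reproducibility
# random.gammavariate(alpha, beta) takes a SCALE as its second argument,
# so the rate beta is passed as 1 / beta.
sample = [rng.gammavariate(alpha, 1 / beta) for _ in range(100_000)]

sample_mean = statistics.fmean(sample)
sample_var = sum((x - sample_mean) ** 2 for x in sample) / len(sample)
print(sample_mean, alpha / beta)       # both close to 1.5
print(sample_var, alpha / beta ** 2)   # both close to 0.75
```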
The Gamma(α, β) Distribution (contd.)
[Figure: p.d.f. and c.d.f. of the gamma distribution, plotted with shape k = α and scale θ = 1/β]
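The Beta Distribution
• The beta distribution with shape parameters α and β (α, β > 0) has p.d.f.
$$f(x \mid \alpha, \beta) = \begin{cases} \dfrac{1}{B(\alpha, \beta)}\, x^{\alpha-1}(1 - x)^{\beta-1} & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise,} \end{cases}$$
where $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$.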
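The Beta(α, β) Distribution (contd.)
• c.d.f.: $F(x) = I_x(\alpha, \beta)$, the regularized incomplete beta function
• Expectation: $\mu = \frac{\alpha}{\alpha + \beta}$
• Variance: $\sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
• Mode: $\mu_0 = \frac{\alpha - 1}{\alpha + \beta - 2}$, for $\alpha, \beta > 1$
• Symmetric if α = β, positively skewed if α < β, negatively skewed if α > β.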
The Beta(α, β) Distribution (contd.)
[Figure: p.d.f. and c.d.f. of the beta distribution]
The Normal Distribution
• The normal distribution with parameters mean μ and standard deviation σ (variance σ²) has the p.d.f.
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty.$$
The Normal(μ, σ) Distribution (contd.)
• c.d.f.: $F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$
• Expectation: μ
• Variance: σ²
• Kurtosis: $\gamma_2 = 0$
• Median: $\tilde{\mu} = \mu$
• Mode: $\mu_0 = \mu$
• The standard normal distribution: 𝒩(0, 1), with p.d.f. φ(x) and c.d.f. Φ(x)
The Normal(μ, σ) Distribution (contd.)
[Figure: p.d.f. and c.d.f. of the normal distribution]
Some Properties of the Normal Distribution
[Figure: points of inflection of the 𝒩(μ, σ) curve, located at x = μ − σ and x = μ + σ]
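The standard library's `statistics.NormalDist` can check two of these properties. First, the c.d.f. of 𝒩(μ, σ) equals $\Phi((x-\mu)/\sigma)$. Second, the curvature of the p.d.f. changes sign at $\mu \pm \sigma$, the points of inflection. The finite-difference helper `second_difference` is a hypothetical illustration, not a library function.

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0
X = NormalDist(mu, sigma)
Z = NormalDist()                     # standard normal N(0, 1): pdf phi, cdf Phi

# c.d.f. of N(mu, sigma) equals Phi((x - mu) / sigma)
for x in (6.0, 10.0, 13.5):
    print(X.cdf(x), Z.cdf((x - mu) / sigma))

def second_difference(f, x, h=1e-3):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

# The p.d.f. changes from convex to concave (or back) at mu - sigma and mu + sigma
for point in (mu - sigma, mu + sigma):
    print(point, second_difference(X.pdf, point - 0.1), second_difference(X.pdf, point + 0.1))
```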

Distributions Associated with the Normal Distribution
A Few Such Commonly Used Distributions
• The log-normal distribution
• The chi-square or χ²-distribution
• Student's t-distribution
• The F-distribution
The Log-normal Distribution
• The log-normal distribution is the continuous probability distribution of a random variable X whose logarithm is normally distributed.
• That is, if X is log-normally distributed, then Y = ln X has a normal distribution.
• Equivalently, if Y has a normal distribution, then X = e^Y has a log-normal distribution.
• Its p.d.f. is
$$f(x \mid \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad 0 < x < \infty,$$
where μ and σ are the mean and standard deviation of ln X.
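The Log-normal Distribution (contd.)
• c.d.f.: $F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right) = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{\ln x - \mu}{\sigma\sqrt{2}}\right)$
• Expectation: $E(X) = e^{\mu + \sigma^2/2}$
• Variance: $\operatorname{Var}(X) = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}$
• Skewness: $\gamma_1 = \left(e^{\sigma^2} + 2\right)\sqrt{e^{\sigma^2} - 1}$
• Kurtosis: $\gamma_2 = e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 6$
• Median: $\tilde{\mu} = e^{\mu}$
• Mode: $\mu_0 = e^{\mu - \sigma^2}$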
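The next sketch checks the log-normal c.d.f. in its two equivalent forms, $\Phi\left(\frac{\ln x-\mu}{\sigma}\right)$ and $\frac12 + \frac12\operatorname{erf}\left(\frac{\ln x-\mu}{\sigma\sqrt2}\right)$. It also confirms that the median is $e^{\mu}$ and simulates the mean $e^{\mu+\sigma^2/2}$ with `random.lognormvariate`. The parameter values, seed and helper names are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean

def lognormal_cdf(x, mu, sigma):
    """F(x) = Phi((ln x - mu) / sigma) for x > 0."""
    return NormalDist().cdf((math.log(x) - mu) / sigma)

def lognormal_cdf_erf(x, mu, sigma):
    """The same c.d.f. written with the error function."""
    return 0.5 + 0.5 * math.erf((math.log(x) - mu) / (sigma * math.sqrt(2)))

mu, sigma = 0.0, 0.5
for x in (0.5, 1.0, 2.0):
    print(lognormal_cdf(x, mu, sigma), lognormal_cdf_erf(x, mu, sigma))
print(lognormal_cdf(math.exp(mu), mu, sigma))   # 0.5, since the median is e^mu

rng = random.Random(14)
sample = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
print(fmean(sample), math.exp(mu + sigma ** 2 / 2))   # close to each other
```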
The Log-normal Distribution (contd.)
[Figure: the log-normal distribution]
The (Central) χ²-Distribution with k Degrees of Freedom
• If $X_1, X_2, \dots, X_k$ are independent 𝒩(0, 1) random variables, then
$$\chi^2_k = X_1^2 + X_2^2 + \cdots + X_k^2$$
is said to have a central χ²-distribution with k degrees of freedom (d.f.).
• Its p.d.f. is
$$f(x \mid k) = \begin{cases} \dfrac{2^{-k/2}}{\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2} & \text{if } 0 \le x < \infty, \\ 0 & \text{otherwise,} \end{cases}$$
which is identical with the Gamma(k/2, 1/2) p.d.f. (shape k/2, rate 1/2).
The Central χ²(k) Distribution (contd.)
• c.d.f.: $F(x) = \frac{1}{2^{k/2}\Gamma(k/2)} \int_0^x t^{k/2 - 1} e^{-t/2}\,dt$
• Expectation: $\mu = k$
• Variance: $\sigma^2 = 2k$
• Skewness: $\gamma_1 = \sqrt{8/k}$
• Kurtosis: $\gamma_2 = \frac{12}{k}$
• Mode: $\mu_0 = \max(k - 2, 0)$
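The sketch below confirms that the $\chi^2_k$ p.d.f. coincides with the Gamma(k/2, 1/2) p.d.f. (shape k/2, rate 1/2). It then simulates $\chi^2_k$ from its definition as a sum of k squared standard normals, to check the mean k and variance 2k. The names, k and the seed are illustrative.

```python
import math
import random

def chi2_pdf(x, k):
    """p.d.f. of the central chi-square distribution with k d.f., for x > 0."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def gamma_pdf(x, alpha, beta):
    """Gamma p.d.f. with shape alpha and rate beta, for x > 0."""
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

k = 5
for x in (0.5, 2.0, 7.5):
    print(chi2_pdf(x, k), gamma_pdf(x, k / 2, 0.5))   # equal up to rounding

# Simulate chi^2_k as a sum of k squared independent N(0, 1) variables
rng = random.Random(79)
draws = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)    # close to k = 5 and 2k = 10
```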
The Central χ²(k) Distribution (contd.)
[Figure: p.d.f. and c.d.f. of the χ² distribution]
The (Central) Student's t-Distribution
• Student was the pen name of William Sealy Gosset (1876-1937), an English statistician.
• He developed Student's t-distribution.
• If $X \sim \mathcal{N}(0, 1)$, $Y \sim \chi^2_k$, and X, Y are independent, then
$$t_k = \frac{X}{\sqrt{Y/k}}$$
is said to have a central t-distribution with k degrees of freedom, with p.d.f.
$$f(t \mid k) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\;\Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{t^2}{k}\right)^{-\frac{k+1}{2}}, \quad -\infty < t < \infty.$$
The Central t-Distribution with k d.f. (contd.)
• Expectation: $\mu = 0$, $k > 1$
• Variance: $\sigma^2 = \frac{k}{k - 2}$, $k > 2$
• Skewness: $\gamma_1 = 0$, $k > 3$
• Kurtosis: $\gamma_2 = \frac{6}{k - 4}$, $k > 4$
• Median: $\tilde{\mu} = 0$
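The next sketch builds the t-distribution from its definition. It evaluates the p.d.f. on the whole real line, checks symmetry and simulates $X/\sqrt{Y/k}$. The $\chi^2_k$ variable Y is drawn as a Gamma(shape k/2, scale 2) variable, which uses the gamma form of the $\chi^2$ density together with Python's scale parameterisation of `random.gammavariate`. The names, k and the seed are illustrative.

```python
import math
import random

def t_pdf(t, k):
    """p.d.f. of the central t-distribution with k d.f., for all real t."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + t * t / k) ** (-(k + 1) / 2)

k = 10
print(t_pdf(1.3, k), t_pdf(-1.3, k))   # symmetric about 0

# Simulate t_k = X / sqrt(Y / k), with X ~ N(0, 1) and Y ~ chi^2_k independent;
# chi^2_k is drawn as Gamma(shape k/2, scale 2).
rng = random.Random(42)
draws = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(k / 2, 2.0) / k)
         for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum(d * d for d in draws) / len(draws) - mean ** 2
print(mean, var, k / (k - 2))   # mean near 0, variance near k/(k-2) = 1.25
```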
The Central t-Distribution with k d.f. (contd.)
[Figure: p.d.f. and c.d.f. of the t-distribution]
The Central F-Distribution
• If $X \sim \chi^2_m$, $Y \sim \chi^2_n$, and X, Y are independent, then
$$F_{m,n} = \frac{X/m}{Y/n}$$
is said to have a central F-distribution with (m, n) degrees of freedom.
• Its p.d.f. for 0 < F < ∞ is
$$f(F \mid m, n) = \frac{1}{B\left(\frac{m}{2}, \frac{n}{2}\right)} \left(\frac{m}{n}\right)^{m/2} F^{\frac{m}{2} - 1} \left(1 + \frac{m}{n}F\right)^{-\frac{m+n}{2}}.$$
• Note that $\dfrac{\frac{m}{n}F}{1 + \frac{m}{n}F} \sim \operatorname{Beta}\left(\frac{m}{2}, \frac{n}{2}\right)$.
The Central F-Distribution with (m, n) d.f. (contd.)
• Expectation: $\mu = \frac{n}{n - 2}$, $n > 2$
• Variance: $\sigma^2 = \frac{2n^2(m + n - 2)}{m(n-2)^2(n-4)}$, $n > 4$
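The F-distribution can likewise be simulated from its definition. This sketch (with arbitrary m, n and seed, and a hypothetical helper `chi2_draw`) checks the mean $n/(n-2)$. It also checks the Beta connection by comparing the average of $\frac{(m/n)F}{1+(m/n)F}$ with the Beta(m/2, n/2) mean $m/(m+n)$. Each χ² draw is generated as a Gamma(shape d.f./2, scale 2) variable.

```python
import random

m, n = 5, 10
rng = random.Random(7)

def chi2_draw(dof):
    """One chi-square draw with dof d.f., generated as Gamma(shape dof/2, scale 2)."""
    return rng.gammavariate(dof / 2, 2.0)

# F_{m,n} = (X/m) / (Y/n), with X ~ chi^2_m and Y ~ chi^2_n independent
f_draws = [(chi2_draw(m) / m) / (chi2_draw(n) / n) for _ in range(100_000)]
f_mean = sum(f_draws) / len(f_draws)
print(f_mean, n / (n - 2))                        # both close to 1.25

# (m/n)F / (1 + (m/n)F) should follow Beta(m/2, n/2), whose mean is m/(m+n)
b_draws = [(m / n) * f / (1 + (m / n) * f) for f in f_draws]
print(sum(b_draws) / len(b_draws), m / (m + n))   # both close to 1/3
```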
The Central F-Distribution with (m, n) d.f. (contd.)
[Figure: p.d.f. and c.d.f. of the F-distribution for (m, n) = (1, 1), (2, 1), (5, 2), (10, 1) and (100, 100)]
