
Lecture #4

Statistics for Data Analytics

Dr. Debasis Samanta


Professor
Department of Computer Science & Engineering
Quote of the day…

 "I avoid looking forward or backward, and try to keep looking upward."

 CHARLOTTE BRONTË, an English novelist and poet

@DSamanta, IIT Kharagpur Data Analytics (CS61061) 2


Today’s discussion…
 Statistics versus Probability
 Concept of random variable
 Probability distribution concept
 Discrete probability distribution
 Continuous probability distribution
 Concept of sampling distribution
 Major sampling distributions
 Usage of sampling distributions


Probability and Statistics
Probability is the chance of an outcome in an experiment (also called an event).

Event: Tossing a fair coin
Outcome: Head, Tail

Probability deals with predicting the likelihood of future events. Statistics involves the analysis of the frequency of past events.

Example 4.1: Consider there is a drawer containing 100 socks: 30 red, 20 blue
and 50 black socks.
We can use probability to answer questions about the selection of a
random sample of these socks.
 PQ1. What is the probability that we draw two blue socks or two red socks from
the drawer?
 PQ2. What is the probability that, if we pull out three socks, we have a matching pair?
 PQ3. What is the probability that we draw five socks and they are all black?



Statistics
Instead, if we have no knowledge about the type of socks in the drawer, then we enter the realm of statistics. Statistics helps us infer properties of the population on the basis of a random sample.

Questions that would be statistical in nature are:

 SQ1: A random sample of 10 socks from the drawer produced one blue, four red, and five black socks. What is the total population of black, blue, and red socks in the drawer?

 SQ2: We randomly sample 10 socks, write down the number of black socks, and then return the socks to the drawer. The process is repeated five times. The mean number of black socks over these trials is 7. What is the true number of black socks in the drawer?
 etc.



Probability versus Statistics
In other words:
 In probability, we are given a model and asked what kind of data we are likely to
see.
 In statistics, we are given data and asked what kind of model is likely to have
generated it.

Example 4.2: Measles Study


 A study on health is concerned with the incidence of childhood measles in parents in a city. For each couple, we would like to know how likely it is that either the mother or the father or both have had childhood measles.
 The current census data indicate that 20% of adults between the ages of 20 and 40 (childbearing ages of parents, regardless of sex) have had childhood measles.
 This gives us the probability that an individual in the city has had childhood measles.



Defining Random Variable
Definition 4.1: Random Variable
A random variable is a rule that assigns a numerical value to an outcome of
interest.

Example 4.3: In the "Measles Study", we define a random variable 𝑋 as the number of parents in a married couple who have had childhood measles.
This random variable can take the values 0, 1, and 2.
Note:
 A random variable is not exactly the same as a variable describing data.

 The probability that the random variable takes a given value can be computed
using the rules governing probability.
 For example, the probability that 𝑋 = 1 means either mother or father but not both
has had measles is 0.32. Symbolically, it is denoted as P(X=1) = 0.32
Probability Distribution
Definition 4.2: Probability distribution
A probability distribution is an assignment of probabilities to the values of a random variable.

Example 4.4: Given that 0.2 is the probability that a person (aged between 20 and 40) has had childhood measles, the probability distribution is given by

 X    Probability
 0    0.64
 1    0.32
 2    0.04



Probability Distribution
 In data analytics, the probability distribution is important because many statistics used to make inferences about a population are derived from it.

 In general, a probability distribution function takes the following form:

 x:               x1     x2     ……    xn
 f(x) = P(X = x): f(x1)  f(x2)  ……    f(xn)

Example: Measles Study

 x:    0     1     2
 f(x): 0.64  0.32  0.04


Taxonomy of Probability Distributions
Discrete probability distributions
 Binomial distribution
 Multinomial distribution
 Poisson distribution
 Hypergeometric distribution

Continuous probability distributions
 Normal distribution
 Standard normal distribution
 Gamma distribution
 Exponential distribution
 Chi-squared distribution
 Lognormal distribution
 Weibull distribution


Usage of Probability Distribution
 Distribution (discrete/continuous) functions are widely used in simulation studies.
 A simulation study uses a computer to simulate a real phenomenon or process as closely as possible.

 The use of simulation studies can often eliminate the need for costly experiments, and simulation is also often used to study problems where actual experimentation is impossible.

Examples 4.5:
1) In a study testing the effectiveness of a new drug, the number of cured patients among all the patients who use the drug approximately follows a binomial distribution.

2) In the operation of a ticketing system in a busy public establishment (e.g., an airport), the arrival of passengers can be simulated using a Poisson distribution.


Discrete Probability Distributions



Binomial Distribution
 In many situations, an experiment has only two outcomes: success and failure.
 Such an outcome is called a dichotomous outcome.
 An experiment that consists of repeated trials, each with a dichotomous outcome, is called a Bernoulli process. Each trial in it is called a Bernoulli trial.

Example 4.6: Firing bullets to hit a target.

 Suppose, in a Bernoulli process, we define a random variable X ≡ the number of successes in n trials.
 Such a random variable obeys the binomial probability distribution if the experiment satisfies the following conditions:
1) The experiment consists of n trials.
2) Each trial results in one of two mutually exclusive outcomes, one labelled a "success" and the other a "failure".
3) The probability of a success on a single trial is equal to p. The value of p remains constant throughout the experiment.
4) The trials are independent.


Defining Binomial Distribution
Definition 4.3: Binomial distribution
The function for computing the probability for the binomial probability distribution is given by

 f(x) = [n! / (x!(n − x)!)] p^x (1 − p)^(n−x),  for x = 0, 1, 2, …, n

Here, f(x) = P(X = x), where X denotes "the number of successes" and X = x denotes that the number of successes is x.


Binomial Distribution Curves

Binomial distribution vs. Normal distribution



Binomial Distribution
Example 4.7: Measles study
X = the number of parents who have had childhood measles (a success)
p = 0.2, the probability that a parent had childhood measles
n = 2; here a couple is an experiment and an individual a trial, so the number of trials is two.

Thus,
 P(X = 0) = [2! / (0!(2 − 0)!)] (0.2)^0 (0.8)^(2−0) = 0.64
 P(X = 1) = [2! / (1!(2 − 1)!)] (0.2)^1 (0.8)^(2−1) = 0.32
 P(X = 2) = [2! / (2!(2 − 2)!)] (0.2)^2 (0.8)^(2−2) = 0.04

 X    Probability
 0    0.64
 1    0.32
 2    0.04
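The three probabilities above can be checked with a short Python sketch using only the standard library (the function name is ours):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable: C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Measles study: n = 2 trials (parents), p = 0.2
for x in range(3):
    print(x, round(binomial_pmf(x, 2, 0.2), 2))  # 0.64, 0.32, 0.04
```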


Binomial Distribution
Example 4.8: Verify with a real-life experiment
Suppose 10 random two-digit numbers are generated by a computer (Monte Carlo simulation method):

15 38 68 39 49 54 19 79 38 14

If the value of a digit is 0 or 1, the outcome is "had childhood measles"; otherwise (digits 2 to 9), the outcome is "did not".
For example, the first pair (i.e., 15) represents a couple, and for this couple, x = 1. The frequency distribution for this sample is

 x              0    1    2
 f(x) = P(X=x)  0.7  0.3  0.0

Note: This has a close similarity with the binomial probability distribution!
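A larger version of this check is easy to run. The sketch below (our own, not from the slides) simulates many couples directly and compares the empirical frequencies with the binomial probabilities:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def simulate_couples(n_couples, p=0.2):
    """For each couple, count how many of the 2 parents had measles (prob p each)."""
    counts = [sum(random.random() < p for _ in range(2)) for _ in range(n_couples)]
    return {x: counts.count(x) / n_couples for x in (0, 1, 2)}

freq = simulate_couples(100_000)
print(freq)  # close to the binomial values {0: 0.64, 1: 0.32, 2: 0.04}
```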



The Multinomial Distribution
The binomial experiment becomes a multinomial experiment if each trial can have more than two possible outcomes.

Definition 4.4: Multinomial distribution
If a given trial can result in the k outcomes E1, E2, ……, Ek with probabilities p1, p2, ……, pk, then the probability distribution of the random variables X1, X2, ……, Xk, representing the number of occurrences of E1, E2, ……, Ek in n independent trials, is

 f(x1, x2, ……, xk) = [n! / (x1! x2! …… xk!)] p1^x1 p2^x2 …… pk^xk

with Σ(i=1..k) xi = n and Σ(i=1..k) pi = 1.
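Definition 4.4 can be sketched in a few lines of Python; the die-rolling example below is our own illustration, not from the slides:

```python
from math import factorial, prod

def multinomial_pmf(xs, ps):
    """P(X1=x1, ..., Xk=xk) over n = sum(xs) independent trials with probs ps."""
    coeff = factorial(sum(xs))          # n!
    for x in xs:
        coeff //= factorial(x)          # divided by x1! x2! ... xk!
    return coeff * prod(p ** x for p, x in zip(ps, xs))

# Roll a fair die 6 times: probability that each face appears exactly once
p = multinomial_pmf([1] * 6, [1 / 6] * 6)
print(p)  # 6!/6^6 ≈ 0.0154
```

With k = 2 this reduces to the binomial formula, which gives a quick sanity check.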


The Hypergeometric Distribution
• Samples can be collected with two strategies:
 • With replacement
 • Without replacement
• A necessary condition of the binomial distribution is that all trials are independent of each other.
 • When the sample is collected "with replacement", each trial in the sample collection is independent.

Example 4.9:
Probability of observing three red cards in 5 draws from an ordinary deck of 52 playing cards:
 − Draw one card, note the result, and then return it to the deck.
 − Reshuffle the deck well before the next draw is made.
• The hypergeometric distribution does not require independence and is based on sampling done without replacement.


The Hypergeometric Distribution
 In general, the hypergeometric probability distribution enables us to find the probability of selecting x successes in n trials from N items.

Properties of the Hypergeometric Distribution
• A random sample of size n is selected without replacement from N items.
• k of the N items may be classified as successes, and N − k items are classified as failures.
Let X denote a hypergeometric random variable defining the number of successes.

Definition 4.5: Hypergeometric Probability Distribution
The probability distribution of the hypergeometric random variable X, the number of successes in a random sample of size n selected from N items of which k are labelled success and N − k labelled failure, is given by

 f(x) = P(X = x) = C(k, x) · C(N − k, n − x) / C(N, n)

 for max(0, n − (N − k)) ≤ x ≤ min(n, k)

where C(a, b) denotes the binomial coefficient "a choose b".
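Definition 4.5 translates directly into code. The sketch below applies it to the without-replacement variant of Example 4.9 (the specific numbers are our illustration):

```python
from math import comb

def hypergeom_pmf(x, N, k, n):
    """P(X = x) successes in a sample of size n drawn without replacement
    from N items, k of which are successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Three red cards in 5 draws *without* replacement from a 52-card deck (26 red)
p = hypergeom_pmf(3, N=52, k=26, n=5)
print(round(p, 4))  # ≈ 0.3251
```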
Multivariate Hypergeometric Distribution
The hypergeometric distribution can be extended to treat the case where the N items can be divided into k classes A1, A2, ……, Ak, with a1 elements in the first class A1, …, and ak elements in the k-th class Ak. We are now interested in the probability that a random sample of size n yields x1 elements from A1, x2 elements from A2, ……, and xk elements from Ak.

Definition 4.6: Multivariate Hypergeometric Distribution
If N items are partitioned into k classes A1, A2, ……, Ak of sizes a1, a2, ……, ak respectively, then the probability distribution of the random variables X1, X2, ……, Xk, representing the number of elements selected from A1, A2, ……, Ak in a random sample of size n, is

 f(x1, x2, ……, xk) = P(X1 = x1, X2 = x2, ……, Xk = xk) = [C(a1, x1) C(a2, x2) …… C(ak, xk)] / C(N, n)

with Σ(i=1..k) xi = n and Σ(i=1..k) ai = N.


The Poisson Process
There are some experiments that involve counting the number of outcomes occurring during a given time interval (or in a region of space). Such a process is called a Poisson process.
The Poisson process is one of the most widely used counting processes. It is usually used in scenarios where we are counting the occurrences of certain events that appear to happen at a certain rate, but completely at random.

Example 4.10:
Number of customers visiting a ticket-selling counter in a railway station.


The Poisson Process
Properties of a Poisson process
 There is a discrete value, say x, the number of times an event occurs in an interval, and x can take values 0, 1, 2, ....
 The occurrence of one event does not affect the probability that a second event will occur. That is, events occur independently.
 The average rate at which events occur is assumed to be constant.
 Two events cannot occur at exactly the same instant; in each sufficiently small sub-interval, either exactly one event occurs or no event occurs.

If these conditions are true, then x is a Poisson random variable, and the distribution of x is a Poisson distribution.


The Poisson Distribution
Definition 4.7: Poisson distribution
The probability distribution of the Poisson random variable X, representing the number of outcomes occurring in a given time interval t, is

 f(x; λt) = P(X = x) = e^(−λt) (λt)^x / x!,  x = 0, 1, ……

where λ is the average number of outcomes per unit time and e = 2.71828…

What is P(X = x) if t = 0?
Example 4.11:
 The number of customers arriving at a grocery store can be modelled by a Poisson process with intensity λ = 10 customers per hour.
1. Find the probability that there are 2 customers between 10:00 and 10:20.
2. Find the probability that there are 3 customers between 10:00 and 10:20 and 7 customers between 10:20 and 11:00.
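Example 4.11 can be worked out directly from Definition 4.7; the sketch below also uses the independence of counts over disjoint intervals, a defining property of the Poisson process:

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """P(X = x) for a Poisson random variable with mean λt."""
    return exp(-mean) * mean ** x / factorial(x)

# λ = 10 per hour. 10:00-10:20 is t = 1/3 h (λt = 10/3); 10:20-11:00 is t = 2/3 h.
p1 = poisson_pmf(2, 10 / 3)
p2 = poisson_pmf(3, 10 / 3) * poisson_pmf(7, 20 / 3)  # disjoint intervals: multiply
print(round(p1, 4), round(p2, 4))  # ≈ 0.1982 and ≈ 0.0325
```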
The Poisson Distribution Curves



Descriptive measures
Given a random variable X in an experiment, we have denoted f(x) = P(X = x), the probability that X = x. For a discrete random variable, f(x) = 0 for all values of x except x = 0, 1, 2, …..

Properties of a discrete probability distribution
1. 0 ≤ f(x) ≤ 1
2. Σ f(x) = 1
3. μ = Σ x·f(x)  [ the mean ]
4. σ² = Σ (x − μ)²·f(x)  [ the variance ]

In 2, 3 and 4, the summation extends over all possible discrete values of x.

Note: For the discrete uniform distribution, f(x) = 1/n with x = 1, 2, ……, n,

 μ = (1/n) Σ(i=1..n) xi

and σ² = (1/n) Σ(i=1..n) (xi − μ)²
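Properties 3 and 4 can be checked numerically against the measles distribution, whose binomial mean and variance (n = 2, p = 0.2) are μ = np = 0.4 and σ² = np(1 − p) = 0.32. A quick sketch of ours:

```python
def mean_var(dist):
    """Mean and variance of a discrete distribution given as {x: P(X = x)}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

# Measles distribution from Example 4.7
mu, var = mean_var({0: 0.64, 1: 0.32, 2: 0.04})
print(mu, var)  # approximately 0.4 and 0.32
```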


Descriptive measures
1. Binomial distribution
The binomial probability distribution is characterized by p (the probability of success) and n (the number of trials). Then

 μ = n·p
 σ² = n·p(1 − p)

2. Hypergeometric distribution
The hypergeometric distribution is characterized by the size of the sample (n), the number of items (N), and the k items labelled success. Then

 μ = n·k/N
 σ² = [(N − n)/(N − 1)] · n · (k/N) · (1 − k/N)
Descriptive measures
3. Poisson distribution
The Poisson distribution is characterized by λt, where λ is the mean number of outcomes per unit time and t is the time interval.

 μ = λt
 σ² = λt

Alternative form of the Poisson distribution (with μ = λt):

 P(X = x) = e^(−μ) μ^x / x!


Special Case: Discrete Uniform Distribution
 Discrete uniform distribution
A random variable X has a discrete uniform distribution if each of the n values in its range, say x1, x2, x3, …, xn, has equal probability. That is,

 f(x) = 1/n

where f(x) represents the probability mass function.

 Mean and variance of the discrete uniform distribution
Suppose X is a discrete uniform random variable on the consecutive integers in the range [a, b], with a ≤ b. Then

 μ = (b + a)/2
 σ² = [(b − a + 1)² − 1]/12
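A quick check of these formulas for a fair six-sided die (our example), comparing them against the general discrete mean and variance formulas:

```python
# A fair die is discrete uniform on the consecutive integers [a, b] = [1, 6]
a, b = 1, 6
mu = (b + a) / 2
var = ((b - a + 1) ** 2 - 1) / 12

# Brute-force check using the general discrete mean/variance formulas
xs = list(range(a, b + 1))
mu_bf = sum(xs) / len(xs)
var_bf = sum((x - mu_bf) ** 2 for x in xs) / len(xs)
print(mu, var)  # 3.5 and 35/12 ≈ 2.9167, matching the brute-force values
```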


Concept of Probability Mass Function
 A probability mass function is a function that gives the probability that a
discrete random variable is exactly equal to some value. Sometimes it is also
known as the discrete probability density function.



Concept of Probability Mass Function
Example 4.12:



Continuous Probability Distributions



Continuous Probability Distributions

[Figure: a discrete probability distribution (probability mass at the points x1, x2, x3, x4) versus a continuous probability distribution (a density curve over X)]

Discrete random variable vs. Continuous random variable


Continuous Probability Distributions
 When the random variable of interest can take any value in an interval, it is called a continuous random variable.
 Every continuous random variable has an infinite, uncountable number of possible values (i.e., any value in an interval).
 Consequently, a continuous random variable differs from a discrete random variable (whose values form a finite or countably infinite sequence).

Examples 4.13:
1. Tax to be paid for a purchase in a shopping mall. Here, the random variable varies from 0 to +∞.
2. Amount of rainfall in mm in a region.
3. Earthquake intensity on the Richter scale.
4. Height of the earth's surface. Here, the random variable varies from −a to +b, a, b ∈ R, where R is the set of real numbers.


Continuous Uniform Distribution
 One of the simplest continuous distributions in all of statistics is the continuous uniform distribution.

[Figure: a flat density of height c between A and B]

Definition 4.8: Continuous Uniform Distribution
The density function of the continuous uniform random variable X on the interval [A, B] is:

 f(x; A, B) = 1/(B − A) for A ≤ x ≤ B
 f(x; A, B) = 0 otherwise


Continuous Uniform Distribution

Note:
a) ∫(−∞ to ∞) f(x) dx = [1/(B − A)] × (B − A) = 1
b) P(c < x < d) = (d − c)/(B − A), where both c and d are in the interval (A, B)
c) μ = (A + B)/2
d) σ² = (B − A)²/12
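Property (b) turns interval probabilities into simple length ratios. A sketch with hypothetical numbers (a waiting time assumed uniform on [0, 30] minutes):

```python
def uniform_prob(c, d, A, B):
    """P(c < X < d) for X uniform on [A, B]; assumes A <= c <= d <= B."""
    return (d - c) / (B - A)

# Waiting time uniform on [0, 30] minutes: probability of waiting 5 to 10 minutes
p = uniform_prob(5, 10, A=0, B=30)
print(p)  # 1/6 ≈ 0.1667; note P(X = 5) exactly is 0 (zero area under a point)
```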


Continuous Uniform Distribution: Example
Example 4.14:

What is the probability of waiting exactly five minutes, that is, P(X = 5)?

Note: Probability is represented by the area under the curve. The probability of a specific value of a continuous random variable is zero, because the area under a single point is zero.


Non-Uniform Distribution of X

 f(x) = [1/(σ√(2π))] e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞

Note:
This f(x) is called the probability density function.
It gives a nonnegative value.
It is used to define the random variable's probability of falling within a range of values, as opposed to taking on any one value.
Probability Distribution Function

 P(a ≤ X ≤ b) = ∫(a to b) f(x) dx
Continuous Probability Distributions
 Example 4.15:
 Suppose bacteria of a certain species typically live 4 to 6 hours. The
probability that a bacterium lives exactly 5 hours is equal to zero. A
lot of bacteria live for approximately 5 hours, but there is no chance
that any given bacterium dies at exactly 5.0000000000... hours.
 However, the probability that the bacterium dies between 5 hours and
5.01 hours is quantifiable.

 Suppose, the answer is 0.02 (i.e., 2%). Then, the probability that the
bacterium dies between 5 hours and 5.001 hours should be about
0.002, since this time interval is one-tenth as long as the previous.
The probability that the bacterium dies between 5 hours and 5.0001
hours should be about 0.0002, and so on.



Continuous Probability Distributions
 Note:
 In these three examples, the ratio (probability of dying during an
interval) / (duration of the interval) is approximately constant, and
equal to 2 per hour (or 2 hour−1). For example, there is 0.02
probability of dying in the 0.01-hour interval between 5 and 5.01
hours, and (0.02 probability / 0.01 hours) = 2 hour−1. This quantity 2
hour−1 is called the probability density for dying at around 5 hours.

 Therefore, the probability that the bacterium dies at 5 hours can be


written as (2 hour−1) dt. This is the probability that the bacterium dies
within an infinitesimal window of time around 5 hours, where dt is
the duration of this window.



Important note

 Probability mass function
 A probability mass function (PMF) is a function that gives the probability that a discrete random variable is exactly equal to some value.
 Sometimes it is also known as the discrete density function.

 Probability density function
 A probability density function (PDF) is the analogous concept for continuous rather than discrete random variables.

 Note:
 A PDF must be integrated over an interval to yield a probability.


Properties of Probability Density Function
The function f(x) is a probability density function for the continuous random variable X, defined over the set of real numbers R, if

1. f(x) ≥ 0, for all x ∈ R
2. ∫(−∞ to ∞) f(x) dx = 1
3. P(a ≤ X ≤ b) = ∫(a to b) f(x) dx
4. μ = ∫(−∞ to ∞) x·f(x) dx
5. σ² = ∫(−∞ to ∞) (x − μ)²·f(x) dx

Note: Probability is represented by the area under the curve.
The probability of a specific value of a continuous random variable is zero.


Normal Distribution
 The most often used continuous probability distribution is the normal distribution; it is also known as the Gaussian distribution.
 Its graph, called the normal curve, is the bell-shaped curve.
 Such a curve approximately describes many phenomena that occur in nature, industry, and research.
 Physical measurements in areas such as meteorological experiments, rainfall studies, and measurement of manufactured parts are often more than adequately explained with a normal distribution.
 A continuous random variable X having the bell-shaped distribution is called a normal random variable.


Normal Distribution
• The mathematical equation for the probability distribution of the normal variable depends upon the two parameters μ and σ, its mean and standard deviation.

Definition 4.9: Normal distribution
The density of the normal variable x with mean μ and variance σ² is

 f(x) = [1/(σ√(2π))] e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞

where π = 3.14159… and e = 2.71828….., the Napierian constant
Normal Distribution Curves

[Figures:
Normal curves with µ1 < µ2 and σ1 = σ2;
Normal curves with µ1 = µ2 and σ1 < σ2;
Normal curves with µ1 < µ2 and σ1 < σ2]


Normal Distribution Curves: Example
Example 4.16:
Suppose the fasting glucose levels of diabetic patients taking two drugs A and B are shown in two distribution graphs. Which drug is better?

Here, both drugs have the same mean, but drug B would be the better medication, as fewer patients on this distribution have very high or very low glucose levels.
Properties of Normal Distribution
 The curve is symmetric about a vertical axis through the mean μ.
 The random variable x can take any value from −∞ to ∞.
 The two parameters μ and σ, the most frequently used descriptive parameters, define the curve itself.
 The mode, which is the point on the horizontal axis where the curve is a maximum, occurs at x = μ.
 The total area under the curve and above the horizontal axis is equal to 1:

 ∫(−∞ to ∞) f(x) dx = [1/(σ√(2π))] ∫(−∞ to ∞) e^(−(x−μ)²/(2σ²)) dx = 1

 μ = ∫(−∞ to ∞) x·f(x) dx = [1/(σ√(2π))] ∫(−∞ to ∞) x·e^(−(x−μ)²/(2σ²)) dx

 σ² = [1/(σ√(2π))] ∫(−∞ to ∞) (x − μ)²·e^(−(x−μ)²/(2σ²)) dx

 P(x1 < x < x2) = [1/(σ√(2π))] ∫(x1 to x2) e^(−(x−μ)²/(2σ²)) dx

denotes the probability of x in the interval (x1, x2).


Standard Deviation and the Normal Distribution
 The standard deviation is particularly useful in the normal distribution.
 The proportion of elements in the normal distribution (i.e., the proportion of the area under the curve) is a constant for a given number of standard deviations above or below the mean of the distribution:

 x = μ + z·σ

 Approximately 68% of the distribution falls within ±1 standard deviation of the mean.
 Approximately 95% of the distribution falls within ±2 standard deviations of the mean.
 Approximately 99.7% of the distribution falls within ±3 standard deviations of the mean.

Note:
These proportions hold true for every normal distribution.


Standard Deviation and the Normal Distribution
Example 4.17:
 Suppose a population's resting heart rate is normally distributed with a mean (μ) of 70 and a standard deviation (σ) of 10.

 x = μ + z·σ

 68% of the population will have a resting heart rate between 60 and 80 beats/min.
 95% of the population will have a heart rate between approximately 70 ± (2×10), that is, 50 and 90 beats/min.


Table of z scores



Table of z scores
Example 4.18:

• The nearest figure to 5% (0.05) in the table is 0.0495; the z-score corresponding to this is 1.65.
• The corresponding heart rate lies 1.65 standard deviations above the mean; that is, it is equal to 70 + 1.65×10 = 86.5.
• We can conclude that 5% of this population has a heart rate above 86.5 beats/min.
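The table lookup in Example 4.18 can be reproduced with Python's standard library, which provides the inverse normal CDF directly:

```python
from statistics import NormalDist

# Resting heart rate: normal with μ = 70, σ = 10 (Example 4.17)
hr = NormalDist(mu=70, sigma=10)

# Heart rate exceeded by exactly 5% of the population
cutoff = hr.inv_cdf(0.95)
print(round(cutoff, 1))  # ≈ 86.4 (the table's rounded z = 1.65 gave 86.5)
```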


Z scores: Standard Normal Distribution
 X: a normal distribution with mean μ and variance σ².
 Computing P(x1 < x < x2) directly from the normal density is computationally tedious, so X is standardized using the transformation

 z = (x − μ)/σ  [z-transformation]


Standard Normal Distribution
Definition 4.10: Standard normal distribution
The simplest case of a normal distribution is known as the standard normal distribution or unit normal distribution. This is the special case when µ = 0 and variance σ² = 1. It is described by the following density:

 f(z; 0, 1) = [1/√(2π)] e^(−z²/2)

[Figures: a general normal density f(x: µ, σ) centred at x = µ (left), and the standard normal density f(z: 0, 1) with σ = 1 centred at µ = 0 (right)]
Standard Normal Distribution: Example
Example 4.19:

 The lifespans of diabetic patients are normally distributed with a mean of 50 years and a standard deviation of 15 years.
What is the probability that a patient will survive between 50 and 70 years?

Hint: Find the probability that x is between 50 and 70 or P( 50< x < 70).
You can use both normal and z-distribution tables.

Given mean, μ= 50 and standard deviation, σ = 15


To find: Probability that x is between 50 and 70 or P( 50< x < 70)

By using the transformation equation, we know;


z = (x – μ) / σ
For x = 50 , z = (50 – 50) / 15 = 0
For x = 70 , z = (70 – 50) / 15 = 1.33
P( 50< x < 70) = P( 0< z < 1.33) = [area to the left of z = 1.33] – [area to the left of z = 0]

From the z-table, we get the value, such as


P( 0< z < 1.33) = 0.9082 – 0.5 = 0.4082

The probability that a patient will survive between 50 and 70 years is equal to 0.4082.
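The same computation can be done without tables, using the standard library's normal CDF (a quick sketch):

```python
from statistics import NormalDist

# Lifespan: normal with μ = 50 years, σ = 15 years (Example 4.19)
life = NormalDist(mu=50, sigma=15)
p = life.cdf(70) - life.cdf(50)
print(round(p, 4))  # ≈ 0.4088 (the table answer 0.4082 rounds z to 1.33)
```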



Gamma Distribution
The gamma distribution derives its name from the well-known gamma function in mathematics.

Definition 4.11: Gamma Function

 Γ(α) = ∫(0 to ∞) x^(α−1) e^(−x) dx, for α > 0

Integrating by parts, we can write

 Γ(α) = (α − 1) ∫(0 to ∞) x^(α−2) e^(−x) dx = (α − 1) Γ(α − 1)

Thus the Γ function is defined recursively.


Gamma Distribution
When α = n, a positive integer, we can write

 Γ(n) = (n − 1)(n − 2)……Γ(1)
      = (n − 1)(n − 2)……3·2·1
      = (n − 1)!

Further, Γ(1) = ∫(0 to ∞) e^(−x) dx = 1

Note:
 Γ(1/2) = √π  [An important property]


Gamma Distribution
Definition 4.12: Gamma Distribution
The continuous random variable x has a gamma distribution with parameters α and β such that:

 f(x; α, β) = [1/(β^α Γ(α))] x^(α−1) e^(−x/β) for x > 0
 f(x; α, β) = 0 otherwise

where α > 0 and β > 0

[Figure: gamma density curves f(x) for β = 1 and shape parameter values 1, 2, 4]


Exponential Distribution
Definition 4.13: Exponential Distribution
The continuous random variable x has an exponential distribution with parameter β, where:

 f(x; β) = (1/β) e^(−x/β) for x > 0, where β > 0
 f(x; β) = 0 otherwise

Note:
1) The mean and variance of the gamma distribution are
 μ = αβ
 σ² = αβ²
2) The mean and variance of the exponential distribution are
 μ = β
 σ² = β²


Chi-Squared Distribution
Definition 4.14: Chi-squared distribution
The continuous random variable x has a chi-squared distribution with v degrees of freedom, given by

 f(x; v) = [1/(2^(v/2) Γ(v/2))] x^(v/2 − 1) e^(−x/2), for x > 0
 f(x; v) = 0 otherwise

where v is a positive integer.

• The chi-squared distribution plays an important role in statistical inference.

• The mean and variance of the chi-squared distribution are:

 μ = v and σ² = 2v


Lognormal Distribution
The lognormal distribution applies in cases where a natural log transformation results in a normal distribution.

Definition 4.15: Lognormal distribution
The continuous random variable x has a lognormal distribution if the random variable y = ln(x) has a normal distribution with mean μ and standard deviation σ. The resulting density function of x is:

 f(x; μ, σ) = [1/(σx√(2π))] e^(−[ln(x) − μ]²/(2σ²)) for x ≥ 0
 f(x; μ, σ) = 0 for x < 0




Weibull Distribution
Definition 4.16: Weibull Distribution
The continuous random variable x has a Weibull distribution with parameters α and β such that:

 f(x; α, β) = αβ x^(β−1) e^(−αx^β) for x > 0
 f(x; α, β) = 0 otherwise

where α > 0 and β > 0

The mean and variance of the Weibull distribution are:

 μ = α^(−1/β) Γ(1 + 1/β)

 σ² = α^(−2/β) {Γ(1 + 2/β) − [Γ(1 + 1/β)]²}
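These moment formulas can be evaluated with the standard library's gamma function. With β = 1 the Weibull in this parameterization reduces to an exponential with rate α (mean 1/α, variance 1/α²), which gives an easy sanity check; the sketch is ours:

```python
from math import gamma

def weibull_mean_var(alpha, beta):
    """Mean and variance for f(x) = αβ x^(β-1) e^(−α x^β), x > 0."""
    mu = alpha ** (-1 / beta) * gamma(1 + 1 / beta)
    var = alpha ** (-2 / beta) * (gamma(1 + 2 / beta) - gamma(1 + 1 / beta) ** 2)
    return mu, var

# β = 1: exponential with rate α = 2, so mean 1/2 and variance 1/4
mu, var = weibull_mean_var(alpha=2.0, beta=1.0)
print(mu, var)  # 0.5 0.25
```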


Sample Statistics



In the next part of discussion…
 Basic concept of sampling distribution
 Usage of sampling distributions
 Issues with sampling distributions
 Central Limit Theorem
 Application of Central Limit Theorem
 Major sampling distributions
  χ² distribution
  t-distribution
  F distribution


Sampling Distribution
There are two facts which are key to statistical inference.
1. Population parameters are fixed numbers whose values are usually unknown.
2. Sample statistics are known values for any given sample, but vary from sample to
   sample, even when taken from the same population.
 In fact, it is unlikely that any two independently drawn samples will produce
   identical values of a sample statistic.
 In other words, the variability of sample statistics is always present and must be
   accounted for in any inferential procedure.
 This variability is called sampling variation.

Note:
A sample statistic is a random variable and, like any other random variable, a sample
statistic has a probability distribution.

Note: The probability distribution of a sample statistic is, in general, not the same as
the probability distribution of the underlying random variable.
Sampling Distribution
More precisely, sampling distributions are probability distributions used to describe
the variability of sample statistics.

Definition 4.17: Sampling distribution

The sampling distribution of a statistic is the probability distribution of that
statistic.

 The probability distribution of the sample mean (hereafter denoted as 𝑋̄) is called
  the sampling distribution of the mean (also referred to as the distribution of the
  sample mean).
 Like 𝑋̄, the probability distribution of the sample variance (denoted as 𝑆²) is called
  the sampling distribution of the variance.
 Using the values of 𝑋̄ and 𝑆² for different random samples of a population, we are to
  make inferences on the parameters 𝜇 and 𝜎² (of the population).



Sampling Distribution
Example 4.20:
Consider five identical balls, numbered (and weighted) 1, 2, 3, 4 and 5. Consider an
experiment consisting of drawing two balls, replacing the first before drawing the second,
and then computing the mean of the values of the two balls.
The following table lists all possible samples and their means.

Sample (𝑿)  Mean (𝑿̄)    Sample (𝑿)  Mean (𝑿̄)    Sample (𝑿)  Mean (𝑿̄)
[1,1]       1.0          [2,4]       3.0          [4,2]       3.0
[1,2]       1.5          [2,5]       3.5          [4,3]       3.5
[1,3]       2.0          [3,1]       2.0          [4,4]       4.0
[1,4]       2.5          [3,2]       2.5          [4,5]       4.5
[1,5]       3.0          [3,3]       3.0          [5,1]       3.0
[2,1]       1.5          [3,4]       3.5          [5,2]       3.5
[2,2]       2.0          [3,5]       4.0          [5,3]       4.0
[2,3]       2.5          [4,1]       2.5          [5,4]       4.5
                                                  [5,5]       5.0
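The table of samples and means above can be reproduced by enumerating all 25 ordered samples drawn with replacement, for example:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

balls = [1, 2, 3, 4, 5]

# All ordered samples of size 2, drawn with replacement (25 in total).
samples = list(product(balls, repeat=2))
means = [(a + b) / 2 for a, b in samples]

# Sampling distribution of the mean: P(X-bar = m) = count / 25
dist = {m: Fraction(c, len(samples)) for m, c in sorted(Counter(means).items())}
for m, p in dist.items():
    print(m, p)
```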
Sampling Distribution
Sampling distribution of means
𝑋̄       1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0
𝑓(𝑋̄)    1/25   2/25   3/25   4/25   5/25   4/25   3/25   2/25   1/25

(Figure: histogram of the sampling distribution of the mean over 𝑋̄ = 1.0, …, 5.0.)


Issues with Sampling Distribution
1. In practical situations, for a large population it is infeasible to enumerate all
   possible samples, and hence to obtain the frequency distribution of a sample
   statistic.

2. The sampling distribution of a statistic depends on

    the size of the population

    the size of the samples, and

    the method of choosing the samples.
Theorem on Sampling Distribution
Famous theorem in Statistics

Theorem 4.18: Sampling distribution of means

The sampling distribution of the mean of random samples of size 𝑛 drawn from a
population with mean 𝜇 and variance 𝜎² has mean

    𝜇_𝑋̄ = 𝜇

and the variance of the samples' means is

    𝜎²_𝑋̄ = 𝜎²/𝑛

Example: The two-balls experiment obeys the theorem.

Example 4.21: With reference to the data in Example 4.20

For the population,  𝜇 = (5 + 1)/2 = 3
                     𝜎² = (5² − 1)/12 = 2

Applying the theorem, the mean of the sample means is 3, and the variance of the
sample means is 𝜎²/𝑛 = 2/2 = 1, which matches the values computed directly from
the table above.
Hence, the theorem!
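A short enumeration check of the theorem against the two-balls experiment; it confirms that the sample means have mean 𝜇 = 3 and variance 𝜎²/𝑛 = 2/2 = 1:

```python
from itertools import product

balls = [1, 2, 3, 4, 5]
n = 2

# Population parameters
mu = sum(balls) / len(balls)                              # 3.0
sigma2 = sum((x - mu) ** 2 for x in balls) / len(balls)   # 2.0

# All sample means for ordered samples of size n drawn with replacement
means = [sum(s) / n for s in product(balls, repeat=n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

print(mean_of_means)              # equals mu
print(var_of_means, sigma2 / n)   # both equal sigma^2 / n
```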



Central Limit Theorem
Theorem 4.18 is an amazing result. In fact, it can also be verified that if we sample
from a population with an unknown distribution, the sampling distribution of 𝑋̄ will still
be approximately normal with mean 𝜇 and variance 𝜎²/𝑛, provided that the sample size
is large.

This can further be established with the famous "Central Limit Theorem", which is
stated below.

Theorem 4.19: Central Limit Theorem

If random samples each of size 𝑛 are taken from any distribution with mean 𝜇
and variance 𝜎², the sample mean 𝑋̄ will have a distribution approximately
normal with mean 𝜇 and variance 𝜎²/𝑛.

The approximation becomes better as 𝑛 increases.



Applicability of Central Limit Theorem
 The normal approximation of 𝑋̄ will generally be good if 𝑛 ≥ 30.
 The sample size 𝑛 = 30 is, hence, a guideline for the central limit theorem.
 The normality of the distribution of 𝑋̄ becomes more accurate as 𝑛 grows
  larger.

(Figure: sampling distributions of 𝑋̄ for 𝑛 = 1, small to moderate 𝑛, and large 𝑛.)
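The guideline can be illustrated by simulation: draw many samples of size n = 30 from a clearly non-normal population (here an Exponential distribution with mean 1, an arbitrary choice for the sketch) and inspect the mean and variance of the resulting sample means:

```python
import random
import statistics

random.seed(1)

n = 30           # the rule-of-thumb sample size
trials = 20_000  # number of repeated samples

# Exponential(1) has mean 1 and variance 1, but is strongly skewed.
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

# The CLT predicts mean(X-bar) ~ 1 and variance(X-bar) ~ sigma^2 / n = 1/30.
print(statistics.mean(means))
print(statistics.variance(means))
```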



Usefulness of the sampling distribution
 The mean of the sampling distribution of the means is the
  population mean.
    This implies that "on the average" the sample mean is the same as the
     population mean.
    We therefore say that the sample mean is an unbiased estimate of the
     population mean.

 The variance of the distribution of the sample means is 𝜎²/𝑛.

 The standard deviation of the sampling distribution of the mean (i.e., 𝜎/√𝑛)
  is often called the standard error of the mean.
    If the standard error is high, then the sample mean is not a reliable estimate.
    For a very large sample size (𝑛 → ∞), the standard error tends to zero.



Applicability of Central Limit Theorem
• One very important application of the Central Limit Theorem is the
  determination of reasonable values of the population mean 𝜇 and
  variance 𝜎², given a sample, that is, a subset of a population.

• One very important deduction

  For the standard normal distribution, we have the z-transformation

      𝑧 = (𝑥 − 𝜇)/𝜎        (See Slide #53)

  Thus, for the sample mean,

      𝑧 = (𝑋̄ − 𝜇) / (𝜎/√𝑛)



Applicability of Central Limit Theorem
Example 4.22:
• A quiz test for the course CS61061 was conducted, and it was found that the
  mean of the scores is 𝜇 = 90 with standard deviation 𝜎 = 20.
• Now, all students enrolled in the course are randomly assigned to
  various sections of 100 students each. A section (X) was checked and
  its mean score was found to be 𝑋̄ = 86.

• What is the standard error?

  The standard error (by the Central Limit Theorem) = 𝜎/√𝑛 = 20/√100 = 2.0

• What is the probability of getting a mean of 86 or lower on the
  quiz test?

  For the standard normal distribution, we have the z-transformation
      𝑧 = (𝑥 − 𝜇)/𝜎
  Thus, for the sample mean,
      𝑧 = (𝑋̄ − 𝜇)/(𝜎/√𝑛) = (86 − 90)/2 = −2,  so we need 𝑃(𝑍 < −2) ≈ 0.0228.
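The probability P(Z < −2) can be computed without a Z-table, using the error function from Python's standard library (Φ(z) = (1 + erf(z/√2))/2):

```python
import math

mu, sigma, n = 90, 20, 100
x_bar = 86

standard_error = sigma / math.sqrt(n)   # 2.0
z = (x_bar - mu) / standard_error       # -2.0

# Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(standard_error, z, p)  # p is about 0.0228
```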
Standard Sampling Distributions
 Apart from the normal distribution used to describe sampling distributions, there
  are some other, quite different, sampling distributions that are extensively
  referred to in the study of statistical inference.

 𝜒²: Describes the distribution of variances.

 𝑡: Describes the distribution of a normally distributed random variable
    standardized by an estimate of the standard deviation.

 F: Describes the distribution of the ratio of two variances.



Chi-square Distribution



The 𝜒² Distribution
 A common use of the 𝜒² distribution is to describe the distribution of
  sample variances.

 Let X₁, X₂, . . . , Xₙ be independent random variables from a normally
  distributed population with mean 𝜇 and variance 𝜎².
 The 𝜒² random variable can be written as

      𝜒² = 𝑍₁² + 𝑍₂² + 𝑍₃² + ∙∙∙ + 𝑍ₙ²

  where 𝑍ᵢ = (𝑋ᵢ − 𝜇)/𝜎.
 This 𝜒² is also a random variable, and its distribution is called the
  𝜒²-distribution (pronounced Chi-square distribution).



𝜒² is the distribution of sample variances
 A common use of the 𝜒² distribution is to describe the distribution of the
  sample variance. Let X₁, X₂, . . . , Xₙ be a random sample from a normally
  distributed population with mean 𝜇 and variance 𝜎². Then the
  quantity (n − 1)S²/𝜎² is a random variable whose distribution is described
  by a 𝜒² distribution with (n − 1) degrees of freedom, where S² is the usual
  sample estimate of the population variance. That is

      𝑆² = Σ(𝑋 − 𝑋̄)² / (𝑛 − 1)

 In other words, the 𝜒² distribution is used to describe the sampling
  distribution of S². Since we divide the sum of squares by degrees of
  freedom to obtain the variance estimate, the expression for the random
  variable having a 𝜒² distribution can be written

      𝜒² = Σ𝑍² = Σ[(𝑋 − 𝑋̄)/𝜎]² = 𝑆𝑆/𝜎² = (𝑛 − 1)𝑆²/𝜎²
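A quick numeric check that SS/𝜎² and (n − 1)S²/𝜎² are the same quantity; the data and the value of 𝜎² below are hypothetical:

```python
import statistics

sigma2 = 4.0                                  # assumed known population variance
data = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]    # hypothetical sample
n = len(data)

s2 = statistics.variance(data)   # sample variance with n-1 in the denominator
x_bar = statistics.mean(data)
ss = sum((x - x_bar) ** 2 for x in data)

# SS / sigma^2 and (n-1) S^2 / sigma^2 are the same random variable
chi2_from_ss = ss / sigma2
chi2_from_s2 = (n - 1) * s2 / sigma2
print(chi2_from_ss, chi2_from_s2)  # identical values
```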



The 𝜒² Distribution

Definition 4.19: 𝝌²-distribution for the sample variance

If 𝑆² is the variance of a random sample of size n taken from a normal population
having the variance 𝜎², then the statistic

    𝜒² = (𝑛 − 1)𝑆²/𝜎² = Σᵢ₌₁ⁿ [(𝑥ᵢ − 𝑥̄)/𝜎]²

has a chi-squared distribution with 𝑣 = 𝑛 − 1 degrees of freedom, and its variance
is 2𝑣.

How is the 𝜒²-distribution used to describe the sampling distribution of 𝑆²?



The 𝜒² Distribution
 The 𝜒² random variable can be written as

      𝜒² = Σ𝑍² = Σ[(𝑋 − 𝑋̄)/𝜎]² = 𝑆𝑆/𝜎² = (𝑛 − 1)𝑆²/𝜎²

 This expression describes the 𝜒² distribution (of n samples), which has
  degrees of freedom 𝜐 = n − 1 and is often written as 𝜒²(𝜐), where 𝜐 is the only
  parameter in it.

 The mean of this distribution is 𝜐 and its variance is 2𝜐.



Some facts about the 𝜒² distribution
 The curves are non-symmetric and skewed to the right.

 𝜒² values cannot be negative since they are sums of squares.

 The mean of the 𝜒² distribution is ν, and the variance is 2ν.

 When 𝜐 > 30, the Chi-square curve approximates the
  normal distribution. Then, you may write the following:

      𝑍 = (𝜒² − 𝜐) / √(2𝜐)



Application of 𝜒² values
Example 4.23: Judging the quality of a machine
A machine is to produce balls of 100 gm each. It is desirable to have a maximum
deviation of 0.01 gm (this is the desirable value of 𝜎).
Suppose 15 balls produced by the machine are selected at random, and they
show S = 0.0125 gm.
What is the probability that the machine is producing balls with the desired accuracy?

A 𝜒² calculation can help us to know this value.

    𝜒² = (𝑛 − 1)𝑆²/𝜎² = 14 × (0.0125)²/(0.01)² = 21.875

This is the 𝜒² value with 14 degrees of freedom. The value can be checked against
a 𝜒² table to find the desired probability value.
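Without a 𝜒² table, the tail probability P(𝜒²₁₄ > 21.875) can be estimated by Monte Carlo, using the fact from the definition above that a 𝜒² variable with 14 degrees of freedom is a sum of 14 squared standard normals. This is a simulation sketch, not an exact table lookup:

```python
import random

random.seed(0)

df = 14
chi2_observed = 14 * (0.0125 ** 2) / (0.01 ** 2)   # 21.875, as computed above

# Monte Carlo: a chi-square(df) draw is a sum of df squared standard normals.
trials = 200_000
exceed = sum(
    1 for _ in range(trials)
    if sum(random.gauss(0, 1) ** 2 for _ in range(df)) > chi2_observed
)
p_value = exceed / trials
print(p_value)  # roughly 0.08: S = 0.0125 is not wildly unlikely under sigma = 0.01
```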



t-Distribution



The 𝒕 Distribution
1. To obtain the sampling distribution of the mean, we make use of the Central Limit
   Theorem with 𝑍 = (𝑋̄ − 𝜇)/(𝜎/√𝑛).

2. This requires the value of 𝜎 to be known a priori.

3. However, in many situations, knowledge of 𝜎 is certainly no more reasonable than
   knowledge of the population mean 𝜇.

4. In such situations, the only available measure of the standard deviation may be the
   sample standard deviation 𝑆.

5. It is natural then to substitute 𝑆 for 𝜎. The problem is that the resulting statistic is
   not necessarily normally distributed!

6. The 𝑡 distribution alleviates this problem. This distribution is called Student's 𝑡,
   or simply the 𝑡-distribution.



The 𝒕 Distribution

Definition 4.20: 𝒕-distribution

The 𝑡-distribution with 𝑣 degrees of freedom takes the form

    𝑡(𝑣) = 𝑍 / √(𝜒²(𝑣)/𝑣)

where 𝑍 is a standard normal random variable, and 𝜒²(𝑣) is a 𝜒² random
variable with 𝑣 degrees of freedom.



The 𝒕 Distribution
Corollary: Let 𝑋₁, 𝑋₂, … , 𝑋ₙ be independent random variables that are all normal
with mean 𝜇 and standard deviation 𝜎.

Let 𝑋̄ = (1/𝑛) Σᵢ₌₁ⁿ 𝑋ᵢ   and   𝑆² = (1/(𝑛−1)) Σᵢ₌₁ⁿ (𝑋ᵢ − 𝑋̄)²

Using this definition, we can develop the sampling distribution of the sample mean
when the population variance 𝜎² is unknown.

That is,

    𝑧 = (𝑋̄ − 𝜇)/(𝜎/√𝑛) has the standard normal distribution, and

    𝜒² = (𝑛 − 1)𝑆²/𝜎² has the 𝜒² distribution with (𝑛 − 1) degrees of freedom.

Thus,

    𝑡 = [(𝑋̄ − 𝜇)/(𝜎/√𝑛)] / √[((𝑛 − 1)𝑆²/𝜎²) / (𝑛 − 1)]   or   𝑡 = (𝑋̄ − 𝜇)/(𝑆/√𝑛)

This is the 𝑡-distribution with (𝑛 − 1) degrees of freedom.
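A minimal sketch of computing the t statistic from a sample when 𝜎 is unknown; the scores and the hypothesized mean μ = 50 below are hypothetical:

```python
import math
import statistics

# Hypothetical sample scores; hypothesized population mean mu = 50
sample = [52.1, 48.3, 55.0, 49.7, 51.2, 53.8, 47.9, 50.5]
mu = 50.0

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)   # sample standard deviation, n-1 in the denominator

# t = (X-bar - mu) / (S / sqrt(n)), with n - 1 = 7 degrees of freedom
t = (x_bar - mu) / (s / math.sqrt(n))
print(t)
```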



F Distribution



The 𝐹 Distribution
 The 𝐹 distribution finds enormous applications in comparing sample variances.

Definition 4.21: 𝑭 distribution

The statistic F is defined to be the ratio of two independent Chi-squared
random variables, each divided by its number of degrees of freedom. Hence,

    F(𝑣₁, 𝑣₂) = [𝜒²(𝑣₁)/𝑣₁] / [𝜒²(𝑣₂)/𝑣₂]

Corollary: Recall that 𝜒² = (𝑛 − 1)𝑆²/𝜎² is the Chi-squared distribution with
(𝑛 − 1) degrees of freedom.

Therefore, if we assume that we have a sample of size 𝑛₁ from a population with
variance 𝜎₁² and an independent sample of size 𝑛₂ from another population with
variance 𝜎₂², then the statistic

    𝐹 = (𝑆₁²/𝜎₁²) / (𝑆₂²/𝜎₂²)

has an 𝐹 distribution with (𝑛₁ − 1) and (𝑛₂ − 1) degrees of freedom.
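A minimal sketch of forming the F ratio from two hypothetical samples (under H₀: 𝜎₁² = 𝜎₂², the 𝜎's cancel and F reduces to S₁²/S₂²):

```python
import statistics

# Two hypothetical independent samples
sample1 = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]
sample2 = [6.0, 6.3, 5.8, 6.1, 5.9, 6.2, 6.4]

s1_sq = statistics.variance(sample1)  # n-1 in the denominator
s2_sq = statistics.variance(sample2)

# Under H0 (equal population variances) the ratio S1^2 / S2^2 follows
# F(n1 - 1, n2 - 1) = F(5, 6).
f = s1_sq / s2_sq
print(f)  # here sample1 is visibly more spread out, so f is well above 1
```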



Summary of sampling distributions
𝒁 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
 Typically used for comparing the mean of a sample to some hypothesized
  mean for the population, in the case of a large sample or when the population
  variance is known.

𝒕 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
 It is used when the population variance is not known. In this case, we use the
  variance of the sample as an estimate of the population variance.

𝝌𝟐 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
 It is used for comparing a sample variance to a theoretical population
variance.

𝑭 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
 It is used for comparing the variance of two or more populations.



Reference

 Detailed material related to this lecture can be found in

  Probability and Statistics for Engineers and Scientists (8th Ed.) by
  Ronald E. Walpole, Sharon L. Myers, Keying Ye (Pearson), 2013.



Any question?



Questions of the day…
1. Give some examples of random variables. For each, state the range
   of values it can take and whether it is continuous or discrete.

2. In the following cases, which probability distributions are likely to
   be followed? In each case, you should mention the random variable
   and the parameter(s) influencing the probability distribution
   function.
   a) In a retail store, how many counters should be opened at a given
      time period?
   b) Number of people who are suffering from cancer in a town.



c) A missile will hit the enemy’s aircraft.
d) A student in the class will secure EX grade.
e) Salary of a person in an enterprise.
f) Accident made by cars in a city.
g) People quit education after i) primary ii) secondary and iii) higher
secondary educations.



3. How can you calculate the mean and standard deviation of
   a population if the population follows the following
   probability distribution functions with respect to an event?
a) Binomial distribution function.
b) Poisson’s distribution function.
c) Hypergeometric distribution function.
d) Normal distribution function.
e) Standard normal distribution function.



4. What are the degrees of freedom in the following cases?
   Case 1: A single number.
   Case 2: A list of n numbers.
   Case 3: A table of data with m rows and n columns.
   Case 4: A data cube with dimension m×n×p.



5. In the following, two normal sampling distributions
   are shown with parameters n, μ and σ (all symbols
   bear their usual meanings).

   (Figure: two normal curves, labelled 𝑛₁, 𝜇₁, 𝜎₁ and 𝑛₂, 𝜇₂, 𝜎₂.)

   What are the relations among the parameters of the two?



6. Suppose 𝑋̄ and S denote the sample mean and
   standard deviation of a sample. Assume that the
   population follows a normal distribution with
   population mean 𝜇 and standard deviation 𝜎. Write
   down the expressions for the z and t values with
   n − 1 degrees of freedom.
7. If X and Y are two random variables, how can you
   express the random variable Z = X + Y? Illustrate
   your idea with an example.

