Probability Theory and Statistics Basics-V 1.1

Probability Theory &
Statistics
What is Statistics
• Statistics deals with the collection, analysis, interpretation,
presentation and organization of data.
• Statistics allows companies to collect data and translate it
into information, so that decisions can be based on facts
rather than on intuition, gut feel or past experience.
• Statistics is like a powerful microscope that makes visible
what was previously invisible.
• Statistics is a tool that separates common-sense reasoning
from extraordinary reasoning.
• Statistics creates a foundation for quality, which translates into
profitability and market share.
Concept of Data
• Data is a collection of facts
• It may be values or measurements
• It may be words or descriptions of things
• Data in its most basic form is raw in nature
Data is characterized by:
• Type of data
  – Quantitative
  – Qualitative
• Unit of measure
[Figure: diagram with the labels Data, Information and Knowledge]
Types of Data - Quantitative
Quantitative Data:
• A number attached from birth
• It takes on numerical values
Continuous/Variable (e.g. 5.364…)
Characteristics that may assume any value within their range of variation,
e.g. height, weight, diameter.
Discrete/Attribute (e.g. 7)
Characteristics that assume only isolated values within their range of
variation, e.g. number of complaints per month, percentage absenteeism, number of
injuries.
Types of Data - Qualitative
Qualitative Data:
• No number attached from birth
• It takes on categorical values
Takes on values in one of K different classes or categories.
Example: Gender (Male/Female), Brand of Product Purchased (Brand A, B, C),
Person Defaults on Debt (Yes/No)
• Ordinal: ordered data, e.g. rating, ranking, percentile
• Nominal: list of identities without order, e.g. gender
• Binary: two-class categorical data of the present/absent type, with
states 1 and 0 respectively. It is also referred to as Boolean when the two
states correspond to TRUE and FALSE.
Interval and Ratio Measurement
• In nominal and ordinal measurement, the distance between attributes does not have any
meaning.
• In interval measurement, however, the distance between attributes does have a
meaning. E.g. if we are measuring temperature on the Fahrenheit scale, then the
distance between 30 and 40 is the same as the distance between 70 and 80.
• In such a situation it makes sense to compute an average on the interval scale.
However, it does not make sense to compute ratios on this scale, as 80 degrees
is not twice as hot as 40 degrees.
• In ratio measurement there is always an absolute zero that is meaningful. This
means that one can construct a meaningful fraction (or ratio) with a ratio
variable.
• Weight and height are ratio variables, and so are most count variables. E.g. the number of
defects this week is twice that of last week.
Descriptive Statistics
Average or Mean - Sum of all Data divided by number of data points
Median - Middle Data Value when data is ranked from min. to max.
Mode - Data which has maximum frequency.
Maximum - Largest of Data Points.
Minimum - Smallest of Data Points
Range - Difference between Maximum & Minimum

Standard Deviation - $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
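As a minimal sketch, the measures listed above can be computed with Python's standard `statistics` module. The sample values are made up for illustration, and the standard deviation is also written out by hand so that the n-1 denominator is visible.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)        # sum of all data / number of data points
median = statistics.median(data)    # middle value of the ranked data
mode = statistics.mode(data)        # value with the maximum frequency
data_range = max(data) - min(data)  # maximum minus minimum

# Sample standard deviation written out from the formula (n - 1 in the denominator).
n = len(data)
s_manual = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# The library routine uses the same n - 1 definition.
s_library = statistics.stdev(data)

print(mean, median, mode, data_range, s_manual, s_library)
```

`statistics.pstdev` would divide by n instead of n-1; which one is appropriate depends on whether the data is treated as a sample or as the whole population.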
Central Tendency – Mean, Median, Mode
| Mean | Median | Mode |
| --- | --- | --- |
| Sum of all data values divided by number of data points | Middle data value when data is ranked from min to max | Most common value; the data value with the maximum frequency |
| Mathematically intensive | Easy to compute practically | Easy to compute |
| May not be represented by any individual data value | Usually represented by an individual data value | Represented by an individual data value |
| A histogram balances when supported at the mean | Median divides the area of the histogram in half | Mode is the value at the highest point of the histogram |
| Not used when the data contains a few extreme values widely different from the majority, or when terminal classes are open | Is not affected by extreme values | Is not affected by extreme values |
| | Preferred when the order of values is considered important | Preferred when the most commonly occurring value appropriately represents the group |
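The sensitivity of the mean to extreme values can be illustrated with a small sketch. The numbers below are invented; the point is only to compare how each measure moves when one extreme value is added.

```python
import statistics

# Hypothetical data set, then the same data with one extreme value added.
data = [30, 32, 32, 35, 36]
with_extreme = data + [400]

def summary(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

before = summary(data)
after = summary(with_extreme)

print("mean, median, mode before:", before)
print("mean, median, mode after: ", after)
```

In this example the mean shifts by a large amount, the median moves only slightly, and the mode does not change.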
Why Standard Deviation is important
• Range tells us only about two points, the maximum and the minimum
• Standard deviation tells us about the typical distance of all
points from the average or mean
• It is a measure used to quantify the amount of
variation or dispersion of a set of data values about the mean
• It is the RMS (root-mean-square) deviation from the mean: the RMS not of the original
values but of their deviations from the mean
• Variation is important to know

Probability Theory
Random Experiment
An experiment that can result in different
outcomes, even though it is repeated in the
same manner every time, is called a random
experiment.
In other words, it is an experiment (or a “trial”) whose results
cannot be predicted beforehand, i.e. they are random.
Sample Space
The set of all possible outcomes of a random
experiment is called the sample space of the
experiment.
The sample space is denoted by S.
Sample Space
Each element of the sample space is called a
sample point.
In other words, each outcome of the random
experiment is also called a sample point.
Sample Space
A sample space can be:
• Discrete
e.g.
S = {1,2,3,4,5,6}
S = {low, medium, high}
S = {y, n}
• Continuous
e.g.
S = R+ = {x | x > 0}
S = {x | 10 < x < 11}
Event
An event is a subset of the sample space of a
random experiment.
An event either happens or fails to happen as a
result of an experiment.
Event
If you throw a die (an experiment!), which of the two
events is more probable:
A – appearance of six dots
B – appearance of an even number of dots
Event
It is quite obvious that not all events are equally
probable.
Some are more probable and some are less probable.
Event
Probability theory makes it possible to:
a) Determine the degree of likelihood (probability) of
various events
b) Compare them according to their probabilities
c) Predict the outcomes of random phenomena on
the basis of probabilistic estimates
Simple (or Elementary) Event
If an event E has only one sample point in a
sample space, it is called a simple (or elementary)
event.
If a discrete sample space contains n distinct
elements, then there are exactly n simple
events.
In a continuous sample space there are infinitely many
simple events.
A vehicle has a gearbox with 7 gear positions:
1,2,3,4,5,R,N
A series of experiments is conducted in which the
gear position is observed every 5 minutes during a
drive of the vehicle from Point A to Point B.
Sample Space S = {1,2,3,4,5,R,N}
Each of the observed gear positions is a simple
event.
During the same drive a separate series of
experiments is conducted in which the speed of the
vehicle is observed every 5 minutes. If the highest
possible speed is 150 kph,
Sample Space S = {x | 0 <= x <= 150}
Each of these observed speeds is a simple
event.
Compound Event
If an event has more than one sample
point of a sample space, it is called a
compound event.
Compound Event
Example:
A production line is manufacturing insulated wire ropes of 10 m length. A
rope is accepted if the measured insulation thickness is within the
specification 5 ± 0.5 mm; otherwise it is considered defective.
Experiment: A sample of 5 ropes is randomly selected from each production
run of 500 ropes, and the insulation thickness of each is measured and classified as defective
or non-defective.
The following events are compound events:
a) E = Exactly one of the five ropes is found to be defective
b) F = At least four ropes are found to be non-defective
Writing Y for an accepted (non-defective) rope and N for a defective one, the subsets associated with these events are:
a) E = {NYYYY, YNYYY, YYNYY, YYYNY, YYYYN}
b) F = {NYYYY, YNYYY, YYNYY, YYYNY, YYYYN, YYYYY}
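The two subsets in the rope example can be reproduced by enumerating the sample space. This is only a sketch; it assumes that `Y` marks an accepted (non-defective) rope and `N` a defective one, which is the reading under which the listed subsets match the event descriptions.

```python
from itertools import product

# Assumption: 'Y' = rope accepted (non-defective), 'N' = rope defective.
# Each sample point is a string of five letters, one per inspected rope.
sample_space = ["".join(p) for p in product("YN", repeat=5)]

# E: exactly one of the five ropes is defective.
E = {s for s in sample_space if s.count("N") == 1}

# F: at least four ropes are non-defective.
F = {s for s in sample_space if s.count("Y") >= 4}

print(len(sample_space))  # number of sample points
print(sorted(E))
print(sorted(F))
```

Both E and F contain more than one sample point, which is what makes them compound events.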
Classical Probability
Let us denote the probability of a random event A as
P(A):
$P(A) = \dfrac{m_a}{n}$
where n is the total number of outcomes and m_a is
the number of outcomes favorable to event A.
This is also known as the classical formula. It is
applicable to symmetric experiments, i.e. experiments whose
possible outcomes are symmetric (equally likely).
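As an illustrative sketch, the classical formula can be applied to the die question raised earlier (event A: six dots, event B: an even number of dots). The helper name `classical_probability` is made up for this example.

```python
from fractions import Fraction

# Sample space of a fair six-sided die (a symmetric experiment).
sample_space = {1, 2, 3, 4, 5, 6}

def classical_probability(event, space):
    """P(A) = m_a / n: favorable outcomes over total outcomes."""
    favorable = len(event & space)
    return Fraction(favorable, len(space))

A = {6}        # appearance of six dots
B = {2, 4, 6}  # appearance of an even number of dots

p_a = classical_probability(A, sample_space)
p_b = classical_probability(B, sample_space)
print(p_a, p_b)
```

Using `Fraction` keeps the results exact, so the comparison between the two events does not depend on floating-point rounding.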
Probability Range
Probability always lies in the interval from 0 to 1:
0 <= P(A) <= 1
For a sure event P(A) = 1; for an impossible event P(A) = 0.
For a practically sure event P(A) ≈ 1, and for a practically
impossible event P(A) ≈ 0.
Statistical Probability
• Most experiments in real life are not symmetrical.
• Hence the classical formula cannot be applied to such
experiments.
• In such cases we find the probability by experimentally
determining the frequency of the event.
• The frequency of an event in a series of N repetitions is the
ratio of the number of repetitions in which the event took
place to the total number of repetitions:
$P(A) = \dfrac{M_A}{N}$
• where N is the total number of repetitions of the experiment and
M_A is the number of repetitions in which event A occurs.
Frequency and Probability
• A more probable event occurs more frequently than events
with low probability.
• If the number of repetitions is small, the frequency of the
event is to a considerable extent a random quantity.
• As we conduct an experiment with a great number of repetitions,
the frequency of the event becomes less and less
random and stabilizes.
• If the number of independent trials is sufficiently large, we
say that the frequency has approached the probability of the
event.
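The stabilizing behaviour of the frequency can be seen in a small simulation. This sketch assumes a fair die simulated with Python's pseudo-random generator and tracks the event "even number of dots"; the seed and the trial counts are arbitrary choices.

```python
import random

def relative_frequency(n_trials, rng):
    """Frequency M_A / N of the event 'even number of dots' in n_trials throws."""
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) % 2 == 0)
    return hits / n_trials

rng = random.Random(42)  # fixed seed so that the run is repeatable
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n, rng))
```

For small N the printed frequency can be far from 1/2; as N grows it tends to settle close to it.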
Basic rules of probability theory
Probability Summation Rule:
• The probability that one of two (or
several) mutually exclusive (disjoint) events
occurs is equal to the sum of the probabilities of
these events:
P(A or B) = P(A) + P(B)
Probability Multiplicative Rule:
• The probability of the combination of two events
(occurring sequentially or simultaneously) is equal to the
probability of one of them multiplied by the
probability of the other, given that the first one
has occurred:
P(A and B) = P(A) · P(B/A)
• where P(B/A) is called the conditional probability of
event B, calculated under the condition that event A has
occurred.
• For independent events,
P(B/A) = P(B)
• Two events A and B are called independent if the fact
that one event has occurred does not affect the
probability that the other event will occur.
• In such cases:
P(A and B) = P(A) · P(B)
The General Addition Rule:
P(A or B) = P(A) + P(B) - P(A and B)
Example:
Event A = even number of dots in a throw of a die
Event B = number of dots > 3
Here P(A) = 1/2, P(B) = 1/2 and P(B/A) = 2/3, so P(A and B) = 1/2 * 2/3 = 1/3 and
P(A or B) = 1/2 + 1/2 – (1/2 * 2/3) = 2/3
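The die example for the general addition rule can be checked by counting outcomes directly. The sketch below assumes a fair die and uses exact fractions; the helper `P` is defined only for this illustration.

```python
from fractions import Fraction

space = set(range(1, 7))                  # fair die
A = {x for x in space if x % 2 == 0}      # even number of dots
B = {x for x in space if x > 3}           # number of dots > 3

def P(event):
    """Classical probability of an event within the die sample space."""
    return Fraction(len(event), len(space))

# Multiplication rule: P(A and B) = P(A) * P(B/A)
p_b_given_a = Fraction(len(A & B), len(A))
p_a_and_b = P(A) * p_b_given_a

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = P(A) + P(B) - p_a_and_b

print(p_b_given_a, p_a_and_b, p_a_or_b)
```

The result agrees with counting the union {2, 4, 5, 6} directly, which has 4 of the 6 outcomes.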
• The idea of disjoint events is about whether or not it is
possible for the events to occur at the same time.
• The idea of independent events is about whether or not
the events affect each other, in the sense that the
occurrence of one event affects the probability of the
occurrence of the other.
• If the events are disjoint (and each has non-zero probability), then they cannot be independent.
A and B disjoint implies that if event A occurs then B does not,
and vice versa. Knowing that event A has occurred
dramatically changes the likelihood that event B occurs: that
likelihood becomes 0. This implies that A and B are not independent.
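The difference between disjoint and independent events can be made concrete with a fair die. In this sketch the events and the helper names (`P`, `is_independent`, `is_disjoint`) are chosen only for illustration.

```python
from fractions import Fraction

space = set(range(1, 7))  # fair die

def P(event):
    return Fraction(len(event), len(space))

def is_independent(a, b):
    """Independence check: P(A and B) == P(A) * P(B)."""
    return P(a & b) == P(a) * P(b)

def is_disjoint(a, b):
    """Disjoint events share no outcome."""
    return len(a & b) == 0

even = {2, 4, 6}
odd = {1, 3, 5}
at_most_two = {1, 2}

# even / at_most_two: they overlap in {2} and satisfy the product rule.
print(is_disjoint(even, at_most_two), is_independent(even, at_most_two))

# even / odd: disjoint, so P(even and odd) = 0, but P(even) * P(odd) = 1/4.
print(is_disjoint(even, odd), is_independent(even, odd))
```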
Probability Distribution Function
• Consider a discrete random variable X, which can take the
values x1, x2, …, xn.
• Not all these values are equally likely. Some are more
probable and some are less probable.
• We call a distribution function of the random variable any
function that describes the distribution of the probabilities
among the values of the variable.
• The distribution of a discrete random variable X can be
represented as follows:
xi | x1 | x2 | … | xn
pi | p1 | p2 | … | pn
[Figure: bar chart of P(x) against X]
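A table of this kind can be built by enumeration. As a sketch, take X to be the total number of dots when two fair dice are thrown (an example chosen here, not taken from the slides) and tabulate each value x_i with its probability p_i.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each total, then convert counts to probabilities.
counts = Counter(a + b for a, b in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)

# The probabilities of all possible values must add up to 1.
total = sum(distribution.values())
```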
Probability Density Function (PDF)
• For a continuous random variable we have a probability density.
• The density of a substance is its mass per unit volume. For a non-homogeneous
substance we talk of local density.
• In probability theory we also have a local density (the probability at point x per
unit length).
• The probability density function is a function associated with a continuous
random variable X, which gives the probability density f(x) at point x.
• f(x) >= 0 everywhere, and the total area under the curve = 1.
[Figure: PDF curve, f(x) against X]
Cumulative Distribution Function (CDF)
• The area under the PDF corresponds to probabilities for the random
variable X.
• The probability that X lies within the interval (a,b) equals the area
bounded by the x-axis, the PDF curve, X=a and X=b.
• The probability at any single point a is equal to zero (the area is zero at a point).
[Figure: PDF f(x) and CDF F(x) = P(X <= x) plotted for x from -4 to 4, with F(x) rising from 0.0 to 1.0]
• The cumulative distribution function of X returns the probability that
the random variable is less than or equal to the value x:
F(x) = P(X <= x)
• This function is applicable to both discrete and continuous X.
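A short sketch of F(x) = P(X <= x) for both cases follows. The discrete example is a fair die; the continuous example assumes a standard normal variable and uses `statistics.NormalDist` from the Python standard library. The helper names are illustrative.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: fair die, F(x) is the sum of the probabilities of all values <= x.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def discrete_cdf(pmf, x):
    return sum(p for value, p in pmf.items() if value <= x)

# Continuous case: assumed standard normal variable.
Z = NormalDist(mu=0, sigma=1)

def prob_between(dist, a, b):
    """P(a < X < b) as the area under the PDF, i.e. F(b) - F(a)."""
    return dist.cdf(b) - dist.cdf(a)

print(discrete_cdf(die_pmf, 3))   # P(X <= 3) for the die
print(Z.cdf(0))                   # half of the area lies to the left of the mean
print(prob_between(Z, -1, 1))     # area between -1 and 1
print(prob_between(Z, 1, 1))      # a single point carries zero probability
```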
Probability Distributions
We need to quantify/verify the conclusions drawn from a descriptive
statistics investigation and remove the subjectivity from
them.
Observations vary from each other, but they form a pattern that, if
stable, can be described as a distribution.
Distributions can differ in location, spread and shape.
A probability distribution is a mathematical model that relates the
value of the variable to the probability of occurrence of that
value in the population.
Inferring something about the population based on what is
measured in the sample is called statistical inference.
Normal Distribution
Description
• The normal distribution (also called the Gaussian distribution) is
the most commonly used distribution in statistics. Two
parameters, μ (mu) and σ (sigma), are required to specify the
distribution.
The distribution
$p(x; \mu, \sigma) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Notes
• The normal distribution closely matches the distribution of many
real-life random processes, e.g. measuring processes.
• The normal distribution is called ‘normal’ as a way of suggesting that it
depicts a common, natural pattern.
• Much of what is done in Data Analytics and Six Sigma is based
on the normal distribution.
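The density formula can be written out directly and compared with the standard library's `statistics.NormalDist`. This is a sketch only: the parameter values (mu = 10, sigma = 2) are arbitrary, and the area under the curve is approximated with a simple midpoint sum over ±8 sigma.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def area_under_pdf(mu, sigma, steps=20_000):
    """Midpoint-rule approximation of the total area between mu - 8 sigma and mu + 8 sigma."""
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    width = (hi - lo) / steps
    return sum(normal_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

mu, sigma = 10.0, 2.0  # arbitrary illustrative parameters
print(normal_pdf(11.5, mu, sigma), NormalDist(mu, sigma).pdf(11.5))
print(area_under_pdf(mu, sigma))  # should be very close to 1
```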
Characterising the Normal Distribution
Uni-modal and symmetric about the centre, the centre being the mean, median
and mode. Variability is estimated through the standard deviation.
[Figure: normal curve with the mean or average μ marking the centre/location and the standard deviation σ marking the variation]
• In the special case of a distribution having the normal
shape, the Standard Deviation Rule applies.
• This rule tells us approximately what percent of the
observations fall within 1, 2 or 3 standard deviations
of the mean.
• In particular, when a distribution is approximately
normal, almost all the observations (99.7%) fall within 3
standard deviations of the mean.
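[Figure: normal curve with the areas within ±1σ (68.27%), ±2σ (95.45%) and ±3σ (99.73%) of the mean marked]
Whether you like it or not, about 99.73% of normally distributed observations
will lie within the average ± 3 standard deviations.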
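The three percentages of the Standard Deviation Rule can be recovered from the standard normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of observations within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} s.d.: {share * 100:.2f}%")
```

Because the rule depends only on the distance from the mean in units of the standard deviation, the same percentages hold for any normal distribution, whatever its mean and standard deviation.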
Z - standard normal variate
A “Standard” Normal Distribution
mean = 0
st. dev. = 1
[Figure: standard normal curve on a Z scale from -3 to 3; a Z-value can lie anywhere on this scale]
• Z-value
– How many standard deviations
the value of interest is away
from the mean:
$Z = \dfrac{(\text{value of interest}) - \bar{X}}{S}$
Use of Standard Normal distribution
• If a list of numbers follows the normal curve, the percentage of
entries falling in a given interval can be estimated as follows:
– First convert the interval to standard units
– Then find the corresponding area under the normal curve
• This procedure is called the normal approximation.
• A histogram that follows the normal curve can be reconstructed
fairly well from the average and SD.
• In such cases the average and SD are good summary statistics.
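The two steps of the normal approximation can be sketched as follows. The average (170), the SD (10) and the interval (160 to 185) are hypothetical values chosen only to illustrate the procedure.

```python
from statistics import NormalDist

def z_value(value, mean, sd):
    """Number of standard deviations the value lies away from the mean."""
    return (value - mean) / sd

# Hypothetical summary statistics and interval of interest.
mean, sd = 170.0, 10.0
low, high = 160.0, 185.0

# Step 1: convert the interval to standard units.
z_low = z_value(low, mean, sd)
z_high = z_value(high, mean, sd)

# Step 2: find the corresponding area under the standard normal curve.
Z = NormalDist()
share = Z.cdf(z_high) - Z.cdf(z_low)

print(z_low, z_high, share)
```

The same area is obtained by working directly with a normal distribution that has the given average and SD, which is why the conversion to standard units loses no information.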
Percentiles
• The average and SD can be used to summarize data following the normal curve.
• They are less satisfactory for other kinds of data, e.g. skewed data. To summarize
such data we use percentiles.
• A percentile (or a centile) is a measure used in statistics indicating the value
below which a given percentage of observations in a group of observations fall.
• For example, the 20th percentile is the value (or score) below which 20% of the
observations may be found.
• "Percentile" can also refer to each of the 100 equal groups into which a population can be
divided according to the distribution of values of a particular variable.
– The 50th percentile is the median
– The interquartile range equals: 75th percentile – 25th percentile
• A percentile is only used as a comparison score.
• All histograms, whether or not they follow the normal curve, can be
summarized using percentiles.
Quantile of a Distribution
• The a-th quantile of a cumulative distribution function F is the
point x_a such that F(x_a) = a.
• A percentile is simply a quantile with a expressed as a percent
instead of a proportion.
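Percentiles and quantiles can be sketched with the Python standard library. The skewed sample below is invented for illustration, and the `"inclusive"` interpolation method is one of several conventions for computing percentiles from data, so other tools may give slightly different cut points.

```python
import statistics
from statistics import NormalDist

# Hypothetical right-skewed sample.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 35, 60]

# 99 cut points dividing the data into 100 groups; cuts[k - 1] is the k-th percentile.
cuts = statistics.quantiles(data, n=100, method="inclusive")
p25, p50, p75 = cuts[24], cuts[49], cuts[74]
iqr = p75 - p25  # interquartile range = 75th percentile - 25th percentile

print(p25, p50, p75, iqr)

# Quantile of a distribution: the point x_a with F(x_a) = a.
Z = NormalDist()
a = 0.9
x_a = Z.inv_cdf(a)
print(x_a, Z.cdf(x_a))
```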
Thank You
Abhinav Srivastava