
Probability Theory & Statistics

What is Statistics?

• Statistics deals with the collection, analysis, interpretation, presentation and organization of data.
• Statistics allows companies to collect data and translate it into information, so that decisions can be taken based on facts rather than intuition, gut feel or past experience
• Statistics is like a powerful microscope that makes visible what was previously invisible
• Statistics is a tool that separates common sense reasoning from extraordinary reasoning
• Statistics creates a foundation for quality, which translates to profitability and market share
Concept of Data
• Data is a collection of facts
• It may be values or measurements
• It may be words or descriptions of things
• Data in its most basic form is raw in nature

Data is characterized by:
• Types of Data
  – Quantitative
  – Qualitative
• Unit of Measure

[Figure: Data, Information, Knowledge]
Types of Data - Quantitative
Quantitative Data:
• A number attached from birth
• It takes on numerical values, e.g. 5.364…

Continuous/Variable
The characteristics which may assume any value within their range of variation,
e.g. Height, Weight, Diameter etc.

Discrete/Attribute
The characteristics which assume only isolated values in their range of
variation, e.g. No. of complaints per month, Percentage absenteeism, No. of
injuries etc.
Types of Data - Qualitative
Qualitative Data:
• No number attached from birth
• It takes on categorical values, i.e. values in one of K different classes or categories

Example: Gender (Male/Female), Brand of Product Purchased (Brand A, B, C),
Person Defaults on Debt (Yes, No)

• Ordinal: Ordered data, e.g. Rating, Ranking, Percentile
• Nominal: List of identities without order, e.g. Gender
• Binary: Two-class categorical data of the type present/absent, with
states 1 and 0 respectively. It is also referred to as Boolean when the two
states correspond to TRUE and FALSE.
Interval and Ratio Measurement
• In Nominal and Ordinal measurement, the distance between attributes does not have any
meaning.
• However, in interval measurement, the distance between attributes has a
meaning. E.g. if we are measuring temperature on the Fahrenheit scale, then the
distance between 30 and 40 is the same as the distance between 70 and 80.
• In such a situation, it makes sense to compute an average on the interval scale.
However, it does not make sense to compute ratios on this scale, as 80 degrees
is not twice as hot as 40 degrees.
• In Ratio measurement, there is always an absolute zero that is meaningful. This
means that one can construct a meaningful fraction (or ratio) with a ratio
variable.
• Weight and Height are ratio variables and so are most count variables. E.g. the no. of
defects this week is twice that of last week.
Descriptive Statistics
Average or Mean - Sum of all Data divided by number of data points

Median - Middle Data Value when data is ranked from min. to max.

Mode - Data which has maximum frequency.

Maximum - Largest of Data Points.

Minimum - Smallest of Data Points

Range - Difference between Maximum & Minimum


Standard Deviation - $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
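
The measures defined above can be computed with Python's standard `statistics` module. This is a minimal sketch: the sample values are hypothetical and chosen only for illustration, and `statistics.stdev` is used because it divides by n - 1, as the standard deviation formula on this slide does.

```python
import statistics

# Hypothetical sample (illustrative values, not taken from the slides)
data = [4, 7, 7, 2, 9, 5, 7, 3]

mean = statistics.mean(data)        # sum of all data / number of data points
median = statistics.median(data)    # middle value once the data is ranked
mode = statistics.mode(data)        # value with the highest frequency
maximum, minimum = max(data), min(data)
data_range = maximum - minimum      # difference between maximum and minimum
std_dev = statistics.stdev(data)    # sample standard deviation (n - 1 denominator)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"max={maximum}, min={minimum}, range={data_range}")
print(f"standard deviation={std_dev:.3f}")
```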
Central Tendency – Mean, Median, Mode
Mean | Median | Mode
Sum of all data values divided by number of data points | Middle data value when data is ranked from min to max | Most common value, i.e. the data value which has max frequency
Mathematically intensive | Easy to compute practically | Easy to compute
May not be represented by any individual data value | Usually represented by an individual data value | Represented by an individual data value
A histogram balances when supported at the mean | Median divides the area of the histogram in half | Mode is the value at the highest point of the histogram
Not used when the data contains a few extreme values widely different from the majority, or when terminal classes are open | Does not get affected by extreme values; preferred when the order of values is considered important | Does not get affected by extreme values; preferred when the most commonly occurring value appropriately represents the group
Why Standard Deviation is important

• Range tells us only about two points, Max and Min
• Standard Deviation tells us about the relative distance of all points from the average or mean.
• It is a measure that is used to quantify the amount of variation or dispersion of a set of data values about the mean
• It is the RMS (root-mean-square) deviation from the mean: the RMS not of the original values but of their deviations from the mean
• Variation is important to know
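
A small sketch of why the range alone is not enough: the two hypothetical data sets below share the same minimum, maximum and mean, yet their standard deviations differ. The helper `sample_sd` is an illustrative name and uses the n - 1 denominator, so it is a root-mean-square deviation only up to that sample correction.

```python
import math
import statistics

def sample_sd(values):
    """Square root of the summed squared deviations from the mean over n - 1."""
    m = sum(values) / len(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))

# Hypothetical data: same range (20) and same mean (20), different spread
tight = [10, 20, 20, 20, 20, 30]
spread = [10, 10, 10, 30, 30, 30]

for name, values in (("tight", tight), ("spread", spread)):
    print(name, "range =", max(values) - min(values),
          "sd =", round(sample_sd(values), 3))
```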


Probability Theory

Random Experiment

An experiment that can result in different outcomes, even though it is repeated in the same manner every time, is called a random experiment.

In other words, it is an experiment (or a “trial”) whose results cannot be predicted beforehand, i.e. they are random.

Sample Space

The set of all possible outcomes of a random experiment is called the sample space of the experiment.

The sample space is denoted by S.

Each element of the sample space is called a sample point. In other words, each outcome of the random experiment is also called a sample point.

A sample space can be:

• Discrete, e.g.
S = {1,2,3,4,5,6}
S = {low, medium, high}
S = {y,n}

• Continuous, e.g.
S = R+ = {x|x>0}
S = {x|10<x<11}

Event

An event is a subset of the sample space of a random experiment.

An event either happens or fails to happen as a result of an experiment.

If you throw a die (an experiment!), which of the two events is more probable:
A - appearance of six dots
B - appearance of an even number of dots

It is quite obvious that not all events are equally probable. Some are more probable and some are less probable.

Probability theory makes it possible to:
a) Determine the degree of likelihood (probability) of various events
b) Compare events according to their probabilities
c) Predict the outcomes of random phenomena on the basis of probabilistic estimates

Simple (or Elementary) Event

If an event E has only one sample point of a sample space, it is called a simple (or elementary) event.

If a discrete sample space contains n distinct elements, then there are exactly n simple events.

In a continuous sample space there are infinitely many simple events.

Example: A vehicle has a gear box with 7 gear positions: 1,2,3,4,5,R,N

A series of experiments is conducted where the position of the gear is observed every 5 mins during a drive of the vehicle from Point A to Point B.

Sample Space S = {1,2,3,4,5,R,N}

Each of the observed gear positions is a simple event.

During the same drive a separate series of experiments is conducted where the speed of the vehicle is observed every 5 mins. If the highest speed possible is 150 kph,

Sample Space S = {x|0<=x<=150}

Each of these observed speeds is a simple event.

Compound Event

If an event contains more than one sample point of a sample space, it is called a compound event.

Example:

A production line is manufacturing insulated wire ropes of 10 m length. A rope is considered accepted if the measured insulation thickness is within the specification 5 ± 0.5 mm, otherwise it is considered defective.

Experiment: A sample of 5 ropes is randomly selected from each production run of 500 ropes, and the insulation thickness of each is measured and classified as defective or non-defective.

The following events are compound events:

a) E = Exactly one of the five ropes is found to be defective
b) F = At least four ropes are found to be non-defective

The subsets associated with these events are (Y = non-defective, N = defective):

a) E = {NYYYY, YNYYY, YYNYY, YYYNY, YYYYN}
b) F = {NYYYY, YNYYY, YYNYY, YYYNY, YYYYN, YYYYY}

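
The two subsets can be rebuilt by enumerating the whole sample space. This sketch assumes the reading of the letters suggested by the listed subsets (Y = non-defective, N = defective), so each outcome is a 5-letter string and the sample space has 2^5 = 32 sample points.

```python
from itertools import product

# Assumption: Y = non-defective rope, N = defective rope
sample_space = ["".join(ropes) for ropes in product("YN", repeat=5)]

# E: exactly one of the five ropes is defective
E = {outcome for outcome in sample_space if outcome.count("N") == 1}

# F: at least four ropes are non-defective
F = {outcome for outcome in sample_space if outcome.count("Y") >= 4}

print(len(sample_space), "sample points")
print("E =", sorted(E))
print("F =", sorted(F))
```

Both E and F contain more than one sample point, which is what makes them compound events; E is also a subset of F.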
Classical Probability

Let us denote the probability of a random event A as P(A):

$P(A) = \dfrac{m_A}{n}$

where $n$ is the total number of outcomes and $m_A$ is the number of outcomes favorable to event A.

This is also known as the Classical Formula. It is applicable to symmetric experiments, i.e. experiments whose possible outcomes are symmetric (equally likely).

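
As a sketch of the classical formula, the die question asked earlier (six dots versus an even number of dots) can be answered by counting favorable outcomes. The function name `classical_probability` is illustrative, and the die is assumed to be fair so that all six outcomes are symmetric.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = m_A / n for an experiment with equally likely outcomes."""
    favourable = len(event & sample_space)   # m_A
    return Fraction(favourable, len(sample_space))  # divided by n

die = {1, 2, 3, 4, 5, 6}
A = {6}          # appearance of six dots
B = {2, 4, 6}    # appearance of an even number of dots

p_a = classical_probability(A, die)
p_b = classical_probability(B, die)
print("P(A) =", p_a, " P(B) =", p_b)
```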
Probability Range

Probability always lies in the interval between 0 and 1:

0<=P(A)<=1

For a sure event P(A) = 1, for an impossible event P(A) = 0

For a practically sure event P(A) ≈ 1 and for a practically impossible event P(A) ≈ 0

Statistical Probability

• Most experiments in real life are not symmetrical.
• Hence the classical formula cannot be applied to such experiments
• We find probability in such cases by experimental determination of the frequency of the event
• The frequency of an event in a series of N repetitions is the ratio of the number of repetitions in which the event took place to the total number of repetitions:

$P(A) \approx \dfrac{M_A}{N}$

• where $N$ is the total number of repetitions of the experiment and $M_A$ is the number of repetitions in which event A occurs

Frequency and Probability

• More probable events occur more frequently than events with low probability
• If the number of repetitions is small, the frequency of an event is to a considerable extent a random quantity
• As we conduct an experiment with a great number of repetitions, the frequency of the event becomes less and less random and it stabilizes
• If the number of independent trials is sufficiently large, we say that the frequency has approached the probability of the event

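
The stabilizing of frequency can be seen in a small simulation. This is only a sketch: it simulates a fair die with Python's pseudo-random generator (an assumption, chosen so the true probability of an even number, 1/2, is known), and the seed and trial counts are arbitrary.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def relative_frequency(n_trials):
    """Frequency M_A / N of the event 'even number of dots' in N throws."""
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) % 2 == 0)
    return hits / n_trials

# Small N gives an erratic frequency; large N settles close to 0.5
for n in (10, 100, 10_000, 100_000):
    print(f"N={n:>7}: frequency={relative_frequency(n):.4f}")
```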
Basic rules of probability theory

Probability Summation Rule:

• The probability that one of two (or several) mutually exclusive (disjoint) events occurs is equal to the sum of the probabilities of these events:
P(A or B) = P(A) + P(B)

Probability Multiplicative Rule:

• The probability of the combination of two events (sequentially or simultaneously) is equal to the probability of one of them multiplied by the probability of the other, provided that the first one has occurred:

P(A and B) = P(A). P(B/A)

• where P(B/A) is called the conditional probability of event B, calculated for the condition that event A has occurred

• For independent events,

P(B/A) = P(B)

• Two events A and B are called independent if the fact that one event has occurred does not affect the probability that the other event will occur

• In such cases:

P(A and B) = P(A). P(B)

The General Addition Rule:

P(A or B) = P(A) + P(B) - P(A and B)

Example:
Event A = even no. of dots in a throw of a die
Event B = no. of dots > 3

P(A and B) = P(A). P(B/A) = 1/2 * 2/3 = 1/3
P(A or B) = 1/2 + 1/2 - 1/3 = 2/3

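
The die example can be checked by counting outcomes directly. In this sketch the helper `prob` is an illustrative name and assumes a fair die; the result from the general addition rule is compared with the probability of the union computed by counting.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability of an event on a fair die."""
    return Fraction(len(event), len(die))

A = {2, 4, 6}   # even number of dots
B = {4, 5, 6}   # number of dots > 3

p_b_given_a = Fraction(len(A & B), len(A))   # P(B/A): B within the outcomes of A
p_a_and_b = prob(A) * p_b_given_a            # multiplicative rule
p_a_or_b_rule = prob(A) + prob(B) - p_a_and_b
p_a_or_b_direct = prob(A | B)                # count the union directly

print(p_b_given_a, p_a_and_b, p_a_or_b_rule, p_a_or_b_direct)
```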
• The idea of disjoint events is about whether or not it is possible for the events to occur at the same time

• The idea of independent events is about whether or not the events affect each other, in the sense that the occurrence of one event affects the probability of the occurrence of the other

• If two events with non-zero probabilities are disjoint, then they cannot be independent. A and B disjoint implies that if event A occurs then B does not, and vice versa. Knowing that event A has occurred dramatically changes the likelihood that event B occurs: that likelihood becomes 0. This implies that A and B are not independent.

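
A short sketch of the difference between disjoint and independent events, using a fair die. The events `low`, `high` and `even` and the helper names are chosen here for illustration; independence is tested with the product rule P(A and B) = P(A). P(B).

```python
from fractions import Fraction

die = set(range(1, 7))

def prob(event):
    return Fraction(len(event), len(die))

def independent(e1, e2):
    """True when P(e1 and e2) equals P(e1) * P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

low, high, even = {1, 2}, {5, 6}, {2, 4, 6}

# low and high are disjoint: they can never occur together
print("disjoint:", not (low & high), "independent:", independent(low, high))
# even and low overlap in {2} and satisfy the product rule
print("disjoint:", not (even & low), "independent:", independent(even, low))
```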
Probability Distribution Function
• Consider a discrete random variable X, which can take values x1, x2, …., xn
• Not all these values are equally likely. Some are more probable and some are less probable
• We call any function that describes the distribution of the probabilities among the values of the variable a distribution function of the random variable
• The distribution of a discrete random variable X can be represented as follows:

xi | x1 | x2 | …… | xn
pi | p1 | p2 | …… | pn

[Figure: plot of P(x) against X]

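
As an illustration of such a table, the sketch below builds the distribution of a hypothetical random variable X, the number of heads in two tosses of a fair coin (an example chosen here, not taken from the slides), and checks that the probabilities p1, ..., pn sum to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two coin tosses: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# X = number of heads; count how many outcomes give each value of X
counts = Counter(outcome.count("H") for outcome in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(f"x = {x}: p = {p}")
print("total probability =", sum(distribution.values()))
```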
Probability Density Function (PDF)
• For a continuous random variable we have a probability density
• The density of a substance is mass per unit volume. For a non-homogeneous substance we talk of local density
• In probability theory, we also have a local density (probability at point x per unit length)
• The probability density function is a function associated with a continuous random variable X, which gives the probability density f(x) at point x
• f(x) >= 0 everywhere and the total area under the curve = 1

[Figure: PDF curve, f(x) against X]

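
To make the two properties concrete, here is a sketch with a simple hypothetical density, f(x) = 2x on the interval [0, 1] and 0 elsewhere (chosen only because its area is easy to verify). The area is approximated with the midpoint rule; `area` is an illustrative helper, not a library function.

```python
def f(x):
    """Hypothetical probability density: 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

print("total area       :", round(area(0, 1), 6))
print("P(0.5 < X < 1)   :", round(area(0.5, 1), 6))
print("area at a point  :", area(0.3, 0.3))
```

The last line illustrates that the probability attached to a single point of a continuous variable is zero, since an interval of zero width encloses no area.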
Cumulative Distribution Function (CDF)
• Area under the PDF corresponds to probabilities for the random variable X
• The probability that X lies within the interval (a,b) equals the area bounded by the x-axis, the pdf curve, X=a and X=b
• The probability at any single point a is equal to zero (the area is zero at a point)

[Figure: PDF f(x) and CDF F(x) plotted against x from -4 to 4, with F(x) rising from 0.0 to 1.0]

• The cumulative distribution function of X returns the probability that the random variable is less than or equal to the value x:
F(x) = P(X <= x)
• This function is applicable to both discrete and continuous X

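
The sketch below evaluates F(x) = P(X <= x) for one discrete and one continuous case. The fair die and the standard normal variable are examples assumed here; `statistics.NormalDist` is part of the Python standard library (3.8+), and `discrete_cdf` is an illustrative helper.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: a fair die, F(x) is the sum of the probabilities up to x
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def discrete_cdf(x):
    return sum(p for value, p in pmf.items() if value <= x)

# Continuous case: standard normal, F(x) is the area under the PDF up to x
z = NormalDist()                    # mean 0, standard deviation 1
p_interval = z.cdf(1) - z.cdf(-1)   # P(-1 < X < 1) = F(1) - F(-1)

print("die:    F(3) =", discrete_cdf(3))
print("normal: F(0) =", z.cdf(0))
print("normal: P(-1 < X < 1) =", round(p_interval, 4))
```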
Probability Distributions
We need to quantify and verify the conclusions drawn from descriptive statistics, and remove the subjectivity from them.

Observations vary from each other, but they form a pattern that, if stable, can be described as a distribution.

Distributions can differ in Location, Spread and Shape.

A probability distribution is a mathematical model that relates the value of the variable to the probability of occurrence of that value in the population.

Inferring something about the population based on what is measured in the sample is called Statistical Inference.

Normal Distribution
Description
• The normal distribution (also called the Gaussian distribution) is the most commonly used distribution in statistics. Two parameters, $\mu$ (mu) and $\sigma$ (sigma), are required to specify the distribution.

The distribution

$p(x; \mu, \sigma) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Notes
• The normal distribution closely matches the distribution of many real-life random processes, e.g. measuring processes
• The normal distribution is called ‘normal’ as a way of suggesting its depiction of a common, natural pattern
• Much of what is done in Data Analytics and Six Sigma is based on the normal distribution
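
The density formula can be written out directly and compared with the standard library. This is a sketch: `normal_pdf` is an illustrative name, and the parameter values in the usage lines are arbitrary.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: 1 / sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))."""
    coefficient = 1 / math.sqrt(2 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Arbitrary illustrative parameters
mu, sigma, x = 2.0, 0.7, 1.3
print("formula :", normal_pdf(x, mu, sigma))
print("library :", NormalDist(mu, sigma).pdf(x))
```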
Characterising the Normal Distribution
Uni-modal and symmetric about the centre, the centre being the mean, median and mode. Variability is measured by the standard deviation.

[Figure: normal curve annotated with Center / Location = Mean or Average ($\mu$) and Variation = Standard deviation ($\sigma$)]

• In the special case of a distribution having the normal shape, the Standard Deviation Rule applies.
• This rule tells us approximately what percent of the observations fall within 1, 2, or 3 standard deviations of the mean.
• In particular, when a distribution is approximately normal, almost all the observations (99.7%) fall within 3 standard deviations of the mean.

Within ±1σ of the mean: 68.27% of observations
Within ±2σ of the mean: 95.45% of observations
Within ±3σ of the mean: 99.73% of observations

Whether you like it or not, 99.73% of the observations of a normal distribution will lie within Average ± 3 s.d.
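
These percentages follow from the area under the normal curve. As a sketch, the share of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)), which `math.erf` evaluates; `within_k_sd` is an illustrative name.

```python
import math

def within_k_sd(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} s.d.: {100 * within_k_sd(k):.2f}%")
```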
Z - standard normal variate
A “Standard” Normal Distribution has mean = 0 and st. dev. = 1.

[Figure: standard normal curve with the Z scale from -3 to 3; a Z-value can fall anywhere on this scale]

• Z-value
– How many standard deviations the value-of-interest is away from the mean

$Z = \dfrac{(\text{value-of-interest}) - \bar{X}}{S}$

where $\bar{X}$ is the mean and $S$ is the standard deviation.

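
A minimal sketch of the Z-value calculation. The function name `z_value` and the numbers in the usage line (a value of 62 from data with mean 50 and standard deviation 5) are hypothetical.

```python
def z_value(value_of_interest, mean, sd):
    """Number of standard deviations the value lies away from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value_of_interest - mean) / sd

# Hypothetical example: 62 is 12 units above a mean of 50, with s.d. 5
z = z_value(62, mean=50, sd=5)
print("Z =", z)
```

A positive Z means the value lies above the mean, and a negative Z means it lies below.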
Use of Standard Normal distribution
• If a list of numbers follows the normal curve, the percentage of entries falling in a given interval can be estimated as follows:
– First convert the interval to standard units
– Find the corresponding area under the normal curve
• The procedure is called the normal approximation
• A histogram which follows the normal curve can be reconstructed fairly well from the average and SD.
• In such cases the average and SD are good summary statistics

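
The two steps of the normal approximation can be sketched as follows. The average, SD and interval are hypothetical values (heights with average 170 and SD 10, interval 160 to 185), and the area is taken from `statistics.NormalDist`, the standard library's normal distribution.

```python
from statistics import NormalDist

# Hypothetical summary statistics and interval of interest
average, sd = 170.0, 10.0
low, high = 160.0, 185.0

# Step 1: convert the interval to standard units
z_low = (low - average) / sd
z_high = (high - average) / sd

# Step 2: find the corresponding area under the standard normal curve
standard = NormalDist()
percentage = 100 * (standard.cdf(z_high) - standard.cdf(z_low))

print(f"standard units: {z_low} to {z_high}")
print(f"estimated percentage in interval: {percentage:.2f}%")
```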
Percentiles
• The average and SD can be used to summarize data following the normal curve.
• They are less satisfactory for other kinds of data, e.g. skewed data. To summarize such data we use percentiles
• A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall.
• For example, the 20th percentile is the value (or score) below which 20% of the observations may be found
• Percentiles divide a population into 100 equal groups according to the distribution of values of a particular variable
– 50th percentile is the median
– Interquartile range equals: 75th percentile – 25th percentile
• A percentile is only used as a comparison score
• All histograms, whether or not they follow the normal curve, can be summarized using percentiles
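
A sketch of percentile-based summaries for skewed data, using `statistics.quantiles` from the standard library (Python 3.8+). The data values are hypothetical, and the `"inclusive"` interpolation method is one of several conventions for computing percentiles, so other tools may give slightly different quartiles.

```python
import statistics

# Hypothetical right-skewed data: one large value pulls the mean upwards
data = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]

# n=4 returns the 25th, 50th and 75th percentiles
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1   # interquartile range: 75th percentile - 25th percentile

print("25th, 50th, 75th percentiles:", q1, q2, q3)
print("interquartile range:", iqr)
print("mean:", statistics.mean(data), "median:", statistics.median(data))
```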
Quantile of a Distribution
• The a-th quantile of a cumulative distribution function F is the point x_a such that F(x_a) = a
• A percentile is simply a quantile with a expressed as a percent instead of a proportion

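
For a continuous distribution the quantile is the inverse of the CDF. The sketch below uses the standard normal distribution as an assumed example; `NormalDist.inv_cdf` is the standard library's inverse CDF, and `quantile` is an illustrative wrapper.

```python
from statistics import NormalDist

standard = NormalDist()   # standard normal: mean 0, standard deviation 1

def quantile(a):
    """Point x_a with F(x_a) = a, for a proportion 0 < a < 1."""
    return standard.inv_cdf(a)

for a in (0.20, 0.50, 0.975):
    x_a = quantile(a)
    # Applying F to the quantile recovers the proportion a
    print(f"a = {a} ({a:.1%} percentile): x_a = {x_a:.4f}, F(x_a) = {standard.cdf(x_a):.4f}")
```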
Thank You

Abhinav Srivastava
