Professional Documents
Culture Documents
QUANTITATIVE METHODS
MBAZC417
SAJIN JOHN
2020HB58042
Table of Contents
MODULE 1 – DESCRIPTIVE STATISTICS ............................................................................................................................... 3
DATA.................................................................................................................................................................................................................. 3
DISPLAYING QUALITATIVE DATA ................................................................................................................................................................... 5
DISPLAYING QUANTITATIVE DATA................................................................................................................................................................. 6
NUMERICAL MEASURES ................................................................................................................................................................................... 9
STANDARD DEVIATION AS A RULER ............................................................................................................................................................ 13
MODULE 2 – BASIC PROBABILITY ....................................................................................................................................... 14
INTRODUCTION TO PROBABILITY ................................................................................................................................................................ 14
Rules of Probabilities ................................................................................................................................................................................................. 14
Getting values of Probability ................................................................................................................................................................................. 15
Assigning Probabilities ............................................................................................................................................................................................. 15
CONDITIONAL PROBABILITY ........................................................................................................................................................................ 17
Using Contingency Table.......................................................................................................................................................................................... 17
Using Tree Diagrams ................................................................................................................................................................................................. 17
BAYES’ THEOREM .......................................................................................................................................................................................... 18
Tabular Approach Steps ........................................................................................................................................................................................... 18
MODULE 3 – PROBABILITY DISTRIBUTIONS .................................................................................................................... 20
RANDOM VARIABLES .................................................................................................................................................................................... 20
Probability function of a Discrete Random Variable ................................................................................................................................ 20
Probability function of a Continuous Random Variable ......................................................................................................................... 20
Area Under Curve......................................................................................................................................................................................................... 20
DISCRETE PROBABILITY DISTRIBUTION ..................................................................................................................................................... 21
Bivariate Discrete Probability Distribution ................................................................................................................................................... 21
UNIFORM & POISSON DISTRIBUTIONS ........................................................................................................................................................ 23
Discrete Uniform Probability Distribution ..................................................................................................................................................... 23
Poisson Probability Distribution Π(μ) .............................................................................................................................................................. 23
Poisson PD using MS Excel ...................................................................................................................................................................................... 23
Poisson PD using Published Tables ..................................................................................................................................................................... 23
BINOMIAL DISTRIBUTION ............................................................................................................................................................................. 24
Binomial Probability Distribution B(n,p)........................................................................................................................................................ 24
Understanding Probability Tree for Binomial .............................................................................................................................................. 24
Binomial Distribution using MS Excel ............................................................................................................................................................... 25
Binomial Distribution using Published Tables ............................................................................................................................................. 25
A Digression .................................................................................................................................................................................................................... 27
CONTINUOUS RANDOM DISTRIBUTION ....................................................................................................................................................... 28
Probability = Area Under the Curve ................................................................................................................................................................... 28
Uniform Probability Distribution U(a,b) ......................................................................................................................................................... 28
THE EXPONENTIAL DISTRIBUTION.............................................................................................................................................................. 29
Exponential Probability Distribution – exp(μ) ............................................................................................................................................ 29
exp(μ) Properties ......................................................................................................................................................................................................... 29
THE NORMAL DISTRIBUTION....................................................................................................................................................................... 30
Normal Probability Distribution N( ) ........................................................................................................................................................ 30
N( ) – Properties.................................................................................................................................................................................................... 30
Normal Distribution using MS Excel .................................................................................................................................................................. 30
Reading the Table for N(0,1) ................................................................................................................................................................................. 30
Transforming N( ) to N(0, 1) .......................................................................................................................................................................... 30
MODULE 4 – INTERVAL ESTIMATION ................................................................................................................................. 32
CONFIDENCE INTERVAL ESTIMATION ......................................................................................................................................................... 32
Population - Variance/Std Dev ............................................................................................................................................................................ 32
Sample – Variance/Std Dev .................................................................................................................................................................................... 32
Estimating Population Parameters.................................................................................................................................................................... 32
FORMULAS .................................................................................................................................................................................. 33
SAJIN JOHN 2
Data
Quantitative or Numerical
Quanlitative or Categorial
Name,
Discrete Continous Gender, DOJ
# of defects, Inter-arrival
Team size Time, Market
Share
Data in Business
Data
• Marketing • Qualitative or Categorical Data
Sales and usage (Point of Sale data), Order o Could be a Nominal Data or Ordinal
booking, Credit Sales, Sales Persons, Dealers, Data.
Returns, Customers Satisfaction, Loyalty Schemes, o Arithmetic operation do not make
Marketing Research… sense.
• Production and Operations o Name, Gender, Car Colour, Date of
Production, Machine Utilization, Inventory, Joining
Productivity, Quality,… • Quantitative or Numerical Data
• Purchase o Discrete
Prices, Delivery, Suppliers, Prices, Inventory, # of defects, # of Children
Supplier Rating, Returns, … o Continuous
Inter-arrival Time, Market Share,
• HR PE Ratio
Employee records: Employee satisfaction,
Performance, Manpower Planning, … Measurement Scale
• Finance
Invoices, Receivables, Assets, Budgeting, Interest
Rates, Stock Market Index, …
• Maintenance
Breakdown, MTTR, MTFF, …
Organizat
Collection Analysis
ion
SAJIN JOHN 3
SAJIN JOHN 4
SAJIN JOHN 5
Skewness
Skewness measures the asymmetric nature of the
distribution.
A distribution with the peak towards the right and a
longer left tail is skewed left or negatively skewed.
A distribution with the peak towards the left and a
longer right tail is skewed right or positively skewed.
A symmetric distribution has Skewness = 0 Time Taken – Dot Plot
SAJIN JOHN 6
2. Column Chart
Summary Tables
1. Frequency Table
One nominal variable – Blood Group
3. Side-by-side Chart
2. Contingency Table – 2 variables
Two nominal variables – Blood Group &
Ethnicity
SAJIN JOHN 7
9. Network Diagram
7. Scatter Plot
11. Map
SAJIN JOHN 8
Numerical Measures
Graphical and Tabular presentations pictorially • Median may be computed even if values of all
summarizes the etire data set. But business may observations are not available.
require a measure that summarizes the data or the 12345678X
spread of data with a single number. • for A (-1 + 0)/2 = -0.5
for B 0
Statistics & Parameter measure
• If the measure summarizes a sample data, it is Mode
referred to as a Statistic. So, statistuc is a
• The mode is the most frequent value in a data.
number that represents a property of the
• There can be more than one mode in a data
sample.
set as long as those values have the same
• If the measure summarizes the population, it frequency and that frequency is the highest.
is referred to as a Parameter. So, a
• A data set with two modes is called bimodal.
parameter is a numerical characteristic of the
• Mode can be calculated for both quantitative
whole population that can be estimated by a
or quanlitative data.
statistic.
• Mode may or may not exist or can have
Measure of Location multiple modes
11 12 13 14
Data Sets A = {-2, -2, -1, 0, 1, 5} 12 12 13 14 15 15 16
Data Sets B = {-2, -2, -1, 0, 1, 5, 5} • Change in the value of an observation may
not affect Mode
Mean Blue Green Red Red Red Yellow
• The sum of all data points divided by the total Blue Green Red Red Red Blue
number of observations. • Mode may be computed even if values of all
• The sample meanX is said to be a point observations are not available
estimator of the population mean µ. 1233334X
̅ = ∑ 𝒙𝒊 , 𝛍 = ∑ 𝒙𝒊
𝚾 • for A -2
𝒏 𝑵 for B -2, 5
• The words “mean” and “average” are often
used interchangeable.
Percentiles
• The technical term is “arithmetic mean” (AM).
• AM is always unique for a data set. • Percentiles divide ordered data into
• Change in any value of the observations hundredths.
always affects AM. • Percentiles are useful for comparing values.
• AM cannot be computed if all values are not • pth percentile:
𝒑
available. 𝒊=( )𝒏
• for A 1/6 = 0.167 𝟏𝟎𝟎
for B 6/7 = 0.857 if, i ≠ integer, xroundup(i)
else, Avg(xi, x(i+1))
Median
Quartiles
• The median is a number that measures the
“centre” / “mid-point” of the sorted data. • Quartiles are the numbers that separate the
• Its like a “middle value”… like a median of a data into quarters.
triangle. • Quartils may or may not be part of the data.
• So, sort the data and find the middle value if • Median is the second quartile (Q2)
the number of data (n/N) is odd else take the • The first Quartile, Q1, is the middle value of
average of the two middle values (in case of the lower half of the data. This is same as the
even data sets) 25th percentile.
• Visually, Median is preferred when • The third Quartile, Q3, is the middle value of
histograms are negatively or positively the upper half of the data. This is same as the
skewed – i.e. histograms are far from 75th percentile.
symmetric • for A
• Median may not be unique. Q1 -- 25%(n) = 0.25*6 = 1.5 ~ 2nd element.
• Changes in extreme values may not affect Q1 = -2
median Q2 -- 50%(n) = 0.5*6 = 3, Avg(3rd & 4th)
123456789 Q2 = Avg(-1,0) = -0.5
1 2 3 4 5 6 7 8 10000 Q3 -- 75%(n) = 0.75*6 = 4.5 ~ 5th element.
Q3 = 1
Standard Deviation
• The sample standard deviation s is said to be
a point estimator of the population standard
deviation
• The average squared distance from the mean
• The standard deviation is the square root of
the variance.
Range
• Difference between the maximum and
minimum value.
• Range = Max Val – Min Val
• for A Range = 5 - (-2) = 7
for B Range = 5 - (-2) = 7
IQR
• The Inter Quartile Range is a number that
indicates the spread of the middle half or the
middle 50% of the data.
• It is the difference between the third quartile
(Q3) and the first quartile (Q1).
IQR = Q3 – Q1
SAJIN JOHN 10
• Square Root of Mean of Square of Error Skewness from Mean & Median
(RMSE)
• for A s = √6.97 = 2.64
for B s = √9.14 = 3.02
Five-Number Summary
Five-Number Summary consist of Minimum Value, Q1,
Q2 (Median), Q3, Maximum Value
Sensex Healthcare
Minimum 35,414 16,169
Maximum 38,493 18,630
Range 3,079 2,461
Std. Deviation 841 715
Average 37,077 17,026
CoV 0.023 0.042
In above example, CoV is able to capture the variation
on Healthcare is more than that on Senxex.
SAJIN JOHN 11
SAJIN JOHN 12
SAJIN JOHN 13
Introduction to Probability
Probability is a mathematical tool used to study Example:
randomness. The sample space S is the whole numbers starting at
Probability measures uncertainty. one and less than 20.
It deals with the chance (the likelihood) of an event Let event A = the even numbers and event B = numbers
occurring. greater than 13.
An experiment is a planned operation carried out 𝑆 = {1,2,3,4,5,6,7,8,9,10,11,12,13,1,15,16,17,18,19}
under controlled conditions. 𝐴 = {2,4,6,8,10,12,14,16,18}
A result of an experiment is called an outcome. 𝐵 = {14,15,16,17,18,19}
The sample space of an experiment is the set of all 𝐴′ = {1,3,5,7,9,11,13,15,17,19}
possible outcomes. E.g.: S = {H,T} 𝐴 ∩ 𝐵 = {1416,18}
An event is any combination of outcomes. Upper case 𝐴 ∪ 𝐵 = {2,4,6,8,10,12,14,15,16,17,18,19
letters like A and B represent events. 9 6 3
𝑃(𝐴) = , 𝑃(𝐵) = , 𝑃(𝐴 ∩ 𝐵) = ,
19 19 19
Equally likely means that each outcome of an 10
experiment occurs with equal probability. So, the 𝑃(𝐴 ∪ 𝐵 ) =
19
probability of each such event can be calculated by ′)
10
dividing count of outcomes of an event to total 𝑃(𝐴 = , 𝑠𝑜−→ 𝑃(𝐴) + 𝑃(𝐴′ ) = 1
19
number of outcomes in the sample space. 𝑃(𝐴 ∩ 𝐵) 3 𝑃(𝐴 ∩ 𝐵) 3
"∪" Event: The Union 𝑃(𝐴|𝐵) = = , 𝑃(𝐵|𝐴) = =
𝑃(𝐵) 6 𝑃(𝐴) 9
An outcome is in the event A ∪ B if the outcome is
in A or is in B or is in both A and B.
"∩" Event: The Intersection Rules of Probabilities
An outcome is in the event A ∩ B if the outcome is in 1. Probability of an event must be between 0 and 1
both A and B at the same time. (inclusive)
Two events are mutually exclusive when both the 0 < 𝑃(𝐴) < 1
event cannot occur at the same time. i.e. P(A ∩ B)=0 2. A event that A does not occur is called A
complement or simply not A.
The complement of event A is denoted A′ (read
𝑃(𝐴′ ) = 1 − 𝑃(𝐴)
"A prime"). A′ consists of all outcomes that
3. If two events A and B are mutually exclusive, the
are NOT in A. Notice that P(A) + P(A′) = 1.
probability of both events A & B occurring is 0.
The conditional probability of A given B is
𝑃(𝐴 ∩ 𝐵) = 0
written P(A|B). P(A|B) is the probability that 4. If two events A & B are mutually exclusive, the
event A will occur given that the event B has already probability of either A or B is sum of their
occurred. A conditional reduces the sample space. separate probability
𝑷(𝑨 ∩ 𝑩) 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)
𝑷(𝑨|𝑩) = , 𝒘𝒉𝒆𝒓𝒆 𝑷(𝑩) > 𝟎
𝑷(𝑩) 5. If events in a set are mutually exclusive and
Two events A and B are independent if the collective exhaustive, the sum of their
knowledge that one occurred does not affect the probabilities must add up to 1
chance the other occurs. For example, the outcomes 𝑃(𝐻𝑒𝑎𝑑𝑠) + 𝑃(𝑇𝑎𝑖𝑙𝑠) = 1
of two roles of a fair die are independent events. 6. If two events A & B are not mutually exclusive, the
The odds of an event presents the probability as a probability of either event A or event B occurring
ratio of success to failure. This is common in various is the sum of their separate probabilities minus
gambling formats. Mathematically, the odds of an the probability of their simultaneous occurrence.
event can be defined as: 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)
𝑃(𝐴) 7. If two events A and B are independent, the
1 − 𝑃(𝐴) probability of both events A & B occurring is
A contingency table provides a way of portraying equal to the product of their individual
data that can facilitate calculating probabilities. probabilities.
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵)
A tree diagram is a special type of graph used to 8. If two events A & B are not independent, the
determine the outcomes of an experiment. It consists probability of both events A & B occurring is the
of "branches" that are labelled with either frequencies product of the probability of event A multiplied
or probabilities. by probability of event B occurring, given that
event A has occurred.
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵|𝐴)
SAJIN JOHN 14
Empirical From observation, Life tables used by insurance companies; Clinical trial
experiments, customer data used by drug approval agencies; Exit polls;
surveys, etc. Customer surveys; Customer data- in banking, ARPU in
telecom; Life of automobile batteries, tires, electrical
switches; Occurrence of earthquakes, rainfall, floods,
fires, etc.
Subjective Own belief Everyday decisions- carry umbrella or not, buy or sell
gold/oil/stocks by retail customers, participating in a
Short coming of a priori approach closed-bid auction, etc.
• Probability does not give a proper idea if an
event could actually occur. Assigning Probabilities
E.g.: Rain/NoRain – 50:50 chances Basic Requirements for Assigning Probabilities
1. 0 ≤ P(E) ≤ 1 for any outcome E
Empirical Probability 2. ∑ P(Ei ) = 1
• From historical data or experiments or
observation Equally Likely
• Life tables in insurance, earthquakes, rainfall, Assigning probabilities based on the assumption of
twins, quality, stock market, … equal likely outcomes.
Item Probability Example: Rolling a fair die and observing the top face
Left-handed 1 : 10 persons S={1,2,3,4,5,6}
Twins 3 : 100 births Probabilities of each sample point is 1/6, i.e. each
Vegetarian 38 : 100 persons sample point has a 1/6 chance of occurring.
Aircraft crash 1 : 48 lakh flights Properties of equally likely events:
• Complementary events
Boys to Girls ratio 51.2 : 48.8
P(A’) = 1 – P(A)
• Intersection of events
P(A & B) = 𝑃(𝐴 ∩ 𝐵)
SAJIN JOHN 15
SAJIN JOHN 16
Conditional Probability
Calculating probability in percentage:
Using Contingency Table Fraction of parts, % or probability in %
Defective Good Total
Supplier-A 10 20 30
Supplier-B 30 40 70
Total 40 60 100
Joint probability is a statistical measure that Joint Probability:
P(A and Defective) = 10
calculates the likelihood of two events occurring
P(A and Good) = 20
together and at the same point in time.
P(B and Defective) = 30
Joint probability is the probability of event Y P(B and Good) = 40
occurring at the same time that event X occurs.
202 Marginal Probability:
𝑃(𝐴𝑙𝑖𝑣𝑒 & 𝑓𝑟𝑜𝑚 𝐹𝑖𝑟𝑠𝑡 𝑐𝑙𝑎𝑠𝑠) = 𝑃(𝐴 ∩ 𝐵) = P(A) = P(A and Defective) + P(A and Good) = 10 + 20 = 30
2201
P(B) = P(B and Defective) + P(B and Good) = 30 + 40 = 70
So, probability of combination of either of the survival P(D) = P(A and Defective) + P(B and Defective) = 10 + 30 = 40
and any of the class together is joint probability. P(G) = P(A and Good) + P(B and Good) = 20 + 40 = 60
Using contingency table, we directly its value by
dividing the count by total instances. Using Tree Diagrams
Marginal probability is the probability of an event A tree diagram is a special type of graph used to
irrespective of the outcome of another variable. determine the outcomes of an experiment. It consists
P(First)=325/2201, P(Alive)=710/2201 of "branches" that are labelled with either frequencies
or probabilities.
Conditional Probability is the likelihood that an For Instance:
event will occur given that another event has already
occurred.
P(survived given he was in 3rd class) = 178/706
P(3rd class given he survived) = 178/710
𝑃(𝐴 ∩ 𝑇) 178/2201 178
𝑃(𝐴|𝑇) = = =
𝑃(𝑇) 706/2201 706
Additional (OR) Rule
𝑃(𝐴 ∪ 𝐹) = 𝑃(𝐴) + 𝑃(𝐹) − 𝑃(𝐴 ∩ 𝐹)
710 325 202 833
𝑃(𝐴 ∪ 𝐹) = + − =
2201 2201 2201 2201
Example:
Frequency of parts, nos
Defective Good Total
Supplier-A 5 10 15
Supplier-B 15 20 35
Total 20 30 50
SAJIN JOHN 17
Bayes’ Theorem
Suppose Ei’s are ALL the possible outcomes and the This information has to be factored into the data
prior probabilities P(Ei) have been assigned to them. above!
The event F has occurred. Events:
𝑃(𝐸𝑖 |𝐹) E1: Lunch Out, E2: Lunch at Canteen, E3: Lunch in
𝑃(𝐸𝑖 )𝑃(𝐹|𝐸𝑖 ) Cubicle
=
𝑃(𝐸1 )𝑃(𝐹|𝐸1 ) + 𝑃(𝐸2 )𝑃(𝐹|𝐸2 ) + ⋯ + 𝑃(𝐸𝑛 )𝑃(𝐹|𝐸𝑛 ) Assuming all the possibilities for the EA to have lunch
𝑃(𝐸𝑖 )𝑃(𝐹|𝐸𝑖 ) are E1, E2, E3.
𝑃(𝐸𝑖 |𝐹) = F: EA came late today.
𝑃(𝐸1 ∩ 𝐹) + 𝑃(𝐸2 ∩ 𝐹) + ⋯ + 𝑃(𝐸𝑛 ∩ 𝐹)
𝑷(𝑬𝒊 )𝑷(𝑭|𝑬𝒊 )
𝑷(𝑬𝒊 |𝑭) = Prior Probabilities, 𝑃(𝐸1 ) = 𝑃(𝐸2 ) = 𝑃(𝐸3 ) = 1⁄3
𝑷(𝑭)
Required: Conditional Probabilities,
• The Ei’s are mutually exclusive 𝑃(𝐹 |𝐸1 ) = 0.4, 𝑃(𝐹 |𝐸2 ) = 0.19, 𝑃(𝐹 |𝐸3 ) = 0.01
𝑃(𝐴 ∩ 𝐵) Event (F) P(E) P(F|E) P(F∩E) P(E|F)
𝑃(𝐴|𝐵) = E1 1/3 0.4 0.4/3 0.67
𝑃(𝐵)
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐵)𝑃(𝐴|𝐵) E2 1/3 0.19 0.19/3 0.31
• The Ei’s are collectively exhaustive – i.e. are all the E3 1/3 0.01 0.01/3 0.02
possible events. Total 1.0 P(F)=0.2 1.00
𝑃(𝐴) = 𝑃(𝐴 ∩ 𝐵1 ) + 𝑃(𝐴 ∩ 𝐵2 ) + ⋯ + 𝑃(𝐴 ∩ 𝐵3 )
• P(F|Ei)’s are available 𝑃(𝐹 ∩ E) = P(E)P(F|E)
𝑃(𝐸)𝑃(𝐹 |𝐸) 𝑃(𝐹 ∩ E)
𝑃(𝐸|𝐹) = =
Tabular Approach Steps 𝑃(𝐹) P(F)
Step 1 So,
• Identify the mutually exclusive events (E’s) that P(Eating Lunch given EA came late) = P(E1|F)=0.67
make up the Sample Space; P(E2|F)=0.31
• Identify the Fact F. P(E3|F)=0.02
• Note down the P(E) and the conditional
probabilities (F | E) Using Contingency Table,
• Prepare the table with 5 columns and (n+1) rows, E1: Lunch Out, E2: Lunch at Canteen, E3: Lunch in
where n is the size of the sample space Cubicle
Step 2 Assuming all the possibilities for the EA to have lunch
• Enter are E1, E2, E3.
Column 1 The events E’s F: EA came late today.
Column 2 The prior probabilities P(E)’s
Column 3 P(F | E)’s Prior Probabilities, 𝑃(𝐸1 ) = 𝑃(𝐸2 ) = 𝑃(𝐸3 ) = 1⁄3
Step 3 Late Not Late Total
• Column 4 Compute the joint probabilities Eating Out 40% ? 1/3
Using, 𝑃(𝐹 ∩ E) = P(E)P(F|E) Canteen 19% ? 1/3
Step 4 Cubicle 1% ? 1/3
• Column 4. The last cell will contain Total ? ? 1
P(E∩F) = P(F) Computing the remaining values:
Step 5 Late Not Late Total
• Column 5 Compute the posterior probabilities Eating Out 40% (1/3) 60% (1/3) 1/3
using Canteen 19% (1/3) 81%(1/3) 1/3
𝑃(𝐸)𝑃(𝐹 |𝐸) 𝑃(𝐹 ∩ E) Cubicle 1% (1/3) 99%(1/3) 1/3
𝑃 (𝐸 |𝐹 ) = =
𝑃(𝐹) P(F) Total ? ? 1
SAJIN JOHN 18
Converting to Probabilities:
Now,
P(E1|F) = 0.00495/0.0547 = 0.0905
Example 2:
The disease is present in 0.5% of the population. It is Using Tree Diagram,
a deadly disease and death is almost always Events:
inevitable. E1: Have Disease, E2: No Disease
But there is a test that can detect the disease. F: Tested Positive
The True Positive is 99% while the False Positive is Prior Probabilities,
5%. 𝑃(𝐸1 ) = 0.5% = 0.005, P(E2) = 0.995
Question: If you test Conditional Probabilities,
positive, should you 𝑃(𝐹 |𝐸1 ) = 0.99, 𝑃(𝐹 |𝐸2 ) = 0.05
panic?
Events:
E1: Have Disease, E2: No Disease
F: Tested Positive
Prior Probabilities,
𝑃(𝐸1 ) = 0.5% = 0.005, P(E2) = 0.995
Conditional Probabilities,
𝑃(𝐹 |𝐸1 ) = 0.99, 𝑃(𝐹 |𝐸2 ) = 0.05
Event P(E) P(F|E) P(F∩E) P(E|F)
E1 0.005 0.99 0.00495 0.0905
E2 0.995 0.05 0.04975 0.9095
1 P(F)=0.0547 1
𝑃(𝐹 ∩ E) = P(E)P(F|E)
𝑃(𝐸)𝑃(𝐹 |𝐸) 𝑃(𝐹 ∩ E) Using Classification Algorithm,
𝑃(𝐸|𝐹) = =
𝑃(𝐹) P(F)
So, P(Having disease given tested positive)
P(E1|F)=0.0905
SAJIN JOHN 19
Random Variables
A random variable is a numerical description of the
outcome of a random experiment. Probability function of a Continuous Random
A discrete random variable may assume a Variable
countable number of values
The discrete random variables remain unknown till Probabilities assigned to intervals of numbers.
its completed. i.e. # of customers using the ATM in a Such probability distributions have descriptive
day remains unknown until the day ends. measures: μ, σ etc
E.g.: Let X: The time spend by a
# of dependents of an employee car at the toll booth
# of customers using the ATM in a day A Histogram can be
# of sixes in a T20 match developed for the data.
# of owners who like the product A line joining the top of each
A continuous random variable mat assume any rectangle at the enter will
numerical value in an interval generate the empirical
probability density function.
The random variable inherits the probabilities of the
events of the random experiment.
E.g.:
Life of a tire
Time between call at the call centre
Volume of water in a 1 litre mineral water bottle
% of owners who like the product
Area Under Curve
Probability function of a Discrete Random
Variable
The Probability function lists the possible outcomes
and their probabilities
Like frequency distribution, probability distribution • X can assume any value in an interval on the real
have descriptive measures line or in an interval.
0 ≤ 𝑃(𝑥) ≤ 1 • P(X = a) = 0, for any number a
∑ 𝑃(𝑥) = 1 • P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤
X ≤ b)
• P(a < X < b) is the area under the graph of the
probability density function between a and b.
SAJIN JOHN 20
SAJIN JOHN 21
SAJIN JOHN 22
SAJIN JOHN 23
Binomial Distribution
X ~ B(n, p)
p: Probability of Success
q: Probability of Failure (q=1-p)
𝒏
𝒇(𝒙) = 𝑷(𝑿 = 𝒙) = ( ) 𝒑𝒙 𝒒(𝒏−𝒙) = 𝑪𝒏𝒙 𝒑𝒙 𝒒(𝒏−𝒙)
𝒙 A Fair coin tossed three times:
𝒏!
= 𝒑𝒙 𝒒(𝒏−𝒙)
(𝒏
𝒙! − 𝒙)!
𝛍 = 𝐧𝐩
𝝈𝟐 = 𝒏𝒑𝒒
𝝈 = √𝒏𝒑𝒒
Rule of Thumb:
If N >> n, so that p does not change by much (if case of
any situation where probability changes)
𝒏
< 𝟓%
𝑵
Given: Probability distribution of Boys and Girls Binomial Distribution using MS Excel
in 1, 2, and 3, children families? MS Excel generic formula:
=BINOM.DIST(x, NumberOfTrials,
ProbabilityOfSuccess, FALSE)*100
SAJIN JOHN 25
SAJIN JOHN 26
A Digression Case 3:
Case 1: X ~ B(20, 0.9)
X ~ B(30, 0.6) np = 20*0.9 = 18
np = 30*0.6 = 18 nq = 20*0.1 = 2
nq = 30*0.4 = 12
np >= 5,
np >= 5, nq <= 5 (less)
nq >= 5
So, binomial distribution is
So, binomial distribution asymmetric.
is symmetric.
Case 4:
Case 2: X ~ B(20, 0.1)
X ~ B(10, 0.5) np = 20*0.1 = 2
np = 10*0.5 = 5 nq = 20*0.9 = 18
nq = 10*0.5 = 5
np <= 5, (less)
np >= 5, nq >= 5
nq >= 5
So, binomial distribution is
So, binomial distribution is asymmetric.
symmetric.
SAJIN JOHN 27
SAJIN JOHN 28
SAJIN JOHN 29
SAJIN JOHN 30
Example:
The Engineering College has been conducting an
entrance exam for the last 50 years. The scores on
this test are normally distributed with a μ = 50 and a
σ = 10. Kumar believes that he must do better than
at least 80% of those who take the test. Kumar
scores 58. Will he be selected?
Here, Given
X : Score of a randomly selected student
68% within 1 standard deviations. X~N(50, 10)
95% within 2 standard deviations. P(selected student scored less than 58),
99% within 3 standard deviations. x − 50 58 − 50
P(X < 58) = P ( < ) = P(Z < 0.8)
SK rule- 10 10
0.1 – 2 – 14 - 34 rule, from left side. = P(Z < 0) + P(0 < Z < 0.8)
0.1 – 2.1 – 13.6 – 31.4 = 0.5 + 0.2881 = 0.7881
This means, 78.8% students scored 58 or more
For 6-sigma enthusiasts (from Centre) marks but Kumar was hoping to score more than
68.27% within 1 σ 80% of the students.
95.50% within 2 σ
SAJIN JOHN 31
𝒔
𝑺𝒕𝒂𝒏𝒅𝒂𝒓𝒅 𝑬𝒓𝒓𝒐𝒓 𝒐𝒇 𝒎𝒆𝒂𝒏 =
√𝒏
𝒑(𝟏 − 𝒑)
𝑺𝒕𝒂𝒏𝒅𝒂𝒓𝒅 𝑬𝒓𝒓𝒐𝒓 𝒐𝒇 𝒑𝒓𝒐𝒑𝒐𝒓𝒕𝒊𝒐𝒏 = √
𝒏
Population - Variance/Std Dev To estimate, population mean – use t table
To estimate proportion – use Z table
Mean,
∑ 𝒙𝒊 To use t table, degree of freedom, df=sample size – 1
𝛍=
𝑵
Variance, Example:
∑(𝒙𝒊 − 𝝁)𝟐 The weights of 5 potatoes drawn randomly from a
𝝈𝟐 =
𝑵 consignment are 136, 58, 79, 48 and 46 g.
Standard Deviation, Size, n Mean, X' Stdev, S
𝝈 = √𝝈𝟐 Sample-1 136 58 79 48 46 5 73.4 37.4
For population, variance is divided by N
Excel Formula: Estimate average weight (mean) of potatoes in the
Population variance, =Var.p(range) consignment.
Population standard deviation, =Stdev.p(range) SE= S/√n t90% value Estimate, Mean ± t90% * SE
16.7 2.132 73.4 ± 35.6 or, 37.8 to 109
Sample – Variance/Std Dev
Mean, Estimates from 5 samples taken from same
∑ 𝒙𝒊
̅=
𝚾 consignment
𝒏 Size, n Mean, X' Stdev, S SE= S/√n t90% value
Variance, Sample-1 136 58 79 48 46 5 73.4 37.4 16.7 2.132
∑(𝒙𝒊 − 𝑿)𝟐 Sample-2 73 58 79 48 46 5 60.8 14.8 6.6 2.132
𝒔𝟐 = Sample-3 115 58 79 48 46 5 69.2 28.8 12.9 2.132
𝒏−𝟏 Sample-4 64 58 79 48 46 5 59.0 13.4 6.0 2.132
Sample-5 87 58 79 48 46 5 63.6 18.5 8.3 2.132
Standard deviation,
𝒔 = √𝒔𝟐 √ Estimate, Mean ± t90% * SE
For sample, variance is divided by n-1 73.4 ± 35.6 or, 37.8 to 109
Excel Formula: 60.8 ± 14.1 or, 46.7 to 74.9
69.2 ± 27.4 or, 41.8 to 96.6
Sample variance, =Var.s(range)
59 ± 12.8 or, 46.2 to 71.8
Sample standard deviation, =Stdev.s(range) 63.6 ± 17.6 or, 46 to 81.2
SAJIN JOHN 32
FORMULAS
Discrete Probability Distribution
Mean for Population: 0 ≤ 𝑓(𝑥) ≤ 1
∑ 𝒙𝒊
̅=
𝚾 ∑ 𝑓(𝑥) = 1
𝒏
Mean for Sample: 𝑬(𝑿) = 𝝁 = ∑ 𝑿𝒇(𝑿)
∑ 𝒙𝒊
𝛍= 𝑽𝒂𝒓(𝑿) = 𝑽(𝑿) = 𝝈𝟐 = ∑(𝑿 − 𝝁)𝟐 𝒇(𝑿)
𝑵
Percentiles: 𝑬(𝜶𝑿) = 𝜶𝑬(𝑿)
𝒑
𝒊=( )𝒏 𝑽(𝜶𝑿) = 𝜶𝟐 𝑽(𝑿)
𝟏𝟎𝟎 𝑬(𝑿 ± 𝒄) = 𝑬(𝑿) ± 𝒄
if, i ≠ integer, xroundup(i)
𝑽(𝑿 ± 𝒄) = 𝑽(𝑿)
else, Avg(xi, x(i+1))
𝑬(𝜶𝑿 + 𝜷𝒀) = 𝜶𝑬(𝑿) + 𝜷𝑬(𝒀)
If X& Y are independent,
Percentile Rank
𝑽(𝜶𝑿 + 𝜷𝒀) = 𝜶𝟐 𝑽(𝑿) + 𝜷𝟐 𝑽(𝒀)
(#𝒐𝒇 𝒗𝒂𝒍𝒖𝒆𝒔 𝒃𝒆𝒍𝒐𝒘 𝒙) + 𝟎. 𝟓
% 𝒓𝒂𝒏𝒌 = [ ] × 𝟏𝟎𝟎
𝑻𝒐𝒕𝒂𝒍 # 𝒐𝒇 𝒗𝒂𝒍𝒖𝒆𝒔 Poisson Probability Distribution Π(μ)
Range I: The specified interval
Range = Max Val – Min Val X = the number of occurrences in an interval
f(x) = the probability of x occurrences in an interval
Inter Quartile Range μ = mean number of occurrences in an interval
IQR = Q3 – Q1
X ~ Π(μ)
Variance for population, 𝝁𝒙 𝒆−𝝁 𝝀𝒙 𝒆−𝝀
∑(𝒙𝒊 − 𝝁)𝟐 𝑷(𝑿 = 𝒙) = =
𝝈𝟐 = 𝒙! 𝒙!
𝑵 𝜇 = 𝜆 = 𝐸(𝑋) = 𝑉(𝑋)
Variance for sample,
∑(𝒙𝒊 − 𝑿)𝟐 Binomial Probability Distribution B(n,p)
𝒔𝟐 = X ~ B(n, p)
𝒏−𝟏
Z-Score p: Probability of Success
𝒙−𝑿 q: Probability of Failure (q=1-p)
𝒛= 𝒏
𝒔 < 𝟓%
𝒙−𝝁 𝑵
𝒛= 𝒏
𝝈 𝒇(𝒙) = 𝑷(𝑿 = 𝒙) = ( ) 𝒑𝒙 𝒒(𝒏−𝒙) = 𝑪𝒏𝒙 𝒑𝒙 𝒒(𝒏−𝒙)
Chebyshev’s Theorem 𝒙
𝒏!
(𝟏 − 𝟏⁄𝒛𝟐 ) = 𝒑𝒙 𝒒(𝒏−𝒙)
𝒙! (𝒏 − 𝒙)!
𝛍 = 𝐧𝐩
Probability
𝝈𝟐 = 𝒏𝒑𝒒
Not mutually exclusive events,
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵) 𝝈 = √𝒏𝒑𝒒
For mutually exlusive events,
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) Uniform Probability Distribution U(a,b)
For not independent events, 𝟏
𝑃(𝐴 ∩ 𝐵 ) = 𝑃(𝐴)𝑃(𝐵|𝐴) 𝒇(𝒙) = {𝒃 − 𝒂 𝒇𝒐𝒓 𝒂 ≤ 𝒙 ≤ 𝒃
For independent events, 𝟎 𝒆𝒍𝒔𝒆 𝒘𝒉𝒆𝒓𝒆
𝑎+𝑏
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵) 𝐸(𝑈) =
Probability 2
(𝑏 − 𝑎)2
𝑁𝑜. 𝑜𝑓 𝑜𝑢𝑡𝑐𝑜𝑚𝑒𝑠 𝑖𝑛 𝑤ℎ𝑖𝑐ℎ 𝑡ℎ𝑒 𝑒𝑣𝑒𝑛𝑡 𝑜𝑐𝑐𝑢𝑟𝑠 𝑉(𝑈) =
= 12
𝑇𝑜𝑡𝑎𝑙 𝑛𝑜. 𝑜𝑓 𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑒 𝑜𝑢𝑡𝑐𝑜𝑚𝑒𝑠 So, considering x2 > x2
Bayes’ Theorem 𝒙𝟐 − 𝒙𝟏
𝑃(𝐸𝑖 |𝐹) 𝑷(𝒙𝟏 < 𝑼(𝒂, 𝒃) < 𝒙𝟐 ) =
𝒃−𝒂
𝑃(𝐸𝑖 )𝑃(𝐹|𝐸𝑖 ) Exponential Probability Distribution – exp(μ)
= 𝟏
𝑃(𝐸1 )𝑃(𝐹|𝐸1 ) + 𝑃(𝐸2 )𝑃(𝐹|𝐸2 ) + ⋯ + 𝑃(𝐸𝑛 )𝑃(𝐹|𝐸𝑛 )
𝒇(𝒙) = ( ) 𝒆−𝒙/𝝁
𝑃(𝐸𝑖 )𝑃(𝐹|𝐸𝑖 ) 𝝁
𝑃(𝐸𝑖 |𝐹) =
𝑃(𝐸1 ∩ 𝐹) + 𝑃(𝐸2 ∩ 𝐹) + ⋯ + 𝑃(𝐸𝑛 ∩ 𝐹) 𝑷(𝑿 < 𝒙) = 𝟏 − 𝒆−𝒙/𝝁
𝑷(𝑬𝒊 )𝑷(𝑭|𝑬𝒊 ) Where, μ : Avg interval time
𝑷(𝑬𝒊 |𝑭) = 1/ μ : Arrival rate
𝑷(𝑭)
Population Mean
𝑷𝒐𝒑𝒖𝒍𝒂𝒕𝒊𝒐𝒏 𝒎𝒆𝒂𝒏
= 𝑺𝒂𝒎𝒑𝒍𝒆 𝒎𝒆𝒂𝒏
± 𝑺𝒂𝒎𝒑𝒍𝒊𝒏𝒈 𝑬𝒓𝒓𝒐𝒓
𝒔
= 𝑿′ ± 𝑪𝒓𝒊𝒕𝒊𝒄𝒂𝒍 𝒗𝒂𝒍𝒖𝒆 ∗
√𝒏
Population Proportion
𝑷𝒐𝒑𝒖𝒍𝒂𝒕𝒊𝒐𝒏 𝑷𝒓𝒐𝒑𝒐𝒓𝒕𝒊𝒐𝒏
= 𝑺𝒂𝒎𝒑𝒍𝒆 𝒑𝒓𝒐𝒑𝒐𝒓𝒕𝒊𝒐𝒏
± 𝑺𝒂𝒎𝒑𝒍𝒊𝒏𝒈 𝑬𝒓𝒓𝒐𝒓
𝒑(𝟏 − 𝒑)
= 𝒑 ± 𝑪𝒓𝒊𝒕𝒊𝒄𝒂𝒍 𝑬𝒓𝒓𝒐𝒓 ∗ √
𝒏
Standard Error of mean
𝒔
𝑺𝒕𝒂𝒏𝒅𝒂𝒓𝒅 𝑬𝒓𝒓𝒐𝒓 𝒐𝒇 𝒎𝒆𝒂𝒏 =
√𝒏
Standard Error of proportion
𝒑(𝟏 − 𝒑)
𝑺𝒕𝒂𝒏𝒅𝒆𝒅 𝑬𝒓𝒓𝒐𝒓 𝒐𝒇 𝒑𝒓𝒐𝒑𝒐𝒓𝒕𝒊𝒐𝒏 = √
𝒏
SAJIN JOHN 34