
Session 4, Lecture 5, BIMTECH, 04 Feb 2/10/2022

2022

Statistics for Decision Making in Python


Session 4, Lecture 5
Business Vertical – DA, Trimester III, Batch ‘21-’23

V Shekhar Avasthy, 4th Feb, 2022

This session covers one of the most fundamental ideas in Statistics. The derivations shall not be a part of
problems/grading; only the application shall be asked. Nonetheless, UNDERSTANDING this is critical!

Privileged and Confidential. All Rights Reserved © Facts n Data 2022 1

What we had seen…


1. If the MEANS of ALL POSSIBLE samples (combinations) of size “n” are plotted, they form a near-bell-shaped curve. This
holds for data of nearly any shape (with very few exceptions), hence the name Normal curve;
2. Every bell-shaped curve is UNIQUELY characterized by its peak (mean) and spread (Standard Deviation), i.e.,
a given combination of peak and spread (mean & SD, in other words) leads to one and only one curve;
3. The points on the curve represent the probability of the sample mean taking the corresponding x-axis value, for that particular ‘n’
• This implies that the probability of the mean lying between any 2 points (say, x1 and x2) is the sum of the probabilities of all points
between x1 and x2, in other words, the area under the curve between these two points;
4. Since such a curve represents the probability of various mean values, it is also called a Probability Distribution (PD)
curve;
5. The peaks of the PD curves for any ‘n’ from the same population (nearly) always coincide, i.e., the “mean of
means” for any given sample size ‘n’ is (almost) always equal to the population mean;
6. The area around the peak (wherever the peak is) within ±1 SD is 68.27% of the total area, within ±2 SD is
95.45% (within ±1.96 SD is 95%), and within ±3 SD is 99.73% of the total area, where SD is the
Standard Deviation of that particular curve of sample size “n”;
• This implies that if, for a sample (say, Sample ‘k’) of “n” objects from a population, one finds the sample mean to be, say,
μk, there is a 95% chance that the real population mean lies within 1.96*SDk of the sample mean, where SDk is
the Std Dev of THAT particular curve of sample size “n”.
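The area figures in point 6 can be checked numerically. The sketch below is only an illustration using Python's standard-library `statistics.NormalDist`; the helper name `area_within` and the sample values (mean 50, SD 2) are hypothetical and not part of the lecture.

```python
from statistics import NormalDist


def area_within(k: float) -> float:
    """Area under a Normal curve within k standard deviations of its mean."""
    std = NormalDist(mu=0.0, sigma=1.0)  # any Normal curve gives the same fractions
    return std.cdf(k) - std.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within +/-{k} SD: {area_within(k):.2%}")

# Hypothetical sample 'k': sample mean 50, and the curve for this n has SD 2.
sample_mean, sd_of_curve = 50.0, 2.0
low = sample_mean - 1.96 * sd_of_curve
high = sample_mean + 1.96 * sd_of_curve
print(f"95% range for the population mean: {low:.2f} to {high:.2f}")
```

The last two lines apply the bullet under point 6: the population mean is taken to lie within 1.96 SD of the observed sample mean with a 95% chance.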

All rights reserved, Facts n Data, 2022 1



Another way to look at the bell-shaped (or Normal) PD Curves…

It has also been found that:


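$$ s \approx \frac{\sigma}{\sqrt{n}} $$

where s is the SD of the curve of sample means for sample size “n” and σ is the SD of the population.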

Image credit: https://en.wikipedia.org/wiki/Interquartile_range
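The relation between the spread of the sample means and σ divided by the square root of n can be checked with a small simulation. This is a minimal sketch under stated assumptions: the population is taken to be exponential with rate 1 (a skewed shape chosen only for illustration, with mean 1 and SD 1), and `n`, `trials` and the seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

n = 25         # sample size (assumed for illustration)
trials = 5000  # number of samples drawn

# Assumed skewed population: exponential with rate 1, so mean = 1 and SD = 1.
pop_mean, pop_sd = 1.0, 1.0

# Mean of each sample of size n.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = pop_sd / math.sqrt(n)

print(f"mean of means  : {mean_of_means:.3f} (population mean {pop_mean})")
print(f"SD of means    : {sd_of_means:.3f}")
print(f"sigma / sqrt(n): {expected_sd:.3f}")
```

Even though the population is far from bell-shaped, the mean of means should land close to the population mean, and the SD of the sample means close to σ/√n.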



Problem: You have a given point “G”, shown by a red dot on the horizontal axis of the given curve (on the left). This curve has a mean of μg and a Std Dev of σg.
You are to find the equivalent point “E” on a ‘reference curve’ (on the right) that has a mean of μR and an SD of σR.

[Figure: the given curve (left) with mean μg, SD σg and the given point “G”; the reference curve (right) with mean μR and SD σR.]
Solution: The equivalent point “E” is the one on the right curve for which the area between μR and E (highlighted in grey)
is the SAME as the area between μg and G on the left curve (highlighted in yellow).
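The equal-area idea can be sketched directly in Python. This is an illustration only: the helper `equivalent_point` and all the numbers (a given curve with mean 70 and SD 10, a reference curve with mean 100 and SD 15, G = 85) are hypothetical. Both curves have half their area below the mean, so matching the area below the point is the same as matching the area between the mean and the point.

```python
from statistics import NormalDist


def equivalent_point(g: float, given: NormalDist, reference: NormalDist) -> float:
    """Point on `reference` with the same area below it as `g` has on `given`."""
    area_below_g = given.cdf(g)             # area to the left of G on the given curve
    return reference.inv_cdf(area_below_g)  # point with that same area on the reference curve


given = NormalDist(mu=70, sigma=10)       # hypothetical mean and SD of the given curve
reference = NormalDist(mu=100, sigma=15)  # hypothetical mean and SD of the reference curve

G = 85
E = equivalent_point(G, given, reference)

# Areas between each mean and its point (yellow vs grey in the slide).
area_given = given.cdf(G) - 0.5
area_reference = reference.cdf(E) - 0.5
print(f"E = {E:.2f}, areas: {area_given:.4f} vs {area_reference:.4f}")
```

Here G sits 1.5 SD above its mean, so E should come out 1.5 SD above the reference mean.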

So, how to find that?

We just estimate where this point WOULD HAVE LAIN on the reference curve!

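$$ \frac{\text{Point on Ref Curve} - \mu_R}{3\,\sigma_R} = \frac{\text{Point on GIVEN Curve} - \mu}{3\,\sigma} $$

(3σ would have almost all the area under the curve)

$$ \text{Point on Ref Curve} = \frac{\text{Point on GIVEN Curve} - \mu}{\sigma} \times \sigma_R + \mu_R $$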

So, we create a reference curve, with mean = μR = 0 and SD = σR = 1


Equation becomes:

$$ \text{Point on Ref Curve} = \frac{\text{Point on GIVEN Curve} - \mu}{\sigma} \times 1 + 0 = \frac{\text{Point on GIVEN Curve} - \mu}{\sigma} $$

Which is the Z-score!!!
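The rescaling can be written as a one-line function. The sketch below is illustrative; the function name `to_reference` and the numbers are hypothetical. With the default reference curve (mean 0, SD 1) the result is the Z-score.

```python
def to_reference(x: float, mu: float, sigma: float,
                 mu_r: float = 0.0, sigma_r: float = 1.0) -> float:
    """Map point x on a curve (mu, sigma) to the matching point on a reference curve."""
    return (x - mu) / sigma * sigma_r + mu_r


x, mu, sigma = 85.0, 70.0, 10.0  # hypothetical point and given curve

# General reference curve (hypothetical mean 100, SD 15).
on_ref = to_reference(x, mu, sigma, mu_r=100.0, sigma_r=15.0)

# Standard reference curve (mean 0, SD 1): this is the Z-score.
z = to_reference(x, mu, sigma)

print(on_ref, z)
```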


Standard Normal Distribution


• Why Standard Normal?
  • To compare scores from different distributions with different means and std. dev., e.g., checking
    which student did better in the class
  • To normalize scores for statistical decision making, e.g., the difference, in number of standard
    deviations, of my height from the average height of Indian women in the age group 25-30.

• Transformation rule:

$$ Z = \frac{X - \mu}{\sigma} $$

• Variable Z measures deviation from the mean in units of standard deviation.

• Z is called the standardized variable and its value is called the standard score.

• The standard score gives us the number of standard deviations a particular value lies below or above the
mean, as X = Zσ + μ.

• Then, refer to the already calculated table (next slide)
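As a rough illustration of the "which student did better" comparison, the sketch below standardizes two scores from classes with different means and SDs. All marks, means and SDs are made up for the example, and the helper names `z_score` and `from_z` are hypothetical.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def from_z(z: float, mu: float, sigma: float) -> float:
    """Recover the original value from its standard score: X = Z*sigma + mu."""
    return z * sigma + mu


# Hypothetical data: (score, class mean, class SD)
student_a = (82.0, 75.0, 5.0)
student_b = (88.0, 80.0, 8.0)

z_a = z_score(*student_a)
z_b = z_score(*student_b)
print(f"A: z = {z_a:.2f}, B: z = {z_b:.2f}")

better = "A" if z_a > z_b else "B"
print(f"Student {better} did better relative to their own class")

# Round trip back to the raw score of student A.
print(from_z(z_a, 75.0, 5.0))
```

Student B has the higher raw score, but student A is further above the class mean when measured in standard deviations.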


The Z-Score tables may be of different types: some give the area between 0 (the reference curve mean) and the point,
some give the area less than the reference point, and some give the area more than the reference point.

Be sure of which table you are using and do simple maths to arrive at the answer.

[Table image] Area between z=0 and z=1.95 is 47.44% of the total area under the curve
=> there is 47.44% area between 0 and 1.95
=> there is 47.44% x 2 = 94.88% area within ±1.95 of 0.
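The three table types differ only by simple arithmetic, which the sketch below illustrates for z = 1.95 using the standard-library `statistics.NormalDist` in place of a printed table. The variable names are chosen for this example only.

```python
from statistics import NormalDist

std = NormalDist()  # reference curve: mean 0, SD 1
z = 1.95

area_less_than = std.cdf(z)            # "area less than z" table
area_zero_to_z = area_less_than - 0.5  # "area between 0 and z" table
area_more_than = 1.0 - area_less_than  # "area more than z" table
area_within = 2.0 * area_zero_to_z     # area between -z and +z

print(f"less than z : {area_less_than:.2%}")
print(f"0 to z      : {area_zero_to_z:.2%}")
print(f"more than z : {area_more_than:.2%}")
print(f"within +/- z: {area_within:.2%}")
```

Whichever table is at hand, the other two values follow from the facts that half the area lies on each side of 0 and the total area is 1.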


Thank You!

Comments/ Clarifications: shekhar@factsNdata.com / +91-9810228402
