Statistics For Decision Making in Python: Session 4, Lecture 5 V Shekhar Avasthy, 4 Feb, 2022

Session 4, Lecture 5, BIMTECH, 04 Feb 2022
Statistics for Decision Making in Python

Session 4, Lecture 5
Business Vertical – DA, Trimester III, Batch ‘21-’23
V Shekhar Avasthy, 4th Feb, 2022
This session covers one of the most fundamental aspects of Statistics. The derivation will not be part of
problems or grading; only application will be tested. Nonetheless, UNDERSTANDING this is critical!
Privileged and Confidential. All Rights Reserved © Facts n Data 2022 1
What we had seen…

1. If the MEANS of ALL POSSIBLE COMBINATIONS (samples) of size “n” are plotted, they form a nearly bell-shaped curve. This
holds for data of almost any shape (with very few exceptions), hence the name Normal curve;
2. Every bell-shaped (Normal) curve is UNIQUELY characterized by its peak (mean) and spread (Standard Deviation), i.e.,
a given combination of peak and spread (mean & SD, in other words) leads to one and only one curve;
3. The height of the curve at each point on the x-axis represents how likely that value of the sample mean is for that particular ‘n’
• This implies that the probability of the mean lying between any 2 points (say, x1 and x2) equals the
area under the curve between these two points;
4. Since such a curve represents the probability of various mean values, it’s also called a Probability Distribution (PD)
curve;
5. The peak of the PD curve for any ‘n’ from the same population (nearly) always coincides with the population mean, i.e., the “mean of
means” for any given sample size ‘n’ is (almost) equal to the population mean;
6. The area within ±1 SD of the peak (wherever the peak is) is 68.27% of the total area, within ±2 SD is
95.45% (within ±1.96 SD is 95%), and within ±3 SD is 99.73% of the total area, where SD is the
Standard Deviation of that particular curve of sample size “n”;
• This implies that if for a sample (say, Sample ‘k’) of “n” objects from a population, one finds the sample mean to be, say,
μk, there is a 95% chance that the real population mean lies within 1.96*SDk of the sample mean, where SDk is
the Std Dev of THAT particular curve of sample size “n”.
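The percentages in point 6 can be checked numerically. The following is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is introduced here only for illustration and is not part of the lecture material.

```python
from statistics import NormalDist

# Standard normal curve (mean 0, SD 1). The percentages below are the same
# for any normal curve, because they are measured in units of its own SD.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k: float) -> float:
    """Fraction of the total area lying within k SDs on either side of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Percentage of total area for the multiples of SD quoted in the slide.
areas = {k: round(100 * area_within(k), 2) for k in (1, 1.96, 2, 3)}

for k, pct in areas.items():
    print(f"within ±{k} SD: {pct}% of total area")
```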
All rights reserved, Facts n Data, 2022 1

Another way to look at the bell-shaped (or Normal) PD Curves…
It has also been found that:

s ≈ σ / √n, where s is the SD of the curve of sample means for sample size “n” and σ is the population SD
Image credit: https://en.wikipedia.org/wiki/Interquartile_range
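The relationship between the SD of the sample means and σ / √n can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is a hypothetical right-skewed one generated with `random.expovariate`, and a large number of random samples stands in for "all possible combinations". The sample size of 30 and the count of 5,000 samples are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population that is clearly NOT bell-shaped (right-skewed).
population = [random.expovariate(1.0) for _ in range(100_000)]
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population SD

n = 30                # sample size
num_samples = 5_000   # stands in for "all possible combinations"

# Mean of each random sample of size n, drawn without replacement.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_sd = sigma / math.sqrt(n)

print(f"population mean     : {mu:.4f}")
print(f"mean of sample means: {mean_of_means:.4f}")
print(f"SD of sample means  : {sd_of_means:.4f}")
print(f"sigma / sqrt(n)     : {predicted_sd:.4f}")
```

The two means should nearly coincide, and the SD of the sample means should be close to σ / √n, even though the population itself is skewed.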

Problem: You have a given point “G”, shown by the red dot on the horizontal axis of the given curve (on the left). This curve has a mean of μg and a Std Dev of σg.
You are to find the equivalent point “E” on a ‘reference curve’ (on the right) that has a mean of μR and an SD of σR.
[Figure: the given curve on the left (mean μg, SD σg) with the Given Point “G” marked on it; the reference curve on the right (mean μR, SD σR)]
Solution: The equivalent point “E” is the one on the right curve for which the area between μR and E (highlighted in grey) is the SAME
as the area between μg and G on the left curve (highlighted in yellow)
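The "same area" idea can be sketched directly in Python with `statistics.NormalDist`. All numbers below (means, SDs and the value of G) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration.
given_curve = NormalDist(mu=50.0, sigma=10.0)        # mean μg, SD σg
reference_curve = NormalDist(mu=100.0, sigma=15.0)   # mean μR, SD σR
G = 65.0                                             # given point on the given curve

# Area under the given curve to the left of G. Matching this area is
# equivalent to matching the area between the mean and the point, because
# each curve has exactly half of its area on either side of its mean.
area_left_of_G = given_curve.cdf(G)

# Point on the reference curve that has the same area to its left.
E = reference_curve.inv_cdf(area_left_of_G)

print(f"area to the left of G = {area_left_of_G:.4f}")
print(f"equivalent point E    = {E:.2f}")
```

Here G sits 1.5 SDs above its own mean, so E should land 1.5 SDs above the reference mean, i.e. at 100 + 1.5 × 15 = 122.5.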

So, how to find that?
We just estimate where this point WOULD HAVE LAIN on the reference curve!
(Point on Ref Curve - μR) / (3 x σR) = (Point on GIVEN Curve - μ) / (3 x σ)
(3σ would have almost all the area under the curve)
Point on Ref Curve = ((Point on GIVEN Curve - μ) / σ) x σR + μR
So, we create a reference curve, with mean = μR = 0 and SD = σR = 1

Equation becomes:
Point on Ref Curve = ((Point on GIVEN Curve - μ) / σ) x 1 + 0 = (Point on GIVEN Curve - μ) / σ
Which is the Z-score!!!
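As a rough sketch, the mapping onto a reference curve can be written as one small function whose defaults (mean 0, SD 1) give the Z-score. The function name `to_reference`, its parameter names and the example numbers are hypothetical.

```python
def to_reference(x: float, mu: float, sigma: float,
                 mu_ref: float = 0.0, sigma_ref: float = 1.0) -> float:
    """Map x from a curve with mean mu and SD sigma onto a reference curve.

    With the default reference curve (mean 0, SD 1) the result is the Z-score.
    """
    if sigma <= 0 or sigma_ref <= 0:
        raise ValueError("standard deviations must be positive")
    return (x - mu) / sigma * sigma_ref + mu_ref


# Hypothetical point 65 on a curve with mean 50 and SD 10.
z = to_reference(65.0, mu=50.0, sigma=10.0)
print(z)  # 1.5 SDs above the mean

# Same point mapped onto a hypothetical reference curve with mean 100 and SD 15.
e = to_reference(65.0, mu=50.0, sigma=10.0, mu_ref=100.0, sigma_ref=15.0)
print(e)  # 122.5
```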
Standard Normal Distribution

• Why Standard Normal?
  • To compare scores from different distributions, with different means and std. dev. E.g. checking
    which student did better in the class
  • To normalize scores for statistical decision making, e.g. the difference, in number of standard
    deviations, between my height and the average height of Indian Women in the age group 25-30.
• Transformation rule: Z = (X - μ) / σ
  • Variable Z measures deviation from the mean in units of standard deviation.
  • Z is called the standardized variable and its value is called the standard score.
  • The standard score gives the number of standard deviations a particular value lies below or above the
    mean, as X = μ + Zσ.
  • Then, refer to the already calculated table (Next slide)
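The "which student did better" comparison can be sketched as follows. The marks, class means and SDs are made-up numbers, and the helper names `z_score` and `from_z` are introduced only for this illustration. The second helper recovers the original value from its standard score as X = μ + Zσ.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of SDs by which x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def from_z(z: float, mu: float, sigma: float) -> float:
    """Recover the original value from its standard score: X = mu + Z * sigma."""
    return mu + z * sigma


# Hypothetical marks from two classes with different means and SDs.
z_a = z_score(78, mu=70, sigma=8)   # Student A: 78 in a class with mean 70, SD 8
z_b = z_score(65, mu=55, sigma=5)   # Student B: 65 in a class with mean 55, SD 5

# The higher standard score marks the stronger performance relative to the class.
better = "B" if z_b > z_a else "A"
print(f"z_a = {z_a}, z_b = {z_b}, better relative performance: Student {better}")

# Round trip: the standard score maps back to the original mark.
print(from_z(z_b, mu=55, sigma=5))
```

Student A has the higher raw mark, but Student B is further above their own class mean in units of standard deviation.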

The Z-Score tables may be of different types: some give the area between 0 (the reference curve mean) and the point,
some give the area less than the reference point, and some give the area more than the reference point.
Be sure of which table you are using and do simple maths to arrive at the answer.
Example: the area between z=0 and z=1.95 is 47.44% of the total area under the curve.
=> there is 47.44% x 2 = 94.88% of the area within ±1.95 of 0.
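The three kinds of table values can also be computed rather than looked up. A minimal sketch for z = 1.95, assuming Python's standard `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # reference curve: mean 0, SD 1
z = 1.95

area_less_than = std_normal.cdf(z)      # "area less than the point" tables
area_zero_to_z = area_less_than - 0.5   # "area between 0 and the point" tables
area_more_than = 1.0 - area_less_than   # "area more than the point" tables
area_within = 2 * area_zero_to_z        # area within ±z of 0

print(f"less than z  : {100 * area_less_than:.2f}%")
print(f"0 to z       : {100 * area_zero_to_z:.2f}%")
print(f"more than z  : {100 * area_more_than:.2f}%")
print(f"within +/- z : {100 * area_within:.2f}%")
```

Whatever the table type, the other two values follow from the facts that the total area is 1 and that half of it lies on each side of 0.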
Thank You!
Comments/ Clarifications: shekhar@factsNdata.com / +91-9810228402
