
IIMT 2641 Introduction to Business Analytics

Module 2: Intro to Statistics


Topic 1: Random Variable

Objectives
1. RV, Discrete random variable; Define and calculate the
expected value, standard deviation, and variance for
discrete probability distributions.

2. Define continuous random variables & articulate the
similarities and differences between continuous and discrete
random variables.

3. Normal Distribution

Random Variables: The Motivation

We now know how to find probabilities of random outcomes.
Now, we want to focus on the distribution of outcomes.

Distributions tell us useful information…


1. The spread of possible outcomes.
2. The shape of the possible outcomes (centered around the average, or
skewed).
3. Upper and lower bounds of outcomes.

Random Variables: The Motivation
Rolling One Die: Probability Distribution Function
[Figure: bar chart of P(x) against the outcome x of one die roll.]

Random Variable
Random Variable (represented with a capital letter, e.g. X)
§ A numerical description of the outcome of an experiment
– assigns a real number to every possible outcome or event in a statistical
experiment such that the numbers represent mutually exclusive and
collectively exhaustive events.

§ There are two types of random variables:
– Discrete Random Variables (countable number of values)
• X = 1 for heads, 0 for tails
• X = number of tails ∈ {0, 1, … , 10} (discrete variable)
– Continuous Random Variables (uncountable number of values)
• Y = Google's stock price ∈ [0, ∞) (continuous variable)

Probability Distribution
§ Probability distribution assigns the probability to each numerical value of a
random variable.

§ Discrete and continuous probability distribution

§ Popular distributions
§ Binomial distribution
§ Poisson distribution
§ Exponential distribution
§ Normal distribution
§ …

Probability Distribution: Discrete RV


§ The probability distribution of a random variable describes how
probabilities are distributed over the values of the random variable.

§ For each possible outcome $x_i$ (mutually exclusive and collectively
exhaustive), there is a probability value $P(X = x_i)$.

§ These values must be between 0 and 1: $0 \le P(X = x_i) \le 1$.

§ They must sum up to 1: $\sum_{i=1}^{n} P(X = x_i) = 1$.

Probability Distribution: Discrete RV
The probability distribution of a random variable describes how probabilities
are distributed over the values of the random variable.

Number of customers in a day (x)    Probability P(X = x)
0                                   P(X = 0) = 0.1
1                                   P(X = 1) = 0.4
2                                   P(X = 2) = 0.25
3                                   P(X = 3) = 0.15
4                                   P(X = 4) = 0.1

X is the discrete random variable, the first column lists its possible values
(0, 1, 2, 3, or 4), and P(X = x) is the probability that X equals each value.

For a discrete random variable, the distribution is also called probability
mass function (pmf).

Quick Check: What is P(X < 3)? P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = 0.75
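The two pmf conditions and the quick check can be verified in a few lines. This is a minimal sketch using the table above; the helper names `is_valid_pmf` and `prob_less_than` are illustrative and not part of the course material.

```python
# pmf from the customers-per-day table: value x -> P(X = x)
pmf = {0: 0.10, 1: 0.40, 2: 0.25, 3: 0.15, 4: 0.10}

def is_valid_pmf(pmf, tol=1e-9):
    """Check 0 <= P(X = x) <= 1 for every x and that the probabilities sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    sums_to_one = abs(sum(pmf.values()) - 1.0) < tol
    return in_range and sums_to_one

def prob_less_than(pmf, bound):
    """P(X < bound): add the probabilities of all values strictly below bound."""
    return sum(p for x, p in pmf.items() if x < bound)

print(is_valid_pmf(pmf))                 # True
print(round(prob_less_than(pmf, 3), 2))  # 0.75
```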

Cumulative Distribution: The cdf
The cumulative distribution of a random variable X describes the probability that
the outcome of X will be a value less than or equal to x. It is usually denoted with
F(x) = P(X ≤ x) and is abbreviated as cdf (cumulative distribution function).

Number of customers in a day (x)    Probability P(X = x)    Cumulative F(x)
0                                   P(X = 0) = 0.1          P(X ≤ 0) = 0.1
1                                   P(X = 1) = 0.4          P(X ≤ 1) = 0.5
2                                   P(X = 2) = 0.25         P(X ≤ 2) = 0.75
3                                   P(X = 3) = 0.15         P(X ≤ 3) = 0.9
4                                   P(X = 4) = 0.1          P(X ≤ 4) = 1

Quick Check: What is F(2)? F(2) = P(X ≤ 2) = 0.75
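The cdf column is a running total of the pmf column. A small sketch of that calculation, assuming the table above; the function name `F` mirrors the slide's notation, and evaluating it between listed values (such as 2.5) is an added illustration.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4]
probs = [0.10, 0.40, 0.25, 0.15, 0.10]

# Running total of the pmf gives F(x) = P(X <= x) at each listed value.
cdf = dict(zip(values, accumulate(probs)))

def F(x):
    """cdf at any real x: total probability of the listed values that are <= x."""
    return sum(p for v, p in zip(values, probs) if v <= x)

print(round(cdf[2], 2))  # 0.75
print(round(F(2.5), 2))  # 0.75, since no probability sits between 2 and 3
```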

pmf VS cdf
[Figure: two bar charts over the number of customers in a day (x = 0, 1, 2, 3, 4). Left: Probability Distribution P(X = x), with bars 0.1, 0.4, 0.25, 0.15, 0.1. Right: Cumulative Distribution F(x), with bars 0.1, 0.5, 0.75, 0.9, 1.]

What is the expected value of X?


§ The expected value is a measure of central tendency of a probability
distribution, and
– It is the probability-weighted average of all possible outcomes, i.e., it is defined
as
$$\mu = E[X] = \sum_{i=1}^{n} x_i \cdot P(X = x_i) = x_1 \cdot P(X = x_1) + x_2 \cdot P(X = x_2) + \dots + x_n \cdot P(X = x_n)$$
Expected Value of X: Example
Let X ∈ {0, 1, 2, 3, 4} be a discrete random variable that gives the number of
customers that arrive in the restaurant between 6 am and 7 am. X has
probability mass function P(X = x) shown in the table below.

Number of customers between 6 and 7 am (x)    Probability P(X = x)
0                                             0.1
1                                             0.4
2                                             0.25
3                                             0.15
4                                             0.1

What is the expected number of customers that will arrive in the restaurant between
6 and 7 am?
E[X] = (0)(0.1) + (1)(0.4) + (2)(0.25) + (3)(0.15) + (4)(0.1)
     = 1.75 customers
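The same probability-weighted average can be computed directly. A minimal sketch using the table above; `expected_value` is an illustrative helper name.

```python
# pmf from the table: number of customers x -> P(X = x)
pmf = {0: 0.10, 1: 0.40, 2: 0.25, 3: 0.15, 4: 0.10}

def expected_value(pmf):
    """E[X] = sum over all x of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

print(round(expected_value(pmf), 2))  # 1.75
```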
What is the variance of X?
§ The variance of a discrete random variable is a measure of spread in the data.
§ It is the probability-weighted sum of its squared deviations from its mean:
$$\sigma_X^2 = \mathrm{Var}[X] = E[(X - \mu_X)^2] = \sum_{i} [x_i - E(X)]^2 \, P(X = x_i)$$
– $(x_i - E[X])^2$: the squared gap between $x_i$ and E[X].
– $P(X = x_i)$: the likelihood of that gap.
[Figure: number line with outcomes x1, x2, x3, x4, x5 and E[X] marked between x3 and x4.]
What is the variance of X?
The following discrete distributions have the same expected value,
E[X] = 2, but different variances:
[Figure: three bar charts of probability (0 to 1) over x = 0, 1, 2, 3, 4, labeled Variance = 1.1, Variance = 0, and Variance = 2. In the Variance = 0 chart all probability sits at x = 2, so P(X = 2) = 1.]

What is the standard deviation of X?
§ The standard deviation of a discrete random variable X is ALSO a measure
of spread in the data.

§ Mathematically, it is the square root of the variance:
$$\sigma_X = \mathrm{Stdev}[X] = \sqrt{\mathrm{Var}[X]}$$
*We like standard deviation because it is in the same units as X, meaning it
gives a measure of spread/dispersion that makes sense to us!

Calculating Standard Deviation and Variance

E[X] = 1.75 customers
$$\mathrm{Var}[X] = \sum_{x} (x - E[X])^2 \cdot P(X = x)$$

Number of customers in a day (x)    Probability P(X = x)
0                                   0.1
1                                   0.4
2                                   0.25
3                                   0.15
4                                   0.1
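The slide leaves the arithmetic to the reader, so here is a short sketch that applies the variance formula to the table above and then takes the square root. The helper names are illustrative.

```python
import math

# pmf from the table: number of customers x -> P(X = x)
pmf = {0: 0.10, 1: 0.40, 2: 0.25, 3: 0.15, 4: 0.10}

def expected_value(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var[X] = sum of (x - E[X])^2 * P(X = x)."""
    mean = expected_value(pmf)
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

var_x = variance(pmf)
sd_x = math.sqrt(var_x)  # standard deviation, in the same units as X

print(round(var_x, 4))  # 1.2875
print(round(sd_x, 4))   # 1.1347
```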

Calculating Expected Value Using EXCEL

Calculating Standard Deviation and Variance Using EXCEL

Discrete vs. Continuous Variables: Probability Distributions
Discrete or Continuous?
[Figure: bar chart of Probability (vertical axis, 0 to 0.9) against Height in three categories: 5' or shorter; over 5', but less than 6'; 6' or taller.]

Discrete vs. Continuous Variables: Probability Distributions
Discrete or Continuous?
[Figure: bar chart of Probability (vertical axis, 0 to 0.12) against Height in one-inch steps from 5'0" to 6'6".]

Discrete vs. Continuous Variables: Probability Distributions
Discrete or Continuous?
[Figure: bar chart of Probability (vertical axis, 0 to 0.016) against Height in steps of 0.05 from 5.01 to 6.46.]

Discrete vs. Continuous Variables: Probability Distributions


Continuous Random Variables
– Instead of a pmf table or chart… we have a probability density
function (pdf), denoted as f(x), that tells us the height of the curve at
X = x.
• f(x) is NOT the same as P(X = x)
• For all continuous random variables P(X = x) = 0

– Instead of a cdf table or chart… we have a cumulative distribution
function (cdf) F(x) = P(X ≤ x).
• This is the same in theory as in the discrete case, but because X is continuous, you
cannot enumerate and add up all values like in the discrete case.
• Discrete case: $P(X \le x_k) = P(X = x_1) + P(X = x_2) + \dots + P(X = x_k) = \sum_{i=1}^{k} P(X = x_i)$
• Instead, take the area under the curve of the pdf
26 (Not Required)
Discrete vs. Continuous Variables: Probability Distributions

Discrete or Continuous?
[Figure: not recoverable]


Discrete vs. Continuous Variables: Probability Distributions

Why does P(X = x) = 0 for any value of x when X is a continuous random
variable?

Assume X is continuous.
– Let X = the length of a human hair. What is P(X = 6 cm)?
– Let X = the length of a human hair. What is P(X = 6.186574 cm)?

Both are 0: for a continuous variable, probability is area under the pdf, and a
single point has zero width. Only intervals carry positive probability.

The implication is, when X is continuous (not discrete):

P(X < x) = P(X ≤ x)
P(X > x) = P(X ≥ x)

Discrete vs. Continuous Variables: Probability Distributions
Properties of the pdf of a continuous random variable.
1. f(x) gives the height of the pdf curve at the value x
2. f(x) is always non-negative
3. The total area under the curve of f(x) is always 1 (because the probabilities
of all possible outcomes must sum to 1).
4. The probability of X being in the range between A and B is the area under the
curve between A and B.
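Properties 3 and 4 can be checked numerically. The sketch below assumes a hypothetical triangular pdf on [0, 2] (not from the slides) and approximates areas with a simple midpoint rule; `f` and `area` are illustrative names.

```python
def f(x):
    """Hypothetical triangular pdf on [0, 2], peaking at x = 1 (an assumption)."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0

def area(pdf, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under pdf between a and b."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

total = area(f, 0.0, 2.0)       # property 3: total area is 1
middle = area(f, 0.5, 1.5)      # property 4: P(0.5 <= X <= 1.5) = 0.75
sliver = area(f, 0.999, 1.001)  # a very narrow interval has very little area

print(round(total, 4), round(middle, 4), round(sliver, 4))
```

Shrinking the interval around a point drives its area toward 0, which is why P(X = x) = 0 even where the height f(x) is large.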

Continuous Distribution Function Example
Let X be the surface temperature of the moon (in °F)
1. What is f(50)? The height of the curve at x = 50.

2. Find P(50 < X < 100) = Area under curve between 50 and 100.

3. Find P(X ≤ 50) = Area under curve to the left of 50.

4. Find P(−∞ ≤ X ≤ ∞) = 1

[Figure: PDF of X, with 50 and 100 marked on the horizontal axis. Higher values of f(x) indicate greater likelihood of taking a value around there. Total area “under the curve” (shaded) is 1.]
X could belong to a finite or infinite range.
Continuous Distribution Function Example
Let X be the surface temperature of the moon (in °F)
1. Find P(X ≤ 100) = 0.6 and P(X < 190) = 0.95

2. Find P(100 < X < 190) = 0.95 – 0.60 = 0.35

3. Find P(X > 190) = 1 – 0.95 = 0.05

[Figure: CDF of X, with F(100) = 0.6 and F(190) = 0.95 marked.]
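Once the cdf values are read off the chart, interval and tail probabilities are simple differences. A minimal sketch using only the two values given above:

```python
# cdf values read off the chart: F(100) = 0.6, F(190) = 0.95
F = {100: 0.60, 190: 0.95}

p_between = F[190] - F[100]  # P(100 < X < 190) = F(190) - F(100)
p_above = 1.0 - F[190]       # P(X > 190) = 1 - F(190)

print(round(p_between, 2))  # 0.35
print(round(p_above, 2))    # 0.05
```

Because X is continuous, using < or ≤ at the endpoints gives the same answers.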
Expected Value and Variance of Continuous Random Variables (Not Required)
The Expected Value of a continuous random variable is
$$E[X] = \int_a^b x \cdot f(x) \, dx$$
The Variance of a continuous random variable is
$$\mathrm{Var}[X] = \int_a^b (x - E[X])^2 f(x) \, dx$$
*here a is the minimum value of X, and b is the maximum value of X
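To make the integrals concrete, the sketch below evaluates them numerically for a hypothetical uniform distribution on [0, 10]. The distribution, the bounds `A` and `B`, and the midpoint-rule helper `integrate` are all assumptions for illustration.

```python
A, B = 0.0, 10.0  # hypothetical minimum and maximum of X

def f(x):
    """Uniform pdf on [A, B]: constant height 1 / (B - A)."""
    return 1.0 / (B - A) if A <= x <= B else 0.0

def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    width = (b - a) / steps
    return sum(g(a + (i + 0.5) * width) for i in range(steps)) * width

mean = integrate(lambda x: x * f(x), A, B)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), A, B)

print(round(mean, 3))      # 5.0
print(round(variance, 3))  # 8.333
```

These agree with the known uniform results (a + b)/2 and (b − a)²/12.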

Independence of RVs (Not Required)

Two random variables X and Y are independent if and only if, for every x
and y, the events {X ≤ x} and {Y ≤ y} are independent events.

P(X ≤ x, Y ≤ y) = P(X ≤ x) ⋅ P(Y ≤ y) for all x, y.

Recall: P(A ∩ B) = P(A) ⋅ P(B) holds exactly when A and B are independent.

The Normal Distribution
Normal random variable X
X = any continuous measure
• It can take on any continuous units (feet, test scores, etc.)
• Most commonly used to model symmetric quantities that have a
“central tendency” with concentration around the center.

Properties and Assumptions of the Normal Distribution
1. Infinite Range (between -∞ and +∞), so negative values are allowed.
2. Symmetric around the mean.

Normal Distribution
§ The normal distribution is the most popular and useful continuous
probability distribution.
§ E.g., return of a stock portfolio, and test scores.

§ The probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$ (Not Required)

§ The normal distribution is specified completely when we know the mean
E(X), µ, and the standard deviation, σ. We often use the notation
N(µ, σ²).
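The density formula translates directly into code. In this sketch µ = 50 and σ = 10 are hypothetical parameters chosen for illustration, and `normal_pdf` is an illustrative name.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma^2) density curve at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Hypothetical parameters: mu = 50, sigma = 10.
peak = normal_pdf(50, mu=50, sigma=10)   # the curve is highest at the mean
left = normal_pdf(40, mu=50, sigma=10)
right = normal_pdf(60, mu=50, sigma=10)  # symmetry: same height as left

print(round(peak, 5))  # 0.03989
print(math.isclose(left, right))
```

The returned value is a curve height, not a probability; probabilities come from areas under the curve.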


Normal Distribution
§ Normal distribution with different µ. (shape unchanged)
[Figure: three normal curves with the same σ on an axis marked 40, 50, 60: one centered at µ = 50, one at µ = 40 (smaller µ, same σ), and one at µ = 60 (larger µ, same σ).]

Normal Distribution
§ Normal distribution with different σ. (always symmetric)
[Figure: normal curves with the same µ. A smaller σ gives a narrower, taller curve; a larger σ gives a wider, flatter curve.]


Normal Distribution
For X ~ N(µ, σ²):
P(µ − σ ≤ X ≤ µ + σ) ≈ 68%
P(µ − 2σ ≤ X ≤ µ + 2σ) ≈ 95%
P(µ − 3σ ≤ X ≤ µ + 3σ) ≈ 99.7%
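These three percentages can be reproduced with the error function from Python's standard library, using the identity P(|X − µ| ≤ kσ) = erf(k/√2) for a normal X. A minimal sketch; `prob_within` is an illustrative name.

```python
import math

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for normal X; independent of mu and sigma."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```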
Standard Normal Distribution
Standard normal distribution: N(0, 1)

If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma}$ follows standard normal distribution, $Z \sim N(0, 1)$.
For example, P(−1 ≤ Z ≤ 1) ≈ 68%.

$z = \frac{x - \mu}{\sigma}$ represents the number of standard deviations from a number
x to the mean, µ.


Standard Normal Distribution: Example


§ If X follows normal distribution with mean µ = 100, σ = 15, and we are
interested in finding the probability that X is less than 130, then
P(X ≤ 130) = P(Z ≤ z), z = ?
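One way to work this example is to standardize first and then look up the standard normal cdf. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of a printed z-table; that substitution is an assumption about tooling, not the course's method.

```python
from statistics import NormalDist

mu, sigma, x = 100, 15, 130

# Standardize: number of standard deviations between x and the mean.
z = (x - mu) / sigma

# P(Z <= z) from the standard normal N(0, 1).
p_standard = NormalDist().cdf(z)

# Same probability computed directly from N(100, 15^2), as a cross-check.
p_direct = NormalDist(mu=mu, sigma=sigma).cdf(x)

print(z)                     # 2.0
print(round(p_standard, 4))  # 0.9772
```

So z = 2, and P(X ≤ 130) = P(Z ≤ 2) ≈ 0.9772.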
