
Introduction to Biostatistics

Shamik Sen
Dept. of Biosciences & Bioengineering
IIT Bombay
Probability

• Foundation for Inferential Statistics

• Subjective Vs Objective (Frequency)

- Geologist: There is a 60% chance of oil in this region


- Doctor: I am 95% certain the patient has this disease

Frequency Interpretation:
- probability is a property of the outcome
- can be experimentally determined
Sample Space & Event
Sample space (S): Set of all possible outcomes of an experiment

Set of natural numbers, S = {1, 2, 3, …}

Gender of a newborn child, S = {b, g}

Dosage given to a patient until he/she reacts positively, S = [0, ∞)

Outcome of two throws of a coin, S = {HH, HT, TH, TT}

Event (E): Any subset of the sample space

Set of natural numbers less than 5, 𝐸 = {1,2,3,4}

Newborn is a girl, 𝐸 = {𝑔}


Union, Intersection & Complement
Union, E ∪ F or (E + F): either E or F occurs

Intersection, E ∩ F or (EF): both E and F occur

Complement, E^c: outcomes not in E, i.e., E does not occur

A: newborn is boy
B: newborn is girl

A and B are mutually exclusive events:
1. A = B^c, B = A^c
2. A + B = S

De Morgan’s Laws

(E + F)^c = E^c F^c
(EF)^c = E^c + F^c
Axioms of Probability
Probability of an event = P(E)

1. P(S) = 1

2. 0 ≤ P(E) ≤ 1

3. For mutually exclusive events E_1, E_2, …, E_n: P(⋃ E_i) = Σ_{i=1}^{n} P(E_i)

Useful consequences:

A. P(E^c) = 1 − P(E)

B. P(A + B) = P(A) + P(B) − P(AB)

C. Odds of an event = P(A)/P(A^c)


Basics of Counting
Product Rule

• Two experiments performed together

• If Experiment 1 has m outcomes and Experiment 2 has n outcomes
• Total # outcomes = mn

• Generalize: n_1 n_2 … n_r

Arranging n objects: n(n − 1)(n − 2) … 1 = n!   (0! = 1! = 1)

Arranging n objects in which n_1 are of type 1, n_2 are of type 2, …, n_r are of type r: n!/(n_1! n_2! … n_r!)
Calculating Probability: Example 1

Two balls are “randomly drawn” from a bowl containing 6 white and 5 black balls.
What is the probability that one of the drawn balls is white and the other black?

Total # of outcomes = 11 × 10 = 110

E: event that one white and one black ball is drawn

Number of outcomes in which E occurs, n(E) = 6 × 5 + 5 × 6 = 60

Hence, P(E) = 60/110 = 6/11
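As a quick sketch (not part of the original slides), the same count can be checked by enumerating all 110 ordered draws:

```python
from itertools import permutations

# Label balls 0-5 as white and 6-10 as black; enumerate every
# ordered draw of two distinct balls from the bowl of 11.
balls = ["W"] * 6 + ["B"] * 5
draws = list(permutations(range(11), 2))          # 11 x 10 = 110 outcomes
favourable = sum(1 for i, j in draws if balls[i] != balls[j])

print(len(draws))                  # 110
print(favourable)                  # 6*5 + 5*6 = 60
print(favourable / len(draws))     # 6/11
```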
Calculating Probability: Example 2
DNA is made of 4 nucleotides: 𝐴, 𝑇, 𝐺, 𝐶

Probability that a 3 nucleotide sequence has all A’s (event E)?


Probability that a 3 nucleotide sequence has no repeats (event F)?

How many sequences of length 3 are possible?


n(S) = 4 × 4 × 4 = 64

How many sequences have all A’s?

n(E) = 1

Therefore, P(E) = 1/64

How many sequences of length 3 do not contain any repeats?

n(F) = 4 × 3 × 2 = 24

Therefore, P(F) = 24/64 = 3/8
Calculating Probability: Example 3
A class in probability theory consists of 6 men and 4 women. An exam is given
and the students are ranked according to their performance. Assuming that no
two students obtain the same score, (a) how many different rankings are
possible? (b) If all rankings are considered equally likely, what is the probability
that women receive the top 4 scores?

(a) 10!

(b) 4! 6! / 10!
Calculating Probability: Example 4
I have a coin which is not fair. I flip the coin twice.

You have to bet as to whether the outcomes are same or different.

What will you bet? (a) Same (b) Different (c) Cannot say

Toss 1 \ Toss 2    H     T
      H            p²    pq
      T            pq    q²

P(same) = p² + q²
P(different) = 2pq

P(same) − P(different) = p² + q² − 2pq = (p − q)² > 0 (for p ≠ q), so “Same” is the better bet.


Calculating Probability: Example 5
A fair coin is tossed continuously. Every head fetches 1 point and every tail
fetches 2 points. What is the probability of scoring 𝑛 points?

𝑛−1 H
Let 𝑃% = probability of scoring 𝑛 points
𝑛
" "
Therefore, 𝑃% = 𝑃%-"× # + 𝑃%-#× # T
𝑛−2

Let 𝑃% = A𝑥 %

Therefore, A𝑥 % = 0.5A𝑥 %-" + 0.5A𝑥 %-# implying 2𝑥 # = 𝑥 + 1

"
Solving, we get, 𝑥 = − , 1
#
Example 5 contd..
General solution: P_n = A(−0.5)^n + B(1)^n

We have P_1 = 1/2 and P_2 = (1/2)(1/2) + 1/2 = 3/4

Therefore,

Equation 1: −0.5A + B = 0.5

Equation 2: 0.25A + B = 0.75

Solving, A = 1/3, B = 2/3

Therefore, P_n = 2/3 + (1/3)(−0.5)^n
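The closed form can be checked against the recurrence directly (a minimal sketch, not from the slides):

```python
# Recurrence: P_n = (1/2) P_{n-1} + (1/2) P_{n-2}, with P_1 = 1/2, P_2 = 3/4
def p_recursive(n_max):
    p = {1: 0.5, 2: 0.75}
    for n in range(3, n_max + 1):
        p[n] = 0.5 * p[n - 1] + 0.5 * p[n - 2]
    return p

# Closed form derived above: P_n = 2/3 + (1/3)(-1/2)^n
def p_closed(n):
    return 2 / 3 + (1 / 3) * (-0.5) ** n

p = p_recursive(20)
print(all(abs(p[n] - p_closed(n)) < 1e-12 for n in p))  # True
print(p[20])   # approaches 2/3 for large n
```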
Permutations Vs Combinations
Permutations: Where arrangements are important
-Rankings
-Sequences

Combinations: Where groups are important


-Committee
-Order not relevant

In how many different ways can we choose 𝑟 items from a total of 𝑛 items?

# of Arrangements = n(n − 1)(n − 2) … (n − r + 1) = P^n_r
# of Repetitions (e.g., ABC, ACB, BCA, BAC, CAB, CBA) = r!
# of Combinations = n(n − 1)(n − 2) … (n − r + 1)/r! = n!/(r!(n − r)!) = C^n_r
Calculating Probability: Example 6
A committee of 8 members is to be selected from a group of 6 men and 9
women. If the selection is made randomly, what is the probability that the
committee consists of 3 men and 5 women?

Ways in which 8 members can be chosen from 15 = C^15_8

Ways in which 3 men can be chosen from 6 men = C^6_3

Ways in which 5 women can be chosen from 9 women = C^9_5

Hence, P = C^6_3 C^9_5 / C^15_8
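Working from the problem as stated (a committee of 8 drawn from 15 people, with 3 men and 5 women), the probability evaluates numerically as follows (an illustrative check, not part of the slides):

```python
from math import comb

total = comb(15, 8)                    # any 8 of the 15 people: 6435
favourable = comb(6, 3) * comb(9, 5)   # 3 of 6 men and 5 of 9 women: 20 * 126
print(favourable / total)              # 2520/6435 ≈ 0.3916
```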
Conditional Probability
Conditional probability of E given F = probability that E occurs given that F has already occurred:

P(E/F) = P(EF)/P(F)

Similarly, P(F/E) = P(EF)/P(E)

Multiplication Rule:

𝑃 𝐸𝐹 = P(E/F) P(F)

𝑃 𝐸𝐹 = P(F/E) P(E)
Conditional Probability: Example 1
5 red and 2 green balls are present in a bag. A random ball is selected and replaced by
a ball of the other color; then a second ball is drawn.

1. What is the probability the second ball is red?


2. What is the probability the first ball was red given the second ball was red?

P(R₂) = (5/7)(4/7) + (2/7)(6/7) = 32/49

P(R₁/R₂) = P(R₁R₂)/P(R₂) = (5/7)(4/7) / (32/49) = 20/32 = 5/8
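The two-stage calculation can be reproduced with exact fractions (a sketch, not part of the slides):

```python
from fractions import Fraction

# First draw: red with prob 5/7 (bag becomes 4R, 3G after the swap),
# green with prob 2/7 (bag becomes 6R, 1G after the swap).
p_r1 = Fraction(5, 7)
p_r2_given_r1 = Fraction(4, 7)
p_r2_given_g1 = Fraction(6, 7)

# Total probability for the second draw, then Bayes for the first.
p_r2 = p_r1 * p_r2_given_r1 + (1 - p_r1) * p_r2_given_g1
p_r1_given_r2 = p_r1 * p_r2_given_r1 / p_r2

print(p_r2)            # 32/49
print(p_r1_given_r2)   # 5/8
```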
Conditional Probability: Example 2
The organization that Jones works for is running a father–son dinner for those
having at least one son. Each of these employees is invited to attend along
with their youngest son. If Jones is known to have two children, what is the
conditional probability that they are both boys given that he is invited to the
dinner?

A: event that at least one of Jones’s children is a boy (as he is invited to dinner)
B: event that both of Jones’s children are boys

S: {(b, b), (b, g), (g, b), (g, g)}

P(B/A) = P(BA)/P(A) = P{(b, b)} / P{(b, b), (b, g), (g, b)} = (1/4)/(3/4) = 1/3
Bayes’ Theorem

For mutually exclusive and exhaustive events A_1, A_2, …, A_n:

P(A_i/B) = P(B/A_i) P(A_i) / Σ_j P(B/A_j) P(A_j)
Conditional Probability: Example 3
An insurance company believes that people can be divided into two classes —
those that are accident prone and those that are not. Their statistics show
that an accident-prone person will have an accident at some time within a
fixed 1-year period with probability .4, whereas this probability decreases to
.2 for a non-accident-prone person. If we assume that 30 percent of the
population is accident prone, what is the probability that a new policy holder
will have an accident within a year of purchasing a policy?

Event A: person is accident prone


Event B: person has an accident

P(B) = P(B/A) P(A) + P(B/Ac) P(Ac) = 0.4×0.3 + 0.2×0.7 = 0.26


Sensitivity, Specificity & Predictive Value of a Diagnostic test
• False Positives & False Negatives
• Sensitivity: Probability of a positive test result given the presence of a disease
• Specificity: Probability of a negative test result given the absence of a disease
• Predictive value positive: Person has the disease given his/her test result is positive, P(D/T)
• Predictive value negative: Person does not have the disease given his/her test result is negative, P(D^c/T^c)

                     Disease (D)
Test Result (T)      Present (D)   Absent (D^c)   Total
Positive (T)         a             b              a + b
Negative (T^c)       c             d              c + d
Total                a + c         b + d

Sensitivity = a/(a + c) = P(T/D)
Specificity = d/(b + d) = P(T^c/D^c)
Example 1
A medical research team wished to evaluate a screening test for Alzheimer’s disease. The
test was administered to 450 Alzheimer’s patients and 500 patients with no symptoms of
the disease. The results are as follows:
                     Alzheimer’s Diagnosis (D)
Test Result (T)      Present (D)   Absent (D^c)   Total

Positive (T)         436           5              441

Negative (T^c)       14            495            509

Total                450           500

• Sensitivity = 436/450 = 0.97

• Specificity = 495/500 = 0.99
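The same figures fall straight out of the 2×2 table (an illustrative snippet, not part of the slides):

```python
# 2x2 table from the Alzheimer's screening example:
#               Present (D)   Absent (Dc)
a, b = 436, 5        # positive test row
c, d = 14, 495       # negative test row

sensitivity = a / (a + c)   # P(T/D)
specificity = d / (b + d)   # P(Tc/Dc)
print(round(sensitivity, 2))   # 0.97
print(round(specificity, 2))   # 0.99
```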
Example 2
A laboratory blood test is 99 percent effective in detecting a certain disease when
it is, in fact, present. However, the test also yields a “false positive” result for 1
percent of the healthy persons tested. (That is, if a healthy person is tested, then,
with probability .01, the test result will imply he or she has the disease.) If .5
percent of the population actually has the disease, what is the probability a person
has the disease given that his test result is positive?

D: event that person has the disease


T: event that result is positive
Therefore, P(D/T) = P(DT)/P(T) = P(T/D) P(D) / [P(T/D) P(D) + P(T/D^c) P(D^c)]

Therefore, P(D/T) = (0.99 × 0.005) / (0.99 × 0.005 + 0.01 × 0.995) = 0.33
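This Bayes calculation generalizes to any test; a small helper (a sketch under the assumptions above, with a hypothetical function name) makes the dependence on prevalence explicit:

```python
def predictive_value_positive(sens, spec, prevalence):
    """P(D/T) via Bayes' theorem:
    P(T/D)P(D) / [P(T/D)P(D) + P(T/Dc)P(Dc)]."""
    p_t = sens * prevalence + (1 - spec) * (1 - prevalence)
    return sens * prevalence / p_t

# 99% sensitivity, 1% false-positive rate (99% specificity), 0.5% prevalence
print(round(predictive_value_positive(0.99, 0.99, 0.005), 2))   # 0.33
```

Even a highly accurate test gives a low predictive value positive when the disease is rare.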
Receiver operating characteristic (ROC) Curves

• Test provides several categories of response rather than simply positive or negative results

• Example: ratings of Computed Tomography images by a radiologist


ROC Curve
• ROC curve: plot of sensitivity Vs (1 – specificity)

• Different points correspond to different cutoffs

• Area under the curve represents the diagnostic


accuracy of the test
Independent Events

• Two events E and F are independent if P(EF) = P(E) P(F)

• If E and F are independent, E and F^c are also independent:

Since E = EF + EF^c, P(E) = P(EF) + P(EF^c)

Therefore, P(EF^c) = P(E) − P(EF) = P(E)[1 − P(F)] = P(E) P(F^c)

• If A, B and C are independent, A is independent of (B + C)


Example
A system composed of n separate components is said to be a parallel
system if it functions when at least one of the components functions.
For such a system, if component i, independent of other components,
functions with probability pi , i = 1, . . . , n, what is the probability the
system functions?

P(system functioning) = 1 – P(system not functioning)


= 1 − P(none of the components function)
= 1 − (1 − p₁)(1 − p₂) … (1 − p_n)
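The formula above translates directly into code (an illustrative sketch with a hypothetical function name and example reliabilities):

```python
from math import prod

def parallel_reliability(ps):
    """P(system functions) = 1 - product over i of (1 - p_i),
    assuming components fail independently."""
    return 1 - prod(1 - p for p in ps)

# e.g. three independent components with reliabilities 0.9, 0.8, 0.5:
# P = 1 - (0.1)(0.2)(0.5) = 0.99
print(parallel_reliability([0.9, 0.8, 0.5]))   # ≈ 0.99
```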
Random Variable (RV)

• RV: variable describing outcome of an experiment

• Sum of scores of two fair dice

• Number of patients who are positive for a given test

• Students with height greater than 6 ft

• IC50 dose for different cell lines

• Discrete RV Vs Continuous RV
Sum of scores of two dice throws

X: sum of scores = {2, 3, …, 12}

P(X = 2) = P(X = 12) = 1/36
P(X = 3) = P(X = 11) = 1/18
P(X = 4) = P(X = 10) = 1/12
P(X = 5) = P(X = 9) = 1/9
P(X = 6) = P(X = 8) = 5/36
P(X = 7) = 1/6
pmf & CDF

Probability Mass Function, 𝑝 𝑎 = 𝑃 𝑋 = 𝑎 , 𝑝(𝑥) ≥ 0, ∑ 𝑝 𝑥 = 1

Cumulative Distribution Function, 𝐹 𝑎 = 𝑃(𝑋 ≤ 𝑎)

X      2      3      4      5      6      7      8      9      10     11     12

p(a)   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36

F(a)   1/36   3/36   6/36   10/36  15/36  21/36  26/36  30/36  33/36  35/36  36/36

Bounds: 0 ≤ 𝐹 𝑎 ≤ 1

Non-decreasing function
Example 1
X      1     3      5     7
F(a)   0.5   0.75   0.9   1

What is the value of 𝑃 𝑋 ≤ 3 ?


Ans = 0.75

What is the value of 𝑃 𝑋 = 3 ?


Ans = 0.25

NOTE: 𝑃 𝑎 < 𝑋 ≤ 𝑏 = 𝐹 𝑏 − 𝐹(𝑎)

What is the value of 𝑃 𝑋 = 4 ? Ans = 0


Probability Density Function (pdf)

For continuous RVs, probabilities are computed from a probability density function f(x), e.g., f(x) = Ax², f(x) = C(x² − x⁵)

∫_{−∞}^{∞} f(x) dx = 1

P(X ∈ B) = ∫_B f(x) dx

dF/dx = f(x)

∫_a^b f(x) dx = P(a ≤ x ≤ b)
Example 2

For a random variable X, f(x) = Ax², 0 ≤ x ≤ 1; 0 elsewhere

Calculate the value of A.

"
Since ∫) 𝑓 𝑥 𝑑𝑥 = 1

Therefore, 𝐴[𝑥 +/3]")= 1 implying 𝐴 = 3.


Example 3
Suppose the random variable X has distribution function

F(x) = 0 for x ≤ 0;  F(x) = 1 − e^{−x²} for x > 0

Find P(X > 1)

We have, P(X > 1) = 1 − P(X ≤ 1) = 1 − F(1) = e^{−1}

We can determine f(x) = dF/dx = 2x e^{−x²}

Therefore, P(X > 1) = ∫₁^∞ 2x e^{−x²} dx = ∫₁^∞ e^{−y} dy = e^{−1}  (substituting y = x²)
Expectation of a Random Variable

• Expectation: average value of the random variable = 𝐸 𝑋 = 𝜇

• For a discrete random variable, 𝐸 𝑋 = ∑ 𝑥𝑝(𝑥)

• For a continuous random variable, 𝐸 𝑋 = ∫ 𝑥𝑓 𝑥 𝑑𝑥

• For an unbiased coin toss, 𝐸 𝑋 = 0×0.5 + 1×0.5 = 0.5

"0#0+0*020(
• For one throw of a dice, 𝐸 𝑋 = (
= 3.5
Properties of Expectation
1. Expectation of a constant is a constant, i.e., 𝐸 𝑐 = 𝑐

2. For a discrete random variable, E[g(X)] = Σ g(x) p(x)

3. For a continuous random variable, E[g(X)] = ∫ g(x) f(x) dx

4. E(aX + b) = aE(X) + b

5. E(X^n) = ∫ x^n f(x) dx

6. E(Σ X_i) = Σ E(X_i)

7. Var(X) = E(X − μ)² = E(X²) − [E(X)]² = E(X²) − μ²

8. Var(c) = 0;  Var(aX + b) = a² Var(X)


Best Predictor of a Random Variable

Let c be a candidate predictor of a random variable X; the best predictor minimizes E(X − c)².

Now, E(X − c)² = E[(X − μ) + (μ − c)]² = E(X − μ)² + 2(μ − c) E(X − μ) + (μ − c)²

Since E(X − μ) = 0, this reduces to E(X − c)² = E(X − μ)² + (μ − c)²

Thus, E(X − c)² ≥ E(X − μ)²

Therefore, μ is the best predictor of a random variable


Binomial Distribution
Bernoulli Process

• Each trial results in one of two mutually exclusive outcomes (Success & Failure)
• Probability of success, 𝑝, remains constant from trial to trial
• Trials are independent

Probability of failure: 𝑞 = 1 − 𝑝

Binomial random variable X with parameters (n, p): number of successes in n independent trials, each with probability of success p

Probability of i successes, P(X = i) = C^n_i p^i q^{n−i}, 0 ≤ i ≤ n


Example 1
In a hospital, 85% of pregnancies led to full-term birth, i.e., delivery in week
37 or later. If 5 birth records were selected at random, calculate the
probability that (i) exactly three of the pregnancies led to pre-term birth,
(ii) at least 1 pregnancy led to pre-term birth.

Probability of pre-term birth = p = 1 − 0.85 = 0.15

(i) Prob. = P(X = 3) = C⁵₃ (0.15)³ (0.85)²

(ii) Prob. = P(X ≥ 1) = 1 − P(X = 0) = 1 − (0.85)⁵
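Both answers evaluate numerically in a couple of lines (an illustrative check, not part of the slides):

```python
from math import comb

p = 0.15  # probability of pre-term birth
# (i) exactly three pre-term births out of 5
print(round(comb(5, 3) * p**3 * (1 - p)**2, 4))   # 0.0244
# (ii) at least one pre-term birth
print(round(1 - (1 - p)**5, 4))                   # 0.5563
```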
Example 2
A communications system consists of n components, each of which will,
independently, function with probability p. The total system will be able to
operate effectively if at least one-half of its components function. For what
values of p is a 5-component system more likely to operate effectively than a
3-component system?

C⁵₃ p³(1 − p)² + C⁵₄ p⁴(1 − p) + p⁵ ≥ C³₂ p²(1 − p) + p³

This reduces to 3(p − 1)²(2p − 1) ≥ 0

Implying p ≥ 1/2
Expectation & Variance of a Binomial Distribution

Expectation of a single trial, E(X) = 1 × p + 0 × (1 − p) = p

We can obtain E(X²) = 1² p + 0² q = p

Therefore, Var(X) = E(X²) − [E(X)]² = p − p² = p(1 − p) = pq

Expectation for 𝑛 trials = 𝑝 + 𝑝 + 𝑝 + ⋯ + 𝑝 = 𝑛𝑝

Variance for 𝑛 trials = 𝑝𝑞 + 𝑝𝑞 + 𝑝𝑞 + ⋯ + 𝑝𝑞 = 𝑛𝑝𝑞


Computing Binomial Distribution Function
P(X = k + 1) = [p/(1 − p)] × [(n − k)/(k + 1)] × P(X = k)

Example: n = 6, p = 0.4

P(X = 0) = (0.6)⁶ = 0.0467
P(X = 1) = (4/6) × (6/1) × P(X = 0) = 0.1866
P(X = 2) = (4/6) × (5/2) × P(X = 1) = 0.311
P(X = 3) = (4/6) × (4/3) × P(X = 2) = 0.2765
P(X = 4) = (4/6) × (3/4) × P(X = 3) = 0.1382
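The recursion is easy to implement and avoids computing large factorials; a minimal sketch (hypothetical function name):

```python
def binomial_pmf(n, p):
    """Build P(X = k) for k = 0..n using the recursion
    P(X = k+1) = [p/(1-p)] * [(n-k)/(k+1)] * P(X = k)."""
    probs = [(1 - p) ** n]                 # start from P(X = 0)
    for k in range(n):
        probs.append(probs[-1] * (p / (1 - p)) * (n - k) / (k + 1))
    return probs

probs = binomial_pmf(6, 0.4)
print(round(probs[1], 4))    # 0.1866
print(round(probs[2], 4))    # 0.311
print(round(sum(probs), 6))  # 1.0 — the pmf sums to one
```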
Poisson Distribution
Discrete Random Variable taking values ≥ 0

Examples

• The number of misprints on a page of a book

• The number of people in a community living to 100 years of age

pmf: P(X = i) = e^{−λ} λ^i / i!,  i = 0, 1, 2, …

Poisson Distribution Function

𝑃(𝑋 = 𝑖 + 1) 𝜆
=
𝑃(𝑋 = 𝑖) 𝑖+1
Example
If the average number of claims handled daily by an insurance company is
5, (i) what proportion of days have less than 3 claims? (ii) What is the
probability that there will be 4 claims in exactly 3 of the next 5 days?
Assume that the number of claims on different days is independent.

Average number of claims = 𝜆 = 5

(i) Proportion of days with less than 3 claims = P(X = 0) + P(X = 1) + P(X = 2) = e⁻⁵(1 + 5 + 5²/2!)

(ii) p = P(X = 4) = e⁻⁵ 5⁴/4!;  Prob. = C⁵₃ p³ q²
Moment generating functions

Moment generating function, φ(t) = E(e^{tX}) = ∫ e^{tx} f(x) dx

We can write, φ′(t) = E(X e^{tX})

So, φ′(0) = E(X)

Similarly, φ⁽ⁿ⁾(0) = E(Xⁿ)

For the Poisson distribution, φ(t) = Σ e^{ti} e^{−λ} λ^i/i! = e^{−λ} Σ (λe^t)^i/i! = exp(λ(e^t − 1))

Taking derivatives, φ′(t) = λe^t exp(λ(e^t − 1))

Similarly, φ″(t) = (λe^t)² exp(λ(e^t − 1)) + λe^t exp(λ(e^t − 1))

Therefore, 𝐸 𝑋 = 𝜙 \ 0 = 𝜆, 𝐸 𝑋 # = 𝜆 + 𝜆#

Therefore, 𝑉𝑎𝑟 𝑋 = 𝜆

Mean & Variance are same for the Poisson distribution!

• Sum of independent Poisson random variables is also a Poisson random variable with λ = λ₁ + λ₂

• For large n (n ≥ 100) and small p (p ≤ 0.01), the Binomial distribution can be approximated as a Poisson distribution with parameter λ = np
Markov’s Inequality
If 𝑋 is a random variable that takes only non-negative values, then for 𝑎 > 0,

𝑃(𝑋 ≥ 𝑎) ≤ 𝐸(𝑋)/𝑎
Poisson Distribution
• P(X = i) = e^{−λ} λ^i / i!,  i = 0, 1, 2, …

• Mean & Variance are same for the Poisson distribution

• φ(t) = Σ e^{ti} e^{−λ} λ^i/i! = e^{−λ} Σ (λe^t)^i/i! = exp(λ(e^t − 1))

• Sum of independent Poisson random variables is also a Poisson random variable with λ = λ₁ + λ₂

• For large n (n ≥ 100) and small p (p ≤ 0.01), the Binomial distribution can be approximated as a Poisson distribution with parameter λ = np
Example
The number of defective RT-PCR kits produced daily at a diagnostic lab is Poisson
distributed with mean 4. Over a two day span, what is the probability that the
number of defective kits does not exceed 4?

P(X₁ + X₂ ≤ 4) = Σ_{i=0}^{4} e⁻⁸ 8^i / i!
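The sum evaluates numerically as follows (an illustrative check, not part of the slides):

```python
from math import exp, factorial

def poisson_pmf(lam, i):
    return exp(-lam) * lam ** i / factorial(i)

# Defective kits per day ~ Poisson(4); over two days the total is Poisson(8).
p_at_most_4 = sum(poisson_pmf(8, i) for i in range(5))
print(round(p_at_most_4, 4))   # ≈ 0.0996
```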
Normal Distribution

Probability Density Function f(x) = [1/(σ√(2π))] e^{−(x−μ)²/(2σ²)}

• 68% of data lies within μ ± σ

• 95% of the data lies within μ ± 2σ
• 99.7% of the data lies within μ ± 3σ
Normal Distribution
" Z " /#
1. Moment generating function, 𝜙 𝑡 = 𝑒 bZ0`

2. 𝐸 𝑋 = 𝜇, 𝑉𝑎𝑟 𝑋 = 𝜎 #

[-b
3. Standard Normal Variable, 𝑍 =
`

Random Variable, X
4. 𝑍 follows Normal distribution with mean 0 and variance 1

" -U " /#
5. Standard normal distribution, 𝑓 𝑥 = #a
𝑒
Cumulative Distribution Function

Cumulative Distribution Function,


Φ(x) = P(Z ≤ x) = ∫_{−∞}^{x} (1/√(2π)) e^{−z²/2} dz

P(X < a) = P((X − μ)/σ < (a − μ)/σ) = Φ((a − μ)/σ)

P(a < X < b) = Φ((b − μ)/σ) − Φ((a − μ)/σ)
Example 1
If X is a normal random variable with mean 𝜇 = 3 and variance 𝜎 # = 16, find

""-+
1. 𝑃(𝑋 < 11) = 𝑃 𝑍 < *
= 𝑃 𝑍 < 2 = Φ 2 = 0.9772

-"-+
2. 𝑃(𝑋 > −1) = 𝑃 𝑍 > *
= 𝑃 𝑍 > −1 = 𝑃 𝑍 < 1 = Φ 1 = 0.8413

#-+ d-+ "


3. 𝑃 2 < 𝑋 < 7 = 𝑃 * < 𝑍 < * = 𝑃 − * < 𝑍 < 1
1 1
=Φ 1 −Φ − =Φ 1 − 1−Φ = 0.8413 + 0.5987 − 1 = 0.44
4 4
Example 2

The power W dissipated in a resistor is given by the expression W = 3V²,
where V represents the voltage. If V is a random variable with mean 6 and
standard deviation 1, calculate

a) E(W) = E(3V²) = 3E(V²) = 3[Var(V) + (E(V))²] = 3(1 + 36) = 111

b) P(W > 120) = P(3V² > 120) = P(V > √40) = P(Z > (√40 − 6)/1)
   = P(Z > 0.3246) = 1 − Φ(0.3246) = 0.3727

Note: P(Z < −x) = Φ(−x) = P(Z > x) = 1 − Φ(x)
Example 3
If the yearly precipitation in Pune is a normal random variable with a mean of
12.08 inches and a standard deviation of 3.1 inches, find the probability that
the total precipitation during the next 2 years will exceed 25 inches.

• If 𝑋X follow normal distribution with parameters 𝜇X , 𝜎X , then ∑ 𝑋X follow


normal distribution with mean ∑ 𝜇X and variance ∑ 𝜎X#

Let X₁, X₂ be random variables representing the annual rainfall in the next 2
years.

Then X₁ + X₂ is a normal random variable with mean = 24.16 inches and
standard deviation 3.1√2 = 4.384 inches

Therefore, P(X₁ + X₂ > 25) = P(Z > (25 − 24.16)/4.384) = P(Z > 0.1916) = 0.424
Uniform Distribution

1. f(x) = 1/(b − a) for a ≤ x ≤ b;  0 otherwise

2. P(α ≤ x ≤ β) = (β − α)/(b − a), where [α, β] is a sub-interval of [a, b]

3. E(X) = (a + b)/2,  Var(X) = (b − a)²/12
Exponential Distribution
1. f(x) = λe^{−λx} for x ≥ 0;  0 for x < 0

2. F(x) = 1 − e^{−λx}, implying P(X > x) = e^{−λx}

3. Exponential distribution is memoryless, i.e., no effect of conditioning

4. φ(t) = λ/(λ − t),  t < λ

5. E(X) = 1/λ,  Var(X) = 1/λ²
Example

Suppose that the number of miles that a car can run before its battery wears out is
exponentially distributed with an average value of 10,000 miles. If a person desires
to take a 5,000-mile trip, what is the probability that she will be able to complete
her trip without having to replace her car battery?

"
We have, 𝜆 = "))))

P(remaining lifetime > 5000) = 1 − 𝐹 5000 = 𝑒 -2)))Y = 𝑒 -).2 ≈ 0.6
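As a quick numerical check (not part of the slides):

```python
from math import exp

lam = 1 / 10_000            # rate for a mean battery life of 10,000 miles
p_trip = exp(-lam * 5000)   # P(X > 5000) = e^{-0.5}; by memorylessness,
                            # miles already driven do not change this
print(round(p_trip, 3))     # 0.607
```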


Covariance

• Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E(XY) − E(X)E(Y)

• Cov(X, Y) = Cov(Y, X)

• Cov(X, X) = Var(X)

• Cov(aX, Y) = a Cov(X, Y)

• Cov(X + Z, Y) = Cov(X, Y) + Cov(Z, Y)

• Cov(Σ X_i, Y) = Σ Cov(X_i, Y)

• Var(Σ X_i) = Cov(Σ X_i, Σ X_j) = Σ Cov(X_i, X_i) + Σ_{i≠j} Cov(X_i, X_j)

• Var(Σ X_i) = Σ Var(X_i) + Σ_{i≠j} Cov(X_i, X_j)

• Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

• If X, Y are independent, Cov(X, Y) = 0


Bivariate Data

How are the two variables related?


Correlation Coefficient

• r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²]

• Covariance, S_xy = Σ(x − x̄)(y − ȳ) / (n − 1)

• r = S_xy / (S_x S_y)

• −1 ≤ r ≤ 1
Linear Regression

Fit a line y = a + bx such that the error is minimized.

Normal equations:

Σy = na + bΣx

Σxy = aΣx + bΣx²
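The two normal equations can be solved directly for a and b; a minimal sketch (hypothetical function name, illustrative data):

```python
def fit_line(xs, ys):
    """Solve the normal equations
         sum(y)  = n*a + b*sum(x)
         sum(xy) = a*sum(x) + b*sum(x^2)
    for the least-squares line y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Points lying exactly on y = 1 + 2x are recovered exactly
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)   # 1.0 2.0
```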
Summary
• Counting – Permutations Vs Combinations

• Conditional Probability – Sensitivity, Specificity, Predictive Value, ROC


Curves

• Random Variables - Expectation & Variance

• Special Random Variables – Binomial, Poisson, Uniform, Normal,


Exponential distributions

• Moment Generating Functions

• Covariance

• Correlation & Regression
