5 Probability Distributions

Uploaded by

Elsai Esb
Probability Distributions

• A probability distribution is a device for indicating the values that a random variable may have by applying the theory of probability

• Random variable = any quantity or characteristic that is able to assume a number of different values such that any particular outcome is determined by chance
• e.g. the number of patients in a pediatric OPD.
cont.…

• Random variables can be either discrete or continuous

• A discrete random variable is able to assume only a finite or countable number of outcomes

• A continuous random variable can take on any value in a specified interval
Cont…

• Therefore, the probability distribution of a random variable is a
  • table
  • graph or
  • formula
that gives the probabilities with which the random variable takes different values or ranges of values.
A. Discrete Probability Distributions

• For a discrete random variable, the probability distribution specifies each of the possible outcomes of the random variable along with the probability that each will occur.
• Examples can be:
  • Frequency distribution
  • Relative frequency distribution
  • Cumulative frequency

• Example: Toss two coins
  • Outcomes: sample space (SS) = {TT, TH, HT, HH}
  • X = number of heads; X = 0, 1, 2

  x      0      1      2
  f      1      2      1
  P(x)   0.25   0.5    0.25
  LCF    0.25   0.75   1
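The two-coin example can be checked by enumerating the sample space. The sketch below is illustrative only: it assumes a fair coin (all four outcomes equally likely), and the variable names are chosen here rather than taken from the slides.

```python
from collections import Counter
from itertools import product

# Sample space for two tosses: TT, TH, HT, HH (assumed equally likely).
sample_space = ["".join(toss) for toss in product("TH", repeat=2)]

# X = number of heads in each outcome; count how often each value occurs.
frequency = Counter(outcome.count("H") for outcome in sample_space)

# Relative frequency of each value of X gives P(x).
total = len(sample_space)
pmf = {x: frequency[x] / total for x in sorted(frequency)}

# Less-than cumulative relative frequency (LCF).
cumulative = {}
running = 0.0
for x, p in pmf.items():
    running += p
    cumulative[x] = running

print(pmf)         # {0: 0.25, 1: 0.5, 2: 0.25}
print(cumulative)  # {0: 0.25, 1: 0.75, 2: 1.0}
```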
Properties of Probability Distribution:
1. $P(x) \ge 0$, if X is discrete.
   $f(x) \ge 0$, if X is continuous.

2. $\sum_{x} P(X = x) = 1$, if X is discrete.
   $\int_{x} f(x)\,dx = 1$, if X is continuous.

Note:
• If X is a continuous random variable then
  $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$
• The probability of a fixed value of a continuous random variable is zero.
  $\Rightarrow P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b)$
• If X is a discrete random variable then
  $P(a < X < b) = \sum_{x=a+1}^{b-1} P(x)$
  $P(a \le X < b) = \sum_{x=a}^{b-1} P(x)$
  $P(a < X \le b) = \sum_{x=a+1}^{b} P(x)$
  $P(a \le X \le b) = \sum_{x=a}^{b} P(x)$

• Probability means area for a continuous random variable.


cont.…

Example: The following data show the number of diagnostic services a patient receives

  Number of services (x)   0      1      2      3      4      5
  P(X = x)                 0.671  0.229  0.053  0.031  0.010  0.006
cont.…

• What is the probability that a patient receives exactly 3 diagnostic services?

P(X = 3) = 0.031

• What is the probability that a patient receives at most one diagnostic service?

P(X ≤ 1) = P(X = 0) + P(X = 1)
= 0.671 + 0.229
= 0.900
cont.…

• What is the probability that a patient receives at least four diagnostic services?
P(X ≥ 4) = P(X = 4) + P(X = 5)
= 0.010 + 0.006
= 0.016
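The three questions on the diagnostic-service data all amount to adding up P(X = x) over the values that satisfy a condition. A minimal sketch, using the probabilities quoted in these slides; the helper name `prob` is hypothetical:

```python
# Probabilities for the number of diagnostic services, as used in the slides.
pmf = {0: 0.671, 1: 0.229, 2: 0.053, 3: 0.031, 4: 0.010, 5: 0.006}

def prob(event):
    """Sum P(X = x) over every value x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))

exactly_three = prob(lambda x: x == 3)
at_most_one = prob(lambda x: x <= 1)
at_least_four = prob(lambda x: x >= 4)

print(round(exactly_three, 3))  # 0.031
print(round(at_most_one, 3))    # 0.9
print(round(at_least_four, 3))  # 0.016
```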

Probability distributions can also be displayed using a graph.

What does it look like?
cont.…
• If a random variable is able to take on a large number of values, then a probability mass function might not be the most useful way to summarize its behavior

• Instead, measures of location and dispersion can be calculated (as long as the data are not categorical)
Definition:
• Let a discrete random variable X assume the values X1, X2, …, Xn with the probabilities P(X1), P(X2), …, P(Xn) respectively. Then the expected value of X, denoted as E(X), is defined as:
  $E(X) = X_1 P(X_1) + X_2 P(X_2) + \dots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i)$
• Let X be a continuous random variable assuming the values in the interval (a, b) such that $\int_{a}^{b} f(x)\,dx = 1$, then
  $E(X) = \int_{a}^{b} x f(x)\,dx$

• The variance of X is given by:
  $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
  where:
  $E(X^2) = \sum_{i=1}^{n} x_i^2 P(X = x_i)$, if X is discrete
  $E(X^2) = \int_{x} x^2 f(x)\,dx$, if X is continuous.
There are some general rules for mathematical expectation.
Let X and Y be random variables and k be a constant.

RULE 1   $E(k) = k$
RULE 2   $\mathrm{Var}(k) = 0$
RULE 3   $E(kX) = kE(X)$
RULE 4   $\mathrm{Var}(kX) = k^2 \mathrm{Var}(X)$
RULE 5   $E(X + Y) = E(X) + E(Y)$
cont.…

• For the diagnostic service data,

Mean (X) = 0(0.671) + 1(0.229) + 2(0.053) + 3(0.031) + 4(0.010) + 5(0.006)
= 0.498 ≈ 0.5

• We would expect an average of 0.5 services for each visit
Discrete cont.…

σ² = (0 − 0.5)²(0.671) + (1 − 0.5)²(0.229) + (2 − 0.5)²(0.053) + (3 − 0.5)²(0.031) + (4 − 0.5)²(0.010) + (5 − 0.5)²(0.006)
= 0.782

Standard deviation = σ = √0.782 = 0.884

Hint: remember calculating variance from grouped data
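The mean, variance and standard deviation of the diagnostic-service data can be reproduced directly from the definitions. This is a sketch rather than a reference implementation; it uses the unrounded mean (0.498) instead of 0.5, which gives the same results to three decimal places.

```python
import math

# Probabilities for the number of diagnostic services, as used in the slides.
pmf = {0: 0.671, 1: 0.229, 2: 0.053, 3: 0.031, 4: 0.010, 5: 0.006}

# Expected value: sum of each value times its probability.
mean = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted squared deviation from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(variance)

print(round(mean, 3), round(variance, 3), round(sd, 3))  # 0.498 0.782 0.884
```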
Cont…

• To obtain the expected value of a discrete random variable X, we multiply each possible outcome by its associated probability and sum over all values with a probability greater than 0.
• Or $E(X) = \sum_{i} x_i P(X = x_i)$, where the $x_i$'s are the values the random variable assumes with positive probability
Example: Consider the random variable representing the number of episodes of diarrhea in the first 2 years of life.
Suppose this random variable has a probability mass function as below

  r          0      1      2      3      4      5      6
  P(X = r)   0.129  0.264  0.271  0.185  0.095  0.039  0.017
Cont…

• What is the expected number of episodes of diarrhoea in the first 2 years of life?

• E(X) = 0(.129) + 1(.264) + 2(.271) + 3(.185) + 4(.095) + 5(.039) + 6(.017) = 2.038

• Thus, on the average a child would be expected to have about 2 episodes of diarrhoea in the first 2 years of life.

• The variance of a discrete random variable X, denoted by Var(X), is defined by
  $\mathrm{Var}(X) = \sum_{i} (x_i - \mu)^2 P(X = x_i)$, where $\mu = E(X)$
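As an illustration, the expected number of diarrhoea episodes can be recomputed from the probability mass function above. The variance is not worked out in the slides; the sketch below obtains it with the shortcut Var(X) = E(X²) − [E(X)]², and the variable names are chosen here for illustration.

```python
# Probability mass function for the number of episodes, from the slide.
episodes = [0, 1, 2, 3, 4, 5, 6]
probs = [0.129, 0.264, 0.271, 0.185, 0.095, 0.039, 0.017]

# E(X): each value weighted by its probability.
e_x = sum(r * p for r, p in zip(episodes, probs))

# E(X^2), then the shortcut formula for the variance.
e_x2 = sum(r ** 2 * p for r, p in zip(episodes, probs))
var_x = e_x2 - e_x ** 2

print(round(e_x, 3))  # 2.038
print(round(var_x, 3))
```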
cont.…

• Examples of discrete probability distributions are the binomial distribution and the Poisson distribution
1. Binomial Distribution

• It is one of the most widely encountered discrete distributions
• Consider a dichotomous random variable
• It is based on the Bernoulli trial: a single trial of some experiment that can result in only one of two mutually exclusive outcomes (success or failure; dead or alive; sick or well; male or female)
Binomial cont.…

Example:
• We are interested in determining whether a newborn
infant will survive until his/her 70th birthday
• Let Y represent the survival status of the child at age 70
years
• Y = 1 if the child survives and Y = 0 if he/she does not

Binomial cont.…

• The outcomes are mutually exclusive and exhaustive

• Suppose that 72% of infants born survive to age 70 years
P(Y = 1) = p = 0.72
P(Y = 0) = 1 − p = 0.28
Binomial cont.…

• A binomial probability distribution occurs when the following requirements are met
1. The procedure has a fixed number of trials
2. The trials must be independent
3. Each trial must have all outcomes classified into two categories
4. The probabilities must remain constant for each trial
Binomial cont.…

Characteristics of a Binomial Distribution

• The experiment consists of n identical trials
• There are only two possible outcomes on each trial, and they are mutually exclusive
• The probability of A (success) remains the same from trial to trial. This probability is denoted by p, and the probability of B (failure) is denoted by q
  q = 1 − p
• The trials are independent
• n and p are the parameters of the binomial distribution
• The mean is np and the variance is np(1 − p)
Binomial cont.…

• Suppose an event can have only binary outcomes A and B

• Let the probability of A be p and that of B be 1 − p

• The probability p stays the same each time the event occurs
Binomial cont.…

• If an experiment is repeated n times and the outcome is independent from one trial to another, the probability that outcome A occurs exactly x times is:

• $P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$, x = 0, 1, 2, ..., n
Binomial cont.…

• n denotes the number of fixed trials
• x denotes the number of successes in the n trials
• p denotes the probability of success
• q denotes the probability of failure (1 − p)

• $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$ is the number of ways of forming a group of x objects from among n objects
• where n! = n(n−1)(n−2)…(1), and 0! = 1
Binomial cont.…

Example:
• Suppose we know that 40% of a certain population are
cigarette smokers. If we take a random sample of 10
people from this population, what is the probability that
we will have exactly 4 smokers in our sample?

Binomial cont.…

• If we assume that the probability that any individual in the population is a smoker is p = 0.40, then the probability that x = 4 smokers out of n = 10 subjects are selected is:

• $P(X = 4) = {}_{10}C_{4}\,(0.4)^{4}(1 - 0.4)^{10-4} = {}_{10}C_{4}\,(0.4)^{4}(0.6)^{6} = 210\,(0.0256)(0.04666) = 0.25$
• Or the probability of obtaining exactly 4 smokers in the sample is about 25%
Binomial cont.…

• We can compute the probability of observing zero smokers out of 10 subjects selected at random, exactly 1 smoker, and so on, and display the results in a table, as given below.

• The third column, P(X ≤ x), gives the cumulative probability. E.g. the probability of selecting 3 or fewer smokers into the sample of 10 subjects is P(X ≤ 3) = .3823, or about 38%.
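The smoker example (n = 10, p = 0.40) can be tabulated with a few lines of code. The sketch below is one possible way to do it; `binomial_pmf` is a name chosen here, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.40

# Build the table: x, P(X = x), and the cumulative probability P(X <= x).
table = []
running = 0.0
for x in range(n + 1):
    prob = binomial_pmf(x, n, p)
    running += prob
    table.append((x, prob, running))
    print(f"{x:2d}  {prob:.4f}  {running:.4f}")

print(round(table[4][1], 4))  # 0.2508  (exactly 4 smokers)
print(round(table[3][2], 4))  # 0.3823  (3 or fewer smokers)
```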
Binomial cont.…

[Table: P(X = x) and P(X ≤ x) for x = 0, 1, ..., 10, with n = 10 and p = 0.40]
Binomial cont.…

Exercise
• Each child born to a particular set of parents has a probability of 0.25 of having blood type O. If these parents have 5 children, what is the probability that:
a. Exactly two of them have blood type O
b. At most 2 have blood type O
c. At least 4 have blood type O
d. Exactly 2 do not have blood type O
Binomial cont.…

a) Solution for ‘a’

$P(X = 2) = \binom{5}{2}(0.25)^{2}(0.75)^{5-2} = 0.2637$
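The same formula handles the other parts of the blood-type exercise once each event is translated into a set of values of X (the number of children with type O). A hedged sketch, with names chosen for illustration; only the answer to part (a) is given in the slides, so no expected output is shown for the rest.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 5, 0.25
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

exactly_two = pmf[2]            # (a) X = 2
at_most_two = sum(pmf[:3])      # (b) X = 0, 1 or 2
at_least_four = sum(pmf[4:])    # (c) X = 4 or 5
two_not_type_o = pmf[3]         # (d) exactly 2 without type O means X = 3

print(round(exactly_two, 4))    # 0.2637
```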
Binomial cont.…

The Mean and Variance of a Binomial Distribution

• Once n and p are specified, we can compute the proportion of success,

• The mean and variance of the distribution are given by:
μ = np, σ² = npq, σ = √(npq)
Binomial cont.…

Example:
• 70% of a certain population has been immunized for polio. If a sample of size 50 is taken, what is the “expected total number” in the sample who have been immunized?
µ = np = 50(.70) = 35

• This tells us that “on the average” we expect to see 35 immunized subjects in a sample of 50 from this population.
Binomial cont.…

• If repeated samples of size 10 are selected from the population of infants born, the mean number of children per sample who survive to age 70 would be
np = (10)(0.72) = 7.2

• The variance would be npq = (10)(0.72)(0.28) = 2.02 and the SD would be √2.02 = 1.42
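The shortcut formulas μ = np and σ² = npq can be cross-checked against the definition of expected value. The sketch below does this for the survival example (n = 10, p = 0.72); the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 10, 0.72
q = 1 - p

# Shortcut formulas for the binomial distribution.
mean = n * p
variance = n * p * q
sd = sqrt(variance)

# Cross-check: E(X) computed from the definition, summing x * P(X = x).
mean_from_pmf = sum(x * comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1))

print(round(mean, 2), round(variance, 2), round(sd, 2))  # 7.2 2.02 1.42
```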
Exercise
• Suppose that in a certain malarious area past experience indicates that the probability that a person with a high fever will be positive for malaria is 0.6. Consider 4 randomly selected patients (with high fever) in that same area.
1) What is the probability that no patient will be positive for malaria?
2) What is the probability that exactly one patient will be positive for malaria?
3) What is the probability that exactly two of the patients will be positive for malaria?
4) What is the probability that all patients will be positive for malaria?
5) Find the mean and the SD of the probability distribution given above.
B. Continuous Probability Distributions

• A continuous random variable can take on any value in a specified interval or range

• With a large number of class intervals, the frequency polygon begins to resemble a smooth curve.

• The probability distribution of X is represented by a smooth curve called a probability density function
Continuous cont.…

• The area under the smooth curve is equal to 1

• The area under the curve between any two points x1 and x2 is the probability that X takes a value between x1 and x2
Continuous cont.…

• Instead of assigning probabilities to specific outcomes of the random variable X, probabilities are assigned to ranges of values
• The probability associated with any one particular value is equal to 0
Continuous cont.…

• We calculate:
Pr [ a < X < b], the probability of an interval of values
of X.

• For the above reason,

is also without meaning

The Normal distribution

• The Normal distribution is the most important probability distribution in statistics
• It is frequently called the “Gaussian distribution” or the bell-shaped curve.
• Variables such as blood pressure, weight, height and serum cholesterol level are approximately normally distributed
Normal cont.…
• Distribution of weights of 57 children; the frequency
distribution consists of intervals with a width of 10 lb.

Normal cont.…

• Now imagine that we increase the number of children to 50,000 and decrease the width of the intervals to 0.01 lb. The histogram would now look more like:
Normal cont.…

• A random variable is said to have a normal distribution if it has a probability distribution that is symmetric and bell-shaped
Normal cont.…

• If we continue to increase the size of the data set and decrease the interval width, we eventually arrive at a smooth curve superimposed on the histogram, called a density curve.

Central limit theorem
• If sample sizes are fairly large, values of the sample mean x̄ (or the sample proportion p̂) in repeated sampling have a very nearly normal distribution.
Normal cont.…

• The concept “probability of X = x” is replaced by the “probability density function $f_X(\cdot)$ evaluated at X = x”
• The probability that the variable assumes any value in an interval between two specific points a and b is given by
  $P(a \le X \le b) = \int_{a}^{b} f_X(x)\,dx$
Normal cont.…

• A random variable X is said to follow a normal distribution if and only if its probability density function (a formula used to represent the distribution of a random variable) is

  $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$
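To make the density formula concrete, the sketch below evaluates it directly and checks numerically that the total area under the curve is approximately 1. The function name `normal_pdf`, the midpoint-rule integration and the ±8σ integration limits are choices made here for illustration, not part of the slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma, evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(mu, sigma, width=8.0, steps=16000):
    """Midpoint-rule approximation of the area between mu - width*sigma and mu + width*sigma."""
    lower = mu - width * sigma
    step = 2 * width * sigma / steps
    return sum(normal_pdf(lower + (i + 0.5) * step, mu, sigma) for i in range(steps)) * step

peak = normal_pdf(0.0, 0.0, 1.0)     # height of the standard normal curve at its mean
total_area = area_under_pdf(0.0, 1.0)

print(round(peak, 4))        # 0.3989
print(round(total_area, 4))  # 1.0
```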
Normal cont.…

• π (pi) = 3.14159
• e = 2.71828, x = Value of X
• Range of possible values of X: -∞ to +∞
• µ = Expected value of X (“the long run average”)
• σ2 = Variance of X
• µ and σ are the parameters of the normal distribution —
they completely define its shape

Normal cont.…
• The normal distribution plays an important role in
statistical inference because:
1. Many real-life distributions are approximately normal.
2. Many other distributions can be almost normalized by
appropriate data transformations (e.g., taking the log).
When log X has a normal distribution, X is said to have
a lognormal distribution.
3. As a sample size increases, the means of samples
drawn from a population of any distribution will
approach the normal distribution. This theorem, when
stated rigorously, is known as the central limit
theorem.
Normal cont.…

1. The mean µ tells you about location
• Increase µ: location shifts right
• Decrease µ: location shifts left
• Shape is unchanged

2. The variance σ² tells you about narrowness or flatness of the bell
• Increase σ²: bell flattens. Extreme values are more likely
• Decrease σ²: bell narrows. Extreme values are less likely
• Location is unchanged
Normal cont.…

Properties of the Normal Distribution

1. It is symmetrical about its mean, μ.
2. The mean, the median and the mode are all equal.
3. The total area under the curve above the x-axis is 1 square unit.
4. The curve never touches the x-axis (the x-axis is an asymptote).
5. As the value of σ increases, the curve becomes more and more flat, and vice versa.

Normal cont.…

6. Perpendiculars erected at:
   μ ± 1 SD contain about 68%;
   μ ± 2 SD contain about 95%;
   μ ± 3 SD contain about 99.7%
   of the area under the curve.

7. The distribution is completely determined by the parameters μ and σ.
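The 68%, 95% and 99.7% figures can be checked numerically. For any normal distribution, the area within k standard deviations of the mean equals erf(k/√2), where erf is the error function from Python's standard `math` module; the helper name `area_within` is chosen here.

```python
import math

def area_within(k):
    """Area under a normal curve between mu - k*sigma and mu + k*sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```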
Normal cont.…

• As a result, we have different normal distributions depending on the values of μ and σ²
• We cannot tabulate every possible distribution
• Tabulated normal probability calculations are available only for the Normal Distribution with µ = 0 and σ² = 1.
Standard Normal Distribution

· It is a normal distribution that has a mean equal to 0 and


a SD equal to 1, and is denoted by N(0, 1)
· The main idea is to standardize all the data that is given
by using Z-scores
· These Z-scores can then be used to find the area (and
thus the probability) under the normal curve

SND cont.…

• Z-transformation: If a random variable X ~ N(μ, σ²), then we can transform it to a SND with the help of the Z-transformation

  $Z = \frac{x - \mu}{\sigma}$

• Z represents the Z-score for a given x value
SND cont.…

• Consider redefining the scale in terms of how many SDs a value lies away from the mean, for a normal distribution with μ = 110 and σ = 15.

  Value x          50   65   80   95   110  125  140  155  170
  SDs from mean    -4   -3   -2   -1   0    1    2    3    4

  SDs from mean are computed using (x − 110)/15 = (x − μ)/σ
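The rescaling in this slide is a one-line computation per value. A minimal sketch using the slide's μ = 110 and σ = 15:

```python
mu, sigma = 110, 15
values = [50, 65, 80, 95, 110, 125, 140, 155, 170]

# Z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mu) / sigma for x in values]

print(z_scores)  # [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
```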
SND cont.…

• This process is known as standardization and gives the position on a normal curve with μ = 0 and σ = 1, i.e., the SND, Z.

• A Z-score is the number of standard deviations that a given x value is above or below the mean.
SND cont.…

Finding normal curve areas

1. The table gives areas between −∞ and the value of z₀.
2. Find the z value in tenths in the column at the left margin and locate its row. Find the hundredths place in the appropriate column.
3. Read the value of the area (P) from the body of the table where the row and column intersect. Values of P are in the form of a decimal point and four places.
SND cont.…

• Standard normal curve and some important divisions.


SND cont.…

a) What is the probability that z < -1.96?

(1) Sketch a normal curve
(2) Draw a line for z = -1.96
(3) Find the area in the table
(4) The answer is the area to the left of the line: P(z < -1.96) = .0250
SND cont.…

b) What is the probability that -1.96 < z < 1.96?

The answer is the area between the two values: P(-1.96 < z < 1.96) = .9750 - .0250 = .9500
SND cont.…

c) What is the probability that z > 1.96?

• The answer is the area to the right of the line, found by subtracting the table value from 1.0000: P(z > 1.96) = 1.0000 - .9750 = .0250
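Where a printed z table is not at hand, the same areas can be computed from the error function, using Φ(z) = ½[1 + erf(z/√2)] for the area to the left of z. The sketch below reproduces answers (a) to (c); the function name `phi` is chosen here.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

left_tail = phi(-1.96)             # (a) P(z < -1.96)
middle = phi(1.96) - phi(-1.96)    # (b) P(-1.96 < z < 1.96)
right_tail = 1.0 - phi(1.96)       # (c) P(z > 1.96)

print(f"{left_tail:.4f} {middle:.4f} {right_tail:.4f}")  # 0.0250 0.9500 0.0250
```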
Applications of the Normal Distribution

• The normal distribution is used as a model to study many different variables.
• We can use the normal distribution to answer probability questions about continuous random variables.

• Following the model of the normal distribution, a given value of x must be converted to a z score before it can be looked up in the z table
Applications cont.…

Example:
• The diastolic blood pressures (DBP) of males 35–44 years of age are normally distributed with µ = 80 mm Hg and σ² = 144 (mm Hg)², so σ = 12 mm Hg

• Therefore, a DBP of 80 + 12 = 92 mm Hg lies 1 SD above the mean
• Suppose individuals with a DBP above 95 mm Hg are considered to be hypertensive
Applications cont.…

a. What is the probability that a randomly selected male has a blood pressure above 95 mm Hg?

Z = (95 − 80)/12 = 1.25

P (Z > 1.25) = 0.1056

• Approximately 10.6% of this population would be classified as hypertensive
Applications cont.…

b. What is the probability that a randomly selected male has a DBP above 110 mm Hg?

Z = (110 − 80)/12 = 2.50

P (Z > 2.50) = 0.0062

• Approximately 0.6% of the population has a DBP above 110 mm Hg
Applications cont.…

c. What is the probability that a randomly selected male has a DBP below 60 mm Hg?

Z = (60 − 80)/12 = -1.67

P (Z < -1.67) = 0.0475

• Approximately 4.8% of the population has a DBP below 60 mm Hg
Applications cont.…

d. What value of DBP cuts off the upper 5% of this population?
• Looking at the table, the value Z = 1.645 cuts off an area of 0.05 in the upper tail
• We want the value of X that corresponds to Z = 1.645
  Z = (X − μ)/σ
  1.645 = (X − 80)/12, so X = 80 + 1.645(12) = 99.7
• Approximately 5% of the men in this population have a DBP greater than 99.7 mm Hg
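The four DBP questions can also be answered without a table, for example with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch under the slide's assumptions (µ = 80, σ = 12). Part (c) comes out slightly different from the table answer, because the table lookup rounds z to −1.67 while the code uses the unrounded z.

```python
from statistics import NormalDist

# DBP model from the slides: normal with mean 80 mm Hg and SD 12 mm Hg.
dbp = NormalDist(mu=80, sigma=12)

p_above_95 = 1 - dbp.cdf(95)        # (a) hypertensive, about 0.106
p_above_110 = 1 - dbp.cdf(110)      # (b) about 0.006
p_below_60 = dbp.cdf(60)            # (c) about 0.048
cutoff_upper_5 = dbp.inv_cdf(0.95)  # (d) value with 5% of the area above it

print(round(p_above_95, 3), round(p_above_110, 3), round(p_below_60, 3))
print(round(cutoff_upper_5, 1))  # 99.7
```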
Exercises

1. If the total cholesterol values for a certain target population are approximately normally distributed with a mean of 200 (mg/100 mL) and a standard deviation of 20 (mg/100 mL), the probability that a person picked at random from this population will have a cholesterol value greater than 240 (mg/100 mL) is
Exercise cont.…

2. Assume that the test scores for a large class are normally distributed with a mean of 74 and a standard deviation of 10.
(a) Suppose that you receive a score of 88. What percent of the class received scores higher than yours?
(b) Suppose that the teacher wants to limit the number of A grades in the class to no more than 20%. What would be the lowest score for an A?
Exercise cont.…

3. Refer to the standard normal distribution. What is the probability of obtaining a z value of:
a) At least 1.25?
b) At least 0.84?
c) Between −1.96 and 1.96?
d) Between 1.22 and 1.85?
e) Between 0.84 and 1.28?
f) Less than 1.72?
g) Less than 1.25?
Exercise cont.…

4. Refer to the standard normal distribution. Find a z value such that the probability of obtaining a larger z value is:
a) 0.05
b) 0.025
c) 0.20
Example 2: Suppose that total carbohydrate intake in 12-14 year old males is normally distributed with mean 124 g/1000 cal and SD 20 g/1000 cal.

A. What percent of boys in this age range have carbohydrate intake above 140 g/1000 cal?

B. What percent of boys in this age range have carbohydrate intake below 90 g/1000 cal?
• Solution: Let X be carbohydrate intake in 12-14-year-old males, with X ∼ N(124, 400)
• A) P(X > 140) = P(Z > (140 − 124)/20) = P(Z > 0.8) = 1 − P(Z < 0.8) = 1 − 0.7881 = 0.2119
• Interpretation: about 21.2% of boys in the age range of 12-14 yrs have carbohydrate intake above 140 g/1000 cal.
• B) P(X < 90) = P(Z < (90 − 124)/20) = P(Z < −1.7) = P(Z > 1.7) = 1 − P(Z < 1.7) = 1 − 0.9554 = 0.0446

• N.B: P(Z < −z) = P(Z > z)


Exercises
1) Assume that among diabetics the fasting blood level of glucose is approximately normally distributed with a mean of 90 mg per 100 ml and SD of 4 mg per 100 ml.
a) What proportion of diabetics have levels between 90 and 125 mg per 100 ml?

b) What proportion of diabetics have levels below 87.4 mg per 100 ml?

c) What level cuts off the lower 10% of diabetics?

d) What are the two levels which encompass 95% of diabetics?
Exercise: Diskin et al. studied common breath metabolites such as ammonia, acetone, isoprene, ethanol and acetaldehyde in five subjects over a period of 30 days. Each day, breath samples were taken and analyzed in the early morning on arrival at the laboratory. For subject A, a 27-year-old female, the ammonia concentration in parts per billion (ppb) followed a normal distribution over 30 days with mean 491 and standard deviation 119. What is the probability that on a random day, the subject's ammonia concentration is between 292 and