Uploaded by S.Waqquas

Lecture Stats


and Economics

Module 1: Probability Theory and Statistical Inference

Spring 2010

Lecture 3: Continuous probability distributions

Priyantha Wijayatunga, Department of Statistics, Umeå University

priyantha.wijayatunga@stat.umu.se

These materials are adapted from copyrighted lecture slides (© 2009 W.H. Freeman and Company) from the homepage of the book The Practice of Business Statistics: Using Data for Decisions, Second Edition, by Moore, McCabe, Duckworth and Alwan.

Continuous probability distributions

Probability density

Sampling distributions

Distributions

Let X denote the number of days a student comes to class (in a week).

The probability distribution is

$$P(X = x) = p(x) = \begin{cases} 0.1 & \text{if } x = 1 \\ 0.2 & \text{if } x = 2 \\ 0.2 & \text{if } x = 3 \\ 0.3 & \text{if } x = 4 \\ 0.2 & \text{if } x = 5 \end{cases}$$

Then:

1) What is the probability that a student comes to class more than 3 days?

2) What is the probability that a student comes to class 2 or 3 days?
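The two questions above amount to adding p(x) over the values of x that make up each event. A minimal Python sketch follows; the names `pmf` and `prob` are illustrative and not part of the lecture.

```python
# Probability distribution from the class-attendance example: days -> p(x).
pmf = {1: 0.1, 2: 0.2, 3: 0.2, 4: 0.3, 5: 0.2}

def prob(event):
    """Add p(x) over every value x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))

total = prob(lambda x: True)                # should be 1 for a valid distribution
more_than_3 = prob(lambda x: x > 3)         # p(4) + p(5)
two_or_three = prob(lambda x: x in (2, 3))  # p(2) + p(3)

print(f"total probability = {total:.1f}")
print(f"P(X > 3)          = {more_than_3:.1f}")
print(f"P(X = 2 or 3)     = {two_or_three:.1f}")
```

Checking that the probabilities add up to 1 is a quick sanity test before answering questions about a discrete distribution.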

Continuous Probability Distributions

A continuous random variable X takes all values in an interval.

Example: There is an infinity of numbers between 0 and 1 (e.g., 0.001, 0.4, 0.0063876).

Its probability distribution is described by a density curve (also called a density function or probability density).

The probability of any event is the area under the density curve for the

values of X that make up the event.

Consider a uniform density curve for the variable X on the interval from 0 to 1.

The probability that X falls between 0.3 and 0.7 is the area under the density curve for that interval:

P(0.3 ≤ X ≤ 0.7) = (0.7 − 0.3) × 1 = 0.4

Density function:

f(x) = 1 for 0 ≤ x ≤ 1

f(x) = 0 for x < 0 or x > 1

Intervals

All continuous probability distributions assign probability 0 to every

individual outcome. Only intervals can have a positive probability, represented

by the area under the density curve for that interval.

P(X = 1) = (1 − 1) × 1 = 0 (the density curve has height 1)

The probability of an interval is the same whether boundary values are included or excluded:

P(0 ≤ X ≤ 0.5) = (0.5 − 0) × 1 = 0.5

P(0 < X < 0.5) = (0.5 − 0) × 1 = 0.5

P(X < 0.5 or X > 0.8) = P(X < 0.5) + P(X > 0.8) = 1 − P(0.5 ≤ X ≤ 0.8) = 0.7

If all possible outcomes are equally likely (for example, obtaining any value from 0 to 1 is equally likely), the density curve is flat and probabilities are areas of rectangles under it:

P(0.3 ≤ X ≤ 0.7) = 0.4

Similarly, P(X < 0.5 or X > 0.8) = 0.5 + 0.2 = 0.7

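If the outcomes are equally likely for any value between two numbers a and b, where a < b (the random variable X can take any value between a and b), then X has a uniform distribution with density

$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$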

Example: The time (in minutes) that a student takes to solve a math problem is known to be any number between 10 and 20, with equal chances.

Find the probability that a student takes more than 6 but less than 12 minutes to solve a given math problem.
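For a uniform density, the probability of an interval is the length of its overlap with [a, b] divided by (b − a). The sketch below is one way to compute it; `uniform_prob` is an illustrative helper, not part of the lecture. Because the density is 0 below 10, only the part of the interval from 10 to 12 contributes in the math-problem example.

```python
def uniform_prob(lo, hi, a=0.0, b=1.0):
    """P(lo < X < hi) for X uniform on [a, b].

    The density is 1 / (b - a) on [a, b] and 0 elsewhere, so the probability
    is the length of the overlap of (lo, hi) with [a, b], divided by (b - a).
    """
    if not a < b:
        raise ValueError("need a < b")
    overlap = min(hi, b) - max(lo, a)
    return max(overlap, 0.0) / (b - a)

p_unit = uniform_prob(0.3, 0.7)           # X uniform on [0, 1]
p_math = uniform_prob(6, 12, a=10, b=20)  # solving time uniform on [10, 20]

print(f"P(0.3 < X < 0.7) = {p_unit:.2f}")
print(f"P(6 < X < 12)    = {p_math:.2f}")
```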

distribution

The shaded area under a density curve shows the proportion, or %, of individuals in a population with values of X between x1 and x2.

Because the probability of drawing one individual at random depends on the frequency of this type of individual in the population, the probability is also the shaded area under the curve.

[Figure: density curve with the shaded area showing the % of individuals with X such that x1 < X < x2]

in a recent year had the normal distribution with mean μ = 18.6 and standard deviation σ = 5.9.

What is the probability that a randomly chosen student scores 21 or higher?
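The test-score question can be checked numerically instead of with a table. This sketch assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Scores modeled as N(mu = 18.6, sigma = 5.9), as stated in the example.
scores = NormalDist(mu=18.6, sigma=5.9)

# P(X >= 21) is the area to the right of 21: 1 minus the area to the left.
p_21_or_higher = 1 - scores.cdf(21)

print(f"P(X >= 21) = {p_21_or_higher:.4f}")
```

The result is about 0.34, so roughly one student in three scores 21 or higher.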

Normal probability distributions

The probability distribution of many random variables is a normal distribution. It shows what values the random variable can take and is used to assign probabilities to those values.

Example: Probability distribution of women's heights. Here, since we chose a woman randomly, her height, X, is a random variable.

To calculate probabilities with the normal distribution, we standardize the random variable (z-score) and use Table A.

Normal distributions

Normal or Gaussian distributions are a family of symmetrical, bell-shaped density curves defined by a mean μ (mu) and a standard deviation σ (sigma): N(μ, σ).

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

e = 2.71828... is the base of the natural logarithm

π = pi = 3.14159...
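The Normal density formula can be written out directly. The sketch below is a plain transcription of it; the function name `normal_pdf` is an illustrative choice, and the values 64.5 and 2.5 are taken from the women's heights example used in this lecture.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the N(mu, sigma) density curve at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The curve is highest at the mean and symmetric about it.
print(f"f(0)   for N(0, 1)      = {normal_pdf(0):.5f}")
print(f"f(62)  for N(64.5, 2.5) = {normal_pdf(62, 64.5, 2.5):.5f}")
print(f"f(67)  for N(64.5, 2.5) = {normal_pdf(67, 64.5, 2.5):.5f}")
```

Note that f(x) is the height of the curve, not a probability; probabilities are areas under the curve.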

Here means are the same (μ = 15) while standard deviations are different (σ = 2, 4, and 6).

Here means are different (μ = 10, 15, and 20) while standard deviations are the same (σ = 3).

[Figure: Normal curve with the inflection point marked; mean μ = 64.5]

Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve N(μ, σ) into the standard Normal curve N(0,1).

N(64.5, 2.5) => N(0,1)

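Standardizing: calculating z-scores

A z-score measures the number of standard deviations that a data value x is from the mean μ.

$$z = \frac{x - \mu}{\sigma}$$

When x is 1 standard deviation larger than the mean, then z = 1:

$$\text{for } x = \mu + \sigma,\quad z = \frac{\mu + \sigma - \mu}{\sigma} = \frac{\sigma}{\sigma} = 1$$

When x is 2 standard deviations larger than the mean, then z = 2:

$$\text{for } x = \mu + 2\sigma,\quad z = \frac{\mu + 2\sigma - \mu}{\sigma} = \frac{2\sigma}{\sigma} = 2$$

When x is smaller than the mean, z is negative.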
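As a small illustration of standardizing, the sketch below computes z-scores for a few heights, assuming the N(64.5, 2.5) distribution of women's heights used in this lecture. The function name `z_score` and the chosen heights are illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

MU, SIGMA = 64.5, 2.5  # women's heights in inches, N(64.5, 2.5)

for height in (62.0, 64.5, 67.0, 69.5):
    print(f'x = {height}"  ->  z = {z_score(height, MU, SIGMA):+.1f}')
```

Heights below the mean give negative z-scores, the mean itself gives z = 0, and each additional 2.5 inches adds 1 to z.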

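Women's heights follow the N(μ, σ) = N(64.5", 2.5") distribution. What percent of women are shorter than 67 inches tall?

mean μ = 64.5"

standard deviation σ = 2.5"

x (height) = 67"

$$z = \frac{x - \mu}{\sigma} = \frac{67 - 64.5}{2.5} = \frac{2.5}{2.5} = 1$$

So x = 67" is 1 standard deviation above the mean (μ = 64.5" corresponds to z = 0 and x = 67" to z = 1).

Because of the 68-95-99.7 rule, we can conclude that the percent of women shorter than 67" should be, approximately, 0.68 + half of (1 − 0.68) = 0.84, or 84%.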

What is the probability, if we pick one woman at random, that her height will be

some value X? For instance, between 68 and 70 inches P(68 < X < 70)?

Because the woman is selected at random, X is a random variable.

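With N(μ, σ) = N(64.5, 2.5) and

$$z = \frac{x - \mu}{\sigma}$$

For x = 68",

$$z = \frac{68 - 64.5}{2.5} = 1.4$$

For x = 70",

$$z = \frac{70 - 64.5}{2.5} = 2.2$$

From Table A, the area to the left of z = 1.4 is 0.9192 and the area to the left of z = 2.2 is 0.9861.

The area under the curve for the interval [68" to 70"] is 0.9861 − 0.9192 = 0.0669.

Thus, the probability that a randomly chosen woman falls into this range is 6.69%.

P(68 < X < 70) = 6.69%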
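The same interval probability can be computed without Table A. This is a sketch using `statistics.NormalDist` from the Python standard library (3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)  # women's heights in inches

area_left_of_70 = heights.cdf(70)
area_left_of_68 = heights.cdf(68)

# Area between two values = larger left-area minus smaller left-area.
p_between = area_left_of_70 - area_left_of_68

print(f"area left of 70  = {area_left_of_70:.4f}")
print(f"area left of 68  = {area_left_of_68:.4f}")
print(f"P(68 < X < 70)   = {p_between:.4f}")
```

The result agrees with the table-based answer of about 0.0669 up to rounding in the last digit.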

Using Table A

Table A gives the area under the standard Normal curve to the left of any z value.

.0082 is the area under N(0,1) left of z = −2.40

.0080 is the area under N(0,1) left of z = −2.41

.0069 is the area under N(0,1) left of z = −2.46

For z = 1.00, the area under the standard Normal curve to the left of z is 0.8413.

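N(μ, σ) = N(64.5", 2.5"), with μ = 64.5" at z = 0 and x = 67" at z = 1

Area to the left of z = 1 ≈ 0.84; area to the right ≈ 0.16

Conclusion:

84.13% of women are shorter than 67".

By subtraction, 1 − 0.8413, or 15.87%, of women are taller than 67".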

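Because the Normal distribution is symmetrical, there are 2 ways to calculate the area under the standard Normal curve to the right of a z-value:

area right of z = area left of −z

area right of z = 1 − area left of z

Example: the area to the left of z = −2.33 is 0.0099, so the area to the right of z = −2.33 is 1 − 0.0099 = 0.9901.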

To calculate the area between 2 z-values, first get the area under N(0,1) to the left for each z-value from Table A. Then subtract the smaller area from the larger area:

area between z1 and z2 = area left of z1 − area left of z2

A common mistake made by students is to subtract the z-values themselves, but the Normal curve is not uniform.

The area under N(0,1) for a single value of z is zero. (Try calculating the area to the left of z minus that same area!)

score at least 820 on the combined math and verbal SAT exam to compete in their first college year. The SAT scores of 2003 were approximately normal with mean 1026 and standard deviation 209.

What proportion of all students would be NCAA qualifiers (SAT ≥ 820)?

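x = 820, μ = 1026, σ = 209

$$z = \frac{x - \mu}{\sigma} = \frac{820 - 1026}{209} = \frac{-206}{209} \approx -0.99$$

Table A: the area under N(0,1) to the left of z = −0.99 is 0.1611, or approx. 16%.

area right of 820 = total area − area left of 820 = 1 − 0.1611 ≈ 84%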

Note: The actual data may contain students who scored exactly 820 on the SAT. However, the proportion of scores exactly equal to 820 is 0 for a normal distribution. This is a consequence of the idealized smoothing of density curves.

The NCAA defines a partial qualifier, eligible to practice and receive an athletic scholarship but not to compete, as a student whose combined SAT score is at least 720.

What proportion of all students who take the SAT would be partial qualifiers? That is, what proportion have scores between 720 and 820?

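x = 720, μ = 1026, σ = 209

$$z = \frac{x - \mu}{\sigma} = \frac{720 - 1026}{209} = \frac{-306}{209} \approx -1.46$$

Table A: the area under N(0,1) to the left of z = −1.46 is 0.0721, or approx. 7%.

area between 720 and 820 = area left of 820 − area left of 720 = 0.1611 − 0.0721 = 0.0890 ≈ 9%

About 9% of all students who take the SAT have scores between 720 and 820.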
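Both SAT proportions can be reproduced in a few lines. The sketch below uses `statistics.NormalDist` (Python 3.8 or later) with the mean 1026 and standard deviation 209 given in the example; the variable names are illustrative. Because the code does not round z to two decimals as Table A does, the results differ slightly from the table-based values.

```python
from statistics import NormalDist

sat = NormalDist(mu=1026, sigma=209)  # combined SAT scores, 2003

area_left_of_820 = sat.cdf(820)
area_left_of_720 = sat.cdf(720)

p_qualifier = 1 - area_left_of_820               # SAT >= 820
p_partial = area_left_of_820 - area_left_of_720  # 720 <= SAT < 820

print(f"z for 820          = {(820 - 1026) / 209:.2f}")
print(f"z for 720          = {(720 - 1026) / 209:.2f}")
print(f"P(SAT >= 820)      = {p_qualifier:.3f}")
print(f"P(720 <= SAT <820) = {p_partial:.3f}")
```

The results are roughly 0.84 and 0.09, matching the 84% and 9% obtained from Table A.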

One advantage of working with normally distributed data is that we can manipulate it and then find answers to questions that involve comparing seemingly non-comparable distributions.

We do this by standardizing the data. All this involves is changing the scale so that the mean is now 0 and the standard deviation is 1. If you do this to different distributions, it makes them comparable.

$$z = \frac{x - \mu}{\sigma} \sim N(0,1)$$

Backward normal calculations: We may also want to find the observed range of values that corresponds to a given proportion under the curve.

For that, we use Table A backward: first find the desired area/proportion in the body of the table, then read the corresponding z-value from the left column and top row.

For an area to the left of 1.25% (0.0125), the z-value is −2.24.

approximately the N(25.7, 5.88) distribution. How many miles per gallon

must a vehicle get to place in the top 10% of all 2001 model compact cars?

1. z = 1.28 is the standardized value with area 0.9 to its left and 0.1 to its right.

2. Unstandardize:

$$\frac{x - 25.7}{5.88} = 1.28$$

Solving for x gives x = 25.7 + 1.28 × 5.88 ≈ 33.2 miles per gallon.
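A backward calculation can also be done with an inverse cumulative distribution function instead of reading Table A in reverse. The sketch below uses `NormalDist.inv_cdf` from the Python standard library (3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Step 1: z-value with area 0.90 to its left (top 10% to its right).
z_top10 = std_normal.inv_cdf(0.90)

# Step 2: unstandardize, x = mu + z * sigma, for N(25.7, 5.88).
mu, sigma = 25.7, 5.88
x_top10 = mu + z_top10 * sigma

# The same answer in one step, working directly with N(25.7, 5.88).
x_direct = NormalDist(mu, sigma).inv_cdf(0.90)

# The earlier table example: area 0.0125 to the left.
z_low = std_normal.inv_cdf(0.0125)

print(f"z with area 0.90 to the left   = {z_top10:.2f}")
print(f"mpg needed for the top 10%     = {x_top10:.1f}")
print(f"z with area 0.0125 to the left = {z_low:.2f}")
```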

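probability tables

[Figure: standard Normal density curve, density plotted against Z, annotated P(Z > 1.87) = 0.03]

$$P\left(\frac{X - 10}{\sqrt{0.3}} > \frac{11.025 - 10}{\sqrt{0.3}}\right) = P(Z > 1.87) = 1 - P(Z \le 1.87) = 1 - 0.9693 = 0.0307$$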

One way to assess if a distribution is indeed approximately normal is to plot the data on a normal quantile plot.

The data points are ranked and the percentile ranks are converted to z-scores with Table A. The z-scores are then used for the x axis, against which the data are plotted on the y axis of the normal quantile plot.

If the distribution is indeed normal, the plot will show a straight line, indicating a good match between the data and a normal distribution.

Systematic deviations from a straight line indicate a non-normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
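The coordinates of a normal quantile plot can be sketched as follows. Two things here are assumptions rather than part of the lecture: the percentile rank of the i-th smallest of n values is taken as (i − 0.5)/n, which is one common convention, and the sample data are hypothetical. The z-scores come from `NormalDist.inv_cdf` instead of Table A.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (z, value) pairs: x axis = z-score, y axis = observed value."""
    n = len(data)
    std_normal = NormalDist()
    ordered = sorted(data)  # rank the data points
    # Assumed convention: percentile rank (i - 0.5) / n stays strictly
    # between 0 and 1, so inv_cdf is always defined.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

# Hypothetical heights in inches, for illustration only.
sample = [61.0, 64.5, 66.2, 63.1, 67.8, 65.0, 62.4]
points = normal_quantile_points(sample)

for z, x in points:
    print(f"z = {z:+.2f}   x = {x}")
```

Plotting these pairs with any charting tool gives the normal quantile plot; points that fall close to a straight line suggest the data are approximately normal.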

[Figure: normal quantile plot of the earnings of 15 black female hourly workers at National Bank. This distribution is roughly Normal except for one low outlier.]

[Figure: normal quantile plot of the salaries of Cincinnati Reds players on opening day of the 2000 season. This distribution is skewed to the right.]

As the number of randomly drawn observations in a sample increases, the mean of the sample x̄ gets closer and closer to the population mean μ.

This is the law of large numbers. It is valid for any population.

Note: We often intuitively expect predictability over a few random observations, but it is wrong. The law of large numbers only applies to really large numbers.
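A short simulation can illustrate the law of large numbers. The population used here is an assumption made for the sketch: an exponential distribution with mean 1, chosen because it is clearly not normal. The seed is fixed only so that the run is repeatable.

```python
import random

random.seed(0)   # fixed seed: repeatable illustration
POP_MEAN = 1.0   # exponential population with rate 1 has mean 1

def sample_mean(n):
    """Mean of n independent draws from the exponential population."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

means = {n: sample_mean(n) for n in (10, 1_000, 100_000)}

for n, m in means.items():
    print(f"n = {n:>7}: sample mean = {m:.4f}, "
          f"distance from population mean = {abs(m - POP_MEAN):.4f}")
```

With only 10 observations the sample mean can be far from 1, while with 100,000 observations it is typically within about 0.01 of it.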

distribution?

The sampling distribution of a statistic is the distribution of all possible values taken by the statistic when all possible samples of a fixed size n are taken from the population. It is a theoretical idea; we do not actually build it.

The sampling distribution of a statistic is the probability distribution of that statistic.

Sampling distribution of x̄ (the sample mean)

We take many random samples of a given size n from a population with mean μ and standard deviation σ.

Some sample means will be above the population mean μ and some will be below, making up the sampling distribution.

[Figure: histogram of some sample averages, illustrating the sampling distribution of x̄]

The mean of the sampling distribution of x̄ is equal to the population mean μ. The standard deviation of the sampling distribution is σ/√n, where n is the sample size.

Mean of a sampling distribution of x̄

There is no tendency for a sample mean to fall systematically above or below μ, even if the distribution of the raw data is skewed. Thus, the mean of the sampling distribution is an unbiased estimate of the population mean μ: it will be correct on average in many samples.

The standard deviation of the sampling distribution is smaller than the standard deviation of the population by a factor of √n. Averages are less variable than individual observations. Also, the results of large samples are less variable than the results of small samples.

populations

When a variable in a population is normally distributed, the sampling distribution of the sample mean for all possible samples of size n is also normally distributed.

If the population is N(μ, σ), then the sampling distribution of the sample mean is N(μ, σ/√n).

Central Limit Theorem: When randomly sampling from any population with mean μ and standard deviation σ, when n is large enough, the sampling distribution of x̄ is approximately normal: x̄ ~ N(μ, σ/√n).

[Figure: four panels showing a population with a strongly skewed distribution and the sampling distributions of x̄ for n = 2, n = 10, and n = 25 observations]

Histogram of 1000 sample means of 50-sized samples

[Figure: density histograms for Bin(5, 0.7) and for the sample means; axes: sample mean, density]

Take 1000 random samples with n = 50 from Bin(5, 0.7) and get their sample means.

The relative frequency distribution is approximately normal (bell shaped), with mean = 3.50164 and sd = 0.1471508.

For comparison, σ/√n = 1.024695/√50 = 0.1449138.
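The experiment with 1000 samples of size 50 from Bin(5, 0.7) can be repeated with a short simulation. This is a sketch, not the code used for the slide: the seed and helper names are illustrative, so the simulated mean and sd will differ slightly from 3.50164 and 0.1471508.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed: repeatable illustration

N_TRIALS, P = 5, 0.7               # population: Bin(5, 0.7)
SAMPLE_SIZE, N_SAMPLES = 50, 1000  # 1000 samples of size 50

def draw_binomial():
    """One Bin(5, 0.7) value: number of successes in 5 independent trials."""
    return sum(random.random() < P for _ in range(N_TRIALS))

sample_means = [
    statistics.fmean(draw_binomial() for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

pop_mean = N_TRIALS * P                     # 3.5
pop_sd = math.sqrt(N_TRIALS * P * (1 - P))  # about 1.024695
theory_sd = pop_sd / math.sqrt(SAMPLE_SIZE)

print(f"mean of sample means = {statistics.fmean(sample_means):.4f} "
      f"(population mean {pop_mean})")
print(f"sd of sample means   = {statistics.stdev(sample_means):.4f} "
      f"(sigma / sqrt(n) = {theory_sd:.4f})")
```

The mean of the sample means should be close to 3.5 and their standard deviation close to σ/√n, as the Central Limit Theorem predicts.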

In a large population of adults, the mean IQ is 112 with standard deviation 20.

Suppose 200 adults are randomly selected for a market research campaign.

The distribution of the sample mean IQ is:

B) Approximately normal, mean 112, standard deviation 20

C) Approximately normal, mean 112 , standard deviation 1.414

D) Approximately normal, mean 112, standard deviation 0.1

Application

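Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mEq/dl. Let's assume that we know a patient whose measured potassium levels vary daily according to a normal distribution N(μ = 3.8, σ = 0.2).

If only one measurement is made, what is the probability that this patient will be misdiagnosed hypokalemic?

$$z = \frac{x - \mu}{\sigma} = \frac{3.5 - 3.8}{0.2} = -1.5, \qquad P(z < -1.5) = 0.0668 \approx 7\%$$

If instead measurements are taken on 4 separate days and averaged, what is the probability of such a misdiagnosis?

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{3.5 - 3.8}{0.2 / \sqrt{4}} = -3, \qquad P(z < -3) = 0.0013 \approx 0.1\%$$

Note: Make sure to standardize (z) using the standard deviation for the sampling distribution.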
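The potassium example can be checked numerically by standardizing with σ/√n, the standard deviation of the sampling distribution. The sketch assumes the diagnosis is based on the average of n measurements (n = 1 and n = 4, as in the formulas above) and uses `statistics.NormalDist` from Python 3.8 or later; the helper name `p_misdiagnosis` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 3.8, 0.2  # patient's potassium level, N(3.8, 0.2)
CUTOFF = 3.5          # below this, hypokalemia is diagnosed

def p_misdiagnosis(n):
    """P(mean of n measurements < CUTOFF), using sd = SIGMA / sqrt(n)."""
    sampling_dist = NormalDist(MU, SIGMA / sqrt(n))
    return sampling_dist.cdf(CUTOFF)

p_single = p_misdiagnosis(1)
p_four = p_misdiagnosis(4)

print(f"one measurement:      z = {(CUTOFF - MU) / SIGMA:.1f}, "
      f"P = {p_single:.4f}")
print(f"mean of four:         z = {(CUTOFF - MU) / (SIGMA / sqrt(4)):.1f}, "
      f"P = {p_four:.4f}")
```

Averaging four measurements halves the standard deviation, which moves the cutoff from 1.5 to 3 standard deviations below the mean and makes a misdiagnosis far less likely.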

Income distribution

Let's consider the very large database of individual incomes from the Bureau of Labor Statistics as our population. It is strongly right skewed.

We take 1000 SRSs of 100 incomes, calculate the sample mean for

each, and make a histogram of these 1000 means.

We also take 1000 SRSs of 25 incomes, calculate the sample mean for

each, and make a histogram of these 1000 means.

Which histogram corresponds to the samples of size 100? Which to the samples of size 25?

It depends on the population distribution. More observations are

required if the population distribution is far from normal.

distribution from a strong skewness or even mild outliers.

skewness and outliers.

Even for strange population distributions, we can assume a normal sampling distribution of the mean and work with it to solve problems.
