DPBS 1203 Business and Economic Statistics


Lecture 4.2
Calculating normal probabilities
• Normal random variables are continuous, so probabilities need to be calculated as integrals
– Suppose the time (in minutes) it takes to assemble a computer is assumed to be X ~ N(50, 100), i.e. mean 50 and variance 100 (so σ = 10)
– What is the probability that a computer is assembled in between 45 and 60 minutes?
– We need tables, as there are no “closed form” analytical solutions for such integrals
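The integral behind this probability can also be approximated numerically. The sketch below is only an illustration, not part of the lecture method: the helper names `normal_pdf` and `integrate_midpoint` are made up here, and the midpoint rule is just one simple way to approximate an area.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def integrate_midpoint(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

# X ~ N(50, 100): mean 50, variance 100, so sigma = 10
p_45_60 = integrate_midpoint(lambda x: normal_pdf(x, 50, 10), 45, 60)
print(round(p_45_60, 4))
```

The printed value can be compared with the answer obtained from the standard normal tables.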
Calculating normal probabilities…
• Strategy for calculations:
– Step 1: Standardize the variable, to yield an equivalent probability statement for a standard normal variable
– Step 2: Use the standard normal tables
  • Be careful as these can come in different forms!
  • Some tables provide P(0 < Z < z)
  • Other tables provide P(-∞ < Z < z)
• What is P(45 < X < 60) in our computer assembly example?
– Standard normal tables are in Black’s Appendix
[Figure: standard normal pdf f(z) and cdf F(z), plotted for z from -4 to 4, illustrating the two table forms P(-∞ < Z < z) and P(0 < Z < z)]
 45 − 50 X − µ 60 − 50 
P (45 < X < 60) = P < < 
 10 σ 10 
= P( −0.5 < Z < 1)
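Step 1 (standardizing) can be checked in a few lines. This is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` in place of printed tables; the helper name `standardize` is hypothetical.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Convert a value of X ~ N(mu, sigma^2) to the matching z-value."""
    return (x - mu) / sigma

mu, sigma = 50, 10                      # X ~ N(50, 100)
z_low = standardize(45, mu, sigma)      # -0.5
z_high = standardize(60, mu, sigma)     # 1.0

# The probability statement for X and the one for Z should agree
p_x = NormalDist(mu, sigma).cdf(60) - NormalDist(mu, sigma).cdf(45)
p_z = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
print(z_low, z_high, abs(p_x - p_z) < 1e-12)
```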
• OK, but suppose we only have tables for P(0 < Z < z) or P(-∞ < Z < z)?
– Solution: we can always manipulate probabilities into the needed form!
• Recall some properties of the normal distribution:
– Symmetric around its mean (which is 0 for the standard normal)
– Area under the whole pdf equals 1 (and area under half of it equals .5)
 45 − 50 X − µ 60 − 50 
P(45 < X < 60) = P < < 
 10 σ 10 
= P(−0.5 < Z < 1)
• P(-0.5 < Z < 1) = P(-0.5 < Z < 0) + P(0 < Z < 1)
• P(-0.5 < Z < 1) = P(-∞ < Z < 1) - P(-∞ < Z < -.5) (take this to one type of standard normal table)
• Now note that:
– P(-0.5 < Z < 0) = P(0 < Z < 0.5) by symmetry (take this to another type of standard normal table)
– P(-0.5 < Z < 0) = .5 – P(-∞ < Z < -.5) (by facts about area under the curve) and P(0 < Z < 1) = .5 – P(-∞ < Z < -1) (by symmetry and facts about area under the curve) (verify using the first type of standard normal table)
P(0 < Z < 1) = .5 – P(-∞ < Z < -1)
= .5 – .1587
= .3413
P(-0.5 < Z < 0) = .5 – P(-∞ < Z < -.5)
= .5 – .3085
= .1915

• Our required probabilities from this table are:
– P(0 < Z < 1) = .3413
– P(0 < Z < 0.5) = .1915
– So P(-0.5 < Z < 1) = .3413 + .1915 = .5328
– So the probability of an observed assembly time being between 45 and 60 minutes is .5328 (provided assembly time really is drawn from a normal distribution!)
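Both table forms should lead to the same answer. The sketch below mimics the two kinds of table with hypothetical helpers (`table_cumulative`, `table_zero_to_z`) built on the standard-library `statistics.NormalDist`; it is a way to check the manipulations, not a replacement for practising with the printed tables.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_cumulative(z):
    """Mimic a table of P(-inf < Z < z)."""
    return Z.cdf(z)

def table_zero_to_z(z):
    """Mimic a table of P(0 < Z < z), which only lists z >= 0."""
    if z < 0:
        raise ValueError("this table form only lists z >= 0")
    return Z.cdf(z) - 0.5

# Route 1: cumulative table, P(-0.5 < Z < 1) = P(Z < 1) - P(Z < -0.5)
route_cumulative = table_cumulative(1) - table_cumulative(-0.5)

# Route 2: zero-to-z table, using symmetry for the negative side
route_zero_to_z = table_zero_to_z(0.5) + table_zero_to_z(1)

print(round(route_cumulative, 4), round(route_zero_to_z, 4))
```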
• To try at home: What is…
– P(Z > 1)?
– P(-1 < Z < 1)?
– P(Z ≥ 1)?
Calculating normal percentiles
• Tables can be used to solve 2 types of problems:
– Given a particular z, find P(0 < Z < z); OR
– Given a particular probability A, find z_A such that P(Z > z_A) = A or P(Z < z_A) = 1 - A
– Note that z_A in the above expression is the [100*(1-A)]th percentile of a standard normal!
Calculating normal percentiles…
• Use tables to verify that z_0.025 = 1.96
• What is the…
– 97.5th percentile?
– 2.5th percentile?
– 97.5th percentile in the computer assembly line example, i.e. in the original distribution?
1.96 = (X_0.025 – 50)/10
X_0.025 = (1.96)10 + 50 = 69.6 minutes
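The percentile calculations can be cross-checked as follows. This sketch assumes the standard-library `statistics.NormalDist`, whose `inv_cdf` plays the role of reading a table "backwards"; the helper name `upper_tail_z` is made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist()                              # standard normal
assembly_time = NormalDist(mu=50, sigma=10)   # X ~ N(50, 100)

def upper_tail_z(a):
    """Return z_A, the value with P(Z > z_A) = a (the 100*(1-a)th percentile)."""
    return Z.inv_cdf(1 - a)

z_975 = upper_tail_z(0.025)                   # 97.5th percentile of Z
z_025 = Z.inv_cdf(0.025)                      # 2.5th percentile of Z
x_975 = 50 + z_975 * 10                       # undo the standardization
x_direct = assembly_time.inv_cdf(0.975)       # same percentile, computed directly

print(round(z_975, 2), round(z_025, 2), round(x_975, 1))
```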
Normal approximation to the binomial
• We have used the formula or the binomial tables to evaluate probabilities for a binomial random variable
– This is convenient for a small number of trials (n)
– What if n is large?
• An important application of the normal distribution is to approximate the binomial distribution for large n
Normal approximation to the binomial…
• Denote a binomial random variable by X_B, and suppose p = .5 and n = 20
– We know that E(X_B) = np = 10 and Var(X_B) = np(1-p) = 5
– It may seem natural to choose the approximating normal random variable as X_N ~ N(10, 5)
– How good is this approximation?
• Consider P(10 ≤ X_B ≤ 12) = .1762 + .1602 + .1201 = .4565, while, using our approximation based on the standard normal distribution,
P(10 ≤ X_N ≤ 12) = P(0 ≤ Z ≤ 0.89) = .3133
(The mean of our X is 10 and its std dev is sqrt(5))
– Yuck! What went wrong with this approximation?
• Would you approximate P(X_B = 10) by P(X_N = 10)?
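The gap between the exact binomial probability and the uncorrected normal approximation can be reproduced with a short sketch. The helper `binom_pmf` is a hypothetical name, and because z is not rounded to 0.89 here, the normal figure may differ slightly from the table-based .3133.

```python
from math import comb
from statistics import NormalDist

n, p = 20, 0.5

def binom_pmf(k, n, p):
    """P(X_B = k) for a binomial with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exact: P(10 <= X_B <= 12)
exact = sum(binom_pmf(k, n, p) for k in range(10, 13))

# Naive approximation: X_N ~ N(np, np(1-p)), with no continuity correction
x_n = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)
naive = x_n.cdf(12) - x_n.cdf(10)

print(round(exact, 4), round(naive, 4))
```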
Normal approximation to the binomial ...
• We need a continuity correction
– We could approximate P(X_B = 10) by P(9.5 ≤ X_N ≤ 10.5)
– In general, we could approximate…
  • P(X_B ≤ x) by P(X_N ≤ (x + 0.5))
  • P(X_B ≥ x) by P(X_N ≥ (x - 0.5))
• Now let’s reconsider our approximation:
– P(10 ≤ X_B ≤ 12) = .4565
– Use P(9.5 ≤ X_N ≤ 12.5) instead of P(10 ≤ X_N ≤ 12)
– Does this improve the approximation?
• Now let’s reconsider our approximation:
P(10 ≤ X_B ≤ 12)
• On a number line it looks like this:
9.5 10 10.5 11 11.5 12 12.5
• Note that in this case the interval includes both 10 and 12. To make sure we capture all of the probability at those endpoints, we go back a little at 10 (to 9.5) and forward a little at 12 (to 12.5). This is what was missing previously.
Normal approximation to the binomial…
 9.5 − 10 X − µ 12.5 − 10 
P(9.5 ≤ X N ≤ 12.5) = P ≤ ≤ 
 5 σ 5 
= P(−0.22 ≤ Z ≤ 1.12)
= P(−0.22 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 1.12)
= .0871 + .3686 = .4557
0.4557 ≈ 0.4565. Not too bad as an approximation!
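The effect of the continuity correction can be checked directly. This is a sketch using the standard-library `statistics.NormalDist`; since z-values are not rounded to two decimals as they are when using tables, the corrected figure may differ from .4557 in the last digit or two.

```python
from math import comb
from statistics import NormalDist

# Exact binomial probability P(10 <= X_B <= 12) with n = 20, p = 0.5
exact = sum(comb(20, k) * 0.5**20 for k in range(10, 13))

x_n = NormalDist(mu=10, sigma=5 ** 0.5)       # X_N ~ N(10, 5)
corrected = x_n.cdf(12.5) - x_n.cdf(9.5)      # P(9.5 <= X_N <= 12.5)

print(round(exact, 4), round(corrected, 4))
```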

• To reinforce this, suppose that now we want:
P(10 < X_B ≤ 12)
• On a number line it looks like this:
9.5 10 10.5 11 11.5 12 12.5
• Note that in this case the interval doesn’t include 10 but does include 12. To capture exactly this, we go forward a little at 10 (to 10.5) to exclude it, and forward a little at 12 (to 12.5) to include it. Try this for other cases.
Normal approximation of binomial: correcting for continuity
Values Being Determined    Correction
x >                        +.50
x ≥                        -.50
x <                        -.50
x ≤                        +.50
≤ x ≤                      -.50 and +.50
< x <                      +.50 and -.50
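The correction rules in the table above can be written as one small helper. This is only a sketch: the function name `corrected_bounds` and its parameters are invented for illustration, and it assumes integer bounds on X_B.

```python
def corrected_bounds(lower=None, lower_strict=False, upper=None, upper_strict=False):
    """Translate integer bounds on X_B into continuity-corrected bounds on X_N.

    lower / upper are integer bounds (None means unbounded on that side);
    *_strict=True means the bound is excluded (> or <) rather than included.
    """
    lo = None if lower is None else lower + (0.5 if lower_strict else -0.5)
    hi = None if upper is None else upper + (-0.5 if upper_strict else 0.5)
    return lo, hi

print(corrected_bounds(lower=110, lower_strict=True))            # x >  110
print(corrected_bounds(lower=10, upper=12))                      # 10 <= x <= 12
print(corrected_bounds(lower=10, lower_strict=True, upper=12))   # 10 <  x <= 12
```

A strict inequality moves the bound inward by .5 so the excluded value is left out; a weak inequality moves it outward by .5 so the included value is fully covered.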
Example: Airline meals
• On a recent flight from Sydney to Perth, all 160 passengers were offered a lunch choice of beef or chicken
• Past data indicates 60% choose beef over chicken
• Passenger choices appear to be independent
• On this flight, what is the probability that more than 110 passengers will choose beef?
The binomial distribution is appropriate, but 160 is a lot of trials! Let’s therefore use a normal approximation to the binomial.
For the normal approximation to the binomial use:
µ = np = 160(.6) = 96
σ = sqrt(np(1 − p)) = sqrt(160(.6)(.4)) = 6.197
P(X_B > 110) = P(X_N > 110.5)
= P(Z > (110.5 − 96)/6.197)
= P(Z > 2.34) = .5 − P(0 < Z < 2.34)
= 0.0096
Thus the airline could justify taking only 110 beef meals on the flight as there is only approximately 1 chance in 100 that they will run out of beef meals.
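With a computer the exact binomial tail is also within reach, so the approximation in the airline example can be compared against it. The sketch below assumes the same n = 160 and p = .6 and uses only the Python standard library.

```python
from math import comb
from statistics import NormalDist

n, p = 160, 0.6
mu = n * p                            # 96
sigma = (n * p * (1 - p)) ** 0.5      # about 6.197

# Normal approximation with continuity correction: P(X_B > 110) ~ P(X_N > 110.5)
approx = 1 - NormalDist(mu, sigma).cdf(110.5)

# Exact binomial tail for comparison: P(X_B >= 111)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(111, n + 1))

print(round(approx, 4), round(exact, 4))
```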
Stages of statistical analysis
• Define and understand the problem
– e.g. suppose a firm wants to determine the effectiveness of its advertising
• Think about what data you would need to address the question
• Collect the appropriate data
• Analyse the data appropriately
– Use sample statistics to describe the problem
  • e.g. what is a typical customer? What proportion of customers recall the firm’s ads?
– Extract information about the population parameters on the basis of sample statistics
  • Suppose 50% of sampled customers recall the ads. What does that sample proportion tell us about the population proportion?
• Communicate results accurately and effectively
Data collection
• In practice we can often find secondary data
– Data collected by someone else (e.g., the Australian Bureau of Statistics), possibly for some other purpose
• Alternatively, we could collect primary data
– e.g. market researchers using mail surveys or phone interviews, customized to ask questions about the impact of an ad
Data collection …
• Observational data measures actual behaviour or outcomes
– e.g. asking people whether they recall an ad or whether they bought the product; or obtaining data from the company on actual purchases and ad campaigns
– Often describes “big data” (large-N data stored in file systems of companies or governments)
• Alternatively, experimental data imposes a treatment and measures resultant behaviour or outcomes
– e.g. deliberately show one group an ad, and compare subsequent purchases by that (treatment) group with another (control) group that didn’t see the ad
Data collection…
• Designing a sample requires...
– Definition of target population
– Method of sampling
• Method of simple random sampling
– A sampling process by which all samples of the same size (n) are equally likely to be chosen from the population of interest
– Avoids problems of selection bias, where the design of the sample systematically favours certain outcomes
  • What’s wrong with phone-in polls on talkback radio?
  • Great example of a self-selected sample
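Drawing a simple random sample can be sketched as below. The population of 200 ID numbers and the sample size of 10 are made-up values for illustration, and the fixed seed is there only so the sketch is repeatable.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 200 students
population = list(range(1, 201))

rng = random.Random(1203)               # fixed seed only so the sketch is repeatable
sample = rng.sample(population, k=10)   # every subset of size 10 is equally likely

print(sorted(sample))
```

Because `sample` draws without replacement from the full frame, no unit can select itself in or out, which is the property a phone-in poll lacks.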
Data collection…
• Producing a simple random sample
– Suppose the target population is this offering of DPBS 1203, and we are interested in measuring student age
– Our sample is everyone who completed the first fortnightly quiz and whose mobile phone number ends in “8”
– Does this constitute a random sample?
  • If not, is it likely to produce results that are still useful in inferring things about the target population?
Threats to random sampling:
• If students did not complete the first quiz, they have no chance of being sampled
• Students with no mobile phone have no chance of being sampled
• Numbers 0-9 are not allocated by chance in determining mobile phone numbers
Therefore, this is unlikely to be a random sample.
But if we used the sample anyway, would its sample statistics be likely to give a distorted view of the age distribution of the target population (the students enrolled in DPBS 1203 this semester)?
• Is not completing the first quiz correlated with age?
• Non-randomness needs to impact the outcome variable in a systematic way in order to be a problem.
Confoundment
• Q: Does radiation from mobile phones cause cancer?
– An observational study would compare cancer rates in a sample of users with rates in a sample of non-users
  • If users have a higher incidence of cancer, is this evidence that mobile phones cause cancer?
– (Note: There is no consistent evidence that this is the case)
– But even if a relationship were found, you would need to account for, or control for, other factors that might explain the finding
– Possible confounding factors:
  • There is a higher use of mobile phones in cities where exposure to other forms of radiation is higher
Confoundment …
• Suppose we design an experimental study of the cancer-mobile phone link
– Subjects are randomly assigned to one of the following:
  • A control group of non-users
  • A treatment group of users
– Wait a few years, and then observe and record differences in cancer rates
• Explain why group allocation is done at random:
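The random allocation step in such a design could look like the sketch below. The subject IDs, the group size of 20 and the seed are all hypothetical; the point is only that a shuffle, not the subjects or the researcher, decides who lands in which group.

```python
import random

# Hypothetical subject IDs; a real study would use its own recruitment list
subjects = [f"subject_{i:02d}" for i in range(1, 21)]

rng = random.Random(42)               # fixed seed only so the sketch is repeatable
shuffled = subjects[:]                # copy so the original order is preserved
rng.shuffle(shuffled)

half = len(shuffled) // 2
control = shuffled[:half]             # non-users
treatment = shuffled[half:]           # users

print(len(control), len(treatment))
```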
Progress report
• We have discussed both discrete and continuous random variables and their probability distributions
• We have introduced discrete and continuous theoretical probability distributions that are useful in representing the distributions of actual data
– Binomial, uniform and normal
– Together, these three distributions offer models for a range of phenomena
– The normal distribution also plays a key role in the theory of estimation
• We have introduced the basics of sampling
• Next week: On to estimation!
