Basic Concepts
Normal Distribution & Standard Deviation
Skewed Distribution
Descriptive Statistics (mean, median, mode, 95% confidence interval for a mean, standard deviation, standard error, range)
Epidemiology/Biostatistics Tools
[Worksheet data: a column of 50 case onset dates running from 4/28 through 5/15, tallied against calendar days from 4-May through 14-Jun; the tallies are plotted as a column chart (an epidemic curve).]

2) Then, tally the number of cases by day.

4) Select the block of dates and tallies, and use the graphing tool (choose "Charts", "Column") to create a column chart as shown.
Confidence Intervals for a Single Prevalence or Cumulative Incidence
Wayne W. LaMorte, MD, PhD, MPH
Copyright 2006

Use these first four rows to compute the confidence limits for a single proportion (i.e., a prevalence or a cumulative incidence in one group). These use Wilson's approximation of the exact limits for a binomial distribution. These are not necessarily symmetrically distributed above and below the point estimate. They are more accurate than the approximations that assume large sample size, and they can be used when the sample size is small or the proportion is close to 0 or 1. (See the formula in Rothman K: Epidemiology: An Introduction, Oxford University Press, 2002, page 132.)

The lower section uses Byar's approximation of the exact limits as described by Rothman, K: Epidemiology: An Introduction, Oxford University Press, 2002, page 134. As with Wilson's approximation above, these can be used for small or large samples, and the limits are not necessarily symmetrically distributed above and below the point estimate.
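The Wilson limits can also be computed outside the spreadsheet. A minimal Python sketch of the formula from Rothman (2002, p. 132); the 10-events-in-100-subjects example values are illustrative, not from the worksheet:

```python
import math

def wilson_ci(x, n, z=1.96):
    """Wilson score interval for a single proportion.
    Usable for small n or proportions near 0 or 1; the limits are not
    necessarily symmetric around the point estimate."""
    p = x / n
    center = p + z**2 / (2 * n)
    halfwidth = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return (center - halfwidth) / denom, (center + halfwidth) / denom

lo, hi = wilson_ci(10, 100)   # 10 events in 100 subjects, 95% limits
```

Note that the interval (about 0.055 to 0.174) is not symmetric around the point estimate of 0.10, as the text above describes.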
Chi Squared Test
Wayne W. LaMorte, MD, PhD, MPH
Copyright 2006

Example:

  Observed Data                            Expected Under H0
               + Outcome  - Outcome                     + Outcome  - Outcome
  Exposed          7         124    131    Exposed         4.99     126.01    131
  Non-exposed      1          78     79    Non-exposed     3.01      75.99     79
                   8         202    210                    8        202       210
Enter data into the blue cells to calculate a p-value with the chi squared test.
The chi squared test can also be applied to situations with multiple groups and outcomes. For example, consider the number of runners who finished a marathon in less than 4 hours among those who trained not at all, a little, moderately, or a lot. The Excel function CHITEST will calculate the p-value automatically if you specify the range of actual (observed) frequencies and the range of expected observations. For example:

  Observed Data                      Expected Under H0
              Finished  Didn't finish         Finished  Didn't finish
  Not at all      2           5        7         3.29        3.71       7
  A little        8          30       38        17.86       20.14      38
  Moderately     20          15       35        16.45       18.55      35
  A lot          25          12       37        17.39       19.61      37
                 55          62      117        55          62        117
p-value= 0.000280
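The marathon table can be checked outside Excel as well. A sketch using scipy (not part of the original workbook) reproduces the expected counts and the p-value:

```python
from scipy.stats import chi2_contingency

# Observed data from the marathon example: finishers (<4 h) vs.
# non-finishers by amount of training
observed = [[2, 5],    # not at all
            [8, 30],   # a little
            [20, 15],  # moderately
            [25, 12]]  # a lot

chi2, p, dof, expected = chi2_contingency(observed)
```

With 4 rows and 2 columns there are 3 degrees of freedom, and the p-value agrees with the worksheet's 0.000280.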
The Chi Squared Test is based on the difference between the frequency distribution that was observed and the frequency distribution that would have been expected under the null hypothesis. In the example above, only 8 of 210 subjects had the outcome of interest (3.8095%). Under the null hypothesis, we would expect 3.8095% of the exposed group to have the outcome, and we would expect 3.8095% of the non-exposed group to have the outcome as well. The 2x2 table to the right calculates the frequencies expected under the null hypothesis for each cell. The quantity (O - E)^2 / E is then computed for each cell, and the chi squared statistic is the sum of these:

    Chi squared = Sum over all cells of (O - E)^2 / E

Fisher's Exact Test should be used instead if the number of expected observations in a cell is small.

Fisher's Exact Test
Enter data into the turquoise cells to calculate a p-value.

  Observed Data
              Cases  Controls
  Exposed       23      27      50
  Non-exposed   11      39      50
                34      66     100
Stratified Analysis (for 2 Substrata)

The area below provides an illustration of a stratified analysis that is limited to two substrata. For stratified analysis of case-control data for up to 12 substrata, use the worksheet designed for that purpose. Enter data into the turquoise cells.

  Crude
              Cases  Controls
  Exposed      688     650    1338
  Non-exposed   21      59      80
               709     709    1418

  Stratum 1
              Cases  Controls
  Exposed      647     622    1269
  Non-exposed    2      27      29
               649     649    1298
[Chart data (from Ken Rothman): z values from 2.9 down to 0 and back, with the corresponding p-values and odds ratio values used to draw the p-value function curve.]
  Expected Under H0
              Cases  Controls
  Exposed     17.00    33.00     50
  Non-exposed 17.00    33.00     50
               34      66       100

Select Confidence Level: 80% / 90% / 95%

[Chart: the p-value function for the case-control example, plotted against the Odds Ratio (log scale).]
  Stratum 2
              Cases  Controls
  Exposed       41      28      69
  Non-exposed   19      32      51
                60      60     120

P-value (HOMOG) = 0.0264. This is the p-value for the chi squared test for homogeneity across strata, used to determine if there is effect measure modification.
RR = 3.0202; SE(ln RR) = 0.4532 (the point estimate and standard error used to generate the p-value function curve).
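As a quick check on the tables above, the crude and stratum-specific odds ratios can be computed directly. A Python sketch (the worksheet itself does this with cell formulas):

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 case-control table:
    a, b = exposed cases/controls; c, d = non-exposed cases/controls."""
    return (a * d) / (b * c)

crude = odds_ratio(688, 650, 21, 59)   # crude table above
or1 = odds_ratio(647, 622, 2, 27)      # stratum 1
or2 = odds_ratio(41, 28, 19, 32)       # stratum 2
```

The stratum-specific estimates (about 14.0 and 2.5) diverge sharply, consistent with the small homogeneity p-value reported above.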
Estimating Cumulative Incidence (CI) from Incidence Rates

Relationship of Incidence Rate to Cumulative Incidence (Risk)

Cumulative incidence (the proportion of a population at risk that will develop an outcome in a given period of time) is a measure of risk, and it is an intuitive way to think about possible health outcomes. An incidence rate is less intuitive, because it is really an estimate of the instantaneous rate of disease, i.e. the rate at which new cases are occurring at any particular moment. Incidence rate is therefore more analogous to the speed of a car, which is typically expressed in miles per hour. Time is needed to measure a car's speed, but we don't have to wait a whole hour; we can glance at the speedometer to see the instantaneous rate of travel. Rather than measuring risk per se, incidence rate measures the rate at which new cases of disease occur per unit of time, and time is an integral part of the calculation of incidence rate. In contrast, cumulative incidence or risk assesses the probability of an event occurring during a stated period of observation. Consequently, it is essential to describe the relevant time period in words when discussing cumulative incidence (risk), but time is not an integral part of the calculation. Despite this distinction, these two ways of expressing incidence are obviously related, and incidence rate can be used to estimate cumulative incidence. At first glance it would seem logical that, if the incidence rate remained constant, the cumulative incidence would be equal to the incidence rate times time:

    CI = IR x T

This relationship would hold true if the population were infinitely large, but in a finite population this approximation becomes increasingly inaccurate over time because the size of the population at risk declines over time. Rothman uses the example of a population of 1,000 people who experience a mortality rate of 11 deaths per 1,000 person-years over a period of years; in other words, the rate remains constant. The equation above would lead us to believe that after 50 years the cumulative incidence of death would be CI = IR x T = 11 x 50 = 550 deaths in a population which initially had 1,000 members. In reality, there would be 423 deaths after 50 years. The problem is that the equation above fails to take into account the fact that the size of the population at risk declines over time. After the first year there have been 11 deaths, and the population at risk is smaller than 1,000. As a result, the equation above overestimates the cumulative incidence, because there is an exponential decay in the population at risk. A more accurate mathematical expression that takes this into account is:

    CI = 1 - e^(-IR x T), where 'e' = 2.71828

This constant 'e' arises in many mathematical relationships describing growth or decay over time. If you are using an Excel spreadsheet, you could calculate the CI using the formula:

    CI = 1 - EXP(-IR x T)

In the graph below the upper blue line shows the predicted number of deaths using the approximation CI = IR x T. The lower line, in red, shows the more accurate projection of cumulative deaths using the exponential equation. Nevertheless, note that the prediction from CI = IR x T gives quite reasonable estimates in the early years of follow-up.

[Chart: predicted cumulative deaths over 50 years of follow-up under the two equations.]
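Rothman's 50-year example can be reproduced with a few lines of Python, a sketch of the exponential formula above:

```python
import math

def cumulative_incidence(ir, t):
    """Estimate cumulative incidence from a constant incidence rate:
    CI = 1 - e^(-IR x T)."""
    return 1 - math.exp(-ir * t)

# Rothman's example: 11 deaths per 1,000 person-years for 50 years
ci = cumulative_incidence(11 / 1000, 50)
deaths = 1000 * ci                   # expected deaths in a cohort of 1,000
naive = 1000 * (11 / 1000) * 50      # the CI = IR x T approximation
```

The naive product gives 550 deaths, while the exponential formula gives the 423 deaths quoted in the text.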
Stratified Analysis
For stratified analysis of risk data (cumulative incidence) use the worksheet entitled: Strat. Cohort CI (Rothman)
For stratified analysis of rate data (incidence rates) use the worksheet entitled: Strat. Cohort IR (Rothman)
p-value function

  Expected Under H0
       Diseased  No Disease
        17.00       33.00      50
        17.00       33.00      50
         34          66       100

Confidence Level: 0.95
Attributable Fraction = 0.3529
Lower Bound = 1.15, Upper Bound = 3.82 (confidence limits for the RR; a test-based p-value is also reported)
RR = 2.0909; SE(ln RR) = 0.3136

[Chart: the p-value function plotted against the RR (log scale), with 80%, 90%, and 95% confidence levels marked.]

From Rothman: Epidemiology: An Introduction, p. 121: "... significance testing is only a qualitative proposition. The end result is a declaration of 'significant' or 'not significant' that provides no quantitative clue about the size of the effect. [The p value function]... presents a quantitative visual message about the estimated size of the effect. The message comes in two parts, relating to the strength of the effect and the precision. Strength is conveyed by the location of the curve along the horizontal axis and precision by the spread of the function around the point estimate. Because the p value is only one number, it cannot convey two separate quantitative messages."
Sample Size Calculations

Part I - Sample Size Calculations for Means

Anticipated Values: Put your anticipated values in the blue boxes.

           Mean   Stan. Dev.
Group 1     205       30
Group 2     200       30

Difference in means = 2.44%

The cells in the table below show the estimated number of subjects needed in each group to demonstrate a statistically significant difference at "p" values ranging from 0.10 to 0.01 and at varying levels of "power." [Power is the probability of finding a statistically significant difference, assuming it exists, at a given "p" value.]

For two proportions:
p1 = 0.6
p2 = 0.9
overall p = 0.75
Effect Size = 0.6928
Z(1-a/2) = 1.96
Z(1-b) = 0.84
n = 32.67 per group
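The proportions example above (p1 = 0.6, p2 = 0.9, n = 32.67) can be reproduced with the standard normal-approximation formula. A Python sketch, assuming the worksheet uses this effect-size formulation (which matches its numbers exactly):

```python
import math

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Sample size per group for comparing two proportions:
    ES = |p1 - p2| / sqrt(pbar * (1 - pbar)), n = 2 * (Za + Zb)^2 / ES^2."""
    pbar = (p1 + p2) / 2
    effect_size = abs(p1 - p2) / math.sqrt(pbar * (1 - pbar))
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2

n = n_per_group(0.6, 0.9)   # the worksheet example
```

In practice n is rounded up, i.e. 33 subjects per group for 80% power at a two-sided alpha of 0.05.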
Random Assignment

This program uses a random number generator to assign subjects randomly to a group. You need to specify how many groups you want in the first blue cell. You then need to "spark" the random number generator by entering some number (ANY number) in the 2nd blue cell. Enter a number and click outside the cell; this will generate a random number and specify to which group the subject should be assigned, based on how many groups you specified.

Enter a seed #: 7
Assign to Group: 1
(random # generated: 0.135125)
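The same scheme can be sketched in Python; `assign` is a hypothetical helper, not a function in the workbook, and the seed plays the role of the worksheet's "spark" number:

```python
import random

def assign(n_groups, seed):
    """Randomly assign a subject to one of n_groups, seeded so that
    the assignment is reproducible."""
    rng = random.Random(seed)
    return rng.randrange(n_groups) + 1   # groups numbered 1..n_groups

group = assign(2, 7)   # e.g., two groups, seed 7
```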
The Normal Distribution: Mean, Variance, and Standard Deviation

Many biological characteristics that are measurements follow a Normal distribution fairly closely, meaning their frequency distributions are bell-shaped and symmetrical around a mean or average value. The shape of the bell will vary between tall & skinny for samples with relatively little variability to short & wide for samples that have a lot of variability. To the right and left in green are two datasets that show values of body mass index (BMI). I can graph the frequency distribution of each dataset by following these steps: 1) Select the block of data. 2) Click on "Data" from the tool bar above, and choose "Sort". 3) Indicate that it is to be sorted according to the column the data is located in, and select "Ok." With the data sorted, I can easily determine the minimum and maximum values and the frequency of each value in the range; put these tallies in the smaller table entitled "Counts for Bar Chart". 4) Select the 2-column block of data in the "Counts for Bar Chart" and click on the Graph icon from the toolbar above (the miniature, multicolored vertical bar chart). Indicate a vertical bar chart. Note: If it graphs the BMI and Frequency as two separate entities, you may have to first create the chart as an "XY Scatter" to indicate that they are related, and then convert the chart to a vertical bar. Scroll down to view the graph and for information about mean, standard deviation, etc.

Counts for Bar Chart

  BMI   Data set 1 frequency   Data set 2 frequency
  22            1                      0
  23            3                      1
  24            2                      2
  25            5                      5
  26            7                      7
  27            9                     14
  28            6                      6
  29            5                      5
  30            3                      3
  31            1                      1
  32            1                      0
  33            1                      0

[Column charts of the two frequency distributions, BMI 22-33 on the x-axis.]

Data set 1: Mean 27.00, Variance 5.77, Standard Deviation 2.40
Data set 2: Mean 27.05, Variance 3.02, Standard Deviation 1.74

Variability in the data can be quantified from the variance, which basically calculates the average squared distance of the observations from the mean (the x with the "bar" over it):

    variance = Sum of (x - xbar)^2 / (n - 1)

Standard deviation is just the square root of the variance, and it is convenient because the mean +/- 1 SD captures about 68% of the observations, and the mean +/- 2 SD captures 95% of the observations.

Note the functions that are used to calculate variance and SD in Excel. Then compare the standard deviations for the two data sets and how these affect the shape of the frequency distributions.

Using SD versus SEM: A standard deviation from a sample is an estimate of the population SD, e.g. the variability of weight in the population. The SEM is a measure of the precision of our estimate of the population's mean; precision increases as the sample size increases, i.e. the SEM will be narrower with larger samples. If the purpose is to describe a group of patients, for example, to see if they are typical in their variability, use the SD (e.g., Tables 2 & 3 in Gottlieb et al.: N. Engl. J. Med. 1981; 305:1425-31). However, if the purpose is to estimate the mean or the prevalence of disease, one should use SEM or a confidence interval.
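The summary statistics for data set 1 can be verified by expanding the counts table and using Python's statistics module (a sketch):

```python
import statistics

# Expand the "Counts for Bar Chart" table for data set 1 into raw BMI values
counts = {22: 1, 23: 3, 24: 2, 25: 5, 26: 7, 27: 9, 28: 6,
          29: 5, 30: 3, 31: 1, 32: 1, 33: 1}
bmi = [value for value, freq in counts.items() for _ in range(freq)]

mean = statistics.mean(bmi)            # 27.00
variance = statistics.variance(bmi)    # sample variance, n - 1 denominator
sd = statistics.stdev(bmi)             # 2.40
sem = sd / len(bmi) ** 0.5             # standard error of the mean
```

Note that `statistics.variance` uses the n - 1 denominator, matching the formula in the text and Excel's VAR function.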
Adapted from Dr. Tim Heeren, Boston University School of Public Health, Dept. of Biostatistics

For specific strata of a population (e.g. age groups) indicate the number of observed events and the number of people in the stratum in columns E and F. Indicate the distribution of some standard reference population in column C. [Leave a "1" in column F for extra strata to prevent calculation error.]

          Distribution of
          Reference       Number of  Number of  Proportion
Stratum   Population      Events     Subjects   or "Rate"     SE
<5           0.20            200      46000      0.00435    0.00031
5-19         0.40            900      23000      0.03913    0.00128
20-44        0.40           2000      30000      0.06667    0.00144
Totals       1.00           3100      99000

Crude Rate: 0.03131
Standardized Proportion or "Rate": 0.04319
Standard Error: 0.00077
95% CI for Standardized Rate: 0.04167 - 0.04470

Example: Suppose you want to compare Florida and Alaska with respect to death rates from cancer. The problem is that death rates are markedly affected by age, and Florida and Alaska have different age distributions. However, we can calculate age-adjusted rates by using a reference or "standard" distribution to determine what the overall rates for Florida and Alaska would have been if their populations had similar distributions. The calculation uses the age-specific rates observed for each population and calculates a weighted average using the "standard" population's distribution for weighting. In this case, the US age distribution in 1988 was used as a standard, but you can use any other standard. Note that the crude rates for Florida and Alaska differ substantially (1,061 per 100,000 vs. 391 per 100,000), but Florida has a higher percentage of old people. The standardized (age-adjusted) rates are very similar (797 vs. 750 per 100,000).

Florida:
          Distribution of
          US Population   Number of  Number of   Proportion
Stratum   in 1988         Events     Subjects    or "Rate"     SE
<5           0.07            2414      850000     0.00284    0.00006
5-19         0.22            1300     2280000     0.00057    0.00002
20-44        0.40            8732     4410000     0.00198    0.00002
45-64        0.19           21190     2600000     0.00815    0.00006
65+          0.12           97350     2200000     0.04425    0.00014
Totals       1.00          130986    12340000

Crude Rate: 0.01061 (130,986 / 12,340,000 = 1,061 per 100,000)
Standardized Proportion or "Rate": 0.00797
Standard Error: 0.00002
95% CI for Standardized Rate: 0.00793 - 0.00802

Alaska:
          Distribution of
          US Population   Number of  Number of   Proportion
Stratum   in 1988         Events     Subjects    or "Rate"     SE
<5           0.07             164       60000     0.00273    0.00021
5-19         0.22              85      130000     0.00065    0.00007
20-44        0.40             450      240000     0.00188    0.00009
45-64        0.19             503       80000     0.00629    0.00028
65+          0.12             870       20000     0.04350    0.00144
Totals       1.00            2072      530000

Crude Rate: 0.00391 (2,072 / 530,000 = 391 per 100,000)
Standardized Proportion or "Rate": 0.00750
Standard Error: 0.00019
95% CI for Standardized Rate: 0.00714 - 0.00786
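The weighted-average calculation for Florida can be sketched in Python, reproducing the standardized rate of about 797 per 100,000:

```python
# Direct age standardization of the Florida rates, using the 1988 US
# age distribution as weights (values from the table above)
weights = [0.07, 0.22, 0.40, 0.19, 0.12]            # US 1988 distribution
deaths  = [2414, 1300, 8732, 21190, 97350]           # Florida deaths by age
pop     = [850000, 2280000, 4410000, 2600000, 2200000]

rates = [d / n for d, n in zip(deaths, pop)]         # age-specific rates
standardized = sum(w * r for w, r in zip(weights, rates))
crude = sum(deaths) / sum(pop)
```

The same code with the Alaska columns gives the standardized rate of about 750 per 100,000, illustrating how standardization removes the effect of the differing age distributions.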
Indirect Standardization: Standardized Incidence Ratio (SIR)

Example:
          State Cancer   # People in   Expected #    Observed #
          Rate           Community     Community     Community
Stratum   (Standard)     Strata        Cancers       Cancers
<20          0.00010        74657          7.5           11
20-44        0.00020       134957         27.0           25
45-64        0.00050        54463         27.2           30
65-74        0.00150        25136         37.7           40
75-84        0.00180        17012         30.6           30
85+          0.00100         6337          6.3            8
Totals                     312562        136.4          144

Standardized Incidence Ratio (SIR): 106
Lower 95% Confidence Limit: 88
Upper 95% Confidence Limit: 123

The expected number of cancers in each stratum is the state (standard) rate times the number of people in the community stratum. If the observed count is greater than 30, the confidence interval for the observed count is calculated using the Poisson distribution to approximate the distribution of the observed counts.
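The expected counts and SIR can be reproduced in a few lines; a Python sketch of the calculation described above:

```python
# Standardized Incidence Ratio for the community example:
# expected counts come from applying the state (standard) rates to the
# community's age strata
state_rates = [0.0001, 0.0002, 0.0005, 0.0015, 0.0018, 0.0010]
community_n = [74657, 134957, 54463, 25136, 17012, 6337]
observed = 144

expected = sum(r * n for r, n in zip(state_rates, community_n))
sir = 100 * observed / expected          # expressed per 100, as above
```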
Screening

                 Gold Standard
                    +        -
  Test     +      4235    13337    17572      PPV = 0.241
  Result   -      1755    53563    55318      NPV = 0.968
                  5990    66900    72890

  Sensitivity = 0.707     Specificity = 0.801
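The four screening measures follow directly from the 2x2 table; a Python sketch:

```python
# Screening example above: test result vs. gold standard
tp, fp = 4235, 13337    # test positive: true and false positives
fn, tn = 1755, 53563    # test negative: false and true negatives

sensitivity = tp / (tp + fn)   # probability the test is + given disease
specificity = tn / (tn + fp)   # probability the test is - given no disease
ppv = tp / (tp + fp)           # predictive value of a positive test
npv = tn / (tn + fn)           # predictive value of a negative test
```

Note that sensitivity and specificity are computed down the gold-standard columns, while PPV and NPV are computed across the test-result rows.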
Descriptive Statistics: Mean, Median, Mode, 95% confidence interval for a mean, Standard Deviation, Standard Error, Range (minimum and maximum)
Wayne W. LaMorte, MD, PhD, MPH
Copyright 2006

  N                       12        Median     19
  Mean                    17.83     Mode       17
  STD (Stand. Dev.)        4.80     Minimum     7
  Std Error                1.39     Maximum    23
  T-critical               2.12
  T-critical * std err     2.94

Note: Using the Excel 'CONFIDENCE' function gives the same thing as 1.96 x std error:
  CONFIDENCE          2.72
  1.96 * Std Error =  2.72

Example Data: 14, 17, 22, 18, 22, 17, 12, 7, 20, 21, 21, 23

Note: This worksheet is currently under development.
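The worksheet's values can be reproduced with Python's statistics module (a sketch; the interval shown uses the large-sample z of 1.96, as in the CONFIDENCE note above):

```python
import statistics

data = [14, 17, 22, 18, 22, 17, 12, 7, 20, 21, 21, 23]  # example data above

n = len(data)
mean = statistics.mean(data)        # 17.83
median = statistics.median(data)    # 19
sd = statistics.stdev(data)         # 4.80
sem = sd / n ** 0.5                 # 1.39
# Large-sample 95% CI for the mean (the 1.96 x std error version)
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
```

For a sample this small, a t-based multiplier (as in the T-critical rows above) gives a slightly wider interval than 1.96.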
Skewed Distributions

Examining the frequency distribution of a data set is an important first step in analysis. It gives an overall picture of the data, and the distribution determines the appropriate statistical analysis. Many statistical tests rely on the assumption that the data are normally distributed, but this isn't always the case. Below in the green cells is a data set with hospital length of stay (days) for two sets of patients who underwent surgery. One data set was collected before instituting a new clinical pathway and one set was collected after instituting it.

Question: Was LOS different after instituting the pathway?

LOS (days)
Before: 3, 12, 2, 1, 11, 4, 2, 2, 3, 1, 8, 2, 3, 6, 1, 13, 3, 8, 10, 6, 4, 12, 9, 7, 1, 3, 3, 2   (mean 5.07)
After:  3, 1, 1, 5, 1, 6, 1, 5, 2, 3, 3, 1, 5, 2, 2, 2, 3, 3, 7, 3, 4, 1, 3, 3, 2, 2, 2, 4   (mean 2.86)

We can rapidly get a feel for what is going on here by creating a frequency histogram. The first step is to sort each of the data sets. Begin by selecting the "before" values of LOS. Then, from the top toolbar, click on "Data", "Sort" (if you get a warning about adjacent data, just indicate you want to continue with the current selection). Also, indicate that there is no "header" row and that you want to sort in ascending order. Repeat this procedure for the other data set.

[Histograms of LOS before and after instituting the pathway, days 1-15 on the x-axis.]
The columns to the left show body mass index for the two groups; group 1 was untreated and group 2 was treated with a regimen of diet and exercise for 4 months. Values range from 22-34, and there is considerable overlap between the groups, but it looks like just about all subjects reduced their BMI somewhat in response to treatment. In an unpaired test the null hypothesis is that the means are the same, but in a paired test the null hypothesis is that the mean difference between the pairs is zero.
T-tests calculate a "t" statistic that takes into account the difference between the means, the variability in the data, and the number of observations in each group. Based on the "t" statistic and the degrees of freedom (total observations in the two groups minus 2), one can look up the probability of observing a difference this great or greater if the null hypothesis were true.

Group 1: 4.5, 5.0, 5.3, 5.3, 6.0, 6.0, 7.6, 7.7, 6.4, 7.2, 7.0, 5.6, 8.4, 8.3, 9.5
Group 2: 4.2, 7.2, 8.0, 3.5, 6.3, 5.1, 4.6, 4.8, 2.0, 5.0, 5.4

                                     Group 1   Group 2
  N                                    15        11
  Mean                                  6.7       5.1
  Variance                              2.10      2.77
  SD                                    1.45      1.66
  SEM (standard error of the mean)      0.37      0.50

  Two-tailed p-value by t-test for equal variance:    0.02
  Two-tailed p-value by t-test for unequal variance:  0.02

From a practical point of view, Excel provides built-in functions that make t-tests easy. Click on cell C44 to see the function used for a t-test with equal variance. One specifies:
• the cells where the first group's data is found,
• the cells where the second group's data is found,
• then whether it is a 2-tailed test or a 1-tailed test, and
• finally a "2" to indicate a test for equal variance.

If the variance is unequal, there is a modified calculation that one can get by specifying "3" as the last parameter in the function (compare the formulae in cells C44 & C45). As a rule of thumb, if one standard deviation is more than twice the other, you should use the unequal variance test.

Note also that the two groups do not have to have the same number of subjects.

Finally, note that in this case we are estimating the means in each group to test whether they are different; consequently, it is appropriate to calculate SEM, which is SD divided by the square root of N.

The t-test is a "parametric" test, because it relies on the legitimate use of the means and standard deviations, which are the parameters that define normally distributed continuous variables. If the groups you want to compare are clearly skewed (i.e. do not conform to a Normal distribution), you have two options:
1) Sometimes you can "transform" the data, e.g. by taking the log of each observation; if the logged values are normally distributed, you can then do a t-test on the transformed data; this is legitimate.
2) Alternatively, you can use a non-parametric test that does not assume normality (e.g., a rank-based test such as the Mann-Whitney U test).

A second data set in the worksheet compares scores for subjects who failed versus those who were ok:

            Failed                            OK
  Data:     56, 37, 57, 39, 35, 40, 66, 19    19, 25, 38
  Mean:     43.6                              27.3
  Variance: 227.4                             94.3
  SD:       15.1                              9.7
  Two-tailed p-value: 0.08

[Histogram of the "failed" and "OK" frequencies in bins 10-20, 21-30, 31-40, 41-50, 51+.]
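Outside Excel, the same comparison can be run with scipy (a sketch; both versions of the test reproduce the two-tailed p of about 0.02 reported above):

```python
from scipy.stats import ttest_ind

group1 = [4.5, 5.0, 5.3, 5.3, 6.0, 6.0, 7.6, 7.7, 6.4, 7.2, 7.0,
          5.6, 8.4, 8.3, 9.5]
group2 = [4.2, 7.2, 8.0, 3.5, 6.3, 5.1, 4.6, 4.8, 2.0, 5.0, 5.4]

t_eq, p_eq = ttest_ind(group1, group2)                       # equal variances
t_uneq, p_uneq = ttest_ind(group1, group2, equal_var=False)  # Welch's t-test
```

`equal_var=False` plays the role of the "3" parameter in Excel's function, i.e. the unequal-variance (Welch) test.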
Correlation

To calculate the correlation coefficient for the relationship between two columns of values, one would use "=CORREL(B3:B10,C3:C10)". Finally, in H7, I squared the correlation coefficient to calculate "r-squared", the proportion of the variability in one variable that is explained by the other. I used the graphing tool to plot the individual data points (blue diamonds) and the line of best fit (pink line).

[Chart: data points and the fitted line; x-axis Weeks (0-12), y-axis 0-6000.]
Analysis of Variance (ANOVA)

Click on "Tools" (above) and then on "Add-Ins" and select "Analysis ToolPak". When you return to "Tools," you will see a new selection ("Data Analysis") at the bottom of the menu, which provides ANOVA and other procedures.

The columns of data below are serum creatinine levels among 4 groups of subjects. A one-factor analysis of variance can be performed to determine whether there are significant differences in the means of these groups.

Select the block of data (including column labels) from B2:E27. Then, from the upper menu, select "Tools", then "Data Analysis", then "Single Factor Analysis of Variance". Check the box for labels, and specify the Output Range as G12. The result is shown in the box below.

The p-value (0.0764) indicates differences in means that do not meet the conventional criterion for statistical significance.

Controls  Aortoiliac  Fem-AK Pop  Fem-Distal
0.7       1.1         1.5         1.2
1.2       1.3         1.1         0.8
1.1       0.9         0.8         0.7
0.7       0.7         0.9         0.7
1.0       0.8         1.1         8.4
0.5       1.4         0.9         1.8
1.6       0.5         7.0         0.8
0.8       1.1         1.4         1.0
0.6       2.0         0.8         0.7
0.6       0.8         1.1         2.8
0.6       0.7         0.6         1.5
1.3       1.4         1.2         0.6
0.5       1.1         0.6         1.3
1.0       1.5         1.2         0.5
1.0       1.0         0.6         1.2
0.8       0.9         0.8         8.2
0.8       0.9         0.8         0.4
0.6       0.6         1.3         0.6
0.5       0.9         1.3         1.6
0.9       0.9         1.5         0.5
0.7       1.2         1.5         11.4
0.7       1.2         0.4         0.8
0.7       1.3         12.9        0.7
0.7       0.4         1.1         0.6
1.1       0.7         8.6         0.9
Means:    0.828  1.012  2.040  1.988

Anova: Single Factor

SUMMARY
Groups      Count
Controls    25
Aortoiliac  25
Fem-AK Pop  25
Fem-Distal  25

ANOVA
Source of Variation   SS         df   MS         F           P-value        F crit
Between Groups        30.3779    3    10.12597   2.35794221  0.0764786914   2.699393
Within Groups         412.2632   96   4.294408
Total                 442.6411   99
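The ANOVA table can be reproduced outside Excel. This sketch recomputes the between- and within-group sums of squares and the F statistic directly from the four creatinine columns above:

```python
from statistics import fmean

# Serum creatinine data transcribed from the four columns above
controls   = [0.7,1.2,1.1,0.7,1.0,0.5,1.6,0.8,0.6,0.6,0.6,1.3,0.5,1.0,1.0,
              0.8,0.8,0.6,0.5,0.9,0.7,0.7,0.7,0.7,1.1]
aortoiliac = [1.1,1.3,0.9,0.7,0.8,1.4,0.5,1.1,2.0,0.8,0.7,1.4,1.1,1.5,1.0,
              0.9,0.9,0.6,0.9,0.9,1.2,1.2,1.3,0.4,0.7]
fem_ak_pop = [1.5,1.1,0.8,0.9,1.1,0.9,7.0,1.4,0.8,1.1,0.6,1.2,0.6,1.2,0.6,
              0.8,0.8,1.3,1.3,1.5,1.5,0.4,12.9,1.1,8.6]
fem_distal = [1.2,0.8,0.7,0.7,8.4,1.8,0.8,1.0,0.7,2.8,1.5,0.6,1.3,0.5,1.2,
              8.2,0.4,0.6,1.6,0.5,11.4,0.8,0.7,0.6,0.9]

groups = [controls, aortoiliac, fem_ak_pop, fem_distal]
all_values = [x for g in groups for x in g]
grand_mean = fmean(all_values)

# Between-groups and within-groups sums of squares
ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1               # 3
df_within = len(all_values) - len(groups)  # 96
f_stat = (ss_between / df_between) / (ss_within / df_within)
```

The group means (0.828, 1.012, 2.040, 1.988) and the resulting F statistic match the Excel output above.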
Survival Curves    Main Menu
(Adapted from Kenneth Rothman's "Episheet".)
In the blue cells enter the initial # of subjects at risk (C8), and then the # of events and losses to follow up for each period.

Follow-up  Initial No.  Events  Lost to    Effective No.  Risk    Survival  Cumulative   95% Lower  95% Upper  sum q/pL
Period     at Risk              Follow-up  at Risk                Prob.     Surv. Prob.  Bound      Bound
0          100          6       4          98.0           0.0612  0.9388    0.9388       0.8728     0.9716     0.000665
1          90           6       5          87.5           0.0686  0.9314    0.8744       0.7931     0.9267     0.001507
2          79           3       2          78.0           0.0385  0.9615    0.8408       0.7535     0.9012     0.002020
3          74           5       7          70.5           0.0709  0.9291    0.7811       0.6854     0.8540     0.003102
4          62           4       7          58.5           0.0684  0.9316    0.7277       0.6254     0.8106     0.004357
5          51           5       2          50.0           0.1000  0.9000    0.6550       0.5459     0.7498     0.006579
6          44           3       6          41.0           0.0732  0.9268    0.6070       0.4947     0.7091     0.008505
7          35           0       3          33.5           0.0000  1.0000    0.6070       0.4947     0.7091     0.008505
8          32           7       3          30.5           0.2295  0.7705    0.4677       0.3493     0.5899     0.018271
9          22           5       4          20.0           0.2500  0.7500    0.3508       0.2364     0.4854     0.034938
10         13           6       7          9.5            0.6316  0.3684    0.1292       0.0517     0.2879     0.215389
[Chart: "Survival Curve" plotting cumulative survival probability (0.00-1.00) and the 95% lower and upper bounds against time period (1-11). An adjacent "Effective Size" column lists: 98.0000, 95.3235, 93.7696, 90.3081, 85.8686, 80.0720, 76.1161, 76.1161, 62.2871, 52.9724, 31.2817.]
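The life-table columns above follow the actuarial convention: the effective number at risk is the number at risk minus half of those lost to follow-up in the period. A minimal sketch reproducing the cumulative survival column:

```python
# Events and losses per period, from the table above
events = [6, 6, 3, 5, 4, 5, 3, 0, 7, 5, 6]
lost   = [4, 5, 2, 7, 7, 2, 6, 3, 3, 4, 7]

n = 100            # initial number at risk (cell C8)
cum = 1.0
cum_surv = []
for e, l in zip(events, lost):
    effective = n - l / 2   # e.g. 100 - 4/2 = 98.0 in period 0
    risk = e / effective    # conditional risk of the event this period
    cum *= 1 - risk         # cumulative survival probability
    cum_surv.append(cum)
    n -= e + l              # number still at risk next period
```

Each element of cum_surv matches the "Cumulative Surv. Prob." column (0.9388, 0.8744, ..., 0.1292).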
Case-Control Analysis from Ken Rothman's "Episheet.xls" (with permission)
This sheet enables you to perform crude and/or Mantel-Haenszel stratified analysis with up to 12 substrata. It also computes a MH p-value, testing the null hypothesis OR=1, and it computes a p-value for the chi square test for homogeneity.

Instructions
1. Enter frequencies in the yellow cells of 2x2 tables; check against crude table below; CTL-e to clear.
2. Click button to the right to adjust column width if necessary.
3. The results appear in red. Scroll down or right if needed.

Stratum 1   Exposed  Unexposed  Total
Cases       31       14         45
Controls    1700     1700       3400
Total       1731     1714       3445
RR = 2.2143

Stratum 2   Exposed  Unexposed  Total
Cases       65       12         77
Controls    1700     1700       3400
Total       1765     1712       3477
RR = 5.4167

Crude Data  Exposed  Unexposed  Total
Cases       96       26         122
Controls    3400     3400       6800
Total       3496     3426       6922
Crude RR = 3.6923

Results:
RRmh = 4.2051
90% Conf. Interv. = 2.8496 - 6.2055
95% Conf. Interv. = 2.6449 - 6.6856
99% Conf. Interv. = 2.2868 - 7.7326
P-value testing RR = 1: 0.5444
P-value for homogeneity: 0.0326

[Chart: P-value function plotting P-value (0-1) against relative risk on a log scale (0.1-10).]

(Hidden calculation columns carry, for each stratum, the expected cell count A, its variance V, and the Mantel-Haenszel components G = ad/T, H = bc/T, P = (a+d)/T, Q = (b+c)/T, along with ln(RR) and var(ln(RR)). Totals: E = 98.2399, V = 13.6555, sum of G = 76.7889; MH chi = -0.6061, Var(ln(RRMH)) = 0.0560; homogeneity chi square = 4.5645 with df = 1.)
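The summary estimate this sheet reports is the Mantel-Haenszel odds ratio, OR_mh = sum(a*d/T) / sum(b*c/T) over strata. A minimal sketch; note that with a single stratum the estimate reduces to the crude odds ratio, so applying it to stratum 1 above returns ad/bc = (31*1700)/(14*1700):

```python
# Each stratum is (a, b, c, d): exposed cases, unexposed cases,
# exposed controls, unexposed controls
def mh_odds_ratio(strata):
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Stratum 1 from the sheet above; a single stratum gives the crude OR
or_1 = mh_odds_ratio([(31, 14, 1700, 1700)])
```

This matches the stratum-1 "RR" shown above (2.2143, i.e. 31/14 since the control counts cancel).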
Cohort Cumulative Incidence Analysis from Ken Rothman's "Episheet.xls" (with permission)
This sheet enables you to perform crude and/or Mantel-Haenszel stratified analysis with up to 12 substrata. It also computes a MH p-value, testing the null hypothesis RR=1, and it computes a p-value for the chi square test for homogeneity.

Instructions
1. Enter frequencies in yellow cells of 2x2 tables; check against crude table below; CTL-e to clear.
2. Click button to the right to adjust column width if necessary.
3. The results appear in red. Scroll down or right if needed.

References:
1. Modern Epidemiology, 3rd Ed., Ch. 15
2. Sato T, Biometrics 1989;45:1323-4

Stratum 1   Exposed  Unexposed  Total
Cases       31       14         45
Non-cases   1969     1986       3955
Total       2000     2000       4000
RR = 2.2143   RD = 0.0085

Stratum 2   Exposed  Unexposed  Total
Cases       90       15         105
Non-cases   1910     1985       3895
Total       2000     2000       4000
RR = 6   RD = 0.0375

Crude Data  Exposed  Unexposed  Total
Cases       121      29         150
Non-cases   3879     3971       7850
Total       4000     4000       8000
Crude RR = 4.1724   Crude RD = 0.0230

Results:
RRmh = 4.1724
RDmh 90% Conf. Interv. = 0.0180 - 0.0280
RDmh 95% Conf. Interv. = 0.0171 - 0.0289
RDmh 99% Conf. Interv. = 0.0152 - 0.0308
P-value testing RR = 1: 0.0000
P-value for homogeneity: 0.0000

[Chart: P-value function plotting P-value (0-1) against relative risk on a log scale (0.1-10).]

(Hidden calculation columns carry, per stratum, the expected counts, variances, and the Mantel-Haenszel components aN0/T and bN1/T: 15.5 and 7 for stratum 1, 45 and 7.5 for stratum 2, plus ln(RR) and var(ln(RR)).)
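For cumulative incidence data the Mantel-Haenszel risk ratio is RR_mh = sum(a*N0/T) / sum(b*N1/T), where a and b are the exposed and unexposed cases and N1 and N0 the exposed and unexposed group sizes. Applying this to the two strata above reproduces the sheet's RRmh:

```python
# Each stratum is (a, n1, b, n0): exposed cases, exposed total,
# unexposed cases, unexposed total
def mh_risk_ratio(strata):
    num = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in strata)
    den = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in strata)
    return num / den

# The two strata shown above
strata = [(31, 2000, 14, 2000), (90, 2000, 15, 2000)]
rr = mh_risk_ratio(strata)   # (15.5 + 45) / (7 + 7.5)
```

The per-stratum components (15.5 and 7; 45 and 7.5) are exactly the hidden aN0/T and bN1/T columns of the sheet.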
Person-Time (Incidence Rate) Analysis from Ken Rothman's "Episheet.xls" (with permission)

Instructions
1. Enter events and person-time in yellow cells of tables on the right; check entries against crude table below; CTL-e to clear.
2. Click button to the right to adjust column width if needed.
3. The results appear in red. Scroll down or right if needed.
Ref: Mod. Epid. 3rd ed. Ch. 15

Stratum   Cases (Exp / Unexp / Total)   Person-time (Exp / Unexp / Total)   RR       RD
1         32 / 2 / 34                   52407 / 18790 / 71197               5.7366   0.0005042
2         104 / 12 / 116                43248 / 10673 / 53921               2.1388   0.0012804
3         206 / 28 / 234                28612 / 5710 / 34322                1.4682   0.0022961
4         186 / 28 / 214                12663 / 2585 / 15248                1.3561   0.0038567
5         102 / 31 / 133                5317 / 1462 / 6779                  0.9047   -0.0020201

Crude Data:  Cases 630 / 101 / 731;  Person-time 142247 / 39220 / 181467
Crude RR = 1.7198   Crude RD = 0.0019

Results:
RRmh = 1.4247
90% Conf. Interv. = 1.1944 - 1.6994
95% Conf. Interv. = 1.1547 - 1.7578
99% Conf. Interv. = 1.0810 - 1.8776
P-value testing RR = 1: 0.0009
P-value for homogeneity: 0.0340
(MH chi = 3.3191; Var(ln(RRMH)) = 0.0115; homogeneity df = 4)

Rate difference results:
Var(RDMH) = 9.57e-08
90% Conf. Interv. = 0.0006349 - 0.0016529
95% Conf. Interv. = 0.0005375 - 0.0017504
99% Conf. Interv. = 0.0003472 - 0.0019407
P-value testing RD = 0: 0.0009
P-value for homogeneity: 0.0000

[Chart: P-value function plotting P-value (0-1) against relative risk on a log scale (0.1-10).]

(Hidden calculation columns carry, per stratum, the components aPT0/T, bPT1/T, M1PT1PT0/T^2, and the chi-square contributions used for the homogeneity tests.)
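For person-time data the Mantel-Haenszel incidence rate ratio is IRR_mh = sum(a*PT0/T) / sum(b*PT1/T), with a and b the exposed and unexposed cases and PT1 and PT0 the exposed and unexposed person-time (T = PT1 + PT0). Applying this to the five strata above reproduces the sheet's RRmh:

```python
# Each stratum is (a, pt1, b, pt0): exposed cases, exposed person-time,
# unexposed cases, unexposed person-time
def mh_rate_ratio(strata):
    num = sum(a * pt0 / (pt1 + pt0) for a, pt1, b, pt0 in strata)
    den = sum(b * pt1 / (pt1 + pt0) for a, pt1, b, pt0 in strata)
    return num / den

# The five strata shown above
strata = [
    (32, 52407, 2, 18790),
    (104, 43248, 12, 10673),
    (206, 28612, 28, 5710),
    (186, 12663, 28, 2585),
    (102, 5317, 31, 1462),
]
rate_ratio = mh_rate_ratio(strata)
```

The per-stratum numerator and denominator terms (8.4453 and 1.4722 for stratum 1, and so on) are exactly the hidden aPT0/T and bPT1/T columns of the sheet.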
Standardized Rates (Minnesota vs. Illinois)    Main Menu

Suppose that after this publication came out, another study was conducted in Illinois to investigate the hypothesis that birth defects occurred more often in Illinois as compared to Minnesota. However, in this new study the authors thought that the type of water consumed could be related to birth defects. They wanted to adjust (standardize) the rates of defects in the two states for water type. Data from the two studies are compared as below.

Minnesota
Stratum          Distribution of the        %      Number of  Number of  Proportion  SE
(e.g., age)      Combined Populations              Events     Subjects   or "Rate"
Well water   1   0.29                       0.76   93         3379       0.02752     0.00281
City water   2   0.09                       0.20   27         874        0.03089     0.00585
Bottled water 3  0.62                       0.05   5          206        0.02427     0.01072
Totals           1.00                       1.00   125        4459
Crude Rate: 0.02803
Standardized Proportion or "Rate": 0.02580
Standard Error: 0.00674
95% CI for Standardized Rate: 0.01259 - 0.03901

Illinois
Stratum          Distribution of the        %      Number of  Number of  Proportion  SE
                 Combined Populations              Events     Subjects   or "Rate"
             1   0.29                       0.01   2          100        0.02000     0.01400
             2   0.09                       0.03   6          200        0.03000     0.01206
             3   0.62                       0.96   145        7293       0.01988     0.00163
Totals           1.00                              153        7593
Crude Rate: 0.02015
Standardized Proportion or "Rate": 0.02082
Standard Error: 0.00430
95% CI for Standardized Rate: 0.01238 - 0.02925

Crude RR = 1.39
Adjusted RR = 1.24

Births by state and water type
                     Minnesota Pesticide Appliers    Illinois Pesticide Appliers
Water Type           (#)     (#)     rate*           (#)     (#)     rate*
Well water only      3379    93      26.8            100     2       ____
City water only      874     27                      200     6       ____
Bottled water only   206     5       23.7            7293    145     ____
Total                4456    125     28.0            7593    153     ____
* per 1000 live births

a. Calculate the water-type-specific rates for Illinois. Briefly describe how these two states compare in crude rates of birth defects.
b. Using the combined number of live births as a standard, calculate a standardized rate (standardized for water type) for each state. Briefly describe how these standardized rates compare with each other and reasons why they may or may not differ.
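The standardized rates above are simply weighted averages of the stratum-specific rates, using the distribution of the combined populations as weights. A sketch reproducing the values for both states:

```python
# Weights: distribution of the combined populations across water types
weights = [0.29, 0.09, 0.62]

# Stratum-specific rates from the tables above
minnesota_rates = [0.02752, 0.03089, 0.02427]
illinois_rates  = [0.02000, 0.03000, 0.01988]

# Direct standardization: weighted average of stratum-specific rates
std_mn = sum(w * r for w, r in zip(weights, minnesota_rates))
std_il = sum(w * r for w, r in zip(weights, illinois_rates))
adjusted_rr = std_mn / std_il
```

Applying the same weights to both states removes the effect of their very different water-type distributions, giving the adjusted RR of about 1.24 shown above (versus the crude RR of 1.39).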
Enter the Mean and Std. Dev. for a population and an observed value (X) from the population. The function returns the probability of values less than the observed value. This can also be interpreted as the "percentile" for the observed value.
Enter the desired probability (percentile), e.g., for 90th percentile enter 0.90. Then
enter the mean and standard deviation for the population. The function will return the
value representing that percentile.
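Both calculators correspond to the normal CDF and its inverse. A minimal sketch using Python's statistics.NormalDist with a hypothetical population (mean 100, SD 15; these numbers are assumptions, not from the workbook):

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # hypothetical population

# First calculator: probability of values below an observed X,
# i.e. the "percentile" of X
p_below = dist.cdf(120)

# Second calculator: the value at a desired percentile, e.g. the 90th
x_90 = dist.inv_cdf(0.90)
```

Here an observation of 120 sits at roughly the 91st percentile, and the 90th percentile of the population is roughly 119.2.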
Poisson Calculator
The Poisson distribution is useful when events typically occur with an expected average frequency, although with some random variability about this mean. The Poisson equation lets you compute the probability that events will occur at a specified frequency. For example, suppose you typically get 4 spam emails per day, and you want to compute the probability that tomorrow you will receive exactly 5 spam emails. Enter 4 as the mean and 5 as "X"; the Poisson probability is 15.6%.
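The spam example can be worked out directly from the Poisson formula P(X = k) = e^(-mu) * mu^k / k!:

```python
import math

def poisson_pmf(mu, k):
    # P(X = k) for a Poisson distribution with mean mu
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Probability of exactly 5 spam emails when the daily mean is 4
p = poisson_pmf(4, 5)
```

This gives about 0.156, i.e. the 15.6% reported by the calculator.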