On September 25, 2007, Citizens for Responsibility and Ethics in Washington (CREW) filed a lawsuit against the Executive Office of the President, the Office of Administration, and the National Archives and Records Administration in the District Court for the District of Columbia. CREW's action challenges as contrary to law those parties' knowing failure to recover, restore, and preserve millions of electronic communications created and/or received within the White House. The lawsuit stems from the millions of e-mails that were improperly deleted from White House servers and exist on back-up tapes, if at all.
Number of Pages: 2
FOIA Request: CREW versus EXECUTIVE OFFICE OF THE PRESIDENT ET AL (Lawsuit)
Holder of Document: CREW
Producing Agency: Executive Office of the President
Date Received: Jun 24, 2009
Original Title
CREW versus EXECUTIVE OFFICE OF THE PRESIDENT ET AL (Lawsuit): OAP00005353
DRAFT DISCUSSION DOCUMENT
Dr. Kirkendall Sample Size Recommendation
Sample Sizes
We need to select a sample size that we think provides the best compromise between confidence and cost. We can make reasonable statements about the probability of missing messages being less than .1 with a sample of between 25 and 35 days. To be able to state that the probability that p is less than .05 is .95, we would need a sample of about 60.
If n=15 and we do not find any missing messages, then the probability that p (the probability of missing messages in a good day) is less than .1 is .79, and the probability that p is less than .05 is .54.
If n=25 and we do not find any missing messages, then the probability that p is less than .1 is .93, and the probability that p is less than .05 is .72.
If n=35 and we do not find any missing messages, then the probability that p is less than .1 is .975, and the probability that p is less than .05 is .83.
If n=60 and we do not find any missing messages, then the probability that p is less than .1 is .998, and the probability that p is less than .05 is .954.
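The four cases above can be reproduced with one line of arithmetic. This sketch assumes, as the Statistical foundations section states, a Beta(1, n) distribution for p when no missing messages are found; its CDF at a threshold a is 1 - (1 - a)^n. The helper name `prob_p_below` is hypothetical.

```python
def prob_p_below(threshold: float, n: int) -> float:
    """P(p < threshold) after n sampled days with no missing messages.

    Assumption: this is the CDF of a Beta(1, n) distribution,
    1 - (1 - threshold)**n.
    """
    return 1 - (1 - threshold) ** n


for n in (15, 25, 35, 60):
    print(f"n={n}: P(p < .1) = {prob_p_below(0.10, n):.3f}, "
          f"P(p < .05) = {prob_p_below(0.05, n):.3f}")
```

Rounded to two or three places, the output matches the values quoted for each sample size: .79 and .54, .93 and .72, .975 and .83, .998 and .954.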
Confidence Statements
If the sample size is 25 and we do not find any missing messages in the sampled days, then we can be 93% confident that the probability of missing messages among the good days is less than .1, and we can be 72% confident that the probability of missing messages among the good days is less than .05.
If n=25 and we do not find any missing messages, then the probability that p is less than .1 is .93, and the probability that p is less than .05 is .72.
If the sample size is 35 and we do not find any missing messages in the sampled days, then we can be 97.5% confident that the probability of missing messages among the good days is less than .1, and we can be 83% confident that the probability of missing messages among the good days is less than .05.
If n=35 and we do not find any missing messages, then the probability that p is less than .1 is .975, and the probability that p is less than .05 is .83.
If the sample size is 60 and we do not find any missing messages among the sampled days, then we can be 99.8% confident that the probability of missing messages among the good days is less than .1, and we can be 95.4% confident that the probability of missing messages among the good days is less than .05.
If n=60 and we do not find any missing messages, then the probability that p is less than .1 is .998, and the probability that p is less than .05 is .954.
GEORGE W. BUSH PRESIDENTIAL RECORD.
OAP00005353
Determination of Good days
Good Days are days that OA believes have no missing email messages, based on ARIMA analysis and expert judgment. A random sample with replacement will be taken from these days and restored from the DR tapes. The results will be used to determine whether any missing email messages are found in the sampled days. If no missing email messages are found in any sampled day, the Good Days will be assumed to be complete.
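A simple random sample with replacement can be drawn as sketched below. The day identifiers are hypothetical placeholders, since the actual list of Good Days is not given here, and fixing the random seed so the draw can be reproduced is an illustrative choice rather than something the document specifies.

```python
import random


def sample_good_days(good_days: list, n: int, seed=None) -> list:
    """Simple random sample of n days drawn WITH replacement from good_days.

    Each draw is independent and uniform, so the same day can appear more
    than once.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.choices(good_days, k=n)


# Hypothetical placeholder identifiers for the Good Days.
good_days = [f"day-{i:03d}" for i in range(1, 201)]
sample = sample_good_days(good_days, 25, seed=12345)
days_to_restore = sorted(set(sample))  # a day drawn twice only needs restoring once
print(len(sample), len(days_to_restore))
```

Because draws are with replacement, the sample always has n entries, but the number of distinct days to restore from the DR tapes can be smaller.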
Let p represent the probability that a file from the DR tapes contains new messages, that is, messages that are not already in the Phase II results.
We would like to have p=0 for the Good Days, but the only way to be completely sure is to evaluate data files and/or restore files for all days. Instead, we propose a lower-cost approach that uses sampling to give us assurance that p is very low.
A random sample with replacement of Good Days will be taken and assessed for presence of
new messages. If the DR tapes for the sample of days contain only messages that are already
included in Phase II results, this will be evidence that there are no missing messages among the
Good Days.
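The assessment of each sampled day can be sketched as a set comparison: a draw counts against the Good Days only if the restored messages include one that is absent from the Phase II results. Representing messages as comparable IDs is an assumption, since the document does not say how messages are matched; the function name and message IDs below are hypothetical.

```python
def count_days_with_new_messages(sampled_days, restored_by_day, phase2_ids):
    """Return x, the number of sampled draws whose restored DR-tape messages
    include at least one message not already in the Phase II results.

    Repeated draws of the same day are counted once per draw, matching
    sampling with replacement.
    """
    phase2 = set(phase2_ids)
    return sum(1 for day in sampled_days if set(restored_by_day[day]) - phase2)


# Hypothetical message IDs for illustration only.
phase2_ids = {"m1", "m2", "m3", "m4"}
restored_by_day = {"day-001": ["m1", "m2"], "day-002": ["m3"], "day-003": ["m4", "m9"]}

x = count_days_with_new_messages(["day-001", "day-002", "day-001"], restored_by_day, phase2_ids)
print(x, "Good Days assumed complete" if x == 0 else "missing messages found")
```

Here every restored message is already in the Phase II set, so x is 0; a sample that included `day-003` would yield x of at least 1 because message `m9` is new.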
Statistical foundations
We want information about p, the proportion of the population (days) that have some feature (missing messages). If we use a simple random sample of size n selected with replacement, the number of sampled units that have the feature follows a binomial distribution with parameters n and p.
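As an illustration of the binomial model, the sketch below computes P(X = k), the probability that k of n sampled days have missing messages when each day has them independently with probability p. The inputs n = 25 and p = 0.1 are example values, not results from the document. With k = 0 the formula reduces to (1 - p)^n.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k sampled days with missing messages."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 25, 0.10
print(binomial_pmf(0, n, p))  # chance that no sampled day shows missing messages
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))  # sums to 1 up to rounding
```

For these inputs P(X = 0) is about 0.072: if one day in ten really had missing messages, a sample of 25 days would come back clean only about 7% of the time.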
We can use the binomial distribution and a prior distribution that says the probability p could be anywhere in the interval from 0 to 1 (a uniform distribution) to find the posterior distribution of p given the sample results. The posterior distribution can be viewed as the evidence in the sample for the values p might take. The posterior distribution of p is a beta distribution with parameters $x+1$ and $n-x+1$, where $x$ is the number of days in the sample for which missing messages are found.
We are interested in the situation when $x=0$, so the parameters are 1 and $n$. The mean of this distribution is $1/(n+1)$, and the variance is $n/\left((n+1)^2(n+2)\right)$. The confidence statements (probabilities) in this document were computed from a beta distribution with parameters 1 and $n$, using the statistical software package Statistix.
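For a Beta(1, n) distribution, the density is n(1 - p)^(n - 1), the CDF at a is 1 - (1 - a)^n, the mean is 1/(n + 1), and the variance is n/((n + 1)^2 (n + 2)). The sketch below checks those closed forms against a plain midpoint-rule integration of the density. It does not reproduce the Statistix computation itself, and n = 25 is only an example value.

```python
def beta_1n_pdf(p: float, n: int) -> float:
    """Density of Beta(1, n): n * (1 - p)**(n - 1) on [0, 1]."""
    return n * (1 - p) ** (n - 1)


def integrate(f, lo: float, hi: float, steps: int = 200_000) -> float:
    """Midpoint-rule numerical integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


n = 25
mean_closed = 1 / (n + 1)
var_closed = n / ((n + 1) ** 2 * (n + 2))
cdf_closed = 1 - (1 - 0.10) ** n  # P(p < .1)

mean_num = integrate(lambda p: p * beta_1n_pdf(p, n), 0.0, 1.0)
var_num = integrate(lambda p: (p - mean_num) ** 2 * beta_1n_pdf(p, n), 0.0, 1.0)
cdf_num = integrate(lambda p: beta_1n_pdf(p, n), 0.0, 0.10)

print(mean_closed, mean_num)
print(var_closed, var_num)
print(cdf_closed, cdf_num)
```

Each printed pair should agree to many decimal places, and `cdf_closed` for n = 25 rounds to the .93 quoted for that sample size.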