LA SALLE
College of Arts and Sciences
HANDOUTS 5A
BIAS – A methodological problem; any trend or deviation from the truth in data collection, data
analysis, interpretation and publication which can cause false conclusions. Bias can occur either
intentionally or unintentionally.
Some examples of Bias:
o Selection (Sampling) bias – some error or distortion resulting from a selection process that
is not random.
o Observer bias – some error or distortion (either intentional or unintentional) in the
perception or description of the data by the observer. Examples include incorrect use of
subjective scales or surveying a homogeneous population and extrapolating the results to
the general population.
o Subject bias – some error or distortion of the measurement by the study subject. An
example is recall bias, in which subjects incorrectly remember and report events that
occurred in the past, such as dosages of medications or diet habits.
o Instrument bias – some error or distortion from faulty mechanical equipment, such as an
uncalibrated scale that gives falsely low weight or interference by ambient light with
infrared co-oximeter measurements.
INFERENTIAL STATISTICS
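[Diagram not reproduced. Labels: PARAMETRIC STATISTICS, NONPARAMETRIC STATISTICS; POINT, INTERVAL.
Decision flow: one branch leads directly to nonparametric methods; the other asks "Normal distribution?",
with No leading to nonparametric methods and Yes leading to parametric methods.]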
Some notes:
1. The level of measurement is an important indicator in the choice of statistical hypothesis testing
procedure.
2. Parametric tests have more stringent assumptions that must hold before they can be used: the data must
be measured on at least an interval scale, and their distribution must be normal or approximately normal.
Defn: Estimation is the process of estimating the value of a parameter from information obtained from
a sample.
Estimate – a particular numerical value
Estimator – a sample measure (statistic) used to estimate a population measure (parameter)
POINT ESTIMATION
Determining a particular numerical value estimate of a parameter
An estimator is not expected to estimate the parameter without error, but it is hoped that it
is not too far off
Point Estimation for some parameters
The best estimator for the population …
Mean, μ, is the sample mean, 𝑥̅
=> Each sample yields its own value of the sample mean, which may or may not be exactly equal
to the population mean (see dots on the plot below); in practice, it will
never be known whether the sample mean is exactly equal to the population mean being
estimated
=> Since only one sample is actually taken from the population, its mean could be any
one of the points in the graph
=> That value is used as the point estimate of the population mean
Proportion, P, is the sample proportion, p = x/n, where x is the number of sample units that
possess the characteristic of interest
Variance, σ², is the sample variance, s²
3. A machine dispenses coffee in 12-ounce cups. The contents of eight cups were measured and the
following data were generated: 12.03, 12.10, 12.02, 11.98, 12.00, 12.05, 11.97, 11.99.
The sample variance, s² = 0.0018, is the best point estimate of the population variance, σ².
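The value reported for the coffee example can be checked with a short Python sketch. It assumes the usual sample variance with the n − 1 divisor; the variable names are chosen here for illustration only.

```python
from statistics import mean, variance

# Cup contents in ounces, copied from the coffee-machine example.
contents = [12.03, 12.10, 12.02, 11.98, 12.00, 12.05, 11.97, 11.99]

n = len(contents)
x_bar = mean(contents)  # point estimate of the population mean

# Sample variance with the n - 1 divisor, written out step by step.
s2_manual = sum((x - x_bar) ** 2 for x in contents) / (n - 1)

# statistics.variance uses the same n - 1 divisor.
s2_library = variance(contents)

print(f"sample mean     = {x_bar:.4f}")
print(f"sample variance = {s2_manual:.4f}")
```

Both calculations agree with the handout's value of 0.0018 when rounded to four decimal places.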
INTERVAL ESTIMATION
Determining an interval or range of values based on the observed sample to estimate a parameter
Most often preferred to point estimates
Defn: An interval estimate of a parameter is an interval or range of values used to estimate the
parameter. This estimate may or may not contain the value of the parameter being estimated.
Remarks:
1. In an interval estimate, the parameter is specified as being between two values
2. Since the interval either contains the parameter or it does not, a degree of confidence has to be
assigned before the interval estimate is made
Usual values of the degree of confidence: 90%, 95% and 99%
Example: one may wish to be 95% confident that the interval contains the true population
mean
The higher the level of confidence, the wider the range or the larger the interval
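The last remark can be illustrated numerically. The sketch below assumes a z-based margin of error of the form z·σ/√n; the values of sigma and n are hypothetical and are not taken from the handout.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_(alpha/2) for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


# Hypothetical values for illustration only (not from the handout).
sigma = 10.0
n = 100

widths = {}
for confidence in (0.90, 0.95, 0.99):
    z = z_critical(confidence)
    margin = z * sigma / sqrt(n)
    widths[confidence] = 2 * margin  # full width of the interval
    print(f"{confidence:.0%}: z = {z:.3f}, interval width = {2 * margin:.3f}")
```

With the sample size and standard deviation held fixed, the critical value grows with the confidence level, so the interval widens.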
Illustration:
The point estimates in the graph above are converted to their respective
confidence intervals for a 95% level of confidence (see graph below).
All have the same width but different positions, depending on the value of
the point estimate, 𝑥̅ .
Defn: The confidence level of an interval estimate of a parameter is the probability that the interval
estimate will contain the parameter, assuming that a large number of samples are selected and
that the estimation process on the same parameter is repeated.
Defn: A confidence interval is a specific interval estimate of a parameter determined by using data
obtained from a sample and by using the specific confidence level of the estimate.
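The repeated-sampling reading of the confidence level can be sketched with a small simulation. Everything below is an assumption made for illustration: a normal population with hypothetical mu and sigma, a known sigma, and a z-based interval of half-width z·σ/√n.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population and design, chosen only for illustration.
mu, sigma = 50.0, 8.0
n = 30
confidence = 0.95
repetitions = 5000

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sigma / sqrt(n)  # same for every sample because sigma is known

hits = 0
for _ in range(repetitions):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Count the samples whose interval captures the true mean.
    if x_bar - margin < mu < x_bar + margin:
        hits += 1

coverage = hits / repetitions
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The fraction of intervals that contain mu should land close to the stated confidence level, while any single interval either contains mu or does not.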
FORMULAS:
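1. Confidence interval (CI) for the mean, μ (σ known or n ≥ 30)
$$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
maximum error of the estimate:
$$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$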
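The interval formula above can be sketched in Python as follows. The function name z_interval and the input values are hypothetical; the critical value comes from the standard normal distribution in the standard library.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper, E) for the z-based interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    E = z * sigma / sqrt(n)  # maximum error of the estimate
    return x_bar - E, x_bar + E, E


# Hypothetical inputs for illustration only (not from the handout).
lower, upper, E = z_interval(x_bar=75.0, sigma=12.0, n=36, confidence=0.95)
print(f"E = {E:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

The interval is always centered on the sample mean, and E is half of its width.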
Remark: Sometimes interval estimates rather than point estimates are reported.
Example: “On the basis of a sample of 200 families, the survey estimates that a Filipino family of 5 spends
an average of P2,500 per week for groceries. One can be 95% confident that this estimate is
accurate within P150 of the true mean.” This can be translated to a confidence interval format:
P2,500 − P150 < μ < P2,500 + P150, or P2,350 < μ < P2,650.
Example: The college president asks the statistics teacher to estimate the average age of the students at
their college. How large a sample is necessary? The teacher would like to be 99% confident that
the estimate is accurate to within 1 year. From a previous study, the standard deviation of the
ages is known to be 3 years.
Solution:
z_{α/2} = 2.58; σ = 3; E = 1
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{(2.58)(3)}{1}\right)^2 = 59.9 \approx 60 \text{ students}$$
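The computation in this solution, n = (z·σ/E)² rounded up to the next whole number, can be sketched as below. The function name is hypothetical; the inputs are the ones given in the example.

```python
from math import ceil


def sample_size_for_mean(z, sigma, E):
    """Smallest whole n whose margin of error z * sigma / sqrt(n) is at most E."""
    return ceil((z * sigma / E) ** 2)


# Values from the college-age example: 99% confidence, sigma = 3, E = 1.
n = sample_size_for_mean(z=2.58, sigma=3, E=1)
print(n)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target, since a fractional student cannot be sampled.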
Remarks:
1. Confidence interval (CI) for the mean, μ (σ unknown and n < 30)
$$\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$
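2. Confidence interval (CI) for the proportion, P, where q̂ = 1 − p̂
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$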
3. Standard error of p̂:
$$\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
4. Margin of error of p̂:
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
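Sample size for estimating a proportion (p* is the planning value for the population proportion):
$$n = \frac{\left(z_{\alpha/2}\right)^2 p^{*}\left(1 - p^{*}\right)}{E^2}$$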
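The formulas in this list can be sketched together in Python. The function names are hypothetical. The t critical value is passed in by the caller because the standard library has no t quantile function; 2.365 is the usual table value for 95% confidence with n − 1 = 7 degrees of freedom, and the use of n − 1 degrees of freedom is the standard convention rather than something stated in the handout. The proportion inputs are made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, stdev


def z_critical(confidence):
    """Two-sided critical value z_(alpha/2)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def t_interval(sample, t_crit):
    """CI for the mean when sigma is unknown; t_crit comes from a t table."""
    n = len(sample)
    x_bar = mean(sample)
    E = t_crit * stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    return x_bar - E, x_bar + E


def proportion_interval(x, n, confidence=0.95):
    """CI for a population proportion from x successes in n trials."""
    p_hat = x / n
    q_hat = 1 - p_hat
    E = z_critical(confidence) * sqrt(p_hat * q_hat / n)
    return p_hat - E, p_hat + E


def proportion_sample_size(p_star, E, confidence=0.95):
    """Sample size for margin of error E, given planning value p_star."""
    z = z_critical(confidence)
    return ceil(z ** 2 * p_star * (1 - p_star) / E ** 2)


# Coffee data from the point-estimation example (n = 8, so 7 degrees of freedom).
coffee = [12.03, 12.10, 12.02, 11.98, 12.00, 12.05, 11.97, 11.99]
t_low, t_high = t_interval(coffee, t_crit=2.365)

# Hypothetical survey: 120 of 400 units have the characteristic of interest.
p_low, p_high = proportion_interval(x=120, n=400)

# Hypothetical planning value and target margin of error.
n_needed = proportion_sample_size(p_star=0.20, E=0.04)

print(f"t interval for the mean: ({t_low:.4f}, {t_high:.4f})")
print(f"z interval for P:        ({p_low:.4f}, {p_high:.4f})")
print(f"required sample size:    {n_needed}")
```

When no planning value is available, p* = 0.5 gives the largest possible value of p*(1 − p*) and therefore the most conservative sample size.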
EXERCISES:
2. In a survey, the planning value for the population proportion is p* = 0.35. How large a sample should be
taken to provide a 95% confidence interval with a margin of error of 0.05?
3. At 95% confidence, how large a sample should be taken to obtain a margin of error of 0.03 for the
estimation of a population proportion? Assume that past data are not available for developing a planning
value for p*.
4. How large a sample must be selected to provide a 95% confidence interval with a margin of error of 10?
Assume the population standard deviation is 40.
5. The following sample data are from a normal population: 10, 8, 12, 15, 13, 11, 6, 5.
a. What is the point estimate of the population mean?
b. What is the point estimate for the population standard deviation?
c. With 95% confidence, what is the margin of error for the estimation of the population mean?
d. What is the 95% confidence interval for the population mean?
7. Sales personnel for Skillings Distributors submit weekly reports listing the customer contacts made
during the week. A sample of 65 weekly reports showed a sample mean of 19.5 customer contacts per
week. The sample standard deviation was 5.2. Provide 90% and 95% confidence intervals for the
population mean number of weekly customer contacts for the sales personnel.
8. Thirty fast-food restaurants including Wendy’s, McDonald’s, and Burger King were visited during the
summer of 2000. During each visit, the customer went to the drive-through and ordered a basic meal such
as a “combo” meal or a sandwich, fries, and shake. The time between pulling up to the menu board and
receiving the filled order was recorded. The times in minutes for the 30 visits are as follows:
0.9 1.0 1.2 2.2 1.9 3.6 2.8 5.2 1.8 2.1
6.8 1.3 3.0 4.5 2.8 2.3 2.7 5.7 4.8 3.5
2.6 3.3 5.5 4.0 7.2 9.1 2.8 3.6 7.3 9.0
a. Provide a point estimate of the population mean drive-through time at fast-food restaurants.
b. At 95% confidence, what is the margin of error?
c. What is the 95% confidence interval estimate of the population mean?
d. Discuss the skewness that may be present in this population. What suggestion would you make for
a repeat of this study?
9. A National Retail Foundation survey found households intended to spend an average of $649 during the
December holiday season. Assume that the survey included 600 households and that the sample standard
deviation was $175.
a. With 95% confidence, what is the margin of error?
b. What is the 95% confidence interval estimate of the population mean?
c. The prior year, the population mean expenditure per household was $632. Discuss the change in
the holiday season expenditures over the one-year period.