
UNIVERSITY OF ST. LA SALLE
College of Arts and Sciences

BBIO 105 – Statistical Biology


Second Semester, AY 2021 – 2022

HANDOUTS 5A

INTRODUCTION TO INFERENTIAL STATISTICS

SOME COMMON RESEARCH METHODOLOGICAL PROBLEMS:

 BIAS – A methodological problem; any trend or deviation from the truth in data collection, data
analysis, interpretation and publication which can cause false conclusions. Bias can occur either
intentionally or unintentionally.
 Some examples of Bias:
o Selection (Sampling) bias – some error or distortion resulting from a selection process that
is not random.
o Observer bias – some error or distortion (either intentional or unintentional) in the
perception or description of the data by the observer. Examples include incorrect use of
subjective scales or surveying a homogeneous population and extrapolating the results to
the general population.
o Subject bias – some error or distortion of the measurement by the study subject. An
example is recall bias, in which subjects incorrectly remember and report events that
occurred in the past, such as dosages of medications or diet habits.
o Instrument bias – some error or distortion from faulty mechanical equipment, such as an
uncalibrated scale that gives falsely low weight or interference by ambient light with
infrared co-oximeter measurements.

 TWO TYPES OF ERROR:


o Random error – a “wrong” result due to chance. An unknown or unrecognized variable that
has the capacity to distort the sample in either direction may exist.
The error (or disturbance) of an observed value is the deviation of the observed value from
the (unobservable) true value of a quantity of interest (for example, a population mean).
 The easiest method to reduce the influence of random error is to increase the sample
size, which increases the precision of the estimate.

o Systematic error – a wrong result due to interference from bias.


 The only way to correct for bias is to avoid it in the first place by
designing a study that controls for bias through double blinding, proper
randomization, and a prospective design.
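The contrast between the two types of error can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population (mean 50, standard deviation 10), the constant measurement bias of 2 units, and the function name `simulate_sample_means` are hypothetical and do not come from the handout.

```python
import random
import statistics

# Hypothetical population values chosen only for illustration.
POP_MEAN, POP_SD = 50.0, 10.0


def simulate_sample_means(n, bias=0.0, trials=500, seed=1):
    """Draw `trials` random samples of size n, add a constant bias to every
    measurement, and return (center, spread) of the resulting sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(POP_MEAN, POP_SD) + bias for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means), statistics.stdev(means)


for n in (10, 40, 160):
    center, spread = simulate_sample_means(n)
    b_center, b_spread = simulate_sample_means(n, bias=2.0)
    print(f"n={n:4d}  unbiased: center={center:.2f}, spread={spread:.2f}  "
          f"biased: center={b_center:.2f}, spread={b_spread:.2f}")
```

In this sketch the spread of the sample means (random error) shrinks as n grows, while the shift caused by the bias (systematic error) stays the same at every sample size.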

LEONARES, S. R., PHD 1


INFERENTIAL STATISTICS

 Allows conclusions about the population to be generated from sample data


 Applicable when probability sampling is used; inferential statistics is founded on the theories of
probability
 Questionable results when non-probability sampling is used
 The branches of inferential statistics:

INFERENTIAL STATISTICS
o HYPOTHESIS TESTING
 PARAMETRIC STATISTICS
 NONPARAMETRIC STATISTICS
o ESTIMATION
 POINT
 INTERVAL

Determining an appropriate inferential statistical test of hypothesis:

The measurement scale of the variable:
o Nominal or Ordinal: use nonparametric methods
o Interval or Ratio: is the distribution normal?
 No: use nonparametric methods
 Yes: use parametric methods

Some notes:
1. The level of measurement is an important indicator in the choice of statistical hypothesis testing
procedure.
2. Parametric tests have more stringent assumptions before they can be used: data must be at least
interval and the distribution of the data must be normal or approximately normal.
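The decision rule described above can be written as a short function. This is a minimal sketch: the function name `choose_method` and its argument names are hypothetical, and deciding whether the data are approximately normal is left to the analyst.

```python
def choose_method(scale, approximately_normal=False):
    """Return 'parametric' or 'nonparametric' following the handout's rule:
    nominal/ordinal data -> nonparametric; interval/ratio data -> parametric
    only when the distribution is normal or approximately normal."""
    scale = scale.strip().lower()
    if scale in ("nominal", "ordinal"):
        return "nonparametric"
    if scale in ("interval", "ratio"):
        return "parametric" if approximately_normal else "nonparametric"
    raise ValueError(f"unknown measurement scale: {scale!r}")


print(choose_method("ordinal"))
print(choose_method("ratio", approximately_normal=True))
print(choose_method("interval", approximately_normal=False))
```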



PARAMETRIC METHODS: ESTIMATION

Defn: Estimation is the process of estimating the value of a parameter from information obtained from
a sample.
Estimate – a particular numerical value
Estimator – a sample measure (statistic) used to estimate a population measure (parameter)

2 types of estimation procedures:


1. Point estimation
2. Interval estimation
Notes:
1. Different samples from the same population may generate different estimates of the same parameter.
2. Different parameters have different point and interval estimation procedures

POINT ESTIMATION
 Determining a particular numerical value estimate of a parameter
 An estimator is not expected to estimate the parameter without error, but it is hoped that it
is not too far off
Point Estimation for some parameters
The best estimator for the population …
 Mean, μ, is the sample mean, 𝑥̅
=> Each sample generates a different value of the mean, and this may either be exactly equal
to the population mean or not (see dots on the plot below); however, it will
never be known if the sample mean is exactly equal to the population mean being
estimated
=> Since only one sample is actually taken from the population, its mean could be any
of those points in the graph
=> that value is used as the point estimate of the population mean

 Proportion, P, is the sample proportion, p = x/n, where x is the number of sample units that
possess the characteristic of interest
 Variance, σ², is the sample variance, s²



Examples:
1. A survey of 30 adults found that the mean age of a person’s primary vehicle is 5.6 years.
 5.6 years is the best point estimate of the population mean age of a person’s primary
vehicle
2. In a study, 200 people were asked if they were satisfied with their job or profession; 162 said that
they were. In this case, n = 200, x = 162 and p = 162/200 = 0.81

 81% of those surveyed were satisfied with their job or profession

3. A machine dispenses coffee in 12-ounce cups. The contents of eight cups were measured and the
following data were generated: 12.03, 12.10, 12.02, 11.98, 12.00, 12.05, 11.97, 11.99.
 The sample variance, s² = 0.0018, is the best point estimate of the population variance, σ²

Question: “How good is a point estimate?”


Answer: There is no way of knowing how close a particular point estimate is to the population
value
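The point estimates in Examples 2 and 3 can be reproduced with Python's standard library. This is a sketch for checking the arithmetic; the variable names are illustrative. Note that `statistics.variance` divides by n – 1, which is the sample variance s² used in the handout.

```python
import statistics

# Example 2: sample proportion p = x / n
n, x = 200, 162
p_hat = x / n

# Example 3: sample variance of the eight measured cups (ounces)
cups = [12.03, 12.10, 12.02, 11.98, 12.00, 12.05, 11.97, 11.99]
x_bar = statistics.fmean(cups)
s2 = statistics.variance(cups)  # divisor is n - 1

print(f"p = {p_hat:.2f}")
print(f"sample mean = {x_bar:.4f}")
print(f"s^2 = {s2:.4f}")
```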

INTERVAL ESTIMATION
 Determining an interval or range of values based on the observed sample to estimate a parameter
 Most often preferred to point estimates

Defn: An interval estimate of a parameter is an interval or range of values used to estimate the
parameter. This estimate may or may not contain the value of the parameter being estimated.

Remarks:
1. In an interval estimate, the parameter is specified as being between two values
2. Since the interval either contains the parameter or it does not, a degree of confidence has to be
assigned before the interval estimate is made
 Usual values of the degree of confidence: 90%, 95% and 99%
 Example: one may wish to be 95% confident that the interval contains the true population
mean
 The higher the level of confidence, the wider the range or the larger the interval

Illustration:
 The point estimates in the graph above are converted to their respective
confidence intervals for a 95% level of confidence (see graph below).
 All have the same widths but at different positions, depending on the value of
the point estimate, 𝑥̅ .



 The point that lies outside of the upper limit in the first graph generates a
confidence interval that does not contain the true but unknown value of the
population mean, μ.
 HOWEVER, in actual sampling, only one confidence interval for a particular
confidence level can be constructed and, again, it will never be known for sure
(with 100% confidence) if the said interval contains the unknown value of the
population mean, μ.

Defn: The confidence level of an interval estimate of a parameter is the probability that the interval
estimate will contain the parameter, assuming that a large number of samples are selected and
that the estimation process on the same parameter is repeated.
Defn: A confidence interval is a specific interval estimate of a parameter determined by using data
obtained from a sample and by using the specific confidence level of the estimate.
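The repeated-sampling meaning of the confidence level can be checked by simulation. The sketch below assumes a hypothetical normal population (mean 50, standard deviation 10) and uses the interval for a known population standard deviation from the FORMULAS part of this handout; the function name `coverage` is illustrative.

```python
import random
from statistics import NormalDist, fmean


def coverage(conf=0.95, mu=50.0, sigma=10.0, n=30, trials=2000, seed=7):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # critical value z_(alpha/2)
    half_width = z * sigma / n ** 0.5             # margin of error, sigma known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / trials


for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} intervals containing the true mean: {coverage(conf):.3f}")
```

Over many repeated samples the fraction of intervals that contain the population mean should be close to the stated confidence level, although any single interval either contains it or does not.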
FORMULAS:
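1. Confidence interval (CI) for the mean, μ (σ known or n ≥ 30)

$$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$

 For a 90% CI, $z_{\alpha/2}$ = 1.65
 For a 95% CI, $z_{\alpha/2}$ = 1.96
 For a 99% CI, $z_{\alpha/2}$ = 2.58

maximum error of the estimate: $$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$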

 also called the margin of error


 for a given confidence level, E is the maximum likely
difference between the point estimate of a parameter and the
actual value of the (unknown) parameter being estimated



Example: A survey of 30 adults found that the mean age of a person’s primary vehicle is 5.6 years.
Assuming that the standard deviation of the population is 0.8 year, find the 99%
confidence interval of the population mean.
Solution:

$\bar{x}$ = 5.6 years, σ = 0.8 year, n = 30, $z_{\alpha/2}$ = 2.58

99% CI: $$5.6 - (2.58)\left(\frac{0.8}{\sqrt{30}}\right) < \mu < 5.6 + (2.58)\left(\frac{0.8}{\sqrt{30}}\right)$$

$$5.2 < \mu < 6.0$$


 one can be 99% confident that the mean age of all primary vehicles is between 5.2 and 6.0
years, based on 30 vehicles
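The vehicle-age example can be verified numerically. In this sketch the helper name `z_interval` is hypothetical, and the critical value is computed with `statistics.NormalDist` (about 2.576) rather than taken from the rounded table value 2.58, so the limits may differ slightly in later decimal places.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, conf):
    """Confidence interval for the mean when sigma is known (or n >= 30).
    Returns (lower limit, upper limit, margin of error)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    e = z * sigma / sqrt(n)
    return x_bar - e, x_bar + e, e


low, high, e = z_interval(x_bar=5.6, sigma=0.8, n=30, conf=0.99)
print(f"margin of error E = {e:.3f}")
print(f"99% CI: {low:.1f} < mu < {high:.1f}")
```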

Remark: Sometimes interval estimates rather than point estimates are reported.

Example: “On the basis of a sample of 200 families, the survey estimates that a Filipino family of 5 spends
an average of P2500 per week for groceries. One can be 95% confident that this estimate is
accurate within P150 of the true mean.” This can be translated to a confidence interval format:

 95% CI for μ: 2500 – 150 < μ < 2500 + 150

2350 < μ < 2650
 With a 95% level of confidence, a Filipino family of 5 spends, on the average,
between P2350 and P2650 per week for groceries

SAMPLE SIZE for an interval estimate of the population mean


Derived from the formula of E (maximum error of the estimate):

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

Example: The college president asks the statistics teacher to estimate the average age of the students at
their college. How large a sample is necessary? The teacher would like to be 99% confident that
the estimate should be accurate within 1 year. From a previous study, the standard deviation of the
ages is known to be 3 years.
Solution:
$z_{\alpha/2}$ = 2.58; σ = 3; E = 1

$$n = \left(\frac{(2.58)(3)}{1}\right)^2 = 59.9 \approx 60 \text{ students}$$
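The sample-size computation can be sketched in a few lines. The function name `sample_size_mean` is hypothetical; the result is rounded up with `ceil` because a fraction of a subject cannot be sampled and rounding down would give a margin of error larger than E.

```python
from math import ceil


def sample_size_mean(sigma, margin, z):
    """Minimum n so that the margin of error for the mean is at most `margin`."""
    return ceil((z * sigma / margin) ** 2)


# College-age example: 99% confidence (table value z = 2.58), sigma = 3, E = 1
n_students = sample_size_mean(sigma=3, margin=1, z=2.58)
print(n_students)
```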

Remarks:
1. Confidence interval (CI) for the mean, μ (σ unknown and n < 30)

$$\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

with n – 1 degrees of freedom, where t is a value of the t-distribution for α/2



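2. Confidence interval (CI) for a proportion, P

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where $\hat{p}$ is the sample proportion and $\hat{q} = 1 - \hat{p}$

3. Standard error of $\hat{p}$: $$\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

4. Margin of error of $\hat{p}$: $$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$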
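The formulas in Remarks 1 to 4 can be tried on data that already appear in this handout. This is a sketch with several assumptions: applying the t-interval to the eight coffee-cup measurements and the proportion interval to the job-satisfaction survey is an illustration added here, not a worked example from the handout; the function names are hypothetical; and the t critical value 2.365 (95% confidence, 7 degrees of freedom) is typed in from a standard t-table because the Python standard library has no t-distribution quantile function.

```python
from math import sqrt
import statistics


def t_interval(data, t_crit):
    """CI for the mean when sigma is unknown and n < 30 (df = n - 1)."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)       # sample standard deviation
    e = t_crit * s / sqrt(n)         # margin of error
    return x_bar - e, x_bar + e


def proportion_interval(x, n, z):
    """Point estimate, standard error, margin of error and CI for a proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    se = sqrt(p_hat * q_hat / n)
    e = z * se
    return p_hat, se, e, p_hat - e, p_hat + e


# Coffee-cup data (n = 8, so df = 7); table value assumed for 95% confidence.
cups = [12.03, 12.10, 12.02, 11.98, 12.00, 12.05, 11.97, 11.99]
T_CRIT_95_DF7 = 2.365
t_low, t_high = t_interval(cups, T_CRIT_95_DF7)
print(f"95% CI for the mean fill: {t_low:.2f} < mu < {t_high:.2f}")

# Job-satisfaction survey: x = 162 of n = 200, 95% confidence (z = 1.96).
p_hat, se, e, p_low, p_high = proportion_interval(162, 200, 1.96)
print(f"p-hat = {p_hat:.2f}, SE = {se:.4f}, E = {e:.4f}")
print(f"95% CI for P: {p_low:.3f} < P < {p_high:.3f}")
```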

SAMPLE SIZE for an interval estimate of the population proportion, P

$$n = \frac{(z_{\alpha/2})^2\, p^*(1 - p^*)}{E^2}$$

where p* = planning value of p for estimating the sample size
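The sample-size formula for a proportion can be sketched as follows. The planning value p* = 0.30 and the margin of error 0.04 are hypothetical numbers chosen only to illustrate the calculation, and the function name `sample_size_proportion` is illustrative. The result is rounded up to the next whole number.

```python
from math import ceil


def sample_size_proportion(p_star, margin, z):
    """Minimum n so that the margin of error for a proportion is at most `margin`."""
    return ceil(z ** 2 * p_star * (1 - p_star) / margin ** 2)


# Hypothetical inputs: planning value 0.30, E = 0.04, 95% confidence (z = 1.96)
n_needed = sample_size_proportion(p_star=0.30, margin=0.04, z=1.96)
print(n_needed)
```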

EXERCISES:

1. A simple random sample of 400 individuals provides 100 Yes responses.


a. What is the point estimate of the proportion of the population that would provide Yes responses?
b. What is your estimate of the standard error of 𝑝̂ ?
c. Compute the 95% confidence interval for the population proportion, P.

2. In a survey, the planning value for the population proportion is p* = 0.35. How large a sample should be
taken to provide a 95% confidence interval with a margin of error of 0.05?

3. At 95% confidence, how large a sample should be taken to obtain a margin of error of 0.03 for the
estimation of a population proportion? Assume that past data are not available for developing a planning
value for p*.

4. How large a sample must be selected to provide a 95% confidence interval with a margin of error of 10?
Assume the population standard deviation is 40.

5. The following sample data are from a normal population: 10, 8, 12, 15, 13, 11, 6, 5.
a. What is the point estimate of the population mean?
b. What is the point estimate for the population standard deviation?
c. With 95% confidence, what is the margin of error for the estimation of the population mean?
d. What is the 95% confidence interval for the population mean?



6. A simple random sample with n = 54 provided a sample mean of 22.5 and a sample standard deviation
of 4.4.
a. Develop 90%, 95% and 99% confidence intervals for the population mean.
b. What happens to the margin of error and the confidence interval as the confidence level is
increased?

7. Sales personnel for Skillings Distributors submit weekly reports listing the customer contacts made
during the week. A sample of 65 weekly reports showed a sample mean of 19.5 customer contacts per
week. The sample standard deviation was 5.2. Provide 90% and 95% confidence intervals for the
population mean number of weekly customer contacts for the sales personnel.

8. Thirty fast-food restaurants including Wendy’s, McDonald’s, and Burger King were visited during the
summer of 2000. During each visit, the customer went to the drive-through and ordered a basic meal such
as a “combo” meal or a sandwich, fries, and shake. The time between pulling up to the menu board and
receiving the filled order was recorded. The times in minutes for the 30 visits are as follows:

0.9 1.0 1.2 2.2 1.9 3.6 2.8 5.2 1.8 2.1
6.8 1.3 3.0 4.5 2.8 2.3 2.7 5.7 4.8 3.5
2.6 3.3 5.5 4.0 7.2 9.1 2.8 3.6 7.3 9.0

a. Provide a point estimate of the population mean drive-through time at fast-food restaurants.
b. At 95% confidence, what is the margin of error?
c. What is the 95% confidence interval estimate of the population mean?
d. Discuss the skewness that may be present in this population. What suggestion would you make for
a repeat of this study?

9. A National Retail Foundation survey found households intended to spend an average of $649 during the
December holiday season. Assume that the survey included 600 households and that the sample standard
deviation was $175.
a. With 95% confidence, what is the margin of error?
b. What is the 95% confidence interval estimate of the population mean?
c. The prior year, the population mean expenditure per household was $632. Discuss the change in
the holiday season expenditures over the one-year period.

