
SAMPLING DESIGN: BASIC CONCEPTS AND PROCEDURE

The goal of sampling is to obtain individuals for a study in such a way that accurate information about the population can be obtained.

TWO TYPES OF SAMPLES

1. Probability Sample
2. Non-probability Sample

PROBABILITY SAMPLES

• Samples are obtained using some objective chance mechanism, thus involving randomization.
• They require the use of a complete listing of the elements of the universe, called the sampling frame.
• The probabilities of selection are known.
• They are generally referred to as random samples.

NON-PROBABILITY SAMPLES

• Samples are obtained haphazardly, selected purposively, or are taken as volunteers.
• The probabilities of selection are unknown.
• They should NOT be used for statistical inference.
• They result from the use of judgment sampling, accidental sampling, purposive sampling, and the like. [these are examples of non-probability sampling]

BASIC SAMPLING TECHNIQUE OF PROBABILITY SAMPLING

SIMPLE RANDOM SAMPLING (SRS)

• The most basic method of drawing a probability sample.
• Assigns equal probabilities of selection to each possible sample.
• Results in a simple random sample.

PROCEDURES IN OBTAINING SRS

1. Assign a number to each item in the lot.
2. Consult the table of random numbers.
3. Preplan how to select a sequence of digits from the table so that no bias enters the selection process.
4. Select a random number in the preplanned pattern.
5. Arrange the random numbers consecutively in numerical order.
6. Select as samples those items in the lot corresponding to the random numbers.

SYSTEMATIC RANDOM SAMPLING (SysRS)

• It is obtained by selecting every kth individual from the population.
• The first individual selected corresponds to a random number between 1 and k.

PROCEDURES IN OBTAINING SysRS

1. Decide on a method of assigning a unique serial number, from 1 to N, to each one of the elements in the population.
2. Compute the sampling interval:

k = N / n = (population size) / (sample size)

3. Select a number i, from 1 to k, using a randomization mechanism. The element in the population assigned to this number is the first element of the sample. The other elements of the sample are those assigned to the numbers i + k, i + 2k, i + 3k, and so on until you get a sample of size n.

Example:

We want to select a sample of 50 students from 500 students. Under this method, every kth item is picked up from the sampling frame.
Solution:

k = N / n = 500 / 50 = 10

We start the sample at i and then take every kth unit after it. Suppose the random number i is 5; then we select 5, 15, 25, 35, …

STRATIFIED RANDOM SAMPLING (StRS)

• It is obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum.
• The individuals within each stratum should be homogeneous (or similar) in some way.

Example:

A sample of 50 students is to be drawn from a population consisting of 500 students belonging to two institutions, A and B. The number of students in institution A is 200 and in institution B is 300. How will you draw the sample using proportional allocation?

Solution:

There are two strata in this case.

Given: N1 = 200, N2 = 300, N = 500, n = 50

If n1 and n2 are the sample sizes, then

n1 = n × N1 / N = 50 × 200 / 500 = 20
n2 = n × N2 / N = 50 × 300 / 500 = 30

The sample sizes are 20 from A and 30 from B. Then the units from each institution are to be selected by simple random sampling.

CLUSTER SAMPLING

• It is a way to randomly select participants from a list that is too large for simple random sampling.
• The clusters are constructed such that the sampling units are heterogeneous within each cluster and homogeneous among the clusters.

Example:

The list of all the agricultural farms in a village or district may not be easily available, but the list of villages or districts is generally available. In this case, every farm is a sampling unit and every village or district is a cluster.

MULTI-STAGE SAMPLING (MSS)

PROCEDURE IN OBTAINING MSS

1. Organize the sampling process into stages where the unit of analysis is systematically grouped.
2. Select a sampling technique for each stage.
3. Systematically apply the sampling technique to each stage until the unit of analysis has been selected.
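The simple random, systematic, and proportional-allocation procedures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, serial numbers run from 1 to N, N is assumed to be a multiple of n for the systematic case, and Python's pseudo-random generator stands in for the table of random numbers.

```python
import random

def simple_random_sample(population_size, sample_size, rng):
    """SRS: draw n distinct serial numbers from 1..N, returned in numerical order."""
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

def systematic_sample(population_size, sample_size, start=None, rng=None):
    """SysRS: random start i in 1..k, then every kth serial number."""
    k = population_size // sample_size  # sampling interval k = N / n (assumes N divisible by n)
    if start is None:
        start = (rng or random).randint(1, k)
    return [start + j * k for j in range(sample_size)]

def proportional_allocation(strata_sizes, sample_size):
    """StRS: n_h = n * N_h / N per stratum (rounding may need adjusting in general)."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
srs = simple_random_sample(500, 50, rng)
sys_sample = systematic_sample(500, 50, start=5)
allocation = proportional_allocation({"A": 200, "B": 300}, 50)

print(sys_sample[:4])  # first four systematic picks when i = 5
print(allocation)      # sample size per institution
```

With a start of 5 the systematic picks begin 5, 15, 25, 35, and the allocation gives 20 students from A and 30 from B, matching the worked solutions.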
BASIC SAMPLING TECHNIQUE OF NON-PROBABILITY SAMPLING

• Accidental Sampling
• Quota Sampling
• Convenience Sampling
• Purposive Sampling
• Judgment Sampling

CASES WHERE NON-PROBABILITY SAMPLING IS USEFUL

• Only a few are willing to be interviewed.
• Extreme difficulties in locating or identifying subjects.
• Probability sampling is more expensive to implement.

Exercise: Identify the sample selection procedure used in each of the following cases:

1. A tax auditor selects every 1,000th income tax return that is received.
2. 12 people are randomly selected to serve as jurors from a jury pool of 150 people.
3. To select a sample of households in a province, a sample of provinces was selected, then a sample of municipalities was chosen from each of the selected provinces, then a sample of barangays was chosen from each of the selected municipalities, and all households in the selected barangays were included.

SOURCES OF ERRORS IN SAMPLING

1. Non-sampling errors are errors that result from the survey process. These include:
- Non-responses
- Interviewer Error
- Misrepresented Answers
- Data Entry Errors
- Questionnaire Design
- Wording of Questions
- The order of the questions, words, and responses
2. Sampling error is the error that results from using a sample to estimate information regarding a population.

PRESENTATION OF DATA

1. Textual Presentation
2. Tabular Presentation
3. Graphical Presentation (bar graph, histogram, pie chart, etc.)

MEASURE OF CENTRAL TENDENCY

MEAN

• It is the sum of the data values divided by the number of data values.
• It is also called the average.
• It is appropriate only for data under the interval and ratio scales of measurement.

ADVANTAGE OF MEAN

• Simple to understand and easy to calculate.
• It is rigidly defined.
• It is least affected by fluctuations of sampling.
• It considers all the values in the series.

MEDIAN

• It is the “middle observation” when the data set is sorted (in either increasing or decreasing order).
• The median divides the distribution into two equal parts.

ADVANTAGE OF MEDIAN

• The median is not affected by the size of extreme values but by the number of observations.
• The median can be calculated even when the frequency distribution contains “open-ended” intervals.
• It can also be used to define the middle of a number of objects, properties, or quantities which are not really quantitative in nature.
• It can be easily interpreted.

MODE

• It is the most frequently occurring value in a list of data.
• It is sometimes called the nominal average.
• It is the appropriate measure of the average for data using the nominal scale of measurement.

ADVANTAGE OF MODE

• The mode is easy to understand.
• Like the median, it is not greatly affected by extreme values.
• Like the median, it can be computed even when the frequency distribution contains “open-ended” intervals.

Remember:

Whenever you hear the word average, be aware that the word may not always be referring to the mean. One average could be used to support one position, while another average could be used to support a different position.

MEASURE OF DISPERSION

Since the data points in figure 2 are more scattered than the data points in figure 1, the data set depicted in figure 2 is more varied.

RANGE

• It is the difference between the largest and the smallest observations or items in a set of data.

STANDARD DEVIATION

• It is a measure of how far away items in a data set are from the mean.
• The larger the standard deviation, the more variation there is in the data set.

VARIANCE

• It takes into account all data points in a set and is calculated by averaging the squared deviations of the data points from the mean.

MEASURE OF RELATIVE POSITION

Quantiles are statistics that describe various subdivisions of a frequency distribution into equal proportions.

THREE SPECIAL QUANTILES

1. Quartiles
2. Deciles
3. Percentiles

QUARTILES

Descriptive measures that split the ordered data into four quarters.

DECILES

Descriptive measures that split the ordered data into ten equal parts.

PERCENTILES

Descriptive measures that split the ordered data into 100 equal parts.

Example: Interpretation using Quantiles

1. Jennifer just received the results of her SAT exam. Her SAT Mathematics score of 600 is at the 74th percentile. What does this mean?

A percentile rank of 74% means that 74% of SAT Mathematics scores are less than or equal to 600 and 26% of the scores are greater. So, 26% of the students who took the exam scored better than Jennifer.

2. A test mark is calculated to be at the 84th percentile. What does this mean?
84% of the people who wrote the test got the same mark or less than the test mark, and 16% of the people who wrote the test scored higher than the test mark.

3. The time taken to finish a test is 35 minutes. This time was the first quartile. What does this mean?

25% of the learners finished the exam in 35 minutes or less, and 75% of the learners finished the exam in more than 35 minutes.

PARAMETRIC STATISTICS

• Parametric statistical procedures are inferential procedures that rely on testing claims regarding parameters such as the population mean, the population standard deviation, or the population proportion.
• In some circumstances, the use of parametric procedures requires that certain requirements regarding the distribution of the population, such as normality, be satisfied.
• They assume underlying statistical distributions in the data. Therefore, several conditions of validity must be met so that the result of a parametric test is reliable.
• They apply to data in ratio scale, and some apply to data in interval scale.

TWO COMMON FORMS OF STATISTICAL INFERENCE

1. Estimation
2. Hypothesis Testing

ESTIMATING THE VALUE OF A PARAMETER

In statistics, an estimate is used to approximate the value of an unknown population parameter.

TWO TYPES OF ESTIMATION

1. POINT ESTIMATION – (single points that are used to infer parameters directly).
2. INTERVAL ESTIMATION – (also called confidence interval for parameter).

PARAMETER VS STATISTIC

A parameter is a numerical characteristic of the population. Any numerical characteristic of a population is called a parameter.

A statistic is a numerical value that describes a sample, or a number computed from the sample data.

WHAT PROPERTIES MAKE A GOOD POINT ESTIMATOR?

1. It's desirable that the sampling distribution be centered around the true population parameter. An estimator with this property is called unbiased.
2. It's desirable that our chosen estimator have a small standard error in comparison with other estimators we might have chosen.

CONFIDENCE INTERVAL

A confidence interval provides more information than a point estimate, and it consists of an interval of numbers.

The level of confidence represents the expected proportion of intervals that will contain the parameter if a large number of different samples is obtained.

The level of confidence is denoted by (1 − α) × 100%.

Confidence interval estimates are of the form: point estimate ± margin of error.

MARGIN OF ERROR

The margin of error of the estimate of a population mean can be computed using this formula:

E = z(α/2) × σ / √n

The margin of error of a confidence interval estimate of a parameter depends on three factors:

1. Level of Confidence
2. Sample Size
3. Standard Deviation
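To see how the three factors move the margin of error, here is a small sketch. It assumes the usual formula for a mean with known standard deviation, E = z(α/2) × σ / √n; the function name and the numbers (σ = 10, n = 100) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(confidence, sigma, n):
    """E = z(alpha/2) * sigma / sqrt(n), assuming sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z(alpha/2)
    return z * sigma / sqrt(n)

base = margin_of_error(0.95, sigma=10, n=100)
higher_conf = margin_of_error(0.99, sigma=10, n=100)   # more confidence -> wider
larger_n = margin_of_error(0.95, sigma=10, n=400)      # 4x the sample -> half as wide
larger_sigma = margin_of_error(0.95, sigma=20, n=100)  # 2x the spread -> twice as wide

for label, e in [("base", base), ("99% level", higher_conf),
                 ("n = 400", larger_n), ("sigma = 20", larger_sigma)]:
    print(f"{label:>10}: E = {e:.3f}")
```

Raising the level of confidence or the standard deviation widens the interval, while increasing the sample size narrows it.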
INTERPRETATION OF CONFIDENCE INTERVAL

A (1 − α) × 100% confidence interval indicates that, if we obtained many simple random samples of size n from the population whose mean, µ, is unknown, then approximately (1 − α) × 100% of the intervals will contain µ.

In other words,

“We are (insert level of confidence) confident that the population mean is between (lower bound) and (upper bound).” This is an abbreviated way of saying that the method is correct (1 − α) × 100% of the time.

Example:

If we constructed a 90% confidence interval with a lower bound of 12 and an upper bound of 18, we would interpret the interval as follows:

“We are 90% confident that the population mean, µ, is between 12 and 18.”

Remember:

A 95% confidence interval does not mean that there is a 95% probability that the interval contains the population mean.

ESTIMATING THE VALUE OF A PARAMETER USING CONFIDENCE INTERVALS

1. Constructing confidence intervals about a population mean where the population standard deviation is known or unknown.
2. Constructing confidence intervals about a population proportion.
3. Constructing confidence intervals about a population standard deviation.

CONFIDENCE INTERVAL ABOUT POPULATION MEAN where population standard deviation is KNOWN or UNKNOWN

Example:

A simple random sample of size n = 40 is drawn from a population. The sample mean is found to be 20.1, and the sample standard deviation is found to be 3.2. Construct and interpret a 90% confidence interval about the population mean.

Solution:

Note:

If the sample size is large (n ≥ 30), then the sample standard deviation s can be used to estimate the population standard deviation σ.

How about if σ is known but n < 30? Use Case 1.

Example:

How much do Filipinos sleep each night? Based on a random sample of 1,120 Filipinos 15 years of age or older, the mean amount of sleep per night is 8.17 hours according to the Filipino Time Use Survey conducted by the Bureau of Labor Statistics. Assuming the population standard deviation for amount of sleep per night is 1.2 hours, construct and interpret a 95% confidence interval for the mean amount of sleep per night of Filipinos 15 years of age or older.

Solution:
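The two confidence-interval examples about a population mean above can be checked numerically. The sketch below assumes the z-based interval x̄ ± z(α/2) × σ / √n, with s standing in for σ in the n = 40 example as the note on large samples allows; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Confidence interval for a mean: mean +/- z(alpha/2) * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# Example with n = 40, sample mean 20.1, s = 3.2, 90% level (s used in place of sigma)
ci_n40 = z_interval(20.1, 3.2, 40, 0.90)

# Sleep example: n = 1120, mean 8.17 hours, sigma = 1.2 hours, 95% level
ci_sleep = z_interval(8.17, 1.2, 1120, 0.95)

print("n = 40 example: %.2f to %.2f" % ci_n40)
print("sleep example:  %.2f to %.2f" % ci_sleep)
```

Under these assumptions the intervals come out to roughly 19.27 to 20.93 for the first example and 8.10 to 8.24 hours for the sleep example, so we would say we are 95% confident that the mean amount of sleep per night is between about 8.10 and 8.24 hours.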

CONFIDENCE INTERVAL ABOUT POPULATION PROPORTION

The point estimate for the population proportion is

p̂ = x / n

where x is the number of individuals in the sample with the specified characteristic and n is the sample size.

Suppose a simple random sample of size n is taken from a population. A (1 − α) × 100% confidence interval for p is given by the following quantities:

Lower bound: p̂ − z(α/2) × √( p̂(1 − p̂) / n )
Upper bound: p̂ + z(α/2) × √( p̂(1 − p̂) / n )

Note: It must be the case that n p̂(1 − p̂) ≥ 10 and n ≤ 0.05N to construct this interval.

Example:

In a poll conducted by the Research Center for the People and the Press, a simple random sample of 1,505 Filipino adults was asked whether they were in favor of tighter enforcement of government rules on TV content during hours when children are most likely to be watching.

Of the 1,505 adults, 1,129 responded yes. Obtain a 95% confidence interval for the proportion of Filipinos who are in favor of tighter enforcement of government rules on TV content during hours when children are most likely to be watching.

Solution:

“We are 95% confident that the proportion of Filipinos who are in favor of tighter enforcement of government rules on TV content during hours when children are most likely to be watching is between 0.73 and 0.77.”

CONFIDENCE INTERVAL ABOUT POPULATION VARIANCE

If a simple random sample of size n is taken from a normal population with mean µ and standard deviation σ, then a (1 − α) × 100% confidence interval about σ² is given by

Lower bound: (n − 1)s² / χ²(α/2)
Upper bound: (n − 1)s² / χ²(1 − α/2)

with n − 1 degrees of freedom, where χ²(α/2) and χ²(1 − α/2) are the chi-square values with areas α/2 and 1 − α/2 to their right.

Remember:

A confidence interval about the population variance or standard deviation is not of the form “point estimate ± margin of error” because the sampling distribution of the sample variance is not symmetric.

Example:

A simple random sample of size n = 12 is drawn from a population that is normally distributed. The sample variance is found to be s² = 23.7. Construct a 90% confidence interval about the population variance.

Solution:
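Both worked examples in this part (the TV-content poll and the n = 12 variance problem) can be reproduced with a short sketch. It assumes the usual large-sample interval for a proportion and the chi-square interval for a variance. The function names are hypothetical, and because the Python standard library has no chi-square quantile function, the two critical values for 11 degrees of freedom are typed in from a standard chi-square table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence):
    """p_hat +/- z(alpha/2) * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def variance_interval(s2, n, chi2_right_small, chi2_right_large):
    """((n-1)s^2 / chi2(alpha/2), (n-1)s^2 / chi2(1-alpha/2)); table values supplied by the caller."""
    return (n - 1) * s2 / chi2_right_small, (n - 1) * s2 / chi2_right_large

# Poll: 1,129 of 1,505 adults said yes; 95% level
poll_ci = proportion_interval(1129, 1505, 0.95)

# Variance: n = 12, s^2 = 23.7, 90% level.
# Table lookups for 11 degrees of freedom (assumed from a chi-square table):
# area 0.05 to the right -> 19.675, area 0.95 to the right -> 4.575
var_ci = variance_interval(23.7, 12, chi2_right_small=19.675, chi2_right_large=4.575)

print("proportion: %.2f to %.2f" % poll_ci)
print("variance:   %.2f to %.2f" % var_ci)
```

The proportion interval agrees with the stated answer of 0.73 to 0.77, and under the table values above the 90% interval for the population variance is roughly 13.25 to 56.98.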
Exercises:

1. Jane wants to estimate the proportion of students on her campus who eat cauliflower. After surveying 20 students, she finds 2 who eat cauliflower. Obtain and interpret a 95% confidence interval for the proportion of students who eat cauliflower on Jane’s campus.
2. Alan wants to estimate the proportion of adults who walk to work. In a survey of 10 adults, he finds 1 who walks to work. Obtain and interpret a 95% confidence interval for the proportion of adults who walk to work.
3. Suppose a sample of 30 Stats students is given an IQ test. If the sample has a standard deviation of 12.23 points, find a 90% confidence interval for the population standard deviation and interpret the result.

[provided this space as your answer sheet]


HYPOTHESIS TESTING

• Hypothesis testing is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of one or more populations.
• A hypothesis is a statement or claim regarding a characteristic of one or more populations.
• It is a preconceived idea, assumed to be true but has to be tested for its truth or falsity.

Example:

• The mean body temperature for patients admitted to elective surgery is not equal to 37.0 °C.
• A consumer advocate would like to know if the mean lifetime of a bulb is less than 500 hours.
• A real estate broker believes that because of changes in interest rates, as well as other economic factors, the mean price has increased since then.

PROCEDURES FOR HYPOTHESIS TESTING

1. State the null and alternative hypotheses.
2. Set the level of significance or alpha level (α).
3. Determine the test distribution to use.
4. Determine the critical region.
5. State the decision rule.
6. Calculate a test statistic.
7. Make a statistical decision.

1. State the Null and Alternative Hypothesis

Null Hypothesis

• Denoted by 𝐻o.
• The statement being tested.
• Assumed true until evidence indicates otherwise.
• Must contain the condition of equality and must be written with the symbol =, ≤, or ≥.

Example:

• Students who eat breakfast and students who do not eat breakfast will perform the same on a math exam.
• Students who experience test anxiety prior to an English exam and students who do not will get the same scores.
• Motorists who talk on the phone while driving and motorists who do not will make the same number of errors on a driving course.

Alternative Hypothesis

• Denoted by 𝐻a.
• Statement that must be true if the null hypothesis is false.
• Sometimes referred to as the research hypothesis.
• Must NOT contain the condition of equality and must be written with the symbol ≠, <, or >.

Example:

• Students who eat breakfast will perform better on a math exam than students who do not eat breakfast.
• Students who experience test anxiety prior to an English exam will get higher scores than students who do not experience test anxiety.
• Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone.

Remember:

If you are conducting a research study and you want to use a hypothesis test to support your claim, the claim must be stated in such a way that it becomes the alternative hypothesis, so it cannot contain the condition of equality.

TWO TYPES OF ALTERNATIVE TEST

1. One-Tailed Test
- Left Tailed
- Right Tailed
2. Two-Tailed Test

2. Set the Level of Significance or Alpha Level (α)

The level of significance, α, is the probability of making a type I error.

TWO TYPES OF ERROR
[Let the null be a person. A TYPE I ERROR is when you reject the right person. A TYPE II ERROR is when you accept the wrong person.]

Example:

𝐻o: The defendant is innocent.

𝐻a: The defendant is not innocent.

What happens to the defendant if the jury makes a type I or a type II error?

• A type I error is like putting an innocent person in jail.
• A type II error is like letting a guilty person go free.

Remember:

It is important to note that we want to set α before we start our study because the Type I error is the more ‘severe’ error to make. The smaller α is, the smaller the region of rejection.

3. Determine the Test Distribution to Use

Determine the best statistical test to be used, based on the objective and the assumptions that are satisfied.

LIST OF COMMON PARAMETRIC TEST

1. One Sample z-Test
2. One Sample t-Test
3. One Sample Proportion Test
4. Independent Sample z-Test
5. Independent Sample t-Test
6. Two Sample Proportion Test
7. Paired Sample t-Test
8. Analysis of Variance (ANOVA) Test
9. Tukey Test (Post Hoc ANOVA)
10. Two Way ANOVA
11. Pearson Product Moment Correlation
12. Regression Analysis

[the following test distributions will be discussed in the latter part of the lecture (hopefully)]

4. Determine the Critical Region

• The rejection region or critical region is the set of all values of the test statistic which will lead to the rejection of 𝐻o.
• The acceptance region is the set of all values of the test statistic that leads the researcher to retain 𝐻o.

5. State the Decision Rules

USING CONFIDENCE INTERVAL

Decision Rule: Reject the null hypothesis if the test statistic is NOT within the range specified by the confidence interval.

USING p-value APPROACH

Decision Rule: Reject the null hypothesis if the computed p-value is less than or equal to the set significance level; otherwise, do not reject the null hypothesis.

Reject NULL if p < alpha level

[not advisable to reject null even if p = alpha level]

USING TRADITIONAL METHOD

Decision Rule: Reject 𝐻o if the computed value of the test statistic falls in the region of rejection.

[see the diagram in number 4]

6. Calculate Test Statistic

Once you determine the appropriate statistical test to be used in step no. 3, calculate the test statistic. The value computed using the chosen statistical test is compared to the critical value.

Test statistic - a statistic computed from the sample data that is especially sensitive to the differences between 𝐻o and 𝐻a.
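The p-value approach and the traditional (critical region) method are two views of the same decision. The sketch below illustrates this for a z test statistic; the function names and the value z = 2.10 are hypothetical, and a standard normal test distribution is assumed.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def p_value(z, tail):
    """p-value of a z statistic for a 'left', 'right', or 'two' tailed test."""
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def in_rejection_region(z, alpha, tail):
    """Traditional method: does z fall in the critical region for this alpha?"""
    if tail == "left":
        return z <= STD_NORMAL.inv_cdf(alpha)
    if tail == "right":
        return z >= STD_NORMAL.inv_cdf(1 - alpha)
    return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)

z, alpha = 2.10, 0.05  # hypothetical test statistic and significance level
decisions = {}
for tail in ("left", "right", "two"):
    by_p_value = p_value(z, tail) <= alpha
    by_region = in_rejection_region(z, alpha, tail)
    assert by_p_value == by_region  # both rules give the same decision
    decisions[tail] = by_p_value
    print(tail, "-> reject Ho" if by_p_value else "-> fail to reject Ho")
```

For this hypothetical value the right-tailed and two-tailed tests reject 𝐻o while the left-tailed test does not, whichever rule is used.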

7. Make Statistical Decision

• Fail to reject the null hypothesis / Retain the null hypothesis / There is not enough evidence to reject the null hypothesis.
• Reject the null hypothesis.

Remember:

It is important to recognize that we NEVER accept the null hypothesis. We are merely saying that the sample evidence is not strong enough to warrant rejection of the null hypothesis.

NORMAL DISTRIBUTION

This graph is called the normal curve, which is a bell-shaped curve, and which approximately describes many phenomena that occur in nature, industry, and research.

PROPERTIES OF NORMAL CURVE

1. The normal curve is bell-shaped and symmetric about the mean.
2. The mean, median and mode are equal.
3. The total area under the curve is equal to one.
4. The normal curve approaches, but never touches, the x-axis as it extends farther and farther away from the mean.

TESTING NORMALITY OF DATA

To determine if the data follow a normal distribution, we can use the graphical or numerical method.

GRAPHICAL

HISTOGRAM plots the observed values against their frequency, giving a visual estimate of whether the distribution is bell-shaped or not.

Q-Q PROBABILITY PLOTS display the observed values against normally distributed data (represented by the line).

Remember:

Graphical methods are typically not very useful when the sample size is small.

NUMERICAL

The following tests are the common statistical tests for normality.

KOLMOGOROV-SMIRNOV TEST

• It was first derived by Kolmogorov (1933) and later modified and proposed as a test by Smirnov (1948). The test is non-parametric and entirely agnostic as to what this distribution actually is.
• This test has been shown to be less powerful than the other tests in most situations. It is included only because of its historical popularity. Some published articles would say, “The Kolmogorov-Smirnov test is only a historical curiosity. It should never be used.”
• Tie scores should not be present in the data.

LILLIEFORS TEST

• Adaptation of the Kolmogorov-Smirnov test for the case when the mean and variance of the normal distribution are UNKNOWN.
• It is also used as a correction for the Kolmogorov-Smirnov test, because when the parameters of the CDF are estimated from the sample, the Kolmogorov-Smirnov test becomes conservative and loses power.

ANDERSON-DARLING TEST

• It is a modified Kolmogorov-Smirnov test, but more weight is given to the tails of the distribution.
• This test, developed by Anderson and Darling (1954), is popular among those tests that are based on EDF statistics. [empirical cumulative distribution function]
SHAPIRO-WILK TEST

• One of the MOST POPULAR TESTS for normality assumption diagnostics, which has good properties of power [sensitivity of the hypothesis test] and is based on the correlation between the given observations and the associated normal scores.
• The Shapiro-Wilk test statistic was derived by Shapiro and Wilk (1965).
• It doesn’t work well if several values in the data set are the same (tie scores occur in the data).

HYPOTHESIS OF NORMALITY TEST

𝐻o: The sample data follows a normal distribution.

𝐻a: The sample data does not follow a normal distribution.

When we are testing normality:

• If P value > alpha, it means that the data are normal.
• If P value ≤ alpha, it means that the data are NOT normal.

Example:

Use a graphical and a numerical method to test the normality of these data: diameters of 36 rivet heads in 1/100 of an inch.

[Figure: NORMAL Q-Q PLOT]

[Figure: HISTOGRAM]

NUMERICAL METHOD

Test Method          P-Value    Decision               Remarks
Kolmogorov-Smirnov   < 0.000    Reject Ho              Not Normal
Lilliefors           0.0571     Failed to reject Ho    Normal
Anderson-Darling     0.2178     Failed to reject Ho    Normal
Shapiro-Wilk         0.2804     Failed to reject Ho    Normal

ONE SAMPLE HYPOTHESIS TEST

The previous lecture showed some examples of parametric tests, and one might observe that there were ONE SAMPLE and INDEPENDENT SAMPLE tests. [what is the difference?]

In this lecture, we are more concerned with test statistics which involve the POPULATION MEAN and the POPULATION PROPORTION of one sample. [one group compared to a standard group]

TEST CONCERNING THE POPULATION MEAN

The ONE-SAMPLE Z-TEST and ONE-SAMPLE T-TEST are used to compare the mean of one sample to a known standard (theoretical/hypothetical) mean (µ0).

ASSUMPTIONS

1. The sample is obtained using simple random sampling or from a randomized experiment.
2. The population from which the data is sampled is normally distributed.

HYPOTHESES

Note: µ0 is a specified value of the population mean.

ONE SAMPLE z-TEST

CASE 1: Testing means of a normal population with known σ

Test Statistic: z = (x̄ − µ0) / (σ / √n)

CASE 2: Large sample tests for means with unknown σ

[If σ is unknown and n > 30, use the z-test but replace σ with s]

Test Statistic: z = (x̄ − µ0) / (s / √n)

ONE SAMPLE t-TEST

CASE 3: Small sample tests for means with unknown σ

[If σ is unknown and n < 30, use the t-test and replace σ by s]

Test Statistic: t = (x̄ − µ0) / (s / √n), with n − 1 degrees of freedom

Rejection Region:

• If the null hypothesis is rejected in a two-tailed test, then the conclusion is simply that the population mean doesn’t equal the assumed value. It doesn’t matter if the true value is likely to be more or less than the assumed value.
• A two-tailed test is one that rejects the null hypothesis if the sample statistic is significantly higher or lower than the assumed value of the population parameter.
• In a one-tailed test, there is only one rejection region, and the null hypothesis is rejected only if the value of a sample statistic falls into the single rejection region.

Tabulated z-values for the common choices of α

Note:

Example:

Does an average box of cereal contain more than 368 grams of cereal? A random sample of 25 boxes showed x̄ = 372.5. The company has specified σ to be 15 grams. Test at the α = 0.05 level.

Solution:
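The Case 1 cereal example (25 boxes, x̄ = 372.5, σ = 15, right-tailed test at α = 0.05) can be worked through as follows. This is a sketch assuming the standard z statistic z = (x̄ − µ0) / (σ / √n) and hypotheses 𝐻o: µ ≤ 368 against 𝐻a: µ > 368; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(x_bar, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n)) for a mean with known sigma."""
    return (x_bar - mu0) / (sigma / sqrt(n))

alpha = 0.05
z = one_sample_z(x_bar=372.5, mu0=368, sigma=15, n=25)

# Right-tailed test: the rejection region is z > z(alpha)
critical = NormalDist().inv_cdf(1 - alpha)
p = 1 - NormalDist().cdf(z)
reject = z > critical

print(f"z = {z:.2f}, critical value = {critical:.3f}, p-value = {p:.4f}")
print("Reject Ho" if reject else "Fail to reject Ho")
```

Here z = 1.5, which does not exceed the critical value of about 1.645 (the p-value is about 0.067), so under these assumptions we fail to reject 𝐻o: the sample does not give enough evidence that the mean content exceeds 368 grams.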

Example:

Does an average box of cereal contain more than 368 grams of cereal? A random sample of 36 boxes showed x̄ = 372.5 and s = 15. Test at the α = 0.01 level.

Solution:

Example:

Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes showed x̄ = 372.5. The company has specified σ to be 15 grams. Test at the α = 0.05 level.

CONNECTION TO CONFIDENCE INTERVALS

Example:

Does an average box of cereal contain less than 368 grams of cereal? A random sample of 25 boxes showed x̄ = 372.5 and s = 15. Test at the α = 0.01 level.

Solution:

Exercise:

TEST A CLAIM ABOUT A PROPORTION

We can test a claim about a proportion, percentage, or probability, as illustrated in these examples:

• Based on a sample survey, fewer than ¼ of all college graduates smoke.
• The percentage of physicians leaving the country is equal to 15%.
• If a driver is fatally injured in a car crash, there is a 0.35 probability that the driver was legally impaired.

ONE SAMPLE PROPORTION TEST

The One-Sample Proportion Test is used to assess whether a population proportion (P1) is significantly different from a hypothesized value (P0). The hypotheses may be stated in terms of the proportions, their difference, their ratio, or their odds ratio, but all four hypotheses result in the same test statistic.

ASSUMPTIONS:

1. The conditions for a binomial experiment are satisfied. That is, we have a fixed number of independent trials having constant probabilities, and each trial has two outcome categories, which we classify as “success” and “failure”.
2. The conditions n𝑝o ≥ 5 and n(1 − 𝑝o) ≥ 5 are both satisfied, so the binomial distribution of sample proportions can be approximated by a normal distribution with µ = np and σ = √(np(1 − p)).

HYPOTHESES


Note: 𝑝o is a specified value of the population proportion.

REJECTION REGION

TEST STATISTIC

z = (p̂ − 𝑝o) / √( 𝑝o(1 − 𝑝o) / n )

Note:

When conducting a test of a claim about a population proportion p, be careful to identify correctly the sample proportion p̂.

1. The sample proportion p̂ is sometimes given directly (e.g., “10% of the observed sports cars are red” is expressed as p̂ = 0.10).
2. In other cases, we may need to calculate the sample proportion by using p̂ = x / n.

Example: “96 surveyed households have cable TV and 54 do not.” We can first find the sample size n to be 96 + 54 = 150; then we can calculate the value of the sample proportion of households with cable TV as follows:

p̂ = x / n = 96 / 150 = 0.64

Example:

250 housewives were randomly selected and asked whether they prefer purchasing fish from supermarkets or from wet (public) markets. If 114 of them preferred supermarkets, is there evidence at the 5% level of significance to suggest that the proportion of housewives throughout the city who prefer supermarkets exceeds 40%?

Solution:

We first need to check that n𝑝o ≥ 5 and n(1 − 𝑝o) ≥ 5 to determine if the binomial distribution can be approximated by the normal distribution.

n𝑝o = 250(0.40) = 100 ≥ 5 and n(1 − 𝑝o) = 250(0.60) = 150 ≥ 5

The assumption is satisfied.

Exercises:

1. Kate Flower, President of Kate and Edith Cake Company, says that the mean number of cakes sold daily is 1,500. An employee wants to test the accuracy of Kate's claim. A random sample of 36 days shows that the mean daily sales were 1,450 cakes. Use a level of significance of 0.01 and assume σ = 120 cakes. What should the worker conclude?

2. Juanita Lopez, a production supervisor at a chemical company, wants to be sure that the Super-Duper can is filled with an average of 16 oz of product. If the mean volume is significantly less than 16 oz, customers will likely complain, prompting undesirable publicity. The physical size of the can doesn’t allow a mean volume significantly above 16 oz. A random sample of 36 cans shows a sample mean of 15.7 oz. Assuming σ is 0.2 oz, conduct a hypothesis test with α = 0.01.

3. We want to compare fasting serum cholesterol levels of Filipino women to those of American women. Assume the cholesterol levels of 20 to 39 year old women in the United States are normally distributed with µ = 90 mg/dl. Blood tests performed on 19 female Filipinos in this age range rendered a sample mean cholesterol level of 181.52 mg/dl and a standard deviation of 40 mg/dl. Conduct a test of hypothesis to determine whether Filipino women have a lower average cholesterol level than their American counterparts. Use alpha = 0.05.

4. In a study of air-bag effectiveness, it was found that in 821 crashes of midsize cars equipped with air bags, 46 of the crashes resulted in hospitalization of the drivers. Use a 0.01 level of significance to test the claim that the air-bag hospitalization rate is lower than the 7.8% rate for crashes of midsize cars equipped with automatic safety belts.

5. Suppose that the teacher of a school claims that the average weight of the student population is greater than 140 lb, and we desire to test the truth of this claim. We have a random sample of the weights of 6 students from the student population. Use a 0.10 level of significance.
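As a check on the one-sample proportion test, the housewives example above (114 of 250 prefer supermarkets, claim that the proportion exceeds 40%, α = 0.05) can be sketched as follows. The sketch assumes the usual statistic z = (p̂ − 𝑝o) / √(𝑝o(1 − 𝑝o) / n) with a right-tailed rejection region; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(x, n, p0):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n), with p_hat = x / n."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

x, n, p0, alpha = 114, 250, 0.40, 0.05

# Normal approximation check: n*p0 >= 5 and n*(1 - p0) >= 5
normal_ok = n * p0 >= 5 and n * (1 - p0) >= 5

z = one_sample_proportion_z(x, n, p0)
critical = NormalDist().inv_cdf(1 - alpha)  # right-tailed: Ha is p > 0.40
reject = z > critical

print(f"p_hat = {x / n:.3f}, z = {z:.3f}, critical value = {critical:.3f}")
print("Reject Ho" if reject else "Fail to reject Ho")
```

With p̂ = 0.456 the statistic is about 1.81, which exceeds the critical value of about 1.645, so under these assumptions 𝐻o is rejected: there is evidence at the 5% level that more than 40% of housewives prefer supermarkets.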
CONSTRUCTING QUESTIONNAIRES

STEPS IN CONSTRUCTING QUESTIONNAIRE

1. Purpose
2. Pre-existing Questionnaire
3. Domains and Types of Questions
4. Consider the Audience
5. Write Questions
6. Ordering
7. Cover Letter, Instructions, and Layout
8. Pretest and Validation
