
COLLECTIONS OF STATISTICAL DATA AND SAMPLING

Classification of Data according to Source

1. Primary or Secondary Data


▪ Primary Data – data collected by the issuing or publishing organization
▪ Secondary Data – data originally compiled by an organization other than the one
issuing or publishing it.
2. Internal and External Data
▪ Internal Data – data obtained within the organization
▪ External Data – data obtained outside the organization

Methods of Data Collection

 There are four main methods of data collection.

1. Census.
 A census is a study that obtains data from every member of a population. In most
studies, a census is not practical, because of the cost and/or time required.

2. Sample survey.
 A sample survey is a study that obtains data from a subset of a population, in order to
estimate population attributes.

 Census or Survey Methods can be done using:

1. Direct or Interview method


 This is a method of person-to-person exchange between the interviewer and the
interviewee.
 This method provides consistent and more precise information, since
clarification may be given by the interviewee.

2. Indirect or Questionnaire Method


 In this method, written responses are given to prepared questions.
 Respondents may feel a greater sense of freedom to answer the questions.

Questionnaire – a list of questions intended to elicit answers to the problems of the study.
 Questionnaires can be mailed or hand-carried.
 The common drawback of this method is non-response to the questionnaire
if it is mailed.

3. Registration Method
 This method of gathering information is enforced by certain laws.
 The advantage of this method is that information is kept systematized and made
available to all because of the requirement of the law.
o Examples: registration of births, deaths, motor vehicles, and licenses.
3. Experiment.
 This method is used when the objective is to determine the cause-and-effect
relationship of certain phenomena under controlled conditions.
 Scientific researchers usually use this method.

4. Observational study.
 In this method, the investigator observes the behavior of the persons or organizations
and their outcomes.
 It is usually used when the subjects cannot talk or write.
 This method makes it possible to record the behavior at the appropriate time and
situation.

Survey Sampling Method

 refers to the way that observations are selected from a population to be in the sample for
a sample survey.

Population Parameter vs. Sample Statistic

 The reason for conducting a sample survey is to estimate the value of some attribute of a
population.
1. Population parameter.
▪ A population parameter is the true value of a population attribute.
2. Sample statistic.
▪ A sample statistic is an estimate, based on sample data, of a population
parameter.
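
To make the distinction concrete, here is a minimal Python sketch. The population is hypothetical (1,000 made-up values generated only for illustration), and the names `population`, `population_mean`, and `sample_mean` are assumptions introduced for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 made-up measurements.
population = [random.gauss(50, 10) for _ in range(1000)]

# Population parameter: computed from every member (what a census would give).
population_mean = statistics.mean(population)

# Sample statistic: computed from a subset and used to estimate the parameter.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values will usually be close but not identical; the gap between them is the sampling error that a sample survey accepts in exchange for lower cost.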

Probability vs. Non-Probability Samples

 As a group, sampling methods fall into one of two categories.


1. Probability samples.
▪ With probability sampling methods, each population element has a known (non-
zero) chance of being chosen for the sample.
2. Non-probability samples.
▪ With non-probability sampling methods, we do not know the probability that
each population element will be chosen, and/or we cannot be sure that each
population element has a non-zero chance of being chosen.

Non-Probability Sampling Methods

 Two of the main types of non-probability sampling methods are voluntary samples and
convenience samples.

 Voluntary sample.
▪ A voluntary sample is made up of people who self-select into the survey. Often,
these folks have a strong interest in the main topic of the survey.
 Convenience sample.
▪ A convenience sample is made up of people who are easy to reach.

Probability Sampling Methods

 The key benefit of probability sampling methods is that, because each element's chance of selection
is known, the sample can be expected to represent the population and the sampling error can be
estimated. This supports valid statistical conclusions.

1. Simple random sampling.


▪ Is the process of selecting a sample, giving each sampling unit an equal chance
of being included in the sample.
2. Systematic sampling.
 Is a method of selecting a sample by taking every kth unit from an ordered population.

3. Stratified sampling.
 Is a method of selecting a sample where the population is divided or stratified into more
or less homogeneous sub-populations or strata before sampling is done.
 In stratified sampling, the groups are called strata.
4. Cluster sampling.
 Is a method of selecting a sample of distinct groups or clusters of smaller units called
elements.
 Similar to stratified sampling in that the population is grouped into sub-populations, but in
cluster sampling, these groups are heterogeneous so that each cluster is representative
of the population.

Sample size
 The appropriate number of data points that must be drawn from the population.
 The sample size must not be so small that the results become unreliable, nor so
large that the cost of sampling becomes excessive.
 The sample size must be determined correctly.

How to compute the sample size

 Sample size is usually computed by this formula:

$$n = \frac{N}{1 + N e^{2}}$$

where
n = sample size
N = population size
e = desired margin of error (percent allowance of non-precision), written as a decimal (e.g. 10% = 0.10)
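
As a quick illustration, the sample size formula used in these notes, n = N / (1 + N·e²) (often called Slovin's formula), can be written as a small Python function. This is a sketch: the function name `sample_size` is hypothetical, and rounding to the nearest whole number is an assumption chosen to match the review example later in the notes (some references round up instead).

```python
def sample_size(N: int, e: float) -> int:
    """Return n = N / (1 + N * e**2), rounded to the nearest whole unit.

    N is the population size; e is the margin of error as a decimal (10% -> 0.10).
    """
    if N <= 0:
        raise ValueError("population size N must be positive")
    if not 0 < e < 1:
        raise ValueError("margin of error e must be a decimal between 0 and 1")
    return round(N / (1 + N * e ** 2))


print(sample_size(250, 0.10))  # 250 / 3.5 = 71.43, which rounds to 71
```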

How to use the Table of Random Numbers

1. Determine how many digits are in your population size, since the numbers you read must be able to
reach every unit from 1 to N.

e.g.
N = 150, then it has 3 digits
2. Randomly select the starting row from the table and the starting 3 consecutive columns.
3. Read the numbers either downward or upward.
4. Select and list only those numbers that are less than or equal to your population size until you
have n listed numbers.
Alternatively, you can use a random number generator instead of the table.

When to use Random Sampling?

 If the population is not widely spread geographically.


 If the population is more or less homogeneous with respect to the characteristics under
study.

Steps in conducting a Simple Random Sampling

Step 1
 It is a prerequisite in Simple Random Sampling to have a list of the population units,
numbered from 1 to N (N = population size).
Step 2
 Compute the sample size n.
Step 3
 Generate n (n = sample size) distinct random numbers from 1 to N.
Step 4
 Select the population units whose numbers correspond to the n generated random
numbers. The n selected population units now become your n data points.
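
The steps above can be sketched in Python. This is a minimal illustration, assuming Python's `random` module plays the role of the table of random numbers; the function name `simple_random_sample` and the list of 250 employee labels are hypothetical.

```python
import random


def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement, each unit having an equal chance."""
    rng = random.Random(seed)
    N = len(units)  # the list itself numbers the units from 1 to N
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    # Generate n distinct random numbers between 1 and N.
    numbers = rng.sample(range(1, N + 1), n)
    # Select the units whose list numbers were drawn.
    return [units[i - 1] for i in numbers]


# Hypothetical population list of 250 units.
employees = [f"employee_{i}" for i in range(1, 251)]
chosen = simple_random_sample(employees, 71, seed=1)
print(len(chosen), chosen[:5])
```

Passing a `seed` only makes the draw repeatable for checking; omit it for an actual selection.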

When to use Systematic Sampling?

 If the ordering of the population is essentially random.
 If there is slight stratification in the population.
 When stratification with numerous strata is used.

Steps in conducting a Systematic Sampling

Step 1
 Number the units in the population consecutively from 1 to N (N = population size)

Step 2
 Determine k, the sampling interval, by the formula

$$k = \frac{N}{n}$$

where N = population size and n = sample size.
Step 3
 Use the table of random numbers to choose r, the first unit of the sample.
Step 4
 Consider the list in a circular manner. Starting from r, count k units. That is the 2nd unit
of the sample. Count another k and that is the 3rd unit of the sample. Repeat until you
get n sample units.
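
The circular counting described in the steps above can be sketched in Python. This is a minimal illustration under two assumptions that the notes do not state: the sampling interval is k = N / n rounded down to a whole number, and Python's `random` module stands in for the table of random numbers. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(N, n, seed=None):
    """Return n unit numbers (from 1 to N) chosen by circular systematic sampling."""
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and N")
    rng = random.Random(seed)
    k = N // n             # sampling interval, rounded down (assumption)
    r = rng.randint(1, N)  # random start: the first unit of the sample
    # Count k units at a time, wrapping around the end of the list.
    return [((r - 1 + j * k) % N) + 1 for j in range(n)]


picks = systematic_sample(250, 71, seed=3)
print(picks[:5])
```

Because n × k never exceeds N when k is rounded down, the walk around the list does not revisit a unit before n units have been collected.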

When to use Stratified Sampling?

 If the distribution of the characteristic under consideration is very irregular across
different parts of the population.
 If precise estimates are desired for certain parts of the distribution.
 If sampling problems differ in the various sections of the population.

Steps in conducting a Stratified Sampling

Step 1
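 Group or stratify your population into s groups or strata based on homogeneity, so that the
units within each stratum are similar to one another while the strata differ from each other.
Because every stratum is represented, the units in the sample as a whole will be heterogeneous.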
Step 2
 Determine sample size, n, using the previous formula.
Step 3
 Determine the number of sampling units, n_i, to be drawn from each stratum by using this
formula

$$n_i = n \cdot \frac{N_i}{N}$$

where n = n_1 + n_2 + … + n_s and
N = N_1 + N_2 + … + N_s
Step 4
 Use either the table of random numbers or systematic sampling in drawing the n_i units
of the sample from each stratum.
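
A minimal Python sketch of these steps follows, assuming the proportional allocation rule n_i = n × N_i / N and simple random draws within each stratum. The strata "A", "B", and "C" and the function names are hypothetical. Note that rounding each n_i separately does not always add up to exactly n, so the total should be checked.

```python
import random


def proportional_allocation(strata_sizes, n):
    """Allocate n across strata in proportion to stratum size: n_i = n * N_i / N."""
    N = sum(strata_sizes.values())
    return {name: round(n * N_i / N) for name, N_i in strata_sizes.items()}


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample of n_i units from each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}


# Hypothetical strata with 120, 60, and 20 units.
strata = {
    "A": [f"A{i}" for i in range(120)],
    "B": [f"B{i}" for i in range(60)],
    "C": [f"C{i}" for i in range(20)],
}
drawn = stratified_sample(strata, 50, seed=5)
print({name: len(units) for name, units in drawn.items()})  # {'A': 30, 'B': 15, 'C': 5}
```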

When to use Cluster Sampling?

 Clustering is used rather than individual selection when the lower cost per element more than
compensates for its disadvantages.
 If the population can be grouped into clusters where individual population elements are known
to be different with respect to characteristics under study.

Steps in conducting a Cluster Sampling

Step 1
 Group your population into s clusters. Unlike strata, the units within each cluster
should be heterogeneous, so that each cluster is representative of the population.
Step 2
 Determine sample size, n, using the previous formula.
Step 3
 Determine the number of sampling units, n_i, to be drawn from each cluster by using this
formula

$$n_i = n \cdot \frac{N_i}{N}$$

where n = n_1 + n_2 + … + n_s and
N = N_1 + N_2 + … + N_s
Step 4
 Use either the table of random numbers or systematic sampling in drawing the n_i units
of the sample from each cluster.

Review Example:
 The local office of an international aviation company has the following classifications:

Classification    Number of employees
Whites            182
Blacks            51
Orientals         17

Select a stratified random sample of n employees (e = 10%) using proportional allocation. How
large a sample must be taken from each stratum?

Solution:

1. Problem: Solve nWhites, nBlacks, nOrientals

2. Determine/solve the following:


 NWhites = 182, NBlacks = 51, NOrientals = 17

 N = NWhites + NBlacks + NOrientals


= 182 + 51 + 17
= 250

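$$
\begin{aligned}
n &= \frac{N}{1 + N e^{2}} = \frac{250}{1 + 250(0.1)^{2}} \\
  &= \frac{250}{1 + 250(0.01)} = \frac{250}{1 + 2.5} = \frac{250}{3.5} \\
  &= 71.43 \approx 71
\end{aligned}
$$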
3. Solve for nWhites, nBlacks, nOrientals

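$$n_{\text{Whites}} = \frac{(71)(182)}{250} = \frac{12922}{250} = 51.69 \approx 52$$

$$n_{\text{Blacks}} = \frac{(71)(51)}{250} = \frac{3621}{250} = 14.48 \approx 14$$

$$n_{\text{Orientals}} = \frac{(71)(17)}{250} = \frac{1207}{250} = 4.83 \approx 5$$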
Check Answer:
n = nWhites + nBlacks + nOrientals
71 = 52 + 14 + 5
71 = 71
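
The arithmetic in this solution can be checked with a few lines of Python. This is only a verification sketch: the variable names are hypothetical, and rounding with the built-in `round` is an assumption that matches the nearest-whole-number rounding used in the solution.

```python
# Stratum sizes from the review example.
N_strata = {"Whites": 182, "Blacks": 51, "Orientals": 17}
e = 0.10  # 10% margin of error, written as a decimal

N = sum(N_strata.values())       # total population size
n = round(N / (1 + N * e ** 2))  # overall sample size: 250 / 3.5

# Proportional allocation: n_i = n * N_i / N for each stratum.
allocation = {group: round(n * N_i / N) for group, N_i in N_strata.items()}

print(N, n, allocation)
# prints: 250 71 {'Whites': 52, 'Blacks': 14, 'Orientals': 5}
```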

EXPERIMENTAL DESIGN

Experiment
 n., a test or trial
 v., the process of testing
 an operation or procedure carried out under controlled conditions in order to discover
an unknown effect or law, to test or establish a hypothesis, or to illustrate a known law
(www.merriam-webster.com/dictionary/experiment)

In an experiment, a researcher manipulates one or more variables, while holding all other variables
constant. By noting how the manipulated variables affect a response variable, the researcher can test
whether a causal relationship exists between the manipulated variables and the response variable.

Parts of an Experiment

All experiments have independent variables, dependent variables, and experimental units.
 Independent variable.
o An independent variable (also called a factor) is an explanatory variable manipulated by
the experimenter.
o It is the variable that is intentionally changed in the experiment, such as the
temperature of the water in which an effervescent tablet was dissolved.
o Each factor has two or more levels, i.e., different values of the factor.
Treatments - Combinations of factor levels.
The table below shows independent variables, factors, levels, and treatments for a hypothetical
experiment.

                           Vitamin C
                    0 mg          250 mg        500 mg
Vitamin E    0 mg   Treatment 1   Treatment 2   Treatment 3
           400 mg   Treatment 4   Treatment 5   Treatment 6
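
Since treatments are simply all combinations of factor levels, the Vitamin C and Vitamin E layout can be enumerated with a short Python sketch. The dictionary `factors` and the numbering order (Vitamin E level first, then Vitamin C level) are assumptions chosen to match the table.

```python
from itertools import product

# Each factor (independent variable) with its levels.
factors = {
    "Vitamin E": ["0 mg", "400 mg"],
    "Vitamin C": ["0 mg", "250 mg", "500 mg"],
}

# A treatment is one combination of levels, one level per factor.
treatments = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, treatment in enumerate(treatments, start=1):
    print(f"Treatment {number}: {treatment}")
```

Two levels of one factor and three of the other give 2 × 3 = 6 treatments.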

 Dependent variable.
o It is the variable that responds to the changes in the independent variable.
o In an experiment, the dependent variable is the one that cannot be directly controlled by the
experimenter.
o It is the set of observations resulting from the application of the different treatments
(combinations of two or more levels of factors or independent variables).

Treatments (combinations of levels of Independent Variables/Factors)

Treatment      Factor levels                      Dependent Variable (Taste)
Treatment 1    Salt – 0 mg, Sugar – 0 mg          Observations on Experimental Units
Treatment 2    Salt – 0 mg, Sugar – 250 mg        Observations on Experimental Units
Treatment 3    Salt – 0 mg, Sugar – 500 mg        Observations on Experimental Units
Treatment 4    Salt – 400 mg, Sugar – 0 mg        Observations on Experimental Units
Treatment 5    Salt – 400 mg, Sugar – 250 mg      Observations on Experimental Units
Treatment 6    Salt – 400 mg, Sugar – 500 mg      Observations on Experimental Units

 Experimental units.
o The recipients of experimental treatments are called experimental units.
o The experimental units in an experiment could be anything - people, plants, animals, or
even inanimate objects.
Example:
In an experiment where you want to know the effects of different brands of fertilizer on the height
of plants, identify the independent variable, dependent variable, and the experimental units.
 IV: brands of fertilizer
 DV: height of plants
 EU: plants

Check: Identify the IVs, DVs, and EUs in the following experiments.
1. You are to conduct an experiment where you want to test if the amount of baking soda used in
the cupcake recipe affects the taste of the cupcakes.
2. You want to test if different levels of temperature and pressure affect the fuel
consumption of the machine.
3. You want to determine if the texture of the paper affects your coloring media consumption.

Characteristics of a Well-Designed Experiment

A well-designed experiment includes design features that allow researchers to eliminate extraneous
variables as an explanation for the observed relationship between the independent variable(s) and the
dependent variable. Some of these features are listed below.
 Control - Control refers to steps taken to reduce the effects of extraneous variables (i.e.,
variables other than the independent variable and the dependent variable). These extraneous
variables are called lurking variables.

 Control group. A control group is a baseline group that receives no treatment or a
neutral treatment. To assess treatment effects, the experimenter compares results in
the treatment group to results in the control group.
 Example: Testing the effect of a brand of lotion. One group is asked not to use any
lotion for 5 days, and the other group is given the brand of lotion to use for the
same 5 days. Every day, a test is conducted on both groups to find out if there
is skin discoloration.

 Placebo. Often, participants in an experiment respond differently after they receive a
treatment, even if the treatment is neutral. A neutral treatment that has no "real" effect
on the dependent variable is called a placebo, and a participant's positive response to a
placebo is called the placebo effect.
 Example: Testing the effect of a new brand of pain reliever. Two groups are
formed and both groups receive a treatment. One group is given the new brand
of aspirin and the other group a dummy medicine that has no effect on the
participant.
 Blinding. Of course, if participants in the control group know that they are receiving a
placebo, the placebo effect will be reduced or eliminated, and the placebo will not serve
its intended control purpose. Blinding is the practice of not telling participants whether
they are receiving the placebo.
 Example: Same as the placebo design, but participants are not told whether the medicine
they are taking is the real treatment or the placebo, which helps guard against
reactivity such as the Hawthorne effect.

Hawthorne effect - is a form of reactivity whereby subjects improve or modify an aspect of their
behavior being experimentally measured simply in response to the fact that they are
being studied, not in response to any particular experimental manipulation.

 Randomization. Randomization refers to the practice of using chance methods (random
number tables, flipping a coin, etc.) to assign experimental units to treatments. In this
way, the potential effects of lurking variables are distributed at chance levels (hopefully
roughly evenly) across treatment conditions.
 Example:
 15 students are asked to participate in a taste test of the best recipe for
the new cake flavor. There are 3 recipes, one with 1 tsp lime juice, one
with 3 tsp lime juice, and the 3rd with 5 tsp lime juice.
 The 15 students were randomly assigned to taste and rate one recipe.

 Replication. Replication is the repetition of an experimental condition so that the
variability associated with the phenomenon can be estimated.
 Example:
 To test how fast sugar dissolves in water depending on the
temperature (25 °C, 50 °C, 75 °C, 100 °C), you prepare 4 glasses of water at the
respective temperatures and record how many minutes it takes to
dissolve the sugar.
 In order to get a more reliable result, you repeat the experiment 2 more
times using the same set of conditions to see if the results are
consistent.
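
The randomization example above (15 students, 3 recipes) can be sketched in Python. This is a minimal illustration, assuming equal group sizes of 5 students per recipe, which the example does not state, and using a shuffle in place of a random number table. The function name `randomize` and the student labels are hypothetical.

```python
import random


def randomize(units, treatments, seed=None):
    """Randomly assign experimental units to treatments in near-equal groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    assignment = {treatment: [] for treatment in treatments}
    # Deal the shuffled units out to the treatments in turn.
    for position, unit in enumerate(shuffled):
        assignment[treatments[position % len(treatments)]].append(unit)
    return assignment


students = [f"student_{i}" for i in range(1, 16)]
recipes = ["1 tsp lime juice", "3 tsp lime juice", "5 tsp lime juice"]
groups = randomize(students, recipes, seed=42)

for recipe, tasters in groups.items():
    print(recipe, len(tasters))
```

Because the assignment is left to chance, traits such as a student's taste preferences are not systematically tied to any one recipe.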

Experimental design - a plan for assigning experimental units to treatment conditions.

Designs of an Experiment

Consider the following hypothetical experiment. Acme Medicine is conducting an experiment to
test a new vaccine, developed to immunize people against the common cold. To test the vaccine, Acme
has 1000 volunteers - 500 men and 500 women. The participants range in age from 21 to 70.
In this lesson, we describe two experimental designs - a completely randomized design and a
randomized block design - and we show how each design might be applied by Acme Medicine to
understand the effect of the vaccine, while ruling out confounding effects of other factors.

1. Completely Randomized Design

The completely randomized design is probably the simplest experimental design, in terms of data
analysis and convenience. With this design, participants are randomly assigned to treatments.

A completely randomized design layout for the Acme Experiment is shown in the table below. In this
design, the experimenter randomly assigned participants to one of two treatment conditions. They
received a placebo or they received the vaccine. The same number of participants (500) were assigned
to each treatment condition (although this is not required). The dependent variable is the number of
colds reported in each treatment condition. If the vaccine is effective, participants in the
"vaccine" condition should report significantly fewer colds than participants in the "placebo"
condition.

      Treatment
Placebo    Vaccine
500        500

A completely randomized design relies on randomization to control for the effects of extraneous
variables (age, sex, medical conditions). The experimenter assumes that, on average, extraneous factors
will affect treatment conditions equally; so any significant differences between conditions can fairly be
attributed to the independent variable.

2. Randomized Block Design


With a randomized block design, the experimenter divides participants into subgroups called blocks,
such that the variability within blocks is less than the variability between blocks. Then, participants
within each block are randomly assigned to treatment conditions. Because this design reduces variability
and potential confounding, it produces a better estimate of treatment effects. Blocks can be based
on factors that may affect the outcome of the experiment; by taking such a factor into account, the
variability within each block can be reduced.
The table below shows a randomized block design for the Acme experiment. Participants are assigned
to blocks, based on gender. Then, within each block, participants are randomly assigned to
treatments. For this design, 250 men get the placebo, 250 men get the vaccine, 250 women get the
placebo, and 250 women get the vaccine.

               Treatment
Gender    Placebo    Vaccine
Male      250        250
Female    250        250

It is known that men and women are physiologically different and react differently to medication.
This design ensures that each treatment condition has an equal proportion of men and women. As a
result, differences between treatment conditions cannot be attributed to gender. This randomized
block design removes gender as a potential source of variability and as a potential confounding
variable.

In this Acme example, the randomized block design is an improvement over the completely randomized
design. Both designs use randomization to implicitly guard against confounding. But only the
randomized block design explicitly controls for gender.
 Note: Blocks perform a similar function in experimental design as strata perform in sampling.
Both divide observations into subgroups. However, they are not the same. Blocking is associated
with experimental design, and stratification is associated with survey sampling.
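
To show how blocking changes the assignment step, here is a minimal Python sketch of a randomized block design for the Acme example. The volunteer records, the function name `randomized_block_design`, and the choice to split each block evenly are assumptions made for illustration.

```python
import random


def randomized_block_design(units, block_of, treatments, seed=None):
    """Group units into blocks, then randomly assign treatments within each block."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    design = {}
    for block, members in blocks.items():
        rng.shuffle(members)  # randomization happens separately inside each block
        # Every len(treatments)-th member of the shuffled block gets the same treatment.
        design[block] = {
            treatment: members[offset::len(treatments)]
            for offset, treatment in enumerate(treatments)
        }
    return design


# Hypothetical volunteer records: (gender, id) for 500 men and 500 women.
volunteers = [("Male", i) for i in range(500)] + [("Female", i) for i in range(500)]
design = randomized_block_design(volunteers, lambda v: v[0], ["Placebo", "Vaccine"], seed=7)

counts = {(block, t): len(group) for block, arms in design.items() for t, group in arms.items()}
print(counts)
```

A completely randomized design would shuffle all 1000 volunteers together, so the gender split in each condition would be left to chance; shuffling within blocks fixes it at 250 per cell.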

Problem Set:
1. Mando Rukot is a nutritionist who was hired to test the effect of a new brand of dietary
supplement on the weight of the consumer. There are 100 volunteers, composed of 50 males and 50
females. A blind placebo technique is used to control extraneous variables. Design the experiment
using:
a Completely randomized design
b Randomized block design

2. Barbie Que is a car dealer, and she wants to find out which of two types of car (a van or a
pick-up truck) appeals more to her customers. She decides to conduct a test using 150 volunteers
composed of the following classifications of customers: 50 office workers, 50 housewives, and 50
college students. She then lets them test drive each car and asks them about their preference (whether
they would buy it or not). Identify the independent variables, dependent variables, and experimental
units. Design the experiment using the design of your choice.