
1 | Gabino P. Petilos

INTRODUCTION TO
STATISTICS 1
1.1 WHY STUDY STATISTICS

Information in the form of statistics is so common that people from all walks of
life encounter it whenever they are exposed to mass media. Consider, for example,
the following:

• Figures released by the National Statistical Coordination Board (NSCB) showed
  that the total number of college graduates increased by only 2.9 percent in 2010
  and accounted for just 2 percent of those within the prime employable age bracket
  of 20 to 34.[1]

• According to Health Secretary Enrique Ona, in a press briefing, the mortality rate
  for Filipino mothers increased to 221 per 100,000 live births in 2011 from 162 per
  100,000 live births in 2009.[2]

• According to a PNP report, index crime cases were down by 20.01 percent in January
  this year, with 11,379 cases compared to the 14,227 cases listed in the same period
  in 2011.[3]

The availability of such information is made possible through the use of
statistics. In the field of agriculture, the application of statistics to the design and
analysis of agricultural experiments has led researchers to recommend varieties of
crops or animals as well as agricultural techniques that resulted in the abundance of our
food supply. In the field of medicine and pharmaceutical research, the methodology of
statistics is constantly used to determine the effectiveness of new drugs. Take for
instance how the “Lagundi” plant was discovered to be an effective cure for coughs, colds,
fever, flu, and other broncho-pulmonary disorders.[4] Its efficacy could not have been
established objectively without using statistics. In the field of education, the effectiveness
of alternative modes of teaching to improve learning among students is often
established with the aid of statistics. Statistics are also used (and probably also
misused) to demonstrate the superiority of one commercial product over another. The
Government uses statistics to improve the delivery of services in all sectors of society,
such as education, health, and public safety. It may be said
that the application of statistical techniques is so widespread, and the influence of
statistics on our lives is so great, that its importance cannot be over-emphasized.

[1] http://newsinfo.inquirer.net/186363/number-of-college-graduates-grew-by-only-2-9-in-2010
[2] http://newsinfo.inquirer.net/214829/maternal-mortality-rate-rose-in-2011-says-doh
[3] http://www.pia.gov.ph/news/index.php?keywords=crime%20rate.
[4] http://www.sulit.com.ph/index.php/view+topic/id/87150/Lagundi+-+Only+in+the+Philippines

According to J.T. McClave & F. Dietrich Jr. (1985) “people need to develop a
discerning sense of rational thought that will enable them to evaluate data so as to
make intelligent decisions, inferences, and generalizations”. Thus, our knowledge of
statistics would help us critically evaluate the information fed to us in the form of
statistics.

1.2 MEANING OF STATISTICS

The word statistics is usually associated with numerical facts or figures such as
the number of road accidents that happen per month, the crime rate in the country, the
number of passers in the CPA board examination, or the population of a
specified place. It could also be any numerical information such as grades of students in
an examination, measures of athletic performance, the Consumer Price Index, or the
unemployment rate. In this context, the entities that comprise the numerical
information are called data or statistics. This is the layman’s understanding of the word
statistics.

The study of the rules, techniques, or methods used to collect, present, analyze,
and interpret a set of numerical data is also called statistics. In this context, statistics is
a body of knowledge that deals with the collection, organization, and analysis of
numerical data to answer problems in experimental design and decision-making. This
is another meaning of the word “statistics”.

When only a part of the entire data set is analyzed, numerical measures derived
from this smaller set are also called statistics. In this context, the word can be singular
or plural in meaning. Thus, a single numerical measure derived from this smaller set is
called a statistic. Collectively, such measures are called statistics.

Tate (1955) has neatly summarized the different meanings of statistics as
follows:

“It’s all perfectly clear; you compute statistics (mean, median, mode, etc.) from
statistics (numerical facts) by statistics (statistical methods).”

1.3 SOME BASIC TERMINOLOGIES

The concepts of population and sample are important in statistics. In the
context of statistics and research, the term population refers to the totality of all
observations or entities of any sort with which we are concerned. These entities
could be animate or inanimate depending on the type of research being conducted.
Thus, in social science and educational research, the entities are people; in the field of
agriculture, the entities could be plants (diseases of abaca plants) or animals (diseases of
carabao); in the field of geology, the entities could be rocks!


A sample, on the other hand, is a subset or a part of a given population; hence
its size is generally smaller than the size of the population. Examples of populations and
the corresponding samples are as follows:

Population: All enrolled students at Leyte National High School
Sample: All first year students at Leyte National High School

Population: Newly born babies in a particular hospital
Sample: Newly born babies in the same hospital with abnormalities

Population: Public School Teachers in Region 8
Sample: Public School Teachers in the Division of Leyte

Population: Abaca plants in Region 8
Sample: Abaca plants in Southern Leyte

Another important pair of terms that you have to be familiar with is
parameter and statistic. A number that describes a characteristic of the population is
called a parameter. On the other hand, a number that describes a characteristic of a
sample is called a statistic. Thus, if there are two or more summary measures computed
from sample data, the measures collectively are called statistics, the third meaning of
the word statistics mentioned above. In general, we use a statistic calculated from a
sample in order to estimate a population parameter.

A parameter is constant but is generally not known because, more often than
not, sample data instead of population data are studied for practical reasons.
A statistic, on the other hand, varies from sample to sample. An easy way to differentiate
these two concepts is to remember that “a parameter is to population as a statistic is to
sample”.

Consider for instance the population of public school teachers in Region 8. Let
us consider the characteristic age of this population. Note that it is possible to compute
the average age or mean age of all teachers in Region 8. When computed, the mean
age is an example of a parameter, which is usually denoted by the symbol μ (Greek
letter mu). Even if we do not have this value at hand, we know that μ is fixed in value.
Thus, an important characteristic of a parameter is that it is generally not known, but its
value is fixed or constant since there is only one average age of all teachers in Region 8.

A sample from this population could be the teachers in the Division of Tacloban
City or the teachers in Leyte Division. Note that we can also talk about the average age
of the teachers in Tacloban City division. The average age from this sample is an
example of a statistic. This statistic is denoted by x̄ (read “x bar” or “sample mean”).
For any one sample the sample mean is a fixed number, but its value will differ from
sample to sample. Hence, we do not expect the average age of the teachers in Tacloban
City division to be the same as the average age of the teachers in Leyte Division. Either
the average age of the teachers in Tacloban City division or that of Leyte Division could
be used as an estimate of the unknown average age of all public school teachers in Region 8.
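
The distinction between a fixed parameter and a varying statistic can be sketched with a small simulation. The population of ages below is entirely made up (it is not data on actual teachers), and the names `population_ages`, `sample_a`, and `sample_b` are only illustrative.

```python
import random
import statistics

random.seed(8)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 10,000 teachers (made-up data, not real records).
population_ages = [random.randint(22, 64) for _ in range(10_000)]

# Parameter: the population mean (mu). There is only one such value.
mu = statistics.mean(population_ages)

# Statistics: sample means (x-bar) computed from two different samples of 200 teachers.
sample_a = random.sample(population_ages, 200)
sample_b = random.sample(population_ages, 200)
x_bar_a = statistics.mean(sample_a)
x_bar_b = statistics.mean(sample_b)

print(f"mu      = {mu:.2f}")
print(f"x-bar A = {x_bar_a:.2f}")
print(f"x-bar B = {x_bar_b:.2f}")
```

Each sample mean is close to μ but generally not equal to it, and the two sample means generally differ from each other. In practice μ would be unknown and either sample mean would serve as its estimate.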


As another example, consider the population of male adults in Region 8 who
smoke. The characteristic of interest is the proportion P of these individuals who have
symptoms of lung cancer. The measure P is another example of a parameter. The
value of P may not be known but we assume that this is constant. Now, if we consider
a sample of male adults in Tacloban City who also smoke, the proportion p of these
individuals with symptoms of lung cancer is an example of a statistic. We can use p as
an estimate of P. Note that the statistic p will also vary from sample to sample.
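
The same idea can be sketched for a proportion. The population below is simulated, and the 12% rate used to generate it is an arbitrary assumption chosen only for illustration; `sample_proportion` is a hypothetical helper, not a library function.

```python
import random

random.seed(2024)  # fixed seed so the sketch is reproducible

# Hypothetical population of 50,000 adult male smokers; True marks a person
# with symptoms. The 12% rate is an arbitrary assumption, not a real figure.
population = [random.random() < 0.12 for _ in range(50_000)]

# Parameter P: proportion with symptoms in the whole population.
P = sum(population) / len(population)

def sample_proportion(pop, n):
    """Return the proportion of True values in a simple random sample of size n."""
    sample = random.sample(pop, n)
    return sum(sample) / n

# Statistic p: computed from three different samples of 500 people each.
estimates = [sample_proportion(population, 500) for _ in range(3)]

print(f"P = {P:.4f}")
print("p from three samples:", estimates)
```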

1.4 DESCRIPTIVE AND INFERENTIAL STATISTICS

The subject matter of statistics is broadly divided into two areas, namely,
descriptive statistics and inferential statistics. Descriptive statistics comprises those
methods concerned with collecting and describing a set of data. The data could be
sample data or population data. The main objective of descriptive statistics is to
organize, summarize, or describe a set of numerical data to make it easy to comprehend
and therefore extract useful information.

When applied to sample data, the aim is to describe or present the data
without going beyond what is given by the data. Thus, when descriptive statistics are
applied to a smaller set of data instead of a larger set, we do not attempt to make
conclusions or inferences about the population from which this sample was drawn.
Generally, we are in the field of descriptive statistics when we present data in table
form or graphical form, or when we use summary values such as the mean or standard
deviation of the numerical data.
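
As a minimal sketch of descriptive statistics, the following summarizes a small set of made-up test scores with a mean, a standard deviation, and a frequency table. Nothing here goes beyond the ten scores themselves.

```python
import statistics
from collections import Counter

# Hypothetical test scores of 10 students (made-up data).
scores = [78, 85, 85, 90, 72, 85, 90, 66, 78, 81]

mean_score = statistics.mean(scores)   # arithmetic mean
sd_score = statistics.stdev(scores)    # sample standard deviation
frequency_table = Counter(scores)      # score -> number of students with that score

print(f"mean = {mean_score}, standard deviation = {sd_score:.2f}")
for score in sorted(frequency_table):
    print(score, frequency_table[score])
```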

On the other hand, inferential statistics comprises those methods concerned
with the analysis of a sample leading to conclusions, predictions, or inferences about
the entire population. An important feature of this area of statistics is that the results
of the analysis of the sample data are used to make conclusions beyond the
data being analyzed.

Usually, when making an inference about the entire population, a measure of
reliability of the inference is attached. This measure is called the confidence level, which
is usually expressed in terms of probability. For instance, clinical trials were
conducted to test the effectiveness of Lagundi extracts before the general conclusion was
made that the extract is 99.9% effective in curing coughs and other related ailments.
The number 99.9% is the measure of reliability of such an inference or general
conclusion.
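
One common way of attaching a confidence level to an inference is a confidence interval for a population mean. The sketch below uses made-up ages and a normal approximation; both the data and the choice of a 95% level are assumptions for illustration, not part of the discussion above.

```python
import math
import statistics

# Hypothetical sample: ages of 40 teachers (made-up data).
ages = [34, 41, 29, 52, 38, 45, 31, 47, 36, 40,
        43, 28, 55, 39, 33, 48, 37, 42, 30, 50,
        35, 44, 32, 46, 38, 41, 29, 53, 36, 40,
        39, 27, 49, 34, 43, 37, 45, 31, 51, 38]

n = len(ages)
x_bar = statistics.mean(ages)   # sample mean
s = statistics.stdev(ages)      # sample standard deviation

confidence = 0.95  # assumed confidence level
# z is the standard normal quantile leaving (1 - confidence) / 2 in each tail.
z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * s / math.sqrt(n)

print(f"{confidence:.0%} confidence interval for the mean age: "
      f"{x_bar - margin:.1f} to {x_bar + margin:.1f}")
```

The interval is a statement about the unknown population mean, which is what makes it inferential rather than descriptive.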


1.4 VARIABLE AND CONSTANT

Statistical thinking starts with an awareness and understanding that no two
things are alike and that variability is inherent to all things (Levine et al., 1995). One
may wonder why students post different scores on a given test; why job performance
varies from individual to individual; or why research productivity varies among tertiary
level schools.

For our purpose, we will define a variable as any characteristic of objects or
individuals that can take on different values for different members of the group under
study. Thus, for instance, characteristics of individuals such as I.Q., gender, height,
weight, political affiliation, religion, etc. are examples of variables. In general, a variable
that can take on numerical values is called a quantitative variable or numerical
variable. Examples of quantitative variables are I.Q., height, weight, income, etc. On
the other hand, a variable that can take on non-numerical values is called a qualitative
variable, attribute variable, or categorical variable. Examples of qualitative variables
are gender, political affiliation, religion, educational attainment, occupation, etc.

A constant is defined as a characteristic that assumes the same value for all
members of the group. For instance, nationality is a variable if we talk about a class in
statistics composed of international students. If the students in this class are typical
residents of Leyte, then nationality may not vary from student to student and hence it
becomes a constant. As another example, grade level is a variable when our
respondents involve pupils from Grade I up to Grade VI. However, if we consider only
pupils in Grade VI, this variable is rendered a constant.
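
Whether a characteristic is a variable or a constant depends on the group under study, which can be checked directly from the data. The records and the helper `is_constant` below are hypothetical and only illustrate the definition.

```python
# Hypothetical records for a class of Grade VI pupils (made-up data).
pupils = [
    {"name": "Ana", "grade_level": "VI", "height_cm": 142},
    {"name": "Ben", "grade_level": "VI", "height_cm": 150},
    {"name": "Cara", "grade_level": "VI", "height_cm": 138},
]

def is_constant(records, characteristic):
    """A characteristic is a constant if every member of the group has the same value."""
    return len({record[characteristic] for record in records}) == 1

print(is_constant(pupils, "grade_level"))  # every pupil is in Grade VI
print(is_constant(pupils, "height_cm"))    # heights differ from pupil to pupil
```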

Variables Classified According to Functional Relationship

In research, it is important to know the nature of the variables and how they
function in a particular study. We can classify variables according to three criteria,
namely, according to functional relationship, according to continuity of scale and
according to scale of measurement.

Variables generally differ in terms of how they are used in research. In terms of
functional relationship, variables are generally categorized as independent and
dependent. The variable of prime interest to the researcher is the dependent variable
which is presumed to be influenced by some other variables. On the other hand, a
variable that is thought to influence a dependent variable is called an independent
variable. The relationship between the dependent and independent variable is usually
indicated by a directional arrow or path from the independent variable to the
dependent variable as shown in Figure 1.1.

independent variable → dependent variable

Fig. 1.1


Thus, dependent and independent variables are distinguished by their temporal
sequence. In terms of a cause and effect relationship, the independent variable is
observed before the dependent variable. Hence the independent variable must come
first before any effect on the dependent variable can be ascertained.

To illustrate, a hypothetical study on the variables that influence absenteeism
rates of government workers could probably include, among others, marital status, level
of morale of the employee, organizational climate, and leadership style of the manager.
In this illustration, absenteeism rate is the main dependent variable while the rest of the
variables mentioned are independent variables.

Other classifications of variables according to functional relationship are
intervening variable, moderator variable, control variable, and nuisance variable.

Let us consider a very simple example to illustrate the idea of an intervening
variable. Have you ever wondered why work performance differs from teacher to
teacher? What could plausibly explain variation of work performance among teachers?
If this were an actual study, perhaps you would identify the age or level of motivation of
teachers as the plausible reasons why there is variation in work performance among
teachers. In this hypothetical situation, work performance, age, and level of motivation
are the variables of interest to us, with work performance as the dependent variable and
age and level of motivation as the independent variables. Note that the level of
motivation of the teachers could also be influenced by their age. Figure 1.2 shows the
possible relationship among these variables.

age → level of motivation → work performance

Fig. 1.2

By hypothesizing that level of motivation could be influenced by age, the role
played by level of motivation in this hypothetical study changes to being a dependent
variable in relation to age. Thus, we say that level of motivation is an intervening
variable in relation to age and work performance since it causally links these
two variables. In general, age does not influence the level of work performance directly.
Rather, age causes the level of motivation to come into existence, and the latter in turn
influences work performance of teachers. Thus, an intervening variable is an
intermediate variable that is produced by the main independent variable and which in turn
influences the dependent variable (Vockell, 1983).

As another example, if serving or not serving breakfast (independent variable)
to undernourished school children improves their academic performance (dependent
variable), then a possible intervening variable could be satisfaction of hunger. Note
that satisfaction of hunger is produced by serving breakfast, which in turn may influence
academic performance.


To understand the meaning of a moderator variable, we illustrate this concept
using the study of Labtic (1992), which focused on the effects of Peer Teaching on the
performance of the students in Physics. Thus, the independent variable of the study is
method of teaching with two levels, namely, the Peer Teaching Method and the Traditional
Method, while the dependent variable is the performance of the students in Physics in
terms of their scores in the Physics achievement test.

Note that the effect of method of teaching on the performance of the students
in Physics could differ depending on the students’ ability level. Above average students
may interact differently during peer teaching compared to below average students.
Hence, ability level is a moderator variable. Thus, a moderator variable is another
independent variable which influences (moderates) the impact of the main independent
variable on a dependent variable. Figure 1.3 illustrates the relationship between the
independent and dependent variable with a moderator variable.

                      Ability Level
                  (Moderator Variable)
                           |
                           v
Method of Teaching ------------------> Physics Performance
(Main Independent Variable)            (Dependent Variable)

Fig. 1.3

Unlike the intervening variable, a moderator variable does not come into
existence because of the independent variable. Rather it already exists and is
deliberately included as another independent variable to determine in what ways the
impact of the main independent variable on the dependent variable differs when
different levels of the moderator variable are present. A student’s ability level is not a
result of method of teaching, hence ability level is not an intervening variable.

There are situations when an independent variable that is thought to influence a
dependent variable is not the focus of a given study. Its effect on the
dependent variable is eliminated or neutralized by making it constant. Such a variable is
called a control variable. Using the same study of Labtic (1992), instead of including
ability level as a moderator variable, the study could be conducted by including only
below average students. Hence, ability level has been controlled and is therefore
considered a control variable.

Finally, performance in Physics may be influenced by other variables inherent in
the students, such as their level of anxiety in Physics, which is beyond the control of the
researcher. Its influence on the dependent variable is assumed negligible and hence it is
not focused on or included in the study. Such a variable is an example of a nuisance
variable.


Variables Classified According to Continuity of Scale

Quantitative variables can be classified as either discrete or continuous.
Quantitative variables that can take on any value on the measurement scale are called
continuous variables. These variables represent numerical measurements on a
continuous dimension or scale and can take any numerical value within a continuum or
interval. Examples of continuous variables are height, age, weight, income,
temperature, etc. It is important to note that although age is usually reported to the
nearest unit, any value of age can occur within a specified range of values.

On the other hand, quantitative variables which take on only designated values
(finite or countable) are called discrete variables. The number of children in a particular
household is an example of a discrete variable. This variable cannot assume a value of ½
or 0.9. The only possible values that this variable can assume are 0, 1, 2, 3, 4, etc., which
are whole numbers. Other examples of discrete variables are number of
pregnancies, number of failing grades, number of times absent, and size of shoes (in
inches). Note that size of shoes can only take designated values such as 5, 5½, 6, 6½, 7,
7½, etc. Although it can take on a fractional part, there is no pair of shoes whose size in
inches is 5⅕ or 6⅛ or 7.2.

An important distinguishing feature between discrete and continuous variables is
that values of discrete variables are obtained from a process of counting and are
therefore generally restricted to whole numbers. On the other hand, values of a
continuous variable are obtained from a process of measuring, in which case the results
could be any real numbers.

Variables Classified According to Level of Measurement

Broadly defined, measurement is the assignment of numbers or codes to the
categories of a variable according to sets of predetermined (or arbitrary) rules (Elifson
et al., 1990). A common example is the measurement of temperature in degrees Celsius,
where 0 is used to indicate the freezing point of water while 100 is used to indicate the
boiling point of water. The rule for measuring temperature is standard (predetermined),
using either “degrees Celsius”, “degrees Fahrenheit”, or “degrees Kelvin”. These units of
measure are universally accepted. On the other hand, the variable marital status can
be measured by using an arbitrary rule such as assigning the number 1 if the
respondent is single, the number 2 if the respondent is married, and the number 3
otherwise.

There are four types of data which can arise when measuring a variable. The
type of data generated would depend on the scale or level of measurement used to
measure the variable. Adequately defined variables are generally classified into the
following four levels of measurement.


• Nominal Scale

The nominal scale is the simplest scale of measurement that establishes
equivalence or difference between the attributes of the objects or respondents. In this
scale, numbers are used merely as labels of the categories of the variable. The numbers
cannot be meaningfully ordered and arithmetic operations do not yield meaningful
results.

For instance, the variable sex can be measured using nominal scaling only. Since
there are only two categories of sex, we can assign (arbitrarily) 1 if the respondent is
male and 0 if the respondent is female. The numbers 1 and 0 are the labels of the
categories of sex. Note that we cannot meaningfully rank these numbers and say
that being “male” is better than being “female”.

Sex then is an example of a nominal variable. If there are n = 30 respondents,
the measures of sex would yield data consisting of 0’s and 1’s. Such data are called
nominal data and they are summarized by counting the number of 0’s and 1’s. Thus, if
the number of 0’s is 14, then we say that 14 of the respondents are female and
therefore the number of males is 16. The summary numbers 14 and 16 are called
frequency counts and are also referred to as nominal data.
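
The counting described above can be sketched as follows. The list `responses` is made up to match the example of 14 females and 16 males; the coding rule (1 = male, 0 = female) is the arbitrary one used in the text.

```python
from collections import Counter

# Arbitrary coding rule from the text: 1 = male, 0 = female.
labels = {1: "male", 0: "female"}

# Hypothetical responses from n = 30 respondents: 14 zeros and 16 ones.
responses = [0] * 14 + [1] * 16

# Nominal data are summarized by counting how often each code occurs.
counts = Counter(responses)
for code, label in labels.items():
    print(f"{label}: {counts[code]}")
```

Swapping the codes (0 = male, 1 = female) would change nothing except the labels, which is exactly why these numbers cannot be ranked or averaged in any meaningful way.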

As another example, the variable marital status is a nominal variable since the
scale of measurement that applies would be a nominal scale. If we assign a “1” for
single respondents, “2” for married respondents, and “3” for those who are neither
single nor married, then the numbers are just labels of the categories of this variable
and they cannot be meaningfully ordered or operated on arithmetically. We cannot say
that being married “is better” than being single even if 2 is greater than 1. Again, the
frequencies of 1’s, 2’s and 3’s that can arise from the measurement of marital status are
nominal data.

• Ordinal Scale

In this scale of measurement, the numbers serve also as labels of the categories
of a variable but, more than being labels, these numbers can be meaningfully ranked. To
illustrate, consider the variable socio-economic status (SES). We can assign the
numbers 3, 2, and 1 for the categories “High SES”, “Middle SES”, and “Low SES”,
respectively. Thus, a respondent whose measure of SES is 3 belongs to a family that is
better off compared to a respondent whose measure of SES is 2. However, we cannot
say that a respondent whose measure of SES is 2 is “twice as rich” as another
respondent whose measure of SES is 1. Thus, in the ordinal scale, although numbers can
be meaningfully ranked, we cannot talk about “equal differences” between the
successive categories of the variable.

As another example, consider the salary grade (SG) among government
employees. The measurement of salary grade ranges from 1 (lowest) up to 33 (highest).
These numbers are ordinal in nature since a person with a salary grade of 11 has more
monthly gross income than another person whose salary grade is 10 or lower.
In the full implementation of the Salary Standardization Law for all government
employees, the monthly salary of a person whose salary grade is 10 (Step 1) is P17,255
while that of another person whose salary grade is 5 (Step 1) is P12,019 per month.
Thus, we cannot say that a person whose salary grade is 10 has twice as much monthly
income as another person whose salary grade is 5.

In the first example above, the summary data for SES could be frequency
counts. On the other hand, if only salary grades are considered, these data are not
frequency counts. Salary grade is an example of ordinal data.

The ranks assigned to the pupils based on their overall average grades are
another example of ordinal data. We highlight this difference because the statistical
treatment would differ for ordinal data and frequency counts even if the scales of
measurement are both ordinal.

• Interval Scale

The interval scale possesses all the characteristics of the ordinal scale which
means that the numbers used in measuring the variable also serve as labels of the
categories of the variable and that these can also be meaningfully ranked. In addition,
the categories in the interval scale are defined in terms of a “standard unit of
measurement” so that equality of differences between successive categories of the
scale is defined as well as the operations addition and subtraction. In this scale, we can
assess not only which respondent ranks higher on the measure but also “how much
higher”.

The common example used to illustrate a variable which can be measured using
the interval scale is the variable temperature, say, in degrees Celsius. Here, the
standard unit of measure is “1 degree Celsius”. Each measure of temperature such as
100°C is a label of a category of temperature. Moreover, we can say that a temperature
of 30°C is 5°C higher than a temperature of 25°C. However, a temperature of 30°C is
not twice as hot as a temperature of 15°C. This is because the interval scale has no true
zero point. A true zero point is a point on the scale that indicates the complete absence
of the characteristic being measured. Thus, 0°C is not an absolute zero since it does not
indicate the absence of heat.

Methods of scaling many variables in the behavioral sciences such as ability,
attitudes, achievement scores, I.Q., anxiety, and other personality dimensions result in
scores that are more refined and detailed than ordinal scales. However, such scales still
do not fit the exact specifications for interval measurement. Consider, for instance, the
measurement of achievement using a multiple-choice test. The number of correct
answers is generally assigned as the score that represents the individual’s knowledge
level. The question that arises now is, “does each additional score unit on the scale
mean the same amount of knowledge”? If one individual received a score of 40 and
another received a score of 50, does the difference of 10 units represent the same
amount of knowledge separating another pair of scores such as 10 and 20? In the same
manner, does each “unit of measure” of attitude represent the same amount of
attitude?


Although there is no general agreement among researchers in the behavioral
sciences on how to treat measures of these variables, the practice of treating these
measures as “interval measurements” does not actually distort the results much
(Glasnapp & Poggio, 1985). Hence, these variables are treated as if they were interval
variables. Note that none of these variables has a true zero point. In
the case of I.Q., the measures follow a bell-shaped distribution with most scores ranging
from 90 to 109, which are interpreted as average I.Q. using the Binet Scale of Human
Intelligence.[5] Thus, the likelihood of a person having a zero I.Q. is almost nil. Even if a
person could have a zero I.Q., it would not mean that the person has no intelligence at
all. Hence, a measure of I.Q. equal to 0 is an arbitrary zero and not a true zero point.

• Ratio Scale

As mentioned earlier, the interval scale builds on the ordinal scale with an added
characteristic on equality of intervals. If such a scale has a true zero point, then we have
the highest scale of measurement called the ratio scale.

An example of a variable which can be measured in the ratio scale is temperature
in degrees Kelvin. By definition, 0 K already indicates the absence of heat (unlike 0°C or
0°F). Thus, we can say that a temperature of 80 K is twice as high as a temperature of
40 K. Other examples of variables which can be measured in the ratio scale are height,
weight, work experience, amount of time to finish a task, age, etc. These variables have
meaningful zero points.
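
The contrast between the interval and ratio scales can be checked numerically with the temperatures used earlier (30°C and 15°C). This is only an illustrative sketch; `celsius_to_kelvin` is a helper defined here, using the standard offset of 273.15.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins (0 K is absolute zero)."""
    return celsius + 273.15

t_high_c, t_low_c = 30.0, 15.0

# Differences are meaningful on both scales and agree with each other.
diff_c = t_high_c - t_low_c
diff_k = celsius_to_kelvin(t_high_c) - celsius_to_kelvin(t_low_c)

# The ratio of the Celsius readings depends on an arbitrary zero point ...
ratio_c = t_high_c / t_low_c
# ... while the ratio of the Kelvin readings is measured from a true zero.
ratio_k = celsius_to_kelvin(t_high_c) / celsius_to_kelvin(t_low_c)

print(f"difference: {diff_c} on the Celsius scale, {diff_k:.2f} on the Kelvin scale")
print(f"ratio of Celsius readings: {ratio_c}")
print(f"ratio of Kelvin readings:  {ratio_k:.3f}")
```

The Celsius readings have a ratio of 2, yet on the Kelvin scale the same two temperatures differ by only about 5 percent, which is why 30°C cannot be called “twice as hot” as 15°C.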

The ratio scale allows multiplication and division. Thus, for instance, we can say not
only that a person who is 120 cm tall is 10 cm taller than another person who is 110 cm
tall but also that this person is twice as tall as another person whose height is 60 cm. In
other words, we can always come up with a meaningful ratio between two measures of a
ratio variable.

We present in Table 1 a summary of the characteristics as well as the differences
among these four scales of measurement. A check mark indicates that the characteristic
is present in the scale while a cross mark indicates otherwise. As can be gleaned from
this table, a higher scale builds on the lower scale but with an additional property which
is not present in the lower scale.

[5] http://en.wikipedia.org/wiki/IQ_reference_chart


Table 1. Summary of Scales of Measurement

                                      Characteristics
Scale of       Numbers serve   Numbers can   Presence of       Presence of a
Measurement    as LABELS       be RANKED     EQUAL INTERVALS   TRUE ZERO POINT
Nominal        ✓               x             x                 x
Ordinal        ✓               ✓             x                 x
Interval       ✓               ✓             ✓                 x
Ratio          ✓               ✓             ✓                 ✓

Although there are four levels of measurement which can generate four different
sets of data, in statistics, we do not distinguish between interval data and ratio data
because these types of data can be analyzed using the same statistical techniques. Thus,
when we tabulate any set of data gathered, we will consider only nominal data
(represented by frequency counts), ordinal data (represented by data that are inherently
ranks), and interval/ratio data (which we will represent by scores). In the succeeding
topics, we shall be referring always to these classifications especially when choosing the
appropriate statistical technique to analyze the data gathered.

Some issues about the treatment of data are worth mentioning in this discussion.
For instance, a set of data obtained from measurement using a lower scale of
measurement cannot be upgraded to a higher scale. If the data gathered are frequency
counts, these summary data cannot be treated as ranks or scores. Hence, applying a
statistical technique appropriate for data gathered using an interval scale cannot be
justified when applied to nominal data. This error is often committed in the Analysis of
Variance, where frequency counts are treated as if they were scores.

On the other hand, data obtained using a higher scale of measurement may be
downgraded to a lower scale. For instance, scores which are interval data may be ranked
and the corresponding ranks become ordinal in nature. Moreover, scores may also be
classified as pass or fail, so that the resulting data are the frequency counts
associated with these two categories and are therefore regarded as nominal data.
Although downgrading of data may be done, the practice is not encouraged because
statistical tests that are appropriate for interval data are more powerful than statistical
tests that are appropriate for ordinal or nominal data. Downgrading of data is justified
only when the statistical tool intended for analyzing the gathered data is not valid
because some of its assumptions are violated.
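
The downgrading described above can be sketched in a few lines of Python. The scores and the passing mark of 75 are hypothetical values chosen only for illustration, and the ranking assumes there are no tied scores.

```python
# Hypothetical interval data: exam scores of five students (no ties assumed).
scores = [88, 92, 75, 60, 81]

# Downgrade to ordinal data: rank 1 goes to the highest score.
ordered = sorted(scores, reverse=True)
ranks = [ordered.index(score) + 1 for score in scores]

# Downgrade to nominal data: frequency counts of pass/fail.
PASSING_MARK = 75  # assumed cut-off, not taken from the text
counts = {
    "pass": sum(score >= PASSING_MARK for score in scores),
    "fail": sum(score < PASSING_MARK for score in scores),
}

print(ranks)   # ranks in the same order as the scores
print(counts)  # frequency counts for the two categories
```

Note that the reverse direction is not possible: the counts alone cannot be turned back into ranks or scores.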

1.5 SAMPLING TECHNIQUES

This section briefly discusses the idea of sampling and some techniques of
drawing a sample, a topic that is important since the extent to which generalizations can
be made from the results of a research study depends much on the sample size and the
appropriateness of the sampling technique used. If a sample does not represent the
population from which it is drawn, making generalizations to the entire population may
not be warranted. Hence the purpose of sampling is to ensure that the sample at hand is
representative of the population.

There are several reasons why a researcher would resort to a sample instead of
studying the entire population. We state briefly some advantages of sampling.

• Cost. Conducting research involves money, from data gathering up to the
analysis of data. Hence, using a sample will be less costly than using the
entire population.

• Time. There are instances when data are urgently needed in order to make a
decision whether or not to implement a program or to start a business.
Using sample data instead of population data will make the results timely
since less time is needed to gather and present the information required
for decision making.

• Accuracy. Population data may be so large that there is more chance of
committing errors in tabulating and encoding the data, which may yield
results that are less accurate than expected. On the other hand, a
well-selected sample means less data to deal with, so that it is easier to
monitor and correct errors committed in data gathering, coding and encoding.

• Feasibility. There are situations when it may not be feasible to involve the
entire population when conducting a study or experiment. For instance,
testing the life of a brand of light bulb until it burns out may not be feasible
since it would mean destroying all manufactured light bulbs. A sample of
light bulbs could well serve the purpose for this kind of experiment.

• Scope of information. Using a sample instead of a population for a given
study may allow the researcher to expand the scope of information that he
or she can gather because fewer individuals are involved in gathering the
desired information.

Sampling is the process of selecting individuals or entities from a given
population. Before sampling is done, the researcher usually prepares a list of all entities
or elements in the population, which may be people, households, organizations, or other
units of analysis. This list is called a sampling frame and the entities or individuals that
define the population are called sampling units.

Thus, for instance, if the population of the study involves the rank and file
employees of a particular government agency, the sampling frame will be the list
containing the names of all the rank and file employees of this agency. Each employee is
a sampling unit of this population. If the study involves research productivity of all public
universities in the Philippines, the sampling frame will be the list of all public universities
in the Philippines and each public university is a sampling unit.

The population of interest to the researcher must always be well-defined. A
population is well-defined if it is possible to tell exactly when a sampling unit is in the
population and when it is not. Also, the researcher must evaluate whether the population
is homogeneous or heterogeneous with respect to some characteristic. Knowing the
homogeneity or heterogeneity of the population is important since it can help the
researcher in deciding the size of the sample to take from the population as well as the
technique of sampling. There is a saying that “you don’t have to eat the ox to know that
the meat is tough”. Thus, a small sample suffices if we know that the population is
homogeneous with respect to the characteristic of interest.

To illustrate the above ideas, consider the population of high school students in
Philippine Science High School, Eastern Visayas Campus. This population is an example of
a homogeneous population if we consider the ability level of the students as the
characteristic of interest. However, if we talk about the population of high school
students at Leyte National High School, this population could be heterogeneous in terms
of this same variable since this school admits students with different ability levels.

Once the population is defined, the next thing to do is to decide on the sample
size. Researchers usually face the problem of what sample size is adequate for their
particular study. According to Fraenkel & Wallen (1993), there is no
clear-cut answer to this problem because there is no single formula that can be used to
determine the sample size of a study. As a rule, however, if the population is
homogeneous, we may take a small sample but if the population is heterogeneous, we
have to take a large sample. How small is small or how large is large is of course an open
question.

For some types of research, the researcher can decide on the minimum sample
size needed that would allow reliable estimates of the parameter under consideration.
For instance, L. R. Gay (1991) offers the following suggestions:

1. For Descriptive Research, a certain percent of the population, say, 5%, 10%,
or 15% etc. may be used. The researcher may decide to take a small percent
of the population if the population size is large or a large percent if the
population size is small. However, if the population is heterogeneous, a
large sample size may be needed to come up with a sample that captures the
characteristics of the population.

2. For Correlation Research, a sample of at least 30 may be used.

3. For Experimental/Causal Comparative Research, at least 15 subjects per
group may be used.

Fraenkel and Wallen (1993), on the other hand, suggest the following minimum
numbers of subjects for a particular study:

1. For Descriptive Studies, a minimum of 100 respondents is essential;

2. For Correlation Studies, a sample of at least 50 is deemed appropriate to
establish the existence of a relationship;

3. For Experimental/Causal Comparative Research, at least 30 subjects per
group may be used.

Note that even among experts, there is no consensus as to the size of the sample
to take for similar types of studies.

From a statistical point of view, the sample size will depend on the cost and
variability of the data and the margin of error set by the researcher. An estimate will
probably differ from the true value because the data collected are from only some, but
not all, members of the population. The difference between the estimate and the true
value is called the sampling error, and the margin of error is a value that quantifies the
possible sampling error.

Instead of using a formula to determine the sample size of a study, one can refer
to a table such as the one presented in Table 2 developed by Krejcie and Morgan (1970)6.
The basis for coming up with the table is the formula published by the research division
of the National Education Association. This formula is given by

$$S = \frac{X^2 N P (1 - P)}{d^2 (N - 1) + X^2 P (1 - P)}$$

where S = the required sample size
X² = the table value of chi-square for 1 degree of freedom at the desired level of
confidence (3.841 for 95% confidence level)
N = the population size
P = the population proportion (assumed to be 0.50 since this would provide the
maximum sample size)
d = the degree of accuracy expressed as a proportion

Thus, for instance, if you want to estimate the sample size for a population of
5,000, enter Table 2 at N = 5,000 and read the corresponding sample size, which is
S = 357. According to Krejcie and Morgan (1970), the table is applicable to any
population.
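
As a check on the table lookup, the formula can also be evaluated directly. The sketch below is only an illustration: the function name is hypothetical, and it assumes d = 0.05 and rounding to the nearest whole number, choices that reproduce the Table 2 entries tried here.

```python
def krejcie_morgan(N, chi_square=3.841, P=0.50, d=0.05):
    """Sample size S for a population of size N (formula quoted in the text).

    chi_square and P follow the values given in the text; d = 0.05 and the
    rounding rule are assumptions made for this sketch.
    """
    numerator = chi_square * N * P * (1 - P)
    denominator = d ** 2 * (N - 1) + chi_square * P * (1 - P)
    return round(numerator / denominator)


for N in (100, 1000, 5000):
    print(N, krejcie_morgan(N))
```

For N = 5,000 the function returns 357, the same value read from the table.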

6
http://people.usd.edu/~mbaron/edad810/Krejcie.pdf

Table 2. Table for Determining Sample Size from a Given Population7

N S N S N S
10 10 220 140 1200 291
15 14 230 144 1300 297
20 19 240 148 1400 302
25 24 250 152 1500 306
30 28 260 155 1600 310
35 32 270 159 1700 313
40 36 280 162 1800 317
45 40 290 165 1900 320
50 44 300 169 2000 322
55 48 320 175 2200 327
60 52 340 181 2400 331
65 56 360 186 2600 335
70 59 380 191 2800 338
75 63 400 196 3000 341
80 66 420 201 3500 346
85 70 440 205 4000 351
90 73 460 210 4500 354
95 76 480 214 5000 357
100 80 500 217 6000 361
110 86 550 226 7000 364
120 92 600 234 8000 367
130 97 650 242 9000 368
140 103 700 248 10000 370
150 108 750 254 15000 375
160 113 800 260 20000 377
170 118 850 265 30000 379
180 123 900 269 40000 380
190 127 950 274 50000 381
200 132 1000 278 75000 382
210 136 1100 285 1000000 384
Note: N is population size; S is sample size

The table gives us only the minimum sample size that is deemed adequate
for a given study. As a general rule, one may take as large a sample as the research
budget allows since estimates of the unknown parameter become more accurate as
the sample size increases.

7
Krejcie & Morgan (1970), Educational and Psychological Measurement, 30, 607-610.

Once the sample size has been decided, the researcher may now draw the
sample from the population. In general, there are two methods of drawing a sample:
random sampling or probability sampling, and non-random sampling or non-probability
sampling. In probability sampling, each element in the population has a nonzero chance
of being included in the sample. In non-probability sampling, on the other hand, not all
elements are given a chance of being included in the sample. The discussion that follows
illustrates some of these random and non-random sampling methods.

1.5.1 METHODS OF PROBABILITY SAMPLING

A. Simple Random Sampling

In this method, the sample is selected by a process that not only gives each
element in the population a chance of being included in the sample but also makes the
selection of every possible combination of the desired number of cases equally likely.
This method is basic to all sampling designs since other methods of probability sampling
use the idea of simple random sampling. Simple random sampling may be carried out
using the lottery technique (fishbowl technique) or a table of random numbers such as
the one shown in Table 3.

Table 3. Table of Random Numbers


ROW    1-5     6-10    11-15   16-20   21-25   26-30   31-35
 1     74448   09121   41402   59881   57123   01324   93511
 2     72596   22826   74906   29120   70311   91023   06810
 3     33766   36048   79237   62861   87657   93038   27200
 4     78026   55417   61601   50081   18076   63888   33399
 5     32466   49177   63263   59481   54901   07010   66548

 6     19684   73930   75397   11314   55592   53879   00353
 7     06340   61972   70534   75770   04075   48104   98864
 8     62233   14929   34393   89954   66139   40817   03975
 9     97690   44489   07318   46659   86216   17860   26087
10     71004   94032   08385   75859   69723   15294   75141

11     33240   04951   52470   12940   59311   46026   91250
12     63424   29161   34022   24476   88320   63671   39501
13     09201   21005   86086   64584   84805   94555   08612
14     62508   93948   02246   44560   20278   36533   82728
15     73950   50052   98085   77419   05582   74623   76836

16     64384   11423   87413   04578   02628   04205   21177
17     07001   49355   80142   76956   24433   88255   08460
18     83465   24370   61672   30736   49784   02509   66651
19     14670   26021   02674   74548   66288   81122   82741
20     33609   40289   25447   30385   01951   78570   51862

21     02731   05969   74761   15727   04639   41574   96372
22     75468   20313   43643   49668   23863   84983   98440
23     79060   22658   82811   79127   44277   18917   90571
24     85168   81308   97130   87328   25347   92716   34631
25     37303   05508   17551   59310   60009   56026   54473

In simple random sampling, the researcher must prepare the sampling frame
where the sampling units are numbered consecutively. If the population size is
N = 1000, we renumber the units using only three-digit numbers so that every label has
the same number of digits. Thus, we can rename 1 as 000, 2 as 001, 3 as 002 and so on,
so that 1000 will be renamed as 999. Note that there are 1000 numbers from 000 up to
999.

To use the table of random numbers, one has to start at a row and column at
random. To determine the starting row or column, one may use the lottery method. For
instance, since there are 25 rows, we can choose a number from 1 to 25. If 7 is selected,
then we start reading the numbers at row 7. Similarly, since there are 35 columns, we
may also choose any number from 1 to 35. If number 10, say, is selected, then we have
to start at column 10. Note that the entry in row 7 and column 10 is 2. Since the
numbers in the list are three digit numbers, we have to consider three digit numbers by
appending the digits in columns 11 and 12. Thus, the first 3-digit number in the table
starting from row 7 is 270. Therefore, the respondent whose corresponding number is
270 is included in the sample.

We can read the other 3-digit numbers starting from 270 either row-wise or
column-wise. Thus, reading row-wise, the next numbers are 534, 757, 700 etc. If the
digits in row 7 are used up, then we continue with the digits in row 8 until we have the
desired sample size. The other option is to take the three-digit numbers starting from 270
downward. Thus, the next numbers after 270 are 934, 907, and so on down to 817 in the
bottom row. The next set of three-digit numbers will be obtained from the next three
columns (columns 13, 14, and 15) starting at the first row. This may be done until we are
able to identify the required sample size.
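
The row-wise reading described above can be mimicked with a short Python sketch. It uses rows 7 and 8 of Table 3 and the starting column 10 from the illustration; the function name is hypothetical, and skipping a label that has already been drawn is an assumption added here so that no respondent is selected twice.

```python
# Rows 7 and 8 of Table 3, typed in as they appear in the table.
ROW_7 = "06340 61972 70534 75770 04075 48104 98864"
ROW_8 = "62233 14929 34393 89954 66139 40817 03975"


def read_labels(rows, start_col, width, count):
    """Read `count` distinct labels of `width` digits, row-wise.

    start_col is the 1-based column of the first digit to use.
    """
    digits = "".join(rows).replace(" ", "")
    labels = []
    for i in range(start_col - 1, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if label not in labels:  # assumed rule: ignore repeated labels
            labels.append(label)
        if len(labels) == count:
            break
    return labels


first_labels = read_labels([ROW_7, ROW_8], start_col=10, width=3, count=8)
print(first_labels)
```

The first four labels produced are 270, 534, 757 and 700, matching the numbers read from the table by hand.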

Simple random sampling is usually applied when the population size is known
and the population is homogeneous with respect to the characteristic of interest to
the researcher. If the population is heterogeneous, simple random sampling is not
appropriate since the resulting sample may not be typical of the population.

B. Systematic Sampling with a Random Start r

Systematic random sampling is a method of sampling in which the sample is
drawn by taking every kth element of the population. If the population size N is known,
and the sample size n has been decided, then the value of k is obtained using the
equation

$$k = \frac{N}{n},$$

where k is called the sampling interval. The random start r could be any number from 1
up to k. Thus, for instance, if N = 1000 and n = 250, then k = 4. The random start is any
number from 1 up to 4 (drawn by lottery). If, for instance, r = 2 is selected, then the
respondent in the sampling frame with corresponding number 2 is included in the
sample. The next respondents are systematically identified by adding the sampling
interval k = 4 successively to the random start r = 2. Hence, the next respondents are
those whose numbers correspond to 6, 10, 14, etc. We are assured that we will have
n = 250 since 1,000/4 yields n = 250.
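
A minimal Python sketch of this procedure is given below. The function name is hypothetical, and the sketch assumes that n divides N exactly, as in the example with N = 1000 and n = 250.

```python
import random


def systematic_sample(N, n, r=None):
    """Return the n unit numbers selected by taking every kth unit.

    Assumes n divides N exactly. If no random start r is supplied,
    one is drawn from 1 up to k.
    """
    k = N // n
    if r is None:
        r = random.randint(1, k)  # random start between 1 and k inclusive
    return [r + i * k for i in range(n)]


sample = systematic_sample(N=1000, n=250, r=2)
print(sample[:4], len(sample), sample[-1])
```

With r = 2 the selected numbers begin 2, 6, 10, 14 and end at 998, giving exactly 250 units.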

What happens if n does not divide N? First regard the N units as arranged
round a circle and let k be the integer nearest to N/n. Referring to Table 2,
if N = 1000, the corresponding sample size is n = 278. Note that N/n ≈ 3.60, so that the
nearest integer is k = 4. We next select a random number between 1 and N = 1,000 and
take every 4th unit thereafter going round the circle (Cochran, 1977). Thus, if the chosen
number is 991, the next numbers are 995, 999, 3, 7, 11, etc. until we realize n = 278. In
this particular case k = 4 divides N = 1,000 exactly, so the count returns to unit 991 after
250 selections; units already chosen must then be skipped, for example by continuing
from a new random start, until 278 distinct units are obtained. This technique is
sometimes called circular systematic sampling.
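
The wrap-around rule can be sketched in Python as follows; the function name is hypothetical. The sketch reproduces the first few selections of the example and also counts how many distinct units the rule yields, since with k = 4 and N = 1,000 the count comes back to its starting unit after 250 selections.

```python
def circular_systematic(N, k, start, count):
    """Take every kth unit going round a circle of N units numbered 1..N."""
    return [((start - 1 + i * k) % N) + 1 for i in range(count)]


first_six = circular_systematic(N=1000, k=4, start=991, count=6)
distinct_units = len(set(circular_systematic(N=1000, k=4, start=991, count=278)))

print(first_six)
print(distinct_units)
```

Because only 250 distinct units are reached here, some further step (such as a new random start, which is a suggestion of this sketch and not part of the text) would be needed to complete a sample of 278.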

To apply systematic random sampling, the sampling frame must also be
constructed and the sampling units assigned consecutive numbers. To generate
a representative sample, there should be no systematic pattern in listing the individuals
that define the population. If, for instance, the population of the study consists of the
public elementary school teachers in Tacloban City Division, alphabetizing the
names of the teachers would probably yield a sampling frame where the sampling units
are randomly ordered in terms of the characteristics of the population.

C. Stratified Random Sampling

When the population is heterogeneous, simple random sampling may yield a
sample that is not typical of the population. In this situation, the option is stratified
random sampling. Stratified random sampling is a method of sampling wherein the
population is first divided into homogeneous groups called strata and then a random
sample is drawn from each stratum.

In Figure 1.4, suppose the population is represented by the objects inside the
box. The shapes of the objects represent the different categories of the characteristic or
variable of interest to the researcher. For instance, if the variable is ability level, a square
could represent “above average”, a circle could represent “average” and a triangle could
represent “below average”. If simple random sampling is used, there is a chance that the
sample will be composed mostly of circles and squares and therefore it will not represent
the entire population. Instead of drawing a sample directly from the population in
Figure 1.4, the population is first subdivided into homogeneous groups where the
students are grouped according to ability level as shown in Figure 1.5.

Fig. 1.4 Population of the Study (Heterogeneous)
Fig. 1.5 Population divided into Homogeneous groupings

If samples from each stratum of the population are drawn, the resulting sample
will be representative of the population since each stratum is represented. The issue
now is how many units to draw from each stratum. There are two methods of
allocating the sample among the strata, namely, equal allocation and proportional
allocation. Proportional allocation is usually recommended to avoid over-representation
or under-representation of some members in the population. Drawing the samples from
each stratum may be done using simple random sampling or systematic random sampling.

As an example, suppose we want to draw a random sample of n = 300 students
from a population of size N = 1000 students. Suppose further that out of 1000 students,
100 are above average, 700 are average and 200 are below average. If equal allocation
is used, we would select 100 students from each stratum to come up with the
desired sample size of n = 300. Note that this would mean that the students who belong
to the above average group will all be part of the sample while only some of the students
from the other groups will be a part of the same sample. The resulting sample may be
described as “biased” since the above average students are overrepresented while the
average students are underrepresented.

Using proportional allocation, the number of students drawn from each stratum
depends on the ratio of the desired sample size to the population size (n/N). This ratio is
called the sampling fraction. Thus, if N = 1000 and n = 300, the sampling fraction is
300/1000 = 0.30 or 30%. Hence, the number of students to be drawn from each
subgroup is obtained by multiplying the number of cases in the group by the sampling
fraction 30%. Table 4 presents the summary of the computations.

Table 4. Sample Size Determination using Proportional Allocation

Subgroup         Subpopulation Size    Subsample Size
Above Average    N1 = 100              n1 = 30%(100) = 30
Average          N2 = 700              n2 = 30%(700) = 210
Below Average    N3 = 200              n3 = 30%(200) = 60
Total            N = 1000              n = 300

Thus, using proportional allocation we only select 30 students from the above
average group, 210 from the average group and 60 from the below average group.
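
The computation in Table 4 can be sketched in Python as shown below. The function name is hypothetical. Rounding each subsample to the nearest whole number is an assumption of the sketch; in this example every product is already a whole number, but in general the rounded subsamples may not add up exactly to n.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a sample of size n to strata in proportion to their sizes."""
    N = sum(strata_sizes.values())
    # Each stratum gets (stratum size) x (sampling fraction n/N).
    return {name: round(size * n / N) for name, size in strata_sizes.items()}


strata = {"above average": 100, "average": 700, "below average": 200}
allocation = proportional_allocation(strata, n=300)
print(allocation)
```

The result is 30, 210 and 60 students for the three groups, which agrees with Table 4.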

D. Cluster Random Sampling

This technique is similar to stratified random sampling in that the
population is also subdivided into groups, here called clusters. One difference is that the
elements in each cluster are heterogeneous, and we assume that each cluster is
representative of the population. Another difference is that, instead of selecting
individuals from each cluster, a sample of clusters is selected. For instance, when
sampling cigarettes, we may consider each pack as a cluster and select packs instead of
individual sticks.

The selection of clusters is usually done using the lottery technique or the table
of random numbers. If a cluster is selected, all units in the cluster are included in the
sample. Cluster random sampling is usually applied when the population size is not
known and there is no available sampling frame.

To illustrate, suppose we want to conduct a survey of breastfeeding mothers in
Tacloban City. Assuming that a sampling frame is not available, we may secure the map
of Tacloban and divide the map into barangays. Since there are 138 barangays in
Tacloban City, these 138 barangays will serve as the clusters. Here, we assume that the
characteristics of breastfeeding mothers in each barangay are heterogeneous. To get the
sample of breastfeeding mothers, we randomly select a number of barangays, and the
breastfeeding mothers in the selected barangays will serve as the sample of the given
population.

Note here that we can only decide on the number of barangays to be selected
but not on the sample size. The final sample size will be determined only after identifying
all breastfeeding mothers in the randomly selected barangays.
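
The idea can be illustrated with a small Python sketch. The six barangays and the mothers listed in them are made-up placeholders, not real data, and the seed is fixed only so that the run can be repeated.

```python
import random

# Hypothetical clusters: each barangay maps to the mothers found in it.
clusters = {
    "Barangay A": ["A1", "A2", "A3"],
    "Barangay B": ["B1", "B2"],
    "Barangay C": ["C1", "C2", "C3", "C4"],
    "Barangay D": ["D1"],
    "Barangay E": ["E1", "E2", "E3"],
    "Barangay F": ["F1", "F2"],
}

rng = random.Random(2024)  # fixed seed, for a repeatable illustration
chosen = rng.sample(sorted(clusters), k=2)  # randomly select 2 clusters

# Every unit in a selected cluster enters the sample.
sample = [mother for name in chosen for mother in clusters[name]]

print(chosen)
print(len(sample))  # known only after the clusters have been drawn
```

Only the number of clusters (here 2) is fixed in advance; the sample size depends on which barangays happen to be selected.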

E. Multi-stage Random Sampling

This sampling technique follows the same procedure as cluster sampling but is
done in several stages. It is usually applied when the research activity covers a wide
geographical area.

For instance, a national study to determine the nutritional status of children aged
7 and below may be done by selecting 2 regions each from Luzon, Visayas and Mindanao
(first stage), selecting 2 provinces from each selected region (second stage), selecting 5
municipalities/cities from each selected province (third stage), selecting 10 barangays
from each selected municipality/city (fourth stage), and finally selecting 100 children
from each selected barangay (fifth and last stage). The selection of units at each
stage is usually done using simple random sampling. Thus the sampling technique is
called multi-stage random sampling. If all children aged 7 and below in the selected
barangays are included in the final stage, the technique is called multi-stage cluster
random sampling. Note that the sample size is determined only after sampling if cluster
sampling is applied in the last stage. In this illustration, the sample size is predetermined
because it is decided that only 100 children will be selected from each barangay in the
final stage. Since 2 regions are taken from each of the three island groups, the sample
size is 3 x 2 x 2 x 5 x 10 x 100 = 60,000.
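
The size of the final sample can be traced stage by stage, as in the Python sketch below. The sketch assumes that the 2 regions are taken from each of the three island groups (Luzon, Visayas and Mindanao), so that six regions enter the first stage; a product of 2 x 2 x 5 x 10 x 100 counts the children for a single island group only.

```python
# Number of units taken at each stage, following the illustration.
stages = [
    ("island groups (all three are used)", 3),
    ("regions per island group", 2),
    ("provinces per region", 2),
    ("municipalities/cities per province", 5),
    ("barangays per municipality/city", 10),
    ("children per barangay", 100),
]

total = 1
for label, count in stages:
    total *= count
    print(f"{label}: {count} -> {total} units so far")
```

Under this assumption the study covers 6 regions, 12 provinces, 60 municipalities/cities and 600 barangays, for a total of 60,000 children.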

1.5.2 METHODS OF NON-PROBABILITY SAMPLING

There are situations when samples are deliberately chosen instead of being
randomly selected. This practice leaves some elements of the population with zero
chance of being selected, and hence does not generally yield a sample that is
representative of the population. When the objective of a research study is to make
generalizations about the population under investigation, non-probability sampling
should not be used. This does not mean that non-random sampling is unimportant, since
it is deemed satisfactory in some situations, especially in market research.

Three major forms of non-probability sampling are convenience (accidental or
incidental), purposive (judgmental), and quota sampling.

A. Convenience Sampling (Accidental Sampling)

This sampling method is one in which the investigator simply reaches out and
takes the cases that are at hand, continuing the process until the sample reaches a
designated size. Convenience and availability are the primary criteria for sample
selection.

For instance, a company would like to know how the public reacts to its new
product. Using a telephone directory, the company conducts interviews of households to
get the information. Here, the sample is a convenience sample in that the company
selected individuals who happen to be listed in the telephone directory. Selecting the
sample using the telephone directory instead of a comprehensive list of households in
the area where the interview is supposed to be conducted is done as a matter of
convenience. The resulting sample may not represent the population of individuals who
patronize the product.

B. Purposive Sampling (Judgmental Sampling)

In this sampling technique, the representativeness of the sample is based solely
on the researcher’s judgment. The respondents are deliberately or purposively selected
because it is believed that they are the ones who can give the information sought
by the researcher. For instance, if the problem is to know the sentiments and feelings of
the students regarding the policies implemented by a university, the researcher may
purposively select the student leaders because he or she believes that these students are
the ones who are more knowledgeable about or exposed to the issues affecting the
students. The resulting sample may not represent the entire population of students in
the university.

C. Quota Sampling

This technique is usually practiced in market research, wherein field
interviewers are given a “quota” of the number of respondents to be interviewed. The
selection of respondents usually stops once the quota is reached. For instance, if a field
interviewer is assigned to interview 50 adults to determine the level of acceptability of a
certain product being promoted by a certain company, the field interviewer may just
position himself at a particular spot and interview the first 50 adults accessible to him.
Thus, the 51st adult and those who come after are no longer given the chance of being
included in the sample.

Exercise 1

1. Differentiate the following pairs of words:
a. population – sample
b. parameter – statistic
c. descriptive statistics – inferential statistics
d. independent variable – dependent variable
e. moderator variable – intervening variable

2. Which statement involves descriptive statistics and which involves inferential
statistics?
a. The grade point average of the M.M. students at LNU is 1.43.
b. Recent surveys indicate that 76% of adult Filipinos favor prayer in public schools.
c. A study revealed that Lagundi extract is more effective in curing coughs than a
similar product.
d. Marijuana usage was reported by 8% of the high school seniors at a particular
school.
e. Two hundred eighty-three people died of rabies last year.
f. A sample survey revealed that about 13% of Filipino adults are illiterate.
g. It is not safe to stay outside the house from 10:00 P.M. to 6:00 A.M.
h. Consumers generally prefer Colgate to any other toothpaste.
i. In the Philippines, the life expectancy of females at birth is 72.7 years.

3. For each given research question, identify the variables and classify each as either
dependent or independent.
a. Does CAI-aided instruction make pupils achieve better performance than the
conventional method of instruction?
b. Are single teachers less emotionally exhausted than married teachers?
c. Is there a significant difference between the level of morale among employees
exposed to autocratic and democratic types of management leadership?
d. Does frequency of classroom observation result in better achievement of pupils
in the National Achievement Test?
e. Is there a difference in the incidence of lung cancer between people who smoke
and people who do not smoke?

4. For each given situation, identify the specified variables.
a. Students who are below average tend to perform as well as the above average
group when taught using a Constructivist method of teaching than when taught
using the traditional lecture method.
Independent variable:_________________
Dependent variable:__________________
Moderator variable:__________________

b. Among male adults, the incidence of lung cancer among those who smoke and
drink is higher than among those who smoke but do not drink.
Independent variable:_________________
Dependent variable:__________________
Moderator variable:__________________
Control variable:_____________________

c. “Elementary school children who receive tokens for doing their math will do
fewer problems during their free time than those who do not receive tokens.
This effect will occur among children who initially like to do the problems, but
the opposite effect will occur among those who initially dislike the problems.
This effect will occur because a focus on extrinsic reinforcement reduces intrinsic
motivation.” 8
Independent variable:_________________
Intervening variable:_________________
Dependent variable:__________________
Control variable:____________________
Moderator variable:__________________

5. For each given variable, indicate whether it is a discrete or continuous variable.
a. Number of minutes it takes to read a page in this text
b. Level of anxiety about taking a test
c. Weight of the book
d. Number of references of a book
e. Number of times the letter appears on a page

6. Identify the most likely scale of measurement (nominal, ordinal, interval, or ratio).
a. Distance in feet between desks in an office
b. Size of family
c. Gender of workers in an office
d. Commuting time of workers in an office
e. Job classification of workers in an office
f. Score on a mathematics exam with 50 objective items
g. Level of sugar in the blood
h. Species of animals in Manila Zoo
i. Marital satisfaction
j. Grading system (1.0 down to 3.0)

7. Suppose that the following information is obtained from a student upon exiting from
the campus bookstore during the first week of classes. Identify whether each variable
is quantitative or qualitative.
a. Amount of time spent in the bookstore
b. Academic major
c. Grade point average
d. Method of payment
e. Number of textbooks purchased

8. Data from the study on the factors that affect mathematics learning contain values of
several variables for each respondent of the study. Classify each variable as
quantitative or qualitative.
a. Score on the Mathematics Performance Test
b. Prior grade in mathematics (average grade in three math subjects)
c. English proficiency level (score on the National Secondary Assessment Test)

8
Vockel, Edward L. (1983). Student Study Guide and Workbook. New York: Macmillan Publishing
Inc, p. 189.

d. Educational qualification (Undergraduate degree obtained)
e. Teachers’ Pedagogical training (with or without training)
f. Attendance in the SEDP training (attended or did not attend)
g. Number of in-service trainings attended
h. Parents’ occupation

9. The following questions were included in a recent series of student surveys. Indicate
the level of measurement for each question.
a. How satisfied are you with your life? (5 – Very Satisfied ...... 1- Very Unsatisfied)
b. What type of job do you expect to obtain when you graduate?
c. Estimate the amount of your study time spent effectively.
d. What is your current grade point average?
e. What language do you speak at home?
f. How many times have you consulted your teacher in Mathematics?
g. Do you prefer to stay in the dormitory? ____ yes ____ no
h. Do you have plans to go abroad after graduation? ____ yes ____ no

10. For each situation given below, briefly identify the population, the sample, the
sampling technique used, and the variables measured.

a. A study is to be conducted to determine the level of language proficiency and
numeracy skills among the 750 BEED and 250 BSED graduating students at Leyte
Normal University. The researcher wants a sample of 300 by selecting
representatives from the two programs.
Population: _______________________________________________________
Sample: __________________________________________________________
Sampling Technique:_________________________________________________
Variables Measured:________________________________________________
b. Miss Barbosa conducted a study to determine the level of knowledge and
attitude of students towards AIDS. Using the list of fourth year students
obtained from the registrar’s office, she took every tenth name until she had
280 names.
Population: _______________________________________________________
Sample: __________________________________________________________
Sampling Technique:_________________________________________________
Variables Measured:________________________________________________
c. A school dentist is going to conduct a study to determine the incidence of
tooth decay among elementary pupils and whether the incidence differs across
gender and across grade levels. A random sample of 10 schools in Tacloban City
Division will be selected and all pupils in each selected school will form part of
the sample.
Population: _______________________________________________________
Sample: __________________________________________________________
Sampling Technique:_________________________________________________
Variables Measured:________________________________________________

d. A survey was conducted to determine the parents’ reaction to the new K-12
curriculum implemented by DepEd (whether or not they favor it). The sample of
parents was identified by selecting a random sample of 50 pupils from the list of
all Grade VI pupils enrolled in San Fernando Central School.
Population: _______________________________________________________
Sample: __________________________________________________________
Sampling Technique:_________________________________________________
Variables Measured:________________________________________________

e. A study was conducted to determine the factors such as age, work experience,
educational attainment, and number of seminars attended that influence work
productivity among employees of DENR Region 8. A random sample of middle-level
managers and rank-and-file employees was selected.

Population: _______________________________________________________
Sample: __________________________________________________________
Sampling Technique:_________________________________________________
Variables Measured:________________________________________________
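
The selection mechanics mentioned in items a and b can be sketched in Python. This is only an illustration under stated assumptions: item a does not say how the 300 representatives are divided between the two programs, so a division in proportion to program size is assumed here, and the registrar's list in item b is replaced by a made-up list of 2,800 names. The function names are hypothetical.

```python
import random


def proportional_allocation(group_sizes, sample_size):
    """Split sample_size across groups in proportion to each group's size."""
    total = sum(group_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in group_sizes.items()
    }


def every_kth(names, k, needed, start=0):
    """Take every k-th name beginning at index `start`, keeping at most `needed` names."""
    return names[start::k][:needed]


# Item a: 750 BEED and 250 BSED graduating students, 300 to be sampled.
# Proportional division is an assumption, not stated in the exercise.
allocation = proportional_allocation({"BEED": 750, "BSED": 250}, 300)
print(allocation)

# Item b: a hypothetical registrar's list; its length (2,800) is assumed.
roster = [f"Student {i:04d}" for i in range(1, 2801)]
start = random.randrange(10)  # random starting point among the first ten names
chosen = every_kth(roster, 10, 280, start)
print(len(chosen), chosen[:3])
```

Rounding each share separately will not always add up to the desired sample size. With the figures in item a the shares come out exact, so no adjustment is needed.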
