Prepared by:
EDWARD B. PESCUELA
Instructor
“Most people use statistics like a drunk man uses a lamppost; more for support than
illumination”
― Andrew Lang
Introduction
This learning module discusses statistical techniques that are essential to data
analysis in educational research. The main goal of this exploration is statistical literacy: the ability to
understand and apply a variety of parametric statistical operations. Topics cover descriptive statistics,
probability, inferential statistics, regression, and correlation. The course will also enhance students’
ability to use statistical software such as MS Excel to automate data processing. Before each activity,
fast facts and discussions are given to help you understand the concepts and processes involved and
to solve the problems in each activity. The activities are to be done individually. Answers to every assessment
must be written or encoded on short bond paper following the given format. Please do not forget to
write your significant learning experience in the last part of your output. Module 1
outputs are due on March 8, 2021. If you have queries, you may reach me by e-mail at
edward.pescuela23@gmail.com. Thank you and have fun!
Format
Pretest/Exercise1/Activity 1
1.)
2.)
3.)
.
.
.
_____________________________
Signature over Printed Name
Pretest
I. MULTIPLE CHOICE
1. A good questionnaire should accomplish all of the following goals except
a. contextualizes the information collected
b. expresses the study objectives in question form
c. provides a surplus of questions to ensure all objectives are met
d. creates harmony and rapport with the respondent
2. These kinds of questions are appropriate for in-depth interviews and focus group interviews.
a. Structured-undisguised c. Unstructured-undisguised
b. Structured-disguised d. Unstructured-disguised
4. The basic types of questions that can be used in a questionnaire include all of the following except
a. open-ended c. dichotomous
b. nominal d. scales
II. Determine the sample size (n) to be obtained from the given population. Use e = 0.05 as margin of
error.
6. 5,300
7. 10,430
8. 156
9. 548
10. 1,540
III. Determine the margin of error (e) when given the Population (N) and Sample Size (n).
11. N = 10,500; n = 560
12. N = 1,386; n = 288
13. N = 258; n = 45
14. N = 4,578; n = 550
15. N = 438; n = 90
CLASSIFICATION OF DATA
Data can be classified into Primary and Secondary Data
1. Primary Data: data that you have collected yourself, that is, data collected at the
source from individuals, focus groups, or a panel of respondents
specifically set up by the researcher, whose opinions may be sought on specific issues from time to
time (Matt, 2001), (Afonja, 2001).
2. Secondary Data: a secondary data research project involves gathering and/or using existing
data for purposes other than those for which they were originally collected. Examples include computerized
databases, company records or archives, government publications, industry analyses offered by the media,
information systems, and computerized or mathematical models of environmental processes (Tim, 1997),
(Matt, 2001).
Quantitative data
Quantitative data is data that is mainly numbers. It refers to information that is collected as, or can be
translated into, numbers, which can then be displayed and analyzed mathematically. Quantitative data may be
structured or unstructured in nature. Structured data is organized, while unstructured data is relatively
disorganized. Closed questions produce structured data; open questions produce unstructured
data (Checkland et al 1998), (Matt, 2001), (Burchfield, 1996), (Anyanwu, 2002).
MC MATH 13: ELEMENTARY STATISTICS & PROBABILITY |PESCUELA s. 2021
Closed questions
• Closed questions can make analysis of data relatively easy, but their responses are restricted.
Open questions
With an open question, responses are not restricted. This will produce almost completely
unstructured data. Although the open question produces data that is difficult to organize and code, it allows
subjects to respond freely and express shades of opinion rather than forcing them to have pre-coded
opinions. As discussed, quantitative data are typically collected directly as numbers (Afonja, 2001),
(Anyanwu, 2002). Some examples include:
➢ The frequency (rate, duration) of specific behaviors or conditions
➢ Test scores (e.g., scores/levels of knowledge, skill, etc.)
➢ Survey results (e.g., reported behavior, or outcomes to environmental conditions; ratings of
satisfaction, stress, etc.)
➢ Numbers or percentages of people with certain characteristics in a population (diagnosed with
diabetes, unemployed, Spanish-speaking, under age 14, grade of school completed, etc.)
(Burchfield, 1996), (Afonja, 2001).
Quantitative data is usually subjected to statistical procedures such as calculating the mean or average
number of times an event or behavior occurs (per day, month, year). These operations, because numbers
are “hard data” and not interpretation, can give definitive, or nearly definitive, answers to different
questions. Various kinds of quantitative analysis can indicate changes in a dependent variable related to
frequency, duration, timing (when particular things happen), intensity, level, etc. They can allow you to
compare those changes to one another, to changes in another variable, or to changes in another
population. They might be able to tell you, at a particular degree of reliability, whether those changes are
likely to have been caused by your intervention or program, or by another factor, known or unknown. And
they can identify relationships among different variables, which may or may not mean that one causes
another (Burchfield, 1996), (Afonja, 2001), (Anyanwu, 2002).
Qualitative data
Qualitative data is data that is mainly words, sounds, or images. Unlike numbers or “hard data”,
qualitative information tends to be “soft,” meaning it can’t always be reduced to something definite. That is
in some ways a weakness, but it’s also a strength. A number may tell you how well a student did on a test;
the look on her face after seeing her grade, however, may tell you even more about the effect of that result
on her. That look can’t be translated to a number, nor can a teacher’s knowledge of that student’s history,
progress, and experience, all of which go into the teacher’s interpretation of that look. And that
interpretation may be far more valuable in helping that student succeed than knowing her grade or
numerical score on the test (Matt, 2001), (Afonja, 2001), (Burchfield, 1996).
As explained above, qualitative data can sometimes be changed into numbers, usually by counting
the number of times specific things occur in the course of observations or interviews, or by assigning
numbers or ratings to dimensions (e.g., importance, satisfaction, ease of use).
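The two conversions described above (counting occurrences and assigning ratings) can be sketched in a few lines of Python. This is only an illustration: the theme labels and the 1-to-5 ratings below are hypothetical data invented for the example, not taken from any study.

```python
from collections import Counter

# Hypothetical theme codes assigned to interview excerpts (illustrative only).
coded_excerpts = [
    "workload", "peer support", "workload",
    "feedback", "workload", "feedback",
]

# Counting how often each theme occurs turns words into numbers.
theme_counts = Counter(coded_excerpts)
for theme, count in theme_counts.most_common():
    print(f"{theme}: {count}")

# Hypothetical satisfaction ratings on a 1-to-5 scale (illustrative only).
ratings = [4, 5, 3, 4, 2]
mean_rating = sum(ratings) / len(ratings)
print(f"mean satisfaction rating: {mean_rating}")
```

The counts and the mean are now quantitative data, but the choice of themes and rating scale still rests on the researcher's judgement.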
The challenges of translating qualitative data into quantitative data have to do with the human
factor. Qualitative data can sometimes tell you things that quantitative data can’t. It may reveal why certain
methods are working or not working, whether part of what you’re doing conflicts with participants’ culture,
what participants see as important, etc. It may also show you patterns in behavior, physical or social
environment, or other factors that the numbers in your quantitative data don’t, and occasionally even
identify variables that researchers weren’t aware of. It is often helpful to collect both quantitative and
qualitative information (Tim, 1997), (Anyanwu, 2002).
Quantitative analysis is considered to be objective, without any human bias attached to it, because
it depends on the comparison of numbers according to mathematical computations. Analysis of qualitative
Interviews method
An interview is a series of questions a researcher addresses personally to respondents. An
interview may be structured (where you ask clearly defined questions) or unstructured, where you allow
some of your questioning to be led by the responses of the interviewee. Especially when using unstructured
interviews, using a tape recorder can be a good idea, if it does not affect the relationship with the person
being interviewed. Interviews are subdivided into face-to-face interviews,
telephone interviews, and so on (Checkland et al 1998).
Face-to-face interviews
Face-to-face interviews provide rich data, offer the opportunity to establish rapport with the interviewees, and
help to explore and understand complex issues. Many ideas that are ordinarily difficult to articulate can
also be surfaced and discussed during such interviews. On the negative side, face-to-face interviews have
the potential for introducing interviewer bias and can be expensive if a big sample of subjects is to be
personally interviewed (Tim, 1997), (Erricker, 1971), (Burchfield,1996), (Matt, 2001).
Telephone interviews
Telephone interviews help to contact subjects dispersed over various geographic regions and obtain responses from
them immediately on contact. This is an efficient way of collecting data when one has specific questions
to ask, needs the responses quickly, and has the sample spread over a wide geographic area. On the
negative side, the interviewer cannot observe the nonverbal responses of the respondents, and the
interviewee can block a call. Personally, administering questionnaires to groups of individuals (Tim, 1997).
Questionnaire method
A questionnaire is a series of written questions a researcher supplies to subjects, requesting their
response. Usually, the questionnaire is self-administered in that it is posted to the subjects, asking them
to complete it and post it back. The way you will be analyzing the data may influence the layout of the
questionnaire. For example, closed questions provide boxes for the respondent to tick (giving easily coded
information), whereas an open question provides a box for the respondent to write answers in (giving
more freedom of information, but more difficulty in coding) (Checkland et al 1998), (Burchfield,1996),
(Anyanwu, 2002).
Surveys method
This consists of finding facts in particular fields of inquiry. There are three important surveys in which the data
collected are of a statistical nature: social surveys, market research, and public opinion polls (Matt, 2001),
(Erricker, 1971).
a. Social Survey: a survey meant to provide information to other government departments so
that they can carry out their duties more efficiently. Many bodies conduct social surveys (for
example, the universities), but the most important is the social survey department of the government, such as
the Federal Office of Statistics (FOS) in Nigeria (now called the Federal Bureau of Statistics)
(Burchfield,1996), (Erricker, 1971).
b. Market Research: this involves the use of surveys, tests, and statistical studies to analyze
consumer trends and to forecast the size and location of markets for specific products or services. The
social sciences are increasingly utilized in customer research. Psychology and sociology, for example, by
providing clues to people’s activities, circumstances, wants, desires, and general motivation, are keys to
understanding the various behavioural patterns of consumers (Erricker, 1971).
c. Public Opinion Polls: a technique that measures the attitudes, perspectives, and
preferences of a population towards events, circumstances, and issues of mutual interest. Both random and
quota sampling are used.
Census method
A census is a study that obtains data from every member of a population. In most studies, a census
is not practical because of the cost and/or time required (Erricker, 1971).
- Slovin’s formula is used to calculate the sample size (n) given the population size (N) and a margin of error (e).
- It is computed as $n = \dfrac{N}{1 + Ne^2}$, where:
n = no. of samples
N = total population
e = error margin / margin of error
- If a sample is taken from a population, a formula must be used to take into account confidence levels and
margins of error. When taking statistical samples, sometimes a lot is known about a population, sometimes
a little and sometimes nothing at all. For example, we may know that a population is normally distributed
(e.g., for heights, weights or IQs), we may know that there is a bimodal distribution (as often happens with
class grades in mathematics classes) or we may have no idea about how a population is going to behave
(such as polling college students to get their opinions about quality of student life). Slovin's formula is used
when nothing about the behavior of a population is known at all.
- To use the formula, first decide what margin of error you can tolerate. For example, you may be
happy with a confidence level of 95 percent (giving a margin of error of 0.05), or you may require a tighter
accuracy of a 98 percent confidence level (a margin of error of 0.02). Plug your population size and required
margin of error into the formula. The result will be the number of samples you need to take.
Example 1. A researcher plans to conduct a survey. If the population of Legazpi City is 190,000, find the
sample size if the margin of error is 5%.
$$n = \frac{N}{1 + Ne^2} = \frac{190{,}000}{1 + 190{,}000(0.05)^2} = \frac{190{,}000}{476} = 399.16 \approx \mathbf{399}$$
Example 2. Find the sample size in Example 1 if the margin of error is 1%.
$$n = \frac{N}{1 + Ne^2} = \frac{190{,}000}{1 + 190{,}000(0.01)^2} = \frac{190{,}000}{20} = \mathbf{9{,}500}$$
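The arithmetic in Examples 1 and 2 can be checked with a short Python sketch. The function name `slovin_sample_size` is a hypothetical helper chosen for this illustration; it returns the unrounded value so that the rounding rule stays visible to the reader.

```python
def slovin_sample_size(population: int, margin_of_error: float) -> float:
    """Return the unrounded Slovin sample size n = N / (1 + N * e**2)."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a proportion between 0 and 1")
    return population / (1 + population * margin_of_error ** 2)


# Example 1: N = 190,000 and e = 5%
n_5pct = slovin_sample_size(190_000, 0.05)
print(round(n_5pct, 2))  # about 399.16, reported as 399 in Example 1

# Example 2: N = 190,000 and e = 1%
n_1pct = slovin_sample_size(190_000, 0.01)
print(round(n_1pct, 2))  # about 9,500
```

Note how shrinking the margin of error from 5% to 1% raises the required sample from roughly 400 to 9,500 respondents.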
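Determining “e”
Solving $n = \dfrac{N}{1 + Ne^2}$ for $e$ gives $1 + Ne^2 = \dfrac{N}{n}$, so $e^2 = \dfrac{N - n}{Nn}$ and
$$e = \sqrt{\frac{N - n}{Nn}}$$
For example, with N = 10,000 and n = 2,000:
$$e = \sqrt{\frac{10{,}000 - 2{,}000}{10{,}000(2{,}000)}} = \sqrt{0.0004} = 0.02 \quad \text{or} \quad e = 2\%$$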
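As a quick check of the computation above, the rearranged formula can be written as a small Python function. The name `margin_of_error` is a hypothetical helper for this sketch, and the inputs are the N = 10,000 and n = 2,000 used in the worked example.

```python
import math


def margin_of_error(population: int, sample_size: int) -> float:
    """Solve Slovin's formula for e: e = sqrt((N - n) / (N * n))."""
    if not 0 < sample_size <= population:
        raise ValueError("sample_size must be between 1 and the population size")
    return math.sqrt((population - sample_size) / (population * sample_size))


e = margin_of_error(10_000, 2_000)
print(f"e = {e:.4f} ({e:.0%})")
```

Feeding the result back into Slovin's formula should return the original sample size, which is a convenient way to verify an answer.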
1. Simple random sampling
In this case each individual is chosen entirely by chance and each member of the population has
an equal chance, or probability, of being selected. One way of obtaining a random sample is to give each
individual in a population a number, and then use a table of random numbers to decide which individuals
to include. For example, if you have a sampling frame of 1000 individuals, labelled 0 to 999, use groups
of three digits from the random number table to pick your sample. So, if the first three numbers from the
random number table were 094, select the individual labelled “94”, and so on.
As with all probability sampling methods, simple random sampling allows the sampling error to be
calculated and reduces selection bias. A specific advantage is that it is the most straightforward method of
probability sampling. A disadvantage of simple random sampling is that you may not select enough
individuals with your characteristic of interest, especially if that characteristic is uncommon. It may also be
difficult to define a complete sampling frame and inconvenient to contact them, especially if different forms
of contact are required (email, phone, post) and your sample units are scattered over a wide geographical
area.
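The selection step can also be done by software. The sketch below uses Python's pseudorandom generator in place of the printed random number table described above; the function name `simple_random_sample` and the seed value are assumptions made for illustration, and the frame of 1000 labels (0 to 999) follows the example in the text.

```python
import random


def simple_random_sample(frame_size: int, sample_size: int, seed=None) -> list[int]:
    """Draw labels from 0..frame_size-1 without replacement, each equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(range(frame_size), sample_size)


# Sampling frame of 1000 individuals labelled 0 to 999; draw 10 of them.
sample = simple_random_sample(1000, 10, seed=2021)
print(sorted(sample))
```

Because sampling is done without replacement, no individual can be selected twice.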
2. Systematic sampling
Individuals are selected at regular intervals from the sampling frame. The intervals are chosen to
ensure an adequate sample size. If you need a sample of size n from a population of size x, you should select
every (x/n)th individual for the sample. For example, if you wanted a sample size of 100 from a population
of 1000, select every 1000/100 = 10th member of the sampling frame.
Systematic sampling is often more convenient than simple random sampling, and it is easy to
administer. However, it may also lead to bias, for example if there are underlying patterns in the order of
the individuals in the sampling frame, such that the sampling technique coincides with the periodicity of
the underlying pattern. As a hypothetical example, if a group of students were being sampled to gain their
opinions on college facilities, but the Student Record Department’s central list of all students was arranged
such that the sex of students alternated between male and female, choosing an even interval (e.g. every
20th student) would result in a sample of all males or all females. Whilst in this example the bias is obvious
and should be easily corrected, this may not always be the case.
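A minimal sketch of systematic selection, including the alternating-list pitfall from the hypothetical example above, might look like this. The random starting point within the first interval is an assumption added here (the text does not say where to start), and `systematic_sample` is a hypothetical helper name.

```python
import random


def systematic_sample(frame: list, sample_size: int, seed=None) -> list:
    """Select every k-th item, k = len(frame) // sample_size, from a random start."""
    interval = len(frame) // sample_size
    if interval < 1:
        raise ValueError("sample_size cannot exceed the size of the sampling frame")
    start = random.Random(seed).randrange(interval)  # assumed: random start in first interval
    return [frame[start + i * interval] for i in range(sample_size)]


# Sample of 100 from a frame of 1000: every 10th member is selected.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, seed=1)
print(sample[:5])

# Pitfall: a list that alternates male/female, sampled at an even interval.
students = ["M" if i % 2 == 0 else "F" for i in range(1000)]
biased = systematic_sample(students, 50, seed=1)  # interval = 20
print(set(biased))  # contains only one sex
```

Shuffling the frame first, or checking it for periodic patterns, is one way to guard against this kind of bias.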
3. Stratified sampling
In this method, the population is first divided into subgroups (or strata) whose members share a similar
characteristic. It is used when we might reasonably expect the measurement of interest to vary between
the different subgroups, and we want to ensure representation from all the subgroups. For example, in a
study of stroke outcomes, we may stratify the population by sex, to ensure equal representation of men
and women. The study sample is then obtained by taking equal sample sizes from each stratum. In
stratified sampling, it may also be appropriate to choose non-equal sample sizes from each stratum. For
example, in a study of the health outcomes of nursing staff in a county, if there are three hospitals each
with different numbers of nursing staff (hospital A has 500 nurses, hospital B has 1000 and hospital C has
2000), then it would be appropriate to choose the sample numbers from each hospital proportionally (e.g.
10 from hospital A, 20 from hospital B and 40 from hospital C). This ensures a more realistic and accurate
estimation of the health outcomes of nurses across the county, whereas taking equal numbers from each
hospital would over-represent nurses from hospitals A and B. The fact that the sample was stratified should
be taken into account at the analysis stage.
Stratified sampling improves the accuracy and representativeness of the results by reducing
sampling bias. However, it requires knowledge of the appropriate characteristics of the sampling frame
(the details of which are not always available), and it can be difficult to decide which characteristic(s) to
stratify by.
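Proportional allocation, as in the hospital example above, can be sketched as follows. The function name `proportional_allocation` is hypothetical, and the largest-remainder rule used to settle fractional shares is an assumption of this sketch; other rounding conventions are possible.

```python
def proportional_allocation(strata_sizes: dict[str, int], total_sample: int) -> dict[str, int]:
    """Split total_sample across strata in proportion to each stratum's size."""
    population = sum(strata_sizes.values())
    exact = {name: total_sample * size / population for name, size in strata_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}  # round down first

    # Assumed rule: hand leftover units to the strata with the largest fractional parts.
    leftover = total_sample - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


nurses = {"Hospital A": 500, "Hospital B": 1000, "Hospital C": 2000}
print(proportional_allocation(nurses, 70))
```

With 70 nurses to sample, the shares come out as 10, 20 and 40, matching the figures in the example; the members within each stratum would then be chosen by simple random sampling.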
4. Clustered sampling
In a clustered sample, subgroups of the population are used as the sampling unit, rather than
individuals. The population is divided into subgroups, known as clusters, which are randomly selected to
be included in the study. Clusters are usually already defined, for example individual GP practices or towns
could be identified as clusters. In single-stage cluster sampling, all members of the chosen clusters are
then included in the study. In two-stage cluster sampling, a selection of individuals from each cluster is
then randomly selected for inclusion. Clustering should be taken into account in the analysis. The General
Household survey, which is undertaken annually in England, is a good example of a (one-stage) cluster
sample. All members of the selected households (clusters) are included in the survey.
Cluster sampling can be more efficient than simple random sampling, especially where a study
takes place over a wide geographical region. For instance, it is easier to contact lots of individuals in a few
GP practices than a few individuals in many different GP practices. Disadvantages include an increased
risk of bias, if the chosen clusters are not representative of the population, resulting in an increased
sampling error.
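The difference between single-stage and two-stage cluster sampling can be illustrated with a short sketch. The practice names and patient labels below are hypothetical, as is the helper name `cluster_sample`; clusters are chosen at random first, and individuals are chosen only in the two-stage case.

```python
import random


def cluster_sample(clusters: dict[str, list[str]], n_clusters: int,
                   per_cluster=None, seed=None) -> tuple[list[str], list[str]]:
    """Randomly pick clusters; keep all members (single-stage) or a random
    subset of per_cluster members from each chosen cluster (two-stage)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)  # single-stage: everyone in the cluster
        else:
            sample.extend(rng.sample(members, min(per_cluster, len(members))))
    return chosen, sample


# Hypothetical frame: 10 practices with 20 patients each.
practices = {f"practice_{i}": [f"patient_{i}_{j}" for j in range(20)] for i in range(10)}

chosen_1, one_stage = cluster_sample(practices, 3, seed=7)
chosen_2, two_stage = cluster_sample(practices, 3, per_cluster=5, seed=7)
print(len(one_stage), len(two_stage))
```

Either way, only three practices need to be contacted, which is where the efficiency gain comes from.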
1. Convenience sampling
Convenience sampling is perhaps the easiest method of sampling, because participants are
selected based on availability and willingness to take part. Useful results can be obtained, but the results
are prone to significant bias, because those who volunteer to take part may be different from those who
choose not to (volunteer bias), and the sample may not be representative of other characteristics, such as
age or sex. Note: volunteer bias is a risk of all non-probability sampling methods.
2. Quota sampling
This method of sampling is often used by market researchers. Interviewers are given a quota of
subjects of a specified type to attempt to recruit. For example, an interviewer might be told to go out and
select 20 adult men, 20 adult women, 10 teenage girls and 10 teenage boys so that they could interview
them about their television viewing. Ideally the quotas chosen would proportionally represent the
characteristics of the underlying population.
Whilst this has the advantage of being relatively straightforward and potentially representative, the
chosen sample may not be representative of other characteristics that weren’t considered (a consequence
of the non-random nature of sampling).
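The quota procedure can be sketched as a simple loop that accepts people in the order they are encountered until each quota is full. The stream of passers-by below is hypothetical data, and `quota_sample` is a hypothetical helper name; the quotas are the ones from the television viewing example.

```python
def quota_sample(available, quotas: dict[str, int]) -> list[tuple[int, str]]:
    """Accept people in the order encountered until every quota is filled."""
    remaining = dict(quotas)
    selected = []
    for person_id, group in available:
        if remaining.get(group, 0) > 0:
            selected.append((person_id, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are full
    return selected


quotas = {"adult man": 20, "adult woman": 20, "teenage girl": 10, "teenage boy": 10}
groups = list(quotas)

# Hypothetical stream of 400 passers-by cycling through the four groups.
passersby = [(i, groups[i % 4]) for i in range(400)]
sample = quota_sample(passersby, quotas)
print(len(sample))
```

Nothing in the loop is random: whoever comes along first fills the quota, which is why the sample may be unrepresentative on characteristics that were not part of the quota.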
3. Judgement sampling
Also known as selective, or subjective, sampling, this technique relies on the judgement of the
researcher when choosing who to ask to participate. Researchers may thus implicitly choose a
“representative” sample to suit their needs, or specifically approach individuals with certain characteristics.
This approach is often used by the media when canvassing the public for opinions and in qualitative
research.
Judgement sampling has the advantage of being time- and cost-effective to perform whilst resulting
in a range of responses (particularly useful in qualitative research). However, in addition to volunteer bias,
it is also prone to errors of judgement by the researcher and the findings, whilst being potentially broad,
will not necessarily be representative.
4. Snowball sampling
This method is commonly used in social sciences when investigating hard-to-reach groups. Existing
subjects are asked to nominate further subjects known to them, so the sample increases in size like a
rolling snowball. For example, when carrying out a survey of risk behaviours amongst intravenous drug
users, participants may be asked to nominate other users to be interviewed.
Snowball sampling can be effective when a sampling frame is difficult to identify. However, by
selecting friends and acquaintances of subjects already investigated, there is a significant risk of selection
bias (choosing a large number of people with similar characteristics or views to the initial individual
identified).
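The way a snowball sample grows, and why it can miss people outside the initial network, can be sketched with a small referral map. The names and nominations below are hypothetical, and `snowball_sample` is a hypothetical helper that recruits nominees in the order they are named.

```python
from collections import deque


def snowball_sample(referrals: dict[str, list[str]], seeds: list[str], max_size: int) -> list[str]:
    """Recruit the seeds, then the people they nominate, and so on, up to max_size."""
    recruited = []
    seen = set(seeds)
    queue = deque(dict.fromkeys(seeds))  # keep order, drop duplicate seeds
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        recruited.append(person)
        for nominee in referrals.get(person, []):
            if nominee not in seen:
                seen.add(nominee)
                queue.append(nominee)
    return recruited


# Hypothetical nominations: each participant names people known to them.
referrals = {
    "Ana": ["Ben", "Cara"],
    "Ben": ["Cara", "Dan"],
    "Cara": ["Eli"],
    "Dan": [],
    "Eli": ["Ana"],
    "Fay": ["Gio"],  # a separate circle that nobody in Ana's network names
}

sample = snowball_sample(referrals, seeds=["Ana"], max_size=10)
print(sample)
```

Starting from Ana, the sample never reaches Fay or Gio, which illustrates the selection bias described above: everyone recruited is connected to the first participant.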
Evaluation
I. Data Collection. Choose the correct answer in each of the items given below.
1. What methods might be employed in a case study?
A. Interviews C. Questionnaires
B. Narrative observations D. Any of these and potentially others
2. A structured interview is one
A. That follows a pre-set list of open questions
B. Where the participant has to choose between a small list of possible responses
C. The interview is structured around photographs which the participant has taken
D. Where a group of participants is asked questions according to a set order, for example the oldest
participant first
3. When would it NOT be appropriate to use a questionnaire?
A. A study looking at the level of knowledge of 100 early childhood practitioners about the key person
system
B. Research involving 150 early years settings on the number of staff they have educated to degree level
C. Research investigating the views of six practitioners in a setting where there has been a change in the
settling-in policy
D. A study looking at parental satisfaction in a ‘chain’ of 12 nurseries.
4. Choose the most appropriate statement. Observations used in research are an example of
A. A methodological approach C. A qualitative data collection method
B. A quantitative data collection method D. A method of collecting data
5. There are five objectives for educational research: exploration, description, explanation, prediction and
influence. Surveys can be used to obtain information for:
A. Exploration and description C. Exploration, prediction and influence
B. Exploration, explanation and influence D. All of the above
A survey to find out if families living in a certain municipality are in favor of Charter Change will be
conducted. To ensure that all income groups are represented, respondents will be divided into high
income (class A), middle (class B) and low-income (class C) groups. Below is the distribution of income
groups.
Class A 1,000
Class B 1,500
Class C 2,500
N = 5,000
Complete the table using the information above by following the guide:
Guide Questions:
1. Using 5% margin of error, how many families should be included in the samples? (Use Slovin’s
Formula)
2. Using proportional allocation, how many from each group should be taken as samples?
3. Using proportional allocation, how many from each group should be taken as samples?
4. Generalize the table in paragraph form.
Class A 1,000
Class B 1,500
Class C 2,500
N = 5,000 n= n=
5. Compute the margin of error if you will take only 900 families.
References
Osang, J. E. et al (2013). Methods of Gathering Data for Research Purpose. Retrieved from
http://www.iosrjournals.org/iosr-jce/papers/Vol15-issue2/I01525965.pdf?id=7568
Methods of Sampling from a Population. Retrieved from
https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/methods-of-sampling-population