
REPORTING IN

BIOSTATISTICS

SUBMITTED BY:
GROUP 1
ASENJO, ARRIANE
DIESTRO, CRISTELLE ANNE
MUYCO, TERESA
PARREÑO, JESAHLYN

SUBMITTED TO:
VICTORIANO BENEDICTO, MD.
Introduction
When faced with a research problem, you need to collect,
analyze and interpret data to answer your research
questions. Examples of research questions that could require
you to gather data include how many people will vote for a
candidate, what is the best product mix to use and how
useful is a drug in curing a disease. The research problem you
explore informs the type of data you’ll collect and the data
collection method you’ll use. In this article, we will explore
various types of data, methods of data collection and
advantages and disadvantages of each. After reading our
review, you will have an excellent understanding of when to
use each of the data collection methods we discuss.

Types of Data
Quantitative Data
Data that is expressed in numbers and summarized using
statistics to give meaningful information is referred to
as quantitative data. Examples of quantitative data we could
collect are heights, weights, or ages of students. If we obtain
the mean of each set of measurements, we have meaningful
information about the average value for each of those
student characteristics.
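As a minimal sketch of the summary described above, the snippet below computes the mean of each set of measurements. The values, the dictionary keys, and the `summarize` helper are hypothetical and serve only as illustration.

```python
from statistics import mean

# Hypothetical measurements for five students (illustrative values only).
students = {
    "height_cm": [150.0, 160.0, 155.0, 165.0, 170.0],
    "weight_kg": [45.0, 50.0, 55.0, 60.0, 65.0],
    "age_years": [18, 19, 20, 21, 22],
}

def summarize(data):
    """Return the mean of each set of measurements."""
    return {name: mean(values) for name, values in data.items()}

averages = summarize(students)
print(averages)
```

Each entry of `averages` is the average value of one student characteristic.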
Qualitative Data
When we use data for description without measurement, we
call it qualitative data. Examples of qualitative data are
student attitudes towards school, attitudes towards exam
cheating and friendliness of students to teachers. Such data
cannot be easily summarized using statistics.
Primary Data
When we obtain data directly from individuals, objects or
processes, we refer to it as primary data. Quantitative or
qualitative data can be collected using this approach. Such
data is usually collected solely for the research problem
you will study. Primary data has several advantages. First, we
tailor it to our specific research question, so there are no
customizations needed to make the data usable. Second,
primary data is reliable because you control how the data is
collected and can monitor its quality. Third, by collecting
primary data, you spend your resources in collecting only
required data. Finally, primary data is proprietary, so you
enjoy advantages over those who cannot access the data.
Despite its advantages, primary data also has disadvantages
of which you need to be aware. The first problem with
primary data is that it is costlier to acquire as compared to
secondary data. Obtaining primary data also requires more
time as compared to gathering secondary data.
Secondary Data
When you collect data after another researcher or agency
that initially gathered it makes it available, you are
gathering secondary data. Examples of secondary data are
census data published by the US Census Bureau, stock prices
data published by CNN and salaries data published by the
Bureau of Labor Statistics.
One advantage to using secondary data is that it will save you
time and money, although some data sets require you to pay
for access. A second advantage is the relative ease with
which you can obtain it. You can easily access secondary data
from publications, government agencies, data aggregation
websites and blogs. A third advantage is that it eliminates
effort duplication since you can identify existing data that
matches your needs instead of gathering new data.
Despite the benefits it offers, secondary data has its
shortcomings. One limitation is that secondary data may not
be complete. For it to meet your research needs, you may
need to enrich it with data from other sources. A second
shortcoming is that you may be unable to verify the accuracy
of secondary data, and the data may be outdated. A third
challenge you face when using secondary data is that
documentation may be incomplete or missing. Therefore,
you may not be aware of any problems that happened in
data collection which would otherwise influence its
interpretation. Another challenge you may face when you
decide to use secondary data is that there may be copyright
restrictions.
Now that we’ve explained the various types of data you can
collect when conducting research, we will proceed to look at
methods used to collect primary and secondary data.
Methods Employed in Primary Data Collection
When you decide to conduct original research, the data you
gather can be quantitative or qualitative. Generally, you
collect quantitative data through sample surveys,
experiments and observational studies. You obtain
qualitative data through focus groups, in-depth interviews
and case studies. We will discuss each of these data
collection methods below and examine their advantages and
disadvantages.
Sample Surveys
A survey is a data collection method where you select a
sample of respondents from a large population in order to
gather information about that population. The process of
identifying individuals from the population who you will
interview is known as sampling.
To gather data through a survey, you construct a
questionnaire to prompt information from selected
respondents. When creating a questionnaire, you should
keep in mind several key considerations. First, make sure the
questions and choices are unambiguous. Second, make sure
the questionnaire will be completed within a reasonable
amount of time. Finally, make sure there are no
typographical errors. To check if there are any problems with
your questionnaire, use it to interview a few people before
administering it to all respondents in your sample. We refer
to this process as pretesting.
Using a survey to collect data offers you several advantages.
The main benefit is time and cost savings because you only
interview a sample, not the large population. Another benefit
is that when you select your sample correctly, you will obtain
information of acceptable accuracy. Additionally, surveys are
adaptable and can be used to collect data for governments,
health care institutions, businesses and any other
environment where data is needed.
A major shortcoming of surveys occurs when you fail to
select a sample correctly; without an appropriate sample, the
results will not generalize accurately to the population.

Ways of Interviewing Respondents


Once you have selected your sample and developed your
questionnaire, there are several ways you can interview
participants. Each approach has its advantages and
disadvantages.
1. In-person Interviewing
When you use this method, you meet with the respondents
face to face and ask questions. In-person interviewing offers
several advantages. This technique has excellent response
rates and enables you to conduct interviews that take a
longer amount of time. Another benefit is you can ask follow-
up questions to responses that are not clear.
In-person interviews do have disadvantages of which you
need to be aware. First, this method is expensive and takes
more time because of interviewer training, transport, and
remuneration. A second disadvantage is that some areas of a
population, such as neighborhoods prone to crime, cannot be
accessed which may result in bias.
2. Telephone Interviewing
Using this technique, you call respondents over the phone
and interview them. This method offers the advantage of
quickly collecting data, especially when used with computer-
assisted telephone interviewing. Another advantage is that
collecting data via telephone is cheaper than in-person
interviewing.
One of the main limitations of telephone interviewing is that
it is hard to gain the trust of respondents. For this reason, you
may not get responses or may introduce bias. Since phone
interviews are generally kept short to reduce the possibility
of upsetting respondents, this method may also limit the
amount of data you can collect.
3. Online Interviewing
With online interviewing, you send an email inviting
respondents to participate in an online survey. This
technique is used widely because it is a low-cost way of
interviewing many respondents. Another benefit is
anonymity; you can get sensitive responses that participants
would not feel comfortable providing with in-person
interviewing.
When you use online interviewing, you face the disadvantage
of not getting a representative sample. You also cannot seek
clarification on responses that are unclear.
4. Mailed Questionnaire
When you use this interviewing method, you send a printed
questionnaire to the postal address of the respondent. The
participants fill in the questionnaire and mail it back. This
interviewing method gives you the advantage of obtaining
information that respondents may be unwilling to give when
interviewing in person.
The main limitation with mailed questionnaires is you are
likely to get a low response rate. Keep in mind that
inaccuracy in mailing address, delays or loss of mail could
also affect the response rate. Additionally, mailed
questionnaires cannot be used to interview respondents with
low literacy, and you cannot seek clarifications on responses.
5. Focus Groups
When you use a focus group as a data collection method, you
identify a group of 6 to 10 people with similar characteristics.
A moderator then guides a discussion to identify attitudes
and experiences of the group. The responses are captured by
video recording, voice recording or writing—this is the data
you will analyze to answer your research questions. Focus
groups have the advantage of requiring fewer resources and
time as compared to interviewing individuals. Another
advantage is that you can request clarifications to unclear
responses.
One disadvantage you face when using focus groups is that
the sample selected may not represent the population
accurately. Furthermore, dominant participants can influence
the responses of others.

Observational Data Collection Methods


In an observational data collection method, you acquire data
by observing any relationships that may be present in the
phenomenon you are studying. There are four types of
observational methods that are available to you as a
researcher: cross-sectional, case-control, cohort and
ecological.
In a cross-sectional study, you only collect data on observed
relationships once. This method has the advantage of being
cheaper and taking less time as compared to case-control
and cohort. However, cross-sectional studies can miss
relationships that may arise over time.
Using a case-control method, you identify cases and controls
and then compare them. A case is a person who has the
outcome of interest, while a control is a comparable person
who does not. After identifying the cases and controls, you
look back in time to determine how often each group was
exposed to the suspected cause. This is why case-control
studies are referred to as retrospective. For example,
suppose a medical researcher suspects a certain type of
cosmetic is causing skin cancer. The researcher recruits
people who have skin cancer, the cases, and people who do
not, the controls, and asks participants to remember the
type of cosmetic they used and the frequency of its use. This
method is cheaper and requires less time as compared to the
cohort method. However, this approach has limitations when
the individuals you are observing cannot accurately recall
information. We refer to this as recall bias because you rely
on the ability of participants to remember information. In
the cosmetic example, recall bias would occur if participants
cannot accurately remember the type of cosmetic and the
number of times they used it.
In a cohort method, you follow people with similar
characteristics over a period. This method is advantageous
when you are collecting data on occurrences that happen
over a long period. It has the disadvantage of being costly
and requiring more time. It is also not suitable for
occurrences that happen rarely.
The three methods we have discussed previously collect data
on individuals. When you are interested in studying a
population instead of individuals, you use
an ecological method. For example, say you are interested in
lung cancer rates in Iowa and North Dakota. You obtain the
number of cancer cases per 1000 people for each state from
the National Cancer Institute and compare them. You can
then hypothesize possible causes of differences between the
two states. When you use the ecological method, you save
time and money because data is already available. However,
the data collected may lead you to infer population
relationships that do not exist.
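The comparison in the ecological example rests on a rate per 1000 people. The sketch below shows that arithmetic only; the region names and counts are placeholders, not real figures from any agency, and `rate_per_1000` is a hypothetical helper.

```python
def rate_per_1000(cases, population):
    """Number of cases per 1000 people in a population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * cases / population

# Placeholder (cases, population) pairs; these are NOT real statistics.
region_counts = {
    "region_x": (600, 1_000_000),
    "region_y": (450, 500_000),
}

rates = {name: rate_per_1000(cases, pop)
         for name, (cases, pop) in region_counts.items()}
print(rates)
```

Comparing rates rather than raw counts keeps a larger population from looking worse simply because it has more people.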
Experiments
An experiment is a data collection method where you as a
researcher change some variables and observe their effect on
other variables. The variables that you manipulate are
referred to as independent while the variables that change
as a result of manipulation are dependent variables. Imagine
a manufacturer is testing the effect of drug strength on
number of bacteria in the body. The company decides to test
drug strength at 10mg, 20mg and 40mg. In this example,
drug strength is the independent variable while number of
bacteria is the dependent variable. The drug administered is
the treatment, while 10mg, 20mg and 40mg are the levels of
the treatment.
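To make the vocabulary concrete, the sketch below groups a handful of made-up observations by treatment level and averages the dependent variable within each level. The bacteria counts and the `mean_response_by_level` helper are assumptions for illustration, not data from any study.

```python
from statistics import mean

# Hypothetical observations: (level of drug strength, bacteria count).
# Drug strength is the independent variable; bacteria count is the
# dependent variable. All numbers are invented for illustration.
observations = [
    ("10mg", 900), ("10mg", 1100),
    ("20mg", 600), ("20mg", 700),
    ("40mg", 250), ("40mg", 350),
]

def mean_response_by_level(rows):
    """Average the dependent variable separately for each treatment level."""
    groups = {}
    for level, count in rows:
        groups.setdefault(level, []).append(count)
    return {level: mean(counts) for level, counts in groups.items()}

by_level = mean_response_by_level(observations)
print(by_level)
```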
The greatest advantage of using an experiment is that you
can explore causal relationships that an observational study
cannot. Additionally, experimental research can be adapted
to different fields like medical research, agriculture,
sociology, and psychology. Nevertheless, experiments have
the disadvantage of being expensive and requiring a lot of
time.
CONCEPT OF DATA COLLECTION
Data collection is the process of gathering and measuring
information on variables of interest, in an established
systematic fashion that enables one to answer stated
research questions, test hypotheses, and evaluate
outcomes. The data collection component of research is
common to all fields of study including physical and social
sciences, humanities, business, etc. While methods vary by
discipline, the emphasis on ensuring accurate and honest
collection remains the same. The goal for all data collection
is to capture quality evidence that then translates to rich
data analysis and allows the building of a convincing and
credible answer to questions that have been posed.
Regardless of the field of study or preference for defining data
(quantitative, qualitative), accurate data collection is
essential to maintaining the integrity of research. Both
the selection of appropriate data collection instruments
(existing, modified, or newly developed) and clearly
delineated instructions for their correct use reduce the
likelihood of errors occurring.
Data collection is one of the most important stages in
conducting research. You can have the best research design
in the world, but if you cannot collect the required data you
will not be able to complete your project. Data collection
is a very demanding job which needs thorough planning, hard
work, patience, perseverance and more to be able to
complete the task successfully. Data collection starts with
determining what kind of data is required, followed by the
selection of a sample from a certain population. After that,
you need to use a certain instrument to collect the data
from the selected sample.
TYPES OF DATA
Data are organized into two broad categories: qualitative and
quantitative.
Qualitative Data: Qualitative data are mostly non-
numerical and usually descriptive or nominal in nature. This
means the data collected are in the form of words and
sentences. Often (not always), such data captures feelings,
emotions, or subjective perceptions of something.
Qualitative approaches aim to address the ‘how’ and ‘why’ of
a program and tend to use unstructured methods of data
collection to fully explore the topic. Qualitative questions are
open-ended. Qualitative methods include focus groups,
group discussions and interviews. Qualitative approaches are
good for further exploring the effects and unintended
consequences of a program. They are, however, expensive and
time consuming to implement. Additionally, the findings
cannot be generalized to participants outside of the program
and are only indicative of the group involved.
Qualitative data collection methods play an important
role in impact evaluation by providing information useful to
understand the processes behind observed results and assess
changes in people’s perceptions of their well-being.
Furthermore, qualitative methods can be used to improve the
quality of survey-based quantitative evaluations by helping
generate evaluation hypotheses, strengthening the design of
survey questionnaires, and expanding or clarifying
quantitative evaluation findings.

Quantitative Data: Quantitative data is numerical in nature
and can be mathematically computed. Quantitative data are
measured on different scales, which can be classified as
nominal scale, ordinal scale, interval scale and ratio scale.
Often (not always), such data includes measurements of
something. Quantitative approaches address the ‘what’ of
the program. They use a systematic standardized
approach and employ methods such as surveys and ask
questions. Quantitative approaches have the advantage
that they are cheaper to implement, are standardized so
comparisons can be easily made and the size of the effect
can usually be measured. Quantitative approaches, however,
are limited in their capacity for the investigation and
explanation of similarities and unexpected differences. It is
important to note that, for peer-based programs, agencies
often find quantitative data collection approaches difficult
to implement: they commonly lack the resources needed to
run surveys rigorously, and they frequently experience low
participation and high loss to follow-up.
Quantitative data collection methods rely on random
sampling and structured data collection instruments that fit
diverse experiences into
predetermined response categories. They produce results
that are easy to summarize, compare, and generalize. If the
intent is to generalize from the research participants to a
larger population, the researcher will employ probability
sampling to select participants.

PRIMARY DATA
Data that has been collected from first-hand experience is
known as primary data. Primary data has not been
published yet and is generally considered more reliable,
authentic and objective. Primary data has not been changed
or altered by others; therefore its validity is considered
greater than that of secondary data.
Importance of Primary Data: In statistical surveys it is
necessary to get information from primary sources and work
on primary data. For example, the statistical records of
the female population in a country cannot be based on
newspapers, magazines and other printed sources. Research
can be conducted without secondary data, but research based
only on secondary data is less reliable and may be biased
because secondary data has already been handled and
interpreted by others. Such sources may be old, may contain
limited information, and can be misleading and biased.
Sources of Primary Data:
Sources for primary data are limited, and at times it becomes
difficult to obtain data from a primary source because of
either scarcity of population or lack of cooperation.
Following are some of the sources of primary data.
Experiments:
Experiments require an artificial or natural setting in which
to perform a logical study to collect data. Experiments are
more suitable for medicine, psychological studies, nutrition
and other scientific studies. In experiments the experimenter
has to control the influence of any extraneous variable on
the results.
Survey:
The survey is the most commonly used method in the social
sciences, management, marketing and, to some extent,
psychology. Surveys can be conducted using different
methods.
Questionnaire: It is the most commonly used method in
surveys. A questionnaire is a list of questions, either open-
ended or close-ended, to which the respondents give
answers. A questionnaire can be administered via telephone,
by mail, in person in a public area or in an institute, through
electronic mail, through fax and by other methods.
Interview:
An interview is a face-to-face conversation with the
respondent. The main problem with an interview arises when
the respondent deliberately hides information; otherwise
it is an in-depth source of information. The interviewer can
record not only the statements the interviewee makes but
also how the interviewee reacts to the questions.
Advantages of Using Primary Data

• The investigator collects data specific to the problem under
study.
• There is no doubt about the quality of the data collected (for
the investigator).
• If required, it may be possible to obtain additional data
during the study period.
Disadvantages of Using Primary Data
1. The investigator has to contend with all the hassles of
data collection:
• deciding why, what, how, when to collect;
• getting the data collected (personally or through others);
• getting funding and dealing with funding agencies;
• ethical considerations (consent, permissions, etc.).
2. Ensuring the data collected is of a high standard:
• all desired data is obtained accurately, and in the format it
is required in;
• there is no fake or cooked-up data;
• unnecessary or useless data has not been included.
3. The cost of obtaining the data is often the major expense in
studies.

SECONDARY DATA
Data collected from a source that has already been
published in any form is called secondary data. The review
of literature in any research is based on secondary data. It
is collected by someone else for some other purpose (but is
being utilized by the investigator for another purpose). For
example, census data may be used to analyze the impact of
education on career choice and earnings. Common sources of
secondary data for social science include censuses,
organizational records and data collected through
qualitative methodologies or qualitative research. Secondary
data is essential, since it is impossible to conduct a new
survey that can adequately capture past change and/or
developments.
Sources of Secondary Data:
The following are some sources of secondary data:
• Books
• Records
• Biographies
• Newspapers
• Published censuses or other statistical data
• Data archives
• Internet articles
• Research articles by other researchers (journals)
• Databases, etc.

Importance of Secondary Data:


Secondary data can be less valid, but it is still important.
Sometimes it is difficult to obtain primary data; in
these cases getting information from secondary sources is
easier and possible. Sometimes primary data does not exist;
in such a situation one has to confine the research to
secondary data. Sometimes primary data is present but the
respondents are not willing to reveal it; in such a case, too,
secondary data can suffice.
Methods of Collecting Data
1. Direct personal interviews.
2. Indirect Oral interviews.
3. Information from correspondents.
4. Mailed questionnaire method.
5. Experimental Methods.

There are many methods used to collect or obtain data for
statistical analysis. Three of the most popular methods are:
• Direct Observation
• Experiments, and
• Surveys
A survey solicits information from people; e.g. Gallup polls;
pre-election polls; marketing surveys. The Response Rate (i.e.
the proportion of all people selected who complete the
survey) is a key survey parameter. Surveys may be
administered in a variety of ways, e.g.
• Personal Interview,
• Telephone Interview, and
• Self-Administered Questionnaire.
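The response rate defined above is a simple proportion. A minimal sketch follows; the counts are hypothetical and `response_rate` is an illustrative helper, not a standard library function.

```python
def response_rate(completed, selected):
    """Proportion of all people selected who completed the survey."""
    if selected <= 0:
        raise ValueError("selected must be positive")
    if not 0 <= completed <= selected:
        raise ValueError("completed must be between 0 and selected")
    return completed / selected

# Hypothetical survey: 1,000 people selected, 300 completed it.
rate = response_rate(300, 1000)
print(f"Response rate: {rate:.0%}")
```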
Questionnaire Design. Over the years, a lot of thought has
been put into the science of the design of survey questions.
Key design principles:
1. Keep the questionnaire as short as possible.
2. Ask short, simple, and clearly worded questions.
3. Start with demographic questions to help respondents get
started comfortably.
4. Use dichotomous (yes | no) and multiple choice questions.
5. Use open-ended questions cautiously.
6. Avoid using leading-questions.
7. Pretest a questionnaire on a small number of people.
8. Think about the way you intend to use the collected data
when preparing the questionnaire.
Sampling. Recall that statistical inference permits us to draw
conclusions about a population based on a sample. Sampling
(i.e. selecting a sub-set of a whole population) is often done
for reasons of cost (it’s less expensive to sample 1,000
television viewers than 100 million TV viewers) and
practicality (e.g. performing a crash test on every automobile
produced is impractical). In any case, the sampled population
and the target population should be similar to one another.
Sampling Plans. A sampling plan is just a method or
procedure for specifying how a sample will be taken from a
population. We will focus our attention on these three
methods:
• Simple Random Sampling,
• Stratified Random Sampling, and
• Cluster Sampling.
Simple Random Sampling. A simple random sample is a
sample selected in such a way that every possible sample of
the same size is equally likely to be chosen. Drawing three
names from a hat containing all the names of the students in
the class is an example of a simple random sample: any
group of three names is as likely to be picked as any other
group of three names.
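The names-from-a-hat example can be sketched with Python's standard `random.sample`, which draws without replacement. The class list and the `simple_random_sample` wrapper are hypothetical; the optional seed is there only so a demonstration can be repeated.

```python
import random

def simple_random_sample(names, n, seed=None):
    """Draw n distinct names so that every group of n is equally likely."""
    rng = random.Random(seed)  # seed is optional, only for repeatable demos
    return rng.sample(names, n)

# Hypothetical class list (placeholder names).
class_list = ["Ana", "Ben", "Carla", "Dino", "Ella", "Felix", "Gina", "Hugo"]

drawn = simple_random_sample(class_list, 3, seed=42)
print(drawn)
```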
Stratified Random Sampling. A stratified random sample is
obtained by separating the population into mutually
exclusive sets, or strata, and then drawing simple random
samples from each stratum.
Strata 1 (gender): male, female
Strata 2 (age): < 20, 20-30, 31-40, 41-50, 51-60, > 60
Strata 3 (occupation): professional, clerical, blue collar, other
We can acquire information about the total population, make
inferences within a stratum or make comparisons across
strata. After the population has been stratified, we can use
simple random sampling to generate the complete sample: if
we only have sufficient resources to sample 400 people total,
we would draw 100 of them from the low income group… …if we
are sampling 1000 people, we’d draw 50 of them from the
high income group.
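The income example is consistent with proportional allocation, where each stratum contributes to the sample in proportion to its share of the population. The sketch below assumes shares of 25% (low income) and 5% (high income), which are inferred from the numbers in the example (100 of 400, 50 of 1000) rather than stated in it; the 70% middle group and both helper functions are hypothetical.

```python
import random

# Assumed population shares in percent. The 25% and 5% figures are inferred
# from the example's numbers; the middle group is an assumption added so
# that the shares sum to 100.
SHARES_PERCENT = {"low": 25, "middle": 70, "high": 5}

def proportional_allocation(shares_percent, total_n):
    """Units to draw from each stratum, proportional to its share.

    Uses integer division, so totals that do not divide evenly are
    rounded down within each stratum.
    """
    return {name: total_n * pct // 100 for name, pct in shares_percent.items()}

def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample of the allocated size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, allocation[name])
            for name, members in strata.items()}

# Hypothetical population of 1,000 people identified by number.
strata = {
    "low": list(range(0, 250)),
    "middle": list(range(250, 950)),
    "high": list(range(950, 1000)),
}

allocation_400 = proportional_allocation(SHARES_PERCENT, 400)
allocation_1000 = proportional_allocation(SHARES_PERCENT, 1000)
sample_400 = stratified_sample(strata, allocation_400, seed=1)
print(allocation_400, allocation_1000)
```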
Cluster Sampling. A cluster sample is a simple random
sample of groups or clusters of elements (vs. a simple
random sample of individual objects). This method is useful
when it is difficult or costly to develop a complete list of the
population members or when the population elements are
widely dispersed geographically. Cluster sampling may
increase sampling error due to similarities among cluster
members.
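A minimal sketch of one-stage cluster sampling follows: whole clusters are chosen by simple random sampling and every element of a chosen cluster is kept. The village and household names and the `cluster_sample` helper are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters whole clusters at random and keep all their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical villages (clusters) and their households (elements).
villages = {
    "village_a": ["a1", "a2", "a3"],
    "village_b": ["b1", "b2"],
    "village_c": ["c1", "c2", "c3", "c4"],
    "village_d": ["d1"],
}

selected = cluster_sample(villages, 2, seed=7)
print(selected)
```

Only a list of clusters is needed here, not a list of every household, which is why the method suits populations that are hard to enumerate.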
Sample Size. This is an important issue. Numerical
techniques for determining sample sizes will be described
later, but suffice it to say that the larger the sample size is,
the more accurate we can expect the sample estimates to be.
Sampling and Non-Sampling Errors. Two major types of
error can arise when a sample of observations is taken from a
population: sampling error and non-sampling error. Each is
described in turn below.
Sampling Error. Sampling error refers to differences between
the sample and the population that exist only because of the
observations that happened to be selected for the sample.
Another way to look at this is that the differences in results
for different samples (of the same size) are due to sampling
error. For example, consider two samples of size 10 drawn
from 1,000 households. If we happened to get the highest
income level data points in our first sample and all the
lowest income levels in the second, this is a consequence of
sampling error. Increasing the sample size will reduce this
type of error.
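The effect of sample size on sampling error can be illustrated with a small simulation. The sketch below builds a made-up population of 1,000 household incomes, draws many samples of size 10 and of size 100, and compares how much the sample means vary. The income distribution, the number of repetitions, and the helper name are all assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

def spread_of_sample_means(population, n, repeats, rng):
    """Standard deviation of the sample mean across repeated samples of size n."""
    means = [mean(rng.sample(population, n)) for _ in range(repeats)]
    return stdev(means)

rng = random.Random(0)  # fixed seed so the demonstration is repeatable

# Hypothetical incomes for 1,000 households (simulated, not real data).
households = [rng.gauss(50_000, 15_000) for _ in range(1_000)]

spread_small = spread_of_sample_means(households, 10, 200, rng)
spread_large = spread_of_sample_means(households, 100, 200, rng)
print(f"n=10:  {spread_small:.0f}")
print(f"n=100: {spread_large:.0f}")
```

The sample means from the larger samples cluster more tightly around the population mean, which is the sense in which a larger sample reduces sampling error.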
Non-Sampling Error. Non-sampling errors are more serious
and are due to mistakes made in the acquisition of data or
due to the sample observations being selected improperly.
There are three types of non-sampling errors:
• Errors in data acquisition,
• Nonresponse errors, and
• Selection bias.
Increasing the sample size will not reduce this type of error.
Errors in Data Acquisition. These arise from the recording of
incorrect responses, due to:
• incorrect measurements being taken because of faulty
equipment,
• mistakes made during transcription from primary sources,
• inaccurate recording of data due to misinterpretation of
terms, or
• inaccurate responses to questions concerning sensitive
issues.
Nonresponse Error. This refers to error (or bias) introduced
when responses are not obtained from some members of the
sample, i.e. the sample observations that are collected may
not be representative of the target population. As mentioned
earlier, the Response Rate (i.e. the proportion of all people
selected who complete the survey) is a key survey parameter
and helps in understanding the validity of the survey
and sources of nonresponse error.
Selection Bias. This occurs when the sampling plan is such that
some members of the target population cannot possibly be
selected for inclusion.
