
We'll talk about data in lots of places in The Knowledge Base, but here I just want to make a fundamental distinction between two types of data: qualitative and quantitative. The way we
typically define them, we call data 'quantitative' if it is in numerical form and 'qualitative' if it is
not. Notice that qualitative data could be much more than just words or text. Photographs,
videos, sound recordings and so on, can be considered qualitative data.

Personally, while I find the distinction between qualitative and quantitative data to have some
utility, I think most people draw too hard a distinction, and that can lead to all sorts of confusion.
In some areas of social research, the qualitative-quantitative distinction has led to protracted
arguments with the proponents of each arguing the superiority of their kind of data over the
other. The quantitative types argue that their data is 'hard', 'rigorous', 'credible', and 'scientific'.
The qualitative proponents counter that their data is 'sensitive', 'nuanced', 'detailed', and 'contextual'.

For many of us in social research, this kind of polarized debate has become less than productive.
And, it obscures the fact that qualitative and quantitative data are intimately related to each other.
All quantitative data is based upon qualitative judgments; and all qualitative data can be
described and manipulated numerically. For instance, think about a very common quantitative
measure in social research -- a self esteem scale. The researchers who develop such instruments
had to make countless judgments in constructing them: how to define self esteem; how to
distinguish it from other related concepts; how to word potential scale items; how to make sure
the items would be understandable to the intended respondents; what kinds of contexts it could
be used in; what kinds of cultural and language constraints might be present; and on and on. The
researcher who decides to use such a scale in their study has to make another set of judgments:
how well does the scale measure the intended concept; how reliable or consistent is it; how
appropriate is it for the research context and intended respondents; and on and on. Believe it or
not, even the respondents make many judgments when filling out such a scale: what is meant by
various terms and phrases; why is the researcher giving this scale to them; how much energy and
effort do they want to expend to complete it, and so on. Even the consumers and readers of the
research will make lots of judgments about the self esteem measure and its appropriateness in
that research context. What may look like a simple, straightforward, cut-and-dried quantitative
measure is actually based on lots of qualitative judgments made by lots of different people.
On the other hand, all qualitative information can be easily converted into quantitative, and there
are many times when doing so would add considerable value to your research. The simplest way
to do this is to divide the qualitative information into units and number them! I know that sounds
trivial, but even that simple nominal enumeration can enable you to organize and process
qualitative information more efficiently. Perhaps more to the point, we might take text
information (say, excerpts from transcripts) and pile these excerpts into piles of similar
statements. When we do something even as easy as this simple grouping or piling task, we can
describe the results quantitatively. For instance, if we had ten statements and we grouped these into five piles (as shown in the figure), we could describe the piles using a 10 x 10 table of 0's and 1's. If two statements were placed together in the same pile, we would put a 1 in their row-column juncture. If two statements were placed in different piles, we would use a 0. The resulting matrix or table describes the grouping of the ten statements in terms of their similarity. Even though the data in this example consists of qualitative statements (one per card), the result of our simple qualitative procedure (grouping similar excerpts into the same piles) is quantitative in nature. "So what?" you ask. Once we have the data in numerical form, we can manipulate it numerically. For instance, we could have five different judges sort the ten excerpts and obtain a 0-1 matrix like this for each judge. Then we could average the five matrices into a single one that shows the proportions of judges who grouped each pair together.
This proportion could be considered an estimate of the similarity (across independent judges) of
the excerpts. While this might not seem too exciting or useful, it is exactly this kind of procedure
that I use as an integral part of the process of developing 'concept maps' of ideas for groups of
people (something that is useful!).
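The sorting-and-averaging procedure described above can be sketched in a few lines of Python. The statements, judges, and pile assignments below are invented for illustration:

```python
# Sketch: turning pile-sort results into a similarity matrix.
# Each judge's sort is a list of piles; each pile is a set of statement indices.

def sort_to_matrix(piles, n):
    """Build an n x n 0/1 co-occurrence matrix from one judge's piles."""
    m = [[0] * n for _ in range(n)]
    for pile in piles:
        for i in pile:
            for j in pile:
                m[i][j] = 1  # 1 means statements i and j share a pile
    return m

def average_matrices(matrices):
    """Average several judges' 0/1 matrices into a proportion matrix."""
    n, k = len(matrices[0]), len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]

# Two hypothetical judges sorting four statements:
judge_a = [{0, 1}, {2, 3}]
judge_b = [{0, 1, 2}, {3}]
avg = average_matrices([sort_to_matrix(j, 4) for j in (judge_a, judge_b)])
print(avg[0][1])  # proportion of judges who grouped statements 0 and 1 together
```

Each cell of `avg` estimates the similarity of a pair of excerpts as the proportion of judges who piled them together, exactly the quantity described in the text.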

You won't be able to do very much in research unless you know how to talk about variables. A
variable is any entity that can take on different values. OK, so what does that mean? Anything
that can vary can be considered a variable. For instance, age can be considered a variable
because age can take different values for different people or for the same person at different
times. Similarly, country can be considered a variable because a person's country can be assigned
a value.

Variables aren't always 'quantitative' or numerical. The variable 'city' consists of text values like
'New York' or 'Sydney'. We can, if it is useful, assign quantitative values in place of the text values, but we don't have to assign numbers for something to be a variable.
It's also important to realize that variables aren't only things that we measure in the traditional
sense. For instance, in much social research and in program evaluation, we consider the
treatment or program to be made up of one or more variables (i.e., the 'cause' can be considered a
variable). An educational program can have varying amounts of 'time on task', 'classroom
settings', 'student-teacher ratios', and so on. So even the program can be considered a variable
(which can be made up of a number of sub-variables).

An attribute is a specific value on a variable. For instance, the variable sex or gender has two
attributes: male and female. Or, the variable agreement might be defined as having five attributes:

 1 = strongly disagree
 2 = disagree
 3 = neutral
 4 = agree
 5 = strongly agree
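Once the attributes are mapped to their numeric codes, a scale like this is easy to work with. A minimal sketch, with hypothetical responses:

```python
# Sketch: coding the five agreement attributes numerically.
AGREEMENT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

responses = ["agree", "neutral", "strongly agree"]  # hypothetical answers
codes = [AGREEMENT[r] for r in responses]
print(codes)                    # the coded attributes: [4, 3, 5]
print(sum(codes) / len(codes))  # mean agreement: 4.0
```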

Another important distinction having to do with the term 'variable' is the distinction between an
independent and dependent variable. This distinction is particularly relevant when you are
investigating cause-effect relationships. It took me the longest time to learn this distinction. (Of
course, I'm someone who gets confused about the signs for 'arrivals' and 'departures' at airports --
do I go to arrivals because I'm arriving at the airport or does the person I'm picking up go to
arrivals because they're arriving on the plane!). I originally thought that an independent variable
was one that would be free to vary or respond to some program or treatment, and that a
dependent variable must be one that depends on my efforts (that is, it's the treatment). But this is
entirely backwards! In fact the independent variable is what you (or nature) manipulates -- a
treatment or program or cause. The dependent variable is what is affected by the independent
variable -- your effects or outcomes. For example, if you are studying the effects of a new
educational program on student achievement, the program is the independent variable and your
measures of achievement are the dependent ones.

Finally, there are two traits of variables that should always be achieved. Each variable should be
exhaustive: it should include all possible answerable responses. For instance, if the variable is
"religion" and the only options are "Protestant", "Jewish", and "Muslim", there are quite a few
religions I can think of that haven't been included. The list does not exhaust all possibilities. On
the other hand, if you exhaust all the possibilities with some variables -- religion being one of
them -- you would simply have too many responses. The way to deal with this is to explicitly list
the most common attributes and then use a general category like "Other" to account for all
remaining ones. In addition to being exhaustive, the attributes of a variable should be mutually
exclusive: no respondent should be able to have two attributes simultaneously. While this might
seem obvious, it is often rather tricky in practice. For instance, you might be tempted to represent
the variable "Employment Status" with the two attributes "employed" and "unemployed." But
these attributes are not necessarily mutually exclusive -- a person who is looking for a second job
while employed would be able to check both attributes! But don't we often use questions on
surveys that ask the respondent to "check all that apply" and then list a series of categories? Yes,
we do, but technically speaking, each of the categories in a question like that is its own variable
and is treated dichotomously as either "checked" or "unchecked", attributes that are mutually
exclusive.

Types of Data

There are two different types of data that we use when carrying out our research projects. These
two types of data are called primary and secondary data.

Primary Data

Primary data is data that we collect ourselves during the period of our research, e.g.
questionnaires, observations, interviews and so on. We then use the data we have collected and
noted down to begin the next stage of our research, which is building theory and developing an
understanding of what we are researching.

Primary data is best suited to ever-evolving research, because different factors influence the
things we research and can lead to varying results depending on each factor and how large a role
it plays in the research.

Secondary Data

Secondary data is data that has already been collected, which we use for reference or to learn
from other people's experiences, e.g. published books, government publications, journals and the
internet. We then use this data to supplement the primary data we have collected, combining
different people's opinions and building a theory with evidence to back it up.

Secondary data is best used to add existing evidence and support to the primary data that we
have collected; we are better off using secondary data as a reference and as a way to gain the
knowledge we need to begin our own research processes.

Classification of Data

There are multiple classifications of data that we use in our research. These include discrete
data, ordinal data, continuous data, nominal data, interval data and ratio data.

Discrete Data

Discrete data can’t be broken down into smaller data values, e.g. a questionnaire with answer
options of “Yes/No” or “Male/Female”. This type of data is best used for things that are
counted in whole numbers, like the examples shown.

Ordinal Data

Ordinal data is data that can be ranked and ordered according to the values each subject has.
For example, in a football league table, the team with more points is placed higher up the
table.

Continuous Data
Continuous data can take any numeric value, with any number of desired decimal places. This
data is used in events where time is a major part of how the event works and is ranked, e.g. a
Formula One race can come down to milliseconds between the places the drivers receive. An
example of continuous data is “The driver completed the fastest lap of the race, winning by 1m
12.34s”.

Nominal Data

Nominal data is used to assign categories for identification purposes e.g. “Male = 1 Female = 2”
the numbers have no value but identify the difference between the male and female population
participating in the research or data evidence.

Interval Data

Interval data is measured on an ordered scale with equal intervals between scores, e.g.
Olympic judging scores 6.0, 6.5, 7.0, 7.5. This data is used if judges are unsure whether to round
a score higher or lower, giving them the option of offering a score in between the two scores
that the performer might otherwise have received.

Ratio Data

Ratio data is based on an ordered scale with proportional, equal units of measurement, e.g. rugby
scores: scoring 40 points has twice the value of scoring 20 points. This data is also used for
things such as blood pressure measurements.
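As a rough sketch, the examples above for the four levels of measurement (nominal, ordinal, interval, ratio) can be laid out in Python; the specific values are illustrative only:

```python
# Sketch: hypothetical values illustrating four levels of measurement.
nominal = {"male": 1, "female": 2}   # numbers only label categories
ordinal = ["1st", "2nd", "3rd"]      # order matters, but gaps carry no meaning
interval = [6.0, 6.5, 7.0, 7.5]      # equal steps between scores, no true zero
ratio = [20, 40]                     # true zero point, so ratios are meaningful

print(ratio[1] / ratio[0])           # 2.0: scoring 40 is twice scoring 20
print(interval[1] - interval[0])     # 0.5: equal interval between adjacent scores
```

Note that dividing interval values (e.g. claiming a 7.0 routine is "twice" a 3.5 one) would not be meaningful, which is exactly what distinguishes interval from ratio data.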


Data are distinct pieces of information, usually formatted in a special way. Strictly speaking, data
is the plural of datum, a single piece of information. In practice, however, people use data as
both the singular and plural form of the word. In database management systems, data files are the
files that store the database information.

Research data is data that is collected, observed, or created, for purposes of analysis to produce
original research results. The word “data” is used throughout this site to refer to research data.

Research data can be generated for different purposes and through different processes, and can
be divided into different categories. Each category may require a different type of data
management plan.

 Observational: data captured in real-time, usually irreplaceable. For example, sensor data, survey data, sample data, neurological images.
 Experimental: data from lab equipment, often reproducible, but can be expensive. For
example, gene sequences, chromatograms, toroid magnetic field data.
 Simulation: data generated from test models where model and metadata are more
important than output data. For example, climate models, economic models.
 Derived or compiled: data is reproducible but expensive. For example, text and data
mining, compiled database, 3D models.
 Reference or canonical: a (static or organic) conglomeration or collection of smaller
(peer-reviewed) datasets, most probably published and curated. For example, gene
sequence databanks, chemical structures, or spatial data portals.

Research data may include all of the following:

 Text or Word documents, spreadsheets
 Laboratory notebooks, field notebooks, diaries
 Questionnaires, transcripts, codebooks
 Audiotapes, videotapes
 Photographs, films
 Test responses
 Slides, artifacts, specimens, samples
 Collection of digital objects acquired and generated during the process of research
 Data files
 Database contents including video, audio, text, images
 Models, algorithms, scripts
 Contents of an application such as input, output, log files for analysis software,
simulation software, schemas
 Methodologies and workflows
 Standard operating procedures and protocols

The following research records may also be important to manage during and beyond the life of
a project:

 Correspondence including electronic mail and paper-based correspondence
 Project files
 Grant applications
 Ethics applications
 Technical reports
 Research reports
 Master lists
 Signed consent forms

Data vs. Information

Data are plain facts. When data are processed, organized, structured or presented in a given
context so as to make them useful, they are called information.

It is not enough to have data (such as statistics on the economy). Data in themselves are fairly
useless. But when these data are interpreted and processed to determine their true meaning, they
become useful and can be called information. Data is the computer’s language. Information is
our translation of this language.

Some definitions
Quantitative research is “explaining phenomena by collecting numerical data that are analysed
using mathematically based methods (in particular statistics).”*

Qualitative research seeks to answer questions about why and how people behave in the way
that they do. It provides in-depth information about human behaviour.

* Taken from: Aliaga and Gunderson ‘Interactive Statistics ‘3rd Edition (2005)

Quantitative Research

Quantitative research is perhaps the simpler to define and identify.

The data produced are always numerical, and they are analysed using mathematical and
statistical methods. If there are no numbers involved, then it’s not quantitative research.

Some phenomena obviously lend themselves to quantitative analysis because they are already
available as numbers. Examples include changes in achievement at various stages of education,
or the increase in number of senior managers holding management degrees. However, even
phenomena that are not obviously numerical in nature can be examined using quantitative methods.

Example: turning opinions into numbers

If you wish to carry out statistical analysis of the opinions of a group of people about a particular
issue or element of their lives, you can ask them to express their relative agreement with
statements and answer on a five- or seven-point scale, where 1 is strongly disagree, 2 is disagree,
3 is neutral, 4 is agree and 5 is strongly agree (the seven-point scale also has slightly agree
and slightly disagree options).
Such scales are called Likert scales, and enable statements of opinion to be directly translated
into numerical data.

The development of Likert scales and similar techniques means that most phenomena can be
studied using quantitative techniques.

This is particularly useful if you are in an environment where numbers are highly valued and
numerical data is considered the ‘gold standard’.

However, it is important to note that quantitative methods are not necessarily the most suitable
methods for investigation. They are unlikely to be very helpful when you want to understand the
detailed reasons for particular behaviour in depth. It is also possible that assigning numbers to
fairly abstract constructs such as personal opinions risks making them spuriously precise.

Sources of Quantitative Data

The most common sources of quantitative data include:

 Surveys, whether conducted online, by phone or in person. These rely on the same questions
being asked in the same way to a large number of people;
 Observations, which may either involve counting the number of times that a particular
phenomenon occurs, such as how often a particular word is used in interviews, or coding
observational data to translate it into numbers; and
 Secondary data, such as company accounts.
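The observational coding mentioned above, counting how often a particular word is used in interviews, can be sketched in a few lines (the transcript excerpts are invented for illustration):

```python
# Sketch: coding qualitative interview text into word counts.
from collections import Counter

transcripts = [
    "the program helped me but the schedule was hard",
    "the schedule worked well for me",
]
word_counts = Counter(
    word for line in transcripts for word in line.lower().split()
)
print(word_counts["schedule"])  # 'schedule' occurs twice across the interviews
```

Real coding schemes are of course richer than raw word counts, but the principle is the same: a rule turns text into numbers that can then be analysed statistically.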

Our pages on Survey Design and Observational Research provide more information about these methods.

Analysing Quantitative Data

There is a wide range of statistical techniques available to analyse quantitative data, from
simple graphs to show the data, through tests of correlations between two or more items, to tests
of statistical significance. Other techniques include cluster analysis, useful for identifying
relationships between groups of subjects where there is no obvious hypothesis, and hypothesis
testing, to identify whether there are genuine differences between groups.

Our page Statistical Analysis provides more information about some of the simpler statistical techniques.

Qualitative Research

Qualitative research is any research that does not involve numbers or numerical data.

It often involves words or language, but may also use pictures or photographs and observations.

Almost any phenomenon can be examined in a qualitative way, and it is often the preferred
method of investigation in the UK and the rest of Europe; US studies tend to use quantitative
methods, although this distinction is by no means absolute.

Qualitative analysis results in rich data that gives an in-depth picture and it is particularly useful
for exploring how and why things have happened.

However, there are some pitfalls to qualitative research, such as:

 If respondents do not see a value for them in the research, they may provide inaccurate or
false information. They may also say what they think the researcher wishes to hear. Qualitative
researchers therefore need to take the time to build relationships with their research subjects
and always be aware of this potential.
 Although ethics are an issue for any type of research, there may be particular difficulties with
qualitative research because the researcher may be party to confidential information. It is
important always to bear in mind that you must do no harm to your research subjects.
 It is generally harder for qualitative researchers to remain apart from their work. By the nature
of their study, they are involved with people. It is therefore helpful to develop habits of reflecting
on your part in the work and how this may affect the research. See our page on Reflective
Practice for more.

Sources of Qualitative Data

Although qualitative data is much more general than quantitative, there are still a number of
common techniques for gathering it. These include:

 Interviews, which may be structured, semi-structured or unstructured;
 Focus groups, which involve multiple participants discussing an issue;
 ‘Postcards’, or small-scale written questionnaires that ask, for example, three or four focused
questions of participants but allow them space to write in their own words;
 Secondary data, including diaries, written accounts of past events, and company reports; and
 Observations, which may be on site, or under ‘laboratory conditions’, for example, where
participants are asked to role-play a situation to show what they might do.

Our pages on Interviews for Research, Focus Groups and Observational Research provide more
information about these techniques.

Analysing Qualitative Data

Because qualitative data are drawn from a wide variety of sources, they can be radically different
in scope.

There are, therefore, a wide variety of methods for analysing them, many of which involve
structuring and coding the data into groups and themes. There are also a variety of computer
packages to support qualitative data analysis. The best way to work out which ones are right for
your research is to discuss it with academic colleagues and your supervisor.
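The structuring-and-coding step mentioned above can be sketched as follows; the codes and excerpts are invented for illustration:

```python
# Sketch: grouping coded qualitative excerpts into themes.
coded_excerpts = [
    ("cost", "I could not afford the fees"),
    ("time", "the sessions clashed with work"),
    ("cost", "travel was too expensive"),
]

themes = {}
for code, excerpt in coded_excerpts:
    themes.setdefault(code, []).append(excerpt)

print(len(themes["cost"]))  # two excerpts share the 'cost' theme
```

Dedicated qualitative analysis packages do essentially this at scale, while keeping the excerpts linked back to their source interviews.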

Our page Analysing Qualitative Data provides more information about some of the most common methods.

So what is the difference between Qualitative Research and Quantitative Research?

Qualitative Research is primarily exploratory research. It is used to gain an understanding of

underlying reasons, opinions, and motivations. It provides insights into the problem or helps to
develop ideas or hypotheses for potential quantitative research. Qualitative Research is also used
to uncover trends in thought and opinions, and dive deeper into the problem. Qualitative data
collection methods vary using unstructured or semi-structured techniques. Some common
methods include focus groups (group discussions), individual interviews, and
participation/observations. The sample size is typically small, and respondents are selected to
fulfill a given quota.

Quantitative Research is used to quantify the problem by way of generating numerical data or
data that can be transformed into useable statistics. It is used to quantify attitudes, opinions,
behaviors, and other defined variables – and generalize results from a larger sample population.
Quantitative Research uses measurable data to formulate facts and uncover patterns in research.
Quantitative data collection methods are much more structured than Qualitative data collection
methods. Quantitative data collection methods include various forms of surveys – online surveys,
paper surveys, mobile surveys and kiosk surveys, face-to-face interviews, telephone interviews,
longitudinal studies, website interceptors, online polls, and systematic observations.

Snap Survey Software is the ideal solution for a Quantitative Research tool where structured
techniques such as large numbers of respondents and descriptive findings are required. Snap
Survey Software has many robust features that will help your organization effectively gather and
analyze quantitative data.

Quantitative Vs. Qualitative Research: When to Use

Michaela Mora

Mar 16, 2010



Regardless of the subject of your study, you have just two types of research to choose from:
qualitative and quantitative.

How much you know (or suspect) about your area of research and your respondents will
determine exactly which kind of research is right for you. Most people will need a combination
of the two to get the most accurate data.

When and How to Use Qualitative Research

Qualitative research is by definition exploratory, and it is used when we don’t know what to
expect, to define the problem or develop an approach to the problem.
It’s also used to go deeper into issues of interest and explore nuances related to the problem at
hand. Common data collection methods used in qualitative research are:

 Focus groups
 Triads
 Dyads
 In-depth interviews
 Uninterrupted observation
 Bulletin boards
 Ethnographic participation/observation.

The Best Times for Quantitative Research

Quantitative research is conclusive in its purpose, as it tries to quantify a problem and understand
how prevalent it is by looking for projectable results to a larger population.

For this type of study we collect data through:

 Surveys (online, phone, paper)
 Audits
 Points of purchase (purchase transactions)
 Click-streams.

Guidelines For Using Both Types of Research

Ideally, if budget allows, we should use both qualitative and quantitative research since they
provide different perspectives and usually complement each other.

Advanced survey software should give you the option to integrate video and chat sessions with
your surveys, which can give you the best of both quantitative and qualitative research.

This methodological approach is a cost-effective alternative to the combination of in-person focus groups and a separate quantitative study.

It allows us to save on facility rental, recruitment costs, incentives and travel usually associated
with focus groups, and clients still are able to monitor the sessions remotely from the
convenience of their desktops and ask questions to respondents through the moderator.

If you still want to go with traditional methods and can only afford one or the other, make sure
you select the approach that best fits the research objectives and be aware of its caveats.
Never assume that doing more focus groups is a substitute for quantitative research or that a long
survey will give you all the in-depth information you could get through qualitative research.

For a more detailed guide on the best way to ask quantitative questions, check out our article on
New Ways to Ask Quantitative Research Questions.

The research variables, of any scientific experiment or research process, are factors that can be manipulated and measured.

Any factor that can take on different values is a scientific variable and influences the outcome of
experimental research.
Gender, color and country are all perfectly acceptable variables, because they
are inherently changeable.

Most scientific experiments measure quantifiable factors, such as time or weight, but this is not
essential for a component to be classed as a variable.

As an example, most of us have filled in surveys where a researcher asks questions and asks you to
rate answers. These responses generally have a numerical range, from ‘1 - Strongly Agree’ through
to ‘5 - Strongly Disagree’. This type of measurement allows opinions to be statistically analyzed.

Dependent and Independent Variables

The key to designing any experiment is to look at what research variables could affect the outcome.
There are many types of variable but the most important, for the vast majority of research methods,
are the independent and dependent variables.
A researcher must determine which variable needs to be manipulated to
generate quantifiable results.
The independent variable is the core of the experiment and is isolated and manipulated by the
researcher. The dependent variable is the measurable outcome of this manipulation, the results of
the experimental design. For many physical experiments, isolating the independent variable and
measuring the dependent is generally easy.
If you designed an experiment to determine how quickly a cup of coffee cools, the manipulated
independent variable is time and the dependent measured variable is temperature.
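A small sketch of that coffee example makes the two roles concrete: the experimenter sets the times, and the temperatures are what gets measured. Newton's law of cooling with invented constants supplies the illustrative readings:

```python
# Sketch: the coffee-cooling experiment. Time is the manipulated
# (independent) variable; temperature is the measured (dependent) one.
import math

room, start, k = 20.0, 90.0, 0.05   # hypothetical constants (deg C, per minute)
times = [0, 10, 20, 30]             # independent variable: minutes elapsed
temps = [room + (start - room) * math.exp(-k * t) for t in times]

for t, temp in zip(times, temps):
    print(f"t={t:2d} min -> {temp:.1f} C")  # dependent variable readings
```

Plotting temperature against time (dependent on the y-axis, independent on the x-axis) would show the familiar exponential cooling curve.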

In other fields of science, the variables are often more difficult to determine and an experiment
needs a robust design. Operationalization is a useful tool to measure fuzzy concepts which do not
have one obvious variable.

The Difficulty of Isolating Variables

In biology, social science and geography, for example, isolating a single independent variable is
more difficult and any experimental design must consider this.
For example, in a social research setting, you might wish to compare the effect of different foods
upon hyperactivity in children. The initial research and inductive reasoning leads you to postulate
that certain foods and additives are a contributor to increased hyperactivity. You decide to create a
hypothesis and design an experiment, to establish if there is solid evidence behind the claim.

The type of food is an independent variable, as is the amount eaten, the period of time and the
gender and age of the child. All of these factors must be accounted for during the experimental
design stage. Randomization and controls are generally used to ensure that only one independent
variable is manipulated.
To eradicate some of these research variables and isolate the process, it is essential to use various
scientific measurements to nullify or negate them.
For example, if you wanted to isolate the different types of food as the manipulated variable, you
should use children of the same age and gender.

The test groups should eat the same amount of the food at the same times and the children should
be randomly assigned to groups. This will minimize the physiological differences between children. A
control group, acting as a buffer against unknown research variables, might involve some children
eating a food type with no known links to hyperactivity.
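The random assignment step described above can be sketched like this (the participant labels are hypothetical):

```python
# Sketch: randomly assigning participants to test and control groups.
import random

children = ["P1", "P2", "P3", "P4", "P5", "P6"]
random.shuffle(children)            # randomization step
half = len(children) // 2
test_group, control_group = children[:half], children[half:]

print(len(test_group), len(control_group))  # equal-sized groups: 3 3
```

Shuffling before splitting means any unknown differences between children are spread across groups by chance rather than by the researcher's choices.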
In this experiment, the dependent variable is the level of hyperactivity, with the resulting statistical
tests easily highlighting any correlation. Depending upon the results, you could try to measure a
different variable, such as gender, in a follow up experiment.
Converting Research Variables Into Constants

Ensuring that certain research variables are controlled increases the reliability and validity of the
experiment, by ensuring that other causal effects are eliminated. This safeguard makes it easier for
other researchers to repeat the experiment and comprehensively test the results.
What you are trying to do, in your scientific design, is to change most of the variables into constants,
isolating the independent variable. Any scientific research does contain an element of compromise
and inbuilt error, but eliminating other variables will ensure that the results are robust and valid.


As a researcher, you're going to perform an experiment. I'm kind of hungry right now, so let's say
your experiment will examine four people's ability to throw a ball when they haven't eaten for a
specific period of time - 6, 12, 18 and 24 hours.

We can say that in your experiment, you are going to do something and then see what happens to
other things. But, that sentence isn't very scientific. So, we're going to learn some new words to
replace the unscientific ones, so we can provide a scientific explanation for what you're going to
do in your experiment.

The starting point here is to identify what a variable is. A variable is defined as anything that has
a quantity or quality that varies. Your experiment's variables are not eating (how long a person
has gone without food) and throwing a ball.

Now, let's science up that earlier statement. 'You are going to manipulate a variable to see what
happens to another variable.' It still isn't quite right because we're using the blandest term for
variable, and we didn't differentiate between the variables. Let's take a look at some other terms
that will help us make this statement more scientific and specific.

Dependent and Independent Variables

A moment ago, we discussed the two variables in our experiment - hunger and throwing a ball.
But, they are both better defined by the terms 'dependent' or 'independent' variable.

The dependent variable is the variable a researcher is interested in. The changes to the
dependent variable are what the researcher is trying to measure with all their fancy techniques. In
our example, your dependent variable is the person's ability to throw a ball. We're trying to
measure the change in ball throwing as influenced by hunger.

An independent variable is a variable believed to affect the dependent variable. This is the
variable that you, the researcher, will manipulate to see if it makes the dependent variable
change. In our example of hungry people throwing a ball, our independent variable is how long
it's been since they've eaten.

To reiterate, the independent variable is the thing over which the researcher has control and is
manipulating. In this experiment, the researcher is controlling the food intake of the participant.
The dependent variable is believed to be dependent on the independent variable.

Your experiment's dependent variable is the ball throwing, which will hopefully change due to
the independent variable. So now, our scientific sentence is, 'You are going to manipulate an
independent variable to see what happens to the dependent variable.'
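One way to keep the two roles straight is to write the design down explicitly. This is only an illustrative sketch of the hypothetical ball-throwing experiment; the variable names and the idea of scoring a throw by distance are invented:

```python
# A minimal description of the hypothetical ball-throwing experiment:
# the independent variable is manipulated, the dependent variable is measured.
experiment = {
    "independent_variable": {
        "name": "hours since last meal",
        "levels": [6, 12, 18, 24],        # the conditions the researcher sets
    },
    "dependent_variable": {
        "name": "throwing distance",      # a hypothetical way to score a throw
        "measured": True,
    },
}

for level in experiment["independent_variable"]["levels"]:
    print(f"Condition: {level} hours fasting -> measure throwing distance")
```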

Unwanted Influence

Sometimes, when you're studying a dependent variable, your results don't make any sense. For
instance, what if people in one group are doing amazingly well while the other groups are doing
about the same? This could be caused by a confounding variable, defined as an interference
caused by another variable. In our unusually competent group example, the confounding variable
could be that this group is made up of players from the baseball team.

In our original example of hungry people throwing the ball, there are several confounding
variables we need to make sure we account for. Some examples would be:

 Metabolism and weight of the individuals (for example, a 90 lb woman not eating for 24 hours
compared to a 350 lb man not eating for 6 hours)
 Ball size (people with smaller hands may have a difficult time handling a large ball)
 Age (a 90-year-old person will perform differently than a 19-year-old person)

Confounding variables are a specific type of extraneous variable. Extraneous variables are
defined as any variable other than the independent and dependent variables. A confounding
variable, then, is an extraneous variable that could strongly influence your study, while other
extraneous variables typically influence your experiment in a lesser way. Some examples from
our ball throwing study include:

 Time of year
 Location of the experiment
 The person providing instructions

Our scientific sentence is now, 'You're going to manipulate the independent variable to see what
happens to the dependent variable, controlling for confounding or extraneous variables.'
Reducing or Increasing Changes

In an experiment, if you have multiple trials, you want to reduce the number of changes between
each trial. If you tell the ball throwers on the first day to toss a ping-pong ball into a little red
cup, and on the second day you tell ball throwers to hurl a bowling ball into a barrel, your results
are going to be different.

Educational Research
Step 2: Identify Key Variables and
Research Design
Once you have brainstormed project topics, narrowed down the list, and reviewed the research related to
that narrowed list, select a topic that seems most appealing to you. However, this project topic is not set in
stone yet. After you begin working through the project, you may realize that the topic needs to be revised,
or even entirely changed to a different topic. The next step is to identify the key variables and the
research design.

Key Variables

All research projects are based around variables. A variable is the characteristic or attribute of an
individual, group, educational system, or the environment that is of interest in a research study. Variables
can be straightforward and easy to measure, such as gender, age, or course of study. Other variables are
more complex, such as socioeconomic status, academic achievement, or attitude toward school.
Variables may also include an aspect of the educational system, such as a specific teaching method or
counseling program. Characteristics of the environment may also be variables, such as the amount of
school funding or availability of computers. Therefore, once the general research topic has been
identified, the researcher should identify the key variables of interest.

For example, a researcher is interested in low levels of literacy. Literacy itself is still a broad topic. In most
instances, the broad topic and general variables need to be specifically identified. For example, the
researcher needs to identify specific variables that define literacy: reading fluency (the ability to read a
text out loud), reading comprehension (understanding what is read), vocabulary, interest in reading, etc. If
a researcher is interested in motivation, what specific motivation variables are of interest: external
motivation, goals, need for achievement, etc? Reading other research studies about your chosen topic
will help you better identify the specific variables of interest.

Identifying the key variables is important for the following reasons:

 The key variables provide focus when writing the Introduction section.
 The key variables are the major terms to use when searching for research articles for the
Literature Review.
 The key variables are the terms to be operationally defined if an Operational Definition of Terms
section is necessary.

 The key variables provide focus to the Methods section.

 The Instrument will measure the key variables. These key variables must be directly measured
or manipulated for the research study to be valid.

Research Design
After the key variables have been identified, the researcher needs to identify how those variables will be
studied, which is the heart of the research design. There are four primary research designs:
 Descriptive: Describes the current state of variables. For example, a descriptive study might
examine teachers' knowledge of literacy development. This is a descriptive study because it
simply describes the current state of teachers' knowledge of literacy development.
 Causal Comparative: Examines the effect of one variable that cannot be manipulated on other
variables. An example would be the effect of gender on examination malpractice. A researcher
cannot manipulate a person's gender, so instead males and females are compared on their
examination malpractice behavior. Because the variable of interest cannot be manipulated,
causal comparative studies (sometimes also called ex post facto) compare two groups that differ
on the independent variable (e.g., gender) on the dependent variable (e.g., examination
malpractice). Thus, the key identifying factor of a causal comparative study is that it compares
two or more groups on a different variable.

 Correlational: Describes the relationship between variables. Correlational studies must examine
two variables that have continuous values. For example, academic achievement is a continuous
variable because students' scores have a wide range of values - oftentimes from 0 to 100.
However, gender is not a continuous variable because there are only two categories that gender
can have: male and female. A correlational study might examine the relationship between
motivation and academic achievement - both continuous variables. Note that in a correlational
design, both variables must be studied within the same group of individuals. In other words, it is
acceptable to study the relationship between academic achievement and motivation in students
because the two variables (academic achievement and motivation) are in the same group of
individuals (students). However, it is extremely difficult to study two variables in two groups of
people, such as the relationship between teacher motivation and student achievement. Here, the
two variables are compared between two groups: teachers and students. I strongly advise against
this latter type of study.

 Experimental and Quasi-Experimental: Examines the effect of a variable that the researcher
manipulates on other variables. An experimental or quasi-experimental study might examine the
effect of telling stories on children's literacy skills. In this case, the researcher will "manipulate"
the variable of telling stories by placing half of the children in a treatment group that listens to
stories and the other half of children in a control group that gets the ordinary literacy instruction.
The difference between an experimental design and quasi-experimental design is described in
Step 4: Research Design.

Descriptive studies are the simplest research design and provide the least amount of information
about improving education. Therefore, descriptive studies should only be conducted for first degree and
diploma projects. Only in special cases should a Masters thesis be descriptive. Doctoral dissertations
should aim for experimental or quasi-experimental studies.

Once the key variables and the research design have been identified, the rest of the study falls into place.

 The purpose, research questions, and hypotheses will be written about the variables based on
the research design.
 The Instruments will be developed to measure the key variables and the Instruments section in
Chapter 3 is written to describe the instruments.

 The Procedures section describes the treatment for experimental studies and/or how the
instrument will be administered.

 The Method of Data Analysis describes how the data is summarized and tested based on the
research questions and hypotheses.

Thus, the most difficult part of planning the research study is identifying the research variables and
research design. Considerable time and thought needs to be given to this step. Once the key variables
have been identified, then the research study can be developed. It is important to develop the research
study as described in Chapter 3 before writing the paper. If thought is not given to how the research study
should be conducted, then a researcher might spend considerable time and energy developing a Chapter
1 and Chapter 2 for a project that is completely unresearchable. Therefore, it is best to refine the research
study before spending the effort to write the first two chapters.

Sampling is concerned with choosing a subset of individuals

from a statistical population to estimate characteristics of a
whole population.
Learning Objective

 Compare and contrast the different sampling techniques used for opinion polls

Key Points
o Probability-proportional-to-size (PPS) is sampling in which the selection probability for
each element is set to be proportional to its size measure, up to 1. This approach can
improve accuracy by concentrating a sample on large elements that have the greatest
impact on population estimates.
o Maintaining the randomness in a sample is very important to each sampling technique
to ensure that the findings are representative of the population in general.
o Panel sampling is the method of selecting a group of participants through a random
sampling method and then asking that group for the same information again several
times over a period of time. This longitudinal sampling method allows for estimates of
changes in the population.

 Systematic Sampling
Systematic sampling relies on arranging the target population according to some ordering
scheme, a random start, and then selecting elements at regular intervals through that
ordered list.

 Simple Random Sampling

A simple random sampling (SRS) is a sample of a given size in which all such subsets of
the frame are given an equal probability to be chosen.

 Stratified Sampling

Stratified sampling is a method of probability sampling in which sub-populations within
an overall population are identified and included in the sample in a balanced way.

Full Text

Sampling Techniques

In statistics and survey methodology, sampling is concerned with the selection of a subset of
individuals from within a statistical population to estimate characteristics of the whole
population. The three main advantages of sampling are that the cost is lower, data collection is
faster, and the accuracy and quality of the data can be easily improved.

Normal Distribution Curve

The normal distribution curve can help indicate if the results of a survey are significant and what
the margin of error may be.

Simple Random Sampling

In a simple random sample (SRS) of a given size, all such subsets of the frame are given an equal
probability. Each element has an equal probability of selection. Furthermore, any given pair of
elements has the same chance of selection as any other pair. This minimizes bias and simplifies
analysis of results. In particular, the variance between individual results within the sample is a
good indicator of variance in the overall population, which makes it relatively easy to estimate
the accuracy of results.
However, SRS can be vulnerable to sampling error because the randomness of the selection may
result in a sample that doesn't reflect the makeup of the population.
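In Python's standard library, `random.sample` draws a simple random sample from a frame: every subset of the requested size is equally likely. A minimal sketch, with an invented frame of 100 units and a fixed seed for reproducibility:

```python
import random

population = list(range(1, 101))      # hypothetical sampling frame of 100 units

rng = random.Random(7)                # seeded so the example is reproducible
sample = rng.sample(population, 10)   # SRS: all size-10 subsets equally likely

print(sorted(sample))
```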

Systematic Sampling

Systematic sampling relies on arranging the target population according to some ordering
scheme, a random start, and then selecting elements at regular intervals through that ordered list.
As long as the starting point is randomized, systematic sampling is a type of probability
sampling. It is easy to implement and the stratification can make it efficient, if the variable by
which the list is ordered is correlated with the variable of interest.

However, if periodicity is present and the period is a multiple or factor of the interval used, the
sample is especially likely to be unrepresentative of the overall population, decreasing its
accuracy. Another drawback of systematic sampling is that even in scenarios where it is more
accurate than SRS, its theoretical properties make it difficult to quantify that accuracy. Like
SRS, systematic sampling is an equal probability of selection (EPS) method, because all
elements have the same probability of selection.
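The random-start-then-fixed-interval procedure can be sketched as follows; the frame, the interval of 10, and the seed are all hypothetical:

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select every k-th element after a random start in [0, k)."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # the random start keeps this a probability sample
    return frame[start::k]

frame = list(range(1, 101))    # hypothetical ordered frame of 100 units
sample = systematic_sample(frame, 10, seed=3)
print(len(sample))             # 10, whatever the random start turned out to be
```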

Stratified Sampling

Where the population embraces many distinct categories, the frame can be organized by these
categories into separate "strata." Each stratum is then sampled as an independent sub-population,
out of which individual elements can be randomly selected. In this way, researchers can draw
inferences about specific subgroups that may be lost in a more generalized random sample.
Additionally, since each stratum is treated as an independent population, different sampling
approaches can be applied to different strata, potentially enabling researchers to use the approach
best suited for each identified subgroup. Stratified sampling can increase the cost and complicate
the research design.
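A common variant samples each stratum in proportion to its size (proportional allocation). The sketch below assumes hypothetical "urban" and "rural" strata of 60 and 40 units:

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional allocation: sample each stratum in proportion to its size."""
    rng = random.Random(seed)
    pop_size = sum(len(members) for members in strata.values())
    return {name: rng.sample(members, round(total_n * len(members) / pop_size))
            for name, members in strata.items()}

# Hypothetical population: 60 urban and 40 rural units.
strata = {"urban": list(range(60)), "rural": list(range(60, 100))}
sample = stratified_sample(strata, 10, seed=2)
print({name: len(chosen) for name, chosen in sample.items()})  # {'urban': 6, 'rural': 4}
```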

Probability-Proportional-to-Size Sampling

Probability-proportional-to-size (PPS) is sampling in which the selection probability for each

element is set to be proportional to its size measure, up to a maximum of 1. The PPS approach
can improve accuracy for a given sample size by concentrating the sample on large elements that
have the greatest impact on population estimates. PPS sampling is commonly used for surveys of
businesses, where element size varies greatly and auxiliary information is often available.
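The idea of size-proportional selection probabilities can be illustrated with `random.choices`, which draws with replacement; real PPS designs usually draw without replacement and need more machinery. The business names and size measures below are invented:

```python
import random

# Hypothetical businesses with a size measure (e.g. number of employees).
businesses = {"A": 500, "B": 120, "C": 40, "D": 15, "E": 5}

rng = random.Random(1)
names = list(businesses)
sizes = [businesses[name] for name in names]

# Selection probability proportional to size; note this draws WITH replacement,
# unlike most production PPS designs.
draw = rng.choices(names, weights=sizes, k=3)
print(draw)
```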

Cluster Sampling

Sometimes it is more cost-effective to select respondents in groups ("clusters"). Sampling is

often clustered by geography or by time periods. Clustering can reduce travel and administrative
costs. It also means that one does not need a sampling frame listing all elements in the target
population. Instead, clusters can be chosen from a cluster-level frame, with an element-level
frame created only for the selected clusters.
Cluster sampling generally increases the variability of sample estimates above that of simple
random sampling, depending on how the clusters differ between themselves, as compared with
the within-cluster variation.
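The two-stage logic (choose clusters from a cluster-level frame, then list elements only within the chosen clusters) can be sketched like this; the villages and households are hypothetical:

```python
import random

# Hypothetical cluster-level frame: households grouped by village.
clusters = {
    "village_1": ["h1", "h2", "h3"],
    "village_2": ["h4", "h5"],
    "village_3": ["h6", "h7", "h8", "h9"],
    "village_4": ["h10"],
}

rng = random.Random(5)
chosen = rng.sample(list(clusters), 2)              # stage 1: pick clusters
sample = [h for c in chosen for h in clusters[c]]   # stage 2: list elements only
print(chosen, sample)                               # within the chosen clusters
```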

Quota Sampling

In quota sampling, the population is first segmented into mutually exclusive subgroups, just as in
stratified sampling. Then judgment is used to select the subjects or units from each segment
based on a specified proportion. For example, an interviewer may be told to sample 200 females
and 300 males between the ages of 45 and 60. In quota sampling, the selection of the sample is
non-random. The problem is that these samples may be biased because not everyone gets a
chance of selection.

Accidental Sampling

Accidental sampling (or grab, convenience, or opportunity sampling) is a type of non-probability

sampling which involves the sample being drawn from that part of the population which is close
to hand. The researcher cannot scientifically make generalizations about the total population
from this sample because it would not be representative enough.

Panel Sampling

Panel sampling is the method of first selecting a group of participants through a random
sampling method and then asking that group for the same information again several times over a
period of time. This longitudinal sampling method allows estimates of changes in the population.

Sampling techniques
What is sampling?
 A shortcut method for investigating a whole population
 Data is gathered on a small part of the whole parent population or sampling frame, and used to
inform what the whole picture is like

Why sample?

In reality there is simply not enough time, energy, money, labour/manpower, equipment, or access to
suitable sites to measure every single item or site within the parent population or whole sampling frame.

Therefore an appropriate sampling strategy is adopted to obtain a representative, and statistically valid
sample of the whole.

Sampling considerations
 Larger sample sizes are more accurate representations of the whole
 The sample size chosen is a balance between obtaining a statistically valid representation, and
the time, energy, money, labour, equipment and access available

 A sampling strategy made with the minimum of bias is the most statistically valid

 Most approaches assume that the parent population has a normal distribution where most items
or individuals clustered close to the mean, with few extremes

 A 95% probability or confidence level is usually assumed, for example 95% of items or individuals
will be within plus or minus two standard deviations from the mean

 This also means that up to five per cent may lie outside of this range - sampling, no matter how
good, can only ever be claimed to be a very close estimate
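The "plus or minus two standard deviations" rule of thumb is easy to check on a dataset. A small sketch with invented, roughly normal measurements:

```python
import statistics

# Hypothetical measurements from a roughly normal parent population.
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.0]

mean = statistics.mean(data)
sd = statistics.pstdev(data)        # population standard deviation

low, high = mean - 2 * sd, mean + 2 * sd
inside = sum(low <= x <= high for x in data)
print(f"{inside}/{len(data)} values lie within the mean plus or minus 2 SD")
```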

Sampling techniques

Three main types of sampling strategy:

 Random
 Systematic

 Stratified

Within these types, you may then decide on a point, line, or area method.

Random sampling
 Least biased of all sampling techniques, there is no subjectivity - each member of the total
population has an equal chance of being selected
 Can be obtained using random number tables

 Microsoft Excel has a function to produce random numbers

The function is simply:

 =RAND()

Type that into a cell and it will produce a random number in that cell. Copy the formula throughout a
selection of cells and it will produce random numbers.

You can modify the formula to obtain whatever range you wish, for example if you wanted random
numbers from one to 250, you could enter the following formula:

 =INT(250*RAND())+1

Where INT eliminates the digits after the decimal, 250* creates the range to be covered, and +1 sets the
lowest number in the range.

Paired numbers could also be obtained using:

 =INT(9000*RAND())+1000
These can then be used as grid coordinates, metre and centimetre sampling stations along a transect, or
in any feasible way.
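For readers working outside a spreadsheet, the same draws are one-liners in Python's standard library; the range 1 to 250 mirrors the example above, and the seed is only there to make the sketch reproducible:

```python
import random

rng = random.Random(0)   # seeded only so the sketch is reproducible

# Mirrors the spreadsheet formula =INT(250*RAND())+1 ...
excel_style = int(250 * rng.random()) + 1
# ... while randint is the idiomatic Python equivalent.
direct = rng.randint(1, 250)

print(excel_style, direct)
```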


A. Random point sampling

 A grid is drawn over a map of the study area

 Random number tables are used to obtain coordinates/grid references for the points

 Sampling takes place as feasibly close to these points as possible

B. Random line sampling

 Pairs of coordinates or grid references are obtained using random number tables, and marked on
a map of the study area
 These are joined to form lines to be sampled

C. Random area sampling

 Random number tables generate coordinates or grid references which are used to mark the
bottom left (south west) corner of quadrats or grid squares to be sampled

Figure one: A random number grid showing methods of generating random numbers, lines and areas.

Advantages and disadvantages of random sampling


Advantages:

 Can be used with large sample populations

 Avoids bias

Disadvantages:

 Can lead to poor representation of the overall parent population or area if large areas are not hit
by the random numbers generated. This is made worse if the study area is very large
 There may be practical constraints in terms of time available and access to certain parts of the
study area

Systematic sampling

Samples are chosen in a systematic, or regular way.

 They are evenly/regularly distributed in a spatial context, for example every two metres along a
transect line
 They can be at equal/regular intervals in a temporal context, for example every half hour or at set
times of the day

 They can be regularly numbered, for example every 10th house or person

A. Systematic point sampling

A grid can be used and the points can be at the intersections of the grid lines (A), or in the middle of each
grid square (B). Sampling is done at the nearest feasible place. Along a transect line, sampling points for
vegetation/pebble data collection could be identified systematically, for example every two metres or
every 10th pebble

B. Systematic line sampling

The eastings or northings of the grid on a map can be used to identify transect lines (C and D)
Alternatively, along a beach it could be decided that a transect up the beach will be conducted every 20
metres along the length of the beach

C. Systematic area sampling

A ‘pattern' of grid squares to be sampled can be identified using a map of the study area, for
example every second/third grid square down or across the area (E) - the south west corner will then
mark the corner of a quadrat. Patterns can be any shape or direction as long as they are regular (F)

Figure two: Systematic sampling grid showing methods of generating systematic points, lines and areas.

Advantages and disadvantages of systematic sampling

Advantages:

 It is more straightforward than random sampling
 A grid doesn't necessarily have to be used; sampling just has to be at uniform intervals

 A good coverage of the study area can be more easily achieved than using random sampling

Disadvantages:

 It is more biased, as not all members or points have an equal chance of being selected
 It may therefore lead to over- or under-representation of a particular pattern

Stratified sampling

This method is used when the parent population or sampling frame is made up of sub-sets of known size.
These sub-sets make up different proportions of the total, and therefore sampling should be stratified to
ensure that results are proportional and representative of the whole.

A. Stratified systematic sampling

The population can be divided into known groups, and each group sampled using a systematic approach.
The number sampled in each group should be in proportion to its known size in the parent population.

For example: the make-up of different social groups in the population of a town can be obtained, and then
the number of questionnaires carried out in different parts of the town can be stratified in line with this
information. A systematic approach can still be used by asking every fifth person.

B. Stratified random sampling

A wide range of data and fieldwork situations can lend themselves to this approach - wherever there are
two study areas being compared, for example two woodlands, river catchments, rock types or a
population with sub-sets of known size, for example woodland with distinctly different habitats.

Random point, line or area techniques can be used as long as the number of measurements taken is in
proportion to the size of the whole.

For example: if an area of woodland was the study site, there would likely be different types of habitat
(sub-sets) within it. Random sampling may altogether ‘miss' one or more of these.

Stratified sampling would take into account the proportional area of each habitat type within the woodland
and then each could be sampled accordingly; if 20 samples were to be taken in the woodland as a whole,
and it was found that a shrubby clearing accounted for 10% of the total area, two samples would need to
be taken within the clearing. The sample points could still be identified randomly (A) or systematically (B)
within each separate area of woodland.
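The allocation arithmetic in the woodland example generalises directly; the 10% clearing comes from the text, while the other habitat names and percentages are invented to fill out the whole:

```python
# Hypothetical habitat shares of the woodland's total area (percent).
habitats = {"mature oak": 55, "conifer plantation": 35, "shrubby clearing": 10}
total_samples = 20

# Allocate samples in proportion to each habitat's share of the area.
allocation = {name: round(total_samples * pct / 100)
              for name, pct in habitats.items()}
print(allocation)   # the 10% clearing gets 2 of the 20 samples
```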
Figure three: A diagram highlighting the benefits of using stratified random sampling and stratified
systematic sampling within certain fieldwork sites.

Advantages and disadvantages of stratified sampling


Advantages:

 It can be used with random or systematic sampling, and with point, line or area techniques
 If the proportions of the sub-sets are known, it can generate results which are more
representative of the whole population

 It is very flexible and applicable to many geographical enquiries

 Correlations and comparisons can be made between sub-sets

Disadvantages:

 The proportions of the sub-sets must be known and accurate if it is to work properly
 It can be hard to stratify questionnaire data collection; accurate, up-to-date population data may
not be available, and it may be hard to identify people's age or social background effectively

Reliability is the degree to which an assessment tool produces stable and
consistent results.

Types of Reliability
1. Test-retest reliability is a measure of reliability obtained by administering the same test
twice over a period of time to a group of individuals. The scores from Time 1 and Time 2
can then be correlated in order to evaluate the test for stability over time.

Example: A test designed to assess student learning in psychology could be given to a

group of students twice, with the second administration perhaps coming a week after the
first. The obtained correlation coefficient would indicate the stability of the scores.

2. Parallel forms reliability is a measure of reliability obtained by administering different

versions of an assessment tool (both versions must contain items that probe the same
construct, skill, knowledge base, etc.) to the same group of individuals. The scores from
the two versions can then be correlated in order to evaluate the consistency of results
across alternate versions.

Example: If you wanted to evaluate the reliability of a critical thinking assessment, you
might create a large set of items that all pertain to critical thinking and then randomly split
the questions up into two sets, which would represent the parallel forms.

3. Inter-rater reliability is a measure of reliability used to assess the degree to which

different judges or raters agree in their assessment decisions. Inter-rater reliability is
useful because human observers will not necessarily interpret answers the same way;
raters may disagree as to how well certain responses or material demonstrate
knowledge of the construct or skill being assessed.

Example: Inter-rater reliability might be employed when different judges are evaluating the
degree to which art portfolios meet certain standards. Inter-rater reliability is especially
useful when judgments can be considered relatively subjective. Thus, the use of this type of
reliability would probably be more likely when evaluating artwork as opposed to math exams.

4. Internal consistency reliability is a measure of reliability used to evaluate the degree

to which different test items that probe the same construct produce similar results.
A. Average inter-item correlation is a subtype of internal consistency reliability. It
is obtained by taking all of the items on a test that probe the same construct
(e.g., reading comprehension), determining the correlation coefficient for each
pair of items, and finally taking the average of all of these correlation
coefficients. This final step yields the average inter-item correlation.

B. Split-half reliability is another subtype of internal consistency reliability. The

process of obtaining split-half reliability is begun by “splitting in half” all items of a
test that are intended to probe the same area of knowledge (e.g., World War II) in
order to form two “sets” of items. The entire test is administered to a group of
individuals, the total score for each “set” is computed, and finally the split-half
reliability is obtained by determining the correlation between the two total “set”

Validity refers to how well a test measures what it is purported to measure.

Why is it necessary?

While reliability is necessary, it alone is not sufficient. For a test to be valid, it also
needs to be reliable. For example, if your scale is off by 5 lbs, it reads your weight every
day with an excess of 5 lbs. The scale is reliable because it consistently reports the
same weight every day, but it is not valid because it adds 5 lbs to your true weight. It is
not a valid measure of your weight.

Types of Validity
1. Face Validity ascertains that the measure appears to be assessing the intended
construct under study. The stakeholders can easily assess face validity. Although
this is not a very “scientific” type of validity, it may be an essential component in
enlisting motivation of stakeholders. If the stakeholders do not believe the measure
is an accurate assessment of the ability, they may become disengaged with the
measure.

Example: If a measure of art appreciation is created, all of the items should be related to the
different components and types of art. If the questions are regarding historical time periods,
with no reference to any artistic movement, stakeholders may not be motivated to give their best
effort or invest in this measure because they do not believe it is a true assessment of art
appreciation.

2. Construct Validity is used to ensure that the measure is actually measuring what it is
intended to measure (i.e. the construct), and not other variables. Using a panel of
“experts” familiar with the construct is a way in which this type of validity can be
assessed. The experts can examine the items and decide what that specific item is
intended to measure. Students can be involved in this process to obtain their feedback.

Example: A women’s studies program may design a cumulative assessment of learning

throughout the major. The questions are written with complicated wording and phrasing. This
can cause the test to inadvertently become a test of reading comprehension, rather than a test of
women’s studies. It is important that the measure is actually assessing the intended construct,
rather than an extraneous factor.

3. Criterion-Related Validity is used to predict future or current performance - it correlates test

results with another criterion of interest.

Example: Suppose a physics program designed a measure to assess cumulative student learning

throughout the major. The new measure could be correlated with a standardized measure of
ability in this discipline, such as an ETS field test or the GRE subject test. The higher the
correlation between the established measure and new measure, the more faith stakeholders
can have in the new assessment tool.
4. Formative Validity, when applied to outcomes assessment, is used to assess how
well a measure is able to provide information to help improve the program under study.

Example: When designing a rubric for history, one could assess students’ knowledge across the
discipline. If the measure can provide information that students are lacking knowledge in a
certain area, for instance the Civil Rights Movement, then that assessment tool is providing
meaningful information that can be used to improve the course or program requirements.

5. Sampling Validity (similar to content validity) ensures that the measure covers the broad
range of areas within the concept under study. Not everything can be covered, so items need to
be sampled from all of the domains. This may be done using a panel of “experts”
to ensure that the content area is adequately sampled. Additionally, a panel can help limit
“expert” bias (i.e., a test reflecting what an individual personally feels are the most important or
relevant areas).

Example: When designing an assessment of learning in the theatre department, it would not be
sufficient to only cover issues related to acting. Other areas of theatre, such as lighting, sound,
and the functions of stage managers, should all be included. The assessment should reflect the
content area in its entirety.

What are some ways to improve validity?

1. Make sure your goals and objectives are clearly defined and operationalized.
Expectations of students should be written down.
2. Match your assessment measure to your goals and objectives. Additionally, have the test
reviewed by faculty at other schools to obtain feedback from an outside party who is less
invested in the instrument.
3. Get students involved; have the students look over the assessment for troublesome
wording, or other difficulties.
4. If possible, compare your measure with other measures, or data that may be available.



The principles of validity and reliability are fundamental cornerstones of the scientific method.
Together, they are at the core of what is accepted as scientific proof, by scientist and
philosopher alike. By following a few basic principles, any experimental design will stand up to
rigorous questioning and skepticism.
What is Reliability?

The idea behind reliability is that any significant results must be more than a one-off finding and
be inherently repeatable.
Other researchers must be able to perform exactly the same experiment, under the same
conditions and generate the same results. This will reinforce the findings and ensure that the
wider scientific community will accept the hypothesis.
Without this replication of statistically significant results, the experiment and research have not
fulfilled all of the requirements of testability.
This prerequisite is essential to a hypothesis establishing itself as an accepted scientific truth.
For example, if you are performing a time critical experiment, you will be using some type of
stopwatch. Generally, it is reasonable to assume that the instruments are reliable and will keep
true and accurate time. However, diligent scientists take measurements many times, to minimize
the chances of malfunction and maintain validity and reliability.
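The "measure many times" advice can be made concrete with a few lines of Python. The readings below are invented; the point is simply that averaging repeated readings limits the damage any single misread can do:

```python
from statistics import mean, stdev

# Invented repeated stopwatch readings (seconds) of the same event.
readings = [12.31, 12.28, 12.35, 12.30, 12.29, 12.33]

# The mean is the best single estimate of the true time; a small
# standard deviation shows the instrument reads consistently (reliably).
estimate = mean(readings)
spread = stdev(readings)
```

If one reading were wildly off (a malfunction), a large spread would flag it immediately, which is exactly why diligent scientists repeat their measurements.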
At the other extreme, any experiment that uses human judgment is always going to come under
question.

For example, if observers rate certain aspects, like in Bandura’s Bobo Doll Experiment, then the
reliability of the test is compromised. Human judgment can vary wildly between observers, and
the same individual may rate things differently depending upon time of day and current mood.
This means that such experiments are more difficult to repeat and are inherently less reliable.
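The observer problem can also be quantified. A minimal sketch, using invented ratings from two hypothetical observers: exact agreement is only one of several possible inter-rater statistics, but it shows the idea:

```python
# Invented aggression ratings (1-5) from two observers watching the
# same ten children, as in a Bobo-Doll-style study.
rater_a = [4, 3, 5, 2, 4, 1, 3, 5, 2, 4]
rater_b = [4, 2, 5, 3, 4, 1, 2, 5, 2, 3]

# Exact-agreement rate: the fraction of children both observers scored
# identically. Low agreement signals an unreliable rating procedure.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
```

Here the two observers agree on only 6 of 10 children, the kind of result that would prompt clearer rating criteria or observer training before the experiment could be trusted.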
Reliability is a necessary ingredient for determining the overall validity of a scientific
experiment and enhancing the strength of the results.
Debate between social and pure scientists, concerning reliability, is robust and ongoing.

What is Validity?

Validity encompasses the entire experimental concept and establishes whether the results
obtained meet all of the requirements of the scientific research method.

For example, there must have been randomization of the sample groups and appropriate care and
diligence shown in the allocation of controls.
Internal validity dictates how an experimental design is structured and encompasses all of the
steps of the scientific research method.
Even if your results are great, sloppy and inconsistent design will compromise your integrity in
the eyes of the scientific community. Internal validity and reliability are at the core of any
experimental design.
External validity is the process of examining the results and questioning whether there are any
other possible causal relationships.
Control groups and randomization will lessen external validity problems, but no method can be
completely successful. This is why the statistical proofs of a hypothesis are called significant,
not absolute truth.
Any scientific research design only puts forward a possible cause for the studied effect.
There is always the chance that another unknown factor contributed to the results and findings.
This extraneous causal relationship may become more apparent as techniques are refined and
improved.

If you have constructed your experiment to contain validity and reliability then the scientific
community is more likely to accept your findings.
Eliminating other potential causal relationships, by using controls and duplicate samples, is the
best way to ensure that your results stand up to rigorous questioning.
Reliability & Validity
We often think of reliability and validity as separate ideas but, in fact, they're related to each
other. Here, I want to show you two ways you can think about their relationship.

One of my favorite metaphors for the relationship between reliability and validity is that of the
target. Think
of the center of the target as the concept that you are trying to measure. Imagine that for each
person you are measuring, you are taking a shot at the target. If you measure the concept
perfectly for a person, you are hitting the center of the target. If you don't, you are missing the
center. The more you are off for that person, the further you are from the center.

The figure above shows four possible situations. In the first one, you are hitting the target
consistently, but you are missing the center of the target. That is, you are consistently and
systematically measuring the wrong value for all respondents. This measure is reliable, but not
valid (that is, it's consistent but wrong). The second shows hits that are randomly spread across
the target. You seldom hit the center of the target but, on average, you are getting the right
answer for the group (but not very well for individuals). In this case, you get a valid group
estimate, but you are inconsistent. Here, you can clearly see that reliability is directly related to
the variability of your measure. The third scenario shows a case where your hits are spread
across the target and you are consistently missing the center. Your measure in this case is neither
reliable nor valid. Finally, we see the "Robin Hood" scenario -- you consistently hit the center of
the target. Your measure is both reliable and valid (I bet you never thought of Robin Hood in
those terms before).
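The four target scenarios can be simulated with invented "shots": the distance of the average shot from the center reflects a validity problem (bias), while the scatter of the shots reflects a reliability problem (spread). The numbers below are made up purely to illustrate the metaphor:

```python
from statistics import mean, stdev

TRUE_VALUE = 50  # the center of the target

def bias_and_spread(shots):
    """Bias (validity problem) and spread (reliability problem) of a set of shots."""
    return mean(shots) - TRUE_VALUE, stdev(shots)

# Invented measurements for the four scenarios in the target metaphor.
scenarios = {
    "reliable, not valid": [58, 59, 58, 60, 59],          # tight cluster, off center
    "valid on average, not reliable": [42, 61, 48, 55, 44],  # centered but scattered
    "neither reliable nor valid": [62, 40, 68, 43, 65],      # scattered and off center
    "reliable and valid": [50, 51, 49, 50, 50],              # tight cluster on center
}
```

Running `bias_and_spread` over each scenario reproduces the figure's logic: the "Robin Hood" case has both low bias and low spread, while the first case is consistent (low spread) but systematically wrong (large bias).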

Another way we can think about the relationship between reliability and validity is shown in the
figure below. Here, we set up a 2x2 table. The columns of the table indicate whether you are
trying to measure the same or different concepts. The rows show whether you are using the same
or different methods of measurement. Imagine that we have two concepts we would like to
measure, student verbal and math ability. Furthermore, imagine that we can measure each of
these in two ways. First, we can use a written, paper-and-pencil exam (very much like the SAT or
GRE exams). Second, we can ask the student's classroom teacher to give us a rating of the
student's ability based on their own classroom observation.
The first cell on the upper left shows the comparison of the verbal written test score with the
verbal written test score. But how can we compare the same measure with itself? We could do
this by estimating the reliability of the written test through a test-retest correlation, parallel
forms, or an internal consistency measure (See Types of Reliability). What we are estimating in
this cell is the reliability of the measure.

The cell on the lower left shows a comparison of the verbal written measure with the verbal
teacher observation rating. Because we are trying to measure the same concept, we are looking at
convergent validity (See Measurement Validity Types).

The cell on the upper right shows the comparison of the verbal written exam with the math
written exam. Here, we are comparing two different concepts (verbal versus math) and so we
would expect the relationship to be lower than a comparison of the same concept with itself (e.g.,
verbal versus verbal or math versus math). Thus, we are trying to discriminate between two
concepts and we would consider this discriminant validity.

Finally, we have the cell on the lower right. Here, we are comparing the verbal written exam with
the math teacher observation rating. Like the cell on the upper right, we are also trying to
compare two different concepts (verbal versus math) and so this is a discriminant validity
estimate. But here, we are also trying to compare two different methods of measurement (written
exam versus teacher observation rating). So, we'll call this very discriminant to indicate that we
would expect the relationship in this cell to be even lower than in the one above it.

The four cells incorporate the different values that we examine in the multitrait-multimethod
approach to estimating construct validity.

When we look at reliability and validity in this way, we see that, rather than being distinct, they
actually form a continuum. On one end is the situation where the concepts and methods of
measurement are the same (reliability) and on the other is the situation where concepts and
methods of measurement are different (very discriminant validity).
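This continuum can be sketched with invented correlation values for the four cells of the 2x2 table. The specific numbers are hypothetical; only the expected ordering matters:

```python
# Invented correlations between the verbal written exam and each of the
# four measures, one per cell of the multitrait-multimethod comparison.
cells = {
    "reliability (same concept, same method)": 0.90,
    "convergent validity (same concept, different method)": 0.60,
    "discriminant validity (different concept, same method)": 0.35,
    "'very discriminant' (different concept, different method)": 0.20,
}

# The continuum: each step away from "same concept, same method"
# should produce a weaker correlation.
values = list(cells.values())
ordered = all(a > b for a, b in zip(values, values[1:]))
```

If real data violated this ordering, say, the verbal exam correlated more strongly with the math exam than with the teacher's verbal rating, that would be evidence against the construct validity of the measures.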