
Reliability & Validity
DR. NITU SINGH SISODIA
Reliability & Validity
• Reliability and validity are concepts used to evaluate the quality of
research. They indicate how well a method, technique, or test
measures something. Reliability is about the consistency of a
measure, and validity is about the accuracy of a measure.
• It’s important to consider reliability and validity when you are creating
your research design, planning your methods, and writing up your
results, especially in quantitative research. Failing to do so can lead to
several types of research bias and seriously affect your work.
What is reliability?
• Reliability and validity are closely related, but they mean different
things. A measurement can be reliable without being valid. However, if
a measurement is valid, it is usually also reliable.
• Reliability refers to how consistently a method measures something. If
the same result can be consistently achieved by using the same
methods under the same circumstances, the measurement is
considered reliable.
• You measure the temperature of a liquid sample several times under
identical conditions. The thermometer displays the same temperature
every time, so the results are reliable.
What is validity?
• Validity refers to how accurately a method measures what it is intended
to measure. If research has high validity, that means it produces results
that correspond to real properties, characteristics, and variations in the
physical or social world.
• High reliability is one indicator that a measurement is valid. If a method
is not reliable, it probably isn’t valid.
• If the thermometer shows different temperatures each time, even
though you have carefully controlled conditions to ensure the sample’s
temperature stays the same, the thermometer is probably
malfunctioning, and therefore its measurements are not valid.
What’s the difference between reliability and
validity?
• Reliability and validity are both about how well a method measures
something:
• Reliability refers to the consistency of a measure (whether the results
can be reproduced under the same conditions).
• Validity refers to the accuracy of a measure (whether the results really
do represent what they are supposed to measure).
Types of validity
The validity of a measurement can be estimated based on three main types of
evidence. Each type can be evaluated through expert judgement or statistical
methods.
Construct validity
• What does it assess? The adherence of a measure to existing theory and knowledge of the concept being measured.
• Example: A self-esteem questionnaire could be assessed by measuring other traits known or assumed to be related to the concept of self-esteem (such as social skills and optimism). Strong correlation between the scores for self-esteem and associated traits would indicate high construct validity.
Content validity
• What does it assess? The extent to which the measurement covers all aspects of the concept being measured.
• Example: A test that aims to measure a class of students’ level of Spanish contains reading, writing and speaking components, but no listening component. Experts agree that listening comprehension is an essential aspect of language ability, so the test lacks content validity for measuring the overall level of ability in Spanish.
Criterion validity
• What does it assess? The extent to which the result of a measure corresponds to other valid measures of the same concept.
• Example: A survey is conducted to measure the political opinions of voters in a region. If the results accurately predict the later outcome of an election in that region, this indicates that the survey has high criterion validity.
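The construct validity example above amounts to checking a correlation. Here is a minimal Python sketch (not part of the original slides) with invented self-esteem and optimism scores:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical questionnaire totals for ten participants:
# a self-esteem scale and a related trait (optimism).
self_esteem = np.array([32, 28, 40, 22, 35, 38, 25, 30, 36, 27])
optimism = np.array([30, 27, 38, 24, 33, 39, 23, 31, 35, 26])

# A strong positive correlation with a trait known to be related
# to self-esteem is one piece of construct validity evidence.
r, p = pearsonr(self_esteem, optimism)
print(f"Correlation with related trait: r = {r:.2f} (p = {p:.3f})")
```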
Validity:

Validity is the extent to which an instrument measures what it is supposed to measure and performs as
it is designed to perform.
• Does the measure employed really measure the theoretical concept (variable)?
• It is rare, if not impossible, for an instrument to be 100% valid, so validity is generally measured in degrees.
• As a process, validation involves collecting and analyzing data to assess the
accuracy of an instrument. There are numerous statistical tests and measures for assessing the validity of
quantitative instruments, which generally involve pilot testing. The remainder of this discussion
focuses on external validity and content validity.

• External validity is the extent to which the results of a study can be generalized from a sample to a
population. Establishing external validity for an instrument, then, follows directly from sampling. Recall
that a sample should be an accurate representation of a population, because the total population may
not be available. An instrument that is externally valid helps obtain population generalizability, or the
degree to which a sample represents the population.

• Content validity refers to the appropriateness of the content of an instrument. In other words, do the
measures (questions, observation logs, etc.) accurately assess what you want to know? This is
particularly important with achievement tests. This would involve taking representative questions
from each of the sections of the unit and evaluating them against the desired outcomes.
Reliability:
a. Will the measure employed repeatedly on the same individuals yield similar results? (stability)
b. Will the measure employed by different investigators yield similar results? (equivalence)
c. Will a set of different operational definitions of the same concept employed on the same
individuals, using the same data-collecting technique, yield highly correlated results? Or, will
all items of the measure be internally consistent? (homogeneity)
Reliability can be thought of as consistency. Does the instrument consistently measure what it
is intended to measure? It is not possible to calculate reliability exactly; however, there are
four general estimators that you may encounter in reading research:
1. Inter-Rater/Observer Reliability: The degree to which different raters/observers give
consistent answers or estimates.
2. Test-Retest Reliability: The consistency of a measure evaluated over time.
3. Parallel-Forms Reliability: The reliability of two tests constructed the same way, from the
same content.
4. Internal Consistency Reliability: The consistency of results across items, often measured
with Cronbach’s Alpha.
RELIABILITY
• Reliability refers to the consistency of a measure. Psychologists consider three types of consistency: over time
(test-retest reliability), across items (internal consistency), and across different researchers (inter-rater
reliability).
• Test-Retest Reliability
• When researchers measure a construct that they assume to be consistent across time, then the scores they
obtain should also be consistent across time. Test-retest reliability is the extent to which this is actually the
case.

• For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent
today will be highly intelligent next week. This means that any good measure of intelligence should produce
roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly
inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent.
• Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on
the same group of people at a later time, and then looking at test-retest correlation between the two sets of
scores. This is typically done by graphing the data in a scatterplot and computing the correlation coefficient.
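As a minimal sketch (not from the original slides), the test-retest correlation can be computed in a few lines of Python; the scores below are invented for two administrations of the same measure to the same ten people.

```python
import numpy as np

# Hypothetical scores for the same ten people, measured at
# time 1 and again at time 2 with the same instrument.
time1 = np.array([102, 98, 115, 90, 108, 121, 95, 100, 110, 105])
time2 = np.array([100, 99, 117, 92, 105, 120, 97, 101, 112, 103])

# Test-retest reliability is estimated as the Pearson
# correlation between the two sets of scores.
r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest correlation: r = {r:.2f}")
```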
Internal Consistency
• Another kind of reliability is internal consistency, which is the consistency of
people’s responses across the items on a multiple-item measure. In general,
all the items on such measures are supposed to reflect the same underlying
construct, so people’s scores on those items should be correlated with each
other.
• On the Rosenberg Self-Esteem Scale, people who agree that they are a
person of worth should tend to agree that they have a number of good
qualities. If people’s responses to the different items are not correlated with
each other, then it would no longer make sense to claim that they are all
measuring the same underlying construct.
• For example, people might make a series of bets in a simulated game of
roulette as a measure of their level of risk seeking. This measure would be
internally consistent to the extent that individual participants’ bets were
consistently high or low across trials.
• Like test-retest reliability, internal consistency can only be assessed by collecting
and analyzing data. One approach is to look at a split-half correlation.
• This involves splitting the items into two sets, such as the first and second halves of
the items or the even- and odd-numbered items. Then a score is computed for
each set of items, and the relationship between the two sets of scores is examined.
• For example, a split-half correlation of +.88 would indicate good internal
consistency, since a split-half correlation of +.80 or greater is generally considered good.
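A minimal Python sketch of the even/odd split (not part of the original slides; the 6 × 10 response matrix is invented for illustration):

```python
import numpy as np

# Hypothetical responses of six participants to a ten-item
# scale (rows = participants, columns = items, scored 1-5).
items = np.array([
    [5, 4, 5, 5, 4, 5, 4, 5, 5, 4],  # consistently high responder
    [4, 4, 3, 4, 4, 3, 4, 4, 3, 4],
    [3, 3, 3, 2, 3, 3, 2, 3, 3, 3],
    [2, 2, 1, 2, 2, 2, 1, 2, 2, 2],
    [4, 5, 4, 4, 5, 4, 5, 4, 4, 5],
    [1, 2, 2, 1, 1, 2, 2, 1, 2, 1],  # consistently low responder
])

# Score the odd- and even-numbered items separately...
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

# ...and correlate the two half-scores.
r = np.corrcoef(odd_half, even_half)[0, 1]
print(f"Split-half correlation: r = {r:.2f}")
```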
• Perhaps the most common measure of internal consistency used by researchers in
psychology is a statistic called Cronbach’s α (the Greek letter alpha).
• Conceptually, α is the mean of all possible split-half correlations for a set of items.
For example, there are 252 ways to split a set of 10 items into two sets of five.
Cronbach’s α would be the mean of the 252 split-half correlations. Note that this is
not how α is actually computed, but it is a correct way of interpreting the meaning
of this statistic. Again, a value of +.80 or greater is generally taken to indicate good
internal consistency.
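Although α is interpreted as the mean of all possible split-half correlations, in practice it is computed from item variances. A short Python sketch of that standard formula, reusing the invented response matrix from the split-half example:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n participants x k items) matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 6 x 10 response matrix (same as the split-half example).
items = np.array([
    [5, 4, 5, 5, 4, 5, 4, 5, 5, 4],
    [4, 4, 3, 4, 4, 3, 4, 4, 3, 4],
    [3, 3, 3, 2, 3, 3, 2, 3, 3, 3],
    [2, 2, 1, 2, 2, 2, 1, 2, 2, 2],
    [4, 5, 4, 4, 5, 4, 5, 4, 4, 5],
    [1, 2, 2, 1, 1, 2, 2, 1, 2, 1],
])
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```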
Inter-Rater Reliability

• Many behavioral measures involve significant judgment on the part of an
observer or a rater. Inter-rater reliability is the extent to which different
observers are consistent in their judgments.
• For example, if you were interested in measuring university students’
social skills, you could make video recordings of them as they interacted
with another student whom they were meeting for the first time.
• Then you could have two or more observers watch the videos and rate
each student’s level of social skills. To the extent that each participant
does, in fact, have some level of social skills that can be detected by an
attentive observer, different observers’ ratings should be highly
correlated with each other.
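For quantitative ratings like these, inter-rater reliability is often assessed as a correlation between the raters' scores (for categorical judgments, a statistic such as Cohen's kappa is more common). A minimal sketch with invented ratings:

```python
import numpy as np

# Hypothetical social-skills ratings (1-10) given by two
# observers to the same eight students.
rater_a = np.array([7, 5, 8, 3, 6, 9, 4, 7])
rater_b = np.array([6, 5, 9, 4, 6, 8, 4, 7])

# Consistent judgments show up as a high correlation
# between the two observers' ratings.
r = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"Inter-rater correlation: r = {r:.2f}")
```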
THANK YOU
