Validity and Reliability in Research
VALIDITY

✓ Validity implies the extent to which the research instrument measures what it is intended to measure.
✓ It refers to the ability of the instrument/test to measure what it is supposed to measure.
✓ Validity looks at accuracy. Without validity, research goes in the wrong direction. Generally, validity is considered much more important than reliability.
✓ Validity is the accuracy of a measure, or the extent to which a score truthfully represents a concept.

The question of validity is raised in the context of three points:
1. The form of the test
2. The purpose of the test
3. The population for whom it is intended

CONSTRUCT
✓ A construct refers to a concept or characteristic that can't be directly observed but can be measured by observing other indicators that are associated with it.
✓ Constructs can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or depression.
✓ They can also be broader concepts applied to organizations or social groups, such as gender equality, corporate social responsibility, or freedom of speech.
✓ Example: There is no objective, observable entity called "depression" that we can measure directly. But based on existing psychological research and theory, we can measure depression based on a collection of symptoms and indicators, such as low self-confidence and low energy levels.
INTERNAL vs EXTERNAL VALIDITY

Meaning
- Internal validity is the extent to which the experiment is free from errors and any difference in measurement is due to the independent variable and nothing else.
- External validity is the extent to which the research results can be generalized to the world at large.

Concerned with
- Internal validity: control.
- External validity: naturalness.

What is it?
- Internal validity is a measure of the accuracy of the experiment.
- External validity checks whether the causal relationship discovered in the experiment can be generalized or not.

Identifies
- Internal validity: how strong the research methods are.
- External validity: whether the outcome of the research can be applied to the real world.

Describes
- Internal validity: the degree to which the conclusion is warranted.
- External validity: the degree to which the study is warranted to generalize the result to other contexts.

Used to
- Internal validity: address or eliminate alternative explanations for the result.
- External validity: generalize the outcome.

KINDS OF VALIDITY
Construct Validity (Measurement)
- Translation: Face, Content
- Criterion: Predictive, Concurrent, Convergent, Discriminant

Inference Validity (Studies)
- Internal
- External: Population, Ecological

The inference validity of a research design is the validity of the entirety (completeness) of the research. It indicates whether one can trust the conclusions or not. Inference validity is further divided into two sub-sections: internal validity and external validity.
Internal Validity

✓ Internal validity checks the consistency of the conclusions claimed, especially those related to causality (cause and effect), with the results and design of the research. It tells how well a study is conducted. The internal validity of any research has three conditions:

1. The independent and dependent variables in the study should change together.
2. The independent variable should precede the dependent variable in the study.
3. No other extraneous factors should explain the result of the study.
External Validity

✓ External validity is based on the ability to generalize results from the sample to a larger group of people.
✓ When we say that results are generalizable, we mean that, when research is conducted, other researchers will assume that their results would be the same if they use the same methods.
Types of External Validity

Population Validity
✓ Refers to the ability to generalize results from the sample to a larger population.

Ecological Validity
✓ Refers to the extent to which the findings of a research study can be generalized to realistic, real-life settings.
✓ Is often applied in experimental studies of human behavior and cognition, such as in psychology and related fields.
✓ If a test has high ecological validity, it can be generalized to other real-life situations, while tests with low ecological validity cannot.

TRANSLATION VALIDITY
✓ Translation validity refers to a subjective evaluation that examines whether the selected measures of the study align with the overall desired aim of the study. It is further divided into two types: face validity and content validity.
Content Validity

✓ Evaluates how well an instrument (like a test) covers all relevant parts of the construct it aims to measure. Here, a construct is a theoretical concept, theme, or idea, in particular one that cannot usually be measured directly.
✓ Example (content validity in exams): A written exam tests whether individuals have enough theoretical knowledge to acquire a driver's license. The exam would have high content validity if the questions asked cover every possible topic in the course related to traffic rules. At the same time, it should also exclude all other questions that aren't relevant for the driver's license.

Face Validity

✓ Considers whether a test appears, on the surface, to measure what it's supposed to measure. This type of validity is concerned with whether a measure seems relevant and appropriate for what it's assessing.
✓ To have face validity, your measure should be:
1. Clearly relevant for what it's measuring
2. Appropriate for the participants
3. Adequate for its purpose

CRITERION VALIDITY
✓ Criterion validity evaluates how accurately a test measures the outcome it was designed to measure. An outcome can be a disease, behavior, or performance. Concurrent validity measures tests and criterion variables in the present, while predictive validity measures those in the future.
✓ To establish criterion validity, you need to compare your test results to criterion variables. Criterion variables are often referred to as a "gold standard" measurement. They comprise other tests that are widely accepted as valid measures of a construct.
✓ Example (criterion validity): A researcher wants to know whether a college entrance exam is able to predict future academic performance. First-semester GPA can serve as the criterion variable, as it is an accepted measure of academic performance.

KINDS OF CRITERION VALIDITY
Predictive Validity
✓ Refers to the ability of a test (or other measurement) to predict a future outcome. Here, an outcome can be a behavior, performance, or even disease that occurs at some point in the future.
✓ Example: A test has predictive validity when it can accurately identify the applicants who will perform well in the future.

Concurrent Validity
✓ Also called criterion-related concurrent validity.
✓ Concurrent validity means that your test measures the same thing as another test in the same area that has already been proven to be valid.
✓ This type of validity is established by giving out the two tests, yours and the one already validated, and correlating the scores.

Convergent Validity
✓ Measuring the same concept with very different methods. If different methods yield the same result, then convergent validity is supported.
✓ Example: Different survey items used to measure decision-making style, closed and open-ended.

Discriminant Validity
✓ Discriminant validity shows you that two tests that are not supposed to be related are, in fact, unrelated.
TYPES OF VALIDITY: SUMMARY

✓ Content Validity: Does the measure adequately measure the concept?
✓ Face Validity: Do "experts" validate that the instrument measures what its name suggests it measures?
✓ Criterion Validity: Does the measure differentiate in a manner that helps to predict a criterion variable?
✓ Predictive: Does the measure differentiate individuals in a manner that helps to predict a future criterion?
✓ Concurrent: Does the measure differentiate in a manner that helps to predict a criterion variable concurrently?
✓ Construct Validity: Does the instrument tap the concept as theorized?
✓ Convergent: Do two instruments measuring the concept correlate highly?
✓ Discriminant: Does the measure have a low correlation with a variable that is supposed to be unrelated to this variable?

RELIABILITY

Reliability refers to how consistently a method measures something.
If the same result can be consistently achieved by using the same methods under the
same circumstances, the measurement is considered reliable.
You measure the temperature of a liquid sample several times under
identical conditions. The thermometer displays the same
temperature every time, so the results are reliable.
A doctor uses a symptom questionnaire to diagnose a patient with a long-term medical condition.
Several different doctors use the same questionnaire with the same patient but give different
diagnoses. This indicates that the questionnaire has low reliability as a measure of the condition.
✓ Reliability indicates the degree to which a person's test scores are stable, or reproducible, and free from measurement error.
✓ If test scores are not reliable, they cannot be valid, since they will not provide a good estimate of the ability or trait that the test intends to measure. Reliability is therefore a necessary but not sufficient condition for validity.
✓ There cannot be validity without reliability. There can be reliability without validity. A valid instrument is always reliable. A reliable instrument need not be a valid instrument.
✓ Reliability on its own is not enough to ensure validity. Even if a measure is reliable, it may not accurately reflect the real situation.
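The thermometer example that follows can be expressed numerically: readings that agree with each other are reliable, but a constant offset from the true value makes them invalid. The numbers and the validity tolerance are invented for illustration:

```python
# Hypothetical repeated readings from a miscalibrated thermometer.
readings = [21.0, 21.0, 21.0, 21.0]
true_value = 25.0  # the actual temperature of the sample

spread = max(readings) - min(readings)             # consistency across repeats
bias = sum(readings) / len(readings) - true_value  # systematic error

reliable = spread == 0   # identical result every time -> reliable
valid = abs(bias) < 0.5  # close to the true value (tolerance assumed)

print(f"spread = {spread}, bias = {bias}")
print(f"reliable = {reliable}, valid = {valid}")
```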
✓ Example: The thermometer that you used to test the sample gives reliable results. However, the thermometer has not been calibrated properly, so the result is 4 degrees lower than the true value. Therefore, the measurement is not valid.

RELIABILITY: INTERNAL CONSISTENCY
✓ Internal consistency assesses the correlation between multiple items in a test that are intended to measure the same construct.
✓ You can calculate internal consistency without repeating the test or involving other researchers, so it's a good way of assessing reliability when you only have one data set.
✓ Why it's important: When you devise a set of questions or ratings that will be combined into an overall score, you have to make sure that all of the items really do reflect the same thing. If responses to different items contradict one another, the test might be unreliable.
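Internal consistency is commonly quantified with Cronbach's alpha, which compares the summed item variances to the variance of the total score. A minimal sketch with invented 5-point ratings (one list per item, one score per respondent):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists."""
    k = len(items)    # number of items
    n = len(items[0])  # number of respondents
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Invented ratings: three items answered by five respondents.
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 4, 2, 4, 3],
]
alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values above roughly 0.7 are conventionally read as acceptable internal consistency, though the exact threshold depends on the field and the stakes of the test.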
KINDS OF RELIABILITY (ACCURACY IN MEASUREMENT)

Stability: Test-Retest Reliability, Parallel-Form Reliability
Consistency: Inter-Item Consistency Reliability, Split-Half Reliability, Inter-Rater Reliability

✓ Test-Retest Reliability: the reliability coefficient obtained with repetition of an identical measure on a second occasion.
✓ Parallel-Form Reliability: the reliability coefficient obtained by two comparable sets of measures.
✓ Inter-Item Consistency Reliability: a test of the consistency of respondents' responses to all the items in a measure.
✓ Inter-Rater Reliability: the consistency of the judgement of several raters on how they see a phenomenon or interpret some responses.
✓ Split-Half Reliability: reflects the correlation between two halves of an instrument.

Test-Retest Reliability
✓ The consistency of a measure across time: do you get the same results when you repeat the measurement?
✓ A group of participants complete a questionnaire designed to measure personality traits.
✓ If they repeat the questionnaire days, weeks or months apart and give the same answers, this indicates high test-retest reliability.
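Test-retest reliability is usually estimated by correlating the scores from the two administrations; a coefficient near 1 indicates a stable measure. A sketch with invented questionnaire totals and a hand-rolled Pearson correlation:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented trait scores for six participants, measured twice, weeks apart.
time1 = [34, 28, 41, 25, 37, 30]
time2 = [35, 27, 40, 26, 36, 31]

r = pearson_r(time1, time2)
print(f"test-retest reliability: r = {r:.2f}")
```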
TYPES OF RELIABILITY
Parallel-Form Reliability

✓ Parallel-forms reliability relates to a measure that is obtained by conducting assessment of the same phenomena with the participation of the same sample group via more than one assessment method.
✓ Example: The levels of employee satisfaction of ABC Company may be assessed with questionnaires, in-depth interviews and focus groups, and the results can be compared.
Inter-Rater Reliability

✓ The consistency of a measure across raters or observers: do you get the same results when different people conduct the same measurement?
✓ Based on an assessment criteria checklist, five examiners submit substantially different results for the same student project. This indicates that the assessment checklist has low inter-rater reliability (for example, because the criteria are too subjective).
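Inter-rater agreement can be quantified with simple percent agreement or, better, with Cohen's kappa, which corrects for agreement expected by chance. A sketch for two raters (kappa is defined pairwise; the five examiners above would need a multi-rater statistic such as Fleiss' kappa) with invented grades:

```python
def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters categorizing the same items."""
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of each rater's marginal proportions.
    expected = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Invented grades from two examiners for six student projects.
rater1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohen_kappa(rater1, rater2)
print(f"Cohen's kappa = {kappa:.2f}")
```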
Internal Consistency

✓ The consistency of the measurement itself: do you get the same results from different parts of a test that are designed to measure the same thing?
✓ You design a questionnaire to measure self-esteem. If you randomly split the results into two halves, there should be a strong correlation between the two sets of results. If the two results are very different, this indicates low internal consistency.

Split-Half Reliability
✓ Split-half reliability, another type of internal consistency reliability, involves splitting all the items of a test in half.
✓ Split-half refers to determining the correlation between the first half of the measurement and the second half of the measurement.
VARIOUS TYPES OF RELIABILITY

What each type measures the consistency of:
✓ Test-Retest Reliability: the same test over time (same people, different times).
✓ Inter-Rater Reliability: the same test conducted by different people (different people, same test).
✓ Parallel Forms Reliability: different versions of a test which are designed to be equivalent (different people, same time, different test).
✓ Internal Consistency: the individual items of a test (different questions, same construct).
Test-Retest
- What it is: a measure of stability.
- How you do it: administer the same test/measure at two different times to the same group of participants.
- Methodology: measuring a property that you expect to stay the same over time.

Parallel Forms
- What it is: a measure of equivalence.
- How you do it: administer two different forms of the same test to the same group of participants.
- Methodology: using two different tests to measure the same thing.

Inter-Rater
- What it is: a measure of agreement.
- How you do it: have two raters rate behaviours and then determine the amount of agreement between them.
- Methodology: multiple researchers making observations or ratings about the same topic.

Internal Consistency
- What it is: a measure of how consistently each item measures the same underlying concept.
- How you do it: correlate performance on each item with overall performance across participants.
- Methodology: using a multi-item test where all the items are intended to measure the same variable.
VALIDITY vs RELIABILITY

Meaning
- Validity implies the extent to which the research instrument measures what it is intended to measure.
- Reliability refers to the degree to which an assessment tool produces consistent results when repeated measurements are made.

Main Focus
- Validity mainly focuses on the outcome.
- Reliability mainly focuses on maintaining consistent results.

Influencing Factors
- Validity: process, purpose, theory matters, logical implications, etc.
- Reliability: test length, test score variability, heterogeneity, etc.

Result Derivation
- Validity requires more research and is more difficult to attain.
- Reliability is comparatively simpler and produces quicker results.

Requirement
- Validity is not a requirement for reliability.
- Reliability is a requirement for validity.

Tools
- Validity considers "precision".
- Reliability considers "consistency and repeatability".

Utility
- The test is completely useless if the findings are "invalid".
- The test is not very useful if the results cannot be reproduced.

RELATIONSHIP BETWEEN VALIDITY AND RELIABILITY
✓ A valid test should be reliable.
✓ A reliable test may not be valid.
✓ Validity is more important than reliability (according to the courts).
✓ To be useful, an instrument (test, scale) must be both reasonably reliable and valid.
✓ Aim for validity first, and then try to make the test more reliable little by little.
✓ With this in mind, it can be helpful to conceptualize the following four basic scenarios for the relation between reliability and validity:

1. Reliable (consistent) and valid (measures what it's meant to measure, i.e., a stable construct).
2. Reliable (consistent) but not valid (measures something consistently, but it doesn't measure what it's meant to measure).
3. Unreliable (not consistent) and not valid (an inconsistent measure which doesn't measure what it's meant to measure).
4. Unreliable (not consistent) but valid (measures what it's meant to measure, i.e., an unstable construct).

VARIOUS TYPES OF MEASUREMENT SCALES
Characteristics
- Nominal: description.
- Ordinal: order.
- Interval: distance.
- Ratio: description, order, distance and origin.

Sequential arrangement
- Not applicable to the nominal scale; applicable to the ordinal, interval and ratio scales.

Fixed zero point
- Applicable only to the ratio scale.

Multiplication and division
- Applicable only to the ratio scale.

Addition and subtraction
- Applicable to the interval and ratio scales.

Difference between variables
- Non-measurable on the nominal and ordinal scales; measurable on the interval and ratio scales.

Mean
- Applicable to the interval and ratio scales.
Nominal
- Description: data consists of names or categories; no ordering scheme is possible.
- Example: the number assigned to a runner in a race; a Social Security Number.
- Type of data: discrete.
- Mathematical operations: counting and percentage calculation.

Ordinal (Ranking)
- Description: data is arranged in some order, but differences between values cannot be determined or are meaningless.
- Example: the rank order of runners in a race.
- Type of data: discrete.
- Mathematical operations: counting and percentage calculation.

Interval
- Description: data is arranged in order and differences can be found; however, there is no inherent starting point, and ratios are meaningless.
- Example: the temperatures of three metal parts were 200 °F, 300 °F and 600 °F; note that three times 200 °F is not the same as 600 °F.
- Type of data: continuous.
- Mathematical operations: addition and subtraction.

Ratio
- Description: an extension of the interval level that includes an inherent zero starting point; both differences and ratios are meaningful.
- Example: Product A costs 300 and Product B costs 600; note that 600 is twice as much as 300.
- Type of data: continuous.
- Mathematical operations: addition, subtraction, multiplication, division and all statistical techniques.
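The interval-vs-ratio distinction from the table can be checked numerically. Fahrenheit and Celsius are interval scales with arbitrary zero points, so the ratio of two readings changes with the unit, while differences survive the conversion; Kelvin, with a true zero, is a ratio scale. A sketch:

```python
def f_to_c(f):
    """Convert Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9

def f_to_k(f):
    """Convert Fahrenheit to Kelvin (a true ratio scale)."""
    return f_to_c(f) + 273.15

f_low, f_high = 200.0, 600.0

# On an interval scale, ratios depend on the arbitrary zero point:
ratio_f = f_high / f_low                  # 3.0 in Fahrenheit...
ratio_c = f_to_c(f_high) / f_to_c(f_low)  # ...but not 3.0 in Celsius

# Differences, in contrast, are preserved up to the unit conversion:
diff_c = f_to_c(f_high) - f_to_c(f_low)

# Only on a ratio scale (Kelvin) is "x times as hot" meaningful:
ratio_k = f_to_k(f_high) / f_to_k(f_low)

print(ratio_f, round(ratio_c, 2), round(ratio_k, 2))
```

This is why the table marks multiplication and division as applicable only to the ratio scale.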