
Validity and Reliability of Research Instruments, and How to Minimize Bias

Hardian
Biomedic and Clinical Epidemiology Unit
MFDU
In a medical or epidemiological study, the major
consideration is to obtain:
Valid measurement
Reliable measurement
on the exposure factors and
outcomes of interest in the study population

WITHOUT BIAS and ERRORS, or
with both minimized as far as possible
To achieve a high-quality study
Ensure right answers to the study questions
Good study design
Valid and reliable measurements
Control for any possible bias
Good cooperation between
* research group and
* study population
Instrument or Research Tool

• Equipment (hardware)
– A red blood cell counter
– A pH meter
– An electronic weighing machine
• Paperware
– A questionnaire
– A weekly diet diary
• Peopleware
– Observers/investigators
– Technicians
How good is the instrument or tool?

[Diagram: instrument/tool → measurement → true value (truth)]
• A good measurement is:
– valid/accurate
– precise/reliable
– without bias or error (or with bias minimized)
Some definitions…
• Validity

“The soundness or appropriateness of a test or instrument in measuring what it is designed to measure”

Are we testing what we think we’re testing?


Some definitions…

• Reliability

“…the degree to which a test or measure produces the same scores when applied in the same circumstances…”
Types of Experimental Validity

• Internal
– Is the experimenter measuring the effect of the independent variable on the dependent variable?

• External
– Can the results be generalised to the wider population?
Validity
• Logical: Face, Content
• Statistical (AKA Criterion): Concurrent, Predictive
• Construct (both logical and statistical)

Reliability
• Consistency
• Objectivity



Logical Validity

• Face Validity
– Infers that a test is valid by definition
– It is clear that the test measures what it is supposed to

e.g.
If you want to assess reaction
time, measuring how long it
takes an individual to react to a
given stimulus would have face
validity
Externally
Valid?
Logical Validity

• Face Validity
– Infers that a test is valid by definition
– It is clear that the test measures what it is supposed to

i.e.
Would assessing 15 m sprint time be a valid means of assessing reaction time?
Assessing face validity is therefore a subjective process.
Logical Validity
• Content Validity
– Infers that the test measures all aspects contributing to the
variable of interest

e.g.
Who is the most physically fit?
– VO2 max test?
– Wingate test?
– 1 RM?
…also a subjective process.
Statistical Validity
• Concurrent Validity
– Infers that the test produces similar results to a
previously validated test

e.g.
VO2 max: Incremental Treadmill Protocol with expired gas analysis
versus Multi-Stage Fitness (Beep) Test
Statistical Validity
• Predictive Validity
– Infers that the test provides a valid reflection of future
performance using a similar test

e.g.
Can performance during test A be used to predict future performance in test B?
Logical/Statistical Validity

• Construct Validity
– Infers not only that the test is measuring what it
is supposed to, but also that it is capable of
detecting what should exist, theoretically
– Therefore relates to hypothetical or intangible
constructs

e.g.
Team Rivalry

Sportsmanship.
Threats to Validity
(and possible solutions?)
Threats to Internal Validity

• Maturation
– Changes in the DV over time irrespective of the IV
Threats to Internal Validity

• Maturation
e.g. One Group Pre-test Post-test

O1 T O2
Threats to Internal Validity
• Maturation (possible solution)
Pre-test Post-test Randomised Group Comparison (RCT)
R   O1  T  O2
R   O3  P  O4
Threats to Internal Validity

• History
– Unplanned events between measurements
Threats to Internal Validity

• History

O1 T O2

e.g. exercise?

Therefore, solution = control extraneous variables!
Threats to Internal/External
Validity

• Pre-testing
– Interactive effects due to the pre-test (e.g. learning,
sensitisation, etc.)
– Also influences External Validity
Threats to Internal/External Validity
• Pre-testing
e.g.
R   O1  T  O2
R   O3  P  O4
Assessing muscle mass at pre-test could make them train harder in both trials…
…but then respond better to the T than the P…
…so it is actually T+O1 that is better than P, not T alone.
Threats to Internal/External Validity

• Pre-testing (possible solution)


Solomon Four-Group Design
R   O1  T  O2
R   O3  P  O4
R       T  O5
R       P  O6
Threats to Internal Validity

• Statistical Regression
– AKA regression to the mean
– An initial extreme score is likely to be
followed by less extreme subsequent scores

e.g.
Training has the greatest effect on untrained individuals.

Therefore, solution = effective sampling.
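Regression to the mean can be illustrated with a small simulation. This is a minimal sketch under an assumed model (observed score = stable true ability + random noise); the numbers and variable names are hypothetical and not taken from any study.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

N = 1000
# Assumed model: observed score = stable true ability + random noise.
true_ability = [random.gauss(50, 5) for _ in range(N)]
test_1 = [t + random.gauss(0, 5) for t in true_ability]
test_2 = [t + random.gauss(0, 5) for t in true_ability]

# Select the 100 most extreme (highest) scorers on the first test.
top = sorted(range(N), key=lambda i: test_1[i], reverse=True)[:100]

mean_first = sum(test_1[i] for i in top) / len(top)
mean_second = sum(test_2[i] for i in top) / len(top)
group_mean = sum(test_1) / N

print(f"Whole group, test 1: {group_mean:.1f}")
print(f"Top 100, test 1:     {mean_first:.1f}")
print(f"Top 100, test 2:     {mean_second:.1f}")
```

With no treatment at all, the extreme group's second score drifts back towards the group mean, which is why a group selected for extreme baseline scores can appear to change even when the IV does nothing.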


Threats to Internal Validity
• Instrumentation
– A difference in the way 2 comparable
variables were measured

e.g.
Uncalibrated equipment

Therefore, solution = calibrate!


Threats to Internal Validity

• Selection Bias
– The groups for comparison are not equivalent
Threats to Internal Validity

• Selection Bias
e.g. Groups not randomly assigned
Static Group Comparison
T  O1
P  O2
i.e. Group T were resistance trained to start with
Threats to Internal Validity
• Selection Bias (possible solution)

Either:
– Randomise group assignment,
– Pre-test and post-test difference,
– Repeated Measures Design.
T  O1
P  O2
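Random group assignment, the first option above, might be sketched as follows. The function name `randomise_groups` and the participant IDs are hypothetical, chosen only for illustration.

```python
import random


def randomise_groups(participants, seed=None):
    """Shuffle participants and split them into treatment and placebo groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant IDs.
ids = [f"S{n:02d}" for n in range(1, 21)]
treatment, placebo = randomise_groups(ids, seed=1)
print("T:", sorted(treatment))
print("P:", sorted(placebo))
```

Because chance alone decides who goes where, pre-existing differences (such as prior resistance training) are expected to balance out between T and P as the sample grows.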
Threats to Internal/External Validity
• Experimental Mortality
– Missing Data due to subject drop-out
– Reduced n = reduced statistical Power
– Not only challenges quality of data gathered (Internal Validity) but also our ability to generalise (External Validity).

Therefore, solution = recruit sufficient (young?) participants
Threats to External Validity

• Inadequate description
– 5th characteristic of research…
…should be replicable

If nobody can replicate the methods of a given study, then it is irrefutable and therefore lacks external validity.

Therefore, solution = comprehensive methodology


Threats to External Validity
• Biased sampling
– Linked to statistical regression
– Sample does not reflect target population
– n≠N
Results generalised
across gender

Therefore, solution = random sample (of target population).
Threats to External Validity
• Hawthorne Effect
– DV is influenced by the fact that it is being
recorded

e.g.
Fastest sprint when
professor enters lab

Therefore, solution =
control the lab
environment.
Threats to External Validity
• Demand Characteristics
– Participants detect the purpose of the study and
behave accordingly
e.g.
Sports Science students already know that the
carbohydrate drink is supposedly superior

Therefore, solution = double or single blinding (CHO versus H2O).
Threats to External Validity
• Operationalisation
– AKA Ecological Validity
– The DV must have some relevance in the
‘real world’

e.g.
Intracardiac blood pressure has no equivalent in public health surveys

Therefore, solution = choose your DV carefully.


Validity of questionnaire
• Expert validity: Item analysis
• Send questionnaire to at least 3 experts (odd number)
• +1 if agree
• 0 if not sure
• -1 if do not agree
• Average score from the experts should be >0.5 for an item to be included in the questionnaire

Experts   Items
          1     2     3     4     5     6   7   8   9
1         1     1     1    -1     0     …   …   …   …
2         1     0     0     0     0     …   …   …   …
3         1     1    -1    -1     0     …   …   …   …
Average   1     0.7   0    -0.7   0     …   …   …   …
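The item analysis above is simple enough to script. The sketch below reuses the ratings for items 1 to 5 from the table and the >0.5 cut-off; the variable and function names are illustrative assumptions.

```python
from statistics import mean

# Ratings copied from the table above: one row per expert, one column per item 1-5.
# +1 = agree, 0 = not sure, -1 = do not agree.
ratings = [
    [1, 1, 1, -1, 0],   # expert 1
    [1, 0, 0, 0, 0],    # expert 2
    [1, 1, -1, -1, 0],  # expert 3
]

THRESHOLD = 0.5  # items with an average above this are kept


def item_averages(rows):
    """Average each item's score across experts (column-wise mean)."""
    return [mean(column) for column in zip(*rows)]


averages = item_averages(ratings)
kept_items = [i + 1 for i, avg in enumerate(averages) if avg > THRESHOLD]

for number, avg in enumerate(averages, start=1):
    verdict = "include" if avg > THRESHOLD else "revise or drop"
    print(f"Item {number}: average {avg:+.2f} -> {verdict}")
```

On these ratings only items 1 and 2 clear the cut-off; items 3 to 5 would need revising or dropping.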
Reliability
• Reliability is a pre-requisite of validity
e.g. Direct versus Indirect measures of VO2 max
Direct: Gold Standard (i.e. valid and reliable), Expensive, Complex
Indirect: Predictive, Cheap, Easy
Types of Reliability

• Relative
• Absolute
• Rater reliability (Objectivity)
– Intrarater reliability
– Interrater reliability.
Relative Reliability

Subject 1 60 ml.kg-1.min-1 63 ml.kg-1.min-1 57 ml.kg-1.min-1

Subject 2 55 ml.kg-1.min-1 56 ml.kg-1.min-1 48 ml.kg-1.min-1

Subject 3 70 ml.kg-1.min-1 65 ml.kg-1.min-1 66 ml.kg-1.min-1


Relatively Reliable
i.e. Individuals maintain position in the group
Absolute Reliability

Subject 1 60 ml.kg-1.min-1 63 ml.kg-1.min-1 57 ml.kg-1.min-1

Subject 2 55 ml.kg-1.min-1 56 ml.kg-1.min-1 48 ml.kg-1.min-1

Subject 3 70 ml.kg-1.min-1 65 ml.kg-1.min-1 66 ml.kg-1.min-1


Not Absolutely Reliable
i.e. Test-retest scores vary within individuals
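The contrast between relative and absolute reliability can be checked directly on the three-subject table. This is a minimal sketch: rank order across trials stands in for relative reliability, and the within-subject standard deviation for absolute reliability; both are illustrative choices rather than the only possible statistics.

```python
from statistics import mean, stdev

# VO2 max (ml.kg-1.min-1) for three trials, copied from the table above.
scores = {
    "Subject 1": [60, 63, 57],
    "Subject 2": [55, 56, 48],
    "Subject 3": [70, 65, 66],
}


def ranking(trial):
    """Return subjects ordered from highest to lowest score in one trial."""
    return sorted(scores, key=lambda s: scores[s][trial], reverse=True)


# Relative reliability: is the rank order the same in every trial?
rankings = [ranking(t) for t in range(3)]
rank_order_stable = all(r == rankings[0] for r in rankings)

# Absolute reliability: how much does each individual vary across trials?
within_sd = {s: stdev(v) for s, v in scores.items()}

print("Rank order stable:", rank_order_stable)
for subject, sd in within_sd.items():
    print(f"{subject}: mean {mean(scores[subject]):.1f}, within-subject SD {sd:.1f}")
```

The rank order is identical in all three trials, yet each subject's own scores vary by several ml.kg-1.min-1 from trial to trial.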
Rater Reliability

• Intrarater reliability
– The consistency of a given observer or
measurement tool on more than one occasion
Rater Reliability
• Interrater reliability
– The consistency of a given measurement from more
than one observer or measurement tool

e.g.
Score for the Indonesian Idol
BCL = 9.9
Judika = 4.4
Armand Maulana = 7.0
Threats to Reliability

• Fatigue
8 am 9 am 10 am

Subject 1 60 ml.kg-1.min-1 55 ml.kg-1.min-1 50 ml.kg-1.min-1

Therefore, solution = increase time between tests.


Threats to Reliability
• Habituation

Subject 1 60 ml.kg-1.min-1 65 ml.kg-1.min-1 70 ml.kg-1.min-1

Therefore, solution = familiarise prior to test.


Threats to Reliability
• Standardisation of Procedures
– Control of extraneous variables

• Precision of Measurements
– i.e. if we are happy to measure VO2 max to the
nearest 10 ml.kg-1.min-1, then it could probably be
reliably predicted from your training volume and age.
[Target diagrams, left to right:]
Reliable, Not Valid
Valid, Not Reliable
Neither Reliable nor Valid
Both Reliable and Valid
Types of Reliability

• Test-Retest
• Equivalent Forms
• Internal Consistency
• Split-Half Approach
• Kuder-Richardson Approach
• Cronbach Alpha Approach
Test-Retest Reliability

• Administer the same instrument twice to the same exact group after a time interval has elapsed.
• Calculate a reliability coefficient (r) to indicate the relationship between the two sets of scores.
• r of +.51 to +.75 = moderate to good
• r over +.75 = very good to excellent
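The reliability coefficient can be computed as a Pearson correlation between the two administrations. The sketch below uses hypothetical scores for six people and labels r with the two bands quoted above; values below +.51 are left unlabelled because the slides give no band for them.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def interpret(r):
    """Label r using the bands quoted above; lower values are left unlabelled."""
    if r > 0.75:
        return "very good to excellent"
    if r >= 0.51:
        return "moderate to good"
    return "below the bands listed"


# Hypothetical scores for six people tested twice.
first = [12, 15, 11, 18, 14, 16]
second = [13, 14, 10, 19, 15, 15]

r = pearson_r(first, second)
print(f"r = {r:+.2f} ({interpret(r)})")
```

The same calculation applies whenever two sets of scores from the same group are compared.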
Equivalent Forms Reliability

• Also called alternate or parallel forms
• Instruments administered to same group at same time
• Calculate a reliability coefficient (r) to indicate the relationship between the two sets of scores.
– r of +.51 to +.75 = moderate to good
– r over +.75 = very good to excellent
Internal Consistency Reliability

Split-Half
– Break instrument or sub-parts in half, like two instruments
– Correlate scores on the two halves

Kuder-Richardson (KR)
– Treats instrument as whole
– Compares variance of total scores and sum of item variances
– Data nominal

Cronbach Alpha
– Like KR approach
– Data scaled or ranked

Best to consult a statistics book and consultant, and use computer software to do the calculations for these tests
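The comparison of total-score variance with the sum of item variances can be written out as the standard Cronbach's alpha formula, alpha = k/(k-1) × (1 − Σ item variances / variance of totals). The sketch below applies it to hypothetical 5-point ratings; the data and the function name are assumptions for illustration, and statistical software remains the practical route for real instruments.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one row per respondent, one column per item."""
    k = len(responses[0])          # number of items
    items = list(zip(*responses))  # column-wise view: one tuple per item
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical 5-point ratings: 5 respondents x 4 items.
data = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]

alpha = cronbach_alpha(data)
print(f"Cronbach's alpha = {alpha:.2f}")
```

When every item is scored 0 or 1, the same formula reduces to the Kuder-Richardson (KR-20) coefficient.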
RELIABILITY
• In SPSS, the reliability analysis is obtained from the following commands:
ANALYZE – SCALE – RELIABILITY ANALYSIS
• Under reliability, there are five different types of methods:
(1) Cronbach’s alpha method
(2) The split-half method
(3) The Guttman method
(4) The parallel method
(5) The strict parallel method
• The reliability coefficient should be > 0.6
Summary
1. Reliability: Precision, Reproducibility
   (affected by Random Error)

2. Validity: Accuracy, Conformity
   (affected by Systematic Error, i.e. Bias)
