Measurement

LEARNING OBJECTIVES
A year from now, you should still be able to:
1. Interrogate the construct validity of a study's variables.
2. Describe the kinds of evidence that support the construct validity of a measured variable.

WHETHER STUDYING THE NUMBER of polar bears left in the Arctic Circle, the strength of a bar of steel, the number of steps people take each day, or the level of human happiness, every scientist faces the challenge of measurement. When researchers test theories or pursue empirical questions, they have to systematically observe the phenomena by collecting data. Such systematic observations require measurements, and these measurements must be good ones, or else they are useless.

Measurement in psychological research can be particularly challenging. Many of the phenomena psychologists are interested in, such as motivation, emotion, thinking, and reasoning, are difficult to measure directly. Happiness, the topic of much research, is a good example of a construct that could be hard to assess. Is it really possible to quantify how happy people are? Are the measurements accurate? Before testing, for example, whether people who make more money are happier, we might ask whether we can really measure happiness. Maybe people misrepresent their level of well-being, or maybe people aren't aware of how happy they are. How do we evaluate who is really happy and who isn't? This chapter explains how to ask questions about the quality of a study's measures: the construct validity of quantifications of things like happiness, gratitude, or wealth. Construct validity, remember, refers to how well a study's variables are measured or manipulated.

Construct validity is a crucial piece of any psychological research study, whether it supports a frequency, association, or causal claim. This chapter focuses on the construct validity of measured variables. You will learn, first, about different ways researchers operationalize their variables.
Then you'll learn how to interrogate the construct validity of those measurements. (The construct validity of manipulated variables is covered in Chapter 10; for a review of measured and manipulated variables, see Chapter 3, pp. 58-59.)

WAYS TO MEASURE VARIABLES

The process of measuring variables involves two decisions. Researchers decide how they should operationalize each variable, choosing among three common types of measures: self-report, observational, and physiological. They also decide what scale of measurement to use for each variable they plan to investigate.

More About Conceptual and Operational Variables

In Chapter 3, you learned about conceptual and operational definitions of variables. The conceptual definition, or construct, is the researcher's definition of the variable in question at a theoretical level. The operational definition represents the researcher's specific decision about how to measure the conceptual variable.

Happiness is a good example. One research team, led by Ed Diener, began the process of measuring happiness by developing a precise conceptual definition. Because the word happiness might have a variety of meanings, Diener's team narrowed their interest to "subjective well-being" (well-being from the person's own perspective). After defining happiness this way, the team created an operational definition: They operationalized subjective well-being using five statements in a questionnaire format (Pavot & Diener, 1993). They decided a self-report questionnaire was appropriate because subjective well-being is, in part, defined by each person's own criteria for what constitutes a good life. They worded their questions to ask people about their satisfaction with life, and they asked people to respond on a 7-point scale, on which 1 corresponded to "strongly disagree" and 7 corresponded to "strongly agree":

1. In most ways my life is close to my ideal.
2. The conditions of my life are excellent.
3. I am satisfied with my life.
4. So far I have gotten the important things I want in life.
5. If I could live my life over, I would change almost nothing.

The unhappiest people would get a total score of 5 on this self-report scale because they would answer "strongly disagree," or 1, to all five items (1 + 1 + 1 + 1 + 1 = 5). The happiest people would get a total score of 35 on this scale because they would answer "strongly agree," or 7, to all five items (7 + 7 + 7 + 7 + 7 = 35). Those at the neutral point would score 20, right in between satisfied and dissatisfied (4 + 4 + 4 + 4 + 4 = 20). Diener and Diener (1996) reported some data based on this scale, concluding that most people are happy, meaning most people scored above 20. For example, 63% of high school and college students scored above 20 in one study, and 72% of disabled adults scored above 20 in another study.

In choosing this operational definition of subjective well-being, the research team started with only one possible measure, even though there are many other ways to study this concept. Another way to measure happiness is to use a single question called the Ladder of Life (Cantril, 1965). The question goes like this:

Imagine a ladder with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally stand at this time?

On this measure, participants respond by giving a value between 0 and 10. The Gallup polling organization uses the Ladder of Life scale in its daily Gallup-Healthways Well-Being Index.

You might be thinking one of these operational definitions seems like a better measure of happiness than the other. Which one do you think is best? We'll see that they both do a good job of measuring the construct. Diener's research team and Gallup have both learned their measures of happiness are accurate because they have collected data on them, as we'll see later in this chapter.

OPERATIONALIZING OTHER CONCEPTUAL VARIABLES

To study conceptual variables other than happiness, researchers follow a similar process: They start by stating a definition of their construct (the conceptual variable) and then create an operational definition. For example, to measure the association between wealth and happiness, researchers need to measure not only happiness, but also wealth. They might operationally define wealth by asking about salary in dollars, by asking for bank account balances, or even by observing the kind of car people drive.

Consider another variable that has been studied in research on relationships: gratitude toward one's partner. Researchers who measure gratitude toward a relationship partner might operationalize it by asking people how often they thank their partner for something they did. Or they might ask people how appreciative they usually feel. Even a simple variable such as gender must be operationalized, and each conceptual variable can be operationalized in a number of ways, as Table 5.1 shows. In fact, operationalization is one place where a researcher's creativity comes into the research process, as researchers work to develop new and better measures of their constructs.

Three Common Types of Measures

The types of measures psychological scientists typically use to operationalize variables generally fall into three categories: self-report, observational, and physiological.

SELF-REPORT MEASURES

A self-report measure operationalizes a variable by recording people's answers to questions about themselves. Asking people how much they appreciate their partner and asking them about their life satisfaction are both self-report measures. If stress were the variable being studied, researchers might ask people to self-report on the frequency of specific events they've experienced in the past year, such as marriage, divorce, or moving (e.g., Holmes & Rahe, 1967). In research on children, self-reports are often replaced with parent reports or teacher reports. These measures ask parents or teachers to respond to a series of questions, such as describing the words the child knows.

PHYSIOLOGICAL MEASURES

A physiological measure operationalizes a variable by recording biological data, with the use of equipment to amplify, record, and analyze it. For example, moment-to-moment happiness has been measured using facial electromyography (EMG), a way of electronically recording tiny movements in the muscles in the face. Facial EMG can be said to detect a happy facial expression because people who are smiling show particular patterns of muscle movement around the eyes and cheeks.

Other constructs might be measured using a brain scanning technique called functional magnetic resonance imaging, or fMRI. In a typical fMRI study, people engage in a carefully structured series of psychological tasks (such as looking at three types of photos or playing a series of rock-paper-scissors games) while lying in an MRI machine. The MRI equipment records and codes the relative changes in blood flow in particular regions of the brain, as shown in Figure 5.1.

FIGURE 5.1 Wins vs. Losses contrast in Rock, Paper, Scissors. Images from fMRI scans showing brain activity. In this study of how people respond to rewards and losses, the researchers tracked blood flow patterns in the brain when people had either won, lost, or tied a rock-paper-scissors game played with a computer. They found that many regions of the brain respond more to wins than to losses, as indicated by the highlighted regions. (Source: Vickery, Chun, & Lee, 2011.)
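The 5-to-35 scoring of Diener's five-item scale, described earlier, amounts to a simple sum of the item responses. A minimal sketch in Python (the function name and response values are ours, for illustration only):

```python
def swl_total(responses):
    """Sum five 1-7 ratings into a Satisfaction with Life total (5-35)."""
    assert len(responses) == 5 and all(1 <= r <= 7 for r in responses)
    return sum(responses)

print(swl_total([1, 1, 1, 1, 1]))  # 5: the unhappiest possible score
print(swl_total([4, 4, 4, 4, 4]))  # 20: the neutral point
print(swl_total([7, 7, 7, 7, 7]))  # 35: the happiest possible score
```

Any response profile summing above 20 would count as "happy" in the Diener and Diener sense described above.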
Some people erroneously believe physiological measures are the most accurate, but physiological measures, like self-report and observational ones, have to be validated by using other measures. For instance, as mentioned above, researchers used fMRI to study how patterns of brain activity are related to level of intelligence. But how did they know what the fMRI patterns meant in the first place? Before doing the fMRI scans, the researchers measured each participant's intelligence with a paper-and-pencil IQ test, an observational measure (Deary et al., 2010). Similarly, the only way a researcher could know that some pattern of brain activity was associated with happiness is by asking each person how happy he or she felt, via self-report, at the time the brain scan was being done. As we'll learn later in this chapter, it's best when self-report, observational, and physiological measures show similar patterns of results.

Scales of Measurement

All variables must have at least two levels (see Chapter 3). The levels of operational variables, however, can be coded using different scales of measurement.

CATEGORICAL VS. QUANTITATIVE VARIABLES

Operational variables are primarily classified as categorical or quantitative. The levels of categorical variables, as the name suggests, are categories. Examples are sex, whose levels are male and female, and species, whose levels in a particular study might be rhesus macaque, chimpanzee, and bonobo. A researcher might decide to assign numbers to the levels of a categorical variable (e.g., "1" for rhesus macaques, "2" for chimpanzees), but the numbers stand only for category membership. Quantitative variables, by contrast, are coded with meaningful numerals, using one of three scales of measurement: ordinal, interval, or ratio.

An ordinal scale of measurement applies when the numerals of a quantitative variable represent a rank order. Suppose a professor stacks exams in the order in which students turned them in, to see how fast students worked. This represents ordinal data because the fastest exams are on the bottom of the pile, ranked 1. However, this variable has not quantified how much faster each exam was turned in, compared with the others.

An interval scale of measurement applies to the numerals of a quantitative variable that meet two conditions: First, the numerals represent equal intervals (distances) between levels, and second, there is no "true zero" (a person can get a score of 0, but the 0 does not really mean "nothing"). An IQ test is an interval scale: the distance between IQ scores of 100 and 105 represents the same as the distance between IQ scores of 105 and 110. However, a score of 0 on an IQ test does not mean a person has "no intelligence." Body temperature in degrees Celsius is another example of an interval scale; the intervals between levels are equal, but a temperature of 0 degrees does not mean a person has "no temperature." Most researchers assume questionnaire scales like Diener's (scored from 1 = strongly disagree to 7 = strongly agree) are interval scales. They do not have a true zero, but we assume the distances between numerals, from 1 to 7, are equivalent. Because they do not have a true zero, interval scales cannot allow a researcher to say things like "twice as hot" or "three times happier."

Finally, a ratio scale of measurement applies when the numerals of a quantitative variable have equal intervals and when the value of 0 truly means "none" or "nothing" of the variable being measured. On a knowledge test, a researcher might measure how many items people answer correctly. If people get a 0, it truly represents "nothing correct" (0 answers correct). A researcher might also measure how frequently people blink their eyes in a stressful situation; the number of eyeblinks is likewise a ratio scale, because 0 blinks truly means no blinks.
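The "twice as hot" caveat can be made concrete with a quick calculation. Celsius is an interval scale, while Kelvin, which has a true zero, is a ratio scale; the sketch below (our own illustration, not from the chapter) shows why a ratio of Celsius values is misleading:

```python
def celsius_to_kelvin(c):
    """Convert an interval-scale Celsius reading to ratio-scale Kelvin."""
    return c + 273.15

naive_ratio = 20 / 10  # looks like 20 degrees C is "twice as hot" as 10 degrees C
true_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(naive_ratio)           # 2.0
print(round(true_ratio, 3))  # 1.035: only about 3.5% hotter in ratio terms
```

The same reasoning explains why a Diener score of 14 does not mean someone is "twice as happy" as someone scoring 7.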
RELIABILITY OF MEASUREMENT: ARE THE SCORES CONSISTENT?

A reliable measure yields consistent scores. Consistency can also apply within a scale: when people who agree with one of a scale's items also tend to agree with its other items, the scale has internal reliability.

To find out whether a measure is reliable, researchers collect data and examine the association between one set of scores and another: between scores collected at an earlier time and a later time, or between the scores given by one coder and another. Here's an example of how they might go about documenting reliability. Suppose you measured the head circumference, in centimeters, of everyone in a classroom, and then measured everyone a second time (a test-retest comparison), or had someone else measure everyone as well (interrater reliability). Figure 5.2 shows how the results of such a measurement might look, in the form of a data table and a scatterplot.

FIGURE 5.2 Two measurements of head circumference. (A) The data for four participants in table form. (B) The same data presented in a scatterplot, with the first measurement (cm) on one axis and the second measurement (cm) on the other.
We would expect the two measurements of head circumference to be about the same for each person, so in the scatterplot the dots should fall almost exactly on the sloping line that would indicate perfect agreement. The two measures won't always be exactly the same, however: slight variations in the placement of the tape measure will lead to slightly different scores for each trial, so the dots may sit near, rather than exactly on, the line.

SCATTERPLOTS CAN SHOW INTERRATER AGREEMENT

Scatterplots can also reveal interrater agreement. Suppose two observers, Mark and Matt, are rating how happy each of a group of children appears to be as the children play, and the researchers want to know how well the two observers' ratings agree. From their notes, they create a scatterplot, plotting Observer Mark's ratings on the x-axis and Observer Matt's ratings on the y-axis.

If the data looked like those in Figure 5.3A, the ratings would have high interrater reliability. Both Mark and Matt rate Jay's happiness as 9, one of the happiest kids on the playground. Observer Mark rates Jackie a 2, one of the least happy kids; Observer Matt agrees because he rates her 3, and so on. The two observers do not show perfect agreement, but there are no great disagreements about the happiest and least happy kids. Again, the points are scattered around the plot a bit, but they hover close to the sloping line that would indicate perfect agreement.

In contrast, suppose the data looked like Figure 5.3B, which shows much less agreement. Here, the two observers are Mark and Peter, and they are watching the same children at the same time, but Mark gives Jay a rating of 9 and Peter rates him only a 6. Mark considers Jackie's behavior to be shy and withdrawn and rates her a 2, but Peter thinks she seems calm and content and rates her a 7. Here the interrater reliability would be considered unacceptably low. One reason could be that the observers did not have a clear enough operational definition of "happiness" to work with. Another reason could be that one or both of the coders has not been trained well enough yet.

A scatterplot can thus be a helpful tool for visualizing the agreement between two administrations of the same measurement (test-retest reliability) or between two coders (interrater reliability). Using a scatterplot, you can see whether the two sets of scores line up.
USING THE CORRELATION COEFFICIENT r

Scatterplots can illustrate agreement, but researchers more commonly summarize it with a single statistic, usually referred to as the correlation coefficient, or r. A scatterplot conveys two things about a relationship: the direction of its slope (positive, negative, or flat) and how close the dots are to a straight line drawn through them. The correlation coefficient r captures the same two things. The sign of r indicates the direction of the slope, and the size of r reflects how spread out the dots are: when the dots are close to the line, r is strong (close to 1, or close to -1 for a negative slope); when the dots are spread out, the relationship is weak and r is close to 0. Figure 5.4 shows scatterplots with a variety of correlations.

FIGURE 5.4 Scatterplots showing correlations of different directions and strengths: r = .56, r = .93, r = -.59, and r = .01.

When r is used to assess test-retest reliability, we do not expect the test-retest correlation to be perfect, but it should be strong and positive. Inspecting correlations can also shed light on internal reliability: items that measure the same construct should correlate with one another. In one set of items, for example, Items 1 and 2 might correlate well since they are similar to each other, but Items 1 and 3 are probably not correlated, and Item 4 doesn't seem to go with any other item, either. But how could we quantify a scale's internal reliability overall?
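Interrater agreement of the kind plotted in Figures 5.3A and 5.3B can be quantified with r directly. A minimal sketch (the ratings below are invented for illustration, loosely echoing the Mark/Matt and Mark/Peter examples):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented ratings (1-10) of the same six children by three observers.
mark = [9, 2, 7, 5, 8, 3]
matt = [9, 3, 6, 5, 8, 4]    # tracks Mark closely, so r should be near 1
peter = [6, 7, 4, 8, 5, 6]   # disagrees with Mark, so r should be low

print(round(pearson_r(mark, matt), 2))
print(round(pearson_r(mark, peter), 2))
```

The first pair of observers yields a strong positive r; the second pair does not, mirroring the high- and low-agreement scatterplots.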
INTERRATER RELIABILITY

If the correlation between two observers' ratings is strong and positive, we can trust the ratings: interrater reliability is good. If r is positive but weak, we could not trust the observers' ratings; we would retrain the coders or refine our operational definition. A negative correlation would indicate a big problem. In the daycare example, it would mean that when one observer considered a child very happy, the other considered the same child unhappy, and so on. When we're assessing interrater reliability, a negative correlation is rare and undesirable. (Other statistics are used to assess interrater reliability when the observers are sorting targets into the same categories rather than rating them on a scale.)

INTERNAL RELIABILITY

Internal reliability is relevant for measures that use more than one item to get at the same construct, such as Diener's five-item subjective well-being scale, which poses the same basic question phrased in multiple ways. Researchers rephrase the items because any one way of wording a question might introduce measurement error, and they predict any such errors will cancel each other out when the items are summed up to form each person's score. Before combining the items on a self-report scale, however, researchers need to assess the scale's internal reliability: whether people responded consistently across the items.

Researchers compute Cronbach's alpha (or coefficient alpha) to see if their measurement scales have internal reliability. First, they collect data on the scale from a large sample of participants. Then alpha returns one number, computed from the average of the inter-item correlations and the number of items in the scale. The closer Cronbach's alpha is to 1.0, the better the scale's reliability. (For self-report measures, researchers are looking for a Cronbach's alpha of .70 or higher.) If Cronbach's alpha is high, there is good internal reliability and researchers can sum all the items together. If Cronbach's alpha is less than .70, internal reliability is poor and the researchers are not justified in combining all the items into one scale. They have to go back and revise the items, or they might select only those items that were found to correlate strongly with one another.

In empirical articles, researchers report evidence for the reliability of the measures they are using. One example of such evidence is in Figure 5.5, which shows how the subjective well-being scale, called Satisfaction with Life (SWL), was used in six studies. The table shows the internal reliability (labeled as coefficient alpha) from each of these studies, as well as test-retest reliability for each one. The table did not present interrater reliability because the scale is a self-report measure, and interrater reliability is relevant only when two or more observers are doing the ratings. Based on the evidence in this table, we can conclude the subjective well-being scale has excellent internal reliability and excellent test-retest reliability. You'll see another example of how reliability is discussed in a journal article in the Working It Through section at the end of this chapter.
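The alpha computation described above can be sketched as follows. The sketch uses the standard variance-based form of Cronbach's alpha, and the response data are invented for illustration:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of participant scores.

    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    n = len(items[0])

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[p] for item in items) for p in range(n)]
    return (k / (k - 1)) * (1 - sum(variance(it) for it in items) / variance(totals))

# Invented responses from five participants to a three-item, 7-point scale.
item1 = [7, 5, 2, 6, 3]
item2 = [6, 5, 1, 7, 3]
item3 = [7, 4, 2, 6, 2]  # the items track one another, so alpha should exceed .70

print(round(cronbach_alpha([item1, item2, item3]), 2))
```

Because these hypothetical items rise and fall together across participants, alpha comes out well above the .70 benchmark, so summing them into one scale score would be justified.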
FIGURE 5.5 Reliability evidence for the Satisfaction with Life scale as used in six studies: each study's coefficient alpha (internal reliability) and test-retest correlation.

CHECK YOUR UNDERSTANDING

1. Reliability is about consistency. Define the three kinds of reliability, using the word consistent in each of your definitions.
2. For each of the three common types of operationalizations (self-report, observational, and physiological), indicate which type(s) of reliability would be relevant.
3. Which of the following correlations is the strongest: r = .25, r = -.65, r = -.01, or r = .43?

VALIDITY OF MEASUREMENT: DOES IT MEASURE WHAT IT'S SUPPOSED TO MEASURE?

Activity monitors such as pedometers record how many steps people take each day (Figure 5.7). How can you know for sure these pedometers are accurate? Of course, it's straightforward to evaluate the validity of a pedometer: You'd simply walk around, counting your steps while wearing one, then compare your own count to that of your device. If you're sure you walked 200 steps and your pedometer says you walked 200, then your device is valid. Similarly, if your pedometer counted the correct distance after you've walked around a track or some other path with a known mileage, it's probably a valid monitor.

A scale that weighs a person at 50 pounds (22.7 kg) every time he steps on it is certainly reliable, but it is not valid.

FIGURE 5.7 Are activity monitors valid? A friend wore a pedometer during a hike and recorded these values. What data could you collect to know whether or not it accurately counted his steps?

In the case of an activity monitor, we are lucky to have concrete, straightforward standards for accurate measurement. But psychological scientists often want to measure abstract constructs such as happiness, intelligence, stress, or self-esteem, which we can't simply count (Cronbach & Meehl, 1955; Smith, 2005a, 2005b). Construct validity is therefore important in psychological research, especially when a construct is not directly observable. Take happiness: We have no means of directly measuring how happy a person is. We could estimate it in a number of ways, such as scores on a well-being inventory, daily smile rate, blood pressure, stress hormone levels, or even the activity levels of certain brain regions. Yet each of these measures of happiness is indirect. For some abstract constructs, there is no single, direct measure. And that is the challenge: How can we know if indirect operational measures of a construct are really measuring happiness and not something else?

We know by collecting a variety of data. Before using particular operationalizations in a study, researchers not only check to be sure the measures are reliable; they also want to be sure the measures get at the conceptual variables they intended. In other words, they evaluate each measure's construct validity, either by collecting their own data or by reading about the data collected by others. Furthermore, the evidence for construct validity is always a matter of degree. Psychologists do not say a particular measure is or is not valid. Instead, they ask: What is the weight of evidence in favor of this measure's validity? There are a number of kinds of evidence that can convince a researcher, and we'll discuss them below.
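The pedometer check described above boils down to comparing the device's count against a criterion you trust, such as your own count or a path of known length. A minimal sketch with invented numbers:

```python
def step_count_error(true_steps, device_steps):
    """Percent error of a device's count relative to a trusted criterion count."""
    return abs(device_steps - true_steps) / true_steps * 100

# Invented check: you counted 200 steps yourself while wearing the device.
print(step_count_error(200, 200))  # 0.0 percent error: consistent with validity
print(step_count_error(200, 150))  # 25.0 percent error: evidence against validity
```

For abstract constructs like happiness there is no such trusted criterion count, which is exactly why the varieties of validity evidence discussed next are needed.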
Interrater reliability: Two coders' ratings of a set of targets are consistent with each other. Criterion validity: Your measure is strongly associated with a key behavioral outcome. Convergent validity: Your self-report measure is more strongly correlated with measures of similar constructs. Discriminant validity: Your self-report measure is less strongly correlated with measures of dissimilar constructs.

Face Validity and Content Validity: Does It Look Like a Good Measure?

A measure has face validity when it looks, on its surface, like a plausible operationalization of the conceptual variable. A measure has content validity when its items capture all parts of the construct, according to the conceptual definition. Both are subjective judgments about the content of the measure.

Criterion Validity: Does It Correlate with Key Behaviors?

To evaluate a measurement's validity, face and content validity are a good place to start, but most psychologists rely on more than a subjective judgment: They prefer to see empirical evidence. There are several ways to collect data on a measure, but in all cases, the point is to make sure the measurement is associated with something it theoretically should be associated with. In some cases, such relationships can be illustrated by using scatterplots and correlation coefficients. They can be illustrated with other kinds of evidence, too, such as comparisons of known groups.

Suppose you work for a company that wants to predict how well job applicants would perform as salespeople. Which aptitude test should the company use? You have two choices, which we'll call Aptitude Test A and Aptitude Test B. Both have items that look good in terms of face validity: they ask about a candidate's motivation, optimism, and interest in sales. But do the test scores predict actual selling ability? Suppose the data show that future sales performance is correlated with scores on Aptitude Test A, while scores on Test B are a poorer indicator of future sales performance. You would conclude that Aptitude Test A has better criterion validity for selecting salespeople; Test B, in contrast, has worse criterion validity as a measure of sales aptitude. Criterion validity is especially important for self-report measures, because the correlation can indicate how well people's self-reports predict their actual behavior.

Another way to gather criterion validity evidence is a known-groups test, in which researchers check whether the measure can distinguish among groups whose standing on the variable is already confirmed. Lie detection provides an example: polygraph readings are supposed to indicate which of a person's statements are truthful and which are lies. If skin conductance and heart rate are valid measures of lying, we could conduct a known-groups test in which we know in advance which of a person's statements are true and which are false. The physiological measures should be elevated only for the lies, not for the true statements. (For a review of the mixed evidence on lie detection, see Saxe, 1991.)

The known-groups method can also be used to validate self-report measures. Psychiatrist Aaron Beck and his colleagues developed the Beck Depression Inventory (BDI), a 21-item self-report scale with items that ask about major symptoms of depression.
For each BDI item, respondents choose one of four statements. For example:

0 I do not feel sad.
1 I feel sad.
2 I am sad all the time and I can't snap out of it.
3 I am so sad or unhappy that I can't stand it.

0 I have not lost interest in other people.
1 I am less interested in other people than I used to be.
2 I have lost most of my interest in other people.
3 I have lost all of my interest in other people.

A clinical scientist adds up the scores on each of the 21 items for a total BDI score, which can range from a low of 0 (not at all depressed) to a high of 63.

To test the criterion validity of the BDI, Beck and his colleagues administered the self-report scale to two known groups of people. They knew one group was suffering from clinical depression and the other group was not, because they had asked psychiatrists to conduct clinical interviews and diagnose each person. The researchers computed the mean BDI scores of the two groups and created a bar graph, shown in Figure 5.10. The graph shows the expected result: the average BDI score of the known depressed group was higher than the average score of the known group who were not depressed. The evidence supports the criterion validity of the BDI. Because its criterion validity was established in this way, the BDI is still widely used today when researchers need a quick and valid way to identify new people who are vulnerable to depression, without having a psychiatrist interview each one.

Beck also used the known-groups paradigm to calibrate low, medium, and high scores on the BDI. When the psychiatrists interviewed the people in the sample, they evaluated not only whether each person was depressed, but also the level of depression: none, mild, moderate, or severe. As expected, the BDI scores of the groups rose as their level of depression (assessed by psychiatrists) grew more severe (Figure 5.11). This result was even clearer evidence that the BDI was a valid measure of depression, so clinicians and researchers can confidently use specific ranges of BDI scores to categorize how severe a person's depression is.

FIGURE 5.11 BDI scores of four known groups, rated by psychiatrists as having no, mild, moderate, or severe depression. This pattern of results means it is valid to use BDI cutoff scores to decide if a person has mild, moderate, or severe depression. (Source: Adapted from Beck et al., 1961.)

Known-groups evidence also supports the subjective well-being (SWB) scale. In one comparison, Korean university students had a lower mean score on the scale than Canadian college students, who averaged much higher, as indicated by the M column in Table 5.3. Such known-groups patterns provide strong evidence for the criterion validity of the SWB scale. Researchers can use this scale in their studies with confidence.

TABLE 5.3 Subjective Well-Being (SWB) Scores for Known Groups from Several Studies

SAMPLE CHARACTERISTICS | N | M | SD | REFERENCE
American college students | 244 | 23.7 | 6.4 | Pavot & Diener (1993)
French Canadian college students (male) | 355 | 25.8 | 6.1 | Blais et al. (1989)
Korean university students | 413 | 19.8 | 5.8 | Suh (1993)
Printing trade workers | 304 | 24.2 | 6.0 | George (1991)
Veterans Affairs hospital inpatients | 52 | 11.8 | 5.6 | Frisch (1991)
Abused women | 70 | 20.7 | | Fisher (1991)

What about the Ladder of Life scale, the measure of happiness used in the Gallup-Healthways Well-Being Index? This measure also has some known-groups evidence to support its criterion validity. For one, Gallup reported that Americans' well-being was especially low in 2008 and 2009, a period corresponding to a significant downturn in the U.S. economy. Well-being is a little bit higher in American summer months, as well. These results fit what we would expect if the Ladder of Life is a valid measure of well-being.

Convergent Validity and Discriminant Validity: Does the Pattern Make Sense?

Criterion validity examines whether a measure correlates with key behavioral outcomes. Another form of validity evidence is whether there is a meaningful pattern of similarities and differences among self-report measures. A self-report measure should correlate more strongly with self-report measures of similar constructs than it does with those of dissimilar constructs. The patterns of correlation with measures of theoretically similar and dissimilar constructs are called convergent validity and discriminant validity (or divergent validity), respectively. (For a review of descriptive statistics, see Table 8.4.)

CONVERGENT VALIDITY

To document the convergent validity of the BDI, researchers examined whether BDI scores were correlated with scores on another self-report measure of depression, the CES-D: people who score high on the BDI also tend to score high on the CES-D, and people who score low on one score low on the other. This strong correlation between similar self-report measures of the same construct is good evidence for the convergent validity of the BDI.

FIGURE 5.12 Evidence supporting the convergent validity of the BDI. As expected, the BDI is strongly correlated with the CES-D, another self-report measure of depression. (Source: Segal et al., 2008.)

Testing for convergent validity raises a question: the comparison measure's own construct validity must be established, too. The researchers might next have to validate the CES-D with evidence of its own. Eventually, however, they might be satisfied that a measure's construct validity can be evaluated in light of the full weight and pattern of the evidence. Many researchers are convinced when measures are shown to predict relevant outcomes (criterion validity), but no single kind of evidence is definitive (Smith, 2005a).

DISCRIMINANT VALIDITY

The BDI should not correlate strongly with measures of constructs that are very different from depression; it should show discriminant validity with them. For example, depression is not the same as a person's perception of his or her overall physical health. Although mental health problems, including depression, do overlap somewhat with physical health problems, we would not expect the BDI to be strongly correlated with a measure of perceived physical health problems. More importantly, we would expect the BDI to be more strongly correlated with the CES-D and well-being than it is with physical health problems. Sure enough, Segal and his colleagues found a correlation of only r = .16 between the BDI and a measure of perceived physical health problems. This weak correlation shows that the BDI is different from people's perceptions of their physical health, so we can say that the BDI has discriminant validity with physical health problems. Figure 5.13 shows a scatterplot of the results.

FIGURE 5.13 Evidence supporting the discriminant validity of the BDI. As expected, the BDI is only weakly correlated with perceived health problems (r = .16), providing evidence for discriminant validity. (Source: Segal et al., 2008.)
Notice also that most of the dots in Figure 5.13 fall in the lower left portion of the scatterplot, because most of the people in this sample were not depressed: they scored low on both the BDI and the perceived health problems scale.

Discriminant validity is most important to establish between constructs that could plausibly be confused. Many developmental disorders, for example, have similar symptoms, so it is important to specify what a screening measure does and does not detect: a screening test for one disorder should show discriminant validity with measures of the others, so that it does not identify the same child as having, say, a language delay when the construct it is meant to detect is something else. By contrast, it is usually not necessary to establish discriminant validity between a measure and constructs that are obviously unrelated to it.

The Relationship Between Reliability and Validity

One essential point is worth reiterating: The validity of a measure is not the same as its reliability. A measure may be reliable but not valid. A measurement of head circumference, for example, might be extremely consistent from one testing to the next, so it is reliable, but it still may not be valid for assessing intelligence.

As another example, suppose you used your pedometer to count how many steps there are in your daily walk from your parking spot to your building. If the pedometer reading is very different day to day, then the measure is unreliable, and of course it also cannot be valid, because the true distance of your walk has not changed. Therefore, reliability is necessary (but not sufficient) for validity.

A nurse taking a patient's blood pressure wants to know the cuff she's using is reliable and accurate. Similarly, before conducting a study, researchers want to be sure the measures they plan to use are reliable and valid ones. When you read a research study, you should be asking: Did the researchers collect evidence that the measures they are using have construct validity? If they didn't do it themselves, did they review construct validity evidence provided by others? In empirical journal articles, you'll usually find reliability and validity information reported; the Working It Through section at the end of this chapter shows how such information might be presented, using a study of appreciation in relationships.
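The day-to-day pedometer example can be made concrete: a fixed walk should produce a tight cluster of readings, and a wide spread by itself is enough to rule out validity. A small sketch with invented readings:

```python
import statistics

# Invented pedometer readings for the same fixed walk on five days.
consistent_readings = [415, 423, 418, 421, 417]   # tight cluster: reliable
erratic_readings = [300, 510, 405, 260, 480]      # wide spread: unreliable

# The walk's true length never changes, so a wide spread alone shows the
# device cannot be valid, whatever the true step count happens to be.
print(round(statistics.stdev(consistent_readings), 1))
print(round(statistics.stdev(erratic_readings), 1))
```

Note that the reverse inference does not hold: the tight cluster is consistent with validity but does not establish it, since the device could be reliably wrong, like the scale stuck at 50 pounds.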
FIGURE 5.14 Items in the Appreciation in Relationships (AIR) scale. These items were used by the researchers to measure how much people appreciate their relationship partner. Do you think these items have face validity as a measure of appreciation? (Source: Gordon et al., 2012.)