• Definitions
– Correlation, Reliability, Validity, Measurement error
• Theories of Reliability
• Types of Reliability
– Standard Error of Measurement
• Types of Validity
• Article
• Exercise
Definitions
• Correlation
– Reflects direction (+/-) & strength (0 to 1) of the relation between two variables
• Variance explained
– Reflects the strength of relation of two variables
• Square of correlation
• Varies from 0 to 1
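These definitions can be checked numerically; a minimal sketch with made-up height/weight values (all numbers are hypothetical):

```python
import numpy as np

# Hypothetical height (cm) and weight (pounds) values for illustration
height = np.array([165, 170, 175, 180, 185, 190])
weight = np.array([130, 150, 145, 170, 180, 210])

r = np.corrcoef(height, weight)[0, 1]  # correlation: direction & strength
variance_explained = r ** 2            # square of the correlation, 0 to 1

print(f"r = {r:.2f}, r^2 = {variance_explained:.0%}")
```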
[Scatter plot: Height (cm) vs. Weight (pounds); points labeled Vince Carter, Julia Roberts, Calista Flockhart]
[Scatter plot: Height (cm) vs. Weight (pounds), same data; r = .76, r² = 58%]
Effect of Measurement Error on Correlations
[Scatter plot: Height (cm) vs. Height (cm); r = 1.00, r² = 100%]
[Scatter plot: Objective Height (cm) vs. Self-Reported Height (cm); r = .98, r² = 96%]
[Scatter plot: Objective Weight (pounds) vs. Self-Reported Weight (pounds); r = .92, r² = 85%]
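The pattern in these plots follows the classical attenuation formula: unreliability in either measure shrinks the observed correlation toward zero. A minimal sketch (the correlation and reliability values below are made up for illustration):

```python
import math

def attenuated_r(r_true, rel_x, rel_y):
    """Observed r = true-score r scaled by the square roots of the
    two measures' reliabilities (classical attenuation formula)."""
    return r_true * math.sqrt(rel_x * rel_y)

# A true correlation of .80 is observed as something smaller when
# the self-report measures are imperfectly reliable
observed = attenuated_r(0.80, 0.96, 0.85)
print(round(observed, 2))  # below .80
```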
Definitions
• Reliability
• Consistency & stability of measurement
• Reliability is necessary but not sufficient for
validity
• E.g., A measuring tape is not a valid way to measure weight, even though the tape reliably measures height and height correlates with weight
• Validity
• Accuracy/meaning of measurement
• Test-retest
• Consistency across time
• Parallel forms
• Consistency across versions
• Internal
• Consistency across items
• Scorer (inter-rater)
• Consistency across raters/judges
Example: The Satisfaction with Life Scale (SWLS)
1. In most ways my life is close to ideal.
2. The conditions of my life are excellent.
3. I am satisfied with my life.
4. So far I have gotten the important things I want in my life.
5. If I could live my life over, I would change almost nothing.
Response scale: 1 (Strongly Disagree) to 7 (Strongly Agree)
Types of Reliability
• Test-retest reliability
• Correlation of scores on the same measure taken at two different times
• Time interval should be long enough to avoid memory/learning effects
• Parallel-forms
• Correlation of scores on similar versions of the measure
• Forms equivalent on mean, standard deviation, & inter-correlations
• Can include a time interval between administrations of the two forms
Types of Reliability
Test-retest Reliability
          Time 1                   Time 2
     I1   I2   I3   AvgT1    I1   I2   I3   AvgT2
P1   __   __   __    __      __   __   __    __
P2   __   __   __    __      __   __   __    __
P3   __   __   __    __      __   __   __    __
Correlate AvgT1 to AvgT2 to get reliability
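Under the scheme above, the reliability estimate is just a Pearson correlation between the two columns of averages; a minimal numpy sketch with invented scores for five participants:

```python
import numpy as np

# Hypothetical SWLS averages for five participants at two times
avg_t1 = np.array([5.2, 3.8, 6.0, 4.4, 2.9])
avg_t2 = np.array([5.0, 4.1, 5.8, 4.6, 3.2])

# Test-retest reliability = correlation of Time 1 and Time 2 averages
retest_r = np.corrcoef(avg_t1, avg_t2)[0, 1]
print(f"test-retest r = {retest_r:.2f}")
```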
[Scatter plot: SWLS Time 1 (Beginning of Semester) vs. SWLS Time 2 (End of Semester)]
Test-retest reliability of SWLS
• Good test-retest reliability
– Participants have similar scores at Time 1 (beginning of semester) and at Time 2 (end of semester).
– Retest reliability is useful for constructs assumed to be stable.
– Current mood (e.g., how you feel right now) shows low retest correlations, but that does not mean the mood measure is unreliable.
Types of Reliability
• Internal Consistency
• Correlation of scores on two halves of the measure
• Longer measures tend to be more reliable
• Inter-rater
• Correlation of raters’ scores
• E.g., Scores on structured job interview
• Can also include time interval
– e.g., ratings of the worth of jobs across time & across judges
Types of Reliability
Internal Reliability
          Half 1                   Half 2
     I1   I2   I3   AvgH1    I4   I5   I6   AvgH2
P1   __   __   __    __      __   __   __    __
P2   __   __   __    __      __   __   __    __
P3   __   __   __    __      __   __   __    __
Correlate AvgH1 to AvgH2
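A sketch of the split-half computation, with a Spearman–Brown step-up to full-test length (a standard correction, since the half-test correlation underestimates the full measure's reliability); the item scores are invented:

```python
import numpy as np

# Hypothetical item scores (rows = participants, columns = 6 items)
items = np.array([
    [5, 6, 5, 6, 5, 6],
    [3, 2, 3, 3, 2, 3],
    [7, 6, 7, 6, 7, 6],
    [4, 4, 5, 4, 4, 5],
])

half1 = items[:, :3].mean(axis=1)   # average of items 1-3
half2 = items[:, 3:].mean(axis=1)   # average of items 4-6
r_half = np.corrcoef(half1, half2)[0, 1]

# Spearman-Brown correction: project the half-test correlation
# to the reliability of the full-length measure
full_rel = 2 * r_half / (1 + r_half)
print(f"split-half r = {r_half:.2f}, corrected = {full_rel:.2f}")
```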
Inter-rater Reliability
          Rater 1                  Rater 2
     I1   I2   I3   AvgR1    I1   I2   I3   AvgR2
P1   __   __   __    __      __   __   __    __
P2   __   __   __    __      __   __   __    __
P3   __   __   __    __      __   __   __    __
Correlate AvgR1 to AvgR2
[Scatter plot: SWLS Items 1 & 2 vs. SWLS Items 3, 4, & 5; r = .70, r² = 49%]
Internal consistency of SWLS
• Satisfactory internal consistency
• Participants respond similarly to items that are supposed to measure the same variable.
• Internal consistency should be .70 or higher.
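In practice, internal consistency is usually reported as Cronbach's alpha rather than a single split-half correlation; a minimal sketch with invented SWLS-style responses:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical SWLS responses (rows = participants, columns = 5 items)
data = [
    [6, 5, 6, 5, 6],
    [2, 3, 2, 3, 2],
    [7, 6, 7, 7, 6],
    [4, 4, 5, 4, 4],
]
alpha = cronbach_alpha(data)
print(f"alpha = {alpha:.2f}")  # compare against the .70 rule of thumb
```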
• Test-retest
• Parallel forms
• Internal
• Scorer (inter-rater)
Standard Error of Measurement
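The standard error of measurement follows directly from reliability: SEM = SD × √(1 − reliability), the expected spread of observed scores around a person's true score. A minimal sketch (the SD and reliability values are illustrative):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Illustrative numbers: test SD = 15, reliability = .91
print(sem(15, 0.91))  # likely band of measurement error around a score
```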
Validity
Evidence that a measure assesses the construct
Reasons for Invalid Measures
• Different understanding of items
• Different use of the scale (Response Styles)
• Intentionally presenting false information (socially desirable responding, other-deception)
• Unintentionally presenting false information (self-deception)
Types of Validity
• Content Validity
• Extent to which items on the measure are a good representation of the construct
• e.g., Is your job interview based on what is required for the job?
• Content validity ratio based on judges’
assessments of a measure’s content
• e.g., Expert (supervisors, incumbents) rating of job
relevance of interview questions
Types of Validity
• Criterion-related Validity
• Extent to which a new measure relates to another
known measure
• Validity coefficient = size of the relation (i.e., the correlation) between the new measure (predictor) and the known measure (criterion)
• e.g., do scores on your job interview predict
performance evaluation scores?
Types of Criterion Validity
• Concurrent
• Scores on predictor and criterion are collected
simultaneously (e.g., police officer study)
• Distinguishes between participants in sample who
are already known to be different from each other
• Weaknesses
• Range restriction
– Sample excludes those who were not hired, as well as those later fired or promoted
• Differences in test-taking motivation (employees vs.
applicants)
• Experience with job can affect scores on criterion
Types of Criterion Validity
• Predictive
• Scores on predictor (e.g., selection test) collected
some time before scores on criterion (e.g., job
performance)
• Able to differentiate individuals on a criterion
assessed in the future
• Weaknesses
• Due to management pressures, applicants can be chosen
based on scores on predictor (can have range restriction,
but this can be corrected)
• Often, special measures of job performance are
developed for validation study
Correction for range restriction
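One standard correction (Thorndike's Case II, for direct restriction on the predictor) scales the restricted correlation back up using the ratio of unrestricted to restricted standard deviations; a minimal sketch with invented numbers:

```python
import math

def correct_range_restriction(r, sd_restricted, sd_unrestricted):
    """Thorndike's Case II correction for direct range restriction."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1 - r**2 + (r**2) * (u**2))

# Illustrative: r = .30 among hires; applicant SD is twice the hire SD
corrected = correct_range_restriction(0.30, 5.0, 10.0)
print(round(corrected, 2))
```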
• Construct Validity
• Extent to which hypotheses about construct are
supported by data
1. Define construct, generate hypotheses about
construct’s relation to other constructs
2. Develop comprehensive measure of construct & assess
its reliability
3. Examine relationship of measure of construct to other,
similar and dissimilar constructs
• Validation Study
– Incremental validity
• Additional variance explained (LSOM vs LSI)
DV                           LSOM   LSI
Subjective assessment         .15    .01
Interactional instruction     .21    .04
Informational instruction     .06    .00
In-class Exercise
• Brainstorm constructs to develop measures
• E.g. Dimensions of CIR professor effectiveness, CIR
student effectiveness
• Choose two constructs that can be measured
similarly and be defined clearly
• Example measures
– Self-report (rating scales)
– Peer/informant reports
– Observation
– Archival measures
– Trace measures, etc.
In-class Exercise
• Form two-person groups to
• Generate items for each of the two measures for each of the two constructs
• Appointed person collects all items for both
measures for both constructs
• Compiles & distributes measures to class
• Class gathers data on both measures & both
constructs
• Class enters data into SPSS format
• Compute reliabilities, means, & correlations
Fill in the correlations
         C1-M1   C1-M2   C2-M1   C2-M2
C1-M1    1.00
C1-M2     __     1.00
C2-M1     __      __     1.00
C2-M2     __      __      __     1.00
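Once the class data are collected, the grid can be filled in by correlating every construct–measure column with every other; a minimal numpy sketch with invented scores (the labels stand in for the class's actual constructs and measures):

```python
import numpy as np

# Hypothetical scores: columns are construct-by-measure combinations,
# rows are participants
scores = np.array([
    [5.0, 3.2, 4.8, 3.0],
    [2.1, 4.5, 2.4, 4.2],
    [6.3, 2.8, 6.0, 3.1],
    [4.4, 5.1, 4.1, 5.3],
])
labels = ["C1-M1", "C1-M2", "C2-M1", "C2-M2"]

# Full correlation matrix across the four columns
corr = np.corrcoef(scores, rowvar=False)
for row_label, row in zip(labels, corr):
    print(row_label, np.round(row, 2))
```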