Measurement: Joseph Stevens, Ph.D. © 2005
Cognitive Levels (Bloom's Taxonomy)
Knowledge
Comprehension
Application
Analysis/Synthesis
Evaluation
Assessment Tasks
Selected Response – MC, T-F, matching
Restricted Response – cloze, fill-in, completion
Constructed Response – essay
Free Response/Performance Assessments
Products
Performances
Rating
Ranking
Magnitude Estimation
CRT versus NRT
Criterion Referenced Tests (CRT)
Comparison to a criterion/standard
Items that represent the domain
Relevance
Representativeness
Norm Referenced Tests (NRT)
Comparison to a group
Items that discriminate one person from another
Kinds of Scores
Raw
Standard scores
Developmental Standard Scores
Percentile Ranks (PR)
Normal Curve Equivalent (NCE)
Grade Equivalent (GE)
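To make these score types concrete, here is a minimal Python sketch (illustrative data, not from the slides) converting raw scores to z-scores, percentile ranks read off the normal curve, and NCEs (a normalized scale with mean 50 and SD 21.06). Grade equivalents require publisher norming tables, so they are omitted.

    import numpy as np
    from scipy.stats import norm

    raw = np.array([12, 15, 18, 20, 23, 25, 28, 30, 33, 36])  # hypothetical raw scores

    # Standard (z) scores: distance from the group mean in SD units
    z = (raw - raw.mean()) / raw.std(ddof=1)

    # Percentile Rank (PR): percent scoring at or below, read off the normal curve
    pr = norm.cdf(z) * 100

    # Normal Curve Equivalent (NCE): normalized scale with mean 50, SD 21.06
    nce = 50 + 21.06 * z

    for r, zz, p, n in zip(raw, z, pr, nce):
        print(f"raw={r:3d}  z={zz:+.2f}  PR={p:5.1f}  NCE={n:5.1f}")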
Scoring Methods
Objective
Subjective
Holistic
Analytic
[Figure: bar chart of percent of examinees in each performance level, Did Not Meet versus Met Standard]
Aggregating Scores
Total scores
Summated scores
Composite scores
Issues
Intercorrelation of components
Variance
Reliability
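A minimal sketch (simulated components, not from the slides) of a summated score and its internal-consistency reliability via Cronbach's alpha, illustrating how the intercorrelation and variance of the components drive the reliability of the composite:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 4                       # 200 examinees, 4 component scores
    common = rng.normal(size=(n, 1))    # shared factor creates intercorrelation
    parts = 0.7 * common + 0.7 * rng.normal(size=(n, k))

    total = parts.sum(axis=1)           # summated (total) score

    # Cronbach's alpha: k/(k-1) * (1 - sum of component variances / total variance)
    item_var = parts.var(axis=0, ddof=1).sum()
    alpha = k / (k - 1) * (1 - item_var / total.var(ddof=1))
    r_mean = np.corrcoef(parts.T)[np.triu_indices(k, 1)].mean()
    print(f"mean intercorrelation = {r_mean:.2f}, alpha = {alpha:.2f}")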
Theories of Measurement
Classical Test Theory (CTT)
X = T + E (observed score = true score + random error)
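A minimal simulation (illustrative numbers) of the CTT decomposition: each observed score X is a true score T plus random error E, so reliability can be viewed as the proportion of observed-score variance that is true-score variance:

    import numpy as np

    rng = np.random.default_rng(1)
    T = rng.normal(50, 10, size=10_000)   # true scores
    E = rng.normal(0, 5, size=10_000)     # random error, uncorrelated with T
    X = T + E                             # observed scores: X = T + E

    # Reliability = var(T) / var(X); here about 10^2 / (10^2 + 5^2) = 0.80
    print(f"reliability ~ {T.var() / X.var():.2f}")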
Item Response Theory (IRT)
P(θ) = e^(a(θ - b)) / (1 + e^(a(θ - b)))
[Figure: item characteristic curve for an item with a = 0.725, b = -1.367; probability of a correct response (0 to 1) plotted against ability (-3 to +3), with difficulty b marking the point where P = .5]
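A minimal sketch reproducing the plotted curve; the two-parameter logistic form is inferred from the a (discrimination) and b (difficulty) values shown on the slide:

    import numpy as np

    a, b = 0.725, -1.367                  # discrimination and difficulty from the slide

    def icc(theta, a, b):
        """2PL probability of a correct response at ability theta."""
        return 1 / (1 + np.exp(-a * (theta - b)))

    for theta in np.linspace(-3, 3, 7):
        print(f"theta={theta:+.1f}  P={icc(theta, a, b):.3f}")

    # At theta = b the probability is exactly .5, the curve's inflection point
    print(f"P(b) = {icc(b, a, b):.2f}")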
Example: agreement between two classifications of performance level (counts)

              Below   Meets   Exceeds   Total
    Below       9       3        1       13
    Meets       4       8        2       14
    Exceeds     2       1        6        9
    Total      15      12        9       36
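Reading the table as a decision-consistency cross-classification (that interpretation is assumed), a short sketch computes percent agreement and Cohen's kappa from the cell counts:

    import numpy as np

    # Rows/columns: Below, Meets, Exceeds on two classifications (counts above)
    tab = np.array([[9, 3, 1],
                    [4, 8, 2],
                    [2, 1, 6]])
    n = tab.sum()                                    # 36 examinees

    p_obs = np.trace(tab) / n                        # observed agreement: 23/36 ~ .64
    p_exp = (tab.sum(1) * tab.sum(0)).sum() / n**2   # chance agreement from the margins
    kappa = (p_obs - p_exp) / (1 - p_exp)
    print(f"agreement = {p_obs:.2f}, kappa = {kappa:.2f}")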
Estimating Reliability
Spearman-Brown prophecy formula
More is better: a longer test of parallel items is more reliable (see the sketch after this list)
Reliability as error
Systematic error
Random error
Standard Error of Measurement (SEM)
SEM = SD_x √(1 - r_xx)
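A minimal sketch (hypothetical numbers) of the two formulas on this slide: the Spearman-Brown prophecy for a test lengthened k-fold, and the SEM used to put an error band around an observed score:

    import math

    # Spearman-Brown: reliability of a test lengthened by factor k
    def spearman_brown(r, k):
        return k * r / (1 + (k - 1) * r)

    r_xx = 0.70
    print(f"doubled test: r = {spearman_brown(r_xx, 2):.2f}")   # 'more is better'

    # Standard Error of Measurement: SEM = SD_x * sqrt(1 - r_xx)
    sd_x = 15
    sem = sd_x * math.sqrt(1 - r_xx)
    x = 100
    print(f"SEM = {sem:.1f}; 68% band for X = {x}: {x - sem:.1f} to {x + sem:.1f}")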
Factors affecting reliability
Time limits
Test length
Item characteristics
Difficulty
Discrimination
Heterogeneity of sample
Number of raters, quality of subjective scoring
Validity
Accuracy
Unified View (Messick)
Use and Interpretation
Evidential basis
Content
Criterion
Convergent-Discriminant
Construct
Consequential basis
Validity (cont.)
Internal, structural
Multitrait-Multimethod (Campbell & Fiske)
Predictive
Test Development
Construct Representation
Content analysis
Review of research
Direct observation
Expert judgment (panels, ratings, Delphi)
Instructional objectives
Test Development (cont.)
Blueprint
Content X Process (see the blueprint sketch after this list)
Domain sampling
Item frames
Matching item type and response format to purpose
Item writing
Item Review (grammar, readability, cueing, sensitivity)
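As an illustration (invented content areas and counts), a Content X Process blueprint is a two-way table of planned item counts that domain sampling must fill:

    # Content x Process blueprint: planned item counts per cell (hypothetical)
    blueprint = {
        "Number sense": {"Knowledge": 4, "Application": 3, "Analysis": 1},
        "Geometry":     {"Knowledge": 3, "Application": 3, "Analysis": 2},
        "Data":         {"Knowledge": 2, "Application": 2, "Analysis": 2},
    }

    total = sum(n for row in blueprint.values() for n in row.values())
    print(f"test length = {total} items")   # 22 items sampled across the domain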
Test Development (cont.)
Writing instructions
Form design (NAEP brown ink)
Field and pilot testing
Item analysis
Review and revision
Equating
Need to link across forms, people, or occasions
Horizontal equating
Vertical equating
Designs
Common item
Common persons
Equating (cont.)
Equipercentile
Linear
IRT
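A minimal sketch (simulated scores) of linear equating under a common-persons design: it matches means and SDs so a form-X score can be reported on form Y's scale; equipercentile equating instead matches the full score distributions:

    import numpy as np

    rng = np.random.default_rng(2)
    # Scores of a common group on two forms (common-persons design)
    form_x = rng.normal(48, 9, size=500)    # form X is slightly harder
    form_y = rng.normal(52, 10, size=500)

    def linear_equate(x, mx, sx, my, sy):
        """Map a form-X score to the form-Y scale by matching means and SDs."""
        return my + sy * (x - mx) / sx

    y_equiv = linear_equate(45, form_x.mean(), form_x.std(ddof=1),
                            form_y.mean(), form_y.std(ddof=1))
    print(f"form-X score 45 ~ form-Y score {y_equiv:.1f}")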
Bias and Sensitivity
Sensitivity in item and test development
Differential results versus bias
Differential Item Functioning (DIF) – see the sketch after this list
Importance of matching, legal versus psychometric
Understanding diversity and individual differences
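A minimal sketch (simulated data) of the matching idea behind a Mantel-Haenszel DIF analysis: compare the groups' odds of success on an item within strata of the matching variable (total score), not in the overall sample:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 2000
    group = rng.integers(0, 2, n)               # 0 = reference, 1 = focal
    ability = rng.normal(0, 1, n)               # groups equal in ability here
    total = np.clip(np.round(ability * 3 + 10), 0, 20)   # matching variable
    item = (rng.random(n) < 1 / (1 + np.exp(-(ability - 0.2)))).astype(int)

    # Mantel-Haenszel common odds ratio accumulated across score strata
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        a = ((group == 0) & (item == 1) & m).sum()   # reference correct
        b = ((group == 0) & (item == 0) & m).sum()
        c = ((group == 1) & (item == 1) & m).sum()   # focal correct
        d = ((group == 1) & (item == 0) & m).sum()
        t = m.sum()
        num += a * d / t
        den += b * c / t
    print(f"MH odds ratio = {num / den:.2f}  (about 1.0 means no DIF)")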
Item Analysis
Difficulty, p
Means and standard deviations
Discrimination, point-biserial r
Omits
Removing or revising “bad” items
Example (see the sketch below)
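A minimal sketch (simulated responses) of the statistics above: difficulty p is the proportion answering correctly, and discrimination is the point-biserial correlation between the item and the corrected total score:

    import numpy as np

    rng = np.random.default_rng(4)
    ability = rng.normal(size=300)
    # 0/1 responses of 300 examinees to 5 items of varying difficulty
    diff = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
    p_correct = 1 / (1 + np.exp(-(ability[:, None] - diff)))
    items = (rng.random((300, 5)) < p_correct).astype(int)

    for j in range(items.shape[1]):
        p = items[:, j].mean()                         # difficulty (proportion correct)
        rest = items.sum(axis=1) - items[:, j]         # corrected total score
        r_pb = np.corrcoef(items[:, j], rest)[0, 1]    # point-biserial discrimination
        flag = "  <- review" if p < .2 or p > .9 or r_pb < .2 else ""
        print(f"item {j + 1}: p = {p:.2f}, r_pb = {r_pb:.2f}{flag}")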
Factor Analysis
Method of evaluating structural validity and reliability
Exploratory (EFA) example
Confirmatory (CFA) example
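The slides' EFA and CFA examples are not reproduced here; as a stand-in, a minimal EFA sketch (simulated data) using scikit-learn checks whether a two-factor structure is recovered from six observed variables:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(5)
    n = 500
    f1, f2 = rng.normal(size=(2, n))                 # two latent factors
    # Six observed variables: three load on each factor, plus noise
    X = np.column_stack([0.8 * f1 + 0.4 * rng.normal(size=n) for _ in range(3)] +
                        [0.8 * f2 + 0.4 * rng.normal(size=n) for _ in range(3)])

    fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
    print(np.round(fa.components_.T, 2))             # loadings: 6 variables x 2 factors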