Reliable measures are dependable, consistent, and relatively free from unsystematic errors of
measurement.
Measurement = the assignment of numerals to objects or events according to rules → answers
"How much?". The definition says nothing about the quality of the measurement procedure.
Interval
- Equality
- Ranking
- Equal-sized units (additivity): (d – a) = (c – a) + (d – c)
Permissible transformation: X’ = a + bX → X’ = transformed score, a & b = constants,
X = original score.

Ratio
- Equality
- Ranking
- Equal-sized units
- True (absolute) zero
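The permissible transformation for interval scales can be illustrated with a short sketch (the Celsius-to-Fahrenheit conversion is simply a convenient instance of X’ = a + bX, not an example from the text):

```python
# A linear transformation X' = a + bX (b > 0) preserves order and the
# relative size of intervals -- the defining properties of an interval scale.
def transform(scores, a, b):
    return [a + b * x for x in scores]

celsius = [0, 10, 20, 40]
fahrenheit = transform(celsius, a=32, b=9/5)  # [32.0, 50.0, 68.0, 104.0]

# Interval ratios are preserved under the transformation:
ratio_c = (celsius[3] - celsius[0]) / (celsius[2] - celsius[1])
ratio_f = (fahrenheit[3] - fahrenheit[0]) / (fahrenheit[2] - fahrenheit[1])
assert ratio_c == ratio_f == 4.0
```

Because there is no true zero, ratios of scores themselves (e.g., "40° is twice 20°") are not meaningful on an interval scale; only ratios of intervals are.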
Psychological measures are mostly nominal- or ordinal-level scales. Intelligence, aptitude (talent)
and personality scales are ordinal-level measures (not amounts, rather ranks). Yet, we can often
assume an equal-interval scale.
Physical measurements are evaluated in terms of the degree to which they satisfy the
requirements of order, equality, and addition.
HR specialists are confronted with the tasks of selecting and using psychological measurement
procedures, interpreting results, and communicating the results to others.
Test = any psychological measurement instrument, technique or procedure. Testing is systematic
in 3 areas: content, administration, and scoring.
CONTENT
* Task: - Verbal
- Non-verbal
- Performance
ADMINISTRATION
* Efficiency: - Individual
- Group
SCORING
* Objective
* Nonobjective
Cost
Direct costs → price of software or test booklets, answer sheets, etc.
Indirect costs → time to prepare the test materials, interviewer time, etc.
Interpretation
Requires thorough awareness of the strengths and limitations of the measurement procedure,
the background of the examinee, the situation, and the consequences for the examinee.
Face validity
Whether the measurement procedure looks like it is measuring the trait in question.
Reliability and validity information should be gathered not only for newly created measures but
also for any measure before it is put to use. Reliability is important because decisions often rest
on a single measurement; to make that one shot count, the score must present the ‘truest’ picture
of one’s abilities or personal characteristics. Reliability = freedom from unsystematic errors of
measurement. Errors reduce reliability, and therefore the generalizability of a person’s score
from a single measurement.
The correlation/reliability coefficient is a particularly appropriate measure of such agreement.
2 purposes:
1). To estimate the precision of a particular procedure as a measuring instrument;
2). To estimate the consistency of performance on the procedure by the examinees.
NB: purpose 2 includes purpose 1 → it is possible to have unreliable performance on a reliable
test, but reliable performance on an unreliable test is impossible.
The reliability coefficient may be interpreted directly as the percentage of total variance
attributable to different sources (coefficient of determination, r²).
X = T + e → X = observed (raw) score, T = true score (measurement error-free), e = error.
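The decomposition X = T + e can be made concrete with a small simulation (illustrative numbers, not from the text): when true scores and errors are independent, observed variance is the sum of true-score and error variance, and reliability is the proportion of observed variance due to true scores.

```python
import random

random.seed(42)
n = 10_000
true_scores = [random.gauss(100, 15) for _ in range(n)]   # T
errors = [random.gauss(0, 5) for _ in range(n)]           # e (unsystematic)
observed = [t + e for t, e in zip(true_scores, errors)]   # X = T + e

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Reliability = true-score variance / observed-score variance;
# the population value here is 15**2 / (15**2 + 5**2) = 0.90.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))
```

With a larger error standard deviation the same simulation yields a lower reliability, i.e. less of a person's observed score generalizes beyond the single measurement.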
Test-retest
Coefficient of stability. Errors: administration (light, loud noises) or personal (mood).
TEST/FORM A--------- RETEST/FORM A (TIME > 0)
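A coefficient of stability is simply the Pearson correlation between scores from the two administrations. A minimal sketch with hypothetical scores:

```python
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

test_scores   = [12, 15, 9, 20, 18, 11]   # Form A, time 1 (hypothetical)
retest_scores = [13, 14, 10, 19, 17, 12]  # Form A, time > 0
stability = pearson_r(test_scores, retest_scores)
print(round(stability, 2))  # high: examinees keep their rank order
```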
Interrater Reliability
Can be estimated using 3 methods:
1. Interrater agreement → % of rater agreement and Cohen’s kappa
2. Interclass correlation → when 2 raters are rating multiple objects/individuals
3. Intraclass correlation → how much of the differences among raters is due to differences
in individuals and how much is due to errors of measurement.
It is not a ‘real’ reliability coefficient, because it provides no information about the
measurement procedure itself.
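The first estimation method (% agreement and Cohen's kappa) can be sketched as follows, with hypothetical ratings; kappa corrects the raw agreement for the agreement expected by chance from the raters' marginal category frequencies:

```python
from collections import Counter

# Hypothetical: two raters classify 8 candidates into categories A/B
rater1 = ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B']
rater2 = ['A', 'B', 'B', 'B', 'A', 'A', 'A', 'B']

n = len(rater1)
observed = sum(a == b for a, b in zip(rater1, rater2)) / n   # 0.75 agreement

# Chance agreement from the marginal frequencies of each category
c1, c2 = Counter(rater1), Counter(rater2)
chance = sum(c1[k] * c2[k] for k in set(rater1) | set(rater2)) / n ** 2  # 0.5

kappa = (observed - chance) / (1 - chance)
print(kappa)  # 0.5 -- agreement beyond what chance alone would produce
```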
Scale coarseness: regardless of whether the scale includes one or multiple items,
information is lost due to scale coarseness; two individuals with true scores of 4.4
and 3.6 will appear to have an identical score of 4.0. Likert-type and ordinal
items are coarse. In contrast to the effects of measurement error, the error caused by scale
coarseness is systematic and the same for each item. Effect → the relationship between
constructs appears weaker than it actually is. Solutions →
1. Use a continuous graphic-rating scale instead of Likert-type scales.
2. Use a statistical correction procedure after data are collected.
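The attenuation caused by coarseness can be demonstrated with a simulation (illustrative parameters only): correlate two continuous variables, then force both onto a 5-point Likert-type scale and watch the correlation shrink.

```python
import random

random.seed(1)

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

n = 5000
x = [random.gauss(3, 1) for _ in range(n)]
y = [0.7 * xi + random.gauss(0.9, 0.7) for xi in x]   # population r ~ .71

# Coarsen: round each score to the nearest point on a 1-5 scale
likert = lambda v: min(5, max(1, round(v)))
r_fine   = pearson_r(x, y)
r_coarse = pearson_r([likert(v) for v in x], [likert(v) for v in y])

assert r_coarse < r_fine  # coarseness makes the relationship look weaker
```

This corresponds to solution 1 in reverse: keeping the continuous scores (a graphic-rating scale) avoids the loss that the rounding step introduces.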
Generalizability theory = the reliability of a test score as the precision with which that
score (a sample) represents a more generalized universe value of the score.
An examinee’s universe score is defined as the expected value of his or her observed
scores over all admissible observations.
The use of generalizability theory involves conducting two types of research studies: a
generalizability (G) study and a decision (D) study. A test has not one generalizability
coefficient, but many. The application of generalizability theory revealed that subordinate
ratings were of significantly better quality when made for developmental rather than
administrative purposes, but the same was not true for peer ratings.
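A G-study for a fully crossed persons × raters design can be sketched with a two-way variance decomposition (hypothetical ratings; the coefficient shown is one of the many a test can have, here for averaging over the raters used):

```python
# Rows = persons (ratees), columns = raters; hypothetical data
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
n_p, n_r = len(ratings), len(ratings[0])
grand = sum(sum(row) for row in ratings) / (n_p * n_r)
p_means = [sum(row) / n_r for row in ratings]
r_means = [sum(row[j] for row in ratings) / n_p for j in range(n_r)]

ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
ss_tot = sum((v - grand) ** 2 for row in ratings for v in row)
ss_res = ss_tot - ss_p - ss_r            # person x rater interaction + error

ms_p = ss_p / (n_p - 1)
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

var_p = max(0.0, (ms_p - ms_res) / n_r)  # universe-score (person) variance
var_res = ms_res                         # residual (error) variance

# D-study: generalizability coefficient for the mean over n_r raters
g_coef = var_p / (var_p + var_res / n_r)
print(round(g_coef, 2))
```

Rerunning the D-study step with a different number of raters shows how the coefficient changes with the measurement design, which is why a test has many generalizability coefficients rather than one.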