RELIABILITY AND VALIDITY
● KSAOs — The knowledge, skills, abilities, and other attributes necessary for a new incumbent to perform or do well on the job; also referred to as job, employment, or worker specifications.
● Science-based selection provides a systematic and analytical process for identifying and measuring job-relevant KSAOs, facilitating the matching of candidate attributes to job requirements.
● The psychometric concepts of reliability and validity are fundamental to evaluating
the value and legal defensibility of a test.
● By using reliable and valid selection methods, HR managers can be confident in their value (i.e., that they do what they are intended to do, such as provide value to the hiring organization), as well as save themselves and their company much aggravation and grief in having to defend a psychometrically unsound selection tool or process.
● Science-based selection improves the quality of hires, thereby contributing to performance
efficiencies, effectiveness, and the well-being of both the organization and its employees.
● Employment tests taken by the Toronto Police candidates — a test of cognitive ability (PATI), a written communication test (WCT), a physical readiness evaluation for police (PREP), the Behavioural Personnel Assessment Device for police (BPAD), and vision and hearing tests.
● Often, especially with small- to medium-sized businesses without well-resourced HR departments, applicants submit resumes, and, after a preliminary screening, a few are interviewed; in many cases the hiring process is informal.
● When a position becomes vacant, or is newly created, the employer may have a general
idea of the duties to be performed as part of the job, which are then included in an
advertisement used to recruit candidates.
● This advertisement may also state broad educational or experiential requirements
expected of candidates.
● The important difference is whether the job duties and position requirements have been
determined through systematic investigation, that is, a job analysis.
● Based on a review of the applicant's file, work references, and impressions formed in the
interview, the employer makes a hiring decision.
● This decision may reflect the employer's experience, a gut feeling or intuition about a
certain candidate, or personal preference.
● The employer has an idea of the type of person who will perform well in the target job and looks for an applicant matching this ideal, in the absence of objective evidence supporting the job relatedness of the candidate attributes informing the decision.
● All too often, unfortunately, an employer's decision is founded more on biases (often unconscious) than on applicants' standing on job-relevant attributes.
● With such informal selection processes, the performance of those hired is seldom tracked and benchmarked against ratings or scores on pre-hire assessments.
● Without matching performance on the job with pre-hire assessments, there is no way of
knowing whether those pre-hire assessments are helpful in making quality hires.
A SELECTION MODEL:
● Variables — simply refer to something that varies on the construct of interest.
● When we measure something, we assign a numerical value, and that value may vary among people or across time and situations. For example, we assign a value to represent an "IQ" score to capture variability in intelligence among people, or within people, over time.
● Variables allow us to make statements about constructs; for example, "Differences in cognitive ability predict success on the job."
● We infer whether this statement is correct by examining the association between
measures of these two variables.
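To make this concrete, here is a minimal sketch (in Python, with hypothetical scores) of examining the association between measures of two variables as a Pearson correlation:

```python
import numpy as np

# Hypothetical scores for eight applicants (illustrative only)
cognitive_ability = np.array([98, 112, 105, 120, 91, 130, 108, 115])   # predictor measure
job_performance = np.array([3.1, 3.8, 3.4, 4.2, 2.9, 4.6, 3.5, 4.0])   # criterion measure

# The Pearson correlation quantifies the association between the two measures
r = np.corrcoef(cognitive_ability, job_performance)[0, 1]
print(f"Association (r) between the two measures = {r:.2f}")
```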
RELIABILITY:
● R eliability—The degree to which observed scoresare free from random measurement
errors.— Reliability is an indication of the stabilityor dependability of a set of
measurements over repeated applications of the measure on the same people – i.e.,
reflecting consistency in observed scores obtained from individuals over several
administrations of the same assessment.
● Most measures we take of job candidates to help inform selection decisions contain some
degree of error in measurement.
● This is especially so for measures of psychological constructs, such as personality or
mental ability.
● In the pure sciences we benefit from more precise measurements.
● In the social sciences precision in measurement is more challenging to achieve and we
must provide evidence for the psychometric integrity of the measures we use, especially
when they are used to inform "high-stake" decisions (as in hiring).
● In measuring most constructs in the social and behavioural sciences we must be content
with using assessments that lack 100 percent precision.
● We build confidence in a measure of what we believe to be a stable trait (e.g., general cognitive ability) if each time we administer it to the same person it yields close to the same score. It is unlikely to yield an identical score with each administration, but it should yield a very similar score.
● The most confident estimate of the person's actual (true) cognitive ability would be an average of the person's scores obtained over several assessments.
● Accordingly, the score obtained on each separate administration is an estimate of the
person's true cognitive ability.
● The closer the score on a single test is to the average score the individual obtains over several administrations of the same test, the more confident we can be in the one assessment.
● In the language of test experts (psychometricians), the score obtained on any one administration (i.e., the "observed score") is the sum of the person's "true" score on the attribute assessed and some amount of random "measurement error."
● It is more challenging to measure a psychological construct, such as extraversion, and in
social sciences we never obtain perfect measurement.
● A psychometrically validated measure of extraversion will approximate your true score on
this construct, such that you are very unlikely to obtain the exact same score each time
you complete the extraversion scale.
● There will be some random, unsystematic error ("noise") around the observed score (the score you obtain).
● Some of the variability in scores across repeated measures might be associated with your
changing mood, testing conditions (e.g., comfortable or not), and the degree of attention
you give the test.
● When using measures for pre-hire assessment we strive to minimize measurement error,
as scores inform hiring decisions and we want faith that those scores are reliable and
accurate.
● Another way to think of reliability is in terms of the variability of a set of scores.
● The classical measurement model, which has had a major impact on HR research,
assumes that any observed score is a combination of a true score and an error score.
● True score — The average score that an individual would obtain on an infinite number of administrations of the same test or parallel versions of the same test.
● Error score — The difference between an observed score and a true score.
● This model assumes that the characteristic being measured is stable and that the only reason an observed score changes from one measurement to another is due to random error.
● Error scores are independent of the characteristic being measured and are attributable to the measurement process, not to the individual.
● That is, the magnitude of error scores is unrelated to the magnitude of the characteristic measured.
● The model also assumes that true scores and error scores combine in a simple
additive manner to produce the observed score.
● If the test is not very accurate, that is, if it adds large random error components to true scores, then the variance of the measured (i.e., observed) scores should be much larger than the variance of the true scores.
● Reliability is captured as the ratio of true score variance to observed score variance.
● The reliability coefficient (rx) is also the degree to which observed scores, made on the same stable characteristic, correlate with one another.
● Reliability is reported as a correlation coefficient, ranging in value from 0.0 to +1.0. When a test's reliability coefficient is close to 0.0, all variability in observed test scores is due to measurement error, meaning that we can have no confidence that differences in test scores across test takers are due to individual differences on the attribute we intended to measure. Conversely, when a test's reliability coefficient is near +1.0, most variability in scores reflects true score variability (we can be much more confident that differences in observed scores across test takers reflect individual differences on the attribute we intended to measure).
● The reliability coefficient (rx) itself represents the proportion of variance in the observed scores that is attributable to true differences on the measured characteristic (see the formula summary following this list).
● Systematic error (bias) impacts the accuracy of our measure, but not its reliability. Reliability is lowered only when unsystematic (random) error is present.
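The classical measurement model described in the bullets above can be summarized with the standard classical test theory formulas (standard notation, added here as a summary):

```latex
X = T + E
% observed score = true score + random error

\sigma^2_X = \sigma^2_T + \sigma^2_E
% variances add because error is independent of the true score

r_X = \frac{\sigma^2_T}{\sigma^2_X} = 1 - \frac{\sigma^2_E}{\sigma^2_X}
% reliability = ratio of true score variance to observed score variance
```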
MEASUREMENT ERROR:
● Measurement error — The hypothetical difference between an observed score and a true score; it comprises both random error and systematic error. That is, it can be thought of as the hypothetical difference between an individual's observed score on any measurement and the individual's true score.
● Measurement error, whether systematic or random, reduces the usefulness of any set of measures or the results of any test.
● It reduces the confidence that we can place in the score that the measure assigns to an
individual, which is problematic when scores are used to make "high-stake" decisions,
such as whether to hire someone.
● Information on the degree of error present in any set of measurements must be considered when using the measurements to make decisions, including the likely major sources of error, the size of the error, and the degree to which the observed scores would recur in another setting.
● The standard error of measurement — a statistical index that summarizes information related to measurement error. This index is estimated from observed scores obtained over a group of individuals and reflects how an individual's score would vary, on average, over repeated observations under identical conditions.
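The SEM is conventionally estimated from the standard deviation of observed scores and the test's reliability (a standard psychometric formula, not stated in these notes):

```latex
SEM = s_X \sqrt{1 - r_X}
% s_X = standard deviation of observed scores, r_X = reliability coefficient
```

For example, a test with a standard deviation of 15 and a reliability of .90 has an SEM of 15 × √0.10 ≈ 4.7 points; an individual's observed score would vary by roughly that amount, on average, over repeated administrations.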
FACTORS AFFECTING RELIABILITY:
The factors that introduce error into any set of measurements can be organized into three broad categories:
(1) temporary individual characteristics — factors such as health, motivation, fatigue, and emotional state introduce temporary, unsystematic errors into the measurement process.
(2) lack of standardization — changing the conditions under which measurements are made introduces error.
(3) chance — factors unique to a specific procedure may introduce error into the set of measurements.
● One small problem exists with respect to true scores: we can never know the true score variance because true scores are abstract constructs. But we can estimate the true score.
● There are several ways to estimate a test's reliability — Each involves assessing the
consistency of an examinee's scores over time, across different content samples, or
across different scorers.
● The common assumption for each of these reliability techniques is that consistent variability across the measurements represents true score variability, while inconsistency across the measurements reflects random error.
● To measure reliability, we must estimate the degree of variability in a set of scores that is caused by measurement error.
● We can obtain this estimate by using two different, but parallel, measures of the
characteristic or attribute.
● Parallel tests are two different tests of the same attribute that are designed to provide equivalent scores regardless of which of the parallel tests the test taker completes.
● We are confident we have parallel tests if they yield approximately the same mean and standard deviation in scores.
● When people taking two parallel forms of a test obtain substantially different scores, this suggests the presence of measurement error.
● The correlation between scores obtained on one test with the scores obtained on a
parallel test provides a reliability coefficient.
● It is extremely difficult, if not impossible, to obtain two precisely parallel measures of the
same characteristic; therefore, several other strategies have been developed as
approximations of parallel measures.
● For example, an instructor giving different forms of a test to different class sections, where neither test is considered easier or harder than the other.
Test and Retest:
● The same test and measurement procedure are used to assess the same attribute for the same group of people on two different occasions; that is, each person takes the same test at two different times.
● For example, the HR manager invites the job applicants back for a second employment interview, where they are asked the same questions in the same order.
● The correlation of their first and second interview scores estimates the reliability of the
employment interview.
● High correlations suggest high levels of reliability.
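A minimal sketch of this computation (hypothetical interview scores; the same approach estimates reliability for parallel forms):

```python
import numpy as np

# Hypothetical interview scores for six applicants on two occasions
first_interview = np.array([72, 85, 64, 90, 78, 81])
second_interview = np.array([70, 88, 61, 93, 75, 84])

# Test-retest reliability = correlation between the two administrations
r_tt = np.corrcoef(first_interview, second_interview)[0, 1]
print(f"Test-retest reliability = {r_tt:.2f}")
```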
Internal Consistency:
● Where a test measures a single construct (e.g., extraversion), each item of the test is written to reflect an aspect of that construct.
● Accordingly, your response to one of the items of that test should correlate with your
answer to another item on that same test (of course, the inter-item correlations are
calculated across test takers).
● This is the logic underlying internal consistency reliability.
● Rather than select any one pair of items, however, the correlations are calculated between
all possible pairs of items and then averaged.
● This average estimates the internal consistency, the degree to which all the items on the
test measure the same thing.
● These estimates are sometimes called alpha coefficients, or Cronbach's alpha, after the formula used to produce the estimate; the same value can also be arrived at by calculating the mean correlation between all split halves of a test.
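A minimal sketch of the standard alpha formula (hypothetical item responses): alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores).

```python
import numpy as np

# Hypothetical responses: 5 test takers x 4 extraversion items (1-5 scale)
items = np.array([
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 4],
    [1, 2, 1, 2],
])

k = items.shape[1]                               # number of items
item_variances = items.var(axis=0, ddof=1)       # variance of each item across test takers
total_variance = items.sum(axis=1).var(ddof=1)   # variance of total test scores

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```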
Inter-Rater Reliability:
● Measurement in HR selection is often based on the subjective assessment, or rating, of one individual by another.
● The HR manager's assessment of job performance is a subjective measurement.
● How likely is it that two managers providing independent performance ratings for each of
several employees would assign the same ratings?
● The correlation between these ratings is often used to estimate the reliability of supervisor ratings of performance.
● Sometimes, this index of reliability is referred to as classification consistency or inter-rater agreement.
● For example, as part of team projects, professors may ask all the members of a team to rate independently the contributions of all other team members.
CHOOSING AN INDEX OF RELIABILITY:
● Measures of test-retest reliability, alternate forms reliability, and internal consistency are special cases of a more general type of index called a generalizability coefficient.
● These three measures, however, provide slightly different views of a measure's reliability; i.e., each is limited and does not convey all the relevant information that might be needed.
● The specific requirements of a situation may dictate which index is chosen.
● It also remains within the professional judgment of the HR specialist to choose an
appropriate index of reliability and to determine the level of reliability that is acceptable for
use of a specific measure.
VALIDITY:
● Validity — refers to the legitimacy or correctness of the inferences that are drawn from a set of measurements or other specified procedures — the degree to which accumulated evidence and theory support specific interpretations of test scores in the context of the test's proposed use.
● For example, your knowledge of recruitment and selection in Canada cannot be inferred from your score on a test of Canadian history, regardless of its reliability.
● It is essential to demonstrate that measures of people's suitability for a job lead to valid
inferences about the characteristic or construct the measure is intended to capture.
● However, it is often difficult to demonstrate the validity of inferences made from
psychological measurements because they deal with abstract constructs, such as
cognitive ability or intelligence.
● The measures may miss important aspects of a construct (construct underrepresentation/deficiency) or they may be influenced by aspects of testing (e.g., test anxiety) that are unrelated to the construct (construct contamination).
● In most cases, independent physical standards for the construct do not exist, making
validation difficult, though not impossible.
● Validation rests on evidence accumulated through different sources and a theoretical
foundation supporting interpretations of the measurements.
VALIDATION STRATEGIES:
● Construct validity — The degree to which a test or procedure assesses the underlying theoretical construct it is meant to measure; assessed through multiple sources of evidence showing that it measures what it purports/claims to measure and not other constructs. For example, an IQ test must measure intelligence and not personality.
● Criterion-related validity — The relationship between a predictor (test score) and an outcome measure, assessed by obtaining the correlation between the predictor and outcome scores.
Both construct and content validities are validation strategies that provide evidence based mostly on test content, while criterion-related validity provides evidence that a measure predicts what it is expected to predict.
Standards for Educational and Psychological Testing and the Principles for the Validation and Use of Personnel Selection Procedures — the latter is an important document that HR specialists rely on; it uses the traditional terms of content, construct, and criterion-related validities in discussing validation strategies.
Face Validity:
● Face validity — is the degree to which the test takers (and not subject-matter experts) view the content of a test or test items as relevant to the context in which the test is being administered.
● Face validity is based on the perceptions or opinions of the test taker, and not those of experts, that the test or items are related to the aims of the test when it is used.
● For example, if you were asked questions concerning your thinking style on a test you were told measures cognitive ability, you would likely conclude that the test lacks face validity and likely is not job relevant.
● Face validity is not a "technical" form of validity like content, construct, or criterion-related validity; however, it does resemble content validity.
● When tests lack face validity, job candidates are not likely to take them seriously when completing them.
● While face validity is not a technical requirement for a test, a test having face
validity is likely to be more technically valid.
● However, a test that is face valid must also meet the technical standards for validity.
● Face validity is not a substitute for other forms of validity.
● In addition to finding a reliable and valid measure of the predictor, such as cognitive ability, HR personnel also need to find a reliable and valid measure of job performance.
● How do we define and measure the performance of a maker of widgets? This is usually more difficult than finding a measure of cognitive ability, as performance may be specific to the job or organization.
● Job performance is an abstract construct that may involve many behaviours, tasks,
and competencies.
● HR must identify those tasks or competencies that are the most important, the most
frequently performed, or the most critical to successful job performance.
● An HR specialist takes this information and develops a measure of job performance through one of several established procedures.
● Whatever measure is developed to assess job performance, it should represent important work behaviours, outcomes, or relevant organizational expectations about the employee's performance.
● In selecting job applicants, one goal is to hire only those applicants who will perform at
high levels.
● If cognitive ability is associated with job performance at the construct level, then at the
measurement level cognitive ability should predict job performance.
● That is, we must establish the association between the predictor and criterion measures empirically, referred to as criterion-related validity. There are two approaches to this, predictive validation and concurrent validation, both with challenges and limitations.
● However, those hired, and on whom the predictive validity coefficient is calculated, are not likely to represent the full applicant pool from which they were selected, and for which we use the test. We want to know the predictive validity of the test as used on the full applicant pool.
● Predictive validities established only on individuals hired underestimate the "true" association between applicant test scores and job performance.
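A small simulation (hypothetical data) illustrates this restriction-of-range problem: the validity coefficient computed only on those hired is noticeably smaller than the coefficient in the full applicant pool.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a hypothetical applicant pool in which test scores and
# performance correlate about .50 at the population level
n = 5000
test = rng.normal(size=n)
performance = 0.5 * test + np.sqrt(1 - 0.5**2) * rng.normal(size=n)

r_pool = np.corrcoef(test, performance)[0, 1]  # validity in the full pool

# Suppose only the top 30 percent of test scorers are hired
hired = test >= np.quantile(test, 0.70)
r_hired = np.corrcoef(test[hired], performance[hired])[0, 1]

print(f"Validity in full applicant pool: {r_pool:.2f}")   # close to .50
print(f"Validity among those hired:      {r_hired:.2f}")  # substantially smaller
```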
Despite the flaws, criterion-related validation strategies are the most frequently used strategies to validate selection assessments.
Starting in the mid-1970s, Schmidt and Hunter, in conjunction with several colleagues, challenged the idea that a validity coefficient was specific to the context or organization from which it is derived.
They used a procedure known as meta-analysis to combine validity coefficients for similar predictor (e.g., cognitive ability) and criterion (e.g., supervisory ratings of performance) measures reported in different validity studies.
They follow the idea that the best estimate of the association between two variables is the average of all associations between these two variables reported across independent studies.
It also follows from the idea that the larger the number of people on which a validity coefficient is calculated, the more reliable and robust is the estimated validity coefficient.
Accordingly, meta-analysis, when averaging validity coefficients across studies, gives greater weight to those coefficients reported from studies of large versus small samples.
With meta-analysis, we simply multiply the sample size for a study by the size of the
validity coefficient reported for that study, sum the product terms across studies, and
divide by the total (across study) sample size.
The result provides a more accurate estimate of the actual association (validity
coefficient) between the two variables than relying on the validity coefficient reported from
one study only.
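A minimal sketch of that sample-size-weighted average (hypothetical validity coefficients and sample sizes):

```python
# Hypothetical validity coefficients and sample sizes from five studies
validities = [0.45, 0.30, 0.55, 0.25, 0.40]
sample_sizes = [120, 45, 300, 60, 150]

# Sample-size-weighted mean: sum(N_i * r_i) / sum(N_i)
weighted_sum = sum(n * r for n, r in zip(sample_sizes, validities))
mean_validity = weighted_sum / sum(sample_sizes)

print(f"Meta-analytic estimate of validity = {mean_validity:.2f}")
```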
BIAS:
● Predictive bias is present when the average performance score of a subgroup is underpredicted relative to members of the majority group.
● Such test bias would lead to hiring many more white applicants with no accents relative to non-white applicants with accents, even though the latter might have performed successfully had they been hired.
● One way to overcome this type of bias is to generate separate regression lines (i.e., separate prediction formulas) for the two groups, which would result in different cut-off scores for selection (a sketch follows the example below).
● For example, in Canadian federal organizations, separate prediction formulas are often used to select job applicants from anglophone and francophone linguistic groups. In U.S. federal organizations, the use of different selection rules for different identifiable subgroups (often referred to as subgroup norming) is prohibited by U.S. federal law.
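A minimal sketch (hypothetical data) of fitting separate regression lines and deriving group-specific cut-off scores for a common performance standard:

```python
import numpy as np

# Hypothetical test scores and performance ratings for two subgroups
test_a = np.array([50, 60, 70, 80, 90])
perf_a = np.array([2.0, 2.6, 3.1, 3.8, 4.3])
test_b = np.array([50, 60, 70, 80, 90])
perf_b = np.array([2.4, 3.0, 3.6, 4.1, 4.7])

# Fit a separate least-squares line (slope, intercept) for each group
slope_a, intercept_a = np.polyfit(test_a, perf_a, deg=1)
slope_b, intercept_b = np.polyfit(test_b, perf_b, deg=1)

# A common required performance level then implies a different test
# cut-off score per group: cutoff = (required - intercept) / slope
required = 3.5
cutoff_a = (required - intercept_a) / slope_a
cutoff_b = (required - intercept_b) / slope_b
print(f"Group A cut-off score: {cutoff_a:.1f}")
print(f"Group B cut-off score: {cutoff_b:.1f}")
```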
● Measurement bias occurs in a set of measurements when items on a test may elicit a variety of responses other than what was intended, or when some items on a test may have different meanings for members of different subgroups.
● For example, the Bennett Mechanical Comprehension Test contains pictures related to
using different tools and machines that tended to be used mostly by males. Males are
more likely to recognize these tools and their proper use and perform well on the test. On
the other hand, females with good mechanical comprehension may not do as well on the
test because of their lack of familiarity with specific tools pictured on the Bennett test. The
result is that the test may underestimate the true mechanical ability of female job
applicants.
● The statistical procedures needed to assess for predictive and measurement bias are often complicated and difficult to carry out.
● Nonetheless, the question of bias can be answered through empirical and objective
procedures. HR professionals may have to demonstrate, before courts or tribunals, that
the employment test or procedures they use are free from bias.
● As a first line of defence, before using a selection device, they should establish that the
test does not discriminate on characteristics or traits that are not job related and that it
does not discriminate against members of groups protected by human rights legislation.
FAIRNESS:
● Fairness — The principle that every test taker should be assessed in an equitable manner.
● The concept of fairness in measurement refers to the value judgments people make about the decisions or outcomes that are based on measurements.
● An unbiased measure or test may still be viewed as unfair either by society as a whole or
by different groups within it.
● Fairness cannot be determined statistically or empirically; fairness involves perceptions.
● An organization may believe it is fair to select qualified females in place of higher-ranking
males in order to increase the number of women in the organization; on the other hand,
the higher-ranking males who were passed over might not agree.
● The Principles for the Validation and Use of Personnel Selection Procedures states this about fairness:
"Fairness is a social rather than a psychometric concept. Its definition depends on
what one considers to be fair. Fairness has no single meaning, and, therefore, no
single statistical or psychometric definition."
The Principles goes on to identify four meanings of fairness that are relevant in selection:
Fairness as equitable treatment in the testing process — All examinees should be treated equitably throughout the testing process. They should experience the same or comparable procedures in the testing itself, in how the tests are scored, and in how the test scores are used.
Fairness as lack of bias — A test or testing procedure is considered fair if it does not produce
any systematic effects that are related to different identifiable group membership characteristics
such as age, sex, or race.
Fairness as requiring equal outcomes (e.g., equal passing rates for subgroups of interest) in selection and prediction — The standards reject this definition. While group differences in outcomes should trigger greater scrutiny for sources of potential bias, outcome differences alone do not indicate bias (they could reflect "adverse impact"). Where assessments result in adverse impact against members of protected minority groups, but are not biased, employers are encouraged to consider alternative assessments that are equally predictive but have no adverse impact.
Fairness as requiring examinees to have comparable access to the constructs measured by a selection procedure — No one should be restricted, because of age, race, ethnicity, gender, socio-economic status, cultural background, disability, or language proficiency, in their access to the testing tools and procedures used to inform selection decisions. For example, an online assessment of personality and cognitive ability may be less accessible to persons of low socio-economic status and/or certain ethnic groups than to others of higher socio-economic status, due to differences in ownership of mobile devices, computers, and Internet access.
Fairness is an even more complex topic than bias.
Achieving fairness often requires compromise between conflicting interests.
For example, lowering the selection standards to include more applicants from a certain subgroup, to make the workforce more representative of the general population, may come at the cost of hiring job applicants who, while they meet the minimum job qualifications, are not the most qualified candidates for the position. Yet the most qualified candidates typically bring the most return in productivity to the organization.