
Chapter 2 — FOUNDATIONS OF RECRUITMENT AND SELECTION I: RELIABILITY AND VALIDITY

● KSAOs — The knowledge, skills, abilities, and other attributes necessary for a new incumbent to perform or do well on the job; also referred to as job, employment, or worker specifications.
● Science-based selection provides a systematic, analytical process for identifying and measuring job-relevant KSAOs, facilitating the matching of candidate attributes to job requirements.
‭●‬ ‭The psychometric concepts of reliability and validity are fundamental to evaluating‬
‭the value and legal defensibility of a test.‬
● By using reliable and valid selection methods, HR managers can be confident in their value (i.e., that they do what they are intended to do, such as provide value to the hiring organization), and can spare themselves and their company the aggravation of having to defend a psychometrically unsound selection tool or process.
‭●‬ ‭Science-based selection improves the quality of hires, thereby contributing to performance‬
‭efficiencies, effectiveness, and the well-being of both the organization and its employees.‬

‭THE RECRUITMENT AND SELECTION PROCESS:‬

Employment tests taken by the Toronto Police candidates — a test of cognitive ability (PATI), a written communication test (WCT), a physical readiness evaluation for police (PREP), the Behavioural Personnel Assessment Device for police (BPAD), and vision and hearing tests.

● Often, especially in small- to medium-sized businesses without well-resourced HR departments, the hiring process is informal: applicants submit resumes and, after a preliminary screening, a few are interviewed.
‭●‬ ‭When a position becomes vacant, or is newly created, the employer may have a general‬
‭idea of the duties to be performed as part of the job, which are then included in an‬
‭advertisement used to recruit candidates.‬
‭●‬ ‭This advertisement may also state broad educational or experiential requirements‬
‭expected of candidates.‬
● The important difference is whether the job duties and position requirements have been determined through systematic investigation — that is, a job analysis.
‭●‬ ‭Based on a review of the applicant's file, work references, and impressions formed in the‬
‭interview, the employer makes a hiring decision.‬
‭●‬ ‭This decision may reflect the employer's experience, a gut feeling or intuition about a‬
‭certain candidate, or personal preference.‬
● The employer has an idea of the type of person who will perform well in the target job and looks for an applicant matching this ideal, in the absence of objective evidence supporting the job relatedness of the candidate attributes informing the decision.
● All too often, unfortunately, an employer's decision is founded more in biases (often unconscious) than in applicants' standing on job-relevant attributes.
● With such informal selection processes, the performance of those hired is seldom tracked and benchmarked against ratings or scores on pre-hire assessments.
‭●‬ ‭Without matching performance on the job with pre-hire assessments, there is no way of‬
‭knowing whether those pre-hire assessments are helpful in making quality hires.‬

‭A SELECTION MODEL:‬

‭Human Resources Management: Science Versus Practice In Selection:‬


● Knowing the job dimensions, and the behaviours associated with each of them, allows us to infer the knowledge, skills, abilities, and other attributes (KSAOs) of a job candidate that are most job relevant.
‭●‬ ‭Once these KSAOs are known, then we can decide how best to assess them (e.g., tests,‬
‭interview, simulation).‬
● Later, we can determine how well these assessments of KSAOs predict job performance, as measured through appropriate metrics (e.g., supervisor ratings, tips, sales) — this later stage is referred to as establishing predictive validity.
● So, the HR professional must determine how each of the KSAOs is to be assessed. For cognitive ability, a test of general cognitive ability may be most appropriate.
‭●‬ ‭Information about past work history and experience may come from an application form,‬
‭the candidate's resume, or an interview, while information about the candidate's ability to‬
‭deal with stressful situations may be assessed through a combination of a personality‬
‭inventory and a situational interview.‬
‭●‬ ‭The predictors must be reliable and valid measures of the KSAO constructs (e.g.,‬
‭cognitive ability, personality, experience) identified as related to job performance.‬
‭●‬ ‭The aim of selection is to identify job candidates who possess attributes required for‬
‭effectiveness on the job.‬
● The pre-hire measures of the job-relevant KSAOs, as established through job analysis, are used to predict job performance (referred to as establishing predictive validity).
● If shown to be predictive, the employer can then be confident that using these measures to inform selection decisions provides value (i.e., in yielding more effective employees) and a strong basis for defending them if challenged in the courts, by human rights tribunals, or by grievance boards.
‭●‬ ‭Finally, selection systems must be reviewed periodically as jobs and organizations‬
‭change.‬

‭CONSTRUCTS AND VARIABLES:‬

● Constructs — Ideas or concepts constructed or invoked to explain relationships between observations; e.g., "learning" is a construct used to explain the change in behaviour that results from experience. Constructs are abstractions that we cannot directly observe, so we infer them from observations.
‭●‬ ‭To be useful in HR selection, constructs must be measurable – Intelligence (or cognitive‬
‭ability) is a construct that refers to verbal and numerical ability, perceptual and spatial‬
‭ability, and reasoning, among others.‬

● Variables — Simply refer to something that varies on the construct of interest.
● When we measure something, we assign a numerical value, and that value may vary among people or across time and situations. For example, we assign an "IQ" score to capture variability in intelligence among people, or within people over time.
● Variables allow us to make statements about constructs; for example, "Differences in cognitive ability predict success on the job."
‭●‬ ‭We infer whether this statement is correct by examining the association between‬
‭measures of these two variables.‬

This chapter presents the tools needed to determine whether a selection system or procedure rests on a solid foundation — that is, whether the measures (pre-hire assessments) used to inform hiring decisions are empirically established as job relevant, reliable, and valid, and whether they are fair and defensible.

‭RELIABILITY:‬

● Reliability — The degree to which observed scores are free from random measurement errors. Reliability is an indication of the stability or dependability of a set of measurements over repeated applications of the measure to the same people; i.e., it reflects consistency in observed scores obtained from individuals over several administrations of the same assessment.
‭●‬ ‭Most measures we take of job candidates to help inform selection decisions contain some‬
‭degree of error in measurement.‬
‭●‬ ‭This is especially so for measures of psychological constructs, such as personality or‬
‭mental ability.‬
‭●‬ ‭In the pure sciences we benefit from more precise measurements.‬
‭●‬ ‭In the social sciences precision in measurement is more challenging to achieve and we‬
‭must provide evidence for the psychometric integrity of the measures we use, especially‬
‭when they are used to inform "high-stake" decisions (as in hiring).‬
‭●‬ ‭In measuring most constructs in the social and behavioural sciences we must be content‬
‭with using assessments that lack 100 percent precision.‬
● We build confidence in a measure of what we believe to be a stable trait (e.g., general cognitive ability) if each time we administer it to the same person it yields close to the same score. It is unlikely to yield an identical score with each administration, but it should yield a very similar one.
● The most confident estimate of the person's actual (true) cognitive ability would be an average of the person's scores obtained over several assessments.
‭●‬ ‭Accordingly, the score obtained on each separate administration is an estimate of the‬
‭person's true cognitive ability.‬
● The closer the score on a single test is to the average score the individual obtains over several administrations of the same test, the more confident we can be in the one assessment.
● In the language of test experts (psychometricians), the score obtained on any one administration (i.e., the "observed score") is the sum of the person's "true" score on the attribute assessed and some amount of random "measurement error."

‭●‬ I‭t is more challenging to measure a psychological construct, such as extraversion, and in‬
‭social sciences we never obtain perfect measurement.‬
● A psychometrically validated measure of extraversion will approximate your true score on this construct, but you are very unlikely to obtain the exact same score each time you complete the extraversion scale.
‭●‬ ‭There will be some random,‬‭unsystematic error ("noise")‬‭around the observed score‬
‭(the score you obtain).‬
‭●‬ ‭Some of the variability in scores across repeated measures might be associated with your‬
‭changing mood, testing conditions (e.g., comfortable or not), and the degree of attention‬
‭you give the test.‬
‭●‬ ‭When using measures for pre-hire assessment we strive to minimize measurement error,‬
‭as scores inform hiring decisions and we want faith that those scores are reliable and‬
‭accurate.‬

Systematic error — biased; can be either precise or imprecise.
Random error — unbiased; can be either precise or imprecise.

‭INTERPRETING RELIABILITY COEFFICIENTS:‬

● Another way to think of reliability is in terms of the variability of a set of scores.
‭●‬ ‭The classical measurement model, which has had a major impact on HR research,‬
‭assumes that any observed score is a combination of a true score and an error score.‬
‭●‬ ‭True score —‬‭The average score that an individual‬‭would obtain on an infinite number of‬
‭administrations of the same test or parallel versions of the same test.‬
‭●‬ ‭Error score —‬‭The difference between an observed score‬‭and a true score.‬
‭●‬ ‭This model assumes that the‬‭characteristic being measured‬‭is stable‬‭and that the only‬
‭reason an observed score changes from one measurement to another is‬‭due to random‬
‭error‬‭.‬
‭●‬ ‭Error scores are‬‭independent of the characteristic‬‭being measured‬‭and are‬‭attributable to‬
‭the measurement process, not to the individual.‬
‭●‬ ‭That is, the magnitude of error scores is unrelated to the magnitude of the characteristic‬
‭measured.‬
‭●‬ ‭The model also assumes that true scores and error scores combine in a simple‬
‭additive manner to produce the observed score‬‭.‬
● If the test is not very accurate — that is, if it adds large random error components to true scores — then the variance of the measured (i.e., observed) scores will be much larger than the variance of the true scores.
● Reliability is captured as the ratio of true score variance to observed score variance: r_xx = σ²_true / σ²_observed = σ²_true / (σ²_true + σ²_error) (see the sketch below).
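A minimal simulation of the classical model (an illustrative sketch, assuming normally distributed true scores and random errors; all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000                      # simulated test takers

true = rng.normal(100, 15, n)    # true scores on a stable attribute
error = rng.normal(0, 5, n)      # random measurement error, independent of true scores
observed = true + error          # classical model: observed = true + error

# Reliability as the ratio of true score variance to observed score variance
reliability = true.var() / observed.var()
print(round(reliability, 3))     # ~0.90, i.e., 15**2 / (15**2 + 5**2)
```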

● The reliability coefficient (r_xx) is also the degree to which observed scores, taken on the same stable characteristic, correlate with one another.
● Reliability is reported as a correlation coefficient, ranging in value from 0.0 to +1.0. When a test's reliability coefficient is close to 0.0, all variability in observed test scores is due to measurement error, meaning we can have no confidence that differences in test scores across test takers are due to individual differences on the attribute we intended to measure. Conversely, when a test's reliability coefficient is near +1.0, most variability in scores reflects true score variability (we can be much more confident that differences in observed scores across test takers reflect individual differences on the attribute we intended to measure).
● The reliability coefficient r_xx itself represents the proportion of variance in the observed scores that is attributable to true differences on the measured characteristic; its square root estimates the correlation between observed scores and true scores.

● Systematic error (biased) impacts the accuracy of our measure, but not its reliability. Reliability is lowered only when unsystematic (random) error is present (see the sketch below).
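A small illustration of this point (hypothetical numbers): adding a constant bias to every score shifts the mean, hurting accuracy, but leaves the correlation between two administrations, and hence the reliability estimate, unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
true = rng.normal(100, 15, 10_000)

time1 = true + rng.normal(0, 5, true.size)       # random error only
time2 = true + rng.normal(0, 5, true.size) + 10  # random error plus a +10 systematic bias

print(round(time1.mean(), 1), round(time2.mean(), 1))  # means differ by ~10 (accuracy suffers)
print(round(np.corrcoef(time1, time2)[0, 1], 3))       # still ~0.90 (reliability unaffected)
```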

‭MEASUREMENT ERROR:‬

● Measurement error — The hypothetical difference between an observed score and a true score; it comprises both random error and systematic error. That is, it can be thought of as the hypothetical difference between an individual's observed score on any measurement and the individual's true score.

‭●‬ M ‭ easurement error, whether systematic or random, reduces the usefulness of any set of‬
‭measures or the results of any test.‬
‭●‬ ‭It reduces the confidence that we can place in the score that the measure assigns to an‬
‭individual, which is problematic when scores are used to make "high-stake" decisions,‬
‭such as whether to hire someone.‬
● Information on the degree of error present in any set of measurements must be considered when using the measurements to make decisions — including the possible major sources of error, the size of the error, and the degree to which the observed scores would recur in another setting.

● The standard error of measurement — A statistical index that summarizes information related to measurement error. This index is estimated from observed scores obtained over a group of individuals and reflects how an individual's score would vary, on average, over repeated observations under identical conditions (see the sketch below).
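The notes above don't give the formula, but the standard classical-test-theory expression is SEM = SD_X * sqrt(1 - r_xx); a quick sketch with illustrative values:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement from classical test theory."""
    return sd * math.sqrt(1 - reliability)

# Illustrative values: an IQ-style test with SD = 15 and reliability = 0.90
print(round(sem(15, 0.90), 2))  # ~4.74, so an observed score of 100 brackets the
                                # true score roughly within 100 +/- 9.3 (95% band)
```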
‭FACTORS AFFECTING RELIABILITY:‬

The factors that introduce error into any set of measurements can be organized into three broad categories:
(1) Temporary individual characteristics — Factors such as health, motivation, fatigue, and emotional state introduce temporary, unsystematic errors into the measurement process.
(2) Lack of standardization — Changing the conditions under which measurements are made introduces error.
(3) Chance — Factors unique to a specific procedure may introduce error into the set of measurements.

‭METHODS OF ESTIMATING RELIABILITY:‬

● One small problem exists with respect to true scores: we can never know the true score variance because true scores are abstract constructs. But we can estimate the true score.
‭●‬ ‭There are several ways to estimate a test's reliability — Each involves assessing the‬
‭consistency of an examinee's scores over time, across different content samples, or‬
‭across different scorers.‬
‭●‬ ‭The common assumption for each of these reliability techniques is that‬‭consistent‬
‭variability across the measurements represents true score variability, while‬
‭inconsistency across the measurements reflects random error‬‭.‬

‭Parallel (Alternate) Forms:‬

● To measure reliability, we must estimate the degree of variability in a set of scores that is caused by measurement error.
‭●‬ ‭We can obtain this estimate by using two different, but parallel, measures of the‬
‭characteristic or attribute.‬
‭●‬ ‭Parallel tests‬‭are two different tests of the same‬‭attribute that are designed to provide‬
‭equivalent scores regardless of which of the parallel tests the test taker completes.‬
‭●‬ ‭We are confident we have parallel tests if they‬‭yield‬‭approximately the same mean and‬
‭standard deviation in scores.‬
‭●‬ ‭When people taking two parallel forms of a test‬‭obtain‬‭substantially different scores‬‭this‬
‭suggests‬‭the presence of measurement error.‬
‭●‬ ‭The correlation between scores obtained on one test with the scores obtained on a‬
‭parallel test provides a reliability coefficient.‬
‭●‬ ‭It is extremely difficult, if not impossible, to obtain two precisely parallel measures of the‬
‭same characteristic; therefore, several other strategies have been developed as‬
‭approximations of parallel measures.‬
● For example, an instructor may give different forms of a test to different class sections, where neither form is considered easier or harder than the other.
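A minimal sketch with hypothetical data: the parallel-forms reliability estimate is simply the Pearson correlation between scores on the two forms, after confirming that the forms yield comparable means and standard deviations.

```python
import numpy as np

# Hypothetical scores for ten test takers on two parallel forms
form_a = np.array([78, 85, 62, 90, 71, 88, 67, 95, 74, 81])
form_b = np.array([80, 83, 65, 92, 70, 85, 70, 93, 76, 79])

print(form_a.mean(), round(form_a.std(ddof=1), 2))  # forms should have similar means...
print(form_b.mean(), round(form_b.std(ddof=1), 2))  # ...and standard deviations

# Parallel-forms reliability = correlation between the two forms
print(round(np.corrcoef(form_a, form_b)[0, 1], 3))
```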
‭Test and Retest:‬

● The same test and measurement procedure are used to assess the same attribute for the same group of people on two different occasions — the person takes the same test at two different times.
● For example, the HR manager invites the job applicants back for a second employment interview, where they are asked the same questions in the same order.
‭●‬ ‭The correlation of their first and second interview scores estimates the reliability of the‬
‭employment interview.‬
‭●‬ ‭High correlations suggest high levels of reliability.‬

‭Internal Consistency:‬

● Internal consistency applies where a test measures a single construct (e.g., extraversion) and each item of the test is written to reflect an aspect of that construct.
‭●‬ ‭Accordingly, your response to one of the items of that test should correlate with your‬
‭answer to another item on that same test (of course, the inter-item correlations are‬
‭calculated across test takers).‬
‭●‬ ‭This is the logic underlying‬‭internal consistency‬‭reliability‬‭.‬
‭●‬ ‭Rather than select any one pair of items, however, the correlations are calculated between‬
‭all possible pairs of items and then averaged.‬
‭●‬ ‭This average estimates the internal consistency, the degree to which all the items on the‬
‭test measure the same thing.‬
● These estimates are sometimes called alpha coefficients, or Cronbach's alpha, after the formula used to produce the estimate; the same value can also be arrived at by calculating the mean correlation between all split halves of a test (see the sketch below).
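A minimal sketch of Cronbach's alpha from an items-by-respondents matrix (hypothetical data), using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array with rows = respondents and columns = test items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5-point responses from six people to four extraversion items
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
])
print(round(cronbach_alpha(scores), 3))
```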

‭Inter-Rater Reliability:‬

‭●‬ M ‭ easurement in HR selection is often based on the subjective assessment, or rating, of‬
‭one individual by another.‬
‭●‬ ‭The HR manager's assessment of job performance is a subjective measurement.‬
‭●‬ ‭How likely is it that two managers providing independent performance ratings for each of‬
‭several employees would assign the same ratings?‬
● The correlation between these ratings is often used to estimate the reliability of supervisor ratings of performance.
‭●‬ ‭Sometimes, this index of reliability is referred to as‬‭classification consistency or‬
‭inter-rater agreement.‬
● For example, as part of team projects, professors may ask all the members of a team to independently rate the contribution of every other team member.
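A minimal sketch (hypothetical ratings): with two raters independently scoring the same employees, inter-rater reliability can be estimated as the correlation between the two sets of ratings.

```python
import numpy as np

# Hypothetical performance ratings (1-7 scale) of eight employees by two managers
manager_1 = np.array([6, 4, 5, 7, 3, 5, 6, 2])
manager_2 = np.array([5, 4, 6, 7, 3, 4, 6, 3])

# Inter-rater reliability estimated as the correlation between the raters' scores
print(round(np.corrcoef(manager_1, manager_2)[0, 1], 3))
```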
‭CHOOSING AN INDEX OF RELIABILITY:‬

● Measures of test-retest reliability, alternate-forms reliability, and internal consistency are special cases of a more general type of index called a generalizability coefficient.
‭●‬ ‭These three measures, however, provide slightly different views of a measure's reliability –‬
‭i.e., Each is limited and does not convey all the relevant information that might be needed.‬
‭●‬ ‭The specific requirements of a situation may dictate which index is chosen.‬
‭●‬ ‭It also remains within the professional judgment of the HR specialist to choose an‬
‭appropriate index of reliability and to determine the level of reliability that is acceptable for‬
‭use of a specific measure.‬

‭VALIDITY:‬

● Validity — Refers to the legitimacy or correctness of the inferences that are drawn from a set of measurements or other specified procedures; the degree to which accumulated evidence and theory support specific interpretations of test scores in the context of the test's proposed use.
‭●‬ ‭For eg., your knowledge of recruitment and selection in Canada cannot be inferred from‬
‭your score on a test of Canadian history, regardless of its reliability.‬

‭●‬ I‭t is essential to demonstrate that measures of people's suitability for a job lead to valid‬
‭inferences about the characteristic or construct the measure is intended to capture.‬
‭●‬ ‭However, it is often difficult to demonstrate the validity of inferences made from‬
‭psychological measurements because they deal with abstract constructs, such as‬
‭cognitive ability or intelligence.‬
● The measures may miss important aspects of a construct (construct underrepresentation, or deficiency) or they may be influenced by aspects of testing (e.g., test anxiety) that are unrelated to the construct (construct contamination).
‭●‬ ‭In most cases, independent physical standards for the construct do not exist, making‬
‭validation difficult, though not impossible.‬
‭●‬ ‭Validation rests on evidence accumulated through different sources and a theoretical‬
‭foundation supporting interpretations of the measurements.‬

‭VALIDATION STRATEGIES:‬

Validity is a unitary concept.
Content, construct, and criterion-related validities are different, but interrelated, strategies commonly used to assess the accuracy of inferences based on measurements or tests used in the workplace.
● Content validity — Whether the items on a test capture the content or subject matter they are intended to measure; assessed through the judgments of experts in the subject area.

● Construct validity — The degree to which a test or procedure assesses the underlying theoretical construct it is meant to measure; assessed through multiple sources of evidence showing that it measures what it purports/claims to measure and not other constructs. For example, an IQ test must measure intelligence and not personality.

● Criterion-related validity — The relationship between a predictor (test score) and an outcome measure; assessed by obtaining the correlation between the predictor and outcome scores.

Both construct and content validities are validation strategies that provide evidence based mostly on test content, while criterion-related validity provides evidence that a measure predicts what it is expected to predict.

Standards for Educational and Psychological Testing and the Principles for the Validation and Use of Personnel Selection Procedures — The latter is an important document that HR specialists rely on; it uses the traditional terms of content, construct, and criterion-related validities in discussing validation strategies.

‭Face Validity:‬
‭●‬ ‭Face validity —‬‭is the degree to which the test takers (and not subject-matter experts)‬
‭view the content of a test or test items as relevant to the context in which the test is being‬
‭administered.‬
‭●‬ ‭Face validity is‬‭based on the perceptions or opinions of the test taker‬‭, and not those‬
‭of experts, that the test or items are related to the aims of the test when it is used.‬
● For example, if you were asked questions concerning your thinking style on a test you were told measures cognitive ability, you would likely conclude that the test lacks face validity and is not job relevant.
● Face validity is not a "technical" form of validity like content, construct, or criterion-related validity; however, it does resemble content validity.
● When tests lack face validity, job candidates are not likely to take them seriously when completing them.
‭●‬ ‭While face validity is not a technical requirement for a test, a test having face‬
‭validity is likely to be more technically valid.‬
‭●‬ ‭However, a test that is face valid must also meet the technical standards for validity.‬
‭●‬ ‭Face validity is not a substitute for other forms of validity.‬

● In addition to finding a reliable and valid measure of the predictor, such as cognitive ability, HR personnel also need to find a reliable and valid measure of job performance.
‭●‬ H ‭ ow do we define and measure the performance of a maker of widgets? This is usually‬
‭more difficult than finding a measure of cognitive ability as‬‭performance may be specific‬
‭to the job or organization‬‭.‬
‭●‬ ‭Job performance is an abstract construct that may involve many behaviours, tasks,‬
‭and competencies‬‭.‬
‭●‬ ‭HR must identify those tasks or competencies that are the most important, the most‬
‭frequently performed, or the most critical to successful job performance.‬
● An HR specialist takes this information and develops a measure of job performance through one of the established procedures.
● Whatever measure is developed to assess job performance, it should represent important work behaviours, outcomes, or relevant organizational expectations about the employee's performance.

‭●‬ I‭n selecting job applicants, one goal is to hire only those applicants who will perform at‬
‭high levels.‬
‭●‬ ‭If cognitive ability is associated with job performance at the construct level, then at the‬
‭measurement level cognitive ability should predict job performance.‬
● That is, we must establish the association between the predictor and criterion measures empirically, referred to as criterion-related validity. There are two approaches to this: predictive validation and concurrent validation, both with challenges and limitations.

● Predictive validation — Strategies in which evidence is obtained about a correlation between predictor scores that are obtained before an applicant is hired and criterion scores that are obtained later, usually after an applicant is employed.
● That is, evidence is obtained about a correlation between pre-hire predictor scores (e.g., cognitive ability) and post-hire criterion scores (i.e., performance).

● However, those hired, and on whom the predictive validity coefficient is calculated, are not likely to represent the full applicant pool from which they were selected, and for which we use the test. We want to know the predictive validity of the test as used on the full applicant pool.
● Predictive validities established only on the individuals hired underestimate the "true" association between applicant test scores and job performance across the full applicant pool (a problem known as range restriction; see the sketch below).
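A minimal simulation of this restriction-of-range effect (all numbers illustrative): the correlation computed only on top-scoring "hires" comes out noticeably smaller than the correlation in the full applicant pool.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

test = rng.normal(0, 1, n)                      # pre-hire predictor scores
performance = 0.5 * test + rng.normal(0, 1, n)  # criterion; true pool r is ~0.45

r_pool = np.corrcoef(test, performance)[0, 1]

hired = test > np.quantile(test, 0.8)           # hire only the top 20% of scorers
r_hired = np.corrcoef(test[hired], performance[hired])[0, 1]

print(round(r_pool, 2))   # ~0.45 in the full applicant pool
print(round(r_hired, 2))  # substantially smaller (~0.2) among hires only: range restriction
```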

● Concurrent validation — Strategies in which evidence is obtained about a correlation between predictor and criterion scores from information that is collected at approximately the same time from a specific group of workers.
● For example, HR has current employees complete the cognitive ability test; at the same time, their supervisors provide ratings of their job performance. A positive correlation between the two sets of scores provides evidence for the validity of the cognitive ability test as a predictor of the job performance measure.
● While concurrent evidence may be easier to collect, these strategies, too, are problematic.
‭●‬ ‭The group of employed workers used to develop the concurrent validity evidence is likely‬
‭to be older, more experienced, and higher performers, on average, than those from the full‬
‭applicant pool.‬
‭●‬ ‭Poor performing workers most likely are not part of the concurrent validation study as they‬
‭probably were released from their job (i.e., "fired"), voluntarily resigned, or transferred to‬
‭other positions.‬
‭●‬ ‭The primary concern here is whether a validity coefficient based on only successful‬
‭applicants can be used as evidence to validate decisions based on predictor scores‬
‭obtained from a more heterogeneous pool of job applicants, some likely to be successful,‬
‭and others not, if hired.‬
‭●‬ ‭Job incumbents who are asked to complete a battery of selection tests may approach‬
‭them with a different attitude and level of motivation than job applicants (as job‬
‭incumbents are already employed and so do not have much at stake).‬
‭●‬ ‭These differences may affect test responses, especially for personality and integrity tests‬
‭that rely on the test taker's cooperation in responding truthfully, but also for cognitive‬
‭ability tests where job incumbents may exert less effort than job applicants.‬
‭●‬ ‭Statistically, validity coefficients based on concurrent validity evidence underestimate the‬
‭"true" predictive validity of a selection test when used on the full, more heterogeneous,‬
‭applicant pool.‬

Despite the flaws, criterion-related validation strategies are the most frequently used strategies to validate selection assessments.

‭Validity and Reliability Evidence:‬


● Validity — "Does the test measure what it is supposed to measure?"
● Predictive Validity Evidence — Refers to how well the test predicts some future behaviour, regardless of whatever else it may test.
● Concurrent Criterion-related Validity Evidence — The test and a criterion measure are administered at about the same time and their scores correlated.
● Content Validity Evidence — Checks that test items correspond to what is supposed to be covered by the test.
● Construct Validity Evidence — Demonstrates a relationship between test scores and a theory's predictions about another set of variables; for example, if you are testing a foreign language, then those skills should show improvement after much instruction.
● Reliability — "Does the test yield the same or similar score rankings (all other factors being equal) consistently?"
● Test-Retest, or Stability — The test is given more than once to determine the correlation between administrations.
● Alternate Forms, or Equivalence — Administering two equivalent tests and then comparing scores.
● Internal Consistency — Splitting a test into two equal halves (e.g., odd-numbered vs. even-numbered items) and correlating the scores on the two halves.
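A minimal split-half sketch (hypothetical data). The half-test correlation understates full-test reliability, so it is customarily stepped up with the Spearman-Brown formula r_full = 2 * r_half / (1 + r_half); that correction is standard practice, though the notes above don't name it.

```python
import numpy as np

# Hypothetical item scores: rows = respondents, columns = eight test items
items = np.array([
    [4, 5, 4, 5, 4, 4, 5, 4],
    [2, 3, 2, 2, 3, 2, 2, 3],
    [5, 5, 4, 5, 5, 4, 5, 5],
    [3, 3, 3, 4, 3, 3, 4, 3],
    [1, 2, 2, 1, 2, 1, 1, 2],
])

odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)      # Spearman-Brown step-up
print(round(r_half, 3), round(r_full, 3))
```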
‭Validity Generalization:‬

Validity generalization — The application of validity evidence, obtained through meta-analysis of data drawn from studies reporting associations between a similar predictor and a similar criterion, to one or more situations (e.g., organizations) like those on which the meta-analysis is based; i.e., we are generalizing the criterion-related validity from other studies to our own organization.

Starting in the mid-1970s, Schmidt and Hunter, in conjunction with several colleagues, challenged the idea that a validity coefficient is specific to the context or organization from which it is derived.
They used a procedure known as meta-analysis to combine validity coefficients for similar predictor (e.g., cognitive ability) and criterion (e.g., supervisory ratings of performance) measures reported in different validity studies.
This follows from the idea that the best estimate of the association between two variables is the average of all associations between these two variables reported across independent studies.
It also follows from the idea that the larger the number of people on which a validity coefficient is calculated, the more reliable and robust the estimated validity coefficient.
Accordingly, meta-analysis, when averaging validity coefficients across studies, gives greater weight to coefficients reported from studies with large versus small samples.
With meta-analysis, we simply multiply the sample size for each study by the validity coefficient reported for that study, sum the product terms across studies, and divide by the total (across-study) sample size.
The result provides a more accurate estimate of the actual association (validity coefficient) between the two variables than relying on the validity coefficient reported from one study only.
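A minimal sketch of that sample-size-weighted average (illustrative numbers):

```python
# Validity coefficients (r) and sample sizes (n) from five hypothetical studies
studies = [(0.30, 150), (0.42, 60), (0.25, 400), (0.38, 90), (0.33, 220)]

weighted_sum = sum(r * n for r, n in studies)
total_n = sum(n for _, n in studies)

mean_validity = weighted_sum / total_n  # meta-analytic estimate used in validity generalization
print(round(mean_validity, 3))          # ~0.30
```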

‭BIAS AND FAIRNESS:‬

‭BIAS:‬

Bias — Refers to systematic errors in measurement, or inferences made from measurements, that are related to different identifiable group membership characteristics such as age, sex, or race.
‭General Guidelines for Interpreting Validity Coefficients:‬

Validity Coefficient Value — Interpretation
Above 0.35 — Very beneficial
0.21 – 0.35 — Likely to be useful
0.11 – 0.20 — Depends on circumstances
Below 0.11 — Unlikely to be useful

● Predictive bias is present when a test systematically under-predicts the performance of a subgroup's members relative to members of the majority group.
● Such bias could lead, for example, to hiring many more white candidates without accents relative to non-white candidates with accents, even though the latter may have performed successfully had they been hired.
● One way to overcome this type of bias is to generate separate regression lines (i.e., separate prediction formulas) for the two groups, which would result in different cut-off scores for selection (see the sketch below).
● For example, in Canadian federal organizations, separate prediction formulas are often used to select job applicants from anglophone and francophone linguistic groups. In U.S. federal organizations, the use of different selection rules for different identifiable subgroups (often referred to as subgroup norming) is prohibited by U.S. federal law.
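A minimal sketch of separate prediction formulas fitted by ordinary least squares (hypothetical data; numpy.polyfit is just one simple way to fit each line):

```python
import numpy as np

# Hypothetical test scores and later performance ratings for two subgroups
test_a = np.array([55, 60, 70, 75, 82, 90])
perf_a = np.array([3.1, 3.4, 3.9, 4.2, 4.6, 5.0])
test_b = np.array([50, 58, 66, 72, 80, 88])
perf_b = np.array([3.3, 3.7, 4.1, 4.5, 4.9, 5.3])

# Separate regression lines: performance = slope * test + intercept, per group
slope_a, intercept_a = np.polyfit(test_a, perf_a, 1)
slope_b, intercept_b = np.polyfit(test_b, perf_b, 1)

# Different cut-off test scores follow from requiring the same predicted performance
required_perf = 4.0
print(round((required_perf - intercept_a) / slope_a, 1))  # cut-off score for group A
print(round((required_perf - intercept_b) / slope_b, 1))  # cut-off score for group B
```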

● Measurement bias occurs in a set of measurements when items on a test may elicit a variety of responses other than what was intended, or some items on a test may have different meanings for members of different subgroups.
‭●‬ ‭For example, the Bennett Mechanical Comprehension Test contains pictures related to‬
‭using different tools and machines that tended to be used mostly by males. Males are‬
‭more likely to recognize these tools and their proper use and perform well on the test. On‬
‭the other hand, females with good mechanical comprehension may not do as well on the‬
‭test because of their lack of familiarity with specific tools pictured on the Bennett test. The‬
‭result is that the test may underestimate the true mechanical ability of female job‬
‭applicants.‬

● The statistical procedures needed to assess for predictive and measurement bias are often complicated and difficult to carry out.
‭●‬ ‭Nonetheless, the question of bias can be answered through empirical and objective‬
‭procedures. HR professionals may have to demonstrate, before courts or tribunals, that‬
‭the employment test or procedures they use are free from bias.‬
‭●‬ ‭As a first line of defence, before using a selection device, they should establish that the‬
‭test does not discriminate on characteristics or traits that are not job related and that it‬
‭does not discriminate against members of groups protected by human rights legislation.‬

‭FAIRNESS:‬
● Fairness — The principle that every test taker should be assessed in an equitable manner.
‭●‬ ‭The concept of fairness in measurement refers to the‬‭value judgments people make‬
‭about the decisions or outcomes‬‭that are based on measurements.‬
‭●‬ ‭An unbiased measure or test may still be viewed as unfair either by society as a whole or‬
‭by different groups within it.‬
‭●‬ ‭Fairness cannot be determined statistically or empirically —‬‭Fairness involves‬
‭perceptions‬‭.‬
‭●‬ ‭An organization may believe it is fair to select qualified females in place of higher-ranking‬
‭males in order to increase the number of women in the organization; on the other hand,‬
‭the higher-ranking males who were passed over might not agree.‬
● The Principles for the Validation and Use of Personnel Selection Procedures states this about fairness:
"Fairness is a social rather than a psychometric concept. Its definition depends on what one considers to be fair. Fairness has no single meaning, and, therefore, no single statistical or psychometric definition."

‭The Principles goes on to identify four meanings of fairness that are relevant in selection:‬

Fairness as equitable treatment in the testing process — All examinees should be treated equitably throughout the testing process. They should experience the same or comparable procedures in the testing itself, in how the tests are scored, and in how the test scores are used.
‭Fairness as lack of bias —‬ ‭A test or testing procedure is considered fair if it does not produce‬
‭any systematic effects that are related to different identifiable group membership characteristics‬
‭such as age, sex, or race.‬
Fairness as requiring equal outcomes (e.g., equal passing rates for subgroups of interest) in selection and prediction — The standards reject this definition. While group differences in outcomes should trigger greater scrutiny for sources of potential bias, outcome differences alone do not indicate bias (they could reflect "adverse impact"). Where assessments result in adverse impact against members of protected minority groups but are not biased, employers are encouraged to consider alternative assessments that are equally predictive but have no adverse impact.
Fairness as requiring examinees to have comparable access to the constructs measured by a selection procedure — No one should be restricted, because of age, race, ethnicity, gender, socio-economic status, cultural background, disability, or language proficiency, in their access to the testing tools and procedures used to inform selection decisions. For example, an online assessment of personality and cognitive ability may be less accessible to persons of low socio-economic status and/or certain ethnic groups than to others of high socio-economic status, due to differences in ownership of mobile devices, computers, and Internet access.
Fairness is an even more complex topic than bias.
Achieving fairness often requires compromise between conflicting interests.
For example, lowering the selection standards to include more applicants from a certain subgroup to make the workforce more representative of the general population may come at the cost of hiring job applicants who, while they meet the minimum job qualifications, are not the most qualified candidates for the position. Yet the most qualified candidates typically bring the most return in productivity to the organization.
