Course Outline
I. Introduction
a. History of Psychological Testing
b. Defining Psychological Testing
c. Basic Concepts of Psychological Testing
Unit I Introduction
HISTORY OF PSYCHOLOGICAL TESTING
PSYCHOMETRICS
PSYCHOLOGICAL TESTING
TESTING
Objective
Typically, to obtain some gauge, usually numerical in nature, with regard to
an ability or attribute.
Process
Role of Evaluator
The tester is not key to the process; practically speaking, one tester may be
substituted for another tester without appreciably affecting the evaluation.
Skill of Evaluator
ASSESSMENT
Objective
Typically, to answer a referral question, solve a problem, or arrive at a
decision through the use of tools of evaluation.
• Process
Assessment is typically individualized. In contrast to testing, assessment more
typically focuses on how an individual processes rather than simply the
results of that processing (Cohen and Swerdlik, 2009).
• Role of Evaluator
The assessor is key to the process of selecting tests and/or other tools of
evaluation as well as in drawing conclusions from the entire evaluation.
• Skill of Evaluator
Assessment typically requires an educated selection of tools of evaluation,
skill in evaluation, and thoughtful organization and integration of data.
• Outcome
Typically, assessment entails a logical problem-solving approach that brings
to bear many sources of data designed to shed light on a referral question.
Descriptive statistics
are methods used to provide a concise description of a collection of
quantitative information; numbers and graphs used to describe, condense, or
represent data.
Inferential statistics
are methods used to make inferences from observations of a small group of
people known as a sample to a larger group of individuals known as a
population; to estimate population values based on sample values or to test
hypotheses.
TYPES OF SCALES
Nominal scales
Are really not scales at all; their only purpose is to name objects. Nominal
scales are used when the information is qualitative rather than quantitative.
Ordinal scale
This scale allows you to rank individuals or objects but not to say anything
about the meaning of the differences between the ranks.
Interval scale
Has the property of magnitude and equal intervals but not absolute 0.
Ratio scale
A scale that has all three properties (magnitude, equal intervals, and an
absolute 0); any mathematical operation is permissible.
FREQUENCY DISTRIBUTION
A distribution of scores summarizes the scores for a group of individuals. The
frequency distribution displays scores on a variable or a measure to reflect how
frequently each value was obtained.
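As a minimal sketch of the idea above, the following Python snippet tallies how often each score occurs and prints a simple text histogram. The scores are hypothetical, made up for illustration only.

```python
from collections import Counter

# Hypothetical test scores for ten examinees (illustrative data only).
scores = [7, 8, 8, 9, 9, 9, 10, 10, 6, 8]

# Count how many times each score value was obtained.
frequency = Counter(scores)

# Print the distribution from the lowest score to the highest.
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>3}  {'*' * count}  ({count})")
```

Each printed row shows a score value, one asterisk per examinee who obtained it, and the count.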
Mean
arithmetic average
Median
the value that divides a distribution that has been arranged in order of
magnitude into two halves
Mode
most frequently occurring value in a distribution; it is useful primarily when
dealing with qualitative or categorical variables
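The three measures defined above can be computed with Python's standard `statistics` module. This is only a sketch; the scores and the category labels are hypothetical.

```python
import statistics

# Hypothetical scores (illustrative only).
scores = [2, 3, 3, 4, 5, 7, 11]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the ordered scores
mode = statistics.mode(scores)      # most frequently occurring value

# The mode also works for categorical data, where a mean is meaningless.
diagnoses = ["anxiety", "depression", "anxiety", "none"]
most_common_category = statistics.mode(diagnoses)

print(mean, median, mode, most_common_category)
```

Note that the single high score (11) pulls the mean above the median, which is one reason the median is often preferred for skewed distributions.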
MEASURES OF VARIABILITY
Range
distance between two extreme points—the highest and lowest values in a
distribution
Variance
the sum of the squared differences or deviations between each value (X)
in a distribution and the mean of that distribution (M), divided by N
Standard deviation
the square root of the variance; it provides a single value that is
representative of the individual differences or deviations in a data set
computed from a common reference point, namely, the mean
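The definitions of range, variance, and standard deviation above translate directly into a few lines of Python. The sketch below divides by N, as in the definition given here (the population formula); the scores are hypothetical.

```python
import math
import statistics

# Hypothetical scores (illustrative only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n

# Range: distance between the highest and lowest values.
score_range = max(scores) - min(scores)

# Variance: sum of squared deviations from the mean, divided by N.
variance = sum((x - mean) ** 2 for x in scores) / n

# Standard deviation: square root of the variance.
standard_deviation = math.sqrt(variance)

# Cross-check against the standard library's population variance.
assert math.isclose(variance, statistics.pvariance(scores))

print(score_range, variance, standard_deviation)
```

Many textbooks and software packages divide by N - 1 when estimating a population variance from a sample; `statistics.variance` does that, while `statistics.pvariance` divides by N.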
The normal curve model is used descriptively to locate the position of scores
that come from distributions that are normal. In a process known as
normalization, the normal curve is also used to make distributions that are not
normal—but approximate the normal—conform to the model, in terms of the
relative positions of scores.
The normal curve model is applied inferentially in the areas of
(a) Reliability, to derive confidence intervals to evaluate obtained
scores and differences between obtained scores, and
(b) Validity, to derive confidence intervals for predictions or estimates
based on test scores.
SHAPE OF DISTRIBUTIONS
Kurtosis
refers to the flatness or peakedness of a distribution
Platykurtic
distributions have the greatest amount of dispersion, manifested in tails
that are more extended, and leptokurtic distributions have the least. The
normal distribution is mesokurtic, meaning that it has an intermediate
degree of dispersion.
Skewness (Sk)
of a distribution refers to a lack of symmetry. As we have seen, the normal
distribution is perfectly symmetrical, with Sk = 0; its bulk is in the middle
and its two halves are identical.
A skewed distribution is asymmetrical. If most of the values are at the top end
of the scale and the longer tail extends toward the bottom, the distribution is
negatively skewed (Sk < 0); on the other hand, if most of the values are at the
bottom and the longer tail extends toward the top of the scale, the distribution is
positively skewed (Sk > 0).
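The notes do not give formulas for skewness or kurtosis. As an illustrative sketch, the snippet below uses one common convention, the moment-based coefficients, which is an assumption and not necessarily the formula used in this course. Under that convention a symmetrical distribution has Sk = 0, a longer upper tail gives Sk > 0, and a longer lower tail gives Sk < 0. All data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (value - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal curve scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetrical = [1, 2, 3, 4, 5]
positively_skewed = [1, 1, 1, 2, 10]   # bulk at the bottom, long upper tail
negatively_skewed = [1, 9, 10, 10, 10]  # bulk at the top, long lower tail

for name, data in [("symmetrical", symmetrical),
                   ("positive", positively_skewed),
                   ("negative", negatively_skewed)]:
    print(name, round(skewness(data), 2), round(excess_kurtosis(data), 2))
```

With this convention, negative excess kurtosis indicates a flatter (platykurtic) shape and positive excess kurtosis a more peaked (leptokurtic) one.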
The strength of a relationship is indicated by the size of the number
in the coefficient, whereas the direction of the relationship is indicated by the sign. A
correlation coefficient of –0.80, for example, indicates exactly the same degree of
relationship as a coefficient of +0.80.
Correlation, even if high, does not imply causation. If two variables, X and Y,
are correlated, it may be because X causes Y, because Y causes X, or because a third
variable, Z, causes both X and Y. This truism is also frequently ignored; moderate to
high correlation coefficients are often cited as though they were proof of a causal
relationship.
Knowledge of the extent to which things vary in relation to one another is extremely
useful. Through regression analyses we can use correlational data on two or more
variables to predict the expected values of a
dependent variable (Y), within a certain margin of error, based on the known values
of the variable or variables with which the dependent
variable is correlated.
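A small worked sketch may help here. It computes a Pearson correlation coefficient and a least-squares regression line by hand, then uses the line to predict Y from a new X. The variable names and the data (hours of study, quiz scores) are hypothetical, and simple linear regression with one predictor is assumed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

hours = [1, 2, 3, 4, 5]    # hypothetical predictor (X)
scores = [2, 4, 5, 4, 5]   # hypothetical dependent variable (Y)

r = pearson_r(hours, scores)
# Reversing the direction of Y flips the sign but not the magnitude.
r_reversed = pearson_r(hours, [-s for s in scores])

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6  # predicted Y when X = 6

print(round(r, 3), round(r_reversed, 3), round(predicted, 2))
```

The prediction is only an expected value; the margin of error around it shrinks as the correlation gets stronger.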
RELIABILITY
A good test or, more generally, a good measuring tool or procedure is reliable. The
criterion of reliability involves the consistency of the measuring tool: the precision with
which the test measures and the extent to which error is present in measurements. In
theory, the perfectly reliable measuring tool consistently measures in the same way. As you
might expect, however, reliability is a necessary but not sufficient element of a good test.
In its broadest sense, error refers to the component of the observed test score that
does not have to do with the test taker’s ability. If we use X to represent an observed score,
T to represent a true score, and E to represent error, then the fact that an observed score
equals the true score plus error may be expressed as:
X = T + E
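To make the relation between observed score, true score, and error concrete, the sketch below builds observed scores from hypothetical true scores and errors. It assumes, as classical test theory does, that error is uncorrelated with the true score; the numbers are chosen so that this holds exactly. Under that assumption the observed variance is the sum of true variance and error variance, and the share of true variance can be read as a reliability estimate.

```python
import statistics

# Hypothetical true scores (T) and errors (E); chosen so that E has a mean
# of zero and is uncorrelated with T (an assumption of the model).
true_scores = [90, 90, 110, 110]
errors = [-5, 5, -5, 5]

# Observed score: X = T + E
observed = [t + e for t, e in zip(true_scores, errors)]

true_variance = statistics.pvariance(true_scores)
error_variance = statistics.pvariance(errors)
total_variance = statistics.pvariance(observed)

# Proportion of total variance that is true variance.
reliability = true_variance / total_variance

print(observed, true_variance, error_variance, total_variance, reliability)
```

In practice true scores are never observed directly, so this proportion has to be estimated, for example with the alternate-forms or split-half methods.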
A statistic useful in describing sources of test score variability is the variance (σ²),
the standard deviation squared. Variance from true differences is true variance, and variance from irrelevant,
sampling or content sampling, terms that refer to variation among items within a test as well
as to variation among items between tests. From the perspective of a test creator, a
challenge in test development is to maximize the proportion of the total variance that is true
administration may influence the test taker’s attention or motivation. The test taker’s
reactions to those influences are the source of one kind of error variance. Examples of
untoward influences during administration of a test include factors related to the test
environment: the room temperature, the level of lighting, and the amount of ventilation and
caused by scorer differences in many tests. If subjectivity is involved in scoring, then the
Other sources of error. Females, for example, may underreport abuse because of
fear, shame, or social desirability factors and overreport abuse if they are seeking help.
Males may underreport abuse because of embarrassment and social desirability factors and
RELIABILITY ESTIMATES
Alternate forms are simply different versions of a test that have been
constructed so as to be parallel. Although they do not meet the requirements for the
1. Two test administrations with the same group are required, and
do better or worse on a specific form of the test not as a function of their true ability
but simply because of the particular items that were selected for inclusion in the test.
scores obtained from equivalent halves of a single test administered once. The
Step 2. Calculate a Pearson r between scores on the two halves of the test.
Simply dividing the test in the middle is not recommended because it’s likely this
procedure would spuriously raise or lower the reliability coefficient. Different amounts of
fatigue for the first as opposed to the second part of the test, different amounts of test
anxiety, and differences in item difficulty as a function of placement in the test are all
factors to consider.
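The split-half procedure can be sketched in Python as follows. The example uses an odd-even split rather than a split down the middle, correlates the two half scores with a Pearson r, and then applies the Spearman-Brown correction, r_SB = 2r / (1 + r). That correction is the standard adjustment for estimating full-length reliability from half-test scores; it is stated here as background knowledge, since the formula itself does not survive in these notes. The item responses are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Estimate full-test reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)

# Hypothetical responses: one row per examinee, one column per item
# (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Odd-even split: items 1, 3, 5 form one half and items 2, 4, 6 the other.
odd_scores = [sum(row[0::2]) for row in responses]
even_scores = [sum(row[1::2]) for row in responses]

r_half = pearson_r(odd_scores, even_scores)
r_full = spearman_brown(r_half)

print(odd_scores, even_scores, round(r_half, 3), round(r_full, 3))
```

The corrected coefficient is higher than the half-test correlation because, other things being equal, longer tests are more reliable.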
internal consistency reliability from a correlation of two halves of a test. The general
VALIDITY
Validity, as applied to a test, is a judgment or estimate of how well a test
measures what it purports to measure in a particular context. More specifically, it is
a judgment based on evidence about the appropriateness of inferences drawn from
test scores.
Validation
is the process of gathering and evaluating evidence about validity. Both the test
developer and the test user may play a role in the validation of a test for a specific purpose.
Such local validation studies may yield insights regarding a particular population of test
takers as compared to the norming sample described in a test manual. Local validation
studies are absolutely necessary when the test user plans to alter in some way the format,
content validity
criterion validity
construct validity
useful to visualize construct validity as being “umbrella validity” since every other variety of
validity falls under it. Three approaches to assessing validity—associated, respectively, with
2. relating scores obtained on the test to other test scores or other measures
a. how scores on the test relate to other test scores and measures
b. how scores on the test can be understood within some theoretical framework
for understanding the construct that the test was designed to measure.
FACE VALIDITY
- relates more to what a test appears to measure to the person being tested
face validity are frequently thought of from the perspective of the test taker,
CONTENT VALIDITY
designed to sample.
From the pooled information (along with the judgment of the test
developer), a test blueprint emerges for the “structure” of the evaluation; that
is, a plan regarding the types of information to be covered by the items, the
where tests used to hire and promote people are carefully scrutinized for their
relevance to the job, among other factors. Courts often require evidence that
regarding how essential a particular item is. Lawshe proposed that each rater
respond to the following question for each item: “Is the skill or knowledge measured
by this item
1. Essential
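Lawshe's rating question is usually summarized with a content validity ratio (CVR) for each item. The formula does not survive in these notes; the sketch below uses the commonly cited form, CVR = (n_e - N/2) / (N/2), where n_e is the number of raters marking the item "essential" and N is the total number of raters. The function name and the panel sizes are hypothetical.

```python
def content_validity_ratio(n_essential, n_raters):
    """Lawshe-style CVR: ranges from -1 (no one says essential) to +1 (all do)."""
    if n_raters <= 0 or not 0 <= n_essential <= n_raters:
        raise ValueError("need 0 <= n_essential <= n_raters and n_raters > 0")
    half = n_raters / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 raters judging three items.
print(content_validity_ratio(10, 10))  # every rater says "essential"
print(content_validity_ratio(5, 10))   # exactly half say "essential"
print(content_validity_ratio(8, 10))   # most say "essential"
```

A CVR of 0 means exactly half the panel rated the item essential; whether a positive value is large enough to keep the item depends on the panel size and the cutoff the test developer adopts.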
Herzegovina are taught different versions of history, art, and language depending
upon their ethnic background. Such a situation illustrates in stark relief the influence
CRITERION-RELATED VALIDITY
used to infer an individual’s most probable standing on some measure of interest, the
measure of interest being the criterion. Concurrent validity is an index of the degree
to which a test score is related to some criterion measure obtained at the same time
CONCURRENT VALIDITY
If test scores are obtained at about the same time that the criterion measures
are obtained, measures of the relationship between the test scores and the criterion
the extent to which test scores may be used to estimate an individual’s present
standing on a criterion.
PREDICTIVE VALIDITY
Measures of the relationship between the test scores and a criterion measure
obtained at a future time provide an indication of the predictive validity of the test;
that is, how accurately scores on the test predict some criterion measure.
Test scores may be obtained at one time and the criterion measures obtained
at a future time, usually after some intervening event has taken place. The
intervening event may take varied forms, such as training, experience, therapy,
the relationship between test scores and scores on the criterion measure.
INCREMENTAL VALIDITY
an aspect of validity that refers to what an additional assessment or
assessments or variables
CONSTRUCT VALIDITY
construct.
1.Reliability 6. Efficiency
of fair, valid, and reliable assessments that effectively measure the intended
Clarity. Test items and instructions should be clear and easily understood by
1. Define clearly what you want to measure. To do this, use substantive theory as a
2. Generate an item pool. Theoretically, all items are randomly chosen from a
items is valuable. Avoid redundant items. In the initial phases, you may want to
write three or four items for each one that will eventually be used on the test or
scale.
3. Avoid exceptionally long items. Long items are often confusing or misleading.
4. Keep the level of reading difficulty appropriate for those who will complete the
scale.
5. Avoid “double-barreled” items that convey two or more ideas at the same time.
For example, consider an item that asks the respondent to agree or disagree with
the statement, “I vote Democratic because I support social programs.” There are
two different statements with which the person could agree: “I vote Democratic”
develop the “acquiescence response set.” This means that the respondents will tend
to agree with most items. To avoid this bias, you can include items that are worded
ITEM FORMATS
Dichotomous Format
Description: This format presents items with two response options, typically
assessments.
Polytomous Format
response options for each item. It allows for a graded response, providing a range of
choices.
Likert Format
Category Format
criteria. Respondents choose the category that best fits their response.
observed. It's a simple way to record the presence or absence of specific behaviors
or characteristics.
TEST ADMINISTRATION
Both the behavior of the examiner and their relationship to the test taker can affect test scores.
Fuchs and Fuchs found that test performance was approximately .28 standard deviation
(roughly 4 IQ points) higher when the examiner was familiar with the test taker than when
not.
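The conversion from standard deviation units to IQ points is simple arithmetic. The sketch below assumes a deviation IQ scale with a standard deviation of 15 points; that value is an assumption, since the notes do not state which scale was used, and the function name is hypothetical.

```python
def sd_units_to_points(effect_in_sd, scale_sd):
    """Convert an effect expressed in SD units into points on a given scale."""
    return effect_in_sd * scale_sd

# Assumption: SD = 15 IQ points (a common convention for deviation IQ scales).
familiarity_effect = sd_units_to_points(0.28, 15)
print(round(familiarity_effect, 1))
```

With an SD of 15 the familiarity effect comes to a little over 4 points, consistent with the "roughly 4 IQ points" quoted above; a scale with an SD of 16 would give about 4.5.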
Familiarity with the test taker, and perhaps preexisting notions about the test taker’s ability
can either positively or negatively bias test results.
Attitudinal surveys - respondents may give the response that they perceive to be expected
by the interviewer.
Rapport might be influenced by subtle processes such as the level of performance expected
by the examiner.
Some groups feel that their children should not be tested by anyone EXCEPT a member of
their own race.
According to Sattler, there is little evidence that the race of the examiner significantly affects
intelligence test scores.
Race of the examiner has nonsignificant effects on test performance for both African
American and white children.
Early results occurred for both the Stanford-Binet scale and the Peabody Picture Vocabulary
Test.
Few studies have shown an effect attributable to the race of the examiner; only 4 of
29 studies found one.
Sattler has shown that the race of the examiner affects the scores in some situations.
Examiner effects tend to increase when examiners are given more discretion about the use
of the tests.
Many behavioral assessment procedures require training and evaluation but not a formal
degree or diploma.
SCID users are licensed psychiatrists or psychologists with additional training on the test.
No standardized protocols exist for training people to administer complicated tests such as the
Wechsler Adult Intelligence Scale-Revised.
A well-known line of research in psychology has shown that data sometimes can be affected
by what an experimenter expects to find.
Results show that subjects actually provide data that confirm the experimenter’s
expectancies.
In a study in Israel, women supervisors were told that some women officer cadets had
exceptional potential. The cadets were selected randomly rather than on the basis of any evidence.
No expectancy effect was shown.
A follow-up study showed that expectancy effects appear for men and women supervised by
men but not for women led by women.
may come from subtle nonverbal communication between experimenter and subject
experimenter may not even notice
Expectancy effects - small, subtle effect on scores; occurs in some situations and not
others.
careful studies on particular tests needed
Expectancies in test administrators (e.g. more than just scores given) have yielded
somewhat inconsistent results. Some with expectancy effect, some with none.
In spite of inconsistent results, you should pay attention to the potentially biasing effect of
expectancy.
Several studies show that reward can significantly affect test performance.
Reinforcement and feedback guide the examinee toward a preferred response.
Random reinforcement destroys the accuracy of performance and decreases the
motivation to respond (Eisenberger & Cameron, 1998)
As early as 1970, Cronbach recognized the value of computers as test administrators. Here
are some of the advantages that computers offer:
excellence of standardization,
individually tailored sequential administration,
precision of timing responses,
release of human testers for other duties,
patience (test taker not rushed), and control of bias.
Example of CATs
Conventional Testing - examinees receive the same test questions in the same order,
usually a question at a time.
Branched or Response-Contingent Testing - a problem situation is presented to the
examinee with a number of alternatives.
Sequential Testing - typically used to make a classification decision (e.g., to hire or not
to hire, to graduate or not to graduate, or whether someone is or is not depressed) using
one or more prespecified cutoff scores.
Subject Variables
Refers to characteristics that vary across participants, and they can’t be manipulated by the
one administering the test. These are often a serious source of error.
Illness affects test scores. When you have a cold or the flu, you might not perform as well
as when you are feeling well. Many variations in health status affect performance in
behavior and in thinking (Kaplan, 2004).
Medical drugs are now evaluated according to their effects on the cognitive process (Spilker,
1996).
Good morning and good night text messages activate the part of the brain
responsible for happiness.
Feeling ignored causes the same chemical effect as that of an injury
Some of us are actually afraid of being so happy because of the fear that something
tragic might happen next.
Behavioral traits are the observable patterns of behavior that are relatively consistent
across various situations.
Focuses on the interactions between situations and behaviors for the purpose of effecting
behavioral change.
Personality Assessments
Situational Judgment Tests (SJTs)
Behavioral Interviews
Work Sample
Pros: Behavioral assessments can provide objective data to help hiring managers evaluate
their candidates.
Cons: While tests can help reduce bias in the hiring process, they are not immune to bias
themselves. Determining which traits are “valuable” or “risky” is not, itself, an objective
process.
Predictive Value
Pros: Behavioral assessments can be effective in predicting job performance and identifying
candidates who are likely to succeed in the role.
Cons: These tests are not foolproof. Why does an employee succeed at one company but
fail at another? The employee is the same but the company’s product, support, culture,
territory, etc. (and the economy in general) all serve to complicate employee success.
Time
Pros: Behavioral assessments can help filter out candidates who are not a good fit for the
job, saving time and resources in the hiring process.
Cons: On the other hand, these assessments take time to administer and evaluate, which can
bog down the hiring process.
Bias
Pros: Using behavioral assessments can help ensure that all candidates are evaluated on
the same criteria, which can help reduce bias and ensure fairness in the hiring process.
Cons: No assessment can be truly free from bias. It’s important for hiring managers to be
aware of any potential biases and to use assessments in conjunction with other evaluation
methods.
Costs/Benefits
Pros: Behavioral assessments can provide insight into a candidate’s work style,
communication skills, and problem-solving abilities, which can help managers make more
informed hiring decisions.
Cons: Some tests can be expensive, which may be a barrier for smaller companies or those
with limited budgets.
Reactivity
Occurs when individuals change their behavior due to awareness that their behavior is being or will
be measured. Their behavior might become more positive or negative, depending on the
situation and the people involved.
Drift
-refers to the tendency for observers in behavioral studies to stray from the definitions they
learned during training and to develop their own idiosyncratic definitions of behaviors
despite observing the same behavior.
Expectancies
Another potential source of bias is the expectancies of the observers regarding the subject's
behavior and the feedback observers receive from the experimenter in relation to that
behavior.
Deception
The act of misleading or wrongly informing someone about the true nature of a situation.
The halo effect is the tendency to ascribe positive attributes independently of
the observed behavior. Some psychologists have argued that this effect can be controlled
through partial correlation in which the correlation between two variables is found while
variability in a third variable is controlled.
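As an illustration of partial correlation, the sketch below uses the standard first-order formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The scenario is hypothetical: X and Y are ratings on two specific traits, and Z is the rater's overall impression of the person. The correlation values are invented for illustration.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with variability in Z held constant."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator

# Hypothetical halo scenario: two trait ratings correlate .60 with each other,
# but each also correlates strongly with the overall impression.
controlled = partial_correlation(r_xy=0.60, r_xz=0.80, r_yz=0.75)

# If neither rating is related to the overall impression, nothing changes.
unchanged = partial_correlation(r_xy=0.50, r_xz=0.0, r_yz=0.0)

print(round(controlled, 3), round(unchanged, 3))
```

In the first case the correlation between the two trait ratings essentially vanishes once overall impression is controlled, which is the pattern one would expect if the original correlation were produced by a halo effect.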
UNIT III
When administered and evaluated properly, psychological tests are accurate tools
used to diagnose and treat mental health conditions. When you hear the words
“psychological testing,” all kinds of questions and thoughts may run through your mind.
Psychological testing is the basis for mental health treatment. These tools are often used
to measure and observe a person’s behaviors, emotions, and thoughts. Tests are performed
by a psychologist who will evaluate the results to determine the cause, severity, and
duration of your symptoms. This will guide them in creating a treatment plan that meets
your needs.
Objective testing involves answering questions with set responses like yes/no or
true/false.
Here’s a more in-depth look at the types of testing available and the most commonly used
tests for each category.
Personality tests
Attitude tests
Measure views of respondents based on how much they agree or disagree with a statement
Test names: Likert Scale, Thurstone Scale
Aptitude tests
There are a number of core principles that form the foundation for psychological
assessment:
Tests do not directly reveal traits or capacities, but may allow inferences to be made
about the person being examined.
Test scores and other test performances may be adversely affected by temporary
states of fatigue, anxiety, or stress; by disturbances in temperament or personality;
or by brain damage.
A psychological evaluation is often thought of as the first line of defense in diagnosing and
treating a mental health condition. Performed by a psychologist, it helps them gain an
understanding of the severity and duration of your symptoms. Tests and assessments are
the two main components used in an evaluation. Testing typically includes using formal tests, or
“norm-referenced” tests. These are standardized tests that measure an individual’s ability to
learn and understand several concepts.
Common components of an assessment include:
psychological tests
interviews
observational data
medical evaluation
PERSONALITY TEST
Personality
Menninger (1953, p. 23) defined it as “the individual as a whole, his height and weight
and love and hates and blood pressure and reflexes; his smiles and hopes and bowed legs
and enlarged tonsils.”
Personality Assessment
Personality Type
Personality State
Projective Methods
The Rorschach
Hermann Rorschach (Figure 13–1) developed what he called a “form interpretation test”
using inkblots as the forms to be interpreted.
Pictures used as projective stimuli may be photos of real people, animals, objects, or
anything. They may be paintings, drawings, etchings, or any other variety of picture.
-developed for use in specific types of settings (such as school or business) or for
specific purposes. Sentence completion tests may be relatively atheoretical or linked
very closely to some theory.
Projective
Sounds as Projective Stimuli
Auditory Projective Test
This inspired Skinner to think of an application for sound, not only in behavioral terms but in
the elicitation of “latent” verbal behavior that was significant “in the Freudian sense”
(Skinner, 1979, p. 175).
-the subject’s task was to respond by creating a story based on three sounds played on a
phonograph record.
Wilmer & Husni, 1951) and the other referred to as an auditory apperception test (Ball &
Bernardoni, 1953). Henry Murray also got into the act with his Azzageddi test (Davids &
Murray, 1955), named for a Herman Melville character. Unlike other auditory projectives, the
Azzageddi presented subjects with spoken paragraphs.
Projective
The Production of Figure Drawings
Figure-drawing tests
a figure drawing test may be defined as a projective method of personality assessment
whereby the assessee produces a drawing that is analyzed on the basis of its content and
related variables.
Machover wrote that the human figure drawn by an individual who is directed to “draw a
person” [is] related intimately to the impulses, anxieties, conflicts, and compensations
characteristic of that individual. In some sense, the figure drawn is the person, and the
paper corresponds to the environment.
- The House-Tree-Person (HTP) test is another projective figure-drawing test. As the name of the
test implies, the test taker’s task is to draw a picture of a house, a tree, and a person.
OTHER TESTS
Attitude Test
-Attitude testing is done to measure people's attitudes. The purpose is to quantify people's
beliefs and behaviors to inform decisions, understand human differences, and gain
knowledge about personality types. Attitude testing can be done directly or indirectly.
-is FUNDAMENTAL to the success or failure that we experience in our life. There is little
difference in people physically or intellectually. But what does make the difference is the
attitude.
-is widely used to screen candidates for various jobs. Employers are often interested
in figuring out which applicants are likely to be resilient, self-motivated, and good at
cooperating with others, and many turn to EQ tests as a way to assess these traits.
-can significantly impact various aspects of your life, including behavior in family,
friendships, and workplace relationships.
Neuropsychological Tests
-refers to a number of tests that healthcare providers use to get information about
how your brain works.
Projective Tests
-are used to measure personality. Subjects are shown ambiguous images or asked
open-ended questions, and their answers give interviewers insights into the person's
unconscious attitudes and beliefs.
- projective test is a personality test in which subjects are shown ambiguous
images and asked to interpret them. The subjects are to project their own emotions,
attitudes, and impulses onto the image, and then use these projections to explain an
image, tell a story, or finish a sentence.
-The Rorschach Inkblot Test is the best known projective test, and it is also the
first test of its kind developed. Subjects are shown a series of cards with inkblot
images and asked what the images could be.
-are a type of psychological test that involves observing people in a structured way,
either in a laboratory or natural setting, as they carry out various pre-determined
activities. These tests are used mainly to study children's behavior, including how
they interact with other family members.
-There are some well-known examples of direct observation tests. One is the Parent-Child
Interaction Assessment (PCIA), which helps psychologists understand how parents
and children interact through language and behavior when they are playing.
It has been suggested that psychological testing and the procedures under which
such tests are administered violate the concept of the merit system, and may be used to
circumvent the procedural guarantees established by Congress in the basic civil service laws
as interpreted by the courts. Nevertheless, as authority for their procedure concerning
mental fitness exams and personality testing, government officials cite executive orders and
civil service laws recognizing presidential authority over selection procedures.
2. Testing Procedures
The State Department representative described existing procedures under which
employees of the Department and twelve other agencies are examined. In each instance the
medical staff determines whether an individual should have a psychiatric evaluation or
undergo psychological testing, or both. These steps are taken whenever a staff physician
believes an employee may have an emotional or psychological problem which would require
treatment or impair his judgment and reliability or be aggravated by an overseas
assignment. If the employee agrees to a psychiatric examination, he is given a choice of one
of the Department's four consulting psychiatrists.
As we have pointed out, there exists no body of case law concerning psychological
testing as a condition or incident of government employment. Therefore, any guidelines
which the courts may in the future lay down in this area must evolve from one or more
present trends of constitutional development. The first of these is the law regulating the
employment relationship where the Government is the employer. The inquiry here relates to
the Government's power to impose conditions upon that relationship and the extent to
which this power is circumscribed by the due process clause of the fifth amendment. The
second trend concerns recent developments which define, however vaguely, a constitutional
right of privacy.
A view widely held among psychologists, administrators and even members of
Congress is that federal employment is not a "right" but a "privilege." This leads to the
immediate and facile conclusion that personality testing, or any other requirement for that
matter, may be made a condition of public employment regardless of any adverse
consequences to the individual.
In the first instance, it must be pointed out that the "reasonableness" test
has most frequently been applied to legislative action. However, where departments
and agencies rely upon general statutes for rule-making powers over their
employees, it would be logically inconsistent to suggest that the legislature is
constrained by notions of due process but that the various departments have a
completely free hand to act. If in accordance with traditional due process concepts
the agencies may only act in a manner reasonably calculated to achieve their
legitimate ends, it could be argued that psychological testing is purely arbitrary and
therefore does not meet this criterion. Even if some nexus can be shown between
promoting the efficiency of the federal service and the use of psychological tests, the
serious infringement on personal liberty which results from such tests would compel
that the nexus be clearly indicated.
B. Right to Rebut Test Evidence
There are those who take an even dimmer view of psychological testing and
would ban it completely as a government personnel screening device. The argument
may be expressed in the following terms: Because of the social stigma attached to
adverse test results, the employee should be given "the same rights as he would
have in a criminal trial." The search and seizure of the contents of men's minds by a
forced submission to psychological testing should be denounced as offensive to
those canons of decency and fairness which express the notions of justice of
English-speaking peoples. A comparison can be made to the pumping of a man's stomach in
order to obtain evidence of illegal narcotics possession, a practice which was
condemned by the Court in Rochin v. California. To the extent that the analogy to
criminal proceedings can be maintained, it is obvious that there are also
self-incrimination objections to the utilization of test scores involuntarily received as a
basis for adverse action against the employee.