Professional Documents
Culture Documents
Testing
Defining Characteristics
A psychological test is anything that requires an
individual to perform (a behavior) for the
purpose of measuring some attribute, trait, or
characteristic or to predict an outcome.
Eg., suppose we are interested in developing a test to measure your physical ability.
One option would be to evaluate performance in every sport you have ever played.
4. Individuals will report accurately about themselves (for example, about their
personalities, about their likes and dislikes, etc.).
6. The test score an individual receives is equal to his or her true ability plus
some error, and this error may be attributable to the test itself, the examiner,
the examinee, or the environment.
1. Psychological tests measure what they purport to
measure or predict what they are intended to predict.
That is, a test taker’s score may reflect not only the
attribute being measured but also things such as
awkward question wording, errors in administration of
the test, examinee fatigue, and the temperature of the
room in which the test was taken. When evaluating an
individual’s score, we must assume that it will include
some error.
6. The test score an individual receives is equal to his or her true ability plus
some error, and this error may be attributable to the test itself, the
examiner, the examinee, or the environment (cont.)
• 1967- Personality Research Form (PRF) was developed and published by Douglas
Jackson. It represents a relatively sophisticated attempt to measure dimensions of
normal personality.
1969- In 1969 Nancy Bayley developed a Scales of Infant development between age of
1 month and 3 ½ year. The scales are Mental scale which consist of sensory and
perceptual activities, memory learning, problem solving, vocalization, abstract
thinking. Motor Scale includes Sitting, standing, walking and also manipulatory skills of
hands and fingers, and Behavior rating scale assess various aspects of personality like
emotional and social behavior, attention, arousal, persistence etc. It has 5 point
scoring system.
Some of the Tests developed After WW-I
• 1974- Rudolf Moos begins; Social Climate Scale to assess
different environments.
1883 A. D: East India Company conducted the examination for recruitment of people for the
company.
• There was not a strong response to his book during the short period
between its publication and his untimely death. Very few copies sold
and the original publisher declared bankruptcy not long after
Rorschach’s death (but not on account of the book).
• His work might have had little impact upon psychiatry or clinical
psychology had it not been for three of his close friends who
continued to teach Rorschach’s method and to keep it alive.
Rorschach
Inkblot
Test
Rorschach: Historical
5 Scoring Systems
• Adopted by 5 American psychologists with
very different theoretical backgrounds
• Shared common features (same blots were
used, response phase followed by inquiry)
• 5 different systems of administration, scoring
and interpretation emerged
• Two most popular (Beck, Klopf)
Rorschach: Validity and Reliability
Poor psychometric reputation:
• Lack of standardized rules for administration
and scoring
• Poor inter-rater reliability
• Lack of adequate norms
• Unknown or weak validity
Rorschach: Contemporary Use
• John Exner established Rorschach Research
Foundation in 1986
Location
• Location (W, D, Dd)
• Use of white space (S)
Determinants
• Form (good, poor, bad quality)
• Movement (active and passive)
• Color
• Texture
• Shading
Rorschach Inkblot Test
• A psychometrically sound test?
• Particularly useful in assessing thought
processes
Thematic Apperception Test (TAT)
• Developed by Henry Murray and Christian Morgan at
Harvard Psychological Clinic in the year 1935.
L (Lie)
• Faking good
F (Infrequency)
• Faking bad
K (Defensiveness)
• Defensiveness in admitting to problems
Interpreting MMPI
• Validity Scales
• Single scales
• Profile analysis
MMPI: Shortcomings
• Unrepresentative normative sample
• Language of items was outdated (including
sexist language)
• Inadequately addressed difficulties such as
suicide or drug use
MMPI: Revision
• Assembled team of MMPI experts
• Rewrote some items
• Added new items
• Administered new item pool (n=704) to a
standardization sample (representative)
• Retained 567 items from the item pool
Continued problems
– failure of some items to reliably discriminate
between groups
– dimensions based on pre-conceived theory about
structure of personality,
– scales correlate highly and thus provide redundant
information
– they are highly influenced by state at the time of
taking, and the test and re-test stability may
therefore be lower than desired (a problem for
many/most trait measures)
MMPI-2 Content Scales
• Antisocial Practices
Anxiety
• FearsA
Type
• Obsessiveness
Low Self-Esteem
• Depression
Social Discomfort
• Health Problems
Family Concerns
• BizarreInterference
Work Thoughts
• Anger Treatment Indicators
Negative
• Cynicism
MMPI-2 Content Scales (cont.)
• Content scales consist of groups of items
related to a specific content area that have been
empirically related to that content area, such as
the anger scale whose content contains
references to irritability, hot-headedness, and
other symptoms of anger-control problems.
Steps for Test Construction
In order to develop a test one must choose
strategies and materials and then make day to
day research decision that will affect the
quality of the emerging instrument.
The major shortcomings of multiple choice questions are, first the difficulty of
writing good distracter options and, second, the possibility that the presence
of the response may cue a half-knowledgeable respondent to the correct
answer.
Matching Questions :
Matching questions are popular in classroom testing,
but suffer serious psychometric shortcomings.
For true – false or multiple choice items, the optimal level of item
difficulty needs to be adjusted for the effects of guessing. For a true-
false test, a difficulty level of .5 can result when examinees merely guess.
Thus , the optimal level for such items would be .75 (halfway between .5
and 1.0) .
For example :-
University’s Graduate Admissions – Very High cutting score.
Students Eligible for Remedial Education Programme - Very
Low cutting score.
Item Characteristic Curve
Purpose – An Item Characteristic Curve (ICC) is a graphical
display of the relationship between the probability of a
correct response and the examinee’s position on the
underlying trait measured by the test. It is can also be said
as a mathematical idealization of the relationship
between the probability of a correct response and the
amount of trait possessed by test respondents.
The desired shape of ICC depends upon the purpose of the test. We
can understand this statement by the following picture.
Item Discrimination Index
Purpose – An Item Discrimination Index is a statistical Index of
how efficiently an item discriminates between persons who
obtain high and low scores on the entire test.
For example a test developer administer an exam to a try-out sample of 400 high
school students . After scoring he/ she identifies 25% of high scoring and 25% of low
scoring students.
The purpose of administration between high scores and low scores is to identify the
best items.
An ideal item is one that is most of high scorers pass and most of the low scorers fail.
This statement can be proved by taking out ‘d’.
‘d’ varies from -1.0 to +1.0
- An item with d = -1.0 indicates that more of low scoring
subjects answered the item correctly than did the high scoring
subjects.
User’s Manual – This manual gives instructions for administration and also
provide guidelines for test interpretation.
• Objectivity
• Reliability
• Validity
• Norms
Objectivity
Objectivity is the tendency to base
decisions and perceptions on exterior
information instead of on subjective
aspects, like private emotions, beliefs,
and experiences.
Types of Reliability
From the perspective of classical test theory,
an examinee's obtained test score (X) is
composed of two components, a true score
component (T) and an error component (E):
X=T+E
Reliability
The true score component reflects the
examinee's status with regard to the attribute
that is measured by the test, while the error
component represents measurement error.
Consistency = Reliability
The Reliability Coefficient
Ideally, a test's reliability would be calculated
by dividing true score variance by the
obtained (total) variance to derive a reliability
index. This index would indicate the
proportion of observed variability in test
scores that reflects true score variability.
– Test Length
– Range of Test Scores
– Guessing
1. Test Length
The larger the sample of the attribute being
measured by a test, the less the relative effects of
measurement error and the more likely the sample
will provide dependable, consistent information.
When all items are either very difficult or very easy, all
examinees will obtain either low or high scores, resulting in a
restricted range.
Where:
SEmeas = standard error of measurement
SDx = standard deviation of test scores
rxx= reliability coefficient
Interpretation of Standard Error of Measurement
T’=a + bX
=(1-rxx )X + rxx X
T’=(1-.84) x 70 + .84 x 80
=.16 x 70 + .84 x 80
=11.2 + 67=78.2
The Reliability of Different Scores
• A test user is sometimes interested in
comparing the performance of an examinee
on two different tests or subtests and,
therefore, computes a difference score. An
educational psychologist, for instance, might
calculate the difference between a child's
WISC-IV Verbal and Performance scores.
The Reliability of Difference Scores
When doing so, it is important to keep in mind that the
reliability coefficient for the difference scores can be no larger
than the average of the reliabilities of the two tests or subtests:
Because these two correlations are low, they confirm that the
assertiveness test has discriminant validity. This pattern of
correlation coefficients confirms that the assertiveness test
has construct validity.
A1 rA1A1 (.93)
• The age span for a normative age group can vary from
month to a decade or more depending on the degree to
which the test performance is age dependent.
Grade Norms
• A Grade Norm depicts the level of test
performance for each separate grade in the
normative sample.
• These Norms are particularly useful in school
setting when reporting the achievement levels
of schoolchildren.
• Grade based comparison is preferred to age
based comparison where content areas are
heavily dependent upon grades rather than age.
Local and Sub-Group Norms
• Local and subgroup norms are needed to suit the
specific purpose of a test.
• Local norms are derived from representative local
examinees, as opposed to a national sample.
• Subgroup norms consist of the scores obtained from an
identified subgroup(African, Americans, females)
• As a general rule, whenever an identifiable subgroup
performs appreciably better or worse or test than the
more broadly defined standardization sample, it may be
helpful to construct supplementary subgroup norms.
Issues in Psychological Testing
Social Issues
Professional Issues
Moral Issues
Social Issues
1.Dehumanization.
2.Usefulness of tests.
3.Access to psychological testing services.
Dehumanization
In the process of test administration and scoring, psychologists
dehumanize participants during the test constructions.
Testing is expensive.
2.Using the Optical Mark Reader (OMR) test takers answer the items
filling in a so called lozenge (a small area) on a form. The mark reader is
a machine that recognizes whether the lozenge has been filled or not.
Then, this information can be transformed into test scores.
• The program applied pattern matching rules to statements to figure out its
replies. It is considered the forerunner of thinking machines. Weizenbaum was
shocked that his program was taken seriously by many users, who would open
their hearts to it. He started to think philosophically about the implications
of artificial intelligence and later became one of its leading critics.
C. Proliferation of New Tests
• New tests keep coming out all the time, with no
end in sight. There are many revised and
updated tests, and hundreds of new tests are
being published each year.
• The impetus for developing new tests comes
from:
1. Professional disagreement over the best
strategies for measuring human characteristics,
over the nature of these characteristics, and over
theories about the causes of human behavior.
C. Proliferation of New Tests
(cont.)
From public and professional pressure to use only fair,
accurate, and unbiased instruments.
• Finally, if tests are used, then authors and publishers
of tests stand to profit financially. As long as
someone can make a profit publishing tests, then
new tests will be developed and marketed.
Greater public awareness of the nature and use of psychological tests has led to increasing external
influence on testing. At one time, the public knew little about psychological tests; psychologists played an
almost exclusive role in governing how tests were used. Public awareness has led to an increased demand
for psychological services, including testing services. This demand is balanced by the tendency toward
restrictive legislative and judicial regulations and policies such as the judicial decision that restricts the use
of standard intelligence tests in diagnosing mental retardation. Th ese restrictions originate in real and
imagined public fears. In short, the public seems to be ambivalent about psychological testing,
simultaneously desiring the benefits yet fearing the power they attribute to tests.
As more individuals share the responsibility of encouraging the proper use of tests by becoming informed
of their rights and insisting on receiving them, the probability of misuse and abuse of tests will be reduced.
The commitment of the field of psychology to high ethical standards can be easily seen in the published
guidelines, position papers, and debates that have evolved during the relatively short period beginning in
1947 with the development of formal standards for training in clinical psychology (Shakow, Hilgard,
Kelly, Sanford, & Shaffer, 1947).
Practitioners of psychology, their instructors, and their supervisors show a deep concern for social values
and the dignity of the individual human being. However, the pressure of public interest in psychological
tests has led practitioners to an even greater awareness about safeguarding the rights and dignity of the
individual.
Interrelated with all of these issues is the trend toward greater protection for the public. Nearly every state
has laws that govern the use of psychological tests. Several factors give the public significant protection
against the inherent risks of testing: limiting testing to reduce the chance that unqualified people will use
psychological tests, sensitivity among practitioners to the rights of the individual, relevant court decisions,
and a clearly articulated set of ethical guidelines and published standards for proper test use.
Future Trends
A. Future Prospects for Testing Are Promising
B. The Proliferation of New and Improved Tests Will
Continue
C. Controversy, Disagreement, and Change Will
Continue
D. The Integration of Cognitive Science and Computer
Science Will Lead to Several Innovations in Testing
A. Future Prospects for Testing Are Promising
It is believed that testing has a promising future and that optimism on the integral role that testing has
played in the development and recognition of psychology. Testing is a multibillion-dollar industry, and
even relatively small testing companies can gross millions of dollars per year. With so much at stake,
testing is probably here to stay.
Despite division within psychology concerning the role and value of testing, it remains one of the few
unique functions of the professional psychologist. Psychological testing encompasses not only
traditional but also new and innovative uses-as in cognitive-behavioral assessment, psychophysiology,
evaluation research, organizational assessment, community assessment, and investigations into the
nature of human functioning-one can understand just how important tests are to psychologists. With
this fundamental tie to testing, psychologists remain the undisputed leaders in the field.
It is unlikely that attacks on and dissatisfaction with traditional psychological tests will suddenly
compel psychologists to abandon tests. Instead, psychologists will likely continue to take the lead in
this field to produce better and better tests, and such a direction will benefit psychologists, the field,
and society.
Testing corporations that publish and sell widely used high-stakes standardized tests will no doubt
continue to market their products aggressively. Moreover, tests are used in most institutions-schools,
colleges, hospitals, industry, business, the government, and so forth-and new applications and creative
uses continue to emerge in response to their demands. Tests will not suddenly disappear with nothing to
replace them. Current tests will continue to be used until they are replaced by still better tests, which of
course may be based on totally new ideas. Though current tests may gradually fade from the scene, we
believe psychological testing will not simply survive but will flourish through the 21st century.
B. The Proliferation of New and Improved
Tests Will Continue
The future may see the development of many more tests. E.g., Stanford-Binet and Wechsler tests -
two major intelligence scales are probably about as technically adequate as they will ever be. They
can be improved through minor revisions to update test stimuli and to provide larger and even more
representative normative samples with special norms for particular groups and through additional
research to extend and support validity documentation.
In the coming days, major intelligence scales may or may not be challenged at least once or twice by
similar tests with superior standardization and normative data or with less bias against certain
minorities. A true challenge can come only from a test based on original concepts and a more
comprehensive theoretical rationale than that of the present scales.
Structured personality test like MMPI-2 appears destined to be the premier test of the 21st century.
We had not anticipated the innovative approach of Butcher and colleagues in dealing with the
original MMPI’s inadequate normative sample. Thus, future prospects for the MMPI-2 are indeed
bright.
For certain tests like projective tests, the future may be uncertain. There is serious doubt whether
the Rorschach provides clinically useful information (Hunsley & Bailey, 1999). Some say the
Rorschach is no better than reading tea leaves. Although we would not go this far, it is clear that
proponents of the Rorschach are fighting an uphill battle (Exner, 1999; Weiner, 2003).
The future of the Thematic Apperception Test (TAT) is more difficult to predict. In a projective test,
outdated stimuli are not a devastating weakness because projective stimuli are by nature ambiguous.
Nevertheless, because the TAT stimuli have been revised (Ritzler, Sharkey, & Chudy, 1980) the TAT
may enjoy increased respectability as more data are acquired on the more recent versions.
C. Controversy, Disagreement, and Change
Will Continue
• It doesn’t matter whether the topic is testing or animal learning—disagreement and
controversy are second nature to psychologists. Disagreement brings with it new
data that may ultimately produce some clarification along with brand new
contradictions and battle lines. Psychologists will probably never agree that any
one test is perfect, and change will be a constant characteristic of the fi eld. We
continue to be optimistic because we see the change as ultimately resulting in more
empirical data, better theories, continuing innovations and advances, and higher
standards.
F. The Integration of Cognitive Science and Computer
Science Will Lead to Several Innovations in Testing