Guidance Reviewer

Course Outline
I. Introduction
a. History of Psychological Testing
b. Defining Psychological Testing
c. Basic Concepts of Psychological Testing
II. The Science of Psychological Measurement

a. Norms and Basic Statistics for Testing
b. Reliability and Validity
c. Writing and Evaluating Test Items (Test Construction)
d. Test Administration
III. Types of Psychological Tests and Their Uses

a. Intelligence, Aptitude, Achievement Tests
b. Personality Tests
c. Other Tests
IV. Applications and Issues in Psychological Testing

a. Application of Psychological Testing in Various Settings
b. Testing and the Law
c. Ethics and the Future of Psychological Testing
Unit I Introduction
HISTORY OF PSYCHOLOGICAL TESTING
1000 BC Testing in Chinese civil service

1850-1900 Civil service examinations in the United States
1900-1920 Development of individual and group tests of cognitive
ability, development of psychometric theory
1920-1940 Development of factor analysis, development of
projective tests and standardized personality inventories
1940-1960 Development of vocational interest measures,
standardized measures of psychopathology
1960-1980 Development of item response theory,
neuropsychological testing
1980-present Large-scale implementation of computerized adaptive
tests
i. DEFINING PSYCHOLOGICAL TESTING
PSYCHOMETRICS
The branch of psychology concerned with the quantification and
measurement of mental attributes, behavior, performance, and the like, as well as
with the design, analysis, and improvement of the tests, questionnaires, and other
instruments used in such measurement.
PSYCHOLOGICAL TESTING
The process of measuring psychology-related variables by means of devices
or procedures designed to obtain a sample of behavior.
PSYCHOLOGICAL ASSESSMENT
Gathering and integration of psychology-related data for the purpose of making a
psychological evaluation that is accomplished through the use of tools such as tests,
interviews, case studies, behavioral observation, and specially designed apparatuses
and measurement procedures.
ii. BASIC CONCEPTS OF PSYCHOLOGICAL TESTING
TESTING
 Objective
Typically, to obtain some gauge, usually numerical in nature, with regard to
an ability or attribute.
 Process
Testing may be individual or group in nature. After test administration, the
tester will typically add up “the number of correct answers or the number of
certain types of responses with little if any regard for the how or mechanics
of such content” (Maloney & Ward, 1976, p. 39).
 Role of Evaluator
The tester is not key to the process; practically speaking, one tester may be
substituted for another tester without appreciably affecting the evaluation.
 Skill of Evaluator
Testing typically requires technician-like skills in terms of administering and
scoring a test as well as in interpreting a test result.
 Outcome
Typically, testing yields a test score or series of test scores.
ASSESSMENT
 Objective
Typically, to answer a referral question, solve a problem, or arrive at a
decision through the use of tools of evaluation.
• Process
Assessment is typically individualized. In contrast to testing, assessment more
typically focuses on how an individual processes rather than simply the
results of that processing (Cohen and Swerdlik, 2009).
• Role of Evaluator
The assessor is key to the process of selecting tests and/or other tools of
evaluation as well as in drawing conclusions from the entire evaluation.
• Skill of Evaluator
Assessment typically requires an educated selection of tools of evaluation,
skill in evaluation, and thoughtful organization and integration of data.
• Outcome
Typically, assessment entails a logical problem-solving approach that brings
to bear many sources of data designed to shed light on a referral question.
Unit II THE SCIENCE OF PSYCHOLOGICAL MEASUREMENT
a) PSYCHOLOGICAL TESTING NORMS & BASIC STATISTICS

STATISTICS
A branch of mathematics dedicated to organizing, depicting, summarizing,
analysing, and otherwise dealing with numerical data.
WHY DO WE NEED STATISTICS?

Statistics are used for purposes of description and for making
inferences, which are logical deductions about events that cannot be observed
directly.
 Descriptive statistics
are methods used to provide a concise description of a collection of
quantitative information; numbers and graphs used to describe, condense, or
represent data.
 Inferential statistics
are methods used to make inferences from observations of a small group of
people known as a sample to a larger group of individuals known as a
population; to estimate population values based on sample values or to test
hypotheses.
TYPES OF SCALES
 Nominal scales
Are really not scales at all; their only purpose is to name objects. Nominal
scales are used when the information is qualitative rather than quantitative.
 Ordinal scale
This scale allows you to rank individuals or objects but not to say anything
about the meaning of the differences between the ranks.
 Interval scale
Has the property of magnitude and equal intervals but not absolute 0.
 Ratio scale
A scale that has all three properties (magnitude, equal intervals, and an
absolute 0); any mathematical operation is permissible.
FREQUENCY DISTRIBUTION
A distribution of scores summarizes the scores for a group of individuals. The
frequency distribution displays scores on a variable or a measure to reflect how
frequently each value was obtained.
MEASURES OF CENTRAL TENDENCY
Mean
 arithmetic average
Median
 the value that divides a distribution that has been arranged in order of
magnitude into two halves
Mode
 most frequently occurring value in a distribution, is useful primarily when
dealing with qualitative or categorical variables
MEASURES OF VARIABILITY
Range
 distance between two extreme points—the highest and lowest values in a
distribution
Variance
 the sum of the squared differences or deviations between each value (X)
in a distribution and the mean of that distribution (M), divided by N
Standard deviation
 the square root of the variance; it provides a single value that is
representative of the individual differences or deviations in a data set
computed from a common reference point, namely, the mean
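The measures of central tendency and variability defined above can be sketched in a few lines of Python. The scores are hypothetical, and the variance follows the divide-by-N definition given in these notes (the population formula); this is an illustration, not a prescribed procedure.

```python
from statistics import mean, median, multimode, pvariance, pstdev

# Hypothetical raw scores for ten test takers (illustrative data only).
scores = [78, 82, 85, 85, 88, 90, 90, 90, 94, 98]

m = mean(scores)                         # arithmetic average
md = median(scores)                      # value that splits the ordered scores in half
modes = multimode(scores)                # most frequently occurring value(s)
score_range = max(scores) - min(scores)  # highest value minus lowest value

# Variance as defined above: sum of squared deviations from the mean, divided by N.
variance = sum((x - m) ** 2 for x in scores) / len(scores)
sd = variance ** 0.5                     # standard deviation = square root of the variance

print("mean:", m, "median:", md, "mode(s):", modes, "range:", score_range)
print("variance:", variance, "standard deviation:", sd)
# Library equivalents of the same population formulas, for comparison.
print("pvariance:", pvariance(scores), "pstdev:", pstdev(scores))
```

Dividing by N - 1 instead of N (statistics.variance and statistics.stdev) gives the sample estimates; which to use depends on whether the scores are treated as a whole population or as a sample.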
THE NORMAL CURVE

 also known as the bell curve. Its baseline, equivalent to the X-axis of the
distribution, shows the standard deviation (σ) units; its vertical axis, or
ordinate, usually does not need to be shown because the normal curve is
not a frequency distribution of data but a mathematical model of an ideal
or theoretical distribution.
PROPERTIES OF THE NORMAL CURVE MODEL
 It is bell shaped, as its nickname indicates.

 It is bilaterally symmetrical, which means its two halves are identical (if we
split the curve into two, each half contains 50% of the area under the curve).
 It has tails that approach but never touch the baseline, and thus its limits
extend to ± infinity (±∞), a property that underscores the theoretical and
mathematical nature of the curve.
 It is unimodal; that is, it has a single point of maximum frequency or
maximum height.
 It has a mean, median, and mode that coincide at the center of the
distribution because the point where the curve is in perfect balance, which is
the mean, is also the point that divides the curve into two equal halves, which
is the median, and the most frequent value, which is the mode.
USES OF THE NORMAL CURVE
 The normal curve model is used descriptively to locate the position of scores
that come from distributions that are normal. In a process known as
normalization, the normal curve is also used to make distributions that are not
normal—but approximate the normal—conform to the model, in terms of the
relative positions of scores.
 The normal curve model is applied inferentially in the areas of
(a) Reliability, to derive confidence intervals to evaluate obtained
scores and differences between obtained scores, and
(b) Validity, to derive confidence intervals for predictions or estimates
based on test scores.
SHAPE OF DISTRIBUTIONS
Kurtosis
 refers to the flatness or peakedness of a distribution
Platykurtic
 distributions have the greatest amount of dispersion, manifested in tails
that are more extended, and leptokurtic distributions have the least. The
normal distribution is mesokurtic, meaning that it has an intermediate
degree of dispersion.
Skewness (Sk)
 of a distribution refers to a lack of symmetry. As we have seen, the normal
distribution is perfectly symmetrical, with Sk = 0; its bulk is in the middle
and its two halves are identical.
A skewed distribution is asymmetrical. If most of the values are at the top end
of the scale and the longer tail extends toward the bottom, the distribution is
negatively skewed (Sk < 0); on the other hand, if most of the values are at the
bottom and the longer tail extends toward the top of the scale, the distribution is
positively skewed (Sk > 0).
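The sign convention for skewness described above can be checked numerically. The sketch below assumes one common moment-based index (the mean of the cubed z-scores); the notes do not say which formula for Sk is intended, and the data are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: the mean of cubed z-scores (population formulas)."""
    n = len(values)
    m = sum(values) / n
    sd = (sum((x - m) ** 2 for x in values) / n) ** 0.5
    return sum(((x - m) / sd) ** 3 for x in values) / n

# Hypothetical score sets (illustrative only).
symmetric = [1, 2, 3, 4, 5]
long_upper_tail = [1, 1, 2, 2, 3, 10]   # most values low, longer tail toward the top
long_lower_tail = [1, 8, 9, 9, 10, 10]  # most values high, longer tail toward the bottom

print("symmetric:", skewness(symmetric))              # approximately 0
print("positively skewed:", skewness(long_upper_tail))  # Sk > 0
print("negatively skewed:", skewness(long_lower_tail))  # Sk < 0
```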
ESSENTIALS OF CORRELATION AND REGRESSION

The degree of relationship between two variables is indicated by the number
in the coefficient, whereas the direction of the relationship is indicated by the sign. A
correlation coefficient of –0.80, for example, indicates exactly the same degree of
relationship as a coefficient of +0.80. Whether positive or negative, a correlation is
low to the extent that its coefficient approaches zero.
Correlation, even if high, does not imply causation. If two variables, X and Y,
are correlated, it may be because X causes Y, because Y causes X, or because a third
variable, Z, causes both X and Y. This truism is also frequently ignored; moderate to
high correlation coefficients are often cited as though they were proof of a causal
relationship between the correlated variables.
High correlations allow us to make predictions. While correlation does not
imply causation, it does imply a certain amount of common or shared variance.
Knowledge of the extent to which things vary in relation to one another is extremely
useful. Through regression analyses we can use correlational data on two or more
variables to derive equations that allow us to predict the expected values of a
dependent variable (Y), within a certain margin of error, based on the known values
of one or more independent variables (X1, X2 , . . . Xk ), with which the dependent
variable is correlated.
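As a minimal sketch of the ideas above, the following computes a Pearson correlation coefficient and a one-predictor least-squares regression line, then uses the line to predict Y from a new X. The data and the function names (pearson_r, fit_line) are illustrative assumptions, not part of the source notes.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def fit_line(x, y):
    """Least-squares slope and intercept for predicting Y from X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Hypothetical data: test scores (X) and a later criterion rating (Y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)
predicted = intercept + slope * 6   # expected Y for a new X of 6

print("r:", r, "shared variance (r squared):", r ** 2)
print("Y' =", intercept, "+", slope, "* X; prediction for X = 6:", predicted)
```

Reversing the sign of every Y value flips the sign of r but leaves its magnitude unchanged, which is the point made above about -0.80 and +0.80.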
RELIABILITY AND VALIDITY
RELIABILITY
A good test or, more generally, a good measuring tool or procedure is reliable. The
criterion of reliability involves the consistency of the measuring tool: the precision with
which the test measures and the extent to which error is present in measurements. In
theory, the perfectly reliable measuring tool consistently measures in the same way. As you
might expect, however, reliability is a necessary but not sufficient element of a good test. In
addition to being reliable, tests must be reasonably accurate. In the language of
psychometrics, tests must be valid.
In its broadest sense, error refers to the component of the observed test score that
does not have to do with the test taker’s ability. If we use X to represent an observed score,
T to represent a true score, and E to represent error, then the fact that an observed score
equals the true score plus error may be expressed as follows:
X=T+E
A statistic useful in describing sources of test score variability is the variance (σ²),
the standard deviation squared. Variance from true differences is true variance, and variance from irrelevant,
random sources is error variance.
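The relationship between X = T + E and the two kinds of variance can be written out as follows. This is the standard classical test theory decomposition, and it assumes that true scores and errors are uncorrelated; the subscripts tr (true) and e (error) and the symbol r_xx for the reliability coefficient are notation chosen here, not taken from the notes.

```latex
\sigma^2 = \sigma^2_{tr} + \sigma^2_{e},
\qquad
r_{xx} = \frac{\sigma^2_{tr}}{\sigma^2}
```

Read this way, reliability is the proportion of total variance that is true variance: the larger the error variance, the lower the reliability.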
SOURCES OF ERROR VARIANCE
Test construction. One source of variance during test construction is item
sampling or content sampling, terms that refer to variation among items within a test as well
as to variation among items between tests. From the perspective of a test creator, a
challenge in test development is to maximize the proportion of the total variance that is true
variance and to minimize the proportion of the total variance that is error variance.
Test administration. Sources of error variance that occur during test
administration may influence the test taker’s attention or motivation. The test taker’s
reactions to those influences are the source of one kind of error variance. Examples of
untoward influences during administration of a test include factors related to the test
environment: the room temperature, the level of lighting, and the amount of ventilation and
noise, for instance.

Test scoring and interpretation. The advent of computer scoring and a growing
reliance on objective, computer-scorable items virtually have eliminated error variance
caused by scorer differences in many tests. If subjectivity is involved in scoring, then the
scorer (or rater) can be a source of error variance.
Other sources of error. Respondents may systematically underreport or overreport. Females, for example, may underreport abuse because of
fear, shame, or social desirability factors and overreport abuse if they are seeking help.
Males may underreport abuse because of embarrassment and social desirability factors and
overreport abuse if they are attempting to justify the report.
RELIABILITY ESTIMATES
 Test-Retest Reliability Estimates
One way of estimating the reliability of a measuring instrument is by using
the same instrument to measure the same thing at two points in time. In
psychometric parlance, this approach to reliability evaluation is called the
test-retest method, and the result of such an evaluation is an estimate of test-retest
reliability.
 Parallel-Forms and Alternate-Forms Reliability Estimates
The degree of the relationship between various forms of a test can be
evaluated by means of an alternate-forms or parallel-forms coefficient of reliability,
which is often termed the coefficient of equivalence. Parallel forms of a test exist
when, for each form of the test, the means and the variances of observed test scores
are equal. In theory, the means of scores obtained on parallel forms correlate
equally with the true score. More practically, scores obtained on parallel tests
correlate equally with other measures.
Alternate forms are simply different versions of a test that have been
constructed so as to be parallel. Although they do not meet the requirements for the
legitimate designation “parallel,” alternate forms of a test are typically designed to be
equivalent with respect to variables such as content and level of difficulty. Obtaining
estimates of alternate-forms reliability and parallel-forms reliability is similar in two
ways to obtaining an estimate of test-retest reliability:
1. Two test administrations with the same group are required, and
2. test scores may be affected by factors such as motivation, fatigue, or
intervening events such as practice, learning, or therapy (although not as much as
when the same test is administered twice).
An additional source of error variance, item sampling, is inherent in the
computation of an alternate- or parallel-forms reliability coefficient. Test takers may
do better or worse on a specific form of the test not as a function of their true ability
but simply because of the particular items that were selected for inclusion in the test.
 Split-Half Reliability Estimates
An estimate of split-half reliability is obtained by correlating two pairs of
scores obtained from equivalent halves of a single test administered once. The
computation of a coefficient of split-half reliability generally entails three steps:
Step 1. Divide the test into equivalent halves.
Step 2. Calculate a Pearson r between scores on the two halves of the test.
Step 3. Adjust the half-test reliability using the Spearman-Brown formula.
Simply dividing the test in the middle is not recommended because it’s likely this
procedure would spuriously raise or lower the reliability coefficient. Different amounts of
fatigue for the first as opposed to the second part of the test, different amounts of test
anxiety, and differences in item difficulty as a function of placement in the test are all
factors to consider.
 The Spearman-Brown formula

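The Spearman-Brown formula allows a test developer or user to estimate
internal consistency reliability from a correlation of two halves of a test. The general
Spearman-Brown (r_SB) formula is
r_SB = n r_xy / (1 + (n - 1) r_xy)
where r_SB is the reliability adjusted by the formula, r_xy is the Pearson r in the
original-length test, and n is the number of items in the revised version divided by the
number of items in the original version. For a split-half estimate n = 2, so
r_SB = 2 r_xy / (1 + r_xy).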
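The three split-half steps can be sketched as follows. The sketch assumes an odd-even split as one way of forming roughly equivalent halves (rather than splitting at the middle) and uses the general Spearman-Brown formula r_SB = n r / (1 + (n - 1) r) with n = 2. The response data and the function names are hypothetical.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_brown(r, n=2):
    """Estimated reliability of a test n times as long as the one that produced r."""
    return n * r / (1 + (n - 1) * r)

def split_half_reliability(item_scores):
    """item_scores: one list of item scores per test taker."""
    # Step 1: divide the test into halves (odd-numbered vs. even-numbered items).
    odd = [sum(person[0::2]) for person in item_scores]
    even = [sum(person[1::2]) for person in item_scores]
    # Step 2: correlate scores on the two halves.
    r_half = pearson_r(odd, even)
    # Step 3: adjust the half-test correlation to full test length.
    return r_half, spearman_brown(r_half, n=2)

# Hypothetical right/wrong (1/0) responses of five test takers to six items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

r_half, r_full = split_half_reliability(responses)
print("half-test r:", r_half, "Spearman-Brown adjusted:", r_full)
```

Because a half-length test is less reliable than the full test, the adjusted coefficient is higher than the raw half-test correlation whenever that correlation is between 0 and 1.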
 Other Methods of Estimating Internal Consistency
Inter-item consistency refers to the degree of correlation among all the
items on a scale. A measure of inter-item consistency is calculated from a
single administration of a single form of a test. An index of inter-item
consistency, in turn, is useful in assessing the homogeneity of the test. Tests
are said to be homogeneous if they contain items that measure a single trait.
In contrast to test homogeneity, heterogeneity describes the degree to
which a test measures different factors. A heterogeneous (or
nonhomogeneous) test is composed of items that measure more than one
trait. A test that assesses knowledge only of color television repair skills could
be expected to be more homogeneous in content than a test of electronic
repair.
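One simple way to put a number on inter-item consistency is the average of the correlations between every pair of items. The sketch below assumes that index (the notes do not name a specific one) and uses hypothetical ratings; a higher average suggests a more homogeneous scale.

```python
from itertools import combinations

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mean_inter_item_r(item_scores):
    """item_scores: one list of item scores per respondent (single administration)."""
    items = list(zip(*item_scores))        # transpose: one tuple of scores per item
    pairs = list(combinations(items, 2))   # every pair of items
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)

# Hypothetical 1-5 ratings by five respondents on three items.
homogeneous = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 5]]
heterogeneous = [[1, 5, 2], [2, 1, 5], [3, 4, 1], [4, 2, 4], [5, 3, 3]]

print("homogeneous scale:", mean_inter_item_r(homogeneous))
print("heterogeneous scale:", mean_inter_item_r(heterogeneous))
```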
MEASURES OF INTER-SCORER RELIABILITY
Variously referred to as scorer reliability, judge reliability, observer reliability,
and inter-rater reliability, inter-scorer reliability is the degree of agreement or
consistency between two or more scorers (or judges or raters) with regard to a
particular measure. If, for example, the problem is a lack of clarity in scoring criteria,
then the remedy might be to rewrite the scoring criteria section of the manual to
include clearly written scoring rules.
USING AND INTERPRETING A COEFFICIENT OF RELIABILITY
We have seen that, with respect to the test itself, there are basically three
approaches to the estimation of reliability:
1. test-retest
2. alternate or parallel forms, and
3. internal or inter-item consistency
Another question that is linked in no trivial way to the purpose of the test is,
“How high should the coefficient of reliability be?” Perhaps the best “short answer”
to this question is: “On a continuum relative to the purpose and importance of the
decisions to be made on the basis of scores on the test”.
THE PURPOSE OF THE RELIABILITY COEFFICIENT

 If a specific test of employee performance is designed for use at various
times over the course of the employment period, it would be reasonable to
expect the test to demonstrate reliability across time. It would thus be
desirable to have an estimate of the instrument’s test-retest reliability.
 For a test designed for a single administration only, an estimate of
internal consistency would be the reliability measure of choice.
 It is possible, for example, that a portion of the variance could be
accounted for by transient error, a source of error attributable to
variations in the test taker’s feelings, moods, or mental state over time.
VALIDITY
Validity, as applied to a test, is a judgment or estimate of how well a test
measures what it purports to measure in a particular context. More specifically, it is
a judgment based on evidence about the appropriateness of inferences drawn from
test scores.
Validation
is the process of gathering and evaluating evidence about validity. Both the test
developer and the test user may play a role in the validation of a test for a specific purpose.
Such local validation studies may yield insights regarding a particular population of test
takers as compared to the norming sample described in a test manual. Local validation
studies are absolutely necessary when the test user plans to alter in some way the format,
instructions, language, or content of the test.
One way measurement specialists have traditionally conceptualized validity is
according to three categories:
 content validity
 criterion-related validity
 construct validity
In this classic conception of validity, referred to as the trinitarian view, it might be
useful to visualize construct validity as being “umbrella validity” since every other variety of
validity falls under it. Three approaches to assessing validity, associated respectively with
content validity, criterion-related validity, and construct validity, are:
1. scrutinizing the test’s content
2. relating scores obtained on the test to other test scores or other measures
3. executing a comprehensive analysis of
a. how scores on the test relate to other test scores and measures
b. how scores on the test can be understood within some theoretical framework
for understanding the construct that the test was designed to measure.
FACE VALIDITY
- relates more to what a test appears to measure to the person being tested
than to what the test actually measures. Face validity is a judgment
concerning how relevant the test items appear to be.

- In contrast to judgments about the reliability of a test and judgments about
the content, construct, or criterion-related validity of a test, judgments about
face validity are frequently thought of from the perspective of the test taker,
not the test user.
- A test’s lack of face validity could contribute to a lack of confidence in the
perceived effectiveness of the test—with a consequential decrease in the test
taker’s cooperation or motivation to do his or her best.
CONTENT VALIDITY
 Content validity describes a judgment of how adequately a test samples
behavior representative of the universe of behavior that the test was
designed to sample.
 From the pooled information (along with the judgment of the test
developer), a test blueprint emerges for the “structure” of the evaluation; that
is, a plan regarding the types of information to be covered by the items, the
number of items tapping each area of coverage, the organization of the
items in the test, and so forth.
THE QUANTIFICATION OF CONTENT VALIDITY
The measurement of content validity is important in employment settings,
where tests used to hire and promote people are carefully scrutinized for their
relevance to the job, among other factors. Courts often require evidence that
employment tests are work related.

Lawshe. One method of measuring content validity, developed by C. H.
Lawshe, is essentially a method for gauging agreement among raters or judges
regarding how essential a particular item is. Lawshe proposed that each rater
respond to the following question for each item: “Is the skill or knowledge measured
by this item
1. essential,
2. useful but not essential, or
3. not necessary
to the performance of the job?”
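The notes above do not give a formula, but Lawshe's method is usually summarized by the content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating the item "essential" and N is the total number of panelists. The following is a minimal sketch with a hypothetical panel; the function name and data are assumptions.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (n_e - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 raters; values are counts of "essential" ratings per item.
essential_counts = {"item_1": 10, "item_2": 5, "item_3": 2}

cvr = {item: content_validity_ratio(n, 10) for item, n in essential_counts.items()}
for item, value in cvr.items():
    print(item, value)
```

The ratio is positive when more than half of the panel rates the item essential, zero when exactly half do, and negative when fewer than half do.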
CULTURE AND THE RELATIVITY OF CONTENT VALIDITY
As incredible as it may sound to Westerners, students in Bosnia and
Herzegovina are taught different versions of history, art, and language depending
upon their ethnic background. Such a situation illustrates in stark relief the influence
of culture on what is taught to students as well as on aspects of test construction,
scoring, interpretation, and validation.
CRITERION-RELATED VALIDITY
Criterion-related validity is a judgment of how adequately a test score can be
used to infer an individual’s most probable standing on some measure of interest, the
measure of interest being the criterion. Concurrent validity is an index of the degree
to which a test score is related to some criterion measure obtained at the same time
(concurrently). Predictive validity is an index of the degree to which a test score
predicts some criterion measure.
CONCURRENT VALIDITY
If test scores are obtained at about the same time that the criterion measures
are obtained, measures of the relationship between the test scores and the criterion
provide evidence of concurrent validity. Statements of concurrent validity indicate
the extent to which test scores may be used to estimate an individual’s present
standing on a criterion.
A test with satisfactorily demonstrated concurrent validity may therefore be
appealing to prospective users because it holds out the potential of savings of
money and professional time.
PREDICTIVE VALIDITY
Measures of the relationship between the test scores and a criterion measure
obtained at a future time provide an indication of the predictive validity of the test;
that is, how accurately scores on the test predict some criterion measure.
Test scores may be obtained at one time and the criterion measures obtained
at a future time, usually after some intervening event has taken place. The
intervening event may take varied forms, such as training, experience, therapy,
medication, or simply the passage of time.
THE VALIDITY COEFFICIENT
 The validity coefficient is a correlation coefficient that provides a measure of
the relationship between test scores and scores on the criterion measure.
INCREMENTAL VALIDITY
 an aspect of validity that refers to what an additional assessment or
predictive variable can add to the information provided by existing
assessments or variables
CONSTRUCT VALIDITY
 Construct validity is a judgment about the appropriateness of inferences
drawn from test scores regarding individual standings on a variable called a
construct.
C. WRITING AND EVALUATING TEST ITEMS (TEST CONSTRUCTION)
Importance of Well-Constructed Tests
The importance of well-constructed tests in the field of psychology and education is
substantial. Here are key reasons highlighting their significance:
1. Reliability
2. Validity
3. Fairness
4. Objectivity
5. Useful Feedback
6. Efficiency
7. Predictive Ability
8. Applicability Across Settings
9. Legal and Ethical Compliance
10. Continuous Improvement
11. Credibility and Trustworthiness
In summary, well-constructed tests are fundamental in providing accurate,
fair, and reliable assessments in various fields, contributing to informed decision-
making, effective learning, and personal and professional development.
Key Principles in Test Writing

By adhering to these key principles, test writers can contribute to the creation
of fair, valid, and reliable assessments that effectively measure the intended
knowledge, skills, or behaviors.
 Clarity. Test items and instructions should be clear and easily understood by
the test-takers. Ambiguous or confusing language can lead to
misinterpretation and inaccurate responses.
 Relevance. Test items should directly measure the targeted knowledge,
skills, or behaviors. Irrelevant or off-topic questions can undermine the
validity of the test.
 Fairness. Tests should be fair to all individuals, regardless of their
background, culture, or socioeconomic status. Avoiding language or content
biases ensures equitable assessment.
 Objectivity. Test items and scoring should minimize subjective judgments.
Objectivity in scoring enhances reliability, ensuring that different scorers
arrive at similar results.
 Consistency. Test items should be consistent in terms of difficulty and
relevance. Consistency helps maintain reliability, allowing for accurate
comparisons between different test-takers.
 Variety. Include a variety of question types (e.g., multiple-choice, short
answer, essay) to assess different cognitive skills. This variety provides a
more comprehensive evaluation of the test-taker's abilities.

 Avoiding Ambiguity. Test items should be free from ambiguity or
vagueness. Ambiguous questions can lead to confusion and varied
interpretations, compromising the validity of responses.
 Avoiding Double-Barreled Questions. Questions should focus on a single
idea or concept to avoid confusion. Double-barreled questions that address
multiple issues can lead to unclear or inaccurate responses.
GUIDELINES IN WRITING TEST ITEMS (KAPLAN & SACCUZZO, PSYCHOLOGICAL
TESTING, 9TH EDITION)
1. Define clearly what you want to measure. To do this, use substantive theory as a
guide and try to make items as specific as possible.
2. Generate an item pool. Theoretically, all items are randomly chosen from a
universe of item content. In practice, however, care in selecting and developing
items is valuable. Avoid redundant items. In the initial phases, you may want to
write three or four items for each one that will eventually be used on the test or
scale.
3. Avoid exceptionally long items. Long items are often confusing or misleading.
4. Keep the level of reading difficulty appropriate for those who will complete the
scale.
5. Avoid “double-barreled” items that convey two or more ideas at the same time.
For example, consider an item that asks the respondent to agree or disagree with
the statement, “I vote Democratic because I support social programs.” There are
two different statements with which the person could agree: “I vote Democratic”
and “I support social programs.”

6. Consider mixing positively and negatively worded items. Sometimes, respondents
develop the “acquiescence response set.” This means that the respondents will tend
to agree with most items. To avoid this bias, you can include items that are worded
in the opposite direction.
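When positively and negatively worded items are mixed as suggested in guideline 6, the negatively worded items have to be reverse-scored before totals are computed. The sketch below assumes a 1-5 agreement scale and hypothetical item labels; it is one straightforward way to do this, not a procedure taken from the source.

```python
def reverse_score(response, low=1, high=5):
    """Flip a rating on a low..high scale (1 becomes 5, 2 becomes 4, and so on)."""
    return low + high - response

def total_score(responses, reversed_items, low=1, high=5):
    """responses: dict of item id -> rating; reversed_items: ids worded in the opposite direction."""
    return sum(
        reverse_score(rating, low, high) if item in reversed_items else rating
        for item, rating in responses.items()
    )

# Hypothetical 4-item scale in which "q2" and "q4" are negatively worded.
# This respondent agrees strongly with every item (an acquiescence response set).
answers = {"q1": 5, "q2": 5, "q3": 5, "q4": 5}

total = total_score(answers, reversed_items={"q2", "q4"})
print("total after reverse scoring:", total)
```

With reverse scoring, a respondent who simply agrees with everything lands near the middle of the possible range instead of at the top, so acquiescence no longer inflates the total.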
ITEM FORMATS
Dichotomous Format
Description: This format presents items with two response options, typically
"yes/no" or "true/false." It is commonly used for straightforward and clear-cut
assessments.
Polytomous Format
Description: Unlike dichotomous, polytomous formats have more than two
response options for each item. It allows for a graded response, providing a range of
choices.
Likert Format
Description: Likert scales involve presenting a statement and asking
respondents to indicate their level of agreement or disagreement on a scale. It is
widely used in attitudinal assessments.
Category Format
Description: This format involves categorizing items based on predefined
criteria. Respondents choose the category that best fits their response.
Checklists and Q-sorts

Description: A checklist involves marking items on a list that are present or
observed. It's a simple way to record the presence or absence of specific behaviors
or characteristics.
TEST ADMINISTRATION
The Relationship between Examiner and Test Taker
Both the behavior of the examiner and the examiner’s relationship to the test taker can affect test scores.
Rapport between the examiner and the test taker can influence results.
For younger children, a familiar examiner may make a difference.
Fuchs and Fuchs found that test performance was approximately .28 standard deviation
(roughly 4 IQ points) higher when the examiner was familiar with the test taker than when
not.
Familiarity with the test taker, and perhaps preexisting notions about the test taker’s ability,
can either positively or negatively bias test results.
Attitudinal surveys: respondents may give the response that they perceive to be expected
by the interviewer.
Rapport might be influenced by subtle processes such as the level of performance expected
by the examiner.
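The conversion in the Fuchs and Fuchs finding above (.28 standard deviation, roughly 4 IQ points) can be checked with one line of arithmetic, assuming an IQ scale with a standard deviation of 15. That value is an assumption here; it is the convention for most deviation IQ scales, but the notes do not state it.

```latex
0.28\ \text{SD} \times 15\ \text{IQ points per SD} = 4.2 \approx 4\ \text{IQ points}
```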
THE RACE OF THE TESTER
Some groups feel that their children should not be tested by anyone except a member of
their own race.
According to Sattler, there is little evidence that the race of the examiner significantly affects
intelligence test scores.
The race of the examiner has nonsignificant effects on test performance for both African
American and white children.
These results were obtained for both the Stanford-Binet scale and the Peabody Picture Vocabulary
Test.
Few studies have shown an effect attributable to the race of the examiner: only 4 of
29 studies found one.
Procedures for properly administering an IQ test are very specific, however. Regardless of
race, test administrators should act almost identically.
Deviation from these procedures might produce differences in performance that are attributed to the examiner’s
race.
Sattler has shown that the race of the examiner affects the scores in some situations.
Examiner effects tend to increase when examiners are given more discretion about the use
of the tests.
TRAINING OF TEST ADMINISTRATORS
Different assessment procedures require different levels of training.
Many behavioral assessment procedures require training and evaluation but not a formal
degree or diploma.
The Structured Clinical Interview for DSM-IV (SCID) is used for psychiatric diagnosis.
SCID users are licensed psychiatrists or psychologists with additional training on the test.
There are no standardized protocols for training people to administer complicated tests such as the
Wechsler Adult Intelligence Scale-Revised.
EXPECTANCY EFFECTS / ROSENTHAL EFFECTS
A well-known line of research in psychology has shown that data sometimes can be affected
by what an experimenter expects to find.
Results show that subjects actually provide data that confirm the experimenter’s
expectancies.
In a study in Israel, women supervisors were told that some women officer cadets had
exceptional potential. The cadets were selected randomly rather than on the basis of any
evidence. No expectancy effect was shown.
A follow-up study showed that expectancy effects appear for men and women supervised by
men but not for women led by women.
Expectancy effects exist in some but not all situations.
Expectancies shape our judgments.
Two Aspects of Expectancy Effect with Standardized Tests
Rosenthal’s expectancy effects were obtained even when experimenters followed standardized
scripts.
- They may come from subtle nonverbal communication between experimenter and subject.
- The experimenter may not even notice.
Expectancy effects have a small, subtle effect on scores and occur in some situations but not
others.
- Careful studies on particular tests are needed.
The expectancy effect may influence intelligence test scoring.
The expectancy effect can also occur for non-ambiguous responses.
Studies of expectancies in test administrators (who give tests rather than merely score them)
have yielded somewhat inconsistent results: some found an expectancy effect, and some found
none.
In spite of these inconsistent results, you should pay attention to the potentially biasing
effect of expectancy.
Rosenthal’s critics do not deny the possibility of this effect.
EFFECT OF REINFORCING RESPONSES
Positive reinforcement is a process that strengthens the likelihood of a particular
response by adding a stimulus after the behavior is performed.
Negative reinforcement also strengthens the likelihood of a particular response, but by
removing an undesirable consequence.
- Several studies show that reward can significantly affect test performance.
- Reinforcement and feedback guide the examinee toward a preferred response.
- Random reinforcement destroys the accuracy of performance and decreases the
motivation to respond (Eisenberger & Cameron, 1998).
Computer-assisted Test Administration
Interactive testing involves the presentation of test items on a computer terminal or
personal computer and the automatic recording of test responses. The computer can also be
programmed to instruct the test taker and to provide instruction when parts of the testing
procedure are not clear.
As early as 1970, Cronbach recognized the value of computers as test administrators. Here
are some of the advantages that computers offer:
- excellence of standardization,
- individually tailored sequential administration,
- precision of timing responses,
- release of human testers for other duties,
- patience (test taker not rushed), and control of bias.
Examples of CATs
Conventional Testing - examinees receive the same test questions in the same order,
usually one question at a time.
Branched or Response-Contingent Testing - a problem situation is presented to the
examinee with a number of alternatives.
Sequential Testing - typically used to make a classification decision (e.g., to hire or not
to hire, to graduate or not to graduate, or whether someone is or is not depressed) using
one or more prespecified cutoff scores.
Subject Variables
Subject variables are characteristics that vary across participants and cannot be manipulated
by the person administering the test. They are often a serious source of error.
Illness affects test scores. When you have a cold or the flu, you might not perform as well
as when you are feeling well. Many variations in health status affect performance in
behavior and in thinking (Kaplan, 2004).
Medical drugs are now evaluated according to their effects on the cognitive process (Spilker,
1996).
Behavioral Assessment Methodology
Facts about human behavior:
- Good morning and good night text messages activate the part of the brain
responsible for happiness.
- Feeling ignored causes the same chemical effect as that of an injury.
- Some of us are actually afraid of being too happy because of the fear that something
tragic might happen next.
Behavioral traits are the observable patterns of behavior that are relatively consistent
across various situations.
Behavioral assessment focuses on the interactions between situations and behaviors for the
purpose of effecting behavioral change.
Types of Behavioral Assessment
- Personality Assessments
- Situational Judgment Tests (SJTs)
- Behavioral Interviews
- Work Sample
Pros and Cons of Behavioral Assessments
Pros: Behavioral assessments can provide objective data to help hiring managers evaluate
their candidates.
Cons: While tests can help reduce bias in the hiring process, they are not immune to bias
themselves. Determining which traits are “valuable” or “risky” is not, itself, an objective
process.
Predictive Value
Pros: Behavioral assessments can be effective in predicting job performance and identifying
candidates who are likely to succeed in the role.
Cons: These tests are not foolproof. Why does an employee succeed at one company but
fail at another? The employee is the same but the company’s product, support, culture,
territory, etc. (and the economy in general) all serve to complicate employee success.
Time
Pros: Behavioral assessments can help filter out candidates who are not a good fit for the
job, saving time and resources in the hiring process.
Cons: On the other hand, these assessments take time to administer and evaluate, which can
bog down the hiring process.
Bias
Pros: Using behavioral assessments can help ensure that all candidates are evaluated on
the same criteria, which can help reduce bias and ensure fairness in the hiring process.
Cons: No assessment can be truly free from bias. It’s important for hiring managers to be
aware of any potential biases and to use assessments in conjunction with other evaluation
methods.
Costs/Benefits
Pros: Behavioral assessments can provide insight into a candidate’s work style,
communication skills, and problem-solving abilities, which can help managers make more
informed hiring decisions.
Cons: Some tests can be expensive, which may be a barrier for smaller companies or those
with limited budgets.
Reactivity
“observes the observers”
Reactivity occurs when individuals change their behavior because they are aware that it is
being or will be measured. Their behavior might become more positive or negative, depending
on the situation and the people involved.
Drift
Drift refers to the tendency for observers in behavioral studies to stray from the definitions
they learned during training and to develop their own idiosyncratic definitions of behaviors
despite observing the same behavior.
Expectancies
Another potential source of bias is the expectancies of the observers regarding the subject's
behavior and the feedback observers receive from the experimenter in relation to that
behavior.
Deception
The act of misleading or wrongly informing someone about the true nature of a situation.
Statistical Control of Rating Errors
The halo effect is the tendency to ascribe positive attributes independently of the observed
behavior. Some psychologists have argued that this effect can be controlled through partial
correlation, in which the correlation between two variables is found while variability in a
third variable is controlled.
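As a rough illustration of the partial correlation idea, the sketch below computes a first-order partial correlation from three Pearson correlations. The ratings and the variable names (leadership, creativity, overall_impression) are invented for illustration and do not come from any study. Treating an observer's overall impression as the controlled third variable is only one assumed way of representing the halo effect.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def partial_correlation(x, y, z):
    """Correlation between x and y with variability in z held constant."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical ratings of four people, expressed as deviations from the mean.
# overall_impression stands in for the observer's general (halo) impression.
overall_impression = [1, 1, -1, -1]
leadership = [2, 0, 0, -2]
creativity = [2, 0, -2, 0]

raw_r = pearson(leadership, creativity)
controlled_r = partial_correlation(leadership, creativity, overall_impression)
print("raw correlation:", round(raw_r, 2))
print("partial correlation is near zero:", abs(controlled_r) < 1e-9)
```

In this made-up data set the two trait ratings correlate 0.5, but the partial correlation is essentially zero once the overall impression is controlled. That is the pattern one would expect if the apparent relationship were produced entirely by a halo.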
UNIT III
TYPES OF PSYCHOLOGICAL TESTING
When administered and evaluated properly, psychological tests are accurate tools
used to diagnose and treat mental health conditions. When you hear the words
“psychological testing,” all kinds of questions and thoughts may run through your mind.
Psychological testing is the basis for mental health treatment. These tools are often used
to measure and observe a person’s behaviors, emotions, and thoughts. Tests are performed
by a psychologist who will evaluate the results to determine the cause, severity, and
duration of your symptoms. This will guide them in creating a treatment plan that meets
your needs.
Tests can either be objective or projective:
- Objective testing involves answering questions with set responses like yes/no or
true/false.
- Projective testing evaluates responses to ambiguous stimuli in the hopes of
uncovering hidden emotions and internal conflicts.
Psychologists use testing to examine a variety of factors, including emotional intelligence,
personality, mental aptitude, and neurological functioning.
Here’s a more in-depth look at the types of testing available and the most commonly used
tests for each category.
Personality tests
Measure behaviors, emotions, attitude, and behavioral and environmental characteristics

Test names: Basic Personality Inventory (BPI), 16 Personality Factor Questionnaire
Achievement tests
Measure respondents’ intellectual interests, achievements, and cognitive abilities

Test names: Woodcock-Johnson Psychoeducational Battery, Kaufman Test of Educational
Achievement (K-TEA)
Attitude tests
Measure views of respondents based on how much they agree or disagree with a statement
Test names: Likert Scale, Thurstone Scale
Aptitude tests
Measure capabilities, skill sets, and projection of future success

Test names: Visual Reasoning Test, Abstract Reasoning Test
Emotional Intelligence tests
Measure emotional responses such as anger, sadness, happiness, and impulsivity

Test names: Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT), Emotional and Social
Competence Inventory
There are a number of core principles that form the foundation for psychological
assessment:
- Tests are samples of behavior.
- Tests do not directly reveal traits or capacities, but may allow inferences to be made
about the person being examined.
- Tests should have adequate reliability and validity.
- Test scores and other test performances may be adversely affected by temporary
states of fatigue, anxiety, or stress; by disturbances in temperament or personality;
or by brain damage.
A psychological evaluation can be a key part of your therapy journey. It gathers
information about how you think, feel, behave, and much more.
A psychological evaluation is often thought of as the first line of defense in diagnosing and
treating a mental health condition. Performed by a psychologist, it helps them gain an
understanding of the severity and duration of your symptoms. Tests and assessments are
the two main components used in an evaluation. Testing typically includes using formal tests,
or “norm-referenced” tests. These are standardized tests that measure an individual’s ability
to learn and understand several concepts.
Common components of an assessment include:
- psychological tests
- surveys and tests
- interviews
- observational data
- medical and school history
- medical evaluation
PERSONALITY TEST
Personality
McClelland (1951, p. 69) defined personality as “the most adequate conceptualization
of a person’s behavior in all its detail.”
Menninger (1953, p. 23) defined it as “the individual as a whole, his height and weight
and love and hates and blood pressure and reflexes; his smiles and hopes and bowed legs
and enlarged tonsils.”
Cohen and Swerdlik define personality as an individual’s unique constellation of
psychological traits and states.
Personality Assessment
Personality assessment may be defined as the measurement and evaluation of psychological
traits, states, values, interests, attitudes, worldview, acculturation, personal identity, sense
of humor, cognitive and behavioral styles, and/or related individual characteristics.
Personality Traits, Types, and States
Guilford (1959, p. 6) defined personality trait as “Any distinguishable, relatively
enduring way in which one individual varies from another.”
A personality type is a constellation of traits and states that is similar in pattern to
one identified category of personality within a taxonomy of personalities.
A personality state is an inferred psychodynamic disposition designed to convey
the dynamic quality of id, ego, and superego in perpetual conflict.
Personality Traits, Types, and States

Personality Trait
- Physical Aggression subscale of the Aggression Questionnaire
Personality Type
- Myers-Briggs Type Indicator (MBTI; Myers & Briggs, 1943/1962)
Personality State
- State-Trait Anxiety Inventory (STAI; Spielberger et al., 1980)
Personality Assessment Methods

Objective Methods
- Objective methods of personality assessment characteristically contain short-answer items
for which the assessee’s task is to select one response from the two or more provided.
Projective Methods
- The projective hypothesis holds that an individual supplies structure to unstructured stimuli
in a manner consistent with the individual’s own unique pattern of conscious and unconscious
needs, fears, desires, impulses, conflicts, and ways of perceiving and responding.
Inkblots as Projective Stimuli
The Rorschach
Hermann Rorschach developed what he called a “form interpretation test”
using inkblots as the forms to be interpreted.
Pictures as Projective Stimuli
Your story should have a beginning, a middle, and an end.
Pictures used as projective stimuli may be photos of real people, animals, objects, or
anything. They may be paintings, drawings, etchings, or any other variety of picture.
The Thematic Apperception Test

In the TAT manual, Murray (1943) also advised examiners to attempt to find out the source
of the examinee’s story. It is noteworthy that the noun apperception is derived from the
verb apperceive, which may be defined as to perceive in terms of past perceptions.
Word association tests
- A word association test may be defined as a semistructured, individually
administered, projective technique of personality assessment that involves the
presentation of a list of stimulus words, to each of which an assessee responds
verbally or in writing with whatever comes to mind first upon hearing the word.
Sentence completion tests
- Developed for use in specific types of settings (such as school or business) or for
specific purposes. Sentence completion tests may be relatively atheoretical or linked
very closely to some theory.
Projective
Sounds as Projective Stimuli
Auditory Projective Test
This inspired Skinner to think of an application for sound, not only in behavioral terms but in
the elicitation of “latent” verbal behavior that was significant “in the Freudian sense”
(Skinner, 1979, p. 175).
Auditory Apperception Test (Stone, 1950)
-the subject’s task was to respond by creating a story based on three sounds played on a
phonograph record.
Other auditory projectives included one called an auditory sound association test
(Wilmer & Husni, 1951) and another referred to as an auditory apperception test (Ball &
Bernardoni, 1953). Henry Murray also got into the act with his Azzageddi test (Davids &
Murray, 1955), named for a Herman Melville character. Unlike other auditory projectives, the
Azzageddi presented subjects with spoken paragraphs.
Projective
The Production of Figure Drawings
A relatively quick, easily administered projective technique is the analysis of drawings.
Drawings can provide the psychodiagnostician with a wealth of clinical hypotheses to be
confirmed or discarded as the result of other findings.
Figure-drawing tests
A figure drawing test may be defined as a projective method of personality assessment
whereby the assessee produces a drawing that is analyzed on the basis of its content and
related variables.
Personality Projection in the Drawing of the Human Figure by Karen Machover (1949).
Machover wrote that the human figure drawn by an individual who is directed to “draw a
person” [is] related intimately to the impulses, anxieties, conflicts, and compensations
characteristic of that individual. In some sense, the figure drawn is the person, and the
paper corresponds to the environment.
The House-Tree-Person test (HTP; Buck, 1948)
- is another projective figure-drawing test. As the name of the test implies, the testtaker’s
task is to draw a picture of a house, a tree, and a person.
OTHER TESTS
Attitude Test
- Attitude testing is done to measure people's attitudes. The purpose is to quantify people's
beliefs and behaviors to inform decisions, understand human differences, and gain
knowledge about personality types. Attitude testing can be done directly or indirectly.
- Attitude is fundamental to the success or failure that we experience in life. People may
differ little physically or intellectually; what makes the difference is attitude.
Emotional Intelligence Tests
- Emotional intelligence (EQ) testing is widely used to screen candidates for various jobs.
Employers are often interested in figuring out which applicants are likely to be resilient,
self-motivated, and good at cooperating with others, and many turn to EQ tests as a way to
assess these traits.
- Emotional intelligence can significantly impact various aspects of your life, including
behavior in family, friendships, and workplace relationships.
Neuropsychological Tests
- Neuropsychological testing refers to a number of tests that healthcare providers use to get
information about how your brain works.
Projective Tests
- Projective tests are used to measure personality. Subjects are shown ambiguous images or
asked open-ended questions, and their answers give interviewers insights into the person's
unconscious attitudes and beliefs.
- A projective test is a personality test in which subjects are shown ambiguous
images and asked to interpret them. The subjects are to project their own emotions,
attitudes, and impulses onto the image, and then use these projections to explain an
image, tell a story, or finish a sentence.
- The Rorschach Inkblot Test is the best-known projective test, and it is also the
first test of its kind developed. Subjects are shown a series of cards with inkblot
images and asked what the images could be.
Direct Observation Test
- Direct observation tests are a type of psychological test that involves observing people in
a structured way, either in a laboratory or natural setting, as they carry out various
pre-determined activities. These tests are used mainly to study children's behavior, including
how they interact with other family members.
- There are some well-known examples of direct observation tests. One is the Parent-Child
Interaction Assessment (PCIA), which helps psychologists understand how parents
and children interact through language and behavior when they are playing.
- The Parent-Child Early Relational Assessment is a direct observation test used
as a family assessment tool. It involves the video recording of parents with their
preschool age children in four distinct activities: feeding, a familiar task, a familiar
game, and a novel game that requires teaching.
Unit IV Applications and Issues in Psychological Testing
TESTING AND THE LAW

PSYCHOLOGICAL TESTING of government employees has become
widespread in recent years. The use by the federal government of these tests,
particularly those which purport to measure and categorize "personality," poses a
unique challenge to both Congress and the courts in their effort to protect individual
constitutional rights. How the problems raised by these and other devices are dealt
with may be an important indicator of the effectiveness of judicial and legislative
control over that complex bureaucracy which is our federal government. Additionally,
solutions to these problems may measure the ability of Congress and the courts to
cope with the demands which technological advances have placed upon our system
of government, upon the very fabric of our cultural life, and upon the concept of
individual rights in a democratic society.
THE ROLE OF CONGRESS: INVESTIGATION BY THE SUBCOMMITTEE

A. Scope of the Subcommittee Inquiry
In the course of investigating the rights of federal employees, the Constitutional
Rights Subcommittee over a two-year period received and investigated numerous complaints
that federal employees were being subjected to mind-probing sessions with government
psychiatrists and psychologists in general screening programs, such as that used by the
Peace Corps, or for hiring, firing, and promotion purposes. The investigation shows that
supervisors may suggest or require "fitness for duty examinations" which may include
psychological testing, under subtle threats of disciplinary action for insubordination or loss
of a job.
B. Alleged Authority to Test
It has been suggested that psychological testing and the procedures under which
such tests are administered violate the concept of the merit system, and may be used to
circumvent the procedural guarantees established by Congress in the basic civil service laws
as interpreted by the courts. Nevertheless, as authority for their procedure concerning
mental fitness exams and personality testing, government officials cite executive orders and
civil service laws recognizing presidential authority over selection procedures.
C. Current Practices and Uses of Tests

1. The Need for Testing
In defense of governmental use of psychological tests, officials argue that testing is
necessary, effective, and constitutes no real invasion of privacy. Civil Service Commission
Chairman John Macy testified that the Government is thereby able to screen its work force
to disqualify individuals with "demonstrable emotional or behavioral disorders that would
create a hazard both to the government and to the employee."
According to a representative of the State Department, psychiatric evaluations and
psychological testing are two necessary and effective means of assuring that its employees
are fit for employment from both the medical and security standpoints. The Department
considers virtually every one of its positions, both in Washington and overseas, to be a
sensitive job, requiring access to classified information.
2. Testing Procedures
The State Department representative described existing procedures under which
employees of the Department and twelve other agencies are examined. In each instance the
medical staff determines whether an individual should have a psychiatric evaluation or
undergo psychological testing, or both. These steps are taken whenever a staff physician
believes an employee may have an emotional or psychological problem which would require
treatment or impair his judgment and reliability or be aggravated by an overseas
assignment. If the employee agrees to a psychiatric examination, he is given a choice of one
of the Department's four consulting psychiatrists.
3. Agency Control of Testing

There is evidently a general disclaimer of responsibility by the agencies for fairness
and effectiveness of methods utilized in medical and psychiatric evaluations and for the
adequacy of qualifications of personnel involved. This is probably the primary reason for the
lack of uniformity of standards and procedures among agencies, and for variations in the
way different cases are handled in one department.
4. Agency Uses of Testing

Psychological tests are given by the various agencies of the Government for a wide
variety of purposes. The Peace Corps, for example, makes use of the MMPI and other
personality tests as an integral part of the selection process. It considers the MMPI the "only
objective personality inventory which helps identify persons who may have or develop
serious personality disorders."
THE CONSTITUTIONAL CASE AGAINST TESTING
As we have pointed out, there exists no body of case law concerning psychological
testing as a condition or incident of government employment. Therefore, any guidelines
which the courts may in the future lay down in this area must evolve from one or more
present trends of constitutional development. The first of these is the law regulating the
employment relationship where the Government is the employer. The inquiry here relates to
the Government's power to impose conditions upon that relationship and the extent to
which this power is circumscribed by the due process clause of the fifth amendment. The
second trend concerns recent developments which define, however vaguely, a constitutional
right of privacy.
A view widely held among psychologists, administrators and even members of
Congress is that federal employment is not a "right" but a "privilege." This leads to the
immediate and facile conclusion that personality testing, or any other requirement for that
matter, may be made a condition of public employment regardless of any adverse
consequences to the individual.
A. Reasonably Related to the Desired Goal
In the first instance, it must be pointed out that the "reasonableness" test
has most frequently been applied to legislative action. However, where departments
and agencies rely upon general statutes for rule-making powers over their
employees, it would be logically inconsistent to suggest that the legislature is
constrained by notions of due process but that the various departments have a
completely free hand to act. If in accordance with traditional due process concepts
the agencies may only act in a manner reasonably calculated to achieve their
legitimate ends, it could be argued that psychological testing is purely arbitrary and
therefore does not meet this criterion. Even if some nexus can be shown between
promoting the efficiency of the federal service and the use of psychological tests, the
serious infringement on personal liberty which results from such tests would compel
that the nexus be clearly indicated.
B. Right to Rebut Test Evidence
If it be argued that psychological testing may have some usefulness as a
screening device but that it is by no means an accurate indicator in every instance,
the question arises as to whether the employee should be able to present his own
rebutting psychological data, that is, to "cross-examine" the tests. The Supreme Court has
given rather little guidance to indicate which procedures are necessary to insure that
the requirements of due process are met. Traditionally the courts have treated
admission, promotion, and dismissal from the civil service as matters to be dealt with
by the executive branch. Therefore, rather than set down standards of its own, the
Supreme Court in recent years has contented itself with scrutinizing the details of
particular cases to make certain that the various departments have rigidly adhered to
whatever procedural rules they may have enacted. Thus the constitutional issue has
been avoided.
C. The Analogy to Involuntary Confessions and Self-Incrimination
There are those who take an even dimmer view of psychological testing and
would ban it completely as a government personnel screening device. The argument
may be expressed in the following terms: Because of the social stigma attached to
adverse test results, the employee should be given "the same rights as he would
have in a criminal trial." The search and seizure of the contents of men's minds by a
forced submission to psychological testing should be denounced as offensive to
those canons of decency and fairness which express the notions of justice of English-
speaking peoples. A comparison can be made to the pumping of a man's stomach in
order to obtain evidence of illegal narcotics possession, a practice which was
condemned by the Court in Rochin v. California. To the extent that the analogy to
criminal proceedings can be maintained, it is obvious that there are also self-
incrimination objections to the utilization of test scores involuntarily received as a
basis for adverse action against the employee.
D. The "Right of Privacy"

The final constitutional blow to be struck against psychological testing derives
from the evolving notion of a "right of privacy." This newest of constitutional rights
was initially an aspect of the fourth amendment's search and seizure clause and the
self-incrimination provision of the fifth amendment. However, it received an
independent status in Griswold v. Connecticut, grounded on the penumbras of the
specific guarantees of the Bill of Rights, the concept of "liberty" contained within the
due process clause of the fourteenth amendment, and the ninth amendment.
PROPOSED SOLUTIONS TO THE QUESTION OF TESTING
A successful constitutional attack on psychological testing in a court
action appears to be a real possibility in the near future. The testimony received by the
Constitutional Rights Subcommittee shows that existing procedures for psychiatric
evaluations and psychological testing are deficient in terms of protection of employee rights.
The necessity for a court test, however, could be eliminated by changes in the current
testing practices and procedures used by the Government. Various alterations in the present
situation were suggested to the Subcommittee. One solution to at least part of the problem
is to afford the employee, and perhaps the applicant, an effective means of challenging the
psychological reports and the expertise of the psychologist.
In the final analysis, a thorough-going reform of existing procedures relating to
psychological testing is a matter which must be confronted by Congress. Congress must
decide whether, in light of the evolving law surrounding the right of privacy and the
employment relationship, a government employee's rights are inferior to those of any other
citizen. Congressional hearings on testing have pointed the way to solutions. They have,
from all indications, also initiated a much-needed dialogue between lawyers and others
concerned with individual rights and the scientists, technicians, and professional medical
men responsible for the new scientific instruments and devices. In the private sector,
observance of the individual's rights will depend to a very great extent upon the intensity
and continuity of that debate. However, insofar as a citizen's relations with his Government
are concerned, Congress has it within its power to insure that individual rights and liberties
are not seconded to technology.
c. Ethics and the Future of Psychological Testing

Theoretical Concerns
The dependability (reliability) of test results is one of the most significant considerations
underlying tests (Thomas & Selthon, 2003; Tryon & Bernstein, 2003). Reliability is defined as the
degree to which test scores are consistent and dependable, that is, the extent
to which tests are relatively free of measurement error (Abe, 2012). Reliability places an upper limit
on validity. A test that is totally unreliable is meaningless. There may be exceptions to this statement,
but general application demands that tests possess some form of reliability or stability. As a direct
consequence, whatever is being measured must itself have reliability. To say that a test has
reliability implies that test results are attributable to a systematic source of variance which is
itself reliable (American Educational Research Association, American Psychological Association, &
National Council on Measurement in Education [AERA, APA, & NCME], 1999; APA, 2002).
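The statement that reliability places an upper limit on validity can be made concrete with the standard result from classical test theory: a validity coefficient cannot exceed the square root of the product of the test's reliability and the criterion's reliability. The sketch below is only an illustration of that bound. The function name validity_ceiling and the sample reliability values are assumptions, not figures from the sources cited above.

```python
import math


def validity_ceiling(test_reliability, criterion_reliability=1.0):
    """Largest validity coefficient possible under classical test theory.

    Uses the bound r_xy <= sqrt(r_xx * r_yy). The criterion is assumed to be
    perfectly reliable unless a second reliability coefficient is supplied.
    """
    for r in (test_reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability coefficients must lie between 0 and 1")
    return math.sqrt(test_reliability * criterion_reliability)


# Hypothetical reliability coefficients, chosen only for illustration.
for r_xx in (0.0, 0.49, 0.81):
    print(r_xx, round(validity_ceiling(r_xx), 2))
```

A test with a reliability of 0 has a ceiling of 0, which is why a totally unreliable test cannot be valid for any purpose. A test with a reliability of 0.81 can correlate at most about 0.90 with a perfectly reliable criterion.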
Most psychological tests today measure a presumably stable entity: either
the person as he or she currently functions or some temporally stable characteristic of
the person. In describing current functioning, psychologists suggest that the individual
functions this way in a fairly stable, though perhaps short-term, manner that is free from outside
control or influence of the situation or environment. In other words, psychologists assume that they
can give a detailed account of the individual in absolute terms, as if in a vacuum. Psychologists may
opine that the individual is emotionally unstable or that the individual is out of contact with
reality, or provide a diagnostic label such as schizophrenic or neurotic.
Moral Issues in Psychological Testing

The field of psychological testing is being shaped by moral issues such as human rights, labelling, and
privacy invasion.
Human Rights
Several different kinds of human rights are recognized in psychological testing. Test takers are
usually treated with courtesy, irrespective of gender, ethnicity, state or nation of origin, religious
affiliation, age, and so on. Test takers are usually tested with measures that meet appropriate
professional standards, and they receive an explanation before testing of the kind(s) of tests to be
conducted. Individuals who do not want to subject themselves to testing should not and ethically
cannot be forced to do so; hence, the individual’s freedom to decline and freedom to withdraw are
highly respected, except in situations where the testing is mandated by law or government (APA, 2002).
Labelling
There is nothing inherently wrong in diagnosing people with a kidney problem or disease, but labelling
people with certain medical conditions such as Acquired Immunodeficiency Syndrome (AIDS) or with
psychiatric disorders can be damaging. For example, much of the general
public has little understanding of schizophrenia. When diagnosing this kind of disorder, it is
advisable to use the least stigmatizing label consistent with accurate representation. Labels can
also affect a person's access to help. For instance, chronic schizophrenia has no cure; as such,
labelling someone a chronic schizophrenic may be very harmful (McReynolds, Ward, & Singer, 2002).
Privacy Invasion
When people respond to psychological tests, they have little idea what is being
revealed. In many cases, however, they feel that their privacy has been invaded in a way that is not
justified by the tests' benefits (Brayfield, 1965). Dahlstrom (1969) stated that the issue of privacy
invasion is based on serious misunderstanding. He maintained that because tests have been
oversold, the public does not know their limits. The ambiguity of the notion of invasion of privacy is an
important issue in psychological testing. There is nothing inherently wrong or detrimental in trying to
find out about a person. It is only the wrong application or use of the information gathered from the
person that amounts to an invasion of the person's privacy.
Test Constructors and Test Users’ Responsibility
The testing profession has become increasingly stringent and precise in defining the ethics
and responsibility of test designers and test users. This is because even the best test can be misused.
In the right circumstances, almost any test can be useful, but when inappropriately used, even the
best test can be dangerous to the individual (APA, 2002). A major concern is the use of tests
with different populations. A test that is reliable and valid for group A may not be valid and reliable
for group B. In light of this issue, psychologists who administer tests are instructed to employ
instruments whose validity and reliability have been established for use with members of the
population being tested and to use assessment techniques that are appropriate to the
person's preferred language.
Issues of Social Concern
In psychological testing, social issues such as dehumanization, the usefulness of tests, and access
to psychological testing services are of essential importance. This section is limited to
dehumanization and the usefulness of tests.
Dehumanization
Some forms of testing remove any human element from the decision-making process. This is
becoming more widespread with the increase in computer-based testing. For instance, some
corporations provide computerized analyses of the Minnesota Multiphasic Personality Inventory (MMPI-2)
and other test results (Kaplan & Saccuzzo, 2009). Such technology tends to diminish test takers'
freedom and uniqueness. With high-speed computers and
centralized data banks, the probability that computers will someday make important evaluative
judgements about human lives is increasing.
Usefulness of Tests
The important issue in testing is not whether the tests are perfect but whether they are
useful to the individual or to society. Tests need not be perfect in every area. Society often finds
uses for initially rough or simple instruments that become more precise with research and
development (McKnow, 2007; Meyer et al., 2003; Sawyer, 2007). For instance, when scientists believed
that the sun revolved around the earth, the available methods and principles were still useful in that
they led to some precise predictions, even though the underlying theories were incorrect. In like
manner, the assumptions underlying today's tests may be fundamentally incorrect and the resulting
test instruments far from perfect. The tests, however, may still be useful as long as they provide data
that lead to better predictions and understanding than can otherwise be obtained.
Current Trends in Psychological Testing
Among the current trends in psychological testing are the development of new
tests (higher standards, improved technology, and objectivity), increased public awareness and
influence, and computer and internet applications.
The Development of New Tests
Hundreds of new tests are published each year. The impetus
for developing these new tests comes from professional disagreement over the best strategies for
measuring human behaviour, the nature of these behaviours, and theories of these human
characteristics. An example is the 2004 revision of the Kaufman Assessment Battery for
Children (KABC-II), an individual ability test for children between 3 and 18 years of age. The
test consists of 18 subtests combined into five global scales called sequential processing,
simultaneous processing, learning, planning, and knowledge (Kaufman & Kaufman, 2004a).
Increased Public Awareness and Influence on Testing
Increased public awareness of the nature and usefulness of tests has led to increasing
external influence on testing. Previously, the public had little or no knowledge about
psychological tests. Today, there is widespread awareness among the general public of the need for
and importance of psychological tests and other forms of testing.
Computer-Based Testing
One of the major trends in testing is the use of computers. Computers are being used in
many different ways. For example, in computerized adaptive testing, different sets of test questions
are administered by computer to different test takers, depending on each test taker's standing on the trait
being measured (Mills, Potenza, Fremer, & Ward, 2003; Weiss, 1983, 1985). Likewise, in ability
testing, the computer adjusts the level of item difficulty according to the test taker's responses. If the
test taker's answer is incorrect, then an easier item is given; if correct, then a more difficult item
appears next.
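The up/down rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not how any published adaptive test is implemented: operational adaptive tests typically select items using item response theory rather than a fixed step. The 1 to 10 difficulty scale, the starting level, and the function names are hypothetical.

```python
def next_difficulty(current, was_correct, lowest=1, highest=10):
    """Return the difficulty of the next item under a simple up/down rule.

    A correct answer moves one level up (harder); an incorrect answer moves
    one level down (easier). The result is kept within [lowest, highest].
    """
    step = 1 if was_correct else -1
    return max(lowest, min(highest, current + step))


def run_adaptive_test(responses, start=5, lowest=1, highest=10):
    """Simulate a test session for a list of True/False (correct/incorrect) answers.

    Returns the difficulty levels actually presented and the level that
    would be presented next.
    """
    level = start
    presented = []
    for was_correct in responses:
        presented.append(level)
        level = next_difficulty(level, was_correct, lowest, highest)
    return presented, level


# Hypothetical session: two correct answers, one error, one correct, two errors.
presented, upcoming = run_adaptive_test([True, True, False, True, False, False])
print(presented, upcoming)  # prints: [5, 6, 7, 6, 7, 6] 5
```

The sequence of difficulty levels hovers around the point where the test taker answers correctly about half the time, which is why adaptive tests can estimate ability with fewer items than a fixed-length test.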
The Hope of New and Improved Tests
Psychologists believe that the dominant role of some popular tests, such as the Stanford-Binet
and Wechsler tests, is far from secure. These two intelligence scales are probably as technically
adequate as they will ever be. They can be improved through minor revisions that update test
stimuli and provide larger and more representative normative samples, with special norms for
particular groups, and through additional research to extend and support validity evidence.
All psychological tests are based on theories of human functioning. The validity of these
theories and of their underlying assumptions is far from proven. Moreover, there seems to be no consensus
on the essence of human personality, normal or abnormal. With
growing awareness among test users, improving
existing psychological tests is necessary, as some of today's tests may not be able to meet the
psychological needs of individuals, considering the changes that take place in body chemistry,
which may sometimes have psychological implications for human personality or traits. With
instruments such as the Kaufman Assessment Battery for Children, structured personality tests, and the MMPI-2
already leading the way in the 21st century, psychologists should be more creative in building new tests that
will meet the future testing needs of a fast-growing population, and persistent in modifying
existing tests while accomplishing the goals of psychological testing.
