CHAPTER 1: PSYCHOLOGICAL TESTING AND ASSESSMENT
  b) term is also used to denote the form or structure of other evaluative tools and processes, such as the guidelines for creating a portfolio work sample
3. Ways that tests differ from one another:
  a) administrative procedures
    (1) some tests require an active and knowledgeable test administrator
      (a) some test administration involves demonstration of tasks
      (b) usually one-on-one
      (c) trained observation of the assessee's performance
  a) verbal language
  b) nonverbal language
    (1) body language movements
    (2) facial expressions in response to the interviewer
    (3) the extent of eye contact
    (4) apparent willingness to cooperate
  c) how they are dressed
    (1) neat vs. sloppy vs. inappropriate
2. interviewer over the phone taking note of
  a) changes in the interviewee's voice pitch
  b) long pauses
  c) signs of emotion in response
3. ways that interviews differ:
  a) length, purpose, and nature
  b) used to help make diagnostic, treatment, selection, or other decisions
4. panel interview
  a) an interview conducted with one interviewee and more than one interviewer
C. The Portfolio
1. files of work products: paper, canvas, film, video, audio, etc.
2. samples of one's abilities and accomplishments
D. Case History Data: records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to the assessee
1. Sheds light on an individual's past and current adjustment as well as on events and circumstances that may have contributed to any changes in adjustment.
2. Provides information about neuropsychological functioning prior to the occurrence of a trauma or other event that results in a deficit.
3. Gives insight into current academic and behavioral standing.
4. Useful in making judgments for future class placements.
5. Case history study: a report or illustrative account concerning a person or an event that was compiled on the basis of case history data
  a) might shed light on how one individual's personality and particular set of environmental conditions combined to produce a successful world leader
2. Role-play test: tool of assessment wherein assessees are directed to act as if they were in a particular situation. Assessees are then evaluated with regard to their expressed thoughts, behaviors, abilities, etc.
G. Computers as Tools
1. local processing: on-site computerized scoring, interpretation, or other conversion of raw test data; contrast with central processing (CP) and teleprocessing
2. central processing: computerized scoring, interpretation, or other conversion of raw test data that is physically transported from the same or other test sites; contrast with local processing (LP) and teleprocessing
3. teleprocessing: computerized scoring, interpretation, or other conversion of raw test data sent over telephone lines by modem from a test site to a central location for computer processing; contrast with CP and LP
4. simple scoring report: a type of scoring report that provides only a listing of scores
5. extended scoring report: a type of scoring report that provides a listing of scores AND statistical data
6. interpretive report: a formal or official computer-generated account of test performance presented in both numeric and narrative form and including an explanation of the findings
  a) the three varieties of interpretive report are
    (1) descriptive
    (2) screening
    (3) consultative
  b) some contain relatively little interpretation and simply call attention to certain high, low, or unusual scores that need to be focused on
  c) consultative report: a type of interpretive report designed to provide expert and detailed analysis of test data that mimics the work of an expert consultant
  d) integrative report: a form of interpretive report of psychological assessment, usually computer-generated, in which data from behavioral, medical, administrative, and/or other sources are integrated
7. CAPA: computer-assisted psychological assessment (assistance to the test user, not the test taker)
  a) enables test developers to create psychometrically sound tests using complex mathematical procedures and calculations
  b) enables test users to construct tailor-made tests with built-in scoring and interpretive capabilities
    (1) test client integrity
      (a) refers to the verification of the identity of the test taker when a test is administered online
      (b) also refers to the sometimes varying interests of the test taker vs. those of the test administrator; the test taker might have access to notes, aids, internet resources, etc.
      (c) internet testing is only testing, not assessment
8. CAT: computerized adaptive testing: an interactive, computer-administered test-taking process wherein items presented to the test taker are based in part on the test taker's performance on previous items
  a) EX: on a computerized test of academic abilities, the computer might be programmed to switch from testing math skills to English skills after three consecutive failures on math items.
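The switching rule in the CAT example can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the class `AdaptiveSession`, its `max_failures` parameter, and the domain labels are hypothetical, and only the "three consecutive failures" rule comes from the example above. Operational CAT programs usually also choose the difficulty of each next item from the test taker's estimated ability (often via item response theory), which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AdaptiveSession:
    """Hypothetical CAT rule: move to the next skill domain after
    `max_failures` consecutive incorrect answers in the current domain."""
    domains: List[str]
    max_failures: int = 3
    current: int = 0                 # index into `domains`
    consecutive_failures: int = 0    # misses in a row within the current domain
    log: List[Tuple[str, bool]] = field(default_factory=list)

    @property
    def domain(self) -> Optional[str]:
        """Domain the next item should come from, or None once all are exhausted."""
        if self.current < len(self.domains):
            return self.domains[self.current]
        return None

    def record(self, correct: bool) -> None:
        """Score one item and decide which domain the next item comes from."""
        if self.domain is None:
            raise ValueError("no domains left to test")
        self.log.append((self.domain, correct))
        if correct:
            self.consecutive_failures = 0   # a success resets the streak
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            self.current += 1               # e.g. switch from math to English
            self.consecutive_failures = 0


session = AdaptiveSession(domains=["math", "English"])
for answer in [True, False, False, False]:
    session.record(answer)
print(session.domain)  # English
```

Because a correct answer resets the streak, a pattern such as miss, miss, hit, miss, miss keeps the test taker in the math domain; only three misses in a row trigger the switch.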
H. Other Tools
1. DVD: how would you respond to the events that take place in the video?
  a) sexual harassment in the workplace
  b) responding to various types of emergencies
  c) diagnosis/treatment plan for clients on videotape
2. thermometers, biofeedback, etc.
TEST DEVELOPER
They are the ones who create tests. They conceive, prepare, and develop tests. They also find a way to disseminate their tests, by publishing them either commercially or through professional publications such as books or periodicals.
TEST USER
They select or decide to take a specific test off the shelf and use it for some purpose. They may also participate in other roles, e.g., as examiners or scorers.
TEST TAKER
Anyone who is the subject of an assessment
Test takers may vary on a continuum with respect to numerous variables, including:
o The amount of anxiety they experience and the degree to which the test anxiety might affect the results
o The extent to which they understand and agree with the rationale of the assessment
o Their capacity and willingness to cooperate
o The amount of physical pain/emotional distress they are experiencing
o The amount of physical discomfort
o The extent to which they are alert and wide awake
o The extent to which they are predisposed to agreeing or disagreeing when presented with a stimulus
o The extent to which they have received prior coaching
o The importance they may attribute to portraying themselves in a good light
Psychological autopsy – reconstruction of a deceased individual's psychological profile on the basis of archival records, artifacts, and interviews previously conducted with the deceased assessee
TYPES OF SETTINGS
EDUCATIONAL SETTING
o Achievement test: evaluation of accomplishments or the degree of learning that has taken place, usually with regard to an academic area
o Diagnosis: a description or conclusion reached on the basis of evidence and opinion through a process of distinguishing the nature of something and ruling out alternative conclusions
o Diagnostic test: a tool used to make a diagnosis, usually to identify areas of deficit to be targeted for intervention
o Informal evaluation: a typically nonsystematic, relatively brief, and "off the record" assessment leading to the formation of an opinion or attitude, conducted by any person in any way for any reason, in an unofficial context and not subject to the same ethics or standards as evaluation by a professional
CLINICAL SETTING
o These tools are used to help screen for or diagnose behavior problems
o Group testing is used primarily for screening: identifying those individuals who require further diagnostic evaluation
COUNSELING SETTING
o Schools, prisons, and governmental or privately owned institutions
o Ultimate objective: the improvement of the assessee in terms of adjustment, productivity, or some related variable
GERIATRIC SETTING
o Quality of life: in psychological assessment, an evaluation of variables such as perceived stress, loneliness, sources of satisfaction, personal values, quality of living conditions, and quality of friendships and other social support
BUSINESS AND MILITARY SETTINGS
GOVERNMENTAL AND ORGANIZATIONAL CREDENTIALING
How are Assessments Conducted?
protocol: the form or sheet or booklet on which a testtaker's responses are entered
o term might also be used to refer to a description of a set of test- or assessment-related procedures, as in the sentence "the examiner dutifully followed the complete protocol for the stress interview"
rapport: working relationship between the examiner and the examinee
ASSESSMENT OF PEOPLE WITH DISABILITIES
Define who requires alternate assessment, how such assessments are to be conducted, and how meaningful inferences are to be drawn from the data derived from such assessments
Accommodation – adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with exceptional needs
  Ex.) translating a test into Braille and administering it in that form
Alternate assessment – evaluative or diagnostic procedure or process that varies from the usual, customary, or standardized way a measurement is derived, either by virtue of some special accommodation made to the assessee or by means of alternative methods
Consider these four variables when deciding which of many different types of accommodation should be employed:
o The capabilities of the assessee
o The purpose of the assessment
o The meaning attached to test scores
o The capabilities of the assessor
REFERENCE SOURCES
TEST CATALOGUES – contain a brief description of the test
TEST MANUALS – detailed information
REFERENCE VOLUMES – one-stop shopping; provide detailed information for each test listed, including test publisher, author, purpose, intended test population, and test administration time
JOURNAL ARTICLES – contain reviews of the test
ONLINE DATABASES – most widely used bibliographic databases
TYPES OF TESTS
Psychological testing – refers to all possible uses, applications, and underlying concepts of psychological and educational tests
INDIVIDUAL TEST – given to only one person at a time
GROUP TEST – administered to more than one person at a time by a single examiner
ABILITY TESTS:
o ACHIEVEMENT TESTS – refer to previous learning (ex. spelling)
o APTITUDE/PROGNOSTIC TESTS – refer to the potential for learning or acquiring a specific skill
o INTELLIGENCE TESTS – refer to a person's general potential to solve problems
PERSONALITY TESTS: refer to overt and covert dispositions
o OBJECTIVE/STRUCTURED TESTS – usually self-report; require the subject to choose between two or more alternative responses
o PROJECTIVE/UNSTRUCTURED TESTS – either the stimulus or the required response (or both) is ambiguous
o INTEREST TESTS –
CHAPTER 2: HISTORICAL, CULTURAL AND LEGAL/ETHICAL CONSIDERATIONS
A HISTORICAL PERSPECTIVE
ANTIQUITY TO THE 19TH CENTURY
Tests and testing programs first came into being in China.
Testing was instituted as a means of selecting which of many applicants would obtain government jobs (civil service).
The job applicants were tested on proficiency in endeavors such as music, archery, knowledge and skill, etc.
GRECO-ROMAN WRITINGS
  Deficiency in some bodily fluid as a factor believed to influence personality
  Hippocrates and Galen
MIDDLE AGES
  World of evilness
RENAISSANCE
  Christian von Wolff – anticipated psychology as a science and psychological measurement as a specialty within that science
CHARLES DARWIN AND INDIVIDUAL DIFFERENCES
  Tests were designed to measure these individual differences in ability and personality among people
  "Origin of Species": chance variation in species would be selected or rejected by nature according to adaptivity and survival value ("survival of the fittest")
FRANCIS GALTON
  Explored and quantified individual differences between people
  Classified people "according to their natural gifts"
  Displayed the first anthropometric laboratory
KARL PEARSON
  Developed the product-moment correlation technique
  His work can be traced directly from Galton
WILHELM MAX WUNDT
  First experimental psychology laboratory at the University of Leipzig
  Focused more on how people were similar, not different from each other
JAMES MCKEEN CATTELL
  Individual differences in reaction time
  Coined the term "mental test"
CHARLES SPEARMAN
  Originated the concept of test reliability and built the mathematical framework for the statistical technique of factor analysis
VICTOR HENRI
  Frenchman who collaborated with Binet on papers suggesting how mental tests could be used to measure higher mental processes
EMIL KRAEPELIN
  Early experimenter with the word association technique as a formal test
LIGHTNER WITMER
  "Little-known founder of clinical psychology"
  Founded the first psychological clinic in the U.S.
PSYCHE CATTELL
  Daughter of James McKeen Cattell
  Cattell Infant Intelligence Scale (CIIS) and The Measurement of Intelligence of Infants and Young Children
RAYMOND CATTELL
  Believed in the lexical approach to defining personality, which examines human languages for descriptors of personality dimensions
20TH CENTURY
- Birth of the first formal tests of intelligence
- Testing shifted to be of more understandable relevance/meaning
A. THE MEASUREMENT OF INTELLIGENCE
o Binet created the first intelligence test to identify mentally retarded schoolchildren in Paris (individually administered)
o The Binet-Simon Test has been revised again and again
o Group intelligence tests emerged with the need to screen the intellect of WWI recruits
o David Wechsler – designed a test to measure adult intelligence
  For him, intelligence is a global capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment
  Wechsler-Bellevue Intelligence Scale
  Wechsler Adult Intelligence Scale – was revised several times and extended the age range of testtakers from young children through senior adulthood
B. THE MEASUREMENT OF PERSONALITY
o Field of psychology was becoming too test oriented
o Clinical psychology was synonymous with mental testing
o ROBERT WOODWORTH – developed a measure of adjustment and emotional stability that could be administered quickly and efficiently to groups of recruits
  To disguise the true purpose of the test, the questionnaire was labeled as a Personal Data Sheet
  He later called it the Woodworth Psychoneurotic Inventory – the first widely used self-report test of personality
o Self-report test:
  Advantages:
    Respondents are best qualified
  Disadvantages:
    Poor insight into self
    One might honestly believe something about oneself that isn't true
    Unwillingness to report seemingly negative qualities
o Projective test: individual is assumed to project onto some ambiguous stimulus (inkblot, photo, etc.) his or her own unique needs, fears, hopes, and motivations
  Ex.) Rorschach inkblot
C. ACADEMIC AND APPLIED TRADITIONS
Culture and Assessment
Culture: "the socially transmitted behavior patterns, beliefs, and products of work of a particular population, community, or group of people"
Evolving Interest in Culture-Related Issues
Goddard tested immigrants and found most to be feebleminded
  o invalid; overestimated mental deficiency, even in native English-speakers
Led to the nature-nurture debate about what intelligence tests actually measure
Needed to "isolate" the cultural variable
Culture-specific tests: tests designed for use with people from one culture, but not from another
  o minorities still scored abnormally low
  ex.) loaf of bread vs. tortillas
Today tests undergo many steps to ensure they are suitable for a given nation
  o take testtakers' reactions into account
Some Issues Regarding Culture and Assessment
Verbal Communication
o Examiner and examinee must speak the same language
o Especially tricky when infrequently used vocabulary or unusual idioms are employed
o Translator may lose nuances of translation or give unintentional hints toward the more desirable answer
o Also requires understanding of culture
Nonverbal Communication and Behavior
o Differs between cultures
o Ex.) meaning of not making eye contact
o Body movement could even have a physical cause
o Psychoanalysis: Freud's theory of personality and psychological treatment, which stated that symbolic significance is assigned to many nonverbal acts
o Timing tests in cultures not obsessed with speed
o Lack of speaking could be reverence for elders
Standards of Evaluation
o Acceptable roles for women differ across cultures
o "judgments as to who might be the best employee, manager, or leader may differ as a function of culture, as might judgments regarding intelligence, wisdom, courage, and other psychological variables"
o Must ask "how appropriate are the norms or other standards that will be used to make this evaluation?"
Tests and Group Membership
ex.) must be 5'4" to be a police officer – excludes cultures with short stature
ex.) Jewish lifestyle not well suited for corporate America
Affirmative action: voluntary and mandatory efforts to combat discrimination and promote equal opportunity in education and employment for all
Psychology, tests, and public policy
Legal and Ethical Considerations
Code of professional ethics: defines the standard of care expected of members of a given profession
The Concerns of the Public
Beginning in World War I, fear that tests were only testing the ability to take tests
Legislation
o Minimum competency testing programs: formal testing programs designed to be used in decisions regarding various aspects of students' educations
o Truth-in-testing legislation: state laws to provide testtakers with a means of learning the criteria by which they are being judged
Litigation
o The Daubert ruling made federal judges the gatekeepers in determining what expert testimony is admitted
o This overrode the Frye standard, which only admitted scientific testimony that had won general acceptance in the scientific community
The right to be informed of test findings
o Formerly, test administrators were told to give participants only positive information
  No realistic information was required
  Tell test takers as little as possible about the nature of their performance on a particular test, so that the examinee would leave the test session feeling pleased and satisfied
o Test takers also have the right to know what recommendations are being made as a consequence of the test data
The right to privacy and confidentiality
o Privacy right: "recognizes the freedom of the individual to pick and choose for himself the time, circumstances, and particularly the extent to which he wishes to share or withhold from others his attitudes, beliefs, behaviors, and opinions"
o Privileged information: information protected by law from being disclosed in a legal proceeding; protects clients from disclosure in judicial proceedings. Privilege belongs to the client, not the psychologist.
o Confidentiality: concerns matters of communication outside the courtroom
  Safekeeping of test data: it is not a good policy to maintain all records in perpetuity
The right to the least stigmatizing label
o The Standards advise that the least stigmatizing labels should always be assigned when reporting test results
Standard Scores
Standard score: a raw score that has been converted from one scale to another scale, where the latter has an arbitrarily set mean and standard deviation
- used for comparison
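A minimal sketch of the conversion, assuming the normative mean and standard deviation are already known. The function names and all numbers are hypothetical; the T-score scale (mean 50, SD 10) and the mean-100/SD-15 scale are common conventions, not values given in these notes. Expressing two raw scores from differently scaled tests as z scores is what makes them comparable.

```python
def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Distance of a raw score from the normative mean in SD units (mean 0, SD 1)."""
    if norm_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - norm_mean) / norm_sd


def rescale(raw: float, norm_mean: float, norm_sd: float,
            new_mean: float, new_sd: float) -> float:
    """Re-express a raw score on a scale with an arbitrarily chosen mean and SD."""
    return new_mean + new_sd * z_score(raw, norm_mean, norm_sd)


# Hypothetical norms: test A has mean 70, SD 8; test B has mean 25, SD 5.
z_a = z_score(82, 70, 8)   # 1.5 SDs above the test A mean
z_b = z_score(30, 25, 5)   # 1.0 SD above the test B mean
print(z_a > z_b)                      # True: the test A score is relatively higher
print(rescale(82, 70, 8, 50, 10))     # 65.0 on a T-score scale (mean 50, SD 10)
print(rescale(82, 70, 8, 100, 15))    # 122.5 on a mean-100, SD-15 scale
```

The mean and SD passed in should come from the normative sample, not from the individual testtaker's own scores.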
CHAPTER 4: OF TESTS AND TESTING
Some Assumptions About Psychological Testing and Assessment
- Assumption 1: Psychological Traits and States Exist
  o Trait: any distinguishable, relatively enduring way in which one individual varies from another
  o States: also distinguish one person from another but are relatively less enduring
    The trait term that an observer applies, as well as the strength or magnitude of the trait presumed present, is based on observing a sample of behavior
  o Trait and state definitions also refer to individual variation: comparisons are made with respect to the hypothetical average person
  o Samples of behavior:
    Direct observation
    Analysis of self-report statements
    Paper-and-pencil test answers
  o Psychological traits cover a wide range of possible characteristics; ex:
    Intelligence
    Specific intellectual abilities
    Cognitive style
    Psychopathology
  o Controversy regarding how psychological traits exist
    Psychological traits exist only as constructs: an informed, scientific concept developed or constructed to describe or explain a behavior
    We can't see, hear, or touch constructs; we infer their existence from overt behavior: an observable action or the product of an observable action, including test- or assessment-related responses
  o Traits are not expected to be manifested in behavior 100% of the time
    There seems to be rank-order stability in personality traits: relatively high correlations between trait scores at different time points
  o Whether and to what degree a trait manifests itself is dependent on the strength and nature of the situation
- Assumption 2: Psychological Traits and States Can Be Quantified and Measured
  o Once it is acknowledged that psychological traits and states do exist, the specific traits and states to be measured need to be defined
    What types of behaviors are assumed to be indicative of the trait?
    Test developer has to provide test users with a clear operational definition of the construct under study
  o After the construct is defined, the test developer considers the types of item content that would provide insight into it
    Ex: behaviors that are indicative of a particular trait
  o Should all questions be weighted the same?
    Weighting the comparative value of a test's items comes about as the result of a complex interplay among many factors:
      Technical considerations
      The way a construct has been defined (for a particular test)
      The value society (and the test developer) attach to the behaviors evaluated
  o Need to find appropriate ways to score the test and interpret results
    Cumulative scoring: test score is presumed to represent the strength of the targeted ability or trait or state
      The more the testtaker responds in a particular direction (as keyed by the test manual), the higher the testtaker is presumed to be on the targeted trait or ability
- Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
  o Objective of a test is to provide some indication of some aspects of the examinee's behavior
    Tasks on some tests mimic the actual behaviors that the test user is attempting to understand
  o Obtained behavior is usually used to predict future behavior
  o Could also be used to postdict behavior to aid in the understanding of behavior that has already taken place
  o Tools of assessment, such as a diary or case history data, might be of great value in such an evaluation
- Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses
  o Competent test users understand a lot about the tests they use:
    How the test was developed
    Circumstances under which it is appropriate to administer the test
    How the test should be administered and to whom
    How results should be interpreted
  o They understand and appreciate the limitations of the tests they use
- Assumption 5: Various Sources of Error Are Part of the Assessment Process
  o Everyday error = misstatements and miscalculations
  o Assessment error = a long-standing assumption that factors other than what a test attempts to measure will influence performance on a test
  o Error variance: component of a test score attributable to sources other than the trait or ability measured
    Assessees themselves are sources of error variance
  o Classical test theory (CTT)/true score theory: assumption is made that each testtaker has a true score on a test that would be obtained but for the action of measurement error
- Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
  o Court challenges to various tests and testing programs have sensitized test developers and users to the societal demand for fair tests used in a fair manner
    Publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual
  o Fairness-related problems/questions:
    Testtaker's culture is different from that of the people for whom the test was intended
    Politics
- Assumption 7: Testing and Assessment Benefit Society
  o Many critical decisions are based on testing and assessment procedures
WHAT'S A "GOOD TEST"?
- Criteria
  o Clear instructions for administration, scoring, and interpretation
- Reliability
  o A "good test"/measuring tool is reliable
    Involves consistency: the precision with which the test measures and the extent to which error is present in measurements
    Unreliable measurement needs to be avoided
- Validity
  o A test is considered valid if it does indeed measure what it purports to measure
  o If there is controversy over the definition of a construct, then the validity is sure to be criticized as well
  o Questions regarding validity focus on the items that collectively make up the test
    Do they adequately sample the range of areas to be measured?
    Individual items contribute to or take away from the test's validity
  o Validity may also be questioned on grounds related to the interpretation of test results
- Other Considerations
  o A "good test" is one that trained examiners can administer, score, and interpret with minimum difficulty
    Useful
    Yields actionable results that will ultimately benefit individual testtakers or society at large
  o If the purpose of a test is to compare the performance of a testtaker with the performance of other testtakers, a "good test" contains adequate norms (normative data)
    Normative data provide a standard with which measured results can be compared
NORMS
- Norm-referenced testing and assessment: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker's score and comparing it to the scores of a group of testtakers
- Meaning of an individual score is relative to other scores on the same test
- Norms (scholarly context): usual, average, normal, standard, expected, or typical
- Norms (psychometric context): the test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores
- Normative sample: group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers
  o Yields a distribution of scores
- Norming: refers to the process of deriving norms; a particular type of norm derivation
  o Race norming: controversial practice of norming on the basis of race or ethnic background
- Norming a test can be very expensive; user norms/program norms: consist of descriptive statistics based on a group of testtakers in a given period of time rather than norms obtained by formal sampling methods
- Sampling to Develop Norms
- Standardization: process of administering a test to a representative sample of testtakers for the purpose of establishing norms
  o A test is standardized when it has clear, specified procedures
- Sampling
  o Developer targets a defined group as the population the test is designed for
    All have at least one common, observable characteristic
  o To obtain a distribution of scores:
    Administer the test to everyone in the targeted population, or
    Administer the test to a sample of the population
      Sample: portion of the universe of people deemed to be representative of the whole population
      Sampling: process of selecting the portion of the universe deemed to be representative of the whole
  o Subgroups within a defined population may differ with respect to some characteristics, and it is sometimes essential to have these differences proportionately represented in the sample
    Stratified sampling: sample reflects statistics of the whole population; helps prevent sampling bias and ultimately aids in interpretation of findings
    Purposive sampling: arbitrarily selecting a sample we believe to be representative of the population
    Incidental/convenience sampling: sample that is convenient or available for use
      Very exclusive (contains exclusionary criteria)
- TYPES OF STANDARD ERROR:
  o STANDARD ERROR OF MEASUREMENT – estimates the extent to which an observed score deviates from a true score
  o STANDARD ERROR OF ESTIMATE – in regression, an estimate of the degree of error involved in predicting the value of one variable from another
  o STANDARD ERROR OF THE MEAN – a measure of sampling error
  o STANDARD ERROR OF THE DIFFERENCE – estimates how large a difference between two scores should be before the difference is considered statistically significant
- Developing norms for a standardized test
  o Establishing a standard set of instructions and conditions under which the test is given makes scores of the normative sample more comparable with scores of future testtakers
  o Once all data are collected and analyzed, the test developer summarizes the data using descriptive statistics (measures of central tendency and variability)
    Test developer needs to provide a precise description of the standardization sample itself
    Descriptions of normative samples vary widely in detail
Tracking
- Comparisons are usually with people of the same age
- Children at the same age level tend to go through different growth patterns
- Pediatricians must know the child's percentile within a given age group
- This tendency to stay at about the same level relative to one's peers is known as tracking (i.e., height and weight)
- Diets may alter this "track"
- Faults: some believe there is an analogy between the rates of physical growth and the rates of intellectual growth
  o Some say that children learn at different rates
  o This system discriminates against some children
TYPES OF NORMS
  o Classification of norms, ex: age, grade, national, local, percentile, etc.
  o PERCENTILES
    Median = 2nd quartile: the point at or below which 50% of the scores fell and above which the remaining 50% fell
    Might wish to divide the distribution of scores into deciles (instead of quartiles): 10 equal parts
    The Xth percentile is equal to the score at or below which X% of scores fall
    Percentile: an expression of the percentage of people whose score on a test or measure falls below a particular raw score
    Percentage correct: refers to the distribution of raw scores, specifically the number of items answered correctly multiplied by 100 and divided by the total number of items
      *not the same as percentile
    Percentile is a converted score that refers to a percentage of testtakers
    Percentiles are an easily calculated, popular way of organizing test-related data
    Using percentiles with a normal distribution: real differences between raw scores may be minimized near the ends of the distribution and exaggerated in the middle (worsens with highly skewed data)
  o AGE NORMS
    Age-equivalent scores/age norms: indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered
      Age norm tables for physical characteristics
      "Mental" age vs. physical age (need to identify mental age)
  o GRADE NORMS
    Grade norms: designed to indicate the average test performance of testtakers in a given school grade
      Developed by administering the test to representative samples of children over a range of consecutive grades
      Mean or median score for children at each grade level is calculated
Great intuitive appeal
Do not provide info as to the content or type of items that a student could or could not answer correctly
Developmental norms (ex: grade norms and age norms): term applied broadly to norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life
o NATIONAL NORMS
National norms: derived from a normative sample that was nationally representative of the population at the time the norming study was conducted
o NATIONAL ANCHOR NORMS
Many different tests purport to measure the same human characteristics or abilities
National anchor norms: equivalency tables for scores on tests that purport to measure the same thing
Could provide the tool for comparisons
Provide stability to test scores by anchoring them to other test scores
Begin with the computation of percentile norms for each test to be compared
Equipercentile method: equivalency of scores on different tests is calculated with reference to corresponding percentile scores
o SUBGROUP NORMS
Normative sample can be segmented by any criteria initially used in selecting subjects for the sample
Subgroup norms: result of segmentation; more narrowly defined
o LOCAL NORMS
Local norms: provide normative info with respect to the local population’s performance on some test
Typically developed by test users themselves
- Fixed Reference Group Scoring Systems
o Norms provide context for interpreting the meaning of a test score
o Fixed reference group scoring system: the distribution of scores obtained on the test from one group of testtakers (the fixed reference group) is used as the basis for the calculation of test scores for future administrations of the test
Ex: SAT test (developed in 1962)
- NORM-REFERENCED VERSUS CRITERION-REFERENCED EVALUATION
- One way to derive meaning from a test score is to evaluate it in relation to other scores on the same test (norm-referenced)
- Criterion-referenced: derive meaning from a test score by evaluating it on the basis of whether or not some criterion has been met
o Criterion: a standard on which a judgment or decision may be based
- Criterion-referenced testing and assessment: method of evaluation and way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard (ex: to drive, one must pass a driving test)
o Derives from the values and standards of an individual or organization
o Also called domain- or content-referenced testing and assessment
o Critique: if followed strictly, important info about an individual’s performance relative to others can potentially be lost
Culture and Inference
- Culture is a factor in test administration, scoring, and interpretation
- Test user should research the test’s available norms in advance to check how appropriate they are for the targeted testtaker population
o Helpful to know about the culture of the testtaker
CORRELATION
Degree and direction of correspondence between two things
Correlation coefficient (r) – expresses a linear relationship between two continuous variables
o Numerical index that tells us the extent to which X and Y are “co-related”
Positive correlation: high scores on Y are associated with high scores on X, and low scores on Y correspond to low scores on X
Negative correlation: higher scores on Y are associated with lower scores on X, and vice versa
No correlation: the variables are not related
Ranges from -1 to 1
Correlation does not imply causation
o E.g., weight, height, intelligence
PEARSON r
Pearson Product Moment Correlation Coefficient
Devised by Karl Pearson
Used when the relationship between two variables is linear and both variables are continuous
Coefficient of Determination (r²) – indication of how much variance is shared by the X and the Y variables
SPEARMAN RHO
Rank order correlation coefficient
Developed by Charles Spearman
Used when the sample size is small and when both sets of measurements are in ordinal (ranking) form
BISERIAL CORRELATION
Expresses the relationship between a continuous variable and an artificial dichotomous variable
o If the dichotomous variable had been true, then we would use the point biserial correlation
o When both variables are dichotomous and at least one of the dichotomies is true, the association between them can be estimated using the phi coefficient
o If both dichotomous variables are artificial, we might use a special correlation coefficient – tetrachoric correlation
REGRESSION
Analysis of relationships among variables for the purpose of understanding how one variable may predict another
SIMPLE REGRESSION: one IV (X) and one DV (Y)
- Regression line: defined as the best-fitting straight line through a set of points in a scatter diagram
o Found by using the principle of least squares, which minimizes the squared deviation around the regression line
Primary use: to predict one score or variable from another
Standard error of estimate (SEE): the higher the correlation between X and Y, the greater the accuracy of the prediction and the smaller the SEE
MULTIPLE REGRESSION: the use of more than one score to predict Y
Regression coefficient (b): slope of the regression line
o Ratio of the sum of squares for the covariance (the sum of cross-products) to the sum of squares for X
o Sum of squares is defined as the sum of the squared deviations around the mean
o Covariance is used to express how much two measures covary, or vary together
Slope describes how much change is expected in Y each time X increases by one unit
Intercept (a) is the value of Y when X is 0
o The point at which the regression line crosses the Y axis
THE BEST-FITTING LINE
The difference between the observed and predicted score (Y - Y’) is called the residual
The best-fitting line is most appropriately found by squaring each residual
The best-fitting line is obtained by keeping these squared residuals as small as possible
o Principle of least squares: choose the line that minimizes the sum of the squared residuals, Σ(Y - Y’)²
Correlation is a special case of regression in which the scores for both variables are in standardized, or Z, units
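Pearson r, the coefficient of determination, and Spearman rho, as described in the correlation notes, can be computed directly. A stdlib-only sketch with hypothetical scores; Spearman rho is computed here as Pearson r applied to ranks (tied scores get their average rank), which matches the familiar 1 - 6Σd²/(n(n² - 1)) shortcut when there are no ties.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment r: sum of cross-products over sqrt(SSx * SSy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products
    ssx = sum((a - mx) ** 2 for a in x)                   # sum of squares for X
    ssy = sum((b - my) ** 2 for b in y)                   # sum of squares for Y
    return sp / math.sqrt(ssx * ssy)


def ranks(values):
    """Rank scores 1..n; tied scores share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j are tied
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r computed on ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3), round(r ** 2, 2))  # 0.775 0.6  (r and r squared)
print(round(spearman_rho(x, y), 3))   # 0.738
```

Here r² = .6, read as "60% of the variance in Y is shared with X".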
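The least-squares steps in the regression notes (slope as cross-products over the sum of squares for X, intercept, residuals, SEE, and correlation as regression in Z units) can be checked numerically. A stdlib-only sketch with hypothetical data; the SEE uses an N - 2 denominator, which is one common textbook form, not necessarily the one your text uses.

```python
import math


def fit_line(x, y):
    """Least-squares line Y' = a + bX for predicting Y from X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of cross-products
    ssx = sum((xi - mx) ** 2 for xi in x)                      # sum of squares for X
    b = sp / ssx       # slope: expected change in Y per one-unit increase in X
    a = my - b * mx    # intercept: predicted Y when X = 0
    return a, b


def residuals(x, y, a, b):
    """Observed minus predicted scores, Y - Y'."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def std_error_of_estimate(res):
    """SEE with an N - 2 denominator (one common textbook form)."""
    return math.sqrt(sum(e ** 2 for e in res) / (len(res) - 2))


def z_scores(v):
    """Convert raw scores to Z units (population SD)."""
    m = sum(v) / len(v)
    sd = math.sqrt(sum((vi - m) ** 2 for vi in v) / len(v))
    return [(vi - m) / sd for vi in v]


# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
res = residuals(x, y, a, b)
print(round(a, 2), round(b, 2))              # 2.2 0.6
print(round(sum(e ** 2 for e in res), 2))    # 2.4, the smallest possible value
print(round(std_error_of_estimate(res), 3))  # 0.894

# Correlation as a special case: in Z units the slope equals r and the intercept is 0
az, bz = fit_line(z_scores(x), z_scores(y))
print(abs(az) < 1e-9, round(bz, 3))          # True 0.775
```

Nudging the slope in either direction increases the sum of squared residuals, which is what "best-fitting" means under least squares.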
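The equipercentile method described under national anchor norms pairs scores on two tests that sit at the same percentile rank. A simplified sketch, assuming percentile ranks that count tied scores as half and a nearest-match lookup; operational equating usually adds interpolation and smoothing, which are omitted here. The function names and norm-group scores are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores, score):
    """Percent of the norm group scoring below `score`, counting ties as half."""
    below = bisect_left(sorted_scores, score)
    ties = bisect_right(sorted_scores, score) - below
    return 100.0 * (below + 0.5 * ties) / len(sorted_scores)


def equipercentile_table(scores_a, scores_b):
    """Map each observed Test A score to the Test B score with the closest percentile rank."""
    a_sorted, b_sorted = sorted(scores_a), sorted(scores_b)
    b_values = sorted(set(scores_b))
    b_pr = {s: percentile_rank(b_sorted, s) for s in b_values}
    table = {}
    for s in sorted(set(scores_a)):
        pr = percentile_rank(a_sorted, s)
        # Nearest match only; real equating would interpolate between B scores
        table[s] = min(b_values, key=lambda t: abs(b_pr[t] - pr))
    return table


# Hypothetical norm groups that took Test A and Test B
print(equipercentile_table([10, 12, 14, 16, 18], [50, 55, 60, 65, 70]))
# {10: 50, 12: 55, 14: 60, 16: 65, 18: 70}
```

The resulting dictionary plays the role of an equivalency table: a score of 14 on Test A is treated as equivalent to 60 on Test B because both fall at the 50th percentile of their norm groups.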
In correlation, the intercept is always 0
Pearson product moment correlation coefficient is a ratio used to determine the degree of variation in one variable that can be estimated from knowledge about variation in the other variable
Testing the Statistical Significance of a Correlation Coefficient
- Begin with the null hypothesis that there is no relationship between the variables
- The null hypothesis is rejected if there is evidence that the association between two variables is significantly different from 0
- The t distribution is not a single distribution, but a family of distributions, each with its own degrees of freedom
- Degrees of freedom are defined as the sample size minus 2, or N - 2
- Two-tailed test
How to Interpret a Regression Plot
- Regression plots are pictures that show the relationship between variables
- A common use of correlation is to determine the criterion validity evidence for a test, or the relationship between a test score and some well-defined criterion
- Without test information, predict the middle level of enjoyableness because it is the one observed most frequently – normative because it uses info gained from representative groups
- Using the test as a predictor is not as good as perfect prediction, but it is still better than using the normative info
- A regression line such as the one in 3.9 (slope of 0) shows that the test score tells us nothing about the criterion beyond the normative info
Third Variable Explanation
- A third variable, e.g., poor social adjustment, causes both TV viewing and aggression
- The external influence is the third variable
Restricted Range
- Correlation and regression use variability on one variable to explain variability on a second variable
- Restricted range problem: correlation requires variability; if the variability is restricted, then significant correlations are difficult to find
Multivariate Analysis
- Multivariate analysis considers the relationship among combinations of three or more variables
General Approach
- A linear combination of variables is a weighted composite of the original variables
- Y’ = a + b1X1 + … + bkXk
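The significance test for a correlation can be sketched with the usual t statistic, t = r·√(N - 2)/√(1 - r²), on N - 2 degrees of freedom (the notes give the df but not the formula). The study values below are hypothetical, and the critical value 2.306 is the standard two-tailed t-table entry for α = .05 with df = 8; other sample sizes need their own critical value.

```python
import math


def t_for_r(r: float, n: int) -> tuple[float, int]:
    """t statistic for H0: the population correlation is 0, with df = N - 2."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r ** 2)
    return t, df


# Hypothetical validity study: r = .65 between test and criterion, N = 10
t, df = t_for_r(0.65, 10)
T_CRIT_DF8 = 2.306  # two-tailed critical t, alpha = .05, df = 8 (standard t table)
print(round(t, 2), df)      # 2.42 8
print(abs(t) > T_CRIT_DF8)  # True -> reject the null hypothesis at the .05 level
```

Using abs(t) reflects the two-tailed test: a large negative correlation would be just as significant.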
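The restricted range problem can be demonstrated by computing r on a full sample and again on a narrow band of X scores (for example, only applicants who were admitted). The data below are hypothetical and deliberately constructed so the effect is easy to see; real restriction usually shrinks r rather than wiping it out.

```python
import math


def pearson_r(x, y):
    """Pearson r from sums of cross-products and squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical data: Y tracks X, plus an alternating +3/-3 error
x = list(range(1, 21))
y = [xi + (3 if xi % 2 else -3) for xi in x]
full_r = pearson_r(x, y)

# Keep only cases whose X falls in a narrow band (restricted range on X)
pairs = [(xi, yi) for xi, yi in zip(x, y) if 9 <= xi <= 12]
restricted_r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

print(round(full_r, 2), round(restricted_r, 2))  # 0.88 -0.08
```

The same underlying relationship yields a strong r across the full range and almost none within the narrow band, because little X variability is left to explain Y variability.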
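The linear combination Y’ = a + b1X1 + … + bkXk can be fitted by least squares through the normal equations. A stdlib-only sketch, assuming hypothetical predictor data built so the true weights are known (a = 2, b1 = 0.5, b2 = 1.5); a statistics package would normally do this step.

```python
def solve(A, v):
    """Solve A w = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_multiple(X_rows, y):
    """Least-squares weights [a, b1, ..., bk] for Y' = a + b1X1 + ... + bkXk."""
    D = [[1.0] + list(row) for row in X_rows]  # leading 1 carries the intercept a
    k = len(D[0])
    XtX = [[sum(d[i] * d[j] for d in D) for j in range(k)] for i in range(k)]
    Xty = [sum(d[i] * yi for d, yi in zip(D, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(weights, row):
    """Weighted composite of the predictor scores."""
    return weights[0] + sum(b * x for b, x in zip(weights[1:], row))


# Hypothetical predictors (X1, X2); criterion built as Y = 2 + 0.5*X1 + 1.5*X2
X_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y = [2 + 0.5 * x1 + 1.5 * x2 for x1, x2 in X_rows]
w = fit_multiple(X_rows, y)
print([round(c, 3) for c in w])  # [2.0, 0.5, 1.5]
```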
All share interactionism: complex concept by which heredity and environment are presumed to interact and influence the development of one’s intelligence
Factor-analytic theories: focus is squarely on identifying the ability(ies) deemed to constitute intelligence
Information-processing theories: focus is on identifying the specific mental processes that constitute intelligence
Factor-Analytic Theories of Intelligence:
Charles Spearman: pioneered new techniques to measure intercorrelations between tests
o Existence of a general intellectual ability factor (g) that is tapped by all other mental abilities
g represents the portion of the variance that all intelligence tests have in common, with the remaining portions of the variance accounted for either by specific components (s) or by error components (e)
The greater the magnitude of g in a test, the better the test was thought to predict overall intelligence
Measuring Intelligence
Types of Tasks Used in Intelligence Tests
Infants: tests of sensorimotor development, interviews with parents
Older child: verbal and performance abilities
Mental Age: index that refers to the chronological age equivalent to one’s test performance
Adults: retention of general information, quantitative reasoning, expressive language and memory, and social judgment
Theory in Intelligence Test Development and Interpretation
Wechsler made a dichotomous test (Performance and Verbal), but advocated a multifaceted definition
Thorndike: intelligence = social, concrete, abstract
Putting theories into tests is extremely hard
Intelligence: Some Issues:
Nature vs. Nurture
Currently believed to be a mix of the two
CHAPTER 9: INTELLIGENCE AND ITS MEASUREMENT
Preformationism: all structures, including intelligence, are present at birth and can’t be improved upon
Led to predeterminism: one’s abilities are predetermined by genetic inheritance, and no learning or intervention can enhance them
Interactionist: people inherit a certain intellectual potential
o There’s a limit to genetic abilities (e.g., no one can ever have x-ray vision)
The Stability of Intelligence
Stable pretty much throughout one’s adult life
Cognitive abilities seem to decline with age
The Construct Validity of Tests of Intelligence
Having construct validity requires having a unified understanding of what intelligence is
Very difficult: Spearman says it’s one thing, Guilford says it’s many
Thorndike’s approach is a sort of compromise
o Look for one central factor with three additional factors representing social, concrete, and abstract intelligences
Other Issues
Flynn effect: IQ scores seem to rise every year, but the rise is not coupled with a rise in “true intelligence”
Personality
o High IQ: Need for achievement, competition, curiosity,
confidence, emotional stability etc.
o Low IQ: passivity, dependence, maladjustment
o Temperament (used to describe infants)
Gender
o Men usually outscore women on visual spatialization tasks and in intelligence scores
o Women tend to outscore men on language-skill tasks
o But these differences can be bridged
Family Environment
o Divorce can have negative effects
o Begins with “maternal effects” in the womb
Culture
o Provides specific models for thinking, acting and feeling
o Assumed that if cultural factors can be controlled then
differences between cultural groups will be lessened
o Assumed that culture can be removed by relying exclusively on nonverbal tasks
Such nonverbal tests tend not to be very good at predicting success in various academic and business settings
o Culture loading: the extent to which a test incorporates the
vocabulary, concepts, traditions, knowledge and feelings
associated with a particular culture
o No test can be culture free
o Culture-fair intelligence test: test/assessment process
designed to minimize the influence of culture with regard to
various aspects of evaluation procedure
o Another approach called for culture-specific intelligence tests
Ex.) The BITCH (Black Intelligence Test of Cultural Homogeneity) measured streetwiseness
Lacked predictive validity and useful, practical information
CHAPTER 10: TESTS OF INTELLIGENCE
The Stanford-Binet Intelligence Scales
First to have detailed administration and scoring instructions
First American test to yield an IQ
First to use alternate items (an item that can be used in place of another)
Lacked minority group representation
Ratio IQ = (mental age / chronological age) x 100
Deviation IQ/test composite: performance of one individual compared to the performance of others of the same age. Has mean of 100 and standard deviation of 16
Age scale: items grouped by age
Point scale: items organized by category
The Stanford-Binet Intelligence Scales: Fifth Edition
Measures fluid intelligence, crystallized knowledge, quantitative knowledge, visual processing, and short-term (working) memory
Uses adaptive testing: testing individually tailored to testtakers to ensure that items are neither too difficult (frustrating) nor too easy (false hope)
Examiner establishes rapport with testtaker, then administers a routing test to direct, or route, the examinee to the test items most likely to be at an optimal level of difficulty
Teaching items: show testtaker what is expected and how to do it
o Can be used for qualitative assessment, but not for scoring
Subtests for verbal and nonverbal tests share the same name but involve different tasks
Floor: lowest level of items on a subtest
Ceiling: highest-level item of a subtest
Basal level: base-level criterion that must be met for testing on the subtest to continue
Ceiling level is met when the testtaker fails a certain number of items in a row; testing discontinues here
Scores: raw, standard, composite
Extra-test behavior: behavioral observation
The Wechsler Tests
- Commonality among all versions: all yield deviation IQs with a mean of 100 and a standard deviation of 15
Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV)
Core subtest: administered to obtain a composite score
Supplemental/Optional subtest: provides additional clinical information or extends the number of abilities or processes sampled
Yields four index scores: a Verbal Comprehension Index, a Working Memory Index, a Perceptual Reasoning Index, and a Processing Speed Index
Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV)
Process score: index designed to help understand how testtakers process various kinds of information
WISC-IV compared to the SB5
Wechsler Preschool and Primary Scale of Intelligence-Third Edition (WPPSI-III)
New scale for children under 6
First major intelligence test that adequately sampled the total population of the United States
Subtests labeled core, supplemental, or optional
Wechsler, Binet, and the Short Form
Short form: test that has been abbreviated in length to reduce the time needed to administer, score, and interpret
Use with caution, only for screening
Provides only estimates
Reducing the number of items usually reduces reliability and thus validity
Wechsler Abbreviated Scale of Intelligence
The Wechsler Test in Perspective
Factor Analysis
o Exploratory factor analysis: summarizing data when we are not sure how many factors are present in our data
o Confirmatory factor analysis: used to test highly specific hypotheses about the factor structure
Other Measures of Intelligence
Tests Designed for Individual Administration
Kaufman Adolescent and Adult Intelligence Test
Kaufman Brief Intelligence Test
Kaufman Assessment Battery for Children
Moved away from information processing and towards a distinction between sequential and simultaneous processing
Tests Designed for Group Administration
Group Testing in the Military
o In WWI, the government needed to test intelligence as a means of differentiating the “unfit” from those of “exceptionally superior ability”
o Army Alpha Test: administered to army recruits who could read; included general information questions, analogies, and scrambled sentences to reassemble
o Army Beta Test: administered to foreign or illiterate recruits; included mazes, coding, and picture completion
o After the war, the Alpha and Beta tests were used widely and often misused
o Screening tool: instrument or procedure used to identify a particular trait or constellation of traits
o ASVAB (Armed Services Vocational Aptitude Battery): administered to prospective recruits or to high school students looking for career guidance
5 career areas: clerical, electronics, mechanical, skill-technical, and combat operations
Group Testing in Schools
o Useful in developing a child’s profile, but cannot be the sole indicator
o Groups of 10-15
o Starting in kindergarten
o Also called traditional group testing, because more modern forms can use computers; these are more aptly called individual testing
Measures of Specific Intellectual Abilities
Widely used intelligence tests sample only some of the many factors that contribute to intelligence
Ex.) Creativity
o Commonly thought to be composed of originality, fluency, flexibility, and elaboration
o If the focus is too heavily on whether an answer is correct, the test doesn’t allow for creativity
o Achievement tests require convergent thinking: a deductive reasoning process that entails recall and consideration of facts as well as a series of logical judgments to narrow down solutions and eventually arrive at one solution
o Divergent thinking: a reasoning process in which thought is free to move in many different directions, making several solutions possible
E.g., word associations, uses of a rubber band
Test-retest reliability for some of these tests is near unacceptable
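The ratio IQ and deviation IQ in the Stanford-Binet and Wechsler notes differ in what they compare: the ratio IQ compares mental age to chronological age, while the deviation IQ locates a score within the testtaker's age group and rescales it. A minimal sketch, assuming a hypothetical age-group mean and SD for the raw scores; the SD argument lets you use 15 (Wechsler) or 16 (the Stanford-Binet value given in these notes).

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ = (mental age / chronological age) x 100."""
    return mental_age / chronological_age * 100


def deviation_iq(raw: float, age_mean: float, age_sd: float, sd: float = 15) -> float:
    """Deviation IQ: z score within the testtaker's age group, rescaled to mean 100.

    `age_mean` and `age_sd` are hypothetical norms for the raw score;
    `sd` is 15 for the Wechsler scales (16 for the Stanford-Binet per these notes).
    """
    z = (raw - age_mean) / age_sd
    return 100 + sd * z


print(ratio_iq(10, 8))                # 125.0  (mental age 10, chronological age 8)
print(deviation_iq(60, 50, 8))        # 118.75 (Wechsler-style, SD 15)
print(deviation_iq(60, 50, 8, sd=16)) # 120.0  (SD 16)
```

The same raw performance (1.25 SDs above the age-group mean) maps to different IQ numbers depending on the scale's SD, which is why scores from tests with different SDs are not directly interchangeable.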
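The ceiling rule in the SB5 notes (testing stops once the testtaker fails a certain number of items in a row) can be sketched as a simple loop. This assumes a hypothetical discontinue criterion of three consecutive failures and a hypothetical list of pass/fail responses; the notes say only "a certain number", and each real subtest sets its own rule.

```python
def administer(responses, start: int = 0, stop_after: int = 3):
    """Give items in difficulty order from `start` until `stop_after` consecutive failures.

    `responses` holds hypothetical pass (True) / fail (False) outcomes for every item;
    `stop_after` is an illustrative discontinue criterion, not any test's official rule.
    Returns the indices of items administered and the number passed.
    """
    given, misses = [], 0
    for i in range(start, len(responses)):
        given.append(i)
        misses = 0 if responses[i] else misses + 1  # a pass resets the failure run
        if misses == stop_after:
            break  # ceiling level reached: testing on this subtest discontinues
    passed = sum(responses[i] for i in given)
    return given, passed


responses = [True, True, True, False, True, False, False, False, True, True]
given, passed = administer(responses)
print(given, passed)  # [0, 1, 2, 3, 4, 5, 6, 7] 4
```

Items 8 and 9 are never administered even though this hypothetical examinee would have passed them, which is the trade-off a discontinue rule makes to keep testing time and frustration down.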
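The short-form note says that cutting items usually lowers reliability. The notes give no formula; one standard way to estimate the effect, swapped in here and not named in the source, is the Spearman-Brown prophecy formula, r_new = n·r / (1 + (n - 1)·r), where n is the ratio of the new test length to the old. The full-length reliabilities below are hypothetical.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Predicted reliability after changing test length by `length_ratio`
    (e.g., 0.5 for a half-length short form, 2.0 for doubling)."""
    n, r = length_ratio, reliability
    return n * r / (1 + (n - 1) * r)


full = 0.90  # hypothetical full-length reliability
print(round(spearman_brown(full, 0.5), 3))   # 0.818 (half-length short form)
print(round(spearman_brown(full, 0.25), 3))  # 0.692 (quarter-length short form)
```

Because validity is limited by reliability, this drop is one reason the notes restrict short forms to screening.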
CHAP.11: OTHER INDIVIDUAL TESTS OF ABILITY IN EDUCATION AND SPECIAL EDUCATION
Alternative Individual Ability Tests Compared with the Binet and Wechsler Scales
- None of these are clearly superior from a psychometric standpoint
- Some are less stable, and most are more limited in their documented validity
- Compare poorly to Binet and Wechsler on all accounts
- They don't rely on a verbal response as much as the Binet and Wechsler do
- Many use just pointing or Yes/No responses, and thus do not depend on the complex integration of visual and motor functioning
- Contain a performance scale or subscale
- Their specificity often limits the range of functions or abilities that they can measure
- Because they are designed for special populations, some alternatives can be administered totally without verbal instructions
SPECIFIC INDIVIDUAL ABILITY TESTS
- Earliest individual tests were typically designed for specific purposes or populations
- One of the first, the Seguin Form Board Test (1800s), produced only a single score
o Used primarily to evaluate mentally retarded adults; emphasized speed and performance
- Later, the Healy-Fernald Test was developed as an exclusively nonverbal test for adolescent delinquents
- Knox developed a battery of performance tests for non-English-speaking adult immigrants to the US, administered without language; speed not emphasized
- These early individual tests were designed for specific populations, produced a single score, and had nonverbal performance scales
- Could be administered without verbal instructions and used with children as well as adults
Infant Scales
- Where mental retardation or developmental delays are suspected, these tests can supplement observation, genetic testing, and other medical procedures
Brazelton Neonatal Assessment Scale (BNAS)
- Individual test for infants between 3 days and 4 weeks
- Purportedly provides an index of a newborn’s competence
- Favorable reviews
- Considerable research base
Bayley Scales of Infant and Toddler Development – Third Edition (BSID-III)
- Bases assessments on normative maturational developmental data
- Designed for infants between 1 and 42 months
- Assesses development across 5 domains: cognitive, language, motor, socioemotional, and adaptive
- Motor scale: assumes that later mental functions depend on motor development
- Excellent standardization
- Generally positive reviews
- Strong internal consistency
- More validity studies needed
- Widely used in research, e.g., with children with Down syndrome, pervasive developmental disorders, cerebral palsy, language impairment, etc.
- Most psychometrically sound test of its kind
- Predictive validity is questionable
Cattell Infant Intelligence Scale (CIIS)
- Based on normative developmental data
- Downward extension of the Stanford-Binet scale for 2- to 30-month-olds
- Similar to the Gesell scale
- Rarely used today
- Sample is primarily based on children of parents from lower and middle classes and therefore does not represent the general population
- Unchanged for 60 years
- Psychometrically unsatisfactory
MAJOR TESTS FOR YOUNG CHILDREN
McCarthy Scales of Children’s Abilities (MSCA)
- Measures ability in children between 2 and 8 years
- Presents a carefully constructed individual test of human ability
- Meager validity
- Produces a pattern of scores as well as a variety of composite scores
- General cognitive index (GCI): standard score with a mean of 100 and a standard deviation of 16
o Index reflects how well the child has integrated prior learning experiences and adapted them to the demands of the scales
- Relatively good psychometric properties
GENERAL INDIVIDUAL ABILITY TESTS FOR HANDICAPPED AND SPECIAL Illinois Test of Psycholinguistic Abilities (ITPA-3)
POPULATIONS cont… - Assumes that failure to respond correctly to a stimulus can result not only
from a defective output system but also from a defective input or
Columbia Mental Maturity Scale – Third Edition (CMMS) cont… information-processing system