Chapter 1: Introduction
Learning Objectives
Standardized tests tend to disadvantage women, test takers whose parents have lower incomes and levels of education,
and ethnic minorities
TEST – a measurement device or technique used to quantify behaviour or aid in the understanding and prediction of behaviour.
- A test measures only a sample of behaviour, and error is always associated with a sampling process.
- Test scores are not perfect measures of a behaviour or characteristic, but they do add significantly to the
prediction process
- The meaning of test scores can change dramatically, depending on how a well-defined sample of individuals scores on a test
ITEM – a specific stimulus to which a person responds overtly; this response can be scored or evaluated (classified,
graded on a scale, or counted)
PSYCHOLOGICAL TEST (or educational test) – a set of items designed to measure characteristics of human beings that pertain to behaviour.
SCALES – relate raw scores on test items to some defined theoretical or empirical distribution
INDIVIDUAL TESTS – those that can be given to only one person at a time (the same way a psychotherapist sees only one
person at a time)
GROUP TEST – can be administered to more than one person at a time by a single examiner (an instructor gives everyone in the class a test at the same time)
Tests can also be categorized by the behaviour they measure. Historically, experts have distinguished among achievement, aptitude, and intelligence as different types of ability.
- ACHIEVEMENT – previous learning; for example, a test that measures or evaluates how many words you can spell correctly is called a spelling achievement test
- APTITUDE – potential for learning or acquiring a specific skill; for example, a spelling aptitude test measures how many words you might be able to spell given a certain amount of training
- INTELLIGENCE – a person’s general potential to solve problems, adapt to changing circumstances, think abstractly,
and profit from experience
HUMAN ABILITY – encompasses the considerable overlap between achievement, aptitude, and intelligence
There is a clear distinction between ability and personality tests. ABILITY TESTS are related to capacity or potential
PERSONALITY TESTS – related to the overt and covert dispositions of the individual – measure typical behaviour
- STRUCTURED PERSONALITY TESTS – provide a statement, usually of the “self-report” variety, and require the subject
to choose between two or more alternative responses
- PROJECTIVE PERSONALITY TESTS – either the stimulus (test materials) or the required response – or both – are
ambiguous
Psychological testing – refers to all the possible uses, applications, and underlying concepts of psychological and
educational tests.
INTERVIEW – a method of gathering information through verbal interaction, such as direct questions
Historical Perspective
- Most major developments occurred over the last century in the US
- The Chinese had a relatively sophisticated civil service testing program 4000 years ago; every third year, oral examinations were given to help determine work evaluations and promotion decisions
- By the Han Dynasty (206 BCE to 220 CE), test batteries were common
o TEST BATTERIES – two or more tests used in conjunction
- The English learned of the system from the Chinese and adopted it; the French and the US followed, the US in 1883 with the American Civil Service Commission
REPRESENTATIVE SAMPLE – one that comprises individuals similar to those for whom the test is to be used
Mental age – introduced in the revised 1908 Binet-Simon Scale; expresses a person's performance as the age of the average person with similar abilities, which may differ from chronological age
World War I – the Army needed testing to deal with the influx of recruits. Robert Yerkes, President of the American Psychological Association, recruited psychologists and created two structured tests of human abilities: the Army Alpha, which required reading ability, and the Army Beta, which measured the intelligence of illiterate adults
Thematic Apperception Test (TAT) – Henry Murray and Christiana Morgan; used ambiguous pictures and required the subject to make up a story about the scene (personality test)
MMPI-2 – Minnesota Multiphasic Personality Inventory; used empirical methods to determine the meaning of a test response. Currently the most widely used and referenced personality test
Factor Analysis – a method of finding the minimum number of dimensions, called factors, to account for a large number
of variables
Chapter 2: Norms and Basic Statistics for Testing
Learning Objectives
Statistical methods serve two important purposes in the quest for scientific understanding: description and inference
INFERENCES are logical deductions about events that cannot be observed directly
- For example, we can't know exactly how many people watched a movie, but we can use a sample to infer the percentage of people who saw the film
DESCRIPTIVE STATISTICS – methods used to provide a concise description of a collection of quantitative information
INFERENTIAL STATISTICS – methods used to make inferences from observations of a small group of people, known as a sample, to a larger group of individuals, known as a population
EQUAL INTERVALS – a scale has the property of equal intervals if the difference between two points at any place in the
scale has the same meaning as the difference between two other points that differ by the same number of scale units
ABSOLUTE 0 (ZERO) – an absolute 0 is obtained when nothing of the property being measured exists. For example, you can have a heart rate of 0 (dead), but you can't have an intelligence of 0
- For many psychological qualities, it is extremely difficult, if not impossible, to define an absolute 0
Types of Scales
NOMINAL SCALES – not really scales at all; their only purpose is to name objects (e.g., baseball jersey numbers)
- When numbers are attached to categories, most statistical procedures are not meaningful (e.g., if 1 = red and 2 = blue, what would a mean of 1.87 signify?)
ORDINAL SCALES – a scale with the property of magnitude but not equal intervals or an absolute 0
- Allows you to rank but tells you nothing about the difference between the ranks
- For most problems in psychology the precision to measure the exact differences between intervals does not
exist – so most often ordinal scales are used
INTERVAL SCALE – has the property of magnitude and equal intervals but not absolute zero
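- Example, temperature in degrees Fahrenheit: the difference between 40°F and 50°F is the same as between 80°F and 90°F, but 0°F does not mean an absence of temperature, so 80°F is not twice as hot as 40°F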
RATIO SCALE – has all 3 properties, magnitude, equal intervals, and an absolute 0
- Example, speed of travel: 0 km per hour is no speed, and 60 km per hour is twice as fast as 30 km per hour
- A single test score means more if one relates it to another test score.
- A distribution of scores summarizes the scores for a group of individuals
FREQUENCY DISTRIBUTIONS – displays scores on a variable or a measure to reflect how frequently each value was obtained
- Define all the possible scores and determine how many people obtained each of those scores
- For most variables, the distribution is bell-shaped
- Positive skew – the tail goes off toward the higher, or positive, end of the X axis
- CLASS INTERVAL – when you draw a frequency distribution you must decide on the width of the class intervals
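The two steps above can be sketched in a few lines. First, list every possible score and count how often it occurs. Second, optionally group the scores into class intervals of a chosen width. This is a minimal sketch that assumes whole-number scores. The function names and the sample scores are hypothetical.

```python
from collections import Counter

def frequency_distribution(scores):
    """(score, frequency) for every possible score from the lowest to the highest."""
    counts = Counter(scores)
    return [(s, counts[s]) for s in range(min(scores), max(scores) + 1)]

def class_interval_counts(scores, width):
    """Group scores into class intervals of the given width, starting at the lowest score."""
    low = min(scores)
    counts = Counter((s - low) // width for s in scores)  # bin index of each score
    n_bins = (max(scores) - low) // width + 1
    return [((low + i * width, low + (i + 1) * width - 1), counts[i]) for i in range(n_bins)]

scores = [3, 5, 5, 6, 6, 6, 7, 7, 9]  # hypothetical scores
for score, freq in frequency_distribution(scores):
    print(score, "*" * freq)  # a crude sideways histogram
print(class_interval_counts(scores, width=2))
```

Changing `width` shows how the choice of class interval changes the apparent shape of the distribution.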
PERCENTILE RANKS – replace a simple rank when we want to adjust for the number of scores in a group
- What percentage of the scores fall below a particular score (cases below the case of interest)
- The formula is:
$P_r = \frac{B}{N} \times 100$
where B is the number of scores below $X_i$ (the score of interest), N is the total number of scores, and $P_r$ is the percentile rank of $X_i$
- You form a ratio of the number of cases below the score of interest to the total number of scores
- This ratio will always be less than or equal to 1, so the percentile rank is at most 100
- It is a measure of relative performance
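The percentile rank formula can be written as a short function. This sketch counts B as the scores strictly below the score of interest, which matches "cases below the case of interest" above. The function name and the sample scores are hypothetical.

```python
def percentile_rank(scores, x):
    """Pr = B / N * 100, where B = number of scores below x and N = total number of scores."""
    below = sum(1 for s in scores if s < x)  # B: cases strictly below the score of interest
    return below / len(scores) * 100

scores = [2, 4, 4, 5, 7, 8, 9, 10, 10, 12]  # hypothetical class scores
print(percentile_rank(scores, 8))  # 5 of the 10 scores are below 8 -> 50.0
```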
Describing Distributions
MEAN – arithmetic average
$\bar{X} = \frac{\sum X}{N}$
$\bar{X}$ = "X bar", which is the mean
∑ = Greek letter sigma, means sum or add scores together
- The mean tells you nothing about variability: two sets of scores can have the same mean but vary greatly.
- One way to measure variability is to subtract the mean from each score, $(X - \bar{X})$, and then total these deviations, shown as lowercase x: $x = (X - \bar{X})$
o Sum of the deviations around the mean will always equal 0
- To avoid this you square all the deviations around the mean to get rid of negatives
- Then obtain the average squared deviation around the mean, which is the variance
The square root of the variance is the standard deviation (σ), thus the square root of the average squared deviation around the mean:
$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$
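The steps for the mean, deviations, variance, and standard deviation can be traced in code. The sketch shows the deviations summing to 0, which is why they are squared. It divides by N as in the formula above. Sample estimates of the population variance divide by N − 1 instead. The function name and data are hypothetical.

```python
import math

def describe(scores):
    n = len(scores)
    mean = sum(scores) / n                          # X-bar = sum(X) / N
    deviations = [x - mean for x in scores]         # x = X - X-bar
    variance = sum(d * d for d in deviations) / n   # average squared deviation around the mean
    sd = math.sqrt(variance)                        # sigma = square root of the variance
    return mean, sum(deviations), variance, sd

print(describe([2, 4, 4, 4, 5, 5, 7, 9]))  # (5.0, 0.0, 4.0, 2.0)
```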
Z SCORE – the difference between a score and the mean, divided by the standard deviation
$Z = \frac{X_i - \bar{X}}{S}$
S – the standard deviation of the set of scores
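The Z score formula applied to a whole set of scores is shown below. This sketch assumes S is computed with `statistics.pstdev` (divide by N). Using `statistics.stdev` (divide by N − 1) instead would give slightly smaller Z values. The data are hypothetical.

```python
import statistics

def z_scores(scores):
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # assumption: population-style SD (divide by N)
    return [(x - mean) / sd for x in scores]

print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After the transformation the scores have a mean of 0 and a standard deviation of 1.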
MCCALL’S T
Same as a Z score, but the mean is 50 rather than 0 and the standard deviation is 10 rather than 1
T = 10Z + 50
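McCall's T is a linear rescaling of Z, as in the formula above. The function name is hypothetical.

```python
def t_score(z):
    """McCall's T: rescale a Z score to mean 50, standard deviation 10."""
    return 10 * z + 50

for z in (-1.5, 0.0, 2.0):
    print(z, "->", t_score(z))  # -1.5 -> 35.0, 0.0 -> 50.0, 2.0 -> 70.0
```

Because T scores are rarely negative or fractional in practice, they are easier to report than raw Z scores.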
QUARTILES are points that divide the frequency distribution into equal fourths
- The first quartile is the 25th percentile (Q1), the second quartile is the median or 50th percentile (Q2), and the third quartile is the 75th percentile (Q3)
- The INTERQUARTILE RANGE is the interval of scores bounded by the 25th and 75th percentile
DECILES divide the distribution into tenths (10% each); thus D9 is the point below which 90% of the cases fall
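Quartiles, the interquartile range, and deciles can be computed with Python's standard `statistics.quantiles` (Python 3.8+). Textbooks and software interpolate between scores in slightly different ways. This sketch assumes the `method="inclusive"` convention, and other conventions give slightly different cut points. The function names and data are hypothetical.

```python
import statistics

def quartiles_and_iqr(scores):
    """Q1, Q2 (median), Q3, and the interquartile range Q3 - Q1."""
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1

def decile(scores, k):
    """D_k: the point below which k * 10% of the cases fall (k = 1..9)."""
    return statistics.quantiles(scores, n=10, method="inclusive")[k - 1]

scores = list(range(1, 10))        # hypothetical scores 1..9
print(quartiles_and_iqr(scores))   # (3.0, 5.0, 7.0, 4.0)
print(decile(scores, 9))           # 8.2
```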
STANINE SYSTEM – converts any set of scores into a transformed scale that ranges from 1 to 9 (mean of 5, standard deviation of about 2)
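One common way to assign stanines starts from a percentile rank and uses the conventional 4-7-12-17-20-17-12-7-4 percent bands. This sketch assumes those bands. It also assumes that a percentile rank falling exactly on a cut point goes to the higher stanine. The names are hypothetical.

```python
from bisect import bisect_right

# Cumulative percentages at the boundaries 1|2, 2|3, ..., 8|9,
# built from the conventional 4-7-12-17-20-17-12-7-4 percent bands.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    return bisect_right(STANINE_CUTS, percentile_rank) + 1

print([stanine(p) for p in (2, 10, 50, 90, 99)])  # [1, 2, 5, 8, 9]
```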
CRITERION-REFERENCED TEST – describes the specific types of skills, tasks, or knowledge that the test taker can
demonstrate such as mathematical skills
Chapter 3: Reliability
Test-retest – considers the consistency of the test results when the test is administered on different occasions
Parallel forms (equivalent forms) – compares different forms of the test that measure the same attribute
Internal consistency – how people perform on similar subsets of items selected from the same form of the measure
Split-half measure – a test is given and divided into halves that are scored separately
Cronbach's coefficient alpha – estimates the internal consistency of tests in which the items are not scored as 0 or 1
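The split-half and coefficient alpha estimates can be sketched in plain Python. For the split-half, the sketch correlates odd-item and even-item totals. It then applies the Spearman-Brown correction, 2r / (1 + r), which is the usual way to estimate full-length reliability from a half-test correlation. For alpha, it uses α = k/(k − 1) × (1 − Σ item variances / total-score variance). The rating data and all function names are hypothetical, and the sketch assumes divide-by-N variances.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pvariance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def split_half(item_scores):
    """item_scores: one row per person, one column per item. Odd/even split."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, 2 * r_half / (1 + r_half)        # Spearman-Brown: full-length estimate

def cronbach_alpha(item_scores):
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 1-5 ratings: 5 people x 4 items
data = [[4, 5, 4, 5], [2, 3, 2, 2], [3, 3, 4, 3], [5, 4, 5, 5], [1, 2, 1, 2]]
print(split_half(data))
print(round(cronbach_alpha(data), 3))  # 0.961
```

Alpha does not depend on one particular split of the items, which is its advantage over a single split-half estimate.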
Class Notes
(DOPE)
- Sierpinski’s Triangle
Empiricism – truth is revealed through observation, can’t trust your eyes, data is the only way to know
Relativism – there is no ultimate truth just what is true for each perceiver (Protagoras)
Platonic forms – Plato sought to fix relativism, we can know things – the number 6 exists, we can have 6 things
- Believed there is a universal standard of beauty, what exists in the world around us is just a shadow of a perfect
form that we can’t know
Measurement Theory
(X = T + E: observed score = true score + error)
Statistics – three lessons to know: the mean, the variance, and the proportion of explained variance
- Mean
- Variance
- Explained variance – some differences are explainable, and sometimes no change can be alarming (e.g., we expect kids to grow up), but some differences we just can't explain
Errors
- Explainable: systematic factors that produce systematic changes in scores (height in kids)
o Learning, training, growth, fatigue
- Unexplainable: unsystematic factors that produce random changes in scores
o Marks in school are sometimes higher or lower, with no consistent pattern or order
- Unreliability then represents the extent of unexplained or unsystematic variation in scores of a person on some
trait/ability when that trait/ability is repeatedly measured
o Reliability is the extent of systematic variation in scores of one person on some trait/ability
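A small simulation can make X = T + E and the split between systematic and unsystematic variation concrete. The sketch draws hypothetical true scores T and adds random error E. It then takes reliability as the share of observed-score variance that is true-score variance, var(T) / var(X). That ratio is the classical test theory definition and is an assumption here, since the notes describe reliability only in words. All parameter values and names are hypothetical.

```python
import random
import statistics

def simulate_reliability(n_people=10_000, true_sd=10.0, error_sd=5.0, seed=1):
    rng = random.Random(seed)
    true = [rng.gauss(50, true_sd) for _ in range(n_people)]     # T: systematic (trait) part
    observed = [t + rng.gauss(0, error_sd) for t in true]        # X = T + E, with random error E
    return statistics.pvariance(true) / statistics.pvariance(observed)

print(simulate_reliability())  # close to 10**2 / (10**2 + 5**2) = 0.8
```

With `error_sd=0.0` every observed score equals its true score and the ratio is exactly 1. Larger error variance drives it toward 0.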
Sigma – like any Greek letter, refers to the true value that is out there, which we can't know
- Internal consistency – analyze same instrument and look for comparable items
- Split-half reliability – divide a single test into two halves (odd/even items is the most common split)
- KR20 – for each item, the proportion of people who got it right (or endorsed it in a particular direction) and the proportion who got it wrong, together with the overall variance of the test
- Uses each item
o Sum of p × q for each item (p1*q1 + p2*q2 + …)
- KR21 – uses the average across items (mean of p and mean of q) instead of each item
- p is the proportion correct and q is the proportion incorrect (q = 1 − p)
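Both Kuder-Richardson formulas can be computed from a 0/1 response matrix. The sketch uses the standard forms:

- KR20 = k/(k − 1) × (1 − Σpq / σ²)
- KR21 = k/(k − 1) × (1 − k·p̄·q̄ / σ²)

Here k is the number of items and σ² is the variance of total scores, assumed to be divide-by-N so that it matches the per-item p × q. KR21 equals KR20 only when all items are equally difficult. Otherwise it comes out lower. The responses and the function name are hypothetical.

```python
def kr20_kr21(responses):
    """responses: rows = people, columns = items scored 1 (right/endorsed) or 0."""
    n, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n   # variance of total scores
    p = [sum(row[i] for row in responses) / n for i in range(k)]  # proportion correct per item
    sum_pq = sum(pi * (1 - pi) for pi in p)                      # p1*q1 + p2*q2 + ...
    kr20 = k / (k - 1) * (1 - sum_pq / var_total)
    p_bar = sum(p) / k                                           # average item p
    kr21 = k / (k - 1) * (1 - k * p_bar * (1 - p_bar) / var_total)
    return kr20, kr21

responses = [  # hypothetical: 6 people x 5 items
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(kr20_kr21(responses))  # KR20 = 5/6, KR21 = 5/7 (about 0.833 and 0.714)
```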
Saupe’s Quickie
- Reliability: $r_{xx} = 1 - \frac{0.19 \times \text{number of items}}{SD^2}$
- .19 is roughly 1/5, so the numerator is about one-fifth of the number of items
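Saupe's quick estimate needs only the number of items and the standard deviation of total scores. This sketch assumes a test of right/wrong items, and the item count and SD in the usage line are hypothetical.

```python
def saupe_reliability(n_items, sd):
    """Saupe's quick estimate: r_xx = 1 - (0.19 * number of items) / SD**2."""
    return 1 - 0.19 * n_items / sd ** 2

print(round(saupe_reliability(50, 7.0), 3))  # 1 - 9.5 / 49 -> 0.806
```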
Standard error of measurement (SEM; "give or take" in stats speak) – how much any score wiggles
- $SEM = SD_X \sqrt{1 - r_{xx}}$
- $1 - r_{xx}$: one minus the reliability is the unreliability
- If the SEM is 2.5 and the person scored 50, the true score most likely falls between 47.5 and 52.5 (±1 SEM, a band that covers about 68% of cases)
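The SEM and the resulting score band can be sketched as follows. The SD of 10 and reliability of .9375 are hypothetical values, chosen because they reproduce the SEM of 2.5 in the example above. The `z` parameter is an illustrative addition: 1.0 gives the ±1 SEM band, and 1.96 gives a band of roughly 95% under normally distributed error.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

def score_band(observed, sd, reliability, z=1.0):
    """Observed score plus or minus z SEMs."""
    e = sem(sd, reliability)
    return observed - z * e, observed + z * e

print(sem(10, 0.9375))             # 10 * sqrt(0.0625) = 2.5
print(score_band(50, 10, 0.9375))  # (47.5, 52.5)
```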
- Item difficulty = how much of the trait is needed to answer the item correctly 50%
Validity
- Reliability is consistency – the property is repeatable
- Validity (what a score means; the extent to which a test measures what it claims to) is a much more difficult and
complex issue
- Much harder on pieces we can’t see – personality, leadership, etc.
- An invalid test can still be reliable, but an unreliable test can never be valid
- Content - degree to which the questions, tasks, or items on a test are representative of the universe of behaviour the test was designed to measure (e.g., grade 3 spelling); difficult to establish with an ill-defined trait
o Face Validity – items are valid if they look valid
- Criterion-related - a test is shown to be effective in estimating one’s performance on an outcome measure
o If the test is valid, we can discriminate between those who will and those who won't meet the criterion
o Concurrent: scores obtained simultaneously
o Predictive: criterion obtained months or years later; test scores are used to estimate outcome measures obtained at a later date, e.g., using high school marks to predict whether you will graduate from university, which is then used to determine admission
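Criterion-related validity is usually summarized as a validity coefficient, which is the correlation between test scores and the criterion. This sketch computes a Pearson r for a hypothetical predictive design: admission test scores paired with later first-year grades. The data and function name are illustrative only. Correlating a test with related and unrelated tests the same way gives convergent and divergent validity evidence.

```python
import math

def validity_coefficient(test_scores, criterion):
    """Pearson r between test scores and a criterion measure."""
    n = len(test_scores)
    mx, my = sum(test_scores) / n, sum(criterion) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(test_scores, criterion))
    sxx = sum((x - mx) ** 2 for x in test_scores)
    syy = sum((y - my) ** 2 for y in criterion)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical: admission test scores and grades obtained a year later
test = [52, 61, 45, 70, 58, 66]
grades = [2.6, 3.1, 2.4, 3.6, 2.9, 3.2]
print(round(validity_coefficient(test, grades), 2))
```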
- Construct - The most difficult and elusive form of validity
o No single external referent is sufficient
o A network of interlocking suppositions can be derived from existing theory
o Boils down to rational argument: this is what I expect to measure, this is what it will show, and this is what we expect to see
- Convergent Validity: test correlates with other tests with which it overlaps
- Divergent Validity: test does not correlate with tests from which it should differ