These are my first draft personal study notes taken from a combination of readings from:
1. Psychological Assessment in the SA context (2005), 2nd ed (edited Foxcroft and Roodt)
2. Dr M Mazabow’s Psyc Assessment study guide (the so-called Boston Notes) – many of my
chapter notes are a direct copy of Dr Mazabow’s format and summary of his info which I
referred to extensively.
3. My notes are not intended as a substitute for either the text book or the Boston notes
The “Focus points” in the intro to my notes are from TL 104/2008 which I am personally putting a
lot of faith in!
I have done a small comparison of some past exam papers (2004, 2005, & 2007, from the
Boston Notes) and the questions appear to follow each assignment’s “Aims” (always a good
place to start since this is what the lecturers want you to come out of the course knowing). These
aims you will find in TL101/2008 at the start of each Assignment instruction and begin with the
words “The aim of this assignment is…”
I may have left entire sections out or may have misunderstood some contexts (both apply
particularly to assignment 2 – Psychometric Properties! I had great difficulty
getting through this section). The same applies to my understanding of the EXAM REQUIREMENT: if
you feel I have misinterpreted it, please follow your own understanding or check with UNISA.
1. EXAM REQUIREMENT
a. Exam Questions
i. Year Mark (20%)
ii. SECTION A: 20 X Multiple choice (assume 20%)
iii. SECTION B: 3 out of 5 essay questions (assume 20% per question)
b. Syllabus
i. Know theory as discussed in relevant sections of prescribed book
ii. Be able to answer applied questions on each field of knowledge
iii. Prescribed book, TL 101-105 (additional recommended journal articles – not in
detail, just be able to apply the information)
iv. Know:
1. Structure and content of different types of tests (e.g. tests for preschool
intelligence, self-report inventories, etc)
2. Discuss examples of each type of test
3. Concentrate on tests relevant to the assignments
c. RECOMMENDATION
i. Learn
1. All theory, and go over assignment 1 for SECTION A
2. Concentrate on 3 topics for SECTION B
a. Lightly cover remaining 2 topics (limit to focus points for these
topics)
d. FOCUS POINTS
i. (02) PSYCHOMETRIC PROPERTIES OF PSYCHOLOGICAL ASSESSMENT
MEASURES
1. Technical evaluation and establishment of norms
a. TL 101/2008 – norm referenced and criterion referenced tests
and issues of reliability and validity
b. Additional material on reliability and validity in TL’s
ii. (03) CROSS-CULTURAL APPLICATION OF PSYCHOLOGICAL ASSESSMENT
MEASURES
1. Issues of test bias and test fairness
2. Fair use of assessment measures
3. Factors responsible for differences in test performance in a multicultural
context
4. Use recommended reading and assignment feedback
iii. (04) DEVELOPMENTAL ASSESSMENT OF YOUNG CHILDREN
1. Work through activities in Section C
2. View each activity as a potential question
3. Do NOT need detailed knowledge of the contents of measures
described in study material
iv. (05) ASSESSMENT OF COGNITIVE FUNCTIONING WITH PSYCHOLOGICAL
MEASURES
1. Understand difference between main theoretical approaches to
intelligence
2. Value of these approaches for SA context
v. (06) ASSESSMENT IN AN INDUSTRIAL CONTEXT
1. Focus on the role of different types of psychological measures in the
selection process.
2. (02) PSYCHOMETRIC PROPERTIES OF PSYCHOLOGICAL ASSESSMENT MEASURES
a. FOCUS
i. Technical evaluation and establishment of norms
1. TL 101/2008 – norm referenced and criterion referenced tests and
issues of reliability and validity
2. Additional material on reliability and validity in TL’s
b. NORM-REFERENCED AND CRITERION-REFERENCED TESTS
i. INTRODUCTION
1. Testing vs Assessment
a. Test results represent only one source of assessment
b. Assessment is a multi-dimensional process (tests, interviews,
background information, parent interviews, reference checking)
c. Tests are a standardised procedure for
i. sampling behaviour
ii. Describing that behaviour (categories)
iii. Interpretation of test scores
iv. Sample of behaviour obtained used to predict other
non-tested behaviour
d. Test items consist only of a sample of the population of
behaviours examiner is interested in
e. From limited sample one makes inferences about other relevant
behaviours
f. Predict additional behaviours
ii. NORMS AND TEST STANDARDISATION
1. Task here is to standardise the test using a norm group
2. Raw score – most basic level of information provided by a test, e.g. 25/40
3. Raw score only becomes meaningful in relation to the scores obtained
by a large and representative sample of subjects – i.e. in relation to
norms
4. Norm group/normative sample is a sample of subjects representative of
the population for whom the test is intended.
a. Also called standardization sample
b. Process of test standardisation
i. Draw a random heterogeneous and representative
sample of people
ii. Test all these people with your test and obtain their raw
scores
iii. From this distribution of test results you derive
summary statistics or norms
1. Mean score and SD for each age group
2. Percentile and standard scores
5. Once you have this summary of the norm group’s scores you can test
another individual and tell how well they did compared to the norm
group – this is done by transforming the individual’s raw score into a
standard score or percentile
6. Need to update norms periodically as they become obsolete over time.
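The standardisation steps above can be sketched in a few lines of Python. This is only an illustration: the norm group scores are made up, the function names `build_norms` and `percentile_rank` are hypothetical, and the sample standard deviation is an assumed choice.

```python
from statistics import mean, stdev


def build_norms(raw_scores):
    """Summarise a norm group's raw scores as a mean and a standard deviation."""
    return {"mean": mean(raw_scores), "sd": stdev(raw_scores)}


def percentile_rank(raw_scores, score):
    """Percentage of the norm group scoring below the given raw score."""
    below = sum(1 for s in raw_scores if s < score)
    return 100 * below / len(raw_scores)


# Hypothetical raw scores (out of 40) for a very small norm group
norm_group = [12, 15, 18, 20, 22, 25, 25, 27, 30, 34]
norms = build_norms(norm_group)

# A new individual's raw score of 25 expressed relative to the norm group
z = (25 - norms["mean"]) / norms["sd"]
print(norms, round(z, 2), percentile_rank(norm_group, 25))
```

A real norm group would be far larger and norms would usually be computed per age group, as noted above.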
iii. TRANSFORMING RAW SCORES: TYPES OF NORMS
1. ASSUMPTION: that the raw scores of the norm group are normally
distributed in a bell curve shape – 68% of scores fall within 1 SD above
and below the mean
2. PERCENTILES
a. Most common way of transforming a psychological test score
b. A percentile score of 40 means you scored better than 40% of
the norm group.
c. Expressed in terms of the % of persons in the standardisation
sample who fall below a given raw score
d. Median = 50th percentile (the mean and median coincide in a normal distribution)
e. Q1 = 25th & Q3 = 75th percentiles (cut off the lowest and highest
quarters of the normal distribution)
f. Due to normal distribution there is a marked inequality of
percentile scale units especially at the extreme ends
g. Cannot be used for arithmetic functions
3. STANDARD SCORES
a. Z-scores tell you how far from the mean of the norm your score
is and whether it is above or below the mean
b. Uses Standard deviation units
i. assumes population mean =0.00 and SD = 1.00
ii. Score of +1.00 = 1 SD above the mean (better than 84%
of sample according to normal distribution)
c. Deriving Z-Score
i. Z=(X-M)/SD
d. Represent interval level measurements so statistical calculations
can be done on these scores
e. Can use Standard scores to compare
i. two scores obtained by different people on the same
test or
ii. scores on 2 different tests by the same person
f. Standard scores maintain the relative magnitude of differences
found in the original scores
g. Distribution of Z scores is limited: -3.0 to +3.0
h. LINEARLY TRANSFORMED STANDARD SCORES
i. Multiply Z score by a constant value then add a constant (recommended:
multiply by 10 then add 50)
ii. -3.0 to 0.0 to +3.0 becomes 20 to 50 to 80
iii. Become statistically more complex and less useful
i. NORMALISED STANDARD SCORES
i. These are standard scores that have been transformed
to fit a normal distribution
1. McCall’s T-score: T = 10(Zn) + 50, where Zn is the normalised Z score
2. Stanine Scale (standard nine)
a. Range from 1 – 9 mean 5 SD 1.96
b. Scale units are equal
c. Reflects persons position relative to
normal distribution
d. Comparable across groups
e. Allow statistical manipulation
f. Has only 9 units so is an approximate
scale
3. Sten scale (standard ten)
a. Range 1- 10 mean 5.5 SD 2
4. DEVIATION IQ SCALES
a. Normalized standard score with mean 100 SD 15
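The conversions between raw scores, Z scores and the derived scales can be tried out with a short Python sketch. It assumes the raw scores are normally distributed (as stated in the ASSUMPTION above) and uses the linear forms of the transformations. The function names are my own, not from the textbook.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Z = (X - M) / SD"""
    return (raw - mean) / sd


def t_score(z):
    """Linear transformation: multiply Z by 10 then add 50."""
    return 10 * z + 50


def deviation_iq(z):
    """Deviation IQ scale: mean 100, SD 15."""
    return 15 * z + 100


def percentile_from_z(z):
    """Percentage of a normal distribution falling below this Z score."""
    return 100 * NormalDist().cdf(z)


# Hypothetical test with norm mean 25 and SD 5; individual scores 30
z = z_score(30, 25, 5)
print(z, t_score(z), deviation_iq(z), round(percentile_from_z(z)))
```

A Z score of +1.00 comes out at about the 84th percentile, which matches the figure quoted under STANDARD SCORES.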
5. DEVELOPMENTAL SCALES
a. Typically used when measuring characteristics that increase
progressively with increase in age
b. MENTAL AGE SCALES
i. Basal age is computed – highest age at and below which
a test is passed
ii. Eg. average 10yr old should pass this test so 9yr old who
passes a test that has a basal age of 10 is functioning at
a 10-year old mental age level
c. GRADE EQUIVALENTS
i. Performance is translated into a grade value
ii. Specifically for scholastic measures
iii. Values are not precise
6. WHICH OF THE NORM SCORES TO USE?
a. Personal preference or convention
7. THE NORM GROUP
a. Group should be large
b. Representative cross-section of the population for which the
test is designed i.e. variety of different socio-economic levels,
genders, racial groups, urban/rural and geographical locations
c. Randomly sampled (simple or stratified)
d. May be forced to pick a diverse and approximately
representative sample using the sources available
8. CRITERION-REFERENCED VERSUS NORM-REFERENCED TESTS
a. Criterion-referenced tests
i. Measure what the subject can do rather than how he
compares with other subjects
ii. The subjects performance on the test is not compared
to any reference group
iii. Compared to an external standard
iv. These tests are used particularly in EDUCATIONAL
context
1. Can the child read at the level appropriate for
his grade?
2. What arithmetic skills has he mastered?
v. Also used to determine proficiency (job, drivers license)
vi. Should have clearly defined domain of knowledge or
skills
b. SETTING STANDARDS AND CUT-OFF SCORES
i. Expectancy table – table that indicates the relationship
between performance on a test (say aptitude) and
success on a criterion
ii. E.g. for machine operators: 80% of employees who scored between 31-40 on the
aptitude test were successful in job performance later on
iii. 40% who scored between 21-30 were successful
iv. CUT OFF SCORE: is then set such that it meets your
expectation for prediction of success
v. ADVANTAGE: is easy to show the relationship between
test score and probable performance
vi. DISADVANTAGE: requires large samples to construct the
table otherwise predictions will be inaccurate
vii. Be aware there is a band of error in the prediction
viii. Also may be discriminatory
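An expectancy table like the machine operator example can be built from past records. The sketch below is a minimal illustration using invented data chosen to reproduce the 80% and 40% figures; `expectancy_table` is a hypothetical helper name.

```python
from collections import defaultdict


def expectancy_table(records, bands):
    """records: (test_score, succeeded) pairs; bands: inclusive (low, high) score ranges.

    Returns the percentage of people in each score band who later succeeded.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [successes, total]
    for score, succeeded in records:
        for low, high in bands:
            if low <= score <= high:
                counts[(low, high)][0] += int(succeeded)
                counts[(low, high)][1] += 1
                break
    return {band: 100 * s / n for band, (s, n) in counts.items() if n}


# Invented aptitude scores and later job success for ten operators
records = [
    (35, True), (38, True), (33, True), (40, True), (31, False),
    (25, True), (22, False), (28, True), (30, False), (21, False),
]
table = expectancy_table(records, bands=[(21, 30), (31, 40)])
print(table)
```

With only five people per band the percentages are very unstable, which is the DISADVANTAGE noted above: large samples are needed before a cut-off score can be trusted.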
c. RELIABILITY
i. INTRODUCTION
1. Two broad ways to understand reliability:
a. Temporal stability: consistency of measurement stated within a
margin of error (reliability coefficient)
b. Internal stability (consistency/homogeneity)
ii. CLASSICAL THEORY
1. Two sets of influences in any test score
a. Factors that contribute to consistency – the stable attributes
you are trying to measure
b. Factors that contribute to the inconsistency of the score –
factors that have nothing to do with the attribute being
measured but nevertheless affect the score
c. X = T + e (observed score = true score + error)
d. The True score can never be known best we can do is state that
the true score lies within a given interval
e. Reliability always implies a certain amount of error
iii. SOURCES OF MEASUREMENT ERROR
1. Unsystematic measurement error (unpredictable and inconsistent)
a. Item Selection
b. Test Administration
c. Test Scoring
2. Systematic measurement error (consistent and predictable directional
effect – will cause score to either be higher or lower)
a. Test is measuring something other than or in addition to what
was intended.
i. Minimized by proper development procedures
ii. It is difficult to assess any trait/quality/characteristic in
isolation
b. X = T + [eu + es]
3. MEASUREMENT ERROR AND RELIABILITY
a. Refers to the same thing “How consistent are the test scores
over time”
iv. THE RELIABILITY COEFFICIENT
1. Reliability coefficient = correlation coefficient
2. Reliability coefficient gives us the ratio:
a. True score variance : Total variance (true score variance + error
variance): R = σ²T / σ²X
b. Recall σ²X = σ²T + σ²e (Total variance = True variance + Error variance)
c. Therefore Reliability R = σ²T / (σ²T + σ²e)
d. Reliability coefficient ranges from 0.00 to 1.00
(completely unreliable to fully reliable).
The higher the denominator value (due to σ²e, the error variance),
the lower the result (less reliable)
e. A Reliability coefficient of 0.80 indicates that 80% of variance is
due to True variance and 20% is due to error variance
3. This measure of reliability ONLY APPLIES when the test is administered
under standard conditions, to a person who is similar to the normative
sample.
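The variance ratio behind the reliability coefficient is easy to check numerically. This is just the ratio of true variance to total variance written as a small Python function, with made-up variance values.

```python
def reliability(true_variance, error_variance):
    """Reliability = true score variance / (true score variance + error variance)."""
    return true_variance / (true_variance + error_variance)


# 80 units of true variance and 20 of error variance give a coefficient of 0.80
print(reliability(80, 20))

# More error variance (a bigger denominator) lowers the coefficient
print(reliability(80, 80))
```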
v. METHODS FOR ESTIMATING RELIABILITY
1. Measures of Temporal Stability
a. Test-retest Reliability
i. Administering the test to the same heterogeneous
group of subjects on two separate occasions
ii. Reliability coefficient is simply the correlation between
the scores obtained by the same person on each test
iii. This is a measure of “Temporal stability” (stability over
time)
iv. Sources of error variance: mood, weather, random day-
to-day changes
v. To be acceptable coefficient not less than 0.80
vi. Time interval will be specified, generally not more than
6 months
vii. Useful in discounting day-to-day fluctuations
viii. PROBLEMS: practice effect and carryover
(remembering lists from first test) will lead to spuriously
low reliability
b. Alternate-forms Reliability
i. Essentially parallel or equivalent forms of the test are
produced – similar item content and difficulty
ii. Administer both forms to the same group (either
immediately or on separate occasions)
iii. Reliability coefficient is correlation between the scores
obtained for each form of the test
iv. ADDITIONAL error variance because of difference in test
items (CONTENT) – it is difficult to establish a parallel
form of test
v. Does not eliminate practice-effect or carryover.
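For both test-retest and alternate-forms reliability the coefficient is a Pearson correlation between two sets of scores from the same people. A minimal sketch, with invented scores and a hand-written `pearson_r` helper so that nothing beyond the standard library is assumed:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented scores for six people tested on two occasions
first_testing = [10, 12, 15, 18, 20, 25]
second_testing = [11, 13, 14, 19, 22, 24]

r = pearson_r(first_testing, second_testing)
print(round(r, 2), "acceptable" if r >= 0.80 else "too low")
```

The 0.80 threshold is the acceptability figure given in the notes for test-retest reliability.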
2. Measures of Internal Homogeneity/consistency of a Test
These next measures determine a test’s reliability indirectly by using a
single test. The assumption is if the test has a high degree of internal
consistency it will also show stability on a test-retest approach.
a. Split-half Reliability
i. Single administration of the test to a single group
ii. AFTER the completion the test is split into two
equivalent halves and the first half then correlated with
the second (each subject has two scores – first half and
second half)
iii. Most usual split is between odd and even number items
iv. (similar idea to alternate-forms done at one sitting)
v. Error-variance identified is CONTENT-SAMPLING; it cannot
reflect temporal stability because there was only one test
administration.
vi. Error-variance needs to be adjusted because we have
only measured reliability for half the test
1. Correct by using Spearman-Brown formula:
r(corrected) = 2r(half test) / (1 + r(half test))
essentially this doubles the length of the test
vii. PROBLEM whichever way the test is split will have a
potential effect on the error variance obtained.
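The odd/even split and the Spearman-Brown correction can be sketched as follows. The item scores are invented (1 = correct, 0 = incorrect) and the function names are my own. It is a rough illustration of the procedure described above, not a full implementation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Corrected reliability for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of item scores per person, in item order."""
    odd_half = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Invented responses of five people to a six-item test
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
]
print(round(split_half_reliability(responses), 2))
```

Splitting the items differently (for example first half versus second half) would generally give a different coefficient, which is the PROBLEM noted above.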
b. Coefficient alpha reliability (Cronbach’s coefficient alpha)
i. This reliability value is the mean of all possible split-half
coefficients – the degree to which all items correlate
with one another
ii. If using Cronbach’s coefficient alpha must be
supplemented by test-retest method
c. Kuder-Richardson estimate of reliability (KR-20)
i. As with the above, the coefficient is the mean from all
different splittings – but this formula only used where
score is dichotomous (score either 0 or 1 (or other
yes/no equivalent rating))
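Coefficient alpha is normally computed from item variances rather than by actually averaging every possible split. The sketch below uses the standard variance formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), which is not spelled out in the notes. With 0/1 items and population variances this gives the same value as KR-20. The data are invented.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """item_scores: one list of item scores per person, in item order."""
    k = len(item_scores[0])  # number of items
    item_variances = [pvariance([person[i] for person in item_scores]) for i in range(k)]
    total_variance = pvariance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented dichotomous responses (1 = correct) of five people to six items
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
]
print(round(cronbach_alpha(responses), 2))
```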
3. Inter-scorer reliability
a. The judgement of the scorer may differ from one test to
another reducing reliability of the test
b. PROCEDURE: random sample of completed tests is given to two
or more scorers. The scores are then correlated and an inter-
scorer reliability coefficient obtained
vi. SPECIAL CIRCUMSTANCES IN ESTIMATION OF RELIABILITY
1. Characteristics are inherently unstable
a. Emotional response for instance fluctuates in response to many
internal and external factors – to get a reliability measure would
be very difficult
2. Speed and Power Tests
a. Invariably give overly high reliability on split-half
i. i.e. if subject does 10 items out of 20 in the time limit
almost all 10 will invariably be correct giving a high
odd/even correlation
b. Better to use test-retest
3. Restriction of Range
a. Nature of the group on which the reliability is measured
b. If a homogeneous group is used the reliability coefficient will be lower
than if a representative heterogeneous group was used (less spread in the
scores restricts the correlation)
4. Criterion-referenced tests (mastery assessment)
a. These tests measure the degree to which a person has
mastered a particular skill
b. We are not interested in test-retest correlation but rather
whether we would find the same classification (pass/fail) on a
second testing occasion
c. Use an alternate-forms method to find percentage of persons
for whom the same decision (pass/fail) is reached on both
testing occasions.
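For mastery tests the quantity of interest is how often the same pass/fail decision is reached on both occasions. A minimal sketch with invented scores on two alternate forms and a hypothetical cut-off of 20:

```python
def decision_consistency(form_a, form_b, cut_off):
    """Percentage of people given the same pass/fail decision on both forms."""
    same = sum((a >= cut_off) == (b >= cut_off) for a, b in zip(form_a, form_b))
    return 100 * same / len(form_a)


# Invented scores of five people on two alternate forms
form_a = [12, 18, 25, 30, 9]
form_b = [14, 21, 24, 28, 11]
print(decision_consistency(form_a, form_b, cut_off=20))
```

Here the second person fails form A but passes form B, so the decision is consistent for four of the five people.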
vii. INTERPRETING THE RELIABILITY COEFFICIENTS
1. Total error variance
a. E.g. reliability coefficient is 0.80
therefore 80% true variance, 20% error variance;
this 20% might be made up of:
10% sampling error, 8% time related, 2% inter-scorer
2. Acceptable level of reliability
a. Standardised tests: 0.85 or higher
b. Group tests may be acceptable as low as 0.65
c. In practice a test even with 0.70 can be useful
viii. STANDARD ERROR OF MEASUREMENT (SEM)
1. Used to interpret reliability for INDIVIDUAL test scores
2. Knowing
a. Standard Deviation (SD) of a test derived from the normative
sample and Reliability coefficient r then:
SEM=SD √(1-r)
b. E.g. SEM = 15√(1-0.97) = 2.6
Note: as reliability decreases SEM increases:
SEM = 15√(1-0.89) = 5.0
ix. Now 2 SEM gives 95% probability (1 SEM 68%) for normal distributions
Example: test has Mean of 50, SD 10, test-retest reliability 0.8
To express an individual’s test score of 60:
SEM = 10√(1-0.8) = 4.5, and 2 SEM = 2 X 4.5 = 9
With 95% certainty subject obtained 60 +/-9, i.e.
the subject’s score range at the 95% confidence interval is between 51 and 69
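The SEM examples above can be checked with a short Python sketch. The function names `sem` and `score_band` are my own. The band uses 2 SEM either side of the obtained score, as in the worked example.

```python
from math import sqrt


def sem(sd, reliability):
    """Standard error of measurement: SD x sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)


def score_band(score, sd, reliability, n_sem=2):
    """Range of n_sem standard errors either side of an obtained score."""
    margin = n_sem * sem(sd, reliability)
    return score - margin, score + margin


print(round(sem(15, 0.97), 1))  # high reliability, small SEM
print(round(sem(15, 0.89), 1))  # lower reliability, larger SEM

low, high = score_band(60, sd=10, reliability=0.8)
print(round(low), round(high))
```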
d. VALIDITY
Validity concerns what the test measures and how well it does so
Reliability is a NECESSARY PRECONDITION for validity
i. CONTENT VALIDITY/CONTENT DESCRIPTION PROCEDURES
1. By choosing appropriate items on the basis of specifications provided by
a panel of experts in the field in question – non statistical type of
validity
2. This type of validity is built into the test from the outset
3. Test manual should specify details about who comprised the panel and
what their qualifications were
4. APPROPRIATE FOR:
a. Educational and occupational achievement tests where domain
of behaviour is well defined
b. Employee selection and classification
c. NOT for aptitude and personality tests
ii. FACE VALIDITY
1. Concept has no technical significance
2. Refers to what a test “appears” to measure
3. Face validity may be improved by wording items so they are relevant to the
context, e.g. for machine operators a mathematical reasoning test could
use terms familiar to a machine operator rather than terms such as “apples
and oranges”
iii. CRITERION-RELATED VALIDITY/CRITERION-PREDICTION PROCEDURES
Refers to a test’s ability to estimate/predict the test-takers performance on
some other independent measure (the criterion)
1. Concurrent validity
a. E.g. where the test measures depression and the criterion measure
is diagnosis by a psychiatrist
b. Test can be used as a “short cut” screening procedure
2. Predictive validity
a. E.g. university marks predicted from a university entrance exam
b. The scores of the test itself are correlated against the criterion
scores; this correlation is the validity coefficient
c. The validity coefficient is limited by the reliabilities of the test and the criterion:
r(validity coefficient) ≤ √[r(reliability of test) X r(reliability of criterion)]
d. If a test is used for PREDICTION it is necessary to have
computed a regression equation – the best-fitting straight line
with which to estimate the criterion from the test score.
EQUATION IS: y = bx + a
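The regression equation y = bx + a can be fitted by ordinary least squares, which is the usual way of finding the best-fitting straight line. The notes do not name a method, so treat that as an assumption. The entrance exam scores and university marks below are invented.

```python
def fit_line(x, y):
    """Least-squares slope b and intercept a for y = bx + a."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return b, a


# Invented entrance exam scores (predictor) and first-year marks (criterion)
entrance = [40, 50, 60, 70, 80]
marks = [45, 55, 60, 70, 75]

b, a = fit_line(entrance, marks)
predicted = b * 65 + a  # predicted mark for a new applicant scoring 65
print(b, a, predicted)
```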
3. The Criterion
a. Academic achievement
b. Attainment of awards
c. Performance in specialized training
d. Actual job performance
e. Contrasted groups
f. Psychiatric diagnosis
g. Ratings
4. The validity coefficient is constrained by the reliability of both the test
and the criterion so it can never be a perfect 1.0; this is why the validity
coefficient is always presented as being “less than or equal to” its
value.
5. Criterion contamination – eg selection board have knowledge of
subjects test results – may influence their decision
iv. THE STANDARD ERROR OF THE ESTIMATE
1. (I did not go through this section)
v. CONSTRUCT VALIDITY/CONSTRUCT IDENTIFICATION PROCEDURES
1. Seven methods for determining construct validity
a. Test homogeneity
i. Making sure the scores for each item correlate with the
total score achieved on the test
b. Appropriate developmental changes (age differentiation)
i. Older children show more progress than younger
c. Theory-consistent group difference
i. Hypothesized against theory
d. Experimental interventions (Theory-consistent intervention
effects)
i. Pre-test and post-test
e. Convergent and discriminant validation
i. Correlates with other variables or tests which measure
the same construct
f. Factor analysis
i. Statistical technique
g. Correlations with other tests
i. Correlation between new test and earlier tests
vi. INTERPRETING THE VALIDITY COEFFICIENT
1. The validity coefficient is actually a Pearson’s correlation coefficient – so
the relationship between the two variables (the criterion and the
predictor variable) must be linear
2. FACTORS AFFECTING THE VALIDITY COEFFICIENT:
a. The nature of the group – in relation to the norm
b. Heterogeneity of the sample
c. Reliability of the test
d. Criterion-contamination
e. Pre-selection
e. TEST CONSTRUCTION/DEVELOPMENT
i. STEPS IN THE DEVELOPMENT OF A TEST MEASURE
1. Planning Phase
a. Purpose/aim
b. Content
i. The construct to be measured must be operationally
defined
1. Through Rational method –
a. literature study of main theoretical
viewpoints
b. Breaking up construct into several
dimensions and operationalising each
dimension in concrete measurable
terms
2. Criterion-Keying method
a. Items discriminate between different
groups (high/low risk)
c. Format and number of each type
i. Homogenous or heterogeneous – same or varied
content
ii. Use table of specifications
iii. Decide on range of item difficulty
1. Difficulty shows differences between subjects
2. Ceiling effect – when many subjects obtain a
perfect score and cannot be separated
3. Floor effect – when many subjects obtain poor
scores and cannot distinguish the lower end
d. Writing the items
i. Must be non-ambiguous
ii. Avoid double negatives
iii. Not more than one theme to an item
iv. Content must be appropriate for purpose
e. Reviewing the items
i. Try on a small sample of people and review
2. Assembling the Measure and Pre-Testing it
a. Arranging the items
b. Finalizing the length
c. Answer protocols
i. Test booklet or separate sheets
d. Develop the administration instruction
e. Pre-testing
i. Sample of 400-500 representative people
3. Item Analysis
a. Determining item difficulty
i. Difficulty index – for each item % of people getting it
correct
ii. Revise or ditch items at either end of difficulty scale
iii. Index should range 0.3 – 0.7 (30-70% getting it correct)
b. Determining discriminating power of an item
c. Preliminary investigation into item bias
d. Select items for final version
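The difficulty index is simply the proportion of people who answered an item correctly. A small sketch with invented pre-test responses follows. The 0.3 to 0.7 range comes from the notes, while the function names and verdict labels are my own.

```python
def difficulty_index(item_responses):
    """Proportion of people answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(item_responses) / len(item_responses)


def review_items(responses_by_item, low=0.3, high=0.7):
    """Label each item as too hard, too easy or keep, based on its difficulty index."""
    verdicts = {}
    for name, responses in responses_by_item.items():
        p = difficulty_index(responses)
        if p < low:
            verdicts[name] = (p, "too hard")
        elif p > high:
            verdicts[name] = (p, "too easy")
        else:
            verdicts[name] = (p, "keep")
    return verdicts


# Invented responses of ten people to three items
pretest = {
    "item1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "item2": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "item3": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
}
print(review_items(pretest))
```

Note that a high index means an easy item, since most people got it right.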
4. Administering the Final Version to the Standardisation Sample
a. Refine instructions for administration and scoring procedures
b. Administer the final version
5. Technical Evaluation and Establishing Norms/Standardising the Test
a. Compute reliability and validity coefficients
i. Choose type of coefficient best suited for the test
ii. (refer earlier section on reliability and validity)
b. If norm-referenced
i. Establish norms from results of standardisation sample
(refer earlier section for types of norm scales)
c. Criterion-related validity (cross-validation)
i. Before publishing the test manual more independent
evidence of criterion-related validity is obtained
ii. Use regression equation from standardisation sample
and apply it to another new group to determine
whether the predictive validity holds true
iii. Predictive validity tends to be less accurate in the new
group due to “validity shrinkage”
6. Compile the Manual
a. Purpose of the test
b. Practical information – length, abilities required to conduct
c. Administration
d. Scoring
e. Show phases of test development
f. Validity and reliability
g. Item bias
h. Norms
i. Cut-off scores if appropriate
7. Submit the Test for Classification
a. Professional Board for Psychometrics Committee
b. Classification will restrict its use to registered psychologists
8. Publishing and Marketing the Test/Measure
a. Marketing must not make any false claims
b. If test is classified do not include any actual examples or they
may find their way into the popular media and invalidate your
test.
c. Restrict who the test is marketed to
9. Ongoing Revision and refinement of the Test
a. Responsibility of developer to ensure tests are developed using
rigorous methodology and with information about reliability
and validity
b. Responsibility of the PSYCHOLOGIST to ensure that the
information given about the test is carefully evaluated before
using it and that it is appropriate (including item bias issues) and
current for his use
3. (03) CROSS-CULTURAL APPLICATION OF PSYCHOLOGICAL ASSESSMENT MEASURES
a. FOCUS
i. Issues of test bias and test fairness
ii. Fair use of assessment measures
iii. Factors responsible for differences in test performance in a multicultural
context
iv. Use recommended reading and assignment feedback
b. DEVELOPMENT OF MODERN PSYCHOLOGICAL MEASURES: SOUTH AFRICAN
PERSPECTIVE
i. Context is characterised by unequal distribution of resources along racial lines
ii. Early measures standardised for whites only
iii. Results of intelligence tests used as evidence for difference between races and
maintaining the idea of white superiority
iv. “differences in original ability” (Fick, 1929)
v. No account taken of cultural, economic and education factors on test
performance
vi. No investigation of test-bias
vii. Nationalist Government took power in 1948
1. Measures developed along cultural and racial lines
2. Increasingly similar but separate measures
3. Various groups did not compete with each other on the job market
4. Most measures were for whites
viii. As discriminatory laws were repealed in the 1980s and 1990s
1. Tests developed for use by more than one racial group or norms for
other groups were compiled on pre-existing tests
2. Tests developed and normed for whites were just used on other groups
as well with no investigation into suitability.
3. First study of bias – Owen 1986 – recommended training in tasks being
measured for environmentally disadvantaged subjects
ix. Since 1994 additional criticism from ANC government as result of trying to
redress inequalities in WORK and EDUCATION contexts
1. Labour Relations Act against discriminatory practices now forbids
industrial testing unless tests can be shown to be:
a. Valid and reliable
b. Applicable fairly to ALL employees without bias
2. PROBLEM is that tests have not been scientifically investigated for bias
and have not been cross-culturally validated in SA
x. Assessment continues to be influenced by the political and legislative context
1. Developments come as result of criticism
2. Continues to play a useful role in decision-making – as long as applied in
an ethical and fair way
3. NOTE: Assessment makes use of a variety of information and test results
are only one source.
c. FACTORS RESPONSIBLE FOR DIFFERENCES IN TEST PERFORMANCE IN A MULTI-
CULTURAL CONTEXT
i. Importance of the Social context
On the importance of taking into account other information about the subject
when interpreting the results of testing.
1. Schooling
a. Most measures indirectly measure what you have learned
through formal education.
b. Education or quality of education thus becomes a source of
potential bias in interpreting results
2. Language
a. Takes longer to process information in a second language
b. Subtle aspects can be missed
c. Subject may be educated in different language to his home
language
3. Culture
a. Influences the way we learn, think and behave
b. Influences the meaning given to tests
c. Content of test mirrors the culture of the people who designed
it
d. Different levels of acculturation into dominant culture.
4. Environment
a. ‘Distal factors’ – broader environment
i. SES
1. Facilities available to child (schools, libraries,
games, clinics)
2. Poverty (disabilities, child abuse, poor health,
lack of stimulation)
ii. Urbanisation
1. Exposure to richer and more stimulating
environment
b. ‘Proximal factors’ – immediate environment
i. Home environment
1. Parental responsiveness
2. Stimulating environment
3. Household structure - crowding
5. Test-wiseness
a. Prior experience with testing situations
b. Difficulty understanding the examiner’s instructions
c. Unease over formality and not being able to clarify with
assessor
d. Not used to working fast and briefly
e. Not used to intense concentration
ii. TEST BIAS
(Tests discriminate against persons from non-Western backgrounds, lower SES.
Test bias controversy originated in the finding that African Americans score on
average a SD below white Americans on standardised IQ tests. It is suggested
this difference is due to test bias rather than a meaningful difference)
Test bias refers to an objective statistical criterion – a test will be called biased if
it is differentially valid for different subgroups.
1. Bias in content validity (content bias)[SCORE COMPARABILITY]
a. If relatively more difficult for members of one group than
another due to:
i. Language
ii. Culture
iii. Wording
2. Bias in predictive or criterion-related validity [PREDICTIVE
COMPARABILITY]
a. Regression line is used to predict future score on a criterion
(first year University marks on basis of scholastic aptitude test)
b. Bias is indicated when the scores of the two groups do not
cluster around the same regression line but cluster around two
separate regression lines
3. Bias in construct validity [CONSTRUCT COMPARABILITY]
a. Where a test measures different constructs for different
subgroups or the same construct but with differing levels of
accuracy
b. Tested for by checking rank order of difficulties within the test
for each subgroup – rank orders should be the same or highly
similar (i.e. if one group found a particular item hardest the
other group should as well)
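The rank-order check for construct bias can be sketched by computing item difficulties per subgroup and correlating their ranks. Using Spearman's rank correlation for this is my assumption, since the notes only say the rank orders should be the same or highly similar. The difficulty values are invented.

```python
from math import sqrt


def ranks(values):
    """Rank from 1 (smallest value); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Invented difficulty indices (proportion correct) for four items in two subgroups
group_a = [0.9, 0.7, 0.5, 0.3]
same_order = [0.8, 0.6, 0.45, 0.2]   # lower overall, but same rank order
mixed_order = [0.8, 0.2, 0.45, 0.6]  # items 2 and 4 swap positions

print(spearman_rho(group_a, same_order), spearman_rho(group_a, mixed_order))
```

A coefficient near 1 means both groups find the same items hard and easy even if one group scores lower overall. A low coefficient suggests the items may not be working the same way in both groups.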
iii. TEST FAIRNESS
Reflects the judge’s subjective philosophy and values – not a statistical concept.
A test may be unbiased but still applied in an unfair manner.
1. Fair Selection Models
a. The regression model
i. If the regression equation for two groups is different – use
separate equations for each group – the test is
considered fair if the ratio of people selected
to people selected and
successful is the same for both groups.
b. The quota model
i. Fairness = proportional representation
ii. If test subjects are 50% male 50% female then subjects
selected should have same proportion
c. The equal risk model
i. Cutoff point establishes selection
d. The constant ratio model
i. Selection is made according to the same % success per
group i.e. 50% of A successful & 80% B successful then
selection is 50/130 of A and 80/130 of B
ii. In this way even those from the group that tend to do
worse will get selected
e. The conditional probability model
i. A member of either group who obtains a satisfactory
score has an equal chance of being selected regardless of
group membership i.e. 50% from A and 50% from B
f. PROBLEM with these models is there is no rationale for
choosing between them – it is an issue of philosophical position
or values
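The arithmetic in the constant ratio example (50/130 of A and 80/130 of B) can be written out as a tiny Python sketch. This only reproduces the proportions as I have summarised them in these notes. The function name is hypothetical and the model itself should be checked against the textbook.

```python
def constant_ratio_shares(success_rates):
    """Share of selections per group, proportional to each group's success rate."""
    total = sum(success_rates.values())
    return {group: rate / total for group, rate in success_rates.items()}


# 50% of group A and 80% of group B were successful
shares = constant_ratio_shares({"A": 50, "B": 80})
print({group: round(share, 2) for group, share in shares.items()})
```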
2. Value positions underlying the models
a. Unqualified individualism
i. Always choose the BEST candidate - highest score of
predicted performance even if this is related to his
group membership. Merit based approach
b. Quotas
i. Select according to local demographic - if 80% of locals are
group A then select 80% from top A scorers then 20%
from top B scorers.
ii. Those chosen are not necessarily the best (or even
anywhere near the best)
c. Qualified individualism
i. Use a common regression equation (even though two
separate equations are indicated) then as with
unqualified individualism select the top predicted
performers
ii. This method tends to over-predict for the lower
performing group resulting in more of this group being
selected.
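To compare two of the value positions above, here is a small Python sketch that selects from the same pool under unqualified individualism and under quotas. The candidates, their predicted scores and the 80/20 split are all hypothetical illustration values.

```python
def select_top(candidates, places):
    """Unqualified individualism: take the highest predicted scores overall."""
    ranked = sorted(candidates, key=lambda c: c["predicted"], reverse=True)
    return ranked[:places]


def select_by_quota(candidates, places, group_shares):
    """Quotas: fill each group's share of places from that group's top scorers."""
    chosen = []
    for group, share in group_shares.items():
        members = [c for c in candidates if c["group"] == group]
        members.sort(key=lambda c: c["predicted"], reverse=True)
        chosen.extend(members[:round(places * share)])
    return chosen


# Hypothetical predicted performance scores per group.
scores = {"A": [90, 85, 80, 75, 70, 65], "B": [60, 55]}
candidates = [
    {"name": f"{group}{i + 1}", "group": group, "predicted": score}
    for group, values in scores.items()
    for i, score in enumerate(values)
]

merit = select_top(candidates, places=5)
quota = select_by_quota(candidates, places=5, group_shares={"A": 0.8, "B": 0.2})
```

With these figures the merit list is A1 to A5, while the quota list replaces A5 (predicted 70) with B1 (predicted 60), which is the sense in which those chosen under quotas are not necessarily the best scorers.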
d. CROSS-CULTURAL TEST ADAPTATION AND TRANSLATION
i. Introduction
1. Test translation = converting from one language to another
Test adaptation = making a measure more applicable to other contexts
Terms are however used interchangeably.
2. Cheaper and easier to adapt an existing test than to formulate a new
one.
ii. Considerations when adapting measures
1. Administration of the test
a. Communication problems between subject and tester
b. Tester must be familiar with language and culture of the subject
c. Have administration skills and experience with psychometric
testing
2. Item format
a. Multiple choice/true-false/essay – subject may not be familiar with
the format
b. Use balance of different items and include practice items
3. Time limits
a. Use tasks that have no speed requirement when assessing non-
Westerners
iii. Designs for adapting tests
1. Equivalence – Equivalent persons (same ability levels) but from different
cultural groups should get same or similar scores on their equivalent
versions of the test
2. Judgmental designs for testing equivalence
a. Forward-translation designs
i. Source version is translated into target language
ii. Target language sample group complete the translated
test
iii. Experts then question sample group about their
responses to see if test-items have been adequately
understood in the same way the source intended.
iv. ADVANTAGE: test-takers are providing feedback
v. DISADVANTAGE: subjective process
vi. A COMMON ADAPTATION is to use bilingual experts to
compare the two versions
1. Bilinguals may not think in the same way as
monolinguals
2. Experts may judge items to be similar based on their
own prior knowledge, but the items may not be similar for
a subject with less experience.
b. Back-translation designs
i. Original version is translated to target version
ii. Second set of translators translate target version back
to original language
iii. Original language monolingual experts then compare
the original and back-translated versions
iv. Process can be repeated several times
v. ADVANTAGE: Only concepts that have same meaning in
both cultures will survive
vi. DISADVANTAGE: Evaluation of equivalence is only
conducted in the original language, and errors made in
the first translation can be corrected during back
translation so that the error in the translated version
goes undetected.
c. After either of above methods a bilingual review committee
should evaluate the final product
3. Statistical designs for testing equivalence
a. Based on bilingual test-takers
i. Given both versions and the scores are compared
ii. Can divide group and each half given one version of the
test
b. Based on monolingual subjects (source and target)
i. Matched groups of monolinguals in each language
c. Based on monolingual subjects (source only)
i. Monolinguals in source language take both the original
and back translated versions
ii. Does not provide any data about the actual translated
version
iv. Bias analysis and differential item functioning
1. Differential item functioning (DIF)
a. Two matched groups from different cultural/language groups
take the test
b. An item shows DIF if individuals having same ability but from
different groups do not have same probability of getting the
item correct
c. Remove items with DIF to increase test’s reliability and validity
and results will be more comparable across different groups
2. Statistical methods for detecting DIF
a. Mantel-Haenszel Procedure
b. Item Response Theory
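The notes only name the Mantel-Haenszel procedure, so the Python sketch below is an illustration under assumptions: it uses the standard common odds ratio across matched ability levels and the ETS delta transformation (-2.35 times the natural log of the odds ratio), neither of which is spelled out in the source. All counts are hypothetical.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one item across ability strata.

    Each stratum is (ref_correct, ref_wrong, focal_correct, focal_wrong)
    for test-takers matched on total score.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty ability levels
        num += a * d / n
        den += b * c / n
    return num / den


def mh_d_dif(alpha):
    """Delta-scale index: values near 0 suggest little or no DIF."""
    return -2.35 * math.log(alpha)


# Hypothetical item where matched groups have the same odds of success.
fair_item = [(20, 20, 10, 10), (30, 10, 15, 5)]
# Hypothetical item that is easier for the reference group at every ability level.
dif_item = [(30, 10, 10, 10), (20, 20, 5, 15)]

alpha_fair = mantel_haenszel_odds_ratio(fair_item)
alpha_dif = mantel_haenszel_odds_ratio(dif_item)
```

An odds ratio of about 1 (as for `fair_item`) means matched test-takers have the same chance of getting the item right; `dif_item` gives an odds ratio of about 3, flagging it as a candidate for removal or revision.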
e. FAIR AND ETHICAL ASSESSMENT PRACTICES
i. What Is Fair And Ethical Assessment?
ii. Ethical Assessment Procedures
1. Power relationship
a. Imbalance of power
b. Result can have enormous impact
2. Responsibilities of the tester
a. Inform subject of their rights, the purpose of the test, confidentiality,
the option of refusal and what the consequences will be; obtain
consent.
b. Treat subject politely, respectfully and impartially
c. Administer measure properly
d. Score correctly and impartially
3. Rights of the test-taker – as above
4. Questions to ask before using a test
a. Will it serve the purpose?
b. Possible side effects?
c. Are there alternative options?
5. Responsibilities of the test-takers
a. Follow instructions carefully
b. Treat tester with courtesy and respect
c. Present their test performance honestly
iii. Standard ethical issues
1. Best interests of the client
2. Confidentiality – duty to advise
3. Expertise of the test user
4. Informed consent
5. Standard of test
a. Appropriate and not obsolete
6. Responsible report writing
a. Within limitations of one’s expertise
b. Indicate test result is only one aspect
7. Communicating the test results
8. Consider individual differences
a. To eliminate bias due to age, race, gender or disability
9. Consider cultural and linguistic differences and test-wiseness
4. (04) DEVELOPMENTAL ASSESSMENT OF YOUNG CHILDREN
a. FOCUS
i. Work through activities in Section C
ii. View each activity as a potential question
iii. Do NOT need detailed knowledge of the contents of measures described in
study material
b. RATIONALE FOR TESTING YOUNG CHILDREN
i. Why assess developmental changes?
1. To identify difficulties including motor and speech difficulties as EARLY
as possible – the sooner identified the sooner intervention can be
implemented.
2. Following functions are assessed:
(These factors are not mutually exclusive as socially and emotionally
deprived kids tend to show cognitive delays)
a. Physical
b. Cognitive
c. Social
d. Emotional
3. Assessment even if not apparently necessary may be the first step in
ensuring the optimal development of the child’s potential.
4. Since the 1970s it has been internationally recognised that the identification of
children with difficulties should take place as early as possible
ii. Types of developmental measures
1. SCREENING
a. Brief formal evaluation of developmental skills
b. Administered by non-specialists (parents, educators, nurses
who have been trained to use the tests)
c. Cost effective
d. Categorise performance rather than provide numerical score.
e. Often administered to large groups simultaneously
2. DIAGNOSTIC
a. Comprehensive diagnostic measures provide numerical scores
or age equivalents for overall performance and for each area
assessed.
b. Performed by trained professionals.
c. Typically used after it has been established that a child is at-risk.
d. Used to identify existence, nature and severity of the problem
c. DIFFERENCE BETWEEN TESTS FOR INFANTS AND TESTS FOR PRESCHOOLERS
i. Age-group specific testing
1. Best example of chronological age affecting test performance is infant
and pre-school tests – content of test differs according to age-range.
2. Birth – 2 ½ yrs (INFANT) focus on sensory and motor development
(hearing, producing sounds, manipulating objects, muscular and posture
control) – development at this age is largely sensory-motor
3. 2 ½ – 6 yrs (PRE-SCHOOL) focus is on verbal and conceptual abilities –
development is verbal and symbolic.
4. Fits with Piaget’s theory of cognitive development
5. Ratio between mental age and chronological age is fairly constant up to
16yrs at which point mental age tends to level off.
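The ratio between mental age and chronological age mentioned above is usually expressed as the classic ratio IQ (mental age divided by chronological age, times 100). The notes do not give this formula, so the Python sketch below is an assumption added for illustration, with hypothetical ages in months.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Classic ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_months / chronological_age_months


# A 4-year-old (48 months) performing like a typical 5-year-old (60 months).
advanced = ratio_iq(60, 48)

# A 4-year-old performing at the level expected for their age.
on_track = ratio_iq(48, 48)

# Mental age held at 16 yrs (192 months) for a 32-year-old (384 months).
adult = ratio_iq(192, 384)
```

Under this simple formula `advanced` is 125.0 and `on_track` is 100.0, while `adult` drops to 50.0, which shows why a ratio score stops being meaningful once mental age levels off.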
ii. EXAMPLES OF SCREENING TESTS
1. Denver Developmental Screening Test (Denver II) [INFANT?]
a. 1mth-6yrs; based on parent reports, observation and
examination
b. 3 Categories: Abnormal-Questionable-Normal
c. Variety of domains:
i. Gross & fine motor
ii. Language
iii. Personal-social
d. Popular worldwide – no SA norms
e. High reliability and validity
f. Performs well in correctly identifying at-risk kids but does yield
false positives (shows normal as at-risk)
2. Vineland Adaptive Behavioural Scales (Vineland II)
a. Personal and social competence; birth-adult
b. Does not require direct administration – information is given by someone
familiar with the subject’s abilities and behaviour
c. Attractive for use with individuals with special needs (hearing
impaired)
3. Draw-a-Person Test
a. 5-16yrs
b. Intellectual ability can be estimated from a human figure
drawing
iii. EDUCATIONALLY FOCUSSED SCREENING MEASURES
1. SCHOOL-READINESS EVALUATION BY TRAINED TESTERS (SETT)
a. Administration provides context similar to a classroom
b. Evaluates development of:
i. Language and general intellectual
ii. Physical and motor
iii. Emotional-social
2. SCHOOL-ENTRY GROUP SCREENING MEASURE (SGSM)
a. Non verbal cognitive screening test designed for group
situations for ages 5-9yrs
b. Fairly accurate predictor of at-risk kids for later scholastic
difficulties
iv. EXAMPLES OF DIAGNOSTIC TESTS ADAPTED FOR USE IN SA
1. Griffiths Scales of Mental Development
a. Birth – 8yrs
b. 6 Scales – age equivalent given for each scale. Combining scores
gives General Quotient (global score)
i. Locomotor
ii. Personal-social
iii. Hearing and speech
iv. Eye and hand coordination
v. Performance
vi. Practical reasoning – good indication of child’s ability to
benefit from formal schooling.
c. Differences in scores for each scale show up child’s strong and
weak points.
2. Bayley Scales (Bayley II) [INFANTS]
a. 1mth – 3 ½ yrs; widely used and standardised
b. 3 Scales:
i. Mental – sensory discrimination
ii. Motor – fine-motor and balance
iii. Behaviour – emotional tone
c. Does not give information on adaptive ability
3. McCarthy Scales of Children’s Abilities [PRESCHOOL]
a. 3 ½ - 8 ½ yrs
b. Uses tasks that look like games and use toy like material which
kids tend to enjoy
c. Useful for kids from different cultural and SES groups
d. Adapted for use in SA
e. 18 Tests grouped into 5 scales. Overall measure derived by
combining scores on first 3 scales.
i. Verbal
ii. Performance
iii. Quantitative
iv. Motor
v. Memory
4. Junior South African Individual Scales JSAIS [PRESCHOOL]
a. Standardised for 3 yrs – 7 yrs 11 mths, English & Afrikaans, with
separate norms for Coloured and Indian
b. Aims to measure general factor ‘g’
c. Range of mental abilities associated with effective functioning
at school
d. 22 Subtests, 12 of which used to compute Global IQ score
e. Verbal IQ and performance IQ can be computed as well as
Memory and Numerical scale scores
f. Some tests are speeded, some have time limits, others have no time
limit.
g. Reliability coefficients are high
h. Scores
i. 90-109 = average
ii. 80-89 = low
iii. 70-79 = borderline
iv. <70 = cognitively handicapped
i. NOTE:
i. Strengths and weaknesses may not remain stable over
time
ii. Scores may result from temporary difficulties
iii. SES deprivation has strong effect
iv. Do NOT pigeonhole children with specific scores
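The score bands listed above can be written as a small lookup, sketched here in Python. The function name is hypothetical, and scores of 110 and above are reported as unlisted because these notes give no band for them. The cautions in the NOTE still apply: a band is a description, not a fixed label for the child.

```python
def classify_score(score):
    """Map a score to the descriptive bands listed in these notes."""
    if score < 70:
        return "cognitively handicapped"
    if score < 80:
        return "borderline"
    if score < 90:
        return "low"
    if score < 110:
        return "average"
    # The notes do not list any band above 109.
    return "above the bands listed in these notes"


# Hypothetical scores, one per band.
examples = {score: classify_score(score) for score in (65, 72, 85, 95, 115)}
```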
v. SHORTCOMINGS OF SA DEVELOPMENTAL ASSESSMENT
1. Most tests standardised only for some cultural, SES and language groups
with others (particularly black preschoolers) excluded
2. Tests only standardised for some age-groups
3. Because of above SA research is fragmented
4. Socio-emotional functioning is not always included in the tests which
limits their ability to give a holistic picture.