
NORMS AND TEST STANDARDIZATION

Introduction
 Norms - A norm is a rule, standard, or pattern for action.
 An informal guideline about what is considered normal (correct or incorrect)
social behavior in a particular group or social unit.
 Norm Group – a sample of examinees who are representative of the population for
whom the test is intended
 A standardized test is one that is administered, scored, and
interpreted in identical fashion for all examinees.
 Standardized tests allow educators to gain a sense of the
average level of performance for a well-defined group of
students.
 Classroom teachers have no control over these types of tests,
but must understand their nature and interpretation.
 Achievement tests measure academic skills; aptitude tests
measure potential or future achievement.
 Two types of standardized tests are
 norm-referenced (no predetermined passing score;
performance is based on comparisons to others) and
 criterion-referenced (performance is compared to pre-
established criteria).
Methods of Reporting Scores on Standardized
Tests
 Criterion-Referenced Tests
• Permit teachers to draw inferences about what students can do
relative to a large content domain.
• Answer the following questions:
 What does this student know?
 What can this student do?
 What content and skills has the student mastered?
• Report raw scores, usually in the form of number or percentage of
items answered correctly.
• Other, less common results include speed of performance, quality of
performance, and precision of performance.
 Norm-Referenced Tests
• Permit comparisons to well-defined norm group (intended to represent
current level of achievement for a specific group of students at a specific
grade level).
• Answer the following questions:

 What is the relative standing of this student across this broad domain
of content?
 How does the student compare to other similar students?
• Scores are often transformed to a common distribution—normal
distribution or bell-shaped curve.
 Norm-Referenced Tests
• Normal distribution
 Three main characteristics:
 Distribution is symmetrical.
 Mean, median, and mode are the same score and are located
at center of distribution.
 Percentage of cases in each standard deviation is known
precisely.
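The known percentages can be checked numerically. The following is a minimal sketch, assuming a perfectly normal distribution; the helper names `normal_cdf` and `share_within` are illustrative and do not come from the slides.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def share_within(k: float) -> float:
    """Proportion of cases lying within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

for k in (1, 2, 3):
    # Roughly 68%, 95%, and 99.7% of cases for k = 1, 2, 3
    print(f"within ±{k} SD: {share_within(k):.4f}")
```

Because the curve is symmetrical, half of each of these proportions lies on either side of the mean.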
 Norm-Referenced Tests
• Raw score
 Basic level of information provided by a standardized test
 Number of items answered correctly.
 Number of questions answered in the keyed direction – personality
testing
 Number of problems answered correctly – with bonus points added for
quick performance
 Raw scores become meaningful only in relation to norms
 Not very useful for norm-referenced tests.
 Score must be transformed in order to be useful for comparisons.
 Norm-Referenced Tests – Essential Statistical Concepts
 Any quantitative data need to be summarized, condensed, and organized into a
meaningful pattern
 Frequency distribution – a simple and useful way of summarizing data
 Prepared by specifying a small number of usually equal-sized class intervals and
tallying how many scores fall within each interval
 The sum of the frequencies for all intervals equals N, the total number of
scores
 No strict rule for determining interval size
 Common practice – 5 to 15 class intervals
 Histogram – a graphic representation of the same information as the frequency distribution
 Frequency polygon – the frequency of each class interval is represented by a single point
rather than a column
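The tallying procedure described above can be sketched in a few lines. This is only an illustration: the scores are made up, and the function name `frequency_distribution`, the interval width of 10, and the starting value of 50 are assumptions chosen for the example. It assumes whole-number scores at or above the starting value.

```python
from collections import Counter

def frequency_distribution(scores, width, start):
    """Tally integer scores into equal-width class intervals.

    Each interval covers the scores low .. low + width - 1.
    """
    counts = Counter((s - start) // width for s in scores)
    table = []
    for i in range(max(counts) + 1):
        low = start + i * width
        table.append((low, low + width - 1, counts.get(i, 0)))
    return table

# Hypothetical raw scores for 15 examinees
scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 78, 79, 82, 85, 91]
table = frequency_distribution(scores, width=10, start=50)

# A crude text histogram: one '#' per score in the interval
for low, high, freq in table:
    print(f"{low}-{high}: {'#' * freq} ({freq})")

# The frequencies sum to N, the total number of scores
print("N =", sum(freq for _, _, freq in table))
```

A width of 10 gives five class intervals here, which sits at the low end of the common 5 to 15 range.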
 Norm-Referenced Tests – Essential Statistical Concepts
 Measures of Central Tendency
 A useful way of summarizing a set of scores
 These measures indicate where most values in a distribution fall; they are
also referred to as measures of the central location of a distribution.
 Central tendency is the tendency of data to cluster around a middle value.
 In statistics, the three most common measures of central tendency are
the mean, median, and mode.
• Standard Deviation – Measure of Variability
• A measure of the amount of variation or dispersion of a set of
values relative to their mean.
• A low standard deviation indicates that the values tend to be close
to the mean (also called the expected value) of the set, while a high
standard deviation indicates that the values are spread out over a
wider range.
• Calculated as the square root of the variance, where the variance is
the average squared deviation of each data point from the mean.
• The further the data points lie from the mean, the more spread out
the data and the higher the standard deviation.
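As a small worked example, the measures above can be computed with Python's standard `statistics` module. The scores are hypothetical, and the population formulas (`pvariance`, `pstdev`) are used on the assumption that the whole group of interest is in hand; a sample from a larger group would normally use `variance` and `stdev` instead.

```python
import statistics

# Hypothetical raw scores for nine examinees
scores = [4, 6, 6, 7, 8, 9, 9, 9, 14]

mean = statistics.mean(scores)          # arithmetic average
median = statistics.median(scores)      # middle score when sorted
mode = statistics.mode(scores)          # most frequent score
variance = statistics.pvariance(scores) # average squared deviation from the mean
sd = statistics.pstdev(scores)          # square root of the variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.3f}, SD={sd:.3f}")
```

In this data set the mean and median are both 8 while the mode is 9, so the three measures need not coincide unless the distribution is symmetrical.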
 Norm-Referenced Tests
• Grade-equivalent score: The grade in the norm group for which a
certain raw score was the median performance.
 Consists of two numerical components: The first number indicates
grade level and the second indicates the month during that school
year (ranges from 0 to 9); for example, grade-equivalent score of
4.2.
 Often misinterpreted as a standard to be achieved.
 Although the second number represents months, grade-equivalent scores
do not represent equal units.
 Norm-Referenced Tests – Raw Score Transformations
 Essential for making sense of test results
 Converting raw scores into interpretable and useful forms of
information
• Percentile rank: a single number that expresses the percentage of persons in
the norm group (standardization sample) who scored below a given raw score.
• Indicates only how an examinee compares to the standardization sample; it
does not convey the percentage of questions answered correctly.
• A relative measure
 Ranges from 1 to 99; much more compact in the middle of the distribution
(does not represent equal units).
 A percentile of 50 (P50) corresponds to the median, or middlemost, raw score
 Often misinterpreted as the percentage of items answered correctly (an absolute score).
 Percentile ranks may distort the underlying measurement scale,
especially at the extremes
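A minimal sketch of the "percentage scoring below" definition follows. The norm-group scores are invented, and `percentile_rank` is an illustrative name. Test publishers apply their own conventions for tied scores and for keeping reported ranks within 1 to 99; this sketch uses the plain definition only.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring strictly below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100.0 * below / len(norm_scores)

# Hypothetical norm group of ten examinees on a 40-item test
norm_scores = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]

pr = percentile_rank(20, norm_scores)
print(pr)  # 50.0: five of the ten norm-group scores fall below 20
```

Here a raw score of 20 has a percentile rank of 50. That the examinee also answered 50% of the 40 items correctly is a coincidence of the example; the two figures measure different things.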
 Standard score: a score that results from a transformation to fit the normal distribution.
• Expresses an examinee’s raw score in terms of its distance from the mean in
standard deviation units
• Uses the standard deviation of the total distribution of raw scores as the
fundamental unit of measurement
• Conveys not only the magnitude of the deviation from the mean but also the
direction of departure (+/-)
 Standard score (continued):
 Expresses the distance from the mean in standard deviation units
 Overcomes the percentile rank's limitation of unequal units.
 Allows performance to be compared across two different measures.
 Is reported on various scales, each expressing how many standard
deviations the score lies from the mean.
z-score
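 One of the easiest and most commonly used standard scores
 z = (X - M)/SD, where X is the individual raw score, M is the group mean,
and SD is the group standard deviation
 Subtract the group mean from the individual raw score
 Divide the difference by the standard deviation of the group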
 More than 99% of scores fall in the range of –3.00 to +3.00.
 Sign indicates whether above or below mean; number indicates
how many standard deviations away from mean.
 Half the students will be above; half will be below.
 Problems with interpreting negative scores.
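The two steps of the formula can be sketched as follows. The test means, standard deviations, and raw scores are hypothetical values chosen for illustration, and `z_score` is an illustrative name.

```python
def z_score(x, mean, sd):
    """Distance of raw score x from the group mean, in standard deviation units."""
    return (x - mean) / sd

# Hypothetical results for one student on two different tests
math_z = z_score(65, mean=50, sd=10)     # 1.5 SD above the mean
reading_z = z_score(80, mean=70, sd=20)  # 0.5 SD above the mean
below_z = z_score(35, mean=50, sd=10)    # negative sign: below the mean

print(math_z, reading_z, below_z)  # 1.5 0.5 -1.5
```

Although the raw reading score (80) is higher than the raw math score (65), the z-scores show that the student stands relatively higher in math, which is the kind of cross-test comparison that standard scores allow.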
 T-score
 A standardized score with a mean of 50 and an SD of 10
 T = 10(X - M)/SD + 50
 T = 10z + 50
 Especially common with personality tests
 Provides the location of a score in a distribution with a mean of 50 and a
standard deviation of 10
 For any distribution of raw scores, the corresponding T-scores will have an
average of 50
 Most T-scores (over 99%) fall between 20 and 80, that is, within three
standard deviations of the mean
 Very high T-scores can be observed in clinical settings
 Can be misinterpreted as percentages.
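The conversion can be illustrated with a short sketch. The raw scores are invented, `t_score` is an illustrative name, and the population standard deviation is assumed for the group.

```python
import statistics

def t_score(x, mean, sd):
    """T = 10 * (X - M) / SD + 50."""
    return 10 * (x - mean) / sd + 50

# Hypothetical raw scores for a small group
raw = [40, 45, 50, 55, 60]
m = statistics.mean(raw)
sd = statistics.pstdev(raw)

t_scores = [t_score(x, m, sd) for x in raw]
print([round(t, 1) for t in t_scores])

# Whatever the raw-score distribution, the T-scores average 50 with an SD of 10
print(round(statistics.mean(t_scores), 6), round(statistics.pstdev(t_scores), 6))
```

A score below the mean, which would be a negative z-score, comes out as a positive T-score below 50. A T-score of 65 means 1.5 standard deviations above the mean, not 65% correct.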
• Standardized scores (continued)
• Stanine (standard nine) score
• All raw scores are converted to a single-digit system of scores ranging
from 1 to 9, with a mean of 5 and an SD of 2
• Scores are ranked from lowest to highest
• Provides the location of a raw score in a specific segment or band of the
normal distribution.
• The bottom 4% of scores convert to a stanine of 1, the next 7% to a stanine of 2, and so on
• Represents coarse groupings; does not provide very specific
information.
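One way to sketch the banding is to map a percentile rank onto the nine stanine bands. The slides give only the first two band widths (4% and 7%); the remaining widths used here (12, 17, 20, 17, 12, 7, 4) are the conventional stanine percentages. The function name and the treatment of scores falling exactly on a boundary are assumptions made for this example.

```python
from bisect import bisect_right

# Cumulative percentage at the upper edge of stanines 1 through 8,
# from band widths 4, 7, 12, 17, 20, 17, 12, 7 (stanine 9 takes the top 4%)
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine_from_percentile(pr):
    """Map a percentile rank (percentage scoring below) to a stanine of 1 to 9."""
    return bisect_right(STANINE_CUTOFFS, pr) + 1

for pr in (2, 10, 50, 95, 98):
    print(pr, "->", stanine_from_percentile(pr))
```

Percentile ranks of 41 and 59 both fall in stanine 5, which shows how coarse the grouping is.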
Interpreting Student Performance
 Norm-Referenced Tests
• Error exists in all educational measures.
 Can affect scores both negatively and positively.
• Standard error of measurement (standard error or SEM): The
average amount of measurement error across students in norm
group.
 Provides a range (known as a confidence interval) of
performance when both added and subtracted from test score.

Confidence Interval = Score ± Standard Error


• It is closely associated with the error variance, which indicates the amount
of variability in a test administered to a group that is caused by
measurement error.
• Used to determine the effect of measurement error on individual results in a
test and is a common tool in psychometric research and standardized
academic testing.
• A function of both the standard deviation of observed scores and the
reliability of the test.
• When the test is perfectly reliable, the standard error of measurement
equals 0.
• When the test is completely unreliable, the standard error of measurement is
at its maximum, equal to the standard deviation of the observed scores.
• It is in the original unit of measurement. With the exception of
extreme distributions, the standard error of measurement is viewed
as a fixed characteristic of a particular test or measure.
• Serves in a complementary role to the reliability coefficient. When a
test is perfectly reliable, all observed score variance is caused by true
score variance, whereas when a test is completely unreliable, all
observed score variance is a result of error.
• Although the reliability coefficient provides important information
about the amount of error in a test measured in a group or
population, it does not inform on the error present in an individual
test score.
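The relationship between the standard error of measurement, the standard deviation of observed scores, and reliability is commonly written in classical test theory as SEM = SD × sqrt(1 − reliability). The slides do not state this formula explicitly, so the sketch below is offered as an assumption that is consistent with the two limiting cases described above; the numeric values are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

sd = 15  # hypothetical standard deviation of observed scores

print(standard_error_of_measurement(sd, 1.0))   # perfectly reliable test: 0.0
print(standard_error_of_measurement(sd, 0.0))   # completely unreliable test: equals SD
print(round(standard_error_of_measurement(sd, 0.91), 2))  # 4.5
```

The result is expressed in the same units as the test scores, which is what makes it usable for interpreting an individual score.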
• Standard error of measurement (continued)
 The purpose of a confidence interval is to determine the range of scores
that we are reasonably confident contains a student’s true
ability.
 68% confidence interval (observed score ± one standard error).
 About 95% confidence interval (observed score ± two standard
errors).
 About 99.7% confidence interval (observed score ± three standard
errors).
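The three intervals can be sketched by adding and subtracting one, two, or three standard errors from the observed score. The observed score of 100 and standard error of 4 are hypothetical, and `confidence_band` is an illustrative name.

```python
def confidence_band(observed, sem, n_errors=1):
    """Range of observed score ± n_errors standard errors of measurement."""
    margin = n_errors * sem
    return (observed - margin, observed + margin)

observed, sem = 100, 4  # hypothetical observed score and standard error

for n in (1, 2, 3):
    low, high = confidence_band(observed, sem, n)
    print(f"±{n} SEM: {low} to {high}")
```

The wider the band, the more confident we can be that it contains the student's true score, at the cost of a less precise statement.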
• Standard error of measurement (continued)


 On norm-referenced tests, confidence intervals are presented
around student’s obtained percentile rank score.
 Known as national percentile bands.
 Can be used to compare subtests by examining the bands for
overlap.
 When bands overlap, there is no real difference between
estimates of true achievement on subtests.
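Checking percentile bands for overlap can be sketched as a simple interval comparison. The subtest names and band limits below are invented for illustration, and `bands_overlap` is a hypothetical helper, not part of any score report.

```python
def bands_overlap(band_a, band_b):
    """True when two (low, high) percentile bands share at least one value."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]

# Hypothetical national percentile bands for one student
reading = (45, 62)
mathematics = (58, 74)
science = (70, 85)

print(bands_overlap(reading, mathematics))  # True: no real difference indicated
print(bands_overlap(reading, science))      # False: the difference is likely real
```

Only the reading and science bands fail to overlap, so that is the only pair for which a real difference in achievement would be suggested.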
Selecting a Norm Group

• When choosing a norm group, one should try to obtain a
representative cross section of the population for whom the test is
designed
• GRE GAT NTS
• Ideal way – computerized random sampling of all eligible members
of the population – every member of the population has an equal
chance of being selected
• An ideal source of informative data
• Diversity in ethnic background, social class, and geographic location is
proportionately represented in the sample

• Stratified Random Sampling


• Classifying the target population on important background variables
(age, gender, race, social class, educational level) and then selecting
an appropriate percentage of persons from each stratum
• Basic purpose is to pick a diverse and representative sample from the
target population
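Stratified random sampling can be sketched as grouping the population by a background variable and then drawing the same fraction at random from each group. Everything in this example is hypothetical: the population records, the single stratifying variable `level`, the 10% sampling fraction, and the function name. Real norming studies stratify on several variables at once.

```python
import random
from collections import defaultdict

def stratified_sample(population, key, fraction, seed=0):
    """Draw the same fraction at random from every stratum defined by key."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    strata = defaultdict(list)
    for person in population:
        strata[key(person)].append(person)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 primary-level and 40 secondary-level students
population = [
    {"id": i, "level": "primary" if i < 60 else "secondary"}
    for i in range(100)
]

sample = stratified_sample(population, key=lambda p: p["level"], fraction=0.1)
counts = {
    level: sum(1 for p in sample if p["level"] == level)
    for level in ("primary", "secondary")
}
print(counts)  # {'primary': 6, 'secondary': 4}
```

The sample keeps the 60:40 split of the population, which simple random sampling would only approximate by chance.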

• Psychological test norms are not absolute, universal or timeless


• Relative to the population from which they are derived

• Age and Grade Norms


• Age norms depict the level of test performance for each separate
age group in the normative sample
• Purpose – to facilitate same-aged comparisons

• With age norms, an examinee’s performance is interpreted in relation
to standardization subjects of the same age
• The age span of a normative group can vary from a month to a decade or more
• Tests of intellectual ability may use age intervals as narrow as 4 months
• Comparisons across age groups are easy
• Grade norms are conceptually similar to age norms
• Depict the level of test performance for each separate grade in the
normative sample
• Rarely used with ability tests; especially useful in school settings for
describing children’s achievement level (content)
• Comparing a student against a normative sample from the same
grade is a more appropriate comparison

• Local & Sub-Group Norms


• Local norms are derived from representative local examinees as
opposed to a national sample
• Subgroup norms consist of the scores obtained from an identified
subgroup as opposed to a diversified national sample
