NORMS AND STANDARDIZATION TEST SCORES

NORMS AND TEST STANDARDIZATION
Introduction
 Norms - A norm is a rule, standard, or pattern for action
 An informal guideline about what is considered normal (correct or incorrect)
social behavior in a particular group or social unit.
 Norm Group – a sample of examinees who are representative of the population for
whom the test is intended
 A standardized test is one that is administered, scored, and
interpreted in identical fashion for all examinees.
 Standardized tests allow educators to gain a sense of the
average level of performance for a well-defined group of
students.
 Classroom teachers have no control over these types of tests,
but must understand their nature and interpretation.
 Achievement tests measure academic skills; aptitude tests
measure potential or future achievement.
 Two types of standardized tests are
 norm-referenced (no predetermined passing score;
performance is based on comparisons to others) and
 criterion-referenced (performance is compared to
pre-established criteria).
Methods of Reporting Scores on Standardized Tests
 Criterion-Referenced Tests
• Permit teachers to draw inferences about what students can do
relative to a large domain.
• Answer the following questions:
 What does this student know?
 What can this student do?
 What content and skills has the student mastered?
• Report raw scores, usually in the form of number or percentage of
items answered correctly.
• Other, less common results include speed of performance, quality of
performance, and precision of performance.
 Norm-Referenced Tests
• Permit comparisons to a well-defined norm group (intended to represent the
current level of achievement for a specific group of students at a specific
grade level).
• Answer the following questions:
 What is the relative standing of this student across this broad domain
of content?
 How does the student compare to other similar students?
• Scores are often transformed to a common distribution—normal
distribution or bell-shaped curve.
• Normal distribution
 Three main characteristics:
 Distribution is symmetrical.
 Mean, median, and mode are the same score and are located
at center of distribution.
 The percentage of cases within each standard deviation of the mean
is known precisely.
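The known percentages can be checked numerically. The following is a minimal sketch, assuming a perfectly normal distribution; the function names `normal_cdf` and `share_within` are illustrative and do not come from the slides.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def share_within(k: float) -> float:
    """Proportion of cases lying within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

for k in (1, 2, 3):
    # Prints 68.27%, 95.45%, and 99.73% for k = 1, 2, 3
    print(f"within ±{k} SD: {share_within(k):.2%}")
```

Because the distribution is symmetrical, half of each percentage lies above the mean and half below.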
Methods of Reporting Scores on Standardized Tests
• Raw score
 Basic level of information provided by a standardized test
 Number of items answered correctly.
 Number of questions answered in the keyed direction (personality
testing)
 Number of problems answered correctly, with bonus points added for
quick performance
 Raw scores become meaningful only in relation to norms
 Not very useful for norm-referenced tests.
 Score must be transformed in order to be useful for comparisons.
 Norm-Referenced Tests – Essential Statistical Concepts
 Any quantitative data need to be summarized, condensed, and organized into a
meaningful pattern
 Frequency Distribution – a simple and useful way of summarizing data
 Prepared by specifying a small number of usually equal-sized class intervals and
tallying how many scores fall within each interval
 The sum of frequencies for all intervals will equal N, the total number of
scores
 No strict rule for determining interval size
 Common practice – 5 to 15 class intervals
 Histogram – graphic representation of the same information as the frequency
distribution
 Frequency Polygon – the frequency of each class interval is represented by a
single point rather than a column
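The tallying procedure can be sketched in a few lines. This is an illustration only: it assumes integer scores, the score list is hypothetical, and `frequency_distribution` is a made-up helper name.

```python
from collections import Counter

def frequency_distribution(scores, width):
    """Tally integer scores into equal-sized class intervals of the given width."""
    start = (min(scores) // width) * width            # lower bound of the first interval
    counts = Counter((s - start) // width for s in scores)
    n_intervals = (max(scores) - start) // width + 1
    return [
        (start + i * width, start + (i + 1) * width - 1, counts.get(i, 0))
        for i in range(n_intervals)
    ]

# Hypothetical raw scores for 15 examinees
scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 78, 82, 85, 91, 94]
table = frequency_distribution(scores, width=10)

for low, high, freq in table:
    # A text histogram: one '#' per score in the interval
    print(f"{low}-{high}: {'#' * freq} ({freq})")
```

The frequencies sum to N (15 here), as the definition requires.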
 Measures of Central Tendency
 A useful way of summarizing a set of scores
 These measures indicate where most values in a distribution fall and are
also referred to as the central location of a distribution.
 The tendency of data to cluster around a middle value.
 In statistics, the three most common measures of central tendency are
the mean, median, and mode.
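As a small illustration, the three measures can be computed with Python's standard `statistics` module. The scores below are hypothetical.

```python
import statistics

scores = [4, 5, 5, 6, 7, 7, 7, 8, 9]  # hypothetical raw scores

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most frequently occurring value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

In this slightly skewed set the median and mode coincide while the mean falls a little lower; in a perfectly normal distribution all three are equal.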
• Standard Deviation – Measure of Variability
• A measure of the amount of variation or dispersion of a set of
values.
• A low standard deviation indicates that the values tend to be close
to the mean (also called the expected value) of the set, while a high
standard deviation indicates that the values are spread out over a
wider range.
• Calculated as the square root of the variance, where the variance is
the average of the squared deviations of each data point from the mean.
• The farther the data points lie from the mean, the more spread out
the data and the higher the standard deviation.
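The calculation can be sketched as follows. This example assumes the population formula (dividing by N rather than N − 1), which the slides do not specify; both score sets are hypothetical.

```python
import math

def standard_deviation(values):
    """Population standard deviation: square root of the mean squared deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

tight = [48, 49, 50, 51, 52]    # values close to the mean of 50
spread = [30, 40, 50, 60, 70]   # same mean, values farther from it

print(round(standard_deviation(tight), 2))
print(round(standard_deviation(spread), 2))
```

Both sets have a mean of 50, but the second yields a standard deviation ten times larger because its values lie farther from the mean.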
• Grade-equivalent score: The grade in the norm group for which a
certain raw score was the median performance.
 Consists of two numerical components: The first number indicates
grade level and the second indicates the month during that school
year (ranges from 0 to 9); for example, a grade-equivalent score of
4.2 denotes the second month of fourth grade.
 Often misinterpreted as a standard to be achieved.
 Although scores represent months, they do not represent equal
units.
 Norm-Referenced Tests – Raw Score Transformations
 Essential for making sense of test results
 Converting raw scores into interpretable and useful forms of
information
• Percentile rank: Single number that indicates the percentage of the norm group
(standardization sample) that scored below a given raw score.
• Indicates only how an examinee compares to the standardization sample; it does
not convey the percentage of questions answered correctly
• A relative measure
 Ranges from 1 to 99; much more compact in the middle of the distribution
(doesn’t represent equal units).
 A percentile of 50 (P50) corresponds to the median or middlemost raw score
 Often misinterpreted as a percentage-correct score (an absolute score).
 May distort the underlying measurement scale, especially at the
extremes
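A minimal sketch of the percentile rank definition follows. The norm group is hypothetical, `percentile_rank` is an illustrative name, and clamping the result to the 1 to 99 range is an assumption about how the reported limits are enforced.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring below raw_score, reported as 1 to 99."""
    below = sum(1 for s in norm_scores if s < raw_score)
    pr = round(100 * below / len(norm_scores))
    return min(99, max(1, pr))  # assumption: clamp to the reported 1-99 range

# Hypothetical norm group of ten raw scores
norm_scores = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]

print(percentile_rank(22, norm_scores))  # half the group scored below 22
```

Note that the result depends only on the examinee's standing in the norm group, not on how many items the test contains or how many were answered correctly.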
 Standard score: Score that results from a transformation to fit the normal distribution.
• Expresses an examinee’s raw score in terms of its distance from the mean in
standard deviation units
• Uses the standard deviation of the total distribution of raw scores as the
fundamental unit of measurement
• Indicates not only the magnitude of the deviation from the mean but also the
direction of departure (+/-)
 Overcomes the limitation of unequal units noted for grade-equivalent scores and
percentile ranks.
 Allows for comparison of performance across two different measures.
 Reports performance on various scales to determine how many standard
deviations the score is away from the mean.
z-score
 One of the easiest and most commonly used standard scores
 z = (X - M) / SD
 Subtract the group mean (M) from the individual's raw score (X)
 Divide the difference by the standard deviation (SD) of the group
 In a normal distribution, more than 99% of scores fall in the range of –3.00 to +3.00.
 Sign indicates whether above or below mean; number indicates
how many standard deviations away from mean.
 In a symmetrical distribution, half the students will be above the mean; half will be below.
 Negative scores can be difficult to interpret.
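The two steps of the z-score formula can be sketched as follows. The raw scores are hypothetical, and the group SD is taken as the population standard deviation, which is an assumption.

```python
import statistics

def z_scores(raw_scores):
    """Convert raw scores to z-scores: z = (X - M) / SD."""
    m = statistics.mean(raw_scores)       # group mean M
    sd = statistics.pstdev(raw_scores)    # group SD (population formula, an assumption)
    return [(x - m) / sd for x in raw_scores]

raw = [30, 40, 50, 60, 70]  # hypothetical raw scores
z = z_scores(raw)

for x, value in zip(raw, z):
    print(f"raw {x}: z = {value:+.2f}")
```

The score equal to the mean receives z = 0, scores below the mean receive negative values, and the z-scores themselves have a mean of 0 and an SD of 1.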
 T-score
 A standardized score with a mean of 50 and an SD of 10
 T = 10(X - M)/SD + 50
 T = 10z + 50
 Especially common with personality tests
 For any distribution of raw scores, the corresponding T-scores will have an
average of 50
 Most T-scores fall between 20 and 80, that is, within three standard deviations of
the mean (over 99% of scores in a normal distribution)
 Very high T-scores can be observed in clinical settings
 Can be misinterpreted as percentages.
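A short sketch of the T-score conversion, using the formula T = 10z + 50. The group mean of 100 and SD of 15 in the usage lines are hypothetical values chosen for illustration.

```python
def t_score(raw, group_mean, group_sd):
    """Convert a raw score to a T-score: T = 10 * z + 50."""
    z = (raw - group_mean) / group_sd
    return 10 * z + 50

# Hypothetical test with a group mean of 100 and SD of 15
print(t_score(100, 100, 15))  # a score at the mean
print(t_score(130, 100, 15))  # a score two SDs above the mean
print(t_score(85, 100, 15))   # a score one SD below the mean
```

A raw score at the mean maps to 50, and each standard deviation above or below the mean adds or subtracts 10 points, so no negative values arise in ordinary use.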
• Stanine (standard nine) score
• All raw scores are converted to a single-digit system of scores ranging
from 1 to 9, with a mean of 5 and an SD of 2
• Scores are ranked from lowest to highest
• Provides the location of a raw score in a specific segment or band of the
normal distribution.
• The bottom 4% convert to a stanine of 1, the next 7% to 2, and so on
(4%, 7%, 12%, 17%, 20%, 17%, 12%, 7%, and 4% for stanines 1 through 9)
• Represents coarse groupings; does not provide very specific
information.
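The banding can be sketched as a lookup on cumulative percentages. This assumes the conventional stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, 4); the function name and the handling of scores that fall exactly on a boundary are illustrative choices.

```python
from bisect import bisect_left

# Cumulative percentage at the upper edge of stanines 1 through 8
# (4, 4+7, 4+7+12, ...); everything above 96% falls in stanine 9.
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(cumulative_pct):
    """Map a position in the ranked norm group (0-100%) to a stanine of 1-9.

    Assumption: a position exactly on a cutoff belongs to the lower stanine.
    """
    return bisect_left(STANINE_CUTOFFS, cumulative_pct) + 1

for pct in (3, 10, 50, 99):
    print(f"{pct}% -> stanine {stanine(pct)}")
```

The middle stanine (5) spans 20% of the group, which shows how coarse the grouping is: examinees from just above the 40% position up to the 60% position all receive the same score.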
Interpreting Student Performance
• Error exists in all educational measures.
 Can affect scores both negatively and positively.
• Standard error of measurement (standard error or SEM): The
average amount of measurement error across students in the norm
group.
 Provides a range (known as a confidence interval) of
performance when both added to and subtracted from the test score.
Confidence Interval = Score ± Standard Error

• It is closely associated with the error variance, which indicates the amount
of variability in a test administered to a group that is caused by
measurement error.
• Used to determine the effect of measurement error on individual results in a
test and is a common tool in psychometric research and standardized
academic testing.
• A function of both the standard deviation of observed scores and the
reliability of the test.
• When the test is perfectly reliable, the standard error of measurement
equals 0.
• When the test is completely unreliable, the standard error of measurement is
at its maximum, equal to the standard deviation of the observed scores.
• It is in the original unit of measurement. With the exception of
extreme distributions, the standard error of measurement is viewed
as a fixed characteristic of a particular test or measure.
• Serves in a complementary role to the reliability coefficient. When a
test is perfectly reliable, all observed score variance is caused by true
score variance, whereas when a test is completely unreliable, all
observed score variance is a result of error.
• Although the reliability coefficient provides important information
about the amount of error in a test measured in a group or
population, it does not inform on the error present in an individual
test score.
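The dependence of the SEM on the standard deviation and the reliability can be sketched with the classical test theory formula SEM = SD × sqrt(1 − reliability). The slides do not state this formula explicitly; it is assumed here because it reproduces both limiting cases described above. The SD of 15 and reliability of 0.91 are hypothetical.

```python
import math

def standard_error_of_measurement(sd_observed, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)

print(standard_error_of_measurement(15, 1.0))   # perfectly reliable test
print(standard_error_of_measurement(15, 0.0))   # completely unreliable test
print(round(standard_error_of_measurement(15, 0.91), 2))  # hypothetical reliability
```

The result is expressed in the same units as the test scores, which is what allows it to be added to and subtracted from an individual score.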
 Purpose of confidence interval is to determine range of scores
that we are reasonably confident represents a student’s true
ability.
 68% confidence interval (observed score ± one standard error).
 95% confidence interval (observed score ± two standard
errors).
 99% confidence interval (observed score ± three standard
errors).
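The three intervals can be sketched as follows. The observed score of 100 and the SEM of 4.5 are hypothetical values, and `confidence_band` is an illustrative name.

```python
def confidence_band(observed, sem, n_errors):
    """Return (lower, upper) = observed score minus and plus n_errors standard errors."""
    return (observed - n_errors * sem, observed + n_errors * sem)

observed, sem = 100, 4.5  # hypothetical observed score and standard error

for n_errors, level in ((1, "68%"), (2, "95%"), (3, "99%")):
    low, high = confidence_band(observed, sem, n_errors)
    print(f"{level} interval: {low} to {high}")
```

Greater confidence comes at the cost of a wider range, so the choice of interval trades precision against certainty.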
 On norm-referenced tests, confidence intervals are presented
around the student’s obtained percentile rank score.
 Known as national percentile bands.
 Can be used to compare subtests by examining the bands for
overlap.
 When bands overlap, there is no real difference between
estimates of true achievement on subtests.
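The overlap check can be sketched as a simple interval comparison. The subtest names and percentile bands below are hypothetical.

```python
def bands_overlap(band_a, band_b):
    """True if two (lower, upper) percentile bands share any values."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]

# Hypothetical national percentile bands for three subtests
reading = (40, 60)
math_band = (55, 75)
science = (65, 80)

print(bands_overlap(reading, math_band))  # bands share the 55-60 range
print(bands_overlap(reading, science))    # bands do not touch
```

Under the rule above, only the second comparison would suggest a real difference between the two subtest estimates.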
Selecting a Norm Group
• When choosing a norm group, one should try to obtain a
representative cross section of the population for whom the test is
designed
• GRE GAT NTS
• Ideal way – computerized random sampling of all eligible members
of the population – every member of the population has an equal
chance of being selected
• An ideal source of informative data
• The diversity of ethnic background, social class, and geographic location is
proportionately represented in the sample
• Stratified Random Sampling
• Classifying the target population on important background variables
(age, gender, race, social class, educational level) and then selecting
an appropriate percentage of persons from each stratum
• Basic purpose is to pick a diverse and representative sample from the selected
population
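Stratified random sampling can be sketched as follows. The population, the single background variable (`level`), and the 10% sampling fraction are all hypothetical, and `stratified_sample` is an illustrative helper rather than a standard library function.

```python
import random
from collections import defaultdict

def stratified_sample(people, stratum_of, fraction, seed=0):
    """Randomly draw the same fraction of persons from every stratum."""
    rng = random.Random(seed)              # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 primary-level and 40 secondary-level persons
population = [
    {"id": i, "level": "primary" if i < 60 else "secondary"} for i in range(100)
]
sample = stratified_sample(population, lambda p: p["level"], fraction=0.1)

print(len(sample))
```

Because each stratum contributes in proportion to its size, the sample keeps the 60/40 split of the population, which simple random sampling would only approximate.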
• Psychological test norms are not absolute, universal, or timeless
• They are relative to the population from which they are derived
• Age and Grade Norms
• Age norms depict the level of test performance for each separate
age group in the normative sample
• Purpose – to facilitate same-aged comparisons
• With age norms, an examinee’s performance is interpreted in relation
to standardization subjects of the same age
• Age span can vary from a month to a decade or more
• For intellectual abilities, age groups may be as narrow as four-month intervals
• Comparisons are easy across age
• Grade norms are conceptually similar to age norms
• Depict the level of test performance for each separate grade in the
normative sample
• Rarely used with ability tests – especially useful in school settings in
terms of children’s achievement level (content)
• Comparing a student against a normative sample from the same
grade – a more appropriate comparison
• Local & Sub-Group Norms

• Local norms derived from representative local examinees as
opposed to national sample
• Subgroup Norms – consist of the scores obtained from an identified
subgroup as opposed to a diversified national sample
