Understanding Different Types of Scoring Systems

Whenever you take a psychometric test, either as part of a selection process or as a practice exercise, you
will usually see your results presented in terms of numerical scores. These may be raw scores, standard
scores, percentile scores, Z-scores, T-scores or Stens.

Raw Scores
These refer to your unadjusted score. For example, the number of items answered correctly in an aptitude
or ability test. Some types of assessment tools, such as personality questionnaires, have no right or wrong
answers and in this case, the raw score may represent the number of positive responses for a particular
personality trait. Raw scores by themselves are not very useful. If you are told that you scored
40 out of 50 in a verbal aptitude test, this is largely meaningless unless you know where your particular
score lies within the context of the scores of other people. Raw scores need to be converted into standard
scores or percentiles, which will provide you with this kind of information.

How Scores are Distributed

Many human characteristics are distributed throughout the population in a pattern known as the normal
curve or bell curve. This curve describes a distribution where most individuals cluster near the average
and progressively fewer individuals are found the further from the average you go in each direction.
The illustration above shows the relative heights of a large group of people. As you can see, a large
number of individual cases cluster in the middle of the curve and as the extremes are approached, fewer
and fewer cases exist, indicating that progressively fewer individuals are very short or very tall. The
results of aptitude and ability tests also show this normal distribution if a large and representative sample
of the population is used.

Mean and Standard Deviation

There are two characteristics of a normal distribution that you need to understand. The first is the mean or
average and the second is standard deviation, which is a measure of the variability of the distribution. Test
publishers usually assign an arbitrary number to represent the mean standard score when they convert
from raw scores to standard scores. Test X and Test Y are two tests with different standard score means.

In this illustration Test X has a mean of 200 and Test Y has a mean of 100. If an individual got a score of
100 on Test X, that person did very poorly. However, a score of 100 on Test Y would be an average score.

Standard Deviation.
The standard deviation is the most commonly used measure of variability. It is used to describe the
distribution of scores around the mean.
The value of the standard deviation varies directly with the spread of the test scores. If the spread is large,
the standard deviation is large. In a normal distribution, the range within one standard deviation of the
mean (both above and below) includes about 68% of the scores, and the range within two standard
deviations includes about 95% of the scores.
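
As a rough check on these proportions, the sketch below uses Python's standard statistics module. The
mean of 100 and standard deviation of 15 are made-up values chosen only for illustration; any normal
distribution gives the same proportions.

```python
from statistics import NormalDist

# Hypothetical test: mean 100, standard deviation 15 (illustrative values only).
dist = NormalDist(mu=100, sigma=15)

def share_within(dist, k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

within_one = share_within(dist, 1)   # roughly 0.68
within_two = share_within(dist, 2)   # roughly 0.95
print(f"Within 1 SD: {within_one:.1%}")
print(f"Within 2 SD: {within_two:.1%}")
```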

Standard Scores, Percentiles and Norming

Percentile Scores
A percentile score is another type of converted score. Your raw score is converted to a number indicating
the percentage of the norm group who scored below you. For example, a score at the 60th percentile
means that the individual's score is the same as or higher than the scores of 60% of those who took the
test. The 50th percentile is known as the median and represents the middle score of the distribution.
Percentiles have the disadvantage that they are not equal units of measurement. For instance, a difference
of 5 percentile points between two individuals' scores will have a different meaning depending on its
position on the percentile scale, as the scale tends to exaggerate differences near the mean and collapse
differences at the extremes.
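
A percentile score can be sketched as a simple count against the norm group. The function name and the
ten norm-group scores below are hypothetical, and the sketch uses the "same as or higher than" reading
given in the example above; some publishers count only the scores strictly below yours.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring the same as or lower than `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

# Hypothetical norm group of ten raw scores (made-up data).
norm_group = [31, 35, 36, 38, 39, 40, 41, 42, 44, 47]

# Six of the ten scores are 40 or lower, so a raw score of 40 is at the 60th percentile.
print(percentile_rank(40, norm_group))
```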

Percentiles cannot be meaningfully averaged or otherwise treated arithmetically. However, they do have the
advantage of being easily understood and can be very useful when giving feedback to candidates or
reporting results to managers. If you know your percentile score then you know how it compares with
others in the norm group. For example, if you scored at the 70th percentile, then this means that you
scored the same as or better than 70% of the individuals in the norm group. Because percentiles are so
easily understood, they are the scores most often used by organizations when comparing your score with
those of other candidates.

The characteristic way that test scores tend to bunch up around the average, combined with the use of
percentiles in the interpretation of test results, has important implications for you as a job candidate.
This is because most aptitude tests have relatively few questions and most of the scores are clustered
around the mean. The effect of this is that a very small improvement in your actual score will make a
very substantial difference to your percentile score. To illustrate this point, consider a typical aptitude
test consisting of 50 questions. Most of the candidates, who are a fairly similar group in terms of their
educational background and achievements, will score around 40. Some will score a few less and some a
few more. It is very unlikely that any of them will score less than 35 or more than 45.

Looking at these results in terms of percentiles is a very poor way of analyzing them and no experienced
statistician would ever use percentiles on this type of data. However, nine times out of ten this is exactly
what happens to these test results and a difference of three or four extra marks can take you from the 30th
to the 70th percentile. This is why preparing for these tests is so worthwhile, as even small improvements
in your results can make you appear a far superior candidate.
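
The size of this effect can be sketched with an assumed distribution. The text gives only the typical
score of about 40 out of 50; the standard deviation of 3 and the assumption that scores follow a normal
curve are illustrative choices, not figures from any real test. Under those assumptions, four extra
marks move a candidate from roughly the 25th to roughly the 75th percentile.

```python
from statistics import NormalDist

# Assumed distribution of raw scores on a 50-question test:
# mean 40, standard deviation 3 (illustrative values only).
scores = NormalDist(mu=40, sigma=3)

def percentile(raw_score):
    """Approximate percentile of a raw score under the assumed normal curve."""
    return 100 * scores.cdf(raw_score)

before = percentile(38)
after = percentile(42)
print(f"Raw score 38: percentile of about {before:.0f}")
print(f"Raw score 42: percentile of about {after:.0f}")
```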

Different Norming Systems

There are several different norming systems available for use, which have strengths and weaknesses
in different situations. These can be grouped into two main categories: rank order and ordinal.


To overcome the problems of interpretation implicit with percentiles and other rank order systems various
types of standard scores have been developed. One of these is the Z-score which is based on the mean and
standard deviation. It indicates how many standard deviations above or below the mean your score is. The
Z-score is calculated by the formula:

Z = (X - M) / SD

Where:

Z      =   standard score

X      =   individual raw score

M     =   mean score

SD   =   standard deviation
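
The formula can be written as a short function. This is a minimal sketch; the function name and the
norm-group figures (mean 40, standard deviation 4) are hypothetical.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd

# Hypothetical norm group: mean 40, standard deviation 4.
print(z_score(46, 40, 4))   # 1.5  (one and a half SDs above the mean)
print(z_score(38, 40, 4))   # -0.5 (half an SD below the mean)
```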

The illustration shows how Z-scores in standard deviation units are marked out on either side of the mean.
It shows where your score sits in relation to the rest of the norm group. If it is above the mean then it is
positive, and if it is below the mean then it is negative. As you can see from the illustration, Z-scores can
be rather cumbersome to handle because most of them are decimals and half of them can be expected to
be negative.

T Scores (Transformed Scores)
T-scores are used to solve this problem of decimals and negative numbers. The T-score is simply a
transformation of the Z-score, based on a mean of 50 and standard deviation of 10. A T-score can be
calculated from a Z-score using the formula: T = (Z x 10) + 50. Since T-scores do not contain decimal
points or negative signs they are used more frequently than Z-scores as a norm system, particularly for
aptitude tests.

Stens (Standard Tens)
The Sten (standard ten) is a standard score system commonly used with personality questionnaires.
Stens divide the score scale into ten units. Each unit has a band width of half a standard deviation except
the highest unit (Sten 10) which extends from 2 standard deviations above the mean, and the lowest unit
(Sten 1) which extends from 2 standard deviations below the mean.

Sten scores can be calculated from Z-scores using the formula: Sten = (Zx2) + 5.5. Stens have the
advantage that they enable results to be thought of in terms of bands of scores, rather than absolute scores.
These bands are narrow enough to distinguish statistically significant differences between candidates, but
wide enough not to overemphasize minor differences between candidates.
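
The two conversions from Z-scores can be sketched as follows. The formulas are the ones given in the
text; the rounding to whole numbers and the limiting of Stens to the range 1 to 10 are assumptions about
how the results are reported, and the function names are hypothetical.

```python
import math

def t_score(z):
    """T = (Z x 10) + 50, rounded to a whole number (rounding is an assumption)."""
    return round(z * 10 + 50)

def sten(z):
    """Sten = (Z x 2) + 5.5, rounded half up and limited to the range 1 to 10."""
    value = math.floor(z * 2 + 5.5 + 0.5)
    return max(1, min(10, value))

for z in (-2.5, -0.5, 0.3, 1.0, 3.0):
    print(f"Z = {z:+.1f}  T = {t_score(z)}  Sten = {sten(z)}")
```

Note how Z-scores beyond 2 standard deviations from the mean all fall into Sten 1 or Sten 10, which
matches the open-ended bands described above.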

Interpreting Test Results

Aptitude & Ability tests are used to make inferences about your competencies, capabilities, and likely
future performance on the job. But what do your test scores mean and how are they interpreted?
There are two distinct methods that employers use to interpret your scores. These are called criterion-
referenced interpretation and norm-referenced interpretation.

Criterion-Referenced Interpretation.
In criterion-referenced tests, your test score indicates the amount of skill or knowledge that you have in a
particular subject area. The test score is not used to indicate how well you compare to others - it relates
solely to your degree of competence in the specific area assessed. Criterion-referenced assessment is
generally associated with achievement testing and certification. A particular test score is chosen as the
minimum acceptable level of competence. This can either be set by the test publisher who will convert
test scores into proficiency standards, or the company may use its own experience to do this.

For example, suppose a company needs clerical staff with word processing proficiency. The test publisher
may provide a conversion table relating word processing skill to various levels of proficiency, or the
company's own experience with current clerical employees may help them to determine the passing score.
They may decide that a minimum of 50 words per minute with no more than two errors per 100 words is
sufficient for a job with occasional word processing duties. Alternatively, if they have a job with high
production demands, they may set the minimum at 100 words per minute with no more than 1 error per
100 words.
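
A criterion-referenced decision of this kind reduces to checking a score against fixed thresholds, with
no reference to other candidates. In the sketch below the thresholds are the ones from the example; the
function name and the candidate's figures are hypothetical.

```python
def meets_standard(words_per_minute, errors_per_100_words, min_wpm, max_errors):
    """True if a candidate reaches the minimum speed without exceeding the error limit."""
    return words_per_minute >= min_wpm and errors_per_100_words <= max_errors

# Thresholds from the example in the text.
OCCASIONAL = {"min_wpm": 50, "max_errors": 2}
HIGH_PRODUCTION = {"min_wpm": 100, "max_errors": 1}

# Hypothetical candidate: 70 words per minute, 1.5 errors per 100 words.
print(meets_standard(70, 1.5, **OCCASIONAL))        # True
print(meets_standard(70, 1.5, **HIGH_PRODUCTION))   # False
```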

Norm-Referenced Interpretation
In norm-referenced test interpretation, your scores are compared with the test performance of a particular
reference group, called the norm group. The norm group usually consists of large representative samples
of individuals from specific populations, such as undergraduates, senior managers or clerical workers. It is the
average performance and distribution of their scores that become the test norms of the group.

This illustration shows the distribution and mean scores for a variety of groups for a specific test. A score
of 150 on this test would be average for someone working for the organization at an administrative level
but would be below average compared to the organization's graduate trainees, where the average score
was 210. Within the field of occupational testing, a wide variety of individuals are assessed for a broad
range of different jobs. Clearly, people vary markedly in their abilities and qualities, and the norm group
against which you are compared is of crucial importance. To make sure that the test results can be
interpreted in a meaningful way, the test administrator will identify the most appropriate norm group.
This is done by comparing the educational level, the occupational, language and cultural backgrounds,
and other demographic characteristics of the individuals making up the two groups (norm group & test
group) to establish their similarity.
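
The effect of the norm group can be sketched by scoring the same raw result against two sets of norms.
The means of 150 and 210 come from the example above; the standard deviation of 30 for each group and
the assumption of normally distributed scores are illustrative only.

```python
from statistics import NormalDist

# Means come from the example in the text; the standard deviations are assumptions.
norm_groups = {
    "administrative staff": NormalDist(mu=150, sigma=30),
    "graduate trainees": NormalDist(mu=210, sigma=30),
}

raw_score = 150
for name, norm in norm_groups.items():
    z = (raw_score - norm.mean) / norm.stdev
    pct = 100 * norm.cdf(raw_score)
    print(f"{name}: Z = {z:+.1f}, percentile of about {pct:.0f}")
```

The same raw score is exactly average against one group and well below average against the other,
which is why the choice of norm group matters so much.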

Making Selection Decisions

The rank-ordering of test results, the use of cut-off scores, or some combination of the two is commonly
used to assess the test scores and make employment-related decisions about them. There are essentially
three approaches that can be taken.

Rank Ordering
Firstly, the organization could simply select the top scorers. This would seem to be the
most obvious approach, but it does have a major drawback, at least where ‘ordinary’ jobs are concerned.
In times of high unemployment the job is likely to attract some candidates who are too ‘high-powered’
and who will probably get bored quickly and move on as soon as they can. Alternatively, if
unemployment is very low then all of the candidates may have poor scores and may not be up to the job.
Neither of these represents a successful outcome for the organization.

Cut-off Score
The second option is to shortlist candidates who achieve more than a minimum
acceptable score. This is more flexible than the above approach as it ensures that candidates who
are not up to the job are excluded whilst giving the interviewer or decision maker the option to
exclude candidates they feel are too high powered.

Profiling
The third option is to use a minimum acceptable score in conjunction with profiling.
This approach first excludes unsuitable candidates on the basis of minimum score and then takes
into account the relative strengths of each suitable candidate in all of the areas in which they
have been tested. This is then used to produce a profile map which can be compared to the
‘ideal’ profile for the job. This profile will be based on a job specification compiled by an
occupational psychologist, or qualified personnel professional. This job specification will
encompass the following areas:

Knowledge – is specific knowledge needed? For example, medical, legal, financial, engineering, etc.
This will often be decided on the basis of recognized qualifications but will be influenced by previous
job experience.

Skills – are specific skills needed? For example, typing 150 words per minute, ability to operate a
CNC machine, etc. This will often be decided on the basis of recognized qualifications but will be
influenced by previous job experience.

Abilities – are underlying abilities needed? For example, numerical ability, artistic ability, problem
solving ability. These may be decided on the basis of aptitude or ability tests.

Experience – is specific experience necessary? For example, managing a construction project.

Personal Qualities – are particular qualities required? For example, interpersonal skills or
leadership skills.
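
The three approaches can be compared in a short sketch. Everything in it is hypothetical: the candidates,
their scores, the cut-off of 45, the ‘ideal’ profile, and the choice to apply the cut-off to each test
separately and to measure closeness to the profile as a sum of absolute differences. Real profiling
methods vary between organizations.

```python
# Hypothetical candidates with made-up standard scores on three tests.
candidates = {
    "A": {"verbal": 62, "numerical": 58, "abstract": 60},
    "B": {"verbal": 51, "numerical": 55, "abstract": 49},
    "C": {"verbal": 44, "numerical": 38, "abstract": 47},
    "D": {"verbal": 55, "numerical": 47, "abstract": 52},
}

def total(scores):
    """Sum of a candidate's scores across all tests."""
    return sum(scores.values())

def rank_order(cands, n):
    """Approach 1: take the top n candidates by total score."""
    return sorted(cands, key=lambda name: total(cands[name]), reverse=True)[:n]

def apply_cutoff(cands, minimum):
    """Approach 2: keep candidates whose score on every test reaches the minimum."""
    return [name for name, s in cands.items() if min(s.values()) >= minimum]

def profile_match(cands, minimum, ideal):
    """Approach 3: apply the cut-off, then order by closeness to the 'ideal' profile."""
    def distance(name):
        return sum(abs(cands[name][k] - ideal[k]) for k in ideal)
    return sorted(apply_cutoff(cands, minimum), key=distance)

ideal_profile = {"verbal": 55, "numerical": 50, "abstract": 50}  # assumed job profile

print(rank_order(candidates, 2))                      # ['A', 'B']
print(apply_cutoff(candidates, 45))                   # ['A', 'B', 'D']
print(profile_match(candidates, 45, ideal_profile))   # ['D', 'B', 'A']
```

In this made-up data the top scorer, A, is the weakest match to the profile, which mirrors the point
above that the highest scorers are not always the most suitable candidates.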
