
Z value

A common statistical way of standardizing data on one scale so that a comparison can take place is to use a z-score.

The z-score is like a common yardstick for all types of data. Each z-score corresponds to a point in a normal distribution, and as such it is sometimes called a normal deviate, since a z-score describes how much a point deviates from a mean or specification point.
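As a minimal sketch of the "common yardstick" idea, the Python below computes z-scores for one pupil on two tests marked on different scales. The class scores are invented for illustration, and using the population standard deviation (dividing by N) is an assumption about how the group's spread is measured.

```python
from statistics import mean, pstdev

def z_score(x, scores):
    """How many (population) standard deviations x lies above or below the mean of scores."""
    return (x - mean(scores)) / pstdev(scores)

# Invented class results on two tests marked on different scales.
reading = [20, 22, 24, 26, 28]   # marks out of 35
maths = [30, 40, 50, 60, 70]     # marks out of 80

# One pupil scored 26 in reading and 60 in maths.
z_reading = z_score(26, reading)
z_maths = z_score(60, maths)
print(f"reading z = {z_reading:.2f}, maths z = {z_maths:.2f}")
```

Both scores come out at the same z (about 0.71), so relative to the class the pupil did equally well on both tests even though the raw marks are very different.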
Deviation IQ (DIQ): An age-based index of general mental ability. It is based on the difference between a person’s score and the average score for persons of the same chronological age.

Deviation Score (x): The score for an individual minus the mean score for the group, i.e., the amount a person deviates from the mean.
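The two definitions can be sketched in Python as follows. The age-grouped data and helper names are hypothetical, and rescaling to a mean of 100 and a standard deviation of 15 is a common convention assumed here; the glossary itself does not specify the scale.

```python
from statistics import mean, pstdev

# Hypothetical raw scores on an ability test, grouped by chronological age (years).
scores_by_age = {
    9: [30, 34, 38, 42, 46],
    10: [36, 40, 44, 48, 52],
}

def deviation_scores(scores):
    """Deviation score x = X - mean for every score in the group."""
    m = mean(scores)
    return [s - m for s in scores]

def deviation_iq(raw, age, scale_mean=100, scale_sd=15):
    """Position within the person's own age group, rescaled (100/15 is an assumed convention)."""
    group = scores_by_age[age]
    z = (raw - mean(group)) / pstdev(group)
    return scale_mean + scale_sd * z

print(deviation_scores(scores_by_age[9]))   # [-8, -4, 0, 4, 8]
print(deviation_iq(38, 9), deviation_iq(38, 10))
```

The same raw score of 38 sits exactly at the mean for the 9-year-olds (an index of 100) but below the mean for the 10-year-olds, which is what makes the index age-based.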

Percentile: A point on the norms distribution below which a certain percentage of the scores fall. For example, if 70% of the scores fall below a raw score of 56, then the score of 56 is at the 70th percentile.

Percentile Rank: The percentage of scores falling below a certain point on a score distribution. (Percentile and percentile rank are sometimes used interchangeably.)

Raw Score: A person’s observed score on a test, i.e., the number correct. While raw scores do have some usefulness, they should not be used to make comparisons between performance on different tests unless other information about the characteristics of the tests is known. For example, if a student answered 24 items correctly on a reading test and 40 items correctly on a mathematics test, we should not assume that he or she did better on the mathematics test than on the reading measure. Perhaps the reading test consisted of 35 items and the mathematics test consisted of 80 items. Given this additional information, we might conclude that the student did better on the reading test (24/35, about 69%, as compared with 40/80, or 50%). How well did the student do in relation to other students who took the reading test? We cannot address this question until we know how well the class as a whole did on the reading test. Twenty-four items answered correctly is impressive, but if the average (mean) score attained by the class was 33, the student’s score of 24 takes on a different meaning.

Standard Score: A general term referring to scores that have been "transformed" for reasons of convenience, comparability, ease of interpretation, etc. The basic type of standard score, known as a z-score, is an expression of the deviation of a score from the mean score of the group in relation to the standard deviation of the scores of the group. Most other standard scores are linear transformations of z-scores, with different means and standard deviations.

T-Score: A standard score with a mean of 50 and a standard deviation of 10. Thus a T-score of 60 represents a score one standard deviation above the mean. T-scores are obtained by the following formula:

$$T = 10z + 50$$

Standard Deviation (S.D.): A measure of the variability, or dispersion, of a distribution of scores. The more the scores cluster around the mean, the smaller the standard deviation. In a normal distribution of scores, 68.3% of the scores are within the range of one S.D. below the mean to one S.D. above the mean. Computation of the S.D. is based upon the square of the deviation of each score from the mean. One way of writing the formula is as follows:

$$\mathrm{S.D.} = \sqrt{\frac{\sum x^2}{N}}$$

where x is each deviation score (the score minus the group mean) and N is the number of scores.
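A minimal Python sketch of the standard deviation computation and the T-score conversion, using an invented set of scores. It divides by N (the population form); many statistics packages divide by N − 1 when estimating from a sample.

```python
import math

def population_sd(scores):
    """S.D. from the squared deviation of each score from the mean, divided by N."""
    n = len(scores)
    m = sum(scores) / n
    return math.sqrt(sum((x - m) ** 2 for x in scores) / n)

def t_score(z):
    """Standard score with mean 50 and SD 10."""
    return 10 * z + 50

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data, mean 5
sd = population_sd(scores)
print(sd)                           # 2.0
print(t_score(1.0))                 # 60.0: one S.D. above the mean
print(24 / 35 > 40 / 80)            # True: the reading proportion beats the maths proportion
```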

z-Score: A type of standard score with a mean of zero and a standard deviation of one.

Norms: The distribution of test scores of some specified group called the norm group. For example, this may be a national sample of all fourth graders, a national sample of all fourth-grade males, or perhaps all fourth graders in some local district.

Norms vs. Standards: Norms are not standards. Norms are indicators of what students of similar characteristics did when confronted with the same test items as those taken by students in the norms group. Standards, on the other hand, are arbitrary judgments of what students should be able to do, given a set of test items.

Frequency: The number of times a given score (or a set of scores in an interval grouping) occurs in a distribution.

Frequency Distribution: A tabulation of scores from low to high or high to low showing the number of individuals who obtain each score or fall within each score interval.

Mean ($\bar{X}$): The arithmetic average of a set of scores. It is found by adding all the scores in the distribution and dividing by the total number of scores.

Normal Distribution: A distribution of scores or other measures that in graphic form has a distinctive bell-shaped appearance. In a normal distribution, the measures are distributed symmetrically about the mean. Cases are concentrated near the mean and decrease in frequency, according to a precise mathematical equation, the farther one departs from the mean. The assumption that many mental and psychological characteristics are distributed normally has been very useful in test development work.

Figure 1 below is a normal distribution. It shows the percentage of cases between different scores as expressed in standard deviation units. For example, about 34% of the scores fall between the mean and one standard deviation above the mean.

Figure 1. A Normal Distribution.

Introduction

Of all the problems of building our judgements about people, the last point is arguably the easiest to control. Through using objective assessment techniques (after evaluation of the psychometric qualities of the tools) we can make judgements confident that the results are going to give a measure of the consistency of an individual’s behaviour (based on the reliability of the tool and validity of its predictions) and its uniqueness (based on the comparison of the result to appropriate groups). This does not deny the usefulness of less structured information; rather, it suggests that as psychologists we too can fall into the trap of weighting anecdotal or colourful information over more valid baserate data (Ginosar & Trope, 1980; Hamil et al., 1980; Taylor & Thompson, 1982).

The normal distribution, or bell curve, provides us with the tool to compare a standardised sample of behaviour with a large sample of that behaviour to give us a measure of relative propensity. The validity coefficient, as an index of the relationship between our sample of behaviour and the behaviour we are trying to predict, gives us the justification for this comparison.

Clearly, people vary markedly in their abilities and qualities. As we compare a person’s aptitudes and attributes in relation to other people, scores on tests and questionnaires also need to be compared to relevant comparison groups. We do this through seeing how a person’s score sits in relation to others’ scores on a normal distribution.

Norms are sets of data derived from groups of individuals who have already completed a test or questionnaire. These norm groups enable us to establish where an individual’s score lies on a standard scale by comparing that score with that of other people. Within the field of occupational testing, a wide variety of individuals are assessed for a broad range of different jobs, for instance, school leavers and managers in industry. It is very likely that the conclusions reached will vary considerably when an individual is compared against two different groups, and therefore the norm group against which an individual is compared is of crucial importance. For this reason, it is important to ensure that the norm groups used are relevant to the given group or situation that the data are being used for.

Norming Systems

There are a number of different norming systems available for use, which have strengths and weaknesses in different situations. These can be grouped into two main categories: rank order and ordinal.

Rank Order Systems

When a group of people are given a test or questionnaire, we expect to observe a range of different scores as people differ in their abilities or personal qualities. This spread of results allows us to arrange people in a rank order scale according to their performance. When an individual’s score is subsequently compared with this scale, we can give a percentile score which represents the percentage of the comparison group that the individual has scored above. For instance, a score corresponding to the 75th percentile means that an individual score or response is greater in magnitude than 75% of the norm group in question. Someone who has scored at the 30th percentile has performed or responded in a way that is higher than 30% of the norm group. The 50th percentile is equivalent to the average of the scale.

Percentiles have the disadvantage that they are not equal units of measurement, as the scale tends to exaggerate differences near the mean and collapse differences at the extremes. For instance, a difference of 5 percentile points between two individuals’ scores will have a different meaning depending on its position on the percentile scale. Accordingly, percentiles must not be averaged nor treated in any other fashion mathematically. However, they have the advantage that they are easily understood and can be very useful when giving feedback of results to candidates or discussing results with managers.

Ordinal Systems

To overcome the problems of interpretation implicit with rank order systems, various types of standard scores have been developed. The basis of standard scores is the Z-score, which is based on the mean and standard deviation. A Z-score is merely a raw score which has been changed to standard deviation units. It indicates how many standard deviations above or below the mean a score is.
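To make the rank order idea concrete, the sketch below computes a percentile rank as the percentage of a norm group scoring strictly below an individual, then uses Python's standard-library `statistics.NormalDist` to show why percentiles are not equal units. The norm-group scores are invented for illustration.

```python
from bisect import bisect_left
from statistics import NormalDist

def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring strictly below `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)   # count of norm scores < score
    return 100 * below / len(ordered)

# Hypothetical norm group of ten people.
norm_group = [12, 15, 17, 18, 20, 21, 23, 25, 28, 31]
print(percentile_rank(24, norm_group))   # 70.0

# Percentiles are not equal units: convert two 5-point gaps back to z-scores.
standard_normal = NormalDist()           # mean 0, SD 1
gap_near_mean = standard_normal.inv_cdf(0.55) - standard_normal.inv_cdf(0.50)
gap_in_tail = standard_normal.inv_cdf(0.99) - standard_normal.inv_cdf(0.94)
print(f"{gap_near_mean:.2f} SD vs {gap_in_tail:.2f} SD")
```

Near the median a 5-point percentile gap corresponds to roughly 0.13 SD, whereas between the 94th and 99th percentiles it corresponds to roughly 0.77 SD, which is why averaging percentiles is misleading.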

The Z-score is calculated by the formula:

$$Z = \frac{X - \bar{X}}{SD}$$

where: Z = standard score; X = individual raw score; $\bar{X}$ = mean score; SD = standard deviation.

Usually when standard scores are used they are interpreted in relation to the normal distribution curve. It can be seen from Figure 1 that Z-scores in standard deviation units are marked out on either side of the mean. Those above the mean are positive, and those below the mean negative in sign. By the calculation of the Z-score it can be seen where the individual’s score lies in relation to the rest of the distribution. One important advantage in using the normal distribution as a basis for norms is that the standard deviation has a precise relationship with the area under the curve: one standard deviation above and below the mean includes approximately 68% of the sample.

The standard score is very important when comparing scores from different scales within the questionnaire. Before these scores can be properly compared they must be converted to a common scale such as a standard score scale. These can then be used to express an individual’s score on different scales in terms of norms.

From Figure 1 it can be seen that Z-scores can be rather difficult and cumbersome to handle because most of them are decimals and half of them can be expected to be negative. To remedy these drawbacks, various transformed standard score systems have been derived. These simply entail multiplying the obtained Z-score by a new standard deviation and adding it to a new mean. Both of these steps are devised to eradicate decimals and negative numbers.

T Scores (Transformed Scores)

The T-score is a linear transformation of the Z-score, based on a mean of 50 and standard deviation of 10. A T-score can be calculated from a Z-score using the formula: T = (Z × 10) + 50. Once rounded to whole numbers, T-scores have the advantage over Z-scores that they contain neither decimal points nor positive and negative signs. For this reason they are used more frequently than Z-scores as a norm system, particularly for aptitude tests.

Stens (Standard Tens)

The Sten (standard ten) is a standard score system commonly used with personality questionnaires. It is based on a transformation from the Z-score and has a mean of 5.5 and a standard deviation of 2. As the name suggests, stens divide the score scale into ten units. Each unit has a band width of half a standard deviation except the highest unit (Sten 10), which extends from 2 standard deviations above the mean, and the lowest unit (Sten 1), which extends from 2 standard deviations below the mean. Sten scores can be calculated from Z-scores using the formula: Sten = (Z × 2) + 5.5. Stens have the advantage that they are based on the principles of standard scores and that they encourage us to think in terms of bands of scores, rather than absolute points. With stens these bands are sufficiently narrow to mark significant differences between people, while at the same time guiding the user not to over-interpret small differences between scores. The relationship between Stens, T-Scores and Percentiles is shown in the chart of the normal distribution curve (Figure 1).

Choosing Norm Groups

The importance of New Zealand based norms lies in two places: the face validity and construct validity. Results from our research to date suggest some very real differences between New Zealand, Australia and the United Kingdom in the areas of personality and aptitudes.

As discussed previously, the purpose of norms is to provide baserate information about the likelihood of a skill or behaviour being displayed by the individual. In order for baserate information to be meaningful, the comparison group needs to be as similar as possible to the individual tested and the circumstances in which they were measured. The results will be less meaningful when compared to an inappropriate group. For example, comparing the score of a degreed manager with 16-year-old school leavers is unlikely to give useful base rate information as to the ability to solve problems. When providing feedback to a respondent, giving the information in the context of a group they feel they should be compared with is very important for the person’s acceptance of the results.

Norm group size should be determined by the standard deviation of the sample and the reliability of the instrument. A rough rule of thumb is that for instruments which meet the gold standard of reliability (.75), norm groups should be made up of 100 people or more and should be directly relevant to the purpose of the test.

In-house norms provide the most directly relevant base rate information, as the recipient of the information will be very familiar with the prevalence of the behaviour being measured. The disadvantage of norms produced in-house is that the comparability with the whole population is lost: for example, are your applicants better or worse than those of other companies, or are the people in your own institution more or less disabled than the whole population?

Standard scores indicate where your score lies in comparison to a norm group. For example, if the average or mean score for the norm group is 25, then your own score can be compared to this to see if you are above or below this average.

Percentile Scores

A percentile score is another type of converted score. Your raw score is converted to a number indicating the percentage of the norm group who scored below you. For example, a score at the 60th percentile means that the individual's score is the same as or higher than the scores of 60% of those who took the test. The 50th percentile is known as the median and represents the middle score of the distribution.
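The Z, T and Sten conversions can be chained as in the Python sketch below. The norm mean of 25 echoes the example in the text, but the standard deviation of 5 is an assumption, and rounding half up (then clamping stens to the range 1 to 10) is one reasonable convention rather than a published rule.

```python
import math

def z_score(raw, norm_mean, norm_sd):
    """Standard deviations above (+) or below (-) the norm-group mean."""
    return (raw - norm_mean) / norm_sd

def round_half_up(value):
    """Round to the nearest whole number, with .5 always rounding up."""
    return math.floor(value + 0.5)

def t_score(z):
    """T = (Z x 10) + 50, rounded to a whole number."""
    return round_half_up(z * 10 + 50)

def sten(z):
    """Sten = (Z x 2) + 5.5, rounded, with the open-ended end bands clamped to 1 and 10."""
    return max(1, min(10, round_half_up(z * 2 + 5.5)))

# Hypothetical norm group: mean 25 (as in the text's example), SD 5 (assumed).
z = z_score(30, 25, 5)
print(z, t_score(z), sten(z))   # 1.0 60 8
```

A raw score of 30 is one standard deviation above this hypothetical mean, giving a T-score of 60 and a sten of 8.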

If you know your percentile score, then you know how it compares with others in the norm group. For example, if you scored at the 70th percentile, then this means that you scored the same as or better than 70% of the individuals in the norm group. This is the score most often used by organizations when comparing your score with that of other candidates, and because percentiles are so easily understood they are very widely used when reporting results to managers.

However, percentiles have the disadvantage that they are not equal units of measurement, as the scale tends to exaggerate differences near the mean and collapse differences at the extremes. For instance, a difference of 5 percentile points between two individuals’ scores will have a different meaning depending on its position on the percentile scale. Percentiles cannot be averaged nor treated in any other way mathematically. However, they do have the advantage of being easily understood and can be very useful when giving feedback to candidates or reporting results to managers.

The characteristic way that test scores tend to bunch up around the average, and the use of percentiles in the interpretation of test results, has important implications for you as a job candidate. This is because most aptitude tests have relatively few questions and most of the scores are clustered around the mean. The effect of this is that a very small improvement in your actual score will make a very substantial difference to your percentile score.

To illustrate this point, consider a typical aptitude test consisting of 50 questions. Most of the candidates, who are a fairly similar group in terms of their educational background and achievements, will score around 40. Some will score a few less and some a few more. It is very unlikely that any of them will score less than 35 or more than 45. Nine times out of ten this is exactly what happens to these test results, and a difference of three or four extra marks can take you from the 30th to the 70th percentile. Looking at these results in terms of percentiles is a very poor way of analyzing them, and no experienced statistician would ever use percentiles on this type of data. This is why preparing for these tests is so worthwhile, as even small improvements in your results can make you appear a far superior candidate.

Different Norming Systems

There are several different norming systems available for use, which have strengths and weaknesses in different situations. These can be grouped into two main categories: rank order and ordinal.

Z-scores

To overcome the problems of interpretation implicit with percentiles and other rank order systems, various types of standard scores have been developed. One of these is the Z-score, which is based on the mean and standard deviation. It indicates how many standard deviations above or below the mean your score is. If it is above the mean then it is positive, and if it is below the mean then it is negative. It shows where your score sits in relation to the rest of the norm group. The Z-score is calculated by the formula:

$$Z = \frac{X - M}{SD}$$

where: Z = standard score; X = individual raw score; M = mean score; SD = standard deviation.

The illustration shows how Z-scores in standard deviation units are marked out on either side of the mean. As you can see from the illustration, Z-scores can be rather cumbersome to handle because most of them are decimals and half of them can be expected to be negative.

T Scores (Transformed Scores)

T-scores are used to solve this problem of decimals and negative numbers. The T-score is simply a transformation of the Z-score, based on a mean of 50 and standard deviation of 10. A T-score can be calculated from a Z-score using the formula: T = (Z × 10) + 50.
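A rough model of the 50-question example, assuming (hypothetically) that scores are normally distributed with mean 40 and standard deviation 2.5, so that scores below 35 or above 45 each occur only about 2% of the time. The `z_score` helper also shows the bracketing the formula needs: (X − M) / SD, not X − M/SD.

```python
from statistics import NormalDist

# Hypothetical norms for a 50-question aptitude test (assumed, not from the text).
MEAN, SD = 40, 2.5
norms = NormalDist(mu=MEAN, sigma=SD)

def z_score(x, m, sd):
    # Brackets matter: Z = (X - M) / SD, not X - (M / SD).
    return (x - m) / sd

for raw in range(37, 44):
    percentile = 100 * norms.cdf(raw)
    print(f"raw {raw}: z = {z_score(raw, MEAN, SD):+.1f}, percentile ≈ {percentile:.0f}")
```

Under these assumptions, moving from 39 to 42 marks lifts the percentile from roughly the 34th to roughly the 79th, even though the raw score changes by only three marks.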

Since T-scores do not contain decimal points or negative signs, they are used more frequently than Z-scores as a norm system, particularly for aptitude tests.

Stens (Standard Tens)

The Sten (standard ten) is a standard score system commonly used with personality questionnaires. Stens divide the score scale into ten units. Each unit has a band width of half a standard deviation except the highest unit (Sten 10), which extends from 2 standard deviations above the mean, and the lowest unit (Sten 1), which extends from 2 standard deviations below the mean. Sten scores can be calculated from Z-scores using the formula: Sten = (Z × 2) + 5.5. Stens have the advantage that they enable results to be thought of in terms of bands of scores, rather than absolute scores. These bands are narrow enough to distinguish statistically significant differences between candidates, but wide enough not to overemphasize minor differences between candidates.

Whenever you take a psychometric test, either as part of the selection process or as a practice exercise, you will usually see your results presented in terms of numerical scores. These may be raw scores, percentile scores, standard scores, Z-scores, T-scores or Stens.
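The sten band boundaries follow directly from Sten = (Z × 2) + 5.5: sten k covers Z from (k − 6)/2 up to (k − 5)/2, with the two end bands left open. A small Python sketch that lists them (the function name is hypothetical):

```python
def sten_band(k):
    """Z-score range [lower, upper) covered by sten k, given Sten = (Z x 2) + 5.5."""
    if not 1 <= k <= 10:
        raise ValueError("sten must be between 1 and 10")
    lower = (k - 6) / 2 if k > 1 else float("-inf")   # Sten 1 is open-ended below
    upper = (k - 5) / 2 if k < 10 else float("inf")   # Sten 10 is open-ended above
    return lower, upper

for k in range(1, 11):
    lower, upper = sten_band(k)
    print(f"Sten {k:2d}: {lower:+.1f} <= Z < {upper:+.1f}")
```

Every interior band is half a standard deviation wide, and the mean (Z = 0) sits on the boundary between stens 5 and 6.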

Raw Scores

These refer to your unadjusted score, for example, the number of items answered correctly in an aptitude or ability test. Some types of assessment tools, such as personality questionnaires, have no right or wrong answers, and in this case the raw score may represent the number of positive responses for a particular personality trait. Obviously, raw scores by themselves are not very useful. If you are told that you scored 40 out of 50 in a verbal aptitude test, this is largely meaningless unless you know where your particular score lies within the context of the scores of other people. Converting raw scores into standard scores or percentiles will provide you with this kind of information.

How Scores are Distributed

Many human characteristics are distributed throughout the population in a pattern known as the normal curve or bell curve. This curve describes a distribution where most individuals cluster near the average and progressively fewer individuals are found the further from the average you go in each direction. The illustration above shows the relative heights of a large group of people. As you can see, a large number of individual cases cluster in the middle of the curve and, as the extremes are approached, fewer and fewer cases exist, indicating that progressively fewer individuals are very short or very tall. The results of aptitude and ability tests also show this normal distribution if a large and representative sample of the population is used.

Mean and Standard Deviation

There are two characteristics of a normal distribution that you need to understand. The first is the mean or average, and the second is the standard deviation, which is a measure of the variability of the distribution. Test publishers usually assign an arbitrary number to represent the mean standard score when they convert from raw scores to standard scores. Test X and Test Y are two tests with different standard score means.

In this illustration Test X has a mean of 200 and Test Y has a mean of 100. If an individual got a score of 100 on Test X, that person did very poorly. However, a score of 100 on Test Y would be an average score.

Standard Deviation

The standard deviation is the most commonly used measure of variability. It is used to describe the distribution of scores around the mean. The value of the standard deviation varies directly with the spread of the test scores: if the spread is large, the standard deviation is large. One standard deviation either side of the mean (both plus and minus) will include about 68% of the students' scores. Two standard deviations will include about 95% of the scores.

What do test scores mean?

Many people will remember test scores from their school days, such as ‘7 out of 10’ for a primary school spelling test, or ‘63%’ for one of their secondary school exams. Such scores are readily understandable and are useful in indicating what proportion of the total marks a person has gained, but these scores do not account for factors such as how hard the test is, where a person stands in relation to other people, and the margin of error in the test score. As another example, in a school test such as mathematics or English, we would not know from the raw mark alone how well the pupil is performing against National Curriculum measures.
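The 68% and 95% figures come from the shape of the normal curve itself. A quick check in Python with the standard-library `statistics.NormalDist` (available from Python 3.8):

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, SD 1

def share_within(k):
    """Proportion of a normal distribution lying within k SDs either side of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {100 * share_within(k):.1f}%")
```

It prints roughly 68.3%, 95.4% and 99.7%, and half of the first figure (about 34%) lies between the mean and one standard deviation above it.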

Many professionally produced tests, including most of those constructed by NFER, give outcomes that are different from simple proportions or percentages. The following types of score or measure account for many of the outcomes of educational or psychometric tests.

Standardised scores

Standardised scores and percentile ranks are directly related. Both enable test-takers to be compared with a large, nationally representative sample that has taken the test prior to publication, irrespective of the difficulty of the test. The standardised score is on a scale that can be readily compared and combined with standardised scores from other tests. On the other hand, the percentile rank gives a rank ordering of that score based on the population as a whole.

Standardised scores are more useful measures than raw scores (the number of questions answered correctly), and there are three reasons why such scores are normally used.

1) In order to place test-takers' scores on a readily understandable scale

One way to make a test score such as 43 out of 60 more readily understandable would be to convert it to a percentage (72 per cent to the nearest whole number). However, the percentage on its own is not related to (a) the average score of all the test-takers, or (b) how spread out their scores are. Standardised scores, by contrast, are related to both these statistics. Usually, tests are standardised so that the average, nationally standardised score automatically comes out as 100, irrespective of the difficulty of the test, and so it is easy to see whether a test-taker is above or below the national average. The measure of the spread of scores is called the 'standard deviation', and this is usually set to 15 for educational attainment and ability tests, and for many occupational tests. This means that about 68 per cent of the test-takers in the national sample will have a standardised score within 15 points of the average (between 85 and 115), and about 95 per cent will have a standardised score within two standard deviations (30 points) of the average (between 70 and 130). These examples come from a frequency distribution known as 'the normal distribution', which is shown in the figure below.

2) In educational tests, so that an allowance can be made for the different ages of the pupils

In a typical class in England and Wales, it is usual that most pupils are born between 1st September in one year and 31st August of the following year, which means that the oldest pupils are very nearly 12 months older than the youngest. Almost invariably in ability tests taken in the primary and early secondary years, older pupils achieve slightly higher raw scores than younger pupils. However, standardised scores are derived in such a way that the ages of the pupils are taken into account by comparing a pupil only with others of the same age (in years and months). An older pupil may in fact gain a higher raw score than a younger pupil, but have a lower standardised score. This is because the older pupil is being compared with other older pupils in the reference group and has a lower performance relative to his or her own age group.
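A minimal sketch of age-based standardisation, assuming a hypothetical norm table that gives the raw-score mean and standard deviation for each age band (real tests derive these from their standardisation sample; the numbers here are invented). It reproduces the situation described above, where the older pupil has the higher raw score but the lower standardised score.

```python
# Hypothetical norms: (mean, SD) of raw scores for pupils of a given age (years, months).
AGE_NORMS = {
    (9, 0): (30.0, 5.0),
    (9, 11): (34.0, 5.0),
}

def standardised_score(raw, age, scale_mean=100, scale_sd=15):
    """Compare the pupil only with others of the same age, then rescale to mean 100, SD 15."""
    norm_mean, norm_sd = AGE_NORMS[age]
    z = (raw - norm_mean) / norm_sd
    return round(scale_mean + scale_sd * z)

younger = standardised_score(33, (9, 0))   # raw 33, youngest in the year group
older = standardised_score(34, (9, 11))    # raw 34, oldest in the year group
print(younger, older)                      # 109 100

# For comparison, the simple percentage from the text: 43 out of 60.
print(round(100 * 43 / 60))                # 72
```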

Similarly. for example. In occupational tests. indicates that thestudent's test performance equals or exceeds 25 out of 100 students on the same measure. A percentile of 25. Instead of reflecting a student's rank compared to others. for example. a fixed relationship between standardised scores and percentile ranks when the same average score and standard deviation are used. 3) So that scores from more than one test can be meaningfully compared or added together Standardised scores from most educational tests cover the same range from 70 to 140. reporting school test scores to parents. mathematics and English can be compared directly using standardised scores. Standard scores. Percentiles. or better than. say.pupil. The percentile rank of a test-taker is defined as the percentage of test-takers in the sample who gained a score at the same level or below that of the test-taker's score. There is. Hence a pupil's standing in. should a teacher wish to add together scores from more than one test. This information may be useful when. or the degree to which scores typically will deviate from the average score. The table below shows the relationship for tests that employ an average standardised score of 100 and a standard deviation of 15. Performance at the 25th percentile. but have a lower standardised score. standard scores indicate how far above or below the average (the "mean") an individual score falls. This is because the older pupil is being compared with other older pupils in the reference group and has a lower performance relative to his or her own age group. Standard scores can be used to compare individuals from different grades or age groups because all scores are converted to the same numerical scale. using a common scale. for example. such as one with an "average" of 100. Percentile Ranks Recording a test-taker's percentile rank enables his or her performance to be compared very clearly with those in the national standardisation sample. 
A percentile is a score that indicates the rank of the student compared to others (same age or same grade). Percentiles are probably the most commonly used test score in education. they can be meaningfully combined if standardised scores are used. Note that this is not the same as a "percent"-a percentile of 87 does not mean that the student answered 87% of the questions correctly! Percentiles are derived from raw scores using the norms obtained from testing a large population when the test was first developed. For example. Most intelligence tests and many achievement tests use some type of standard scores. whereas it is not meaningful to add together raw scores from tests of different length or difficulty. . in fact. a percentile of 87 indicates that the student equals or surpasses 87 out of 100 (or 87% of) students. the use of standardised scores enables the organisation to compare directly or add together sub-test scores or scores from different tests in a battery. a standard score of 110 on a test with a mean of 100 indicates above average performance compared to the population of students for whom the test was developed and normed. Standard scores also take "variance" into account. indicates a standardised score that is as good as. A standard score is also derived from raw scores using the norming information gathered when the test was developed. for example in order to obtain a simple overall measure of attainment. using a hypothetical group of 100 students. the standardised scores of 25 per cent of the sample.
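If standardised scores are assumed to be normally distributed with mean 100 and standard deviation 15, the fixed relationship between standardised scores and percentile ranks can be generated rather than looked up. This Python sketch uses `statistics.NormalDist`; a published table is based on the actual standardisation sample and may differ slightly from this idealised model.

```python
from statistics import NormalDist

SCALE = NormalDist(mu=100, sigma=15)   # assumed: normal scores, mean 100, SD 15

def percentile_rank(standardised_score):
    """Approximate percentage of the sample scoring at or below this standardised score."""
    return 100 * SCALE.cdf(standardised_score)

for score in range(70, 141, 10):
    print(f"{score}: {percentile_rank(score):5.1f}")
```

On this model a standard score of 110 falls at about the 75th percentile, and 85 at about the 16th.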