
Understanding Test Statistics

Day 1: Contents
- Introduction
- Tests and Test Scores
- Data sets and their display
- Measures of central tendency
- Skewness
- Measures of dispersion
- Basic analysis using a spreadsheet

Introduction

The aims of the programme are:
- to help participants appreciate the need for statistical analysis in assessment;
- to enable participants to interpret key statistical indicators;
- to allow participants to discuss the relevance of statistics to their work.

The six-day programme will cover:
- basic descriptive statistics;
- basic comparative statistics;
- classical item statistics;
- an introduction to item response theory and its uses.

The approach will place the emphasis on the interpretation of statistics, rather than the underlying mathematics, and will be related to:
- test design;
- test evaluation;
- grading.

Notes:

George Bethell, 2002


Why use statistics?


In most examination systems, the subjective assessment of the quality of student responses (answers) is important in defining standards. That is, grading involves expert judgement. However, using statistics brings advantages because:
- they are objective;
- they allow the technical performance of tests to be evaluated;
- they allow comparisons to be made among populations;
- they may highlight anomalies.


Tests and Test Scores


What is a test?

A test is an instrument designed to measure ability in a particular domain. A test is made up of a number of tasks. Each task may be made up of several items.

Students are asked to give responses to the items presented in the test. We mark each student's responses according to a marking scheme.

This produces an overall test score for the student. The steps are: present the test (tasks, each made up of items), collect the student's responses, apply the marking scheme, and calculate the total score. For one student:

Task     Items   Marks awarded       Task score
Task 1   4       1 + 1 + 0 + 2       4
Task 2   3       1 + 2 + 1           4
Task 3   3       0 + 0 + 1           1
Task 4   5       1 + 4 + 3 + 0 + 1   9
Total                                18

If the test is valid and reliable, the total score will be a good indicator of student ability.
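The marking-and-totalling steps above can be sketched in Python. This is a minimal illustration, assuming one student's item marks are stored in a plain dictionary keyed by task; the names `item_marks`, `task_scores` and `total_score` are hypothetical, chosen only for this example.

```python
# Marks awarded to one student, item by item, for each task
# (the worked example above; the dictionary layout is an assumption).
item_marks = {
    "Task 1": [1, 1, 0, 2],
    "Task 2": [1, 2, 1],
    "Task 3": [0, 0, 1],
    "Task 4": [1, 4, 3, 0, 1],
}

# Task score = sum of the item marks within that task.
task_scores = {task: sum(marks) for task, marks in item_marks.items()}

# Overall test score = sum of the task scores.
total_score = sum(task_scores.values())

for task, score in task_scores.items():
    print(f"{task}: {score}")
print(f"Total = {total_score}")  # Total = 18
```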


Displaying Test Scores


Exams generate lots of numbers! Each component will generate at least one raw score for each candidate. These raw scores may then be manipulated by some or all of the following: examiner scaling; component scaling; weighting; aggregation. All these numbers form the data set for the examination.

There are two ways to display data: an ordered list of marks (the mark distribution) and a graphical display.

Distribution of Scores
The total scores of the students who took a test marked out of 20 are:
12 20 17 19 14 18 20 14 18 18 18 16 15 9 19 11 19 11 8 16 17 10 17 13 16 16 19 18 13 15

We can now count the students who gain a particular mark. The results are recorded as a frequency distribution. We can then plot the list of numbers as a chart.


Statistical Methods for Assessment

Score (x)       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Frequency (f)   0  0  0  0  0  0  0  0  1  1  1  2  1  2  2  2  4  3  5  4  2

The frequency is the number of students who scored a particular mark. So: no-one scored 7 marks;

one student scored 10 marks;

four students scored 16 marks.

All this data is displayed on the chart below.

[Chart: Test Score Distribution. Number of students (0 to 6) against Test Score (out of 20).]
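Counting how many students gain each mark is easy to automate. Here is a minimal Python sketch, assuming the 30 scores are held in a plain list; `collections.Counter` does the counting, and the rows of `#` are only a rough text stand-in for the bar chart.

```python
from collections import Counter

# Total scores of the 30 students (test marked out of 20), from the list above.
scores = [12, 20, 17, 19, 14, 18, 20, 14, 18, 18, 18, 16, 15, 9, 19,
          11, 19, 11, 8, 16, 17, 10, 17, 13, 16, 16, 19, 18, 13, 15]
MAX_MARK = 20

# Count how many students obtained each mark.
counts = Counter(scores)

# Frequency distribution for every possible mark 0..20, including
# marks nobody obtained (a Counter returns 0 for missing keys).
frequency = [counts[mark] for mark in range(MAX_MARK + 1)]

# Rough text version of the chart: one '#' per student.
for mark, f in enumerate(frequency):
    print(f"{mark:>2} | {'#' * f}")
```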

George Bethell, January 2003


In national examinations we have large numbers of students taking the tests. Our examination papers also have more items or measuring points, typically 100 or more. In this case, ordering and displaying the distribution of scores is even more important.

Here are the results from one paper in an English Language examination, which 514 students attempted. Notice that here the frequency distribution table has an extra column, showing the proportion (%) of students who get a particular score. For example, we can see immediately that 4,9% of students are on a mark of 40/60.
Score  Frequency  Proportion      Score  Frequency  Proportion
  (x)        (f)         (%)        (x)        (f)         (%)
    0          1         0,2         31          8         1,6
    1          0         0,0         32          8         1,6
    2          0         0,0         33         12         2,3
    3          0         0,0         34          9         1,8
    4          0         0,0         35         13         2,5
    5          0         0,0         36         12         2,3
    6          0         0,0         37         17         3,3
    7          0         0,0         38         10         2,0
    8          0         0,0         39         16         3,1
    9          0         0,0         40         25         4,9
   10          0         0,0         41         24         4,7
   11          1         0,2         42         22         4,3
   12          0         0,0         43         29         5,6
   13          0         0,0         44         23         4,5
   14          0         0,0         45         33         6,4
   15          1         0,2         46         21         4,1
   16          0         0,0         47         25         4,9
   17          1         0,2         48         24         4,7
   18          0         0,0         49         21         4,1
   19          0         0,0         50         28         5,5
   20          1         0,2         51         21         4,1
   21          1         0,2         52         22         4,3
   22          5         1,0         53         14         2,7
   23          3         0,6         54         11         2,1
   24          5         1,0         55          8         1,6
   25          2         0,4         56          6         1,2
   26          4         0,8         57          0         0,0
   27          3         0,6         58          1         0,2
   28          8         1,6         59          2         0,4
   29          6         1,2         60          0         0,0
   30          7         1,4
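The proportion column is each frequency divided by the number of students, times 100. A minimal Python sketch, assuming the frequencies are typed into a list in score order (index = score); note that Python prints decimal points where this document uses decimal commas. Recomputing from the frequencies gives 1,9 (not 2,0) for a score of 38 and 5,4 (not 5,5) for a score of 50; every other value agrees with the table.

```python
# Frequencies for scores 0..60 on the English Language paper,
# copied from the table above (list index = score).
frequency = [
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0,            # scores 0-9
    0, 1, 0, 0, 0, 1, 0, 1, 0, 0,            # scores 10-19
    1, 1, 5, 3, 5, 2, 4, 3, 8, 6,            # scores 20-29
    7, 8, 8, 12, 9, 13, 12, 17, 10, 16,      # scores 30-39
    25, 24, 22, 29, 23, 33, 21, 25, 24, 21,  # scores 40-49
    28, 21, 22, 14, 11, 8, 6, 0, 1, 2,       # scores 50-59
    0,                                       # score 60
]

n_students = sum(frequency)  # number of students who attempted the paper

# Proportion (%) = frequency / number of students * 100, to 1 decimal place.
proportion = [round(100 * f / n_students, 1) for f in frequency]

print(n_students)      # 514
print(proportion[40])  # 4.9  (25 of 514 students scored 40/60)
```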

This pattern of scores is shown on the chart below.


[Chart: Distribution of Test Scores (maximum = 60). Frequency histogram with Percent on the vertical axis and Score scale (0 to 60) on the horizontal axis.]

What can we see from this pattern of scores?

Charts like these give us useful information about the pattern of scores produced when a group of students takes a test. However, they are difficult to compare directly. For this we need statistical indicators that tell us about:
- the middle of a distribution (measures of central tendency);
- the width of the distribution (measures of dispersion).


Measures of Central Tendency


There are three measures that we can use to tell us how a typical student performed on our test: the mode, the median and the mean. Each of these tells us something slightly different.

The Mode
The mode is the score which occurs most frequently. It is the position of the peak in our frequency distribution chart.

[Chart: Score Distribution. Number of students against Score (0 to 60), with the peak marked: mode = 45.]
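Finding the mode amounts to finding the peak of the frequency distribution. The chart above comes from a test whose data is not given, so this minimal Python sketch uses the 30 scores out of 20 listed earlier instead. `statistics.multimode` is used as a cross-check because a distribution can have more than one peak.

```python
import statistics

# The 30 scores out of 20 from the frequency-distribution example.
scores = [12, 20, 17, 19, 14, 18, 20, 14, 18, 18, 18, 16, 15, 9, 19,
          11, 19, 11, 8, 16, 17, 10, 17, 13, 16, 16, 19, 18, 13, 15]

# Frequency of every possible mark 0..20.
frequency = [scores.count(mark) for mark in range(21)]

# The mode is the position of the peak: the mark(s) with the highest frequency.
peak = max(frequency)
peak_marks = [mark for mark, f in enumerate(frequency) if f == peak]
print(peak_marks)                    # [18]

# multimode returns every most-common score, so a distribution
# with two equal peaks is not silently reduced to a single mode.
print(statistics.multimode(scores))  # [18]
```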

The Median
The median is the score of the middle student when those taking the test are ranked according to their total test scores. For example, a test is taken by 15 students and their scores are: 10 17 13 20 15 14 17 8 19 12 17 ... We arrange these in order:

8 10 11 12 13 14 15 16 17 17 17 19 19 20 20

Now we find the middle one. With 15 students, seven are below the middle student and seven are above, so the middle student is the 8th in the ranked list.

The median score is 16.
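The ranking rule can be written as a short Python function. This is a minimal sketch; `median` is a hypothetical helper written for this example (the standard library's `statistics.median` is printed alongside as a cross-check). It also covers an even number of students by averaging the two middle scores. The data are the 16 scores used in the even-number and mean examples below, and the same list with one score of 17 removed, giving 15 students.

```python
import statistics


def median(scores):
    """Score of the middle student once the scores are ranked.

    With an even number of students there are two middle students,
    so the median is the average of their two scores.
    """
    ranked = sorted(scores)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of students: exactly one middle student.
        return ranked[middle]
    # Even number: average the two students either side of the middle.
    return (ranked[middle - 1] + ranked[middle]) / 2


sixteen = [8, 10, 11, 12, 13, 14, 15, 16, 17, 17, 17, 17, 19, 19, 20, 20]
fifteen = sixteen.copy()
fifteen.remove(17)  # drop one score of 17, leaving 15 students

print(median(fifteen))  # 16
print(median(sixteen))  # 16.5

# Cross-check against the standard library.
print(statistics.median(fifteen), statistics.median(sixteen))  # 16 16.5
```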


If you have an even number of test-takers, find the two students on either side of the middle and take the average of their scores. For example, with 16 students:

8 10 11 12 13 14 15 16 | 17 17 17 17 19 19 20 20

The two middle scores are 16 and 17, and their average is (16 + 17)/2.

The median score is 16,5.

The Mean
The mean is the most commonly used indicator of typical performance. It is the arithmetic average of all test scores. It is calculated by adding up all the student scores and dividing by the number of students who took the test. For example:

Test scores: 8, 10, 11, 12, 13, 14, 15, 16, 17, 17, 17, 17, 19, 19, 20, 20
Total: 8+10+11+12+13+14+15+16+17+17+17+17+19+19+20+20 = 245
Average score = 245/16 = 15,3

The mean score = 15,3.

Relationship between the mode, median and mean
If the test produces a symmetrical, single-peaked pattern of scores, the mode, mean and median are the same. For example, the distribution below has all three equal to 5,0.
[Chart: Distribution of Test Scores. Number of students against Score (0 to 10): a symmetrical distribution with mode = median = mean (5,0).]
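The mean calculation worked through above (245/16) can be checked with a minimal Python sketch. The same 16 scores are also passed to the standard library's `statistics.median` and `statistics.mode` for comparison. Python prints decimal points rather than decimal commas.

```python
import statistics

# The 16 test scores from the mean example above.
scores = [8, 10, 11, 12, 13, 14, 15, 16, 17, 17, 17, 17, 19, 19, 20, 20]

total = sum(scores)         # add up all the student scores
mean = total / len(scores)  # divide by the number of students

print(total)          # 245
print(round(mean, 1)) # 15.3

# Median and mode of the same scores, for comparison.
print(statistics.median(scores), statistics.mode(scores))  # 16.5 17
```

For this small set the three measures differ (mean 15,3, median 16,5, mode 17), which already hints at the asymmetry discussed under Skewness.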


Skewness
If the test produces an asymmetrical pattern of scores then the distribution is said to be skewed. In this case, the mean, mode and median move apart. For example, in the pattern below, the mode is 7 but the mean is just 6,3.
[Chart: Distribution of Test Scores. Number of students against Score (0 to 10), with mode = 7 and mean = 6,3 marked.]

Skewed distributions are useful when you want to make decisions in a particular part of the marking range. For example, to identify students with learning difficulties, the distribution below would be satisfactory. This test discriminates well between students of low ability.
[Chart: Distribution of Test Scores. Number of students against Score (0 to 10). Students in the low tail of the distribution are labelled "These need help."; the rest are labelled "These do not need help."]

If you want to grade over a wide range of ability, it is better if your distribution of scores is not highly skewed.
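A skewness indicator puts a number on this asymmetry: negative values mean a long tail towards low scores, positive values a long tail towards high scores. Below is a minimal Python sketch of the adjusted sample skewness formula, which to our knowledge is the one Excel's SKEW function uses; `skewness` is a hypothetical helper. It is applied to the 30 scores out of 20 shown earlier, which bunch near the top of the mark range.

```python
import statistics


def skewness(scores):
    """Adjusted sample skewness:
    n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3),
    where s is the sample standard deviation (divisor n - 1).
    Negative: long tail towards low scores. Positive: long tail
    towards high scores. Close to 0: roughly symmetrical.
    """
    n = len(scores)
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in scores)


# The 30 scores out of 20 from the frequency-distribution example.
scores = [12, 20, 17, 19, 14, 18, 20, 14, 18, 18, 18, 16, 15, 9, 19,
          11, 19, 11, 8, 16, 17, 10, 17, 13, 16, 16, 19, 18, 13, 15]

print(round(statistics.mean(scores), 1))  # 15.5
print(statistics.mode(scores))            # 18
print(skewness(scores) < 0)               # True: tail towards the low scores
```

As in the chart above with mode 7 and mean 6,3, the mean of these scores sits below the mode.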


Measures of Dispersion
We are interested in the spread of test scores because this can tell us about the spread of ability. We call the spread of test scores the dispersion. Here we look at three measures: the range, the inter-quartile range and the standard deviation.

The Range
This is the simplest measure of dispersion. It tells us the number of marks from the lowest to the highest score achieved:

Range = Maximum Score Achieved - Minimum Score Achieved + 1

(The +1 is needed because, for example, in a test marked out of 10 we actually have 11 marks available: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.)

Example: In a national examination for History, the maximum possible score was 100 marks. When the examination was taken, the top score achieved was 96 and the lowest score achieved was 18. The range of scores was:

96 - 18 + 1 = 79

In theory, this test had 101 marks available to discriminate between all the students. In the event, only 79 marks were used. For tests and examinations used for grading students across a wide range of ability, we prefer our tests to use nearly all the mark range.
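The range rule, including the +1, fits in one line of Python. A minimal sketch follows; `mark_range` is a hypothetical helper name. Note that many statistics textbooks and packages define the range as simply maximum minus minimum, without the +1 used in this document.

```python
def mark_range(max_achieved, min_achieved):
    """Number of marks actually used, counting both end points.

    Follows this document's definition: maximum - minimum + 1.
    """
    return max_achieved - min_achieved + 1


# History example: top score 96, lowest score 18, marked out of 100.
used = mark_range(96, 18)
available = mark_range(100, 0)  # marks 0, 1, ..., 100

print(used, available)  # 79 101
```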


The Inter-Quartile Range
The range can be misleading because it uses only the two most extreme cases. To solve this problem we look at the number of marks covering the middle 50% of students. This is the inter-quartile range.

To find the inter-quartile range, we first find the score below which 25% of the students fall. This is the lower quartile. Then we find the score above which the top 25% of students fall. This is the upper quartile. Finally, we calculate:

inter-quartile range = score at upper quartile - score at lower quartile

[Chart: Finding the inter-quartile range. Score (0 to 100) against cumulative percentage of students (0% to 100%).]

For this distribution, the lower (first) quartile is at 40 marks and the upper (third) quartile is at 60 marks. The inter-quartile range is 60 - 40 = 20 marks. This tells us that, on this test, the middle 50% of the population is covered by just 20 marks.
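The data behind the cumulative chart are not given, so this minimal Python sketch uses the 30 scores out of 20 from earlier. Quartile conventions differ between packages. `method="inclusive"` in `statistics.quantiles` (Python 3.8+) interpolates between ranked scores at position (n - 1) × p, which should match a spreadsheet's QUARTILE(array, k).

```python
import statistics

# The 30 scores out of 20 from the frequency-distribution example.
scores = [12, 20, 17, 19, 14, 18, 20, 14, 18, 18, 18, 16, 15, 9, 19,
          11, 19, 11, 8, 16, 17, 10, 17, 13, 16, 16, 19, 18, 13, 15]

# Three cut points: lower quartile, median, upper quartile.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1  # marks covering the middle 50% of students
print(q1, q3, iqr)  # 13.25 18.0 4.75
```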


Standard Deviation
The most powerful indicator of a test's dispersion is the standard deviation. This is a measure which takes into account every student's score. The formula is given below.

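$$\text{Standard Deviation} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}}$$

where:
$x_i$ = score of student $i$
$\bar{x}$ = mean score for all students
$(x_i - \bar{x})$ = deviation of student $i$ from the mean
$(x_i - \bar{x})^2$ = square of the deviation
$\sum_{i=1}^{N} (x_i - \bar{x})^2$ = sum of all the squares of the deviations from the mean
$N$ = total number of students
$\sqrt{\;}$ = square root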

You don't have to remember the formula, but you should have an idea of what the standard deviation (SD) means.

Firstly, the wider the spread of test scores, the larger the SD; the smaller the SD, the narrower the distribution of test scores.

Secondly, most national examinations produce SDs of about 16 marks in 100. If the SD is larger, the student scores are more spread out. If the SD is smaller, the scores are closer together.

To get an idea of what this means, look at the score distributions below. These have been drawn so that:
- each one has exactly the same mean score (50/100);
- each one has exactly the same number of students (2000);
- the only thing that differs is the SD.
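The formula can be followed step by step in Python. This is a minimal sketch using the 16 scores from the mean example. The formula divides by N, which is what `statistics.pstdev` does; `statistics.stdev` and Excel's STDEV divide by N - 1 instead, which gives a slightly larger value.

```python
import math
import statistics

# The 16 scores from the mean example.
scores = [8, 10, 11, 12, 13, 14, 15, 16, 17, 17, 17, 17, 19, 19, 20, 20]

N = len(scores)
mean = sum(scores) / N  # x-bar

# Square each deviation (x_i - x-bar) and add them all up.
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# Divide by N and take the square root, as in the formula.
sd = math.sqrt(sum_of_squares / N)

print(f"formula (divisor N)   = {sd:.2f}")
print(f"pstdev  (divisor N)   = {statistics.pstdev(scores):.2f}")
print(f"stdev   (divisor N-1) = {statistics.stdev(scores):.2f}")
```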


[Chart: Score Distributions. Number of students (# of students) against score (0 to 100) for three distributions with SD = 10, SD = 15 and SD = 20.]

Notice how an examination with an SD of 20 marks out of 100 (20%) gives a broad spread of scores, while the examination with an SD of 10 marks out of 100 (10%) leaves the students clustered in the middle. Look at how much the number of students on the mean mark drops as the SD increases. For grading purposes, we prefer examinations with large standard deviations.
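The drop in the number of students on the mean mark can be estimated. This minimal Python sketch assumes each distribution is normal (bell-shaped), which the charts resemble but the text does not state. Under that assumption, the expected number of students on one mark at the mean is roughly N × (probability density at the mean). It uses `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

N_STUDENTS = 2000  # as in the charts above
MEAN = 50          # mean mark out of 100

# Assumption: normally distributed scores. Students on a single mark at
# the mean ~ N * pdf(mean), i.e. N / (SD * sqrt(2 * pi)).
on_mean_mark = {}
for sd in (10, 15, 20):
    dist = NormalDist(mu=MEAN, sigma=sd)
    on_mean_mark[sd] = N_STUDENTS * dist.pdf(MEAN)
    # Share of students within one SD either side of the mean.
    within = dist.cdf(MEAN + sd) - dist.cdf(MEAN - sd)
    print(f"SD = {sd}: about {on_mean_mark[sd]:.0f} students on 50/100; "
          f"{within:.0%} score between {MEAN - sd} and {MEAN + sd}")
```

Doubling the SD from 10 to 20 halves the number of students expected on the mean mark (about 80 down to about 40), while the marks covered by the middle group of students double in width.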


Exercises

1. A teacher gives a test to 25 students. Here are their scores.

17, 22, 25, 19, 21, 21, 19, 25, 16, 9, 26, 14, 17, 19, 18, 19, 21, 15, 17, 19, 16, 12, 20, 16, 23

(a) Find:
(i) the mode
(ii) the median
(iii) the mean score
(iv) the range.

(b) The teacher decides to give all students who score fewer than 10 marks extra help after school. What purpose has the teacher used the test for?
(c) The teacher notes that all the students got questions 7 and 8 wrong. She decides that she will have to teach the topic tested by these questions again, using a different approach. What purpose has the teacher used the test for?

2. Here is a distribution of test scores.

[Chart: Distribution of Scores. Number of students (# of students, 0 to 8) against score (0 to 20).]
(a) Find:
(i) the number of students taking the test
(ii) the mode
(iii) the median score
(iv) the range of scores.

(b) This distribution is skewed.
(i) What does skewed mean?
(ii) In this distribution, is the mean smaller than, bigger than, or the same size as the mode?
(iii) Would this test be better for identifying students with learning difficulties or for awarding prizes to the best students? Explain your answer.

3. A Mathematics examination gave these results:
Mode = 14%
Mean = 43%
Standard Deviation = 24%

What does this data tell you about the distribution of test scores? (Consider typical student performance, symmetry and dispersion.)


Basic analysis using a spreadsheet


All the statistics introduced above can be calculated using a spreadsheet such as Excel. The database should include a student identifier and the raw scores achieved by each student on each component, for example:

Student ID   Paper 1 score   Paper 2 score   Paper 3 score
1107         32              15              19
1108         29              17              22
...          ...             ...             ...

The following functions can then be used to process the data.

SUM(array)          The sum (total) of a number of scores.
MODE(array)         The mode of a distribution.
AVERAGE(array)      The mean of a distribution.
MEDIAN(array)       The median of a distribution.
SKEW(array)         A numerical indicator of the degree of asymmetry.
MIN(array)          The minimum score achieved.
MAX(array)          The maximum score achieved.
                    The range is given by: MAX(array)-MIN(array)+1
QUARTILE(array,1)   About 25% of candidates fall at or below this score.
QUARTILE(array,3)   About 75% of candidates fall at or below this score.
                    The inter-quartile range is given by: QUARTILE(array,3)-QUARTILE(array,1)
STDEV(array)        The standard deviation of the distribution. STDEV divides by N - 1; STDEVP divides by N, as in the standard deviation formula.
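For readers working outside a spreadsheet, here is a minimal Python sketch of counterparts to the functions listed above. It is applied to the 30 scores out of 20 from earlier. `describe` is a hypothetical helper, and the dictionary keys simply echo the spreadsheet function names. The quartiles use `method="inclusive"`, assumed here to match QUARTILE, and SKEW uses the adjusted sample skewness formula.

```python
import statistics


def describe(scores):
    """Python counterparts of the spreadsheet functions listed above."""
    n = len(scores)
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)                   # divisor n - 1
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "SUM": sum(scores),
        "MODE": statistics.mode(scores),
        "AVERAGE": mean,
        "MEDIAN": statistics.median(scores),
        "SKEW": n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in scores),
        "MIN": min(scores),
        "MAX": max(scores),
        "RANGE": max(scores) - min(scores) + 1,    # MAX-MIN+1
        "QUARTILE 1": q1,
        "QUARTILE 3": q3,
        "IQR": q3 - q1,
        "STDEV": s,
        "STDEVP": statistics.pstdev(scores),       # divisor n
    }


scores = [12, 20, 17, 19, 14, 18, 20, 14, 18, 18, 18, 16, 15, 9, 19,
          11, 19, 11, 8, 16, 17, 10, 17, 13, 16, 16, 19, 18, 13, 15]

summary = describe(scores)
for name, value in summary.items():
    print(f"{name:<11}{value:8.2f}")
```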

To draw a frequency distribution chart in Excel, you first create a table of scores and frequencies using the FREQUENCY(data_array, bins_array) function, as shown below.
Score (bin upper limit)     10   20   30   40   50   60   70   80   90  100
Frequency                    1   21   68  174  286  373  447  372  189   22

Once you have your frequency distribution you can use the chart wizard to create a bar chart.
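The binning that FREQUENCY performs can be sketched in Python. In this sketch, each bin value is an upper limit, a score goes into the first bin it does not exceed, and one extra count at the end holds scores above the last bin. `frequency` is a hypothetical helper; it assumes the bins are sorted in ascending order, and `bisect_left` finds the first bin greater than or equal to each score. The data are the 30 scores out of 20 from earlier, with illustrative bins of width 5.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count scores into bins in the manner of a spreadsheet FREQUENCY.

    bins must be sorted ascending; each value is an upper limit, so a
    score x is counted in the first bin b with x <= b. The final count
    holds scores above the last bin.
    """
    counts = [0] * (len(bins) + 1)
    for x in data:
        counts[bisect_left(bins, x)] += 1
    return counts


scores = [12, 20, 17, 19, 14, 18, 20, 14, 18, 18, 18, 16, 15, 9, 19,
          11, 19, 11, 8, 16, 17, 10, 17, 13, 16, 16, 19, 18, 13, 15]

print(frequency(scores, [5, 10, 15, 20]))  # [0, 3, 9, 18, 0]
```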
