
Computer Adaptive Testing in Higher Education: A case study

Mariana Lilley, Trevor Barker, Carol Britton
University of Hertfordshire
m.lilley@herts.ac.uk, t.1.barker@herts.ac.uk, c.britton@herts.ac.uk

ABSTRACT
At the University of Hertfordshire we have developed a computer-adaptive test (CAT)
prototype. The prototype was designed to select the questions presented to individual
learners based upon their ability. Earlier work by the authors during the last five years
has shown benefits of the CAT approach, such as increased learner motivation. It was
therefore important to investigate the fairness of this assessment method. In the study
reported here, statistical analysis of test scores from 320 participants shows that, in all
cases, scores on the CATs were significantly correlated with those on the other assessment
methods (p<0.05). This was taken to indicate that learners of all abilities were not disadvantaged
by our CAT approach.

KEYWORDS
Student assessment, Computer-adaptive test, Assessment fairness

INTRODUCTION
The past ten to fifteen years have witnessed a significant increase in the use of
computer-assisted assessment in Higher Education. Hardware developments and
subsequent proliferation of computer technology in conjunction with the ever-increasing
student numbers are amongst the main reasons for this trend (Freeman and Lewis,
1998; O’Reilly and Morgan, 1999; Wainer, 2000; Joy et al., 2002). Computer-assisted
assessments are applications that support student testing, from actual test
administration to scoring and student performance reporting. The benefits of these
computerised tools over traditional paper-and-pencil tests are well reported in relevant
literature and range from accuracy of marking to the potential to quickly assess large
groups of students (Pritchett, 1999; Harvey and Mogey, 1999; De Angelis, 2000; Mason
et al., 2001).

A significant number of computer-assisted assessments currently being used in Higher
Education are the so-called computer-based tests (CBTs). CBTs are traditionally not
tailored towards individual students, as the same fixed set of questions is administered
to all students regardless of their ability within the subject domain.

Conventional CBTs differ from computer-adaptive tests (CATs) primarily in the way that
the questions administered during a given assessment session are selected. In a CAT,
one question is administered at a time and the selection of the next question is
dependent on the response to the previous one. In summary, whilst CBTs mimic
aspects of a paper-and-pencil test, CATs mimic aspects of an oral interview (Freedle &
Duran, 1987; Syang & Dale, 1993). To this end, the first administered question within a
CAT is typically one of average difficulty. A correct response causes a more difficult
question to follow; conversely, an incorrect response causes an easier question to be
administered next. By dynamically adapting the sequence and level of difficulty of the
questions administered to each individual student's proficiency level, the CAT approach
has the potential to offer higher levels of interaction and individualisation than those
offered by its CBT counterpart. This can, in itself, lead to increased student motivation
(Lilley et al., 2004). Because of individual differences in ability levels within the subject
domain being tested, the static CBT approach often poses problems for some students.
For example, a given question might be too easy and thus uninteresting for one student
and too difficult and therefore bewildering to another student. More importantly,
questions that are too difficult or too easy provide tutors with little information regarding
student ability. We argue that it is only by asking questions at the boundary of what a
student understands that we can obtain useful information about what he or she has
learned. By adapting the level of difficulty of the questions to match the ability of the
test-taker, questions that provide little information about a given student can be avoided.
Despite the predicted benefits of computerised adaptive testing, the approach has
received relatively little attention from British Higher Education institutions (Joy et al.,
2002).
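To make the adaptive principle concrete, the sketch below (written in Python purely for illustration; the step size, bounds and starting difficulty are assumptions, and our prototype's actual selection mechanism is the IRT-based one described in the next section) simply steps the difficulty of the next question up after a correct response and down after an incorrect one:

```python
def next_difficulty(current, answered_correctly, step=0.4, lower=-2.0, upper=2.0):
    """Illustrative rule of thumb: harder after a correct response,
    easier after an incorrect one, clamped to the ability scale used here."""
    candidate = current + step if answered_correctly else current - step
    return max(lower, min(upper, candidate))

# A test would start at average difficulty (b = 0) and move from there.
difficulty = 0.0
for response in (True, True, False):   # hypothetical sequence of responses
    difficulty = next_difficulty(difficulty, response)
    print(f"next question difficulty: {difficulty:+.1f}")
```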

The use of computer-adaptive tests (CATs) in Higher Education, as a means of
enhancing student assessment as well as fully exploiting the computer technology
already available, is the focus of ongoing research at the University of Hertfordshire. To
this end, a CAT application based on the Three-Parameter Logistic Model from Item
Response Theory (IRT) was designed, developed and evaluated. Earlier work by the
authors showed the efficacy of the approach in the domain of English as a second
language (Lilley & Barker, 2002; Lilley et al., 2004). Our current focus of research is the
use of computerised adaptive testing within the Computer Science domain.

In the next section of this paper, the reader is provided with an overview of IRT and
computerised-adaptive testing. We then report on the findings of an empirical study, in
which over 300 students participated in summative assessment sessions using our CAT
application. Potential benefits and limitations of the CAT approach in addition to our
views on how the work described here can be developed further are presented in the
final section of this paper.



COMPUTER-ADAPTIVE TESTS
Computer-adaptive tests (CATs) are typically based on Item Response Theory (IRT)
(Wainer, 2000). IRT is a family of mathematical functions that attempts to predict the
probability of a test-taker answering a given question correctly. Since we aimed to
develop a computer-adaptive test based on the use of objective questions such as
multiple-choice and multiple-response, only IRT models capable of evaluating questions
that are dichotomously scored were considered to be appropriate. The Three-Parameter
Logistic (3-PL) Model was chosen over its counterparts, the One-Parameter and
Two-Parameter Logistic Models, because it takes into consideration both the question's
discrimination and the probability of a student answering a question correctly by
guessing.

Equation 1 shows the mathematical function from the 3-PL model used to evaluate the
probability P of a student with an unknown ability θ correctly answering a question of
difficulty b, discrimination a and pseudo-chance c. In order to evaluate the probability Q
of a student with an unknown ability θ incorrectly answering a question, the function
Q(θ ) = 1 − P(θ ) is used (Lord, 1980). Within a CAT, the question to be administered next
as well as the final score obtained by any given student is computed based on the set of
previous responses. This score is obtained using the mathematical function shown in
Equation 2 (Lord, 1980).

Equation 1: Three-Parameter Logistic Model

$$P(\theta) = c + \frac{1 - c}{1 + e^{-1.7a(\theta - b)}}$$

Equation 2: Response Likelihood Function

$$L(u_1, u_2, \ldots, u_n \mid \theta) = \prod_{j=1}^{n} P_j^{u_j} Q_j^{1 - u_j}$$

By applying the formula shown in Equation 1, it is possible to plot an Item
Characteristic Curve (ICC) for any given question.
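For instance, the curve shown in Figure 1 below could be reproduced with a short script along the following lines (a minimal sketch using NumPy; the parameter values are those quoted in the figure caption):

```python
import numpy as np

def p_correct(theta, a, b, c):
    """Three-Parameter Logistic Model (Equation 1): probability of a correct
    response for ability theta, discrimination a, difficulty b, pseudo-chance c."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

# Item Characteristic Curve for a=1.5, b=0, c=0.1 (the parameters of Figure 1)
theta = np.linspace(-2, 2, 81)
icc = p_correct(theta, a=1.5, b=0.0, c=0.1)

# At average ability (theta = 0) the probability is c + (1 - c)/2 = 0.55
print(round(float(p_correct(0.0, a=1.5, b=0.0, c=0.1)), 2))
```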

Figure 1: 3-PL ICC for a correct response where a=1.5, b=0 and c=0.1 (probability of a correct response plotted against ability θ)

Figure 2: 3-PL ICC for a correct response where a=1.7, b=1.48 and c=0.25 (probability of a correct response plotted against ability θ)

Figure 3: 3-PL ICC for an incorrect response where a=1.5, b=0 and c=0.1 (probability of an incorrect response plotted against ability θ)

Figure 4: The response likelihood curve assumes a bell-shape when at least one correct and one incorrect response are entered (likelihood plotted against ability θ)

Questions within the Three-Parameter Logistic Model are dichotomously scored. As an
example, consider a student who answered a set of three questions, in which the first
and second responses were incorrect and the third response was correct, such that
$u_1 = 0$, $u_2 = 0$ and $u_3 = 1$. The likelihood function for this example is
$L(u_1, u_2, u_3 \mid \theta) = (P_1^{0} Q_1^{1})(P_2^{0} Q_2^{1})(P_3^{1} Q_3^{0})$, or more concisely
$L(u_1, u_2, u_3 \mid \theta) = Q_1 Q_2 P_3$.

In the event of a student entering at least one correct and one incorrect response, the
response likelihood curve (see Equation 2) assumes a bell-shape, as shown in Figure 4.



IRT suggests that the peak of this curve is the most likely value for this student's ability
θ estimate.
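As a minimal sketch of this estimation step (assuming the 3-PL function from Equation 1, hypothetical item parameters, and a simple grid search over the ability scale rather than the numerical optimisation a production system might use), the ability estimate can be read off as the peak of the likelihood curve:

```python
import numpy as np

def p_correct(theta, a, b, c):
    """3-PL probability of a correct response (Equation 1)."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def likelihood(theta, responses):
    """Equation 2: product of P_j for correct and Q_j = 1 - P_j for incorrect
    responses, evaluated at a candidate ability theta."""
    value = 1.0
    for correct, (a, b, c) in responses:
        p = p_correct(theta, a, b, c)
        value *= p if correct else (1.0 - p)
    return value

# The u1=0, u2=0, u3=1 example, with hypothetical (a, b, c) for each question
responses = [(False, (1.5, 0.0, 0.10)),
             (False, (1.2, -0.5, 0.20)),
             (True,  (1.7, -1.0, 0.25))]

thetas = np.linspace(-2, 2, 401)
estimate = max(thetas, key=lambda t: likelihood(t, responses))
print(f"ability estimate at the likelihood peak: {estimate:.2f}")
```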

An extensive discussion on IRT is beyond the scope of this paper, and the reader
interested in investigating this topic in more depth is referred to Lord (1980), Hambleton
(1991) and Wainer (2000). In the next section, we describe the empirical study that is
the main focus of this work.

THE STUDY
The study described here involved participants drawn from three different second-year
programming modules at the University of Hertfordshire. For simplicity, these groups
will be referred to as CS1, HND1 and HND2. The CS1 group comprised 96 students
enrolled in a Visual Basic programming module of the Bachelor of Science (BSc)
programme in Computer Science. Both HND1 and HND2 groups consisted of students
enrolled in a Visual Basic programming module of the Higher National Diploma (HND)
programme in Computer Science. The HND1 and HND2 groups had, respectively, 133
and 81 participants.

All subject groups participated in three different types of summative assessment,
namely computer-based test (CBT), computer-adaptive test (CAT) and practical exam.
All assessment sessions took place in computer laboratories, under supervised
conditions. The main characteristics of these assessment sessions are summarised in
Table 1. Table 2 provides an overview of the topics covered by each session of
computer-assisted assessment.
Table 1: Summary of summative assessment methods

Assessment                         Week   Description
Computer-Assisted Assessment (1)   7      10 CBT questions followed by 10 CAT questions. Time limit of 30 minutes.
Computer-Assisted Assessment (2)   10     10 CBT questions followed by 20 CAT questions. Time limit of 40 minutes.
Practical exam                     18     Each individual student had to create a working program based on a set of specifications provided on the day. Time limit of 2 hours.

As can be seen from Table 1, the computer-assisted assessments (1) and (2)
comprised both non-adaptive (i.e. CBT) and adaptive (i.e. CAT) components. This was
deemed necessary to ensure that students would not be disadvantaged by the adaptive
approach, in addition to providing the authors with useful data for comparative
purposes. The students, however, were unaware of the existence of the adaptive
component until the end of the study.



Table 2: Topics covered by Computer-Assisted Assessments (1) and (2)

Topic                                                         Number of questions,   Number of questions,
                                                              Assessment (1)         Assessment (2)
Data Types and variable declaration                           4                      5
Arithmetic, Comparison, Concatenation and Logical operators   4                      5
Built-in/Intrinsic functions                                  4                      5
Program flow                                                  4                      5
Standard controls: properties, methods and events             4                      5
Professional controls: properties, methods and events         0                      5

The CAT application used in this study comprised an adaptive algorithm based on the
3-PL Model, a question-database and a Graphical User Interface (GUI). The GUI is
illustrated in Figure 5.

Figure 5: Screenshot of a question regarding the MsgBox function

The question database contained information on each question, such as stem, options,
key answer and IRT parameters. There are two main approaches to the calibration of
questions with no historical data. The first approach would be the use of statistical
simulations, commonly known as “virtual students” (Conejo et al., 2000). The second
approach would be the use of experts in the subject domain to grade the difficulty of the
questions (Fernandez, 2003). An important characteristic of our approach was that
experts used Bloom's taxonomy of cognitive skills (Pritchett, 1999; Anderson &
Krathwohl, 2001) in order to perform the calibration of questions. In this work, questions
were first classified according to the cognitive skill being assessed. After this initial
classification, questions were then ranked according to difficulty within each cognitive
level. Table 3 summarises the three levels of cognitive skills covered by the question
database and their difficulty range. It can be seen from Table 3 that knowledge was the
lowest level of cognitive skill and application was the highest. An important assumption
of our work is that each higher level cognitive skill will include all lower level skills. As
an example, a question classified as application is assumed to embrace both
comprehension and knowledge.
Table 3: Level of difficulty of questions

Difficulty b       Cognitive skill   Skill involved
−2 ≤ b < −0.6      Knowledge         Ability to recall taught material
−0.6 ≤ b < 0.8     Comprehension     Ability to interpret and/or translate taught material
0.8 ≤ b ≤ 2        Application       Ability to apply taught material to novel situations

At the end of each assessment session, questions were re-calibrated using response
data obtained by all participants who attended the session. In general terms, questions
that were answered correctly by many test takers had their difficulty levels lowered and
questions that were answered incorrectly by many test takers had their difficulty levels
increased.
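The precise recalibration formula is not reproduced here, but the adjustment described above can be sketched roughly as follows (the 0.5 threshold, step size and clamping to the [−2, 2] scale are assumptions made purely for illustration):

```python
def recalibrate_difficulty(b, num_correct, num_attempts,
                           step=0.2, lower=-2.0, upper=2.0):
    """Illustrative adjustment: lower b for questions answered correctly by
    many test takers, raise b for questions answered incorrectly by many."""
    if num_attempts == 0:
        return b
    proportion_correct = num_correct / num_attempts
    if proportion_correct > 0.5:
        b -= step          # answered correctly by many: lower the difficulty
    elif proportion_correct < 0.5:
        b += step          # answered incorrectly by many: raise the difficulty
    return max(lower, min(upper, b))

# e.g. a question with b = 0.8 that 70 of 90 participants answered correctly
print(recalibrate_difficulty(0.8, num_correct=70, num_attempts=90))
```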

Both sessions of computer-assisted assessment started with the non-adaptive
questions, followed by the adaptive ones. In the CBT section of the assessment,
questions were sorted by topic and then by ascending difficulty. In the CAT section of
the assessment, questions were grouped by topic and the level of difficulty of the
question to be administered next was based on each individual set of previous
responses (see Equation 2). The scores obtained by the subject groups in the three
assessments were subjected to statistical analysis. The results of this statistical
analysis are presented in the next section of this paper.
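As an illustration of the selection step in the adaptive section described above, the sketch below re-estimates ability from the responses entered so far (Equation 2) and then picks the unused question whose difficulty lies closest to that estimate. This nearest-difficulty heuristic is a simplification for illustration, and the question bank entries are hypothetical:

```python
import numpy as np

def p_correct(theta, a, b, c):
    """3-PL probability of a correct response (Equation 1)."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def estimate_ability(responses, thetas=np.linspace(-2, 2, 401)):
    """Ability estimate taken as the peak of the response likelihood (Equation 2)."""
    def likelihood(theta):
        value = 1.0
        for correct, (a, b, c) in responses:
            p = p_correct(theta, a, b, c)
            value *= p if correct else (1.0 - p)
        return value
    return max(thetas, key=likelihood)

def select_next_question(question_bank, responses, already_asked):
    """Choose the unused question whose difficulty b is nearest to the current
    ability estimate; with no responses yet, start from average ability (0)."""
    theta = estimate_ability(responses) if responses else 0.0
    candidates = [q for q in question_bank if q["id"] not in already_asked]
    return min(candidates, key=lambda q: abs(q["b"] - theta))

# Hypothetical question bank entries: identifier plus 3-PL parameters
question_bank = [{"id": 1, "a": 1.5, "b": -1.0, "c": 0.10},
                 {"id": 2, "a": 1.2, "b": 0.0, "c": 0.20},
                 {"id": 3, "a": 1.7, "b": 1.2, "c": 0.25}]
responses = [(True, (1.2, 0.0, 0.20))]        # one correct response so far
print(select_next_question(question_bank, responses, already_asked={2})["id"])
```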

RESULTS
Table 4 shows the mean scores obtained by participants in the three Visual Basic
assessments undertaken.
Table 4: Mean scores obtained by the participants in the three assessments undertaken (N=320)

Group   N     CBT (1)      CAT (1)        CBT (2)      CAT (2)        Practical exam
              Mean Score   Mean Ability   Mean Score   Mean Ability   Mean Score
CS1     96    52.7%        -0.90          27.8%        -0.89          48.2%
HND1    133   51.5%        -0.83          42.3%        -0.91          49.7%
HND2    91    50.0%        -0.81          24.3%        -1.45          47.9%



Note that the scores for the adaptive and the non-adaptive component of both
computer-assisted assessments are shown in separate columns. The “Mean Ability”
columns represent the mean value of the estimated student ability for the adaptive
component of the computer-assisted assessment, which ranged from –2 (lowest) to +2
(highest). The scores for the non-adaptive (i.e. CBT) component of the computer-
assisted assessment and practical exam ranged from 0 (lowest score) to 100 (highest
score).

It was important to investigate whether or not participants were disadvantaged by our
approach to adaptive testing. To this end, the data summarised in Table 4 was
subjected to a Pearson's Product Moment correlation using the SPSS software
package. This tests the significance of any relationship between individuals' scores
obtained in the three assessments examined as part of this work. The Pearson's
Product Moment correlation for the CS1, HND1 and HND2 groups are shown
respectively in Tables 5, 6 and 7.
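For readers without access to SPSS, the same statistic can be computed with standard scientific libraries; the fragment below is a sketch using scipy, with small hypothetical arrays standing in for the real per-student data:

```python
from scipy import stats

# Hypothetical per-student results for one group: CAT ability estimates
# (on the -2 to +2 scale) and practical exam scores (percentages).
cat_ability = [-1.4, -0.9, -0.2, 0.3, 0.8, 1.1]
practical_score = [35, 42, 51, 58, 66, 73]

r, p_value = stats.pearsonr(cat_ability, practical_score)
print(f"Pearson's r = {r:.3f}, two-tailed p = {p_value:.3f}")
```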

Table 5: Pearson's Product Moment correlations between the scores obtained by CS1 students in the three assessments undertaken (N=96)
** Correlation is significant at the 0.01 level (2-tailed)

                CAT2 Ability   CBT1 Score   CBT2 Score   Practical exam Score
CAT1 Ability    .580(**)       .390(**)     .522(**)     .575(**)
CAT2 Ability                   .488(**)     .758(**)     .640(**)
CBT1 Score                                  .390(**)     .502(**)
CBT2 Score                                               .535(**)

Sig. (2-tailed) = .000 for all correlations; N = 96 throughout.



Table 6: Pearson's Product Moment correlations between the scores obtained by HND1 students in the three assessments undertaken (N=133)
** Correlation is significant at the 0.01 level (2-tailed)

                CAT2 Ability   CBT1 Score   CBT2 Score   Practical exam Score
CAT1 Ability    .617(**)       .849(**)     .548(**)     .552(**)
CAT2 Ability                   .552(**)     .816(**)     .571(**)
CBT1 Score                                  .467(**)     .445(**)
CBT2 Score                                               .527(**)

Sig. (2-tailed) = .000 for all correlations; N = 133 throughout.

Table 7: Pearson's Product Moment correlations between the scores obtained by HND2 students in the three assessments undertaken (N=91)
** Correlation is significant at the 0.01 level (2-tailed)
* Correlation is significant at the 0.05 level (2-tailed)

                CAT2 Ability        CBT1 Score          CBT2 Score          Practical exam Score
CAT1 Ability    .521(**), p=.000    .421(**), p=.000    .449(**), p=.000    .394(**), p=.000
CAT2 Ability                        .411(**), p=.000    .488(**), p=.000    .350(**), p=.001
CBT1 Score                                              .412(**), p=.000    .289(**), p=.009
CBT2 Score                                                                  .236(*), p=.034

Sig. (2-tailed) values shown next to each coefficient; N = 81 for all correlations.

The results of the statistical analysis of learners' performance are interpreted as
supporting the view that the CAT method is a fair and reliable method of assessment.
Those learners performing well on the CBT component also performed well on the CAT
component of the assessment.



It is well reported in the relevant literature that objective questions such as multiple-
choice and multiple-response questions are useful when assessing the lowest levels of
cognitive skills, namely knowledge, comprehension and application. On the other hand,
practical examinations support the assessment of higher cognitive skills, namely
analysis, synthesis and evaluation (Pritchett, 1999). Thus, due to the different nature of
skills being evaluated in each assessment, it was expected that the correlation between
practical examinations and computer-assisted assessments would be low but
nonetheless significant.

DISCUSSION AND FUTURE WORK


The research presented here was performed in the context of the increased use of
computer technology as an enabling electronic tool for student assessment in Higher
Education (HE). At the present time, many HE institutions are performing summative
assessment using online CBT methods. It is important that learners and tutors are
aware of the implications of this increasing trend. It is hoped that our research will be
useful in understanding the opportunities as well as the limitations of such methods. By
increasing learner motivation, providing better and reproducible estimates of learner
ability and by improving the efficiency of testing it is hoped that both tutors and learners
may benefit from the CAT approach. This is only likely to be useful if tutors and learners
have confidence in the fairness of the approach. Our research to date suggests that the
CAT prototype we developed did not negatively affect student performance (Barker &
Lilley, 2003, Lilley & Barker, 2002, Lilley & Barker, 2003, Lilley et al., 2004). It is hoped
that our work provides some evidence that CATs reflect learner ability at least as well as
CBT and off-computer tests.

The main benefit of the CAT approach over its CBT counterpart would be higher levels
of individualisation and interaction. In a typical CBT, the same fixed set of questions is
presented to all students participating in the assessment session, regardless of their
performance during the test. The questions within this predefined set are typically
selected in such a way that various ability levels, ranging from low to advanced, are
considered (Pritchett, 1999). A consequence of this configuration is that high-
performance students are presented with one or more questions that are below their
level of ability. Similarly, low-performance students are presented with questions that
are above their level of ability. The adaptive algorithm within CATs makes it possible to
administer questions that are appropriate for each individual student's level of ability. In
so doing, a tailored test is dynamically created for each individual test-taker.
Individualising testing can bring two benefits to student assessment:

• academic staff can be provided with more significant information regarding student performance;
• test-takers can be provided with a more motivating assessment experience,
as they will not be presented with questions that are too easy and therefore
unchallenging or too difficult and thus bewildering.



Some participants from the CS1 group corroborated the view that CATs can support a
more motivating assessment experience, and reported that using our CAT application
was challenging and not “boring” like other programming tests that they had taken in the
past. Interestingly, they reported that they liked to be assessed using our application
because they felt challenged rather than expected to answer “silly” test questions.
These comments were unprompted and provided by high performing students. In
previous work by the authors (Lilley et al., 2004), participants in a focus group reported
that one of the benefits of the CAT approach was that it allowed not only the most
proficient but also the less able students to demonstrate what they know.

Although CATs are more difficult to implement than CBTs, owing to the need for an
adaptive algorithm and a larger, differentiated question database, the findings from our
quantitative and qualitative evaluations of the CAT approach to date encourage further
research.

In other research we have shown the benefit of the CAT approach in the context of
formative rather than summative assessment. The utilisation of the adaptive approach
for formative purposes has been successful in other projects such as ASAM (Yong &
Higgins, 2004) and SIETTE (Conejo et al., 2000). It is our view that the inherent
characteristics of the CAT approach – for example, the tailoring of the difficulty of the
task to each individual student – contributed to the success of such projects.

At the University of Hertfordshire we are currently engaged in extending the work
presented here to provide students with personalised feedback on test performance.
The use of a differentiated database of questions that are separated into topic areas
and cognitive skill level has made it possible to provide individualised feedback on
student performance. We have shown this approach to the provision of feedback to be
fast and effective, well received by students of all abilities, and valued by teaching
staff (Lilley & Barker, 2006; Barker & Lilley, 2006). We argue that this has been made
easier and more informative by the CAT approach (Lilley & Barker, 2006; Barker et al.
2006).

An important recent application of the CAT approach is its use in a student model for
the presentation and configuration of learning objects (Barker, 2006). Although this
application of CATs is at an early stage, we have gathered evidence of the benefits of the approach
for more than ten years. We are currently engaged in implementing a computer
application that is capable of using CAT scores to structure and differentiate the
presentation of podcasts for learners, based upon their ability and their needs.

REFERENCES

ANDERSON, L. W. & KRATHWOHL, D. R. (Eds.) (2001) A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives. New York: Longman.



BARKER, T. & LILLEY, M. (2006) Measuring staff attitude to an automated feedback system
based on a Computer Adaptive Test, Proceedings of CAA 2006 Conference, Loughborough
University, July 2006.

BARKER, T. (2006) Attending to Individual Students: How student modelling can be used in
designing personalised blended learning objects. Journal for the Enhancement of Learning and
Teaching, ISSN 1743-3932, Vol. 3 (2) pp 38-48.

BARKER, T. & LILLEY, M. (2003) Are Individual Learners Disadvantaged by the Use of
Computer-Adaptive Testing? In Proceedings of the 8th Learning Styles Conference. University
of Hull, European Learning Styles Information Network (ELSIN), pp. 30-39.

BARKER, T., LILLEY, M. & BRITTON, C. (2006) A student model based on computer adaptive
testing to provide automated feedback: The calibration of questions. Presented at Association
for Learning Technology, ALT 2006, Heriot-Watt University, September 4-7, 2006.

BARKER, T., LILLEY, M. & BRITTON, C. (2006) Computer Adaptive Assessment and its use in
the development of a student model for blended learning. Annual Blended Learning
Conference, University of Hertfordshire, July 2006.

CONEJO, R., MILLAN, E., PEREZ-DE-LA-CRUZ, J. L. & TRELLA, M. (2000) An Empirical Approach to On-Line Learning in SIETTE. Lecture Notes in Computer Science 1839, pp. 605-614.

DE ANGELIS, S. (2000) Equivalency of Computer-Based and Paper-and-Pencil Testing. Journal of Allied Health 29(3), pp. 161-164.

FERNANDEZ, G. (2003) Cognitive Scaffolding for a Web-Based Adaptive Learning Environment. Lecture Notes in Computer Science 2783, pp. 12-20.

FREEDLE, R. O. & DURAN, R. P. (1987) Cognitive and Linguistic Analyses of Test Performance. New Jersey: Ablex.

FREEMAN, R. & LEWIS, R. (1998) Planning and Implementing Assessment. London: Kogan
Page.

HAMBLETON, R. K. (1991) Fundamentals of Item Response Theory. California: Sage Publications.

HARVEY, J. & MOGEY, N. (1999) Pragmatic issues when integrating technology into the
assessment of students In S. Brown, P. Race & J. Bull (Eds.). Computer-Assisted Assessment
in Higher Education. London: Kogan Page.

JOY, M., MUZYANTSKII, B., RAWLES, S. & EVANS, M. (2002) An Infrastructure for Web-
Based Computer-Assisted Learning. ACM Journal of Educational Resources (2)4, December
2002, pp. 1-19.

LILLEY, M. & BARKER, T. (2006) Student attitude to adaptive testing, Proceedings of HCI
2006 Conference, Queen Mary, University of London, 11-15 September 2006.



LILLEY, M. & BARKER, T. (2006) Students’ perceived usefulness of formative feedback for a
computer-adaptive test, proceedings of ECEL 2006: The European Conference on e-Learning,
University of Winchester, 11-12 September 2006

LILLEY, M. & BARKER, T. (2002) The Development and Evaluation of a Computer-Adaptive Testing Application for English Language In Proceedings of the 6th Computer-Assisted Assessment Conference. Loughborough University, United Kingdom, pp. 169-184.

LILLEY, M. & BARKER, T. (2003) Comparison between computer-adaptive testing and other
assessment methods: An empirical study In Research Proceedings of the 10th Association for
Learning and Teaching Conference. The University of Sheffield and Sheffield Hallam
University, United Kingdom, pp. 249-258.

LILLEY, M., BARKER, T. & BRITTON, C. (2004) The development and evaluation of a
software prototype for computer adaptive testing. Computers & Education Journal 43(1-2), pp.
109-123.

LORD, F. M. (1980) Applications of Item Response Theory to practical testing problems. New
Jersey: Lawrence Erlbaum Associates.

MASON, B. J., PATRY, M. & BERNSTEIN, D. J. (2001) An Examination of the Equivalence Between
Non-Adaptive Computer-Based and Traditional Testing. Journal of Educational Computing
Research 24(1), pp. 29-39.

O’REILLY, M. & MORGAN, C. (1999) Online Assessment: creating communities and opportunities In S. Brown, P. Race & J. Bull (Eds.). Computer-Assisted Assessment in Higher Education. London: Kogan Page.

PRITCHETT, N. (1999) Effective question design In S. Brown, P. Race & J. Bull (Eds.).
Computer-Assisted Assessment in Higher Education. London: Kogan Page.

SYANG, A. & DALE, N. B. (1993) Computerized adaptive testing in Computer Science: assessing student programming abilities. ACM SIGCSE Bulletin 25(1), March 1993, pp. 53-57.

WAINER, H. (2000) Computerized Adaptive Testing: A Primer. Lawrence Erlbaum Associates Inc.

YONG, C. F. & HIGGINS, C. A. (2004) Self-assessing with adaptive exercises In Proceedings of the 8th Computer-Assisted Assessment Conference. Loughborough University, United Kingdom, pp. 463-469.
