Handbook of Item Response Theory, Three-Volume Set

Series Editors
Jeff Gill, Washington University, USA
Steven Heeringa, University of Michigan, USA
Tom Snijders, Oxford University, UK; University of Groningen, NL
Large and complex datasets are becoming prevalent in the social and behavioral
sciences and statistical methods are crucial for the analysis and interpretation of such
data. This series aims to capture new developments in statistical methodology with
particular relevance to applications in the social and behavioral sciences. It seeks to
promote appropriate use of statistical, econometric and psychometric methods in
these applied sciences by publishing a broad range of reference works, textbooks and
handbooks.
Handbook of
Item Response Theory
VOLUME ONE
Models
Edited by
Wim J. van der Linden
Pacific Metrics
Monterey, California
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the valid-
ity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or uti-
lized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopy-
ing, microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Contents

1. Introduction
Wim J. van der Linden
15. Poisson and Gamma Models for Reading Speed and Error
Margo G. H. Jansen

24. Multilevel Response Models with Covariates and Multiple Groups
Jean-Paul Fox and Cees A. W. Glas

29. Joint Hierarchical Modeling of Responses and Response Times
Wim J. van der Linden and Jean-Paul Fox
Contents for Statistical Tools

2. Discrete Distributions
Jodi M. Casabianca and Brian W. Junker

Index

Contents for Applications
1. Item-Calibration Designs
Martijn P. F. Berger
2. Parameter Linking
Wim J. van der Linden and Michelle D. Barrett
3. Dimensionality Analysis
Robert D. Gibbons and Li Cai
6. Person Fit
Cees A. W. Glas and Naveed Khalid
21. Bayesian Inference Using Gibbs Sampling (BUGS) for IRT Models
Matthew S. Johnson
22. BILOG-MG
Michele F. Zimowski
23. PARSCALE
Eiji Muraki
24. IRTPRO
Li Cai
25. Xcalibre 4
Nathan A. Thompson and Jieun Lee
26. EQSIRT
Peter M. Bentler, Eric Wu, and Patrick Mair
28. Mplus
Bengt Muthén and Linda Muthén
29. GLLAMM
Sophia Rabe-Hesketh and Anders Skrondal
31. WinGen
Kyung (Chris) T. Han
32. Firestar
Seung W. Choi
33. jMetrik
J. Patrick Meyer
Index
Preface
Item response theory (IRT) has its origins in pioneering work by Louis Thurstone in the
1920s, a handful of authors such as Lawley, Mosier, and Richardson in the 1940s, and more
decisive work by Alan Birnbaum, Frederic Lord, and George Rasch in the 1950s and 1960s.
The major breakthrough it presents is the solution to one of the fundamental flaws inher-
ent in classical test theory—its systematic confounding of what we measure with the test
items used to measure it.
Test administrations are observational studies in which test takers receive a set of items
and we observe their responses. The responses are the joint effects of both the properties
of the items and abilities of the test takers. As in any other observational study, it would
be a methodological error to attribute the effects to one of these underlying causal fac-
tors only. Nevertheless, it seems as if we are forced to do so. If new items are field tested,
the interest is exclusively in their properties, and any confounding with the abilities of
the largely arbitrary selection of test takers used in the study would bias our inferences
about them. Likewise, if examinees are tested, the interest is in their abilities only and we
do not want their scores to be biased by the incidental properties of the items. Classical
test theory does create such biases. For instance, it treats the p-values of the items as
their difficulty parameters, but these values depend equally on the abilities of the sample
of test takers used in the field test. In spite of the terminology, the same holds for its
item-discrimination parameters and definition of test reliability. Conversely, the number-correct scores for which classical test theory is typically used are equally indicative of the difficulty of the test as of the abilities of the test takers. In fact, the tradition of indexing such parameters and scores by the items or the test takers only systematically hides this confounding.
IRT solves the problem by recognizing each response as the outcome of a distinct prob-
ability experiment that has to be modeled with separate parameters for the item and test
taker effects. Consequently, its item parameters allow us to correct for item effects when
we estimate the abilities. Likewise, the presence of the ability parameters allows us to correct for their effects when estimating the item parameters. One of the best introductions to this change of paradigm is Rasch (1960, Chapter 1), which is mandatory reading for anyone with an interest in the subject. The chapter places the new paradigm in the wider context of the research tradition found in the behavioral and social sciences, with its persistent interest in vaguely defined “populations” of subjects who, except for some random noise, are treated as exchangeable, as well as its use of such statistical techniques as correlation coefficients, analysis of variance, and hypothesis testing, all of which assume “random sampling” from them.
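The separation of the two kinds of effects can be illustrated with the simplest of the models reviewed in this volume, the Rasch model, under which the probability of a correct response depends on an ability parameter θ and a difficulty parameter b. The following sketch is purely illustrative (the function names are not taken from any chapter); it shows how the classical p-value of one and the same item shifts with the abilities of the sample, while the item's own parameter stays fixed:

```python
import math

def rasch_prob(theta, b):
    """Rasch model: P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_p_value(b, abilities):
    """Classical p-value of an item: the expected proportion correct in a sample.
    It depends on the sample's ability distribution, not on the item alone."""
    return sum(rasch_prob(theta, b) for theta in abilities) / len(abilities)

# The same item (b = 0.0) looks "easy" in an able sample and "hard" in a
# less able one, although its difficulty parameter is unchanged:
p_able = expected_p_value(0.0, [0.5, 1.0, 1.5])
p_less_able = expected_p_value(0.0, [-1.5, -1.0, -0.5])
```

Because θ and b enter the model as separate parameters, estimates of b can be corrected for the ability distribution of the calibration sample, which is exactly the correction that classical p-values cannot make.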
The developments since the original conceptualization of IRT have remained rapid.
When Ron Hambleton and I edited an earlier handbook of item response theory (van
der Linden and Hambleton, 1997), we had the impression that its 28 chapters pretty
much summarized what could be said about the subject. But now, nearly two decades
later, three volumes with roughly the same number of chapters each appear to be neces-
sary. And I still feel I have to apologize to all the researchers and practitioners whose
original contributions to the vast literature on IRT are not included in this new hand-
book. Not only have the original models for dichotomous responses been supplemented with numerous models for different response formats and response processes, but it is now also
clear, for instance, that models for response times on test items require the same type
of parameterization to account both for the item and test taker effects. Another major
development has been the recognition of the need for deeper parameterization due to the multilevel or hierarchical structure of response data. This development has led to the possibility of accounting for explanatory covariates, group structures with an
impact on the item or ability parameters, mixtures of response processes, higher-level
relationships between responses and response times, or special structures of the item
domain, for instance, due to the use of rule-based item generation. Meanwhile, it has
also become clear how to embed IRT in the wider development of generalized latent
variable modeling. And as a result of all these extensions and new insights, we are now
keener in our choice of treating model parameters as fixed or random. Volume 1 of this
handbook covers most of these developments. Each of its chapters basically reviews one
model. However, all chapters have the common format of an introductory section with
some history of the model and a motivation of its relevance, and then continue with sec-
tions that present the model more formally, treat the estimation of its parameters, show
how to evaluate its fit to empirical data, and illustrate the use of the model through
an empirical example. The last section discusses further applications and remaining
research issues.
As with any other type of probabilistic modeling, IRT depends heavily on the use of
statistical tools for the treatment of its models and their applications. Nevertheless, systematic introductions and reviews with an emphasis on their relevance to IRT are hardly found in the statistical literature. Volume 2 is intended to fill this void. Its chapters are on such
topics as commonly used probability distributions in IRT, the issue of models with both
intentional and nuisance parameters, the use of information criteria, methods for dealing
with missing data, model identification issues, and several topics in parameter estimation
and model fit and comparison. It is especially in these last two areas that recent develop-
ments have been overwhelming. For instance, when the previous handbook of IRT was
produced, Bayesian approaches had already gained some ground but were certainly not
common. But thanks to the computational success of Markov chain Monte Carlo methods,
these approaches have now become standard, especially for the more complex models in
the second half of Volume 1.
The chapters of Volume 3 review several applications of IRT to the daily practice of
testing. Although each of the chosen topics in the areas of item calibration and analysis,
person fit and scoring, and test design has ample resources in the larger literature on test
theory, the current chapters exclusively highlight the contributions that IRT has brought to
them. This volume also offers chapters with reviews of how IRT has advanced such areas
as large-scale educational assessments, psychological testing, cognitive diagnosis, health
measurement, marketing research, and the more general area of measurement of change.
The volume concludes with an extensive review of computer software programs available
for running any of the models and applications in Volumes 1 and 3.
I expect this Handbook of Item Response Theory to serve as a daily resource of information for researchers and practitioners in the field of IRT as well as a textbook for novices.
To better serve them, all chapters are self-contained. But their common core of notation
and extensive cross-referencing allows readers of one of the chapters to consult others for
background information without too much interruption.
I am grateful to all my authors for their belief in this project and the time they have
spent on their chapters. It has been a true privilege to work with each of them. The
same holds for Ron Hambleton who was willing to serve as my sparring partner during
the conception of the plan for this handbook. John Kimmel, executive editor, Statistics,
Chapman & Hall/CRC, has been a constant source of helpful information during the
production of this book. I thank him for his support as well.
References
Rasch, G. 1960. Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danish
Institute for Educational Research.
van der Linden, W. J., and Hambleton, R. K. (Eds.) 1997. Handbook of Modern Item Response Theory.
New York: Springer.
Contributors
David Andrich is the Chapple Professor of Education at the University of Western Australia.
He earned his PhD from the University of Chicago in 1973. In 1990, he was elected as a
fellow of the Academy of Social Sciences of Australia for his contributions to measure-
ment in the social sciences. He has contributed to the development of both single-peaked
(unfolding) and monotonic (cumulative) response models, and is especially known for his
contributions to Rasch measurement theory.
Li Cai is a professor of education and psychology at UCLA, where he also serves as codi-
rector of the National Center for Research on Evaluation, Standards, and Student Testing
(CRESST). His methodological research agenda involves the development, integration,
and evaluation of innovative latent variable models that have wide-ranging applications
in educational, psychological, and health-related domains of study. A component of this
agenda is statistical computing, particularly as related to item response theory (IRT) and
multilevel modeling. He has also collaborated with substantive researchers at UCLA and
elsewhere on projects examining measurement issues in educational games and simula-
tions, mental health statistics, substance abuse treatment, and patient-reported outcomes.
Paul De Boeck earned his PhD from the KU Leuven (Belgium) in 1977, with a disserta-
tion on personality inventory responding. He has held positions at the KU Leuven as a
professor of psychological assessment and at the University of Amsterdam (Netherlands)
as a professor of psychological methods from 2009 to 2012. Since 2012, he has been a professor of
quantitative psychology at the Ohio State University. He is past section editor of ARCS
Psychometrika and a past president of the Psychometric Society (2007–2008). His main
research interests are explanatory item response models and applications in the domain of
psychology and educational measurement.
Susan E. Embretson earned her PhD at the University of Minnesota in psychology in 1973.
She was on the faculty of the University of Kansas for many years and has been a professor of psychology at the Georgia Institute of Technology since 2004. She has served as
president of the Psychometric Society (1999), the Society of Multivariate Psychology (1998),
and American Psychological Association, Division 5 (1991). Embretson has received career
achievement awards from NCME, AERA, and APA. Her current research interests include
cognitive psychometric models and methods, educational test design, the measurement of
change, and automatic item generation.
Hanneke Geerlings earned her PhD in psychometrics from the University of Twente, the
Netherlands, where she is currently appointed as an assistant professor. Her PhD thesis
was on multilevel response models for rule-based item generation.
Margo G. H. Jansen earned her PhD in psychology from the University of Groningen in
1977 with a dissertation on applying Bayesian statistical methods in educational measure-
ment. She has held positions at the Central Institute for Test Development (Cito) until 1979
and as an associate professor at the University of Groningen. Her current research inter-
ests are in educational measurement and linguistics.
Rianne Janssen earned her PhD on componential IRT models from the KU Leuven
(Belgium) in 1994. She has been an associate professor at the same university since 1996.
Her research interests are in nearly every aspect of educational measurement. She is
currently responsible for the national assessments of educational progress in Flanders
(Belgium).
Brian W. Junker is a professor of statistics and associate dean for academic affairs in
the Dietrich College of Humanities and Social Sciences at Carnegie Mellon University.
Dr. Junker has broad interests in psychometrics, education research, and applied statistics,
ranging from nonparametric and Bayesian item response theory to Markov chain Monte Carlo and other computing and estimation methods, as well as rating protocols for teacher
quality, educational data mining, social network analysis, and mixed membership mod-
eling. He earned a BA in mathematics from the University of Minnesota, and an MS in
mathematics and PhD in statistics from the University of Illinois.
George Karabatsos earned his PhD from the University of Chicago in 1998, with spe-
cialties in psychometric methods and applied statistics. He has been a professor at the
University of Illinois-Chicago since 2002. He received a New Investigator Award from the
Society for Mathematical Psychology in 2002, and is an associate editor of Psychometrika.
His current research focuses on the development and use of Bayesian nonparametrics,
especially the advancement of regression and psychometrics.
Geoff N. Masters is chief executive officer and a member of the Board of the Australian
Council for Educational Research (ACER)—roles he has held since 1998. He is an adjunct
professor in the Queensland Brain Institute. He has a PhD in educational measurement
from the University of Chicago and has published widely in the fields of educational
assessment and research. Professor Masters has served on a range of bodies, including
terms as president of the Australian College of Educators; founding president of the Asia–
Pacific Educational Research Association; member of the Business Council of Australia’s
Education, Skills, and Innovation Taskforce; member of the Australian National Commission
for UNESCO; and member of the International Baccalaureate Research Committee.
Gideon J. Mellenbergh earned his PhD in psychology from the University of Amsterdam
in The Netherlands. He is professor emeritus of psychological methods, the University of
Amsterdam, former director of the Interuniversity Graduate School of Psychometrics and
Sociometrics (IOPS), and emeritus member of the Royal Netherlands Academy of Arts and
Sciences (KNAW). His research interests are in the areas of test construction, psychometric
decision making, measurement invariance, and the analysis of psychometrical concepts.
Dylan Molenaar earned his PhD in psychology from the University of Amsterdam in
the Netherlands in 2012 (cum laude). His dissertation research was funded by a personal
grant from the Netherlands Organization for Scientific Research and he was awarded the
Psychometric Dissertation Award in 2013. As a postdoc he studied item response theory
models for responses and response times. In addition, he has been a visiting scholar at Ohio
State University. Currently, he is an assistant professor at the University of Amsterdam.
His research interests include item response theory, factor analysis, response time model-
ing, intelligence, and behavior genetics.
The Netherlands Society for Statistics and Operations Research (VvS). His research is
in measurement models for abilities and attitudes (Rasch models and Mokken models),
Bayesian methods (prior elicitation, robustness of model choice), and behavior studies of
the users of statistical software. Together with Gerard H. Fischer, he coedited a monograph
on Rasch models in 1995.
Mari Muraki is an education data consultant and was a Code for America Fellow in 2015.
She currently builds technology to help schools use their student data efficiently. Previously,
she led the Stanford University Center for Education Policy Analysis data warehouse and
district data partnerships across the United States. She earned a BA in mathematics and
statistics from the University of Chicago and an MS in statistics, measurement, assessment
and research technology from the University of Pennsylvania.
Bengt Muthén obtained his PhD in statistics at the University of Uppsala, Sweden, and is
professor emeritus at UCLA. He was the president of the Psychometric Society from 1988
to 1989 and the recipient of the Psychometric Society’s Lifetime Achievement Award in
2011. He has published extensively on latent variable modeling and many of his proce-
dures are implemented in Mplus.
Richard J. Patz is chief measurement officer at ACT, with responsibilities for research
and development. His research interests include statistical methods, assessment design,
and management of judgmental processes in education and assessment. He served as
president of the National Council on Measurement in Education from 2015 to 2016. He
earned a BA in mathematics from Grinnell College, and an MS and a PhD in statistics from
Carnegie Mellon University.
department from 1986 to 1989. Dr. Ramsay has contributed research on various topics in
psychometrics, including multidimensional scaling and test theory. His current research
focus is on functional data analysis, and involves developing methods for analyzing sam-
ples of curves and images. He has been the president of the Psychometric Society and
the Statistical Society of Canada. He received the Gold Medal of the Statistical Society
of Canada in 1998 and the Award for Technical or Scientific Contributions to the Field of
Educational Measurement of the National Council on Measurement in Education in 2003,
and was made an honorary member of the Statistical Society of Canada in 2012.
James S. Roberts earned his PhD in experimental psychology from the University of
South Carolina in 1995 with a specialty in quantitative psychology. He subsequently com-
pleted a postdoctoral fellowship in the division of statistics and psychometric research at
Educational Testing Service. He is currently an associate professor of psychology at the
Georgia Institute of Technology and has previously held faculty positions at the Medical
University of South Carolina and the University of Maryland. His research interests focus
on the development and application of new model-based measurement methodology in
education and the social sciences.
H. Jane Rogers earned her bachelor’s and master’s degrees at the University of New England in
Australia and her PhD in psychology at the University of Massachusetts Amherst. Her
research interests are applications of item response theory, assessment of differential item
functioning, and educational statistics. She is the coauthor of a book on item response the-
ory and has published papers on a wide range of psychometric issues. She has consulted
on psychometric issues for numerous organizations and agencies as well as on projects
funded by Educational Testing Service, the Law School Admission Council, the Florida Bar, and the National Center for Education Statistics.
Jürgen Rost earned his PhD from the University of Kiel in Germany in 1980. He became
a professor at its Institute of Science Education and led its methodology department until
2005. He has authored more than 50 papers published in peer-reviewed journals on Rasch
models and latent class models. He developed his mixture-distribution Rasch model
in 1990. He edited two volumes on latent class and latent trait models. In addition, he
authored a textbook on test theory and is the founding editor of Methods of Psychological
Research, the first online open-access journal on research methodology.
Fumiko Samejima earned her PhD in psychology from Keio University, Japan, in 1965.
She has held academic positions at the University of New Brunswick, Canada, and the
University of Tennessee. Although her research is wide-ranging, she is best known for her
pioneering work in polytomous item response modeling. Dr. Samejima is a past president
of the Psychometric Society.
Klaas Sijtsma earned his PhD from the University of Groningen in the Netherlands in
1988, with a dissertation on the topic of nonparametric item response theory. Since 1981,
he has held positions at the University of Groningen, Vrije Universiteit in Amsterdam,
and Utrecht University, and has been a professor of methods of psychological research
at Tilburg University since 1997. He is a past president of the Psychometric Society. His
research interests encompass all topics with respect to the measurement of individual dif-
ferences in psychology. Together with Ivo W. Molenaar, he published a monograph on
nonparametric item response theory, in particular Mokken models, in 2002.
Anders Skrondal earned his PhD in statistics from the University of Oslo for which he
was awarded the Psychometric Society Dissertation Prize. He is currently a senior sci-
entist, Norwegian Institute of Public Health, adjunct professor, Centre for Educational
Measurement, University of Oslo, and adjunct professor, Graduate School of Education,
University of California, Berkeley. Previous positions include professor of statistics and
director of the Methodology Institute, London School of Economics, and adjunct profes-
sor of biostatistics, University of Oslo. His coauthored books include Generalized Latent
Variable Modeling and The Cambridge Dictionary of Statistics. His research interests span top-
ics in psychometrics, biostatistics, social statistics, and econometrics. Skrondal is currently
president-elect of the Psychometric Society.
Francis Tuerlinckx earned his PhD in psychology from the University of Leuven in
Belgium in 2000. He held a research position at the Department of Statistics of Columbia
University, New York. Since 2004, he has been a professor of quantitative psychology at the KU
Leuven, Belgium. His research interests are item response theory, response time modeling
in experimental psychology and measurement, Bayesian statistics, and time series analysis.
Wim J. van der Linden is a distinguished scientist and director of research innovation,
Pacific Metrics Corporation, Monterey, California, and professor emeritus of measurement
and data analysis, University of Twente, the Netherlands. He earned his PhD in psychomet-
rics from the University of Amsterdam in 1981. His research interests include test theory,
computerized adaptive testing, optimal test assembly, parameter linking, test equating,
and response-time modeling, as well as decision theory and its application to problems
of educational decision making. He is a past president of the Psychometric Society and
the National Council on Measurement in Education and has received career achievement
awards from NCME, ATP, and the Psychometric Society, as well as the E. F. Lindquist
award from AERA.
Han L. J. van der Maas earned his PhD in developmental psychology in 1993 (cum
laude), with dissertation research on methods for the analysis of phase transitions in
cognitive development. After a five-year KNAW fellowship, he joined the faculty of the
Developmental Group of the University of Amsterdam, first as an associate professor and
in 2003 as a professor. In 2005, he became professor and chair of the Psychological Methods
Group at the University of Amsterdam. Since 2008, he is also director of the Graduate
School of Psychology at the University of Amsterdam. His current research includes net-
work models of general intelligence, new psychometric methods, and adaptive learning
systems for education.
Matthias von Davier currently holds the position of senior research director, global assess-
ment, at Educational Testing Service, Princeton, New Jersey, USA. He earned his PhD
from the University of Kiel in Germany in 1996, specializing in psychometric methods. He
serves as the editor of the British Journal of Mathematical and Statistical Psychology and is a
coeditor of the Journal of Large Scale Assessments in Education. He received the ETS Research
Scientist Award in 2006 and the Bradley Hanson Award for contributions to Educational
Measurement in 2012. His research interests are item response theory, including extended
Rasch models and mixture distribution models for item response data, latent structure
models, diagnostic classification models, computational statistics, and developing advanced
psychometric methods for international large-scale surveys of educational outcomes.