
Essays on Contemporary Psychometrics
pp 69–87


Chapter

Trustworthy Artificial Intelligence in Psychometrics
Bernard P. Veldkamp

Chapter First Online: 16 March 2023


Part of the Methodology of Educational Measurement and Assessment book series (MEMA)

Abstract

The availability of sensors, eye-trackers, smartwatches, Wi-Fi trackers, and other digital devices facilitates the collection of new types of data that can be used for measurement. The question is how to analyze these data. Several psychometric models are available, but even though they have been applied successfully in many testing programs, they are limited in the kinds of data they can handle. Artificial intelligence (AI) offers many methods for dealing with these new and more complex datasets, although these methods have limitations when it comes to reliable and valid measurement. This raises the question of how AI can be applied in the field of psychometrics. To answer it, the field of psychometrics is introduced first. Next, the benefits and disadvantages of AI are illustrated with three examples. A promising development for the application of AI in psychometrics is trustworthy AI (TAI), which rests on principles related to fairness, explainability, and accountability. Based on the examples of AI use in social and health science, and on lessons learned from approaches that integrate new data types into existing psychometric models, a nine-step framework for the use of AI in psychometrics is defined. For each of these steps, the chapter evaluates how TAI can be applied to support reliable and valid measurement. The chapter concludes that straightforward application of AI in psychometrics might still be a step too far, but that developments in TAI are moving fast and offer new and exciting opportunities for applying AI to psychometrics.

Keywords
Artificial intelligence; Explainable AI; Psychometrics; Trustworthy AI



Author information

Authors and Affiliations


Department of Learning Data and Technology, Faculty of Behavioral, Management and Social Sciences, University of Twente, Enschede, The Netherlands
Bernard P. Veldkamp

Corresponding author
Correspondence to Bernard P. Veldkamp.

Editor information

Editors and Affiliations


Research Institute of Child Development and Education, University of Amsterdam, Amsterdam, The Netherlands
L. Andries van der Ark

Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
Wilco H. M. Emons

The expertise group Psychometrics and Statistics, University of Groningen, Groningen, The Netherlands
Rob R. Meijer


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter


Veldkamp, B.P. (2023). Trustworthy Artificial Intelligence in Psychometrics. In: van der Ark, L.A., Emons, W.H.M., Meijer, R.R. (eds) Essays on Contemporary Psychometrics. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-031-10370-4_4


DOI
https://doi.org/10.1007/978-3-031-10370-4_4

Published
16 March 2023

Publisher Name
Springer, Cham

Print ISBN
978-3-031-10369-8

Online ISBN
978-3-031-10370-4

eBook Packages
Education
Education (R0)
