
Personality and Individual Differences 168 (2021) 110363

Contents lists available at ScienceDirect

Personality and Individual Differences


journal homepage: www.elsevier.com/locate/paid

Measuring emotion regulation for preservice teacher selection: A theory-driven development of a situational judgment test

Corinna Koschmieder, Aljoscha C. Neubauer

Dept. of Psychology, University of Graz, Austria, Universitätsplatz 2, 8010 Graz, Austria

ARTICLE INFO

Keywords:
Situational judgment test
Teacher Student Assessment Austria (TESAT)
Emotion regulation
Item response theory
Emotion regulation in pedagogical situations (ERIPS)

ABSTRACT

Competencies to manage one's own and others' emotions are highly relevant in teacher education, but the personality questionnaires used to assess them in a selection context are subject to faking. Situational judgment tests (SJTs) are considered to be less biased by faking; we therefore developed a novel SJT for emotion regulation that is implemented in the admission exam for teacher education. SJT development is often criticized as being too atheoretical; therefore, we aimed at a theory-driven SJT of interpersonal vs. intrapersonal emotion regulation in pedagogical situations. We used a mixed approach of inductive and deductive item construction to improve test quality (item homogeneity and measurement fairness). The final test comprises 22 items with four response alternatives, each expressing one of four emotion regulation strategies. In two studies, we examined psychometric quality, fairness and validity of the test and relations with cognitive ability and personality. Results support a psychometrically sound and gender-fair measurement according to the 1PL Rasch model. Correlations with tests for emotion regulation, openness, agreeableness and the dark triad were observed. Interpersonal emotion regulation predicted higher altruistic professional motives, whereas intrapersonal emotion regulation predicted higher teacher self-efficacy. Both are (negative) predictors of the intention to quit teacher education.

1. Introduction

Psychological tests can provide very useful information on the suitability of candidates for many different professions. For predicting later professional success, a great variety of tests for cognitive as well as noncognitive characteristics of individuals have been tested and found to provide reasonable predictive validities (Salgado, Viswesvaran, & Ones, 2002; Schmidt & Hunter, 1998). Here we describe the development of a test of emotion regulation for the selection of students for teacher education, which is part of an admission procedure for teacher training in Austria, in combination with other cognitive and noncognitive characteristics, in order to increase the predictive validity for job performance.

1.1. Emotion regulation and human performance

Since 1990 the number of publications containing the term “emotion regulation” has been consistently increasing (Gross, 2014). Over recent years, many different frameworks have emerged to describe how emotions are dealt with (for a review see Neubauer & Freudenthaler, 2005; Gross, 2014). In the early definitions emotion regulation was described as the ability to modify emotions through self-regulation strategies and interpersonal processes. However, in recent years the focus changed to the intrapersonal aspects of emotion regulation (Hofmann, 2016) and there is less research on interpersonal emotion regulation (Hofmann, Carpenter, & Curtiss, 2016).

Especially in education, teachers have to deal with emotional situations (inter- & intrapersonal) in their everyday work life. This job with high emotional labor demands can be overwhelming and result in emotional exhaustion (Kim, Jörg, & Klassen, 2019). It is discussed that emotional exhaustion is one of the main elements of burnout (Skaalvik & Skaalvik, 2010). In line with this, burnout is considered a consequence of unsuccessful execution of coping strategies resulting in persistent stress (Guglielmi & Tatrow, 1998; Vandenberghe & Huberman, 1999). Currently emotion regulation is also discussed as a subfactor of the broader concept of emotional intelligence (EI), which can also be a valid predictor of performance in diverse contexts (Van Rooy & Viswesvaran, 2004), such as academic performance (Neubauer et al., 2017; Parker, Summerfeldt, Hogan, & Majeski, 2004), job performance (Côté & Morgan, 2002) and life satisfaction (Sharma, Gangopadhyay, Austin, & Mandal, 2013). Moreover, among all facets of EI emotion regulation shows the highest correlation with job achievement (Joseph & Newman, 2010), but this relation is moderated by the amount of emotional labor in the job. For jobs with high emotional labor, emotion


Corresponding author.
E-mail address: corinna.koschmieder@uni-graz.at (C. Koschmieder).

https://doi.org/10.1016/j.paid.2020.110363
Received 28 April 2020; Received in revised form 21 August 2020; Accepted 22 August 2020
0191-8869/ © 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/BY-NC-ND/4.0/).
C. Koschmieder and A.C. Neubauer Personality and Individual Differences 168 (2021) 110363

regulation is a positive predictor (β = 0.17), whereas for jobs with low emotional labor, emotion regulation is a negative predictor (β = −0.11, Joseph & Newman, 2010). For the teaching profession – a profession with high demands of emotional labor – EI seems to be a meaningful predictor, although relevant literature is still rare. Research indicates that high EI has a positive effect on pedagogical competencies and a negative effect on perceived job pressure (Mayr, 2012). In addition, emotionally competent teachers describe themselves as being better at classroom management as well as student engagement (Di Fabio & Palazzeschi, 2008).

1.2. Emotion Regulation Measurement Paradigms in an assessment setting

The competence to manage one's own emotions or the emotions of others has so far been operationalized via self- and other ratings (Roberts, MacCann, Matthews & Zeidner, 2010). These measures often focus on emotion regulation in psychopathological settings, e.g. the ‘difficulties in emotion regulation scale’ (Gratz & Roemer, 2004). In an admission procedure, however, self-reports contain the risk of faking and socially desirable response behavior. Situational judgment tests, which can be seen as mental work samples, are a reasonable alternative. In the literature, some SJTs for emotional constructs already exist, for example the Situational Test of Emotional Understanding (STEU) and the Situational Test of Emotion Management (STEM) developed by MacCann and Roberts (2008). The STEM is widely used, is moderately correlated with intelligence and predicts outcomes like stress and anxiety. Sharma et al.'s SJT of Emotional Intelligence (Sharma et al., 2013) is related to emotional intelligence, but neither to personality nor to intelligence. In addition, there are a few SJTs constructed for teacher competency purposes (e.g. Klassen, Durksen, Rowett, & Patterson, 2014; Stemler & Sternberg, 2006); however, none of them is designed for emotional competencies.

1.3. Challenges in the development of SJTs

Situational judgment tests (SJTs) are a well-established, effective and now more widely used test format in personnel selection procedures (Christian, Edwards, & Bradley, 2010; Ryan & Ployhart, 2014). This could be due to their good predictive validities, their lower susceptibility to faking, their good acceptance among applicants and their similarity to job simulations (Lievens, Peeters, & Schollaert, 2008; McDaniel & Nguyen, 2001; Schuler & Marcus, 2006). In meta-analyses, predictive validities of SJTs vary between 0.21 and 0.41 for job performance and show incremental validities over and above cognitive ability and personality (McDaniel, Morgeson, Finnegan, Campion, & Braverman, 2001). Further evidence of good criterion validity with regard to academic performance has been found (Lievens & Sackett, 2012).

Despite these advantages, SJTs face issues in test development: “In fact, the communality between these tests that share a format has not been clearly defined” (Gessner & Klimoski, 2006, p. 29). McDaniel et al. (2001) describe SJTs as a measurement method that can be used to assess a variety of constructs. As for all psychological tests, it is important to have a clear definition of the measured construct. However, in the past, developers of SJTs might have paid less attention to the construct validity of their measures than the developers of other psychological instruments (Bergman, Drasgow, Donovan, Henning, & Juraska, 2006; Christian et al., 2010), as well as to the definition of the construct they measure and its structure. Researchers additionally argue that the constructs best measured with SJTs are heterogeneous in nature (Lievens & Coetsier, 2002).

This could be due to the specific situational context and the fact that the critical incident technique (CIT; Flanagan, 1954) – which is typically used to develop SJTs – generates items in an inductive approach, which could increase heterogeneity in the construct. Lack of clarity about the measured construct of this test format is still an issue in terms of the SJT literature, even though numerous attempts have been made in recent years to overcome it (De Meijer, Born, van Zielst, & van der Molen, 2010; Jackson, LoPilato, Hughes, Guenole, & Shalfrooshan, 2017; Westring et al., 2009). In the complex construction process, it is important to focus on this topic, because even small changes imply differences in the validities (Muck, 2013). For example, validities vary depending on whether situational knowledge or behavioral tendency is measured (Freudenthaler & Neubauer, 2007; McDaniel, Hartman, Whetzel, & Grubb, 2007). One approach to tackle the questionable construct homogeneity of SJTs is the careful development of construct-driven SJTs (Guenole, Chernyshenko, & Weekly, 2017). These require research-based decisions from the developers to increase transparency and knowledge on the measured construct.

In line with that, the construction process of an SJT entails many decisions. Test developers have to ask themselves the following questions, among others: What is the exact construct I want to measure? How do I choose the response alternatives? Which type of scoring do I want to apply? What kind of instruction do I want to use? Who has the appropriate expertise to rate the response alternatives? Do I use a specific situational context?

Bearing these questions in mind, it would be important to investigate the influence of different decisions and levels of standardization on the construct (Campion, Ployhart, & MacKenzie Jr, 2014). In the last ten years, research on the effect of different response formats or scorings has increased (Bergman et al., 2006; De Leng et al., 2017; Weng, Yang, Lievens, & McDaniel, 2018), but other topics, such as the identification of new ways to construct situations and items, remain neglected. Research indicates that 81.8% of the SJTs are developed using critical incidents (Campion et al., 2014). Weekley, Ployhart, and Holtz (2006) recommend that one could manipulate various features across cells (e.g. item stem development) and that the effects on SJT psychometric equivalence could be examined.

The main contribution of this study is to demonstrate new approaches to develop items and situations for construct-driven SJTs and thereby improve construct homogeneity. We address the lack of research within the development of an SJT to measure emotion regulation in pedagogical situations (ERIPS) as part of the admission procedure for teacher education. Within this process, we pay attention to a high standardization of the construct in item stem and response alternatives and use item response theory (IRT) for testing psychometric equivalence (in gender, performance, item difficulty and different item stem contexts) in order to improve construct clarification.

1.4. The present study

The current study addresses the development and psychometric examination of the emotion regulation test in pedagogical situations (ERIPS). This test was designed in the course of the project "Teacher Student Assessment Austria" (TESAT; Neubauer et al., 2017) to be included in the admission exam for teacher education. Emotion regulation competencies are regarded as an important prerequisite for burnout prevention and enable teachers to support pupils in their everyday school life (Brackett, Palomera, Mojsa-Kaja, Reyes, & Salovey, 2010). Further, it should be mentioned that the test was designed to test for emotional knowledge and skills as a basic requirement to enter into teacher education, independent of practical professional knowledge. In the process, we paid particular attention to the theoretical framework for the development of situations (as recommended by Campion et al., 2014) and the response alternatives. The given response alternatives were standardized based on an underlying model for dealing with emotions. In fact, all response alternatives were constructed according to four emotion-regulation strategies (Aldao, Nolen-Hoeksema, & Schweizer, 2010). This led to a mixed approach of inductive (use of critical incidents) and deductive (use of a theoretical framework standardized up to the response alternatives) item construction.

The second aim was to test psychometric fairness for gender. Due to the fact that 1) measurement equivalence for men and women in SJTs is unexplored, and 2) in most SJTs, women perform better than men (Whetzel, McDaniel, & Nguyen, 2008), it is important to verify that the test performance cannot be traced back to any measurement errors.

Finally, we followed the recommendation of Weekley et al. (2006) and manipulated the pedagogical situational context. Items with a classroom context or other pedagogical contexts were constructed to test for psychometric equivalence of the different item stems.

We created an initial test version consisting of 40 items. In study 1, a thorough test analysis was performed using IRT to test for psychometric equivalence, person and item homogeneity and first validation. The examined test version was used in the teacher admission test in 2015 with a longitudinal follow-up in the first semester (study 2). The second study was analyzed for purposes of structural inspection with confirmatory factor analysis (CFA) and validation in a larger sample. This resulted in the final test version consisting of 22 items.

2. Study 1: test development

2.1. Methods

2.1.1. Participants

A total of 188 high school students close to graduation (117 women (62.2%) and 71 men (37.8%)) participated in study 1. This sample was chosen because most of the applicants in the admission procedure apply for university during their last year in school. The youngest participants were 16 and two participants were older than 20 years (M = 17.42, SD = 0.96). At the time of participating in the study, 31.9% of the participants were considering studying teacher education.

2.1.2. Tests and measures

2.1.2.1. Situational judgment test for emotion regulation in pedagogical situations (ERIPS)

2.1.2.1.1. Construct. In their process model, Mayer and Salovey (1997) differentiate between four factors of emotional intelligence (EI): (1) perception of emotion; (2) emotional facilitation of thought; (3) understanding emotions; and (4) managing emotions. The ERIPS is an SJT that focuses on the model's fourth factor and on the definition of emotional competence by Petermann and Wiedebusch (2008), which distinguishes the regulation of one's own emotions as well as the ability to regulate the emotions of others (intra- vs. interpersonal emotion regulation). In addition, Gross (2014) refers to an intrinsic and extrinsic emotion regulation and established the process model of emotion regulation. Within that model, different emotion regulation strategies like cognitive reappraisal or suppression are linked to e.g. the stage of cognitive change or response modulation. In addition to cognitive reappraisal and suppression, Aldao et al. (2010) report in a meta-analysis on four other emotion regulation strategies (acceptance, avoidance, problem solving and rumination) that are linked to psychopathology. For the construction of the response alternatives of the ERIPS we decided not to use avoidance, because ERIPS focuses on direct actions in an emotional situation which could not be avoided without leaving the situation. The strategy of problem solving was not integrated either, since the situations are not necessarily clear problems that can be solved. Thus, in all items the strategies of rumination (as intensive mental occupation), suppression (as ignoring or stopping the emotional situation), cognitive reappraisal (as mental change of the interpretation) and acceptance (as an accepting treatment of the situation without interrupting or suppressing it) are represented in the response alternatives. Four examples are given in Tables 1 and 2. Within the construction process attention was paid to situations which included the emotions of Izard's (1994) model of emotions (fear, joy, anger, sadness, disgust, surprise, interest, contempt, shame, guilt), although in the further analyses it was not considered a requirement that all emotions were represented in the test. Altogether the ERIPS measures inter- and intrapersonal emotion regulation in pedagogical situations through the use of the coping strategies acceptance, suppression, rumination and cognitive reappraisal.

Table 1
Example of an item for interpersonal emotion regulation and an item for intrapersonal emotion regulation of the emotion anger.

You accuse a student of cheating during a school assignment. The class angrily takes the student's side.
• I tell the class to think about why their anger is not appropriate in this situation. (rumination)
• I say that there is nothing more to discuss on this subject and continue with the lesson. (suppression)
• I tell the class that it is important to stand up for others and explain the situation to them. (cognitive reappraisal)
• I tell the students that I notice that they are angry and ask them how we can deal with the situation. (acceptance)

A student repeatedly asks for your advice without trying to solve the task himself. Angered, you notice how he asks you for advice again.
• I wonder how it is possible that I could be annoyed by such a reaction and take time to think about it. (rumination)
• I take a deep breath and pay no further attention to the feeling. (suppression)
• I remember how important it is to have motivated and interested students. (cognitive reappraisal)
• I tell him that I feel that he lets me do his work and that it frustrates me. Then I show him one last time. (acceptance)

Table 2
Example of an item for interpersonal emotion regulation and an item for intrapersonal emotion regulation of the emotion fear.

At the end of the school year, you tell a student in a one-on-one conversation that he will not be transferred this year. Afterwards he jumps up and shouts, frightened: “Then I'll run away.”
• I ask the student how he thinks it could have come to this situation. (rumination)
• I tell the student that I unfortunately have to draw these consequences and he will manage. (suppression)
• I explain to the student that repeating the class gives him the chance to internalize the learning contents more deeply. (cognitive reappraisal)
• I tell the student that I understand his fear and we will be able to work together to find the most appropriate way to deal with the situation. (acceptance)

You took over a class from a teacher colleague at the beginning of the school year. As you enter the class, a student says: “Go away. We don't want you here.” The class agrees with him. The situation makes you feel very insecure.
• I think about why the students reject me as a teacher. (rumination)
• I try to hide the insecurity and start teaching. (suppression)
• I think that the students liked my colleague very much and I hope that this will be the same for me one day. (cognitive reappraisal)
• I tell the pupils that this is a new situation for all of us and ask them how we want to deal with this problem. (acceptance)

2.1.2.1.2. Test construction. The collected situations were – as is usual for SJT development – directly derived using CIT (Flanagan, 1954) via a structured expert interview. Ten teachers and eleven experts from other pedagogical contexts (scout groups, fire brigades, theater groups etc.) were interviewed by three psychological researchers. On the basis of the interviews item stems were constructed and reviewed by them in the following way: During the construction process, the information in the item stem was standardized, i.e. all situations described have a similar length. In order to minimize the probability that the response behavior depends on the extent of reading comprehension competencies, short sentences and simple language were used. The following guideline questions can be answered for each situation:

(1) Where is the teacher?
(2) What is happening in the situation?
(3) Who experiences the emotion and why?
(4) What does the class/group do?

Possible response alternatives were generated in the interviews and in an online survey, which was completed by 12 teachers and pedagogical experts (age: M = 50.8 years, SD = 4.27; professional experience: M = 26.8 years, SD = 7.56).

Subsequently, response alternatives were selected by three psychological researchers so that (1) every alternative corresponded to one of the four emotion regulation strategies (rumination, suppression, reappraisal, acceptance); (2) interpersonal emotion regulation items focused on providing guidance or support to others based on these strategies; (3) for intrapersonal emotion regulation items, the response alternatives were as independent as possible from other people; and (4) the reactions described in the response alternatives were as immediate to the situation as possible.

With this procedure, the construct was standardized down to the level of the response alternatives and, in addition to the previously used inductive construction, a theory-based deductive construction was applied due to the strict wording and content criteria described above.

2.1.2.1.3. Coding. SJTs can be coded in various ways. Bergman et al. (2006) refer to six different scoring methods, which again can be broken down into further subcategories. The final items of the ERIPS were coded by the expert ratings of five clinical psychologists experienced in the emotion processing of children and adolescents. Here, a dichotomous coding was used, which was determined on the basis of five raters (ICC = 0.79), who rated the quality of the emotion regulation strategy in every alternative on a four-point Likert scale. Only items in which the mean expert rating of the best response alternative was at least one point higher than the ratings of all other response alternatives were selected. This coding makes it possible to assess which response alternative is rated as the most appropriate in the respective situation by the experts. The most appropriate answer was coded as 1, while the other alternatives were coded as 0. Depending on the situation, the best alternative can correspond to the coping style of rumination, suppression, cognitive reappraisal or acceptance. Most coded response alternatives reflect adaptive strategies. However, the response strategies were not analyzed separately, because this would lead to ipsative data.

2.1.2.1.4. The pilot test version of the ERIPS. The pilot test version of the ERIPS included 40 items, which met the established criteria: 12 items to measure intrapersonal emotion regulation and 28 items to measure interpersonal emotion regulation. In the instruction, participants were asked to choose the most appropriate alternative. The full item list cannot be disclosed here, because it is part of the mentioned admission test. Examples for the emotions fear and anger are presented in Tables 1 and 2.

2.1.2.2. Situational Test of Emotional Understanding (STEU) and Situational Test of Emotion Management (STEM). For assessing the convergent validity, the German versions of two well-established SJTs for EI were used (STEM & STEU; MacCann & Roberts, 2008; German versions: Hilger, Hellwig, & Schulze, 2012). The tests show acceptable convergent and discriminant validities (Austin, 2010; Libbrecht & Lievens, 2012) as well as psychometric properties (Allen, Weissman, Hellwig, MacCann, & Roberts, 2014). The STEU was scored dichotomously and the STEM by using the original expert scoring weights.

2.1.2.3. Dark Triad Dirty Dozen. The dark triad was assessed with the German translation of the Dark Triad Dirty Dozen (DTDD; Jonason & Webster, 2010), which measures narcissism, Machiavellianism and psychopathy. Every factor consists of four items, each rated on a seven-point Likert scale. The DTDD shows good discrimination and difficulty parameters (Webster & Jonason, 2013) as well as adequate validities. The growing body of literature examining the relationship between the dark triad and EI (Jauk, Freudenthaler, & Neubauer, 2016) convinced us to include the dark triad for validation purposes as well. In our sample, the reliabilities were acceptable (α_narcissism = 0.79, α_Machiavellianism = 0.82, α_psychopathy = 0.71).

2.1.3. Procedure

Participants were tested groupwise in computer classes during school lessons. All tests were administered with the online survey software LimeSurvey (Version 2.05, www.limesurvey.org). The total test session took up to one and a half hours. For all Rasch model analyses, R (version 3.4.3) with the Extended Rasch Modeling package (eRm; Mair & Hatzinger, 2007) and the Rasch Sampler package (Verhelst, Hatzinger, & Mair, 2007) were used.

2.2. Results

2.2.1. Test analysis

Out of the initial pool of 40 items, we wanted to construct a shortened test with a high psychometric standard. In a first step, we removed five items owing to low item difficulty parameters (p_i < 0.19). In a next step, the fit of a 1PL Rasch model was assessed via item fit parameters, Andersen Likelihood Ratio tests (LRT; Andersen, 1973) and Martin-Löf tests (Martin-Löf, 1973). Due to the small sample size, nonparametric goodness-of-fit tests, t10 statistics, t2 statistics (Ponocny, 2001) and a Markov chain Monte Carlo algorithm (Verhelst et al., 2007) were calculated instead of the Andersen LRT and Martin-Löf test. In this process, 11 items were excluded. All calculations were carried out for inter- and intrapersonal emotion regulation and a general factor “emotion regulation.”

Nonparametric t10 statistics with the split criterion median yielded a model fit for all factors with 500 sampled matrices (interpersonal: p = .48; intrapersonal: p = .57; general factor: p = .75). The items also measured the same construct for the split criterion gender (t10 statistic: interpersonal: p = .08; intrapersonal: p = .91; general factor: p = .22). These results support the assumption of test fairness for men and women. t2 statistics were used to test for item homogeneity in items with high and low item difficulty (interpersonal: χ²(48) = 36.11, ns, exact p = .49; intrapersonal: χ²(24) = 25.37, ns, exact p = .33; general factor: χ²(143) = 78.20, ns, exact p = .58). In addition, item homogeneity was assessed for the different pedagogical item stems (interpersonal: χ²(48) = 28.56, ns, exact p = .94; intrapersonal: χ²(24) = 26.10, ns, exact p = .34; general factor: χ²(143) = 85.56, ns, exact p = .46) and different personal focuses (inter- & intrapersonal) regarding emotion regulation (χ²(139) = 79.39, ns, exact p = .63). Item homogeneity can be assumed for both criteria. This confirms the independence of the construct of emotion regulation from the situational context.

The examined test version (ERIPS-24) includes 24 items (10 for intrapersonal emotion regulation, 14 for interpersonal emotion regulation) with solution rates ranging from p_i = 0.19 to p_i = 0.76 (M = 0.43, SD = 0.16). Item fit parameters and nonparametric goodness-of-fit tests for person homogeneity and item homogeneity support the adoption of the 1PL Rasch model. After the described statistical analyses, the following emotions remained in the final test version: anger, shame, fear, sadness, joy, disgust, guilt. For the emotions surprise, interest and contempt, unfortunately no items were left.

2.2.2. Validity evidence

For further validity analyses, mean scores for interpersonal and intrapersonal emotion regulation as well as a general factor – the mean score of the z-standardized facets – of emotion regulation were computed. Table 3 provides descriptive statistics and correlations of all measures including the factors of the ERIPS, the Dirty Dozen, the STEU and the STEM. The factors of the ERIPS were normally distributed (interpersonal: skewness = 0.44, kurtosis = 0.80; intrapersonal: skewness = −0.01, kurtosis = −0.71; general factor: skewness = 0.11, kurtosis = 0.14), tested at the critical value of |1|. Higher emotion regulation skills were associated with higher emotion management measured with the STEM and lower dark personality. Surprisingly, interpersonal emotion regulation was not related to emotion understanding, in contrast to intrapersonal emotion regulation.
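The score computation described in Section 2.2.2 can be illustrated with a minimal Python sketch. It assumes that facet scores are the mean of the dichotomously coded items, that the general factor is the mean of the two z-standardized facet scores, and that skewness and kurtosis are the moment-based (population) estimators; the source does not state which estimators or which standard deviation were used. All function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def facet_score(item_scores):
    """Proportion of items on which a person chose the best-rated alternative (items coded 0/1)."""
    return mean(item_scores)

def z_standardize(values):
    """z-scores based on the population SD (an assumption; the paper does not say which SD was used)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def general_factor(inter_scores, intra_scores):
    """Mean of the two z-standardized facet scores, one value per person."""
    z_inter = z_standardize(inter_scores)
    z_intra = z_standardize(intra_scores)
    return [(a + b) / 2 for a, b in zip(z_inter, z_intra)]

def skewness(values):
    """Moment-based skewness (third standardized moment)."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

def excess_kurtosis(values):
    """Moment-based excess kurtosis (fourth standardized moment minus 3)."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 4 for v in values]) - 3.0

def within_criterion(values, critical=1.0):
    """True if |skewness| and |excess kurtosis| both stay below the critical value."""
    return abs(skewness(values)) < critical and abs(excess_kurtosis(values)) < critical

# Hypothetical 0/1 item scores for five people (rows) on a few items per facet.
inter_items = [[1, 0, 1, 0], [0, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 1]]
intra_items = [[1, 0], [0, 0], [1, 1], [1, 0], [1, 1]]

inter = [facet_score(person) for person in inter_items]
intra = [facet_score(person) for person in intra_items]
g = general_factor(inter, intra)
print([round(x, 2) for x in g])
print(within_criterion(g))
```

Because both facets are z-standardized before averaging, the general factor has a mean of zero by construction, which is consistent with the M = 0.00 reported for emotion regulation “g” in Table 3.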


Table 3
Descriptive statistics and Pearson correlations for study 1.

                                 M      SD     1         2         3         4         5         6         7
1 Emotion regulation (ER) “g”    0.00   0.79
2 Interpersonal ER               0.41   0.13    0.80⁎⁎
3 Intrapersonal ER               0.46   0.18    0.80⁎⁎    0.26⁎⁎
4 Emotion understanding         16.01   2.64    0.17⁎     0.10      0.17⁎
5 Emotion management             9.90   2.20    0.32⁎⁎    0.24⁎⁎    0.27⁎⁎    0.26⁎⁎
6 Narcissism                    17.29   4.94   −0.24⁎⁎   −0.21⁎⁎   −0.18⁎    −0.04     −0.23⁎⁎
7 Machiavellianism              11.68   5.26   −0.26⁎⁎   −0.20⁎⁎   −0.21⁎⁎    0.16⁎    −0.16⁎     0.36⁎⁎
8 Psychopathy                   10.22   4.90   −0.31⁎⁎   −0.27⁎⁎   −0.22⁎⁎   −0.01     −0.30⁎⁎    0.22⁎⁎    0.64⁎⁎

⁎ p < .05.
⁎⁎ p < .01.
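As a rough check on the significance flags in Table 3, the following Python sketch computes a two-sided p-value for a Pearson correlation from the statistic t = r·sqrt((n − 2)/(1 − r²)). Two assumptions are ours and not stated in the source: every correlation is based on the full study 1 sample (n = 188, no missing data), and the t distribution is approximated by the standard normal, which is reasonable at 186 degrees of freedom. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using t = r * sqrt((n - 2) / (1 - r^2)).

    The t statistic is referred to a standard normal distribution
    (an approximation that is adequate for large df, here about 186).
    """
    t = r * sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

def significance_flag(r, n):
    """Return '**' for p < .01, '*' for p < .05, and '' otherwise."""
    p = correlation_p_value(r, n)
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

N_STUDY_1 = 188  # assumption: complete data for all 188 participants

for r in (0.10, 0.17, -0.18, -0.20, 0.26):
    print(r, significance_flag(r, N_STUDY_1))
```

Under these assumptions, the flags for the five illustrative correlations match those printed in Table 3.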

3. Study 2: validity in assessment situations

In the second study, the test was implemented in the admission procedure for teacher education. Apart from the ERIPS, the admission procedure consists of seven psychological tests, which measure intelligence, language competencies, personality, creativity and openness to creativity. Further findings on the predictive validity of the admission procedure for academic achievement are addressed in Neubauer et al. (2017) and Weissenbacher et al. (2019). Koschmieder et al. (submitted) show further analyses on incremental validity regarding academic achievement and teaching performance beyond high school GPA. Given the higher statistical power of a larger sample, this study was used for further structural analysis of the test and longitudinal criterion validation.

3.1. Methods

3.1.1. Participants
The data set of the second study consisted of the admission test (Sample 1) and a follow-up survey in the first semester (Sample 2). A total of 3139 applicants took part in the admission procedure for teacher education. One participant did not complete the ERIPS-24. Results are hence reported for 3138 participants aged between 17 and 60 years (M = 22.72, SD = 5.49), with the majority of the sample, namely 71%, being female. The percentage of participants who did not pass the selection process was 13.3%. The procedure took place at 12 educational institutions all over Austria. Of these, 11 institutions participated in the follow-up survey. Of the follow-up data of 939 students, 750 records could be assigned to the anonymized data of the admission procedure via code. The linked follow-up sample included 575 female participants (77%) and had an average age of 22 years (M = 21.59, SD = 4.30).

3.1.2. Tests and measures
Emotion regulation, intelligence, linguistic competence and personality were included in the selection assessment (Sample 1), while teacher self-efficacy, pedagogical experience, the intention to quit studying and the desire to work with children and adolescents were measured in the follow-up survey (Sample 2). Additional tests were included in the admission test and the follow-up, but are not relevant to the development of the ERIPS and are therefore not considered further here. Intelligence and personality were used for validation because the existing literature on SJTs already shows that the way an SJT is developed (e.g., maximum vs. typical performance) influences relations with these constructs (Freudenthaler & Neubauer, 2007). Language competence, as part of the assessment battery, was also chosen because SJTs are language-based tests and a high correlation would be a reason to develop a video-based version of the ERIPS.

3.1.2.1. Emotion regulation. For study 2, the shortened test version from study 1 (ERIPS-24) was used. The test was administered and scored as described in study 1. After confirmatory factor analysis two items were excluded (ERIPS-22). The ERIPS-22 consists of 10 items for intrapersonal and 12 items for interpersonal emotion regulation.

3.1.2.2. Intelligence. Intelligence was assessed by the Intelligence Structure Battery (INSBAT; Arendasy et al., 2009), which was presented as a computerized adaptive test (CAT; van der Linden & Glas, 2000) with one subtest each for figural inductive thinking, arithmetic flexibility, visual short-term memory and verbal fluency. The target reliability was set to an equivalent Cronbach's alpha of 0.80. An IQ score reflecting general cognitive ability was calculated.

3.1.2.3. Language competence. Language proficiency was assessed with a test consisting of grammar (20 items), orthography (17 items) and reading comprehension (eight items). All items were administered in single-choice format. Correct answers were summed up for each of the three facets. A mean score was calculated from the three z-standardized facets to build a language competence factor. For the assessment, three parallel test versions with reliabilities of α = 0.76 (version 1), α = 0.78 (version 2) and α = 0.84 (version 3) were used.

3.1.2.4. Personality. Participants completed the Big Five Inventory (BFI; Lang, Lüdtke, & Asendorpf, 2001). The test measures the Big Five (extraversion, openness, agreeableness, conscientiousness and neuroticism) with 42 self-report items rated on a five-point Likert scale.

3.1.2.5. Teacher self-efficacy. Teacher self-efficacy was measured with the scale of Schmitz and Schwarzer (2000). The scale consists of ten self-report items, judged on a rating scale from “not true” to “true.” The reliability in this sample was α = 0.73. CFA showed an acceptable fit after one error correlation was allowed (χ²(34) = 87.81, χ²/df = 2.58, p < .05, CFI = 0.95, SRMR = 0.04, RMSEA = 0.05).

3.1.2.6. Desire to work with children and adolescents. For longitudinal validation and the prediction of dropout, the altruistic professional motive (Jungert, Alm, & Thornberg, 2014) of the desire to work with children and adolescents was assessed with a scale for job-specific motivation for teachers (Klemenz et al., 2014). The scale includes three items with a seven-point Likert scale and showed a good reliability of α = 0.90 in this sample.

3.1.2.7. Intention to quit. Individual differences in the intention to quit one's studies – as a predictor of later dropout – were assessed with five items (example: “I'm thinking about quitting my studies”). The items were measured on a seven-point Likert scale with a reliability of α = 0.73. CFA, calculated with an MLM algorithm, confirmed a very good fit to the data (χ²(5) = 8.51, p = .13, CFI = 0.99, SRMR = 0.02, RMSEA = 0.03).

3.1.3. Procedure
Applicants were tested groupwise in institutional computer classes.


Apart from the intelligence test, all tests were administered with the software Questionmark Perception (Questionmark; London, UK). The whole assessment took 3 h on average. It began with the cognitive tests and finished with the personality tests.
For the follow-up survey, the participants were tested after studying teacher education for half a year. They were tested in groups during their regular study lessons with paper and pencil tests. The follow-up had a maximum duration of 1 h. Samples 1 and 2 were linked via an anonymous code. The lavaan package (Rosseel, 2012) was used for structural equation modeling.

3.2. Results

In the first step, the dimensionality of the ERIPS was assessed with a CFA. Two models – a two-factor model and a one-factor model – were calculated using a weighted least squares mean and variance adjusted (WLSMV) algorithm with a delta parametrization. This algorithm was chosen due to its statistical equivalence to the 2PL Birnbaum model (Arendasy & Sommer, 2017). Items were randomly parceled into 12 indicators within inter- and intrapersonal emotion regulation. In the course of the CFA, two items were excluded.
The two-factor model (χ²(43) = 94.59, χ²/df = 2.20, p < .0001, CFI = 0.958, SRMR = 0.025, RMSEA = 0.020) revealed a considerably better model fit than the one-factor model (χ²(44) = 189.13, χ²/df = 4.30, p < .0001, CFI = 0.88, SRMR = 0.035, RMSEA = 0.032). Inter- and intrapersonal emotion regulation showed a latent correlation of r = 0.53. For this reason, the results for the two-factor model are reported below.
As a result of this selection, the final version of the ERIPS-22 consists of 22 items with 10 intrapersonal and 12 interpersonal vignettes. These were distributed among the emotions sadness (3 items), joy (3 items), shame (3 items), fear (5 items), anger (4 items), contempt (2 items), and surprise (2 items). The response alternatives coded by the expert ratings mainly reflected the adaptive strategies (17 items represent acceptance, 3 items cognitive reappraisal), while two items were coded with the strategy of rumination.
Table 4 presents the correlations of the study variables. For both factors of the ERIPS, higher scores in emotion regulation were related to higher language skills and to the personality factors, especially to higher openness as well as higher agreeableness. Small associations with intelligence were also observed. Teacher self-efficacy was related to interpersonal emotion regulation, while the altruistic professional motive to work with children was related to intrapersonal emotion regulation. In contrast to previous findings (Neubauer et al., 2017), no correlation with the intention to quit was found.
Observed results were in line with previous findings that teacher self-efficacy is related to academic performance (Honicke & Broadbent, 2016) and the decision to drop out of teacher education (Pfitzner-Eden, 2016). Jungert et al. (2014) found an indirect effect of altruistic motives on student dropout via academic engagement. Thus, we tested possible indirect effects of emotion regulation beyond teacher self-efficacy and motives, calculating a structural equation model (Fig. 1). Because the model fit of the teacher self-efficacy scale was not acceptable, one residual correlation was allowed for this factor. The model showed a good fit for the fit indices according to Hu and Bentler (1999) and an acceptable fit with a χ²/df lower than two (Byrne, 1989; χ²(368) = 539.86, χ²/df = 1.47, p < .0001, CFI = 0.96, SRMR = 0.04, RMSEA = 0.04). The results of the model support the assumption that interpersonal emotion regulation has an effect on teacher self-efficacy (β = 0.18, p < .05), which has a negative effect on the intention to quit (β = −0.22, p < .01). Equivalent to this, an effect of intrapersonal emotion regulation on the altruistic motive of working with children (β = 0.27, p < .01) was found, which is also a significant predictor of the intention to quit (β = 0.13, p < .05).

4. Discussion

The aim of this article was the development of an SJT to determine people's skills regarding their emotion regulation in the admission procedure for teacher education. Through highly structured item development and test analysis using the 1PL Rasch model and structural equation models, the study contributes a new approach to increase the construct homogeneity of SJTs. Study 1 included the test analysis and examined convergent validities. This study led to a first version of the ERIPS, which was implemented in the admission exam in study 2. Study 2 analyzed the psychometric structure and relations with personality traits and cognitive abilities. A longitudinal follow-up assessed the link between emotion regulation, teacher self-efficacy, altruistic professional motives and intention to quit. After study 2, the final test version consists of 22 items with two factors: interpersonal emotion regulation (12 items) and intrapersonal emotion regulation (10 items). During this development process, various topics that are discussed in the SJT literature were addressed: (1) a mixed approach (inductive and deductive) was used in the item generation to build a theoretical framework; (2) this theoretical basis was extended down to the response alternatives; (3) IRT was used to test for psychometric fairness in gender and construct homogeneity in item stems with different pedagogical contexts.

4.1. Conclusions from the validity evidence of the ERIPS

Validity analyses showed convergent correlations with situational

Table 4
Descriptive statistics and Pearson correlations for Study 2.

Variable M SD 1 2 3 4 5 6 7 8 9 10 11
1 Interpersonal emotion regulation 0.56 0.18
2 Intrapersonal emotion regulation 0.47 0.18 0.20⁎⁎
3 Intelligence (IQ) 105.87 7.60 0.07⁎⁎ 0.04⁎
4 Language 0.00 0.74 0.17⁎⁎ 0.13⁎⁎ 0.30⁎⁎
5 Neuroticism 2.09 0.54 −0.08⁎⁎ 0.04⁎ −0.02 0.06⁎⁎
6 Extraversion 4.17 0.49 0.08⁎⁎ 0.04⁎ −0.08⁎⁎ −0.05⁎⁎ −0.47⁎⁎
7 Openness 4.10 0.49 0.18⁎⁎ 0.05⁎⁎ 0.05⁎ 0.11⁎⁎ −0.27⁎⁎ 0.40⁎⁎
8 Agreeableness 4.23 0.44 0.14⁎⁎ 0.06⁎⁎ −0.02 −0.05⁎⁎ −0.38⁎⁎ 0.37⁎⁎ 0.37⁎⁎
9 Conscientiousness 4.26 0.48 0.08⁎⁎ −0.01 −0.08⁎⁎ −0.02 −0.39⁎⁎ 0.40⁎⁎ 0.30⁎⁎ 0.48⁎⁎
10 Teacher self-efficacy 3.43 0.32 0.12⁎⁎ 0.05 −0.05 −0.03 −0.22⁎⁎ 0.27⁎⁎ 0.27⁎⁎ 0.21⁎⁎ 0.17⁎⁎
11 Altruistic professional motive of the desire to work with children 6.15 1.06 0.02 0.10⁎⁎ −0.12⁎⁎ −0.08⁎ −0.02 0.12⁎⁎ 0.09⁎ 0.17⁎⁎ 0.09⁎ 0.41⁎⁎
12 Intention to quit 2.09 1.05 0.02 0.03 −0.04 −0.12⁎⁎ 0.07⁎ −0.06 −0.15⁎⁎ −0.11⁎⁎ −0.14⁎⁎ −0.25⁎⁎ −0.23⁎⁎

Note. Variables 1–9 were collected in Sample 1, variables 10–12 in Sample 2 (see section Participants - Study 2).
⁎ p < .05.
⁎⁎ p < .01.
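
The language composite in Table 4 has a mean of 0.00 and a standard deviation below 1 (0.74), which is what one would expect when three z-standardized facets are averaged. The following Python sketch illustrates that scoring step only. The facet scores are invented for illustration, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the article does not state which variant was used.

```python
import statistics


def z_standardize(values):
    """Return z-scores using the sample standard deviation (an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def composite_score(*facets):
    """Average the z-standardized facets person by person."""
    z_facets = [z_standardize(facet) for facet in facets]
    return [statistics.fmean(person) for person in zip(*z_facets)]


# Invented sum scores for six hypothetical applicants (not data from the study).
grammar = [12, 15, 18, 9, 20, 14]      # out of 20 items
orthography = [10, 13, 16, 8, 17, 11]  # out of 17 items
reading = [5, 6, 8, 3, 7, 6]           # out of 8 items

language = composite_score(grammar, orthography, reading)

print(round(statistics.fmean(language), 2))  # mean of the composite
print(round(statistics.stdev(language), 2))  # SD of the composite
```

Because each facet has a mean of zero after standardization, the composite mean is zero by construction, and its standard deviation stays at or below 1 unless the facets are perfectly correlated.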


Fig. 1. Structural equation model testing indirect effects of emotion regulation (Inter ER and Intra ER) on the intention to quit via the altruistic motive of the desire to work with children and adolescents (Motive) and teacher self-efficacy (SWK). Model fit: χ²(368) = 539.86, χ²/df = 1.47, p < .0001, CFI = 0.96, SRMR = 0.04, RMSEA = 0.04. Dotted arrows were tested but turned out not to be significant.
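
The χ²/df values reported for the CFA and SEM models can be recomputed from the reported χ² statistics and degrees of freedom. The Python sketch below does this. As an illustration only, it also forms the product-of-coefficients estimate for the path from interpersonal emotion regulation via teacher self-efficacy to the intention to quit. That indirect-effect value is not reported in the article; it rests on the assumption that the two standardized path coefficients can simply be multiplied. The helper name `normed_chi_square` is hypothetical.

```python
def normed_chi_square(chi2, df):
    """Return the chi-square statistic divided by its degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return chi2 / df


# (chi-square, df) pairs as reported in the text.
reported_models = {
    "ERIPS two-factor CFA": (94.59, 43),
    "ERIPS one-factor CFA": (189.13, 44),
    "Teacher self-efficacy CFA": (87.81, 34),
    "Structural equation model": (539.86, 368),
}

ratios = {
    name: round(normed_chi_square(chi2, df), 2)
    for name, (chi2, df) in reported_models.items()
}

for name, ratio in ratios.items():
    print(f"{name}: chi2/df = {ratio:.2f}")

# Illustration only (not a result from the article): product of the two
# standardized paths Inter ER -> self-efficacy and self-efficacy -> quit.
beta_inter_to_efficacy = 0.18
beta_efficacy_to_quit = -0.22
indirect_effect = beta_inter_to_efficacy * beta_efficacy_to_quit
print(f"Illustrative indirect effect: {indirect_effect:.4f}")
```

Only the structural equation model falls below the χ²/df threshold of two that the text attributes to Byrne (1989); the recomputed ratios match the values given in the text.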

management as expected from other studies (Libbrecht & Lievens, 2012; Sharma et al., 2013). The ERIPS is designed to measure emotion regulation with various coping styles. The expert rating shows that among the response alternatives, acceptance was by far the most frequently rated as the most appropriate coping style (17 items). In line with previous research, adaptive coping strategies such as cognitive reappraisal or acceptance are particularly useful for regulating emotions and promote positive outcomes (Aldao et al., 2010). It might therefore be useful to add more items to the SJT in which cognitive reappraisal is the best alternative, to achieve a more balanced measurement of adaptive strategies.
Even though the ERIPS is only designed to measure emotion regulation, we expected some small relations with emotion understanding, which, however, were not found. This might be due to the emotion information being already included in the item stem. Recognizing the emotion in the specific situation is therefore no longer necessary in this test. In situations that affect one's self, on the other hand, one seems to automatically know which emotion is triggered. Here, the emotional aspects that are given do not seem to be as relevant, or probably a comparison with the presented emotion occurs. In a future study, it would be interesting to compare the indicated emotions with the emotions felt by a person for the items of intrapersonal emotion regulation.
In addition to the convergent validity, it would also be interesting to investigate the incremental validity of the ERIPS. Unfortunately, we were not able to investigate the incremental validity of the ERIPS over the STEM and the STEU, as it was not possible to integrate them into the assessment in Study 2.
Furthermore, negative relations with the dark triad were found. These results support the assumption that psychopathy and Machiavellianism are associated with emotional deficits (Megías, Gómez-Leal, Gutiérrez-Cobo, Cabello, & Fernández-Berrocal, 2018). Findings for the relation between narcissism and emotion regulation are heterogeneous. Our results are in contrast to previous findings, which reported positive relations of narcissism and empathy (Jonason & Kroll, 2015), while they are in line with the findings of Jauk et al. (2016), who also observed a negative relation between the two variables. This could be due to the construct definition of emotional intelligence. Empathy – as measured in the study by Jonason and Kroll (2015) – is important to be able to understand others, a skill of great importance to narcissists. However, in using emotion regulation strategies, students high on narcissism seem to use maladaptive strategies more often than students low on narcissism (Zalpour, Shahidi, Zarrani, Mazaheri, & Heidari, 2015). In order to examine these assumptions more thoroughly, it would be interesting to conduct further analyses with regard to the different emotion regulation styles and facets of narcissism.
Moreover, in study 2, emotion regulation was linked to openness and agreeableness rather than to any other personality aspects. In addition, correlations with intelligence and linguistic competence were observed. However, as the correlations were quite small, discriminant validity can be assumed. The small correlations may be attributed to response biases in the assessment sample caused by socially desirable responding in the assessment procedure. The structure of discriminant validities suggests, compared with instructions analyzed by McDaniel et al. (2007), that the instruction to choose the “most appropriate” answer is a mixture of behavioral tendency instruction and knowledge instruction. This could be advantageous in assessment situations when testing not only maximum performance but also typical behavior as well as its relation to academic performance and dropout.
Understanding why prospective teachers quit their studies is a question of high practical relevance for student selection. Further longitudinal research may test different moderators and mediators of the assessed traits and study dropout. Our results show predictive validities on altruistic professional motives and teacher self-efficacy, which is a stable predictor of academic performance. Both show predictive validities regarding the intention to quit studying. Interpersonal emotion regulation affects teachers' self-efficacy. Students who are able to deal well with the emotions of others also have a higher ability to act adequately in pedagogical situations, which is associated with higher self-efficacy. Intrapersonal emotion regulation, on the other hand, has an impact on altruistic professional motives. This result leads to the assumption that professional motives as internal representations of “learned outcomes that have become desirable for a given individual” (De Cooman et al., 2007) are influenced by one's own emotional processes. In particular, social motives are stronger in groups of professionally active teachers than in groups of teachers who no longer work in this profession (De Cooman et al., 2007).
First results on the predictive validity of the ERIPS in the admission procedure for teacher training have already been published with relevant outcomes (Neubauer et al., 2017; Weissenbacher et al., 2019). In a further four-year research project, we will test the replicability of our findings in a larger sample.

4.2. Conclusions regarding the development of SJTs

Our findings support the item homogeneity of items with different pedagogical contexts and the psychometric measurement equivalence for women and men. Especially in text-based tests with complex item


construction, it is useful to check homogeneity with regard to the fairness of possible subgroups as well as homogeneity on the item level. This can provide information for further development and item selection. For example, the analysis could indicate that an item favors men because it presents a situational context that is typical for men but not for women.
Moreover, further investigation of the ERIPS' susceptibility to faking would be of high relevance. With reference to studies by Krammer, Sommer, and Arendasy (2017), this could be tested by relating the real-life assessment situation to a follow-up survey with an honest, reproducible and maximum performance condition.
However, further investigation of the structure of situational judgment tests is only possible if the items have been designed and standardized according to clear guidelines based on a theoretical framework and guiding questions. A standardization down to the level of response alternatives is very rarely realized in the development of SJTs and is a novelty of our study. Contrary to the usual inductive procedure for the design of SJTs, we started with a theoretical framework before beginning the development of the SJT (Campion et al., 2014). This helps to understand why SJTs show good validities and what constructs exactly we measure.
With regard to the ERIPS, it could be useful in the future to integrate a theoretical construct in the scoring key instead of an expert rating. In our study we decided to use clinical psychologists and not teachers as subject matter experts to avoid confounding with pedagogical knowledge. However, one could argue that teachers' emotional knowledge skills differ from those rated by our experts. In the future we plan to further analyze the preferences for different regulation strategies with regard to their validities. Rumination and suppression as maladaptive emotion regulation strategies are positively related to mental health disorders (Aldao et al., 2010). Given the high burnout rates in the teaching profession, this issue is of high relevance for teacher education.

4.3. Limitations and implications

Despite several years of development of the ERIPS, the study has some limitations from which we derive implications for further studies. The ERIPS is used to select qualified students for teacher education in order to later get good teachers for the schools. So far, we could not investigate validities regarding later job performance as a teacher. In a four-year longitudinal research project, “Dropout rates, educational and job success of preservice teachers”, the validity of the ERIPS regarding professional success and burnout of future teachers will be investigated. Furthermore, it would be useful to examine the convergent validity of the ERIPS with another test assessing different coping strategies. As we are not aware of any published SJTs for coping strategies, for the moment one could only use self-report questionnaires for validation. In future research the ERIPS could be analyzed regarding the different coping strategies. However, such a coding would lead to ipsative data, which would need to be taken into account when analyzing the data. For this, IRT approaches (Brown & Maydeu-Olivares, 2013) would have to be used. Furthermore, in the ERIPS we currently have a predominance of adaptive coping strategies as rated by experts. This led to an unbalanced scoring of the items, and more items could be developed in which other strategies might get higher ratings. Also, coping strategies other than the four assumed here could be considered in the future.

4.4. Conclusion

In conclusion, we argue that a strong integration of a theoretical framework in the items and response alternatives in the development of SJTs would make a significant contribution to research on situational judgment tests. Additionally, and in line with the recommendations of Guenole et al. (2017), IRT analysis and testing item homogeneity can provide information about psychometric properties of the items. As an example, we introduce the ERIPS as an assessment for emotion regulation in pedagogical situations. The test shows a two-factor structure with good psychometric characteristics, including convergent as well as discriminant validities. Future research should aim to examine the effects of preferences for adaptive or maladaptive emotion regulation styles. These could be linked to different criteria that influence academic performance and dropout – for example, learning styles, goals, study satisfaction, health and recovery behavior. This could be useful for the admission exam to reduce later burnout prevalence in teachers.

CRediT authorship contribution statement

All persons who meet authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript. Furthermore, each author certifies that this material or similar material has not been and will not be submitted to or published in any other publication before its appearance in the Journal of Personality and Individual Differences.

Acknowledgements

This research was supported by the HRSM Fund from the Austrian Federal Ministry of Science, Research and Economy. We are grateful for the help of Hanna Vollmann as well as Martin Trosien, Katharina Sieber and Nora Nordtvedt, who assisted in this project.

References

Aldao, A., Nolen-Hoeksema, S., & Schweizer, S. (2010). Emotion-regulation strategies across psychopathology: A meta-analytic review. Clinical Psychology Review, 30, 217–237.
Allen, V. D., Weissman, A., Hellwig, S., MacCann, C., & Roberts, R. D. (2014). Development of the Situational Test of Emotional Understanding–Brief (STEU-B) using item response theory. Personality and Individual Differences, 65, 3–7.
Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
Arendasy, M., Hornke, L. F., Sommer, M., Häusler, J., Wagner-Menghin, M., Gittler, G., & Körtner, T. (2009). Manual Intelligenz-Struktur-Batterie (INSBAT, version 26.00) [Manual Intelligence-Structure-Battery (INSBAT, version 26.00)]. Mödling: Schuhfried GmbH.
Arendasy, M., & Sommer, M. (2017). Reducing the effect size of the retest effect: Examining different approaches. Intelligence, 62, 89–98.
Austin, E. J. (2010). Measurement of ability emotional intelligence: Results for two new tests. British Journal of Psychology, 101(3), 563–578.
Bergman, M. E., Drasgow, F., Donovan, M. A., Henning, J. B., & Juraska, S. E. (2006). Scoring situational judgment tests: Once you get the data, your troubles begin. International Journal of Selection and Assessment, 14, 223–235.
Brackett, M. A., Palomera, R., Mojsa-Kaja, J., Reyes, M. R., & Salovey, P. (2010). Emotion regulation ability, burnout, and job satisfaction among British secondary school teachers. Psychology in the Schools, 47, 406–417.
Brown, A., & Maydeu-Olivares, A. (2013). How IRT can solve problems of ipsative data in forced-choice questionnaires. Psychological Methods, 18, 36–52.
Byrne, B. M. (1989). A primer of LISREL. New York: Springer.
Campion, M. C., Ployhart, R. E., & MacKenzie, W. I., Jr. (2014). The state of research on situational judgment tests: A content analysis and directions for future research. Human Performance, 27, 283–310.
Christian, M. S., Edwards, B. D., & Bradley, J. C. (2010). Situational judgment tests: Constructs assessed and a meta-analysis of their criterion-related validities. Personnel Psychology, 63, 83–117.
Côté, S., & Morgan, L. M. (2002). A longitudinal analysis of the association between emotion regulation, job satisfaction, and intentions to quit. Journal of Organizational Behavior, 23, 947–962.
De Cooman, R., De Gieter, S., Pepermans, R., Bois, C. D., Caers, R., & Jegers, M. (2007). Graduate teacher motivation for choosing a job in education. International Journal for Educational and Vocational Guidance, 7, 123–136.
De Leng, W. E., Stegers-Jager, K. M., Husbands, A., Dowell, J. S., Born, M. P., & Themmen, A. P. N. (2017). Scoring method of a situational judgment test: Influence on internal consistency reliability, adverse impact and correlation with personality? Advances in Health Sciences Education, 22, 243–265.
De Meijer, L. A., Born, M., van Zielst, J., & van der Molen, H. T. (2010). Construct-driven development of a video-based situational judgment test for integrity. European Psychologist, 15, 229–236.


Di Fabio, A., & Palazzeschi, L. (2008). Emotional intelligence and self-efficacy in a sample of Italian high school teachers. Social Behavior and Personality: An International Journal, 36, 315–326.
Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 41, 237–358.
Freudenthaler, H. H., & Neubauer, A. C. (2007). Measuring emotional management abilities: Further evidence of the importance to distinguish between typical and maximum performance. Personality and Individual Differences, 42, 1561–1572.
Gessner, T. L., & Klimoski, R. J. (2006). Making sense of situations. In J. A. Weekley, & R. E. Ployhart (Eds.). Situational judgment tests: Theory, measurement, and application (pp. 13–38). Mahwah, NJ: Erlbaum.
Gratz, K. L., & Roemer, L. (2004). Multidimensional assessment of emotion regulation and dysregulation: Development, factor structure, and initial validation of the difficulties in emotion regulation scale. Journal of Psychopathology and Behavioral Assessment, 26, 41–54.
Gross, J. J. (Ed.). (2014). Handbook of emotion regulation. New York: Guilford Press.
Guenole, N., Chernyshenko, O. S., & Weekly, J. (2017). On designing construct driven situational judgment tests: Some preliminary recommendations. International Journal of Testing, 17, 234–252.
Guglielmi, R. S., & Tatrow, K. (1998). Occupational stress, burnout, and health in teachers: A methodological and theoretical analysis. Review of Educational Research, 68, 61–99.
Hilger, L., Hellwig, S., & Schulze, R. (2012). Deutschsprachige Adaptation des STEU sowie des STEM und erste Validitätsevidenz [German adaptation of the STEU and the STEM and first evidence of validity]. Presentation at the 48th Congress of the German Society of Psychology, Bielefeld.
Hofmann, S., Carpenter, G., & Curtiss, J. (2016). Interpersonal Emotion Regulation Questionnaire (IERQ): Scale development and psychometric characteristics. Cognitive Therapy and Research, 40, 341–356.
Hofmann, S. G. (2016). Emotion in therapy: From science to practice. New York, NY: Guilford Press.
Honicke, T., & Broadbent, J. (2016). The influence of academic self-efficacy on academic performance: A systematic review. Educational Research Review, 17, 63–84.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Izard, C. E. (1994). Die Emotionen des Menschen. Eine Einführung in die Grundlagen der Emotionspsychologie [The emotions of humans. An introduction to the principles of emotional psychology]. Weinheim: Beltz, Psychologie-Verlag-Union.
Jackson, D. J., LoPilato, A. C., Hughes, D., Guenole, N., & Shalfrooshan, A. (2017). The internal structure of situational judgment tests reflects candidate main effects: Not dimensions or situations. Journal of Occupational and Organizational Psychology, 90, 1–27.
Jauk, E., Freudenthaler, H. H., & Neubauer, A. C. (2016). The dark triad and trait versus ability emotional intelligence. Journal of Individual Differences, 37, 112–118.
Jonason, P. K., & Kroll, C. H. (2015). A multidimensional view of the relationship between empathy and the dark triad. Journal of Individual Differences, 36, 150–156.
Jonason, P. K., & Webster, G. D. (2010). The dirty dozen: A concise measure of the dark triad. Psychological Assessment, 22, 420–432.
Joseph, D. L., & Newman, D. A. (2010). Emotional intelligence: An integrative meta-analysis and cascading model. Journal of Applied Psychology, 95, 54–78.
Jungert, T., Alm, F., & Thornberg, R. (2014). Motives for becoming a teacher and their relations to academic engagement and dropout among student teachers. Journal of Education for Teaching, 40, 173–185.
Kim, L. E., Jörg, V., & Klassen, R. M. (2019). A meta-analysis of the effects of teacher personality on teacher effectiveness and burnout. Educational Psychology Review, 31, 163–195.
Lievens, F., & Coetsier, P. (2002). Situational tests in student selection: An examination of predictive validity, adverse impact, and construct validity. International Journal of Selection and Assessment, 10, 245–257.
Lievens, F., Peeters, H., & Schollaert, E. (2008). Situational judgment tests: A review of recent research. Personnel Review, 37, 426–441.
Lievens, F., & Sackett, P. R. (2012). The validity of interpersonal skills assessment via situational judgment tests for predicting academic success and job performance. Journal of Applied Psychology, 97, 460–468.
MacCann, C., & Roberts, R. D. (2008). New paradigms for assessing emotional intelligence: Theory and data. Emotion, 8, 540–551.
Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20, 1–20.
Martin-Löf, P. (1973). Statistiska modeller. Anteckningar fran seminarier lasaret 1969–1970 utarbetade av rolf sundberg. obetydligt ändrat nytryck, october 1973 (photocopied manuscript). Institutet för säkringsmatematik och matematisk statistik vid Stockholms universitet.
Mayer, J. D., & Salovey, P. (1997). What is emotional intelligence? In P. Salovey, & D. J. Sluyter (Eds.). Emotional development and emotional intelligence (pp. 3–31). New York: Basic Books.
Mayr, J. (2012). Ein Lehramtsstudium beginnen? Ein Lehramtsstudium beginnen lassen? Laufbahnberatung und Bewerberauswahl konstruktiv gestalten [Start a study for teacher education? Design career counselling and applicant selection in a constructive way]. In B. Weyand, M. Justus, & M. Schratz (Eds.). Auf unsere Lehrerinnen und Lehrer kommt es an. Geeignete Lehrer/-innen gewinnen, (aus-)bilden und fördern (S. 38–57). Essen: Stifterverband für die Deutsche Wissenschaft.
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. (2007). Situational judgment tests, response instructions, and validity: A meta-analysis. Personnel Psychology, 60, 63–91.
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. P. (2001). Predicting job performance using situational judgment tests: A clarification of the literature. Journal of Applied Psychology, 86, 730–741.
McDaniel, M. A., & Nguyen, N. T. (2001). Situational judgment tests: A review of practice and constructs assessed. International Journal of Selection and Assessment, 9, 103–113.
Megías, A., Gómez-Leal, R., Gutiérrez-Cobo, M. J., Cabello, R., & Fernández-Berrocal, P. (2018). The relationship between trait psychopathy and emotional intelligence: A meta-analytic review. Neuroscience & Biobehavioral Reviews, 84, 198–203.
Muck, P. M. (2013). Entwicklung von Situational Judgment Tests. Konzeptionelle Überlegungen und empirische Befunde [Development of situational judgment tests. Conceptual considerations and empirical findings]. Zeitschrift für Arbeits- und Organisationspsychologie, 57, 185–205.
Neubauer, A. C., & Freudenthaler, H. H. (2005). Models of emotional intelligence. In R. Schulze, & R. D. Roberts (Eds.). Emotional intelligence: An international handbook (pp. 31–50). Ashland, OH: Hogrefe & Huber Publishers.
Neubauer, A. C., Koschmieder, C., Krammer, G., Mayr, J., Müller, F., Pflanzl, B., Pretsch, J., & Schaupp, H. (2017). TESAT – Ein neues Verfahren zur Eignungsfeststellung und Bewerberauswahl für das Lehramtsstudium: Kontext, Konzept und erste Befunde [TESAT - A new procedure for the assessment and selection of applicants for teacher education: Context, concept and initial findings]. Zeitschrift für Bildungsforschung, 7(1), 5–21.
Parker, J. D., Summerfeldt, L. J., Hogan, M. J., & Majeski, S. A. (2004). Emotional intelligence and academic success: Examining the transition from high school to university. Personality and Individual Differences, 36(1), 163–172.
Petermann, F., & Wiedebusch, S. (2008). Emotionale Kompetenz bei Kindern [Emotional competence of children]. (2., überarbeitete Aufl.) Göttingen: Hogrefe.
Pfitzner-Eden, F. (2016). I feel less confident so I quit? Do true changes in teacher self-efficacy predict changes in preservice teachers' intention to quit their teaching degree? Teaching and Teacher Education, 55, 240–254.
Klassen, R. M., Durksen, T. L., Rowett, E., & Patterson, F. (2014). Applicant reactions to a Ponocny, I. (2001). Nonparametric goodness-of-fit tests for the Rasch model.
situational judgment test used for selection into initial teacher training. International Psychometrika, 66, 437–460.
Journal of Educational Psychology, 3, 104–125. Roberts, R. D., MacCann, C., Matthews, G., & Zeidner, M. (2010). Emotional intelligence:
Klemenz, S., Tachtsoglou, S., Lünnemann, M., Darge, K., König, J., & Rothland, M. (2014). Toward a consensus of models and measures. Social and Personality Psychology
EMW – Entwicklung von berufsspezifischer Motivation und pädagogischem Wissen in der Compass, 4(10), 821–840.
Lehrerausbildung. Codebook zum Fragebogen Messzeitpunkt 2, Teil 1 und 3, DE/AT/CH. Rosseel, Y. (2012). Lavaan: An R package for structural equation modeling. Journal of
Fragen zur Person, zur berufsspezifischen Motivation und zu Lerngelegenheiten [EMW - Statistical Software, 48, 1–36.
Development of professional motivation and pedagogical knowledge in teacher education. Ryan, A. M., & Ployhart, R. E. (2014). A century of selection. Annual Review of Psychology,
Codebook for questionnaire Measurement measurmentpoint 2, part 1 and 3, DE/AT/CH. 65, Article 693717.
Questions about the person, about job-specific motivation and about learning opportu­ Salgado, J. F., Viswesvaran, C., & Ones, D. S. (2002). Predictors used for personnel se­
nities]. Köln: Universität zu Köln. Verfügbar unter http://kups.ub.uni-koeln.de/ lection: An overview of constructs, methods and techniques. In N. Anderson, D. S.
5788/. Ones, H. K. Sinangil, & C. Viswesvaran (Eds.). Handbook of industrial, work and or­
Koschmieder, C., Weissenbacher, B., Riegler, D., Krammer, G., Gruber, C., & Neubauer, A. ganizational psychology (pp. 165–199). London: Sage.
(2020). Was leisten psychologische Tests über Abschlussnoten hinaus? Befunde zur Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in
prädiktiven Validität, Selbst- und Fremdselektion in Auswahlverfahren zur personnel psychology: Practical and theoretical implications of 85 years of research
Studienzulassung [What can psychological tests contribute beyond GPA? Findings on findings. Psychological Bulletin, 124, 262–274.
predictive validity, self- and external selection in admission procedures for higher educa­ Schmitz, G. S., & Schwarzer, R. (2000). Selbstwirksamkeitserwartung von Lehrern:
tional studies]. Diagnostica (submitted). Längsschnittbefunde mit einem neuen instrument [Self-efficacy expectations of tea­
Krammer, G., Sommer, M., & Arendasy, M. E. (2017). The psychometric costs of appli­ chers: Longitudinal findings with a new instrument]. Zeitschrift für Pädagogische
cants' faking: Examining measurement invariance and retest correlations across re­ Psychologie/German Journal of Educational Psychology, 14, 12–25.
sponse conditions. Journal of Personality Assessment, 99, 510–523. Schuler, H., & Marcus, B. (2006). Das interview. In H. Schuler (Hrsg.) (Ed.). Lehrbuch der
Lang, F. R., Lüdtke, O., & Asendorpf, J. B. (2001). Testgüte und psychometrische Äquivalenz Personalpsychologie [Handbook of personnel psychology]. Göttingen: Hogrefe (S.
der deutschen Version des Big Five Inventory (BFI) bei jungen, mittelalten und alten 209–221).
Erwachsenen [Test quality and psychometric equivalence of the German version of the Big Sharma, S., Gangopadhyay, M., Austin, E., & Mandal, M. K. (2013). Development and
Five Inventory (BFI) in young, middle-aged and aged adults]. Vol. 47, validation of a situational judgment test of emotional intelligence. International
Diagnostica111–121. Journal of Selection and Assessment, 21, 57–73.
Libbrecht, N., & Lievens, F. (2012). Validity evidence for the situational judgment test Skaalvik, E. M., & Skaalvik, S. (2010). Teacher self-efficacy and teacher burnout: A study
paradigm in emotional intelligence measurement. International Journal of Psychology, of relations. Teaching and Teacher Education, 26, 1059–1069.
47, 438–447. Stemler, S. F., & Sternberg, R. J. (2006). Using situational judgement tests to measure


practical intelligence. In J. A. Weekley, & R. E. Ployhart (Eds.). Situational judgment tests: Theory, measurement, and application (pp. 279–300). Mahwah, NJ: Erlbaum.
van der Linden, W. J., & Glas, C. A. W. (2000). Computer adaptive testing: Theory and practice. Norwell, MA: Kluwer.
Van Rooy, D. L., & Viswesvaran, C. (2004). Emotional intelligence: A meta-analytic investigation of predictive validity and nomological net. Journal of Vocational Behavior, 65, 71–95.
Vandenberghe, R., & Huberman, A. M. (1999). Understanding and preventing teacher burnout: A sourcebook of international research and practice. Cambridge: Cambridge University Press.
Verhelst, N. D., Hatzinger, R., & Mair, P. (2007). The Rasch sampler. Journal of Statistical Software, 20, 1–14.
Webster, G. D., & Jonason, P. K. (2013). Putting the “IRT” in “Dirty”: Item response theory analyses of the Dark Triad Dirty Dozen—An efficient measure of narcissism, psychopathy, and Machiavellianism. Personality and Individual Differences, 54, 302–306.
Weekley, J. A., Ployhart, R. E., & Holtz, B. C. (2006). On the development of situational judgment tests: Issues in item development, scaling, and scoring. In J. A. Weekley, & R. E. Ployhart (Eds.). Situational judgment tests: Theory, measurement, and application (pp. 157–182). Mahwah, NJ: Erlbaum.
Weissenbacher, B., Koschmieder, C., Hecht, P., Knitel, D., König, B., Krammer, G., Müller, F., Schaupp, H., & Neubauer, A. C. (2019). Der Studien- und Berufserfolg von (angehenden) Lehrpersonen in Österreich im Längsschnitt [The academic and professional achievements of (prospective) teachers in Austria in longitudinal section]. Beiträge zur Lehrerinnen- und Lehrerbildung, 1, 42–56.
Weng, Q. D., Yang, H., Lievens, F., & McDaniel, M. A. (2018). Optimizing the validity of situational judgment tests: The importance of scoring methods. Journal of Vocational Behavior, 104, 199–209.
Westring, A. J. F., Oswald, F. L., Schmitt, N., Drzakowski, S., Imus, A., Kim, B., & Shivpuri, S. (2009). Estimating trait and situational variance in a situational judgment test. Human Performance, 22, 44–63.
Whetzel, D. L., McDaniel, M. A., & Nguyen, N. T. (2008). Subgroup differences in situational judgment test performance: A meta-analysis. Human Performance, 21, 291–309.
Zalpour, K., Shahidi, S., Zarrani, R., Mazaheri, M. A., & Heidari, M. (2015). Empathy and cognitive emotion regulation in phenotypes of narcissism. Payesh, 14, 239–247.
