

Development and Construct Validation of a Situational Judgment Test of Strategic
Knowledge of Classroom Management in Elementary Schools

Article  in  Educational Assessment · July 2015


DOI: 10.1080/10627197.2015.1062087


2 authors:

Bernadette Gold, Universität Erfurt
Manfred Holodynski, University of Münster

All content following this page was uploaded by Manfred Holodynski on 14 November 2015.



This article was downloaded by: [Institut Fuer Tierernaehrung/Fli], [Mr Manfred
Holodynski]
On: 11 August 2015, At: 22:46
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered
office: 5 Howick Place, London, SW1P 1WG

Educational Assessment
Publication details, including instructions for authors and
subscription information:
http://www.tandfonline.com/loi/heda20

Development and Construct Validation of a Situational Judgment Test of Strategic
Knowledge of Classroom Management in Elementary Schools
Bernadette Gold & Manfred Holodynski
University of Münster
Published online: 11 Aug 2015.


To cite this article: Bernadette Gold & Manfred Holodynski (2015) Development and Construct
Validation of a Situational Judgment Test of Strategic Knowledge of Classroom Management in
Elementary Schools, Educational Assessment, 20:3, 226-248, DOI: 10.1080/10627197.2015.1062087

To link to this article: http://dx.doi.org/10.1080/10627197.2015.1062087


Educational Assessment, 20:226–248, 2015
Copyright © Taylor & Francis Group, LLC
ISSN: 1062-7197 print/1532-6977 online
DOI: 10.1080/10627197.2015.1062087

Development and Construct Validation of a Situational Judgment Test of Strategic
Knowledge of Classroom Management in Elementary Schools


Bernadette Gold and Manfred Holodynski
University of Münster

The current study describes the development and construct validation of a situational judgment test
for assessing the strategic knowledge of classroom management in elementary schools. Classroom
scenarios and accompanying courses of action were constructed, of which 17 experts confirmed the
content validity. A pilot study and a cross-validation with preservice teachers and inservice teachers
revealed the assumed factor structure and sensitivity of the test to differences in expertise. The
results indicate that the situational judgment test for assessing strategic knowledge of classroom
management in elementary schools is a valid assessment tool for investigating the acquisition and
promotion of classroom management knowledge during teacher education.

Classroom management is a crucial component of professional teacher knowledge. Numerous
studies have confirmed its significance for effective teaching as well as for student learning and
motivation (Baumert & Kunter, 2013; Emmer & Stough, 2001; Grossman, 1990; Hattie, 2009;
Shulman, 1986; Wang, Härtel, & Walberg, 1993). Furthermore, classroom management is one
of the most serious issues and concerns for beginning teachers (Veenman, 1984). Given that
preservice teachers feel unprepared for managing classrooms, they seem to need more
knowledge of classroom management, which is presently not an obligatory part of teacher
education (Jones, 2006).
Teacher knowledge of classroom management is stored in various forms (Shulman, 1986).
Propositional knowledge focuses on theoretical principles, case knowledge illustrates
propositional knowledge through cases and examples of practice, and strategic knowledge
refers to professional judgment about appropriate courses of action in complex situations or
dilemmas (Fenstermacher, 1994). Strategic knowledge is highly connected to effective teaching
(D’Agostino & VanWinkle, 2007), whereas propositional knowledge is seen as inert and
inapplicable in practical situations (Renkl, Mandl, & Gruber, 1996). A reliable and valid
instrument for directly measuring strategic knowledge of classroom management is required to
investigate its acquisition and promotion during teacher education. A situational judgment test
(SJT; Stemler & Sternberg, 2006) is an appropriate method for assessing strategic knowledge.
It consists of realistic work-related situations and deals with several effective and ineffective
courses of action (Whetzel & McDaniel, 2009). Respondents evaluate the courses of action with
respect to either their effectiveness or the likelihood that they themselves would behave in this
way in the situation.

Correspondence should be sent to Bernadette Gold, University of Münster, Fliednerstraße 21, 48149 Münster,
Germany. E-mail: bernadette.gold@uni-muenster.de

The aim of the present study was to develop and validate an SJT for assessing preservice
teachers’ strategic knowledge of classroom management in elementary schools (SJT-CM).
Developing this SJT required the construction of authentic classroom scenarios and a set of
courses of action that describe how a teacher could react. Experts on classroom management
confirmed the content validity of these scenarios. The construct validity of the test and
sensitivity to differences in expertise were explored in a pilot study with preservice and
inservice teachers. The results of the pilot study were replicated in a cross-validation with a larger
sample of bachelor students, master students, and teachers.

KNOWLEDGE OF CLASSROOM MANAGEMENT

Many theories and frameworks have dealt with the professional knowledge of teachers
(Grossman, 1990; Kunter et al., 2013; National Board for Professional Teaching Standards,
2002). All these models include Shulman’s (1986) breakdown of professional knowledge into
three types: (a) content knowledge, (b) pedagogical content knowledge, and (c)
pedagogical–psychological knowledge. Despite slight differences in the operationalizations of
these dimensions, knowledge of classroom management is consistently described as a crucial
component of pedagogical–psychological knowledge (König, Blömeke, Paine, Schmidt, &
Hsieh, 2011; Voss, Kunter, & Baumert, 2011). Standards for teaching, teacher education, and
teacher assessment also include classroom management skills (Interstate New Teacher
Assessment and Support Consortium, 2011; National Board for Professional Teaching
Standards, 2002). Supervisors, school principals, parents, and teacher educators also agree that
classroom management skills are crucial for effective teaching—in particular for beginning
teachers (Nichols & Mundt, 1996; Veenman, 1984). There is also long-standing empirical
evidence of the positive impact of effective classroom management on student outcomes. Good
classroom managers have students with higher learning gains (Hattie, 2009; Seidel & Shavelson,
2007; Wang et al., 1993), and they are less vulnerable to burnout (Brouwers & Tomic, 2000;
Friedman, 2006).
Definitions of classroom management vary, but they have the following in common:
Classroom management aims at providing a positive and effective learning environment and
at creating a high rate of time on-task (Emmer & Stough, 2001; Slavin, 2003). Theories and
guidebooks offer countless strategies on how to manage a classroom successfully. They
generally argue that it is important to prevent interruptions and to respond appropriately to
misbehavior, to maintain momentum and smoothness, and to establish socially shared
expectations in the form of rules and routines (Evertson & Emmer, 2012; Evertson & Weinstein,
2006; Good & Brophy, 2000; Weinstein & Migano, 2006).
It is less important to know each of these strategies and more important to choose an
appropriate action depending on the specific situation (Doyle, 2006; Gettinger & Kohler, 2006;
Weinstein, 1999; Westerman, 1991). Clearly, mere knowledge of the various strategies remains
unusable and inert unless the teacher is able to apply it appropriately in a specific classroom
situation (Renkl et al., 1996).

In addition to the aforementioned breakdown, Shulman (1986) also claimed that teacher
knowledge is organized into three “forms” (p. 10): First, propositional knowledge contains
principles of empirical research, maxims that derive from practice, and norms. Second, case
knowledge contextualizes propositional knowledge through cases and considers the complexity
of teaching. It encompasses practical examples of theoretical principles, precedents that
illustrate principles of practice, and parables that demonstrate values and norms. Third, strategic
knowledge is required when theoretical rules or principles (propositional knowledge) conflict
with cases or instances (case knowledge) in a specific situation and a teacher has to weigh
alternatives for acting. In such dilemmas or problem situations, professional judgment is
required (Fenstermacher, 1994). Effective teaching is dependent on “teachers’ ability to make
decisions about effective practices within the context of their particular classrooms” (Gettinger
& Kohler, 2006, p. 88).
Consequently, knowledge of classroom management encompasses propositional knowledge
about theories and principles of classroom management, case knowledge about specific
classroom-management-related cases or prototypes, and strategic knowledge as “wisdom
of practice” (Shulman, 1986, p. 13). Strategic knowledge encompasses skilled professional
judgment in complex contradictory situations (Fenstermacher, 1994). It develops through
practical experiences, when propositional knowledge is combined with cases and complex
situations in the classroom (Blömeke, Felbrich, Müller, Kaiser, & Lehmann, 2008; Leinhardt,
McCarthy Young, & Merriman, 1995). Valid and reliable instruments for directly measuring
professional knowledge in the field of teacher education are currently lacking (Voss et al., 2011).
Consequently, an appropriate instrument for measuring strategic knowledge of classroom
management in elementary school is required to further investigate its acquisition. For instance,
longitudinal studies during teacher education would yield insights into the development of
knowledge of classroom management and the extent to which it is influenced by practical
experience or in the first years of teaching.

ASSESSING STRATEGIC KNOWLEDGE OF CLASSROOM MANAGEMENT

Teacher skills are commonly assessed through indirect factors such as teacher education levels,
certification status, or the number of courses studied (Goldhaber & Anthony, 2007; Moyer-
Packenham, Bolyard, & Kitsantas, 2008; Wayne & Youngs, 2003). The validity of these
indicators is limited, and standardized tests that directly measure teacher knowledge are needed
(Voss et al., 2011). D’Agostino and VanWinkle (2007) reexamined typical professional
knowledge items from publicly released Praxis and NES state tests with noneducation majors, as
well as beginning and advanced education majors. They found that academic (propositional)
knowledge is mostly connected to theoretical principles learned in the education program but
that functional teaching (strategic) knowledge is crucial for effective teaching. Consequently,
the assessment of strategic knowledge should be more valid in predicting teaching skills than
paper–pencil tests for measuring propositional knowledge.
Tests for assessing propositional and strategic classroom management knowledge have
been developed within the scope of licensure processes in teacher education. For instance, the
Principles of Learning and Teaching tests of the Praxis Series II require, among other topics,
“principles and strategies for classroom management” (Educational Testing Service, 2011, p. 3),

which encompasses knowing how to develop classroom routines and procedures, maintain
accurate records, establish standards of conduct, arrange classroom space, and promote a
positive learning environment. Although the Praxis Series are well validated, they are for
internal use only and therefore not available for research purposes.
There are also some tests for assessing knowledge of classroom management as one
component of general pedagogical–psychological knowledge in the context of large-scale
studies conducted over the last few years. For instance, Voss et al. (2011) measured classroom
management as one of four subdimensions of teacher candidates’ general
pedagogical–psychological knowledge in the COACTIV-R study (Kunter et al., 2013), which investigates the
effects of general teacher competence on instructional quality in mathematics in Germany.
Although this test also taps strategic knowledge of classroom management, it is intended only
for secondary teachers. In the Teacher Education and Development Study: Learning to Teach
Mathematics (TEDS-M), an international comparative study of elementary and lower secondary
preservice teachers’ content and pedagogical content knowledge of mathematics (Tatto et al.,
2008), three participating countries (the United States, Germany, and Taiwan) developed a test for
assessing general pedagogical knowledge. The test also included situated items serving as
prompts to elicit strategic knowledge of classroom management (König et al., 2011). This test is
also available for elementary school teachers. However, it measures classroom management
as just one subscale of a broader test of pedagogical knowledge and does not reflect the
multifaceted nature of classroom management.
Because strategic knowledge is required in specific problem situations or dilemmas, a
contextualized instrument that elicits professional judgments about how to act in a complex,
ambiguous situation is needed for assessing strategic knowledge of classroom management
(Berliner, 2005; Blömeke et al., 2008). SJTs meet these requirements and are commonly used for
measuring strategic knowledge (Stemler & Sternberg, 2006). They consist of work-related
situations and several responses to the situation that have to be evaluated (McDaniel & Nguyen,
2001). The responses reflect the tendency of the respondent to react effectively or
ineffectively in realistic problem situations and therefore measure strategic knowledge
(McDaniel & Nguyen, 2001; Motowidlo, Hooper, & Jackson, 2006; Stemler & Sternberg, 2006;
Weekley, Ployhart, & Holtz, 2006). SJTs have been applied mostly for measuring leadership and
interpersonal skills (Christian, Edwards, & Bradley, 2010). Many studies have confirmed the
relationship between SJTs and job performance in this field (Lievens & Sackett, 2012;
McDaniel, Morgeson, Finnegan, Campion, & Braverman, 2001). We concluded that this
approach was also suitable for measuring strategic knowledge of classroom management.

AIM OF THE STUDY

The purpose of this study was to construct and validate an SJT for measuring strategic knowledge
of classroom management (SJT-CM) for preservice teachers in elementary school education.
As previously stated, established instruments for measuring classroom management knowledge
are for secondary school teachers, are not available for research, or do not measure
classroom management as a multifaceted construct. SJTs are situated measurements that tap
professional judgments in complex problem situations. The construction of the SJT-CM required
the development and validation of appropriate classroom scenarios as item stems and,

additionally, effective and ineffective strategies as responses through expert ratings. The
construct validity and sensitivity to differences in expertise were investigated in a pilot study.
We hypothesized a three-dimensional factor structure according to three facets of classroom
management (monitoring, managing momentum, and rules and routines). Furthermore, we
hypothesized that teachers would have more strategic knowledge of classroom management than
preservice teachers. Accordingly, the test should be sensitive to these differences in expertise.
Finally, we conducted a cross-validation to confirm the robustness of the results in a larger
sample.

TEST CONSTRUCTION AND PILOT STUDY

Development of a Situational Judgment Test for Assessing Strategic Knowledge of
Classroom Management
Three issues in SJT construction have to be taken into account: (a) problem situations as
item stems, (b) responses as possible (effective and ineffective) courses of action in the situation,
and (c) response instructions as well as the scoring key for the effectiveness of respondents’
answers.
Criteria for the construction of item stems. There are two methods of constructing item
stems: the critical incident and the model-based method (Weekley et al., 2006). Using the former
method, subject matter experts recall or create situations that are typical examples of effective or
ineffective performance. In the model-based method, the situations originate in a theoretical
model or a job analysis. Which one of the two methods is better remains an open question.
Weekley et al. (2006) recommended developing a model and then gathering various critical
incidents that reflect the model. Accordingly, we chose the model-based method for developing
the item stems. Based on the theoretical conceptualization of classroom management
knowledge, we assumed that there are three main facets of classroom management: monitoring,
managing momentum, and rules and routines. To ensure that the scope of the study remained
tractable, this study focused on classroom management in elementary schools, as classroom
management can of course vary, depending on the school level (Carter & Doyle, 2006; Emmer
& Gerwels, 2006).
1. Monitoring first includes proactive strategies such as Kounin’s (1970) concepts of
“Withitness” and “Overlapping.” The teacher is aware of all relevant and
simultaneous processes in the classroom and of the fact that students recognize
such teacher awareness. Active supervision means giving verbal, gestural, and facially
expressed feedback to the students with respect to their behavior and performance and
choosing a reasonable position in the classroom (Simonsen, Fairbanks, Briesch,
Myers, & Sugai, 2008; Stage & Quiroz, 1997). Second, monitoring also includes
reactive strategies such as a prompt and appropriate response to inappropriate
behavior before it intensifies or spreads, and that targets the student who caused the
disruption (Johnston, 1995; Kounin, 1970). Not every violation of classroom rules
constitutes misbehavior. Teachers have to adjust their response depending on the
intensity of the interruption, taking the context and underlying aim of student problem
behavior into account (Dreikurs, Grunwald, & Pepper, 1982).

2. Managing momentum encompasses methods for establishing smooth transitions
between the various classroom activities (Kounin, 1970; Smith, 1985). Maintaining
momentum means implementing briskly paced lessons without slowdowns;
unnecessary breaks; or, on the other hand, too fast a pace, depending on the level of
student understanding and attention (Arlin, 1979). This refers to proactive strategies
such as activity signals and appropriate (physical) preparation of the classroom (Doyle,
2006; Evertson & Emmer, 2012; Kounin & Doyle, 1975; Weinstein, 1979). Unlike
Kounin (1970), we do not consider “group focus” as an independent dimension,
because we suggest that it has a supporting function for the momentum of a lesson.
This entails engaging the attention of the whole class (“group alerting”), observing and
evaluating student learning and participation (“encouraging accountability”), and
creating “high-participation formats.”
3. Finally, socially shared expectations of behavior in the form of rules and routines are
critical for the daily routine in a classroom. They give the students an orientation and
provide a supporting function to the other facets of classroom management (Emmer,
Evertson, & Anderson, 1980; Evertson & Emmer, 1982). Daily structure and
orientation is especially important in elementary schools (Brophy & Evertson, 1978).
Rules should be stated comprehensively and with clear expectations (Emmer &
Gerwels, 2006), related to logical consequences that the students perceive as fair
and appropriate (Elias & Schwaab, 2006), and established at the beginning of each school
year (Bohn, Roehrig, & Pressley, 2004; Emmer et al., 1980).
We searched for situations in existing classroom videos from elementary schools that were
characteristic of these facets using a coding scheme according to the three facets. The videos in
question were made as a part of the project “ViU – Early Science” funded by the German
Federal Ministry of Education and Research. The situations in the videos were transcribed into
short written situations that were low in complexity. To ensure the authenticity of the situations,
we asked external experts on classroom management whether the situations reflected typical
classroom events of the respective facet of classroom management.

Criteria for the construction of responses. There is no empirical evidence in the
literature on whether response options created by subject matter experts, novices
(critical-incidents-based), or test developers (construct-based) differ (Weekley et al., 2006), so we
chose a mixed approach. First, the situations were presented to student teachers who were
asked to formulate at least three options for reacting to the situations, because novices are
more likely to generate a wide range of effective and ineffective strategies, whereas experts
tend to formulate mainly effective strategies (McDaniel & Nguyen, 2001). Second, we
adapted the strategies with respect to classroom management strategies from both
practitioner and research literature. To confirm the assumed level of effectiveness of the
response options, experts graded each strategy from A (excellent) to F (failure) according to
school grades.

Criteria for the construction of response instructions and the scoring key. Research on
SJTs has investigated two different response instructions: Behavioral instructions aim at what
the respondent “would do,” whereas knowledge instructions ask the respondent what he or she
“should do.” We decided to use the “should do” instruction, because it has been shown to have

higher criterion-related validity (McDaniel, Hartmann, Whetzel, & Grubb, 2007) and to be less
susceptible to faking and social desirability bias.
The answer key is usually determined through expert judgments (McDaniel & Nguyen,
2001). With respect to the scoring method, the respondent chooses one option (which is the
most likely/best response) or two options (the most and least likely/effective responses), or the
respondent evaluates all options (e.g., on a Likert scale). In our study, respondents were required
to evaluate each response option on a 6-point Likert scale (in contrast to a forced-choice method)
to increase the variance in the item scores and to facilitate statistical analysis. Next we provide
an example of a scenario with its response options derived from theoretical principles.

FIGURE 1 Example scenario of the situational judgment test for assessing preservice teachers’ strategic knowledge of
classroom management in elementary schools.

Example scenario. The scenario describes an off-task student trying to distract his
neighbor while some students present their results to the class (see Figure 1). The challenge for
the teacher is to deal with the disruption and nonetheless continue with the lesson simultaneously
(overlapping). The misbehavior is not intensive but has already affected others. Slavin (2003)
and Weinstein and Migano (2006) proposed several strategies for dealing with minor
misbehavior. Using a nonverbal intervention or an indirect verbal intervention like stating the
student’s name, incorporating the name into the lesson, or calling on the child to participate
in the lesson enables the class to continue without further interruptions. A more intense
intervention is a direct verbal one such as a succinct command. Because the behavior in this
scenario is not serious, and as the teacher would have to interrupt the lesson, the two previous
intervention strategies are more efficient. Ignoring the misbehavior can be an appropriate
strategy if the misbehavior is minor and fleeting. In this case, the student’s behavior has already
affected the attention of another student. Ignoring it would therefore not be intensive enough,
because the disruption may have a negative influence on classmates. Furthermore, Weinstein
and Migano postulated several principles for dealing with inappropriate behavior, for example,
continuing the lesson, protecting the students’ dignity, or having an awareness of the
misbehavior’s context dependence. Based on these principles, we formulated six different
effective or ineffective strategies for the teacher to react in this scenario (Strategies a, c, e, f).
Strategy b (ignoring the behavior) and Strategy d (discussing the behavior with the whole class)
were distractors, because ignoring is
appropriate only when the misbehavior is fleeting. Discussing minor misbehavior with the whole
class would offend the student’s dignity and affect the time flow of the lesson.
After the construction of 14 scenarios (five monitoring, five managing momentum, four rules
and routines) with five to seven courses of action for each scenario, we created a web-based
version of the SJT-CM. Previous studies did not find performance differences between
(unproctored) web-based tests and paper–pencil tests (Templer & Lange, 2008). The next step was
to conduct an expert survey to investigate the content validity of the scenarios and of the responses.

Content Validation (of Scenarios and Responses)


We asked 17 experts on classroom management to complete the test in order to validate the
chosen scenarios and their accompanying strategies. Fifteen experts were academic staff (six
with a professorship, two with a PhD degree, and seven PhD students), and two experts had a
PhD degree as well as extensive experience in teacher training on classroom management. We asked
the experts how representative each scenario was for monitoring, managing momentum, and
rules and routines as a criterion of content validity. They also evaluated the effectiveness of each
strategy on a 6-point rating scale from 1 (A) to 6 (F) according to school grades. Based on their
agreement on the quality of the strategies, we created a master rating to score the participants’
responses.

Validation of the Scenarios

With respect to each scenario, we asked the experts how representative it was in terms of
monitoring, managing momentum, and rules and routines on a scale from 1 (Grade A excellent)
to 6 (Grade F failure). We calculated analyses of variance with simple contrasts to check the
suitability of each scenario for the assumed facet of classroom management.
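
As a purely descriptive illustration of this check, the following Python sketch averages the expert grades per facet for one scenario and reports which facet received the best (lowest) mean grade. It does not reproduce the analyses of variance with simple contrasts. The function names, the data layout, and the three example ratings are hypothetical and are not taken from the study.

```python
from statistics import mean

FACETS = ("monitoring", "managing momentum", "rules and routines")


def facet_means(ratings):
    """Mean expert grade per facet for one scenario (1 = A ... 6 = F).

    ratings: list with one dict per expert, mapping facet -> grade.
    This layout is an assumption made for illustration.
    """
    return {facet: mean(r[facet] for r in ratings) for facet in FACETS}


def best_facet(ratings):
    """Facet with the lowest (best) mean grade for the scenario."""
    means = facet_means(ratings)
    return min(means, key=means.get)


# Invented ratings from three hypothetical experts for a single scenario.
example_ratings = [
    {"monitoring": 1, "managing momentum": 4, "rules and routines": 2},
    {"monitoring": 2, "managing momentum": 5, "rules and routines": 2},
    {"monitoring": 1, "managing momentum": 3, "rules and routines": 3},
]

print(facet_means(example_ratings))
print(best_facet(example_ratings))
```

A scenario would count as well placed in this descriptive sense if the facet returned by `best_facet` matches the facet the scenario was written for. Whether the difference is statistically reliable is a separate question that the sketch does not address.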
FIGURE 2 Average grade of all experts on how well the respective scenario represents monitoring, managing
momentum, and rules and routines. Note. (a) An analysis of variance reveals that the scenario demonstrates the intended
facet significantly better than another facet. (b) An analysis of variance reveals that the scenario does not demonstrate the
assumed facet significantly better than either one of the other facets. No letter at all indicates that the scenario
demonstrates the assumed facet significantly better than either one of the other facets.

Figure 2 shows the average judgment of the experts for each scenario with respect to
suitability for monitoring, managing momentum, and rules and routines. The first five scenarios
were assumed to be relevant for teachers’ monitoring, Scenarios 6 through 10 were hypothesized
as problems with managing momentum, and the last four scenarios were intended to illustrate
issues in terms of rules and routines. The first four of the five scenarios that illustrated
monitoring were considered to be appropriate by the experts. Monitoring problems were
evidently hard to distinguish from rules-and-routines problems because these scenarios all
contained some form of off-task behavior or disruption that can be characterized as violating
classroom rules. The fifth scenario was judged as satisfactory for all facets of classroom
management. In addition, it was judged more suitable for the other two facets, which indicated
low content validity for this scenario. All scenarios on managing momentum—in particular,
Scenarios 6 through 9—were rated as excellent or good for illustrating the respective facet. They
were clearly more appropriate for the theoretically assumed facet than either one of the
alternatives. Regarding Scenarios 12 through 14, the experts agreed with the hypothesized
classification of rules and routines. Although Scenario 11 was difficult to separate from being
appropriate for monitoring or managing momentum, it was judged as focusing mostly on rules
and routines.

In summary, the experts rated all scenarios to be at least satisfactorily representative for the
hypothesized theoretical facet of classroom management. We therefore concluded that most
scenarios revealed an adequate degree of content validity. Only Scenario 5 could not be
classified clearly and was therefore excluded from further investigation.
Downloaded by [Institut Fuer Tierernaehrung/Fli], [Mr Manfred Holodynski] at 22:46 11 August 2015

Validation of the Strategies

To avoid the risk of leniency or severity bias (Hoyt, 2000), we analyzed the relative grading
through pairwise comparisons between single strategies instead of absolute rater agreement.
In other words, we considered the expert agreement on the efficiency ranks of the various
strategies and not on the specific grade. We built all possible pair comparisons for each scenario
(see Table 1 as an example) and calculated the frequencies of the three possible relationships
between two chosen strategies (better, equal, and worse) based on the expert responses. At least
11 experts (65%) had to agree on which strategy was more appropriate. Pair comparisons with
less agreement were removed (see Table 1 as an example).
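
The pair-comparison step can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the data layout (one dict of grades per expert), the function names, and the sample grades are hypothetical, and it assumes that only the two directional relations (lower or higher grade) count toward the threshold of 11 of 17 experts. Because Grade 1 is the best grade, a lower grade means the strategy was judged more appropriate.

```python
from itertools import combinations

def pair_frequencies(ratings, a, b):
    """Count experts who graded strategy a lower than, equal to, or higher than b."""
    lower = sum(1 for r in ratings if r[a] < r[b])
    equal = sum(1 for r in ratings if r[a] == r[b])
    higher = sum(1 for r in ratings if r[a] > r[b])
    return lower, equal, higher

def retained_pairs(ratings, min_agree=11):
    """Keep pairs for which at least min_agree experts agree on the direction."""
    strategies = sorted(ratings[0])
    kept = {}
    for a, b in combinations(strategies, 2):
        lower, _, higher = pair_frequencies(ratings, a, b)
        if max(lower, higher) >= min_agree:
            kept[(a, b)] = "<" if lower > higher else ">"
    return kept

# Hypothetical grades from 17 experts for three strategies (1 = Grade A, 6 = Grade F).
example = (
    [{"a": 2, "b": 4, "c": 1}] * 10
    + [{"a": 3, "b": 3, "c": 2}] * 3
    + [{"a": 4, "b": 2, "c": 1}] * 4
)

print(pair_frequencies(example, "a", "b"))  # (10, 3, 4)
print(retained_pairs(example))  # {('a', 'c'): '>', ('b', 'c'): '>'}
```

In this made-up data set the pair (a, b) is dropped because only 10 experts agree on a direction, whereas the other two pairs are retained.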
The comparisons of all strategy pairs revealed that only 56% reached the minimum rater
agreement of 65% (see Table 2). This was mainly due to the low number of remaining pair
comparisons of Scenario 10 (27%) and Scenario 12 (29%). This means that the experts generally
disagreed about the strategies’ rank order in these two scenarios. Consequently, Scenarios 10
and 12 were excluded from further analysis, and 61.83% of all pair comparisons remained. The
average agreement on the relationship between the 102 pair comparisons within the 12
remaining scenarios was 77.66%. All scenarios, the number of pair comparisons, and the
average percentage rater agreement on the pair comparisons are listed in Table 2.

TABLE 1
Pairwise Comparisons for the Strategy Evaluations of the Example Scenario and
Frequencies of the Relations Between the Strategies

                                                                        Expert Rating
Pairwise Comparisons                                                    <     =     >

Giving a succinct command (a) – Distractor: Ignoring (b)                10    3     4ᵇ
Giving a succinct command (a) – Nonverbal intervention (c)              1     1     15ᵃ
Giving a succinct command (a) – Distractor: Discussion (d)              11    4     2ᵃ
Giving a succinct command (a) – Calling student to participate (e)      4     4     9ᵇ
Giving a succinct command (a) – Stating student’s name (f)              0     1     16ᵃ
Distractor: Ignoring (b) – Nonverbal intervention (c)                   1     1     15ᵃ
Distractor: Ignoring (b) – Distractor: Discussion (d)                   7     6     4ᵇ
Distractor: Ignoring (b) – Calling student to participate (e)           2     4     11ᵃ
Distractor: Ignoring (b) – Stating student’s name (f)                   1     —     16ᵃ
Nonverbal intervention (c) – Distractor: Discussion (d)                 14    2     1ᵇ
Nonverbal intervention (c) – Calling student to participate (e)         15    1     1ᵃ
Nonverbal intervention (c) – Stating student’s name (f)                 9     4     4ᵇ
Distractor: Discussion (d) – Calling student to participate (e)         3     3     11ᵃ
Distractor: Discussion (d) – Stating student’s name (f)                 —     2     15ᵃ
Calling student to participate (e) – Stating student’s name (f)         —     1     16ᵃ

ᵃPair comparison included due to expert agreement over 65%. ᵇPair comparison
excluded due to expert agreement below 65%.

TABLE 2
Summary of All Scenarios, the Number of Associated Strategies, the Resulting Number
of Pair Comparisons, Remaining Pair Comparisons After the Expert Ratings, and the
Experts’ Average Agreement for All Pair Comparisons of Each Scenario

Scenario      CM Facet   N strategies   N pairs   N remaining pairs   M agreement   M agreementᵃ

1             MO         6              15        12 (80%)            85.35%        90.92%
2             MO         6              15        11 (73%)            70.98%        77.93%
3             MO         6              15        8 (53%)             67.56%        79.03%
4             MO         6              15        10 (67%)            67.48%        74.62%
5             MO         6              15        8 (53%)             62.61%        68.28%
6             MM         6              15        10 (67%)            82.37%        97.33%
7             MM         7              21        9 (42%)             67.98%        80.23%
8             MM         5              10        4 (40%)             68.02%        69.99%
9             MM         5              10        7 (70%)             72.48%        74.80%
10ᵇ           MM         6              15        4 (27%)             60.58%        72.11%
11            RR         6              15        10 (67%)            66.37%        73.27%
12ᵇ           RR         7              21        6 (29%)             57.72%        67.32%
13            RR         5              10        6 (60%)             71.51%        77.12%
14            RR         5              10        7 (70%)             64.07%        68.37%
Total                    82             202       112 (57%)           68.93%        76.52%
Total final              69             166       102 (62%)           70.57%        77.66%

Note. CM = classroom management; N = number; M = mean; MO = monitoring;
MM = managing momentum; RR = rules and routines.
ᵃAverage agreement of pair comparisons after excluding strategy pairs with an agreement
under 65%. ᵇDeleted scenarios because of insufficient agreement between experts.

Scoring key. The pair comparisons with a minimum agreement of 65% also served as a
coding system for future participants’ answers. Given that the participant agreed with the master
rating regarding the relationship between two strategies—for example, giving a succinct
command (a) is less appropriate than a nonverbal intervention (c)—1 point was awarded,
otherwise zero points. If the participant graded both strategies as equally efficient and they
additionally had consecutive rank positions in the master rating, half a point was awarded.
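
The scoring rule for a single retained pair can be sketched as follows. This is a hedged illustration: the function name and arguments are hypothetical, and how the master rank positions are derived from the expert ratings is not specified in the text, so they are simply passed in here. The relation refers to grades, where 1 is the best grade, so "a is less appropriate than c" corresponds to a receiving the higher grade (">").

```python
def score_pair(grade_a, grade_b, master_relation, rank_a, rank_b):
    """Score one retained pair comparison against the master rating.

    master_relation: "<" or ">" for the experts' relation between the grades of a and b.
    rank_a, rank_b: positions of the two strategies in the master rank order (assumed given).
    """
    if grade_a == grade_b:
        # A tie earns half a point only if the strategies are neighbours in the master ranking.
        return 0.5 if abs(rank_a - rank_b) == 1 else 0.0
    relation = "<" if grade_a < grade_b else ">"
    return 1.0 if relation == master_relation else 0.0

# Master rating: succinct command (a) is graded worse than nonverbal intervention (c).
print(score_pair(4, 2, ">", rank_a=3, rank_b=1))  # 1.0, participant agrees
print(score_pair(2, 4, ">", rank_a=3, rank_b=1))  # 0.0, participant reverses the order
print(score_pair(3, 3, ">", rank_a=2, rank_b=1))  # 0.5, tie between neighbouring ranks
print(score_pair(3, 3, ">", rank_a=3, rank_b=1))  # 0.0, tie between non-neighbouring ranks
```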
As a consequence of the expert survey, we shortened the test slightly by excluding Scenario 5
because of insufficient content validity and Scenarios 10 and 12 because of low agreement as to
the strategies’ rank order. In the next phase, we investigated the construct validity and sensitivity
to differences in expertise of the resulting initial test version in a pilot study.

Pilot Study
We conducted a pilot study with 98 preservice teachers (75.5% female, number of semesters:
M = 4.25, SD = 2.26) to investigate the hypothesized three-dimensional factor structure of the
test, according to the three facets of classroom management: monitoring, managing momentum,
and rules and routines. The final test version consisted of 11 scenarios with five to seven
courses of action. Participants evaluated each strategy on a scale from 1 (Grade A) to 6 (Grade
F). Forty-four in-service teachers (82.6% female, years of teaching: M = 16.38, SD = 10.42)
were recruited to examine the sensitivity to differences in expertise by comparing the test
performance of the two groups of expertise level. We assumed that teachers had more
knowledge of classroom management and that the SJT-CM was sufficiently sensitive to identify
mean differences in expertise.
We constructed one parcel for each scenario by aggregating the scores of all remaining pair
comparisons. The resulting total score of each scenario was divided by the highest possible
score that could be reached. Thus, the total score for each scenario represented the proportion
of agreement with the master rating on the rank order of the strategies in that scenario and
ranged between 0 and 1. The resulting 11 parcels were used as variables for further analysis.
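
Written as a formula, and assuming that each retained pair comparison contributes at most 1 point (as the scoring key implies), the parcel score of participant i on scenario s is the share of attainable points. The symbols P, x, and K are introduced here for illustration only and do not appear in the original article.

```latex
\[
  P_{is} = \frac{\sum_{k=1}^{K_s} x_{isk}}{K_s},
  \qquad x_{isk} \in \{0,\ 0.5,\ 1\},
  \qquad 0 \le P_{is} \le 1
\]
```

Here K_s is the number of retained pair comparisons in scenario s and x_isk is the score on pair k. For example, a participant who earns 9 full points and 2 half points in a scenario with 12 retained pairs obtains P = 10/12 ≈ 0.83.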
To investigate the construct validity of the test, we first examined the hypothesized factor
structure using confirmatory factor analysis (CFA) with maximum likelihood estimation.
We conducted this analysis only with the target group of preservice teachers. As indicators of the
model fit to the data, the chi-square test should be nonsignificant, comparative fit index (CFI) and
Tucker–Lewis index (TLI) should be higher than 0.95, root mean square error of approximation
(RMSEA) should be lower than 0.05, and standardized root mean square residual (SRMR)
lower than 0.08 (Hu & Bentler, 1999). We used analysis of variance to examine the sensitivity of
the test to differences between preservice teachers and teachers. The CFA was conducted
with the software Mplus (Muthén & Muthén, 2010). SPSS (Version 21) was used for analysis
of variance.
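
The fit criteria named above can be expressed as a small check. The following Python sketch is illustrative only: the function names are hypothetical, a significance level of .05 for the chi-square test is an assumption, and the RMSEA point estimate uses the common textbook formula with n − 1 in the denominator (software may use n instead, which changes the result only marginally). The example values are those reported for the first pilot model in the Construct validity paragraph.

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate from the model chi-square, its df, and the sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def meets_cutoffs(fit):
    """Compare fit statistics with the cutoffs named in the text (Hu & Bentler, 1999)."""
    return {
        "chi2_nonsignificant": fit["p"] > 0.05,  # assumed alpha of .05
        "CFI": fit["CFI"] > 0.95,
        "TLI": fit["TLI"] > 0.95,
        "RMSEA": fit["RMSEA"] < 0.05,
        "SRMR": fit["SRMR"] < 0.08,
    }

# Values reported for pilot Model 1 (N = 98); RMSEA is recomputed from chi-square and df.
pilot = {
    "p": 0.29,
    "CFI": 0.976,
    "TLI": 0.968,
    "RMSEA": rmsea(45.499, 41, 98),
    "SRMR": 0.059,
}

print(round(pilot["RMSEA"], 3))
print(meets_cutoffs(pilot))
```

When the chi-square value is smaller than its degrees of freedom, as in the second pilot model, the point estimate is truncated at zero.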

Construct validity. The CFA revealed a good fit for the hypothesized model (see Figure 3,
Model 1), χ²(41) = 45.499, p = .29, RMSEA = 0.033, 90% confidence interval (CI) [.000,
.079], p(RMSEA ≤ .05) = .673, SRMR = 0.059, CFI = 0.976, and TLI = 0.968. As Scenario 8
yielded a weak factor loading and a high residual variance, it was excluded from further analysis.
Modification indices indicated the model fit would improve by allowing Scenario 3 to load
on the rules and routines factor. Because this scenario was not clearly categorized for either
monitoring or rules and routines in the expert survey, we concluded that this item was also
suitable for rules and routines.
Consequently, we modified the hypothesized model as follows. First, we excluded
Scenario 8, and second, we specified that Scenario 3 would load only on the factor rules and
routines. Based on the modified model, we conducted a second CFA with 10 remaining scenarios
(see Figure 3, Model 2). The CFA revealed an even better model fit, χ²(32) = 25.978, p = .77,
RMSEA < 0.001, 90% CI [.000, .053], p(RMSEA ≤ .05) = .870, SRMR = 0.044,
CFI = 1.00, TLI = 1.049, having mostly medium to high factor loadings (λ_min = .40;
λ_max = .74). The latent factor correlations between .70 and .73 indicated that the three facets
of classroom management were related but separable. The composite reliability (McDonald,
1999) of all scales was acceptable, with ω = .627 for monitoring, ω = .685 for managing
momentum, ω = .639 for rules and routines, and ω = .843 for the total test.
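
For orientation, composite reliability for a unidimensional scale is commonly written as below. This is the standard textbook form of McDonald's omega; whether the authors computed it in exactly this way (for example, with the factor variance fixed to 1) is an assumption, and the symbols are introduced here for illustration.

```latex
\[
  \omega = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^{2}}
                {\left(\sum_{i=1}^{k} \lambda_i\right)^{2} + \sum_{i=1}^{k} \theta_i}
\]
```

Here λ_i is the factor loading of scenario i, θ_i is its residual variance, and k is the number of scenarios on the scale. With only three or four scenarios per facet, the numerator grows slowly relative to the summed residual variances, which is consistent with the modest subscale values and the higher value for the total test.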

FIGURE 3 Three-dimensional models of knowledge of classroom management (standardized model parameters).
Note. Nonitalicized estimates refer to the pilot study and italicized estimates refer to the validation study.

Sensitivity to differences in expertise. An analysis of variance was conducted with the
scores for the three subscales monitoring, managing momentum, and rules and routines as
dependent variables. The scale scores were constructed using the mean of the total scores of
the respective scenarios (Table 3). An analysis of variance indicated that the teachers had
significantly higher means with respect to monitoring (M_diff = 6.19, SD_pooled = 14.28, p = .018,
d = 0.43) and rules and routines (M_diff = 5.84, SD_pooled = 14.18, p = .025, d = 0.41).
Unexpectedly, there were no significant mean differences between preservice teachers and
inservice teachers in terms of managing momentum (M_diff = .77, SD_pooled = 14.91, p = .776,
d = 0.05).
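
The reported effect sizes can be retraced from the descriptive statistics in Table 3. The sketch below uses the usual pooled-standard-deviation form of Cohen's d; the function names are illustrative, and it assumes that all 98 preservice and 44 in-service teachers entered the comparison.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (group 2 minus group 1)."""
    return (m2 - m1) / pooled_sd(sd1, n1, sd2, n2)

# Monitoring scale, pilot study (Table 3):
# preservice teachers M = 65.88, SD = 14.81, n = 98; teachers M = 72.07, SD = 13.01, n = 44.
sd = pooled_sd(14.81, 98, 13.01, 44)
d = cohens_d(65.88, 14.81, 98, 72.07, 13.01, 44)
print(round(sd, 2), round(d, 2))  # 14.28 0.43
```

The result matches the values reported in the text for monitoring (SD_pooled = 14.28, d = 0.43).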

VALIDATION STUDY

The cross-validation was conducted to examine the replicability of the three-dimensional factor
structure and the sensitivity to differences in expertise with a larger sample of (a) bachelor and
master students of elementary school education and (b) elementary school teachers. Master
students had more opportunities to gain knowledge during their studies, and more practical
phases that probably enhanced their strategic knowledge in classroom management (preservice
teachers have to complete several internships at school during their studies). Therefore, we
expected master students to perform better in the SJT-CM than bachelor students and teachers
to perform best of all. The comparison of the latent means among the three groups of expertise,
with regard to their classroom management knowledge, using multigroup analysis, requires
scalar invariance across groups (the same basic factor structure as well as equal factor loadings
and intercepts; Widaman & Reise, 1997). Hence, we first tested for measurement invariance
across groups, and subsequently compared the mean test performance between the three groups.

TABLE 3
Means and Standard Deviations for Each Scenario and the Scales of Classroom Management
Separated by Group of Expertise

              Pilot Study                             Validation Study
        Student Teachers    Teachers          Bachelor Students   Master Students     Teachers
        M       SD          M       SD        M       SD          M       SD          M       SD

S1      69.73   16.40       70.74   17.46     71.31   15.93       73.30   18.378      73.97   16.01
S2      71.75   20.56       78.31   17.52     76.10   18.27       77.35   15.642      77.75   16.63
S3      77.81   21.20       88.64   15.92     84.97   16.95       88.48   16.734      91.10   15.05
S4      56.17   21.49       67.16   18.53     51.04   20.38       58.04   20.527      64.88   17.52
S5      54.21   20.81       62.50   21.36     58.03   21.06       55.22   24.806      60.30   22.79
S6      82.60   18.08       81.36   20.39     83.29   17.91       87.35   16.508      83.44   16.69
S7      82.77   21.28       89.27   11.39     82.30   19.15       83.72   17.409      87.87   16.75
S8      67.96   26.90       54.55   28.07     69.12   23.62       66.96   25.553      58.40   29.36
S9      81.92   19.68       74.35   23.07     81.87   19.78       81.61   19.088      80.91   19.15
S11     75.05   19.37       74.43   20.12     75.62   19.29       79.48   18.020      77.04   18.41
S13     70.15   22.73       75.19   13.14     74.91   21.11       79.86   15.956      76.27   15.38
S14     64.72   29.21       72.84   22.79     65.82   22.32       71.04   24.393      73.83   24.60
MO      65.88   14.81       72.07   13.01     66.15   13.03       69.56   11.909      72.20   11.87
MM      82.43   15.41       81.66   13.73     82.48   14.33       84.23   12.321      84.07   12.75
RR      71.93   16.00       77.77   8.82      75.33   13.59       79.71   11.620      79.56   11.13
CM      73.42   12.14       77.17   7.91      74.65   10.76       77.83   8.78        78.61   9.22

Note. MO = monitoring; MM = managing momentum; RR = rules and routines; CM = classroom
management.

Method

Participants. A total of 308 preservice teachers for elementary school education and 125
elementary school teachers (88% female, years of teaching: M = 7.71) participated in this
study. Of the preservice teachers, 193 were at the beginning of their bachelor
program (88.1% female, semester: M = 1.04) and 115 were in their master’s studies
(87.8% female, semester: M = 7.58). Sixty-eight bachelor students agreed to repeat the test 4
months after the first completion in order to calculate the test–retest correlation as an indicator
of test stability. Because we were not able to ensure essential tau-equivalence as an assumption
for interpreting test–retest correlations as reliability, the test–retest correlation served as an
indicator of stability.

Materials and procedure. All participants answered the web-based SJT-CM consisting of
10 scenarios with five to seven acting strategies that resulted from the pilot study. The
completion of the entire survey took 30 min on average.
Statistical analysis. Again, we used CFA with maximum likelihood estimation to replicate
the factor structure of the test with preservice teachers. Measurement invariance across the
groups was tested using multigroup analysis. Firstly, we specified a configural invariance model
(fixed factor structure but unconstrained loadings and intercepts), then a metric invariance model
(fixed factor structure and equal factor loadings), and finally a scalar invariance model (fixed
factor structure, equal factor loadings, and equal intercepts). We compared the models through
the calculation of chi-square difference tests with Satorra-Bentler scaling correction (Satorra &
Bentler, 2001). A significant chi-square difference test indicated noninvariance. To compare the
latent means of the three different groups, latent means of the bachelor students were set to
zero (reference group) and latent mean differences of the other two groups were estimated.
The choice of a reference group was arbitrary and facilitated the interpretation of the results; it
did not affect the results as such. We additionally calculated the test–retest correlation of the
SJT-CM with a small sample of bachelor students. All calculations were carried out with the
statistical software Mplus (Muthén & Muthén, 2010).
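
The scaled chi-square difference test mentioned above can be sketched as follows. The function implements the Satorra and Bentler (2001) formula for two nested models; the function and argument names are illustrative, and the scaling correction factors are inputs that the article does not report, so the example sets them to 1 purely as a sanity check.

```python
def sb_scaled_difference(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler scaled chi-square difference for nested models.

    t0, df0, c0: scaled chi-square, df, and scaling correction factor of the
                 more constrained (nested) model.
    t1, df1, c1: the same quantities for the less constrained comparison model.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)  # scaling factor of the difference test
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, df0 - df1

# Metric vs. configural model from Table 4, with hypothetical scaling factors of 1.
trd, ddf = sb_scaled_difference(114.731, 84, 1.0, 93.880, 72, 1.0)
print(round(trd, 3), ddf)  # 20.851 12
```

With both factors set to 1, the function returns the plain difference of the two chi-square values (20.851). Table 4 reports a corrected difference of 19.396 for the same comparison, which indicates that the actual scaling factors differed from 1.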

Results
Construct validity. The fit indices of the CFA indicated that the internal structure of the test
was replicated in the second larger sample, χ²(32) = 47.84, p = .04, RMSEA = .04, 90% CI
[.011, .062], p(RMSEA ≤ .05) = .744, SRMR = .041, CFI = .961, TLI = .945. The factor
loadings were slightly lower than in the pilot study; especially the loadings of items on the factor
monitoring (see Figure 3, Model 2). The latent correlations between the three subdimensions
also slightly differed from the values in the pilot study. The correlation between monitoring and
rules and routines was considerably smaller, whereas the correlation between managing
momentum and rules and routines increased. We examined whether the factor loadings differed
significantly, using multigroup analysis across the sample of the pilot and the validation study.
We compared a model with equal factor loadings across the two samples with a model, in which
the factor loadings of each group were freely estimated. A chi-square difference test using
Satorra-Bentler scaling correction revealed that the factor loadings did not differ significantly,
Δχ²(Δdf = 6) = 1.856, p = .932.
The internal consistency of all scales was poor, with ω = .469 for monitoring, ω = .586 for
managing momentum, and ω = .576 for rules and routines. Only the internal consistency for the
complete test was good, with ω = .782. The test–retest correlations were high, with r = .758 for
monitoring, r = .755 for managing momentum, r = .727 for rules and routines, and r = .851 for
the total test.

Measurement invariance. To ensure that the factor structure of the test was invariant
across bachelor students, master students, and teachers, we compared a configural invariant
model, a metric invariant model, and a scalar invariant model. The fit indices for the configural
invariance model were satisfactory (see Table 4), indicating that the factor structure was
invariant across the three groups. Chi-square difference tests indicated that metric invariance was
tenable but that the scalar invariance model led to a significant decrease in model fit. Modification
indices revealed an increase in model fit by freely estimating the intercept of Scenario 2 in the
master students group. The resulting model of partial scalar invariance did not differ significantly
from the metric invariance model. To sum up, with the exception of Scenario 2, the factor
structure of the test, the factor loadings, and the item intercepts were invariant across bachelor
students, master students, and teachers, which is crucial for mean comparisons across groups.

TABLE 4
Measurement Invariance Across Bachelor Students, Master Students, and Teachers

                                          χ²        df    RMSEA   CFI    TLI    Δχ²ᵃ     Δdf   p

1. Configural invariance                  93.880    72    .046    .948   .922
2. Metric invariance                      114.731   84    .050    .927   .906
   Difference between Model 2 & Model 1                                         19.396   12    .079
3. Scalar invariance                      140.137   96    .056    .895   .882
   Difference between Model 3 & Model 2                                         25.700   12    .012
4. Partial scalar invariance              130.362   95    .051    .916   .905
   Difference between Model 4 & Model 2                                         15.624   11    .156

ᵃCorrected chi-square difference using Satorra-Bentler scaling correction.

Sensitivity to differences in expertise. Multigroup analysis revealed mean differences
between the three groups (see Figure 4). Teachers outperformed bachelor students significantly
with regard to monitoring (d = 0.93, p < .001), managing momentum (d = 0.31, p = .034),
and rules and routines (d = 0.49, p = .004). Master student latent means were significantly
higher than those of bachelor students regarding monitoring (d = 0.43, p = .018) and rules and
routines (d = 0.44, p = .004) but not regarding managing momentum (d = 0.21). The mean
differences between master students and teachers were only significant for the facet monitoring
(d = 0.54, p = .013).

FIGURE 4 Latent mean differences between bachelor students, master students, and teachers.

DISCUSSION

In the present study, we developed a situational judgment test for assessing preservice teachers’
strategic knowledge of classroom management in elementary schools (SJT-CM) and found indications
of content validity, construct validity, and sensitivity to differences in expertise. Direct
measurements of teachers’ knowledge of classroom management have been lacking in the literature
for a long time; such knowledge was mostly assessed by self-reports or indirect factors (Voss et al., 2011).
We addressed this need for a reliable and valid instrument in teacher education research by
constructing an SJT for measuring strategic knowledge of classroom management. Such
knowledge is needed when the teacher is confronted with a complex problem or dilemma
situation in the classroom and professional judgment is required (Fenstermacher, 1994; Shulman,
1986). This form of knowledge predicts effective teaching better than propositional knowledge
(D’Agostino & VanWinkle, 2007). Because effective classroom management supports student
learning (Emmer & Stough, 2001; Hattie, 2009) and the mental health of teachers (Friedman,
2006), strategic knowledge of classroom management is an essential ability for (beginning)
teachers. Its assessment is an important step toward conducting further research on the acquisition of
such knowledge in teacher education. The SJT-CM developed in the present study is the first
validated test for measuring strategic knowledge of preservice teachers in elementary school
education that covers classroom management as a multifaceted construct. It can now be applied,
for example, in research on the acquisition and promotion of classroom management knowledge
and its effects on perceived preparedness, self-efficacy, and job satisfaction.
The construction of the SJT-CM included the development and validation of specific
classroom scenarios that served as item stems and related effective and ineffective responses.
Fourteen scenarios were based on transcribed classroom videos from Germany. Because they
were substantially reduced in complexity and the conceptualization of classroom management is
similar across countries (Wubbels, 2011), we did not expect cultural differences to affect the
generalizability of the scenarios. Furthermore, Wubbels (2011) reviewed classroom manage-
ment approaches from an international perspective and concluded that there seem to be
differences only with respect to discipline problems between individualistic and collectivistic
states rather than between single countries. Seventeen experts on classroom management
validated 12 scenarios that illustrated problems with respect to the three facets of classroom
management (monitoring, managing momentum, or rules and routines), as well as responses that
were derived from evidence-based and theoretical principles. Experts agreed that the scenarios
constituted appropriate examples of the respective classroom management facet. The responses
of the experts also determined the rank order for scoring the participants’ responses. A pilot
study confirmed the three-dimensional factor structure according to the three facets of classroom
management with moderate intercorrelations. The reliability was acceptable and comparable to
the reliability of classroom management scales of the pedagogical–psychological knowledge
tests in other studies (e.g., COACTIV: α = .65, TEDS-M: expected a posteriori estimation
reliability: .65). Significant mean differences between preservice teachers and inservice teachers
indicated sufficient sensitivity of the test to differences in expertise.
A validation study with a larger sample of bachelor students, master students, and teachers
replicated the three-dimensional factor structure. The test was (partial) scalar invariant across
the three groups, which is required for comparisons between groups (Dimitrov, 2010).
Multigroup analysis revealed significant latent mean differences between bachelor students and
teachers with respect to all classroom management facets. Master students outperformed
bachelor students regarding monitoring and rules and routines but had a significantly lower
latent mean than the teachers regarding monitoring. The latent mean pattern indicated sensitivity
to differences in expertise.
Taking research on the internal structure of SJTs into account, it is often unclear how to
interpret factor structures as a result of multidimensionality. McDaniel and Whetzel (2005)
reported that SJT items usually measure multiple constructs, such as personality or cognitive
characteristics, and thus are difficult to replicate in a new sample. Consequently, the replication
of an interpretable three-dimensional factor structure of the SJT-CM differs from existing
research on SJTs in terms of personnel selection. The difference between these results may be
due to the model-based development of the scenarios. Most item stems of SJTs are based on the
critical incident method (Weekley et al., 2006) using examples of good and poor job
performance reported by subject matter experts. In contrast, our model-based approach relied on
evidence-based facets of effective classroom management. This approach possibly leads to a
higher discrimination of situations with respect to theoretical facets, and thus a clearer factor
structure. Nevertheless, decision making is a complex process, and we cannot exclude the
possibility that the beliefs of teachers about classroom management (Martin, Yin, & Baldwin,
1998) affect the evaluation of courses of action. Consequently, further research on SJTs should
investigate whether the choice between a model-based approach and the critical incident method
influences the dimensionality of the measured construct.
One limitation of the study is the poor internal consistency of managing momentum and rules
and routines and even unacceptable internal consistency of monitoring in the validation study.
Low internal consistencies of the SJT-CM may be caused by the short scales that consisted of
only three to four items each (Cortina, 1993). Test developers should take this point into account
for future SJTs. However, the average internal consistency of SJTs in a meta-analysis of Catano,
Brochu, and Lamerson (2012) was .46 (CI ± .05), which is comparable to the scale monitoring
with the lowest internal consistency of the three. Some researchers in the field of SJTs regard
alpha, as used in the aforementioned meta-analysis, as an inappropriate reliability coefficient for
SJTs, due to heterogeneity within one response (McDaniel et al., 2007; Motowidlo, Dunnette, &
Carter, 1990; Schmitt & Chan, 2006; Whetzel & McDaniel, 2009). They proposed calculating
the test–retest reliability of SJTs. Because our sample size for repeated measures was too small
to confirm the assumption of essential tau-equivalence for interpreting the correlation as
reliability, we calculated the test–retest correlation of the SJT-CM, which was high, and
interpreted it cautiously as an indicator of stability.
However, good reliability alone does not provide information about the validity of a test, and
several forms of validation are necessary (Cortina, 1993; Schmitt, 1996). Despite low reliability,
we confirmed content and construct validity of the test. Furthermore, the SJT-CM identified
differences in knowledge of different groups of expertise in both the pilot and the validation
study. This was particularly the case for monitoring, which had the lowest reliability. Based on
the high test–retest correlations and the confirmed content and construct validity, as well as
sensitivity of the test, we concluded that it would be appropriate to investigate the intra- and
interindividual development of strategic knowledge of classroom management during teacher
education. Nevertheless, the test should not be used for individual diagnoses. If used in research,
latent structural equation models like those used here can help to avoid potential problems that
may occur due to the modest reliabilities. In structural equation models, measurement error is
clearly separated from true individual differences, making low reliabilities less of an issue when
facets of classroom management are related to other variables.


Another limitation of the present study is the lack of information on whether the test scores
predict effective teaching in real classrooms. Initial studies indicated that
pedagogical–psychological knowledge is positively related to student ratings of cognitive activation and the
pace of instruction (Voss et al., 2011). Future studies should assess such criterion-related or
predictive measures in order to further validate the SJT-CM.
One field of application for the SJT-CM is the investigation of training effects in intervention
studies. When preservice teachers begin to teach, they feel extremely unprepared with respect to
classroom management (Moore, 2003; Swabey, Castleton, & Penney, 2010) and classroom
management is underrepresented as a topic in teacher education programs (Jones, 2006). Prior
research has revealed that student teacher education programs increase perceived preparedness
in managing classrooms (O’Neill & Stephenson, 2012; Woodcock & Reupert, 2012).
An interesting research question is whether and to what extent classes and trainings enhance
strategic knowledge of classroom management. In the last decade, the first large-scale studies
have been conducted as a response to the lack of research in the field of knowledge acquisition
during teacher education programs. This work focused on content knowledge or pedagogical
content-knowledge in mathematics. In particular, because beginning teachers often struggle with
classroom management, future intervention studies should address the promotion of strategic
knowledge of classroom management by implementing practice-oriented elements such as case-
based and situated approaches (Lundeberg, Levin, & Harrington, 1999), reflections on teaching
(Schön, 1983), or practical experiences. The effects of such trainings can now be assessed by
the SJT-CM.
A rather practical field of application is the use of the SJT-CM as a training tool. Student
teachers could work on the scenarios in their classes and discuss the courses of action and the
related theoretical facets, weigh the pros and cons of each strategy, and attempt both to anticipate
the respective effect on students and to transfer the theoretical aspects to new scenarios,
which could be based on their own teaching experiences. They could also choose one course of
action, compare it with the expert rating, and reflect on possible reasons for differences between
their own and the expert rating. Furthermore, following a rule-example approach, theoretical
aspects could be illustrated by means of the scenarios. In addition, following an example-rule
approach (Seidel, Blomberg, & Renkl, 2013), student teachers could work on the scenarios
and derive theoretical facets of classroom management. Such procedures could help to bridge
the theory–practice gap in teacher education (Brouwer & Korthagen, 2005; Korthagen &
Kessels, 1999).

ACKNOWLEDGMENTS

This study was funded by the German Federal Ministry of Education and Research (BMBF)
under the project number: 01JH0916. Many thanks go to Christian Geiser for helpful advice and
feedback on earlier versions of the paper.

REFERENCES

Arlin, M. (1979). Teacher transitions can disrupt classroom time flow. American Educational Research Journal, 16,
42–56. doi:10.3102/00028312016001042
Baumert, J., & Kunter, M. (2013). The COACTIV model of teachers’ professional competence. In M. Kunter, J. Baumert,
W. Blum, U. Klusmann, S. Krauss, & M. Neubrand (Eds.), Cognitive activation in the mathematics classroom and
professional competence of teachers. Results from the COACTIV project (pp. 25–48). New York, NY: Springer.
Berliner, D. C. (2005). The near impossibility of testing for teacher quality. Journal of Teacher Education, 56, 205–213.
doi:10.1177/0022487105275904
Blömeke, S., Felbrich, A., Müller, C., Kaiser, G., & Lehmann, R. (2008). Effectiveness of teacher education. State of
research, measurement issues and consequences for future studies. The International Journal on Mathematics
Education, 40, 719–734. doi:10.1007/s11858-008-0096-x
Bohn, C. M., Roehrig, A. D., & Pressley, M. (2004). The first days of school in the classrooms of two more effective and
four less effective primary-grades teachers. Elementary School Journal, 104, 269–287.
Brophy, J., & Evertson, C. (1978). Context variables in teaching. Educational Psychologist, 12, 310–316. doi:10.1016/
0742-051X(88)90020-0
Brouwer, N., & Korthagen, F. (2005). Can teacher education make a difference? American Educational Research
Journal, 42, 153–224.
Brouwers, A., & Tomic, W. (2000). A longitudinal study of teacher burnout and perceived self-efficacy in classroom
management. Teaching and Teacher Education, 16, 239–253. doi:10.1016/S0742-051X(99)00057-8
Carter, K., & Doyle, W. (2006). Classroom management in early childhood and elementary classrooms.
In C. M. Evertson & C. S. Weinstein (Eds.), Handbook of classroom management. Research, practice, and
contemporary issues (pp. 373–406). Mahwah, NJ: Erlbaum.
Catano, V. M., Brochu, A., & Lamerson, C. D. (2012). Assessing the reliability of situational judgment tests used in high-
stakes situations. International Journal of Selection and Assessment, 20, 333–346. doi:10.1111/j.1468-2389.2012.
00604.x
Christian, M. S., Edwards, B. D., & Bradley, J. C. (2010). Situational judgment tests: Constructs assessed and a meta-
analysis of their criterion-related validities. Personnel Psychology, 63, 83–117. doi:10.1111/j.1744-6570.2009.01163.x
Cortina, M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied
Psychology, 78, 98–104. doi:10.1037/0021-9010.78.1.98
D’Agostino, J. V., & VanWinkle, W. H. (2007). Identifying prepared and competent teachers with professional
knowledge tests. Journal of Personnel Evaluation in Education, 20, 65–84. doi:10.1007/s11092-007-9047-2
Dimitrov, D. M. (2010). Testing for factorial invariance in the context of construct validation. Measurement and
Evaluation in Counseling and Development, 43, 121–149. doi:10.1177/0748175610373459
Doyle, W. (2006). Ecological management and classroom management. In C. M. Evertson & C. S. Weinstein (Eds.),
Handbook of classroom management. Research, practice, and contemporary issues (pp. 97–126). Mahwah, NJ:
Erlbaum.
Dreikurs, R., Grunwald, B., & Pepper, F. (1982). Maintaining sanity in the classroom: Classroom management
techniques (2nd ed.). New York, NY: Macmillan.
Elias, M. J., & Schwaab, Y. (2006). From compliance to responsibility: Social and emotional learning and classroom
management. In C. M. Evertson & C. S. Weinstein (Eds.), Handbook of classroom management: Research, practice,
and contemporary issues (pp. 309–342). Mahwah, NJ: Erlbaum.
Emmer, E. T., Evertson, C. M., & Anderson, L. (1980). Effective classroom management at the beginning of the school
year. Elementary School Journal, 80, 219–231.
Emmer, E., & Gerwels, M. C. (2006). Classroom management in middle school and high school classrooms.
In C. M. Evertson & C. S. Weinstein (Eds.), Handbook of classroom management: Research, practice, and
contemporary issues (pp. 407–438). Mahwah, NJ: Erlbaum.
Emmer, E. T., & Stough, L. M. (2001). Classroom management: A critical part of educational psychology, with
implications for teacher education. Educational Psychologist, 36, 103–112. doi:10.1207/S15326985EP3602_5
Educational Testing Service. (2011). Principles of learning and teaching: Grades K–6 (0522). Retrieved from http://
www.ets.org/s/praxis/pdf/0622.pdf
Evertson, C. M., & Emmer, E. T. (1982). Effective management at the beginning of the year in junior high classes.
Journal of Educational Psychology, 74, 485–498. doi:10.1037/0022-0663.74.4.485
Evertson, C. M., & Emmer, E. T. (2012). Classroom management for elementary teachers (9th ed.). New York, NY:
Addison Wesley.
Evertson, C. M., & Weinstein, C. S. (Eds.). (2006). Handbook of classroom management. Research, practice, and
contemporary issues. Mahwah, NJ: Erlbaum.
Fenstermacher, G. D. (1994). The knower and the known: The nature of knowledge in research on teaching. Review of
Research in Education, 20, 3–56.
Friedman, I. A. (2006). Classroom management and teacher stress. In C. M. Evertson & C. S. Weinstein (Eds.),
Handbook of classroom management. Research, practice, and contemporary issues (pp. 925–945). Mahwah, NJ:
Erlbaum.
Gettinger, M., & Kohler, K. M. (2006). Process-outcome approaches to classroom management and effective teaching.
In C. M. Evertson & C. S. Weinstein (Eds.), Handbook of classroom management. Research, practice, and
contemporary issues (pp. 73–95). Mahwah, NJ: Erlbaum.
Goldhaber, D. D., & Anthony, E. (2007). Can teacher quality be effectively assessed? National Board Certification as a
signal of effective teaching. The Review of Economics and Statistics, 89, 134–150. doi:10.1162/rest.89.1.134
Good, T. L., & Brophy, J. E. (2000). Looking in classrooms (8th ed.). New York, NY: Addison-Wesley.
Grossman, P. L. (1990). The making of a teacher. New York, NY: Teachers College Press.
Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. New York, NY:
Routledge.
Hoyt, W. T. (2000). Rater bias in psychological research: When is it a problem and what can we do about it?
Psychological Methods, 5, 64–86. doi:10.1037/1082-989X.5.1.64
Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria
versus new alternatives. Structural Equation Modeling, 6, 1–55. doi:10.1037/1082-989X.3.4.424
Interstate New Teacher Assessment and Support Consortium. (2011). InTASC Model Core Teaching Standards:
A resource for state dialogue. Retrieved from http://www.ccsso.org/Documents/2011/InTASC_Model_Core_Teaching_Standards_2011.pdf
Johnston, B. D. (1995). “Withitness”: Real or fictional? The Physical Educator, 52, 22–28.
Jones, V. F. (2006). How do teachers learn to be effective classroom managers? In C. M. Evertson & C. S. Weinstein
(Eds.), Handbook of classroom management. Research, practice and contemporary issues (pp. 887–907). Mahwah,
NJ: Erlbaum.
König, J., Blömeke, S., Paine, L., Schmidt, W. H., & Hsieh, F.-J. (2011). General pedagogical knowledge of future
middle school teachers: On the complex ecology of teacher education in the United States, Germany, and Taiwan.
Journal of Teacher Education, 62, 188–201. doi:10.1177/0022487110388664
Korthagen, F. A., & Kessels, J. P. (1999). Linking theory and practice: Changing the pedagogy of teacher education.
Educational Researcher, 28(4), 4–17.
Kounin, J. S. (1970). Discipline and group management in classrooms. New York, NY: Holt, Rinehart, & Winston.
Kounin, J. S., & Doyle, P. H. (1975). Degree of continuity of a lesson’s signal system and the task involvement of
children. Journal of Educational Psychology, 67, 159–164. doi:10.1037/h0076999
Kunter, M., Baumert, J., Blum, W., Klusmann, U., Krauss, S., & Neubrand, M. (Eds.). (2013). Cognitive activation in the
mathematics classroom and professional competence of teachers. Results from the COACTIV project. New York, NY:
Springer.
Leinhardt, G., McCarthy Young, K., & Merriman, J. (1995). Integrating professional knowledge: the theory of practice
and the practice of theory. Learning and Instruction, 5, 401–408. doi:10.1016/0959-4752(95)00025-9
Lievens, F., & Sackett, P. (2012). The validity of interpersonal skills assessment via situational judgment tests
for predicting academic success and job performance. Journal of Applied Psychology, 97, 460–468. doi:10.1037/
a0025741
Lundeberg, M., Levin, B., & Harrington, H. (1999). Who learns what from cases and how? The research base for
teaching and learning with cases. Mahwah, NJ: Erlbaum.
Martin, N. K., Yin, Z., & Baldwin, B. (1998). Construct validation of the attitudes and beliefs on classroom control
inventory. Journal of Classroom Interaction, 33, 6–15.
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. L. (2007). Situational judgment tests, response
instructions and validity: A meta-analysis. Personnel Psychology, 60, 63–91. doi:10.1111/j.1744-6570.2007.00065.x
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. P. (2001). Predicting job
performance using situational judgment tests: A clarification of the literature. Journal of Applied Psychology, 86,
730–740. doi:10.1037/0021-9010.86.4.730
McDaniel, M. A., & Nguyen, N. T. (2001). Situational judgment tests: A review of practice and constructs assessed.
International Journal of Selection and Assessment, 9, 103–113. doi:10.1111/1468-2389.00167
McDaniel, M. A., & Whetzel, D. L. (2005). Situational judgment test research: Informing the debate on practical
intelligence theory. Intelligence, 33, 515–525.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
Moore, R. (2003). Reexamining the field experiences of preservice teachers. Journal of Teacher Education, 54, 31–42.
doi:10.1177/0022487102238656
Motowidlo, S. J., Dunnette, M. D., & Carter, G. W. (1990). An alternative selection procedure: The low-fidelity
simulation. Journal of Applied Psychology, 75, 640–647.
Motowidlo, S. J., Hooper, A. C., & Jackson, H. L. (2006). Implicit policies about relations between personality traits and
behavioral effectiveness in situational judgment items. Journal of Applied Psychology, 91, 749–761. doi:10.1037/
0021-9010.75.6.640
Moyer-Packenham, P. S., Bolyard, J. J., & Kitsantas, A. (2008). The assessment of mathematics and science teacher
quality. Peabody Journal of Education, 83, 562–591. doi:10.1080/01619560802414940
Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Author.
National Board for Professional Teaching Standards. (2002). What teachers should know and be able to do. Arlington,
VA: Author. Retrieved from http://www.nbpts.org/sites/default/files/documents/certificates/what_teachers_should_
know.pdf
Nichols, L. S., & Mundt, J. P. (1996). Surviving the first year of teaching: Perceptions of critical competencies from four
educational perspectives. Journal of Family and Consumer Sciences Education, 14, 23–39.
O’Neill, S., & Stephenson, J. (2012). Does classroom management coursework influence pre-service teachers’ perceived
preparedness or confidence? Teaching and Teacher Education, 28, 1131–1143. doi:10.1016/j.tate.2012.06.008
Renkl, A., Mandl, H., & Gruber, H. (1996). Inert knowledge: Analyses and remedies. Educational Psychologist, 31,
115–121. doi:10.1207/s15326985ep3102_3
Satorra, A., & Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment structure analysis.
Psychometrika, 66, 507–514. doi:10.1007/BF02296192
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350–353. doi:10.1037/1040-
3590.8.4.350
Schmitt, N., & Chan, D. (2006). Situational judgment tests: Method or construct? In J. A. Weekley & R. E. Ployhart (Eds.),
Situational judgment tests: Theory, measurement, and application (pp. 135–155). Mahwah, NJ: Erlbaum.
Schön, D. A. (1983). The reflective practitioner. How professionals think in action. New York, NY: Basic Books.
Seidel, T., Blomberg, G., & Renkl, A. (2013). Instructional strategies for using video in teacher education. Teaching and
Teacher Education, 34, 56–65.
Seidel, T., & Shavelson, R. J. (2007). Teaching effectiveness research in the last decade: Role of theory and research
design in disentangling meta-analysis results. Review of Educational Research, 77, 454–499. doi:10.3102/
0034654307310317
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(2), 4–14.
Simonsen, B., Fairbanks, S., Briesch, A., Myers, D., & Sugai, G. (2008). Evidence-based practices in classroom
management: Considerations for research to practice. Education and Treatment of Children, 31, 351–380.
doi:10.1353/etc.0.0007
Slavin, R. E. (2003). Educational psychology: Theory and practice (7th ed.). Boston, MA: Allyn & Bacon.
Smith, H. (1985). The marking of transitions by more and less effective teachers. Theory Into Practice, 24, 57–62.
Stage, S. A., & Quiroz, D. R. (1997). A meta-analysis of interventions to decrease disruptive classroom behavior in
public education. School Psychology Review, 26, 333–369.
Stemler, S. E., & Sternberg, R. J. (2006). Using situational judgment tests to measure practical intelligence.
In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests. Theory, measurement, and application
(pp. 107–131). San Francisco, CA: Jossey-Bass.
Swabey, K., Castleton, G., & Penney, D. (2010). Meeting the standards? Exploring preparedness for teaching. Australian
Journal of Teacher Education, 35(8), 29–46.
Tatto, M. T., Schwille, J., Senk, S., Ingvarson, L., Peck, R., & Rowley, G. (Eds.). (2008). Teacher Education and
Development Study in Mathematics (TEDS–M): Conceptual framework. East Lansing: Teacher Education and
Development International Study Center, College of Education, Michigan State University.
Templer, K. J., & Lange, S. R. (2008). Internet testing: Equivalence between proctored lab and unproctored field
conditions. Computers in Human Behavior, 24, 1216–1228. doi:10.1016/j.chb.2007.04.006
Veenman, S. (1984). Perceived problems of beginning teachers. Review of Educational Research, 54, 143–178. doi:10.
2307/1170301
Voss, T., Kunter, M., & Baumert, J. (2011). Assessing teacher candidates’ general pedagogical and psychological
knowledge: Test construction and validation. Journal of Educational Psychology, 103, 952–969. doi:10.1037/
a0025125
Wang, M. C., Härtel, G. D., & Walberg, H. J. (1993). Toward a knowledge base for school learning. Review of
Educational Research, 63, 249–294. doi:10.1177/0013164407305592
Wayne, A. J., & Youngs, P. (2003). Teacher characteristics and student achievement gains: A review. Review of
Educational Research, 73, 89–122. doi:10.3102/00346543073001089
Weekley, J. A., Ployhart, R. E., & Holtz, B. C. (2006). On the development of situational judgment tests: Issues in item
development, scaling, and scoring. In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory,
measurement, and application (pp. 157–182). Mahwah, NJ: Erlbaum.
Weinstein, C. (1979). The physical environment of the school: A review of the research. Review of Educational
Research, 49, 557–610. doi:10.3102/00346543049004577
Weinstein, C. S. (1999). Reflections on best practices and promising programs. In H. Freiberg (Ed.), Beyond
behaviorism: changing the classroom management paradigm (pp. 145–163). Boston, MA: Allyn & Bacon.
Weinstein, C. S., & Mignano, A. J., Jr. (2006). Elementary classroom management: Lessons from research and practice.
New York, NY: McGraw-Hill.
Westerman, D. A. (1991). Expert and novice teacher decision making. Journal of Teacher Education, 42, 292–305.
doi:10.1177/002248719104200407
Whetzel, D. L., & McDaniel, M. A. (2009). Situational judgment tests: An overview of current research. Human
Resource Management Review, 19, 188–202. doi:10.1016/j.hrmr.2009.03.007
Widaman, K. F., & Reise, S. P. (1997). Exploring the measurement invariance of psychological instruments:
Applications in the substance use domain. In K. J. Bryant, M. Windle, & S. G. West (Eds.), The science of prevention:
Methodological advances from alcohol and substance abuse research (pp. 281–324). Washington, DC: American
Psychological Association.
Woodcock, S., & Reupert, A. (2012). A cross-sectional study of student teachers’ behaviour management strategies
throughout their training years. The Australian Educational Researcher, 39, 159–172. doi:10.1007/s13384-012-0056-x
Wubbels, T. (2011). An international perspective on classroom management: What should prospective teachers learn?
Teaching Education, 22, 113–131. doi:10.1080/10476210.2011.567838
