
Behavior Research Methods

https://doi.org/10.3758/s13428-022-01799-3

Self-report and facial expression indicators of team cohesion development
Justin W. Bonny1 

Accepted: 14 January 2022


© The Psychonomic Society, Inc. 2022

Abstract
Ad hoc teams are formed to complete tasks across formal and informal environments. More effective teams tend to report
higher cohesion, more strongly identifying as a group. Dynamic theories of team processes suggest that cohesion changes as
teams form and perform to achieve a goal. The present research examined approaches for rapidly measuring team dynamics
to investigate how cohesion developed in newly formed teams as they completed a series of video game matches. Self-report
ratings of cohesion were collected via manikin-based measures designed to be rapidly completed. In addition, the emotion
valence and arousal of facial expressions of teammates were estimated via video recordings. Results suggested that perceptions
of cohesion rapidly changed as teams completed video game matches. The present study indicates that manikin-based
self-report measures and emotion valence of facial expressions are dynamic and could be used as behavioral indicators of
team cohesion development.

Keywords  Video games · Teams · Cohesion · Emotion · Facial expressions

* Justin W. Bonny
justin.bonny@morgan.edu

1 Department of Psychology, Morgan State University, 1700 East Cold Spring Lane, Baltimore, MD, USA

Introduction

Small teams play a key role in the effectiveness of many professional and sports organizations (Elbardissi et al., 2013; Huckman et al., 2009). Teams are composed of two or more interdependent individuals that interact to achieve common goals (Mathieu et al., 2019), and previous research suggests that effective teams tend to have higher cohesion in addition to higher performance (Beal et al., 2003; Carron, Bray, & Eys, 2002a; Zaccaro et al., 1995). Cohesion refers to the extent to which individuals are drawn together as a group (Festinger, 1950; Mullen & Copper, 1994) and has been found to change as teams accumulate experience performing a task (Harrison et al., 1998). With evidence of dynamic changes in team processes, monitoring the cohesion of teams while performing could allow for issues to be quickly identified as teams work towards completing a goal. However, current measures of cohesion typically require the use of long-form questionnaires, which are limited in their use as real-time indicators of cohesion. Recent studies examining the use of brief, manikin-based self-report scales and the facial expressions of teammates offer potential methods for developing indicators of team cohesion in near real time. Questions remain regarding whether such measures are sensitive to longitudinal changes as teams form and gain experience working together and reflect self-reported team cohesion. The goals of the present study were to examine whether manikin-based ratings and facial expressions of teammates changed as newly formed teams accumulated experience and reflected team cohesion rather than external factors.

Dynamics of team processes

Cohesion likely varies as a team works towards a goal. The input-process-output (IPO) model of team functioning proposes that task and teammate characteristics (input) influence team communication and coordination as members perform the task (process) to achieve the objective (output; Gino et al., 2010; Hackman & Morris, 1975; Kozlowski & Ilgen, 2006). It has been extended to incorporate the dynamic and cyclical nature of team functioning (Ilgen et al., 2005; Marks et al., 2001). Team processes are modeled as phasic, with multiple IPO episodes occurring as a team forms and completes goals in order to achieve the task objective (Ilgen et al., 2005). The outcomes of the previous phase


influence the input and processes of the subsequent phase (Marks et al., 2001). Cohesion is considered to be an emergent state in the IPO model, capturing member attitudes, emotions, and task context; it changes as teams perform and can be included as input in subsequent phases (Marks et al., 2001). The dynamic nature of cohesion as an emergent state indicates that it likely varies as teams accumulate experience and progress through IPO episodes. As such, it is predicted that real-time measures of team functioning that are associated with cohesion will change as teams are performing to achieve the task objective.

A relatively unexplored question is how ad hoc team cohesion develops in near real time while completing an exercise. Past research has typically examined the development of team cohesion across large temporal scales, such as weeks-long projects (Chang & Bordia, 2001; Terborg et al., 1976). In comparison, ad hoc groups of members that are assembled for a specific, time-limited situation are prevalent in multiple fields including trauma units (Roberts et al., 2014), military units (Ben-Shalom et al., 2005), and emergency response coalitions (McMaster & Baber, 2012). As present measures of team cohesion typically rely on members having collaborated for an extended period of time (Brawley et al., 1987), it is unclear to what extent cohesion is relatively static or dynamically changing as an ad hoc team forms and performs to reach a goal. This is particularly important when considering that teams can be relatively diverse when it comes to strategizing and executing to reach a goal (Ancona, 1990). The goal of the present research was to examine how cohesion unfolds and changes in real time while ad hoc teams complete an exercise.

Manikin-based scales and team cohesion

To evaluate cohesion as teams are completing an exercise, near-real-time measures are required. Current widely used measures, such as the Group Environment Questionnaire, typically present a set of items to each team member to complete after a group activity (Brawley et al., 1987; Eys et al., 2007). This format is not well suited for rapid responses (e.g., large number of response items) while an activity is underway. As such, current self-report measures of team cohesion are limited in their ability to examine how cohesion varies during a team exercise.

Brief self-report measures have been effectively utilized to assess components of emotion responses to stimuli. Specifically, single-item self-assessment manikins have been found to be reliable and valid real-time measures of facets of emotion, including arousal, valence, and dominance (Bradley & Lang, 1994). Manikin-based scales differ from more commonly used rating scales in that they are designed to visually, rather than verbally, represent a continuous psychological dimension (Morris, 1995). As applied to dimensions of emotion, such as valence, a series of human-like figures includes iconic depictions associated with valence such as a frown or smile and raised or un-raised cheeks. After being provided with instructions about the manikins and what the visualizations represent, respondents are then presented with the scales and asked to select the manikin item that best matches the psychological construct of interest. Responses on these scales have been found to align with physiological recordings, providing evidence that these brief self-report measures can be used to assess near-real-time emotion responses (Lang et al., 1993).

An exploratory study used a set of manikin measures to assess team cohesion in near real time. Specifically, Bonny (2018) constructed a group-based manikin scale based on three-factor models of cohesion, proposed by Festinger (1950) and supported by meta-analyses investigating connections between team cohesion and performance (Mullen & Copper, 1994). Utilizing parallels to physical forces, team cohesion has been argued to be composed of multiple components: interpersonal attraction (the extent to which members are drawn together), task commitment (the degree to which members pursue goal-related tasks), and group pride (the extent to which members support and take pleasure in the success of the team; Festinger, 1950; Mullen & Copper, 1994). The manikin scale was designed to visually depict groups of individuals that varied across these dimensions. In addition, the scale was designed to be rapidly completed by team members, in line with the dynamic IPO model (Ilgen et al., 2005; Marks et al., 2001). Results indicated that the manikin scale responses were correlated with established post hoc measures of cohesion and changed during the team-based tasks. This suggested that, like emotion, manikin-based measures could be used to track changes in group-based constructs, such as cohesion. However, Bonny (2018) was limited in the number of participants collected from ad hoc groups. Further research was required to identify whether such manikin-based measures were, indeed, valid indicators of team cohesion.

Expressed emotions and team dynamics

Evidence suggests that the facial expressions of teammates can relate to aspects of team dynamics. Indeed, connections with emotions expressed by teammates have been observed with team performance (Chikersal et al., 2017; Moll et al., 2010) and ratings of cohesion (Mønster et al., 2016). A benefit of using such measures over self-report scales is the ability to record data in real time without any response requirement from teammates. Facial expressions of co-located teammates may serve as a nonverbal communication channel regarding team processes. Displays of emotion, including facial expressions, gaze direction, and motor actions among members, can become coordinated and converge while


teams are interacting to complete a task (Parkinson, 2020). Additionally, the composition of affective states of groups is proposed to be dynamic (Kelly & Barsade, 2001). The amount of positive versus negative emotion self-reported by members of small teams has been found to be connected with performance (Jung, 2016), and the amount of positive emotion displayed by soccer players during matches correlated with team outcomes (Moll et al., 2010). As models of team dynamics distinguish between cohesion and performance, an outstanding question is to what extent the emotion expressed by members can be used to predict the cohesion, in addition to the performance, of the team.

A concern regarding the use of group measures of psychophysiological signals, such as synchrony, to infer team dynamics is whether the source of said synchrony is the team. Factors external to a team could lead to similarities in such measures, resulting in incorrect conclusions about team dynamics (Burgess, 2013). For example, team members similarly displaying smiles could be mistaken as evidence of a cohesive team when it was instead due to external influences, such as watching the same event occur on a video screen. A previous study by Mønster et al. (2016) included a method to address these concerns. In the study, small teams were given a construction task where they were asked to produce as many items as possible while physiological signals were recorded. To evaluate the extent to which physiological measures were sensitive to team dynamics, team-wide scores were calculated in two conditions, using members of the same team (real team) and members from different teams (pseudo team). In this design, similar values for scores calculated for groups composed of members of the same and different teams would indicate that team synchrony was due to external factors, in line with Burgess (2013). In the study, of the measures recorded, synchrony in electromyography activity from the zygomaticus major (a muscle involved in smiling) and electrodermal activity (skin conductance) was greater for the real compared to pseudo teams. This suggested that these physiological signals were sensitive to team membership. Furthermore, teams displaying greater synchrony in muscle activity associated with smiles during the task self-reported higher post hoc ratings of team cohesion (Mønster et al., 2016). This suggested that physiological signals associated with smiling could indicate team dynamics. Emotion displayed via facial expressions has also been found to correlate with aspects of team performance. In a study by Chikersal et al. (2017), dyads were video-recorded during completion of a series of tasks. Using facial recognition software, the valence of facial expressions of each partner was extracted, ranging from positive (e.g., happy) to negative (e.g., sad). When examining task performance, dyads with higher synchrony in facial expressions tended to have higher performance (Chikersal et al., 2017). Both studies indicated that facial expressions of emotion, specifically the valence expressed by teammates, may correlate with aspects of team effectiveness.

Present study

The present research examined whether cohesion manikin scale ratings and facially expressed emotions of teammates change with accumulated experience and indicate post hoc team cohesion. Groups of participants were recruited to complete a series of team-based video game matches. In this game, players controlled virtual cars as they “hit” a ball around an arena, attempting to score on the opponent team's goal. This type of team-based video game was selected for three main reasons. First, team-based video games have been used in previous research to study team dynamics (Bonny, 2018; Freeman & Wohn, 2019). Second, despite being a game played by maneuvering a virtual car via a controller, the mechanics of the game were similar to the sport of soccer, which has been studied in previous team research (e.g., Moll et al., 2010). Third, although a team-based game, individual players had much freedom regarding whether to coordinate their actions with their teammates or act independently. To characterize teams in the present study relative to past research, the dimensional scale model of team structure was used (Hollenbeck et al., 2012). The framework proposes that teams can be distinguished by three dimensions: skill differentiation, authority differentiation, and temporal stability. Within this framework, the ad hoc teams in the present study were low in temporal stability (formed to complete a single session), low in skill differentiation (each member could perform the same set of actions in the video game), and low in authority differentiation (there was no pre-designated team leader).

A set of brief cohesion manikin scales (Bonny, 2018) was modified and administered to participants after each video game match was completed. These modifications included the addition of a social bonding item and a group viability item in response to weak correlations observed in Bonny (2018) between cohesion manikin scale responses and Group Environment Questionnaire scales reflecting social integration of the team and a gap in the measure for assessing the likelihood of individuals remaining as a team (Bell & Marentette, 2011). In addition to manikin scales, the present study collected estimates of the valence and arousal of facial expressions of teammates. Past studies observed that synchrony in muscle activity related to smiles of team members was correlated with cohesion (Mønster et al., 2016) and that estimated valence of expressed emotions was associated with performance (Chikersal et al., 2017). Participants were video-recorded while completing the game, and software was used to estimate the valence (ranging from positive to negative) and arousal (ranging from activated to subdued)


of facial expressions, in line with the circumplex model of emotion (Lang et al., 1993).

Two types of team measures were calculated to examine the sensitivity of manikin and facial expression measures of teammates to changes in team dynamics: the mean level and the variability of measures across members. Previous research has used the mean of team member traits to examine how the level of the trait being expressed on the team relates to team functioning (Terborg et al., 1976). Team variability, specifically the normalized mean Euclidean distance (MEDn) score, has been used to characterize the diversity of a team with regard to a particular trait, with lower scores indicating greater teammate similarity (Biemann & Kearney, 2010). Although higher similarity scores for real versus pseudo teams would not be sufficient evidence to reject external factors as the source (cf. Burgess, 2013), the lack of such differences would indicate that the manikin and facial expression measures were not sensitive to factors internal to teams. To further test whether such measures were indicators of team cohesion, a questionnaire that has been previously observed to correlate with team cohesion and effectiveness, the Group Environment Questionnaire (GEQ; Brawley et al., 1987; Carron et al., 1985; Carron, Colman, et al., 2002b; Eys et al., 2007), was administered after participants completed all video game matches. The GEQ scales assess multiple aspects of team cohesion, specifically whether members felt that they were part of the group and whether all members coalesced into a group around completing the task at hand as well as social interactions (Brawley et al., 2002). Evidence of correlations between self-report ratings and facial expression measures and GEQ scales would further support such measures as being indicators of team cohesion development.

Three predictions were made. It was hypothesized that if manikin-based and facial expression measures were influenced by team dynamics, they would be more similar among members of the same versus opposing teams (Hypothesis 1) and would change as teams gained experience playing the video game (Hypothesis 2). To test the sensitivity of measures to team membership, for each testing session, two teams of participants competed against each other via the team-based video game. Third, it was hypothesized that if manikin-based and facial expression measures reflected team cohesion, they would correlate with team cohesion measured via traditional questionnaires administered after completing the video game matches (Hypothesis 3).

Method

Participants

A total of 249 individuals participated on teams in the present research. Data from one additional participant, who was a member of a team, were removed from further analysis due to missing data. A power analysis based on a correlation, r = .174, observed by Mønster et al. (2016) between self-report ratings regarding member positive affect towards the group and greater zygomaticus muscle activity, associated with smiling, indicated that statistical power of .80 would be achieved with a sample size of 254. The mean age of participants was 38.41 years (SD = 12.92, Min = 18, Max = 65), with a total of 147 identifying their gender as female (100 as male, 1 as transgender, and 1 as other), and, regarding self-identified race and ethnicity, 4 as Asian (no Hispanic), 140 as Black (6 as Hispanic), 101 as White (2 as Hispanic), and 4 as multiracial (1 as Hispanic). A total of 213 participants reported playing video games during a typical week, and 65 reported having either played or watched the targeted video game Rocket League (43 having played; 184 having no experience with the game). The research protocol was approved by local and US federal independent review boards.

Software and materials

A camcorder (Sony Handycam CX405) was placed on a tripod in front of each participant, focusing on the upper body, including their face. During the study, the camcorder recorded participants using standard definition resolution (1280 × 768 pixels). Participants were seated around a square table with their teammates. In the same room was the opponent team, seated in a similar configuration. A fabric curtain separated the two teams such that members could not view, but could hear audio from, the opposing team.

Each participant had a gaming-ready laptop computer (39.6 cm diagonal screen; 1920 × 1080-pixel resolution) to complete the team-based video game. A game controller (Microsoft Xbox One) was used to play the game. An adjacent touchscreen laptop computer was used to complete self-report measures during the study.

The team-based video game was a car-based soccer game, Rocket League (Psyonix LLC). The game was based on the sport of soccer, but with remote-controlled vehicles replacing human players. Each player drove a virtual car around a virtual field to hit a large soccer ball into an opponent’s goal, and keep it out of their own goal. Verbal and written instructions were provided to participants on how to play the game prior to the start of the first game match.

Procedure

At the start of a testing session, two teams of participants were formed with the same number of members, ranging from two to four. The two teams competed against each other during the session. Prior to team assignment, participants reported on average how much time per week they had played Rocket League over the past two years (0 = I have


never played this game, 1 = 5 minutes, 2 = 15 minutes, 3 = 45 minutes, 4 = 1 hour, 5 = 2 hours, 6 = 5 hours, 7 = 10 hours, 8 = 15 hours, 9 = more than 15 hours) and how familiar they were with the other participants present during the study session using a modified inclusion of the other in the self scale with low familiarity indicated by two non-overlapping circles (1) to two circles almost completely overlapping (7; Aron et al., 1992; Gächter et al., 2015). For each session, participants were sorted into two teams to minimize inter-team differences in Rocket League experience and intra-team member familiarity. A total of 72 teams took part in the study (N four-person teams = 40; N three-person teams = 26; N two-person teams = 6). After instructions for the video game were provided, participants were informed that the team that accrued the most goals during the session would earn a bonus of $5 each (all participants received $95 cash for their participation). Participants were given a paper packet of instructions for completing the manikin measures and for how to play the team-based video game.

Next, the researchers set each camcorder to start recording and the first match was started. Participants then completed a total of four matches of the team-based video game. Each match had five minutes of game time. The game timer was paused when a goal was scored, and the team with the most goals at the end of the match was declared the winner. In cases where a tie occurred at the end of the match, teams completed overtime where the first goal would end the match. At the end of each match, participants were able to view the performance statistics of their team and the opponent team. Afterwards, the researchers instructed participants to complete the cohesion assessment manikin (CAM) scales based on how they felt about their team at that moment. After the fourth match, the camcorders were turned off and participants were asked to complete self-report measures of cohesion. Participants were then informed of which team earned the bonus incentive. The testing session lasted approximately 80 minutes.

Fig. 1  Anchor images and accompanying statements for each item of the cohesion assessment manikin (CAM) scales. Instructions for the scale emphasized that participants select which item image (ranging from 1 to 7) best reflected how they felt about their team at the current moment

Team measures

Goals scored

For each team and match, the number of goals scored by a team was recorded to assess team performance. In addition, the total number of goals scored across all matches was calculated as an indicator of overall team performance.

Cohesion assessment manikin scales (CAM)

A set of pictographic self-report scales was developed to rapidly assess cohesion during the research session. Based on a previous study (Bonny, 2018), the cohesion assessment manikin (CAM) scales were developed to include one-item unipolar measures for components of team cohesion. A total of five scales were designed to reflect interpersonal attraction, task commitment, group pride, social bonding, and group viability [see Fig. 1; see Bonny (2022b) for participant instructions and full measure]. For each scale, a set of seven images each containing manikin figures and symbols was used to display an aspect of cohesion, ranging from high cohesion to low cohesion. Prior to the team-based tasks, participants were presented with written instructions


providing a description of each scale. During the group task, each scale was presented individually on a computer screen, with the images arranged horizontally and a response prompt displayed at the top. Accompanying the high and low anchor for each scale were short phrases presented to remind participants of the spectrum of the scale. The mean rating of all scales was calculated and used during analyses. Ratings were aligned with other cohesion measures, with higher scores indicating greater reported cohesion.

Group Environment Questionnaire

Team cohesion was assessed using a modified version of the GEQ, which has been previously found to be reliable and valid (Brawley et al., 2002). The four scales of the GEQ include perceptions of the group integrating around the task at hand (group integration-task; GIT), extent to which the group coalesced around social aspects of the group (group integration-social; GIS), individual drive to complete the task (attractions towards the group-task; ATGT), and individual interest in socially affiliating with group members (attractions towards the group-social; ATGS). To account for the ad hoc nature of the teams in the present study, statements within the original scale were adjusted, similar to previous research (Bonny, 2018). The scores for each scale were calculated for each participant using mean question responses. To create a team-level score, the mean of member responses was calculated for each GEQ scale, with higher scores indicating greater team cohesion.

Measures of facial emotion expression

Video recordings of each team member were analyzed using the FaceReader 8.0 video software package (Noldus Information Technology). The software uses facial recognition algorithms to estimate the presence and absence of facial action units and calculate the emotion being displayed by an individual, in alignment with the Facial Action Coding System (Ekman & Friesen, 1971). Each video was downsampled to 30 frames per second (FPS; every other frame of the original 60-FPS video). No other video editing was performed. For each analyzed video frame, the software package estimated the level of arousal and valence present in the facial expression using facial action units. Although the software provided estimates of multiple facial action units, arousal and valence were selected in line with previous research by Chikersal et al. (2017) and the circumplex model of emotion (Bradley & Lang, 1994), and to reduce the number of statistical tests compared to including all facial action unit scores. Scores for valence ranged from −1 to 1, negative to positive, respectively, and arousal scores ranged from 0 to 1, subdued to activated, respectively. When the software failed to estimate a measure, a value of “fit failed” was returned. This meant that the software was not able to estimate a facial expression for the video frame, not that a “neutral” expression was detected. Reasons for software failure to estimate emotion states for a video frame included parts of the face being occluded and talking.

For team-level measures to be calculated for a video frame, each member of a team required emotion state estimates.¹ There were multiple frames for which at least one member was missing an emotion estimate due to a failed fit by the software. Across all 72 teams and matches, there were a total of 3,800,442 video frames analyzed by the software (number of frames per team, per session: M = 52,783.92, SD = 5194.15, Min = 40,854, Max = 68,335). Of all frames collapsed across all teams, a total of 2,533,129 frames had values estimated for valence and arousal of all team members (66.65% of total frames). Within each team, the percentage of frames that had estimated values for all members varied (M = 66.54%, SD = 25.25%, Min = 2.53%, Max = 99.00%).

¹ Note that for the team with one participant missing data, each of the remaining team members was required to have an emotion measure to be included in analysis.

Team-level measure scores

To assess team-level measures, two scores were calculated for CAM and facial expression measures. To align with the single set of CAM ratings collected per match, facial expression measures were averaged for each match. For facial expression measures, video frames without an emotion measure for each team member were dropped from further analysis. For one team, during the third match, zero frames were retained and emotion measures were not calculated. The following team scores were calculated for each match: mean CAM, mean valence, mean arousal, MEDn CAM, MEDn valence, MEDn arousal, goals scored (see Table 1 for definitions; calculation formulas can be found in the Supplementary Materials). The diversity, and in turn similarity, in CAM and facial expression scores was calculated using the adjusted MEDn, reflecting the disparity between values of team members with smaller values indicating greater similarity (a value of zero indicating that the same values were provided by all members) and larger values indicating greater disparity (Biemann & Kearney, 2010).

Results

All analyses were conducted using associated R packages via two-tailed tests (α = .05). Graphs were produced using the ggplot2 R package (Wickham, 2016). For analyses including multiple regressions, Cohen's f² was calculated


Table 1  Team scores calculated for each video game match

Team score: Description

Mean CAM: The mean of member mean CAM scores (CAM_n) for a team, where CAM_n is the mean of member ratings for each CAM scale (attraction, pride, commitment, future, social)

Mean valence of facial expressions: The mean of frame team mean valence scores (Valence_f), where Valence_f is the mean of team member valence scores for a specific frame

Mean arousal of facial expressions: The mean of frame team mean arousal scores (Arousal_f), where Arousal_f is the mean of team member arousal scores for a specific frame

MEDn CAM: The MEDn of member mean CAM scores (CAM_n) for a team, where CAM_n is the mean of member ratings for each CAM scale (attraction, pride, commitment, future, social)

MEDn valence of facial expressions: The mean of frame team MEDn valence scores (Valence MEDn_f), where Valence MEDn_f is the MEDn of team member valence scores for a specific frame

MEDn arousal of facial expressions: The mean of frame team MEDn arousal scores (Arousal MEDn_f), where Arousal MEDn_f is the MEDn of team member arousal scores for a specific frame

Goals scored: Total number of goals scored by team members

Note. CAM = cohesion assessment manikin; MEDn = mean Euclidean distance normalized
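
As an illustration of how the frame-based scores in Table 1 could be assembled, the following Python sketch computes a match-level mean and a match-level dispersion score from per-frame valence values, dropping any frame in which a member lacks an estimate. It is a sketch under stated assumptions, not the study's analysis code: the function names, the use of `None` to stand for a "fit failed" frame, and the example values are all hypothetical. In particular, `pairwise_dispersion` (the mean absolute difference over member pairs) is only a stand-in for the adjusted MEDn, whose formula is given in the Supplementary Materials and is not reproduced here. The stand-in shares one stated property of MEDn: it equals zero when all members have the same value.

```python
from itertools import combinations
from statistics import mean

FIT_FAILED = None  # assumption: marker for a frame the software could not score


def pairwise_dispersion(values):
    """Mean absolute difference over all distinct member pairs.

    Stand-in for the adjusted MEDn (not the published formula).
    Requires at least two members, as in the two- to four-person teams.
    """
    return mean(abs(a - b) for a, b in combinations(values, 2))


def match_scores(frames):
    """Return (mean level, mean dispersion) for one team and one match.

    frames: list of per-frame lists holding one valence (or arousal)
    value per team member. Frames with any missing member estimate are
    dropped. Returns None when no frame is retained.
    """
    complete = [f for f in frames if all(v is not FIT_FAILED for v in f)]
    if not complete:
        return None
    frame_means = [mean(f) for f in complete]
    frame_dispersions = [pairwise_dispersion(f) for f in complete]
    return mean(frame_means), mean(frame_dispersions)


# Hypothetical three-person team; the second frame has a failed fit.
frames = [
    [0.2, 0.4, 0.6],
    [0.1, FIT_FAILED, 0.3],
    [0.0, 0.0, 0.0],
]
level, dispersion = match_scores(frames)
```

Averaging within each frame first and then across frames follows the order of operations described in Table 1; swapping in the published MEDn formula would only require replacing `pairwise_dispersion`.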

to estimate effect size. When included in analyses, linear mixed models were calculated using the lme4 R package (Bates et al., 2015), with degrees of freedom estimated using Satterthwaite's method via the lmerTest R package (Kuznetsova et al., 2017). To estimate the practical significance of predictors in linear mixed models, the effect size f² was computed using the marginal pseudo R² estimated via the MuMIn R package (Barton, 2018). Marginal pseudo R² estimates the amount of variance accounted for by fixed effects, independently of random effects (Nakagawa & Schielzeth, 2013). To estimate f² for a predictor, marginal pseudo R² was calculated when the predictor was and was not included in the model (Matsuno & Fujita, 2018; Selya et al., 2012). The datasets used for analyses can be accessed via the following online repository: Bonny (2022a).
When comparing competing teams across sessions, the difference between teams was significantly greater than zero for mean Rocket League experience (Wilcoxon V = 210, p < .001), M = 0.66, SD = 0.83, Min = 0, Max = 2.25, which includes 18 sessions where there was no difference, and marginally significant for mean teammate familiarity (Wilcoxon V = 15, p = .059), M = 0.04, SD = 0.12, Min = 0, Max = 0.50, which includes 31 sessions where there was no difference. Initial models investigating changes in team mean CAM and facial expression scores included these as statistical control variables.

Cohesion assessment manikin scale reliability

To assess inter-item reliability of the CAM scales, individual ratings provided after the fourth video game match were analyzed and compared to GEQ scale reliability scores. Fourth-match CAM scales were analyzed due to temporal proximity to the collection of GEQ ratings and since previous research reporting the psychometric properties of cohesion measures has typically used questionnaires collected after teams have finished completing their task. A Cronbach's alpha of .92 was observed with fourth-match CAM ratings, indicating that acceptable inter-item reliability was achieved (GEQ scale reliability: ATGT = .75, ATGS = .66, GIT = .71, GIS = .72). Test-retest reliability of CAM scales was assessed using intraclass correlation coefficients (ICC) calculated via a two-way mixed model to indicate the agreement between the average responses (ICC(3A,k); Koo & Li, 2016; Shrout & Fleiss, 1979). The use of average responses was selected since the CAM responses were intended to measure the level of cohesion throughout a session. Using the psych R package, ICC(3A,k) coefficients for all scales (attraction = .79, commitment = .81, pride = .78, social = .90, future = .89) were acceptable, indicating adequate test-retest reliability. To estimate how quickly participants completed the CAM scales, in addition to rating values, the response times for providing CAM ratings after each scale was presented to the participants were collected during testing sessions. The mean response time of a single CAM scale was 7.27 s (SD = 17.17) with a 95% confidence interval of 6.84 to 7.69 s.

Sensitivity of CAM and facial expression measures to team membership

An initial set of analyses was used to assess whether team CAM and facial expression scores were sensitive to team membership and accumulated team experience. Within the context of the present research, it was predicted that MEDn scores, which reflect similarity across teammates, for CAM and facial expression measures would differ when composed of members from the same versus competing teams. Alternatively, if MEDn scores reflected factors that were not specific to the group, such as viewing the same game match in real time, they would be similar regardless of team membership. To test these possibilities, two sets of MEDn scores were calculated: measures from participants on the same team (real teams) versus measures from groups that combined teammates with members of the opposing team (pseudo teams; four-member teams: two from each; three-member teams: two teammates and one opponent; two-member teams: one from each). Like past research that utilized a similar approach (Mønster et al., 2016), it was hypothesized that MEDn scores from measures that reflected team dynamics would be lower for real compared to pseudo teams. Second, it was predicted that measures that were influenced by team dynamics would change as teams gained experience.
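
The pseudo team compositions described above can be sketched as follows. This is an illustrative Python example under stated assumptions: the function name `make_pseudo_team` and the member labels are hypothetical, and members are drawn at random here because the text does not state how specific members were chosen.

```python
import random

def make_pseudo_team(team, opponents, rng=random):
    """Build one pseudo team matching the size of `team`.

    Composition follows the description in the text:
      four-member teams: two from each team
      three-member teams: two teammates and one opponent
      two-member teams: one from each team
    Which members are drawn is random (an assumption of this sketch).
    """
    own_counts = {2: 1, 3: 2, 4: 2}
    size = len(team)
    if size not in own_counts:
        raise ValueError("team size must be 2, 3, or 4")
    n_own = own_counts[size]
    n_opp = size - n_own
    return rng.sample(team, n_own) + rng.sample(opponents, n_opp)

# Hypothetical participant labels for one session of three-member teams.
team_a = ["a1", "a2", "a3"]
team_b = ["b1", "b2", "b3"]
print(make_pseudo_team(team_a, team_b, random.Random(0)))
```

Similarity scores computed on such a group can then be compared with scores from the real team of the same size, which keeps team size constant across the two conditions.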
Similarity scores were analyzed using linear mixed models for each measure from each match. To examine emotions expressed during each match, the mean MEDn score for each group of individuals was calculated across all video frames for a match. Additionally, to account for differences due to team size, the number of members was entered as a control variable. A 2 (team condition: real, pseudo) by 4 (match: 1, 2, 3, 4) by 3 (team size: 2, 3, 4) design with an interaction between team condition and match (random intercept for session) was evaluated for each measure. No significant interactions (ps > .6) or main effect of team size (ps > .07) were observed for any dependent variables; subsequent models removed these terms. For CAM MEDn scores, significant effects were observed for team condition, F(1,540) = 10.92, p < .001, f² = .01, and match, F(3,540) = 9.62, p < .001, f² = .04. For both valence and arousal MEDn scores, significant effects were observed for team condition (valence: F(1,537.99) = 10.97, p < .001, f² = .01; arousal: F(1,537.98) = 8.59, p < .001, f² = .01) and match (valence: F(3,538.05) = 13.91, p < .001, f² = .05; arousal: F(3,538.06) = 3.09, p = .03, f² = .01). For all measures, MEDn scores were lower for real compared to pseudo teams (see Fig. 2). When focusing only on real teams, pairwise post hoc comparisons (Tukey-adjusted p-values) indicated that MEDn scores of later matches tended to be lower than those of earlier matches, with the exception of arousal MEDn scores (adj. ps < .05; CAM: match 1 vs. match 4; valence: match 1 vs. match 2, match 1 vs. match 3, match 1 vs. match 4). Overall, these results indicated that MEDn scores for CAM and facial expression measures were sensitive to whether calculations were made with members from the same team and, for CAM and valence measures, changes in accumulated team experience.

Fig. 2  Team similarity scores (MEDn) by teammate condition, calculated with real and pseudo teammates across matches for each measure. Error bars represent 95% confidence intervals

Changes in mean CAM and facial expression measures with experience

Experience-dependent changes in CAM and facial expression measures were examined using team means via linear mixed models. To examine the impact of team performance on such measures, the number of goals scored by a team for each match was included as a predictor. Models included main effects of match (1, 2, 3, 4) and goals scored, with team size (2, 3, 4), mean team Rocket League experience, and mean team familiarity as control variables (random intercept for session). No significant effects were observed for mean team Rocket League experience (ps > .14) or mean team familiarity (ps > .57). For team size, no significant effect was observed for CAM and valence scores (ps > .4); a significant main effect was observed for arousal, F(2,36.08) = 3.27, p = .05, f² = .08, but post hoc comparisons were not significant (adj. ps > .05). Subsequent models removed these terms. Significant main effects of match were observed for CAM,
F(3,251.83) = 6.09, p < .001, f² = .04, and valence, F(3,250.91) = 5.37, p < .001, f² = .03, but not arousal, F(3,251.1) = 1.70, p = .17, f² = .01 (see Fig. 3). Post hoc pairwise comparisons indicated that CAM scores tended to be higher for later compared to earlier matches (adj. ps < .05 for match 1 vs. match 3, match 1 vs. match 4); a reverse pattern was observed for valence scores (adj. ps < .05 for match 1 vs. match 3, match 1 vs. match 4). For goals scored, significant effects were observed with CAM scores, F(1,262.80) = 53.50, p < .001, f² = .13, and valence scores, F(1,259.85) = 23.38, p < .001, f² = .05, but not arousal scores, F(1,260.65) = .59, p = .44, f² < .01; higher goals scored was associated with higher CAM scores (standardized estimate = .34, SE = .05) and valence scores (standardized estimate = .22, SE = .05). Overall, these results indicated that CAM and valence mean scores were sensitive to changes in accumulated team experience and performance during each match.

Fig. 3  Cohesion assessment manikin (CAM) and emotion expression scores for teams as teams completed game matches. Error bars reflect 95% confidence intervals

A follow-up analysis was performed to further examine how team behaviors during matches affected CAM ratings. Recall that participants completed CAM ratings after each match, by which time the outcome of the match was displayed, including the number of goals their team scored. A linear mixed model included CAM mean scores as the dependent variable with a random intercept for session. Predictor variables included mean valence and arousal scores and goals scored, as well as their interactions with, and the main effect of, match; team size was included as a control variable. No significant interaction effects or main effect of team size were observed (ps > .4), and these terms were removed from a subsequent model. Significant main effects of valence, F(1,286.28) = 18.49, p < .001, f² = .07, goals scored, F(1,270.09) = 36.60, p < .001, f² = .06, and match, F(3,252.36) = 9.77, p < .001, f² = .08, but not arousal, F(1,286.50) = 1.11, p = .29, f² = .01, were observed.

Relation between match measures and post-session cohesion scores

The next set of analyses examined the extent to which the means of teammate CAM, arousal, and valence scores were related to GEQ scores collected after the team task was completed. To capture the extent to which members rated cohesion and displayed arousal and valence via facial expressions across the entire session, the mean of said measures was calculated by collapsing across all matches. Additionally, the total number of goals scored by teams across all four matches was calculated. Correlation analyses using Pearson correlation coefficients revealed significant positive correlations between CAM and valence team mean scores and the team mean of each GEQ scale (see Table 2). A significant correlation between the GEQ GIS scale and team arousal mean was observed. Overall, these results suggested that teams with members that had higher CAM ratings and expressed higher levels of valence (i.e., greater positivity) across a session reported higher cohesion.
A multivariate analysis of variance (MANOVA) was used to further examine the relative connections between CAM, facial expression, goals scored, and GEQ scales. CAM, valence, arousal scores, total goals scored, and team size were simultaneously entered as predictors of GEQ scores (each was scaled and centered to reduce multicollinearity; variance inflation factors, VIFs, for individual predictors ranged from 1.15 to 1.26 across models). Significant effects of CAM, Pillai's trace = .63, F(4,62) = 26.63, p < .001, and valence, Pillai's trace = .18, F(4,62) = 3.43, p = .01, were observed; no significant effects were observed for arousal, Pillai's trace = .04, F(4,62) = .71, p = .59, total goals, Pillai's trace = .05, F(4,62) = .77, p = .55, or group size, Pillai's


Table 2  Descriptive statistics and Pearson correlation coefficients between CAM, facial expression, and GEQ team measures

Team measure     M      SD     Median  Min    Max    1        2        3        4        5        6        7
1-CAM mean       5.95   .79    6.05    3.34   7      ---      ---      ---      ---      ---      ---      ---
2-Valence mean   .03    .13    .02     -.28   .39    .305**   ---      ---      ---      ---      ---      ---
3-Arousal mean   .34    .03    .34     .26    .40    .130     .350**   ---      ---      ---      ---      ---
4-ATGT mean      7.80   .91    8.06    5.06   9      .708***  .277*    .073     ---      ---      ---      ---
5-ATGS mean      5.91   .91    6.20    2.95   7.25   .670***  .348**   .205     .704***  ---      ---      ---
6-GIT mean       7.16   .94    7.18    4.35   8.70   .648***  .365**   .045     .692***  .577***  ---      ---
7-GIS mean       5.38   1.12   5.53    2.25   7.69   .606***  .476***  .235*    .605***  .762***  .565***  ---
8-Total goals    12.9   8.04   10.50   1      39     .341**   .140     -.035    .263*    .242*    .301*    .119

CAM cohesion assessment manikin, GEQ Group Environment Questionnaire, ATGT individual attractions towards the group-task, ATGS individual attractions towards the group-social, GIT group integration-task, GIS group integration-social
* p < .05, ** p < .01, *** p < .001
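
For readers who wish to reproduce coefficients of the kind reported in Table 2 from the shared datasets, a Pearson correlation can be computed as in the minimal Python sketch below. The function name and the example values are hypothetical, and significance testing is not included.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical team-level scores (not data from the study).
cam_means = [1, 2, 3, 4]
geq_means = [1, 3, 2, 4]
print(pearson_r(cam_means, geq_means))
```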

Table 3  Pearson correlations (r) between GEQ and CAM match scores and tests comparing correlation strength between CAM scores for earlier matches and the fourth match

GEQ scale  r CAM-1  r CAM-2  r CAM-3  r CAM-4  r CAM-1 vs. r CAM-4  r CAM-2 vs. r CAM-4  r CAM-3 vs. r CAM-4
ATGT       .483***  .644***  .674***  .748***  adj. p = .01*        adj. p = .09         adj. p = .09
ATGS       .476***  .586***  .634***  .717***  adj. p = .02*        adj. p = .07         adj. p = .09
GIT        .372**   .631***  .647***  .673***  adj. p = .01*        adj. p = .63         adj. p = .64
GIS        .549***  .577***  .508***  .560***  adj. p = .91         adj. p = .85         adj. p = .41

* p < .05, ** p < .01, *** p < .001
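
The adjusted p-values in Table 3 reflect a false discovery rate correction. The specific procedure is not named in this section; the sketch below assumes the Benjamini-Hochberg step-up procedure, which is what the "fdr" method of R's p.adjust function computes. The function name `fdr_adjust` and the example p-values are hypothetical.

```python
def fdr_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumes the Benjamini-Hochberg procedure; the source text only
    states that the false discovery rate was used.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical unadjusted p-values from four comparisons.
print(fdr_adjust([0.01, 0.04, 0.03, 0.005]))
```

Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, capped so that adjusted values never decrease as raw p-values increase.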

trace = .09, F(8,126) = .78, p = .62. When examining effects for each GEQ scale, to estimate the impact of CAM and valence scores, the change in multiple R² was calculated as the difference in R² when removing each predictor from the full model, with f² calculated as a standardized effect size. For all GEQ scales, CAM scores were a significant predictor, with teams reporting higher mean CAM scores across the session also having higher GEQ scores: ATGT (full model R² = .53), F(1,65) = 69.31, p < .01, ΔR² = .39, f² = .80; ATGS (full model R² = .50), F(1,65) = 57.96, p < .01, ΔR² = .30, f² = .60; GIT (full model R² = .48), F(1,65) = 52.24, p < .01, ΔR² = .25, f² = .48; GIS (full model R² = .51), F(1,65) = 48.46, p < .01, ΔR² = .23, f² = .47. Valence scores were a significant predictor for GEQ GIS scores, F(1,65) = 12.30, p < .01, ΔR² = .07, f² = .14; no significant effects were observed for ATGT, F(1,65) = .56, p = .46, ΔR² < .01, f² = .01, ATGS, F(1,65) = 2.91, p = .09, ΔR² = .01, f² = .02, or GIT, F(1,65) = 3.83, p = .05, ΔR² = .04, f² = .07. This indicated that teams that maintained more positive facial expressions across the session tended to report that their team had higher social cohesion.
A series of tests was used to explore how the strength of correlations between self-report measures, GEQ and CAM scales, may have changed from earlier to later matches. Specifically, the correlations between CAM team mean scores for each match (1, 2, 3, 4) and each GEQ scale were calculated (see Supplementary Material for correlations between all match measures and GEQ scales: Bonny (2022a)). A set of Williams tests was then used to evaluate whether CAM match 4 scores correlated more strongly with GEQ scales compared to CAM scores collected during earlier matches. To control for multiple comparisons, p-values were adjusted using the false discovery rate. For ATGT, ATGS, and GIT scales, correlations with CAM match 4 scores were significantly greater than with CAM match 1 scores; no significant differences in CAM score correlations across matches were observed with GIS scores (see Table 3).

Discussion

In the present study, the degree to which brief manikin scales and facially expressed emotions of teammates were indicators of cohesion was examined. To do so, CAM scales were collected and the facial expressions of participants were recorded while newly formed teams completed multiple matches of a team-based video game. When examining sensitivity to team membership, CAM ratings and the
valence and arousal of facial expressions were more similar for teams composed of real versus competing teammates. Across matches, the mean CAM ratings and valence of team member expressions varied with accumulated experience and were significantly related to self-reported measures of cohesion collected after completing the session. These results provide evidence that CAM ratings and the valence of facial expressions were indicators of team cohesion.

Sensitivity of indicator measures to team membership

Similarity scores for both CAM ratings and the valence and arousal of facial expressions were sensitive to team membership. In a previous study, greater synchrony in muscle activity related to smiles was observed when including members from the same compared to different teams (Mønster et al., 2016). Although team similarity instead of synchrony metrics were used in the present study, CAM, valence, and arousal diversity scores (MEDn) were greater when calculated using members of different as opposed to same teams, in support of Hypothesis 1. This result was particularly notable since members of the opposing team were still part of the same testing session and were observing, hearing, and playing the same team-based video game matches. The presence of a difference in MEDn scores for real and pseudo teams indicated that CAM and facial expression measures were influenced by team dynamics. Corroborating evidence for CAM and valence scores was provided by two additional findings: changes in mean CAM and valence scores with experience and correlations with GEQ measures.

Changes in CAM and facial expression valence scores as teams accumulated experience

A main objective of the present research was to examine how CAM ratings and the valence and arousal of facial expressions of newly formed teams changed as they performed together. When examining teams across game matches, the mean of CAM scores increased, and, although the valence of facial expressions decreased, no significant changes were observed for arousal, in partial support of Hypothesis 2. The significant effect of facially expressed valence, but not arousal, is similar to a previous study that observed a stronger effect when manipulating the positive versus negative valence compared to higher versus lower energy of small groups while performing a task (Barsade, 2002). In addition to main effects of experience, teams that scored more goals during matches tended to have higher valence scores during matches and report higher CAM scores after matches. Overall, these results indicated the following longitudinal patterns. Initially, during the first game match, members of teams varied in mean CAM ratings and the level of valence displayed, but tended to, overall, have moderate ratings and express higher positivity. As teams completed game matches, they tended to provide higher CAM ratings and become more neutral in the valence of facial expressions.
The observed changes in CAM ratings and valence of facial expressions indicated that the dynamics of ad hoc teams changed rapidly during testing sessions. For both measures, significant differences in team mean scores were observed between the first and fourth rounds (as well as between the first and third for valence), suggesting that the largest changes occurred early in the testing sessions. With regard to facial expressions, the longitudinal changes build upon previous studies that investigated synchrony in facial expressions of dyads and small teams while performing a task (Chikersal et al., 2017; Mønster et al., 2016). This aligns with past studies which observed effects indicative of experience-dependent changes in team facial expressions. For example, in Mønster et al. (2016), teams of three members completed a series of production tasks before deciding whether to adopt a new routine to produce the item. Synchrony in smile-related motor activity was associated with the decision, with groups displaying lower synchrony more likely to adopt the new routine. Building on this research, the present study indicates that the level of positivity expressed by small teams changes as they gain more experience performing together.

Evidence that CAM scores were indicators of cohesion

Connections observed with post-session cohesion questionnaires indicated that CAM ratings reflected aspects of team cohesion. When examining the psychometric properties of the scales, individual participant CAM ratings were found to have high inter-item reliability. When calculating the mean CAM scores of teams across matches, significant positive correlations were observed with all GEQ scales, in support of Hypothesis 3. These correlations were strong, and session CAM scores remained significant predictors of all scales when accounting for total goals scored and the valence and arousal of facial expressions. When combined with evidence that scores were more similar for real versus pseudo teams, these results suggest that responses made via manikin-based CAM scales were related to aspects of team cohesion.
Ratings collected early in the testing session via CAM scales predicted responses on a post hoc cohesion questionnaire. When comparing the strength of correlations with GEQ scales and team CAM scores from each match, the correlations with match 4 were significantly stronger than those from match 1 for ATGT, ATGS, and GIT scales. The lack of other significant differences in match-level correlations suggests that, for these GEQ scales, CAM scores from as early as the second match were relatively strong predictors of post hoc cohesion ratings. In contrast, no significant variation in
correlation strength by match was observed with the GIS scale. This suggests that CAM scores collected after the first match had a relation of similar strength to GIS scores compared to those collected after the fourth match, just before the GEQ was administered. Overall, this pattern of correlations suggests that collecting brief, self-report ratings early on as an ad hoc team is performing can predict the cohesion the team achieves by the end of the task. Whereas traditional measures of cohesion, such as the GEQ and the Team Climate Inventory (Anderson & West, 1998), typically assume a prolonged period of interaction between members, connections were observed after about ten minutes of team interaction. This suggests that the CAM scales can be used to predict the cohesion of teams whose members have little experience working together. Combined with the brief amount of time required to complete them, around seven seconds for each item, these results indicate that the CAM scales can be used to monitor changes in team cohesion, especially when new teams are formed.

Evidence that valence scores were indicators of cohesion

The results of the present study suggest that the level of positivity in the facial expressions of members was linked with the social cohesion of a team. The mean valence of team facial expressions across the session was associated with all GEQ scores in the present study, in support of Hypothesis 3. In contrast, the mean arousal of facial expressions across the session was significantly correlated with only the GIS scale. When entered as predictors of GEQ scale scores, along with CAM scores and total goals scored, mean valence scores only accounted for unique variance in a single GEQ scale, GIS. The GIS scale contains items that reflect the social bonding of teammates as a single unit (Brawley et al., 2002). Overall, teams that displayed facial expressions that contained more positivity over the course of the study session were more likely to report, after finishing playing the game, that they shared a social bond with their teammates.
Connections observed with CAM scores suggest that the valence of facial expressions could be used as a real-time indicator of team cohesion. Longitudinal models suggested that the mean valence of facial expressions during matches was affected by experience and correlated with CAM ratings. It is important to note that the correlation was small, suggesting that the mean facially expressed valence of teammates only accounted for a portion of variance in CAM ratings. However, it was a unique predictor of GIS scores, in addition to CAM scores. This builds upon previous observations that teams who self-reported more positive emotions while performing reported higher team performance (Jung, 2016), and soccer teams with players who displayed greater positive emotion during shootouts were more likely to win matches (Moll et al., 2010). The inclusion of CAM ratings in the present study provides novel evidence that the valence of facial expressions of co-located teammates was related to social bonding aspects of team cohesion.
The observed changes in CAM and valence measures suggest that cohesion can rapidly change with experience. This builds upon previous research that has observed changes in cohesion as teams performed over several days and weeks (Harrison et al., 1998; Terborg et al., 1976). This aligns with temporally dynamic theories of team processes (Kozlowski & Chao, 2012) and highlights the need for further research to examine the extent to which cohesion fluctuates as teams perform over extended periods of time.
It is important to note that, despite some strong connections observed with post hoc questionnaires of cohesion, there was still variance in GEQ scales unaccounted for by CAM and valence measures. This indicates that other factors, in addition to CAM and valence scores, were related to GEQ scores. Furthermore, portions of variance in CAM scores collected after each match were unaccounted for by facial expression scores and goals scored. As with GEQ scores, this suggests that other factors influenced CAM ratings. In summary, although significant connections were observed with CAM, valence, and GEQ scores, the results highlight that these measures were multiply determined. This suggests that CAM and valence of facial expression measures should be used as indicator metrics, alongside the traditional long-form cohesion questionnaires.

Limitations and future directions

Overall, teams in the present research reported relatively high cohesion. For the questionnaires of team cohesion, scores for CAM and GEQ measures had sample mean and median scores higher than the midpoint of possible responses. Despite these high ratings, improvements in CAM scores were still observed as teams completed video game matches. However, the high cohesion observed raises questions as to whether the results of the present study would extend to contexts where teams report lower cohesion.
Subsequent research can build upon the present study in multiple ways. Although the team-based video game involved a virtual task, team members completed the game in the same physical location. Given the ability of team-based video games to be completed by remotely located players, future studies can examine the extent to which facial expression-based measures are associated with the cohesion of teams that interact virtually. The type of pseudo team comparisons can be expanded as well. The present study created pseudo teams using players who directly competed within a session. A strength of this design was that all members experienced the same temporal sequence of events during a match. However, this
could have led to greater diversity in pseudo team measures than if members were not competing. For example, when one team scored a goal, those members may have had greater valence in facial expressions, with the opposing team having lower valence; this could have resulted in greater valence diversity on a pseudo team composed of members of both teams than if they were not competing. A preliminary analysis suggested that pseudo teams from across and within the same session did not vary statistically (see Supplementary Materials), but this analysis is limited by the design of the study. Future research can select a team-based task that allows teams to work towards the objective at the same time without directly competing.
The type of teams formed was low in temporal stability, skill differentiation, and authority differentiation in the dimensional scaling framework (Hollenbeck et al., 2012), and teams were lab-based, not field-based (Bell, 2007). Future research should examine CAM and facial expression measures in teams that vary along these dimensions outside of the lab. For example, organized team sports involve competitive play (e.g., basketball) and can have higher temporal stability, authority differentiation, and skill differentiation. Subsequent studies can investigate CAM and facial expression metrics in these team settings and then extend and adapt the approach to noncompetitive play environments such as medical and manufacturing teams. By doing so, future research can examine the extent to which CAM and facially expressed valence scores are general indicators of cohesion or specific to certain types of teams and tasks.
The approach of the present study for assessing expressed emotion was to estimate facial expressions from video recordings using facial recognition software. The large number of video recording frames that required removal when estimating facial expression measures raises questions about the use of the method with field-based teams. For these frames, emotion measures were missing for at least one team member. This precluded the calculation of time-series-based scores, such as synchrony, in addition to team mean and similarity, to compare these metrics as indicators of team dynamics. Current facial expression algorithms are poor at estimating emotion when individuals are conversing, have portions of the face occluded by clothing or hands, or are facing away from the camera (Min et al., 2011). Future studies that combine multimodal data sources may be able to provide more complete emotion expression measures. Including estimates of expressed emotion from other modalities, such as tone of voice, could also help improve estimates of emotion (Busso et al., 2004) and mitigate concerns that focusing only on facial displays of emotion may not provide accurate estimates of the emotion state of a person (Barrett et al., 2019). Previous research has calculated expressed emotions of team members in near real time (Samrose et al., 2018). Taking a similar approach, future research can evaluate whether real-time measures of expressed emotion predict the development of cohesion.

Acknowledgements  The author would like to thank Keona Smith for their assistance with data collection.

Funding  The research described herein was sponsored by the US Army Research Institute for the Behavioral and Social Sciences, Department of the Army (Grant No. W911NF-19-1-0123). The views expressed in this document are those of the author and do not reflect the official policy or position of the Department of the Army, DOD, or the US Government. The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

References

Ancona, D. (1990). Outward bound: Strategies for team survival in an organization. Academy of Management Journal, 33(2), 334–365. https://doi.org/10.2307/256328
Anderson, N., & West, M. (1998). Measuring climate for work group innovation: Development and validation of the Team Climate Inventory. Journal of Organizational Behavior, 19, 235–258. https://doi.org/10.1002/(SICI)1099-1379(199805)19:3<235::AID-JOB837>3.0.CO;2-C
Aron, A., Aron, E. N., & Smollan, D. (1992). Inclusion of other in the self scale and the structure of interpersonal closeness. Journal of Personality and Social Psychology, 63(4), 596–612. https://doi.org/10.1037/0022-3514.63.4.596
Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M., & Pollak, S. D. (2019). Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements. Psychological Science in the Public Interest, 20(1), 1–68. https://doi.org/10.1177/1529100619832930
Barsade, S. G. (2002). The Ripple Effect: Emotional Contagion and Its Influence on Group Behavior. Administrative Science Quarterly, 47, 644–675.
Barton, K. (2018). Package “MuMIn: Multi-model inference” for R. In R Package Version 1.40.4. https://cran.r-project.org/package=MuMIn
Bates, D. M., Mächler, M., Bolker, B. M., & Walker, S. C. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Beal, D. J., Cohen, R. R., Burke, M. J., & McLendon, C. L. (2003). Cohesion and performance in groups: A meta-analytic clarification of construct relations. Journal of Applied Psychology, 88(6), 989–1004. https://doi.org/10.1037/0021-9010.88.6.989
Bell, S. T. (2007). Deep-level composition variables as predictors of team performance: A meta-analysis. Journal of Applied Psychology, 92(3), 595–615. https://doi.org/10.1037/0021-9010.92.3.595
Bell, S. T., & Marentette, B. J. (2011). Team viability for long-term and ongoing organizational teams. Organizational Psychology Review, 1(4). https://doi.org/10.1177/2041386611405876
Ben-Shalom, U., Lehrer, Z., & Ben-Ari, E. (2005). Cohesion during military operations. Armed Forces & Society, 32(1), 63–79. https://doi.org/10.1177/0095327X05277888
Biemann, T., & Kearney, E. (2010). Size does matter: How varying group sizes in a sample affect the most common measures of group diversity. Organizational Research Methods, 13(3), 582–599. https://doi.org/10.1177/1094428109338875

Bonny, J. W. (2018). Preliminary evaluation of a brief team cohesion manikin scale. Proceedings of the Human Factors and Ergonomics Society 2018 Annual Meeting, 747–751.
Bonny, J. W. (2022a). Self-report and facial expression indicators of team cohesion development. OSF. https://doi.org/10.17605/OSF.IO/RVCDB
Bonny, J. W. (2022b). Cohesion assessment manikin scale materials. OSF. https://doi.org/10.17605/OSF.IO/NBTWS
Bradley, M., & Lang, P. J. (1994). Measuring emotion: The self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1), 49–59. https://doi.org/10.1016/0005-7916(94)90063-9
Brawley, L. R., Carron, A. V., & Widmeyer, W. N. (1987). Assessing the cohesion of teams: Validity of the group environment questionnaire. Journal of Sport Psychology, 9, 275–294. https://doi.org/10.1123/jsp.9.3.275
Brawley, L. R., Carron, A. V., & Widmeyer, N. W. (2002). The group environment questionnaire: Test manual. Fitness Information Technology.
Burgess, A. P. (2013). On the interpretation of synchronization in EEG hyperscanning studies: A cautionary note. Frontiers in Human Neuroscience, 7, 881. https://doi.org/10.3389/fnhum.2013.00881
Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C. M., Kazemzadeh, A., Lee, S., Neumann, U., & Narayanan, S. (2004). Analysis of emotion recognition using facial expressions, speech and multimodal information. ICMI’04 - Sixth International Conference on Multimodal Interfaces. https://doi.org/10.1145/1027933.1027968
Carron, A., Widmeyer, W. N., & Brawley, L. (1985). The development of an instrument to assess cohesion in sport teams: The Group Environment Questionnaire. Journal of Sport Psychology, 7(3), 244–266. https://doi.org/10.1123/jsp.7.3.244
Carron, A. V., Bray, S. R., & Eys, M. A. A. (2002a). Team cohesion and team success in sport. Journal of Sports Sciences, 20(2), 119–126. https://doi.org/10.1080/026404102317200828
Carron, A. V., Colman, M. M., Wheeler, J., & Stevens, D. (2002b). Cohesion and performance in sport: A meta analysis. Journal of Sport & Exercise Psychology, 24(2), 168–188. https://doi.org/10.1123/jsep.24.2.168
Chang, A., & Bordia, P. (2001). A multidimensional approach to the group cohesion-group performance relationship. Small Group Research, 32(4), 379–405. https://doi.org/10.1177/104649640103200401
Chikersal, P., Tomprou, M., Kim, Y., Woolley, A., & Dabbish, L. (2017). Deep structures of collaboration: Physiological correlates of collective intelligence and group satisfaction. Proceedings of the 20th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2017), 873–888. https://doi.org/10.1145/2998181.2998250
Ekman, P., & Friesen, W. V. (1971). Facial Action Coding System: A Technique for the Measurement of Facial Movement. Journal of Personality and Social Psychology. https://doi.org/10.1037/h0030377
Elbardissi, A. W., Duclos, A., Rawn, J. D., Orgill, D. P., & Carty, M. J. (2013). Cumulative team experience matters more than individual surgeon experience in cardiac surgery. Journal of Thoracic and Cardiovascular Surgery, 145(2), 328–333. https://doi.org/10.1016/j.jtcvs.2012.09.022
Eys, M. A., Carron, A. V., Bray, S. R., & Brawley, L. R. (2007). Item wording and internal consistency of a measure of cohesion: The Group Environment Questionnaire. Journal of Sport & Exercise Psychology, 29, 395–402. https://doi.org/10.1123/jsep.29.3.395
Festinger, L. (1950). Informal social communication. Psychological Review, 57(5), 271–282. https://doi.org/10.1037/h0056932
Freeman, G., & Wohn, D. Y. (2019). Understanding eSports Team Formation and Coordination. Computer Supported Cooperative Work, 28, 95–126. https://doi.org/10.1007/s10606-017-9299-4
Gächter, S., Starmer, C., & Tufano, F. (2015). Measuring the closeness of relationships: A comprehensive evaluation of the “inclusion of the other in the self” scale. PLoS ONE, 10(6). https://doi.org/10.1371/journal.pone.0129478
Gino, F., Argote, L., Miron-Spektor, E., & Todorova, G. (2010). First, get your feet wet: The effects of learning from direct and indirect experience on team creativity. Organizational Behavior and Human Decision Processes, 111(2), 102–115. https://doi.org/10.1016/j.obhdp.2009.11.002
Hackman, J. R., & Morris, C. G. (1975). Group tasks, group interaction process, and group performance effectiveness: A review and proposed integration. Advances in Experimental Social Psychology, 8(C), 45–99. https://doi.org/10.1016/S0065-2601(08)60248-8
Harrison, D. A., Price, K. H., & Bell, M. P. (1998). Beyond relational demography: Time and the effects of surface- and deep-level diversity on work group cohesion. Academy of Management Journal, 41(1), 96–107. https://doi.org/10.2307/256901
Hollenbeck, J. R., Beersma, B., & Schouten, M. E. (2012). Beyond team types and taxonomies: A dimensional scaling conceptualization for team description. Academy of Management Review, 37(1), 82–106. https://doi.org/10.5465/amr.2010.0181
Huckman, R. S., Staats, B. R., & Upton, D. M. (2009). Team familiarity, role experience, and performance: Evidence from Indian software services. Management Science, 55(1), 85–100. https://doi.org/10.1287/mnsc.1080.0921
Ilgen, D. R., Hollenbeck, J. R., Johnson, M., & Jundt, D. (2005). Teams in Organizations: From Input-Process-Output Models to IMOI Models. Annual Review of Psychology, 56(1), 517–543. https://doi.org/10.1146/annurev.psych.56.091103.070250
Jung, M. F. (2016). Coupling interactions and performance: Predicting team performance from thin slices of conflict. ACM Transactions on Computer-Human Interaction, 23(3), 1–32. https://doi.org/10.1145/2753767
Kelly, J. R., & Barsade, S. G. (2001). Mood and emotions in small groups and work teams. Organizational Behavior and Human Decision Processes, 86(1), 99–130. https://doi.org/10.1006/obhd.2001.2974
Koo, T. K., & Li, M. Y. (2016). A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. Journal of Chiropractic Medicine, 15(2). https://doi.org/10.1016/j.jcm.2016.02.012
Kozlowski, S. W. J., & Chao, G. T. (2012). The dynamics of emergence: Cognition and cohesion in work teams. Managerial and Decision Economics, 33(5–6), 335–354. https://doi.org/10.1002/mde.2552
Kozlowski, S. W. J., & Ilgen, D. R. (2006). Enhancing the effectiveness of work groups and teams. Psychological Science in the Public Interest, 7(3), 77–124. https://doi.org/10.1111/j.1529-1006.2006.00030.x
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–19. https://doi.org/10.18637/jss.v082.i13
Lang, P. J., Greenwald, M. K., Bradley, M. M., & Hamm, A. O. (1993). Looking at pictures: Affective, facial, visceral, and behavioral reactions. Psychophysiology, 30(3), 261–273. https://doi.org/10.1111/j.1469-8986.1993.tb03352.x
Marks, M. A., Mathieu, J. E., & Zaccaro, S. J. (2001). A temporally based framework and taxonomy of team processes. Academy of Management Review, 26(3), 356–376. https://doi.org/10.5465/AMR.2001.4845785
Mathieu, J. E., Gallagher, P. T., Domingo, M. A., & Klock, E. A. (2019). Embracing Complexity: Reviewing the Past Decade of Team Effectiveness Research. Annual Review of Organizational Psychology and Organizational Behavior, 6, 17–46. https://doi.org/10.1146/annurev-orgpsych-012218-015106
Matsuno, T., & Fujita, K. (2018). Body inversion effect in monkeys. PLoS ONE. https://doi.org/10.1371/journal.pone.0204353
McMaster, R., & Baber, C. (2012). Multi-agency operations: Cooperation during flooding. Applied Ergonomics, 43(1), 38–47. https://doi.org/10.1016/j.apergo.2011.03.006
Min, R., Hadid, A., & Dugelay, J. L. (2011). Improving the recognition of faces occluded by facial accessories. 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, FG 2011. https://doi.org/10.1109/FG.2011.5771439
Moll, T., Jordet, G., & Pepping, G. J. (2010). Emotional contagion in soccer penalty shootouts: Celebration of individual success is associated with ultimate team success. Journal of Sports Sciences, 28(9), 983–992. https://doi.org/10.1080/02640414.2010.484068
Mønster, D., Håkonsson, D. D., Eskildsen, J. K., & Wallot, S. (2016). Physiological evidence of interpersonal dynamics in a cooperative production task. Physiology and Behavior, 156, 24–34. https://doi.org/10.1016/j.physbeh.2016.01.004
Morris, J. D. (1995). Observations: SAM: The Self-Assessment Manikin - An efficient cross-cultural measurement of emotional response. Journal of Advertising Research, 35(6), 63–68.
Mullen, B., & Copper, C. (1994). The relation between group cohesiveness and performance: An integration. Psychological Bulletin, 115(2), 210–227. https://doi.org/10.1037/0033-2909.115.2.210
Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2), 133–142. https://doi.org/10.1111/j.2041-210x.2012.00261.x
Parkinson, B. (2020). Intragroup Emotion Convergence: Beyond Contagion and Social Appraisal. Personality and Social Psychology Review, 24(2), 121–140. https://doi.org/10.1177/1088868319882596
Roberts, N. K., Williams, R. G., Schwind, C. J., Sutyak, J. A., McDowell, C., Griffen, D., Wall, J., Sanfey, H., Chestnut, A., Meier, A. H., Wohltmann, C., Clark, T. R., & Wetter, N. (2014). The impact of brief team communication, leadership and team behavior training on ad hoc team performance in trauma care settings. American Journal of Surgery, 207(2), 170–178. https://doi.org/10.1016/j.amjsurg.2013.06.016
Samrose, S., Zhao, R., White, J., Li, V., Nova, L., Lu, Y., Ali, M. R., & Hoque, M. E. (2018). CoCo: Collaboration Coach for Understanding Team Dynamics during Video Conferencing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies. https://doi.org/10.1145/3161186
Selya, A. S., Rose, J. S., Dierker, L. C., Hedeker, D., & Mermelstein, R. J. (2012). A practical guide to calculating Cohen’s f², a measure of local effect size, from PROC MIXED. Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2012.00111
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2). https://doi.org/10.1037/0033-2909.86.2.420
Terborg, J. R., Castore, C., & DeNinno, J. A. (1976). A longitudinal field investigation of the impact of group composition on group performance and cohesion. Journal of Personality and Social Psychology, 34(5), 782–790. https://doi.org/10.1037/0022-3514.34.5.782
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
Zaccaro, S. J., Gualtieri, J., & Minionis, D. (1995). Task cohesion as a facilitator of team decision making under temporal urgency. Military Psychology, 7(2), 77–93. https://doi.org/10.1207/s15327876mp0702_3

Open Practices Statement  The data and materials for all experiments are available at https://doi.org/10.17605/OSF.IO/RVCDB and https://doi.org/10.17605/OSF.IO/NBTWS and none of the experiments were preregistered.

Publisher’s note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.