Video Data Analysis: A Methodological Frame For A Novel Research Trend

Article
Sociological Methods & Research
2021, Vol. 50(1) 135-174
© The Author(s) 2018
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0049124118769093
journals.sagepub.com/home/smr
Anne Nassauer1 and Nicolas M. Legewie2
Abstract
Since the early 2000s, the proliferation of cameras, whether in mobile
phones or CCTV, led to a sharp increase in visual recordings of human
behavior. This vast pool of data enables new approaches to analyzing situa-
tional dynamics. Application is both qualitative and quantitative and ranges
widely in fields such as sociology, psychology, criminology, and education.
Despite the potential and numerous applications of this approach, a con-
solidated methodological frame does not exist. This article draws on various
fields of study to outline such a frame, what we call video data analysis (VDA).
We discuss VDA’s research agenda, methodological forebears, and applica-
tions, introduce an analytic tool kit, and discuss criteria for validity. We aim
to establish VDA as a methodological frame and an interdisciplinary analytic
approach, thereby enhancing efficiency and comparability of studies, and
communication among disciplines that employ VDA. This article can serve as
a point of reference for current and future practitioners, reviewers, and
interested readers.
1 Department of Sociology, John F. Kennedy Institute, Freie Universität Berlin, Berlin, Germany
2 Socio-Economic Panel, German Institute for Economic Research (DIW Berlin), Berlin, Germany
Corresponding Author:
Anne Nassauer, Department of Sociology, John F. Kennedy Institute, Freie Universität Berlin,
Lansstraße 7-9, 14195 Berlin, Germany.
Email: anne.nassauer@fu-berlin.de
Keywords
video, visual data, emotions, microsociology, situational analysis, interactions,
behavioral analysis
How do teams structure social organization in the workplace and manage
changing situational demands? Is there crucial causality at the microlevel of
violent confrontations? How do patterns of daily behaviors in public spaces
shape the fabric of social life? Since the early 2000s, researchers have devel-
oped new possibilities to study such questions, fueled by a surge in the
production and dissemination of visual data (i.e., moving or still images).1
Camcorders, mobile phone cameras, and even drones are used to film and
photograph social life. Together with the proliferation of public space under
video surveillance, this development produces a large and ever-expanding
pool of visual data that researchers can employ. The way people use the
Internet for sharing experiences makes many of these data easily accessible.
For example, on YouTube, more than three hundred hours of footage are
uploaded every minute (McConnell 2015), many documenting real-life
social situations and interactions. Even extremely private situations, and
deviant and criminal behaviors, are being filmed and uploaded online.
Consequently, visual data from nonlaboratory settings are being increas-
ingly employed as a basis for researching social phenomena in sociology,
social psychology, and criminology, among other fields. Studies using such
data share a homogeneous research agenda and approach to data analysis
(Collins 2008; Klusemann 2009; Levine, Taylor, and Best 2011). They ana-
lyze what the visual captures about situational dynamics of human social life.
They use such data to trace situations or events step-by-step to explain a
process or outcome, focusing on aspects such as people’s interactions, movements,
fields of vision, exchanges of glances or gestures, and actors’ facial
expressions and body postures. We label this analytic strategy video data
analysis (VDA).
Numerous empirical studies employ VDA.2 Examples include studies on
deviant behavior such as interpersonal and collective violence (Collins
2008), massacres (Klusemann 2009), and street fights (Levine et al. 2011).
Criminologists increasingly use videos to study crime as it unfolds3 (e.g., see
the special issue Crime Caught on Camera, Lindegaard and Bernasco 2018).
Other studies employ visuals to examine behavior at the workplace such as in
emergency call centers (Fele 2008; Mondada 2008), staff–patient
interactions in nursing centers (Caldwell and Atwal 2005), cooperation in
anesthesia teams (Burtscher et al. 2010; for an overview, see Heath, Hind-
marsh, and Luff 2010:6-7), or social rituals in interactions between humans
and animals (Konecki 2008, see also 2016). Research in education has a
tradition of studying visuals to examine nonverbal and verbal aspects of
social interaction, though usually with participants who are aware of being
filmed. Recent examples include classroom interactions (Andersson and
Sørvik 2013) and peacemaking among children (P. Verbeek 2008).
The most prominent among recent applications in sociology is Collins’s
(2008) analysis of pictures and videos to study emotional dynamics in a
variety of violent and near-violent situations. Collins focuses on the minutes
and seconds before and during violent behaviors, identifying emotions in
actors’ facial muscles and body postures. This approach allows Collins to
challenge core assumptions of conventional theories of violence by showing
that situational emotions, instead of actors’ prior strategies or motivations,
trigger violent behaviors. Visual data are instrumental in enabling Collins
to develop his argument and corroborate his findings. His study suggests that
“there is crucial causality at the micro-level” (Collins 2016b), which can be
uncovered using visual data.
Despite the increase in applications, a methodological frame with expli-
cit analytic tools and procedures, and quality criteria has not yet been
developed. Studies employing visuals to examine natural behaviors, emo-
tions, and microsituational dynamics are scattered across disciplines, pre-
senting their analytic approach without reference to or drawing on a
broader research strand, a common set of analytic tools, or quality criteria
explicitly geared toward use of novel types of visual data (Klusemann
2009; Levine et al. 2011). Collins’s (2008) microsociological study on
violence, for example, draws criticism for being methodologically vague
(Laitin 2008).
This article addresses this gap by outlining VDA as a methodological
frame, focusing on issues in which VDA differs from other approaches.
We introduce VDA’s research agenda, discuss four methodological fore-
bears, and examine applications of VDA. From many applications and
related approaches, we construct a tool kit consisting of analytic dimensions
and procedures used frequently during VDA. We also discuss how validity
criteria apply to the approach. Throughout this article, we use three studies as
examples to illustrate these aspects (Klusemann 2009; Levine et al. 2011;
Nassauer 2018b). With these contributions, this article enhances efficiency and
comparability of studies, and communication among disciplines that employ
VDA. It can serve as a point of reference for current and future practitioners,
reviewers, and interested readers.
What Is VDA?
VDA focuses on situational dynamics and behaviors using video or other
visual data to understand how people act and interact, and which conse-
quences situational dynamics have for social outcomes. This perspective
helps with understanding the rules and processes that govern social life, both
in everyday encounters and in extreme situations (Collins 2008; Goffman
1959). The applications above show that by observing a person’s move-
ments, fields of vision, uses of space, interactions, exchanges of glances and
gestures, facial expressions, and body postures, it is possible to decipher the
syntax of situational dynamics. Comparing dynamics across situations allows
identifying patterns that explain social phenomena (Suchar 1997:34).
With its analytic focus and use of data, VDA combines aspects of various
methodological approaches from several disciplines. We focus on four pro-
minent methodological forebears: visual studies, ethnography, experimental
behavioral studies, and multimodal interaction analysis.4 VDA resembles the
tradition of visual studies with its prominent use of visual data. It is similar to
ethnography in its focus on situational dynamics. Like experimental beha-
vioral studies in psychology, it relies on detailed analysis of recorded indi-
vidual and social behaviors. It resembles multimodal interaction analysis in
its goal of analyzing situations on all relevant interactional dimensions. VDA
combines aspects of these disparate methodological approaches while differ-
ing from each regarding other aspects (Figure 1). This combination makes
VDA a novel approach to studying social situations. Discussing VDA in the
context of these four research traditions not only gives credit to the metho-
dological groundwork laid by past researchers but also clarifies VDA’s
research agenda and the potential that novel sources of visual data offer to
analysis.
VDA and Visual Studies: Focusing on the Depicted

Visual data have been used in social science research for many decades.
These earlier and still thriving areas of research are collected under the term
visual studies and trace to the 1940s (Mead and Bateson 1942). Generally,
visual studies are “a collection of approaches in which researchers use [visual
data] to portray, describe, or analyze social phenomena” (Harper 1988:55).
Figure 1. Video data analysis as a methodological nexus. [Diagram: video data analysis at the center, linked to visual studies, multimodal interaction analysis, ethnography, and experimental behavioral studies.]
A look at how visual data are understood and analyzed during visual
studies clarifies VDA’s approach to visuals as data. Visual studies view
visual data as iconic constructions, as part of communication strategies, and
as containers of behavioral and symbolic information (Grady 2008:7). Most
research in visual studies focuses on the former two understandings (Becker
1986; Harper 1988; see also Pauwels 2011). In contrast, VDA uses visual
data only in the latter sense, as containers of behavioral and symbolic infor-
mation (Collins 2008; Klusemann 2009; see also Erickson 2011). The goal of
VDA is to study what in visual studies is called the depicted, not the depic-
tion. The depicted refers to the raw content of a visual, the objects, people,
surroundings, and so on. The depiction refers to the physical and social
means by which this content is captured, presented, and interpreted (Harper
1988:29; Pauwels 2010:557; Pink 2006:222-24; see also Kissmann 2009;
Knoblauch et al. 2006; Pauwels 2011).
Many scholars of visual studies view both aspects as inseparable; the
empirical dimension (the depicted) cannot be understood without analyzing
the symbolic dimension (the depiction; Grady 2008; Harper 1988; Kno-
blauch et al. 2006; Tuma, Schnettler, and Knoblauch 2013:63-84). For exam-
ple, in a video of negotiations between military leaders, a common analysis
in visual studies includes the way the scene was recorded to talk about how
high-level military meetings are documented. Who recorded the data and
with what intentions, and how the data are presented are points of analysis
(Harper 1988; Wagner 2002). Further, many scholars of visual studies use
photo elicitation or other techniques in which visual data are produced by the
subjects themselves, specifically for the purpose of the study (e.g., Harper
2002; see Margolis and Pauwels 2011 for a collection of similar approaches).
In such circumstances, treating the depiction and the depicted as inseparable
analytic units makes sense.
However, VDA operates under different circumstances, with different
data collection techniques and a different focus. Widespread use of cameras,
including smartphones, closed-circuit television (CCTV), and body cameras,
changed which situations are captured and the degree to which people are
accustomed to being filmed. VDA makes use of these ever-increasing data on
human behaviors and, in contrast to visual studies, focuses on things we can
see people do and the way they do them, as captured in visual data. In the
example above concerning military negotiations, a VDA approach scruti-
nizes who moved or looked when, where, how the protagonists speak and
react to each other, or what emotions they display during the situation (e.g.,
Klusemann 2009). In this sense, VDA studies the inherent logic of situations
to understand situational dynamics and their underlying systems of rules. In
VDA, who recorded the data, with what intention, and how the scene is
presented (i.e., the depiction) is only important insofar as it influences the
situation itself or what is visible in the recording. Despite these differences,
visual studies offer important insights for VDA, and their reflections on the
context of production and how people react to being filmed inform VDA
criteria of validity.
VDA and Ethnography: The Analytic Potential of Visual Data

Since the early twentieth century, situational dynamics have been a core
interest in microsociological research and parts of anthropology and crim-
inology (Garfinkel 2005; Goffman 1959, 1982; Mauss 1923; Reiss 1971).
Such ethnographic studies contribute enormously to the social sciences.
Until the turn of the millennium, studies in these fields largely employed
participant observations of social encounters and retrospective interviews
of participants to describe and explain social phenomena (e.g., Duneier
2000; Humphreys and Rainwater 1975; Katz 2001; for the criminological
field of systematic social observation, see Mastrofski, Parks, and McClus-
key 2010; Reiss 1971). Although ethnography and VDA both seek to
understand social phenomena through their situational dimensions, there
are crucial differences in the data used and the position of the researcher
during data collection. Ethnographic researchers usually participate in the
researched situation, while VDA researchers use visual recordings of a
situation, often produced by third parties. This difference results in the
complementary analytic potential of VDA in comparison to participant
observations and retrospective interviews.
Participant observation is a powerful technique for studying social phe-
nomena at the microlevel, but data transparency and reproducibility of find-
ings are limited. Readers must rely on a researcher’s accuracy in
documenting and describing situations, which might be tainted by observer
bias (LeCompte and Goetz 2007; Lipinski and Nelson 1974:347). Further, it
is difficult for researchers to capture situational microdetails in the field,
particularly when observing numerous persons: who was standing where,
at which second, looking in which direction, with what facial expressions
and body postures, and what was the persons’ intonation, rhythm of speech,
and speed of movement? Beyond the fact that full documentation of such
aspects is nearly impossible in complex situations, even for an entire team of
researchers, writing detailed accounts in situ restricts a researcher’s attention
to the situation (Jordan and Henderson 1995:52; Lipinski and Nelson
1974:342; Mackenzie and Xiao 2003:ii54).
For the same reason, retrospective interviews and surveys provide important
insights, but they are not ideal for studying situational dynamics because
respondents usually have difficulties recalling details about situations. Even
if interviewees have a high stake in remembering situational details (e.g., saving
people from harm), they often cannot remember details, or they recall them
incorrectly.5 Therefore, retrospective accounts of participants provide a poor
primary source for microlevel studies of what happened and why (Bernard
et al. 1984:509f; Vrij, Hope, and Fisher 2014).
Visual data that capture situational details are a powerful addition to established
data types. First, they create an “incomparably richer record” of what
happened during a situation (Jordan and Henderson 1995:52). Visuals used
as data containers allow analyses in greater detail due to the possibility of
rewinding videotapes or studying photographs meticulously (Heath et al.
2010:61-86). In this way, researchers are able to focus on even the most
fleeting of information (such as microexpressions of emotions) and recon-
struct the exact sequence of interactions frame by frame, even during long
and complex situations. Second, using recorded visual data increases relia-
bility of findings because multiple researchers can analyze the same data
material. The value of sharing and showing visual data for transparency
cannot be overestimated since it allows testing intercoder reliability and
increases reproducibility of findings (Heath et al. 2010:7). Third, some
events are too rare for systematic participant observation (e.g., mass panics,
natural disasters, bar fights).6 With the proliferation of cell phone and CCTV
cameras, such events are now often photographed or recorded on video,
allowing researchers to study even such extraordinary situations.
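
As an illustration of how intercoder reliability on shared footage might be checked, the following minimal sketch computes Cohen's kappa for two coders who each assigned one code per video segment. The function name `cohens_kappa`, the codes "tense" and "relaxed," and the six example segments are hypothetical and are not taken from any of the studies discussed here.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of segments.")
    n = len(codes_a)
    # Observed agreement: share of segments on which both coders chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement, derived from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six video segments (illustrative, not real data).
coder_1 = ["tense", "tense", "relaxed", "relaxed", "tense", "relaxed"]
coder_2 = ["tense", "relaxed", "relaxed", "relaxed", "tense", "relaxed"]
kappa = cohens_kappa(coder_1, coder_2)
```

Kappa corrects raw agreement for agreement expected by chance; other reliability coefficients may suit a given coding scheme better.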
VDA and Experimental Behavioral Studies: Focusing on Natural Behavior
Studies in psychology also have a tradition of employing visual data, albeit
often in experimental settings. Studies trigger and analyze facial and bodily
expressions of emotion (Ekman, Friesen, and Ellsworth 1972), focus on
emotion suppression (Gross and Levenson 1993, 1997), emotional aspects
related to aging (Isaacowitz et al. 2006; Pruzan and Isaacowitz 2006), and
marital satisfaction (Carstensen, Gottman, and Levenson 1995; Levenson
and Gottman 1985). For example, Waldinger et al. (2004) analyze videos
of 47 married couples who discuss areas of disagreement in their relation-
ship, and the authors use this material to study emotions in couple interac-
tions and marital quality and stability. Such studies contribute to the analysis
of behavior and social situations through detailed analytic approaches, sys-
tematic coding apparatuses, and experimental rigor (see Harrigan, Rosenthal,
and Scherer 2008; Scherer and Ekman 1982). However, they usually analyze
visual data of behaviors recorded in laboratories, instead of observing natural
behaviors directly in real-life settings (Harrigan et al. 2008; see Drury et al.
2009 for an innovative use of immersive virtual environment technology to
address this problem). This focus entails omitting a number of questions
essential to understanding natural behaviors (Baumeister, Vohs, and Funder
2007). Thus, VDA adds to psychological behavioral studies by employing
new data sources (e.g., CCTV or mobile phone recordings), allowing anal-
ysis of natural behaviors while maintaining the detail of psychological beha-
vioral analyses.
VDA and Multimodal Interaction Analysis: Focusing on Social Outcomes
Multimodal interaction analysis is an analytic approach developed in linguis-
tics to address growing availability of video footage. It is rooted in conversa-
tion analysis (Sacks, Schegloff, and Jefferson 1974), which analyzes verbal
and nonverbal communication. Broadening this scope, multimodal interac-
tion analysis studies visuals to assess all channels of communication (Norris
2004:2; Stivers and Sidnell 2005). For VDA, multimodal interaction analysis
offers at least two useful insights. First, the approach disentangles and thus
allows coding and analyzing of various streams of communication (Norris
2004; Stivers and Sidnell 2005). It includes not only words and sentences
people use to communicate (i.e., the lexicosyntactic channel) but also an
array of additional channels such as intonations, facial expressions, gestures,
gazes, body postures, and physical aspects of the environment (Stivers and
Sidnell 2005:2). Second, multimodal interaction analysis identifies patterns
in communication relevant to analyzing interactions including interactional
turn-taking, the organization of communication sequences, and how partici-
pants repair unsuccessful attempts at communication (Norris 2004; Stivers
and Sidnell 2005).
Despite these overlaps, multimodal interaction analysis differs from VDA
in several regards. Whereas VDA focuses on interactions and emotions to
understand social outcomes such as use of violence (Collins 2008; Nassauer
2016, 2018b), tipping points in negotiations (Klusemann 2009), or criminal
behaviors (Nassauer 2018a), multimodal interaction analysis focuses on how
“communication is accomplished” (Stivers and Sidnell 2005:2), that is, the
technical aspects of communication itself (e.g., how it is structured and
organized or how it works and breaks down). This difference leads to disparities
in the analytic approaches. For example, multimodal interaction
analysis is not concerned with how people perceive and interpret a situation
but only with what they express and how others react to it (Norris 2004:4).
During VDA, participants’ interpretations of a situation can be important
parts of analysis (Collins 2008, 2014, 2016a; Nassauer 2016). Moreover,
multimodal interaction analysis does not usually try to establish causal links
between situational dynamics and social outcomes beyond communication.
Such links are an important aspect in many VDA applications. Despite these
disparities, multimodal interaction analysis informs VDA by analyzing the
organization, structure, and function of communication during interactions.
Three Examples of VDA Applications

We will use three studies as recurring examples to discuss VDA as a meth-
odological framework. All three examine situational aspects of natural beha-
viors caught on tape. For reasons of clarity and brevity, we use studies from a
single interdisciplinary field—violence research. The methodological frame,
however, applies to other substantive fields as well. The three studies were
chosen to represent a range of research scenarios, including (a) single-case,
qualitative studies that use one visual; (b) multiple-case, quantitative studies
that analyze one visual per event; and (c) multiple-case studies that analyze a
patchwork of visuals.
Klusemann’s (2009) paper on atrocities and confrontational tension
examines one eight-hour video for a single-case analysis of events leading
to the 1995 Srebrenica massacre in Bosnia–Herzegovina. He applies Col-
lins’s (2008) notion of emotional dominance in social situations as a cause of
violent outbursts. The author assesses a turning point during events that led
up to the atrocity. The eight-hour video was recorded by a Serbian camera
team and accessed through the Criminal Tribunal for the Former Yugoslavia.
It shows negotiations between Serbian General Mladic and Dutch peace-
keeping Commander Karremans before the outbreak of the massacre. Klu-
semann complements this visual with document data on the event. He
employs qualitative analysis with a focus on emotions, challenging prevalent
approaches in the study of war and atrocities that focus on background
factors as explanatory variables.
Levine et al. (2011) study 42 videos of one to eight minutes in length,
capturing 42 incidents of aggression in a city in North West England. Their
analysis focuses on group effects and behaviors of third parties leading from
aggression to physical altercations. The videos were captured by CCTV
cameras and provided by a city council. The authors focus on action variables
leading to fights or preventing them from occurring. Statistical analysis of 42
violent and nonviolent cases, with a focus on behavioral paths, shows that
third parties inhibit rather than facilitate violence. This finding challenges
conventional psychological theories on the role of group size in regulating
aggression and violence (2011:411).
Nassauer (2016, 2018b) analyzes 30 violent and peaceful protest marches
using nearly 1,000 visual data that vary in length between a few seconds and
several hours. She analyzes the emergence of violence in protests, focusing
on interactions and emotional dynamics between protesters and police that
led to violent clashes. Nassauer complements these data with document data
and uses in-depth, cross-case comparisons to study situational interactions
and emotions. The results question predominant theories on protest violence,
which assume police or protesters’ motivations or strategies are crucial for
outbreaks of such violence.
Analytic Tool Kit for VDA

Although researchers employ what we call VDA in a variety of disciplines,
there are commonalities to the analytic approaches. However, since these
commonalities have not yet been collected systematically, VDA applications
have been insulated within disciplines, and advances in one field are often
overlooked in others. To move beyond this insulation, we distill commonal-
ities from existing applications into a VDA tool kit of analytic dimensions
and procedures. This tool kit can operate as a reference point for authors,
reviewers, and readers of VDA studies and as a starting point for consolidat-
ing VDA into a coherent methodological approach. Not all aspects we intro-
duce in this section are relevant to every study. For example, Levine et al.
(2011) do not focus on facial expressions and body postures as analytic
dimensions to study emotions, but Klusemann (2009) and Nassauer (2016)
do. We also do not claim that this tool kit is exhaustive; additional aspects
might arise during application of VDA.
Analytic Dimensions
Analytic dimensions refer to the content of visual data that are of interest
when analyzing situations. We differentiate (a) facial expressions and body
posture, (b) interactions, and (c) context (see also Collins 2016a:92).7 Facial
expressions and body postures are any nonverbal information that a person’s
face and body convey. Interactions refer to anything people do or say that is
geared toward or affects their environment or people within. Context means
information on the physical and social setting of a situation. Like any analy-
tic approach, the dimensions presented in the following sections entail the-
oretical underpinnings (Jordan and Henderson 1995:40). Discussing these
underpinnings lies beyond the scope of this article (e.g., the question whether
there is a connection between facial expressions and emotions, see Ekman
2003-17; 213-231). Researchers applying VDA must reflect on the theore-
tical assumptions underlying their respective studies. The suggested analytic
dimensions should thus be understood as lenses that help derive information
from visual recordings and that might help to understand situational
dynamics, provided they draw on a thorough theoretical reflection and
employ clear, detailed coding schemes.
Face and body posture. People convey much information nonverbally through
their faces and bodies (Birdwhistell 1970:158; Kendon 1990). Accordingly,
facial expressions and body postures can be essential dimensions when work-
ing with visual data. When coding facial expressions, Collins (2009:567)
suggests focusing on three areas: the mouth and jaw, the region around the
eyes, and the forehead. Scrutinizing visual data with this focus can allow
identifying universal emotions (i.e., fear, anger, happiness, sadness, surprise,
disgust, and contempt; see Ekman et al. 1972), positive or negative affect,
degrees of interest and engagement (Carstensen et al. 1995:143), and attentional
foci in a situation (Jordan and Henderson 1995:67).
Another option is to look at body postures, for example, the relative
position of the head, shoulders, torso, arms, and legs (see Birdwhistell
1970), head movements, hands, and neck (see Waxer 1985). Research sug-
gests that analyzing body postures allows deriving information about peo-
ple’s emotional states, levels of energy, attention focus, and confidence
(Collins 1993, 2008; Juslin and Scherer 2008). For example, relaxed body
postures are characterized by shoulders down, relaxed muscles, and soft
movements. They can indicate that actors feel at ease, safe, and/or self-
assured, while tense body postures can indicate that actors feel threatened,
uncomfortable, and/or stressed. Depending on the theoretical underpinnings
of a study, similar information can be derived from the way actors move in a
situation (e.g., an actor’s levels of activity) or an actor’s gaze (Exline and
Fehr 1982; for an overview, see Juslin and Scherer 2008).
In his study of the Srebrenica massacre, Klusemann (2009) draws on
Ekman et al.’s (1972) study of human facial expressions and universal
emotions to examine the emotional states of Mladic and Karremans during
their negotiations. He combines Ekman et al.’s coding of facial expressions
with analysis of actors’ body language and Collins’s concept of high- and
low-emotional energy. Klusemann measures high-emotional energy by
actors moving firmly, taking initiative during interactions, and showing a
strong physical presence (e.g., pulling oneself up to full height; for a full
list, see Klusemann 2009:9).
Nassauer (2016) similarly uses facial expressions as indicators of police
and protesters’ emotional states during demonstrations to detect changes
from relaxed to tense emotional states (e.g., anger and fear) and to examine
the connection of such changes to the emergence of collective violence. She
also relies on body postures in addition to facial expressions as indications of
emotions, especially since the faces of recorded actors are sometimes not
visible (e.g., due to officers wearing helmets). Based on research from
Ekman et al. (1972) and Klusemann (2009:9), she codes (2016:16) indicators
of fear such as postures and movements that are shrinking, avoiding gaze,
hand touches or covers face, eyes, mouth, contracting muscles, stiff stance
(body), lowered brows, inner and outer brow raised, raised upper eyelids,
tense lower eyelids, tensed or stretched lips, lips parted, and dropped jaw
(face).
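
A coding scheme of this kind can be written down explicitly, which makes it easier to share among coders. The sketch below groups the fear indicators listed above by body and face and checks which of them were observed in a segment. The names `FEAR_INDICATORS`, `code_segment`, and `has_fear_indicator`, and the rule that a single visible indicator suffices, are our assumptions for illustration and do not reproduce the coding procedure of the cited studies.

```python
# Fear indicators as listed in the text, grouped by where they are observed.
FEAR_INDICATORS = {
    "body": {
        "shrinking posture or movement",
        "avoiding gaze",
        "hand touches or covers face, eyes, or mouth",
        "contracting muscles",
        "stiff stance",
    },
    "face": {
        "lowered brows",
        "inner and outer brow raised",
        "raised upper eyelids",
        "tense lower eyelids",
        "tensed or stretched lips",
        "lips parted",
        "dropped jaw",
    },
}


def code_segment(observed):
    """Return the listed fear indicators found in one segment, by channel."""
    return {
        channel: sorted(indicators & set(observed))
        for channel, indicators in FEAR_INDICATORS.items()
    }


def has_fear_indicator(observed, visible_channels=("body", "face")):
    """True if any listed indicator appears on a visible channel.

    The one-indicator threshold is an illustrative assumption.
    """
    coded = code_segment(observed)
    return any(coded[channel] for channel in visible_channels)


# Hypothetical segment in which a helmet hides the face: only body codes count.
helmet_case = has_fear_indicator(
    ["stiff stance", "lips parted"], visible_channels=("body",)
)
```

Restricting `visible_channels` mirrors the situation described above in which faces are hidden and coders rely on body postures alone.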
In addition to studying facial and bodily expression in visual recordings,
research suggests that it may be possible to study emotions in terms of
arousal using thermal imaging (e.g., Clay-Warner and Robinson 2014).
Thermal imaging displays infrared radiation emitted by humans. Increases in
emitted radiation can be combined with visual recordings of interactions,
behaviors, or emotions to detect and confirm changes to emotions and their
relationships to interactions.
Interactions. When conducting VDA, interactions are usually a core interest.
First, interactions refer to movement and actions, such as moving in space,
using items, touching, hitting, hugging, and otherwise engaging
others physically or navigating interpersonal and environmental spaces
(Collins 2008; also see Kendon 1990; Watson and Graves 1966). In a study
of bar fights, Levine et al. (2011) focus on the actions of bystanders. They
code escalatory (e.g., kicking or slapping someone) and conciliatory beha-
viors (e.g., blocking contact or pulling fighters apart) and analyze how such
behaviors influence the dynamics of aggressive situations. Density calcula-
tions add another layer to the analysis of interactions (for a current research
project employing this method, see Reichertz and Schöner 2014). These can
be combined with visual data, for example, to study whether crowd members
move closer together after specific interactions in the in-group or out-group
or for the prevention of crowd disasters (Abuarafah, Khozium, and
AbdRabou 2012).
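
To make the idea of checking whether crowd members move closer together concrete, the following sketch compares the mean pairwise distance between people in two frames. It is a simple proxy under stated assumptions, not the density method used in the projects cited above: positions are assumed to be available as (x, y) coordinates in meters, and the function names and example coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distance(positions):
    """Average distance between all pairs of people in one frame."""
    pairs = list(combinations(positions, 2))
    if not pairs:
        raise ValueError("Need at least two positions.")
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def moved_closer(before, after):
    """True if people are, on average, closer together in the later frame."""
    return mean_pairwise_distance(after) < mean_pairwise_distance(before)


# Hypothetical positions (in meters) of three people before and after an interaction.
before = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
after = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.5)]
closer = moved_closer(before, after)
```

In practice, obtaining positions from footage requires camera calibration or manual annotation, which this sketch leaves aside.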
Second, interactions include gestures: regionally or culturally specific
nonverbal codes for communicating directions, distress, indifference, con-
tempt, the beginning or end of an encounter, and a broad range of other
content (Birdwhistell 1970; Juslin and Scherer 2008). Some gestures like a
nod are widely understood, while others reveal their meaning only in a
specific sequence of actions such as a teacher opening a textbook as a signal
that class has started (Stivers and Sidnell 2005:5). For example, Nassauer
(2016, 2018b) analyzes gestures with which police officers communicate
good will even in situations of tension and close physical contact with pro-
testers.8 Such gestures can be studied as important communication devices
that influence situational dynamics, in this case, deescalating the situation
(Nassauer 2015).
Third, verbal communication and other audio can be analyzed as part of
interactions. Content, intonation, and tone of voice provide information on
positive or negative affect, taunts, aggression, and conciliation (Carstensen
et al. 1995:143; Juslin and Scherer 2008; see Sacks et al. 1974 and Norris
2004 for approaches to analyzing communication). For example, Klusemann
(2009) emphasizes verbal communication during the negotiations he studied.
He examines pauses and emphasis of speech and interruptions of one speaker
by another. He shows that pauses by General Mladic dramatize the power
stratification and lead Commander Karremans (the listener) to experience
unease, thereby amplifying the latter’s loss of dominance throughout the
interaction (p. 7).9
If both audio and video data are used in a VDA study, it is important to
establish valid and reliable links between both channels. If the two channels
come from different data, a researcher can assess whether both sources are
reliable. If audio and video channels come from a single datum, researchers
should verify whether both channels match in timing (i.e., whether sounds
and gestures that occur together in the data really occurred together during
the situation; Stivers and Sidnell 2005:7-10). Further, audio information
might reveal the presence of other people or external influences (e.g., traffic
noise or music) not captured on film. It is also useful to verify whether audio
and video corroborate or contradict each other regarding how a situation
unfolds, and how each channel influences the outcome. Hence, if available,
audio recordings strengthen visual analysis of interactions.
All the above interactions can be coded regarding their intensity, timing,
frequency, and duration (Gross and Levenson 1993:973-74). Intensity
refers to a rating of how extreme an interaction was. Timing refers to when
an interaction occurred during a situation, and frequency refers to how
often an interaction occurred during a situation, while duration refers to
how long it lasted.
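The four coding properties can be illustrated with a minimal sketch. It assumes that each coded interaction is stored with a code label, start and end times in seconds, and an intensity rating; the class name, field names, the 1 to 5 rating scale, and the example values are hypothetical and not taken from any of the studies cited here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedInteraction:
    code: str       # hypothetical code label, e.g. "escalatory"
    start_s: float  # timing: seconds from the start of the recording
    end_s: float    # end of the interaction, same clock
    intensity: int  # hypothetical rating from 1 (mild) to 5 (extreme)


def summarize(events):
    """Return frequency, total duration, first onset, and mean intensity per code."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event.code].append(event)
    return {
        code: {
            "frequency": len(items),
            "total_duration_s": sum(e.end_s - e.start_s for e in items),
            "first_onset_s": min(e.start_s for e in items),
            "mean_intensity": sum(e.intensity for e in items) / len(items),
        }
        for code, items in grouped.items()
    }


# Invented example data for one recorded situation.
events = [
    CodedInteraction("escalatory", 12.0, 15.0, 4),
    CodedInteraction("conciliatory", 16.0, 21.0, 2),
    CodedInteraction("escalatory", 30.0, 32.0, 5),
]
summary = summarize(events)
```

Storing start and end times rather than durations keeps timing and duration recoverable from the same record, which also allows ordering codes later when reconstructing sequences.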
Context. The context of a situation should not be understood merely as static
background factors of a situation. The significance of social action and
activity is embedded within the circumstances in which they arise; an action
can have different meanings depending on context (Heath et al. 2010:82-83;
Stivers and Sidnell 2005). We argue that the context of a situation can be
coded regarding physical and social dimensions, and both should be incor-
porated during a VDA. Of course, not all information on context can be
derived from visual data, especially regarding what actors bring to situations.
This illustrates the advantages of triangulating visual and nonvisual data
during VDA.
Physical dimensions refer to properties of the environment in which a
situation unfolds. Examples include the expanse of space (e.g., ample or
confined), lighting (e.g., bright or dark), weather conditions, and access
(e.g., restricted or open). As approaches such as multimodal interaction
analysis (Norris 2004) or actor–network theory (Callon 1986; Latour 1993;
Law 2009) suggest, inanimate objects, such as tools and devices, TV screens,
or audio speakers, often play a role in how a situation develops because they
mediate actions and interactions between people (Law 1992:381-84). For
example, Nassauer’s (2016, 2018b) study of violence during demonstrations
shows that assignment of space for protest routes or no-protest zones is
crucial to the unfolding of situational dynamics. When group A (protesters
or police) perceives during the demonstration that group B is invading space
that was assigned to group A prior to a demonstration, spatial incursions
foster tensions and fears among police and protesters. In the same study,
Nassauer (2016, see also 2015) shows that objects such as police communi-
cation devices and special audio speakers developed for crowd management
play a role in how demonstration marches unfold.
The social dimension of context refers to aspects such as actors present,
information on social relationships prior to a situation captured on tape, and
social roles during a situation. For example, it might be useful to count how
many actors are present and describe their gender, ages, and attire, if dis-
cernable. Depending on the research question, it might be useful to know
which actors knew each other prior to the situation under study, whether
they had a positive or negative relationship, how far back the relationship
goes, how likely it is to continue in the future, and how intense the rela-
tionship is. If available, it might be helpful to determine further information
about specific actors (e.g., mental illness, substance abuse, inebriation, or
health conditions). Actors’ interpretations and expectations of the situation
might also be relevant, and whether an actor is familiar with the type of
situation under study. Levine et al. (2011:411) reflect on not knowing
whether actors involved in fight scenes consumed alcohol, though many
individuals presumably had. Such information might be available if
researchers obtain CCTV from police, combined with police case files or
reports.10 Klusemann (2009:38) provides context by discussing the stages
of the Yugoslavian war and describing the power balance in phases
preceding the recorded encounter. Nassauer (2016, 2018b) discusses the social
dimension of context by analyzing the extent to which police officers’
understanding of their own social roles during protests, police training, and
cultural expectations influence situational dynamics.
Analytic Procedures
Like analytic dimensions, analytic procedures depend on the specific VDA
application, but common aspects exist. The goal of VDA is to reconstruct a
situation step-by-step, analyze its inner dynamics, and establish comprehen-
sive story lines. We discuss approaches commonly employed in VDA studies
to achieve these goals, by coding situations, reconstructing processes and
events, and establishing causal links.
Coding situations. When assessing situations using VDA, the first step is
analyzing them in great detail (even frame by frame, if necessary). As pro-
posed by multimodal interaction analysis (e.g., Norris 2004), processes,
events, or situations under study can be broken down into lower-level actions
and one or multiple higher-level actions. Lower-level actions are single
actions that comprise the smallest analytic element of a situation (e.g., a
shrug, a change in gaze direction, or a shout). Multiple lower-level actions
chained together and enclosed by opening and closing actions comprise a
higher-level action (Norris 2004:11), such as a conversation, a fight, or
surgery. VDA studies occasionally analyze a single higher-level action, such
as Levine et al.’s (2011) brawls. Other times, they analyze multiple higher-
level actions chained into a larger event such as Nassauer’s (2016) analysis of
demonstration marches. In both cases, VDA relies on meticulous reconstruc-
tion of sequences of actions during a process, situation, or event.
The analytic dimensions introduced above provide a set of codes with
which to disentangle various lower-level actions that often occur simultane-
ously. The researcher should dissect the situation under investigation, iso-
lating each lower-level action. How fine-grained coding should be depends
on the VDA research project. One option is multimodal transcription of a
situation, in which all lower-level actions are notated in multimodal chron-
ological order (see Norris 2004, Chap. 3 for a guideline). In other studies, it
may suffice to analyze lower-level actions in some parts of the situation or
event and focus on higher-level actions in others (Nassauer 2016, 2018b).
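The relation between lower-level and higher-level actions can be sketched as a simple data structure. The sketch below assumes that each lower-level action has been coded with a time stamp, an actor, a mode, and a label, and that the researcher has already decided which labels count as opening and closing actions. All names and example values are hypothetical; this is one possible representation, not the notation used in multimodal transcription.

```python
from dataclasses import dataclass, field


@dataclass
class LowerLevelAction:
    t: float     # time stamp in seconds
    actor: str   # hypothetical actor identifier
    mode: str    # e.g. "gesture", "gaze", "speech"
    label: str   # code assigned by the researcher


@dataclass
class HigherLevelAction:
    label: str
    actions: list = field(default_factory=list)

    def span(self):
        """Return (start, end) time of the chained lower-level actions."""
        return (self.actions[0].t, self.actions[-1].t)


def chain(actions, opening, closing, label):
    """Collect actions from the first `opening` label through the next `closing` label.

    Assumes that `opening` and `closing` are different labels.
    """
    collected = []
    inside = False
    for action in sorted(actions, key=lambda a: a.t):
        if not inside and action.label == opening:
            inside = True
        if inside:
            collected.append(action)
            if action.label == closing:
                break
    return HigherLevelAction(label, collected)


# Invented example: codes entered out of order, as often happens during coding.
actions = [
    LowerLevelAction(3.0, "B", "speech", "shout"),
    LowerLevelAction(1.0, "A", "gaze", "gaze shift"),
    LowerLevelAction(2.0, "A", "gesture", "raise fist"),
    LowerLevelAction(5.0, "B", "movement", "step back"),
    LowerLevelAction(6.0, "A", "gesture", "shrug"),
]
confrontation = chain(actions, "raise fist", "step back", "confrontation")
chained_labels = [a.label for a in confrontation.actions]
```

Here `chained_labels` contains only the actions between the opening and the closing action, in chronological order; the gaze shift before and the shrug after remain outside the higher-level action.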
Specialized software can help in the coding effort and facilitate transcrip-
tion and analysis. For example, Noldus Observer XT (version 14) is geared
toward deductive analysis of behaviors in which codes are defined early.
When using an open or inductive approach (Strauss and Corbin 1998),
Atlas.ti (version 8), NVivo (version 12), or MAXQDA (version 2018) are
more useful since they allow adding, changing, and grouping codes freely
during coding. These packages are also useful when researchers aim to
triangulate visual data with other data types. Other software options include
Elan (version 5.0.0-beta) (designed for the analysis of language, signs, and
gestures) or Multimodal Analysis software. Some software improves low-
quality videos (e.g., low-resolution CCTV recordings), and others, such as
Noldus Face Reader (version 7.1), offer automated coding of emotions in
facial expressions or automated face recognition (e.g., FindFace). Automated
coding of visual data is evolving quickly. For example, there have been
substantial advances in movement recognition algorithms, which are applied
to tasks such as automated recognition of violent altercations in visual data
(Gao et al. 2016; Nievas et al. 2011; Ribeiro, Audigier, and Pham 2016).
Such capabilities will increase in the near future, promising additional ben-
efits for VDA applications.
Reconstructing processes and events. Coding all relevant lower-level actions of
a situation meticulously serves as the basis for further analysis. Crucial to
VDA are the timing and temporal sequence of codes within the social and
physical context of a situation—what happened when, following what, and
leading to which subsequent development. Analyses often reconstruct tem-
poral sequences or produce a tempospatial matrix of events, a diagram of
movements, or some other type of description of process within a situation.
For example, in a tempospatial matrix, types of interactions or facial and
bodily expressions occupy columns, and rows mark progression over time
(Figure 2). Based on such a matrix, it is possible to reflect on patterns in
temporal sequences of situations. Researchers can scrutinize if and how
participants take turns during interactions, and whether there is rhythmicity
in situations (Jordan and Henderson 1995:59-67).11
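A tempospatial matrix of this kind can be assembled from time-stamped codes, as the following minimal sketch shows. It assumes codes are available as (time in seconds, channel, label) tuples and that fixed-width time bins are adequate for the research question; the function name, bin width, channel names, and example codes are hypothetical.

```python
def build_matrix(codes, bin_s, channels):
    """Build rows of [bin_start, labels per channel...] from (time_s, channel, label) tuples.

    Rows mark progression over time; columns hold one analytic channel each.
    `codes` must be a list, since it is traversed twice.
    """
    n_bins = int(max(t for t, _, _ in codes) // bin_s) + 1
    rows = [[i * bin_s] + [[] for _ in channels] for i in range(n_bins)]
    column = {name: i + 1 for i, name in enumerate(channels)}
    for t, channel, label in sorted(codes):
        rows[int(t // bin_s)][column[channel]].append(label)
    return rows


# Invented example codes for two channels.
codes = [
    (5, "interaction", "approach"),
    (20, "emotion", "tension"),
    (22, "interaction", "push"),
    (50, "emotion", "fear"),
]
matrix = build_matrix(codes, bin_s=15, channels=["interaction", "emotion"])
for row in matrix:
    print(row)
```

Empty cells are kept deliberately: a time bin without codes in any channel may indicate either an uneventful phase or a gap in the data, and the two should be distinguished before interpreting sequences.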
For example, Klusemann (2009) focuses on actions and reactions of Gen-
eral Mladic and Commander Karremans, and how their respective emotional
states developed during negotiation. Figure 2A illustrates this analytic pro-
cedure, which examines the parallel sequences of lower-level actions during
verbal and nonverbal communication. Levine et al. (2011) similarly follow
the development of aggressive situations step-by-step, with focus on the
influence of bystander actions. As shown in Figure 2B, the authors construct
a sequential action tree for each situation, showing the succession of bystan-
ders’ escalatory and conciliatory behaviors after an initial escalatory beha-
vior by a perpetrator (2011:410). Nassauer’s (2016) analysis required a
different approach since the author uses a patchwork of data to assess tem-
porally and spatially extensive protest events. This patchwork data approach
allows reconstructing complex cases that cannot be captured in their entirety
in only one video. As shown in Figure 2C, once the author locates data in
space and time,12 she looks for changes to participants’ emotional states
(from relaxation and happiness to tension and fear) and then traces the
interactions that produced these changes (Nassauer 2016:5).
Causal links. Based on a reconstruction of processes and events, the next step
in VDA often is to identify systematic links between situational dynamics
and the outcome. VDA operates similarly to causal process tracing and its
understanding of processes as a temporal sequence of linked elements
(Blatter 2012; Mahoney 2012), though with a focus at the microlevel of
situations. Many VDA studies have the goal of establishing comprehensive
and compelling story lines that explain why an outcome did or did not occur,
based on what happened during the situation. This can be done explanatorily
to generate hypotheses and build theories, or, with an adequate sample, to
identify generalizable patterns.
Figure 2. Examples of analytic procedures in video data analysis. (A) Parallel
transcript with columns for time, conversation, and facial expression and
body posture (adapted from Klusemann 2009, p. 6). (B) Sequential action tree
of Action A and Action B branches with probability (p) labels (adapted from
Levine et al. 2011:410). (C) Sequence from start of event to outcome with
rows for interaction, emotion, location, and time (Nassauer 2016:5).
Applied to VDA, this means that a comprehensive and compelling story
line of a situation should show the situational path that led (through one or
several chains of causally linked actions and stages) from the start of the
situation to the occurrence of the outcome under study or its absence (see
Blatter 2012:5). There are several strategies to examine whether such a story
line exists. One option is to focus on temporally and spatially adjacent
actions, examining how one might have caused the next. Identifying the
mechanism that links two actions can be crucial to identifying causal links
(Brady 2008:218). Smoking-gun observations combine these two aspects,
showing clearly the connection between cause and effect (Blatter 2012:16,
19). It might also be useful to look for pivotal moments in the sequence of
actions in a situation, such as turning points, critical junctures, or windows of
opportunity (Goertz and Levy 2007).
To establish a comprehensive and compelling story line, Klusemann
(2009) draws heavily on his simultaneous analysis of a verbal conversation
and both actors’ nonverbal behaviors to establish links between lower-level
actions (i.e., specific interactions and emotions that ultimately led to the
outcome). The analysis enables the author to identify each new cue in the
conversation and the ensuing reaction by the other. Klusemann shows step-
by-step how Mladic gains emotional dominance over Karremans during their
conversation. Levine et al. (2011) calculate the probability for each action to
turn aggression into violence and employ statistical analysis to examine the
microregulation of violence. They identify a pattern in third-party behaviors
and probabilities of action sequences that prevent aggression from turning
into physical violence. Nassauer (2016, 2018b) compares patterns across 30
violent and peaceful cases, looking for interactions that consistently change
emotional states and whether they associate with emergence of violence.
Using this strategy, Nassauer shows that combinations of interactions during
demonstrations systematically increase tension and fear, and thus foster vio-
lent outbreaks even during otherwise peaceful protest marches. As shown in
Figure 2, both Nassauer (2016) and Klusemann (2009) focus on time
sequences, and Levine et al. (2011) focus on action variables and the sub-
sequent probability of aggression leading to violence.
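The idea of attaching probabilities to action sequences can be illustrated with a minimal sketch that estimates first-order transition probabilities from coded sequences. This is a simplified illustration under stated assumptions, not the statistical procedure used by Levine et al. (2011): the codes "E" (escalatory) and "C" (conciliatory), the terminal outcome labels, and the four sequences are invented for the example.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next code | current code) from consecutive pairs in each sequence."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


# Invented sequences: each ends with a hypothetical outcome label.
sequences = [
    ["E", "C", "C", "no_violence"],
    ["E", "E", "violence"],
    ["E", "C", "E", "violence"],
    ["E", "C", "C", "no_violence"],
]
probs = transition_probabilities(sequences)
```

With so few cases, such estimates describe the coded sample only. Whether they support claims about generalizable patterns depends on sample size and case selection, as discussed in the remarks that follow.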
The notion of establishing causal links through careful analysis usually
assumes that “there is crucial causality at the microlevel” of these events
(Collins 2016b). A discussion of theoretical and methodological
underpinnings of this perspective is beyond the scope of this article, but four
short remarks seem in order: (a) Analyzing causal links is not equivalent to
claiming generalizable findings, as demonstrated by small-N VDA studies
(Klusemann 2009) that usually strive for generating hypotheses and building
theory. (b) Not all social phenomena show causality at the microlevel; if
VDA scholars find that a comprehensive story line does not exist, this might
indicate that crucial factors for explaining the outcome lie outside the situ-
ation. (c) Even under the assumption of causality, the burden of proof always
lies with the researcher (for an overview of criteria of causal links, see Brady
2008). The task requires not only the kind of meticulous analysis described
above but also careful theoretical reflection. Theory sharpens a scholar’s
eyes to detect and evaluate patterns found in data (Collins 2016a:83). (d)
It can be prudent to combine VDA with other analytic tools such as classic
cross-case comparisons, as in Nassauer’s (2016) study, or statistical analysis,
as employed by Levine et al. (2011). There are also approaches more spe-
cialized to analyzing sequential order, such as sequence analysis (MacIndoe
and Abbott 2004; Aisenbrey and Fasang 2010), which can add to VDA
studies and the analysis of causal links in situational dynamics.
Validity in VDA
Apart from commonalities in analytic dimensions and analytic procedures,
VDA studies also share challenges to and criteria for validity. Such criteria
are a cornerstone of any analytic framework and help evaluate research.13
Validity means that the concepts and indicators researchers use capture what
they are intended to capture and that data used provide evidence for conclu-
sions drawn (LeCompte and Goetz 2007:17; Saylor 2013:354). Extant studies
that employ VDA follow criteria of validity but without an explicit discussion
of how the use of visual data affects these criteria (e.g., Collins 2008; Kluse-
mann 2009). We introduce optimal capture and natural behavior as criteria for
validity. Optimal capture means visual data should cover the duration of a
situation or event, its space, and all actors involved. Natural behavior refers to
the degree to which actors in visual data behave the same way they would
without being filmed in the type of situation under investigation.
Optimal Capture
VDA is especially suited to situations and events during which all relevant
actors can be observed from start to finish in a limited, well-defined period.
However, even large, complex events can be studied using VDA, as long as
available data approach optimal capture of situational dynamics. Since it
is difficult to formulate a clear criterion for when data capture an event with
sufficient completeness, we propose a pragmatic working definition: Visual
data must enable researchers to establish a seamless sequence of relevant
lower-level actions and provide compelling empirical evidence for sys-
tematic links between those actions. Optimal capture of a situation is a
high goal, one that visual data are uniquely positioned to achieve. Other
data types such as survey data, participant observations, or interviews are
inherently unable to ensure optimal capture of an event under study because
participants are unable to take in and recall all relevant information of a
situation (Jordan and Henderson 1995:52; Lipinski and Nelson 1974:342;
Vrij et al. 2014).
There are two main challenges to optimal capture, both when using single
videos of short, straightforward interactions and when studying large, com-
plex events with a patchwork of visual data. The first challenge to optimal
capture is that recordings can be incomplete. For example, a recording might
start too late or finish too early to capture an entire situation (similar to the
issue of left- and right-censored data in survey research; see M. Verbeek
2008). The frame or angle captured might exclude relevant moments or
aspects from recording. This incompleteness can be accidental, coincidental,
or purposive (e.g., members of one faction of a conflict record the transgres-
sions of their opponents while ignoring their own faction’s misconduct). A
similar result follows if people later edit recordings, removing scenes or
purposely selecting angles to use in the final cut. Hence, researchers must
ensure that no part of an interaction is missing in the visual data they use
during VDA and that nothing is left out due to the angle of filming. If edited
data are used, scholars employing VDA must reflect on possible gaps, miss-
ing information, and the possible implications for their findings. Moreover,
when selecting cases, researchers should reflect on whether case selection
and data source selection may have introduced bias. For instance, selecting
data pieces from a single source may introduce bias if that source represents
an individual, a group, or an institution with a clear stake in how the situation
is perceived.
A second issue is whether the material allows observing and investigating
all relevant aspects of a situation. Possible issues include poor resolution,
people blocking the view of events or interactions, and aspects being too far
from the camera to be discernable. Thus, scholars employing VDA must
ensure that visual data allow them to examine exactly what happened, when,
where, and how (see also Nassauer 2018a).
Klusemann (2009) and Levine et al. (2011), and in a different way Nassauer
(2016, 2018b), illustrate strategies for pursuing optimal capture. The
former two studies analyze spatially confined situations with a limited
number of participants, and both use one video to capture the entire event
under study. Nassauer triangulates sources and data types to pursue optimal
capture.
Klusemann uses one visual document of eight hours, filmed by a Serbian
camera team. Since the author focuses on emotional dominance, and his data
contain both visual and audio information, a single datum ensures optimal
capture of the interaction. Aggressions in Levine, Taylor, and Best’s study
are short events, ranging from one to eight minutes. Thus, relying on a single
recording for each case allows capturing all of the situations’ relevant
aspects.14
For other VDA studies, triangulation can be much more crucial to pursu-
ing optimal capture. Nassauer (2016) analyzes protest marches, events that
often involve thousands of participants, cover wide, open areas, and last for
several hours.15 The author therefore triangulates data using two data col-
lection techniques to strive for optimal capture of events. First, she gathers
recordings on each relevant situation from several angles and sources from
all relevant groups involved. This technique of multiple coverage hedges
against selectivity bias; validity of data increases if several data suggest the same
picture, despite being obtained from different types of sources and showing
the situation from different angles. Collins (2016a) therefore suggests that “it
is best to collect everything possible, all images available from all sources”
(p. 82). Second, Nassauer triangulates visual data with other types of data,
including reports from the media, police, protesters, and bystanders, and
retrospective interviews with participants, court files, and police documents.
Such complementary data can corroborate findings from visual data and
provide information on contextual factors (see also Bramsen 2018; Weenink
2014). Complementary data beyond the situation under study help with
assessing the viability of alternative explanations for the outcome under
study.
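When working with a patchwork of recordings, one practical check is whether the clips jointly cover the event without temporal gaps. The following minimal sketch assumes that every clip has already been located on a shared clock and is given as a (start, end) pair in seconds; the function name and example values are hypothetical. It addresses temporal coverage only and says nothing about whether space and actors are captured adequately.

```python
def coverage_gaps(clips, event_start, event_end):
    """Return the (start, end) stretches of the event not covered by any clip."""
    gaps = []
    cursor = event_start  # everything before the cursor is covered or reported
    for start, end in sorted(clips):
        if end <= cursor:
            continue  # clip adds nothing beyond what is already covered
        if start >= event_end:
            break  # clip begins after the event has ended
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
        if cursor >= event_end:
            break
    if cursor < event_end:
        gaps.append((cursor, event_end))
    return gaps


# Invented example: four overlapping clips of a one-hour event.
clips = [(0, 600), (540, 1500), (1800, 2700), (2400, 3600)]
gaps = coverage_gaps(clips, event_start=0, event_end=3600)
print(gaps)  # [(1500, 1800)]
```

A reported gap such as the five minutes above would then prompt a search for further recordings or nonvisual data covering that stretch, or documentation of the gap if none can be found.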
For some events or social phenomena, not even triangulation of several
videos and additional nonvisual data suffices to achieve the desired degree of
optimal capture. In such situations, we suggest two alternatives for VDA
scholars. First, researchers should document what parts of the situation or
specific channels of action are captured insufficiently. For example, Levine
et al. (2011:411) discuss that they do not have information on intoxication or
previous interactions of actors captured later in aggressive situations. How-
ever, the authors make a convincing argument for why they can draw
meaningful conclusions. Second, if gaps are too extensive to conduct
detailed analyses, we suggest that researchers remove respective cases from
the VDA data set. In a paper employing VDA to study how armed store
robberies succeed or fail, Nassauer (2018a) removes cases in which signif-
icant parts of an interaction are missing. If cases are removed due to insuffi-
cient capture, researchers should reflect on whether this could introduce bias
because of patterns in data availability that may lead to insufficient data for
certain types of cases.
Natural Behavior
Studies that employ VDA are largely interested in natural behaviors (i.e.,
those that occur the same way in unobserved situations of the same type).
Hence, analyzing a video in which people have adapted their behaviors poses
a problem for validity during VDA. The primary challenge with recording
natural behaviors is reactivity (LeCompte and Goetz 2007:12; Lomax and
Casey 1998),16 the possibility that actors adapt their behaviors due to the
presence of a researcher or recording device (LeCompte and Goetz
2007:20-22). Behaviors might be adapted because of a social relationship
between the recorder and the people being recorded or because people react
to the camera. For example, actors might refrain from deviant or potentially
embarrassing behaviors or act according to a different social role meant for
the audience of the recording (Harper 1988:28; Pauwels 2010:553f; Wagner
2002:165).17
Researchers can assess reactivity and data validity through a sequence of
three questions: (a) Did participants realize they were being filmed? (b) If
they did, is there evidence that they adapted their behaviors? (c) If so, to
what extent is reactivity part of the situation and thus to be considered natural?
The first question relates to participant awareness depending on the type of
recording and situation. For example, CCTV cameras in public places are
often unnoticed by participants. In other instances, cameras might remain
unnoticed because of dense crowds (Pauwels 2010:564). If people in the
visual footage did not know they were recorded, reactivity can be ruled
out (Pauwels 2010:563).
The second question is concerned with the fact that, even if participants
realize they are being filmed, data are not necessarily tainted by reactivity.
As Becker (1986:255f) points out, reactivity is low or nonexistent if record-
ings are commonplace and all participants expect them to occur or if inter-
actions are too important to be altered by the presence of a recording device.
Further, some behaviors are not altered because they occur subconsciously.
Gestures and some body movements are difficult to alter consistently over a
long period (Jordan and Henderson 1995), and the presence of a camera
might fade quickly from people’s awareness even if the setting is clearly
framed as being recorded (this is especially true if the camera operator is not
in the room; Jordan and Henderson 1995:55-6). Hence, reactivity decreases
over time and is less pronounced when people are involved intensely in what
they are doing.
If participants clearly react to cameras or other recording devices, the
third question is whether such reactivity might be part of the situation. For
instance, Nassauer (2016) analyzes protest events during which media,
bystanders, police, and protesters recorded videos. Some actors may not have
realized when exactly they were filmed due to dense crowds, and they may
not have adapted behaviors due to highly emotional circumstances before the
outbreaks of violent confrontations. But even those who realize they are
being filmed and adapt their behavior do not compromise data validity
because participants are usually counting on cameras to film such events.
Protesters, for example, want to make their political demands heard by a
wider public (Tilly and Tarrow 2006). Thus, reactivity is an essential part of
protests, and cameras being present and actors potentially adapting their
behaviors can constitute natural behaviors in such situations.
In Klusemann’s (2009) analysis, the TV team’s cameras were also part of
the situation. Analyzing a military negotiation, Klusemann examines highly
stylized performances from the onset. Thus, any behavior in that situation
was natural in the sense that it occurred during a high-stakes military nego-
tiation. Further, with the lives of thousands of soldiers and civilians on the
line, negotiations were too important for General Mladic and Commander
Karremans to alter their behavior for the camera in any way besides for the
sake of negotiating.
Levine et al. (2011) analyze another type of high-stakes situation: aggres-
sion and physical altercations. Protagonists fought physically, or were on the
brink of a fight, while bystanders tried to influence the situation. In such a
scenario in which bodily harm is imminent, reactivity to cameras is low.
Further, CCTV cameras are omnipresent in many parts of the United King-
dom (Temperton 2015). They often record from a distance and thus go
unnoticed by actors. Since the authors studied arguments in front of bars at
night with potentially intoxicated individuals, it is likely that actors were
either unaware of the cameras or did not care, which supports the claim of
low reactivity.
As these examples suggest, reactivity “should neither be ignored, nor
considered fatal” (Jordan and Henderson 1995:56) when studying behavior
through novel sources of visual data. To assess data validity in VDA,
researchers should reflect on the awareness of participants, the importance
of the situation, and the normality of visual recordings. A claim for behavior
being natural is supported if participants did not notice cameras, they did not
adapt their behaviors due to the high stakes of the situation or their high
involvement in activities, or reactivity is a characteristic of the type of
situation under study.
Challenges and Limitations

Like all methodological approaches, VDA has limitations and faces chal-
lenges. Beyond the need to conduct research diligently and apply the
approach to research questions it was developed to address, we discuss four
limitations and challenges specific to VDA.
Limited Access to Private Events

VDA requires people to film or photograph a situation or event, implying that
public events are easier to study than private events. During some types of
events, such as private funerals in Western societies, it is very uncommon to
film. Hence, while other data types such as interviews can shed light on very
private events, visual data on such situations are harder to acquire. Yet social
change appears to partly negate this limitation. The inhibition threshold to
film all types of events seems to be decreasing rapidly; people increasingly
capture behaviors during disasters (e.g., tsunamis or earthquakes), very pri-
vate events (e.g., sexual encounters or giving birth), or incriminating events
(e.g., battery or rape). In 2017, several individuals even filmed and
live-streamed themselves committing murder. Thus, VDA might be unsuitable for studying
some types of situations that are not normally recorded, but it allows tapping
the parallel developments of camera proliferation and user-driven content
creation on social media platforms to study an ever-widening array of social
events and situations.
Tacit Knowledge and Immersion in the Field

A second drawback can be that analyzing videos does not allow developing
tacit knowledge that can enrich ethnographic studies. Studying data on a
situation is different from being in the situation, where noises and smells
might contribute to shaping atmosphere. Above, we discussed the advantages
that VDA offers because of this difference—not being caught up in the
situation’s dynamic or potential emotions, being able to analyze all details,
and sharing data with colleagues and readers. But depending on the research
question, it can be advantageous to use oneself as a research instrument by
reflecting on when one feels awkward or at ease, happy or sad, calm or
agitated, and safe or threatened during a situation. This aspect further
illustrates the potential of triangulating methods. Combining VDA with
participant observations, interviews, or other data can strengthen insights
derived from the methodological approach.
Interpretation of Visual Data

During data analysis in VDA, some aspects analyzed may be especially
difficult to interpret reliably. For instance, gestures are context dependent
regarding culture, region, or peer group and may be misinterpreted by
researchers unfamiliar with such contexts (Stivers and Sidnell 2005:5).
Emotion expressions may be unconnected to a prior action despite spatial
and temporal proximities and may be misinterpreted if not analyzed as part
of a larger sequence of (inter)actions or if the cultural setting is unfamiliar
to the researcher (e.g., Ekman 2003:4, 217; Elfenbein and Ambady 2003;
von Scheve 2012). Missing information, especially on the social context of
situations, may undermine the validity of an analysis. Lastly, researchers
may tend to ascribe intention to behaviors without indicators from the
visual data.
While no method is immune to such or other forms of difficulties in
interpretation, VDA allows hedging against these dangers. One tool we
discussed in this article is triangulation. Triangulation may safeguard against
missing information, can be vital to confirm emotional states, and provides
evidence on how people interpreted a situation (see Nassauer 2016, 2018b).
Depending on the research question and cases, it may make sense to limit
oneself to culturally familiar contexts to avoid misinterpreting gestures or
specific emotion expression (if these are of particular interest to the research
question). Another tool could be testing intercoder reliability to help
safeguard against misinterpretations of gestures or emotion expressions.
To avoid ascribing intention to behaviors, researchers should reflect on
whether they code what they see (e.g., a behavior, like person A running
off from where person B is standing) or whether their codes imply mean-
ings that the researcher may ascribe to a behavior, without supporting
evidence (e.g., coding “running” vs. “trying to escape”). Again, triangula-
tion with other data providing evidence on the meaning of actions (such as
interviews) may be useful.
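
The intercoder reliability check mentioned above can be illustrated with a short sketch. The Python example below is not taken from any of the studies discussed; it assumes two coders who each assign exactly one behavioral code to every video segment, and it computes Cohen's kappa, one common chance-corrected agreement statistic chosen here purely for illustration. The function name, the code labels, and the segment data are hypothetical.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who assigned one code per segment."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of segments.")
    n = len(codes_a)
    # Share of segments on which both coders chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical behavioral codes for six video segments (assumed data).
coder_1 = ["running", "standing", "running", "pushing", "standing", "running"]
coder_2 = ["running", "standing", "pushing", "pushing", "standing", "running"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In such a check, codes that describe visible behavior (e.g., "running") would be expected to yield higher agreement than codes that ascribe intention (e.g., "trying to escape"); markedly lower agreement on a code may indicate that it carries interpretation the footage does not support.
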
Table 1. Video Data Analysis as a Methodological Frame.

Research agenda
  Analyzes situational dynamics:
    Focus on natural behavior.
    Behavior, emotions, and interactions.
    Tracing situations or events step-by-step to identify patterns that help explain outcomes.
Unit of analysis
  Focus on the depicted:
    Raw content of visual data: objects, persons, surroundings, and the way they interact.
  Analytic potential:
    Possibility to analyze situations frame by frame.
    Increase reliability by involving several researchers in analysis of the exact same primary data.
    Studying rare events through video footage and photographs.
Analytic tool kit
  Analytic dimensions:
    Face and body: movement of mouth and jaw, eye region, forehead; body postures, head, hands, and neck.
    Interactions: movement in space, using items, engaging others physically, gestures, and verbal communication.
    Context: physical and social.
  Analytic procedures:
    Coding situations.
    Reconstructing processes and events.
    Identifying causal links.
Criteria for validity
  Optimal capture:
    Optimal capture of duration, space, and participants.
    Assuring sufficient quality of recording.
    Triangulation as a possible strategy.
  Natural behavior:
    Reactivity as main challenge: possibility that actors adapt their behavior due to the presence of a recording device.
    Three key questions to assess reactivity: Were participants aware? Did they possibly adapt behavior? Could adaptation introduce bias into data?

Lastly, it is always advisable to reflect on alternative explanations, both
in terms of situational dynamics and in terms of other explanations (e.g.,
background factors). Scholars should try to develop such alternative expla-
nations as thoroughly and convincingly as their actual explanation and pit
them against each other (Nassauer 2018b; for an example of this technique
outside VDA, see Gould 1999).
Ethical Concerns
Some VDA studies access visual data through social media platforms. Since this
is a new approach to data collection, there is less precedent regarding research
ethics. Issues include anonymity and confidentiality, difficulties in asking for
informed consent, and ethical questions arising from potential analyses of visual
data that show victims of crimes or personal tragedies. Regulations for research
on human subjects vary greatly among countries, but discussions of research
ethics often involve some form of cost–benefit analysis—what are the potential
benefits of the research, and what are the possible costs for the subjects? Costs
for subjects should be minimized, and a study’s potential benefits should clearly
outweigh costs that remain. In the case of visual data recorded by a researcher,
such an analysis follows the same logic as with any other research project. For
studies using visual data found online or acquired through third parties (e.g., the
police or other government agencies), the calculus may differ. For example, if
visual data are accessed through social media platforms, where they have been
uploaded for the entire Internet to see, researchers need to assess whether there
would be further harm to subjects depicted in the data if those data were used for
analysis, and what it would mean to make these data available for the reader.
Future discussions on research ethics must address such issues (Legewie and
Nassauer 2018).
Conclusion
This article constructs a methodological frame for using visual data to cap-
ture situations as they occur in real life, what we call VDA. VDA focuses on
analysis of video and other visual data of a variety of real-life behaviors,
emotions, and interactions in situational dynamics. As Table 1 summarizes,
we present an analytic tool kit for VDA including analytic dimensions and
procedures. We argue for optimal capture and natural behaviors recorded in
found material as criteria for data validity. To illustrate these points, we
examine three disparate VDA applications from one empirical subfield—
violence research in sociology and psychology. This article enhances effi-
ciency and comparability of studies, and communication among disciplines
that use VDA. It can serve as a point of reference for current and future
practitioners, reviewers, and interested readers.
The proliferation of publicly available visual data of naturally occurring
situations increases opportunities for grounding social scientific analyses in
these data. Profiting from this surge in data, VDA is likely to be used broadly
to study human behavior in various disciplines such as sociology,
psychology, education, criminology, linguistics, and anthropology. The
approach provides researchers with the opportunity to examine the essence
of social situations and study if there is causality at the microlevel (Collins
2016b). The potential of using visual data also mitigates the disappearance of
actual behavior (i.e., natural social behaviors) from behavioral studies (Bau-
meister et al. 2007). With the amount of data available online, VDA is at the
forefront of open science, enabling researchers to provide direct online
access to all data analyzed (see e.g., Nassauer 2018a) and allowing research-
ers and interested readers worldwide to reproduce and verify findings.
With technology making real-life interactions more available than ever
before, we might not only be “entering the Golden Age of visual socio-
logy” (Collins 2016a:93) but of situational analysis more broadly. As
data and studies proliferate, it will be important to refine VDA as a
methodological frame and further develop analytic techniques. Through
a discussion of the general approach, analytic tools, and validity criteria,
we aim to contribute to this effort. The near future will likely see VDA
grow into a hallmark approach, with a fully specified set of procedures
for data collection and analysis. Such an approach would open new
avenues to using visual data for studying situational dynamics in social
science research and beyond.
Acknowledgments
The authors would like to express their gratitude to the following researchers for
their feedback on the article (in alphabetical order): Isabel Bramsen, Randall Col-
lins, Anette Fasang, Heiner Legewie, Christian von Scheve, Harald Wenzel, and to
the participants of the video analysis workshop in Copenhagen (in particular Wim
Bernasco, Marie Bruvik Heinskou, Mark Levine, Marie Lindegaard, Lasse Liebst,
Richard Philpot, and Poul Poder) and participants at the American Sociological
Association Meeting, Seattle 2016, Section “Methodology—Advancement in
Observing and Modeling Social Processes.” Further, we would like to thank the
anonymous reviewers for their thoughtful comments and suggestions.
Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or
publication of this article.
Notes
1. More precisely, visual data can be defined as any data that confer its information
primarily through still or moving images, with or without sound, as opposed to
written or spoken text, or numeric data.
2. A full overview of studies that employ visuals to examine behavior lies beyond
the scope of this article. For an overview, see Erickson (2011) among others.
3. They thereby follow in the tradition of systematic social observations, established
by Reiss (1971). See also Mastrofski et al. (2010).
4. Other analytic strategies resemble what we call video data analysis (VDA).
Examples include interaction analysis (Jordan and Henderson 1995) and human
ethology (Tinbergen 1963; P. Verbeek 2008). The four approaches mentioned
above best illustrate the overlaps between VDA and established approaches, and
new possibilities that VDA offers for researchers interested in studying situa-
tional dynamics.
5. A drastic example is the terrorist attack on the Westgate Mall in Nairobi, Kenya,
in 2013. The attack lasted four days and left 67 dead and 175 wounded. Shoppers
who managed to flee Westgate Mall while the attack was underway told a
coherent story to media and security forces about the attackers who were still
killing people and keeping hostages inside: There were 10–15 attackers, multi-
ethnic, one of them a British woman, and one of them fled with escaping shop-
pers. Although media reported this firsthand information for weeks, evaluations
of CCTV footage later showed that it was wildly inaccurate. Although shoppers
had a strong interest in helping police end the attack (some had friends or family
still in the mall), most information that victims and onlookers provided was
incorrect. Four attackers stormed the mall, all were Somali, all were male, and
none fled (Gatehouse 2013; Krulwich and Abumrad 2015).
6. Immersive virtual environment technologies offer another route to studying rare
events. For instance, Drury and his colleagues (2009) study cooperation and com-
petition during emergency evacuations using immersive laboratory simulations.
7. In laboratory experiments used in many psychological studies, it is possible to
add another dimension—physiological processes (e.g., Carstensen et al. 1995;
Gross and Levenson 1993). Aspects of interest include cardiac activity, such as
heart rate or pulse transmission timed to the finger and ear; vascular activity, such
as finger temperature or finger pulse amplitude; electrodermal measures, such as
skin conductance; respiratory measures, such as respiration period or respiration
depth; and general somatic activity (Gross and Levenson 1993:973). Since VDA
focuses on natural behaviors in everyday social settings, we do not include such
physiological processes as indicators because they can usually not be measured
while guaranteeing natural behavior.
8. For example, the author describes how at a police roadblock, protesters at the
back of the march pushed protesters at the front into the line of police officers.
Officers pushed back but immediately raised their hands up high, palms facing
protesters, in a gesture of peaceful intent (Nassauer 2016:8).
9. Gibson (2012) focuses on audio data to analyze situational dynamics, examining
microdynamics of the 1962 Cuban Missile Crisis using an audiotape of President
Kennedy’s crisis group.
10. In Levine et al.’s (2011) study, focusing on cases with available police reports
would have meant including only cases after which people involved the police.
Hence, many aggressive but nonviolent situations would likely have been
excluded. Sample size and available information on the context therefore involve
a trade-off. In such cases, researchers must decide what is more important to
answering the research question.
11. In this sense, police operations also increasingly rely on some of the tools we
collected in VDA. For instance, in 2017, Manchester police set up a portal after
the Manchester Arena terror attack for the public to upload videos and pictures
that could help in the reconstruction and analysis of what happened.
12. Before being able to reconstruct sequences of interactions during a protest, the
author needed to connect the data and locate each visual datum in space and time.
Some data contained contextual information (e.g., news coverage of a protest
event), showing when and where an interaction occurred. Other data did not
contain such information and thus required detailed investigations to pinpoint
when and where a recorded interaction occurred. Nassauer (2016:4) employs
documented protest routes, police radio logs, and Google Maps Street View to
locate where on the protest route a filmed intersection was located (e.g., taking a
virtual walk along the protest route with Google Maps Street View to locate a
building in front of which an interaction was filmed). She thereby arranged
approximately 50 visual data and 50 document data per protest into a tempospa-
tial matrix to achieve optimal coverage of the event, with different data pieces
covering each relevant scene.
13. The challenges with assessing reliability during VDA are those common to other
types of data (e.g., clear conceptualizations, transparent coding, and complete
reconstructions) and are not addressed in detail in this article. For discussions of
reliability, see George and Bennett (2005:70) and King, Keohane, and Verba
(1994:26).
14. One drawback in their video data is that interactions might have started earlier
than the fight captured (e.g., protagonists might have had prior encounters or
arguments at a pub, or might be acquainted or even friends). Here, triangulation
with other data types such as court files could complement visual data and
provide missing information where necessary.
15. Protest marches are not unique in their demands on data. Similar events include
sporting events and mass panics.
16. This issue is also a critical challenge for data validity in participant observation
(see Duneier 2000:339f; LeCompte and Goetz 2007) and has analogous problems
in social desirability and interviewer effects during in-depth and face-to-face
survey interviews (Krumpal 2013; West and Blom 2016).
17. Recorded data can also show scripted behaviors, which refer to actions that are
decided on prior to the situation (e.g., when people use public spaces for a flash
mob). Not recognizing such situations as scripted undermines data validity.
Assessing scriptedness is analogous to assessing reactivity; if stakes are high
(e.g., during a mass panic), scripted behaviors are unlikely. Scripted behaviors
are not a problem if they are part of analysis (e.g., analyzing the influence of the
New Zealand national rugby team’s [scripted] traditional Haka war dance on the
opposing teams’ displays of emotions before and during a match or visual record-
ings of reactions to [scripted] behaviors during a Garfinkelian breaching experi-
ment [Garfinkel 1984]).
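
The tempospatial matrix described in Note 12 can be sketched in a few lines of Python. The example is an illustration under stated assumptions rather than the procedure used in the cited study: it assumes each data piece has already been located in space and time, and it represents a piece as a record with a hypothetical identifier, a location label, and start and end times in minutes since the event began. All names, field labels, and values are invented for illustration.

```python
from collections import defaultdict


def build_matrix(records, slot_minutes=10):
    """Map (location, time slot) to the data pieces covering that cell.

    Each record needs 'id', 'location', 'start_min', and 'end_min'
    (end exclusive, and larger than start).
    """
    matrix = defaultdict(list)
    for rec in records:
        first = rec["start_min"] // slot_minutes
        last = (rec["end_min"] - 1) // slot_minutes
        for slot in range(first, last + 1):
            matrix[(rec["location"], slot)].append(rec["id"])
    return matrix


def uncovered_cells(matrix, locations, n_slots):
    """List (location, slot) cells for which no data piece is available."""
    return [
        (loc, slot)
        for loc in locations
        for slot in range(n_slots)
        if not matrix.get((loc, slot))
    ]


# Hypothetical visual and document data for one event (assumed values).
records = [
    {"id": "video_01", "location": "intersection_A", "start_min": 0, "end_min": 12},
    {"id": "police_log_03", "location": "intersection_A", "start_min": 10, "end_min": 20},
    {"id": "video_07", "location": "bridge_B", "start_min": 5, "end_min": 10},
]
matrix = build_matrix(records, slot_minutes=10)
gaps = uncovered_cells(matrix, ["intersection_A", "bridge_B"], n_slots=2)
print(gaps)
```

Cells covered by several independent pieces offer a basis for triangulation, while the listed gaps show where additional data would be needed to approach optimal capture of the event.
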
References
Abuarafah, Adnan Ghazi, Mohamed Osama Khozium, and Essam AbdRabou. 2012.
“Real-time Crowd Monitoring Using Infrared Thermal Video Sequences.” Jour-
nal of American Science 8(3):133-40.
Aisenbrey, Silke and Anette E. Fasang. 2010. “New Life for Old Ideas: The ‘Second
Wave’ of Sequence Analysis Bringing the ‘Course’ Back into the Life Course.”
Sociological Methods & Research 38(3):420-62.
Andersson, Emilia and Gard Ove Sørvik. 2013. “Reality Lost? Re-use of Qualitative
Data in Classroom Video Studies.” Forum Qualitative Sozialforschung/Forum:
Qualitative Social Research 14(3). Retrieved April 16, 2018
(http://www.qualitative-research.net/index.php/fqs/article/view/1941/3558).
Baumeister, Roy F., Kathleen D. Vohs, and David C. Funder. 2007. “Psychology as
the Science of Self-reports and Finger Movements: Whatever Happened to Actual
Behavior?” Perspectives on Psychological Science 2(4):396-403.
Becker, Howard Saul. 1986. Doing Things Together: Selected Papers. Evanston, IL:
Northwestern University Press.
Bernard, H. Russell, Peter Killworth, David Kronenfeld, and Lee Sailer. 1984. “The
Problem of Informant Accuracy: The Validity of Retrospective Data.” Annual
Review of Anthropology 13(1):495-517.
Birdwhistell, Ray. 1970. Kinesics and Context: Essays on Body Motion Communi-
cation. Philadelphia: University of Pennsylvania Press.
Blatter, Joachim. 2012. “Taking Terminology and Timing Seriously: Ontological and
Epistemological Foundations of Causal-process Tracing: Configurational
Thinking and Timing.” ECPR Joint Sessions, Antwerpen. Retrieved March 30,
2018
(https://www.unilu.ch/fileadmin/shared/Publikationen/blatter_taking-terminology-and-timing-seriously-ecpr-antwerp.pdf).
Brady, Henry E. 2008. “Causation and Explanation in Social Science.” Pp.
217-70 in The Oxford Handbook of Political Methodology, edited by J. M.
Box-Steffensmeier, H. E. Brady, and D. Collier. New York: Oxford University
Press.
Bramsen, Isabel. 2018. “How Violence Happens (or not): Situational Conditions of
Violence and Nonviolence in Bahrain, Tunisia and Syria.” Psychology of Violence
8(3):305-15.
Burtscher, Michael J., Johannes Wacker, Gudela Grote, and Tanja Manser. 2010.
“Managing Nonroutine Events in Anesthesia: The Role of Adaptive Coordina-
tion.” Human Factors: The Journal of the Human Factors and Ergonomics Soci-
ety 52(2):282-94.
Caldwell, Kay and Anita Atwal. 2005. “Non-participant Observation: Using
Video Tapes to Collect Data in Nursing Research.” Nurse Researcher 13(2):
42-54.
Callon, Michel. 1986. “Some Elements of a Sociology of Translation: Domestication
of the Scallops and the Fishermen of Saint Brieuc Bay.” Pp. 196-233 in Power,
Action and Belief: A New Sociology of Knowledge? Vol. 32, Sociological Review
Monograph, edited by J. Law. London, England: Routledge.
Carstensen, Laura L., John M. Gottman, and Robert W. Levenson. 1995. “Emotional
Behavior in Long-term Marriage.” Psychology and Aging 10(1):140-49.
Clay-Warner, Jody and Dawn T. Robinson. 2014. “Infrared Thermography as a
Measure of Emotion Response.” Emotion Review 7(2):157-62.
Collins, Randall. 1993. “Emotional Energy as the Common Denominator of Rational
Action.” Rationality and Society 5(2):203-30.
Collins, Randall. 2008. Violence: A Micro-sociological Theory. Princeton, NJ: Prin-
ceton University Press.
Collins, Randall. 2009. “The Micro-sociology of Violence.” The British Journal of
Sociology 60(3):566-76.
Collins, Randall. 2014. “Micro-sociology of Mass Rampage Killings.” Revue de
Synthèse 135(4):405-20.
Collins, Randall. 2016a. “On Visual Methods and the Growth of Micro-interactional
Sociology: Interview to Professor Randall Collins (with Uliano Conti).” Pp. 81-94
in Lo spazio del visuale. Manuale sull’utilizzo dell’immagine nella ricerca
sociale, edited by U. Conti. Rome, Italy: Armando Editore.
Collins, Randall. 2016b. “The Sociological Eye: What Has Micro-sociology
Accomplished?” Retrieved May 5, 2016
(http://sociological-eye.blogspot.com/2016/04/what-has-micro-sociology-accomplished.html).
Drury, John, C. Cocking, S. Reicher, A. Burton, D. Schofield, A. Hardwick, D.
Graham, and P. Langston. 2009. “Cooperation versus Competition in a Mass
Emergency Evacuation: A New Laboratory Simulation and a New Theoretical
Model.” Behavior Research Methods 41(3):957-70.
Duneier, Mitchell. 2000. Sidewalk. 1st ed. New York: Farrar, Straus and Giroux.
Ekman, Paul. 2003. Emotions Revealed: Recognizing Faces and Feelings to Improve
Communication and Emotional Life. New York: St. Martin’s Griffin.
Ekman, Paul, Wallace F. Friesen, and Phoebe Ellsworth. 1972. Emotion in the Human
Face. New York: Pergamon.
Elfenbein, Hillary Anger and Nalini Ambady. 2003. “Universals and Cultural Dif-
ferences in Recognizing Emotions.” Current Directions in Psychological Science
12(5):159-64.
Erickson, Frederick. 2011. “Uses of Video in Social Research: A Brief History.”
International Journal of Social Research Methodology 14(3):179-89.
Exline, Ralph V. and B. J. Fehr. 1982. “The Assessment of Gaze.” Pp. 91-135 in
Handbook of Methods in Nonverbal Behavior Research, edited by K. R.
Scherer and P. Ekman. Cambridge, England: Cambridge University
Press.
Fele, Giolo. 2008. “The Collaborative Production of Responses and Dispatching on
the Radio: Video Analysis in a Medical Emergency Call Center.” Forum
Qualitative Sozialforschung/Forum: Qualitative Social Research 9(3). Retrieved
April 16, 2018
(http://www.qualitative-research.net/index.php/fqs/article/viewArticle/1175/2616).
Gao, Yuan, Hong Liu, Xiaohu Sun, Can Wang, and Yi Liu. 2016. “Violence
Detection Using Oriented Violent Flows.” Image and Vision Computing
48-49:37-41.
Garfinkel, Harold. 1984. Studies in Ethnomethodology. Cambridge, England: Wiley.
Garfinkel, Harold. 2005. “A Conception of and Experiments with ‘Trust’ as a Con-
dition of Concerted Stable Actions.” Pp. 370-80 in The Production of Reality:
Essays and Readings on Social Interaction, edited by J. A. O’Brien. Thousand
Oaks, CA: Sage.
Gatehouse, Gabriel. 2013. “Kenya Military Names Westgate Mall Attack Suspects.”
BBC News, May 10. Retrieved May 15, 2016
(http://www.bbc.com/news/world-africa-24412315).
George, Alexander L. and Andrew Bennett. 2005. Case Studies and Theory
Development in the Social Sciences. Cambridge, MA: The MIT Press.
Gibson, David R. 2012. Talk at the Brink: Deliberation and Decision during the
Cuban Missile Crisis. Princeton, NJ: Princeton University Press.
Goertz, Gary and Jack S. Levy. 2007. Explaining War and Peace: Case Studies and
Necessary Condition Counterfactuals. New York: Routledge.
Goffman, Erving. 1959. The Presentation of Self in Everyday Life. New York: Anchor
Books.
Goffman, Erving. 1982. Interaction Ritual: Essays on Face-to-face Behavior. New
York: Pantheon.
Gould, Roger V. 1999. “Collective Violence and Group Solidarity: Evidence from a
Feuding Society.” American Sociological Review 64(3):356-80.
Grady, John. 2008. “Visual Research at the Crossroads.” Forum Qualitative Sozial-
forschung/Forum: Qualitative Social Research 9(3). Accessed April 16, 2018.
http://www.qualitative-research.net/index.php/fqs/article/view/1173/2618.
Gross, James J. and Robert W. Levenson. 1993. “Emotional Suppression: Physiology,
Self-report, and Expressive Behavior.” Journal of Personality and Social Psychol-
ogy 64(6):970-86.
Gross, James J. and Robert W. Levenson. 1997. “Hiding Feelings: The Acute Effects
of Inhibiting Negative and Positive Emotion.” Journal of Abnormal Psychology
106(1):95-103.
Harper, Douglas. 1988. “Visual Sociology: Expanding Sociological Vision.” The
American Sociologist 19(1):54-70.
Harper, Douglas. 2002. “Talking about Pictures: A Case for Photo Elicitation.” Visual
Studies 17(1):13-26.
Harrigan, Jinni A., Robert Rosenthal, and Klaus R. Scherer, eds. 2008. The New
Handbook of Methods in Nonverbal Behavior Research. Paris, France: Cambridge
University Press.
Heath, Christian, Jon Hindmarsh, and Paul Luff. 2010. Video in Qualitative
Research: Analysing Social Interaction in Everyday Life. Los Angeles, CA: Sage.
Humphreys, Laud and Lee Rainwater. 1975. Tearoom Trade: Impersonal Sex in
Public Places. 2nd ed. New York: Aldine Transaction.
Isaacowitz, Derek M., Heather A. Wadlinger, Deborah Goren, and Hugh R.
Wilson. 2006. “Selective Preference in Visual Fixation away from Negative
Images in Old Age? An Eye-tracking Study.” Psychology and Aging 21(1):
40-48.
Jordan, Brigitte and Austin Henderson. 1995. “Interaction Analysis: Foundations and
Practice.” The Journal of the Learning Sciences 4(1):39-103.
Juslin, Patrik N. and Klaus R. Scherer. 2008. “Vocal Expression of Affect.” Pp.
65-136 in The New Handbook of Methods in Nonverbal Behavior Research, edited
by J. A. Harrigan, R. Rosenthal, and K. R. Scherer. Paris, France: Cambridge
University Press.
Katz, Jack. 2001. How Emotions Work. New ed. Chicago, IL: University of Chicago
Press.
Kendon, Adam. 1990. Conducting Interaction: Patterns of Behavior in Focused
Encounters. New York: Cambridge University Press.
King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry.
Princeton, NJ: Princeton University Press.
Kissmann, Ulrike Tikvah. 2009. Video Interaction Analysis. Frankfurt am Main,
Germany: Peter Lang.
Klusemann, Stefan. 2009. “Atrocities and Confrontational Tension.” Frontiers in
Behavioral Neuroscience 3(42):1-10.
Knoblauch, Hubert, Bernt Schnettler, Jürgen Raab, and Hans-Georg Soeffner, eds.
2006. Video Analysis: Methodology and Methods: Qualitative Audiovisual Data
Analysis in Sociology. Frankfurt am Main, Germany: Peter Lang.
Konecki, Krzysztof T. 2008. “Touching and Gesture Exchange as an Element of
Emotional Bond Construction. Application of Visual Sociology in the
Research on Interaction between Humans and Animals.” Forum Qualitative
Sozialforschung/Forum: Qualitative Social Research 9(3). Retrieved November 7,
2015 (http://www.qualitative-research.net/index.php/fqs/article/view/1154).
Konecki, Krzysztof T. 2016. Is the Body the Temple of the Soul? Modern Yoga
Practice as a Psychosocial Phenomenon. Łódź, Poland: Jagiellonian University
Press.
Krulwich, Robert and Jad Abumrad. 2015. “Outside Westgate.” Retrieved May 15,
2016 (http://www.radiolab.org/story/outside-westgate/).
Krumpal, Ivar. 2013. “Determinants of Social Desirability Bias in Sensitive Surveys:
A Literature Review.” Quality & Quantity 47(4):2025-47.
Laitin, David D. 2008. “Confronting Violence Face to Face.” Science 320(5872):
51-52.
Latour, Bruno. 1993. The Pasteurization of France. London, England: Harvard Uni-
versity Press.
Law, John. 1992. “Notes on the Theory of the Actor-network: Ordering, Strategy, and
Heterogeneity.” Systemic Practice and Action Research 5(4):379-93.
Law, John. 2009. “Actor Network Theory and Material Semiotics.” Pp. 141-58 in
Social Theory, edited by B. S. Turner. Chichester, England: Wiley-Blackwell.
Legewie, Nicolas M. and Anne Nassauer. 2018. “YouTube, Google, Facebook: 21st
Century Online Video Research and Research Ethics.” Forum Qualitative
Sozialforschung/Forum: Qualitative Social Research.
LeCompte, Margaret D. and Judith Preissle Goetz. 2007. “Problems of Reliability and
Validity in Ethnographic Research.” Pp. 3-39 in Qualitative Research Volume
Two. Quality Issues in Qualitative Research, Sage Benchmarks in Social Research
Methods, edited by A. Bryman. Thousand Oaks, CA: Sage.
Levenson, Robert W. and John M. Gottman. 1985. “Physiological and Affective
Predictors of Change in Relationship Satisfaction.” Journal of Personality and
Social Psychology 49(1):85-94.
Levine, Mark, Paul J. Taylor, and Rachel Best. 2011. “Third Parties, Violence, and
Conflict Resolution: The Role of Group Size and Collective Action in the Micro-
regulation of Violence.” Psychological Science 22(3):406-12.
Lindegaard, M. R. and W. Bernasco, eds. 2018. “Crime Caught on Camera.” [Special
issue]. Journal of Research in Crime and Delinquency 55(1):3-186.
Lipinski, David and Rosemery Nelson. 1974. “Problems in the Use of Naturalistic
Observation as a Means of Behavioral Assessment.” Behavior Therapy 5(3):
341-51.
Lomax, Helen and Neil Casey. 1998. “Recording Social Life: Reflexivity and Video
Methodology.” Sociological Research Online. Retrieved May 27, 2016
(http://www.socresonline.org.uk/3/lomax/lomax_doc.html).
MacIndoe, H. and A. Abbott. 2004. “Sequence Analysis and Optimal Matching
Techniques for Social Science Data.” Pp. 387-405 in Handbook of Data Analysis,
edited by M. Hardy and A. Bryman. London, England: Sage.
Mackenzie, C. F. and Y. Xiao. 2003. “Video Techniques and Data Compared with
Observation in Emergency Trauma Care.” Quality and Safety in Health Care
12(Suppl 2):ii51-ii57.
Mahoney, James. 2012. “The Logic of Process Tracing Tests in the Social Sciences.”
Sociological Methods & Research 41(4):570-97.
Margolis, Eric and Luc Pauwels. 2011. The SAGE Handbook of Visual Research
Methods. Los Angeles, CA: Sage.
Mastrofski, Stephen D., Roger B. Parks, and John D. McCluskey. 2010. “Systematic
Social Observation in Criminology.” Pp. 225-47 in Handbook of Quantitative
Criminology, edited by A. R. Piquero and D. Weisburd. New York: Springer.
Mauss, Marcel. 1923. “Essai Sur Le Don Forme Et Raison De L’Échange Dans Les
Sociétés Archaïques.” L’Année Sociologique 1:30-186.
McConnell, Fred. 2015. “YouTube is 10 Years Old: The Evolution of Online Video.”
The Guardian, February 13. Retrieved January 10, 2018
(https://www.theguardian.com/technology/2015/feb/13/youtube-10-years-old-evolution-of-online-video?CMP=fb_gu).
Mead, Margaret and Gregory Bateson. 1942. Balinese Character, a Photographic
Analysis. New York: New York Academy of Sciences.
Mondada, Lorenza. 2008. “Using Video for a Sequential and Multimodal Analysis of
Social Interaction: Videotaping Institutional Telephone Calls.” Forum Qualitative
Sozialforschung/Forum: Qualitative Social Research 9(3). Retrieved April 16, 2018
(http://www.qualitative-research.net/index.php/fqs/article/viewArticle/1161/2566).
Nassauer, Anne. 2015. “Effective Crowd Policing: Empirical Insights on Avoiding
Violence.” Policing: An International Journal of Police Strategies & Manage-
ment 38(1):3-23.
Nassauer, Anne. 2016. “From Peaceful Marches to Violent Clashes: A
Micro-situational Analysis.” Social Movement Studies 15(5):1-16.
Nassauer, Anne. 2018a. “How Robberies Succeed or Fail: Analyzing Crime Caught
on CCTV.” Journal of Research in Crime and Delinquency 55(1):125-54.
Nassauer, Anne. 2018b. “Situational Dynamics and the Emergence of Violence dur-
ing Protests.” Psychology of Violence 8(3):293-304.
Nievas, Enrique Bermejo, Oscar Deniz Suarez, Gloria Bueno García, and Rahul
Sukthankar. 2011. “Violence Detection in Video Using Computer Vision
Techniques.” Pp. 332-39 in Computer Analysis of Images and Patterns, edited by
A. Berciano, D. Díaz-Pernil, W. G. Kropatsch, H. Molina-Abril, and P. Real.
Berlin, Heidelberg: Springer.
Norris, Sigrid. 2004. Analyzing Multimodal Interaction: A Methodological Frame-
work. London, England: Routledge.
Pauwels, Luc. 2010. “Visual Sociology Reframed: An Analytical Synthesis and Dis-
cussion of Visual Methods in Social and Cultural Research.” Sociological Meth-
ods & Research 38(4):545-81.
Pauwels, Luc. 2011. “An Integrated Conceptual Framework for Visual Social
Research.” Pp. 3-23 in The SAGE Handbook of Visual Research Methods, edited
by E. Margolis and L. Pauwels. Los Angeles, CA: Sage.
Pink, Sarah. 2006. “Photography.” Pp. 222-24 in The SAGE Dictionary of Social
Research Methods, edited by V. Jupp. Thousand Oaks, CA: Sage.
Pruzan, Katherine and Derek M. Isaacowitz. 2006. “An Attentional Application of
Socioemotional Selectivity Theory in College Students.” Social Development
15(2):326-38.
Reichertz, Jo and Gregor Schöner. 2014. “Emotion. Eskalation. Gewalt. Entwicklung
Eines Video-basierten Verfahrens Zur Früherkennung von Emotionsprozessen
Bei Großveranstaltungen.” Retrieved May 10, 2017 (http://gepris.dfg.de/gepris/
projekt/258979342).
Reiss, Albert J. 1971. “Systematic Observation of Natural Social Phenomena.” Socio-
logical Methodology 3:3-33.
Ribeiro, Pedro Canotilho, Romaric Audigier, and Quoc Cuong Pham. 2016.
“RIMOC, a Feature to Discriminate Unstructured Motions: Application to Vio-
lence Detection for Video-surveillance.” Computer Vision and Image Under-
standing 144:121-43.
Sacks, Harvey, Emmanuel A. Schegloff, and Gail Jefferson. 1974. “A Simplest
Systematics for the Organization of Turn-taking for Conversation.” Language
50(4):696-735.
Saylor, Ryan. 2013. “Concepts, Measures, and Measuring Well: An Alternative
Outlook.” Sociological Methods & Research 42(3):354-91.
Scherer, Klaus R. and Paul Ekman, eds. 1982. Handbook of Methods in Nonverbal
Behavior Research (Studies in Emotion and Social Interaction). Cambridge,
England: Cambridge University Press.
Stivers, Tanya and Jack Sidnell. 2005. “Introduction: Multimodal Interaction.”
Semiotica 2005(156):1-20.
Strauss, Anselm L. and Juliet M. Corbin. 1998. Basics of Qualitative Research:
Grounded Theory Procedures and Techniques. 2nd ed. Thousand Oaks, CA:
Sage.
Suchar, Charles S. 1997. “Grounding Visual Sociology Research in Shooting
Scripts.” Qualitative Sociology 20(1):33-55.
Temperton, James. 2015. “One Nation under CCTV: The Future of Automated Sur-
veillance.” Wired UK, August 17. Retrieved May 5, 2016 (http://www.wired.co
.uk/article/one-nation-under-cctv).
Tilly, Charles and Sidney G. Tarrow. 2006. Contentious Politics. Boulder, CO:
Paradigm.
Tinbergen, N. 1963. “On Aims and Methods of Ethology.” Zeitschrift Für Tierpsy-
chologie 20(4):410-33.
Tuma, René, Bernt Schnettler, and Hubert Knoblauch. 2013. Videographie. Einführung
in die interpretative Videoanalyse sozialer Situationen. Qualitative Sozialforschung.
Wiesbaden, Germany: Springer.
Verbeek, Marno. 2008. A Guide to Modern Econometrics. Chichester, England:
Wiley.
Verbeek, Peter. 2008. “Peace Ethology.” Behaviour 145(11):1497-1524.
von Scheve, Christian. 2012. “The Social Calibration of Emotion Expression: An
Affective Basis of Micro-social Order.” Sociological Theory 30(1):1-14.
Vrij, Aldert, Lorraine Hope, and Ronald P. Fisher. 2014. “Eliciting Reliable Infor-
mation in Investigative Interviews.” Policy Insights from the Behavioral and
Brain Sciences 1(1):129-36.
Wagner, Jon. 2002. “Contrasting Images, Complementary Trajectories: Sociology,
Visual Sociology and Visual Research.” Visual Studies 17(2):160-71.
Waldinger, Robert J., Stuart T. Hauser, Marc S. Schulz, Joseph P. Allen, and Judith A.
Crowell. 2004. “Reading Others’ Emotions: The Role of Intuitive Judgments in
Predicting Marital Satisfaction, Quality, and Stability.” Journal of Family Psy-
chology: JFP: Journal of the Division of Family Psychology of the American
Psychological Association (Division 43) 18(1):58-71.
Watson, O. Michael and Theodore D. Graves. 1966. “Quantitative Research in Proxe-
mic Behavior.” American Anthropologist 68(4):971-85.
Waxer, Peter H. 1985. “Video Ethology: Television as a Data Base for Cross-
cultural Studies in Nonverbal Displays.” Journal of Nonverbal Behavior 9(2):
111-20.
Weenink, Don. 2014. “Frenzied Attacks. A Micro-sociological Analysis of the Emotional
Dynamics of Extreme Youth Violence.” The British Journal of Sociology
65(3):411-33.
West, Brady T. and Annelies G. Blom. 2016. “Explaining Interviewer Effects: A
Research Synthesis.” Journal of Survey Statistics and Methodology. 2017:
175-211.
Author Biographies
Anne Nassauer is an assistant professor at the Department of Sociology, John F.
Kennedy Institute, Freie Universität Berlin, Germany. Her research areas include
microsociology, violence, and video data analysis. Among other issues, she is inter-
ested in collective action, qualitative research design, symbolic interaction, emotions,
and criminal behavior.
Nicolas M. Legewie is a postdoctoral researcher at the German Institute for Economic
Research (DIW Berlin). He teaches and writes about social inequality and
mobility, migration, social networks, as well as research design, digital social
science research, research ethics, and video data analysis. He is currently the
coprincipal investigator in the project “Mentoring of Refugees (MORE),” a rando-
mized controlled trial investigating the impact of a nongovernmental mentoring
program on the lives of refugees in Germany.
