Professional Documents
Culture Documents
Quantitative Me
Quantitative Me
Keywords: Multivariate analysis methods, data collection, big data, longitudinal data
1 Overview
Quantitative social research applies statistical methods to analyse observable phe-
nomena. Whereas in qualitative social research the process of data collection is
usually intertwined with theory building and data analysis (Hollstein/Kumkar,
QUALITATIVE METHODS, in this volume), in quantitative studies these steps are
carried out successively. Upon completing the theoretical work of formulating hy-
potheses (or assumptions, depending on the epistemological framework) and oper-
ationalizing concepts, the quantitative researcher defines the population to be stud-
ied, chooses a sampling method, and―in survey research only―designs the
questionnaire. These steps are followed by the actual fieldwork, in which interviews
are conducted or data is collected from other sources, such as documents, observa-
tions, experiments, and websites. Before the start of data analysis, the data have to be
cleaned, for example, by excluding records with too many missing values and
checking for possible errors and inconsistencies.
Raw data in quantitative social research can be understood as a large table of
numbers, often with hundreds of variables in columns and thousands of cases in rows.
With the aid of quantitative statistical methods, one can summarize information in the
dataset into tables, graphs, and meaningful numeric information to answer research
questions. Data analysis usually starts with univariate and bivariate descriptions,
followed by multivariate analyses.
In this review, we summarize recent contributions of authors from German-
speaking countries on the topics of multivariate statistical methods and data collec-
tion in quantitative social research. In light of the large range and variety of quanti-
OpenAccess. © 2021 Alice Barth and Jörg Blasius, published by De Gruyter. This work is licensed
under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
https://doi.org/10.1515/9783110627275-022
316 Alice Barth and Jörg Blasius
tative methods used in the social sciences, the review mainly aims to give an overview
of publications that introduce and discuss multivariate quantitative methods by using
examples from social science research. Advanced statistical discussions such as the
development of pure algorithms are beyond our scope.
(Meitinger, Braun, and Behr, 2018; Gummer, Roßmann, and Silber, 2018), and the
increasing number of respondents who complete web surveys using mobile devices
(Liebe et al., 2015; Struminskaya, Weyandt, and Bosnjak, 2015). The latter develop-
ment also facilitates the use of new data sources, such as collecting data on geolo-
cation, accelerometer, or browser history from smartphones via apps (Keusch et al.,
2019). The collection of digital traces left by humans can also take place via websites,
social media platforms, or sensors (see section 5 on big data for further discussion).
Other alternatives to survey data are data produced by processes external to scientific
research, such as administrative data (primarily collected by government entities) or
transaction data. In particular, the integration of such data with socio-demographic
characteristics or other information obtained by way of ‘classical’ surveys offers many
possibilities for social science research (Stier et al., 2019).
Another important issue that has generated a large number of discussions in the
international context is the quality of survey data (Menold and Wolbring, 2019). Data
quality depends to a great extent on the behavior of respondents, interviewers, and
survey institutes. With regard to respondents, suboptimal response behavior in
quantitative surveys is frequently discussed under the term satisficing. Roßmann
(2017) provides a comprehensive summary of the theory and state of research on
satisficing while also presenting new modeling strategies based on an empirical study.
The reader by Winker, Menold, and Porst (2013) focuses on interviewers’ deviations in
surveys. Other recent contributions to this topic are Durrant and D’Arrigo (2014) on
doorstep interactions and West et al. (2018) on conversational interviewing. A syn-
thesis of the literature on interviewer effects is provided by West and Blom (2017).
Blasius and Thiessen (2018) argue that interviewers’ falsifications are not a marginal
problem at all; even surveys such as the ESS are affected to such an extent that some
countries should be excluded from the analysis. The role of institutions in fabricating
survey data has been analyzed by Blasius and Thiessen (2012, 2015) using the World
Values Survey and data from the Program for International Student Assessment’s
(PISA) 2009 survey.
structure, which indicates the integration of latent, not directly observable variables
(e.g., the level of anomie or the intelligence quotient).
At the second level, a distinction is made as to whether the dependent variable is
categorically (including dichotomous) or continuously scaled. Furthermore, categori-
cally scaled variables are usually subdivided into ordered and unordered categorical
variables (not shown in Figure 1). Figure 1 distinguishes four groups of procedures:
‒ The dependent variable is continuously scaled: in this case researchers usually
apply different forms of regression analysis.
‒ The dependent variable is categorically scaled: most often applied in this case are
forms of logistic regression analysis, sometimes multinomial logistic regression,
and in the past discriminant analysis.
‒ The latent variable is continuously scaled: in this case scaling methods are ap-
plied, such as correspondence analysis, factor analysis, or its more sophisticated
version, structural equation modeling (often just called SEM).
‒ The latent variable is categorically scaled: in this case clustering methods and
forms of latent class analysis can be applied.
In each of these four groups, a distinction is made at the level of the independent
variables between categorically or continuously scaled variables. Another level of
distinction (not shown in Figure 1) concerns the question of causality―whether or not
the assumption is made that independent characteristics influence dependent ones.
To determine causal effects, panel data analysis or event history analysis are often
applied. The aforementioned procedures can also be combined with other methods
(e.g., when the factor scores from a scaling method are used in regression analysis).
Quantitative Methods 319
There are several variants of regression analysis. If the model mainly consists of
manifest variables on the first level (left and right upper quadrant in Figure 1) and if
320 Alice Barth and Jörg Blasius
Latent variables are not directly observed but computed on the basis of a set of
manifest variables. Clustering methods imply a latent variable that is unordered cat-
egorical, thus classifying the data into several groups. The goal of cluster analysis is to
assign each object to exactly one cluster; the clustering can be done by row (persons,
institutions, etc.) or by column (variables, variable categories, etc.). The individual
clusters should be as homogeneous as possible (i.e., the variance within the clusters
should be minimal) while the clusters should differ as much as possible (i.e., the
variance between the clusters should be maximal). In other words, cluster analysis
serves to group a large number of objects (persons or variables) together. A good in-
troduction to cluster analysis methods in the German language is provided by Bacher,
Pöge, and Wenzig (2010, third edition).
Similar to cluster analysis, latent class analysis tries to identify previously un-
observed groups based on patterns in a set of variables. In latent class analysis, the
assignment to groups is probabilistic―thus, each object or person has a certain
probability of belonging to a specific cluster. Latent class analysis is a model-based
Quantitative Methods 321
procedure, allowing tests for the goodness of fit of the solution (in particular, the
number of clusters) by a variety of fit indices mostly based on log-likelihood statistics.
While there are chapters on this method in the reader by Wolf and Best (2010), the
international discussion has almost exclusively been in English and has mostly been
advanced by authors from the Netherlands and the US. Two current empirical appli-
cations by German-speaking authors are Barth and Trübner (2018) and Grunow, Be-
gall, and Buchler (2018), who use latent class analysis to assess different patterns of
attitudes towards gender roles in Germany and Europe, respectively.
Scaling methods are used in the case of continuously scaled latent variables (usually
with a mean of zero and a standard deviation of one). If the manifest variables are
continuously scaled as well, factor analysis or principal component analysis can be
applied. A classic example is the determination of intelligence quotients. The methods
provide factor scores (i.e., all respondents receive values on the extracted latent
variables). The resulting variables can be used for further analyses, for example, in
regression models.
An advanced form of factor analysis is structural equation modeling, which en-
ables the specification of causal effects between the manifest and latent variables as
well as between latent variables. Very good and detailed introductions have been
provided by Reinecke (2014, second edition) and Urban and Mayerl (2014). A practice-
oriented introduction using Mplus has been provided by Kleinke, Schlüter, and Christ
(2017, second edition) and an introduction using R by Steinmetz (2015, second edi-
tion). A current methodological application of structural equation modeling―adapted
to multilevel data structures (see section 4)―is the assessment of measurement (non‐)
equivalence across groups, for example, in cross-cultural research (among others,
Davidov et al., 2015, 2018).
In many cases, survey data may not even presume an ordinal data level (e.g., when
assessing features of lifestyle or political preferences); in these cases, multiple cor-
respondence analysis is the most appropriate instrument for scaling the data. An
introduction into this method is given by Blasius (2001), and some recent research can
be found in Blasius and Greenacre (2014) as well as in a special issue of the Italian
Journal of Applied Statistics, edited by Balbi, Blasius, and Greenacre (2017).
4 Longitudinal Data
While the methods discussed up to this point are mainly used for the analysis of cross-
sectional data, their general ideas can be extended to the analysis of longitudinal data
such as panel data, event history data, time series data, and repeated cross-sectional
data.
322 Alice Barth and Jörg Blasius
Second, panel studies may suffer from attrition. As a result of non-contact, re-
fusal, relocations, or other reasons, respondents drop out of the sample over time.
Analogous to panel conditioning, the vital question is whether panel attrition is se-
lective and may, as a consequence, introduce bias in substantive results. The mag-
nitude and patterns of attrition differ between panel studies and countries (Behr,
Bellgardt, and Rendtel, 2005). As panel data provide information on respondents from
earlier waves, the application of longitudinal weights and the statistical modeling of
selective dropouts have proven to be effective measures to reduce bias in estimates
(Rendtel and Harms, 2009).
Another type of longitudinal data are event history data. These data are collected
historically on a daily, weekly, or monthly basis, for example, the month in which
familial or occupational status changes. Methods for analyzing patterns of events over
time include event history analysis (sometimes referred to as survival analysis) and
sequence analysis. Whereas in event history analysis the main objectives are to esti-
mate the probability of specific transitions or durations and to investigate causal re-
lationships, sequence analysis is directed at exploring, classifying, and comparing
different trajectory patterns. Blossfeld, Rohwer, and Schneider (2019, second edition)
have provided an introduction to conducting event history analysis using Stata. In
step-by-step instructions with many examples, the book covers the entire research
process from data collection to the analyses of the event history data and the inter-
pretation of the results. Its focus is mainly on parametric methods. A good overview of
recent developments in sequence analysis can be found in Brzinsky-Fay and Kohler
(2010) as well as in Aisenbrey and Fasang (2010). Among German social scientists, the
most prominent area of application of both event history and sequence analysis is life
course research, assessing, for example, the relation between unemployment and
fertility in different socio-economic groups (Kreyenfeld and Andersson, 2014) or
changes in the division of housework in married couples (Grunow, Schulz, and
Blossfeld, 2012).
Another type of longitudinal data are time series data: long (at least thirty to forty
measurement occasions), ordered data measured at equally spaced points in time, for
example, unemployment data, economic data, or election data. Although the majority
of publications come from economics, there is an increasing number of social sci-
entists working with this kind of data. While there are a number of textbooks on time
series analysis that require advanced mathematical skills, Thome (2005) designed his
textbook as an application-oriented introduction for (advanced) students, teachers,
324 Alice Barth and Jörg Blasius
and research practitioners, requiring only basic knowledge of descriptive and infer-
ential statistics. The book starts with the classical methods of breaking down time
series into different components (such as trend and cycle). In the second step, the time
series are interpreted as stochastic processes, which can be modeled in the context of
the Box–Jenkins approach; these are the so-called autoregressive integrated moving
average (ARIMA) models. Finally, time series models are expanded so that they can
also identify and represent causal relationships. The respective process steps are ex-
plained using examples from research practice. A concise and application-oriented
introduction to time series analysis using R is provided by Schlittgen and Sattarhoff
(2020). The examples used mostly stem from the field of economics, but the book is
helpful as an overview and a guide to practitioners from the social sciences as well.
5 Multilevel Data
When data are organized in a hierarchical structure (e.g., pupils nested in school
classes, their teachers nested in schools, their schools nested in counties, and counties
nested in countries), multilevel modeling can―and often should―be applied. Langer
(2009, second edition) has written a textbook on multilevel analysis that provides
examples from empirical education research, in particular the PISA studies. Alongside
a historical introduction to modeling in context analysis, the book discusses the
analysis of panel data as a multilevel structure (individuals nested in time points) and
gives detailed instructions for the empirical application of multilevel models using
SPSS and the freeware MLA.
A concise introduction in the form of a journal article is provided by Steenbergen
and Jones (2002) in the American Journal of Political Science. It discusses the logic and
the statistical background of the general two-level linear multilevel model and some
submodels and gives an example of assessing support for the European Union using a
three-level model (individuals, political parties, and countries). Two recent special
issues of the Kölner Zeitschrift für Soziologie und Sozialpsychologie discuss the theory
and methodology of multilevel analyses in regional contexts (Friedrichs and Non-
nenmacher, 2014) and in cross-national comparative research (Andreß, Fetchenhauer,
and Meulemann, 2019). They both provide a wide range of empirical examples. An
article by Schmidt-Catran and Fairbrother (2015) addresses common misspecifications
in multilevel models for the investigation of longitudinal comparative survey data.
6 Big Data
The significance of big data for social science research has been one of the most
discussed subjects in the last decade, with a strong annual increase in the last five
years. In general, data are considered “big” when they are digitally available and
exceed the capacities of conventional analysis software owing to their large volume
Quantitative Methods 325
and complexity. Recent volumes on big data in social science research share the goals
of introducing methods for analysis, providing examples of best practice, and dis-
cussing implications of big data for social science research in general. As such, they all
provide introductions to the subject of big data, albeit with different priorities.
The Handbook of Big Data, edited by Bühlmann et al. (2016), largely focuses on
analysis techniques for large-scale datasets. In twenty-four chapters, various authors
from the fields of mathematics, statistics, social sciences, and computer science dis-
cuss how datasets that challenge traditional methods of analysis by their sheer size
can be explored, visualized, and analyzed using efficient algorithms, graphs, and
machine-learning approaches. Particularly readers with a background in statistics will
profit from learning how familiar methods such as regression analysis, structural
equation modeling, algorithms based on singular value decomposition, and various
other estimation methods can be adapted to sparse and complex datasets.
The volume Computational Social Science. Die Analyse von Big Data, edited by
Behnke et al. (2018), provides a large number of examples of social science research
with big data. The handbook compiles a series of case studies using big data in po-
litical science, epistemological reflections, and introductions to various analysis
methods and software. The main focus of the book is on text mining and analysis of
textual data, for example, sentiment analysis of users’ evaluations of politicians in
online media, the classification of parliamentary discourse, or the use of Twitter data
in political communication research.
Finally, Computational Social Science in the Age of Big Data, edited by Stuetzer,
Welker, and Egger (2018), has a large focus on aspects of the research process such as
approaches to data collection, technical implementation, and data storage. Further,
the volume presents epistemological perspectives and case studies on big data, and a
tutorial section provides hands-on manuals for learning analytics and geospatial
analysis of social media data.
One variant of big data is digital, automatically collected process data. In the
context of web surveys, “collateral data” such as timestamps, keystrokes, or mouse
clicks are discussed using the term paradata. In the compendium Improving Surveys
with Paradata (2013), editor Kreuter notes in her preface that “paradata are a key
feature of the ‘big data’ revolution for survey researchers and survey methodologists.”
The volume presents a comprehensive overview of the uses of paradata, including
survey error investigation, paradata-driven adaptions, studying and improving the
response process, modeling strategies for different kinds of paradata, and the inves-
tigation of paradata quality. Although not all survey paradata amount to big data in
terms of volume, their often complex and unstructured nature poses similar chal-
lenges to analysis.
In addition, there is a wide range of substantial applications of big data research
by social scientists from German-speaking countries. Just to mention a few examples:
Schmitz et al. (2009) combine data on e-mail contacts with a survey to assess indi-
cated and revealed mate preferences in online dating; Wagner et al. (2016) assess
326 Alice Barth and Jörg Blasius
gender biases in Wikipedia; and Weller et al. (2014) edited a volume on Twitter and
Society.
The growing popularity of big data in social science research has also sparked
controversial discussions on the potential and limitations of analysis. For example,
Jungherr et al. (2017) argue that Twitter data are valid for the measurement of the
temporal dynamics of attention toward politics but regard attempts to infer public
opinion or even election results from Twitter as highly questionable. Besides the re-
alization that big data is not yet to replace “traditional” representative polls and social
surveys entirely, such controversies also point to the importance of decisions con-
cerning sampling, cleaning, and processing the data and to the need for transparently
documenting the research process. In addition, analyses using big data raise vital
questions of data protection, privacy, and informed consent (Kreuter et al., 2018;
Mühlichen, 2018).
References
Aisenbrey, S.; Fasang A. New Life for Old Ideas: The ‘Second Wave’ of Sequence Analysis Bringing
the ‘Course’ Back Into the Life Course. Sociological Methods & Research 2010, 38, 420–462.
Andreß, H.; Golsch, K.; Schmidt, A. W. Applied Panel Data Analysis for Economic and Social
Surveys; Springer: Berlin, 2013.
Andreß, H. J.; Fetchenhauer, D.; Meulemann, H., Eds. Cross-national Comparative Research.
Sonderheft der Kölner Zeitschrift für Soziologie und Sozialpsychologie; Springer VS:
Wiesbaden, 2019.
Bach, R. L.; Eckman, S. Participating in a Panel Survey Changes Respondents’ Labour Market
Behaviour. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2019, 182,
263–281.
Quantitative Methods 327
Engel, U.; Jann, B.; Lynn, P.; Scherpenzeel, A.; Sturgis, P. Eds. Improving Survey Methods. Lessons
from Recent Research; Routledge: New York, 2015.
Friedrichs, J.; Nonnenmacher, A. Eds. Soziale Kontexte und Soziale Mechanismen. Sonderheft der
Kölner Zeitschrift für Soziologie und Sozialpsychologie; Springer VS: Wiesbaden, 2014.
Gangl, M.; Ziefle, A. Motherhood, Labor Force Behavior, and Women’s Careers: An Empirical
Assessment of the Wage Penalty for Motherhood in Britain, Germany, and the United States.
Demography 2009, 46, 341–369.
Giesselmann, M.; Windzio, M. Regressionsmodelle zur Analyse von Paneldaten; Springer VS:
Wiesbaden, 2012.
Grunow, D.; Schulz, F.; Blossfeld, H. P. What Determines Change in the Division of Housework over
the Course of Marriage? International Sociology 2012, 27, 289–307.
Grunow, D.; Begall, K.; Buchler, S. Gender Ideologies in Europe: A Multidimensional Framework.
Journal of Marriage and Family 2018, 80, 42–60.
Gummer, T.; Roßmann, J.; Silber H. Using Instructed Response Items as Attention Checks in Web
Surveys: Properties and Implementation. Sociological Methods & Research 2018, DOI: https://
doi.org/10.1177/0049124118769083
Häder, S.; Häder, M.; Schmich, P., Eds. Telefonumfragen in Deutschland; Schriftenreihe der ASI –
Arbeitsgemeinschaft Sozialwissenschaftlicher Institute; Springer VS: Wiesbaden, 2019.
Jungherr, A.; Schoen, H.; Posegga, O.; Jürgens, P. Digital Trace Data in the Study of Public Opinion:
An Indicator of Attention Toward Politics rather than Political Support. Social Science
Computer Review 2017, 35, 336–356.
Keusch, F.; Struminskaya, B.; Antoun, C.; Couper, M. P.; Kreuter, F. Willingness to Participate in
Passive Mobile Data Collection. Public Opinion Quarterly 2019, DOI: https://doi.org/10.1093/
poq/nfz007
Kleinke, K.; Schlüter, E.; Christ, O. Strukturgleichungsmodelle mit Mplus: Eine praktische
Einführung; De Gruyter-Oldenbourg: Berlin/Boston, 2017.
Kreuter, F., Ed. Improving Surveys with Paradata: Analytic Uses of Process Information; John Wiley &
Sons: Hoboken, 2013.
Kreuter, F.; Haas, G. C.; Keusch, F.; Bähr, S.; Trappmann, M. Collecting Survey and Smartphone
Sensor Data with an App: Opportunities and Challenges around Privacy and Informed Consent.
Social Science Computer Review 2018, DOI: https://doi.org/10.1177/0894439318816389
Kreyenfeld, M.; Andersson, G. Socioeconomic Differences in the Unemployment and Fertility Nexus:
Evidence from Denmark and Germany. Advances in Life Course Research 2014, 21, 59–73.
Langer, W. Mehrebenenanalyse. Eine Einführung für Forschung und Praxis. 2nd ed.; VS: Wiesbaden,
2009 [1st ed. 2004].
Liebe, U.; Glenk, K.; Oehlmann, M.; Meyerhoff, J. Does the Use of Mobile Devices (Tablets and
Smartphones) Affect Survey Quality and Choice Behaviour in Web Surveys? Journal of Choice
Modelling 2015, 14, 17–31.
Meitinger, K.; Braun, M.; Behr, D. Sequence Matters in Online Probing: The Impact of the Order of
Probes on Response Quality, Motivation of Respondents, and Answer Content. Survey
Research Methods 2018, 12, 103–120.
Menold, N.; Wolbring, T., Eds. Qualitätssicherung sozialwissenschaftlicher Erhebungsinstrumente;
Schriftenreihe der ASI – Arbeitsgemeinschaft Sozialwissenschaftlicher Institute; Springer VS:
Wiesbaden, 2019.
Mühlichen, A. Privatheit im Zeitalter vernetzter Systeme: Eine empirische Studie; Barbara Budrich:
Opladen, 2018.
Reinecke, J. Wachstumsmodelle; Rainer Hampp: Munich, 2012.
Reinecke, J. Growth Curve Models and Panel Dropouts: Applications with Criminological Panel Data.
Netherlands Journal of Psychology 2013, 67, 122–131.
Quantitative Methods 329
Wolf, C.; Best, H., Eds. Handbuch der sozialwissenschaftlichen Datenanalyse; VS: Wiesbaden, 2010.
Wolf, C.; Joye, D.; Smith, T. W.; Fu, Y., Eds. The Sage Handbook of Survey Methodology; Sage:
London, 2016.