
Advanced Techniques for

Collecting Statistical Data


ADVANCED TECHNIQUES FOR
COLLECTING STATISTICAL DATA

Edited by:
Olga Moreira

ARCLER
Press

www.arclerpress.com
Advanced Techniques for Collecting Statistical Data
Olga Moreira

Arcler Press
224 Shoreacres Road
Burlington, ON L7L 2H2
Canada
www.arclerpress.com
Email: orders@arclereducation.com

e-book Edition 2023

ISBN: 978-1-77469-547-0 (e-book)

This book contains information obtained from highly regarded resources. Reprinted material sources are indicated. Copyright for individual articles remains with the authors as indicated and published under a Creative Commons License. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data; the views articulated in the chapters are those of the individual contributors and not necessarily those of the editors or publishers. The editors and publishers are not responsible for the accuracy of the information in the published chapters or the consequences of its use. The publisher assumes no responsibility for any damage or grievance to persons or property arising out of the use of any materials, instructions, methods or thoughts in the book. The editors and the publisher have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission has not been obtained. If any copyright holder has not been acknowledged, please write to us so we may rectify the omission.

Notice: Registered trademarks of products or corporate names are used only for explanation and identification, without intent to infringe.

© 2023 Arcler Press

ISBN: 978-1-77469-497-8 (Hardcover)

Arcler Press publishes a wide variety of books and eBooks. For more information about Arcler Press and its products, visit our website at www.arclerpress.com.
DECLARATION

Some content or chapters in this book are open-access, copyright-free published research works, published under a Creative Commons License and indicated with a citation. We are thankful to the publishers and authors of this content, without whom this book would not have been possible.
ABOUT THE EDITOR

Olga Moreira holds a Ph.D. and an M.Sc. in Astrophysics and a B.Sc. in Physics/Applied Mathematics (Astronomy). She is an experienced technical writer and data analyst. As a graduate student, she held two research grants to carry out her work in Astrophysics at two of the most renowned European institutions in the fields of Astrophysics and Space Science (the European Space Agency and the European Southern Observatory). She is currently an independent scientist, peer reviewer and editor. Her research interests include solar physics, machine learning and artificial neural networks.
TABLE OF CONTENTS

List of Contributors .......................................................................................xv


List of Abbreviations .................................................................................... xxi
Preface ............................................................................................... xxiii

Chapter 1 Series: Practical Guidance to Qualitative Research. Part 1: Introduction ............ 1
Abstract ..................................................................................................... 1
Introduction ............................................................................................... 2
Qualitative Research .................................................................................. 2
High-Quality Qualitative Research in Primary Care ................................... 3
Further Education And Reading.................................................................. 4
Acknowledgements ................................................................................... 5
References ................................................................................................. 6

Chapter 2 Series: Practical Guidance to Qualitative Research. Part 2: Context, Research Questions and Designs ............ 7
Abstract ..................................................................................................... 7
Introduction ............................................................................................... 9
Context ...................................................................................................... 9
Research Questions ................................................................................. 10
Designing Qualitative Studies .................................................................. 13
Acknowledgements ................................................................................. 18
References ............................................................................................... 19

Chapter 3 Series: Practical Guidance to Qualitative Research. Part 3: Sampling, Data Collection and Analysis ............ 21
Abstract ................................................................................................... 21
Introduction ............................................................................................. 23
Sampling ................................................................................................. 23
Data Collection ....................................................................................... 27
Analysis ................................................................................................... 35
Acknowledgements ................................................................................. 38
References ............................................................................................... 39

Chapter 4 Series: Practical Guidance to Qualitative Research. Part 4: Trustworthiness and Publishing ............ 41
Abstract ................................................................................................... 41
Introduction ............................................................................................. 43
Trustworthiness ........................................................................................ 43
Publishing................................................................................................ 47
Acknowledgements ................................................................................. 50
References ............................................................................................... 51

Chapter 5 Participant Observation as a Data Collection Method ............ 53
Abstract ................................................................................................... 53
Introduction ............................................................................................. 54
Definitions ............................................................................................... 54
The History of Participant Observation as a Method ................................. 55
Advantages and Disadvantages of Using Participant Observation ............. 59
The Stances of the Observer..................................................................... 62
How Does One Know What to Observe? ................................................. 64
How Does One Conduct an Observation? ............................................... 65
Tips for Collecting Useful Observation Data ............................................ 73
Keeping and Analyzing Field Notes and Writing Up the Findings ............. 76
Teaching Participant Observation ............................................................. 80
Summary ................................................................................................. 84
References ............................................................................................... 85

Chapter 6 Attitudes towards Participation in a Passive Data Collection Experiment ............ 89
Abstract ................................................................................................... 89
Introduction ............................................................................................. 90
Background ............................................................................................. 91
Methods and Design ................................................................................ 98
Results and Discussion .......................................................................... 103
Conclusions ........................................................................................... 110
Acknowledgments ................................................................................. 111

Appendix B. Internal Validity Test of Vignette Responses ........................ 114
Author Contributions ............................................................................. 115
References ............................................................................................. 116

Chapter 7 An Integrative Review on Methodological Considerations in Mental Health Research – Design, Sampling, Data Collection Procedure and Quality Assurance ............ 121
Abstract ................................................................................................. 121
Background ........................................................................................... 123
Methods ................................................................................................ 125
Results ................................................................................................... 129
Discussion ............................................................................................. 142
Conclusion ............................................................................................ 148
Acknowledgements ............................................................................... 149
Authors’ Contributions ........................................................................... 149
References ............................................................................................. 150

Chapter 8 Wiki Surveys: Open and Quantifiable Social Data Collection ............... 155
Abstract ................................................................................................. 155
Introduction ........................................................................................... 156
Wiki Surveys .......................................................................................... 157
Case Studies .......................................................................................... 165
Discussion ............................................................................................. 171
Acknowledgments ................................................................................. 173
Author Contributions ............................................................................. 173
References ............................................................................................. 174

Chapter 9 Towards a Standard Sampling Methodology on Online Social Networks: Collecting Global Trends on Twitter ............ 181
Abstract ................................................................................................. 182
Introduction ........................................................................................... 182
Related Work ......................................................................................... 185
Problem Definition ................................................................................ 187
Random Strategies ................................................................................. 188
The Alternative Version of the Metropolis-Hastings Algorithm ................ 191
Sampling Global Trends on Twitter ......................................................... 192
Results ................................................................................................... 195

Limitations ............................................................................................. 202
Conclusions ........................................................................................... 202
Acknowledgements ............................................................................... 204
Authors’ Contributions ........................................................................... 204
References ............................................................................................. 205

Chapter 10 Mobile Data Collection: Smart, but Not (Yet) Smart Enough ................ 209
Background ........................................................................................... 209
Smart Mobile Data Collection................................................................ 210
Smarter Mobile Data Collection in the Future ........................................ 212
Conclusions ........................................................................................... 214
Author Contributions ............................................................................. 215
Acknowledgments ................................................................................. 215
References ............................................................................................. 216

Chapter 11 Comparing a Mobile Phone Automated System With a Paper and Email Data Collection System: Substudy Within a Randomized Controlled Trial ............ 221
Abstract ................................................................................................. 222
Introduction ........................................................................................... 223
Methods ................................................................................................ 224
Results ................................................................................................... 230
Discussion ............................................................................................. 237
Acknowledgments ................................................................................. 241
References ............................................................................................. 242

Chapter 12 Big Data Collection and Object Participation Willingness: An Analytical Framework from the Perspective of Value Balance ............ 247
Abstract ................................................................................................. 247
The Origin of Research .......................................................................... 248
The Presentation of Analytical Framework.............................................. 250
Conclusion and Prospect ....................................................................... 255
Reference .............................................................................................. 256

Chapter 13 Research on Computer Simulation Big Data Intelligent Collection and Analysis System ............ 257
Abstract ................................................................................................. 257
Introduction ........................................................................................... 258

Principles of Big Data Intelligent Fusion................................................. 259
Experimental Simulation Analysis .......................................................... 262
Conclusion ............................................................................................ 265
References ............................................................................................. 266

Chapter 14 Development of a Mobile Application for Smart Clinical Trial Subject Data Collection and Management ............ 267
Abstract ................................................................................................. 268
Introduction ........................................................................................... 268
Materials and Methods .......................................................................... 270
Results ................................................................................................... 272
Discussion ............................................................................................. 278
Conclusions ........................................................................................... 281
Author Contributions ............................................................................. 282
References ............................................................................................. 283

Chapter 15 The CoronaSurveys System for COVID-19 Incidence Data Collection and Processing ............ 287
Introduction ........................................................................................... 288
Data Collection ..................................................................................... 290
Data Analysis ......................................................................................... 294
Data Visualization.................................................................................. 296
Results ................................................................................................... 300
Conclusion ............................................................................................ 301
Author Contributions ............................................................................. 302
References ............................................................................................. 303

Chapter 16 Artificial Intelligence Based Body Sensor Network Framework—Narrative Review: Proposing an End-to-End Framework using Wearable Sensors, Real-Time Location Systems and Artificial Intelligence/Machine Learning Algorithms for Data Collection, Data Mining and Knowledge Discovery in Sports and Healthcare ............ 305
Abstract ................................................................................................. 306
Introduction ........................................................................................... 307
Artificial Intelligence-Based Body Sensor Network Framework: AIBSNF . 318
Specific Applications ............................................................................. 320
General Applications ............................................................................. 322
Limitations And Issues............................................................................ 324

Conclusion ............................................................................................ 326
Acknowledgements ............................................................................... 326
Authors’ Contributions ........................................................................... 326
References ............................................................................................. 327

Chapter 17 DAViS: a Unified Solution for Data Collection, Analyzation, and Visualization in Real-time Stock Market Prediction ............ 337
Abstract ................................................................................................. 337
Introduction ........................................................................................... 338
Related Literature................................................................................... 343
Preliminary ............................................................................................ 345
The Proposed DAViS Framework ............................................................ 347
Experimental Setup ................................................................................ 361
Experimental Result ............................................................................... 364
Conclusions and Future Direction.......................................................... 375
Acknowledgements ............................................................................... 376
References ............................................................................................. 377

Index ..................................................................................................... 381

LIST OF CONTRIBUTORS

Albine Moser
Faculty of Health Care, Research Centre Autonomy and Participation of Chronically Ill
People, Zuyd University of Applied Sciences, Heerlen, The Netherlands
Faculty of Health, Medicine and Life Sciences, Department of Family Medicine,
Maastricht University, Maastricht, The Netherlands

Irene Korstjens
Faculty of Health Care, Research Centre for Midwifery Science, Zuyd University of
Applied Sciences, Maastricht, The Netherlands

Barbara B. Kawulich
University of West Georgia, Educational Leadership and Professional Studies Department, 1601 Maple Street, Room 153, Education Annex, Carrollton, GA 30118, USA

Bence Ságvári
Computational Social Science—Research Center for Educational and Network Studies
(CSS–RECENS), Centre for Social Sciences, Tóth Kálmán Utca 4, 1097 Budapest,
Hungary
Institute of Communication and Sociology, Corvinus University, Fővám tér 8, 1093
Budapest, Hungary

Attila Gulyás
Computational Social Science—Research Center for Educational and Network Studies
(CSS–RECENS), Centre for Social Sciences, Tóth Kálmán Utca 4, 1097 Budapest,
Hungary

Júlia Koltai
Computational Social Science—Research Center for Educational and Network Studies
(CSS–RECENS), Centre for Social Sciences, Tóth Kálmán Utca 4, 1097 Budapest,
Hungary
Department of Network and Data Science, Central European University, Quellenstraße
51, 1100 Vienna, Austria
Faculty of Social Sciences, Eötvös Loránd University of Sciences, Pázmány Péter
Sétány 1/A, 1117 Budapest, Hungary
Eric Badu
School of Nursing and Midwifery, The University of Newcastle, Callaghan, Australia

Anthony Paul O’Brien
Faculty of Health and Medicine, School of Nursing and Midwifery, University of
Newcastle, Callaghan, Australia

Rebecca Mitchell
Faculty of Business and Economics, Macquarie University, North Ryde, Australia

Matthew J. Salganik
Department of Sociology, Center for Information Technology Policy, and Office of
Population Research, Princeton University, Princeton, NJ, USA

Karen E. C. Levy
Information Law Institute and Department of Media, Culture, and Communication,
New York University, New York, NY, USA and Data & Society Research Institute,
New York, NY, USA

C. A. Piña-García
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Departamento de
Ciencias de la Computación, Universidad Nacional Autónoma de México, Ciudad de
México, México

Carlos Gershenson
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Departamento de
Ciencias de la Computación, Universidad Nacional Autónoma de México, Ciudad de
México, México
Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México,
Circuito Maestro Mario de la Cueva S/N, Ciudad Universitaria, Ciudad de México,
04510 México
SENSEable City Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue,
Cambridge, 02139 USA
MoBS Lab, Network Science Institute, Northeastern University, 360 Huntington av
1010-177, Boston, 02115 USA
ITMO University, Birzhevaya liniya 4, St. Petersburg, 199034 Russia

J. Mario Siqueiros-García
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Departamento de
Ciencias de la Computación, Universidad Nacional Autónoma de México, Ciudad de
México, México

Alexander Seifert
University Research Priority Program “Dynamics of Healthy Aging”, University of
Zurich, Zurich, Switzerland

Matthias Hofer
University Research Priority Program “Dynamics of Healthy Aging”, University of
Zurich, Zurich, Switzerland
Department of Communication and Media Research, University of Zurich, Zurich,
Switzerland

Mathias Allemand
University Research Priority Program “Dynamics of Healthy Aging”, University of
Zurich, Zurich, Switzerland
Department of Psychology, University of Zurich, Zurich, Switzerland

Diana M Bond, PhD
Sydney School of Public Health, Faculty of Medicine and Health, University of Sydney,
Sydney, Australia

Jeremy Hammond, PhD
Strategic Ventures, University of Sydney, Sydney, Australia

Antonia W Shand, MB ChB
Children’s Hospital at Westmead Clinical School, Faculty of Medicine and Health,
University of Sydney, Sydney, Australia
Department for Maternal Fetal Medicine, Royal Hospital for Women, Sydney, Australia

Natasha Nassar, PhD
Sydney School of Public Health, Faculty of Medicine and Health, University of Sydney,
Sydney, Australia
Children’s Hospital at Westmead Clinical School, Faculty of Medicine and Health,
University of Sydney, Sydney, Australia

Xiang Huang
Guangdong University of Finance and Economics, College of Entrepreneurship Education, Guangzhou 510320, China

Hongying Liu
Department of Computer Science and Engineering, Guangzhou College of Technology
and Business, Guangzhou 510850, China

Hyeongju Ryu
Biomedical Research Institute, Seoul National University Hospital, Seoul 03080, Korea

Meihua Piao
Office of Hospital Information, Seoul National University Hospital, Seoul 03080, Korea

Heejin Kim
Clinical Trials Center, Seoul National University Hospital, Seoul 03080, Korea

Wooseok Yang
Clinical Trials Center, Seoul National University Hospital, Seoul 03080, Korea

Kyung Hwan Kim
Department of Thoracic and Cardiovascular Surgery, Seoul National University
Hospital, Seoul National University College of Medicine, Seoul 03080, Korea

Carlos Baquero
U. Minho and INESC TEC, Braga, Portugal

Paolo Casari
Department of Information Engineering and Computer Science, University of Trento,
Trento, Italy

Antonio Fernandez Anta
IMDEA Networks Institute, Madrid, Spain

Amanda García-García
IMDEA Networks Institute, Madrid, Spain

Davide Frey
Inria Rennes, Rennes, France

Augusto Garcia-Agundez
Multimedia Communications Lab, TU Darmstadt, Darmstadt, Germany

Chryssis Georgiou
Department of Computer Science, University of Cyprus, Nicosia, Cyprus

Benjamin Girault
Department of Electrical and Computer Engineering, University of Southern California,
Los Angeles, CA, United States

Antonio Ortega
Department of Electrical and Computer Engineering, University of Southern California,
Los Angeles, CA, United States

Mathieu Goessens
Consulting, Rennes, France

Harold A. Hernández-Roig
Department of Statistics, UC3M & UC3M-Santander Big Data Institute, Getafe, Spain

Nicolas Nicolaou
Algolysis Ltd, Nicosia, Cyprus

Efstathios Stavrakis
Algolysis Ltd, Nicosia, Cyprus

Oluwasegun Ojo
IMDEA Networks Institute and UC3M, Madrid, Spain

Julian C. Roberts
Skyhaven Media, Liverpool, United Kingdom

Ignacio Sanchez
InqBarna, Barcelona, Spain

Ashwin A. Phatak
Institute of Exercise Training and Sport Informatics, German Sports University,
Cologne, Germany

Franz-Georg Wieland
Institute of Physics, University of Freiburg, Freiburg im Breisgau, Germany

Kartik Vempala
Bloomberg LP, New York, USA

Frederik Volkmar
Institute of Exercise Training and Sport Informatics, German Sports University,
Cologne, Germany

Daniel Memmert
Institute of Exercise Training and Sport Informatics, German Sports University,
Cologne, Germany

Suppawong Tuarob
Faculty of Information and Communication Technology, Mahidol University, Nakhon
Pathom 73170, Thailand

Poom Wettayakorn
Faculty of Information and Communication Technology, Mahidol University, Nakhon
Pathom 73170, Thailand

Ponpat Phetchai
Faculty of Information and Communication Technology, Mahidol University, Nakhon
Pathom 73170, Thailand

Siripong Traivijitkhun
Faculty of Information and Communication Technology, Mahidol University, Nakhon
Pathom 73170, Thailand

Sunghoon Lim
Department of Industrial Engineering, Ulsan National Institute of Science and
Technology, Ulsan 44919, Republic of Korea
Institute for the 4th Industrial Revolution, Ulsan National Institute of Science and
Technology, Ulsan 44919, Republic of Korea

Thanapon Noraset
Faculty of Information and Communication Technology, Mahidol University, Nakhon
Pathom 73170, Thailand

Tipajin Thaipisutikul
Faculty of Information and Communication Technology, Mahidol University, Nakhon
Pathom 73170, Thailand

LIST OF ABBREVIATIONS

BSN Body sensor networks
AIBSNF Artificial intelligence-based body sensor network framework
RTLS Real-time location systems
AI/ML Artificial intelligence and machine learning
ECG Electrocardiogram
EMG Electromyogram
ANN Artificial neural networks
GSR Galvanic skin resistance
LED Light emitting diode
IMU Inertial measurement units
LoS Line of sight
NLoS Non-line of sight
MLFF Multi-level fusion framework
RA Rheumatoid arthritis
TCM Traditional Chinese medicine
PREFACE

We live in an age of Big Data. This is changing the way researchers collect and
preprocess data. This book aims to provide a broad view of the current methods and
techniques, as well as automated systems for statistical data collection. It is divided into
three parts, each focusing on a different aspect of the statistical data collection process.
The first part of the book is focused on introducing readers to qualitative research data collection methods. Chapters 1 to 4 comprise a practical guide by Moser & Korstjens (2017) on designing qualitative studies and on sampling, collecting, and analyzing data about people, processes, and cultures in qualitative research. Chapters 5 and 6 are focused on observation-based methods, specifically participant observation.
Chapter 1 introduces the concept of “qualitative research” from the point of view of
clinical trials and healthcare sciences. Qualitative research is seen as “the investigation
of phenomena, typically in an in-depth and holistic fashion, through the collection
of rich narrative materials using a flexible research design”. Chapter 2 answers frequent queries about the context, research questions and design of qualitative research. Chapter 3 is devoted to sampling strategies, as well as data
collection and analysis plans. Chapter 4 reflects upon the trustworthiness of the collected
data. Chapter 5 covers the various definitions of participant observation and the purposes for which it is used, along with exercises for teaching observation techniques. Chapter
6 includes an exploratory study conducted in Hungary using a factorial design-based
online survey to explore the willingness to participate in a future research project based
on active and passive data collection via smartphones.
The second part of the book is focused on data mining of information collected from
clinical and social studies surveys, as well as from social media. Chapter 7 includes a
review of methods used in clinical research, from study design to sampling and data
collection. Chapters 8 and 9 present data collection methods that facilitate quantification of
information from online survey respondents and social media. Chapter 8 presents a new
method for data collection and data analysis for pairwise wiki surveys using two proof-
of-concept case studies involving the free and open-source website www.allourideas.
org. Chapter 9 proposes a methodology for carrying out an efficient data collection process via three random strategies: Brownian, Illusion and Reservoir. It shows that this new methodology can be used to collect global trends on Twitter. Chapters 10 and 11 are focused on mobile data collection methods, and Chapters 12 and 13 on big data collection systems. Chapter 10 reflects on the many challenges of mobile data collection
with smartphones, as well as on the interesting avenues the future development of this
technology can provide for clinical research. Chapter 11 compares a web-based mobile
phone automated system (MPAS) with the traditional paper and email-based data
collection (PEDC). It demonstrates that MPAS has the potential to be a more effective
and acceptable method for improving the overall management, treatment compliance,
and methodological quality of clinical research. Chapter 12 proposes an analytical
framework, which considers the decision-making of big data objects participating in
the big data collection process. This new framework aims to reflect on factors that
can improve the participation willingness of big data objects. Chapter 13 proposes a
Java3D-based big data network multi-resolution acquisition method, which has lower
acquisition costs, shorter completion times, and higher acquisition accuracy than most
current data collection and analysis systems.
The third and last part of this book is focused on the current efforts to optimize and
automate data collection procedures. Chapter 14 presents the development of a mobile
application for collecting subject data for clinical trials, which is shown to increase
the efficiency of clinical trial management. Chapter 15 describes the CoronaSurveys
system developed for facilitating COVID-19 data collection. The proposed system
comprises multiple components and processes: the web survey; the mobile apps; survey response cleaning and aggregation; data storage and publication; data processing and estimate computation; and results visualization. Chapter
16 is focused on machine learning algorithms for data collection, data mining and
knowledge discovery in sports and healthcare. It proposes an artificial intelligence-
based body sensor network framework (AIBSNF), a framework for the strategic use of body sensor networks (BSN) that combines a real-time location system (RTLS) and wearable biosensors to collect multivariate, low-noise, and high-fidelity data.
Chapter 17 introduces DAViS, an automated system for data collection, analysis, and visualization for real-time stock market prediction. The proposed stock forecasting
method outperforms a traditional baseline and confirms that leveraging an ensemble
scheme of machine learning methods with contextual information improves stock
prediction performance.
Chapter 1

SERIES: PRACTICAL GUIDANCE TO QUALITATIVE RESEARCH. PART 1: INTRODUCTION

Albine Moser (a,b) and Irene Korstjens (c)

a) Faculty of Health Care, Research Centre Autonomy and Participation of Chronically Ill People, Zuyd University of Applied Sciences, Heerlen, The Netherlands
b) Faculty of Health, Medicine and Life Sciences, Department of Family Medicine, Maastricht University, Maastricht, The Netherlands
c) Faculty of Health Care, Research Centre for Midwifery Science, Zuyd University of Applied Sciences, Maastricht, The Netherlands

ABSTRACT
In the course of our supervisory work over the years, we have noticed that qualitative research tends to evoke a lot of questions and worries, so-called Frequently Asked Questions. This journal series of four articles intends to provide novice researchers with practical guidance for conducting high-quality qualitative research in primary care. By ‘novice’ we mean Master’s students and junior researchers, as well as experienced quantitative researchers who are engaging in qualitative research for the first time. This series addresses their questions and provides researchers, readers, reviewers and editors with references to criteria and tools for judging the quality of papers reporting on qualitative research. This first article describes the key features of qualitative research, provides publications for further learning and reading, and gives an outline of the series.

Citation (APA): Moser, A., & Korstjens, I. (2017). Series: Practical guidance to qualitative research. Part 1: Introduction. European Journal of General Practice, 23(1), 271-273. (4 pages)
Copyright: © This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/).

Keywords: Qualitative research, qualitative methodology, phenomena, natural context, emerging design, primary care

INTRODUCTION
In the course of our supervisory work over the years, we have noticed that
while many researchers who conducted qualitative research for the first time
understood the tenets of qualitative research, knowing about qualitative
methodology and carrying out qualitative research were two different
things. We noticed that they somehow mixed quantitative and qualitative
methodology and methods. We also observed that they experienced many
uncertainties when doing qualitative research. They expressed a great need
for practical guidance regarding key methodological issues. For example,
questions often heard and addressed were, ‘What kind of literature would I
search for when preparing a qualitative study?’ ‘Is it normal that my research
question seems to change during the study?’ ‘What types of sampling can I
use?’ ‘What methods of data collection are appropriate?’ ‘Can I wait with my
analysis until all data have been collected?’ ‘What are the quality criteria for
qualitative research?’ ‘How do I report my qualitative study?’ This induced
us to write this series providing ‘practical guidance’ to qualitative research.

QUALITATIVE RESEARCH
Qualitative research has been defined as the investigation of phenomena,
typically in an in-depth and holistic fashion, through the collection of rich
narrative materials using a flexible research design [1]. Qualitative research
aims to provide in-depth insights and understanding of real-world problems
and, in contrast to quantitative research, it does not introduce treatments,
manipulate or quantify predefined variables. Qualitative research
encompasses many different designs, which however share several key
features as presented in Box 1.

Box 1. Key features of qualitative research.

•	Qualitative research studies phenomena in the natural contexts of individuals or groups.
•	Qualitative researchers try to gain a deeper understanding of people’s experiences, perceptions, behaviour and processes and the meanings they attach to them.
•	During the research process, researchers use ‘emerging design’ to be flexible in adjusting to the context.
•	Data collection and analysis are iterative processes that happen simultaneously as the research progresses.

Qualitative research is associated with the constructivist or naturalistic paradigm, which began as a countermovement to the positivistic paradigm
associated with quantitative research. Where positivism assumes that there
is an orderly reality that can be objectively studied, constructivism holds that
there are multiple interpretations of reality and that the goal of the research is
to understand how individuals construct reality within their natural context
[1].

HIGH-QUALITY QUALITATIVE RESEARCH IN PRIMARY CARE
Qualitative research is a vital aspect of research in primary care and
qualitative studies with a clear and important clinical message can be highly
cited [2,3]. This series intends to provide novice researchers with an introduction to conducting high-quality qualitative research in the field of primary care. By novice researchers, we mean Master’s students
and junior researchers in primary care as well as experienced quantitative
researchers who are engaging in qualitative research for the first time. As
primary care is an interprofessional field, we bear in mind that our readers
have different backgrounds, e.g. general practice, nursing, maternity care,
occupational therapy, physical therapy and health sciences. This series is
not a straightforward ‘cookbook’ but a source to consult when engaging
in qualitative research. We neither explain all the details nor deliver an
emergency kit to solve the sort of problems that all qualitative researchers
encounter at least once in their lifetimes, such as failing audio recorders. We
do focus on topics that have evoked a lot of questions and worries among
novice researchers; the so-called frequently asked questions (FAQs).
We aim to provide researchers with practical guidance for doing
qualitative research. For the journal’s editorial policy, it will serve as a
standard for qualitative research papers. For those who are not involved
in qualitative research on a daily basis, this series might be used as an introduction to understanding what high-quality qualitative research entails.
This way, the series will also provide readers, reviewers and editors with
references to criteria and tools for judging the quality of papers reporting on
qualitative research.

FURTHER EDUCATION AND READING


As in quantitative research, qualitative research requires excellent
methodology. Therefore, researchers in primary care need to be sufficiently
trained in this type of research [2]. We hope that this series will function as
a stepping stone towards participation in relevant national and international
qualitative research courses or networks and will stimulate reading books and
articles on qualitative research. During our supervisory work, researchers
have mentioned examples of books on qualitative research that helped them
in striving to perform outstanding qualitative research in primary care. Box
2 presents a selection of these books and the BMJ 2008 series on qualitative
research for further reading.

Box 2. Examples of publications on qualitative research.

Brinkmann S, Kvale S. Interviews: Learning the craft of qualitative research interviewing. 3rd ed. Sage: London; 2014.
Bourgeault I, Dingwall R, de Vries R. The SAGE handbook of qualitative methods in health research. 1st ed. Sage: London; 2010. (a)
Creswell JW. Qualitative research design: Choosing among five approaches. 3rd ed. Sage: Los Angeles (CA); 2013.
Denzin NK, Lincoln YS. The SAGE handbook of qualitative research. 4th ed. Sage: London; 2011. (a)
Gray DE. Doing research in the real world. 3rd ed. Sage: London; 2013.
Holloway I, Wheeler S. Qualitative research in nursing and healthcare. 3rd ed. Wiley-Blackwell: Chichester; 2010.
Miles MB, Huberman AM, Saldana J. Qualitative data analysis: A methods sourcebook. 3rd ed. Sage: Los Angeles (CA); 2014. (a)
Morgan DL, Krueger RA. Focus group kit. Volumes 1–6. Sage: London; 1997.
Polit DF, Beck CT. Nursing research: Generating and assessing evidence for nursing practice. 10th ed. Lippincott, Williams & Wilkins: Philadelphia (PA); 2017.
Pope C, Van Royen P, Baker R. Qualitative methods in research on healthcare quality. Qual Saf Health Care 2002;11:148–152.
Salmons J. Qualitative online interviews. 2nd ed. Sage: London; 2015.
Silverman D. Doing qualitative research. 4th ed. Sage: London; 2013.
Starks H, Trinidad SB. Choose your method: A comparison of phenomenology, discourse analysis and grounded theory. Qual Health Res 2007;17:1372–1380.
Tracy SJ. Qualitative quality: Eight ‘big-tent’ criteria for excellent qualitative research. Qual Inq 2010;16(10):837–851.

BMJ series on qualitative research, published online 7 August 2008:
Kuper A, Reeves S, Levinson W. An introduction to reading and appraising qualitative research. BMJ 2008;337:a288.
Reeves S, Albert M, Kuper A, Hodges BD. Why use theories in qualitative research? BMJ 2008;337:a949.
Hodges BD, Kuper A, Reeves S. Discourse analysis. BMJ 2008;337:a879.
Kuper A, Lingard L, Levinson W. Critically appraising qualitative research. BMJ 2008;337:a1035.
Reeves S, Kuper A, Hodges BD. Qualitative research methodologies: ethnography. BMJ 2008;337:a1020.
Lingard L, Albert M, Levinson W. Grounded theory, mixed methods, and action research. BMJ 2008;337:a567.

(a) For advanced learning.

ACKNOWLEDGEMENTS
The authors wish to thank the following junior researchers who have been
participating for the last few years in the so-called ‘think tank on qualitative
research’ project, a collaborative project between Zuyd University of Applied
Sciences and Maastricht University, for their pertinent questions: Erica
Baarends, Jerome van Dongen, Jolanda Friesen-Storms, Steffy Lenzen,
Ankie Hoefnagels, Barbara Piskur, Claudia van Putten-Gamel, Wilma
Savelberg, Steffy Stans, and Anita Stevens. The authors are grateful to Isabel
van Helmond, Joyce Molenaar and Darcy Ummels for proofreading our
manuscripts and providing valuable feedback from the ‘novice perspective’.

REFERENCES
1.	Polit DF, Beck CT. Nursing research: generating and assessing evidence for nursing practice. 10th ed. Philadelphia (PA): Lippincott, Williams & Wilkins; 2017.
2.	Hepworth J, Key M. General practitioners learning qualitative research: a case study of postgraduate education. Aust Fam Physician 2015;44:760–763.
3.	Greenhalgh T, Annandale E, Ashcroft R, et al. An open letter to the BMJ editors on qualitative research. BMJ 2016;352:i563.
Chapter 2

SERIES: PRACTICAL GUIDANCE TO QUALITATIVE RESEARCH. PART 2: CONTEXT, RESEARCH QUESTIONS AND DESIGNS

Irene Korstjens (a) and Albine Moser (b,c)

a) Faculty of Health Care, Research Centre for Midwifery Science, Zuyd University of Applied Sciences, Maastricht, The Netherlands
b) Faculty of Health Care, Research Centre Autonomy and Participation of Chronically Ill People, Zuyd University of Applied Sciences, Heerlen, The Netherlands
c) Faculty of Health, Medicine and Life Sciences, Department of Family Medicine, Maastricht University, Maastricht, The Netherlands

ABSTRACT
In the course of our supervisory work over the years, we have noticed that qualitative research tends to evoke a lot of questions and worries, so-called frequently asked questions (FAQs). This series of four articles intends to provide novice researchers with practical guidance for conducting high-quality qualitative research in primary care. By ‘novice’ we mean Master’s students and junior researchers, as well as experienced quantitative researchers who are engaging in qualitative research for the first time. This series addresses their questions and provides researchers, readers, reviewers and editors with references to criteria and tools for judging the quality of qualitative research papers. This second article addresses FAQs about context, research questions and designs. Qualitative research takes into account the natural contexts in which individuals or groups function to provide an in-depth understanding of real-world problems. The research questions are generally broad and open to unexpected findings. The choice of a qualitative design primarily depends on the nature of the research problem, the research question(s) and the scientific knowledge one seeks. Ethnography, phenomenology and grounded theory are considered to represent the ‘big three’ qualitative approaches. Theory guides the researcher through the research process by providing a ‘lens’ to look at the phenomenon under study. Since qualitative researchers and the participants of their studies interact in a social process, researchers influence the research process. The first article described the key features of qualitative research, the third article will focus on sampling, data collection and analysis, while the last article focuses on trustworthiness and publishing.

Citation (APA): Korstjens, I., & Moser, A. (2017). Series: Practical guidance to qualitative research. Part 2: Context, research questions and designs. European Journal of General Practice, 23(1), 274-279. (7 pages)
Copyright: © This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/).

Keywords: General practice/family medicine, general qualitative designs and methods
Key points on context, research questions and designs
•	Research questions are generally broad and open to unexpected findings and, depending on the research process, might change to some extent.
• The SPIDER tool is more suited than PICO for searching for
qualitative studies in the literature, and can support the process of
formulating research questions for original studies.
• The choice of a qualitative design primarily depends on the nature
of the research problem, the research question, and the scientific
knowledge one seeks.
• Theory guides the researcher through the research process by
providing a ‘lens’ to look at the phenomenon under study.
• Since qualitative researchers and the participants interact in a
social process, the researcher influences the research process.

INTRODUCTION
In an introductory paper [1], we have described the key features of qualitative
research. The current article addresses frequently asked questions about
context, research questions and design of qualitative research.

CONTEXT

Why is context important?


Qualitative research takes into account the natural contexts in which individuals
or groups function, as its aim is to provide an in-depth understanding of real-
world problems [2]. In contrast to quantitative research, generalizability
is not a guiding principle. According to most qualitative researchers, the
‘reality’ we perceive is constructed by our social, cultural, historical and
individual contexts. Therefore, you look for variety in people to describe,
explore or explain phenomena in real-world contexts. Influence from the
researcher on the context is inevitable. However, by striving to minimize your interference with people’s natural settings, you can get a ‘behind the
scenes’ picture of how people feel or what other forces are at work, which
may not be discovered in a quantitative investigation. Understanding what
practitioners and patients think, feel or do in their natural context, can
make clinical practice and evidence-based interventions more effective,
efficient, equitable and humane. For example, despite their awareness of
widespread family violence, general practitioners (GPs) seem to be hesitant
to ask about intimate partner violence. By applying a qualitative research
approach, you might explore how and why practitioners act this way. You
need to understand their context to be able to interact effectively with
them, to analyse the data, and report your findings. You might consider the
characteristics of practitioners and patients, such as their age, marital status,
education, health condition, physical environment or social circumstances,
and how and where you conduct your observations, interviews and group
discussions. By giving your readers a ‘thick description’ of the participants’
contexts you render their behaviour, experiences, perceptions and feelings
meaningful. Moreover, you enable your readers to consider whether and
how the findings of your study can be transferred to their contexts.

RESEARCH QUESTIONS

Why should the research question be broad and open?


To enable a thorough in-depth description, exploration or explanation of the
phenomenon under study, in general, research questions need to be broad and
open to unexpected findings. Within more in-depth research, for example,
during building theory in a grounded theory design, the research question
might be more focused. Where quantitative research asks: ‘how many, how
much, and how often?’ qualitative research would ask: ‘what?’ and even
more ‘how, and why?’ Depending on the research process, you might feel a
need for fine-tuning or additional questions. This is common in qualitative
research as it works with ‘emerging design,’ which means that it is not
possible to plan the research in detail at the start, as the researchers have
to be responsive to what they find as the research proceeds. This flexibility
within the design is seen as a strength in qualitative research but only within
an overall coherent methodology.

What kind of literature would I search for when preparing a qualitative study?
You would search for literature that can provide you with insights into the
current state of knowledge and the knowledge gap that your study might
address (Box 1). You might look for original quantitative, mixed-method
and qualitative studies, or reviews such as quantitative meta-analyses or
qualitative meta-syntheses. These findings would give you a picture of the
empirical knowledge gap and the qualitative research questions that might
lead to relevant and new insights and useful theories, models or concepts for
studying your topic. When little knowledge is available, a qualitative study
can be a useful starting point for subsequent studies. If in preparing your
qualitative study, you cannot find sufficient literature about your topic, you
might turn to proxy literature to explore the landscape around your topic.
For example, when you are one of the very first researchers to study shared
decision-making or health literacy in maternity care for disadvantaged
parents-to-be, you might search for existing literature on these topics in
other healthcare settings, such as general practice.

Box 1. Searching the literature for qualitative studies: the SPIDER tool.
Based on Cooke et al. [3].
S Sample: qualitative research uses smaller samples, as findings are not intended to be
generalized to the general population.
PI Phenomenon of Interest: qualitative research examines how and why certain experi-
ences, behaviours and decisions occur (in contrast to the effectiveness of intervention).
D Design: refers to the theoretical framework and the corresponding method used, which
influence the robustness of the analysis and findings.
E Evaluation: evaluation outcomes may include more subjective outcomes (views, at-
titudes, perspectives, experiences, etc.).
R Research type: qualitative, quantitative and mixed-methods research could be
searched for.

Why do qualitative researchers prefer SPIDER to PICO?


The SPIDER tool (sample-phenomenon of interest-design-evaluation-
research type) (Box 1) is one of the available tools for qualitative literature
searches [3]. It has been specifically developed for qualitative evidence
synthesis, making it more suitable than PICO (population-intervention-
comparison-outcome) in searching for qualitative studies that focus on
understanding real-life experiences and processes of a variety of participants.
PICO is primarily a tool for collecting evidence from published quantitative
research on prognoses, diagnoses and therapies. Quantitative studies mostly
use larger samples, comparing intervention and control groups, focusing on
quantification of predefined outcomes at group level that can be generalized
to larger populations. In contrast, qualitative research studies smaller
samples in greater depth; it strives to minimize manipulation of participants’ natural
settings and is open to rich and unexpected findings. To suit this approach,
the SPIDER tool was developed by adapting the PICO tool. Although these
tools are meant for searching the literature, they can also be helpful in
formulating research questions for original studies. Using SPIDER might
support you in formulating a broad and open qualitative research question.
An example of a SPIDER-type question for a qualitative study using
interviews is: ‘What are young parents’ experiences of attending antenatal
education?’ The abstract and introduction of a manuscript might contain
this broad and open research question, after which the methods section
provides further operationalization of the elements of the SPIDER tool,
such as (S) young mothers and fathers, aged 17–27 years, 1–12 months after
childbirth, low to high educational levels, in urban or semi-urban regions;
(PI) experiences of antenatal education in group sessions during pregnancy
guided by maternity care professionals; (D) phenomenology, interviews; (E) perceived benefits and costs, psychosocial and peer support received,
changes in attitude, expectations, and perceived skills regarding healthy
lifestyle, childbirth, parenthood, etc.; and (R) qualitative.

Is it normal that my research question seems to change during the study?
During the research process, the research question might change to a certain
degree because data collection and analysis sharpen the researcher’s lenses. Data collection and analysis are iterative processes that happen
simultaneously as the research progresses. This might lead to a somewhat
different focus of your research question and to additional questions.
However, you cannot radically change your research question because that
would mean you were performing a different study. In the methods section,
you need to describe how and explain why the original research question
was changed.
For example, let us return to the problem that GPs are hesitant to ask
about intimate partner violence despite their awareness of family violence.
To design a qualitative study, you might use SPIDER to support you in
formulating your research question. You purposefully sample GPs, varying
in age, gender, years of experience and type of practice (S-1). You might
also decide to sample patients, in a variety of life situations, who have
been faced with the problem (S-2). You clarify the phenomenon of family
violence, which might be broadly defined when you design your study—e.g.
family abuse and violence (PI-1). However, as your study evolves you might
feel the need for fine-tuning—e.g. asking about intimate partner violence
(PI-2). You describe the design, for instance, a phenomenological study
using interviews (D), as well as the ‘think, feel or do’ elements you want to
evaluate in your qualitative research. Depending on what is already known
and the aim of your research, you might choose to describe actual behaviour
and experiences (E-1) or explore attitudes and perspectives (E-2). Then, as
your study progresses, you also might want to explain communication and
follow-up processes (E-3) in your qualitative research (R).
Each of your choices will be a trade-off between the intended variety,
depth and richness of your findings and the required samples, methods,
techniques and efforts for data collection and analyses. These choices lead
to different research questions, for example:
• ‘What are GPs’ and patients’ attitudes and perspectives towards
discussing family abuse and violence?’ Or:
• ‘How do GPs behave during the communication and follow-
up process when a patient’s signals suggest intimate partner
violence?’

DESIGNING QUALITATIVE STUDIES

How do I choose a qualitative design?


As in quantitative research, you base the choice of a qualitative design
primarily on the nature of the research problem, the research question and the
scientific knowledge you seek. Therefore, instead of simply choosing what
seems easy or interesting, it is wiser to first consider and discuss with other
qualitative researchers the pros and cons of different designs for your study.
Then, depending on your skills and your knowledge and understanding of
qualitative methodology and your research topic, you might seek training
or support from other qualitative researchers. Finally, just as in quantitative
research, the resources and time available and your access to the study
settings and participants also influence the choices you make in designing
the study.

What are the most important qualitative designs?


Ethnography [4], phenomenology [5], and grounded theory [6] are
considered the ‘big three’ qualitative approaches [7] (Box 2). Box 2 shows
that they stem from different theoretical disciplines and are used in various
domains focusing on different areas of inquiry. Furthermore, qualitative
research has a rich tradition of various designs [2]. Box 3 presents other
qualitative approaches such as case studies [8], conversation analysis [9],
narrative research [10], hermeneutic research [11], historical research [12],
participatory action research [13], community-based participatory research
[14], and research based on critical social theory [15], for example, feminist
research or empowerment evaluation [16]. Some researchers do not mention
a specific qualitative approach or research tradition but use a descriptive
generic approach [17] or say that they used thematic analysis or content
analysis, an analysis of themes and patterns that emerge in the narrative


content from a qualitative study [2]. This form of data analysis will be
addressed in Part 3 of our series.

Box 2. The 'big three' approaches in qualitative study design. Based on Polit and Beck [2].

Ethnography
Definition: A branch of human enquiry, associated with anthropology, that focuses on the culture of a group of people, with an effort to understand the world view of those under study.
Discipline: Anthropology
Domain: Culture
Area of inquiry: Holistic view of a culture.
Focus: Understanding the meanings and behaviours associated with membership of groups, teams, etc.

Phenomenology
Definition: A qualitative research tradition, with roots in philosophy and psychology, that focuses on the lived experience of humans.
Discipline: Psychology, philosophy
Domain: Lived experience
Area of inquiry: Experiences of individuals within their experiential world or 'life-world'.
Focus: Exploring how individuals make sense of the world to provide insightful accounts of their subjective experience.

Grounded theory
Definition: A qualitative research methodology, with roots in sociology, that aims to develop theories grounded in real-world observations.
Discipline: Sociology
Domain: Social settings
Area of inquiry: Social structural process within a social setting.
Focus: Building theories about social phenomena.

Box 3. Definitions of other qualitative research approaches. Based on Polit and Beck [2].

Case study: A research method involving a thorough, in-depth analysis of an individual, group or other social unit.
Conversation analysis: A form of discourse analysis, a qualitative tradition from the discipline of sociolinguistics that seeks to understand the rules, mechanisms and structure of conversations.
Critical social theory: An approach to viewing the world that involves a critique of society, with the goal of envisioning new possibilities and effecting social change.
Feminist research: Research that seeks to understand, typically through qualitative approaches, how gender and a gendered social order shape women's lives and their consciousness.
Hermeneutics: A qualitative research tradition, drawing on interpretative phenomenology, that focuses on the lived experience of humans and how they interpret those experiences.
Historical research: Systematic studies designed to discover facts and relationships about past events.
Narrative research: A narrative approach that focuses on the story as the object of the inquiry.
Participatory action research: A collaborative research approach between researchers and participants, based on the premise that the production of knowledge can be political and used to exert power.
Community-based participatory research: A research approach that enlists those who are most affected by a community issue, typically in collaboration or partnership with others who have research skills, to conduct research on and analyse that issue, with the goal of resolving it.
Content analysis: The process of organizing and integrating material from documents, often narrative information from a qualitative study, according to key concepts and themes.

Depending on your research question, you might choose one of the 'big three' designs

Let us assume that you want to study the caring relationship in palliative care
in a primary care setting for people with COPD. If you are interested in the
care provided by family caregivers from different ethnic backgrounds, you
will want to investigate their experiences. Your research question might be
‘What constitutes the caring relationship between GPs and family caregivers
in the palliative care for people with COPD among family caregivers of
Moroccan, Syrian, and Iranian ethnicity?’ Since you are interested in the
caring relationship within cultural groups or subgroups, you might choose
ethnography. Ethnography is the study of culture within a society, focusing
on one or more groups. Data are collected mostly through observations,
informal (ethnographic) conversations, interviews and/or artefacts. The
findings are presented in a lengthy monograph where concepts and patterns
are presented in a holistic way using context-rich description.
If you are interested in the experiential world or ‘life-world’ of the
family caregivers and the impact of caregiving on their own lives, your
research question might be ‘What is the lived experience of being a family
caregiver for a family member with COPD whose end is near?’ In such a
case, you might choose phenomenology, in which data are collected through
in-depth interviews. The findings are presented in detailed descriptions of
participants’ experiences, grouped in themes.
If you want to study the interaction between GPs and family caregivers
to generate a theory of ‘trust’ within caring relationships, your research
question might be ‘How does a relationship of trust between GPs and family
caregivers evolve in end-of-life care for people with COPD?’ Grounded
theory might then be the design of first choice. In this approach, data
are collected mostly through in-depth interviews, but may also include
observations of encounters, followed by interviews with those who were
observed. The findings presented consist of a theory, including a basic social
process and relevant concepts and categories.
If you merely aim to give a qualitative description of the views of family
caregivers about facilitators and barriers to contacting GPs, you might use
content analysis and present the themes and subthemes you found.

What is the role of theory in qualitative research?


The role of theory is to guide you through the research process. Theory
supports formulating the research question, guides data collection and
analysis, and offers possible explanations of underlying causes of or
influences on phenomena. From the start of your research, theory provides
you with a ‘lens’ to look at the phenomenon under study. During your study,
this ‘theoretical lens’ helps to focus your attention on specific aspects of the
data and provides you with a conceptual model or framework for analysing
them. It supports you in moving beyond the individual ‘stories’ of the
participants. This leads to a broader understanding of the phenomenon of
study and a wider applicability and transferability of the findings, which
might help you formulate new theory, or advance a model or framework.
Note that research does not always need to be theory-based, for example,
in a descriptive study, interviewing people about perceived facilitators and
barriers for adopting new behaviour.

What is my role as a researcher?


As a qualitative researcher, you influence the research process. Qualitative
researchers and the study participants always interact in a social process. You
build a relationship during data collection, whether short-term in an interview
or long-term during observations or longitudinal studies. This
influences the research process and its findings, which is why your report
needs to be transparent about your perspective and explicitly acknowledge
your subjectivity. Your role as a qualitative researcher requires empathy as
well as distance. By empathy, we mean that you can put yourself into the
participants’ situation. Empathy is needed to establish a trusting relationship


but might also bring about emotional distress. By distance, we mean that
you need to be aware of your values, which influence your data collection,
and that you have to be non-judgemental and non-directive.
There is always a power difference between the researcher and
participants. Feminist researchers, especially, acknowledge that the research
is done by, for, and about women, with a focus on gender domination
and discrimination. As a feminist researcher, you would try to establish a
trustworthy and non-exploitative relationship and place yourself within the
study to avoid objectification. Feminist research is transformative, aiming to
change oppressive structures for women [16].

What ethical issues do I need to consider?


Although qualitative researchers do not aim to intervene, their interaction
with participants requires careful adherence to the statement of ethical
principles for medical research involving human subjects as laid down
in the Declaration of Helsinki [18]. It states that healthcare professionals
involved in medical research are obliged to protect the life, health, dignity,
integrity, right to self-determination, privacy and confidentiality of personal
information of research subjects. The Declaration also warrants that all
vulnerable groups and individuals should receive specifically considered
protection. This is also relevant when working in contexts of low-income
countries and poverty. Furthermore, researchers must consider the ethical,
legal and regulatory norms and standards in their own countries, as well as
applicable international norms and standards. You might contact your local
Medical Ethics Committee before setting up your study. In some countries,
Medical Ethics Committees do not review qualitative research [2]. In that
case, you will have to adhere to the Declaration of Helsinki [18], and you
might seek approval from a research committee at your institution or the
board of your institution.
In qualitative research, you have to ensure anonymity by code numbering
the tapes and transcripts and removing any identifying information from
the transcripts. When you work with transcription offices, they will need to
sign a confidentiality agreement. Even though the quotes from participants
in your manuscripts are anonymized, you cannot always guarantee full
confidentiality. Therefore, you might ask participants for special permission to
use these quotes in scientific publications.
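Part of this anonymization work can be scripted. The following is a minimal sketch (Python; the names, codes and transcript line are invented for illustration, and it is not a complete de-identification tool): it replaces known identifying names with participant codes, after which you would still re-read every transcript for indirect identifiers such as workplaces or place names.

import re

# Invented example data: map identifying names to neutral participant codes.
name_to_code = {
    "John Smith": "P01",
    "Mary": "P02",
}

def pseudonymize(transcript: str, mapping: dict[str, str]) -> str:
    """Replace each known name with its code, longest names first,
    so that full names are replaced before partial matches."""
    for name in sorted(mapping, key=len, reverse=True):
        transcript = re.sub(re.escape(name), mapping[name], transcript)
    return transcript

text = "Interviewer: Mary, you mentioned that John Smith referred you?"
print(pseudonymize(text, name_to_code))
# Interviewer: P02, you mentioned that P01 referred you?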
The next article in this Series on qualitative research, Part 3, will focus
on sampling, data collection, and analysis [19]. In the final article, Part 4, we
address two overarching themes: trustworthiness and publishing [20].

ACKNOWLEDGEMENTS
The authors thank the following junior researchers who have been
participating for the last few years in the so-called ‘Think tank on qualitative
research’ project, a collaborative project between Zuyd University of Applied
Sciences and Maastricht University, for their pertinent questions: Erica
Baarends, Jerome van Dongen, Jolanda Friesen-Storms, Steffy Lenzen,
Ankie Hoefnagels, Barbara Piskur, Claudia van Putten-Gamel, Wilma
Savelberg, Steffy Stans, and Anita Stevens. The authors are grateful to Isabel
van Helmond, Joyce Molenaar and Darcy Ummels for proofreading our
manuscripts and providing valuable feedback from the ‘novice perspective’.
REFERENCES
1. Moser A, Korstjens I. Series: Practical guidance to qualitative research. Part 1: Introduction. Eur J Gen Pract. 2017;23:271–273.
2. Polit DF, Beck CT. Nursing research: generating and assessing evidence for nursing practice. 10th ed. Philadelphia (PA): Lippincott, Williams & Wilkins; 2017.
3. Cooke A, Smith D, Booth A. Beyond PICO: the SPIDER tool for qualitative evidence synthesis. Qual Health Res. 2012;22:1435–1443.
4. Atkinson P, Coffey A, Delamount S, et al. Handbook of ethnography. Thousand Oaks (CA): Sage; 2001.
5. Smith JA, Flowers P, Larkin M. Interpretative phenomenological analysis: theory, method and research. London (UK): Sage; 2010.
6. Charmaz K. Constructing grounded theory. 2nd ed. Thousand Oaks (CA): Sage; 2014.
7. Creswell JW. Qualitative research design: choosing among five approaches. 3rd ed. Los Angeles (CA): Sage; 2013.
8. Yin R. Case study research: design and methods. 5th ed. Thousand Oaks (CA): Sage; 2014.
9. ten Have P. Doing conversation analysis. 2nd ed. London (UK): Sage; 2007.
10. Riessman CK. Narrative methods for the human sciences. Thousand Oaks (CA): Sage; 2008.
11. Fleming V, Gaidys U, Robb Y. Hermeneutic research in nursing: developing a Gadamerian-based research method. Nurs Inq. 2003;10:113–120.
12. Lundy KS. Historical research. In: Munhall PL, editor. Nursing research: a qualitative perspective. 5th ed. Sudbury (MA): Jones & Bartlett; 2012. p. 381–398.
13. Koch T, Kralik D. Participatory action research in health care. Oxford (UK): Blackwell; 2006.
14. Minkler M, Wallerstein N, editors. Community-based participatory research for health. San Francisco (CA): Jossey-Bass Publishers; 2003.
15. Dant T. Critical social theory: culture, society and critique. London (UK): Sage; 2004.
16. Hesse-Biber S, editor. Feminist research practice: a primer. Thousand Oaks (CA): Sage; 2014.
17. Sandelowski M. Whatever happened to qualitative description? Res Nurs Health. 2000;23:334–340.
18. World Medical Association. Declaration of Helsinki: ethical principles for medical research involving human subjects. 2013 [Internet]; [cited 2017 Aug 9]. Available from: https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/
19. Moser A, Korstjens I. Series: Practical guidance to qualitative research. Part 3: Sampling, data collection and analysis. Eur J Gen Pract. 2018;24. DOI: 10.1080/13814788.2017.1375091.
20. Korstjens I, Moser A. Series: Practical guidance to qualitative research. Part 4: Trustworthiness and publishing. Eur J Gen Pract. 2018;24. DOI: 10.1080/13814788.2017.1375092.
Chapter 3

SERIES: PRACTICAL GUIDANCE TO QUALITATIVE RESEARCH. PART 3: SAMPLING, DATA COLLECTION AND ANALYSIS

Albine Moser (a, b) and Irene Korstjens (c)

(a) Faculty of Health Care, Research Centre Autonomy and Participation of Chronically Ill People, Zuyd University of Applied Sciences, Heerlen, The Netherlands
(b) Faculty of Health, Medicine and Life Sciences, Department of Family Medicine, Maastricht University, Maastricht, The Netherlands
(c) Faculty of Health Care, Research Centre for Midwifery Science, Zuyd University of Applied Sciences, Maastricht, The Netherlands

ABSTRACT
In the course of our supervisory work over the years, we have noticed that
qualitative research tends to evoke a lot of questions and worries, so-called
frequently asked questions (FAQs). This series of four articles intends to
provide novice researchers with practical guidance for conducting high-

Citation: (APA): Moser, A., & Korstjens, I. (2018). Series: Practical guidance to quali-
tative research. Part 3: Sampling, data collection and analysis. European journal of gen-
eral practice, 24(1), 9-18. (11 pages)
Copyright: © 2018 The Author(s). Published by Informa UK Limited, trading as Taylor
& Francis Group. This is an Open Access article distributed under the terms of the
Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/).
quality qualitative research in primary care. By ‘novice’ we mean Master’s


students and junior researchers, as well as experienced quantitative
researchers who are engaging in qualitative research for the first time. This
series addresses their questions and provides researchers, readers, reviewers
and editors with references to criteria and tools for judging the quality of
qualitative research papers. The second article focused on context, research
questions and designs, and referred to publications for further reading. This
third article addresses FAQs about sampling, data collection and analysis.
The data collection plan needs to be broadly defined and open at first, and
become flexible during data collection. Sampling strategies should be chosen
in such a way that they yield rich information and are consistent with the
methodological approach used. Data saturation determines sample size and
will be different for each study. The most commonly used data collection
methods are participant observation, face-to-face in-depth interviews and
focus group discussions. Analyses in ethnographic, phenomenological,
grounded theory, and content analysis studies yield different narrative
findings: a detailed description of a culture, the essence of the lived
experience, a theory, and a descriptive summary, respectively. The fourth
and final article will focus on trustworthiness and publishing qualitative
research.

Keywords: General practice/family medicine, general qualitative designs


and methods, sampling, data collection, analysis
Key points on sampling, data collection and analysis
• The data collection plan needs to be broadly defined and open
during data collection.
• Sampling strategies should be chosen in such a way that they
yield rich information and are consistent with the methodological
approach used.
• Data saturation determines sample size and is different for each
study.
• The most commonly used data collection methods are participant
observation, face-to-face in-depth interviews and focus group
discussions.
• Analyses of ethnographic, phenomenological, grounded theory,
and content analysis studies yield different narrative findings:
a detailed description of a culture, the essence of the lived
experience, a theory or a descriptive summary, respectively.
INTRODUCTION
This article is the third paper in a series of four articles aiming to provide
practical guidance to qualitative research. In an introductory paper, we
have described the objective, nature and outline of the Series [1]. Part 2 of
the series focused on context, research questions and design of qualitative
research [2]. In this paper, Part 3, we address frequently asked questions
(FAQs) about sampling, data collection and analysis.

SAMPLING

What is a sampling plan?


A sampling plan is a formal plan specifying a sampling method, a sample
size, and procedure for recruiting participants (Box 1) [3]. A qualitative
sampling plan describes how many observations, interviews, focus-group
discussions or cases are needed to ensure that the findings will contribute
rich data. In quantitative studies, the sampling plan, including sample size,
is determined in detail beforehand, but qualitative research projects start
with a broadly defined sampling plan. This plan enables you to include a
variety of settings and situations and a variety of participants, including
negative cases or extreme cases to obtain rich data. The key features of
a qualitative sampling plan are as follows. First, participants are always
sampled deliberately. Second, sample size differs for each study and is small.
Third, the sample will emerge during the study: based on further questions
raised in the process of data collection and analysis, inclusion and exclusion
criteria might be altered, or the sampling sites might be changed. Finally,
the sample is determined by conceptual requirements and not primarily
by representativeness. You, therefore, need to provide a description of
and rationale for your choices in the sampling plan. The sampling plan is
appropriate when the selected participants and settings are sufficient to
provide the information needed for a full understanding of the phenomenon
under study.
Box 1. Sampling strategies in qualitative research. Based on Polit & Beck [3].

Purposive sampling: Selection of participants based on the researchers' judgement about which potential participants will be most informative.
Criterion sampling: Selection of participants who meet pre-determined criteria of importance.
Theoretical sampling: Selection of participants based on the emerging findings to ensure adequate representation of theoretical concepts.
Convenience sampling: Selection of participants who are easily available.
Snowball sampling: Selection of participants through referrals by previously selected participants or persons who have access to potential participants.
Maximum variation sampling: Selection of participants based on a wide range of variation in backgrounds.
Extreme case sampling: Purposeful selection of the most unusual cases.
Typical case sampling: Selection of the most typical or average participants.
Confirming and disconfirming sampling: Sampling of confirming and disconfirming cases to support checking or challenging emerging trends or patterns in the data.

Some practicalities: a critical first step is to select settings and situations


where you have access to potential participants. Subsequently, the best strategy
to apply is to recruit participants who can provide the richest information.
Such participants have to be knowledgeable about the phenomenon, able to
articulate and reflect, and motivated to communicate at length and in
depth with you. Finally, you should review the sampling plan regularly and
adapt it when necessary.

What sampling strategies can I use?


Sampling is the process of selecting or searching for situations, context and/
or participants who provide rich data of the phenomenon of interest [3].
In qualitative research, you sample deliberately, not at random. The most
commonly used deliberate sampling strategies are purposive sampling,
criterion sampling, theoretical sampling, convenience sampling and
snowball sampling. Occasionally, the ‘maximum variation,’ ‘typical cases’
and ‘confirming and disconfirming’ sampling strategies are used. Key
informants need to be carefully chosen. Key informants hold special and
expert knowledge about the phenomenon to be studied and are willing to
share information and insights with you as the researcher [3]. They also
help to gain access to participants, especially when groups are studied. In
addition, as researcher, you can validate your ideas and perceptions with
those of the key informants.
What is the connection between sampling types and qualitative designs?

The ‘big three’ approaches of ethnography, phenomenology, and grounded
theory use different types of sampling.
In ethnography, the main strategy is purposive sampling of a variety
of key informants, who are most knowledgeable about a culture and are
able and willing to act as representatives in revealing and interpreting the
culture. For example, an ethnographic study on the cultural influences of
communication in maternity care will recruit key informants from among
a variety of parents-to-be, midwives and obstetricians in midwifery care
practices and hospitals.
Phenomenology uses criterion sampling, in which participants meet
predefined criteria. The most prominent criterion is the participant’s
experience with the phenomenon under study. The researchers look for
participants who have shared an experience, but vary in characteristics and
in their individual experiences. For example, a phenomenological study on
the lived experiences of pregnant women with psychosocial support from
primary care midwives will recruit pregnant women varying in age, parity
and educational level in primary midwifery practices.
Grounded theory usually starts with purposive sampling and later uses
theoretical sampling to select participants who can best contribute to the
developing theory. As theory construction takes place concurrently with
data collection and analyses, the theoretical sampling of new participants
also occurs along with the emerging theoretical concepts. For example,
one grounded theory study tested several theoretical constructs to build
a theory on autonomy in diabetes patients [4]. In developing the theory,
the researchers started by purposefully sampling participants with diabetes
differing in age, onset of diabetes and social roles, for example, employees,
housewives, and retired people. After the first analysis, the researchers continued
with theoretical sampling, for example, of participants who differed in the
treatment they received, in their degree of care dependency, and in whether
they received care from a general practitioner (GP), at a hospital or from a
specialist nurse, etc.
In addition to the ‘big three’ approaches, content analysis is frequently
applied in primary care research, and very often uses purposive, convenience,
or snowball sampling. For instance, a study on people's choice of a hospital
for elective orthopaedic surgery used snowball sampling [5]. One elderly
person in the private network of one researcher personally approached
potential respondents in her social network by means of personal invitations


(including letters). In turn, respondents were asked to pass on the invitation
to other eligible candidates.
Sampling is also dependent on the characteristics of the setting, e.g.,
access, time, vulnerability of participants, and different types of stakeholders.
The setting, where sampling is carried out, is described in detail to provide
thick description of the context, thereby enabling the reader to make a
transferability judgement (see Part 4: transferability). Sampling also affects
the data analysis, where you continue decision-making about whom or what
situations to sample next. This is based on what you consider as still missing
to get the necessary information for rich findings (see Part 1: emergent
design). Another point of attention is the sampling of ‘invisible groups’ or
vulnerable people. Sampling these participants would require applying
multiple sampling strategies, and allowing more time for sampling and
recruitment in the project planning stage [6].

How do sample size and data saturation interact?


A guiding principle in qualitative research is to sample only until data
saturation has been achieved. Data saturation means the collection of
qualitative data to the point where a sense of closure is attained because new
data yield redundant information [3].
Data saturation is reached when no new analytical information arises, and
the study provides maximum information on the phenomenon.
In quantitative research, by contrast, the sample size is determined by a
power calculation. The usually small sample size in qualitative research
depends on the information richness of the data, the variety of participants
(or other units), the broadness of the research question and the phenomenon,
the data collection method (e.g., individual or group interviews) and the type
of sampling strategy. Mostly, you and your research team will jointly decide
when data saturation has been reached, and hence whether the sampling can
be ended and the sample size is sufficient. The most important criterion is
the availability of enough in-depth data showing the patterns, categories and
variety of the phenomenon under study. You review the analysis, findings,
and the quality of the participant quotes you have collected, and then decide
whether sampling might be ended because of data saturation. In many cases,
you will choose to carry out two or three more observations or interviews
or an additional focus group discussion to confirm that data saturation has
been reached.
When designing a qualitative sampling plan, we (the authors) work


with estimates. We estimate that ethnographic research should require
25–50 interviews and observations, including about four-to-six focus
group discussions, while phenomenological studies require fewer than 10
interviews, grounded theory studies 20–30 interviews and content analysis
15–20 interviews or three-to-four focus group discussions. However,
these numbers are very tentative and should be very carefully considered
before using them. Furthermore, qualitative designs do not always mean
small sample numbers. Bigger sample sizes might occur, for example, in
content analysis, employing rapid qualitative approaches, and in large or
longitudinal qualitative studies.

DATA COLLECTION

What methods of data collection are appropriate?


The most frequently used data collection methods are participant observation,
interviews, and focus group discussions. Participant observation is a method
of data collection through the participation in and observation of a group or
individuals over an extended period of time [3]. Interviews are another data
collection method in which an interviewer asks the respondents questions
[6], face-to-face, by telephone or online. The qualitative research interview
seeks to describe the meanings of central themes in the life world of the
participants. The main task in interviewing is to understand the meaning
of what participants say [5]. Focus group discussions are a data collection
method with a small group of people to discuss a given topic, usually guided
by a moderator using a questioning route [8]. It is common in qualitative
research to combine more than one data collection method in one study. You
should always choose your data collection method wisely. Data collection in
qualitative research is unstructured and flexible. You often make decisions
on data collection while engaging in fieldwork, the guiding questions being
with whom, what, when, where and how. The most basic or ‘light’ version
of qualitative data collection is that of open questions in surveys. Box 2
provides an overview of the ‘big three’ qualitative approaches and their
most commonly used data collection methods.
Box 2. Qualitative data collection methods.

Participant observation
Definition: Participation in and observation of people or groups.
Aim: To obtain a close and intimate familiarity with a given group of individuals and their practices through intensive involvement with people in their environment, usually over an extended period.
Use: suitable in ethnography; very rare in phenomenology; sometimes in grounded theory.

Face-to-face in-depth interviews
Definition: A conversation where the researcher poses questions and the participants provide answers face-to-face, by telephone or via mail.
Aim: To elicit the participant's experiences, perceptions, thoughts and feelings.
Use: suitable in ethnography, phenomenology, grounded theory and content analysis.

Focus group discussion
Definition: Interview with a group of participants to answer questions on a specific topic face-to-face or via mail; people who participate interact with each other.
Aim: To examine different experiences, perceptions, thoughts and feelings among various participants or parties.
Use: suitable in ethnography; sometimes in grounded theory; suitable in content analysis.

What role should I adopt when conducting participant observations?

What is important is to immerse yourself in the research setting, to enable you
to study it from the inside. There are four types of researcher involvement
in observations, and in your qualitative study, you may apply all four. In
the first type, as ‘complete participant’, you become part of the setting and
play an insider role, just as you do in your own work setting. This role
might be appropriate when studying persons who are difficult to access. The
second type is ‘active participation’. You have gained access to a particular
setting and observed the group under study. You can move around at will
and can observe in detail and depth and in different situations. The third role
is ‘moderate participation’. You do not actually work in the setting you wish
to study but are located there as a researcher. You might adopt this role when
you are not affiliated to the care setting you wish to study. The fourth role
is that of the ‘complete observer’, in which you merely observe (bystander
role) and do not participate in the setting at all. However, you cannot perform
any observations without access to the care setting. Such access might be
easily obtained when you collect data by observations in your own primary
care setting. In some cases, you might observe other care settings, which are
relevant to primary care, for instance observing the discharge procedure for
vulnerable elderly people from hospital to primary care.

How do I perform observations?


It is important to decide what to focus on in each individual observation.
The focus of observations is important because you can never observe
everything, and you can only observe each situation once. Your focus might
differ between observations. Each observation should provide you with
answers regarding ‘Who do you observe?’, ‘What do you observe’, ‘Where
does the observation take place?’, ‘When does it take place?’, ‘How does it
happen?’, and ‘Why does it happen as it happens?’ Observations are not static
but proceed in three stages: descriptive, focused, and selective. Descriptive
means that you observe, on the basis of general questions, everything that
goes on in the setting. Focused observation means that you observe certain
situations for some time, with some areas becoming more prominent.
Selective means that you observe highly specific issues only. For example,
if you want to observe the discharge procedure for vulnerable elderly people
from hospitals to general practice, you might begin with broad observations
to get to know the general procedure. This might involve observing several
different patient situations. You might find that the involvement of primary
care nurses deserves special attention, so you might then focus on the roles
of hospital staff and primary care nurses, and their interactions. Finally,
you might want to observe only the specific situations where hospital staff
and primary care nurses exchange information. You take field notes from
all these observations and add your own reflections on the situations you
observed. You jot down words, whole sentences or parts of situations, and
your reflections on a piece of paper. After the observations, the field notes
need to be worked out and transcribed immediately to be able to include
detailed descriptions.

Box 3. Further reading on interviews and focus group discussion.

Box 4. Qualitative data analysis.

What are the general features of an interview?


Interviews involve interactions between the interviewer(s) and the
respondent(s) based on interview questions. Individual, or face-to-face,
interviews should be distinguished from focus group discussions. The
interview questions are written down in an interview guide [7] for individual
interviews or a questioning route [8] for focus group discussions, with
questions focusing on the phenomenon under study. The sequence of the
questions is pre-determined, although in individual interviews the sequence
followed depends on the respondents and how the interviews unfold. During the interview, as
the conversation evolves, you go back and forth through the sequence of
questions. It should be a dialogue, not a strict question–answer interview.
In a focus group discussion, the sequence is intended to facilitate the
interaction between the participants, and you might adapt the sequence
depending on how their discussion evolves. Working with an interview
guide or questioning route enables you to collect information on specific


topics from all participants. You are in control in the sense that you give
direction to the interview, while the participants are in control of their
answers. However, you need to be open-minded to recognize that some
relevant topics for participants may not have been covered in your interview
guide or questioning route, and need to be added. During the data collection
process, you develop the interview guide or questioning route further and
revise it based on the analysis.
The interview guide and questioning route might include open and
general as well as subordinate or detailed questions, probes and prompts.
Probes are exploratory questions, for example, ‘Can you tell me more about
this?’ or ‘Then what happened?’ Prompts are words and signs to encourage
participants to tell more. Examples of stimulating prompts are eye contact,
leaning forward and open body language.

Box 5. Further reading on qualitative analysis.

Ethnography
• Atkinson P, Coffey A, Delamount S, Lofland J, Lofland L. Handbook of ethnography. Thousand Oaks (CA): Sage; 2001.
• Spradley J. The ethnographic interview. New York (NY): Holt, Rinehart & Winston; 1979.
• Spradley J. Participant observation. New York (NY): Holt, Rinehart & Winston; 1980.

Phenomenology
• Colaizzi PF. Psychological research as the phenomenologist views it. In: Valle R, King M, editors. Existential phenomenological alternatives for psychology. New York (NY): Oxford University Press; 1978. p. 41–78.
• Smith JA, Flowers P, Larkin M. Interpretative phenomenological analysis: theory, method and research. London (UK): Sage; 2010.

Grounded theory
• Charmaz K. Constructing grounded theory. 2nd ed. Thousand Oaks (CA): Sage; 2014.
• Corbin J, Strauss A. Basics of qualitative research: techniques and procedures for developing grounded theory. Los Angeles (CA): Sage; 2008.

Content analysis
• Elo S, Kääriäinen M, Kanste O, Pölkki T, Utriainen K, Kyngäs H. Qualitative content analysis: a focus on trustworthiness. Sage Open. 2014:1–10. DOI: 10.1177/2158244014522633.
• Elo S, Kyngäs H. The qualitative content analysis process. J Adv Nurs. 2008;62:107–115.
• Hsieh HF, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15:1277–1288.
What is a face-to-face interview?


A face-to-face interview is an individual interview, that is, a conversation
between participant and interviewer. Interviews can focus on past or present
situations, and on personal issues. Most qualitative studies start with open
interviews to get a broad ‘picture’ of what is going on. You should not
provide a great deal of guidance and avoid influencing the answers to fit
‘your’ point of view, as you want to obtain the participant’s own experiences,
perceptions, thoughts, and feelings. You should encourage the participants
to speak freely. As the interview evolves, your subsequent major and
subordinate questions become more focused. A face-to-face or individual
interview might last between 30 and 90 min.
Most interviews are semi-structured [3]. To prepare an interview guide
that ensures a set of topics is covered with every participant, you
might use a framework for constructing a semi-structured interview guide
[9]: (1) identify the prerequisites for using a semi-structured interview and
evaluate if a semi-structured interview is the appropriate data collection
method; (2) retrieve and utilize previous knowledge to gain a comprehensive
and adequate understanding of the phenomenon under study; (3) formulate
a preliminary interview guide by operationalizing the previous knowledge;
(4) pilot-test the preliminary interview guide to confirm the coverage and
relevance of the content and to identify the need for reformulation of
questions; (5) complete the interview guide to collect rich data with a clear
and logical guide.
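An interview guide can also be kept as a small structured file, which makes revisions during data collection easy to track. A minimal sketch (Python; the main questions are invented examples, and the probe wordings are those quoted earlier in this section):

# Invented example: a semi-structured interview guide kept as data.
interview_guide = [
    {
        "question": "Can you describe your most recent visit to your GP?",
        "probes": ["Can you tell me more about this?", "Then what happened?"],
    },
    {
        "question": "How did you feel during that conversation?",
        "probes": ["Can you tell me more about this?"],
    },
]

for item in interview_guide:
    print(item["question"])
    for probe in item["probes"]:
        print("  probe:", probe)

Storing the guide as data rather than as free text makes it easier to document, after the pilot test, which questions were reformulated and why.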
The first few minutes of an interview are decisive. The participant wants
to feel at ease before sharing his or her experiences. In a semi-structured
interview, you would start with open questions related to the topic, which
invite the participant to talk freely. The questions aim to encourage
participants to tell their personal experiences, including feelings and
emotions and often focus on a particular experience or specific events. As
you want to get as much detail as possible, you also ask follow-up questions
or encourage telling more details by using probes and prompts or keeping
a short period of silence [6]. You first ask what and why questions and then
how questions.
You need to be prepared for handling problems you might encounter, such
as gaining access, dealing with multiple formal and informal gatekeepers,
negotiating space and privacy for recording data, socially desirable answers
from participants, reluctance of participants to tell their story, deciding on
the appropriate role (emotional involvement), and exiting from fieldwork


prematurely.

What is a focus group discussion and when can I use it?


A focus group discussion is a way to gather together people to discuss a
specific topic of interest. The people participating in the focus group
discussion share certain characteristics, e.g., professional background, or
share similar experiences, e.g., having diabetes. You use their interaction
to collect the information you need on a particular topic. To what depth
of information the discussion goes depends on the extent to which focus
group participants can stimulate each other in discussing and sharing
their views and experiences. Focus group participants respond to you and
to each other. Focus group discussions are often used to explore patients’
experiences of their condition and interactions with health professionals,
to evaluate programmes and treatment, to gain an understanding of health
professionals’ roles and identities, to examine the perception of professional
education, or to obtain perspectives on primary care issues. A focus group
discussion usually lasts 90–120 min.
You might use guidelines for developing a questioning route [8]:
(1) brainstorm about possible topics you want to cover; (2) sequence
the questioning: arrange general questions first, and then, more specific
questions, and ask positive questions before negative questions; (3) phrase
the questions: use open-ended questions, ask participants to think back
and reflect on their personal experiences, avoid asking ‘why’ questions,
keep questions simple and make your questions sound conversational, be
careful about giving examples; (4) estimate the time for each question and
consider: the complexity of the question, the category of the question, level
of participant’s expertise, the size of the focus group discussion, and the
amount of discussion you want related to the question; (5) obtain feedback
from others (peers); (6) revise the questions based on the feedback; and (7)
test the questions by doing a mock focus group discussion. All questions
need to provide an answer to the phenomenon under study.
You need to be prepared to manage difficulties as they arise, for example,
dominant participants during the discussion, little or no interaction and
discussion between participants, participants who have difficulties sharing
their real feelings about sensitive topics with others, and participants who
behave differently when they are observed.
How should I compose a focus group and how many participants are needed?

The purpose of the focus group discussion determines the composition.
Smaller groups might be more suitable for complex (and sometimes
controversial) topics. Also, smaller focus groups give the participants
more time to voice their views and provide more detailed information,
while participants in larger focus groups might generate greater variety of
information. In composing a smaller or larger focus group, you need to ensure
that the participants are likely to have different viewpoints that stimulate
the discussion. For example, if you want to discuss the management of
obesity in a primary care district, you might want to have a group composed
of professionals who work with these patients but also have a variety of
backgrounds, e.g. GPs, community nurses, practice nurses in general
practice, school nurses, midwives or dieticians.
Focus groups generally consist of 6–12 participants. Careful time
management is important, since you have to determine how much time you
want to devote to answering each question, and how much time is available
for each individual participant. For example, if you have planned a focus
group discussion lasting 90 min. with eight participants, you might need
15 min. for the introduction and the concluding summary. This means you
have 75 min. for asking questions, and if you have four questions, this allows
a total of about 18 min. of speaking time for each question. If all eight respondents
participate in the discussion, this boils down to about two minutes of
speaking time per respondent per question.
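Such a time budget is easy to check with a few lines of arithmetic. A minimal sketch (Python; the numbers are simply those of the example above):

# The worked example: a 90-minute focus group with eight participants,
# 15 minutes reserved for the introduction and concluding summary.
total_min = 90
overhead_min = 15
participants = 8
questions = 4

discussion_min = total_min - overhead_min      # 75 minutes for questions
per_question = discussion_min / questions      # 18.75 minutes per question
per_respondent = per_question / participants   # roughly 2.3 minutes each

print(f"{discussion_min} min of discussion, "
      f"{per_question:.1f} min per question, "
      f"{per_respondent:.1f} min per respondent per question")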

How can I use new media to collect qualitative data?


New media are increasingly used for collecting qualitative data, for example,
through online observations, online interviews and focus group discussions,
and in analysis of online sources. Data can be collected synchronously or
asynchronously, with text messaging, video conferences, video calls or
immersive virtual worlds or games, etcetera. Qualitative research moves
from ‘virtual’ to ‘digital’. Virtual means those approaches that import
traditional data collection methods into the online environment and digital
means those approaches that take advantage of the unique characteristics and
capabilities of the Internet for research [10]. New media can also be applied.
See Box 3 for further reading on interview and focus group discussion.
ANALYSIS

Can I postpone my analysis until all data have been collected?
You cannot postpone the analysis, because an iterative approach and
emerging design are at the heart of qualitative research. This involves
a process whereby you move back and forth between sampling, data
collection and data analysis to accumulate rich data and interesting findings.
The principle is that what emerges from data analysis will shape subsequent
sampling decisions. Immediately after the very first observation, interview
or focus group discussion, you have to start the analysis and prepare your
field notes.

Why is a good transcript so important?


First, transcripts of audiotaped interviews and focus group discussions
and your field notes constitute your major data sources. Transcripts are
preferably made by trained and well-instructed transcribers. Usually, e.g., in
ethnography, phenomenology, grounded theory, and content analysis, data
are transcribed verbatim, which means that recordings are fully typed out,
and the transcripts are accurate and reflect the interview or focus group
discussion experience. The most important aspects of transcribing are the focus
on the participants' words, transcribing all parts of the audiotape, and
carefully revisiting the tape and rereading the transcript. In conversation
analysis, non-verbal actions such as coughing, the length of pauses, emphasis
and tone of voice need to be described in detail using a formal transcription
system (the best known being G. Jefferson's symbols).
To facilitate analysis, it is essential that you ensure and check that
transcripts are accurate and reflect the totality of the interview, including
pauses, punctuation and non-verbal data. To be able to make sense of
qualitative data, you need to immerse yourself in the data and ‘live’ the data.
In this process of incubation, you search the transcripts for meaning and
essential patterns, and you try to collect legitimate and insightful findings.
You familiarize yourself with the data by reading and rereading transcripts
carefully and conscientiously, in search of deeper understanding.
Are there differences between the analyses in ethnography, phenomenology, grounded theory, and content analysis?
Ethnography, phenomenology, and grounded theory each have
different analytical approaches, and you should be aware that each of these
approaches has different schools of thought, which may also have integrated
the analytical methods from other schools (Box 4). When you opt for a
particular approach, it is best to use a handbook describing its analytical
methods, as it is better to use one approach consistently than to ‘mix up’
different schools.
In general, qualitative analysis begins with organizing data. Large
amounts of data need to be stored in smaller and manageable units, which
can be retrieved and reviewed easily. To obtain a sense of the whole,
analysis starts with reading and rereading the data, looking at themes,
emotions and the unexpected, taking into account the overall picture. You
immerse yourself in the data. The most widely used procedure is to develop
an inductive coding scheme based on actual data [11]. This is a process of
open coding, creating categories and abstraction. In most cases, you do not
start with a predefined coding scheme. You describe what is going on in
the data. You ask yourself, what is this? What does it stand for? What else
is like this? What is this distinct from? Based on this close examination of
what emerges from the data you make as many labels as needed. Then, you
make a coding sheet, in which you collect the labels and, based on your
interpretation, cluster them in preliminary categories. The next step is to
order similar or dissimilar categories into broader higher order categories.
Each category is named using content-characteristic words. Then, you use
abstraction by formulating a general description of the phenomenon under
study: subcategories with similar events and information are grouped
together as categories and categories are grouped as main categories. During
the analysis process, you identify ‘missing analytical information’ and you
continue data collection. You reread, recode, re-analyse and re-collect data
until your findings provide breadth and depth.
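Some researchers keep this coding sheet in a spreadsheet or a small script rather than on paper. The sketch below (Python; the interview fragments, open codes and categories are invented examples) illustrates only the bookkeeping of clustering open codes into categories; the interpretive decisions about what to code and how to cluster remain yours.

from collections import defaultdict

# Invented example: open codes assigned to interview fragments.
coded_fragments = [
    ("I-01", "I never dared to ask the GP directly", "fear of asking"),
    ("I-02", "My daughter phones the practice for me", "relying on family"),
    ("I-03", "I wait until the complaint gets worse", "postponing contact"),
]

# The researcher's (invented) clustering of open codes into categories.
code_to_category = {
    "fear of asking": "barriers to contacting the GP",
    "relying on family": "support in contacting the GP",
    "postponing contact": "barriers to contacting the GP",
}

coding_sheet = defaultdict(list)
for interview, quote, code in coded_fragments:
    coding_sheet[code_to_category[code]].append((interview, code, quote))

for category, entries in coding_sheet.items():
    print(category)
    for interview, code, quote in entries:
        print(f"  [{interview}] {code}: {quote!r}")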
Throughout the qualitative study, you reflect on what you see or do not
see in the data. It is common to write ‘analytic memos’ [3], write-ups or mini-
analyses about what you think you are learning during the course of your
study, from designing to publishing. They can be a few sentences or pages,
whatever is needed to reflect upon: open codes, categories, concepts, and
patterns that might be emerging in the data. Memos can contain summaries
of major findings and comments and reflections on particular aspects.
In ethnography, analysis begins from the moment that the researcher
sets foot in the field. The analysis involves continually looking for patterns
in the behaviours and thoughts of the participants in everyday life, in order
to obtain an understanding of the culture under study. When comparing
one pattern with another and analysing many patterns simultaneously, you
may use maps, flow charts, organizational charts and matrices to illustrate
the comparisons graphically. The outcome of an ethnographic study is a
narrative description of a culture.
In phenomenology, analysis aims to describe and interpret the meaning
of an experience, often by identifying essential subordinate and major
themes. You search for common themes featuring within an interview and
across interviews, sometimes involving the study participants or other
experts in the analysis process. The outcome of a phenomenological study
is a detailed description of themes that capture the essential meaning of a
‘lived’ experience.
Grounded theory generates a theory that explains how a basic social
problem that emerged from the data is processed in a social setting.
Grounded theory uses the ‘constant comparison’ method, which involves
comparing elements that are present in one data source (e.g., an interview)
with elements in another source, to identify commonalities. The steps in
the analysis are known as open, axial and selective coding. Throughout the
analysis, you document your ideas about the data in methodological and
theoretical memos. The outcome of a grounded theory study is a theory.
Descriptive generic qualitative research is defined as research designed
to produce a low inference description of a phenomenon [12]. Although
Sandelowski maintains that all research involves interpretation, she has
also suggested that qualitative description attempts to minimize inferences
made in order to remain ‘closer’ to the original data [12]. Descriptive
generic qualitative research often applies content analysis. Descriptive
content analysis studies are not based on a specific qualitative tradition
and are varied in their methods of analysis. The analysis of the content
aims to identify themes, and patterns within and among these themes. An
inductive content analysis [11] involves breaking down the data into smaller
units, coding and naming the units according to the content they present,
and grouping the coded material based on shared concepts. They can be
represented by clustering in treelike diagrams. A deductive content analysis
[11] uses a theory, theoretical framework or conceptual model to analyse
the data by operationalizing them in a coding matrix. An inductive content
analysis might use several techniques from grounded theory, such as open
and axial coding and constant comparison. However, note that your findings
are merely a summary of categories, not a grounded theory.
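To make the grouping step concrete, the short Python sketch below shows one way coded text units might be organized into categories and printed as a treelike diagram. It is a minimal sketch only: the fragments, open codes and category names are invented for illustration and do not come from any study discussed here.

from collections import defaultdict

# Hypothetical coded units produced in the first step of an inductive
# content analysis: (text fragment, open code) pairs.
coded_units = [
    ("I check my sugar before meals", "self-monitoring"),
    ("My nurse reminds me what to eat", "professional support"),
    ("My husband helps with injections", "family support"),
    ("I adjust the insulin dose myself", "self-monitoring"),
]

# Invented mapping of open codes to higher-level categories; in a real
# analysis this grouping emerges from comparing the coded material.
code_to_category = {
    "self-monitoring": "Managing the illness alone",
    "professional support": "Relying on others",
    "family support": "Relying on others",
}

# Group the coded material by shared concepts: category -> code -> units.
tree = defaultdict(lambda: defaultdict(list))
for fragment, code in coded_units:
    tree[code_to_category[code]][code].append(fragment)

# Print the resulting treelike diagram.
for category, codes in tree.items():
    print(category)
    for code, fragments in codes.items():
        print("  " + code)
        for fragment in fragments:
            print("    - " + fragment)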
Analysis software can support you to manage your data, for example
by helping to store, annotate and retrieve texts, to locate words, phrases and
segments of data, to name and label, to sort and organize, to identify data
units, to prepare diagrams and to extract quotes. Still, as a researcher you
would do the analytical work by looking at what is in the data, and making
decisions about assigning codes, and identifying categories, concepts and
patterns. The computer assisted qualitative data analysis (CAQDAS) website provides support for making informed choices among analytical software packages and related courses: http://www.surrey.ac.uk/sociology/research/researchcentres/caqdas/support/choosing. See Box 5 for further reading on qualitative
analysis.
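As a minimal illustration of the clerical support such software provides, the Python sketch below locates a search term in a transcript and returns each occurrence with its surrounding context, a simple ‘keyword in context’ listing. The transcript fragment and search term are invented, and the sketch is no substitute for a dedicated analysis package.

import re

# Invented interview fragment, used only to demonstrate retrieval.
transcript = (
    "I always check my blood sugar in the morning. "
    "Checking gives me a feeling of control, although "
    "sometimes I forget to check when I am travelling."
)

def keyword_in_context(text, term, window=25):
    """Return each occurrence of term with `window` characters of context."""
    hits = []
    for match in re.finditer(term, text, flags=re.IGNORECASE):
        start = max(match.start() - window, 0)
        end = min(match.end() + window, len(text))
        hits.append("..." + text[start:end] + "...")
    return hits

for line in keyword_in_context(transcript, "check"):
    print(line)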
The next and final article in this series, Part 4, will focus on trustworthiness
and publishing qualitative research [13].

ACKNOWLEDGEMENTS
The authors thank the following junior researchers who have been
participating for the last few years in the so-called ‘Think tank on qualitative
research’ project, a collaborative project between Zuyd University of Applied
Sciences and Maastricht University, for their pertinent questions: Erica
Baarends, Jerome van Dongen, Jolanda Friesen-Storms, Steffy Lenzen,
Ankie Hoefnagels, Barbara Piskur, Claudia van Putten-Gamel, Wilma
Savelberg, Steffy Stans, and Anita Stevens. The authors are grateful to Isabel
van Helmond, Joyce Molenaar and Darcy Ummels for proofreading our
manuscripts and providing valuable feedback from the ‘novice perspective’.
REFERENCES
1. Moser A, Korstjens I. Series: Practical guidance to qualitative research. Part 1: Introduction. Eur J Gen Pract. 2017;23:271–273.
2. Korstjens I, Moser A. Series: Practical guidance to qualitative research. Part 2: Context, research questions and designs. Eur J Gen Pract. 2017;23:274–279.
3. Polit DF, Beck CT. Nursing research: Generating and assessing evidence for nursing practice. 10th ed. Philadelphia (PA): Lippincott, Williams & Wilkins; 2017.
4. Moser A, van der Bruggen H, Widdershoven G. Competency in shaping one’s life: Autonomy of people with type 2 diabetes mellitus in a nurse-led, shared-care setting; a qualitative study. Int J Nurs Stud. 2006;43:417–427.
5. Moser A, Korstjens I, van der Weijden T, et al. Patient’s decision making in selecting a hospital for elective orthopaedic surgery. J Eval Clin Pract. 2010;16:1262–1268.
6. Bonevski B, Randell M, Paul C, et al. Reaching the hard-to-reach: a systematic review of strategies for improving health and medical research with socially disadvantaged groups. BMC Med Res Methodol. 2014;14:42.
7. Brinkmann S, Kvale S. Interviews: Learning the craft of qualitative research interviewing. 3rd ed. London (UK): Sage; 2014.
8. Kruger R, Casey M. Focus groups: A practical guide for applied research. Thousand Oaks (CA): Sage; 2015.
9. Kallio H, Pietilä AM, Johnson M, et al. Systematic methodological review: developing a framework for a qualitative semi-structured interview guide. J Adv Nurs. 2016;72:2954–2965.
10. Salmons J. Qualitative online interviews. 2nd ed. London (UK): Sage; 2015.
11. Elo S, Kyngäs H. The qualitative content analysis process. J Adv Nurs. 2008;62:107–115.
12. Sandelowski M. Whatever happened to qualitative description? Res Nurs Health. 2000;23:334–340.
13. Korstjens I, Moser A. Series: Practical guidance to qualitative research. Part 4: Trustworthiness and publishing. Eur J Gen Pract. 2018;24:120–124. DOI: 10.1080/13814788.2017.1375092
Chapter 4

SERIES: PRACTICAL GUIDANCE TO QUALITATIVE RESEARCH. PART 4: TRUSTWORTHINESS AND PUBLISHING

Irene Korstjens (a) and Albine Moser (b, c)

(a) Faculty of Health Care, Research Centre for Midwifery Science, Zuyd University of Applied Sciences, Maastricht, The Netherlands
(b) Faculty of Health Care, Research Centre Autonomy and Participation of Chronically Ill People, Zuyd University of Applied Sciences, Heerlen, The Netherlands
(c) Faculty of Health, Medicine and Life Sciences, Department of Family Medicine, Maastricht University, Maastricht, The Netherlands

ABSTRACT
In the course of our supervisory work over the years we have noticed that
qualitative research tends to evoke a lot of questions and worries, so-called
frequently asked questions (FAQs). This series of four articles intends to
provide novice researchers with practical guidance for conducting high-

Citation: (APA): Korstjens, I., & Moser, A. (2018). Series: Practical guidance to quali-
tative research. Part 4: Trustworthiness and publishing. European Journal of General
Practice, 24(1), 120-124. (6 pages)
Copyright: © 2018 The Author(s). Published by Informa UK Limited, trading as Taylor
& Francis Group. This is an Open Access article distributed under the terms of the
Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/).
quality qualitative research in primary care. By ‘novice’ we mean Master’s
students and junior researchers, as well as experienced quantitative
researchers who are engaging in qualitative research for the first time. This
series addresses their questions and provides researchers, readers, reviewers
and editors with references to criteria and tools for judging the quality of
qualitative research papers. The first article provides an introduction to
this series. The second article focused on context, research questions and
designs. The third article focused on sampling, data collection and analysis.
This fourth article addresses FAQs about trustworthiness and publishing.
Quality criteria for all qualitative research are credibility, transferability,
dependability, and confirmability. Reflexivity is an integral part of ensuring
the transparency and quality of qualitative research. Writing a qualitative
research article reflects the iterative nature of the qualitative research process:
data analysis continues while writing. A qualitative research article is mostly
narrative and tends to be longer than a quantitative paper, and sometimes
requires a different structure. Editors essentially use the criteria: is it new,
is it true, is it relevant? An effective cover letter enhances confidence in the
newness, trueness and relevance, and explains why your study required a
qualitative design. It provides information about the way you applied quality
criteria or a checklist, and you can attach the checklist to the manuscript.
Keywords: General practice/family medicine, general, qualitative designs
and methods, trustworthiness, reflexivity, publishing
Key points on trustworthiness and publishing
• The quality criteria for all qualitative research are credibility,
transferability, dependability, and confirmability.
• In addition, reflexivity is an integral part of ensuring the
transparency and quality of qualitative research.
• Writing a qualitative article reflects the iterative nature of the qualitative research process: data analysis continues while writing, with simultaneous fine-tuning of the findings.
• Editors essentially use the criteria: is it new, is it true, and is it
relevant?
• An effective cover letter enhances confidence in the newness,
trueness and relevance, and explains why your study required a
qualitative design.
INTRODUCTION
This article is the fourth and last in a series of four articles aiming to provide
practical guidance for qualitative research. In an introductory paper, we
have described the objective, nature and outline of the series [1]. Part 2 of
the series focused on context, research questions and design of qualitative
research [2], whereas Part 3 concerned sampling, data collection and analysis
[3]. In this paper, Part 4, we address frequently asked questions (FAQs) about
two overarching themes: trustworthiness and publishing.

TRUSTWORTHINESS

What are the quality criteria for qualitative research?


The same quality criteria apply to all qualitative designs, including the
‘big three’ approaches. Quality criteria used in quantitative research, e.g.
internal validity, generalizability, reliability, and objectivity, are not suitable
to judge the quality of qualitative research. Qualitative researchers speak
of trustworthiness, which simply poses the question ‘Can the findings be trusted?’ [4]. Several definitions and criteria of trustworthiness exist
(see Box 1) [2], but the best-known criteria are credibility, transferability,
dependability, and confirmability as defined by Lincoln and Guba [4].

Box 1: Trustworthiness: definitions of quality criteria in qualitative research. Based on Lincoln and Guba [4].

• Credibility: The confidence that can be placed in the truth of the research findings. Credibility establishes whether the research findings represent plausible information drawn from the participants’ original data and are a correct interpretation of the participants’ original views.
• Transferability: The degree to which the results of qualitative research can be transferred to other contexts or settings with other respondents. The researcher facilitates the transferability judgment by a potential user through thick description.
• Dependability: The stability of findings over time. Dependability involves participants’ evaluation of the findings, interpretation and recommendations of the study, such that all are supported by the data as received from participants of the study.
• Confirmability: The degree to which the findings of the research study could be confirmed by other researchers. Confirmability is concerned with establishing that data and interpretations of the findings are not figments of the inquirer’s imagination, but are clearly derived from the data.
• Reflexivity: The process of critical self-reflection about oneself as researcher (own biases, preferences, preconceptions), and about the research relationship (the relationship with the respondent, and how the relationship affects the participant’s answers to questions).
What is credibility and what strategies can be used to ensure it?


Credibility is the equivalent of internal validity in quantitative research
and is concerned with the aspect of truth-value [4]. Strategies to ensure
credibility are prolonged engagement, persistent observation, triangulation
and member check (Box 2). When you design your study, you also determine
which of these strategies you will use, because not all strategies might be
suitable. For example, a member check of written findings might not be
possible for study participants with a low level of literacy. Let us give an
example of the possible use of strategies to ensure credibility. A team of
primary care researchers studied the process by which people with type 2
diabetes mellitus try to master diabetes self-management [6]. They used
the grounded theory approach, and their main finding was an explanatory
theory. The researchers ensured credibility by using the following strategies.

Box 2. Definition of strategies to ensure trustworthiness in qualitative research. Based on Lincoln and Guba [4]; Sim and Sharp [5].

Credibility
• Prolonged engagement: Lasting presence during observation of long interviews or long-lasting engagement in the field with participants. Investing sufficient time to become familiar with the setting and context, to test for misinformation, to build trust, and to get to know the data well enough to obtain rich data.
• Persistent observation: Identifying those characteristics and elements that are most relevant to the problem or issue under study, on which you will focus in detail.
• Triangulation: Using different data sources, investigators and methods of data collection.
  • Data triangulation refers to using multiple data sources in time (gathering data at different times of the day or at different times in a year), space (collecting data on the same phenomenon in multiple sites, or testing for cross-site consistency) and person (gathering data from different types or levels of people, e.g. individuals, their family members and clinicians).
  • Investigator triangulation is concerned with using two or more researchers to make coding, analysis and interpretation decisions.
  • Method triangulation means using multiple methods of data collection.
• Member check: Feeding back data, analytical categories, interpretations and conclusions to members of those groups from whom the data were originally obtained. It strengthens the data, especially because researcher and respondents look at the data with different eyes.

Transferability
• Thick description: Describing not just the behaviour and experiences, but their context as well, so that the behaviour and experiences become meaningful to an outsider.

Dependability and confirmability
• Audit trail: Transparently describing the research steps taken from the start of a research project to the development and reporting of the findings. The records of the research path are kept throughout the study.

Reflexivity
• Diary: Examining one’s own conceptual lens, explicit and implicit assumptions, preconceptions and values, and how these affect research decisions in all phases of qualitative studies.

Prolonged engagement. Several distinct questions were asked regarding
topics related to mastery. Participants were encouraged to support their
statements with examples, and the interviewer asked follow-up questions.
The researchers studied the data from their raw interview material until a
theory emerged to provide them with the scope of the phenomenon under
study.
Triangulation. Triangulation aims to enhance the process of qualitative
research by using multiple approaches [7]. Methodological triangulation was
used by gathering data by means of different data collection methods such
as in-depth interviews, focus group discussions and field notes. Investigator
triangulation was applied by involving several researchers as research team
members, and involving them in addressing the organizational aspects of
the study and the process of analysis. Data were analysed by two different
researchers. The first six interviews were analysed by them independently,
after which the interpretations were compared. If their interpretations
differed, they discussed them until the most suitable interpretation was
found, which best represented the meaning of the data. The two researchers
held regular meetings during the process of analysis (after analysing every
third data set). In addition, regular analytical sessions were held with the
research team. Data triangulation was secured by using the various data sets
that emerged throughout the analysis process: raw material, codes, concepts
and theoretical saturation.
Persistent observation. Developing the codes, the concepts and the core
category helped to examine the characteristics of the data. The researchers
constantly read and reread the data, analysed them, theorized about them
and revised the concepts accordingly. They recoded and relabelled codes,
concepts and the core category. The researchers studied the data until the
final theory provided the intended depth of insight.
Member check. All transcripts of the interviews and focus group
discussions were sent to the participants for feedback. In addition, halfway
through the study period, a meeting was held with those who had participated
in either the interviews or the focus group discussions, enabling them to
correct the interpretation and challenge what they perceived to be ‘wrong’
interpretations. Finally, the findings were presented to the participants in
another meeting to confirm the theory.

What does transferability mean and who makes a ‘transferability judgement’?
Transferability concerns the aspect of applicability [4]. Your responsibility
as a researcher is to provide a ‘thick description’ of the participants and
the research process, to enable the reader to assess whether your findings
are transferable to their own setting; this is the so-called transferability
judgement. This implies that the reader, not you, makes the transferability
judgment because you do not know their specific settings.
In the aforementioned study on self-management of diabetes, the
researchers provided a rich account of descriptive data, such as the context
in which the research was carried out, its setting, sample, sample size,
sample strategy, demographic, socio-economic, and clinical characteristics,
inclusion and exclusion criteria, interview procedure and topics, changes
in interview questions based on the iterative research process, and excerpts
from the interview guide.

What is the difference between dependability and confirmability and why is an audit trail needed?
Dependability includes the aspect of consistency [4]. You need to check
whether the analysis process is in line with the accepted standards for a
particular design. Confirmability concerns the aspect of neutrality [4].
You need to secure the inter-subjectivity of the data. The interpretation
should not be based on your own particular preferences and viewpoints but
needs to be grounded in the data. Here, the focus is on the interpretation
process embedded in the process of analysis. The strategy needed to
ensure dependability and confirmability is known as an audit trail. You are
responsible for providing a complete set of notes on decisions made during
the research process, research team meetings, reflective thoughts, sampling,
research materials adopted, emergence of the findings and information about
the data management. This enables the auditor to study the transparency of
the research path.
In the aforementioned study of diabetes self-management, a university-
based auditor examined the analytical process, the records and the minutes
of meetings for accuracy, and assessed whether all analytical techniques of
the grounded theory methodology had been used accordingly. This auditor
also reviewed the analysis, i.e. the descriptive, axial and selective codes, to see whether they followed from the data (raw data, analysis notes, coding notes, process notes, and report) and were grounded in the data. The auditor who
performed the dependability and confirmability audit was not part of the
research team but an expert in grounded theory. The audit report was shared
with all members of the research team.
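For researchers who keep such records electronically, one possible minimal format for a timestamped audit-trail entry is sketched below in Python. The file name, fields and example entry are assumptions made for illustration, not a prescribed standard.

import csv
from datetime import datetime, timezone

AUDIT_FILE = "audit_trail.csv"  # hypothetical file name

def log_decision(phase, decision, rationale):
    """Append one timestamped research decision to the audit trail."""
    with open(AUDIT_FILE, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.now(timezone.utc).isoformat(), phase, decision, rationale]
        )

# Example entry: a sampling decision recorded during data collection.
log_decision(
    phase="sampling",
    decision="Added two participants with recent diagnoses",
    rationale="Earlier interviews lacked the newly diagnosed perspective.",
)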

Why is reflexivity an important quality criterion?


As a qualitative researcher, you have to acknowledge the importance of being
self-aware and reflexive about your own role in the process of collecting,
analysing and interpreting the data, and in the preconceived assumptions you bring to your research [8]. Therefore, your interviews, observations,
focus group discussions and all analytical data need to be supplemented
with your reflexive notes. In the aforementioned study of diabetes self-
management, the reflexive notes for an interview described the setting and
aspects of the interview that were noted during the interview itself and while
transcribing the audio tape and analysing the transcript. Reflexive notes
also included the researcher’s subjective responses to the setting and the
relationship with the interviewees.

PUBLISHING

How do I report my qualitative study?


The process of writing up your qualitative study reflects the iterative process
of performing qualitative research. As you start your study, you make choices
about the design, and as your study proceeds, you develop your design further.
The same applies to writing your manuscript. First, you decide its structure,
and during the process of writing, you adapt certain aspects. Moreover,
while writing you are still analysing and fine-tuning your findings. The
usual structure of articles is a structured abstract with subheadings, followed
by the main text, structured in sections labelled Introduction-Methods-
Results-Discussion. You might apply this structure loosely, for example
renaming Results as Findings, but sometimes your specific study design
requires a different structure. For example, an ethnographic study might use
a narrative abstract and then start by describing a specific case, or combine
the Findings and Discussion sections. A qualitative article is usually much
longer (5000–7000 words) than quantitative articles, which often present
their results in tables. You might present quantified characteristics of your
participants in tables or running text, and you are likely to use boxes to
present your interview guide or questioning route, or an overview of the
main findings in categories, subcategories and themes. Most of your article
is running text, providing a balanced presentation. You provide a thick
description of the participants and the context, transparently describe and
reflect on your methods, and do justice to the richness of your qualitative
findings in reporting, interpreting and discussing them. Thus, the Methods
and Findings sections will be much longer than in a quantitative paper.
The difference between reporting quantitative and qualitative research
becomes most visible in the Results section. Quantitative articles have a
strict division between the Results section, which presents the evidence,
and the Discussion section. In contrast, the Findings section in qualitative
papers consists mostly of synthesis and interpretation, often with links to
empirical data. Quantitative and qualitative researchers alike, however,
need to be concise in presenting the main findings to answer the research
question, and avoid distractions. Therefore, you need to make choices to
provide a comprehensive and balanced representation of your findings. Your
main findings may consist, for example, of interpretations, relationships
and themes, and your Findings section might include the development of a
theory or model, or integration with earlier research or theory. You present
evidence to substantiate your analytic findings. You use quotes or citations
in the text, or field notes, text excerpts or photographs in boxes to illustrate
and visualize the variety and richness of the findings.
Before you start preparing your article, it is wise to examine first the
journal of your choice. You need to check its guidelines for authors and
recommended sources for reference style, ethics, etc., as well as recently
accepted qualitative manuscripts. More and more journals also refer to
quality criteria lists for reporting qualitative research, and ask you to upload
the checklist with your submission. Two of these checklists are available
at http://www.equator-network.org/reporting-guidelines.

How do I select a potential journal for publishing my research?


Selecting a potential journal for publishing qualitative articles is not much
different from the procedure used for quantitative articles. First, you consider
your potential public and the healthcare settings, health problems, field, or
research methodology you are focusing on. Next, you look for journals in
the Journal Citation Index of Web of Science, consult other researchers and
study the potential journals’ aims, scopes, and author guidelines. This also
enables you to find out how open these journals are to publishing qualitative
research and accepting articles with different designs, structures and lengths.
If you are unsure whether the journal of your choice would accept qualitative
research, you might contact the Editor in Chief. Lastly, you might look in
your top three journals for qualitative articles, and try to decide how your
manuscript would fit in. The author guidelines and examples of manuscripts
will support you during your writing, and your top three offers alternatives
in case you need to turn to another journal.

What are the journal editors’ considerations in accepting a qualitative manuscript?
Your article should effectively present high-quality research and should
adhere to the journal’s guidelines. Editors essentially use the same criteria
for qualitative articles as for quantitative articles: Is it new, is it true, is
it relevant? However, editors may use—implicitly or explicitly—the level-
of-evidence pyramid, with qualitative research positioned in the lower
ranks. Moreover, many medical journal editors will be more familiar with
quantitative designs than with qualitative work.
Therefore, you need to put some extra effort in your cover letter to the
editor, to enhance their confidence in the newness, trueness and relevance,
and the quality of your work. It is of the utmost importance that you explain in
your cover letter why your study required a qualitative design, and probably
more words than usual. If you need to deviate from the usual structure, you
have to explain why. To enhance confidence in the quality of your work,
you should explain how you applied quality criteria or refer to the checklist
you used (Boxes 2 and 3). You might even attach the checklist as additional
information to the manuscript. You might also request that the Editor-in-
Chief invites at least one reviewer who is familiar with qualitative research.

Box 3. Quality criteria checklists for reporting qualitative research. Based on O’Brien et al. [9]; Tong et al. [10].

• Standards for reporting qualitative research (SRQR): covers all aspects of qualitative studies; 21 items addressing title, abstract, introduction, methods, results/findings, discussion, conflicts of interest, and funding.
• Consolidated criteria for reporting qualitative research (COREQ): covers qualitative studies focusing on in-depth interviews and focus groups; 32 items addressing research team and reflexivity, study design, data analysis, and reporting.
ACKNOWLEDGEMENTS
The authors wish to thank the following junior researchers who have been
participating for the last few years in the so-called ‘Think tank on qualitative
research’ project, a collaborative project between Zuyd University of Applied
Sciences and Maastricht University, for their pertinent questions: Erica
Baarends, Jerome van Dongen, Jolanda Friesen-Storms, Steffy Lenzen,
Ankie Hoefnagels, Barbara Piskur, Claudia van Putten-Gamel, Wilma
Savelberg, Steffy Stans, and Anita Stevens. The authors are grateful to Isabel
van Helmond, Joyce Molenaar and Darcy Ummels for proofreading our
manuscripts and providing valuable feedback from the ‘novice perspective’.
REFERENCES
1. Moser A, Korstjens I. Series: Practical guidance to qualitative research. Part 1: Introduction. Eur J Gen Pract. 2017;23:271–273.
2. Korstjens I, Moser A. Series: Practical guidance to qualitative research. Part 2: Context, research questions and designs. Eur J Gen Pract. 2017;23:274–279.
3. Moser A, Korstjens I. Series: Practical guidance to qualitative research. Part 3: Sampling, data collection and analysis. Eur J Gen Pract. 2018;24. DOI: 10.1080/13814788.2017.1375091
4. Lincoln YS, Guba EG. Naturalistic inquiry. California: Sage Publications; 1985.
5. Tracy SJ. Qualitative quality: eight ‘big-tent’ criteria for excellent qualitative research. Qual Inq. 2010;16:837–851.
6. Moser A, van der Bruggen H, Widdershoven G, et al. Self-management of type 2 diabetes mellitus: a qualitative investigation from the perspective of participants in a nurse-led, shared-care programme in the Netherlands. BMC Public Health. 2008;8:91.
7. Sim J, Sharp K. A critical appraisal of the role of triangulation in nursing research. Int J Nurs Stud. 1998;35:23–31.
8. Mauthner NS, Doucet A. Reflexive accounts and accounts of reflexivity in qualitative data analysis. Sociology. 2003;37:413–431.
9. O’Brien BC, Harris IB, Beckman TJ, et al. Standards for reporting qualitative research: a synthesis of recommendations. Acad Med. 2014;89:1245–1251.
10. Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care. 2007;19:349–357.
Chapter 5

PARTICIPANT OBSERVATION AS A DATA COLLECTION METHOD

Barbara B. Kawulich
University of West Georgia, Educational Leadership and Professional Studies Department, 1601 Maple Street, Room 153, Education Annex, Carrollton, GA 30118, USA

ABSTRACT
Observation, particularly participant observation, has been used in a
variety of disciplines as a tool for collecting data about people, processes,
and cultures in qualitative research. This paper provides a look at various
definitions of participant observation, the history of its use, the purposes
for which it is used, the stances of the observer, and when, what, and how
to observe. Information on keeping field notes and writing them up is also
discussed, along with some exercises for teaching observation techniques to
researchers-in-training.

Citation: (APA): Kawulich, B. B. (2005, May). Participant observation as a data collection method. In Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, Vol. 6, No. 2. (28 pages)
Copyright: © 2005 Barbara B. Kawulich. This work is licensed under a Creative
Commons Attribution 4.0 International License (https://creativecommons.org/licenses/
by/4.0/).
Keywords: participant observation, qualitative research methods, field notes

INTRODUCTION
Participant observation, for many years, has been a hallmark of both
anthropological and sociological studies. In recent years, the field of
education has seen an increase in the number of qualitative studies that include
participant observation as a way to collect information. Qualitative methods
of data collection, such as interviewing, observation, and document analysis,
have been included under the umbrella term of “ethnographic methods” in
recent years. The purpose of this paper is to discuss observation, particularly
participant observation, as a tool for collecting data in qualitative research
studies. Aspects of observation discussed herein include various definitions
of participant observation, some history of its use, the purposes for which
such observation is used, the stances or roles of the observer, and additional
information about when, what, and how to observe. Further information is
provided to address keeping field notes and their use in writing up the final
story. [1]

DEFINITIONS
MARSHALL and ROSSMAN (1989) define observation as “the systematic
description of events, behaviors, and artifacts in the social setting chosen
for study” (p.79). Observations enable the researcher to describe existing
situations using the five senses, providing a “written photograph” of the
situation under study (ERLANDSON, HARRIS, SKIPPER, & ALLEN,
1993). DeMUNCK and SOBO (1998) describe participant observation as
the primary method used by anthropologists doing fieldwork. Fieldwork
involves “active looking, improving memory, informal interviewing, writing
detailed field notes, and perhaps most importantly, patience” (DeWALT
& DeWALT, 2002, p.vii). Participant observation is the process enabling
researchers to learn about the activities of the people under study in the
natural setting through observing and participating in those activities. It
provides the context for development of sampling guidelines and interview
guides (DeWALT & DeWALT, 2002). SCHENSUL, SCHENSUL, and
LeCOMPTE (1999) define participant observation as “the process of
learning through exposure to or involvement in the day-to-day or routine
activities of participants in the researcher setting” (p.91). [2]
BERNARD (1994) adds to this understanding, indicating that participant
observation requires a certain amount of deception and impression
management. Most anthropologists, he notes, need to maintain a sense of
objectivity through distance. He defines participant observation as the process
of establishing rapport within a community and learning to act in such a way
as to blend into the community so that its members will act naturally, then
removing oneself from the setting or community to immerse oneself in the
data to understand what is going on and be able to write about it. He includes
more than just observation in the process of being a participant observer;
he includes observation, natural conversations, interviews of various sorts,
checklists, questionnaires, and unobtrusive methods. Participant observation
is characterized by such actions as having an open, nonjudgmental attitude,
being interested in learning more about others, being aware of the propensity
for feeling culture shock and for making mistakes, the majority of which can
be overcome, being a careful observer and a good listener, and being open to
the unexpected in what is learned (DeWALT & DeWALT, 1998). [3]
FINE (2003) uses the term “peopled ethnography” to describe text
that provides an understanding of the setting and that describes theoretical
implications through the use of vignettes, based on field notes from
observations, interviews, and products of the group members. He suggests
that ethnography is most effective when one observes the group being
studied in settings that enable him/her to “explore the organized routines
of behavior” (p.41). FINE, in part, defines “peopled ethnography” as being
based on extensive observation in the field, a labor-intensive activity that
sometimes lasts for years. In this description of the observation process, one
is expected to become a part of the group being studied to the extent that
the members themselves include the observer in the activity and turn to the
observer for information about how the group is operating. He also indicates
that it is at this point, when members begin to ask the observer questions
about the group and when they begin to include the observer in the “gossip,”
that it is time to leave the field. This process he describes of becoming a part
of the community, while observing their behaviors and activities, is called
participant observation. [4]

THE HISTORY OF PARTICIPANT OBSERVATION AS A METHOD
Participant observation is considered a staple in anthropological studies,
especially in ethnographic studies, and has been used as a data collection
method for over a century. As DeWALT and DeWALT (2002) relate it,
one of the first instances of its use involved the work of Frank Hamilton
CUSHING, who spent four and a half years as a participant observer with the
Zuni Pueblo people around 1879 in a study for the Smithsonian Institution’s
Bureau of Ethnology. During this time, CUSHING learned the language,
participated in the customs, was adopted by a pueblo, and was initiated into
the priesthood. Because he did not publish extensively about this culture, he
was criticized as having gone native, meaning that he had lost his objectivity
and, therefore, his ability to write analytically about the culture. My own
experience conducting research in indigenous communities, which began
about ten years ago with my own ethnographic doctoral dissertation on
Muscogee (Creek) women’s perceptions of work (KAWULICH, 1998)
and has continued in the years since (i.e., KAWULICH, 2004), leads me to
believe that, while this may have been the case, it is also possible that he
held the Zuni people in such high esteem that he felt it impolitic or irreverent
to do so. In my own research, I have been hesitant to write about religious
ceremonies or other aspects of indigenous culture that I have observed,
for example, for fear of relating information that my participants or other
community members might feel should not be shared. When I first began
conducting my ethnographic study of the Muscogee culture, I was made
aware of several incidents in which researchers were perceived to have
taken information they had obtained through interviews or observations and
had published their findings without permission of the Creek people or done
so without giving proper credit to the participants who had shared their lives
with the researchers. [5]
A short time later, in 1888, Beatrice Potter WEBB studied poor
neighborhoods during the day and returned to her privileged lifestyle
at night. She took a job as a rent collector to interact with the people in
buildings and offices and took a job as a seamstress in a sweatshop to better
understand their lives. Then, in the early 1920s, MALINOWSKI studied
and wrote about his participation and observation of the Trobriands, a
study BERNARD (1998) calls one of the most cited early discussions of
anthropological data collection methods. Around the same time, Margaret
MEAD studied the lives of adolescent Samoan girls. MEAD’s approach
to data collection differed from that of her mentor, anthropologist Frank
BOAS, who emphasized the use of historical texts and materials to document
disappearing native cultures. Instead, MEAD participated in the living culture
to record their cultural activities, focusing on specific activities, rather than
participating in the activities of the culture overall as did MALINOWSKI.
By 1874, the Royal Anthropological Institute of Great Britain had published
a manual of methods called Notes and Queries on Anthropology, which was
subsequently revised several times until 1971 (BERNARD, 1998). [6]
STOCKING (1983, as cited in DeWALT & DeWALT, 2002) divided
participant observation as an ethnographic method of data collection into
three phases: participation, observation, and interrogation, pointing out
that MALINOWSKI and MEAD both emphasized the use of observation
and interrogation, but not participation. He suggests that both MEAD and
MALINOWSKI held positions of power within the culture that enabled
them to collect data from a position of privilege. While ethnographers
traditionally tried to understand others by observing them and writing
detailed accounts of others’ lives from an outsider viewpoint, more recently,
sociologists have taken a more insider viewpoint by studying groups in
their own cultures. These sociological studies have brought into question
the stance or positioning of the observer and generated more creative
approaches to lending voice to others in the presentation of the findings of
their studies (GAITAN, 2000). By the 1940s, participant observation was
widely used by both anthropologists and sociologists. The previously noted
studies were some of the first to use the process of participant observation to
obtain data for understanding various cultures and, as such, are considered
to be required reading in anthropology classes. [7]

WHY USE OBSERVATION TO COLLECT DATA?


Observation methods are useful to researchers in a variety of ways. They
provide researchers with ways to check for nonverbal expression of feelings,
determine who interacts with whom, grasp how participants communicate
with each other, and check for how much time is spent on various activities
(SCHMUCK, 1997). Participant observation allows researchers to check
definitions of terms that participants use in interviews, observe events that
informants may be unable or unwilling to share when doing so would be
impolitic, impolite, or insensitive, and observe situations informants have
described in interviews, thereby making them aware of distortions or
inaccuracies in description provided by those informants (MARSHALL &
ROSSMAN, 1995). [8]
DeWALT and DeWALT (2002) believe that “the goal for design of
research using participant observation as a method is to develop a holistic
understanding of the phenomena under study that is as objective and accurate
as possible given the limitations of the method” (p.92). They suggest that
participant observation be used as a way to increase the validity1) of the
study, as observations may help the researcher have a better understanding of
the context and phenomenon under study. Validity is stronger when additional strategies are used with observation, such as interviewing, document analysis, surveys, questionnaires, or other more quantitative methods.
Participant observation can be used to help answer descriptive research
questions, to build theory, or to generate or test hypotheses (DeWALT &
DeWALT, 2002). [9]
When designing a research study and determining whether to use
observation as a data collection method, one must consider the types of
questions guiding the study, the site under study, what opportunities are
available at the site for observation, the representativeness of the participants
of the population at that site, and the strategies to be used to record and
analyze the data (DeWALT & DeWALT, 2002). [10]
Participant observation is a beginning step in ethnographic studies.
SCHENSUL, SCHENSUL, and LeCOMPTE (1999) list the following
reasons for using participant observation in research:
• to identify and guide relationships with informants;
• to help the researcher get the feel for how things are organized
and prioritized, how people interrelate, and what are the cultural
parameters;
• to show the researcher what the cultural members deem to be
important in manners, leadership, politics, social interaction, and
taboos;
• to help the researcher become known to the cultural members,
thereby easing facilitation of the research process; and
• to provide the researcher with a source of questions to be
addressed with participants (p.91). [11]
BERNARD (1994) lists five reasons for including participant observation
in cultural studies, all of which increase the study’s validity:
• It makes it possible to collect different types of data. Being on site
over a period of time familiarizes the researcher with the community,
thereby facilitating involvement in sensitive activities to which
he/she generally would not be invited.
• It reduces the incidence of “reactivity” or people acting in a
certain way when they are aware of being observed.
• It helps the researcher to develop questions that make sense in the
native language or are culturally relevant.
• It gives the researcher a better understanding of what is happening
in the culture and lends credence to one’s interpretations of the
observation. Participant observation also enables the researcher
to collect both quantitative and qualitative data through surveys
and interviews.
• It is sometimes the only way to collect the right data for one’s
study (pp.142-3). [12]

ADVANTAGES AND DISADVANTAGES OF USING PARTICIPANT OBSERVATION
DeMUNCK and SOBO (1998) provide several advantages of using
participant observation over other methods of data collection. These
include that it affords access to the “backstage culture” (p.43); it allows for
richly detailed description, which they interpret to mean that one’s goal of
describing “behaviors, intentions, situations, and events as understood by
one’s informants” is highlighted (p.43); and it provides opportunities for
viewing or participating in unscheduled events. DeWALT and DeWALT
(2002) add that it improves the quality of data collection and interpretation
and facilitates the development of new research questions or hypotheses
(p.8). [13]
DeMUNCK and SOBO also share several disadvantages of using
participation as a method, including that sometimes the researcher may not
be interested in what happens out of the public eye and that one must rely on
the use of key informants. The MEAD-FREEMAN2) controversy illustrates
how different researchers gain different understanding of what they
observe, based on the key informant(s) used in the study. Problems related
to representation of events and the subsequent interpretations may occur
when researchers select key informants who are similar to them or when the
informants are community leaders or marginal participants (DeMUNCK &
SOBO, 1998). To alleviate this potential bias problem, BERNARD (1994)
suggests pretesting informants or selecting participants who are culturally
competent in the topic being studied. [14]
JOHNSON and SACKETT (1998) discuss participant observation as a
source of erroneous description in behavioral research. They note that the
information collected by anthropologists is not representative of the culture,
as much of the data collected by these researchers is observed based on the
researcher’s individual interest in a setting or behavior, rather than being
representative of what actually happens in a culture. For example, they
report that more data has been collected about political/religious activities
than about eating/sleeping activities, because the political/religious activities
are more interesting to researchers than eating/sleeping activities; yet, the
amount of time the cultural members spent on political/religious activities
was less than 3%, while the amount of time they spent eating/sleeping was
greater than 60%. Such actions skew the description of cultural activities.
To alleviate this problem, they advocate the use of systematic observation
procedures to incorporate rigorous techniques for sampling and recording
behavior that keep researchers from neglecting certain aspects of culture.
Their definition of structured observation directs who is observed, when
and where they are observed, what is observed, and how the observations
are recorded, providing a more quantitative observation than participant
observation. [15]
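A minimal sketch of what such a systematic procedure might look like, assuming a random spot-check design: the Python code below draws random observation moments across a day and prompts the observer to record behaviour against a fixed category list, so that recording effort is spread evenly rather than following the observer’s interests. The categories, date and time window are invented for illustration.

import random
from datetime import datetime, timedelta

# Fixed recording categories, decided before fieldwork begins.
CATEGORIES = ["eating/sleeping", "work", "political/religious", "leisure", "other"]

def draw_spot_checks(start, hours, n):
    """Draw n random observation moments within a window of `hours` hours."""
    offsets = sorted(random.sample(range(hours * 3600), n))
    return [start + timedelta(seconds=s) for s in offsets]

# Example: twelve random spot checks across a 12-hour observation day.
day_start = datetime(2005, 5, 1, 6, 0)
for moment in draw_spot_checks(day_start, hours=12, n=12):
    # At each drawn moment the observer notes which category best fits
    # the behaviour observed, e.g. by tallying one count per category.
    print(moment.strftime("%H:%M"), "-> record one of", CATEGORIES)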

Limitations of Observation
Several researchers have noted the limitations involved with using
observations as a tool for data collection. For example, DeWALT and
DeWALT (2002) note that male and female researchers have access to
different information, as they have access to different people, settings,
and bodies of knowledge. Participant observation is conducted by a biased
human who serves as the instrument for data collection; the researcher must
understand how his/her gender, sexuality, ethnicity, class, and theoretical
approach may affect observation, analysis, and interpretation. [16]
SCHENSUL, SCHENSUL, and LeCOMPTE (1999) refer to
participation as meaning almost total immersion in an unfamiliar culture
to study others’ lives through the researcher’s participation as a full-time
resident or member, though they point out that most observers are not full
participants in community life. There are a number of things that affect
whether the researcher is accepted in the community, including one’s
appearance, ethnicity, age, gender, and class, for example. Another factor
they mention that may inhibit one’s acceptance relates to what they call the
structural characteristics—that is, those mores that exist in the community
regarding interaction and behavior (p.93). Some of the reasons they mention
for a researcher’s not being included in activities include a lack of trust, the
community’s discomfort with having an outsider there, potential danger to
either the community or the researcher, and the community’s lack of funds to
further support the researcher in the research. Some of the ways the researcher
might be excluded include the community members’ use of a language that
is unfamiliar to the researcher, their changing from one language to another
that is not understood by the researcher, their changing the subject when the
researcher arrives, their refusal to answer certain questions, their moving
away from the researcher to talk out of earshot, or their failure to invite the
researcher to social events. [17]
SCHENSUL, SCHENSUL, and LeCOMPTE further point out that all
researchers should expect to experience a feeling of having been excluded
at some point in the research process, particularly in the beginning. The
important thing, they note, is for the researcher to recognize what that
exclusion means to the research process and that, after the researcher has
been in the community for a while, the community is likely to have accepted
the researcher to some degree. [18]
Another limitation involved in conducting observations is noted
by DeWALT, DeWALT, and WAYLAND (1998). The researcher must
determine to what extent he/she will participate in the lives of the participants
and whether to intervene in a situation. Another potential limitation they
mention is that of researcher bias. They note that, unless ethnographers
use other methods than just participant observation, there is likelihood that
they will fail to report the negative aspects of the cultural members. They
encourage the novice researcher to practice reflexivity at the beginning
of one’s research to help him/her understand the biases he/she has that
may interfere with correct interpretation of what is observed. Researcher
bias is one of the aspects of qualitative research that has led to the view
that qualitative research is subjective, rather than objective. According to
RATNER (2002), some qualitative researchers believe that one cannot be
both objective and subjective, while others believe that the two can coexist,
that one’s subjectivity can facilitate understanding the world of others. He
notes that, when one reflects on one’s biases, he/she can then recognize
those biases that may distort understanding and replace them with those that
help him/her to be more objective. In this way, he suggests, the researcher is
being respectful of the participants by using a variety of methods to ensure
that what he/she thinks is being said, in fact, matches the understanding of
the participant. BREUER and ROTH (2003) use a variety of methods for
knowledge production, including, for example, positioning or various points
of view, different frames of reference, such as spatial or temporal relativity,
perceptual schemata based on experience, and interaction with the social
context—understanding that any interaction changes the observed object.
Using different approaches to data collection and observation, in particular,
leads to richer understanding of the social context and the participants
therein. [19]
SCHENSUL, SCHENSUL, and LeCOMPTE (1999) also suggest that
observation is filtered through one’s interpretive frames and that “the most
accurate observations are shaped by formative theoretical frameworks
and scrupulous attention to detail” (p.95). The quality of the participant
observation depends upon the skill of the researcher to observe, document,
and interpret what has been observed. It is important in the early stages of the
research process for the researcher to make accurate observation field notes
without imposing preconceived categories from the researcher’s theoretical
perspective, but to allow them to emerge from the community under study (see
Section 10). [20]

THE STANCES OF THE OBSERVER


The degree to which the researcher involves himself/herself in participation
in the culture under study makes a difference in the quality and amount of
data he/she will be able to collect. GOLD (1958) has provided a description
of observer stances that extends Buford JUNKER’s explanation of four
theoretical stances for researchers conducting field observations. GOLD
relates the four observation stances as follows:
• At one extreme is the complete participant, who is a member
of the group being studied and who conceals his/her researcher
role from the group to avoid disrupting normal activity. The
disadvantages of this stance are that the researcher may lack
objectivity, the group members may feel distrustful of the
researcher when the research role is revealed, and the ethics of
the situation are questionable, since the group members are being
deceived.
• In the participant as observer stance, the researcher is a member
of the group being studied, and the group is aware of the research
activity. In this stance, the researcher is a participant in the group
who is observing others and who is interested more in observing
than in participating, as his/her participation is a given, since he/
she is a member of the group. This role also has disadvantages,
in that there is a trade off between the depth of the data revealed
to the researcher and the level of confidentiality provided to the
group for the information they provide.
• The observer as participant stance enables the researcher to
participate in the group activities as desired, yet the main role of the researcher in this stance is to collect data, and the group being studied is aware of the researcher’s observation activities. In this stance, the researcher is an observer who is not a member of the group and who is interested in participating as a means for conducting better observation and, hence, generating more complete understanding of the group’s activities. MERRIAM (1998) points out that, while the researcher may have access to many different people in this situation from whom he/she may obtain information, the group members control the level of information given. As ADLER and ADLER (1994, p.380) note, this “peripheral membership role” enables the researcher to “observe and interact closely enough with members to establish an insider’s identity without participating in those activities constituting the core of group membership.”
• The opposite extreme stance from the complete participant is
the complete observer, in which the researcher is completely
hidden from view while observing or when the researcher is in
plain sight in a public setting, yet the public being studied is
unaware of being observed. In either case, the observation in this
stance is unobtrusive and unknown to participants. [21]
Of these four stances, the role providing the most ethical approach
to observation is that of the observer as participant, as the researcher’s
observation activities are known to the group being studied, yet the emphasis
for the researcher is on collecting data, rather than participating in the
activity being observed. [22]
MERRIAM (1998) calls the stance of participant observer a
“schizophrenic activity” (p.103), because the researcher participates in the
setting under study, but not to the extent that he/she becomes too absorbed
to observe and analyze what is happening. The question frequently asked is whether the researcher should be concerned about his/her role as participant observer affecting the situation. MERRIAM (1998) suggests that the question is not
whether the process of observing affects the situation or the participants,
but how the researcher accounts for those effects in explaining the data.
Participant observation is more difficult than simply observing without
participation in the activity of the setting, since it usually requires that the
field notes be jotted down at a later time, after the activity has concluded.
Yet there are situations in which participation is required for understanding.
Simply observing without participating in the action may not lend itself to
one’s complete understanding of the activity. [23]
DeWALT and DeWALT provide an alternative view of the roles
the participant observer may take, by comparing the various stances of
observation through membership roles described by both SPRADLEY
(1980, pp.58-62) and ADLER and ADLER (1987). SPRADLEY describes
the various roles that observers may take, ranging in degree of participation
from non-participation (activities are observed from outside the research
setting) to passive participation (activities are observed in the setting but
without participation in activities) to moderate participation (activities are
observed in the setting with almost complete participation in activities) to
complete participation (activities are observed in the setting with complete
participation in the culture). ADLER and ADLER similarly describe the range
of membership roles to include peripheral membership, active membership,
and full membership. Those serving in a peripheral membership role observe
in the setting but do not participate in activities, while active membership
roles denote the researcher’s participation in certain or all activities, and full
membership is reflected by fully participating in the culture. The degree to
which the researcher may participate may be determined by the researcher
or by the community (DeWALT & DeWALT, 2002). [24]
Other factors that may affect the degree to which one may participate
in the culture include the researcher’s age, gender, class, and ethnicity.
One also must consider the limitations of participating in activities that are
dangerous or illegal.
“The key point is that researchers should be aware of the compromises
in access, objectivity, and community expectation that are being made at any
particular place along the continuum. Further, in the writing of ethnography,
the particular place of the researcher on this continuum should be made
clear” (DeWALT & DeWALT, 2002, p.23). [25]

HOW DOES ONE KNOW WHAT TO OBSERVE?
MERRIAM (1998) suggests that the most important factor in determining
what a researcher should observe is the researcher’s purpose for conducting
the study in the first place. “Where to begin looking depends on the research
question, but where to focus or stop action cannot be determined ahead of
time” (MERRIAM, 1998, p.97). [26]
To help the researcher know what to observe, DeWALT and DeWALT
(2002) suggest that he/she study what is happening and why; sort out the
regular from the irregular activities; look for variation to view the event
in its entirety from a variety of viewpoints; look for the negative cases or
exceptions; and, when behaviors exemplify the theoretical purposes for the
observation, seek similar opportunities for observation and plan systematic
observations of those events/behaviors. Over time, such events may change,
with the season, for example, so persistent observation of activities or events
that one has already observed may be necessary. [27]
WOLCOTT (2001) suggests that fieldworkers ask themselves if they are
making good use of the opportunity to learn what it is they want to know. He
further advises that fieldworkers ask themselves if what they want to learn
makes the best use of the opportunity presented. [28]

HOW DOES ONE CONDUCT AN OBSERVATION?
WHYTE (1979) notes that, while there is no one way that is best for
conducting research using participant observation, the most effective work is
done by researchers who view informants as collaborators; to do otherwise,
he adds, is a waste of human resources. His emphasis is on the relationship
between the researcher and informants as collaborative researchers who,
through building solid relationships, improve the research process and
improve the skills of the researcher to conduct research. [29]
Conducting observations involves a variety of activities and
considerations for the researcher, which include ethics, establishing rapport,
selecting key informants, the processes for conducting observations,
deciding what and when to observe, keeping field notes, and writing up
one’s findings. In this section, these aspects of the research activities are
discussed in more detail. [30]

Ethics
A primary consideration in any research study is to conduct the research
in an ethical manner, letting the community know that one’s purpose for
observing is to document their activities. While there may be instances
where covert observation methods might be appropriate, these situations are
few and are suspect. DeWALT, DeWALT, and WAYLAND (1998) advise
the researcher to take some of the field notes publicly to reinforce that what
the researcher is doing is collecting data for research purposes. When the
researcher meets community members for the first time, he/she should
be sure to inform them of the purpose for being there, sharing sufficient
information with them about the research topic that their questions about the
research and the researcher’s presence there are put to rest. This means that
one is constantly introducing oneself as a researcher. [31]
Another ethical responsibility is to preserve the anonymity of the
participants in the final write-up and in field notes to prevent their
identification, should the field notes be subpoenaed for inspection. Individual
identities must be described in ways that community members will not be
able to identify the participants. Several years ago, when I submitted an
article for publication, one of the reviewers provided feedback that it would
be helpful to the reader if I described the participants as, for example, “a
35-year-old divorced mother of three, who worked at Wal-Mart.” This level
of detail was not a feasible option for me in providing a description of
individual participants, as it would have been easy for the local community
members to identify these participants from such specific detail; this was a
small community where everyone knew everyone else, and they would have
known who the woman was. Instead, I only provided broad descriptions that
lacked specific details, such as “a woman in her thirties who worked in the
retail industry.” [32]
DeWALT, DeWALT, and WAYLAND also point out that there is an
ethical concern regarding the relationships established by the researcher
when conducting participant observation; the researcher needs to develop
close relationships, yet those relationships are difficult to maintain, when
the researcher returns to his/her home at a distant location. It is typical
for researchers who spend an extended period of time in a community to
establish friendships or other relationships, some of which may extend over a
lifetime; others are transient and extend only for the duration of the research
study. Particularly when conducting cross-cultural research, it is necessary
to have an understanding of cultural norms that exist. As MARSHALL and
BATTEN (2004) note, one must address issues, such as potential exploitation
and inaccuracy of findings, or other actions which may cause damage to the
community. They suggest that the researcher take a participatory approach
to research by including community members in the research process,
beginning with obtaining culturally appropriate permission to conduct
research and ensuring that the research addresses issues of importance to the
community. They further suggest that the research findings be shared with
the community to ensure accuracy of findings. In my own ongoing research
projects with the Muscogee (Creek) people, I have maintained relationships
with many of the people, including tribal leaders, tribal administrators, and
council members, and have shared the findings with selected tribal members
to check my findings. Further, I have given them copies of my work for their
library. I, too, have found that, by taking a participatory approach to my
research with them, I have been asked to participate in studies that they wish
to have conducted. [33]

Gaining Entry and Establishing Rapport
Regarding entering the field, there are several activities that must be
addressed. These include choosing a site, gaining permission, selecting key
informants, and familiarizing oneself with the setting or culture (BERNARD,
1994). In this process, one must choose a site that will facilitate easy access
to the data. The objective is to collect data that will help answer the research
questions. [34]
To assist in gaining permission from the community to conduct the study,
the researcher may bring letters of introduction or other information that
will ease entry, such as information about one’s affiliation, funding sources,
and planned length of time in the field. One may need to meet with the
community leaders. For example, when one wishes to conduct research in
a school, permission must be granted by the school principal and, possibly,
by the district school superintendent. For research conducted in indigenous
communities, it may be necessary to gain permission from the tribal leader
or council. [35]
One should use personal contacts to ease entry; these would include
key informants who serve as gatekeepers, but BERNARD cautions against
choosing a gatekeeper who represents one side of warring factions, as the
researcher may be seen as affiliated with that faction. He also cautions that,
when using highly placed individuals as gatekeepers, the researcher may
be expected to serve as a spy. AGAR (1980) suggests that the researcher be
wary of accepting the first people he/she encounters in the research setting
as key informants, as they may be “deviants” or “professional stranger
handlers.” The former may be people who live on the fringe of the culture,
and association with them may provide the researcher with erroneous views
of the culture or may alienate the researcher from others who might better
inform the study. The “professional stranger handlers” are those people who
take upon themselves the job of finding out what it is the researcher is after
and how it may affect the members of the culture. AGAR suggests finding
a key informant to sponsor the researcher to facilitate his/her meeting those
people who can provide the needed information. These key informants must
be people who are respected by other cultural members and who are viewed
to be neutral, to enable the researcher to meet informants in all of the various
factions found in the culture. [36]
The researcher also should become familiar with the setting and social
organization of the culture. This may involve mapping out the setting or
developing social networks to help the researcher understand the situation.
These activities also are useful for enabling the researcher to know what to
observe and from whom to gather information. [37]
“Hanging out” is the process through which the researcher gains trust
and establishes rapport with participants (BERNARD, 1994). DeMUNCK
and SOBO (1998) state that, “only through hanging out do a majority of
villagers get an opportunity to watch, meet, and get to know you outside your
‘professional’ role” (p.41). This process of hanging out involves meeting and
conversing with people to develop relationships over an extended period
of time. There are three stages to the hanging out process, moving from a
position of formal, ignorant intruder to welcome, knowledgeable intimate
(DeMUNCK & SOBO). The first stage is the stage at which the researcher
is a stranger who is learning the social rules and language, making herself/
himself known to the community, so they will begin to teach her/him how
to behave appropriately in that culture. In the second stage, one begins to
merge with the crowd and stand out less as an intruder, what DeMUNCK
and SOBO call the “acquaintance” stage. During this stage, the language
becomes more familiar to the researcher, but he/she still may not be fluent
in its use. The third stage they mention is called the “intimate” stage, during
which the researcher has established relationships with cultural participants
to the extent that he/she no longer has to think about what he/she says, but
is as comfortable with the interaction as the participants are with her/him
being there. There is more to participant observation than just hanging out.
It sometimes involves the researcher’s working with and participating in
everyday activities beside participants in their daily lives. It also involves
taking field notes of observations and interpretations. Included in this
fieldwork is persistent observation and intermittent questioning to gain
clarification of meaning of activities. [38]
Rapport is built over time; it involves establishing a trusting relationship
with the community, so that the cultural members feel secure in sharing
sensitive information with the researcher to the extent that they feel assured
that the information gathered and reported will be presented accurately and
dependably. Rapport-building involves active listening, showing respect and
empathy, being truthful, and showing a commitment to the well-being of the
community or individual. Rapport is also related to the issue of reciprocity,
the giving back of something in return for their sharing their lives with the
researcher. The cultural members are sharing information with the researcher,
making him/her welcome in the community, inviting him/her to participate
in and report on their activities. The researcher has the responsibility for
giving something back, whether it is monetary remuneration, gifts or
material goods, physical labor, time, or research results. Confidentiality is
also a part of the reciprocal trust established with the community under study.
They must be assured that they can share personal information without their
identity being exposed to others. [39]
BERNARD states that “the most important thing you can do to stop
being a freak is to speak the language of the people you’re studying—and
speak it well” (1994, p.145). Fluency in the native language helps gain
access to sensitive information and increases rapport with participants.
Learn about local dialects, he suggests, but refrain from trying to mimic
local pronunciations, which may be misinterpreted as ridicule. Learning
to speak the language shows that the researcher has a vested interest in
the community, that the interest is not transient, and helps the researcher
to understand the nuances of conversation, particularly what constitutes
humor. [40]
As mentioned in the discussion of the limitations of observation,
BERNARD suggests that gender affects one’s ability to access certain
information and how one views others. What is appropriate action in some
cultures is dependent upon one’s gender. Gender can limit what one can ask,
what one can observe, and what one can report. For example, several years
after completing my doctoral dissertation with Muscogee (Creek) women
about their perceptions of work, I returned for additional interviews with
the women to gather specific information about more intimate aspects of
their lives that had been touched on briefly in our previous conversations,
but which were not reported. During these interviews, they shared with
me their stories about how they learned about intimacy when they were
growing up. Because the conversations dealt with sexual content, which,
in their culture, was referred to more delicately as intimacy, I was unable
to report my findings, as, to do so, would have been inappropriate. One
does not discuss such topics in mixed company, so my writing about this
subject might have endangered my reputation in the community or possibly
inhibited my continued relationship with community members. I was forced
to choose between publishing the findings, which would have benefited my
academic career, and retaining my reputation within the Creek community. I
chose to maintain a relationship with the Creek people, so I did not publish
any of the findings from that study. I also was told by the funding source that
I should not request additional funds for research, if the results would not be
publishable. [41]

The Processes of Conducting Observations
Exactly how does one go about conducting observation? WERNER and
SCHOEPFLE (1987, as cited in ANGROSINO & dePEREZ, 2000) focus on
the process of conducting observations and describe three types of processes:
• The first is descriptive observation, in which one observes
anything and everything, assuming that he/she knows nothing;
the disadvantage of this type is that it can lead to the collection of
minutiae that may or may not be relevant to the study.
• The second type, focused observation, emphasizes observation
supported by interviews, in which the participants’ insights guide
the researcher’s decisions about what to observe.
• The third type of observation, considered by ANGROSINO and
DePEREZ to be the most systematic, is selective observation, in
which the researcher focuses on different types of activities to
help delineate the differences in those activities (ANGROSINO
& dePEREZ, 2000, p.677). [42]
Other researchers have taken a different approach to explaining how
to conduct observations. For example, MERRIAM (1988) developed an
observation guide in which she compiled various elements to be recorded
in field notes. The first of these elements includes the physical environment.
This involves observing the surroundings of the setting and providing a
written description of the context. Next, she describes the participants in
detail. Then she records the activities and interactions that occur in the
setting. She also looks at the frequency and duration of those activities/
interactions and other subtle factors, such as informal, unplanned activities,
symbolic meanings, nonverbal communication, physical clues, and what
should happen that has not happened. In her 1998 book, MERRIAM adds
such elements as observing the conversation in terms of content, who speaks
to whom, who listens, silences, the researcher’s own behavior and how that
role affects those one is observing, and what one says or thinks. [43]
To conduct participant observation, one must live in the context to
facilitate prolonged engagement; prolonged engagement is one of the
activities listed by LINCOLN and GUBA (1994) to establish trustworthiness.
The findings are considered to be more trustworthy, when the researcher can
show that he/she spent a considerable amount of time in the setting, as this
prolonged interaction with the community enables the researcher to have
more opportunities to observe and participate in a variety of activities over
time. The reader would not view the findings as credible, if the researcher
only spent a week in the culture; however, he/she would be more assured that
the findings are accurate, if the researcher lived in the culture for an extended
time or visited the culture repeatedly over time. Living in the culture enables
one to learn the language and participate in everyday activities. Through
these activities, the researcher has access to community members who can
explain the meaning that such activities hold for them as individuals and can
use conversations to elicit data in lieu of more formal interviews. [44]
When I was preparing to conduct my ethnographic study with the
Muscogee (Creek) women of Oklahoma, my professor, Valerie FENNELL,
told me that I should take the attitude of “treat me like a little child who knows
nothing,” so that my informants would teach me what I needed to know about
the culture. I found this attitude to be very helpful in establishing rapport,
in getting the community members to explain things they thought I should
know, and in inviting me to observe activities that they felt were important
for my understanding of their culture. DeWALT and DeWALT support the
view of the ethnographer as an apprentice, taking the stance of a child in
need of teaching about the cultural mores as a means for enculturation.
KOTTAK (1994) defines enculturation as “the social process by which
culture is learned and transmitted across generations” (p.16). Conducting
observations involves such activities as “fitting in, active seeing, short-
term memory, informal interviewing, recording detailed field notes, and,
perhaps most importantly, patience” (DeWALT & DeWALT, 2002, p.17).
DeWALT and DeWALT extend this list of necessary skills, adding MEAD’s
suggested activities, which include developing tolerance to poor conditions
and unpleasant situations, resisting impulsiveness, particularly interrupting
others, and resisting attachment to particular factions or individuals. [45]
ANGROSINO and DePEREZ (2000) advocate using a structured
observation process to maximize the efficiency of the field experience,
minimize researcher bias, and facilitate replication or verification by others,
all of which make the findings more objective. This objectivity, they explain,
occurs when there is agreement between the researcher and the participants
as to what is going on. Sociologists, they note, typically use document
analysis to check their results, while anthropologists tend to verify their
findings through participant observation. [46]
BERNARD (1994) states that most basic anthropological research
is conducted over a period of about a year, but recently there have been
participant observations that were conducted in a matter of weeks. In these
instances, he notes the use of rapid assessment techniques that include
“going in and getting on with the job of collecting data without spending
months developing rapport. This means going into a field situation armed
with a lot of questions that you want to answer and perhaps a checklist of
data that you need to collect” (p.139). [47]
In this instance the cultural members are taken into the researcher’s
confidence as research partners to enable him/her to get the questions
answered. BERNARD notes that those anthropologists who are in the
field for extended periods of time are better able to obtain information of
a sensitive nature, such as information about witchcraft, sexuality, political
feuds, etc. By staying involved with the culture over a period of years, data
about social changes that occur over time are more readily perceived and
understood. [48]
BERNARD and his associates developed an outline of the stages
of participant observation fieldwork that includes initial contact; shock;
discovering the obvious; the break; focusing; exhaustion, the second break,
and frantic activity; and leaving. In ethnographic research, it is common
for the researcher to live in the culture under study for extended periods
of time and to return home for short breaks, then return to the research
setting for more data collection. When the researcher encounters a culture
that is different from his/her own and lives in that culture, constantly
being bombarded by new stimuli, culture shock results. Researchers react
differently to such shock. Some may sit in their motel room and play cards
or read novels to escape. Others may work and rework data endlessly.
Sometimes the researcher needs to take a break from the constant observation
and note taking to recuperate. When I conducted my dissertation fieldwork,
I stayed in a local motel, although I had been invited to stay at the home
of some community members. I chose to remain in the motel, because this
enabled me to have the down time in the evenings that I needed to write up
field notes and code and analyze data. Had I stayed with friends, they may
have felt that they had to entertain me, and I would have felt obligated to
spend my evenings conversing or participating in whatever activities they
had planned, when I needed some time to myself to be alone, think, and
“veg” out. [49]
The aspects of conducting observations are discussed above, but these
are not the only ways to conduct observations. DeMUNCK and SOBO
use freelisting to elicit from cultural members items related to specific
categories of information. Through freelisting, they build a dictionary of
coded responses to explain various categories. They also suggest the use
of pile sorting, which involves the use of cards that participants sort into
piles according to similar topics. The process involves making decisions
about what topics to include. Such card pile sorting processes are easy to
administer and may be meaningful to the participant’s world and frames of
reference (DeMUNCK & SOBO, 1998). [50]
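For readers who tabulate freelist responses electronically, the tallying step can be sketched in a few lines of Python. The sketch below is only an illustration: the informant labels, the category, and the items are invented and are not drawn from DeMUNCK and SOBO.

    from collections import Counter

    # Hypothetical freelists: each informant names the items he/she
    # associates with one category of information (items invented).
    freelists = {
        "informant_1": ["corn soup", "fry bread", "grape dumplings"],
        "informant_2": ["fry bread", "corn soup"],
        "informant_3": ["corn soup", "wild onions", "fry bread"],
    }

    # Count, for each item, how many informants mentioned it at least
    # once; widely shared items are candidates for the coded dictionary.
    mention_counts = Counter(
        item for items in freelists.values() for item in set(items)
    )

    for item, count in mention_counts.most_common():
        print(f"{item}: mentioned by {count} of {len(freelists)} informants")

Items named by most informants would then be discussed with participants and refined, in keeping with the authors’ emphasis on checking one’s categories against members’ own frames of reference.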
A different approach to observation, consensus analysis, is a method
DeMUNCK and SOBO describe to design sampling frames for ethnographic
research, enabling the researcher to establish the viewpoints of the participants
from the inside out. This involves aspects of ethnographic fieldwork, such as
getting to know participants intimately to understand their way of thinking
and experiencing the world. It further involves verifying information gathered
to determine if the researcher correctly understood the information collected.
The question of whether one has understood correctly lends itself to the
internal validity question of whether the researcher has correctly understood
the participants. Whether the information can be generalized addresses the
external validity in terms of whether the interpretation is transferable from the
sample to the population from which it was selected. DeMUNCK and SOBO
note that the ethnographer begins with a topic and discusses that topic with
various people who know about it. He/She selects a variety of people who
know about the topic to include in the sample, remembering that not everyone
has the same opinion or experience about the topic. They suggest using a
nested sampling frame to determine differences in knowledge about a topic.
To help determine the differences, the researcher should ask the participants
if they know people who have a different experience or opinion of the topic.
Seeking out participants with different points of view enables the researcher
to fully flesh out understanding of the topic in that culture. DeMUNCK and
SOBO also suggest talking with anyone who is willing to teach you. [51]

TIPS FOR COLLECTING USEFUL OBSERVATION DATA
TAYLOR and BOGDAN (1984) provided several tips for conducting
observations after one has gained entry into the setting under study. They
suggest that the researcher should:
•	be unobtrusive in dress and actions;
• become familiar with the setting before beginning to collect data;
• keep the observations short at first to keep from becoming
overwhelmed;
• be honest, but not too technical or detailed, in explaining to
participants what he/she is doing. [52]
MERRIAM (1998) adds that the researcher should:
• pay attention, shifting from a “wide” to a “narrow” angle
perspective, focusing on a single person, activity, interaction,
then returning to a view of the overall situation;
• look for key words in conversations to trigger later recollection of
the conversation content;
• concentrate on the first and last remarks of a conversation, as
these are most easily remembered;
• during breaks in the action, mentally replay remarks and scenes
one has observed. [53]
DeWALT and DeWALT (2002) make these suggestions:
• Actively observe, attending to details one wants to record later.
• Look at the interactions occurring in the setting, including who
talks to whom, whose opinions are respected, how decisions are
made. Also observe where participants stand or sit, particularly
those with power versus those with less power or men versus
women.
• Counting persons or incidents of observed activity is useful in
helping one recollect the situation, especially when viewing
complex events or events in which there are many participants.
• Listen carefully to conversations, trying to remember as many
verbatim conversations, nonverbal expressions, and gestures as
possible. To assist in seeing events with “new eyes,” turn detailed
jottings into extensive field notes, including spatial maps and
interaction maps. Look carefully to seek out new insights.
• Keep a running observation record. [54]
WOLCOTT (2001) adds to the discussion of how to conduct observations.
He suggests that, to move around gracefully within the culture, one should:
• practice reciprocity in whatever terms are appropriate for that
culture;
•	be tolerant of ambiguity; this includes being adaptable and
flexible;
• have personal determination and faith in oneself to help alleviate
culture shock. [55]
He further shares some tips for doing better participant observation
(pp.96-100).
• When one is not sure what to attend to, he/she should look to see
what it is that he/she is attending to and try to determine how and
why one’s attention has been drawn as it has. One should take
note of what he/she is observing, what is being put into the field
notes and in how much detail, and what one is noting about the
researcher’s personal experience in conducting the research. The
process of note taking is not complete until one has reviewed
his/her notes to make sure that he/she is coupling the analysis
with observations throughout the process to keep the researcher
on track.
• The researcher should review constantly what he/she is looking
for and whether he/she is seeing it or is likely to do so in the
circumstances for observation presented. It may be necessary to
refocus one’s attention to what is actually going on. This process
involves looking for recurring patterns or underlying themes in
behavior, action or inaction. He/she should also reflect on what
someone from another discipline might find of interest there. He/
she should look at her/his participation, what he/she is observing
and recording, in terms of the kind of information he/she will
need to report rather than what he/she feels he/she should collect.
• Being attentive for any length of time is difficult to do. One tends
to do it off and on. One should be aware that his/her attention
to details comes in short bursts that are followed by inattentive
rests, and those moments of attention should be capitalized upon.
• One should reflect on the note taking process and subsequent
writing-up practices as a critical part of fieldwork, making it part
of the daily routine, keeping the entries up to date. The elaborated
note taking also provides a connection between what he/she is
experiencing and how he/she is translating that experience into a
form that can be communicated to others. He/she should make a
habit of including in one’s field notes such specifics as day, date,
and time, along with a simple coding system for keeping track
of entries, and reflections on and about one’s mood, personal
reactions, and random thoughts, as these may help to recapture
detail not written down. One should also consider beginning
to do some writing as fieldwork proceeds. One should take
time frequently to draft expanded pieces written using “thick
description,” as described by GEERTZ (1973), so that such
details might later be incorporated into the final write up.
• One should take seriously the challenge of participating and
focus, when appropriate, on one’s role as participant over one’s
role as observer. Fieldwork involves more than data gathering.
It may also involve informal interviews, conversations, or more
structured interviews, such as questionnaires or surveys. [56]
BERNARD notes that one must become explicitly aware, being attentive
in his/her observations, reporting what is seen, not inferred. It is natural
to impose on a situation what is culturally correct, in the absence of real
memories, but building memory capacity can be enhanced by practicing
reliable observation. If the data one collects is not reliable, the conclusions
will not be valid. BERNARD advises that the researcher not talk to anyone
after observing, until he/she has written down his/her field notes. He
advocates that he/she try to remember things in historical/chronological order
and draw a map of the physical space to help him/her remember details. He
also suggests that the researcher maintain naiveté, assuming an attitude of
learner and being guided by participants’ teaching without being considered
stupid, incompetent, or dangerous to their wellbeing. Sometimes, he points
out, one’s expertise is what helps to establish rapport. Having good writing
skills, that is, writing concisely and compellingly, is also necessary to good
participant observation. The researcher must learn to ‘hang out’ to enable
him/her to ask questions when appropriate and to ask appropriate questions.
Maintaining one’s objectivity means realizing and acknowledging one’s
biases, assumptions, prejudices, opinions, and values. [57]

KEEPING AND ANALYZING FIELD NOTES AND WRITING UP THE FINDINGS
KUTSCHE (1998) suggests that, when mapping out a setting, one must first
learn to put aside his/her preconceptions. The process of mapping, as he
describes it, involves describing the relationship between the sociocultural
behavior one observes and the physical environment. The researcher
should draw a physical map of the setting, using as much detail as possible.
KUTSCHE suggests that the researcher visit the setting under study at
different times of the day to see how it is used differently at different times
of the day/night. He/she should describe without judgment and avoid using
meaningless adjectives, such as “older” (older than what/whom?) or “pretty”
(as compared to what/whom?); use adjectives that help to describe the
various aspects of the setting meaningfully (what is it that makes the house
inviting?). When one succeeds in avoiding judgment, he/she is practicing
cultural relativism. This mapping process uses only one of the five senses—
vision. “Human events happen in particular places, weathers, times, and so
forth. If you are intrigued, you will be pleased to know that what you are
doing is a subdiscipline of anthropology called cultural ecology” (p.16). It
involves looking at the interaction of the participants with the environment.
STEWARD (1955, as cited in KUTSCHE, 1998), a student of KROEBER
(1939, as cited in KUTSCHE, 1998), who wrote about Native American
adaptations to North American environments, developed a theory called
“multilinear evolution” in which he described how cultural traditions evolve
related to specific environments.
“Cultural systems are not just rules for behavior, ways of surviving,
or straitjackets to constrict free expression ... All cultures, no matter how
simple or sophisticated, are also rhythms, music, architecture, the dances of
living. ... To look at culture as style is to look at ritual” (p.49). [58]
KUTSCHE refers to ritual as being the symbolic representation of the
sentiments in a situation, where the situation involves person, place, time,
conception, thing, or occasion. Some of the examples of cultural rituals
KUTSCHE presents for analysis include rites of deference or rites of
passage. Ritual and habit are different, KUTSCHE explains, in that habits
have no symbolic expression or meaning (such as tying one’s shoes in the
same way each time). [59]
In mapping out the setting being observed, SCHENSUL, SCHENSUL,
and LeCOMPTE (1999) suggest the following be included:
• a count of attendees, including such demographics as age, gender,
and race;
• a physical map of the setting and description of the physical
surroundings;
• a portrayal of where participants are positioned over time;
• a description of the activities being observed, detailing activities
of interest. [60]
They indicate that counting, census taking, and mapping are important
ways to help the researcher gain a better understanding of the social setting
in the early stages of participation, particularly when the researcher is not
fluent in the language and has few key informants in the community. [61]
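As a minimal sketch (in Python, with invented attendance data), the counting step might be recorded and tallied as follows; the demographic categories are illustrative only.

    from collections import Counter

    # Invented attendance log for one observed event; each pair records
    # the observer's judgment of (age group, gender) for one attendee.
    attendees = [
        ("adult", "female"), ("adult", "male"), ("child", "female"),
        ("elder", "female"), ("adult", "female"), ("child", "male"),
    ]

    print("Total attendees:", len(attendees))
    print("By age group:", Counter(age for age, _ in attendees))
    print("By gender:", Counter(gender for _, gender in attendees))

Such counts are an aid to description, not a substitute for it; the qualitative detail remains in the accompanying field notes.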
Social differences they mention that are readily observed include
differences among individuals, families, or groups by educational level,
type of employment, and income. Things to look for include the cultural
members’ manner of dress and decorative accoutrements, leisure activities,
speech patterns, place of residence and choice of transportation. They also
add that one might look for differences in housing structure or payment
structure for goods or services. [62]
Field notes are the primary way of capturing the data that is collected from
participant observations. Notes taken to capture this data include records
of what is observed, including informal conversations with participants,
records of activities and ceremonies, during which the researcher is unable
to question participants about their activities, and journal notes that are kept
on a daily basis. DeWALT, DeWALT, and WAYLAND describe field notes
as both data and analysis, as the notes provide an accurate description of
what is observed and are the product of the observation process. As they
note, observations are not data unless they are recorded into field notes. [63]
DeMUNCK and SOBO (1998) advocate using two notebooks for
keeping field notes, one with questions to be answered, the other with more
personal observations that may not fit the topics covered in the first notebook.
They do this to alleviate the clutter of extraneous information that can occur
when taking field notes. Field notes in the first notebook should include jottings, maps,
diagrams, interview notes, and observations. In the second notebook, they
suggest keeping memos, casual “mullings, questions, comments, quirky
notes, and diary type entries” (p.45). One can find information in the notes
easily by indexing and cross-referencing information from both notebooks
by noting on index cards such information as “conflicts, gender, jokes,
religion, marriage, kinship, men’s activities, women’s activities, and so on”
(p.45). They summarize each day’s notes and index them by notebook, page
number, and a short identifying description. [64]
The feelings, thoughts, and suppositions of the researcher may be noted
separately. SCHENSUL, SCHENSUL, and LeCOMPTE (1999) note that
good field notes:
• use exact quotes when possible;
• use pseudonyms to protect confidentiality;
•	describe activities in the order in which they occur;
• provide descriptions without inferring meaning;
• include relevant background information to situate the event;
• separate one’s own thoughts and assumptions from what one
actually observes;
• record the date, time, place, and name of researcher on each set
of notes. [65]
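One hypothetical way to structure such entries electronically, combining the criteria above with DeMUNCK and SOBO’s indexing idea, is sketched below in Python; the field names and the sample entry are invented, not prescribed by either set of authors.

    from dataclasses import dataclass, field

    @dataclass
    class FieldNote:
        """One field note entry; field names are illustrative only."""
        date: str
        time: str
        place: str
        researcher: str
        observation: str         # what was actually seen or heard
        researcher_comment: str  # one's own thoughts, kept separate
        topics: list = field(default_factory=list)  # index topics

    notes = [
        FieldNote(
            date="2005-04-05", time="10:15", place="community center",
            researcher="B.K.",
            observation='"Ama" (a pseudonym) greets each arriving elder first.',
            researcher_comment="Possible rite of deference? Verify later.",
            topics=["kinship", "rites of deference"],
        ),
    ]

    # Build a topic index, in the spirit of index cards, so entries on a
    # given topic can be located later by their position in the notebook.
    index = {}
    for i, note in enumerate(notes):
        for topic in note.topics:
            index.setdefault(topic, []).append(i)
    print(index)  # {'kinship': [0], 'rites of deference': [0]}

Keeping the observation and researcher_comment fields separate mirrors the advice to distinguish what one actually observes from one’s own thoughts and assumptions.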
Regarding coding their observation notes, DeMUNCK and SOBO
(1998) suggest that coding is used to select and emphasize information that is
important enough to record, enabling the researcher to weed out extraneous
information and focus his/her observations on the type of information
needed for the study. They describe codes as
“rules for organizing symbols into larger and more meaningful strings
of symbols. It is important, no imperative, to construct a coding system not
because the coding system represents the ‘true’ structure of the process you
are studying, but because it offers a framework for organizing and thinking
about the data” (p.48). [66]
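As a minimal Python illustration of such a coding framework (the labels, cue words, and note excerpt below are hypothetical; a working system would be developed iteratively from one’s own data):

    # Hypothetical codebook: each label is paired with cue words that
    # suggest the code applies to a segment of field notes.
    codebook = {
        "RECIP": ["gift", "give back", "exchange"],
        "GENDER": ["women", "men", "mixed company"],
        "RITUAL": ["ceremony", "rite", "blessing"],
    }

    def code_segment(segment):
        """Return the labels whose cue words appear in a note segment
        (naive substring matching, for illustration only)."""
        text = segment.lower()
        return [label for label, cues in codebook.items()
                if any(cue in text for cue in cues)]

    segment = "Women prepared the blessing and exchanged gifts afterward."
    print(code_segment(segment))  # ['RECIP', 'GENDER', 'RITUAL']

The point, as the quotation above stresses, is not that the codes capture the ‘true’ structure of the data, but that they give the researcher a consistent framework for retrieving and thinking about it.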
KUTSCHE states that, when one is trying to analyze interview
information and observation field notes, he/she is trying to develop a model
that helps to make sense of what the participants do. One is constructing a
model of culture, not telling the truth about the data, as there are numerous
truths, particularly when presented from each individual participant’s
viewpoint. The researcher should set out an outline of the information he/
she has, organize the information according to the outline, then move the
points around as the argument of one’s study dictates. He further suggests
that he/she organize the collected data into a narrative in which one may
tell the story of a day or a week in the lives of informants, as they may have
provided information in these terms in response to grand tour questions, that
is, questions that encourage participants to elaborate on their description of
a cultural scene (SPRADLEY, 1979). Once the data have been organized in
this way, there will probably be several sections in the narrative that reflect
one’s interpretation of certain themes that make the cultural scene clear
to the reader. He further suggests asking participants to help structure the
report. In this way, member checks and peer debriefing occur to help ensure
the trustworthiness of the data (LINCOLN & GUBA, 1994). [67]
When writing up one’s description of a ritual, KUTSCHE advises the
researcher to make a short draft of the ritual and then take specific aspects
to focus on and write up in detail with one’s analysis. It is the analysis
that differentiates between creative writing and ethnology, he points out.
When writing up one’s ethnographic observations, KUTSCHE advises that
the researcher follow the lead of SPRADLEY and McCURDY (1972) and
find a cultural scene, spend time with the informants, asking questions and
clarifying answers, analyze the material, pulling together the themes into
a well-organized story. Regarding developing models, he indicates that
the aim is to construct a picture of the culture that reflects the data one
has collected. He bases his model development on guidelines by Ward H.
GOODENOUGH, who advocates that the first level of development includes
what happens, followed by a second level of development which includes
what the ethnographer has observed, subsequently followed by a third level
including what was recorded in the field, and finally followed by a fourth
level derived from one’s notes. He adds that GOODENOUGH describes a
fifth level, in which ethnological theory is developed from separate models
of separate cultures. KUTSCHE defines models as having four properties
described by LEVI-STRAUSS (1953, p.525, as cited in KUTSCHE, 1998),
two of which are pertinent to this discussion: the first property, in which the
structure exhibits the characteristics of a system, and the fourth property, in
which the model makes clear all observed facts. [68]
WOLCOTT indicates that fieldworkers of today should put themselves
into their written discussion of the analysis without regaling the reader with
self-reports of how well they did their job. This means that there will be a bit
of postmodern auto-ethnographic information told in the etic or researcher’s
voice (PIKE, 1966), along with the participants’ voices which provide
the emic perspective (PIKE, 1966). Autoethnography, in recent years, has
become an accepted means for illustrating the knowledge production of
researchers from their own perspective, incorporating their own feelings and
emotions into the mix, as is illustrated by Carolyn ELLIS (i.e., ELLIS, 2003,
and HOLMAN JONES, 2004). [69]

TEACHING PARTICIPANT OBSERVATION
Throughout the past eight or so years of teaching qualitative research courses,
I have developed a variety of exercises for teaching observation skills, based
on techniques I observed from other researchers and teachers of qualitative
research or techniques described in others’ syllabi. Over time, I have revised
others’ exercises and created my own to address the needs of my students
in learning how to conduct qualitative research. Below are several of those
exercises that other professors of qualitative research methods may find
useful. [70]
Memory Exercise—Students are asked to think of a familiar place,
such as a room in their home, and make field notes that include a map of
the setting and a physical description of as much as they can remember
of what is contained in that setting. They are then asked to compare their
recollections with the actual setting to see what they were able to remember
and how well they were able to do so. The purpose of this exercise is to help
students realize how easy it is to overlook various aspects that they have
not consciously tried to remember. In this way, they begin to be attentive to
details and begin to practice active observing skills. [71]
Sight without sound—In this exercise, students are asked to find a
setting in which they are able to see activity but in which they are unable
to hear what is being said in the interaction. For a specified length of time
(5 to 10 minutes), they are to observe the action/interaction, and record as
much information as they can in as much detail as possible. This exercise
has also been done by turning off the sound on the television and observing
the actions/interactions on a program; students, in this case, are instructed
to find a television program with which they are unfamiliar, so they are
less apt to impose upon their field notes what they believe they know about
familiar characters or programs. This option is less desirable, as students
sometimes find it difficult to find a program with which they do not have
some familiarity. The purpose of the exercise is to teach the students to
begin observing and taking in information using their sight. [72]
Instructions for writing up their field notes include having them begin by
drawing a map of the setting and providing a description of the participants.
By having them record on one side of their paper what information they take
in through their senses and on the other side whatever thoughts, feelings,
ideas they have about what is happening, they are more likely to begin to see
the difference in observed data and their own construction or interpretation
of the activity. This exercise also helps them realize the importance of using
all of their senses to take in information and the importance of observing
both the verbal and the nonverbal behaviors of the situation. Possible
settings for observation in this exercise have included sitting inside fast-
food restaurants, viewing the playground, observing interactions across
parking lots or mall food courts, or viewing interactions at a distance on the
subway, for example. [73]
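For students who type up these exercises, the split-page layout can be sketched in a few lines of Python; the entries below are invented for illustration.

    # Invented entries: sense data on the left, the observer's own
    # thoughts, feelings, and ideas on the right.
    entries = [
        ("man slides card to cashier; cashier shakes head",
         "card declined? he looks embarrassed"),
        ("woman points at menu board; child tugs her sleeve",
         "child impatient; she may be ordering for both"),
    ]

    col = 50  # width of the left-hand (observation) column
    print(f"{'OBSERVED (senses)':<{col}}| THOUGHTS / IDEAS")
    print("-" * (col + 20))
    for observed, thought in entries:
        print(f"{observed:<{col}}| {thought}")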
Sound without sight—In this exercise, similar to the above exercise,
students are asked to find a setting in which they are able to hear activity/
interactions, but in which they are unable to see what is going on. Again, for
a specified length of time, they are asked to record as much as they can hear
of the interaction, putting their thoughts, feelings, and ideas about what is
happening on the right side of the paper, and putting the information they
take in with their senses on the left hand side of the paper. Before beginning,
they again are asked to describe the setting, but, if possible, they are not to
see the participants in the setting under study. In this way, they are better
able to note their guesses about the participants’ ages, gender, ethnicity,
etc. My students have conducted this exercise in restaurants, listening to
conversations of patrons in booths behind them, while sitting on airplanes
or other modes of transportation, or by sitting outside classrooms where
students were interacting, for example. A variation of this exercise is to
have students turn their backs to the television or listen to a radio program
with which they are unfamiliar, and have them conduct the exercise in that
fashion, without sight to guide their interpretations. [74]
In both of these examples, male students are cautioned to stay away
from playgrounds or other settings where their actions may be misconstrued.
They are further cautioned against sitting in vehicles and observing, as
several of my students have been approached by security or police officers
who questioned them about their actions. The lesson here is that, while
much information can be taken in through hearing conversations, without
the body language, meanings can be misconstrued. Further, they usually find
it interesting to make guesses about the participants in terms of age, gender,
ethnicity, and relationship to other participants in the setting, based on what
they heard. [75]
In both of these examples, it is especially interesting when one student
conducts the sight without sound and another student conducts the
sound without sight exercise using the same interaction/setting, as their
explanations, when shared in class, sometimes illustrate how easy it is to put
one’s own construction on what is actually happening. [76]
Photographic Observation—This exercise encourages students to use
photographs to help them remember activities, and photographs can serve as
illustrations of aspects of activities that are not easily described. Students are
asked to take a series of 12 to 36 photographs of an activity, and provide a
written description of the activity that tells the story of what is happening in
the activity, photo by photo. They are instructed to number the photographs
and take notes as they take pictures to help them keep the photos organized
in the right sequence. Several students have indicated that this was a fun
exercise in which their children, who were the participants in the activity,
were delighted to be involved; they also noted that this provided them with
a pictographic recollection of a part of their children’s lives that would be
a keepsake. One student recorded her 6-year-old daughter’s first formal tea
party, for example. [77]
Direct Observation—In this instance, students are asked to find a
setting they wish to observe in which they will be able to observe without
interruption and in which they will not be participating. For some specified
length of time (about 15 to 30 minutes), they are asked to record everything
they can take in through their senses about that setting and the interactions
contained therein for the duration of the time period, again recording on one
side of the paper their field notes from observation and on the other side their
thoughts, feelings, and ideas about what is happening. Part of the lesson here
is that, when researchers are recording aspects of the observation, whether
it be the physical characteristics of the setting or interactions between
participants, they are unable to both observe and record. This exercise is
also good practice for getting them to write detailed notes about what is or
is not happening, about the physical surroundings, and about interactions,
particularly conversations and the nonverbal behaviors that go along with
those conversations. [78]
Participant Observation—Students are asked to participate in some
activity that takes at least 2 hours, during which they are not allowed to
take any notes. Having a few friends or family members over for dinner is
a good example of a situation where they must participate without taking
notes. In this situation, the students must periodically review what they want
to remember. They are instructed to remember as much as possible, then
record their recollections in as much detail as they can remember as soon as
possible after the activity ends. Students are cautioned not to talk to anyone
or drink too much, so their recollections will be unaltered. The lesson here
is that they must consciously try to remember bits of conversation and other
details in chronological order. [79]
When comparing their field notes from direct observation to participant
observation, the students may find that their notes from direct observation
(without participation) are more detailed and lengthy than with participant
observation; however, through participation, there is more involvement in
the activities under study, so there is likely to be better interpretation of
what happened and why. They also may find that participant observation
lends itself better to recollecting information at a later time than direct
observation. [80]

SUMMARY
Participant observation involves the researcher’s involvement in a variety
of activities over an extended period of time that enable him/her to observe
the cultural members in their daily lives and to participate in their activities
to facilitate a better understanding of those behaviors and activities. The
process of conducting this type of field work involves gaining entry into
the community, selecting gatekeepers and key informants, participating in
as many different activities as are allowable by the community members,
clarifying one’s findings through member checks, formal interviews, and
informal conversations, and keeping organized, structured field notes
to facilitate the development of a narrative that explains various cultural
aspects to the reader. Participant observation is used as a mainstay in field
work in a variety of disciplines, and, as such, has proven to be a beneficial
tool for producing studies that provide accurate representation of a culture.
This paper, while not wholly inclusive of all that has been written about this
type of field work methods, presents an overview of what is known about
it, including its various definitions, history, and purposes, the stances of the
researcher, and information about how to conduct observations in the field.
[81]

Notes
1) Validity is a term typically associated with quantitative research;
however, when viewed in terms of its meaning of reflecting what
is purported to be measured/observed, its use is appropriate.
Validity in this instance may refer to content validity, face validity
or trustworthiness as described by LINCOLN and GUBA (1994).
2) Many years after MEAD studied the Samoan girls, FREEMAN
replicated MEAD’s study and derived different interpretations.
FREEMAN’s study suggested that MEAD’s informants had
misled her by telling her what they wanted her to believe, rather
than what was truthful about their activities.
REFERENCES
1. Adler, Patricia A. & Adler, Peter (1987). Membership roles in field
research. Newbury Park, CA: Sage.
2. Adler, Patricia A. & Adler, Peter (1994). Observation techniques.
In Norman K. Denzin & Yvonna S. Lincoln (Eds.), Handbook of
qualitative research (pp.377-392). Thousand Oaks, CA: Sage.
3. Agar, Michael H. (1980). The professional stranger: an informal
introduction to ethnography. San Diego: Academic Press.
4. Angrosino, Michael V. & Mays dePerez, Kimberly A. (2000). Rethinking
observation: From method to context. In Norman K. Denzin & Yvonna
S. Lincoln (Eds.), Handbook of Qualitative Research (second edition,
pp.673-702), Thousand Oaks, CA: Sage.
5. Bernard, H. Russell (1994). Research methods in anthropology:
qualitative and quantitative approaches (second edition). Walnut
Creek, CA: AltaMira Press.
6. Bernard, H. Russell (Ed.) (1998). Handbook of methods in cultural
anthropology. Walnut Creek, CA: AltaMira Press.
7. Breuer, Franz & Roth, Wolff-Michael (2003, May). Subjectivity and
reflexivity in the social sciences: epistemic windows and methodical
consequences [30 paragraphs]. Forum Qualitative Sozialforschung /
Forum: Qualitative Social Research [On-line Journal], 4(2), Art.25.
Available at http://www.qualitative-research.net/fqs-texte/2-03/2-
03intro-3-e.htm [April, 5, 2005].
8. deMunck, Victor C. & Sobo, Elisa J. (Eds) (1998). Using methods in
the field: a practical introduction and casebook. Walnut Creek, CA:
AltaMira Press.
9. DeWalt, Kathleen M. & DeWalt, Billie R. (1998). Participant
observation. In H. Russell Bernard (Ed.), Handbook of methods in
cultural anthropology (pp.259-300). Walnut Creek, CA: AltaMira Press.
10. DeWalt, Kathleen M. & DeWalt, Billie R. (2002). Participant
observation: a guide for fieldworkers. Walnut Creek, CA: AltaMira
Press.
11. Ellis, Carolyn (2003, May). Grave tending: with mom at the cemetery [8
paragraphs]. Forum Qualitative Sozialforschung / Forum: Qualitative
Social Research [On-line Journal], 4(2), Art.28. Available at http://
www.qualitative-research.net/fqs-texte/2-03/2-03ellis-e.htm [April 5,
2005].
12. Erlandson, David A.; Harris, Edward L.; Skipper, Barbara L. & Allen,
Steve D. (1993). Doing naturalistic inquiry: a guide to methods.
Newbury Park, CA: Sage.
13. Fine, Gary A. (2003). Towards a peopled ethnography developing
theory from group life. Ethnography, 4(1), 41-60.
14. Gaitan, Alfredo (2000, November). Exploring alternative forms
of writing ethnography. Review Essay: Carolyn Ellis and Arthur
Bochner (Eds.) (1996). Composing ethnography: Alternative forms of
qualitative writing [9 paragraphs]. Forum Qualitative Sozialforschung
/ Forum: Qualitative Social Research [On-line Journal], 1(3), Art.42.
Available at: http://www.qualitative-research.net/fqs-texte/3-00/3-
00review-gaitan-e.htm [April, 5, 2005].
15. Gans, Herbert J. (1999). Participant observation in the era of
“ethnography.” Journal of Contemporary Ethnography, 28(5), 540-
548.
16. Geertz, Clifford (1973). Thick description: Towards an interpretive
theory of culture. In Clifford Geertz (Ed.), The interpretation of
cultures (pp.3-32). New York: Basic Books.
17. Glantz, Jeffrey & Sullivan, Susan (2000). Supervision in practice: 3
Steps to improving teaching and learning. Corwin Press, Inc.
18. Glickman, Carl D.; Gordon, Stephen P. & Ross-Gordon, Jovita
(1998). Supervision of instruction (fourth edition). Boston: Allyn &
Bacon.
19. Gold, Raymond L. (1958). Roles in sociological field observations. Social
Forces, 36, 217-223.
20. Holman Jones, Stacy (2004, September). Building connections in
qualitative research. Carolyn Ellis and Art Bochner in conversation
with Stacy Holman Jones [113 paragraphs]. Forum Qualitative
Sozialforschung / Forum: Qualitative Social Research [On-line
Journal], 5(3), Art.28. Available at http://www.qualitative-research.
net/fqs-texte/3-04/04-3-28-e.htm [April 5, 2005].
21. Johnson, Allen & Sackett, Ross (1998). Direct systematic observation
of behavior. In H. Russell Bernard (Ed.), Handbook of methods in
cultural anthropology (pp.301-332). Walnut Creek: AltaMira Press.
22. Kawulich, Barbara B. (1998). Muscogee (Creek) women’s perceptions
of work (Unpublished doctoral dissertation, Georgia State University).
Participant Observation as a Data Collection Method 87

23. Kawulich, Barbara B. (2004). Muscogee women’s identity


development. In Mark Hutter (Ed.), The family experience: a reader in
cultural diversity (pp.83-93). Boston: Pearson Education.
24. Kottak, Conrad P. (1994). Cultural anthropology (sixth edition). New
York: McGraw-Hill.
25. Kroeber, Alfred L. (1939). Cultural and natural areas of Native North
America. Berkeley: University of California Press.
26. Kutsche, Paul (1998). Field ethnography: a manual for doing cultural
anthropology. Upper Saddle River, NJ: Prentice Hall.
27. Levi-Strauss, Claude (1953). Social structure. In Alfred L. Kroeber
(Ed.), Anthropology today (pp.24-53). Chicago: University of Chicago
Press.
28. Lincoln, Yvonna S. & Guba, Egon G. (1985). Naturalistic inquiry.
Beverly Hills, CA: Sage.
29. Marshall, Anne & Batten, Suzanne (2004, September). Researching
across cultures: issues of ethics and power [17 paragraphs]. Forum
Qualitative Sozialforschung / Forum: Qualitative Social Research [On-
line Journal], 5(3), Art.39. Available at: http://www.qualitative-
research.net/fqs-texte/3-04/04-3-39-e.htm [April 5, 2005].
30. Marshall, Catherine & Rossman, Gretchen B. (1989). Designing
qualitative research. Newbury Park, CA: Sage.
31. Marshall, Catherine & Rossman, Gretchen B. (1995). Designing
qualitative research. Newbury Park, CA: Sage.
32. Merriam, Sharan B. (1988). Case study research in education: a
qualitative approach. San Francisco: Jossey-Bass Publishers.
33. Merriam, Sharan B. (1998). Qualitative research and case study
applications in education. San Francisco: Jossey-Bass Publishers.
34. Pike, Kenneth L. (1966). Emic and etic standpoints for the description of
behavior. In Alfred G. Smith (Ed.), Communication and culture (pp.52-
163). New York: Holt, Reinhart & Winston.
35. Ratner, Carl (2002, September). Subjectivity and objectivity in
qualitative methodology [29 paragraphs]. Forum Qualitative
Sozialforschung / Forum: Qualitative Social Research [On-line
Journal], 3(3), Art.16. Available at: http://www.qualitative-research.
net/fqs-texte/3-02/3-02ratner-e.htm [April 5, 2005].
88 Advanced Techniques for Collecting Statistical Data

36. Schensul, Stephen L.; Schensul, Jean J. & LeCompte, Margaret D.


(1999). Essential ethnographic methods: observations, interviews, and
questionnaires (Book 2 in Ethnographer›s Toolkit). Walnut Creek, CA:
AltaMira Press.
37. Schmuck, Richard (1997). Practical action research for change.
Arlington Heights, IL: IRI/Skylight Training and Publishing.
38. Spradley, James P. (1979). The ethnographic interview. Fort Worth:
Harcourt Brace Jovanovich College Publishers.
39. Spradley, James P. (1980). Participant observation. New York: Holt,
Rinehart and Winston.
40. Spradley, James P. & McCurdy, David W. (1972). The Cultural
Experience. Chicago: Science Research Associates.
41. Steward, Julian H. (1955). Theory of culture change: the methodology
of multilinear evolution. Urbana: University of Illinois Press.
42. Taylor, Steven J. & Bogdan, Robert (1984). Introduction to qualitative
research: The search for meanings (second edition). New York: John
Wiley.
43. Werner Oswald & Schoepfle, G. Mark (1987). Systematic fieldwork:
Vol. 1. Foundations of ethnography and interviewing. Newbury Park,
CA: Sage Publications.
44. Whyte, William F. (February, 1979). On making the most of participant
observation. The American Sociologist, 14, 56-66.
45. Wolcott, Harry F. (2001). The art of fieldwork. Walnut Creek, CA:
AltaMira Press.
Chapter 6

ATTITUDES TOWARDS PARTICIPATION IN A PASSIVE DATA COLLECTION EXPERIMENT

Bence Ságvári 1,2, Attila Gulyás 1, and Júlia Koltai 1,3,4

1 Computational Social Science—Research Center for Educational and Network Studies (CSS–RECENS), Centre for Social Sciences, Tóth Kálmán Utca 4, 1097 Budapest, Hungary
2 Institute of Communication and Sociology, Corvinus University, Fővám tér 8, 1093 Budapest, Hungary
3 Department of Network and Data Science, Central European University, Quellenstraße 51, 1100 Vienna, Austria
4 Faculty of Social Sciences, Eötvös Loránd University, Pázmány Péter Sétány 1/A, 1117 Budapest, Hungary

Citation (APA): Ságvári, B., Gulyás, A., & Koltai, J. (2021). Attitudes towards Participation in a Passive Data Collection Experiment. Sensors, 21(18), 6085. (18 pages)

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

ABSTRACT
In this paper, we present the results of an exploratory study conducted in Hungary using a factorial design-based online survey to explore the willingness to participate in a future research project based on active and
passive data collection via smartphones. Recently, the improvement of
smart devices has enabled the collection of behavioural data on a previously
unimaginable scale. However, the willingness to share this data is a key
issue for the social sciences and often proves to be the biggest obstacle
to conducting research. In this paper we use vignettes to test different
(hypothetical) study settings that involve sensor data collection but differ in
the organizer of the research, the purpose of the study and the type of collected
data, the duration of data sharing, the number of incentives and the ability to
suspend and review the collection of data. Besides the demographic profile
of respondents, we also include behavioural and attitudinal variables to the
models. Our results show that the content and context of the data collection
significantly changes people’s willingness to participate, however their
basic demographic characteristics (apart from age) and general level of trust
seem to have no significant effect. This study is a first step in a larger project
that involves the development of a complex smartphone-based research tool
for hybrid (active and passive) data collection. The results presented in this
paper help improve our experimental design to encourage participation by
minimizing data sharing concerns and maximizing user participation and
motivation.

Keywords: data fusion, surveys, informed consent

INTRODUCTION
Smartphone technologies combined with the improvement of cloud-based research architecture offer great opportunities in the social sciences. The most common methodology in the social sciences is still the use of surveys and
other approaches that require the active participation of research subjects.
However, there are some areas that are best researched not through surveys,
but rather by observing individuals’ behaviour in a continuous social
experiment. Mobile technologies make it possible to observe behaviour on a
new level by using raw data of various kinds collected by our most common
everyday companion: our smartphone. Moreover, since smartphones shape
our daily lives thanks to various actions available through countless apps, it
is logical to consider them as a platform for actual research.
There have been numerous research projects that have relied on
collecting participants’ mobile sensor and app usage data, but the biggest
concern has been the willingness to share this data. Privacy and trust
Attitudes towards Participation in a Passive Data Collection Experiment 91

concerns both contribute to people’s unwillingness to provide access to


their personal data, and uncovering these attitudes is a critical step for any
successful experimental design.
In this paper, we present the results of our pre-experimental survey
to uncover prospective participants’ attitudes toward sharing their mobile
sensor and app usage data. This experiment is part of a larger research
and software development project aimed at creating a modular active and
passive data collection tool for smartphones that could be used in social and
health research.
For this study we used data from an online survey representative of
Internet users in Hungary. The aim of the survey was to analyse respondents’
attitudes (and not actual behaviour) towards using a hypothetical research
app that performs active and passive data collection.
The following section provides further details on the background of
active/passive data collection and an outlook on results from other studies.
We then discuss the details of the online panel used in our study, the survey
design and the models used in the analysis. After presenting our results, we
conclude by mentioning some open questions and limitations that can be
addressed in further steps of this study.

BACKGROUND

Surveys, Active and Passive Data Collection


In recent decades survey methods have been the main research tools in
the social sciences. Technological advances have not changed that, but
rather expanded it. Traditional paper-and-pencil interviews (PAPI) and
surveys quickly adopted new technologies: interviews were conducted over the telephone (regular telephone surveys) and, as computers became mainstream, computer-aided survey methods emerged.
This development took another leap when smartphone applications
emerged along with cloud-based services and smartphones suddenly
became a viable platform for collecting survey data [1,2,3,4]. Although self-
reported surveys generally suffer from bias for a variety of reasons [5,6,7,8],
conducting surveys with smartphones is a very cost-effective method of data
collection that also opens up opportunities to collect other non-survey types
of data. Such data include location information, application usage, media consumption, etc., all of which provide better insight into the behaviour and
social connections of individuals [9,10,11,12,13]. More importantly, since
it is behavioural data, it is much less prone to bias, unlike ordinary surveys.
The collection of data is divided into two main categories depending on the participant's interaction with their smartphone: active and passive data collection.
Active data collection means that an action by the participant is required
to generate the collected information, and the participant is prompted by
the research application to provide this information. This means that the
participant triggers phone features (taking photos, recording other types of
data, actively sending a location tag) while also giving consent for this data
to be sent to the researching institution. Submitting surveys or survey-like
inputs (e.g., gathering attitudes or moods) [14,15,16] can also be considered
a form of active data collection.
Passive data collection, on the other hand, means that sensor data from
the smartphone is collected and sent periodically without the participant
knowing that data was collected at any given time. There are various
sensors that can be used in a smartphone: multiple location-based sensors
(GPS, gyroscopes), accelerometers, audio sensors, Bluetooth radios, Wi-Fi
antennas, and with the advancement of technology, many other sensors, such as pulse or blood pressure sensors. In the field of healthcare, such passive
data collection is becoming the main solution for health monitoring in the
elderly or in other special scenarios [17,18,19].
Obviously, such data collection approaches can be combined to provide instant data linkages [20], which can then be used to provide even richer information; e.g., pulse measurements taken while conducting surveys and answering questions can validate responses.
In order to conduct such data collection in a legally and ethically
acceptable manner, informed consent must be given by participants for every
aspect of the data collection. With the introduction of the GDPR, there are clear
requirements for recording participants’ consents and handling their data,
the key feature being that they can withdraw their consent at any time during
their participation in an experiment.
For smartphone apps, the default requirement is that access to data and
sensor information must be explicitly permitted by the user of the device.
However, this consent does not apply to the sharing of data with third parties, in this case the researching institution. The participants must give their
explicit consent for their data to be collected and transferred from their
device to a location unknown to them. Similarly, the researching institution
must ensure proper handling of the data and is responsible to the participants
for the security of their data.
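
A sketch of what such consent bookkeeping might look like as a data structure; the stream names and the ledger API are hypothetical, but the design mirrors the two requirements above: explicit per-stream consent, and withdrawal at any time.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class ConsentRecord:
    """One consent entry per data stream (e.g., 'gps'); names are illustrative."""
    stream: str
    granted_at: float
    withdrawn_at: Optional[float] = None  # GDPR: withdrawal possible at any time

    @property
    def active(self) -> bool:
        return self.withdrawn_at is None

class ConsentLedger:
    """Records explicit, per-stream consent and gates every collection attempt."""

    def __init__(self) -> None:
        self._records: Dict[str, ConsentRecord] = {}

    def grant(self, stream: str) -> None:
        self._records[stream] = ConsentRecord(stream, granted_at=time.time())

    def withdraw(self, stream: str) -> None:
        rec = self._records.get(stream)
        if rec is not None and rec.active:
            rec.withdrawn_at = time.time()

    def may_collect(self, stream: str) -> bool:
        rec = self._records.get(stream)
        return rec is not None and rec.active
```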
Several studies have found that people are generally reluctant to share
their data when it comes to some form of passive data collection [21,22,23],
mostly due to privacy concerns. However, people who frequently use
their smartphones are less likely to have such concerns [23]. Over the past
decade, the amount of data collected by various organizations has increased
dramatically. This includes companies with whom users share their data with their consent [24], but users are probably unaware of the amount of data they are sharing and how exactly it is exploited for commercial purposes.
Several studies have found that people are much more likely to share
data when they are actively engaged in the process (e.g., sending surveys,
taking photos, etc.) than when they passively share sensor data [15,23].
This lower participation rate is influenced by numerous factors, so people’s
willingness to share data is itself an interesting research question.

Willingness to Share Data


As detailed as such data can be, participation rates in such experiments show
diverse results, but generally, they are rather low when it comes to passive
data collection. In what follows, we will refer to the participation rate as
“willingness to participate” or WTP, a commonly used abbreviation in this
context. We have collected benchmark data from relevant articles studying
WTP in various passive data collection scenarios. As Table 1. shows,
WTP is mostly below 50%, both for cases where passive data collection
is complemented by a survey and for cases where it is not. Although not
evident from this summary, the presence of controlled active data collection
had a positive effect on participation; however, only Bricka et al. [25] conducted an experiment comparatively analysing the presence of active data collection.
Table 1: The ratio of willingness to share data in selected studies

Study | Passive Data Collection Contents | Willingness to Participate (WTP) | Note
Biler et al. (2013) [26] | GPS data | 8% |
Kreuter et al. (2019) [21] | mobile phone network quality, location, interaction history, characteristics of the social network, activity data, smartphone usage | 15.95% |
Toepoel and Lugtig (2014) [27] | GPS data | 26% | one-time, after a survey
Bricka et al. (2009) [25] | GPS with survey | 30–73% | in this study, the participants would fill out multiple surveys
Bricka et al. (2009) [25] | GPS only | 12–27% |
Pinter (2015) [28] | location | 42% | this was only claimed willingness, not actual downloads of an application
Revilla et al. (2016) [22] | GPS data | 17.01–36.84% | min–max willingness rate of mobile/tablet users from the following countries: Argentina, Brazil, Chile, Colombia, Spain, Mexico, Portugal
Revilla et al. (2017) [29] | web activity | 30–50% |
Scherpenzeel (2017) [30] | location (GPS, Wi-Fi, cell) | 81% |
Wenz (2019) [23] | GPS | 39% |
Wenz (2019) [23] | usage | 28% |

Most of these studies required the participant to provide location information when filling out a questionnaire, sometimes just a snippet of
it. Yet willingness to share this information is particularly low. This result
is perplexing considering that most smartphone users share their location
data with other apps (often not even in the context of providing location
information). Google services and shopping apps are typical examples of location data users.
The only outlier in this table is the study reported by Scherpenzeel
[30], where the participation rate is suspiciously high. The participants in
this study were panellists who had already completed a larger survey panel
for the institution, so there was neither a trust barrier to overcome nor an
increased participation burden.
Mulder and de Bruijne [15] went deeper in their study and measured willingness on a 7-point scale (1 = very unlikely to participate; 7 = very likely to participate) for different data collection types. In their sample, the mean
willingness to participate in passive data collection was 2.2, indicating a
very low willingness of respondents to participate. In the same study, they
found a mean of 4.15 for participating in a traditional PAPI survey study and
3.62 for completing the survey via an app. Thus, the difference between the
different ways of completing the survey was not large, but the inclusion of
passive data collection had a strong negative impact.
Given the participation rates for regular surveys in general, these even
lower numbers are not very surprising. However, to conduct a successful
experiment with an acceptable participation rate, it is important to identify the causes that lower the participation rate. In the following, we will look at
some factors that have been analysed in different studies.

Importance of Institutional Trust


Trust in the institution collecting the data was found to be a key factor in
the willingness to share data [21,31,32]. Several studies have examined
the researching institution’s role on willingness to share passive data.
Participants’ main concern regarding data collection is the privacy of their
data. It is important to emphasise that a brief indication that the data will not
be shared with third parties does not really generate trust among the users of
an application, but rather the provider of the application influences it.
Keusch et al. found that people are about twice as likely to trust research institutions not to share their sensitive data [21]. They measured WTP using an 11-point scale in a survey of panellists. By halving this scale to obtain a dichotomous WTP variable, they found that WTP was similar for all three types of institutions (ranging from 33.1% to 36.9%). However, in their further analysis, they found that WTP was significantly higher in the case of universities and statistical offices than for market research firms. Note, however, that in this study no participants downloaded an app; these results only show theoretical readiness.
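
As a small illustration of this kind of dichotomization in Python: the cut at 6 on the 0-10 scale is our reading of what "halving" an 11-point scale means, not a detail reported in [21].

```python
import pandas as pd

# Hypothetical 0-10 WTP ratings; scores of 6-10 are treated as "willing"
# and 0-5 as "not willing" (the exact cut point is an assumption).
wtp = pd.Series([0, 3, 5, 6, 8, 10])
willing = wtp >= 6
print(f"dichotomous WTP: {willing.mean():.0%} willing")
```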
Struminskaya et al. found similar results in their study [32], where
they tested hypotheses comparing WTP for universities, statistical offices,
and market research companies. They found that the WTP reported by
respondents is significantly highest for universities, followed by statistical
offices and finally for market research firms.
A practical result for this factor can be found in the study by Kreuter et al.
[18], where participants were asked to download a research app sponsored
by a research institution on their phone. In their study, they found a WTP of
15.9%, which was the app download rate.

Control Over the Data Collection


Passive data collection poses some risk to the user due to the lack of control
over the data collection. Here, we consider the ability to temporarily suspend
passive data collection from the app as “control” over data collection.
Of course, it is possible to prevent an app from collecting data (disabling
location services or turning off the mobile device’s Wi-Fi antenna), but here
we refer to the case where the experimental app provides a built-in ability, by design, to suspend data collection.
The best recent example of such an application was provided by Haas
et al., where subjects could individually choose which data the application
should collect [33]. In their application, users had to give their consent to
individual properties that the application could record: network quality and location information; interaction history; social network properties; activity data; smartphone usage (Figure 13.2/a in [33]). They found that only a small percentage (20%) of participants changed these options after installing the app and only 7% disabled a data sharing feature.
In their study, Keusch et al. specifically asked about WTP when the participant was able to temporarily suspend the data collection component of the app altogether and found a positive correlation [21]. This differs from the total control that Haas et al. offered participants, who allowed the "level" of sharing to be adjusted rather than turned on and off [33].
Another form of control, the ability to review and change recorded data, was present in Struminskaya et al.'s survey. The corresponding item in their survey was rather vague, which may explain why they did not find a significant effect for it [32].
Incentives
Another way to improve WTP is to provide monetary incentives for
participation. Haas et al. focused their analysis on different types of
incentives paid at different points in an experiment [33]. Incentives can be
given in different time frames for different activities of the participants. In
terms of time frame, it is common to offer incentives for installing a research
application, at the end of the survey, but it is also possible to offer recurring
incentives. Another option for incentives is to offer them based on “tiers” of
how much data participants are willing to provide.
In their study, Haas et al. also examined the impact of incentives on
the installation and rewarded sharing of various sensor data. There was a
positive effect of initial incentives, but interestingly, they did not find the
expected positive effect of incentives on granting access to more data-sharing
functions. Another interesting finding was that a higher overall incentive did
not increase participants’ willingness to have the application installed over a
longer experimental period.
In addition to these findings, their overall conclusion was that incentives improve participation similarly to regular survey studies. The
results of Keusch et al. also support this finding [21].

Other Factors
Keusch et al. [21] found that a shorter experimental period (one month as
opposed to six months) and monetary incentives increased willingness to
participate in a study. As another incentive, Struminskaya et al. [32] found
that actual interest in the research topic (participants can receive feedback on
research findings) is also a positive factor for increased level of participation.
Finally, participants’ limited ability to use devices was also found to be a
factor in the study by Wenz et al. They found that individuals who rated their
own usage abilities as below average (below 3 on a 5-point scale) showed
a significantly lower willingness to participate, especially in passive data
collection tasks. [23] On the other hand, those who reported advanced phone
use skills were much more willing to participate in such tasks.
Although not necessarily related to age, Mulder and Bruinje found in
another study that willingness to participate decreased dramatically after age
50 [15]. These results indicate that usability is important when designing a
research application.
As these results show, there are many details to analyse when designing an experiment that relies on passive data collection. Some of the studies used surveys to uncover various latent characteristics that influence willingness to participate, while others deployed a working research application to gather practical usage information.

Given that many studies reported low WTP scores, we concluded that it is very important to conduct a preliminary study before elaborating the final design of such an experiment. Therefore, the goal of this work is to figure out how we can implement a research tool that motivates participation in the study while still collecting a useful amount of information.

METHODS AND DESIGN


To collect the information on WTP needed to design and fine-tune our research ecosystem and its user interface components, we decided to conduct an online vignette survey using a representative sample of smartphone users in Hungary. In this section, we first formulate our research questions and then present our methods and models for answering them.

Research Questions
Because the focus of our study is exploratory in nature, we did not formulate
explicit research hypotheses, but designed our models and the survey to be
able to answer the following questions:
• Q1. What is the general level of WTP in a passive data collection study?
In order to have a single benchmark and to allow comparison with similar studies, we asked a simple question about whether or not respondents would be willing to participate in a study built on smartphone-based passive data collection.
• Q2. What features of the research design would motivate people to participate in the study?
We included several questions in our survey that address key features of the study: the type of institute conducting the experiment, the type of data collected, the length of the study, monetary incentives, and control over data collection. We wanted to know which of these features should be emphasized to maximize WTP.
• Q3. What kind of demographic attributes influence WTP?
As mentioned in previous studies, age may be an important factor for participation in our study, but we also considered other characteristics, such as gender, education, type of settlement, and geographic region of residence.
• Q4. What is the role of trust-, skills-, and privacy-related contextual factors on WTP?
As previous results suggest, trust, previous (negative) experiences and privacy concerns might be key issues in how people react to various data collection techniques. We used composite indicators to measure the effect of interpersonal and institutional trust, smartphone skills and usage, and general concerns over active and passive data collection methods on WTP.

Survey and Sample Details


Data collection for this study was conducted by a market research firm
using its online access panel of 150,000 registered users. The sample is
representative of Hungarian Internet users in terms of gender, age, education
level, type of settlement and geographical region. The online data collection
ran from 9 to 20 June 2021. The average time to complete the survey was
15 min. Basic descriptive statistics of the sample are shown in Table A1 in
Appendix A.
Apart from a few single items, the survey consisted of thematic blocks
of multiple-choice or Likert-scale questions. Among others, we asked
respondents about interpersonal and institutional trust, general smartphone
use habits, and concerns about various active and passive digital data
collection techniques using smartphones. The items on trust in the survey
were adapted from the European Social Survey (ESS), so they are well-tested
and have been used for a long time. With the exception of the last block of
the questionnaire, all questions were the same for all respondents. In the last
block, a special factorial survey technique was used to ask questions about
willingness to participate in a hypothetical smartphone-based passive data collection study [34,35].
The factorial survey included situations, called “vignettes”, in which
several dimensions of the situation are varied. The vignettes described
situations of a hypothetical data collection study and respondents had to
decide how likely they would be willing to participate. An example of a
vignette is shown in Box 1, with the varying dimensions of the situations
underlined, and the exact wording of the outcome variable (WTP).

Box 1. An example of a vignette.

Five dimensions were varied in the vignettes with the following values:
• The organizer of the research: (1) decision-makers, (2) a private
company, (3) scientific research institute.
• Data collected: (1) spatial movement, (2) mobile usage, (3)
communication habits, (4) spatial movement & mobile usage, (5)
spatial movement & communication habits, (6) mobile usage &
communication habits, (7) all three.
• Length of the research: (1) one month, (2) six months.
• Incentive: (1) HUF 5000 after installing the application, (2)
HUF 5000 after the completion of the study, (3) HUF 5000 after
installing the application and HUF 5000 after the completion of
the study.
• Interruption and control: (1) user cannot interrupt the data
collection, (2) user can temporarily interrupt the data collection,
(3) user can temporarily interrupt the data collection and review
the data and authorize its transfer.
Following Jasso, the creation of the vignettes proceeded as follows [35]:
First, we created a “universe” in which all combinations of the dimensions
described above were present, which included 378 different situations. From
these 378 situations, we randomly selected 150 and assigned them, also
randomly, to 15 different vignette blocks, which we call decks. Here, each
deck included 10 different vignettes and an additional control vignette to
test the internal validity of the experiment. The content of this last vignette
was the same as a randomly selected vignette from the first nine items
previously evaluated. The results show a high degree (64%) of consistency
between responses to the same two vignettes, suggesting a satisfactory level
of internal validity (see Appendix B for details on the analysis of this test).
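
Schematically, the construction just described can be reproduced in a few lines of Python; the shortened dimension labels and the fixed random seed are our assumptions (the paper does not report a seed).

```python
import itertools
import random

# Dimension levels as described above: 3 x 7 x 2 x 3 x 3 = 378 combinations.
organizer = ["decision-makers", "private company", "research institute"]
data      = ["movement", "usage", "communication", "movement+usage",
             "movement+communication", "usage+communication", "all three"]
length    = ["one month", "six months"]
incentive = ["after install", "after completion", "both"]
control   = ["no interruption", "interruption", "interruption + review"]

universe = list(itertools.product(organizer, data, length, incentive, control))
assert len(universe) == 378

rng = random.Random(42)             # seed is ours, for reproducibility
sample = rng.sample(universe, 150)  # 150 randomly selected situations
rng.shuffle(sample)

# 15 decks of 10 vignettes each ...
decks = [sample[i * 10:(i + 1) * 10] for i in range(15)]

# ... plus one control vignette per deck, duplicating a randomly chosen
# vignette from the first nine to test internal (test-retest) validity.
decks_with_control = [deck + [rng.choice(deck[:9])] for deck in decks]
```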
In this manner, each respondent completed one randomly assigned deck
with 10 + 1 vignettes. In total, 11,000 vignettes were answered by 1000
participants. (Data from the 11th vignette were excluded from the analysis).
The descriptive statistics of the vignette dimensions are presented in Table
2.

Table 2: The descriptive statistics of the dimensions of vignettes

Organizer of the research: decision-makers 35.3% | private company 34.2% | scientific research institute 30.4%
Data collected: spatial movement 12.3% | mobile usage 15.8% | communication habits 13.5% | spatial movement & mobile usage 18.2% | spatial movement & communication habits 11.3% | mobile usage & communication habits 15.1% | all three 13.8%
Length of the research: one month 48.1% | six months 51.9%
Incentive: HUF 5000 after installing the application 33.7% | HUF 5000 after the completion of the research 32.6% | HUF 5000 after installing the application and HUF 5000 after the completion of the research 33.6%
Interruption and control: user cannot interrupt the data collection 34.2% | user can interrupt the data collection 31.7% | user can interrupt the data collection and has control over their data 34.2%
Willingness to participate: min. 0 | max. 10 | mean 4.50 | standard deviation 3.65

Notes: vignette level data, N = 10,000.


This technique allowed us to combine the advantages of large surveys
with the advantages of experiments. Due to the large sample size, the
analysis has strong statistical power, and we can also dissociate the effects
of different stimuli (dimensions) using multilevel analysis [36,37]. Thus, we
can examine the effect of multiple variables on the outcome variable (WTP
measured on a 0 to 10 scale).
In addition to vignette-level variables, we also included respondent
level variables in the model to examine how individual characteristics
influence the effects of vignette dimensions on participation. We included
both respondent-level sociodemographic variables and attitudinal variables
in the model. The sociodemographic variables were gender (coded as
males and females); age; education with four categories (primary school
or lower, vocational, high school, college); place of residence with the type
of settlement (capital city, county seat, town, village); and the seven major
regions of Hungary (Central Hungary, Northern Hungary, Northern Great
Plain, Southern Great Plain, Southern Transdanubia, Central Transdanubia,
and Western Transdanubia).
The attitudinal variables we used in the models were the following. First, the number of types of activities for which the respondent uses their smartphone: we queried 15 different activities (see Table A2 in Appendix A for the full list of activities) and simply counted the activities for which the respondent actively uses his or her smartphone.
The personal trust variable shows the average of responses for three
trust-related items (see Table A3 in Appendix A for details) measured on a
scale from 0 to 10, where 0 represents complete distrust and 10 represents
complete trust. We performed the same calculation for trust in institutions.
We listed several institutions (see Table A4 in Appendix A for the full list)
and asked respondents to indicate their level of trust on a scale of 0 to 10,
where “0” means they do not trust the institution at all and “10” means they
trust it completely.
We also included several digital data collection techniques and asked respondents how concerned they would be about sharing such information for scientific research, emphasizing that their data would only be used anonymously and in aggregated format without storing their personal information. The response options were 1 to 4, with 1 meaning "would be completely concerned" and 4 meaning "would not be concerned at all." In total, we asked about 18 different active and passive data collection techniques (see Table A5 in Appendix A for the full list of items), from which we formed two separate indices: 6 items measured active, and another 12 items measured passive data collection techniques. For both composite indicators, we counted scores of 1 and 2 (i.e., those more likely to indicate concern).
As statistical proof of the indices' internal consistency, we performed Cronbach's alpha tests, which proved to be acceptable in each case.
In addition to the sociodemographic variables and the composite indices,
we added two other variables: the time respondents spend online and use
their smartphones (in minutes) on an average day.
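
A sketch of how such composite indices and their reliability check could be computed; the column names and simulated responses are placeholders, not the study's data (real items would be positively correlated, so the simulated alpha will be near zero).

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the item sum)."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.integers(0, 11, size=(1000, 3)),
                  columns=["trust1", "trust2", "trust3"])  # simulated 0-10 items

# Personal trust index: the mean of the three 0-10 trust items.
df["personal_trust"] = df[["trust1", "trust2", "trust3"]].mean(axis=1)

# Concern index: count of passive-data items rated 1 or 2 ("rather concerned").
concerns = pd.DataFrame(rng.integers(1, 5, size=(1000, 12)),
                        columns=[f"passive_{i}" for i in range(1, 13)])
df["passive_concerns"] = (concerns <= 2).sum(axis=1)

print(cronbach_alpha(df[["trust1", "trust2", "trust3"]]))
```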

Analysis and Models


The analysis can be divided into four parts. First, we simply checked the descriptive results of the benchmark variable showing the general level of willingness to participate in a smartphone-based passive data collection study.

In the second step we constructed variance component models to understand the direct effect of the decks, by calculating how much of the total variance in the vignette outcome is explained by respondent characteristics vs. the deck of vignettes.

In the third part of the analysis, we created three linear regression models. These models were multilevel models, because the analyses were conducted at the vignette level, but each set of 10 vignettes was completed by the same subject. Thus, the assumption of observational independence, which is required in the case of general linear regression, did not hold. To control for these dependencies, we used multilevel mixed models. In the first model, we included only the independent variables at the vignette level. Then, in a second step, we added respondents' sociodemographic characteristics, as we assumed that these influence respondents' willingness to participate. In a third step, we additionally included composite indices of the attitudinal variables at the respondent level. In the final, fourth step of the analysis, we added cross-level interaction terms to the model to examine how the effects of the vignette-level dimensions vary with respondent-level characteristics.
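
A minimal sketch of this random-intercept setup using statsmodels' mixed-model interface on simulated data; the column names, codings, and simulated effect sizes are our assumptions, not the study's data or estimates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_resp, n_vign = 1000, 10  # 1000 respondents x 10 analysed vignettes
n = n_resp * n_vign

# Simulated long-format data standing in for the real ratings.
df = pd.DataFrame({
    "respondent_id": np.repeat(np.arange(n_resp), n_vign),
    "organizer": rng.choice(["decision-makers", "company", "institute"], n),
    "length_6m": rng.integers(0, 2, n),       # 1 = six months
    "two_incentives": rng.integers(0, 2, n),  # 1 = paid twice
    "control_review": rng.integers(0, 2, n),  # 1 = can suspend and review
})
df["wtp"] = np.clip(
    5 - 0.7 * df["length_6m"] + 0.4 * df["two_incentives"]
    + np.repeat(rng.normal(0, 2, n_resp), n_vign)  # respondent random intercept
    + rng.normal(0, 1, n), 0, 10)

# Model 1: vignette-level dimensions with a random intercept per respondent
# (Models 2-3 would add respondent-level covariates to the same formula).
m1 = smf.mixedlm("wtp ~ C(organizer) + length_6m + two_incentives + control_review",
                 data=df, groups=df["respondent_id"]).fit(reml=True)
print(m1.summary())
```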

RESULTS AND DISCUSSION


In general, 50 percent of respondents would participate in a study that includes the passive collection and sharing of data from the respondents' smartphones. The online access panel that was used for this survey includes panellists who take part in active data collection from time to time (i.e., filling out online surveys through their PCs, laptops or smartphones), so they presumably comprise a rather active and motivated segment of Hungarian internet users. But one in two of them seems to be open to passive data collection as well (Q1).
The vignettes used in the survey were designed to understand the internal motives and factors that shape the level of willingness. In the first step of this analysis, we built two intercept-only models, in which the dependent variable was the vignette outcome and there were no independent variables, only controls for the level of decks and the level of respondents. Based on the estimated covariance parameters, we could determine the share of variance explained by the different levels. The variance component models revealed that 77.6 percent of the total variance in the vignette outcome is explained by respondent characteristics and 1.4 percent is explained by the deck of vignettes. Thus, the effect of the decks (the design of the vignette study) is quite small.
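
The decomposition itself is simple arithmetic over the estimated variance components. The component values below are hypothetical, back-derived only so that the computed shares match the reported percentages (the paper reports only the shares):

```python
# Hypothetical variance-component estimates (assumed, not the paper's numbers).
var_respondent = 10.35  # between-respondent variance
var_deck = 0.19         # between-deck variance
var_residual = 2.80     # residual, vignette-level variance

total = var_respondent + var_deck + var_residual
print(f"respondent share: {var_respondent / total:.1%}")  # -> 77.6%
print(f"deck share:       {var_deck / total:.1%}")        # -> 1.4%
```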
We then created three multilevel regression models (Table 3). In the first
one (Model 1), we included only the independent variables at the vignette
level. Then, in a second step (Model 2), we added the socio-demographic
characteristics of the respondents, as we assumed that they influence the
respondents’ willingness to participate. In a third step (Model 3), we also
included composite indices of respondent-level attitudinal variables. Table
3 shows the results of the three models.

Table 3: Multilevel regression models on the willingness to participate

Dependent variable: Willingness to participate | Model 1 | Model 2 | Model 3
Intercept | 5.85 * (0.13) | 8.12 * (0.82) | 6.94 * (1.04)

Vignette level variables
Organizer of the research (ref: decision makers)
  Private company | 0.15 * (0.04) | 0.15 * (0.04) | 0.22 * (0.04)
  Scientific research institute | 0.29 * (0.04) | 0.29 * (0.04) | 0.36 * (0.05)
Data collected (ref: all three)
  Spatial movement | 0.09 (0.07) | 0.09 (0.07) | 0.05 (0.07)
  Mobile usage | −0.11 (0.06) | −0.11 (0.06) | −0.14 * (0.07)
  Communication habits | −0.02 (0.07) | −0.01 (0.07) | −0.09 (0.07)
  Movement & usage | −0.02 (0.06) | −0.02 (0.06) | −0.06 (0.07)
  Movement & communication | 0.03 (0.07) | 0.03 (0.07) | −0.05 (0.08)
  Usage & communication | 0.00 (0.06) | 0.00 (0.06) | −0.02 (0.07)
Length of the research (ref: one month) | −0.68 * (0.03) | −0.68 * (0.03) | −0.74 * (0.04)
Incentive (ref: after downloading the app & after the end of the research)
  After downloading the app | −0.43 * (0.04) | −0.43 * (0.04) | −0.44 * (0.05)
  After the end of the research | −0.47 * (0.04) | −0.47 * (0.04) | −0.49 * (0.05)
Interruption and control (ref: user can interrupt the data collection and has control over their data)
  User can interrupt the data collection | −0.13 * (0.04) | −0.13 * (0.04) | −0.17 * (0.05)
  User cannot interrupt the data collection | −0.59 * (0.04) | −0.59 * (0.04) | −0.64 * (0.04)

Respondent level socio-demographic variables
  Gender (ref: men) | | −0.30 (0.22) | −0.16 (0.22)
  Age (+: older) | | −0.04 * (0.01) | −0.02 * (0.01)
  Education (+: higher) | | −0.24 (0.13) | −0.20 (0.14)
  Type of settlement (+: smaller) | | 0.13 (0.12) | −0.02 (0.12)
  Region (ref: Western Transdanubia)
    Central Hungary | | 0.07 (0.41) | 0.29 (0.43)
    Northern Hungary | | 0.08 (0.47) | 0.52 (0.48)
    Northern Great Plain | | 0.02 (0.45) | 0.46 (0.48)
    Southern Great Plain | | 0.44 (0.46) | 0.62 (0.48)
    Southern Transdanubia | | 0.28 (0.48) | 0.35 (0.52)
    Central Transdanubia | | 0.72 (0.47) | 0.84 (0.50)

Respondent level attitude indices
  Smartphone activities (+: multiple) | | | 0.10 * (0.04)
  Personal trust (+: high) | | | 0.07 (0.05)
  Institutional trust (+: high) | | | 0.12 (0.07)
  Time spent online on an average day (minutes) | | | 0.00 (0.00)
  Time spent using their smartphone on an average day (minutes) | | | 0.00 (0.00)
  Number of active data collection types mentioned as rather worrying | | | −0.06 (0.09)
  Number of passive data collection types mentioned as rather worrying | | | −0.20 * (0.04)

AIC | 44,997.2 | 44,977.2 | 37,066.5
BIC | 45,011.8 | 44,991.8 | 37,080.7
Observations | 10,000 | 10,000 | 10,000

Notes: Standard errors in parentheses. * p < 0.001.


Results of Model 1 revealed that, compared to policymakers, respondents are significantly more likely to participate in research conducted by a private company (by an average of 0.15 points on a scale of 0 to 10) or a scientific research institute (by an average of 0.29 points). People are more willing (by an average of 0.68 points) to participate in a study that lasts only one month, compared to one that lasts six months. And not surprisingly, they would be more likely to participate in a study if they were paid twice instead of once, by about 0.44 points. The chance of participating is highest if the user can suspend the data collection at any time and view the collected data when needed. The two options of no suspension and suspension but no control over the data showed a lower chance of participation (by an average of 0.59 and 0.13 points, respectively). Interestingly, there were no significant differences by the purpose of the data collection and thus the type of data collected. Compared to the reference category, where all three types of data are requested, none of the other types of data collection showed a significantly lower or higher level of participation (Q2).
We included respondents’ sociodemographic variables in Model 2.
Including respondents’ sociodemographic variables did not really change the
effect of the vignette dimensions. Interestingly, none of the sociodemographic
characteristics have a significant effect on participation, with the exception
of age: in accordance with previous research, older individuals are less likely
to participate. The probability of participation decreases by 0.04 points with
each additional year (Q3).
In Model 3, we added respondents' attitudinal indices to the model. The addition of the respondent-level attitudinal variables did not really change the effects of the variables compared to the previous models. Of the attitudinal variables, none of the trust indices appears to have a significant effect; however, smartphone use and concerns about passive data collection do change the likelihood of participation. The more activities someone performs on their smartphone, and the more time they spend using it, the more likely they are to participate in such a study. The more types of passive data collection someone has concerns about, the less likely they are to participate (Q4).

Varying Effects of the Vignette-Level Variables among Respondents
In the next step we tested the vignette-level variables that had a significant
effect on willingness to participate to see if their effect differed across
respondents. These variables were the length of the study, the organizer
of the research, the type of incentive, and the possibility of suspension
and control. We set the slope of these variables to random (one at a time,
separately in different models) and tested whether they were significant,
that is, whether the effects varied across respondents. To achieve convergent
models, we transformed some of the vignette-level variables into dummies.
The transformation was based on the results of the previous models and
categorized together those values that showed the same direction of
effects. The organizer of the research was coded as: (a) private company or
scientific research institute vs. (b) policymakers. The type of incentive was
categorized as (a) only one vs. (b) two incentives given. The opportunity of
suspension was transformed into (a) no opportunity or there is opportunity
vs. (b) opportunity with control over transferred data. The results showed
that all of these variables had significant random slopes, so all effects varied
between respondents.
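
In statsmodels, letting one vignette-level effect vary across respondents amounts to adding a random slope via re_formula. A sketch, reusing a data frame like the one simulated in the Methods section (the dummy names follow the recoding just described and are assumptions):

```python
import statsmodels.formula.api as smf

# One random-slope model at a time, here for the length of the study;
# `df` is a long-format frame like the one sketched earlier.
m_slope = smf.mixedlm(
    "wtp ~ length_6m + two_incentives + control_review",
    data=df, groups=df["respondent_id"],
    re_formula="~length_6m",  # random intercept + random slope for length
).fit(reml=True)
print(m_slope.cov_re)  # a non-trivial slope variance means the effect varies
```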

Interaction of Vignette- and Respondent-Level Variables on the Willingness to Participate
We also tested all interaction terms of those vignette variables that had
significant random slopes (length of the study, organizer of the research,
type of incentive, and possibility of suspension and control) with those
respondent level variables that had a significant effect on willingness to
participate (age, smartphone use, and concerns about the passive nature of
data collection).
Of the twelve interactions tested, six proved to be significant. Figure
1 shows the nature of these interactions with the means of the predicted
values. For illustration purposes, we divided each ratio-level variable into
two categories and used their mean as cut values.
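
A cross-level interaction is then just a product term between a vignette-level dummy and a respondent-level covariate, with the mean split used only for the plots; `n_activities` and the other column names are assumptions, as before:

```python
import statsmodels.formula.api as smf

# Vignette-level study length x respondent-level smartphone-activity count,
# reusing the `df` sketched earlier (with an assumed n_activities column).
m_int = smf.mixedlm(
    "wtp ~ length_6m * n_activities",
    data=df, groups=df["respondent_id"], re_formula="~length_6m",
).fit(reml=True)

# For plots like Figure 1, each ratio-level moderator is split at its mean.
df["many_activities"] = df["n_activities"] >= df["n_activities"].mean()
print(df.assign(wtp_hat=m_int.fittedvalues)
        .groupby(["length_6m", "many_activities"])["wtp_hat"].mean())
```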
Figure 1: Predicted values of cross-level interactions. (a) Length of study with smartphone usage. (b) Length of study with concerns over passive data collection. (c) Number of incentives with smartphone usage. (d) Number of incentives with concerns over passive data collection. (e) Interruption and control with age. (f) Interruption and control with smartphone usage.
We can observe that a shorter research duration predicts a higher probability of participation, but this effect is stronger for those who use their smartphone for more types of activities, compared to those who use it for fewer (a). Interestingly, the effect of the length of the study is stronger among those with fewer concerns about passive types of data collection and weaker among those with more concerns (b). When we consider the number of incentives, we can see that while two incentives generally increase the odds of participation compared to only one incentive, this effect is stronger among those who use their smartphone for fewer types of activities and weaker among those who use it for more activities (c). In addition, the effect of the incentive is stronger among those who have fewer concerns about passive data collection and weaker among those who have more concerns (d). In the original model (Model 3, Table 3), we could see that someone is more likely to participate if they can interrupt the study when they want to and if they can review the data collected about them, compared to simply suspending or not being able to suspend at all. Based on the interactions, this effect is stronger for younger individuals than for older individuals (e). When we account for smartphone usage, we see that the effect of the type of suspension disappears for those with lower smartphone usage and persists only for those who use their smartphone for more tasks (f).
CONCLUSIONS
With this study, we aimed to continue the series of analyses examining users' attitudes toward passive, sensor- and other device-information-based smartphone data collection.
Overall, our results are consistent with findings of previous research: We
found evidence that a more trusted survey organiser/client, shorter duration
of data collection, multiple incentives, and control over data collection can
significantly influence willingness to participate. The results also show that
apart from age (a major determinant of digital technology use and attitudes towards digital technologies), demographic characteristics alone do not play an important role. This finding might be biased by the general characteristics of the online panel we used for the survey, but it might be important information for future studies that aim for representativeness of the online (smartphone user) population.
Contrary to our preliminary expectations, trust in people and institutions alone does not seem to have a notable effect. This is especially noteworthy given that Hungarian society generally has a lower level of personal and institutional trust compared to Western and Northern European
countries. However, general attitudes toward technology, the complexity
and intensity of smartphone use, and general concerns about passive data
collection may be critical in determining who is willing to participate in
future research.
Asking people about their future behaviour in hypothetical situations has obvious limitations. In our case, this means that there is a good chance
that we would get different results if we asked people to download an
existing, ready-to-test app and to both actively and passively collect real,
personal data from users. We were mainly interested in people’s feelings,
fears and expectations that determine their future actions, and we suggest
that our results provided valid insights.
It should also be mentioned that in this research we focused mostly on
the dimensions analysed in previous studies and included them in our own
analysis. Of course, there are many other important factors that can influence
the willingness of users to participate. Our aim was therefore not to provide
a complete picture, but to gather important aspects that could enrich our
collective knowledge on smartphone-based passive data collection and inform our own application development process.
ACKNOWLEDGMENTS
We thank János Klenovszki, Bálint Markos and Norbert Sárközi at NRC
Ltd. for the professional support they provided us in conducting the online
survey. The authors also wish to thank anonymous reviewers for feedback
on the manuscript.

Appendix A. Survey Details


Table A1: Sample characteristics: unweighted sample distribution (%), weighted sample distribution (%), and unweighted sample size (n)

Category | Unweighted (%) | Weighted (%) | Unweighted n
Gender: Male | 42.9 | 48.2 | 429
Gender: Female | 57.1 | 51.8 | 571
Age: 18–29 | 19.8 | 22.8 | 198
Age: 30–39 | 21.3 | 20.1 | 213
Age: 40–49 | 23.5 | 24.2 | 235
Age: 50–59 | 17.6 | 15.2 | 176
Age: 60–69 | 14.0 | 13.9 | 140
Age: 70+ | 3.8 | 3.9 | 38
Education: Primary | 24.7 | 35.6 | 247
Education: Secondary | 42.3 | 39.0 | 423
Education: Tertiary | 33.0 | 25.4 | 330
Type of settlement: Budapest (capital) | 20.7 | 21.2 | 207
Type of settlement: Towns | 54.3 | 52.1 | 543
Type of settlement: Villages | 25.0 | 26.7 | 250
Region: Central Hungary | 32.4 | 34.6 | 324
Region: Northern Hungary | 11.1 | 10.4 | 111
Region: Northern Great Plain | 14.8 | 13.8 | 148
Region: Southern Great Plain | 14.0 | 11.7 | 140
Region: Southern Transdanubia | 8.9 | 8.8 | 89
Region: Central Transdanubia | 9.7 | 10.7 | 97
Region: Western Transdanubia | 9.1 | 9.9 | 91
Total | 100% | 100% | 1000

Table A2: Activities for which the respondent uses their smartphone

1. Browsing websites
2. Write/read emails
3. Taking photos, videos
4. View content from social networking sites
(e.g., texts, images, videos on Facebook, Instagram, Twitter, etc.)
5. Post content to social media sites
(e.g., share text, images, videos on Facebook, Instagram, Twitter)
6. Online shopping (e.g., tickets, books, clothes, technical articles)
7. Online banking (e.g., account balance inquiries, transfers)
8. Install new apps (e.g., via Google Play or App Store)
9. Use apps that use the device’s location (e.g., Google Maps, Foursquare)
10. Connect devices to your device via Bluetooth (e.g., smart watch, pedometer)
11. Game
12. Listening to music, watching videos
13. Recording of training data (e.g., while running, number of steps per day, etc.)
14. Reading and editing files related to work and study
15. Voice assistant services (Google Assistant, Apple Siri, Amazon Alexa, etc.)

Table A3: The three items in the questionnaire measuring personal trust

1. In general, what would you say: can most people be trusted, or can we not be careful enough in human relationships? Place your opinion on a scale where "0" means we cannot be careful enough and "10" means that most people are trustworthy.
2. Do you think that most people would try to take advantage of you if they had the opportunity, or would they try to be fair? Place your opinion on a scale where "0" means that most people would try to take advantage and "10" means that most people would try to be fair.
3. Do you think people tend to care only about themselves, or are they generally helpful? Place your opinion on a scale where "0" means people care more about themselves and "10" means people tend to be helpful.
Table A4: The list of institutions about which we asked respondents how much they trust them

1. Hungarian Parliament
2. Hungarian legal system
3. Politicians
4. Police
5. Scientists
6. Online stores
7. Large Internet companies (Apple, Google, Facebook, Microsoft, etc.)
8. Online news portals

Table A5: Types of active and passive data collection methods

Active:
1. Answer some questions via text message (SMS).
2. Answer the questions in a questionnaire in a personal video interview using your smartphone. (Questions will be asked by the interviewer.)
3. Fill out an online questionnaire through an app downloaded to your smartphone.
4. Fill out an online questionnaire through your smartphone's web browser.
5. Take photos or scan barcodes with your smartphone camera (e.g., photos of recipes or barcodes of products you purchase).
6. While you are watching a research-related video on your phone, your camera uses software to examine what emotions appear on your face.
Passive:
7. Allowing the built-in function of your smartphone to measure, e.g., how much and at what speed you walk, run or bike.
8. Connect a device to your smartphone using a Bluetooth connection (for example, to measure your physical activity).
9. Download an app that collects information about how you use your smartphone.
10. How long you use your phone in a day (that is, how long your device's display is on).
11. How many times a day you receive and make calls. (Only the number of calls is recorded, no phone numbers!)
12. Number of entries in your phonebook (i.e., how many phone numbers are stored in your device. Important: specific names and phone numbers will not be removed from your device!)
13. Sharing your smartphone's geographic coordinates (e.g., how much time you spend in a particular location).
14. The number of applications installed on your phone.
15. The number of male and female names in your phone's contact list. (Important: specific names and phone numbers will not be removed from your device!)
16. The proportion of foreign phone numbers in your phonebook. (Important: specific names and phone numbers will not be exported from your device!)
17. The time when you start using your phone in the morning.
18. The time you last use your phone in the evening.

APPENDIX B. INTERNAL VALIDITY TEST OF VIGNETTE RESPONSES
There is a strong correlation between the responses to the original and the control vignette (r(998) = 0.89, p < 0.001). Overall, 63.8 percent of the vignette responses were the same for the control item and the randomly selected main vignette. Another 9.1 and 8.4 percent of the responses differed by only minus or plus one point. This means that 81.3 percent of the responses could be considered quasi-identical for the randomly chosen original and the control vignette.
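
The reported test-retest statistics can be computed as follows; the arrays here are simulated stand-ins for the 1000 original/control response pairs:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
orig = rng.integers(0, 11, 1000)                         # original vignette ratings
ctrl = np.clip(orig + rng.integers(-2, 3, 1000), 0, 10)  # repeated control ratings

r, p = pearsonr(orig, ctrl)                # the study reports r(998) = 0.89
same = np.mean(orig == ctrl)               # share of identical responses
close = np.mean(np.abs(orig - ctrl) <= 1)  # identical or within one point
print(f"r = {r:.2f} (p = {p:.3g}), identical = {same:.1%}, within +/-1 = {close:.1%}")
```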
Figure A1: Comparison of responses for original vignettes vs. control vignette.

AUTHOR CONTRIBUTIONS
Conceptualization and methodology: B.S., J.K. and A.G.; formal analysis,
J.K., B.S. and A.G.; writing—original draft preparation, review and editing,
A.G., J.K. and B.S. All authors have read and agreed to the published version
of the manuscript.
REFERENCES
1. De Bruijne M., Wijnant A. Comparing Survey Results Obtained
via Mobile Devices and Computers: An Experiment with a Mobile
Web Survey on a Heterogeneous Group of Mobile Devices Versus a
Computer-Assisted Web Survey. Soc. Sci. Comput. Rev. 2013;31:482–
504. doi: 10.1177/0894439313483976.
2. De Bruijne M., Wijnant A. Mobile Response in Web Panels. Soc. Sci.
Comput. Rev. 2014;32:728–742. doi: 10.1177/0894439314525918.
3. Couper M.P., Antoun C., Mavletova A. Total Survey Error in Practice.
John Wiley & Sons, Ltd.; Hoboken, NJ, USA: 2017. Mobile Web
Surveys; pp. 133–154.
4. Couper M.P. New Developments in Survey Data Collection. Annu. Rev.
Sociol. 2017;43:121–145. doi: 10.1146/annurev-soc-060116-053613.
5. Brenner P.S., DeLamater J. Lies, Damned Lies, and Survey Self-
Reports? Identity as a Cause of Measurement Bias. Soc. Psychol. Q.
2016;79:333–354. doi: 10.1177/0190272516628298.
6. Brenner P.S., DeLamater J. Measurement Directiveness as a Cause
of Response Bias: Evidence From Two Survey Experiments. Sociol.
Methods Res. 2014;45:348–371. doi: 10.1177/0049124114558630.
7. Palczyńska M., Rynko M. ICT Skills Measurement in Social Surveys:
Can We Trust Self-Reports? Qual. Quant. 2021;55:917–943. doi:
10.1007/s11135-020-01031-4.
8. Tourangeau R., Rips L.J., Rasinski K. The Psychology of Survey
Response. Cambridge University Press; Cambridge, UK: 2000.
9. Link M.W., Murphy J., Schober M.F., Buskirk T.D., Childs J.H.,
Tesfaye C.L. Mobile Technologies for Conducting, Augmenting and
Potentially Replacing Surveys: Report of the AAPOR Task Force on
Emerging Technologies in Public Opinion Research. Public Opin. Q.
2014;78:779–787. doi: 10.1093/poq/nfu054.
10. Karsai M., Perra N., Vespignani A. Time Varying Networks and
the Weakness of Strong Ties. Sci. Rep. 2015;4:4001. doi: 10.1038/
srep04001.
11. Onnela J.-P., Saramäki J., Hyvönen J., Szabó G., Lazer D., Kaski
K., Kertész J., Barabási A.-L. Structure and Tie Strengths in Mobile
Communication Networks. Proc. Natl. Acad. Sci. USA. 2007;104:7332–
7336. doi: 10.1073/pnas.0610245104.
Attitudes towards Participation in a Passive Data Collection Experiment 117

12. Palmer J.R.B., Espenshade T.J., Bartumeus F., Chung C.Y., Ozgencil
N.E., Li K. New Approaches to Human Mobility: Using Mobile Phones
for Demographic Research. Demography. 2013;50:1105–1128. doi:
10.1007/s13524-012-0175-z.
13. Miritello G., Moro E., Lara R., Martínez-López R., Belchamber
J., Roberts S.G.B., Dunbar R.I.M. Time as a Limited Resource:
Communication Strategy in Mobile Phone Networks. Soc. Netw.
2013;35:89–95. doi: 10.1016/j.socnet.2013.01.003.
14. Kreuter F., Presser S., Tourangeau R. Social Desirability Bias in CATI,
IVR, and Web Surveys: The Effects of Mode and Question Sensitivity.
Public Opin. Q. 2008;72:847–865. doi: 10.1093/poq/nfn063.
15. Mulder J., de Bruijne M. Willingness of Online Respondents to
Participate in Alternative Modes of Data Collection. Surv. Pract.
2019;12:8356. doi: 10.29115/SP-2019-0001.
16. Scherpenzeel A. Data Collection in a Probability-Based Internet
Panel: How the LISS Panel Was Built and How It Can Be Used. BMS
Bull. Sociol. Methodol./Bull. Méthodol. Sociol. 2011;109:56–61. doi:
10.1177/0759106310387713.
17. Kołakowska A., Szwoch W., Szwoch M. A Review of Emotion
Recognition Methods Based on Data Acquired via Smartphone
Sensors. Sensors. 2020;20:6367. doi: 10.3390/s20216367.
18. Kreuter F., Haas G.-C., Keusch F., Bähr S., Trappmann M. Collecting
Survey and Smartphone Sensor Data With an App: Opportunities and
Challenges Around Privacy and Informed Consent. Soc. Sci. Comput.
Rev. 2020;38:533–549. doi: 10.1177/0894439318816389.
19. Struminskaya B., Lugtig P., Keusch F., Höhne J.K. Augmenting
Surveys With Data From Sensors and Apps: Opportunities and
Challenges. Soc. Sci. Comput. Rev. 2020:089443932097995. doi:
10.1177/0894439320979951.
20. Younis E.M.G., Kanjo E., Chamberlain A. Designing and Evaluating
Mobile Self-Reporting Techniques: Crowdsourcing for Citizen
Science. Pers. Ubiquitous Comput. 2019;23:329–338. doi: 10.1007/
s00779-019-01207-2.
21. Keusch F., Struminskaya B., Antoun C., Couper M.P., Kreuter F.
Willingness to Participate in Passive Mobile Data Collection. Public
Opin. Q. 2019;83:210–235. doi: 10.1093/poq/nfz007.
118 Advanced Techniques for Collecting Statistical Data

22. Revilla M., Toninelli D., Ochoa C., Loewe G. Do Online Access
Panels Need to Adapt Surveys for Mobile Devices? Internet Res.
2016;26:1209–1227. doi: 10.1108/IntR-02-2015-0032.
23. Wenz A., Jäckle A., Couper M.P. Willingness to Use Mobile
Technologies for Data Collection in a Probability Household Panel.
Surv. Res. Methods. 2019;13:1–22. doi: 10.18148/SRM/2019.
V1I1.7298.
24. Van Dijck J. Datafication, Dataism and Dataveillance: Big Data between
Scientific Paradigm and Ideology. Surveill. Soc. 2014;12:197–208. doi:
10.24908/ss.v12i2.4776.
25. Bricka S., Zmud J., Wolf J., Freedman J. Household Travel Surveys
with GPS An Experiment. Transp. Res. Rec. J. Transp. Res. Board.
2009;2105:51–56. doi: 10.3141/2105-07.
26. Biler S., Šenk P., Winklerová L. Willingness of Individuals to
Participate in a Travel Behavior Survey Using GPS Devices [Stanislav
Biler et al.]; Proceedings of the NTTS 2013; Brussels, Belgium. 5–7
March 2013; pp. 1015–1023.
27. Toepoel V., Lugtig P. What Happens If You Offer a Mobile Option
to Your Web Panel? Evidence From a Probability-Based Panel
of Internet Users. Soc. Sci. Comput. Rev. 2014;32:544–560. doi:
10.1177/0894439313510482.
28. Pinter R. Willingness of Online Access Panel Members to Participate
in Smartphone Application-Based Research. Mob. Res. Methods.
2015:141–156. doi: 10.5334/bar.i.
29. Revilla M., Ochoa C., Loewe G. Using Passive Data from a Meter to
Complement Survey Data in Order to Study Online Behavior. Soc. Sci.
Comput. Rev. 2017;35:521–536. doi: 10.1177/0894439316638457.
30. Scherpenzeel A. Mixing Online Panel Data Collection with Innovative
Methods. In: Eifler S., Faulbaum F., editors. Methodische Probleme
von Mixed-Mode-Ansätzen in der Umfrageforschung. Springer
Fachmedien; Wiesbaden, Germany: 2017. pp. 27–49. Schriftenreihe
der ASI—Arbeitsgemeinschaft Sozialwissenschaftlicher Institute.
31. Cabalquinto E., Hutchins B. “It Should Allow Me to Opt in or Opt
out”: Investigating Smartphone Use and the Contending Attitudes of
Commuters towards Geolocation Data Collection. Telemat. Inform.
2020;51:101403. doi: 10.1016/j.tele.2020.101403.
Attitudes towards Participation in a Passive Data Collection Experiment 119

32. Struminskaya B., Toepoel V., Lugtig P., Haan M., Luiten A., Schouten
B. Understanding Willingness to Share Smartphone-Sensor Data.
Public Opin. Q. 2021;84:725–759. doi: 10.1093/poq/nfaa044.
33. Haas G., Kreuter F., Keusch F., Trappmann M., Bähr S. Effects of
Incentives in Smartphone Data Collection. In: Hill C.A., Biemer
P.P., Buskirk T.D., Japec L., Kirchner A., Kolenikov S., Lyberg L.E.,
editors. Big Data Meets Survey Science. Wiley; Hoboken, NJ, USA:
2020. pp. 387–414.
34. Hox J.J., Kreft I.G.G., Hermkens P.L.J. The Analysis of
Factorial Surveys. Sociol. Methods Res. 1991;19:493–510. doi:
10.1177/0049124191019004003.
35. Jasso G. Factorial Survey Methods for Studying Beliefs and
Judgments. Sociol. Methods Res. 2006;34:334–423. doi:
10.1177/0049124105283121.
36. Auspurg K., Hinz T. Multifactorial Experiments in Surveys: Conjoint
Analysis, Choice Experiments, and Factorial Surveys. In: Keuschnigg
M., Wolbring T., editors. Experimente in den Sozialwissenschaften.
Nomos; Baden-Baden, Germany: 2015. pp. 291–315. Soziale Welt
Sonderband.
37. Wallander L. 25 Years of Factorial Surveys in Sociology: A Review. Soc.
Sci. Res. 2009;38:505–520. doi: 10.1016/j.ssresearch.2009.03.004.
Chapter 7

AN INTEGRATIVE REVIEW ON METHODOLOGICAL CONSIDERATIONS IN MENTAL HEALTH RESEARCH – DESIGN, SAMPLING, DATA COLLECTION PROCEDURE AND QUALITY ASSURANCE

Eric Badu 1, Anthony Paul O’Brien 2, and Rebecca Mitchell 3

1 School of Nursing and Midwifery, The University of Newcastle, Callaghan, Australia
2 Faculty of Health and Medicine, School of Nursing and Midwifery, University of Newcastle, Callaghan, Australia
3 Faculty of Business and Economics, Macquarie University, North Ryde, Australia

ABSTRACT

Background
Several typologies and guidelines are available to address the methodological
and practical considerations required in mental health research. However,

Citation (APA): Badu, E., O’Brien, A. P., & Mitchell, R. (2019). An integrative review on methodological considerations in mental health research – design, sampling, data collection procedure and quality assurance. Archives of Public Health, 77(1), 1–15.
Copyright: © This is an open-access article distributed under the terms of a Creative
Commons Attribution License (https://creativecommons.org/licenses/by/4.0/).
few studies have actually attempted to systematically identify and synthesise these considerations. This paper provides an integrative review that identifies
and synthesises the available research evidence on mental health research
methodological considerations.

Methods
A search of the published literature was conducted using EMBASE, Medline,
PsycINFO, CINAHL, Web of Science, and Scopus. The search was limited
to papers published in English for the timeframe 2000–2018. Using pre-
defined inclusion and exclusion criteria, three reviewers independently
screened the retrieved papers. A data extraction form was used to extract
data from the included papers.

Results
Of 27 papers meeting the inclusion criteria, 13 focused on qualitative research,
8 mixed methods and 6 papers focused on quantitative methodology. A total
of 14 papers targeted global mental health research, with 2 papers each
describing studies in Germany, Sweden and China. The review identified
several methodological considerations relating to study design, methods,
data collection, and quality assurance. Methodological issues regarding the
study design included assembling team members, familiarisation and sharing
information on the topic, and seeking the contribution of team members.
Methodological considerations to facilitate data collection involved
adequate preparation prior to fieldwork, appropriateness and adequacy of
the sampling and data collection approach, selection of consumers, the
social or cultural context, practical and organisational skills; and ethical and
sensitivity issues.

Conclusion
The evidence confirms that studies on methodological considerations in
conducting mental health research largely focus on qualitative studies in
a transcultural setting, as well as recommendations derived from multi-
site surveys. Mental health research should adequately consider the
methodological issues around study design, sampling, data collection
procedures and quality assurance in order to maintain the quality of data
collection.
Keywords: Mental health, Methodological approach, Mixed methods,
Sampling, Data collection

BACKGROUND
In the past decades, considerable attention has been given to research methods that facilitate studies in various academic fields, such as public health, education, humanities, and the behavioural and social sciences [1–4]. These research methodologies have generally focused on the two major research pillars known as quantitative and qualitative research. In recent years, researchers
conducting mental health research appear to be either employing both
qualitative and quantitative research methods separately, or mixed methods
approaches to triangulate and validate findings [5, 6].
A combination of study designs has been utilised to answer research
questions associated with mental health services and consumer outcomes
[7, 8]. Study designs in the public health and clinical domains, for example,
have largely focused on observational studies (non-interventional) and
experimental research (interventional) [1, 3, 9]. Observational design in
non-interventional research requires the investigator to simply observe,
record, classify, count and analyse the data [1, 2, 10]. This design is different
from the observational approaches used in social science research, which
may involve observing (participant and non- participant) phenomena in the
fieldwork [1]. Furthermore, the observational study has been categorized
into five types, namely cross-sectional design, case-control studies, cohort
studies, case report and case series studies [1–3, 9–11]. The cross-sectional
design is used to measure the occurrence of a condition at a single point in time,
sometimes referred to as a prevalence study. This approach of conducting
research is relatively quick and easy but does not permit a distinction between
cause and effect [1]. Conversely, the case-control is a design that examines
the relationship between an attribute and a disease by comparing those with
and without the disease [1, 2, 12]. In addition, the case-control design is
usually retrospective and aims to identify predictors of a particular outcome.
This type of design is relevant when investigating rare or chronic diseases
which may result from long-term exposure to particular risk factors [10].
Cohort studies measure the relationship between exposure to a factor and
the probability of the occurrence of a disease [1, 10]. In a case series design,
medical records are reviewed for exposure to determinants of disease and
outcomes. More importantly, case series and case reports are often used as
preliminary research to provide information on key clinical issues [12].
The interventional study design describes a research approach that
applies clinical care to evaluate treatment effects on outcomes [13]. Several
previous studies have explained the various forms of experimental study
design used in public health and clinical research [14, 15]. In particular,
experimental studies have been categorized into randomized controlled
trials (RCTs), non-randomized controlled trials, and quasi-experimental
designs [14]. The randomized trial is a comparative study where participants
are randomly assigned to one of two groups. This research examines a
comparison between a group receiving treatment and a control group
receiving treatment as usual or receiving a placebo. Herein, the exposure to
the intervention is determined by random allocation [16, 17].
Recently, research methodologists have given considerable attention
to the development of methodologies to conduct research in vulnerable
populations. Vulnerable population research, such as with mental health
consumers, often involves considering the challenges associated with
sampling (selecting marginalized participants), collecting data and analysing
it, as well as research engagement. Consequently, several empirical studies
have been undertaken to document the methodological issues and challenges
in research involving marginalized populations. In particular, these studies
largely address the typologies and practical guidelines for conducting
empirical studies in mental health. Despite the increasing evidence,
however, only a few studies have yet attempted to systematically identify
and synthesise the methodological considerations in conducting mental
health research from the perspective of consumers.
A preliminary search using the search engines Medline, Web of Science, Google Scholar, Scopus and EMBASE identified only two reviews of mental health-based research. Among these two papers, one focused on
the various types of mixed methods used in mental health research [18],
whilst the other focused on the role of qualitative studies in mental
health research involving mixed methods [19]. Even though the latter two
studies attempted to systematically review mixed methods mental health
research, this integrative review is unique, as it collectively synthesises the design, data collection, sampling, and quality assurance issues,
which has not been previously attempted.
This paper provides an integrative review addressing the available
evidence on mental health research methodological considerations. The
paper also synthesises evidence on the methods, study designs, data
collection procedures, analyses and quality assurance measures. Identifying
and synthesising evidence on the conduct of mental health research has
relevance to clinicians and academic researchers where the evidence
provides a guide regarding the methodological issues involved when
conducting research in the mental health domain. Additionally, the synthesis

can inform clinicians and academia about the gaps in the literature related to
methodological considerations.

METHODS

Methodology
An integrative review was conducted to synthesise the available evidence on
mental health research methodological considerations. To guide the review,
the World Health Organization (WHO) definition of mental health has been
utilised. The WHO defines mental health as: “a state of well-being, in which
the individual realises his or her own potentials, ability to cope with the
normal stresses of life, functionality and work productivity, as well as the
ability to contribute effectively in community life” [20]. The integrative
review enabled the simultaneous inclusion of diverse methodologies (i.e.,
experimental and non-experimental research) and varied perspectives to
fully understand a phenomenon of concern [21, 22]. The review also uses
diverse data sources to develop a holistic understanding of methodological
considerations in mental health research. The methodology employed
involves five stages: 1) problem identification (ensuring that the research
question and purpose are clearly defined); 2) literature search (incorporating
a comprehensive search strategy); 3) data evaluation; 4) data analysis
(data reduction, display, comparison and conclusions) and; 5) presentation
(synthesising findings in a model or theory and describing the implications
for practice, policy and further research) [21].

Inclusion Criteria
The integrative review focused on methodological issues in mental health
research. This included core areas such as study design and methods,
particularly qualitative, quantitative or both. The review targeted papers
that addressed study design, sampling, data collection procedures, quality
assurance and the data analysis process. More specifically, the included
papers addressed methodological issues on empirical studies in mental
health research. The methodological issues in this context are not limited to
a particular mental illness. Studies that met the inclusion criteria were peer-
reviewed articles published in the English Language, from January 2000 to
July 2018.

Exclusion Criteria
Articles were excluded if they were based purely on general health services or the clinical effectiveness of a particular intervention with no connection to mental health research. Articles were also excluded when they addressed non-
methodological issues. Other general exclusion criteria were book chapters,
conference abstracts, papers that present opinion, editorials, commentaries
and clinical case reviews.

Search Strategy and Selection Procedure


The search of published articles was conducted across six electronic databases,
namely EMBASE, CINAHL (EBSCO), Web of Science, Scopus, PsycINFO
and Medline. We developed a search strategy based on the recommended
guidelines by the Joanna Briggs Institute (JBI) [23]. Specifically, a three-
step search strategy was utilised to conduct the search for information (see
Table 1). An initial limited search was conducted in Medline and Embase
(see Table 1). We analysed the text words contained in the title and abstract
and of the index terms from the initial search results [23]. A second search
using all identified keywords and index terms was then repeated across all
remaining five databases (see Table 1). Finally, the reference lists of all
eligible studies were hand searched [23].

Table 1: Search strategy and selection procedure

Stage 1 (initial search in MEDLINE and EMBASE): (“mental health” OR mental health service OR “psychiatric services” OR mental disorders OR mental illness) AND (“methods” OR “research designs” OR “data collection” OR “data analysis” OR “sampling” OR “sample size” OR “mixed methods”) AND (“quality assurance” OR “reliability” OR “validity” OR “techniques” OR “strategies” OR research design OR “informed consent”)

Stage 2 (search across CINAHL, Web of Science, Scopus, and PsycINFO): (“psychiatry” OR “mental health” OR “mental disorders” OR “mental patient” OR “mental illness” OR “mental treatment” OR “consumer”) AND (“research methods” OR “methodology” OR “research designs” OR “qualitative research” OR “quantitative research” OR “mixed methods” OR “biomedical research” OR “health service research” OR “epidemiologic methods” OR “behavioural research” OR “process design”) AND (“sampling” OR “sample size” OR “patient selection” OR “surveys” OR “questionnaires” OR “interviews” OR “data analysis” OR “content analysis” OR “thematic analysis” OR “reporting”) AND (“informed consent” OR “reliability” OR “quality assurance” OR “validity” OR “techniques” OR “strategies” OR “process”)

Stage 3: Hand searching of the reference lists

The selection of eligible articles adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [24] (see Fig.
1). Firstly, three authors independently screened the titles of articles that
were retrieved and then approved those meeting the selection criteria. The
authors reviewed all the titles and abstracts and agreed on those needing
full-text screening. E.B (Eric Badu) conducted the initial screening of titles
and abstracts. A.P.O’B (Anthony Paul O’Brien) and R.M (Rebecca Mitchell)
conducted the second screening of titles and abstracts of all the identified
papers. The authors (E.B, A.P.O’B and R.M) conducted full-text screening
according to the inclusion and exclusion criteria.

Figure 1: Flow Chart of studies included in the review.

Data Management and Extraction


The integrative review used EndNote X8 to screen and handle duplicate
references. A predefined data extraction form was developed to extract
data from all included articles (see Additional file 1). The data extraction
form was developed according to Joanna Briggs Institute (JBI) [23] and
Cochrane [24] manuals, as well as the literature associated with concepts
and methods in mental health research. The data extraction form was
categorised into sub-sections, such as study details (citation, year of
publication, author, contact details of lead author, and funder/sponsoring
organisation, publication type), objective of the paper, primary subject area
of the paper (study design, methods, sampling, data collection, data analysis,
quality assurance). The data extraction form also had a section on additional
information on methodological considerations, recommendations and other
potential references. The authors extracted results of the included papers
in numerical and textual format [23]. EB (Eric Badu) conducted the data
extraction, A.P.O’B (Anthony Paul O’Brien) and R.M (Rebecca Mitchell),
conducted the second review of the extracted data.

Data Synthesis
Content analysis was used to synthesise the extracted data. The content
analysis process involved several stages: noting patterns and themes, seeing plausibility, clustering, counting, making contrasts and comparisons, discerning common and unusual patterns, subsuming particulars into the general, noting relations between variables, finding
intervening factors and building a logical chain of evidence [21] (see Table
2).

Table 2: The key emerging themes (Na = number of papers)

Mixed methods design in mental health research
• Categorizing mixed methods (Na = 4): (19) (18) (43) (48)
• Function of mixed methods (Na = 6): (45) (42) (48) (19) (18) (43)
• Structure of mixed methods (Na = 5): (43) (19) (18) (42) (48)
• Process of mixed methods (Na = 5): (48) (43) (42) (19) (18)
• Consideration for using mixed methods (Na = 3): (19) (18) (45)

Qualitative study in mental health research
• Considering qualitative methods (Na = 6): (32) (36) (19) (26) (28) (44)

Sampling in mental health research
• Sampling approaches (quantitative) (Na = 3): (35) (34) (25)
• Sampling approaches (qualitative) (Na = 7): (28) (32) (46) (19) (42) (30) (31)
• Sampling consideration (Na = 4): (30) (31) (32) (46)

Data collection in mental health research
• Approaches for collecting qualitative data (Na = 9): (28) (41) (30) (31) (44) (47) (19) (40) (34)
• Consideration for data collection (Na = 6): (32) (37) (31) (41) (49) (47)
• Preparing for data collection (Na = 8): (25) (33) (34) (35) (39) (41) (49) (30)

Quality assurance procedures
• Seeking informed consent (Na = 7): (25) (26) (33) (35) (37) (39) (47)
• Procedure for ensuring quality control (quantitative) (Na = 5): (49) (25) (39) (33) (38)
• Procedure for ensuring quality control (qualitative) (Na = 4): (32) (37) (46) (19)

RESULTS

Study Characteristics
The integrative review identified a total of 491 records from all databases, after which 19 duplicates were removed. Out of this, 472 titles and abstracts were assessed for eligibility, after which 439 articles were excluded for not meeting the inclusion criteria; specifically, papers that did not address methodological issues, as well as papers addressing methodological considerations in other disciplines. A total of 33 full-text articles were assessed; 9 articles were further excluded, whilst an additional 3 articles were identified from reference lists. Overall, 27 articles were included in the final synthesis (see Fig. 1). Of the total included papers, 12 contained qualitative research, 9 were mixed methods (both qualitative and quantitative) and 6 papers focused on quantitative data. In terms of setting, a total of 14 papers targeted global mental health research, with 2 papers each describing studies in Germany, Sweden and China. The papers addressed different methodological issues, such as study design, methods, data collection and analysis, as well as quality assurance (see Table 3).

Table 3: Study characteristics

Author | Setting | Methodological issues addressed | Type of method
Alonso, Angermeyer [25] | Belgium, France, Germany, Italy, the Netherlands and Spain | Sampling, data collection and quality assurance | Quantitative
Baarnhielm and Ekblad [26] | Sweden | Quality assurance (ethical issues) | Qualitative
Braun and Clarke [27] | Global | Data analysis | Qualitative
Brown and Lloyd [28] | Global | Methods, sampling, data collection and analysis | Qualitative
Davidsen [29] | Global | Data analysis | Qualitative
de Jong and Van Ommeren [30] | Global | Sampling and data collection | Mixed methods
Ekblad and Baarnhielm [31] | Sweden | Data collection | Qualitative
Fossey, Harvey [32] | Global | Methods, sampling, data collection, data analysis and quality assurance | Qualitative
Jacobi, Wittchen [33] | Germany | Data collection, analysis and quality assurance | Quantitative
Koch, Vogel [34] | Germany | Sampling, data collection and quality assurance | Mixed methods
Korver, Quee [35] | Netherlands | Sampling and quality assurance | Quantitative
Larkin, Watts [36] | Global | Study design | Qualitative
Latvala, Vuokila-Oikkonen [37] | Finland | Data collection and quality assurance | Qualitative
Leese, White [38] | Europe | Quality assurance | Quantitative
Liu, Huang [39] | China | Data analysis and quality assurance | Quantitative
Montgomery and Bailey [40] | Canada | Data collection and analysis | Qualitative
Owen [41] | UK | Data collection | Qualitative
Palinkas [19] | Global | Study design, methods, sampling, data collection, analysis and quality assurance | Mixed methods
Palinkas, Horwitz [18] | Global | Study design | Mixed methods
Palinkas, Horwitz [42] | Global | Sampling | Mixed methods
Palinkas, Aarons [43] | Global | Study design | Mixed methods
Razafsha, Behforuzi [44] | Global | Methods and data collection | Mixed methods
Robins, Ware [45] | Global | Study design | Mixed methods
Robinson [46] | Global | Sampling and quality assurance | Qualitative
Schilder, Tomov [47] | Bulgaria | Data collection | Qualitative
Schoonenboom and Johnson [48] | Global | Study design | Mixed methods
Yin, Phillips [49] | China | Data collection | Quantitative

Mixed Methods Design in Mental Health Research


Mixed methods research is defined as a research process where the elements
of qualitative and quantitative research are combined in the design and data collection, and in the triangulation and validation of findings [48]. The integrative review
identified four sub-themes that describe mixed methods design in the context
of mental health research. The sub-themes include the categories of mixed
methods, their function, structure, process and further methodological
considerations for mixed methods design. These sub-themes are explained
as follows:

Categorizing Mixed Methods in Mental Health Research


Four studies highlighted the categories of mixed methods design applicable
to mental health research [18, 19, 43, 48]. Generally, there are differences in
the categories of mixed methods design; however, three distinct categories predominantly cut across all studies. These categories are
function, structure and process. Some studies further categorised mixed
method design to include rationale, objectives, or purpose. For instance,
Schoonenboom and Johnson [48] categorised mixed methods design into
primary and secondary dimensions.

The Function of Mixed Methods in Mental Health Research


Six studies explain the function of mixed methods designs in mental health research. Two studies specifically noted that mixed methods can provide a more robust understanding of services by expanding and strengthening the conclusions from the study [42, 45]. More importantly, the use of both qualitative and quantitative methods can provide innovative solutions to important and complex
problems, especially by addressing diversity and divergence [48]. The review
identified five underlying functions of a mixed method design in mental
health research which include achieving convergence, complementarity,
expansion, development and sampling [18, 19, 43].
The use of mixed methods to achieve convergence aims to employ both
qualitative and quantitative data to answer the same question, either through
triangulation (to confirm the conclusions from each of the methods) or
transformation (using qualitative techniques to transform quantitative data).
Similarly, complementarity in mixed methods integrates both qualitative
and quantitative methods to answer questions for the purpose of evaluation
or elaboration [18, 19, 43]. Two papers recommend that qualitative methods
are used to provide the depth of understanding, whilst the quantitative
methods provide a breadth of understanding [18, 43]. In mental health
research, the qualitative data is often used to examine treatment processes,
whilst the quantitative methods are used to examine treatment outcomes
against quality care key performance targets.
Additionally, three papers indicated that expansion as a function of
mixed methods uses one type of method to answer questions raised by the
other type of method [18, 19, 43]. For instance, qualitative data is used to
explain findings from quantitative analysis. Also, some studies highlight
that development as a function of mixed methods aims to use one method
to answer research questions, and use the findings to inform other methods
to answer different research questions. A qualitative method, for example, is
used to identify the content of items to be used in a quantitative study. This
approach aims to use qualitative methods to create a conceptual framework
for generating hypotheses to be tested by using a quantitative method [18,
19, 43]. Three papers suggested that using mixed methods for the purpose
of sampling utilizes one method (e.g., quantitative) to identify a sample of participants to conduct research using the other method (e.g., qualitative) [18, 19, 43]. For instance, quantitative data is sequentially utilized to identify potential participants for a qualitative study, and vice versa.

Structure of Mixed Methods in Mental Health Research


Five studies categorised the structure of conducting mixed methods in
mental health research into two broader concepts: simultaneous
(concurrent) and sequential (see Table 3). In both categories, one method
is regarded as primary and the other as secondary, although equal weight
can be given to both methods [18, 19, 42, 43, 48]. Two studies suggested
that the sequential design is a process where the data collection and analysis
of one component (e.g., quantitative) takes place after the data collection and analysis of the other component (e.g., qualitative). Herein, the data collection and analysis of one component (e.g., qualitative) may depend on the outcomes of the other component (e.g., quantitative) [43, 48]. An earlier
review suggested that the majority of contemporary studies in mental health
research use a sequential design, with qualitative methods, more often
preceding quantitative methods [18].
Alternatively, the concurrent design collects and analyses data of
both components (e.g., quantitative and qualitative) simultaneously and
independently. Palinkas, Horwitz [42] recommend that one component is
used as secondary to the other component, or that both components are
assigned equal priority. Such a mixed methods approach aims to provide a
depth of understanding afforded by qualitative methods, with the breadth of
understanding offered by the quantitative data to elaborate on the findings
of one component or seek convergence through triangulation of the results.
Schoonenboom and Johnson [48] recommended the use of capital letters for
one component and lower case letters for another component in the same
design to indicate that one component is primary and the other is secondary
or supplemental.

Process of Mixed Methods in Mental Health Research


Five papers highlighted the process for the use of mixed methods in mental
health research [18, 19, 42, 43, 48]. The papers suggested three distinct
processes or strategies for combining qualitative and quantitative data.
These include merging or converging the two data sets, connecting the
two datasets by having one build upon the other; and embedding one data
set within the other [19, 43]. The process of connecting occurs when the
analysis of one dataset leads to the need for the other data set. For instance,
in the situation where quantitative results lead to the subsequent collection
and analysis of qualitative data [18, 43]. A previous study suggested that
most studies in mental health sought to connect the data sets. Similarly,
the process of merging the datasets brings together two sets of data during
the interpretation, or transforms one type of data into the other type, by
combining the data into new variables [18]. The process of embedding data
into mixed method designs in mental health uses one dataset to provide a
supportive role to the other dataset [43].

Consideration for Using Mixed Methods in Mental Health Research
Three studies highlighted several factors that need to be considered when
conducting mixed methods design in mental health research [18, 19, 45].
Accordingly, these factors include developing familiarity with the topic
under investigation based on experience, willingness to share information
on the topic [19], establishing early collaboration, willingness to negotiate
emerging problems, seeking the contribution of team members, and soliciting
third-party assistance to resolve any emerging problems [45]. Additionally,
Palinkas, Horwitz [18] recommended that mixed methods in the context
of mental health research are mostly applied in studies that assess service needs, examine existing services, develop new or adapt existing services, evaluate services in randomised controlled trials, and examine service implementation.

Qualitative Study in Mental Health Research


This theme describes the various qualitative methods used in mental health
research. The theme also addresses methodological considerations for using
qualitative methods in mental health research. The key emerging issues are
discussed below:

Considering Qualitative Components in Conducting Mental Health Research
Six studies recommended the use of qualitative methods in mental health
research [19, 26, 28, 32, 36, 44]. Two qualitative research paradigms
were identified, including the interpretive and critical approach [32]. The
interpretive methodologies predominantly explore the meaning of human
experiences and actions, whilst the critical approach emphasises the social
and historical origins and contexts of meaning [32]. Two studies suggested
that the interpretive qualitative methods used in mental health research are
ethnography, phenomenology and narrative approaches [32, 36].
The ethnographic approach describes the everyday meaning of the
phenomena within a societal and cultural context, for instance, the way
phenomena or experience is contrasted within a community, or by collective
members over time [32]. Alternatively, the phenomenological approach
explores the claims and concerns of a subject with a speculative development
of an interpretative account within their cultural and physical environments
focusing on the lived experience [32, 36].
Moreover, the critical qualitative approaches used in mental health
research are predominantly emancipatory (for instance, socio-political
traditions) and participatory action-based research. The emancipatory
traditions recognise that knowledge is acquired through critical discourse and debate, rather than being discovered by objective inquiry [32]. Alternatively,
the participatory action based approach uses critical perspectives to engage
key stakeholders as participants in the design and conduct of the research
[32].
Some studies highlighted several reasons why qualitative methods are
relevant to mental health research. In particular, qualitative methods are
significant as they emphasise naturalistic inquiry and have a discovery-
oriented approach [19, 26]. Two studies suggested that qualitative methods
are often relevant in the initial stages of research studies to understand
specific issues such as behaviour, or symptoms of consumers of mental health services [19]. Specifically, Palinkas [19] suggests that qualitative methods
help to obtain initial pilot data, or when there is too little previous research
or in the absence of a theory, such as provided in exploratory studies, or
previously under-researched phenomena.
Three studies stressed that qualitative methods can help to better
understand socially sensitive issues, such as exploring the solutions
to overcome challenges in mental health clinical policies [19, 28, 44].
Consequently, Razafsha, Behforuzi [44] recommended that the natural
holistic view of qualitative methods can help to understand the more
recovery-oriented policy of mental health, rather than simply the treatment
of symptoms. Similarly, the subjective experiences of consumers using
qualitative approaches have been found useful to inform clinical policy
development [28].

Sampling in Mental Health Research


The theme explains the sampling approaches used in mental health research.
The section also describes the methodological considerations when
sampling participants for mental health research. The sub-themes emerging
are explained in the following sections:

Sampling Approaches (Quantitative)


Some studies reviewed highlighted the sampling approaches previously used
in mental health research [25, 34, 35]. Generally, all quantitative studies
tend to use several probability sampling approaches, whilst qualitative
studies used non-probability techniques. The quantitative mental health
studies conducted at community and population level employ multi-stage
sampling techniques usually involving systematic sampling, stratified
and random sampling [25, 34]. Similarly, quantitative studies that recruit
consumers in the hospital setting employ consecutive sampling [35]. Two
studies reviewed highlighted that the identification of consumers of mental
health services for research is usually conducted by service providers. For
instance, Korver, Quee [35] used a consecutive sampling approach
by identifying consumers through clinicians working in regional psychosis
departments, or academic centres.

Sampling Approaches (Qualitative)


Seven studies suggested that the sampling procedures widely used in mental
health research involving qualitative methods are non-probability techniques,
which include purposive [19, 28, 32, 42, 46], snowballing [30, 32, 46] and
theoretical sampling [31, 32]. The purposive sampling identifies participants
that possess relevant characteristics to answer a research question [28].
Purposive sampling can be used in a single case study, or for multiple cases.
The purposive sampling used in mental health research is usually extreme,
or deviant case sampling, criterion sampling, and maximum variation
sampling [19]. Furthermore, it is advised when using purposive sampling in
a multistage level study, that it should aim to begin with the broader picture
to achieve variation, or dispersion, before moving to the more focused view
that considers similarity, or central tendencies [42].
Two studies added that theoretical sampling involved sampling
participants, situations and processes based on concepts on theoretical
grounds and then using the findings to build theory, such as in a Grounded
Theory study [31, 32]. Some studies highlighted that snowball sampling
is another strategy widely used in mental health research [30, 32, 46].
This is ascribed to the fact that people with mental illness are perceived as
marginalised in research and practically hard-to-reach using conventional
sampling [30, 32]. Snowball sampling involves asking the marginalised participants to recommend individuals who might have direct knowledge relevant to the study [30, 32, 46]. Although this approach is relevant, some studies caution that the possibility of generalising from the sample is limited, because of the likelihood of selection bias [30].

Sampling Consideration
Four studies in this section highlighted some of the sampling considerations
in mental health research [30–32, 46]. Generally, mental health research
should consider the appropriateness and adequacy of the sampling approach by
applying attributes such as shared social, or cultural experiences, or shared
concern related to the study [32], diversity and variety of participants [31],
practical and organisational skills, as well as ethical and sensitivity issues
[46]. Robinson [46] further suggested that sampling can be homogenous or
heterogeneous depending on the research questions for the study. Achieving
homogeneity in sampling should employ a variety of parameters, which
include demographic, geographical, physical, psychological, or life history
homogeneity [46]. Additionally, applying homogeneity in sampling can be
influenced by theoretical and practical factors. Alternatively, some samples
are intentionally selected based on heterogeneous factors [46].

Data Collection in Mental Health Research


This theme highlights the data collection methods used in mental health
research. The theme is explained according to three sub-themes, which
include approaches for collecting qualitative data, methodological
considerations, as well as preparations for data collection. The sub-themes
are as follows:

Approaches for Collecting Qualitative Data


The studies reviewed recommended the approaches that are widely applied
in collecting data in mental health research. The widely used qualitative data
collection approaches in mental health research are focus group discussions
(FGDs) [19, 28, 30, 31, 41, 44, 47], extended in-depth interviews [19,
30, 34], participant and non-participant observation [19], Delphi data
collection, quasi-statistical techniques [19] and field notes [31, 40]. Seven
studies suggest that FGDs are widely used data collection approaches [19,
28, 30, 31, 41, 44, 47] because they are valuable in gathering information on
consumers’ perspectives of services, especially regarding satisfaction, unmet/
met service needs and the perceived impact of services [47]. Conversely,
Ekblad and Baarnhielm [31] recommended that this approach is relevant
to improve clinical understanding of the thoughts, emotions, meanings and
attitudes towards mental health services.
Such data collection approaches are particularly relevant to consumers
of mental health services, due to their low self-confidence and self-esteem
[41]. The approach can help to understand specific terms, vocabulary,
opinions and attitudes of consumers of mental health services, as well as their
reasoning about personal distress and healing [31]. Similarly, the reliance on
verbal rather than written communication helps to promote the participation
of participants with serious and enduring mental health problems [31, 41].
Although FGD has several important outcomes, there are some limitations
that need critical consideration. Ekblad and Baarnhielm [31], for example, suggest that marginalised participants may not always feel free to talk about private issues regarding their condition at the group level, mostly due to
perceived stigma and group confidentiality.
Some of the studies reviewed recommended that capturing comprehensive information and analysing group interactions in mental health research requires field notes as a supplementary data source to help validate the FGDs [31, 40, 41]. The use of field notes
in addition to FGDs essentially provides greater detail in the accounts of
consumers’ subjective experiences. Furthermore, Montgomery and Bailey
[40] suggest that field notes require observational sensitivity, and also
require having specific content such as descriptive and interpretive data.
Three studies in this section suggested that in-depth interviews are
used to collect data from consumers of mental health services [19, 30, 34].
This approach is particularly important to explore the behaviour, subjective experiences, psychological processes, opinions, and perceptions of
mental health services. de Jong and Van Ommeren [30] recommend that
in-depth interviews help to collect data on culturally marked disorders,
their personal and interpersonal significance, patient and family explanatory
models, individual and family coping styles, symptom symbols and protective
mediators. Palinkas [19] also highlights that the structured narrative form
of extended interviewing is the type of in-depth interview used in mental
health research. This approach provides participants with the opportunity to
describe the experience of living with an illness and seeking services that
assist them.

Consideration for Data Collection


Six studies recommended considerations required in the data collection
process [31, 32, 37, 41, 47, 49]. Some studies highlighted that consumers of
mental health services might refuse to participate in research due to several
factors [37] like the severity of their illness, stigma and discrimination [41].
Subsequently, such issues are recommended to be addressed by building
confidence and trust between the researcher and consumers [31, 37]. This
is a significant prerequisite, as it can sensitise and normalise the research
process and aims with the participants prior to discussing their personal
mental health issues. Similarly, some studies added that the researcher can
gain the confidence of service providers who manage consumers of mental
health services [41, 47], seek ethical approval from the relevant committee(s)
[41, 47], meet and greet the consumers of mental health services before
data collection, and arrange a mutually acceptable venue for the groups and
possibly supply transport [41].
Two studies further suggested that the cultural and social differences of
the participants need consideration [26, 31]. These factors could influence
the perception and interpretation of ethical issues in the research situation.
Additionally, two studies recommended the use of standardised
assessment instruments for mental health research that involve quantitative
data collection [33, 49]. A recent survey suggested that measures to
standardise the data collection approach can convert self-completion
instruments to interviewer-completion instruments [49]. The interviewer
can then read the items of the instruments to respondents and record their
responses. The study further suggested the need to collect demographic and
behavioural information about the participant(s).

Preparing for Data Collection


Eight studies highlighted the procedures involved in preparing for data
collection in mental health research [25, 30, 33–35, 39, 41, 49]. These
studies suggest that the preparation process involves organising meetings of
researchers, colleagues and representatives of the research population. The
meeting of researchers generally involves training of interviewers about the
overall design, objectives and research questions associated with the study.
de Jong and Van Ommeren [30] recommended that preparation for the use
of quantitative data encompasses translating and adapting instruments with
the aim of achieving content, semantic, concept, criterion and technical
equivalence.

Quality Assurance Procedures in Mental Health Research


This section describes the quality assurance procedures used in mental health
research. Quality assurance is explained according to three sub-themes: 1)
seeking informed consent, 2) the procedure for ensuring quality assurance
in a quantitative study and 3) the procedure for ensuring quality control in
a qualitative study. The sub-themes are explained in the following content.

Seeking Informed Consent


The papers analysed for the integrative review suggested that the rights of
participants to safeguard their integrity must always be respected, and so
each potential subject must be adequately informed of the aims, methods,
anticipated benefits and potential hazards of the study and any potential
discomforts (see Table 3). Seven studies highlight that potential
participants of mental health research must consent to the study prior
to data collection [25, 26, 33, 35, 37, 39, 47]. The consent process helps
to assure participants of anonymity and confidentiality and further explain
the research procedure to them. Baarnhielm and Ekblad [26] argue that the
research should be guided by the four basic moral values of medical ethics: autonomy, non-maleficence, beneficence, and justice. In particular, potential
consumers of mental health services who may have severe conditions and
are unable to consent for themselves are expected to have their consent signed by a respective family caregiver [37]. Latvala, Vuokila-Oikkonen [37] further suggested that researchers are responsible for agreeing on the criteria
to determine the competency of potential participants in mental health
research. The criteria are particularly relevant when potential participants
have difficulties in understanding information due to their mental illness.

Procedure for Ensuring Quality Control (Quantitative)


Several studies highlighted procedures for ensuring quality control in
mental health research (see Table 3). The quality control measures
are used to achieve the highest reliability, validity and timeliness. Some
studies demonstrate that ensuring quality control should consider factors
such as pre-testing tools [25, 49], minimising non-response rates [25, 39]
and monitoring of data collection processes [25, 33, 49].
Accordingly, two studies suggested that efforts should be made to re-
approach participants who initially refuse to participate in the study. For
instance, Liu, Huang [39] recommended that when a consumer of mental
health services refuses to participate in a study (due to low self-esteem) when approached for the first time, a different interviewer can re-approach the same participant to see if they are more comfortable participating after
the first invitation. Three studies further recommend that monitoring data
quality can be accomplished through “checks across individuals, completion
status and checks across variables” [25, 33, 49]. For example, Alonso,
Angermeyer [25] advocate that various checks are used to verify completion
of the interview, and consistency across instruments against the standard
procedure.

Procedure for Ensuring Quality Control (Qualitative)


Four studies highlighted the procedures for ensuring quality control of
qualitative data in mental health research [19, 32, 37, 46]. A further two
studies suggested that the quality of qualitative research is governed by
the principles of credibility, dependability, transferability, reflexivity, and confirmability [19, 32]. Some studies explain that the credibility or
trustworthiness of qualitative research in mental health is determined by
methodological and interpretive rigour of the phenomenon being investigated
[32, 37]. Consequently, Fossey, Harvey [32] propose that the methodological
criteria of rigour for assessing the credibility of qualitative research are congruence,
responsiveness or sensitivity to social context, appropriateness (importance
and impact), adequacy and transparency. Similarly, interpretive rigour is
classified as authenticity, coherence, reciprocity, typicality and permeability
of the researcher’s intentions; including engagement and interpretation [32].
Robinson [46] explained that transparency (openness and honesty)
is achieved if the research report explicitly addresses how the sampling,
data collection, analysis, and presentation were carried out. In particular, efforts
to address these methodological issues highlight the extent to which the
criteria for quality profoundly interact with standards for ethics. Similarly,
responsiveness, or sensitivity, helps to situate or locate the study within a
place, a time and a meaningful group [46]. The study should also consider
the researcher’s background, location and connection to the study setting,
particularly in the recruitment process. This is often described as role conflict
or research bias.
In the interpretive phenomenon, coherence highlights the ability to
select an appropriate sampling procedure that mutually matches the research
aims, questions, data collection, analysis, as well as any theoretical concepts
or frameworks [32, 46]. Similarly, authenticity explains the appropriate
representation of participants’ perspectives in the research process and
the interpretation of results. Authenticity is maximised by providing
evidence that participants are adequately represented in the interpretive
process, or provided an opportunity to give feedback on the researcher’s
interpretation [32]. Again, the contribution of the researcher’s perspective
to the interpretation enhances permeability. Fossey, Harvey [32] further
suggest that reflexive reporting, which distinguishes the participants’ voices
from that of the researcher in the report, enhances the permeability of the
researcher’s role and perspective.
One study highlighted the approaches used to ensure validity in
qualitative research, which include saturation, identification of deviant
or non-confirmatory cases, member checking and coding by consensus.
Saturation involves completeness in the research process, where all relevant
data collection, codes and themes required to answer the phenomenon of
inquiry are achieved; and no new data emerges [19]. Similarly, member
checking is the process whereby participants or others who share similar
characteristics review study findings to elaborate on or confirm them [19].
Coding by consensus involves a collaborative approach to analysing the
data. Regular meetings among coders, to discuss procedures for assigning
codes to segments of data and to resolve differences in coding, are
commonly applied, as is the comparison of codes assigned on selected
transcripts to calculate a percentage agreement or kappa measure of
interrater reliability [19].
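As a minimal sketch of how such agreement statistics are computed, the following R code calculates percentage agreement and Cohen's kappa for two coders; the kappa formula is the standard one, but the code labels and assignments shown are purely hypothetical:

# Hypothetical codes assigned by two coders to the same ten transcript segments
coder1 <- c("stigma", "access", "stigma", "family", "access",
            "stigma", "family", "access", "stigma", "family")
coder2 <- c("stigma", "access", "family", "family", "access",
            "stigma", "family", "stigma", "stigma", "family")

# Percentage agreement: proportion of segments coded identically
percent_agreement <- mean(coder1 == coder2) * 100

# Cohen's kappa: agreement corrected for chance, from the confusion matrix
tab        <- table(coder1, coder2)
p_observed <- sum(diag(tab)) / sum(tab)
p_expected <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2
kappa      <- (p_observed - p_expected) / (1 - p_expected)

Values of kappa near 1 indicate near-perfect agreement, while values near 0 indicate agreement no better than chance.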
Two studies recommend the need to acknowledge the importance of
generalisability (transferability). This concept aims to provide sufficient
information about the research setting, findings and interpretations for
readers to appropriately determine the replicability of the findings from
one context or population to another, akin to generalisability in
quantitative research [19, 32]. Similarly, the researchers should employ
reflexivity as a means of identifying and addressing potential biases in
data collection and interpretation. Palinkas [19] suggests that such bias is
associated with theoretical orientations; pre-conceived beliefs, assumptions,
and demographic characteristics; and familiarity and experience with the
methods and phenomenon. Another approach to enhance the rigour of
analysis involves peer debriefing and support meetings held among team
members which facilitate detailed auditing during data analysis [19].

DISCUSSION
The integrative review was conducted to synthesise evidence into
recommended methodological considerations when conducting mental
health research. The evidence from the review has been discussed according
to five major themes: 1) mixed methods study in mental health research;
2) qualitative study in mental health research; 3) sampling in mental
health research; 4) data collection in mental health research; and 5) quality
assurance procedures in mental health research.

Mixed Methods Study in Mental Health Research


The evidence suggests that mixed methods approaches in mental health are
generally categorised according to their function (rationale, objectives or
purpose), structure and process [18, 19, 43, 48]. The mixed methods study can
be conducted for the purpose of achieving convergence, complementarity,
expansion, development and sampling [18, 19, 43]. Researchers conducting
mental health studies should understand the underlying functions or purpose
of mixed methods. Similarly, mixed methods in mental health studies can be
structured concurrently (simultaneously) or sequentially [18, 19, 42, 43, 48].
More importantly, the process of combining qualitative and quantitative data
can be achieved through merging or converging, connecting and embedding
one data set within the other [18, 19, 42, 43, 48]. The evidence further
recommends that researchers need to understand the stage of integrating the
two sets of data and the rationale for doing so. This can inform researchers
regarding the best stage and appropriate ways of combining the two
components of data to adequately address the research question(s).
The evidence recommended some methodological considerations in the
design of mixed methods projects in mental health [18, 19, 45]. These issues
include establishing early collaboration, becoming familiar with the topic,
sharing information on the topic, negotiating any emerging problems and
seeking contributions from team members. The involvement of various
expertise could ensure that methodological issues are clearly identified.
However, addressing such issues midway, or late through the design can
negatively affect the implementation [45]. Any robust discoveries can
rarely be accommodated under the existing design. Therefore, the inclusion
of various methodological expertise during inception can lead to a more
robust mixed-methods design which maximises the contributions of team
members. Whilst fundamental and philosophical differences in qualitative
and quantitative methods may not be resolved, some workable solutions can
be employed, particularly if challenges are viewed as philosophical rather
than personal [45]. The cultural issues can be alleviated by understanding
the concepts, norms and values of the setting, further to respecting and
including perspectives of the various stakeholders.

Qualitative Study in Mental Health Research


The review findings suggest that qualitative methods are relevant when
conducting mental health research. The qualitative methods are mostly used
where there has been limited previous research and an absence of theoretical
perspectives. The approach is also used to gather initial pilot data. More
importantly, the qualitative methods are relevant when we want to understand
sensitive issues, especially from consumers of mental health services, where
the ‘lived experience’ is paramount [19, 28, 44]. Qualitative methods can
help understand the experiences of consumers in the process of treatment, as
well as their therapeutic relationship with mental health professionals. The
experiences of consumers from qualitative data are particularly important
in developing clinical policy [28]. The review identified two paradigms
of qualitative methods used in mental health research. These paradigms
are the interpretive and critical approach [32]. The interpretive qualitative
method(s) include phenomenology, ethnography and narrative approaches
[32, 36]. Conversely, critical qualitative approaches include participatory
action research and emancipatory approaches. The review findings suggest that these
approaches to qualitative methods need critical considerations, particularly
when dealing with consumers of mental health services.

Sampling in Mental Health Research


The review findings identified several sampling techniques used in mental
health research. Quantitative studies usually employ probability sampling,
whilst qualitative studies use non-probability sampling [25, 34]. The most
common sampling technique for quantitative studies is multi-stage
sampling, which may combine systematic, stratified and random sampling, as
well as consecutive sampling. In contrast, the predominant sampling approaches for
qualitative studies are purposive [19, 28, 32, 42, 46], snowballing [30, 32,
46] and theoretical sampling [31, 32].
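To make the probability-sampling side concrete, the following minimal R sketch draws a stratified random sample from a hypothetical sampling frame; the data frame, strata and sampling fraction are illustrative assumptions, not details of any reviewed study:

set.seed(42)

# Hypothetical sampling frame of 1,000 service users across three clinics
frame <- data.frame(
  id     = 1:1000,
  clinic = sample(c("urban", "regional", "rural"), 1000,
                  replace = TRUE, prob = c(0.5, 0.3, 0.2))
)

# Stratified random sampling: draw 10% from each clinic (stratum)
strata  <- split(frame, frame$clinic)
sampled <- do.call(rbind, lapply(strata, function(s) {
  s[sample(nrow(s), size = ceiling(0.10 * nrow(s))), ]
}))

table(sampled$clinic)  # sample sizes remain proportional to each stratum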
The sampling of consumers of mental health services requires some
important considerations. The sampling should consider the appropriateness
and adequacy of the sampling approach, diversity and variety of consumers
of services, attributes such as social or cultural experiences, shared concerns
related to the study, and practical and organisational skills; ethical and
sensitivity issues are also relevant [31, 32, 46]. Sampling consumers of mental
health services should also consider the homogeneity and heterogeneity of
consumers. However, failure to address these considerations can present
difficulty in sampling and subsequently result in selection and reporting bias
in mental health research.

Data Collection in Mental Health Research


The evidence recommends several data collection approaches in collecting
data in mental health research, including focus group discussion, extended
in-depth interviews, observations, field notes, Delphi data collection and
quasi-statistical techniques. Focus group discussion appears to be an
approach widely used to collect data from consumers of mental health
services [19, 28, 30, 31, 41, 44, 47], and a significant means of obtaining
information. This approach promotes
the participation of consumers with severe conditions, particularly at the
group level interaction. Mental health researchers are encouraged to use this
approach to collect data from consumers, in order to promote group level
interaction. Additionally, field notes can be used to supplement information
and to more deeply analyse the interactions of consumers of mental health
services. Field notes are significant when wanting to gather detailed accounts
about the subjective experiences of consumers of mental health services
[40]. Field notes can help researchers to capture the gestures and opinions
of consumers of mental health services which cannot be covered in the
audio-tape recording. In particular, field notes are relevant to complement
the richness of information collected through focus group discussion from
consumers of mental health services.
Furthermore, it was found that in-depth interviews can be used to
explore specific mental health issues, particularly culturally marked
disorders, their personal and interpersonal significance, patient and family
explanatory models, individual and family coping styles, as well as symptom
symbols and protective mediators [19, 30, 34]. The in-depth interviews are
particularly relevant if the study is interested in the lived experiences of
consumers without the contamination of others in a group situation. The in-
depth interviews are relevant when consumers of mental health services are
uncomfortable in disclosing their confidential information in front of others
[31]. The lived experience in a phenomenological context preferably allows
the consumer the opportunity to express themselves anonymously without
any tacit coercion created by a group context.
The review findings recommend significant factors requiring
consideration when collecting data in mental health research. These
considerations include building confidence and trust between the researcher
and consumers [31, 37], gaining confidence of mental health professionals
who manage consumers of mental health services, seeking ethical approval
from the relevant committees, meeting consumers of services before data
collection as well as arranging a mutually acceptable venue for the groups
and providing transport services [41, 47]. The evidence confirms that the
identification of consumers of mental health services to participate in
research can be facilitated by mental health professionals. Similarly, the
cultural and social differences of the consumers of mental health services
need consideration when collecting data from them [26, 31].
Moreover, our review advocates that standardised assessment
instruments can be used to collect data from consumers of mental health
services, particularly for quantitative data collection. The self-completion instruments
for collecting such information can be converted to interviewer-completion
instruments [33, 49]. The interviewer can read the questions to consumers
of mental health services and record their responses. It is recommended that
collecting data from consumers of mental health services requires significant
preparation, such as training with co-investigators and representatives from
consumers of mental health services [25, 30, 33–35, 39, 49]. The training
helps interviewers and other investigators to understand the research
project, particularly translating and adapting an instrument for the study
setting with the aim to achieve content, semantic, concept, criteria and
technical equivalence [30]. The evidence indicates that there is a need to
adequately train interviewers when preparing for fieldwork to collect data
from consumers of mental health services.

Quality Assurance Procedures in Mental Health Research


The evidence provides several approaches that can be employed to ensure
quality assurance in mental health research involving quantitative methods.
The quality assurance approach encompasses seeking informed consent
from consumers of mental health services [26, 37], pre-testing of tools [25,
49], minimising non-response rates and monitoring of the data collection
process [25, 33, 49]. The quality assurance process in mental health research
primarily aims to achieve the highest reliability, validity and timeliness, to
improve the quality of care provided. For instance, the informed consent
exposes consumers of mental health services to the aim(s), methods,
anticipated benefits and potential hazards and discomforts of participating in
the study. Herein, consumers of mental health services who cannot respond
to the informed consent process because of the severity of their illness can
have it signed by their family caregivers. The implication is that researchers
should determine which category of consumers of mental health services
need family caregivers involved in the consent process [37].
The review findings advise that researchers should use pre-testing
to evaluate the data collection procedure on a small scale and then to
subsequently make any necessary changes [25]. The pre-testing aims to
help the interviewers get acquainted with the procedures and to detect any
potential problems [49]. The researchers can discuss the findings of the
pre-testing and then resolve any challenges that arise prior to
commencing the actual fieldwork. The non-response rates in mental
health research can be minimised by re-approaching consumers of mental
health services who initially refuse to participate in the study.
In addition, quality assurance for qualitative data can be ensured
by applying the principles of credibility, dependability, transferability,
reflexivity and confirmability [19, 32]. It was found that the credibility of
qualitative research in mental health is achieved through methodological
and interpretive rigour [32, 37]. The methodological rigour for assessing
credibility relates to congruence, responsiveness or sensitivity to a social
context, appropriateness, adequacy and transparency. By contrast, ensuring
interpretive rigour is achieved through authenticity, coherence, reciprocity,
typicality and permeability of researchers’ intentions, engagement and
interpretation [32, 46].

Strengths and Limitations


This review has several strengths and limitations that require interpretation
and explanation. Firstly, we employed a systematic approach involving
five stages of problem identification, literature search, data evaluation,
data synthesis and presentation of results [21]. Similarly, we searched six
databases and developed a data extraction form to extract information. The
rigorous process employed in this study, for instance, searching databases
and data extraction forms, helped to capture comprehensive information on
the subject.
The integrative review has several limitations largely related to the search
words, language limitations, time period and appraisal of methodological
quality of included papers. In particular, the differences in key terms and
words concerning methodological issues in the context of mental health
research across cultures and organisational contexts may possibly have
missed some relevant articles pertaining to the study. Similarly, limiting
included studies to only English language articles and those published from
January 2000 to July 2018 could have missed useful articles published in
other languages and those published prior to 2000. The review did not assess
the methodological quality of included papers using a critical appraisal tool,
however, the combination of clearly articulated search methods, consultation
with the research librarian, and reviewing articles with methodological
experts in mental health research helped to address the limitations.

CONCLUSION
The review identified several methodological issues that need critical
attention when conducting mental health research. The evidence confirms
that studies that addressed methodological considerations in conducting
mental health research largely focus on qualitative studies in a transcultural
setting, in addition to lessons from multi-site surveys in mental health
research. Specifically, the methodological issues related to the study design,
sampling, data collection processes and quality assurance are critical to the
research design chosen for any particular study. The review highlighted
that researchers conducting mental health research can establish early
collaboration, familiarise themselves with the topic, share information on the
topic, negotiate to resolve any emerging problems and seek the contribution
of clinical (or researcher) team members on the ground. In addition, the
recruitment of consumers of mental health services should consider the
appropriateness and adequacy of sampling approaches, diversity and variety
of consumers of services, their social or cultural experiences, practical and
organisational skills, as well as ethical and sensitivity issues.
The evidence confirms that in an attempt to effectively recruit and collect
data from consumers of mental health services, there is the need to build
confidence and trust between the researcher and consumers; and to gain the
confidence of mental health service providers. Furthermore, seeking ethical
approval from the relevant committee, meeting with consumers of services
before data collection, arranging a mutually acceptable venue for the groups,
and providing transport services, are all further important considerations. The
review findings establish that researchers conducting mental health research
should consider several quality assurance issues, such as adequate
training prior to data collection, seeking informed consent from consumers
of mental health services, pre-testing of tools, minimising non-response rates
and monitoring of the data collection process. More specifically, quality
assurance for qualitative data can be achieved by applying the principles of
credibility, dependability, transferability, reflexivity and confirmability.
Based on the findings from this review, it is recommended that mental
health research should adequately consider the methodological issues
regarding study design, sampling, data collection procedures and quality
assurance issues to effectively conduct meaningful research.

ACKNOWLEDGEMENTS
The authors wish to thank the University of Newcastle Graduate Research
and the School of Nursing and Midwifery, for the Doctoral Scholarship
offered to the lead author. The authors are also grateful for the support
received from Ms. Debbie Booth, the Librarian, for supporting the literature
search.

AUTHORS’ CONTRIBUTIONS
EB, APO’B, and RM conceptualized the study. EB conducted the data
extraction, APO’B, and RM, conducted the second review of the extracted
data. EB, working closely with APO’B and RM performed the content
analysis and drafted the manuscript. EB, APO’B, and RM, reviewed and
made inputs into the intellectual content and agreed on its submission for
publication. All authors read and approved the final manuscript.

REFERENCES
1. National Ethics Advisory Committee. Ethical guidelines for
intervention studies: revised edition. Wellington (New Zealand):
Ministry of Health. 2012.
2. Mann C. Observational research methods. Research design II: cohort,
cross sectional, and case-control studies. Emerg Med J. 2003;20(1):54–
60. doi: 10.1136/emj.20.1.54.
3. DiPietro NA. Methods in epidemiology: observational study designs.
Pharmacotherapy: The Journal of Human Pharmacology and Drug
Therapy. 2010;30(10):973–984. doi: 10.1592/phco.30.10.973.
4. Hong NQ, Pluyr P, Fabregues S, Bartlett G, Boardman F, Cargo M,
et al. Mixed Methods Appraisal Tool (MMAT). Canada.: Intellectual
Property Office, Canada; 2018.
5. Creswell JW, Creswell JD. Research design: qualitative, quantitative,
and mixed methods approaches: sage publications. 2017.
6. Wisdom J, Creswell JW. Mixed methods: integrating quantitative and
qualitative data collection and analysis while studying patient-centered
medical home models. Rockville: Agency for Healthcare Research and
Quality; 2013.
7. Bonita R, Beaglehole R, Kjellström T. Basic epidemiology: World
Health Organization. 2006.
8. Centers for Disease Control Prevention [CDC]. Principles of
epidemiology in public health practice: an introduction to applied
epidemiology and biostatistics. Atlanta, GA: US Dept. of Health and
Human Services, Centers for Disease Control and Prevention (CDC),
Office of Workforce and Career Development; 2012.
9. Parab S, Bhalerao S. Study designs. International journal of Ayurveda
research. 2010;1(2):128. doi: 10.4103/0974-7788.64406.
10. Yang W, Zilov A, Soewondo P, Bech OM, Sekkal F, Home PD.
Observational studies: going beyond the boundaries of randomized
controlled trials. Diabetes Res Clin Pract. 2010;88:S3–S9. doi:
10.1016/S0168-8227(10)70002-4.
11. Department of Family Medicine (McGill University). Mixed Methods
Appraisal Tool (MMAT) – Version 2011 Canada: McGill University;
2011 [Available from: http://mixedmethodsappraisaltoolpublic.pbworks.com/w/file/fetch/84371689/MMAT%202011%20criteria%20and%20tutorial%202011-06-29updated2014.08.21.pdf].
12. Besen J, Gan SD. A Critical Evaluation of Clinical
Research Study Designs. Journal of Investigative Dermatology.
2014;134(3):1–4. doi: 10.1038/jid.2013.545.
13. Axelrod DA, Hayward R. Nonrandomized interventional study designs
(quasi-experimental designs). Clinical research methods for surgeons:
Springer; 2006. p. 63–76.
14. Thiese MS. Observational and interventional study design types; an
overview. Biochemia Medica. 2014;24(2):199–
210. doi: 10.11613/BM.2014.022.
15. Velengtas P, Mohr P, Messner DA. Making informed decisions: assessing
the strengths and weaknesses of study designs and analytic methods for
comparative effectiveness research. National Pharmaceutical Council
2012.
16. Guerrera F, Renaud S, Tabbò F, Filosso PL. How to design a randomized
clinical trial: tips and tricks for conduct a successful study in thoracic
disease domain. Journal of thoracic disease. 2017;9(8):2692. doi:
10.21037/jtd.2017.06.147.
17. Bhide A, Shah PS, Acharya G. A simplified guide to randomized
controlled trials. Acta Obstet Gynecol Scand. 2018;97(4):380–387.
doi: 10.1111/aogs.13309.
18. Palinkas L, Horwitz SM, Chamberlain P, Hurlburt MS, Landsverk
J. Mixed-methods designs in mental health services research:
a review. Psychiatr Serv. 2011;62(3):255–263. doi: 10.1176/
ps.62.3.pss6203_0255.
19. Palinkas L. Qualitative and mixed methods in mental health services
and implementation research. J Clin Child Adolesc Psychol.
2014;43(6):851–861. doi: 10.1080/15374416.2014.910791.
20. World Health Organization [WHO]. Mental health: a state of well-
being 2014 [Available from: http://www.who.int/features/factfiles/
mental_health/en/.
21. Whittemore R, Knafl K. The integrative review: updated
methodology. J Adv Nurs. 2005;52(5):546–553. doi: 10.1111/j.1365-
2648.2005.03621.x.
22. Hopia H, Latvala E, Liimatainen L. Reviewing the methodology of
an integrative review. Scand J Caring Sci. 2016;30(4):662–669. doi:
10.1111/scs.12327.
23. Pearson A, White H, Bath-Hextall F, Apostolo J, Salmond S, Kirkpatrick
P. Methodology for JBI mixed methods systematic reviews. The Joanna
Briggs Institute Reviewers Manual. 2014;1:5–34.
24. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred
reporting items for systematic reviews and meta-analyses: the PRISMA
statement. PLoS Med. 2009;6(7):e1000097. doi: 10.1371/journal.
pmed.1000097.
25. Alonso J, Angermeyer MC, Bernert S, Bruffaerts R, Brugha TS,
Bryson H, et al. Sampling and methods of the European study of the
epidemiology of mental disorders (ESEMeD) project. Acta Psychiatr
Scand Suppl. 2004;109(420):8–20.
26. Baarnhielm S, Ekblad S. Qualitative research, culture and ethics: a
case discussion. Transcultural Psychiatry. 2002;39(4):469–483. doi:
10.1177/1363461502039004493.
27. Braun V, Clarke V. Using thematic analysis in psychology. Qual
Res Psychol. 2006;3(2):77–101. doi: 10.1191/1478088706qp063oa.
28. Brown C, Lloyd K. Qualitative methods in psychiatric research.
Adv Psychiatr Treat. 2001;7(5):350–356. doi: 10.1192/apt.7.5.350.
29. Davidsen AS. Phenomenological approaches in psychology and
health sciences. Qual Res Psychol. 2013;10(3):318–339. doi:
10.1080/14780887.2011.608466.
30. de Jong JT, Van Ommeren M. Toward a culture-informed epidemiology:
combining qualitative and quantitative research in transcultural
contexts. Transcultural Psychiatry. 2002;39(4):422–433. doi:
10.1177/136346150203900402.
31. Ekblad S, Baarnhielm S. Focus group interview research in transcultural
psychiatry: reflections on research experiences. Transcultural
Psychiatry. 2002;39(4):484–500. doi: 10.1177/136346150203900406.
32. Fossey E, Harvey C, McDermott F, Davidson L. Understanding and
evaluating qualitative research. Aust N Z J Psychiatry. 2002;36(6):717–
732. doi: 10.1046/j.1440-1614.2002.01100.x.
33. Jacobi F, Wittchen H-U, Hölting C, Sommer S, Lieb R, Höfler M,
Pfister H. Estimating
the prevalence of mental and somatic disorders in the community:
aims and methods of the German National Health Interview and
Examination Survey. International Journal of Methods in Psychiatric
Research. 2002;11(1):1–18. doi: 10.1002/mpr.118.
34. Koch A, Vogel A, Holzmann M, Pfennig A, Salize HJ, Puschner B, et al.
MEMENTA-‘Mental healthcare provision for adults with intellectual
disability and a mental disorder’. A cross-sectional epidemiological
multisite study assessing prevalence of psychiatric symptomatology,
needs for care and quality of healthcare provision for adults with
intellectual disability in Germany: a study protocol. BMJ Open.
2014;4(5):e004878. doi: 10.1136/bmjopen-2014-004878.
35. Korver N, Quee PJ, Boos HB, Simons CJ, de Haan L, GROUP
Investigators. Genetic risk and outcome of psychosis (GROUP), a multi-site
longitudinal cohort study focused on gene–environment interaction:
objectives, sample characteristics, recruitment and assessment
methods. Int J Methods Psychiatr Res. 2012;21(3):205–221. doi:
10.1002/mpr.1352.
36. Larkin M, Watts S, Clifton E. Giving voice and making sense
in interpretative phenomenological analysis. Qual Res Psychol.
2006;3(2):102–120. doi: 10.1191/1478088706qp062oa.
37. Latvala E, Vuokila-Oikkonen P, Janhonen S. Videotaped recording as a
method of participant observation in psychiatric nursing research. J Adv
Nurs. 2000;31(5):1252–1257. doi: 10.1046/j.1365-2648.2000.01383.x.
38. Leese MN, White IR, Schene AH, Koeter MW, Ruggeri M, Gaite L.
Reliability in multi-site psychiatric studies. Int J Methods Psychiatr
Res. 2001;10(1):29–42. doi: 10.1002/mpr.98.
39. Liu Z, Huang Y, Lv P, Zhang T, Wang H, Li Q, et al. The China
mental health survey: II. Design and field procedures. Soc Psychiatry
Psychiatr Epidemiol. 2016;51(11):1547–1557. doi: 10.1007/s00127-
016-1269-5.
40. Montgomery P, Bailey PH. Field notes and theoretical memos
in grounded theory. West J Nurs Res. 2007;29(1):65–79. doi:
10.1177/0193945906292557.
41. Owen S. The practical, methodological and ethical dilemmas
of conducting focus groups with vulnerable clients. J Adv Nurs.
2001;36(5):652–658. doi: 10.1046/j.1365-2648.2001.02030.x.
42. Palinkas L, Horwitz SM, Green CA, Wisdom JP, Duan N, Hoagwood
K. Purposeful sampling for qualitative data collection and analysis
in mixed method implementation research. Adm Policy Ment Health
Ment Health Serv Res. 2015;42(5):533–544. doi: 10.1007/s10488-013-
0528-y.
43. Palinkas L, Aarons GA, Horwitz S, Chamberlain P, Hurlburt M,
Landsverk J. Mixed method designs in implementation research. Adm
Policy Ment Health Ment Health Serv Res. 2011;38(1):44–53. doi:
10.1007/s10488-010-0314-z.
44. Razafsha M, Behforuzi H, Azari H, Zhang Z, Wang KK, Kobeissy FH,
et al. Qualitative versus quantitative methods in psychiatric research.
Methods Mol Biol. 2012;829:49–62. doi: 10.1007/978-1-61779-458-
2_3.
45. Robins CS, Ware NC, Dosreis S, Willging CE, Chung JY, Lewis-
Fernández R. Dialogues on mixed-methods and mental health services
research: anticipating challenges, building solutions. Psychiatr Serv.
2008;59(7):727–731. doi: 10.1176/ps.2008.59.7.727.
46. Robinson OC. Sampling in interview-based qualitative research: a
theoretical and practical guide. Qual Res Psychol. 2014;11(1):25–41.
doi: 10.1080/14780887.2013.801543.
47. Schilder K, Tomov T, Mladenova M, Mayeya J, Jenkins R,
Gulbinat W, et al. The appropriateness and use of focus group
methodology across international mental health communities.
International Review of Psychiatry. 2004;16(1–2):24–30. doi:
10.1080/09540260310001635078.
48. Schoonenboom J, Johnson RB. How to construct a mixed methods
research design. KZfSS Kölner Zeitschrift für Soziologie und
Sozialpsychologie. 2017;69(2):107–131. doi: 10.1007/s11577-017-
0454-1.
49. Yin H, Phillips MR, Wardenaar KJ, Xu G, Ormel J, Tian H, et al. The
Tianjin mental health survey (TJMHS): study rationale, design and
methods. Int J Methods Psychiatr Res. 2017;26(3):e1535. doi: 10.1002/
mpr.1535.
Chapter 8

WIKI SURVEYS: OPEN AND
QUANTIFIABLE SOCIAL
DATA COLLECTION

Matthew J. Salganik1, Karen E. C. Levy2

1 Department of Sociology, Center for Information Technology Policy, and Office of
Population Research, Princeton University, Princeton, NJ, USA
2 Information Law Institute and Department of Media, Culture, and Communication, New
York University, New York, NY, USA and Data & Society Research Institute, New York,
NY, USA

ABSTRACT
In the social sciences, there is a longstanding tension between data collection
methods that facilitate quantification and those that are open to unanticipated
information. Advances in technology now enable new, hybrid methods that
combine some of the benefits of both approaches. Drawing inspiration from

Citation: (APA): Salganik, M. J., & Levy, K. E. (2015). Wiki surveys: Open and quan-
tifiable social data collection. PloS one, 10(5), e0123483. (17 pages)
Copyright: ©2015 Salganik, Levy. This is an open access article distributed under the
terms of the Creative Commons Attribution 4.0 International License (http://creative-
commons.org/licenses/by/4.0/)
online information aggregation systems like Wikipedia and from traditional
survey research, we propose a new class of research instruments called wiki
surveys. Just as Wikipedia evolves over time based on contributions from
participants, we envision an evolving survey driven by contributions from
respondents. We develop three general principles that underlie wiki surveys:
they should be greedy, collaborative, and adaptive. Building on these
principles, we develop methods for data collection and data analysis for one
type of wiki survey, a pairwise wiki survey. Using two proof-of-concept
case studies involving our free and open-source website www.allourideas.
org, we show that pairwise wiki surveys can yield insights that would be
difficult to obtain with other methods.

INTRODUCTION
In the social sciences, there is a longstanding tension between data
collection methods that facilitate quantification and those that are open
to unanticipated information. For example, one can contrast a traditional
public opinion survey based on a series of pre-written questions and
answers with an interview in which respondents are free to speak in their
own words. The tension between these approaches derives, in part, from
the strengths of each: open approaches (e.g., interviews) enable us to learn
new and unexpected information, while closed approaches (e.g., surveys)
tend to be more cost-effective and easier to analyze. Fortunately, advances
in technology now enable new, hybrid approaches that combine the benefits
of each. Drawing inspiration both from online information aggregation
systems like Wikipedia and from traditional survey research, we propose
a new class of research instruments called wiki surveys. Just as Wikipedia
grows and improves over time based on contributions from participants, we
envision an evolving survey driven by contributions from respondents.
Although the tension between open and closed approaches to data
collection is currently most evident in disagreements between proponents
of quantitative and qualitative methods, the trade-off between open and
closed survey questions was also particularly contentious in the early days
of survey research [1–3]. Although closed survey questions, in which
respondents choose from a series of pre-written answer choices, have come
to dominate the field, this is not because they have been proven superior for
measurement. Rather, the dominance of closed questions is largely based
on practical considerations: having a fixed set of responses dramatically
simplifies data analysis [4].
The dominance of closed questions, however, has led to some missed
opportunities, as open approaches may provide insights that closed methods
cannot [4–8]. For example, in one study, researchers conducted a split-ballot
test of an open and closed form of a question about what people value in
jobs [5]. When asked in closed form, virtually all respondents provided
one of the five researcher-created answer choices. But, when asked in open
form, nearly 60% of respondents provided a new answer that fell outside the
original five choices. In some situations, these unanticipated answers can be
the most valuable, but they are not easily collected with closed questions.
Because respondents tend to confine their responses to the choices offered
[9], researchers who construct all the possible choices necessarily constrain
what can be learned.
Projects that depend on crowdsourcing and user-generated content, such
as Wikipedia, suggest an alternative approach. What if a survey could be
constructed by respondents themselves? Such a survey could produce clear,
quantifiable results at a reasonable cost, while minimizing the degree to
which researchers must impose their pre-existing knowledge and biases on
the data collection process. We see wiki surveys as an initial step toward this
possibility.
Wiki surveys are intended to serve as a complement to, not a replacement
for, traditional closed and open methods. In some settings, traditional
methods will be preferable, but in others we expect that wiki surveys may
produce new insights. The field of survey research has always evolved in
response to new opportunities created by changes in technology and society
[10–16], and we see this research as part of that longstanding evolution.
In this paper, we develop three general principles that underlie wiki
surveys: they should be greedy, collaborative, and adaptive. Building on
these principles, we develop methods for data collection and data analysis
for one type of wiki survey, a pairwise wiki survey. Using two proof-of-
concept case studies involving our free and open-source website www.
allourideas.org, we show that pairwise wiki surveys can yield insights that
would be difficult to obtain with other methods. The paper concludes with a
discussion of the limitations of this work and possibilities for future research.

WIKI SURVEYS
Online information aggregation projects, of which Wikipedia is an exemplar,
can inspire new directions in survey research. These projects, which are built
from crowdsourced, user-generated content, tend to share certain properties
that are not characteristic of traditional surveys [17–20]. These properties
guide our development of wiki surveys. In particular, we propose that wiki
surveys should follow three general principles: they should be greedy,
collaborative, and adaptive.

Greediness
Traditional surveys attempt to collect a fixed amount of information
from each respondent; respondents who want to contribute less than one
questionnaire’s worth of information are considered problematic, and
respondents who want to contribute more are prohibited from doing so. This
contrasts sharply with successful information aggregation projects on the
Internet, which collect as much or as little information as each participant is
willing to provide. Such a structure typically results in highly unequal levels
of contribution: when contributors are plotted in rank order, the distributions
tend to show a small number of heavy contributors—the “fat head”—and
a large number of light contributors—the “long tail” [21, 22] (Fig 1). For
example, the number of edits to Wikipedia per editor roughly follows a
power-law distribution with an exponent of about 2 [22]. If Wikipedia were to allow
10 and only 10 edits per editor—akin to a survey that requires respondents
to complete one and only one form—it would exclude about 95% of the
edits contributed. As such, traditional surveys potentially leave enormous
amounts of information from the “fat head” and “long tail” uncollected.
Wiki surveys, then, should be greedy in the sense that they should capture as
much or as little information as a respondent is willing to provide.
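To see roughly how much a fixed-length instrument can discard, consider the following toy simulation in R; it assumes a Pareto-style contribution distribution with the exponent quoted above, and stands in for, rather than reproduces, the actual Wikipedia edit data:

set.seed(1)

# Toy contribution counts per person: Pareto(x_min = 1) whose density
# falls off as x^(-2), producing a "fat head" and a "long tail"
n_people      <- 1e5
contributions <- floor(1 / runif(n_people))

# A survey accepting at most 10 contributions per person keeps only:
kept          <- pmin(contributions, 10)
fraction_lost <- 1 - sum(kept) / sum(contributions)
fraction_lost  # typically a large share of all contributions is excluded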

Figure 1: Schematic of rank order plot of contributions to successful online
information aggregation projects.
These systems can handle both heavy contributors (“the fat head”),
shown on the left side of the plot, and light contributors (“the long tail”),
shown on the right side of the plot. Traditional survey methods utilize
information from neither the “fat head” nor the “long tail” and thus leave
huge amounts of information uncollected.
https://doi.org/10.1371/journal.pone.0123483.g001

Collaborativeness
In traditional surveys, the questions and answer choices are typically written
by researchers rather than respondents. In contrast, wiki surveys should be
collaborative in that they are open to new information contributed directly
by respondents that may not have been anticipated by the researcher, as often
happens during an interview. Crucially, unlike a traditional “other” box in a
survey, this new information would then be presented to future respondents
for evaluation. In this way, a wiki survey bears some resemblance to a focus
group in which participants can respond to the contributions of others [23,
24]. Thus, just as a community collaboratively writes and edits Wikipedia,
the content of a wiki survey should be partially created by its respondents.
This approach to collaborative survey construction resembles some forms
of survey pre-testing [25]. However, rather than thinking of pre-testing
as a phase distinct from the actual data collection, in wiki surveys the
collaboration process continues throughout data collection.

Adaptivity
Traditional surveys are static: survey questions, their order, and their possible
answers are determined before data collection begins and do not evolve as
more is learned about the parameters of interest. This static approach, while
easier to implement, does not maximize the amount that can be learned from
each respondent. Wiki surveys, therefore, should be adaptive in the sense that
the instrument is continually optimized to elicit the most useful information,
given what is already known. In other words, while collaborativeness
involves being open to new information, adaptivity involves using the
information that has already been gathered more efficiently. In the context
of wiki surveys, adaptivity is particularly important given that respondents
can provide different amounts of information (due to greediness) and that
some answer choices are newer than others (due to collaborativeness).
Like greediness and collaborativeness, adaptivity increases the complexity
of data analysis. However, research in related areas [26–33] suggests that
gains in efficiency from adaptivity can more than offset the cost of added
complexity.

Pairwise Wiki Surveys
Building on previous work [34–40], we operationalize these three
principles into what we call a pairwise wiki survey. A pairwise wiki survey
consists of a single question with many possible answer items. Respondents
can participate in a pairwise wiki survey in two ways: first, they can make
pairwise comparisons between items (i.e., respondents vote between item A
and item B), and second, they can add new items that are then presented to
future respondents.
Pairwise comparison, which has a long history in the social sciences
[41], is an ideal question format for wiki surveys because it is amenable
to the three criteria described above. Pairwise comparison can be greedy
because the instrument can easily present as many (or as few) prompts as
each respondent is willing to answer. New items contributed by respondents
can easily be integrated into the choice sets of future respondents, enabling
the instrument to be collaborative. Finally, pairwise comparison can be
adaptive because the pairs to be presented can be selected to maximize
learning given previous responses. These properties exist because pairwise
comparisons are both granular and modular; that is, the unit of contribution
is small and can be readily aggregated [17].
Pairwise comparison also has several practical benefits. First, pairwise
comparison makes manipulation, or “gaming,” of results difficult because
respondents cannot choose which pairs they will see; instead, this choice
is made by the instrument. Thus, when there is a large number of possible
items, a respondent would have to respond many times in order to be
presented with the item that she wishes to “vote up” (or “vote down”) [42].
Second, pairwise comparison requires respondents to prioritize items—that
is, because the respondent must select one of two discrete answer choices
from each pair, she is prevented from simply saying that she likes (or
dislikes) every option equally strongly. This feature is particularly valuable
in policy and planning contexts, in which finite resources make prioritization
of ideas necessary. Finally, responding to a series of pairwise comparisons
is reasonably enjoyable, a common characteristic of many successful web-
based social research projects [43, 44].

Data Collection
In order to collect pairwise wiki survey data, we created the free and open-
source website All Our Ideas (www.allourideas.org), which enables anyone
to create their own pairwise wiki survey. To date, about 6,000 pairwise
wiki surveys have been created that include about 300,000 items and 7
million responses. By providing this service online, we are able to collect
a tremendous amount of data about how pairwise wiki surveys work in
practice, and our steady stream of users provides a natural testbed for further
methodological research.
The data collection process in a pairwise wiki survey is illustrated by
a project conducted by the New York City Mayor’s Office of Long-Term
Planning and Sustainability in order to integrate residents’ ideas into PlaNYC
2030, New York’s citywide sustainability plan. The City has typically held
public meetings and small focus groups to obtain feedback from the public.
By using a pairwise wiki survey, the Mayor’s Office sought to broaden the
dialogue to include input from residents who do not traditionally attend
public meetings. To begin the process, the Mayor’s Office generated a list of
25 ideas based on their previous outreach (e.g., “Require all big buildings to
make certain energy efficiency upgrades,” “Teach kids about green issues as
part of school curriculum”).
Using these 25 ideas as “seeds,” the Mayor’s Office created a pairwise
wiki survey with the question “Which do you think is a better idea for creating
a greener, greater New York City?” Respondents were presented with a pair
of ideas (e.g., “Open schoolyards across the city as public playgrounds” and
“Increase targeted tree plantings in neighborhoods with high asthma rates”),
and asked to choose between them (see Fig 2). After choosing, respondents
were immediately presented with another randomly selected pair of ideas.
Respondents were able to continue contributing information about their
preferences for as long as they wished by either voting or choosing “I can’t
decide.” Crucially, at any point, respondents were able to contribute their
own ideas, which—pending approval by the wiki survey creator—became
part of the pool of ideas to be presented to others. Respondents were also
able to view the popularity of the ideas at any time, making the process
transparent. However, by decoupling the processes of voting and viewing
the results—which occur on distinct screens (see Fig 2)—the site prevents
a respondent from having immediate information about the opinions of
others when she responds, which minimizes the risk of social influence and
information cascades [43, 45–48].

Figure 2: Response and results interfaces at www.allourideas.org.
This example is from a pairwise wiki survey created by the New York
City Mayor’s Office to learn about residents’ ideas about how to make New
York “greener and greater.”
https://doi.org/10.1371/journal.pone.0123483.g002
The Mayor’s Office launched its pairwise wiki survey in October 2010
in conjunction with a series of community meetings to obtain resident
feedback. The effort was publicized at meetings in all five boroughs of
the city and via social media. Over about four months, 1,436 respondents
contributed 31,893 responses and 464 ideas to the pairwise wiki survey.

Data Analysis
Given this data collection process, we analyze data from a pairwise wiki
survey in two main steps (Fig 3). First, we use responses to estimate the
opinion matrix Θ that includes an estimate of how much each respondent
values each item. Next, we summarize the opinion matrix to produce a
score for each item that estimates the probability that it will beat a randomly
chosen item for a randomly chosen respondent. Because this analysis is
modular, either step—estimation or summarization—could be improved
independently.

Figure 3: Summary of data analysis plan.
We use responses to estimate the opinion matrix Θ and then we
summarize the opinion matrix with the scores of each item.
https://doi.org/10.1371/journal.pone.0123483.g003

Estimating the opinion matrix
The analysis begins with a set of pairwise comparison responses that are
nested within respondents. For example, Fig 3 shows five hypothetical
responses from two respondents. These responses are used to estimate the
opinion matrix

\Theta = \begin{pmatrix} \theta_{1,1} & \theta_{1,2} & \cdots & \theta_{1,K} \\ \theta_{2,1} & \theta_{2,2} & \cdots & \theta_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\ \theta_{J,1} & \theta_{J,2} & \cdots & \theta_{J,K} \end{pmatrix}

which has one row for each respondent and one column for each item, where
θ_{j,k} is the amount that respondent j values item k (or more generally, the
amount that respondent j believes item k answers the question being asked).
In the New York City example described above, θ_{j,k} could be the amount that
a specific respondent values the idea “Open schoolyards across the city as
public playgrounds.”
Three features of the response data complicate the process of estimating
the opinion matrix Θ. First, because the wiki survey is greedy, we have
an unequal number of responses from each respondent. Second, because
the wiki survey is collaborative, there are some items that can never be
presented to some respondents. For example, if respondent j contributed
an item, then none of the previous respondents could have seen that item.
Collectively, the greediness and the collaborativeness mean that in practice
we often have to estimate a respondent’s value for an item that she has never
encountered. The third problem is that responses are in the form of pairwise
comparisons, which means that we can only observe a respondent’s relative
preference between two items, not her absolute feeling about either item.
In order to address these three challenges, we propose a statistical model
that assumes that respondents’ responses reflect their relative preferences
between items (i.e., the Thurstone-Mosteller model [41, 49, 50]) and that
the distribution of preferences across respondents for each item follows
a normal distribution. Given these assumptions and weakly informative
priors, we can perform Bayesian inference to estimate the θ_{j,k}’s that are
most consistent with the responses that we observe and the assumptions
that we have made. One important feature of this modeling strategy is that
for those who contribute many responses, we can better estimate their row
in the opinion matrix, and for those who contribute fewer responses, we
have to rely more on the pooling of information from other respondents
(i.e., imputation). The specific functional forms that we assume result in
the following posterior distribution, which resembles a hierarchical probit
model:

(1)
where X is an appropriately constructed design matrix, Y is an appropriately
constructed outcome vector, μ = μ_1 … μ_K represents the mean appeal of each
item, and μ_0 = μ_0[1] … μ_0[K] and τ_0² = τ_0[1]² … τ_0[K]² are
parameters of the priors for the mean appeal of each item (μ).
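For intuition about the likelihood underlying this model, the Thurstone-Mosteller assumption implies that each pairwise response is a probit of the difference in latent valuations. A minimal sketch, up to the scale convention chosen for the latent variances (the exact parameterisation used in Eq (1) is an assumption here):

\Pr(\text{respondent } j \text{ chooses item } a \text{ over item } b) = \Phi(\theta_{j,a} - \theta_{j,b}),

where Φ is the standard normal cumulative distribution function. A hierarchical prior of the form θ_{j,k} ~ N(μ_k, σ²) (again, the exact variance structure is an assumption here) then shrinks each respondent's valuations toward the item-level means, which is what allows rows of Θ to be imputed for respondents who answered only a few prompts.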
This statistical model is just one of many possible approaches to
estimating the opinion matrix from the response data, and we hope that future
research will develop improved approaches. We fully derive the model,
discuss situations in which our modeling assumptions might not hold, and
describe the Gibbs sampling approach that we use to make repeated draws
from the posterior distribution. Computer code to make these draws was
written in R [51] and utilized the following packages: plyr [52], multicore
[53], bigmemory [54], truncnorm [55], testthat [56], Matrix [57], and
matrixStats [58].

Summarizing opinion matrix
Once estimated, the opinion matrix Θ may include hundreds of thousands
of parameters —there are often thousands of respondents and hundreds of
items—that are measured on a non-intuitive scale. Therefore, the second
step of our analysis is to summarize the opinion matrix Θ in order to make it
more interpretable. The ideal summary of the opinion matrix will likely vary
from setting to setting, but our preferred summary statistic is what we call
the score of each item, ŝ_i, which is the estimated chance that it will beat
a randomly chosen item for a randomly chosen respondent. That is,

\hat{s}_i = \frac{100}{J(K-1)} \sum_{j=1}^{J} \sum_{k \neq i} \Pr(\theta_{j,i} > \theta_{j,k})   (2)
The minimum score is 0 for an item that is always expected to lose,
and the maximum score is 100 for an item that is always expected to win.
For example, a score of 50 for the idea “Open schoolyards across the city
as public playgrounds” means that we estimate it is equally likely to win or
lose when compared to a randomly selected idea for a randomly selected
respondent. To construct 95% posterior intervals around the estimated
scores, we use the t posterior draws of the opinion matrix (Θ^(1), Θ^(2), …, Θ^(t))
to calculate t posterior draws of s (ŝ^(1), ŝ^(2), …, ŝ^(t)).
From these draws, we calculate the 95% posterior intervals around ŝ_i
by finding values a and b such that Pr(ŝ_i > a) = 0.025 and
Pr(ŝ_i < b) = 0.025 [59].
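As a concrete sketch of this summarisation step in R, the code below computes scores and interval endpoints from posterior draws of the opinion matrix; the simulated array merely stands in for Gibbs-sampler output, so its dimensions and values are illustrative assumptions:

set.seed(7)

# Stand-in posterior draws: t draws of a J x K opinion matrix Theta
t_draws <- 200; J <- 50; K <- 8
theta_draws <- array(rnorm(t_draws * J * K), dim = c(t_draws, J, K))

# Score of item i for one draw of Theta: percentage of (respondent,
# other-item) pairs in which item i has the higher valuation
score_one_draw <- function(theta, i) {
  100 * mean(theta[, i] > theta[, -i, drop = FALSE])
}

# Posterior draws of each item's score, then medians and 95% intervals
scores <- sapply(1:K, function(i) apply(theta_draws, 1, score_one_draw, i = i))
summary_tab <- data.frame(
  item  = 1:K,
  score = apply(scores, 2, median),
  lower = apply(scores, 2, quantile, probs = 0.025),
  upper = apply(scores, 2, quantile, probs = 0.975)
)

The 2.5% and 97.5% quantiles of the score draws correspond to the values b and a defined above.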
We chose to conduct a two-step analysis process—estimating and then
summarizing the opinion matrix, Θ—rather than estimating the scores
directly for three reasons. First, we believe that making the opinion matrix,
Θ, an explicit target of inference underscores the possible heterogeneity of
preferences among respondents. Second, by estimating the opinion matrix
as an intermediate step, our approach can be extended to cases in which
co-variates are added at the level of the respondent (e.g., gender, age,
income, etc.) or at the level of the item (e.g., about the economy, about
the environment, etc.). Finally, although we are currently most interested in
the score as a summary statistic, there are many possible summaries of the
opinion matrix that could be important, and by estimating Θ we enable future
researchers to choose other summaries that may be important in their setting
(e.g., which items cluster together such that people who value one item in
the cluster tend to value other items in the cluster?). We return to some
possible improvements, extensions, and generalizations in the Discussion.

CASE STUDIES
To show how pairwise wiki surveys operate in practice, in this section we
describe two case studies in which the All Our Ideas platform was used
for collecting and prioritizing community ideas for policymaking: New
York City’s PlaNYC 2030 and the Organisation for Economic Co-operation
and Development (OECD)’s “Raise Your Hand” initiative. As described
previously, the New York City Mayor’s Office conducted a wiki survey in
order to integrate residents’ ideas into the 2011 update to the City’s long-
term sustainability plan. The wiki survey asked residents to contribute their
ideas about how to create “a greener, greater New York City” and to vote
on the ideas of others. The OECD’s wiki survey was created in preparation
for an Education Ministerial Meeting and an Education Policy Forum on
“Investing in Skills for the 21st Century.” The OECD sought to bring fresh
ideas from the public to these events in a democratic, transparent, and
bottom-up way by seeking input from education stakeholders located around
the globe. To accomplish these goals, the OECD created a wiki survey to
allow respondents to contribute and vote on ideas about “the most important
action we need to take in education today.”
We assisted the New York City Mayor’s Office and the OECD in the
process of setting up their wiki surveys, and spoke with officials of both
institutions multiple times over the course of survey administration. We
also conducted qualitative interviews with officials from both groups at the
conclusion of survey data collection in order to better understand how the
wiki surveys worked in practice, contextualize the results, and get a better
sense of whether the use of a wiki survey enabled the groups to obtain
information that might have been difficult to obtain via other data collection
methods. Unfortunately, logistical considerations prevented either group
from using a probabilistic sampling design. Therefore, we can only draw
inferences about respondents, who should not be considered a random
sample from some larger population. However, wiki surveys can be used in
conjunction with probabilistic sampling designs, and we will return to the
issue of sampling in the Discussion.

Quantitative Results
The pairwise wiki surveys conducted by the New York City Mayor’s Office
and the OECD had similar patterns of respondent participation. In the
PlaNYC wiki survey, 1,436 respondents contributed 31,893 responses, and
in the OECD wiki survey 1,668 respondents contributed 28,852 responses.
Further, respondents contributed a substantial number of new ideas (464 for
PlaNYC, and 534 for OECD). Of these contributed ideas, those that the wiki
survey creators deemed inappropriate or duplicative were not activated. In
the end, the number of ideas under consideration was dramatically expanded.
For PlaNYC the number of active ideas in the wiki survey increased from
25 to 269, a 10-fold increase, and for the OECD from 60 to 285, a 5-fold
increase (Fig 4).

Figure 4: Cumulative number of activated ideas for PlaNYC [A] and OECD
[B].
The PlaNYC wiki survey ran from October 7, 2010 to January 30, 2011.
The OECD wiki survey ran from September 15, 2010 to October 15, 2010.
In both cases the pool of ideas grew over time as respondents contributed to
the wiki survey. PlaNYC had 25 seed ideas and 464 user-contributed ideas,
244 of which the Mayor’s Office activated. The OECD had 60 seed ideas
(6 of which it deactivated during the course of the survey), and 534 user-
contributed ideas, 231 of which it activated. In both cases, ideas that were
deemed inappropriate or duplicative were not activated.
https://doi.org/10.1371/journal.pone.0123483.g004
Within each survey, the level of respondent contribution varied widely,
in terms of both number of responses and number of ideas contributed, as
we expected given the greedy nature of the wiki survey. In both cases, the
distributions of both responses and contributed ideas contained “fat heads”
and “long tails” (see Fig 5). If the wiki surveys captured only a fixed amount
of information per respondent—as opposed to capturing all levels of effort—a
significant amount of information would have been lost. For instance, if
we only accepted the first 10 responses per respondent and discarded all
respondents with fewer than 10 responses, approximately 75% of the
responses in each survey would have been discarded. Further, if we were
to limit the number of ideas contributed to one per respondent, as is typical
in surveys with one and only one “other box,” we would have excluded a
significant number of new ideas: nearly half of the user-contributed ideas in
the PlaNYC survey and about 40% in the OECD survey.

Figure 5: Distribution of contribution per respondent for PlaNYC [A] and OECD [B].
Both the number of responses per respondent and the number of ideas
contributed per respondent show a “fat head” and a “long tail.” Note that the
scales on the figures are different.
https://doi.org/10.1371/journal.pone.0123483.g005
In both cases, many of the highest-scoring ideas were contributed
by respondents. For PlaNYC, 8 of the top 10 ideas were contributed by
users, as were 7 of the top 10 ideas for the OECD (Fig 6). These high-scoring
user-contributed ideas highlight a strength of pairwise wiki surveys relative
to both traditional surveys and interviews. With a survey, it would have
been difficult to learn about these new user-contributed ideas, and with an
interview it would have been difficult to empirically assess the support that
respondents have for them.

Figure 6: Ten highest-scoring ideas for PlaNYC [A] and OECD [B].
Ideas that were contributed by respondents are printed in a bold/italic font
and marked by closed circles; seed ideas are printed in a standard font and
marked by open circles. In the case of PlaNYC, 8 of the 10 highest-scoring
ideas were contributed by respondents. In the case of the OECD, 7 of the
10 highest-scoring ideas were contributed by respondents. Horizontal lines
show 95% posterior intervals.
https://doi.org/10.1371/journal.pone.0123483.g006
Building on these specific results, we can begin to formulate a general
model that describes the situations in which many of the top scoring items
will be contributed by respondents. Three mathematical factors determine
the extent to which an idea generation process will produce extreme
outcomes (i.e., high scoring ideas): the number of ideas, the mean of ideas’
scores, and the variance of ideas’ scores [60]. In both of these case studies,
there were many more user-contributed ideas than seed ideas, and they
had higher variance in scores (Fig 7). These two features—volume and
variance—ensured that many of the highest-scoring ideas were contributed
by respondents, even though these ideas had a lower mean score than the seed
ideas. Thus, in settings in which researchers seek to discover the highest-
scoring ideas, the high variance and high volume of user-contributed ideas
make them a likely source of these extreme outcomes.
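
To make this volume-and-variance argument concrete, the following minimal simulation sketch (ours, not part of the original study; all parameters are illustrative assumptions) draws a few "seed" scores from a higher-mean distribution and many "user-contributed" scores from a lower-mean but higher-variance distribution, then reports how often the overall maximum comes from the user-contributed pool.

import java.util.Random;

// Illustrative simulation: many high-variance ideas usually supply the
// maximum score, even when their mean is lower. All parameters are
// assumptions chosen for illustration only.
public class ExtremeIdeas {
    public static void main(String[] args) {
        Random rng = new Random(42);
        int runs = 10000, userWins = 0;
        for (int r = 0; r < runs; r++) {
            double bestSeed = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < 25; i++)   // few seed ideas: mean 60, sd 10
                bestSeed = Math.max(bestSeed, 60 + 10 * rng.nextGaussian());
            double bestUser = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < 250; i++)  // many user ideas: mean 50, sd 20
                bestUser = Math.max(bestUser, 50 + 20 * rng.nextGaussian());
            if (bestUser > bestSeed) userWins++;
        }
        System.out.printf("Top idea is user-contributed in %.1f%% of runs%n",
                100.0 * userWins / runs);
    }
}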

Figure 7: Distribution of scores of seed ideas and user-contributed ideas for PlaNYC [A] and OECD [B].
In both cases, some of the lowest-scoring ideas were user-contributed,
but critically, some of the highest-scoring ideas were also user-contributed.
In general, the large number of user-contributed ideas, combined with their
high variance, means that they typically include some extremely popular
ideas. Posterior intervals for each estimate are not shown.
https://doi.org/10.1371/journal.pone.0123483.g007

Qualitative Results
Because user-contributed ideas that score well are likely to be of interest—
in fact, they highlight the value of the collaborativeness of wiki surveys—
we sought to understand more about these items by conducting interviews
with the creators of the PlaNYC and OECD wiki surveys. Based on these
interviews, as well as interviews with six other wiki survey creators, we
identified two general categories of high-scoring user-contributed ideas:
novel information—that is, substantively new ideas that were not anticipated
by the wiki survey creators—and alternative framings—that is, new and
resonant ways of expressing existing ideas.
Some high-scoring user-contributed ideas contained information that
was novel to the wiki survey creator. For example, in the PlaNYC context,
the Mayor’s Office reported that user-contributed ideas were sometimes
able to bridge multiple policy arenas (or “silos”), connections that might
have been more difficult to make for office staff working within a specific
arena. For instance, consider the high-scoring user-contributed idea “plug
ships into electricity grid so they don’t idle in port—reducing emissions
equivalent to 12000 cars per ship.” The Mayor’s Office suggested that
staff may not have prioritized such an idea internally (it did not appear on
the Mayor’s Office’s list of seed ideas), even though the idea’s high score
suggested public support for this policy goal: “[T]his relates to two areas.
So plugging ships into electricity grid, so that’s one, in terms of energy and
sourcing energy. And it relates to freight. [Question: Okay, which are two
separate silos?] Correct, so freight is something that we’re looking closer at.
… And emissions, reducing emissions, is something that’s an overall goal
of the plan. … So this has a lot of value to it for us to learn from” (interview
with Ibrahim Abdul-Matin, New York City Mayor’s Office, December 12,
2010).
Other user-contributed ideas suggested alternative framings for existing
ideas. For instance, the creators of the OECD wiki survey noted that high-
scoring, user-contributed ideas like “Teach to think, not to regurgitate”
“wouldn’t be formulated in such a way [by the OECD]. … [I]t’s very
un-OECD-speak, which we liked” (interview with Julie Harris, OECD,
February 3, 2011). More generally, OECD staff noted that “what for me has
been most interesting is that … those top priorities [are] very much couched
in the language of principles[. …] It’s sort of constitutional language”
(interview with Joanne Caddy, OECD, February 15, 2011). PlaNYC’s wiki
survey creators also described the importance of user-contributed ideas
being expressed in unexpected ways. The top-scoring idea in PlaNYC’s wiki
survey, contributed by a respondent, was “Keep NYC’s drinking water clean
by banning fracking in NYC’s watershed”; Mayor’s Office staff indicated
that the office would have used more general language about protecting the
watershed, rather than referencing fracking explicitly: “[W]e talk about it
differently. We’ll say, ‘protect the watershed.’ We don’t say, ‘protect the
watershed from fracking’” (interview with Ibrahim Abdul-Matin, New York
City Mayor’s Office, December 12, 2010).
Taken together, these two case studies suggest that pairwise wiki surveys
can provide information that is difficult, if not impossible, to gather from
more traditional surveys or interviews. This unique information comes from
high-scoring user-contributed ideas, and may involve both the content of the
ideas and the language used to frame them.

DISCUSSION
In this paper we propose a new class of data collection instruments called
wiki surveys. By combining insights from traditional survey research and
projects such as Wikipedia, we propose three general principles that all wiki
surveys should satisfy: they should be greedy, collaborative, and adaptive.
Designing an instrument that satisfies those three criteria introduces a
number of challenges for data collection and data analysis, which we attempt
to resolve in the form of a pairwise wiki survey. Through two case studies
we show that pairwise wiki surveys can enable data collection that would
be difficult with other methods. Moving beyond these proof-of-concept case
studies to a fuller understanding of the strengths and weaknesses of pairwise
wiki surveys, in particular, and wiki surveys, in general, will require
substantial additional research.
One next step for improving our understanding of the measurement
properties of pairwise wiki surveys would be additional studies to assess
the consistency and validity of responses. Consistency could be assessed
by measuring the extent to which respondents provide identical responses
to the same pair and provide transitive responses to a series of pairs.
Assessing validity would be more difficult, however, because wiki surveys
tend to measure subjective states, such as attitudes, for which gold-standard
measures rarely exist [61]. Despite the inherent difficulty of validating
measures of subjective states, there are several approaches that could lead
to increased confidence in the validity of pairwise wiki surveys [62]. First,
studies could be done to assess discriminant validity by measuring the
extent to which groups of respondents who are thought to have different
preferences produce different wiki survey results. Second, construct validity
could be assessed by measuring the extent to which responses for items
that we believe to be similar are in fact similar. Third, studies could assess
predictive validity by measuring the ability of results from pairwise wiki
surveys to predict the future behavior of respondents. Finally, the results of
pairwise wiki surveys could be compared to data collected through other
quantitative and qualitative methodologies.
Another area for future research about pairwise wiki surveys is
improving the statistical methods used to estimate the opinion matrix—either
by choosing pairs more efficiently or developing more flexible statistical
models. First, one could develop algorithms that would choose pairs so as to
maximize the amount learned from each respondent. However, maximizing
the amount of information per response [63–65] may not maximize the
amount of information per respondent, which is determined by both the
information per response and the number of responses provided by the
respondent [66]. That is, an algorithm that chooses very informative pairs
from a statistical perspective might not be effective if people do not enjoy
responding to those kinds of pairs. Thus, algorithms could be developed
to address both maximization of information per pair and to encourage
participation by, for example, choosing pairs to which respondents enjoy
responding. In addition to choosing pairs more efficiently, we believe
that substantial progress can be made by developing more flexible and
general statistical models for estimating the opinion matrix from a set of
responses. For example, the statistical model we propose could be extended
to include covariates at the level of the respondent (e.g., age, gender, level
of education, etc.) and at the level of the item (e.g., phrase structure, item
topic, etc.). Another modeling improvement would involve creating more
flexible assumptions about the distributions of opinions among respondents.
These methodological improvements could be assessed by their robustness
and their ability to improve the prediction of future responses (e.g., [67]).
Another important next step is to combine pairwise wiki surveys with
probabilistic sampling methods, something that was logistically impossible
in our case studies. If one thinks of survey research as a combination of
sampling and interacting with respondents [68], then pairwise wiki surveys
should be considered a new way of interacting with respondents, not a new
way of sampling. However, pairwise wiki surveys can be naturally combined
with a variety of different sampling designs. For example, researchers
wishing to employ pairwise wiki surveys with a nationally representative
sample can make use of commercially available online panels [69, 70].
Further, researchers wishing to study more specific groups—e.g., workers
in a firm or residents in a city—could draw their own probability samples
from administrative records.
Given the significant amount of work that remains to be done, we have
taken a number of concrete steps to facilitate the future development of
pairwise wiki surveys. First, we have made it easy for other researchers
to create and host their own pairwise wiki surveys at www.allourideas.org.
Further, the website enables researchers to download detailed data from their
survey which can be analyzed in any way that researchers find appropriate.
Finally, we have made all of the code that powers www.allourideas.org
available open-source so that anyone can modify and improve it. We hope
that these concrete steps will stimulate the development of pairwise wiki
surveys. Further, we hope that other researchers will create different types
of wiki surveys, particularly wiki surveys in which respondents themselves
help to generate the questions [71, 72]. We expect that the development of
wiki surveys will lead to new and powerful forms of open and quantifiable
data collection.

ACKNOWLEDGMENTS
We thank Peter Lubell-Doughtie, Adam Sanders, Pius Uzamere, Dhruv
Kapadia, Chap Ambrose, Calvin Lee, Dmitri Garbuzov, Brian Tubergen,
Peter Green, and Luke Baker for outstanding web development; we thank
Nadia Heninger, Bill Zeller, Bambi Tsui, Dhwani Shah, Gary Fine, Mark
Newman, Dennis Feehan, Sophia Li, Lauren Senesac, Devah Pager, Paul
DiMaggio, Adam Slez, Scott Lynch, David Rothschild, and Ceren Budak
for valuable suggestions; and we thank Josh Weinstein for his critical role
in the genesis of this project. Further, we thank Ibrahim Abdul-Matin and
colleagues at the New York City Mayor’s Office and Joanne Caddy, Julie
Harris, and Cassandra Davis at the Organisation for Economic Co-operation
and Development. This paper represents the views of its authors and not the
users or funders of www.allourideas.org.

AUTHOR CONTRIBUTIONS
Conceived and designed the experiments: MJS KECL. Performed the
experiments: MJS KECL. Analyzed the data: MJS KECL. Wrote the paper:
MJS KECL.

REFERENCES
1. Lazarsfeld PF. The Controversy Over Detailed Interviews—An Offer
for Negotiation. Public Opinion Quarterly. 1944;8(1):38–60.
2. Converse JM. Strong arguments and weak evidence: The open/closed
questioning controversy of the 1940s. Public Opinion Quarterly.
1984;48(1):267–282.
3. Converse JM. Survey research in the United States: Roots and
emergence 1890–1960. New Brunswick: Transaction Publishers; 2009.
4. Schuman H. Method and meaning in polls and surveys. Cambridge:
Harvard University Press; 2008.
5. Schuman H, Presser S. The Open and Closed Question. American
Sociological Review. 1979 Oct;44(5):692–712.
6. Schuman H, Scott J. Problems in the Use of Survey Questions to
Measure Public Opinion. Science. 1987 May;236(4804):957–959.
pmid:17812751
7. Presser S. Measurement Issues in the Study of Social Change. Social
Forces. 1990 Mar;68(3):856–868.
8. Roberts ME, Stewart BM, Tingley D, Lucas C, Leder-Luis J, Gadarian
SK, et al. Structural Topic Models for Open-Ended Survey Responses.
American Journal of Political Science. 2014 Oct;58(4):1064–1082.
9. Krosnick JA. Survey Research. Annual Review of Psychology. 1999
Feb;50(1):537–567. pmid:15012463
10. Mitofsky WJ. Presidential Address: Methods and Standards: A
Challenge for Change. Public Opinion Quarterly. 1989;53(3):446–453.
11. Dillman DA. Presidential Address: Navigating the Rapids of Change:
Some Observations on Survey Methodology in the Early Twenty-First
Century. Public Opinion Quarterly. 2002 Oct;66(3):473–494.
12. Couper MP. Designing effective web surveys. Cambridge, UK:
Cambridge University Press; 2008.
13. Couper MP, Miller PV. Web Survey Methods: Introduction. Public
Opinion Quarterly. 2009 Jan;72(5):831–835.
14. Couper MP. The Future of Modes of Data Collection. Public Opinion
Quarterly. 2011;75(5):889–908.
15. Groves RM. Three Eras of Survey Research. Public Opinion Quarterly.
2011;75(5):861–871.
16. Newport F. Presidential Address: Taking AAPOR’s Mission To Heart.
Public Opinion Quarterly. 2011;75(3):593–604.
17. Benkler Y. The wealth of networks: How social production transforms
markets and freedom. New Haven, CT: Yale University Press; 2006.
18. Howe J. Crowdsourcing: Why the power of the crowd is driving the
future of business. New York: Three Rivers Press; 2009.
19. Noveck BS. Wiki government: How technology can make government
better, democracy stronger, and citizens more powerful. Washington,
D.C.: Brookings Institution Press; 2009.
20. Nielsen MA. Reinventing discovery: The new era of networked
science. Princeton, N.J.: Princeton University Press; 2012.
21. Anderson C. The long tail: Why the future of business is selling less of
more. New York, NY: Hyperion; 2006.
22. Wilkinson DM. Strong Regularities in Online Peer Production. In:
Proceedings of the 9th ACM Conference on Electronic Commerce;
2008. p. 302–309.
23. Merton RK, Kendall PL. The Focused Interview. American Journal of
Sociology. 1946 May;51(6):541–557.
24. Merton RK. The Focussed Interview and Focus Groups: Continuities
and Discontinuities. Public Opinion Quarterly. 1987;51(4):550–566.
25. Presser S, Couper MP, Lessler JT, Martin E, Martin J, Rothgeb JM,
et al. Methods for Testing and Evaluating Survey Questions. Public
Opinion Quarterly. 2004 Apr;68(1):109–130.
26. Balasubramanian SK, Kamakura WA. Measuring Consumer Attitudes
Toward the Marketplace with Tailored Interviews. Journal of Marketing
Research. 1989;26(3):311–326.
27. Singh J, Howell RD, Rhoads GK. Adaptive Designs for Likert-type
Data: An Approach for Implementing Marketing Surveys. Journal of
Marketing Research. 1990;27(3):304–321.
28. Groves RM, Heeringa SG. Responsive Design for Household Surveys:
Tools for Actively Controlling Survey Errors and Costs. Journal
of the Royal Statistical Society: Series A (Statistics in Society).
2006;169(3):439–457.
29. Toubia O, Flores L. Adaptive Idea Screening Using Consumers.
Marketing Science. 2007;26(3):342–360.
30. Smyth JD, Dillman DA, Christian LM, Mcbride M. Open-ended
Questions in Web Surveys: Can Increasing the Size of Answer Boxes
and Providing Extra Verbal Instructions Improve Response Quality?
Public Opinion Quarterly. 2009 Jun;73(2):325–337.
31. Chen K, Chen H, Conway N, Hellerstein JM, Parikh TS. Usher:
Improving Data Quality with Dynamic Forms. In: 2010 IEEE 26th
International Conference on Data Engineering (ICDE); 2010. p. 321–
332.
32. Dzyabura D, Hauser JR. Active Machine Learning for Consideration
Heuristics. Marketing Science. 2011;30(5):801–819.
33. Montgomery JM, Cutler J. Computerized Adaptive Testing for Public
Opinion Surveys. Political Analysis. 2013 Apr;21(2):172–192.
34. Lewry F, Ryan T. Kittenwar: May the cutest kitten win! San Francisco:
Chronicle Books; 2007.
35. Wu M. USG launches new web tool to gauge students’ priorities. The
Daily Princetonian. 2008.
36. Weinstein JR. Photocracy: Employing Pictoral Pair-Wise Comparison
to Study National Identity in China, Japan, and the United States via
the Web. Senior Thesis. Princeton University: Department of East
Asian Studies; 2009.
37. Shah D. Solving Problems Using the Power of Many: Information
Aggregation Websites, A Theoretical Framework and Efficacy Test.
Senior Thesis. Princeton University: Department of Sociology; 2009.
38. Das Sarma A, Das Sarma A, Gollapudi S, Panigrahy R. Ranking
mechanisms in twitter-like forums. In: Proceedings of the third ACM
international conference on Web search and data mining. WSDM’10.
New York, NY, USA: ACM; 2010. p. 21–30.
39. Luon Y, Aperjis C, Huberman BA. Rankr: A Mobile System for
Crowdsourcing Opinions. In: Zhang JY, Wilkiewicz J, Nahapetian
A, editors. Mobile Computing, Applications, and Services. No.
95 in Lecture Notes of the Institute for Computer Sciences, Social
Informatics and Telecommunications Engineering. Springer Berlin
Heidelberg; 2012. p. 20–31.
40. Salesses P, Schechtner K, Hidalgo CA. The Collaborative Image of
The City: Mapping the Inequality of Urban Perception. PLoS ONE.
2013 Jul;8(7):e68400. pmid:23894301
41. Thurstone LL. The Method of Paired Comparisons for Social Values.
Journal of Abnormal and Social Psychology. 1927;21(4):384–400.
42. Hacker S, von Ahn L. Matchin: Eliciting User Preferences with an
Online Game. Proceedings of the 27th international conference on
Human factors in computing systems. 2009;p. 1207–1216.
43. Salganik MJ, Watts DJ. Web-based Experiments for the Study of
Collective Social Dynamics in Cultural Markets. Topics in Cognitive
Science. 2009 Jul;1(3):439–468. pmid:25164996
44. Goel S, Mason W, Watts DJ. Real and perceived attitude agreement
in social networks. Journal of Personality and Social Psychology.
2010;99(4):611–621. pmid:20731500
45. Salganik MJ, Dodds PS, Watts DJ. Experimental Study of Inequality
and Unpredictability in an Artificial Cultural Market. Science. 2006
Feb;311(5762):854–856. pmid:16469928
46. Zhu H, Huberman B, Luon Y. To switch or not to switch: understanding
social influence in online choices. In: Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems. CHI’12. New
York, NY, USA: ACM; 2012. p. 2257–2266.
47. Muchnik L, Aral S, Taylor SJ. Social Influence Bias: A Randomized
Experiment. Science. 2013 Aug;341(6146):647–651. pmid:23929980
48. van de Rijt A, Kang SM, Restivo M, Patil A. Field experiments of
success-breeds-success dynamics. Proceedings of the National
Academy of Sciences. 2014 May;111(19):6934–6939.
49. Mosteller F. Remarks on the Method of Paired Comparisons: I. The
Least Squares Solution Assuming Equal Standard Deviations and
Equal Correlations. Psychometrika. 1951 Mar;16:3–9.
50. Stern H. A Continuum of Paired Comparisons Models. Biometrika.
1990 Jun;77(2):265–273.
51. R Core Team. R: A Language and Environment for Statistical
Computing; 2014. R Foundation for Statistical Computing, Vienna,
Austria. Available from: http://www.R-project.org/.
52. Wickham H. The Split-Apply-Combine Strategy for Data Analysis.
Journal of Statistical Software. 2011;40(1):1–29.
53. Urbanek S. multicore: Parallel Processing of R Code on Machines with
Multiple Cores or CPUs; 2011. R package version 0.1-5.
54. Kane MJ, Emerson JW. bigmemory: Manage Massive Matrices With
Shared Memory and Memory-Mapped Files; 2011. R package version
4.2.11.
55. Trautmann H, Steuer D, Mersmann O, Bornkamp B. truncnorm:
Truncated Normal Distribution; 2011. R package version 1.0-5.
56. Wickham H. testthat: Get Started with Testing. The R Journal.
2011;3(1):5–10.
57. Bates D, Maechler M. Matrix: Sparse and Dense Matrix Classes and
Methods; 2011. R package version 0.999375-50.
58. Bengtsson H. matrixStats: Methods that apply to rows and columns of
a matrix; 2013. R package version 0.8.14.
59. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis.
2nd ed. Boca Raton: Chapman and Hall/CRC; 2003.
60. Girotra K, Terwiesch C, Ulrich KT. Idea Generation and the Quality of
the Best Idea. Management Science. 2010 Apr;56(4):591–605.
61. Turner CF, Martin E. Surveying subjective phenomena. New York:
Russell Sage Foundation; 1984.
62. Fowler FJ. Improving survey questions: design and evaluation.
Thousand Oaks: Sage; 1995.
63. Lindley DV. On a Measure of the Information Provided by an Experiment.
The Annals of Mathematical Statistics. 1956 Dec;27(4):986–1005.
64. Glickman ME, Jensen ST. Adaptive paired comparison design. Journal
of Statistical Planning and Inference. 2005 Jan;127(1–2):279–293.
65. Pfeiffer T, Gao XA, Chen Y, Mao A, Rand DG. Adaptive Polling for
Information Aggregation. In: Twenty-Sixth AAAI Conference on
Artificial Intelligence; 2012.
66. von Ahn L, Dabbish L. Designing Games with a Purpose.
Communications of the ACM. 2008;51(8):58–67.
67. Mao A, Soufiani HA, Chen Y, Parkes DC. Capturing Cognitive Aspects
of Human Judgment; 2013. arXiv:1311.0251. Available from: http://arxiv.org/abs/1311.0251.
68. Conrad FG, Schober MF. Envisioning the survey interview of the
future. Hoboken, N.J.: Wiley-Interscience; 2008.
69. Baker R, Blumberg SJ, Brick JM, Couper MP, Courtright M, Dennis
JM, et al. Research Synthesis: AAPOR Report on Online Panels. Public
Opinion Quarterly. 2010 Dec;74(4):711–781.
70. Brick JM. The Future of Survey Sampling. Public Opinion Quarterly.
2011 Dec;75(5):872–888.
71. Sullivan JL, Piereson J, Marcus GE. An Alternative Conceptualization
of Political Tolerance: Illusory Increases 1950s-1970s. American
Political Science Review. 1979;73(3):781–794.
72. Gal D, Rucker DD. Answering the Unasked Question: Response
Substitution in Consumer Surveys. Journal of Marketing Research.
2011 Feb;48:185–195.
Chapter 9

TOWARDS A STANDARD SAMPLING
METHODOLOGY ON ONLINE SOCIAL
NETWORKS: COLLECTING GLOBAL
TRENDS ON TWITTER

C. A. Piña-García1, Carlos Gershenson1,2,3,4,5, and J. Mario Siqueiros-García1

1 Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Departamento de Ciencias de la Computación, Universidad Nacional Autónoma de México, Ciudad de México, México
2 Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Circuito Maestro Mario de la Cueva S/N, Ciudad Universitaria, Ciudad de México, 04510 México
3 SENSEable City Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, 02139 USA
4 MoBS Lab, Network Science Institute, Northeastern University, 360 Huntington Av 1010-177, Boston, 02115 USA
5 ITMO University, Birzhevaya liniya 4, St. Petersburg, 199034 Russia

Citation (APA): Piña-García, C. A., Gershenson, C., & Siqueiros-García, J. M. (2016). Towards a standard sampling methodology on online social networks: collecting global trends on Twitter. Applied Network Science, 1(1), 1-19. (19 pages)

Copyright: © This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

ABSTRACT
One of the most significant current challenges in large-scale online social
networks is to establish a concise and coherent method for collecting and
summarizing data. Sampling the content of an Online Social Network (OSN)
plays an important role as a knowledge discovery tool.
It is becoming increasingly difficult to ignore the fact that current
sampling methods must cope with the lack of a full sampling frame, i.e., a
constraint imposed by limited data access. In addition, another key aspect
to take into account is the huge amount of data generated by users of social
networking services such as Twitter, which is perhaps the most influential
microblogging service, producing approximately 500 million tweets per day.
In this context, because the size of Twitter is difficult to measure, analyzing
the entire network is infeasible and sampling is unavoidable.
In addition, we strongly believe that there is a clear need to develop a
new methodology for collecting information on social networks (social mining).
In this regard, this paper introduces a set of random strategies that can be
considered a reliable alternative for gathering global trends on Twitter. It is
important to note that this research intends to present some initial ideas on
how suitable random walks are for extracting information or global trends.
The main purpose of this study is to propose a suitable methodology
for carrying out an efficient collecting process via three random strategies:
Brownian, Illusion and Reservoir. These random strategies are applied
through a Metropolis-Hastings Random Walk (MHRW). We show that
interesting insights can be obtained by sampling emerging global trends on
Twitter. The study also offers some important insights, providing descriptive
statistics and a graphical description of the preliminary experiments.

Keywords: Twitter, Sampling method, Random walks, Online social network, Data acquisition

INTRODUCTION
In recent years, there has been an increasing interest in Online Social
Networks (OSNs) exploration. Mining social signals can provide quick
knowledge of a real-world event (Roy and Zeng 2014). More recently,
areas of social network analysis are now expanding to different disciplines,
not only in data mining studies but also in computational social science
(user behavior), social media analytics and complex systems. Thus, the
availability of unprecedented amounts of data about human interactions
from different social networks opens the possibility of using this information
to leverage knowledge about the diversity of social behavior and the activity
of individuals (Lu and Brelsford 2014; Piña-García and Gu 2013b; 2015;
Thapen and Ghanem 2013; Weng et al. 2012). The focus of social data
analysis is essentially the content that is being produced by users. The data
produced in social networks are rich, diverse and abundant, which makes
them a relevant source for data science (Ferrara et al. 2014; Kurka et al.
2015; Weikum et al. 2011).
The main challenge faced by many experiments in data science is the
lack of a standard methodology to collect and analyze data sets. Thus, the
main obstacles that data scientists face are as follows:
•	They do not know what sort of data they need,
•	They do not know how much data they need,
•	They do not know what critical questions they should be asking, and
•	They do not know whether the data is private, i.e., whether it could
be considered illegal to use in some contexts.
Social media platforms have increasingly replaced other means of
communication, such as telephone and emails (Phan and Airoldi 2015).
Thus, the rising interest in digital media and social interactions mediated
by online technologies is boosting the research outputs in an emerging field
that is multidisciplinary by nature: it brings together computer scientists,
sociologists, physicists, and researchers from a wide range of other
disciplines (González-Bailón et al. 2014).
Twitter can be considered the most studied OSN (Kurka et al. 2015). This
social media platform provides an efficient and effective communication
medium for one-on-one interactions and broadcast calls (e.g., for assistance
or dissemination and access to useful information) (Phan and Airoldi 2015).
In this regard, we consider Twitter as a suitable large-scale social network
to be explored (Kwak et al. 2010).
Twitter is the most famous microblogging website in the social media
space, where users post messages that are limited to 140 characters. In
addition, users can follow other accounts they find interesting. Posts are
called “tweets”. Unlike the case with other social networks, the relationship
does not have to be mutual (Golbeck 2013). It should be noted that Twitter
produces approximately 500 million tweets per day, with 271 million regular
users (Serfass and Sherman 2015). Therefore, Twitter has been a valuable
tool to track and identify patterns of mobility and activity, especially using
geolocated tweets. Geolocated tweets typically use the Global Positioning
System (GPS) tracking capability installed on mobile devices when enabled
by the user to give his or her precise location (latitude and longitude).
In this research we adopt a strategy to identify “trending topics”
generated in real-time on Twitter (Weng and Menczer 2015). The main goal
of this research is to extract emergent topics and identify their relevance on
Twitter. This manuscript is an exploratory data analysis based on topical
interest. It is important to note that Twitter has emerged as an important
platform to observe relatively informal communication.
This manuscript provides the basic steps that were followed to collect
systematically a set of trending topics. These topics or global trends were
gathered and filtered according to their geographic distribution and topical
interest.
In addition, we present a statistical and a descriptive analysis of the main
features obtained from our collected dataset. Finally, our central hypothesis
in this work is that in order to advance our understanding of social interaction,
it is necessary to propose a reliable methodology to collect, analyze and
visualize collected data from OSNs.

Contributions
A central hypothesis in this study is that in order to advance our quantitative
understanding of social interaction, it is not possible to get by with incomplete
data; it becomes necessary to obtain representative data. Therefore, the aim
of this study is to propose an algorithm to discover and collect emerging
global trends on Twitter. Specifically, our contributions in this study are as
follows:
• This paper provides a series of random strategies (Brownian,
Illusion and Reservoir) based on random walk models to sample
small but relevant parts of information produced on Twitter.
• This research is intended to determine the extent to which
random walks can be combined by using an alternative version of
a Metropolis-Hastings algorithm.
RELATED WORK
A considerable amount of literature has been published on using graph
sampling techniques on large-scale OSNs. These studies are rapidly growing
in the scientific community, showing that sampling methods are essential
for practical estimation of OSN properties. These properties include, for
example: user age distribution, net activity, net connectivity and node
degree. Studies on social science show the importance of graph sampling
techniques, e.g., (Caci et al. 2012; Fire and Puzis 2012; Lee et al. 2006;
Mislove et al. 2006; Scott 2011).
Online social networks such as Facebook represents one of the biggest
social services in the world. Therefore, it may be seen as a large-scale source
to collect data with the aim to obtain a representative sample or characterize
the whole network structure (Bhattacharyya et al. 2011; Caci et al. 2011;
Ferri et al. 2012; Ugander et al. 2011). Recent evidence suggests that efficient
random-walk-inspired techniques have been successfully used to sample large-
scale social networks, in particular Facebook (Gjoka et al. 2010; 2011a).
However, despite their relative success on Facebook, these specific sampling
strategies have not been tested on different social networking services such
as Twitter.
A number of researchers have pointed out that statistical approaches
such as random walks can be used to improve and speed up the sampling
process. This can be done by considering different randomized algorithms
which are able to cope with large datasets. Recently, the Metropolis-
Hastings Random Walk algorithm has been tested on Facebook and Last.fm
(a music website with 30 million active users), showing significant results
for an unbiased sampling of users (Gjoka et al. 2011a, 2011b; Kurant et al.
2011).
Similarly, some studies based on supervised random walks use the
information from the network structure with the aim to guide a random walk
on the graph, e.g., on the Facebook social graph (Backstrom and Leskovec
2011). In addition, there are other studies that introduce the same random
walk technique analyzed from the Markov Chain Monte Carlo (MCMC)
perspective, i.e., the Metropolis-Hastings random walk (MHRW), which is
mainly used to produce uniform samples (Bar-Yossef and Gurevich 2008).
An alternative Metropolis-Hastings random walk using a spiral proposal
distribution is presented in (Piña-García and Gu 2013b). The authors
examined whether it was possible to alter the behavior of the MHRW using
spirals as a probability distribution instead of a classic Gaussian distribution.
They observed that the spiral inspired approach was able to adapt itself
correctly to a Metropolis-Hastings random walk.
The studies presented thus far provide evidence that there is a
growing interest in the use of rapid sampling models and a clear need for
data extraction tools on Facebook (Bhattacharyya et al. 2011; Caci et al.
2011; Ferri et al. 2012; Ugander et al. 2011). However, Twitter has recently
received special attention from researchers interested in uncovering
global topics, well known as “memes” and “hashtags” (Hawelka
et al. 2014; Kallus 2014; Mitchell et al. 2013; Takhteyev et al. 2012; Thapen
and Ghanem 2013).
Recently, there has been an increasing amount of literature on data
collection via Twitter. Preliminary work on information diffusion was
presented in (Weng et al. 2013b), where authors examined the mechanisms
behind human interactions through an unprecedented amount of data (social
observatory). They also argued that information diffusion affects network
evolution.
An important analysis of the geography of Twitter networks was
presented in (Takhteyev et al. 2012). In this case, the authors showed that
distance matters on Twitter, both at short and longer ranges. In addition, they
argued that the distance considerably constrains ties. The authors highlighted
the importance of Twitter in terms of collection of data due to its popularity
and international reach. They also suggested that these ties at distances of
up to 1000 km are more frequent than what it would be expected if the ties
were formed randomly.
In a large longitudinal study carried out in (Hawelka et al. 2014), the
authors found global patterns of human mobility based on data extracted
from Twitter. A dataset of almost a billion tweets recorded in 2012 was
used to estimate volumes of international travelers. The authors argue that
Twitter is a viable source to understand and quantify global mobility patterns.
Furthermore, a detailed investigation on correlations between real-
time expressions of individuals and a wide range of emotional, geographic,
demographic and health characteristics was conducted in (Mitchell et al.
2013). Results showed how social media may potentially be used to estimate
real-time levels and changes in population-level measures (Haralabopoulos
and Anagnostopoulos 2014; Leskovec et al. 2008). The findings in (Mitchell
et al. 2013), were supported by a large dataset of over 10 million geo-
tagged tweets, gathered from 373 urban areas in the United States during
the calendar year of 2011.
In another major study, a “conversational vibrancy” framework to
capture dynamics of hashtags based on their topicality, interactivity,
diversity, and prominence was introduced in (Lin et al. 2013). The authors
examined the growth and persistence of hashtags during the 2012 U.S.
presidential debates. They point out that the growth and death of a hashtag
are largely determined by environmental context conditions rather than by
the conversational vibrancy of the hashtag itself.
A recent study about community structure in OSNs was presented in
(Weng et al. 2013a). The authors claim that “memes” and behaviors can be
described as a contagion phenomenon. They concluded that the
future popularity of a meme can be predicted by quantifying its early spread
patterns.
A very important study that builds a systematic framework for
investigating human behaviors under extreme events with online social
network data extracted from Twitter was carried out in (Lu and Brelsford
2014). The researchers have shown distinctive changes in patterns of
interactions in online communities that have been affected by a natural
disaster compared to communities that were not affected.
Finally, a controlled study of the automatic analysis of UK political
tweets was provided in (Thapen and Ghanem 2013). In this case, the authors
examined the extent to which the volume and sentiment of tweets can be used
as a proxy for voting intentions, comparing the results against
existing poll data. In addition, the authors propose a data collection method
based on a list of selected Twitter accounts classified by party affiliation.
Approximately 689,637 tweets were retrieved from the publicly available
timelines of Members of Parliament on 10 June 2013; the authors also took
a random sample of 600 users from Twitter.

PROBLEM DEFINITION
As the digital world grows, it generates enormous amounts of data every
second, challenging us to find new methods to efficiently extract and sample
information. Currently, data science is a relatively new field of study that
involves concepts that come from data analysis and data mining. We are
experiencing a digital revolution in which collecting data has become an
everyday task for data scientists. In this regard, scientific production based
on data science has grown sharply in the last few years, and this trend is
likely to continue. At this point, it is possible to observe many proposed
methodologies for collecting and analyzing information. However, none of
these approaches provides a reliable unified framework for data science.
Thus, it becomes necessary to propose a standard sampling methodology
that allows us to
establish a general framework to cope with current challenges.
Although extensive research has been carried out on related work
about data collection, much uncertainty still exists about the existence
of a standard sampling methodology to efficiently collect datasets from
social networking services. Many of the aforementioned related works have
based their experiments on offline datasets, i.e., information previously
collected by third parties, in some cases without revealing how the data
was obtained from the source. In addition, the previous studies lack a
standardized random sampling model, which means that each study
applies different techniques for sampling OSNs.
Recent evidence suggests that a uniform sample can be obtained with
a remarkable performance in terms of low computational cost (Gjoka et al.
2010; 2011a). It is important to note that our approach is mainly based on
this work carried out by Minas Gjoka et al. Further information and material
can be found on http://www.minasgjoka.com/index.html.
The key research question of this study is whether or not it is possible to
use a different approach, apart from the normal (Gaussian) distribution,
as the internal random generator. Thus, this paper seeks to exploit ideas
from randomized algorithms such as: a Brownian walk (based on a normal
distribution), a spiral-inspired walk and a Reservoir sampling algorithm.
In this manuscript, however, we focus on extracting global
trending topics as a case study. It is important to mention that, at the
moment, there is no clear solution or methodology for this kind of data
extraction, i.e., there is no “ground truth” to follow or compare against.
The proposed methodology intends to present some initial ideas on
how suitable random walks are for extracting information, or trends in this
particular case.

RANDOM STRATEGIES
This section will examine three random strategies that are incorporated into
the alternative version of the MHRW. These random strategies are aimed to
be used heuristically as an internal picker for a candidate node (hereinafter
referred to as ϱ). This set of random strategies is composed as follows:
Brownian walk (normal distribution), a spiral-inspired walk (Illusion) and a
Reservoir sampling method. It is important to note that the Brownian case
will be used as the baseline against which the rest of the random strategies
are compared.
The main idea of the Metropolis-Hastings algorithm is to provide a
number of random samples from a given distribution. Thus, our proposed
version of the MHRW is able to sample a candidate node ϱ, which is directly
obtained from q(y|x) = {Brownian, Illusion, Reservoir}.

Brownian Walk
The traditional approach used to sample through the MHRW is based on
the normal distribution. In this regard, we have developed a Brownian walk
that presents a normal distribution. It is important to note that in most cases
the Brownian walk is related to a continuous-time process; in this research,
however, a discretized version of this strategy has been considered.
Technically speaking, in this model the candidate node ϱ is computed using
the Java command Math.random().
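
As a rough illustration, the sketch below shows one way a discretized Brownian step over a list of n candidate positions could look. The paper only names Math.random() as the underlying generator, so the Gaussian increment via nextGaussian(), the step scale sigma, and the wrap-around are our assumptions, not the original implementation.

import java.util.Random;

// Sketch of a discretized Brownian picker over n candidate positions.
// The step scale (sigma) and the wrapping into [0, n) are assumptions.
public class BrownianPicker {
    private final Random rng = new Random();
    private int position = 0;

    int next(int n, double sigma) {
        // Gaussian increment, rounded onto the integer lattice
        position += (int) Math.round(sigma * rng.nextGaussian());
        // wrap into [0, n) so the candidate is always a valid index
        position = ((position % n) + n) % n;
        return position; // index of the candidate node rho
    }
}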

Illusion Spiral
In this research, we have considered a spiral-inspired approach in terms of
an Illusion spiral. This spiral has an interesting geometric shape: a sequence
of points arranged spirally on a plane such that they are equitably and
economically spaced (see Fig. 1). This spiral model is produced by the
following expression:

z ← az + bz/|z|,    (1)

where a = 0.6 + 0.8i and b = 0.65 + 0.7599i.

Figure 1: Pattern visualization of the Illusion spiral.

It is important to note that this
spiral-inspired approach should not be considered as a formal distribution in
itself. It is just a consequence of making use of complex numbers to correctly
generate a geometric shape, or in this case an Illusion spiral. In addition, Eq.
1 includes complex variables to correctly generate the pattern.
Thus, a value of the form z = a + bi is iteratively generated. In
this regard, we are able to obtain a collection of complex numbers whose
real part is taken as the candidate node z = ϱ for our study purposes.
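
A minimal sketch of this generator, with plain doubles standing in for the complex arithmetic of Eq. 1, is given below. The mapping from the real part of z to a valid candidate index is our assumption; the paper only states that the real part is taken as the candidate node.

// Sketch of the Illusion-spiral generator of Eq. 1: z <- a*z + b*z/|z|.
public class IllusionSpiral {
    // constants from the text: a = 0.6 + 0.8i, b = 0.65 + 0.7599i
    static final double AR = 0.6, AI = 0.8, BR = 0.65, BI = 0.7599;
    private double zr = 1.0, zi = 0.0; // non-zero starting point (assumption)

    int next(int n) {
        double mod = Math.hypot(zr, zi);          // |z|
        double ur = zr / mod, ui = zi / mod;      // z/|z|
        double nr = AR * zr - AI * zi + BR * ur - BI * ui; // Re(az + bz/|z|)
        double ni = AR * zi + AI * zr + BR * ui + BI * ur; // Im(az + bz/|z|)
        zr = nr; zi = ni;
        // fold the real part into a valid candidate index (assumption)
        return (int) (Math.abs(zr) * n) % n;
    }
}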

Reservoir Sampling
Reservoir sampling can be seen as an algorithm for selecting a random
sample of size n from a file containing N records, where the value of N is
not known to the algorithm in advance. According to (Vitter 1985),
the first step of any reservoir algorithm is to put the first n records into a
“reservoir”. The rest of the records are processed sequentially. Thus, the
number of items to select (k) is smaller than the size of the source array S(i).
Algorithm 1 provides an overview of the steps carried out by the reservoir
sampling process.

In this study, a non-conditional version of the aforementioned algorithm
is considered, i.e., lines 8–10 are discarded. In this case, the random number
j := random(1, i) is used directly to select a candidate node j = ϱ.
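
Since Algorithm 1 appears as a figure in the original article, the following Java sketch of classic reservoir sampling (Vitter's Algorithm R) may help fix ideas. The conditional replacement marked below corresponds to the "lines 8–10" that this study drops in its non-conditional variant.

import java.util.Random;

// Minimal sketch of classic reservoir sampling (Vitter's Algorithm R).
public class Reservoir {
    static int[] sample(int[] source, int k, Random rng) {
        int[] reservoir = new int[k];
        // step 1: put the first k records into the reservoir
        for (int i = 0; i < k; i++) reservoir[i] = source[i];
        // process the remaining records sequentially
        for (int i = k; i < source.length; i++) {
            int j = rng.nextInt(i + 1);   // j := random(1, i) in the paper's notation
            if (j < k) {                  // conditional replacement ("lines 8-10"),
                reservoir[j] = source[i]; // discarded in the non-conditional variant
            }
        }
        return reservoir;
    }
}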

THE ALTERNATIVE VERSION OF THE METROPOLIS-HASTINGS ALGORITHM
The Metropolis-Hastings algorithm makes use of a proposal density q(y|x),
which might be a simple distribution such as the normal. For the purposes
of this study, the term q(y|x) refers to a set of three mutually exclusive
random strategies: q(y|x) = {Brownian, Illusion, Reservoir}.
In this context we conceive a random strategy as a procedure designed
to generate a sequence of random numbers; this sequence represents global
trends on Twitter. One of the three random strategies is selected to pick
different trends from list A. Subsequently, all the trends that were sampled
are copied to a matrix B; these collected trends are arranged according to
how they were chosen (see Fig. 2).

Figure 2: In summary, all the global trends retrieved from Twitter are poten-
tial nodes. Subsequently, all the trends that were collected from the servers are
drawn through the random walks provided by q(y|x).
The key idea of this alternative version of the MHRW algorithm is to
generate a number of independent samples from a given random generator.
Thus, it is necessary to sample a candidate node ϱ from q(y|x) = {Brownian,
Illusion, Reservoir}. The candidate node is accepted if and only
if this node belongs to the graph G. The steps of this method are outlined in
Algorithm 2.
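
Because Algorithm 2 is shown as a figure in the original article, the sketch below reconstructs its acceptance loop as described in the text: draw a candidate node ϱ from one of the three random strategies and accept it if and only if it belongs to the graph G. The Strategy interface and the stopping rule are our assumptions.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of the acceptance loop of the alternative MHRW (Algorithm 2).
public class AlternativeMHRW {
    interface Strategy { String propose(); } // Brownian, Illusion or Reservoir

    static List<String> walk(Strategy q, Set<String> graphNodes, int target) {
        List<String> samples = new ArrayList<>();
        while (samples.size() < target) {    // e.g., 15 countries x 10 trends
            String rho = q.propose();        // candidate from q(y|x)
            if (graphNodes.contains(rho)) {  // accept iff rho belongs to G
                samples.add(rho);
            }
        }
        return samples;
    }
}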

It should be highlighted that the node v0 is set to the first record
retrieved from Twitter’s servers. Similarly, the stopping criterion is
determined by the number of countries randomly chosen, e.g., it is possible
to select 15 countries with their respective top 10 trending topics (15 ×
10 trends in total). The next section describes how these countries are
obtained.

SAMPLING GLOBAL TRENDS ON TWITTER

Pre-processing
For the estimation of trends concentration, a list of countries with publicly
available trends was requested from Twitter. Countries are identified by
means of a specific WOEID. The term WOEID refers to a service that allows
looking up the unique identifier called the “Where On Earth ID” (see http://
developer.yahoo.com/geo/geoplanet/). Figure 3 illustrates a map of the
geographical locations of these countries. In addition, a full list of retrieved
countries can be found in Table 1.

Figure 3: The map shows the geographical location of the countries around the
globe that had more activity on Twitter according to a set of empirical trials.

Table 1: Table of retrieved countries with more activity on Twitter during our
empirical trials

List of countries
Argentina Australia
Belgium Brazil
Canada Chile
Colombia Dom. Republic
Ecuador France
Germany Greece
Guatemala India
Indonesia Ireland
Italy Japan
Kenya Korea
Malaysia Mexico
Netherlands New Zealand
Nigeria Norway
Pakistan Peru
Philippines Poland
Portugal Russia
Singapore South Africa
Spain Sweden
U. Arab Emirates Turkey
Ukraine United Kingdom
United States Venezuela

Algorithm 2 interacts with Twitter via its public API as the primary way
to retrieve data. Once all the information has been retrieved, a random
sampling is performed across the global trends using Algorithm 2. Collected
samples are stored in an output data file and depicted on a visual interface.
Figure 4 shows how this process works on Twitter.

Figure 4: The diagram shows the content extraction tool or social explorer us-
ing an API to establish a connection, then a random sampling is carried out for
collecting global trends in real-time from Twitter. Finally, it generates an output
information file and depicts the results on the visual interface.
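
As a concrete illustration of this retrieval step, the sketch below queries the public v1.1 endpoint trends/place.json for a given WOEID over plain HTTPS. This is our reconstruction rather than the authors' social-explorer code, and the bearer-token handling is an assumption: real requests to Twitter require proper OAuth credentials.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: fetch the trending topics for one WOEID (e.g., 1 for worldwide).
// OAuth signing is omitted for brevity; bearerToken handling is an assumption.
public class TrendFetcher {
    static String fetchTrends(int woeid, String bearerToken) throws Exception {
        URL url = new URL("https://api.twitter.com/1.1/trends/place.json?id=" + woeid);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", "Bearer " + bearerToken);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) body.append(line);
        }
        return body.toString(); // JSON containing a "trends" list of topic names
    }
}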
A sample is chosen according to the following eligibility criteria (initial
conditions): 1) number of countries and 2) a minimum number of users
following a global trend. In this study, the initial conditions consisted of 15
countries and 10 users. Therefore, a maximum of 150 (15 × 10) trending
topics per independent run were available to be gathered. However, because
a trending topic or a user can be counted multiple times, which makes the
measurement hard to interpret, all duplicate trends and duplicate users were
removed from the sample. After filtering out all duplicates, a data structure
containing a set of unique records can be built.
In summary, the steps to generate the data are as follows (a minimal
sketch of the graph-building step is given after the list):
•	Collect a list of WOEIDs by searching for countries with publicly
available trends, then randomly select a set of W = 15 unique
countries.
•	For each country c ∈ W, acquire a list of the top ten trending
topics (TT) and add each trending topic TT as a node to the graph G.
Then, set the minimum number of users following each trend to
Fr = 10.
•	For each TT, get a list of users linked to the corresponding trending
topic, e.g., Fr(TT), and add each of them as a node to G.
•	Create an edge [TT, Fr(TT)] and add it to G.
•	Save the graph G.
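
The graph-building step can be sketched with a simple adjacency-set structure, as shown below; the representation is our choice for illustration, while the original software exported the resulting graph in GML format.

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the graph G: trending topics and their followers are nodes,
// and each [TT, Fr(TT)] pair becomes an undirected edge.
public class TrendGraph {
    private final Map<String, Set<String>> adj = new HashMap<>();

    void addEdge(String trend, String follower) {
        adj.computeIfAbsent(trend, k -> new HashSet<>()).add(follower);
        adj.computeIfAbsent(follower, k -> new HashSet<>()).add(trend);
    }

    int degree(String node) { // node degree per trending topic, as in the .dat output
        return adj.getOrDefault(node, Collections.emptySet()).size();
    }
}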

RESULTS
In order to assess the performance of the social explorer, a sample of publicly
available trends was collected; this random sample contains tweets posted
from December 17 to December 20, 2013, between 16:30 and 22:30 GMT
(time window). This sample consisted of 3,325 trending topics generated by
225,102 unique users that emerged during the observed time window.
It is important to note that in this case, not only tweets written in English
were extracted. This feature provides a different framework with respect to
previous studies in which only English tweets were collected, e.g., (Weng et
al. 2012, 2013a). One advantage of this multilingual feature is that it avoids
a bias toward information posted in English.
To replicate the sampling process, a series of 10 independent walks was
performed for each one of the three random strategies: q(y|x)=Brownian,
Illusion, Reservoir (30 runs in total). Then, two different output files were
stored for further analysis: a .dat file and a .gml file. The first one contains
information such as: Total number of trending topics, total number of unique
followers, number of iterations, total number of sampled trends, a full list
of the collected trends, number of nodes, number of edges, node degree per
trending topic, memory usage, total number of duplicates and the elapsed
time during the sampling process. On the other hand, the second output
is a GML (Graph Modeling Language) formatted file, which
describes a graph obtained by the social explorer. This file is used to build
and evaluate graphically each one of the samples.
Figure 5 a compares a cumulative analysis of the number of trends. This
plot may be divided into three main criteria: number of trends retrieved from
the Twitter service (collected), number of trends after removing all duplicates
(filtered) and number of sampled trends collected by each random generator
(sampled). Similarly, means with respect to the number of sampled trends
are shown in Fig. 5 b. What is interesting in these data is that the sampled
trends represent the core information for evaluating how the three models
behaved in terms of data collection. It should be highlighted that the number
of collected trends depends exclusively on the Twitter service. Likewise,
the filtering process was carried out as a data cleaning step.

Figure 5: a Plot divided into three main criteria: number of trends retrieved
from Twitter (collected); number of trends after removing all duplicates (fil-
tered) and number of sampled trends collected by each random generator (sam-
pled). b Means corresponding to the average of sampled trends.
Owing to the natural tendency of the social explorer to move toward
the same node many times, which is induced by each one of the random
strategies, a considerable number of duplicate trends is added to the output
sequence. This permits comparing the results in terms of the number of
duplicate trends generated during the observation time window (see Fig. 6
a). Likewise, Fig. 6 b compares the number of unique followers obtained
from each random generator q(y|x).

Figure 6: Plots generated during the observation time window: a the number
of duplicate trends for q(y|x). b the number of unique followers presented with
logarithmic scale for the y-axis.

Table 2 compares descriptive statistics of the average numbers of filtered
trends, sampled trends, duplicate trends, and followers. It is apparent from
this table that the Illusion model differs slightly from the rest of the random
strategies in terms of sampled trends. This difference may be caused by the
spread-out pattern of the shape of this spiral (see Fig. 1).

Table 2: Descriptive statistics during the observation time window

Trends q(y|x) Total Avg Std
Filtered Brownian 504 50.4 8.16
Illusion 535 53.5 7.59
Reservoir 518 51.8 4.07
Sampled Brownian 316 31.60 5.44
Illusion 347 34.70 4.62
Reservoir 254 25.40 2.45
Duplicated Brownian 188 18.80 3.08
Illusion 188 18.80 3.04
Reservoir 264 26.40 2.91
Followers Brownian 71574 7157.4 1696.6
Illusion 80522 8052.2 2654.8
Reservoir 73006 7300.6 1582

In addition, the percentage of accuracy was computed as the ratio obtained
by dividing the cumulated number of sampled trends by the total number of
iterations carried out by the social explorer. For instance, for the Brownian
strategy the cumulated 316 sampled trends against 188 duplicates imply a
sampling ratio of roughly 316/504 ≈ 62.7 %, in line with the 62.61 % average
reported in Table 3. The same procedure was applied to the cumulated
number of duplicate trends. The results obtained from this analysis are
summarized in Fig. 7. This set of plots considers the percentages of sampled
and duplicate trends over all the 30 samples produced by each random
generator. Data from this figure can be compared with the data in Table 3,
which shows that the Brownian and Illusion models performed well during
their 10 independent runs. However, the Reservoir model showed a poor
performance in terms of sampling percentage.

Figure 7: Group of plots of the percentage of accuracy plotted against the number of trials. The measures are computed from the percentage of sampled trends and the percentage of duplicate trends generated by each random generator.

Table 3: Descriptive statistics of the average amount of sampled and duplicate trends. It can be seen from the data that the Illusion spiral outperforms all other random strategies in terms of the number of sampled trends

Trends         q(y|x)      Avg
% Sampled      Brownian    62.61 %
               Illusion    64.94 %
               Reservoir   49.06 %
% Duplicated   Brownian    37.30 %
               Illusion    35.05 %
               Reservoir   50.93 %

Basic statistics of the average relative sampling accuracy and the average percentage of duplicate trends are reported in Table 3.

Memory Consumption
This section examines the estimated memory usage of each proposed model.
The results obtained from the preliminary analysis of memory consumption
are summarized in Table 4, which compares the average memory consumption
in megabytes (MB) and the total memory used across 10 independent runs.
In this regard, there were no significant differences in the amount of memory
used by each random generator.

Table 4: Descriptive statistics of memory consumption. This table compares the average memory consumption in megabytes (MB) and the total memory used across 10 independent runs

Results        q(y|x)      Total   Avg    Std
Memory (MB)    Brownian    66      6.6    4.16
               Illusion    66      6.6    2.71
               Reservoir   62      6.2    4.61

Because the experiments were run using custom software written in Java, the
results were assessed in terms of memory consumption in megabytes (MB).
The basic computer hardware information is as follows: processor: Intel(R)
Core(TM)2 Duo CPU at 3.33 GHz; installed memory (RAM): 4.00 GB; system
type: 64-bit operating system. The application was run on Windows 7
Enterprise edition. Figure 8 presents a cumulative memory usage plot. This
plot is presented as a stacked bar chart giving the sum of all the memory
consumption across 10 independent walks per model. From these data, it can
be seen that there were no significant differences between the sampling
methods used as random strategies.
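A per-run measurement of this kind can be approximated in other environments as well; the sketch below is a Python analogue using the standard tracemalloc module, not the original Java instrumentation, and the dummy walk is purely illustrative.

import tracemalloc

def run_with_memory(walk_fn, runs=10):
    """Execute one independent walk per run and record its peak memory in MB."""
    peaks_mb = []
    for _ in range(runs):
        tracemalloc.start()
        walk_fn()  # one independent walk of the social explorer
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks_mb.append(peak / (1024 * 1024))
    return peaks_mb  # summed per model, these give a stacked bar like Fig. 8

# Dummy walk that allocates a list of trend strings.
print(run_with_memory(lambda: [str(i) for i in range(100000)]))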

Figure 8: Stacked bar chart displaying the sum of the memory consumption
split in 10 independent runs per random generator.

Concentration Levels of Trending Topics


In order to identify the concentration levels of the trending topics obtained
from the .gml file, three samples were used to analyze the distribution of
the clusters visually. It is important to note that these samples were chosen
because they have a greater content of trending topics than the rest of the
samples; each sample corresponds to one random generator. The data were
then plotted as a community structure.
As can be seen from the graphs in Fig. 9 a, b and c, the number of edges
incident to a node defines the level of dominance of each trending topic.
From the chart, it can be seen that by far the greatest trending topics used by
Twitter users relate to the Christmas season, e.g., “Xmas”, “Christmas”
and “Santa”. The most likely cause of this outcome is that the time window
of the experiments was in December. Similarly, the word clouds in Fig. 9 d,
e and f convey the same information about the concentration levels of the
trending topics. In this case, the size of a word is proportional to the relative
degree of a trend. At this stage, it is possible to distinguish more words than
in the graph version depicted in Fig. 9 a, b and c. Essentially, both the graphs
and the word clouds show the same information.
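As an illustration of how such visualizations can be derived from the exported .gml file, the sketch below reads the graph with the third-party networkx library and scales word sizes by relative node degree. The file name is a placeholder and this is not the authors' code.

import networkx as nx

# Load one sample exported by the social explorer (hypothetical file name).
G = nx.read_gml("brownian_sample.gml")

# Degree of each trend node = number of incident edges.
degrees = dict(G.degree())
max_degree = max(degrees.values())

# Word size proportional to the relative degree of each trend.
word_sizes = {node: 10 + 40 * deg / max_degree for node, deg in degrees.items()}

# Print the five dominant trends, i.e., the largest words in the cloud.
for node, size in sorted(word_sizes.items(), key=lambda kv: -kv[1])[:5]:
    print(node, round(size, 1))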

Figure 9: Visualizations of concentration levels of trending topics for the following random strategies: (a) Brownian, (b) Illusion and (c) Reservoir. These graphs represent trending topics produced by Twitter users over our sampling time window. The size of a node indicates the degree of a trend. Similarly, word clouds in (d) Brownian, (e) Illusion and (f) Reservoir show a group of words whose sizes are proportional to the number of edges incident to the trending node, i.e., the degree of the node. Essentially, both the graphs and the word clouds show the same information.

Convergence Monitoring
Part of the aim of this research is to identify convergence during the sampling
process. Therefore, a convergence analysis was prepared following the
procedure proposed by Geweke to evaluate the accuracy of sampling-based
approaches (Geweke 1991; Lee et al. 2006). The Geweke diagnostic is a
standard Z-score that takes two non-overlapping parts of the Markov chain
and compares their means, using a difference-of-means test to determine
whether the two parts of the chain come from the same distribution (the null
hypothesis).
This diagnostic represents a test of whether the sample of draws has
attained an equilibrium state based on the first 10 % of the sample of draws,
versus the last 50 % of the sample of draws. If the Markov chain of draws
has reached an equilibrium state, it would be expected to obtain roughly
equal averages from these two splits of the sample (Lesage 1999). MATLAB
functions that were used to implement these estimations can be found at
http://www.spatial-econometrics.com/gibbs/contents.html.
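The following is a minimal Python sketch of the Geweke Z-score as described above. For simplicity it uses plain sample variances where full implementations, such as the MATLAB toolbox linked above, use spectral density estimates of the variance:

import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke diagnostic: difference-of-means Z-score between
    the first 10 % and the last 50 % of a Markov chain of draws."""
    chain = np.asarray(chain, dtype=float)
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[int((1.0 - last) * n):]
    return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))

# Example: 1100 synthetic draws of node degree, discarding the first 100 as burn-in.
rng = np.random.default_rng(0)
draws = rng.normal(loc=50.0, scale=5.0, size=1100)[100:]
print(geweke_z(draws))  # values mostly inside [-1, 1] suggest equilibrium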
Figure 10 provides trace plots for the property of node degree (number
of users that follow a particular trend). These plots present the Z-score value
against the number of iterations. The Geweke diagnostic thus allows the
convergence analysis to be carried out for the Brownian walk, the Illusion
spiral and the Reservoir sampling. The number of draws was fixed at 1100,
with a burn-in process discarding the first 100. In accordance with Gjoka
et al. (2011a), we declare convergence when most values fall in the [–1, 1]
interval. Additionally, we plot an average line using 30 points on the x-axis.
Finally, as can be seen in Fig. 10, the convergence analysis suggests that the
sample draws have attained an equilibrium state, with the means of the values
converging rapidly in the sequence.

Figure 10: Plots of the resulting Z-scores against the number of iterations for the metric of node degree (number of users that follow a particular trend). Horizontal lines at Z = ±1 are added to the plots to indicate the convergence interval.

LIMITATIONS
One advantage of this approach is its multilingual feature, which avoids
a bias toward information posted in English. However, there are certain
drawbacks associated with the use of different languages, e.g., lack of
knowledge of a language and the misinterpretation of statements. On the
other hand, this research does not take into account that the social explorer
is unable to distinguish between Twitterbots (see Endnote 4) and real users
on Twitter. Therefore, all the estimates include Twitterbots, causing an
overestimation in the results. These data must be interpreted with caution,
since all the information collected in this study is based mainly on the Twitter
response service.

CONCLUSIONS
This paper has explained the central importance of defining a standard
sampling methodology applicable to cases where the social network
information flow is readily available. The main purpose of the current study
was to assess a low computational cost method for sampling emerging
global trends on Twitter.

It is now possible to state that a faster randomized algorithm can effectively
carry out the collecting process via three random strategies, q(y|x) =
{Brownian, Illusion, Reservoir}. The present paper confirms previous
findings on the good performance of the Brownian and Illusion generators
(Piña-García and Gu 2013a). It should be noted that the first systematic
study of the Metropolis-Hastings Random Walk (MHRW), reported by
Gjoka et al. (2010), demonstrated that the MHRW performs well on
Facebook using a normal distribution as the random generator (a Brownian
walk in this study); the results of this research show that the MHRW can be
modified in terms of how it heuristically generates a candidate node, using
different random strategies such as Illusion and Reservoir.
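For reference, the Reservoir strategy named above corresponds to the classic reservoir sampling scheme (Vitter 1985). A minimal Python sketch of Algorithm R, which keeps a uniform random sample of fixed size from a stream of unknown length:

import random

def reservoir_sample(stream, k):
    """Uniform random sample of k items from a stream (Vitter 1985, Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)    # fill the reservoir first
        else:
            j = random.randint(0, i)  # item i+1 survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(1000), 10))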
The empirical findings of this study suggest that sampling global trends
on Twitter has several practical applications related to extracting real-time
information. Despite its exploratory nature, by looking at how impactful
people are about a specific topic and within specific categories, this research
offers some insight into how to collect publicly available trends using a
social explorer, which works as an interface between the faster randomized
algorithm proposed in Algorithm 2 and Twitter.
Overall, our current study indicates that our sampling methodology
may be a promising new approach to social networking service analysis
and a useful exploration tool for social data acquisition. However, a debate
continues about the best strategies to follow in this data science context.
In recent years, the controversy around sampling methodology has
underlined the need for a standard methodology for collecting data on
OSNs, and no agreement has been reached within the scientific community
in terms of a theoretical framework. Thus, this study highlights the importance
of proposing a standard sampling methodology to advance our knowledge
for addressing questions of social mining.

Endnotes
1. A word, phrase or topic that is tagged at a greater rate than other tags is said to be a trending topic.
2. A hashtag is a word or metadata tag prefixed with the hash symbol (#).
3. See (Davis 1993) for a full description of the Illusion spiral.
4. A Twitterbot is a program used to produce automated posts on the Twitter microblogging service, or to automatically follow Twitter users.

ACKNOWLEDGEMENTS
This work has been supported in part by “Programa de Apoyo a Proyectos
de Investigación e Innovación Tecnológica” (grant no. PAPIIT IA301016).
Carlos Gershenson was partially supported by SNI membership 47907. J.
Mario Siqueiros-García was partially supported by SNI membership 54027.
We also acknowledge the support of projects 212802, 221341, 260021 and
222220 of CONACyT. Carlos Piña-García acknowledges UNAM for post-
doctoral fellowship.

AUTHORS’ CONTRIBUTIONS
The content extraction tool was programmed by C. A. Piña-García. All
authors helped to write the literature review and to collect data. C. A.
Piña-García wrote the majority of the paper with assistance from Carlos
Gershenson and Siqueiros-García. All authors read and approved the final
manuscript.

REFERENCES
1. Backstrom, L, Leskovec J (2011) Supervised random walks: predicting
and recommending links in social networks In: Proceedings of the
fourth ACM international conference on web search and data mining,
635–644.. ACM.
2. Bar-Yossef Z, Gurevich M. Random sampling from a search engine’s
index. J ACM (JACM) 2008;55(5):24. doi: 10.1145/1411509.1411514.
3. Bhattacharyya P, Garg A, Wu SF. Analysis of user keyword similarity
in online social networks. Soc Netw Anal Mining. 2011;1(3):143–158.
doi: 10.1007/s13278-010-0006-4.
4. Caci, B, Cardaci M, Tabacchi ME (2011) Facebook as a small world: a
topological hypothesis. Soc Netw Anal Mining: 1–5.
5. Caci B, Cardaci M, Tabacchi ME. Facebook as a small world: a
topological hypothesis. Soc Netw Anal Mining. 2012;2(2):163–167.
doi: 10.1007/s13278-011-0042-8.
6. Davis P. Spirals: From Theodorus to Chaos. Wellesley, MA: AK
Peters; 1993.
7. Ferrara E, De Meo P, Fiumara G, Baumgartner R. Web data extraction,
applications and techniques: a survey. Knowl Based Syst. 2014;70:301–
323. doi: 10.1016/j.knosys.2014.07.007.
8. Ferri F, Grifoni P, Guzzo T. New forms of social and professional
digital relationships: the case of facebook. Soc Netw Anal Mining.
2012;2(2):121–137. doi: 10.1007/s13278-011-0038-4.
9. Fire, M, Puzis R (2012) Organization mining using online social
networks. Netw Spat Econ: 1–34. Springer.
10. Geweke J. Evaluating the accuracy of sampling-based approaches
to the calculation of posterior moments. MN, USA: Federal Reserve
Bank of Minneapolis, Research Department Minneapolis; 1991.
11. Gjoka M, Kurant M, Butts CT, Markopoulou A. Proceedings of IEEE
INFOCOM ’10. San Diego, CA: IEEE; 2010. Walking in Facebook: a
case study of unbiased sampling of OSNs; pp. 1–9.
12. Gjoka M, Kurant M, Butts CT, Markopoulou A. Practical
recommendations on crawling online social networks. Selected
Areas Commun IEEE J. 2011;29(9):1872–1892. doi: 10.1109/
JSAC.2011.111011.
13. Gjoka M, Butts CT, Kurant M, Markopoulou A. Multigraph
sampling of online social networks. Selected Areas Commun IEEE J.
2011;29(9):1893–1905. doi: 10.1109/JSAC.2011.111012.
14. Golbeck, J (2013) Analyzing the Social Web. Morgan Kaufmann.
15. González-Bailón S, Wang N, Rivero A, Borge-Holthoefer J, Moreno
Y. Assessing the bias in samples of large online networks. Soc Netw.
2014;38:16–27. doi: 10.1016/j.socnet.2014.01.004.
16. Haralabopoulos G, Anagnostopoulos I. Real time enhanced random
sampling of online social networks. J Netw Comput Appl. 2014;41:126–
134. doi: 10.1016/j.jnca.2013.10.016.
17. Hawelka B, Sitko I, Beinat E, Sobolevsky S, Kazakopoulos P, Ratti C. Geo-
located twitter as the proxy for global mobility patterns. Cartogr Geogr
Inf Sci. 2014;41(3):260–271. doi: 10.1080/15230406.2014.890072.
18. Kallus, N (2014) Predicting crowd behavior with big public data In:
Proceedings of the companion publication of the 23rd international
conference on World wide web companion, 625–630.
19. Kurant M, Gjoka M, Butts CT, Markopoulou A. Proceedings of the
ACM SIGMETRICS joint international conference on Measurement
and modeling of computer systems. San Jose, CA: ACM; 2011. Walking
on a graph with a magnifying glass: stratified sampling via weighted
random walks; pp. 281–292.
20. Kurka, DB, Godoy A, Von Zuben FJ (2015) Online social network
analysis: A survey of research applications in computer science. arXiv
preprint arXiv:1504.05655.
21. Kwak, H, Lee C, Park H, Moon S (2010) What is twitter, a social
network or a news media? In: Proceedings of the 19th International
Conference on World Wide Web, 591–600.. ACM.
22. Lee SH, Kim PJ, Jeong H. Statistical properties of sampled networks.
Phys Rev E. 2006;73(1):016102. doi: 10.1103/PhysRevE.73.016102.
23. Lesage JP. Applied econometrics using matlab. University of Toronto:
Manuscript, Dept. of Economics; 1999.
24. Leskovec, J, Lang KJ, Dasgupta A, Mahoney MW (2008) Statistical
properties of community structure in large social and information
networks In: Proceedings of the 17th International Conference on
World Wide Web, 695–704.. ACM.
25. Lin, YR, Margolin D, Keegan B, Baronchelli A, Lazer D (2013) #bigbirds never die: Understanding social dynamics of emergent hashtags. arXiv preprint arXiv:1303.7144.
26. Lu, X, Brelsford C (2014) Network structure and community evolution
on twitter: Human behavior change in response to the 2011 japanese
earthquake and tsunami. Sci Rep 4. Nature Publishing Group.
27. Mislove, A, Gummadi KP, Druschel P (2006) Exploiting social
networks for internet search In: 5th Workshop on Hot Topics in
Networks (HotNets06), 79.. Citeseer.
28. Mitchell L, Frank MR, Harris KD, Dodds PS, Danforth CM. The
geography of happiness: Connecting twitter sentiment and expression,
demographics, and objective characteristics of place. PLoS ONE.
2013;8(5):64417. doi: 10.1371/journal.pone.0064417.
29. Phan TQ, Airoldi EM. A natural experiment of social network formation
and dynamics. Proc Natl Acad Sci. 2015;112(21):6595–6600. doi:
10.1073/pnas.1404770112.
30. Piña-García, C, Gu D (2013a) Collecting random samples from
facebook: an efficient heuristic for sampling large and undirected
graphs via a metropolis-hastings random walk In: Systems, Man, and
Cybernetics (SMC), 2013 IEEE International Conference On, 2244–
2249.. IEEE.
31. Piña-García C, Gu D. Spiraling facebook: an alternative metropolis–
hastings random walk using a spiral proposal distribution. Soc Netw
Anal Mining. 2013;3(4):1403–1415. doi: 10.1007/s13278-013-0126-8.
32. Piña-García, C, Gu D (2015) Towards a standard sampling methodology
on online social networks: Collecting global trends on twitter. arXiv
preprint arXiv:1507.01489.
33. Roy, SD, Zeng W (2014) Social Multimedia Signals. Springer.
34. Scott J. Social network analysis: developments, advances, and
prospects. Soc Netw Anal Mining. 2011;1(1):21–26. doi: 10.1007/
s13278-010-0012-6.
35. Serfass DG, Sherman RA. Situations in 140 characters: Assessing real-
world situations on twitter. PLoS ONE. 2015;10(11):0143051. doi:
10.1371/journal.pone.0143051.
36. Takhteyev Y, Gruzd A, Wellman B. Geography of twitter networks.
Soc Netw. 2012;34(1):73–81. doi: 10.1016/j.socnet.2011.05.006.
37. Thapen, NA, Ghanem MM (2013) Towards passive political opinion
polling using twitter In: SMA@ BCS-SGAI, 19–34.
38. Ugander, J, Karrer B, Backstrom L, Marlow C (2011) The anatomy of
the facebook social graph. Arxiv preprint arXiv:1111.4503.
39. Vitter JS. Random sampling with a reservoir. ACM Trans Math Softw
(TOMS) 1985;11(1):37–57. doi: 10.1145/3147.3165.
40. Weikum, G, Ntarmos N, Spaniol M, Triantafillou P, Benczúr AA,
Kirkpatrick S, Rigaux P, Williamson M (2011) Longitudinal analytics
on web archive data: it’s about time! In: CIDR, 199–202.
41. Weng, L, Flammini A, Vespignani A, Menczer F (2012) Competition
among memes in a world with limited attention. Sci Rep 2. Nature
Publishing Group.
42. Weng, L, Menczer F, Ahn YY (2013a) Virality prediction and community
structure in social networks. Sci Rep 3(2522). doi: 10.1038/srep02522.
43. Weng L, Menczer F. Topicality and impact in social media: Diverse
messages, focused messengers. PLoS ONE. 2015;10(2):0118410.
44. Weng, L, Ratkiewicz J, Perra N, Gonçalves B, Castillo C, Bonchi F,
Schifanella R, Menczer F, Flammini A (2013b) The role of information
diffusion in the evolution of social networks. arXiv preprint
arXiv:1302.6276.
Chapter 10

MOBILE DATA COLLECTION: SMART, BUT NOT (YET) SMART ENOUGH

Alexander Seifert1, Matthias Hofer1,2, and Mathias Allemand1,3


1 University Research Priority Program “Dynamics of Healthy Aging”, University of Zurich, Zurich, Switzerland
2 Department of Communication and Media Research, University of Zurich, Zurich, Switzerland
3 Department of Psychology, University of Zurich, Zurich, Switzerland

BACKGROUND
Mobile data collection with smartphones—which belongs to the
methodological family of ambulatory assessment, ecological momentary
assessment, and experience sampling—is a method for assessing and

Citation: (APA): Seifert, A., Hofer, M., & Allemand, M. (2018). Mobile data collec-
tion: smart, but not (yet) smart enough. Frontiers in neuroscience, 12, 971. (4 pages)
Copyright: © 2018 Seifert, Hofer and Allemand. This is an open-access article distrib-
uted under the terms of the Creative Commons Attribution License (CC BY): http://
creativecommons.org/licenses/by/4.0/

tracking people’s ongoing thoughts, feelings, behaviors, or physiological processes in daily life using a smartphone (Mehl and Conner, 2012; Miller,
2012; Trull and Ebner-Priemer, 2013; Harari et al., 2016). The primary goal
of this method is to collect in-the-moment or close-to-the-moment active
data (i.e., subjective self-reports) and/or passive data (e.g., data collected
from smartphone sensors) directly from people in their daily lives. The
collection and assessment of such data is possible because smartphones
are widely available and come with the computational power and sensors
needed to obtain information about their owners’ daily lives. Researchers
in the fields of social science (e.g., Raento et al., 2009), psychology (e.g.,
Miller, 2012; Harari et al., 2016), and neuroscience (e.g., Schlee et al., 2016;
Ladouce et al., 2017) use smartphones to collect data about personality
processes and dynamics (Allemand and Mehl, 2017; Beierle et al., 2018a;
Stieger et al., 2018; Zimmermann et al., 2018), daily cognitive behaviors
(Aschwanden et al., 2018), social support behaviors (Scholz et al., 2016),
momentary thoughts (Demiray et al., 2017), couple interactions (Horn et al.,
2018), physical activity (Gruenenfelder-Steiger et al., 2017), and moods and
emotions (Erbas et al., 2018).
Using smartphones for data collection provides a snapshot of individuals’
everyday perceptions, experiences, and interactions with their environments.
The use of mobile devices for the assessment of individuals’ daily lives is
not a new research method (e.g., Fahrenberg et al., 1996). However, because
smartphones have now become so widespread throughout the population,
are low in cost, and are equipped with sensor technology and ready for data
collection through apps (Miller, 2012; Cartwright, 2016; Harari et al., 2016;
Beierle et al., 2018a), we are now living in an interesting time for smart
mobile data collection. Despite much progress, based on our experiences
and discussions with experts in the field, we see the potential for further
development of this method.

SMART MOBILE DATA COLLECTION


Mobile data collection with smartphones is growing rapidly in popularity
due to its many advantages. One such advantage is that the findings are
ecologically valid because they are collected during people’s day-to-day
lives and capture behaviors and experiences in real environments outside of
research laboratories (Wrzus and Mehl, 2015). Real-time reports (i.e., active
data) and sensor data (i.e., passive data) are measured in the moment and
are therefore less prone to memory bias than are retrospective assessments
(Redelmeier and Kahneman, 1996). By capturing real-time data about when and where an action takes place, the method provides important information
about the dynamics of real-life patterns (Hektner et al., 2007). A smartphone
allows researchers to capture such data by installing random, continuous,
or event-based alarms to ask participants for their responses to questions or
events during the day. Intensive repeated measurements of one participant
capture within-person information, which represents the behaviors and
experiences of a single individual. In contrast, between-person information
demonstrates variability between individuals. Collecting within-person
information allows for the study of the mechanisms and processes that
underlie behavior, and this can be contrary to between-person information
(Hamaker, 2012). For example, a study by Stawski et al. (2013) showed that
processing speed is important for understanding between-person differences
in working memory, whereas attention switching is of greater importance
to within-person variations. Therefore, it can be argued that the proper
study of the dynamic nature of psychological processes requires repeated
observations within individuals (Conner et al., 2009). Smartphones are ideal
tools for collecting such data.
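As an illustration of the signal-contingent (random alarm) variant mentioned above, the sketch below draws one day of prompt times; the waking window, number of prompts, and minimum gap are arbitrary assumptions for the example, not recommendations from the literature cited here:

import random
from datetime import datetime, timedelta

def daily_random_alarms(n_prompts=5, start_hour=9, end_hour=21, min_gap=60):
    """Draw n_prompts random alarm times in the waking window, at least
    min_gap minutes apart (a common experience-sampling constraint)."""
    day_start = datetime.now().replace(hour=start_hour, minute=0, second=0, microsecond=0)
    window = (end_hour - start_hour) * 60  # minutes available
    while True:  # rejection sampling until the spacing constraint holds
        minutes = sorted(random.sample(range(window), n_prompts))
        if all(b - a >= min_gap for a, b in zip(minutes, minutes[1:])):
            return [day_start + timedelta(minutes=m) for m in minutes]

for alarm in daily_random_alarms():
    print(alarm.strftime("%H:%M"))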
Real-life data measurements are also rich in contextual information, as
mobile data collection allows for the combination of self-reports or observer-
reports (i.e., active data) and objective assessments (i.e., passive data) of
activities, movements, social interactions, bodily functions, and biological
markers, using the sensors that are built into smartphones (Ebner-Priemer et
al., 2013). For example, it is possible to collect self-reports (e.g., individuals’
feelings of social inclusion) and simultaneously to record acoustic sound
clips of conversation to collect the objective patterns of participants’ actual
proximity to and interaction with others (e.g., Mehl et al., 2001).
Finally, as measurement devices, smartphones are both powerful and
widespread in the population. This enables data analysis in real time and the
opportunity to run machine learning approaches within the devices, allowing
for large, individualized, dynamic, and intensive real-life studies (Raento et
al., 2009; Bleidorn and Hopwood, 2018). Because most participants already
have their own smartphones, an app is the only thing they need to install to
participate in a study (Miller, 2012). This gives researchers the opportunity
to conduct studies with large samples (Dufau et al., 2011).

SMARTER MOBILE DATA COLLECTION IN THE FUTURE
In our research, we identified some of the challenges accompanying mobile
data collection with smartphones. In addition to discerning six challenging
areas, we offer some suggestions for dealing with these challenges in the
future. The first challenge relates to collecting data in real-life environments.
Collecting smart data in daily life may result in the validation of existing
theories, some of which may relate to behaviors and phenomena outside the
realm of day-to-day life. However, this requires that researchers develop
theories that reflect the multiple factors and dynamics of the real-life context
that may influence the individual. Additionally, real-life data should not be
collected simply because it is possible to do so, with conclusions about the
theoretical significance of the data being drawn afterwards. Instead, we
should develop and discuss the potential of real-life theories that consider
both the within-person and between-person effects and the real-life context.
The second challenging area relates to real-time measurements. In data
collection, real-time also means right on time; in other words, researchers
have to carefully determine whether they are collecting data about the
most relevant variables at the most appropriate moments and at ideal time
intervals. To do so, they must first know when to collect data and when
behaviors, thoughts, or changes are likely to occur. This question is crucial in
mobile data collection, because conclusions about fluctuations, variability,
and dynamics need to stem from a sound theoretical rationale or from the
behavior patterns of the target participant (e.g., Wright and Hopwood, 2016).
For instance, smartphone sensor technology and machine learning can help
researchers by detecting the time points of events within a participant, by
learning when events normally occur, or by learning the dependency of
other subjective or objective variables upon events (e.g., Albert et al., 2012).
The third challenging area concerns within-person data. Typical
smartphone studies collect data with great fidelity and generate large
quantities of observations, placing the approach clearly within the domain
of “big data” and requiring its associated advanced analytic techniques
(Yarkoni, 2012; Fan et al., 2014). Working with big data requires highly
technical expertise that researchers outside the field of computational science
do not normally have. Resources must be organized, and after collecting the
data, skills in advanced statistical analyses, including longitudinal structural
equation modeling (Little, 2013), dynamic structural equation modeling
(Asparouhov et al., 2017), multilevel modeling (Bolger and Laurenceau,
2013), and machine learning (e.g., Bleidorn and Hopwood, 2018), are
required. As a result, an interdisciplinary research approach involving
researchers interested in collecting data with smartphones and experts
familiar with those forms of data collection, management, and analysis
is crucial. Such endeavors should be supported by funding organizations
and academic career programs, enabling the full potential of mobile data
collection with smartphones to be achieved.
As a fourth challenging area, we identify the contextual information
that can be collected with smartphone sensor data (i.e., passive data), as
researchers have to consider the different forms, intervals, and amounts
of sensor data (e.g., GPS data, app use, and accelerometer data). When
collecting passive data continuously over multiple days, researchers need
to consider more than just the data itself; they must also be able to interpret
what the measurements indicate and convert the data into psychologically
meaningful variables, such as sociability or mobility patterns (e.g., Mehl
et al., 2006; Harari et al., 2016). Although this task is fundamental to the
research, it often requires new skills of researchers and new approaches
within the technology—approaches that ideally automatically aggregate
passive smartphone-sensor-based data. For example, when collecting sound
files containing conversation, it would be very helpful to automatically
detect the spoken words of a target person (e.g., Mehl et al., 2001), detect
contextual information (e.g., Lu et al., 2012), or interpret GPS data in
terms of mobility patterns (e.g., Ryder et al., 2009). For such requirements,
preliminary solutions do exist (e.g., Barry et al., 2006; White et al., 2011),
but much more development and validation work is needed before we can
achieve automatic, preprocessed, and validated smartphone-sensor data that
can be combined with other types of data collection.
The fifth challenging area relates to the smartphone device itself. Mobile
data collection with smartphones requires more technical preparation and
greater technical confidence and skills, on the side of both the researcher
and participant, than is required in classic paper-and-pencil studies. Daily
technical hassles such as malfunctioning software and hardware, low
smartphone batteries, and operation systems crashing during ongoing
studies cost time and resources. Therefore, we highly recommend including
an explicit time buffer and anticipating a higher than usual drop-out rate
in smartphone studies to compensate for potential technical problems and
challenges (for more information on technical issues, please see Mehl and
Conner, 2012; Miller, 2012; Harari et al., 2016). Although the technical side
of mobile data collection with smartphones is likely to become more reliable
over time, more validation studies are required in this area and more ready-
made valid apps are needed. When using smartphones for data collection
within specific population groups, it is also important to consider the unique
needs of the target group. For example, when working with older adults, it
can be helpful to reflect participants’ potential lack of smartphone skills by
adapting briefings on smartphone/app use (Seifert et al., 2017).
The final, though certainly not least important, challenging area is that
of data security and ethical issues. Collecting mobile data has revived past
concerns about data protection and the ethical use of data. Using mobile
devices for data collection, including tracking behavior and lifestyle
patterns, introduces a unique dimension to individual participant protection.
When collecting intensive profiles of individuals, which is the main research
method within mobile data collection with smartphones, anonymization is
nearly impossible. Therefore, traceable real-life data requires an intensive
consideration of ethical and legal approval, the safeguarding of participant
privacy, and the establishment of data security and data privacy (Harari et
al., 2016; Marelli and Testa, 2018). As an example, Beierle et al. (2018b)
conceived a privacy model for mobile data collection apps. Zook et al.
(2017) present ten simple rules for responsible big data research, concluding
that ethical and data protection issues should not prevent research but that it
is vital to ensure “that the work is sound, accurate, and maximizes the good
while minimizing harm” (Zook et al., 2017, p. 8). When using participants’
own smartphones, it is also important that researchers acquire participants’
consent to share self-recorded data with researchers (Gustarini et al., 2016).
In a quantitative population survey among persons over 50 years of age,
Seifert et al. (2018) found that more than half of this demographic
group is willing to share self-recorded data with researchers, regardless of
participants’ age, gender, education, technology affinity, or perceived health.
The sharing and use of participants’ own self-recorded data may require
new models of participant involvement, with the goal of creating a trusted
relationship between the data providers and researchers working with the
data (Beierle et al., 2018b; Seifert et al., 2018).

CONCLUSIONS
Mobile data collection with smartphones offers unique and innovative
opportunities for studying human beings and processes in real life and real
time. This approach offers researchers the opportunity to collect real-time
reports of participants in their natural environment and within their individual
dynamics and life contexts with the help of a regular smartphone. However,
the approach also brings many challenges that provide interesting avenues
for future developments. To date, mobile data collection with smartphones
is already very smart, but we see the potential for even smarter mobile data
collection in the future.

AUTHOR CONTRIBUTIONS
All authors worked on this paper from conception to final approval and
share the same opinion.

Conflict of Interest Statement


The authors declare that the research was conducted in the absence of any
commercial or financial relationships that could be construed as a potential
conflict of interest. The handling editor declared a past co-authorship with
the authors.

ACKNOWLEDGMENTS
MH thanks the Swiss National Science Foundation for funding (No.
PY00PI_17485/1). MA thanks the UZH Digital Society Initiative (DSI)
from the University of Zurich and the Swiss National Science Foundation
(No. 162724) for funding.

REFERENCES
1. Albert M. V., Kording K., Herrmann M., Jayaraman A. (2012). Fall
classification by machine learning using mobile phones. PLOS ONE
7:e36556. 10.1371/journal.pone.0036556
2. Allemand M., Mehl M. R. (2017). Personality assessment in daily life:
a roadmap for future personality development research, in Personality
Development Across the Lifespan, ed. J. Specht (London: Elsevier
Academic Press; ), 437–454.
3. Aschwanden D., Luchetti M., Allemand M. (2018). Are open and
neurotic behaviors related to cognitive behaviors in daily life of older
adults? J. Pers. 10.1111/jopy.12409. .
4. Asparouhov T., Hamaker E. L., Muthén B. (2017). Dynamic
latent class analysis. Struct. Equ. Modeling 24, 257–269.
10.1080/10705511.2016.1253479
5. Barry S. J., Dane A. D., Morice A. H., Walmsley A. D. (2006). The
automatic recognition and counting of cough. Cough 2:8. 10.1186/1745-
9974-2-8
6. Beierle F., Tran V. T., Allemand M., Neff P., Schlee W., Probst T.,
et al. (2018a). TYDR: Track your daily routine. Android app for
tracking smartphone sensor and usage data, in Proceedings of the
5th International Conference on Mobile Software Engineering and
Systems - MOBILESoft ‘18; 2018 May 27–28 (Gothenburg: ACM
Press; ), 72–75.
7. Beierle F., Tran V. T., Allemand M., Neff P., Schlee W., Probst T., et
al. (2018b). Context data categories and privacy model for mobile
data collection apps. Procedia Comput. Sci. 134, 18–25. 10.1016/j.
procs.2018.07.139
8. Bleidorn W., Hopwood C. J. (2018). Using machine learning to
advance personality assessment and theory. Pers. Soc. Psychol. Rev.
1:1088868318772990 10.1177/1088868318772990
9. Bolger N., Laurenceau J. P. (2013). Intensive Longitudinal Methods:
An Introduction to Diary and Experience Sampling Research. New
York, NY: Guilford Press.
10. Cartwright J. (2016). Technology: Smartphone science. Nature 531,
669–671. 10.1038/nj7596-669a
11. Conner T. S., Tennen H., Fleeson W., Barrett L. F. (2009). Experience
sampling methods: a modern idiographic approach to personality
research. Soc. Personal. Psychol. Compass. 3, 292–313. 10.1111/j.1751-9004.2009.00170.x
12. Demiray B., Mischler M., Martin M. (2017). Reminiscence in everyday
conversations: a naturalistic observation study of older adults. J.
Gerontol. B Psychol. Sci. Soc. Sci. 10.1093/geronb/gbx141. .
13. Dufau S., Duñabeitia J. A., Moret-Tatay C., McGonigal A., Peeters D.,
Alario F. X., et al.. (2011). Smart phone, smart science: how the use
of smartphones can revolutionize research in cognitive science. PLoS
ONE 6:e24974. 10.1371/journal.pone.0024974
14. Ebner-Priemer U. W., Koudela S., Mutz G., Kanning M. (2013).
Interactive multimodal ambulatory monitoring to investigate the
association between physical activity and affect. Front. Psychol. 3:596.
10.3389/fpsyg.2012.00596
15. Erbas Y., Ceulemans E., Kalokerinos E. K., Houben M., Koval P., Pe
M. L., et al.. (2018). Why I don’t always know what I’m feeling: the
role of stress in within-person fluctuations in emotion differentiation.
J. Pers. Soc. Psychol. 115, 179–191. 10.1037/pspa0000126
16. Fahrenberg J., Myrtek M. eds. (1996). Ambulatory Assessment:
Computer-Assisted Psychological and Psychophysiological Methods
in Monitoring and Field Studies. Seattle, WA: Hogrefe and Huber.
17. Fan J., Han F., Liu H. (2014). Challenges of big data analysis. Natl. Sci.
Rev. 1, 293–314. 10.1093/nsr/nwt032
18. Gruenenfelder-Steiger A. E., Katana M., Martin A. A., Aschwanden D.,
Koska J. L., Kündig Y., et al. (2017). Physical activity and depressive
mood in the daily life of older adults. GeroPsych 30, 119–129.
10.1024/1662-9647/a000172
19. Gustarini M., Wac K., Dey A. K. (2016). Anonymous smartphone data
collection: Factors influencing the users’ acceptance in mobile crowd
sensing. Pers. Ubiquitous Comput. 20, 65–82. 10.1007/s00779-015-
0898-0
20. Hamaker E. L. (2012). Why researchers should think ‘within-person’:
A paradigmatic rationale, in Handbook of Research Methods for
Studying Daily Life, eds. M. R. Mehl and T. S. Conner (New York, NY:
Guilford; ), 43–61.
21. Harari G. M., Lane N. D., Wang R., Crosier B. S., Campbell
A. T., Gosling S. D. (2016). Using smartphones to collect
behavioral data in psychological science: opportunities, practical
considerations, and challenges. Perspect. Psychol. Sci. 11, 838–854. 10.1177/1745691616650285
22. Hektner J. M., Schmidt J. A., Csikszentmihalyi M. (2007). Experience
Sampling Method: Measuring the Quality of Everyday Life. Thousand
Oaks, CA: Sage Publications.
23. Horn A. B., Samson A. C., Debrot A., Perrez M. (2018). Positive
humor in couples as interpersonal emotion regulation: a dyadic study
in everyday life on the mediating role of psychological intimacy. J.
Soc. Pers. Relat. 1–21. 10.1177/0265407518788197
24. Ladouce S., Donaldson D. I., Dudchenko P. A., Ietswaart M. (2017).
Understanding minds in real-world environments: toward a mobile
cognition approach. Front. Hum. Neurosci. 10:694. 10.3389/
fnhum.2016.00694
25. Little T. D. (2013). Longitudinal structural equation modeling. New
York, NY: The Guilford Press.
26. Lu H., Frauendorfer D., Rabbi M., Mast M. S., Chittaranjan G.
T., Campbell A. T., et al. (2012). StressSense: detecting stress
in unconstrained acoustic environments using smartphones, in:
Proceedings of the 2012 ACM Conference on Ubiquitous Computing
– UbiComp ‘12; 2012 Sept 5–8 (Pittsburgh, PA: ACM Press; ), 351.
27. Marelli L., Testa G. (2018). Scrutinizing the EU general data protection
regulation. Science 360, 496–498. 10.1126/science.aar5419
28. Mehl M. R., Conner T. S. (eds) (2012). Handbook of Research Methods
for Studying Daily Life. New York, NY: Guilford Publications, Inc.
29. Mehl M. R., Gosling S. D., Pennebaker J. W. (2006). Personality in its
natural habitat: manifestations and implicit folk theories of personality
in daily life. J. Pers. Soc. Psychol. 90, 862–877. 10.1037/0022-
3514.90.5.862
30. Mehl M. R., Pennebaker J. W., Crow D. M., Dabbs J., Price J. H.
(2001). The Electronically Activated Recorder (EAR): a device for
sampling naturalistic daily activities and conversations. Behav. Res.
Methods Instrum. Comput. 33, 517–523. 10.3758/BF03195410
31. Miller G. (2012). The smartphone psychology manifesto. Perspect.
Psychol. Sci. 7, 221–237. 10.1177/1745691612441215
32. Raento M., Oulasvirta A., Eagle N. (2009). Smartphones: an
emerging tool for social scientists. Sociol. Methods Res. 37, 426–454.
10.1177/0049124108330005
33. Redelmeier D. A., Kahneman D. (1996). Patients’ memories of
painful medical treatments: real-time and retrospective evaluations
of two minimally invasive procedures. Pain 66, 3–8. 10.1016/0304-
3959(96)02994-6
34. Ryder J., Longstaff B., Reddy S., Estrin D. (2009). Ambulation: a tool
for monitoring mobility patterns over time using mobile phones, in
Proceedings of the 2009 International Conference on Computational
Science and Engineering; 2009 Aug 29–31 (Vancouver, BC: IEEE; ),
927–931. 10.1109/CSE.2009.312
35. Schlee W., Pryss R. C., Probst T., Schobel J., Bachmeier A., Reichert
M., et al.. (2016). Measuring the moment-to-moment variability of
tinnitus: the trackyourtinnitus smart phone app. Front. Aging Neurosci.
8:294. 10.3389/fnagi.2016.00294
36. Scholz U., Stadler G., Ochsner S., Rackow P., Hornung R., Knoll N.
(2016). Examining the relationship between daily changes in support
and smoking around a self-set quit date. Health Psychol. 35, 514–517.
10.1037/hea0000286
37. Seifert A., Christen M., Martin M. (2018). Willingness of older adults
to share mobile health data with researchers. GeroPsych 31, 41–49.
10.1024/1662-9647/a000181
38. Seifert A., Schlomann A., Rietz C., Schelling H. R. (2017). The use of
mobile devices for physical activity tracking in older adults’ everyday
life. Digital Health 3, 1–12. 10.1177/2055207617740088
39. Stawski R. S., Sliwinski M. J., Hofer S. M. (2013). Between-person
and within-person associations among processing speed, attention
switching, and working memory in younger and older adults. Exp.
Aging. Res. 39, 194–214. 10.1080/0361073X.2013.761556
40. Stieger M., Nißen M., Rüegger D., Kowatsch T., Flückiger C.,
Allemand M. (2018). PEACH, a smartphone- and conversational
agent-based coaching intervention for intentional personality change:
study protocol of a randomized, wait-list controlled trial. BMC Psychol.
6:43. 10.1186/s40359-018-0257-9
41. Trull T. J., Ebner-Priemer U. (2013). Ambulatory assessment. Annu. Rev.
Clin. Psychol. 9, 151–176. 10.1146/annurev-clinpsy-050212-185510
42. White J., Thompson C., Turner H., Dougherty B., Schmidt D. C. (2011).
WreckWatch: Automatic traffic accident detection and notification
with smartphones. Mobile Netw. Appl. 16, 285–303. 10.1007/s11036-
011-0304-8
43. Wright A. G., Hopwood C. J. (2016). Advancing the assessment
of dynamic psychological processes. Assessment 23, 399–403.
10.1177/1073191116654760
44. Wrzus C., Mehl M. R. (2015). Lab and/or field? Measuring personality
processes and their social consequences. Eur. J. Pers. 29, 250–271.
10.1002/per.1986
45. Yarkoni T. (2012). Psychoinformatics: new horizons at the interface of
the psychological and computing sciences. Curr. Dir. Psychol. Sci. 21,
391–397. 10.1177/0963721412457362
46. Zimmermann J., Woods W. C., Ritter S., Happel M., Masuhr O., Jaeger
U., et al. (2018). Integrating structure and dynamics in personality
assessment: first steps toward the development and validation of a
personality dynamics Diary. PsyArXiv. 10.31234/osf.io/5zcth
47. Zook M., Barocas S., Boyd D., Crawford K., Keller E., Gangadharan
S. P., et al.. (2017). Ten simple rules for responsible big data research.
PLoS Comput. Biol. 13:e1005399. 10.1371/journal.pcbi.1005399
Chapter 11

COMPARING A MOBILE PHONE AUTOMATED SYSTEM WITH A PAPER AND EMAIL DATA COLLECTION SYSTEM: SUBSTUDY WITHIN A RANDOMIZED CONTROLLED TRIAL

Diana M Bond, PhD1, Jeremy Hammond, PhD2, Antonia W Shand, MB ChB3,4, and Natasha Nassar, PhD1,3

1 Sydney School of Public Health, Faculty of Medicine and Health, University of Sydney, Sydney, Australia
2 Strategic Ventures, University of Sydney, Sydney, Australia
3 Children’s Hospital at Westmead Clinical School, Faculty of Medicine and Health, University of Sydney, Sydney, Australia
4 Department for Maternal Fetal Medicine, Royal Hospital for Women, Sydney, Australia

Citation: (APA): Bond, D. M., Hammond, J., Shand, A. W., & Nassar, N. (2020). Com-
paring a mobile phone automated system with a paper and email data collection sys-
tem: Substudy within a randomized controlled trial. JMIR mHealth and uHealth, 8(8),
e15284. (13 pages)
Copyright: © This is an open-access article distributed under the terms of the Creative
Commons Attribution License (https://creativecommons.org/licenses/by/4.0/).

ABSTRACT

Background
Traditional data collection methods using paper and email are increasingly
being replaced by data collection using mobile phones, although there is
limited evidence evaluating the impact of mobile phone technology as part
of an automated research management system on data collection and health
outcomes.

Objective
The aim of this study is to compare a web-based mobile phone automated
system (MPAS) with a more traditional delivery and data collection
system combining paper and email data collection (PEDC) in a cohort of
breastfeeding women.

Methods
We conducted a substudy of a randomized controlled trial in Sydney,
Australia, which included women with uncomplicated term births who
intended to breastfeed. Women were recruited within 72 hours of giving
birth. A quasi-randomized subset of women was recruited using the PEDC system, and the remainder were recruited using the MPAS. The outcomes
assessed included the effectiveness of data collection, impact on study
outcomes, response rate, acceptability, and cost analysis between the MPAS
and PEDC methods.

Results
Women were recruited between April 2015 and December 2016. The analysis
included 555 women: 471 using the MPAS and 84 using the PEDC. There
were no differences in clinical outcomes between the 2 groups. At the end of
the 8-week treatment phase, the MPAS group showed an increased response
rate compared with the PEDC group (56% vs 37%; P<.001), which was also
seen at the 2-, 6-, and 12-month follow-ups. At the 2-month follow-up, the
MPAS participants also showed an increased rate of self-reported treatment
compliance (70% vs 56%; P<.001) and a higher recommendation rate for
future use (95% vs 64%; P<.001) as compared with the PEDC group. The
cost analysis between the 2 groups was comparable.

Conclusions
MPAS is an effective and acceptable method for improving the overall
management, treatment compliance, and methodological quality of clinical
research to ensure the validity and reliability of findings.

Keywords: mobile phones, text messaging, data collection methods, clinical trial, breastfeeding, maternal health

INTRODUCTION

Background
Participant engagement and response is a vital aspect of any clinical research
study. Many research studies are costly, labor intensive, and potentially
compromised because of the difficulties associated with patient compliance,
engagement, incomplete data collection, and inadequate follow-up [1-3]. The
method and type of data collection system utilized to recruit participants and
collect data throughout the study is important to ensure the quality, reliability,
and validity of data collection. In addition, it must be cost-effective and
acceptable to participants, funding organizations, and researchers [4-6].
Paper-based data collection in research studies is gradually being
replaced or used in conjunction with electronic data collection systems
[7], primarily in the form of emails containing links to web-based surveys.
Comparison of these two methods has been well documented [8-11].
In recent years, mobile phone technology has been increasingly used
to promote health-related behavioral change and self-management of
care via the use of apps and automated SMS text messages. Studies have
shown effective changes in psychological and physical symptoms [12-
14] as well as specific pregnancy and breastfeeding outcomes [15,16] by
sending individually tailored text messages to participants. However, a
Cochrane review specifically looking at mobile phone apps as a method
of data delivery for self-administered questionnaires found that none of
the included trials in the review reported data accuracy or response rates
[17]. Furthermore, a review of studies utilizing mobile phones for data
collection showed that they were based on very small sample sizes, collected
intermittent data (as opposed to daily), or had limited longitudinal data
collection (maximum 9 months) [18-21]. There is also limited assessment
of mobile phone technology as part of a web-based automated system,
integrating randomization, SMS delivery, and electronic data collection into
a streamlined data management system. Although previous studies have
compared traditional paper-based data collection with data collection using
mobile phones [22,23], there is limited evidence assessing the effectiveness
of a combination of paper or email-based methods in comparison with
mobile phones as part of an automated data collection management system.
In addition, longitudinal data collection using mobile phone technology has
not been assessed, particularly in maternal and infant health, despite adults
of reproductive age currently being the largest users of mobile phones [24].

Objectives
The primary aims of this study were to compare a web-based research
management system utilizing mobile phone technology with a traditional
delivery and data collection system using a combination of paper- and email-
based methods on clinical research outcomes and to assess the acceptability
and effectiveness of use, including cost analysis.

METHODS

Design
We conducted a prespecified substudy as part of the APProve (CAn
Probiotics ImProve Breastfeeding Outcomes?) trial to compare a mobile
phone automated system (MPAS) with a paper and email data collection
(PEDC) system. APProve was a double-blind randomized controlled trial
(RCT) evaluating the effectiveness of an oral probiotic versus a placebo
for preventing mastitis in breastfeeding women. It was conducted between
April 2015 and December 2016 in 3 maternity hospitals in Sydney, Australia.
Detailed methods have been published previously [25]. Briefly, it involved
the evaluation of a probiotic versus a placebo taken daily for 8 weeks for
the prevention of mastitis, which was assessed using short daily and slightly
longer weekly questionnaires during the first 8 weeks following birth and
longer follow-up questionnaires at 2, 6, and 12 months.
The MPAS was a data delivery and collection system that combined
treatment randomization, SMS delivery to participants, electronic data
collection, and data management. It was developed by the study team with
the aid of an eResearch (electronic research) company, which developed
the system based on our prospective design specifications. The system
integrated 2 established software services, SMS delivery and a web-based
survey tool, which were then linked to a secure web-based data management
system. The MPAS sent automated text messages to the participants’
mobile phones with links to self-administered web-based surveys. Each
survey link was embedded with the participant’s unique identifier, enabling
comparison across multiple surveys. A maximum of 2 automated reminders
were integrated into the system if a participant did not respond after 3 days.
The MPAS was pilot tested by 17 members of the research department,
with feedback and suggestions integrated into the system before study
commencement.
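A minimal sketch of the reminder rule described above (a unique identifier embedded in each survey link, and at most 2 automated reminders when no response has arrived after 3 days). The names, the URL scheme, and the spacing of the second reminder are illustrative assumptions, not the trial software:

from datetime import date, timedelta

MAX_REMINDERS = 2
REMINDER_DELAY_DAYS = 3  # assumption: each reminder waits another 3 days

def build_survey_link(base_url, participant_id, survey_id):
    """Embed the participant's unique identifier in the web-based survey link."""
    return f"{base_url}/survey/{survey_id}?pid={participant_id}"

def needs_reminder(sent_on, responded, reminders_sent, today=None):
    """True if an automated SMS reminder should be sent for an unanswered survey."""
    today = today or date.today()
    if responded or reminders_sent >= MAX_REMINDERS:
        return False
    return today >= sent_on + timedelta(days=REMINDER_DELAY_DAYS * (reminders_sent + 1))

# Example: survey sent 4 days ago, unanswered, no reminders yet -> send first reminder.
print(needs_reminder(date.today() - timedelta(days=4), False, 0))
print(build_survey_link("https://example.org", "P0123", "week-3"))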
The PEDC included a combination of an 8-week calendar diary provided
to participants at the time of trial entry and emailed links to weekly and
follow-up surveys. The calendar diaries were identified with the participant
study number at the time of treatment randomization, and the start date was
manually entered. The A4-size calendar was protected with a waterproof
coating, allowing for daily entries by pen. Participants were encouraged to
hang the calendar in a prominent place at home. PEDC users were supplied
with a stamped, addressed envelope to post the calendar back to the trial
coordinating center at the end of the treatment phase.
The study was approved by the Northern Sydney Local Health
District Human Research Ethics Committee, approval number HREC/14/
HAWKE/358, and registered with the Australian New Zealand Clinical
Trials Registry, registration number ACTRN12615000923561. Written
informed consent was obtained from all participants.

Participants and Study Procedures


Of the 639 women randomized to the APProve trial, 539 women were
allocated to the MPAS and 100 women to the PEDC. A quasi-randomization
process was applied for PEDC recruitment, which was conducted on
randomly preassigned days of the week and continued until 100 participants
were recruited. Both groups of women were identified, approached, and
consented to the study in the postnatal ward in the same way, but the
treatment randomization process was slightly different.
For the women allocated to the MPAS group, a research assistant
entered their details into the web-based data management system, which
then automatically generated a unique participant identification number and
treatment allocation. The randomization schedule was built into the system
and generated using a computer random number generator with random
block sizes. Randomization of participants using the PEDC was conducted
using sealed, opaque envelopes, with the randomization schedule developed
using a similar but separate process compared with the MPAS group.
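A minimal sketch of 1:1 block randomization with random block sizes, as described above for the MPAS schedule; the particular block sizes and arm labels here are illustrative assumptions, not the trial's concealed parameters:

import random

def block_randomization(n_participants, block_sizes=(4, 6, 8), arms=("probiotic", "placebo")):
    """Generate a 1:1 allocation sequence from randomly chosen block sizes.
    Each block holds equal slots per arm in shuffled order, keeping the
    two groups balanced throughout recruitment."""
    schedule = []
    while len(schedule) < n_participants:
        size = random.choice(block_sizes)         # random block size (even, for 2 arms)
        block = list(arms) * (size // len(arms))  # equal slots per arm
        random.shuffle(block)                     # random order within the block
        schedule.extend(block)
    return schedule[:n_participants]

print(block_randomization(10))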

Data Collection
Baseline sociodemographic, clinical, and birth characteristics collected in this
study are shown in Table 1. All daily, weekly, and follow-up questionnaires
were identical for the 2 groups.

Table 1: Characteristics of participants using the mobile phone automated system compared with the paper and email data collection system

Participant characteristics          MPAS^a (n=526)   PEDC^b (n=94)   t value (df)^c   Chi-square (df)^c   P value
Maternal
  Maternal age (years), mean (SD)    33.4 (4.9)       33.5 (4.0)      0.06 (618)       N/A^d               .95
  Born in Australia, n (%)           256 (48.7)       56 (59.6)       N/A              3.8 (1)             .05
  Ethnicity, n (%)                                                    N/A              0.2 (2)             .92
    Asian                            110 (20.9)       19 (20.2)       N/A              N/A                 N/A
    White                            365 (69.4)       67 (71.3)       N/A              N/A                 N/A
    Other                            51 (9.7)         8 (8.5)         N/A              N/A                 N/A
  Tertiary education^e, n (%)        440 (83.7)       77 (81.9)       N/A              0.2 (1)             .68
  Alcohol in pregnancy, n (%)        58 (11.0)        11 (11.7)       N/A              0.0 (1)             .85
  First baby, n (%)                  312 (59.3)       44 (46.8)       N/A              5.1 (1)             .02
  Allocated to probiotic, n (%)      265 (50.4)       46 (48.9)       N/A              0.1 (1)             .80
Birth, infant, and postpartum
  Caesarean section, n (%)           163 (31.0)       25 (26.6)       N/A              0.7 (1)             .39
  Birthweight (grams), mean (SD)     3421 (458.1)     3456 (451.6)    0.69 (618)       N/A                 .49

a MPAS: mobile phone automated system.
b PEDC: paper and email data collection.
c Test statistics using the Pearson chi-square test for categorical variables and the 2-tailed, independent-sample t test for continuous variables, with their respective df, are presented.
d N/A: not applicable.
e College, university, or vocational training after high school.

For the MPAS group, each study site was provided with an electronic
tablet with internet connectivity to enable the research assistant to enter the
participants’ details, conduct treatment randomization, and enter baseline
and hospital data directly into the web-based data management system.
All research assistants were trained in the use of the MPAS and given
individualized password-protected access to the website, which could be
accessed by phone, tablet, or computer. Only deidentified data were entered
into the database and linked to an individual study number generated
automatically at randomization. The only paper-based data for this cohort
included a signed patient information and consent form and a trial entry
form containing the participants’ contact details. Once randomized, the study
number generated by the MPAS was written in the trial entry form to allow
for reidentification, if required. An audit trail was integrated into the MPAS
to log all SMS messages sent and surveys completed. Daily and weekly
outcome data for the APProve trial for the first 8 weeks (56 days) following
birth were collected via self-completed questionnaires using automated
weblinks sent directly via SMS to the participant’s mobile phone. Before
the follow-up questionnaires at 2, 6, and 12 months (63, 180, and 360 days),
participants were sent an automated link asking for their preferred method
of receiving the questionnaires, with SMS, email, or post as options. On the
basis of the response, the MPAS would either send the participant an SMS
link to the relevant survey or alert the trial coordinator by an automated
email of the preference for an emailed or a postal questionnaire.
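
The preference-based routing just described can be summarized in a short sketch. send_sms and email_coordinator are hypothetical stand-ins for the system's SMS gateway and alerting service, shown only to make the branching logic explicit.

def send_sms(phone, message):
    # Placeholder for the system's SMS gateway.
    print(f"SMS to {phone}: {message}")

def email_coordinator(message):
    # Placeholder for the automated alert email to the trial coordinator.
    print(f"Email to coordinator: {message}")

def route_followup(participant_id, phone, preference, survey_url):
    # Dispatch a follow-up questionnaire according to the participant's
    # stated preference: SMS links go out automatically; email and postal
    # preferences alert the coordinator instead.
    if preference == "sms":
        send_sms(phone, f"Your follow-up survey is ready: {survey_url}")
    elif preference in ("email", "post"):
        email_coordinator(f"Participant {participant_id} prefers {preference} "
                          "for the follow-up questionnaire.")
    else:
        raise ValueError(f"Unknown preference: {preference!r}")

route_followup("A001", "+61400000000", "sms", "https://example.org/survey")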
For the PEDC participants, baseline and hospital data were collected
on paper-based data forms and then entered into the web-based system at
the trial coordinating center. Once randomized to their allocated treatment,
participants were given a calendar diary by the research assistant to record
daily outcomes for 8 weeks. Weekly outcome data for the first 8 weeks
and follow-up questionnaires at 2, 6, and 12 months were collected by an
emailed weblink to a web-based survey sent by the clinical trial coordinator
(Figure 1).

Figure 1: Flow diagram comparing the mobile phone automated system with
paper and email data collection. MPAS: mobile phone automated system;
PEDC: paper and email data collection.

Outcomes
Outcomes evaluating participant acceptability, treatment compliance, and
effectiveness of data collection comparing the MPAS with the PEDC were
assessed in the 2-month follow-up questionnaire. Data were collected on the
ease of participation in the trial and the ease of remembering to take the study
treatment every day (both rated from 0 [very difficult] to 5 [very easy]), self-
reported compliance with taking the allocated treatment (compliance was
defined as having taken the product for ≥42 of 56 days, semicompliance as
having taken the product for 15-41 of 56 days, and noncompliance as having
taken the product for ≤14 of 56 days), whether the method of data collection
was helpful in reminding the participant to take the treatment (ranked from
0 [not helpful at all] to 5 [very helpful]), recommendation of the allocated
method of data collection for future studies, and the preference for how the
participant wanted to receive the follow-up questionnaires (SMS, email, or
post). The effectiveness of data collection was defined as the frequency of
completing the questionnaires at all time points.
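
The compliance categories reduce to simple day-count thresholds; a minimal sketch:

def classify_compliance(days_taken, total_days=56):
    # Study thresholds: >=42 of 56 days compliant, 15-41 semicompliant,
    # <=14 noncompliant.
    if not 0 <= days_taken <= total_days:
        raise ValueError("days_taken must be between 0 and total_days")
    if days_taken >= 42:
        return "compliant"
    if days_taken >= 15:
        return "semicompliant"
    return "noncompliant"

for days in (56, 30, 10):
    print(days, "->", classify_compliance(days))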
We also assessed whether the data collection method had any impact
on the clinical trial outcomes. Clinical outcomes were collected during
the daily, weekly, and 2-month surveys. They included mastitis, maternal
infection, and breastfeeding status up to 2 months after birth. The mastitis
outcome measure was based on self-reported symptoms related to breast
infection or a clinical diagnosis of mastitis by a care provider [26].
Satisfaction with using their assigned method of data collection (MPAS
or PEDC) was assessed by using open-ended free text questions to elicit
written comments pertaining to what the participants liked the most and the
least about their assigned method of data collection and what suggestions
could be provided for future use. In addition, satisfaction with the method of
data collection was elicited from the MPAS users and responses ranked from
0 (did not like at all) to 5 (really liked it). This response was subgrouped into
2 categories: satisfied (4-5) and less satisfied (0-3).
The cost analysis of utilizing the MPAS compared with the PEDC was
also performed. Costs included those associated with the initial development
and ongoing usage of each system and personnel time associated with trial
participant survey collection and follow-up. A web-based time tracking
report was generated weekly to determine the average time required for
creating and sending emails and manual data entry from paper survey
collection.

Statistical Analysis
Baseline sociodemographic, clinical, and birth characteristics were compared
between the 2 groups. Categorical data were summarized using percentages,
and the differences in the characteristics between the 2 groups were assessed
using a chi-square test. Continuous outcomes with a normal distribution
were summarized using mean and SD, and the characteristics between the
2 groups were compared using t tests. Data with a nonnormal distribution
were summarized using medians, and the groups were compared using
nonparametric Wilcoxon tests. Satisfaction with the MPAS was analyzed
by maternal sociodemographic characteristics and treatment compliance.
Written responses were thematically assessed by 2 authors and an external
researcher, who each independently coded the data, followed by group
discussion. Common themes and relevant responses were identified, and their frequency was quantified. Analyses were conducted using SPSS version 24 (IBM SPSS Statistics, 2016, IBM Corporation), and P<.05 was used to indicate statistical significance.
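
Although the analyses were run in SPSS, the same comparisons can be reproduced with open-source tools. The sketch below uses SciPy with simulated data standing in for the trial records; the contingency counts are the "first baby" frequencies from Table 1, and the Wilcoxon rank-sum (Mann-Whitney U) test is used for the nonparametric comparison.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated continuous outcome matching the Table 1 summaries (illustrative only).
mpas_age = rng.normal(33.4, 4.9, 526)
pedc_age = rng.normal(33.5, 4.0, 94)

# 2-tailed, independent-sample t test for a normally distributed outcome.
t_stat, p_t = stats.ttest_ind(mpas_age, pedc_age)

# Pearson chi-square test for a categorical outcome ("first baby", Table 1).
table = np.array([[312, 526 - 312],
                  [44, 94 - 44]])
chi2, p_chi, df, _ = stats.chi2_contingency(table, correction=False)

# Wilcoxon rank-sum (Mann-Whitney U) test for a nonnormal outcome.
u_stat, p_u = stats.mannwhitneyu(mpas_age, pedc_age)

print(f"t={t_stat:.2f} (P={p_t:.2f}); chi-square={chi2:.1f}, df={df} "
      f"(P={p_chi:.3f}); Mann-Whitney P={p_u:.2f}")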

RESULTS

Participant Characteristics
Of 620 women, 526 women were quasi-randomized to the MPAS group
and 94 women to the PEDC group. There were no differences between the
groups except that a higher percentage of women in the MPAS group gave
birth to their first baby (P=.02; Table 1). After loss to follow-up of 10.5%
(55/526) participants in the MPAS group and 11% (10/94) in the PEDC group,
secondary outcomes were analyzed for 555 women. We found no difference
in the trial outcomes between the 2 data collection groups (Table 2). There
was also no difference in the ease of use between the MPAS and PEDC
groups. However, a higher proportion of participants using the MPAS were
compliant with taking the study treatment (331/471, 70.3% vs 47/84, 56%;
P<.001), were more likely to rate their method of data collection as being a
helpful reminder to record their symptoms (median 4.37 vs 2.63; P<.001),
and were more likely to recommend their assigned method for future use
(330/349, 94.6% vs 36/56, 64%; P<.001). There was little difference among
the characteristics of the women who were lost to follow-up compared with
those for whom we had follow-up data, except that at 2 months postpartum,
the former were less likely to be tertiary educated (45/65, 69% vs 472/555,
85.0%; P=.001).

Table 2: Impact and acceptability of the mobile phone automated system compared with the paper and email data collection system

Maternal outcomes                         MPAS^a (n=471)  PEDC^b (n=84)  t value (df)^c  Chi-square (df)^c  Odds ratio (95% CI)    P value
Mastitis, n (%)                           90 (19.1)       15 (17.9)      N/A^d           0.1 (1)            1.09 (0.59 to 1.99)    .79
Infections (other than mastitis), n (%)   77 (16.3)       20 (23.8)      N/A             2.8 (1)            0.63 (0.36 to 1.09)    .10
Any breastfeeding at 2 months, n (%)      443^e (94.5)    77 (91.7)      N/A             0.1 (1)            1.55 (0.65 to 3.69)    .32
Exclusive breastfeeding at 2 months, n (%)  385^f (82.3)  67 (79.8)      N/A             0.3 (1)            1.18 (0.66 to 2.11)    .58
Ease of participation (0-5, 5=very easy), mean (SD)
                                          3.76 (1.31)     3.57 (1.40)    −1.02 (428)     N/A                0.19 (−0.56 to 0.18)   .31
Ease of remembering to take product (independent of method; 0-5, 5=very easy), mean (SD)
                                          3.21 (1.43)     2.95 (1.50)    −1.3 (427)      N/A                0.21 (−0.66 to 0.14)   .21
Compliant with treatment, n (%)                                          N/A             15.8 (2)           N/A                    <.001
  Compliant (≥42 of 56 days)              331 (70.3)      47 (56.0)      N/A             N/A                N/A                    N/A
  Semicompliant (15-41 of 56 days)        87 (18.5)       14 (16.7)      N/A             N/A                N/A                    N/A
  Noncompliant (≤14 of 56 days)           53 (11.3)       23 (27.4)      N/A             N/A                N/A                    N/A
Helpful reminder (data collection; 0-5, 5=very helpful), mean (SD)
                                          4.37 (1.19)     2.63 (1.85)    −9.3 (403)      N/A                0.19 (−2.11 to −1.38)  <.001
Recommend for future, n (%)               330 (94.6)^g    36 (64.3)^h    N/A             50.8 (1)           0.19 (−2.11 to −1.38)  <.001

^a MPAS: mobile phone automated system.
^b PEDC: paper and email data collection.
^c Test statistics using the Pearson chi-square test for categorical variables and the 2-tailed, independent-sample t test for continuous variables, with their respective df, are presented.
^d N/A: not applicable.
^e n=469.
^f n=468.
^g n=349.
^h n=56.

Effectiveness and Satisfaction


The frequency with which women completed the daily and weekly
questionnaires was consistently higher among the MPAS users, with a 56%
average response rate over the 8-week treatment period compared with 37%
(P<.001) among the PEDC users (Figure 2). There was a gradual decrease
in the MPAS daily response rate over the course of the treatment phase
from 70% in the first week to less than half the women completing the
questionnaires by 8 weeks. Although the daily response rate from PEDC
users was lower than MPAS users, there was a notable spike in the response
rate among the PEDC users on the days the weekly questionnaires were sent
by email (Figure 2). Response rates for the follow-up questionnaires showed
a 12% higher rate of survey completion among the MPAS users at 2 months
compared with the PEDC participants, with an 18% difference at 12 months
(P<.05; Figure 2).

Figure 2: Effectiveness of data collection between the mobile phone automated system and the paper and email data collection. MPAS: mobile phone automated system; PEDC: paper and email data collection.

Among the MPAS users, satisfaction was high with a mean score of
4.49 out of 5 (SD 1.0). There was no difference in satisfaction scores among
maternal characteristics. There was a difference in satisfaction related to
compliance, with participants most compliant with treatment being the most
satisfied with the use of the MPAS (P<.001; Figure 3). Nearly half of the
participants preferred to receive the questionnaires by either SMS (135/289,
46.7%) or email (139/289, 48.0%) at 2 months; however, the preference
for SMS increased to approximately 60% for both the 6- and 12-month questionnaires
(142/241, 58.9% and 135/224, 60.2%, respectively). Very few women opted
to receive questionnaires by post (<5%).

Figure 3: Treatment compliance and satisfaction for the mobile phone auto-
mated system (n=555).
Responses to open-ended questions in the 2-month questionnaires
were received from 74.1% (349/471) MPAS participants and 67% (56/84)
PEDC participants. The themes identified were related to the factors that
the participants liked most and liked least about their method of data
collection as outlined in Table 3. Most of the MPAS participants stated that
the MPAS was easy, convenient, quick, accessible, and efficient to use. In
particular, many commented that web-based questionnaires were easy to
complete while breastfeeding. Overall, less than 5% (16/349) of the MPAS participants stated that it was difficult to remember to complete the survey every day, compared with 25% (14/56) of PEDC participants. Approximately
1 in 5 participants in each group commented on the functionality of either
the diary or the MPAS, such as difficulty with formatting, size restrictions,
Wi-Fi accessibility, and inability to enter additional comments. Although
11 women in the MPAS group stated that they found the text messages
intrusive, 3 participants stated that they liked the fact that this method was
not intrusive.

Table 3: Qualitative analyses of the likes and dislikes of mobile phone automated system users compared with paper and email data collection system users

Participant factors related to method of data collection  MPAS^a (n=349), n (%)  PEDC^b (n=56), n (%)
Liked the most
  Ease of use                                              325 (93.1)             7 (12.5)
  Good reminder to take treatment                          75 (21.5)              10 (17.8)
Liked the least
  Nothing                                                  168 (48.1)             10 (17.8)
  Time consuming                                           24 (6.9)               12 (21.4)
  Functionality issues                                     77 (22.1)              10 (17.8)
  Difficult to remember to complete survey                 16 (4.6)               14 (25.0)

^a MPAS: mobile phone automated system.
^b PEDC: paper and email data collection.
Suggestions for future use by the MPAS participants included allowing
users to select the time of day to receive the SMS and to opt in or out of
reminder messages, limiting the number of questions on the questionnaire
to minimize scrolling, diversifying the content of each SMS for improved
interest, and improving the functionality to allow the questionnaires to be
completed later if interrupted. Many of the PEDC participants recommended
the use of SMS or a web-based app for data collection (Textbox 1).
Textbox 1: Participants’ comments about the mobile phone automated system compared with the paper and email data collection system.
Mobile phone automated system
• “I found using my phone to complete the surveys great as I could
do it easily when feeding my daughter.”
•	“It was great—something to look forward to everyday. It was easy and also a great reminder in case I had forgotten to take my
daily Approve sachets.”
• “So easy to remember and to complete the daily survey. I often
completed the survey while out and about.”
• “Most people have a smartphone on hand. Much easier than
using a computer or a paper record. Ease of use—always with
me. Could answer questions while breastfeeding my baby.”
• “Sometimes it took a while to upload the questions”
• “Reminders were great but sometimes daily were a bit annoying”
• “Weekly questionnaires bit lengthy”
• “It would often change my response (touch feature too sensitive)”
• “Hard to see if the survey was completed if forgotten to complete
the previous ones”
Paper and email data collection system
• “I liked to be a part of this study but it was not that easy to
remember it to take every day...I missed sometimes.”
• “Helped to keep on track. Encouraged me to have a morning
routine that incorporated having breakfast at the similar time each
morning.”
• “The calendar was quick and easy. Can’t imagine also having to
write in a diary on a daily basis.”
• “Now that everyone is on the phone maybe there could be a daily
reminder on the participants’ phone, creating an app or site so the
data goes straight to the research office daily or weekly.”
• “Filling out the manual form is troublesome.”
• “Forgetting to fill in the daily diary even though it was clearly
explained to me before I agreed to do the trial. I’m so sorry. I
only found it the other day in a pile of paperwork. I do everything
electronically.”
• “Keeping track and filling as was not doing it every day so it was
hard to remember after 15-20 days for that period, sorry.”
• “The progress chart would be easier if online or an app so it could
be filled in on a smartphone during feeds.”
• “Probably use a different stock as it could be hard to write on.”
Cost Analysis
Cost analysis between the 2 groups showed a comparable per-person cost,
with the MPAS costing on average Aus $10 (US $7.21) more (Tables 4 and
5).

Table 4: Cost analysis for paper and email data collection

Paper and email data collection               Cost, Aus $ (US $)^a
Diaries
  Printing                                    854.05 (615.65)
  Labels for diaries                          58.85 (42.42)
  Stamps                                      150 (108.13)
  Envelopes                                   40 (28.83)
  Paper and printing (case report forms)      10 (7.21)
Emails^b,c                                    5000 (3604.29)
Data collection forms                         2500 (1802.14)
Reminder emails: 35% (33/94) return rate      500 (360.43)
Total × 100 participants                      9112.90 (6569.11)
Total cost per person                         91.13 (65.69)

^a All costs are calculated in Australian dollars (Aus $1=US $0.72).
^b Labor is calculated at Aus $50 (US $36.04) per hour.
^c Emails are calculated at 5 min per email.

Table 5: Cost analysis for mobile phone automated system

Mobile phone automated system                          Cost, Aus $ (US $)^a
Tablets × 3                                            1060 (764.11)
Intersect: data hosting                                3300 (2378.83)
Intersect: app development                             29,080 (20,962.50)
Intersect: trial infrastructure                        14,500 (10,452.40)
Web survey tool (Aus $780 per year × 2)                1560 (1124.54)
SMS service (45,000 at Aus $0.069 per SMS)             3105 (2238.26)
Mobile service number (Aus $25 per month × 24)         600 (432.52)
Website hosting (Aus $25 per year × 2)                 50 (36.04)
Broadband (Aus $30 per month × 24)                     720 (519.02)
Reminder emails: 46.7% (246/526) return rate^b,c       450 (324.39)
Total × 529 participants                               54,425 (39,232.70)
Total cost per person                                  102.88 (74.16)

^a All costs are calculated in Australian dollars (Aus $1=US $0.72).
^b Labor is calculated at Aus $50 (US $36.04) per hour.
^c Emails are calculated at 5 min per email.
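
The per-person figures in Tables 4 and 5 follow directly from the itemized totals; a quick arithmetic check:

# Totals and group sizes taken from Tables 4 and 5 (Aus $).
pedc_total, pedc_n = 9112.90, 100
mpas_total, mpas_n = 54425.00, 529

print(f"PEDC cost per person: {pedc_total / pedc_n:.2f}")  # 91.13
print(f"MPAS cost per person: {mpas_total / mpas_n:.2f}")  # 102.88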

DISCUSSION

Principal Findings
This study demonstrates that an MPAS is an effective and acceptable tool
for improving study delivery and data collection within a randomized
trial as compared with a more traditional system. We have shown that the mobile phone system improved treatment compliance and response rates, achieved greater user satisfaction, was comparable in cost to the PEDC, and did not affect study outcomes.

Comparison with Prior Work


Our study supports previous studies which showed that SMS messaging
could improve treatment adherence and was acceptable to participants
[16,19,27]. Despite concerns about long-term attrition in previous studies
[28], the MPAS results showed that even with a decrease in response rates
over time, the response rates were consistently higher than the PEDC rates
over the same period, possibly because of better engagement among the
users. Although 37% (35/94) of the PEDC participants returned a completed questionnaire, it is likely that some daily entries were completed retrospectively,
compromising the accuracy of the data. The peak completion rate of the
PEDC questionnaires was on the day the weekly questionnaires were emailed
to the participants, suggesting that emailed links are a more effective method
of data collection compared with paper-based data collection, although they
are more time consuming for the trial coordinator compared with automated
SMS links. Despite no difference in clinical outcome measures between the
2 groups, the increased response rates to the daily surveys provided rich data
regarding breastfeeding habits, confirming the feasibility of using an MPAS
as a means of improving the reliability of outcome data in breastfeeding research [23].
The daily questionnaires of the MPAS appeared to have a secondary
effect of improving treatment compliance by serving as a daily reminder,
which in turn increased engagement with the system, resulting in a higher
rate of satisfaction. Anecdotally, satisfaction among the research assistants
was also high, with the majority saying that the MPAS was easy to use and
less time-consuming for randomization and data entry as compared with
paper forms. Moreover, the MPAS minimized the use of paper.
Despite previous research showing a 55% reduction in cost upon using
electronic data collection compared with paper data collection [10], our study
indicates that the cost per person is comparable between PEDC and MPAS.
This is largely because of the differences in electronic data capture between
the 2 studies, with the earlier study collecting, monitoring, and entering data
directly into a web-based database, whereas the major expenditure to our
study was the development of a research management system that integrated
randomization, automated SMS, and data collection. It is important to note
that once the trial infrastructure and data hosting was installed and initiated,
there was potential to significantly scale up the number of participants and
the duration of the study without an incremental increase in cost, whereas
an increase in PEDC participants would entail a corresponding increase
in labor costs. An additional 18 PEDC participants in our study would have
balanced the costs between the 2 groups. Furthermore, the scope for contact
and engagement with participants with the MPAS is greater compared
with paper and email methods of data collection. For example, the PEDC
participants each received a minimum of 11 emails. Conversely, the MPAS
participants received an average of 61 automated text messages, including
welcome texts, daily SMS, and reminder messages. If the same number of
texts were sent by email by a clinical trial coordinator, the cost would have
increased to an additional Aus $200 (US $144.17) per participant (Aus $292
[US $210.49] PEDC vs Aus $102 [US $73.53] MPAS).
There is very little data to evaluate the use of SMS as a consolidated
research management tool. We found many benefits of using MPAS in
the multicenter APProve trial, including a centralized system to manage
randomization, data collection across all stages of the trial, automated
reminders and alerts, reduced paper transfer of sensitive patient information
between sites, reduced potential for transcription error [11,29,30], and
improved reliability of daily data collection associated with reduced risk
of recall bias [23]. The reduction in the burden and time of data collection for research assistants was substantial, as was the reduction in issues associated with patient confidentiality and the storage of physical case report forms [23,29].
The advantage of integrating the MPAS via a web-based platform ensured
access across mobile phone platforms and enabled accessibility to a large
and diverse population, especially for those living in rural, remote, or
disadvantaged areas or where mobility is restricted [31,32]. In addition,
staff sick leave and absences were less of an issue because of the automated
nature of the system, leading to increased flexibility of the research team,
which is important when managing research studies on small budgets in
small teams.

Strengths and Limitations


The main strength of our study was embedding the assessment of the
MPAS versus the PEDC as a substudy in an RCT, with quasi-randomization to study group showing little difference between the groups. Most
studies comparing paper-based data collection and electronic data collection
had very small sample sizes, 20 to 116 participants [20,33], whereas we were
able to show an effective difference with a statistically robust sample size.
Furthermore, daily data collection for 8 weeks and comparison of responses at 3 strategic time points over the course of 1 year were instrumental in the accurate assessment of outcomes and in minimizing recall bias [34].
The inclusion of data accuracy and response rates fills a gap in the literature
as addressed by a relevant Cochrane review [17]. Furthermore, the method
of data collection for both groups allowed for objectivity of responses
without gratitude bias, as is often seen in questionnaires of a face-to-face
nature [35,36].
One of the limitations of the study was the difference in sample size
between the 2 groups. As this was a substudy of an RCT, it was not powered
for this secondary outcome. Random sampling was performed to ensure that
the MPAS did not adversely affect the primary outcome. Although baseline
maternal characteristics show that more women in the MPAS group gave
birth to their first baby, possibly because the paper diary appeared more
overwhelming for first-time mothers, there were no differences between
maternal health and breastfeeding outcomes. In addition, self-reported
compliance can be perceived as subjective and prone to bias, but as
compliance was measured by the same method in both groups, the bias would
be nondifferential. There were also issues with the interface and usability
for completing the questionnaires via the web for both MPAS and PEDC
participants. However, we were able to resolve many of the issues and make
slight modifications to the software over time. This did not negatively impact
the response rates. A final limitation was that no assessment of participant
time was included in the cost analysis. This was not included as it was not
anticipated that there would have been a discernible difference in time cost
between the 2 groups. Posting the diaries and logging on to the computer
for the weekly questionnaires may have elicited more time from the PEDC
participants, but this would have been negligible.

Conclusions
Despite the increasing growth of web-based clinical trial management
systems, there has been little or no evaluation of these systems against
traditional methods of trial management systems. Since the commencement
of our trial, there have been improvements in the quality and availability
of electronic data collection systems. For example, REDCap (Research
Electronic Data Capture) is a secure web application for building and
managing web-based surveys and databases, specifically for research studies
and operations [37]. The system offers an easy-to-use and secure method of
flexible yet robust data collection, which is free to researchers affiliated with
universities. Using such a system would have decreased the costs associated
with the development of the web-based survey tool we utilized as well as
eliminated many of the functionality issues we experienced to reduce future
research costs.
Future research should focus on how to maximize the effect of mobile
phone technology, such as implementing strategies to improve long-term
engagement with participants by simplifying questionnaires, optimizing
the number of text messages, and personalizing the content and timing of
messages.
Although we evaluated MPAS in a perinatal population, the use
of mobile phone technology provides the opportunity to facilitate and
improve the quality and effectiveness of clinical research studies; enhance
patient interaction; and improve clinical research across a wide range
of methodologies, disciplines, and health care settings. Integration and
evaluation of mobile phone research management systems that are cost-
effective, efficient, and acceptable to both researchers and patients is
essential, given the increasing use of mobile phone technology [24] and
high costs of undertaking research. We have shown that the use of an
integrated MPAS is an effective and acceptable method for improving the
overall management, treatment compliance, and methodological quality of a randomized clinical trial to ensure validity and reliability of findings, in
addition to being cost-effective.

ACKNOWLEDGMENTS
Funding was provided by the Ramsay Research and Teaching Fund of Royal
North Shore Hospital and the Kolling Institute of Medical Research. NN
was supported by Australian National Health and Medical Research Council
Career Development (APP1067066) and DB by a University of Sydney
Postgraduate Award. In-kind support was provided by Intersect Australia
Ltd for research support and development of the MPAS. The funders of
the study had no role in the study design, data collection, data analysis,
data interpretation, or writing of the report. No payment was received for
writing this paper by pharmaceutical companies or other agencies. The
corresponding author had full access to all the data in the study and had the
final responsibility for the decision to submit for publication. The authors
would like to thank the research coordinators and midwives of Royal North
Shore Hospital, Royal Prince Alfred Hospital, and Royal Hospital for
Women for their assistance in trial recruitment and data collection and Ms
Andrea Pattinson for her assistance in the qualitative review of responses.
The authors also gratefully acknowledge the contribution of the women who
participated in this trial.
REFERENCES
1. Stone AA, Shiffman S, Schwartz JE, Broderick JE, Hufford MR.
Patient compliance with paper and electronic diaries. Control Clin
Trials. 2003 Apr;24(2):182–99. doi: 10.1016/s0197-2456(02)00320-3.
2. Wood AM, White IR, Thompson SG. Are missing outcome data
adequately handled? A review of published randomized controlled
trials in major medical journals. Clin Trials. 2004;1(4):368–76. doi:
10.1191/1740774504cn032oa.
3. Jüni P, Altman D, Egger M. Systematic reviews in health care:
assessing the quality of controlled clinical trials. Br Med J. 2001 Jul
7;323(7303):42–6. doi: 10.1136/bmj.323.7303.42. http://europepmc.
org/abstract/MED/11440947.
4. Sibbald B, Roland M. Understanding controlled trials. Why
are randomised controlled trials important? Br Med J. 1998 Jan
17;316(7126):201. doi: 10.1136/bmj.316.7126.201. http://europepmc.
org/abstract/MED/9468688.
5. Sanson-Fisher RW, Bonevski B, Green LW, D’Este C. Limitations of
the randomized controlled trial in evaluating population-based health
interventions. Am J Prev Med. 2007 Aug;33(2):155–61. doi: 10.1016/j.
amepre.2007.04.007.
6. Whitford H, Donnan P, Symon A, Kellett G, Monteith-Hodge E,
Rauchhaus P, Wyatt J. Evaluating the reliability, validity, acceptability,
and practicality of SMS text messaging as a tool to collect research
data: results from the feeding your baby project. J Am Med Inform
Assoc. 2012;19(5):744–9. doi: 10.1136/amiajnl-2011-000785. http://
europepmc.org/abstract/MED/22539081.
7. Nahm ML, Pieper CF, Cunningham MM. Quantifying data quality
for clinical trials using electronic data capture. PLoS One. 2008 Aug
25;3(8):e3049. doi: 10.1371/journal.pone.0003049. http://dx.plos.
org/10.1371/journal.pone.0003049.
8. Fitzgerald D, Hockey R, Jones M, Mishra G, Waller M, Dobson A. Use
of online or paper surveys by Australian women: longitudinal study
of users, devices, and cohort retention. J Med Internet Res. 2019 Mar
14;21(3):e10672. doi: 10.2196/10672. https://www.jmir.org/2019/3/
e10672/
9. Chen L, Chapman JL, Yee BJ, Wong KK, Grunstein RR, Marshall
NS, Miller CB. Agreement between electronic and paper Epworth
sleepiness scale responses in obstructive sleep apnoea: secondary
analysis of a randomised controlled trial undertaken in a specialised
tertiary care clinic. BMJ Open. 2018 Mar 8;8(3):e019255. doi: 10.1136/
bmjopen-2017-019255. http://bmjopen.bmj.com/cgi/pmidlookup?view=long&pmid=29523562.
10. Pavlović I, Kern T, Miklavcic D. Comparison of paper-based and
electronic data collection process in clinical trials: costs simulation
study. Contemp Clin Trials. 2009 Jul;30(4):300–16. doi: 10.1016/j.
cct.2009.03.008.
11. Boyer K, Olson J, Calantone R, Jackson E. Print versus electronic
surveys: a comparison of two data collection methodologies. J
Oper Manage. 2002 Feb 21;20(4):357–73. doi: 10.1016/s0272-
6963(02)00004-9.
12. Heron K, Smyth J. Ecological momentary interventions:
incorporating mobile technology into psychosocial and health
behaviour treatments. Br J Health Psychol. 2010 Feb;15(Pt 1):1–39.
doi: 10.1348/135910709X466063. http://europepmc.org/abstract/
MED/19646331.
13. Dobson R, Whittaker R, Jiang Y, Maddison R, Shepherd M, McNamara
C, Cutfield R, Khanolkar M, Murphy R. Effectiveness of text message
based, diabetes self management support programme (SMS4BG):
two arm, parallel randomised controlled trial. Br Med J. 2018 May
17;361:k1959. doi: 10.1136/bmj.k1959. http://www.bmj.com/cgi/pmidlookup?view=long&pmid=29773539.
14. Redfern J. Smart health and innovation: facilitating health-related
behaviour change. Proc Nutr Soc. 2017 Aug;76(3):328–332. doi:
10.1017/S0029665117001094.
15. Lau Y, Htun TP, Tam WS, Klainin-Yobas P. Efficacy of e-technologies
in improving breastfeeding outcomes among perinatal women: a meta-
analysis. Matern Child Nutr. 2016 Jul;12(3):381–401. doi: 10.1111/
mcn.12202. http://europepmc.org/abstract/MED/26194599.
16. Poorman E, Gazmararian J, Parker R, Yang B, Elon L. Use of text
messaging for maternal and infant health: a systematic review of
the literature. Matern Child Health J. 2015 May;19(5):969–89. doi:
10.1007/s10995-014-1595-8.
17. Marcano Belisario JS, Jamsek J, Huckvale K, O’Donoghue J, Morrison
CP, Car J. Comparison of self-administered survey questionnaire
responses collected using mobile apps versus other methods. Cochrane
Database Syst Rev. 2015 Jul 27;(7):MR000042. doi: 10.1002/14651858.MR000042.pub2.
18. Jimoh F, Lund E, Harvey L, Frost C, Lay W, Roe M, Berry R, Finglas
P. Comparing diet and exercise monitoring using smartphone app and
paper diary: a two-phase intervention study. JMIR Mhealth Uhealth.
2018 Jan 15;6(1):e17. doi: 10.2196/mhealth.7702. https://mhealth.
jmir.org/2018/1/e17/
19. Mougalian SS, Epstein LN, Jhaveri AP, Han G, Abu-Khalaf M,
Hofstatter EW, DiGiovanna MP, Silber AL, Adelson K, Pusztai L,
Gross CP. Bidirectional text messaging to monitor endocrine therapy
adherence and patient-reported outcomes in breast cancer. JCO Clin
Cancer Inform. 2017 Nov;1:1–10. doi: 10.1200/CCI.17.00015. https://
ascopubs.org/doi/10.1200/CCI.17.00015?url_ver=Z39.88-2003&rfr_
id=ori:rid:crossref.org&rfr_dat=cr_pub%3dpubmed.
20. Christie A, Dagfinrud H, Dale O, Schulz T, Hagen KB. Collection of
patient-reported outcomes: text messages on mobile phones provide
valid scores and high response rates. BMC Med Res Methodol. 2014 Apr
16;14:52. doi: 10.1186/1471-2288-14-52. https://bmcmedresmethodol.
biomedcentral.com/articles/10.1186/1471-2288-14-52.
21. Lim MS, Sacks-Davis R, Aitken CK, Hocking JS, Hellard ME.
Randomised controlled trial of paper, online and SMS diaries for
collecting sexual behaviour information from young people. J
Epidemiol Community Health. 2010 Oct;64(10):885–9. doi: 10.1136/
jech.2008.085316.
22. Price M, Kuhn E, Hoffman JE, Ruzek J, Acierno R. Comparison of
the PTSD checklist (PCL) administered via a mobile device relative to
a paper form. J Trauma Stress. 2015 Oct;28(5):480–3. doi: 10.1002/
jts.22037.
23. Bruun S, Buhl S, Husby S, Jacobsen LN, Michaelsen KF, Sørensen
J, Zachariassen G. Breastfeeding, infant formula, and introduction to
complementary foods-comparing data obtained by questionnaires and
health visitors’ reports to weekly short message service text messages.
Breastfeed Med. 2017 Nov;12(9):554–60. doi: 10.1089/bfm.2017.0054.
24. Chaffey D. Mobile Marketing Statistics Compilation. Smart Insights.
2019. [2019-02-06]. https://www.smartinsights.com/mobile-marketing/mobile-marketing-analytics/mobile-marketing-statistics/
25. Bond D, Morris J, Nassar N. Study protocol: evaluation of the probiotic
Lactobacillus Fermentum CECT5716 for the prevention of mastitis in
breastfeeding women: a randomised controlled trial. BMC Pregnancy
Childbirth. 2017 May 19;17(1):148. doi: 10.1186/s12884-017-1330-8.
https://bmcpregnancychildbirth.biomedcentral.com/articles/10.1186/
s12884-017-1330-8.
26. Amir L, Lumley J, Garland S. A failed RCT to determine if antibiotics
prevent mastitis: cracked nipples colonized with Staphylococcus
aureus: a randomized treatment trial [ISRCTN65289389] BMC
Pregnancy Childbirth. 2004 Sep 16;4(1):19. doi: 10.1186/1471-
2393-4-19. https://bmcpregnancychildbirth.biomedcentral.com/
articles/10.1186/1471-2393-4-19.
27. Thakkar J, Kurup R, Laba T, Santo K, Thiagalingam A, Rodgers A,
Woodward M, Redfern J, Chow C. Mobile telephone text messaging for
medication adherence in chronic disease: a meta-analysis. JAMA Intern
Med. 2016 Mar;176(3):340–9. doi: 10.1001/jamainternmed.2015.7667.
28. Eysenbach G. The law of attrition. J Med Internet Res. 2005 Mar
31;7(1):e11. doi: 10.2196/jmir.7.1.e11. https://www.jmir.org/2005/1/
e11/
29. Cole E, Pisano ED, Clary GJ, Zeng D, Koomen M, Kuzmiak CM,
Seo BK, Lee Y, Pavic D. A comparative study of mobile electronic
data entry systems for clinical trials data collection. Int J Med Inform.
2006;75(10-11):722–9. doi: 10.1016/j.ijmedinf.2005.10.007.
30. Jones S, Murphy F, Edwards M, James J. Doing things differently:
advantages and disadvantages of web questionnaires. Nurse Res.
2008;15(4):15–26. doi: 10.7748/nr2008.07.15.4.15.c6658.
31. Bensley RJ, Hovis A, Horton KD, Loyo JJ, Bensley KM, Phillips
D, Desmangles C. Accessibility and preferred use of online web
applications among WIC participants with internet access. J Nutr Educ
Behav. 2014;46(3 Suppl):S87–92. doi: 10.1016/j.jneb.2014.02.007.
32. Vangeepuram N, Mayer V, Fei K, Hanlen-Rosado E, Andrade C,
Wright S, Horowitz C. Smartphone ownership and perspectives on
health apps among a vulnerable population in East Harlem, New
York. Mhealth. 2018;4:31. doi: 10.21037/mhealth.2018.07.02. doi:
10.21037/mhealth.2018.07.02.
33. Dale O, Hagen KB. Despite technical problems personal digital
assistants outperform pen and paper when collecting patient diary
data. J Clin Epidemiol. 2007 Jan;60(1):8–17. doi: 10.1016/j.jclinepi.2006.04.005.
34. Fadnes LT, Taube A, Tylleskär T. How to identify information bias due
to self-reporting in epidemiological research. Int J Epidemiol. 2009
Jan;7(2):1–21. doi: 10.5580/1818.
35. Lumley J. Assessing satisfaction with childbirth. Birth. 1985;12(3):141–
5. doi: 10.1111/j.1523-536x.1985.tb00952.x.
36. van Teijlingen ER, Hundley V, Rennie A, Graham W, Fitzmaurice
A. Maternity satisfaction studies and their limitations: ‘what is, must
still be best’. Birth. 2003 Jun;30(2):75–82. doi: 10.1046/j.1523-
536x.2003.00224.x.
37. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG.
Research electronic data capture (REDCap)--a metadata-driven
methodology and workflow process for providing translational research
informatics support. J Biomed Inform. 2009 Apr;42(2):377–81. doi:
10.1016/j.jbi.2008.08.010. https://linkinghub.elsevier.com/retrieve/
pii/S1532-0464(08)00122-6.
Chapter 12

BIG DATA COLLECTION AND OBJECT PARTICIPATION WILLINGNESS: AN ANALYTICAL FRAMEWORK FROM THE PERSPECTIVE OF VALUE BALANCE

Xiang Huang
Guangdong University of Finance and Economics, College of entrepreneurship education,
Guangzhou 510320, China

ABSTRACT
The application of big data not only brings us great convenience but also creates social problems such as big data-enabled price discrimination against familiar customers (“killing the familiar”) and information leakage, which seriously affect customers’ willingness to participate and their satisfaction with the enterprise. How to collect customer information in a way that improves customers’ willingness to participate is an urgent topic to

Citation: (APA): Huang, X. (2021, April). Big Data Collection and Object Participa-
tion Willingness: An Analytical Framework from the Perspective of Value Balance. In
Journal of Physics: Conference Series (Vol. 1881, No. 3, p. 032016). IOP Publishing.
Copyright: © Content from this work may be used under the terms of the Creative
Commons Attribution 3.0 licence (http://creativecommons.org/licenses/by/3.0).
be discussed. This paper proposes an analytical framework in which the decision of big data objects to participate in the big data collection process is a process of comprehensive value balancing. The framework distinguishes four collection types (co-creation, inducement, dedication, and fishing), involves the two dimensions of activity value and data value, and rests on value judgments about six aspects: procedural necessity, activity value, information sensitivity, process complexity, data security, and data value. It may provide useful insight and reference for big data subjects in choosing data collection methods for big data applications and in improving the participation willingness of big data objects.

Keywords: Big Data, Willingness to Participate, Value Balance Perspective, Analytical Framework

THE ORIGIN OF RESEARCH


Big data refers to the collection of massive data and the analysis of the laws and phenomena contained in it; it is a statistical analysis technology for obtaining valuable information with which to predict the future development of things [1]. The application of big data depends on the acquisition and analysis of massive data. The development of the Internet, especially the mobile Internet, and the increasing popularity of e-commerce and online payment bring great convenience to people’s daily lives and, at the same time, continuously produce a wide variety of data. This provides very good conditions for the collection and application of big data. Enterprises analyze massive data, such as customer behavior obtained from various channels, to accurately identify customer preferences and needs, conduct precision marketing, and provide customers with appropriate products and services. The application of big data therefore plays an important role in better matching supply and demand and in reducing the waste of resources and time.
With the deepening and popularization of big data applications, it has also been found that they can bring many economic, social, and legal problems. In order to obtain large amounts of valuable data, some enterprises illegally collect consumers’ personal information, which creates security risks for consumers and damages their rights and interests. Some enterprises that acquire massive data manage it poorly, resulting in the leakage of consumers’ personal information, or even illegally sell data containing personal information, so that consumers’ private information spreads unchecked on the Internet, giving rise to spam messages, harassing phone calls, and targeted fraud that seriously infringe on the personal and property safety of consumers, affect their quality of life, and disrupt the normal social and economic order. Some enterprises use their data and platform advantages to “kill the familiar”, treating loyal customers differently and infringing on consumers’ rights and interests. Some enterprises illegally use consumers’ personal information, for example by sending commercial information without consent, promoting business, and selling goods or even harassing consumers through telemarketing. Some enterprises exploit their data advantage and market power to earn profits above the competitive price by bidding up prices and price fraud, or crowd out other operators through low-price dumping and price collusion, so as to achieve monopoly [2].
Therefore, all sectors of society, and especially the general public as the objects of big data technology, are very vigilant about the application of big data. By means of big data technology, the words and deeds of the general public are transformed into data, which are collected, stored, analyzed, and utilized by the subjects of big data technology. Although the public generate the data, they do not possess it, have no capacity to collect or process it, do not share in its benefits, and often do not even know how the data they generate is disposed of; they are in an absolutely passive position. They react strongly to the disclosure of private information and to being “killed as the familiar” by big data applications, because it means that their rights are seriously violated and their sense of security declines sharply. These problems have seriously affected their willingness to participate in big data and their satisfaction with the relevant enterprises, resulting in a “digital gap” between big data subjects and big data objects [3,4].
The problem of the “digital gap” poses great challenges to the application of big data. How to collect customer information so as to improve customers’ willingness to participate is a topic that needs in-depth discussion. Existing research mainly focuses on the characteristics of big data; the impact of big data technology and applications on existing industries and enterprises; how to make better use of big data technology to help existing enterprises and businesses; how to improve the technologies of big data collection, storage, analysis, and application; how to use big data to open up new businesses or start new enterprises; and the social and legal impact of big data applications. However, few studies focus on how to optimize data collection methods to bridge the “digital gap” between big data subjects and big data objects.

From the perspective of the value balance of the big data object, this paper attempts to construct a model that can provide useful insight and reference for the research and practice of big data applications.

THE PRESENTATION OF ANALYTICAL


FRAMEWORK
Every actor makes decisions according to his own interests and chooses what favors himself. Therefore, in order to increase the enthusiasm of big data objects to participate in data collection by big data subjects, we need to understand the various interests that big data objects weigh during the collection process; that is, we can examine the participation decisions and behavior of big data objects from the perspective of interest balance.

(1) Decision-making considerations


When big data objects decide whether to participate in big data collection, they mainly consider the following factors:

① Procedural necessity

If the big data collection process is simply a necessary step for the big data object to carry out some other economic or social activity, then in order to achieve its goal the object can only accept the data input requirements, willing or not. Conversely, if the data collection process is not required for the object’s other activities, or demands actions beyond what the procedure needs, the big data object may be unwilling to participate. For example, when shopping on an e-commerce platform, consumers must input the necessary identity information and details of the purchased item in order to complete the transaction, so they have to accept that these data are collected. However, if the platform requires personal information irrelevant to the transaction, such as height and weight, or requires additional information about consumption intentions such as income and purchase frequency, consumers are often reluctant to provide it.
② Activity value

If the big data collection process is a necessary step in other economic or social activities, and the results of those activities bring great utility and satisfaction to the big data object, this will motivate the object to complete the process. The stronger the willingness to complete these economic or social activities, the stronger the motivation to participate in the big data collection process.

③ Information sensitivity

If the big data object considers the required data to be private information, it is often reluctant to participate; otherwise, it is easier to persuade it to participate in data collection. For example, requiring big data objects to input information such as marital status, family income level, or sexual orientation may make them very vigilant or even disgusted.

④ Process complexity

If the big data object is required to input a large amount of information, or the input process is complex and cumbersome, or identifying and judging the required information demands particular knowledge and ability, so that participation costs a lot of time and energy, the object’s willingness and enthusiasm to participate may be greatly reduced.

⑤ Data security

Even if the information the big data object is asked to input is not sensitive private information, if the object believes that the big data subject may abuse the information during analysis and use, or worries that the subject may leak it during storage, ultimately damaging the object’s rights and interests, the object’s willingness and enthusiasm to participate may be greatly reduced.

⑥ Data value

If the big data object believes that its participation in the big data collection process will bring value not only to the big data subject but also to itself, creating a win-win or even multi-win situation, then the object will be more willing and enthusiastic to participate. The higher the potential value to the big data object, the stronger its motivation to participate in the big data collection process.
(2) Analysis framework dimensions

These six considerations can be divided into two categories according to the object of concern. The first category focuses on the economic or social activities carried out simultaneously with big data collection, weighing the income and cost brought by those activities. The benefits of the economic or social activities can be measured by activity value, while their costs can be measured by procedural necessity. When procedural necessity and activity value are both high, the big data object’s evaluation of the value of the economic or social activity is highest; when one is high and the other low, the evaluation is in the middle; when both are low, the evaluation is lowest.

The second category focuses on big data itself, weighing the benefits and costs brought by the collection and application of big data. The benefits can be measured by data value, while the costs can be measured by information sensitivity, process complexity, and data security. When data value and data security are high and information sensitivity and process complexity are low, the big data object’s evaluation of big data value is highest; when some of these factors are favorable and others unfavorable, the evaluation is in the middle; when data value and data security are low and information sensitivity and process complexity are high, the evaluation is lowest.

The direction and size of the influence of each object of concern and each dimension on the big data object’s willingness to participate can be summarized in the following table:
Table 1: Summary of the influence of the objects of concern and attention dimensions on big data participation willingness
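
As an illustration of how the two dimension-level evaluations might be scored, the sketch below aggregates the factors with equal weights. The Boolean high/low inputs and the equal weighting are assumptions made for illustration; they are not part of the original framework.

def activity_value_evaluation(procedural_necessity_high, activity_value_high):
    # First dimension: activity value as benefit, procedural necessity as cost.
    score = int(procedural_necessity_high) + int(activity_value_high)
    return {2: "high", 1: "middle", 0: "low"}[score]

def data_value_evaluation(data_value_high, data_security_high,
                          sensitivity_low, complexity_low):
    # Second dimension: data value and data security count as benefits;
    # information sensitivity and process complexity count as costs, so
    # they are favorable when low.
    score = sum(map(int, (data_value_high, data_security_high,
                          sensitivity_low, complexity_low)))
    if score == 4:
        return "high"
    if score == 0:
        return "low"
    return "middle"

print(activity_value_evaluation(True, True))             # high
print(data_value_evaluation(True, False, True, False))   # middle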

(3) Big data collection types

Taking the big data object’s value judgments on the two objects of concern as the classification basis, big data collection methods can be classified into four types (as shown in Figure 1).

Figure 1: Types of big data collection methods.


When the big data object has a high evaluation on the value of the
activity and the value of the data, both the subject and the object think that
the data collection process is a process of creating value for themselves,
so it is a “co creation” data collection process. When the big data object
has a high evaluation on the value of activities but a low evaluation on the
254 Advanced Techniques for Collecting Statistical Data

value of data, the object will think that the data collection process has little
to do with its own value, and participating in the data collection process is
mainly to help the big data subject to create value, so it is a “dedication”
data collection process. When the big data object has a low evaluation on
the value of the activity but a high evaluation on the value of the data, the
object’s participation in the big data collection process is mainly attracted
by the promised interests of the big data subject or the future value of the
big data application, so it is a “inducement” data collection process. When
the evaluation of big data object on activity value and data value is low, the
object does not have any motivation to participate in the big data collection
process, which is mainly induced by the big data subject through other
means, so it is a “fishing” data collection process.
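
The mapping from the two value judgments to the four collection types can be written down directly; a minimal sketch following the labels used above:

def collection_type(activity_eval, data_eval):
    # Map 'high'/'low' evaluations on the two dimensions onto the four
    # big data collection types described in the text.
    mapping = {
        ("high", "high"): "co-creation",
        ("high", "low"): "dedication",
        ("low", "high"): "inducement",
        ("low", "low"): "fishing",
    }
    return mapping[(activity_eval, data_eval)]

print(collection_type("low", "high"))  # inducement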

(4) Analysis framework

Based on the above analysis, we propose a “factor-type-willingness” analysis framework (see Figure 2).

Figure 2: Analysis framework from the perspective of value balance.


Different degrees of each value-influencing factor lead to different types of data collection, which in turn lead to different degrees of participation willingness among big data objects. According to the analysis, participation willingness is highest for the “co-creation” collection type, lowest for the “fishing” type, and intermediate for the “dedication” and “inducement” types. Big data subjects can switch among the four collection types by adjusting the purpose of the big data application and the means of data collection, so as to change the object’s willingness to participate.
CONCLUSION AND PROSPECT


The application of technology should be people-oriented [5]. The application of big data brings us great convenience and welfare, but at the same time it produces serious social problems. These problems create an increasingly serious “data gap” between the subjects and objects of big data, which has seriously affected the application and development of big data. It is therefore necessary to study systematically and deeply how to improve the satisfaction of the object and its willingness to participate in the process of big data information collection. This paper views big data collection and application from the perspective of the big data object. From the perspective of value balance, it holds that the decision of the big data object to participate in the big data collection process is a process of comprehensive value balancing, comprising four types (co-creation, inducement, dedication, and fishing), involving the two dimensions of activity value and data value, and resting on value judgments about procedural necessity, activity value, information sensitivity, process complexity, data security, and data value. This analytical framework considers not only the big data object’s evaluation of the value of big data itself but also its evaluation of the economic or social activities undertaken during data collection; it considers the object’s evaluation of the benefits as well as its perception of the corresponding costs; and it considers the object’s evaluation of present value as well as its estimation of future value, integrating these with an appropriate logic so that the framework is comprehensive and systematic. This paper may therefore provide useful insight and reference for big data subjects in choosing big data application goals and big data collection methods and in improving the participation willingness of big data objects, as well as for research on big data applications [6-10].

However, this analytical framework is only a qualitative framework proposed in theory and is far from the level of quantitative analysis. There is still a lack of quantitative data and empirical testing of the extent to which each influencing factor leads to a particular type of data collection, and of how much each affects the participation willingness of big data objects. In addition, the framework only puts forward the different value-influencing factors for big data objects; it does not further explore what causes these factors to vary. All of these questions need to be studied further in the future.
REFERENCES
1. Yu Desheng, Li Xing. Research on dynamic evolutionary game of big
data maturity between consumers and businesses [J]. Price theory and
practice, 2019 (11): 129-132
2. Yuan Bo. Research on antitrust in the field of big data [D]. Shanghai
Jiaotong University, 2019
3. Wei Junwei. Social problems and Countermeasures of big data
technology application [D]. Central China Normal University, 2019
4. Song Jixin. Research on ethical issues and governance of big data
technology [J]. Journal of Shenyang Institute of Technology (SOCIAL
SCIENCE EDITION), 2018,14 (04): 452-455
5. LAN Yihui. The research, development and application of science and
technology should be people-oriented [J]. Scientific research, 2002
(02): 152-157
6. Yong Shi, Chun Shi, Shi-Yuan Xu, A-Li Sun, Jun Wang. Exposure
assessment of rainstorm waterlogging on old-style residences in
Shanghai based on scenario simulation[J]. Natural Hazards. 2010 (2)
7. Dapeng Huang, Chuang Liu, Huajun Fang, Shunfeng Peng. Assessment
of waterlogging risk in Lixiahe region of Jiangsu Province based on
AVHRR and MODIS image[J]. Chinese Geographical Science. 2008
(2)
8. Qiu J. Urbanization contributed to Beijing storms. Nature
News&Comment . 2012
9. Sang Y K,Wang Z G,Liu C M.What factors are responsible for the
Beijing Storm. Natural Hazards. 2013
10. Wu Z H,Huang N E, Long S R,et al. On the Trend,Detrending,and
variability of nonlinear and non stationary time Series. Proceedings
of the National Academy of Sciences of the United States of America
. 2007
Chapter 13

RESEARCH ON COMPUTER SIMULATION BIG DATA INTELLIGENT COLLECTION AND ANALYSIS SYSTEM

Hongying Liu
Department of Computer Science and Engineering, Guangzhou College of Technology and
Business, Guangzhou 510850, China

ABSTRACT
As a characteristic of big data, the individual data in it are no longer isolated: the data and their underlying mechanisms have complex associations, which make all the data an indivisible whole. The dynamic generation and disappearance of data change the original relationships and affect the overall characteristics of the data. This feature of big data makes subject-

Citation: (APA): Liu, H. (2021, March). Research on Computer Simulation Big Data
Intelligent Collection and Analysis System. In Journal of Physics: Conference Series
(Vol. 1802, No. 3, p. 032052). IOP Publishing. (7 pages)
Copyright: © Content from this work may be used under the terms of the Creative
Commons Attribution 3.0 licence (http://creativecommons.org/licenses/by/3.0).
oriented analysis methods such as data mining exhibit limitations: presupposing the subjects and analysing them separately splits the interaction relationships between the subjects, leading to the loss of the mechanisms implicit in these relationships. Aiming at the problems of traditional network big data multi-resolution acquisition methods, such as high acquisition cost, long completion time, and low acquisition accuracy, a Java3D-based network big data multi-resolution acquisition method is proposed. An average interactive data extraction method is introduced to estimate the power spectral density of the multi-resolution network data acquisition, and the ADASYN algorithm is used to remove invalid multi-resolution data, realizing accurate multi-resolution acquisition of big data. Experimental results show that the proposed method has lower acquisition cost, shorter completion time, and higher acquisition accuracy; it has practical value and can be widely used in various fields.

Keywords: computer, simulation data, intelligent algorithm, collection and analysis system.

INTRODUCTION
The emergence of technologies such as cloud computing, mobile
communications and big data has promoted the rapid development of
various application software, and has been widely used in logistics and
warehousing, power communications and smart tourism. The database in
cloud computing is the most critical part of application software, and it is
also the starting point and end point of application software data processing
and processing. Therefore, database design is an important factor affecting
the use and popularization of application software. At present, in the process
of using the database, as the number of visits by users gradually increases, the
scale of big data has also become huge [1]. The traditional database design
model is prone to load phenomena, which is not conducive to improving the
efficiency of database extraction, so a new intelligent storage method needs
to be added.
At present, many experts and scholars are conducting research on multi-resolution collection of network big data. For example, a multi-resolution collection method based on linear regression applies linear regression analysis to construct a sensing-data model that preserves the characteristics of the sensing data, so that each node transmits only the parameter information of the regression model instead of the actual monitored sensing data. This guarantees the multi-resolution feature of the network data and completes the collection, but the method leaves many problems unsolved. A multi-resolution acquisition method based on network big data migration sets up an experimental environment and performs a full data migration to remove invalid data, improving the accuracy of multi-resolution network data acquisition, but it takes a lot of time in the acquisition process. A method for rapid information acquisition based on Java3D has also been proposed: by studying the operating rules and characteristics of the information, it analyses the composition of the information, encodes and obtains it, and adds depth to it [2] in order to standardize the information, complete its quantitative extraction, and achieve rapid acquisition; however, the process of this method is complicated and its efficiency is low.
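As an illustration of the regression-based idea reviewed above, the following minimal Python sketch (not from the chapter; the window size, noise model, and function names are illustrative assumptions) fits a linear model to one window of sensing data so that a node transmits only the two regression parameters, from which the sink reconstructs an approximation of the signal.

```python
# Hypothetical sketch of "transmit regression parameters instead of raw
# readings"; the window size, noise model, and names are illustrative.
import numpy as np

def fit_window(timestamps: np.ndarray, readings: np.ndarray):
    """Fit a linear model reading ~ a * t + b over one acquisition window."""
    a, b = np.polyfit(timestamps, readings, deg=1)
    return a, b  # only these two floats are transmitted, not the raw samples

def reconstruct(timestamps: np.ndarray, a: float, b: float) -> np.ndarray:
    """Sink-side reconstruction of the sensed signal from the parameters."""
    return a * timestamps + b

# Example: 100 raw samples are reduced to 2 parameters for transmission.
t = np.linspace(0.0, 10.0, 100)
raw = 0.5 * t + 3.0 + np.random.normal(0.0, 0.1, t.size)  # noisy sensor data
a, b = fit_window(t, raw)
approx = reconstruct(t, a, b)
print(f"slope={a:.3f}, intercept={b:.3f}, "
      f"max reconstruction error={np.abs(approx - raw).max():.3f}")
```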

PRINCIPLES OF BIG DATA INTELLIGENT FUSION


In the big data fusion principle, the data set is obtained first and divided into several subsets. The subsets are then clustered in parallel to obtain several cluster centres. If the total number of cluster centres is less than the threshold set by the scale of the data fusion problem, the cluster centres themselves are clustered again, the resulting categories are merged, and the data fusion is complete [3].
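This two-stage procedure can be sketched as follows. This is an assumption-laden reading of the principle using scikit-learn's KMeans; the subset count, per-subset cluster count, and threshold are illustrative, not values from the chapter.

```python
# Minimal sketch of the two-stage fusion principle described above.
# The subset count, per-subset cluster count, and threshold are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(1200, 4))            # the full data set
subsets = np.array_split(data, 6)            # divide into several subsets

# Stage 1: cluster each subset independently and keep the cluster centres.
centres = np.vstack([
    KMeans(n_clusters=8, n_init=10, random_state=0).fit(s).cluster_centers_
    for s in subsets
])

# Stage 2: if the number of centres is below the problem-scale threshold,
# cluster the centres themselves and merge the resulting categories.
THRESHOLD = 100
if len(centres) < THRESHOLD:
    fused = KMeans(n_clusters=5, n_init=10, random_state=0).fit(centres)
    print("fused cluster centres:\n", fused.cluster_centers_)
```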
To better distinguish sequences of abnormal big data from normal big data, the big data in cloud computing is divided into two segments and stored in two buffers of different sizes [4]. The short buffer can store approximately 23-63 big data sequences, while the long buffer can store approximately 243 big data sequences. To strengthen the storage capacity of the method, the sequences in the long and short buffers are selected in the appropriate state for the buffer sample set. When new big data arrives, it is stored in the short buffer. If the short buffer is full, the data stored first in the short buffer is moved to the long buffer. If the long buffer is full, the big data stored first in the long buffer is deleted. The processing process is shown in Figure 1.
Figure 1: Long and short cache structure.
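A minimal sketch of the long/short cache of Figure 1 follows, assuming FIFO eviction and taking the capacities quoted above (63 and 243) as the buffer sizes; the class and method names are hypothetical, not from the original paper.

```python
# Sketch of the long/short cache of Figure 1 with FIFO overflow handling.
# Capacities follow the text (short buffer up to 63 sequences, long up to
# 243); class and method names are illustrative.
from collections import deque

class LongShortCache:
    def __init__(self, short_cap: int = 63, long_cap: int = 243):
        self.short = deque()
        self.long = deque()
        self.short_cap = short_cap
        self.long_cap = long_cap

    def store(self, sequence) -> None:
        """New sequences enter the short buffer; overflow cascades to long."""
        if len(self.short) == self.short_cap:
            # Move the oldest short-buffer sequence into the long buffer.
            if len(self.long) == self.long_cap:
                self.long.popleft()          # delete the oldest long entry
            self.long.append(self.short.popleft())
        self.short.append(sequence)

cache = LongShortCache()
for i in range(500):
    cache.store(f"seq-{i}")
print(len(cache.short), len(cache.long))     # 63 243
```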

Related Theorems
Assume that for the same target, the initial state estimates and covariance matrices of sensors i and j are $(\hat{x}_0^i, P_0^i)$ and $(\hat{x}_0^j, P_0^j)$, respectively. The dynamic equation of target movement is:

$$x_{k+1} = \Phi_k x_k + w_k \qquad (1)$$

Among them, the process noise $w_k$ is white noise with zero mean value, the covariance matrix is $Q_k$, and the measurement equations of the two sensors are

$$z_k^i = H_k x_k + v_k^i, \qquad z_k^j = H_k x_k + v_k^j \qquad (2)$$

Among them, the measurement noise $v_k^i$ is white noise with zero mean, the covariance is $R_k^i$, and the cross-covariance matrix is $P_k^{ij}$.

The state of sensor i at time k is estimated as:

$$\hat{x}_k^i = \hat{x}_{k|k-1}^i + K_k^i \left( z_k^i - H_k \hat{x}_{k|k-1}^i \right) \qquad (3)$$

The estimation error is:

$$\tilde{x}_k^i = x_k - \hat{x}_k^i \qquad (4)$$

Cross-covariance:

$$P_k^{ij} = E\left[ \tilde{x}_k^i \left( \tilde{x}_k^j \right)^{T} \right] \qquad (5)$$

It can be seen from the above formulas that the cross-correlation is caused by the a priori estimation process noise $Q_{k-1}$ and the measurement noise $v_k$. The concept of consistency: suppose that the true state of the target is $a$, the sensor's estimate of the target state is $\hat{x}$, the estimated error covariance is $P$, and the true error covariance is $\tilde{P} = E\left[ (\hat{x} - a)(\hat{x} - a)^{T} \right]$. So-called consistency means that the estimated covariance satisfies $P \geq \tilde{P}$.

Fusion Algorithm for Correlation Estimation

Assuming that the true correlation coefficient between the two local estimates is $\gamma_0$, we can determine bounds $\gamma_{\min}$ and $\gamma_{\max}$ such that $\gamma_{\min} \leq \gamma_0 \leq \gamma_{\max}$, and we can obtain an estimate of the cross-covariance matrix that satisfies:

$$\hat{P}^{ij}(\gamma) = \gamma \, (P^i)^{1/2} (P^j)^{1/2}, \qquad \gamma_{\min} \leq \gamma \leq \gamma_{\max} \qquad (6)$$

Under such conditions, making full use of this correlation information can improve the fusion accuracy. Algorithm flow: after obtaining the local estimates $(\hat{x}^i, P^i)$ and $(\hat{x}^j, P^j)$, define the joint covariance matrix:

$$P = \begin{bmatrix} P^i & \hat{P}^{ij} \\ \left( \hat{P}^{ij} \right)^{T} & P^j \end{bmatrix} \qquad (7)$$

The fusion weights are then determined from this joint covariance matrix. In the literature, an estimation method for the bound of the correlation coefficient is given. The cross-covariance matrix can be estimated by formula (5), and the estimated $\hat{P}^{ij}$ must satisfy:

$$\gamma_{\min} \, (P^i)^{1/2} (P^j)^{1/2} \;\leq\; \hat{P}^{ij} \;\leq\; \gamma_{\max} \, (P^i)^{1/2} (P^j)^{1/2} \qquad (8)$$
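For reference, the sketch below implements covariance intersection, a standard fusion rule for local estimates whose cross-correlation is unknown or only bounded. It is not claimed to be the exact weighting scheme of this chapter, but it illustrates how correlation-robust fusion weights can be formed; the example matrices and the trace-minimizing weight choice are illustrative.

```python
# Covariance intersection (CI) fusion of two correlated local estimates.
# CI is a standard correlation-robust rule, shown here for illustration;
# it is not claimed to be the exact weighting scheme of this chapter.
import numpy as np

def covariance_intersection(x_i, P_i, x_j, P_j, omega: float):
    """Fuse estimates (x_i, P_i) and (x_j, P_j) with weight omega in [0, 1]."""
    Pi_inv, Pj_inv = np.linalg.inv(P_i), np.linalg.inv(P_j)
    P_f = np.linalg.inv(omega * Pi_inv + (1.0 - omega) * Pj_inv)
    x_f = P_f @ (omega * Pi_inv @ x_i + (1.0 - omega) * Pj_inv @ x_j)
    return x_f, P_f

x_i, P_i = np.array([1.0, 0.0]), np.diag([2.0, 1.0])
x_j, P_j = np.array([1.2, 0.1]), np.diag([1.0, 3.0])

# Pick omega minimizing the trace of the fused covariance (a common choice).
omegas = np.linspace(0.0, 1.0, 101)
best = min(omegas, key=lambda w: np.trace(
    np.linalg.inv(w * np.linalg.inv(P_i) + (1 - w) * np.linalg.inv(P_j))))
x_f, P_f = covariance_intersection(x_i, P_i, x_j, P_j, best)
print("omega =", round(best, 2), "fused state =", x_f.round(3))
```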

EXPERIMENTAL SIMULATION ANALYSIS

Experimental Environment Construction


In order to verify the overall effectiveness of the big data intelligent fusion algorithm based on the estimation mechanism, a simulation experiment is needed. Experimental data: the local area network within a 2 km radius of a certain city centre is taken as the data source. Experimental environment: 8 machines are used, corresponding to the data stored in each cell. The configuration of each machine is: CPU i5-2400, 3.1 GHz; 8 GB of memory; operating system Win10 Ultimate. The experimental indicators are the lifetime of network nodes after data fusion and the network energy consumption during the fusion process. Network node lifetime: after data fusion, the data are divided into different types of data nodes [5]. As time goes by, a large amount of data flows into each node and eventually washes away the current data fusion node; the time from when a node is established to when it is washed away is the network node lifetime.

Experimental Analysis
The proposed Java3D-based multi-resolution big data acquisition method is compared with the optical fibre network communication data resolution acquisition method and the linked-list-based network data resolution acquisition method in terms of the completion time of multi-resolution network data acquisition. The unit of completion time is seconds (s). The experimental results are shown in Figure 2. In Figure 2, A represents the proposed method; B represents the data resolution acquisition method based on optical fibre network communication; C represents the network data resolution acquisition method based on the linked list structure [6].

Figure 2: Comparison experiment of acquisition completion time by different methods.
The linked-list-based method takes the longest time to complete the multi-resolution network data acquisition process. The method based on optical fibre network communication completes the acquisition faster, but as the amount of data continuously increases, its completion time remains above 60 s. The proposed method has the shortest completion time: although the time increases with the amount of data, it stays below 60 s, giving a high collection efficiency. The proposed method is also compared with the two baseline methods in terms of acquisition cost (yuan) [7]. The experimental results are shown in Figure 3.
Figure 3: Comparison experiment of collection cost consumption by different methods.
In Figure 3, A represents the proposed method; B represents the data
resolution acquisition method based on optical fibre network communication;
C represents the network data resolution acquisition method based on the
linked list structure.
Obtaining information according to the characteristics of user information is the basis on which the rapid information acquisition method under big data analysis obtains user information. The parameter Ak represents the characteristics of user information: the larger the value of Ak, the easier the user information is to obtain and the more accurate the results. The parameter Ak is used to test the rapid information acquisition method. Table 1 shows the test results of the rapid information acquisition method under big data analysis and the feature-based rapid user information acquisition method.
Table 1: Test results of two different methods

CONCLUSION
Aiming at the various problems in the traditional network big data multi-resolution acquisition process, a Java3D-based network big data multi-resolution acquisition method is proposed. This method achieves a shorter completion time for multi-resolution acquisition of network big data, lower cost, and higher acquisition accuracy; it has practical application value and can be widely used in various fields.
REFERENCES
1. Wang, L., & Wang, G. Big data in cyber-physical systems, digital
manufacturing and industry 4.0. International Journal of Engineering
and Manufacturing (IJEM), 6(4) (2016) 1-8.
2. Zhu, L., Yu, F. R., Wang, Y., Ning, B., & Tang, T. Big data analytics
in intelligent transportation systems: A survey. IEEE Transactions on
Intelligent Transportation Systems, 20(1) (2018) 383-398.
3. Jung, D., Tran Tuan, V., Dai Tran, Q., Park, M., & Park, S. Conceptual
Framework of an Intelligent Decision Support System for Smart City
Disaster Management. Applied Sciences, 10(2) (2020) 666-675.
4. Zhong, R. Y., Xu, C., Chen, C., & Huang, G. Q. Big data analytics
for physical internet-based intelligent manufacturing shop floors.
International journal of production research, 55(9) (2017) 2610-2621.
5. Sumalee, A., & Ho, H. W. Smarter and more connected: Future
intelligent transportation system. IATSS Research, 42(2) (2018) 67-71.
6. Zheng, X., Chen, W., Wang, P., Shen, D., Chen, S., Wang, X., ... &
Yang, L. Big data for social transportation. IEEE Transactions on
Intelligent Transportation Systems, 17(3) (2015) 620- 630.
7. Chih-Lin, I., Sun, Q., Liu, Z., Zhang, S., & Han, S. The big-data-driven
intelligent wireless network: architecture, use cases, solutions, and
future trends. IEEE vehicular technology magazine, 12(4) (2017) 20-
29.
Chapter 14

DEVELOPMENT OF A MOBILE APPLICATION FOR SMART CLINICAL TRIAL SUBJECT DATA COLLECTION AND MANAGEMENT

Hyeongju Ryu1, Meihua Piao2, Heejin Kim3, Wooseok Yang3 and Kyung Hwan Kim4

1 Biomedical Research Institute, Seoul National University Hospital, Seoul 03080, Korea
2 Office of Hospital Information, Seoul National University Hospital, Seoul 03080, Korea
3 Clinical Trials Center, Seoul National University Hospital, Seoul 03080, Korea
4 Department of Thoracic and Cardiovascular Surgery, Seoul National University Hospital, Seoul National University College of Medicine, Seoul 03080, Korea

Citation: (APA): Ryu, H., Piao, M., Kim, H., Yang, W., & Kim, K. H. (2022). Devel-
opment of a Mobile Application for Smart Clinical Trial Subject Data Collection and
Management. Applied Sciences, 12(7), 3343.(12 pages)
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is
an open access article distributed under the terms and conditions of the Creative Com-
mons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
ABSTRACT
Wearable devices and digital health technologies have enabled the
exchange of urgent clinical trial information. We developed an application
to improve the functioning of decentralized clinical trials and performed
a heuristic evaluation to reflect the user demands of existing clinical trial
workers. The waterfall model of the software life cycle was used to guide
the development. Focus group interviews (N = 7) were conducted to reflect
the needs of clinical research professionals, and Wizard of Oz prototyping
was performed to ensure high usability and completeness. Unit tests and
heuristic evaluation (N = 11) were used. Thematic analysis was performed
using the focus group interview data. Based on this analysis, the main
menu was designed to include health management, laboratory test results,
medications, concomitant medications, adverse reactions, questionnaires,
meals, and My Alarm. Through role-playing, the functions and configuration
of the prototype were adjusted and enhanced, and a heuristic evaluation
was performed. None of the heuristic evaluation items indicated critical
usability errors, suggesting that the revised prototype application can be
practically applied to clinical trials. The application is expected to increase
the efficiency of clinical trial management, and the development process
introduced in this study will be helpful for researchers developing similar
applications in the future.

Keywords: clinical trial, heuristics, management, mobile application, technology, telemedicine

INTRODUCTION
Clinical trials are essential to study the efficacy and risks associated
with drugs; however, they are costly and time-consuming, occasionally
requiring years for completion. A recent study found that the average cost of
development of a novel drug from drug discovery to the marketing approval
of a product is between $2 billion and $3 billion and can take anywhere from
12 to 18 years, with clinical trials being the most costly and time-consuming
phases of the entire process [1]. The burden of such conventional processes
for whole-drug development has become even more challenging with the
coronavirus disease 2019 (COVID-19) pandemic, especially because of
the difficulty in recruiting and retaining clinical trial participants [2,3,4].
Because of the limitations imposed by COVID-19, such as self-isolation,
site closures, and travel restrictions, as of October 2021, more than 2100
clinical trials have been reported to be explicitly suspended [5,6,7].
To overcome these difficulties, attempts have been made to utilize digital
technologies, including Internet of Things (IoT) and patient-generated health
data (PGHD) from devices such as biowearables, smartphones, and home
medical devices, and to execute decentralized clinical trials (DCTs) [8,9,10].
The proliferation of these devices is expected to accelerate patient selection
and adherence to trials. In particular, attempts to increase the efficiency of
clinical research and medical fields using digital health technology have
expanded exponentially as of 2020 [9,10,11].
Clinical research professionals (CRPs) at trial sites are responsible for
collecting patient data and evaluating drugs. Although data collection is
very important in clinical trials, the collection of accurate data is difficult in
practice. In particular, for outpatient clinical trials, although frequent visits
to the institutions can facilitate data collection, they cause inconvenience
to the participants, make it difficult to recruit patients, and increase the
possibility of trial discontinuation by participants. The collected data may
also be questionable. In many cases, the data report is delayed until the next
visit. Therefore, the collected data are often reported to be of poor quality,
missing details such as the occurrence of the event itself, the time, and the
reaction of the participant [12].
At present, to solve these difficulties in data collection, various
technologies for traditional clinical trials and DCTs are being implemented
haphazardly. However, most of these technologies and devices have been
developed from the perspective of sponsors requesting clinical trials
[13,14]. South Korea, where this study was conducted, ranks sixth (3.68%) in the industry-sponsored trials of 2020 and third (4.5%) in single-site trials, and the number of trials is increasing. However, the proportion of investigator-initiated trials among all ongoing clinical trials is declining, and so is their impact on the clinical trial field [15].
Thus, there has been no application development reflecting the needs of
CRPs conducting clinical trials at actual institutions. Therefore, CRPs have
to deal with the burden of using and adapting different platforms provided
by various sponsors for clinical trials and educating participants [16].
The CRPs at trial sites who monitor the participants and evaluate
drugs are currently exposed to various clinical trial management systems,
ranging from traditional systems to systems based on the latest technologies.
However, it is difficult to find a system specialized for efficient trial and
patient management that can allow CRPs at the trial site to operate in the
desired way. In this scenario, the development and introduction of a new
trial management system based on the needs of the CRPs at the trial site
can save cost and time and yield more accurate clinical trial results quickly
by using PGHD. Thus, we intended to develop a real-time clinical trial
management application that reflects the needs of medical staff in clinical
trials to facilitate the broader application of digital health technology to
actual clinical trial sites.

MATERIALS AND METHODS


The development of the real-time clinical trial monitoring system was guided by the waterfall model of the software development life cycle (SDLC). The SDLC is a methodology for creating high-quality software through clearly defined processes. As the first SDLC approach used in software development, the waterfall model represents the software development process as a linear sequential flow. The model is simple and easy to understand and use, and its entry and exit criteria are well defined. It can therefore deliver quality software through a systematic process, especially when various experts participate in the development, as in this study [17,18]. The applied model had four stepwise phases, as shown in Figure 1, and each phase is described as follows.

Figure 1: Study flow diagram. FGI, focus group interview.


CRPs from the Seoul National University Hospital Clinical Trial Center
were recruited to verify their application needs. Focus group interviews
(FGIs) were conducted by dividing the seven recruited volunteers into two
groups: research doctors and clinical research coordinators. The ideal sample
size for a focus group interview varies according to the literature [19]. In
this study, the sample size was selected to ensure less than 10 people per
group and more than two groups per concept, in accordance with previously
described criteria [20]. The purpose of the application to be developed was
explained to the interviewees, and the needs of the planned application
were collected based on clinical trial situations and application functions.
All seven CRPs (three doctors and four research coordinators) participated
in the FGIs. Through structured open-ended questions, requirements such
as essential needs and functions to be included in the application were
investigated using the FGIs. A qualitative thematic analysis was used for
data analysis. The data were analyzed by grouping the collected data into
similar concepts and then categorizing them [21].
The design of an application system needs to be developed to fully adopt
the practical needs of clinical trials; therefore, a stepwise approach to polish
the system was applied. The function and structure of the application were
designed considering the needs of CRPs collected through the FGIs. The
main menu of the application was designed, followed by the construction
of the information architecture and wireframes. The user interface was
modified to reflect the opinions of the CRPs from the FGIs.
Wizard of Oz (WOZ) prototyping was performed with CRPs using the user interface of the application to improve the user experience. WOZ is a way to test usability before actual development through role-playing with mockup software. Accordingly, role-playing to increase usability was performed by clinical trial experts and the team of software engineers responsible for developing the application software. Based on the results of WOZ prototyping, the usefulness and efficiency of the application user interface were confirmed before actual production. Unit tests were performed to check for programming errors and usability problems in the initial version of the prototype application, and the revised version of the prototype application was made based on the unit test results [22,23].
To test all of the functions in the revised version of the prototype
application, task scenarios were developed as a heuristic evaluation for a
total of 48 tasks in two detailed scenarios. Step-by-step, each scenario was
designed to accomplish tasks, including login, the input of adverse reactions
and self-reports, and wearable device connection. Nelson's heuristic
principle is the most commonly employed principle for heuristic evaluation;
however, in this study, Joyce’s SMART heuristics (short for smartphone)
[24], which was designed with consideration of the mobile environment,
was used instead. The severity of the problem was measured using a three-
level scale, which can clearly and scientifically quantify the level of the
problem and collect additional viewpoints [25,26]. Heuristic evaluation for
the revised version of the prototype application was conducted not only with
the participants from WOZ but also newly recruited CRPs to control for
possible bias from the engagement for the application development.
The study was conducted in accordance with the Declaration of
Helsinki and approved by the Institutional Review Board of Seoul National
University Hospital (protocol code 2011-114-1173; date: 23 September
2021). From 6 September to 17 September 2021, subjects were recruited
through convenience sampling, and heuristic evaluation was carried out.
Written informed consent was obtained from all subjects involved in the
study.

RESULTS

Phase 1. Requirements Analysis


Thematic analysis was performed using the FGI data obtained from CRPs,
and the collected needs are shown in Table 1. The opinions collected were
classified into seven categories. Among these categories, four categories
were related to the function of the application, such as the need to record
adverse reactions or concomitant drugs, to give remote feedback to the
patients, and patient health record sharing; the other three categories were
related to the design and composition of the application, including needs for
screen menu, format standardization, and design requirements.
Table 1: Collected needs for the real-time clinical trial monitoring system

No. | Needs | Details
1 | Checking for side effects and adverse reactions | A function for recording side effects and adverse reactions is required. (This information is currently recorded in the comment section because there is no separate section for recording it.)
2 | Concomitant drug check | A function for taking photos and uploading concomitant drug information is required to check drug relationships.
3 | Remote feedback function | In addition to the traditional method of calling or texting the Clinical Research Coordinator, a function to give feedback to the patient based on the data recorded in the application is required (e.g., by analyzing a chat message).
4 | Data sharing with the hospital system | A function to share data from the hospital system, such as laboratory test results, doctors' feedback, and a brief history of the patient, is required.
5 | Application menu | Separate menus to check medication, diet, concomitant medications, and adverse reactions are required.
6 | Standard form use | The form of the application should be based on the standard form currently used in the clinical trial center.
7 | Design requirements | The design should be based on the target audience of users under 60 years of age.

Phase 2. System Design


Based on the analysis of FGI results, the main menu of the application was
designed to comprise health management, laboratory test results, medications,
meals, concomitant medications, adverse reactions, questionnaires, and
My Alarm. To optimize usability, all menu items were displayed on the
main screen, which was the first screen after login [27]. The design of the
information architecture of the application is schematically illustrated in
Figure 2a, as applied to the menu in needs and the procedures of clinical
trials.
Figure 2: (a) Information architecture; (b) Wireframe. ECG, electrocardiogram; BMI, body mass index.
The rows of the information architecture of the main screen are ordered
according to the general procedures of clinical trials, such as recording
symptoms, checking test results, entering medicine records, etc. Columns
of the information architecture were organized by listing the contents to be
recorded in each menu.
A wireframe with more detailed descriptions of buttons and functions
was created, as shown in Figure 2b. The mobile application screen was
designed to be intuitively understandable, as shown in the lower-left panel.
The global navigation bar is located at the top of the screen, and management
items such as weight, blood pressure, daily steps, and blood sugar level are
displayed in a single row for easy recognition. Each item can also have a
separate graph display with the most recent data. The value of each item
can be manually entered by the user, and data is linked with the Samsung
Health app, so when using a wearable device or another measuring device
that works with the Samsung Health app, it can also be entered through the
device. The user interface suggested for the trial version application was
developed by adopting the CRPs’ feedback to remove unnecessary text and
medical terms to make the screen less complicated and to include pictograms
for easy understanding for non-professional trial participants.

Phase 3. Implementation: System Development


Through role-playing interactions using the WOZ prototype, the functions
and configuration of the prototype application were adjusted and enhanced.
Some of the initial functions were changed (e.g., the menus were rearranged
to account for the clinical trial process and for the integration of duplicate
menus). Data collected for concomitant medications and adverse responses
met different data specifications; therefore, the two menus were designed to
be separated. On the other hand, vital sign data from external devices, such
as smartwatches and wearable devices, were combined with the symptom
records. Since the clinical trial patient participants were not clinical
professionals and were not familiar with the terms used in general clinical
trials, the titles of the health report and reminder menus were changed
to include more easily understandable words. The final menu lists and
functions are shown in Table 2. The network comprises an external network where users input data, an internal network used by medical staff, and a demilitarized zone (DMZ) server that protects the internal network, as shown in Figure 3. The application server handles data processing in the DMZ, and the batch server in the DMZ only performs arrhythmia detection. The application programming interface (API) server allows communication between the internal network and the DMZ; the web server manages data processing inside the internal network, and the database stores all the data that are generated. The cloud server outside the network relays the user's electrocardiogram (ECG) data captured from the wearable devices in real time. The data are transferred in the following order: data generated by the user → application server → API server → database storage → web server → medical check.
Figure 3: Network server configuration diagram. API (Application Programming Interface) is a software intermediary that allows two applications
to communicate with each other. DMZ (Demilitarized zone) is a small, isolated
network located between an external and internal network for data security.
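The relay path just described (user → application server → API server → database → web server → medical check) can be sketched as a simple chain of handlers. Everything below, including the function names and the payload shape, is a hypothetical mock-up of the flow, not the project's actual code.

```python
# Illustrative mock-up of the data relay path described in the text:
# user -> application server (DMZ) -> API server -> database -> web server.
# All names and the payload shape are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Database:
    rows: list = field(default_factory=list)
    def store(self, record: dict) -> None:
        self.rows.append(record)

def application_server(record: dict) -> dict:
    """DMZ entry point: validate and normalize user-generated data."""
    assert "subject_id" in record and "value" in record
    return {**record, "validated": True}

def api_server(record: dict, db: Database) -> None:
    """Bridge between the DMZ and the internal network; persists the data."""
    db.store(record)

def web_server(db: Database) -> list:
    """Internal-network view used by medical staff for the final check."""
    return [r for r in db.rows if r.get("validated")]

db = Database()
api_server(application_server({"subject_id": "S01", "value": 120}), db)
print(web_server(db))   # records ready for the medical check
```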

Table 2: Confirmed main menu structure and functions

No. | Menu | Function Description
1 | Self-report | Patient-generated health data (PGHD), including the user's weight, fasting blood sugar level, blood pressure, heart rate, body temperature, and oxygen saturation, were entered and checked.
2 | Medication + nutrition | Medication: a medication log, including the name and time of each medication or treatment, was maintained; the relevant data were added to the adverse reaction menu when participants showed adverse reactions. Diet: a meal diary was maintained with photos of each meal, its contents, and the time of consumption.
3 | Concomitant drug | For participants consuming over-the-counter drugs or health supplements other than the test drug, information about the time and amount of the drugs was entered.
4 | Adverse reactions | When an adverse reaction occurred, the type, location, period, action method, picture of the symptoms, etc., were recorded, and the management of persistent adverse reactions was documented.
5 | Symptom record | Symptom record: cough, stuffy nose, sore throat, fatigue, headache, fever, loss of smell, loss of taste, etc. (corresponding to symptoms of COVID-19) were reported. Health record: blood pressure and ECG data are input through an external device (wearable device). Blood pressure: data from all devices linked to Samsung Health can be entered. ECG: real-time input through the VP-100 (a device certified by the Korea Food and Drug Administration).
6 | Daily to-do | The user's medication, nutritional, and health measurement record items that must be entered each day are presented; the status changes from to-do to done when the user completes each task.

ECG data are transmitted through three paths. Raw ECG data are
transmitted in the order mentioned above, whereas the arrhythmia detection
algorithm runs in the batch server and transmits the result to the API server.
Finally, for ECG streaming, the data are sent to the external cloud, and the
CRP accesses the cloud from the internal network to check the streaming
data.

Phase 4. Testing and Evaluation


A heuristic evaluation to identify the major usability issues and scope for
improvement was conducted with the revised version of the prototype
application. Five participants from the WOZ prototyping stage and six
newly recruited CRPs participated in the heuristic evaluation, and the results
of the evaluation are summarized in Table 3.

Table 3: Results of the heuristic evaluation of the clinical trial monitoring application

Heuristic Evaluation Contents | N (%) | Mean Score | Heuristic
Program errors that make it difficult to proceed with the scenario | 5 (45%) | 2.20 | SMART 3
Errors related to the "symptom input" page configuration and screen information | 5 (45%) | 1.80 | SMART 8
Inconvenience caused by the graphic method for inputting time | 5 (45%) | 1.40 | SMART 11
Errors related to the "Health Report" page configuration and screen information | 5 (45%) | 1.20 | SMART 7
Inconvenience caused by a hidden or difficult-to-operate input button | 3 (27%) | 3.00 | SMART 6
Errors caused by unclear or missing pop-ups | 3 (27%) | 2.00 | SMART 3
Errors caused by missing notifications for the ECG-related connection | 3 (27%) | 1.33 | SMART 1
Inconsistent screen discomfort | 3 (27%) | 1.33 | SMART 2
Errors related to the "Combination Drugs" page configuration and screen information | 3 (27%) | 1.33 | SMART 8
Confusing screen configuration that allowed users to input the heart rate in the blood pressure input window | 3 (27%) | 1.00 | SMART 5
Inconvenience for elderly individuals or people with reduced vision due to the small font size | 2 (18%) | 2.00 | SMART 10
Discomfort caused by awkward or difficult-to-understand expressions | 2 (18%) | 1.50 | SMART 2
Inconvenience caused by the lack of visibility of the configuration of the menu and tab at a glance | 2 (18%) | 1.50 | SMART 6
Inconvenience caused by the keyboard window covering the screen when typing | 2 (18%) | 1.50 | SMART 10

Severity was rated on a scale ranging from 1 to 3, with 1 being a minor error and 3 being a critical error. There was no significant difference in the heuristic evaluation results between the existing participants and the newly recruited CRPs.
None of the heuristic evaluation items indicated critical usability errors
since more than half of the evaluators perfectly used the application for
each test item, which confirmed that the revised version of the prototype
application could be practically applied to clinical trials. When duplicate errors were allowed for each item, the SMART 3 and SMART 8 errors were each confirmed eight times, followed by the SMART 6, SMART 7, and SMART 11 errors, confirmed five times each. In particular, among the SMART 3 errors, "errors that make it difficult to proceed with the scenario" were identified by five evaluators and confirmed as the most frequent error. Three participants (27%) pointed out that "the input button is hidden or difficult to select," which was evaluated as a critical error with a mean score of three points.

DISCUSSION
In this study, to develop a real-time monitoring application, a stepwise
approach was applied to improve usability, starting with an analysis of
needs. The initial FGIs for needs analysis identified requests for “side
effects and adverse reaction identification services”, “concomitant drug
identification capabilities”, and “remote feedback functions”. The primary
goal of a clinical trial is to assess the benefit-to-risk ratio of the drug or
treatment under consideration [28]. The therapeutic benefits of an agent,
which represent its impact, can be determined within a predictable range.
Simultaneously, the risks and adverse effects should be investigated in
consideration of the causal link between adverse events and clinical trials.
In this regard, the collection of reliable data for adverse events is crucial
but quite difficult [29,30]. Thus, the function request for “side effects and
adverse reaction identification services” appears to be a way to solve the
difficulties associated with clinical trials.
In many sponsor-initiated clinical trials, the information and communication technology devices and software used for the trial are usually developed by the sponsors and applied to the clinical trial sites. This system forces the trial site's clinical trial professionals to learn to operate a new device or system and to educate the participants whenever they receive a request [31].
of an application can be learned relatively easily if the application design
matches the system of the user’s institution, emphasizing the importance of
a similar structure [32]. Despite the fact that personal health records in the
hospital information system (HIS) are now required in clinical trial research,
a number of obstacles, such as the lack of interoperability between clinical
trial research systems and HIS, make their usage challenging. This may
indicate the need for HIS compatibility [33,34,35]. Moreover, though the
target group of each clinical trial varies depending on the stage and type of
the trial, healthy volunteers participating in the phase 1 trial are relatively
easy to generalize and constitute the largest number of participants across
clinical trials. This indicates the need for application development based
on its use in healthy adults. Especially in specialized domains, such as
medical systems, the participation of actual users in program development is
crucial. In this regard, visualizing the contents of the developed application,
information architecture, and wireframes to encourage the involvement of
CRPs who were not specialists in software development facilitated easy
participation.
From the developer's point of view, the use of the WOZ methodology for communication with the CRPs should be actively considered. Although it was not possible to conduct research on real clinical trial participants in this study, experiencing the position of participants through the WOZ process is a unique experience for CRPs. In fact, overall end-user usability can be greatly increased through the WOZ process, because actual end-users generally accept the program that CRPs provide and often do not give active, negative feedback to the CRPs in the clinical environment.
For interoperability and security, the creation of applications with Fast
Healthcare Interoperability Resources (FHIR) as a standard was considered
[36,37]. However, because of security reasons as well as practical difficulties
in recruiting a technician with FHIR-based production experience,
interoperability with the hospital network was not implemented.
Various solutions are also required for processing ECG data. For real-time
ECG streaming, the limited server resources and bandwidth with the DMZ
server caused delays in streaming. This problem was solved by transmission
using a cloud server. However, this approach introduced a security issue
because the cloud server was not located in the internal network. To solve
the security issue, the Amazon Web Services Cloud, which is known for
its relatively stable security among cloud services, was used. Only the
function of viewing the ECG graph transmitted by the cloud was performed
in the internal network, effectively blocking other data connections between
networks. The arrhythmia detection algorithm also encountered resource
issues. This problem was solved by physically separating one DMZ server
into two logical servers, configuring the batch server, and processing the
algorithm in the batch server. Thus, problems that occurred during the actual
development process were solved within the limited available resources.
Heuristic evaluation was used to confirm the direction of improvement.
In the heuristic evaluation, various categories of tests were planned, although
the actual results indicated the importance of improving the overall usability
of the application on the basis of individual errors rather than classifying
the errors by category. For example, if an error occurred because the button
on the screen was hidden, some participants reported it as a design error,
whereas others considered it a configuration error. In this regard, developers
should be careful about developing applications on a subjective basis without
fully reflecting the needs of users. The most common complaint identified
in the heuristic evaluation was that the input window and text were too
small. Thus, the evaluation suggested that animations or design elements
used to improve aesthetics may not benefit end-users who frequently use the
application. However, the positive responses to the screen composition and
other aesthetic aspects indicated the importance of identifying a compromise
for these aspects. It was considered that the CRP group newly participating
in application development would evaluate the application from a different
angle than the CRP group that continued to participate in application
development, but there was no significant difference in the heuristic results between the two groups [38]. This is because the CRP group relatively familiar with the application is still not an expert group in application development; the results likely stem from the fact that both groups have in common that they are CRPs.
The developed application is designed to input data related to the site’s
frequently performed clinical trials according to the needs of CRPs, and
data related to specific COVID-19 symptoms were added to accommodate
the needs created by the pandemic. To improve the suitability of the
application for various clinical trial situations, the addition of input fields
and linkage with hospital data, such as laboratory and imaging test results,
will be necessary. In addition, although the unit test and heuristic evaluation
confirmed that the data entered by the evaluators according to the manual
were stored without errors in the server, the simultaneous transmission of
heavy and bandwidth-consuming data, such as ECG streaming data, has not
yet been tested. Further studies are needed to confirm the fidelity of data
transmission.
With the increasing importance of remote clinical trials, a clinical trial
application that completely replaces the need for the direct participation of
CRPs in clinical research is expected to control the cost escalations and
unnecessary period extensions caused by traditional clinical trial conduct.
Despite its various implications, this study has several limitations. Technically, a direct link between FHIR and the hospital HIS was not implemented, and only an Android version was produced, so the application cannot be used in the iPhone environment. In addition, first, the developed application has not been verified in actual clinical practice. Second, no direct usability evaluation was performed on clinical trial subjects during the development process. Third and last, the application has not been tested for direct differences against other sponsor-led applications.

CONCLUSIONS
In this study, we developed an application to address the difficulties
associated with subject management in traditional clinical trials. After the
development of the application, a heuristic evaluation was performed to
reflect the user demands of existing clinical trial workers. These evaluations
made it possible to confirm various consistencies in the application functions
and user interface. Unlike other studies, this study explains the researcher-
led application development process in great detail and provides insights
that were gained from each development process. In addition, the fabrication
process described in this study will serve as a basis for the development of
similar applications. In the future, additional real user testing and data safety
studies will be needed, and the efficiency of clinical trial management is
expected to improve if the application streamlined through such evaluations
is applied to actual clinical trials.

AUTHOR CONTRIBUTIONS
Conceptualization, H.R.; resources, H.R.; methodology, H.R. and M.P.;
validation, H.K.; data curation, M.P. and H.R.; writing—original draft
preparation, H.R.; writing—review and editing, H.R. and M.P.; visualization,
H.R. and W.Y.; supervision, K.H.K.; project administration, K.H.K.; funding
acquisition, K.H.K. All authors have read and agreed to the published version
of the manuscript.
REFERENCES
1. Moore, T.J.; Zhang, H.; Anderson, G.; Alexander, G.C. Estimated
Costs of Pivotal Trials for Novel Therapeutic Agents Approved by the
US Food and Drug Administration, 2015–2016. JAMA Intern. Med.
2018, 178, 1451–1457.
2. Fischer, S.M.; Kline, D.M.; Min, S.J.; Okuyama, S.; Fink, R.M. Apoyo
Con Carino: Strategies to Promote Recruiting, Enrolling, and Retaining
Latinos in a Cancer Clinical Trial. J. Natl. Compr. Canc. Netw. 2017,
15, 1392–1399.
3. Fogel, D.B. Factors Associated with Clinical Trials That Fail and
Opportunities for Improving the Likelihood of Success: A Review.
Contemp. Clin. Trials Commun. 2018, 11, 156–164.
4. Soares, R.R.; Parikh, D.; Shields, C.N.; Peck, T.; Gopal, A.; Sharpe,
J.; Yonekawa, Y. Geographic Access Disparities to Clinical Trials in
Diabetic Eye Disease in the United States. Ophthalmol. Retina 2021,
5, 879–887.
5. Carlisle, B.G. Clinical Trials Stopped by COVID-19 [Internet]. The
Grey Literature. 2020. Available online: https://covid19.bgcarlisle.
com/ (accessed on 1 January 2022).
6. Asaad, M.; Habibullah, N.K.; Butler, C.E. The Impact of COVID-19
on Clinical Trials. Ann. Surg. 2020, 272, e222–e223.
7. Hamzelou, J. World in Lockdown. New Sci. 2020, 245, 7.
8. Apostolaros, M.; Babaian, D.; Corneli, A.; Forrest, A.; Hamre, G.;
Hewett, J.; Podolsky, L.; Popat, V.; Randall, P. Legal, Regulatory, and
Practical Issues to Consider When Adopting Decentralized Clinical
Trials: Recommendations from the Clinical Trials Transformation
Initiative. Ther. Innov. Regul. Sci. 2020, 54, 779–787.
9. Hashiguchi, T.C.O. Bringing Health Care to the Patient: An Overview
of the Use of Telemedicine in OECD Countries; OECD Health Working
Papers, No. 116; OECD Publishing: Paris, France, 2020.
10. Weinstein, R.S.; Lopez, A.M.; Joseph, B.A.; Erps, K.A.; Holcomb, M.;
Barker, G.P.; Krupinski, E.A. Telemedicine, Telehealth, and Mobile
Health Applications That Work: Opportunities and Barriers. Am. J.
Med. 2014, 127, 183–187.
11. Won, J.H.; Lee, H. Can the COVID-19 Pandemic Disrupt the Current
Drug Development Practices? Int. J. Mol. Sci. 2021, 22, 5457.
12. Little, R.J.; D’Agostino, R.; Cohen, M.L.; Dickersin, K.; Emerson,
S.S.; Farrar, J.T.; Frangakis, C.; Hogan, J.W.; Molenberghs, G.;
Murphy, S.A.; et al. The Prevention and Treatment of Missing Data in
Clinical Trials. N. Engl. J. Med. 2012, 367, 1355–1360.
13. Inan, O.T.; Tenaerts, P.; Prindiville, S.A.; Reynolds, H.R.; Dizon,
D.S.; Cooper-Arnold, K.; Turakhia, M.; Pletcher, M.J.; Preston, K.L.;
Krumholz, H.M.; et al. Digitizing Clinical Trials. NPJ Digit. Med.
2020, 3, 101.
14. Kario, K.; Tomitani, N.; Kanegae, H.; Yasui, N.; Nishizawa, M.;
Fujiwara, T.; Shigezumi, T.; Nagai, R.; Harada, H. Development of
a New ICT-Based Multisensor Blood Pressure Monitoring System
for Use in Hemodynamic Biomarker-Initiated Anticipation Medicine
for Cardiovascular Disease: The National IMPACT Program Project.
Prog. Cardiovasc. Dis. 2017, 60, 435–449.
15. Korea National Enterprise for Clinical Trials [Internet]. The Grey
Literature. 18 May 2021. Available online: https://www.konect.or.kr/
kr/contents/datainfo_data_01_tab03/view.do (accessed on 15 March
2022).
16. Levy, H. Reducing the Data Burden for Clinical Investigators. Appl. Clin. Trials. 2017, 26, 17.
17. Roger, S.P.; Bruce, R.M. Software Engineering: A Practitioner’s
Approach; McGraw-Hill Education: New York, NY, USA, 2015.
18. Pressman, R.S. Software Engineering: A Practitioner’s Approach;
Palgrave MacMillan: London, UK, 2005.
19. Carlsen, B.; Glenton, C. What about N? A Methodological Study
of Sample-Size Reporting in Focus Group Studies. BMC Med. Res.
Methodol. 2011, 11, 26.
20. Krueger, R.A. Focus Groups: A Practical Guide for Applied Research;
Sage Publications: Thousand Oaks, CA, USA, 2014.
21. Hsieh, H.F.; Shannon, S.E. Three Approaches to Qualitative Content
Analysis. Qual. Health Res. 2005, 15, 1277–1288.
22. Green, P.; Wei-Haas, L. The Rapid Development of User Interfaces:
Experience with the Wizard of Oz Method. Proc. Hum. Factors Soc.
Annu. Meet. 1985, 29, 470–474.
23. Pettersson, J.S.; Wik, M. The Longevity of General Purpose Wizard-
of-Oz Tools. In Proceedings of the Annual Meeting of the Australian
Special Interest Group for Computer Human Interaction, Parkville,
Australia, 7–10 December 2015.
24. Joyce, G.; Lilley, M.; Barker, T.; Jefferies, A. Mobile Application
Usability: Heuristic Evaluation and Evaluation of Heuristics. Advances
in Human Factors, Software, and Systems Engineering; Springer:
Cham, Switzerland, 2016; pp. 77–86.
25. Sauro, J.; Lewis, J. Should All Scale Points Be Labelled? 2020. Available online: https://measuringu.com/scale-points-labeled/ (accessed on 1 January 2022).
26. Sauro, J. Rating the Severity of Usability Problems. 2013. Available
online: http://www.measuringu.com/blog/rating-severity.php
(accessed on 1 January 2022).
27. Norman, D. The Design of Everyday Things: Revised and Expanded
Edition; Basic Books: New York, NY, USA, 2013.
28. Umscheid, C.A.; Margolis, D.J.; Grossman, C.E. Key Concepts of
Clinical Trials: A Narrative Review. Postgrad. Med. 2011, 123, 194–
204.
29. Allen, E.N.; Chandler, C.I.R.; Mandimika, N.; Leisegang, C.; Barnes,
K. Eliciting Adverse Effects Data from Participants in Clinical Trials.
Cochrane Database Syst. Rev. 2018, 1, MR000039.
30. Berlin, J.A.; Glasser, S.C.; Ellenberg, S.S. Adverse Event Detection in
Drug Development: Recommendations and Obligations Beyond phase
3. Am. J. Public Health 2008, 98, 1366–1371.
31. Lawry, S.; Popovic, V.; Blackler, A.; Thompson, H. Age, Familiarity,
and Intuitive Use: An Empirical Investigation. Appl. Ergon. 2019, 74,
74–84.
32. Khairat, S.; Burke, G.; Archambault, H.; Schwartz, T.; Larson, J.;
Ratwani, R.M. Perceived Burden of EHRs on Physicians at Different
Stages of Their Career. Appl. Clin. Inform. 2018, 9, 336–347.
33. Mc Cord, K.A.; Ewald, H.; Ladanie, A.; Briel, M.; Speich, B.; Bucher,
H.C.; Hemkens, L.G.; RCD for RCTs Initiative and the Making
Randomized Trials More Affordable Group. Current Use and Costs of
Electronic Health Records for Clinical Trial Research: A Descriptive
Study. CMAJ Open. 2019, 7, E23–E32.
34. National Academies of Sciences, Engineering, and Medicine. Reflections on Sharing Clinical Trial Data: Challenges and a Way Forward: Proceedings of a Workshop; National Academies Press: Washington, DC, USA, 2020.
35. Parab, A.A.; Mehta, P.; Vattikola, A.; Denney, C.K.; Cherry, M.;
Maniar, R.M.; Kjaer, J. Accelerating the Adoption of eSource in
Clinical Research: A Transcelerate Point of View. Ther. Innov. Regul.
Sci. 2020, 54, 1141–1151.
36. Rosa, M.; Faria, C.; Barbosa, A.M.; Caravau, H.; Rosa, A.F.; Rocha, N.P.
A Fast Healthcare Interoperability Resources (FHIR) Implementation
Integrating Complex Security Mechanisms. Procedia Comput. Sci.
2019, 164, 524–531.
37. Saripalle, R.; Runyan, C.; Russell, M. Using HL7 FHIR to Achieve
Interoperability in Patient Health Record. J. Biomed. Inform. 2019, 94,
103188.
38. Othman, M.K.; Ong, L.W.; Aman, S. Expert vs novice collaborative
heuristic evaluation (CHE) of a smartphone app for cultural heritage
sites. Multimed. Tools. Appl. 2022, 81, 6923–6942.
Chapter 15

THE CORONASURVEYS SYSTEM FOR COVID-19 INCIDENCE DATA COLLECTION AND PROCESSING

Carlos Baquero1, Paolo Casari2, Antonio Fernandez Anta3, Amanda García-García3, Davide Frey4, Augusto Garcia-Agundez5, Chryssis Georgiou6, Benjamin Girault7, Antonio Ortega7, Mathieu Goessens8, Harold A. Hernández-Roig9, Nicolas Nicolaou10, Efstathios Stavrakis10, Oluwasegun Ojo11, Julian C. Roberts12 and Ignacio Sanchez13

1 U. Minho and INESC TEC, Braga, Portugal
2 Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
3 IMDEA Networks Institute, Madrid, Spain

Citation: (APA): Baquero, C., Casari, P., Fernandez Anta, A., García-García, A., Frey,
D., Garcia-Agundez, A., ... & Sanchez, I. (2021). The CoronaSurveys system for
COVID-19 incidence data collection and processing. Frontiers in Computer Science, 3,
641237. (10 pages)
Copyright: © 2021 Baquero, Casari, Fernandez Anta, García-García, Frey, Garcia-
Agundez, Georgiou, Girault, Ortega, Goessens, Hernández-Roig, Nicolaou, Stavrakis,
Ojo, Roberts and Sanchez. This is an open-access article distributed under the terms
of the Creative Commons Attribution License (CC BY): http://creativecommons.org/
licenses/by/4.0/
4 Inria Rennes, Rennes, France
5 Multimedia Communications Lab, TU Darmstadt, Darmstadt, Germany
6 Department of Computer Science, University of Cyprus, Nicosia, Cyprus
7 Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, United States
8 Consulting, Rennes, France
9 Department of Statistics, UC3M & UC3M-Santander Big Data Institute, Getafe, Spain
10 Algolysis Ltd, Nicosia, Cyprus
11 IMDEA Networks Institute and UC3M, Madrid, Spain
12 Skyhaven Media, Liverpool, United Kingdom
13 InqBarna, Barcelona, Spain

CoronaSurveys is an ongoing interdisciplinary project developing a system
to infer the incidence of COVID-19 around the world using anonymous
open surveys. The surveys have been translated into 60 languages and
are continuously collecting participant responses from any country in the
world. The responses collected are pre-processed, organized, and stored in
a version-controlled repository, which is publicly available to the scientific
community. In addition, the CoronaSurveys team has devised several
estimates computed on the basis of survey responses and other data, and
makes them available on the project’s website in the form of tables, as well
as interactive plots and maps. In this paper, we describe the computational
system developed for the CoronaSurveys project. The system includes
multiple components and processes, including the web survey, the mobile
apps, the cleaning and aggregation process of the survey responses, the
process of storage and publication of the data, the processing of the data
and the computation of estimates, and the visualization of the results. We
also present the system architecture and the major challenges we
faced in designing and deploying it.

INTRODUCTION
During the current coronavirus pandemic, monitoring the evolution of
COVID-19 cases is of utmost importance for the authorities to make
informed policy decisions (e.g., lock-downs), and to raise awareness in the
general public for taking appropriate public health measures.
At the time of the pandemic outbreak, a lack of laboratory tests, materials,
and human resources implied that the evolution of officially confirmed cases
did not represent the total number of cases (Ruppert et al., 2018; Maxmen,
2020). Even now, there are significant differences across countries in terms
of the availability of tests. Given the rapid progression of the pandemic,
health authorities are in some cases forced to make important
decisions based on sub-optimal data. For this reason, alternatives to testing
that can be rapidly deployed are likely to help authorities, as well as the
general population, to better understand the progress of a pandemic (Yang et
al., 2012), particularly at its early stages or in low-income countries, where
massive testing is unfeasible.
To this end, we have created a system, named CoronaSurveys, to estimate
the number of COVID-19 cases based on crowd-sourced open anonymous
surveys. CoronaSurveys has been operating since March 2020, starting with
only three countries (Spain, Portugal, and Cyprus) and currently offering
surveys for all the countries of the globe.
CoronaSurveys uses the network scale-up method (NSUM) (Russell
Bernard et al., 1991; Bernard et al., 2010), which implements indirect
reporting to: 1) reach a wider coverage in a shorter time frame, 2) obtain
estimates that converge faster to the true value, and 3) preserve the privacy
of the participants. The individual responses act as snapshots of knowledge
of the current situation of the pandemic from a personal point of view.
When these responses are analyzed collectively, across time and geographic
locations, a combined view of the pandemic can be inferred. To the best of
our knowledge, this is the largest scale NSUM system ever deployed and
the only one to be collecting data continuously over a period of over a year
using open surveys.
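To make the indirect-reporting estimator concrete, the following is a minimal Python sketch (ours, for illustration; not the project's published code) of the basic NSUM ratio: the total number of cases reported by participants divided by their total reach.

def nsum_estimate(reach, cases):
    """Basic network scale-up estimate of incidence.

    reach: list of r_i, how many people each participant knows
    cases: list of c_i, how many of those are reported as cases
    Returns the estimated fraction of the population that are cases.
    """
    total_reach = sum(reach)
    if total_reach == 0:
        raise ValueError("no reach reported")
    return sum(cases) / total_reach

# Example with three survey responses:
print(nsum_estimate(reach=[100, 50, 150], cases=[2, 0, 3]))  # ~0.0167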
In this paper, we present the main components of the current
CoronaSurveys infrastructure, including the collection, processing
and visualization methods used. The computational system powering
CoronaSurveys has been designed as the aggregation of lightly coupled
components that can be replaced and modified almost independently (see
architecture in Figure 1). This has enabled the system to continuously adapt
to the evolution of the COVID-19 pandemic with relatively low effort,
demonstrating its extensibility, re-usability, and potential to be used for
tracking future pandemic outbreaks.
Figure 1: CoronaSurveys computational system architecture.

DATA COLLECTION
The data collection subsystem consists of 1) a user-centered web and mobile
front-end interface, providing straightforward and intuitive access to the
surveys, and 2) a data collection back-end enabling response aggregation in
a consistent and structured format to facilitate post-processing.

Front-end: Survey Design
Usability, interaction, and user interfacing play key roles in the initial
engagement and subsequent retention of participants. To this end, we pay
attention to two main elements: 1) the appearance and usability of the front-
end solutions, and 2) the contents and length of the survey.
The web and mobile survey applications have been designed to have
minimal loading times, with lightweight graphical elements, a color
scheme and page layout suitable for all users, including visually impaired
participants and participants in geographic locations where internet speeds
may be poor (see Figure 2). For instance, a tailor-made cache system has
been built and deployed to minimize the survey loading time. Similarly,
in order to improve accessibility and user experience, the initial
website was migrated from GitHub pages to a Wordpress deployment in a
server managed by the project team.
Figure 2: Snapshots of the CoronaSurveys app. It shows the main app screen (left), the information about the project shown when accessing the survey (center), and the survey questions (right).
To preserve user engagement, minimize participant fatigue, and ensure
a steady flow of responses, we initially designed a minimal survey consisting
of two simple questions:
• How many people do you know personally in this geographical
area? Include only those whose health status you are likely to
be aware of (The geographical area was previously selected, see
Figure 2.)
• How many of those were diagnosed with or have symptoms of
COVID-19?
We denote the reply to the first question as the Reach, r_i, and the reply to
the second question as the Number of Cases, c_i. In this way, the aggregated
value provides a rough estimate of the incidence of COVID-19. The
simplicity of the survey, together with the increased interest of people in the
initial stages of the pandemic, led to successful initial survey deployments
(e.g., 200 responses per week in Spain, 800 responses in the first day in
Cyprus, and more than 1,000 in Ukraine). Despite their simplicity, these
two questions were sufficient for producing rough preliminary estimates of
the cumulative incidence of COVID-19 in several countries, in a period in
which testing was scarce.
As CoronaSurveys expanded its reach, additional questions were
introduced to improve granularity and estimate more parameters of the
pandemic (like fatalities), while maintaining the survey completion time at
around 1 min. Currently, the survey also includes the following questions:
• Of the people with symptoms, how many are still sick?
• How many started with symptoms in the latest 7 days?
• How many passed away?
By including these additional questions, we are able to track the number
of active cases (Question 3), new cases (Question 4), and the cumulative
number of fatalities (Question 5).

Data Aggregation
The back-end data collection engine was designed to provide seamless
aggregation of the data in a consistent and structured format. Timeliness,
consistency, and proper dissemination of the data were the three main pillars
of the aggregation process. CoronaSurveys updates its estimates daily to
provide a comparison with the estimates of officially confirmed cases,
which are also updated once per day. This daily aggregation also serves as a
privacy preserving measure, as we discuss in the next section.
During aggregation, survey responses are classified by country and
stored in individual files named as CC-aggregate.csv, where CC is the two
letter ISO code of the country. Each row in the file corresponds to a single
response and is composed of the elements that appear in Table 1: the date
of the response, the country for which the response reports, the country ISO
code, the region in the country for which the response reports (if any), the
region ISO code, the language used to fill the survey, the answers to the
survey questions (Var1, …, Varn), a cookie that anonymously
identifies a participant, and a campaign field that can be used to identify
responses that correspond to specific survey dissemination campaigns. The
aggregated data is then provided to the estimation engine and published in
an online public repository (GCGImdea/coronasurveys, 2020).

Table 1: Aggregation row format

date | country | country ISO code | region | region ISO code | language | Var1, …, Varn | cookie | campaign
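As an illustration of this aggregation step, the following minimal Python sketch appends one response to a per-country aggregate file; the helper name and the example values are hypothetical, and the column order follows the description above.

import csv
from datetime import date

def append_response(country_iso, row):
    """Append one survey response to CC-aggregate.csv for country CC."""
    with open(f"{country_iso}-aggregate.csv", "a", newline="") as f:
        csv.writer(f).writerow([
            row["date"],         # date of the response (no time of day)
            row["country"],      # country the response reports on
            row["country_iso"],  # two-letter ISO code
            row["region"],       # region in the country, if any
            row["region_iso"],   # region ISO code
            row["language"],     # language used to fill the survey
            *row["answers"],     # Var1, ..., Varn
            row["cookie"],       # anonymous participant identifier
            row["campaign"],     # dissemination campaign, if any
        ])

append_response("ES", {
    "date": date.today().isoformat(), "country": "Spain",
    "country_iso": "ES", "region": "Madrid", "region_iso": "ES-MD",
    "language": "es", "answers": [120, 3, 1, 0, 0],
    "cookie": "a1b2c3", "campaign": "",
})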
User Privacy
Ensuring anonymity and privacy is important to minimize participants'
reservations about filling in the survey. Ideally, we would like to acquire as
much relevant data as possible (e.g., geolocation), but this is at odds with
anonymity and is likely to lead to fewer responses. CoronaSurveys implements
four anonymity strategies:

Avoid Third Party Tracking
One of the initial concerns was to eliminate the possibility of a third party
collecting data from participants. Although first deployed in Google Forms,
we quickly moved the surveys to a self-hosted instance of the open-source
tool Limesurvey (LimeSurvey Project Team/Carsten Schmitz, 2012) to
minimize this risk.

Avoid Revealing User Identity
CoronaSurveys does not ask any personal questions, and only collects data
about the contacts of the participant. The data collected from each participant
is limited to the day in which the survey was completed, the geographical
region for which the user wishes to provide information, and the replies to
the aforementioned questions.

Secure User Identification
Identifying users who return to the CoronaSurveys system, while preserving
their anonymity, is necessary to prevent malicious and repetitive responses
that can skew our input data. Given our goal to avoid storing personal
information, creating personal accounts was not possible. Instead, we
decided to create a random cookie in the participant's browser or device
to identify the user, and store it along with the time
the survey was last filled in. The cookie is stored in an encrypted form.
This cookie can help us detect some duplicate responses and some malicious
attacks (anonymous duplication), but does not ensure security. For example,
a user could submit their responses from multiple devices, and each would be
associated with a different cookie. To remove further malicious responses,
we implement outlier detection algorithms described in Section 3.
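A minimal sketch of this identification scheme follows, assuming a random token as the cookie and a salted hash as a stand-in for the encrypted storage the text mentions; all names are illustrative.

import hashlib
import secrets

def issue_cookie():
    """Generate a random, non-personal participant identifier."""
    return secrets.token_hex(16)  # 128 random bits, no user data involved

def remember(cookie, last_filled, store):
    """Keep only a salted digest of the cookie server-side, together with
    the date the survey was last filled in."""
    digest = hashlib.sha256(b"server-salt:" + cookie.encode()).hexdigest()
    store[digest] = last_filled

store = {}
remember(issue_cookie(), "2021-04-26", store)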
Protecting User Identity
Tracking the time when a user submits a response may allow an adversary
to recover their true identity. For this reason, we 1) do not include the time
of the day in the aggregated and published data, and 2) shuffle the responses
of a single day, preventing an adversary from extracting the order in which
responses were received.
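The following sketch illustrates both measures, assuming each response arrives with a full timestamp that is reduced to a date and shuffled before publication; the field names are hypothetical.

import random

def publish_day(responses):
    """Prepare one day's responses for publication: drop the time of day,
    keeping only the date, and shuffle so arrival order cannot be recovered."""
    rows = []
    for r in responses:
        row = dict(r)
        row["date"] = row.pop("timestamp")[:10]  # keep the date only
        rows.append(row)
    random.shuffle(rows)  # destroy arrival order
    return rows

day = [{"timestamp": "2021-04-26T09:15:00", "reach": 100, "cases": 2},
       {"timestamp": "2021-04-26T17:40:00", "reach": 80, "cases": 0}]
print(publish_day(day))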

DATA ANALYSIS
Based on the aggregated, anonymous data, CoronaSurveys employs several
methods to produce estimates of the number of COVID-19 cases in all
geographical areas for which sufficient data are available, comparing these
estimates with those provided by the official authorities. The estimation
methods are:
•	cCFR-based: This method is based on estimating the
corrected case fatality ratio (cCFR) from the official numbers
of cumulative cases and fatalities, taking an estimation of
the approximate number of cases with known outcomes into
consideration. It is also assumed that a reliable value of the
traditional case fatality ratio (CFR*) is available (we use
CFR* = 1.38%, with a 95% confidence interval
of 1.23% to 1.53%, as described in Verity et al.,
2020). Then, the number of cases is estimated by multiplying
the official figure of cumulative cases in a region D by the ratio
cCFR(D)/CFR*, where cCFR(D) is the
cCFR estimated for D (see the sketch after this list).
•	cCFR-fatalities: This method divides the official number of
fatalities on a given day d by CFR*, and assigns the resulting
number of cases to day d − P (P is the median number of days
from symptom onset to death). We use P = 13, following the
values reported by the Centers for Disease Control and Prevention
(Centers for Disease Control and Prevention, 2021a).
• UMD-Symptom-Survey: This method uses the responses to
direct questions about symptoms from the University of Maryland
COVID-19 World Survey (Fan et al., 2020) to estimate active
cases. In particular, it counts the number of responses that declare
fever, and cough or difficulty breathing. This survey collects
more than 100,000 individual responses daily.
•	UMD-Symptom-Survey-Indirect: This method estimates active
cases applying the NSUM method to the responses of an indirect
question from the University of Maryland COVID-19 World
Survey (Fan et al., 2020). In this estimation method the Reach
is obtained from the CoronaSurveys data, while the Number of
Cases are the cases reported by answering YES to the question 1)
“Do you personally know anyone in your local community who
is sick with a fever and either a cough or difficulty breathing?”
and answering the question 2) “How many people do you know
with these symptoms?”
• 300Responses: This method uses a weighted average of 300
filtered CoronaSurveys responses for a given geographical area.
Filtering consists of discarding answers that report an unusually
large reach (entries larger than 1.5 times the interquartile range
above the upper quartile) or an unusually large number of cases
(over 1/3 of cases in the reach).
• Estimates-W: This method uses a weighted average of
CoronaSurveys responses from the last W days, using the same
filtering criteria as 300Responses.
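As referenced above, here is a minimal sketch of the two cCFR estimators, assuming the CFR* and P values cited in the text; function names are illustrative.

CFR_STAR = 0.0138  # CFR* = 1.38% (Verity et al., 2020)
P_DAYS = 13        # median days from symptom onset to death (CDC, 2021a)

def ccfr_based(official_cumulative_cases, ccfr_region):
    """Scale the official cumulative cases of region D by cCFR(D)/CFR*."""
    return official_cumulative_cases * (ccfr_region / CFR_STAR)

def ccfr_fatalities(daily_fatalities):
    """Divide fatalities on day d by CFR* and assign the cases to day d - P."""
    return {day - P_DAYS: deaths / CFR_STAR
            for day, deaths in daily_fatalities.items()}

print(ccfr_based(1_000_000, ccfr_region=0.05))  # about 3.6 million cases
print(ccfr_fatalities({100: 138.0}))            # {87: 10000.0}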
Cookies allow us to make sure we only count the latest answer for each
respondent in each aggregated batch (set of 300 responses for estimates-300,
or last W days for estimates-W).
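A minimal sketch of the shared filtering criteria (the IQR rule for the reach and the one-third rule for the cases) could look as follows; it is illustrative, not the project's code.

import statistics

def filter_responses(responses):
    """Drop outliers: reach above Q3 + 1.5 * IQR, or reported cases
    exceeding one third of the reach. `responses` holds (reach, cases) pairs."""
    reaches = [r for r, _ in responses]
    q1, _, q3 = statistics.quantiles(reaches, n=4)
    upper = q3 + 1.5 * (q3 - q1)
    return [(r, c) for r, c in responses if r <= upper and c <= r / 3]

data = [(100, 2), (80, 1), (120, 50), (90, 0),
        (110, 2), (95, 1), (85, 0), (10000, 3)]
print(filter_responses(data))  # drops (120, 50) and (10000, 3)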
The estimates obtained with the above methods are stored in the online
public repository. Each method M stores the estimates in a folder named
estimates-M/PlotData, and the estimates for each country CC are stored in
the file CC-estimate.csv in the format shown in Table 2.

Table 2: Estimates row format

Once in the predefined format, the estimates are imported into a time-series
database, from which we generate visualizations. Time-series
databases are often used to store streaming data organized in measurements.
For CoronaSurveys, each series of estimates obtained with a given method
is one such measurement, while the date, the country, the region, and
the population, are characteristics of the measurements, facilitating the
localization of the estimates.
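The text does not name the specific time-series database, so purely as an illustration, the sketch below renders one estimate as a simplified line-protocol-style record, with the estimation method as the measurement and the date, country, region, and population as its characteristics.

def to_record(method, country, region, population, day, value):
    """Render one estimate as a simplified line-protocol-style string
    (illustration only; the real storage format is not specified here)."""
    tags = f"country={country},region={region},population={population}"
    return f"{method},{tags} estimate={value} {day}"

print(to_record("estimates-W", "PT", "all", 10300000, "2021-04-26", 0.11))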
DATA VISUALIZATION
Finally, converting the computed data to meaningful visualizations is
essential to observe trends, insights, and behaviors from our data, as well as
to communicate our outcomes to a wider audience. Our visualization engine
employs the Grafana (Grafana Labs, 2018) framework, which enables the
creation of interactive plots of various types. We can group our plots into
three categories, based on the information they provide:
• CoronaSurveys participation statistics
• Global-scale visualizations
• Local-scale visualizations
To better map the effects of the pandemic and capture a holistic
view of its impact, we present the computed estimates in both global and
countrywide (local) visualizations. Global visualizations intend to expose the
distribution of the pandemic around the globe and identify areas with higher
infection rates. Countrywide visualizations aim to pinpoint the estimated
magnitude of the problem compared to officially reported cases.

CoronaSurveys Participation Statistics
Figure 3 depicts the statistics of CoronaSurveys participation. In just over
12 months, CoronaSurveys has collected data from roughly 25,000
participants worldwide, with Spain being the country with the most
responses. This means that the absolute reach in Spain is significantly
higher. However, the country with the largest relative reach with respect to the
population is Cyprus, with almost 1,300 responses, an absolute reach of
more than 30,000, and a population of roughly 1 million. This figure
also reflects the success of indirect reporting: with this method, we obtain
the information of more than 50 times the number of survey responses, more
than a million persons in total.
Figure 3: CoronaSurveys statistics of participation and reach.

Global-Scale Visualizations
Our goal for the global visualisations is twofold: 1) to provide a snapshot
of the pandemic based on the latest computed estimates and 2) to provide
a comparative plot exposing the progress of the virus in multiple countries.
A map is one of the most intuitive ways to present an instance of the
data on a global scale. Therefore, Figure 4 presents a map visualization that
includes the estimates of the percentage of cumulative cases (infected) per
country based on the cCFR algorithm (ccfr-based). Bubble points can capture
the magnitude of a value by adjusting their color based on a predefined color
scale, and their radius relative to the maximum and minimum values on the
map. On the top left of the figure, visible drop-down menus allow selecting
other estimators and metrics.

Figure 4: Cumulative number of cases, estimated with the cCFR-based method, around the globe. A larger radius means a higher percentage.
Figure 5 provides a comparison of the countries most affected by the
pandemic. This plot also presents the estimates based on the Estimates-W
algorithm. For clarity, we show in this figure only the lines for the United
Kingdom, Brazil, Portugal, France, and Chile (lines can be shown or
hidden individually in the website plot).

Figure 5: Global estimates of the cumulative number of cases obtained with method Estimates-W. Only the lines for the United Kingdom, Brazil, Portugal, France, and Chile are shown for clarity.

Local-Scale Visualizations
For local-scale visualization, we display the evolution in the number of active
cases, new daily cases, and contagious cases (see Figure 6), estimated with
some of the methods described above. To estimate the number of active and
contagious cases when only daily cases are available (e.g., from confirmed
data), we assume that cases are active and contagious for 18 and 12 days,
respectively (Centers for Disease Control and Prevention, 2021a; Centers
for Disease Control and Prevention, 2021b). Observe in Figure 6 that the
ratios of active cases estimated on the last day (April 26th, 2021) with the
responses to direct symptom questions (blue line, 4.31%) and to the
indirect questions using NSUM (purple line, 2.87%) are one order
of magnitude larger than those obtained with the official number of cases
(0.33%) and the official number of fatalities (0.31%). (The
reason for the difference between the blue and the purple lines is currently
under evaluation.)
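A minimal sketch of this windowed estimate over a daily series of new cases follows; the 18-day (active) and 12-day (contagious) windows are the ones cited above.

def windowed_cases(daily_new, window=18):
    """Estimate active (window=18) or contagious (window=12) cases on each
    day as the sum of new cases over the preceding `window` days."""
    return [sum(daily_new[max(0, i - window + 1): i + 1])
            for i in range(len(daily_new))]

new = [10, 20, 30, 25, 15]
print(windowed_cases(new, window=3))  # [10, 30, 60, 75, 70]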

Figure 6: Estimates of the number of cases in India. The estimates of active cases obtained with data from the University of Maryland COVID-19 World Survey (Fan et al., 2020) are one order of magnitude higher than those obtained from official data.
To illustrate the estimates obtained from the survey we use Portugal,
a country for which we obtained a good number of replies (see Figure 7).
Observe the increase in the number of replies in February 2021, when a paid
campaign in Facebook Ads was deployed in Portugal. Country-level plots
present a comparison of the different estimation methods for cumulative
number of cases, including the report of the official authorities. Figure 8
presents the cumulative number of cases estimates in CoronaSurveys for
Portugal. The thin green line is the number of cases reported by the official
authorities, while the remaining curves present the estimates obtained with
cCFR-based, 300Responses, and estimates-W. As can be seen, all curves
have similar trends, but cCFR-based, 300Responses, and estimates-W have
considerably larger values than the official data.
Figure 7: Survey responses in Portugal. Observe the increase of participation obtained in February 2021 with a paid campaign in Facebook Ads.

Figure 8: Estimates for Portugal.

RESULTS
To test the feasibility of using CoronaSurveys to provide accurate estimates
of the number of cases, we conducted a comparison between our estimates
and the results of massive serology testing in Spain, a study conducted by
Pollán et al. (2020). In this comparison (García-Agundez et al., 2021),
we calculated the correlation between our estimates and the serology results
across all regions (autonomous communities) of Spain in the timeframe
of the serology study. The serology study recruited n = 61,075
participants, which represents 0.1787% ± 0.0984%
of the regional population. In contrast, CoronaSurveys data provides
information on n = 67,199 people through indirect reporting, or
0.1827% ± 0.0701% of the regional population.
This resulted in a Pearson R-squared correlation of 0.89. In addition,
we observed that CoronaSurveys systematically underestimates the number
of cases by a factor of 46%, possibly due to asymptomatic cases. This ratio
is consistent with other survey implementations that used direct reporting
instead (Oliver et al., 2020).
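For illustration, the regional correlation can be computed as in the following sketch; the numbers below are made up and do not reproduce the study's data.

from math import sqrt

def pearson_r2(x, y):
    """Squared Pearson correlation between two paired series, e.g.,
    survey-based estimates vs. seroprevalence across regions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return (cov / (sx * sy)) ** 2

estimates = [4.2, 10.1, 6.3, 2.8]  # illustrative regional estimates (%)
serology = [5.0, 11.3, 7.1, 3.0]   # illustrative seroprevalence (%)
print(round(pearson_r2(estimates, serology), 3))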
Although further comparisons in other countries are necessary once we
have sufficient data and similar serology studies are available, we believe
this strongly supports the use of open surveys as an additional source of
information to track the progress of pandemics.

CONCLUSION
In this article, we present the system architecture and estimation methods
of CoronaSurveys, which uses open surveys to monitor the progress of the
COVID-19 pandemic. Our graphical estimations require large amounts
of data from active participants, but provide insightful depictions of the
progress of the pandemic in different regions, offering an estimation of the
cumulative and active number of cases in different geographical areas.
The most important challenge and limitation of CoronaSurveys is the
number of survey responses. In this sense, the dissemination of our graphical
estimations is important to maximize user engagement and retention. For
this reason, in the future we aim to include a forecast of the number of cases
and fatalities based on recent data for different geographical areas, in order
to empower the dissemination of our graphical visualizations and, with it,
increase user recruitment.
In addition, our outlier detection methods are heuristic and could,
in the future, be improved to be more resilient to malicious responses.
CoronaSurveys is a work in progress, and features such as the number of
responses per day could be implemented to detect certain types of malicious
attacks which open online surveys may be subjected to.
Our first evaluation, comparing the results of CoronaSurveys with a
serology study in Spain, provided excellent results, supporting open surveys
and indirect reporting as potential sources of information to track pandemics,
although further comparisons in different regions are required. An interesting
topic of discussion would be the minimum number of responses required to
provide reasonably accurate estimates, as an increasing number of replies will
balance out individual inaccuracies of over- or underestimation and improve
the functionality of our outlier detection methods, following the “wisdom of
the crowd” phenomenon. Naturally, the minimum number of responses will
depend on factors such as population dispersion and cultural differences on
behavior, but our initial estimate is that by indirectly providing information
for a percentage of the population similar to that of a massive serology study,
we can already provide valuable estimates.
In conclusion, massive serology testing is ultimately the standard to
accurately estimate the prevalence of COVID-19 in a region. However, this
has its limitations, since it requires time until deployment, involves massive
resources, and is unfeasible in some scenarios and countries. As an example,
in the current outbreak in India as of April 2021, the level of underreporting is
likely to be very high (Institute for Health Metrics and Evaluation, 2021), which matches what
is observed in Figure 6. In these scenarios, we believe indirect reporting can
provide a viable alternative to obtain early approximations of prevalence.
Although CoronaSurveys is a work in progress and much fine tuning is still
required, we believe it provides a proof of concept of indirect reporting, as
well as early results on its feasibility.

AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct, and intellectual contribu-
tion to the work and approved it for publication.
REFERENCES
1. Bernard, H. R., Hallett, T., Iovita, A., Johnsen, E. C., Lyerla, R.,
McCarty, C., et al. (2010). Counting Hard-To-Count Populations: the
Network Scale-Up Method for Public Health. Sex. Transm. infections
86 (Suppl. 2), ii11–ii15. doi:10.1136/sti.2010.044446
2. Centers for Disease Control and Prevention (2021a). Covid-19
Pandemic Planning Scenarios. Available at: https://www.cdc.gov/
coronavirus/2019-ncov/hcp/planning-scenarios.html (Accessed
December 12, 2020).
3. Centers for Disease Control and Prevention (2021b). Clinical Questions
about Covid-19: Questions and Answers. Available at: https://www.
cdc.gov/coronavirus/2019-ncov/hcp/faq.html (Accessed May 9, 2021).
4. Fan, J., Yao, L., Stewart, K., Kommareddy, A. R., Bradford, A., Chiu,
S., et al. (2020). Covid-19 World Symptom Survey Data Api. Available
at: https://covidmap.umd.edu/api.html (Accessed May 28, 2021).
5. García-Agundez, A., Ojo, O., Hernández-Roig, H. A., Baquero,
C., Frey, D., Georgiou, C., et al. (2021). Estimating the COVID-19
Prevalence in Spain with Indirect Reporting via Open Surveys. Front.
Public Health 9. Available at: https://www.medrxiv.org/content/10.1101/2021.01.29.20248125v1
(Accessed May 28, 2021).
6. GCGImdea/coronasurveys (2020). Coronasurveys Data Repository.
Available at: https://github.com/GCGImdea/coronasurveys (Accessed
November 5, 2020).
7. Grafana Labs (2018). Grafana Documentation. Available at: https://
grafana.com/docs/ (Accessed May 28, 2021).
8. Institute for Health Metrics and Evaluation (2021). Covid-19 Results
Briefing in India. Available at: http://www.healthdata.org/sites/default/
files/files/Projects/COVID/2021/163_briefing_India_9.pdf (Accessed
May 03, 2021).
9. LimeSurvey Project Team/Carsten Schmitz (2012). LimeSurvey: An
Open Source Survey Tool. Hamburg, Germany: LimeSurvey Project.
10. Maxmen, A. (2020). How Much Is Coronavirus Spreading under the
Radar? Nature. doi:10.1038/d41586-020-00760-8. Available at:
https://www.nature.com/articles/d41586-020-00760-8
11. Oliver, N., Barber, X., Roomp, K., and Roomp, K. (2020). Assessing
the Impact of the Covid-19 Pandemic in Spain: Large-Scale, Online,
Self-Reported Population Survey. J. Med. Internet Res. 22 (9), e21319.
doi:10.2196/21319
12. Pollán, M., Pérez-Gómez, B., Pastor-Barriuso, R., Oteo, J., Hernán, M.
A., Pérez-Olmeda, M., et al. (2020). Prevalence of SARS-CoV-2 in Spain
(ENE-COVID): a Nationwide, Population-Based Seroepidemiological
Study. The Lancet 396 (10250), 535–544. doi:10.1016/s0140-
6736(20)32266-2
13. Ruppert, E., Grommé, F., Ustek-Spilda, F., and Cakici, B. (2018). Citizen
Data and Trust in Official Statistics. Economie Statistique/Economics
Stat. (505-506), 171–184. doi:10.24187/ecostat.2018.505d.1971
14. Russell Bernard, H., Johnsen, E. C., Killworth, P. D., and Robinson, S.
(1991). Estimating the Size of an Average Personal Network and of an
Event Subpopulation: Some Empirical Results. Soc. Sci. Res. 20 (2),
109–121. doi:10.1016/0049-089x(91)90012-r
15. Verity, R., Okell, L. C., Dorigatti, I., Winskill, P., Whittaker, C., Imai, N.,
et al. (2020). Estimates of the Severity of Coronavirus Disease 2019: a
Model-Based Analysis. Lancet Infect. Dis. 20, 669–677. doi:10.1016/
S1473-3099(20)30243-7
16. Yang, P., Ma, C., Shi, W., Cui, S., Lu, G., Peng, X., et al. (2012). A
Serological Survey of Antibodies to H5, H7 and H9 Avian Influenza
Viruses Amongst the Duck-Related Workers in Beijing, China. PLoS
One 7 (11), e50770. doi:10.1371/journal.pone.0050770. Available at:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0050770
Chapter 16

ARTIFICIAL INTELLIGENCE BASED BODY SENSOR NETWORK FRAMEWORK—NARRATIVE REVIEW: PROPOSING AN END-TO-END FRAMEWORK USING WEARABLE SENSORS, REAL-TIME LOCATION SYSTEMS AND ARTIFICIAL INTELLIGENCE/MACHINE LEARNING ALGORITHMS FOR DATA COLLECTION, DATA MINING AND KNOWLEDGE DISCOVERY IN SPORTS AND HEALTHCARE

Citation: (APA): Phatak, A. A., Wieland, F. G., Vempala, K., Volkmar, F., & Memmert,
D. (2021). Artificial Intelligence Based Body Sensor Network Framework—Narrative
Review: Proposing an End-to-End Framework using Wearable Sensors, Real-Time
Location Systems and Artificial Intelligence/Machine Learning Algorithms for Data
Collection, Data Mining and Knowledge Discovery in Sports and Healthcare. Sports
Medicine-Open, 7(1), 1-15. (15 pages)
Copyright: © Open Access. This article is licensed under a Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/)
Ashwin A. Phatak1, Franz-Georg Wieland2, Kartik Vempala3, Frederik Volkmar1,
and Daniel Memmert1

1 Institute of Exercise Training and Sport Informatics, German Sports University, Cologne, Germany
2 Institute of Physics, University of Freiburg, Freiburg im Breisgau, Germany
3 Bloomberg LP, New York, USA

ABSTRACT
With the rising amount of data in the sports and health sectors, a plethora
of applications using big data mining have become possible. Multiple
frameworks have been proposed to mine, store, preprocess, and analyze
physiological vitals data using artificial intelligence and machine learning
algorithms. Comparatively, less research has been done to collect potentially
high volume, high-quality ‘big data’ in an organized, time-synchronized,
and holistic manner to solve similar problems in multiple fields. Although
a large number of data collection devices exist in the form of sensors, they
are either highly specialized, univariate, and fragmented in nature, or exist in
a lab setting. The current study aims to propose the artificial intelligence-based
body sensor network framework (AIBSNF), a framework for strategic use
of body sensor networks (BSN), which combines with real-time location
system (RTLS) and wearable biosensors to collect multivariate, low noise,
and high-fidelity data. This facilitates the gathering of time-synchronized
location and physiological vitals data, which allows artificial intelligence
and machine learning (AI/ML)-based time series analysis. The study gives
a brief overview of wearable sensor technology, RTLS, and provides use
cases of AI/ML algorithms in the field of sensor fusion. The study also
elaborates sample scenarios using a specific sensor network consisting of
pressure sensors (insoles), accelerometers, gyroscopes, ECG, EMG, and
RTLS position detectors for particular applications in the field of health
care and sports. The AIBSNF may provide a solid blueprint for conducting
research and development, forming a smooth end-to-end pipeline from
data collection using BSN, RTLS and final stage analytics based on AI/ML
algorithms.

Keywords: Wireless body area networks, Wearable biosensors, Sports
analysis, Real-time location system, Multi-sensor fusion, Vitals data
Key Points
• A large number of wearable sensor technologies have given
rise to big data collection possibilities in the fields of sport and
healthcare.
•	The emergence of body sensor networks, real-time location systems,
and multi-sensor data fusion algorithms shows great potential for
application in a wide set of industries.
•	The proposed AIBSNF framework has the potential to provide a
solid blueprint for exploiting these rising technologies for end-
to-end application from data collection to knowledge discovery
across industries.

INTRODUCTION

Big Data and the Future
‘Dataism’ is a term coined by Yuval Harari in his popular science book ‘Homo
Deus’. This term suggests that in the near future, decisions in all aspects of
society will be based on the interpretation of the available ‘big data’ [1].
“Big data is defined as high-volume, high-velocity, high-variety and high
veracity information assets (4Vs) that demand cost-effective, innovative
forms of information processing for enhanced insight and decision making.”
In this definition, volume refers to the magnitude or size of the data, variety
refers to structural heterogeneity in the dataset, velocity refers to the rate at
which data are generated and veracity refers to the truthfulness or reliability
of the data [2–4]. ‘Big data’ currently holds tremendous untapped potential,
which has possible applications in a multitude of industries, including but
not limited to health care, banking and finance, security, aviation, astronomy,
agriculture, and sports [2, 5–7]. Although big data can be a considerable
asset for the knowledge discovery process, using this data is a non-trivial
task. Due to its unique computational and statistical challenges, the strength
of ‘big data’ in terms of the 4Vs described above can also be its drawback.
Noise accumulation, spurious correlation, measurement errors, and high
computational power requirements are some of these challenges [8].
Solutions for any of the above-mentioned problems in terms of framework
and data mining tools may prove crucial for generating data, analyzing it, and
extracting actionable knowledge for application in the respective industries.
In recent years, big data acquisition and analysis
using AI/ML algorithms have been applied in sports and healthcare
diagnostics [5]. It has resulted in improvements in the identification of
critical information and is being used in decision-making processes [5]. The
nature of these fields is such that certain physiological signs that signify
sports performance are also good indicators of mental and physical health
[5]. The physiological information and movement patterns required to
investigate athletic performance in sports, such as heart activity, recovery,
muscular strength coordination, balance, etc., have considerable overlap
with general health indicators. Considering this overlap, the data collection
tools for these physiological indices can potentially be used to analyze both
sports performance and available health predictors [9].
Figure 1 outlines the scope of the present review: the fields of
application, the technologies used for data collection, and the post-collection
analysis for knowledge discovery across this broad set of fields.

Figure 1: Application fields and scope of the AIBSNF framework.

Status Quo of Data Gathering, Mining and Analytics in Sports
The use of big data and AI/ML tools in sports was first introduced in track
and field, and weightlifting [10]. Baseball was one of the first sports to use
data for recruiting and performance-enhancing purposes [6]. The sports of
basketball and football soon caught up with a large number of professional
teams and academics using ‘big data’ for recruiting, performance analysis, and
performance enhancement in their respective sports [7]. The data currently
collected in sports falls under broad categories, viz. physiological data,
position tracking data, psychological data, scouting data, video data, etc. [7].
Spatiotemporal events, position data, and comprehensive match statistics for
several sports are commercially available today through companies such as
OPTA (https://www.optasports.com/), Hudl (https://www.hudl.com/), Instat
(https://instatsport.com/), Statsbomb (https://statsbomb.com/), and others.
Mining for the relevant information is primarily done through video and
manual tagging [11]. Physiological vitals such as muscular
contraction data, gait analysis, etc., are currently difficult, if not impossible,
to extract just from the game footage. The quality of data available within
the current technological limits is continuously improving, but some issues
still persist in collecting high-quality data during live sporting events.
Another issue is that data sources recorded by humans are currently prone to
missing values, inconsistencies between different measures, and temporal
errors [11]. Data sources that are automatically recorded, such as tracking
information via motion capture systems, traditionally have difficulties
with tracking complicated and crowded game situations [12]. Furthermore,
video data’s sheer size and complexity make it difficult for domain (sports)
specific feature extraction [11]. This issue is sometimes dealt with manually
or automatically after processing and cleaning the data; however, problems
persist. Tracking a relatively small sporting object such as a ball traveling at
high speeds is still unreliable even after post-event data mining [11]. Tracking
high-speed objects such as balls and rackets is especially important since
sporting equipment is usually the central reference point for meaningful
sports analysis.
A recent review of sports research focusing on data mining and analytics
(smart sports training) using AI/ML algorithms showed that the most
researched sports involving big data and computational intelligence were
soccer, running, and weight lifting [2]. The same review identified a total
of 97 studies ranging from individual to team sports with the same focus.
It also concluded that there is a lot of room for improvement in research
methods with respect to the quality and public availability of datasets, which
provides opportunities to validate the research done. Furthermore, multiple
studies have implemented or proposed frameworks of ML algorithms and
artificial neural networks (ANN) for sports results prediction. They mainly
focus on the technical implementation of the algorithms and their performance
while predicting outcomes of a sporting match [2]. There seems to be less
emphasis on frameworks that focus on obtaining and organizing the high-
quality, low noise, and time-synced data required for implementing these
algorithms [13]. The framework proposed in the current review aims to
address these points.
Data Gathering, Mining and Analytics in Health Care
The field of health care has a long history of recording, analyzing, and drawing
inferences based on data [8]. This seems to be an effect of the requirements
of regulatory bodies [8]. The total amount of healthcare data is predicted
to cross the yottabyte scale in the coming years [5]. Multiple approaches using
AI have been used in the past for injury risk assessment and performance
prediction [9]. Vision-based motion analysis has also been used for medical
diagnostics [14]. There still seems to be untapped potential for big data to
improve clinical operations, public health, preventive medicine, precision
medicine, evidence-based medicine, remote monitoring, patient profiling,
etc., in terms of lower cost, faster analysis, and reduced error rates [15]. A
few of the factors crucial for accelerating innovation in the field of smart
medicine seem to be data gathering techniques and data mining from
existing sources [16].
Architectural frameworks facilitating AI/ML have been proposed for
analyzing the currently available vitals data across various sectors within
healthcare [15]. Despite this, deep and smooth data integration across
multiple healthcare applications is fragmented and slow. Another technical
challenge for the development of such tools is the data from different
healthcare environments. The lack of consistency in the structure of the data,
available features, noise, and bias of the source may lead to issues in the
trained algorithms [17]. With the development and advancement of low-
cost, near-gold-standard sensor technology, there is a possibility to combine
multiple sensors to collect organized, time-synchronized data as the first step
for developing a pipeline for use in numerous healthcare and diagnostics
applications in the form of fusion technologies [18, 19]. Wearable sensors
seem to be the ideal tool for collecting such high-quality data [20–22].

Wearable Biosensors
The rise of wearable sensors as tools for data collection seems to be ideal for
gathering physiological and vital data. Hence, wearable sensors have become
popular in medical, entertainment, security, and commercial areas [21]. A
recent review published in ‘Nature Biotechnology’ elaborated on the rising
interest in wearable biosensor technology in academics, performance, and the
health industry [22]. Wearables show great potential to provide continuous,
real-time physiological data using dynamic non-invasive measurements of
biochemical and physiological markers. So far, these sensors have been
used for gathering precise, high-fidelity strategic data, which facilitate a
whole host of applications, with the military, precision medicine, and the fitness
industry at the forefront [23–25]. Their fidelity and precision vary
based on the specifications of their varying use cases.
Advances in electronics, printing, non-invasive data collection, and
monitoring technology have given rise to durable, unobtrusive, non-
invasive wearable clothing as an electronics platform capable of sensing
human locomotion and vital physiological signals [21, 25–29]. Such media
and miniaturization of sensors provide unprecedented capacity to gather a
wide range of data in many scenarios. By choosing a specific set of sensors
strategically located at different human body locations, there is potential to
collect precise data for solving interesting problems. Table 1 shows a non-
exhaustive list of non-invasive sensor technology that can potentially be
used in various combinations to gather physiological data for applications
in a wide range of disciplines.

Table 1: A non-exhaustive list of sensors and their applications in physiological vitals data gathering

Category | Physiological index | Application | Types of sensors | References
Sweat monitoring | Glucose and lactate | Blood sugar and physiological load monitoring | Iontophoresis electrode-based sensors | [20]
Sweat monitoring | Electrolyte concentration | Hydration monitoring, measuring of trace mineral densities, etc. | Galvanic skin resistance (GSR) | [20]
Temperature sensors | Skin/core temperature measurement | Acute infection monitoring | Pyroelectric sensors | [30]
– | Cortisol | Stress monitoring | Molybdenum disulfide nanosheets | [31]
Emotion regulation | Skin conduction | Continuous estimation of stress | GSR | [32]
Respiration | Oxygen saturation | Blood oxygen levels | Light emitting diode (LED) | [23, 33]
Respiration | Respiratory effort | – | Strain sensor | [23, 33]
Respiration | Stretchable respiration sensors | Respiratory disease monitoring and diagnosis | Resistive humidity sensors | [28]
Cardiological data | Heart rate | Physical load | Electrocardiogram (ECG) | [34]
Cardiological data | Heart rate variability | Recovery and stress | ECG | [34]
Skeletal muscle analysis | Electrical activity in muscles | Muscular abnormalities, nervous system functioning; muscular fatigue, injury probability | Electromyogram (EMG) | [35]
Continuous exercise monitoring through body motion analysis | Biomechanical data | Body position, postural control, and team interaction | Accelerometers, pressure sensors, inertial measurement unit gyroscope | [36, 37]
Continuous exercise monitoring through body motion analysis | Foot movement pattern | Gait analysis | – | [36, 37]

Real-time Location Systems (RTLS) in Healthcare and Sports
Real-time location system (RTLS) is a combination of wireless hardware and
software deployed to acquire a continuous real-time position of assets and
resources, usually using a fixed reference point or receivers [38]. They seem
to have an advantage over video capture, as motion analysis from video in most
situations requires a direct line of sight, which may not always be possible [39].
Most RTLS technologies are capable of measuring ToA (time of arrival),
TDoA (time difference of arrival), AOA (angle of arrival), RSS (receiver
signal strength), RSF (receiver signal phase), and RTF (roundtrip time of
flight). Using these measurements, they can identify the real-time location of the
object(s) in question with one of the following methods: lateration, angulation, or
fingerprinting [40] (a minimal lateration sketch follows this paragraph). In recent
years, advancements in data collection technology, improvements in data mining
algorithms, and a reduction in the cost of development kits have given rise to a
whole host of applications in
industries such as production management, food delivery, healthcare and
sports analytics [38, 41–44].
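As referenced above, the following is a minimal lateration sketch, assuming distances already derived from ToA measurements and solving the linearized system by least squares; it is illustrative only.

import numpy as np

def laterate(anchors, dists):
    """Estimate a 2D position from fixed receiver coordinates and measured
    distances via linearized least squares."""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(dists, dtype=float)
    # Subtracting the first range equation from the others linearizes the system.
    A = 2 * (anchors[1:] - anchors[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - np.sum(anchors[0] ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

anchors = [(0, 0), (10, 0), (0, 10)]          # fixed receivers
true = np.array([3.0, 4.0])                   # tag position to recover
dists = [float(np.linalg.norm(true - a)) for a in anchors]
print(laterate(anchors, dists))               # ~ [3. 4.]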
In healthcare, RTLS has been used effectively in elder care tracking,
medical asset tracking, medication tracking, etc. Radio-frequency
identification (RFID) has mainly been used in various capacities for
improving medical asset management [34]. Furthermore, due to the
COVID-19 pandemic of early 2020, RTLS was used for contact tracing
to identify the potential spread of the virus. A combination of RTLS and
electronic medical records was successfully able to locate all contacts with
a sensitivity of 77.8% and specificity of 73.4%. Although not perfect, there
seems to be potential to improve this rate by integrating other complementary
measurement techniques [45].
In the field of team sports, real-time position and event data, in particular,
have become crucial for the industry. Findings from this gathered data have
benefited the field for physiological indicator analysis, tactical analysis, and
their combination [46]. Despite considerable progress in motion analysis
systems, there seems to be a lack of accurate and cost-effective technologies
in the current market. The accuracy levels required for different marker-less
human motion analysis scenarios are not yet established, but can potentially
be improved by adding wearable tags (transmission antennas) on the players/
sporting objects [46]. Multiple technology stacks and frameworks for data
storage and potential analysis using AI and ML algorithms have been
proposed [3]. However, the gathering of such data in a synchronized manner
is a non-trivial task. Multiple companies offer position data gathering and
interpretation services using real-time location systems (RTLS) technology,
but studies on the accuracy of this data are limited. Table 2 shows a non-
exhaustive list of contemporary RTLS technologies, their accuracy, and
detection range, in different applications.

Table 2: A non-exhaustive list of real-time location systems (RTLS) and applications thus far, along with their specifications (NLoS = no line of sight, LoS = line of sight)

Technology | Application | Dynamic accuracy | Detection range | Transmission frequency range | References
mmWave (5G) | Human pose detection | Up to < 0.02 m @ LoS | > 200 m @ outdoor | 30–300 GHz | [47, 48]
Active radio-frequency identification (RFID) | Indoor location detection of devices and people | 0.9 to 1.6 m @ NLoS | 50–95 m radius, with possible scalability to 1000 m @ outdoors | 433 MHz | [42]
3D-Light Detection and Ranging (LiDAR) | Precision vehicle localization | 0.01 to 0.2 m @ LoS | ~ 200 m @ LoS, both outdoor and indoor | ~ 200 THz | [49]
Wireless Fidelity (Wi-Fi) | Indoor and outdoor positioning for smartphones | 1–3 m @ NLoS | < 200 m @ outdoor and < 60 m indoor under Wi-Fi covered distance | 2.4 to 5 GHz | [42]
Ultrasound | Indoor location | Up to 0.01 m @ LoS and ~ 0.02 m @ NLoS | Up to 10 m @ LoS indoor | 1–20 MHz | [46, 50]
Bluetooth | Real-time indoor positioning | Typically between 2 and 5 m, but can go up to 0.77 m using different signal processing algorithms | Up to 2 m @ NLoS | 2 MHz of width in the 2.4 GHz band | [40, 51]
Ultra-Wide Band (UWB) | Tracking and position detection in sports | Between 0.08 and 0.2 m @ LoS | 40–80 m | 3.1 to 10.6 GHz | [42, 46, 52]
Computer vision | Tracking of ball in sports such as tennis and cricket | Up to 0.05–0.1 m @ 340 fps | N/A | N/A | [40, 53]
Computer vision | Tracking path length of multiple objects | Up to 8.5% error, and under 1 m for marker-based solutions | N/A | N/A | [40, 53]
Global Positioning System (GPS) | Measuring real-time movement of soccer players in a test situation | Up to 1.31 m/s error while measuring velocity and 6.05% error when measuring position @ NLoS | > 100 km outdoor and indoor | 1575.42 MHz and 1227.6 MHz | [54]
Global Navigation Satellite System (GNSS) | Smartphone location | Up to a few centimeters, but unstable | > 100 km outdoor | 1–2 GHz | [55]

The fields of healthcare and sports have, to date, used RTLS and wearable
technologies separately for solving specific problems [43, 45, 54]. The scope
of the application has been limited thus far, but there seems to be a massive
potential for using RTLS in combination with wearable sensor technology.
Previous research has proposed and implemented frameworks in health care
and sports using these technologies separately. Still, there seems to be a lack
of integration of these two technologies.

Body Sensor Network (BSN)
Developments in wearable sensor technology and the improvements in
wired and wireless communication devices have given rise to low-power,
intelligent, miniaturized sensor node networks, also known as wireless body
area networks, body sensor networks, body area networks, etc. (referred to
as BSN henceforth) [56]. The BSNs provide a blueprint for placing sensors
at strategic locations for each individualized application. BSNs have been
proposed in a multi-level fusion framework (MLFF) to monitor soldiers and
help decision-making by using multiple factors such as physiology, emotions,
fatigue, environment, and location. The same MLFFs, which can potentially
measure soldiers’ performance, can be used for sports performance analysis
and health care with minor tweaks [24].
Biomechanical, biometric, and positional data are crucial for
understanding physiology and logistics involved in sports. The data thus
obtained, both in real-time and post-event, has tremendous potential
in knowledge discovery, research and development across sports. This
combination may help decode physiology and logistics in a sport, which
may unlock further avenues for research and development in academics or
applications in industry. Currently, several AI/ML algorithms exist which
have the potential for performing multi-sensor data fusion. These algorithms
can stitch images, perform time series analysis and forecasting, automatic
event detection and classification, anomaly detection, fault detection, etc.
[46–48].

Artificial Intelligence and Machine Learning Algorithms for Multi-sensor Data Analysis
A comprehensive range of tools and techniques for time series analysis
already exist for multidimensional signal processing. The utility has been
demonstrated in applied and fundamental research in physics, biology,
medicine, and economics [57]. A growing number of time series analysis
algorithms have become available as data mining and interpretation tools
due to recent advances in AI and ML. They seem to be excellent tools
capable of handling multivariate data. Table 3 shows a non-exhaustive list of
algorithms used for multiple analogous applications that have the potential
to be used in the problems addressed by the current review.
Table 3: A non-exhaustive list of algorithms used for multi-sensor data modelling

Algorithm | Task | Performance | Category | References
Kernel ensemble random forest classifier with 40 estimators, 8 features at a depth of 15 | Heart disease prediction using daily activity data from multiple sensors | 98% accuracy on testing data | Medical data | [18]
Convolutional neural networks | Fault diagnosis in a planetary gearbox from multi-sensor data | 93% to 99% accuracy on testing data | Machine design | [58]
Long short-term memory artificial neural network | Real-time identification of foot contact and foot off by analyzing gait pattern in children | ~ 95%, with a maximum delay of 3 s in real time | Human motion analysis | [59]
TimeNet pre-trained deep recurrent neural network | Generalized time series classification across multiple datasets | Average accuracy of 83% across various datasets | Generalized solution for series analysis across various domains | [60]
Choquet integral + hidden Markov chain models | Multivariate time series anomaly detection across various data sets | Between 90 and 99%, depending on the chosen dataset | Anomaly detection | [61]
Convolutional neural networks | Real-time skeletal posture estimation using mm-wave radar | Localization error of 3.2 cm for x, 2.7 cm for y, and 7.5 cm for z | Human motion capture | [39]
Principal component analysis + Toeplitz inverse-covariance clustering | Multivariate time series analysis for identification of recurring events in smart manufacturing | Performs best across multiple performance metrics (F1, precision, Rand index, etc.) | Automatic event detection | [62]
K-nearest neighbors | Method for recognition of the physical activity of human beings using a wearable accelerometer | 78.9% accuracy | Activity recognition | [63]
Support vector machines | Fall detection on mobile phones using features from a five-phase model | Recall 90% and precision 95.7% | Activity recognition / fall detection | [64]
Artificial neural networks | An alternative to traditional fall detection methods | Sensitivity 0.984, specificity 0.986 | Activity recognition / fall detection | [65]
Bayesian sequential analysis and multilayer perceptron | Contamination event detection with multivariate time-series data in agricultural water monitoring | Average detection rate > 80% | Water contamination event detection | [66]
Fisher's linear discriminant | Detecting stress during real-world driving tasks using physiological sensors | Accuracy of 97.4% | Stress level detection | [67]
Correlation-based feature selection with random forest classifier | Automated epileptic seizure detection | Average accuracy of 98.45% | Medical diagnosis | [68]

A large number of algorithms are continuously being developed in the
field of AI/ML-based time series analysis. Multiple libraries in Python and
R exist along with open-source repositories on GitHub. There are various
tools available for analyzing and interpreting multivariate data acquired
from multiple sensors [18, 19, 58]. However, multiple sensors in different
fields seem to be univariate or exist only as theoretical frameworks. There
appears to be limited applications of using continuous, time-synchronized
physiological vitals and RTLS data in combination with AI/ML algorithms
both, in sports and healthcare.
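Many of the approaches in Table 3 follow the same pattern: summarize each window of multivariate sensor data into a feature vector and train a supervised classifier on the labeled windows. The sketch below illustrates that pattern in Python with scikit-learn; the synthetic data, feature count, and hyperparameters are illustrative assumptions, not the configurations used in the cited studies.

```python
# A minimal sketch of window-level classification of multivariate sensor
# data with a random-forest ensemble, loosely echoing the first row of
# Table 3. Data shapes and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Assume 500 sensor windows, each summarized by 8 features
# (e.g., per-sensor means and variances), with a binary event label.
X = rng.normal(size=(500, 8))
y = rng.integers(0, 2, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 40 estimators at depth 15 mirror the Table 3 entry; in practice these
# would be tuned per dataset.
clf = RandomForestClassifier(n_estimators=40, max_depth=15, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

With real data, the windows would come from the collection and preprocessing stage described in the next section rather than from a random generator.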
The present review aims to highlight available tools in the fields of wearable
biosensors, RTLS, and AI/ML. Furthermore, the authors propose AIBSNF,
a BSN framework that collects continuous multivariate physiological
and live location data through a mashup of RTLS and wearable sensor
technology in a potentially time-synchronized manner. AIBSNF provides the
blueprint for collecting such data, which is ideal for knowledge discovery
through AI/ML algorithms. The authors of the current review highlight the
framework’s application in two widely distinct scenarios, viz. team sports
(tackle in football) and healthcare diagnostics (monitoring and research in
patients with rheumatoid arthritis or osteoarthritis).
ARTIFICIAL INTELLIGENCE-BASED BODY SENSOR NETWORK FRAMEWORK: AIBSNF
The selection of the sensors and placement on the body is crucial due to the
limitations on the number of sensors placed on a single person. Furthermore,
the sampling rate for different sensors may be different, which needs to be
considered depending on the application at hand. In Table 4, we suggest an
example BSN framework and the strategic placement of selected sensors,
based on successful past applications, that fits into the conceptual framework
shown in Fig. 2.

Table 4: List of chosen sensors and their possible placement based on previous
applications and prior studies conducted

Chosen Sensor | Measurement Of | Placement on the Body | Successful Past Applications | References
Inertial Measurement Unit (IMU) | Acceleration of the limbs (angular and linear) | Center of mass, wrists, feet (as insoles) | Clinical instrumentation, falls management, identification of pathologic motor features, etc. | [69]
 | | | Injury prevention, load assessment, performance coaching tool, automatic event detection in multiple sports, etc. | [10]
Electrocardiogram (ECG) | Detailed electrical activity of the heart, heart rate, heart rate variability, etc. | On the chest | Abnormal findings in ST and T waves in patients with rheumatoid arthritis | [70]
 | | | Detection of unusual heart electric field parameters in type 1 and 2 diabetic patients | [71]
 | | | Differentiating pathological vs. physiological abnormalities in athletes to assess their vulnerability to sudden cardiac death | [72]
Electromyograph (EMG) | Electrical activity in the muscle in question | Quadriceps, glutes, calves, hamstrings, back, abdomen, etc. (can vary based on the problem at hand) | Assessing muscle activity levels in the elderly, patients with neurological disorders, and the injured | [69]
 | | | Injury recovery, analysis of muscle activation patterns, synergies in muscle chains, in athletes during sport-specific movements | [10]
RTLS tag | Position of the person in question, from a set reference point | Center of mass | Tracking of patients, medical staff, and medical assets in a hospital | [73]
 | | | Physical load, real-time position data acquisition, tactical analysis in team sports, coaching and strategy development | [7, 74]
Force Plate | Pressure distribution on individual feet | Feet (as insoles) | Gait, biofeedback interventions in stroke patients to improve balance and mobility | [69]
 | | | Gait analysis of athletes for technique and performance optimization | [10]

The accuracy of the abovementioned sensors varies considerably, and the
state of the art is constantly changing due to improvements in technology
and post-collection signal processing techniques.
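Because the sensors listed above sample at very different rates (an ECG at hundreds of hertz, an RTLS tag at a few hertz), their streams must be aligned to a common timeline before joint analysis. The sketch below shows one way to do this in Python with pandas using nearest-timestamp alignment; the column names, sampling rates, and tolerance are illustrative assumptions rather than prescriptions of the framework.

```python
# A minimal sketch of time-synchronizing two sensor streams that were
# sampled at different rates. All names and rates are assumptions.
import pandas as pd

# An ECG stream at 250 Hz and an RTLS position stream at 10 Hz.
ecg = pd.DataFrame({
    "ts": pd.date_range("2021-01-01", periods=2500, freq="4ms"),
    "ecg_mv": 0.0,
})
rtls = pd.DataFrame({
    "ts": pd.date_range("2021-01-01", periods=100, freq="100ms"),
    "x_m": 0.0,
    "y_m": 0.0,
})

# Attach to each ECG sample the most recent RTLS fix within 100 ms.
fused = pd.merge_asof(ecg, rtls, on="ts", direction="backward",
                      tolerance=pd.Timedelta("100ms"))
print(fused.head())
```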

Figure 2: The conceptual framework (AIBSNF) for building a knowledge discovery pipeline.
Figure 3 illustrates the whole proposed framework using sample
sensors, their ideal placement, data gathering and preprocessing (such as
synchronizing the data), and the use of time series analysis algorithms for
the specific use case of sport-specific event detection.
Figure 3: Example framework with selected BSN for continuous monitoring of chronic diseases and sports-specific event detection.

SPECIFIC APPLICATIONS

Monitoring and Research in Patients with Rheumatoid Arthritis or Osteoarthritis
Rheumatoid arthritis (RA) is a chronic inflammatory autoimmune disorder
that affects the joints but can also cause damage to other systems such as skin
and lungs. It is projected that over 78 million adults will be diagnosed with
arthritis in the US alone. Among those with arthritis, one in four
have movement and working limitations. Furthermore, adults with arthritis
were shown to be 2.5 times as likely to have fallen compared with healthy
adults (https://www.cdc.gov/arthritis/data_statistics/arthritis-related).
Multiple studies have explored compliance rates, as compared with the
gold standard, in patients with knee osteoarthritis performing rehabilitation
exercises. This was done using IMU sensors for feedback and monitoring,
resulting in varying degrees of measurement accuracy.
The studies concluded that wearable technology for assessing rehabilitation
performance is a viable solution with room for potential improvements in
measurement accuracy and compliance with the gold standard [75, 76].
Furthermore, ECG abnormalities have been detected in patients with RA,
and there is a need to explore this area further [70].
AIBSNF can be used to continuously monitor individuals with arthritis.
According to previous research, there are observed reductions in stride
length, in EMG activation of specific muscles, and abnormalities in the ECG
pattern, all known markers of the progression of arthritis [70, 71]. Interventions
could potentially be timed to manage attrition in RA patients using medical
or nonmedical information, such as exercise prescription and changes
in medication. Furthermore, the data collected from such continuous
monitoring can be used by doctors, researchers, and other health professionals
to measure the efficacy of, and compliance rates with, the intervention.

Example: Automatic Detection of a Tackle in the Game of Football
With an increasing number of teams making decisions based on data analytics,
there is a requirement for gathering and interpreting large amounts of
high-fidelity data in sport-specific scenarios. Automatic event detection
and accurate position data are both crucial for this purpose. Figure 4
outlines a procedure in which ECG, EMG, gyroscopes, and RTLS tags have
been placed at strategic locations on players from opposing teams and on
the ball. All these sensors collect real-time data that convey the biomechanical
and vital information of the players, as well as the ball's position. The scenario
in Fig. 4 is a tackle, which has a unique visual fingerprint: the player
without the ball (player 2) is on the ground with one leg extended, trying to
reach the ball, while player 1, who has possession of the ball, is trying to
dribble past.

Figure 4: The scenario of conducting a tackle in the sport of football, where both players are fitted with a body sensor network.
Human experts can identify a tackle when they see it, primarily because
the physical interaction between two players is unique to the tackle itself.
When this information is digitized using the BSN, the physics of the ball
and the biomechanics of the players, combined with the location data of all
three parties involved, can be recorded. These data can be used for automatic
sports-specific event and lower limb movement detection [77, 78]. A time-series
clustering and classification algorithm can potentially identify all sets of
tackles automatically. Another advantage of tracking ECG, EMG, and RTLS
data is that the physical load on the cardiovascular system and on individual
muscles can also be tracked. When done on an ongoing basis, there is
potential to avoid injury, assess the preparedness of an athlete, and find new
correlations in a host of technical and tactical components of the game in
real time [7, 74].
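As a rough illustration of how such a time-series clustering step might surface candidate tackles, the sketch below summarizes fused BSN frames into fixed-length windows and clusters them, treating rare clusters as candidates for expert verification. The feature columns, window length, and cluster count are illustrative assumptions, not part of the proposed framework.

```python
# A minimal sketch of unsupervised candidate-event detection over fused
# BSN features. Feature semantics and parameters are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Assumed per-frame features at 10 Hz: player-ball distance,
# acceleration magnitude, and an EMG envelope.
frames = rng.normal(size=(6000, 3))

win = 20  # two-second, non-overlapping windows at 10 Hz
windows = np.array([frames[i:i + win].ravel()
                    for i in range(0, len(frames) - win, win)])

# Cluster the windows; the rarest cluster is flagged as a candidate
# event class for a human expert or a supervised classifier to verify.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(windows)
counts = np.bincount(labels)
print("candidate cluster:", counts.argmin(), "with", counts.min(), "windows")
```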
The same methodology can potentially be applied across multiple
individual and team sports to identify a wide variety of events. Automatic
event identification has been proposed in previous studies, but further
research is warranted due to the low reliability and validity of existing
approaches [37, 77]. The current BSN plus time series analysis framework would
potentially prove invaluable for multiple applications in sports, including
but not limited to analyzing technique, coaching, self and opponent analysis,
tactical analysis, talent identification, player selection, recruitment etc. [7,
74, 79]. Furthermore, broadcasting agencies can use such data to provide
visualizations and real-time information breakdown for live sporting events.
This may help enrich the ordinary viewer’s experience, providing them
with in-depth information from an expert’s perspective. Health performance
tracking is another up-and-coming field due to the rise of high-quality,
low-cost sensors. AIBSNF can be potentially used to build biofeedback
mechanisms for continuous health tracking for a whole host of applications
such as sleep and recovery tracking, personalized training programs for
strength, mobility, cardiovascular endurance, and even ergonomic posture
feedback in workplaces [56].

GENERAL APPLICATIONS

Smartphone Applications and Wearables in Fitness and Health Tracking
Due to properties such as utility, portability, and high computing power, the
smartphone has become an ideal tool for collecting a wide variety of data,
including health-related parameters, in everyday life [64]. Using technologies
such as Wi-Fi and Bluetooth, these devices provide a perfect platform on
which to build data transmission systems for a wide range of sensors in a
simplified manner. The collection of data can be achieved by the smartphone
itself with embedded sensors (e.g., a triaxial accelerometer) or with the use of
external sensors (e.g., ECG, EMG) in wearables [80]. Furthermore, the
smartphone can potentially act as a data transmitting and processing tool
that collects data and interacts with it. These properties, facilitating input,
output, and interactive operations, make the smartphone an important part
of data management systems [21]. Various smartphone applications
combine data transmitted from different sensors, which can be analyzed both
in real time and post collection [81].
Mashup data collection tools have been applied in several studies such as
the “Physiodroid study” [82]. Using external sensors and a smartphone for
collection and computation, the team simultaneously analyzed ECG, heart
rate, respiration, acceleration and skin temperature. Data was transmitted
into an app where the patients and the clinicians had insights into the
processed data. Such data can prove useful for detecting emergency events
and has been shown to reduce anxiety in patients [82].
Besides the medical sector, ordinary users can profit from such integrated
systems when applied in individual sports. By connecting data on, for
example, muscle oxygen levels with heart rate, there is potential for a
high-fidelity view of the condition of the body during training sessions, which
could be useful for managing physical load and intensity at a personalized level.

Applications in Healthcare Diagnostics and Research


A set of clinical case studies performed with patients on the autism spectrum
compared the use of multivariate and univariate data using ML methods, to
demonstrate the utility of multivariate analysis. The authors concluded that
multivariate analysis techniques seem crucial for analyzing data collected
from biological networks [83]. Furthermore, most diseases seem to have
multiple causes, and prognosis is usually determined by several factors.
Multivariable analysis allows accounting for the multifaceted nature of risk
factors and their relative contributions to the outcome. Hence, it seems
advantageous to gather multivariable data for diagnosis and for designing
probable interventions [84]. In line with previous research, the proposed
AIBSNF framework can be integrated into textile-based wearables [29]. These
can potentially be self-powered, e.g., using recent advances in piezoelectric
technology to harvest energy from biomechanical movement [27, 85].
The wearables can also be extended to smart prosthetics and other assistive
technologies [86].

Applications in Alternative Medicine


‘Pulse diagnosis’ is a diagnostic technique used in Ayurveda, traditional
Chinese medicine (TCM), and other alternative therapies. Practitioners of
these alternative medicine fields use the pulse to understand pathological
changes in internal organs [87, 88]. Devices capable of quantifying the
pulse via multi-sensor information using ECG, ultrasound imaging, pressure
impedance blood flow, and volume pulse have been developed [89]. Furthermore,
pulse diagnosis has also been conducted by measuring skin impedance at
acupoints using a photoplethysmography sensor, galvanic skin response
(GSR), and a smartphone. This was done to diagnose a condition called wiry
pulse in TCM with an accuracy of above 90 percent [90]. Research in areas
that combine modern technologies to quantify and validate ancient medicine
practices seems to require multivariate vitals data and its analysis. AIBSNF,
with slight modifications, may prove valuable for conducting validation
studies and developing alternative medicine technology.

Playing Field for Artificial Intelligence and Machine Learning


Chess has been used as an ideal scenario to push AI/ML research forward.
This is because each piece has a strict set of rules as to where it may move,
and performance in chess is highly objective. This allows the digitization of
all moves performed, which can then be studied using probabilistic
mathematical modeling, making chess ideal for training agents via
reinforcement learning [91]. Using the proposed BSN and RTLS technology,
real-world sports can be digitized in the same manner. However, players in
sports do not follow rules as strict as those in chess; sports are contested
under sets of rules that are liberal compared with those of chess. This, when
digitized, can potentially provide a vast playground for AI/ML algorithms.
It may offer a multitude of insights into specific sports and may also help
move the field of reinforcement learning forward.

LIMITATIONS AND ISSUES


Although there is enormous potential for the use of BSNs, several
challenges still exist for their implementation. On the technology side, the
wearable sensors developed are comparatively new and collect data at
different frequencies. They also lack benchmarking and approval for use
in the medical industry [21]. The energy requirements of various sensors
differ, and there exist limitations such as data collection time (due to the
size and shape of the battery), compliance of the user, and the physical
impact of sensor operations. Hence, designing effective BSNs for solving
problems requires a comprehensive set of domain-specific and technological
knowledge [92]. Furthermore, on the analytics side, there still exist
challenges such as the availability of computational power, high dimensionality,
noise, and variable latency from different sensors in the case of real-time
data collection [21]. The field of AI/ML, although advanced, is still in its
infancy with respect to discovering causal relationships. Hence, in healthcare
applications, care must be taken before interpreting output from AIBSNF
as causal inference or diagnosis.
Data security is another major issue in the development and use of
AIBSNF. Due to the intimate nature of the collected data, there need to be
mechanisms ensuring strict data protection at all proposed pipeline levels
[56, 92]. Both the BSN connected to a user and the data storage location may
be significantly threatened by outside attacks, and data security risks may be
even higher for attacks from an entity with inside access to the data [22]. These
issues are not unique to the BSN framework but apply to big data in general. However,
due to the nature of the data collected in the applications mentioned in
the current review, the threat may be amplified in terms of confidentiality,
integrity, and information privacy [93].
In combination with data, the use of AI algorithms as independent
decision-making tools is a matter of great debate in the healthcare sector
due to privacy concerns [94]. In the context of healthcare, AI can be
categorized in two different ways: the first is as a diagnostic tool, where
the decision-making and ethics lie with the human user of the tool; the
second is as an independent decision maker or non-biological healthcare
professional [94]. The scope of the current paper lies within the former
category. A recent review outlines the opportunities and drawbacks of using
AI in personalized medicine. It concludes that a multidisciplinary, public
discussion is necessary to define the principles, ethics, and social guidelines
for using AI in healthcare, given the regulation sector's limited expertise in
the field of concern [95].
CONCLUSION
The fields of RTLS and wearable biosensors are rapidly developing. There
has been tremendous progress in improving accuracy, validity, and reliability
in the sport and healthcare industries over the past decade [22, 40]. It is safe
to assume that they will continue to improve with the further layering of AI/
ML techniques. AIBSNF seems to be ideally positioned to take advantage
of the improvements in all these fields. It has the potential to impact a wide
range of research and development activities in multiple industries. Due
to the rapid pace of this development, numerous technological challenges
exist. Identifying the right sensors, and mashing them up successfully at
appropriate sample rates for time-synchronous data gathering, is possible
but challenging. Data protection at each level of collection, use of the right
algorithms, availability of computational power, and data science expertise
are needed to successfully implement such technology on a commercial scale.
The fields of sports and healthcare seem to be ideal areas where the proposed
mashup technologies can be of significant benefit. AIBSNF provides a high-
level understanding of how these technologies can be combined to develop
applications in multiple fields. However, there still exists a whole host of
technical challenges specific to the application domain. Further research and
development are required for the successful application of AIBSNF in the
highlighted industries.

ACKNOWLEDGEMENTS
The authors of the paper would like to acknowledge Ms Maithili Phatak for
her contribution to the artwork in the current paper. The authors would also
like to acknowledge the contributions of the management staff and language
correction team of the Institute of Exercise Training and Sport Informatics
at the German Sports University, Cologne.

AUTHORS’ CONTRIBUTIONS
AP: Primary author, writing and overall research, F-GW: Writing and
reviewing of the scenarios and the proposed framework, KV: Researching
and writing time series algorithms section, FV: Discussion and conclusion
writing, DM: Supervisor, contributions in introduction, sports applications
and framework development. All authors read and approved the final
manuscript.
REFERENCES
1. Harari YN. Homo Deus: a brief history of tomorrow. Random House;
2016.
2. Rajšp A, Fister I. A systematic literature review of intelligent data
analysis methods for smart sport training. Appl Sci. 2020;10:3013. doi:
10.3390/app10093013.
3. Roy R, Paul A, Bhimjyani P, Dey N, Ganguly D, Das AK, et al. A
short review on applications of big data analytics. In: Mandal JK,
Bhattacharya D, et al., editors. Emerg technol model graph. Singapore:
Springer; 2020. pp. 265–278.
4. Claudino JG, Cardoso Filho CA, Boullosa D, Lima-Alves A, Carrion
GR, GianonI RL da S, et al. The role of veracity on the load monitoring
of professional soccer players: a systematic review in the face of the
big data era. Appl Sci. 2021;11:6479.
5. Cottle M, Hoover W, Kanwal S, Kohn M, Strome T, Treister NW.
Transforming health care through big data: strategies for leveraging
big data in the health care industry. Inst. Heal. Technol. Transform. -
iHT. 2013.
6. MacLennan T. Moneyball: The Art of Winning an Unfair Game. J Pop
Cult. 2005;
7. Rein R, Memmert D. Big data and tactical analysis in elite soccer:
future challenges and opportunities for sports science. Springerplus.
2016;5:1–13. doi: 10.1186/s40064-016-3108-2.
8. Raghupathi W. Data Mining in Health Care. [Internet]. 1st ed. Healthc.
Informatics Improv. Effic. Product. Taylor & Francis; 2010. https://
www.taylorfrancis.com/books/e/9780429131059
9. Claudino JG, Capanema D de O, de Souza TV, Serrão JC, Machado
Pereira AC, Nassis GP. Current approaches to the use of artificial
intelligence for injury risk assessment and performance prediction in
team sports: a systematic review. Sports Med Open Sports Med Open;
2019. p. 1–12.
10. Taborri J, Keogh J, Kos A, Santuz A, Umek A, Urbanczyk C, et al. Sport
biomechanics applications using inertial, force, and EMG sensors: a
literature overview. Appl Bionics Biomech. 2020;2020.
11. Vijayakumar V, Nedunchezhian R. A study on video data mining. Int J
Multimed Inf Retr. 2012;1:153–172. doi: 10.1007/s13735-012-0016-2.
12. Bialkowski A, Lucey P, Carr P, Yue Y, Sridharan S, Matthews I. Large-
scale analysis of soccer matches using spatiotemporal tracking data.
In: Proceedings of the IEEE international conference on data mining,
ICDM. 2015;2015-Janua:725–30.
13. Bunker RP, Thabtah F. A machine learning framework for sport result
prediction. Appl Comput Inform. 2019;15:27–33. doi: 10.1016/j.
aci.2017.09.005.
14. Colyer SL, Evans M, Cosker DP, Salo AIT. A review of the evolution
of vision-based motion analysis and the integration of advanced
computer vision methods towards developing a markerless system.
Sport Med Open.; 2018;4. https://sportsmedicine-open.springeropen.
com/articles/10.1186/s40798-018-0139-y
15. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise
and potential. Heal Inf Sci Syst. 2014
16. Wang Y, Kung LA, Byrd TA. Big data analytics: understanding
its capabilities and potential benefits for healthcare organizations.
Technol Forecast Soc Change. 2018;126:3–13. doi: 10.1016/j.
techfore.2015.12.019.
17. Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat
Biomed Eng. 2018;2:719–731. doi: 10.1038/s41551-018-0305-z.
18. Muzammal M, Talat R, Sodhro AH, Pirbhulal S. A multi-sensor
data fusion enabled ensemble approach for medical data from body
sensor networks. Inf Fusion. 2020;53:155–164. doi: 10.1016/j.
inffus.2019.06.021.
19. Dong J, Zhuang D, Huang Y, Fu J. Advances in multi-sensor data
fusion: algorithms and applications. Sensors. 2009;9:7771–7784. doi:
10.3390/s91007771.
20. Gao W, Emaminejad S, Nyein HYY, Challa S, Chen K, Peck A, et
al. Fully integrated wearable sensor arrays for multiplexed in situ
perspiration analysis. Nature. 2016;529:509–514. doi: 10.1038/
nature16521.
21. Mukhopadhyay SC. Wearable sensors for human activity monitoring:
a review. IEEE Sens J. 2015;15:1321–1330. doi: 10.1109/
JSEN.2014.2370945.
22. Kim J, Campbell AS, de Ávila BEF, Wang J. Wearable biosensors
for healthcare monitoring. Nat Biotechnol. 2019;37:389–406. doi:
10.1038/s41587-019-0045-y.
23. Jeong IC, Bychkov D, Searson PC. Wearable devices for precision
medicine and health state monitoring. IEEE Trans Biomed Eng IEEE.
2019;66:1242–1258. doi: 10.1109/TBME.2018.2871638.
24. Shi H, Zhao H, Liu Y, Gao W, Dou SC. Systematic analysis of a
military wearable device based on a multi-level fusion framework:
research directions. Sensors (Switzerland) 2019;19:2651. doi: 10.3390/
s19122651.
25. Seshadri DR, Li RT, Voos JE, Rowbottom JR, Alfes CM, Zorman
CA, et al. Wearable sensors for monitoring the physiological and
biochemical profile of the athlete. NPJ Digit Med. 2019;2:1–16. doi:
10.1038/s41746-019-0150-9.
26. Homayounfar SZ, Andrew TL. Wearable sensors for monitoring
human motion: a review on mechanisms, materials, and challenges.
SLAS Technol. 2020;25:9–24.
27. Zhou H, Zhang Y, Qiu Y, Wu H, Qin W, Liao Y, et al. Stretchable
piezoelectric energy harvesters and self-powered sensors for wearable
and implantable devices. Biosens Bioelectron. 2020;168:112569. doi:
10.1016/j.bios.2020.112569.
28. Dinh T, Nguyen T, Phan HP, Nguyen NT, Dao DV, Bell J. Stretchable
respiration sensors: Advanced designs and multifunctional platforms
for wearable physiological monitoring. Biosens Bioelectron.
2020;166:112460. doi: 10.1016/j.bios.2020.112460.
29. Heo JS, Eom J, Kim YH, Park SK. Recent progress of textile-based
wearable electronics: a comprehensive review of materials, devices,
and applications. Small. 2018;14:1–16. doi: 10.1002/smll.201703034.
30. Moran DS, Mendal L. Core temperature measurement: methods and
current insights. Sport. Med. 2002.
31. Rice P, Upasham S, Jagannath B, Manuel R, Pali M, Prasad S. CortiWatch:
watch-based cortisol tracker. Futur Sci OA. 2019;5:FSO416.
32. Wen W, Tomoi D, Yamakawa H, Hamasaki S, Takakusaki K, An Q, et
al. Continuous estimation of stress using physiological signals during
a car race. Psychology. 2017;6:978–86. https://www.researchgate.net/
publication/317012834_Continuous_Estimation_of_Stress_Using_
Physiological_Signals_during_a_Car_Race
33. Chu M, Nguyen T, Pandey V, Zhou Y, Pham HN, Bar-Yoseph R, et
al. Respiration rate and volume measurements using wearable strain
sensors. NPJ Digit Med. 2019;2:1–9. doi: 10.1038/s41746-019-0083-3.
34. Imani S, Bandodkar AJ, Mohan AMV, Kumar R, Yu S, Wang J, et al. A
wearable chemical-electrophysiological hybrid biosensing system for
real-time health and fitness monitoring. Nat Commun. 2016;7:1–7. doi:
10.1038/ncomms11650.
35. Taelman J, Adriaensen T, Van Der Horst C, Linz T, Spaepen A. Textile
integrated contactless EMG sensing for stress analysis. In: Annu Int
Conf IEEE Eng Med Biol Proc. 2007. p. 3966–9.
36. Lin R, Kim HJ, Achavananthadith S, Kurt SA, Tan SCC, Yao H, et al.
Wireless battery-free body sensor networks using near-field-enabled
clothing. Nat Commun. 2020;11:1–10.
37. Johnston W, O’Reilly M, Argent R, Caulfield B. Reliability, validity
and utility of inertial sensor systems for postural control assessment
in sport science and medicine applications: a systematic review. Sport
Med. 2019;49:783–818. doi: 10.1007/s40279-019-01095-9.
38. Malik A. RTLS for DUMMIES. Wiley Publ. 2009.
39. Sengupta A, Jin F, Zhang R, Cao S. mm-Pose: real-time human skeletal
posture estimation using mmWave radars and CNNs. IEEE Sens J.
2020;20:10032–10044. doi: 10.1109/JSEN.2020.2991741.
40. Mendoza-Silva GM, Torres-Sospedra J, Huerta J. A meta-review of
indoor positioning systems. Sensors (Switzerland) 2019;19:4507. doi:
10.3390/s19204507.
41. De Silva V, Caine M, Skinner J, Dogan S, Kondoz A, Peter T, et al. Player
tracking data analytics as a tool for physical performance management
in football: a case study from chelsea football club academy. Sports.
2018;6:130. doi: 10.3390/sports6040130.
42. Zhai C, Zou Z, Zhou Q, Mao J, Chen Q, Tenhunen H, et al. A 2.4-GHz
ISM RF and UWB hybrid RFID real-time locating system for industrial
enterprise Internet of Things. Enterp Inf Syst. 2017;11:909–26.
43. Kamel Boulos MN, Berry G. Real-time locating systems (RTLS) in
healthcare: A condensed primer. Int. J. Health Geogr. 2012.
44. Clarinox. Real Time Location Systems. Clarinox.Com. 2009.
45. Ho HJ, Zhang ZX, Huang Z, Aung AH, Lim WY, Chow A. Use of a
real-time locating system for contact tracing of health care workers
during the COVID-19 pandemic at an infectious disease center in
singapore: Validation study. J Med Internet Res. 2020;22.
46. Leser R, Baca A, Ogris G. Local positioning systems in (game) sports.
Sensors. 2011;11:9778–9797. doi: 10.3390/s111009778.
47. Khalid R, Das Gupta R, Alizadeh P. Real-time location sensing
system. 2018. https://patents.google.com/patent/WO2018206934A1/
en
48. Wu T, Rappaport TS, Collins CM. The human body and millimeter-
wave wireless communication systems: interactions and implications.
IEEE Int Conf Commun. 2015. https://ieeexplore.ieee.org/
document/7248688
49. Hsu CM, Shiu CW. 3D LiDAR-based precision vehicle localization
with movable region constraints. Sensors (Switzerland). 2019;19.
50. Luo X, Wang H, Yan S, Liu J, Zhong Y, Lan R. Ultrasonic
localization method based on receiver array optimization schemes.
Int J Distrib Sens Netw. 2018;14. https://journals.sagepub.com/doi/
full/10.1177/1550147718812017
51. Pancham J, Millham R, Fong SJ. Investigation of obstructions and range
limit on bluetooth low energy RSSI for the healthcare environment
[Internet]. Lect. Notes Comput. Sci. (including Subser. Lect. Notes
Artif. Intell. Lect. Notes Bioinformatics). Springer International
Publishing; 2018. http://dx.doi.org/10.1007/978-3-319-95171-3_21
52. Blauberger P, Marzilger R, Lames M. Validation of player and ball
tracking with a local positioning system. 2021;21:3501–9
53. Thomas G, Gade R, Moeslund TB, Carr P, Hilton A. Computer vision
for sports: current applications and research topics. Comput Vis Image
Underst. 2017
54. Bastida Castillo A, Gómez Carmona CD, De la Cruz Sánchez E, Pino
Ortega J. Accuracy, intra- and inter-unit reliability, and comparison
between GPS and UWB-based position-tracking systems used for
time–motion analyses in soccer. Eur J Sport Sci. 2018;18:450–7.
55. Dabove P, Di Pietra V. Towards high accuracy GNSS real-time
positioning with smartphones. Adv Sports Res. 2019;63:94–102. doi:
10.1016/j.asr.2018.08.025.
56. Movassaghi S, Abolhasan M, Lipman J, Smith D, Jamalipour A.
Wireless body area networks: a survey. IEEE Commun Surv Tutorials.
2014;16:1658–1686. doi: 10.1109/SURV.2013.121313.00064.
57. Lacasa L, Nicosia V, Latora V. Network structure of multivariate time
series. Sci Rep. 2015;5:1–9. doi: 10.1038/srep15508.
58. Jing L, Wang T, Zhao M, Wang P. An adaptive multi-sensor data
fusion method based on deep convolutional neural networks for fault
diagnosis of planetary gearbox. Sensors (Switzerland) 2017;17:414.
doi: 10.3390/s17020414.
59. Kidziński Ł, Delp S, Schwartz M. Automatic real-time gait event
detection in children using deep neural networks. PLoS ONE.
2019;14:1–11. doi: 10.1371/journal.pone.0211466.
60. Malhotra P, Vishnu T V., Vig L, Agarwal P, Shroff G. TimeNet: Pre-
trained deep recurrent neural network for time series classification. In:
ESANN 2017—proceedings, 25th Eur Symp Artif Neural Networks,
Comput Intell Mach Learn. 2017
61. Li J, Pedrycz W, Jamal I. Multivariate time series anomaly detection:
a framework of Hidden Markov models. Appl Soft Comput J.
2017;60:229–240. doi: 10.1016/j.asoc.2017.06.035.
62. Kapp V, May MC, Lanza G, Wuest T. Pattern recognition in multivariate
time series: towards an automated event detection method for smart
manufacturing systems. J Manuf Mater Process. 2020;4:88.
63. Adaskevicius R. Method for recognition of the physical activity
of human being using a wearable accelerometer. Elektron ir
Elektrotechnika. 2014;20:127–131. doi: 10.5755/j01.eee.20.5.7113.
64. Shi Y, Shi Y, Wang X. Fall detection on mobile phones using features
from a five-phase model. In: Proceedings of the- IEEE 9th international
conference on Ubiquitous Intell Comput IEEE 9th Int Conf Auton
Trust Comput UIC-ATC 2012. 2012;951–6
65. Vallejo M, Isaza C V., Lopez JD. Artificial neural networks as an
alternative to traditional fall detection methods. In: Proceedings of the
annual international conferene on IEEE Eng Med Biol Soc EMBS.
2013;1648–51
66. Mao Y, Qi H, Ping P, Li X. Contamination event detection with
multivariate time-series data in agricultural water monitoring. Sensors
(Switzerland) 2017;17:1–19.
67. Jimenez AM. Physiological sensor. ProQuest Diss Theses. 2013;139.
http://search.proquest.com/docview/1527176270
68. Mursalin M, Zhang Y, Chen Y, Chawla NV. Automated epileptic
seizure detection using improved correlation-based feature selection
with random forest classifier. Neurocomputing. 2017;241:204–214.
doi: 10.1016/j.neucom.2017.02.053.
69. Porciuncula F, Roto AV, Kumar D, Davis I, Roy S, Walsh CJ, et al.
Wearable movement sensors for rehabilitation: a focused review of
technological and clinical advances. PM R. American Academy of
Physical Medicine and Rehabilitation; 2018;10:S220–32. 10.1016/j.
pmrj.2018.06.013
70. Shenavar Masooleh I, Zayeni H, Haji-Abbasi A, Azarpira M, Hadian
A, Hassankhani A, et al. Cardiac involvement in rheumatoid arthritis:
a cross-sectional study in Iran. Indian Heart J. 2016.
71. Žďárská D, Pelíšková P, Charvát J, Slavíček J, Mlček M, Medová E,
et al. ECG body surface mapping (BSM) in type 1 diabetic patients.
Physiol Res. 2007;56:403–410. doi: 10.33549/physiolres.931021.
72. Abela M, Sharma S. Abnormal ECG findings in athletes: clinical
evaluation and considerations. Curr Treat Options Cardiovasc Med.
2019;21:1–17. doi: 10.1007/s11936-019-0794-4.
73. Gholamhosseini L, Sadoughi F, Safaei A. Hospital real-time location
system (A practical approach in healthcare): a narrative review article.
Iran J Public Health. 2019;48:593–602.
74. Low B, Coutinho D, Gonçalves B, Rein R, Memmert D, Sampaio J.
A systematic review of collective tactical behaviours in football using
positional data. Sport. Med. 2020.
75. Papi E, Osei-Kuffour D, Chen YMA, McGregor AH. Use of wearable
technology for performance assessment: a validation study. Med Eng
Phys. 2015;37:698–704. doi: 10.1016/j.medengphy.2015.03.017.
76. Kobsar D, Osis ST, Boyd JE, Hettinga BA, Ferber R. Wearable sensors
to predict improvement following an exercise intervention in patients
with knee osteoarthritis. J Neuroeng Rehabil. 2017;14:1–10. doi:
10.1186/s12984-017-0309-z.
77. Chambers R, Gabbett TJ, Cole MH, Beard A. The use of wearable
microsensors to quantify sport-specific movements. Sport Med.
2015;45:1065–1081. doi: 10.1007/s40279-015-0332-9.
78. O’Reilly M, Caulfield B, Ward T, Johnston W, Doherty C. Wearable
inertial sensor systems for lower limb exercise detection and evaluation:
a systematic review. Sport Med. 2018;48:1221–1246. doi: 10.1007/s40279-018-0878-4.
79. James N. Notational analysis in soccer: past, present and future. Int J
Perform Anal Sport. 2006;6:67–81. doi: 10.1080/24748668.2006.118
68373.
80. Ali S, Khusro S, Rauf A, Mahfooz S. Sensors and mobile phones:
evolution and state-of-the-art. Pak J Sci. 2014;66:386–400.
81. Gupta A, Chakraborty C, Gupta B. Medical information processing
using smartphone under IoT framework. Energy Conserv; 2019.
10.1007/978-981-13-7399-2_12
82. Lima WS, Souto E, El-Khatib K, Jalali R, Gama J. Human activity
recognition using inertial sensors in a smartphone: an overview.
Sensors (Switzerland). 2019.
83. Vargason T, Howsmon DP, McGuinness DL, Hahn J. On the use of
multivariate methods for analysis of data from biological networks.
Processes. 2017;5:36. doi: 10.3390/pr5030036.
84. Katz MH. Multivariable analysis: a practical guide for clinicians and
public health researchers. Multivariable Anal A Pract. Guid. Clin.
Public Heal. Res. 2011. https://www.cambridge.org/core/books/
multivariable-analysis/DBE7816A781AEF53108FD721199B4AC9
85. Reid RC, Mahbub I. Wearable self-powered biosensors. Curr Opin
Electrochem. 2020;19:55–62. doi: 10.1016/j.coelec.2019.10.002.
86. Khoshmanesh F, Thurgood P, Pirogova E, Nahavandi S, Baratchi S.
Wearable sensors: at the frontier of personalised health monitoring,
smart prosthetics and assistive technologies. Biosens Bioelectron.
2021;176:112946. doi: 10.1016/j.bios.2020.112946.
87. Hajar R. The pulse from ancient to modern medicine: Part 3. Hear
Views. 2018;19:117–20. https://www.ncbi.nlm.nih.gov/pmc/articles/
PMC6448473/
88. Duraisamy R, Dinakar S, Venkittaramanujam V, Jeyakumar V. A
systematic approach for pulse diagnosis based on siddha medical
procedures. In: 2017 4th Int Conf Signal Process Commun Networking,
ICSCN 2017. 2017. https://ieeexplore.ieee.org/document/8085694
89. Zhang J, Niu X, Yang XZ, Zhu QW, Li HY, Wang X, et al. Design and
application of pulse information acquisition and analysis system with
dynamic recognition in traditional Chinese medicine. Afr Health Sci.
2014;14:743–752. doi: 10.4314/ahs.v14i3.34.
90. Lan KC, Litscher G, Hung TH. Traditional chinese medicine pulse
diagnosis on a smartphone using skin impedance at acupoints: a
feasibility study. Sensors (Switzerland) 2020;20:1–14. doi: 10.3390/
s21010001.
91. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, et al.
A general reinforcement learning algorithm that masters chess, shogi,
and Go through self-play. Science. 2018;362:1140–4.
92. Crosby GV, Ghosh T, Murimi R, Chin CA. Wireless body area
networks for healthcare: a survey. Int J Ad Hoc Sens Ubiquitous
Comput. 2012;3:1–26. doi: 10.5121/ijasuc.2012.3301.
93. Mathur A, Gupta CP. Big data challenges and issues: a review. Lect.
Notes Data Eng. Commun. Technol. Springer; 2020. 10.1007/978-3-
030-24643-3_53
94. Kluge EHW. Artificial intelligence in healthcare: ethical
considerations. Healthc Manag Forum. 2020;33:47–49. doi:
10.1177/0840470419850438.
95. Gómez-González E, Gomez E, Márquez-Rivas J, Guerrero-Claro
M, Fernández-Lizaranzu I, Relimpio-López MI, et al. Artificial
intelligence in medicine and healthcare: a review and classification of
current and near-future applications and their ethical and social Impact.
2020. http://arxiv.org/abs/2001.09778
Chapter 17

DAViS: A UNIFIED SOLUTION FOR DATA COLLECTION, ANALYZATION, AND VISUALIZATION IN REAL-TIME STOCK MARKET PREDICTION

Suppawong Tuarob1, Poom Wettayakorn1, Ponpat Phetchai1, Siripong Traivijitkhun1, Sunghoon Lim2,3, Thanapon Noraset1, and Tipajin Thaipisutikul1

1 Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom 73170, Thailand
2 Department of Industrial Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
3 Institute for the 4th Industrial Revolution, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea

ABSTRACT
The explosion of online information with the recent advent of digital
technology in information processing, information storing, information
sharing, natural language processing, and text mining techniques has
enabled stock investors to uncover market movement and volatility from

Citation: (APA): Tuarob, S., Wettayakorn, P., Phetchai, P., Traivijitkhun, S., Lim, S.,
Noraset, T., & Thaipisutikul, T. (2021). DAViS: a unified solution for data collection,
analyzation, and visualization in real-time stock market prediction. Financial Innova-
tion, 7(1), 1-32. (32 pages)
Copyright: © Open Access. This article is licensed under a Creative Commons Attribu-
tion 4.0 International License (http://creativecommons.org/licenses/by/4.0/)
heterogeneous content. For example, a typical stock market investor reads
the news, explores market sentiment, and analyzes technical details in order
to make a sound decision prior to purchasing or selling a particular company’s
stock. However, capturing a dynamic stock market trend is challenging
owing to high fluctuation and the non-stationary nature of the stock market.
Although existing studies have attempted to enhance stock prediction, few
have provided a complete decision-support system for investors to retrieve
real-time data from multiple sources and extract insightful information
for sound decision-making. To address the above challenge, we propose a
unified solution for data collection, analysis, and visualization in real-time
stock market prediction to retrieve and process relevant financial data from
news articles, social media, and company technical information. We aim to
provide not only useful information for stock investors but also meaningful
visualization that enables investors to effectively interpret storyline events
affecting stock prices. Specifically, we utilize an ensemble stacking of
diversified machine-learning-based estimators and innovative contextual
feature engineering to predict the next day’s stock prices. Experiment
results show that our proposed stock forecasting method outperforms
a traditional baseline with an average mean absolute percentage error of
0.93. Our findings confirm that leveraging an ensemble scheme of machine
learning methods with contextual information improves stock prediction
performance. Finally, our study could be further extended to a wide variety
of innovative financial applications that seek to incorporate external insight
from contextual information such as large-scale online news articles and
social media data.

INTRODUCTION
Stock market prediction has become a prominent research topic for
both researchers and investors due to its important role in the economy and
obvious financial benefits. There is an urgent need to uncover the stock
market’s future behavior in order to avoid investment risks while achieving
the best profit margins for investments. Nevertheless, stock market decision-
making is difficult due to the stock market’s complex behavior and unstable
nature. Accurate prediction is even more challenging considering the need
to forecast the local stock market in different countries (Wu et al. 2019;
Selvamuthu et al. 2019; Gopinathan and Durai 2019) since there are unique
cultures, different norms, and diverse heterogeneous sources that can affect
investors’ decision-making processes. Therefore, we take the Thai stock
market as an empirical study to demonstrate how to improve stock prediction
performance locally.
Despite the high prevalence of existing stock prediction approaches,
there are several challenges that we need to consider when designing
a practical end-to-end stock prediction framework to tackle the dynamic
nature of the stock market (Hu et al. 2015).

CH1: There are heterogeneous sources of stock information, as shown in Fig. 1.
The large amount of data generated by Internet users is considered
a treasure trove of knowledge for investors. Real-time data collection
and analysis are needed to explore and evaluate this enormous amount of
valuable data. This process is an essential step toward enhancing stock
prediction performance, since stock market circumstances are known to
fluctuate frequently; decisions made with even minutes-old data can turn
out to be poor ones.

Figure 1: An example of the heterogeneous sources of stock information.


Hence, when proposing an end-to-end framework for stock prediction,
the most current data from all important information sources should be
collected in real-time to obtain a full and accurate picture of the most updated
status of stock companies.
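One simple way to realize such proactive, real-time collection is a periodic polling loop that pulls the latest records from each source. The sketch below illustrates this in Python; the endpoint URL, symbol, and payload format are hypothetical placeholders, not actual DAViS or exchange APIs.

```python
# A minimal sketch of a proactive polling collector. The endpoint and
# parameters are hypothetical placeholders for a real data source.
import time
import requests

ENDPOINT = "https://example.com/api/quote"  # hypothetical quote source

def poll_quotes(symbol: str, interval_s: float, rounds: int) -> None:
    """Fetch the latest quote for `symbol` every `interval_s` seconds."""
    for _ in range(rounds):
        resp = requests.get(ENDPOINT, params={"symbol": symbol}, timeout=10)
        resp.raise_for_status()
        record = resp.json()   # assumed JSON payload
        print(record)          # in practice, persist to a data store
        time.sleep(interval_s)

if __name__ == "__main__":
    poll_quotes("KBANK", interval_s=5.0, rounds=2)
```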
CH2: Heterogeneous behaviors need to be encoded for investors to understand stock trends, as shown in Fig. 2.

Figure 2: An example of heterogeneous behavior encoding related to stock prediction.
Since financial information available on the Internet comes in a wide
variety of types, sources, temporal dimensions, and thematic scales, it would
be unfeasible to assemble all the information available. According to reports
from Bomfim (2003), there is strong evidence that news articles, social media
contents, and general discussion contents are often important indicators of
market movements. Moreover, they are critical for investors to keep abreast
of the collective expectation and various reports or predictions on stock
prices. For instance, as presented in Fig. 2, we can observe that there is a
strong causal relationship between news and KBank stock price. Before the
price suddenly dropped on March 27, 2018, there is a news announcement,
“KBank waives fees for digital banking.” Similarly, when the news headline
“Scrapped digital fees to hit banking income” was released, an increase in
the stock price was observed on the following date. Therefore, we should
include these contextual data in the system.
CH3: Effective integration of machine learning approaches is needed for stock prediction, as shown in Fig. 3.

Figure 3: An example of a dynamic ensemble model for stock prediction.


Each machine learning approach is designed to cope with different
data types and goals. One technique may outperform another on a specific
dataset and vice versa. For example, Zhong and Enke (2019b) used deep
neural networks (DNNs) with Principal Component Analysis (PCA) to
classify the stock return daily. Seker et al. (2013) utilized the combination
of k-NN, C4.5, and SVM approaches to classify stock market trends. The
effective integration of various machine learning algorithms should provide
a dynamic method to adjust the impact of each standalone model in the final
predictions with respect to the different types of stock datasets.
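A minimal sketch of such an integration, written here with scikit-learn's stacking ensemble, is shown below. The particular base learners and the synthetic data are illustrative assumptions; the point is that a meta-regressor learns how much weight to give each standalone model for a given dataset.

```python
# A minimal sketch of a stacked ensemble: base regressors combined by a
# meta-regressor that learns their relative impact. Learners and data
# are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))  # e.g., lagged prices plus text features
y = rng.normal(size=300)        # e.g., next-day closing price

stack = StackingRegressor(
    estimators=[("knn", KNeighborsRegressor()),
                ("svr", SVR()),
                ("rf", RandomForestRegressor(random_state=0))],
    final_estimator=Ridge())    # the meta-regressor
stack.fit(X, y)
print(stack.predict(X[:3]))
```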

CH4: There is a need for comprehensive visualization of decision-support information along with stock prediction results, as shown in Fig. 4.

Figure 4: An example of a dynamic ensemble model for stock prediction.


Although stock market predictions help investors in their investment
decisions by offering them clear insights into stock market behavior so
as to avoid investment risks, we found that there is a need to include relevant
and useful information along with the stock prediction results. This enables
investors to better understand the current situation of a particular stock,
such as the cause and effect of a rising or declining trend in their prospective
investment.
In order to cope with the aforementioned challenges, in this study, we
propose a unified solution for data collection, analyzation, and visualization
in real-time stock market prediction, DAViS. DAViS is an end-to-end
framework for a real-time stock decision support system that not only
provides real-time stock data analysis, using simple yet effective ensemble
techniques with contextual text data but also delivers an easy-to-interpret
visualization of information related to a particular stock. In particular, there
are three main components of DAViS: DAViS-C, DAViS-A, and DAViS-V.
(1) DAViS-C is designed to collect and pre-process various types of stock
data into the proper format in vectorization form. Specifically, the various
types of stock and contextual data are converted into one-dimensional
vectors in the latent space by utilizing the techniques of tokenization, stop-
word removal, term frequency-inverse document frequency (TF-IDF),
feature agglomeration, and principle component analysis (PCA). The final
vector output of DAViS-C module then becomes an input of the DAViS-A
module for further analysis. (2) DAViS-A is designed to predict the future
stock price based on historical stock price data and contextual text data from
news, discussion boards, and social media. An ensemble machine learning
approach is used to leverage the benefit and strengthen each standalone
machine learning model on a specific stock dataset. Specifically, the adaptive
ensemble approach called meta-regressor is employed to combine multiple
machine learning prediction results while learning to estimate the impact
of different models. (3) DAViS-V is designed to provide not only useful
information for stock investors, such as future predicted stock prices, but
also meaningful visualization that allows investors to effectively interpret
storyline events affecting stock prices.
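A minimal sketch of the DAViS-C-style text vectorization follows, assuming scikit-learn and a three-document toy corpus; it chains tokenization and stop-word removal (handled inside TfidfVectorizer), TF-IDF scoring, and a PCA projection onto a low-dimensional latent space.

```python
# A minimal sketch of the vectorization pipeline described for DAViS-C.
# The toy corpus and dimensionality are illustrative assumptions.
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["KBank waives fees for digital banking",
        "Scrapped digital fees to hit banking income",
        "Quarterly earnings beat analyst expectations"]

# Tokenization, stop-word removal, and TF-IDF scoring in one step.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs).toarray()

# Project the sparse TF-IDF vectors onto a low-dimensional latent space.
X_latent = PCA(n_components=2).fit_transform(X)
print(X_latent.shape)  # (3, 2)
```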
In summary, the main contributions of our study are as follows.
• To the best of our knowledge, this is the first study to provide
the end-to-end framework called DAViS for real-time stock data
collection (DAViS-C), analyzation (DAViS-A), and visualization
(DAViS-V) for stock market researchers and investors.
DAViS: A Unified Solution for Data Collection, Analyzation, and... 343

• The proposed DAViS-C module is designed in a proactive fashion


to pull and process the related heterogeneous stock data in real-
time.
• The concept of integrating the simple yet effective machine
learning approaches is introduced in DAViS-A to not only
efficiently enhance the accuracy of stock prediction but also
provide interpretable prediction results to users.
• The proposed DAViS-A module integrates various contextual
knowledge, including financial news websites, discussion
boards, and social media, into an ensemble learning technique to
strengthen diversified regressors and bolster a robust predictive
model.
• The proposed DAViS-V module could aid investors and traders in
their decision-making processes and provide easy-to-interpret as
well as sufficient information to support their future investment
plans over time.
• We perform experiments on 21 real stocks from the Stock Exchange
of Thailand. The experimental results demonstrate that
our proposed framework outperforms the standalone machine
learning approaches by a large margin.
The rest of the paper is organized as follows. The Related literature section
discusses the background and related studies. The Preliminary and The
proposed DAViS framework sections explain the preliminary notations
used throughout this study as well as the details of our proposed end-to-end
framework. The Experimental setup section presents the experimental setup
and the main research questions. The Experimental result section provides
in-depth details on the experimental results, including the overall comparisons,
an ablation study, and a discussion. Finally, the concluding remarks are
summarized in Conclusions and future direction.

RELATED LITERATURE
Multiple techniques have been proposed to analyze the various phenomena
in financial markets (Wen et al. 2019; Kou et al. 2021). The overarching
goal of this research is to implement a computational model that derives
the relationship between contextual information and related stocks in
the financial market. We can divide the traditional models into two main
approaches based on the type of information they are focused on: technical
data or fundamental data.
Technical analysis makes predictions about future stock prices based on time-
series numerical data, such as opening and closing prices and trade volume. The
main purpose of this approach is to find trading patterns that can be exploited
for future predictions. For example, Nayak et al. (2015) and Alhassan et al.
(2014) discovered a complicated stock pattern by utilizing the auto-regressive
model (AR), linearity, and stationary-time series. Nassirtoussi et al. (2015),
Nguyen et al. (2015), and Hagenau et al. (2013) predicted future stock prices
from historical data. Zhong and Enke (2019a) presented comprehensive big
data analytics based on 60 financial and economic features. They utilized
DNNs and traditional artificial neural networks (ANNs) along with the
principal component analysis (PCA) method to predict the daily direction
of future stock market index returns. Stoean et al. (2019) exploited deep-
learning methods with a heuristic-based strategy for trading simulations and
stock prediction. Nti et al. (2020) used an ensemble support vector machine
to boost stock prediction performance. However, the nature of stock price
prediction is highly volatile and non-stationary. Therefore, only utilizing
the numerical price data with technical analysis is inadequate to discover
dynamic market trends. In contrast, fundamental analysis integrates
information from outside market historical data such as news, social media,
and business reports as additional inputs for stock predictive models. For
example, Bollen et al. (2011) and Mao et al. (2011) proposed techniques
that mine opinions from social media for improved stock prediction. Vu et
al. (2012) first used a keyword-based algorithm to analyze and categorize
Twitter messages as positive, negative, and neutral. Then all features along
with historical prices were used to train a Decision Tree (C4.5) classifier to
predict the direction of future prices. Schumaker et al. (2012) investigated
the correlation between the sentiment of financial news articles and stock
movements. Later, Li et al. (2014) constructed sentiment vectors with the
Harvard psychological dictionary and used them to train a Support Vector
Machine (SVM) classifier to predict the daily open and closing prices. Jin
et al. (2013) presented Forex-Foreteller (FF), a currency trend model using
news articles as well as historical prices and currency exchange values. The
system used sentiment analysis and LDA (Blei et al. 2003a) to obtain a topical
distribution of each article. Akhtar et al. (2017) and Araque et al. (2017)
proposed ensemble model construction to enhance sentiment analysis. Such
methods are based on the work of Cheng et al. (2012), who examined whether
ensemble methods could outperform the base learning algorithms, each of
which learns from previous price information (as a time series). Afzali and
Kumar (2019) integrated a company’s textual information to improve stock
prediction performance. Lim and Tucker (2019) quantified the sentiment
in a financial market and social media to enhance performance in many
financial applications. Chattupan and Netisopakul (2015) used word-pair
features (i.e., a keyword and polarity word) to conduct a news sentiment
classification based on three sentiments: positive, negative, and neutral. In
addition, Lertsuksakda et al. (2014) used the hourglass of emotions which
is an improvement over Camras (1981)’s wheel of emotions—comprising
eight emotional dimensions, namely, joy, trust, fear, surprise, sadness,
disgust, anger, and anticipation—which has been utilized for many emotion-
inspired predictive tasks. While there have been many efforts to enhance the
performance of stock price prediction, few studies have provided an end-
to-end framework to collect, analyze, and visualize stock insights in a real-
time system. Our work differs from the existing studies since we leverage
both technical and fundamental data from online news, social networks, and
discussion boards to support investors’ decision-making processes. Details
on our proposed model are provided in the next section.

PRELIMINARY
In this section, we present the notations used throughout this paper. We
denote the sets of stock companies, technical data analysis, and fundamental
data analysis as S, T, and F, where the sizes of these sets are |S|, |T|, and
|F|, respectively. The technical data analysis utilizes technical information
such as the price-earnings ratio, market capitalization, and volume. These
types of data can be kept in a tabular format of real numbers. Investors
can conveniently gather this information from many stock-price reporting
sources, such as the Stock Exchange of Thailand (SET), Yahoo Finance,
and Stock Radars. On the other hand, fundamental data
analysis involves monitoring primarily three basic factors (i.e., economic,
industrial, and organizational performance) that can affect stock prices.
Such analysis requires examining both quantitative and qualitative data.
While it is not convenient to represent qualitative data, often distilled from
news articles, in a defined structural format, such insight can be helpful to
investors and therefore cannot be neglected.
Definition 1
Technical Data Analysis Time Series (T): Each stock company \(s \in S\) has
historical stock prices sorted by time in chronological order. We define
a company’s historical stock prices as \(T^s = \langle p^s_{t-l}, \ldots, p^s_{t-1}, p^s_t \rangle\),
where \(p^s_i\) is the stock price of company s on day i, t is the current
timestamp of the recent stock price belonging to company s, and l denotes
the number of historical days used as the time lag.

Definition 2
Fundamental Data Analysis Time Series (F): There are three types of
fundamental data analysis used in this study, including financial news
information, discussion board information, and social media information. We
denote \(N^s\) as a set of financial news articles, \(M^s\) as a set of social media
posts, and \(D^s\) as a set of discussion board posts. For all the fundamental
(F) data, we sort each \(N^s\), \(M^s\), and \(D^s\) by time in chronological
order. We then define a company’s historical contextual text data input
as \(F^s = \langle f^s_{t-l}, \ldots, f^s_{t-1}, f^s_t \rangle\), where \(f^s_i\) aggregates the
contextual text data in \(N^s\), \(M^s\), and \(D^s\) for day i, t is the current
timestamp of the recent contextual text data belonging to company s, and
l denotes the number of historical days used as the time lag.

Definition 3
Stock Data Time Series Input: This research focuses on time-series data; that
is, historical prices along with contextual information are used as the input to
the proposed stock predictive model. We therefore combine the l historical
data from T and F, and construct the horizontal input data to the model as
\(X^s = [T^s; F^s]\).

Problem formulation: For each stock company \(s \in S\), given the
time-series transactions T and F as \(X^s\), our goal is to predict the future
closing stock price \(p^s_{t+k}\) in the next k days. In our study, we set l to three days
and k to one day. In this way, the stock price prediction problem can be
framed as a one-step multivariate time series forecasting task since it
focuses on predicting the stock price for the next day (t+1). The horizon
of one day is assumed, and our methodology focuses on short-term trading
where decisions to purchase or sell can be made on a daily basis. For the
lag observations, the data of the past three days (t−2, t−1, and t) is used to
supervise the model. Such lag settings were also used by Bollen et al. (2011)
to predict the Dow Jones Industrial Average (DJIA).
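
As a concrete illustration of this framing, the following minimal sketch
(our own, not the paper’s code) builds the lagged inputs close(t−2),
close(t−1), and close(t) and the target close(t+1) from a chronologically
sorted price series; the DataFrame and its column name are assumptions.

import pandas as pd

def make_supervised(df: pd.DataFrame, lag: int = 3, horizon: int = 1):
    """Frame one-step forecasting: lagged closes as inputs, next close as target."""
    frame = pd.DataFrame(index=df.index)
    for i in range(lag - 1, -1, -1):
        # shift(i) moves the series i days back, giving close(t-i)
        frame[f"close(t-{i})" if i else "close(t)"] = df["close"].shift(i)
    frame["close(t+%d)" % horizon] = df["close"].shift(-horizon)
    return frame.dropna()  # drop rows without a full lag window or target

# Example: three days of closing prices predict the next day's close.
prices = pd.DataFrame({"close": [10.0, 10.2, 10.1, 10.4, 10.3]})
print(make_supervised(prices))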

THE PROPOSED DAVIS FRAMEWORK

Figure 5: An illustration of the proposed DAViS framework.


In this section, we first give an overview of our proposed framework.
The details of each component are introduced in the following sub-sections.
As shown in Fig. 5, we propose an end-to-end framework including stock
real-time data collection, stock data analysis, and stock data visualization
to address the complex stock prediction problem, namely, “A Unified
Solution for Data Collection, Analyzation, and Visualization in Real-time
Stock Market Prediction” (DAViS). First, the heterogeneous data sources,
including technical data analysis (stock price) and fundamental data analysis
(contextual text data) are retrieved in a proactive fashion by the DAViS-C
module. Then, the various types of stock and contextual data are projected
onto a one-dimensional vector latent space by utilizing tokenization and
stop-word removal techniques to segment the words and filter irrelevant
ones from the corpus. Next, the TF-IDF term-scoring technique is used
to transform the contextual text data into the vector format. Subsequently,
we use the feature agglomeration and principal component analysis (PCA)
techniques to perform dimensionality reduction to densify huge and sparse
vectors. The final output vector of the DAViS-C module then becomes an
input of the DAViS-A module. Second, DAViS-A performs the integration
of the standalone estimators. In particular, a dynamic ensemble approach
called meta-regression is employed to combine multiple machine learning
prediction results. This allows the meta-regressor to learn to estimate the
weights of different models simultaneously. Third, the final predicted
stock prices are visualized with real-time and useful information such as
the sentiment of financial news and discussion board posts with respect
to a particular stock. In addition, the most related news articles and top
relevant topics are ranked and displayed on our end-to-end framework as
supplementary insights to support decision-making in a dynamic stock
market investment.

DAViS-C: Collecting and Processing Contextual Information


The details of the DAViS-C module are shown in Fig. 6. We provide further
step-by-step details on converting the raw heterogeneous contextual text
and stock data into the low-dimensional latent vector space as follows.

Figure 6: Details of the DAViS-C module.


Data Collection
In our research, four types of contextual information are investigated for
their predictability of stock prices.
Historical Price Information: Technical information can be used for
mathematical calculations with various variables. In our research, we gather
information on stock prices using the application programming interface
(API) to download historical stock data from the SiamChart[5] website,
with a focus on seven attributes: date, opening price, highest price, lowest
price, closing price, adjusted closing price, and trading volume.
Financial News Information: Financial news often reports important
events that may directly and/or indirectly affect a company’s stock price.
Publicly available news articles from reliable news sources such as
Kaohoon[6] and Money Channel[7] are routinely crawled. To minimize
the assumption of a news article’s metadata, only common attributes such
as the news ID, textual header, content, timestamp, and news source are
parsed and stored. A news article is mapped to corresponding companies by
detecting the presence of stock symbols in the news content; it is a common
protocol for financial news sources to include related stock symbols in the
corresponding news articles.
Social Media Information: Investors often express their opinions on
social networks. In this research, Twitter messages (or tweets) are used as
social media information. The open-source Get Old Tweets[8] library is used
to collect public tweets. To allow the methodology to be generalized to
other social media platforms, only common social media information such
as textual content and timestamp is extracted and stored. User-identifiable
information, such as usernames and mentions, is removed before storing
and further processing.
Discussion Board Information: Discussion boards are used to exchange
opinions on a company’s situation, which may or may not be related to
stock prices. A discussion thread comprises the main post and a sequence
of comments. Such information could be used to infer the current sentiment
toward a particular company. The Pantip[9] discussion forum is used in our
research due to its public availability and popularity among Thai investors.
Based on our observations, the messages and discussed topics usually contain
or are related to facts and company news that could be indicators for stock
price movements. Furthermore, the overall sentiment expressed by users
also indicates the situation of the mentioned companies and, subsequently,
their stock movements. For our study, only public discussion threads that
mention stock symbols are collected, with user-identifiable information,
such as usernames, removed prior to storing and further processing.

Data preprocessing
In this section, the techniques used in data pre-processing are explained.
These techniques can be divided into three steps.
Input-Data Extraction: This phase refers to the process of extracting
useful content from the crawled HTML pages using an HTML parser. With
the help of the Python BeautifulSoup4[10] library, a document object model
(DOM) traversal is used to extract necessary information by defining the
ID, class, or tag that the content belongs to. Following this, the timestamp is
extracted to show when the article was released, which could help to visualize
the connection between the data and the prices in a storyline format.
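
A minimal sketch of this extraction step is shown below; the element
classes (h1.title, div.article-body, time.published) are hypothetical
placeholders, since each news site defines its own DOM structure.

from bs4 import BeautifulSoup

def parse_article(html: str) -> dict:
    """Extract header, content, and timestamp via DOM traversal."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "header": soup.find("h1", class_="title").get_text(strip=True),
        "content": soup.find("div", class_="article-body").get_text(" ", strip=True),
        "timestamp": soup.find("time", class_="published")["datetime"],
    }

html = ('<h1 class="title">PTT earnings rise</h1>'
        '<div class="article-body">PTT reported ...</div>'
        '<time class="published" datetime="2017-06-01T09:00">1 Jun 2017</time>')
print(parse_article(html))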
Tokenization and Stop Word Removal Tokenization or word segmentation
is one of the first processes in traditional natural language processing (NLP).
While effective tokenization tools are available for standard languages
such as English, most algorithms for tokenizing Thai text are still under
investigation (Tuarob and Mitrpanont 2017; Noraset et al. 2021). In our
work, the Thai word segmentation open-source model developed by the
National Electronics and Computer Technology Center (NECTEC), namely
LexTo (Thai Lexeme Tokenizer),[11] is used to tokenize the text. LexTo
is a dictionary-based tokenizer that implements the longest matching
algorithm. A textual document is mapped to corresponding companies using
stock symbol detection. Information pertaining to each company is also
extracted in this step.
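
For illustration, a toy dictionary-based tokenizer using the longest
matching algorithm that LexTo implements is sketched below (this is not
LexTo itself, and the tiny English lexicon stands in for a real Thai lexicon):

def longest_match_tokenize(text: str, lexicon: set) -> list:
    """Greedy longest-matching segmentation against a dictionary."""
    max_len = max(map(len, lexicon))
    tokens, i = [], 0
    while i < len(text):
        # Try the longest dictionary word starting at position i first.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:  # No dictionary word matches: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens

lexicon = {"stock", "price", "rise"}
print(longest_match_tokenize("stockpricerises", lexicon))
# -> ['stock', 'price', 'rise', 's']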
Text Vectorization Using TF-IDF: To use machine learning for text
analysis, textual information needs to be converted into a machine-readable
format since raw text data cannot be fed straight into the machine learning
algorithm. Specifically, each document must be represented with a fixed-
length vector of real numbers. This process is often referred to as vectorization.
A textual representation is produced by transforming the tokenized
words in each document into a bag-of-words representation, in which each
term represents one feature of a document vector. The bag-of-words approach
is used as the de facto standard of text analysis research due to its simplicity
and capacity to produce a vectorized representation of the text. Each term t
is given a term frequency-inverse document frequency (TF-IDF) (Manning
et al. 2009) score with respect to the document d, defined as:
\(\mathrm{tf}(t,d) = f_{t,d}\)  (1)

\(\mathrm{idf}(t) = \log \frac{n}{\mathrm{df}(t)}\)  (2)

\(\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d) \times \mathrm{idf}(t)\)  (3)

where \(f_{t,d}\) is the number of occurrences of word t in article d, \(\mathrm{df}(t)\)
denotes the number of documents containing the term t, and n is the total
number of documents. Generally, the TF-IDF scoring scheme prefers terms that
frequently appear in a given document but less frequently in the corpus.
Such terms are deemed to be both representative and meaningful. After
performing the TF-IDF weighting, a document can then be represented with
a vector of weighted terms. We use Python’s scikit-learn[12] to vectorize
textual documents.
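
A minimal sketch of this vectorization step with scikit-learn follows;
the documents and the stop-word list are placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["stock price rises after earnings report",
        "market closes flat amid thin trading",
        "earnings report lifts bank stock"]
stopwords = ["after", "amid"]  # placeholder; a Thai stop-word list would go here

vectorizer = TfidfVectorizer(stop_words=stopwords)
X = vectorizer.fit_transform(docs)          # sparse (n_docs x n_terms) matrix
print(X.shape, vectorizer.get_feature_names_out()[:5])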

Feature space modification


In machine learning methodologies, feature engineering refers to the process
of selecting and deriving meaningful and discriminative features from a
given dataset, which is a crucial aspect of developing machine-learning-
based algorithms. As two types of information are combined (historical and
textual data), the feature selection method is explained by examining the
two aspects. When observing historical data, five attributes are provided
as standard information, namely, the highest price, lowest price, opening
price, closing price, and volume of individual stock each day. Intuitively,
the closing price has the highest predictive ability since it is the most recent
price information available. For this reason, the closing price will be the
only feature selected since the target prediction is set to be the next day’s
closing price (Bollen et al. 2011). The time-step feature is constructed based
on the closing prices, that is, close(t-2), close(t-1), and close(t), to predict
close(t+1).
Since we deal with a large high-dimensional dataset, unsupervised
dimensionality reduction techniques are employed to accelerate the learning
speed and reduce unnecessary information. Moreover, these techniques
have been shown to improve the efficiency of predictions (Fodor 2002). The
dimensionality reduction techniques used in this research include feature
agglomeration and principal component analysis, and are described as
follows:
First, the principal component analysis (PCA) is known to be one of
the most popular techniques to reduce dimension using a linear algebra
concept of eigenvectors and singular value decomposition to perform
orthogonal transformation. It works by first calculating the covariance
matrix, eigenvectors, and their eigenvalues, selecting the top-k eigenvectors
based on corresponding eigenvalues, and transforming the original matrix
into k dimensions. For the number of dimensions k, we find that 20
dimensions achieve accurate predictions while improving the computation
speed without losing predictive efficiency. Second,
feature agglomeration is another approach for dimensionality reduction,
using a clustering technique to group similar features. The Ward hierarchical
clustering technique is used to perform feature agglomeration by minimizing
the variance of the clusters rather than calculating distances between features.
For the number of clusters k, we define 21 clusters based on
the assumption that each cluster represents one of the 21 company stocks
in our experimental dataset. In short, while the feature agglomeration
produces clusters of features, the principal component analysis allows us
to filter out noisy features based on how significant the variance is and how
spread-out they are.

Figure 7: Illustration of the engineered features after applying PCA and Ward
clustering to reduce the feature space.
As a result of feature engineering, the feature space is reduced to 40
dimensions that are derived from the integration of two-dimensional
reduction techniques (the first 20 dimensions come from the principal
component analysis (PCA), and the other 20 dimensions come from Ward
hierarchical agglomerative clustering), as illustrated in Fig. 7.
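
A minimal sketch of this two-branch reduction follows; the random matrix
stands in for the TF-IDF features, and since the text quotes both 21
clusters and 20 Ward dimensions, the sketch follows the 20 + 20 = 40 split
of Fig. 7.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)
X_text = rng.random((300, 200))             # stand-in for the TF-IDF matrix

pca = PCA(n_components=20)
X_pca = pca.fit_transform(X_text)           # top-20 principal components

ward = FeatureAgglomeration(n_clusters=20, linkage="ward")
X_ward = ward.fit_transform(X_text)         # features merged into 20 Ward clusters

X_reduced = np.hstack([X_pca, X_ward])      # 40-dimensional engineered features
print(X_reduced.shape)                      # (300, 40)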
DAViS-A: Stock Price Forecasting

Figure 8: The details of DAViS-A module.


As shown in Fig. 8, the machine learning algorithms are generated based
on a variety of supervised learning techniques that automatically construct
mathematical models from training data. After the learning process, trained
predictive models are expected to perform precise predictions on future
unforeseen inputs. In this research, the machine is given the daily historical
information mapped with the next day’s prices.
Standalone base estimators: The following machine-learning-based
regression algorithms are used as the base estimators.
• Linear regression (LR) can provide a good approximation via the
linear formulation as follows.

\(\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_p x_p\)  (4)

where \(\hat{y}\) refers to the predicted value, and \(w_0, w_1, \ldots, w_p\) refer
to the coefficients corresponding to individual features or variables. As its
learning criterion, linear regression fits the model with the objective of
minimizing the residual sum of squares (RSS) between the actual and
predicted stock prices, as shown in Eq. 5:

\(\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\)  (5)
A more robust version of linear regression is the Huber regression,
designed to overcome the high sensitivity of regular linear regression to
noise and outliers. We use Huber regression to perform ensemble stacking
in the subsequent section.
• Bayesian Ridge regression (BAY) is a generalized linear estimator
that works by computing a probabilistic model with the
regularization used in ridge regression.
• Decision Tree (DT) is a tree-based estimator and one of the most
common machine learning techniques. The method is executed
by creating decision rules and branching the tree into many levels
of depth. However, the tree depth varies directly with the
model complexity, and a highly complex model can lead to high
variance, or so-called “over-fitting.”
• Random Forest (RF) is based on an ensemble learning technique
with multiple decision trees. In this model, the “forest” refers
to a multitude of decision trees that are used to randomly select
features and sub-samples from a dataset. Therefore, the Random
Forest algorithm may have a lower risk of over-fitting than
the individual decision tree algorithms due to the incorporated
randomness.
• k-Nearest Neighbors regression (k-NN) is an instance-based
estimator that finds the k-nearest neighbors in the training data to
make a prediction.
• Adaptive Boosting (AdaBoost) is an ensemble learning method
(also known as meta-learning), which was initially created to
increase the efficiency of binary classifiers. AdaBoost uses an
iterative approach to learn from the mistakes of weak classifiers
and turn them into strong ones.
• Gradient Boosting (GB) is a boosting-based estimator based on
sequential modeling that aims to reduce errors from previous
models by adding stronger models, which works to decrease the
estimator’s bias.
• Extreme Gradient Boosting (XGB) is known to optimize gradient
boosting by enabling a parallel tree boosting technique that has
outperformed general machine learning models in many cases;
it has become widely used among data scientists in the industry.
Ensemble estimators: After the base machine learning estimators are
trained and their hyperparameters are tuned, experiments will be performed
on the following ensemble learning techniques to combine all base learners,
and the results will be evaluated to achieve a robust ensemble model.
As illustrated in Fig. 8, when attempting to integrate the individual base
learners, the dynamic ensemble approach called meta-regressor will be
employed to combine multiple machine learning models. First, the outputs
of all the standalone models become the inputs of the meta-regressor. Then
the meta-regressor will learn to estimate the weights of different models. In
other words, it will determine which models perform well or poorly based
on given input data. The meta-regressor is a more effective way to use
and outperform the individual estimators since it can dynamically handle
complex stock data. Furthermore, the Huber regressor, used as the meta-
regressor, can tolerate the noise and outliers of stock data.
Figure 9: Two variants of DAViS-A model architecture.


We propose two variants of DAViS-A denoted as (1) DAViS-A-wo-c
and (2) DAViS-A-w-c. The difference between these two variants is
the integration of the contextual text data into the meta-regressor in the
ensemble estimator. Figure 9a shows the model architecture of DAViS-
A-wo-c. It utilizes the predicted stock prices from standalone models as
inputs to the meta-regressor. Then, the meta-regressor attentively weights
the contribution of each standalone model and returns the final predicted
stock price as output. In contrast, in DAViS-A-w-c, as shown in Fig. 9b,
we further incorporate the standalone prediction results with the vector of
contextual text data to compute a combination of a high-dimensional vector
of stock price and contextual text. Then, we decompose the high-dimensional
features into lower-dimensional features via the PCA technique. As a result,
the high-dimensional features are encoded into 7-dimensional vectors before
applying the meta-regressor to generate the ensemble estimator. More details
on this parameter setting are described in the Experimental Result section.
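
The following scikit-learn sketch approximates the DAViS-A-wo-c stacking
described above (our illustration, not the paper’s code): XGBoost is
omitted to keep the sketch scikit-learn-only, the hyperparameter values
echo the Model Configuration and Hyperparameter Tuning section, and
StackingRegressor fits the Huber meta-regressor on cross-validated base
predictions, a close analogue of the procedure described here.

import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import BayesianRidge, HuberRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

base_estimators = [
    ("dt", DecisionTreeRegressor(max_depth=10)),
    ("rf", RandomForestRegressor(max_depth=12)),
    ("knn", KNeighborsRegressor(n_neighbors=9, weights="distance")),
    ("bay", BayesianRidge()),
    ("ada", AdaBoostRegressor()),
    ("gb", GradientBoostingRegressor(n_estimators=50, learning_rate=0.3)),
]
model = StackingRegressor(estimators=base_estimators,
                          final_estimator=HuberRegressor())

rng = np.random.default_rng(0)
X, y = rng.random((200, 40)), rng.random(200)   # placeholder training data
model.fit(X, y)
# The meta-regressor's coefficients are the learned per-model weights (RQ9).
print(dict(zip([name for name, _ in base_estimators],
               model.final_estimator_.coef_)))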
DAViS-V: Stock Price Visualization

Figure 10: The details of DAViS-V module.


As shown in Fig. 10, after we obtain the predicted stock price in the
following days via the DAViS-A module, we perform four sub-tasks to
further extract useful information for real-time stock prediction.

Financial sentiment analysis


As sentiments have continuously driven the financial market, the news
and public sentiment are digested to explicate the relationship between
sentiments and stock prices (Zha et al. 2021). The analyses are divided into
different classes corresponding to each source of information as follows:
• The news sentiment refers to the sentiment of the news article
itself, which is categorized into three classes, namely, positive,
negative, and neutral. For example, news that reports a company’s
revenue growth is deemed positive while an article that reports an
event that could jeopardize a company’s income or reputation is
treated as negative. However, news articles that do not convey
any direction of the stock movements are classified as neutral,
such as general news of an overview of the daily market that lists
daily stock prices.
• The public sentiment is extracted from public comments
pertaining to each individual stock. The classifier detects both
positive and negative sentiments in a thread.
News informativeness analysis


Due to uncategorized news information or insufficient evidence for
classification, news categorization based on its informativeness becomes
crucial in our news-analysis system. The system categorizes news articles
into three classes as follows:
• The report category refers to news that may come from primary
information sources or reporters. Such sources merely report
business activities or announcements such as company earnings,
industry statistics, and other corporate news.
• The review category refers to articles that provide stock reviews.
The security reviews can be provided by analysts or brokers.
Stock reviews usually provide information on the analysis of the
companies, suggested trading strategies, and target prices.
• The market category refers to news articles that discuss the
current situation of the overall financial market and do not target
any particular stock or company.
DAViS-V’s Classification Methods (used in the two preceding analyses): To
process the aforementioned financial information, the machine learning
methodology is applied to automate the system analyses. In addition, the
machine learning algorithms of our system are based on a supervised learning
method, whereby the machine is trained on provided training
samples to perform classification on unseen data. For the comparison of
learning algorithms, we use three representative classification algorithms
from different classification families (a sketch of this comparison follows
the list below):
• Naïve Bayes (NB), a probability-based classifier, represents
each document as a probability-weighted vector of words. One
of the benefits of Naïve Bayes is its simple architecture, which
enables the model to scale and adapt quickly to the changes of
new datasets.
• Random Forest, a tree-based classifier, has been shown to
perform well on text classification tasks due to its ability to avoid
overfitting issues.
• Support Vector Machine (SVM), a function-based classifier,
has been used extensively and shown in previous studies to be
effective for text classification (Colas and Brazdil 2006). The
operation behind the SVM algorithm is to find hyper-planes that
maximize the margin of labeling-data points of any class.
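
A minimal sketch of this comparison, assuming lists texts and labels are
already prepared (the tiny synthetic corpus below is a placeholder):

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["revenue growth beats forecast",
         "lawsuit threatens reputation",
         "daily market summary"] * 10      # placeholder corpus
labels = ["positive", "negative", "neutral"] * 10

classifiers = {"NB": MultinomialNB(),
               "RF": RandomForestClassifier(),
               "SVM": LinearSVC()}
for name, clf in classifiers.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)   # TF-IDF features, then classifier
    scores = cross_val_score(pipe, texts, labels, cv=10, scoring="accuracy")
    print(name, scores.mean())                     # averaged tenfold accuracy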
Document scoring analysis


A well-formulated ranking score is needed for the information retrieval
system to retrieve and rank news articles relevant to users’ needs. Our
proposed ranking scheme incorporates an automated document
classifier, where we utilize the text analysis classes to compute the weight of
each document. Consequently, the weight of each class is examined to find
its best value.

Figure 11: Percentage of classified categories corresponding to each source of
information.
As shown in Fig. 11, there are fewer negative articles than positive and
neutral ones. This could potentially cause a class imbalance problem where
machine learning models are biased toward the majority classes (Picek et al.
2019). Therefore, we give more weight to the negative articles in the training
set so that the weight sums are equal across all classes. Another weighting
component is the news category, that is, report, review, or market. The report
category is assigned the highest weight, as we consider that reports announce
real facts of the corresponding companies while the review articles merely
provide opinionated recommendations from analysts. Last, we assign the
weights of the market news category and neutral news sentiment to be zero,
since they usually do not provide useful information that affects movements
of a particular company’s stock price. The pre-defined weights are shown
in Table 1. Therefore, to perform document ranking, the scoring scheme is
mathematically formulated as follows:
\(\mathrm{score}(d) = date + sentiment + informativeness + \frac{\beta}{N(s|d)}\)  (6)

where score(d) is the score of document d; sentiment is the weight of its
sentiment class; informativeness is the weight of its category class; date
is the document release time, formatted as ‘yyyymmdd’ and treated as a
number; N(s|d) is the number of stocks s related to a document d; and
\(\beta\) is the bias factor, a pre-defined weight added to compute the inverse
relation of N(s|d), which is set to 5.0 by default.

Table 1: Pre-defined weights on each analyzed category

News sentiment News category Public sentiment


Class Weight Class Weight Class Weight
Positive 1.0 Report 2.5 Positive 1.0
Negative 1.5 Review 2.0 Negative 1.5
Neutral 0.0 Market 0.0 – –
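
Reading Eq. (6) additively with the date treated as a yyyymmdd number (our
reconstruction; it reproduces the ordering in Table 10), the scoring can be
sketched as follows, using the weights in Table 1:

SENTIMENT_W = {"positive": 1.0, "negative": 1.5, "neutral": 0.0}
CATEGORY_W = {"report": 2.5, "review": 2.0, "market": 0.0}

def score(date: str, sentiment: str, category: str,
          n_stocks: int, beta: float = 5.0) -> float:
    """date is 'yyyymmdd'; n_stocks is N(s|d); beta defaults to 5.0."""
    return (int(date) + SENTIMENT_W[sentiment]
            + CATEGORY_W[category] + beta / n_stocks)

# The top two rows of Table 10:
print(score("20170105", "positive", "review", 1))   # 20170113.0
print(score("20170104", "negative", "review", 1))   # 20170112.5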

Topic modeling analysis


Latent Dirichlet Allocation (LDA) is a topic-modeling algorithm that
represents a document as a mixture of various topics, each of which is
a distribution of term probabilities. LDA has been widely used in text
mining applications, such as extracting important key phrases (Liu et al.
2010), recommending citations (Huang et al. 2014), and measuring topical
document similarity (Tuarob et al. 2021). By utilizing the topic-modeling
algorithm, the usefulness of this technique in the field of financial topic
discovery is examined, where new terms or insights can be detected ahead
of time. The LDA algorithm obtains the probability of term \(w_i\)
appearing in document d, given by \(P(w_i \mid d)\), as follows:

\(P(w_i \mid d) = \sum_{j=1}^{K} P(w_i \mid z_j)\,P(z_j \mid d)\)  (7)

where \(z_j\) denotes the j-th of K latent topics.
For the implementation, we apply the TwittDict algorithm proposed by
Tuarob et al. (2015), which is an extension of Latent Dirichlet Allocation
(LDA) (Blei et al. 2003b) that extracts emerging social-oriented key phrase
semantics from Twitter messages. Such key phrases extracted from a corpus
of news articles are ranked based on their prevalence probability. Top key
phrases are used to generate a tag cloud that captures the current topics
prevalently discussed in the news.
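
TwittDict itself is an LDA extension; as an illustration of the underlying
topic modeling, the following gensim sketch (our own, with placeholder
token lists) fits a plain LDA model and prints the per-topic term
distributions that Eq. (7) mixes:

from gensim import corpora, models

tokenized_docs = [["stock", "price", "target"],
                  ["weekly", "trading", "volume"],
                  ["price", "target", "review"]]
dictionary = corpora.Dictionary(tokenized_docs)               # term index
corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]  # bag-of-words
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
for topic_id, terms in lda.show_topics(num_topics=2, num_words=3, formatted=False):
    print(topic_id, [(w, round(p, 3)) for w, p in terms])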


In summary, we perform four sub-tasks in order to provide more
insights to investors. Our end-to-end stock prediction system provides an
easy-to-interpret visualization summarizing the predicted stock prices with
supporting information extracted by the DAViS-V module.

EXPERIMENTAL SETUP

Dataset Statistics

Table 2: Dataset statistics, including number of news articles, forum posts, and
tweets for each stock

Stock # News articles # Forum posts # Tweets


ADVANC 8167 754 249
AOT 7820 605 484
BANPU 5631 270 329
BBL 5961 238 580
BDMS 4726 182 147
BH 3999 111 189
CK 4706 210 573
CPALL 6385 331 220
CPF 5642 863 335
CPN 4483 161 234
HMPRO 5642 863 335
IRPC 5594 155 251
JAS 5391 1858 878
KBANK 8469 182 445
MINT 4967 99 105
PTT 10,398 878 1538
SCB 6936 1188 1979
SCC 6743 186 151
THAI 3803 297 12
TU 3070 64 2956
TRUE 6512 917 1070
Average 5954.52 495.81 621.90
In this section, the dataset statistics of particular sources of information are
described, for example, news articles from Kaohoon and Money Channel
news sources, social media posts from Twitter, and discussion board posts
from Pantip as detailed in Table 2. Looking at the news sources, the data
were collected from December 2014 to February 2018, comprising 123,506
data points. Twitter’s data were collected from January 2014 to February
2018, containing 12,776 data points. Finally, for Pantip, the data were
collected from January 2014 to February 2018, consisting of 14,192 data
points. The abbreviations of all the selected stock companies are as follows:
BANPU, PTT, KBANK, AOT, CPF, TU, CPN, CPALL, BDMS, ADVANC,
TRUE, IRPC, BBL, SCB, THAI, MINT, SCC, CK, HMPRO, BH, and JAS. The stock
companies used in this study cover various sectors, including technology,
transportation, energy, financial, health care, real estate, goods/service, and
agriculture. The selected stocks have the highest market cap in each sector
and sufficient corresponding textual data (i.e., news articles, forum posts,
and tweets) to validate our proposed techniques.

Evaluation Metrics
In this section, the performance metrics used to evaluate the predictions,
in terms of both the magnitude of the error and directional accuracy, are
described as follows:
• Mean Absolute Percentage Error (MAPE) is an error-based
measurement that calculates the absolute error as a percentage of
the actual value.
• Directional Accuracy (DA) measures the accuracy of the predicted
direction, that is, whether the predicted movement (positive or
negative) matches the actual movement. Both metrics are
formalized below.
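
The paper does not spell these metrics out; under their usual definitions
(our assumption), they can be written as:

\[
\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|,
\qquad
\mathrm{DA} = \frac{100\%}{n} \sum_{i=1}^{n} \mathbf{1}\!\left[\operatorname{sign}(\hat{y}_i - y_{i-1}) = \operatorname{sign}(y_i - y_{i-1})\right],
\]

where \(y_i\) and \(\hat{y}_i\) denote the actual and predicted closing prices on
day i, \(y_{i-1}\) is the previous day’s actual close, and \(\mathbf{1}[\cdot]\) is the
indicator function.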

Model Configuration and Hyperparameter Tuning


This section further explains how these estimators are trained and integrated
into the system, and how the hyperparameters are tuned. In addition to
machine learning methodology, hyperparameter tuning or optimization is
crucial since these parameters control the model’s complexity, capacity to
learn, and resource utilization.
Table 3: Example distribution of the training set (train), validation set (valid),
and testing set (test) from the dataset set of news articles

Dataset Portion (%) Begin date End date Data points


Train 80 Feb, 2015 Jun, 2017 20,466
Test 10 Jun, 2017 Oct, 2017 2549
Valid 10 Oct, 2017 Feb, 2018 2577

In order to have the optimal model configuration and hyperparameter
tuning, a process of validating parameters has to be implemented. One of
the common ways of performing hyperparameter tuning is grid search, that
is, searching through a manually specified set of hyperparameters. A grid
search is implemented using a metric to measure performance by examining
a subset of data from the training set, as illustrated in Table 3. As shown in
Table 3, there is a time-series dataset that has been constructed based on
multiple time-steps, controlled by lags and horizon. Thus, it is necessary
to split the dataset based on the timeline rather than randomly splitting or
shuffling the data to achieve this. Therefore, the first 80% of the data is
reserved for the training set. Consequently, the model is trained based on
the data in the training set, and the hyperparameters are tuned using the
validation set. Subsequently, the final performance will be evaluated using
the information from the testing set. To achieve the optimization objective,
a performance metric is set while the parameter search is performed
based on the mean absolute percentage error (MAPE); that is, the
hyperparameters that yield the lowest MAPE on the validation set are
selected (a sketch of this time-ordered split and search appears after the
list below). The list of hyperparameters tuned using grid search for all base
learners is as follows:
• Decision Tree: Splitting criterion = mean squared error with
Friedman’s improvement; Maximum depth of the tree = 10;
Maximum features = none (use all the features).
• Random Forest: Splitting criterion = mean squared error; Number
of ensemble trees = 10 (default); Maximum depth of the tree =
12; Maximum features = none (the total number of features);
Minimum samples on leaf node = 2.
• k-Nearest Neighbors: Number of k-neighbors = 9; Nearest
neighbor algorithm used = BallTree; BallTree’s leaf size = 20;
Distance metric = Euclidean Distance; Weight function = distance,
that is, the closer the neighbors, the more weight is given.
• Gradient Boosting: Loss function = least squares regression;
Splitting criterion = mean squared error with Friedman’s
improvement; Based estimators = tree boosting; Number of
boosting estimators = 50; Learning rate = 0.3.
• XGBoost: Booster = gblinear (using linear function); Learning
objective = linear regression; Learning rate = 1.0; Validation
evaluation metric = root mean square error; L2 regularization =
0.1; L1 regularization = 0.45; Updater = ordinary coordinate
descent algorithm.
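
A minimal sketch of the chronological split and MAPE-driven grid search
described above (the data and the candidate grid are placeholders):

from itertools import product
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(0)
X = rng.random((1000, 40))   # stand-in for the 40-dimensional engineered features
y = rng.random(1000)         # stand-in for next-day closing prices

n = len(X)
tr, va = int(0.80 * n), int(0.90 * n)     # split on the timeline, never shuffled
X_train, y_train = X[:tr], y[:tr]
X_valid, y_valid = X[tr:va], y[tr:va]
X_test, y_test = X[va:], y[va:]           # held out for the final evaluation

grid = {"max_depth": [8, 10, 12], "min_samples_leaf": [1, 2]}
best_mape, best_params = float("inf"), None
for depth, leaf in product(grid["max_depth"], grid["min_samples_leaf"]):
    model = RandomForestRegressor(max_depth=depth, min_samples_leaf=leaf,
                                  random_state=0).fit(X_train, y_train)
    mape = mean_absolute_percentage_error(y_valid, model.predict(X_valid))
    if mape < best_mape:                  # keep the lowest validation MAPE
        best_mape, best_params = mape, (depth, leaf)
print("lowest validation MAPE:", best_mape, "with", best_params)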

EXPERIMENTAL RESULT
In this section, the experiments are conducted to answer the following
research questions:
• RQ1: What is the proper feature engineering method to use in DAViS-C for dimensionality reduction?
• RQ2: What is the proper size of dimension decomposition used in DAViS-A’s decomposition process?
• RQ3: What are the proper time lags (l) to use for stock prediction in DAViS-A?
• RQ4: Does the proposed ensemble machine learning with contextual text data in DAViS-A-w-c outperform the one without contextual text data in DAViS-A-wo-c?
• RQ5: How do the different types of contextual text data affect stock prediction performance?
• RQ6: How well does the DAViS-V classification task perform on financial sentiment analysis and news informativeness analysis?
• RQ7: How well does DAViS-V perform in the document scoring task?
• RQ8: How well does DAViS-V perform in the topic modeling task?
• RQ9: Can our proposed ensemble machine learning approach in DAViS-A provide interpretable results to stock investors?
• RQ10: Can our end-to-end DAViS framework provide useful insights for investors to make real-time decisions on stock investments?
The Sensitivity of the Proposed Model Hyperparameters (RQ1, RQ2, RQ3)

Figure 12: Sensitivity of DAViS hyperparameters.


As shown in Fig. 12a, to supervise the predictive models, we incorporated
price and textual features. After the elementary features were prepared, the
identified features were then evaluated, including the bag-of-words features
with TF-IDF term weighting, only PCA, only Ward clustering, and the
combination of PCA and Ward features. Since the combination of PCA
and Ward yields the best MAPE of 1.20 on the validation set, we use this
combination as our feature engineering method in DAViS-C. Figure 12b and
c show that when DAViS-A’s dimension decomposition and the time lag (l)
are set to 7 and 3, respectively, we obtain the best MAPE value. Therefore,
these parameters are used in our prediction task.

The Effectiveness of Incorporating Contextual Text Data into Ensemble
Stock Machine Learning (RQ4)
In this section, we first provide an overview performance of DAViS-A-w-c
for individual stocks in Table 4. Note that the bottom-right values show the
average results of all stocks corresponding to each metric. The contextual text
data used in DAViS-A-w-c includes all text from news articles, social media
messages, and discussion board posts. Next, we analyze the performance
comparison between the ensemble stock machine learning prediction with
and without contextual text data denoted as DAViS-A-wo-c and DAViS-A-
w-c, respectively. Table 5 shows that DAViS-A-w-c could outperform all
base estimators in terms of error-based performance metrics by yielding a
MAPE of 0.93% and a DA of 54.36%. Statistical tests shown in Table 6
confirm that the performance of our proposed ensemble stacking estimator is
statistically significantly different from that of the other baseline estimators,
especially in terms of DA. We also observed that including contextual text
data in DAViS-A-w-c could improve the stock prediction performance by
large margins.

Table 4: The MAPE and DA performance of DAViS-A-w-c (with contextual
text data) on individual stocks

Stocks MAPE (%) DA (%) Stocks MAPE (%) DA (%)


BANPU 0.97 58 IRPC 1.27 37
PTT 0.50 59 BBL 0.57 53
KBANK 0.84 60 SCB 0.74 57
AOT 1.28 55 THAI 1.56 61
CPF 1.17 47 MINT 0.92 53
TU 0.85 60 SCC 0.69 54
CPN 1.07 61 CK 0.96 58
CPALL 0.53 49 HMPRO 1.07 55
BDMS 0.86 51 BH 0.90 50
ADVANC 0.74 64 JAS 0.89 46
TRUE 1.27 54 Avg. 0.93 54

Table 5: The comparison of DAViS-A-wo-c (without contextual text data) and
DAViS-A-w-c (with contextual text data)

Base estimator DAViS-A-wo-c DAViS-A-w-c
MAPE (%) DA (%) MAPE (%) DA (%)
Linear regression 1.01 45.84 1.31 48.45
Bayesian ridge 1.01 45.76 1.05 47.26
Decision tree 1.12 51.58 1.23 53.11
Random Forest 1.27 49.59 1.09 53.13
K-Neighbors 1.20 61.77 1.18 55.08
AdaBoost 1.01 50.91 1.01 52.62
Gradient boost 1.02 54.92 1.09 52.45
XGBoost 1.04 53.23 0.94 52.65
Ensemble stacking – – 0.93 54.36

Table 6: Comparison of the p-values from the Student’s paired t-test between
the proposed ensemble stacking and other baseline estimators (DAViS-A-w-c),
with \(\alpha\) = 0.05

Base estimator MAPE DA


Linear regression < 0.05 < 0.05
Bayesian ridge 0.056 < 0.05
Decision tree < 0.05 < 0.05
Random forest < 0.05 < 0.05
K-Neighbors < 0.05 < 0.05
AdaBoost < 0.05 < 0.05
Gradient Boost < 0.05 < 0.05
XGBoost 0.470 < 0.05

Table 7: The experimental results of each source of information in terms of
MAPE and DA

Stocks News article Social media Discussion board


MAPE (%) DA (%) MAPE (%) DA (%) MAPE (%) DA (%)
ADVANC 0.96 51 1.13 46 0.97 45
AOT 1.30 59 1.53 58 6.79 46
BANPU 1.41 60 1.28 57 1.85 51
BBL 0.85 59 0.80 56 1.07 62
BDMS 1.01 51 1.15 57 1.31 61
BH 1.32 51 1.54 48 1.38 76
CK 1.16 57 1.23 57 1.37 58
CPALL 1.03 60 1.81 57 6.65 33
CPF 1.18 53 1.31 47 1.28 52
CPN 1.47 60 1.72 59 8.31 43
HMPRO 1.50 58 2.12 67 1.68 68
IRPC 1.47 53 1.49 54 1.70 35
JAS 1.52 50 1.85 51 2.09 50
KBANK 0.98 57 1.23 45 1.68 45
MINT 1.25 56 1.00 88 1.68 47
PTT 0.88 59 1.64 47 1.40 50
SCB 0.90 56 0.93 56 0.96 53
SCC 0.78 54 1.01 70 1.22 45
THAI 1.61 51 0.93 100 2.12 67
TU 1.21 51 1.08 57 1.39 58
TRUE 1.55 55 1.35 60 1.65 44
Average 1.21 55 1.34 59 2.31 52

Analysis of the Impact of Different Types of Contextual Data (RQ5)
This section investigates the efficiency of integrating each source of
information from news articles, Twitter messages (i.e., Tweets), and Pantip
posts. In this experiment, we enabled each type of contextual text data at
a time in DAViS-A-w-c. Table 7 shows that news information yields the
best performance in terms of error-based metrics, with a MAPE of 1.21%
on average. However, social media information (i.e., Twitter) achieves
the highest directional accuracy with an average of 59%. Moreover, the
discussion board (i.e., Pantip) seems to have the lowest efficiency in terms
of both MAPE and DA. Based on further investigation, this might be due to
the nature of the discussion board, where an author creates a topic or post on
which other people comment/reply. Hence, it was discovered that, particularly
with Pantip posts, random opinions/sentiments from investors that carried
less information compared to the news articles were quite prevalent. In
addition, we grouped stocks based on their market sectors and performed
another experiment to observe the impact of contextual data on different
business sectors. As presented in Table 8, we observe that integrating news
articles and social media contextual data into our proposed DAViS-A model
could improve the stock prediction results while including discussion board
data has less impact on performance improvement. Specifically, we found
that incorporating news articles in the business sectors related to technology,
transportation, energy, financial, and real estate while incorporating social
media posts in the business sectors related to health care, goods/service, and
agriculture could improve the stock prediction performance.
Table 8: The experimental results of each source of information grouped by
market sector, using MAPE and DA

Sectors News article Social media Discussion board
MAPE (%) DA (%) MAPE (%) DA (%) MAPE (%) DA (%)
Technology 1.34 52 1.44 52 1.57 46
Transportation 1.46 55 1.23 79 4.46 57
Energy 1.25 57 1.47 53 1.65 45
Financial 0.91 57 0.99 52 1.24 53
Health care 1.17 51 1.35 53 1.35 69
Real estate 1.14 57 1.32 62 3.63 49
Goods/service 1.27 59 1.97 62 4.17 51
Agricultural 1.21 53 1.13 64 1.45 52
Average 1.22 55 1.36 60 2.44 53

Analysis of the Performance of the DAViS-V Classification Task on
Financial Sentiment Analysis and News Informativeness Analysis (RQ6)
To evaluate the performance of three different machine learning algorithms,
a tenfold cross-validation technique is applied to randomly partition our
dataset into 10 subsamples. Then the evaluation results from the ten folds
are averaged. In addition, the dataset used for evaluating the contextual
classification comprises 550 news sentiment-type articles, 380 news
informativeness-type articles, and 521 public sentiment-type articles. Dataset
statistics are shown in Fig. 13. For measurement metrics, classification
accuracy is used to evaluate the correctness of classifiers.

Figure 13: The number of data points in each class in the dataset.
Table 9: Classification results based on accuracy metrics of three analyses,
including news sentiment (positive, negative, and neutral), news
informativeness (market, report, and review), and public sentiment (positive
and negative)

Models News sentiment Informativeness Public sentiment


NB 83.09 85.26 66.03
RF 82.18 89.21 70.40
SVM 85.64 87.89 71.21

As seen in Table 9, first, Random Forest has the highest news
informativeness accuracy score (89.21%). Second, Support Vector Machine
has the highest news and public sentiment accuracy scores; 85.64% and
71.21%, respectively. Finally, to measure the average classification accuracy
of the three tasks, SVM yields the highest average accuracy (81.58%), while
RF and NB have average accuracies of 80.60% and 78.13%, respectively.
In addition, we notice that the news classification task yields higher
accuracy than the public sentiment classification from the discussion board.
Two major factors could explain why discussed messages are harder to
analyze. The first factor is that news articles commonly use a formal writing
style and are structured in a consistent pattern. Equally important, the second
factor is the high level of noise generated by misspelled words in informal
discussions on message boards. These spelling errors affect text
analysis processes by adding spurious non-standard words to the feature
space, leading to reduced learning efficacy. Another notable result is the
fact that news informativeness can yield significantly high accuracy. From
observation, each of the three categories has its own distinctive patterns
of trivially distinguishable keywords. For instance, in
the review category, statements like “Analysts said company X is strongly
recommended” are commonly used. Thus, there are quite obvious feature
words that help the machine to better identify the differences between
classes.

Analysis of the Performance of DAViS-V in the Document Scoring Task (RQ7)
This section investigates whether the formulated document scoring equation
can be applied in practice and how different weights perform in comparison.
Accordingly, analyzed news articles are collected and sorted by scores. To
demonstrate the calculation, a set of dummy data is generated with their
tags and info, including date, news sentiment, news informativeness, and the
number of related stocks. The results of news articles scoring and ranking
are listed in Table 10.

Table 10: Ranked documents based on our proposed document scoring
technique

Rank Date N(s|d) Sentiment Informativeness


1 2017-01-05 1 stock Positive (1.0) Review (2.0)
2 2017-01-04 1 stock Negative (1.5) Review (2.0)
3 2017-01-03 1 stock Negative (1.5) Report (2.5)
4 2017-01-06 4 stocks Positive (1.0) Report (2.5)
5 2017-01-03 1 stock Neutral (0.0) Report (2.5)
6 2017-01-05 1 stock Neutral (0.0) Market (0.0)
7 2017-01-04 2 stocks Neutral (0.0) Review (2.0)
8 2017-01-06 5 stocks Neutral (0.0) Market (0.0)

To analyze the parameters in the scoring function (Eq. 6), the effect of
varying each parameter is clarified as follows:
• date: used as a recency signal, so that more recent articles are
generally ranked higher.
• sentiment & informativeness: the sentiment and the informativeness
parameters are weighted on the gravity of each class, where
some classes are ranked higher than others. In addition, the
overall weights of news informativeness are set higher than the
sentiment classes. This is because informativeness, such as that
in news reports, is found to be more important than examining its
sentiment alone.
• N(s|d): N(s|d) refers to the number of companies related to a
document. It has been found that articles that refer to too many
companies are less meaningful. For instance, the statement
“Today’s top 5 most active stocks are A, B, C, D, and E” will be
given a lower score. Thus, N(s|d) is placed as the divisor of \(\beta\),
which yields an inverse relationship with the increase in the
number of mentioned companies.
• \(\beta\): the \(\beta\) variable defines the importance of the
inverse relation of N(s|d). Thus, there is no difference in weight
between higher and lower N(s|d) values if \(\beta\) equals zero.
Analysis of the Performance of DAViS-V in the Topic Modeling Task (RQ8)
To determine the effectiveness of the key phrases extracted by the TwittDict
algorithm, the articles related to a given company are collected to perform
topic discovery. The textual content in each article is tokenized and fed to
the TwittDict algorithm to generate topical-oriented key phrases.

Table 11: Notable key phrases of a selected company using TwittDict’s topic
modeling technique, where a key phrase is denoted as a mixture of topics;
frequency is its occurrence in the company’s corpus; and score defines the
relevance of the key phrase to the company

Key phrase Frequency Score


Highlight securities are 4 3.3614E−4
Weekly Stock Trading 14 2.9650E−4
Between February 5 2.3770E−4
No change in trading volume 4 1.7058E−4
Value per security 13 9.0389E−5
Million values 21 8.3247E−5
Top gainers 41 8.1449E−5
Baht Index 14 7.8542E−5
Surpass the target 9 7.5920E−5
Weekly target 10 7.1074E−5
Technical signals 4 6.8964E−5

As seen in Table 11, most of the key phrases might not convey sufficient
information. This might be because imbalanced news articles are generated
each day, as illustrated in Fig. 11. Thus, the topic modeling could be misled
by a high volume of the market news category. Although the extracted key
phrases do not provide meaningful messages to investors, it is undeniable
that there would be potential benefits if this topic modeling approach can
discover emerging insightful information early. Therefore, a possible
improvement would be to equip the system with the ability to automatically
perform document filtering and extract valuable topics.
Analysis of Interpretable Machine Learning (RQ9)


The lack of interpretability of the prediction results casts a shadow on
reliability and user trust. Therefore, we aim to provide not only precise and
accurate predictions but also interpretable results from machine learning
models. In our DAViS-A, the meta-regressor provides an adaptive way
to combine the results from standalone models dynamically. The meta-
regressor will learn to estimate the weights of different models and determine
the performance of the models based on the given input data. As a result
of having significantly different base models, stacking would be a more
effective way to use and outperform the individual estimators. For example,
we randomly select one testing instance to perform the prediction. Weights
corresponding to each standalone model are shown in Table 12. We can
observe that DT and GB contribute more than others in this specific stock
dataset, while KNN and XGB seem to have less impact on the prediction
results. With the meta-regressor ensemble approach, we could also explain
how the model derives the final prediction results.

Table 12: Coefficient values corresponding to the base estimators, determined
by performing a stacking ensemble from the meta-regression

DT RF KNN BAY Ada GB XGB


0.2777 −0.0028 −0.2047 −0.0663 0.0533 0.2537 −0.1057

Here DT, RF, KNN, BAY, AdaBoost (Ada), GB, and XGB are used as
base learners.

Analysis of an End-to-end DAViS Framework (RQ10)


The final output of the DAViS system is an interactive intelligent web
application for investors who may use it as a tool to assist their trading.
In the investment world, analyzing technical chart patterns is one of the
popular methods used by investors to monitor market movements. Analysis
results are visualized into various charts, including a stock price prediction
line chart, sentiment chart, annotation chart, and topic word-cloud, as shown
in Fig. 14.
Figure 14: Visualization of our DAViS Stock decision support system.


In Fig. 14a, the sentiment chart presents the public mood toward
a particular stock. A variety of emotions are classified as having either a
positive or negative effect on the stock. This chart clearly shows how many
people view the stock positively or negatively on a certain day of investment
from discussion boards. By knowing the majority opinions, investors may
use this information as a factor to determine their trading strategies.


As illustrated in Fig. 14b, the word cloud defines a group of key
phrases generated using the TwittDict algorithm, which prevalently appears
on financial news and message boards. The size of each key phrase is
represented by the calculated prevalence probability, where larger words
indicate higher scores.
As shown in Fig. 14c, the annotation chart allows investors to observe
the collection of related news that causes each specific stock price to change
over time. As a result, investors may discover the news patterns and styles
that have a major effect on the prices. This kind of chart can be expressed
on top of the technical stock chart with marked points to note the associated
news. Figure 14c displays the annotation chart of ADVANC stock in the SET
market. There are three points of interest, labelled as A, B, and C. Position
A demonstrates that the stock price goes up to the noted point due to the
support of positive news toward ADVANC, displayed on the right, marked A.
Point B illustrates a behavior similar to A, but driven by a different company
situation. The downward peak C is different: the news captured by the system
is a negative news article about the entire market. Such market-wide negative
news, marked as C, would likely cause the prices of other stocks to drop as
well.
As shown in Fig. 14d and e, the search engine system provides two types
of search results, namely financial news articles and financial discussions on
discussion boards. The articles are ranked based on the proposed ranking
algorithm that filters out irrelevant and less significant articles to help
investors access useful financial information. Regarding the details of the
interface, there are four main visualized components, including published
date, content (i.e., news header and synopsis), stock-related information, and
analyzed document tags. Additionally, the document annotation contains two
automated tagging results displayed as colored boxes on top of the content
headers. Finally, the list of related stocks is displayed as grey boxes at the
bottom of each content box.

CONCLUSIONS AND FUTURE DIRECTION


In the information age, an enormous amount of information is generated
rapidly throughout social media and other websites in a matter of seconds.
Manually monitoring such a massive amount of information can be tedious.
We addressed the challenge of analyzing unstructured data and directed our
interest to the financial field. Financial contextual information, including
news articles, discussion boards, and social media, is extracted and digested
using machine-learning techniques to gain insight into stock markets. As
discussed in the prototype model of DAViS, we proposed an interpretable
ensemble stacking of diversified machine-learning-based estimators in
combination with an engineered textual transformation using the PCA
and Ward hierarchical features to predict the next day’s stock prices.
Textual analysis with a topic-modeling-based technique is applied
to extract useful information such as sentiment, informativeness, and key
phrases. Finally, we described how documents are scored and ranked based
on different variables in our system. Future studies could further develop the
system to include even more contextual knowledge and discover predictive
signals that could be deployed in an innovative algorithmic trading system.
Integrating the prediction into a trading strategy and comparing it with
existing ones could also further expand the practicality of our proposed
methods.

Notes
1. https://www.set.or.th/.
2. https://www.set.or.th/.
3. https://finance.yahoo.com/.
4. https://www.stockradars.co/.
5. http://www.siamchart.com.
6. https://www.kaohoon.com.
7. http://www.moneychannel.co.th.
8. https://github.com/Jefferson-Henrique/GetOldTweets-python.
9. https://www.pantip.com.
10. https://pypi.python.org/pypi/beautifulsoup4.
11. www.sansarn.com/lexto.
12. http://scikit-learn.org.

ACKNOWLEDGEMENTS
This research project is supported by Mahidol University (Grant No. MU-
MiniRC02/2564). We also appreciate the partial computing resources
from Grant No. RSA6280105, funded by Thailand Science Research and
Innovation (TSRI), (formerly known as the Thailand Research Fund (TRF)),
and the National Research Council of Thailand (NRCT).
REFERENCES
1. Afzali M, Kumar S (2019) Text document clustering: issues and
challenges. In: 2019 International conference on machine learning, big
data, cloud and parallel computing (COMITCon). IEEE, pp 263–268
2. Akhtar MS, Gupta D, Ekbal A, Bhattacharyya P (2017) Feature
selection and ensemble construction: a two-step method for aspect
based sentiment analysis. Knowl Based Syst 125(Supplement C):116–
135 (ISSN 0950-7051)
3. Alhassan J, Abdullahi M, Lawal J (2014) Application of artificial
neural network to stock forecasting: comparison with SES and ARIMA. J
Comput Model 4(2):179–190
4. Araque O, Corcuera-Platas I, Sánchez-Rada JF, Iglesias CA (2017)
Enhancing deep learning sentiment analysis with ensemble techniques
in social applications. Exp Syst Appl 77(Supplement C):236–246
(ISSN 0957-4174)
5. Blei DM, Ng AY, Jordan MI (2003a) Latent dirichlet allocation. J Mach
Learn Res 3(Jan):993–1022
6. Blei DM, Ng AY, Jordan MI (2003b) Latent dirichlet allocation. J Mach
Learn Res 3(Jan):993–1022
7. Bollen J, Mao H, Zeng X (2011) Twitter mood predicts the stock
market. J Comput Sci 2(1):1–8 (ISSN 1877-7503)
8. Bomfim AN (2003) Pre-announcement effects, news effects, and
volatility: monetary policy and the stock market. J Bank Finance
27:133–151
9. Camras L (1981) Emotion: theory, research and experience. Am J
Psychol 94(2):370–372 (ISSN 00029556)
10. Chattupan A, Netisopakul P (2015) Thai stock news sentiment
classification using wordpair features. In: The 29th Pacific Asia
conference on language, information and computation, pp 188–195
11. Cheng C, Xu W, Wang J (2012) A comparison of ensemble methods in
financial market prediction. In: 2012 Fifth international joint conference
on computational sciences and optimization. IEEE, pp 755–759
12. Colas F, Brazdil P (2006) Comparison of SVM and some older
classification algorithms in text classification tasks. In: IFIP international
conference on artificial intelligence in theory and practice. Springer, pp
169–178
13. Fodor IK (2002) A survey of dimension reduction techniques. Center


Appl Sci Comput Lawrence Livermore Natl Lab 9:1–18
14. Gopinathan R, Durai S (2019) Stock market and macroeconomic
variables: new evidence from India. Financ Innov 5:12. https://doi.
org/10.1186/s40854-019-0145-1
15. Hagenau M, Liebmann M, Neumann D (2013) Automated news
reading: stock price prediction based on financial news using context-
capturing features. Decis Supp Syst 55(3):685–697 (ISSN 0167-9236)
16. Hu D, Schwabe G, Li X (2015) Systemic risk management and
investment analysis with financial network analytics: research
opportunities and challenges. Financ Innov 1:12. https://doi.
org/10.1186/s40854-015-0001-x
17. Huang W, Wu Z, Mitra P, Giles CL (2014) Refseer: a citation
recommendation system. In IEEE/ACM joint conference on digital
libraries. IEEE, pp 371–374
18. Jin F, Self N, Saraf P, Butler P, Wang W, Ramakrishnan N (2013)
Forex-foreteller: currency trend modeling using news articles. In:
Proceedings of the 19th ACM SIGKDD international conference on
knowledge discovery and data mining, KDD ’13. ACM, New York,
NY, USA, pp 1470–1473. ISBN 978-1-4503-2174-7
19. Kou G, Akdeniz ÖO, Dinçer H, Yüksel S (2021) Fintech investments in
European banks: a hybrid it2 fuzzy multidimensional decision-making
approach. Financ Innov 7(1):1–28
20. Lertsuksakda R, Netisopakul P, Pasupa K (2014) Thai sentiment terms
construction using the hourglass of emotions. In: 2014 6th international
conference on knowledge and smart technology (KST), pp 46–50
21. Li X, Xie H, Chen L, Wang J, Deng X (2014) News impact on stock
price return via sentiment analysis. Knowl Based Syst 69(Supplement
C):14–23. https://doi.org/10.1016/j.knosys.2014.04.022 (ISSN 0950-
7051)
22. Lim S, Tucker CS (2019) Mining twitter data for causal links between
tweets and real-world outcomes. Exp Syst Appl X 3:100007
23. Liu Z, Huang W, Zheng Y, Sun M (2010) Automatic keyphrase
extraction via topic decomposition. In: Proceedings of the 2010
conference on empirical methods in natural language processing, pp
366–376
DAViS: A Unified Solution for Data Collection, Analyzation, and... 379

24. Manning CD, Raghavan P, Schütze H (2009) Introduction to


information retrieval, chapter Stemming and lemmatization (2.2.4), pp
32–34. Cambridge University Press, Cambridge, England
25. Mao H, Counts S, Bollen J (2011) Predicting financial markets:
comparing survey, news, twitter and search engine data. arXiv preprint
arXiv:1112.1051
26. Nassirtoussi AK, Aghabozorgi S, Wah TY, Ngo DCL (2015) Text
mining of news-headlines for forex market prediction: a multi-layer
dimension reduction algorithm with semantics and sentiment. Exp Syst
Appl 42(1):306–324 (ISSN 0957-4174)
27. Nayak RK, Mishra D, Rath AK (2015) A naïve svm-knn based stock
market trend reversal analysis for Indian benchmark indices. Appl Soft
Comput 35:670–680
28. Nguyen TH, Shirai K, Velcin J (2015) Sentiment analysis on social
media for stock movement prediction. Exp Syst Appl 42(24):9603–
9611 (ISSN 0957-4174)
29. Noraset T, Lowphansirikul L, Tuarob S (2021) Wabiqa: a wikipedia-
based thai question-answering system. Inf Process Manag 58(1):102431
30. Nti IK, Adekoya AF, Weyori BA (2020) Efficient stock-market
prediction using ensemble support vector machine. Open Comput Sci
10(1):153–163. https://doi.org/10.1515/comp-2020-0199
31. Picek S, Heuser A, Jovic A, Bhasin S, Regazzoni F (2019) The curse
of class imbalance and conflicting metrics with machine learning for
side-channel evaluations. IACR Trans Cryptogr Hardware Embed Syst
2019(1):1–29
32. Schumaker RP, Zhang Y, Huang C-N, Chen H (2012) Evaluating
sentiment in financial news articles. Decis Supp Syst 53(3):458–464
(ISSN 0167-9236)
33. Seker SE, Mert C, Al-Naami K, Ayan U, Ozalp N (2013) Ensemble
classification over stock market time series and economy news. In:
2013 IEEE international conference on intelligence and security
informatics. IEEE, pp 272–273
34. Selvamuthu D, Kumar V, Mishra A (2019) Indian stock market
prediction using artificial neural networks on tick data. Financ Innov
5:12. https://doi.org/10.1186/s40854-019-0131-7
380 Advanced Techniques for Collecting Statistical Data

35. Stoean C, Paja W, Stoean R, Sandita A (2019) Deep architectures for


long-term stock price prediction with a heuristic-based strategy for
trading simulations. PLoS ONE 14(10):e0223593
36. Tuarob S, Mitrpanont JL (2017) Automatic discovery of abusive thai
language usages in social networks. In: International conference on
Asian digital libraries. Springer, pp 267–278
37. Tuarob S, Chu W, Chen D, Tucker C (2015) Twittdict: extracting
social oriented keyphrase semantics from twitter. In: Association for
computational linguistics (ACL), pp 25–31, 01
38. Tuarob S, Assavakamhaenghan N, Tanaphantaruk W, Suwanworaboon
P, Hassan S-U, Choetkiertikul M (2021) Automatic team
recommendation for collaborative software development. Empir
Software Eng 26(4):1–53
39. Vu TT, Chang S, Ha QT, Collier N (2012) An experiment in integrating
sentiment features for tech stock prediction in twitter. In: Proceedings
of the workshop on information extraction and entity analytics on
social media data. Mumbai, pp 23–38
40. Wen F, Xu L, Ouyang G, Kou G (2019) Retail investor attention and
stock price crash risk: evidence from China. Int Rev Financ Anal
65:101376
41. Wu W, Chen J, Xu L, He Q, Tindall M (2019) A statistical learning
approach for stock selection in the Chinese stock market. Financ Innov
5:12. https://doi.org/10.1186/s40854-019-0137-1
42. Zha Q, Kou G, Zhang H, Liang H, Chen X, Li C-C, Dong Y (2021)
Opinion dynamics in finance and business: a literature review and
research opportunities. Financ Innov 6(1):1–22
43. Zhong X, Enke D (2019a) Predicting the daily return direction of the
stock market using hybrid machine learning algorithms. Financ Innov
5:12. https://doi.org/10.1186/s40854-019-0138-0
44. Zhong X, Enke D (2019b) Predicting the daily return direction of the
stock market using hybrid machine learning algorithms. Financ Innov
5:12. https://doi.org/10.1186/s40854-019-0138-0
INDEX

A
Abnormal big data 259
accelerometers 92
accurate data 269
Accurate prediction 338
Active data collection 92
activity value 248, 252, 254, 255
agriculture 307
algorithms 172
ambulatory assessment 209
analysis notes 47
AOA (angle of arrival) 312
application programming interface (API) 275, 349
application software 258
artificial intelligence 306, 327
artificial intelligence-based body sensor network framework (AIBSNF) 306
artificial neural networks (ANN) 309
astronomy 307
audio sensors 92
automated research management system 222
aviation 307

B
back-end data collection engine 292
banking 307
Bayesian Ridge regression (BAY) 354
behavioural data 90, 92
big data 247, 248, 249, 250, 251, 252, 253, 254, 255, 256
biowearables 269
Bluetooth radios 92
body sensor networks (BSN) 306
breastfeeding 222, 223, 224, 229, 231, 234, 235, 237, 239, 243, 245

C
case-control studies 123, 150
case report 123
case series studies 123
Clinical research professionals (CRPs) 269
clinical trial 223, 227, 229, 238, 240, 241
cloud-based research 90
cloud computing 258, 259
coding notes 47
cohort studies 123
collaborative survey 159
community 55, 56, 58, 59, 60, 61, 62, 64, 65, 66, 67, 68, 69, 71, 72, 78, 84
comprehensive value balance 248, 255
computational science 212
computational social science 183
confirmability 42, 43, 45, 46, 47
content analysis 22, 25, 27, 35, 37, 39
convenience sampling 24
CoronaSurveys 287, 288, 289, 290, 292, 293, 294, 295, 296, 299, 300, 301, 302
coronavirus disease 2019 (COVID-19) pandemic 268
corrected case fatality ratio (cCFR) 294
cost analysis 222, 224, 229, 240
credibility 42, 43, 44
criterion sampling 24
cross-correlation 261
cross-covariance matrix 260, 261, 262
cross-sectional design 123
culture 55, 56, 57, 59, 60, 62, 64, 67, 68, 69, 71, 72, 73, 74, 75, 77, 79, 80, 84, 86, 87, 88

D
data 42, 43, 44, 45, 46, 47, 48, 49, 51
database 258
database design 258
database extraction 258
data cluster centres 259
data collection 8, 12, 13, 16, 18, 20, 21, 22, 23, 25, 26, 27, 28, 31, 32, 34, 35, 36
data collection subsystem 290
Dataism 307
data mining 183, 187, 205
Data saturation 22, 26
data science 183, 187, 203
data security 248, 252, 255
decentralized clinical trials (DCTs) 269
Decision Tree (DT) 354
deep neural networks (DNNs) 341
demilitarized zone (DMZ) server 275
dependability 42, 43, 46, 47
Directional Accuracy (DA) 362
document analysis 54, 58, 72
document object model (DOM) 350
Dow Jones Industrial Average (DJIA) 347

E
ecological momentary assessment 209
e-commerce 248, 250
education 123
electrocardiogram (ECG) 275
estimated error 261
Ethnography 8, 13, 14, 15
experience sampling 209
Extreme Gradient Boosting (XGB) 354

F
finance 307
Focus group interviews (FGIs) 271
food delivery 312
frequently asked questions (FAQs) 3

G
galvanic skin response (GSR) 324
general practice 3
general practitioners (GPs) 9
Global Positioning System (GPS) 184
Google Scholar 124
Gradient Boosting (GB) 354
grounded theory 8, 10, 13, 19

H
health care 306, 307, 310, 314, 315, 327, 330
health sciences 3
high-quality qualitative research 22
home medical devices 269
hospital information system (HIS) 279
humanities 123
human resources 289
hybrid methods 155

I
information leakage 247, 248, 251
information sensitivity 248, 252, 255
Internet 248, 249
Internet of Things (IoT) 269
interpretation 307, 313, 315
interprofessional field 3
interviewing 54, 58, 71, 88

K
k-Nearest Neighbors regression (k-NN) 354

L
Latent Dirichlet Allocation (LDA) 360
logistics 258

M
machine learning algorithms 306
Markov Chain Monte Carlo (MCMC) 185
massive data 248
maternity care 3
Mean Absolute Percentage Error (MAPE) 362
Metropolis-Hastings Random Walk (MHRW) 182, 203
mobile apps 288
Mobile data collection 209, 210, 213, 214
mobile devices 210, 214, 219
mobile phone automated system (MPAS) 222, 224
mobile phone technology 222, 223, 224, 240
monopoly 249
multi-level fusion framework (MLFF) 315
multiple location-based sensors 92

N
National Electronics and Computer Technology Center (NECTEC) 350
natural context 9
natural language processing (NLP) 350
network scale-up method (NSUM) 289
neuroscience 209, 210
non-interventional research 123
non-randomized controlled trials 124
normal big data 259
nursing 3, 4, 6

O
Observation 53, 57, 60, 82, 83, 85
occupational therapy 3
online information 337
online information aggregation systems 156
online payment 248
Online Social Network (OSN) 182
Organisation for Economic Co-operation and Development (OECD) 165

P
pairwise wiki survey 156, 157, 160, 161, 162, 171
pandemic 288, 289, 291, 292, 296, 297, 298, 301
paper and email data collection (PEDC) 222, 224
Paper-based data collection 223
Participant observation 53, 54, 55, 57, 58, 59, 60, 63, 84, 85, 86, 88
Passive data collection 92, 96
patient-generated health data (PGHD) 269
peopled ethnography 55, 86
phenomenology 8, 12, 13, 15
phenomenon 8, 10, 11, 12, 16
photoplethysmography sensor 324
physical therapy 3
PICO (population-intervention-comparison-outcome) 11
power communications 258
primary care 22, 25, 29, 33, 34
Principal Component Analysis (PCA) 341
process notes 47
production management 312
Prolonged engagement 44, 45
psychology 210, 218
public health 123, 150
purposive sampling 24

Q
qualitative meta-syntheses 10
qualitative methodology 2
qualitative research 1, 2, 3, 4, 5, 6
quantitative meta-analyses 10
quasi-experimental designs 124, 151

R
Radio-frequency identification (RFID) 313
Random Forest (RF) 354
randomized controlled trial (RCT) 224
raw data 47
Real-life data measurements 211
real-time location system (RTLS) 306
real-time system 345
Reflexivity 42, 43, 45
report 47
Research questions 8
reservoir sampling 190
residual sum of squares (RSS) 353
Rheumatoid arthritis (RA) 320
rich narrative materials 2
RSF (receiver signal phase) 312
RSS (receiver signal strength) 312
RTF (roundtrip time of flight) 312

S
sampling 22, 23, 24, 25, 26, 27, 35
Scopus Index 124
search engines 124
security 307, 310, 325
smart devices 90
Smartphone 90, 106, 117, 118, 119
smart tourism 258
snowball sampling 24, 25
social behavior 183
social data analysis 183
social media analytics 183
social networking services 182, 185, 188
social networks 68, 182, 183, 185, 205, 207, 208
social organization 68
social problems 247, 255
social process 8, 16
social science research 123
social sciences 90, 91
software life cycle 268, 270
sports 306, 307, 308, 309, 312, 313, 314, 315, 317, 318, 319, 320, 321, 322, 323, 324, 326, 327, 331
statistical model 163, 164, 172
stock datasets 341
Stock Exchange of Thailand (SET) 345
stock market decision-making 338
Support Vector Machine (SVM) 344, 358
survey research 156, 157, 171, 172

T
TDoA (time difference of arrival) 312
term frequency-inverse document frequency (TF-IDF) 342, 350
text messaging 223, 242, 243, 244, 245
theoretical sampling 24, 25
ToA (time of arrival) 312
traditional Chinese medicine (TCM) 324
transferability 42, 43, 46
Triangulation 44, 45
Twitter 181, 182, 183, 184, 185, 186, 187, 191, 192, 193, 194, 195, 196, 200, 202, 203

W
warehousing 258
Wearable devices 268
Web of Science 122, 124, 126
web survey 288
Wi-Fi antennas 92
wiki surveys 156, 157, 158, 159, 160, 161, 165, 166, 167, 168, 169, 171, 172, 173
World Health Organization (WHO) 125
