Big Data Technology: Developments in Current Research and Emerging Landscape

Enterprise Information Systems
ISSN: 1751-7575 (Print) 1751-7583 (Online) Journal homepage: https://www.tandfonline.com/loi/teis20
Nitin Singh
To cite this article: Nitin Singh (2019): Big data technology: developments in current research and
emerging landscape, Enterprise Information Systems, DOI: 10.1080/17517575.2019.1612098
To link to this article: https://doi.org/10.1080/17517575.2019.1612098
Published online: 14 May 2019.
Nitin Singh
IIM, Kashipur, India
Contact: nitin.singh@iimkashipur.ac.in
ABSTRACT
In this study, big data studies (01/2015–06/2018) are reviewed and several highly cited
papers are identified, which indicates a growing interest in the area of big data. The
papers and proceedings from international peer-reviewed journals and ranked conferences
were reviewed. We employed principal component analysis and citation and co-citation
analysis to identify themes of research emanating from these studies. Citation and
co-citation analysis reveals the cross-functional nature of big data research, which
permeates different business sectors and is influenced by themes in engineering and
information management.

ARTICLE HISTORY
Received 6 March 2018
Accepted 23 April 2019

KEYWORDS
Big data; analytics; research themes; principal component analysis; citation analysis;
co-citation analysis

Introduction
There have been a number of recent developments in data extraction, storage, and
analysis technologies (IDC 2017), furthering the need of businesses to harvest data. Data
exist today in a variety of formats, sizes, and types (i.e. structured, semi-structured, and
unstructured). Thus, the application of data analytics is considerably more extensive
than in the past, and the software that uses quantitative methods for data analytics has
become more and more sophisticated. Data analytics is a quantitative discipline and has
been evolving since ‘Analytics 1.0’ first appeared in the mid-1950s (Davenport 2013). As
the Internet of Things (IoT), connected devices, sensors, and smart machines have
become more widespread, the ability of devices and machines to generate new types
of real-time information in the industry’s value stream has been growing. In fact,
organisations now expect increasingly competitive and convenient cloud-based data
options that have on-demand pricing and fit-for-purpose data processing options
(Gartner 2017). A typical resource in these functional areas would be expected to be
able to accommodate large volumes of data, while the enormity of data necessitates
clear strategies for storage and management and for deriving patterns and meaningful
insights. Thus, enterprises are considering big data technologies as an important part of
their information infrastructure.
Industry studies indicate that spending on big data initiatives is substantial. According to
a recent International Data Corporation report, global sales for big data and business
analytics are expected to grow from USD 150.8 billion in 2017 to more than USD
210 billion in 2020 (IDC 2017). This translates to a compound annual growth rate of
11.7%. Regarding sectors, banking has emerged as the industry with the largest invest-
ment in big data and business analytics solutions (approximately USD 17 billion in 2016).
In this context, it is no surprise that current research is interwoven around fundamental
questions about big data and how best to use it for business. The rapid growth in the
adoption of big data technologies has attracted research on its managerial, technologi-
cal, and societal ramifications (Mishra et al. 2017; Kalantari et al. 2017; Qazi and Sher
2016; Ghosh 2016; Phillips-Wren et al. 2015). Numerous reviews have demonstrated the
maturity and intellectual structure of the field. As such, the objective of this paper is to
crystallise the current research themes that have emerged from extant literature ana-
lyses and reviews. We believe that this research contributes to the current literature in
several ways. First, the study assesses the current state of research and extracts the
quantitative evidence on extant subjective or narrative big data reviews. Second, the
study employs a principal component analysis (PCA) and a bibliometric analysis to
capture the intellectual structure that emerged from the review. Third, studies of
this kind are rare; additional studies are therefore needed so that others may
understand and build on the evolving base of big data
knowledge.
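The growth rate quoted from IDC (2017) earlier in this section can be checked with a short calculation. The sketch below is illustrative only: it assumes the standard compound annual growth rate formula, (end / start)^(1 / years) - 1, and treats 2017 to 2020 as three compounding periods. The function name `cagr` is ours and does not come from the cited report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values that lie `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text: USD 150.8 billion (2017) and USD 210 billion (2020).
rate = cagr(150.8, 210.0, 2020 - 2017)
print(f"{rate:.1%}")  # prints 11.7%
```

Because the report speaks of "more than" USD 210 billion in 2020, the computed value is best read as a lower bound that agrees with the quoted 11.7%.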
Literature review of big data

Large and varied datasets can be found in electronic medical records, social network
sites, Internet clickstream logs, and so on. This type of data is referred to as big data.
There is no fixed threshold for what counts as big data; as such, it is something of a ‘moving target’
(Chai and Shih 2017; Emmanuel and Stanier 2016; Reed and Dongarra 2015). Big data are
also characterised by the three Vs: velocity (the speed at which the data are generated);
volume (the size of data); and variety (the different types of data – numeric, text, images,
etc.). Big data management refers to the overall management of data collection, storage,
extraction, and analysis. A technology known as Hadoop is used to manage this big data
ecosystem. It was developed as an open source technology by the Apache Software Foundation
and is a collection of various coordinated applications. Each application is responsible
for a different function (i.e. data collection, storage, extraction, and analysis) (Holmes
2014; Daniel Coimbra de Almeida and Bernardino 2015).
The objective of this work was to review the emerging research in big data. Before we
discuss our data collection process, we provide a summary of three review articles on
big data and applications in the period 2015 to 2017. We built the context for the
current study by reviewing analyses with a similar purpose to ours. Ghosh (2016)
conducted a narrative review of big data in his editorial preface. Phillips-Wren et al.
(2015) provided a framework for big data research and organised research opportunities
under big data management and governance; this study involved conducting interviews
with industry experts and reviewing the relevant academic and practitioner literature.
Qazi and Sher (2016) conducted a narrative review of the literature on applications of
big data across business functions. In another study, researchers used a bibliometric
analysis to examine papers that had been published in the Web of Science database
from 1980 to March 2015 (Kalantari et al. 2017), taking into consideration the general
concentration, dispersion, and movement of the data pool. The results showed that
certain keywords (e.g. classification, neural networks, vector machines, MapReduce, etc.)
were pivotal and drove the research. The study by Kalantari et al. (2017) helped describe
the landscape of big data research and provided other researchers avenues for future
research. It employed a multivariate regression approach and included variables such as
the numbers of pages, references, authors, and citations. Mishra et al. (2017) took
a similar direction and approach; their study employed a bibliometric analysis to understand
the trends and challenges involved with big data, using a citation and co-citation
analysis to assess papers published from 2011 to 2015 in 10 selected journals. The work
conducted by Mishra et al. (2017) was instrumental in highlighting the literature.
Moreover, the study identified an increase in the number of big data papers published
during the study period. The studies mentioned above demonstrate the ongoing
research interest in identifying the trends and themes in the area of big data. This is
understandable, given the fact that this area is continuously evolving. Taking a range of
methodological perspectives into consideration is also important when cultivating
a comprehensive understanding of any research area. Our study contributes to these
efforts by exploring more recent articles, including those published between 2015 and 2017.
We conducted a narrative review of the recent literature and performed a quantitative
analysis using the previously described methods (i.e. PCA, co-citation analysis). We found
that PCA and co-citation analyses provide complementary findings, which helps con-
solidate big data insights.
Literature analysis by journals and conferences

Business and Information Systems (IS) engineering
Bichler, Heinzl, and van der Aalst (2017) defined the terms ‘business analytics’ and ‘data
science’. They contrasted the two terms, showing that ‘business analytics’ is evidence-
based problem recognition and involves solutions in a business context whereas data
science is an interdisciplinary field, albeit with the same focus. IS research is expected to
make considerable progress; in fact, most future research is expected to be in this field.
Communications of the ACM (CACM)

Articles in CACM address the engineering, management, and social aspects of big data.
An interview with Association for Computing Machinery (ACM) Turing award recipients
brought out the potential of big data for personalised services and the multitude of
differences between sensor-generated data and data from other sources (CACM staff
2017). It also highlighted the need to scale up real-time querying abilities and metho-
dological approaches, such as separating correlation from causation and removing the
bias from predictive models constructed by data scientists. Kugler (2016) drew attention
to the limitations of big data, despite its ability to manage data in the prediction of
disease outbreaks, highlighting the opportunity for more research in epidemic prevention
applications. Zaharia et al. (2016) discussed the potential applications of Apache
Spark, a scalable, open source framework that encompasses machine learning, querying,
and integration perspectives. Further, they compared the performance of Apache Spark
with other distributed computing platforms such as Hadoop. Reed and Dongarra (2015)
and Shim et al. (2016) explored the challenges and opportunities in applications of big
data technologies such as exascale computing and phonetic analytics to understand
consumer behaviour. Metcalf (2016) and King (2015) pointed out the need for
responsible use of big data technologies to provide some direction in the ethical use of
big data. Date (2016) discussed the boundary condition for a business decision of
moving data into Amazon storage either online or by physical shipment. Nair (2015)
opined that approximate, timely information might be preferable to precise, delayed
information when making business decisions.
Communications of the association for information systems

Gupta, Goul, and Dinter (2015) demonstrated a methodological approach for Business
Intelligence (BI) curriculum development and outlined three model curricula for teach-
ing BI in undergraduate, graduate science, and business degree programmes. Along
similar lines, Turel and Kapoor (2016) analysed the maturity of business analytics
programs in the US and identified a wide gap between industry expectations and the
curricula offered by programs. Sahay (2016) presented opportunities, challenges, and
applications as well as state and non-state initiatives for big data in the health-care
domain. Phillips-Wren et al. (2015) reviewed big data research in the period 2011–2014.
They organised the research from both academic and industrial perspectives using a big
data framework with the following components: data sources, data preparation, data
storage, analysis, access, usage, and big data management and governance. Demirkan
et al. (2015) proposed that service innovations should be treated in a continuous
manner rather than handled in a periodic manner. This is necessary, since the rate of
transformation of technologies that affect service innovation (i.e. big data, IoT, Social and
Cognitive Computing) is increasing at a pace not witnessed earlier.
Computers in Industry
Yang et al. (2015a) reviewed state-of-the-art health-care applications and suggested that
scalable technologies in cloud computing might provide cost-effective solutions in
healthcare. Babiceanu and Seker (2016) discussed the role of big data and analytics in
managing manufacturing operations. Further, they developed a framework for cyber-
physical systems which would facilitate data collection and algorithmic analysis of data.
Enterprise Information Systems

In a related study, a citation analysis was conducted to identify highly valued Enterprise
Information System (EIS) articles, and the co-citation method was subsequently used to
generate a raw matrix (Shiau 2016). The author explained that the source documents for
this analysis were obtained from the Scientific Indexing database. The matrix, in turn,
was analysed using statistical methods, such as factor, cluster, and Multidimensional
Scaling (MDS) analyses, to determine the intellectual core of EIS studies. This led to
a building block of related and seminal studies on other topics that had also contributed
to an understanding of the respective intellectual core. The study significantly informed
the current manuscript because it applied related methodologies, like a factor analysis
and citation/co-citations, to identify the intellectual core of big data. A semantic
similarity analysis of IoT was performed to determine IoT’s intellectual core (Ng et al. 2018).
Further, the Web of Science database was used to collect the metadata on papers, and it
was used to identify the high-value articles and to record citation growth. After screen-
ing, the semantic analysis was used to compile a co-citation matrix, which served as an
input for further statistical analyses (i.e. factor analysis, hierarchical cluster analysis, and
multidimensional scaling). These analyses were conducted to shed light on the intellectual
structure of IoT studies.
In another study, the authors demonstrated the huge scope for penetration of big
data and cyber-physical systems in Industry 4.0, since these systems help improve
resource efficiency and achieve personalisation (Xu and Duan 2018). Large volumes of
data created by cyber-physical systems can be handled by big data techniques and can
therefore improve system security, scalability, and efficiency. Quality as a Service (QaaS)
models for web services have also been studied using big data technologies. For
instance, Ahmad and Sarkar (2016) showed that Quality of Service (QoS) can be used
as an input, while the QaaS model provides an output for web services that matches
user expectations. It has been shown that evaluations of QoS
effectiveness can be conducted through server logs. After reviewing recent publications
in EIS – namely, ‘Big data for cyber-physical systems in Industry 4.0: A survey’ by Xu and
Duan (2018) and ‘QaaS model for web services using big data technologies’ by Ahmad and
Sarkar (2016) – we observed that these two papers focused on two different areas of
significant big data applications: cyber-physical systems and QoS. These papers were
similar to the present study in that we also consider applications of big data technology.
However, our manuscript is different in the sense that it builds on the research con-
ducted in other studies (including these two) and attempts to identify emerging
research issues and areas. Additionally, our manuscript does not examine a specific
application area such as cyber-physical systems or QoS.
Information and software technology

Osvaldo et al. (2017) proposed a model-driven engineering approach for software
development in a big data ecosystem. Along similar lines, Nadal et al. (2017) proposed
a reference architecture for the deployment of a big data system.
Interfaces
Baughman et al. (2016) built a model to predict the volume of cloud computing
resources needed to sustain the IT needs of an organisation in the context of a live
sporting event. The benefits of the model were that it was more efficient than human
counterparts and was proactive in provisioning resources from the cloud, unlike human
decision makers, who tend to be reactive.
Journal of global information technology management

Ghosh (2016) presented opportunities and challenges for IS research in paradigms such
as epistemology, practical computations, knowledge generation, value addition, security
and privacy, and interpretational inquiry.
Journal of information technology

Articles primarily focused on the use of big data for strategic decision-making.
Constantiou and Kallinikos (2015) emphasised the importance of data structures that
provide the ability to acquire granular data in a form that is amenable to algorithmic
processing. Yoo (2015) proposed the use of evolutionary models drawn from biological
sciences to study complex socio-technical environments enabled by big data.
Constantiou and Kallinikos (2015) differentiated a standard strategy context from a big
data context and discussed the associated opportunities and perils. The big data context
is heterogeneous, unstructured, haphazard, trans-semiotic, inductive, bottom-up, short-
horizon, and nowcasting in nature. This calls for cautious applications of big data when
devising strategy. Markus (2015) underlined the pros and cons of big data technology
with specific emphasis on data privacy. The standard notions of strategising in organisa-
tions seem to change with big data. Woerner and Wixom (2015) showed how big data
improve business models by (1) enabling the acquisition of new data, (2) providing new
insights, and (3) suggesting new actions. Big data also bring innovations to the business
model through data monetisation and digital transformation. Zuboff (2015) proposed an
alternative conceptualisation of big data as a form of surveillance capitalism, the
objective of which is to collect data about individuals and their habits for the purpose
of controlling and modifying behaviour for commerce. Bhimani (2015) commented on
the mechanism through which big data shapes strategy: by increasing the barriers to
entry; by redefining influence and organisational power; and by changing the relation-
ship between organisations and their stakeholders.
Management information systems quarterly

Mani, Shmueli, and Yahav (2015) studied the computational performance of tree-based
models on big data samples. Brynjolfsson, Geva, and Reichman (2015) introduced
a novel method called ‘crowd-squared’ to address data selection issues in identifying
the relevant data for subsequent modelling. Martens et al. (2016) demonstrated that the
use of massive amounts of fine-grained data yields substantial improvement in predic-
tions, thereby yielding better insights for targeted marketing campaigns. Menon and
Sarkar (2016) presented novel, scalable approaches that provided near-optimal solutions
for addressing privacy concerns in big data. Baesens, De Winne, and Sels (2017) explored
the managerial and technical issues and opportunities of research, such as ubiquitous
informing, implementation environments, integration issues, value assessment, regula-
tory compliance, managing analytical decisions, and return-on-investment capital in
networked business. Saboo, Kumar, and Park (2016) highlighted the importance of
incorporating temporal effects on the effectiveness of the marketing mix and provided
a model that could improve firms’ revenues by 17%. Ketter et al. (2015) discussed a new
approach called ‘competitive benchmarking’ to address complex, multi-disciplinary
problems involving socio-technical systems.
MIT Sloan management review

Winig (2017) showcased a data-driven approach pursued by a commercial bank to
commercialise new products, measure the value of a customer from multiple viewpoints,
and engage with customers at all socio-economic levels. Fitzgerald (2015) highlighted the
best practices of General Mills and emphasised the importance of having the right people
to make sense of big data. Kiron (2017) enumerated the key stages in building a successful
data-driven organisation: (1) managing the data, (2) forging data partnerships, (3) valuing
data as an asset, and (4) commercialising data-based business models. Ransbotham, Kiron,
and Prentice (2015) emphasised the importance of being able to use the results of
analytical models to make business decisions. The authors differentiated between analy-
tics consumption and analytics production. Further, they recommended the following
steps to bridge the gap between production and consumption: understanding analytics
basics; building on experience; taking incremental steps; leveraging domain knowledge;
and recognising limitations. Chai and Shih (2017) presented the risks of relying upon
findings or patterns from data mining that are not necessarily supported by a hypothesis
or theory. They argued that the value of traditional methods should not be ignored and
that big data should be used to supplement existing methods instead of replacing them.
Baesens, De Winne, and Sels (2017) discussed four key steps to the success of human
resource analytics: (1) modelling and measuring employee network dynamics; (2) mana-
ging expectations from analytics; (3) focusing on business insights rather than model
performance; and (4) periodic reviewing of the models.
Research-technology management

The focus of these articles is on the use of big data analytics in research and development.
Markham, Kowolenko, and Michaelis (2015) emphasised the use of big data analytics for the
development of new products that identify target customers, for developing appropriate
advertising, and for research and development of products and services. Holden (2016)
argued that the inclusion of big data in research and development decisions would affect
strategic decisions, in part making them more accurate.
The international technology management review

Qazi and Sher (2016) explored the application and value addition of big data and analytics in
several business contexts such as e-commerce, human resource management, customer
relationship management, innovations, accounting, supply chain, and business policy.
ACM international conference on big data research

A 2017 study proposed an automatic database-building and visualisation platform that
can automatically convert collected data into a database, regardless of the data format
(Back and Ha 2017). The optimisation of large data through MapReduce was also
discussed at this conference. The researchers optimised the related parameters on the
Hadoop platform and compared test data by using the TeraSort procedure before and
after the optimisation. It was found that optimisation significantly influenced the
performance of the Hadoop platform (Yan 2017).
ACM international workshop on big data software engineering

Villanustre (2015) recently discussed the various challenges involved with big data and
also presented his experience in this field, using the example of a company that uses big
data analytics as its core business. The Hadoop ecosystem includes several components,
and software engineering is an important part of the big data field’s development. One
study described the following characteristics and processes of big data: requirements,
architecture, testing, and maintenance (Madhavji, Miranskyy, and Kontogiannis 2015).
IEEE international conference on big data (different years)

Baumann (2017) presented various data cube standards for ease of access to Geo big data,
such as satellite imagery data and data cube model, and also contributed to a cross-
disciplinary exchange on modelling and other challenges in Geo Data Mining. Gates et al.
(2015) proposed a multi-core CPU and GPU implementation for an alternating least-
squares algorithm to compute recommendations based on implicit feedback datasets.
Gates et al. (2015) also suggested reordering the sequential system generation to simul-
taneously compute numerous systems. There were also studies that considered big data
analytics. For instance, one study used analytics to identify the caution spots regarding
accidents from the big data generated through vehicle recorders; the authors recom-
mended the use of a visual exploration system that enables the identification of various
types of caution spots (Itoh et al. 2015). There have also been discussions on issues related
to data quality in big data technology. In one study, the researchers proposed a scalable
approach to enhance the quality of big data by cleaning inconsistent data (Benbernou
and Ouziri 2017). Data access is a major bottleneck in big data processing. A study
conducted in this area demonstrated the use of PortHadoop in solving the cross-platform
data-access issue. Experimental results showed that PortHadoop was effective
and compatible with high-performance computing (Yang et al. 2015b).
IEEE international congress on big data (different years)

There has been an increasing amount of discussion about Hadoop components. One
study introduced improvements in Hadoop components that fully exploit the performance
benefits of solid state drives (Hong et al. 2016). Moreover, the use of NoSQL in high-
performance processing is currently being deliberated in the literature. In one study, the
authors proposed an NoSQL data model that supports performance according to the input
database size and also enables a flexible framework (Mohan et al. 2016). Elshater et al. (2015)
studied the impact of the YARN scheduler on data locality. YARN stands for Yet Another
Resource Negotiator, and it is a resource management technology used in Hadoop distributed
processing. Elshater et al. (2015) identified a trade-off between response time and data
locality. Another study compared the big data open source platforms to help companies
choose the one most suitable for their needs: Apache Mahout, MOA, R Project, Vowpal Wabbit,
Pegasus, and GraphLab Create (Daniel Coimbra de Almeida and Bernardino 2015).
International conference on big data and advanced wireless technologies

A study conducted on IoT data depicted IoT’s benefits and scalability. The researcher
presented new contributions in IoT’s domains of hardware, communication, data sto-
rage, and data-processing that make it flexible and cost-effective (Pham 2016). The issue
of data quality is being increasingly discussed in seminal conferences since it is
a relevant and current issue. In one study, the authors examined the most commonly
used definitions of big data and discussed the strengths and limitations of the different
approaches, with particular reference to data quality issues. They also proposed an
alternative definition of big data that is based on data quality (Emmanuel and Stanier
2016). The implementation of big data across different industries has also been men-
tioned at recent conferences. Seref and Bostanci (2016) examined the use of data
gathered from health-care devices to track cardiovascular diseases, glaucoma, diabetes,
and other conditions. They reviewed the developments in the big data of wearable
devices and outlined state-of-the-art approaches.
Research methodology
The methodology we used primarily employs two different approaches: a PCA and
a bibliometric analysis (apart from keyword and subjective analyses). In the section on
interpretation, we reconciled the results of the co-citation and PCA, which together
showed the structure of the field. The objective of the PCA was to establish the under-
lying pattern in the keywords that frequently appear in the narrowed-down papers.
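As an illustration of how a PCA can surface patterns in keywords, the following minimal sketch runs a PCA on a small paper-by-keyword indicator matrix. The study does not report how its keyword matrix was built or which software was used, so the toy matrix, the keyword subset, and the helper `pca` are assumptions made for illustration. The decomposition uses NumPy's singular value decomposition of the mean-centred data.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of the mean-centred data matrix (rows = papers, columns = keywords)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each keyword column
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = S ** 2 / (X.shape[0] - 1)    # variance along each component
    ratio = explained_var / explained_var.sum()  # share of total variance
    components = Vt[:n_components]               # keyword loadings per component
    scores = Xc @ components.T                   # papers projected onto components
    return components, scores, ratio[:n_components]


# Hypothetical toy data: 1 means the keyword appears in the paper.
keywords = ["big data", "Hadoop", "analytics", "data privacy"]
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
])

components, scores, ratio = pca(X, n_components=2)
for i, loadings in enumerate(components, start=1):
    print(f"PC{i}", dict(zip(keywords, np.round(loadings, 2))))
```

Keywords that load heavily on the same component tend to co-occur across papers, which is the kind of underlying pattern the analysis is meant to establish.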
Next, we performed a bibliometric (citation and co-citation) analysis. Citation
analysis is a tool used to investigate the intellectual structure of a given field
(Garfield 1979). It can be used to identify seminal and influential papers and par-
ent–child relationships between source and derivative works, and is based on cita-
tions made or received by a paper (Wang et al. 2016). We performed citation analysis
to identify seminal works and key journals, after which we analysed co-citations,
which occur when two or more papers are cited together by another paper. The
higher the number of co-citations, the higher is the possibility that two papers are
semantically related to each other (Small 1973). The semantic relation that emerges
is usually strong because it reflects the opinions of a wide set of authors (Small
1973). Wang et al. (2016) used co-citation analysis to investigate the structure of the
cloud-computing domain within the IS field. Outside of IS, Pilkington and Meredith
(2009) employed citation and co-citation analysis to uncover the structure of the
operations management field.
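The counting behind citation and co-citation analysis can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: the reference lists are hypothetical toy data, and the helper names `citation_counts` and `cocitation_counts` are ours. A pair of works is counted as co-cited once for every citing paper whose reference list contains both.

```python
from collections import Counter
from itertools import combinations


def citation_counts(reference_lists):
    """Number of citing papers that reference each work."""
    return Counter(ref for refs in reference_lists.values() for ref in set(refs))


def cocitation_counts(reference_lists):
    """Number of citing papers in which each pair of works is cited together."""
    counts = Counter()
    for refs in reference_lists.values():
        # Sort so that each unordered pair always has the same key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: citing paper -> works it cites.
reference_lists = {
    "paper_1": ["work_A", "work_B", "work_C"],
    "paper_2": ["work_A", "work_B"],
    "paper_3": ["work_B", "work_C"],
}

citations = citation_counts(reference_lists)
cocitations = cocitation_counts(reference_lists)
print(citations.most_common())
print(cocitations.most_common())
```

The pair counts can be arranged into a symmetric matrix, which is the usual input to further multivariate analysis of the kind described above.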
The steps we used to gather relevant data were as follows. We considered a set of
international peer-reviewed journals and leading conferences from both management
and technology streams within the IS domain. Gauging the real level of interest also
requires considering how attractive big data is as a research domain. Consequently, the
analysis should also include conferences because work is disseminated more quickly
on that platform. We selectively scanned multiple prestigious conferences at which
relevant papers were recently presented, and identified journal and conference
articles using a keyword search with a publication window of 2015–2017. Articles
prior to 2015 were not considered, as the content in these articles would have
reflected information from one or two years prior to the publication year. Based on
our review of content, we shortlisted a total of 61 articles from journals and
conferences in the publication window 2015–2017. In this section, we organise the
review of the articles by journal and conferences. Table 1 provides year-wise pub-
lication counts across journals.
Application of the research methodology

The methodological steps employed in this study are as follows.
Identification of journals and papers and shortlisting the papers

In the first step, a set of search keywords was defined and papers were searched in
journals ranked by the Australian Business Deans Council (ABDC) and conferences ranked by
Excellence in Research for Australia (ERA) and Qualis (CORE, Qualis). The keywords chosen
were: analytics, data quality, big data, Hadoop, data privacy, visualisation, data mining,
data preparation, data cleaning, data storage, Cloudera, and Amazon. In the next step,
we identified the number of citations for these shortlisted papers and organised them
by citations through Google Scholar. The shortlisted papers from the journals were
organised in descending order by citation (Table 2).
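The shortlisting step described above amounts to a keyword filter followed by a sort on citation counts. The following sketch shows one way this could be done. It assumes that keywords are matched case-insensitively against paper titles, which the study does not specify; the paper records and the helper names are hypothetical.

```python
# Keywords as listed in the text.
KEYWORDS = [
    "analytics", "data quality", "big data", "Hadoop", "data privacy",
    "visualisation", "data mining", "data preparation", "data cleaning",
    "data storage", "Cloudera", "Amazon",
]


def matches_keywords(title, keywords=KEYWORDS):
    """True if any search keyword occurs in the title (case-insensitive)."""
    text = title.lower()
    return any(keyword.lower() in text for keyword in keywords)


def shortlist(papers):
    """Keep papers that match a keyword and order them by citations, highest first."""
    hits = [paper for paper in papers if matches_keywords(paper["title"])]
    return sorted(hits, key=lambda paper: paper["citations"], reverse=True)


# Hypothetical records; citation counts would come from a source such as Google Scholar.
papers = [
    {"title": "Scaling Hadoop clusters", "citations": 12},
    {"title": "A survey of compiler design", "citations": 40},
    {"title": "Big data and privacy", "citations": 30},
]

for paper in shortlist(papers):
    print(paper["citations"], paper["title"])
```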
Table 1. Year-wise publications of journals and conferences.

Journal or conference                                                      2015  2016  2017  2018
Business and Information Systems Engineering                                -     -     1     -
Communications of the ACM                                                    5     4     1     -
Enterprise Information Systems                                               1     1     -     2
Communications of the AIS                                                    4     1     -     -
Interfaces                                                                   -     1     -     -
Information and Software Technology                                          -     -     2     -
The International Technology Management Review                              -     1     -     -
Journal of Global Information Technology Management                         -     1     -     -
Journal of Information Technology                                            9     -     -     -
MIS Quarterly                                                                -     6     -     -
MIT Sloan Management Review                                                  -     5     1     -
Research-Technology Management                                               1     1     -     -
Computers in Industry                                                        1     1     -     -
ACM International Workshop on Big Data Software Engineering                  3     -     -     -
IEEE International Congress on Big Data                                      5     2     2     -
ACM International Conference on Big Data Research                            -     -     2     -
International Conference on Big Data and Advanced Wireless Technologies     -     1     -     -
Grand Total                                                                 29    25     9     2

We did not classify the papers since we wanted to shortlist and identify papers by
citation before performing any analysis. Using these keywords, we were able to shortlist
65 papers and proceedings published in international peer-reviewed journals and con-
ferences from recent studies (01/2015–06/2018) on big data (ABDC, ERA, Qualis). We
selected only those papers that received a higher number of citations because highly
cited articles are high-value papers in their respective fields of study. It was important to
ensure that only influential articles were selected (Shiau 2016).
In the next step, the research papers’ metadata for the citation and co-citation
analysis were retrieved from Crossref, a leading worldwide database containing more
than 100 million registered content records (Crossref). Crossref also has over 7.9 million
records that contain Crossmark, more than 3.3 million records with funding
information, and more than 2 million records that have at least one ORCID ID. Crossref
also interlinks various reputable journals, books, and scientific databases, thus allowing
for research discovery along with citation indexing. It is trusted by approximately 11,629
worldwide scholarly member organisations (source: www.crossref.org).
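As a minimal sketch of this kind of metadata retrieval, the Python fragment below builds a request URL for Crossref's public REST API and extracts the citation count and cited DOIs from a response. It is an illustration only and not the procedure followed in this study, which obtained an XML representation through OpenURL. The helper names, the sample payload, and the placeholder DOIs are assumptions made for the example; no network call is made.

```python
import json
from urllib.parse import quote

# Base URL of the Crossref REST API "works" route.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def works_url(doi):
    """Return the REST API URL for one DOI (hypothetical helper)."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def summarise_work(payload):
    """Pick the fields relevant to citation analysis from a works response."""
    msg = payload["message"]
    return {
        "doi": msg.get("DOI"),
        "title": (msg.get("title") or [None])[0],
        "times_cited": msg.get("is-referenced-by-count", 0),
        # Only references that Crossref has matched to a DOI can be linked.
        "cited_dois": [r["DOI"] for r in msg.get("reference", []) if "DOI" in r],
    }


# Invented sample payload with placeholder DOIs, shaped like a works response.
sample_response = json.loads("""
{
  "status": "ok",
  "message": {
    "DOI": "10.1234/example.paper",
    "title": ["A placeholder title"],
    "is-referenced-by-count": 7,
    "reference": [
      {"key": "ref1", "DOI": "10.1234/example.cited-1"},
      {"key": "ref2", "unstructured": "A reference without a DOI"},
      {"key": "ref3", "DOI": "10.1234/example.cited-2"}
    ]
  }
}
""")

summary = summarise_work(sample_response)
print(works_url("10.1234/example.paper"))
print(summary)
```

References that carry no DOI are dropped in this sketch, which is one reason why citation counts obtained from different sources may differ.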
Identification of conferences and papers

Papers presented at conferences tend to explore topics that are of imminent interest
and are in the developing or final development stage. Thus, they often provide the most
current information on various topics within a field of study. We obtained papers that
were presented in conferences ranked by Qualis and ERA (Excellence in Research for
Australia). ERA ratings are created by the Computing Research and Education Association of
Australasia (CORE), while Qualis ratings are published by the Brazilian ministry of
education. The conferences were selected based on their rating by Qualis and ERA, which
group the IEEE and ACM conferences in different performance classes. The performance
class or rank is derived from the number of citations of conference papers, authors, and
other scientific criteria. The Qualis ranking uses the H-index as the performance
measure for conferences, and the ranking categories range from A1 to B5. The ERA
Table 2. Frequently cited big data related papers.
ID Authors (Year) Source Times cited
1 Garfield (1972) Science AAAS 3031
2 Small (1973) Journal of the American Society for Information Science 2971
3 Garfield (1979) Scientometrics 784
4 Shoshana Zuboff (2015) Journal of Information Technology 472
5 Zaharia et al. (2016) Communications of the ACM 424
6 Pilkington and Meredith (2009) Journal of Operations Management 337
7 Palvia et al. (2003) Communications of the ACM 253
8 Reed and Dongarra (2015) Communications of the ACM 230
9 Davenport (2013) Harvard Business Review 227
10 Palvia, Pinjani, and Sibley (2007) Information & Management 179
11 Constantiou and Kallinikos (2015) Journal of Information Technology 159
12 Babiceanu and Seker (2016) Computers in Industry 131
13 Baesens et al. (2014) MIS quarterly 93
14 Metcalf and Crawford (2016) Big Data & Society 88
15 Phillips-Wren et al. (2015) Communications of the AIS 71
16 Yang et al. (2015a) Computers in Industry 67
17 Woerner and Wixom (2015) Journal of Information Technology 56
18 Gupta, Goul, and Dinter (2015) Communications of the AIS 53
19 Demirkan et al. (2015) Communications of the AIS 49
20 Ghose and Todri (2015) MIS quarterly 46
21 Markus (2015) Journal of Information Technology 46
22 Wang et al. (2016) Decision Support Systems 45
23 Ketter et al. (2015) MIS quarterly 42
24 Martens et al. (2016) MIS quarterly 41
25 Bhimani (2015) Journal of Information Technology 35
26 Gates et al. (2015) IEEE International Conference on Big Data 29
27 Nair (2015) Communications of the ACM 28
28 Yoo (2015) Journal of Information Technology 27
29 Madhavji, Miranskyy, and Kontogiannis (2015) IEEE/ACM 1st International Workshop on Big Data Software Engineering, Florence, Italy 26
30 Markham, Kowolenko, and Michaelis (2015) Research-Technology Management 26
31 Saboo, Kumar, and Park (2016) MIS quarterly 26
32 Kallinikos and Constantiou (2015) Journal of Information Technology 25
33 Brynjolfsson, Geva, and Reichman (2015) MIS quarterly 16
34 Ransbotham, Kiron, and Prentice (2015) MIT Sloan Management Review 15
35 Bichler, Heinzl, and van der Aalst (2017) Business & Information Systems Engineering 14
36 Menon and Sarkar (2016) MIS quarterly 14
37 Metcalf (2016) Communications of the ACM 13
38 Mani, Shmueli, and Yahav (2015) MIS quarterly 12
39 Mishra et al. (2017) Business Process Management Journal 11
40 Turel and Kapoor (2016) Communications of the AIS 11
41 Nadal et al. (2017) Information and Software Technology 10
42 Yang et al. (2015b) IEEE Conference on Big Data 10
43 de Almeida and Bernardino (2015) IEEE International Congress on Big Data, New York, USA 9
44 Itoh et al. (2015) IEEE International Conference on Big Data 9
45 Kalantari et al. (2017) Journal of Big Data 8
46 Kiron (2017) MIT Sloan Management Review 8
47 Kugler (2016) Communications of the ACM 8
48 Mohan et al. (2016) IEEE International Congress on Big Data, New York, USA 8
49 Elshater et al. (2015) IEEE International Congress on Big Data, New York, USA 7
50 Hong et al. (2016) IEEE International Congress on Big Data, New York, USA 7
51 King (2015) Communications of the ACM 7
52 Mohamed, Fernando, and Ho (2015) IEEE/ACM 1st International Workshop on Big Data Software Engineering, Florence, Italy 7
53 Sahay (2016) Communications of the AIS 7
54 Shim et al. (2016) Communications of the ACM 7
55 Emmanuel and Stanier (2016) International Conference on Big Data and Advanced Wireless Technologies, Blagoevgrad, Bulgaria 6
56 Baughman et al. (2016) Interfaces 6
57 Chai and Shih (2017) MIT Sloan Management Review 5
58 Osvaldo et al. (2017) Information and Software Technology 4
59 Seref and Bostanci (2016) International Conference on Big Data and Advanced Wireless Technologies Blagoevgrad, Bulgaria 3
60 Villanustre (2015) IEEE/ACM 1st International Workshop on Big Data Software Engineering, Florence, Italy 3
61 Winig (2017) MIT Sloan Management Review 3
62 Holden (2016) Research-Technology Management 2
63 Baesens, Winne, and Sels (2017) MIT Sloan Management Review 1
64 Baumann (2017) IEEE International Conference on Big Data, Boston, USA 1
65 Benbernou and Ouziri (2017) IEEE International Conference on Big Data 1
66 Date (2016) Communications of the ACM 1
67 Fitzgerald (2015) MIT Sloan Management Review 1
68 Ghosh (2016) Journal of Global Information Technology Management 1
69 Qazi and Sher (2016) International Technology Management Review 1
rankings are listed as A, B, and C, with A being the best. Approximately 20% of the IEEE
and ACM conferences fall into A categories, while 75% fall into different B categories.
Because of these high rankings, we shortlisted the IEEE and ACM conferences in our
research. The papers from IEEE and ACM conferences were identified based on
a keyword match to big data.
Keyword analyses on textual information are often used to identify and shortlist papers
(Mishra et al. 2017; Li and Duan 2018). The application of a keyword analysis can use
different approaches: conventional, directed, or summative. These applications extract
quantifiable information from textual data, and each approach uses a specific coding
scheme, coding origin, and analytical procedure. Analysis of content involves converting
unstructured to structured content so that trends and patterns can be perceived within
the content. For example, studies have created metadata from content through extrac-
tion of entities and subject classifications. Keyword analysis is an established research
method in IS research (Palvia et al. 2003; Palvia, Pinjani, and Sibley 2007).
In this study, we used a large corpus of text (the journal papers) to evaluate big data
and analytics research trends. In conventional content analysis, coding categories are
derived directly from the text data. We used a summative analysis of keywords in the
content, followed by interpretation of the underlying context. This approach involved
quantifying the frequency of specific keywords in each paper to identify major concerns
connected to big data. We examined papers from journals and conferences in a 3.5-year
period (01/2015–06/2018), scanning entire papers to obtain keyword statistics. A coding
sheet was used to ensure standardisation and consistency in the process of keyword
counting and to guarantee that all relevant keywords were recorded. Only keywords
explicitly used in published papers and conferences were recorded.
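The keyword counting described above can be illustrated with a short Python sketch. It is a simplified stand-in for the coding sheet, not the instrument used in this study: the matching rule (case-insensitive, whole phrase) and the two sample texts are assumptions made for the example, while the keyword list is the one given in this paper.

```python
import re
from collections import Counter

# Keyword list as stated in this paper.
KEYWORDS = [
    "analytics", "data quality", "big data", "hadoop", "data privacy",
    "visualisation", "data mining", "data preparation", "data cleaning",
    "data storage", "cloudera", "amazon",
]


def count_keywords(text, keywords=KEYWORDS):
    """Count case-insensitive, whole-phrase occurrences of each keyword."""
    lowered = text.lower()
    counts = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, lowered))
    return counts


def corpus_frequencies(papers):
    """Sum the per-paper keyword counts over a collection of full texts."""
    total = Counter()
    for text in papers:
        total.update(count_keywords(text))
    return total


# Two invented snippets standing in for full paper texts.
papers = [
    "Big data analytics on Hadoop. Big data needs data storage.",
    "Data privacy and data mining in big data.",
]
totals = corpus_frequencies(papers)
for keyword, frequency in totals.most_common():
    print(keyword, frequency)
```

A rule of this kind treats spelling variants (for example, "visualization") as different strings, so a real coding sheet would need to list the variants that should be counted together.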
The keywords were based on a review of issues related to research and practice that
seem to surface often in the literature. In the ‘Literature review’ section, we discussed
these issues according to the research agenda, methodologies, and findings. We
selected the following keywords based on this review: analytics, data quality, big data,
Hadoop, data privacy, visualisation, data mining, data preparation, data cleaning, data
storage, Cloudera, and Amazon. As shown in Figure 1, critical issues that have been
discussed often in recent published articles are big data and analytics followed by data
storage, data privacy, and data mining.
Conceptual framework: a conceptual analysis from a data management perspective

A topic that is frequently discussed in the recent literature is data management. Some
studies deliberate on the characteristics of data (i.e. volume, velocity, variety, variability,
veracity, and value) (Demirkan et al. 2015). We use a conceptual framework inspired by
Demirkan et al. (2015) to organise the literature on the basis of the functional aspect
addressed in the article, which could be one or more of the following: what to keep,
where to keep it, how to keep it, and why to keep it.
Figure 1. Keyword frequency graph (horizontal bars; x-axis: frequency). Values shown: Big Data 2362; Analytics 1246; Hadoop 456; Data Storage 256; Visualization 137; Data Mining 114; Data Quality 102; Amazon 63; Data Privacy 22; Data Cleaning 15; Data Preparation 13; Cloudera 6.
Figure 2. Key issues being researched in big data (diagram centred on Big Data with four questions). Where? Amazon (21 papers). How? Cloudera, Hadoop (28 papers). Why? Analytics, Visualization, Data Mining (111 papers). What? Data cleaning, quality, storage (54 papers).

Principal component analysis

The possibility exists that there are specific components underlying different variables
(i.e. keywords), the logic being that multiple variables may have similar patterns of
responses, as they are all associated with the latent construct. In this context, it could
be meaningful to collapse a large number of variables into a few underlying constructs
or components. To achieve this, we ran PCA on all the variables.
The appropriateness of the dataset was first evaluated using the Kaiser-Meyer-Olkin
(KMO) measure and Bartlett’s test of sphericity. The appropriateness was confirmed by the
results, which are shown in Table 3. There could be partial correlations among a few variables
in the dataset. If a large partial correlation exists, this indicates that some but not all
variables share variance. In a PCA, it is optimal when a variable shares variance with all
other variables in the dataset. Kaiser’s Measure of Sampling Adequacy (MSA) is
a measure used to capture this effect (Kaiser 1974; Jollife 2002). A small MSA value
indicates that the correlation between any two variables (xij) is unique and is not related
to the remaining variables. Normally, MSA values below 0.5 are not considered accep-
table. We examined the matrix of partial correlations, the MSA for each variable, and the
overall MSA computed across all variables. It is recommended that variables with small
MSAs be removed so that the unique correlation effect among variables does not affect
the PCA (Jollife 2002). An alternative option would be to supplement the data with
additional variables that could be correlated with the variables that display low MSAs.
Once the dataset was deemed appropriate, the MSA values of the individual variables
were examined using the anti-image correlation matrix. Three variables – ‘data quality’,
‘data privacy’, and ‘Hadoop’ – did not meet the desirable MSA value of 0.5 and were
subsequently dropped. The variable ‘data privacy’ had the lowest MSA value and was
dropped first, followed by ‘data quality’ and ‘Hadoop’. ‘Hadoop’, which had a marginal
MSA value, was dropped because of its unclear placement among the components. This
is understandable as ‘Hadoop’ is a common research theme and is discussed across
a range of subjects, from ‘big data management’ and ‘data services’ to ‘intelligence’. The
communalities of the remaining nine variables (as displayed in Table 4) were found to be
greater than 0.5, and the PCA therefore continued with these nine variables.
We also considered Bartlett’s test of sphericity. This test indicates
whether variables are related to each other and are thus amenable to the detection of
factors (Cross 2015). The null hypothesis is that the correlation matrix
is an identity matrix, that is, that the sample is drawn from a population in
which the variables are uncorrelated. If the null hypothesis is rejected at the chosen
significance level, one may proceed with PCA. In this test, it was found that the
p-value was small (less than 0.05), indicating that PCA would be useful for these data.
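To make the test concrete, the following Python sketch computes Bartlett's statistic in its standard textbook form, chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix of p variables and n is the number of observations. It is not the statistical package used in this study, and the 3 x 3 correlation matrix and the sample size of 50 are invented for illustration. With nine variables the formula gives 36 degrees of freedom, which agrees with the value reported in Table 3.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    p = len(corr)
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df


# Invented correlation matrix: three variables, all pairwise correlations 0.5.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
chi_square, df = bartlett_sphericity(corr, n_obs=50)
print(round(chi_square, 3), df)

# Degrees of freedom for nine variables, using an identity matrix as input.
identity9 = [[1.0 if i == j else 0.0 for j in range(9)] for i in range(9)]
chi_identity, df_nine = bartlett_sphericity(identity9, n_obs=50)
print(df_nine)
```

The p-value is obtained by comparing the statistic with a chi-square distribution on the stated degrees of freedom; that step is left out here to keep the sketch free of external libraries. An identity correlation matrix has determinant 1, so its statistic is zero and the null hypothesis is not rejected.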
Observation of the communalities indicated that the components could extract
a large percentage of the variance from the variables (Table 4). The eigenvalue is
a measure that captures how much of the variable variance any one component
explains. Any component with an eigenvalue ≥1 explains more variance than a single
observed variable (Jollife 2002). We were able to obtain three components (eigenvalues
>1.0) that could explain, collectively, 64.01% of the total variance (Table 5).
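The arithmetic behind these figures can be retraced from the eigenvalues reported in Table 5. The sketch below, offered as an illustration rather than as the original computation, applies the eigenvalue-greater-than-one rule and converts each eigenvalue into a share of total variance. For a PCA on a correlation matrix of nine standardised variables, the eigenvalues sum to nine. Small differences from the published percentages arise because the eigenvalues are rounded to three decimals.

```python
# Initial eigenvalues as reported in Table 5 (nine variables).
eigenvalues = [2.783, 1.825, 1.152, 0.779, 0.747, 0.683, 0.529, 0.356, 0.146]

total = sum(eigenvalues)  # equals the number of variables, up to rounding

# Keep components whose eigenvalue exceeds 1.0.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Share of total variance explained by each component, in percent.
pct_variance = [100 * ev / total for ev in eigenvalues]

# Cumulative share explained by the retained components.
cumulative_retained = sum(pct_variance[: len(retained)])

print(len(retained), round(total, 3), round(cumulative_retained, 2))
```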
PCA was employed to obtain the factor-variable loading relationships. The unrotated factor-
variable loadings are shown in Table 6. The relationship of the variable to the underlying factor
Table 3. KMO and Bartlett’s test.

Kaiser-Meyer-Olkin measure of sampling adequacy        .584
Bartlett’s test of sphericity    Approx. Chi-Square    151.889
                                 Df                    36
                                 Sig.                  .000
Table 4. Communalities.
Initial Extraction
Analytics 1.000 .601
Big data 1.000 .507
Visualisation 1.000 .517
Data mining 1.000 .691
Data preparation 1.000 .812
Data cleaning 1.000 .509
Storage 1.000 .543
Cloudera 1.000 .811
Amazon 1.000 .770
Extraction Method: Principal Component Analysis.
Table 5. Total variance explained.

           Initial eigenvalues                  Extraction sums of squared loadings  Rotation sums of squared loadings
Component  Total  % of Variance  Cumulative %   Total  % of Variance  Cumulative %   Total  % of Variance  Cumulative %
1          2.783  30.926         30.926         2.783  30.926         30.926         2.333  25.919         25.919
2          1.825  20.280         51.206         1.825  20.280         51.206         2.029  22.545         48.464
3          1.152  12.805         64.011         1.152  12.805         64.011         1.399  15.547         64.011
4          .779   8.661          72.672
5          .747   8.297          80.969
6          .683   7.589          88.558
7          .529   5.872          94.431
8          .356   3.951          98.381
9          .146   1.619          100.000
Table 6. Unrotated component matrix.

Component
1 2 3
Data preparation .786
Cloudera .766
Storage .694
Amazon .634 −.540
Big data .607
Analytics .628
Data cleaning .340 −.545
Data mining .692
Visualisation .504 .507
a. 3 components extracted.
or component is expressed by the loading. We suppressed factor-variable loadings below 0.5 in
Table 6 to focus on strong factor-variable relationships. We observe from Table 6 that six
variables load on factor one, followed by four variables on factor two, and two variables on
factor three. Three variables – Amazon, data cleaning, and visualisation – also indicated the
Table 7. Rotated component matrix.

                     Component
                     1       2       3
Data preparation     .892
Big data             .707
Storage              .699
Analytics            .624
Amazon                       .861
Cloudera                     .833
Data cleaning                .706
Data mining                          .830
Visualisation                        .711
Rotation Method: Varimax with Kaiser Normalisation.
a. Rotation converged in five iterations.
presence of cross loading. The variable Amazon cross-loaded on factors 1 and 2. Data cleaning
cross-loaded on factors 1 and 2, and visualisation cross-loaded on factors 2 and 3.
To obtain a clearer picture, we conducted rotations in PCA. Rotation was necessary
because a few variables loaded almost equally on the second
component, making it difficult to differentiate and interpret the components. The rule of thumb is
that a component must clearly load at least two variables. We find that the rotations did
not change the position of variables. However, the coordinates of the variable vectors
were changed. We observed that rotated component loadings provided a differentiated
loading of components (Table 7). Other rotation methods, oblimin and promax, did not
reveal patterns as interpretable as the one produced by varimax. Thus, we pursued the
analysis with varimax rotation. Varimax maximises the variance of the squared loadings
within each component and so tends to associate each variable with at most one factor,
thereby simplifying the interpretation.
Interpretation and discussion of the PCA results

Three components with discriminatory and strong factor-variable loading relationships
are observable in Table 7. This makes sense as the first factor (component) seems to be
capturing qualities related to a construct called ‘big data management’. Likewise,
the second and third component point to the constructs related to ‘data services’ and
‘intelligence’, respectively.
The construct ‘big data management’ describes the management and execution of
engineering activities and deployment of related technologies. The variety characteristic
requires a basket of technologies to move data into the big data ecosystem from the
applications and databases that give rise to such data. Data preparation is the first step,
whereby an organisation tries to capture all data relevant for the analysis into one
ecosystem. Data preparation is similar to the collection of structured, semi-structured,
unstructured, and streaming data. For example, an organisation may have structured
data in its enterprise resource planning, supply chain management, or customer rela-
tionship management servers. It could also collect social media data or telemetry data
about its customers from its IoT systems, which then must be moved into the big data
ecosystem. Data preparation can be thought of as a set of engineering activities that
uses various technologies to move data, for example, from a Structured Query Language
(SQL) type environment into Hadoop. The volume characteristic of big data demands
the use of distributed technologies for storage and processing. Hadoop is a scalable
framework that allows distributed storage, processing, and resource management across
a distributed ecosystem. While data preparation and Hadoop treat data
management from an engineering perspective, big data and storage could be analogous
to data storage in a big data ecosystem. On analytics, whereas it is reasonable to expect
that factor 3, ‘intelligence’, loads on to it, we found that the factor ‘big data
management’ had a higher loading on analytics than did ‘intelligence’. One plausible
explanation is that the scope of analytics is greater than visualisation or data mining. For example,
exploratory, descriptive, and inferential analytics are supported by big data technologies
such as Hive and Spark in the Hadoop ecosystem. Spark is native to big data ecosystems, and
it supports scalable analytics through its machine learning libraries. We infer that the
native technologies might have led to the assumption that analytics is more inherently
part of the ‘big data management’ component than of the ‘intelligence’ component.
The second component, ‘data services’, entails functional services related to data
cleaning and the service providers who offer such services. Service providers such as
Amazon offer a cloud-based infrastructure in which one may deploy a big data
ecosystem. Cloudera provides a management stack with packaged technologies (HBase, Hive,
Impala, Hadoop, Pig scripts, Solr, YARN, etc.) that can deploy and manage a distributed
storage and computing environment. The third construct, ‘intelligence’, is related to data
mining and visualisation. These activities make use of the data being managed in the big data
ecosystem and the data services offered by vendors to derive actionable insights. The
three constructs are functionally linked to each other in that ‘big data management’ and
‘data services’ are precursors for deriving any ‘intelligence’ in the big data ecosystem.
Citation and co-citation analysis of the shortlisted articles

Citation and co-citation analyses assess the interconnectedness of papers (articles) in
a given sample through the discovery of nodes and links between nodes. There are
various applications that can be used to explore this interconnectedness. Some of these
are Gephi, VOSviewer, Bibliometrix R, CitNetExplorer, and SciMAT. We used VOSviewer for
our analysis, which is a software tool for constructing and visualising bibliometric net-
works. These networks may include journals, authors, and papers, and they can be
identified based on citations, bibliographic coupling, co-citation, or co-authorship rela-
tions. Because we wanted to understand the common research themes emerging from
bibliographic networks, we employed a co-citation analysis approach. Prior to that, we
did citation analysis to understand frequently cited authors and their specific research
topics. This provided a general picture. Next, we performed co-citation analysis to arrive
at a more detailed picture of emerging and common research themes, a step that
extended the analysis of author networks and influential research. The ultimate objective
was to identify emerging themes and influences in big data research. Results indicated
that big data research has been strongly influenced by themes in engineering and
information management. This analysis also identified major research foci within these
themes, which we present at the end of this section.
A citation analysis was performed to understand which themes are appearing often in
recent research. We retrieved metadata on individual papers from Crossref in the form of the
respective papers’ Digital Object Identifiers (DOI). Next, we activated API_KEY and then
obtained metadata on the existing Crossref DOIs. For that task, we used OpenURL, which
provides an XML representation of the metadata. The DOI files were then imported into the
VOSviewer tool for an analysis of the citations and co-citations. A citation analysis examines
the frequency, patterns, and graphs of citations in documents. It uses the pattern of citations
and links one document to another to reveal properties of the documents (Garfield 1972).
A typical aim would be to identify the most important documents in a collection. A co-
citation analysis, like bibliographic coupling, is a semantic similarity measure of documents
that makes use of citation relationships. It is the frequency with which two documents are
cited together by other documents (Small 1973).
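The two measures defined above can be sketched in a few lines of Python. The example is illustrative and independent of VOSviewer: the citing documents P1 to P3 and the cited documents A, B, and C are invented, and each citing document is represented simply by the list of documents it cites.

```python
from collections import Counter
from itertools import combinations


def citation_counts(reference_lists):
    """Number of citing documents that cite each document."""
    counts = Counter()
    for refs in reference_lists.values():
        counts.update(set(refs))
    return counts


def cocitation_counts(reference_lists):
    """Number of citing documents in which each pair of documents appears together."""
    counts = Counter()
    for refs in reference_lists.values():
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Invented data: citing document -> documents it cites.
reference_lists = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}

cited = citation_counts(reference_lists)
cocited = cocitation_counts(reference_lists)
print(cited)
print(cocited)
```

In this toy sample, A and B are co-cited by P1 and P2, and B and C by P1 and P3, so both pairs have a co-citation frequency of two, while A and C are co-cited only once.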
We performed a citation and co-citation analysis as a complement to the literature
review to enhance clarity about ongoing research themes. There have been several
studies that have adopted citation and co-citation analyses to characterise and interpret
the structure as well as the dynamics of clusters. It has been found that this method
increases the interpretability of the literature (Garfield 1972). We applied a citation and
co-citation analysis with analytic and sense-making tasks by integrating network visua-
lisation, spectral clustering, automatic cluster labelling, and text summarisation.
Automatic cluster labelling and summarisation were used to augment the interpretation
of these clusters. This method focused on interconnections between authors and cita-
tion and co-citation cluster members. The software used for the analysis was VOSviewer,
a freely available computer program that was developed for constructing and viewing
bibliometric maps.
We pursued the analysis further to understand the pattern of co-citations, hoping to
obtain further insights into specific research themes. An interpretation was performed
based on the density views of the co-citation analysis. In the density view, each point on
the map has a colour that signifies the density of items (the co-citations) measured up to
that point. Thus, each point represents a theme that has been researched by various
papers. The colour range was between red and blue; the larger the number of the items
in the neighbourhood of a point and the higher the weights of those neighbouring
items, the closer that point’s colour is to red. Conversely, the smaller the number of
items around the point and the lower the weights of those neighbouring items, the
closer the point’s colour is to blue.
The density view revealed the structure of the citation and co-citation figures. An area
with dense interconnections indicates that authors associated with this area have
received a significant number of citations, whereas authors associated with less dense
areas have received fewer citations. In this case, Manyika et al. and Chen received the
most citations, followed by Yang et al. A clear separation is also evident between the
works by Desouza/Jacob and Baumann et al. on one end and those of Yang, Zhen,
Manyika et al., Chen, and others on the other end. Clusters have formed due to similarity
of research themes. In other words, different authors have pursued similar themes in
their respective papers and this has resulted in a specific cluster. Figures 3 and 4 show
that the research themes fall mainly into two clusters. One
cluster contains Desouza/Jacob and Baumann et al., with these authors focusing on
exploring the application of big data in a specific sector. In their paper, Desouza and
Jacob (2017) explored the limitations of big data applications in the public sector.
Likewise, Baumann et al. (2015) explored how big data applications used in the earth
sciences require different tools and techniques due to the need to process large
planetary observation datasets.
In the second cluster, there are three sub-clusters. The first sub-cluster contains Yang
et al. (2015a), who explored the potential benefit of big data applications in health care.
The same sub-cluster contains Chen and Zhang (2014), who discussed several meth-
odologies for managing data deluge such as granular computing, cloud computing, bio-
inspired computing, and quantum computing. In the second sub-cluster, there are four
authors who have completed survey-based studies on big data, mainly in the fields of
clinical data warehouses, emerging information technologies, and semantic information
retrieval. In the third sub-cluster, there is only one paper. This paper’s theme is different
from all others, as it focuses on a scalable software platform for the smart grid cyber-
physical system using cloud technologies (Simmhan et al. 2013).
The co-citation analysis showed Brynjolfsson et al. and Baesens as the most co-cited
authors. The links were also found to be strong, as indicated in Tables 8, 9, 10, and 11.
Analyses of Tables 9 and 11 led us to conclude that there was substantial co-citation in the
relatively short period of three years. It can also be inferred from Figure 3 that big data
research has appeared in most of the major journals that are part of this study. Furthermore,
analysis by journals (Figure 4) shows strong relationships across different journals and that
research on big data is ubiquitous across disciplines (e.g. accounting, finance, law, manage-
ment, nature, etc.).
Table 8. Threshold criteria for citation by authors.

Minimum no. of documents of author    Authors meeting threshold
5 31
6 24
7 20
8 16
9 12
10 11
11 11
12 9
Table 9. Citation by authors.

Selected Author Documents Citations
✓ Baesens, Bart 38 367
✓ Brynjolfsson, Erik 19 1431
✓ Demirkan, Haluk 20 256
✓ Ghose, Anindya 20 403
✓ Kallinikos, Jannis 16 287
✓ Ketter, Wolfgang 11 41
✓ Sahay, Sundeep 20 276
✓ Turel, Ofir 20 92
✓ Vanthienen, Jan 14 21
✓ Verbeke, Wouter 11 78
✓ Zaharia, Matei 12 271
Table 10. Threshold criteria for co-citation by authors.

Minimum no. of citations of author Authors meeting threshold
20 5
18 6
17 7
16 8
15 12
14 14
Table 11. Co-citation by authors.

Selected Author Citations Total link strength
✓ Jacob 18 252
✓ Sullivan 16 240
✓ Wang 22 133
✓ Liu 21 131
✓ Zhang 22 128
✓ Chen 31 85
✓ Li 17 52
✓ Yang 15 17
✓ Meyer 15 13
✓ Brown 15 5
✓ J Kallinikos 20 0
✓ M. Dorigo 15 0
Figure 3. Density view of co-citation by authors.
We observe in Figure 4 that three prominent clusters emerged on the map. The
largest cluster contains public policy, economics, organisation science, and law, which
can be seen in red on the left portion of the map. The most frequently occurring terms
in this cluster include technology management, business policy, economics, law, opi-
nion, public policy, organisation, and several others. The most interesting aspect of this
Figure 4. Density view of the Co-citation by papers.
cluster is its interdisciplinary nature. The red cluster bridges technology with economics,
organisational issues, law, and business policy, and it also contains many terms related
to basic science research. Prominent journals in this cluster include the Harvard Business
Review, Communications of the ACM, Harvard Law Review, American Economic Review,
Accounting and Business Research, and Organization Science. Interestingly, journals like
Science and Nature also appear in this cluster, though they are on the extreme side. This
indicates that many research issues in this cluster have origins in or are interlinked with
the basic sciences. This is relevant considering big data research has origins in comput-
ing science. This cluster is the most widely dispersed, with terms scattered among
economics, law, computer science, and public policy. Terms that are more often asso-
ciated with the basic sciences (from journals like Science and Nature) are found on the
extreme side of this cluster. In this case, terms such as software development, program-
ming, coding, and project management intermingle with other terms in the cross-
section with the second (green) cluster.
The second cluster (green) is on the right portion of the map. It is the next largest
cluster and includes terms related to the scholarship of technology and information
sciences. The most frequently occurring journals in this cluster are MIS Quarterly, Journal
of Information Technology, The Information Society, and Strategic Management Journal,
among others. The most frequently occurring terms include information technology,
technology strategy, big data information literacy, database management systems, and
analytics. It is interesting to note the overlap between this cluster and the management
science cluster (blue). The term ‘information management’ spans the boundary between
these two clusters.
Of the three present clusters, the smallest cluster is related to management science (blue)
and is spread across the upper portion of the map. Journals like MIT Sloan Management
Review, Academy of Management Review, and Accounting, Organizations and Society are
represented in this cluster. The most frequently occurring terms here relate to management
paradigms, administration, and people. The papers belonging to this cluster frequently
mention terms related to management of technology, information technology adoption,
employee engagement, return on investment, and business value. These terms span the
cross-section of information technology and management science.
The most interesting feature of the information technology cluster (green) is where it
intersects with other clusters. There is significant overlap between big data and other
areas, as information technology is an interdisciplinary field. At the intersection of the
information technology cluster with the management science cluster (blue), we find
terms associated with management research, such as information technology develop-
ment, adoption, and management. Additionally, terms such as data management,
compliance, data integration, employee engagement, and return on investment are
found on the edges of the green, blue, and red clusters. These terms are associated
with technology and business-related research. We also observe that, in all three
clusters, there are several terms that indicate considerable use of social media and
surveys as data collection methods.
We observed clearly articulated research themes emanating from the literature. One
theme was related to the engineering side of big data. Within this theme, we observed
sub-themes like exascale computing, Apache Hadoop, and unified engines for big data
processing. Parallel to the engineering theme, another research theme centred on
information management. In this context, surveillance capitalism, information civilisation,
data management, analytics, and the adoption of big data were key research sub-topics.
Figure 3 also showed that research themes permeated business sectors
(e.g. accounting, healthcare, media, urban planning, and corporate finance). Researchers
are evidently addressing challenges not only within engineering and information
management but also investigating the implications of these challenges across
business sectors. We also observed a growing research interest in machine learning and
predictive analytics, as highlighted in seminal studies (Zuboff 2015; Reed and Dongarra
2015; Yang et al. 2015a). We expect an increasing research focus on predictive analytics,
high-performance computing, and machine learning, as these are key research sub-
themes. The papers in these sub-themes highlight the analytical and computing chal-
lenges faced by businesses.
Discussion
Contributions to theory
The current manuscript contributes to the literature on big data and extends research
papers and reviews in the area of big data (Zaharia et al. 2016; Reed and Dongarra 2015;
Davenport 2013; Constantiou and Kallinikos 2015; Babiceanu and Seker 2016; Yang et al.
2015a, 2015b; Baesens et al. 2014; Metcalf and Crawford 2016; Phillips-Wren et al. 2015).
It does this in the following ways. First, it adds to a systematic literature review of big
data by proposing and applying PCA and citation techniques (in addition to a co-citation
analysis) to compare different studies. Second, through PCA, our analysis identifies
three themes or components in big data research. The first theme captures
a component that could be termed ‘big data management’. Likewise, the second and
third components relate to the constructs ‘data services’ and ‘intelligence’, respectively.
The construct ‘big data management’ describes the management and execution of
engineering activities and the deployment of technologies for the same. The second
component, ‘data services’, relates to data cleaning and service providers who offer such
services. The third component relates to services for analytics. Third, the analysis also
illustrates the relationships between the components and argues that a better concep-
tualisation and use of techniques will result in better applications of big data. Therefore,
future research should consider these components.
Fourth, through citation and co-citation, our study found a difference between the
works of Desouza and Jacob and Baumann et al. on one end and the research of Yang, Zhen,
Manyika et al., Chen, and others on the other end (Desouza and Jacob 2017; Baumann
et al. 2015; Yang et al. 2015a; Manyika et al. 2011; Chen and Zhang
2014). Clusters formed because of similarities or differences between research themes of
authors in their respective papers. Fifth, through citation and co-citation analysis, two
main clusters were identified with separate research themes. Analysing the journals
showed a strong relationship across different journals and also indicated that research
on big data is ubiquitous across disciplines (e.g. accounting, finance, law, management
of nature, management, etc.).
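The co-citation counting that underlies such clusters can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure or tooling used in the study: the function name `cocitation_counts`, the citing papers, the cited-work labels (`E1`, `M1`, and so on) and the threshold of two co-citations are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of works appears in the same reference list.

    reference_lists maps a citing paper to the works it cites.
    Two works are co-cited once for every citing paper that cites both.
    """
    counts = Counter()
    for cited in reference_lists.values():
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical data: E* stands for engineering-themed works,
# M* for information-management-themed works.
reference_lists = {
    "paper_A": ["M1", "M2", "M3"],
    "paper_B": ["M1", "M2"],
    "paper_C": ["E1", "E2"],
    "paper_D": ["E1", "E2", "M1"],
}

counts = cocitation_counts(reference_lists)

# Keep only pairs that meet an assumed minimum co-citation threshold.
threshold = 2
strong_pairs = {pair: n for pair, n in counts.items() if n >= threshold}
print(strong_pairs)
```

Pairs that survive the threshold are the links from which a co-citation map is drawn; works that are frequently co-cited end up close together, which is how thematic clusters become visible.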
Big data have drawn significant attention from researchers, and the research is still in
a growth phase. Therefore, there is a need to continue the type of research exemplified
by the current study. We also observed that machine learning and predictive analytics
are being increasingly discussed, as demonstrated by seminal studies (Zuboff 2015; Reed
and Dongarra 2015; Yang et al. 2015a). For future research, it would be useful to adopt
other methods for this type of analysis and to observe the results.
Contributions to managerial practice

The above discussion has implications for the three critical areas identified as components.
It also highlights the general industry interest in data initiatives. In this
paper, we can crystallise a conceptual framework that emerged from the literature
analysis and the findings. We can gauge three components (or themes) as a result of
the PCA that was performed on the keyword data. These components – ‘big data
management’, ‘data services’, and ‘intelligence’ – capture qualities that can be classified
as specific constructs. The components are indicative of industry aspirations, as well.
Companies are grappling with certain business issues connected to big data. We discuss
a few of these issues as they relate to this study’s findings. This is not an exhaustive list
of the issues; however, the current study’s findings did identify that these issues are
critical ones. First, companies intend to understand which component must receive the
most attention. Second, it is critical for companies to understand which functional area
of business must receive the largest share of investments. The findings suggest that ‘big
data management’ is an area that demands more attention as it involves data prepara-
tion, and data preparation requires the collection of structured, semi-structured, unstruc-
tured, and streaming data. Data preparation is akin to a set of engineering activities that
use various technologies to move data, for example, from a SQL type environment into
Hadoop (or streaming data into Hadoop).
Another business issue relates to relative focus. The question arises – should the
relative focus on any of these components be a function of a company’s market niche
and its competitive focus? Although it is reasonable to expect factor 3 (‘intelligence’)
to load on analytics, the findings show that the factor ‘big data management’
had a higher loading on analytics than did ‘intelligence’.
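How component loadings of this kind are obtained can be sketched as follows. This is a minimal illustration, assuming synthetic data rather than the study's keyword data. The function name `pca_loadings`, the three latent factors and the 0.5 display cut-off are assumptions for illustration, and the sketch computes an unrotated solution only (no rotation step is applied).

```python
import numpy as np


def pca_loadings(data, n_components=3, suppress_below=0.5):
    """PCA on the correlation matrix of `data` (samples in rows).

    Returns the loadings, a copy with small loadings blanked to 0.0,
    and the share of total variance explained by each component.
    """
    corr = np.corrcoef(data, rowvar=False)       # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # A loading is the correlation between a variable and a component.
    loadings = eigvecs * np.sqrt(eigvals)
    explained = eigvals / corr.shape[0]
    shown = np.where(np.abs(loadings) >= suppress_below, loadings, 0.0)
    return loadings, shown, explained


# Synthetic example: 10 variables driven by 3 hypothetical latent factors
# (5, 3 and 2 variables per factor) plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
data = np.repeat(latent, [5, 3, 2], axis=1) + 0.3 * rng.normal(size=(500, 10))

loadings, shown, explained = pca_loadings(data)
print(np.round(shown, 2))
print(np.round(explained, 2))
```

Reading across a row of the loading matrix shows which component a variable is most strongly associated with, which is the comparison made above between ‘big data management’ and ‘intelligence’ for analytics. The sign of a component is arbitrary, so only the absolute size of a loading is meaningful.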
These findings shed light on the two other business pain points that companies
face – which type of big data services do companies need? Do these services fall more
on the data management side or on the analytics and visualisation side? One plausible
explanation is that the scope of analytics is greater than visualisation or data mining. For
example, exploratory, descriptive, and inferential analytics are supported by big data
ecosystem technologies. The second component, ‘data services’, entails functional ser-
vices related to data cleaning. Companies with limited data readiness or companies that
buy data from third-party service providers would need big data services. Once they
extend beyond this stage, such companies would be more effective in using ‘intelli-
gence’ and thereby analytics. The three components are functionally linked to each
other in that ‘big data management’ and ‘data services’ are precursors for deriving any
‘intelligence’ in the big data ecosystem.
Directions for future research

Future research that continues to build on these business pain points can significantly
contribute to both theory and practice. Data have become easier to collect, store,
extract, and analyse. Industries will increasingly strive to build new big data business
models to deliver innovative products and services to customers. As evidenced by the
recent increase in high-quality research, multiple perspectives are being utilised to study
big data technologies. These perspectives will have significant impacts on our worldly
knowledge and on technological developments. The three components identified in this
study have research implications for the future. First, there is a strong undercurrent of
research centred on data services, and the current study foresees that research will
increasingly focus on ways to monetise big data (i.e. leveraging big data to increase
business value). An interesting implication of this study relates to the conceptualisation
of big data as a form of reconnaissance to gather digital data about consumers and their
lifestyles for the purpose of commerce. The ‘intelligence’ component discovered in the
findings would be helpful in this aspect. Other methods (such as cluster analysis and
Multi-Dimensional Scaling) could also be employed in future studies to further an
understanding of the intellectual structure of big data research. This would add new
dimensions and perspectives to an understanding of big data’s intellectual foundations.
It would be interesting to investigate approaches related to commercialising data-based
business models. There is also a need to pursue empirical studies to discover how to use
the results of analytical models to make business decisions. We also predict opportu-
nities for research in purely technical areas, such as ubiquitous information and integra-
tion in big data ecosystems. There are also opportunities in managerial applications,
such as regulatory compliance on data privacy. Clearly, big data is undergoing
a paradigm shift, as evidenced by the high-quality research that has recently been
conducted. As civilisation moves into the future, this research will have a significant impact
on our worldly knowledge.
Limitations of the study

This study has certain limitations, which we summarise below.
We shortlisted papers by citation count before the review and other analyses were
performed. It is possible that a different shortlisting approach would yield
a different result. Future research could investigate this issue.
This research was based on a literature review of big data studies that were identified
with particular keywords. It is therefore possible that other keywords might have
yielded different results.
We adopted PCA and citation and co-citation analyses to identify the underlying
research themes and paper clusters. Other methods could be used for such an analysis,
a limitation that could be overcome by future research. Such research could offer new
dimensions and perspectives in the understanding of big data’s intellectual foundations.
The papers that had been published in journals and presented at conferences were
not classified in this study, as the objective was different. However, such paper classifi-
cation would shed light on the sub-themes present in recent research.
Conclusions
In this study, we presented a review of the recent big data research. We analysed the
extant research in three ways – a literature review, a PCA, and a bibliometric analysis – and
discussed the components that emerged from the PCA. The findings show that extant
research is centred on how big data improves the businesses that acquire it, manage it,
and derive new insights from it through data analytics. The PCA identified three major
components (or themes) – ‘big data management’, ‘data services’, and ‘intelligence’ –
which each capture qualities that can be classified as specific constructs. The three
components are functionally linked to each other in that ‘big data management’ and
‘data services’ are precursors to ‘intelligence’ in the big data ecosystem. We probed the
issue further by investigating the interconnectedness of papers to identify the
overlapping research themes emerging from the bibliographic networks.
The citation and co-citation analyses showed that big data research
has been strongly influenced by themes in engineering and information management. It
was also found that the research themes are spread across various business sectors.
Interestingly, machine learning and predictive analytics have been increasingly dis-
cussed as analytical tools to harvest data. The bibliometric analysis demonstrated that
predictive accuracy, robust data analytics, and high-performance computing were also
key research sub-themes. We invite future research on this topic to further develop an
understanding of big data.
Acknowledgments
The author is grateful to the Editor-in-Chief and anonymous referees whose valuable comments
and suggestions substantially helped improve this article.
Disclosure statement
No potential conflict of interest was reported by the author.
ORCID
Nitin Singh http://orcid.org/0000-0002-9003-3310
References
Ahmad, F., and A. Sarkar. 2016. “QaaS (Quality as a Service) Model for Web Services Using Big Data
Technologies.” Enterprise Information Systems 11 (9): 1352–1373.
Babiceanu, R. F., and R. Seker. 2016. “Big Data and Virtualization for Manufacturing Cyber-Physical
Systems: A Survey of the Current Status and Future Outlook.” Computers in Industry 81: 128–137.
doi:10.1016/j.compind.2016.02.004.
Back, B.-H., and H. Il-Kyu. (2017). “A Platform for Supporting Automatic Data Storing and
Visualization of Public and Private Big Data.” ACM International Conference on Big Data
Research. Osaka, Japan. 12–17. doi: 10.2460/ajvr.78.1.12.
Baesens, B., R. Bapna, J. R. Marsden, J. Vanthienen, and J. L. Zhao. 2014. “Transformational Issues of
Big Data and Analytics in Networked Business.” MIS Quarterly 38 (2): 629–631.
Baesens, B., S. De Winne, and L. Sels. 2017. “Is Your Company Ready for HR Analytics?” MIT Sloan
Management Review 58 (2): 20.
Baughman, A. K., B. Richard, B. Harrison, B. O’Connell, H. Pearthree, F. Brandon, C. McAvoy, S. Sun,
and C. Upton. 2016. “IBM Predicts Cloud Computing Demand for Sports Tournaments.”
Interfaces 46 (1): 33–48. doi:10.1287/inte.2015.0820.
Baumann, P. (2017). “Standardizing Big Earth Datacubes.” IEEE International Conference on Big
Data. Boston, USA. 67–73.
Baumann, P., P. Mazzetti, J. Ungar, R. Barbera, B. Barboni, A. Beccati, L. Bigagli, et al. 2015. “Big Data
Analytics for Earth Sciences: The EarthServer Approach.” International Journal of Digital Earth 9
(1): 3–29. doi:10.1080/17538947.2014.1003106.
Benbernou, S., and M. Ouziri (2017). “Enhancing Data Quality by Cleaning Inconsistent Big RDF
Data.” IEEE International Conference on Big Data, 74–79. doi: 10.1186/s12912-017-0268-5.
Bhimani, A. 2015. “Exploring Big Data’s Strategic Consequences.” Journal of Information Technology
30 (1): 66–69. doi:10.1057/jit.2014.29.
Bichler, M., A. Heinzl, and W. M. van der Aalst. 2017. “Business Analytics and Data Science: Once
Again?” Business & Information Systems Engineering 59 (2): 77–79. doi:10.1007/s12599-016-0461-1.
Brynjolfsson, E., T. Geva, and S. Reichman. 2015. “Crowd-Squared: Amplifying the Predictive Power
of Search Trend Data.” MIS Quarterly 40 (4): 941–961. doi:10.25300/MISQ/2016/40.4.07.
CACM staff. 2017. “Big Data.” Communications of the ACM 60 (6): 24–25. doi:10.1145/3079064.
Chai, S., and W. Shih. 2017. “Why Big Data Isn’t Enough.” MIT Sloan Management Review 58 (2): 57.
Chen, P., and C. Zhang. 2014. “Data-Intensive Applications, Challenges, Techniques and
Technologies: A Survey on Big Data.” Information Sciences 275: 314–347. doi:10.1016/j.
ins.2014.01.015.
Chun Kit, N. G., W. Chun Ho, Y. Kai Leung, I. Wai Hung, and T. Cheung. 2018. “A Semantic Similarity
Analysis of Internet of Things.” Enterprise Information Systems 12 (7): 820–855. doi:10.1080/
17517575.2018.1464666.
Constantiou, I. D., and J. Kallinikos. 2015. “New Games, New Rules: Big Data and the Changing
Context of Strategy.” Journal of Information Technology 30 (1): 44–57. doi:10.1057/jit.2014.17.
Cross, R. 2015. Principal Component Analysis Handbook. Clanrye International.
Crossref. https://www.crossref.org; Accessed on Dec 2018.
Da Xu, L., and L. Duan. 2018. “Big Data for Cyber Physical Systems in Industry 4.0: A Survey.”
Enterprise Information Systems 13 (2): 148–169.
Date, S. 2016. “Should You Upload or Ship Big Data to the Cloud?” Communications of the ACM 59
(7): 44–51. doi:10.1145/2963119.
Davenport, T. H. 2013. “Analytics 3.0.” Harvard Business Review. December.
de Almeida, D. C. P., and J. Bernardino (2015). “Big Data Open Source Platforms.” IEEE International
Congress on Big Data, New York, USA. 268–275.
Demirkan, H., C. Bess, J. Spohrer, A. Rayes, D. Allen, and Y. Moghaddam. 2015. “Innovations with
Smart Service Systems: Analytics, Big Data, Cognitive Assistance, and the Internet of
Everything.” Communications of the AIS 37: 35.
Desouza, K., and B. Jacob. 2017. “Big Data in the Public Sector: Lessons for Practitioners and
Scholars.” Administration & Society 49 (7): 1043–1064. doi:10.1177/0095399714555751.
Elshater, Y., P. Martin, D. Rope, M. McRoberts, and C. Statchuk (2015). “A Study of Data Locality in
YARN.” IEEE International Congress on Big Data, New York, USA. 174–181.
Emmanuel, I., and C. Stanier (2016). “Defining Big Data.” International Conference on Big Data and
Advanced Wireless Technologies, Blagoevgrad, Bulgaria. Article No.: 5.
ERA. portal.core.edu.au/conf-ranks/; Accessed on Dec 2018.
Fitzgerald, M. 2015. Enhancing Intuition with Analytics at General Mills. Massachusetts Institute of
Technology: MIT Sloan Management Review.
Garfield, E. 1972. “Citation Analysis as a Tool in Journal Evaluation.” Science 178 (4060): 471–479.
Garfield, E. 1979. “Is Citation Analysis a Legitimate Evaluation Tool?” Scientometrics 1 (4): 359–375.
doi:10.1007/BF02019306.
Gartner. (2017). “Hype Cycle for Data Management.” www.Gartner.com
Gates, M., H. Anzt, J. Kurzak, and J. Dongarra 2015. “Accelerating Collaborative Filtering Using
Concepts from High Performance Computing.” IEEE International Conference on Big Data, Santa
Clara, CA.
Ghose, A., and V. Todri. 2015. “Towards a Digital Attribution Model: Measuring the Impact of
Display Advertising on Online Consumer Behavior.” MIS Quarterly 40 (4): 889–910. doi:10.25300/
MISQ/2016/40.4.05.
Ghosh, J. 2016. “Big Data Analytics: A Field of Opportunities for Information Systems and
Technology Researchers.” Journal of Global Information Technology Management 19 (4):
217–222. doi:10.1080/1097198X.2016.1249667.
Gupta, B., M. Goul, and B. Dinter. 2015. “Business Intelligence and Big Data in Higher Education:
Status of a Multi-Year Model Curriculum Development Effort for Business School
Undergraduates, MS Graduates, and MBAs.” Communications of the AIS 36: 23.
Holden, G. 2016. “Big Data and R&D Management.” Research-Technology Management 59 (5):
22–26. doi:10.1080/08956308.2016.1208044.
Holmes, A. 2014. Hadoop in Practice. 2nd ed. New Delhi: Dreamtech Press.
Hong, J., L. Li, C. Han, B. Jin, Q. Yang, and Z. Yang (2016). “Optimizing Hadoop Framework for Solid
State Drives.” IEEE International Congress on Big Data, New York, USA. 9–17. doi: 10.1167/
tvst.5.6.9.
IDC. (2017). “Double-Digit Growth Forecast for the Worldwide Big Data and Business Analytics
Market through 2020 Led by Banking and Manufacturing Investments.” https://www.idc.com/
getdoc.jsp?containerId=prUS41826116
Itoh, M., D. Yokoyama, M. Toyoda, and M. Kitsuregawa (2015). “Visual Interface for Exploring
Caution Spots from Vehicle Recorder Big Data.” IEEE International Conference on Big Data.
Jolliffe, I. T. 2002. Principal Component Analysis. New York, NY: Springer.
Kaiser, H. F. 1974. “An Index of Factorial Simplicity.” Psychometrika 39: 31–36. doi:10.1007/
BF02291575.
Kalantari, A., A. Kamsin, H. S. Kamaruddin, N. A. Ebrahim, A. Gani, A. Ebrahimi, and S. Shamshirband.
2017. “A Bibliometric Approach to Tracking Big Data Research Trends.” Journal of Big Data 4 (1):
30. doi:10.1186/s40537-017-0088-1.
Kallinikos, J., and I. D. Constantiou. 2015. “Big Data Revisited: A Rejoinder.” Journal of Information
Technology 30 (1): 70–74. doi:10.1057/jit.2014.36.
Ketter, W., M. Peters, J. Collins, and A. Gupta. 2015. “Competitive Benchmarking: An IS Research
Approach to Address Wicked Problems with Big Data and Analytics.” MIS Quarterly 40 (4):
1057–1080. doi:10.25300/MISQ/2016/40.4.12.
King, J. L. 2015. “Humans in Computing: Growing Responsibilities for Researchers.” Communications
of the ACM 58 (3): 31–33. doi:10.1145/2739250.
Kiron, D. 2017. “Lessons from Becoming a Data-Driven Organization.” MIT Sloan Management
Review 58 (2). https://sloanreview.mit.edu/case-study/lessons-from-becoming-a-data-driven-
organization/.
Kugler, L. 2016. “What Happens When Big Data Blunders?” Communications of the ACM 59 (6):
15–16. doi:10.1145/2942427.
Madhavji, N. H., A. Miranskyy, and K. Kontogiannis (2015). “Big Picture of Big Data Software
Engineering: With Example Research Challenges.” IEEE/ACM 1st International Workshop on
Big Data Software Engineering, Florence, Italy. 11–14.
Mani, D., G. Shmueli, and I. Yahav. 2015. “A Tree-Based Approach for Addressing Self-Selection in
Impact Studies with Big Data.” MIS Quarterly 40 (4): 819–848.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., Byers, A.H. 2011. Big Data: The
Next Frontier for Innovation, Competition, and Productivity. McKinsey & Company.
Markham, S. K., M. Kowolenko, and T. L. Michaelis. 2015. “Unstructured Text Analytics to Support
New Product Development Decisions.” Research-Technology Management 58 (2): 30–39.
Markus, M. L. 2015. “New Games, New Rules, New Scoreboards: The Potential Consequences of Big
Data.” Journal of Information Technology 30 (1): 58–59. doi:10.1057/jit.2014.28.
Martens, D., F. Provost, J. Clark, and E. J. de Fortuny. 2016. “Mining Massive Fine-Grained Behavior
Data to Improve Predictive Analytics.” MIS Quarterly 40 (4): 869–888. doi:10.25300/MISQ.
Menon, S., and S. Sarkar. 2016. “Privacy and Big Data: Scalable Approaches to Sanitize Large
Transactional Databases for Sharing.” MIS Quarterly 40 (4): 963–981. doi:10.25300/MISQ.
Metcalf, J. 2016. “Big Data Analytics and Revision of the Common Rule.” Communications of the
ACM 59 (7): 31–33. doi:10.1145/2963119.
Metcalf, J., and K. Crawford. 2016. “Where are Human Subjects in Big Data Research? the Emerging
Ethics Divide.” Big Data & Society 3 (1): 1–14. doi:10.1177/2053951716650211.
Mishra, D., Z. Luo, S. Jiang, T. Papadopoulos, and R. Dubey. 2017. “A Bibliographic Study on Big
Data: Concepts, Trends and Challenges.” Business Process Management Journal 23 (3): 555–573.
doi:10.1108/BPMJ-10-2015-0149.
Mohamed, A. T., C. L. Fernando, and D. Ho (2015). “Software Analytics to Software Practice:
A Systematic Literature Review.” IEEE/ACM 1st International Workshop on Big Data Software
Engineering, Florence, Italy, 30–36.
Mohan, A., M. Ebrahimi, S. Lu, and A. Kotov (2016). “A NoSQL Data Model for Scalable Big Data
Workflow Execution.” IEEE International Congress on Big Data, New York, USA. 52–59.
Nadal, S., V. Herrero, O. Romero, A. Abell, X. Franch, S. Vansummeren, and D. Valerio. 2017.
“A Software Reference Architecture for Semantic-Aware Big Data Systems.” Information and
Software Technology 90: 75–92. doi:10.1016/j.infsof.2017.06.001.
Nair, R. 2015. “Big Data Needs Approximate Computing: Technical Perspective.” Communications of
the ACM 58 (1): 104. doi:10.1145/2688072.
Osvaldo, S. S., Jr, D. Lopes, A. C. Silva, and Z. Abdelouahab. 2017. “Developing Software Systems to
Big Data Platform Based on MapReduce Model: An Approach Based on Model Driven
Engineering.” Information and Software Technology 92: 30–48. doi:10.1016/j.infsof.2017.07.006.
Palvia, P., E. Mao, A. F. Salam, and K. S. Soliman. 2003. “Management Information Systems
Research: What’s There in a Methodology?” Communications of the ACM 11 (1): 288–310.
Palvia, P., P. Pinjani, and E. H. Sibley. 2007. “A Profile of Information Systems Research Published in
Information & Management.” Information & Management 44 (1): 1–11. doi:10.1016/j.
im.2006.10.002.
Pham, C. (2016). “Internet-of-Thing and Reasons Why It Is Becoming a Reality.” International
Conference on Big Data and Advanced Wireless Technologies. Blagoevgrad, Bulgaria. Article No.: 1.
Phillips-Wren, G. E., L. S. Iyer, U. R. Kulkarni, and T. Ariyachandra. 2015. “Business Analytics in the
Context of Big Data: A Roadmap for Research.” Communications of the AIS 37 (23): 448–472.
Pilkington, A., and J. Meredith. 2009. “The Evolution of the Intellectual Structure of Operations
Management - 1980-2006; a Citation/Co-Citation Analysis.” Journal of Operations Management
27: 185–202. doi:10.1016/j.jom.2008.08.001.
Qazi, R. U. R., and A. Sher. 2016. “Big Data Applications in Businesses: An Overview.” The
International Technology Management Review 6 (2): 50–63. doi:10.2991/itmr.2016.6.2.3.
Ransbotham, S., D. Kiron, and P. K. Prentice. 2015. “The Talent Dividend.” MIT Sloan Management
Review 56 (4): 1.
Reed, D. A., and J. Dongarra. 2015. “Exascale Computing and Big Data.” Communications of the
ACM 58 (7): 56–68. doi:10.1145/2797100.
Saboo, A. R., V. Kumar, and I. Park. 2016. “Using Big Data to Model Time-Varying Effects for Marketing
Resource (Re) Allocation.” MIS Quarterly 40 (4): 911–939. doi:10.25300/MISQ/2016/40.4.06.
Sahay, S. 2016. “Big Data and Public Health: Challenges and Opportunities for Low and Middle
Income Countries.” Communications of the AIS 39: 20.
Seref, B., and E. Bostanci (2016). “Opportunities, Threats and Future Directions in Big Data for
Medical Wearables.” International Conference on Big Data and Advanced Wireless Technologies,
Blagoevgrad, Bulgaria. Article No.: 15.
Shiau, W.-L. 2016. “The Intellectual Core of Enterprise Information Systems: A Co-Citation Analysis.”
Enterprise Information Systems 10 (8): 815–844. doi:10.1080/17517575.2015.1019570.
Shim, J. P., J. Koh, S. Fister, and H. Y. Seo. 2016. “Phonetic Analytics Technology and Big Data:
Real-World Cases.” Communications of the ACM 59 (2): 84–90. doi:10.1145/2886013.
Simmhan, Y., S. Aman, A. Alok Kumbhare, and L. Rongyang. 2013. “Cloud-Based Software Platform
for Big Data Analytics in Smart Grids.” Computing in Science & Engineering 15 (4): 38–47.
doi:10.1109/MCSE.2013.39.
Small, H. 1973. “Co-Citation in the Scientific Literature: A New Measure of the Relationship
between Two Documents.” Journal of the American Society for Information Science 24 (4):
265–269. doi:10.1002/(ISSN)1097-4571.
Turel, O., and B. Kapoor. 2016. “A Business Analytics Maturity Perspective on the Gap between
Business Schools and Presumed Industry Needs.” Communications of the AIS 39: 6.
Villanustre, F. (2015). “Industrial Big Data Analytics: Lessons from the Trenches.” IEEE/ACM 1st
International Workshop on Big Data Software Engineering, Florence, Italy, 1–3.
Wang, N., H. Liang, Y. Jia, S. Ge, Y. Xue, and Z. Wang. 2016. “Cloud Computing Research in the IS
Discipline: A Citation/Co-Citation Analysis.” Decision Support Systems 86 (C): 35–47. doi:10.1016/j.
dss.2016.03.006.
Winig, L. 2017. “A Data-Driven Approach to Customer Relationships: A Case Study of Nedbank’s
Data Practices in South Africa.” MIT Sloan Management Review 58 (2).
Woerner, S. L., and B. H. Wixom. 2015. “Big Data: Extending the Business Strategy Toolbox.” Journal
of Information Technology 30 (1): 60–62. doi:10.1057/jit.2014.31.
Yan, Z. (2017 Oct). “A Method of Related Parameters Combinatorial Optimization of Large Data
Platform Based on MapReduce.” ACM International Conference on Big Data Research. Osaka,
Japan. 18–25.
Yang, J.-J., J. Li, J. Mulder, Y. Wang, S. Chen, H. Wu, Q. Wang, and H. Pan. 2015a. “Emerging
Information Technologies for Enhanced Healthcare.” Computers in Industry 69: 3–11.
doi:10.1016/j.compind.2015.01.012.
Yang, X., N. Liu, B. Feng, X.-H. Sun, and S. Zhou (2015b). “PortHadoop: Support Direct HPC Data
Processing in Hadoop.” IEEE Conference on Big Data, Santa Clara, CA.
Yoo, Y. 2015. “It Is Not about Size: A Further Thought on Big Data.” Journal of Information
Technology 30 (1): 63–65. doi:10.1057/jit.2014.30.
Zaharia, M., R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, et al. 2016. “Apache Spark:
A Unified Engine for Big Data Processing.” Communications of the ACM 59 (11): 56–65.
doi:10.1145/2934664.
Zuboff, S. 2015. “Big Other: Surveillance Capitalism and the Prospects of an Information
Civilization.” Journal of Information Technology 30 (1): 75–89. doi:10.1057/jit.2015.5.

CONTACT Nitin Singh nitin.singh@iimkashipur.ac.in IIM, Kashipur, India