
A SYSTEMATIC LITERATURE REVIEW WITH BIBLIOMETRIC ANALYSIS OF DATA CATALOG FROM PERIOD 1970 TO 2021

BOUFASSIL Asmae
Data science team and competitive intelligence (DSCI)
ENSAH / UAE
Al Hoceima, Morocco
boufassil@gmail.com

EL HADDADI Anass
Data science team and competitive intelligence (DSCI)
ENSAH / UAE
Al Hoceima, Morocco
anass.elhaddadi@gmail.com

Abstract—Handling large amounts of data to analyze certain behaviors gained great popularity over the period 1970-2021. This phenomenon has been called the data catalog. The goal of this study is to examine the scientific output on the data catalog in the Scopus database. A bibliometric analysis of a sample of 332 scientific documents was carried out. Among the findings, the rise in publications and the ranking of particular journals, countries, authors, and keywords as references in the subject matter stand out. Finally, possible explanations for the study’s findings are presented, along with ideas for further investigation.
Index Terms—data catalog, bibliometric, systematic literature review, co-occurrence, co-citation.

I. INTRODUCTION
The term “data catalog” is generally understood to mean a centralized location for data management where data cataloging and metadata management are combined. It not only provides information that helps data users locate and understand data, but also automates metadata management and makes it collaborative.
Several authors have attempted to define the “data catalog”. For example, Gartner describes it as follows: “A data catalog maintains an inventory of data assets through the discovery, description, and organization of datasets. The catalog provides context to enable data analysts, data scientists, data stewards, and other data consumers to find and understand a relevant dataset for the purpose of extracting business value.” In 2020, Labadie et al. gave the definition: “Data catalog maintains an inventory of data assets through the discovery, description, and organization of datasets” (Labadie et al. 2020).
More specifically, data catalogs are described as tools to centrally “collect, create, and maintain metadata”, allowing for easier findability and accessibility (Ehrlinger et al. 2021).
An effective analysis of the data catalog helps those managing data in the age of big data, data lakes, and self-service; for modern data-driven businesses, data catalogs are at the center of their data journey. Data catalogs greatly contribute to the success of any data analysis process. Simply put, just as a library catalog helps readers find their favorite books, a data catalog provides an overview of all business data.
The objective of this study is therefore to analyze the scientific output, understood as the articles published on the data catalog and indexed in Scopus.
Consequently, the following research questions were identified:
QR1: What is the state of scientific production over time?
QR2: Which journals and countries concentrate the greatest scientific production on the data catalog?
QR3: Which articles have the greatest impact in the area of the data catalog?
QR4: What are the main lines of research in this field that are derived from the keywords of scientific articles?
QR5: What are the key concepts related to the research field?
The remainder of this study is structured as follows. Section II describes the methodology. Section III presents the bibliometric analysis and findings. The final section concludes with a critical reflection.

II. MATERIALS AND METHODS
A bibliometric analysis methodology is used in this study. Following the guidelines and criteria of bibliometrics, the search keyword for this study is “data catalog”; the scientific production was therefore collected in article format for the period from 1970 to 2021. The data were drawn from the Elsevier database known as Scopus (2004 core collection), a selective index of good-quality publications.

A. Systematic literature review
A literature review aims to collect and review the literature to identify potential research gaps and uncover knowledge limitations (Tranfield et al., 2003). Structured literature searches are typically accomplished through an iterative loop of defining appropriate search terms, retrieving literature, and completing the analysis (Saunders et al., 2009). Rowley and Slack (2004) recommend a structured approach to finding resources, designing mind maps to create literature reviews, writing research, and creating bibliographies. In this study, we used a systematic literature review approach (Tranfield et al., 2003) together with bibliometric analysis to conduct a comprehensive assessment of the field, with the aim of identifying the most influential studies and authors, as well as current research areas, insights, and research interests.

B. Bibliometric analysis method
Bibliometrics provides systematic methods and tools to analyse and quantify the dissemination and relationship of scientific publications (Wallin 2005, De Bellis 2009, Cadez 2013, Sweileh et al. 2017, Van Eck and Waltman 2017). This helps new researchers, librarians, administrators, and decision makers gain a high-level view and foresight of all related aspects.
Bibliometric analysis is a method for conducting literature reviews that relies on statistical and quantitative data about published research (Broadus 1987). Bibliometrics is “more impartial and trustworthy” than other methods of literature review (Aria and Cuccurullo 2017). Bibliometric reviews are “systematic, transparent, and reproducible” when properly carried out and reported (Aria and Cuccurullo 2017). Initially, academics used bibliometrics to analyze published studies based on counts of publications and citations. Bibliometric techniques, tools, and software have since advanced; in this study we employed the open-source Bibliometrix package for the R programming language (Aria and Cuccurullo 2017).

C. Methodology of bibliometric analysis
The purpose of literature review articles can be twofold: (a) summarizing the existing literature on a topic by highlighting significant themes and issues and offering grounds for further investigation (Seuring et al., 2005); and (b) comparing any scientific literature to current knowledge and hypotheses (Saunders et al., 2009). There are different types of literature review, for example systematic literature review, meta-analysis, and bibliometric analysis. In this work, bibliometric analysis is used.
Fig. 1 illustrates the methodology adopted in this study.

III. RESULTS
The articles published on the data catalog are spread over the years between 1970 and 2021. The topic first appears in the literature in 1970, although the steady flow of publications begins in 2000 (Table 1). According to Table 1, the Scopus database holds the most data on the topic, which is the reason we chose it for this study.
In Fig. 2, we present the chronological growth of data catalog research for the study period 1970-2021. In the first period there were years without any records, but after 1989 there was a gradual increase.
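The yearly publication counts behind a chronological growth curve such as Fig. 2 can be illustrated with a minimal sketch. It is not the Bibliometrix routine used in the study; the function names `annual_production` and `first_year_of_steady_output` and the sample years are assumptions introduced here purely for illustration.

```python
from collections import Counter


def annual_production(years, start, end):
    """Count documents per publication year, keeping empty years as 0."""
    counts = Counter(years)
    return {year: counts.get(year, 0) for year in range(start, end + 1)}


def first_year_of_steady_output(production):
    """Return the earliest year from which every later year has a record.

    Returns None when the last year of the period has no record.
    """
    steady = None
    for year in sorted(production, reverse=True):
        if production[year] == 0:
            break
        steady = year
    return steady


# Hypothetical publication years, NOT the data of the study.
sample_years = [1970, 1989, 1990, 1990, 1991, 1992, 1992, 1992]
production = annual_production(sample_years, 1970, 1992)
print(production[1970], production[1971], production[1992])
print(first_year_of_steady_output(production))
```

Keeping the years without records as explicit zeros makes gaps in the early period visible instead of silently dropping them from the series.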
Fig. 3 displays the clustering of sources according to Bradford's law, produced with the Biblioshiny tool, for the research output analyzed here. The Journal of Physics: Conference Series clearly stands out as the top-ranked source.
Fig. 4 depicts the analysis of citations by country. The USA has the most prominent citation network, followed by the Netherlands, Germany, and the United Kingdom.
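The idea behind a Bradford's law clustering of sources can be sketched as follows: sources are ranked by the number of articles they published and split into three zones that each hold roughly one third of the articles. This is a simplified illustration, not Biblioshiny's implementation; the exact zone boundary rule, the function name `bradford_zones`, and the journal counts are assumptions made for this example.

```python
def bradford_zones(source_counts):
    """Assign each source to Bradford zone 1, 2, or 3.

    Sources are ranked by article count (descending, ties by name).
    A source is placed by the cumulative share of articles reached
    BEFORE it is added: below 1/3 -> zone 1, below 2/3 -> zone 2,
    otherwise zone 3. This boundary rule is an assumption of the sketch.
    """
    total = sum(source_counts.values())
    ranked = sorted(source_counts.items(), key=lambda item: (-item[1], item[0]))
    zones = {}
    cumulative = 0
    for source, count in ranked:
        # Integer comparisons avoid floating-point rounding at the boundaries.
        if cumulative * 3 < total:
            zones[source] = 1
        elif cumulative * 3 < 2 * total:
            zones[source] = 2
        else:
            zones[source] = 3
        cumulative += count
    return zones


# Hypothetical article counts per source, NOT the data of the study.
counts = {
    "Journal A": 12, "Journal B": 6, "Journal C": 5,
    "Journal D": 4, "Journal E": 3, "Journal F": 2,
    "Journal G": 2, "Journal H": 1, "Journal I": 1,
}
zones = bradford_zones(counts)
for source in sorted(zones):
    print(source, zones[source])
```

In this toy input a single source forms the core zone, while progressively more sources are needed to fill the second and third zones, which is the pattern Bradford's law describes.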

Table 2 lists the countries ordered by the number of records each has published in the area of the data catalog, showing the 10 most productive countries in Scopus. The USA tops the list with 33 publications, followed by China, Germany, and the United Kingdom with 18, 17, and 14 publications, respectively. Italy ranks fifth in publishing articles related to the data catalog.

Fig. 5 presents the top 50 sources of publications, based on the number of papers published in the research area. “Metadata” is the predominant one, with 46 publications for 1970-2021.

Fig. 6 presents the growth over time of the terms associated with the data catalog for the study period 1970-2021. The terms show a gradual increase, especially “metadata”.
Fig. 7 shows the co-occurrence network of the authors' major keywords used in data catalog research over the study period.
Fig. 9 depicts the co-citation network of the references cited by the publications of the study period. Most of the references come from early 1970s articles, since the study is confined to data catalog tools, Bradford's law, and other scientific parameters.
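Keyword co-occurrence and co-citation networks rest on the same counting step: how often two items (two keywords, or two cited references) appear together in one document. The sketch below illustrates that step only; it is not the Bibliometrix network routine, and the function name `pair_counts` and the keyword lists are assumptions introduced for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_counts(documents):
    """Count how many documents contain each unordered pair of items.

    Items are normalized (trimmed, lower-cased) and de-duplicated per
    document, so a pair is counted at most once per document.
    """
    counts = Counter()
    for items in documents:
        unique = sorted({item.strip().lower() for item in items if item.strip()})
        counts.update(combinations(unique, 2))
    return counts


# Hypothetical author-keyword lists, one per document (NOT the study's data).
keyword_lists = [
    ["Metadata", "Data catalog", "Data lake"],
    ["metadata", "data catalog"],
    ["Data catalog", "Information management"],
]
edges = pair_counts(keyword_lists)
for (first, second), weight in sorted(edges.items()):
    print(first, "|", second, "|", weight)
```

Each counted pair corresponds to an edge of the network and the count to its weight; passing lists of cited references instead of keywords gives co-citation counts in the same way.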

Fig. 8 shows the topmost authors in the field of research for this study, as reflected in the Scopus database. The entry “NA NA” is the most prominent, with a good number of publications and citations in recent years.
Using Correspondence Analysis (CA), Multiple Correspondence Analysis (MCA), or Metric Multidimensional Scaling (MDS), together with clustering of a bipartite network of terms extracted from keyword, title, or abstract fields, the conceptualStructure function creates a conceptual structure map of a scientific field, as shown in Fig. 10.

LIMITATIONS OF THE RESEARCH
The data catalog is a new area of study, which is one of the limitations of the current survey. The results cannot be generalized because only one bibliographic database, Scopus, was used. Another drawback is the lack of a bibliometric algorithm that accounts for how citations per paper develop over time. As a result, older articles are cited more frequently than more recent ones, which makes it difficult to accurately assess the influence of recent publications. Additionally, the current approach does not identify cross-citations or create author clusters. As a starting point for further study, the current results can be seen as explanatory and indicative, yet crude. The present results, which are restricted to the educational setting, suggest research “gaps” that researchers may have to take into consideration in their work.

DISCUSSION AND CONCLUSION
This publication carries out a thorough bibliometric analysis of the “data catalog”. The structures and development of this field were uncovered with the aid of the bibliometric analysis, which made use of Scopus, the most used database. Because Scopus indexes a variety of sources, 332 publications are included. “NA NA” is the most productive author entry globally and the most well-known in the subject of data catalog research. The leading authors have strong networks with numerous nations. Metadata and information management are the topics that receive the most attention in data catalog research. China and the USA are the most productive nations according to the bibliometrics. The bibliographic analysis in this paper revealed the intrinsic structure of data catalog research. This research is important because the data catalog is a well-established topic, and the research community could look into its further development. The order of publications in this field is presented, with data visualization being one of the most often used terms. The most popular search terms are citation, science mapping, network analysis, and co-citation. Finally, the publications of authors from India and other nations are discussed. The sub-sections used to categorize the overview of the published work on the data catalog help the reader better understand the current trending areas. A shortcoming of the study is that it reports only numbers of papers and citations: such numbers clearly indicate quantity, but citations do not indicate quality. Additionally, the bibliometric analysis covered only the Scopus database. Alternative sources exist, so additional investigation using different indexing databases is a potential future direction for the study.

REFERENCES
Anuradha, K.T., and Shalini R. Urs. “Bibliometric Indicators of Indian Research Collaboration Patterns: A Correspondence Analysis.” Scientometrics, vol. 71, 2007, pp. 179-189.
Ludo Waltman, Nees Jan van Eck, and Ed C.M. Noyons. “A unified approach to mapping and clustering of bibliometric networks.”
Chen, Kaihua, et al. “International Research Collaboration: An Emerging Domain of Innovation Studies?” Research Policy, vol. 48, no. 1, 2019, pp. 149-168.
Clément Labadie, Christine Legner, Markus Eurich, Martin Fadler. “FAIR Enough? Enhancing the Usage of Enterprise Data with Data Catalogs.” 2020.
Fazli, Farzaneh, et al. “Co-Authorship Patterns and Topic Networks in the Scientific Publication of Hamadan University of Medical Sciences.” Library Philosophy and Practice, 2018.
Lisa Ehrlinger, Johannes Schrott, Martin Melichar, Nicolas Kirchmayr, Wolfram Wöß. “Data Catalogs: A Systematic Literature Review and Guidelines to Implementation.” 2021.
Garfield, E. “Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas.” Science, 1955, pp. 108-111.
David Tranfield, David Denyer, Palminder Smart. “Towards a Methodology for Developing Evidence-Informed Management Knowledge by Means of Systematic Review.” 2003.
Garfield, Eugene. “Science Citation Index - A New Dimension in Indexing.” Science, 1964, pp. 649-654.
Gunasekaran, M., and R. Balasubramani. “Scientometric Analysis of Artificial Intelligence Research Output: An Indian Perspective.” European Journal of Scientific Research, vol. 70, no. 2, 2012, pp. 317-322.
Jennifer Rowley, Frances Slack. “Conducting a literature review.” 2004.
Eileen F. S. Kaner, Heather O. Dickinson, Fiona Beyer, Elizabeth Pienaar, Carla Schlesinger, Fiona Campbell, John B. Saunders, Bernard Burnand, Nick Heather. “The effectiveness of brief alcohol interventions in primary care settings: a systematic review.” 2005.
Hirsch, J.E. “An Index to Quantify an Individual’s Scientific Research Output.” Proceedings of the National Academy of Sciences, 2005.
R. N. Broadus. “Toward a definition of ‘bibliometrics’.” 1987.
Kumar, V. Vasantha, et al. “A Power-graph based Ap-
proach to Detection of Research Communities from Co-
Authorship Networks.” Journal of Computational and Theo-
retical Nanoscience, vol. 14, no. 12, 2017.
Kumaresan, Ramasamy, et al. “Scientometric Analysis of
Seaweed Research with Reference to Web of Science.” Library
Philosophy and Practice, 2015.
Massimo Aria, Corrado Cuccurullo. “bibliometrix: An R-tool for comprehensive science mapping analysis.” 2017.
Mandhirasalam, M. “Research Publication Output of Thiagarajar College of Engineering, Madurai: A Scientometric Study.” Indian Journal of Science, vol. 21, 2015, pp. 490-498.
Radha, L. “Coronavirus: A Scientometric Study with Special Reference to Web of Science.” Shanlax International Journal of Arts, Science and Humanities, vol. 8, no. 1, 2020, pp. 213-217.
K. Smith, D. Marinova. “Use of bibliometric modelling for policy making.” 2005.
Johan A. Wallin. “Bibliometric Methods: Pitfalls and Possibilities.” 2005.
Radha, L. “Research Output of Thiagarajar College of Engineering, Madurai during 2014-2018: A Scientometric Analysis using Excel Sheet.” Shanlax International Journal of Arts, Science and Humanities, vol. 8, no. 2, 2020, pp. 97-101.
El-Sayed M. El-Alfy, Salahadin A. Mohammed. “A review of machine learning for big data analytics: bibliometric approach.” 2020.
Sivakumar, B. “Analysis of the Publications of the PSG
College of Arts and Science: A Bibliometric Study.” Journal
of Advancements in the Library Science, vol. 4, no. 1, 2017,
pp. 7-14.
Surulinathi, M., et al. “Continent wise Analysis of Green
Computing Research: A Scientometric Study.” Journal of
Advances in Library and Information Science, vol. 2, no. 1,
2013, pp. 39-44.
Thanuskodi, S. “Bibliometric Analysis of the Journal Library Philosophy and Practice from 2005-2009.” Library Philosophy and Practice, 2010.
