Data Science and Its Relationship To Library and Information Science: A Content Analysis
https://www.emerald.com/insight/2514-9288.htm
Abstract
Purpose – The purpose of this paper is to present the results of a study exploring the emerging field of data
science from the library and information science (LIS) perspective.
Design/methodology/approach – A content analysis was made of research publications on data science
indexed in the Web of Science database to identify the main themes discussed in the publications from
the LIS perspective.
Findings – A content analysis of 80 publications is presented. The articles belonged to six broad
categories: data science education and training; knowledge and skills of the data professional; the role of
libraries and librarians in the data science movement; tools, techniques and applications of data science; data
science from the knowledge management perspective; and data science from the perspective of health sciences.
The category of tools, techniques and applications of data science was addressed most often, followed
by data science from the perspective of health sciences, data science education and training, and knowledge and
skills of the data professional. However, several publications fell into more than one category because these topics
were closely related.
Research limitations/implications – Only publications recorded in the Web of Science database and with
the term “data science” in the topic area were analyzed. Therefore, several relevant studies are not discussed in
this paper that either were related to other keywords such as “e-science”, “e-research”, “data service”, “data
curation”, “research data management” or “scientific data management” or were not present in the Web of
Science database.
Originality/value – The paper provides the first exploration by content analysis of the field of data science
from the perspective of LIS.
Keywords Education, Content analysis, Literature review, Data science, Information science, Library science
Paper type Research paper
1. Introduction
Data science, having emerged in response to the increased amount of data, has received
considerable attention in recent years. For example, by April 2019 the Web of Science database had
recorded 2,350 publications in the topic area “data science” for the period 1980–2019.
Of these, 44.8% came from the subject area of computer science, followed by
engineering (18.2%), mathematics (7.4%), science technology and other topics (5.6%),
business economics (4.6%), education and educational research (3.6%), information science
and library science (3.4%), physics (2.9%), telecommunications (2.9%), medical informatics
(2.7%), materials science (2.6%) and operations research and management science (2.6%)
(Virkus and Garoufallou, 2019a). At 3.4%, the library and information science (LIS)
contribution to data science in the period 1980–2019 was limited.
This paper presents the results of the study that explored the field of data science from the
LIS perspective using content analysis. The structure of this paper is as follows: section 2
describes the research methodology, section 3 presents the results of the content analysis on
data science from the LIS perspective, and section 4 draws conclusions.
Data Technologies and Applications, Vol. 54 No. 5, 2020, pp. 643-663, © Emerald Publishing Limited, 2514-9288, DOI 10.1108/DTA-07-2020-0167
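The subject-area shares quoted in the introduction can be turned into approximate record counts. The short Python sketch below does this arithmetic using only the total and the percentages given in the text; the function name and the rounding to whole records are illustrative assumptions, not part of the original study.

```python
# Rough conversion of the reported subject-area shares into record counts.
# The total and the percentages are taken from the text; the rounding to
# whole records is an assumption made for illustration only.
TOTAL_PUBLICATIONS = 2350  # Web of Science records, topic "data science", 1980-2019

SHARES_PERCENT = {
    "computer science": 44.8,
    "engineering": 18.2,
    "mathematics": 7.4,
    "information science and library science": 3.4,
}


def approximate_count(share_percent, total=TOTAL_PUBLICATIONS):
    """Convert a percentage share into an approximate number of records."""
    return round(total * share_percent / 100)


lis_count = approximate_count(SHARES_PERCENT["information science and library science"])
print(lis_count)
```

Under these assumptions the information science and library science share works out to roughly 80 records, which agrees with the 80 publications analysed in this paper.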
2. Methodology
This paper presents research results that are part of a study that explored data
science from the LIS perspective. The following research questions were proposed: (1) What
are the main tendencies in publication years, document types, countries of origin, source
titles, authors of publications, affiliations of the article authors and the most cited articles
related to data science in the field of LIS? (2) What are the main themes discussed in the
publications from the LIS perspective?
In the first stage, a bibliographic analysis was made on the basis of papers published in
the Web of Science database. Searches were carried out in the database by topic in April 2019
using the term “data science”. The search strategy identified 80 publications published between
1980 and 2019. A descriptive statistical analysis of these papers provided answers to the first
research question and was presented by Virkus and Garoufallou (2019a). The methodology
and research approach were also tested in another paper by Virkus and Garoufallou (2019b),
which dealt with data science from a perspective of computer science.
In the second stage, a content analysis of the 80 publications was made to answer the second
research question: What are the main themes discussed in the publications from the LIS
perspective? This paper provides an answer to the second research question.
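Because a publication may be assigned to more than one theme, category counts from the second stage need not add up to the number of publications. The Python sketch below shows one way such a tally could be kept. The six category names come from this paper; the publication identifiers and their assignments are hypothetical placeholders, not the study's data, and the helper functions are illustrative assumptions rather than the authors' procedure.

```python
from collections import Counter

# Category names as listed in this paper.
EDUCATION = "data science education and training"
SKILLS = "knowledge and skills of the data professional"
LIBRARIES = "the role of libraries and librarians in the data science movement"
TOOLS = "tools, techniques and applications of data science"
KM = "data science from the knowledge management perspective"
HEALTH = "data science from the perspective of health sciences"

# HYPOTHETICAL coding sheet: placeholder identifiers and assignments,
# used only to show the mechanics of multi-category coding.
coded_publications = {
    "pub_A": [TOOLS],
    "pub_B": [EDUCATION, SKILLS],
    "pub_C": [TOOLS, HEALTH],
}


def tally_categories(coded):
    """Count how many publications were assigned to each category."""
    counts = Counter()
    for labels in coded.values():
        counts.update(set(labels))  # a publication counts once per category
    return counts


def multi_category_publications(coded):
    """Return the identifiers of publications coded into several categories."""
    return sorted(pub for pub, labels in coded.items() if len(set(labels)) > 1)


counts = tally_categories(coded_publications)
print(counts.most_common())
print(multi_category_publications(coded_publications))
```

In this placeholder sheet, three publications yield five category assignments, which is why per-category totals should be read as overlapping counts.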
The research has two limitations: (1) only publications recorded in the Web of Science
database were analyzed and (2) only publications with the term “data science” in the topic area of the Web of
Science database were included. Therefore, several relevant studies are not discussed in this
paper that either were related to other keywords such as “e-science”, “e-research”, “data
service”, “data curation”, “research data management” or “scientific data management” or
were not present in the Web of Science database.
4. Conclusions
The LIS contribution to data science in the period 1980–2019 according to the Web of Science
database was quite limited (3.4%). The first paper was published in 2005 and the number of
articles has increased over the past few years. It appears that there has been a continuous
increase in articles since 2015. The main document types are journal articles, followed by
conference proceedings and editorial material.
The analysis revealed that data science is quite interdisciplinary by nature. The reviewed
articles were diverse in content. In addition to the identified six broad categories (data science
education and training; knowledge and skills of the data professional; the role of libraries and
librarians in the data science movement; tools, techniques and applications of data science;
data science from the knowledge management (KM) perspective; data science from the perspective of health sciences),
the topics included big data issues, data structures, information and data visualization,
solutions for social computing and social network analytics, the application of agile
methodologies and principles to business intelligence, topical sequence profiling, directory-
based incentive management services for ad hoc mobile clouds, business data science (BDS) models and access to
scholarly publications. Several publications fell into more than one category because these topics
were closely related. These topics were explored from the perspective of research or practice;
for example, from the perspective of the information professional and data analysts or
information systems or health sciences research. Data science was also discussed in the
historical context and as a part of a broader cultural history.
The category of tools, techniques and applications of data science was addressed most often by
the authors, which indicates that there are numerous data science tools, techniques and
applications that help obtain value from data, study data and its patterns, generate
outcomes from it and are applicable in a wide range of fields. This category was followed by
data science from the perspective of health sciences, data science education and training and
knowledge and skills of the data professional. Based on the analyzed publications, several
fields such as LIS, information systems, KM and health sciences provide valuable
contributions to data science.
References
Agarwal, R. and Dhar, V. (2014), “Big data, data science, and analytics: the opportunity and challenge
for IS research”, Information Systems Research, Vol. 25 No. 3, pp. 443-448.
Alluqmani, A. and Shamir, L. (2018), “Writing styles in different scientific disciplines: a data science
approach”, Scientometrics, Vol. 115 No. 2, pp. 1071-1085.
Almugbel, R., Hung, L.H., Hu, J., Almutairy, A., Ortogero, N., Tamta, Y. and Yeung, K.Y. (2018),
“Reproducible Bioconductor workflows using browser-based interactive notebooks and
containers”, Journal of the American Medical Informatics Association, Vol. 25 No. 1, pp. 4-12.
Amirian, P., van Loggerenberg, F. and Lang, T. (2017), “Data science and analytics”, in Amirian, P.,
Lang, T. and van Loggerenberg, F. (Eds), Big Data in Healthcare, SpringerBriefs in
Pharmaceutical Science and Drug Development, Springer, Cham, pp. 15-37.
Antell, K., Foote, J.B., Turner, J. and Shults, B. (2014), “Dealing with data: science librarians’
participation in data management at association of research libraries institutions”, College and
Research Libraries, Vol. 75 No. 4, pp. 557-574.
Aristodemou, L. and Tietze, F. (2018), “The state-of-the-art on intellectual property analytics (IPA): a
literature review on artificial intelligence, machine learning and deep learning methods for
analysing intellectual property (IP) data”, World Patent Information, Vol. 55, pp. 37-51.
Barbuti, N., Caldarola, T. and Ferilli, S. (2018), “A graphic matching process for searching and
retrieving information in digital libraries of manuscripts”, in Serra, G. and Tasso, C. (Eds),
Digital Libraries and Multimedia Archives. IRCDL 2018. Communications in Computer and
Information Science, Springer, Cham, Vol. 806, pp. 139-150.
Baskarada, S. and Koronios, A. (2017), “Unicorn data scientist: the rarest of breeds”, Program, Vol. 51
No. 1, pp. 65-74.
Beaton, B. (2016), “How to respond to data science: early data criticism by Lionel Trilling”,
Information and Culture, Vol. 51 No. 3, pp. 352-372.
Berente, N., Seidel, S. and Safadi, H. (2018), “Research commentary - data-driven computationally
intensive theory development”, Information Systems Research, Vol. 30 No. 1, pp. 50-64.
Biswas, R. (2016), “Introducing data structures for big data”, in Effective Big Data Management and
Opportunities for Implementation, IGI Global, pp. 25-52.
Borgman, C.L., Darch, P.T., Sands, A.E., Pasquetto, I.V., Golshan, M.S., Wallis, J.C. and Traweek, S.
(2015), “Knowledge infrastructures in science: data, diversity, and digital libraries”,
International Journal on Digital Libraries, Vol. 16 Nos 3-4, pp. 207-227.
Brennan, P.F., Chiang, M.F. and Ohno-Machado, L. (2018), “Biomedical informatics and data science:
evolving fields with significant overlap”, Journal of the American Medical Informatics
Association, Vol. 25 No. 1, pp. 2-3.
Brunner, R.J. (2018), “The data science handbook. Field Cady, John Wiley & Sons, Inc., Hoboken, NJ,
2017. 416 pp”, Journal of the Association for Information Science and Technology, Vol. 69 No. 6,
pp. 861-863.
Cady, F. (2017), The Data Science Handbook, John Wiley & Sons, Hoboken, NJ.
Carter, D. and Sholler, D. (2016), “Data science on the ground: hype, criticism, and everyday work”,
Journal of the Association for Information Science and Technology, Vol. 67 No. 10, pp. 2309-2319.
Cervone, H.F. (2016), “Informatics and data science: an overview for the information professional”,
Digital Library Perspectives, Vol. 32 No. 1, pp. 7-10.
Cervone, H.F. (2017), “What does the evolution of curriculum in knowledge management programs tell
us about the future of the field?”, VINE Journal of Information and Knowledge Management
Systems, Vol. 47 No. 4, pp. 454-466.
Chen, H.L. and Zhang, Y. (2017), “Educating data management professionals: a content analysis of job
descriptions”, The Journal of Academic Librarianship, Vol. 43 No. 1, pp. 18-24.
Cho, J. (2019), “Subject analysis of LIS data archived in a Figshare using co-occurrence analysis”,
Online Information Review, Vol. 43 No. 2, pp. 256-264.
Costa, C. and Santos, M.Y. (2017), “The data scientist profile and its representativeness in the
European e-Competence framework and the skills framework for the information age”,
International Journal of Information Management, Vol. 37 No. 6, pp. 726-734.
Courneya, J.P. and Mayo, A. (2018), “High-performance computing service for bioinformatics and data
science”, Journal of the Medical Library Association, Vol. 106 No. 4, p. 494.
Da Sylva, L. (2017), “The theoretical and practical impact of data on information professionals”,
Documentation et Bibliotheques, Vol. 63 No. 4, pp. 5-34.
de Vasconcelos, J.B. and Rocha, A. (2017), “Special section on data science and business intelligence”,
International Journal of Information Management, Vol. 37 No. 6, pp. 716-717.
Erdmann, C. (2015), “Data scientist training for librarians”, Library and Information Services in
Astronomy VII: Open Science at the Frontiers of Librarianship ASP Conference Series, Vol. 492,
pp. 31-37, available at: www.aspbooks.org/a/volumes/article_details/?paper_id=36774
(accessed 12 July 2020).
Estiri, H., Stephens, K.A., Klann, J.G. and Murphy, S.N. (2018), “Exploring completeness in clinical data
research networks with DQe-c”, Journal of the American Medical Informatics Association,
Vol. 25 No. 1, pp. 17-24.
Evans, B.J. and Krumholz, H.M. (2019), “People-powered data collaboratives: fueling data science with
the health-related experiences of individuals”, Journal of the American Medical Informatics
Association, Vol. 26 No. 2, pp. 159-161.
Foster, J., McLeod, J., Nolin, J. and Greifeneder, E. (2018), “Data work in context: value, risks, and
governance”, Journal of the Association for Information Science and Technology, Vol. 69 No. 12,
pp. 1414-1427.
Ghasemaghaei, M., Ebrahimi, S. and Hassanein, K. (2018), “Data analytics competency for improving
firm decision making performance”, The Journal of Strategic Information Systems, Vol. 27 No. 1,
pp. 101-113.
Ghosh, J. (2016), “Big data analytics: a field of opportunities for information systems and technology
researchers”, Journal of Global Information Technology Management, Vol. 19 No. 4, pp. 217-222.
Gollub, T., Lipka, N., Koh, E., Genc, E. and Stein, B. (2016), “Topical sequence profiling”, 2016 27th
International Workshop on Database and Expert Systems Applications (DEXA), IEEE,
pp. 207-211.
Granville, V. (2014), Developing Analytic Talent: Becoming a Data Scientist, John Wiley & Sons,
Hoboken, NJ.
Greenberg, J. (2017), “Big metadata, smart metadata, and metadata capital: toward greater synergy
between data science and metadata”, Journal of Data and Information Science, Vol. 2 No. 3,
pp. 19-36.
Guo, J., Zhang, W., Fan, W. and Li, W. (2018), “Combining geographical and social influences with
deep learning for personalized point-of-interest recommendation”, Journal of Management
Information Systems, Vol. 35 No. 4, pp. 1121-1153.
Halim, Z. and Khan, S. (2019), “A data science-based framework to categorize academic journals”,
Scientometrics, Vol. 119 No. 1, pp. 393-423.
Hjørland, B. (2019), “Data (with big data and database semantics)”, KO Knowledge Organization,
Vol. 45 No. 8, pp. 685-708.
Intezari, A. and Gressel, S. (2017), “Information and reformation in KM systems: big data and strategic
decision-making”, Journal of Knowledge Management, Vol. 21 No. 1, pp. 71-91.
Kelleher, J.D. and Tierney, B. (2018), Data Science, MIT Press, Cambridge, MA.
Kennan, M.A. (2017), “‘In the eye of the beholder’: knowledge and skills requirements for data
professionals”, Information Research, Vol. 22 No. 4, available at: www.informationr.net/ir/22-4/
rails/rails1601.html (accessed 18 January 2019).
Kocheturov, A. and Pardalos, P.M. (2016), “Data science for massive networks”, in Braslavski, P.,
Markov, I., Pardalos, P., Volkovich, Y., Ignatov, D.I., Koltsov, S. and Koltsova, O. (Eds),
Information Retrieval. RuSSIR 2015. Communications in Computer and Information Science,
Springer, Cham, Vol. 573, pp. 88-100.
Koltay, T. (2019), “Accepted and emerging roles of academic libraries in supporting Research 2.0”, The
Journal of Academic Librarianship, Vol. 45 No. 2, pp. 75-80.
Ku, J.P., Hicks, J.L., Hastie, T., Leskovec, J., Re, C. and Delp, S.L. (2015), “The Mobilize Center: an NIH
big data to knowledge center to advance human movement research and improve mobility”,
Journal of the American Medical Informatics Association, Vol. 22 No. 6, pp. 1120-1125.
Kumar, S., Abowd, G.D., Abraham, W.T., al’Absi, M., Gayle Beck, J., Chau, D.H., Condie, T., Conroy,
D.E., Ertin, E., Estrin, D. and Ganesan, D. (2015), “Center of excellence for mobile sensor data-to-
knowledge (MD2K)”, Journal of the American Medical Informatics Association, Vol. 22 No. 6,
pp. 1137-1142.
Larson, D. and Chang, V. (2016), “A review and future direction of agile, business intelligence,
analytics and data science”, International Journal of Information Management, Vol. 36 No. 5,
pp. 700-710.
Ledley, T.S., Dahlman, L., Domenico, B. and Taber, M.R. (2005), “Facilitating the effective use of earth
science data in education through digital libraries: bridging the gap between scientists and
educators”, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries
(JCDL’05), IEEE, p. 386.
Leung, C.K., Braun, P., Enkhee, M., Pazdor, A.G., Sarumi, O.A. and Tran, K. (2016), “Knowledge
discovery from big social key-value data”, 2016 IEEE International Conference on Computer
and Information Technology (CIT), IEEE, pp. 484-491.
Lorentzen, D.G. and Nolin, J. (2017), “Approaching completeness: capturing a hashtagged Twitter
conversation and its follow-on conversation”, Social Science Computer Review, Vol. 35 No. 2,
pp. 277-286.
Mandal, S. (2019), “The influence of big data analytics management capabilities on supply chain
preparedness, alertness and agility: an empirical investigation”, Information Technology and
People, Vol. 32 No. 2, pp. 297-318.
Margolis, R., Derr, L., Dunn, M., Huerta, M., Larkin, J., Sheehan, J. and Green, E.D. (2014), “The national
institutes of health’s big data to knowledge (BD2K) initiative: capitalizing on biomedical big
data”, Journal of the American Medical Informatics Association, Vol. 21 No. 6, pp. 957-958.
Maxwell, D., Norton, H. and Wu, J. (2018), “The data science opportunity: crafting a holistic strategy”,
Journal of Library Administration, Vol. 58 No. 2, pp. 111-127.
Mitchell, E.T. (2015), “Reproducibility and its application to technical service processes”, Technical
Services Quarterly, Vol. 32 No. 4, pp. 402-413.
Newman, R., Chang, V., Walters, R.J. and Wills, G.B. (2016), “Model and experimental development for
business data science”, International Journal of Information Management, Vol. 36 No. 4,
pp. 607-617.
Ohno-Machado, L. (Ed.) (2013), “Data science and informatics: when it comes to biomedical data, is
there a real distinction?”, Journal of the American Medical Informatics Association, Vol. 20 No. 6,
p. 1009.
Ohno-Machado, L. (2018a), “Special focus on biomedical data science”, Journal of the American
Medical Informatics Association, Vol. 25 No. 1, p. 1.
Ohno-Machado, L. (2018b), “Data science and artificial intelligence to improve clinical practice and
research”, Journal of the American Medical Informatics Association, Vol. 25 No. 10, p. 1273.
Ortiz-Repiso, V., Greenberg, J. and Calzada-Prado, J. (2018), “A cross-institutional analysis of data-
related curricula in information science programmes: a focused look at the iSchools”, Journal of
Information Science, Vol. 44 No. 6, pp. 768-784.
Park, H.W. and Leydesdorff, L. (2013), “Decomposing social and semantic networks in emerging “big
data” research”, Journal of Informetrics, Vol. 7 No. 3, pp. 756-765.
Poulova, P., Mikulecka, J., Kozel, T. and Klimova, B. (2018), “Data science study program”, 12th
International Scientific Conference on Distance Learning in Applied Informatics (DIVAI),
pp. 337-347.
Qasim, M.A., Ul Hassan, S., Aljohani, N.R. and Lytras, M.D. (2017), “Human behavior analysis in the
production and consumption of scientific knowledge across regions: a case study on
publications in Scopus”, Library Hi Tech, Vol. 35 No. 4, pp. 577-587.
Rempel, E.S., Barnett, J. and Durrant, H. (2018), “Public engagement with UK government data
science: propositions from a literature review of public engagement on new technologies”,
Government Information Quarterly, Vol. 35 No. 4, pp. 569-578.
Rentier, B. (2016), “Open science: a revolution in sight?”, Interlending and Document Supply, Vol. 44
No. 4, pp. 155-160.
Saar-Tsechansky, M. (2015), “Editor’s comments: the business of business data science in IS journals”,
MIS Quarterly, Vol. 39 No. 4, pp. iii-vi.
Saltz, J., Shamshurin, I. and Connors, C. (2017), “Predicting data science sociotechnical execution
challenges by categorizing data science projects”, Journal of the Association for Information
Science and Technology, Vol. 68 No. 12, pp. 2720-2728.
Schweighofer, E. (2015), “The role of AI & law in legal data science”, in Rotolo, A. (Ed.), Legal
Knowledge and Information Systems, JURIX 2015: The Twenty-Eight Annual Conference, IOS
Press, Amsterdam, pp. 191-192.
Sexton, A., Shepherd, E., Duke-Williams, O. and Eveleigh, A. (2017), “A balance of trust in the use of
government administrative data”, Archival Science, Vol. 17 No. 4, pp. 305-330.
Sheble, L. (2016), “Research synthesis methods and library and information science: shared problems,
limited diffusion”, Journal of the Association for Information Science and Technology, Vol. 67
No. 8, pp. 1990-2008.
Shera, J.H. (1951), “Documentation: its scope and limitations”, The Library Quarterly, Vol. 21 No. 1,
pp. 13-26.
Si, L., Zhuang, X., Xing, W. and Guo, W. (2013), “The cultivation of scientific data specialists:
development of LIS education oriented to e-science service requirements”, Library Hi Tech,
Vol. 31 No. 4, pp. 700-724.
Song, I.Y. and Zhu, Y. (2017), “Big data and data science: opportunities and challenges of iSchools”,
Journal of Data and Information Science, Vol. 2 No. 3, pp. 1-18.
Song, P., Zheng, C., Zhang, C. and Yu, X. (2018), “Data analytics and firm performance: an empirical
study in an online B2C platform”, Information and Management, Vol. 55 No. 5, pp. 633-642.
Spruit, M. and Lytras, M. (2018), “Applied data science in patient-centric healthcare: adaptive analytic
systems for empowering physicians and patients”, Telematics and Informatics, Vol. 35 No. 4,
pp. 643-653.
Stanton, J.M., Palmer, C.L., Blake, C. and Allard, S. (2012), “Interdisciplinary data science education”,
Special Issues in Data Management (ACS Symposium Series, Vol. 1110), Washington, DC,
American Chemical Society.
Sundararajan, A., Provost, F., Oestreicher-Singer, G. and Aral, S. (2013), “Information in digital,
economic, and social networks”, Information Systems Research, Vol. 24 No. 4, pp. 883-905.
Tang, R. and Sae-Lim, W. (2016), “Data science programs in US higher education: an exploratory
content analysis of program description, curriculum structure, and course focus”, Education for
Information, Vol. 32 No. 3, pp. 269-290.
Thirathon, U., Wieder, B., Matolcsy, Z. and Ossimitz, M.L. (2017), “Big data, analytic culture and
analytic-based decision making evidence from Australia”, Procedia Computer Science, Vol. 121,
pp. 775-783.
Ting, I. (2015), “Developing analytic talent: becoming a data scientist”, Online Information Review,
Vol. 39 No. 2, p. 273.
Umachandran, K. and Ferdinand-James, D.S. (2017), “Affordances of data science in agriculture,
manufacturing, and education”, in Tamane, S. (Ed.), Privacy and Security Policies in Big Data,
IGI Global, pp. 14-40.
Virkus, S. and Garoufallou, E. (2019a), “Data science from a library and information science
perspective”, Data Technologies and Applications, Vol. 53 No. 4, pp. 422-441, doi: 10.1108/DTA-
05-2019-0076.
Virkus, S. and Garoufallou, E. (2019b), “Data science from a perspective of computer science”, in
Garoufallou, E., Fallucchi, F. and William De Luca, E. (Eds), Metadata and Semantic Research.
MTSR 2019. Communications in Computer and Information Science, Springer, Cham, Vol. 1057,
pp. 209-219, doi: 10.1007/978-3-030-36599-8_19.
Waltl, B., Zec, M. and Matthes, F. (2015), “A data science environment for legal texts”, in Rotolo, A.
(Ed.), Legal Knowledge and Information Systems. JURIX 2015: The Twenty-Eight Annual
Conference, IOS Press, Amsterdam, pp. 193-194.
Wang, K. (2018), “Twinning data science with information science in schools of library and
information science”, Journal of Documentation, Vol. 74 No. 6, pp. 1243-1257.
Wilson, T.D. (2018), “Review of: Kelleher, John D. and Tierney, Brendan. Data science. Cambridge,
MA: MIT Press, 2018”, Information Research, Vol. 23 No. 2, available at: informationr.net/ir/
reviews/revs630.html (accessed 12 July 2020).
Xia, W., Wan, Z., Yin, Z., Gaupp, J., Liu, Y., Clayton, E.W. and Malin, B.A. (2017), “It’s all in the timing:
calibrating temporal penalties for biomedical data sharing”, Journal of the American Medical
Informatics Association, Vol. 25 No. 1, pp. 25-31.
Yousafzai, A., Chang, V., Gani, A. and Noor, R.M. (2016), “Directory-based incentive management
services for ad-hoc mobile clouds”, International Journal of Information Management, Vol. 36
No. 6, pp. 900-906.
Zhou, X., Li, W. and Arundel, S.T. (2018), “A spatio-contextual probabilistic model for extracting
linear features in hilly terrains from high-resolution DEM data”, International Journal of
Geographical Information Science, Vol. 33 No. 4, pp. 666-686.
Zoltan, G. (2016), “Big data, science, causality”, Informacios Tarsadalom, Vol. 16 No. 2, p. 32.