
Introduction

The administration of organizational procedures, technological resources, and human resources that together generate, gather, combine, arrange, process, store, distribute, use, and discard information is known as information management. This study provides a comprehensive overview of information management from 1970 to 2019, using statistical text analysis and a probabilistic generative model. It examines publication trends and themes, identifying recurring areas such as data management, knowledge management, environmental management, project management, service management, and mobile and web management. The findings also point to knowledge management, environmental management, project management, and social communication as academic hotspots for future research.

Early research on information management (IM) encompassed various organizational activities, including acquiring, integrating, organizing, structuring, processing, and distributing information
optimally. Initially, studies focused on technical aspects of IM, such as hardware and software
development. Information resources management (IRM) was associated with managing information and
information technologies used for acquiring, storing, processing, and utilizing information. IM became an appealing domain as information came to be recognized as an organizational resource requiring effective management. IRM concentrated on data management and did not cover the broader perspective of IM, which emphasizes efficient access, organization, processing, and utilization of information.

The Information Management (IM) domain has experienced convergence and integration, with the once-separate roles of data administrators and information managers increasingly overlapping. This convergence of skills has made IM a strategic issue in organizations, improving corporate performance, encouraging competition, and reducing uncertainty. Strategic information management enhances differentiation in value chain activities and has broader impacts at organizational, sectoral, and societal levels.

The exponential growth in information production has increased the importance of information
management (IM), which requires new skills, knowledge, qualifications, and experience for managing
information at four levels:

• Information Retrieval
• Information Systems
• Information Contexts
• Information Environments

IM is defined as the management of processes and systems that create, acquire, organize, store, distribute, and use information. Over the last five decades, IM has covered a wide range of themes and topics, but limited effort has been made to integrate this fragmented research. Understanding how the intellectual structure of IM research published in journals and conference proceedings has evolved over these five decades adds significant value to the existing body of knowledge and is of interest to both academics and practitioners in the field.

Methods and Data

1. Bibliometric analysis
Bibliometric analysis is a widely accepted technique for analyzing and summarizing vast, fragmented bodies of research. Originating in the library and information sciences, it has since been applied to fields such as social science, international business, public policy, marketing, advertising, psychology, travel and tourism marketing, computer-integrated manufacturing, communications, and information systems. Citation analysis measures similarity and association among research papers, contributors, and journals. For a specific research domain, bibliometric analysis can map thematic areas and visualize conceptual subdomains; for a journal, it can trace thematic evolution, visualize citation patterns, discern progressive themes, and suggest future research avenues.
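As a concrete illustration, the sketch below shows how such a bibliometric overview could be produced in R. The bibliometrix package, the export file name, and the field separator are assumptions for illustration; the study only reports that the R environment was used.

```r
# A minimal bibliometric-overview sketch in R using the bibliometrix package
# (an assumption; the study only states that R was used).
library(bibliometrix)

# Convert a raw Scopus export (placeholder file name) into a bibliographic
# data frame with one row per document
M <- convert2df("scopus_im_export.bib", dbsource = "scopus", format = "bibtex")

# Descriptive overview: production over time, citations, authors, sources
results <- biblioAnalysis(M, sep = ";")
summary(results, k = 10)   # top-10 papers, authors, countries, and keywords
plot(results, k = 10)

# Citation-based association: a co-citation network among cited references
NetMatrix <- biblioNetwork(M, analysis = "co-citation",
                           network = "references", sep = ";")
networkPlot(NetMatrix, n = 30, type = "fruchterman",
            Title = "Reference co-citation network", labelsize = 0.7)
```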
2. Topic modeling based on the structural topic models
Topic modeling is a useful technique in natural language processing and text analytics that
extracts underlying topics (latent themes) from text documents. It is an unsupervised machine
learning technique that learns and discovers latent themes and their prevalences across a
collection of documents. Popular techniques include Latent Semantic Analysis (LSA), Probabilistic
Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA). LDA is a Bayesian generative probabilistic model that assumes each document is a mixture over latent topics and each topic is a distribution over the words of a fixed vocabulary. Structural topic modeling (STM) is a more recent and sophisticated probabilistic topic modeling technique that estimates the topic model through fast variational approximation with an expectation-maximization algorithm. Unlike LDA, STM can also incorporate document-level covariates (metadata) and their interactions into the model.
Step-1. Estimate the topic prevalence parameter (proportion vector) θ_d for each document d using a logistic-normal generalized linear model on the document covariates X_d:

θ_d | X_d, Γ, Σ ~ LogisticNormal(X_d Γ, Σ)

where the topic prevalence model's coefficients are represented by Γ = [γ_1 | … | γ_K] and Σ is a hyper-parameter modeled as a (K-1) by (K-1) covariance matrix. θ_d gives each document a probability over the k = 1, …, K topics, each of which is a distribution over the vocabulary of size V.

Step-2. Generate the topical content model β, which represents each topic k (for the covariate group c of a document) as a probabilistic mixture over the vocabulary:

β_{d,k} ∝ exp(m + κ_k^(t) + κ_c^(c) + κ_{k,c}^(i))

where m is the baseline word distribution vector of length V, κ_k^(t) is the topic (t) specific deviation, κ_c^(c) is the covariate (c) group deviation, and κ_{k,c}^(i) is the interaction (i) coefficient. This study includes the publication year as a covariate.

Step-3. For each word in the document (n ∈ {1, …, N_d}), the core language model samples a topic assignment from a multinomial distribution over the topic prevalence parameter, and the observed word is then sampled from the chosen topic's word distribution:

z_{d,n} | θ_d ~ Multinomial(θ_d),  w_{d,n} | z_{d,n}, β ~ Multinomial(β_{d, k = z_{d,n}})

This study uses a logistic-normal distribution to compute the topic prevalence parameters in STM, relating them to document-level covariates. The text corpus for STM is created from each article's title, keywords, abstract, and publication year. The text data is preprocessed to remove stop words, numbers, non-English words, special characters, and punctuation, and the corpus is further cleaned to remove frequent terms related to copyright information and publishers. Bigram terms are generated from the text corpus and compared with the authors' specified keywords; the most frequent bigrams are concatenated into single tokens for better topic modeling results.
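The workflow described above maps closely onto the stm R package. The sketch below is a minimal illustration under the assumption that `corpus` is a data frame holding the combined title-keyword-abstract text and the publication year for each document; the package choice, object names, and thresholds are assumptions, not the authors' code.

```r
library(stm)

# Preprocess: lowercase, stem, and remove stop words, numbers, and
# punctuation (textProcessor defaults)
processed <- textProcessor(corpus$text, metadata = corpus)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta,
                     lower.thresh = 5)   # drop terms in fewer than 5 documents

# Fit the structural topic model with publication year as a topic-prevalence
# covariate (smoothed with a spline); K = 16 follows the reported result
fit <- stm(out$documents, out$vocab, K = 16,
           prevalence = ~ s(year), data = out$meta,
           init.type = "Spectral", seed = 2019)

# Highest-probability (and FREX) words per topic, used for topic labeling
labelTopics(fit, n = 10)
```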

3. Data
The study retrieved bibliographic data from the Scopus database for 1970 to 2019, focusing on the Information Management (IM) domain. The search query was limited to Business, Management, and Accounting as the most prominent subject area, covering 20,057 documents; 19,916 research documents remained after removing discrepancies. The R environment was used for all analyses, including the bibliometric overview, topic modeling, and results visualization, on a Windows 10 computer with 16 GB RAM and a 64-bit architecture.
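One possible shape of this data-preparation step is sketched below, reusing the bibliographic data frame `M` from the earlier bibliometrix sketch. The de-duplication criterion and field tags are illustrative, since the paper does not detail how discrepancies were removed.

```r
# Drop records with duplicated titles as a simple stand-in for the paper's
# discrepancy removal (exact criteria are not specified)
M_clean <- M[!duplicated(tolower(M$TI)), ]
nrow(M_clean)   # should be close to the 19,916 documents reported

# Build the text corpus assumed by the STM sketch above:
# title (TI) + author keywords (DE) + abstract (AB), with year (PY) as metadata
corpus <- data.frame(
  text = paste(M_clean$TI, M_clean$DE, M_clean$AB),
  year = as.numeric(M_clean$PY),
  stringsAsFactors = FALSE
)
```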

Results

1. Most cited papers


The top 10 most cited studies in the Information Management (IM) domain between 1970 and
2019 are listed in Table 2. The most cited paper, with 5410 citations, is Alavi and Leidner's 2001 work on knowledge management and knowledge management systems. The second most cited paper is Hansen, Nohria, and Tierney's 1999 study on knowledge management strategies. Seven articles have received over 1,000 citations each, with the most recent being the 2012 IEEE Task Force manifesto on Process Mining.
2. Noteworthy influential authors
Table 3 shows influential authors' total citations and average citations per research document.
Dorothy Leidner from Baylor University, Maryam Alavi from Georgia Tech, and Varun Grover from the University of Arkansas are the top three, with Alavi leading with 5410 citations.
3. The most influential institutions and countries
The Information Management (IM) domain attracts scholars worldwide, with Hong Kong Polytechnic University having the highest number of publications. Other top institutions include the National University of Singapore and the South China University of Technology, while the leading contributing countries include the United States, Romania, Russia, the Czech Republic, the United Kingdom, Malaysia, Australia, Germany, and Indonesia.
4. Top research keywords
The analysis of the research articles' most common keywords reveals major areas of interest and illuminates the conceptual structure and knowledge development of the domain. The top-10 research keywords and their occurrence counts are listed in Table 5 and shown as a network graph in Fig. 4.
5. Topic modeling for thematic analysis
This study uses STM-based topic modeling to analyze text content from various research
documents, including journal articles, conference papers, reviews, and book chapters. It
identifies key themes and latent topics from 1970 to 2019, using a total of 19,916 documents.
The study also analyzes research articles and conference proceedings as separate subsets. A key challenge is identifying an appropriate number of latent topics (see the sketch below).
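One way to approach this challenge, assuming the preprocessed objects from the earlier STM sketch are available, is stm's searchK diagnostic, which compares candidate values of K on held-out likelihood, residuals, and semantic coherence. The candidate values below are illustrative.

```r
# Compare candidate topic counts; K values here are illustrative
kresult <- searchK(out$documents, out$vocab, K = c(8, 12, 16, 20, 24),
                   prevalence = ~ s(year), data = out$meta)
plot(kresult)   # held-out likelihood, residuals, coherence, and bound vs. K
```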

Results: all research documents

The study discovered 16 topics from the 19,916 research documents related to the Information Management (IM) domain using STM. The semantically descriptive topics were labeled based on their highly probable words, and the most frequent bigrams were concatenated into single tokens so that meaningful bigrams were included in the topic modeling. Semantic coherence and exclusivity are crucial constructs for measuring the overall quality of topic models. The exclusivity scores for the topics range from 11.33 to 11.80, while semantic coherence ranges from -191.29 to -102.45. The study also found that the top words of any two topics do not co-occur equally within the documents.
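These two quality measures can be computed per topic directly from a fitted stm model, as in the sketch below (object names follow the earlier sketches).

```r
# Per-topic quality diagnostics for the fitted 16-topic model
coh <- semanticCoherence(fit, out$documents)   # higher (less negative) is better
exc <- exclusivity(fit)                        # higher means more exclusive top words
round(cbind(topic = 1:16, coherence = coh, exclusivity = exc), 2)
```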
The perspective visualization in Fig. 7 shows topical contrasts among Topic-12 (International Accounting and Global Business), Topic-3 (Information, Web, and User), and Topic-8 (Industry and Industrial Innovation), indicating the degree of semantic association between topics.
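In stm, such a contrast is produced with a "perspectives" plot, sketched below with illustrative topic numbers.

```r
# Contrast the vocabularies of two topics side by side
plot(fit, type = "perspectives", topics = c(12, 3))
```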

The research on international accounting and global business focuses on macro-level aspects, while topics such as financial performance and investment focus on micro-level aspects. The Information, Web, and User topic is oriented toward information management, web content, search, and semantics, whereas the Industry and Industrial Innovation topic is oriented toward innovation, industry, and environmental aspects.
This study used a correlation analysis to quantify the association among the extracted topics. All correlation values were below 0.3, indicating weak or no correlation among the extracted topics; a positive correlation between two topics indicates that many documents discuss both topics together.
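A sketch of this correlation analysis with stm's topicCorr is shown below; the 0.3 cutoff mirrors the threshold discussed above and is otherwise a tunable choice.

```r
# Estimate correlations among topic proportions and keep edges above 0.3
tc <- topicCorr(fit, cutoff = 0.3)
plot(tc)   # network of positively correlated topics (few or no edges expected here)
```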
This study also analyzed the maximum-a-posteriori (MAP) estimates of the document-topic loadings to confirm topic quality. A histogram shows the expected distribution of topic proportions across the research documents from 1970 to 2019. Under the statistical mixture hypothesis, each document is a probabilistic mixture of key and non-key topics, and the plot shows that each extracted latent topic is strongly represented in only a subset of the research documents and has little or no presence in the rest.
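The sketch below shows one way such a check could be run on the fitted model: take each document's largest topic proportion from the theta matrix and plot its distribution.

```r
# Largest (MAP) topic proportion for each of the 19,916 documents
map_loading <- apply(fit$theta, 1, max)
hist(map_loading, breaks = 40,
     main = "Maximum topic proportion per document",
     xlab = "MAP topic loading")
```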
