Science has not escaped this evolution either, and it is often difficult to quickly and reliably “stand
on the shoulders of giants”. The objective of time-based techniques, in turn, is to maintain and correct the order of the events recorded in the log based on timestamp information. These
techniques usually work in conjunction with clustering and abstraction or alignment techniques; thus,
allowing the identification of patterns related to noisy data or data diversity. Many databases have
been successfully analyzed using text mining techniques, such as those of the Queensland
Department of Transport and Main Roads and National Highway Traffic Safety Administration
(NHTSA), the California Department of Motor Vehicles database, railway signaling maintenance
dataset, social media data, online reviews, and the Alabama and Illinois statewide crash database.
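For illustration only, the sketch below shows one way the time-based idea described above could be realized: events are grouped by case and restored to chronological order using their timestamp attribute. The field names (case_id, activity, timestamp) and the toy log itself are invented for the example and do not come from any of the surveyed datasets.

```python
from datetime import datetime
from collections import defaultdict

# A toy event log; the field names are illustrative assumptions.
event_log = [
    {"case_id": "c1", "activity": "approve",  "timestamp": "2020-01-02 10:00"},
    {"case_id": "c1", "activity": "register", "timestamp": "2020-01-01 09:00"},
    {"case_id": "c2", "activity": "register", "timestamp": "2020-01-03 08:15"},
]

def reorder_by_timestamp(log, fmt="%Y-%m-%d %H:%M"):
    """Group events by case and restore chronological order inside each case."""
    cases = defaultdict(list)
    for event in log:
        cases[event["case_id"]].append(event)
    for events in cases.values():
        events.sort(key=lambda e: datetime.strptime(e["timestamp"], fmt))
    return dict(cases)

for case_id, trace in reorder_by_timestamp(event_log).items():
    print(case_id, [e["activity"] for e in trace])
# c1 ['register', 'approve']
# c2 ['register']
```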
Table 2 outlines a general view and a summary of the most significant characteristics
(C1—techniques, C2—tools, C3—representation schemes, C4—imperfection types, C5—related
tasks, and C6—types of information), which are described in greater detail in the next sections. 3.2. C1. Techniques: Is there a way of grouping event log preprocessing techniques? This study provided a
pathway to understand the research trends, areas of improvement, and a few future research
directions in HRL. A comparative assessment of different models will be conducted
in future studies where the authors will investigate in detail (1) the characteristics of the data, (2) the
model setup, and (3) the context of the research to identify the applicability and efficiency of
different models under different circumstances. The color intensity in the network nodes refers to
more general or abstract terms that contain other more specific terms, all related to describing the
handling of preprocessing in process mining. Using VOSviewer, a bibliometric analysis of 395 articles with 12,700 references was performed. An event filter allows dicing a log, i.e., retaining a
fragment of the process across multiple cases. The topics identified, such as shipping services and container product information systems, were further investigated regarding trends in patenting
activity and major assignees for each topic. Within this group, there are two main approaches: filtering and time-based techniques. The
goal of this research is to identify the current state-of-the-art text mining techniques that are used for
transportation infrastructure assessment and planning and the potential of new and existing text
mining techniques in different transportation sectors.
Different criteria might lead to different taxonomies of data preprocessing techniques in the context
of process mining. Moreover, cases, like events, can have attributes. In addition, due
to simple suffix rules, sometimes the final stem is not appropriate. The results showed that the
framework could accurately estimate the density of traffic videos in both good and bad illumination
conditions. This review responds to the call by proposing a novel classification framework that
provides a full picture of current literature on where and how BDA has been applied within the SCM
context. This technique is generally applied together with
pattern identification or event abstraction techniques, since both are strongly linked to identifying
associations or rules from observed behaviors, or acquired experiences in the event log. Some
techniques proposed for event abstraction make use of supervised learning when annotations with
high-level interpretations of the low-level events are available for a subset of the sequences (i.e.,
traces). These annotations provide guidance on how to label higher level events and guidance for the
target level of abstraction. This automaton captures the directly-follows dependencies between events in the log. We also discuss the emerging future challenges in the
context of process mining.
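As a rough, assumption-laden sketch of the event filter mentioned above, which dices a log by retaining a fragment of the process across multiple cases, the following code keeps only the events whose activity label belongs to a chosen set; it is a didactic simplification rather than the implementation of any specific tool.

```python
def dice_log(traces, keep_activities):
    """Keep only the events whose activity label is in keep_activities,
    preserving the original order of the remaining events in every trace."""
    return {
        case_id: [event for event in trace if event["activity"] in keep_activities]
        for case_id, trace in traces.items()
    }

traces = {
    "c1": [{"activity": "register"}, {"activity": "check"}, {"activity": "approve"}],
    "c2": [{"activity": "register"}, {"activity": "reject"}],
}
print(dice_log(traces, {"register", "approve"}))
# {'c1': [{'activity': 'register'}, {'activity': 'approve'}],
#  'c2': [{'activity': 'register'}]}
```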
A pattern is defined as an abstraction from a concrete form that keeps recurring in specific non-arbitrary contexts. In Type-2 process mining tools, the analytic workflow is made
explicit; that is, the user can visualize and decide what elements to isolate or eliminate from the
event log. Topic models and sentiment analysis were the two major text mining techniques adopted
by researchers in the transportation infrastructure domain. On the other hand, the BoW model
involves representing a text document in terms of word frequency counts and developing an overall
frequency distribution of words in the document. Table 3 summarizes the most relevant characteristics of the surveyed works of clustering
techniques. Heterogeneity stems from operational processes that change over time to adapt to
changing circumstances. Similarly, to identify people’s perceptions, especially using text data from social media, sentiment analysis is comparatively the most feasible and least computationally demanding alternative. The events for a case are represented in the form of a trace,
i.e., a sequence of unique events. Corresponding risk control
strategies for supply chain member risk, system environment risk, coordinating role risk, and
structure risk were also developed. This research work has three main contributions: We present, for
the first time, a review of preprocessing techniques of event logs, also called data cleaning or data
preparation techniques in the context of process mining. Depending on the task need and problem
context, either stemming, lemmatization, or both can be applied. Essentially, this study generated
future scenarios which indicated emerging technologies’ early warning signs of potential social
impacts and their specific consequences to society. In today’s era of big data and cloud computing, huge amounts of text data are generated online. In the case of pattern-based preprocessing
techniques, they mainly use the raw event log to identify concrete forms that keep recurring in non-arbitrary contexts, with the timestamp attribute being the one most used by these techniques. It summarizes
the existing literature to explain the current state of the art in supply chain digitalization. Definition 4.
A Business process model is the graphical and analytic representation used to capture the behavior of
an organization’s business processes. The first step is to mine the source data with the financial
process mining (FPM) algorithm to obtain process instances represented as graphs. Specifically, we
collected papers since 2005 (period from which automatic algorithms for mining processes began to
be proposed, such as the alpha algorithm) using the following terms “refining, repairing, cleaning,
refinement, filtering, clustering, preprocessing, ordered, aligning, abstraction, anomalous detection,
infrequent behavior, noisy, imperfection, traces, event log, process mining” identified in their title or
abstract. Traditionally, these algorithms have been applied without taking into consideration the
availability of a process model. As such, this study inspirationally presents
an automatic word stemming system for Hausa language with a view to contributing to the field of
electronic text processing, as well as NLP, in general. Definition 2. A trace can be seen as a case, i.e.,
a finite sequence of events. In the following years, with the arrival of big data and the internet of
things, and the creation of huge event logs, it may be necessary to design new preprocessing
algorithms that deal with new challenges that have not, so far, been identified or solved.
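To make the BoW representation described earlier concrete, a minimal sketch using Python's collections.Counter is shown below; the two sample sentences are invented for the example.

```python
from collections import Counter
import re

documents = [
    "The crash report describes a rear-end crash on the highway.",
    "The maintenance report describes signal failure on the railway.",
]

def bag_of_words(text):
    """Lower-case the text, split it into word tokens, and count their frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

for doc in documents:
    print(bag_of_words(doc))
# e.g. Counter({'the': 2, 'crash': 2, 'report': 1, ...}) for the first document
```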
In addition, the conformance task between the event log and the model can be executed in considerable time, especially when there are large event logs, and an output result is always expected. An enhancement task, in turn, focuses on extending or improving an existing process model using information about the actual process recorded in an event log, and it makes use of preprocessing techniques to a lesser degree. Both tasks allow improving the quality of the
event log or the process model and the performance of some process mining techniques. 3.6.1. Event Abstraction: The majority of available process mining techniques assume that event data are captured
on the same level of granularity. To better
connect SC processes needs and what BDA offer, we present a structured review of academic
literature that addresses BDA methods in SCM using the supply chain operations reference (SCOR)
model. A total of 20 main topics were derived that included rail network, maritime transportation,
freight transportation planning, vehicle size and weight, and emission and fuel consumption. The
third step is to enumerate all possible paths representing sub-sequences of the mined process
instances. A total of 2527 provisions were prepared from two construction specifications of highway
projects with five national standards from three countries (Australia, the United Kingdom, and the
United States). In real scenarios, some process mining tasks work under the assumption that behavior
related to the execution of the running process is stored correctly within the event log, and that each
instance of the process stored in the event log is already finished. The problem is tackled by
identifying a concise set of promising candidates using an algorithm for computing the optimal repair
from the generated candidates, and a heuristic approximation by selecting repairs from the
candidates. There are other filters, such as the timeframe filter, which allows retaining or removing those cases
that are active in, contained in, started in, or ended in a particular period of time. In order to reduce the redundant
recoveries in regard to parallel routings, the authors use an algorithm that leverages trace replaying
to efficiently find a minimum recovery. Step 1—Tokenization: During tokenization, the text is
treated as a string that is later split into smaller pieces, or “tokens”. Some of the attributes used
during the preprocessing are: case ID, event label, timestamp, cost, resource, contextual information,
additional event payload, among others (see Figure 7). For example, color, food safety, smell,
flavor, promotions, deals, particular combinations of food and drinks, and the presence of foreign
particles were identified as key to sentiment development among consumers in a food supply chain.
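A minimal sketch of the tokenization step described above is given below. It relies on a simple regular expression; real pipelines would normally use a library tokenizer, which is an assumption on our part rather than something prescribed by the surveyed papers.

```python
import re

def tokenize(text):
    """Split raw text into word tokens, treating punctuation and whitespace as boundaries."""
    return re.findall(r"\w+", text.lower())

print(tokenize("During tokenization, the text is split into smaller pieces."))
# ['during', 'tokenization', 'the', 'text', 'is', 'split', 'into', 'smaller', 'pieces']
```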
Sometimes, the noise may be associated with the presence of rare events due to the handling of exceptional cases, incorrect recording of selected
tasks in the execution of the process, or even the incorrect assignment of timestamps.
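One common way to handle the kind of noise just described, i.e., rare behavior that appears only a handful of times, is to drop trace variants whose frequency falls below a threshold. The sketch below is a simplified illustration of that idea, not the method of any particular surveyed work; the threshold and the data layout are assumptions.

```python
from collections import Counter

def filter_rare_variants(traces, min_count=2):
    """Drop cases whose activity sequence (variant) occurs fewer than min_count times."""
    variants = Counter(tuple(e["activity"] for e in t) for t in traces.values())
    return {
        case_id: trace
        for case_id, trace in traces.items()
        if variants[tuple(e["activity"] for e in trace)] >= min_count
    }

traces = {
    "c1": [{"activity": "a"}, {"activity": "b"}],
    "c2": [{"activity": "a"}, {"activity": "b"}],
    "c3": [{"activity": "a"}, {"activity": "z"}],  # one-off, possibly noisy variant
}
print(list(filter_rare_variants(traces)))  # ['c1', 'c2']
```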
These findings
will also provide some implications of BDA on operations management and supply chain
management practices and strategies. Finally, Table 4 shows the two most frequent problems in event
logs, the presence of noise and the data diversity or granularity level. 3.3. C2. Tools: What tools are available for the event log preprocessing task? This is because an event
log can have different data cleaning requirements and a single technique could not address all
possible issues. These transportation infrastructure-related problems may involve issues such as crash or accident investigation, driving behavior analysis, and
construction activities. If this probability is lower than a given threshold, the activity is considered as an
outlier. The Probabilistic Linear Discriminant Analysis (PLDA) topic model was used for feature
selection on the semantic level from a railway signaling maintenance data set to reduce the data set
into a low-dimensional topic space.
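The probability-threshold idea mentioned above can be sketched as follows: directly-follows probabilities are estimated from the log itself, and events whose conditional probability falls below a threshold are flagged as outliers. Both the estimation approach and the threshold value are illustrative assumptions, not the exact method of the cited work.

```python
from collections import Counter, defaultdict

def directly_follows_probabilities(traces):
    """Estimate P(next activity | current activity) from the observed traces."""
    follows = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            follows[current][nxt] += 1
    return {
        a: {b: n / sum(counts.values()) for b, n in counts.items()}
        for a, counts in follows.items()
    }

def flag_outliers(traces, threshold=0.05):
    """Flag transitions whose probability, given their predecessor, is below the threshold."""
    probs = directly_follows_probabilities(traces)
    flagged = []
    for i, trace in enumerate(traces):
        for current, nxt in zip(trace, trace[1:]):
            if probs[current][nxt] < threshold:
                flagged.append((i, current, nxt))
    return flagged

traces = [["a", "b", "c"]] * 20 + [["a", "x", "c"]]
print(flag_outliers(traces))  # [(20, 'a', 'x')] — 'x' rarely follows 'a'
```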
The focus is given to fundamental methods of text mining, which include natural language processing and information extraction. We also study and present findings about how a preprocessing technique can improve a
process mining task. The events produced during the execution of a business process instance (a case) correspond to a trace. That work proposes a sequence-focused metric to evaluate supervised
event abstraction results that fits closely to the tasks of process discovery and conformance
checking. The vector space model is an algebraic
model that represents text as vectors. We present a quantitative and qualitative analysis of the most
popular techniques for event log preprocessing. Part-of-speech tagging assigns parts of speech to
each word of a given text based on context. In this study, we identified and classified the common
and relevant characteristics found in the surveyed papers. A large number of unknown objects can be
examined in an uncontrolled classification method. A review of articles related to the topics was done within
SCOPUS, the largest abstract and citation database of peer-reviewed literature. This tool shows the
results of each change in thresholds or method on the discovered process model and allows user
interaction. Essentially, the types of text mining techniques, innovations in the application of these
techniques, the type of data analyzed, and the scope of these applications are described in detail. The
WordNet lexical database and Suggested Upper Merged Ontology (SUMO) were used as text
mining techniques to identify the positive and negative factors and their potential impact on tourism
and transport needs. Figure 2 shows the distribution of the selected works (in %), from 2006 to
2020, based on the year of publication (grouped into three-year periods). The authors used a
backtracking idea to reduce the redundant sequences associated with parallel events. The optimal
alignment occurs when a trace in the log and an occurrence sequence in the model have the shortest
edit distance. Due to resource scarcity, not enough work has been conducted for Urdu. First,
although studies from outside the US were sought to be included, in the end, only a small sample of
international studies were included. These filters are especially useful when handling real-life logs
and they not only allow for projecting data in the log, but also for adding data to the log, removing process instances (cases), and removing and modifying events. Text classification by various machine learning mechanisms meets the challenge of the vectors' high dimensionality.
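The shortest-edit-distance notion of an optimal alignment referred to above can be illustrated with the standard Levenshtein dynamic program over two activity sequences; this is the generic textbook formulation, not the specific alignment algorithm of the cited work.

```python
def edit_distance(trace, model_sequence):
    """Levenshtein distance between a logged trace and a model occurrence sequence."""
    n, m = len(trace), len(model_sequence)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                      # deletions from the trace
    for j in range(m + 1):
        dp[0][j] = j                      # insertions from the model
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if trace[i - 1] == model_sequence[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete
                           dp[i][j - 1] + 1,        # insert
                           dp[i - 1][j - 1] + cost) # match / substitute
    return dp[n][m]

print(edit_distance(["register", "check", "pay"], ["register", "approve", "pay"]))  # 1
```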
This mapping, once obtained, can be applied to the unannotated traces in order to estimate the
corresponding high-level event for each low-level event. Works that do not include evaluation and
experimental results. Query details and results (number of retrieved papers) from the selected digital
libraries. On the one hand,
filtering techniques aim to determine the likelihood of the occurrence of events or traces based on their surrounding behavior. An event log with low quality (missing, erroneous
or noisy values, duplicates, etc.) can lead to a complex, unstructured (spaghetti-type), and difficult to
interpret model (as shown in Figure 1a); or a model that does not reflect the real behavior of the
business process.
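As a minimal sketch of the kind of quality checks implied above, i.e., detecting missing attribute values and duplicated events before discovery is attempted, the following code uses pandas; the column names are illustrative assumptions rather than a fixed schema from the reviewed tools.

```python
import pandas as pd

# A toy event table; the column names are illustrative assumptions.
log = pd.DataFrame({
    "case_id":   ["c1", "c1", "c1", "c2"],
    "activity":  ["register", "approve", "approve", None],
    "timestamp": ["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"],
})

missing = log[log.isna().any(axis=1)]           # events with missing attributes
duplicates = log[log.duplicated(keep="first")]  # exact duplicate event records
clean = log.dropna().drop_duplicates()

print(f"{len(missing)} events with missing values, {len(duplicates)} duplicate events")
print(clean)
```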
Essentially, the inflectional or enunciated form of a word is replaced by the
base form. Our review provides answers, in the broad sense, for three main questions: (1) how can
the different techniques of event log preprocessing be grouped? (2) What problems exist around achieving data quality in the event log? Such an example of word association analysis can be used to
construct a knowledge network using words as nodes and associations as edges. For example, due to
simple suffix rules, the tokens universal, university, and universe may be reduced to the stem univers,
which is not accurate. This is required to make a correct mapping between the model in execution and a clean event log, free of events, activities, or traces that are missing, noisy, or inconsistent. This
process typically has to cope with ambiguous word-tag mappings for complex text data, especially if
the data comes from scientific or any niche disciplines where jargon is used. Data volume is growing and sources of information are more diverse. The aim of this paper is to
give an overview of text mining in the context of its techniques, application domains, and the most challenging issue. This
approach repairs event data with inconsistent labeling but sound structure, using the minimum
change principle to preserve the original information as much as possible. Text mining needs proven techniques to be developed for it to be most effective.
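The universal/university/universe example given above can be reproduced with off-the-shelf tooling. The sketch below uses NLTK's Porter stemmer and WordNet lemmatizer as stand-ins; the reviewed papers do not prescribe a specific library, so this choice is an assumption.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
# Requires: pip install nltk, plus nltk.download("wordnet") for the lemmatizer data.

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

words = ["universal", "university", "universe", "studies", "better"]

for word in words:
    print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word))
# The suffix-stripping stemmer maps universal, university, and universe to the
# same stem "univers", which is exactly the over-stemming problem noted above;
# the lemmatizer instead returns dictionary base forms (e.g., studies -> study).
```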
These studies reflected the expansion and convergence of trends in the supply chain and logistics
field, and identified challenges and the potential opportunities to be utilized to make supply chain
and logistics systems more efficient. 4.3.3. Risk and Resilience Analysis: Three studies were
identified that focused on the application of text mining techniques in risk management and
developed data-driven approaches related to supply chain resilience analysis. Second, the research
steps adopted in this study are described in detail. This paper therefore
intends to explore these premises. In conclusion, this paper attempts to describe in detail the recent
increase in interest and progress made in Urdu language processing research. The internal
construction of enterprises could be reflected via talent competition, technological innovation, and
the optimization of management. 4.4.4. Others: One study was found that could not be categorized
into any of the preceding categories. The
goal was to recommend travel routes for new travelers. The table also shows the main problems identified in the event log, such as missing data. The challenging issue in text mining caused by the complexity of natural
language is also addressed in this paper. Many of the tools that contain preprocessing techniques are
limited to interacting with the user to make a better decision when including, isolating, or eliminating
any event or trace. 3.4. C3. Representation Schemes of Event Logs Used in Preprocessing Techniques: What structures are more appropriate to represent and manipulate event logs in preprocessing techniques?
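One simple structure often used to represent and manipulate event logs is a prefix tree over the traces; the sketch below builds such a tree with plain dictionaries. It is a didactic simplification, not the internal representation of any particular tool mentioned in this review.

```python
def build_prefix_tree(traces):
    """Build a prefix tree (trie) of the traces: nested dicts keyed by activity,
    with '_count' recording how many traces pass through each node."""
    root = {"_count": 0}
    for trace in traces:
        node = root
        node["_count"] += 1
        for activity in trace:
            node = node.setdefault(activity, {"_count": 0})
            node["_count"] += 1
    return root

traces = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "approve"],
]
tree = build_prefix_tree(traces)
print(tree["register"]["_count"])           # 3 traces start with "register"
print(tree["register"]["check"]["_count"])  # 2 of them continue with "check"
```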
