Data Mining To Discovery of Knowledge

10 VI June 2022
https://doi.org/10.22214/ijraset.2022.43643
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue VI June 2022- Available at www.ijraset.com
Data Mining to Discovery of Knowledge

Aman Gupta1, Ashish Jadhav2
ASM
I. INTRODUCTION
Data mining and discovery of knowledge have been attracting a significant amount of research, industry, and media attention of
late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and
knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and
databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-
world applications of knowledge discovery, and current and future research directions in the field.
At an abstract level, the DISCOVERY OF KNOWLEDGE field is concerned with the development of methods and techniques
for making sense of data. The basic problem addressed by the DISCOVERY OF KNOWLEDGE process is one of mapping
low-level data (which are typically too voluminous to understand and digest easily) into other forms that might be more
compact (for example, a short report), more abstract (for example, a descriptive approximation or model of the process that
generated the data), or more useful (for example, a predictive model for estimating the value of future cases). At the core of the
process is the application of specific data-mining methods for pattern discovery and extraction.
By discussing the historical context of DISCOVERY OF KNOWLEDGE and data mining and their intersection with other
related fields. A brief summary of recent DISCOVERY OF KNOWLEDGE real-world applications is provided. Definitions of
DISCOVERY OF KNOWLEDGE and data mining are provided, and the general multistep DISCOVERY OF KNOWLEDGE
process is outlined. This multistep process has the application of data-mining algorithms as one particular step in the process.
The data-mining step is discussed in more detail in the context of specific data-mining algorithms and their application. Real-
world practical application issues are also outlined. Finally, the article enumerates challenges for future research and
development and in particular discusses potential opportunities for AI technology in DISCOVERY OF KNOWLEDGE systems.
II. WHY DO WE NEED DISCOVERY OF KNOWLEDGE?

The traditional method of turning data into knowledge relies on manual analysis and interpretation. For example, in the health-
care industry, it is common for specialists to periodically analyze current trends and changes in health-care data, say, on a
quarterly basis. The specialists then provide a report detailing the analysis to the sponsoring health-care organization; this report
becomes the basis for future decision making and planning for health-care management. In a totally different type of
application, planetary geologists sift through remotely sensed images of planets and asteroids, carefully locating and cataloguing
such geologic objects of interest as impact craters. Be it science, marketing, finance, health care, retail, or any other field, the
classical approach to data analysis relies fundamentally on one or more analysts becoming intimately familiar with the data and
serving as an interface between the data and the users and products.
The need to scale up human analysis capabilities to handling the large number of bytes that we can collect is both economic and
scientific. Businesses use data to gain competitive advantage, increase efficiency, and provide more valuable services to
customers. Data we capture about our environment are the basic evidence we use to build theories and models of the universe
we live in. Because computers have enabled humans to gather more data than we can digest, it is only natural to turn to
computational techniques to help us unearth meaningful patternsand structures from the massive volumes of data. Hence,
DISCOVERY OF KNOWLEDGE is an attempt to address a problem that the digital information era made a fact of life for all
of us: data overload.
III. DATA MINING AND DISCOVERY TO KNOWLEDGE

Historically, the notion of finding useful patterns in data has been given a variety of names, including data mining, knowledge
extraction, information discovery, information harvesting, data archaeology, and data pattern processing. The term data mining
has mostly been used by statisticians, data analysts, and the management information systems (MIS) communities. It has also
gained popularity in the database field. The phrase knowledge discovery in databases was coined at the first DISCOVERY OF
KNOWLEDGE workshop in 1989 (Piatetsky-Shapiro 1991) to emphasize that knowledge is the end product of a data-driven
discovery. It has been popularized in the AI and machine-learning fields.
In our view, DISCOVERY OF KNOWLEDGE refers to the overall process of discovering useful knowledge from data, and
data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting
patterns from data. The distinction between the DISCOVERY OF KNOWLEDGE process and the data-mining step (within the
process) is a central point of this article. The additional steps in the DISCOVERY OF KNOWLEDGE process, such as data
preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results
of mining, are essential to ensure that useful knowledge is derived from the data. Blind application of data-mining methods
(rightly criticized as data dredging in the statistical literature) can be a dangerous activity, easily leading to the discovery of
meaningless and invalid patterns. The Interdisciplinary Nature of DISCOVERY OF KNOWLEDGE DISCOVERY OF
KNOWLEDGE has evolved, and continues to evolve, from the intersection of research fields such as machine learning, pattern
recognition, databases, statistics, AI, knowledge acquisition for expert systems, data visualization, and high-performance
computing. The unifying goal is extracting high-level knowledge from Rlow-level data in the context of large data sets.
IV. THE DISCOVERY OF KNOWLEDGE PROCESS

The DISCOVERY OF KNOWLEDGE process is interactive and iterative,involving numerous steps with many decisions made
by the user. Brachman and Anand (1996) give a practical view of the DISCOVERY OF KNOWLEDGE process, emphasizing
the interactive nature of the process. Here, we broadly outline some of its basic steps: First is developing an understanding of
the application domain and the relevant prior knowledge and identifying the goal of the DISCOVERY OF KNOWLEDGE
process from the customer’s viewpoint. Second is creating a target data set: selecting a data set, or focusing on a subset of
variables or data samples, on which discovery is to be performed. Third is data cleaning and preprocessing. Basic operations
include removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies
for handling missing data fields, and accounting for time-sequence information and known changes. Fourth is data reduction
and projection: finding useful features to represent the data depending on the goal of the task. With dimensionality reduction or
transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for
the data can be found.
V. APPLICATIONS
The investigation of distinguishing local area structures in multi-facet networks is encountering a bloom somewhat recently.
Important explores cover different angles among our everyday existence, for example, breaking down persuasive clients in
numerous social stages (Al-Garadi et al. 2018), tracking down association of proteins in a natural framework (Gosak et al. 2018)
and overseeing metropolitan transportation framework with different traffic habits (Liu et al. 2019), and so on. The
accompanying subsections summed up uses of local area location by means of a multi-facet network system.
VI. MAIN CONTRIBUTIONS

There have been various endeavors to address local area identification issue in multi-facet networks through different
methodologies, e.g., distinguishing networks in worldly organizations by seclusion expansion (Bazzi et al. 2016), where the
creators underscore the contrast between "invalid organizations" and "invalid models" in particularity expansion and examine
the impact of interlayer edges on the multi-facet seclusion boost issue. De Bacco et al. (2017) propose a generative model for
multi-facet organizations, which can be utilized to total layers into bunches or to pack a dataset by recognizing particularly
pertinent or repetitive layers. The proposed model is fit for integrating local area discovery and connection expectation for
multi-facet organizations, and trial results on both engineered and genuine world datasets shows its plausibility. Breaking down
multi-facet networks is vital on the grounds that many intriguing examples can't be gotten by investigating single-layer
organizations. That is our inspiration for summing up these methodologies. The commitments of this work are:
A. We construct a scientific classification of local area discovery strategies in view of different procedures utilized.
B. We give a nitty gritty review of works that go under various classifications.
C. The assessment measures for local area results are ordered and summed up.
D. The uses of local area discovery in multi-facet networks are presented, as well as fascinating headings for future works.
Supposedly, this is the most recent work that gives a complete overview on different local area discovery strategies in multi-
facet organizations.
VII. CONCLUSION
Here are numerous Data Mining techniques from which one can decided for mining the arising clinical information bases and
vaults. In this part, we have assessed most famous ones, furthermore, gave a few pointers where they have been applied.
Notwithstanding the potential and promising approaches, the utility of Data Mining techniques to dissect clinical informational
indexes is as yet meager, particularly when contrasted with traditional measurable methodologies. It is making progress, be that
as it may, in the areas where information is went with information bases, and where information storehouses
putting away heterogenous information from various sources took ground.
REFERENCES
[1] Bazzan, A. L., Engel, P. M., Schroeder, L. F., and da Silva, S. C. (2002). Automated annotation of keywords for proteins related to mycoplasmataceae
using machine learning techniques. Bioinformatics, 18:35S–43S.
[2] Frank, E., Holmes, G., Kirkby, R., and Hall, M. (2002). Racing committees for large datasets. In Proceedings of the International Conference on
Discovery Science, pages 153–164. Springer-Verlag.
[3] Frank, E., Paynter, G. W., Witten, I. H., Gutwin, C., and Nevill-Manning, C. G. (1999). Domain-specifific keyphrase extraction. In Proceedings of the
16th International Joint Conference on Artifificial Intelligence, pages 668–673. Morgan Kaufmann.
[4] Holmes, G., Cunningham, S. J., Rue, B. D., and Bollen, F. (1998). Predicting apple bruising using machine learning. Acta Hort, 476:289–296.
[5] Holmes, G. and Hall, M. (2002). A development environment for predictive modelling in foods. International Journal of Food Microbiology, 73:351–362.
[6] Holmes, G., Kirkby, R., and Pfahringer, B. (2003). Mining data streams using option trees. Technical Report 08/03, Department of Computer Science,
University of Waikato.
[7] Kusabs, N., Bollen, F., Trigg, L., Holmes, G., and Inglis, S. (1998). Objective measurement of mushroom quality. In Proc New Zealand Institute of
Agricultural Science and the New Zealand Society for Horticultural Science Annual Convention, page 51.
[8] Li, J., Liu, H., Downing, J. R., Yeoh, A. E.-J., and Wong, L. (2003). Simple rules underlying gene expression profifiles of more than six subtypes of acute
lymphoblastic leukemia (all) patients. Bioinformatics, 19:71–78.
[9] McQueen, R., Holmes, G., and Hunt, L. (1998). User satisfaction with machine learning as a data analysis method in agricultural research. New Zealand
Journal of Agricultural Research, 41(4):577–584.

Data Mining To Discovery of Knowledge

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Data Mining To Discovery of Knowledge

Uploaded by

Copyright:

Available Formats

10 VI June 2022

Data Mining to Discovery of Knowledge

II. WHY DO WE NEED DISCOVERY OF KNOWLEDGE?

III. DATA MINING AND DISCOVERY TO KNOWLEDGE

IV. THE DISCOVERY OF KNOWLEDGE PROCESS

VI. MAIN CONTRIBUTIONS

You might also like