You are on page 1of 5

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/50946165

A Review on Data mining from Past to the Future

Article  in  International Journal of Computer Applications · February 2011


DOI: 10.5120/1961-2623 · Source: DOAJ

CITATIONS READS

62 5,266

2 authors:

Venkatadri Marriboyina Lokanatha C Reddy


Amity University, Madhya Pradesh. Dravidian University
18 PUBLICATIONS   115 CITATIONS    28 PUBLICATIONS   308 CITATIONS   

SEE PROFILE SEE PROFILE

Some of the authors of this publication are also working on these related projects:

Pattern Analysis of Human Bio-field Using Image Processing View project

PhD work on High speed Neworks View project

All content following this page was uploaded by Lokanatha C Reddy on 17 May 2014.

The user has requested enhancement of the downloaded file.


International Journal of Computer Applications (0975 – 8887)
Volume 15– No.7, February 2011

A Review on Data mining from Past to the Future


Venkatadri.M Dr. Lokanatha C. Reddy
Research Scholar, Professor,
Dept. of Computer Science, Dept. of Computer Science,
Dravidian University, India. Dravidian University, India.

ABSTRACT presents the comparative statement of data mining trends and


Data and Information or Knowledge has a significant role on finally conclusion follows.
human activities. Data mining is the knowledge discovery
process by analyzing the large volumes of data from various
2. HISTORICAL TRENDS OF DATA
perspectives and summarizing it into useful information. Due MINING
to the importance of extracting knowledge/information from The building blocks of data mining is the evolution of a field
the large data repositories, data mining has become an with the confluences of various disciplines, which includes
essential component in various fields of human life. database management systems(DBMS), Statistics, Artificial
Advancements in Statistics, Machine Learning, Artificial Intelligence(AI), and Machine Learning(ML). The era of data
Intelligence, Pattern Recognition and Computation mining applications was conceived in the year1980 primarily
capabilities have evolved the present day’s data mining by research-driven tools focused on single tasks [3]. The early
applications and these applications have enriched the various day’s data mining trends are as under.
fields of human life including business, education, medical,
scientific etc. Hence, this paper discusses the various 2.1 Data Trends
improvements in the field of data mining from past to the In initial days, data mining algorithms work best for
present and explores the future trends. numerical data collected from a single data base, and various
data mining techniques have evolved for flat files, traditional
Keywords and relational databases where the data is stored in tabular
Knowledge Discovery in Databases, Data Mining, Historical representation. Later on, with the confluence of Statistics and
Trends, Heterogeneous Data, Current Trends, Future Trends. Machine Learning techniques, various algorithms evolved to
mine the non numerical data and relational databases.
1. INTRODUCTION
The advent of information technology in various fields of 2.2 Computing Trends
human life has lead to the large volumes of data storage in The field of data mining has been greatly influenced by the
various formats like records, documents, images, sound development of fourth generation programming languages and
recordings, videos, scientific data, and many new data various related computing techniques. In, early days of data
formats. The data collected from different applications mining most of the algorithms employed only statistical
require proper mechanism of extracting knowledge techniques. Later on they evolved with various computing
/information from large repositories for better decision techniques like AI, ML and Pattern Reorganization. Various
making. Knowledge discovery in databases (KDD), often data mining techniques (Induction, Compression and
called data mining, aims at the discovery of useful Approximation) and algorithms developed to mine the large
information from large collections of data[1]. The core volumes of heterogeneous data stored in the data warehouses.
functionalities of data mining are applying various methods
and algorithms in order to discover and extract patterns of 3. CURRENT TRENDS
stored data [2]. From the last two decades data mining and The field of data mining has been growing due to its
knowledge discovery applications have got a rich focus due to enormous success in terms of broad-ranging application
its significance in decision making and it has become an achievements and scientific progress, understanding. Various
essential component in various organizations. The field of data mining applications have been successfully implemented
data mining have been prospered and posed into new areas of in various domains like health care, finance, retail,
human life with various integrations and advancements in the telecommunication, fraud detection and risk analysis...etc...
fields of Statistics, Databases, Machine Learning, Pattern The ever increasing complexities in various fields and
Reorganization, Artificial Intelligence and Computation improvements in technology have posed new challenges to
capabilities etc. The various application areas of data mining data mining; the various challenges include different data
are Life Sciences (LS), Customer Relationship Management formats, data from disparate locations, advances in
(CRM), Web Applications, Manufacturing, Competitive computation and networking resources, research and scientific
Intelligence, Retail/Finance/Banking, fields, ever growing business challenges etc. Advancements in
Computer/Network/Security, Monitoring/Surveillance, data mining with various integrations and implications of
Teaching Support, Climate modeling, Astronomy, and methods and techniques have shaped the present data mining
Behavioral Ecology etc. All most every field of human life applications to handle the various challenges, the current
has become data-intensive, which made the data mining as an trends of data mining applications are
essential component. Hence, this paper reviews the various
trends of data mining and its relative areas from past to 3.1 Mining the Heterogeneous data
present and explores the future areas of it. This paper is The following table depicts various currently employed data
organized as follows section 2 presents historical perspectives mining techniques and algorithms to mine the various data
of data mining section 3 presents current trends in data mining formats in different application areas. The various data mining
section 4 presents future trends of data mining Section 5 areas are explained after the table1.

19
International Journal of Computer Applications (0975 – 8887)
Volume 15– No.7, February 2011

Table 1: Current Data Mining areas and techniques to mine the various Data Formats
Data mining Application Areas Data Formats Data mining Techniques/Algorithms
type
Hypermedia Internet and Intranet Hyper Text
Classification and Clustering Techniques
data mining Applications. Data
Ubiquitous data Applications of Mobile
Ubiquitous Traditional data mining techniques drawn from the
mining phones, PDA, Digital
Data Statistics and Machine Learning
Cam etc.
Multimedia data Audio/Video Multimedia
Rule based decision tree classification algorithms
mining Applications Data
Spatial Data Network, Remote
mining Sensing and GIS Spatial Data Spatial Clustering Techniques, Spatial OLAP
applications.
Time series Business and Financial Time series
Rule Induction algorithms.
Data mining applications. Data

3.1.1 Hypertext / Hypermedia data mining 3.2 Utilizing the Computing and
The hypertext and hypermedia data is a collection of data Networking Resources
from online catalogues, digital libraries, and online Data mining has been prospered by utilizing the advanced
information data bases which include hyperlinks, text computing and networking resources like Parallel, Distributed
markups and other forms of data. Web mining is the and Grid technologies. Parallel data mining applications have
application of data mining to discover the patterns from the evolved using the Parallel computing, typical parallel data
Web. The important data mining technique used for hypertext mining applications employ the Apriori algorithm [12].
and hypermedia data are Classification (supervised learning), Parallel computing and distributed data mining are both
Clustering (unsupervised learning). integrated in Grid technologies [13]. Grid based Support
Vector Machine method is used in distributed data mining
3.1.2 Ubiquitous data mining [14]. Recently, various soft computing methodologies have
The advent of laptops, palmtops, cell phones, and wearable been applied in data mining such as fuzzy logic, rough set,
computer devices with increasing computational capacity and neural networks, evolutionary computing (Genetic Algorithms
proliferation of all these devices is leading to the emergence and Genetic Programming), and support vector machines to
of ubiquitous computing paradigm [4]. The Ubiquitous analyze various formats of data stored in distributed databases
computing environments are subsequently giving rise to a results in a more intelligent and robust system providing a
new class of applications termed Ubiquitous Data Mining human-interpretable, low cost, approximate solution, as
(UDM). UDM is the process of analysis of data for extracting compared to traditional techniques [15] for systematic
useful knowledge from the data of ubiquitous computing [5]. analysis, a robust preprocessing system, flexible information
Traditional data mining techniques that are drawn from the processing, data analysis and decision making.
combination of ML and Statistics are presently employed in
ubiquitous data mining [6]. 3.3 Research and Scientific Computing
3.1.3 Multimedia data mining Trends
The explosion in the amount data from many scientific
The multimedia data includes images, video, audio, and
disciplines, such as astronomy, remote sensing, bio-
animation. The data mining techniques that are applied on
informatics, combinatorial chemistry, medical imaging, and
multimedia data are rule based decision tree classification
experimental physics are tuning to various data mining
algorithms like Artificial Neural Networks, Instance-based
techniques, to find out useful information. The Direct-kernel
learning algorithms, Support Vector Machines, also
based techniques are powerful data mining tool for
association rule mining, clustering methods [7].
predictive modeling, feature selection and visualization in
3.1.4 Spatial data mining scientific computing [16].
The spatial data includes astronomical data, satellite data and
space craft data. Some of the data mining techniques and data
3.4 Business Trends
structures which are used when analyzing spatial and related Today’s business must be more profitable, react quicker and
types of data include the use of spatial warehouses, spatial offer high quality services that ever before. With these types
data cubes, spatial OLAP, and spatial clustering methods [8]. of expectations and constraints, data mining becomes a
fundamental technology in enabling customer’s transactions
more accurately. Data mining techniques of classification,
3.1.5 Time series data mining regression, and cluster analysis are used for in current
A time series is a sequence of data points, measured typically business trends [17]. Most of the current business data mining
at successive times spaced at uniform time intervals. Typical applications utilize the classification and prediction
examples include stock prices, currency exchange rates, the techniques for supporting business decisions. In business
volume of product sales, biomedical measurements, weather environment data mining has evolved to Decision Support
data, etc, collected over monotonically increasing time. Rule Systems (DSS) and very recently it has grown to Business
induction algorithms such as Version Space [9], AQ15 [10], Intelligence (BI) systems.
C4.5 rules [11] are presently employed in Time series data
mining applications. 4. FUTURE TRENDS
Due to the enormous success of various application areas of
data mining, the field of data mining has been establishing
itself as the major discipline of computer science and has
shown interest potential for the future developments. Ever

20
International Journal of Computer Applications (0975 – 8887)
Volume 15– No.7, February 2011

increasing technology and future application areas are always data mining has to focus seriously to address the data privacy,
poses new challenges and opportunities for data mining, the security and governance. Cloud computing is a great area to
typical future trends of data mining includes be focused by data mining, as the Cloud computing is
penetrating more and more in all ranges of business and
Standardization of data mining languages scientific computing. Data mining techniques and applications
Data preprocessing are very much needed in cloud computing paradigm.
Complex objects of data
Computing resources 4.5 Web mining
Web mining The development of World Wide Web and its usage grows, it
Scientific Computing will continue to generate ever more content, structure, and
Business data usage data and the value of Web mining will keep increasing.
Research needs to be done in developing the right set of Web
4.1 Standardization of data mining metrics, and their measurement procedures, extracting process
languages models from usage data, understanding how different parts of
There are various data mining tools with different syntaxes, the process model impact various Web metrics of interest,
hence it is to be standardized for making convenient of the how the process models change in response to various
users. Data mining applications has to concentrate more in changes that are made-changing stimuli to the user,
standardization of interaction languages and flexible user developing Web mining techniques to improve various other
interactions. aspects of Web services, techniques to recognize known
frauds and intrusion detection.
4.2 Data Preprocessing
To identify useful novel patterns in distributed, large, complex 4.6 Scientific Computing
and temporal data, data mining techniques has to evolve in In recent years data mining has attracted the research in
various stages. The present techniques and algorithms of data various scientific computing applications, due to its efficient
preprocessing stage are not up to the mark compared with its analysis of data, discovering meaningful new correlations,
significance in finding out the novel patterns of data. In future patterns and trends with the help of various tools and
there is a great need of data mining applications with efficient techniques. More research has to be done in mining of
data preprocessing techniques. scientific data in particular approaches for mining
astronomical, biological, chemical, and fluid dynamical data
4.3 Complex object of data analysis. The ubiquitous use of embedded systems in sensing
Data mining is going to penetrate in all fields of human life, and actuation environments plays major impending
the presently available data mining techniques are restricted to developments in scientific computing will require a new class
mine the traditional forms of data only, and in future there is a of techniques capable of dynamic data analysis in faulty,
potentiality for data mining techniques for complex data distributed framework. The research in data mining requires
objects like high dimensional, high speed data streams, more attention in ecological and environmental information
sequence, noise in the time series, graph, Multi-instance analysis to utilize our natural environment and resources.
objects, Multi-represented objects and temporal data. Significant data mining research has to be done in molecular
biology problems.
4.4 Computing Resources
The contemporary developments in high speed connectivity, 4.7 Business Trends
parallel, distributed, grid and cloud computing has posed new Business data mining needs more enhancement in the design
challenges for data mining. The high speed internet of data mining techniques to gain significant advantages in
connectivity has posed a great demand for novel and efficient today’s competitive global market place (E-Business). The
data mining techniques to analyze the massive data which is Data mining techniques hold great promises for developing
captured of IP packets at high link speeds in order to detect new sets of tools that can be used to provide more privacy for
the Denial of Service (DoS) and other types of attacks. a common man, increasing customer satisfaction, providing
Distributed data mining applications demand new alternatives best, safe and useful products at reasonable and economical
in different fields, such as discovery of universal strategy to prices, in today’s E-Business environment.
configure a distributed data mining, data placement at
different locations, scheduling, resource management, and 5. COMPARATIVE STATEMENT
transactional systems etc. New data mining techniques and The following table presents the comparative statement of
tools are needed to facilitate seamless integration of various various data mining trends from past to the future.
resources in grid based environment. Moreover, grid based

Table 2: Data mining Trends Comparative Statement


Data mining Algorithms/ Computing Prime areas of
Data formats
trends Techniques employed Resources applications
Numerical data and Evolution of 4G PL
Statistical, Machine
Past structured data stored in and various related Business
Learning Techniques
traditional databases techniques
Statistical, Machine High speed networks,
Heterogeneous data formats
Learning, Artificial High end storage Business, Web,
includes structured, semi-
Current Intelligence, Pattern devices and Parallel, Medical diagnosis
structured and unstructured
Reorganization Distributed etc…
data
Techniques computing etc…

21
International Journal of Computer Applications (0975 – 8887)
Volume 15– No.7, February 2011

Complex data objects Business, Web,


includes high dimensional, Medical diagnosis,
Soft Computing
high speed data streams, Scientific and
techniques like Fuzzy Multi-agent
sequence, noise in the time Research analysis
Future logic, Neural Networks technologies and
series, graph, Multi- fields (bio, remote
and Genetic Cloud Computing
instance objects, Multi- sensing etc…),
Programming
represented objects and Social networking
temporal data etc… etc…

6. CONCLUSION
In this paper we briefly reviewed the various data mining
trends from its inception to the future. This review would be
helpful to researchers to focus on the various issues of data
mining. In future course, we will review the various [12] Z. K. Baker and V. K.Prasanna. 2005. Efficient Parallel
classification algorithms and significance of evolutionary Data Mining with the Apriori Algorithm on FPGAs. In
computing (genetic programming) approach in designing of Submitted to the IEEE International Parallel and
efficient classification algorithms for data mining. Distributed Processing Symposium (IPDPS ’05).
[13] Jing He.2009. Advances in Data Mining: History and
7. REFERENCES Future, Third international Symposium on Information
[1] Heikki, Mannila. 1996. Data mining: machine learning, Technology Application, 978-0-7695-3859-4/09 IEEE
statistics, and databases, IEEE 2009 DOI 10.1109/IITA.2009.204
[2] Fayadd, U., Piatesky -Shapiro, G., and Smyth, P. 1996. [14] Ali Meligy.2009. A Grid-Based Distributed SVM Data
From Data Mining To Knowledge Discovery in Mining Algorithm, European Journal of Scientific
Databases, AAAI Press / The MIT Press, Research ISSN 1450-216X Vol.27 No.3. Pp.313-321 ©
Massachusetts Institute Of Technology. ISBN 0–262 Euro Journals Publishing, Inc. Available at :
56097–6 Fayap. http://www.eurojournals.com/ejsr.htm
[3] Piatetsky-Shapiro, Gregory. 2000. The Data-Mining [15] S. Mitra, S. K. Pal, and P. Mitra. 2001. Data mining in
Industry Coming of Age. IEEE Intelligent Systems. soft computing framework: A survey, IEEE Trans.
[4] Salmin, Sultana et al. 2009. Ubiquitous Secretary: A Neural Networks, vol. 13, pp. 3 - 14.
Ubiquitous Computing Application Based on Web [16] Mark, J., Embrechts. 2005. Introduction to Scientific
Services Architecture , International Journal of Data Mining: Direct Kernel Methods & Applications ,
Multimedia and Ubiquitous Engineering Vol. 4, No. 4, Computationally Intelligent Hybrid Systems: The Fusion
October, 2009 of Soft Computing and Hard Computing , Wiley , New
[5] Hsu, J. 2002. Data Mining Trends and Developments: York, pp. 317-365
The Key Data Mining Technologies and Applications [17] Han, J., & Kamber, M. 2001. Data mining: Concepts and
for the 21st Century, The Proceedings of the 19th techniques .Morgan-Kaufman Series of Data
Annual Conference for Information Systems Educators Management Systems. San Diego: Academic Press.
(ISECON 2002), ISSN: 1542-7382. Available
Online: http://colton.byuh.edu/isecon/2002/224b/Hsu.pdf
[6] Shonali Krishnaswamy. 2005. Towards Situation- AUTHORS PROFILE
awareness and Ubiquitous Data Mining for Road Safety: Venkatadri. M received his Masters Degree in Computer Science
Rationale and Architecture for a Compelling Application
and Engineering from Acharya Nagarjuna University, India. He is
(2005), Proceedings of Conference on Intelligent
Vehicles and Road Infrastructure 2005, pages-16, 17. presently pursuing his Ph.D in Computer Science, Dravidian
Available at : University, India. His area of interest includes databases, data
http://www.csse.monash.edu.au/~mgaber/CameraReadyI warehousing and mining, and artificial intelligence.
[7] Kotsiantis, S., Kanellopoulos, D., Pintelas, P. 2004.
Multimedia mining. WSEAS Transactions on Systems, Dr. Lokanatha C. Reddy earned M.Sc. (Maths) from Indian
No 3, s. 3263-3268. Institute of Technology, New Delhi; M.Tech (CS) with Honours
[8] Abdulvahit, Torun. , Ebnem, Düzgün. 2006. Using from Indian Statistical Institute, Kolkata; and Ph.D (CS) from Sri
spatial data mining techniques to reveal vulnerability of Krishnadevaraya University, Anantapur. Earlier worked at
people and places due to oil transportation and accidents: KSRM College of Engineering, Kadapa; Indian Space Research
A case study of Istanbul strait, ISPRS Technical Organization (ISAC) at Bangalore and as the Head of the
Commission II Symposium, Vienna. Addison Wesley, Computer Centre at the Sri Krishnadevaraya University, India;
1st edition.
Presently, he is the Professor of Computer Science at the
[9] T. M. Mitchell. 1982. Generalization as Search, Artificial Dravidian University, India. His active research interests include
Intelligence, 18(2), 1982, pp.203-226.
Real time Computation, Distributed Computation, Device
[10] R. Michalski., I. Mozetic., J. Hong., and N. Lavrac.
Drivers, Geometric Designs and Shapes, Digital Image
1986. The AQ15 Inductive Leaning System: An
Overview and Experiments, Reports of Machine Processing, Pattern Recognition and Networks.
Leaning and Inference Laboratory, MLI-86-6, George
Maseon University.
[11] J. R. Quinlan.1992.Programs for Machine Learning,
Morgan Kaufmann.

22

View publication stats

You might also like