
APPLICATIONS AND TRENDS IN DATA MINING

1. Data Mining Applications


Data mining is an interdisciplinary field with wide and diverse applications.
 There are still nontrivial gaps between data mining principles and domain-specific
applications.
 Data mining is widely used in different areas.
 A number of commercial data mining systems are available today, yet there are
still many challenges in this field.

Data Mining Applications:


I. Financial data analysis
II. Retail industry
III. Telecommunication industry
IV. Biological data analysis
V. Other Scientific Applications
VI. Intrusion Detection

1.1 Data mining for Financial Data Analysis:


The financial data in the banking and financial industry are generally reliable and of
high quality, which facilitates systematic data analysis and data mining. Some of the
typical cases are as follows:
 Design and construction of data warehouses for multidimensional data analysis and
data mining:
Multidimensional data analysis methods should be used to analyze the general
properties of such data. For example,
 View the debit and revenue changes by month, by region, by sector, and
by other factors
 Access statistical information such as max, min, total, average, trend, etc.
Data warehouses, data cubes, multifeature and discovery-driven data cubes,
characterization and class comparisons, and outlier analysis all play important
roles in financial data analysis and mining.
 Loan payment prediction and customer credit policy analysis:
 Data mining methods, such as attribute selection and attribute relevance
ranking, may help identify important factors and eliminate irrelevant ones.
 For example, factors related to the risk of loan payments include loan-to-value
ratio, term of the loan, and debt ratio (total amount of monthly debt versus the total
monthly income).
 Loan payment performance
 Consumer credit rating
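Attribute relevance ranking can be illustrated with information gain, one common relevance measure (the text does not prescribe a specific one). The minimal sketch below ranks the attributes of a small loan table; the attribute names, their discretized values, and the records are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Expected reduction in entropy of `target` after partitioning rows on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def rank_attributes(rows, attrs, target):
    """Return (attribute, gain) pairs, most relevant first."""
    scores = [(a, information_gain(rows, a, target)) for a in attrs]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical, already-discretized loan records.
loans = [
    {"debt_ratio": "high", "term": "long",  "region": "north", "repaid": "no"},
    {"debt_ratio": "high", "term": "short", "region": "south", "repaid": "no"},
    {"debt_ratio": "low",  "term": "long",  "region": "north", "repaid": "yes"},
    {"debt_ratio": "low",  "term": "short", "region": "south", "repaid": "yes"},
    {"debt_ratio": "low",  "term": "long",  "region": "south", "repaid": "yes"},
    {"debt_ratio": "high", "term": "short", "region": "north", "repaid": "no"},
]

for attr, gain in rank_attributes(loans, ["debt_ratio", "term", "region"], "repaid"):
    print(f"{attr}: {gain:.3f}")
```

In this toy table `debt_ratio` separates repaid and unpaid loans perfectly, so it ranks first, while `term` and `region` carry little information and would be candidates for elimination.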
 Classification and Clustering of customers for targeted marketing:
 Classification and clustering methods can be used for customer group
identification and targeted marketing. For example, we can use classification
to identify the most crucial factors that may influence a customer’s decision
regarding banking.
 Multidimensional segmentation by nearest-neighbor, classification, decision
trees, etc. to identify customer groups or associate a new customer to an
appropriate customer group.
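Associating a new customer with an existing customer group by nearest neighbor can be sketched as follows. The features (age, annual income in thousands, number of products held), the group labels, k = 3, and the use of Euclidean distance after min-max scaling are illustrative assumptions, not a prescribed method.

```python
import math
from collections import Counter

def scale(point, bounds):
    """Min-max scale one point using per-feature (min, max) bounds."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(point, bounds)]

def assign_group(new_customer, customers, groups, k=3):
    """Return the majority group among the k customers nearest to `new_customer`."""
    bounds = [(min(col), max(col)) for col in zip(*customers)]
    scaled = [scale(c, bounds) for c in customers]
    query = scale(new_customer, bounds)
    # Sort existing customers by distance to the scaled query point.
    nearest = sorted(range(len(scaled)), key=lambda i: math.dist(query, scaled[i]))[:k]
    return Counter(groups[i] for i in nearest).most_common(1)[0][0]

# Hypothetical customers: [age, annual income in thousands, number of products held]
customers = [[25, 30, 1], [28, 35, 2], [45, 120, 5], [50, 110, 4], [33, 60, 3], [38, 65, 3]]
groups = ["young_saver", "young_saver", "wealthy", "wealthy", "middle", "middle"]

print(assign_group([47, 115, 4], customers, groups))  # prints wealthy
```

Scaling first keeps income, which has the largest numeric range, from dominating the distance.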
 Detection of money laundering and other financial crimes:
 To detect money laundering and other financial crimes, it is important to
integrate information from multiple databases (like bank transaction databases,
and federal or state crime history databases).
 Multiple data analysis tools can then be used to detect unusual patterns, such
as large amounts of cash flow at certain periods, by certain groups of
customers.
 Tools:
 Data visualization: to display transaction activities using graphs
by time and by groups of customers
 Linkage analysis: to identify links among different customers
and activities
 Classification: to filter unrelated attributes and rank the highly
related ones
 Clustering tools: to group different cases
 Outlier analysis: to detect unusual amounts of fund transfers or
other activities.
 Sequential pattern analysis tools: to characterize unusual access
sequences.
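As one concrete form of outlier analysis for unusual fund transfers, the sketch below flags amounts that fall outside Tukey's interquartile-range fences. The IQR rule, the 1.5 multiplier, and the sample amounts are assumptions chosen for illustration; real anti-money-laundering systems typically combine many attributes rather than a single amount column.

```python
import statistics

def iqr_outliers(amounts, k=1.5):
    """Return the amounts lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(amounts, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [a for a in amounts if a < low or a > high]

# Hypothetical transfer amounts for one account over a period.
transfers = [1200, 950, 1100, 1300, 1050, 980, 1250, 48000, 1150, 1020]
print(iqr_outliers(transfers))  # prints [48000]
```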
1.2 Data Mining for Retail Industry:
 Data mining has great application in the retail industry because the industry collects large
amounts of data on sales, customer purchasing history, goods transportation,
consumption, and services.
 Data mining in the retail industry helps identify customer buying patterns and trends
that lead to improved quality of customer service and better customer retention and
satisfaction.
Applications of retail data mining:
 Identify customer buying behaviors
 Discover customer shopping patterns and trends
 Improve the quality of customer service
 Achieve better customer retention and satisfaction
 Enhance goods consumption ratios
 Design more effective goods transportation and distribution policies
Here is a list of examples of data mining in the retail industry −
 Design and Construction of data warehouses based on the benefits of data mining:
Because retail data cover a wide spectrum (including sales, customers,
employees, goods transportation, consumption, and services), there can be many ways to
design a data warehouse for this industry.
 Multidimensional analysis of sales, customers, products, time and region:
 The retail industry requires timely information regarding customer needs,
product sales, trends, and fashions, as well as the quality, cost, profit, and service
of commodities.
 This calls for powerful multidimensional analysis and visualization tools, including
the construction of sophisticated data cubes according to the needs of data
analysis.
 Analysis of effectiveness of sales campaigns:
 The retail industry conducts sales campaigns using advertisements, coupons, and
various kinds of discounts and bonuses to promote products and attract
customers.
 Careful analysis of the effectiveness of sales campaigns can help improve
company profits.
 Multidimensional analysis can be used for this purpose by comparing the amount
of sales and the number of transactions containing the sales items during the
sales period versus those containing the same items before or after the sales
campaign.
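The before/during/after comparison described above reduces to simple arithmetic over transaction records. The sketch below assumes a hypothetical log of (day, basket) transactions and a campaign window given as day numbers; it covers only the count of transactions containing the promoted item (not sales amounts) and normalizes by the number of days that had transactions.

```python
def campaign_effect(transactions, item, start, end):
    """Average number of transactions per active day that contain `item`,
    before, during, and after the campaign window [start, end] (inclusive)."""
    periods = {"before": [], "during": [], "after": []}
    for day, items in transactions:
        period = "before" if day < start else "during" if day <= end else "after"
        periods[period].append((day, item in items))
    result = {}
    for name, records in periods.items():
        active_days = {day for day, _ in records}
        hits = sum(contains for _, contains in records)
        result[name] = hits / len(active_days) if active_days else 0.0
    return result

# Hypothetical (day, basket) transactions; the "coffee" campaign runs on days 3-4.
transactions = [
    (1, {"coffee", "milk"}), (1, {"bread"}),
    (2, {"coffee"}), (2, {"milk"}),
    (3, {"coffee", "sugar"}), (3, {"coffee"}), (3, {"coffee", "milk"}),
    (4, {"coffee"}), (4, {"coffee", "bread"}), (4, {"tea"}),
    (5, {"coffee"}), (5, {"milk"}),
    (6, {"bread"}), (6, {"coffee"}),
]
print(campaign_effect(transactions, "coffee", start=3, end=4))
# prints {'before': 1.0, 'during': 2.5, 'after': 1.0}
```

Here the promoted item appears in 2.5 times as many transactions per day during the campaign as before it, and the effect fades afterwards; a real analysis would also control for seasonality.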
 Customer Retention:
 Goods purchased at different periods by the same customers can be grouped into
sequences.
 Sequential pattern mining can then be used to investigate changes in customer
consumption or loyalty and suggest adjustments on the pricing and variety of
goods in order to help retain customers and attract new ones.
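Grouping purchases into per-customer sequences is the preprocessing step that sequential pattern mining relies on. A minimal sketch, assuming hypothetical (customer_id, date, item) records: it builds one time-ordered list of baskets per customer, then computes the support of a single candidate pattern "buys `first`, later buys `then`". A full sequential pattern miner would enumerate candidate patterns instead of testing one.

```python
from collections import defaultdict
from datetime import date

def build_sequences(purchases):
    """Group (customer, date, item) records into a time-ordered list of
    baskets (one basket per shopping date) for each customer."""
    by_customer = defaultdict(lambda: defaultdict(set))
    for customer, day, item in purchases:
        by_customer[customer][day].add(item)
    return {
        customer: [baskets[d] for d in sorted(baskets)]
        for customer, baskets in by_customer.items()
    }

def sequence_support(sequences, first, then):
    """Fraction of customers who bought `first` and, on a later date, `then`."""
    def contains(seq):
        for i, basket in enumerate(seq):
            if first in basket:
                # The earliest occurrence leaves the most later baskets to check.
                return any(then in later for later in seq[i + 1:])
        return False
    return sum(contains(s) for s in sequences.values()) / len(sequences)

purchases = [
    ("c1", date(2024, 1, 5), "printer"), ("c1", date(2024, 2, 1), "ink"),
    ("c2", date(2024, 1, 7), "printer"), ("c2", date(2024, 1, 7), "paper"),
    ("c2", date(2024, 3, 2), "ink"),
    ("c3", date(2024, 1, 9), "ink"), ("c3", date(2024, 2, 9), "printer"),
]
seqs = build_sequences(purchases)
print(sequence_support(seqs, "printer", "ink"))
```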
 Product recommendation and cross-referencing of items:
By mining associations from sales records, one may discover that a customer who
buys a digital camera is likely to buy another set of items. Such information can be used
to form product recommendations.
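Association rules such as "digital camera => memory card" are usually judged by support and confidence. The sketch below computes both for one candidate rule over a made-up list of baskets; the item names are hypothetical, and a full miner (for example Apriori) would enumerate candidate itemsets rather than test a single rule.

```python
def support(baskets, itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(baskets, antecedent)
    return support(baskets, set(antecedent) | set(consequent)) / base if base else 0.0

baskets = [
    {"digital camera", "memory card", "tripod"},
    {"digital camera", "memory card"},
    {"digital camera", "camera bag"},
    {"memory card"},
    {"laptop", "mouse"},
]
antecedent, consequent = {"digital camera"}, {"memory card"}
s = support(baskets, antecedent | consequent)
c = confidence(baskets, antecedent, consequent)
print(f"support={s:.2f}, confidence={c:.2f}")  # prints support=0.40, confidence=0.67
```

A rule is normally reported only if both values exceed user-specified minimum thresholds.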

1.3 Data Mining for the Telecommunication Industry:


Data mining in the telecommunication industry helps identify telecommunication
patterns, catch fraudulent activities, make better use of resources, and improve quality of
service. Here is a list of examples in which data mining improves telecommunication
services −
 Multidimensional Analysis of Telecommunication data:
 Telecommunication data are intrinsically multidimensional, with dimensions
such as calling-time, duration, location of caller, location of callee, and type of
call.
 The multidimensional analysis of such data can be used to identify and compare
the data traffic, system workload, resource usage, user group behavior, and
profit.
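Multidimensional analysis of call data usually starts from group-by aggregations along chosen dimensions, one per cuboid of a data cube. A minimal sketch with hypothetical call records and field names, summing call minutes for every subset of the dimensions (from the full group-by down to the grand total):

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical call detail records.
calls = [
    {"hour": "peak", "caller_region": "east", "type": "local", "minutes": 12},
    {"hour": "peak", "caller_region": "west", "type": "long-distance", "minutes": 30},
    {"hour": "off-peak", "caller_region": "east", "type": "long-distance", "minutes": 45},
    {"hour": "peak", "caller_region": "east", "type": "local", "minutes": 8},
    {"hour": "off-peak", "caller_region": "west", "type": "local", "minutes": 10},
]

def cube(records, dims, measure):
    """Sum `measure` for every subset of `dims`; each subset is one cuboid."""
    result = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            totals = defaultdict(int)
            for rec in records:
                totals[tuple(rec[d] for d in group)] += rec[measure]
            result[group] = dict(totals)
    return result

minutes = cube(calls, ["hour", "caller_region", "type"], "minutes")
print(minutes[("hour",)])  # prints {('peak',): 50, ('off-peak',): 55}
print(minutes[()])         # prints {(): 105}
```

Computing every cuboid by rescanning the records is fine for a toy example; data warehouse systems precompute or share work across cuboids instead.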
 Fraudulent pattern analysis and the identification of unusual patterns:
 Identify potentially fraudulent users and their atypical usage patterns
 Detect attempts to gain fraudulent entry to customer accounts
 Many of these patterns can be discovered by multidimensional analysis, cluster
analysis, and outlier analysis.
 Multidimensional association and sequential patterns analysis:
 Find usage patterns for a set of communication services by customer group, by
month, etc.
 Promote the sales of specific services.
 Improve the availability of particular services in a region
 Mobile Telecommunication services:
 One important feature of mobile telecommunication data is its association with
spatiotemporal information.
 Spatiotemporal data mining may become essential for finding certain patterns.
For example, unusually busy mobile phone traffic at certain locations may
indicate something abnormal happening in these locations.
 Use of visualization tools in telecommunication data analysis:
 Tools for OLAP visualization, linkage visualization, association visualization,
clustering, and outlier visualization have been shown to be very useful for
telecommunication data analysis.

1.4 Biological Data Analysis


1. In recent times, we have seen tremendous growth in biological fields such as
genomics, proteomics, functional genomics, and biomedical research. Biological
data mining is a very important part of bioinformatics.
2. DNA sequences form the foundation of the genetic codes of all living organisms.
All DNA sequences are comprised of four basic building blocks, called
nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). These four
nucleotides (or bases) are combined to form long sequences.

The following are aspects in which data mining contributes to biological data analysis:
 Semantic integration of heterogeneous, distributed genomic and proteomic
databases
 The semantic integration of such data is essential to the cross-site analysis of
biological data.
 Data cleaning, data integration, reference reconciliation, classification, and
clustering methods will facilitate the integration of biological data and the
construction of data warehouses for biological data analysis.
 Alignment, indexing, similarity search, and comparative analysis of multiple
nucleotide sequences
 BLAST and FASTA, in particular, are tools for the systematic analysis of
genomic and proteomic data.
 Biological sequence analysis methods differ from many sequential pattern
analysis algorithms proposed in data mining research.
 They must allow for gaps and mismatches between a query sequence and the
sequence data to be searched in order to deal with insertions, deletions, and mutations.
 Sophisticated statistical analysis and dynamic programming methods often play
a key role in the development of alignment algorithms.
 Compare the frequently occurring patterns of each class (e.g., diseased and
healthy).
 Identify gene sequence patterns that play roles in various diseases.
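The dynamic programming approach to alignment mentioned above can be sketched with the classic Needleman-Wunsch global alignment, which tolerates insertions and deletions through gap penalties. The scoring scheme (+1 match, -1 mismatch, -2 gap) is an arbitrary assumption for illustration; tools such as BLAST and FASTA rely on faster heuristic searches rather than filling a full table like this one.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b (deletion)
                score[i][j - 1] + gap,  # gap in a (insertion)
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))

print(global_alignment("GATTACA", "GATCA"))
```

When several alignments tie for the best score, the traceback order decides which one is returned.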
 Discovery of structural patterns and analysis of genetic networks and protein
pathways
 In biology, protein sequences are folded into three-dimensional structures, and
such structures interact with each other based on their relative positions and the
distances between them. Such complex interactions form the basis of
sophisticated genetic networks and protein pathways.
 Association and path analysis
 Association analysis: identification of co-occurring gene sequences
 Most diseases are not triggered by a single gene but by a combination of
genes acting together
 Association analysis may help determine the kinds of genes that are
likely to co-occur together in target samples
 Path analysis: linking genes to different disease development stages
 Different genes may become active at different stages of the disease.
 Develop pharmaceutical interventions that target the different stages
separately
 Visualization tools in genetic data analysis
 Alignments among genomic or proteomic sequences and the interactions
among complex biological structures are most effectively presented in
graphic forms, transformed into various kinds of easy-to-understand
visual displays.
 Visualization and visual data mining therefore play an important role in
biological data analysis.

1.5 Other Scientific Applications


Huge amounts of data have been collected from scientific domains such as
geosciences and astronomy. Large data sets are also being generated by fast
numerical simulations in fields such as climate and ecosystem modeling,
chemical engineering, and fluid dynamics.
The following are applications of data mining in scientific domains −

 Data Warehouses and data preprocessing.
 For scientific applications in general, methods are needed for integrating
data from heterogeneous sources (such as data covering different time
periods) and for identifying events.
 For climate and ecosystem data, for instance (which are spatial and
temporal), the problem is that there are too many events in the spatial
domain and too few in the temporal domain.
 Mining complex data types:
 Scientific data sets are heterogeneous in nature, typically involving semi-structured
and unstructured data, such as multimedia data and georeferenced stream data.
 Robust methods are needed for handling spatiotemporal data, related
concept hierarchies, and complex geographic relationships.
 Graph-based mining.
 Graphs may be used to capture many of the spatial, topological, geometric,
and other relational characteristics present in scientific data sets.
 For example, graphs can be used to model chemical structures and data
generated by numerical simulations, such as fluid-flow simulations.
 The success of graph-modeling, however, depends on improvements in the
scalability and efficiency of many classical data mining tasks, such as
classification, frequent pattern mining, and clustering.
 Visualization and domain-specific knowledge.
 High-level graphical user interfaces and visualization tools are required for
scientific data mining systems.
 These should be integrated with existing domain-specific information
systems and database systems to guide researchers and general users in
searching for patterns, interpreting and visualizing discovered patterns, and
using discovered knowledge in their decision making.

1.6 Intrusion Detection


Intrusion refers to any kind of action that threatens the integrity, confidentiality, or
availability of network resources. In this world of connectivity, security has become a
major issue. The increased usage of the Internet and the availability of tools and tricks for
intruding on and attacking networks have made intrusion detection a critical component of
network administration. Here is a list of areas in which data mining technology may be
applied for intrusion detection:
 Development of data mining algorithms for intrusion detection.
 Data mining algorithms can be used for misuse detection and anomaly detection.
 In misuse detection, training data are labeled as either “normal” or “intrusion.”
A classifier is then derived to detect known intrusions.
 In anomaly detection, supervised or unsupervised learning can be used. In a supervised
approach, the model is developed based on training data that are known to be “normal.”
 In an unsupervised approach, no information is given about the training data.
 Anomaly detection research has included the application of classification
algorithms, statistical approaches, clustering, and outlier analysis.
 Association and correlation analysis, as well as aggregation, can help select and build
discriminating attributes.
 Analysis of Stream data.
Due to the transient and dynamic nature of intrusions and malicious attacks, it is
crucial to perform intrusion detection in the data stream environment.
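Stream-based detection has to work incrementally over events it sees only once. A minimal sketch, assuming a hypothetical stream of (timestamp_seconds, source_ip) connection events in time order: it keeps a sliding time window per source and raises an alert whenever one source has opened more than a threshold number of connections within the window, a crude signature of scanning or flooding. The window length, threshold, and addresses are illustrative assumptions.

```python
from collections import defaultdict, deque

class SlidingWindowDetector:
    """Flag sources whose connection count within the last `window` seconds exceeds `threshold`."""

    def __init__(self, window=10.0, threshold=5):
        self.window = window
        self.threshold = threshold
        self.events = defaultdict(deque)  # source -> timestamps still inside the window

    def observe(self, timestamp, source):
        """Process one event; return True if `source` is now over the threshold."""
        q = self.events[source]
        q.append(timestamp)
        # Evict timestamps that have fallen out of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q) > self.threshold

detector = SlidingWindowDetector(window=10.0, threshold=5)
stream = (
    [(0.0, "10.0.0.2")]
    + [(1.0 + 0.5 * k, "10.0.0.9") for k in range(8)]  # burst from one source
    + [(30.0, "10.0.0.9")]                              # quiet again later
)
alerts = [(t, src) for t, src in stream if detector.observe(t, src)]
print(alerts)
```

Memory grows with the number of distinct sources, so a production detector would also expire idle sources or use approximate counting structures.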
 Distributed data mining.
Intrusions can be launched from several different locations and targeted to many
different destinations. Distributed data mining methods may be used to analyze network
data from several network locations in order to detect these distributed attacks.
 Visualization and query tools.
Visualization tools should be available for viewing any anomalous patterns
detected. Such tools may include features for viewing associations, clusters, and outliers.
Intrusion detection systems should also have a graphical user interface that allows
security analysts to pose queries regarding the network data or intrusion detection
results.
2. Data Mining System Products and Research Prototypes
Many commercial data mining systems have little in common with
respect to data mining functionality or methodology and may even work with
completely different kinds of data sets.
There are many data mining system products and domain-specific data mining
applications, and new data mining systems and applications continue to be added to the
existing ones. The selection of a suitable data mining system generally depends on
the following factors.

 Data Types:
 The data mining system may handle formatted text, record-based data, and
relational data.
 The data could also be in ASCII text, relational database data or data warehouse
data. Therefore, we should check what exact format the data mining system can
handle.

 System issues:
 The data mining system should be compatible with one or more operating
systems.
 The most popular operating systems that host data mining software are
UNIX/Linux and Microsoft Windows. There are also data mining systems that
run on Macintosh, OS/2, and others.
 Large industry-oriented data mining systems often adopt a client/server
architecture.
 A recent trend has data mining systems providing Web-based interfaces and
allowing XML data as input and/or output.

 Data Sources:
 Data sources refer to the data formats on which the data mining system will operate.
Some data mining systems may work only on ASCII text files, while others work on
multiple relational sources.
 A data mining system should also support ODBC connections or OLE DB for
ODBC connections.

 Data Mining functions and methodologies :


Some data mining systems provide only one data mining function, such as
classification, while others provide multiple data mining functions such as
concept description, discovery-driven OLAP analysis, association mining, linkage
analysis, statistical analysis, classification, prediction, clustering, outlier analysis,
similarity search, etc.

 Coupling data mining with databases or data warehouse systems: Data mining
systems need to be coupled with a database or a data warehouse system. The coupled
components are integrated into a uniform information processing environment. Here are
the types of coupling listed below −
 No coupling
 Loose Coupling
 Semi tight Coupling
 Tight Coupling

 Scalability: There are two scalability issues in data mining


 Row (Database size) Scalability: A data mining system is considered row
scalable if, when the number of rows is enlarged 10 times, it takes no more than 10
times as long to execute the same mining query.
 Column (Dimension) Scalability: A data mining system is considered column
scalable if the mining query execution time increases linearly with the number of
columns.
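The row-scalability criterion is a simple ratio test: if a query takes 30 seconds on 1 million rows, the system is row scalable only if the same query takes at most 300 seconds on 10 million rows. As a hedged sketch, the helpers below time a caller-supplied function on a data set and on a copy enlarged 10 times, then apply that test; the toy "query" (counting item frequencies) is only a hypothetical stand-in for a real mining query, and single wall-clock timings are noisy, so real benchmarks repeat each measurement.

```python
import time
from collections import Counter

def time_query(query, data):
    """Wall-clock seconds taken by one call of `query(data)`."""
    start = time.perf_counter()
    query(data)
    return time.perf_counter() - start

def is_row_scalable(base_seconds, enlarged_seconds, factor=10):
    """Row scalable if enlarging the rows `factor` times costs at most `factor` times the runtime."""
    return enlarged_seconds <= factor * base_seconds

def item_counts(rows):
    """Stand-in mining query: count how often each item occurs."""
    return Counter(item for row in rows for item in row)

rows = [["a", "b", "c"], ["a", "c"], ["b", "d"]] * 10_000
t_small = time_query(item_counts, rows)
t_large = time_query(item_counts, rows * 10)
print(f"{t_small:.4f}s vs {t_large:.4f}s, row scalable: {is_row_scalable(t_small, t_large)}")
```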

 Visualization Tools: Visualization in data mining can be categorized as follows −


 Data Visualization
 Mining Results Visualization
 Mining process visualization
 Visual data mining

 Data Mining query language and graphical user interface:


 Data mining is an exploratory process. An easy-to-use and high-quality graphical
user interface is essential in order to promote user-guided, highly interactive data
mining.
 Most data mining systems provide user-friendly interfaces for mining. However, most data
mining systems do not share any underlying data mining query language.
 Efforts to standardize data mining query languages include Microsoft’s OLE DB for
Data Mining.
 Other standardization efforts include PMML (or Predictive Model Markup
Language).

2.1 Examples of Commercial Data Mining Systems:


Many data mining systems specialize in one data mining function, such as
classification. Most of the systems described below provide multiple data mining
functions and explore multiple knowledge discovery techniques.

 From database system and graphics system vendors:

IBM Intelligent Miner:


 Intelligent Miner is an IBM data mining product that provides a wide range of
data mining functions, including association mining, classification, regression,
predictive modeling, deviation detection, clustering, and sequential pattern
analysis.
 It also provides an application toolkit containing neural network algorithms,
statistical methods, data preparation tools, and data visualization tools.
 Distinctive features of Intelligent Miner include the scalability of its mining algorithms and its tight
integration with IBM’s DB2 relational database system.

Microsoft SQL Server 2005:


 Microsoft SQL Server 2005 is a database management system that incorporates
multiple data mining functions smoothly in its relational database system and
data warehouse system environments.
 It includes association mining, classification (using decision tree, naïve Bayes,
and neural network algorithms), regression trees, sequence clustering, and time-
series analysis.
 In addition, Microsoft SQL Server 2005 supports the integration of algorithms
developed by third-party vendors and application users.

SGI MineSet:
 MineSet, available from Purple Insight, was introduced by SGI in 1999.
 It provides multiple data mining functions, including association mining and
classification, as well as advanced statistics and visualization tools.
 A distinguishing feature of MineSet is its set of robust graphics tools for advanced
visualization.

Oracle Data Mining (ODM):


 Oracle Data Mining (ODM), an option to Oracle Database 10g Enterprise
Edition, provides several data mining functions, including association mining,
classification, prediction, regression, clustering, and sequence similarity search
and analysis.
 Oracle Database 10g also provides an embedded data warehousing infrastructure
for multidimensional data analysis.

 From vendors of statistical analysis or data mining software:

Clementine:
 Clementine, from SPSS, provides an integrated data mining development
environment for end users and developers.
 A distinguishing feature of Clementine is its object-oriented, extended module interface, which allows
users’ algorithms and utilities to be added to Clementine’s visual programming
environment.

Enterprise Miner:
 Enterprise Miner was developed by SAS Institute, Inc.
 A distinguishing feature of Enterprise Miner is its variety of statistical analysis tools, which are built
on the long history of SAS in the market of statistical analysis.

Insightful Miner:
 Insightful Miner is available from Insightful Inc.
 It provides several data mining functions, including data cleaning,
classification, prediction, clustering, and statistical analysis packages, along
with visualization tools.
 A distinguishing feature is its visual interface, which allows users to wire components together to create
self-documenting programs.

 Originating from the machine learning community:

CART:
 CART, available from Salford Systems, is the commercial version of the
CART (Classification and Regression Trees) system.
 It creates decision trees for classification and regression trees for prediction.
 CART employs boosting to improve accuracy.
See5 and C5.0:
 See5 and C5.0, available from RuleQuest, are commercial versions of the C4.5
decision tree and rule generation method.
 See5 is the Windows version of C4.5, while C5.0 is its UNIX counterpart.
Weka:
 Weka, developed at the University of Waikato in New Zealand, is open-source
data mining software in Java.
 It contains a collection of algorithms for data mining tasks, including data
preprocessing, association mining, classification, regression, clustering, and
visualization.

3. Additional Themes on Data Mining:


3.1 Theoretical Foundations of Data Mining: The theoretical foundations of data
mining include the following concepts:

 Data Reduction: The basic idea of this theory is to reduce the data representation, which
trades accuracy for speed in response to the need to obtain quick approximate answers to
queries on very large databases. Some of the data reduction techniques are as follows −
 Singular Value Decomposition
 Wavelets
 Regression
 Log-linear models
 Histograms
 Clustering
 Sampling
 Construction of Index Trees
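To illustrate trading accuracy for speed, the sketch below reduces a numeric column to an equal-width histogram and then answers a range-count query approximately from the bucket counts alone, assuming values are spread uniformly inside each bucket. The bucket count and the data are hypothetical, and the helper assumes the column has at least two distinct values.

```python
def build_histogram(values, buckets):
    """Equal-width histogram over [min, max]; returns (lo, width, counts)."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)  # put the maximum in the last bucket
        counts[idx] += 1
    return lo, width, counts

def approx_range_count(hist, a, b):
    """Estimate how many values fall in [a, b) using only the histogram."""
    lo, width, counts = hist
    total = 0.0
    for i, c in enumerate(counts):
        left, right = lo + i * width, lo + (i + 1) * width
        overlap = max(0.0, min(b, right) - max(a, left))
        total += c * overlap / width  # uniform-spread assumption within each bucket
    return total

values = list(range(101))  # stands in for a large numeric column
hist = build_histogram(values, buckets=10)
print(approx_range_count(hist, 20, 50), approx_range_count(hist, 25, 35))
```

Only ten counts are kept instead of 101 values; on skewed real data the estimates become approximate rather than exact.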
 Data Compression : The basic idea of this theory is to compress the given data by
encoding in terms of the following −
 Bits
 Association Rules
 Decision Trees
 Clusters
 Pattern Discovery: The basic idea of this theory is to discover patterns occurring in a
database. Following are the areas that contribute to this theory −
 Machine Learning
 Neural Networks
 Association Mining
 Sequential Pattern Matching
 Clustering
 Probability Theory: This theory is based on statistical theory. The basic idea behind this
theory is to discover joint probability distributions of random variables.
 Microeconomic View: According to this theory, data mining finds the patterns that are
interesting only to the extent that they can be used in the decision-making process of
some enterprise.
 Inductive Databases: As per this theory, a database schema consists of data and
patterns that are stored in a database. Therefore, data mining is the task of performing
induction on databases.
3.2 Statistical Data Mining
Apart from the database-oriented techniques, there are statistical techniques available for
data analysis. These techniques can be applied to scientific data and data from economic
and social sciences as well. Some of the statistical data mining techniques are as follows −
 Regression − Regression methods are used to predict the value of the response variable
from one or more predictor variables where the variables are numeric. Listed below are
the forms of Regression −
 Linear
 Multiple
 Weighted
 Polynomial
 Nonparametric
 Robust
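Simple linear regression, the first form listed, fits y = w0 + w1 * x by least squares, where the slope w1 is the covariance of x and y divided by the variance of x, and w0 = mean(y) - w1 * mean(x). A minimal sketch with hypothetical data (years of experience versus salary in thousands):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1

# Hypothetical data: years of experience vs. salary (in thousands).
years = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]
w0, w1 = fit_line(years, salary)
print(f"salary = {w0:.1f} + {w1:.1f} * years")
```

Multiple, weighted, and polynomial regression generalize the same least-squares idea to several predictors, per-observation weights, or powers of x.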
 Generalized Linear Models − Generalized linear models include −
 Logistic Regression
 Poisson Regression

The model's generalization allows a categorical response variable to be related to a set of
predictor variables in a manner similar to the modeling of a numeric response variable
using linear regression.
 Analysis of Variance: This technique analyzes:
 Experimental data for two or more populations described by a numeric response
variable.
 One or more categorical variables (factors).
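For a single factor, ANOVA compares between-group and within-group variation through the F statistic F = (SSB / (k - 1)) / (SSW / (N - k)), where k is the number of populations and N the total number of observations. A minimal sketch with hypothetical responses for three populations; turning F into a p-value needs the F distribution (for example via `scipy.stats.f_oneway`), which is left out to keep the example dependency-free.

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA; `groups` is a list of lists of numeric responses."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far group means sit from the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of responses around their own group mean.
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Hypothetical responses for three populations (e.g., three fertilizer types).
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
print(one_way_anova_f(groups))
```

A large F (here about 19) means the group means differ by much more than the variation within groups would explain.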
 Mixed-effect Models: These models are used for analyzing grouped data. These models
describe the relationship between a response variable and some co-variates in the data
grouped according to one or more factors.
 Discriminant Analysis: Discriminant analysis is used to predict a categorical response variable. This
method assumes that independent variables follow a multivariate normal distribution.
 Time Series Analysis: Following are the methods for analyzing time-series data −

 Auto-regression Methods.
 Univariate ARIMA (AutoRegressive Integrated Moving Average) Modeling.
 Long-memory time-series modeling.
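Auto-regression models each value as a linear function of earlier values. The simplest case, AR(1), x[t] = c + phi * x[t-1] + noise, can be fitted by least squares on lagged pairs. The sketch below does that for a hypothetical, noise-free series and makes a one-step forecast; it is a simplification, since ARIMA modeling adds differencing and moving-average terms and is normally done with a statistics library.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    return mean_curr - phi * mean_prev, phi

def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]

# Hypothetical, noise-free series generated by x[t] = 2 + 0.5 * x[t-1].
series = [10.0]
for _ in range(6):
    series.append(2 + 0.5 * series[-1])

c, phi = fit_ar1(series)
print(round(c, 3), round(phi, 3), round(forecast_next(series, c, phi), 3))
```

On real, noisy data the fitted coefficients only approximate the generating process.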

3.3 Visual Data Mining


Visual Data Mining uses data and/or knowledge visualization techniques to
discover implicit knowledge from large data sets. Visual data mining can be viewed as
an integration of the following disciplines:
 Data Visualization: Use of computer graphics to create visual images which aid
in the understanding of complex, often massive representations of data.
 Data Mining: The process of discovering implicit but useful knowledge from
large data sets using visualization techniques.
Visual data mining is closely related to the following :
 Computer Graphics
 Multimedia Systems
 Human Computer Interaction
 Pattern Recognition
 High-performance Computing
Generally, data visualization and data mining can be integrated in the following ways:
 Data Visualization : The data in a database or a data warehouse can be viewed in
several visual forms that are listed below :
 Boxplots
 3-D Cubes
 Data distribution charts
 Curves
 Surfaces
 Link graphs etc.
 Data Mining Result Visualization: Data Mining Result Visualization is the presentation
of the results of data mining in visual forms. These visual forms could be scatter plots,
boxplots, etc.

 Data Mining Process Visualization: Data Mining Process Visualization presents the
various processes of data mining. It allows users to see how the data is extracted. It
also allows users to see from which database or data warehouse the data is cleaned,
integrated, preprocessed, and mined.
3.4 Audio Data Mining
 Audio data mining makes use of audio signals to indicate the patterns of data or the
features of data mining results.
 By transforming patterns into sound and music, we can listen to pitches and tunes, instead of
watching pictures, in order to identify anything interesting.
3.5 Data Mining and Collaborative Filtering
The Collaborative Filtering Approach is generally used for recommending
products to customers. These recommendations are based on the opinions of other
customers.
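User-based collaborative filtering can be sketched in a few lines: measure how closely other customers' ratings match the target customer's (cosine similarity over co-rated items here), then score products the target has not rated by those customers' ratings, weighted by similarity. The ratings, names, and similarity choice are hypothetical; practical recommenders add rating normalization, neighborhood limits, and sparse storage.

```python
import math

# Hypothetical customer -> {product: rating} data.
ratings = {
    "alice": {"camera": 5, "tripod": 4, "lens": 5},
    "bob":   {"camera": 4, "tripod": 5, "bag": 2},
    "carol": {"camera": 5, "lens": 4, "bag": 1, "flash": 5},
    "dave":  {"novel": 5, "cookbook": 4},
}

def cosine(u, v):
    """Cosine similarity computed over the items both customers rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(target, ratings, top_n=2):
    """Rank items the target has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], their)
        if sim <= 0:
            continue  # ignore customers with nothing in common
        for item, r in their.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:top_n]

print(recommend("alice", ratings))  # prints ['flash', 'bag']
```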

4. Data Mining, Privacy, and Data Security

4.1 Data Mining, Privacy, and Data Security:


In 1980, the Organization for Economic Co-operation and Development
(OECD) established a set of international guidelines, referred to as fair information practices.
These guidelines aim to protect privacy and data accuracy. They cover aspects relating to data
collection, use, openness, security, quality, and accountability. They include the following
principles:

Purpose specification and use limitation:


 The purposes for which personal data are collected should be specified at the time of
collection, and the data collected should not exceed the stated purpose.
 Data mining is typically a secondary purpose of the data collection.
 It has been argued that attaching a disclaimer that the data may also be used for mining
is generally not accepted as sufficient disclosure of intent. Due to the exploratory nature
of data mining, it is impossible to know what patterns may be discovered; therefore, there
is no certainty over how they may be used.

Openness:
Individuals have the right to know the nature of the data collected about them, the
identity of the data controller (responsible for ensuring the principles), and how the data
are being used.

Security Safeguards:
Personal data should be protected by reasonable security safeguards against such
risks as loss or unauthorized access, destruction, use, modification, or disclosure of
data.

Counterterrorism is a new application area for data mining:


 Data mining for counterterrorism may be used to detect unusual patterns, terrorist
activities (including bioterrorism), and fraudulent behavior.
 Research directions include developing algorithms for real-time mining (e.g., for building models in
real time, so as to detect real-time threats such as that a building is scheduled to be
bombed by 10 a.m. the next morning); for multimedia data mining (involving audio,
video, and image mining, in addition to text mining); and for finding unclassified data to
test such applications.

To secure the privacy of individuals while collecting and mining data:


 Many data security enhancing techniques have been developed to help protect data.
 Databases can employ a multilevel security model to classify and restrict data
according to various security levels, with users permitted access to only their
authorized level.
 Encryption is another technique in which individual data items may be encoded.
 This may involve blind signatures (which build on public key encryption), biometric
encryption (e.g., where the image of a person’s iris or fingerprint is used to encode his
or her personal information), and anonymous databases (which permit the consolidation
of various databases but limit access to personal information to only those who need to
know; personal information is encrypted and stored at different locations).
 Intrusion detection is another active area of research that helps protect the privacy of
personal data.

Privacy-preserving data mining is a new area of data mining research that is emerging in
response to the need for privacy protection during mining. There are two common approaches:
secure multiparty computation and data obscuration.
 In secure multiparty computation, data values are encoded using simulation and
cryptographic techniques so that no party can learn another’s data values. This
approach can be impractical when mining large databases.
 In data obscuration, the actual data are distorted by aggregation (such as using the
average income for a neighborhood, rather than the actual income of residents) or by
adding random noise.
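Both approaches can be illustrated in miniature. The sketch below is a toy, not a secure implementation. `obscure_by_noise` and `aggregate_by_group` show data obscuration (random noise, or replacing each income with its neighborhood average), and `secure_sum` shows additive secret sharing, one basic building block of secure multiparty computation: each party splits its value into random shares, one per party, so that only the total is revealed. The incomes, neighborhoods, noise scale, and modulus are all assumptions.

```python
import random

def obscure_by_noise(values, scale, rng):
    """Data obscuration: add zero-mean Gaussian noise to each value."""
    return [v + rng.gauss(0.0, scale) for v in values]

def aggregate_by_group(values, groups):
    """Data obscuration: replace each value with the average of its group."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0) + v
        counts[g] = counts.get(g, 0) + 1
    return [totals[g] / counts[g] for g in groups]

def secure_sum(private_values, modulus, rng):
    """Additive secret sharing: every party splits its value into n random shares
    that sum to the value mod `modulus` and hands one share to each party. Each
    party publishes only the sum of the shares it received, and those partial
    sums add up to the true total (provided the total is below `modulus`)."""
    n = len(private_values)
    received = [[] for _ in range(n)]
    for value in private_values:
        shares = [rng.randrange(modulus) for _ in range(n - 1)]
        shares.append((value - sum(shares)) % modulus)
        for party, share in enumerate(shares):
            received[party].append(share)
    partial_sums = [sum(s) % modulus for s in received]
    return sum(partial_sums) % modulus

rng = random.Random(42)
incomes = [52_000, 61_000, 47_000, 150_000, 58_000, 63_000]
neighborhoods = ["A", "A", "A", "B", "B", "B"]

noisy = obscure_by_noise(incomes, scale=5_000, rng=rng)
print(sum(noisy) / len(noisy), sum(incomes) / len(incomes))  # means stay close
print(aggregate_by_group(incomes, neighborhoods))
print(secure_sum(incomes, modulus=2**32, rng=rng))  # prints 431000
```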
In this way, we may continue to reap the benefits of data mining in terms of time and money
savings and the discovery of new knowledge.

5. Trends in Data Mining

Data mining concepts are still evolving; here are some of the latest trends in this field.
Application exploration:
 The exploration of data mining for businesses continues to expand as e-commerce and
e-marketing have become mainstream elements of the retail industry.
 Data mining is increasingly used for the exploration of applications in other areas, such
as financial analysis, telecommunications, biomedicine, intrusion detection, mobile
(wireless) data mining and science.

Scalable and interactive data mining methods:
 Data mining must handle huge amounts of data efficiently and, where possible,
interactively, so scalable algorithms for individual and integrated data mining functions
are essential.
 One important direction toward improving the overall efficiency of the mining process
while increasing user interaction is constraint-based mining.
 This provides users with added control by allowing the specification and use of
constraints to guide data mining systems in their search for interesting patterns.
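As a minimal sketch of constraint-based mining, assume the user states the constraint "the total price of an itemset must not exceed a budget". This constraint is anti-monotone (if an itemset violates it, so does every superset), so it can be pushed into an Apriori-style level-wise search and used to prune candidates before their support is ever counted. The transactions, prices, `min_support`, and `max_total_price` below are hypothetical.

```python
from itertools import combinations


def constrained_frequent_itemsets(transactions, prices, min_support, max_total_price):
    """Apriori-style level-wise search that keeps only itemsets which are frequent
    AND satisfy the anti-monotone constraint sum(price) <= max_total_price."""

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def satisfies(itemset):
        return sum(prices[i] for i in itemset) <= max_total_price

    items = sorted({i for t in transactions for i in t})
    # Level 1: the constraint is checked before any support counting.
    candidates = {frozenset([i]) for i in items if satisfies({i})}
    result = {}
    k = 1
    while candidates:
        counts = {c: support(c) for c in candidates}
        frequent = {c for c, n in counts.items() if n >= min_support}
        result.update({c: counts[c] for c in frequent})
        # Generate (k+1)-candidates from pairs of frequent k-itemsets.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Constraint pruning plus Apriori pruning (all k-subsets frequent).
            if satisfies(union) and all(
                frozenset(s) in frequent for s in combinations(union, k)
            ):
                candidates.add(union)
        k += 1
    return result


transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
prices = {"bread": 2, "milk": 1, "butter": 3, "beer": 6}  # hypothetical prices

patterns = constrained_frequent_itemsets(transactions, prices, min_support=2, max_total_price=5)
for itemset, n in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With these inputs, {beer} is dropped at level 1 by the constraint alone, and the 3-itemset {bread, butter, milk} (support 2, total price 6) is pruned before its support is counted.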
Integration of data mining with database systems, data warehouse systems, and Web
database systems:
 Database systems, data warehouse systems, and the Web have become mainstream
information processing systems, so data mining should be smoothly integrated into such
an information processing environment.
 A data mining system should be tightly coupled with database and data warehouse
systems.
 Transaction management, query processing, on-line analytical processing, and on-line
analytical mining should be integrated into one unified framework.
 Such integration will ensure data mining portability, scalability, high performance, and
an integrated information processing environment for multidimensional data analysis and
exploration.
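One way to picture tight coupling is to push part of a mining task into the database's own query processor instead of exporting the data first. The sketch below uses Python's built-in `sqlite3` module to count the support of item pairs with a single SQL self-join. The `sales` table, its columns, and the minimum support of 2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (txn_id INTEGER, item TEXT)")  # hypothetical schema
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "bread"), (1, "milk"), (2, "bread"), (2, "butter"), (2, "milk"),
     (3, "beer"), (3, "bread"), (4, "butter"), (4, "milk")],
)

# Support counting for 2-itemsets runs inside the database engine:
# the self-join pairs items from the same transaction, a.item < b.item
# avoids duplicate pairs, and HAVING applies the minimum support threshold.
MIN_SUPPORT = 2
rows = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS support
    FROM sales AS a
    JOIN sales AS b ON a.txn_id = b.txn_id AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY support DESC, a.item, b.item
    """,
    (MIN_SUPPORT,),
).fetchall()

for first, second, support in rows:
    print(first, second, support)
conn.close()
```

Only the frequent pairs leave the database, which is the kind of efficiency gain that motivates coupling mining with query processing.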

Standardization of data mining language:
A standard data mining language or other standardization will facilitate the
systematic development of data mining solutions, improve interoperability among multiple
data mining systems and functions, and promote the education and use of data mining
systems in industry and society.

Visual data mining:
 Visual data mining is an effective way to discover knowledge from huge amounts of
data.
 The development of visual data mining techniques will facilitate the promotion and use
of data mining as a tool for data analysis.

New methods for mining complex types of data:
More research is required on integrating data mining methods with existing data analysis
techniques for complex types of data.
Biological data mining:
Mining DNA and protein sequences, mining high-dimensional microarray data, biological
pathway and network analysis, link analysis across heterogeneous biological data, and
information integration of biological data by data mining are active topics in
biological data mining research.
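A basic building block of DNA sequence mining is finding frequent k-mers (substrings of length k). The sketch below counts them with `collections.Counter`. The sequences, `k`, and `min_count` are made-up values for illustration.

```python
from collections import Counter


def frequent_kmers(sequences, k, min_count):
    """Count every length-k substring across all sequences and keep those
    occurring at least min_count times."""
    counts = Counter(
        seq[i:i + k]
        for seq in sequences
        for i in range(len(seq) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


sequences = ["ACGTACGT", "TTACGTAA", "GACGTT"]  # hypothetical DNA fragments
print(frequent_kmers(sequences, k=4, min_count=3))  # {'ACGT': 4}
```

Occurrences are counted with overlap, so "ACGTACGT" contributes two copies of ACGT.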

Web mining:
Web content mining, Web log mining, and other mining services on the Internet
have secured a place among the flourishing subfields of data mining.
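Web log mining usually starts by parsing server access logs into structured records. The sketch below assumes logs in the Common Log Format (host, ident, user, timestamp, request line, status, size). It counts successful page hits and collects each host's click path. The sample log lines and the function name `page_hits` are invented.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical access-log excerpt.
sample_log = """\
10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
10.0.0.2 - - [10/Oct/2023:13:55:40 +0000] "GET /products.html HTTP/1.1" 200 5120
10.0.0.1 - - [10/Oct/2023:13:56:02 +0000] "GET /products.html HTTP/1.1" 200 5120
10.0.0.3 - - [10/Oct/2023:13:57:11 +0000] "GET /missing.html HTTP/1.1" 404 512
10.0.0.1 - - [10/Oct/2023:13:58:45 +0000] "GET /cart.html HTTP/1.1" 200 1024
"""


def page_hits(log_text):
    """Count successful (2xx) requests per page and collect each host's click path."""
    hits = Counter()
    paths_by_host = {}
    for line in log_text.splitlines():
        m = LOG_PATTERN.match(line)
        if not m or not m["status"].startswith("2"):
            continue  # skip malformed lines and failed requests
        hits[m["path"]] += 1
        paths_by_host.setdefault(m["host"], []).append(m["path"])
    return hits, paths_by_host


hits, paths = page_hits(sample_log)
print(hits.most_common(1))  # [('/products.html', 2)]
print(paths["10.0.0.1"])    # ['/index.html', '/products.html', '/cart.html']
```

Per-host click paths like these are a typical input for later steps such as session analysis or sequential pattern mining.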

Distributed data mining:
Traditional data mining methods, designed to work at a centralized location, do
not work well in many of the distributed computing environments present today (e.g., the
Internet, intranets, local area networks, high-speed wireless networks, and sensor
networks). Advances in distributed data mining methods are expected.
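A minimal sketch of one distributed strategy (often called count distribution): each site counts items on its own data, and only these small summaries are shipped to a coordinator that merges them. The merged counts equal what a centralized scan would produce. Here the sites are simulated as in-memory lists processed by a thread pool, whereas a real deployment would run them on separate machines. The site data and threshold are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_counts(site_transactions):
    """Runs at each site: count item occurrences on local data only."""
    counts = Counter()
    for transaction in site_transactions:
        counts.update(set(transaction))
    return counts


def global_frequent_items(sites, min_support):
    """Coordinator: gather per-site counts (not raw data) and merge them."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_counts, sites))
    total = sum(partials, Counter())
    return {item: n for item, n in total.items() if n >= min_support}


# Hypothetical transactions held at three separate locations.
sites = [
    [["bread", "milk"], ["bread", "butter"]],
    [["milk", "butter"], ["bread", "milk"]],
    [["beer"], ["bread", "milk", "beer"]],
]
print(global_frequent_items(sites, min_support=3))  # {'bread': 4, 'milk': 4}
```

No raw transaction ever leaves its site, and network traffic grows with the number of distinct items rather than with the number of transactions.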

Real-time data mining:
Real-time data, or stream data, arise in applications such as Web mining, mobile data
mining, e-commerce, and stock analysis. Such data require dynamic data mining models
that are updated as new data arrive.
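The Misra-Gries summary is one classic example of a model that updates as data arrive. It finds candidate frequent items in a single pass using at most k − 1 counters, and any item occurring more than n/k times in a stream of length n is guaranteed to survive. This algorithm is our choice for illustration and is not prescribed by the text. The stream of stock ticker symbols and the value of `k` are hypothetical.

```python
def misra_gries(stream, k):
    """Single-pass frequent-items summary with at most k - 1 counters.
    Items occurring more than len(stream) / k times are guaranteed to remain;
    the kept counts underestimate the true counts by at most len(stream) / k."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


ticks = ["AAPL", "MSFT", "AAPL", "GOOG", "AAPL", "TSLA", "AAPL", "MSFT", "AAPL"]
print(misra_gries(ticks, k=3))  # {'AAPL': 3}
```

AAPL actually occurs 5 times out of 9, which exceeds 9/3, so it must survive. Its stored count of 3 is within the guaranteed error bound of 3.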

Graph mining, link analysis, and social network analysis:
Graph mining, link analysis, and social network analysis are useful for capturing
sequential, topological, geometric, and other relational characteristics of many scientific
data sets (such as for chemical compounds and biological networks) and social data sets
(such as for the analysis of hidden criminal networks). Such modeling is also useful for
analyzing links in Web structure mining.
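PageRank is a familiar example of link analysis in Web structure mining. A page's score depends on the scores of the pages that link to it. The power-iteration sketch below is illustrative only. The link graph, the damping factor of 0.85, and the iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to the pages it links to.
    Pages without out-links spread their rank evenly over all pages."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}  # hypothetical Web graph
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C ranks highest because three pages link to it. Page D, which has no in-links, keeps only the random-jump share (1 − 0.85)/4.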

Multirelational and multidatabase data mining:
Multirelational data mining methods search for patterns involving multiple tables
(relations) from a relational database. Multidatabase mining searches for patterns across
multiple databases.
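To see what a pattern "involving multiple tables" looks like, the sketch below joins a customer relation with a purchase relation and counts patterns of the form "age group = g AND bought item i", together with their confidence within the age group. It uses Python's built-in `sqlite3`. The tables, columns, and threshold are hypothetical. Joining and counting is only the simplest approach: dedicated multirelational methods search across tables without materializing the full join, and this sketch does not attempt that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, age_group TEXT);
    CREATE TABLE purchase (cust_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'young'), (2, 'young'), (3, 'senior'), (4, 'young');
    INSERT INTO purchase VALUES
        (1, 'laptop'), (2, 'laptop'), (2, 'headphones'),
        (3, 'printer'), (4, 'headphones'), (4, 'laptop');
    """
)

# Support = number of distinct customers matching the cross-table pattern;
# confidence = support divided by the size of that customer's age group.
rows = conn.execute(
    """
    SELECT c.age_group, p.item,
           COUNT(DISTINCT c.cust_id) AS support,
           ROUND(1.0 * COUNT(DISTINCT c.cust_id) / g.n, 2) AS confidence
    FROM customer AS c
    JOIN purchase AS p ON p.cust_id = c.cust_id
    JOIN (SELECT age_group, COUNT(*) AS n FROM customer GROUP BY age_group) AS g
      ON g.age_group = c.age_group
    GROUP BY c.age_group, p.item, g.n
    HAVING COUNT(DISTINCT c.cust_id) >= 2
    ORDER BY support DESC, p.item
    """
).fetchall()

for age_group, item, support, confidence in rows:
    print(age_group, item, support, confidence)
conn.close()
```

Here every "young" customer bought a laptop (confidence 1.0). That pattern is visible only when the customer and purchase relations are considered together.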

Privacy protection and information security in data mining:
 The collaboration of technologists, social scientists, law experts, and companies is
needed to produce a rigorous definition of privacy and a formalism to prove privacy-
preservation in data mining.
 Privacy protection and information security have also come to light as a notable trend in
the data mining space.
