DATA MINING: Applications and Trends

Data mining has attracted a great deal of attention in the information industry and in society as a
whole in recent years, due to the availability of huge amounts of data and the imminent need for
turning such data into useful information and knowledge. Today, as more data are gathered, with
the amount of data doubling every three years, data mining is becoming an increasingly
important tool for transforming these data into information. It is commonly used in a wide range
of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery.

INTRODUCTION: Data mining is the exploration and analysis of large data sets in order to
discover meaningful patterns and rules. The key idea is to find effective ways to combine the
computer’s power to process data with the human eye’s ability to detect patterns. The techniques of
data mining are designed to work best with large data sets.

Data mining is the process of extracting interesting (nontrivial, implicit, previously unknown,
and potentially useful) patterns or knowledge from huge amounts of data. It is the set of
activities used to find new, hidden, unexpected, or unusual patterns in data. Using information
contained within a data warehouse, data mining can often provide answers to questions about an
organization that a decision maker has not previously thought to ask, such as:

• Which products should be promoted to a particular customer?
• What is the probability that a certain customer will respond to a planned promotion?
• Which securities will be most profitable to buy or sell during the next trading session?
• What is the likelihood that a certain customer will default or pay back on schedule?
• What is the appropriate medical diagnosis for this patient?

These types of questions can be answered surprisingly easily if the information hidden among the
data in your databases can be located and utilized.

The importance of collecting data that reflect your business or scientific activities to achieve
competitive advantage is now widely recognized. Powerful systems for collecting data and managing
it in large databases are in place in most large and mid-range companies. However, the bottleneck
in turning these data into success is the difficulty of extracting knowledge about the system you
study from the collected data. Human analysts with no special tools can no longer make sense of the
enormous volumes of data that must be processed in order to make informed business decisions. Data
mining automates the process of finding relationships and patterns in raw data and delivers results
that can be either used in an automated decision support system or assessed by a human analyst.

EVOLUTION

Data mining techniques are the result of a long process of research and product development. This
evolution began when business data was first stored on computers, continued with improvements
in data access, and more recently, generated technologies that allow users to navigate through their
data in real time. Data mining takes this evolutionary process beyond retrospective data access and
navigation to prospective and proactive information delivery.

From the user’s point of view, the following four steps were revolutionary because they allowed
new business questions to be answered accurately and quickly.

Data Collection (1960s) → Data Access (1980s) → Data Warehousing & Decision Support (1990s) →
Data Mining (Emerging Today)

Fig 2: Evolutionary Stages of Data Mining

Data Collection (1960s):

• Business question: "What was my total revenue in the last five years?"
• Enabling technologies: Computers, tapes, disks.
• Product providers: IBM, CDC.
• Characteristics: Retrospective, static data delivery.

Data Access (1980s):

• Business question: "What were unit sales in New England last March?"
• Enabling technologies: Relational databases (RDBMS), Structured Query Language
(SQL), ODBC.
• Product providers: Oracle, Sybase, Informix, IBM, Microsoft.
• Characteristics: Retrospective, dynamic data delivery at record level.

Data Warehousing & Decision Support (1990s):

• Business question: "What were unit sales in New England last March? Drill down to
Boston."
• Enabling technologies: On-line analytic processing (OLAP), multidimensional databases,
and data warehouses.
• Product providers: Pilot, Comshare, Arbor, Cognos, MicroStrategy.
• Characteristics: Retrospective, dynamic data delivery at multiple levels.

Data Mining (Emerging Today):

• Business question: "What’s likely to happen to Boston unit sales next month? Why?"
• Enabling technologies: Advanced algorithms, multiprocessor computers, massive
databases.
• Product providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry).
• Characteristics: Prospective, proactive information delivery.

The core components of data mining technology have been under development for decades, in
research areas such as statistics, artificial intelligence, and machine learning. Today, the maturity
of these techniques, coupled with high-performance relational database engines and broad data
integration efforts, makes these technologies practical for current data warehouse environments.

THE PRESENT AND THE FUTURE

The field of data mining has been growing by leaps and bounds and has shown great potential for
the future. Many industry analysts and research firms have projected a bright future for the
entire data mining area and for the related area of CRM (customer relationship management). Growth
in the CRM analytic application market approached 54.1% per year through 2003. In addition, data
mining projects grew by more than 300% by the year 2002. By 2003, over 90% of consumer-based
industries with an e-commerce orientation had used some kind of data mining model. As mentioned
previously, the field of data mining is very broad, and many methods and technologies have become
dominant in the field.

THE SCOPE OF DATA MINING

Data mining derives its name from the similarities between searching for valuable business
information in a large database and mining a mountain for a vein of valuable ore. Both processes
require either sifting through an immense amount of material, or intelligently probing it to find
exactly where the value resides. Given databases of sufficient size and quality, data mining
technology can generate new business opportunities by providing these capabilities:

• Automated prediction of trends and behaviors: Data mining automates the process of
finding predictive information in large databases. Questions that traditionally required
extensive hands-on analysis can now be answered directly from the data — quickly. A
typical example of a predictive problem is targeted marketing. Data mining uses data on
past promotional mailings to identify the targets most likely to maximize return on
investment in future mailings. Other predictive problems include forecasting bankruptcy
and other forms of default, and identifying segments of a population likely to respond
similarly to given events.

• Automated discovery of previously unknown patterns: Data mining tools sweep through
databases and identify previously hidden patterns in one step. An example of pattern
discovery is the analysis of retail sales data to identify seemingly unrelated products that
are often purchased together. Other pattern discovery problems include detecting fraudulent
credit card transactions and identifying anomalous data that could represent data entry
keying errors.
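
The retail example above (finding products that are often purchased together) can be sketched in a
few lines of Python. This is only an illustration under stated assumptions: the function name
frequent_pairs, the min_support threshold, and the sample baskets are all hypothetical and do not
come from any particular data mining product.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs found in at least
    `min_support` (a fraction between 0 and 1) of all baskets."""
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each unordered pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


# Hypothetical sales transactions, one list of purchased items per basket.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]
pairs = frequent_pairs(baskets, min_support=0.75)
print(pairs)
```

Counting every pair is only practical for small item catalogs; real tools prune the search, but the
idea of ranking item combinations by how often they co-occur is the same.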

Data mining techniques can yield the benefits of automation on existing software and hardware
platforms, and can be implemented on new systems as existing platforms are upgraded and new
products developed. When data mining tools are implemented on high performance parallel
processing systems, they can analyze massive databases in minutes. Faster processing means that
users can automatically experiment with more models to understand complex data. High speed
makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield
improved predictions.

TECHNIQUES OF DATA MINING

The most commonly used techniques in data mining are:

• Artificial neural networks: Non-linear predictive models that learn through training and
resemble biological neural networks in structure.

• Decision trees: Tree-shaped structures that represent sets of decisions. These decisions
generate rules for the classification of a dataset. Specific decision tree methods include
Classification and Regression Trees (CART) and Chi Square Automatic Interaction
Detection (CHAID).

• Genetic algorithms: Optimization techniques that use processes such as genetic
combination, mutation, and natural selection in a design based on the concepts of
evolution.

• Nearest neighbor method: A technique that classifies each record in a dataset based on a
combination of the classes of the k record(s) most similar to it in a historical dataset (where
k ≥ 1). Sometimes called the k-nearest neighbor technique.

• Rule induction: The extraction of useful if-then rules from data based on statistical
significance.
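
As an illustration of the nearest neighbor method described above, here is a minimal Python
sketch. It assumes numeric attributes, Euclidean distance as the similarity measure, and a simple
majority vote as the "combination of the classes"; the function name knn_classify and the sample
records are hypothetical.

```python
import math
from collections import Counter


def knn_classify(query, history, k=3):
    """Classify `query` by majority vote among the k most similar
    records in `history`, a list of (attributes, label) pairs."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank historical records by Euclidean distance to the query record.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset: two numeric attributes and a class label.
history = [
    ((1.0, 1.0), "low_risk"),
    ((1.2, 0.8), "low_risk"),
    ((0.9, 1.1), "low_risk"),
    ((5.0, 5.2), "high_risk"),
    ((5.1, 4.9), "high_risk"),
    ((4.8, 5.0), "high_risk"),
]
print(knn_classify((1.1, 0.9), history, k=3))
```

Other distance measures or weighted votes are equally valid choices; the sketch only shows the
basic mechanism.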

Many of these technologies have been in use for more than a decade in specialized analysis tools
that work with relatively small volumes of data. These capabilities are now evolving to integrate
directly with industry-standard data warehouse and OLAP platforms.

THE TEN STEPS OF DATA MINING

Here is a process for extracting hidden knowledge from your data warehouse, your customer
information file, or any other company database.

1. Identify The Objective -- Before you begin, be clear on what you hope to accomplish with
your analysis. Know in advance the business goal of the data mining. Establish whether or not
the goal is measurable. Some possible goals are to:
• Find sales relationships between specific products or services
• Identify specific purchasing patterns over time
• Identify potential types of customers
• Find product sales trends.

2. Select The Data -- Once you have defined your goal, your next step is to select the data to
meet this goal. This may be a subset of your data warehouse or a data mart that contains
specific product information. It may be your customer information file. Narrow the scope of the
data to be mined as much as possible. Here are some key issues:

• Are the data adequate to describe the phenomena the data mining analysis is attempting to model?
• Can you enhance internal customer records with external lifestyle and demographic data?
• Are the data stable—will the mined attributes be the same after the analysis?
• If you are merging databases, can you find a common field for linking them?
• How current and relevant are the data to the business goal?
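
The question of a common linking field can be made concrete with a small Python sketch. It assumes
both sources are lists of dictionaries that share a key field; the field name customer_id, the
function name link_records, and the sample rows are hypothetical.

```python
def link_records(internal, external, key="customer_id"):
    """Join two lists of dict records on a shared key field.

    Returns (merged, unmatched): merged rows combine the fields of both
    sources; unmatched lists the internal rows with no external partner."""
    lookup = {row[key]: row for row in external}
    merged, unmatched = [], []
    for row in internal:
        match = lookup.get(row[key])
        if match is None:
            unmatched.append(row)
        else:
            # Internal fields win if both sources define the same column.
            merged.append({**match, **row})
    return merged, unmatched


# Hypothetical internal customer file and external demographic data.
customers = [
    {"customer_id": 1, "total_spend": 250.0},
    {"customer_id": 2, "total_spend": 90.0},
]
demographics = [
    {"customer_id": 1, "household_size": 4},
]
merged, unmatched = link_records(customers, demographics)
print(merged)
print(unmatched)
```

The number of unmatched rows is a quick indicator of whether the chosen field is really suitable
for linking the two sources.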

Fig: Steps of Data Mining (1. Identify the objective, 2. Select the data, 3. Prepare the data,
4. Audit the data, 5. Select the tools, 6. Format the solution, 7. Construct the solution,
8. Validate the findings, 9. Deliver the findings, 10. Integrate the solution)

3. Prepare The Data -- Once you've assembled the data, you must decide which attributes to
convert into usable formats. Consider the input of domain experts (the creators and users of the
data).

• Establish strategies for handling missing data, extraneous noise, and outliers.
• Identify redundant variables in the dataset and decide which fields to exclude.
• Decide on a log or square transformation, if necessary.
• Visually inspect the dataset to get a feel for the database.
• Determine the distribution frequencies of the data.

You can postpone some of these decisions until you select a data-mining tool. For example, if
you need a neural network or polynomial network you may have to transform some of your
fields.
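
The preparation decisions in step 3 can be sketched for a single numeric field as follows. This is
one possible strategy, not a prescribed one: it assumes missing values are marked as None, fills
them with the median, clips outliers at a chosen number of standard deviations, and optionally
applies a log transformation. The function name prepare_column and its parameters are hypothetical.

```python
import math
import statistics


def prepare_column(values, clip_sigma=3.0, log_transform=False):
    """Fill missing values (None) with the median, clip outliers to
    mean +/- clip_sigma standard deviations, and optionally apply log1p."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]

    mean = statistics.mean(filled)
    stdev = statistics.pstdev(filled)
    low, high = mean - clip_sigma * stdev, mean + clip_sigma * stdev
    clipped = [min(max(v, low), high) for v in filled]

    if log_transform:
        # log1p(x) = log(1 + x) stays defined at zero; assumes non-negative input.
        clipped = [math.log1p(v) for v in clipped]
    return clipped


# Hypothetical field with one missing value.
print(prepare_column([1.0, None, 3.0]))
```

Whether median imputation or clipping is appropriate depends on the field and on the tool selected
later, which is why some of these decisions can be postponed.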

4. Audit The Data -- Evaluate the structure of your data in order to determine the appropriate
tools.

• What is the ratio of categorical/binary attributes in the database?
• What is the nature and structure of the database?
• What is the overall condition of the dataset?
• What is the distribution of the dataset?

Balance the objective assessment of the structure of your data against your users' need to
understand the findings. Neural nets, for example, don't explain their results.
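
A data audit of the kind described in step 4 can be approximated with a short Python sketch. It
assumes the data are a list of dictionaries with the same fields, treats any non-numeric or boolean
field as categorical/binary, and reports missing values and the most frequent values per field. The
function name audit and the sample records are hypothetical.

```python
from collections import Counter


def audit(records):
    """Summarise each attribute and return (report, categorical_ratio).

    report maps a field name to its kind, missing count, and top values;
    categorical_ratio is the share of fields that are categorical/binary."""
    report = {}
    for field in records[0].keys():
        values = [r.get(field) for r in records]
        present = [v for v in values if v is not None]
        # Booleans are treated as binary attributes, not as numbers.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        report[field] = {
            "kind": "numeric" if numeric else "categorical",
            "missing": len(values) - len(present),
            "top_values": Counter(present).most_common(3),
        }
    categorical = sum(1 for info in report.values() if info["kind"] == "categorical")
    return report, categorical / len(report)


# Hypothetical customer records.
records = [
    {"age": 34, "region": "north", "churned": True},
    {"age": None, "region": "south", "churned": False},
    {"age": 51, "region": "north", "churned": False},
]
report, ratio = audit(records)
print(report)
print(ratio)
```

A high categorical ratio, for example, would point toward tools that handle categorical data well,
which is the kind of guidance the audit is meant to give.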

5. Select The Tools -- Two concerns drive the selection of the appropriate data-mining tool—
your business objectives and your data structure. Both should guide you to the same tool.
Consider these questions when evaluating a set of potential tools.

• Is the data set heavily categorical?
• What platforms do your candidate tools support?
• Are the candidate tools ODBC-compliant?
• What data format can the tools import?

No single tool is likely to provide the answer to your data-mining project. Some tools integrate
several technologies into a suite of statistical analysis programs, a neural network, and a
symbolic classifier.

6. Format The Solution -- In conjunction with your data audit, your business objective and the
selection of your tool determine the format of your solution. The key questions are:

• What is the optimum format of the solution—decision tree, rules, C code, SQL syntax?
• What are the available format options?
• What is the goal of the solution?
• What do the end-users need—graphs, reports, code?

7. Construct The Model -- It is at this point that the data mining process begins. Usually the first
step is to use a random number seed to split the data into a training set and a test set and
construct and evaluate a model. The generation of classification rules, decision trees, clustering
sub-groups, scores, code, weights and evaluation data/error rates takes place at this stage.
Resolve these issues:

• Are error rates at acceptable levels? Can you improve them?
• What extraneous attributes did you find? Can you purge them?
• Is additional data or a different methodology necessary?
• Will you have to train and test a new data set?

8. Validate The Findings -- Share and discuss the results of the analysis with the business
client or domain expert. Ensure that the findings are correct and appropriate to the business
objectives.

• Do the findings make sense?
• Do you have to return to any prior steps to improve results?
• Can you use other data mining tools to replicate the findings?

9. Deliver The Findings -- Provide a final report to the business unit or client. The report
should document the entire data mining process including data preparation, tools used, test
results, source code, and rules. Some of the issues are:

• Will additional data improve the analysis?
• What strategic insight did you discover and how is it applicable?
• What proposals can result from the data mining analysis?
• Do the findings meet the business objective?

10. Integrate The Solution -- Share the findings with all interested end-users in the appropriate
business units. You might wind up incorporating the results of the analysis into the company's
business procedures. Some of the data mining solutions may involve

• SQL syntax for distribution to end-users
• C code incorporated into a production system
• Rules integrated into a decision support system.

Although data mining tools automate database analysis, they can lead to faulty findings and
erroneous conclusions if you're not careful. Bear in mind that data mining is a business process
with a specific goal—to extract a competitive insight from historical records in a database.

DATA MINING APPLICATIONS

• For Financial data analysis

Most banks and financial institutions offer a wide variety of banking services (such as checking,
savings, and business and individual customer transactions), credit (such as business, mortgage, and
automobile loans), and investment services (such as mutual funds). Some also offer insurance
services and stock services. Financial data collected in the banking and financial industry are
often relatively complete, reliable, and of high quality, which facilitates systematic data analysis
and data mining. For example, data mining can help in fraud detection by identifying a group of
people who stage accidents to collect insurance money.

• For Retail Industry

The retail industry collects huge amounts of data on sales, customer shopping history, goods
transportation, consumption, service records, and so on. The quantity of data collected
continues to expand rapidly, especially due to the increasing ease, availability, and popularity of
business conducted on the web, or e-commerce. The retail industry thus provides a rich source for
data mining. Retail data mining can help identify customer behavior, discover customer shopping
patterns and trends, improve the quality of customer service, achieve better customer retention and
satisfaction, enhance goods consumption ratios, design more effective goods transportation and
distribution policies, and reduce the cost of business.

• For Telecommunication Industry

The telecommunication industry has quickly evolved from offering local and long-distance
telephone services to providing many other comprehensive communication services, including voice,
fax, pager, cellular phone, images, e-mail, computer and web data transmission, and other data
traffic. The integration of telecommunication, computer networks, the Internet, and numerous other
means of communication and computing is underway. Moreover, with the deregulation of the
telecommunication industry in many countries and the development of new computer and
communication technologies, the telecommunication market is rapidly expanding and highly
competitive. This creates a great demand for data mining to help understand the business
involved, identify telecommunication patterns, catch fraudulent activities, make better use of
resources, and improve the quality of service.

• Text Mining and Web Mining

Text mining is the process of searching large volumes of documents for certain keywords or key
phrases. By searching literally thousands of documents, various relationships between the
documents can be established. Applied to free-text survey comments, for example, text mining can
derive patterns that may help identify a common set of customer perceptions not captured by the
other survey questions. An extension of text mining is web mining. Web mining is an exciting new
field that integrates data and text mining within a website. It enhances the website with
intelligent behavior, such as suggesting related links or recommending new products to the
consumer. Web mining is especially exciting because it enables tasks that were previously difficult
to implement. Web mining tools can be configured to monitor and gather data from a wide variety of
locations and can analyze the data across one or multiple sites. Search engines, for example, work
on the principles of data mining.

• Higher Education

An important challenge that higher education faces today is predicting paths of students and
alumni. Which student will enroll in particular course programs? Who will need additional
assistance in order to graduate? Meanwhile, additional issues such as enrollment management and
time-to-degree continue to exert pressure on colleges to search for new and faster solutions.
Institutions can better address these students and alumni through the analysis and presentation of
data. Data mining has quickly emerged as a highly desirable tool for using current reporting
capabilities to uncover and understand hidden patterns in vast databases.

• Healthcare

The past decade has seen explosive growth in biomedical research, ranging from the
development of new pharmaceuticals and cancer therapies to the identification and study of the
human genome through the discovery of large-scale sequencing patterns and gene functions. Recent
research in DNA analysis has led to the discovery of genetic causes for many diseases and
disabilities, as well as approaches for disease diagnosis, prevention, and treatment.

TRENDS

As different types of data become available, the variety of data mining tasks and approaches poses
many challenging research issues. The design of standard data mining languages, the development of
effective and efficient data mining methods and systems, the construction of interactive and
integrated data mining environments, and the application of data mining to large application
problems are important tasks for data mining researchers and for data mining system and application
developers. Here we discuss some of the trends in data mining that reflect the pursuit of these
challenges:

• Application exploration: Earlier, data mining was mainly used to help businesses gain
a competitive edge. As data mining becomes more popular, it is gaining wide
acceptance in other fields as well, such as biomedicine, the stock market, fraud detection,
telecommunication, and many more, and many new applications are being explored. In addition,
data mining for business continues to expand as e-commerce and marketing become
mainstream elements of the retail industry. As generic data mining
systems may have limitations in dealing with application-specific problems, we may see a
trend toward the development of more application-specific data mining systems.

• Scalable data mining methods: Current data mining methods are capable of handling only a
particular type of data and a limited amount of data. As data are expanding at a massive
rate, there is a need to develop new data mining methods that are scalable and can handle
different types of data and large volumes of data. Data mining methods should also be more
interactive and user friendly. One important direction toward improving the overall
efficiency of the mining process while increasing user interaction is constraint-based
mining. This gives users more control by allowing the specification and use of
constraints to guide data mining systems in their search for interesting patterns.

• Combination of data mining with database systems, data warehouse systems, and web
database systems: Database systems, data warehouse systems, and the WWW are loaded with
huge amounts of data and have thus become the major information processing systems. It is
important to make sure that data mining serves as an essential data analysis component that can
be easily included in such an information-processing environment. The desired architecture
for a data mining system is tight coupling with database and data warehouse systems.
Transaction management, query processing, online analytical processing, and online analytical
mining should be integrated into one unified framework.

• Standardization of data mining language: Today a few data mining systems are
commercially available, such as Microsoft’s SQL Server 2005, IBM Intelligent Miner,
SAS Enterprise Miner, SGI MineSet, Clementine, DBMiner, and many more. A standard data
mining language or other standardization efforts would facilitate the orderly development of data
mining solutions and improve interoperability among multiple data mining systems and functions.

• Visual data mining: It is rightly said that a picture is worth a thousand words. If the results of
data mining can be shown in visual form, the worth of the mined data is further enhanced.
Visual data mining is an effective way to discover knowledge from huge amounts of data.
The systematic study and development of visual data mining techniques will promote the use
of data mining as a tool for data analysis.

• New methods for mining complex types of data: Complex types of data such as geospatial,
multimedia, time series, sequence, and text data pose an important research area in the field of
data mining. There is still a huge gap between the needs of these applications and the available
technology.

• Web mining: The World Wide Web is a huge, globally distributed collection of
news, advertisements, consumer records, financial, education, government, e-commerce, and
many other services. The WWW also contains a huge and dynamic collection of hyperlinked
information, providing a huge source for data mining. At the same time, the Web
poses great challenges for efficient resource and knowledge discovery.

• Biological data mining: Although biological data mining can be considered under
“application exploration”, the unique combination of complexity, richness, size, and
importance of biological data warrants special attention in data mining. Mining DNA and protein
sequences and mining high-dimensional microarray data are some of the interesting topics for
biological data mining research.

• Data mining and software engineering: As software programs become increasingly bulky in
size and sophisticated in complexity, and tend to originate from the integration of multiple
components developed by different software teams, it is an increasingly challenging task to
ensure software robustness and reliability. The analysis of the executions of a buggy software
program is essentially a data mining process: tracing the data generated during program
executions may disclose important patterns and outliers that may lead to the eventual
automated discovery of software bugs.

• Distributed data mining: Traditional data mining methods, designed to work at a centralized
location, do not work well in many of the distributed computing environments present today
(e.g., the Internet, intranets, local area networks). Advances in distributed data mining methods
are expected.

• Real time data mining: Many applications involving stream data (such as e-commerce,
web mining, stock analysis) require dynamic data mining models to be built in real time.
Additional development is needed in this area.

CONCLUSION

Comprehensive data warehouses that integrate operational data with customer, supplier, and
market information have resulted in an explosion of information. Competition requires timely and
sophisticated analysis on an integrated view of the data. However, there is a growing gap between
more powerful storage and retrieval systems and the users’ ability to effectively analyze and act on
the information they contain. Both relational and OLAP technologies have tremendous capabilities
for navigating massive data warehouses, but brute force navigation of data is not enough. A new
technological leap is needed to structure and prioritize information for specific end-user problems.
The data mining tools can make this leap. Quantifiable business benefits have been proven through
the integration of data mining with current information systems, and new products are on the
horizon that will bring this integration to an even wider audience of users.

Since data mining is a young discipline with wide and diverse applications, there is still a
nontrivial gap between general principles of data mining and domain-specific, effective data
mining tools for particular applications.

This overview has covered a few application domains of data mining (such as finance, the retail
industry, and telecommunication) and trends in data mining, which include further efforts toward the
exploration of new application areas, new methods for handling complex data types, algorithm
scalability, constraint-based mining and visualization methods, the integration of data mining with
data warehousing and database systems, the standardization of data mining languages, and data
privacy protection and security.
