Data Mining

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data
from different perspectives and summarizing it into useful information: information that can be used to
increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for
analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it,
and summarize the relationships identified. Technically, data mining is the process of finding
correlations or patterns among dozens of fields in large relational databases.
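The idea of finding correlations among fields can be illustrated with a minimal sketch. It assumes a small in-memory table with hypothetical field names (price, units_sold, staff) and made-up values, and it uses the Pearson coefficient as one possible measure of correlation; the source does not name a specific measure.

```python
import math
from itertools import combinations

# Hypothetical table: each key is a field, each list is that field's column.
# The field names and values are illustrative assumptions, not source data.
fields = {
    "price":      [10.0, 12.0, 14.0, 16.0, 18.0],
    "units_sold": [50.0, 44.0, 41.0, 33.0, 30.0],
    "staff":      [3.0, 5.0, 2.0, 4.0, 3.0],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def ranked_correlations(table):
    """Score every pair of fields and sort by strength of correlation."""
    scored = [
        (a, b, pearson(table[a], table[b]))
        for a, b in combinations(table, 2)
    ]
    return sorted(scored, key=lambda t: abs(t[2]), reverse=True)

for a, b, r in ranked_correlations(fields):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

With these sample values the strongest relationship is the negative one between price and units sold. A real database with dozens of fields would produce many more pairs, which is why the ranking step matters.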
What can data mining do?
Data mining is primarily used today by companies with a strong consumer focus - retail,
financial, communication, and marketing organizations. It enables these companies to determine
relationships among "internal" factors such as price, product positioning, or staff skills, and
"external" factors such as economic indicators, competition, and customer demographics. And, it
enables them to determine the impact on sales, customer satisfaction, and corporate profits.
Finally, it enables them to "drill down" into summary information to view detail transactional
data.
With data mining, a retailer could use point-of-sale records of customer purchases to send
targeted promotions based on an individual's purchase history. By mining demographic data
from comment or warranty cards, the retailer could develop products and promotions to appeal to
specific customer segments.
How does data mining work?
While large-scale information technology has been evolving separate transaction and analytical
systems, data mining provides the link between the two. Data mining software analyzes
relationships and patterns in stored transaction data based on open-ended user queries. Several
types of analytical software are available: statistical, machine learning, and neural networks.
Generally, any of four types of relationships are sought:
• Classes: Stored data is used to locate data in predetermined groups. For example, a
restaurant chain could mine customer purchase data to determine when customers visit
and what they typically order. This information could be used to increase traffic by
having daily specials.
• Clusters: Data items are grouped according to logical relationships or consumer
preferences. For example, data can be mined to identify market segments or consumer
affinities.
• Associations: Data can be mined to identify associations. The beer-diaper example is an
example of associative mining.
• Sequential patterns: Data is mined to anticipate behavior patterns and trends. For
example, an outdoor equipment retailer could predict the likelihood of a backpack being
purchased based on a consumer's purchase of sleeping bags and hiking shoes.
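As an illustration of the associations item in the list above, the following minimal sketch scores item pairs in a handful of shopping baskets. The baskets are hypothetical, and the two measures used (support, the share of baskets containing both items, and confidence, the share of baskets with the first item that also contain the second) are common choices for this kind of mining rather than anything the text prescribes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets (illustrative data only).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def pair_rules(baskets, min_support=0.3):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair appears too rarely to be worth reporting
        # Each frequent pair yields a rule in both directions.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules

for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

This only looks at pairs and scans every basket directly, which is fine for a sketch but would not scale to the transaction volumes discussed later in the article.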
Data mining consists of five major elements:
• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyze the data by application software.
• Present the data in a useful format, such as a graph or table.
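The five elements above can be sketched end to end on a toy scale. This is only an illustration under stated assumptions: the raw records are invented, the function names (transform, load, analyze, present) are hypothetical, and an in-memory SQLite table stands in for the data warehouse, although SQLite is a relational store and not a multidimensional database system.

```python
import sqlite3

# Hypothetical raw transaction records: (day, region, product, amount as text).
raw_rows = [
    ("2024-01-05", "north", "tent", "129.00"),
    ("2024-01-05", "south", "backpack", "59.50"),
    ("2024-01-06", "north", "backpack", "59.50"),
    ("2024-01-06", "north", "tent", "129.00"),
]

def transform(row):
    """Clean one extracted record: normalize the region, cast the amount."""
    day, region, product, amount = row
    return (day, region.upper(), product, float(amount))

def load(conn, rows):
    """Load cleaned rows into the stand-in warehouse table."""
    conn.execute(
        "CREATE TABLE sales (day TEXT, region TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

def analyze(conn):
    """Summarize revenue along two dimensions: region and product."""
    return conn.execute(
        "SELECT region, product, SUM(amount) FROM sales "
        "GROUP BY region, product ORDER BY region, product"
    ).fetchall()

def present(summary):
    """Format the summary as a plain-text table."""
    lines = [f"{'region':<8}{'product':<10}{'revenue':>10}"]
    lines += [f"{r:<8}{p:<10}{total:>10.2f}" for r, p, total in summary]
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")          # store and manage
load(conn, [transform(r) for r in raw_rows])  # extract, transform, load
summary = analyze(conn)                     # access and analyze
print(present(summary))                     # present
```

Keeping each element as its own function mirrors the list: any one stage could be swapped for a production tool without touching the others.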
Different levels of analysis are available:
• Artificial neural networks: Non-linear predictive models that learn through training and
resemble biological neural networks in structure.
• Genetic algorithms: Optimization techniques that use processes such as genetic
combination, mutation, and natural selection in a design based on the concepts of natural
evolution.
• Decision trees: Tree-shaped structures that represent sets of decisions. These decisions
generate rules for the classification of a dataset. Specific decision tree methods include
Classification and Regression Trees (CART) and Chi Square Automatic Interaction
Detection (CHAID). CART and CHAID are decision tree techniques used for
classification of a dataset. They provide a set of rules that you can apply to a new
(unclassified) dataset to predict which records will have a given outcome. CART
segments a dataset by creating 2-way splits while CHAID segments using chi square tests
to create multi-way splits. CART typically requires less data preparation than CHAID.
• Nearest neighbor method: A technique that classifies each record in a dataset based on a
combination of the classes of the k record(s) most similar to it in a historical dataset
(where k ≥ 1). Sometimes called the k-nearest neighbor technique.
• Rule induction: The extraction of useful if-then rules from data based on statistical
significance.
• Data visualization: The visual interpretation of complex relationships in
multidimensional data. Graphics tools are used to illustrate data relationships.
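Of the techniques listed above, the nearest neighbor method is the simplest to show in code. The sketch below assumes a tiny hypothetical historical dataset of customers described by two numeric features, Euclidean distance as the similarity measure, and a majority vote as the way of combining the neighbors' classes; none of these specifics come from the source.

```python
import math
from collections import Counter

def knn_classify(record, history, k=3):
    """Classify `record` by majority vote among its k nearest labeled neighbors.

    `history` is a list of (features, label) pairs; similarity is measured
    with Euclidean distance, which is an assumption of this sketch.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    nearest = sorted(history, key=lambda item: math.dist(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: (annual visits, average spend) -> segment.
history = [
    ((2, 20.0), "occasional"),
    ((3, 25.0), "occasional"),
    ((4, 22.0), "occasional"),
    ((20, 80.0), "loyal"),
    ((25, 95.0), "loyal"),
    ((22, 70.0), "loyal"),
]

# A new, unclassified customer is assigned the class of similar past customers.
print(knn_classify((21, 85.0), history, k=3))
```

In practice the features would usually be rescaled first so that one large-valued field does not dominate the distance; that step is left out here to keep the example short.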

What technological infrastructure is required?
Today, data mining applications are available on systems of all sizes for mainframe, client/server,
and PC platforms. System prices range from several thousand dollars for the smallest
applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally
range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver
applications exceeding 100 terabytes. There are two critical technological drivers:
• Size of the database: the more data being processed and maintained, the more powerful
the system required.
• Query complexity: the more complex the queries and the greater the number of queries
being processed, the more powerful the system required.
Relational database storage and management technology is adequate for many data mining
applications less than 50 gigabytes. However, this infrastructure needs to be significantly
enhanced to support larger applications. Some vendors have added extensive indexing
capabilities to improve query performance. Others use new hardware architectures such as
Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query
time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to
achieve performance levels exceeding those of the largest supercomputers.
