Mod II Data Mining

1
DATA MINING
Data mining is the process of unearthing useful patterns and relationships in large volumes of data. A
sophisticated data search capability that uses statistical algorithms to uncover patterns and correlations,
data mining extracts knowledge buried in corporate data warehouses.
Role of Data Mining in CRM
Although it is still a relatively new technology, businesses from all industry verticals i.e. healthcare,
manufacturing, financial, transportation, etc. have invested in data mining technology to take advantage
of historical data. Data mining techniques in CRM assist your business in finding and selecting the
relevant information that can then be used to get a holistic view of the customer life-cycle; this comprises
of four stages: customer identification, attraction, retention, and development. The more data there is in
the database, the more accurate the models will be created and their subsequent use will result in more
business value.
Data mining typically involves the use of predictive modeling, forecasting and descriptive modeling
techniques as its key elements. Exploiting CRM in this age of data analytics enables an organization to
manage customer retention, select the right prospects & customer segments, set optimal pricing policies,
and objectively measure and rank the suppliers best suited for their needs.
Data Mining Implementation Process
1. Business understanding: In this phase, business and data-mining goals are established.
 First, you need to understand business and client objectives. You need to define what your
client wants (which many times even they do not know themselves)
 Take stock of the current data mining scenario. Factor in resources, assumption, constraints,
and other significant factors into your assessment.
 Using business objectives and current scenario, define your data mining goals.
 A good data mining plan is very detailed and should be developed to accomplish both
business and data mining goals.
2. Data understanding: In this phase, sanity check on data is performed to check whether its
appropriate for the data mining goals.
 First, data is collected from multiple data sources available in the organization.
 These data sources may include multiple databases, flat filer or data cubes. There are issues
like object matching and schema integration which can arise during Data Integration process.
It is a quite complex and tricky process as data from various sources unlikely to match easily.
For example, table A contains an entity named cust_no whereas another table B contains an
entity named cust-id.
 Therefore, it is quite difficult to ensure that both of these given objects refer to the same value
or not. Here, Metadata should be used to reduce errors in the data integration process.
 Next, the step is to search for properties of acquired data. A good way to explore the data is to
answer the data mining questions (decided in business phase) using the query, reporting, and
visualization tools.
2
 Based on the results of query, the data quality should be ascertained. Missing data if any
should be acquired.
3. Data preparation: In this phase, data is made production ready.

 The data preparation process consumes about 90% of the time of the project.
 The data from different sources should be selected, cleaned, transformed, formatted,
anonymized, and constructed (if required).
 Data cleaning is a process to "clean" the data by smoothing noisy data and filling in missing
values.
 For example, for a customer demographics profile, age data is missing. The data is incomplete
and should be filled. In some cases, there could be data outliers. For instance, age has a value
300. Data could be inconsistent. For instance, name of the customer is different in different
tables.
 Data transformation operations change the data to make it useful in data mining. Following
transformation can be applied.
4. Data transformation: Data transformation operations would contribute toward the success of the
mining process.
 Smoothing: It helps to remove noise from the data.
 Aggregation: Summary or aggregation operations are applied to the data. I.e., the weekly
sales data is aggregated to calculate the monthly and yearly total.
 Generalization: In this step, Low-level data is replaced by higher-level concepts with the
help of concept hierarchies. For example, the city is replaced by the county.
 Normalization: Normalization performed when the attribute data are scaled up o scaled
down. Example: Data should fall in the range -2.0 to 2.0 post-normalization.
The result of this process is a final data set that can be used in modeling.
5. Modelling: In this phase, mathematical models are used to determine data patterns.
 Based on the business objectives, suitable modeling techniques should be selected for the
prepared dataset.
 Create a scenario to test check the quality and validity of the model.
 Run the model on the prepared dataset.
 Results should be assessed by all stakeholders to make sure that model can meet data mining
objectives.
6. Evaluation: In this phase, patterns identified are evaluated against the business objectives.
 Results generated by the data mining model should be evaluated against the business
objectives.
 Gaining business understanding is an iterative process. In fact, while understanding, new
business requirements may be raised because of data mining.
 A go or no-go decision is taken to move the model in the deployment phase.
3
7. Deployment: In this phase, you ship your data mining discoveries to everyday business
operations.
 The knowledge or information discovered during data mining process should be made easy to
understand for non-technical stakeholders.
 A detailed deployment plan, for shipping, maintenance, and monitoring of data mining
discoveries is created.
 A final project report is created with lessons learned and key experiences during the project.
This helps to improve the organization's business policy.
INFORMATION REQUIREMENTS OF AN EFFECTIVE CRM SOLUTION

The employees of a firm employing CRM would require rich information about their firm and
customer base including:
• Information about the market
• Information about the firm
• The current segment
• Demographic Distribution (by age, sex, education, income, marital status, etc)
• The firm’s best customers and the segment they belong to, products they buy, preferences,
habits and tastes of each segment.
• Individual level information consisting of:
 Customer personal details such as name, address, family details, education, etc
 The customer group /segment to which the individual belongs
 History of present and past behavior
 Likes, dislikes, habits and preferences
 Events coming up in their personal life etc.
Here’s how data mining might be used in a few key business functions:
 Finance: Use data insights to create accurate risk models for lending, mergers/acquisitions, and
uncovering fraudulent activities
 IT Operations: Collect, process, and analyze massive volumes of application, network, and
infrastructure data to discover insights for IT system security and network performance
 Marketing: Surface previously hidden buyer behavior trends and predict future behaviors to
develop more accurate buyer personas, create more targeted campaigns that increase engagement,
and promote new products or services
 Human Resources: Mine job application data to provide a comprehensive view of a candidate.
Identify the best match for each open role using data analytics to evaluate education, experience,
skills, previous job titles, certifications, and geography.
Applications of Data Mining in CRM

4
 Basket Analysis: Ascertain which items customers tend to purchase together. This knowledge
can improve stocking, store layout strategies, and promotions.
 Sales Forecasting: Examining time-based patterns helps businesses make re-stocking decisions.
Furthermore, it helps you in supply chain management, financial management and gives complete
control over internal operations.
 Database Marketing: Retailers can design profiles of customers based on demographics, tastes,
preferences, buying behavior, etc. It will also aid the marketing team in designing personalized
marketing campaigns and promotional offers. This will result in enhanced productivity, optimal
allocation of the company’s resources and desirable ROI.
 Predictive Life-Cycle Management: Data mining helps an organization predict each customer’s
lifetime value and to service each segment appropriately.
 Market Segmentation: Learn which customers are interested in purchasing your products and
design your marketing campaigns and promotions keeping their tastes and preferences in mind.
This will increase efficiency and result in the desired ROI since you won’t be targeting customers
who show little to no interest in your product.
 Product Customization: Manufacturers can customize products according to the exact needs of
customers. In order to do this, they must be able to predict which features should be bundled to
meet customer demand.
 Fraud Detection: By analyzing past transactions that were later determined to be fraudulent, a
business can take corrective measures and stop such events from occurring in the future. Banks
and other financial institutions will benefit from this feature immensely, by reducing the number
of bad debts.
 Warranties: Manufacturers need to predict the number of customers who will make warranty
claims and the average cost of those claims. This will ensure efficient and effective management
of company funds.
5
LEVELS OF DATA MINING OPERATIONS

Depending on the level at which data mining is being done, the data mining operations are divided into
two categories:
 The aggregate or the macro level: In this level, without looking at any customer in particular,
the general customer behavior, trends and preferences are found from a large database, such as
market basket purchases from a retail store. Mining at the macro level gives us a broad overview
of the data e.g. when customer of the retail store are segmented by profitability criteria, we obtain
clusters who are profitable to various extent. Knowledge obtained by mining at macro level is
useful when dealing with situations where:
o We are dealing with a customer about whom we do not have individual information:
Hence,
we need to extrapolate the characteristics of the group to which he/she might belong. In
retail store example, a store can segment its customers on basis of age and characteristics
can be extracted. When a new customer enters the store, the salesman could use his
intuition
in arriving at the customer’s age and recover the characteristics of that age group such as
the frequently bought products, colour preferences, etc.
o Targeting new set of customers: If the retail chain has opened a new store it can use the
data
from the most similar current store to predict the behavior of the new prospects.
o We are dealing with aspects of the service, which influence a majority of the customer and
therefore cannot be customized to suit individual tastes, example being the design of the
physical layout of a retail store.
o Predicting the possibility of an action that the customer has never undertaken. A customer
might not have tried out a new product because he/she was not aware of it. A salesman can
encourage him/her to try out the product if his/her profile matches that of the current
product
users.
 The Individual or Micro level: This level tries to track a particular customer and try further the
relationship by acting in a timely way to his/her choice and preferences. As the interactions of the
individual with the firm increases, the firm obtains more data about him/her. Offering
individualized value adding propositions can strengthen relationship with the individual customer.
For this, we need to track the customers and mine their data at the individual or micro level. Some
important features to note about mining at this level are:
6
o Micro-level mining provides specific information about a particular customer. For

example, the retail store can go to the extent of finding out the preferred colours of his
shirt
o A firm takes up micro level mining to build a detailed customer profile of a regular
customer.
o Data mining at this level might be expensive if the data mining tool has to collect
individual information from a large database. Having a separate database for profitable
customer might be helpful.
Knowledge obtained at the individual level is useful in situations where:
o The firm wants to customize its offering to the customer based on the customer’s tastes
and preferences e.g. the retail store can offer discounts on the purchase of a bundle of
products that the customer prefers buying together.
o The firm wants to assist the purchase of a new product based on the information it has of
the last purchase. For example, if a customer has bought a suit in his visit, then the store
might offer a discount on the purchase of a tie of a matching colour.
o The firm wants to take advantage of the personal events in a customer’s life (e.g.
birthdays, anniversaries, birth of child etc.) to further cement the precious relationship.
o Current patterns that go against usually observed customer behavior point to interesting
phenomenon. If retail customer suddenly switches brand then he/she might not be satisfied
with the last purchase.
The most common operations used at this level are: -

 Classification: Classification is a process that maps a given data item into one of
the several predefined classes. CRM uses classification for a variety of purposes
like behavior prediction, product and customer categorization.
 Regression: Regression is the operation of learning a function that predicts the
value of a real valued dependent variable based on values of other independent
variables. Regression finds application in a CRM environment where prediction
needs to be made about the behavior regarding real value-added variables. Suppose
the retail store collects data on the monthly visits of the customers viz. frequency,
time spent on each visit, and purchases made during each visit. If the manager has
a strong intuition that total purchase is linked to frequency of visit, then this
situation can be modeled by regression. This model can then be used to predict
future purchases of a customer. Regression needs sufficient amount of data to be
reliable and valid.
 Link Analysis: Link Analysis seeks to establish relationship between items or
variables in a database record to expose patterns and trends. Link analysis can also
trace connections between items of record over time. The most important link
analysis application in CRM, called market basket analysis, is an operation that
seeks relationship between product items characterizing product affinities or buyer
preferences. The retail store collects thousands of interactions daily. A link
analysis task performed on this data will point to items that are bought together
7
e.g. bread and butter are bought together rather than bread and orange juice. Such
information can be used to design store layouts, design coupons, etc.
 Segmentation: Segmentation aims to identify a finite set of naturally occurring
clusters or categories to describe data.
 Deviation Detection: Deviation Detection (DD) focuses on discovering the most
significant changes in the data from previously measured, expected or normative
values. Most CRM solutions have a DD task running in parallel on a regular basis.
Suppose a retailer finds out that the sales from a particular section of the store have
been much less than expected. This deviation on further analysis points out to non-
stocking of a popular brand.
Tools such as decision trees, rule induction, case-based reasoning, visualization

techniques, nearest neighbor techniques, clustering algorithms, etc are used for the
above purposes.

Mod II Data Mining

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Mod II Data Mining

Uploaded by

Copyright:

Available Formats

1

Data Mining Implementation Process

3. Data preparation: In this phase, data is made production ready.

INFORMATION REQUIREMENTS OF AN EFFECTIVE CRM SOLUTION

Applications of Data Mining in CRM

LEVELS OF DATA MINING OPERATIONS

o Micro-level mining provides specific information about a particular customer. For

The most common operations used at this level are: -

Tools such as decision trees, rule induction, case-based reasoning, visualization

You might also like