You are on page 1of 40

Management 

Information System
Saini Das
Vinod Gupta School of Management, IIT Kharagpur

Module 02: Foundations of Business Analytics – Part 1
Lecture 01 : Datamining
‐ W. Edwards Deming
“There is a striking correlation between an
organization's analytics sophistication and its
competitive performance.”

10 Insights: A first look at the new intelligent enterprise survey on winning with data,
MIT Sloan Management Review, Vol 52, No 1, 2010
“Data Scientists will be the sexiest job of 
21st century” 

Harvard Business Review 2012 
Why mine data?

 Huge amounts of data being collected and warehoused
 Walmart records 20 million items in transactions per day
 Health care transactions: multi‐gigabyte databases

 Source of competitive advantage
What is Data Mining and KDD?

• Knowledge discovery in databases (KDD) is the non‐trivial 
process of identifying valid, potentially useful, 
understandable and ultimately actionable patterns in data.

• Data mining is a step in the KDD process of applying data 
analytics and discovery algorithms
Knowledge Discovery Process

Integration Understanding

Interpretation Knowledge
& Evaluation

Knowledge
Raw
Data __ __ __
Patterns
__ __ __
__ __ __ and
Rules
Transformed
DATA Target Data
Ware Data
house
Information Requirements of Key Decision‐Making 
Groups in a Firm
Stages In Decision Making
Data Mining is multidisciplinary

Statistics Machine learning

Data Mining

Database
systems
Data Mining….

Data mining is an information-analysis tool that


involves the automated discovery of patterns and
relationships in a data warehouse.
The University of Maryland- “forecast terrorist behavior
based on past actions”
Used extensively in marketing to improve
customer retention;
Pricing analysis; and customer segmentation analysis
E-commerce presents another major opportunity for
effective use of data mining.
DataMining….

Predictive analysis is a form of data mining that


combines historical data with assumptions about future
conditions to predict outcomes of events
My space, Facebook and others mines the data of all of its
members to determine which ads should be displayed for
each member to attract the maximum attention and hits.
Police Departments uses predictive analysis to predict
“when and where crimes were likely to occur, so officers
can be on hand to prevent their occurrence
Data Mining….
What cannot be obtained  using OLAP :

• Finds hidden patterns, relationships in datasets
• Example: customer buying patterns
• Infers rules to predict future behavior
• Types of information obtainable from data mining:
• Associations: Occurrences linked to single event
• Sequences: Events linked over time
• Classification: Recognizes patterns that describe group to which item belongs
• Clustering: Similar to classification when no groups have been defined; finds 
groupings within data
• Forecasting: Uses series of existing values to forecast what other values will be
13
Data Mining
 Data mining is the process of extracting patterns from data that is used in a wide range of uses such as 
marketing surveillance, fraud detection and scientific discovery.
 Data Warehousing provides Enterprise with memory, Data Mining provides Enterprise with intelligence. 
 Data mining can discover valid, novel, potentially useful, and ultimately understandable patterns in data. 

Data Mining Steps
 Data Preparation
 Model Construction
 Results validation

Data Mining Vendors

• Major players: Clementine, IBM’s Intelligent Miner, SGI’s MineSet, SAS’s Enterprise Miner.
• Other players: DataMind (NeurOagent), Information Discovery (IDIS), SAS Institute (SAS/ Neuronets) 
Text Mining and Web Mining
• Text mining
• Extracts key elements from large unstructured data sets 
• Sentiment analysis software
• Web mining
• Discovery and analysis of useful patterns and information from web
• Web content mining
• Web structure mining
• Web usage mining

15
Databases and the Web
–Many companies use the web to make some internal databases 
available to customers or partners
–Typical configuration includes:
• Web server
• Application server/middleware / Datawarehouse
• Database server (hosting DBMS)
–Advantages of using the web for database access:
• Ease of use of browser software
• Web interface requires few or no changes to database
• Inexpensive to add web interface to system

16
Establishing an Information Policy
• Firm’s rules, procedures, roles for sharing, managing, standardizing 
data
• Data administration 
• Establishes policies and procedures to manage data
• Data governance
• Deals with policies and processes for managing availability, usability, integrity, 
and security of data, especially regarding government regulations
• Database administration
• Creating and maintaining database

17
Ensuring Data Quality 
• More than 25 percent of critical data in Fortune 1000 company 
databases are inaccurate or incomplete
• Before new database is in place, a firm must:
• Identify and correct faulty data 
• Establish better routines for editing data once database in operation
• Data quality audit
• Data cleansing

18
Data Mining Applications

Typical Applications
Customer Segmentation What
Which
WhatHow
Which
are
isofcan Imost
customers
my
the tellchannel
mymarket
best which
are
valuable
What is the life time
segments
goodto
transactions
reach
candidates
and
customers
who
my customers
are
for
are
myare
our
at likely
customers
new
risk to
in each
of
long
Propensity to Buy profitability of my customers ?
be
distance
marketfraudulent
byleaving
segment
calling
segment? ??
? plans ?
Profitability Modeling & Profiling
Customer Attrition
Channel Optimization
Fraud Detection

Targeting customers
Personalize
Increase
Prevent
Interact loss
w/customers
high
customer
ofvalue
highbased on
customers
value their needs.
relationships.
based
customers
onbased
Detect and prevent fraud to minimize loss.
More product
Higher
on
and
their
current
let sales
preference. = value
satisfaction
go of
& lower
future Greater
= loyalty
profitability.
Higher
customers.
retention
19
Industry wide applications of analytics
Some other applications….
• Marketing
• Which customers are likely to respond to this campaign?
• Which customers are likely to be profitable ?
• Who might want to buy this product ? (‘Cross-selling’)
• Which web-pages are customers visiting before buying products or
before leaving our site ? Which types of customers visit which pages ?
• Telecommunications
• Which customers are vulnerable to attrition (at risk of churning) ?
• Based on these symptoms, where are problems located in the network ?
• Finance and Insurance
• Which customers are credit risks / insurance risks ?
• Which claims or credit transactions are fraudulent ?
• Which stocks are likely to perform well in the next 3 months, and why ?
When should we buy and sell, given the likely performance and
transaction costs ?

21
21
Some other applications (Contd..)
• Healthcare
• Which patients may take longer to recover ?
• What is the likely cause of this illness ?
• Which patients are at risk of disease (and might benefit from
medication)? Pfizer pharmaceuticals used data mining to construct a
predictive model that was then embedded in their online cholesterol
health risk assessment, which tells patients their cholesterol risk score.
High risk patients can consult their doctors and request Lipitor, Pfizer’s
cholesterol medication.

• Retail
• Which products do customers buy together (or in sequence) & which do
they not buy together ? (‘Category management’.)
• What characterizes customers at various stores ?
• What items are bought for cash, on credit, or by check ?
• What type of customer buys this item, or this product type ?

22
22
Data Mining Applications (Contd..)

• Quality Control
• Which shipments are high-risk and need to be inspected ?
• Customer Support
• Which tasks schedule (ordering) is optimal (or good enough) ?
• Which customer service representative should I assign to a task ?
• What documents or people are likely to be helpful to the customer in
solving their problem ?

23
23
Real‐life Applications…

Data Mining helps to predict your most likely purchases 
So keep beer bottles close to  Diapers
Target predicts customer pregnancy

https://www.youtube.com/watch?v=XH1wQEgROg4

26
Categories of Data Analytics

27
Descriptive Analytics Applications

28
Diagnostic Analytics Applications

 Why customers liked your social media campaign or why they 
didn’t?

 Why certain products were popular at a certain time, at a certain 
place?

29
Predictive Analytics Applications

30
Prescriptive Analytics Applications

31
Analytics Tools & Techniques
Data 
Analytics

Descriptive  Diagnostic  Predictive  Prescriptive 


analytics analytics analytics analytics

Data Visualization Root cause Regression Optimization


analysis
Clustering Forecasting

Classification
Decision tree
Neural nets
SVM
Logistic Regression

Association Rule
Mining
Descriptive Analytics Techniques
 Data Visualization: Graphical representation of data. By using visual elements 
like charts, graphs, and maps, data visualization tools provide an accessible way 
to see and understand trends, outliers, and patterns in data.
 Software for Data Visualization: MS Excel; Power BI, Tableau; QlikView

33
Descriptive Analytics Techniques (contd..) Intercluster distances
are maximized

 Clustering: Given a set of data points, each having a set of 
attributes, find clusters such that:
 data points in one cluster are more similar to one another
 data points in separate clusters are less similar to one another.
Intracluster distances
are minimized
 Application in Market Segmentation: Subdivide a market into 
distinct subsets of customers based on their geographical and 
lifestyle related information where any subset may conceivably be 
selected as a market target to be reached with a distinct marketing 
mix.

34
Predictive Analytics Techniques
 Regression analysis: Regression analysis is a statistical technique 
used to describe relationships among variables.  The simplest case to 
examine is one in which a variable Y , referred to as the dependent or 
target variable, may be related to one variable X, called an 
independent or explanatory variable.

 Applications:
 Predicting events that are yet to occur:
 Demand analysis or number of units consumers will purchase in the next 
quarter based on past trends
 Number of shoppers who will watch a particular advertisement.
 Number of policy holders who will be involved in accidents in the next year.

35
Predictive Analytics Techniques (contd..)
• Classification: Data defined in terms of attributes, one of which is the class or
target and others are predictor variables.
• Find a model for class/target attribute as a function of the values of
other(predictor) attributes, such that previously unseen records can be
assigned a class as accurately as possible. Given data is usually divided into
training and test sets.
• Training Data: used to build the model
• Test data: used to validate the model (determine accuracy of the model).
• Classification Techniques:
• Decision tree
• Neural Networks
• Support Vector Machines
• Bayesian Classifier

36
Predictive Analytics Techniques (contd..)
 Application 1: Given old data about fraudulent customers and their background, predict whether new
customer will commit fraud or not.
Class/
Predictor attributes
Target
Tid Refund Marital Taxable
Status Income Cheat Actual

1 Yes Single 125K No


Predicted by
2 No Married 100K No algorithm
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No Test
Set
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
Training
Learn
10 No Single 90K Yes Model
10

Set Classifier

37
Predictive Analytics Techniques (contd..)
• Application 2: Given old data about customers and payments, predict new 
applicant’s loan eligibility.

Previous customers Classifier Decision rules Class/


Salary > 5 L Target
Age Eligible/
Salary Not Eligible
Prof. =  Exec
Profession
Location
Customer type Loc =  Mumbai

Custype =   Age > 40
Eligible

New applicant’s data

38
Predictive Analytics Techniques (contd..)
• Association Rule Mining: Finding frequent patterns:
• Analysing item sets in customers basket or transactions and identifying 
frequently occurring item sets, which can be basis for recommendations.
• Known as Market Basket Analysis in Retail.

 Applications:
 Store layout: Place the frequently occurring items either in 
close proximity or as far apart as possible; give combo offers.
 Warehousing and inventory management.

39
THANK YOU!

40

You might also like