
Data Mining and Its Importance in IT Industry
JOCELYN B. BARBOSA, CoE, MSIT
IT Faculty
Data, Data Everywhere...

The Information Age (also known as the Computer Age, Digital Age, or New
Media Age) is a period in human history characterized by the shift from
traditional industrialization to an economy based on information
computerization.
Data: stored representations of meaningful objects and events
Structured: numbers, text, dates
Unstructured: images, video, documents

Facts and statistics collected together for reference or analysis.

Database: organized collection of logically related data
2.5 quintillion bytes of data (2.5 billion billion)

[Chart: data stored, 2000 to 2015. Callout: "We made this in the last 5 years"]

The amount of data stored in databases is growing tremendously!

There has been a huge increase in the amount of data being stored in
databases over the past twenty years.
Create, store, control, find, access

To collect, store and process:
Personal details
Emails
Videos
Social networks
Documents
Contacts
Instant messages

All that users want is more sophisticated information.

This drives up the demand for data mining.

What is Data mining?
"Data mining refers to extracting or mining knowledge from large amounts of
data." - Jiawei Han and Micheline Kamber

What is Data mining?
Searching through large amounts of data for
correlations, sequences, and trends.

Current driving applications in sales (targeted marketing, inventory)
and finance (stock picking)
Select information to be mined -> Choose mining tool (based on type of results wanted) -> Evaluate results

Information: sales data
Mining tools: cluster, sequence, classify, inference
Result: 70% of customers who purchase comforters later purchase curtains
What is Data Mining?
History
Knowledge Discovery in Databases workshops started in 1989
Now a conference under the auspices of ACM SIGKDD
IEEE conference series starting 2001
Key founders / technology contributors:
Usama Fayyad, JPL (then Microsoft, now has his own company, Digimine)
Gregory Piatetsky-Shapiro (then GTE, now his own data mining consulting company, Knowledge Stream Partners)
Rakesh Agrawal (IBM Research)

The term "data mining" has been around since at least 1983, as a pejorative term in the statistics community
Knowledge Discovery in Databases: Process

Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Data Mining -> Patterns -> Interpretation/Evaluation -> Knowledge

adapted from:
U. Fayyad, et al. (1995), From Knowledge Discovery to Data Mining: An Overview,
Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press

See also: http://www.crisp-dm.org

Data mining focuses on extraction of information from a large set of data and
transforms it into an easily interpretable structure for further use.
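
The stages of the process (selection, preprocessing, data mining, interpretation/evaluation) can be sketched on a toy data set. This is only an illustration under stated assumptions: the records, the field names and the `min_count` cut-off are all hypothetical, and simple frequency counting stands in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"customer": "a", "country": "PH", "item": "fish", "note": "x"},
    {"customer": "b", "country": "PH", "item": "wine", "note": "y"},
    {"customer": "c", "country": None, "item": "fish", "note": "z"},
    {"customer": "d", "country": "US", "item": None, "note": "w"},
    {"customer": "e", "country": "US", "item": "fish", "note": "v"},
    {"customer": "f", "country": "PH", "item": "fish", "note": "u"},
]


def select(records, fields):
    """Selection: keep only the fields relevant to the question (target data)."""
    return [{f: r[f] for f in fields} for r in records]


def preprocess(records):
    """Preprocessing: drop records that have a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def mine(records):
    """Data mining (stand-in): count how often each (country, item) pair occurs."""
    return Counter((r["country"], r["item"]) for r in records)


def evaluate(patterns, min_count):
    """Interpretation/evaluation: keep patterns seen at least min_count times."""
    return {p: n for p, n in patterns.items() if n >= min_count}


target = select(RAW, ["country", "item"])
clean = preprocess(target)
patterns = mine(clean)
knowledge = evaluate(patterns, min_count=2)
print(knowledge)
```

Each function corresponds to one stage, so a stage can be replaced (for example, a clustering step instead of counting) without touching the others.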
Why Data mining?

I am not able to find the data I need. (Data is dispersed over the network)

The data I have access to is poorly documented. (Proper information is missing)

I am not able to use the data I have. (Unexpected results)

Example of Data mining in IT:

A company that offers online services in several countries has a problem
with the loading time of its homepage.
Even the IT department's employees could not solve the bug. Based on
customer feedback and last-mile measurements, the company learns that,
for some users, the homepage takes too long to load.

Management has a strong interest in solving the problem because the
response time of the site directly affects online sales. The IT department
has already looked into the log files with the help of a mining tool.

It has been discovered that there is no single typical problem. Instead,
the data mining program discovers that there are various problem clusters.

The pattern for one problem is determined by a combination of country
code, operating system, and the browser version that the customers use.

Another pattern shows that the same bad performance in loading times of
the website is caused by a different problem.
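
A rough sketch of how such problem groups could be surfaced from log data follows. It assumes hypothetical log records and a hypothetical 3-second threshold, and it uses a plain group-by average as a stand-in for whatever the mining tool in the example actually does.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log records:
# (country code, operating system, browser version, load time in seconds)
LOGS = [
    ("DE", "Windows", "browser-9", 1.2),
    ("DE", "Windows", "browser-9", 1.4),
    ("BR", "Windows", "browser-7", 6.8),
    ("BR", "Windows", "browser-7", 7.4),
    ("BR", "Linux", "browser-9", 1.1),
    ("IN", "Android", "browser-8", 5.9),
    ("IN", "Android", "browser-8", 6.3),
    ("US", "macOS", "browser-9", 0.9),
]


def slow_segments(logs, threshold):
    """Average the load time per (country, OS, browser) combination and
    return the combinations whose average exceeds the threshold, slowest first."""
    groups = defaultdict(list)
    for country, os_name, browser, seconds in logs:
        groups[(country, os_name, browser)].append(seconds)
    averages = {key: mean(times) for key, times in groups.items()}
    slow = {key: avg for key, avg in averages.items() if avg > threshold}
    return dict(sorted(slow.items(), key=lambda kv: kv[1], reverse=True))


slow = slow_segments(LOGS, threshold=3.0)
for segment, avg in slow.items():
    print(segment, round(avg, 2))
```

With this toy data, two separate combinations stand out as slow, which mirrors the point that there is not one single problem but several groups of problems.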

Internet speed and cookie settings also lead to longer loading times
for the site. This is how data mining helps in identifying the problem,
so that solving it becomes quick.

This is just one example of how useful data mining can be.

What Can Data Mining Do?

[Figure: types of data mining results: categorical, regression, summary statistics, summary rules, link analysis / associations, sequential model, dependencies]

Predictive Data mining
The objective of predictive tasks is to use the values of some
variables to predict the values of other variables.

Ex: Web mining is used by online marketers to predict the purchases
of online users on a website.

Classification
Classification is used to map data into predefined groups.
Find ways to separate data items into pre-defined groups
We know X and Y belong together, find other things in same group
Requires training data: data items where group is known
Uses:
Profiling
Technologies:
Generate decision trees (results are human understandable)
Neural nets

Example: route documents to most likely interested parties
English or non-English?
Domestic or foreign?

Training data -> tool produces -> classifier -> groups
Classification Example
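
As a minimal sketch of the "training data -> tool -> classifier" flow, the code below trains a one-level decision tree (a decision stump) to route documents as English or non-English. Everything specific here is an assumption made for illustration: the single numeric feature (the share of a document's words found in an English word list), the training values, and the function names.

```python
def train_stump(samples):
    """Find the threshold on one numeric feature that misclassifies the
    fewest training items.

    samples: list of (feature_value, label) pairs with exactly two distinct
    labels and at least two distinct feature values.
    Returns (threshold, label_below, label_at_or_above).
    """
    labels = sorted({label for _, label in samples})
    values = sorted({value for value, _ in samples})
    # Candidate thresholds: midpoints between neighbouring feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in candidates:
        # Try both ways of assigning the two labels to the two sides.
        for below, above in (labels, labels[::-1]):
            errors = sum(
                1
                for value, label in samples
                if (below if value < threshold else above) != label
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above


def classify(stump, value):
    """Apply the learned rule to a new, unlabeled item."""
    threshold, below, above = stump
    return below if value < threshold else above


# Hypothetical training data: (share of words found in an English word list, group)
TRAINING = [
    (0.92, "english"), (0.88, "english"), (0.75, "english"),
    (0.30, "non-english"), (0.22, "non-english"), (0.10, "non-english"),
]

stump = train_stump(TRAINING)
print(stump)
print(classify(stump, 0.80), classify(stump, 0.15))
```

The learned rule is a single readable threshold, which is the sense in which decision-tree results are human understandable.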
Clustering

Find groups of similar data items
Statistical techniques require definition of distance (e.g. between travel profiles); conceptual techniques use background concepts and logical descriptions
Uses:
Demographic analysis

Example: group people with similar travel profiles
Clusters: (George, Patricia), (Jeff, Evelyn, Chris), (Rob)
Clustering
Clustering is a technique for finding similarity groups in data, called clusters.

Example 1: group people of similar sizes together to make small, medium and large T-shirts.
Tailor-made for each person: too expensive
One-size-fits-all: does not fit all.

Example 2: in marketing, segment customers according to their similarities, to do targeted marketing.

Example 3: given a collection of text documents, we want to organize them according to their content similarities, to produce a topic hierarchy.
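
The T-shirt example can be sketched with k-means on a single measurement. This is an illustrative sketch, not a method taken from the slides: the chest measurements are made up, k-means is one clustering technique among many, and the deterministic choice of starting centres is a simplification.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups with plain k-means (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its cluster.
        new_centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters


# Hypothetical chest measurements in centimetres.
CHEST_CM = [86, 88, 90, 91, 100, 102, 104, 105, 116, 118, 120, 121]

centres, clusters = kmeans_1d(CHEST_CM, k=3)
sizes = dict(zip(["small", "medium", "large"], clusters))
print(centres)
print(sizes)
```

Here the distance between two people is simply the absolute difference of their measurements; with several measurements per person, a distance over all of them would be needed.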
Association Rules
Identify dependencies in the data:
X makes Y likely
Indicate significance of each dependency
Bayesian methods
Uses:
Targeted marketing

Example: find groups of items commonly purchased together
People who purchase fish are extraordinarily likely to purchase wine
People who purchase turkey are extraordinarily likely to purchase cranberries

Date/Time/Register   Fish   Turkey   Cranberries   Wine
12/6 13:15 2         N      Y        Y             Y
12/6 13:16 3         Y      N        N             Y
Association Rules
Market basket analysis: it helps to understand what products or
services are commonly purchased together.
Association Rules
Market Basket Analysis

A large US supermarket chain discovered a strong association, for many
customers, between a brand of babies' nappies (diapers) and a brand of beer.

Most customers who bought the nappies also bought the beer.
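
How strong an association such as "X makes Y likely" is can be measured with two standard quantities, support and confidence. The sketch below computes both; the first two baskets follow the register table shown earlier, while the remaining baskets are hypothetical and added only so the numbers are not trivial.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent):
    support(antecedent and consequent) / support(antecedent).
    Assumes the antecedent occurs in at least one transaction."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


BASKETS = [
    {"turkey", "cranberries", "wine"},  # 12/6 13:15, register 2
    {"fish", "wine"},                   # 12/6 13:16, register 3
    {"fish", "wine", "bread"},          # hypothetical
    {"turkey", "cranberries"},          # hypothetical
    {"fish"},                           # hypothetical
    {"bread", "wine"},                  # hypothetical
]

print(support(BASKETS, {"fish", "wine"}))
print(confidence(BASKETS, {"fish"}, {"wine"}))
print(confidence(BASKETS, {"turkey"}, {"cranberries"}))
```

A statement like "70% of customers who purchase comforters later purchase curtains" is a confidence value in this sense; support additionally says how often the combination occurs at all.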
Descriptive Data mining

The objective of descriptive tasks is to find human-readable patterns
which describe the relationships between data.

Application Area of Data mining

Although data mining is still in its early stages, companies in a wide
range of industries are already using data mining techniques to take
advantage of historical data:
retail,
health care,
manufacturing,
finance, and
transportation

Marketing / Retail:

Marketers can choose an appropriate approach for targeted customers.

By using market basket analysis, a store can arrange its products so
that customers can conveniently purchase frequently bought products
together.
Application Area of Data mining
(contd)
Data mining helps analysts to recognize significant
facts, relationships, trends, patterns and anomalies
which might go unnoticed otherwise.

Data mining uses pattern recognition technologies and mathematical
techniques to sift through warehoused information.

Application Area of Data mining
(contd)
In business, data mining is useful for discovering patterns and
relationships in data to help make better decisions.

Data mining helps in developing smarter marketing campaigns and in
predicting customer loyalty.

Application Area of Data mining
(contd)
Data Mining application in Medical Image Classification or Medical
Diagnosis

Facial Paralysis Assessment (FP classification and grading)


Example : Facial Paralysis Assessment
(FP classification and grading)

Facial Part Detection using HAAR classifiers


Based on the paper FACIAL FEATURE DETECTION USING HAAR CLASSIFIERS
Philip Wilson et al, Journal of Computing Sciences in Colleges, Volume 21 Issue 4, April 2006, Pages 127-133

Facial Features Detection


Automatic Parameter Selection for the LAC initial evolving curve
Feature extraction: Calculation of the distance ratio

Facial movement: whistling
[Figure labels: dSO_IO1, dSO_IO2, dSO_IC1, dSO_IC2]

Facial movement: screwing of nose/snarl

Facial expression activity: raising of eyebrows
dSO_IO1 < dSO_IO2: dSO_IO1 is the affected side
dSO_IO1 / dSO_IO2 approaches 1 if normal
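
The distance-ratio feature can be illustrated with a few lines of code. This is a sketch under assumptions: the landmark coordinates are invented, each distance is taken as the straight-line (Euclidean) distance between the points labeled SO and IO on one side of the face, and the function names are hypothetical rather than taken from the assessment system.

```python
import math


def landmark_distance(p, q):
    """Euclidean distance between two (x, y) landmark points, in pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def distance_ratio(so_1, io_1, so_2, io_2):
    """Return (dSO_IO1, dSO_IO2, dSO_IO1 / dSO_IO2) for the two sides."""
    d1 = landmark_distance(so_1, io_1)
    d2 = landmark_distance(so_2, io_2)
    return d1, d2, d1 / d2


# Hypothetical landmark positions during eyebrow raising.
d1, d2, ratio = distance_ratio(
    so_1=(100, 80), io_1=(100, 110),   # side 1
    so_2=(200, 70), io_2=(200, 110),   # side 2
)
print(d1, d2, ratio)  # a ratio well below 1 points to side 1 as the affected side

# A symmetric face gives a ratio of 1.
_, _, normal_ratio = distance_ratio((100, 70), (100, 110), (200, 70), (200, 110))
print(normal_ratio)
```

Ratios like this one can then serve as input features for the classification step.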
Feature extraction: Calculation of the distance ratio

Calculation of the distance ratio and the iris area ratio during snarl
activity or screwing of nose
Classification
[Figure: training data labeled 0 or 1, plus an unlabeled item (?); training data -> tool produces -> classifier -> groups]
Advantages of Data Mining

In Marketing / Retail :

Marketers can choose an appropriate approach for targeted customers.

By using market basket analysis, a store can arrange its products so
that customers can conveniently purchase frequently bought products
together.

It also helps retail companies to offer certain discounts which will
attract more customers.
Advantages of Data Mining

Finance / Banking
By building a model from historical customer loan data, banks and
financial institutions can determine good and bad loans.

Data mining also helps banks to detect fraudulent credit card
transactions.
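
One very simple way to flag suspicious transactions is to compare each new amount with a customer's past spending. The sketch below uses a z-score rule as a stand-in; real fraud detection models are far richer, and the amounts, the threshold of 3 and the function name are assumptions for illustration.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amounts, z_threshold=3.0):
    """Return the new amounts that lie more than z_threshold sample standard
    deviations away from the mean of the historical amounts."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two historical values that are not all equal
    return [a for a in new_amounts if abs(a - mu) / sigma > z_threshold]


# Hypothetical past card transactions of one customer, and three new ones.
HISTORY = [42.0, 38.5, 51.0, 47.25, 40.0, 44.75, 39.5, 49.0]
NEW = [45.0, 52.0, 980.0]

flagged = flag_unusual(HISTORY, NEW)
print(flagged)
```

The model here is just the mean and spread of past amounts; the same pattern (learn from historical data, then score new cases) applies to separating good and bad loans.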

Advantages of Data Mining

Manufacturing
Applied to operational engineering data, data mining can detect faulty
equipment and determine optimal control parameters.
Governments
Data mining helps in building patterns that can detect
money laundering or criminal activities.

Specific uses of data mining include:

Market segmentation - Data mining helps to identify the common
characteristics of customers who buy the same products from your company.
Customer anticipation - It helps to predict which customers may leave
your company and go to a competitor.
Fraud detection - It identifies which transactions are most likely to
be fraudulent.

Direct marketing - Direct marketing identifies which
prospects should be included to obtain the highest
response rate.
Interactive marketing - It is useful for predicting what
each user on a Web site is most likely interested in seeing.
Market basket analysis - It helps to understand what
products or services are commonly purchased together.
Trend analysis - Trend analysis identifies the difference
between a typical customer this month and last.

Disadvantages of data mining
Privacy Issues
Information might be collected and used in unethical ways, which can
potentially cause a lot of trouble.

Businesses collect information about their users to set up marketing
strategies, but a business might be taken over by another firm or shut
down, and that is where concerns about misuse or leaks of personal
information arise.

Security issues
Security is the biggest concern in data mining.
Businesses hold information about their employees, including personal
and financial information. There is a chance that hackers will misuse
this data, which can cause serious trouble for the organization and its
employees.

In Data mining,
