
WHAT IS BUSINESS ANALYTICS?

PREDICTIVE ANALYTICS
DESCRIPTIVE ANALYTICS
PRESCRIPTIVE ANALYTICS
BUSINESS INTELLIGENCE ??
DATA MINING
BUSINESS ANALYTICS
BIG DATA ??
A PERSPECTIVE ON NUMBERS
Kilo
Mega
Giga
Tera
Peta
Exa
Zetta

Googol
LEARNING OUTCOMES
KNOWLEDGE
Acquire both theoretical and practical understanding of the
classification, prediction, reduction and exploration of data that are
at the heart of data mining
Gain knowledge of the business decision-making context through data
mining
Apply this knowledge during the decision-making process through
concepts and interpretation of relevant tools
SKILLS
Use SPSS software through various tools in the Analytics lab
Choose the right tools to solve analytical problems
Use real business cases to analyze and interpret these solutions
Apply these skills in real-life decision-making situations
ATTITUDE
Appreciate the value of data by ensuring quality in data analysis
Analyze and interpret data to add value to business problems,
enabling better business decisions
Incorporate ethical practices while dealing with data
ANALYTICS/DATA MINING
Process of discovering interesting information from
large amounts of data stored in either databases,
data warehouses or other information repositories.

Business analytics is comprised of solutions used to
build analysis models and simulations to create
scenarios, understand realities and predict future
states. Business analytics includes data mining,
predictive analytics, applied analytics and statistics,
and is delivered as an application suitable for a
business user.
Gartner
PREDICTIVE ANALYTICS
Predictive Analytics is an area of data mining
that deals with extracting information from
data and using it to predict trends and
behaviour patterns.
QUESTIONS
Data mining algorithms can generate thousands
of patterns.
Are all the patterns interesting?
A pattern is interesting if it is:
Easily understood by humans,
Valid on new or test data with some degree of
certainty,
Potentially useful,
Novel.
An interesting pattern represents knowledge.
Interestingness measures.
WHAT IS INTERESTING INFORMATION ?
DATA

PATTERN

INFORMATION

KNOWLEDGE

KNOWLEDGE DISCOVERY
DATA MINING : CORE IDEAS
Descriptive Analytics
Predictive Analytics
Classification
Prediction
Discrimination
Characterization
Classification
Clustering
Association
PREDICTION
Attempt to predict the value of a numerical
variable
Based on historical data
Involves understanding trends and patterns
found in historical data
Dependent and independent variables
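As a minimal sketch of prediction with one independent variable, the snippet below fits a straight line by ordinary least squares. The advertising and sales figures are the six pairs from the OVERFITTING table later in this deck. The function name `fit_line` and the new advertising value of 700 are illustrative assumptions, not part of the original material.

```python
# Historical data: advertising (independent) and sales (dependent).
advertising = [239, 364, 602, 644, 770, 789]
sales = [514, 789, 550, 1386, 1394, 1440]

def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(advertising, sales)
predicted = intercept + slope * 700  # 700 is a hypothetical new advertising value
print(f"sales = {intercept:.1f} + {slope:.3f} * advertising")
print(f"predicted sales at advertising=700: {predicted:.0f}")
```

The fitted line captures the overall trend in the historical data and is then used to estimate the numerical value for a record that was not in that data.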
CLASSIFICATION
Predict the class (or category) of a variable
rather than the value
Use of historical data to build a model of
patterns and trends
With the help of the model, new data can be
examined to predict its class
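The idea can be sketched with one simple technique, a 1-nearest-neighbour classifier: the "model" is the labelled historical data, and a new record receives the class of its closest historical record. The records, the class labels and the function name `classify` are all hypothetical and chosen only for illustration.

```python
# Hypothetical historical records: (age, income in thousands) -> class label.
training = [
    ((25, 20), "buyer"),
    ((30, 25), "buyer"),
    ((45, 60), "non-buyer"),
    ((50, 55), "non-buyer"),
]

def classify(record, labelled):
    """Return the label of the closest training record (1-nearest neighbour)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(labelled, key=lambda item: sq_dist(item[0], record))
    return nearest[1]

# New records whose class is unknown.
print(classify((28, 22), training))
print(classify((48, 58), training))
```

In practice the variables would usually be standardized first so that income does not dominate the distance.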
ASSOCIATION
"What goes with what" analysis.
Association analysis is the discovery of
association rules showing attribute-value
conditions that occur frequently together in a
given set of data.
Association analysis is widely used for market
basket or transaction data analysis.
ASSOCIATION
Age (20-29) with Income (Rs 10000-25000)
associated with purchase of DVD Players.
X => Y:
records that satisfy X also satisfy Y.
Support and Confidence.
Each attribute is referred to as a dimension.
The above rule can be referred to as a
multidimensional association rule.
Another example:
Baby diaper, Friday evening => Beer [1%, 70%]
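The two figures in square brackets on a rule are conventionally its support and its confidence. As a rough sketch, the snippet below computes both for the rule diaper => beer on a small set of transactions. The transactions and the function names `support` and `confidence` are made up for illustration.

```python
# Hypothetical transactions; each is the set of items in one basket.
transactions = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"beer", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(x, y, baskets):
    """Among baskets containing X, the fraction that also contain Y."""
    return support(x | y, baskets) / support(x, baskets)

rule_support = support({"diaper", "beer"}, transactions)
rule_confidence = confidence({"diaper"}, {"beer"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Support measures how often X and Y occur together in the whole data set, while confidence measures how often Y occurs among the records that satisfy X.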
CLUSTERING
Clustering analyzes data objects without
consulting a known class label.
Clusters of objects are formed so that objects
within a cluster have high similarity in
comparison to one another, but are very
dissimilar to objects in other clusters.
Each cluster can be viewed as a class of objects,
from which rules can be derived.
Clustering analysis can be performed on
customer data in order to identify homogeneous
subpopulations of customers.
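One common way to form such clusters is k-means, sketched here on a single variable. The spend figures, the starting centers and the function name `kmeans_1d` are hypothetical. No class labels are used at any point.

```python
# Hypothetical annual spend per customer.
spend = [12, 15, 14, 90, 95, 88, 50, 52]

def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one variable, starting from the given centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Recompute each center as the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d(spend, centers=[0.0, 50.0, 100.0])
for center, members in zip(centers, clusters):
    print(f"center {center:.1f}: {sorted(members)}")
```

Each resulting group is a homogeneous subpopulation: low, medium and high spenders in this made-up data.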
DATA CHARACTERIZATION
Data characterization is a summarization of
the general characteristics or features of a
target class of data.
Typically collected by a database or OLAP
query.
Example :
The characteristics of all the software products
whose sale increased by 25% in the last one
year.
DATA DISCRIMINATION
Data discrimination is a comparison of the
general features of target class data objects with
those of objects from one or a set of contrasting
classes.
Compare the characteristics of software products
whose sales increased by 10% in the last year
with those whose sales decreased by at least 30%
during the same period.
The techniques used are the same as for characterization.
QUESTIONS
Can a data mining system generate all the
interesting patterns?
Unrealistic and inefficient to generate all
interesting patterns.
Focused search which makes use of
interestingness measures should be used to
control pattern generation.
Can a data mining system generate only the
interesting patterns?
STEPS IN DATA MINING
Develop an understanding of the purpose of the
data mining project or application.
Obtain the dataset to be used in the analysis.
Sampling, integration.
Explore, clean and preprocess the data.
Reduce and separate :
Eliminate variables (unnecessary variables)
Transform variables (money spent)
Create variables (whether a category item was
purchased)
Training and validation datasets.
STEPS IN DATA MINING
Determine the data mining task :
Association, clustering etc.
Choose the data mining technique :
Regression, neural networks etc.
Use algorithms to perform task.
Interpret the results.
Finding the best algorithm, fine tuning the
algorithm etc.
Deploy the model.
STEPS IN DATA MINING
SEMMA
Developed by SAS.
Sample, Explore, Modify, Model, Assess.
CRISP-DM
SPSS.
Cross-Industry Standard Process for Data Mining.
DATA AND VARIABLES : TYPES?
Numerical Data
Continuous
Discrete
Categorical Data
Ordinal
Nominal

Numeric
Text
DATA AND VARIABLES : HOW MUCH?
Rule of thumb: 10 data points for every
predictor variable.
Another: 6 x m x p data points,
for m classes and p predictors.
Domain experts to choose variables.
A strong linear correlation between two attributes
may indicate that one of them is redundant.
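The two rules of thumb above are simple arithmetic. The sketch below evaluates both for an assumed problem with 8 predictors and 2 classes; the function names and the example sizes are illustrative only.

```python
def min_records_simple(p):
    """Rule of thumb 1: ten records per predictor."""
    return 10 * p

def min_records_classes(m, p):
    """Rule of thumb 2: 6 x m x p for m classes and p predictors."""
    return 6 * m * p

print(min_records_simple(8))       # 8 predictors
print(min_records_classes(2, 8))   # 2 classes, 8 predictors
```

These are rough lower bounds for planning, not guarantees of model quality.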
PREPROCESSING DATA
Conversion of continuous to categorical data
Binning
Conversion of nominal/ordinal data to
numeric data
Dummy variables for nominal data
Identification of outliers
Dimension reduction
Standardization
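Binning, the first item in the list above, can be sketched as follows. The ages, the bin edges and the labels are hypothetical choices; in a real project the edges would come from domain knowledge or from the distribution of the data.

```python
def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value."""
    for upper, label in zip(edges, labels):
        if value <= upper:
            return label
    return labels[-1]  # values above the last edge fall in the top bin

ages = [19, 23, 37, 45, 62, 71]           # hypothetical continuous data
edges = [29, 59]                           # upper bounds of the first two bins
labels = ["young", "middle", "senior"]     # one more label than edges
binned = [bin_value(a, edges, labels) for a in ages]
print(binned)
```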
DUMMY VARIABLES
Consider a single variable that has four
possible values: Student, Unemployed,
Employed, Retired
Four dummy variables:
Student: Yes/No
Unemployed: Yes/No
Employed: Yes/No
Retired: Yes/No
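A minimal sketch of this encoding, using 1 for Yes and 0 for No. The function name `to_dummies` and the sample records are assumptions made for illustration.

```python
categories = ["Student", "Unemployed", "Employed", "Retired"]

def to_dummies(value, categories):
    """Return a dict with one 0/1 indicator per category."""
    return {c: int(value == c) for c in categories}

records = ["Employed", "Student", "Retired"]   # hypothetical records
for r in records:
    print(r, to_dummies(r, categories))
```

Because exactly one indicator is 1 for each record, any three of the four dummies determine the fourth, so techniques such as linear regression normally use only three of them.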
STANDARDIZING DATA
Z-score
Each value is expressed as the number of standard deviations
away from the mean.
To prevent some variables from dominating
the others.
Data mining software typically offers this feature.
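As a small sketch, the z-score of each value is its distance from the mean divided by the standard deviation. The income and age figures are hypothetical, and the choice of the population standard deviation (`pstdev`) is an assumption; some tools use the sample standard deviation instead.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Express each value as standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population std dev; sample std dev is also common
    return [(v - mu) / sigma for v in values]

income = [20000, 35000, 50000, 65000, 80000]  # hypothetical, large scale
age = [22, 31, 40, 49, 58]                     # hypothetical, small scale
print(z_scores(income))
print(z_scores(age))
```

After standardizing, both variables are on the same scale even though the raw incomes are roughly a thousand times larger than the ages, so neither dominates a distance or variance calculation.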
PRINCIPAL COMPONENT ANALYSIS
Useful procedure for reducing the number of
dimensions (predictors), i.e. data reduction.
Valuable when there are predictor variables
that are highly correlated.
Find linear combinations of the old set of variables
that can replace the original variables.
The new variables are uncorrelated.
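For the special case of two standardized predictors the principal components have a closed form: they are the normalized sum and difference of the two variables, with variances 1 + r and 1 - r, where r is the correlation. The sketch below uses that fact rather than a general eigenvalue routine; the data and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def standardize(values):
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def mean_product(a, b):
    """Mean of element-wise products: the covariance for zero-mean lists,
    and the Pearson correlation when both lists are standardized."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

# Hypothetical, highly correlated predictors.
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.0, 3.0, 4.0, 8.0, 9.0]
z1, z2 = standardize(x1), standardize(x2)

# Linear combinations of the old variables (valid for two standardized
# variables; PC1 is the sum because the correlation here is positive).
pc1 = [(a + b) / sqrt(2) for a, b in zip(z1, z2)]
pc2 = [(a - b) / sqrt(2) for a, b in zip(z1, z2)]

r = mean_product(z1, z2)
print(f"correlation of originals: {r:.3f}")
print(f"share of variance in PC1: {(1 + r) / 2:.3f}")
print(f"covariance of PC1 and PC2: {mean_product(pc1, pc2):.3f}")
```

When the original predictors are highly correlated, nearly all of the variance ends up in the first component, which is why it can stand in for both original variables.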
NORMALIZING DATA
Often applied before PCA.
Usually done when scales of variables are very
different.
Number of standard deviations away from the
mean (z-score).
Not always desirable. Hence in most of the
data mining products it is an optional feature.
PARTITIONING DATA
Training data.
Used to build models.
Validation data.
To assess performance of each model and to
choose the best.
Also can be used to tune and improve the model.
Test data.
Prevention of overfitting.
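A random three-way split can be sketched as below. The 60/20/20 proportions, the fixed seed and the function name `partition` are assumptions for illustration; the slides do not prescribe particular proportions.

```python
import random

def partition(records, train=0.6, validation=0.2, seed=42):
    """Shuffle and split records into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

records = list(range(100))  # stand-in for 100 data rows
train_set, valid_set, test_set = partition(records)
print(len(train_set), len(valid_set), len(test_set))
```

Models are built on the training set, compared and tuned on the validation set, and the test set is kept aside for a final check on data that played no part in building or choosing the model.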
OVERSAMPLING
Used when there are two classes and one of the
classes is of much greater interest than the other,
and,
The number of data points in the class of interest
is much lower than in the other class
The cost of misclassification is an indicator
Overweight the class of interest so that the
training set consists of equal numbers in both
classes
Original ratio is retained in the validation set
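One way to balance the training set is to sample the rare class with replacement until it matches the common class, as in this sketch. The records, class labels and function name `oversample_training` are hypothetical, and other schemes (for example sampling fewer records from the common class) achieve the same balance.

```python
import random

def oversample_training(rare, common, seed=0):
    """Sample the rare class with replacement until both classes are equal."""
    rng = random.Random(seed)
    extra = [rng.choice(rare) for _ in range(len(common) - len(rare))]
    balanced = rare + extra + common
    rng.shuffle(balanced)
    return balanced

# Hypothetical training records: (customer id, class label).
responders = [(i, "responder") for i in range(5)]
non_responders = [(i, "non-responder") for i in range(5, 100)]
training = oversample_training(responders, non_responders)
n_rare = sum(1 for _, label in training if label == "responder")
print(n_rare, len(training) - n_rare)
```

Only the training set is rebalanced in this way; the validation set keeps the original class ratio so that measured performance reflects the real population.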
OVERFITTING

Advertising Sales
239 514
364 789
602 550
644 1386
770 1394
789 1440
[Figures: three charts of Sales (vertical axis, 0 to 1600) against Advertising (horizontal axis, 0 to 1000) for the data above]
OVERFITTING
Also observed when the number of records is
not much larger than the number of
predictors.
With less data, lower degree curves may
perform better than higher degree curves.
Higher degree curves may confuse noise
(chance variation) with signal.
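The effect can be sketched with the advertising and sales table from this section. The last record is held out, and two models are fitted to the remaining five: a straight line, and a degree-4 polynomial that passes through every training point. Holding out the last record and the helper names `fit_line`, `interpolate` and `mean_abs_error` are assumptions made for this illustration.

```python
advertising = [239, 364, 602, 644, 770, 789]
sales = [514, 789, 550, 1386, 1394, 1440]

# Hold out the last record; fit on the first five.
train_x, train_y = advertising[:-1], sales[:-1]
test_x, test_y = advertising[-1], sales[-1]

def fit_line(x, y):
    """Least-squares straight line, returned as a prediction function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return lambda v: intercept + slope * v

def interpolate(x, y):
    """Polynomial of degree len(x)-1 through every point (Lagrange form)."""
    def predict(v):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(x, y)):
            weight = 1.0
            for j, xj in enumerate(x):
                if j != i:
                    weight *= (v - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict

def mean_abs_error(model, x, y):
    return sum(abs(model(a) - b) for a, b in zip(x, y)) / len(x)

line = fit_line(train_x, train_y)
curve = interpolate(train_x, train_y)

print("training error, line :", round(mean_abs_error(line, train_x, train_y), 1))
print("training error, curve:", round(mean_abs_error(curve, train_x, train_y), 1))
print("held-out error, line :", round(abs(line(test_x) - test_y), 1))
print("held-out error, curve:", round(abs(curve(test_x) - test_y), 1))
```

The high-degree curve reproduces the five training records exactly, yet it misses the held-out record by far more than the straight line does: it has fitted the noise in the training data rather than the underlying trend.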
CONCLUSION