Prelude

“There will be more data created in

history of the planet Earth.

The ability to access that data, the

deliver it to the right people at

the right time so that they can

make the right decision, is going to

be critical to every enterprise on

the planet.

How analytics can help

Business Challenge

Internal Transactional Data

Perception Data

Measure as-is

• Reporting • Performance Analyses • Trend Analyses

Analyze

• Segmentation • Benchmarking and Gap Analyses

• Trigger Analyses • Key Driver Analyses

Predict/Improve

• Modeling • Optimization • Simulation

Business Intelligence and Analytics

[Figure: levels of analytic capability, plotted with the axes Competitive Advantage and Degree of Intelligence. Levels from highest to lowest:]

Optimization: What’s the best that can happen?
Predictive Modeling: What will happen next?
Forecasting / Exploration: What if these trends continue?
Analysis
Alerts: What actions are needed?
Query / Drill down: Where exactly is the problem?
Ad hoc reports: How many, how often, where?
Reports

Data Rich - Information Poor

big problem: it collects a lot of important

business data that never gets used.

competitive advantage can’t get the level

of access they need.”

--Forrester Research

DATA TYPES

1. Nominal: identify and classify
2. Ordinal: order the data
3. Interval: rate the data
4. Ratio: set a reference point

Interval and ratio data (3 and 4) are metric data.

DATA MINING

Data reporting.

Data Cleaning:

Consistency checks: identify data that are out of range, logically inconsistent, or have extreme values. Values not defined by the coding scheme are inadmissible.

Missing responses: values of variables that are unknown, because ambiguous answers were provided or no answer was given.

Treatment: substitute a neutral value or an imputed response.

Data are divided into TRAINING DATA, TESTING DATA, and PREDICTION DATA.
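The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the records, the admissible age range, the rating codes, and the 60/40 split are all hypothetical assumptions, and the mean is used as one possible "neutral value". Only the training/testing split is shown; prediction data would be new, unlabeled records.

```python
import random
from statistics import mean

# Hypothetical survey records; None marks a missing response.
records = [
    {"age": 34, "rating": 4},
    {"age": 29, "rating": 7},     # 7 is not a defined code -> inadmissible
    {"age": 210, "rating": 3},    # age is out of range
    {"age": 41, "rating": None},  # missing response
    {"age": 52, "rating": 5},
    {"age": 38, "rating": 2},
]

VALID_RATINGS = {1, 2, 3, 4, 5}   # assumed coding scheme
AGE_RANGE = (18, 99)              # assumed admissible range


def consistency_check(rec):
    """Return the list of problems found in one record."""
    problems = []
    if not AGE_RANGE[0] <= rec["age"] <= AGE_RANGE[1]:
        problems.append("age out of range")
    if rec["rating"] is not None and rec["rating"] not in VALID_RATINGS:
        problems.append("rating not in coding scheme")
    return problems


def impute_neutral(recs, field):
    """Replace missing values of `field` with the mean of the observed values."""
    neutral = mean(r[field] for r in recs if r[field] is not None)
    return [dict(r, **{field: neutral if r[field] is None else r[field]})
            for r in recs]


def split(recs, train_frac=0.6, seed=0):
    """Shuffle the records and split them into training and testing sets."""
    shuffled = recs[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


clean = [r for r in records if not consistency_check(r)]
clean = impute_neutral(clean, "rating")
train, test = split(clean)
```

Dropping inconsistent records is only one option; in practice they may instead be corrected by going back to the source questionnaire.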

Stages in Data Analysis

1. Editing
2. Coding
3. Data Entry (Keyboarding)
4. Error Checking and Verification
5. Data Analysis
6. Interpretation

A Classification of Multivariate Methods

All Multivariate Methods: Are some of the variables dependent on others?

Yes: Dependence Methods

No: Interdependence Methods

Multivariate Analysis: Classification of Dependence Methods

Dependence Methods: How many variables are dependent?

One dependent variable
  Metric (the scales are ratio or interval): Multiple Regression
  Nonmetric (the scales are nominal or ordinal): Multiple Discriminant Analysis

Several dependent variables
  Metric (the scales are ratio or interval): Multivariate Analysis of Variance (MANOVA)
  Nonmetric (the scales are nominal or ordinal): Conjoint Analysis

Multiple independent and dependent variables: Canonical Analysis

Multivariate Analysis: Classification of Interdependence Methods

Interdependence Methods: Are inputs metric?

Metric (the scales are ratio or interval): Factor Analysis, Cluster Analysis, Metric Multidimensional Scaling

Nonmetric (the scales are nominal or ordinal): Nonmetric Multidimensional Scaling

Correlation Analysis

Correlation analysis is a statistical technique used to measure the magnitude of the linear relationship between two variables.

Correlation analysis cannot be used in isolation to describe the relationship between variables.

It can be used along with regression analysis to determine the nature of the relationship between two variables.

Two prominent types of correlation coefficient are:

Pearson product-moment correlation coefficient

Spearman’s rank correlation coefficient

Application:

How strongly are sales related to advertising expenditure?

Is there any association between market share and size of the sales force?

Are consumers’ perceptions of quality related to their perceptions of prices?
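Both coefficients can be computed directly from their definitions. The sketch below is illustrative only; the advertising and sales figures are made-up numbers. Spearman's coefficient is obtained here as the Pearson coefficient of the ranks, with tied values sharing an average rank.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


advertising = [10, 20, 30, 40, 50]   # hypothetical advertising spend
sales = [12, 25, 31, 48, 95]         # hypothetical sales

r = pearson(advertising, sales)
rho = spearman(advertising, sales)
```

Because sales rise with every increase in advertising but not in a straight line, the rank correlation is perfect while the Pearson coefficient stays below 1.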

Regression Analysis

Regression analysis examines the nature and closeness of relationships between two or more variables.

It evaluates the causal effect of one variable on another variable.

It is used to predict the variability in the dependent (or criterion) variable based on the information about one or more independent (or predictor) variables.

Two variables: Simple or Linear Regression Analysis

More than two variables: Multiple Regression Analysis

Application:

Can the variation in market share be accounted for by the size of the sales force?

Are consumers’ perceptions of quality determined by their perceptions of price?

What level of sales can be expected, given the levels of advertising expenditures, prices, and level of distribution?

Linear Regression Analysis

Linear regression: $Y = \alpha + \beta X$

where

Y: dependent variable

X: independent variable

$\alpha$ and $\beta$: two constants called regression coefficients

$\beta$: slope coefficient, i.e. the change in the value of Y for a one-unit change in X

$\alpha$: Y intercept, i.e. the value of Y when X = 0

$R^2$: the strength of association, i.e. the degree to which the variation in Y can be explained by X.

If $R^2 = 0.10$, then only 10% of the total variation in Y can be explained by the variation in X.
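A minimal least-squares sketch of the quantities defined above: the slope, the intercept, and R². The data points are hypothetical, and the names `alpha` and `beta` are simply labels for the intercept and slope.

```python
def fit_line(x, y):
    """Ordinary least squares for Y = alpha + beta * X.

    Returns (alpha, beta, r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx                 # slope: change in Y per unit of X
    alpha = my - beta * mx           # intercept: value of Y when X = 0
    sst = sum((b - my) ** 2 for b in y)                          # total variation
    sse = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))  # unexplained
    return alpha, beta, 1 - sse / sst


x = [1, 2, 3, 4, 5]                  # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical responses
alpha, beta, r2 = fit_line(x, y)
```

For this data the fitted line is roughly Y = 0.05 + 1.99 X, and R² is close to 1 because the points lie almost exactly on a line.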

Test of significance of Regression Equation

Linear regression: $Y = \alpha + \beta X$

The F test is used to test the significance of the linear relationship between the two variables Y and X.

H0: $\beta = 0$ (there is no linear relationship between Y and X)

If H0 is rejected, the regression model is taken to represent the real-world data.
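For simple linear regression the F statistic compares explained and unexplained variation: F = (SSR / 1) / (SSE / (n - 2)). The sketch below assumes hypothetical data and compares the result with 10.13, the tabulated 5% critical value of F with (1, 3) degrees of freedom, which applies only because this example has n = 5 observations.

```python
def f_statistic(x, y):
    """F statistic for H0: slope = 0 in Y = alpha + beta * X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    fitted = [alpha + beta * a for a in x]
    ssr = sum((f - my) ** 2 for f in fitted)              # explained variation
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))    # residual variation
    return (ssr / 1) / (sse / (n - 2))


x = [1, 2, 3, 4, 5]                  # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical responses

F_CRITICAL_5PCT_1_3 = 10.13          # table value for df = (1, 3), alpha = 0.05
f = f_statistic(x, y)
reject_h0 = f > F_CRITICAL_5PCT_1_3
```

Here the F statistic far exceeds the critical value, so H0 (no linear relationship) would be rejected at the 5% level.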

Other regression methods

Forward regression

Stepwise regression

Backward regression

Logit Analysis (Logistic Regression)

The dependent variable is binary and the independent variables are metric.

The logistic regression model estimates the probability of a binary event taking place, e.g. success/failure.

The model follows an S-curve.
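The S-curve is the logistic function, which maps any linear combination of the predictors to a probability between 0 and 1. In this sketch the intercept and coefficients are hypothetical values chosen for illustration, not estimates from data.

```python
from math import exp


def logistic(z):
    """Logistic (S-shaped) function: maps any real z into (0, 1)."""
    return 1 / (1 + exp(-z))


def predict_probability(x, intercept, coefs):
    """Probability of the event for predictor values x."""
    z = intercept + sum(c * v for c, v in zip(coefs, x))
    return logistic(z)


# Hypothetical model: success depends on two metric predictors.
intercept, coefs = -3.0, [0.5, 1.0]
p_low = predict_probability([1.0, 0.5], intercept, coefs)   # z = -2.0
p_high = predict_probability([4.0, 3.0], intercept, coefs)  # z = 2.0

# Points along the curve: flat near 0, steep around z = 0, flat near 1.
curve = [round(logistic(z), 3) for z in (-6, -3, 0, 3, 6)]
```

A case would typically be classified as a "success" when its predicted probability exceeds a chosen threshold such as 0.5.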

Discriminant Analysis

Discriminant analysis aims at studying the effect of two or more predictor variables (independent variables) on a certain evaluation criterion.

The evaluation criterion may be two or more groups:

Two groups, such as good or bad, like or dislike, successful or unsuccessful, above expected level or below expected level

Three groups, such as good, normal or poor

It checks whether the predictor variables discriminate among the groups.

It identifies the predictor variable that is more important when compared to the other predictor variable(s).

Such analysis is called discriminant analysis.


Designing a discriminant function: Y = aX1 + bX2, where Y is a linear composite representing the discriminant function, and X1 and X2 are the predictor variables (independent variables) that affect the evaluation criterion of the problem of interest.

Finding the discriminant ratio (K) and determining the variables that account for intergroup differences in terms of group means. This ratio is the maximum possible ratio between the ‘variability between groups’ and the ‘variability within groups’.

Finding the critical value that can be used to assign a new data set (i.e. a new combination of values for the predictor variables) to its appropriate group.

Testing H0: the group means are equal in importance, against H1: the group means are not equal in importance, using an F test at a given significance level.

Assessing the classification with a confusion matrix and the misclassification error.
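Once a discriminant function and a critical value are available, classification and its evaluation are mechanical. In the sketch below the weights a and b, the critical value, and the cases are all hypothetical; in a real analysis the weights come from maximizing the between-group to within-group variability ratio.

```python
from collections import Counter


def discriminant_score(x1, x2, a, b):
    """Linear composite Y = a*X1 + b*X2."""
    return a * x1 + b * x2


def classify(score, critical_value):
    """Assign a case to a group by comparing its score with the critical value."""
    return "good" if score >= critical_value else "bad"


def confusion_matrix(actual, predicted):
    """Counts of (actual group, predicted group) pairs."""
    return Counter(zip(actual, predicted))


def misclassification_error(actual, predicted):
    """Share of cases assigned to the wrong group."""
    wrong = sum(1 for t, p in zip(actual, predicted) if t != p)
    return wrong / len(actual)


A, B, CRITICAL = 0.5, 0.5, 5.0       # hypothetical weights and cut-off
cases = [                            # (X1, X2, actual group), made-up data
    (8, 6, "good"), (7, 5, "good"), (4, 5, "good"),
    (2, 3, "bad"), (3, 4, "bad"), (6, 6, "bad"),
]

actual = [g for _, _, g in cases]
predicted = [classify(discriminant_score(x1, x2, A, B), CRITICAL)
             for x1, x2, _ in cases]
cm = confusion_matrix(actual, predicted)
error = misclassification_error(actual, predicted)
```

The diagonal entries of the matrix, such as `cm[("good", "good")]`, count correct assignments; the off-diagonal entries count the misclassified cases.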

Application:

What psychographic characteristics help differentiate between price-sensitive and non-price-sensitive buyers of groceries?

In terms of demographic characteristics, how do customers who exhibit store loyalty differ from those who do not?

Conjoint Analysis

A technique that attempts to determine the relative importance consumers attach to salient attributes and the utilities they attach to the levels of those attributes.

Terms: part-worth functions, relative importance weights, attribute levels.
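The three terms fit together as follows: each attribute level has a part-worth (utility), and an attribute's relative importance is commonly taken as the range of its part-worths divided by the sum of the ranges over all attributes. The part-worth values below are hypothetical; estimating them from respondent ratings is a separate step not shown here.

```python
# Hypothetical part-worths: attribute -> {level: utility}
part_worths = {
    "price": {"low": 1.0, "medium": 0.0, "high": -1.0},
    "brand": {"A": 0.5, "B": -0.5},
    "warranty": {"1 year": -0.25, "3 years": 0.25},
}


def relative_importance(pw):
    """Range of part-worths per attribute, normalized to sum to 1."""
    ranges = {attr: max(levels.values()) - min(levels.values())
              for attr, levels in pw.items()}
    total = sum(ranges.values())
    return {attr: r / total for attr, r in ranges.items()}


def total_utility(pw, profile):
    """Utility of a product profile: sum of the part-worths of its levels."""
    return sum(pw[attr][level] for attr, level in profile.items())


importance = relative_importance(part_worths)
best_profile = {"price": "low", "brand": "A", "warranty": "3 years"}
utility = total_utility(part_worths, best_profile)
```

With these numbers price accounts for about 57% of the importance, brand for about 29%, and warranty for about 14%.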

Factor Analysis

Factor analysis can be defined as a ‘set of methods in which the observable or

manifest responses of individuals on a set of variables are represented as

functions of a small number of latent variables called factors’.

Factor analysis helps the researcher to reduce the number of variables to be

analyzed, thereby making the analysis easier.

Analysis based on a wide range of variables can be tedious and time

consuming.

Using Factor Analysis, the researcher can reduce the large number of variables

into a few dimensions called factors that summarize the available data.

It aims at grouping the original input variables into the factors that underlie them.

For example, age, gender, marital status can be combined under a factor called

demographic characteristics. The income level, education, employment

status can be combined under a factor called socio-economic status. The

credit card and family background can be combined under factor called

background status.

Benefits of Factor Analysis

To identify the hidden dimensions or constructs which

may not be apparent from direct analysis

population being analyzed.

Procedure followed for Factor Analysis

Define the problem

Construct the correlation matrix, which measures the relationships among the variables.

Select an appropriate factor analysis method

Determine the number of factors

Rotation of factors

Interpret the factors

Determine the factor scores

Application:

Used in market segmentation for identifying the underlying variables on which to group the customers.

In product research, employed to determine the brand attributes that influence consumer choices.

In pricing studies, used to identify the characteristics of price-sensitive consumers.

In advertising, used to understand the media consumption habits of the target market.

Cluster Analysis

Cluster analysis can be defined as a set of techniques

used to classify the objects into relatively

homogeneous groups called clusters

It involves identifying similar objects and grouping

them under homogeneous groups

A cluster is a group of objects that display high correlation with each other and low correlation with objects in other clusters.

Procedure in Cluster Analysis

1. Defining the problem: First define the problem and decide upon the variables based on which the objects are clustered.

2. Selection of similarity or distance measures: The similarity measure examines the proximity between the objects. Closer or similar objects are grouped together and the farther objects are ignored. There are three major methods to measure the similarity between objects:

   1. Euclidean distance measures

   2. Correlation coefficient

   3. Association coefficients

3. Selection of clustering approach: There are two types of clustering approaches:

   1. Hierarchical clustering approach

   2. Non-hierarchical clustering approach

The hierarchical clustering approach consists of either a top-down approach or a bottom-up approach. Prominent hierarchical clustering methods are: single linkage, complete linkage, average linkage, Ward’s method and the centroid method.


In the non-hierarchical clustering approach, a cluster center is first determined, and all the objects that are within the specified distance from the cluster center are included in the cluster.
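Both approaches can be sketched with Euclidean distance as the similarity measure. The code below is a minimal illustration on made-up two-dimensional points: a bottom-up (agglomerative) single-linkage procedure, and a non-hierarchical assignment that collects every object within a given radius of a chosen center. The center and radius are assumed values.

```python
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(points, n_clusters):
    """Bottom-up hierarchical clustering with single linkage.

    Starts with one cluster per object and repeatedly merges the two
    clusters whose closest members are nearest to each other.
    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def threshold_cluster(points, center, radius):
    """Non-hierarchical step: indices of all objects within radius of center."""
    return [i for i, p in enumerate(points) if dist(p, center) <= radius]


# Hypothetical objects measured on two variables.
points = [(1, 1), (1.5, 1.2), (1.2, 0.8), (8, 8), (8.5, 8.2), (9, 7.8)]

hierarchical = single_linkage(points, n_clusters=2)
around_first = threshold_cluster(points, center=(1, 1), radius=1.0)
```

Complete or average linkage would differ only in replacing `min` with `max` or a mean when measuring the distance between two clusters.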

Application:

Segmentation of market

Understanding buying behaviours

Identifying new product opportunities.

Selecting test markets

Reducing data

Artificial Neural Network

Used when the function is not well defined, e.g. fraud detection, stock price prediction.

Used for unstructured functions only.

Single-layer and multi-layer ANNs.

Steps: Architecture, Training, Activation
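The three steps can be seen in the smallest possible network, a single-layer perceptron. This sketch is an illustration under simplifying assumptions: the architecture is two inputs and one output unit, training uses the classic perceptron update rule, and the activation is a step function. The logical AND task is a toy example chosen because it is linearly separable.

```python
def step(z):
    """Activation: fire (1) when the weighted input is non-negative."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Output of the single unit for input vector x."""
    return step(bias + sum(w * v for w, v in zip(weights, x)))


def train_perceptron(samples, epochs=20, lr=1):
    """Perceptron rule: nudge the weights by the error on each sample."""
    n_inputs = len(samples[0][0])
    weights, bias = [0] * n_inputs, 0        # architecture: n inputs, 1 unit
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * v for w, v in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Toy training data: logical AND of two binary inputs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
outputs = [predict(weights, bias, x) for x, _ in AND]
```

A single layer can only separate classes with a straight line; problems such as fraud detection generally need multi-layer networks and a differentiable activation.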

DECISION TREE

Prediction and classification method.

Information gain and the Gini index are used to determine the hierarchy of splits.

Decile or percentile values decide the cut-offs at nodes.

Also used as a substitute for the logit model.
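A minimal sketch of how a single node could choose its cut-off: candidate cut-offs are taken at the deciles of a variable, and the one with the highest information gain is kept. The income values and labels are hypothetical, and real tree software may generate and score candidates differently.

```python
from collections import Counter
from math import log2
from statistics import quantiles  # Python 3.8+


def gini(labels):
    """Gini index: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits of the label distribution at a node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Reduction in entropy achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def best_cutoff(values, labels, n=10):
    """Try each decile (n=10) as a cut-off; return (gain, cut-off) of the best."""
    best = (0.0, None)
    for cut in quantiles(values, n=n):
        left = [l for v, l in zip(values, labels) if v <= cut]
        right = [l for v, l in zip(values, labels) if v > cut]
        if left and right:
            gain = information_gain(labels, left, right)
            if gain > best[0]:
                best = (gain, cut)
    return best


incomes = [20, 25, 30, 35, 40, 60, 65, 70, 75, 80]   # hypothetical values
bought = ["no"] * 5 + ["yes"] * 5                    # hypothetical labels
gain, cutoff = best_cutoff(incomes, bought)
```

Replacing `entropy` with `gini` inside `information_gain` gives the Gini-based version of the same split search.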

Thank you

