Prelude

“There will be more data created in the next five years than in the history of the planet Earth.

The ability to access that data, the ability to understand it and deliver it to the right people at the right time so that they can make the right decision, is going to be critical to every enterprise on the planet.”

How analytics can help

Business Challenge

Data:
• Internal Transactional Data
• Primary / Perception Data
• Secondary Market Data

Measure as-is:
• Reporting
• Performance Analyses
• Trend Analyses

Analyze:
• Segmentation
• Benchmarking and Gap Analyses
• Trigger Analyses
• Key Driver Analyses

Predict/Improve:
• Modeling
• Optimization
• Simulation

Data Driven Business Decision

Business Intelligence and Analytics

[Chart: Competitive Advantage plotted against Degree of Intelligence, from highest to lowest]

Analytics:
• Optimization: What’s the best that can happen?
• Predictive Modeling: What will happen next?
• Forecasting / Exploration: What if these trends continue?
• Statistical Analysis: Why is this happening?

Access and Reporting:
• Alerts: What actions are needed?
• Query / Drill down: Where exactly is the problem?
• Ad hoc reports: How many, how often, where?
• Standard reports: What happened?

Data Rich, Information Poor

“The typical Fortune 1000 company has a big problem: it collects a lot of important business data that never gets used. Users who could take data and turn it into competitive advantage can’t get the level of access they need.” --Forrester Research

DATA TYPES

1. Nominal: identify and classify
2. Ordinal: order data
3. Interval: rate the data
4. Ratio: set a reference

3 & 4 are metric data. 1 & 2 are non-metric data.

DATA MINING

Data reporting. Data divided into TRAINING DATA, TESTING DATA, PREDICTION DATA.

Data Cleaning:
• Consistency Checks: identify data that are out of range, logically inconsistent, or have extreme values. Data not defined by CODING are inadmissible.
• Missing Responses: values of variables that are unknown, because ambiguous answers are provided or no answer is given. Treatment: substitute a neutral value or an imputed response.
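The cleaning steps above can be sketched in a few lines. This is only an illustration: the function names, the 1-5 rating scale, the use of the mean as the neutral value and the 60/20/20 split are all assumptions, not something the slides specify.

```python
from statistics import mean

def clean_responses(values, low, high):
    """Consistency check plus neutral-value substitution (mean is an assumed choice)."""
    # Out-of-range codes and missing answers (None) are both treated as missing
    checked = [v if v is not None and low <= v <= high else None for v in values]
    valid = [v for v in checked if v is not None]
    neutral = mean(valid)  # neutral value: mean of the valid responses
    return [neutral if v is None else v for v in checked]

def split_data(rows, train_frac=0.6, test_frac=0.2):
    """Split rows, in order, into training, testing and prediction sets."""
    n_train = int(len(rows) * train_frac)
    n_test = int(len(rows) * test_frac)
    return rows[:n_train], rows[n_train:n_train + n_test], rows[n_train + n_test:]

# Hypothetical ratings on a 1-5 scale: 9 is out of range, None is a missing response
ratings = [4, 5, None, 9, 3, 4]
cleaned = clean_responses(ratings, low=1, high=5)
train, test, predict = split_data(list(range(10)))
```

In practice the rows would usually be shuffled before splitting; the ordered split here just keeps the sketch easy to follow.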

Stages in Data Analysis

1. Editing
2. Coding
3. Data Entry (Keyboarding)
4. Error Checking and Verification
5. Data Analysis
   • Descriptive Analysis
   • Univariate Analysis
   • Bivariate Analysis
   • Multivariate Analysis
6. Interpretation

A Classification of Multivariate Methods

All Multivariate Methods: Are some of the variables dependent on others?
• Yes: Dependence Methods
• No: Interdependence Methods

Multivariate Analysis: Classification of Dependence Methods

Dependence Methods: How many variables are dependent?
• One dependent variable
   Metric: Multiple Regression Analysis
   Nonmetric: Multiple Discriminant Analysis
• Several dependent variables
   Metric: Multivariate Analysis of Variance (MANOVA)
   Nonmetric: Conjoint Analysis
• Multiple independent and dependent variables
   Canonical Correlation Analysis

Metric: the scales are ratio or interval. Nonmetric: the scales are nominal or ordinal.

Multivariate Analysis: Classification of Interdependence Methods

Interdependence Methods: Are inputs metric?
• Metric (the scales are ratio or interval): Factor Analysis, Cluster Analysis, Metric Multidimensional Scaling
• Nonmetric (the scales are nominal or ordinal): Nonmetric Multidimensional Scaling

Correlation Analysis

Correlation analysis is a statistical technique used to measure the magnitude of the linear relationship between two variables. Correlation analysis cannot be used in isolation to describe the relationship between variables. It can be used along with regression analysis to determine the nature of the relationship between two variables.

Two prominent types of correlation coefficient are:
• Pearson Product Moment correlation coefficient
• Spearman’s Rank correlation coefficient
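As a rough sketch of the two coefficients named above: Pearson's coefficient is computed on the raw values, and Spearman's is the same formula applied to ranks. The function names and the advertising/sales figures are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's formula applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: advertising expenditure and sales for five periods
advertising = [10, 20, 30, 40, 50]
sales = [12, 25, 31, 48, 52]
r_pearson = pearson(advertising, sales)
r_spearman = spearman(advertising, sales)
```

Because sales rise in every period here, the rank correlation is a perfect 1 while Pearson's value is slightly lower, which shows how the two measures can differ on the same data.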

Application:
• How strongly are sales related to advertising expenditure?
• Is there any association between market share and size of the sales force?
• Are consumers’ perceptions of quality related to their perceptions of prices?

Regression Analysis

Regression analysis is used to predict the nature and closeness of relationships between two or more variables.
It evaluates the causal effect of one variable on another variable.
It is used to predict the variability in the dependent (or criterion) variable based on the information about one or more independent (or predictor) variables.

• Two variables: Simple or Linear Regression Analysis
• More than two variables: Multiple Regression Analysis

Application:
• Can the variation in market share be accounted for by the size of the sales force?
• Are consumers’ perceptions of quality determined by their perceptions of price?
• What level of sales can be expected, given the levels of advertising expenditures, prices, and level of the distribution?

Linear Regression Analysis

Linear regression: Y = α + βX

Where
Y: Dependent variable
X: Independent variable
α and β: Two constants, called regression coefficients
β: Slope coefficient, i.e. the change in the value of Y with the corresponding change in one unit of X
α: Y intercept when X = 0
R²: The strength of association, i.e. to what degree the variation in Y can be explained by X. If R² = 0.10, then only 10% of the total variation in Y can be explained by the variation in X.
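A minimal sketch of fitting the line by ordinary least squares and computing R² follows. The names `alpha` (intercept) and `beta` (slope) and the five data points are assumptions chosen for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates of the intercept, the slope and R squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx             # slope: change in Y per unit change in X
    alpha = my - beta * mx       # intercept: value of Y when X = 0
    ss_total = sum((b - my) ** 2 for b in y)
    ss_residual = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_residual / ss_total  # share of variation in Y explained by X
    return alpha, beta, r_squared

# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha, beta, r_squared = fit_line(x, y)
```

For these points the fit gives an intercept of about 2.2, a slope of about 0.6 and an R² of about 0.6, so roughly 60% of the variation in Y is explained by X.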

Test of significance of Regression Equation

Linear regression: Y = α + βX

F test is used to test the significance of the linear relationship between two variables Y and X.
H0: β = 0 (There is no linear relationship between Y and X)
H1: β ≠ 0 (There is a linear relationship between Y and X)

Objective: To check whether the estimates from the regression model represent the real world data.
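The F statistic for a simple regression can be sketched as the explained mean square divided by the residual mean square, with 1 and n - 2 degrees of freedom. The function names and data below are illustrative assumptions, and the critical value is passed in by hand (here the standard table value for 1 and 3 degrees of freedom at the 5% level) rather than computed.

```python
def f_statistic(x, y):
    """F = (SS_regression / 1) / (SS_residual / (n - 2)) for a one-predictor model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    ss_regression = sum((f - my) ** 2 for f in fitted)
    ss_residual = sum((b - f) ** 2 for b, f in zip(y, fitted))
    return (ss_regression / 1) / (ss_residual / (n - 2))

def reject_h0(f_value, critical_value):
    """Reject H0 (no linear relationship) when F exceeds the table value."""
    return f_value > critical_value

# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
f_value = f_statistic(x, y)
# 10.13 is the tabulated F critical value for (1, 3) degrees of freedom at 5%
significant = reject_h0(f_value, critical_value=10.13)
```

With only five observations F comes out near 4.5, below the table value, so H0 is not rejected for this small sample.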

Other regression methods
• Forward regression
• Stepwise regression
• Backward regression

Logit Analysis (Logistic Regression)

Dependent variable is binary and several independent variables are metric.
Estimates the probability of a binary event taking place, e.g. success/failure, by a logistic regression model.
S-curve.
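The S-curve mentioned above comes from the logistic function, which maps any linear score to a probability between 0 and 1. A small sketch with one predictor; the intercept of -3.0 and slope of 1.0 are hypothetical values, not fitted coefficients.

```python
from math import exp

def logistic(z):
    """Logistic (sigmoid) function: maps any real score to a probability."""
    return 1.0 / (1.0 + exp(-z))

def success_probability(x, intercept, slope):
    """Probability of the binary event for predictor value x."""
    return logistic(intercept + slope * x)

# Hypothetical coefficients; in practice they are estimated from data
curve = [round(success_probability(x, intercept=-3.0, slope=1.0), 3) for x in range(0, 7)]
```

The values in `curve` rise slowly at first, fastest around a probability of 0.5, then level off, which is the S shape.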

Discriminant Analysis

Discriminant analysis aims at studying the effect of two or more predictor variables (independent variables) on a certain evaluation criterion.

The evaluation criterion may be two or more groups:
• Two groups, such as good or bad, like or dislike, successful or unsuccessful, above expected level or below expected level
• Three groups, such as good, normal or poor

Objectives:
• To check whether the predictor variables discriminate among the groups
• To identify the predictor variable which is more important when compared to other predictor variable(s)

Such analysis is called discriminant analysis.

Discriminant Analysis

• Designing a discriminant function: Y = aX1 + bX2, where Y is a linear composite representing the discriminant function, and X1 and X2 are the predictor variables (independent variables) which have an effect on the evaluation criterion of the problem of interest.
• Finding the discriminant ratio (K) and determining the variables which account for intergroup difference in terms of group means. This ratio is the maximum possible ratio between the ‘variability between groups’ and the ‘variability within groups’.
• Finding the critical value which can be used to include a new data set (i.e. a new combination of instances for the predictor variables) into its appropriate group.
• Testing H0: The group means are equal in importance, against H1: The group means are not equal in importance, using F test at a given significance level.
• CONFUSION MATRIX, MISCLASSIFICATION ERROR.
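One common way to obtain the weights a and b is Fisher's linear discriminant for two groups, which maximizes the ratio of between-group to within-group variability. The sketch below assumes that method, uses the midpoint of the projected group means as the critical value, and runs on made-up data; none of the names come from the slides.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def scatter(rows, m):
    """Within-group sums of squares and cross products: (s11, s12, s22)."""
    s11 = sum((r[0] - m[0]) ** 2 for r in rows)
    s12 = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    s22 = sum((r[1] - m[1]) ** 2 for r in rows)
    return s11, s12, s22

def fit_discriminant(group_a, group_b):
    """Return weights a, b and the cutoff for Y = a*X1 + b*X2 (Fisher's method)."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    a11, a12, a22 = scatter(group_a, ma)
    b11, b12, b22 = scatter(group_b, mb)
    s11, s12, s22 = a11 + b11, a12 + b12, a22 + b22   # pooled within-group scatter
    det = s11 * s22 - s12 * s12
    d1, d2 = ma[0] - mb[0], ma[1] - mb[1]             # difference of group means
    a = (s22 * d1 - s12 * d2) / det                   # weights = inverse(S_within) * difference
    b = (-s12 * d1 + s11 * d2) / det
    cutoff = (a * (ma[0] + mb[0]) + b * (ma[1] + mb[1])) / 2  # midpoint of projected means
    return a, b, cutoff

def classify(x1, x2, a, b, cutoff):
    return "A" if a * x1 + b * x2 > cutoff else "B"

def confusion_matrix(group_a, group_b, a, b, cutoff):
    """Counts keyed by (actual group, predicted group)."""
    counts = {("A", "A"): 0, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 0}
    for actual, rows in (("A", group_a), ("B", group_b)):
        for x1, x2 in rows:
            counts[(actual, classify(x1, x2, a, b, cutoff))] += 1
    return counts

# Hypothetical scores on two predictors for two groups
group_a = [(6, 7), (7, 8), (8, 6), (7, 7)]
group_b = [(2, 3), (3, 2), (3, 4), (4, 3)]
a, b, cutoff = fit_discriminant(group_a, group_b)
matrix = confusion_matrix(group_a, group_b, a, b, cutoff)
misclassification_error = (matrix[("A", "B")] + matrix[("B", "A")]) / (len(group_a) + len(group_b))
```

The two groups here are well separated, so every case lands on the diagonal of the confusion matrix; with overlapping groups the off-diagonal counts give the misclassification error.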

Application:
• What psychographic characteristics help differentiate between price-sensitive and non-price-sensitive buyers of groceries?
• In terms of demographic characteristics, how do customers who exhibit store loyalty differ from those who do not?

Conjoint Analysis

Technique that attempts to determine the relative importance consumers attach to salient attributes and the utilities they attach to the levels of attributes.

TERMS: Part worth functions, relative importance weights, attribute levels.
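To connect the three terms: one widely used convention takes an attribute's relative importance as the range of its part-worths divided by the sum of the ranges over all attributes. The sketch below assumes that convention; the attributes, levels and part-worth values are invented for illustration.

```python
def relative_importance(part_worths):
    """Importance weight per attribute = its part-worth range / sum of all ranges."""
    ranges = {
        attribute: max(levels.values()) - min(levels.values())
        for attribute, levels in part_worths.items()
    }
    total = sum(ranges.values())
    return {attribute: r / total for attribute, r in ranges.items()}

# Hypothetical part-worth utilities for each attribute level
part_worths = {
    "price": {"low": 1.0, "medium": 0.0, "high": -1.0},
    "brand": {"A": 0.5, "B": -0.5},
    "warranty": {"1 year": -0.5, "3 years": 0.5},
}
weights = relative_importance(part_worths)
```

Price has the widest spread of utilities in this example, so it receives half of the total importance and the other two attributes a quarter each.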

Factor Analysis

Factor analysis can be defined as a ‘set of methods in which the observable or manifest responses of individuals on a set of variables are represented as functions of a small number of latent variables called factors’. It aims at grouping the original input variables into the factors which underlie them.

Analysis based on a wide range of variables can be tedious and time consuming. Using Factor Analysis, the researcher can reduce the large number of variables into a few dimensions called factors that summarize the available data. For example:
• Income level, education and employment status can be combined under a factor called socio-economic status.
• Age, gender and marital status can be combined under a factor called demographic characteristics.
• Credit card and family background can be combined under a factor called background status.

Factor analysis helps the researcher to reduce the number of variables to be analyzed, thereby making the analysis easier.

Benefits of Factor Analysis
• To identify the hidden dimensions or constructs which may not be apparent from direct analysis
• To identify relationships between variables
• It helps in data reduction
• It helps the researcher to cluster the product and population being analyzed

Procedure followed for Factor Analysis
1. Define the problem
2. Construct the correlation matrix that measures the relationships among the variables
3. Select an appropriate factor analysis method
4. Determine the number of factors
5. Rotation of factors
6. Interpret the factors
7. Determine the factor scores
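Steps 2 to 4 of the procedure can be sketched with principal component extraction: build the correlation matrix, then find its largest eigenvalue and the matching loadings by power iteration. Everything here is an assumption for illustration, including the data, the extraction method and the "eigenvalue greater than 1" rule of thumb for keeping a factor; rotation and factor scores are not shown.

```python
from math import sqrt

def correlation_matrix(columns):
    """Pairwise Pearson correlations between the variables (one list per variable)."""
    def corr(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / sqrt(sxx * syy)
    return [[corr(ci, cj) for cj in columns] for ci in columns]

def first_factor(matrix, iterations=200):
    """Largest eigenvalue and loadings of the first factor via power iteration."""
    k = len(matrix)
    vector = [1.0 / sqrt(k)] * k
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(k)) for i in range(k)]
        eigenvalue = sqrt(sum(p * p for p in product))
        vector = [p / eigenvalue for p in product]
    # Loading of each variable = eigenvector entry scaled by sqrt(eigenvalue)
    loadings = [v * sqrt(eigenvalue) for v in vector]
    return eigenvalue, loadings

# Hypothetical respondents: income, years of education, years in employment
income = [20, 35, 50, 65, 80]
education = [10, 12, 15, 16, 18]
employment = [1, 3, 4, 7, 9]
matrix = correlation_matrix([income, education, employment])
eigenvalue, loadings = first_factor(matrix)
keep_factor = eigenvalue > 1.0  # assumed rule of thumb for retaining a factor
```

All three variables load strongly on the single retained factor, which matches the idea of combining them under one heading such as socio-economic status.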

Application:
• Used in market segmentation for identifying underlying variables on which to group the customers.
• In product research, employed to determine the brand attributes that influence consumer choices.
• In advertising, used to understand the media consumption habits of the target market.
• In pricing studies, used to identify the characteristics of price-sensitive consumers.

Cluster Analysis

Cluster analysis can be defined as a set of techniques used to classify objects into relatively homogeneous groups called clusters.
It involves identifying similar objects and grouping them under homogeneous groups.
A cluster is a group of objects that display high correlation with each other and low correlation with objects in other clusters.

Procedure in Cluster Analysis

1. Defining the problem: First define the problem and decide upon the variables based on which the objects are clustered.
2. Selection of similarity or distance measures: The similarity measure tries to examine the proximity between the objects. Closer or similar objects are grouped together and the farther objects are ignored. There are three major methods to measure the similarity between objects:
   1. Euclidean Distance measures
   2. Correlation coefficient
   3. Association coefficients
3. Selection of clustering approach: To select the appropriate clustering approach. There are two types of clustering approaches:
   1. Hierarchical Clustering approach
   2. Non-Hierarchical Clustering approach

Hierarchical clustering approach consists of either a top-down approach or a bottom-up approach. Prominent hierarchical clustering methods are: Single linkage, Complete linkage, Average linkage, Ward’s method and Centroid method.

Procedure in Cluster Analysis

Non-Hierarchical clustering approach: A cluster center is first determined and all the objects that are within the specified distance from the cluster center are included in the cluster.

4. Deciding on the number of clusters to be selected
5. Interpreting the clusters
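A bottom-up (agglomerative) hierarchical run with Euclidean distance and single linkage can be sketched as follows. The points and the choice of two final clusters are assumptions for illustration; the other linkage methods would replace the `min` in the distance between clusters.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def single_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]          # start: every object is its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                        # j > i, so index i is unaffected
    return clusters

# Hypothetical objects measured on two variables
points = [(1, 1), (1.5, 1), (1, 1.5), (8, 8), (8.5, 8), (8, 9)]
clusters = single_linkage(points, n_clusters=2)
```

The three objects near (1, 1) end up in one cluster and the three near (8, 8) in the other; stopping at a different `n_clusters` corresponds to cutting the hierarchy at a different level.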

Application:
• Segmentation of market
• Understanding buying behaviours
• Identifying new product opportunities
• Selecting test markets
• Reducing data

Artificial Neural Network

Used when the function is not well defined. Used for unstructured functions only, e.g. stock price prediction, fraud.
Single layer, multi layer ANN.
STEPS: Architecture - Training - Activation
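The three steps can be illustrated with the smallest possible case, a single-layer perceptron: the architecture is two inputs feeding one output unit, training adjusts the weights from the errors, and the activation is a step function. The logical AND task, learning rate and epoch count are toy assumptions; real uses such as fraud detection need far larger networks and data.

```python
def step(z):
    """Step activation: the unit fires (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def predict(weights, bias, x1, x2):
    return step(weights[0] * x1 + weights[1] * x2 + bias)

def train_perceptron(samples, epochs=20, rate=1.0):
    """Perceptron learning rule: nudge weights by rate * error * input."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict(weights, bias, x1, x2)
            weights[0] += rate * error * x1
            weights[1] += rate * error * x2
            bias += rate * error
    return weights, bias

# Toy training set: logical AND of two binary inputs
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_GATE)
outputs = [predict(weights, bias, x1, x2) for (x1, x2), _ in AND_GATE]
```

A single layer can only separate classes with a straight line, which is why multi-layer networks are used for the less well-defined functions the slide mentions.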

DECISION TREE

Prediction, classification method.
Information gain and Gini index are used for hierarchical determination of the splits.
Decile or percentile values decide cut-offs at nodes.
Also used as a substitute for the logit model.
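How a node picks its cut-off can be sketched by scoring each candidate split with information gain, with the Gini index shown alongside as the alternative impurity measure. The income figures, labels and the use of every observed value as a candidate cut-off (instead of deciles) are assumptions for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits; 0 for an empty or pure set of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(values, labels, cutoff):
    """Entropy before the split minus the weighted entropy after it."""
    left = [l for v, l in zip(values, labels) if v <= cutoff]
    right = [l for v, l in zip(values, labels) if v > cutoff]
    n = len(labels)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - after

def best_cutoff(values, labels):
    """Candidate cut-off with the highest information gain."""
    return max(sorted(set(values)), key=lambda c: information_gain(values, labels, c))

# Hypothetical node: income (in thousands) and whether the customer defaulted
income = [20, 25, 30, 35, 60, 65, 70, 75]
default = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
cutoff = best_cutoff(income, default)
gain = information_gain(income, default, cutoff)
```

Splitting at an income of 35 separates the two classes completely here, so the gain equals the full entropy of the node; the tree would repeat the same search within each branch.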

