
Technique

Application

well as categorical.

Alternative

Assumption

If an assumption is violated, the Box-Cox method can be used to transform the variable.

Limitations

The response should depend linearly on the predictors; otherwise an SVM or a neural network can be used.

Multicollinearity among the independent variables needs to be checked.

Accuracy measure

Key terms

Multicollinearity, outliers and influential points, Box-Cox, normal distribution, PCA

Sources to study

Regression Models course on Coursera by Johns Hopkins University (implementation in R)
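The Box-Cox method named under Assumption can be sketched as follows. This is a minimal illustration in Python rather than the R used by the course above; the function name `box_cox`, the sample data and the hand-picked values of lambda are assumptions made for the example. In practice lambda is usually estimated from the data.

```python
import math

def box_cox(values, lam):
    """Box-Cox power transform; only defined for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        # The limiting case lambda -> 0 is the natural logarithm.
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Made-up, right-skewed sample used only for illustration.
skewed = [1.0, 2.0, 4.0, 8.0, 16.0]
print(box_cox(skewed, 0))    # log transform
print(box_cox(skewed, 0.5))  # square-root-like transform
```

A lambda of 1 leaves the shape of the data unchanged (it only shifts it), which is why values of lambda well below 1 are the ones that pull in a long right tail.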

Technique

Logistic Regression

Application

numeric as well as categorical.

Alternative

Bayesian

Assumption

If an assumption is violated, the Box-Cox method can be used to transform the variable.

Limitations

The response should depend linearly on the predictors; otherwise an SVM or a neural network can be used.

Multicollinearity among the independent variables needs to be checked.

Accuracy measure

Key terms

Multicollinearity, outliers and influential points, Box-Cox, normal distribution, PCA

Sources to study

Regression Models course on Coursera by Johns Hopkins University (implementation in R)
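The multicollinearity check listed under Limitations can be sketched for the simplest case of two predictors. Multicollinearity is a property of the predictor variables, so the sketch below computes their Pearson correlation and the variance inflation factor, VIF = 1 / (1 - R^2), where with only two predictors R^2 is the squared correlation. The function names and the data are hypothetical; with more predictors each R^2 would come from regressing one predictor on all the others.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for the two-predictor case; undefined if the correlation is exactly +/-1."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Made-up predictor values used only for illustration.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson_r(x1, x2), vif_two_predictors(x1, x2))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity, although the exact cutoff is a convention rather than a fixed rule.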

Technique

Poisson Regression

Application

failure of a machine )

Alternative

Assumption

Limitations

Accuracy measure

AIC, BIC

Key terms

Sources to study

Regression Models course on Coursera by Johns Hopkins University (implementation in R)
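The AIC and BIC measures listed above can be computed from a model's log-likelihood. The sketch below is an illustration only: it uses an intercept-only Poisson model (one parameter, whose maximum-likelihood estimate is the sample mean) and made-up failure counts. The function names are hypothetical, and a real Poisson regression with covariates would be fitted with a statistics package.

```python
import math

def poisson_log_likelihood(counts, rate):
    """Log-likelihood of independent Poisson counts with a common rate."""
    return sum(y * math.log(rate) - rate - math.lgamma(y + 1) for y in counts)

def aic(log_lik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik

# Made-up machine-failure counts per period, for illustration only.
failures = [0, 2, 1, 3, 1, 0, 2, 1]
rate = sum(failures) / len(failures)  # MLE of the rate in the intercept-only model
ll = poisson_log_likelihood(failures, rate)
print(aic(ll, k=1), bic(ll, k=1, n=len(failures)))
```

For both criteria a lower value indicates a better trade-off between fit and model size; BIC penalizes extra parameters more heavily once n exceeds about 7.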

Technique

Application

numeric output.

Alternative

Neural network

Assumption

The Kruskal-Wallis test can be used if the assumptions are violated.

Limitations

Accuracy measure

F-statistic

Key terms

Sources to study

R code from http://mgmt.iisc.ernet.in/CM/MG221/Handouts.html
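The Kruskal-Wallis method mentioned under Assumption works on ranks rather than raw values. As a rough sketch (in Python, not the R code linked above), the H statistic can be computed as below. The function name `kruskal_wallis_h` and the data are assumptions for illustration, and the sketch uses average ranks for ties but omits the usual tie-correction factor.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    # Sum over groups of (rank sum)^2 / group size.
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Made-up measurements for three groups, for illustration only.
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h)
```

Under the null hypothesis H is approximately chi-square distributed with (number of groups - 1) degrees of freedom, so a large H suggests that at least one group differs.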

Technique

Application

Alternative

Assumption

No assumptions

Limitations

If not then it has to be divided into categories

Accuracy measure

Key terms

Sources to study

http://www-users.cs.umn.edu/~kumar/dmbook/index.php (theory)

R code is provided during training

Technique

Clustering

Application

Alternative

Assumption

No assumptions

Limitations

cluster is difficult )

Because of redundant variables, clear clusters are not visible in large data sets.

Accuracy measure

Key terms

selection, knee point, sum of squares

Sources to study

http://www-users.cs.umn.edu/~kumar/dmbook/index.php
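The "knee point" and "sum of squares" key terms above refer to choosing the number of clusters by watching how the within-cluster sum of squares falls as k grows. A minimal sketch, assuming one-dimensional data and a simple k-means with a deterministic start (both simplifications chosen for the example, with hypothetical function names):

```python
def kmeans_1d(points, k, max_iter=100):
    """Plain k-means on 1-D data with evenly spaced starting centers."""
    pts = sorted(points)
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    return centers, clusters

def within_ss(centers, clusters):
    """Total within-cluster sum of squared distances to the cluster center."""
    return sum((p - m) ** 2 for m, c in zip(centers, clusters) for p in c)

# Made-up data with three visible groups, for illustration only.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
wss = {k: within_ss(*kmeans_1d(data, k)) for k in (1, 2, 3, 4)}
for k, value in wss.items():
    print(k, round(value, 2))
```

The sum of squares drops sharply up to k = 3 and only slightly afterwards; that bend is the knee point, and it suggests three clusters for this data.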

Technique

Application

Classifier

Alternative

Assumption

Limitations

Interpretation is difficult when the tree is large.

Handles categorical variables only; numeric data is otherwise categorized by the decision tree, with a loss of information.

Accuracy measure

Key terms

Sources to study

http://www-users.cs.umn.edu/~kumar/dmbook/index.php (theory)

Technique

Application

Classifier

Alternative

Assumption

Limitations

All variables must be independent.

Categorical variables only.

Accuracy measure

Key terms

Bayes theorem

Sources to study

http://www-users.cs.umn.edu/~kumar/dmbook/index.php (theory)
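The combination of Bayes theorem with independent, categorical variables described above can be sketched as a small classifier. The code is an illustration under stated assumptions: the function names, the weather-style features and the labels are all made up, and no smoothing is applied, so a feature value never seen with a class gives that class a probability of zero.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count classes and, per (feature index, class), the feature values seen."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
    return class_counts, feature_counts

def posterior(row, class_counts, feature_counts):
    """P(class | row) via Bayes theorem, treating features as independent."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            score *= feature_counts[(i, label)][value] / count  # P(value | class)
        scores[label] = score
    evidence = sum(scores.values())
    if evidence == 0:
        raise ValueError("row has zero probability under every class")
    return {label: s / evidence for label, s in scores.items()}

# Made-up training data: (outlook, temperature) -> label.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "hot"), ("sunny", "mild"), ("rainy", "mild")]
labels = ["no", "no", "yes", "no", "yes", "yes"]
class_counts, feature_counts = train(rows, labels)
result = posterior(("sunny", "mild"), class_counts, feature_counts)
print(result)
```

The independence requirement listed under Limitations is what allows the per-feature probabilities to be multiplied together in `posterior`.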

Technique

Application

Classifier (non-linear)

Alternative

Assumption

Limitations

Computationally very expensive and time-consuming.

Accuracy measure

Key terms

Sources to study

http://www-users.cs.umn.edu/~kumar/dmbook/index.php

Technique

Principal Component

Application

Alternative

Feature selection

Assumption

Limitations

Interpretation is difficult.

Accuracy measure

Key terms

Sources to study

http://www-users.cs.umn.edu/~kumar/dmbook/index.php
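To make the principal component idea concrete, the sketch below handles the two-variable case, where the eigenvalues of the 2x2 covariance matrix have a closed form. It is an illustration only; the function name `pca_2d` and the data are assumptions, and real analyses with more variables would rely on a linear algebra library.

```python
import math

def pca_2d(xs, ys):
    """Return ((lam1, lam2), first principal direction) for two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Eigenvector belonging to the larger eigenvalue lam1.
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (lam1, lam2), (vx / norm, vy / norm)

# Made-up, perfectly correlated data: y is always 2 * x.
(lam1, lam2), direction = pca_2d([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(lam1 / (lam1 + lam2), direction)
```

Here the first component explains essentially all of the variance, but it is a weighted mix of both original variables, which is one reason interpretation is listed as a limitation.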

Technique

Time series

Application

Alternative

Assumption

Limitations

Accuracy measure

Key terms

ARIMA, forecasting

Sources to study

Website: otexts.com

Further readings suggested

Markov models: for predicting sudden jumps in stock market data or in time series data from other domains.

Survival analysis: for data on when a machine is going to fail. Such data are sometimes censored, for example machine-failure data in which some machines had not yet failed when the data were collected.

Data envelopment analysis: to measure the performance difference between various units or teams based on multiple factors.
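The censoring described under survival analysis can be illustrated with the Kaplan-Meier estimator, a standard way to estimate survival probabilities when some units have not failed by the end of observation. The estimator is named here as an illustrative choice, not something the list above prescribes; the function name and the machine data are made up.

```python
def kaplan_meier(durations, failed):
    """Return [(time, survival probability)] at each observed failure time.

    failed[i] is 1 if machine i failed at durations[i], 0 if it was censored
    (still running when observation stopped).
    """
    data = sorted(zip(durations, failed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # failed and censored units both leave the risk set
    return curve

# Made-up running times (in months); two machines are censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(curve)
```

Censored machines still count as "at risk" up to the time they leave observation, so they contribute information without being treated as failures.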

Further readings suggested

Transformation of variables in linear modeling (Box-Cox method)

Adding non-linear terms to your model

Association sequence rule

Sparse clustering for feature selection in clustering (special method for clustering)

Naive Bayes classifier (used in text analytics to classify tweets, mails, documents, etc.)

K-NN classifier, usually used when both clustering and classification are required.

How to include interaction terms to improve model performance

Poisson regression for predicting the failure of machines

Generalized linear regression modelling

Linear discriminant analysis

Boosting, bagging and other methods for improving classifiers

Random forests method for classification

Topic modelling

Sentiment analysis

Support vector regression for non-linear regression

Thank You

