
Summary

Technique

Linear Regression

Application

Predicting a numeric quantity. The independent variables can be numeric
as well as categorical.

Alternative

Assumption

Normality, iid, and homoscedasticity of the residuals.

If an assumption is violated, the Box-Cox method can be used to
transform the variable.

Limitations

The relationship should be linear; otherwise SVM or a neural network
can be used.
Multicollinearity among the independent variables needs to be checked.

Accuracy
measure

Key terms

Ordinary linear regression, residuals, transformation of variables,
multicollinearity, outliers and influential points, Box-Cox, normal
distribution, PCA

Sources to study

NPTEL notes by Prof. Shalabh from IIT Kanpur (theory)

Regression Models course on Coursera by Johns Hopkins (implementation
in R)
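As a toy illustration of the notes above, the following sketch fits an ordinary least squares line with NumPy and applies SciPy's Box-Cox transform, which can help when the residuals violate the normality assumption. All data here is synthetic, invented purely for illustration.

```python
# Minimal OLS + Box-Cox sketch on made-up data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 100)   # true slope 3, intercept 2

# Design matrix with an intercept column; solve least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta                    # inspect these for normality

# Box-Cox requires strictly positive data; it estimates the lambda
# that brings the transformed variable closest to normal.
y_bc, lam = stats.boxcox(y - y.min() + 1)
```

Checking the residuals (e.g., with a Q-Q plot) before and after the transform is the usual workflow.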

Technique

Logistic Regression

Application

Predicting categorical variables. The independent variables can be
numeric as well as categorical.

Alternative

Bayesian

Assumption

Independence of observations and linearity of the log-odds in the
predictors. Unlike linear regression, logistic regression does not
assume normality or homoscedasticity of the residuals; skewed
predictors can still be transformed (e.g., with Box-Cox).

Limitations

The log-odds should be linear in the predictors; otherwise SVM or a
neural network can be used.
Multicollinearity among the independent variables needs to be checked.

Accuracy
measure

Key terms

Logistic regression as a classifier, residuals, transformation of
variables, multicollinearity, outliers and influential points, Box-Cox,
normal distribution, PCA

Sources to study

NPTEL notes by Prof. Shalabh from IIT Kanpur (theory)

Regression Models course on Coursera by Johns Hopkins (implementation
in R)
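A minimal sketch of logistic regression as a classifier, using scikit-learn on a small synthetic two-class dataset (the data is invented for illustration):

```python
# Logistic regression on two well-separated synthetic classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X0 = rng.normal(0, 1, (50, 2))   # class 0 scattered around the origin
X1 = rng.normal(3, 1, (50, 2))   # class 1 shifted away
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
acc = clf.score(X, y)            # training accuracy on this toy data
```

In practice accuracy would be measured on held-out data, with a confusion matrix or ROC curve as noted above.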

Technique

Poisson Regression

Application

Predicting the number of events in a given time period (an event can be
the failure of a machine).

Alternative

Assumption

Limitations

Accuracy
measure

AIC, BIC

Key terms

Sources to study

NPTEL notes by Prof. Shalabh from IIT Kanpur (theory)

Regression Models course on Coursera by Johns Hopkins (implementation
in R)
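A sketch of Poisson regression for count data, assuming a reasonably recent scikit-learn (which provides `PoissonRegressor`); the machine-failure data below is synthetic:

```python
# Poisson regression: failure counts that grow with machine age.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(2)
machine_age = rng.uniform(0, 5, (200, 1))
# True rate grows with age: lambda = exp(0.1 + 0.5 * age)
lam = np.exp(0.1 + 0.5 * machine_age[:, 0])
failures = rng.poisson(lam)               # observed counts

model = PoissonRegressor(alpha=0.0).fit(machine_age, failures)
pred = model.predict(machine_age)         # always positive (log link)
```

The log link guarantees positive predicted rates, which is why Poisson regression suits counts better than ordinary least squares.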

Technique

ANOVA

Application

Comparing group means for a numeric output.

Alternative

Neural network

Assumption

Normality, iid, and homoscedasticity for the groups.

The Kruskal-Wallis test can be used if the assumptions are violated.

Limitations

Accuracy
measure

F statistic

Key terms

Sources to study

Kutner, Applied Linear Statistical Models (book)

R code from http://mgmt.iisc.ernet.in/CM/MG221/Handouts.html
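Both the F test and its Kruskal-Wallis fallback are one-liners in SciPy; the three groups below are synthetic, with one group deliberately shifted:

```python
# One-way ANOVA vs. Kruskal-Wallis on three synthetic groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
g1 = rng.normal(10, 2, 30)
g2 = rng.normal(10, 2, 30)
g3 = rng.normal(14, 2, 30)   # shifted group, so means differ

f_stat, p_anova = stats.f_oneway(g1, g2, g3)   # parametric F test
h_stat, p_kw = stats.kruskal(g1, g2, g3)       # rank-based fallback
```

Both tests should reject equal means here; Kruskal-Wallis is the one to trust when normality or homoscedasticity fails.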

Technique

Association Rule Mining

Application

Finding items that frequently occur together (market basket analysis).

Alternative

Assumption

No assumptions

Limitations

Everything should be categorical.

If not, numeric variables have to be divided into categories.

Accuracy
measure

Key terms

Association rules, sequences, support, confidence, lift, market basket
analysis

Sources to study

http://www-users.cs.umn.edu/~kumar/dmbook/index.php (theory)
R code is provided during training
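The key terms above (support, confidence, lift) can be computed directly from a list of transactions; the basket data below is made up:

```python
# Support, confidence, and lift for a single rule on toy baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk", "butter"},
    {"bread", "milk", "eggs"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions) / n

# Rule: {bread} -> {milk}
sup_both = support({"bread", "milk"})        # 3/5
conf = sup_both / support({"bread"})         # P(milk | bread)
lift = conf / support({"milk"})              # < 1 means no positive association
```

A lift below 1, as here, means buying bread actually makes milk slightly *less* likely than its base rate in these toy baskets.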

Technique

Clustering

Application

Alternative

Assumption

No assumptions

Limitations

If the dataset is large, hierarchical clustering cannot be used and
finding the number of clusters is difficult.
Due to redundant variables, clear clusters are not visible in large
data.

Accuracy
measure

Key terms

K-means, k-medoids, hierarchical, sparse clustering, PCA, feature
selection, knee point, sum of squares

Sources to study

Machine Learning course by Stanford University (Prof. Andrew Ng)

http://www-users.cs.umn.edu/~kumar/dmbook/index.php
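The "knee point" idea in the key terms can be sketched by plotting the within-cluster sum of squares (k-means inertia) against k; the three blobs below are synthetic:

```python
# Elbow/knee-point sketch: within-cluster sum of squares vs. k.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Three well-separated synthetic blobs in 2-D.
X = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in (0, 5, 10)])

# Inertia (sum of squares) for k = 1..6; the sharp drop flattens at k=3.
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 7)]
```

Plotting `wss` against k and picking the bend ("knee") is the usual heuristic for choosing the number of clusters.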

Technique

Decision Tree

Application

Classifier

Alternative

Assumption

Limitations

The data should be linearly separable.

Interpretation is difficult when the tree is big.
Only categorical variables are handled directly; numeric data is
categorized by the decision tree with some loss of information.

Accuracy
measure

Key terms

Entropy, SplitInfo, gain ratio

Sources to study

http://www-users.cs.umn.edu/~kumar/dmbook/index.php (theory)
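The key terms above (entropy, SplitInfo, gain ratio) reduce to a few lines of arithmetic; this sketch evaluates one perfect binary split on a toy node:

```python
# Entropy, information gain, and gain ratio for one candidate split.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

# Parent node: 4 positives, 4 negatives -> entropy 1.0 bit.
parent = ["+"] * 4 + ["-"] * 4
# A split that separates the classes perfectly.
left, right = ["+"] * 4, ["-"] * 4

gain = entropy(parent) - (len(left) / 8 * entropy(left)
                          + len(right) / 8 * entropy(right))
# SplitInfo is the entropy of the split itself; it penalizes
# many-valued splits (here a clean 50/50 binary split).
split_info = entropy(["L"] * len(left) + ["R"] * len(right))
gain_ratio = gain / split_info
```

A perfect binary split of a balanced node gives gain 1 and gain ratio 1, the best possible values here.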

Technique

Naive Bayes

Application

Classifier

Alternative

Assumption

Limitations

The data should be linearly separable.

All variables must be independent.
Only categorical variables.

Accuracy
measure

Confusion matrix, accuracy, KS statistic, area under the ROC curve

Key terms

Bayes theorem

Sources to study

http://www-users.cs.umn.edu/~kumar/dmbook/index.php (theory)
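Bayes theorem is the whole mechanism here, so a Naive Bayes classifier on one categorical feature fits in a few lines; the weather/play dataset is invented:

```python
# Naive Bayes from Bayes theorem on a toy categorical dataset.
from collections import Counter

# (feature, label) pairs, all made up for illustration.
data = [("sunny", "play"), ("sunny", "play"), ("rain", "no"),
        ("rain", "no"), ("sunny", "no"), ("rain", "play")]

labels = [y for _, y in data]
prior = {y: c / len(data) for y, c in Counter(labels).items()}

def likelihood(x, y):
    # P(feature = x | class = y) estimated by counting.
    in_class = [xi for xi, yi in data if yi == y]
    return in_class.count(x) / len(in_class)

def predict(x):
    # Bayes theorem up to a constant: P(y|x) is proportional to P(x|y) P(y).
    return max(prior, key=lambda y: likelihood(x, y) * prior[y])

pred = predict("sunny")
```

With several features, the "naive" independence assumption lets the likelihood factor into a product of per-feature terms.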

Technique

SVM and Neural Network (classifiers)

Application

Classifier (non-linear)

Alternative

Assumption

Limitations

Interpretation is very difficult.

Very expensive and time-consuming.

Accuracy
measure

Key terms

Sources to study

Machine Learning course by Stanford University (Prof. Andrew Ng)

http://www-users.cs.umn.edu/~kumar/dmbook/index.php
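The non-linear classification point above can be illustrated with the classic XOR pattern, which no linear model can separate but a kernel SVM handles; the four points are the standard toy example:

```python
# An RBF-kernel SVM fits XOR, which is not linearly separable.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])   # XOR labels

# A localized RBF kernel lets the model carve a non-linear boundary.
clf = SVC(kernel="rbf", gamma=5.0, C=100.0).fit(X, y)
acc = clf.score(X, y)
```

Swapping in `kernel="linear"` on the same data is a quick way to see the linear model fail.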

Technique

Principal Component Analysis

Application

Dimensionality reduction for clustering as well as regression

Alternative

Feature selection

Assumption

Limitations

Interpretation is difficult.

Accuracy
measure
Key terms

Sources to study

Machine Learning course by Stanford University (Prof. Andrew Ng)

http://www-users.cs.umn.edu/~kumar/dmbook/index.php
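PCA for dimensionality reduction is a centering step plus an SVD; the correlated synthetic features below are invented so that one direction dominates:

```python
# PCA via SVD: 3 features (two highly correlated) reduced to 2 components.
import numpy as np

rng = np.random.default_rng(5)
base = rng.normal(0, 1, (100, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(100, 1)),  # ~redundant
               rng.normal(size=(100, 1))])                  # independent noise

Xc = X - X.mean(axis=0)              # center before PCA
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)      # variance explained per component
scores = Xc @ Vt.T[:, :2]            # project onto the first two PCs
```

Because two of the three features are nearly redundant, the first component captures most of the variance, which is exactly why PCA helps before clustering or regression.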

Technique

Time series

Application

Alternative

Assumption

Limitations

Accuracy
measure

AIC and BIC

Key terms

ARIMA, forecasting

Sources to study

NPTEL videos by an IISc professor from the Civil Engineering Department

Website: OTexts.com
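The AIC/BIC accuracy measures above are simple formulas over a fitted model's log-likelihood; this sketch compares two hypothetical ARIMA fits (the log-likelihoods and model names are made up for illustration):

```python
# AIC = 2k - 2 ln(L), BIC = k ln(n) - 2 ln(L); lower is better.
import math

n = 120  # hypothetical number of observations
# (log-likelihood, number of parameters k) for two candidate fits.
candidates = {"ARIMA(1,0,0)": (-210.3, 2), "ARIMA(2,0,1)": (-208.9, 4)}

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n))
```

Here the larger model's slightly better fit does not pay for its extra parameters, so both criteria prefer the simpler model; BIC penalizes complexity more heavily as n grows.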

Suggested topics

Markov models: for predicting sudden jumps in stock market data or time
series data from other domains.
Survival Analysis: for predicting when a machine is going to fail.
Sometimes the data is censored, e.g., failure data where some machines
had not yet failed when the data was collected.
Data Envelopment Analysis: to measure the performance difference between
various units or teams based on multiple factors.

Transformation of variables in linear modelling (Box-Cox method)

What measures to take when an assumption is violated (Box-Cox)

Adding non-linear terms to your model
Association and sequence rules
Sparse clustering for feature selection in clustering (a special method
for clustering)
Naive Bayes classifier (used in text analytics to classify tweets,
mails, documents, etc.)
K-NN classifier, usually used when both clustering and classification
are required
How to include interaction terms to improve model performance
Poisson regression for predicting the failure of machines
Generalized linear modelling
Linear discriminant analysis
Boosting, bagging, and other methods for improving classifiers
Random forests for classification
Topic modelling
Sentiment analysis
Support vector regression for non-linear regression
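The survival-analysis suggestion above, with its point about censoring, can be sketched with a hand-rolled Kaplan-Meier estimator; the failure records below are invented, with `observed=False` marking machines that had not yet failed when data collection stopped:

```python
# Kaplan-Meier survival curve for censored machine-failure times.
# Each record is (time, observed); observed=False means censored.
data = [(2, True), (3, True), (3, False), (5, True), (6, False), (7, True)]

def kaplan_meier(records):
    records = sorted(records)
    at_risk = len(records)
    surv, curve = 1.0, []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = sum(1 for time, obs in records if time == t and obs)
        total = sum(1 for time, obs in records if time == t)
        if deaths:
            # Survival drops only at observed failure times.
            surv *= (at_risk - deaths) / at_risk
            curve.append((t, surv))
        at_risk -= total   # censored units leave the risk set here
        i += total
    return curve

curve = kaplan_meier(data)
```

The censored records never trigger a drop in the curve, yet they still count in the risk set up to their censoring time, which is exactly what naive "drop the censored rows" analyses get wrong.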

Thank You