
Business Intelligence

(Course Code: MIS 410; Prerequisites: BUS 173, MIS 210/MIS 310)

Dr. Atikur R. Khan


Associate Professor
Department of Management
North South University
Outline of the Course

• Introduction to BI and BI Tools
• Part-I: Descriptive Analytics
• Part-II: Inferential Analytics
• Part-III: Predictive Analytics
• Part-IV: Prescriptive Analytics
• Part-V: Decision Analytics
• Lab Works: R/Python & Tableau/Power BI
Regression Model: Model Validation

• Cross-Validation
• Bootstrapping
• Multiple regression
• Logistic regression
Cross-Validation

Cross-validation is a method that reserves a portion of a data set for testing: the model is built on the remaining portion and then evaluated on the reserved portion. The steps are as follows:

• Split the data into training and test samples

• Build the model with the training sample

• Test the model on the test sample by computing a measure (for example, MSE)

• This helps us evaluate the effectiveness of the model. If the model performs well on the validation (test) sample, we can go ahead with it.
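
A minimal R sketch of these steps, assuming a hypothetical data frame dat with a response y and a predictor x (these names are not from the slides):

set.seed(123)                                  # for a reproducible split
n = nrow(dat)
test.id = sample(1:n, size = round(0.2 * n))   # reserve 20% of the rows for testing
train = dat[-test.id, ]
test  = dat[ test.id, ]

fit  = lm(y ~ x, data = train)                 # build the model on the training sample
pred = predict(fit, newdata = test)            # predict on the test sample
MSE  = mean((test$y - pred)^2)                 # test MSE as the validation measure
MSE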
Cross-Validation (CV)

Cross-validation is used to assess how the model will generalize to an independent data set.

• Finds or estimates the expected prediction error

• Helps in finding the best model

• Helps avoid overfitting
Cross-Validation Methods

• Hold out method of cross-validation

• K-fold cross-validation

• Leave-one-out cross-validation (LOOCV)

• Bootstrapping
Hold Out Cross-Validation

Split the data set into training and test (validation) data sets. Example: 80% training and 20% test data; at most 30% of the data is typically reserved for testing (validation).
Hold Out Cross-Validation

• Which of the previous two models performs better with 25% hold-out cross-validation? The model with the smallest average MSE computed over 500 replications.
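
A hedged R sketch of that comparison, assuming a hypothetical data frame dat with response y and predictors x1 and x2; the two lm formulas below are only stand-ins for the two models on the earlier slides:

MSE1 = MSE2 = numeric(500)                     # test MSEs of the two candidate models
n = nrow(dat)
for (r in 1:500) {
  test.id = sample(1:n, size = round(0.25 * n))      # 25% hold-out sample
  train = dat[-test.id, ]
  test  = dat[ test.id, ]
  fit1 = lm(y ~ x1, data = train)                    # candidate model 1 (illustrative)
  fit2 = lm(y ~ x1 + x2, data = train)               # candidate model 2 (illustrative)
  MSE1[r] = mean((test$y - predict(fit1, newdata = test))^2)
  MSE2[r] = mean((test$y - predict(fit2, newdata = test))^2)
}
mean(MSE1); mean(MSE2)                         # keep the model with the smaller average MSE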
K-fold Cross-Validation
Usually, k = 10 is used and we call it 10-fold cross-validation. Let us explain this with k = 5 as follows.
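
A minimal sketch of 5-fold cross-validation in R, assuming the same hypothetical data frame dat with response y and predictor x; kfold.mse is an illustrative helper name, not something defined in the slides:

# 5-fold cross-validation: each observation is used exactly once for testing
kfold.mse = function(dat, k = 5) {
  n = nrow(dat)
  fold = sample(rep(1:k, length.out = n))   # randomly assign each row to one of the k folds
  fold.mse = numeric(k)
  for (j in 1:k) {
    train = dat[fold != j, ]                # k-1 folds for model building
    test  = dat[fold == j, ]                # remaining fold for testing
    fit = lm(y ~ x, data = train)
    fold.mse[j] = mean((test$y - predict(fit, newdata = test))^2)
  }
  mean(fold.mse)                            # average MSE across the k folds
}
MSE = kfold.mse(dat)                        # one 5-fold CV estimate of the MSE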
K-fold Cross-Validation
Replicate the whole k-fold computation 100 times and calculate the average MSE:

MSE.rep = NULL
for (i in 1:100) {
  # recompute the k-fold cross-validation MSE for this replication
  # (e.g., MSE = kfold.mse(dat) using the sketch above)
  MSE.rep[i] = MSE
}
mean(MSE.rep)   # average MSE over the 100 replications
Bootstrapping
• (1) Fit the regression model

• (2) Calculate the fitted values and residuals, and save these values

• (3) Resample the residuals with replacement: resampled residuals = sample(residuals, sample size, replace = TRUE)

• (4) Bootstrap dependent variable = fitted values + resampled residuals

• (5) Fit the regression model with the dependent variable in (4) and save the coefficients

• (6) Repeat steps (3)–(5) 100 times and calculate the average of the coefficients. These averages are known as bootstrap estimates of the coefficients.
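
A hedged R sketch of this residual bootstrap, again assuming a hypothetical data frame dat with response y and predictor x:

fit  = lm(y ~ x, data = dat)          # (1) fit the regression model
yhat = fitted(fit)                    # (2) save the fitted values and residuals
res  = residuals(fit)
n = nrow(dat)

B = 100                               # number of bootstrap replications
coef.boot = matrix(NA, nrow = B, ncol = length(coef(fit)))
for (b in 1:B) {
  res.star = sample(res, size = n, replace = TRUE)        # (3) resample residuals with replacement
  boot.dat = data.frame(x = dat$x, y = yhat + res.star)   # (4) fitted values + resampled residuals
  coef.boot[b, ] = coef(lm(y ~ x, data = boot.dat))       # (5) refit and save the coefficients
}
colMeans(coef.boot)                   # (6) bootstrap estimates of the coefficients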