Unit 3: Modeling and Evaluation
Basic difference: descriptive analytics determines what happened in the past by analyzing stored data, while predictive analytics determines what can happen in the future with the help of past-data analysis.
Application of Predictive method
Process of Predictive Modeling
Step 1: Data collection and purification: Data is accumulated from all available sources, and cleaning operations eliminate noisy data so that estimates are accurate. Sources include transactional data, customer-service data, survey data, and economic data.
Step 2: Data transformation: The data must be transformed through appropriate processing to obtain normalized data. Values are scaled to a given range, and extraneous variables are removed through correlation analysis before the final decision is made.
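The scaling described in Step 2 can be sketched with scikit-learn's MinMaxScaler; the library choice and the sample values are assumptions for illustration, since the slides name no specific tool:

```python
from sklearn.preprocessing import MinMaxScaler

# Toy feature column (e.g., transaction amounts); values are illustrative.
data = [[200.0], [450.0], [310.0], [800.0]]

# Min-max scaling maps every value into the provided range, here [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data)
print(scaled.ravel())
```

Each value x becomes (x - min) / (max - min), so the smallest value maps to 0 and the largest to 1.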
Step 3: Formulation of the predictive model: A predictive model often employs regression techniques or a classification algorithm. During this process, test data is identified and classification decisions are applied to the test data to determine the performance of the model.
Step 4: Performance analysis or conclusion: Finally, inferences are drawn from the model; for this, cluster analysis is performed. After the model is built, this analysis is important for maintaining it.
Steps in building regression model
STEP 1: Collect/Extract Data
The first step in building a regression model is to collect or extract data on the dependent (outcome) variable and independent (feature) variables from different data sources. Data collection in many cases can be time-consuming and expensive, even when the organization has a well-designed enterprise resource planning (ERP) system.
STEP 2: Pre-Process the Data
Before the model is built, it is essential to ensure the quality of the data for issues such as
reliability, completeness, usefulness, accuracy, missing data, and outliers.
1. Data imputation techniques may be used to deal with missing data. Descriptive statistics and visualization (such as box plots and scatter plots) may be used to identify outliers and variability in the dataset.
2. Many new variables (such as ratios or products of existing variables) can be derived (aka feature engineering) and also used in model building.
3. Categorical data must be pre-processed using dummy variables (part of feature engineering) before it is used in the regression model.
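Items 1 and 3 above can be sketched as follows; SimpleImputer and pandas' get_dummies are assumed library choices (the slides name no tools), and the dataset is illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy dataset with a missing value and a categorical column.
df = pd.DataFrame({
    "income": [200.0, 450.0, np.nan, 800.0],
    "segment": ["retail", "corporate", "retail", "sme"],
})

# Item 1: mean imputation replaces the missing income with the column mean.
imputer = SimpleImputer(strategy="mean")
df["income"] = imputer.fit_transform(df[["income"]]).ravel()

# Item 3: dummy variables, one 0/1 column per category; drop_first avoids
# perfect collinearity (the "dummy-variable trap").
dummies = pd.get_dummies(df["segment"], prefix="segment", drop_first=True)
df = pd.concat([df.drop(columns="segment"), dummies], axis=1)
print(df)
```
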
STEP 5: Build the Model
The model is built using the training dataset to estimate the regression parameters. The method of Ordinary Least Squares (OLS) is used to estimate the regression parameters.
STEP 6: Perform Model Diagnostics
Regression is often misused because the modeler fails to perform the necessary diagnostic tests before applying the model. Before it can be applied, the model must be validated for all model assumptions, including the definition of the functional form. If the model assumptions are violated, the modeler must apply remedial measures.
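As a sketch of the OLS estimation in STEP 5, the parameters can be computed with NumPy's least-squares solver; the data here is the pizza training set used later in these slides:

```python
import numpy as np

# Training data (diameter -> price), from the pizza example later in the deck.
x = np.array([6, 8, 10, 14, 18], dtype=float)
y = np.array([7, 9, 13, 17.5, 18])

# OLS closed form: stack a column of ones for the intercept, then solve
# the least-squares problem min ||X beta - y||^2.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(round(intercept, 4), round(slope, 4))  # 1.9655 0.9763
```

scikit-learn's LinearRegression performs the same estimation internally.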
Linear regression model
Linear regression is a quite simple statistical regression method used for predictive analysis; it models the relationship between continuous variables. Specifically, it shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
If there is a single input variable (x), it is called simple linear regression; if there is more than one input variable, it is called multiple linear regression.
The linear regression model gives a sloped straight line describing the relationship between the variables.
Cost function
A cost function, also called a loss function, is used to define and measure the error of a model. The
differences between the prices predicted by the model and the observed prices of the pizzas in the
training set are called residuals or training errors.
The cost function optimizes the regression coefficients (weights) and measures how well a linear regression model is performing. It is used to find the accuracy of the mapping function that maps the input variable to the output variable; this mapping function is also known as the hypothesis function.
In linear regression, the Mean Squared Error (MSE) cost function is used, which is the average of the squared errors between the predicted values and the actual values.
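A minimal sketch of the MSE computation; the predicted values below are illustrative, not produced by any model in the slides:

```python
import numpy as np

# Observed values and (illustrative) predicted values.
y_true = np.array([7.0, 9.0, 13.0, 17.5, 18.0])
y_pred = np.array([7.8, 9.8, 11.7, 15.6, 19.5])

# MSE: the mean of the squared differences between predictions and observations.
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 1.766
```
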
EXAMPLE:
Let's assume that you have recorded the diameters and prices of pizzas that
you have previously eaten in your pizza journal. These observations comprise
our training data
from sklearn.linear_model import LinearRegression
# Training data
X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]
# Create and fit the model
model = LinearRegression()
model.fit(X, y)
print('A 12" pizza should cost: $%.2f' % model.predict([[12]])[0][0])
A 12" pizza should cost: $13.68
EVALUATING THE FITNESS OF MODEL
We can produce the best pizza-price predictor by minimizing the sum of the squared residuals. That is, our model fits if the values it predicts for the response variable are close to the observed values for all of the training examples. This measure of the model's fitness is called the residual sum of squares (RSS) cost function. Formally, this function assesses the fitness of a model by summing the squared residuals for all of our training examples. The residual sum of squares is calculated with the following equation:
RSS = Σᵢ (yᵢ − f(xᵢ))²
where yᵢ is the observed value and f(xᵢ) is the predicted value.
EVALUATING THE MODEL
how well the observed values of the response variables are predicted by the
model. More concretely, r-squared is the proportion of the variance in the
response variable that is explained by the model. An r-squared score of one
indicates that the response variable can be predicted without any error using
the model.
CALCULATION
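The calculation can be sketched by computing r-squared by hand on the test data from the Python implementation slide; np.polyfit is an assumed way to fit the line:

```python
import numpy as np

# Fit the line on the pizza training data from the earlier example.
x = np.array([6, 8, 10, 14, 18], dtype=float)
y = np.array([7, 9, 13, 17.5, 18])
slope, intercept = np.polyfit(x, y, 1)

# Test data from the Python implementation slide.
x_test = np.array([8, 9, 11, 16, 12], dtype=float)
y_test = np.array([11, 8.5, 15, 18, 11])

# R^2 = 1 - SS_res / SS_tot
y_pred = intercept + slope * x_test
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # ≈ 0.662
```

This matches what LinearRegression's score method on the test data returns.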
PYTHON IMPLEMENTATION
from sklearn.linear_model import LinearRegression
# Training data
X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]
# Test data
X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]
model = LinearRegression()
model.fit(X, y)
print('R-squared: %.4f' % model.score(X_test, y_test))
Prof. Monali Suthar (SOCET-CE)