
21CS3052

DATA WAREHOUSING & MINING


CO-2
Session-12
Simple Regression Analysis

AIM OF THE SESSION:

• To understand simple regression analysis.

LEARNING OUTCOMES

1. Define different types of simple regression.
2. Describe fitting a straight line using the least squares method.
3. Summarize the concept of linear regression.

What is Regression Analysis?


Regression analysis is a set of statistical methods used to estimate relationships between a dependent variable and one or more independent variables. It can be used to assess the strength of the relationship between variables and to model the future relationship between them.

Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear regression. Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.

Regression analysis offers numerous applications in various disciplines, including finance.
Regression Analysis – Linear Model Assumptions

Linear regression analysis is based on six fundamental assumptions:

1. The dependent and independent variables show a linear relationship.
2. The independent variable is not random.
3. The mean of the residual (error) is zero.
4. The variance of the residual (error) is constant across all observations (homoscedasticity).
5. The residual (error) values are not correlated across observations.
6. The residual (error) values follow a normal distribution.
Regression Analysis – Simple Linear Regression

Simple linear regression is a model that assesses the relationship between a dependent variable and an independent variable. The simple linear model is expressed using the following equation:
Y = a + bX + ϵ

Where:

• Y – Dependent variable
• X – Independent (explanatory) variable
• a – Intercept
• b – Slope
• ϵ – Residual (error)
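As a sketch of the least squares line fit named in the learning outcomes, the slope b and intercept a can be computed directly from the data. The variables and values below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical data: hours studied (X) and exam score (Y)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Least squares estimates for the model Y = a + bX + e:
# b = sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

# Residuals of the fitted line; their sum is zero by construction
residuals = Y - (a + b * X)
```

The least squares line always passes through the point (mean(X), mean(Y)), which is why the intercept follows immediately once the slope is known.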
Regression Analysis – Multiple Linear Regression

Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model. The mathematical representation of multiple linear regression is:

Y = a + bX1 + cX2 + dX3 + ϵ

Where:

• Y – Dependent variable
• X1, X2, X3 – Independent (explanatory) variables
• a – Intercept
• b, c, d – Slopes
• ϵ – Residual (error)
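A minimal sketch of fitting this model with NumPy's least squares solver, assuming hypothetical data for X1 and X2 (X3 is omitted for brevity):

```python
import numpy as np

# Hypothetical explanatory variables and a noiseless response
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = 1.0 + 2.0 * X1 + 0.5 * X2   # a = 1.0, b = 2.0, c = 0.5

# Design matrix with a leading column of ones for the intercept a
A = np.column_stack([np.ones_like(X1), X1, X2])

# Solve the least squares problem min ||A @ coef - Y||^2
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a, b, c = coef
```

Because the response here is constructed without noise, the solver recovers the coefficients exactly; with real data the estimates would carry sampling error.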

Multiple linear regression follows the same conditions as the simple linear model. However, since there are several independent variables in multiple linear analysis, there is another mandatory condition for the model:

• Non-collinearity: Independent variables should show minimal correlation with each other. If the independent variables are highly correlated with each other, it will be difficult to assess the true relationships between the dependent and independent variables.
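One simple way to screen for collinearity is to inspect pairwise correlations between the independent variables. The data below are synthetic, with X2 deliberately constructed to be nearly collinear with X1 while X3 is unrelated:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(size=200)
X2 = 2.0 * X1 + rng.normal(scale=0.01, size=200)  # almost a linear copy of X1
X3 = rng.normal(size=200)                          # independent of X1

# Pairwise Pearson correlations
r12 = np.corrcoef(X1, X2)[0, 1]   # near 1: collinearity problem
r13 = np.corrcoef(X1, X3)[0, 1]   # near 0: no collinearity concern
```

A correlation close to ±1, as between X1 and X2 here, signals that the model cannot reliably separate the two variables' effects.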
Regression Analysis in Finance

Regression analysis comes with several applications in finance. For example, the statistical method is fundamental to the Capital Asset Pricing Model (CAPM). Essentially, the CAPM equation is a model that determines the relationship between the expected return of an asset and the market risk premium.
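As an illustrative sketch, the CAPM beta of an asset can be estimated as the slope of a simple linear regression of the asset's returns on the market's returns. The return series below are hypothetical, constructed with a known beta of 1.5:

```python
import numpy as np

# Hypothetical excess returns for the market and for one asset
market = np.array([0.01, -0.02, 0.03, 0.015, -0.005, 0.02])
asset = 0.001 + 1.5 * market   # alpha = 0.001, beta = 1.5, no noise

# The regression slope equals cov(asset, market) / var(market)
beta = np.cov(asset, market, ddof=1)[0, 1] / np.var(market, ddof=1)
alpha = asset.mean() - beta * market.mean()
```

This slope/covariance identity is exactly the simple linear regression estimate from the earlier section, applied to return data.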

The analysis is also used to forecast the returns of securities based on different factors, or to forecast the performance of a business.

Ridge Regression

Ridge regression is a technique used to analyze regression data that suffer from multicollinearity. Multicollinearity is the existence of a linear correlation between two independent variables.

Ridge regression is appropriate when the least squares estimates, although unbiased, have high variance, so they may be far from the true values. By adding a degree of bias to the estimated regression coefficients, ridge regression reduces these errors.
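A minimal sketch of the ridge estimator in closed form, b = (XᵀX + kI)⁻¹Xᵀy, on synthetic, nearly collinear data. The penalty value k = 1.0 is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)   # almost a copy of x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.1, size=100)

def ridge(X, y, k):
    """Ridge estimate (X'X + k*I)^-1 X'y; k = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)     # unbiased but unstable under collinearity
b_ridge = ridge(X, y, 1.0)   # slightly biased, much lower variance
```

Because x1 and x2 are nearly identical, the least squares coefficients can swing to large opposite values; the penalty pulls the estimate back towards a stable split between the two variables.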

Lasso Regression

The term LASSO stands for Least Absolute Shrinkage and Selection Operator. Lasso regression is a linear type of regression that utilizes shrinkage: the coefficient estimates are shrunk towards a central point, zero. The lasso procedure is best suited to simple, sparse models with fewer parameters than other regressions, and it works well for models that suffer from multicollinearity.
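A compact sketch of lasso's shrinkage via coordinate descent with soft-thresholding, on synthetic standardized data. The penalty lam = 0.5 and all data below are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: shrinks z towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso, assuming standardized columns of X."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]        # partial residual for column j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return b

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize predictors
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.1, size=100)
y = y - y.mean()                                   # center response (no intercept)

b = lasso_cd(X, y, lam=0.5)
```

The soft-threshold step is what shrinks every coefficient towards zero and sets weak ones exactly to zero, which is how lasso produces sparse models.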

Application of Regression

Regression is a very popular technique with wide applications in business and industry. The regression procedure involves a predictor variable and a response variable. The major applications of regression are given below.

• Environmental modeling
• Analyzing business and marketing behavior
• Financial prediction and forecasting
• Analyzing new trends and patterns

Difference between Regression and Classification in data mining

Regression and classification are quite similar to each other. Both are significant prediction problems used in data mining. Given a training set of inputs and outputs, you learn a function that relates the two, which hopefully enables you to predict outputs for new inputs. The main difference is that in classification the outputs are discrete, whereas in regression they are not. But the concepts can blur, as in "logistic regression", which can be interpreted as either a classification or a regression method. So it can be difficult for the user to understand when to use classification and when to use regression.


Regression | Classification
Regression refers to a type of supervised machine learning technique that is used to predict any continuous-valued attribute. | Classification refers to a process of assigning predefined class labels to instances based on their attributes.
In regression, the nature of the predicted data is ordered. | In classification, the nature of the predicted data is unordered.
Regression can be further divided into linear regression and non-linear regression. | Classification is divided into two categories: binary classifier and multi-class classifier.
In the regression process, the calculations are basically done using the root mean square error. | In the classification process, the calculations are basically done by measuring classification accuracy.
Examples of regression are regression tree, linear regression, etc. | Examples of classification are decision tree, etc.

TERMINAL QUESTIONS
1. Describe simple regression analysis.
2. List the steps in simple regression.
3. Explain how to predict the value of one variable when the value of another variable is known, with an example.
4. Summarize how pattern models relate to general regression models.
REFERENCE BOOKS:
1. Han J & Kamber M, “Data Mining: Concepts and
Techniques”, Third Edition, Elsevier, 2011. 
Sites and Web links:
1. https://www.geeksforgeeks.org/data-mining/
2. https://www.javatpoint.com/data-mining
3. https://www.springboard.com/blog/data-science/data-
mining/
