
Introduction to linear regression
What is regression?

• Regression is a statistical method, used in finance,
investing, and other disciplines, that attempts to
determine the strength and character of the
relationship between one dependent variable
(usually denoted by Y) and a series of other
variables (known as independent variables, such
as X1, X2, …, Xn).
• Regression is a statistical procedure that
determines the equation for the straight line that
best fits a specific set of data.
Regression Analysis

• How well a set of data points fits a straight
line can be measured by calculating the
distance between the data points and the line.
• The total error between the data points and
the line is obtained by squaring each distance
and then summing the squared values.
• The regression equation is designed to
produce the minimum sum of squared errors.
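
The sum of squared errors described above can be sketched in a few lines of Python. The data points and the two candidate lines are made up for illustration and do not come from the slides; the function name `sum_squared_errors` is likewise an assumption, and "distance" is taken to mean the vertical distance between a point and the line.

```python
# Hypothetical data points (x, y), invented for illustration only.
points = [(1, 2), (2, 3), (3, 5)]


def sum_squared_errors(points, a, b):
    """Sum of squared vertical distances between the points and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)


# Two candidate lines: the one with the smaller total fits these points better.
sse_line_1 = sum_squared_errors(points, a=1.0, b=1.0)
sse_line_2 = sum_squared_errors(points, a=1 / 3, b=1.5)
print(sse_line_1, sse_line_2)
```

The second line gives the smaller total for these points, so by this measure it is the better fit of the two.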
Regression Analysis

• Regression analysis is based on the relationship,
or association, between two (or more)
variables.
• The variable that is being predicted is called
the dependent variable.
• The variable (or variables) used to predict
the value of the dependent variable
are called the independent variables.
Case Discussion
• Mr. Aaditya is the country head of Skyline Paints
Limited. At the end of 2022, he asked his sales
head to present the monthly sales (Rs. millions)
of the east and west regions. The region-wise
sales data are given below:

Months    1    2    3    4    5    6    7    8    9   10   11   12
West    164  166  158  162  174  187  182  184  191  195  190  192
East    120  114  122  118  114  112  124  126  123  135  138  140
Cont…
• Looking at the sales data, Mr. Aaditya asked his sales
head about the low sales in the east. The sales head
proposed running advertisements in the eastern zone.
Mr. Aaditya then asked whether sales in the west are
high because of advertising. The advertising expenses
(Rs. millions) of the western zone are given below:

Months    1    2    3    4    5    6    7    8    9   10   11   12
West     10   12   14   12   15   15   14   16   12   18   20   18

• From the above discussion, find whether
advertisements have any effect on sales.
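
One way to approach the case question is to fit a simple regression of West-region sales on West-region advertising expenses. The sketch below assumes advertising is the independent variable and sales the dependent variable; the variable names are illustrative. A positive slope would indicate that higher advertising goes with higher sales in these data, which is an association rather than proof of cause.

```python
# Data copied from the case: West-region advertising expenses and sales (Rs. millions).
adv = [10, 12, 14, 12, 15, 15, 14, 16, 12, 18, 20, 18]
sales = [164, 166, 158, 162, 174, 187, 182, 184, 191, 195, 190, 192]

n = len(adv)
mean_x = sum(adv) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products about the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(adv, sales))
s_xx = sum((x - mean_x) ** 2 for x in adv)
s_yy = sum((y - mean_y) ** 2 for y in sales)

slope = s_xy / s_xx                    # change in sales per unit of advertising
intercept = mean_y - slope * mean_x    # estimated sales at zero advertising
r_squared = s_xy ** 2 / (s_xx * s_yy)  # share of sales variation explained

print(f"sales = {intercept:.2f} + {slope:.2f} * advertising, r^2 = {r_squared:.2f}")
```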
Regression Line
• Regression line between the scores and grade point.
Types of Relationships
Regression Equation
• In regression analysis, we develop an estimating
equation, that is, a mathematical formula that relates
the known variables to the unknown variable:

$Y = a + bX$

• a is called the Y-intercept because its value is the
point at which the regression line crosses the Y-axis.
• b, the slope, represents how much each unit change of
the independent variable X changes the dependent
variable Y.
Application of Regression
• Prediction of a target variable, for example
stock price, sales volume, profitability, or
scores.
• Modeling the relationship between the
dependent and independent variables, for
example the relationship between
employee productivity and profitability.
• Understanding how different variables
affect these outcomes.
• Testing of hypotheses.
Fitting a linear model to data

• To a statistician, the line has a “good fit” if it
minimizes the error between the estimated points
on the line and the actual observed points
that were used to draw it.
• From now on, we use $\hat{Y}$ (Y hat) to symbolize
the individual values of the estimated points,
that is, the points that lie on the estimating
line.
Least Square Method
• A standard way to minimize the errors is to apply the
least-squares method.
• The least-squares method finds the estimating line
that minimizes the sum of the squares of the errors.
• It is a form of mathematical regression analysis used
to determine the line of best fit for a set of data.
• Each data point represents the relationship between
a known independent variable and the dependent
variable to be explained.
Least Square Regression
Equation
• Statisticians have developed two equations to
identify the slope (b) and the Y-intercept (a) of the
best-fitting regression line. The first formula
calculates the slope:

$b = \dfrac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$

• The second formula calculates the Y-intercept:

$a = \bar{Y} - b\bar{X}$
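
As a minimal sketch, the slope and Y-intercept can be computed with the usual textbook least-squares formulas, b = (ΣXY − n·X̄·Ȳ) / (ΣX² − n·X̄²) and a = Ȳ − b·X̄, which match the arithmetic in the worked solution in these slides. The function name `least_squares` and the sample data are assumptions made for illustration; the data are chosen to lie exactly on Y = 2 + 3X so the result is easy to check.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)  # slope
    a = mean_y - b * mean_x                                          # Y-intercept
    return a, b


# Hypothetical data lying exactly on Y = 2 + 3X.
xs = [0, 1, 2, 3]
ys = [2, 5, 8, 11]
a, b = least_squares(xs, ys)
print(a, b)
```

The function divides by zero when all X values are identical, since no unique line exists in that case; a production version would need to check for it.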
Illustration
• Suppose a municipality wants to know the
relationship between the age of a truck and its
repair expense, based on the following collected
data.

Truck number   Age of truck in years (X)   Repair expenses during last year, Rs. hundreds (Y)
101            5                           7
102            3                           7
103            3                           6
104            1                           4
Solution
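• Solution: First identify n = 4, $\bar{X} = 3$, $\bar{Y} = 6$,
$\sum XY = 78$ and $\sum X^2 = 44$, where X is the age of the
truck and Y is the repair expense. The slope is

$b = \dfrac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \dfrac{78 - (4)(3)(6)}{44 - (4)(3)^2} = \dfrac{78 - 72}{44 - 36} = \dfrac{6}{8} = 0.75$

and the Y-intercept is

$a = \bar{Y} - b\bar{X} = 6 - (0.75)(3) = 3.75$

Thus the equation of the estimating line will be

$\hat{Y} = 3.75 + 0.75X$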
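
The intermediate sums and the final line for the truck data can be reproduced with a short script. This is a sketch that follows the hand calculation step by step; the variable names and the 4-year-old truck used for the prediction are assumptions for illustration.

```python
# Truck data from the illustration: age in years (X), repair expense in Rs. hundreds (Y).
ages = [5, 3, 3, 1]
expenses = [7, 7, 6, 4]

n = len(ages)
mean_x = sum(ages) / n                                # 3.0
mean_y = sum(expenses) / n                            # 6.0
sum_xy = sum(x * y for x, y in zip(ages, expenses))   # 35 + 21 + 18 + 4 = 78
sum_x2 = sum(x * x for x in ages)                     # 25 + 9 + 9 + 1 = 44

b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)  # 6 / 8 = 0.75
a = mean_y - b * mean_x                                          # 6 - 2.25 = 3.75

# Estimated repair expense for a hypothetical 4-year-old truck.
predicted = a + b * 4
print(a, b, predicted)
```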
Standard error of estimates
• To measure the reliability of the estimating equation,
statisticians have developed the standard error of estimate.
• This standard error is symbolized $s_e$ and is similar to the
standard deviation.
• The standard deviation is used to measure the dispersion of a
set of observations about the mean.
• The standard error of estimate, on the other hand, measures
the variability, or scatter, of the observed values around the
regression line.

• $s_e = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}}$
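
As a sketch, the standard error of estimate, the square root of Σ(Y − Ŷ)² / (n − 2), can be computed for the truck illustration, using the estimating line Ŷ = 3.75 + 0.75X from the worked solution. The variable names are illustrative.

```python
import math

# Truck data from the illustration and the fitted line from the worked solution.
ages = [5, 3, 3, 1]
expenses = [7, 7, 6, 4]
a, b = 3.75, 0.75

# Residuals: observed Y minus estimated Y-hat for each truck.
residuals = [y - (a + b * x) for x, y in zip(ages, expenses)]
sse = sum(e ** 2 for e in residuals)        # sum of squared errors
s_e = math.sqrt(sse / (len(ages) - 2))      # n - 2 in the denominator

print(round(s_e, 3))
```

With only four observations the denominator n − 2 is just 2, so this estimate is very rough; it is shown only to make the formula concrete.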
Standard error of estimates

• The larger the standard error of estimate, the
greater the scattering (or dispersion) of points
around the regression line.
• Conversely, if $s_e = 0$, we expect the estimating
equation to be a “perfect” estimator of the
dependent variable.
Coefficient of Determination
(r square)
• The coefficient of determination measures the
proportion of variation in Y that is explained by
the variation in the independent variable X in the
regression model. The range of r square is from 0
to 1 and the greater the value, the more the
variation in Y in the regression model can be
explained by the variation in X.
• The coefficient of determination is equal to the
regression sum of squares (i.e., explained
variation) divided by the total sum of squares
(i.e., total variation).
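
The ratio described above can be sketched for the truck illustration, again using the estimating line Ŷ = 3.75 + 0.75X from the worked solution. The names `ssr` (regression sum of squares) and `sst` (total sum of squares) are illustrative.

```python
# Truck data from the illustration and the fitted line from the worked solution.
ages = [5, 3, 3, 1]
expenses = [7, 7, 6, 4]
a, b = 3.75, 0.75

mean_y = sum(expenses) / len(expenses)

# Total variation: squared deviations of observed Y from its mean.
sst = sum((y - mean_y) ** 2 for y in expenses)
# Explained variation: squared deviations of estimated Y-hat from the mean of Y.
ssr = sum(((a + b * x) - mean_y) ** 2 for x in ages)

r_squared = ssr / sst
print(r_squared)
```

Here three quarters of the variation in repair expense is accounted for by the age of the truck, though with four data points the figure should be read cautiously.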
Benefits of Regression
• Operational efficiency: Companies use regression analysis to
optimize business processes.
• Supporting decisions: Many companies and their top managers
use regression analysis (and other kinds of data analytics) to
make informed business decisions and reduce reliance on
guesswork and gut intuition.
• Correcting errors: Even the most informed and careful managers
make mistakes in judgment. Regression analysis helps managers,
and businesses in general, recognize and correct errors.
• New insights: Looking at the data can provide fresh insights.
Many businesses gather a lot of data about their customers, but
that data is of limited use without proper analysis. Regression
analysis can help find the relationships between different
variables and uncover patterns.
Correlation Vs. Regression

Correlation
• Signifies the degree of relationship between two variables.
• It is limited to two variables only.
• The variables are interchangeable; the measure is symmetrical.
• Is not, by itself, used for prediction.

Regression
• Indicates how the dependent variable responds to the independent
variables. The relationship is often read as causal, but the analysis
alone does not prove causation.
• It can involve more than two variables.
• The independent and dependent variables are not interchangeable.
• Used for prediction.
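
The point about interchangeable variables can be illustrated with the truck data from the earlier illustration. In this sketch, the function names are assumptions; swapping X and Y leaves the correlation coefficient unchanged but gives a different regression slope.

```python
import math

# Truck data from the illustration: age in years and repair expense (Rs. hundreds).
ages = [5, 3, 3, 1]
expenses = [7, 7, 6, 4]


def centered_sums(xs, ys):
    """Return (S_xy, S_xx, S_yy): sums of products and squares about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy, s_xx, s_yy


def correlation(xs, ys):
    """Correlation coefficient between xs and ys."""
    s_xy, s_xx, s_yy = centered_sums(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def slope(xs, ys):
    """Slope of the least-squares line that predicts ys from xs."""
    s_xy, s_xx, _ = centered_sums(xs, ys)
    return s_xy / s_xx


r_xy = correlation(ages, expenses)
r_yx = correlation(expenses, ages)    # same value: correlation is symmetric
slope_y_on_x = slope(ages, expenses)  # expense predicted from age
slope_x_on_y = slope(expenses, ages)  # age predicted from expense: a different line
print(r_xy, r_yx, slope_y_on_x, slope_x_on_y)
```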
