
Equation of a Line
with Python Programming

Module 2 - Part 2

CS 132 - Mathematics for Computer Science

Learning Objectives
1. Derive the equation of a line that fits a given set of points, based on the concept of regression (voluntary homework).
2. Create a program in the Python programming language that aids in deriving the equation of a line.
3. Use colab.research.google.com to run a Python program.
Introduction
• Regression Analysis is an approach for
deriving the equation of a line that may be
used for forecasting.
• Given: a set of points
• Goal: Derive a line that fits the given set of
points.
Fitting a line to a given set of points
(A model for inference based on collected data)

• Curve Fitting
– Deriving a Regression Line
– Deriving a Nonlinear Regression Curve (e.g., a Quadratic Curve)
Regression Analysis
• Determine the values of parameters for a
function that cause the function to best fit
a set of data observations
• The regression line has the form y = ax + b. Hence, determining the values of the parameters a and b determines the line that fits a set of points.
Linear Regression

y = ax + b
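(equivalent forms with the parameters renamed: y = a + bx or y = p0 + p1x; in these forms the first parameter is the intercept and the second is the slope)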
General Idea behind the Regression Line

• The data gathered are represented as points in the Cartesian plane: {(x1,y1), (x2,y2), …, (xn,yn)}

• Choose a and b to minimize the sum of the squares of the vertical deviations of the actual points from the corresponding points on the regression line, that is, the sum of (yi - (a*xi + b))^2 for i = 1, …, n. Hence, the approach is also called the Least Squares Method.
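A minimal Python sketch of the least-squares criterion follows. It measures vertical deviations, as ordinary least squares does. The function name `sum_of_squared_deviations` and the three sample points are hypothetical, chosen only for illustration.

```python
def sum_of_squared_deviations(points, a, b):
    """Return the sum of (y_i - (a*x_i + b))**2 over all (x_i, y_i) in points."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)

# Hypothetical toy data, used only to illustrate the criterion.
points = [(1, 2), (2, 4), (3, 5)]

# Compare two candidate lines: the smaller sum means a better fit
# in the least-squares sense.
print(sum_of_squared_deviations(points, 2.0, 0.0))     # deviations 0, 0, -1 -> 1.0
print(sum_of_squared_deviations(points, 1.5, 2 / 3))   # a smaller value than 1.0
```

For these three points the least-squares line works out to y = 1.5x + 2/3, so no other choice of a and b gives a smaller sum.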
Formulas for Linear Regression


• Using principles of calculus, one can derive a system of equations that yields the values of a and b for the regression line y = ax + b.

• For the meantime, the formulas are taken from a reference.
Given: {(x1,y1), (x2,y2), …, (xn,yn)}
Formulas for Linear Regression


• Using principles of calculus, one can derive a system of equations whose solution gives the values of a and b for the regression line.

Given: {(x1,y1), (x2,y2), …, (xn,yn)}

The values of a and b for y = a + bx (a is the intercept, b the slope) can be obtained from the following system of two equations in two unknowns.

» For the meantime, we forgo showing the derivation of the formulas.
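For reference, the standard least-squares normal equations for y = a + bx, which are presumably the system meant above, are shown below together with their solution (all sums run over i = 1, …, n).

```latex
\begin{aligned}
n\,a + b\sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} y_i \\
a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^{2} &= \sum_{i=1}^{n} x_i y_i
\end{aligned}
\qquad\Longrightarrow\qquad
b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^{2} - \left(\sum x_i\right)^{2}},
\qquad
a = \frac{\sum y_i - b\sum x_i}{n}
```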
Required Exercise: Exercise 2B
(See next slide)
Exercise

Linear Regression
(A sample context)
For a particular stretch of a highway, it is believed that there is a correlation between the vehicle density (number of vehicles per 100 m) on the highway and the number of accidents that occur. From casual observation, the number of accidents has been found to increase with vehicle density up to a certain point. However, once the vehicle density exceeds a certain value, the average vehicle speed drops because of congestion, which reduces the number of accidents. To predict accident rates, and as an aid in producing an improved highway design, we wish to develop equations relating vehicle density to the number of accidents from observed data.
Assume:
Accidents on a highway depend on vehicle density.
Data for Regression Problem
Observation Data

You may use spreadsheet software as an aid.
Please find the sample (partial) worksheets uploaded for this purpose.

Vehicle Density (x)      Number of Accidents (y)
(vehicles per 100 m)
 1.4                      3
 2.0                      6
 2.3                      4
 4.5                      7
 6.2                     10
 6.7                     15
 7.0                     11
 8.5                     18
 9.0                     13
12.7                     17
13.1                     15
17.7                     16
18.5                     11
20.3                      5
Data for Regression Problem
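• The first 8 data points may be represented by one regression line.
• The last 7 data points may be represented by another regression line.
(The 8th point, (8.5, 18), where the number of accidents peaks, belongs to both groups, which is why 8 + 7 = 15 exceeds the 14 observations.)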
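The split can be checked with a short Python sketch. Using the maximum number of accidents to locate the peak, and the list names, are choices made here for illustration, not taken from the slides.

```python
# Observed data: vehicle density x (vehicles per 100 m), number of accidents y.
x = [1.4, 2.0, 2.3, 4.5, 6.2, 6.7, 7.0, 8.5, 9.0, 12.7, 13.1, 17.7, 18.5, 20.3]
y = [3, 6, 4, 7, 10, 15, 11, 18, 13, 17, 15, 16, 11, 5]

# Locate the peak number of accidents; both segments share this point.
peak = y.index(max(y))           # index 7, i.e. the 8th observation (8.5, 18)

rising_x, rising_y = x[:peak + 1], y[:peak + 1]    # first 8 observations
falling_x, falling_y = x[peak:], y[peak:]          # last 7 observations

print(len(rising_x), len(falling_x))   # 8 7
```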
Linear Regression
(Sample case described in a previous slide )

Analysis of the situation calls for an approach in which two straight lines that best fit the data are derived: one that rises until it reaches the vehicle density at the peak number of accidents (first 8 observations), and another that decreases from that point (last 7 observations). Determine the lines.
Partial Worksheet
(Part of the solution of Exercise 2B)

APPLY THE FORMULAS FOR GETTING THE REGRESSION LINE. See Module2B1_p
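A spreadsheet worksheet for these formulas usually tabulates x, y, x² and xy and their column totals. The sketch below assumes that layout, which may differ from the uploaded worksheet, and applies it to the first 8 observations (the rising segment). `worksheet_totals` is a hypothetical helper name.

```python
# First 8 observations (rising segment).
x = [1.4, 2.0, 2.3, 4.5, 6.2, 6.7, 7.0, 8.5]
y = [3, 6, 4, 7, 10, 15, 11, 18]

def worksheet_totals(xs, ys):
    """Return n, sum of x, sum of y, sum of x^2 and sum of xy, like column totals."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(xi * xi for xi in xs)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    return n, sum_x, sum_y, sum_x2, sum_xy

n, sum_x, sum_y, sum_x2, sum_xy = worksheet_totals(x, y)

# Slope b and intercept a of y = a + b*x from the least-squares formulas.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n
print(f"rising line: y = {a:.4f} + {b:.4f}x")
```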
Partial Worksheet
(Part of the solution of Exercise 2B)

APPLY THE FORMULAS FOR GETTING THE REGRESSION LINE. See Module2B1_present
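The same sums handle the falling segment (the last 7 observations). As a hedged sketch, this version writes the least-squares equations for y = a + bx as a system of two equations in two unknowns and solves it with Cramer's rule; the variable names are illustrative.

```python
# Falling segment: the last 7 observations, starting at the peak (8.5, 18).
x = [8.5, 9.0, 12.7, 13.1, 17.7, 18.5, 20.3]
y = [18, 13, 17, 15, 16, 11, 5]

# Column totals of the worksheet.
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations for y = a + b*x:
#   n*a     + sum_x*b  = sum_y
#   sum_x*a + sum_x2*b = sum_xy
# Solved with Cramer's rule: each unknown is a ratio of determinants.
det = n * sum_x2 - sum_x * sum_x
a = (sum_y * sum_x2 - sum_xy * sum_x) / det
b = (n * sum_xy - sum_x * sum_y) / det
print(f"falling line: y = {a:.4f} + {b:.4f}x")
```

The fitted line passes through the point (mean of x, mean of y), which is a quick sanity check for any least-squares line with an intercept.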

Cramer’s Rule for Solving a System of Two Equations in Two Unknowns

Equation 1 : (a)x + (b)y = h


Equation 2 : (c)x + (d)y = k

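Cramer’s rule expresses each unknown as a ratio of determinants. The common denominator a*d - c*b is the determinant of the coefficient matrix and must be nonzero for the system to have a unique solution.

x = (h*d - k*b)/(a*d - c*b)

y = (a*k - c*h)/(a*d - c*b)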
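The two formulas translate directly into a small Python function. The name `cramer_2x2` is hypothetical; the parameter letters follow the slide, and the function raises an error when the determinant a*d - c*b is zero, since the system then has no unique solution.

```python
def cramer_2x2(a, b, h, c, d, k):
    """Solve  a*x + b*y = h  and  c*x + d*y = k  by Cramer's rule.

    Letters follow the slide. Raises ValueError when the determinant
    a*d - c*b is zero (no unique solution).
    """
    det = a * d - c * b
    if det == 0:
        raise ValueError("determinant is zero: no unique solution")
    x = (h * d - k * b) / det
    y = (a * k - c * h) / det
    return x, y

# Usage: x + y = 3 and x - y = 1 give x = 2, y = 1.
print(cramer_2x2(1, 1, 3, 1, -1, 1))   # (2.0, 1.0)
```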

Cramer’s Rule: A Worked Example

Equation 1 : (2)x + (3)y = 12


Equation 2 : (1)x + (-5)y = -7

a=2, b=3, h=12, c=1, d=-5, k=-7

x = (h*d - k*b)/(a*d - c*b)

y = (a*k - c*h)/(a*d - c*b)

x = (12*(-5) - (-7)*3)/(2*(-5) - 1*3) = (-60 + 21)/(-10 - 3) = -39/-13 = 3

y = (2*(-7) - 1*12)/(2*(-5) - 1*3) = (-14 - 12)/(-10 - 3) = -26/-13 = 2
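The arithmetic of this example can be reproduced in a few lines of Python. Substituting the results back into both equations confirms the solution.

```python
# Coefficients of the worked example, named as on the slide:
#   2x + 3y = 12   and   1x - 5y = -7
a, b, h = 2, 3, 12
c, d, k = 1, -5, -7

det = a * d - c * b          # 2*(-5) - 1*3 = -13
x = (h * d - k * b) / det    # (-60 + 21) / -13 = 3.0
y = (a * k - c * h) / det    # (-14 - 12) / -13 = 2.0

# Substitute back to confirm both equations hold.
assert a * x + b * y == h
assert c * x + d * y == k
print(x, y)   # 3.0 2.0
```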

Needed Theoretical Framework
» To solve for 2 unknowns, you need a system of 2 equations in 2 unknowns.
» Apply the method of elimination:
   » by substitution
   » by addition/subtraction
» Or use Cramer’s rule.
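For comparison, here is a sketch of the method of elimination: addition/subtraction removes x, then substitution recovers it. It assumes a is nonzero and the system has a unique solution; `solve_by_elimination` is a hypothetical name. On the Cramer's rule example it gives the same answer, x = 3 and y = 2.

```python
def solve_by_elimination(a, b, h, c, d, k):
    """Solve a*x + b*y = h, c*x + d*y = k by elimination, then substitution.

    Assumes a != 0 and a unique solution (a*d - c*b != 0).
    """
    # Addition/subtraction: c*(equation 1) - a*(equation 2) removes x.
    coeff_y = c * b - a * d
    rhs = c * h - a * k
    y = rhs / coeff_y
    # Substitution: put y back into equation 1 and solve for x.
    x = (h - b * y) / a
    return x, y

print(solve_by_elimination(2, 3, 12, 1, -5, -7))   # (3.0, 2.0)
```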

Linear Regression Problem
(Another sample Context)

Population Forecasting Model

Year    Population (millions)
1950    20.2
1955    22.9
1960    25.29
1965    30.55
1970    34.7
1975    40.3
1980    47.1
1985    52.3
1990    59.2
1995    68.2
2000    78.78
2005    85.5
2010    91.35
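A sketch of how the same least-squares formulas could serve as a population forecasting model. It fits y = a + bx with x measured in years since 1950, a shift chosen here to keep the sums small rather than one the slide specifies. `fit_line` and `forecast` are hypothetical names, and the 2020 value is a straight-line extrapolation, not data.

```python
# Population data from the slide (population in millions).
years = [1950, 1955, 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010]
population = [20.2, 22.9, 25.29, 30.55, 34.7, 40.3, 47.1, 52.3, 59.2, 68.2, 78.78, 85.5, 91.35]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Time in years since 1950 (an illustrative choice, not from the slide).
t = [year - 1950 for year in years]
a, b = fit_line(t, population)

def forecast(year):
    """Population (millions) predicted by the fitted straight line."""
    return a + b * (year - 1950)

print(f"population = {a:.3f} + {b:.4f} * (year - 1950)")
print(f"straight-line forecast for 2020: {forecast(2020):.1f} million")
```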
Nonlinear Regression Formulas

• Probably a lesson for a future course


Nonlinear Regression

Y = a + b*X + c*X^2
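As a hedged preview only, one standard way to fit Y = a + b*X + c*X^2 is to extend the least-squares equations to three equations in three unknowns and solve them with Cramer's rule for 3-by-3 systems. The names `det3` and `fit_quadratic` are hypothetical, and the check data are invented points lying exactly on a known quadratic.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows (cofactor expansion)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(xs, ys):
    """Least-squares (a, b, c) for Y = a + b*X + c*X^2 via Cramer's rule."""
    s = [sum(x ** p for x in xs) for p in range(5)]               # s[p] = sum of x^p; s[0] = n
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]  # sum of y*x^p
    m = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    d = det3(m)
    coeffs = []
    for col in range(3):
        # Replace one column with the right-hand side, as in Cramer's rule.
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = t[r]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)

# Hypothetical check: points lying exactly on Y = 1 + 2X + 3X^2.
xs = [0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
print(fit_quadratic(xs, ys))   # approximately (1.0, 2.0, 3.0)
```

For data with widely spread X values, Cramer's rule on these sums can lose precision; in practice a library routine such as NumPy's `numpy.polyfit(x, y, 2)` is the usual choice (it returns the coefficients from the highest power down, i.e. c, b, a).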
Thank You!
Best Regards!
Sincerely yours,

Course Facilitator

References
Walpole, R. (1997). Introduction to Statistics (3rd ed.). Prentice Hall International, Inc., Singapore.
