
Linear regression with one variable through LMS algorithm

Rodrigo Barbosa de Santis

May 21, 2019

1 Introduction
One of the most widely studied problems in the machine learning literature is linear regression, which consists of
approximating the straight line – or function – that best fits a set of given points. The fitted line can then be used to
predict outcomes for unknown values. Several approaches to linear regression have been proposed, including
geometric, analytical, and computational methods. In this study, the Least Mean Square (LMS) algorithm is applied
to this particular problem.

2 Materials and methods

2.1 Linear Regression
Linear regression is one of the oldest and most widely used predictive models in the field of machine learning.
It approximates a linear function that represents the expected outputs for unknown data (Principe et al., 1999).
In the present work, a simple linear model with one variable is adopted for fitting the data, shown in Table 1
and plotted in Fig. 1.

Table 1: Regression data

x   1     2     3     4     5     6     7     8     9     10    11    12
d   1.72  1.90  1.57  1.83  2.13  1.66  2.05  2.23  2.89  3.04  2.72  3.18

Figure 1: Plot of x versus d

The model is expressed by (Principe et al., 1999)

d_i = w x_i + b + ϵ_i = y_i + ϵ_i    (1)

where d, x, y, and ϵ are the desired, predictor, linearly fitted, and error values for each i = 1, 2, ..., N, respectively;
w is the line slope and b is the bias.

In most instances, it is not possible to find a straight line that fits all values. Therefore, a criterion is needed
to determine which parameters perform best on this task. The mean square error (MSE) is one of the most
commonly adopted criteria, calculated by (Principe et al., 1999)

J = (1/2N) Σ_{i=1}^{N} ϵ_i²    (2)

where J is the average sum of squared errors and N is the number of data samples to be fitted.
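As a concrete illustration, the criterion of Eq. (2) can be evaluated on the data of Table 1. The sketch below uses Python with NumPy, as in the implementation described later; the function name `mse` is chosen here for illustration. It computes J for the fitted parameters reported in the Results section.

```python
import numpy as np

# Data from Table 1
x = np.arange(1, 13)
d = np.array([1.72, 1.90, 1.57, 1.83, 2.13, 1.66,
              2.05, 2.23, 2.89, 3.04, 2.72, 3.18])

def mse(w, b, x, d):
    """Eq. (2): J = (1/2N) * sum of squared residuals eps_i = d_i - (w x_i + b)."""
    eps = d - (w * x + b)
    return np.sum(eps ** 2) / (2 * len(x))

# Parameters of the linear model reported in the Results section
J = mse(0.1568, 1.1918, x, d)
print(round(J, 4))  # → 0.036
```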

2.2 Least Mean Square (LMS)

The Least Mean Square (LMS) is a parameter search algorithm that minimizes the difference between the
linear system output y and the desired response d. The function J(w), called the performance surface (see Fig.
2), is an important tool that helps to visualize how the adaptation of the weights affects the MSE (Principe et
al., 1999).

Figure 2: Performance surface for the regression problem

As our main goal is to find a w∗ that minimizes the function J, each weight is iteratively updated by Eq. (3),
where k = 1, 2, ..., K indexes the training iterations (or epochs) and η is the step size (or learning rate).

w(k + 1) = w(k) − η∇w J(k) (3)

The gradient of the performance surface ∇w J(k) is a vector that points in the direction of steepest increase of J.
Using the current sample as an instantaneous estimate, it is given by
∇w J(k) ≈ −ϵ(k)x(k),    (4)
where ϵ(k) is the residual error, calculated as ϵ_i = d_i − (b + w x_i). Substituting Eq. (4) into Eq. (3), we obtain the
final equation of the LMS algorithm (Principe et al., 1999):

w(k + 1) = w(k) + ηϵ(k)x(k). (5)
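The step from the MSE of Eq. (2) to the gradient estimate of Eq. (4) can be made explicit. LMS replaces the full sum by the current sample's squared error (the instantaneous cost), whose gradient with respect to w is:

```latex
J_k = \tfrac{1}{2}\,\epsilon^2(k),
\qquad
\frac{\partial J_k}{\partial w}
  = \epsilon(k)\,\frac{\partial \epsilon(k)}{\partial w}
  = -\epsilon(k)\,x(k),
```

since ϵ(k) = d(k) − w x(k) − b and hence ∂ϵ(k)/∂w = −x(k).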

The procedure is executed for n epochs with a fixed learning rate η to approximate the optimal weights
w∗. Depending on the initial values and the learning rate, the solution may or may not converge. One
common phenomenon, faced when the η value is too large, is known as rattling, in which the algorithm oscillates
around an unstable, non-optimal solution (Principe et al., 1999).
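The update rule of Eq. (5) translates directly into a sample-by-sample training loop. The sketch below is one possible implementation, not necessarily the one used in this work: the bias b is updated analogously to w (with a unit input, which Eq. (5) leaves implicit), and the function and parameter names are illustrative.

```python
import numpy as np

def lms_fit(x, d, eta=0.01, epochs=1000, w0=0.0, b0=0.0):
    """Fit d ~ w*x + b with the LMS update of Eq. (5)."""
    w, b = w0, b0
    for _ in range(epochs):
        for xi, di in zip(x, d):
            eps = di - (w * xi + b)  # residual error eps(k)
            w += eta * eps * xi      # Eq. (5): w(k+1) = w(k) + eta * eps(k) * x(k)
            b += eta * eps           # analogous update for the bias (unit input)
    return w, b

# Data from Table 1
x = np.arange(1, 13)
d = np.array([1.72, 1.90, 1.57, 1.83, 2.13, 1.66,
              2.05, 2.23, 2.89, 3.04, 2.72, 3.18])

w, b = lms_fit(x, d)
J = np.sum((d - (w * x + b)) ** 2) / (2 * len(x))  # Eq. (2)
print(w, b, J)
```

With η = 0.01 the per-sample updates remain stable for this data (η·x² stays below the divergence threshold), so the loop settles near the least-squares line; a larger η such as 0.1 makes the updates diverge, consistent with the behavior reported in the Results section.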

3 Development
The algorithm is implemented in Python 2.7.8 (Van Rossum, 1998), using the following libraries:

1. NumPy – a large set of functions for array manipulation;

2. PyLab – a scientific library that provides a group of graphing and charting functions.

The parameters set for the method are summarized in Table 2. All weights w were initialized as 0.00.

Table 2: Parameters set for LMS algorithm

Learning rate (η)   0.1   0.01    0.001
Epochs              100   1,000   10,000

4 Results
The first result, obtained using epochs = 1,000 and learning rate = 0.01, is shown in Figure 3. The model
is described by the linear function f(x) = 0.1568x + 1.1918, with error J = 0.0360.

Figure 3: Linear model adjusted by the algorithm

Varying the learning rate to 0.1 resulted in an execution error, returning no values for the weights, whilst
adopting 0.001 produced a worse solution than the one previously found, with J = 0.1550.
The same sensitivity analysis was performed for the number of epochs: increasing it to 10,000, the method
found a slightly better model with J = 0.0336, while lowering it to 100 yielded a substantially higher error,
J = 0.1550.

5 Conclusion
Although the LMS algorithm does not provide an analytical solution to the linear regression model, it yields
a satisfactory approximation that can be found iteratively in many different applications. The method has some
limitations: depending on the initial weights and learning rate, it may converge to a non-optimal solution, or start
rattling and not converge at all. These are, however, known issues addressed by other methods, and LMS remains
important for introducing concepts developed in other machine learning methods.

References

Principe, J. C., Euliano, N. R., & Lefebvre, W. C. (1999). Neural and adaptive systems: fundamentals through
simulations with CD-ROM. John Wiley & Sons, Inc.

Van Rossum, G. (1998). Python: a computer language. Version 2.7.8. Amsterdam, Stichting Mathematisch Centrum.