
Evaluation metrics for regression problems.

Evaluation metrics are very important, as they tell us how accurate our model is.

Before we proceed to the evaluation techniques, it is important to gain some intuition.

[Image: Linear Regression]

In the above image, we can see that we have fitted a line, but the fit is not perfect, as some points lie above the line & some lie below it.

So, how accurate is our model?

The evaluation metrics aim to answer this question. Now, without wasting time, let's jump in & see the evaluation techniques.

There are 6 evaluation techniques:


1. M.A.E (Mean Absolute Error)

2. M.S.E (Mean Squared Error)

3. R.M.S.E (Root Mean Squared Error)

4. R.M.S.L.E (Root Mean Squared Log Error)

5. R-Squared

6. Adjusted R-Squared

Now, let’s discuss these techniques one by one.

M.A.E (Mean Absolute Error)

It is the simplest & most widely used evaluation technique. It is simply the mean of the absolute differences between the actual & predicted values.

Below is the mathematical formula of the Mean Absolute Error.

$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$

Scikit-Learn is a great library, as it has almost all the built-in functions we need on our Data Science journey. Below is the code to implement Mean Absolute Error:

from sklearn.metrics import mean_absolute_error

mean_absolute_error(y_true, y_pred)

Here, 'y_true' contains the true target values & 'y_pred' the predicted target values.
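For a quick sanity check, here is a tiny usage example; the numbers are made up purely for illustration:

from sklearn.metrics import mean_absolute_error

# Illustrative values only, not from a real dataset
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Mean of |0.5|, |0|, |-1.5|, |-1| = (0.5 + 0 + 1.5 + 1) / 4
print(mean_absolute_error(y_true, y_pred))  # 0.75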

M.S.E (Mean Squared Error)


Another evaluation technique is the Mean Squared Error. It takes the average of the square of the error, where the error is the difference between the actual & predicted values.

Below is the mathematical formula of the Mean Squared Error.

$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

I'm sure that after looking at the above mathematical function, you will be thinking about the implementation. But don't worry, there is an inbuilt function called 'mean_squared_error'.

Here is the code below:


from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

Here, 'y_true' is the true value & 'y_pred' is the predicted value.

Well, a problem with the above function is that it changes the units: the MSE is expressed in the squared units of the target variable. To avoid that problem, we will use another technique, called R.M.S.E (Root Mean Squared Error).

R.M.S.E (Root Mean Squared Error)


Root Mean Squared Error is another technique that is widely used these days. First of all, it solves the problem of the above technique.

It takes the square root of the average of the squared errors, which brings the metric back to the same units as the target. Below is the mathematical function, which will make things clearer.

$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$

Below is the code to calculate the Root Mean Squared Error:

from sklearn.metrics import mean_squared_error
import numpy as np

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)

Here, you can see that we don't have a dedicated function to calculate RMSE; instead, we compute the MSE & take its square root.
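As a side note, recent scikit-learn versions (1.4 and later) do provide a dedicated function, so depending on your installed version you may not need the manual square root:

from sklearn.metrics import root_mean_squared_error

rmse = root_mean_squared_error(y_true, y_pred)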

But wait, there is a limitation of this method.

Let’s take the below example.

In Example 1, we can see that the error is very large: the actual value is 1 & the predicted value is 401.

In Example 2, the actual value is much larger, so even though the absolute error is also 400, the prediction is close to the actual value in relative terms.

For both examples the error is 400, but the ML model in Example 2 is actually giving us better results. According to RMSE, however, the error is the same.

Therefore, to solve this problem, we use another similar, but modified, method, which is discussed below.

R.M.S.L.E (Root Mean Squared Log Error)

The mathematical function of this technique is displayed below.

$\text{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( \log(y_i + 1) - \log(\hat{y}_i + 1) \right)^2}$

Now, if we take the above case with RMSLE, the RMSLE of Example 1 is much greater than that of Example 2, & therefore RMSLE solves the problem which occurred with RMSE (Root Mean Squared Error).

This works because the logarithm scales down large values, so the metric effectively measures the relative error rather than the absolute error.

Below is the code to implement it:

import numpy as np
from sklearn.metrics import mean_squared_log_error

np.sqrt(mean_squared_log_error(y_true, predictions))

Here, 'y_true' is the actual target variable & 'predictions' is the predicted target variable.
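To make the comparison with RMSE concrete, here is a small sketch. Example 1 uses the values from above (actual 1, predicted 401); since the exact values of Example 2 are not given here, the ones below (actual 100,000, predicted 100,400) are assumed purely for illustration:

import numpy as np
from sklearn.metrics import mean_squared_error, mean_squared_log_error

y_true_1, y_pred_1 = [1], [401]            # Example 1 (from the article)
y_true_2, y_pred_2 = [100_000], [100_400]  # Example 2 (assumed values)

# RMSE is identical for both examples: the absolute error is 400 either way
print(np.sqrt(mean_squared_error(y_true_1, y_pred_1)))      # 400.0
print(np.sqrt(mean_squared_error(y_true_2, y_pred_2)))      # 400.0

# RMSLE is far larger for Example 1, where the relative error is huge
print(np.sqrt(mean_squared_log_error(y_true_1, y_pred_1)))  # ~5.30
print(np.sqrt(mean_squared_log_error(y_true_2, y_pred_2)))  # ~0.004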

R-Squared

Now, we come to another technique called R-Squared, formally known as the coefficient of determination. It is based on the Relative Squared Error with respect to a baseline model that always predicts the mean.

This method helps us to calculate the error relative to that baseline, which lets us judge which algorithm is better based on their mean squared errors.

The mathematical formula of the R-Squared method is given below.

$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$

If the ratio in the above formula is greater than 1, R-Squared becomes negative. This means that the squared error of our model is greater than that of the baseline model (which always predicts the mean), which in turn means that the new model is worse than the baseline model.

The higher the R-Squared, the better the model.

Below is the code to implement the R-Squared evaluation technique:

from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

Here, 'y_true' is the true target variable & 'y_pred' is the predicted target variable.
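As a quick illustration of the "worse than baseline" case, here is a made-up example where the predictions point in exactly the wrong direction:

from sklearn.metrics import r2_score

# Illustrative values only
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [4.0, 3.0, 2.0, 1.0]

# Negative R-Squared: this model is worse than always predicting the mean (2.5)
print(r2_score(y_true, y_pred))  # -3.0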

After reading the above paragraphs, you may be really impressed by these evaluation metrics. But wait, there is a limitation of this technique.

The limitation is that the R-Squared value either increases or stays the same when more features are added, regardless of how those features impact the model.

To overcome this limitation, there is another evaluation technique called Adjusted R-Squared, which is discussed below.

Adjusted R-Squared

The mathematical formula is displayed below.

$R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$

Here, n is the number of samples & k is the number of features.


There is no inbuilt function in scikit-learn to calculate Adjusted R-Squared, but we can find the R-Squared & compute the Adjusted R-Squared from it ourselves, as shown below.
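Here is a minimal sketch of how that could look; the helper name 'adjusted_r2' is my own, not a scikit-learn function:

from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    # n: number of samples, k: number of features, as in the formula above
    n = len(y_true)
    k = n_features
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)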

Well, the above are the 6 most commonly used evaluation metrics
for Regression Problems.

But there are a lot of steps that need to be done before model training, like data cleaning, data visualization, data analysis, missing value treatment, outlier treatment, etc.
