
6/27/2019 Lab10 Regression Evaluation Methods

Lab No. 10
Objective: students will learn regression evaluation methods.

1. Students will be able to apply the MAE method
2. Students will be able to apply the MSE method
3. Students will be able to apply the RMSE method

In [74]: import pandas as pd

In [75]: data = pd.read_csv('e:/mydata/advertising.csv')

In [76]: data.head(3)

Out[76]:    Unnamed: 0     TV  Radio  Newspaper  Sales
         0           1  230.1   37.8       69.2   22.1
         1           2   44.5   39.3       45.1   10.4
         2           3   17.2   45.9       69.3    9.3

Split the dataset into features (X) and target (y)

In [77]: X = data[['TV', 'Radio', 'Newspaper']]
X.head()

Out[77]:       TV  Radio  Newspaper
         0  230.1   37.8       69.2
         1   44.5   39.3       45.1
         2   17.2   45.9       69.3
         3  151.5   41.3       58.5
         4  180.8   10.8       58.4

In [78]: X.shape

Out[78]: (200, 3)

In [79]: type(X)

Out[79]: pandas.core.frame.DataFrame

localhost:8888/notebooks/Lab10 Regression Evaluation Methods.ipynb 1/5



In [80]: y = data['Sales']
y.head()

Out[80]: 0 22.1
1 10.4
2 9.3
3 18.5
4 12.9
Name: Sales, dtype: float64

In [81]: print(y.shape)

(200,)

In [82]: print(type(y))

<class 'pandas.core.series.Series'>

In [83]: from sklearn.model_selection import train_test_split

In [84]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_st

In [85]: print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(140, 3)
(140,)
(60, 3)
(60,)

In [86]: from sklearn.linear_model import LinearRegression

In [87]: lr = LinearRegression()

In [88]: lr.fit(X_train, y_train)

Out[88]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)


In [89]: y_pred = lr.predict(X_test)
print(y_pred)
print()
print("Number of Predictions are: ", len(y_pred))

[21.66318307 16.44137936 7.69144625 17.9163172 18.67047113 23.79199311
16.2825425 13.44138683 9.15294033 17.32475313 14.43922876 9.84019547
17.26329945 16.62853147 15.09158705 15.50173894 12.43404074 17.32591521
11.04327486 18.05652777 9.35309526 12.79465958 8.73413846 10.47225333
11.38216042 15.02658554 9.7406823 19.44676903 18.19211174 17.20178728
21.56359539 14.70484262 16.2635213 12.37098906 19.97059316 15.36768988
14.00399515 10.0772945 20.91891557 7.43833283 3.67031166 7.27760354
5.99523188 18.41497546 8.31868226 14.1090252 14.93697583 20.35882814
20.56271636 19.55380813 24.10360923 14.84985778 6.71474914 19.77761567
18.93996367 12.5109195 14.20052652 6.10844697 15.3695344 9.56769111]

Number of Predictions are: 60

In [108]: # Extra check (not part of the lab): view the actual test values as a named Series
df_actual = pd.Series(y_test, name='Actual')
print(df_actual.head())

58 23.8
40 16.6
34 9.5
102 14.8
184 17.6
Name: Actual, dtype: float64

In [109]: df_predicted = pd.Series(y_pred, name='Predicted')
print(df_predicted.head())

0 21.663183
1 16.441379
2 7.691446
3 17.916317
4 18.670471
Name: Predicted, dtype: float64
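
The two Series printed above carry different row labels: the actual values keep the labels from the original data (58, 40, 34, ...), while the predictions are labelled 0, 1, 2, ... because they come from a NumPy array. A minimal sketch of one way to put them side by side, using only the first five values shown above; the names `actual`, `predicted` and `comparison` are chosen here for illustration and are not part of the lab:

```python
import pandas as pd

# First five actual Sales values, with the row labels kept by the split
actual = pd.Series([23.8, 16.6, 9.5, 14.8, 17.6],
                   index=[58, 40, 34, 102, 184], name='Actual')

# First five predictions, labelled 0..4 because they came from a NumPy array
predicted = pd.Series([21.663183, 16.441379, 7.691446, 17.916317, 18.670471],
                      name='Predicted')

# Drop the old row labels so both Series line up by position
comparison = pd.DataFrame({
    'Actual': actual.reset_index(drop=True),
    'Predicted': predicted,
})

# Per-row error: actual minus predicted
comparison['Error'] = comparison['Actual'] - comparison['Predicted']

print(comparison)
```

Without `reset_index(drop=True)`, pandas would align the two Series by label rather than by position, and most rows would end up with missing values.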

Evaluation Metrics

In [106]: # Import the metrics module (MAE, MSE) and NumPy (square root for RMSE)
from sklearn import metrics
import numpy as np


In [107]: print(metrics.mean_absolute_error(y_test, y_pred))  # MAE
print(metrics.mean_squared_error(y_test, y_pred))  # MSE
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))  # RMSE

1.0548328405073326
1.9289249074665733
1.388857410775697
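
As a quick sanity check on the three numbers printed above (MAE, MSE and RMSE, in that order), the sketch below confirms that the reported RMSE is the square root of the reported MSE and that MAE does not exceed RMSE. The variable names are chosen here for illustration; the values are copied from the output.

```python
import math

# Values copied from the output above
reported_mae = 1.0548328405073326
reported_mse = 1.9289249074665733
reported_rmse = 1.388857410775697

# RMSE is defined as the square root of MSE
rmse_from_mse = math.sqrt(reported_mse)

# Both checks are expected to hold
print(math.isclose(rmse_from_mse, reported_rmse, rel_tol=1e-9))
print(reported_mae <= reported_rmse)
```

MAE and RMSE are expressed in the same units as Sales, while MSE is in squared units, which is why RMSE is usually the easier of the two squared-error figures to interpret.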

Lab Tasks:
1. Compare MAE, MSE and RMSE
2. Describe these evaluation techniques mathematically

Task 1

Mean absolute error

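Mean absolute error (MAE) measures the average size of the difference between two continuous
variables, here the predicted and the actual values. It is given by
$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |x_i - x|$, where $x_i$ is a predicted value, $x$ is the
corresponding true value and $n$ is the number of values. It is possible to express MAE as the sum
of two components: Quantity Disagreement and Allocation Disagreement.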

Mean Squared error

The mean squared error tells you how close a regression line is to a set of points. It does this by
taking the distances from the points to the regression line (these distances are the “errors”) and
squaring them. The squaring is necessary to remove any negative signs. It also gives more weight
to larger differences. It’s called the mean squared error because you’re finding the average of a
set of squared errors.
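
To illustrate the point about larger differences getting more weight, here is a small sketch with two made-up sets of errors that have the same mean absolute error but very different mean squared errors. The data and the helper names `mae` and `mse` are hypothetical and only serve the comparison.

```python
import numpy as np

# Two hypothetical sets of errors with the same total absolute error
even_errors = np.array([2.0, 2.0, 2.0, 2.0])    # four moderate misses
one_big_error = np.array([0.0, 0.0, 0.0, 8.0])  # one large miss

def mae(errors):
    """Mean of the absolute errors."""
    return np.mean(np.abs(errors))

def mse(errors):
    """Mean of the squared errors."""
    return np.mean(errors ** 2)

print(mae(even_errors), mse(even_errors))      # MAE 2.0, MSE 4.0
print(mae(one_big_error), mse(one_big_error))  # MAE 2.0, MSE 16.0
```

Both sets have an MAE of 2, but the single large miss raises the MSE from 4 to 16, so MSE (and RMSE) penalise a model more for occasional large errors than MAE does.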

Root mean squared error

Root Mean Square Error (RMSE) measures how much error there is between two data sets. In
other words, it compares a predicted value and an observed or known value. It is the square root
of the mean squared error, so it is expressed in the same units as the values being predicted.
It's also known as Root Mean Square Deviation and is one of the most widely used statistics in GIS.

Task 2

Mean Squared error

1. Find the regression line.
2. Insert your X values into the linear regression equation to find the new Y values (Y').
3. Subtract the new Y value from the original to get the error.
4. Square the errors.
5. Add up the squared errors.
6. Divide by the number of points to find the mean.
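
The steps above can be sketched in a few lines of NumPy. This is only an illustration on made-up toy data with a single feature (not the advertising dataset), and it uses `np.polyfit` to find the line rather than the `LinearRegression` model used earlier in the lab.

```python
import numpy as np

# Hypothetical toy data, not the advertising dataset
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Find the regression line y' = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

# Insert the X values into the equation to get the new Y values (Y')
y_new = slope * x + intercept

# Subtract the new Y values from the originals to get the errors
errors = y - y_new

# Square the errors, add them up, and divide by the number of points
squared_errors = errors ** 2
mse = squared_errors.sum() / len(squared_errors)

print(mse)
```

For this toy data the fitted line is approximately y' = 0.6x + 2.2 and the MSE comes out at about 0.48.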

Mean Absolute error


The formula for the absolute error (Δx) is:

$$\Delta x = |x_i - x|$$

Where:
$x_i$ is the measurement,
$x$ is the true value.

The Mean Absolute Error (MAE) is the average of all absolute errors. The formula is:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |x_i - x|$$

Where:
$n$ = the number of errors,
$\sum$ = summation symbol (which means “add them all up”),
$|x_i - x|$ = the absolute errors.

The formula may look a little daunting, but the steps are easy:
1. Find all of your absolute errors, $|x_i - x|$.
2. Add them all up.
3. Divide by the number of errors. For example, if you had 10 measurements, divide by 10.
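
A minimal sketch of the MAE calculation in plain Python, using a made-up true value and four made-up measurements; the numbers and variable names are hypothetical.

```python
# Hypothetical example: four measurements of a quantity whose true value is 10.0
true_value = 10.0
measurements = [9.5, 10.2, 10.4, 9.9]

# Find all of the absolute errors |x_i - x|
absolute_errors = [abs(x_i - true_value) for x_i in measurements]

# Add them all up
total = sum(absolute_errors)

# Divide by the number of errors
mae = total / len(absolute_errors)

print(mae)
```

The absolute errors here are roughly 0.5, 0.2, 0.4 and 0.1, so the MAE is about 0.3.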

Root mean squared error


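Root Mean Square Error (RMSE) measures how much error there is between two data sets. In
other words, it compares a predicted value and an observed or known value. It is the square root
of the mean squared error. The formula is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - x)^2}$$

Where $x_i$ is a predicted value, $x$ is the corresponding observed value and $n$ is the number of
values.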
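
A minimal sketch of an RMSE calculation with NumPy on made-up observed and predicted values. The helper name `rmse` and the numbers are hypothetical; the lab itself computes RMSE with `np.sqrt` applied to `metrics.mean_squared_error`.

```python
import numpy as np

def rmse(observed, predicted):
    """Square root of the mean of the squared differences."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((predicted - observed) ** 2))

# Hypothetical observed and predicted values
observed = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 6.0]

rmse_value = rmse(observed, predicted)
print(rmse_value)
```

The squared differences here are 1, 0, 4 and 1, so the mean squared error is 1.5 and the RMSE is its square root, roughly 1.22.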

In [ ]:
