Lab10 Regression Evaluation Methods

6/27/2019 Lab10 Regression Evaluation Methods
Lab No. 10
Students will learn regression evaluation methods:
1. Students will be able to apply the MAE method
2. Students will be able to apply the MSE method
3. Students will be able to apply the RMSE method
In [74]: import pandas as pd
In [75]: data = pd.read_csv('e:/mydata/advertising.csv')
In [76]: data.head(3)
Out[76]:
   Unnamed: 0     TV  Radio  Newspaper  Sales
0           1  230.1   37.8       69.2   22.1
1           2   44.5   39.3       45.1   10.4
2           3   17.2   45.9       69.3    9.3
Split the dataset
In [77]: X = data[['TV', 'Radio', 'Newspaper']]

X.head()
Out[77]:
      TV  Radio  Newspaper
0  230.1   37.8       69.2
1   44.5   39.3       45.1
2   17.2   45.9       69.3
3  151.5   41.3       58.5
4  180.8   10.8       58.4
In [78]: X.shape
Out[78]: (200, 3)
In [79]: type(X)
Out[79]: pandas.core.frame.DataFrame

In [80]: y = data['Sales']
y.head()
Out[80]: 0 22.1
1 10.4
2 9.3
3 18.5
4 12.9
Name: Sales, dtype: float64
In [81]: print(y.shape)
(200,)
In [82]: print(type(y))
<class 'pandas.core.series.Series'>
In [83]: from sklearn.model_selection import train_test_split
In [84]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_st
In [85]: print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
(140, 3)
(140,)
(60, 3)
(60,)
In [86]: from sklearn.linear_model import LinearRegression
In [87]: lr = LinearRegression()
In [88]: lr.fit(X_train, y_train)
Out[88]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [89]: y_pred = lr.predict(X_test)

print(y_pred)
print()
print("Number of Predictions are: ", len(y_pred))
[21.66318307 16.44137936 7.69144625 17.9163172 18.67047113 23.79199311
16.2825425 13.44138683 9.15294033 17.32475313 14.43922876 9.84019547
17.26329945 16.62853147 15.09158705 15.50173894 12.43404074 17.32591521
11.04327486 18.05652777 9.35309526 12.79465958 8.73413846 10.47225333
11.38216042 15.02658554 9.7406823 19.44676903 18.19211174 17.20178728
21.56359539 14.70484262 16.2635213 12.37098906 19.97059316 15.36768988
14.00399515 10.0772945 20.91891557 7.43833283 3.67031166 7.27760354
5.99523188 18.41497546 8.31868226 14.1090252 14.93697583 20.35882814
20.56271636 19.55380813 24.10360923 14.84985778 6.71474914 19.77761567
18.93996367 12.5109195 14.20052652 6.10844697 15.3695344 9.56769111]
Number of Predictions are: 60
In [108]: # MY WORK FOR TESTING PURPOSE
df_actual = pd.Series(y_test, name='Actual')

print(df_actual.head())
58 23.8
40 16.6
34 9.5
102 14.8
184 17.6
Name: Actual, dtype: float64
In [109]: df_predicted = pd.Series(y_pred, name='Predicted')

print(df_predicted.head())
0 21.663183
1 16.441379
2 7.691446
3 17.916317
4 18.670471
Name: Predicted, dtype: float64
Evaluation Metric
In [106]: # Imports for the evaluation metrics (MAE, MSE, RMSE)
from sklearn import metrics
import numpy as np
In [107]: # MAE, MSE, and RMSE on the test set, printed in that order
print(metrics.mean_absolute_error(y_test, y_pred))
print(metrics.mean_squared_error(y_test, y_pred))
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
1.0548328405073326
1.9289249074665733
1.388857410775697
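As a quick sanity check on the three numbers printed above, the third value should be the square root of the second. The sketch below uses only the standard library; the variable names are illustrative and the values are copied from the output above.

```python
import math

# Values copied from the notebook output above
mae = 1.0548328405073326
mse = 1.9289249074665733
rmse = 1.388857410775697

# RMSE is defined as the square root of MSE
rmse_from_mse = math.sqrt(mse)
rmse_matches = math.isclose(rmse_from_mse, rmse, rel_tol=1e-9)
print(rmse_matches)

# On the same set of errors, RMSE is never smaller than MAE
print(rmse >= mae)
```

Both checks should print True, which suggests the three metrics were computed on the same predictions.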
Lab Tasks:
1. Compare MAE, MSE and RMSE
2. Describe these evaluation techniques mathematically
Task 1
Mean absolute error
Mean absolute error (MAE) is a measure of the difference between two continuous variables, here the predicted and the actual values. It is the average of the absolute differences, so every error counts in proportion to its size. It is possible to express MAE as the sum of two components: Quantity Disagreement and Allocation Disagreement.
Mean Squared error
The mean squared error (MSE) tells you how close a regression line is to a set of points. It does this by taking the distances from the points to the regression line (these distances are the "errors") and squaring them. The squaring removes any negative signs and also gives more weight to larger differences. It is called the mean squared error because it is the average of the squared errors.
Root mean squared error
Root Mean Square Error (RMSE) measures how much error there is between two data sets. In other words, it compares a predicted value and an observed or known value. Because it is the square root of the MSE, it is expressed in the same units as the target variable. It is also known as Root Mean Square Deviation and is one of the most widely used statistics in GIS.
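To make the comparison concrete, the sketch below evaluates the three metrics on two sets of made-up predictions with the same total absolute error: one where every prediction is off by the same amount, and one where a single prediction is far off. The data and the helper functions mae, mse, and rmse are illustrative assumptions, not part of the lab notebook.

```python
import math

def mae(actual, predicted):
    # Average of the absolute errors
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    # Average of the squared errors
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Square root of the MSE, in the same units as the target
    return math.sqrt(mse(actual, predicted))

# Made-up values for illustration only
actual = [10.0, 12.0, 14.0, 16.0]
even_errors = [12.0, 14.0, 16.0, 18.0]  # every prediction is off by 2
one_outlier = [10.0, 12.0, 14.0, 24.0]  # a single prediction is off by 8

for name, predicted in [("even errors", even_errors), ("one outlier", one_outlier)]:
    print(name, mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

MAE is 2.0 in both cases, while MSE rises from 4.0 to 16.0 and RMSE from 2.0 to 4.0. This is the sense in which the squared metrics give more weight to larger differences.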
Task 2
Mean Squared error
1. Find the regression line.
2. Insert your X values into the linear regression equation to find the new Y values (Y').
3. Subtract the new Y value from the original to get the error.
4. Square the errors.
5. Add up the squared errors.
6. Divide by the number of errors to find the mean.
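The steps above can be sketched in plain Python. The data points and the line Y' = 2X + 1 are hypothetical, chosen for easy arithmetic rather than fitted to the advertising data.

```python
# Hypothetical data points (not from the advertising dataset)
x_values = [1.0, 2.0, 3.0, 4.0]
y_values = [3.5, 4.5, 7.0, 9.5]

def regression_line(x):
    # Assumed regression line Y' = 2X + 1, used only for illustration
    return 2.0 * x + 1.0

# Insert the X values into the regression equation to get the new Y values (Y')
y_new = [regression_line(x) for x in x_values]

# Subtract the new Y value from the original to get each error
errors = [y - y_hat for y, y_hat in zip(y_values, y_new)]

# Square the errors
squared_errors = [e ** 2 for e in errors]

# Add up the squared errors, then divide by their count to find the mean
mse = sum(squared_errors) / len(squared_errors)
print(mse)  # 0.1875
```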
Mean Absolute error

The formula for the absolute error (Δx) is:
Δx = |xi – x|
Where:
xi is the measurement (in this lab, a predicted value),
x is the true value (in this lab, the corresponding actual value).
The Mean Absolute Error (MAE) is the average of all absolute errors. The formula is:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |x_i - x|$$
Where:
n = the number of errors,
Σ = summation symbol (which means "add them all up"),
|xi – x| = the absolute errors.
The formula may look a little daunting, but the steps are easy:
1. Find all of your absolute errors, |xi – x|.
2. Add them all up.
3. Divide by the number of errors. For example, if you had 10 measurements, divide by 10.
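A minimal sketch of the MAE steps in plain Python, treating each predicted value as the measurement and the matching actual value as the true value. The numbers are made up for illustration and are not taken from the notebook output.

```python
# Made-up predicted and actual values (illustrative only)
predicted = [21.5, 16.5, 8.0, 18.0]
actual = [23.5, 16.0, 9.5, 15.0]

# Find all of the absolute errors
absolute_errors = [abs(p - a) for p, a in zip(predicted, actual)]

# Add them all up
total_error = sum(absolute_errors)

# Divide by the number of errors
mae = total_error / len(absolute_errors)
print(mae)  # 1.75
```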
Root mean squared error

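Root Mean Square Error (RMSE) measures how much error there is between two data sets. In other words, it compares a predicted value and an observed or known value. It is the square root of the mean of the squared errors:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - x)^2}$$
Where:
xi is the predicted value,
x is the corresponding observed value,
n = the number of errors.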
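As a small worked example, RMSE can be computed by squaring each error, averaging the squares, and taking the square root. The observed and predicted values below are made up for illustration.

```python
import math

# Made-up observed and predicted values (illustrative only)
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 9.0, 11.0]

# Square the difference between each predicted and observed value
squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]

# Average the squared errors, then take the square root
mean_squared_error = sum(squared_errors) / len(squared_errors)
rmse = math.sqrt(mean_squared_error)
print(rmse)  # 1.5
```

Here the mean squared error is 2.25, so the RMSE of 1.5 is back in the same units as the observed values.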