I used Linear Regression and Quantile Regression on an Energy Efficiency data set with 8 attributes
and 2 decisions. In my program I used the second decision (Cooling Load) with all eight attributes.
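The attributes (X1 to X8) were Relative Compactness, Surface Area, Wall Area, Roof Area, Overall
Height, Orientation, Glazing Area and Glazing Area Distribution, and the decisions (Y1, Y2) were
Heating Load and Cooling Load.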
Quantile regression estimates the median or other quantiles of y conditional on X, while ordinary
least squares (OLS) estimates the conditional mean.
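The difference between the two targets can be illustrated with a small sketch. It assumes the usual loss functions (squared error for OLS, pinball loss for quantile regression) and a made-up toy sample; the helper names pinball_loss and squared_loss are hypothetical and are not part of my program.

```python
import numpy as np

def pinball_loss(y, pred, q):
    """Mean pinball (quantile) loss for quantile q."""
    diff = y - pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

def squared_loss(y, pred):
    """Mean squared error, the loss minimised by OLS."""
    return np.mean((y - pred) ** 2)

# Skewed toy sample (made up for illustration): mean 22, median 3.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Try every constant prediction on a fine grid and keep the best one.
candidates = np.linspace(0.0, 100.0, 10001)
best_squared = candidates[np.argmin([squared_loss(y, c) for c in candidates])]
best_pinball = candidates[np.argmin([pinball_loss(y, c, 0.5) for c in candidates])]

print("Best constant under squared loss:", best_squared)  # close to the mean
print("Best constant under pinball loss:", best_pinball)  # close to the median
```

On this sample the single outlier pulls the squared-loss optimum towards it, while the pinball-loss optimum at q = 0.5 stays at the median.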
In my case, with a 20% test size, Linear Regression performs better than Quantile Regression,
with an R² score of 85% compared to 73%. The error is also much smaller with Linear
Regression: 13.6 compared to 24.91. When the split size changes, the results change as well,
but Linear Regression still outperforms Quantile Regression when both are compared at the same ratio.
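For reference, the score and the errors printed by the program can be computed by hand as in the sketch below. It assumes the standard definitions of R², MSE and RMSE; the function names and the toy values are made up for illustration and are not the results reported above.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the decision."""
    return np.sqrt(mse(y_true, y_pred))

# Toy values, made up for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

print("R^2 :", r2(y_true, y_pred))
print("MSE :", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

Since the program prints both the MSE and its root, it helps to state which of the two a reported error refers to, because they are on different scales.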
I used logarithmic axes for the visual representation, with the blue diagonal line representing
perfect regression (prediction equal to the true value). The closer the red points are to the line,
the more accurate the model is. Points under the blue line mean that the model is under-predicting,
and points above the blue line mean that the model is over-predicting.
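The same reading of the plot can be checked numerically. The sketch below assumes the plot convention used here (true values on the horizontal axis, predictions on the vertical axis); the function name prediction_bias and the toy values are hypothetical.

```python
import numpy as np

def prediction_bias(y_true, y_pred):
    """Return the fractions of points below and above the diagonal."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    under = np.mean(y_pred < y_true)  # below the line: under-predicting
    over = np.mean(y_pred > y_true)   # above the line: over-predicting
    return under, over

# Toy values, made up for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 27.0, 39.0]

under, over = prediction_bias(y_true, y_pred)
print("Under-predicted:", under)
print("Over-predicted :", over)
```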
Pripoae Serbanescu Mihai
LINEAR REGRESSION
QUANTILE REGRESSION
If we look closely at the two graphs, we can observe that the dots for linear regression are
more chaotic than for quantile regression. The phenomenon is caused by the fact that
linear regression produces predictions spread over a wider range of values, whereas the
quantile regression predictions vary less, so its graph looks a lot more symmetrical and linear.
Although the Linear Regression graph looks more chaotic, it gives more precise predictions
with smaller errors.
The CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import QuantileRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
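# Load the Energy Efficiency data set
dataset = pd.read_csv("ENB2012_data.csv")
# Attributes: the first 8 columns
x = dataset.iloc[:, :8]
# Decision: the third column from the end (Cooling Load, per the description)
y = dataset.iloc[:, -3]
# Hold out 20% of the rows for testing
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.2, random_state=1)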
# Linear Regression
LinearRegression_model = LinearRegression()
LinearRegression_model.fit(x_tr, y_tr)
y_pr = LinearRegression_model.predict(x_te)
print("\n Linear Regression")
PredictedValues1 = pd.DataFrame({'Actual': y_te, 'Predicted': y_pr})
print(PredictedValues1)
# R^2 score and errors on the test set
score = r2_score(y_te, y_pr)
print("Regression accuracy score (R^2) is ", score)
print("Mean squared error is ", mean_squared_error(y_te, y_pr))
print("Root of mean squared error is ", np.sqrt(mean_squared_error(y_te, y_pr)))
plt.figure(figsize=(10, 10))
plt.scatter(y_te, y_pr, c='crimson')
plt.yscale('log')
plt.xscale('log')
# Diagonal reference line: prediction equal to the true value
p1 = max(max(y_te), max(y_tr))
p2 = min(min(y_te), min(y_tr))
plt.plot([p1, p2], [p1, p2], 'b-')
plt.xlabel('True Values', fontsize=15)
plt.ylabel('Predictions', fontsize=15)
plt.axis('equal')
plt.show()
# Quantile Regression
print("\n Quantile Regression")
QuantileRegressor_Model = QuantileRegressor()
QuantileRegressor_Model.fit(x_tr, y_tr)
y_pr = QuantileRegressor_Model.predict(x_te)
PredictedValues2 = pd.DataFrame({'Actual': y_te, 'Predicted': y_pr})
print(PredictedValues2)
# R^2 score and errors on the test set
score = r2_score(y_te, y_pr)
print("Regression accuracy score (R^2) is ", score)
print("Mean squared error is ", mean_squared_error(y_te, y_pr))
print("Root of mean squared error is ", np.sqrt(mean_squared_error(y_te, y_pr)))
plt.figure(figsize=(10, 10))
plt.scatter(y_te, y_pr, c='crimson')
plt.yscale('log')
plt.xscale('log')
# Diagonal reference line: prediction equal to the true value
p1 = max(max(y_te), max(y_tr))
p2 = min(min(y_te), min(y_tr))
plt.plot([p1, p2], [p1, p2], 'b-')
plt.xlabel('True Values', fontsize=15)
plt.ylabel('Predictions', fontsize=15)
plt.axis('equal')
plt.show()
I used read_csv to open and process the dataset. I stored the attributes in x and the
decision in y, choosing only the last decision (Cooling Load). I created the model using
x_tr and y_tr, then generated the score and error. I then generated a plot to visualize the
results and displayed it.
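Selecting columns by position depends on the exact layout of the CSV file. As a hedged alternative, the sketch below selects the attributes and the decision by name. It assumes the file has a header row with the names X1 to X8, Y1 and Y2 and may contain empty trailing columns; the tiny in-memory CSV is a made-up stand-in, not real rows from the data set.

```python
import io
import pandas as pd

# Tiny stand-in for ENB2012_data.csv; the values are made up for illustration.
csv_text = """X1,X2,X3,X4,X5,X6,X7,X8,Y1,Y2,,
0.9,520.0,300.0,110.0,7.0,2,0.0,0,15.0,21.0,,
0.8,600.0,310.0,145.0,7.0,3,0.1,1,28.0,30.0,,
0.7,700.0,320.0,190.0,3.5,4,0.25,2,11.0,14.0,,
"""
dataset = pd.read_csv(io.StringIO(csv_text))

# Drop columns that contain no data at all (for example empty trailing columns).
dataset = dataset.dropna(axis="columns", how="all")

# Select the eight attributes and the Cooling Load decision by name.
feature_cols = [f"X{i}" for i in range(1, 9)]
x = dataset[feature_cols]
y = dataset["Y2"]

print(x.shape)
print(y.tolist())
```

With names instead of positions, the selection keeps working if columns are added or removed at the end of the file.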