Dataset Overview:
The dataset, provided in 'e1.xlsx', contains values with 'x' ranging from 0 to 0.9346 and 'y' from 0 to 13.4821, split into
training and testing sets with an 80:20 ratio.
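An 80:20 split like the one described above can be sketched as follows. This is only an illustration: the source does not say whether the rows were shuffled or split in order, so the seeded shuffle, the function name `train_test_split`, and the synthetic `rows` (a stand-in for the contents of 'e1.xlsx') are all assumptions.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle (x, y) rows and split them into training and testing lists."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, so the split is repeatable
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the rows of 'e1.xlsx' (not the real data).
rows = [(i / 100, 14.0 * i / 100) for i in range(100)]
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))  # 80 20
```

Fixing the seed keeps the same rows in each set between runs, which makes the training and testing metrics comparable.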
[Figure: scatter plot of the data, with x on the horizontal axis]
b (the intercept) represents the value of y that our model predicts when the independent variable x equals 0.
a (the slope) represents the rate of change of y with respect to x, as predicted by our model.
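For a line of the form y = a*x + b, the ordinary least-squares coefficients have a standard closed form: a is the sum of (x - mean of x)(y - mean of y) divided by the sum of (x - mean of x) squared, and b is the mean of y minus a times the mean of x. The sketch below assumes this is the closed-form equation the report refers to; the function name `fit_line` and the sample points are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the line y = a*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    a = s_xy / s_xx            # slope
    b = y_mean - a * x_mean    # intercept
    return a, b

# Illustrative points lying exactly on y = 2x + 1 (not the report's data).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0, 1.5, 2.0, 2.5, 3.0]
a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```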
The coefficients obtained from our closed-form equation and from Excel's built-in functionality are very
close to each other. Excel's built-in functionality also reports several other quantities that
help us gauge the performance of our model.
1. Y_cap (Predicted y): The value of 'y' predicted by the regression model for each 'x'
value. It's used to compare against the actual 'y' values to assess the model's
prediction accuracy.
2. e (Error): The difference between the actual 'y' value and the predicted 'Y_cap' for
each data point. It's a direct measure of the prediction error.
3. e_sq (Error Squared): The square of the error 'e'. Squaring the error emphasizes
larger errors more than smaller ones and is used in calculating other metrics like
MSE.
4. MAE (Mean Absolute Error): The average of the absolute values of the errors.
MAE provides a simple measure of prediction accuracy, with lower values
indicating better model performance.
5. SSE (Sum of Squared Errors): The sum of all squared errors 'e_sq'. SSE is a
measure of the total prediction error.
6. MSE (Mean Squared Error): The average of the squared errors 'e_sq'. MSE is a
common measure of model accuracy.
7. RMSE (Root Mean Squared Error): The square root of MSE. RMSE is a popular
metric for assessing model accuracy as it is in the same units as the dependent
variable and penalizes larger errors more.
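The measures listed above can be computed directly from the actual and predicted values. The following is a minimal sketch that follows the definitions as written here (MSE is the plain average of the squared errors, dividing by the number of points); the function name `error_metrics` and the sample values are made up for illustration and are not taken from the dataset.

```python
import math

def error_metrics(ys, y_caps):
    """Return MAE, SSE, MSE and RMSE for actual values ys and predictions y_caps."""
    e = [y - y_cap for y, y_cap in zip(ys, y_caps)]  # error per data point
    e_sq = [err ** 2 for err in e]                   # squared error per data point
    n = len(e)
    sse = sum(e_sq)
    mse = sse / n
    return {
        "MAE": sum(abs(err) for err in e) / n,
        "SSE": sse,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }

# Illustrative values only.
ys = [1.0, 2.0, 3.0, 4.0]
y_caps = [1.5, 2.0, 2.0, 4.5]
metrics = error_metrics(ys, y_caps)
print(metrics["MAE"], metrics["SSE"], metrics["MSE"])  # 0.5 1.5 0.375
```

Some regression outputs divide the SSE by the residual degrees of freedom rather than by the number of points, so a value labelled MSE elsewhere may differ slightly from this one.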
The calculated values of the above measures are:
The red line captures the trend of the data points, ascending diagonally from
the lower left to the upper right, indicating a positive linear relationship
between 'x' and 'y'. Most data points are close to the line, suggesting the
model fits the data reasonably well.
However, there are a few points further from the line that may represent outliers or
variation not captured by the model.
• Scatter Plot of e vs. x
[Figure: error e on the vertical axis against x on the horizontal axis]
• Histogram of e
[Figure: y plotted against x]
Our model also fits the test data well, which shows that it performs well on data it was not trained on.
The MSE and RMSE have risen slightly compared to the training dataset; this might
be due to the limited amount of training data.
• Scatter Plot of e vs. x
[Figure: error e on the vertical axis against x on the horizontal axis]
The errors are scattered randomly with no visible pattern against x, which suggests the linear model is appropriate.
The histogram of the errors resembles a normal distribution, which is another indication of good model
performance.
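The shape of the error histogram can also be checked numerically by counting how many errors fall into each fixed-width bin. The sketch below is one simple way to do this; the function name `histogram_counts`, the bin width of 0.5, and the sample errors are assumptions chosen for illustration.

```python
import math
from collections import Counter

def histogram_counts(errors, bin_width):
    """Count errors per bin; each bin covers [left, left + bin_width)."""
    index_counts = Counter(math.floor(err / bin_width) for err in errors)
    lowest, highest = min(index_counts), max(index_counts)
    # Include empty bins so gaps in the distribution stay visible.
    return [(i * bin_width, index_counts.get(i, 0)) for i in range(lowest, highest + 1)]

# Illustrative errors only (not the report's residuals).
errors = [-1.2, -0.4, -0.1, 0.0, 0.2, 0.3, 0.6, 1.4]
counts = histogram_counts(errors, bin_width=0.5)
for left, count in counts:
    print(f"{left:+.1f} {'#' * count}")
```

A roughly symmetric pile of counts centred near zero is what a normal-looking error histogram would produce.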
Analysis:
The model performs well, and the errors do not increase sharply on the test data.
This shows that our model predicts the dependent variable reliably.
1. Model Performance: The calculated coefficients 'a' and 'b' for our SLR model showed
a clear linear relationship between the independent variable 'x' and the
dependent variable 'y'. The positive slope indicated that as 'x' increases, 'y' also
increases.
2. Data Fit: The scatter plot with the superimposed line of predicted values (Y_cap)
closely followed the trend of the actual data points, suggesting that the SLR model has
a good fit.
3. Residual Analysis: The scatter plot of residuals 'e' did not exhibit any systematic
patterns, implying that the model's assumptions of linearity and homoscedasticity
(constant variance of errors) were largely met.
4. Error Metrics: The calculated error metrics such as MAE, SSE, MSE, and RMSE were
within acceptable ranges, indicating that the model's predictions were accurate and
reliable for the given data.
5. Predictive Accuracy: The comparison of train and test data error metrics suggested
that the model generalized well, maintaining its predictive accuracy on unseen data.
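The train-versus-test comparison in the points above can be sketched end to end: fit the line on the training rows only, then compute the same error metric on both sets. Everything in this example is hypothetical, including the synthetic data (a line with an alternating disturbance of 0.5), the every-fifth-row test split, and the helper names `fit_line` and `rmse`.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the line y = a*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    a = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    return a, y_mean - a * x_mean

def rmse(xs, ys, a, b):
    """Root mean squared error of the line y = a*x + b on the given points."""
    return math.sqrt(sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical data: y = 12x + 1 with an alternating +/-0.5 disturbance.
rows = [(i / 100, 12 * (i / 100) + 1 + (0.5 if i % 2 == 0 else -0.5))
        for i in range(100)]
test_rows = rows[4::5]                                      # every fifth row (20 rows)
train_rows = [r for i, r in enumerate(rows) if i % 5 != 4]  # remaining 80 rows

train_x, train_y = zip(*train_rows)
test_x, test_y = zip(*test_rows)

a, b = fit_line(train_x, train_y)  # coefficients come from the training rows only
train_rmse = rmse(train_x, train_y, a, b)
test_rmse = rmse(test_x, test_y, a, b)
print(f"a={a:.3f} b={b:.3f} train RMSE={train_rmse:.3f} test RMSE={test_rmse:.3f}")
```

A test RMSE close to the training RMSE is the pattern that suggests the model generalizes; a test value far above the training value would point to overfitting or to unrepresentative training data.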
The exercise helped us understand various concepts related to simple linear regression.