Polynomial Regression for Non-Linear Data
Objective: Use Polynomial Regression to fit a non-linear dataset.
Dataset: Create a synthetic dataset with a non-linear relationship (e.g., \(y = x^2 + 2x + 3\)).
Tasks:
1. Generate and explore the dataset.
2. Visualize the data to confirm its non-linear nature.
3. Implement Polynomial Regression (e.g., degree=2) to fit the data.
4. Compare its performance against a Simple Linear Regression model.
import numpy as np
import pandas as pd
# Set random seed for reproducibility
np.random.seed(42)
# Generate synthetic dataset
X = np.linspace(-10, 10, 100).reshape(-1, 1) # 100 evenly spaced values from -10 to 10, shaped as a column vector
y = X**2 + 2*X + 3 + np.random.normal(0, 5, X.shape) # quadratic signal y = x^2 + 2x + 3 plus Gaussian noise (sd = 5)
# Convert to DataFrame
df = pd.DataFrame({'X': X.flatten(), 'y': y.flatten()})
# Display first few rows
print(df.head())
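To round out the "explore" part of task 1, summary statistics give a quick sanity check on the value ranges and spread (df.describe() is standard pandas):
# Summary statistics: check the range, mean, and spread of X and y
print(df.describe())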
import matplotlib.pyplot as plt
# Scatter plot to visualize the dataset
plt.scatter(df['X'], df['y'], color='blue', label='Data Points')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Scatter Plot of Non-Linear Dataset')
plt.legend()
plt.show()
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
# Create a polynomial regression model (degree=2)
poly_degree = 2
poly_model = make_pipeline(PolynomialFeatures(degree=poly_degree), LinearRegression())
# Train the model
poly_model.fit(X, y)
# Predictions
y_pred_poly = poly_model.predict(X)
# Plot Polynomial Regression Fit
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, y_pred_poly, color='red', label=f'Polynomial Regression (degree={poly_degree})')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Polynomial Regression Fit')
plt.legend()
plt.show()
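It can also help to inspect what the pipeline actually learned. The sketch below assumes scikit-learn >= 1.0 (for get_feature_names_out); with make_pipeline, steps are keyed by their lowercased class names. Since the data were generated from y = x^2 + 2x + 3, the fitted coefficients should land close to those values (the bias column's weight is absorbed into the intercept):
# Inspect the expanded feature set and the fitted coefficients
poly_step = poly_model.named_steps['polynomialfeatures']
lin_step = poly_model.named_steps['linearregression']
print(poly_step.get_feature_names_out(['X'])) # ['1' 'X' 'X^2']
print(lin_step.intercept_, lin_step.coef_)    # roughly 3 and [0, 2, 1]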
# Train a simple linear regression model
linear_model = LinearRegression()
linear_model.fit(X, y)
# Predictions using Linear Regression
y_pred_linear = linear_model.predict(X)
# Compare Polynomial vs. Linear Regression
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, y_pred_linear, color='green', linestyle='dashed', label='Linear Regression')
plt.plot(X, y_pred_poly, color='red', label=f'Polynomial Regression (degree={poly_degree})')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Comparison: Polynomial vs. Linear Regression')
plt.legend()
plt.show()
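To put numbers on the visual comparison, a minimal sketch using sklearn.metrics (evaluated on the training data here, so exact values depend on the noise seed):
from sklearn.metrics import mean_squared_error, r2_score
# Quantify both fits; lower MSE and higher R^2 indicate a better fit
print(f"Linear     MSE: {mean_squared_error(y, y_pred_linear):.2f}  R^2: {r2_score(y, y_pred_linear):.3f}")
print(f"Polynomial MSE: {mean_squared_error(y, y_pred_poly):.2f}  R^2: {r2_score(y, y_pred_poly):.3f}")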
The comparison between Polynomial Regression (degree=2) and Simple Linear Regression
shows that Polynomial Regression fits this dataset far better, as the MSE and R^2 values above
confirm. Linear Regression (the green dashed line) assumes a straight-line relationship and cannot
capture the quadratic pattern, so it underfits: its errors are large and systematic. Polynomial
Regression (the red line) models the curvature directly, cutting the error substantially. Because
the degree-2 model matches the quadratic process that generated the data, it balances flexibility
and generalization, making it the better choice for this dataset.
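One caveat on the flexibility-versus-generalization point: a much higher degree can start chasing the noise rather than the signal. A quick sketch (degree values chosen arbitrarily; whether the highest degree visibly overfits depends on the noise level and sample size) contrasts held-out error across degrees, reusing the imports from above:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
for d in (1, 2, 10):
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(X_train, y_train)
    # Held-out error exposes underfitting at d=1; very high degrees risk fitting the noise
    print(f"degree={d:2d}  test MSE: {mean_squared_error(y_test, model.predict(X_test)):.2f}")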