
Simple Linear Regression With scikit-learn

There are five basic steps when you’re implementing linear regression:

1. Import the packages and classes you need.
2. Provide data to work with and eventually do appropriate transformations.
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is satisfactory.
5. Apply the model for predictions.

These steps are more or less general for most of the regression approaches and implementations.
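As a quick illustration, the five steps can be sketched end to end on a tiny synthetic dataset (the numbers below are made up for demonstration only):

```python
# Step 1: import packages and classes
import numpy as np
from sklearn.linear_model import LinearRegression

# Step 2: provide data (synthetic values, for illustration only)
x = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)  # regressors must be 2-D
y = np.array([5, 20, 14, 32, 22, 38])                 # response is 1-D

# Step 3: create a regression model and fit it
model = LinearRegression().fit(x, y)

# Step 4: check the results of model fitting
r_sq = model.score(x, y)  # coefficient of determination, R^2
print("R^2:", r_sq)

# Step 5: apply the model for predictions
y_pred = model.predict(x)
print("predictions:", y_pred)
```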

Problem Statement

A certain organization wants an early estimate of its employee churn-out rate. The HR department has collected data on each employee's salary hike and the churn-out rate for a financial year. The analytics team must perform a deep analysis, predict an estimate of employee churn, and present the statistics.

Approach: Build a simple linear regression model with 'Churn_out_rate' as the target variable. Apply the necessary transformations and record the RMSE and correlation coefficient values for each transformed model.

Step 1: Import packages and classes

The first step is to import the package numpy and the class LinearRegression from sklearn.linear_model:

import numpy as np
from sklearn.linear_model import LinearRegression
Now, you have all the functionalities you need to implement linear regression.

The fundamental data type of NumPy is the array type called numpy.ndarray. The rest of this article
uses the term array to refer to instances of the type numpy.ndarray.

The class sklearn.linear_model.LinearRegression will be used to perform linear and polynomial regression and make predictions accordingly.

Step 2: Provide data

The second step is defining the data to work with: the inputs (regressors, 𝑥) and the output (response, 𝑦).

The employee churn dataset is imported, and exploratory data analysis is performed on it.
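A minimal sketch of this step, assuming a hypothetical HR dataset with salary hike and churn-out rate columns (the file name and all values below are assumptions, not the actual data):

```python
import pandas as pd

# In practice the dataset would be loaded from a file, e.g.:
# emp_data = pd.read_csv("emp_data.csv")  # file name is an assumption
# Here a small hypothetical frame stands in so the sketch is self-contained.
emp_data = pd.DataFrame({
    "Salary_hike":    [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730],
    "Churn_out_rate": [92, 85, 80, 75, 72, 70, 68, 65],
})

# Basic exploratory data analysis
print(emp_data.describe())  # summary statistics per column
print(emp_data.corr())      # correlation matrix
```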

Step 3: Create a model and fit it

The next step is to create a linear regression model and fit it using the existing data.

Let’s create an instance of the class LinearRegression, which will represent the regression model:



model = LinearRegression()
This statement creates the variable model as an instance of LinearRegression. You can provide several optional parameters to LinearRegression.

statsmodels.formula.api is imported to build an ordinary least squares (OLS) model on the data:

model1 = smf.ols('churn ~ s_hike', data=emp_data).fit()

The regression line is plotted after obtaining the predicted values.

After plotting the scatter plot, the root mean squared error (RMSE) is calculated.
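A sketch of this step, assuming a hypothetical churn dataset with columns s_hike and churn (the values are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical churn data (values assumed for illustration)
emp_data = pd.DataFrame({
    "s_hike": [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730],
    "churn":  [92, 85, 80, 75, 72, 70, 68, 65],
})

# Fit the simple linear regression and obtain predicted values
model1 = smf.ols("churn ~ s_hike", data=emp_data).fit()
pred1 = model1.predict(emp_data)

# RMSE: square root of the mean squared residual
rmse1 = np.sqrt(np.mean((emp_data["churn"] - pred1) ** 2))
print("RMSE:", rmse1)
```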


To reduce the error and obtain a better-fitting line, transformations are performed on the data.

Log transformation

In log transformation, the transformation is applied to the x data:

# x = log(s_hike), y = churn

A scatter plot of the transformed data is plotted, and the correlation coefficient between the transformed input and the output is obtained.

model2 is built on the transformed data, the new regression line is plotted, and the new RMSE is calculated.
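A sketch of the log-transformation model, again on hypothetical churn data (the column names and values are assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical churn data (values assumed for illustration)
emp_data = pd.DataFrame({
    "s_hike": [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730],
    "churn":  [92, 85, 80, 75, 72, 70, 68, 65],
})

# Correlation between the transformed input and the output
corr = np.corrcoef(np.log(emp_data["s_hike"]), emp_data["churn"])[0, 1]
print("correlation:", corr)

# model2: x = log(s_hike), y = churn
model2 = smf.ols("churn ~ np.log(s_hike)", data=emp_data).fit()
pred2 = model2.predict(emp_data)

rmse2 = np.sqrt(np.mean((emp_data["churn"] - pred2) ** 2))
print("RMSE:", rmse2)
```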

Exponential transformation

In exponential transformation, the transformation is applied to the y data:

# x = s_hike, y = log(churn)

A scatter plot is plotted, and the correlation coefficient between the input and the transformed output is obtained.

model3 is built on the transformed data, the new regression line is plotted, and the new RMSE is calculated.
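A sketch of the exponential model on hypothetical churn data; note that the predictions must be back-transformed with np.exp before computing the RMSE on the original scale:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical churn data (values assumed for illustration)
emp_data = pd.DataFrame({
    "s_hike": [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730],
    "churn":  [92, 85, 80, 75, 72, 70, 68, 65],
})

# model3: x = s_hike, y = log(churn)
model3 = smf.ols("np.log(churn) ~ s_hike", data=emp_data).fit()
pred3_log = model3.predict(emp_data)  # predictions on the log scale
pred3 = np.exp(pred3_log)             # back-transform to the original scale

rmse3 = np.sqrt(np.mean((emp_data["churn"] - pred3) ** 2))
print("RMSE:", rmse3)
```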


Polynomial transformation

# x = s_hike, x^2 = s_hike * s_hike, y = log(churn)

PolynomialFeatures is imported from sklearn.preprocessing to build the polynomial regression.

The new regression line is plotted, and the RMSE is obtained from this model.
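A sketch of the polynomial model with PolynomialFeatures, again on hypothetical data; degree=2 adds the s_hike^2 column:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical churn data (values assumed for illustration)
emp_data = pd.DataFrame({
    "s_hike": [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730],
    "churn":  [92, 85, 80, 75, 72, 70, 68, 65],
})

X = emp_data[["s_hike"]].to_numpy()
y_log = np.log(emp_data["churn"].to_numpy())  # y = log(churn)

# degree=2 generates [s_hike, s_hike^2]; include_bias=False drops the constant column
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

model4 = LinearRegression().fit(X_poly, y_log)
pred4 = np.exp(model4.predict(X_poly))  # back-transform to the original scale

rmse4 = np.sqrt(np.mean((emp_data["churn"].to_numpy() - pred4) ** 2))
print("RMSE:", rmse4)
```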

The best model is chosen by comparing the RMSE values of all the transformations above.

The models and their respective RMSE values are tabulated.

From these observations, the exponential model is selected as the best.
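The comparison itself can be sketched like this; the RMSE numbers below are placeholders, to be replaced with the values actually computed for each model:

```python
import pandas as pd

# Placeholder RMSE values -- substitute the RMSEs computed for each model
results = pd.DataFrame({
    "model": ["simple linear", "log", "exponential", "polynomial"],
    "RMSE":  [3.99, 3.79, 3.54, 3.60],  # illustrative numbers only
})
print(results)

# The model with the smallest RMSE is preferred
best = results.loc[results["RMSE"].idxmin()]
print("best model:", best["model"])
```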

Step 4: Get results

Once you have your model fitted, you can get the results to check whether the model works
satisfactorily and interpret it.

The summary of the final model is examined.

The final model is fitted on train/test split data, and its predictions are evaluated.

The final RMSE value is recorded.
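A sketch of this final step on hypothetical data: fit the exponential model on a training split, inspect the summary, and compute the RMSE on the test split (the split size and random_state are arbitrary choices):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.model_selection import train_test_split

# Hypothetical churn data (values assumed for illustration)
emp_data = pd.DataFrame({
    "s_hike": [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730],
    "churn":  [92, 85, 80, 75, 72, 70, 68, 65],
})

train, test = train_test_split(emp_data, test_size=0.3, random_state=0)

# Final (exponential) model fitted on the training split
final_model = smf.ols("np.log(churn) ~ s_hike", data=train).fit()
print(final_model.summary())  # coefficients, R^2, p-values

# Predict on the test split and back-transform before computing RMSE
test_pred = np.exp(final_model.predict(test))
test_rmse = np.sqrt(np.mean((test["churn"] - test_pred) ** 2))
print("test RMSE:", test_rmse)
```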
