
Simple Linear Regression
• Step 1 : Open Google Colab -> https://colab.research.google.com/

To learn more about Colab, please watch: https://www.youtube.com/watch?v=inN8seMm7UI
• Step 2 : Import the libraries
NumPy -> scientific computing with multi-dimensional arrays
Matplotlib -> plotting library
pandas -> data analysis on tabular data (DataFrames)

# Importing the libraries


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
• Step 3 : Import the dataset
Our dataset is in CSV (comma-separated values) format.
Run the following code to upload the dataset file from your computer to Colab.
# Uploading the dataset (opens a file picker in Colab)
from google.colab import files
uploaded = files.upload()
• Step 3 (continued) : Load the uploaded file
Read the uploaded CSV file into a pandas DataFrame and display the first 10 rows.
# loading the uploaded dataset into a DataFrame
import pandas as pd
dataset = pd.read_csv('Salary_Data.csv')
dataset.head(10)
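
As a minimal sketch of how the uploaded file reaches pandas: in Colab, files.upload() returns a dictionary that maps each uploaded file name to its contents as bytes. The example below simulates that dictionary by hand so it can run outside Colab. The column names and values are made up for illustration and are not the real contents of Salary_Data.csv.

```python
import io
import pandas as pd

# Stand-in for the dictionary returned by files.upload() in Colab:
# {file name: file contents as bytes}. The rows below are invented.
uploaded = {
    "Salary_Data.csv": b"YearsExperience,Salary\n1.1,39000\n2.0,43500\n3.2,60000\n"
}

# Read the CSV directly from the uploaded bytes
dataset = pd.read_csv(io.BytesIO(uploaded["Salary_Data.csv"]))

print(dataset.shape)   # (number of rows, number of columns)
print(dataset.head())
```

Reading from the dictionary is only an alternative; pd.read_csv('Salary_Data.csv') also works in Colab because the upload saves the file to the working directory.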
• Step 4 : Split the dataset into X and y
• y – dependent variable (annual salary)
• X – independent variable (years of experience)
# split the independent and dependent variables from the DataFrame
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

Table <- [years experience, annual salary]
dataset.iloc[ ] selects rows and columns of a table by integer position.
[:, :-1] -> all the rows in the table and all the columns except the last one (years experience)
[:, 1] -> all the rows in the table and the column at index 1, which is the second column because counting starts from zero (annual salary)
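
The two selections can be tried on a tiny table. This is a sketch with a hypothetical three-row DataFrame (the values are invented) that has the same two-column layout as the slides.

```python
import pandas as pd

# Hypothetical three-row table with the same layout as the slides:
# column 0 = years of experience, column 1 = annual salary
dataset = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0],
    "Salary": [40000, 45000, 50000],
})

X = dataset.iloc[:, :-1].values  # every column except the last -> 2-D array
y = dataset.iloc[:, 1].values    # the column at index 1 -> 1-D array

print(X.shape)  # (3, 1)
print(y.shape)  # (3,)
```

X stays two-dimensional (one row per sample, one column per feature), which is the shape scikit-learn expects for the inputs of fit and predict.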
• Step 5 : Split into Training set and Test set
• Training set -> most of the data, used to fit (train) the model
• Test set -> the remaining data (here one third, test_size = 1/3), held out to validate the model
# split into training and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
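
To see what the split produces, here is a small sketch on synthetic data. The 30-row array is an assumption made for illustration; the slides do not state the size of the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 30 samples, one feature
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)
print(len(X_train), len(X_test))  # 20 10

# Using the same random_state again reproduces exactly the same split
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=1/3, random_state=0
)
print(np.array_equal(X_test, X_test2))  # True
```

Fixing random_state makes the shuffle repeatable, so everyone running the notebook gets the same training and test rows.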

• Step 6: Fitting simple Linear Regression to Training set
#fitting simple linear regression to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
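
Fitting finds the straight line y = intercept + slope * x that best matches the training points in the least-squares sense. The sketch below uses made-up points that lie exactly on the line y = 3x + 5, so the fitted values can be checked by eye. The variable names slope and intercept are introduced here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up points lying exactly on y = 3x + 5
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 3.0 * X.ravel() + 5.0

regressor = LinearRegression()
regressor.fit(X, y)

slope = regressor.coef_[0]        # one coefficient per feature
intercept = regressor.intercept_  # value of y when x = 0

print(slope, intercept)  # approximately 3.0 and 5.0
```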
• Step 7: Visualizing the Training set results
#visualizing the training set results
import matplotlib.pyplot as plt
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
• Step 8: Prediction for a new value of X
# prediction for a new value of X (years of experience)
# predict expects a 2-D input: one row per sample, one column per feature
xnew = [[2.5]]
y_pred = regressor.predict(xnew)
print(y_pred)
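
For a single feature, the prediction is simply intercept + slope * x. The following sketch checks this by hand; the training values are invented for illustration and are not taken from Salary_Data.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented training data: years of experience -> salary
X_train = np.array([[1.0], [2.0], [3.0], [5.0]])
y_train = np.array([30000.0, 35000.0, 41000.0, 52000.0])

regressor = LinearRegression().fit(X_train, y_train)

xnew = [[2.5]]
y_pred = regressor.predict(xnew)  # array with one predicted salary

# The same prediction computed by hand from the fitted line
manual = regressor.intercept_ + regressor.coef_[0] * xnew[0][0]

print(y_pred[0], manual)  # the two numbers agree
```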

• Exercise 2: Plot the results for Test set
#Visualizing the results for Test set
• Exercise 3: Use colab to solve this problem
#To be submitted
