
Lab 1: Implementation of Linear Regression and Logistic Regression Models


Linear Regression
Aim:
To implement linear regression on the diabetes dataset.
Algorithm:
1. Import the diabetes dataset and the linear regression estimator from the sklearn library.
2. Load the diabetes dataset into your notebook.
3. Create x_train, y_train, x_test, and y_test: take the x arrays from the dataset's data and the y arrays from its target, holding out the last 10 samples for testing.
4. Fit the regression model on the training set, then inspect the coefficient values and the array of predictions for x_test.
5. Compute the model's score (R2) on the test set.
6. Plot the test data together with the fitted regression line.
7. Create one subplot per attribute and check which attribute the model fits best.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn import datasets

# Load the data and create the estimator
diabetes = datasets.load_diabetes()
linreg = linear_model.LinearRegression()

# Hold out the last 10 samples as the test set
x_train = diabetes.data[:-10]
y_train = diabetes.target[:-10]
x_test = diabetes.data[-10:]
y_test = diabetes.target[-10:]

# Fit on all 10 features, then inspect coefficients, predictions and score
linreg.fit(x_train, y_train)
linreg.coef_
ypred = linreg.predict(x_test)
ypred
y_test
linreg.score(x_test, y_test)

# Fit one single-feature model per attribute and plot each in its own subplot
plt.figure(figsize=(8, 12))
for f in range(0, 10):
    xi_test = x_test[:, f]
    xi_train = x_train[:, f]
    xi_test = xi_test[:, np.newaxis]    # reshape to (n_samples, 1)
    xi_train = xi_train[:, np.newaxis]
    linreg.fit(xi_train, y_train)
    y = linreg.predict(xi_test)
    plt.subplot(5, 2, f + 1)
    plt.scatter(xi_test, y_test, color='k')
    plt.plot(xi_test, y, color='b', linewidth=3)
plt.show()

Figure 1: A linear regression represents a linear correlation between a feature and the targets

Experimental Results:
Data Description:

The Diabetes dataset considered in this experiment contains physiological data collected on 442
patients and, as the corresponding target, an indicator of disease progression after one year. The
physiological data occupy the first 10 columns, which hold respectively:
• Age
• Sex
• Body mass index
• Blood pressure
• S1, S2, S3, S4, S5, and S6 (six blood serum measurements)

Executing the line below returns the linear regression score (R2). R2, the coefficient of
determination, is the fraction of the variance in the target that the model's predictions explain,
so it measures how well the linear relationship between the features and the target fits the data.
For a useful model R2 lies between 0 and 1 (on held-out data it can even be negative when the
model does worse than always predicting the mean). An R2 close to zero indicates a model with
very little explanatory power, and an R2 close to one indicates a model with high explanatory
power. In our experiment the score is 0.58, which shows that the model provides moderate
explanatory power.

linreg.score(x_test,y_test)
Out[ ]: 0.58507530226905713
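To make the definition of the score concrete, the following minimal sketch recomputes R2 by hand as 1 - SS_res / SS_tot and compares it with linreg.score. The names ss_res, ss_tot, and r2_manual are illustrative and not part of the original program; the split repeats the one used in the program.

```python
import numpy as np
from sklearn import datasets, linear_model

# Same split as the lab program: the last 10 samples are the test set.
diabetes = datasets.load_diabetes()
x_train, y_train = diabetes.data[:-10], diabetes.target[:-10]
x_test, y_test = diabetes.data[-10:], diabetes.target[-10:]

linreg = linear_model.LinearRegression().fit(x_train, y_train)
y_pred = linreg.predict(x_test)

# Residual sum of squares: error left over after the model's predictions.
ss_res = np.sum((y_test - y_pred) ** 2)
# Total sum of squares: error of always predicting the test-set mean.
ss_tot = np.sum((y_test - y_test.mean()) ** 2)

r2_manual = 1.0 - ss_res / ss_tot

# Both values should agree up to floating-point rounding.
print(r2_manual, linreg.score(x_test, y_test))
```

Because ss_res can exceed ss_tot on unseen data, this formula also shows why a test-set R2 is not strictly bounded below by zero.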
Next, we drew a regression line for each of the 10 features, fitting 10 single-feature models and
viewing the result of each in a line chart.

Figure 2. Ten Linear charts showing the correlations between physiological factors and
the progression of diabetes
From the 10 charts above, we observed that the features 'age', 'bmi', 'blood pressure', 's1', 's2',
's3', 's4', 's5', and 's6' show a regression line with a linear relationship to the target, whereas the
feature 'sex' does not show a linear relationship with the target variable (disease progression).
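One way to back up the visual comparison with numbers is to score each single-feature model separately. The sketch below is an illustration that uses the same split as the program and is not part of the original lab code; the dictionary scores is a hypothetical helper. With only 10 test samples, the per-feature scores are noisy and are best read as a rough ranking.

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
x_train, y_train = diabetes.data[:-10], diabetes.target[:-10]
x_test, y_test = diabetes.data[-10:], diabetes.target[-10:]

# Fit one single-feature model per attribute and record its test-set R2.
scores = {}
for idx, name in enumerate(diabetes.feature_names):
    model = linear_model.LinearRegression()
    # Indexing with [idx] keeps the 2-D shape (n_samples, 1) that fit expects.
    model.fit(x_train[:, [idx]], y_train)
    scores[name] = model.score(x_test[:, [idx]], y_test)

# Print the attributes from best to worst score.
for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>4s}  R2 = {r2: .3f}")
```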

Result:
Hence, we implemented a Linear Regression model on the diabetes dataset.
Logistic Regression

Aim: To implement logistic regression on the Breast Cancer dataset.


Algorithm:
Step 1: Start the program
Step 2: Import the required libraries
Step 3: Load the breast cancer dataset
Step 4: Identify the dependent variable and independent variables and determine if the problem
is a binary classification problem.
Step 5: Clean and pre-process the data, and make sure the data is suitable for logistic regression
modeling.
Step 6: Train the logistic regression model on the selected independent variables and estimate
the coefficients of the model.
Step 7: Evaluate the performance of the logistic regression model using appropriate metrics
such as accuracy.
Step 8: End the program
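Steps 4 and 5 are described only in words, so here is a minimal sketch of what they could look like. Standardizing the features with StandardScaler is an assumption of this sketch (the lab program trains on the raw features), and the names classes, counts, X_train_std, and X_test_std are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target  # independent variables, dependent variable

# Step 4: confirm the target has exactly two classes (binary classification).
classes, counts = np.unique(y, return_counts=True)
print(dict(zip(bc.target_names[classes], counts)))

# Step 5: check for missing values before modeling.
print("missing values:", int(np.isnan(X).sum()))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)

# Step 5 (assumed): scale features to zero mean and unit variance.
# The scaler is fitted on the training set only, to avoid leaking test data.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)
```

Scaling matters for gradient descent because the raw features span very different ranges, which can make a single learning rate too large for some weights and too small for others.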
Code:
# LogisticRegression.py

import numpy as np

def sigmoid(x):
    # Logistic function: maps any real value into the interval (0, 1)
    return 1 / (1 + np.exp(-x))

class LogisticRegression():

    def __init__(self, lr=0.001, n_iters=1000):
        self.lr = lr              # learning rate
        self.n_iters = n_iters    # number of gradient descent iterations
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        # Batch gradient descent on the full training set
        for _ in range(self.n_iters):
            linear_pred = np.dot(X, self.weights) + self.bias
            predictions = sigmoid(linear_pred)

            # Gradients with respect to the weights and the bias
            dw = (1/n_samples) * np.dot(X.T, (predictions - y))
            db = (1/n_samples) * np.sum(predictions - y)

            self.weights = self.weights - self.lr*dw
            self.bias = self.bias - self.lr*db

    def predict(self, X):
        linear_pred = np.dot(X, self.weights) + self.bias
        y_pred = sigmoid(linear_pred)
        # Probability above 0.5 -> class 1, otherwise class 0
        class_pred = [0 if y <= 0.5 else 1 for y in y_pred]
        return class_pred

# Train.py

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt
from LogisticRegression import LogisticRegression

bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

clf = LogisticRegression(lr=0.01)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

def accuracy(y_pred, y_test):
    # Fraction of test samples whose predicted class matches the true class
    return np.sum(y_pred == y_test) / len(y_test)

acc = accuracy(y_pred, y_test)
print(acc)
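The updates in fit can be read as gradient descent on the mean binary cross-entropy loss. The lab text does not name the loss, so treat the derivation below as an interpretation that is consistent with the dw and db expressions in the code; the symbols \eta (the learning rate lr) and n (n_samples) are notation introduced here.

```latex
\begin{aligned}
\hat{y}_i &= \sigma(w^\top x_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
L(w, b) &= -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \Big] \\
\frac{\partial L}{\partial w} &= \frac{1}{n} X^\top (\hat{y} - y), \qquad
\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \\
w &\leftarrow w - \eta \, \frac{\partial L}{\partial w}, \qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
\end{aligned}
```

The two gradient expressions correspond term by term to dw and db in the fit method, and the last line corresponds to the weight and bias updates.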
Experimental Results:
Data Description:
The Breast Cancer dataset considered in this experiment contains two classes, i.e., Benign and
Malignant. It has 569 samples and 30 features. If the probability predicted by the classifier is
above 0.5, the sample is assigned to the benign class; otherwise it is assigned to the malignant
class. Logistic Regression achieved an accuracy of 93% on the breast cancer data.
Output:
Accuracy: 93.056
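The thresholding rule from the data description can be sketched with scikit-learn's own LogisticRegression, swapped in here for the hand-written class so that class probabilities are available through predict_proba. The pipeline with StandardScaler and the names p_benign and labels are assumptions of this sketch, and its accuracy will not necessarily match the 93% reported for the custom implementation.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

bc = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    bc.data, bc.target, test_size=0.2, random_state=1234
)

# Assumed setup: scale the features, then fit scikit-learn's logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Column 1 of predict_proba is P(class 1); class 1 is "benign" in this dataset.
p_benign = model.predict_proba(X_test)[:, 1]

# Apply the 0.5 threshold: above 0.5 -> benign (1), otherwise malignant (0).
labels = np.where(p_benign > 0.5, 1, 0)

# Show the first few decisions with their class names and probabilities.
print(bc.target_names[labels[:5]], p_benign[:5].round(3))
print("accuracy:", np.mean(labels == y_test))
```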

Result:
From the output, we found that Logistic Regression achieved about 93% accuracy on the Breast
Cancer dataset.
