
Experiment-5

AIM:

(a) Execute Logistic Regression on the diabetes data set. Analyse the
result and identify how well the model performed on the test set. Briefly describe the steps
you followed to analyse the data set.

(b) Implement Logistic Regression using Python.

THEORY:
Logistic regression is a fundamental and widely used supervised machine learning method for
binary classification tasks.

It models the probability that an instance belongs to a particular class based on
one or more predictor variables. It is particularly useful when the dependent variable (target) is
categorical with two levels, which is known as the binary classification problem.

For example, given two classes, Class 0 and Class 1, if the value of the logistic function for an input is
greater than 0.5 (the threshold value), the input is assigned to Class 1; otherwise it is assigned to Class 0. It is referred to as
regression because it extends linear regression by passing a linear combination of the predictors through the
logistic (sigmoid) function, but it is mainly used for classification problems.
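
The thresholding rule described above can be illustrated with a minimal Python sketch. The weights, bias, and input values below are hypothetical and chosen only for illustration; they are not fitted to any data set.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias for a two-feature model (illustration only).
weights = np.array([0.8, -0.5])
bias = 0.1

# Three hypothetical inputs, one per row.
X = np.array([[2.0, 1.0],
              [0.0, 0.0],
              [-1.0, 2.0]])

# Linear score, then squash it to a probability of Class 1.
probabilities = sigmoid(X @ weights + bias)

# Apply the 0.5 threshold: Class 1 if the probability exceeds 0.5, else Class 0.
predicted_classes = (probabilities > 0.5).astype(int)

print(probabilities)
print(predicted_classes)
```

An input whose linear score is exactly 0 gets a probability of exactly 0.5, so with the strict "greater than" rule it falls into Class 0.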

Types of Logistic Regression

Based on the categories of the dependent variable, Logistic Regression can be classified into three types:

1. Binomial: In binomial Logistic regression, the dependent variable has only two possible
values, such as 0 or 1, Pass or Fail, etc.

2. Multinomial: In multinomial Logistic regression, the dependent variable has 3 or more possible unordered
values, such as “cat”, “dog”, or “sheep”.

3. Ordinal: In ordinal Logistic regression, the dependent variable has 3 or more possible ordered
values, such as “Low”, “Medium”, or “High”.
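
As a rough illustration of the multinomial case, the sketch below fits scikit-learn's LogisticRegression to a three-class problem. The iris data set is an assumption made only for this example (it is not part of this experiment); it is used because it ships with scikit-learn and has three unordered classes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: iris stands in for any target with three unordered classes.
X, y = load_iris(return_X_y=True)

# max_iter is raised so the default solver converges on unscaled features.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# One probability per class for each sample; each row sums to 1.
probs = model.predict_proba(X[:3])
print(model.classes_)
print(probs.round(3))
```

The predicted class for a sample is the one with the highest probability in its row, which generalises the 0.5 threshold used in the binomial case.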

Gungun Sahu EN21CS301277


Procedure:

a) Logistic Regression on Weka:

1. Load Data: Open Weka Explorer and load your dataset. Weka supports various file formats, such as
ARFF and CSV.

2. Preprocess Data (if needed): If your dataset requires preprocessing, you can perform tasks like
handling missing values, normalization, or feature selection using Weka's preprocessing tools.

3. Choose Logistic Regression Algorithm: Navigate to the "Classify" tab in Weka Explorer. Click on
the "Choose" button to select the logistic regression algorithm. In Weka, logistic regression is
implemented as the "Logistic" classifier under the "functions" category.

4. Set Options: Configure any specific options for the Logistic classifier, such
as the ridge (regularization) parameter, the maximum number of iterations, or other settings.

5. Split Data (optional): Under "Test options", choose "Percentage split" to divide your dataset into
training and testing sets and evaluate the performance of the logistic regression model. First use 80% for
training and 20% for testing. The second time, use 60% for training and 40% for testing.

6. Run Logistic Regression: Once you have selected the logistic regression algorithm and
set the options, click on the "Start" button to run the logistic regression model on your
dataset.

7. Evaluate Results: After running the logistic regression model, evaluate its performance
using appropriate evaluation metrics. Weka reports various
performance metrics such as accuracy, precision, recall, F1-score (F-Measure), and the area under the
ROC curve (ROC Area).

8. Interpret Results: Interpret the results of the logistic regression model, including the
coefficients of the predictor variables, odds ratios, and any other relevant statistics.
Weka provides visualization tools and summary statistics to help interpret the results.
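
The two percentage splits from the Split Data step can also be compared outside Weka. The following is a minimal Python sketch, not the Weka procedure itself. It assumes scikit-learn's built-in diabetes data, which may differ from the file loaded in Weka, and it assumes the continuous target is turned into two classes at its median.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
# Assumption: binarize the continuous target at its median.
y_binary = (y > np.median(y)).astype(int)

results = {}
for test_size in (0.2, 0.4):  # 80/20 split, then 60/40 split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y_binary, test_size=test_size, random_state=42, stratify=y_binary)
    # The pipeline fits the scaler on the training portion only.
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)
    results[test_size] = accuracy_score(y_test, model.predict(X_test))

for test_size, acc in results.items():
    print(f"test_size={test_size:.0%}: accuracy={acc:.2%}")
```

A larger test portion gives a more stable accuracy estimate but leaves less data for training, so the two numbers are not expected to match exactly.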

b) Logistic Regression using Python:

Steps to perform logistic regression in Python:
1. Import Libraries: Import the necessary libraries for data manipulation, visualization, and
modeling.
2. Preprocess Data (if needed): Handle missing values, encode categorical variables, and
perform feature scaling if necessary.
3. Split Data: Split your dataset into training and testing sets.
4. Instantiate Model: Create an instance of the logistic regression model.
5. Train Model: Fit the model on the training data.
6. Make Predictions: Use the trained model to make predictions on the test data.
7. Evaluate Model: Evaluate the performance of the model using appropriate metrics.
8. Interpret Results: Interpret the results of the logistic regression model, including
coefficients and odds ratios if needed.
9. Iterate and Refine (if needed): Depending on the performance of the model, you may need
to iterate and refine your approach by experimenting with different preprocessing
techniques, feature selection methods, or hyperparameter tuning.
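
The Interpret Results step can be sketched as follows. This is a minimal illustration, assuming scikit-learn's built-in diabetes data with the target binarized at its median; for simplicity it fits on the full data set rather than on a training split. The odds ratio of a feature is the exponential of its coefficient.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_diabetes()
# Standardize so each coefficient refers to a one-standard-deviation change.
X = StandardScaler().fit_transform(data.data)
# Assumption: binarize the continuous target at its median.
y_binary = (data.target > np.median(data.target)).astype(int)

model = LogisticRegression().fit(X, y_binary)

coefficients = model.coef_[0]        # one coefficient per feature
odds_ratios = np.exp(coefficients)   # odds ratio = exp(coefficient)

for name, coef, ratio in zip(data.feature_names, coefficients, odds_ratios):
    print(f"{name:>4}: coef={coef:+.3f}  odds ratio={ratio:.3f}")
```

An odds ratio above 1 means that increasing the feature raises the odds of Class 1, and a value below 1 means it lowers them, holding the other features fixed.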
Program for logistic regression:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_curve, auc

# Load the diabetes dataset (the target is a continuous disease-progression score)
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Convert the target variable to binary (1 = progression above the median, 0 = otherwise)
y_binary = (y > np.median(y)).astype(int)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)

# Standardize features (fit the scaler on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

# Visualize the test points by class with accuracy information
# (column 2 is BMI and column 0 is age; no boundary line is drawn)
plt.figure(figsize=(8, 6))
sns.scatterplot(x=X_test[:, 2], y=X_test[:, 0], hue=y_test,
                palette={0: 'blue', 1: 'red'}, marker='o')
plt.xlabel("BMI")
plt.ylabel("Age")
plt.title("Logistic Regression Test Data\nAccuracy: {:.2f}%".format(accuracy * 100))
plt.legend(title="Class", loc="upper right")
plt.show()

# Print the confusion matrix and classification report
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Plot ROC Curve
y_prob = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2,
         label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve\nAccuracy: {:.2f}%'.format(accuracy * 100))
plt.legend(loc="lower right")
plt.show()
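
To show how the numbers in the classification report relate to the confusion matrix, the sketch below recomputes accuracy, precision, recall, and F1-score by hand. The label arrays are hypothetical and serve only as an illustration; they are not output of the program above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical true labels and predictions (illustration only).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

# For binary labels {0, 1}, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of correct predictions
precision = tp / (tp + fp)                   # correct among predicted positives
recall = tp / (tp + fn)                      # found among actual positives
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

The hand-computed values should agree with scikit-learn's precision_score, recall_score, and f1_score for the positive class.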

OUTPUT:
