
AIML Practical No 5

Name: Ishan Ahuja


Year: TE A MECHANICAL
Roll no: 02
Seat No: T190600804

Aim: To implement logistic regression in Python using scikit-learn.


Theory: Logistic regression (also known as the logit model) is a
statistical method often used for predictive analytics and modelling,
and it extends to applications in machine learning. In this approach,
the dependent variable is categorical: either A or B (binary logistic
regression) or one of a finite set of options such as A, B, C or D
(multinomial logistic regression). The model describes the
relationship between the dependent variable and one or more
independent variables by estimating probabilities using a logistic
regression equation.
This type of analysis can help you predict the likelihood of an event
happening or a choice being made. For example, you may want to know
how likely a visitor is to accept an offer made on your website (the
dependent variable). Your analysis can look at known characteristics
of visitors, such as the sites they came from, repeat visits to your
site, and their behaviour on your site (the independent variables).
Logistic regression models help you estimate the probability that a
given type of visitor will accept the offer. As a result, you can make
better decisions about promoting your offer, or about the offer
itself.
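
As a rough illustration of how a logistic regression equation turns predictor values into a probability, the sketch below applies the logistic (sigmoid) function to a linear combination of inputs. The coefficients, the intercept, and the visitor features are hypothetical values chosen only for illustration; they do not come from any fitted model.

```python
import math

def logistic(z):
    """Map a real number z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, coefficients, intercept):
    """Probability of the positive class for one observation."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return logistic(z)

# Hypothetical values, not estimated from any data
coefficients = [0.8, 1.5]   # weights for: repeat visits, arrived from partner site
intercept = -2.0
visitor = [2, 1]            # 2 repeat visits, arrived from a partner site

p = predict_probability(visitor, coefficients, intercept)
print(round(p, 3))
```

A linear score of 0 maps to a probability of exactly 0.5; larger scores move the probability towards 1 and smaller scores towards 0.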

Dataset: default.csv from the Statology Python-Guides repository on GitHub.

Algorithm:

Step 1: Import the Required Packages

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import matplotlib.pyplot as plt

Step 2: Load the Data

#import dataset from CSV file on GitHub
url = "https://raw.githubusercontent.com/Statology/Python-Guides/main/default.csv"
data = pd.read_csv(url)

#view first six rows of dataset


data[0:6]

   default  student      balance        income
0        0        0   729.526495  44361.625074
1        0        1   817.180407  12106.134700
2        0        0  1073.549164  31767.138947
3        0        0   529.250605  35704.493935
4        0        0   785.655883  38463.495879
5        0        1   919.588530   7491.558572

#find total observations in dataset


len(data.index)

10000

Step 3: Create Training and Test Samples

#define the predictor variables and the response variable


X = data[['student', 'balance', 'income']]
y = data['default']

#split the dataset into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
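
When the positive class is rare, a plain random split can leave the test set with a slightly different share of positive cases than the full dataset. One optional variation, which is not used in the steps of this practical, is to pass `stratify=y` to `train_test_split`. The sketch below uses a small synthetic label vector (an assumption made for illustration, not the default.csv data).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels for illustration: 30 positives out of 1000
y_demo = np.array([1] * 30 + [0] * 970)
X_demo = np.arange(1000).reshape(-1, 1)

# stratify=y_demo keeps the share of positives the same in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

print(len(y_tr), len(y_te))        # sizes of the training and test sets
print(y_tr.mean(), y_te.mean())    # share of positives in each subset
```

With stratification, the two printed shares should match the 3% positive rate of the synthetic labels.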

Step 4: Fit the Logistic Regression Model


#instantiate the model
log_regression = LogisticRegression()

#fit the model using the training data


log_regression.fit(X_train,y_train)

#use model to make predictions on test data


y_pred = log_regression.predict(X_test)
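
For a binary model, `predict` returns the class with the higher predicted probability, which amounts to a 0.5 cut-off on the probability of class 1. The sketch below checks this on synthetic data from `make_classification` (an assumption, not the default.csv data) and shows how a lower cut-off flags more cases. The 0.3 threshold is arbitrary and only illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration only
X_demo, y_demo = make_classification(n_samples=500, n_features=4, random_state=0)

model = LogisticRegression().fit(X_demo, y_demo)

# Probability of class 1 for every row
proba = model.predict_proba(X_demo)[:, 1]

labels_default = model.predict(X_demo)       # built-in decision rule
labels_at_half = (proba > 0.5).astype(int)   # explicit 0.5 cut-off
labels_at_03 = (proba > 0.3).astype(int)     # arbitrary lower cut-off

print(np.array_equal(labels_default, labels_at_half))
print(labels_default.sum(), labels_at_03.sum())
```

Lowering the cut-off can only add positive predictions, which trades more false alarms for fewer missed cases.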

Step 5: Model Diagnostics

#create confusion matrix (rows: actual class, columns: predicted class)
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix

array([[2870,   17],
       [  93,   20]])

print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.9633333333333334
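
Accuracy alone can look flattering when one class dominates. As a sketch, the counts from the confusion matrix reported above can be unpacked by hand, assuming scikit-learn's layout (rows are actual classes, columns are predicted classes, class 0 first).

```python
# Counts copied from the confusion matrix reported above
tn, fp = 2870, 17   # actual non-defaults: predicted 0, predicted 1
fn, tp = 93, 20     # actual defaults:     predicted 0, predicted 1

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)   # of predicted defaults, share that really defaulted
recall = tp / (tp + fn)      # of real defaults, share the model caught

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
# accuracy=0.9633 precision=0.5405 recall=0.1770
```

So although about 96% of test predictions are correct, the model catches only 20 of the 113 actual defaults in the test set.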

#define metrics: predicted probability of class 1, ROC points, and AUC
y_pred_proba = log_regression.predict_proba(X_test)[:, 1]
fpr, tpr, _ = metrics.roc_curve(y_test, y_pred_proba)
auc = metrics.roc_auc_score(y_test, y_pred_proba)

#create ROC curve
plt.plot(fpr, tpr, label="AUC=" + str(auc))
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend(loc=4)
plt.show()
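
The AUC value shown in the plot legend can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way in plain Python on four toy scores (hypothetical values, not model output); `pairwise_auc` is an illustrative helper, not a library function.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy labels and scores, chosen only for illustration
y_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]

print(pairwise_auc(y_toy, scores_toy))  # 0.75
```

Three of the four positive-negative pairs are ordered correctly here, which gives 0.75. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.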
