Department of Computer Engineering
Experiment No: 03
Date:
Roll No:
Aim: To Implement Logistic Regression
Theory:
Logistic regression is a popular statistical method used for binary classification tasks, where the
goal is to predict a categorical outcome with two possible classes (usually denoted as 0 and 1).
Despite its name, logistic regression is actually a classification algorithm, not a regression
algorithm like linear regression. It estimates the probability that an instance belongs to a particular
class based on its input features.
The basic idea behind logistic regression is to transform the output of a linear equation into a range
between 0 and 1 using the logistic function (also known as the sigmoid function). The equation for
logistic regression can be expressed as follows:
p(y=1 | X) = 1 / (1 + e^(-Z))
● p(y=1 | X) is the probability that the output y is equal to 1 given the input features X.
● Z is the linear combination of the input features and their corresponding coefficients. It is
calculated as Z = b0 + b1*x1 + b2*x2 + ... + bn*xn, where b0, b1, b2, ..., bn are the
coefficients and x1, x2, ..., xn are the input features (a small numeric sketch of this computation is shown below).
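A minimal numeric sketch of the sigmoid mapping; the coefficient and feature values used here are illustrative only and are not taken from the experiment's dataset:
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: squashes any real z into the range (0, 1)
    return 1 / (1 + np.exp(-z))

# Illustrative values: b0 = -0.5, b1 = 1.5, b2 = 0.8, features x1 = 1.0, x2 = 0.5
b = np.array([-0.5, 1.5, 0.8])
x = np.array([1.0, 1.0, 0.5])   # leading 1.0 is the intercept term
Z = b @ x                       # Z = b0 + b1*x1 + b2*x2 = 1.4
p = sigmoid(Z)                  # ~0.80, i.e. p(y = 1 | X) is about 0.80
print(Z, p)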
During the training phase, logistic regression attempts to find the optimal values for the
coefficients b0, b1, b2, ..., bn by minimizing a cost function. The most commonly used cost
function for logistic regression is the log loss (also known as the cross-entropy loss). The training
process typically involves an optimization algorithm (e.g., gradient descent) that iteratively adjusts
the coefficients to minimize the cost function.
Once the model is trained, it can be used to predict the probability of an input belonging to class 1,
and a decision threshold (usually 0.5) is applied to convert these probabilities into class labels (0 or 1).
Logistic regression has several advantages, such as simplicity, interpretability, and efficiency in
training and prediction. However, in its basic form it is limited to binary classification problems. For
multiclass classification tasks, variations such as multinomial logistic regression (softmax regression)
or one-vs-rest logistic regression can be employed.
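As a brief illustration of the log-loss cost and the 0.5 decision threshold, the following sketch uses made-up predicted probabilities and labels (not data from this experiment):
import numpy as np

y_true = np.array([1, 0, 1, 0])          # illustrative true labels
p_pred = np.array([0.9, 0.2, 0.6, 0.4])  # illustrative predicted probabilities

# Log loss (binary cross-entropy) averaged over the examples (~0.34 here)
log_loss = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

# Applying the 0.5 threshold converts probabilities into class labels -> [1, 0, 1, 0]
labels = (p_pred >= 0.5).astype(int)

print(log_loss, labels)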
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
def sigmoid(z):
    # Logistic (sigmoid) function: maps any real value into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def compute_cost(X, y, theta):
    m = len(y)  # number of training examples
    predictions = sigmoid(X @ theta)
    # Log loss (cross-entropy); the small constant 1e-15 avoids log(0)
    cost = - (1/m) * np.sum(y * np.log(predictions + 1e-15) + (1 - y) * np.log(1 - predictions + 1e-15))
    return cost
def gradient_descent(X, y, theta, learning_rate, num_iterations):
    m = len(y)
    cost_history = np.zeros(num_iterations)
    for i in range(num_iterations):
        predictions = sigmoid(X @ theta)
        errors = predictions - y
        gradient = (1/m) * (X.T @ errors)
        theta -= learning_rate * gradient  # batch gradient descent update
        cost_history[i] = compute_cost(X, y, theta)
    return theta, cost_history
def predict(X, theta, threshold=0.5):
    probabilities = sigmoid(X @ theta)
    return (probabilities >= threshold).astype(int)

def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred) * 100
def print_logistic_regression_equation(theta):
    b0 = theta[0]
    coefficients = theta[1:]
    terms = [f"{coeff:.4f} * x{i+1}" for i, coeff in enumerate(coefficients)]
    Z = f"{b0:.4f} + " + " + ".join(terms)
    print("Logistic Regression Equation:")
    print(f"p(y = 1 | X) = 1 / (1 + exp(-({Z})))")
    print(f"b0 (intercept) : {b0:.4f}")
    for i, coeff in enumerate(coefficients, start=1):
        print(f"b{i} (coefficient for x{i}) : {coeff:.4f}")
def make_prediction(theta, scaler, income, savings):
    # Prepare the input data
    input_data = np.array([[1, income, savings]])  # Add intercept term
    input_data_scaled = scaler.transform(input_data[:, 1:])  # Scale features
    input_data[:, 1:] = input_data_scaled
    prediction_prob = sigmoid(input_data @ theta)
    if prediction_prob > 0.5:
        print("Loan Sanctioned")
    else:
        print("Loan Not Sanctioned")
def main():
    # Load the dataset (use a raw string or adjust the path as needed)
    df = pd.read_excel(r'C:\Users\RGIT 5\dataset.xlsx')
    X = df[['Annual Income', 'Savings']].values
    y = df['target'].values
    X = np.hstack((np.ones((X.shape[0], 1)), X))  # add intercept column
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Standardize the features (not the intercept column)
    scaler = StandardScaler()
    X_train[:, 1:] = scaler.fit_transform(X_train[:, 1:])
    X_test[:, 1:] = scaler.transform(X_test[:, 1:])

    # Train with batch gradient descent
    theta = np.zeros(X_train.shape[1])
    learning_rate = 0.01
    num_iterations = 1000
    theta, cost_history = gradient_descent(X_train, y_train, theta, learning_rate, num_iterations)

    print("Logistic Regression Parameters:")
    print_logistic_regression_equation(theta)

    train_predictions = predict(X_train, theta)
    test_predictions = predict(X_test, theta)
    train_accuracy = accuracy(y_train, train_predictions)
    test_accuracy = accuracy(y_test, test_predictions)
    print("Training Accuracy:", train_accuracy, "%")
    print("Test Accuracy:", test_accuracy, "%")

    # Plot the cost function over iterations
    plt.plot(range(num_iterations), cost_history, label='Cost')
    plt.xlabel('Number of iterations')
    plt.ylabel('Cost')
    plt.title('Cost Function History')
    plt.legend()
    plt.show()

    # Predict for user-supplied values
    test_income = float(input("Enter Annual Income (Lakhs): "))
    test_savings = float(input("Enter Savings (Lakhs): "))
    make_prediction(theta, scaler, test_income, test_savings)

if __name__ == "__main__":
    main()
Output:
Logistic Regression Equation:
p(y = 1 | X) = 1 / (1 + exp(-(-0.5713 + 1.4762 * x1 + 0.7611 * x2)))
b0 (intercept) : -0.5713
b1 (coefficient for x1) : 1.4762
b2 (coefficient for x2) : 0.7611
Training Accuracy: 75.0 %
Test Accuracy: 100.0 %
Enter Annual Income (Lakhs): 10
Enter Savings (Lakhs): 7
Loan Sanctioned
Conclusion: Thus, we have studied and implemented Logistic Regression.