
BEST SINGER ANALYSIS

N Krishna Sumanth - 2021BCSE07AED378
CONTENTS

o Objective
o Why Logistic Regression?
o Features and data processing
o Code and explanation
o Accuracy and Conclusion


OBJECTIVE

 Perform binary classification using logistic regression.
 Analyze the best-singer dataset and evaluate the model's accuracy.
WHY LOGISTIC REGRESSION?

• Interpretability
• Simplicity
• Efficiency
• Assumption of linearity (in the log-odds)
• Well-suited for binary classification (see the sketch below)
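
As a rough illustration (not part of the original slides), logistic regression passes a linear combination of the features through the sigmoid function to obtain a probability of the positive class ("liked"). The weights and feature values below are hypothetical, purely for demonstration; in practice they come from model.fit() later in this deck.

import numpy as np

def sigmoid(z):
    # Maps any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, intercept and one (scaled) feature vector --
# illustrative only, e.g. danceability, acousticness, energy
w = np.array([0.8, -0.3, 1.1])
b = -0.2
x = np.array([0.5, -1.2, 0.9])

z = np.dot(w, x) + b        # linear combination (the log-odds)
p_liked = sigmoid(z)        # probability that the song is "liked"
print(round(p_liked, 3))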
FEATURES IN THE DATASET

o Singer
o Acousticness
o Danceability
o Energy
o Key
o Liveness
o Loudness
o Mode
o Speechiness
o Tempo
o Valence
o Liked
DATA PROCESSING

o Handling missing data
o Encoding categorical features
o Feature scaling
IMPORTING DATASET AND MODULES

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the dataset
data = pd.read_csv('singerdata.csv')
SELECT THE RELEVANT COLUMNS FOR PREDICTION

selected_columns = ['acousticness', 'danceability', 'energy', 'key', 'liveness',
                    'loudness', 'mode', 'speechiness', 'tempo', 'valence', 'liked']
data = data[selected_columns]
DATA PREPROCESSING

# Preprocessing
data = data.dropna()  # Remove rows with missing values

# Separate the features and target variable
X = data.iloc[:, :-1]  # Features
y = data.iloc[:, -1]   # Label ('liked')
ENCODE CATEGORICAL FEATURES

# Label-encode each feature column (a fresh fit per column)
label_encoder = LabelEncoder()
for col in ['acousticness', 'danceability', 'energy', 'key', 'liveness',
            'loudness', 'mode', 'speechiness', 'tempo', 'valence']:
    X[col] = label_encoder.fit_transform(X[col])
SPLITTING DATASET

# Split the dataset into training and testing sets (70% train / 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=10)
FEATURE SCALING

# Feature scaling: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
MODEL TRAINING, PREDICTION & EVALUATION

# Model training
model = LogisticRegression()
model.fit(X_train, y_train)

# Model prediction
y_pred = model.predict(X_test)
MODEL TRAINING, PREDICTION & EVALUATION (contd.)

print(y_pred)

# Model evaluation
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
OUTPUT
THANK YOU...
