
PART B

Roll No: C-121 Name: Devanshu Maheshwari


Class : C Batch : C-2
Date of Experiment: 20-12-23 Date of Submission: 22-12-23
Grade :

B.1 Software Code written by student:


Task 1: Implementation of Perceptron from scratch:

Name: Devanshu Maheshwari


Roll No: C-121
Aim: To implement a perceptron from scratch.

# Modules used for data handling and linear algebra operations.


import pandas as pd
import numpy as np

# Modules used for data visualization


import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style()

# Modules used for encoding the categorical variables.


from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Reading the dataset


df = pd.read_csv("/content/crx.csv")
df.head()
df.info()

# plotting graph
df.dtypes.value_counts().plot(kind="bar",
                              title="Types of Data",
                              xlabel="Data Type",
                              ylabel="No. of columns",
                              rot=0,
                              color=["crimson", "orange"])
plt.show()

# The dataset contains mostly categorical variables and a few numerical ones.


# Exploratory Data Analysis includes the following parts:

1. Segregation of columns (into categorical and numerical)
2. Analysis of Missing Values
3. Target Variable Class Distribution

# Segregating columns into categorical and numerical

cat_cols = []
num_cols = []

for i in df.columns:
    if df[i].dtype == "O":
        cat_cols.append(i)
    else:
        num_cols.append(i)

# Missing Value Analysis

null_freq = []
for i in df.columns:
    f = dict(df[i].value_counts())
    if "?" in f.keys():
        null_freq.append(f["?"]*100/len(df))
    else:
        null_freq.append(0)

# Only 2% of the rows have null values, and hence the rows can be dropped
# directly instead of applying any imputation techniques.

pd.Series(dict(zip(df.columns, null_freq))).plot(kind="bar",
                                                 rot=0,
                                                 title="Missing Value Frequency",
                                                 xlabel="Column Name",
                                                 ylabel="Percentage of missing values",
                                                 color=["orange", "crimson"])
plt.show()
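
The percentages above are per column; a quick row-level check (a minimal sketch
using the same df) can be used to verify what fraction of rows contain any "?"
placeholder:

# Percentage of rows with at least one "?" placeholder value.
rows_with_missing = (df == "?").any(axis=1).mean() * 100
print(f"Rows with at least one missing value: {rows_with_missing:.2f}%")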

# Target Variable Analysis


df['a16'].value_counts().plot(kind="bar",
                              title="Class Distribution",
                              xlabel="Status of Credit Card Approval",
                              ylabel="Frequency of the Status",
                              color=["crimson", "orange"],
                              rot=0)
plt.show()

# Pre Processing

df = df.replace({"?":None})
df = df.dropna()

# Encoding the categorical columns. Each column is label-encoded in place so it
# stays a single numeric feature; the original per-column OneHotEncoder assignment
# would return one indicator column per category and cannot be written back into a
# single column. A proper one-hot expansion is sketched below.


encoder = LabelEncoder()
for i in cat_cols:
    df[i] = encoder.fit_transform(df[i])
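
A full one-hot expansion of the feature columns would look like the sketch below
(an illustration only; df_onehot is a hypothetical name and is not used by the
rest of the code, and sparse_output requires scikit-learn >= 1.2, older versions
use sparse=False instead):

# One-hot encode every categorical feature except the target column a16.
feature_cats = [c for c in cat_cols if c != "a16"]
ohe = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
encoded = ohe.fit_transform(df[feature_cats])
encoded_df = pd.DataFrame(encoded,
                          columns=ohe.get_feature_names_out(feature_cats),
                          index=df.index)
df_onehot = pd.concat([df.drop(columns=feature_cats), encoded_df], axis=1)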

df = df.reset_index()

# Train-Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df,
                                                    df['a16'],
                                                    test_size=0.10,
                                                    train_size=0.90,
                                                    random_state=0)
X_train.pop('a16')
X_test.pop('a16')

X_train

# Perceptron

class Perceptron:

    # Initialising the required parameters for the perceptron.
    def __init__(self, X, y, learning_rate, epochs: int):
        self.X = X
        self.y = y
        self.learning_rate = learning_rate
        self.epochs = epochs

    # Activation function.
    def __activation_function(self, x):
        return 1.0 if (x > 0) else 0.0

    # The model training or fitting by updating weights.
    def fit(self):
        n_rows, n_cols = self.X.shape
        self.weights = np.zeros((n_cols + 1, 1))
        for epoch in range(self.epochs):
            for index, features in enumerate(self.X.values):
                feature_transposed = np.insert(features, 0, 1).reshape(-1, 1)
                predicted_target = self.__activation_function(np.dot(feature_transposed.T, self.weights))
                flag = np.squeeze(predicted_target) - self.y[index]
                if flag != 0:
                    self.weights += self.learning_rate * ((self.y[index] - predicted_target) * feature_transposed)

    # Predicting on a single instance.
    def predict(self, X_test):
        return self.__activation_function(np.dot(self.weights.reshape(1, -1)[0], X_test))

    # Predicting on a larger number of instances and returning accuracy.
    def test(self, test_data, y):
        x = []
        for i in range(len(test_data.values)):
            X_test = np.array(test_data.iloc[i])
            x.append(self.predict(np.insert(X_test, 0, 1)) == y[i])
        return sum(x) * 100 / len(test_data)
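
The weight update in fit() is the standard perceptron rule, w = w + learning_rate * (y - y_hat) * x,
with a constant 1 prepended to x as the bias term. As a quick sanity check (an assumed toy
example, not part of the dataset workflow), the class can be exercised on the linearly
separable AND problem:

# Toy AND problem: the perceptron should fit this separable data perfectly.
X_toy = pd.DataFrame([[0, 0], [0, 1], [1, 0], [1, 1]])
y_toy = np.array([[0], [0], [0], [1]])
toy_model = Perceptron(X_toy, y_toy, 0.1, 20)
toy_model.fit()
print(toy_model.test(X_toy, y_toy))  # expected to reach 100 on this data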

# Initialising the Perceptron Parameters


y = np.array(pd.DataFrame(y_train).reset_index().drop(["index"],axis=1))
X = pd.DataFrame(X_train).reset_index().drop(["index"],axis=1)
p = Perceptron(X, y, 0.5, 50)

# Fitting the model

p.fit()

# Test Accuracy
p.test(pd.DataFrame(X_test).reset_index().drop(["index"], axis=1),
       np.array(pd.DataFrame(y_test).reset_index().drop(["index"], axis=1)))

# Train Accuracy
p.test(X,y)
# Tuning the number of epochs
train_acc = []
test_acc = []
epochs = []

for i in range(20, 200, 20):
    y = np.array(pd.DataFrame(y_train).reset_index().drop(["index"], axis=1))
    X = pd.DataFrame(X_train).reset_index().drop(["index"], axis=1)
    p = Perceptron(X, y, 0.5, i)  # features X, target y, learning rate = 0.5, i epochs
    p.fit()
    train_acc.append(p.test(X, y))
    test_acc.append(p.test(pd.DataFrame(X_test).reset_index().drop(["index"], axis=1),
                           np.array(pd.DataFrame(y_test).reset_index().drop(["index"], axis=1))))
    epochs.append(i)

# Train-Test Accuracies across different epochs

plt.figure(figsize=(10,10))
plt.plot(epochs,train_acc)
plt.plot(epochs,test_acc)
plt.xlabel("Number of Epochs")
plt.ylabel("Accuracy of the model")
plt.legend(['Train Accuracy',"Test Accuracy"])
plt.show()

tuning = pd.DataFrame({"epochs": np.array(epochs).squeeze(),
                       "train accuracy": np.array(train_acc).squeeze(),
                       "test accuracy": np.array(test_acc).squeeze()})

def diff(row):
    return row["train accuracy"] - row["test accuracy"]

tuning["Difference"] = tuning.apply(diff, axis=1)

tuning.sort_values(by="Difference")

Task 2: Titanic Dataset Perceptron

Name: Devanshu Maheshwari


Roll No: C-121
Aim: To evaluate the accuracy of a perceptron model on the Titanic dataset.

from sklearn.linear_model import Perceptron

train_df = pd.read_csv('/content/train.csv')
test_df = pd.read_csv('/content/test.csv')
combine = [train_df, test_df]

print(train_df.columns.values)

train_df.head()

train_df.tail()

train_df.info()
print('_'*40)
test_df.info()

train_df.describe()

train_df.describe(include=['O'])

train_df[['Pclass', 'Survived']].groupby(['Pclass'],
as_index=False).mean().sort_values(by='Survived', ascending=False)

train_df[["Sex", "Survived"]].groupby(['Sex'],
as_index=False).mean().sort_values(by='Survived', ascending=False)

train_df[["SibSp", "Survived"]].groupby(['SibSp'],
as_index=False).mean().sort_values(by='Survived', ascending=False)

train_df[["Parch", "Survived"]].groupby(['Parch'],
as_index=False).mean().sort_values(by='Survived', ascending=False)

g = sns.FacetGrid(train_df, col='Survived')
g.map(plt.hist, 'Age', bins=20)

grid = sns.FacetGrid(train_df, col='Survived', row='Pclass', height=2.2, aspect=1.6)
grid.map(plt.hist, 'Age', alpha=.5, bins=20)
grid.add_legend()

grid = sns.FacetGrid(train_df, row='Embarked', height=2.2, aspect=1.6)
grid.map(sns.pointplot, 'Pclass', 'Survived', 'Sex', palette='deep')
grid.add_legend()

grid = sns.FacetGrid(train_df, row='Embarked', col='Survived', height=2.2, aspect=1.6)
grid.map(sns.barplot, 'Sex', 'Fare', alpha=.5, ci=None)
grid.add_legend()

print("Before", train_df.shape, test_df.shape, combine[0].shape,


combine[1].shape)

train_df = train_df.drop(['Ticket', 'Cabin'], axis=1)
test_df = test_df.drop(['Ticket', 'Cabin'], axis=1)
combine = [train_df, test_df]

"After", train_df.shape, test_df.shape, combine[0].shape, combine[1].shape

for dataset in combine:
    dataset['Title'] = dataset.Name.str.extract(r' ([A-Za-z]+)\.', expand=False)
pd.crosstab(train_df['Title'], train_df['Sex'])

for dataset in combine:
    dataset['Title'] = dataset['Title'].replace(['Lady', 'Countess', 'Capt', 'Col',
                                                 'Don', 'Dr', 'Major', 'Rev', 'Sir',
                                                 'Jonkheer', 'Dona'], 'Rare')

    dataset['Title'] = dataset['Title'].replace('Mlle', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Ms', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Mme', 'Mrs')

train_df[['Title', 'Survived']].groupby(['Title'], as_index=False).mean()

title_mapping = {"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4, "Rare": 5}

for dataset in combine:
    dataset['Title'] = dataset['Title'].map(title_mapping)
    dataset['Title'] = dataset['Title'].fillna(0)

train_df.head()
train_df = train_df.drop(['Name', 'PassengerId'], axis=1)
test_df = test_df.drop(['Name'], axis=1)
combine = [train_df, test_df]
train_df.shape, test_df.shape

# Converting a categorical feature


for dataset in combine:
    dataset['Sex'] = dataset['Sex'].map({'female': 1, 'male': 0}).astype(int)

train_df.head()

grid = sns.FacetGrid(train_df, row='Pclass', col='Sex', height=2.2, aspect=1.6)
grid.map(plt.hist, 'Age', alpha=.5, bins=20)
grid.add_legend()

guess_ages = np.zeros((2,3))
guess_ages

for dataset in combine:
    for i in range(0, 2):
        for j in range(0, 3):
            guess_df = dataset[(dataset['Sex'] == i) &
                               (dataset['Pclass'] == j+1)]['Age'].dropna()

            age_guess = guess_df.median()

            # Convert random age float to nearest .5 age
            guess_ages[i, j] = int(age_guess / 0.5 + 0.5) * 0.5

    for i in range(0, 2):
        for j in range(0, 3):
            dataset.loc[(dataset.Age.isnull()) & (dataset.Sex == i) &
                        (dataset.Pclass == j+1), 'Age'] = guess_ages[i, j]

    dataset['Age'] = dataset['Age'].astype(int)

train_df.head()

train_df['AgeBand'] = pd.cut(train_df['Age'], 5)
train_df[['AgeBand', 'Survived']].groupby(['AgeBand'],
as_index=False).mean().sort_values(by='AgeBand', ascending=True)

for dataset in combine:
    dataset.loc[ dataset['Age'] <= 16, 'Age'] = 0
    dataset.loc[(dataset['Age'] > 16) & (dataset['Age'] <= 32), 'Age'] = 1
    dataset.loc[(dataset['Age'] > 32) & (dataset['Age'] <= 48), 'Age'] = 2
    dataset.loc[(dataset['Age'] > 48) & (dataset['Age'] <= 64), 'Age'] = 3
    dataset.loc[ dataset['Age'] > 64, 'Age'] = 4

train_df.head()

train_df = train_df.drop(['AgeBand'], axis=1)
combine = [train_df, test_df]
train_df.head()

for dataset in combine:
    dataset['FamilySize'] = dataset['SibSp'] + dataset['Parch'] + 1

train_df[['FamilySize', 'Survived']].groupby(['FamilySize'],
as_index=False).mean().sort_values(by='Survived', ascending=False)

for dataset in combine:
    dataset['IsAlone'] = 0
    dataset.loc[dataset['FamilySize'] == 1, 'IsAlone'] = 1

train_df[['IsAlone', 'Survived']].groupby(['IsAlone'],
as_index=False).mean()

train_df = train_df.drop(['Parch', 'SibSp', 'FamilySize'], axis=1)
test_df = test_df.drop(['Parch', 'SibSp', 'FamilySize'], axis=1)
combine = [train_df, test_df]

train_df.head()

for dataset in combine:
    dataset['Age*Class'] = dataset.Age * dataset.Pclass

train_df.loc[:, ['Age*Class', 'Age', 'Pclass']].head(10)

freq_port = train_df.Embarked.dropna().mode()[0]
freq_port

for dataset in combine:
    dataset['Embarked'] = dataset['Embarked'].fillna(freq_port)

train_df[['Embarked', 'Survived']].groupby(['Embarked'],
    as_index=False).mean().sort_values(by='Survived', ascending=False)

# Converting categorical feature to numeric


for dataset in combine:
    dataset['Embarked'] = dataset['Embarked'].map({'S': 0, 'C': 1, 'Q': 2}).astype(int)

train_df.head()

test_df['Fare'].fillna(test_df['Fare'].dropna().median(), inplace=True)
test_df.head()

train_df['FareBand'] = pd.qcut(train_df['Fare'], 4)
train_df[['FareBand', 'Survived']].groupby(['FareBand'],
as_index=False).mean().sort_values(by='FareBand', ascending=True)

for dataset in combine:
    dataset.loc[ dataset['Fare'] <= 7.91, 'Fare'] = 0
    dataset.loc[(dataset['Fare'] > 7.91) & (dataset['Fare'] <= 14.454), 'Fare'] = 1
    dataset.loc[(dataset['Fare'] > 14.454) & (dataset['Fare'] <= 31), 'Fare'] = 2
    dataset.loc[ dataset['Fare'] > 31, 'Fare'] = 3
    dataset['Fare'] = dataset['Fare'].astype(int)

train_df = train_df.drop(['FareBand'], axis=1)
combine = [train_df, test_df]

train_df.head(10)

test_df.head(10)

X_train = train_df.drop("Survived", axis=1)
Y_train = train_df["Survived"]
X_test = test_df.drop("PassengerId", axis=1).copy()
X_train.shape, Y_train.shape, X_test.shape

# Perceptron

perceptron = Perceptron()
perceptron.fit(X_train, Y_train)
Y_pred = perceptron.predict(X_test)
acc_perceptron = round(perceptron.score(X_train, Y_train) * 100, 2)
acc_perceptron
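
If a Kaggle-style submission file is also needed, the test-set predictions can be
paired with PassengerId (a minimal sketch; the output filename is a hypothetical
choice, not from the original submission):

# Save predictions alongside PassengerId for submission.
submission = pd.DataFrame({"PassengerId": test_df["PassengerId"], "Survived": Y_pred})
submission.to_csv("perceptron_submission.csv", index=False)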

B.2 Input and Output:


# CRX Dataset
# Titanic Dataset
B.3 Observations and learning:

Observation:

1. CRX Dataset:

   - Training Accuracy: 77%

   - Testing Accuracy: 48%

2. Titanic Dataset:

   - Perceptron Accuracy: 77%

Learning:

I gained valuable hands-on experience in applying perceptron models to
real-world datasets, namely the CRX and Titanic datasets. Through this
process, I learned the importance of cleaning and preprocessing the data to
improve model performance. Cleaning involved handling missing values,
addressing outliers, and selecting relevant features to improve the overall
quality of the dataset. I also explored the iterative nature of model
development, experimenting with hyperparameters and adjusting the model to
balance training and testing accuracy. This practical work not only honed my
skills in implementing perceptrons but also highlighted the importance of
understanding the data and making informed decisions throughout the
modelling process. Overall, the experience provided a solid foundation for
future work in machine learning and data analysis.

B.4 Conclusion:

1. Titanic Dataset:

   - The 77% accuracy in both training and testing on the Titanic dataset
     indicates a balanced performance. However, it is worth exploring more
     sophisticated models, such as ensemble methods or deep learning
     architectures, to see whether further improvements can be achieved.

2. Further Investigation:

   - It would be beneficial to conduct a detailed analysis of misclassified
     instances, inspect feature importance, and revisit the data
     preprocessing steps. Additionally, cross-validation can provide a more
     robust estimate of model performance (a minimal sketch follows this
     list).

3. Iterative Process:

   - Model development is often an iterative process. Adjustments to the
     model, feature engineering, and hyperparameter tuning should be carried
     out systematically to improve performance step by step.
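
A minimal cross-validation sketch for the suggestion in point 2 (assuming the
preprocessed X_train and Y_train from Task 2 are still available):

# 5-fold cross-validation of the scikit-learn Perceptron on the Titanic training data.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Perceptron

cv_scores = cross_val_score(Perceptron(), X_train, Y_train, cv=5)
print("Cross-validation accuracy: %.2f%% (+/- %.2f%%)" % (cv_scores.mean() * 100,
                                                          cv_scores.std() * 100))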
