You are on page 1of 9

Worksheet-3.

Student Name: Nikhil Raj UID: 21MCI1126


Branch: MCA(AIML) Section/Group: 21MAM-1/B
Semester: 3rd Date of Performance: 11/11/2022
Subject Name: Deep Learning Subject Code:21CAH-724

1. Aim/Overview of the practical: Take a NLP data set from any online repository and use ANN
for any NLP application

2. Task to be done:
1. Load the data
2. Understanding the data format
3. Preprocessing the data
4. Build a model
5. Compile the model
6. Train the model
7. What is the accuracy of the model?
8 How to improve the accuracy of the model?
9. Evaluate the model
10. What results do you get with a model with more than 2 hidden layers and more/less neurons

3. Code for experiment/practical:


from google.colab import drive
drive.mount('/content/drive')

import pandas as pd
import os

#Load Spam Data and review content


data = pd.read_csv("/content/drive/MyDrive/Machine Learning/SPAM text message 20170820
- Data.csv")
print("\nLoaded Data :\n------------------------------------")
print(data.head())

#Separate feature and target data


spam_classes_raw = data["Category"]
spam_messages = data["Message"]

print(data.shape)

data.isnull().sum()

data['Category'].value_counts()

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

import re
import string
from wordcloud import WordCloud
from collections import Counter

labels = ['Spam', 'Ham']


sizes = [747, 4825]
custom_colours = ['#ff7675', '#74b9ff']

plt.figure(figsize=(20, 6), dpi=227)


plt.subplot(1, 2, 1)
plt.pie(sizes, labels = labels, textprops={'fontsize': 15}, startangle=140,
autopct='%1.0f%%', colors=custom_colours, explode=[0, 0.05])

plt.subplot(1, 2, 2)
sns.barplot(x = data['Category'].unique(), y = data['Category'].value_counts(), palette
= 'viridis')

plt.show()

data['Total Words'] = data['Message'].apply(lambda x: len(x.split()))

def count_total_words(text):
char = 0
for word in text.split():
char += len(word)
return char
data['Total Chars'] = data["Message"].apply(count_total_words)

plt.figure(figsize = (10, 6))


sns.kdeplot(x = data['Total Words'], hue= data['Category'], palette= 'winter', shade =
True)
plt.show()

plt.figure(figsize = (10, 6))


sns.kdeplot(x = data['Total Chars'], hue= data['Category'], palette= 'winter', shade =
True)
plt.show()

#Build a label encoder for target variable to convert strings to numeric values.
from sklearn import preprocessing
import tensorflow as tf

label_encoder = preprocessing.LabelEncoder()
spam_classes = label_encoder.fit_transform(
spam_classes_raw)

#Convert target to one-hot encoding vector


spam_classes = tf.keras.utils.to_categorical(spam_classes,2)

print("One-hot Encoding Shape : ", spam_classes.shape)

#Preprocess data for spam messages


from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

#Max words in the vocabulary for this dataset


VOCAB_WORDS=10000
#Max sequence length for word sequences
MAX_SEQUENCE_LENGTH=100

#Create a vocabulary with unique words and IDs


spam_tokenizer = Tokenizer(num_words=VOCAB_WORDS)
spam_tokenizer.fit_on_texts(spam_messages)

print("Total unique tokens found: ", len(spam_tokenizer.word_index))


print("Example token ID for word \"me\" :", spam_tokenizer.word_index.get("me"))

#Convert sentences to token-ID sequences


spam_sequences = spam_tokenizer.texts_to_sequences(spam_messages)

#Pad all sequences to fixed length


spam_padded = pad_sequences(spam_sequences, maxlen=MAX_SEQUENCE_LENGTH)

print("\nTotal sequences found : ", len(spam_padded))


print("Example Sequence for sentence : ", spam_messages[0] )
print(spam_padded[0])

#Split into training and test data


from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(
spam_padded,spam_classes,test_size=0.2)

#Performing Feature Scaling


from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

#Initialising ANN
ann = tf.keras.models.Sequential()

#Adding First Hidden Layer


ann.add(tf.keras.layers.Dense(units=6,activation="relu"))

#Adding Second Hidden Layer


ann.add(tf.keras.layers.Dense(units=6,activation="relu"))

#Adding Output Layer


ann.add(tf.keras.layers.Dense(units=1,activation="sigmoid"))

#Compiling ANN
ann.compile(optimizer="adam",loss="binary_crossentropy",metrics=['accuracy'])
ann.fit(X_train,Y_train,batch_size=32,epochs = 0)

# fit the keras model on the dataset without progress bars


ann.fit(spam_padded,spam_classes, epochs=0, batch_size=0, verbose=0)
# # evaluate the keras model
# _, accuracy = ann.evaluate(spam_padded,spam_classes, verbose=0)

4-Result/Outcome:-
Learning outcomes (What I have learnt):

1. Learn about ANN and it’s features.

2. Learn about regression and to perform it with ANN.

3. Learn to create a model for ANN.

4. Learn to predict with Ann.

Evaluation Grid:

Sr. No. Parameters Marks Obtained Maximum Marks


1. Demonstration/Performance/Pre 5
Lab Quiz
2. Worksheet 10
3. Post Lab Quiz 5

You might also like