21MCI1126 - Nikhil - Raj DL EXP 3.1

Worksheet-3.
Student Name: Nikhil Raj UID: 21MCI1126

Branch: MCA(AIML) Section/Group: 21MAM-1/B
Semester: 3rd Date of Performance: 11/11/2022
Subject Name: Deep Learning Subject Code:21CAH-724
1. Aim/Overview of the practical: Take a NLP data set from any online repository and use ANN
for any NLP application
2. Task to be done:
1. Load the data
2. Understanding the data format
3. Preprocessing the data
4. Build a model
5. Compile the model
6. Train the model
7. What is the accuracy of the model?
8 How to improve the accuracy of the model?
9. Evaluate the model
10. What results do you get with a model with more than 2 hidden layers and more/less neurons
3. Code for experiment/practical:

from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
import os
#Load Spam Data and review content

data = pd.read_csv("/content/drive/MyDrive/Machine Learning/SPAM text message 20170820
- Data.csv")
print("\nLoaded Data :\n------------------------------------")
print(data.head())
#Separate feature and target data

spam_classes_raw = data["Category"]
spam_messages = data["Message"]
print(data.shape)
data.isnull().sum()
data['Category'].value_counts()
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import re
import string
from wordcloud import WordCloud
from collections import Counter
labels = ['Spam', 'Ham']

sizes = [747, 4825]
custom_colours = ['#ff7675', '#74b9ff']
plt.figure(figsize=(20, 6), dpi=227)

plt.subplot(1, 2, 1)
plt.pie(sizes, labels = labels, textprops={'fontsize': 15}, startangle=140,
autopct='%1.0f%%', colors=custom_colours, explode=[0, 0.05])
plt.subplot(1, 2, 2)
sns.barplot(x = data['Category'].unique(), y = data['Category'].value_counts(), palette
= 'viridis')
plt.show()
data['Total Words'] = data['Message'].apply(lambda x: len(x.split()))
def count_total_words(text):
char = 0
for word in text.split():
char += len(word)
return char
data['Total Chars'] = data["Message"].apply(count_total_words)
plt.figure(figsize = (10, 6))

sns.kdeplot(x = data['Total Words'], hue= data['Category'], palette= 'winter', shade =
True)
plt.show()
plt.figure(figsize = (10, 6))

sns.kdeplot(x = data['Total Chars'], hue= data['Category'], palette= 'winter', shade =
True)
plt.show()
#Build a label encoder for target variable to convert strings to numeric values.
from sklearn import preprocessing
import tensorflow as tf
label_encoder = preprocessing.LabelEncoder()
spam_classes = label_encoder.fit_transform(
spam_classes_raw)
#Convert target to one-hot encoding vector

spam_classes = tf.keras.utils.to_categorical(spam_classes,2)
print("One-hot Encoding Shape : ", spam_classes.shape)
#Preprocess data for spam messages

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
#Max words in the vocabulary for this dataset

VOCAB_WORDS=10000
#Max sequence length for word sequences
MAX_SEQUENCE_LENGTH=100
#Create a vocabulary with unique words and IDs

spam_tokenizer = Tokenizer(num_words=VOCAB_WORDS)
spam_tokenizer.fit_on_texts(spam_messages)
print("Total unique tokens found: ", len(spam_tokenizer.word_index))

print("Example token ID for word \"me\" :", spam_tokenizer.word_index.get("me"))
#Convert sentences to token-ID sequences

spam_sequences = spam_tokenizer.texts_to_sequences(spam_messages)
#Pad all sequences to fixed length

spam_padded = pad_sequences(spam_sequences, maxlen=MAX_SEQUENCE_LENGTH)
print("\nTotal sequences found : ", len(spam_padded))

print("Example Sequence for sentence : ", spam_messages[0] )
print(spam_padded[0])
#Split into training and test data

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(
spam_padded,spam_classes,test_size=0.2)
#Performing Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#Initialising ANN
ann = tf.keras.models.Sequential()
#Adding First Hidden Layer

ann.add(tf.keras.layers.Dense(units=6,activation="relu"))
#Adding Second Hidden Layer

ann.add(tf.keras.layers.Dense(units=6,activation="relu"))
#Adding Output Layer

ann.add(tf.keras.layers.Dense(units=1,activation="sigmoid"))
#Compiling ANN
ann.compile(optimizer="adam",loss="binary_crossentropy",metrics=['accuracy'])
ann.fit(X_train,Y_train,batch_size=32,epochs = 0)
# fit the keras model on the dataset without progress bars

ann.fit(spam_padded,spam_classes, epochs=0, batch_size=0, verbose=0)
# # evaluate the keras model
# _, accuracy = ann.evaluate(spam_padded,spam_classes, verbose=0)
4-Result/Outcome:-
Learning outcomes (What I have learnt):
1. Learn about ANN and it’s features.
2. Learn about regression and to perform it with ANN.
3. Learn to create a model for ANN.
4. Learn to predict with Ann.
Evaluation Grid:
Sr. No. Parameters Marks Obtained Maximum Marks

1. Demonstration/Performance/Pre 5
Lab Quiz
2. Worksheet 10
3. Post Lab Quiz 5

21MCI1126 - Nikhil - Raj DL EXP 3.1

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

21MCI1126 - Nikhil - Raj DL EXP 3.1

Uploaded by

Copyright:

Available Formats

Worksheet-3.

Student Name: Nikhil Raj UID: 21MCI1126

3. Code for experiment/practical:

#Load Spam Data and review content

#Separate feature and target data

labels = ['Spam', 'Ham']

plt.figure(figsize=(20, 6), dpi=227)

data['Total Words'] = data['Message'].apply(lambda x: len(x.split()))

plt.figure(figsize = (10, 6))

plt.figure(figsize = (10, 6))

#Convert target to one-hot encoding vector

print("One-hot Encoding Shape : ", spam_classes.shape)

#Preprocess data for spam messages

#Max words in the vocabulary for this dataset

#Create a vocabulary with unique words and IDs

print("Total unique tokens found: ", len(spam_tokenizer.word_index))

#Convert sentences to token-ID sequences

#Pad all sequences to fixed length

print("\nTotal sequences found : ", len(spam_padded))

#Split into training and test data

#Performing Feature Scaling

#Adding First Hidden Layer

#Adding Second Hidden Layer

#Adding Output Layer

# fit the keras model on the dataset without progress bars

1. Learn about ANN and it’s features.

2. Learn about regression and to perform it with ANN.

3. Learn to create a model for ANN.

4. Learn to predict with Ann.

Sr. No. Parameters Marks Obtained Maximum Marks

You might also like