
Unitedworld School of Computational Intelligence (USCI)

Karnavati University

Professional Course Training

Topic: Neural Network SMS Text Classifier

Prepared by

Yash Kamleshkumar Patel
21BSDS12

Zaeem Dola
21BSCS14
Index:

CONTENTS:

1. Introduction

1.1 Project details
1.2 Project purpose
1.3 Project use case
1.4 Implementation feasibility

2. Details of modules used

2.1 Introduction to Pandas
2.2 Introduction to NumPy
2.3 Introduction to Librosa
2.4 Introduction to Seaborn
2.5 Introduction to Matplotlib
2.6 Introduction to TensorFlow

3. System Requirements

3.1 Dataset used

4. Screenshot

5. Conclusion

1. Introduction

1.1 Project Details:

An SMS classifier is a system designed to automatically classify short message service (SMS) messages as either
ham (legitimate messages) or spam (unsolicited or fraudulent messages). This type of system can be useful for
managing and organizing SMS messages, particularly in cases where users receive a high volume of messages or
in situations where security is a concern.

The ham/spam SMS classifier works by analyzing the content of each incoming message and
comparing it to a set of predefined criteria. The criteria used to identify ham messages may include factors such
as message length, sender identity, and the presence of certain keywords or phrases. Spam messages, on the
other hand, may be identified based on characteristics such as the use of excessive capitalization or punctuation,
the presence of certain URLs or phone numbers, or the inclusion of suspicious or misleading language.
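
As a rough illustration of the surface criteria listed above, the following Python sketch computes a few of them for a single message. The function name, the chosen features, and the regular expressions are assumptions made for this example; they are not taken from the project code.

```python
import re

def handcrafted_features(message):
    """Compute a few simple surface features of one SMS message."""
    letters = [ch for ch in message if ch.isalpha()]
    upper = [ch for ch in letters if ch.isupper()]
    return {
        "length": len(message),
        # Share of letters that are capitals (0.0 when there are no letters)
        "uppercase_ratio": len(upper) / len(letters) if letters else 0.0,
        "exclamation_count": message.count("!"),
        # Very rough URL and phone-number detectors, for illustration only
        "has_url": bool(re.search(r"(https?://|www\.)\S+", message, re.IGNORECASE)),
        "has_phone_number": bool(re.search(r"\b\d{5,}\b", message)),
    }

features = handcrafted_features(
    "WINNER!! Claim your prize at www.example.com or call 0800123456"
)
print(features)
```

Features of this kind can be fed to a classifier alongside, or instead of, word-level features.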

The SMS classifier system can be trained using machine learning algorithms that use historical data to learn to
distinguish between ham and spam messages. By analyzing large volumes of past SMS messages, the system can
learn to identify patterns and common characteristics that are indicative of each message type.

Once the system has been trained, it can be used to automatically classify incoming messages in real-time,
providing users with a streamlined and efficient way of managing their SMS communications. Users may also
have the option to manually label messages as ham or spam, which can help to further refine the system's
classification accuracy over time.

Overall, the ham/spam SMS classifier is a powerful tool for managing SMS communications and
improving security. By automatically identifying and filtering out spam messages, users can save time and
reduce the risk of falling victim to SMS-based scams or frauds.

1.2 Project Purpose

The purpose of this project is to develop a neural network-based SMS classifier that can accurately distinguish
between ham and spam messages. The neural network will be trained on a large dataset of SMS messages and
will be designed to analyze the content of each message in real-time and classify it as either ham or spam.

The main goal of this project is to create a reliable and effective tool for managing SMS communications. By
automatically identifying and filtering out spam messages, the neural network classifier can help users to save
time and avoid potential security risks associated with unsolicited or fraudulent SMS messages.

Specifically, the project aims to achieve the following objectives:

Build a large dataset of SMS messages for training and testing the neural network classifier.

Develop and implement a neural network architecture optimized for SMS classification that can analyze the
content of each message and classify it as ham or spam.

Train the neural network using the dataset of SMS messages to improve its accuracy and performance.

Evaluate the performance of the neural network classifier using a variety of metrics, including precision, recall,
and F1 score.

Integrate the SMS neural network classifier into an SMS messaging application to provide users with a reliable
and efficient way of managing their SMS communications.
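
The evaluation metrics named in the objectives can be computed directly from true and predicted labels. The sketch below treats spam as the positive class; the function name and the six example labels are made up for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Precision answers "how many flagged messages were really spam", recall answers "how much of the spam was caught", and F1 is their harmonic mean.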

Ultimately, the goal of this project is to provide users with a powerful tool for managing their SMS
communications that is both accurate and efficient. By leveraging the power of neural networks, this SMS
classifier can help to improve the security and efficiency of SMS messaging, making it easier and safer for users
to stay connected with friends, family, and colleagues.

1.3 Project use case

Project Use Case: SMS Neural Network Classifier using Python.

The SMS Neural Network Classifier using Python is a machine learning project that aims to develop a neural
network-based classifier for SMS messages that can accurately distinguish between ham and spam messages.
This project has a wide range of potential use cases, including:

Personal Use: The SMS Neural Network Classifier can be used by individuals to manage their personal SMS
communications. By automatically identifying and filtering out spam messages, users can save time and reduce
the risk of falling victim to SMS-based scams or frauds.

Business Use: The SMS Neural Network Classifier can also be used by businesses to manage their SMS
communications with customers. By filtering out spam messages, businesses can ensure that their messages are
seen and responded to by their customers, improving customer engagement and satisfaction.

Security Use: The SMS Neural Network Classifier can be used by security professionals to monitor SMS
communications for potential security threats. By identifying and flagging suspicious messages, security
professionals can take proactive measures to prevent security breaches and protect sensitive data.

Research Use: The SMS Neural Network Classifier can be used by researchers to study the patterns and trends of
SMS messaging. By analyzing large volumes of SMS messages, researchers can gain valuable insights into
communication patterns and behaviors.

In each of these use cases, the SMS Neural Network Classifier can provide a valuable tool for managing SMS
communications. By leveraging the power of Python and machine learning algorithms, this classifier can help to
improve the efficiency and security of SMS messaging, making it easier and safer for users to stay connected
with friends, family, and colleagues.
1.4 Implementation feasibility

The implementation of a neural network-based SMS classifier using Python is highly feasible. Python is a
popular programming language for machine learning and has a wide range of libraries and frameworks that make
developing and implementing neural networks relatively easy.

The following factors contribute to the feasibility of implementing a neural network classifier that
separates ham from spam:

Dataset availability: A large dataset of SMS messages is readily available for training and testing the neural
network classifier. There are several publicly available datasets that include thousands of SMS messages labeled
as ham or spam, making it easy to acquire and process the necessary data.

Machine learning frameworks: Python offers several popular machine learning frameworks, such as TensorFlow,
Keras, and Scikit-learn, which provide robust support for neural network development and implementation.

Open-source libraries: There are several open-source libraries available in Python, such as NLTK (Natural
Language Toolkit) and TextBlob, that provide advanced text processing and natural language processing
capabilities that can be used to preprocess and analyze SMS messages.

Computing resources: The required computing resources for implementing an SMS neural network classifier are
relatively modest. A typical laptop or desktop computer with a modern CPU and GPU can easily handle the
computational requirements of training and testing the neural network.

Deployment options: Python provides several options for deploying the SMS neural network classifier, including
standalone desktop applications, web-based applications, and mobile applications.

Overall, implementing a ham/spam SMS neural network classifier in Python is highly
feasible. With the availability of datasets, machine learning frameworks, open-source libraries, computing
resources, and deployment options, the development and deployment of the SMS classifier can be accomplished
with relative ease.

2. Details of modules used.

2.1 Introduction to Pandas

Pandas is one of the most widely used open-source Python libraries for data science, data
analysis, and machine learning. It is built on top of NumPy, a separate package that
supports multi-dimensional arrays. Pandas simplifies many of the time-consuming,
repetitive tasks associated with working with data, including:

1. Data cleansing

2. Filling in missing data

3. Data normalization

4. Merges and joins

5. Data visualization

6. Statistical analysis

7. Data inspection

8. Loading and saving data

The first step in working with Pandas is to check whether it is installed in the Python
environment. We installed it on our system using pip: we opened the command prompt (cmd),
used the cd command to change to the folder where pip is installed, and typed the command:

pip install pandas

After Pandas was installed, we imported the library. This module is
generally imported as:

import pandas as pd

Here, pd is an alias for Pandas. Importing the library under an alias is not required;
it just means less typing every time a method or property is
called.

Pandas provides two main data structures for manipulating data:

● Series

● DataFrame
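
A minimal sketch of the two structures, using three made-up messages (the column names label and text are assumptions chosen to match the dataset description later in this report):

```python
import pandas as pd

# A DataFrame is a two-dimensional table with named columns.
messages = pd.DataFrame(
    {
        "label": ["ham", "spam", "ham"],
        "text": [
            "See you at five",
            "WIN a FREE prize now, reply YES to claim",
            "Ok thanks",
        ],
    }
)

# Each column of a DataFrame is a Series (a one-dimensional labelled array).
lengths = messages["text"].str.len()
messages["length"] = lengths

# Count how many messages fall into each class.
label_counts = messages["label"].value_counts()
print(messages)
print(label_counts)
```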
2.2 Introduction to NumPy

NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays. It is the fundamental
package for scientific computing with Python. It is open-source software. Its most important
features include:

● A powerful N-Dimensional array object

● Sophisticated (broadcasting) functions

● Tools for integrating C/C++ and Fortran Codes

Mac and Linux users can install NumPy with the pip command:

pip install numpy

Windows has no built-in package manager analogous to those of Linux or macOS, but the same
pip command works once Python is installed.

NumPy’s main object is the homogeneous multidimensional array.

● It is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers.

● In NumPy, dimensions are called axes. The number of axes is the rank of the array.

● NumPy’s array class is called ndarray. It is also known by the alias array.
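
The properties above, together with broadcasting, can be seen on a small array. The word-count values below are invented for the example.

```python
import numpy as np

# Rows are messages, columns are word counts (made-up numbers).
counts = np.array([[1, 0, 2],
                   [0, 3, 1]])

print(type(counts).__name__)  # the array class is ndarray
print(counts.ndim)            # number of axes (the rank)
print(counts.shape)           # size along each axis
print(counts.dtype)           # single element type shared by all entries

# Sum along axis 1 to get the total number of counted words per message.
row_totals = counts.sum(axis=1)

# Broadcasting: a (2, 3) array divided by a (2, 1) array, row by row.
proportions = counts / row_totals[:, np.newaxis]
print(proportions)
```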
2.3 Introduction to Librosa

Librosa is a Python package for music and audio analysis. It is typically used when
working with audio data, for example in music generation (using LSTMs) and
automatic speech recognition.

It provides the building blocks necessary to create music information retrieval systems.
Librosa helps to visualize audio signals and to extract features from them using
different signal processing techniques.

Installation

● Using PyPI (Python Package Index)

Open the command prompt on your system and run:

pip install librosa
2.4 Introduction to Seaborn

Seaborn is a library that uses Matplotlib underneath to plot graphs. It will be used to
visualize random distributions. Seaborn makes it easy to switch between different visual
representations by using a consistent dataset-oriented API.

If Python and pip are already installed on the system, install Seaborn using this command:

C:\Users\Your Name>pip install seaborn

Seaborn aids in data exploration and comprehension. Its charting functions work with data
frames and arrays holding entire datasets and internally carry out the necessary semantic
mapping and statistical aggregation to build useful graphs. Its dataset-oriented, declarative
API enables you to concentrate on the meaning of the various plot parts rather than the
specifics of how to render them.
2.5 Introduction to Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. Matplotlib makes easy things easy and hard things possible.

It is widely used for 2D plots of arrays. Matplotlib is a multi-platform data
visualization library built on NumPy arrays and designed to work with the broader SciPy
stack.

One of the greatest benefits of visualization is that it gives us access to huge amounts
of data in easily digestible visuals. Matplotlib supports many plot types, such as line, bar,
scatter, and histogram plots.

Histograms, bar charts, power spectra, error charts, and many other graphs and plots are
supported. It is combined with NumPy to provide a powerful open-source MATLAB
substitute environment.

Installation:

Windows, Linux, and macOS distributions have matplotlib and most of its dependencies as
wheel packages. Run the following command to install the matplotlib package:

python -m pip install -U matplotlib

Importing matplotlib:

from matplotlib import pyplot as plt

Basic plots in Matplotlib:

Matplotlib comes with a wide variety of plots. Plots help to understand trends, and patterns,
and make correlations. They’re typically instruments for reasoning about quantitative
information.

Subplots -

subplots() is a Matplotlib function that is used to display multiple plots in one figure. It takes
arguments such as the number of rows and columns (nrows, ncols) and whether the plots share
an x or y axis (sharex, sharey).
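
A short sketch of subplots() with one row and two columns. It plots the ham and spam counts quoted in Section 3.1; the titles, figure size, and the choice of a bar chart next to a pie chart are assumptions made for the example. The Agg backend is selected only so that the script also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
from matplotlib import pyplot as plt

# Class counts as stated in the dataset description (Section 3.1)
labels = ["ham", "spam"]
counts = [4827, 747]

# One figure, two axes side by side
fig, (ax_bar, ax_pie) = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

ax_bar.bar(labels, counts)
ax_bar.set_title("Messages per class")
ax_bar.set_ylabel("count")

ax_pie.pie(counts, labels=labels, autopct="%1.1f%%")
ax_pie.set_title("Class share")

fig.tight_layout()
```

Calling fig.savefig() with a file name, or plt.show() with an interactive backend, then renders the figure.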
2.6 Introduction to TensorFlow

TensorFlow provides a collection of workflows to develop and train
models using Python or JavaScript, and to easily deploy in the cloud, on-prem,
in the browser, or on the device no matter what language you use.

It is a foundation library that can be used to build deep learning models
directly or indirectly using wrapper libraries created on top of TensorFlow
to make the process easier.

Google developed it, maintains it, and makes it available under the
Apache 2.0 open-source license. The API is nominally for the Python programming
language, although the underlying C++ API is also accessible.

In contrast to other numerical libraries intended for deep learning, such as Theano,
TensorFlow was created to be used both in research and development
and in production systems, including RankBrain in Google Search and the
entertaining DeepDream project.

Installation of TensorFlow:

pip install tensorflow

Some TensorFlow Fundamentals:


Tensors:

A tensor is a multi-dimensional array and is the basic data structure of the TensorFlow
Python deep-learning library. Whereas a vector is one-dimensional and a matrix is
two-dimensional, a tensor can have n dimensions. All values in a tensor share the same
data type, and the tensor has a specified shape that describes its dimensionality. A vector,
for example, is a one-dimensional tensor, a matrix is a two-dimensional
tensor, and a scalar is a zero-dimensional tensor.

Shape:

In the TensorFlow Python library, the shape describes the
dimensionality of the tensor. In simple terms, the number of elements in
each dimension defines a tensor’s shape. During the graph creation
process, TensorFlow automatically infers shapes. The rank of these
inferred shapes may be known or unknown. If the rank is known, the
sizes of the individual dimensions may still be known or unknown.
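
The following sketch, assuming TensorFlow 2.x, shows the rank, shape, and data type of a scalar, a vector, and a matrix, followed by shapes whose sizes or rank are not fully known. The variable names and values are chosen only for illustration.

```python
import tensorflow as tf

scalar = tf.constant(3.0)                     # zero-dimensional tensor
vector = tf.constant([1.0, 2.0, 3.0])         # one-dimensional tensor
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])  # two-dimensional tensor

for t in (scalar, vector, matrix):
    # All elements of a tensor share one dtype; shape.rank is the number of dimensions.
    print(t.dtype.name, t.shape, t.shape.rank)

# Known rank (2) with one unknown dimension size, e.g. a variable batch size
partial = tf.TensorShape([None, 10])

# Completely unknown shape: even the rank is unknown
unknown = tf.TensorShape(None)

print(partial.rank, partial.as_list())
print(unknown.rank)
```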

3. System Requirements

A user-friendly GUI is provided to allow convenient use of the system
without the need to understand the technical details of SMS spam
classification.

The program is written in the Python programming language.

System requirements for Python installation:

1. Operating system: Linux (Ubuntu 16.04 to 17.10) or Windows 7 to 10, with 2 GB RAM
(4 GB preferable).

2. Python 3.6 and related packages must be installed; follow the installation
instructions for your operating system. The project can also be run in Google Colab.

Most speech recognition systems require the following components to
operate effectively: speech recognition software, a compatible computer
and sound system, and a noise-cancelling microphone or headset. A
portable dictation recorder that lets a user dictate away from the computer
is optional.

For the program to produce the desired output, it is
necessary to find an accurate dataset, so that the results are precise and
reliable.

When programming in Python, the required libraries must be imported.


3.1 Dataset used:

The dataset used for the SMS spam detection model is the "SMS Spam Collection"
dataset. It contains 5,574 SMS messages in English, of which 4,827
are legitimate (ham) and 747 are spam. The dataset is available from the UCI
Machine Learning Repository and can be downloaded from the following link:
https://archive.ics.uci.edu/ml/datasets/sms+spam+collection

Dataset size: 503.6 KB

The dataset has two fields:

● Label: Indicates the label or class of the SMS message, which is either "ham" for legitimate messages
or "spam" for spam messages.

● Text: Contains the text content of the SMS message.

Each row in the dataset represents a single SMS message, with the label and text fields separated by
a tab character.
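
A minimal sketch of reading this tab-separated layout with the Python standard library. The two sample rows are made up so that the example runs without the downloaded file, and the function name read_sms is an assumption.

```python
import csv
import io

# Two invented rows in the label<TAB>text layout described above
sample = "ham\tSee you at five\nspam\tWIN a FREE prize now\n"

def read_sms(stream):
    """Yield (label, text) pairs from a tab-separated text stream."""
    # QUOTE_NONE keeps quotation marks inside messages as ordinary characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 2:  # skip blank or malformed lines
            yield row[0], row[1]

rows = list(read_sms(io.StringIO(sample)))
print(rows)
```

With the real data, the same function could be applied to the opened dataset file instead of the in-memory sample; the file name depends on how the archive was unpacked.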

4. Screenshot
5. Conclusion

In conclusion, building an SMS spam or ham classifier using Python can be a powerful tool
for filtering unwanted messages and improving SMS communication efficiency. The process
involves several key steps, including data cleaning, feature extraction, and model training.

During data cleaning, we pre-process the SMS dataset by removing unnecessary information
such as stop words, punctuation, and numbers, and converting all characters to lowercase.
Feature extraction involves transforming the pre-processed data into a numerical
representation that can be used for machine learning algorithms.
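
The cleaning and feature extraction steps described above can be sketched with the standard library as follows. The stop-word list is deliberately tiny, the bag-of-words (word count) representation is one possible choice of numerical features, and all function names are assumptions made for this example.

```python
import re
from collections import Counter

# A deliberately small stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "to", "you", "your", "at", "is", "and", "for"}

def clean(message):
    """Lowercase, drop punctuation and digits, and remove stop words."""
    text = message.lower()
    # Keep only ASCII letters and whitespace (this also drops accented letters).
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def build_vocabulary(token_lists):
    """Map every distinct token to a column index, in alphabetical order."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(tokens, vocabulary):
    """Turn a token list into a vector of word counts (bag of words)."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocabulary]

messages = ["WIN a FREE prize now!!!", "See you at 5, free for lunch?"]
tokens = [clean(m) for m in messages]
vocabulary = build_vocabulary(tokens)
vectors = [vectorize(t, vocabulary) for t in tokens]
print(vocabulary)
print(vectors)
```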

Once we have the features, we can use machine learning algorithms such as Naive Bayes,
Support Vector Machines, or Neural Networks to train our model on the labeled SMS dataset.
The trained model can then be used to classify incoming SMS messages as spam or ham in
real-time.
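
To make the training step concrete, the sketch below trains the simplest possible neural network, a single sigmoid unit, with gradient descent in NumPy. It is not the architecture used in this project; the vocabulary, the four toy messages, the learning rate, and the number of iterations are all assumptions chosen so that the example is small and deterministic.

```python
import numpy as np

# Toy bag-of-words counts. Columns correspond to the hypothetical vocabulary
# ["free", "prize", "win", "lunch", "see"]; each row is one message.
X = np.array([
    [1, 1, 1, 0, 0],   # spam
    [1, 0, 1, 0, 0],   # spam
    [0, 0, 0, 1, 1],   # ham
    [0, 0, 0, 1, 0],   # ham
], dtype=float)
y = np.array([1.0, 1.0, 0.0, 0.0])   # 1 = spam, 0 = ham

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

weights = np.zeros(X.shape[1])
bias = 0.0
learning_rate = 0.5

for _ in range(500):
    probs = sigmoid(X @ weights + bias)   # forward pass
    error = probs - y                     # gradient of cross-entropy w.r.t. the logits
    weights -= learning_rate * (X.T @ error) / len(y)
    bias -= learning_rate * error.mean()

def predict(features):
    """Classify one count vector as 'spam' or 'ham'."""
    return "spam" if sigmoid(features @ weights + bias) >= 0.5 else "ham"

print([predict(row) for row in X])
```

A network with hidden layers, for example one built with TensorFlow, follows the same pattern: a forward pass, a loss gradient, and repeated weight updates, followed by prediction on new messages.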

The performance of our SMS spam or ham classifier can be evaluated using metrics such as
accuracy, precision, recall, and F1-score. It is important to keep in mind that the performance
of the classifier can be affected by the quality and size of the dataset, the choice of features
and machine learning algorithms, and the presence of evolving spamming techniques.

Overall, building an SMS spam or ham classifier using Python can be a challenging but
rewarding task. With proper data preprocessing, feature extraction, and model training, we
can develop a highly accurate and efficient SMS classifier that can be used to filter out
unwanted messages and improve SMS communication.
