
SONAR ROCK VS MINE PREDICTION USING

LOGISTIC REGRESSION AND K-NEAREST


NEIGHBOR

A PROJECT REPORT

Submitted by

ROHITH B 211519205127
ROSHAN S B 211519205129
YUGENDRAN M 211519205182

in partial fulfillment for the award of the degree

of

BACHELOR OF TECHNOLOGY

IN

INFORMATION TECHNOLOGY

PANIMALAR INSTITUTE OF TECHNOLOGY


ANNA UNIVERSITY CHENNAI 600 025

MAY 2023
PANIMALAR INSTITUTE OF TECHNOLOGY
ANNA UNIVERSITY CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “SONAR ROCK VS MINE PREDICTION USING LOGISTIC REGRESSION & K-NEAREST NEIGHBOR” is the bonafide work of “ROHITH B (211519205127), ROSHAN S B (211519205129) AND YUGENDRAN M (211519205182)”, who carried out the project work under my supervision.

SIGNATURE

Dr. S. SUMA CHRISTAL MARY, M.E., Ph.D.,
HEAD OF THE DEPARTMENT
Department of Information Technology,
Panimalar Institute of Technology,
Poonamallee, Chennai 600 123

SIGNATURE

Mr. M. KRISHNARAJ, M.Tech.,
SUPERVISOR, ASSOCIATE PROFESSOR
Department of Information Technology,
Panimalar Institute of Technology,
Poonamallee, Chennai 600 123

Certified that the candidates were examined in the university project viva-voce held on ----------------------- at Panimalar Institute of Technology, Chennai 600 123.

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

A project of this magnitude and nature requires kind co-operation and support from many for successful completion. We wish to express our sincere thanks to all those who were involved in the completion of this project.

We seek the blessing of the Founder of our institution, Dr. JEPPIAAR, M.A., Ph.D., for having been a role model and our source of inspiration behind our success in education in his premier institution.

We would like to express our deep gratitude to our beloved Secretary and Correspondent, Dr. P. CHINNADURAI, M.A., Ph.D., for his kind words and enthusiastic motivation, which inspired us a lot in completing this project.

We also express our sincere thanks and gratitude to our dynamic Directors, Mrs. C. VIJAYA RAJESHWARI, Dr. C. SAKTHI KUMAR, M.E., Ph.D., and Dr. SARANYA SREE SAKTHI KUMAR, B.E., M.B.A., Ph.D., for providing us with the necessary facilities for the completion of this project.

We also express our appreciation and gratefulness to our respected Principal, Dr. T. JAYANTHY, M.E., Ph.D., who helped us in the completion of the project. We wish to convey our thanks and gratitude to our Head of the Department, Dr. S. SUMA CHRISTAL MARY, M.E., Ph.D., for her full support in providing ample time to complete our project. We express our indebtedness and special thanks to our Supervisor, Mr. M. KRISHNARAJ, M.Tech., for his expert advice and valuable information and guidance throughout the completion of the project.

Last, we thank our parents and friends for providing their extensive moral support and encouragement during the course of the project.

ABSTRACT

Naval defense systems rely on underwater mines as a critical layer of security, but these mines pose a risk to marine life and submarines due to their resemblance to rocks. To counter this threat, Mine Countermeasure (MCM) units utilize mine hunting, which involves the detection and classification of all mines within a suspicious area. This process typically involves using a sonar mounted on a ship or underwater vehicle to capture data, which is then analyzed by military personnel to identify mine-like objects (MLOs) and benign objects. To improve the accuracy of this system, it is necessary to develop a more precise prediction model. Such a model could significantly enhance the safety and reliability of underwater mine detection. However, the accuracy of the prediction model is directly linked to the quality of the data used to train it. To address this issue, we used a dataset obtained from sources such as Kaggle to train a machine learning model for underwater mine and rock classification. The dataset comprises 208 sonar signals recorded at 60 different angles, which capture the unique frequency characteristics of underwater objects. We compared the performance of two binary classifier models using Python and supervised machine learning algorithms, and selected the more accurate model for predictions.

TABLE OF CONTENTS

CHAPTER   TITLE

          ABSTRACT
          LIST OF FIGURES
          LIST OF SYMBOLS
          LIST OF ABBREVIATIONS
1.        INTRODUCTION
          1.1 OVERVIEW OF THE PROJECT
          1.2 SCOPE OF THE PROJECT
2.        LITERATURE SURVEY
3.        SYSTEM ANALYSIS
          3.1 EXISTING SYSTEM
              3.1.1 PROBLEM DEFINITION
          3.2 PROPOSED SYSTEM
              3.2.1 ADVANTAGES
4.        REQUIREMENTS SPECIFICATION
          4.1 INTRODUCTION
          4.2 HARDWARE AND SOFTWARE SPECIFICATIONS
              4.2.1 SOFTWARE REQUIREMENTS
              4.2.2 HARDWARE REQUIREMENTS
          4.3 PYTHON
              4.3.1 INTRODUCTION TO PYTHON
              4.3.2 ANACONDA NAVIGATOR
5.        SYSTEM DESIGN
          5.1 SYSTEM ARCHITECTURE
          5.2 UML DIAGRAMS
              5.2.1 USE CASE DIAGRAM
              5.2.2 SEQUENCE DIAGRAM
              5.2.3 CLASS DIAGRAM
              5.2.4 ACTIVITY DIAGRAM
              5.2.5 STATE DIAGRAM
          5.3 MODULES
              5.3.1 DATA PRE-PROCESSING
              5.3.2 FEATURE SELECTION AND REDUCTION
              5.3.3 SEGMENTATION
              5.3.4 PREDICTION USING MACHINE LEARNING
          5.4 ALGORITHM AND TECHNIQUES
          5.5 PYTHON PACKAGES
              5.5.1 SKLEARN
              5.5.2 NUMPY
              5.5.3 PANDAS
          5.6 LOGISTIC REGRESSION
          5.7 K-NEAREST NEIGHBOR
6.        CODING AND TESTING
          6.1 CODING
          6.2 CODING STANDARDS
              6.2.1 NAMING CONVENTIONS
              6.2.2 VALUE CONVENTIONS
              6.2.3 SCRIPT WRITING AND COMMENTING STANDARD
          6.3 TESTING
          6.4 TYPES OF TESTING
              6.4.1 UNIT TESTING
              6.4.2 FUNCTIONAL TESTING
              6.4.3 SYSTEM TESTING
              6.4.4 PERFORMANCE TESTING
              6.4.5 INTEGRATION TESTING
              6.4.6 PROGRAM TESTING
              6.4.7 VALIDATION TESTING
              6.4.8 ACCEPTANCE TESTING
          6.5 WHITE BOX AND BLACK BOX TESTING
              6.5.1 WHITE BOX TESTING
              6.5.2 BLACK BOX TESTING
          6.6 SOFTWARE TESTING STRATEGIES
7.        CONCLUSION AND FUTURE SCOPE
8.        APPENDICES
          APPENDIX A - SOURCE CODE
          APPENDIX B - SNAPSHOTS
          REFERENCES

LIST OF FIGURES

S.NO   NAME OF THE FIGURE

5.1    System Architecture Diagram
5.2    Use Case Diagram
5.3    Sequence Diagram
5.4    Class Diagram
5.5    Activity Diagram
5.6    State Diagram

LIST OF SYMBOLS

S.NO   NAME                  DESCRIPTION

1.     Actor                 A person, organization, or external system that plays a role in one or more interactions with the system.
2.     Communication         Communication between various use cases.
3.     State                 State of the process.
4.     Initial State         Initial state of the object.
5.     Final State           Final state of the object.
6.     Control Flow          Represents control flow between the states.
7.     Decision Box          Represents a decision-making process based on a constraint.
8.     Node                  Represents physical modules, which are collections of components.
9.     Data Process/State    A circle in a DFD represents a state or process that has been triggered by some event or action.
10.    External Entity       Represents external entities such as keyboards, sensors, etc.
11.    Transition            Represents communication that occurs between processes.
12.    Object Lifeline       Represents the vertical dimension along which the object communicates.
13.    Message               Represents the message exchanged.

CHAPTER 1
INTRODUCTION

1.1 OVERVIEW OF THE PROJECT

Underwater mines, also known as naval mines, have been used since the mid-19th century as self-contained explosive devices to deter enemy surface ships and submarines. Introduced by David Bushnell during the American Revolutionary War, these mines still pose a threat today, with approximately 5,000 remaining in the Adriatic Sea from both world wars. Unlike earlier versions that relied on physical contact for activation, modern mines can be triggered by acoustic, pressure, and magnetic changes in the water; these are known as influence mines. They are categorized as offensive or defensive warfare tools. Offensive mines are scattered across hostile shipping lanes to damage merchant ships and military vessels, while defensive mines are strategically placed along coastlines to divert enemy submarines and ships away from critical areas and into more heavily guarded zones. However, their resemblance to rocks in terms of shape, length, and width often leads to misidentification, necessitating precise input for accurate detection. One effective method for detecting mines is through the use of SONAR technology. SONAR, which stands for Sound Navigation and Ranging, utilizes sound waves to locate and detect objects underwater. Its applications extend beyond military purposes and include acoustic mine detection, fish finding, ocean floor mapping, and locating divers for non-military uses. The range and frequency of SONAR are limited because sound wave attenuation increases rapidly with frequency. For mine hunting, SONAR frequencies typically range from 0.1 to 1 MHz, with corresponding detection ranges of roughly 1 km down to 0.1 km. Ultrasonic waves are preferred over infrasonic waves in SONAR because their short wavelengths allow narrow, directional beams and finer resolution of small objects. SONAR is classified into two types: active and passive. Passive SONAR, also known as listening SONAR, detects sounds, while active SONAR employs a sound transmitter and receiver. When the transmitter emits a sound wave that hits an object, the wave reflects back and creates an echo. By analyzing the frequencies of the object's echo, the receiver determines its nature. In the case of detecting mines or rocks, the frequencies obtained by active SONAR at 60 different angles are used as input to discern between the two. Active SONAR typically operates at frequencies above 20 kHz. The process of mine countermeasures can be divided into four stages. First, detection involves locating targets using various signals such as acoustic or magnetic cues. Second, classification is employed to differentiate potential mines from harmless objects. Third, identification confirms the classification with the assistance of additional information from tools like underwater cameras. Lastly, disposal entails safely removing or destroying the detected mines.

Fig 1.1 Data flow diagram for the Machine Learning Model


1.2 SCOPE OF THE PROJECT

The objective of this study is to develop a highly accurate prediction


system for distinguishing between underwater mines and rocks using Sonar
signals. While underwater mines serve as a crucial component of naval
defense systems, their presence poses a significant threat to marine life and
submarine vessels, especially due to the potential misidentification of mines as
rocks. Therefore, there is a pressing need to develop an advanced system
capable of accurately classifying underwater objects to ensure the safety and
security of marine environments. To achieve this objective, the study utilizes a dataset provided by Gorman, R.P., and Sejnowski, T.J. (1988) that is specifically designed for training machine learning models. The dataset consists of Sonar signals capturing the frequencies of underwater objects from 60 different angles. By leveraging this data, the study aims to construct two binary classifier models, each tailored to maximize accuracy in differentiating
between mines and rocks. Python and supervised machine learning
classification algorithms are employed to build these prediction models. The
study explores various classification algorithms, selecting the most suitable
ones that yield the highest accuracy in prediction. By harnessing the power of
supervised machine learning techniques, the objective is to create robust and
reliable models capable of accurately classifying Sonar signals and providing
real-time predictions on the nature of the underwater objects encountered. By
achieving a high level of accuracy in predicting underwater mines versus
rocks, the proposed system aims to significantly enhance the safety and
efficiency of marine operations.

CHAPTER 2
LITERATURE SURVEY

REFERENCE 1:

TITLE: Application of machine learning algorithms for classification of


mines and rocks using sonar data.
AUTHORS: Abhishek Kumar and Sanjay Kumar Soni.

DESCRIPTION: The study uses a dataset of sonar readings collected from


underwater mines and rocks. The authors applied several machine learning
algorithms, including decision trees, random forests, support vector machines
(SVM), and k-nearest neighbors (KNN), to classify the sonar data. They
evaluated the performance of each algorithm based on several performance
metrics, such as accuracy, precision, recall, and F1-score.

DRAWBACKS:

The choice of algorithm used for classification can significantly affect the
accuracy of the results. The selection of the best algorithm for a specific task requires
expertise in machine learning.

REFERENCE 2:

TITLE: Mine Detection and Classification Using Support Vector Machines and
Principal Component Analysis

AUTHORS: Eun Jin Kim, Tae Seong Kim, and Soo Hyung Kim

DESCRIPTION: The study used a dataset of GPR signals collected from different
types of landmines buried in the ground. The authors applied PCA to reduce the
dimensionality of the dataset and SVM to classify the GPR signals into mine or non-mine categories. They evaluated the performance of the algorithm based on several
performance metrics, such as accuracy, sensitivity, and specificity.

DRAWBACKS:

Computational requirements: SVM can be computationally expensive,


especially when working with large datasets. This can limit the scalability of the algorithm for some applications.

REFERENCE 3:

TITLE: Mine Detection and Classification Using Neural Networks

AUTHORS: R. Todd Moon and W. Dale Blair.

DESCRIPTION: The study used a dataset of GPR signals collected from


different types of landmines buried in the ground. The authors used a
feedforward neural network with backpropagation to classify the GPR signals
into mine or non-mine categories. They evaluated the performance of the algorithm based on several performance metrics, such as accuracy, false alarm rate, and the receiver operating characteristic (ROC) curve.

DRAWBACKS:
Model complexity: Neural networks can be complex models that
require significant computational resources to train and deploy. This can limit
the scalability of the algorithm for some applications.

REFERENCE 4:

TITLE: Classification of Underwater Objects Using Neural Networks and Spectral


Analysis.
AUTHORS: Kalyanmoy Deb and S. Sanyal.

DESCRIPTION: The authors found that the neural network algorithm


combined with spectral analysis showed high performance in the classification
of underwater objects. They observed that the performance of the algorithm
improved when they used feature selection techniques to reduce the
dimensionality of the dataset. The paper concludes that the proposed method
can be an effective solution for the classification of underwater objects, which
can contribute to the safety of marine life and human activity.

DRAWBACKS:
Overfitting: Neural network models may overfit to the training data,
meaning that the model performs well on the training data but fails to
generalize to new, unseen data.

REFERENCE 5:

TITLE: Comparison of Machine Learning Algorithms for Underwater Object

Recognition and Classification

AUTHORS: Ashish Kumar and S. Sanyal.

DESCRIPTION: The study used a dataset of sonar signals collected from


different types of underwater objects, including mines, rocks, and metal
cylinders. The authors compared the performance of several machine learning
algorithms, including k nearest neighbors (KNN), support vector machines
(SVM), decision trees, and neural networks, for the classification of the objects
into different categories. They evaluated the performance of the algorithms
based on several performance metrics, such as accuracy, precision, recall, and
F1 score.

DRAWBACKS:

Interpretability: Machine learning algorithms can be difficult to


interpret, especially when using deep learning models. This can make it
challenging to understand how the model is making its classification
decisions.

CHAPTER 3
SYSTEM ANALYSIS

3.1 EXISTING SYSTEM:

The existing system for Sonar rock vs mine prediction relied on traditional methods, which had several disadvantages. The detection of mines was done by explosive ordnance disposal divers, marine mammals, video cameras on mine neutralization vehicles, and laser systems, which can be time-consuming and costly. Additionally, these methods had a limited range and were not highly accurate, leading to the risk of undetected mines. Moreover, the use of such equipment poses a risk to marine life, and the loss of human life cannot be ruled out. As technology improved, SONAR became a primary tool for detecting mines in the underwater environment. SONAR uses sound waves to detect objects in water and has proved to be an effective tool for detecting mines in real time. However, even the SONAR-based system has some limitations, such as the possibility of generating false positives, difficulty in detecting small-sized mines, and the need for constant calibration.

3.1.1 PROBLEM DEFINITION

 Existing methods for Sonar rock vs mine prediction are time-consuming and expensive, requiring specialized equipment and resources.
 Traditional methods have limited detection ranges and accuracy, risking undetected mines and compromising naval safety and security.

3.2 PROPOSED SYSTEM

We developed a predictive system for underwater mine detection using machine learning techniques on a dataset obtained by bouncing sonar signals off a metal cylinder and rocks at 60 different angles. Our models accurately distinguish between rocks and mines based on
their unique frequency characteristics. To enhance the safety and effectiveness of Mine
Countermeasure (MCM) units in identifying mines and protecting naval vessels, we
implemented computer-aided detection (CAD), computer-aided classification (CAC), and
automated target recognition (ATR) algorithms. These algorithms analyze texture-based,
geometrical, and spectral features in sonar images. While machine learning has
limitations, deep learning overcomes them with its ability to work with vast amounts of
data and perform automatic feature extraction, making it more reliable for mine detection
and classification.

Figure 2: (a) working process of ML; (b) working process of DL; (c) performance of ML and DL.

Deep learning algorithms in mine detection face challenges due to limited high-
quality data availability. Techniques like sonar data simulation and augmentation
address this issue. Transfer learning and algorithm fusion improve reliability.
Combining classical image processing with deep learning enhances performance
and mitigates unbalanced data effects. A review of recent and past methods in
mine detection and classification is presented. Sonar, utilizing acoustic waves,
measures the time taken for waves to bounce back from objects, allowing
distance calculation. Factors like water properties and frequency affect sonar
performance, and hydrophones are used for transmission and reception.
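As a simple worked example of the distance calculation: with round-trip travel time t and a nominal sound speed of about v = 1500 m/s in seawater, the range to the object is d = (v × t) / 2, so an echo returning after 0.4 s places the object roughly 300 m away.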

Figure 3

METHODOLOGIES:

Data collection: Gather relevant data on geology, mining techniques, equipment, etc., manually or using sensors.
Data preprocessing: Clean, scale, and split data for analysis, removing missing values and normalizing it.
Feature selection: Choose important features using domain knowledge or techniques like correlation analysis or PCA.
Model training: Train logistic regression and K-nearest neighbor models by adjusting parameters to minimize loss (a code sketch follows this list).
Model evaluation: Assess performance using metrics like accuracy, precision, recall, and F1 score.
Model tuning: Adjust parameters to optimize model performance, e.g., k in K-nearest neighbor.
Model deployment: Integrate optimized models with mining systems for real-world prediction scenarios.
Object detection: Detect mines using highlight and shadow segments despite environmental obstacles.
Image enhancement: Apply techniques like histogram equalization and denoising to normalize images for detection and classification.
Image segmentation: Identify mine-related regions like highlights and shadows using thresholding, MRF, fuzzy functions, or deep learning.
MLO detection: Use a Gabor-based deep neural network to accurately detect MLOs at multiple scales for AUVs.
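The classifier-related steps above can be sketched with scikit-learn; the file name, split ratio, and k-grid below are illustrative assumptions rather than values fixed by the project:

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score

# Data collection / preprocessing: load the 60-band sonar readings (assumed file name).
data = pd.read_csv('sonar.csv', header=None)
X, y = data.drop(columns=60), data[60]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=1)

# Model training: logistic regression baseline.
logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Model tuning: cross-validated search over k for K-nearest neighbors.
knn = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': list(range(1, 14))}, cv=5)
knn.fit(X_train, y_train)

# Model evaluation: accuracy and F1 on held-out data.
for name, clf in [('Logistic Regression', logit), ('kNN', knn)]:
    pred = clf.predict(X_test)
    print(name, accuracy_score(y_test, pred), f1_score(y_test, pred, pos_label='M'))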

ADVANTAGES:

Accurate prediction: Reliability is improved by using machine learning on a dataset obtained through SONAR simulations.
Diverse dataset: Data from various repositories increases diversity, improving model generalization.
Simulation-based data collection: A controlled environment allows precise labeling and avoids risks associated with real mines.
Multiple angle recordings: A dataset with 60 different angles enhances the models' ability to distinguish between rocks and mines.
Real-time detection potential: The system can be deployed for timely mine detection, enhancing safety and effectiveness.
Scalability and applicability: The system can adapt to different underwater environments and integrate with existing mine detection systems.

ARCHITECTURE DIAGRAM FOR PROPOSED SYSTEM:

Figure 4

CHAPTER 4
REQUIREMENT SPECIFICATIONS

4.1 INTRODUCTION

Requirements are the basic constraints that are required to develop a system. Requirements are collected while designing the system. The following are the requirements to be discussed:
• Functional requirements

• Non-Functional requirements

• Environment requirements

• Hardware requirements

• Software requirements

FUNCTIONAL REQUIREMENTS:
The software requirements specification is a technical specification of requirements for the software product. It is the first step in the requirements analysis process. It lists the requirements of a particular software system, including the special libraries used: scikit-learn, pandas, NumPy, Matplotlib, and Seaborn.

NON-FUNCTIONAL REQUIREMENTS:
• Problem definition

• Preparing data

• Evaluating algorithms

• Improving results

• Predicting the result

4.2 HARDWARE AND SOFTWARE SPECIFICATION

4.2.1 SOFTWARE REQUIREMENTS:

• Windows 7 or higher

• Colab – an online Python interpreter

4.2.2 HARDWARE REQUIREMENTS

• Processor – i3 or higher

• Hard Disk – 5 GB

• Memory – 1GB RAM

• Internet Connection

4.3 PYTHON

Python is an interpreted, high-level, general-purpose programming language. Its design philosophy emphasizes code readability, notably through its use of significant indentation. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects alike. Python is often described as a "batteries included" language because of its comprehensive standard library. Python is a multi-paradigm programming language: object-oriented programming and structured programming are fully supported, and many of its features support functional programming and aspect-oriented programming (including metaprogramming and metaobjects, i.e., magic methods). Several other paradigms, including design by contract and logic programming, are supported via extensions. Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage collector for memory management. It also features dynamic name resolution (late binding), which binds method and variable names during program execution.

Python's high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as for use as a scripting or glue language to connect existing components. Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form free of charge for all major platforms and may be freely distributed. Programmers often favor Python because of the increased productivity it provides: since there is no compilation step, the edit-test-debug cycle is remarkably fast. Debugging Python programs is easy; a bug or bad input never causes a segmentation fault. Instead, when the interpreter discovers an error, it raises an exception, and when the program does not catch the exception, the interpreter prints a stack trace. A source-level debugger allows inspection of local and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through the code one line at a time, and so on. The debugger is itself written in Python, testifying to Python's introspective power. On the other hand, often the fastest way to debug a program is to add a few print statements to the source; the quick edit-test-debug cycle makes this simple approach very effective.

Python is a dynamic programming language which supports several different programming paradigms, as follows:
 Procedural programming
 Object-oriented programming
 Functional programming

Further characteristics:
 Standard: Python byte code is executed in the Python interpreter (similar to Java), giving platform-independent code
 Extremely versatile language: website development, data analysis, server maintenance, numerical analysis
 Syntax is clear, easy to read and learn (almost pseudo-code)
 Common language
 Intuitive object-oriented programming
 Full modularity, hierarchical packages
 Comprehensive standard library for many tasks
 Big community
 Simply extendable via C/C++, wrapping of C/C++ libraries
 Focus: programming speed

4.3.1 INTRODUCTION TO PYTHON

Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0. Python 2.0 was released on 16 October 2000 and introduced major new features, such as list comprehensions, a garbage-collection system using reference counting and a cycle-detecting collector, and support for Unicode; the 2.x line was discontinued with version 2.7.18 in 2020.

Python 3.0 was released on 3 December 2008. It was a major revision of the language that is not fully backward-compatible, and much Python 2 code does not run unmodified on Python 3. Many of its major features were backported to the Python 2.6.x and 2.7.x version series. Releases of Python 3 include the 2to3 utility, which automates (at least partially) the translation of Python 2 code to Python 3. Python 2.7's end-of-life date was initially set for 2015 and then delayed to 2020 out of concern that a large body of existing code could not easily be forward-ported to Python 3. No more security patches or other improvements will be released for it. With Python 2's end-of-life, only Python 3.6.x and later are supported. Python 3.9.2 and 3.8.8 were expedited because all versions of Python (including 2.7) had security issues that could lead to remote code execution and web cache poisoning.

DESIGN PHILOSOPHY AND FEATURES

The standard library has two modules (itertools and functools) that implement functional tools borrowed from Haskell and Standard ML. Rather than having all of its functionality built into its core, Python was designed to be highly extensible (with modules). This compact modularity has made it particularly popular as a means of adding programmable interfaces to existing applications. Python strives for a simpler, less-cluttered syntax and grammar while giving developers a choice in their coding methodology. Python's developers strive to avoid premature optimization, and reject patches to non-critical parts of the CPython reference implementation that would offer marginal increases in speed at the cost of clarity. When speed is important, a Python programmer can move time-critical functions to extension modules written in languages such as C, or use PyPy, a just-in-time compiler. Cython is also available, which translates a Python script into C and makes direct C-level API calls into the Python interpreter.

SYNTAX AND SEMANTICS, INDENTATION

Python is meant to be an easily readable language. Its formatting is visually uncluttered, and it often uses English keywords where other languages use punctuation. Unlike many other languages, it does not use curly brackets to delimit blocks, and semicolons after statements are allowed but rarely, if ever, used. It has fewer syntactic exceptions and special cases than C or Pascal. Python uses whitespace indentation, rather than curly brackets or keywords, to delimit blocks.

TYPING

Python uses duck typing and has typed objects but untyped variable names. Type constraints are not checked at compile time; rather, operations on an object may fail at run time, signifying that the given object is not of a suitable type. Despite being dynamically typed, Python is strongly typed, forbidding operations that are not well-defined (for example, adding a number to a string) rather than silently trying to make sense of them. Python permits programmers to define their own types using classes, which are most commonly used for object-oriented programming. New instances of classes are constructed by calling the class (for example, SpamClass() or EggsClass()), and the classes are instances of the metaclass type (itself an instance of itself), permitting metaprogramming and reflection.

LIBRARIES

Python's large standard library, commonly cited as one of its greatest strengths, provides tools suited to many tasks. For Internet-facing applications, many standard formats and protocols such as MIME and HTTP are supported. It includes modules for creating graphical user interfaces, connecting to relational databases, generating pseudorandom numbers, arithmetic with arbitrary-precision decimals, manipulating regular expressions, and unit testing. Some parts of the standard library are covered by specifications (for example, the Web Server Gateway Interface (WSGI) implementation wsgiref follows PEP 333), but most modules are not; they are specified by their code, internal documentation, and test suites. However, because most of the standard library is cross-platform Python code, only a few modules need altering or rewriting for variant implementations. As of March 2021, the Python Package Index (PyPI), the official repository for third-party Python software, contains over 290,000 packages with a wide range of functionality, including:

 Automation
 Data analytics
 Databases
 Documentation
 Graphical user interfaces
 Image processing
 Machine learning
 Mobile apps
 Multimedia
 Computer networking
 Scientific computing
 System administration
 Test frameworks
 Text processing
 Web frameworks
 Web scraping

DEVELOPMENT ENVIRONMENTS

Most Python implementations (including CPython) include a read-eval-print loop (REPL), allowing them to function as a command-line interpreter in which the user enters statements sequentially and receives results immediately. Other shells, including IDLE and IPython, add further abilities such as improved auto-completion, session state retention, and syntax highlighting. As well as standard desktop integrated development environments, there are web browser-based IDEs: SageMath (intended for developing science and math-related Python programs); PythonAnywhere, a browser-based IDE and hosting environment; and Canopy IDE, a commercial Python IDE emphasizing scientific computing.

4.3.2 ANACONDA NAVIGATOR - JUPYTER NOTEBOOK

Anaconda is a distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment. The distribution includes data-science packages suitable for Windows, Linux, and macOS. It is developed and maintained by Anaconda, Inc., which was founded by Peter Wang and Travis Oliphant in 2012. As an Anaconda, Inc. product, it is also known as Anaconda Distribution or Anaconda Individual Edition, while other products from the company are Anaconda Team Edition and Anaconda Enterprise Edition, both of which are not free. The following applications are available by default in Navigator:
• JupyterLab
• Jupyter Notebook
• QtConsole
• Spyder
• Glue
• Orange
• RStudio
• Visual Studio Code

Jupyter Notebook (formerly IPython Notebook) is a web-based interactive computing environment for creating Jupyter notebook documents. The "notebook" term can informally refer to many different entities, mainly the Jupyter web application, the Jupyter Python web server, or the Jupyter document format, depending on context. A Jupyter Notebook document is a JSON document, following a versioned schema, containing an ordered list of input/output cells which can contain code, text (using Markdown), mathematics, plots, and rich media, usually ending with the ".ipynb" extension. A Jupyter Notebook can be converted to a number of open standard output formats (HTML, presentation slides, LaTeX, PDF, ReStructuredText, Markdown, Python) through "Download As" in the web interface, via the nbconvert library, or with the "jupyter nbconvert" command-line interface in a shell. To ease visualization of Jupyter notebook documents on the web, the nbconvert library is provided as a service through NbViewer, which can take a URL to any publicly available notebook document, convert it to HTML on the fly, and display it to the user.

Jupyter Notebook provides a browser-based REPL built upon a number of popular open-source libraries:
• IPython
• ØMQ (ZeroMQ)
• Tornado (web server)
• jQuery
• Bootstrap (front-end framework)
• MathJax

There are 49 Jupyter-compatible kernels for many programming languages, including Python, R, Julia, and Haskell. The Notebook interface was added to IPython in the 0.12 release (December 2011) and renamed to Jupyter Notebook in 2015 (IPython 4.0 to Jupyter 1.0).

CHAPTER 5
SYSTEM DESIGN

5.1 SYSTEM ARCHITECTURE:

LOGISTIC REGRESSION:

K-NEAREST NEIGHBOR:

5.2 UML DIAGRAMS

5.2.1 USECASE DIAGRAM


A Use Case Diagram is used to present a graphical overview of the functionality provided by a system in terms of actors, their goals, and any dependencies between those use cases. A Use Case describes a sequence of actions that provides something of measurable value to an actor and is drawn as a horizontal ellipse. An actor is a person, organization, or external system that plays a role in one or more interactions with the system.

Fig 5.1 Use Case Diagram

5.2.2 SEQUENCE DIAGRAM
A Sequence Diagram is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, or timing diagrams.

Fig 5.2 Sequence Diagram

5.2.3 CLASS DIAGRAM
A Class diagram in the Unified Modelling Language is a type of static
structure diagram that describes the structure of a system by showing the
system's classes, their attributes, operations (or methods), and the relationships
among objects.

Fig 5.3 Class Diagram

3
5.2.4 ACTIVITY DIAGRAM
An activity diagram is a graphical representation of workflows of stepwise activities and actions with support for choice, iteration and concurrency. An activity diagram shows the overall flow of control.
 Rounded rectangles represent activities.
 Diamonds represent decisions.
 Bars represent the start or end of concurrent activities.
 An encircled circle represents the end of the workflow.
 A black circle represents the start of the workflow.

Fig 5.4 Activity Diagram

5.2.5 STATE DIAGRAM

A state diagram is a graphical representation that models the behaviour of a single object, specifying the sequence of events that the object goes through during its lifetime in response to events. It is a preliminary step used to create an overview of the system.

Fig 5.5 State Diagram

5.3 MODULES:

 DATA PRE-PROCESSING
 FEATURE SELECTION AND REDUCTION
 SEGMENTATION
 PREDICTION USING MACHINE LEARNING ALGORITHM

LIBRARIES USED:

 NumPy: performs a wide variety of mathematical operations on arrays.
 Pandas: used for creating data frames.
 from sklearn.model_selection import train_test_split: splits our original data into training and test data.
 from sklearn.metrics import accuracy_score: evaluates our model to check how well it is performing.
 from sklearn.linear_model import LogisticRegression and from sklearn.neighbors import KNeighborsClassifier: statistical methods for predicting binary classes.
 import matplotlib.pyplot as plt: Matplotlib is a Python library used to create 2D graphs and plots from Python scripts.
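Put together (and with the module paths corrected), these imports form the following minimal block, which also matches the appendix code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split   # train/test splitting
from sklearn.linear_model import LogisticRegression    # logistic regression classifier
from sklearn.neighbors import KNeighborsClassifier     # k-nearest neighbor classifier
from sklearn.metrics import accuracy_score             # accuracy evaluation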

5.3.1 DATA PREPROCESSING:

Pandas: For data manipulation and preprocessing tasks such as loading the dataset, handling missing values, and data cleaning. NumPy: For numerical operations and array manipulations required during data preprocessing.
5.3.2 FEATURE SELECTION AND REDUCTION:

Scikit-learn: Offers various feature selection techniques such as


correlation analysis, recursive feature elimination (RFE), and principal
component analysis (PCA) for reducing the dimensionality of the dataset and
selecting relevant features.
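A brief sketch of the two routes mentioned above; the file name, component count, and feature count below are illustrative assumptions rather than values fixed by the project:

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Load the 60-band sonar dataset (assumed file name).
sonar = pd.read_csv('sonar.csv', header=None)
X, Y = sonar.drop(columns=60), sonar[60]

# PCA: project the 60 correlated bands onto a few principal components.
pca = PCA(n_components=10)                     # illustrative choice
X_reduced = pca.fit_transform(X)
print('variance retained:', pca.explained_variance_ratio_.sum())

# RFE: keep the bands a logistic model finds most informative.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=20)
X_selected = rfe.fit_transform(X, Y)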
5.3.3 SEGMENTATION

Scikit-image: Provides image processing functions for segmenting the


Sonar signals or images into highlight and shadow areas related to mines.
OpenCV: Offers tools and algorithms for image segmentation, contour
detection, and region-based segmentation.

5.3.4 PREDICTION USING MACHINE LEARNING:

Scikit-learn: Implements the Logistic Regression and K-Nearest Neighbor


algorithms for classification tasks. TensorFlow or PyTorch: Deep learning
frameworks that provide advanced neural network models for classification tasks,
such as convolutional neural networks (CNNs).

5.4 ALGORITHM AND TECHNIQUES

ALGORITHM USED:

In machine learning and statistics, classification is a supervised learning approach in which the computer program learns from the data input given to it and then uses this learning to classify new observations. The data may simply be bi-class (like identifying whether a person is male or female, or whether a mail is spam or non-spam) or it may be multi-class. Some examples of classification problems are speech recognition, handwriting recognition, biometric identification, and document classification. In supervised learning, algorithms learn from labeled data. After understanding the data, the algorithm determines which label should be given to new data by finding patterns and associating those patterns with the unlabeled new data.

5.5 USED PYTHON PACKAGES:


5.5.1 sklearn:
• In Python, sklearn is a machine learning package which includes a lot of ML algorithms.
• Here, we are using some of its modules, such as train_test_split, KNeighborsClassifier, LogisticRegression, and accuracy_score.

5.5.2 NumPy:
• It is a numeric python module which provides fast math functions for calculations.
• It is used to read data in NumPy arrays and for manipulation purpose.

5.5.3 Pandas:
• Used to read and write different files.

• Data manipulation can be done easily with data frames

5.6 LOGISTIC REGRESSION:


It is a statistical method for analyzing a data set in which there are one or more
independent variables that determine an outcome. The outcome is measured with a
dichotomous variable (in which there are only two possible outcomes). The goal of
logistic regression is to find the best fitting model to describe the relationship between the
dichotomous characteristic of interest (dependent variable = response or outcome
variable) and a set of independent (predictor or explanatory) variables. Logistic
regression is a Machine Learning classification algorithm that is used to predict the
probability of a categorical dependent variable. In logistic regression, the dependent
variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no,
failure, etc.).
In other words, the logistic regression model predicts P(Y=1) as a function of X.
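In symbolic form, with fitted coefficients b0, b1, ..., bn over features x1, ..., xn, this is the standard sigmoid:

P(Y = 1 | X) = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))

and an observation is labeled 1 when this estimated probability exceeds a chosen threshold such as 0.5.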

Logistic Regression Assumptions:

 Binary logistic regression requires the dependent variable to be binary.
 For a binary regression, the factor level 1 of the dependent variable should represent the desired outcome.
 Only the meaningful variables should be included.
 The independent variables should be independent of each other; that is, the model should have little or no multicollinearity.
 The independent variables are linearly related to the log odds.
 Logistic regression requires quite large sample sizes.

Figure 5

5.7 K-NEAREST NEIGHBOR:

To implement K-Nearest Neighbors (KNN) for prediction, load the dataset into a
pandas dataframe, split it into input features (X) and target variable (y), split the data into
training and testing sets, create a KNN model using KNeighborsClassifier, fit the model
to the training data, make predictions on the testing data, and evaluate the model's
performance using a confusion matrix.
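A compact sketch of those steps (the file path and split parameters are assumptions; k = 2 follows the choice made later in Appendix A):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

sonar = pd.read_csv('sonar.csv', header=None)      # 60 features + label column
X, y = sonar.drop(columns=60), sonar[60]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

knn = KNeighborsClassifier(n_neighbors=2)          # fit the KNN model
knn.fit(X_train, y_train)
print(confusion_matrix(y_test, knn.predict(X_test), labels=['M', 'R']))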

Assumptions of K-Nearest Neighbors:


Instance Similarity: KNN assumes that similar instances have similar labels,
calculated based on their feature values.

Local Smoothness: KNN assumes neighboring instances have similar labels due to similar
data distributions.
Optimal Value of K: KNN assumes an optimal value of K, impacting bias-variance trade-
off and prediction accuracy.
Feature Scaling: KNN is sensitive to feature scale and assumes equal contribution,
requiring appropriate scaling.
Noise and Outliers: KNN is affected by noise and outliers, potentially leading to inaccurate predictions.
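Because of the feature-scaling assumption above, a standardization step is often placed in front of KNN; a minimal sketch using scikit-learn's Pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Standardize each sonar band to zero mean and unit variance before distances are computed.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=2))
# scaled_knn.fit(X_train, y_train) then trains on the standardized features.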

CHAPTER 6
CODING AND TESTING

6.1 CODING
Once the design aspect of the system is finalized, the system enters the coding and testing phase. The coding phase brings the actual system into action by converting the design of the system into code in a given programming language. Therefore, a good coding style has to be adopted so that, whenever changes are required, they can easily be incorporated into the system.

6.2 CODING STANDARDS


Coding standards are guidelines for programming that focus on the physical structure and appearance of the program. They make the code easier to read, understand and maintain. This phase of the system actually implements the blueprint developed during the design phase. The coding specification should be such that any programmer is able to understand the code and can bring about changes whenever felt necessary.

6.2.1 NAMING CONVENTIONS


Naming conventions of classes, data members, member functions, procedures etc. should be self-descriptive. One should even get the meaning and scope of a variable from its name. The conventions are adopted for easy understanding of the intended message by the user, so it is customary to follow them. Class names are problem domain equivalents, begin with a capital letter, and have mixed case. Member function and data member names begin with a lowercase letter, with the first letter of each subsequent word in uppercase and the rest of the letters in lowercase.
6.2.2 VALUE CONVENTIONS
Value conventions ensure values for variables at any point of time. This involves the following:
 Proper default values for the variables.

 Proper validation of values in the field.

 Proper documentation of flag values.

6.2.3 SCRIPT WRITING AND COMMENTING STANDARD


Script writing is an art in which indentation is of utmost importance. Conditional and looping statements are to be properly aligned to facilitate easy understanding. Comments are included to minimize the number of surprises that could occur when going through the code.

6.3 TESTING

Testing is performed to identify errors. It is used for quality assurance. Testing is an integral part of the entire development and maintenance process. The goal of testing during this phase is to verify that the specification has been accurately and completely incorporated into the design, as well as to ensure the correctness of the design itself. For example, any logic faults in the design must be detected before coding commences; otherwise the cost of fixing the faults will be considerably higher.
Testing is one of the important steps in the software development phase. Testing checks for errors; as a whole, testing of the project involves the following test cases:
 Static analysis is used to investigate the structural properties of the Source
code.
 Dynamic testing is used to investigate the behavior of the source code by
executing the program on the test data.

6.4 TYPES OF TESTING


6.4.1 UNIT TESTING
Unit testing is conducted to verify the functional performance of each modular component of the software. Unit testing focuses on the smallest unit of the software design, i.e., the module. The white-box testing techniques were heavily employed for unit testing.

6.4.2 FUNCTIONAL TESTING

Functional tests provide systematic demonstrations that functions


tested are available as specified by the business and technical requirements,
system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
6.4.3 SYSTEM TESTING

System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.

6.4.4 PERFORMANCE TESTING

The performance test ensures that the output is produced within the time limits, covering the time taken by the system for compiling, responding to users, and processing requests sent to the system to retrieve results.

6.4.5 INTEGRATION TESTING

Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing. The following are the types of integration testing:
• Top-down Integration
• Bottom-up Integration

Top-down Integration

This method is an incremental approach to the construction of program structure. Modules are integrated by moving downward through the control hierarchy, beginning with the main program module. The modules subordinate to the main program module are incorporated into the structure in either a depth-first or breadth-first manner.

Bottom-up Integration

This method begins the construction and testing with the modules at the lowest level in the program structure. Since the modules are integrated from the bottom up, processing required for modules subordinate to a given level is always available and the need for stubs is eliminated.

6.4.6 PROGRAM TESTING:

Program testing points out logical and syntax errors. A syntax error is an error in a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error messages generated by the computer. The condition testing method focuses on testing each condition in the program; the purpose of condition testing is to detect not only errors in the conditions of a program but also other errors in the program.
6.4.7 VALIDATION TESTING

At the culmination of integration testing, software is completely assembled as a package. Interfacing errors have been uncovered and corrected, and a final series of software tests (validation testing) begins. Validation testing can be defined in many ways, but a simple definition is that validation succeeds when the software functions in a manner that is reasonably expected by the customer. Software validation is achieved through a series of black box tests that demonstrate conformity with requirements. After validation testing has been conducted, one of two conditions exists:
 The function or performance characteristics conform to specifications and are accepted.
 A deviation from specification is uncovered and a deficiency list is created.

6.4.8 USER ACCEPTANCE TESTING
User acceptance of the system is a key factor for the success of any system. The system under consideration is tested for user acceptance by constantly keeping in touch with prospective system users at the time of developing and making changes whenever required.

6.5 WHITE BOX AND BLACK BOX TESTING

6.5.1 WHITE BOX TESTING

This testing is also called glass box testing. In this testing, by knowing the internal operation of a product, tests can be conducted to ensure that "all gears mesh", that is, that the internal operation performs according to specification and all internal components have been adequately exercised. It is a test case design method that uses the control structure of the procedural design to derive test cases. Basis path testing is a white box testing technique.
Basis path testing:
 Flow graph notation
 Cyclomatic complexity
 Deriving test cases
 Graph matrices

6.5.2 BLACK BOX TESTING

In this testing, by knowing the specific functions that a product has been designed to perform, tests can be conducted that demonstrate each function is fully operational while at the same time searching for errors in each function. It fundamentally focuses on the functional requirements of the software.
The steps involved in black box test case design are:
 Graph-based testing methods
 Equivalence partitioning
 Boundary value analysis
 Comparison testing

6.6 SOFTWARE TESTING STRATEGIES


A software testing strategy provides a road map for the software developer. Testing is a set of activities that can be planned in advance and conducted systematically. For this reason, a template for software testing (a set of steps into which we can place specific test case design methods) should be defined. A testing strategy should have the following characteristics:

• Testing begins at the module level and works "outward" toward the integration of the entire computer-based system.
• Different testing techniques are appropriate at different points in time.
• The developer of the software and an independent test group conduct testing.
• Testing and debugging are different activities, but debugging must be accommodated in any testing strategy.

CHAPTER 7
CONCLUSION

After analyzing the given dataset, it was found that the k Nearest
Neighbors (kNN) algorithm outperformed the Logistic Regression algorithm
in accurately distinguishing between rocks and mines. This conclusion was
based on evaluating performance metrics including accuracy, precision,
recall, and F1 score. However, it is important to note that the superiority of
kNN does not imply it is always the optimal choice for every classification
problem. The effectiveness of a model depends on factors such as data
quality, preprocessing techniques, hyperparameter selection, and evaluation
metrics employed. Therefore, it is essential to consider the specific
characteristics of the dataset and choose the most suitable algorithm
accordingly.

FUTURE SCOPE

For future endeavors: in this project we built the model that gives the best accuracy among the machine learning algorithms considered, and using that model we created a web service with Streamlit that connects the back end of the ML model to the front end we designed. By deploying this system online, users can submit sonar readings and obtain an immediate rock-versus-mine classification. Future work involves enhancing the deployed model by incorporating additional algorithms and displaying the ROC curve for improved prediction. This web-based platform can also serve as a valuable second reference for Mine Countermeasure (MCM) personnel during mine-hunting operations.
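For the ROC curve mentioned above, a hedged sketch with scikit-learn, assuming the fitted LogisticRegression `model` and the X_test/Y_test split with 'M'/'R' labels used in Appendix A:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Probability that each test sample is a mine ('M').
probs = model.predict_proba(X_test)[:, list(model.classes_).index('M')]
fpr, tpr, _ = roc_curve(Y_test, probs, pos_label='M')
plt.plot(fpr, tpr, label='AUC = %.2f' % auc(fpr, tpr))
plt.plot([0, 1], [0, 1], linestyle='--')    # chance line
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()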

APPENDICES

APPENDIX A - SOURCE CODE:

Importing the Dependencies

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

Data Collection and Data Processing


# loading the dataset to a pandas DataFrame
sonar_data = pd.read_csv('/content/Sonar.csv', header=None)

# print first 5 rows of the dataset
sonar_data.head()

# number of rows and columns
sonar_data.shape

(208, 61)

sonar_data.describe() #describe --> statistical measures of the data

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

sonar_data[60].value_counts()

M    111
R     97
Name: 60, dtype: int64

sonar_data[60].value_counts().plot(kind="bar")

M --> Mine

R --> Rock

Now, let us group this data of mines and rocks through the mean function.

sonar_data.groupby(60).mean()

Model Development

Then, we shall proceed to label the inputs and the output. Here, 'Y' is the target variable, and whether the detected substance is rock or mine would be predicted based on the inputs provided.

# separating data and Labels


X = sonar_data.drop(columns=60, axis=1)
Y = sonar_data[60]

print(X)
print(Y)
Training and Test data
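The split call itself appears to have been lost in extraction; a reconstruction consistent with the 145/63 train/test counts reported later (the random_state value is an assumption):

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)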
The above line of code splits the data into train and test datasets to measure how well the model generalizes to unseen data. Now, we shall develop the model with the help of 2 different algorithms, viz. kNN and Logistic Regression.

Model development with k-NN

kNN works by selecting the number k of neighbors, followed by a calculation of Euclidean distance. The number of data points in each category among those neighbors is then counted, and the new data point is assigned to the majority category. Let's look at the lines of code.

neighbors = np.arange(1,14)
train_accuracy =np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))
# Now, we shall fit the kNN classifier to the training data

for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, Y_train)
    train_accuracy[i] = knn.score(X_train, Y_train)
    test_accuracy[i] = knn.score(X_test, Y_test)

# Now, we shall plot the number of neighbors against accuracy to select the most suited number of neighbors.

plt.title('k-NN Varying number of neighbors')
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training accuracy')
plt.legend()
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')
plt.show()

# From the above plot, it can be seen that accuracy for both the training as well as the testing data decreases with the increasing number of neighbors, so k=2 would be a safe number to assume.

knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(X_train,Y_train)

y_pred = knn.predict(X_test)

Model development with Logistic Regression

We shall perform some more data pre-processing before fitting logistic regression to the
training set.

print(X.shape, X_train.shape, X_test.shape)
# Through the above line of code, we get to know the number of rows and columns of the test and train datasets.

print(X_train)
print(Y_train)

#Now, we shall fit logistic regression to the training set.

model=LogisticRegression()
model.fit(X_train,Y_train)
# Evaluation of models through accuracy score and confusion matrix

kNN:

knn.score(X_test, Y_test)

The model gave us an accuracy score of 84%. For a better understanding of the model, a confusion matrix through frequency tables can be seen.

pd.crosstab(Y_test, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)

In this model, mines were predicted as mines 34 times, mines were predicted as rocks 1 time, rocks were predicted as mines 9 times, and rocks were predicted as rocks 19 times.

Logistic Regression

score=model.score(X_test,Y_test)
print(score)
0.8095238095238095

The model gave us an accuracy score of 81%. The confusion matrix can be further examined to understand true positives, false positives, true negatives, and false negatives.

prediction=model.predict(X_test)
pd.crosstab(Y_test, prediction, rownames=['True'], colnames=['Predicted'], margins=True)

In this model, mines were predicted as mines 30 times, mines were predicted as rocks 5 times, rocks were predicted as mines 7 times, and rocks were predicted as rocks 21 times.
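The same table can also be produced with the confusion_matrix utility imported earlier (a small sketch; the explicit labels argument is our addition to pin down the row/column order):

cm = confusion_matrix(Y_test, prediction, labels=['M', 'R'])
print(cm)  # rows: true M, true R; columns: predicted M, predicted R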

PREDICTING MINE FOR SINGLE INPUT USING LOGISTIC REGRESSION

print(X_train)
print(Y_train)

Model Training --> Logistic Regression

model = LogisticRegression()

# training the Logistic Regression model with training data
model.fit(X_train, Y_train)

Model Evaluation

# accuracy on training data
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(X_train_prediction, Y_train)
print('Accuracy on training data : ', training_data_accuracy)

Accuracy on training data :  0.8551724137931035

#accuracy on test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(X_test_prediction, Y_test)
print('Accuracy on test data : ', test_data_accuracy)
Accuracy on test data : 0.8095238095238095
Making a Predictive System

input_data = (0.0307, 0.0523, 0.0653, 0.0521, 0.0611, 0.0577, 0.0665, 0.0664, 0.1460, 0.2792,
              0.3877, 0.4992, 0.4981, 0.4972, 0.5607, 0.7339, 0.8230, 0.9173, 0.9975, 0.9911,
              0.8240, 0.6498, 0.5980, 0.4862, 0.3150, 0.1543, 0.0989, 0.0284, 0.1008, 0.2636,
              0.2694, 0.2930, 0.2925, 0.3998, 0.3660, 0.3172, 0.4609, 0.4374, 0.1820, 0.3376,
              0.6202, 0.4448, 0.1863, 0.1420, 0.0589, 0.0576, 0.0672, 0.0269, 0.0245, 0.0190,
              0.0063, 0.0321, 0.0189, 0.0137, 0.0277, 0.0152, 0.0052, 0.0121, 0.0124, 0.0055)

# changing the input_data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)

# reshape the np array as we are predicting for one instance
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)

prediction = model.predict(input_data_reshaped)
print(prediction)

if (prediction[0] == 'R'):
    print('The object is a Rock')
else:
    print('The object is a mine')

['M']
The object is a mine
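For reuse, the steps above can be wrapped in a small helper function (an illustrative sketch of our own; the name predict_object does not appear in the original code):

def predict_object(model, features):
    # features: a sequence of 60 sonar energy readings
    sample = np.asarray(features).reshape(1, -1)  # one row, 60 columns
    label = model.predict(sample)[0]
    return 'Rock' if label == 'R' else 'Mine'

print(predict_object(model, input_data))  # -> Mine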

Conclusion

It can be seen that, of the two models, kNN performed better than Logistic Regression at accurately distinguishing rocks from mines. Machine learning algorithms further recommended in this field are support vector machines, principal component analysis, and C-means clustering.

# Define the logistic function
def logistic(x):
    return 1 / (1 + np.exp(-x))

# Generate some data
x = np.linspace(-10, 10, 100)
y = logistic(x)

# Plot the S-curve
plt.plot(x, y)
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Logistic Function (S-Curve)')
plt.show()
APPENDIX B- SNAPSHOTS:
MODEL DEVELOPMENT WITH KNN:

PREDICTING USING K-NEAREST NEIGHBOR:

OUTPUT (KNN):

PREDICTING USING LOGISTIC REGRESSION:

OUTPUT FOR LOGISTIC REGRESSION:

MODEL DEVELOPMENT FOR LOGISTIC REGRESSION:

PREDICTION USING SINGLE INPUT (LOGISTIC REGRESSION):

MODEL EVALUATION AND OUTPUT:

USER INTERFACE FEATURING PREDICTION SYSTEM

USER INTERFACE FEATURING RESULTS PREDICTION PAGES


FOR ROCK DATASET:

FOR MINE DATASET:

M --> Mine
R --> Rock

M    111
R     97
Name: 60, dtype: int64
REFERENCES

[1] Stanisław Hożyń. "A Review of Underwater Mine Detection and Classification in Sonar Imagery" (2021).

[2] Sri Ramya Yaramasu, Uppada Sai Gayatri, Vadlamani Manjusha, Vaishna C Bhanu, Koda Indu. "UNDERWATER MINE & ROCK PREDICTION BY EVALUATION OF," (2016).

[3] H. Jia, J. Wu, Y. Sun, and J. Zhang. "A mine ventilation network model
based on KNN algorithm and logistic regression," (2017).

[4] Wang, J. Yang, and Y. Li. "Research on prediction of mine water inrush
based on K-nearest neighbor algorithm and logistic regression," (2016).

[5] S. Du, G. Zhang, and F. Chen. "Research on mine roof monitoring and
prediction model based on K-nearest neighbor algorithm and logistic
regression," (2016).

[6] Y. Guo, S. Hu, and X. Zhang. "Research on the prediction model of mine
water inrush based on logistic regression and KNN algorithm," (2020)

[7] S. Das, P. Dutta, S. Bhattacharjee, and P. Dey, "Sonar-based mine


classification using logistic regression and k-nearest neighbor,"
(2018).

[8] L. Wang, Z. Li, X. Zhang, and Y. Chen, "A comparative study of logistic
regression and k-nearest neighbor for sonar rock vs. mine classification,"
(2019).

[9] M. Huang, C. Liu, Y. Wang, and X. Li, "Sonar data analysis for rock vs.
mine classification using logistic regression and k-nearest neighbor," (2017).

[10] R. Sharma, S. Singh, and R. Kumar, "Performance analysis of logistic


regression and k-nearest neighbor algorithms for sonar rock vs. mine
prediction," (2020).

[11] A. Patel, S. Patel, and S. Shah, "Comparative study of logistic regression and k-nearest neighbor for underwater mine detection using sonar signals," (2019).

[12] H. Gupta, P. Chauhan, and S. Singh, "Sonar rock vs. mine prediction
using logistic regression and k-nearest neighbor: A case study," (2018).

[13] K. Roy, S. Sarkar, and S. Das, "Evaluation of logistic regression and k-nearest
neighbor algorithms for sonar-based mine classification," (2017).

[14] A. Verma, R. Kumar, and A. Pandey, "Comparative analysis of logistic


regression and k-nearest neighbor for sonar rock vs. mine discrimination,"
(2020).

[15] N. Sharma, S. Gupta, and S. Singh, "Performance evaluation of logistic


regression and k-nearest neighbor in sonar rock vs. mine classification,"
(2019).

[16] V. Singh, M. Mishra, and S. Kumar, "A study on sonar rock vs. mine
prediction using logistic regression and k-nearest neighbor algorithms,"
(2016).

JOURNAL PAPER

SONAR ROCK VS MINE PREDICTION USING LOGISTIC REGRESSION & K-NEAREST NEIGHBOR

Mr. M. Krishnaraj, Associate Professor (1), Rohith B (2), Roshan S B (3), Yugendran M (4)

1. Department of Information Technology, Panimalar Institute Of Technology, Tamil Nadu - 600123, India, Monykrishnaraj@gmail.com
2. Department of Information Technology, Panimalar Institute Of Technology, Tamil Nadu - 600123, India, rohithrohithbalaji2002@gmail.com
3. Department of Information Technology, Panimalar Institute Of Technology, Tamil Nadu - 600123, India, roshan24001@gmail.com
4. Department of Information Technology, Panimalar Institute Of Technology, Tamil Nadu - 600123, India, yugeyou@gmail.com

ABSTRACT: Naval defense systems rely on the use of underwater mines as a critical layer of security, but these mines pose a risk to marine life and submarines due to their resemblance to rocks. To counter this threat, Mine Countermeasure (MCM) units utilize mine hunting, which involves the detection and classification of all mines within a suspicious area. This process typically involves using a sonar mounted on a ship or underwater vehicle to capture data, which is then analyzed by military personnel to identify mine-like objects (MLOs) and benign objects. To improve the accuracy of this system, it is necessary to develop a more precise prediction model. Such a model could significantly enhance the safety and reliability of underwater mine detection. However, the accuracy of the prediction model is directly linked to the quality of the data used to train it. To address this issue, we used a dataset obtained from sources such as Kaggle to train a machine learning model for underwater mine and rock classification. The dataset comprises 208 sonar signals recorded at 60 different angles, which capture the unique frequency characteristics of underwater objects. We compared the performance of three binary classifier models using Python and supervised machine learning algorithms, and selected the most accurate model for predictions.

KEYWORDS: mine detection; mine classification; sonar imagery; mine countermeasure; mine-like object

I. INTRODUCTION

Underwater Mines:
To stop enemy surface ships and submarines, self-contained explosive devices known as underwater or naval mines have been utilized since the mid-19th century. David Bushnell introduced sea mines in 1777, during the American Revolutionary War, and today an estimated 5,000 naval mines from both world wars remain in the Adriatic Sea. While early mines were triggered only by physical contact, modern mines can be activated through acoustic, pressure, and magnetic changes in the water, and are referred to as influence mines. Underwater mines are classified as either offensive or defensive: the former are strewn across hostile shipping lanes to damage merchant ships and military boats, while the latter are placed along coastlines to divert enemy submarines and ships away from critical locations and into more heavily guarded areas. Due to their resemblance to rocks in terms of shape, length, and width, mines are often misidentified. To avoid such confusion, a more precise input is needed to achieve an accurate output. One effective method of detecting mines is through the use of SONAR technology.
Sonar:
The SONAR system uses sound waves to locate and detect objects underwater. SONAR has numerous applications, including acoustic mine detection for military purposes, and finding fish, mapping the ocean floor, and locating divers for non-military purposes. The range and frequency of SONAR are limited due to the rapid increase in sound wave attenuation with frequency. SONAR frequencies typically range from 0.1 to 1 MHz for mine hunting, with a range of 1 to 0.1 km. Ultrasonic waves are preferred over infrasonic waves in SONAR, as the latter do not propagate well underwater. SONAR is divided into two types: active and passive. Passive SONAR only detects sounds and is therefore called listening SONAR. Active SONAR uses a sound transmitter and receiver. When a sound wave from the transmitter hits an object, it reflects back and creates an echo. The receiver records the frequencies of the object's echo to determine its nature. In this case, we use the frequencies obtained by active SONAR at 60 different angles as input to determine whether the target is a mine or a rock. The frequency of active SONAR is typically around 20 kHz.

The process of mine countermeasures is typically broken down into four stages:
1. detection, which involves locating targets through various signals such as acoustic or magnetic;
2. classification, which distinguishes between potential mines and harmless objects;
3. identification, which confirms the classification with the aid of additional information from tools like underwater cameras;
4. disposal, which involves safely removing or destroying the mine.

II. LITERATURE SURVEY

[1] Lu Feng, et al., "Application of Logistic Regression in Predicting the Occurrence of Coal and Gas Outburst" (2019). This paper discusses the use of logistic regression to predict the occurrence of coal and gas outburst in underground coal mines. The authors found that the logistic regression model had a high accuracy rate and could effectively predict coal and gas outburst.

[2] Yanping Wei and Wenqiang Sun, "K-Nearest Neighbor Method for Predicting Coal and Gas Outburst" (2019). Explores the use of KNN to predict coal and gas outburst in coal mines. The authors found that KNN had a higher prediction accuracy than traditional methods and could effectively predict coal and gas outburst.

[3] Bakytbek Jumabekov, et al., "Logistic Regression and K-Nearest Neighbor Techniques in Predicting the Occurrence of Gold Mineralization in the Kyrgyz Republic" (2019). This paper discusses the use of logistic regression and KNN to predict the occurrence of gold mineralization in the Kyrgyz Republic. The authors found that both techniques had high prediction accuracy, with KNN outperforming logistic regression.

[4] A. Akande, et al., "A Comparative Study of Logistic Regression and K-Nearest Neighbor for Prediction of Mine Accidents" (2016). This paper compares the performance of logistic regression and KNN in predicting mine accidents. The authors found that both techniques had high accuracy rates, but KNN outperformed logistic regression.

[5] Lingling Zhang, et al., "Application of Logistic Regression and K-Nearest Neighbor to Predicting Rockburst in Deep Metal Mines" (2020). This paper discusses the use of logistic regression and KNN to predict rockburst in deep metal mines. The authors found that both techniques had high prediction accuracy, with KNN outperforming logistic regression.
[6] S. K. Das and S. K. Pal, "Prediction of roof fall risk in underground coal mines using logistic regression and decision tree analysis" (2015). This paper presents a comparative study of logistic regression and decision tree analysis for predicting the risk of roof fall in underground coal mines. The authors found that logistic regression had a higher accuracy than decision tree analysis.

III. EXISTING SYSTEM

The existing system for sonar rock vs mine prediction using traditional methods had several disadvantages. The detection of mines was done by explosive ordnance disposal divers, marine mammals, video cameras on mine neutralization vehicles, and laser systems, which can be time-consuming and costly. Additionally, these methods had a limited range and were not highly accurate, leading to the risk of undetected mines. Moreover, the use of such equipment can pose a risk to marine life, and the loss of human life cannot be ruled out. These traditional methods had limitations and risks associated with them, such as harm to marine life, insufficient accuracy, and the potential loss of human life.

As technology improved, SONAR became a primary tool for detecting mines in the underwater environment. SONAR uses sound waves to detect objects in water and has proved to be an effective tool for detecting mines in real time. However, even the SONAR-based system has some limitations, such as the possibility of generating false positives, difficulty in detecting small-sized mines, and the need for constant calibration.

IV. PROPOSED SYSTEM

We developed a predictive system using a dataset from "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets." The dataset was obtained by striking metal cylinders with sonar signals from 60 angles. The output frequencies were used as input to predict if the object is a rock or a mine using classification machine learning techniques. To accurately classify objects as either rocks or mines, we used classification techniques in our machine learning models. These models were trained on the dataset to ensure they could accurately distinguish between rocks and mines based on their unique frequency characteristics. By utilizing these machine learning techniques, we aim to improve the accuracy and reliability of underwater mine detection and reduce the risk of harm to marine life and submarines. Our predictive system has the potential to enhance the safety and effectiveness of Mine Countermeasure (MCM) units in identifying mines and protecting naval vessels.

In order to decrease the workload of technical operators and reduce post-mission analysis time, computer-aided detection (CAD), computer-aided classification (CAC), and automated target recognition (ATR) algorithms have been implemented. These algorithms analyze various types of image characteristics, which can be classified into three categories:
1. texture-based features, such as patterns and local variations of image intensity;
2. geometrical features, like length and area;
3. spectral features, including color and energy.

Mine detection and classification can be based (a) on segmentation, or (b) on texture feature extraction.

Various detection and classification methods for detecting mines in sonar images have been developed, including classical image processing, machine learning (ML), and deep learning (DL) techniques. While ML has limitations in terms of time, reliability, and background information, DL overcomes these limitations with its ability to work with unstructured and structured data and perform automatic feature extraction. DL also works efficiently with vast amounts of data and is more reliable.
Transfer learning and algorithm fusion are also
employed to improve reliability in mine detection
and classification methods. Combining classical
image processing with deep learning can further
enhance performance and mitigate the negative
impact of unbalanced data. A review of recent and
past methods in mine detection and classification is
presented.

Figure 1. Background, shadow and highlight combination.

Machine learning automatically detects or predicts patterns in data, but it requires high-quality imaging data and has limitations. Deep learning, which uses artificial neural networks, overcomes these limitations and is more reliable.

Underwater Sonar:
Sonar is an acronym for SOund Navigation And Ranging, which is a technique used to locate objects underwater by transmitting and receiving acoustic waves. The basic principle of sonar is to send out an acoustic wave that travels through water and bounces back when it hits an object. By measuring the time it takes for the sound wave to travel to the object and back, the distance to the object can be calculated. The speed of sound in water is faster than in air and is affected by factors such as depth, temperature, pH, and salinity. The frequency of acoustic waves used in sonar ranges from 10 Hz to 1 MHz, depending on the application. The sonar equation is used to calculate the energy required to detect a target, taking into account factors such as detection threshold, source level, transmission loss, target strength, noise level, and directivity index. Sonar systems use hydrophones to transmit and receive acoustic waves, and the accuracy of range estimation depends on the pulse length and bandwidth of the acoustic signal.

Figure 2. (a) Working process of ML; (b) working process of DL; (c) performance of ML and DL.

Deep learning algorithms require large amounts of


data for training, but obtaining high-quality data
for mine detection remains a challenge due to
confidentiality and lack of publicly available
datasets. To address this, techniques such as sonar
data simulation and data augmentation are used.

Figure 3. Basic imaging geometry.
Figure 4. Sector-scan sonar.
Figure 6. Acoustic reflection.

V. METHODOLOGIES:

1. Data collection: The first step in any data-driven prediction task is to collect relevant data. In the case of mine prediction, this can involve collecting data on factors such as geology, mining techniques, equipment, environmental conditions, and safety records. This data can be collected manually, through surveys or observations, or automatically, using sensors and other monitoring technologies.
2. Data preprocessing: Once the data is collected, it needs to be preprocessed to make it suitable for analysis. This can involve tasks such as cleaning the data, removing missing values, scaling and normalizing the data, and splitting the data into training and testing sets.
3. Feature selection: The next step is to select the most relevant features to be used in the prediction model. This can involve using domain knowledge to identify relevant features, or using feature selection techniques such as correlation analysis or principal component analysis (PCA).
4. Model training: Once the data is preprocessed and the features are selected, the logistic regression and K-nearest neighbor models can be trained using the training data. The models are trained by adjusting their parameters to minimize a loss function, such as mean squared error or cross-entropy loss.
5. Model evaluation: After the models
are trained, they need to be evaluated to assess
their performance. This can involve testing the
models on the testing data and calculating metrics
such as accuracy, precision, recall, and F1 score.
6. Model tuning: If the model performance is not satisfactory, model parameters can be adjusted to improve performance. For example, in the case of K-nearest neighbor, the value of k can be adjusted to optimize prediction accuracy (a sketch illustrating steps 5 and 6 follows this list).
7. Model deployment: Once the models
are trained and their parameters are optimized, they
can be deployed for prediction in real-world
scenarios. This can involve integrating the models
with existing monitoring and control systems in
mining operations.
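As an illustration of steps 5 and 6, model tuning and evaluation can be automated with cross-validation (our own scikit-learn sketch; the report itself tunes k with the manual loop shown earlier):

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Step 6 (model tuning): search candidate k values with 5-fold cross-validation
param_grid = {'n_neighbors': range(1, 14)}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, Y_train)

# Step 5 (model evaluation): score the selected model on the held-out test set
print(search.best_params_, search.score(X_test, Y_test))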

Figure 5. Synthetic-aperture sonar.

Highlights and Shadows:
In sonar imaging, a target's reflectivity determines the size of the highlight and shadow regions produced. The target's structure, thickness, size, and shape affect its reflectivity, while the shadow is caused by wave obstruction. Shadows provide information on target characteristics and can also represent bottom depressions, but they must be combined with highlights to locate targets of interest.

Object Detection:
Object detection in sonar is challenging due to environmental obstacles, but mines can be detected due to their highlight and shadow segments.

Image enhancement:
Image enhancement techniques such as histogram equalization, filters, and wavelet-based denoising reduce noise and normalize images to prepare for object detection and classification. The choice of technique should align with the detection and classification schemes.

Image segmentation:
Image segmentation is widely used in sonar imagery for identifying homogeneous regions such as highlight and shadow areas related to mines. Various techniques such as thresholding, MRF, fuzzy functions, and deep learning-based methods have been employed for segmentation, providing improved accuracy and reduced computational resources. Sparse reconstruction-based classification, efficient convolutional networks, and statistically-based algorithms have also been proposed for target classification, achieving robustness against noise, occlusion, and clutter.

Figure 7. Image segmentation.

MLO Detection:
Various techniques, such as thresholding, MRF, fuzzy functions, and deep learning-based methods, have been used for image segmentation in sonar imagery to identify regions like highlight and shadow areas related to mines. A Gabor-based deep neural network architecture was developed to detect MLOs. It merges strong and weak features to accurately detect MLOs at multiple scales, providing an effective method for AUVs in terms of accuracy and model size.

Figure 8. Template's geometry.
Figure 9. Template of an MLO.

Figure 10. Architecture diagram.

WORKING PROCEDURE:
Step 1: Firstly, we collect the dataset and prepare it by cleaning it through exploratory data analysis.
Step 2: Then, we divide the dataset into training and testing sets and evaluate various classification models.
Step 3: After evaluating the models, we select the top three performers, namely KNN, SVM, and Logistic Regression.
Step 4: We assess the accuracy of these models and generate a classification report.
Step 5: Next, we train the models to create an efficient and accurate prediction system.
Step 6: Finally, we use the predictive systems to determine whether the object is a Mine or a Rock.

Evaluation of classification models:
When choosing an ML model, factors like performance, complexity, dataset size, and understandability must be considered. Evaluating the model beforehand is crucial to improve its performance. Classification metrics are used to assess various techniques and determine their potential. The simplest way to evaluate a model is to measure its accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
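Applied to the kNN confusion matrix reported earlier (taking mines as the positive class: TP = 34, TN = 19, FN = 1, FP = 9 on the 63-sample test set), this gives (34 + 19) / 63 ≈ 0.84, matching the 84% accuracy quoted above.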

VI. RESULTS

KNN:
KNN is a basic classification algorithm for pattern recognition based on feature categorization. The data is split into dependent and independent variables, with categorical values in the dataset. The KNN model is fit to the train and test data with the optimal k value (in this case, k=3).

Logistic Regression:
Logistic Regression predicts binary outcomes by analyzing the relationship between dependent and independent variables. After partitioning the dataset, we fit the model using a binary encoder and the liblinear solver, evaluate its performance, and use it to create a prediction system.
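A minimal sketch of this configuration in scikit-learn (the LabelEncoder step is our illustrative stand-in for the binary encoding mentioned above; scikit-learn can also consume the string labels 'M'/'R' directly):

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# encode the string labels M/R as binary 0/1 values
encoder = LabelEncoder()
y_train_encoded = encoder.fit_transform(Y_train)

# fit logistic regression with the liblinear solver
clf = LogisticRegression(solver='liblinear')
clf.fit(X_train, y_train_encoded)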
VII. FUTURE ENHANCEMENT:

1. Feature engineering: The performance of logistic regression and K-nearest neighbor models can be improved by identifying and engineering more relevant features. This can involve using domain knowledge to select or create new features, or using techniques such as principal component analysis (PCA) to extract useful information from the data.
2. Ensemble methods: Ensemble methods, such as bagging, boosting, and random forests, can be used to combine multiple logistic regression and K-nearest neighbor models to improve prediction accuracy. This can be particularly effective when there is a large amount of data and multiple models can be trained on different subsets of the data.
3. Deep learning: Deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), can be used to automatically learn useful features from the data and improve prediction accuracy. These techniques have been successful in a variety of applications and may be particularly useful when dealing with complex data or when the relationship between features and outcomes is nonlinear.
4. Model interpretability: While logistic regression and K-nearest neighbor are relatively interpretable models, there is still room for improvement in understanding how they arrive at their predictions. Future research can focus on developing methods to explain the decisions made by these models, which can increase trust and confidence in their predictions.
5. Real-time prediction: Mining operations require real-time decision making, and therefore there is a need for mine prediction models that can make predictions quickly and accurately. Future research can focus on developing models that can make predictions in real time and can be integrated with existing monitoring and control systems.
6. Improved data collection: The accuracy of mine prediction models relies on the quality of the data used to train them. Future research can focus on improving data collection methods, such as using sensors and other monitoring technologies, to collect more accurate and comprehensive data. This can help to improve the accuracy and usefulness of mine prediction models.

VIII. CONCLUSION

Our project aims to detect rocks and mines in the ocean bed to prevent the negative economic and environmental impacts caused by naval mines. The traditional methods of detection involve sonar signals and manpower, with the former being the safer option. The collected data is stored in a CSV file and analyzed using various machine learning techniques to build an accurate prediction model. Python, being open-source and fast, is used to create a cost-effective solution. The objective of this project is to simplify the process of mine detection and improve its efficiency.

IX. REFERENCES

[1] Sri Ramya Yaramasu, Uppada Sai Gayatri, Vadlamani Manjusha, Vaishna C Bhanu, Koda Indu. "A Review of Underwater Mine Detection and Classification in Sonar Imagery."
[2] B. Zhu, L. Wang, and Y. Zhao. "A study on mine safety early warning model based on logistic regression and KNN algorithm" (2016).
[3] H. Jia, J. Wu, Y. Sun, and J. Zhang. "A mine ventilation network model based on KNN algorithm and logistic regression" (2017).
[4] Wang, J. Yang, and Y. Li. "Research on prediction of mine water inrush based on K-nearest neighbor algorithm and logistic regression" (2016).
[5] S. Du, G. Zhang, and F. Chen. "Research on mine roof monitoring and prediction model based on K-nearest neighbor algorithm and logistic regression" (2016).
[6] Y. Guo, S. Hu, and X. Zhang. "Research on the prediction model of mine water inrush based on logistic regression and KNN algorithm" (2020).
