
Data Hiding using Steganography (Audio, Video, Images)

A project report submitted in partial fulfillment


of the requirements for the award of the degree of

Master
of
Computer Application
Submitted by
STUDENT_NAME
ROLL_NO
Under the esteemed guidance of
GUIDE_NAME
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


ST.MARY’S GROUP OF INSTITUTIONS GUNTUR (Affiliated
to JNTU Kakinada, Approved by AICTE, Accredited by NBA)
CHEBROLU-522 212, A.P, INDIA
2014-16
ST. MARY’S GROUP OF INSTITUTIONS, CHEBROLU, GUNTUR
(Affiliated to JNTU Kakinada)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the project report entitled "PROJECT_NAME" is the bonafide record of
project work carried out by STUDENT_NAME, a student of this college, during the academic
year 2014 - 2016, in partial fulfillment of the requirements for the award of the degree of Master
of Computer Application from St. Mary's Group of Institutions Guntur of Jawaharlal Nehru
Technological University, Kakinada.

GUIDE_NAME,
Asst. Professor
(Project Guide)

Associate Professor
(Head of Department, CSE)
DECLARATION

We hereby declare that the project report entitled "PROJECT_NAME" is an original work
done at St. Mary's Group of Institutions Guntur, Chebrolu, Guntur and submitted in fulfillment
of the requirements for the award of Master of Computer Application, to St. Mary's Group of
Institutions Guntur, Chebrolu, Guntur.

STUDENT_NAME
ROLL_NO
ACKNOWLEDGEMENT

We consider it a privilege to thank all those who helped us in the successful completion of the
project "PROJECT_NAME". We extend special gratitude to our guide GUIDE_NAME,
Asst. Professor, whose stimulating suggestions, encouragement, valuable guidance and
comprehensive assistance helped us coordinate our project, especially in writing this report and
presenting the project "PROJECT_NAME".
We would also like to acknowledge with much appreciation the crucial role of our Coordinator
GUIDE_NAME, Asst. Professor, in helping us complete our project. We thank you for being
such a wonderful educator as well as a person.
We express our heartfelt thanks to HOD_NAME, Head of the Department, CSE, for his
readily shared knowledge, which helped us carry this project through the academic year.

STUDENT_NAME
ROLL_NO
ABSTRACT:

With the growing spread of information, the various types of steganography are first surveyed.
Video steganography and its procedures are then examined as a starting point for exploring the
different forms of steganography in detail. In this report, image steganography is combined with
several techniques, such as least significant bit (LSB) transformations and multiple least
significant bits. Finally, we discuss the compression techniques used in the method that enables
steganography in images, audio, and video. The goal of this system is to give users a secure,
efficient way to encode and decode their data while hiding it inside image, video, and audio
files, which can then be combined to create steganographic files. The secret text information is
successfully embedded in image, audio, and video, and the image, audio, and video files can
later be interpreted to extract the hidden text. In this method, the suitable portions of a video are
treated as a collection of pictures or frames that carry the encrypted secure data; the image case
is handled in a similar manner. The use of steganography in the current digital era can be
attributed both to people's desire to conceal communication over a medium full of potential
eavesdroppers and to the technique's ability to secure the transmission of text, video, image, and
audio data.

TABLE OF CONTENTS

TITLE PAGE NO.
1. ABSTRACT 6
2. INTRODUCTION
2.1 SYSTEM ANALYSIS
2.2 PROPOSED SYSTEM 9
2.3 OVERVIEW OF THE PROJECT 9
3. LITERATURE SURVEY 11
3.1 REQUIREMENT SPECIFICATIONS 13
3.2 HARDWARE AND SOFTWARE SPECIFICATIONS 15
3.3 TECHNOLOGIES USED 18
3.4 INTRODUCTION TO PYTHON 20
3.5 MACHINE LEARNING 24
3.6 SUPERVISED LEARNING 26
4. DESIGN AND IMPLEMENTATION CONSTRAINTS 27
4.1 CONSTRAINTS IN ANALYSIS 30
4.2 CONSTRAINTS IN DESIGN 34
5. DESIGN AND IMPLEMENTATION 38
6. ARCHITECTURE DIAGRAM 43
7. MODULES 45
8. CODING AND TESTING 50
9. APPENDICES 52

SYNOPSIS
INTRODUCTION

Steganography is the process of concealing sensitive information within a regular, public file or
communication so that it cannot be read before it reaches its intended recipient. The proposed
system acts as a steganography tool and combines the capabilities of the following techniques:
image steganography, audio steganography, video steganography, and image-to-image
steganography. It is a web application built using Python and Flask. According to the definition
of confidentiality, the only person who may access a message's contents is the intended
recipient; only the sender and the intended recipient should have access to them. An example is
information on military applications passed from one higher authority to another. Traffic
analysis is the attack that puts message secrecy at risk through interception between the sender
and the recipient. According to the integrity principle, a message's contents should not be
changed unless the change has been authorised. [1] Only authorised individuals and authorised
mechanisms may change information. Integrity ensures that data is received exactly as it was
supplied by a legitimate party. Threats to integrity include modification and masquerading
attacks, threats to data integrity (ensuring that information is altered only in allowed ways), and
threats to system integrity (ensuring that systems operate as intended). Steganography, the
practise of concealing information in ways that thwart the detection of the hidden signal, also
protects it from unwanted alteration. The word steganography is Greek and literally translates to
"covered writing". It uses various covert communication techniques to hide the very existence
of messages. Steganography is the art of concealing important or secret information within
something that does not seem unusual. Because steganography and cryptology both serve to
secure sensitive information, they are sometimes mistaken for one another. The distinction
between the two is that steganography conceals information while making it seem as though
nothing is concealed at all: when a person looks at an object that has information hidden in it, he
or she does not try to decipher it, because they are unaware that there is hidden information.
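As a rough illustration of the Flask web front end mentioned above, the sketch below shows how a route could accept a cover image and a secret message and return the stego file. The route name, form fields, and the embed_text placeholder are illustrative assumptions, not the report's actual interface.

# A minimal Flask sketch (assumed route and field names, not the report's actual code).
from flask import Flask, request, send_file

app = Flask(__name__)

def embed_text(cover_path, stego_path, message):
    # Placeholder: the chosen LSB embedding routine would go here;
    # for now the cover file is simply copied so the sketch runs end to end.
    import shutil
    shutil.copy(cover_path, stego_path)

@app.route("/encode", methods=["POST"])
def encode():
    cover = request.files["cover"]       # uploaded cover image
    message = request.form["message"]    # secret text to hide
    cover.save("cover.png")
    embed_text("cover.png", "stego.png", message)
    return send_file("stego.png", as_attachment=True)

if __name__ == "__main__":
    app.run(debug=True)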

figure 1: security principles (confidentiality, integrity, security).

1.1. Image Steganography: In image steganography, the secret data is hidden in the pixel intensity values. When the cover object chosen is an image, this is called image steganography.

1.2. Text Steganography: In this technique, a secret text message is hidden inside the cover image.

1.3. Audio Steganography: This technique is also called phonetic steganography because the secret message is hidden in a voice or audio signal. The hidden messages can be kept confidential even further if they are encrypted.

1.4. Video Steganography: This format is used to hide any type of file inside a cover video file. It can be more secure than other media files because of the size and complexity of video.

figure 2: different mediums used to achieve steganography (text, image, audio, video).

II. STEGANOGRAPHY FEATURES

❖ Loading capacity: It indicates how much concealed information the cover picture can hold. The embedding rate is specified in absolute terms, such as the length of the hidden message.

❖ Security: A steganographic system is considered safe when there is only a subtle difference between the cover image and the steganographic image. Hidden data should never be made public without the consent of the main user, who holds the password.

❖ Cost efficiency: The cost efficiency of any steganography approach is determined by two factors: data concealing and data retrieval.

❖ Quality: To prevent the quality of the data from declining, it is important to use an acceptable amount of data and a sound methodology.

❖ Imperceptibility: The system is flawless and indistinguishable when the human eye cannot tell the difference between the cover image and the steganographic image.

❖ Accuracy: The data or information extracted from the media ought to be trustworthy and correct.

III. TERMINOLOGIES USED IN STEGANOGRAPHY

1. Cover image: The real picture serving as a carrier for the concealed file is called the cover image.
2. Stego object: The concealed information that is placed into the cover (image, audio, video).
3. Message: The information that is genuinely concealed, whether it is an image, a video, an audio file, or plain text.
4. Embedding algorithm: The algorithm employed to conceal the information within the picture.

IV. STEGANOGRAPHY METHODOLOGY

The least significant bit (LSB) algorithm is the most widely used steganography technique. The least significant bit is the bit in the binary 1s place of a binary integer, just as the most significant bit (MSB) occupies the highest-order position of the integer. The LSB is sometimes referred to as the low-order bit or right-most bit. Image steganography is typically accomplished with the LSB substitution method. Images generally carry more pixel information than is perceptually used, and least significant bit techniques operate under the assumption that small changes in pixel values will not result in noticeable changes. The secret data is first converted into binary form. The cover picture is scanned to find the least significant bits in its noisy regions, and those least significant bits are then replaced with the binary bits of the secret data.

1 0 0 1 0 1 0 1

The number 149 is shown above as an 8-bit binary value, with its rightmost bit highlighted. If we set that bit to 0, we get 10010100, which is 148 in decimal. Changing any other bit in the binary number above would have a larger impact. Therefore, in this illustration, the rightmost bit is the least significant, because changing it has the least effect on the original number.

Each pixel value in a digital image is a number, and together these numbers make up the image. In a digital colour picture, each of the red, green, and blue channels is represented by eight bits, so each channel can take a value between 0 and 255 to reflect the pixel's intensity. White is represented by (R, G, B) = (255, 255, 255), while black is represented by (R, G, B) = (0, 0, 0).

As an example, suppose we wish to conceal the letter A within the following array of pixels:

(R, G, B) = (11101010, 11101001, 11001010) (10111001, 11001011, 11101000) (11001001, 00100100, 11101001)

The letter A has the ASCII code 65, which is 01000001 in binary. Using LSB substitution, we change the least significant bit of each value in the array as needed and obtain:

(R, G, B) = (11101010, 11101001, 11001010) (10111000, 11001010, 11101000) (11001000, 00100101, 11101001)

Once A has been concealed in the pixel array in this way, let us use Python to conceal some text in the picture, as sketched below.
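Continuing the pixel example, the following is a minimal sketch of LSB text embedding and extraction with Pillow and NumPy. The file names cover.png and stego.png and the null-byte terminator are illustrative assumptions rather than part of the original report; saving to a lossless format such as PNG is required so that the modified bits survive.

# A minimal LSB embedding/extraction sketch (assumed file names and terminator).
import numpy as np
from PIL import Image

def embed_text(cover_path, stego_path, message):
    # Append a null terminator so the decoder knows where the message ends.
    data = (message + "\0").encode("utf-8")
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

    pixels = np.array(Image.open(cover_path).convert("RGB"))
    flat = pixels.flatten()
    if len(bits) > flat.size:
        raise ValueError("Message is too large for this cover image")

    # Overwrite the least significant bit of the first len(bits) channel values.
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.array(bits, dtype=np.uint8)
    Image.fromarray(flat.reshape(pixels.shape)).save(stego_path)  # lossless format

def extract_text(stego_path):
    flat = np.array(Image.open(stego_path).convert("RGB")).flatten()
    out = bytearray()
    for i in range(0, len(flat) - 7, 8):
        byte = 0
        for bit in flat[i:i + 8] & 1:          # read back the LSBs, 8 at a time
            byte = (byte << 1) | int(bit)
        if byte == 0:                          # null terminator marks the end
            break
        out.append(byte)
    return out.decode("utf-8", errors="ignore")

embed_text("cover.png", "stego.png", "A secret message")
print(extract_text("stego.png"))               # -> "A secret message"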

figure 3. a) image comparison: encoded image (left) vs. original image (right).

Benefits of the LSB approach include:
1. The quality of the main image is preserved.
2. Increased capacity for information storage.

Drawbacks of the LSB approach include:
1. Low robustness; visual data may be lost.
2. Hidden data can be easily destroyed by attacks.

4.1 Mathematical model

The secret data's maximum and minimum values are first established; all other values are then subtracted from this maximum value. Any steganographic method, often known as a "stego-algorithm", is made up of a stego-function f and its inverse f⁻¹. The stego-function creates the stego-image S from the cover-image C and the information I. At the receiving end, the stego-image S is supplied to a decoding algorithm that recovers the information by mathematically inverting the stego-function. This may be expressed mathematically as

S = f(C, I) and I = f⁻¹(S)

The following formulas are commonly used to evaluate LSB steganography.

MSE and RMSE (Root Mean Squared Error): The mean squared error averages the squared differences between the original and steganography images, and its square root yields the RMSE. It measures the absolute fit, that is, how closely the steganography image matches the original:

MSE = (1 / (M × N)) Σᵢ Σⱼ [ I(i, j) − F(i, j) ]²,   RMSE = √MSE

where i and j are the pixel positions in the image, I(i, j) is the original image value, F(i, j) is the steganography image value, and M and N are the dimensions of the image.

PSNR (Peak Signal to Noise Ratio): The PSNR is the measure most commonly used for the quality of reconstruction of extracted images. It is defined as

PSNR = 10 log₁₀( MAX² / MSE )

where MAX, the maximum pixel value possible for an image with 8 bits per sample, is 255.

4.2 Architectural model

The steganography model deals with hiding data within data, that is, writing hidden messages in such a way that only the sender and receiver understand the content of the message and have access to the hidden information. The system architecture of the least significant bit technique for image steganography is depicted in the flow diagram in figure 4.b).

figure 4. b) architectural model

V. RESULT & DISCUSSION

We treat the grayscale/RGB image as the cover image, as shown in Figure 5.1 and Figure 5.2, and the text/image file as the secret message for the LSB technique, and then output the steganography image.

Sl. No.    Image         SNR (dB)    MSE       PSNR (dB)
1          Gray image    59.5043     0.0463    61.4733
2          RGB image     61.3649     0.021     67.6697

Table: PSNR of least significant bit encoding
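As a companion to the formulas above, the following is a minimal sketch of computing MSE and PSNR for a cover/stego pair with NumPy, assuming both images are already loaded as arrays of identical shape with 8 bits per sample.

# A minimal sketch of the MSE / PSNR metrics defined above (8-bit images assumed).
import numpy as np

def mse(original, stego):
    diff = original.astype(np.float64) - stego.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(original, stego, max_value=255.0):
    error = mse(original, stego)
    if error == 0:
        return float("inf")          # identical images: no distortion at all
    return 10 * np.log10(max_value ** 2 / error)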

CHAPTER 2

SYSTEM ANALYSIS

2.1 EXISTING SYSTEM

The existing system for Data Hiding using Steganography reflects the current landscape of
steganographic techniques and tools available in the domain. Steganography, as a field, has seen
significant development over the years, with various methods and applications designed to
conceal information within different types of media, including audio, video, and images.

Current steganographic tools typically operate by modifying specific features of the cover media
to embed hidden data without causing noticeable alterations. In the context of audio
steganography, conventional methods often involve manipulating the least significant bits
(LSBs) of audio samples, ensuring that the changes are imperceptible to the human ear. Video
steganography techniques focus on concealing information within the frames of video files,
employing strategies to minimize visual impact during playback. Image steganography
commonly utilizes techniques such as LSB substitution or frequency domain methods to embed
data within image files.
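To make the audio case concrete, the sketch below hides a short text message in the least significant bits of the samples of an uncompressed WAV file using the standard-library wave module. The file names and the null terminator are illustrative assumptions, and only the lowest byte of each sample is touched so the change stays imperceptible.

# A minimal audio LSB sketch for uncompressed PCM WAV files (assumed file names).
import wave

def embed_in_wav(cover_wav, stego_wav, message):
    with wave.open(cover_wav, "rb") as wav:
        params = wav.getparams()
        frames = bytearray(wav.readframes(wav.getnframes()))

    # One message bit per audio sample, written into the sample's least
    # significant byte (byte 0 of each little-endian sample).
    step = params.sampwidth
    bits = [(byte >> i) & 1 for byte in (message + "\0").encode("utf-8")
            for i in range(7, -1, -1)]
    if len(bits) * step > len(frames):
        raise ValueError("Message is too large for this audio file")

    for n, bit in enumerate(bits):
        pos = n * step
        frames[pos] = (frames[pos] & 0xFE) | bit   # replace the sample's LSB

    with wave.open(stego_wav, "wb") as out:
        out.setparams(params)
        out.writeframes(bytes(frames))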

While these existing methods have proven effective in concealing information, they may face
challenges related to security and robustness. Traditional LSB-based methods, for example, are
vulnerable to statistical attacks that analyze the distribution of LSB values. Additionally, existing
systems may lack adaptability, making them less effective when dealing with diverse media
formats and characteristics.

One of the common limitations of the existing systems is their reliance on straightforward
encryption methods. While encryption adds a layer of security to the embedded data,
advancements in cryptanalysis may pose threats to the confidentiality of the concealed
information. Moreover, existing systems may not incorporate advanced noise addition
techniques, making them potentially susceptible to detection through statistical analysis or
pattern recognition.

User interfaces of existing steganographic tools vary in terms of accessibility and user-
friendliness. Some tools may provide a straightforward and intuitive interface, enabling users to
interact seamlessly with the system. However, others may have a steeper learning curve, limiting
their adoption by users with varying levels of technical expertise.

Overall, the existing system landscape for Data Hiding using Steganography offers a range of
tools with different capabilities and limitations. While these tools serve the purpose of
concealing information within media files, the evolution of security threats and the need for more

adaptable and robust solutions underscore the importance of exploring enhancements and
advancements in steganographic techniques. The proposed system aims to address these
limitations by introducing innovative features and security measures to ensure a more secure,
efficient, and user-friendly steganographic tool.

2.2 PROPOSED SYSTEM

The proposed system for Data Hiding using Steganography in Python introduces
enhancements and advancements to the existing steganographic techniques, aiming to provide a
more secure and efficient means of concealing data within different media types, including
audio, video, and images.

The primary focus of the proposed system lies in strengthening the security measures
and optimizing the embedding and extraction processes. One key enhancement is the
incorporation of more sophisticated encryption techniques for the embedded data. By
implementing robust encryption algorithms, the system ensures an additional layer of protection,
making it more resilient against potential attacks.
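For instance, a minimal sketch of adding such an encryption layer with the Fernet recipe from the cryptography package is shown below; the specific cipher choice is an assumption, since the report does not prescribe one.

# A minimal sketch of encrypting the secret text before it is embedded.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # must be shared with the receiver out of band
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"secret message")   # embed this instead of the plaintext
plaintext = cipher.decrypt(ciphertext)           # run after extraction at the receiver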

Moreover, the proposed system introduces adaptive embedding strategies that


dynamically adjust based on the characteristics of the cover media. This adaptability enhances
the system's overall performance and robustness, ensuring optimal data hiding capacity while
minimizing the risk of detection. The adaptability feature is particularly beneficial when dealing
with various media formats, as it tailors the embedding process to the specific nuances of each
type.

To further improve security, the proposed system includes advanced noise addition
techniques during the embedding process. This helps mask any potential patterns that might arise
from the data hiding, making it even more challenging for adversaries to detect the concealed
information. The noise addition is carefully calibrated to maintain the natural appearance of the
cover media, ensuring that the embedded data remains hidden from visual or auditory inspection.
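One simple way such calibrated noise could be realised, sketched below under the assumption of an 8-bit image array, is to flip the least significant bit of a small random fraction of the channel values that carry no payload; the 1% rate is illustrative only.

# A minimal calibrated-noise sketch (payload region and 1% rate are assumptions).
import numpy as np

def add_lsb_noise(pixels, payload_len, rate=0.01, seed=None):
    rng = np.random.default_rng(seed)
    flat = pixels.flatten()
    free = np.arange(payload_len, flat.size)          # indices not used by the payload
    idx = rng.choice(free, size=int(free.size * rate), replace=False)
    flat[idx] ^= 1                                    # flip the least significant bit
    return flat.reshape(pixels.shape)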

Additionally, the user interface of the proposed system undergoes refinements to
enhance user experience. Intuitive controls, informative feedback, and streamlined navigation
contribute to a more user-friendly interaction. The goal is to empower users with a tool that not
only ensures security and efficiency but is also accessible to individuals with varying levels of
technical expertise.

In terms of performance optimization, the proposed system explores parallel processing


techniques to expedite the embedding and extraction processes, particularly when dealing with
large media files. This enhancement contributes to a more efficient and responsive system,
accommodating users working with diverse data sizes and complexities.
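As an illustration of this idea, the sketch below parallelises a per-frame embedding routine with the standard library's ProcessPoolExecutor. The embed_frame placeholder and the chunking scheme are assumptions; the report does not fix a specific parallelisation design.

# A minimal parallel-embedding sketch (embed_frame is a placeholder, not the report's routine).
from concurrent.futures import ProcessPoolExecutor

def embed_frame(args):
    frame, payload_chunk = args
    # Placeholder: run the chosen LSB routine on this frame's pixel data here.
    return frame

def embed_all(frames, payload_chunks):
    # Each (frame, payload chunk) pair is processed in a separate worker process.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(embed_frame, zip(frames, payload_chunks)))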

Furthermore, the proposed system introduces a feedback mechanism that provides users
with insights into the potential risks associated with the selected embedding parameters. This
feature enhances user awareness and assists in making informed decisions about the trade-offs
between data hiding capacity and potential detectability.

Overall, the proposed system for Data Hiding using Steganography in Python
represents a significant advancement in the field of secure communication. By introducing
sophisticated encryption, adaptive strategies, noise addition techniques, and user interface
refinements, the system aims to offer an unparalleled combination of security, efficiency, and
user-friendliness. These enhancements collectively contribute to a robust and versatile
steganographic tool, addressing the evolving needs of users seeking to transmit sensitive
information covertly and securely within various media formats.

CHAPTER 3

REQUIREMENT SPECIFICATIONS AND ANALYSIS

Requirement Analysis for "Data Hiding using Steganography in Python"

Requirement analysis plays a pivotal role in the development of any software system, laying the
foundation for a comprehensive understanding of user needs and system functionalities. In the
case of the proposed system for Data Hiding using Steganography in Python, the requirement
analysis encompasses various aspects, including functional and non-functional requirements,
user expectations, and system constraints.

Functional Requirements:

1. Data Embedding and Extraction:

The system must facilitate the seamless embedding of data within different media types,
including audio, video, and images. Additionally, it should provide robust mechanisms for
extracting concealed information without loss or corruption.

2. Encryption Integration:

To enhance security, the system should support the integration of advanced encryption
techniques during the embedding process. This ensures the confidentiality of the hidden data and
protects against unauthorized access.

3. Adaptive Embedding Strategies:

The system must employ adaptive strategies for embedding data based on the characteristics of
the cover media. This adaptability ensures optimal hiding capacity while minimizing the risk of
detection.

4. Noise Addition Mechanism:

Incorporate an advanced noise addition mechanism during the embedding process to mask
potential patterns and enhance the system's resistance to statistical and pattern recognition
attacks.

5. User Interface Design:

Develop an intuitive and user-friendly interface that accommodates users with varying levels
of technical expertise. The interface should offer clear controls, informative feedback, and
streamlined navigation for a seamless user experience.

6. Parallel Processing Optimization:

Implement parallel processing techniques to optimize the embedding and extraction processes,
especially when dealing with large media files. This optimization contributes to improved system
efficiency and responsiveness.

7. Feedback Mechanism:

Introduce a feedback mechanism that provides users with insights into the potential risks
associated with selected embedding parameters. This feature empowers users to make informed
decisions regarding the trade-offs between data hiding capacity and potential detectability.

Non-Functional Requirements:

1. Security:

The system must prioritize the security of the concealed information, employing robust
encryption and adaptive embedding strategies to thwart various attacks.

2. Efficiency:

Ensure efficient and timely embedding and extraction processes, even when dealing with large
media files. Parallel processing contributes to improved performance.

3. Robustness:

The system should be robust, adapting to the diverse characteristics of different media formats
while maintaining optimal hiding capacity.

4. Usability:

Prioritize usability by designing an intuitive user interface that caters to users with different
levels of technical expertise. User feedback and controls should be clear and accessible.

5. Scalability:

Design the system to be scalable, accommodating varying data sizes and complexities without
compromising performance or security.

User Expectations:

Users expect a steganographic tool that not only provides a secure and efficient means of hiding
data but also ensures a positive and user-friendly experience. They anticipate a tool that aligns
with their specific needs for covert communication while offering features that enhance security
and adaptability.

System Constraints:

1. Processing Power:

The system's performance may be constrained by the processing power of the user's machine,
especially when dealing with resource-intensive tasks like parallel processing.

2. Media Format Compatibility:

Constraints may arise when dealing with media formats that are less common or have unique
characteristics. The system should strive for compatibility with a wide range of media formats.

3. Cryptographic Algorithm Limitations:

The cryptographic algorithms used for encryption may have limitations, and the system must
be designed to adapt to advancements in cryptanalysis to maintain data confidentiality.

In conclusion, the requirement analysis for the proposed system emphasizes the need for a
secure, efficient, and user-friendly steganographic tool. By addressing functional and non-
functional requirements, user expectations, and system constraints, the system aims to provide a
robust solution for data hiding using steganography in Python.

3.1 INTRODUCTION

Prediction with a modernized loan approval system based on a machine learning approach is an
approval system from which we can know whether the loan will pass or not. In this system, we
take some data from the user, like monthly income, marital status, loan amount, duration, etc.
Then the bank will decide, according to its parameters, whether the client will get the loan or
not. So there is a classification system: a training set is employed to build the model, and the
classifier classifies the data items into their appropriate classes. A test dataset is created that
evaluates the model and gives the appropriate result, that is, whether the client is a potential
customer who can repay the loan. Prediction with a modernized loan approval system is
incredibly helpful for banks and also for clients. This system checks the candidate on a priority
basis. The customer can submit his application directly to the bank, and the bank will carry out
the whole process; no third party or stakeholder will interfere in it. Finally, the bank will decide
whether the candidate is deserving or not on a priority basis. The sole objective of this work is
that the deserving candidate gets straightforward and quick results.

3.2 HARDWARE AND SOFTWARE SPECIFICATION :

3.2.1 HARDWARE REQUIREMENTS:

 Hard disk : 500 GB and above.

 Processor : i3 and above.

 Ram : 4GB and above.

3.2.2 SOFTWARE REQUIREMENTS :

 Operating System : Windows 10

 Software : python

 Tools : Anaconda (Jupyter Notebook IDE)

3.3 TECHNOLOGIES USED:

 Programming Language: Python.

3.3.1 Introduction to Python:

Python is a widely used general-purpose, high level programming language. It was


initially designed by Guido van Rossum in 1991 and developed by Python Software Foundation.
It was mainly developed for emphasis on code readability, and its syntax allows programmers to
express concepts in fewer lines of code.
Python is a programming language that lets you work quickly and integrate systems more
efficiently.

It is used for:

 web development (server-side),

 software development,

 mathematics,

 System scripting.

What can Python do?

 Python can be used on a server to create web applications.

 Python can be used alongside software to create workflows.

 Python can connect to database systems. It can also read and modify files.

 Python can be used to handle big data and perform complex mathematics.

 Python can be used for rapid prototyping, or for production-ready software development.

Why Python?

✧ Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).

✧ Python has a simple syntax similar to the English language.

✧ Python has syntax that allows developers to write programs with fewer lines than some other
programming languages.
✧ Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
✧ Python can be treated in a procedural way, an object-oriented way or a functional way.

Python Syntax compared to other programming languages

● Python was designed for readability, and has some similarities to the English language with
influence from mathematics.

● Python uses new lines to complete a command, as opposed to other programming languages
which often use semicolons or parentheses.
● Python relies on indentation, using whitespace, to define scope; such as the scope of loops,
functions and classes. Other programming languages often use curly-brackets for this purpose.

Python is Interpreted

● Many languages are compiled, meaning the source code you create needs to be translated into
machine code, the language of your computer’s processor, before it can be run. Programs
written
in an interpreted language are passed straight to an interpreter that runs them directly.
● This makes for a quicker development cycle because you just type in your code and run it,
without the intermediate compilation step.

● One potential downside to interpreted languages is execution speed. Programs that are compiled
into the native language of the computer processor tend to run more quickly than interpreted
programs. For some applications that are particularly computationally intensive, like graphics
processing or intense number crunching, this can be limiting.
● In practice, however, for most programs, the difference in execution speed is measured in
milliseconds, or seconds at most, and not
appreciably noticeable to a human user. The expediency of coding in an interpreted language is
typically worth it for most applications.
● For all its syntactical simplicity, Python supports most constructs that would be expected in a
very high-level language, including complex dynamic data types, structured and functional
programming, and object-oriented programming.

● Additionally, a very extensive library of classes and functions is available that provides
capability well beyond what is built into the language, such as database manipulation or GUI
programming.
● Python accomplishes what many programming languages don’t: the language itself is simply
designed, but it is very versatile in terms of what you can accomplish with it.

Data hiding, a crucial aspect in information security, demands robust techniques to conceal
information within various types of media. Autoencoders and Generative Adversarial Networks
(GANs) emerge as powerful tools in the realm of data hiding, offering innovative solutions to
ensure the confidentiality of information. These techniques find applications in diverse domains,
from image and audio processing to cryptography, demonstrating their versatility and
significance.

2. Autoencoders
Autoencoders, a class of neural networks, serve as fundamental components in data hiding
methodologies. At their core, autoencoders consist of an encoder and a decoder, working in
tandem to compress and reconstruct data. The encoder reduces the input data into a compact
representation, often referred to as the bottleneck layer, while the decoder reconstructs the

original data from this compressed form. This intrinsic capability makes autoencoders
particularly valuable in scenarios where efficient data compression and reconstruction are
essential.

2.1 What are Autoencoders?


Autoencoders are neural network architectures designed for unsupervised learning. Their
primary function involves encoding and decoding input data, with the objective of reproducing
the original input as accurately as possible. The encoder, responsible for mapping input data to a
lower-dimensional representation, creates a condensed version of the input, capturing its
essential features. Meanwhile, the decoder reconstructs the original input from this condensed
representation. This two-step process facilitates efficient data compression and has applications
in various domains, including image and signal processing.

2.2 Types of Autoencoders


Within the realm of autoencoders, different variations cater to specific use cases. Variational
Autoencoders (VAEs) introduce probabilistic elements, allowing for the generation of diverse
outputs from the same input. Denoising Autoencoders focus on reconstructing clean data from
noisy input, making them particularly useful in scenarios with imperfect data. Sparse
Autoencoders encourage sparsity in the encoded representation, promoting the discovery of
salient features in the data.

2.3 Applications of Autoencoders


Autoencoders find wide-ranging applications, contributing significantly to various fields. In
image processing, they play a pivotal role in compression and reconstruction tasks, aiding in
efficient storage and transmission of visual data. Additionally, autoencoders excel in anomaly
detection, where deviations from the learned patterns indicate potential issues or security
breaches. Their ability to learn meaningful representations from data makes them valuable in
feature learning tasks across diverse domains.

2.4 Implementation Example

Implementing an autoencoder involves constructing a neural network with an encoder and a
decoder. In Python, popular libraries like TensorFlow and PyTorch provide the necessary tools
for building and training autoencoder models. A simple example in Python might involve
defining layers for the encoder and decoder, choosing appropriate activation functions, and
training the model on a dataset. This hands-on approach allows practitioners to gain a deeper
understanding of autoencoders and their application in real-world scenarios.
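A minimal sketch of such an implementation with TensorFlow/Keras is shown below; the layer sizes and the MNIST digits dataset are illustrative assumptions, not choices made in the report.

# A minimal dense autoencoder sketch (assumed layer sizes and dataset).
import tensorflow as tf
from tensorflow.keras import layers, models

# Flatten 28x28 grayscale images into 784-dimensional vectors scaled to [0, 1].
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

autoencoder = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),    # encoder
    layers.Dense(32, activation="relu"),     # 32-dimensional bottleneck code
    layers.Dense(128, activation="relu"),    # decoder
    layers.Dense(784, activation="sigmoid"), # reconstructed pixels in [0, 1]
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=5, batch_size=256,
                validation_data=(x_test, x_test))

reconstructed = autoencoder.predict(x_test[:10])   # reconstructions of ten test images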

3.3.2 Machine learning Introduction:


Machine learning (ML) is the scientific study of algorithms and statistical models that computer
systems use to perform a specific task without using explicit instructions, relying on patterns and
inference instead. It is seen as a subset of
artificial intelligence. Machine learning algorithms build a mathematical model based on sample
data, known as "training data", in order to make predictions or decisions without being explicitly
programmed to perform the task. Machine learning algorithms are used in a wide variety of
applications, such as email filtering and computer vision, where it is difficult or infeasible to
develop a conventional algorithm for effectively performing the task.
Machine learning is closely related to computational statistics, which focuses on making
predictions using computers. The study of mathematical optimization delivers methods,
theory and application domains to the field of machine learning. Data mining is a field of study
within machine learning, and focuses on exploratory data analysis through learning. In
its application across business problems, machine learning is also referred to as predictive
analytics.

Machine learning tasks:

Machine learning tasks are classified into several broad categories. In supervised learning,
the algorithm builds a mathematical model from a set of data that contains both the inputs and
the desired outputs. For example, if the task were determining whether an image contained a
certain object, the training data for a supervised learning algorithm would include images with
and without that object (the input), and each image would have a label (the output) designating

whether it contained the object. In special cases, the input may be only partially available, or
restricted to special feedback. Semi-supervised learning algorithms develop mathematical
models from incomplete training data, where a portion of the sample input doesn't have labels.

Classification algorithms and regression algorithms are types of supervised learning.


Classification algorithms are used when the outputs are restricted to a limited set of values. For a
classification algorithm that filters emails, the input would be an incoming email, and the output
would be the name of the folder in which to file the email. For an algorithm that identifies spam
emails, the output would be the prediction of either "spam" or "not spam", represented by the
Boolean values true and false. Regression algorithms are named for their continuous outputs,
meaning they may have any value within a range. Examples of a continuous value are the
temperature, length, or price of an object.

In unsupervised learning, the algorithm builds a mathematical model from a set of data that
contains only inputs and no desired output labels. Unsupervised learning algorithms are used to
find structure in the data, like grouping or clustering of data points. Unsupervised learning can
discover patterns in the data, and can group the inputs into categories, as in feature learning.
Dimensionality reduction is the process of reducing the number of "features", or inputs, in a
set of data.

Active learning algorithms access the desired outputs (training labels) for a limited set of
inputs based on a budget and optimize the choice of inputs for which it will acquire training
labels. When used interactively, these can be presented to a human user for labeling.
Reinforcement learning algorithms are given feedback in the form of positive or negative
reinforcement in a dynamic environment and are used in autonomous vehicles or in learning to
play a game against a human opponent. Other

specialized algorithms in machine learning include topic modeling, where the computer program
is given a set of natural language documents and finds other documents that cover similar topics.
Machine learning algorithms can be used to find the unobservable probability density function in
density estimation problems. Meta learning algorithms learn their own inductive bias based on
previous experience. In developmental robotics, robot learning algorithms generate their own
sequences of learning experiences, also known as a curriculum, to cumulatively acquire new
skills through self-guided exploration and social interaction with humans. These robots use

guidance mechanisms such as active learning, maturation, motor synergies, and imitation.

Types of learning algorithms:

The types of machine learning algorithms differ in their approach, the type of data they
input and output, and the type of task or problem that they are intended to solve.

Supervised learning:

Supervised learning algorithms build a mathematical model of a set of data that contains
both the inputs and the desired outputs. The data is known as training data, and consists of a set
of training examples. Each training example has one or more inputs and the desired output, also
known as a supervisory signal. In the mathematical model, each training example is represented
by an array or vector, sometimes called a feature vector, and the training data is represented by a
matrix. Through iterative optimization of an objective function, supervised learning algorithms
learn a function that can be used to predict the output associated with new inputs. An
optimal function will allow the algorithm to correctly determine the output for inputs that were
not a part of the training data. An algorithm that improves the accuracy of its outputs or
predictions over time is said to have learned to perform that task.

Supervised learning algorithms include classification and regression. Classification algorithms


are used when the outputs are restricted to a limited set of values, and regression algorithms are
used when the outputs may have any numerical value within a range. Similarity learning is an
area of supervised machine learning closely related to regression and classification, but the goal
is to learn from examples using a similarity function that measures how similar or related two
objects are. It has applications in ranking, recommendation systems, visual identity tracking, face
verification, and speaker verification.

In the case of semi-supervised learning algorithms, some of the training examples are missing
training labels, but they can nevertheless be used to improve the quality of a model. In weakly
supervised learning, the training labels are noisy, limited, or imprecise; however, these labels are
often cheaper to obtain, resulting in larger effective training sets.

Unsupervised Learning:

Unsupervised learning algorithms take a set of data that contains only inputs, and find structure
in the data, like grouping or clustering of data points. The algorithms, therefore, learn from test
data that has not been labeled, classified or categorized. Instead of responding to feedback,
unsupervised learning algorithms identify commonalities in the data and react based on the
presence or absence of such commonalities in each new piece of data. A central application of
unsupervised learning is in the field of density


estimation in statistics, though unsupervised learning encompasses other domains involving
summarizing and explaining data features.

Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that
observations within the same cluster are similar according to one or more pre designated criteria,
while observations drawn from different clusters are dissimilar. Different clustering techniques
make different assumptions on the structure of the data, often defined by some similarity metric
and evaluated, for example, by internal compactness, or the similarity between members of the
same cluster, and separation, the difference between clusters. Other methods are based on
estimated density and graph connectivity.
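As a small illustration of cluster analysis, the sketch below groups synthetic points with scikit-learn's KMeans; the synthetic blobs and the choice of three clusters are assumptions made for the example only.

# A minimal clustering sketch with scikit-learn (synthetic data, assumed cluster count).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])     # cluster index assigned to the first ten points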

Semi-supervised learning:

Semi-supervised learning falls between unsupervised learning (without any labeled training data)
and supervised learning (with completely labeled training data). Many machine-learning
researchers have found that unlabeled data, when used in conjunction with a small amount of
labeled data, can produce a considerable improvement in learning accuracy.

K-Nearest Neighbors

Introduction

In four years of analytics work, more than 80% of the models built were classification models
and just 15-20% were regression models. These ratios can be more or less generalized
throughout the industry. The reason for the bias towards classification models is that most
analytical problems involve making a decision: for instance, will a customer attrite or not,
should we target customer X for digital campaigns, does a customer have high potential or not,
and so on. This kind of analysis is more insightful and directly links to an implementation
roadmap. In this article, we will talk about another widely used classification technique called
K-Nearest Neighbors (KNN). Our focus will be primarily on how the algorithm works and how
the input parameter affects the output/prediction.

KNN algorithm

KNN can be used for both classification and regression predictive problems. However, it
is more widely used in classification problems in the industry. To evaluate any technique we
generally look at 3 important aspects:
1. Ease of interpreting the output
2. Calculation time
3. Predictive Power
Let us take a few examples to place KNN in the scale:

The KNN algorithm fares well across all parameters of consideration. It is commonly used for its
ease of interpretation and low calculation time.

How the KNN algorithm works

Let’s take a simple case to understand this algorithm. Following is a spread of red circles
(RC) and green squares (GS):

You intend to find out the class of the blue star (BS). BS can be either RC or GS and nothing
else. The "K" in the KNN algorithm is the number of nearest neighbors we wish to take a vote
from. Let's say K = 3. Hence, we will now draw a circle with BS as the center, just big enough to
enclose only three data points on the plane. Refer to the following diagram for more details:

The three closest points to BS are all RC. Hence, with a good confidence level, we can say that
BS should belong to the class RC. Here, the choice became very obvious, as all three votes from
the closest neighbors went to RC. The choice of the parameter K is very crucial in this
algorithm.

How do we choose the factor K?

First, let us try to understand what exactly K influences in the algorithm. In the last example,
given that all six training observations remain constant, with a given K value we can draw
boundaries for each class. These boundaries will segregate RC from GS. In the same way, let's
try to see the effect of the value of "K" on the class boundaries. Following are the different
boundaries separating the two classes with different values of K.

If you watch carefully, you can see that the boundary becomes smoother with an increasing
value of K. As K increases to infinity, the prediction finally becomes all blue or all red
depending on the total majority.

The training error rate and the validation error rate are two parameters we need to assess for
different values of K. Following is the curve for the training error rate with varying values of K:

As you can see, the error rate at K=1 is always zero for the training sample. This is because the
closest point to any training data point is itself; hence the prediction is always accurate with
K=1. If the validation error curve had been similar, our choice of K would have been 1.
Following is the validation error curve with varying values of K:

This makes the story clearer. At K=1, we were overfitting the boundaries. Hence, the error rate
initially decreases and reaches a minimum. After the minimum point, it then increases with
increasing K. To get the optimal value of K, you can segregate the training and validation sets
from the initial dataset. Now plot the validation error curve to get the optimal value of K. This
value of K should be used for all predictions.
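The procedure just described can be sketched in a few lines with scikit-learn, as below; the synthetic dataset and the K range of 1-25 are illustrative assumptions.

# A minimal sketch of choosing K with a held-out validation set (assumed data and K range).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_k, best_score = None, 0.0
for k in range(1, 26):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    score = model.score(X_val, y_val)          # validation accuracy for this K
    if score > best_score:
        best_k, best_score = k, score

print(f"Best K = {best_k} with validation accuracy {best_score:.3f}")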

CHAPTER 4

4.1 Design and Implementation Constraints

4.1.1 Constraints in Analysis

♦ Constraints as Informal Text

♦ Constraints as Operational Restrictions

♦ Constraints Integrated in Existing Model Concepts

♦ Constraints as a Separate Concept

♦ Constraints Implied by the Model Structure

4.1.2 Constraints in Design

♦ Determination of the Involved Classes

♦ Determination of the Involved Objects

♦ Determination of the Involved Actions

♦ Determination of the Require Clauses

♦ Global actions and Constraint Realization

4.1.3 Constraints in Implementation


A hierarchical structuring of relations may result in more classes and a more
complicated structure to implement. Therefore it is advisable to transform the hierarchical
relation structure to a simpler structure such as a classical flat one. It is rather straightforward to
transform the developed hierarchical model into a bipartite, flat model, consisting of classes on
the one hand and flat relations on the other. Flat relations are preferred at the design level for
reasons of simplicity and implementation ease. There is no identity or functionality associated
with a flat relation. A flat relation corresponds with the relation concept of entity-relationship
modeling and many object oriented methods.

4.2 Other Nonfunctional Requirements

4.2.1 Performance Requirements

The application at this tier controls and communicates with the following main general
components:

⮚ an embedded browser in charge of navigation and access to the web service;

⮚ Server Tier: the server side contains the main parts of the functionality of the proposed
architecture. The components at this tier are the following: Web Server, Security Module,
Server-Side Capturing Engine, Preprocessing Engine, Database System, Verification Engine,
Output Module.

4.2.2 Safety Requirements

1. The software may be safety-critical. If so, there are issues associated with
its integrity level
2. The software may not be safety-critical although it forms part of a safety-
critical system. For example, software may simply log transactions.
3. If a system must be of a high integrity level and if the software is shown to
be of that integrity level, then the hardware must be at least of the same integrity level.
4. There is little point in producing 'perfect' code in some language if
hardware and system software (in widest sense) are not reliable.
5. If a computer system is to run software of a high integrity level then that
system should not at the same time accommodate software of a lower integrity level.
6. Systems with different requirements for safety levels must be separated.
7. Otherwise, the highest level of integrity required must be applied to all
systems in the same environment.

CHAPTER 5

5.1 Architecture Diagram:

5.2 Sequence Diagram:

A sequence diagram is a kind of interaction diagram that shows how processes operate with one
another and in what order. It is a construct of a Message Sequence Chart. Sequence diagrams are
sometimes called event diagrams, event scenarios, or timing diagrams.

5.3 Use Case Diagram:

Unified Modeling Language (UML) is a standardized general-purpose modeling language in


the field of software engineering. The standard is managed and was created by the Object
Management Group. UML includes a set of graphic notation techniques to create visual models
of software intensive systems. This language is used to specify, visualize, modify, construct and
document the artifacts of an object oriented software intensive system under development.

5.3.1. USE CASE DIAGRAM

A Use case Diagram is used to present a graphical overview of the functionality provided by
a system in terms of actors, their goals and any dependencies between those use cases.
Use case diagram consists of two parts:

Use case: A use case describes a sequence of actions that provides something of measurable
value to an actor and is drawn as a horizontal ellipse.

Actor: An actor is a person, organization or external system that plays a role in one or more
interaction with the system.

5.4 Activity Diagram:

Activity diagram is a graphical representation of workflows of stepwise activities and


actions with support for choice, iteration and concurrency. An activity diagram shows the overall
flow of control.
The most important shape types:

● Rounded rectangles represent activities.

● Diamonds represent decisions.

● Bars represent the start or end of concurrent activities.

● A black circle represents the start of the workflow.

● An encircled circle represents the end of the workflow.

5.5 Collaboration Diagram:

UML Collaboration Diagrams illustrate the relationship and interaction between


software objects. They require use cases, system operation contracts and domain model to
already exist. The collaboration diagram illustrates messages being sent between classes and
objects.

CHAPTER 6

6.1 MODULES

⮚ Dataset collection

⮚ Machine Learning Algorithm

⮚ Prediction

6.2 MODULE EXPLANATION:

6.2.1 Dataset collection:

The dataset is collected from kaggle.com. The dataset has values like gender, marital status,
self-employed or not, monthly income, etc. The dataset also records whether the previous loan
was approved or not, depending on the customer information. That data will be preprocessed
and passed to the next step.

Machine learning Algorithm:

In this stage, the collected data will be given to the machine learning algorithm for the training
process. We use multiple algorithms to get a high accuracy range of prediction. The preprocessed
dataset is processed by different machine learning algorithms. Each algorithm gives some
accuracy level, and each one undergoes comparison.

✔ Logistic Regression

✔ K-Nearest Neighbors

✔ Decision Tree Classifier

Prediction:

The preprocessed data are used for training, and the input given by the user goes to the trained
model. The trained Logistic Regression model is used to predict and determine whether the loan
given to a particular person shall be approved or not.
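A minimal sketch of training and comparing the three algorithms listed above on a Kaggle-style loan dataset is shown below; the file name loan_data.csv and the Loan_Status column are assumptions, since the report does not name the exact columns.

# A minimal classifier-comparison sketch (assumed file and column names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("loan_data.csv").dropna()
X = pd.get_dummies(df.drop(columns=["Loan_Status"]))   # encode categorical inputs
y = df["Loan_Status"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))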

DESIGN CONSTRAINTS

Design Constraints for "Data Hiding using Steganography in Python"

Design constraints play a crucial role in shaping the architecture and development of a system. In
the context of the proposed system for Data Hiding using Steganography in Python, various
constraints influence the design decisions and implementation strategies. These constraints
encompass technical, budgetary, and environmental factors that impact the system's development
and functionality.

Technical Constraints:

1. Algorithmic Complexity:
The choice of steganographic algorithms and encryption techniques may be constrained by the
computational resources available. Balancing robust security measures with algorithmic
simplicity is essential to ensure efficient processing on a variety of hardware configurations.

2. Media Format Compatibility:


The system must contend with the diverse characteristics of different media formats, including
audio, video, and images. Designing algorithms that are adaptable across various formats while
maintaining efficiency presents a technical challenge.

3. Real-Time Processing:
Achieving real-time processing for large media files may be technically challenging,
particularly when implementing advanced techniques such as parallel processing. Striking a
balance between processing speed and the need for comprehensive data hiding is a significant

technical constraint.

4. Cryptography Limitations:
The cryptographic algorithms integrated into the system may have inherent limitations, such as
susceptibility to quantum computing threats. The design must consider the evolving landscape of
cryptographic techniques to ensure long-term security.

Budgetary Constraints:

1. Development Costs:
The availability of financial resources for development influences the selection of tools,
libraries, and technologies. Striking a balance between the desired features and the available
budget is crucial for delivering a cost-effective solution.

2. Hardware Requirements:
The system's hardware requirements may be constrained by budget considerations, limiting the
selection of high-end computational resources. Optimization strategies should be employed to
ensure that the system remains accessible to a broad user base.

Environmental Constraints:

1. Data Privacy Regulations:


Adherence to data privacy regulations and legal constraints is paramount. The system must be
designed to comply with regional and international laws governing the storage and transmission
of sensitive information.

2. User Accessibility:
Considerations related to the accessibility of the system for users with diverse technical
backgrounds may pose environmental constraints. The design must prioritize user-friendly
interfaces and clear documentation to accommodate a broad user base.

Operational Constraints:

1. Network Latency:
The system's functionality may be affected by network latency, especially when dealing with
remote servers or cloud-based processing. Design considerations must address potential delays in
data transmission and retrieval.

2. Maintenance Overhead:
The ongoing maintenance of the system, including updates and security patches, poses
operational constraints. The design should incorporate mechanisms for seamless updates and
minimal disruption to users.

3. Integration with External Systems:


The integration of the system with external applications or services may be constrained by
compatibility issues. Design decisions must account for potential challenges in interfacing with
other systems.

In conclusion, the design constraints for "Data Hiding using Steganography in Python"
encompass technical, budgetary, environmental, and operational considerations. Addressing
these constraints during the design phase is crucial for developing a robust and effective
steganographic tool that meets user expectations while operating within the defined limitations.

CHAPTER 7

CODING AND TESTING

7.1 CODING

Once the design aspect of the system is finalized, the system enters the coding and testing phase.
The coding phase brings the actual system into action by converting the design of the system
into code in a given programming language. Therefore, a good coding style has to be adopted so
that whenever changes are required they can easily be incorporated into the system.
7.2 CODING STANDARDS

Coding standards are guidelines for programming that focus on the physical structure
and appearance of the program. They make the code

easier to read, understand and maintain. This phase of the system actually implements the
blueprint developed during the design phase. The coding specification should be in such a
way that any programmer must be able to understand the code and can bring about changes
whenever felt necessary. The program should be simple, clear and easy to understand. Some of
the standards needed to achieve the above-mentioned objectives are as follows:

Naming conventions
Value conventions
Script and comment procedures
Message box format
Exception and error handling
7.2.1 NAMING CONVENTIONS

Naming conventions of classes, data members, member functions, procedures, etc., should
be self-descriptive. One should even get the meaning and scope of the variable by its name. The
conventions are adopted for easy understanding of the intended message by the user. So it is
customary to follow the conventions. These conventions are as follows:
Class names

Class names are problem domain equivalence and begin with capital letter and have
mixed cases.
Member Function and Data Member name

Member function and data member names begin with a lowercase letter, with the first
letter of each subsequent word in uppercase and the remaining letters in lowercase.
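
A short illustrative sketch of these naming conventions in Python follows; the names are hypothetical and chosen only for demonstration, and the camel-case member style reflects the convention stated above rather than PEP 8:

class StegoEncoder:                        # class name: problem-domain noun, capitalised, mixed case
    def __init__(self, coverImage):
        self.coverImage = coverImage       # data member: first word lowercase, later words capitalised

    def embedSecretData(self, secretData): # member function: same camel-case style
        """Embed the secret payload into the cover image."""
        ...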
7.2.2 VALUE CONVENTIONS

Value conventions ensure that variables hold sensible values at any point in time.

This involves the following:

⮚ Proper default values for the variables.

⮚ Proper validation of values in the field.

⮚ Proper documentation of flag values.
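
A minimal sketch of how these value conventions might look in the project's Python code; the flag names and validation limits below are assumptions made for illustration only:

# Documented flag values:
MODE_EMBED = 0     # embed secret data into a cover file
MODE_EXTRACT = 1   # extract secret data from a stego file

def run(mode=MODE_EMBED, bits_per_channel=1):
    """Sensible default values plus explicit validation of each field."""
    if mode not in (MODE_EMBED, MODE_EXTRACT):
        raise ValueError("mode must be MODE_EMBED or MODE_EXTRACT")
    if not 1 <= bits_per_channel <= 4:
        raise ValueError("bits_per_channel must be between 1 and 4")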

7.2.3 SCRIPT WRITING AND COMMENTING STANDARD

Script writing is an art in which indentation is of the utmost importance. Conditional and looping
statements should be properly aligned to facilitate easy understanding. Comments are included to
minimize the number of surprises that could occur when going through the code.

7.2.4 MESSAGE BOX FORMAT

When something has to be communicated to the user, it must be easy to understand.
To achieve this, a specific format has been adopted for displaying messages to the user. The
formats are as follows:

⮚ X – User has performed illegal operation.

⮚ ! – Information to the user.
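
If the user interface were built with Tkinter (an assumption; the report does not fix a GUI toolkit), the two message formats could map onto the standard message boxes as follows:

import tkinter as tk
from tkinter import messagebox

root = tk.Tk()
root.withdraw()   # no main window is needed just to show dialogs

# "X" style box: the user has performed an illegal operation.
messagebox.showerror("Error", "Unsupported cover file format.")

# "!" style box: information for the user (Tkinter draws an "i" icon here).
messagebox.showinfo("Information", "Secret data embedded successfully.")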

7.3 TEST PROCEDURE: SYSTEM TESTING


Testing is performed to identify errors. It is used for quality assurance. Testing is an integral part
of the entire development and maintenance process. The goal of testing during this phase is to
verify that the specification has been accurately and completely incorporated into the design, as
well as to ensure the correctness of the design itself. Any logic faults in the design must be
detected before coding commences; otherwise the cost of fixing them later will be considerably
higher. Detection of design faults can be achieved by means of inspections as well as
walkthroughs.
Testing is one of the important steps in the software development phase. Testing checks
for errors; for the project as a whole, testing involves the following test cases:

⮚ Static analysis is used to investigate the structural properties of the Source code.

⮚ Dynamic testing is used to investigate the behavior of the source code by executing the
program on the test data.

7.4 TEST DATA AND OUTPUT

7.4.1 UNIT TESTING

Unit testing is conducted to verify the functional performance of each modular
component of the software. Unit testing focuses on the smallest unit of the software design,
i.e., the module. White-box testing techniques were heavily employed for unit testing.
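
As an illustration, the self-contained unittest sketch below exercises a small bit-conversion helper of the kind an LSB steganography module relies on; the helper is a stand-in written for this example, not the project's actual code:

import unittest

def bytes_to_bits(data: bytes) -> list[int]:
    """Unit under test: expand bytes into a most-significant-bit-first bit list."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

def bits_to_bytes(bits: list[int]) -> bytes:
    """Inverse of bytes_to_bits()."""
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[n:n + 8]))
        for n in range(0, len(bits), 8)
    )

class TestBitHelpers(unittest.TestCase):
    def test_round_trip(self):
        self.assertEqual(bits_to_bytes(bytes_to_bits(b"secret")), b"secret")

    def test_single_byte(self):
        self.assertEqual(bytes_to_bits(b"\x81"), [1, 0, 0, 0, 0, 0, 0, 1])

if __name__ == "__main__":
    unittest.main()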

7.4.2 FUNCTIONAL TESTS

Functional test cases involved exercising the code with nominal input values for
which the expected results are known, as well as boundary values and special values, such as
logically related inputs, files of identical elements, and empty files.
Functional testing includes three types of tests:

⮚ Performance Test

⮚ Stress Test

⮚ Structure Test

7.4.3 PERFORMANCE TEST

It determines the amount of execution time spent in various parts of the unit, program
throughput, response time, and device utilization by the program unit.
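
A simple way to measure execution time and throughput for one such unit is shown below, timing the same kind of bit-conversion helper used in the unit-testing sketch above; the payload size is illustrative only:

import time

def bytes_to_bits(data: bytes) -> list[int]:
    # Same stand-in helper as in the unit-testing sketch above.
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

payload = b"A" * 500_000   # 500 KB stand-in payload

start = time.perf_counter()
bits = bytes_to_bits(payload)
elapsed = time.perf_counter() - start

print(f"processed {len(payload)} bytes in {elapsed:.3f} s "
      f"({len(payload) / (1024 * elapsed):.1f} KiB/s throughput)")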
7.4.4 STRESS TEST

Stress tests are designed to intentionally break the unit. A great deal can be
learned about the strengths and limitations of a program by examining the manner in which a
program unit breaks.

7.4.5 STRUCTURED TEST

Structure tests are concerned with exercising the internal logic of a program and
traversing particular execution paths. A white-box test strategy was employed to ensure that the
test cases guarantee that all independent paths within a module have been exercised at least
once.

⮚ Exercise all logical decisions on their true or false sides.

⮚ Execute all loops at their boundaries and within their operational bounds.

⮚ Exercise internal data structures to assure their validity.

⮚ Check attributes for their correctness.

⮚ Handle end-of-file conditions, I/O errors, buffer problems and textual errors in output
information.
7.4.6 INTEGRATION TESTING

Integration testing is a systematic technique for constructing the program structure
while at the same time conducting tests to uncover errors associated with interfacing; that is,
integration testing is the complete testing of the set of modules which make up the product.
The objective is to take unit-tested modules and build a program structure; the tester should
identify critical modules, and critical modules should be tested as early as possible. One
approach is to wait until all the units have passed testing, then combine them and test them
together; this approach evolved from the unstructured testing of small programs. Another
strategy is to construct the product in increments of tested units: a small set of modules is
integrated and tested, another module is added and tested in combination, and so on. The
advantage of this approach is that interface discrepancies can be found and corrected easily.
The major error faced during the project was a linking error: when all the modules
were combined, the links to the supporting files were not set properly. We then checked the
interconnections and the links. With incremental integration, errors are localized to the new
module and its intercommunications, product development can be staged, and modules can be
integrated as they complete unit testing. Testing is completed when the last module is integrated
and tested.
7.5 TESTING TECHNIQUES / TESTING STRATEGIES

7.5.1 TESTING

Testing is a process of executing a program with the intent of finding an error. A good
test case is one that has a high probability of finding an as-yet-undiscovered error. A successful
test is one that uncovers an as-yet-undiscovered error. System testing is the stage of
implementation aimed at ensuring that the system works accurately and efficiently, as expected,
before live operation commences. It verifies that the whole set of programs hangs together.
System testing requires a test plan that consists of several key activities and steps for program,
string and system testing, and it is important in adopting a successful new system. This is the
last chance to detect and correct errors before the system is installed for user acceptance testing.
The software testing process commences once the program is created and the
documentation and related data structures are designed. Software testing is essential for
correcting errors; otherwise the program or the project cannot be said to be complete. Software
testing is a critical element of software quality assurance and represents the ultimate review of
specification, design and coding. Testing is the process of executing the program with the intent
of finding an error. A good test case design is one that has a high probability of finding an
as-yet-undiscovered error. A successful test is one that uncovers an as-yet-undiscovered error.
Any engineering product can be tested in one of two ways:
7.5.1.1 WHITE BOX TESTING

This testing is also called glass box testing. Knowing the internal workings of a
product, tests can be conducted to ensure that "all gears mesh", that is, that the internal
operations perform according to specification and that all internal components have been
adequately exercised, while at the same time searching for errors in each function. White box
testing is a test case design method that uses the control structure of the procedural design to
derive test cases. Basis path testing is a white box testing technique.
Basis path testing:

⮚ Flow graph notation

⮚ Cyclomatic complexity

⮚ Deriving test cases

⮚ Graph matrices
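
The sketch below illustrates basis path testing on a small hypothetical capacity check (written for this example, not taken from the project's code); with two decisions its cyclomatic complexity is 3, so three independent paths must each be exercised:

def capacity_ok(payload_len, width, height, bits_per_channel=1):
    """Hypothetical predicate: does a payload of payload_len bytes fit in an RGB cover?"""
    if bits_per_channel < 1:                        # decision 1
        raise ValueError("bits_per_channel must be >= 1")
    capacity_bits = width * height * 3 * bits_per_channel
    if payload_len * 8 > capacity_bits:             # decision 2
        return False
    return True

# Cyclomatic complexity = decisions + 1 = 3, so basis path testing derives
# (at least) three test cases, one per independent path:
assert capacity_ok(10, 100, 100) is True              # both decisions false
assert capacity_ok(10_000, 10, 10) is False           # second decision true
try:
    capacity_ok(10, 100, 100, bits_per_channel=0)     # first decision true
except ValueError:
    pass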

7.5.1.2 BLACK BOX TESTING


In this testing, the internal structure of the product is not examined. Knowing the
functions that the product has been designed to perform, tests can be conducted to demonstrate
that each function is fully operational according to its specification. Black box testing
fundamentally focuses on the functional requirements of the software.
The steps involved in black box test case design are:

⮚ Graph based testing methods

⮚ Equivalence partitioning

⮚ Boundary value analysis

⮚ Comparison testing
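
By contrast, a black box tester derives test cases purely from the specification. The boundary value analysis below targets the same hypothetical capacity check used in the white box sketch, probing inputs just below, at, and just above the documented capacity limit:

def capacity_ok(payload_len, width, height, bits_per_channel=1):
    # Implementation repeated only so this sketch runs on its own;
    # a black box tester would see nothing but the specification.
    if bits_per_channel < 1:
        raise ValueError("bits_per_channel must be >= 1")
    return payload_len * 8 <= width * height * 3 * bits_per_channel

# Specification: a 10 x 10 RGB cover at 1 bit per channel holds 300 bits,
# i.e. 37 whole bytes. Boundary value analysis picks payloads of 36 / 37 / 38 bytes.
assert capacity_ok(36, 10, 10) is True     # just below the boundary
assert capacity_ok(37, 10, 10) is True     # exactly at the boundary
assert capacity_ok(38, 10, 10) is False    # just above the boundary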

7.5.2 SOFTWARE TESTING STRATEGIES:

A software testing strategy provides a road map for the software developer. Testing is
a set of activities that can be planned in advance and conducted systematically. For this reason,
a template for software testing, that is, a set of steps into which specific test case design methods
can be placed, should be defined. A testing strategy should have the following characteristics:

⮚ Testing begins at the module level and works “outward” toward the integration of the entire
computer based system.

⮚ Different testing techniques are appropriate at different points in time.

⮚ The developer of the software and an independent test group conduct testing.

⮚ Testing and debugging are different activities, but debugging must be accommodated in any
testing strategy.

7.5.2.1 INTEGRATION TESTING:

Integration testing is a systematic technique for constructing the program structure
while at the same time conducting tests to uncover errors associated with interfacing. Individual
modules, which are highly prone to interface errors, should not be assumed to work correctly as
soon as they are put together. The problem, of course, is "putting them together", that is,
interfacing: data may be lost across an interface; one module's sub-functions, when combined,
may not produce the desired major function; individually acceptable imprecision may be
magnified to unacceptable levels; and global data structures can present problems.

7.5.2.2 PROGRAM TESTING:

The logical and syntax errors have been pointed out by program testing. A syntax
error is an error in a program statement that violates one or more rules of the language in which
it is written. Improperly defined field dimensions or omitted keywords are common syntax
errors. These errors are shown through error messages generated by the compiler. A logic error,
on the other hand, deals with incorrect data fields, out-of-range items and invalid combinations.
Since the compiler will not detect logical errors, the programmer must examine the output.
Condition testing exercises the logical conditions contained in a module. The possible types of
elements in a condition include a Boolean operator, a Boolean variable, a pair of Boolean
parentheses, a relational operator or an arithmetic expression. The condition testing method
focuses on testing each condition in the program; the purpose of condition testing is to detect
not only errors in the conditions of a program but also other errors in the program.
7.5.2.3 SECURITY TESTING:

Security testing attempts to verify that the protection mechanisms built into a system
do, in fact, protect it from improper penetration. The system's security must be tested for
invulnerability to frontal attack and must also be tested for invulnerability to attacks from the
rear. During security testing, the tester plays the role of an individual who desires to penetrate
the system.

7.5.2.4 VALIDATION TESTING

At the culmination of integration testing, the software is completely assembled as a
package, interfacing errors have been uncovered and corrected, and a final series of software
tests, validation testing, begins. Validation testing can be defined in many ways, but a simple
definition is that validation succeeds when the software functions in a manner that is reasonably
expected by the customer. Software validation is achieved through a series of black box tests
that demonstrate conformity with the requirements. After a validation test has been conducted,
one of two conditions exists:

* The function or performance characteristics conform to the specification and are
accepted.

* A deviation from the specification is uncovered and a deficiency list is created.

Deviations or errors discovered at this step were corrected prior to the completion
of the project with the help of the user, by negotiating to establish a method for resolving the
deficiencies. Thus the proposed system under consideration has been tested using validation
testing and found to be working satisfactorily. Though there were deficiencies in the system,
they were not catastrophic.
7.5.2.5 USER ACCEPTANCE TESTING
User acceptance of the system is a key factor in the success of any system. The
system under consideration was tested for user acceptance by constantly keeping in touch with
prospective users during development and making changes whenever required. This was done
with regard to the following points:

● Input screen design.

● Output screen design.

APPENDIX A

A data dictionary is a comprehensive catalog that defines and describes the data elements,
attributes, and structures within a system. It serves as a valuable reference for developers,
database administrators, and other stakeholders to understand the meaning, relationships, and
characteristics of the data used in a project. The data dictionary for "Data Hiding using
Steganography in Python" is presented below.


Data Dictionary for "Data Hiding using Steganography in Python"

1. Cover Image:
- Definition: The primary image or media file into which data will be hidden using
steganographic techniques.
- Attributes:
- Format: JPEG, PNG, BMP, etc.
- Dimensions: Width x Height in pixels.
- Color Depth: Bit depth representing the number of colors.

2. Secret Data:

- Definition: The confidential information or payload that needs to be concealed within the
cover image.
- Attributes:
- Data Type: Text, binary, audio, or other multimedia formats.
- Size: The size of the secret data in bytes or kilobytes.

3. Steganographic Algorithm:
- Definition: The specific algorithm or method employed to embed secret data into the cover
image.
- Attributes:
- Algorithm Type: LSB (Least Significant Bit), DCT (Discrete Cosine Transform), etc.
- Parameters: Embedding capacity, strength of encryption.

4. Key/Password:
- Definition: A secret key or password used for encrypting and decrypting the hidden data.
- Attributes:
- Length: The number of characters or bits in the key.
- Complexity: Strength of the encryption key.

5. Embedding Process:
- Definition: The step-by-step process of concealing the secret data within the cover image.
- Attributes:
- Steps: Detailed description of each step in the embedding process.
- Tools/Software: Any specific tools or software utilized.

6. Extraction Process:
- Definition: The step-by-step process of retrieving the hidden data from the steganographic
image.
- Attributes:
- Steps: Detailed description of each step in the extraction process.
- Tools/Software: Any specific tools or software utilized.

7. Steganographic Image:
- Definition: The resultant image after embedding the secret data.
- Attributes:
- Format: JPEG, PNG, BMP, etc.
- Dimensions: Width x Height in pixels.
- Size: The size of the steganographic image in bytes or kilobytes.

8. Verification Mechanism:
- Definition: The process or algorithm used to verify the successful extraction of hidden data.
- Attributes:
- Accuracy: The reliability of the verification process.
- False Positive/Negative Rates: Occurrence of errors in verification.

9. Security Measures:
- Definition: The security features implemented to protect the integrity of the hidden data.
- Attributes:
- Encryption: The type of encryption used.
- Authentication: Methods employed for user or system authentication.

10. Logging and Auditing:


- Definition: The recording of activities related to data hiding and extraction for auditing
purposes.
- Attributes:
- Log Format: Details captured in the log file.
- Access Control: Policies governing access to logs.

11. User Permissions:


- Definition: The level of access granted to users for embedding or extracting hidden data.
- Attributes:
- User Roles: Different roles and their associated permissions.

- Authorization Levels: Levels of access granted.

12. Error Handling:


- Definition: Mechanisms in place for handling errors or issues during the embedding and
extraction processes.
- Attributes:
- Error Messages: Descriptions of potential errors.
- Recovery Procedures: Steps for recovering from errors.

This data dictionary provides a comprehensive overview of the key data elements and attributes
associated with the "Data Hiding using Steganography in Python" project. It serves as a valuable
resource for understanding the structure and characteristics of the data involved in the
steganographic processes.
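
To make entries 3, 5, 6 and 7 above concrete, the sketch below shows one possible LSB embedding and extraction routine built on Pillow (the library named in Appendix B). It is a minimal illustration of the technique, not the project's production code, and it assumes a lossless output format so that the least significant bits survive saving:

from PIL import Image

def lsb_embed(cover_path: str, secret: bytes, stego_path: str) -> None:
    """Hide `secret` in the least significant bit of each RGB channel value."""
    img = Image.open(cover_path).convert("RGB")
    # 32-bit big-endian length header followed by the payload, as a bit stream.
    data = len(secret).to_bytes(4, "big") + secret
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
    flat = [channel for pixel in img.getdata() for channel in pixel]
    if len(bits) > len(flat):
        raise ValueError("secret data does not fit in this cover image")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | bit               # overwrite only the LSB
    stego = Image.new("RGB", img.size)
    stego.putdata(list(zip(flat[0::3], flat[1::3], flat[2::3])))
    stego.save(stego_path, "PNG")                    # lossless format keeps the LSBs intact

def lsb_extract(stego_path: str) -> bytes:
    """Recover the payload written by lsb_embed()."""
    flat = [channel
            for pixel in Image.open(stego_path).convert("RGB").getdata()
            for channel in pixel]

    def read_bytes(count: int, bit_offset: int) -> bytes:
        out = bytearray()
        for b in range(count):
            value = 0
            for i in range(8):
                value = (value << 1) | (flat[bit_offset + b * 8 + i] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(4, 0), "big")
    return read_bytes(length, 32)

A round trip such as lsb_embed("cover.png", b"secret", "stego.png") followed by lsb_extract("stego.png") should return the original bytes unchanged. Encrypting the payload with the key from entry 4 before calling lsb_embed() would add the confidentiality layer described above; it is omitted here for brevity.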

APPENDIX B

OPERATIONAL MANUAL

Operational Manual for "Data Hiding using Steganography in Python"

The operational manual is a guide to assist users in effectively operating the steganography
application developed in Python. This manual outlines the steps for installing, configuring, and
using the application, ensuring a seamless experience for users. Below is a concise operational
manual:

The steganography application is designed to hide data within various types of media files such
as images, audio, and video using Python. Before proceeding, ensure that Python and the
required libraries (such as Pillow for image processing) are installed on your system.

1. Installation:
- Download the steganography application from the provided source.
- Extract the contents of the downloaded file to a directory of your choice.
- Open a terminal or command prompt and navigate to the application directory.

2. Configuration:
- Open the configuration file (config.ini) in a text editor.
- Configure the desired settings such as the default steganographic algorithm, encryption key,
and embedding capacity.
- Save the configuration file.

3. Embedding Data:
- Prepare the cover media file (image, audio, or video) and the data to be hidden.
- Run the steganography application using the command: `python steganography.py embed -cover <cover_file> -data <data_file>`
- Follow the on-screen instructions to specify additional parameters like encryption key and
output file.

4. Extracting Data:
- To extract hidden data from a steganographic file, use the command: `python steganography.py extract -stego <stego_file>`
- Provide the required input, including the encryption key, when prompted.
- The extracted data will be saved to a file in the application directory.

5. Verification:
- Verify the accuracy of the extraction by comparing the original data with the extracted data (a hash-comparison sketch is given at the end of this manual).
- Use the verification mechanism provided in the application to check for errors or
discrepancies.

6. Security Measures:
- Ensure that the encryption key used for embedding is securely stored and known only to
authorized users.
- Implement additional security measures as needed, such as access controls and secure storage
of steganographic files.

7. Logging and Auditing:


- The application logs activities related to embedding and extraction in a log file (log.txt).
- Regularly review the log file for any unusual or unauthorized activities.

8. Error Handling:
- In case of errors or issues during embedding or extraction, refer to the error messages
displayed on the console for guidance.
- Follow the recovery procedures outlined in the application documentation.

9. User Permissions:
- Assign appropriate user roles and permissions to control access to the embedding and
extraction functionalities.
- Update user permissions in the configuration file.

10. Support and Troubleshooting:


- For support or assistance, refer to the provided documentation or contact the application
administrator.
- Troubleshoot common issues by reviewing the troubleshooting section in the documentation.

This operational manual aims to provide users with a straightforward guide to using the
steganography application effectively. Following these steps will ensure a smooth and secure
experience with the application.
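
Step 5 above asks the user to compare the original data with the extracted data. One simple way to automate that check (an illustration, not a feature of the application itself) is to compare cryptographic digests of the two files; the file names below are placeholders:

import hashlib

def files_match(original_path: str, extracted_path: str) -> bool:
    """True when both files have identical SHA-256 digests."""
    def digest(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()
    return digest(original_path) == digest(extracted_path)

print("verification passed" if files_match("secret.txt", "extracted_secret.txt")
      else "MISMATCH: review the extraction parameters")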

REFERENCES

Anil Kumar and Rohini Sharma, A Secure Image Steganography Based on RSA Algorithm and
Hash LSB Technique, International Journal of Advanced Research in Computer Science, July 2013.

Kefa Rabah, Steganography: The Art of Hiding, Information Technology Journal, 3(3), pp. 245-269, 2004.

A.K. Bhaumik, M. Choi, R.J. Robles and M.O. Balitanas, Data Hiding in Video, International
Journal of Database Theory and Application, Vol. 2, No. 2, pp. 9-16, June 2009.

P. Paulpandi and Dr. T. Meyyappan, Hiding Messages Using Motion Vector Technique in Video
Steganography, International Journal of Engineering Trends and Technology, Volume 3, Issue 3,
pp. 361-365, 2012.

Mritha Ramalingam, Stego Machine: Video Steganography using Modified LSB Algorithm,
World Academy of Science, Engineering and Technology, 50, pp. 497-500, 2011.

Pritish Bhautmage, Prof. Amutha Jeyakumar and Ashish Dahatonde, Advanced Video
Steganography Algorithm, International Journal of Engineering Research and Applications
(IJERA), Vol. 3, Issue 1, pp. 1641-1644, January-February 2013.

Satya Kumari and K. John Singh, A Robust and Secure Steganography Approach Using Hash
Algorithm, International Journal of Latest Research in Science and Technology, Volume 2,
Issue 1, pp. 573-576, January-February 2013.

Vipula Madhukar Wajgade and Dr. Suresh Kumar, Enhancing Data Security Using Video
Steganography, International Journal of Emerging Technology and Advanced Engineering,
ISSN 2250-2459, Volume 3, Issue 4, p. 549, April 2013.

Mamta Juneja and Dr. Parvinder S. Sandhu, An Improved LSB based Steganography Technique
for RGB Color Images, 2nd International Conference on Latest Computational Technologies
(ICLCT 2013), June 17-18, 2013.

B. Suneetha, Ch. Hima Bindu and S. Sarath Chandra, Secured Data Transmission Based Video
Steganography, International Journal of Mechanical and Production Engineering (IJMPE),
ISSN 2315-4489, Vol. 2, Issue 1, 2013.

Kousik Dasgupta, J.K. Mandal and Paramartha Dutta, Hash Based Least Significant Bit
Technique for Video Steganography (HLSB), International Journal of Security, Privacy and
Trust Management (IJSPTM), Vol. 1, No. 2, April 2012.

