
File Encryption and Encrypted Text Embedding in an Image


Abstract

File encryption and encrypted text embedding is a process that allows the transmission of encrypted messages. A traditional steganography algorithm uses standard techniques that are vulnerable to known-encryption-method attacks. In addition, if lossy data algorithms are employed for image encryption and decryption, data is certain to be lost. Considering this scenario, a new method of loss-tolerant data encryption and random key distribution is employed to solve the problem.

This algorithm is intended for transmitting large volumes of information and can operate over an insecure channel, since reconstructing the entire piece of information from an intercepted fragment is not possible. In addition, the algorithm supports secure information encoding in various types of media; the supported media are moving images (video), algorithmically generated images (SVG), and audio files.

The method also supports passphrase-based and hardware-based encryption of the image information, which provides additional security. Expiry-based encryption methods are also employed so that messages which have reached their expiry period are removed.

Introduction:

As is well known, steganography is the traditional technique of skilfully hiding information inside an image. The sender takes unused bits of the image and replaces them with the bits of the information to be hidden. At the other end, the receiver takes the received image and extracts the information bits from it.
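
As a simple illustration of this bit-level embedding, the sketch below hides the bits of a message in the least significant bit of each byte of an image buffer. It is only a minimal sketch, not the method proposed in this paper; the flat byte-array carrier and the 32-bit length header are assumptions made for the example.

# Minimal LSB embedding sketch (illustrative only, not the method of this paper).
# Assumes the carrier is a flat bytearray of pixel bytes with enough capacity.
def embed(carrier: bytearray, message: bytes) -> bytearray:
    # Prefix the payload with a 32-bit big-endian length so it can be extracted later.
    payload = len(message).to_bytes(4, "big") + message
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for message")
    stego = bytearray(carrier)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit   # overwrite the least significant bit
    return stego

def extract(stego: bytes) -> bytes:
    bits = [b & 1 for b in stego]
    data = bytearray()
    for i in range(0, len(bits) - 7, 8):
        data.append(int("".join(map(str, bits[i:i + 8])), 2))
    length = int.from_bytes(data[:4], "big")
    return bytes(data[4:4 + length])

if __name__ == "__main__":
    cover = bytearray(range(256)) * 4        # stand-in for raw pixel bytes
    secret = b"hello"
    assert extract(embed(cover, secret)) == secret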

In this scenario, however, anyone who applies a standard decoder to the image recovers the hidden information, so it ends up in unsecured hands. To avoid this, the information to be embedded inside the image is first encrypted. Combining encryption with steganography allows invisible, encrypted information to be transmitted: even if a middleman obtains the image, he still needs the key. From this it is understood that traditional steganography is more vulnerable than encrypted steganography, which stays protected as long as the keys are known only to the intended user.

To be more precise, two types of encryption are used: private-key and public-key encryption. Put simply, private-key (symmetric) encryption uses a single shared key, while public-key (asymmetric) encryption uses a pair of keys.

A better approach is to use session-based public and private keys, which improves on plain encrypted steganography. However, when looked at from the perspective of transmission volume, session-based encrypted steganography consumes more bandwidth, for the following reasons:

Existing system:

1. Image transmission over a traditional internet connection already consumes considerable bandwidth, which costs the user more.
2. Key transmission requires additional bandwidth; this may be acceptable, since the key has to be delivered to the user in any case.
3. If a private key is used, the key again has to be transmitted to the other user. For a single file the cost is negligible, but for large transmissions it adds up.
4. Session-based keys are the best approach when the user never wants to compromise security.
5. The compression algorithms are not able to perform efficient compression, since the stego image cannot be compressed effectively without damaging the embedded data.

Overcoming all of this by developing a better system would eliminate these negative aspects of the existing system.

The method we are proposing has the following merits over the demerits of the existing system:

1. The encrypted data is scattered across the image or media being transmitted, which is safer, since an interceptor never obtains the complete information.
2. Sessions based on RSA tokens (hardware-based RSA tokens) reduce the transmission of private keys, thus delegating private-key security to the user.
3. Lossy data compression can be applied to the image, thus reducing the data transmission volume, since the method adds a restore block and a checksum digit to each image block (a simple checksum sketch follows this list).
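
As a rough illustration of the checksum idea in merit 3, the sketch below computes one check digit per block of embedded data. The block size and the modulo-10 digit are arbitrary choices made for the example, not the scheme specified in this paper.

# Illustrative per-block check digit (assumed scheme, not the one defined in this paper).
BLOCK_SIZE = 64  # bytes per block; arbitrary choice for the example

def check_digit(block: bytes) -> int:
    # A simple modulo-10 digit over the byte values of the block.
    return sum(block) % 10

def add_check_digits(data: bytes):
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [(block, check_digit(block)) for block in blocks]

def verify(blocks_with_digits) -> bool:
    return all(check_digit(block) == digit for block, digit in blocks_with_digits)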

With these advantages over the existing system there is a natural value addition, as follows:

1. High security and reduced transmission time.

Modules:

Modularization is important for building an effective system, as the history of software development shows. Failures have typically happened for the following reasons:
1. Unplanned deliverables and unscheduled delivery times.
2. Lack of quality caused by poor delivery practices.
3. Ineffective management and methodologies.

It is therefore better to adopt development methods with a proven history. Based on these facts and on the effectiveness of the method and its delivery cycles, the following model is selected.

Scrum, an Agile/XP development model, will be used; it employs short development cycles with a release every fortnight.

Accordingly the modules are classified as follows:

1. Encoder
2. Transmitter
3. Receiver
4. Decoder

Encoder:

Encoder covers the following functional points:

1. Algorithm selections
2. Private key selection
3. Editing private key selection
4. Deleting private key
5. Session time selection
6. Random image encoding pattern
7. Passphrase selection
8. Editing passphrase
9. Adding passphrase
10. Deleting passphrase
11. Media content selection
12. Editing content selection
13. Deleting content selection
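
One possible way to represent an encoder configuration covering these functional points is sketched below. The field names, types, and defaults are assumptions made for illustration, not definitions taken from this paper.

# Hypothetical encoder configuration covering the functional points listed above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EncoderConfig:
    algorithm: str = "AES"                 # 1. algorithm selection (assumed default)
    private_key: Optional[bytes] = None    # 2-4. private key selection / editing / deletion
    session_timeout_seconds: int = 3600    # 5. session time selection
    scatter_seed: Optional[int] = None     # 6. seed for the random image encoding pattern
    passphrases: List[str] = field(default_factory=list)  # 7-10. passphrase management
    media_path: Optional[str] = None       # 11-13. media content selection / editing / deletion

    def add_passphrase(self, phrase: str) -> None:
        self.passphrases.append(phrase)

    def delete_passphrase(self, phrase: str) -> None:
        self.passphrases.remove(phrase)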

Transmitter:

Based on the configuration defined in the encoder, the transmitter sends the media to the recipient. The transmitter operates over application-layer encryption, which allows information to be transmitted from any type of image producer. The transmitter also supports XML transmission, so that the payload can be sent in a standard way to all types of target platforms.
Receiver:

The receiver receives the image from the transmission channel and prepares the message for the decoder to decode.

Decoder:

The decoder reverses the encoding procedure on the data handed over by the receiver, extracts the information from the image, and writes it out in the desired format.

Algorithms:

Let us call the algorithm as random session based losses compression stenography algorithm.
Calling the algorithm in simpler way is abbreviated name as RSLCSA. From now RSLCSA refers to
algorithm defined in this paper. RSLCSA is Meta algorithm. It works as follows.

1. Selecting encryption method


2. Selecting session time out
3. Random scattering method
4. Losses data compression
5. Finally transmitted in transport layer.
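
The meta-algorithm can be pictured as a pipeline of these five steps. The sketch below shows only the shape of such a pipeline; every step here is a placeholder supplied by the configuration object, since the concrete encryption, scattering, and compression methods are chosen at run time (zlib is used merely as a stand-in for the compression step).

# Shape of the RSLCSA pipeline (placeholder steps, not the real implementations).
import time
import zlib

def rslcsa_encode(message: bytes, carrier: bytes, config) -> bytes:
    ciphertext = config.encrypt(message)                  # 1. selected encryption method
    deadline = time.time() + config.session_timeout       # 2. session timeout used for expiry
    scattered = config.scatter(ciphertext, carrier)       # 3. random scattering into the media
    compressed = zlib.compress(scattered)                 # 4. compression step (zlib stand-in)
    return config.transport_frame(compressed, deadline)   # 5. hand over to the transport layer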

Hardware and Software Requirements

Hardware Requirements:

A centralized machine requires 8 GB; for individual use, a simple Intel Core i5 machine is sufficient.

Software Requirements:

Requires Windows XP and .NET Framework 3.5 SP1, with a properly configured firewall and antivirus.
USE CASE DIAGRAM

LOGIN ACTIVITY

HIDE ACTIVITY

SAVE ACTIVITY

RETRIEVE ACTIVITY

DATA FLOW DIAGRAMS

LEVEL 0 DFD DIAGRAM

[The corresponding diagrams are not reproduced here.]


Problem Definition:

Module Description:

1. Encoder:
In digital audio technology, an encoder is a program that converts an audio WAV file
into an MP3 file, a highly-compressed sound file that preserves the quality of a CD
recording. (The program that gets the sound selection from a CD and stores it as a WAV
file on a hard drive is called a ripper.) An MP3 encoder compresses the WAV file so that
it is about one-twelfth the size of the original digital sound file. The quality is maintained
by an algorithm that optimizes for audio perception, losing data that will not contribute
to perception. The program that plays the MP3 file is called a player. Some audio
products provide all three programs together as a package.

Encoder covers the following functional points:

1. Algorithm selections
2. Private key selection
3. Editing private key selection
4. Deleting private key
5. Session time selection
6. Random image encoding pattern
7. Passphrase selection
8. Editing passphrase
9. Adding passphrase
10. Deleting passphrase
11. Media content selection
12. Editing content selection
13. Deleting content selection

Algorithm Selection:

A selection algorithm is an algorithm for finding the kth smallest number in a list (such a
number is called the kth order statistic). This includes the cases of finding the minimum,
maximum, and median elements. There are O(n), worst-case linear time, selection
algorithms. Selection is a subproblem of more complex problems like the nearest neighbor
problem and shortest path problems.

The term "selection" is used in other contexts in computer science, including the stage of a
genetic algorithm in which genomes are chosen from a population for later breeding; see
Selection (genetic algorithm). This article addresses only the problem of determining order
statistics.

Selection by sorting

Selection can be reduced to sorting by sorting the list and then extracting the desired element. This
method is efficient when many selections need to be made from a list, in which case only one initial,
expensive sort is needed, followed by many cheap extraction operations. In general, this method requires
O(n log n) time, where n is the length of the list.
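
In Python, for instance, selection by sorting is a one-liner (0-based index k assumed):

def select_by_sorting(values, k):
    # One O(n log n) sort, then constant-time indexing; amortizes well over many selections.
    return sorted(values)[k]

assert select_by_sorting([9, 1, 7, 3, 5], 2) == 5   # third smallest element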

Linear minimum/maximum algorithms


Linear time algorithms to find minimums or maximums work by iterating over the list and keeping track of
the minimum or maximum element so far.

Nonlinear general selection algorithm


Using the same ideas used in minimum/maximum algorithms, we can construct a simple, but inefficient
general algorithm for finding the kth smallest or kth largest item in a list, requiring O(kn) time, which is
effective when k is small. To accomplish this, we simply find the most extreme value and move it to the
beginning until we reach our desired index. This can be seen as an incomplete selection sort. Here is the
minimum-based algorithm:

function select(list[1..n], k)
    for i from 1 to k
        minIndex = i
        minValue = list[i]
        for j from i+1 to n
            if list[j] < minValue
                minIndex = j
                minValue = list[j]
        swap list[i] and list[minIndex]
    return list[k]
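
A direct Python transcription of this minimum-based pseudocode (using 0-based indices rather than the 1-based indices above) might look like this:

def select_min_based(lst, k):
    # Return the k-th smallest element (0-based) in O(k*n) time: an incomplete selection sort.
    a = list(lst)                          # work on a copy
    n = len(a)
    for i in range(k + 1):
        min_index = i
        for j in range(i + 1, n):
            if a[j] < a[min_index]:
                min_index = j
        a[i], a[min_index] = a[min_index], a[i]
    return a[k]

assert select_min_based([9, 1, 7, 3, 5], 0) == 1
assert select_min_based([9, 1, 7, 3, 5], 4) == 9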

Other advantages of this method are:


• After locating the jth smallest element, it requires only O(j + (k − j)²) time to find the kth smallest element, or only O(k) for k ≤ j.
• It can be done with linked list data structures, whereas the one based on partition requires random access.
Partition-based general selection algorithm
A general selection algorithm that is efficient in practice, but has poor worst-case performance, was
conceived by the inventor of quicksort, C.A.R. Hoare, and is known as Hoare's selection
algorithm or quickselect.

In quicksort, there is a subprocedure called partition that can, in linear time, group a list (ranging from
indices left to right) into two parts, those less than a certain element, and those greater than or equal
to the element. Here is pseudocode that performs a partition about the element list[pivotIndex]:

function partition(list, left, right, pivotIndex)
    pivotValue := list[pivotIndex]
    swap list[pivotIndex] and list[right]  // Move pivot to end
    storeIndex := left
    for i from left to right-1
        if list[i] < pivotValue
            swap list[storeIndex] and list[i]
            storeIndex := storeIndex + 1
    swap list[right] and list[storeIndex]  // Move pivot to its final place
    return storeIndex

In quicksort, we recursively sort both branches, leading to best-case Ω(n log n) time. However, when
doing selection, we already know which partition our desired element lies in, since the pivot is in its final
sorted position, with all those preceding it in sorted order and all those following it in sorted order. Thus a
single recursive call locates the desired element in the correct partition:

function select(list, left, right, k)
    select pivotIndex between left and right
    pivotNewIndex := partition(list, left, right, pivotIndex)
    if k = pivotNewIndex
        return list[k]
    else if k < pivotNewIndex
        return select(list, left, pivotNewIndex-1, k)
    else
        return select(list, pivotNewIndex+1, right, k)

Note the resemblance to quicksort: just as the minimum-based selection algorithm is a partial selection
sort, this is a partial quicksort, generating and partitioning only O(log n) of its O(n) partitions. This simple
procedure has expected linear performance, and, like quicksort, has quite good performance in practice.
It is also an in-place algorithm, requiring only constant memory overhead, since the tail recursion can be
eliminated with a loop like this:

function select(list, left, right, k)
    loop
        select pivotIndex between left and right
        pivotNewIndex := partition(list, left, right, pivotIndex)
        if k = pivotNewIndex
            return list[k]
        else if k < pivotNewIndex
            right := pivotNewIndex-1
        else
            left := pivotNewIndex+1
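
For reference, a runnable Python version of the loop formulation above (0-based k; the middle element is used as the pivot, which is just one of several reasonable choices):

def _partition(a, left, right, pivot_index):
    # Group a[left..right] around a[pivot_index]; return the pivot's final position.
    pivot_value = a[pivot_index]
    a[pivot_index], a[right] = a[right], a[pivot_index]   # move pivot to end
    store = left
    for i in range(left, right):
        if a[i] < pivot_value:
            a[store], a[i] = a[i], a[store]
            store += 1
    a[right], a[store] = a[store], a[right]               # move pivot to its final place
    return store

def quickselect(lst, k):
    # Return the k-th smallest element (0-based) in expected O(n) time.
    a = list(lst)
    left, right = 0, len(a) - 1
    while True:
        pivot_index = (left + right) // 2
        pivot_new = _partition(a, left, right, pivot_index)
        if k == pivot_new:
            return a[k]
        elif k < pivot_new:
            right = pivot_new - 1
        else:
            left = pivot_new + 1

assert quickselect([9, 1, 7, 3, 5], 2) == 5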

Like quicksort, the performance of the algorithm is sensitive to the pivot that is chosen. If bad pivots are
consistently chosen, this degrades to the minimum-based selection described previously, and so can
require as much as O(n²) time. David Musser describes a "median-of-3 killer" sequence that can force the
well-known median-of-three pivot selection algorithm to fail with worst-case behavior
(see the Introselect section below).

Linear general selection algorithm - Median of Medians algorithm

Median of Medians

Class: Selection algorithm
Data structure: Array
Worst-case performance: O(n)
Best-case performance: O(n)
Worst-case space complexity: O(1) auxiliary

A worst-case linear algorithm for the general case of selecting the kth largest element was published
by Blum, Floyd, Pratt, Rivest and Tarjan in their 1973 paper "Time bounds for selection", sometimes
called BFPRT after the last names of the authors. It is based on the quickselect algorithm and is also
known as the median-of-medians algorithm.
Although quickselect is linear-time on average, it can require quadratic time with poor pivot choices
(consider the case of pivoting around the smallest element at each step). The solution to make it O(n) in
the worst case is to consistently find "good" pivots. A good pivot is one for which we can establish that a
constant proportion of elements fall both below and above it.

The Select algorithm divides the list into groups of five elements. (Left over elements are ignored for
now.) Then, for each group of five, the median is calculated (an operation that can potentially be made
very fast if the five values can be loaded into registers and compared). (If sorting in-place, then these
medians are moved into one contiguous block in the list.) Select is then called recursively on this sublist
of n/5 elements to find their true median. Finally, the "median of medians" is chosen to be the pivot.
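
The pivot-selection step described above can be sketched in Python as follows. This version works on copies and returns the pivot value rather than moving the medians in place, and it includes the leftover elements in a final smaller group, so it is a sketch of the idea rather than the exact in-place procedure.

def median_of_medians(a):
    # Choose a pivot value guaranteed to lie between the 30th and 70th percentile.
    if len(a) <= 5:
        return sorted(a)[len(a) // 2]
    # Median of each group of five (leftover elements form a final smaller group here).
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2] for i in range(0, len(a), 5)]
    # Recursively find the true median of the medians.
    return median_of_medians_select(medians, len(medians) // 2)

def median_of_medians_select(a, k):
    # Worst-case linear selection of the k-th smallest element (0-based).
    if len(a) <= 5:
        return sorted(a)[k]
    pivot = median_of_medians(a)
    lows = [x for x in a if x < pivot]
    highs = [x for x in a if x > pivot]
    pivots = [x for x in a if x == pivot]
    if k < len(lows):
        return median_of_medians_select(lows, k)
    elif k < len(lows) + len(pivots):
        return pivot
    else:
        return median_of_medians_select(highs, k - len(lows) - len(pivots))

assert median_of_medians_select(list(range(100, 0, -1)), 9) == 10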

Properties of pivot
The chosen pivot is both less than and greater than half of the elements in the list of medians, which is
around n/10 elements for each half. Each of these elements is a median of 5, making it less than 2 other
elements and greater than 2 other elements outside the block. Hence, the pivot is less than 3(n /
10) elements outside the block, and greater than another 3(n / 10) elements outside the block. Thus the
chosen median splits the elements somewhere between 30%/70% and 70%/30%, which assures worst-
case linear behavior of the algorithm. To visualize:

One iteration on the list {0, 1, 2, 3, ..., 99}:

[The figure arranged the 100 elements into twenty 5-element columns, each column sorted by its median, with the median of medians highlighted in red; the grid is not reproduced here.]

(red = "(one of the two possible) median of medians", gray = "number < red", white = "number > red")
(5-tuples are shown here sorted by median, for clarity; note however that sorting is forbidden as part of
the algorithm itself, since that is an O(n log n) operation)

Note that all elements above/left of the red (30% of the 100 elements) are less, and all elements
below/right of the red (another 30% of the 100 elements) are greater.

Proof of O(n) running time


The median-calculating recursive call does not exceed worst-case linear behavior because the list of
medians is 20% of the size of the list, while the other recursive call recurs on at most 70% of the list,
making the running time

T(n) ≤ T(n/5) + T(7n/10) + O(n)

The O(n) is for the partitioning work (we visited each element a constant number of times, in order to form
them into O(n) groups and take each median in O(1) time). From this, one can then show that
T(n) ≤ c·n·(1 + (9/10) + (9/10)² + ...) = O(n).

Important notes
Although this approach optimizes quite well, it is typically outperformed in practice by the expected linear
algorithm with random pivot choices.

The worst-case algorithm can construct a worst-case O(n log n) quicksort algorithm, by using it to find the
median at every step.

Introselect

David Musser's well-known introsort achieves practical performance comparable to quicksort while


preserving O(n log n) worst-case behavior by creating a hybrid of quicksort and heapsort. In the same
paper, Musser introduced an "introspective selection" algorithm, popularly called introselect, which
combines Hoare's algorithm with the worst-case linear algorithm described above to achieve worst-case
linear selection with performance similar to Hoare's algorithm. [1] It works by optimistically starting out with
Hoare's algorithm and only switching to the worst-case linear algorithm if it recurses too many times
without making sufficient progress. Simply limiting the recursion to constant depth is not good enough,
since this would make the algorithm switch on all sufficiently large lists. Musser discusses a couple of
simple approaches:

• Keep track of the list of sizes of the subpartitions processed so far. If at any point k recursive calls have been made without halving the list size, for some small positive k, switch to the worst-case linear algorithm.
• Sum the size of all partitions generated so far. If this exceeds the list size times some small positive constant k, switch to the worst-case linear algorithm. This sum is easy to track in a single scalar variable.

Both approaches limit the recursion depth to O(k log n), which is O(log n) since k is a predetermined
constant. The paper suggested that more research on introselect was forthcoming, but as of 2007 it has
not appeared.

Selection as incremental sorting


One of the advantages of the sort-and-index approach, as mentioned, is its ability to amortize the sorting
cost over many subsequent selections. However, sometimes the number of selections that will be done is
not known in advance, and may be either small or large. In these cases, we can adapt the algorithms
given above to simultaneously select an element while partially sorting the list, thus accelerating future
selections.

Both the selection procedure based on minimum-finding and the one based on partitioning can be seen
as a form of partial sort. The minimum-based algorithm sorts the list up to the given index, and so clearly
speeds up future selections, especially of smaller indexes. The partition-based algorithm does not
achieve the same behaviour automatically, but can be adapted to remember its previous pivot choices
and reuse them wherever possible, avoiding costly partition operations, particularly the top-level one. The
list becomes gradually more sorted as more partition operations are done incrementally; no pivots are
ever "lost." If desired, this same pivot list could be passed on to quicksort to reuse, again avoiding many
costly partition operations.

Using data structures to select in sublinear time


Given an unorganized list of data, linear time (Ω(n)) is required to find the minimum element, because we
have to examine every element (otherwise, we might miss it). If we organize the list, for example by
keeping it sorted at all times, then selecting the kth largest element is trivial, but then insertion requires
linear time, as do other operations such as combining two lists.

The strategy to find an order statistic in sublinear time is to store the data in an organized fashion using
suitable data structures that facilitate the selection. Two such data structures are tree-based structures
and frequency tables.

When only the minimum (or maximum) is needed, a good approach is to use a heap, which is able to find
the minimum (or maximum) element in constant time, while all other operations, including insertion, are
O(log n) or better. More generally, a self-balancing binary search tree can easily be augmented to make it
possible to both insert an element and find the kth largest element in O(log n) time. We simply store in
each node a count of how many descendants it has, and use this to determine which path to follow. The
information can be updated efficiently since adding a node only affects the counts of its O(log n)
ancestors, and tree rotations only affect the counts of the nodes involved in the rotation.

Another simple strategy is based on some of the same concepts as the hash table. When we know the
range of values beforehand, we can divide that range into h subintervals and assign these to h buckets.
When we insert an element, we add it to the bucket corresponding to the interval it falls in. To find the
minimum or maximum element, we scan from the beginning or end for the first nonempty bucket and find
the minimum or maximum element in that bucket. In general, to find the kth element, we maintain a count
of the number of elements in each bucket, then scan the buckets from left to right adding up counts until
we find the bucket containing the desired element, then use the expected linear-time algorithm to find the
correct element in that bucket.

If we choose h of size roughly sqrt(n), and the input is close to uniformly distributed, this scheme can
perform selections in expected O(sqrt(n)) time. Unfortunately, this strategy is also sensitive to clustering
of elements in a narrow interval, which may result in buckets with large numbers of elements (clustering
can be eliminated through a good hash function, but finding the element with the kth largest hash value
isn't very useful). Additionally, like hash tables this structure requires table resizings to maintain efficiency
as elements are added and n becomes much larger than h². A useful case of this is finding an order
statistic or extremum in a finite range of data. Using above table with bucket interval 1 and maintaining
counts in each bucket is much superior to other methods. Such hash tables are like frequency
tables used to classify the data in descriptive statistics.
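
A minimal version of the bucket idea, assuming numeric values with a known range, could look like this; the choice of h ≈ sqrt(n) follows the text, while everything else is an illustrative assumption (sorting the final small bucket stands in for the expected linear-time algorithm).

import math

def bucket_select(values, k, lo, hi):
    # Find the k-th smallest (0-based) of values known to lie in [lo, hi).
    n = len(values)
    h = max(1, math.isqrt(n))                 # roughly sqrt(n) buckets
    width = (hi - lo) / h
    buckets = [[] for _ in range(h)]
    for v in values:
        idx = min(h - 1, int((v - lo) / width))
        buckets[idx].append(v)
    # Scan buckets left to right, accumulating counts until the k-th element's bucket is found.
    seen = 0
    for bucket in buckets:
        if seen + len(bucket) > k:
            return sorted(bucket)[k - seen]   # small bucket: sorting stands in for quickselect
        seen += len(bucket)
    raise IndexError("k out of range")

assert bucket_select([0.9, 0.1, 0.5, 0.3, 0.7], 2, 0.0, 1.0) == 0.5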

Selecting k smallest or largest elements


Another fundamental selection problem is that of selecting the k smallest or k largest elements, which is
particularly useful where we want to present just the "top k" of an unsorted list, such as the top 100
corporations by gross sales.

Application of simple selection algorithms


We can use the linear-time solution discussed above to select the kth largest element, then run through
the list in linear time and choose all elements less than or equal to that element. If the list needs to be
sorted, then this can be done in O(k log k) after the fact.

Direct application of the quicksort-based selection algorithm


The quicksort-based selection algorithm can be used to find the k smallest or the k largest elements. To
find the k smallest elements, find the kth smallest element using the median of medians quicksort-based
algorithm. After the partition that finds the kth smallest element, all elements smaller than the kth smallest
element will be to the left of the kth element and all elements larger will be to the right. Thus all elements
from the 1st to the kth element inclusive constitute the k smallest elements. The time complexity is linear
in n, the total number of elements.

Data structure-based solutions


Another simple method is to add each element of the list into an ordered set data structure, such as
a heap or self-balancing binary search tree, with at most k elements. Whenever the data structure has
more than k elements, we remove the largest element, which can be done in O(log k) time. Each insertion
operation also takes O(log k) time, resulting in O(n log k) time overall.

It is possible to transform the list into a heap in Θ(n) time, and then traverse the heap using a
modified Breadth-first search algorithm that places the elements in a Priority Queue (instead of the
ordinary queue that is normally used in a BFS), and terminate the scan after traversing exactly k
elements. As the queue size remains O(k) throughout the traversal, it would require O(k log k) time to
complete, leading to a time bound of O(n + k log k) on this algorithm.
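
A compact Python illustration of the bounded data-structure approach, keeping at most k elements in a max-heap (simulated by negating values with heapq), so that each insertion or replacement costs O(log k):

import heapq

def k_smallest(values, k):
    # Return the k smallest elements in sorted order using a size-k max-heap, O(n log k) overall.
    heap = []                                  # stores negated values, so it behaves as a max-heap
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, -v)
        elif -heap[0] > v:                     # current maximum of the k kept elements
            heapq.heapreplace(heap, -v)        # pop largest, push new value in O(log k)
    return sorted(-x for x in heap)

assert k_smallest([5, 2, 9, 1, 7, 3], 3) == [1, 2, 3]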

We can achieve an O(log n) time solution using skip lists. Skip lists are sorted data structures that allow
insertion, deletion and indexed retrieval in O(log n) time. Thus, for any given percentile, we can insert a
new element into (and possibly delete an old element from) the list in O(log n), calculate the
corresponding index(es) and finally access the percentile value in O(log n) time. See, for example, this
Python-based implementation for calculating running median.

Optimised sorting algorithms


More efficient than any of these are specialized partial sorting algorithms based
on mergesort and quicksort. The simplest is the quicksort variation: there is no need to recursively sort
partitions which only contain elements that would fall after the kth place in the end. Thus, if the pivot falls
in position k or later, we recur only on the left partition:

function quicksortFirstK(list, left, right, k)
    if right > left
        select pivotIndex between left and right
        pivotNewIndex := partition(list, left, right, pivotIndex)
        quicksortFirstK(list, left, pivotNewIndex-1, k)
        if pivotNewIndex < k
            quicksortFirstK(list, pivotNewIndex+1, right, k)

The resulting algorithm requires an expected time of only O(n + k log k), and is quite efficient in practice,
especially if we substitute selection sort when k becomes small relative to n. However, the worst-case
time complexity is still very bad, in the case of a bad pivot selection. Pivot selection along the lines of the
worst-case linear time selection algorithm could be used to get better worst-case performance.
Even better is if we don't require those k items to be themselves sorted. Losing that requirement means
we can ignore all partitions that fall entirely before or after the kth place. We recur only into the partition
that actually contains the kth element itself.

function quickfindFirstK(list, left, right, k)
    if right > left
        select pivotIndex between left and right
        pivotNewIndex := partition(list, left, right, pivotIndex)
        if pivotNewIndex > k  // new condition
            quickfindFirstK(list, left, pivotNewIndex-1, k)
        if pivotNewIndex < k
            quickfindFirstK(list, pivotNewIndex+1, right, k)

The resulting algorithm requires an expected time of only O(n), which is the best such an algorithm can
hope for.

A simpler formulation of a worst-case O(n) algorithm is as follows:

• simply use the median-of-medians linear general selection algorithm described in the sections above to find the kth element in O(n) time in the worst case;
• use the partition operation (which is O(n)) from quicksort to partition the list into the elements less than and greater than the kth element.
Tournament Algorithm
Another method is tournament algorithm. The idea is to conduct a knockout minimal round tournament to
decide the ranks. It first organises the games (comparisons) between adjacent pairs and moves the
winners to next round until championship (the first best) is decided. It also constructs the tournament tree
along the way. Now the second best element must be among the direct losers to winner and these losers
can be found out by walking in the binary tree in O(log n) time. It organises another tournament to decide
the second best among these potential elements. The third best must be one among the losers of the
second best in either of the two tournament trees. The approach continues until we find k elements. This
algorithm takes O(n + k log n) complexity, which for any fixed k independent of n is O(n).

Lower bounds
In his seminal The Art of Computer Programming, Donald E. Knuth discussed a number of lower bounds
for the number of comparisons required to locate the k smallest entries of an unorganized list of n items
(using only comparisons). There's a trivial lower bound of n − 1 for the minimum or maximum entry. To
see this, consider a tournament where each game represents one comparison. Since every player except
the winner of the tournament must lose a game before we know the winner, we have a lower bound of n −
1 comparisons.

The story becomes more complex for other indexes. To find the k smallest values requires at least this
many comparisons:

[Lower-bound formula not reproduced here.]

This bound is achievable for k = 2, but better, more complex bounds exist for larger k.

Language support
Very few languages have built-in support for general selection, although many provide facilities for
finding the smallest or largest element of a list. A notable exception is C++, which provides a
templated nth_element method with a guarantee of expected linear time. It is implied but not
required that it is based on Hoare's algorithm by its requirement of expected linear time. (Ref section
25.3.2 of ISO/IEC 14882:2003(E) and 14882:1998(E), see also SGI STL description of nth_element)

C++ also provides the partial_sort algorithm, which solves the problem of selecting the
smallest k elements (sorted), with a time complexity of O(n log k). No algorithm is provided for
selecting the greatest k elements since this should be done by inverting the ordering predicate.

For Perl, the module Sort::Key::Top, available from CPAN, provides a set of functions to select the
top n elements from a list using several orderings and custom key extraction procedures.

Python's standard library (since 2.4) includes heapq.nsmallest() and nlargest(), returning
sorted lists in O(n + k log n) time.
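
For example:

import heapq

data = [17, 3, 42, 8, 23, 5]
print(heapq.nsmallest(3, data))   # [3, 5, 8]
print(heapq.nlargest(2, data))    # [42, 23]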

Because language support for sorting is more ubiquitous, the simplistic approach of sorting followed
by indexing is preferred in many environments despite its disadvantage in speed. Indeed for lazy
languages, this simplistic approach can even get you the best complexity possible for
the k smallest/greatest sorted (with maximum/minimum as a special case) if your sort is lazy
enough.

Online selection algorithm


In certain selection problems, selection must be online, that is, an element can only be selected from
a sequential input at the instance of observation and each selection, respectively refusal, is
irrevocable. The problem is to select, under these constraints, a specific element of the input
sequence (as for example the largest or the smallest value) with largest probability. This problem
can be tackled by the Odds algorithm which is known to be optimal under an independence
condition. The algorithm is also optimal itself with the number of operations being linear in the length
of input.
