
UNIVERSITY OF CALCUTTA

FEBRUARY, 2012

SPECIMEN

DNA CRYPTOGRAPHY
A PROJECT REPORT

Under the Guidance of


Prof. SANJIT KUMAR SETUA

Submitted by
Mr. NURUL HASAN

In partial fulfillment for the award of the 3rd semester of

MASTER OF TECHNOLOGY

IN

COMPUTER SCIENCE & ENGINEERING


UNIVERSITY OF CALCUTTA | DNA CRYPTOGRAPHY


CERTIFICATE

This is to certify that the project entitled “DNA CRYPTOGRAPHY” by NURUL HASAN has been prepared under the supervision of Prof. SANJIT KUMAR SETUA as part of the fulfillment of the M.Tech 3rd semester examination 2012, in the Computer Science & Engineering Department of the University of Calcutta.

_______________________________ _______________________________

Prof. SANJIT KUMAR SETUA Prof. K.N.D

(Reader, University of Calcutta) (Head of the Department)

(Department of computer science, C.U)

INDEX
1. INTRODUCTION

1.1. DNA Computation

1.2. Limitations in Silicon-Based Technology

1.3. DNA Computing

1.4. The Genesis of DNA Computing

1.5. Different Forms of DNA Computing

1.6. Benefits of DNA Computing

1.7. Problems with DNA Computing

1.8. Future of DNA Computing

1.9. RELATED WORK

2. DNA COMPUTING AND THE RUDIMENTS OF BIOLOGY

2.1 Structure of DNA Molecule

2.2 Molecular Operations

2.2.1 Melt and Anneal

2.2.2 Polymerize

2.2.3 Synthesis

2.2.4 Gel Electrophoresis

2.2.5 Amplification through PCR


2.2.6 Detect

2.2.7 Extract



2.2.8 Sequencing

2.2.9 Cut and Ligate

2.3 Some Points in Favor of DNA Computing

3. MODELS OF DNA COMPUTATION

3.1 Sticker Model

3.2 Splicing System

3.3 Algorithmic Self-Assembly

4. DNA COMPUTING CHALLENGES TRADITIONAL CRYPTOLOGY

4.1 Breaking DES

4.2 Breaking RSA

4.3 Breaking Number Theory Research Unit

4.4 Breaking International Data Encryption Algorithm

5. AN ENCRYPTION SCHEME USING DNA TECHNOLOGY AND ITS WEAKNESS

5.1 DNA Digital Coding Technology

5.2 Key Generation

5.3 Encryption Procedure

5.4 Decryption Procedure

5.5 Details of the Scheme with Example

5.6 Security of the Scheme

5.7 Vulnerability of the Scheme to a Type of “Man in the Middle Attack”



7. CONCLUSION



1. INTRODUCTION

Ever since scientists discovered that conventional silicon-based computers have an upper limit in terms of speed, they have been searching for alternative media with which to solve computational problems. The search led to DNA. DNA computing is a method of solving computational problems with the help of biological and chemical operations on DNA strands. It was introduced by Adleman in 1994. Since then, more and more researchers have been motivated by the promising future of this area and have started working on it.

DNA is the basic storage medium of all living cells. Its main function is to store and transmit the data of life, as it has done for billions of years. Roughly 10 trillion DNA molecules could fit into a space the size of a marble. Since all these molecules can process data simultaneously, one could in principle have 10 trillion calculations going on in a small space at once.

Think of DNA as software, and enzymes as hardware. Put them together in a test tube. The way in which these molecules undergo chemical reactions with each other allows simple operations to be performed as a byproduct of the reactions. Scientists tell the devices what to do by controlling the composition of the DNA software molecules. This is a completely different approach from pushing electrons around a dry circuit in a conventional computer. To the naked eye, a DNA computer looks like a clear water solution in a test tube; there is no mechanical device. A trillion bio-molecular devices could fit into a single drop of water. Instead of showing up on a computer screen, results are analyzed using a technique that allows scientists to see the length of the DNA output molecule.

1.1. DNA Computation

DNA computing has the potential to overcome the limits imposed on the processing power of silicon-based computers. This report considers DNA computing as a means of solving computational problems. It surveys DNA computing in its various forms and considers the benefits and possible problems of this different form of computing. The advent of DNA computing has also opened doors for collaboration among computer scientists, chemists, biologists, and mathematicians. Since Adleman’s experiment, computer scientists and biologists have had the opportunity to study and conduct research in fields completely different from their own. Such collaborative efforts broaden the scope of research in these fields and lead to new insights and perspectives that otherwise would not be discovered. “It’s nice to see that computer scientists are getting to know a lot about DNA, and that molecular biologists are getting to know a lot about computer science.”

1.2. Limitations in Silicon-Based Technology

Growth in computer processing power has followed Moore’s Law, the observation that the number of transistors on a chip doubles roughly every two years. This has been achieved by decreasing the size of transistors and increasing their number in a processor. In the coming years, however, transistors will shrink to the point where the only way to make them smaller is to build them from individual atoms, and at that scale the transmission of information becomes unreliable. There is therefore a lower limit on the size of silicon-based components. These chips are also made of toxic components, and silicon-based computers waste a lot of energy, both in the power they consume and in the heat they generate.

1.3. DNA Computing

A DNA molecule has a double-helix structure composed of two sugar-phosphate backbones, each formed by alternating deoxyribose sugars and phosphate groups. Between the two backbones lie pairs of bases: adenine, cytosine, guanine and thymine. DNA computers use single strands of DNA to perform computing operations.

DNA computing focuses on the use of massive parallelism, that is, the allocation of tiny portions of a computing task to many different processing elements. The structure of DNA allows the elements of a problem to be represented in a form analogous to binary code. Trillions of unique DNA strands can represent all of the possible solutions to the problem at once. Some scientists predict a future in which our bodies are patrolled by tiny DNA computers that monitor our well-being and release the right drugs to repair damaged or unhealthy tissue.


1.4. The Genesis of DNA Computing



The first major breakthrough in the field of DNA computing occurred in 1994, when Adleman used DNA computing to solve an instance of the directed Hamiltonian Path problem, a close relative of the traveling salesman problem. In a Hamiltonian Path problem, a set of towns is connected by one-way roads.

A hypothetical salesman has to find a route from a given starting town to a given final destination that visits every town exactly once; the traveling salesman problem additionally asks for the shortest such route. When the number of towns is small, the question can be tackled by listing all possible itineraries and checking each one. As the number of towns grows, the problem generates too many possible paths for brute-force solving by hand, so a computer is needed. However, even with a computer, a Hamiltonian Path problem can easily become too complicated to solve, because the number of candidate paths grows extremely rapidly with the number of towns.

Although the solution to Adleman’s seven-city Hamiltonian Path problem was relatively straightforward (since all possible routes can be written by hand in a reasonable amount of time), his experiment showed that DNA could be useful as a computational tool.
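To make the generate-and-filter idea concrete, the following Python sketch runs Adleman-style filtering steps sequentially on a small, hypothetical four-town graph. It is only an in-silico stand-in: in the test tube all candidate paths are formed at once by ligation, whereas here they are enumerated one by one. The graph, the function name `hamiltonian_paths`, and the restriction to candidate paths of exactly n towns are assumptions made for illustration.

```python
from itertools import product

# Hypothetical directed graph: (a, b) means a one-way road from town a to town b.
EDGES = {(0, 1), (1, 2), (2, 3), (0, 2), (2, 1), (1, 3)}

def hamiltonian_paths(edges, n, start, end):
    """Sequential stand-in for Adleman's generate-and-filter search."""
    # Generate: every sequence of n towns that follows existing roads
    # (ligation in the tube only joins town strands along edges).
    candidates = [
        path for path in product(range(n), repeat=n)
        if all((a, b) in edges for a, b in zip(path, path[1:]))
    ]
    # Filter 1: keep paths that begin at `start` and end at `end`.
    candidates = [p for p in candidates if p[0] == start and p[-1] == end]
    # Filter 2: length n already holds by construction (an assumption of this
    # sketch); keep only paths that visit every town, hence each exactly once.
    return [p for p in candidates if set(p) == set(range(n))]

print(hamiltonian_paths(EDGES, 4, 0, 3))  # [(0, 1, 2, 3), (0, 2, 1, 3)]
```

On this toy graph only two routes survive the filters; on realistic instances the number of candidates explodes, which is exactly where the massive parallelism of DNA was meant to help.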

1.5. Different Forms of DNA Computing

One form is DNA computing proper, in which information is processed by making and breaking bonds between DNA components. DNA computers can solve a variety of problems and have proved their worth by solving complicated problems such as the “salesman” (Hamiltonian Path) problem.

Another form is the DNA chip, which scientists are using in research on the treatment of diseases. Efforts are under way to create tiny robots that could reside in cells and interact with different processes of living organisms.

Researchers are developing genetic “computer programs” that could be introduced into and replicated by living cells in order to control their processes. Research has already produced engineered sequences of genetic material that can cause the living cell in which they are implanted to express one of two possible genes. Such sequences are effectively analogous to computer programs and can serve as “switches” to control the chemicals that living organisms synthesize.

Another variant is to combine living organisms with silicon-based technology, because the brains of living organisms have the ability to understand problems that no amount of silicon-based computing will be able to handle.


1.6. Benefits of DNA Computing

The following are the major benefits of DNA computing.

Parallelism. “The speed of any computer, biological or not, is determined by two factors: (i) how many parallel processes it has; (ii) how many steps each one can perform per unit time. The exciting point about biology is that the first of these factors can be very large: recall that a small amount of water contains about 10^22 molecules. Thus, biological computations could potentially have vastly more parallelism than conventional ones.”

Gigantic Memory Capacity. DNA provides extremely dense information storage. For example, one gram of DNA, which when dry would occupy a volume of approximately one cubic centimeter, can store as much information as approximately one trillion CDs.

Low Power Dissipation. DNA computers can perform 2 × 10^19 (irreversible) operations per joule. Existing supercomputers are not very energy-efficient, executing a maximum of 10^9 operations per joule. As energy becomes increasingly valuable, this property of DNA computers could prove very important.
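Using the two energy figures quoted above, the advantage can be stated as a simple ratio (a back-of-the-envelope comparison only, taking both figures at face value):

```latex
\frac{2 \times 10^{19}\ \text{operations/J (DNA)}}{10^{9}\ \text{operations/J (supercomputer)}} = 2 \times 10^{10}
```

That is, by these estimates a DNA computer would perform about twenty billion times more operations per joule.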

Suitable for Combinatorial Problems. Much of the work on DNA computing has continued to focus on solving NP-complete and other hard computational problems. Experiments have shown that DNA computers are suitable for solving complex combinatorial problems, although even now it still takes several days to solve problems such as the Hamiltonian Path problem. The key point is that Adleman's original and subsequent works demonstrated the ability of DNA computers to obtain solutions to instances of NP-complete and other hard problems by exploring a vast number of candidate solutions in parallel.

Clean, Cheap and Available. Besides the above characteristics, DNA computers are also clean, cheap and available. They are clean because no harmful material is used to produce them and no pollution is generated. They are cheap and available because DNA can easily be found in nature: there is no need to exploit mines, and all that needs to be done is to extract or refine the required parts from organisms.

1.7. Problems with DNA Computing

A number of problems with DNA computing must be resolved before it can reach its full
potential.

1) First, in some cases the type of genetic sequences that would have to be synthesized to make
fully functional genetic robots would be expensive using current methods.

2) Second, despite their capability for massively parallel calculations, the individual operations of DNA computers are quite slow in comparison to those of silicon-based computers.

3) Third, DNA computing requires quantities of DNA that can only be used once, as reuse can contaminate reaction vessels and lead to less accurate results.

4) Fourth, DNA computing is prone to errors at a level that would be considered unacceptable by the silicon-based computer industry.

5) Fifth, DNA molecules can fracture and deteriorate. Over six months of computing, a DNA system gradually turns to water: molecules that were part of the computer break apart over time, and the computer may start to dissolve. DNA can be damaged while it waits around in solution, and the manipulations of DNA are themselves prone to error.

1.8. Future of DNA Computing

The current state of DNA computing research does not suggest that DNA computers will
provide a successor to silicon within the next few decades if at all.

However, we do not believe that DNA computing should be written off completely: while a DNA computer in the traditional sense may be a pipe dream, there are niche application areas where the technology may play a part.

It may be possible to integrate DNA computing technology with more traditional approaches to create DNA/silicon hybrid architectures, or to exploit it within software. Since software is more flexible and better suited to rapid adaptation than hardware, we may see the benefits of DNA computing implemented and exploited in software first, leaving hardware to play catch-up.

Success hinges on the refinement of the DNA computing process to reduce the time taken to
isolate the correct results from all the possibilities generated, and the addition of autonomy to
allow DNA computers to arrive at their results with the minimum of human interference.

1.9. RELATED WORK

Researchers are developing genetic “computer programs” that could be introduced into and replicated by living cells in order to control their processes. Research has already produced engineered sequences of genetic material that can cause the living cell in which they are implanted to express one of two possible genes. Such sequences are effectively analogous to computer programs and could serve as “switches” to control the chemicals that living organisms synthesize.

Development efforts are underway to add data processing elements, memory storage
elements, and communication elements, to produce tiny genetic “robots” that could reside in
cells. This would allow a level of interface with living processes on a microscopic level that is
not possible using strictly silicon-based computing technology. Such techniques could provide
an unprecedented level of control over such processes.

Another variant of biological computing development seeks to unite living organisms with
silicon-based computing technology. The purpose of this is to use living organisms to control
technology. Such technology involves linking living neural cells to silicon-based computing
components. The reason for doing this is that the brains of humans, and to a lesser degree,
those of lower organisms, have abilities to understand complicated problems that no amount of
silicon-based processing power will be able to handle. Further, they are able to solve problems
correctly, even with only partial information.

This field is in an extremely early stage of development. For example, researchers at Georgia Tech have used leech neurons to perform mathematical operations, as shown in the picture above. Further, other researchers have managed to link the brain of a lamprey eel to a robot for the purpose of controlling it. The brain has already shown the ability to process information from the surrounding environment and direct the robot’s movements in response to the stimuli. Some researchers are quite interested in extending development work in this area to human beings, albeit in slow steps.

Recently, British cybernetics professor Kevin Warwick was himself the subject of an experiment in which a computer chip was attached to a main nerve in his arm. The chip remained in his arm for four months, allowing him to control a robot on wheels.

2. DNA COMPUTING AND THE RUDIMENTS OF BIOLOGY

DNA (deoxyribonucleic acid) computing, also known as molecular computing, is a new approach to massively parallel computation based on groundbreaking work by Adleman. DNA computing was proposed as a means of solving a class of intractable computational problems in which the computing time can grow exponentially with problem size (the NP-complete, or nondeterministic polynomial time complete, problems). A DNA computer is basically a collection of specially selected DNA strands whose combinations will result in the solution to some problem, depending on the problem at hand. Technology is currently available both to select the initial strands and to filter the final solution. DNA computing is a new computational paradigm that employs (bio)molecular manipulation to solve computational problems, while at the same time exploring natural processes as computational models. In 1994, Leonard Adleman at the Laboratory of Molecular Science, Department of Computer Science, University of Southern California, surprised the scientific community by using the tools of molecular biology to solve a hard computational problem. The main idea was to encode data in DNA strands and to use tools from molecular biology to execute computational operations. Besides the novelty of this approach, molecular computing has the potential to outperform electronic computers. For example, DNA computations may use a billion times less energy than an electronic computer while storing data in a trillion times less space. Moreover, computing with DNA is highly parallel: in principle there could be billions upon trillions of DNA molecules undergoing chemical reactions, that is, performing computations, simultaneously.

2.1 Structure of DNA Molecule

DNA is the way nature stores information and passes it from one generation to the next within the same species. DNA (deoxyribonucleic acid) is a double-stranded sequence of four nucleotides: adenine (A), guanine (G), cytosine (C), and thymine (T), often called bases. DNA supports two key functions for life:
- coding for the production of proteins,
- self-replication.

Each deoxyribonucleotide consists of three components:

- a sugar, deoxyribose, which has
  - five carbon atoms, numbered 1´ to 5´, and
  - a hydroxyl group (OH) attached to the 3´ carbon;
- a phosphate group;
- a nitrogenous base.



Fig. 1 a) Bonds in a Double Strand Chain of DNA

Fig. 1 b) Hydrogen Bonds between Complementary Nucleotides

The chemical structure of DNA consists of a particular bonding of two linear sequences of bases (Fig. 1). This bonding follows a property of complementarity: adenine bonds with thymine (A-T) and vice versa (T-A), and cytosine bonds with guanine (C-G) and vice versa (G-C). This is known as Watson-Crick complementarity. Each DNA strand has two different ends that determine its polarity: the 3’ end and the 5’ end. The double helix (Fig. 2) is an anti-parallel bonding (the two strands have opposite polarity) of two complementary strands.
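Watson-Crick complementarity and anti-parallel orientation can be captured in a few lines. The Python sketch below (the function names `wc_complement` and `can_anneal` are hypothetical, chosen for illustration) writes every strand 5’ to 3’, so the partner strand is obtained by pairing each base and reversing the order. `can_anneal` idealizes binding as perfect, full-length complementarity, ignoring temperature, pH and mismatches.

```python
# Watson-Crick pairing: A with T, C with G (in both directions).
WC_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def wc_complement(strand: str) -> str:
    """Return the WC complement of `strand`, both written 5'->3'.

    Because the two strands of a double helix are anti-parallel, the partner
    read 5'->3' is the base-wise complement taken in reverse order.
    """
    return "".join(WC_PAIR[base] for base in reversed(strand))

def can_anneal(s1: str, s2: str) -> bool:
    """Idealized check: two single strands form a perfect duplex only if
    each is the full-length WC complement of the other."""
    return s2 == wc_complement(s1)

print(wc_complement("AACGT"))        # ACGTT
print(can_anneal("AACGT", "ACGTT"))  # True
print(can_anneal("AACGT", "TTGCA"))  # False: right bases, wrong orientation
```

The last call shows why polarity matters: `TTGCA` pairs base by base with `AACGT` only if the two strands run in the same direction, which the anti-parallel helix does not allow.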



Fig. 2 The Structure of the DNA Double Helix

2.2 Molecular Operations

DNA molecules can be manipulated in various ways. The different chemical operations serve as the tool kit for DNA computation.

2.2.1 Melt and Anneal

A double-stranded DNA molecule can be separated into two single strands by varying the temperature and pH of the solution; in this process the hydrogen bonds between the bases break. This is called melting. The reverse process is called annealing: two single strands that are Watson-Crick (WC) complementary and opposite in orientation combine to form a double-stranded DNA molecule. Annealing is also effected by varying the temperature and pH of the solution, which re-establishes the hydrogen bonds between the bases.



2.2.2 Polymerize

We can form the WC complement of a single-stranded DNA molecule using an enzyme called DNA polymerase. The strand whose complement is to be made is called the template. For this purpose, a short sequence of nucleotides, called the primer, is annealed to one end of the template (its 3’ end). The polymerase enzyme then extends the primer by adding nucleotides one by one to form the complete WC-complementary sequence.
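Primer extension can be sketched in the same idealized string model. In this Python sketch (the function `polymerize` and the example strands are hypothetical), both template and primer are written 5’ to 3’; the primer must match the start of the template's complement, i.e. bind to the template's 3’ end, and the polymerase then appends one complementary nucleotide at a time.

```python
WC_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def wc_complement(strand: str) -> str:
    """WC complement, both strands written 5'->3'."""
    return "".join(WC_PAIR[base] for base in reversed(strand))

def polymerize(template: str, primer: str) -> str:
    """Extend `primer` along `template` until the full complement is built.

    The primer must anneal to the template's 3' end, i.e. it must equal the
    first len(primer) bases of the template's complement (read 5'->3').
    """
    target = wc_complement(template)
    if not target.startswith(primer):
        raise ValueError("primer does not anneal to the 3' end of the template")
    product = primer
    # Polymerase adds nucleotides one at a time to the growing 3' end.
    for base in target[len(primer):]:
        product += base
    return product

print(polymerize("TTGACCA", "TGG"))  # TGGTCAA
```

Real polymerases also need free nucleotides, suitable buffer conditions and time; the sketch only captures the template-directed, one-base-at-a-time logic.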

2.2.3 Synthesis

Any short sequence of nucleotides can be synthesized in the lab. Longer random sequences are available in the DNA pool of various organisms.


2.2.4 Gel Electrophoresis

DNA molecules can be separated according to their lengths. To do this, some of the DNA solution, stained with ethidium bromide, is placed in a container filled with agarose gel. The container is then placed in an electric field. As DNA molecules are negatively charged, they start moving through the gel under the electric field. Shorter (lighter) molecules move a greater distance than longer (heavier) ones. Thus they form separate bands on the gel, which can be viewed, or cut out to recover the strands in each band. Under suitable conditions, strands whose lengths differ by only a single nucleotide can also be separated [17].
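An idealized model of the separation is to bin strands by length and order the bins by distance travelled. The Python sketch below (the function `run_gel` is hypothetical) assumes migration depends on length alone, ignoring sequence effects, gel concentration and band smearing.

```python
from collections import defaultdict

def run_gel(strands):
    """Return bands as (length, strands) pairs, ordered from the band that
    travels farthest (shortest strands) to the one nearest the wells."""
    bands = defaultdict(list)
    for strand in strands:
        bands[len(strand)].append(strand)
    # Shorter molecules move farther through the gel toward the positive electrode.
    return [(length, bands[length]) for length in sorted(bands)]

for length, band in run_gel(["ACGT", "AC", "TTGCA", "GG"]):
    print(length, band)
```

Note that strands of equal length but different sequence end up in the same band, which is why gel electrophoresis alone cannot read out a sequence.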


2.2.5 Amplification through PCR

With the technique called the polymerase chain reaction (PCR), a strand can be replicated very efficiently and elegantly. Suppose we want to amplify a double-stranded molecule A with known borders B and C. First, a solution containing A, the two primers B and C, the polymerase enzyme and plenty of nucleotides is prepared. The strand A is then melted by heating to form two complementary single strands A1 and A2. When the solution is cooled, the primers attach at the ends. As in polymerization, the primers are then extended by the polymerase to form whole strands. Each repetition of this cycle ideally doubles the number of copies, so repeating the process amplifies strand A exponentially.
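The exponential growth is easy to quantify. This Python sketch (the function `pcr_copies` and its `efficiency` parameter are assumptions, not part of the text) treats each melt-anneal-extend cycle as copying each molecule with a given probability; `efficiency=1.0` is the ideal doubling case, giving initial × 2^cycles copies.

```python
def pcr_copies(initial: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of copies of strand A after `cycles` rounds of PCR.

    Each round (melt, anneal primers, extend) copies every existing molecule
    with probability `efficiency`; 1.0 means perfect doubling per cycle.
    """
    copies = float(initial)
    for _ in range(cycles):
        copies += copies * efficiency
    return copies

print(pcr_copies(1, 30))  # 1073741824.0, about a billion copies from one molecule
```

With an efficiency below 1 the growth is still exponential, just with base (1 + efficiency) instead of 2.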

2.2.6 Detect

Given a test tube containing a solution, we can easily detect whether the tube contains any strands or not using reagents. PCR may be used to amplify the result, and then a process called “sequencing” is used to actually read the solution.

2.2.7 Extract


In this process, all strands containing a given substrand x can be extracted from a solution of different DNA molecules. This can be done using biotin-avidin affinity purification as in [1]. First, copies of the WC complement of x, called the probes, are made and attached to biotin molecules, which in turn are anchored to an avidin bead matrix. If we then melt the strands in the solution and pour them over the matrix, strands containing x will anneal to the probes, while the other strands are simply washed out.
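Abstracting away the chemistry, extraction splits one tube into two. In this Python sketch (the names `extract` and `wc_complement` are hypothetical), a strand is retained when the probe, the WC complement of x, can pair with some stretch of it, which in the idealized string model means that x occurs in the strand.

```python
WC_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def wc_complement(strand: str) -> str:
    """WC complement, both strands written 5'->3'."""
    return "".join(WC_PAIR[base] for base in reversed(strand))

def extract(tube, x):
    """Return (strands retained on the probe matrix, strands washed out)."""
    probe = wc_complement(x)
    # A strand binds the probe if it contains the probe's complement, i.e. x itself.
    retained = [s for s in tube if wc_complement(probe) in s]
    washed_out = [s for s in tube if wc_complement(probe) not in s]
    return retained, washed_out

print(extract(["AAGTC", "GTCAA", "TTTT"], "GTC"))
```

This split into "contains x" and "does not contain x" tubes is the primitive that filtering algorithms such as Adleman's repeat once per condition.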

2.2.8 Sequencing

In this process, an unknown DNA strand is read to determine its exact sequence of nucleotides. But sequencing is costly, and effort should be made to avoid it.

2.2.9 Cut and Ligate

There are enzymes that can cut DNA molecules in various ways. Endonucleases can cut a double strand, and S1 endonuclease a single strand; but they are not site-specific, i.e., they can cut anywhere in the molecule. There is a class of enzymes called restriction enzymes which cut double strands at specific sites having a definite sequence of nucleotides. An example is EcoRI, which acts on sites having the sequence GAATTC. This action is called splicing. Two separated strands can be reunited by removing the restriction enzyme and adding an enzyme called ligase, whose general action is to create a phosphodiester bond. Two partial double strands with complementary overhanging sequences of nucleotides, called sticky ends, can be joined by applying ligase. This process is called ligation.
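EcoRI cuts its recognition site GAATTC between the G and the first A on each strand (G^AATTC), leaving complementary 5’-AATT sticky ends. The Python sketch below (the functions `ecori_cut` and `ligate` are hypothetical) tracks only the top strand, which is enough here because the site is its own WC complement; ligation is modelled as accepting two fragments only when their junction recreates the site.

```python
SITE = "GAATTC"  # EcoRI recognition sequence (its own WC complement)
CUT = 1          # EcoRI cuts after the first base: G^AATTC

def ecori_cut(seq: str) -> list[str]:
    """Top-strand fragments after cutting at every EcoRI site in `seq`."""
    fragments, start = [], 0
    pos = seq.find(SITE)
    while pos != -1:
        fragments.append(seq[start:pos + CUT])
        start = pos + CUT
        pos = seq.find(SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments

def ligate(left: str, right: str) -> str:
    """Join two fragments if their EcoRI sticky ends are compatible."""
    joined = left + right
    junction = joined[len(left) - CUT:len(left) - CUT + len(SITE)]
    if junction != SITE:
        raise ValueError("ends are not compatible EcoRI sticky ends")
    return joined

pieces = ecori_cut("TTGAATTCAAGAATTCCC")
print(pieces)                        # ['TTG', 'AATTCAAG', 'AATTCCC']
print(ligate(pieces[0], pieces[2]))  # TTGAATTCCC
```

Joining the first and last fragments, skipping the middle one, is a small example of the cut-and-recombine behaviour that the splicing model abstracts.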



2.3 Some Points in Favour of DNA Computing

1. DNA strand has the capability of processing information due to its chemical
properties.

2. DNA strand can store an incredible amount of data in a very small volume.
3. It is massively parallel. This gives DNA computing very high speed.



4. Various complex structures, such as living organisms, are the result of applying a few simple operations to an initial DNA sequence. This makes us believe that DNA can be a potential tool for computation.

3. MODELS OF DNA COMPUTATION

There are various models of DNA computation. Some of them are mathematical models that capture real laboratory operations, while others are purely mathematical and their experimental feasibility remains unresolved. Here we describe three models that have been used for the purpose of DNA cryptology.

3.1 Sticker Model

This model, introduced in [18], was used in [5] for breaking the widely used Data Encryption Standard (DES). The model is based on WC complementarity and uses two kinds of strands, referred to as memory strands and sticker strands (or simply stickers). A memory strand is an n-nucleotide-long single strand consisting of k non-overlapping substrands, usually of equal length m. Not every nucleotide of the memory strand need belong to one of the k substrands; thus n ≥ mk. Any two of the k substrands differ in several base positions. A sticker is the WC complement of exactly one of the k substrands. If a sticker is attached to its matching substrand on the memory strand, that substrand is said to be on; otherwise it is off.

Operations on the Sticker Model:

Before specifying the operations on the sticker model, we define a test tube to be a multiset containing memory complexes (memory strands together with their attached stickers). The general operations on the memory complexes in a test tube are merge, separate, set and clear.
Merge: Two test tubes are combined into one. This is simply mixing the solutions of the two test tubes.
Separate: Given a test tube T and an integer i, 1 ≤ i ≤ k, this produces two test tubes +(T, i) and −(T, i), where +(T, i) (resp. −(T, i)) contains all memory complexes whose ith substrand is on (resp. off).
Set: Given a test tube T and an integer i, 1 ≤ i ≤ k, this produces another test tube set(T, i) in which each memory complex has its ith substrand on.
Clear: Given a test tube T and an integer i, 1 ≤ i ≤ k, this produces another test tube clear(T, i) in which each memory complex has its ith substrand off.
The input or initial test tube is a library of memory complexes. In particular, a (k, l) library, 1 ≤ l ≤ k, consists of memory complexes with k substrands in which the first l substrands are on and off in all possible ways, whereas the last k − l substrands are off. Thus, a (k, l) library contains 2^l different memory complexes.
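The four operations and the library can be modelled directly on bit vectors, which is how sticker algorithms are usually reasoned about. In this Python sketch a memory complex is a tuple of k booleans (True means the substrand carries a sticker) and a test tube is a list used as a multiset. The function names are hypothetical (`set_sticker` and `clear_sticker` avoid shadowing Python's built-in `set`), and the library assumes, as in the original sticker-model definition, that the last k − l substrands start off.

```python
from itertools import product

# A memory complex: tuple of k booleans (index i-1 holds substrand i; True = "on").
# A test tube: a list of memory complexes, used as a multiset.

def merge(t1, t2):
    """Pour two tubes together."""
    return t1 + t2

def separate(tube, i):
    """Return (+(T, i), -(T, i)) for 1 <= i <= k."""
    on = [c for c in tube if c[i - 1]]
    off = [c for c in tube if not c[i - 1]]
    return on, off

def set_sticker(tube, i):
    """set(T, i): attach the ith sticker to every complex."""
    return [c[:i - 1] + (True,) + c[i:] for c in tube]

def clear_sticker(tube, i):
    """clear(T, i): remove the ith sticker from every complex."""
    return [c[:i - 1] + (False,) + c[i:] for c in tube]

def library(k, l):
    """(k, l) library: first l substrands on/off in all 2**l ways, last k - l off."""
    return [bits + (False,) * (k - l) for bits in product((False, True), repeat=l)]

# Example: turn substrand 3 on in every complex whose substrand 1 is on.
tube = library(4, 2)
plus, minus = separate(tube, 1)
tube = merge(set_sticker(plus, 3), minus)
```

The example is the basic sticker-model idiom: separate on one bit, write a result bit into one of the sub-tubes, then merge, so that a single laboratory step acts on every memory complex at once.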

3.2 Splicing System

The splicing system was proposed by Tom Head [19]. It captures mathematically the two molecular operations cut and ligate introduced in Chapter 2. The mathematical model was introduced and studied before Adleman’s experiment. Many results, including the universality of splicing systems, were obtained in [20, 21, 22]. In several organisms the DNA present is circular. If both circular and linear DNA strands are present, the mathematical analysis becomes much more complicated, because several different possibilities must then be handled. Despite that, various results concerning circular splicing have been obtained in [23, 24]. From a practical point of view, circular splicing seems feasible because of the easy availability of the particular type of circular DNA strands and the simple operations on them. Circular splicing has been used in [25] to break a public-key cryptosystem.
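The text does not spell out the formal splicing operation, so the sketch below uses one standard formalization from the splicing-system literature, which should be read as an assumption: a rule (u1, u2; u3, u4) cuts x = x1 u1 u2 x2 between u1 and u2, cuts y = y1 u3 u4 y2 between u3 and u4, and recombines the pieces into x1 u1 u4 y2. The Python function `splice` and the EcoRI-style example rule are hypothetical and model linear strings only.

```python
def splice(x: str, y: str, rule: tuple[str, str, str, str]) -> set[str]:
    """Apply a splicing rule (u1, u2, u3, u4) to strings x and y.

    Wherever x = x1 u1 u2 x2 and y = y1 u3 u4 y2, produce x1 u1 u4 y2.
    Returns the set of all such products (empty if either site is missing).
    """
    u1, u2, u3, u4 = rule
    site_x, site_y = u1 + u2, u3 + u4
    products = set()
    for i in range(len(x) - len(site_x) + 1):
        if x[i:i + len(site_x)] != site_x:
            continue
        prefix = x[:i + len(u1)]  # x1 u1
        for j in range(len(y) - len(site_y) + 1):
            if y[j:j + len(site_y)] == site_y:
                products.add(prefix + y[j + len(u3):])  # append u4 y2
    return products

# Hypothetical rule mimicking an EcoRI cut (G^AATTC) followed by ligation.
RULE = ("G", "AATTC", "G", "AATTC")
print(splice("AAGAATTCTT", "CCGAATTCGG", RULE))  # {'AAGAATTCGG'}
```

Swapping the roles of x and y gives the other recombinant, which is how a splicing system generates a whole language of strings from a few initial ones.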

3.3 Algorithmic Self-Assembly

Introduced in [26], self-assembly seems to be a powerful tool for the DNA computing paradigm. Apart from creating the building blocks, self-assembly involves only annealing and ligation for computation. The building blocks are various complex nanoscopic structures of DNA molecules, some of which have been studied and created by Seeman et al. Winfree showed in [26] that self-assembly has the power of universal computation. In [27] it has been shown that nanoscopic structures made of DNA can serve as the building blocks of self-assembly. This gives greater flexibility while retaining the advantage of massive parallelism inherent in DNA computing. These nanoscopic structures can act like Wang tiles [28]. Wang tiles are squares with colored edges. If Wang tiles are allowed to cover the plane according to the additional rule that only edges of the same color can face each other, then Wang tiles can simulate a Turing machine; thus they have the power of universal computation. The DNA Wang tiles are “Double Crossover” or “Triple Crossover” tiles made up of several interwoven DNA strands that form a square body with sticky ends coming out of the sides or corners. Such tiles have been used to compute XOR in [29], and multiplication and circular convolution in [30]. The problems of attacking the cryptosystem NTRU and of creating a DNA one-time pad have been addressed in [30] and [15], respectively. The stability and error resistance of DNA tiles seem promising for DNA computing.
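As an abstract illustration of how matching edge colours can compute, the Python sketch below grows one row of hypothetical Wang-style tiles that computes a cumulative XOR, the kind of function computed in [29]. Each tile is described only by its four edge colours (bits); a tile can be placed only where its left edge matches its neighbour's right edge and its bottom edge matches the input bit below. The tile set, the function `assemble_xor_row`, and the left-to-right growth order are assumptions; real DX/TX tiles, sticky-end chemistry and assembly errors are not modelled.

```python
from itertools import product

# Hypothetical tile set: one tile for each (left, bottom) colour pair.
# Its right and top edges both carry bottom XOR left.
TILES = [
    {"left": y, "bottom": x, "right": x ^ y, "top": x ^ y}
    for x, y in product((0, 1), repeat=2)
]

def assemble_xor_row(inputs, boundary=0):
    """Attach tiles left to right on top of the input bits.

    A tile fits only if its left edge matches the right edge of the tile
    before it and its bottom edge matches the input bit beneath it.
    Returns the top-edge colours, i.e. the running XOR of the inputs.
    """
    left_colour, tops = boundary, []
    for x in inputs:
        fits = [t for t in TILES if t["left"] == left_colour and t["bottom"] == x]
        if len(fits) != 1:
            raise RuntimeError("tile set is not deterministic for this position")
        tile = fits[0]
        tops.append(tile["top"])
        left_colour = tile["right"]
    return tops

print(assemble_xor_row([1, 1, 0, 1]))  # [1, 0, 0, 1]
```

Because exactly one tile fits at each position, the computation is carried out entirely by the colour-matching rule; in the laboratory, sticky-end complementarity plays the role of the colours.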

4. DNA COMPUTING CHALLENGES TRADITIONAL CRYPTOLOGY

The security of traditional cryptology is usually based on hard mathematical problems for which no fast algorithm is currently known. A famous example is Rivest-Shamir-Adleman (RSA) encryption, whose security rests on the difficulty of finding the two prime factors of a large number. If fast methods for such mathematical problems were found, the corresponding cryptosystems might no longer be secure. DNA computing provides parallel processing capability at the molecular level, introducing a brand-new data structure and method of calculation. It can attack different parts of a computing problem simultaneously, posing challenges to traditional information security technology. A number of proposals have been made for breaking conventional cryptosystems by DNA computing, suggesting that public-key cryptosystems may be insecure.

4.1 Breaking DES

DES is a symmetric-key block cipher that uses a 56-bit key. The algorithm was initially controversial because of classified design elements, a relatively short key length, and suspicions about a National Security Agency backdoor. DES is now considered insecure for many applications, chiefly because the 56-bit key size is too small. Dan Boneh et al. estimated that it would take nearly 4 months to construct the initial DES solution, after which DES could be broken within a day [5]. They claimed that any symmetric system with a key shorter than 64 bits can be broken with this method. The procedure is as follows: first, encode the keys as appropriate binary strings and create an initial DNA solution that contains all possible keys; second, attach the known-plaintext strands and carry out the 16 rounds of encryption; lastly, find the solution by searching. Although the idea proposed by Boneh et al. is comparatively simple in theory, relying mainly on extraction, separation and pasting operations, these operations are not easy to carry out in real experiments, since the binary encoding is comparatively abstract.
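The generate-encrypt-search structure of the attack can be illustrated in silico. The Python sketch below uses a deliberately tiny, hypothetical 16-bit cipher with a 12-bit key, which is NOT DES; it only keeps the 16-round, known-plaintext structure. Where the DNA approach holds all keys in one solution and encrypts them in parallel, the sketch loops over the 2^12 keys one by one.

```python
def toy_encrypt(block: int, key: int, rounds: int = 16) -> int:
    """Hypothetical 16-bit toy cipher (NOT DES): each round XORs in a
    round-dependent key value and rotates the block left by 3 bits."""
    for r in range(rounds):
        block ^= (key + r) & 0xFFFF
        block = ((block << 3) | (block >> 13)) & 0xFFFF
    return block

def known_plaintext_search(plaintext: int, ciphertext: int, key_bits: int = 12) -> list[int]:
    """Step 1: generate every key; step 2: encrypt the known plaintext under
    each; step 3: keep the keys whose output matches the known ciphertext."""
    return [k for k in range(2 ** key_bits)
            if toy_encrypt(plaintext, k) == ciphertext]

secret = 0xABC
ct = toy_encrypt(0x1234, secret)
print(secret in known_plaintext_search(0x1234, ct))  # True
```

The search may return more than one candidate key for a single plaintext-ciphertext pair; a second known pair would narrow the list down.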

4.2 Breaking RSA

In cryptography, RSA is a public-key encryption algorithm. The security of the RSA
cryptosystem rests on two mathematical problems: the problem of factoring large numbers and the
RSA problem. Full decryption of an RSA ciphertext is believed to be infeasible on the assumption
that both problems are hard, i.e., that no efficient algorithm exists for solving them.
Weng-Long Chang et al. designed a DNA-based integer factorization algorithm that can be used to
break RSA. Building on Adleman's idea, Beaver et al. reduced the factorization of large numbers
to the Hamiltonian path problem (HPP) [31]. Analysing a 1000-bit RSA modulus, they concluded
that the graph would need at least 10^6 vertices, which by a conservative estimate means
10^200000 L of DNA solution. Obviously this is infeasible. For this reason, Winfree et al.
proposed computation by self-assembled tiles, since DNA tiles can be more easily "programmed"
to incorporate the constraints of a given problem. Brun showed, in theory, that tile assembly
systems can compute the sum and product of two numbers [32]. He found that addition and
multiplication can each be done using Θ(1) tiles and carried out in linear time in the tile
assembly model. He then combined those systems to create two new systems with more complex
behavior that factor numbers and solve subset-sum problems [33], [34]. Xuncai Zhang proposed a
scheme for subtraction by self-assembly of DNA tiles [35], which can be used to factor integers.
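The claim that addition needs only Θ(1) tiles and linear time can be illustrated with an abstract simulation. The sketch below is a hypothetical model, not Brun's actual tile set: each tile type is keyed by (bit of x, bit of y, incoming carry) and exposes a sum bit and an outgoing carry, so there are 8 tile types regardless of input length, and tiles attach one column at a time.

```python
from itertools import product

# One tile type per combination of (bit of x, bit of y, incoming carry).
# Each tile exposes a sum bit and an outgoing carry; the set has 8 types
# no matter how long the inputs are, i.e. Theta(1) tiles.
TILE_SET = {
    (a, b, c): ((a + b + c) % 2, (a + b + c) // 2)
    for a, b, c in product((0, 1), repeat=3)
}


def assemble_sum(x: int, y: int) -> int:
    """Simulate the assembly of an addition row, one tile per bit position."""
    width = max(x.bit_length(), y.bit_length()) + 1  # extra column for the final carry
    carry, total = 0, 0
    for i in range(width):  # tiles attach right to left, so the steps grow linearly
        a, b = (x >> i) & 1, (y >> i) & 1
        sum_bit, carry = TILE_SET[(a, b, carry)]
        total |= sum_bit << i
    return total
```

For instance, `assemble_sum(13, 29)` returns `42`. In a real tile assembly, the carry would be passed through matching glues between neighbouring tiles rather than through a Python variable.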



4.3 Breaking Number Theory Research Unit

Number Theory Research Unit (NTRU) is a novel, efficient public-key cryptosystem proposed by
Hoffstein et al., which is based on a new problem, the closest vector problem. Its security comes
from the interaction of the polynomial mixing system with the independence of reduction modulo
two relatively prime integers p and q. It has been shown that the NTRU cryptosystem can be
broken using self-assembly of DNA tilings; such an attack was described in Ref. [30]. The attack
involves computing a cyclic convolution product modulo some integer, and it is this part that
has been shown to be done efficiently with DNA Wang tiles, which effectively breaks the system.
The main disadvantage of the attack, however, is that it is not fully supported by physical
experiments: the Wang tiles it uses to form 3D self-assembly have not yet been made in practice.
Building on that work, Xuncai Zhang used existing tiles to carry out the attack on the NTRU
cryptosystem. The basic idea is to exploit the massive parallelism of DNA operations to emulate
a non-deterministic device that breaks the NTRU system in polynomial time. Such emulation can be
achieved by exponential-order parallelism [36].
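The operation at the heart of this attack, the cyclic convolution product modulo an integer, is multiplication in the ring Z_q[x]/(x^N - 1) used by NTRU, where x^N wraps around to 1. A minimal sketch, assuming polynomials are given as coefficient lists of equal length N (the function name is illustrative):

```python
def cyclic_convolution(f: list[int], g: list[int], q: int) -> list[int]:
    """Product of f and g in Z_q[x]/(x^N - 1): exponents wrap around because x^N = 1."""
    if len(f) != len(g):
        raise ValueError("polynomials must have the same length N")
    n = len(f)
    h = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = (i + j) % n  # x^i * x^j = x^((i + j) mod N)
            h[k] = (h[k] + fi * gj) % q
    return h
```

For example, `cyclic_convolution([0, 0, 1], [0, 1, 0], 7)` gives `[1, 0, 0]`, showing x^2 · x = x^3 = 1. The double loop performs N^2 independent multiply-accumulate steps, which is the kind of work the tile-based attack spreads across many tiles at once.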

4.4 Breaking International Data Encryption Algorithm

International Data Encryption Algorithm (IDEA) is a symmetric block cipher with a 128-bit key.
The security of IDEA rests on the confusion and diffusion that its operations apply to the data.
It has been adopted by Pretty Good Privacy (PGP). Researchers have done a great deal of work on
attacking IDEA, such as analysis with very-large-scale integrated circuits, various parallel
computing analyses and other attack schemes, but the security of IDEA has not been seriously
threatened. To demonstrate the feasibility of breaking symmetric block ciphers with DNA
computing, Xiutang Geng proposed a DNA algorithm based on a recursive splicing model to break
IDEA; the concrete implementation process is given in Ref. [37].
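IDEA obtains its confusion and diffusion by mixing three operations on 16-bit sub-blocks that come from incompatible algebraic groups: bitwise XOR, addition modulo 2^16, and multiplication modulo 2^16 + 1 (in which the all-zero sub-block stands for 2^16). The sketch below shows only these primitive operations, not the full IDEA round structure or key schedule; the function names are illustrative.

```python
MASK = 0xFFFF          # IDEA operates on 16-bit sub-blocks
MUL_MODULUS = 0x10001  # 2**16 + 1, which is prime


def idea_xor(a: int, b: int) -> int:
    return a ^ b


def idea_add(a: int, b: int) -> int:
    return (a + b) & MASK  # addition modulo 2**16


def idea_mul(a: int, b: int) -> int:
    # Multiplication modulo 2**16 + 1; the 16-bit value 0 stands for 2**16.
    a = a or 0x10000
    b = b or 0x10000
    return (a * b % MUL_MODULUS) & MASK  # a result of 2**16 is stored as 0
```

Because 2^16 + 1 is prime, every 16-bit value has a multiplicative inverse under `idea_mul`, which is what makes the operation invertible for decryption. For example, `idea_mul(0, 0)` returns `1`, since 2^16 ≡ -1 modulo 2^16 + 1.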

Although DNA computing is a brand-new computing paradigm, its corresponding theoretical model
cannot escape the limits set by Turing. In existing DNA computing models, the running time does
not grow markedly with the computational complexity of the problem, but only because time
complexity is converted into space complexity. Once the size of a problem exceeds the physical
limit on the amount of DNA that biochemical techniques can manipulate, DNA computing can no
longer reach a solution. So far, DNA-based attacks on traditional ciphers have not escaped this
exponential explosion. Even so, DNA computing has greatly improved our ability to break ciphers.
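The conversion of time into space can be made concrete with a rough mass estimate for an exhaustive key search that needs one strand per key. The sketch assumes one copy of each strand, a strand length of 1000 nucleotides, and an average single-stranded nucleotide mass of about 330 g/mol; all three are illustrative assumptions, and real experiments need many copies of each strand.

```python
AVOGADRO = 6.022e23           # molecules per mole
NUCLEOTIDE_G_PER_MOL = 330.0  # approximate mass of one single-stranded nucleotide (assumption)


def grams_of_dna(key_bits: int, strand_length_nt: int) -> float:
    """Mass of one copy of each strand in an exhaustive search over 2**key_bits keys."""
    strands = 2 ** key_bits
    grams_per_strand = strand_length_nt * NUCLEOTIDE_G_PER_MOL / AVOGADRO
    return strands * grams_per_strand
```

Under these assumptions, `grams_of_dna(56, 1000)` (a DES-sized key space) comes to roughly 0.04 g, whereas `grams_of_dna(128, 1000)` (an IDEA-sized key space) is about 1.9 × 10^20 g. Each extra key bit doubles the mass, which is the exponential explosion described above.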

5. AN ENCRYPTION SCHEME USING DNA TECHNOLOGY AND ITS WEAKNESS

In their research paper “An Encryption Scheme Using DNA Technology”, Guangzhao Cui et al.
[38] described the design of an encryption scheme whose security is based mainly on difficult
biological problems and difficult mathematical problems. They showed how to exchange messages
securely between two specific parties, conventionally called the sender Alice and the intended
receiver Bob. They defined the encryption scheme as follows:

Suppose there is a sender Alice who owns an encryption key KA, and an intended receiver Bob
who owns a decryption key KB (KA = KB or KA ≠ KB). Alice uses KA to translate a plaintext M into
ciphertext C by a translation (encryption) E. Bob uses KB to translate the ciphertext C into the
plaintext M by a translation (decryption) D.

The encryption process is:

$C = E_{K_A}(M)$

The decryption process is:

$D_{K_B}(C) = D_{K_B}(E_{K_A}(M)) = M$

It is difficult to obtain M from C without KB. Here KA, KB and C are not limited to digital
data; they can be any method, material or data, such as a DNA sequence. Likewise, E and D are
not limited to mathematical calculations; they can be any physical, chemical, biological or
mathematical process, such as a traditional encryption method. In that paper, an encryption
scheme based on DNA technologies was proposed that uses the traditional RSA cryptosystem to
preprocess the plaintext. The general encryption and decryption process is described in the
following sections.

5.1 DNA Digital Coding Technology

In information science, the most fundamental coding method is binary digital coding, in which
anything can be encoded using the two states 0 and 1 and combinations of them. A DNA sequence
contains four kinds of bases: adenine (A), thymine (T), cytosine (C) and guanine (G). The
simplest way to encode the 4 nucleotide bases (A, T, C, G) is with 4 digits: 0 (00), 1 (01),
2 (10) and 3 (11). Clearly there are 4! = 24 possible coding patterns in this format. In a
double-helix DNA molecule, the two strands are held together by complementary base pairing, A
with T and C with G, according to the Watson-Crick complementarity rule. So that the DNA digital
coding reflects this biological characteristic of the 4 nucleotide bases, the complementary rule
(~0) = 1 and (~1) = 0 is applied bitwise, which makes 0 (00) complementary to 3 (11) and 1 (01)
complementary to 2 (10). Among the 24 patterns, only 8 (0123/CTAG, 0123/CATG, 0123/GTAC,
0123/GATC, 0123/TCGA, 0123/TGCA, 0123/ACGT, 0123/AGCT), which are topologically identical,
satisfy the complementary rule of the nucleotide bases. The pattern that follows the order of
molecular weight (C < T < A < G), 0123/CTAG, is suggested as the best coding pattern for the
nucleotide bases: it reflects the biological characteristics of the 4 nucleotide bases well and
has a certain biological significance.
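The count of 8 valid patterns out of 24 can be checked by brute force. The sketch below enumerates every assignment of the bases to the digits 0, 1, 2, 3 and keeps those in which a digit d and its bitwise complement 3 - d map to Watson-Crick partners. Each pattern is written as the four bases assigned to 0123, matching the text's notation.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_patterns() -> list[str]:
    """Digit->base patterns (bases for digits 0,1,2,3) that respect Watson-Crick pairing."""
    result = []
    for bases in permutations("ACGT"):  # all 4! = 24 possible patterns
        # For a 2-bit digit d, its bitwise complement (~d & 0b11) equals 3 - d.
        if all(COMPLEMENT[bases[d]] == bases[3 - d] for d in range(4)):
            result.append("".join(bases))
    return result
```

Running `complementary_patterns()` yields exactly the eight patterns listed above, including `"CTAG"`.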

Binary digital coding of DNA sequences has the following advantages over character-based DNA
coding: (1) it decreases the redundancy of the information coding and improves coding efficiency
compared with traditional character DNA coding; (2) digitally coded DNA sequences are very
convenient for mathematical and logical operations, which may have a great impact on DNA
bio-computers; (3) DNA sequences preprocessed with DNA digital coding can take part in digital
computation and fit the existing computer-processing model, which facilitates direct conversion
between biological information and encrypted information in the cryptographic scheme; (4) with
DNA digital coding, traditional encryption methods such as DES or RSA can be used to preprocess
the plaintext in the cryptographic scheme.
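Advantages (2) and (3) can be illustrated with the 0123/CTAG pattern: converting between bit strings and DNA is a simple table lookup, and taking the base-wise Watson-Crick complement of a strand corresponds exactly to bitwise NOT of the encoded bits. This is a minimal sketch; the function names are illustrative, and the complement here is base-by-base (not the reverse complement of the opposite strand).

```python
CODE = {"00": "C", "01": "T", "10": "A", "11": "G"}  # the 0123/CTAG pattern
DECODE = {base: bits for bits, base in CODE.items()}
WATSON_CRICK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def bits_to_dna(bits: str) -> str:
    """Encode an even-length bit string, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(CODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bits(seq: str) -> str:
    """Decode a DNA sequence back into its bit string."""
    return "".join(DECODE[base] for base in seq)


def complement_strand(seq: str) -> str:
    """Base-by-base Watson-Crick complement."""
    return "".join(WATSON_CRICK[base] for base in seq)
```

For example, the byte `01000111` (the letter G) encodes to `"TCTG"`, and its complement `"AGAC"` decodes to `10111000`, the bitwise NOT of the original byte.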

5.2 Key Generation

The intended receiver Bob has a public-private key pair (e, d). The message-sender Alice
designs a 20-mer oligonucleotide as a forward primer for PCR amplification and transmits it to
Bob over a secure channel. Bob likewise designs a 20-mer oligonucleotide as a reverse primer for
PCR amplification and transmits it to Alice over a secure channel. Once the pair of PCR primers
has been designed and exchanged over a secure communication channel, the encryption key KA
consists of the primer pair and Bob’s public key e, and the decryption key KB consists of the
primer pair and Bob’s secret key d.
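The key generation step can be sketched as follows. This is a toy model under stated assumptions: primers are uniformly random 20-mers (real primer design also controls GC content, melting temperature and self-complementarity), and Bob's RSA key pair is built from the Mersenne primes 2^127 - 1 and 2^89 - 1, which is far too weak for real use. The `Key` class and function names are hypothetical.

```python
import secrets
from dataclasses import dataclass

PRIMER_LENGTH = 20  # 20-mer oligonucleotides, as in the scheme


@dataclass(frozen=True)
class Key:
    forward_primer: str
    reverse_primer: str
    rsa_exponent: int  # e for KA, d for KB
    rsa_modulus: int


def random_primer(length: int = PRIMER_LENGTH) -> str:
    # Uniformly random bases; this ignores real primer-design constraints.
    return "".join(secrets.choice("ACGT") for _ in range(length))


def toy_rsa_keypair() -> tuple[int, int, int]:
    p, q = 2**127 - 1, 2**89 - 1  # known Mersenne primes; toy choice only
    n, phi = p * q, (p - 1) * (q - 1)
    e = 65537
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+)
    return e, d, n


def generate_keys() -> tuple[Key, Key]:
    forward = random_primer()    # designed by Alice, sent to Bob over a secure channel
    reverse = random_primer()    # designed by Bob, sent to Alice over a secure channel
    e, d, n = toy_rsa_keypair()  # Bob's key pair
    return Key(forward, reverse, e, n), Key(forward, reverse, d, n)
```

Both keys share the same primer pair; they differ only in the RSA exponent, which mirrors the description of KA and KB above.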

5.3 Encryption Procedure

First, the sender Alice translates the plaintext M into hexadecimal code using the computer's
built-in character code (ASCII). The hexadecimal code is then translated into the binary
plaintext M’ using third-party software. Finally, Alice encrypts the binary plaintext M’ into
the binary ciphertext C’ using Bob’s public key e. This preprocessing is called “data
pre-treatment”; its flow chart is shown in Fig. 3.



Fig. 3 Data Pre(Post)treatment flowchart

Alice then translates the binary ciphertext C’ into a DNA sequence according to the DNA digital
coding technology described in section 5.1. After coding, Alice synthesizes the secret-message
DNA sequence, flanked by the forward and reverse PCR primers, each a 20-mer oligonucleotide.
This completes the secret-message DNA sequence. In the last step of encryption, Alice generates
a number of dummies and hides the secret-message DNA sequence among them. Each dummy must have
the same structure as the secret-message DNA sequence. In this scheme, the dummies are generated
by sonicating human DNA into fragments of roughly 60 to 160 nucleotides. Alice sends the DNA
mixture to Bob over an open communication channel.
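The assembly of the secret-message strand and the dummy mixture can be sketched as string operations. Two assumptions are made for illustration: the strand is written 5' to 3' as forward primer, encoded message, then the reverse complement of the reverse primer (the usual layout of a PCR template), and the sonicated human DNA is modelled as random sequences of 60 to 160 nucleotides. The function names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def build_message_strand(forward: str, payload: str, reverse: str) -> str:
    # 5'-[forward primer][encoded message][reverse complement of reverse primer]-3'
    return forward + payload + reverse_complement(reverse)


def make_mixture(message_strand: str, n_dummies: int, rng: random.Random) -> list[str]:
    # Dummies stand in for sonicated human DNA: random fragments of 60-160 nt.
    dummies = [
        "".join(rng.choice("ACGT") for _ in range(rng.randint(60, 160)))
        for _ in range(n_dummies)
    ]
    mixture = dummies + [message_strand]
    rng.shuffle(mixture)  # hide the message strand among the dummies
    return mixture
```

With two 20-mer primers and a 64-nucleotide encoded message, the secret-message strand is 104 nucleotides long, which falls inside the 60 to 160 nucleotide range of the dummies.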

5.4 Decryption Procedure

After the intended receiver Bob gets the DNA mixture, he can easily find the secret-message DNA
sequence. Because Bob obtained the correct pair of PCR primers through a secure channel, he can
amplify the secret-message DNA sequence by performing PCR on the DNA mixture. After
amplification, he retrieves the plaintext M sent by Alice by reversing the preprocessing with
his secret key d. This decryption process is therefore not only a mathematical computation but
also a biological process.
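The selective role of the primers can be modelled with an idealized PCR that keeps only strands carrying both primer sites and strips the primers off. This is a string-level abstraction, not a model of PCR chemistry: it assumes exact matching at the strand ends and the same 5'-forward, message, reverse-complement-of-reverse-3' layout as above. The function name `pcr_select` is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def pcr_select(mixture: list[str], forward: str, reverse: str) -> list[str]:
    """Idealized PCR: keep strands flanked by both primer sites, return their payloads."""
    tail = reverse_complement(reverse)  # site where the reverse primer anneals
    hits = []
    for strand in mixture:
        flanked = strand.startswith(forward) and strand.endswith(tail)
        if flanked and len(strand) >= len(forward) + len(tail):
            hits.append(strand[len(forward):len(strand) - len(tail)])
    return hits
```

With the correct primers only the encoded message is returned; with either primer wrong the result is empty, matching the claim that amplification succeeds only when both primers are correct.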



5.5 Details of the Scheme with Example

In the rest of this section, the details of this encryption scheme are discussed thoroughly with
the example shown in Fig. 4. The result of the PCR amplification is shown in Fig. 5.

Step 1: Key Generation

The message-sender Alice and the message-receiver Bob each design one PCR primer and exchange
them over a secure communication channel. The encryption and decryption keys include this pair
of PCR primers. In this scheme, the two PCR primers are not designed independently by the sender
or by the receiver alone; each party designs one of them, so key generation requires the
cooperation of both. This increases the security of the scheme: even if an adversary somehow
obtained one of the primers, amplification is not efficient when the other primer is incorrect.
Amplification succeeds only when both primer sequences are correct.

Step 2: Data Pre-treatment

Here “GENECRYPTOGRAPHY” (gene cryptography) is chosen as the plaintext M. It is first
converted into hexadecimal code using the built-in (ASCII) character code, giving:
“47 45 4E 45 43 52 59 50 54 4F 47 52 41 50 48 59”. This hexadecimal code is then translated
into the binary plaintext M’ using third-party software, giving:

01000111 01000101 01001110 01000101

01000011 01010010 01011001 01010000

01010100 01001111 01000111 01010010

01000001 01010000 01001000 01011001
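The pre-treatment above, and the post-treatment that reverses it, can be reproduced with Python's built-in formatting in place of the third-party software mentioned in the text. This is a minimal sketch assuming ASCII plaintext; the function names are illustrative.

```python
def pre_treatment(plaintext: str) -> tuple[str, str]:
    """Plaintext -> (hexadecimal code, binary plaintext M') using ASCII."""
    data = plaintext.encode("ascii")
    hex_code = " ".join(f"{byte:02X}" for byte in data)
    binary = " ".join(f"{byte:08b}" for byte in data)
    return hex_code, binary


def post_treatment(binary: str) -> str:
    """Binary plaintext M' (space-separated bytes) -> plaintext."""
    return bytes(int(chunk, 2) for chunk in binary.split()).decode("ascii")
```

Calling `pre_treatment("GENECRYPTOGRAPHY")` reproduces the hexadecimal string and the 16 bytes (128 bits) of M’ listed above, and `post_treatment` turns those bits back into the original text.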

Step 3: Encryption

Alice encrypts the binary plaintext M’ into the binary ciphertext C’ using Bob’s public key e.
She then converts C’ into a DNA sequence using the DNA digital coding technology. Finally, the
encoded message, 64 nucleotides long, is flanked by the forward and reverse PCR primers, which
completes the secret-message DNA sequence. After mixing it with a number of dummies, Alice sends
the DNA mixture to Bob over an open communication channel, such as DNA ink or a DNA book.

Step 4: Decryption

After the intended receiver Bob gets the DNA mixture, he can easily pick out the secret-message
DNA sequence using the correct primer pair. Bob translates the secret-message DNA sequence into
the binary ciphertext C’ using the DNA digital coding technology. He then decrypts the binary
ciphertext C’ into the binary plaintext M’ using his secret key d.

Step 5: Data Post-treatment

After the binary plaintext M’ has been recovered, Bob retrieves the plaintext M,
“GENECRYPTOGRAPHY”, from M’ by “data post-treatment”, which is simply the reverse of the
“data pre-treatment” process.
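Steps 2 to 5 can be chained into one toy round trip. The sketch assumes textbook RSA with the Mersenne primes 2^127 - 1 and 2^89 - 1 as Bob's key (far too weak for real use), the 0123/CTAG coding, and primer flanks in the layout forward primer, message, reverse complement of the reverse primer; dummies and the wet-lab PCR are omitted. Because this toy modulus is 216 bits long, the encoded ciphertext here is 108 nucleotides rather than the 64 quoted in the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
CODE = {"00": "C", "01": "T", "10": "A", "11": "G"}  # the 0123/CTAG pattern
DECODE = {base: bits for bits, base in CODE.items()}

# Toy RSA key pair for Bob (assumption): two known Mersenne primes.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse (Python 3.8+)
CIPHER_BITS = N.bit_length() + N.bit_length() % 2  # even, so bits pair up into bases


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def encrypt(plaintext: str, forward: str, reverse: str) -> str:
    """Steps 2-3: pre-treatment, RSA with Bob's public key e, DNA coding, primer flanks."""
    m = int.from_bytes(plaintext.encode("ascii"), "big")  # binary plaintext M'
    if m >= N:
        raise ValueError("message too long for the toy modulus")
    c_bits = format(pow(m, E, N), f"0{CIPHER_BITS}b")    # binary ciphertext C'
    payload = "".join(CODE[c_bits[i:i + 2]] for i in range(0, CIPHER_BITS, 2))
    return forward + payload + reverse_complement(reverse)


def decrypt(strand: str, forward: str, reverse: str) -> str:
    """Steps 4-5: strip primers, DNA decoding, RSA with Bob's secret key d, post-treatment."""
    tail = reverse_complement(reverse)
    if not (strand.startswith(forward) and strand.endswith(tail)):
        raise ValueError("strand is not flanked by these primers")
    payload = strand[len(forward):len(strand) - len(tail)]
    c = int("".join(DECODE[b] for b in payload), 2)
    m = pow(c, D, N)
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("ascii")
```

With any fixed pair of 20-mer primers `F` and `R`, `decrypt(encrypt("GENECRYPTOGRAPHY", F, R), F, R)` returns the original plaintext.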


Fig. 4 Flow chart of Encryption scheme system



Fig. 5 The result of the PCR amplification

5.6 Security of the Scheme

The security of this encryption scheme operates at two levels. The first level is biological
security, founded on the complexity of difficult biological problems. It is extremely difficult
to amplify the message-encoded sequence without knowing the correct pair of PCR primers. If an
adversary obtains the transmitted sample and tries to pick out the message-encoded sequence
without knowing the correct primers, he must choose two primer sequences from about 10^23 kinds
of sequences. To verify this expectation, PCR was run both with and without the correct forward
primer. Only when both primer sequences were correct could the plaintext M be retrieved by
amplification; when one of the primers was incorrect, amplification was not efficient.



In addition, an intercepted sample must be abandoned after a few experimental attempts because
biological operations contaminate it; unlike operations on digital data, they cannot be repeated
countless times. The intended message-receiver, however, can easily retrieve the secret-message
DNA sequence using his decryption key. This supports the claim that the secret-message DNA
sequence can be restored only by the intended message-receiver. The complexity of this difficult
biological problem is the main foundation for the security of the scheme. It is possible,
however, that the biological problem will be broken by advances in biological technology in the
future, or that an adversary will obtain the correct primer sequences from the two parties. If
this occurs, the first security level of the scheme is broken. This does not mean that the
scheme itself is broken: there is a second level of security, the difficult mathematical
problem underlying traditional RSA cryptography. Without the intended receiver’s secret key d,
an adversary still needs tremendous computation to break the scheme. The scheme therefore has
high confidentiality strength, because the difficult biological problem and the difficult
cryptographic computation provide a double safeguard.

5.7 Vulnerability of the Scheme to a Type of “Man in the Middle Attack”

Although the proposed encryption scheme promises a high level of security owing to the
complexity of both the biological problem and the mathematical problem involved, the forward
and reverse primers (required for PCR) must still be exchanged between the sender and the
receiver over a secure channel before the actual encryption takes place. No channel can be
certified as absolutely secure, because the meaning of security changes with technological
advances. Therefore, if an adversary somehow manages to compromise the so-called “secure”
channel, he can launch an attack of the same nature as the well-known “Man in the Middle
Attack” on the Diffie-Hellman key exchange in classical cryptography. The attack may proceed as
follows:

Step 1 : The adversary Derth, having gained access to the secure channel, captures the forward
primer (say FA) sent by Alice to Bob and sends Bob another forward primer (say FD) of his own
choice.

Step 2 : In the same way, Derth captures the reverse primer (say RB) sent by Bob to Alice and
sends Alice another reverse primer (say RD) of his own choice.

Step 3 : Now Alice has the primer set {FA, RD}, Bob has the primer set {FD, RB}, and Derth has
both sets, {FA, RD} and {FD, RB}, for use in the PCR process.

Step 4 : When Alice sends a DNA-encoded message, encrypted using {FA, RD}, over an insecure
(i.e. low-cost) channel, Derth easily intercepts it and uses the same primer pair, which he
holds, to decrypt it and obtain the original message.

Step 5 : Derth now encrypts the message again using the primer pair {FD, RB} and sends it to
Bob.

Step 6 : Bob, believing that the message has arrived from Alice, decrypts it using the primer
pair {FD, RB} and sends a DNA-encoded reply, which he encrypts using the same primer pair
{FD, RB}.

Step 7 : Derth again intercepts the encrypted message from Bob and decrypts it using {FD, RB}
to obtain the original message.

Step 8 : Derth now encrypts the message again using the primer pair {FA, RD} and sends it to
Alice.

Step 9 : Alice, unaware that the reply from Bob has been intercepted, decrypts it using the
primer pair {FA, RD}, and the conversation continues.

Derth thus compromises the conversation between Alice and Bob from the middle, using two sets
of primer pairs, one for Alice and one for Bob.
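The nine steps above can be traced in a small simulation of the primer layer only; the RSA layer of the scheme is deliberately left out, as in the steps themselves. "Encrypting" with a primer pair is modelled as adding primer flanks, and "decrypting" as an idealized PCR that returns `None` when the primers do not match. All names and sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def wrap(payload: str, primers: tuple[str, str]) -> str:
    forward, reverse = primers
    return forward + payload + reverse_complement(reverse)


def unwrap(strand: str, primers: tuple[str, str]) -> Optional[str]:
    """Idealized PCR recovery; None if the strand lacks these primer sites."""
    forward, reverse = primers
    tail = reverse_complement(reverse)
    if len(strand) >= len(forward) + len(tail) and strand.startswith(forward) and strand.endswith(tail):
        return strand[len(forward):len(strand) - len(tail)]
    return None


def run_attack(f_a: str, r_b: str, f_d: str, r_d: str, message: str, reply: str):
    alice_primers = (f_a, r_d)  # Steps 1-3: Alice ends up with Derth's reverse primer
    bob_primers = (f_d, r_b)    # ... and Bob with Derth's forward primer
    stolen = unwrap(wrap(message, alice_primers), alice_primers)       # Step 4
    bob_reads = unwrap(wrap(stolen, bob_primers), bob_primers)         # Steps 5-6
    stolen_reply = unwrap(wrap(reply, bob_primers), bob_primers)       # Steps 6-7
    alice_reads = unwrap(wrap(stolen_reply, alice_primers), alice_primers)  # Steps 8-9
    return stolen, bob_reads, stolen_reply, alice_reads
```

In this model every intercepted payload equals the original, and both Alice and Bob read exactly what they expect, so neither party notices the interception at the primer layer.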

1.10. CONCLUSION

In this report we have reviewed the technologies currently available in the DNA computing
research field. DNA computing is a brand-new research area that receives more and more attention
from both biologists and computer scientists. Several biological experiments have been performed
that demonstrate the feasibility of DNA computing. Because DNA operations are highly parallel,
the running time of the corresponding DNA algorithms scales well with the size of the problem,
although, as discussed in section 4, the amount of DNA required still grows exponentially. DNA
computing therefore shows potential advantages in solving hard problems. In conclusion, DNA
computing is one of the newest and most exciting areas for researchers to explore.

There are many opportunities to exploit and manipulate the characteristics and operations of DNA
to solve real applications, especially problems in industrial engineering and management
engineering. However, there are still obstacles to employing this method to its full extent.


REFERENCES

1. Adleman, L. M. Molecular computation of solutions to combinatorial problems. Science,
266(5187):1021–1024, 1994.

2. Zingel, T. Formal Models of DNA Computing: A Survey. Proc. Estonian Acad. Sci. Phys. Math.,
49(2):90–99, 2000.

3. Narayanan, A. and Zorbalas, S. DNA algorithm for computing shortest paths. Proc. of Genetic
Programming, Morgan Kaufmann, pp. 718–723, 1998.

4. Adams, J. C. On the application of DNA based computing. Available online at:
http://publish.uwo.ca/jadams/dnaapps1.htm, 14 Jan 2008.

5. Details of some recent advances in DNA computers are available from NASA's web site at:
http://www.jpl.nasa.gov/releases/2002/release_2002_63.html

6. “Nanotechnology will drive the evolution of the DNA molecule as functional components”,
Olympus TechnoZone, http://www.olympus.co.jp/en/magazine/TechZone/Vol54e/page5.html

7. Information about some of the DNA chip development work at the University of Houston can
be found at: http://www.uh.edu/admin/media/nr/102001/biochip.htm

8. Ryu, W. “DNA Computing: A Primer”, Ars Technica, 2002.
