Submitted By
ABHISHEK YOGIRAJ SHIVGAN
(1811001)
Course Guide
Shri. S. C. Kulkarni
DEPARTMENT OF
ELECTRONICS AND TELECOMMUNICATION ENGINEERING
GOVERNMENT COLLEGE OF ENGINEERING, JALGAON 425002
DEC 2021
GOVERNMENT COLLEGE OF ENGINEERING, JALGAON 425002
(An Autonomous Institute of Government of Maharashtra and affiliated to Kavayitri
Bahinabai Chaudhari North Maharashtra University, Jalgaon)
Department of Electronics and Telecommunication Engineering
CERTIFICATE
This is to certify that the ET407U SEMINAR REPORT on DNA COMPUTING, which is being
submitted herewith in partial fulfilment of the Bachelor of Technology, was completed by ABHISHEK
YOGIRAJ SHIVGAN under my supervision and guidance. In line with the declaration of the student, the
work embodied in this Seminar Report has been carried out by him to the best of my knowledge and belief.
(Dr. G. M. Malwatkar)
Principal
DECLARATION
I hereby declare that the ET407U SEMINAR REPORT on DNA COMPUTING was performed and
written by me under the guidance of Shri. S. C. Kulkarni at Government College of Engineering,
Jalgaon. This work has not previously formed the basis for the award of any degree, diploma
or certificate, nor has it been submitted elsewhere for the award of any degree or diploma.
Place: Jalgaon
Date:
ABHISHEK YOGIRAJ SHIVGAN
PRN: 1811001
Final Year B. Tech E&Tc
ACKNOWLEDGMENT
It is indeed a great pleasure and proud privilege for us to present this report. First and
foremost, we are thankful to the principal of our college, Dr. G. M. Malwatkar, for having taken
interest in all activities related to studies. We would like to express sincere gratitude towards our
HoD Dr. D. S. Chaudhari and guide Shri. S. C. Kulkarni Sir for being the guiding force behind all
our efforts and for their assistance during the seminar. We also express our sincere thanks to the
staff of the Electronics and Telecommunication Engineering Department for their co-operation and
suggestions during this mini project and report preparation.
ABSTRACT
DNA computing is an area of natural computing based on the idea that molecular
biology processes can be used to perform arithmetic and logic operations on
information encoded as DNA strands. The first part of this review outlines basic
molecular biology notions necessary for understanding DNA computing, recounts the
first experimental demonstration of DNA computing (Adleman’s 7-vertex
Hamiltonian Path Problem), and recounts the milestone wet laboratory experiment
that first demonstrated the potential of DNA computing to outperform the
computational ability of an unaided human (20 variable instance of 3-SAT).
The second part of the review describes how the properties of DNA-based
information, and in particular the Watson–Crick complementarity of DNA single
strands, have influenced areas of theoretical computer science such as formal
language theory, coding theory, automata theory and combinatorics on words. More
precisely, we describe the problem of DNA encodings design, present an analysis of
intramolecular bonds, define and characterize languages that avoid certain
undesirable intermolecular bonds, and investigate languages whose words avoid even
imperfect bindings between their constituent strands. We also present another,
vectorial, representation of DNA strands, and two computational models based on
this representation: sticker systems and Watson–Crick automata. Lastly, we describe
the influence that properties of DNA-based information have had on research in
combinatorics on words, by enumerating several natural generalizations of classical
concepts of combinatorics on words: pseudopalindromes, pseudoperiodicity,
Watson–Crick conjugate and commutative words, involutively bordered words, and
pseudoknot-bordered words. In addition, we outline natural extensions in this context
of two of the most fundamental results in combinatorics on words, namely Fine and
Wilf's theorem and the Lyndon–Schützenberger result.
Chapter 1
INTRODUCTION
Every cell of a living organism carries the information for the various
functions necessary for the survival of the cell. This genetic information in each cell
is stored in molecules called nucleic acids. The most stable form of nucleic acid is
called deoxyribonucleic acid (DNA). DNA strands are long polymers of millions of
linked nucleotides, and two such strands wind together into a helical structure. Each
nucleotide consists of one of four nitrogen bases, a five-carbon sugar, and a phosphate
group. The nitrogen bases - A (Adenine), T (Thymine), G (Guanine), C (Cytosine) -
encode the genetic information, while the sugar and phosphate group provide
structural stability. The two strands are linked to each other by the base-pairing
rule: T pairs with A and C pairs with G. The arrangement of these bases is important
as it decides the functionality of different genes.
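The base-pairing rule can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for this illustration. The reverse_complement helper additionally assumes the standard fact that the two strands of a double helix run in opposite directions, which the text above does not state.

```python
# Base-pairing rule from the text: T pairs with A, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a strand (hypothetical helper)."""
    return "".join(PAIRING[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction.

    Assumption: the two strands are antiparallel, so the partner
    is the complement read backwards.
    """
    return complement(strand)[::-1]


example = "ATGC"
partner = complement(example)
partner_reversed = reverse_complement(example)
```

For the strand ATGC, the complement is TACG and the reverse complement is GCAT.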
When talking about deoxyribonucleic acid -- DNA, the molecule that carries the
genetic information of life -- scientists often make comparisons to computer systems,
with DNA being an enormous "program" to be run by the body's hardware. But
significant differences exist between the genetic code of DNA and the binary code
used by computers, and each system has its advantages and limitations.
Counting Digits
The simplest unit of binary code is the binary digit, or "bit," which can have one of
two values: 0 or 1. The simplest unit of DNA, on the other hand, is the nucleotide,
which can have one of four bases: adenine, cytosine, thymine or guanine (A, C, T or
G). Four possible values correspond to two bits, so each nucleotide of DNA can hold
twice as much information as each digit of a binary program.
Byte Sizes
Computers and biological systems both read their respective codes in blocks of
several units instead of analysing each bit or nucleotide individually. Binary
information is grouped into sets of eight bits, called bytes; each byte thus has one of
256 possible configurations of zeros and ones. Genetic information instead comes in
triplets of nucleotides known as codons, which represent different amino acids,
meaning that each DNA "byte" has only 64 possibilities.
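The figures in the two comparisons above can be reproduced with a short Python sketch. It assumes every symbol of an alphabet is equally likely, so the information per symbol is the base-2 logarithm of the alphabet size. The function names are hypothetical.

```python
import math


def bits_per_symbol(alphabet_size: int) -> float:
    """Information carried by one symbol, assuming all symbols are equally likely."""
    return math.log2(alphabet_size)


def configurations(alphabet_size: int, block_length: int) -> int:
    """Number of distinct blocks of the given length."""
    return alphabet_size ** block_length


bit_info = bits_per_symbol(2)          # one binary digit
nucleotide_info = bits_per_symbol(4)   # one nucleotide (A, C, T or G)
byte_configs = configurations(2, 8)    # eight bits per byte
codon_configs = configurations(4, 3)   # three nucleotides per codon
```

This gives 1 bit per binary digit, 2 bits per nucleotide, 256 configurations per byte, and 64 configurations per codon.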
Protecting Data
In digital code, a single inaccurate bit causes its byte to have a different value, which
can introduce significant errors to a computer program. DNA is considerably more
resilient in comparison, as many nucleotide changes do not change the amino acid
coded by a codon. Although 64 codons are possible,
biological machinery uses only 20 amino acids in the construction of proteins. Many
codons that differ by one nucleotide therefore code for the same amino acid, a
property known as redundancy. Redundancy protects genetic data from some
inevitable errors that occur in the replication and reading of DNA.
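The redundancy described above can be checked against the standard genetic code. The sketch below is an illustration only: it assumes the standard codon table (written in the usual compact form with bases ordered T, C, A, G and "*" marking a stop codon), and the helper name synonymous_fraction is hypothetical.

```python
from itertools import product

# Standard genetic code in compact form (assumption: standard table,
# bases ordered T, C, A, G; "*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def synonymous_fraction() -> float:
    """Fraction of single-nucleotide changes that leave the codon's meaning unchanged."""
    same = 0
    total = 0
    for codon, amino in CODON_TABLE.items():
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue  # not a change
                mutant = codon[:position] + base + codon[position + 1:]
                total += 1
                if CODON_TABLE[mutant] == amino:
                    same += 1
    return same / total


amino_acids = set(CODON_TABLE.values()) - {"*"}
stop_codons = [codon for codon, amino in CODON_TABLE.items() if amino == "*"]
fraction = synonymous_fraction()
```

Under this assumption the table contains 64 codons, 20 amino acids and 3 stop codons, and the fraction returned is the share of single-nucleotide changes that are silent.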
One might suggest that the genetic information is equally carried by the amino acids
produced by the codons. (This still assumes that “junk” DNA also carries exactly that
information). There are 21 possible results from each codon. The one “start” codon
encodes one amino acid; 60 different codons encode another 19 amino acids; and
three codons encode “stop”. The 3 billion base pairs would be grouped into 1 billion
codons, and each codon has 21 possible meanings. So that would be 21^(1 billion)
sequences of amino acids.
We need to convert 21^(1 billion) to a power of two, since all the other information
results are in bits. The conversion factor is ln(21)/ln(2), where “ln” is the natural
logarithm function. We have ln(21)/ln(2) = 3.0445/0.6931 = 4.3923 (rounded), so
(1 billion) * 4.3923 = 4,392,300,000 bits of information are needed to code the amino
acids. That gives a total of 4,392,322,500 bits including the epigenome. At 7 bits per
ASCII character, that would be 627,474,642 bytes, or about 627 MB (megabytes).
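The conversion above can be checked with a few lines of Python. This is a sketch under the text's own assumptions: 1 billion codons, 21 meanings per codon, and 7 bits per ASCII byte. The 22,500 bits for the epigenome is not derived here; it is simply the difference between the two totals quoted in the text. The variable names are hypothetical.

```python
import math

CODONS = 1_000_000_000        # 3 billion base pairs grouped in threes
MEANINGS_PER_CODON = 21       # 20 amino acids plus "stop"
EPIGENOME_BITS = 22_500       # assumption: difference between the two quoted totals
BITS_PER_ASCII_BYTE = 7       # assumption taken from the text's ASCII comparison

# log2(21) equals ln(21)/ln(2), the conversion factor used in the text.
bits_per_codon = math.log2(MEANINGS_PER_CODON)
amino_acid_bits = CODONS * bits_per_codon
total_bits = amino_acid_bits + EPIGENOME_BITS
total_megabytes = total_bits / BITS_PER_ASCII_BYTE / 1_000_000
```

The unrounded factor gives a slightly larger bit count than the rounded 4.3923 used in the text, but the result is still about 627 MB.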
Comparing the Genetic Code to Computer Data Storage
Let’s conclude by comparing computer data storage to the genetic code for DNA.
Computers store data in two-valued bits, grouped as bytes of 7 or more bits (7 for
ASCII). One 7-bit byte holds 2^7=128 unique values.
DNA stores data in four-valued base pairs, which are transcribed to RNA and read as
codons of 3 bases. One codon holds 4^3=2^6=64 unique values. A sequence of base pairs
that conveys biological information is called a gene. DNA includes extra information to
express or suppress specific genes. Each gene has at least one bit of information for
expression or suppression.
Computer files may be measured in megabytes or gigabytes: millions or billions of
bytes. One CD-ROM disc may store about 710 MB. Modern solid-state memory and
disk drives can store gigabytes. If we can fully specify one human’s DNA by
giving the full sequence of base pairs, plus a binary flag to express or suppress
each gene, then human DNA contains about 6 Gb of information (3 billion base pairs
at 2 bits each), or about 857 MB at 7 bits per byte.
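As a rough check of this estimate, the following sketch assumes 3 billion base pairs, 2 bits per base pair, and the 7-bit ASCII byte used earlier. The per-gene flags are left out because they are negligible at this scale. The variable names are hypothetical.

```python
BASE_PAIRS = 3_000_000_000    # figure quoted earlier in the text
BITS_PER_BASE_PAIR = 2        # four possible values per base pair
BITS_PER_ASCII_BYTE = 7       # assumption: 7-bit ASCII bytes, as in the text

sequence_bits = BASE_PAIRS * BITS_PER_BASE_PAIR
sequence_megabytes = sequence_bits / BITS_PER_ASCII_BYTE / 1_000_000
```

This gives 6 billion bits, or roughly 857 MB, which is somewhat more than the 710 MB of one CD-ROM disc.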
Conclusion
A lot of work and resources are still required to develop DNA computing into a
fully fledged product.