
Ashwin Balu & Sunny Gupta

CISC879: Natural Computing
Queen’s University
Outline
Biology Overview
DNA Basics
Gene Expression and Systems Biology
DNA Reactions
DNA Computations
DNA Computing Applications
Current Trends
Open Problems and Limitations

Background
 “The use of biochemicals and biomolecular
operations to solve problems and to perform
computation”

 Questions:
◦ Can any algorithm be simulated by means of DNA
computing?
◦ Is it possible to design a programmable molecular
computer?
Motivation
 Unique features
◦ DNA used as data and code structures!
◦ Many ways of creating DNA computers

 Usefulness
◦ Massive parallelism
◦ Smaller “hardware” size
◦ High energy efficiency
◦ Smaller information storage
◦ Can solve problems standard computers can’t
Why Do We Need a DNA Computer?
Moore’s Law states that silicon microprocessors double in complexity roughly every two years.
One day this will no longer hold true, when the limits of miniaturisation are reached. Intel scientists say this will happen in about the year 2018.
A successor to silicon will be required.

The Beginning
 Francis Crick and James D. Watson, co-discoverers of the structure of the DNA molecule in 1953

Molecular Biology of the
Cell
 Cellular structures and processes result from a complex
interaction network of biological molecules
Carbohydrates
Lipids
Proteins

Mammals use only 20 different amino acids to make the immense variety of proteins they need.
Molecular Forces &
Bonding
What is DNA?
Source code to life
Instructions for building and
regulating cells
Data store for genetic inheritance
Think of enzymes as hardware,
DNA as software

DNA Molecule

DNA is a medium of information storage that uses 4 bases (A, T, G, C) to store information for all living cells.

It has contained and transmitted the data of life for
billions of years

DNA is organized into long structures called chromosomes
DNA






 DNA makes the building blocks for life



DNA - Data and Code

 DNA doesn't just make proteins; it also has instructions on how the system should behave
 Human and chimpanzee DNA is 98.5 percent identical
 DNA of humans and mice is only around 60 percent similar



Dense Information Storage

This image shows 1 gram of DNA on a CD. The CD can hold 800 MB of data.

The 1 gram of DNA can hold about 1 × 10^14 MB of data.
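As a rough check on this comparison, here is a minimal sketch that uses only the two figures quoted on the slide (800 MB per CD, about 1 × 10^14 MB per gram of DNA). The variable names are illustrative, and the figures themselves are taken as given rather than verified.

```python
# Figures as quoted on the slide (assumed, not independently verified).
CD_CAPACITY_MB = 800
DNA_CAPACITY_MB_PER_GRAM = 1e14

# Number of CDs whose combined capacity matches one gram of DNA.
cds_per_gram = DNA_CAPACITY_MB_PER_GRAM / CD_CAPACITY_MB
print(f"1 gram of DNA ~ {cds_per_gram:.3g} CDs")
```

On these numbers, one gram of DNA corresponds to roughly a hundred billion CDs.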
Types of DNA
Mitochondrial
Nuclear

 Nuclear and mitochondrial DNA are thought to be of separate evolutionary origin

 mtDNA is thought to be derived from the circular genomes of bacteria



Mitochondrial DNA
Nuclear chromosomes encode around 30,000 genes in 3 billion bases

The mitochondrial DNA genome is tiny, with only around 16,500 bases. However, this 16k of data is enough to encode several proteins and RNA molecules, in exactly 37 genes.
From Large to Small

10 μm 0.84 μm

2 nm 11 nm
DNA Molecules
 Important features
◦ 3 basic parts for each nucleotide: phosphate group, sugar, base
◦ Carbons of the sugar are numbered 1’ to 5’
◦ 5’: unattached phosphate group, 3’: unattached hydroxyl group

A T

G C
DNA Bases
 Important features
◦ Complementarity
◦ Purines vs. pyrimidines
◦ Hydrogen bonds
◦ Phosphodiester bonds
◦ Antiparallelism
◦ Natural direction

DNA Molecules
 Simplest representation
◦ 5’ – CGTGTTCGAAGCCC – 3’
◦ 3’ – GCACAAGCTTCGGG – 5’

 Important features
◦ Representation
◦ Complementarity
◦ Directionality

 Sticky ends
◦ 5’ – CGTGTTCGA – 3’
◦ 3’ – GCACA – 5’
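The complementarity and directionality rules above can be sketched in a few lines. This is a minimal illustration using the strands printed on the slide; the function names, including the `sticky_overhang` helper, are hypothetical.

```python
# Watson-Crick pairing: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Partner strand rewritten 5' -> 3' (antiparallel to the input)."""
    return complement(strand)[::-1]

def sticky_overhang(top_5to3: str, bottom_3to5: str) -> str:
    """Unpaired tail of the top strand when the bottom strand is shorter
    and both are aligned at the left end (hypothetical helper)."""
    return top_5to3[len(bottom_3to5):]

top = "CGTGTTCGAAGCCC"                       # 5' -> 3', from the slide
print(complement(top))                       # partner written 3' -> 5'
print(reverse_complement(top))               # same partner written 5' -> 3'
print(sticky_overhang("CGTGTTCGA", "GCACA"))  # single-stranded overhang
```

The first printed strand matches the lower strand on the slide; the overhang is the part of the sticky-end example that remains free to anneal.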
Manipulating DNA
 1) Denaturation (melting)









 2) Annealing (renaturation)

Manipulating DNA
 3) Polymerase extension

 5’ – TCGATT – 3’ (primer)
 3’ – AGCTAACTT – 5’ (template)

 5’ – TCGATTG – 3’
 3’ – AGCTAACTT – 5’

 5’ – TCGATTGA – 3’
 3’ – AGCTAACTT – 5’

 5’ – TCGATTGAA – 3’
 3’ – AGCTAACTT – 5’


Manipulating DNA
 4) Nuclease degradation

Exonuclease (removes nucleotides from the ends):
 5’ – TCGATTGAA – 3’
 3’ – AGCTAACTT – 5’

 5’ – TCGATTGA – 3’
 3’ – GCTAACTT – 5’

 5’ – TCGATTG – 3’
 3’ – CTAACTT – 5’

 5’ – TCGATT – 3’
 3’ – TAACTT – 5’

Endonuclease (cuts within the strand), sticky ends:
 5’ – TGAATTCCG – 3’
 3’ – ACTTAAGGC – 5’

 5’ – TG – 3’        5’ – AATTCCG – 3’
 3’ – ACTTAA – 5’        3’ – GGC – 5’

Endonuclease, blunt ends:
 5’ – TGCCCGGGA – 3’
 3’ – ACGGGCCCT – 5’

 5’ – TGCCC – 3’    5’ – GGGA – 3’
 3’ – ACGGG – 5’    3’ – CCCT – 5’
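The two endonuclease cuts on this slide can be reproduced with a small sketch. The recognition sites are read off the slide's examples; the enzyme names (EcoRI, SmaI) are the usual names for those sites and are an assumption, since the slide does not name them. Only the top strand is modelled.

```python
# Enzyme name -> (recognition site, cut offset inside the site, top strand).
# Names are assumed from the sites; the slide does not state them.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC, leaves sticky ends
    "SmaI": ("CCCGGG", 3),   # CCC^GGG, leaves blunt ends
}

def digest(top: str, enzyme: str) -> list[str]:
    """Cut the top strand (5' -> 3') at every occurrence of the site."""
    site, offset = ENZYMES[enzyme]
    fragments, start = [], 0
    pos = top.find(site)
    while pos != -1:
        fragments.append(top[start:pos + offset])
        start = pos + offset
        pos = top.find(site, pos + 1)
    fragments.append(top[start:])
    return fragments

print(digest("TGAATTCCG", "EcoRI"))
print(digest("TGCCCGGGA", "SmaI"))
```

The fragments correspond to the upper strands shown after each cut on the slide.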
Manipulating DNA
 5) Ligation

Before (a nick in each strand, with a 3’-OH end next to a 5’-P end):
            OH P
 5’ – TC GATTGAA – 3’
 3’ – AGCTAA CTT – 5’
            P OH

After (nicks sealed):
 5’ – TCGATTGAA – 3’
 3’ – AGCTAACTT – 5’


Manipulating DNA
 6) Amplification

1. Denaturation

2. Add primers

3. Annealing
Manipulating DNA
 6) Amplification (cont’d)

4. Polymerase extension
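The reason this cycle is called amplification can be shown with a short sketch. It assumes each round of denaturation, primer annealing and polymerase extension multiplies the number of copies by (1 + efficiency); ideal doubling (efficiency = 1.0) is an assumption, and the function name is illustrative.

```python
def pcr_copies(initial_templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of copies after `cycles` rounds of PCR.

    Each cycle multiplies the count by (1 + efficiency);
    efficiency = 1.0 means perfect doubling (an idealisation).
    """
    return initial_templates * (1 + efficiency) ** cycles

for cycles in (10, 20, 30):
    print(cycles, pcr_copies(1, cycles))
```

Under ideal doubling, about twenty cycles already turn a single template into roughly a million copies.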
Manipulating DNA
 7) Gel electrophoresis
Manipulating DNA
 8) Modify nucleotides – insert, delete, substitute

 9) Filtering – magnetic bead separation

 10) Synthesis of a single strand

 11) Sequencing
DNA manipulations:
If we want to use DNA as an information store, we must be able to manipulate it.
However, we are talking about handling molecules…
ENZYMES = natural CATALYSTS.
So instead of physical processes, we use natural ones, which are more effective:
◦ for lengthening: polymerases…
◦ for cutting: nucleases (exo/endo-nucleases)…
◦ for linking: ligases…
Serialization: 1985: Kary Mullis → PCR
 Thanks to this reaction we get millions of identical copies
Video
DNA Machine
Introduction to DNA
Computing
What is DNA computing?
◦ First idea around 1959 (precursor: Feynman)
Molecular level (just greater than 10^-9 meter)
Massive parallelism.
◦ In a liter of water, with only 5 grams of DNA we get around 10^21 bases!
◦ Each DNA strand represents a processor!
DNA Computing Begins...
 1970s
◦ Much speculation

 1994
◦ Leonard Max Adleman
◦ “Molecular Computation of Solutions to Combinatorial Problems”
◦ Used DNA computing to solve an NP-complete problem: the Hamiltonian Path Problem

“Biology and computer science - life and computation –
are related.”
Hamiltonian Path Problem
 Solution:
◦ 1. Generate random paths through the graph
◦ 2. Keep only those paths beginning with v_in and ending with v_out
◦ 3. If the graph has n vertices, keep only those paths with exactly n vertices
◦ 4. Keep only those paths that enter all vertices of the graph at least once
◦ 5. If any path remains, say YES; else, say NO
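The five steps above can be sketched in software as a generate-and-filter pipeline. This is only an in-silico analogue: in the DNA version all candidate paths form in parallel, whereas this sketch samples them one at a time. The edge list is a hypothetical seven-vertex graph chosen for illustration (the slide's figure does not survive in the text), and all names are assumptions.

```python
import random

# Hypothetical directed graph on vertices 0..6 (assumed for illustration).
EDGES = [(0, 1), (0, 3), (1, 2), (2, 3), (3, 2),
         (3, 4), (4, 5), (5, 6), (1, 6), (2, 1)]
N, V_IN, V_OUT = 7, 0, 6

def random_path(rng: random.Random, max_len: int = 12) -> tuple:
    """Step 1 analogue: chain randomly chosen compatible edges."""
    path = list(rng.choice(EDGES))
    while len(path) < max_len and rng.random() < 0.9:
        successors = [b for a, b in EDGES if a == path[-1]]
        if not successors:
            break
        path.append(rng.choice(successors))
    return tuple(path)

def hamiltonian_path_exists(trials: int = 20000, seed: int = 0):
    rng = random.Random(seed)
    pool = {random_path(rng) for _ in range(trials)}              # step 1
    pool = {p for p in pool if p[0] == V_IN and p[-1] == V_OUT}   # step 2
    pool = {p for p in pool if len(p) == N}                       # step 3
    pool = {p for p in pool if all(v in p for v in range(N))}     # step 4
    return bool(pool), pool                                       # step 5

found, survivors = hamiltonian_path_exists()
print("YES" if found else "NO", survivors)
```

Each filter only ever removes candidates, which mirrors how the laboratory procedure discards strands at every stage.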

[Figure: graph with vertices 0 to 6]
Hamiltonian Path Problem
 Instance of the HPP solved by Adleman

[Figure: the seven-vertex graph (vertices 0 to 6) solved by Adleman]
Adleman’s HPP Solution
 Adleman translated this solution step-by-step into molecular biology
 Encoded each vertex as a single-stranded oligonucleotide of length 20 with a randomized sequence
 Synthesized an oligonucleotide for each possible edge
 Connected edges by enzymatic ligation

 v1:  5’ – TGAATCCGACGTCCAGTGA – 3’
 v2:  5’ – ATGAACTATGGCACGCTATC – 3’
 e12: 3’ – GCAGGTCACT TACTTGATAC – 5’ (complementary to the end of v1 and the start of v2)
Adleman’s HPP Solution

The basic idea is to have a set
of molecules with unique
sequences representing the
vertices and edges of the graph
Adleman’s HPP Solution
 Let E_i be the oligonucleotide of vertex i
 Let Ē_i be the complement of E_i

 Using E_0 and Ē_6 as primers, amplify the product of Step 1 by PCR
 Only paths containing vertex 0 and vertex 6 remain
 Use a filtering operation to separate out strands starting at vertex 0 and ending with vertex 6
Adleman’s HPP Solution
 Separate the product of Step 2 by gel electrophoresis
 Identify DNA molecules with 7 vertices
◦ 7 vertices × 20 bases each = 140 bp
 Repeat cycles of PCR and gel electrophoresis to purify the product further

 Result: 7-vertex molecules that start with 0, end with 6
 Examples:
◦ 0, 1, 2, 3, 4, 5, 6
◦ 0, 3, 2, 3, 4, 5, 6
◦ 0, 1, 1, 1, 1, 1, 6
Adleman’s HPP Solution
 Probe single stranded DNA with complementary
oligonucleotides attached to magnetic beads
 Can pull sequences with specific vertices out
of the solution
 Use one step for each vertex



Adleman’s HPP Solution
 PCR the remnants after Step 4
 Analyze it by gel electrophoresis
 If anything exists, obtain YES – Hamiltonian
path found
 Else, obtain NO – no Hamiltonian path
available
Adleman’s HPP Solution

◦ WE JUST COMPUTED WITH DNA!!!
Thoughts About Adleman’s
Solution
 Practical details of experiment are not
relevant
 Experiment took 7 days of lab work
 However, distinctive advantage
◦ The number of oligonucleotides needed increases linearly with the number of vertices involved
◦ NP-complete in classical computing; O(n) laboratory steps in DNA computing

Thoughts About Adleman’s
Solution
 Adleman’s solution has weaknesses
◦ # of single strands necessary to encode vertices
and edges of generic HPP is of the order n!
◦ Drastic limitations on the size of problems that
can be solved by this procedure
◦ General strategy to work around this is to
diminish set of candidate solutions to be
generated

 Since HPP is NP-complete, Adleman’s DNA
technique can solve any NP problem
◦ But not necessarily in a feasible way

Thoughts About Adleman’s
Solution
 Brute force was used

 Speed of any computer determined by
◦ Parallelism
◦ Number of steps per unit time

                                             DNA              Classical
Operations (per second)                      10^14 - 10^20    10^6 - 10^12
Energy used (operations per joule)           2 × 10^19        10^9
                                             (theoretical maximum: 34 × 10^19)
Storage size of one bit (cubic nanometers)   1                10^12
SAT Problem
 Satisfiability problem for propositional formulae
 Logical variables E = {e_1, e_2, …, e_n}
 Clauses C_j = {e_{1j}, e_{2j}, …, e_{nj}} joined by AND, OR, NOT

 Problem:
◦ Given C_1 ∧ C_2 ∧ … ∧ C_m, assign a Boolean value to each variable such that the entire statement is TRUE
◦ NP-complete!
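The problem statement above can be made concrete with a brute-force sketch that first generates every truth assignment and then filters the pool one clause at a time. It assumes the formula is a conjunction of OR-clauses over possibly negated variables; the literal encoding and the function name are illustrative choices, not part of the slide.

```python
from itertools import product

# A clause is a list of literals. The literal (i, True) stands for
# variable e_i, and (i, False) for NOT e_i (encoding assumed here).
def satisfying_assignments(n: int, clauses: list) -> list:
    # Generate every truth assignment: 2**n candidates.
    pool = list(product([False, True], repeat=n))
    # Filter: after each clause, keep only assignments that satisfy it.
    for clause in clauses:
        pool = [a for a in pool if any(a[i] == want for i, want in clause)]
    return pool

# Small illustrative formula: (e0 OR e1) AND (NOT e0 OR NOT e1).
formula = [[(0, True), (1, True)], [(0, False), (1, False)]]
print(satisfying_assignments(2, formula))
```

The candidate pool grows as 2^n while the number of filtering passes grows only with the number of clauses, which is the trade-off that massive molecular parallelism tries to exploit.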

Lipton’s SAT Solution
 Can be represented as a graph search problem
 Two phases:
◦ 1) Generate all paths in the graph
◦ 2) Search (filter) for a truth assignment that satisfies the formula
◦ Basically, same principles as Adleman

 Assume a formula with n variables
 FALSE:  e_1^0   e_2^0   e_3^0   …   e_{n-1}^0   e_n^0
       v_0   v_1   v_2   …   v_{n-1}   v_n
 TRUE:   e_1^1   e_2^1   e_3^1   …   e_{n-1}^1   e_n^1
Lipton’s
Gene Expression
Used by all known life, eukaryotes, prokaryotes and viruses
Modulates the macromolecular machinery for life

Transcription, RNA splicing, translation, post-translational modification of proteins

Gene regulation gives the cell control over structure and function
Metaprogramming
Gene Expression
Transcription

Translation

Turing machines, invented by
Alan Turing in 1936, are
extremely simple computers
that consist of a finite-
state compute head that can
move back-and-forth on an
infinite one-dimensional
memory tape.
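The description above maps directly onto a very small simulator. This is a minimal sketch: the rule format, the function name and the bit-flipping example machine are all assumptions made for illustration.

```python
from collections import defaultdict

def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Run a one-tape machine.

    rules maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is -1 (left) or +1 (right). The machine halts when no
    rule applies to the current (state, symbol) pair.
    """
    cells = defaultdict(lambda: blank, enumerate(tape))  # unbounded tape
    head = 0
    for _ in range(max_steps):
        key = (state, cells[head])
        if key not in rules:
            break
        state, cells[head], move = rules[key]
        head += move
    lo, hi = min(cells), max(cells)
    return state, "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)

# Example machine: walk right and invert every bit until a blank is read.
flip_bits = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
}
print(run_turing_machine(flip_bits, "0110"))
```

The finite-state head is the `rules` table plus the current `state`; the tape is the dictionary of cells, which grows on demand in either direction.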
Systems Biology
Gene Regulation Network
Cytoscape Demo
Internet connection map

Asia Pacific - Red
Europe/Middle
East/Central
Asia/Africa - Green

North America -
Blue
Latin American and
Caribbean - Yellow
RFC1918 IP
Addresses - Cyan
Unknown - White
DNA Computer
 Given enough strands of DNA and certain biological operations, DNA can model a 1-tape nondeterministic Turing machine
 DNA compare to formulas
 DNA can work like a state machine

DNA Logic Gates
 DNA can work like a state
machine
 Catalytic DNA or  DNAzyme
 DNAzymes are used to build logic
gates
 DNAzymes are limited to 1-, 2-,
and 3-input gates

DNA Multiplication
DNA Multiplication

Restriction Enzyme Digests
DNA Multiplication
DNA Code Breaking
DNA Code Breaking
DNA Associative Memory
Content addressable memories
are useful in a number of
computer contexts and are
widely thought to be an
important component of human
intelligence

A content addressable memory is one where a stored word may be retrieved from sufficient partial knowledge of its content.
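Retrieval from partial content can be sketched as pattern matching over stored words. This is a software analogy only; the stored words, the `?` wildcard convention and the function name are assumptions made for illustration.

```python
def recall(memory: list, cue: str, wildcard: str = "?") -> list:
    """Return every stored word consistent with a partial cue.

    Positions holding the wildcard are unknown; all other positions
    must match the stored word exactly.
    """
    return [
        word for word in memory
        if len(word) == len(cue)
        and all(c == wildcard or c == w for c, w in zip(cue, word))
    ]

memory = ["ACGTAC", "ACGGAC", "TTGTAC"]
print(recall(memory, "AC?TAC"))   # enough of the content to single out one word
print(recall(memory, "??G?AC"))   # too little content: several words match
```

The second call shows why the partial knowledge has to be sufficient: a cue that is too sparse retrieves more than one stored word.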
DNA Associative Memory
Vessel containing DNA
To store a word, add an appropriate single-stranded DNA molecule encoding it

Stickers model:
Memory complex = strand of DNA (single or semi-double).
Stickers are segments of DNA composed of a certain number of DNA bases.
To use the stickers model correctly, each sticker must be able to anneal only at a specific place in the memory complex.
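One way to picture the stickers model is as a bit vector in which a region with a sticker annealed reads as 1 and a bare region reads as 0. The sketch below follows that reading; the class name, method names and the `split_on` helper are hypothetical and are not taken from the slides.

```python
class MemoryComplex:
    """A memory strand divided into fixed regions.

    Each sticker can anneal to exactly one region, so a region either
    carries its sticker (bit 1) or is bare (bit 0).
    """

    def __init__(self, n_regions: int):
        self.stuck = [False] * n_regions

    def attach(self, region: int) -> None:
        self.stuck[region] = True

    def remove(self, region: int) -> None:
        self.stuck[region] = False

    def bits(self) -> list:
        return [int(s) for s in self.stuck]

def split_on(tube: list, region: int) -> tuple:
    """Divide a tube of complexes by whether `region` carries a sticker."""
    on = [m for m in tube if m.stuck[region]]
    off = [m for m in tube if not m.stuck[region]]
    return on, off

m = MemoryComplex(10)
for region in (3, 5, 8):
    m.attach(region)
print(m.bits())
```

Because every sticker has a single landing site, setting or clearing one bit never disturbs the others, which is what the uniqueness requirement above is meant to guarantee.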

To visualize:
[Figure: a semi-double memory complex encoding the bits 0 0 0 1 0 1 0 0 1 0, a soup of stickers, and a zoomed view of one sticker as the base sequence A G C A T G A T]
DNA Associative Memory
What about a stickers machine?
Simple operations: merge, select, detect, clean.
 Tubes are used (cylinders with two openings)
However, for a single computation (DES):
◦ A great number of tubes is needed (1000).
◦ A huge amount of DNA is needed as well.
Practically no such machine has
Technological Developments

US team shows that DNA
computing can be simplified
by attaching the molecules
to a surface.

DNA molecules were applied
to a small glass plate
overlaid with gold.

Exposure to certain enzymes destroyed the molecules with wrong answers, leaving only the DNA with the right answers.
DNA Limitations
DNA denaturing (temperature, time, pH)
Length of the DNA strand = size of the problem
While the number of strands could be exponential, 10^21 is the upper bound (volume issues)
DNA algorithms need to be more noise tolerant

Making DNA Computers Error
Resistant
DNA computing is not error free

DNA calculations fall into 3 basic classes
1. Decreasing volume (# of strands is reduced with each step)
2. Constant volume (# of strands is constant throughout all steps)
3. Mixed algorithms
Making DNA Computers Error
Resistant

Adleman and Lipton are even more special. Each strand is “good” or “bad”
• Good strands encode a solution
• Bad strands do not
• If a good strand is damaged or lost, the algorithm fails
• If a bad strand is not removed and many are left at end then
Making DNA Computers Error
Resistant

Two sources of errors
1. Every operation can cause an error (e.g. extraction)
◦ Extraction is not perfect: usually about 95% of the strands that match the desired pattern are extracted
◦ In addition, strands that do not match will sometimes be removed anyway
◦ Rates typically 1 part in 10^6
2. DNA has a half-life and decays at a finite rate. If an algorithm takes months
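To see how quickly an imperfect extraction step compounds, here is a minimal sketch. It reads the 95% figure as the chance that a good strand is correctly retained in one extraction and treats successive extractions as independent; both are simplifying assumptions, and the function name is illustrative.

```python
def good_strand_survival(p_keep: float, steps: int) -> float:
    """Chance that a single good strand survives `steps` extractions,
    each retaining it with probability p_keep (steps assumed independent)."""
    return p_keep ** steps

for steps in (10, 50, 100):
    print(steps, round(good_strand_survival(0.95, steps), 4))
```

Under these assumptions a single good strand is more likely lost than kept after a few dozen extractions, which is why algorithms start with many copies of every strand.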
Some hurdles:
Operations done manually in the
lab.
Natural tools are what they are…
→ Formation of a library (statistical process)
→ Operational problems

Molecular Computing
 Wayne State University’s Michael Conrad has defined
his vision of a molecular computer in which proteins
integrate multiple input modes to perform a
functional output (Conrad, 1986). In addition to
smaller size scale, protein based molecular
computing offers different architectures and
computing dimensions. Conrad suggests that “non-
von Neumann, nonserial and non-silicon” computers
will be “context dependent,” with input processed as
dynamical physical structures, patterns, or analog
symbols. Multidimensional conditions determine the
conformational state of any one protein:
temperature, pH, ionic concentrations, voltage,
dipole moment, electroacoustical vibration,
phosphorylation or hydrolysis state, conformational
state of bound neighbor proteins, etc. Proteins
integrate all this information to determine output.
Thus each protein is a rudimentary computer and
Molecular Computing
Cells and organisms are natural molecular computers

Allowing proteins to fold producing computation



Molecular Computing
 Allowing proteins to fold producing
computation


Protein Folding
 Mainly guided by:
Hydrophobic interactions
Intramolecular hydrogen bonds
Van der Waals forces

Protein Folding
 Folding is a free energy minimization process that depends on the interactions among amino acids
 Proteins can change on timescales as short as femtoseconds (10^-15 sec)

Folding Proteins
 All proteins begin to fold into three-dimensional structures after synthesis
 These structures give proteins their functionality (lock-and-key receptors)
 Folding is a free energy minimization process

Protein Folding Problem
 Considered to be an NP-complete problem

 Attempts to derive solutions by brute force on massively parallel computers have failed

 Molecular pathway too complex

 Genetic algorithms do better but cannot guarantee polynomial time. Fitness relies on structure, and since the structure is not known you have the “termination problem” in GA
Protein Folding Problem
 Protein with mere 25 amino acids requires 94,502
years to solve
 No way of knowing if a GA terminated with an optimal
solution
Protein based computing
Different architectures and computing dimensions

 Non-von Neumann, non-serial and non-silicon
 Context dependent
 Input processed as dynamical physical
structures, patterns, or analog symbols
 Multidimensional conditions
 Temperature, pH, ionic concentrations, voltage,
dipole moment, electroacoustical vibration,
phosphorylation or hydrolysis state,
conformational state of bound neighbor
proteins, etc.
 Proteins integrate all this
Protein Folding Matrix
Computer
 Use Rose scale matrix
 Let the protein folding solve large matrix
problems
Folding Proteins
Applications
 It is possible to generate a vast combinatorial variety of different protein shapes just by changing the DNA base sequence

 Encrypting data (lock and key)
 Decrypting data
 Encryption breaking
 Pattern Recognition

DNA COMPUTER vs SILICON COMPUTER

Feature           DNA COMPUTER    SILICON COMPUTER
Miniaturization   Unlimited       Limited
Processing        Parallel        Sequential
Speed             Very fast       Slower
Cost              Cheaper         Costly
Materials used    Non-toxic       Toxic
Size              Very small      Large
Data capacity     Very large      Smaller
Advantages
 Perform millions of operations simultaneously
 Conduct large-scale parallel processing
 Massive amounts of working memory
 Generate and use own energy source via the input
 Four storage symbols: A, T, G, C
 Miniaturization of data storage
Limitations
DNA computing involves a relatively
large amount of error

Requires human assistance!

Time consuming laboratory
procedures.

No universal method of data
representation.
Slides to go
I'm putting together a few slides
on associative memory,
cryptographic problems, DNA
based addition and matrix
multiplication, parallel machines
and DNA computer limitations.
