
Technical Seminar On

Prepared and Presented by

Neeraj Chowdhary
15B81A04N9
WHAT IS DNA DIGITAL DATA STORAGE?

• DNA digital data storage is defined as the process of encoding and decoding binary data to and from synthesized DNA strands.

• Uses artificial DNA made using commercially available oligonucleotide synthesis.

Figure 1: An artistic rendering of DNA storage


WHY IS IT NEEDED?

• New-generation computers and high-speed internet have gained popularity in recent years.

• But when it comes to handling big data, whether the data of a corporation or of the world as a whole, present storage technology comes nowhere near managing it efficiently.

• Current storage technology cannot keep up with our needs and also generates a lot of e-waste.

• Current storage media also cannot hold information for long periods of time.


Figure 2: Rise in E-waste
A COMPARISON

This picture shows why DNA is the obvious option for the future!

From Magnetic To Genetic

Figure 3: A Comparison of Technologies


STRUCTURE OF DNA

• DNA is built from four nucleobases: Adenine (A), Guanine (G), Cytosine (C) and Thymine (T).
• The bases pair into nucleotide base pairs A-T and G-C.
• The backbone of the DNA strand is made from alternating phosphate and sugar residues.
• A single nucleotide can represent 2 bits of information.

Figure 4: Structure of DNA
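The pairing rule and the 2-bit figure can be checked with a few lines of Python. This is only an illustrative sketch; the names `COMPLEMENT` and `complement` are made up for this example and are not part of the seminar material.

```python
import math

# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base complementary strand."""
    return "".join(COMPLEMENT[base] for base in strand)

# Four possible bases per position means log2(4) = 2 bits per nucleotide.
bits_per_nucleotide = math.log2(len(COMPLEMENT))

print(complement("ATGC"))      # TACG
print(bits_per_nucleotide)     # 2.0
```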
HOW DOES DNA WORK AS A STORAGE TECHNOLOGY?

• Source data in the form of binary bits (0 and 1) is converted to a ternary code (0, 1 and 2) to decrease the chance of encoding errors.

• Following the conversion, the digital data is encoded into the nucleobases of DNA.

• By choosing the order of the nucleobases A, T, G and C, the ternary code can be mapped onto nucleobase codes, producing blocks of nucleobases that encode the data.

• The encoded DNA can then be sequenced and read back to ternary and then to binary data using technologies similar to those used to map the human genome.
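The slides do not spell out the exact mapping from ternary digits to bases. One way a ternary code can reduce errors is to let each digit pick one of the three bases that differ from the previous base, so the same base never appears twice in a row. The sketch below assumes that idea; the table `NEXT_BASE`, the fixed width of six ternary digits per byte, and the starting base "A" are all illustrative assumptions, not the method used in any particular paper.

```python
# Assumed rotating table: previous base -> base chosen for trit 0, 1, 2.
# No row contains its own key, so a base is never repeated back to back.
NEXT_BASE = {"A": "CGT", "C": "GTA", "G": "TAC", "T": "ACG"}
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six trits cover one byte

def byte_to_trits(value: int) -> list:
    """Write one byte as six base-3 digits, most significant first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]

def encode(data: bytes, start: str = "A") -> str:
    """Binary -> ternary -> DNA, using the rotating table."""
    prev, out = start, []
    for byte in data:
        for trit in byte_to_trits(byte):
            prev = NEXT_BASE[prev][trit]
            out.append(prev)
    return "".join(out)

def decode(dna: str, start: str = "A") -> bytes:
    """DNA -> ternary -> binary, reversing encode()."""
    prev, trits = start, []
    for base in dna:
        trits.append(NEXT_BASE[prev].index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)

dna = encode(b"VVIT")
print(dna, decode(dna))
```

A production scheme would also add addressing and error-correction information to each strand; this sketch covers only the binary-to-ternary-to-DNA round trip.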
A PRACTICAL EXAMPLE

Figure 5: A View of the Entire Process
EXAMPLE

• Let's take “VVIT” as a sample string.
• First, represent each letter by its ASCII code.
• From the ASCII table: V = 86, V = 86, I = 73, T = 84
• Then convert to quaternary (base-4) numbers: 86 = 1112, 86 = 1112, 73 = 1021, 84 = 1110
• Use A, T, C and G to represent the digits:
• 0 = A, 1 = T, 2 = C, 3 = G
• Quaternary: 1112111210211110
• DNA: TTTCTTTCTACTTTTA
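The worked example can be reproduced in a few lines of Python. This is a minimal sketch that assumes every character fits in four quaternary digits (character codes below 256); the function names are invented for this illustration.

```python
DIGIT_TO_BASE = "ATCG"  # index is the quaternary digit: 0=A, 1=T, 2=C, 3=G
DIGITS_PER_CHAR = 4     # 4**4 = 256, enough for any 8-bit character code

def char_to_quaternary(ch: str) -> str:
    """Write a character's code as four base-4 digits, e.g. 'V' (86) -> '1112'."""
    code, digits = ord(ch), ""
    for _ in range(DIGITS_PER_CHAR):
        digits = str(code % 4) + digits
        code //= 4
    return digits

def text_to_dna(text: str) -> str:
    """Text -> quaternary digits -> nucleotide letters."""
    quaternary = "".join(char_to_quaternary(ch) for ch in text)
    return "".join(DIGIT_TO_BASE[int(d)] for d in quaternary)

def dna_to_text(dna: str) -> str:
    """Reverse of text_to_dna(): nucleotide letters -> digits -> text."""
    digits = [DIGIT_TO_BASE.index(base) for base in dna]
    chars = []
    for i in range(0, len(digits), DIGITS_PER_CHAR):
        code = 0
        for d in digits[i:i + DIGITS_PER_CHAR]:
            code = code * 4 + d
        chars.append(chr(code))
    return "".join(chars)

print(text_to_dna("VVIT"))              # TTTCTTTCTACTTTTA
print(dna_to_text("TTTCTTTCTACTTTTA"))  # VVIT
```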
WHY DNA?

• A mere milligram of the molecule could encode the complete text of every book in the Library of Congress.
• Very high data density.
• More compact than current magnetic tape or hard drive storage.
• As the graph below shows, the cost of producing a genome is going down.
• Hence the future looks bright.

Figure 6: Graph showing cost to manufacture one genome
DISADVANTAGES

• Cost: The cost of generating raw, unassembled sequence (read) data is high. Synthesizing artificial sequences (writing) is costlier still.

• Speed: Speed is low. The fastest current technology can sequence (read) DNA on the order of 1 billion bases per hour. Synthesis (write) is even slower and more expensive. This is extremely slow compared to modern storage media but would be acceptable for long-term data storage.

• Rewriting: This is essentially a write-once technology, but static data such as government and historical records could benefit from this storage option.
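To put the read speed in familiar units, the quoted sequencing rate can be converted to bytes per second. The sketch below assumes the ideal 2 bits per base and ignores redundancy, repeated reads and error correction, so it is an optimistic upper bound rather than a measured figure.

```python
BASES_PER_HOUR = 1_000_000_000  # sequencing rate quoted above
BITS_PER_BASE = 2               # assumption: ideal 2 bits per nucleotide

def read_rate_bytes_per_second(bases_per_hour: float = BASES_PER_HOUR,
                               bits_per_base: float = BITS_PER_BASE) -> float:
    """Convert a sequencing rate in bases/hour to bytes/second."""
    bits_per_hour = bases_per_hour * bits_per_base
    return bits_per_hour / 8 / 3600

rate = read_rate_bytes_per_second()
print(f"{rate / 1000:.1f} kB/s")  # roughly 69 kB/s
```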
DEVELOPMENTS

• In 2016, research by Church and Technicolor Research and Innovation was published in which 22 MB of an MPEG-compressed movie sequence was stored in and recovered from DNA.

• In March 2017, Yaniv Erlich and Dina Zielinski of Columbia University and the New York Genome Center published a method known as DNA Fountain that stored data at a density of 215 petabytes per gram of DNA. The technique approaches the Shannon capacity of DNA storage, achieving 85% of the theoretical limit. The method was not ready for large-scale use, as it cost $7000 to synthesize 2 megabytes of data and another $2000 to read it.
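The DNA Fountain figures above translate into per-megabyte costs and a mass estimate with simple arithmetic. The sketch below uses only the numbers quoted in the slide; the 1000-petabyte archive is a hypothetical example size chosen for illustration.

```python
WRITE_COST_USD = 7000        # cost to synthesize 2 MB (from the slide)
READ_COST_USD = 2000         # cost to read it back (from the slide)
DATA_MB = 2
DENSITY_PB_PER_GRAM = 215    # reported storage density

def cost_per_mb(total_cost_usd: float, size_mb: float = DATA_MB) -> float:
    """Cost in US dollars per megabyte."""
    return total_cost_usd / size_mb

def grams_needed(size_pb: float, density: float = DENSITY_PB_PER_GRAM) -> float:
    """Mass of DNA needed to hold size_pb petabytes at the given density."""
    return size_pb / density

print(cost_per_mb(WRITE_COST_USD))    # 3500.0
print(cost_per_mb(READ_COST_USD))     # 1000.0
print(round(grams_needed(1000), 2))   # hypothetical 1000 PB archive, in grams
```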
DEVELOPMENTS

• In March 2018, the University of Washington and Microsoft published results demonstrating storage and retrieval of approximately 200 MB of data. The research also proposed and evaluated a method for random access to data items stored in DNA.

• Research published by Eurecom and Imperial College in January 2019 demonstrated the ability to store structured data in synthetic DNA. The research showed how to encode structured or, more specifically, relational data in synthetic DNA and also demonstrated how to perform data processing operations (similar to SQL) directly on the DNA as chemical processes.
DNA FOUNTAIN

• DNA Fountain is a strategy to store and retrieve information in DNA that is robust and approaches the theoretical maximum of information that can be stored per nucleotide.

• The success of the strategy lies in the careful adaptation of recent developments in coding theory to the domain-specific constraints of DNA storage.
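The slides do not describe the mechanism. As published, DNA Fountain combines a fountain (Luby transform) code with a screening step that rejects candidate sequences violating biochemical constraints such as GC content and long runs of one base. The sketch below is a heavily simplified illustration of those two ideas: the uniform choice of how many segments to combine, the thresholds in `passes_screen`, and all function names are assumptions made for this example, not the authors' implementation.

```python
import random

DIGIT_TO_BASE = "ATCG"  # same 2-bit mapping as the earlier example: 0=A, 1=T, 2=C, 3=G

def make_droplet(segments: list, seed: int) -> bytes:
    """XOR a pseudo-random subset of equal-length segments into one payload.
    Simplification: the number of segments is drawn uniformly, whereas a real
    fountain code uses a soliton-style distribution."""
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytes(len(segments[0]))  # all-zero starting value
    for index in chosen:
        payload = bytes(x ^ y for x, y in zip(payload, segments[index]))
    return payload

def to_dna(data: bytes) -> str:
    """Map each byte to four bases, two bits per base."""
    return "".join(DIGIT_TO_BASE[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))

def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best

def passes_screen(seq: str, max_run: int = 3,
                  gc_low: float = 0.45, gc_high: float = 0.55) -> bool:
    """Accept a sequence only if GC content and run length are within limits.
    The thresholds are illustrative assumptions."""
    if not seq:
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return longest_run(seq) <= max_run and gc_low <= gc <= gc_high

# Keep generating droplets and retain only those that pass the screen.
segments = [b"DNA ", b"data", b"stor", b"age!"]
kept = [seed for seed in range(200)
        if passes_screen(to_dna(make_droplet(segments, seed)))]
print(len(kept), "of 200 candidate droplets passed the screen")
```

Because the droplets are regenerated from their seeds, a decoder that knows the seed can work out which segments were combined; the decoding step is omitted here.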
APPLICATIONS

• National security: information hiding and data steganography.

• Safely preserving a person's personal information, such as medical information and family history, in their own body.

• Storage of archival documents.


CONCLUSION

• At present, DNA storage is experimental. Before it becomes commonplace, it needs to be completely automated, and the processes of both building DNA and reading it must be improved.

• Both processes are error-prone and relatively slow. For example, today’s DNA synthesis lets us write a few hundred bytes per second; a modern hard drive can write hundreds of millions of bytes per second.

• These are significant challenges, but we are optimistic because all the relevant
technologies are improving rapidly.

• Further, DNA data storage doesn’t need the perfect accuracy that biology requires,
so researchers are likely to find even cheaper and faster ways to store information
in nature’s oldest data storage system.
A STORY TO END THE SEMINAR

• On January 21, 2015, Nick Goldman of the European Bioinformatics Institute (EBI) announced the Davos Bitcoin Challenge at the World Economic Forum annual meeting in Davos.

• During his presentation, tubes of DNA were handed out to the audience with the message that each tube contained the private key of exactly one bitcoin, all coded in DNA.

• The first one to sequence and decode the DNA could claim the bitcoin and win the
challenge. The challenge was set for three years and would close if nobody claimed the
prize before January 21, 2018.
A STORY TO END THE SEMINAR

• Almost three years later, on January 19, 2018, the EBI announced that a Belgian PhD student, Sander Wuyts of the University of Antwerp, was the first to complete the challenge.

• Besides the instructions on how to claim the bitcoin (stored as a plain text file and a PDF file), the logo of the EBI, the logo of the company that printed the DNA (CustomArray) and a sketch of James Joyce were retrieved from the DNA.
