Lab+Life Scientist Apr/May 2017

Page 26

Image courtesy of Caroline Davis2010 (via Flickr) under CC BY 2.0

storage technology

Data storage in a drop of DNA

US researchers have come up with a novel method of storing the world’s ever-increasing amount of data, turning to a storage technology that humans would quite literally not be able to live without — DNA.


The concept is not an entirely new one, with researchers at the European Bioinformatics Institute (EMBL-EBI) demonstrating in 2012–13 the storage of 739 KB of data in DNA. And according to the authors of the current study, published in the journal Science, DNA has all the characteristics to make it an ideal storage medium:

• It is ultracompact: about one million times more so than regular digital media.

• It comes in a liquid state, so it is not bound by the physical limitations of other storage media.

• It can last for hundreds of thousands of years if kept in a cool, dry place, as demonstrated by the recent recovery of DNA from the bones of a 430,000-year-old human ancestor found in a cave in Spain.

“DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete; if it does, we have bigger problems,” said study co-author Yaniv Erlich, from Columbia University and the New York Genome Center (NYGC).

Erlich and his colleague Dina Zielinski, an associate scientist at NYGC, chose six files to encode into DNA: a full computer operating system, the 1895 French film Arrival of a train at La Ciotat, a $50 Amazon gift card, a computer virus, a Pioneer plaque and a 1948 study by information theorist Claude Shannon. They compressed the files into a master file, and then split the data into short strings of binary code made up of ones and zeros.

Using their own customised version of an erasure-correcting algorithm called fountain codes (originally designed for streaming video on a smartphone), the researchers randomly packaged the strings into so-called droplets, and mapped the ones and zeros in each droplet to the four nucleotide bases in DNA: A, G, C and T. The algorithm deleted letter combinations known to create errors and added a barcode to each droplet to help reassemble the files later.

The scientists generated a digital list of 72,000 DNA strands, each 200 bases long, and sent it in a text file to DNA synthesis start-up Twist Bioscience, which specialises in turning digital data into biological data. Two weeks later, they received a vial holding a speck of DNA molecules.

To retrieve their files, the researchers used sequencing technology to read the DNA strands, followed by software to translate the genetic code back into binary. They recovered their files with no errors. They also demonstrated that a virtually unlimited number of copies of the files could be created with their coding technique by multiplying their DNA sample through polymerase chain reaction (PCR) and that those copies, and even copies of their copies, could be recovered error-free.

The capacity of DNA data storage is around 1.8 binary digits per nucleotide base, accounting for the biological constraints of the material as well as the need to include redundant information for reassembly. By applying their version of fountain codes, called DNA Fountain, the researchers ensured the reading and writing process was as efficient as possible. They succeeded in packing an average of 1.6 bits into each base nucleotide: at least 60% more data than previously published methods, and close to the 1.8-bit limit.

The downside of the study was that cost remained a barrier: the researchers spent $7000 to synthesise the DNA they used to archive their 2 MB of data and another $2000 to read it. The price of DNA synthesis may be reduced, however, if lower-quality molecules are produced and coding strategies like DNA Fountain are used to fix molecular errors.

Ultimately, the researchers showed that their coding strategy packs a whopping 215 PB of data on a single gram of DNA, 100 times more than the method published by EMBL-EBI. According to Erlich, this makes it the highest density data-storage device ever created.
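
For readers who want a feel for the encoding steps described in the article (packaging binary strings into droplets, mapping bits to A, C, G and T, discarding error-prone letter combinations and adding a barcode), here is a minimal Python sketch. It is an illustration only, not the researchers' DNA Fountain software. Everything specific in it is an assumption: the two-bits-per-base mapping, the screening thresholds (no more than three identical letters in a row, 45–55% G and C), the 4-byte seed used as the barcode, the uniform choice of how many strings to combine, and all function names. Decoding is left out.

```python
import random

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_bases(data: bytes) -> str:
    """Translate each pair of bits into one nucleotide letter."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def passes_screen(strand, max_run=3, gc_low=0.45, gc_high=0.55):
    """Reject strands with long single-letter runs or skewed G/C content.

    The thresholds are illustrative assumptions, not figures from the article.
    """
    if any(base * (max_run + 1) in strand for base in BASES):
        return False
    gc_fraction = (strand.count("G") + strand.count("C")) / len(strand)
    return gc_low <= gc_fraction <= gc_high


def make_droplet(segments, seed):
    """XOR a seed-selected subset of segments and prefix the seed as a barcode."""
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))  # simplification: uniform degree
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytearray(len(segments[0]))
    for index in chosen:
        for position, byte in enumerate(segments[index]):
            payload[position] ^= byte
    barcode = seed.to_bytes(4, "big")  # lets a decoder re-derive `chosen`
    return bytes_to_bases(barcode + bytes(payload))


def encode(segments, wanted, rng_seed=0):
    """Keep generating droplets until `wanted` strands pass the screen."""
    seed_source = random.Random(rng_seed)
    strands = []
    while len(strands) < wanted:
        seed = seed_source.getrandbits(32)
        strand = make_droplet(segments, seed)
        if passes_screen(strand):
            strands.append(strand)
    return strands


if __name__ == "__main__":
    data_rng = random.Random(0)
    # Stand-in for compressed data: 8 segments of 16 random bytes each.
    segments = [bytes(data_rng.getrandbits(8) for _ in range(16)) for _ in range(8)]
    for strand in encode(segments, wanted=3):
        print(strand)
```

Because any droplet that fails the screen is simply thrown away and another is generated, the encoder never has to repair a problematic sequence. That is the property of fountain codes the sketch is meant to show.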

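
The figures quoted in the article can be turned into a few back-of-the-envelope numbers. The short Python sketch below uses only values stated in the text; the variable names are made up for illustration, and the "implied" results are simple divisions rather than figures reported by the researchers. The article does not say how many of the 200 bases in each strand carry file data, so the sketch does not try to re-derive the 1.6 bits per base.

```python
# Figures quoted in the article
STRANDS = 72_000
BASES_PER_STRAND = 200
DATA_MB = 2
SYNTHESIS_COST_USD = 7000
READ_COST_USD = 2000
ACHIEVED_BITS_PER_BASE = 1.6
LIMIT_BITS_PER_BASE = 1.8
DENSITY_PB_PER_GRAM = 215

# Total number of bases ordered from the synthesis company
total_bases = STRANDS * BASES_PER_STRAND

# Cost per megabyte to write (synthesise) and to read (sequence)
synthesis_cost_per_mb = SYNTHESIS_COST_USD / DATA_MB
read_cost_per_mb = READ_COST_USD / DATA_MB

# How close the achieved density is to the stated limit
fraction_of_limit = ACHIEVED_BITS_PER_BASE / LIMIT_BITS_PER_BASE

# "At least 60% more" implies earlier methods stored at most this many bits per base
implied_prior_max_bits_per_base = ACHIEVED_BITS_PER_BASE / 1.6

# "100 times more" implies this density for the EMBL-EBI method
implied_embl_pb_per_gram = DENSITY_PB_PER_GRAM / 100

print(f"Bases ordered: {total_bases:,}")
print(f"Synthesis cost: ${synthesis_cost_per_mb:,.0f} per MB")
print(f"Reading cost: ${read_cost_per_mb:,.0f} per MB")
print(f"Share of the 1.8-bit limit reached: {fraction_of_limit:.0%}")
print(f"Implied earlier maximum: {implied_prior_max_bits_per_base:.1f} bits per base")
print(f"Implied EMBL-EBI density: {implied_embl_pb_per_gram:.2f} PB per gram")
```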