A team of researchers from Brown University has made significant progress in developing a novel molecular data storage system.

In a study published in Nature Communications, the team stored a variety of image files – a Picasso drawing, an image of the Egyptian god Anubis, and others – in arrays of mixtures containing specially synthesized small molecules. In total, the researchers stored more than 200 kilobytes of data, which they say is the most stored using small molecules to date. That’s not a lot of data compared to traditional storage media, but it is a significant advance for small-molecule storage, the researchers say.

“I think this is a major step forward,” said Jacob Rosenstein, assistant professor at Brown’s School of Engineering and author of the study. “The large number of unique small molecules, the amount of data that we can store and the reliability of the data reading are promising to scale this even further.”

As the data universe grows, much work is being done to find new and more compact means of storage. By encoding data in molecules, the equivalent of terabytes of data could be stored in just a few millimeters of space. Most research on molecular storage has focused on long-chain polymers such as DNA, which is well known as a carrier of biological data. However, small molecules offer potential advantages over long polymers. They may be easier and cheaper to produce than synthetic DNA and, in theory, could have an even higher storage capacity.

The Brown research team, supported by the U.S. Defense Advanced Research Projects Agency (DARPA) and led by chemistry professor Brenda Rubenstein, has been looking for ways to make data storage with small molecules feasible and scalable.

To store data, the team uses small metal plates arrayed with 1,500 tiny spots, each less than a millimeter in diameter. Each spot contains a mixture of molecules. The presence or absence of each molecule in a mixture encodes the digital data: one molecule corresponds to one bit. The number of bits in each mixture can therefore be as large as the library of distinct molecules available for mixing. The data can then be read out using a mass spectrometer, which identifies the molecules present in each spot.
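The presence-or-absence scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the researchers' actual software: the function names `encode` and `decode`, the bit ordering, and the zero-padding of the last spot are all assumptions made for the example. Each spot is modeled as the set of library indices of the molecules it contains.

```python
from typing import List, Set


def encode(data: bytes, library_size: int) -> List[Set[int]]:
    """Map a byte string onto spots (hypothetical helper).

    Each spot holds up to `library_size` bits; bit i is 1 when
    molecule i of the library is present in that spot's mixture.
    """
    # Flatten the bytes into a list of bits, most significant bit first.
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    spots = []
    for start in range(0, len(bits), library_size):
        chunk = bits[start:start + library_size]
        # Record only the molecules that are present (bits equal to 1).
        spots.append({i for i, bit in enumerate(chunk) if bit})
    return spots


def decode(spots: List[Set[int]], library_size: int, n_bytes: int) -> bytes:
    """Rebuild the bytes from the molecules detected in each spot."""
    bits = []
    for present in spots:
        bits.extend(1 if i in present else 0 for i in range(library_size))
    bits = bits[:n_bytes * 8]  # drop the zero padding of the last spot
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


# Assumed example: a 32-molecule library stores 32 bits per spot.
spots = encode(b"Anubis", library_size=32)
restored = decode(spots, library_size=32, n_bytes=6)
```

In this toy model a larger library means more bits per spot, which is why expanding the library directly expands the size of the files that fit on one plate.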

In a paper published last year, the Brown team showed that they could store kilobyte-scale image files using some common metabolites, the molecules that organisms use to regulate metabolism. For this new work, the researchers were able to significantly expand the size of their library, and thus the size of the files they could encode, by synthesizing their own molecules.

The team made its molecules using Ugi reactions – a technique that is widely used in the pharmaceutical industry to quickly make a large number of different compounds. Ugi reactions combine four broad classes of reagents (an amine, an aldehyde or a ketone, a carboxylic acid and an isocyanide) to form a new molecule. By using different reagents from each class, the researchers were able to quickly produce a variety of different molecules. For this work, the team used five different amines, five aldehydes, 12 carboxylic acids and five isocyanides in various combinations to produce 1,500 different compounds.
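The arithmetic behind the library size can be checked with a short Python sketch. The reagent names below are placeholders invented for illustration (the article does not list the actual reagents); only the counts per class come from the text.

```python
from itertools import product

# Placeholder reagent names; only the counts per class come from the article.
amines = [f"amine_{i}" for i in range(1, 6)]              # 5 amines
aldehydes = [f"aldehyde_{i}" for i in range(1, 6)]        # 5 aldehydes
acids = [f"acid_{i}" for i in range(1, 13)]               # 12 carboxylic acids
isocyanides = [f"isocyanide_{i}" for i in range(1, 6)]    # 5 isocyanides

# One Ugi product per choice of one reagent from each of the four classes.
library = list(product(amines, aldehydes, acids, isocyanides))

n_reagents = len(amines) + len(aldehydes) + len(acids) + len(isocyanides)
print(n_reagents)    # 27
print(len(library))  # 1500
```

The number of reagents grows as a sum (5 + 5 + 12 + 5 = 27) while the number of possible products grows as a product (5 × 5 × 12 × 5 = 1,500), which is what makes the approach attractive for scaling.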

“The advantage here is the potential scalability of the library,” said Rubenstein. “We only use 27 different components to create a library of 1,500 molecules in a day. That means we don’t have to go out and find 1,500 unique molecules.”

From there, the team used sub-libraries of compounds to encode their images. A binary image of the Egyptian god Anubis was stored using a 32-compound library. A 575-compound library was used to encode a 0.88-megapixel Picasso drawing of a violin.

The large number of molecules available for the chemical libraries also enabled the researchers to investigate alternative coding schemes that made reading data more robust. While mass spectrometry is highly precise, it is not perfect. As with any system used to store or transfer data, this system also requires error correction.

“The way we design the libraries and read out the data includes extra information that lets us correct some errors,” said Brown graduate student Chris Arcadia, first author of the paper. “That helped us streamline the experimental workflow and still achieve accuracy rates of up to 99 percent.”
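The article does not describe the team's actual encoding or error-correction scheme. As a generic illustration of how redundant information makes a readout more robust, the Python sketch below uses a simple repetition code with majority voting. This is a stand-in technique chosen for clarity, not the researchers' method, and the function names are hypothetical.

```python
from typing import List


def add_redundancy(bits: List[int], copies: int = 3) -> List[List[int]]:
    """Store the same bit pattern several times (e.g. in several spots)."""
    return [list(bits) for _ in range(copies)]


def majority_decode(readouts: List[List[int]]) -> List[int]:
    """Recover each bit by majority vote across the redundant readouts."""
    n = len(readouts)
    return [1 if sum(column) * 2 > n else 0 for column in zip(*readouts)]


original = [1, 0, 1, 1, 0, 0, 1, 0]
readouts = add_redundancy(original)

# Simulate two isolated misreads in different copies and positions.
readouts[0][2] ^= 1
readouts[2][5] ^= 1

recovered = majority_decode(readouts)
```

With three copies, any single misread at a given bit position is outvoted by the two correct copies. Practical systems typically use more efficient codes that add far less overhead than storing everything three times.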

According to the researchers, there is still a lot of work to do to bring this idea to a useful scale. However, the ability to create large chemical libraries and use them to encode ever larger files suggests that the approach can indeed be scaled up.

“We are no longer limited by the size of our chemical library, which is really important,” said Rosenstein. “This is the biggest step forward. When we started this project a few years ago, we had some debates about whether something of this size was experimentally feasible at all, so it’s very encouraging that we did it.”

