Computing with DNA

Invented (discovered?) in 1994 by Dr. Leonard M. Adleman of USC, a computer scientist and mathematician

Basic Idea: Perform a molecular biology experiment to find the solution to a math problem.

“DNA Computer”

“Biological Computation” vs “computational biology”

“Molecular Computer”

Why DNA Computing?

o Radically different computational paradigm

o Observe that DNA is just another ‘code’ that can be manipulated.

o Computation is typically viewed as ‘string processing’ of some form.

o Computation is implemented by biological molecules and manipulation mechanisms – e.g. usual DNA reproductive mechanisms, enzymes catalyse particular joining, shortening, extension manipulations etc.

o Fundamental is the notion of Watson-Crick complementarity

Still: Why DNA Computing?

o Support for standard computation

o Better understanding of how nature computes

o New data structures (molecules)

o New operations - cut, paste, adjoin, insert, delete, ...

o New computability models.

Key Features of DNA Computing

· Massive parallelism of DNA strands

· High density of information storage

· Ease of constructing many copies

· Watson-Crick complementarity – yields universality, in the Turing sense

o feature provided “for free”

o universal twin shuffle language

Twin-shuffle language

TS consists of all words over the alphabet {0,1,0’,1’} obtained thus –

o Take an arbitrary word w over {0,1} and its complement w’ and shuffle the letters, keeping the order of letters from each word the same.

o Twin-shuffle languages are universal: for every recursively enumerable language L, there is a mapping g such that L=g(TS)

o e.g. Consider set of all possible words (sequences) that can be obtained from two given words by shuffling them without changing the order of letters.

§ For instance, shuffling AG and TC we get six words: AGTC, ATGC, ATCG, TAGC, TACG and TCAG.

§ Then collect all shufflings of all pairs of complementary words into the so-called twin-shuffle language.

§ There is a simple way to go from a DNA double strand to a word in the twin-shuffle language and back.

§ Universality follows from the fact that any Turing computation can be performed by using an appropriate finite automaton to filter the (fixed) twin-shuffle language.

§ So DNA computers could in theory perform any operation that digital computers can.
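The definition of the twin-shuffle language can be made concrete with a small sketch. The function names `shuffles` and `in_twin_shuffle` are illustrative choices, not from the lecture, and words are modelled as tuples of the symbols "0", "1", "0'" and "1'". A word is in TS exactly when its unprimed letters, read in order, spell the same word as its primed letters with the primes removed.

```python
from itertools import combinations

def shuffles(u, v):
    """All interleavings of sequences u and v that keep each one's internal order."""
    n, m = len(u), len(v)
    result = set()
    for positions in combinations(range(n + m), n):
        chosen = set(positions)          # slots of the result that take letters from u
        iu, iv = iter(u), iter(v)
        result.add(tuple(next(iu) if k in chosen else next(iv) for k in range(n + m)))
    return result

def in_twin_shuffle(word):
    """True if word (a sequence of "0", "1", "0'", "1'") belongs to TS."""
    plain = [s for s in word if s in ("0", "1")]
    primed = [s[0] for s in word if s in ("0'", "1'")]
    # Every symbol must be from the alphabet, and both projections must agree.
    return len(plain) + len(primed) == len(word) and plain == primed

# Shuffling w = 01 with its complement w' = 0'1' gives words of TS.
for w in sorted(shuffles(("0", "1"), ("0'", "1'"))):
    print(" ".join(w), in_twin_shuffle(w))
```

Two words of lengths n and m with no letters in common always have C(n+m, n) shuffles, so two 2-letter words give six.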

The double helix

Two DNA strands, each a right-handed helix

Strands are anti-parallel - one chain runs 5’ to 3’ and the other runs 3’ to 5’

The sugar-phosphate backbones of the two DNA strands wind around the helix axis, with the bases of the individual nucleotides on the inside of the helix.

(image from http://www.blc.arizona.edu/Molecular_Graphics/DNA_Structure/DNA_Tutorial.HTML )

Base Pairing in DNA

( from http://www.accessexcellence.org/RC/VL/GG/dna2.php )

Watson-Crick Complementarity: A pairs with T, and C pairs with G
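As a minimal sketch of the pairing rule, the snippet below treats a strand as a Python string over A, C, G, T. The names `complement` and `can_pair` are illustrative, and strand direction (5’ to 3’) is deliberately ignored to keep the example short.

```python
# Watson-Crick pairs: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base Watson-Crick complement of a strand."""
    return "".join(PAIR[base] for base in strand)

def can_pair(strand_a, strand_b):
    """True if the two strands pair at every position (direction ignored)."""
    return len(strand_a) == len(strand_b) and complement(strand_a) == strand_b

print(complement("AGTC"))        # TCAG
print(can_pair("AGTC", "TCAG"))  # True
```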

DNA Synthesis

· Getting bits of DNA with specified sequences is really not a problem

· Oligonucleotides (short strands of DNA) can be made in the lab with a synthesizer

– Supply bottles of A, C, G and T in solution to the synthesizer

– Specify the sequence

– Turn it on!

· Millions of copies of the required sequence are produced and placed in solution

· …or just buy them by mail order!

Denaturing, annealing and ligation

Denaturing = melting = separation of dsDNA (double-stranded) into two complementary ssDNA (single-stranded) strands

– Achieved by heating the solution to ~90-95 °C

Annealing is the reverse of denaturing

– Solution of ssDNA is cooled, allows strands to join together again

Ligation is the joining of two DNA strands by bonding their end nucleotides together

– For example, join two strands ssDNA end to end

– Ligase is the enzyme that does this

– Can also repair breaks in one strand of dsDNA

Cutting DNA - Restriction

Enzymes (molecular machines) are used to cut DNA sugar-phosphate backbone

– Enzymes look for a specific sequence of bases

– Cuts may be asymmetric – “sticky ends”

– Cuts may be within or outside the enzyme’s recognition site
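A restriction cut can be sketched on a single strand modelled as a string. The example uses the enzyme EcoRI, which recognises GAATTC and cuts between the G and the first A. The function name `cut` and its `offset` parameter are illustrative assumptions, and the second strand (and hence the sticky end itself) is not modelled.

```python
def cut(strand, site="GAATTC", offset=1):
    """Cut a strand `offset` bases into every occurrence of the recognition site."""
    fragments, start = [], 0
    pos = strand.find(site)
    while pos != -1:
        fragments.append(strand[start:pos + offset])  # piece up to the cut point
        start = pos + offset
        pos = strand.find(site, pos + 1)
    fragments.append(strand[start:])                  # remainder after the last cut
    return fragments

print(cut("AAGAATTCTT"))  # ['AAG', 'AATTCTT']
```

On real double-stranded DNA the opposite strand is cut at the mirrored position, which is what leaves the single-stranded overhang.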

Joining DNA – Ligation

· Reverse of cutting is also accomplished by enzymes

· Where we have “sticky” ends (one strand longer than the other) we must have the exact complement for it to work.

Copying DNA - Polymerisation

· At some stage we are going to need to make copies of DNA sequences

· Assume the molecule has ends γ and β, denote the WC complement of a sequence of bases by h(.)

· Heat the dsDNA up to split it into two ssDNA

· Primers h(γ) and h(β) in the mixture bind to the ends of the ssDNA

· Once this is done, a DNA polymerase fills in the rest of the missing bases to complete the dsDNA.

· The same is done for the complementary strand, so we end up with two identical copies of the dsDNA.

Amplifying DNA - PCR

The polymerase chain reaction can be used to amplify DNA.

– Template (strand to copy) is denatured at high temp. (~ 95 °C)

– It is cooled to a temperature that will allow optimal primer binding.

– The reaction temperature is then raised to that optimal for the DNA polymerase (~ 72 °C) so the primers are extended along the template.

– This series of steps is carried out 20 - 30 times leading to exponential amplification of the target template.
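The exponential growth can be checked with a short calculation. This is an idealised model: each cycle is assumed to multiply the number of copies by (1 + efficiency). The `efficiency` parameter is an assumption added here, with 1.0 meaning perfect doubling.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealised copy count after `cycles` PCR cycles."""
    return initial_copies * (1.0 + efficiency) ** cycles

for cycles in (20, 30):
    print(cycles, f"{pcr_copies(1, cycles):,.0f}")
```

With perfect doubling, a single template becomes about a million copies after 20 cycles and about a billion after 30.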

Separating DNA – Hybridization

· Allows us to separate out molecules with a known DNA sequence from a mix of different sequences

· To detect a particular sequence of bases, we generate many copies of a probe – single strand of DNA bases that is WC complement of the target

· Attach these probes to biotin molecules

· Bind biotin to a fixed matrix or magnetic beads

· Pour the mix of sequences over the probes

· When target meets the probe they bind

· Non-binding strands can be washed away

· Targets then removed from matrix back into solution
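The separation steps above can be sketched in a few lines, treating strands as strings. The names `make_probe` and `separate` are illustrative. Binding is simplified to exact matching: a strand is assumed to bind only if it contains the full target sequence, whereas real hybridization tolerates some mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def make_probe(target):
    """Probe = Watson-Crick complement of the target, read in the opposite direction."""
    return target.translate(COMPLEMENT)[::-1]

def separate(pool, probe):
    """Split a pool into strands that bind the probe and strands that wash away."""
    target = make_probe(probe)  # the sequence this probe pairs with
    bound = [s for s in pool if target in s]
    washed = [s for s in pool if target not in s]
    return bound, washed

probe = make_probe("GATTACA")
bound, washed = separate(["CCGATTACAGG", "TTTTTT"], probe)
print(probe, bound, washed)
```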

Separating DNA - Electrophoresis

• Separation is based on size = length of strands

• DNA mixture loaded onto one edge of gel

• An electric field is applied across the gel

• How far a strand moves depends on its length – shorter strands travel farther, with roughly logarithmic separation by length

First DNA Computing Problem

HAMILTONIAN PATH PROBLEM (Posed by William Hamilton)

Given a network of nodes and directed connections between them, is there a path through the network that begins with the start node and concludes with the end node, visiting each node exactly once (“Hamiltonian path”)?

Adleman, L.M. “Molecular Computation of Solutions to Combinatorial Problems.” Science. 266: 1021-1024 (Nov. 11, 1994).

Adleman’s Scientific American paper

· Hamiltonian path problem with 7 cities and 14 connections.

· With DNA, the initial state is created by synthesizing DNA molecules with a certain sequence and after some reactions, a new molecule is produced with the answer.

· It took about one second for the DNA to come up with answers, but it took Adleman a week to dig the answer out of the DNA soup

“Does a Hamiltonian path exist, or not?”

Problems that are NP-complete (NP = nondeterministic polynomial time)

· Hard (NP-complete) problem - for every known algorithm, the time required to find a solution increases exponentially with the number of variables involved.

· Easy problem - the algorithm running time increases only polynomially (for example, in proportion) with the number of variables.

· A hard problem can eat up a lot of computer cycles if carried out by brute force.

o e.g. the Hamiltonian path problem is NP-complete.

§ for N cities, there are up to N!/2 possible paths (orderings of the cities, counting a path and its reverse once).

§ As the number of cities grows, the number of possible path combinations soars.

§ 9 cities - about 180,000 possible paths,

§ 11 cities - about 20 million paths,

§ 13 cities - about 3.1 billion paths,

§ 17 cities - about 178 trillion paths.
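The path counts in this list can be checked against the N!/2 formula with a few lines. The function name `max_paths` is an illustrative choice.

```python
from math import factorial

def max_paths(n_cities):
    """Upper bound N!/2 on the number of candidate orderings of n_cities."""
    return factorial(n_cities) // 2

for n in (9, 11, 13, 17):
    print(n, f"{max_paths(n):,}")
```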

Solving the Hamiltonian Problem

Typical Algorithm - Generation-&-Test:

Step 1: Generate random paths on the network.

Step 2: Keep only those paths that begin with start city and conclude with end city.

Step 3: If there are N cities, keep only those paths of length N.

Step 4: Keep only those that enter all cities at least once.

Step 5: Any remaining paths are solutions (i.e., Hamiltonian paths).
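The five steps can be sketched on a conventional computer as follows. The 4-city graph, the sample count and the seed are hypothetical values chosen for illustration, and the random generation here is serial, so the sketch only shows the logic of the filter steps, not their cost.

```python
import random

def random_path(edges, nodes, max_len, rng):
    """Random walk along directed edges, starting anywhere, at most max_len nodes long."""
    path = [rng.choice(nodes)]
    target_len = rng.randint(1, max_len)
    while len(path) < target_len:
        successors = [b for (a, b) in edges if a == path[-1]]
        if not successors:           # dead end: no outgoing connection
            break
        path.append(rng.choice(successors))
    return tuple(path)

def generate_and_test(edges, nodes, start, end, samples=2000, seed=0):
    rng = random.Random(seed)
    n = len(nodes)
    paths = {random_path(edges, nodes, n, rng) for _ in range(samples)}  # Step 1
    paths = {p for p in paths if p[0] == start and p[-1] == end}         # Step 2
    paths = {p for p in paths if len(p) == n}                            # Step 3
    paths = {p for p in paths if set(p) == set(nodes)}                   # Step 4
    return paths                                                         # Step 5

# Hypothetical example network with cities 0..3.
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(generate_and_test(edges, [0, 1, 2, 3], start=0, end=3))
```

Because generation is random, a Hamiltonian path is found only if enough samples are drawn. An empty result is strong evidence that none exists, not a proof.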

Does a Hamiltonian path exist for the following network?

Combinatorial Explosion

The total number of paths grows exponentially as the network size increases:

(e.g.) 10⁶ paths for N=10 cities, 10¹² paths (N=20), 10¹⁰⁰ paths!! (N=100)

· The Generation-&-Test algorithm takes “forever”.

· Some sort of smart algorithm must be devised; no efficient (polynomial-time) one has been found so far (the problem is NP-complete).

DNA for Hamiltonian Problem

The key to solving the problem is using DNA to perform the five steps of the Generation-&-Test algorithm as a parallel search, instead of a serial search.

Adleman’s algorithm

1. Produce strands corresponding to vertices and edges

2. Generate many paths in the graph, randomly

3. Remove all paths that do not start at the start vertex v_in or do not end at the end vertex v_out

4. Remove all paths that do not involve exactly n vertices

5. For each of the n vertices v, remove all paths that do not involve v.

• Steps 2, 3 and 4 take constant time; steps 1 and 5 are polynomial in n

DNA encoding of city-network

Vertex and Edge Encodings

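Each city vi is encoded by a strand made of two sub-sequences, a first half vi´ and a second half vi´´:

vi = vi´ vi´´

Each flight eik from vi to vk is encoded from the second half of vi and the first half of vk, taken as base complements so that the flight strand can anneal across the junction between the two city strands:

eik = vi´´ vk´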

Adleman’s Data Representation

7 Cities

• Each node represented by a 20-mer strand

• Each possible edge represented by a complementary 20-mer
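A minimal sketch of this representation, under stated assumptions: each city gets a random 20-mer, each flight strand is the base complement of the second half of its origin city followed by the first half of its destination city, and strand direction is ignored. The names `encode_graph` and `anneals_across`, the seed, and the 3-city example are illustrative, not taken from Adleman’s paper.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def random_strand(length, rng):
    return "".join(rng.choice("ACGT") for _ in range(length))

def encode_graph(cities, flights, length=20, seed=0):
    """Return (city strands, flight strands) for a directed graph."""
    rng = random.Random(seed)
    half = length // 2
    city = {c: random_strand(length, rng) for c in cities}
    flight = {
        # Complement of: second half of origin + first half of destination.
        (i, k): (city[i][half:] + city[k][:half]).translate(COMPLEMENT)
        for (i, k) in flights
    }
    return city, flight

def anneals_across(city, flight, i, k):
    """True if flight strand (i, k) pairs base-for-base with the i-to-k junction."""
    half = len(city[i]) // 2
    junction = city[i][half:] + city[k][:half]
    return flight[(i, k)].translate(COMPLEMENT) == junction

city, flight = encode_graph(["A", "B", "C"], [("A", "B"), ("B", "C")])
print(city["A"], flight[("A", "B")], anneals_across(city, flight, "A", "B"))
```

Because a flight strand overlaps half of each of its two cities, it holds the two city strands side by side so that ligase can join them into one longer path molecule.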

DNA Experiment

1. In a test tube, mix the prepared DNA pieces together (which will randomly link with each other, forming all different paths).

2. Perform PCR with the ‘start’ and ‘end’ DNA pieces as primers (which creates millions of copies of the DNA strands with the right start and end).

3. Perform gel electrophoresis to keep only those pieces of the right length (e.g., N=4).

4. Use DNA ‘probe’ molecules to check whether their paths pass through all intermediate cities.

5. All DNA pieces that are left in the tube should be precisely those representing Hamiltonian paths.

- If the tube contains any DNA at all, then conclude that a Hamiltonian path exists, and otherwise not.

- When it does, the DNA sequence represents the specific path of the solution.
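The experiment can be imitated in software to see how the lab steps act as filters. This is only an in-silico stand-in under stated assumptions: molecules are strings, random 20-letter words stand for the city strands, every walk of up to N cities is assumed to form in step 1, and string tests (`startswith`, `len`, `in`) stand in for PCR, gel electrophoresis and probes. All function names and the 4-city network are hypothetical.

```python
import random

def city_strands(cities, length=20, seed=1):
    """Assign each city a random DNA word (stand-in for synthesized 20-mers)."""
    rng = random.Random(seed)
    return {c: "".join(rng.choice("ACGT") for _ in range(length)) for c in cities}

def all_walks(flights, cities, max_cities):
    """Step 1: every walk of up to max_cities cities, as if formed in parallel."""
    walks = [(c,) for c in cities]
    frontier = walks
    for _ in range(max_cities - 1):
        frontier = [w + (b,) for w in frontier for (a, b) in flights if a == w[-1]]
        walks = walks + frontier
    return walks

def run_experiment(cities, flights, start, end, unit=20):
    strand = city_strands(cities, length=unit)
    n = len(cities)
    tube = ["".join(strand[c] for c in w) for w in all_walks(flights, cities, n)]
    # Step 2 (PCR): keep molecules with the right start and end sequences.
    tube = [m for m in tube if m.startswith(strand[start]) and m.endswith(strand[end])]
    # Step 3 (gel): keep molecules whose length corresponds to exactly n cities.
    tube = [m for m in tube if len(m) == n * unit]
    # Step 4 (probes): keep molecules that contain every city's sequence.
    tube = [m for m in tube if all(strand[c] in m for c in cities)]
    # Step 5: decode whatever is left back into city names.
    decode = {s: c for c, s in strand.items()}
    return [tuple(decode[m[i:i + unit]] for i in range(0, len(m), unit)) for m in tube]

flights = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("B", "D")]
print(run_experiment(["A", "B", "C", "D"], flights, start="A", end="D"))
```

An empty list corresponds to an empty tube, meaning no Hamiltonian path. A non-empty list reads the surviving molecules back as routes.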

DNA Experiment Set-up

Ingredients and tools needed:

- DNA strands that encode city names and connections between them

- Polymerases, ligase, water, salt, other ingredients

- Polymerase chain reaction (PCR) set

- Gel electrophoresis tool (that filters out non-solution strands)

Gel Electrophoresis

o DNA molecules are negatively charged.

o Placed in an electric field, they will move towards the positive electrode.

o The negative charge is proportional to the length of the DNA molecule.

o The force needed to move the molecule is also proportional to its length.

o The gel sieves the molecules, so shorter ones move faster than longer ones.

o DNA molecules are invisible, and must be marked (e.g. with ethidium bromide stain or a radioactive label)

SUMMARY & CONCLUSION

o Why does it work?

Enormous parallelism, with 10²³ DNA pieces working in parallel to find the solution simultaneously.

Takes less than a week (vs. thousands of years for a supercomputer)

o Massively parallel information processing:

o 10⁶ ops / sec for PCs

o 10¹² ops / sec for supercomputers

o 10²⁰ ops / sec possible for DNA

o DNA computers would be > 1,000,000 times faster than any computer today.

o Extraordinarily energy efficient (about 10⁻¹⁰ of a supercomputer’s energy use)