Reports

Molecular Computation by DNA Hairpin Formation

See allHide authors and affiliations

Science  19 May 2000:
Vol. 288, Issue 5469, pp. 1223-1226
DOI: 10.1126/science.288.5469.1223

Abstract

Hairpin formation by single-stranded DNA molecules was exploited in a DNA-based computation in order to explore the feasibility of autonomous molecular computing. An instance of the satisfiability problem, a famous hard combinatorial problem, was solved by using molecular biology techniques. The satisfiability of a given Boolean formula was examined autonomously, on the basis of hairpin formation by the molecules that represent the formula. This computation algorithm can test several clauses in the given formula simultaneously, which could reduce the number of laboratory steps required for computation.

In 1994, Adleman experimentally demonstrated that DNA molecules and common molecular biology techniques could be used to solve hard combinatorial problems (1), especially problems involving large searches. The data carried on a number of molecules are processed simultaneously by such techniques, and highly data-parallel computation is achieved.

Adleman's work was later generalized by Lipton (2), whose study encouraged further experimental work, based on Adleman and Lipton's paradigm (3–8). A number of theoretical studies have also emerged in the past 5 years (9,10), including the idea of autonomous DNA computers (11–14). In these systems, the logic of computation is implemented without external control or interference (other than the regulation of temperature), which could drastically reduce the number of required laboratory steps. It has been claimed that the self-assembly and the potential to form secondary structures of the molecules are useful for the embodiment of such computers (15–18), but no actual computation of a hard combinatorial problem had been performed in the autonomous manner. Here, we describe a DNA-based solution of the satisfiability (SAT) problem, where the main logic of computation was implemented on the basis of hairpin formation by single-stranded DNA (ssDNA) molecules.

The SAT problem is to find Boolean-value assignments that satisfy the given formula. Each variable is assigned the Boolean value (either 0 or 1); a set of the values (a value assignment) for the variables satisfies the formula if the value of the formula becomes 1. In a subclass of the SAT problem, called conjunctive normal form (CNF)–SAT, Boolean formulas are restricted to the form ofC 1C 2 ∧ … ∧Cn , where eachCi is a “clause” and “∧” is the logical AND operation. A clause is of the formL 1L 2 ∨ … ∨Lm , where eachLj is a “literal” and “∨” is the logical OR operation; a literal is either a variable or its negation (if variable x takes 1 or 0, then its negation, denoted by ¬x , takes 0 or 1, respectively).

Lipton proposed a DNA-based solution of CNF-SAT (2), which has been demonstrated in experimental studies (5–8). We took a different approach, because autonomous computing must encode the logical constraints into DNA sequences themselves. “Literal strings” were introduced to encode the given formula; they are conjunctions of the literals selected from each clause (one literal per clause). For example, the formula (ab) ∧ (¬a ∨ ¬c) can be represented by the set of four literal strings,a- ¬a,a- ¬c,b- ¬a, andb- ¬c. A formula is satisfiable if there is such a literal string that does not involve any variable together with its negation (we refer to such strings as “satisfying”), because the literal string gives a value assignment that makes every clause take the value 1 simultaneously, so that the value of the entire formula will become 1. If each variable is encoded with a sequence complementary to that encoding its negation, then the literal strings containing one or more pairs of complementary literals can form hairpins, and the satisfying strings stay in the form without such hairpins (“nonhairpin” form) (Fig. 1).

Figure 1

Illustration of the computation based on DNA hairpin formation. (A) The ssDNA representing a satisfying literal string,b-c- ¬d-e- ¬a- ¬fa-c- ¬d- ¬d, stays in the nonhairpin form. (B) A literal string,b- ¬c-a-c-e- ¬f- ¬d-c-a-d, has pairs of the complementary literals (c- ¬c and ¬d-d), and the ssDNA representing it forms hairpins.

Thus, the following algorithm solves the CNF-SAT problem. (i) Generate the literal strings according to the given formula. This step is implemented by a ligation reaction, which concatenates the literals. (ii) Allow ssDNA molecules, each representing a literal string, to form hairpins. This step performs the main logic of computation only by regulating the temperature. Even enzymes are not necessary. (iii) Remove the hairpin-forming molecules. The remaining molecules represent the satisfying literal strings, which can be identified with the solutions (value assignments) to the problem. Molecular biology techniques for this step were developed in the present study.

The first two steps are implemented autonomously, and each step processes several clauses simultaneously; the third step may be more laborious. The major difference between our algorithm and the previous molecular algorithms for solving the SAT problem is that our algorithm involves no sequence-specific operations that must be repeated more times for larger problems.

To test the feasibility of the above algorithm, we addressed a six-variable 10-clause instance of 3-SAT, where each clause contains (at most) three literals. Even the 3-SAT problem belongs to NP-complete, a class of computational problems that are hard to solve (19).

The Boolean formula is F = (ab ∨ ¬c) ∧ (acd) ∧ (a ∨ ¬c ∨ ¬d) ∧ (¬a ∨ ¬cd) ∧ (a ∨ ¬ce) ∧ (ad ∨ ¬f) ∧ (¬acd) ∧ (ac ∨ ¬d) ∧ (¬a ∨ ¬c ∨ ¬d) ∧ (¬ac ∨ ¬d). There exists the unique solution (a, b, c,d, e, f) = (0, 1, 1, 0, 1, 0), which is identified with 24 satisfying literal strings out of 310 (59,049) possible literal strings.

For generating a pool of these possible strings (20), the DNA for each literal in clause i has linkeri-1 (or a primer-binding site, pbs1) on the left and linkeri (or pbs2) on the right (Fig. 2). Thus, a literal string is a linear assembly of 10 literal DNA molecules, where 30-base literal sequences are concatenated, through the 4-base linkers, exclusively with those from the neighboring clauses. The literal DNA was prepared, in the form of double-stranded DNA (dsDNA) with the protruding 5′ ends as the linkers, by digesting longer dsDNA molecules, including three to four literals, into each literal DNA. The enzyme that we used for this digestion, Bst XI, recognizes 5′-CCAWNNNNWTGG-3′, where W denotes either A or T and N denotes any base. NNNN covers the linker, and thus each literal sequence is of the form 5′-WTGG…CCAW-3′. The Bst NI site, CCAGG, is also contained for the hairpin-removing step (as described below). The remaining 17 nucleotides were designed to meet the criteria commonly used for DNA computers, and the linker sequences are not palindromic for avoiding self-concatenation.

Figure 2

Illustration of “literal units” in the course of the assembly into a 10-clause or full-length literal string. The arrows represent ssDNA molecules; their direction is from the 5′ to 3′ ends.Lk indicates the literal from clausek (1 ≤ k ≤ 10), and the numbers at both ends of each literal indicate the linker numbers. The circles in the middle of the literals represent the Bst NI sites, and pbs1 and pbs2 indicate the primer binding sites for PCR.

These DNA molecules for 30 literals from the 10 clauses (3 literals per clause) were mixed in a test tube and concatenated with DNA ligase, and the ligation products were separated by gel electrophoresis (Fig. 3A). The final yield (1.5 to 3 ng) of the full-length molecules was enough to contain >104 molecules for every literal string. The quality of this pool of the literal strings (pool 0) was examined by sequence analysis (Table 1). The literals occurred roughly at random, except for the apparent bias found in clauses 5 and 7; the claim of unbiased concatenations for our ligation scheme requires further verification. “Erroneous” literals, that is, literals absent in the corresponding clauses of the formula, were not found; this is reinforced by the additional data collected during computations (some bias may appear in these data because of the implemented operations).

Figure 3

Electrophoresis of products after each operation. (A) The concatemers of the literal units (lane 1). The band corresponding to the full-length literal strings is indicated by the arrow. Lane M is a size marker. (B) The full-length literal strings, biotinylated at one end (pool 0) (lane 1); the products after the Bst NI digestion (pool 1) (lane 2); and the products after further processing by ePCR (pool 3) (lane 3). Lane M is a size marker. The electrophoresis was conducted in 8% polyacrylamide gels stained with ethidium bromide.

Table 1

Multiplicity of the occurrence of each literal in pool 0. The data of 223 literals were obtained from pool 0, and 730 additional data (parentheses) were collected during the computations using pool 0 (31). “c1” to “c10” indicate clauses 1 to 10, respectively. “k 1,” “k 2,” and “k 3” represent the first, second, and third literals from each clause, respectively. k 1represents a for clauses 1, 2, 3, 5, 6, and 8 and ¬a for clauses 4, 7, 9, and 10.k 2 represents b for clause 1;c for clauses 2, 7, 8, and 10; ¬c for clauses 3, 4, 5, and 9; and d for clause 6.k 3 represents ¬c for clause 1; d for clauses 2, 4, and 7; ¬d for clauses 3, 8, 9, and 10; e for clause 5; and ¬f for clause 6. “x” represents any erroneous literal that is not included in the corresponding clauses of the formulaF.

View this table:

The logic of our computation, embodied as hairpin formation, was performed by regulating the temperature of a DNA solution in which the literal-string DNA molecules are dissolved at low concentration, allowing the dsDNA to melt and refold intramolecularly. In the absence of available means for this hairpin-removing process, we developed the following two techniques. One is destructive: enzymatic digestion is used to remove hairpin DNA. The other is similar to the polymerase chain reaction (PCR) and increases the population of nonhairpin molecules in a given DNA pool.

The literal sequences have a restriction enzyme site for Bst NI in the middle of the sequence, so that the double-stranded regions of a hairpin molecule become susceptible to the enzyme (20). The selectivity for hairpin DNA requires that the restriction sites alone do not form stable base pairs with each other, regardless of the complementarity between the literals. To check this point, we performed a preliminary experiment with the following ssDNA molecules: molecule A, consisting of the sequences for literal a, linker 1, and literal b in this order (a-1-b), and molecule B, in the form of a-1- ¬a. Figure 4A shows that molecule B actually forms a hairpin, because it migrates on the gel much faster than expected from its length, and molecule A stays in the nonhairpin form. It was found that only molecule B is susceptible to the restriction enzyme. The small population of molecule B remaining undigested was probably due to incomplete “deprotection” after chemical synthesis.

Figure 4

(A) Electrophoresis of ssDNA molecules subjected to the Bst NI digestion. Molecules A (lanes 1 and 2) and B (lanes 3 and 4) were renatured (lanes 1 and 3) and then subjected to the enzymatic digestion (lanes 2 and 4). Lane M is a size marker. The electrophoresis was conducted in the denaturing 8% polyacrylamide gel stained with ethidium bromide. (B) Electrophoresis of the products of ePCR. Molecules C (lane 1) and D (lane 2) were each used as the template for a 10-cycle ePCR, followed by a 15-cycle PCR to recover the DNA. An aliquot (5 μl) of the product was applied to the 8% polyacrylamide gel electrophoresis, stained with ethidium bromide. Lane M is a size marker.

The second technique depends on the inability of DNA polymerase to duplicate a DNA template that forms stable hairpins (20). If the concentration of the template is kept low enough to allow hairpin formation, rather than intermolecular hybridization, then PCR is expected to amplify only nonhairpin molecules. To maintain such a low DNA concentration, we diluted the reaction mixture twice after each PCR cycle. The common PCR protocol was then utilized to recover the DNA. To test the feasibility of this technique (which we call “exclusive PCR” or “ePCR”), we processed dsDNA molecules C (pbs1-a-1-c-2- ¬d-pbs2) and D (pbs1-7-a-8- ¬d-9-¬a-pbs2) by ePCR. As expected, molecule D (having a pair of a and ¬a) was recovered in a much smaller amount than molecule C (Fig. 4B). This ePCR technique may be also effective for the hairpin DNA with a few point mutations in the restriction site; for such DNA, the enzymatic digestion protocol does not work.

These techniques were applied to processing the pool of the literal strings, pool 0. The enzymatic digestion was performed first. The literal strings in the dsDNA form were biotinylated by amplification with the primers [either of which was biotinylated at its 5′ end (Fig. 3B)] and then bound on the beads for conversion into the single-stranded form by alkali treatment. Another objective of this immobilization was to prevent the intermolecular hybridization between the literal strings during the renaturing process that facilitates hairpin formation. The ssDNA molecules on the beads were digested twice with the restriction enzyme Bst NI. The molecules that remained undigested were recovered by PCR from the beads. The band corresponding to the undigested DNA appeared on the gel, together with a ladder of the bands (Fig. 3B). It is likely that, during PCR, the digested molecules hybridized with different sites on the undigested ones and were extended with them as the template, thus producing literal strings of various lengths (21). Only the full-length strings obtained here (pool 1) were purified and subjected to the sequence determination. No satisfying literal strings were found among 11 clones picked up randomly. Then, this destructive process was performed once more on pool 1, followed by the ePCR processing of 10 cycles, to generate pool 2; only one satisfying string was found among 16 clones. Instead of the second round of the destructive process, ePCR (20 cycles) (22) was directly performed on pool 1 to generate pool 3 (Fig. 3B), where we found six satisfying literal strings (five different ones) out of 37 sequenced clones (Table 2). All of these strings were identified with the correct unique solution.

Table 2

The satisfying literal strings obtained from pool 3. The numbers in the parentheses indicate the multiplicity of the occurrence of each string.

View this table:

The final result with pool 3 demonstrated the feasibility of the algorithm based on the intrinsic property of ssDNA to form hairpins, although the rate of hitting the real solution was much lower than that of the previous successful computations (6–8), which used common molecular biology techniques. Furthermore, the literal strings in pools 2 and 3 had accumulated point mutations around the restriction site, preventing further enrichment by digestion and weakening the effectiveness of ePCR. Increasing the fraction of satisfying strings in the final pool would require improvements in the fidelity of PCR or in the effectiveness of ePCR.

The instance solved here is comparable in size to the SAT problems solved in the previous computations (6–8); for example, Faulhammer and co-workers recently solved a nine-variable and five-clause instance (8). In these computations, each clause was examined by a few laboratory steps, whereas our computation processed all clauses simultaneously. In the previous computations, the long sequence of laboratory steps introduced many chances for errors to occur. The major drawback with our method is an inefficiency with respect to the required amount of DNA; the algorithms based on Lipton's method only require 2n or less molecules for encoding the value assignments with nvariables, whereas we generated 3m literal strings for m clauses, where m is four timesn for the hardest instances (23). This inefficiency comes from the necessity of a molecular representation of the given logical constraints.

For a scale-up of our method, we should also enumerate the value assignments as the literal strings comprising either of the two literals for each variable, which are then to be randomly concatenated with the literal strings that represent the given formula, being subjected to the hairpin-based computation. This modification could reduce the required DNA amount, together with use of the “breadth-first” search proposed for large SAT problems (6, 24), where our algorithm must be repeated by the number of the clauses, although several clauses can be processed simultaneously. Upon tackling large instances, PCR may introduce serious errors into computation because all sequences may not be amplified with the same efficiency (7). However, a variation in sequence was still retained after our computation including as many as 70 PCR cycles (25). Further study is necessary for unraveling the nature of the possible bias during PCR.

Another drawback with our method is the incompleteness of our understanding of the nature of hairpin molecules. The limit to the available length of hairpin remains to be determined; the available data suggest that an intact double helix of 30 base pairs is stable enough for closing a 2000-base hairpin loop (26–28). Sequence design or experimental conditions that ensure the hairpin formation and thus reduce the error rate also remain as the major issue for our algorithm. On the other hand, “negative” errors (loss of “solution” molecules) were successfully controlled, because the satisfying strings were finally obtained, despite their small population (0.04% of the total molecules) in the initial DNA pool. This approach for molecular computing, described on a theoretical basis (29) and then implied in some previous experiments (1, 3, 30), sheds a new light on the potential of DNA, not restricted to a carrier of information.

  • Present address: Yokoyama Cytologic Project, Exploratory Research for Advanced Technology, Japan Science and Technology Corporation, c/o RIKEN, Hirosawa, Wako-shi, Saitama 351-0198, Japan.

REFERENCES AND NOTES

View Abstract

Navigate This Article