# «Abstract. DNA self-assembly is emerging as a key paradigm for nano-technology, nano-computation, and several related disciplines. In nature, DNA ...»

Error Free Self-Assembly using Error Prone Tiles

Ho-Lin Chen ∗ Ashish Goel †

Stanford University Stanford University

Abstract. DNA self-assembly is emerging as a key paradigm for nano-technology, nano-computation,

and several related disciplines. In nature, DNA self-assembly is often equipped with explicit mechanisms

for both error prevention and error correction. For artiﬁcial self-assembly, these problems are even more

important since we are interested in assembling large systems with great precision.

We present an error-correction scheme, called snaked proof-reading, which can correct both growth and nucleation errors in a self-assembling system. This builds upon an earlier construction of Winfree and Bekbolatov [13], which could correct a limited class of growth errors. Like their construction, our system also replaces each tile in the system by a k × k block of tiles, and does not require changing the basic tile assembly model proposed by Rothemund and Winfree [10].

We perform a theoretical analysis of our system under fairly general assumptions, tiles can both attach and fall off depending on the thermodynamic rate parameters which also govern the error rate. We prove that with appropriate values of the block size, a seed row of N tiles can be extended into an N × N square of tiles without errors in expected time O(N ), and further, this square remains stable for an expected time of Ω(N ). This is the ﬁrst error-correction system for DNA self-assembly that has provably good assembly time (close to linear) and provable error-correction. The assembly time is the same, up to logarithmic factors, as the time for an irreversible, error-free assembly. We also did a preliminary simulation study of our scheme.

In simulations, our scheme performs much better (in terms of error-correction) than the earlier scheme of Winfree and Bekbolatov, and also much better than the unaltered tile system.

1 Introduction Self-assembly is the ubiquitous process by which objects autonomously assemble into complexes. Nature provides many examples: Atoms react to form molecules. Molecules react to form crystals and supramolecules. Cells sometimes coalesce to form organisms. It is widely believed that self-assembly will ultimately become an important technology, enabling the fabrication of great quantities of small complex objects such as computer circuits. DNA has emerged as an important component to use in artiﬁcial self-assembly of nano-scale systems due to its small size, its incredible versatility, and the precedent set by the abundant use of DNA self-assembly in nature. Accordingly, DNA self-assembly has received signiﬁcant attention over the last few years, both by practitioners [15, 17, 12, 13], and by theoreticians [7, 8, 14, 1, 9, 10, 2, 3, 6, 5, 4]. The theoretical results have focused on efﬁciently assembling structures of a controlled size (the canonical example being assembly of n × n squares) and shape. In this paper, we are interested in simultaneously achieving robustness and efﬁciency.

∗ Ho-Lin Chen is at the Department of Computer Science, Stanford University. Email: holin@stanford.edu. Research supported in part by NSF Award 0323766.

† Ashish Goel is at the Department of Management Science and Engineering and (by courtesy) Computer Science, Stanford University, Terman 311, Stanford CA 94305. Email: ashishg@stanford.edu. Research supported by NSF CAREER Award 0339262 and by NSF Award 0323766.

The Tile Assembly Model, originally proposed by Rothemund and Winfree [10], and later extended by Adleman et al. [2], provides a useful framework to study the efﬁciency (as opposed to robustness) of DNA self-assembly. In this model, a square tile is the basic unit of an assembly. Each tile has a glue on each side;

each glue has a label and a strength (typically 1 or 2). A tile can attach to a position in an existing assembly if at all the edges where this tile “abuts” the assembly, the glues on the tile and the assembly are the same, and the total strength of these glues is at least equal to a system parameter called the temperature (typically 2). Assembly starts from a single seed crystal and proceeds by repeated accretion of single tiles. The speed of an addition (and hence the time for the entire process to complete) is determined by the concentrations of different tiles in the system. Details are in Section 2.

Rothemund and Winfree [10] gave an elegant self-assembling system for constructing squares by selfassembly in this model. Their construction of n × n squares requires time Θ(n log n) and program size Θ(log n). Adleman et al. [2] presented a new construction for assembling n × n squares which uses optimal log n time Θ(n) and optimal program size Θ( log log n ). Both constructions ﬁrst assemble a roughly log n × n rectangle (at temperature 2) by simulating a binary counter, and then complete the rectangle into a square.

Later, Adleman et al. [3] studied several combinatorial optimization problems related to self-assembly.

Together, the above results are a comprehensive treatment of the efﬁciency of self-assembly, but they do not address robustness.

In nature, DNA self-assembly is often equipped with explicit mechanisms for both error prevention and error correction. For artiﬁcial self-assembly, these problems are even more important since we are interested in assembling large systems with great precision. In reality, several effects are observed which lead to a loss of robustness compared to the model. The assembly tends to be reversible, i.e., tiles can fall away from an existing assembly. Also, incorrect tiles sometimes get incorporated and locked into a growing assembly, much like defects in a crystal. However, for sophisticated combinatorial assemblies like counters, which form the basis for controlling the size of a structure, a single error can lead to assemblies drastically larger or smaller (or different in other ways) than the intended structure. Finally, the temperature of the system can be controlled only imperfectly. Experimental studies of algorithmic self-assembly have observed error rates of 1% to 10% [13].

The work towards robustness has focused so far on two broad approaches. The ﬁrst approach is to identify mechanisms used by nature for error-correction and error-prevention in DNA self-assembly and study how they can be leveraged in an algorithmic setting. One example of this approach is strand invasion [6].

The other approach is to design more combinatorial error-correction mechanisms. This is closest in spirit to the ﬁeld of coding theory. One example of this approach, due to Winfree and Bekbolatov [13], is proofreading tiles. They suggest replacing each tile in the original system with a k × k block. This provides some redundancy in the system (hence the loose analogy with coding theory). Their approach can correct growth errors, which result from an incorrect tile attaching at a correct location, i.e., a location where some other tile could have correctly attached. However, their approach does not reliably correct nucleation errors, which result from a tile (correct or incorrect) attaching at a site which is not yet active. Their proof-reading scheme is explained in section 3, along with the difference between growth and nucleation errors.

We present a modiﬁed proof-reading system which can correct both kinds of errors; we call it a snaked proof-reading system. Our scheme provably (under some mild assumptions) results in error-free assembly of an n × n square in time O(n) with high probability (whp). Further, our system results in the ﬁnal assembly remaining stable for an Ω(n) duration whp. Hence, there is a large window during which there is a high probability of ﬁnding complete assemblies. The best-possible assembly time for an n × n structure is linear even without errors and even in the irreversible model. Thus, our system guarantees close to optimum speed. To the best of our knowledge, this is the ﬁrst result which simultaneously achieves both robustness and efﬁciency.

Our snaked system is explained informally in section 3 using an illustrative example. We prove that the error-rate in this illustrative example is much better for our system than for that of Winfree and Bekbolatov.

We give a formal description of our system in section 4 and prove the properties of error-correction and efﬁciency. Section 4 also provides simulation evidence with both our illustrative example and the Sierpinski tile system [12]; in both cases, we demonstrate that our system resulted in a signiﬁcant reduction in errors.

Our analysis uses the thermodynamic model of Winfree [12]. We assume that the forward and reverse rates as well as the error-rates are governed by underlying thermodynamic parameters. We ﬁrst analyze the performance of k × k proof-reading blocks in terms of the error-rate and efﬁciency, and then let k grow to O(log n). Our O notation hides polynomials in log n. We believe that our analysis is slack, and can be signiﬁcantly improved in terms of the dependence on k. We make some simplifying assumptions to allow our proofs to go through; our analysis indicates that these assumptions are just an artifact of our analysis and not really necessary.

Our basic construction and analysis applies to all rectilinear tile systems (where growth happens from south to north and west to east). These systems include the Sierpinski tile system, the square-completion tile system, and the block cellular automata for simulating Turing machines. It also applies to counters, a basic primitive in many self-assembly constructions and computations, but we omit the discussion about counters from this paper.

2 Tile Assembly model The Combinatorial Tile Assembly Model The tile assembly model was originally proposed by Rothemund and Winfree[10, 2]. It extends the theoretical model of tiling by Wang [11] to include a mechanism for growth based on the physics of molecular self-assembly. Informally, each tile of an assembly is a square with glues of various types on each edge. Two tiles will stick to each other if they have compatible glues.

We will present a succinct deﬁnition, with minor modiﬁcations for ease of explanation.

A tile is an oriented unit square with the north, east, south and west edges labeled from some alphabet Σ of glues. For each tile t, the labels of its four edges are denoted σN (t), σE (t), σS (t), and σW (t). Sometimes we will describe a tile t as the quadruple (σN (t), σE (t), σS (t), σW (t)). Consider the triple T, g, τ where T is a ﬁnite set of tiles, τ ∈ Z0 is the temperature, and g is the glue strength function from Σ × Σ to Z≥0, where Σ is the set of glues. It is assumed that for all x, y ∈ Σ, (x = y) implies g(x, y) = 0 and there’s a glue null ∈ Σ, such that g(null, x) = 0 for all x ∈ Σ. A conﬁguration is a map from Z 2 to T empty.

Let C and D be two conﬁgurations. Suppose there exist some t ∈ T and some (x, y) ∈ Z 2 such that D = C except at (x, y), C(x, y) = null and D(x, y) = t. Let fN,C,t (x, y) = g(σN (t), σS (C(x, y + 1)).

Informally fN,C,t (x, y) is the strength of the bond at the north side of t under conﬁguration C. Deﬁne fS,C,t (x, y), fE,C,t (x, y) and fW,C,t (x, y) similarly. Then we say that tile t is attachable to C at position (x, y) iff fN,C,t (x, y) + fS,C,t (x, y) + fE,C,t (x, y) + fW,C,t (x, y) ≥ τ, and we write C →T D to denote the transition from C to D in attaching a tile to C at position (x, y). Informally, C → T D iff D can be obtained from C by adding a tile t such that the total strength of interaction between t and C is at least τ.

A tile system is a quadruple T = T, s, g, τ, where T, g, τ are as above and s ∈ T is a special tile called the “seed”. We deﬁne the notion of a derived supertile of a tile system T = T, s, g, τ recursively

**as follows:**

1. The conﬁguration Γ such that Γ(x, y) = empty except when (x, y) = (0, 0) and Γ(0, 0) = s is a derived supertile of T, and

2. if C →T D and C is a supertile of T, then D is also a derived supertile of T.

Informally, a derived supertile is either just the seed (condition 1 above), or obtained by legal addition of a single tile to another derived supertile (condition 2). We will often omit the word “derived” in the rest of the paper, and use the terms “seed supertile” or just “seed” or s to denote the special supertile in condition 1.