Research Article

Continuous genetic recording with self-targeting CRISPR-Cas in human cells


Science  09 Sep 2016:
Vol. 353, Issue 6304, aag0511
DOI: 10.1126/science.aag0511

Structured Abstract


Technologies that enable the longitudinal tracking and recording of molecular events into genomic DNA would be useful for the detailed monitoring of cellular state in artificial and native contexts. Although previous systems have been used to memorize digital information such as the presence or absence of biological signals, tools for recording analog information such as the duration or magnitude of biological activity in human cells are needed. Here, we present Mammalian Synthetic Cellular Recorders Integrating Biological Events (mSCRIBE), a memory system for storing analog biological information in the form of accumulating DNA mutations in human cells. mSCRIBE leverages self-targeting guide RNAs (stgRNAs) that are engineered to direct Streptococcus pyogenes Cas9 cleavage against DNA loci that encode the stgRNAs, thus accumulating mutations at stgRNA loci as a record of stgRNA or Cas9 expression.


The RNA-guided DNA endonuclease Cas9 introduces a double-stranded break in target DNA containing a 5′-NGG-3′ protospacer-adjacent motif (PAM) and homology to the specificity-determining sequence (SDS) of a small guide RNA (sgRNA). Once a double-strand break is introduced, the targeted DNA can be repaired via error-prone DNA repair mechanisms in human cells. We hypothesized that if a PAM sequence were introduced in the DNA locus encoding the sgRNA, the transcribed sgRNA would direct Cas9 to cleave its own encoding DNA, thus acting as a stgRNA. After error-prone repair, the mutagenized stgRNA locus should continue to be transcribed and enact additional rounds of continuous, self-targeted mutagenesis. Thus, the stgRNA locus should acquire mutations corresponding to the level of activity of the Cas9-stgRNA complex. We hypothesized that by linking the expression of stgRNA or Cas9 to biological events of interest, one could then record the duration and/or intensity of such events in the form of accumulated mutations at the stgRNA locus. The recorded information could be read by sequencing the stgRNA locus or by other related strategies.


We first built a stgRNA by engineering a sgRNA-encoding DNA locus to contain a 5′-NGG-3′ PAM immediately downstream of the SDS-encoding region. We then validated that the stgRNA could undergo multiple rounds of self-targeted mutagenesis by building a mutation-based toggling reporter system in which the progressive accumulation of mutations at the stgRNA locus is reported by individual cells toggling between green and red fluorescent protein expression. Next, we analyzed the sequence-evolution properties of stgRNAs in order to devise a sequence-based recording metric that conveys information on the duration and/or magnitude of stgRNA activity. We showed that computationally designed stgRNAs that contain longer SDSs of length 30, 40, and 70 nucleotides are able to accumulate mutations over longer durations of time. We demonstrated the analog nature of mSCRIBE by building a tumor necrosis factor–α (TNFα)–inducible Cas9 expression system and observing graded increases in the recording metric as a function of increasing TNFα concentration and/or duration of exposure in vitro. By designing doxycycline- and isopropyl-β-D-thiogalactoside (IPTG)–inducible stgRNA expression systems, we also showed inducible, multiplexed recording at two independent DNA loci. Last, we confirmed that human cells containing TNFα-responsive mSCRIBE units can record lipopolysaccharide (LPS)–induced acute inflammation events over time in mice.


We demonstrate that sgRNAs can be engineered to function as stgRNAs. By linking stgRNA or Cas9 expression to specific biological events of interest—such as the presence of small molecules, exposure to TNFα, or LPS-induced inflammation—we validated mSCRIBE as an analog memory device that records information about the duration and/or magnitude of biological events. Moreover, we demonstrated that multiple biological events can be simultaneously monitored by using independent stgRNA loci. We envision that this platform for genomically encoded memory in human cells should be broadly useful for studying biological systems and longitudinal and dynamic events in vitro and in situ, such as signaling pathways, gene regulatory networks, and tissue heterogeneity involved in development, healthy cell function, and disease pathogenesis.

Continuously evolving stgRNAs.

The Cas9-stgRNA complex cleaves the DNA locus from which the stgRNA is transcribed, leading to error-prone DNA repair. Multiple rounds of transcription and DNA cleavage can occur, resulting in progressive mutagenesis of the DNA encoding the stgRNA. The accumulation of mutations in the stgRNA locus provides a molecular record of cellular events that regulate stgRNA or Cas9 expression.


The ability to record molecular events in vivo would enable monitoring of signaling dynamics within cellular niches and critical factors that orchestrate cellular behavior. We present a self-contained analog memory device for longitudinal recording of molecular stimuli into DNA mutations in human cells. This device consists of a self-targeting guide RNA (stgRNA) that repeatedly directs Streptococcus pyogenes Cas9 nuclease activity toward the DNA that encodes the stgRNA, enabling localized, continuous DNA mutagenesis as a function of stgRNA expression. We demonstrate programmable and multiplexed memory storage in human cells triggered by exogenous inducers or inflammation, both in vitro and in vivo. This tool, Mammalian Synthetic Cellular Recorder Integrating Biological Events (mSCRIBE), provides a distinct strategy for investigating cell biology in vivo and enables continuous evolution of targeted DNA sequences.

Cellular behavior is dynamic, responsive, and regulated by the integration of multiple molecular signals. Biological memory devices that can record regulatory events would be useful tools for investigating cellular behavior over the course of a biological process and furthering our understanding of signaling dynamics within cellular niches. Earlier generations of biological memory devices relied on digital switching between two or multiple quasi-stable states based on active transcription and translation of proteins (1–3). However, such systems do not maintain their memory after the cells are disruptively harvested. Encoding transient cellular events into genomic DNA memory by using DNA recombinases enables the storage of heritable biological information even after gene regulation is disrupted (4, 5). The capacity and scalability of these memory devices are limited by the number of orthogonal regulatory elements (such as transcription factors and recombinases) that can reliably function together. Furthermore, because they are restricted to a small number of digital states, they cannot record dynamic (analog) biological information, such as the magnitude or duration of a cellular event. We recently demonstrated a population-based technology for genomically encoded analog memory in Escherichia coli based on dynamic genome editing with retrons (6). Here, we present Mammalian Synthetic Cellular Recorders Integrating Biological Events (mSCRIBE), an analog memory system that enables the recording of cellular events within human cell populations in the form of DNA mutations. mSCRIBE uses self-targeting guide RNAs (stgRNAs) that direct clustered regularly interspaced short palindromic repeats–associated (CRISPR-Cas) activity to repeatedly mutagenize the DNA loci that encode the stgRNAs (7). During the course of review of this work, systems with similar principles were proposed (8, 9). Although these systems use Cas9 to record information in DNA, they pursue different applications, such as lineage tracing and generating barcodes, to specifically tag multiple cells simultaneously. In contrast, we use our platform to build memory devices capable of recording analog biological activity into mammalian cells both in vitro and in vivo.

The Streptococcus pyogenes Cas9 system from the CRISPR-Cas family is an effective genome-engineering enzyme that catalyzes double-strand breaks and generates mutations at DNA loci targeted by a small guide RNA (sgRNA) (11–13). Normal sgRNAs are composed of a 20-nucleotide (nt) specificity determining sequence (SDS), which specifies the DNA sequence to be targeted and is immediately followed by an 80-nt scaffold sequence, which associates the sgRNA with Cas9. In addition to sequence homology with the SDS, targeted DNA sequences must possess a protospacer-adjacent motif (PAM) (5′-NGG-3′) immediately adjacent to their 3′-end in order to be bound by the Cas9-sgRNA complex and cleaved (14). When a double-strand break is introduced in the target DNA locus in the genome, the break is repaired through either homologous recombination (when a repair template is provided) or error-prone nonhomologous end joining (NHEJ) DNA repair mechanisms, resulting in mutagenesis of the targeted locus (11, 12). Even though the DNA locus encoding a normal sgRNA sequence is perfectly homologous to the sgRNA, it is not targeted by the standard Cas9-sgRNA complex because it does not contain a PAM.
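
The targeting rule described above (homology to the SDS plus a 5′-NGG-3′ PAM immediately 3′ of the matched sequence) can be illustrated with a short Python sketch. This is a minimal illustration and not code from the study: the function names and the toy sequences are hypothetical, only the strand shown is searched, and mismatch tolerance is ignored.

```python
def has_pam(dna: str, site_end: int) -> bool:
    """True if a 5'-NGG-3' PAM starts exactly at index site_end."""
    pam = dna[site_end:site_end + 3]
    return len(pam) == 3 and pam[1:] == "GG"


def is_cas9_target(dna: str, sds: str) -> bool:
    """True if dna contains the SDS sequence immediately followed by NGG.

    Simplification (assumption): exact matches on the given strand only.
    """
    start = dna.find(sds)
    while start != -1:
        if has_pam(dna, start + len(sds)):
            return True
        start = dna.find(sds, start + 1)
    return False


# Toy 20-nt SDS and two made-up downstream contexts (illustrative only).
SDS = "GATTACAGATTACAGATTAC"
locus_without_pam = SDS + "GTTTTAGAGC"  # SDS followed by GTT: no PAM
locus_with_pam = SDS + "GGGTTAGAGC"     # SDS followed by GGG: PAM present

print(is_cas9_target(locus_without_pam, SDS))
print(is_cas9_target(locus_with_pam, SDS))
```

In this toy model, the locus lacking the PAM is not recognized even though it matches the SDS perfectly, which mirrors why a normal sgRNA does not cut its own encoding DNA.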

To enable continuous encoding of population-level memory in human cells, we sought to build a modular memory unit that can be repeatedly written to generate new sequences and encode additional information over time. With the standard CRISPR-Cas system, once a genomic DNA target is repaired, resulting in a different DNA sequence, it is unlikely to be targeted again by the original sgRNA because the resulting DNA sequence and the sgRNA would lack the necessary sequence homology. We hypothesized that if the standard sgRNA architecture could be engineered so that it acted on the same DNA locus from which the sgRNA is transcribed, rather than a separate sequence elsewhere in the genome, this would yield a stgRNA that should repeatedly target and mutagenize the DNA that encodes it. To achieve this, we modified the DNA sequence from which the sgRNA is transcribed to include a 5′-NGG-3′ PAM immediately downstream of the region encoding the SDS so that the resulting PAM-modified stgRNA would direct Cas9 endonuclease activity toward the stgRNA’s own DNA locus. After a double-strand DNA break is introduced in the SDS-encoding region and repaired via the NHEJ repair pathway, the resulting de novo mutated stgRNA locus should continue to be transcribed as a mutated version of the original stgRNA and participate in another cycle of self-targeting mutagenesis. Multiple cycles of transcription followed by cleavage and error-prone repair should occur, resulting in a continuous, self-evolving Cas9-stgRNA system (Fig. 1A). We hypothesized that by biologically linking the activity of this system with regulatory events of interest, mSCRIBE can serve as a memory device that records information in the form of DNA mutations. We analyzed the sequence evolution dynamics of stgRNAs containing 20-, 30-, and 40-nucleotide SDSs and created a population-based recording metric that conveys information about the duration and/or intensity of stgRNA activity.
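
The difference between a standard sgRNA and a stgRNA can be made concrete with a toy simulation. The sketch below is an assumption-laden illustration, not the authors' model: the cut probability, the 80% deletion bias, the indel sizes, and the minimum functional SDS length (`min_sds`) are all invented parameters, and loss of the PAM is not modeled. The only fixed biological detail is that Cas9 cuts about 3 bp upstream of the PAM.

```python
import random


def evolve(sds, rounds, self_targeting, p_cut=0.5, min_sds=10, rng=None):
    """Toy model of repeated cleavage and error-prone repair.

    locus: SDS-encoding DNA directly upstream of an (always intact) PAM.
    guide: sequence of the guide RNA currently expressed.
    Returns the final locus and the number of editing events.
    """
    rng = rng or random.Random(0)
    guide, locus, edits = sds, sds, 0
    for _ in range(rounds):
        if self_targeting:
            guide = locus  # the mutated locus is re-transcribed
        targetable = guide == locus and len(locus) >= min_sds
        if not targetable or rng.random() >= p_cut:
            continue
        cut = len(locus) - 3  # cut site ~3 bp upstream of the PAM
        if rng.random() < 0.8:  # deletions favored (assumed bias)
            size = rng.randint(1, 3)
            locus = locus[:cut - size] + locus[cut:]
        else:  # single-base insertion at the cut site
            locus = locus[:cut] + rng.choice("ACGT") + locus[cut:]
        edits += 1
    return locus, edits


start = "GATTACAGATTACAGATTAC"  # hypothetical 20-nt SDS
for mode in (False, True):
    locus, edits = evolve(start, rounds=10, self_targeting=mode,
                          rng=random.Random(42))
    print("self-targeting" if mode else "standard sgRNA", edits, locus)
```

Under these assumptions the standard sgRNA can edit its target at most once, because the repaired site no longer matches the fixed guide, whereas the self-targeting locus keeps accumulating edits until its SDS becomes too short.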

Fig. 1 Continuously evolving stgRNAs.

(A) Schematic of the self-targeting CRISPR-Cas system. The Cas9-stgRNA complex cleaves the DNA from which the stgRNA is transcribed, leading to error-prone DNA repair. Multiple rounds of transcription and DNA cleavage can occur, resulting in continuous mutagenesis of the DNA encoding the stgRNA. The blue line in the stgRNA schematic represents the SDS, and mutations in the stgRNAs are illustrated as red marks. The accumulation of mutations in the stgRNA provides a molecular record of cellular events that regulate stgRNA or Cas9 expression. (B) Multiple variants of sgRNAs were built and tested for inducing mutations at the DNA loci that encoded them by using T7 E1 assays. Introducing a PAM into the DNA encoding the S. pyogenes sgRNA (black arrows) renders the sgRNA self-targeting, as evidenced by Cas9-dependent cleavage of PCR amplicons into two fragments (380 and 150 bp) in the mod2 sgRNA variant (referred to as stgRNA), based on T7 E1 assays. (C) Further analysis of the percentage of mutated sequences via Illumina MiSeq sequencing confirmed that stgRNA can effectively generate mutations at its own DNA locus (two biological replicates were performed). (D) The percentage of sequences containing specific mutation types (insertion or deletion) at individual base pair positions out of all mutated sequences. By aligning each of the Illumina MiSeq reads with the original unmutated stgRNA sequence, the base pair positions of insertions and deletions acquired by the stgRNA locus were calculated. (E) Computationally designed stgRNAs with longer SDS (30nt-1, 40nt-1, and 70nt-1) demonstrate self-targeting activity based on T7 E1 assays (fig. S1 and table S2, constructs 1 to 11).

Modifying a sgRNA-expressing DNA locus to include a PAM renders it self-targeting

We built multiple variants of a S. pyogenes sgRNA-encoding DNA sequence with a 5′-GGG-3′ PAM located immediately downstream of the region encoding the 20-nt SDS and tested them for their ability to generate mutations at their own DNA locus. Human embryonic kidney (HEK) 293T–derived stable cell lines were built to express either the wild-type (WT) or each of the variant sgRNAs shown in Fig. 1B (table S2, constructs 1 to 6, and Materials and methods). Plasmids encoding either spCas9 (table S2, construct 7) or monomeric yellow fluorescent protein (mYFP) (negative control) driven by the cytomegalovirus promoter (CMVp) were transfected into cells stably expressing the depicted sgRNAs, and the sgRNA loci were inspected for mutagenesis by using T7 endonuclease I (T7 E1) assays 4 days after transfection. A straightforward variant sgRNA (mod1) with guanine substitutions at the U23 and U24 positions did not exhibit any noticeable self-targeting activity. We speculated that this was due to the presence of bulky guanine and adenine residues facing each other in the stem region, resulting in a destabilized secondary structure. Thus, we encoded compensatory adenine-to-cytosine mutations within the stem region (A48 and A49 positions) of the mod2 sgRNA variant and observed robust mutagenesis at the modified sgRNA locus (Fig. 1B). Additional variant sgRNAs (mod3, mod4, and mod5) did not exhibit noticeable self-targeting activity. Thus, the mod2 sgRNA was hereafter referred to and used as the stgRNA architecture.

We further characterized the mutagenesis pattern of the stgRNA by sequencing the DNA locus encoding it. A HEK 293T cell line expressing the stgRNA was transfected with a plasmid expressing either Cas9 (table S2, construct 7) or mYFP driven by the CMV promoter. Genomic DNA was harvested from the cells at either 24 or 96 hours after transfection and subjected to targeted polymerase chain reaction (PCR) amplification of the region encoding the stgRNAs. The PCR amplicons were either sequenced with MiSeq or cloned into E. coli for Sanger sequencing of individual bacterial colonies (fig. S1). We found that cells transfected with the Cas9-expressing plasmid exhibited enhanced mutation frequencies in the stgRNA loci, and those frequencies increased over time, compared with cells transfected with the control mYFP-expressing plasmid (Fig. 1C). By using high-throughput sequencing, we inspected the mutated sequences generated by stgRNAs to determine the probability of insertions or deletions occurring at specific base pair positions. We calculated the percentage of those that contained insertions or deletions at each base pair position among all mutated sequences (Fig. 1D). We observed higher rates of deletions as compared with insertions at each nucleotide position. Moreover, an elevated percentage of mutated sequences exhibited deletions consecutively spanning nucleotide positions 13 to 17 for this specific stgRNA (20nt-1). We later carried out a more thorough analysis into the sequence evolution patterns of stgRNAs.

Given our observation that deletions are preferred over insertions, we suspected that stgRNAs would be shortened over time with repeated self-targeting activity, ultimately rendering them ineffective because of loss of the PAM or shortened SDS. To enable multiple cycles of self-targeting activity, we designed stgRNAs that are made up of longer SDSs. We initially built a cell line expressing a stgRNA containing a randomly chosen 30-nt SDS (table S2, construct 8) but did not detect noticeable self-targeting activity when the cell line was transfected with a plasmid expressing Cas9. We speculated that stgRNAs with longer than 20-nt SDSs might contain undesirable secondary structures that result in loss of activity. Therefore, we computationally designed stgRNAs that were predicted to maintain the scaffold fold of sgRNAs without undesirable secondary structures, such as stem loops and pseudoknots within the SDS (Materials and methods). Stable cell lines encoding stgRNAs containing these computationally designed 30-, 40-, and 70-nt SDS (table S2, constructs 9 to 11) were transfected with a plasmid expressing Cas9 driven by the CMV promoter. T7 E1 assays of PCR-amplified genomic DNA demonstrated robust indel formation in the respective stgRNA loci (Fig. 1E).

stgRNA-encoding loci undergo multiple rounds of self-targeted mutagenesis

We sought to demonstrate that the stgRNA-encoding DNA locus in individual cells undergoes multiple rounds of self-targeted mutagenesis. To track genomic mutations in single cells over time, we developed a mutation-based toggling reporter (MBTR) system that generates distinct fluorescence outputs based on indel sizes at the stgRNA-encoding locus, which was inspired by a design previously described for tracking DNA mutagenesis outcomes (15). Downstream of a CMV promoter and a canonical ATG start codon, we embedded the mutation detection region (MDR), which consists of a modified U6 promoter followed by a stgRNA locus. The MDR was immediately followed by out-of-frame green (GFP) and red (RFP) fluorescent proteins, which were separated by correspondingly out-of-frame “2A self-cleaving peptides” (P2A and T2A) (Fig. 2A and table S2, construct 13). Different reading frames are expected to be in-frame with the start codon, depending on the size of indels in the MDR. In the starting state (reading frame 1, F1), no fluorescence is expected. In reading frame 2 (F2), which corresponds to any –1 base pair (bp) frameshift mutation, an in-frame RFP is translated along with the T2A self-cleaving peptide, which enables release of the functional RFP from the upstream nonsense peptide. In reading frame 3 (F3), which corresponds to any –2 bp frameshift mutation, GFP is properly expressed downstream of an in-frame P2A and followed by a stop codon. We confirmed the functionality of this design by manually building constructs with stgRNA loci containing indels of various sizes (0 bp, –1 bp, and –2 bp corresponding to constructs 13, 14, and 15 in table S2, respectively) and introducing them into cells without Cas9. We observed the expected correspondence between indel sizes and fluorescence output (fig. S2).
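
The frame logic of the MBTR system can be summarized in a few lines of Python. This is a minimal sketch under simplifying assumptions: only the net indel size modulo 3 is considered, and effects such as premature stop codons inside the MDR or large deletions that disrupt the reporter are ignored. The function name `mbtr_output` is hypothetical.

```python
def mbtr_output(net_indel_bp: int) -> str:
    """Map the net indel size in the MDR to the expected fluorescence.

    Frame F1 (no net frameshift): no fluorescence.
    Frame F2 (net -1 bp frameshift): RFP in frame.
    Frame F3 (net -2 bp frameshift): GFP in frame.
    Insertions are folded into the same classes, e.g. +1 bp acts like -2 bp.
    """
    frame = (-net_indel_bp) % 3
    return {0: "none", 1: "RFP", 2: "GFP"}[frame]


# A cell that starts unmutated and acquires successive 1-bp deletions
# is expected to toggle from no signal to RFP to GFP and back to none.
net = 0
for step in range(4):
    print(net, mbtr_output(net))
    net -= 1
```

The loop shows why toggling between RFP and GFP implies more than one mutagenesis event at the same locus: a single indel fixes the frame, and only a further indel can move the cell to a different fluorescence state.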

Fig. 2 Tracking repetitive and continuous self-targeting activity at the stgRNA locus.

(A) Schematic of the MBTR system consisting of a stgRNA in the MDR or a regular sgRNA target sequence in the MDR. We illustrate the expected fluorescent readouts of the MBTR system based on different indel sizes in the MDR. Correct reading frames of each protein relative to the start codon are indicated in the superscript as F1, F2, and F3. (B) An outline illustrating the double-sorting experiment that tracks repetitive self-targeting activity by using the MBTR system (Materials and methods). (C) Microscopy analysis and (D) flow cytometry data before the first and second sorting of UBCp-Cas9 cells containing the self-targeting or non–self-targeting MBTR constructs. The white arrows in the microscope images indicate cells that expressed a fluorescent protein different from the one they were sorted for 7 days earlier. (E) The genomic DNA collected from sorted cells was amplified and cloned into E. coli; the resulting bacterial colonies were then Sanger sequenced (Materials and methods). A sample of Sanger sequences for the different sorted populations is presented along with their mutation type and the correct reading frame annotated. We observed a high correspondence between the mutated genotype and the observed fluorescent protein expression phenotype (figs. S2 and S3).

We subsequently used the MBTR system to assess changes in fluorescent gene expression within cells constitutively expressing Cas9 in order to track repeated mutagenesis at the stgRNA locus over time. We built a self-targeting MBTR construct containing a computationally designed 27-nt stgRNA driven by a modified U6 promoter embedded in the MDR (Fig. 2A and table S2, construct 13). As a control, we built a non–self-targeting MBTR construct with a regular sgRNA that targets an identical 27-bp DNA sequence embedded in the MDR (Fig. 2A and table S2, construct 16). We integrated the self-targeting or the non–self-targeting construct [via lentiviral transduction at multiplicity of infection (MOI) ~0.3 to ensure that most infected cells contained single copies] into the genome of clonally derived Cas9-expressing HEK 293T cells (hereafter called UBCp-Cas9 cells) and analyzed the cells by means of two rounds of fluorescence-activated cell sorting (FACS) based on RFP and GFP levels (Fig. 2B). In both cases, we found ~1 to 5% of the cells were RFP+/GFP– or RFP–/GFP+, which were sorted into Gen1:RFP and Gen1:GFP populations, respectively (Fig. 2, C and D), and <0.3% of cells expressed both GFP and RFP. We cultured the Gen1:RFP and Gen1:GFP cells for 7 days, resulting in Gen2R and Gen2G populations, respectively. We then subjected the Gen2R and Gen2G populations to a second round of FACS. For cells with the stgRNA MBTR, a subpopulation of Gen2R cells toggled into being GFP-positive, and similarly, a subpopulation of Gen2G cells toggled into being RFP-positive. In contrast, cells containing the non–self-targeting MBTR with a regular sgRNA maintained their original fluorescence signals with no appreciable toggling behavior observed with FACS analysis (Fig. 2, C and D). The toggling of fluorescence output observed in UBCp-Cas9 cells transduced with the stgRNA MBTR suggests that repeated mutagenesis, resulting in multiple frameshifts in the MDR, occurred at the stgRNA locus within single cells. We also observed a double-positive cell population in the self-targeting group, which we believe is most likely due to residual fluorescence from one fluorophore not being completely lost before the expression of the other fluorophore. To further corroborate this finding, we sequenced the stgRNA locus in individual cells from post-sorted populations in both rounds of sorting by cloning PCR amplicons into E. coli and performing Sanger sequencing on individual bacterial colonies (Fig. 2E and fig. S3A). We found strong correlations (77 to 100% accuracy) between the sequenced genotype and observed fluorescence phenotype in all of the sorted cell populations (fig. S3B). Together, these results confirmed that repetitive mutagenesis can occur at the stgRNA locus within single cells.

stgRNAs exhibit characteristic sequence evolution patterns

Having established that stgRNA loci are capable of undergoing multiple rounds of targeted mutagenesis, we set out to delineate their sequence evolution patterns over time. We hypothesized that we could infer characteristic properties associated with stgRNA sequence evolution by simultaneously investigating many independently evolving cell clones, all of which contain an exactly identical stgRNA sequence to start with (Fig. 3C). We synthesized barcoded plasmid DNA libraries in which the stgRNA sequence was maintained constant while a chemically randomized 16-bp barcode was placed immediately downstream of the stgRNA (Fig. 3A). Six separate DNA libraries were synthesized that encode stgRNAs containing six distinct SDSs of different lengths: 20nt-1, 20nt-2, 30nt-1, 30nt-2, 40nt-1, or 40nt-2 (table S2, constructs 19 to 24). We used a constitutively expressed blue fluorescent protein, EBFP2, to confirm a MOI of ~0.3 so that most of the infected cells should contain single-copy integrants.
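
The reasoning behind choosing a MOI of ~0.3 can be checked with a short calculation. The sketch below assumes that the number of integrations per cell follows a Poisson distribution with mean equal to the MOI, which is a common idealization and is not stated in the text; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k integrants (Poisson model)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def single_copy_fraction(moi: float) -> float:
    """Fraction of infected cells (one or more copies) that carry exactly one copy."""
    infected = 1.0 - poisson_pmf(0, moi)
    return poisson_pmf(1, moi) / infected


print(f"infected fraction at MOI 0.3: {1 - poisson_pmf(0, 0.3):.3f}")
print(f"single-copy among infected:   {single_copy_fraction(0.3):.3f}")
```

Under this idealized model, roughly 86% of infected cells carry a single integrant at MOI 0.3, consistent with the expectation that most barcoded loci evolve independently in separate cells.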

Fig. 3 stgRNA sequence evolution analysis.

(A) Schematic of the DNA construct used in building barcoded libraries encoding stgRNA loci. A randomized 16-bp barcode was placed immediately downstream of the stgRNA expression cassette in order to individually tag UBCp-Cas9 cells that contained integrated stgRNA loci. (B) The 16-day time course involved repeated sampling and passaging of cells in order to study sequence-evolution characteristics of stgRNA loci. (C) We lentivirally infected UBCp-Cas9 cells at a MOI ~0.3 so that the dominant population in infected cells contained single genomic copies of 16-bp-barcode–tagged stgRNA loci, which should be independently evolving. (D) The raw number of 16-bp barcodes that were associated with any particular 30nt-1 stgRNA sequence variant was plotted on the y axis for three different time points (day 2, day 6, and day 14). Each discrete, aligned sequence is identified by an integer index along the x axis. The starting stgRNA sequence is shown as index 1. (E) A transition probability matrix for the top 100 most frequent sequence variants of the 30nt-1 stgRNA. The color intensity at each (x, y) position in the matrix indicates the likelihood of the stgRNA sequence variant in each row (y) transitioning to a stgRNA sequence variant in each column (x) within the defined time scale (2 days). Because the non–self-targeting sequence variants (which contain mutations in the PAM) do not participate in self-targeting action, the y axis only consists of self-targeting stgRNA variants. The integer index of a stgRNA sequence variant is provided along with a graphical representation of the stgRNA sequence variant, in which a deletion is illustrated with a blank space, an insertion with a red box, and an unmutated base pair with a gray box. The PAM is shown in green. From left to right on the x axis and bottom to top on the y axis, the sequence variants are arranged in order of decreasing distance between the mutated region and the PAM. 
When the distances are the same, the sequence variants are arranged in order of increasing number of deletions. (F) The percent mutated stgRNA metric is plotted for each of the stgRNAs as a function of time. We observed a reasonably linear range of performance metric for stgRNAs, especially for the longer SDS containing 30nt-1, 30nt-2, 40nt-1, and 40nt-2 stgRNAs (figs. S4 to S7).

On day 0, lentiviral particles encoding each of the six stgRNA libraries were used to infect 200,000 UBCp-Cas9 cells in six separate wells of a 24-well plate. At a target MOI of 0.3, the infections resulted in ~60,000 successfully transduced cells per well. For each stgRNA library, eight cell samples were collected at time points spaced ~48 hours apart until day 16 (Fig. 3B). All samples from eight different time points across the six different libraries were pooled together and sequenced via NextSeq (Illumina, San Diego, CA). After aligning the next-generation sequencing reads to reference DNA sequences (Materials and methods), 16-bp barcodes that were observed across all the time points and the corresponding upstream stgRNA sequences were identified (fig. S4A). For each of the stgRNA libraries, we found >10⁴ distinct 16-bp barcoded loci that were observed across all of the eight time points (fig. S4B). The aligned stgRNA sequence variants were represented with words composed of a four-letter alphabet: At each base pair position, the stgRNA sequence was represented by either M, I, X, or D, which stand for match, insertion, mismatch, or deletion, respectively (fig. S4, C and D, and Materials and methods). We identified >1000 distinct sequence variants that were observed in any of the time points and any of the barcoded loci for each stgRNA (fig. S5A and table S1, stgRNA sequences). Although some sequence variants are found in common across the stgRNAs, the majority of the sequence variants are specific to each stgRNA.
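
The four-letter representation can be sketched as a small encoder that turns a gapped pairwise alignment into a word over M, I, X, and D. This is an illustrative sketch only: it emits one letter per alignment column, which is an assumption, and the exact convention used in the study (described in its Materials and methods) may differ. The function name and the toy alignment are hypothetical.

```python
def encode_alignment(ref_aln: str, read_aln: str) -> str:
    """Encode a gapped pairwise alignment as a word over {M, I, X, D}.

    ref_aln and read_aln are equal-length strings in which '-' marks a gap.
    A gap in the reference is an insertion (I), a gap in the read is a
    deletion (D), equal bases are a match (M), and unequal bases are a
    mismatch (X).
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned sequences must have equal length")
    word = []
    for ref_base, read_base in zip(ref_aln, read_aln):
        if ref_base == "-":
            word.append("I")
        elif read_base == "-":
            word.append("D")
        elif ref_base == read_base:
            word.append("M")
        else:
            word.append("X")
    return "".join(word)


# Toy alignment: one deletion, one insertion, and one mismatch.
print(encode_alignment("ACGT-ACGG", "AC-TTAGGG"))  # MMDMIMXMM
```

Representing each read as such a word makes it straightforward to count distinct sequence variants and to compare variants across barcoded loci and time points.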

We plotted the number of barcoded loci associated with each sequence variant derived from the original 30nt-1 stgRNA for three different time points (Fig. 3D). Although the majority of the barcoded loci contained the original unmutated stgRNA sequence (index 1) for all three time points, we observed that a sequence variant containing an insertion at base pair 29 (index 523) and another sequence variant containing insertions at base pairs 29 and 30 (index 740) gained major representation by day 14. We noticed that most of the barcoded stgRNA loci evolved into just a few major sequence variants and thus sought to determine whether these specific sequences would dominate across different experimental conditions. In fig. S5B, we present the top seven most abundant sequence variants of the 30nt-1 stgRNA observed in three different experiments discussed in this work. The three experiments were performed with the 30nt-1 stgRNA encoded and (i) tested in vitro in a HEK 293T–derived cell line (UBCp-Cas9), (ii) tested in vitro in a HEK 293T–derived cell line in which Cas9 was regulated by the nuclear factor–κB (NF-κB)–responsive promoter (inflammation-recording cells), or (iii) tested in vivo in inflammation-recording cells (Figs. 3F and 4, E and G, respectively). We found that six sequence variants (including indices 523 and 740) were represented in the top seven sequence variants for all three different experiments we performed with the 30nt-1 stgRNA. Moreover, even though we observed >1000 distinct sequence variants for 30nt-1 stgRNA (fig. S5A and table S1, stgRNA sequences), these top seven most abundant sequence variants constituted >85% of the total sequences represented in each of these experiments. Thus, we speculate that stgRNA activity can result in specific and consistent mutations. We also analyzed whether any of the stgRNA variants might contain direct homology to human genomic DNA. In fig. S5C, we present homology analysis for the top 100 most frequent 30nt-1 stgRNA variants. We found that only one of the top 100 stgRNA variants (35th most frequent variant) had perfect homology to genomic DNA (an intronic region), whereas most of the variants differed from the DNA by at least 2 bp in their SDS. Hence, the DNA locus encoding each stgRNA variant was the most likely targeted sequence for the majority of the 30nt-1 stgRNA variants.

Fig. 4 mSCRIBE as an analog memory device in vitro and in vivo.

(A) Schematic of multiplexed doxycycline and IPTG-inducible stgRNA cassettes within human cell populations. By introducing small-molecule–inducible stgRNA expression constructs into UBCp-Cas9 cells that also express TetR and LacI, the expression and self-targeting activity of each stgRNA can be independently regulated by doxycycline and IPTG, respectively. (B) mSCRIBE implements independently programmable, multiplexed genomic recording in human cells. Cleavage fragments observed from the T7 E1 assay of mSCRIBE units under independent regulation by doxycycline (Dox; 500 ng/mL) and IPTG (2 mM) are presented. (C) Constructs used to build a HEK 293T–derived clonal NF-κBp-Cas9 cell line that expresses Cas9 in response to NF-κB activation. The 30nt-1 stgRNA construct was placed on a lentiviral backbone that expresses EBFP2 constitutively and was introduced into NF-κBp-Cas9 cells via lentiviral infections at 0.3 MOI so as to build inflammation-recording cells. (D) T7 E1 assay testing for TNFα-inducible stgRNA activity in inflammation-recording cells in vitro. Inflammation-recording cells were grown either in the absence or presence of 1 ng/mL TNFα for 96 hours. (E) Graded increases in recording activity as a function of time and concentrations of TNFα demonstrate the analog nature of mSCRIBE. Inflammation-recording cells were grown in media containing different amounts of TNFα or no TNFα. Cell samples were collected at 36-hour intervals for each of the concentrations. Genomic DNA from the samples was PCR-amplified and sequenced via next-generation sequencing, and the percent mutated stgRNA metric was calculated. (F) Experimental outline for testing mSCRIBE in living mice. Inflammation-recording cells were implanted in the flank of three cohorts of four mice each. Three different cohorts of mice were treated with either no LPS, or with one or two doses of LPS on days 7 and 10. After harvesting the samples on day 13 and PCR-amplifying the genomic DNA followed by next-generation sequencing, the percent mutated stgRNA metric was calculated. (G) The percent mutated stgRNA metric calculated for the three cohorts of four mice is presented. The solid bars indicate the mean for each cohort (n = 4 mice in each condition), and the error bars indicate the SEM. mSCRIBE demonstrates increasing genomic recording activity with increasing doses of LPS in mice (figs. S8 to S10).

Given our observation that stgRNAs may have characteristic sequence evolution patterns, we sought to infer the likelihood of a stgRNA locus transitioning from any given sequence variant to another variant owing to self-targeted mutagenesis. We computed such likelihoods in the form of a transition probability matrix, which captures the probability of a sequence variant transitioning to any sequence variant within a given time frame (Fig. 3E, fig. S4, and Materials and methods). We found that self-targeting sequence variants were generally more likely to remain unchanged than be mutagenized across the 2-day time period, as indicated by high probabilities along the main diagonal (matrix elements where x = y), as annotated in fig. S6. In addition, transition probability values were found to be typically higher for sequence transitions below the main diagonal versus for those above the main diagonal, implying that sequence variants tend to progressively gain deletions (fig. S6). Moreover, when compared with deletion-containing sequence variants, insertion-containing sequence variants tended to have a very narrow set of sequence variants into which they were likely to mutagenize. Last, we noticed that the predominant way in which mutated self-targeting sequence variants mutagenize into non–self-targeting sequence variants is by losing the PAM and downstream region encoding the stgRNA handle while keeping the SDS-encoding region intact.

Having analyzed the sequence evolution characteristics of stgRNAs, we envisioned that a metric could be computed on the basis of the relative abundance of stgRNA sequence variants as a measure of stgRNA activity. Such a metric would enable the use of stgRNAs as intracellular recording devices in a population to store biologically relevant, time-dependent information that could be reliably interpreted after the events were recorded. From our analysis of stgRNA sequence evolution, we reasoned that novel self-targeting sequence variants at a given time point should have arisen from prior self-targeting sequence variants and not from non–self-targeting sequence variants. Thus, we calculated the percentage of sequences that contain mutations only in the SDS-encoding region among all the sequences that contain an intact PAM, which we call the percent mutated stgRNA, to serve as an indicator of stgRNA activity. In Fig. 3F, we plot the percent mutated stgRNA as a function of time for the six different stgRNAs. Except for the 20nt-2 stgRNA, which saturated to ~100% by 10 days, we observed nonsaturating and steadily increasing responses of the metric for all stgRNAs over the entire 16-day experimentation period. On the basis of the rate of increase of the percent mutated stgRNA (percent mutated stgRNA/time), stgRNAs encoding SDSs of longer length should have a greater capacity to maintain a steady increase in the recording metric for longer durations of time and thus should be more suitable for longer-term recording applications.
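As a rough illustration of how such a metric could be computed, the Python sketch below assumes that each read has already been aligned and encoded as a word over the letters M, I, X, and D (see Next-generation sequencing and alignment in Materials and methods), with the SDS-encoding positions first, followed by the 3-bp PAM and 4 bp of the stgRNA handle. The function name, the toy read counts, and the choice to require an unmutated handle in the numerator are assumptions made for illustration; this is not the routine used to generate Fig. 3F.

```python
from collections import Counter

PAM_LEN = 3      # bp of PAM following the SDS-encoding region
HANDLE_LEN = 4   # bp of stgRNA handle kept at the end of each word


def percent_mutated_stgrna(variant_counts, sds_len):
    """Percent of intact-PAM reads whose only mutations lie in the SDS region."""
    intact_pam = 0
    mutated_sds_only = 0
    for word, count in variant_counts.items():
        if len(word) != sds_len + PAM_LEN + HANDLE_LEN:
            raise ValueError(f"unexpected word length: {word!r}")
        sds = word[:sds_len]
        pam = word[sds_len:sds_len + PAM_LEN]
        handle = word[sds_len + PAM_LEN:]
        if pam != "M" * PAM_LEN:
            continue  # PAM lost: excluded from the denominator
        intact_pam += count
        # Assumption: "only in the SDS" also means the handle is unmutated.
        if sds != "M" * sds_len and handle == "M" * HANDLE_LEN:
            mutated_sds_only += count
    if intact_pam == 0:
        return 0.0
    return 100.0 * mutated_sds_only / intact_pam


# Toy read counts for a 20nt stgRNA (27-letter words); the numbers are made up.
toy_counts = Counter({
    "M" * 27: 60,                        # unmutated locus
    "M" * 15 + "D" * 5 + "M" * 7: 30,    # deletion inside the SDS, PAM intact
    "M" * 17 + "D" * 6 + "M" * 4: 10,    # deletion that removes the PAM
})
result = percent_mutated_stgrna(toy_counts, sds_len=20)
print(f"{result:.1f}% mutated stgRNA")
```

In this toy example, the 10 reads that lost the PAM are excluded from the denominator, so the metric is 30 of 90 intact-PAM reads.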

We also conducted a time course experiment with regular sgRNAs targeting a DNA target sequence so as to test their ability to serve as memory registers (fig. S7). We used sgRNAs encoding the same 20nt-1, 30nt-2, and 40nt-1 SDSs tested in Fig. 3F (table S2, constructs 25 to 27) and found that unlike stgRNA loci, sgRNA target loci quickly saturate the percent mutated sequence metric and exhibit restricted linear ranges.

Small-molecule inducible and multiplexed memory storage using mSCRIBE

We placed stgRNA loci under the control of small-molecule inducers in order to record chemical inputs into genomic memory registers. We designed doxycycline-inducible and isopropyl-β-d-thiogalactoside (IPTG)–inducible RNA polymerase III (RNAP III) promoters to express stgRNAs, similar to prior work with short hairpin RNAs (Fig. 4A) (16, 17). We engineered the RNAP III H1 promoter to contain a Tet-operator, allowing for tight repression of promoter activity in the presence of the TetR protein, which can be rapidly and efficiently relieved by the addition of doxycycline (table S2, construct 29). Similarly, we built an IPTG-inducible stgRNA locus by introducing three LacO sites into the RNAP III U6 promoter so that LacI can repress transcription of the stgRNA, which is relieved by the addition of IPTG (table S2, construct 30). We first verified that doxycycline and IPTG-inducible stgRNAs worked independently when integrated into the genome of UBCp-Cas9 cells that also express TetR and LacI (table S2, construct 28) (fig. S8). Next, we placed the doxycycline and IPTG-inducible stgRNA loci onto a single lentiviral backbone (Fig. 4A and table S2, construct 31) and integrated them into the genome of UBCp-Cas9 cells that also expressed TetR and LacI. The induction of stgRNA expression by exposure to doxycycline or IPTG led to efficient self-targeting mutagenesis at the cognate loci as detected with the T7 E1 assay, whereas lack of exposure to doxycycline or IPTG did not (Fig. 4B and Materials and methods). Moreover, when cells were exposed to both doxycycline and IPTG, we detected simultaneous mutation acquisition at both loci, thus demonstrating inducible and multiplexed molecular recording across the cell populations.

Recording the activation of the NF-κB pathway via mSCRIBE

We next sought to build stgRNA memory units that record signaling events in cells within live animals. We adapted a well-established acute inflammation model involving repetitive intraperitoneal injection of lipopolysaccharide (LPS) in mice (18). Immune cells that sense LPS release tumor necrosis factor α (TNFα), which is a potent activator of the NF-κB pathway (19). The activation of the NF-κB pathway plays an important role in coordinating responses to inflammation (20). To sense the activation of the NF-κB pathway, we built a construct containing a NF-κB–responsive promoter driving the expression of the RFP mKate2 (table S2, construct 32) and stably integrated it into HEK 293T cells. We observed a >50-fold increase in expression levels when these cells were exposed to TNFα in vitro (fig. S9, A, B, and C). Next, we implanted these cells into the flanks of athymic nude mice (female nu/nu). After implanted cells reached a palpable volume, we performed intraperitoneal injection of LPS and observed robust mKate2 expression (fig. S9D) and elevated TNFα concentrations in the serum after LPS injection (fig. S9E).

We then built a clonal HEK 293T cell line containing an NF-κB–induced Cas9 expression cassette (NF-κBp-Cas9 cells) and infected the cells with lentiviral particles encoding the 30nt-1 stgRNA at MOI ~0.3. These cells (hereafter referred to as inflammation-recording cells) accumulated stgRNA mutations, as detected with the T7 E1 assay, when induced with TNFα (Fig. 4D). We characterized the stgRNA memory unit in inflammation-recording cells by varying the concentration [within pathophysiologically relevant concentrations (fig. S9E) (21)] and duration of exposure to TNFα in vitro and determining the percent mutated stgRNA metric (Fig. 4E). We observed graded increases in the percent mutated stgRNA metric as a function of time, thus demonstrating that stgRNA-based memory can record temporal information on signaling events in human cells. Furthermore, higher TNFα concentrations resulted in cells that had higher values for the percent mutated stgRNA metric, indicating that signal magnitude can modulate the mSCRIBE memory register in an analog fashion.

Recording LPS-inducible inflammation in vivo via mSCRIBE

After characterizing the in vitro time and dosage sensitivity of our inflammation-recording cells, we implanted them into mice. The implanted mice were split into three cohorts: no LPS injection over 13 days, an LPS injection on day 7, and an LPS injection on day 7 followed by another LPS injection on day 10 (Fig. 4F). The genomic DNA of implanted cells was extracted from all cohorts on day 13. The stgRNA locus was PCR-amplified and sequenced via next-generation sequencing. We observed a direct correlation between the LPS dosage and the percent mutated stgRNA metric, with increasing numbers of LPS injections resulting in an increased percent mutated stgRNA metric (Fig. 4G). Our results indicate that stgRNA memory registers can be used in vivo to record physiologically relevant biological signals in an analog fashion.

While generating data for Figs. 3F and 4E, we used PCR to amplify the stgRNA loci from ~30,000 cells and then calculated the percent mutated stgRNA metric as a readout of genomic memory. However, access to tissues or biological samples could be limited in certain in vivo contexts. To investigate the sensitivity of our stgRNA-encoded memory when the input biological material is restricted, we sampled 1:100 dilutions of the genomic DNA extracted from the TNFα-treated inflammation-recording cells in Fig. 4E (which corresponds to ~300 cells) in triplicate, followed by PCR amplification, sequencing, and calculation of the percent mutated stgRNA metric (fig. S10). We found very little deviation in the percent mutated stgRNA metric between samples with ~300 cells and those with ~30,000 cells. We hypothesize that this tight correspondence is due to stgRNA evolution toward very few, dominating sequence variants, as was observed in Fig. 3D and fig. S5B.

Discussion and conclusions

In this Research Article, we describe an architecture for stgRNAs that can repeatedly direct Cas9 activity against the DNA loci that encode the stgRNAs. This technology enables the creation of self-contained genomic analog memory units in human cell populations. We show that stgRNAs can be engineered by introducing a PAM into the sgRNA sequence and with our MBTR system validate that mutations accumulate repeatedly in stgRNA-encoding loci over time. After characterizing the sequence evolution dynamics of stgRNAs, we derived a computational metric that can be used to map the extent of stgRNA mutagenesis in a cell population to the duration or magnitude of the recorded input signal. Our results demonstrate that the percent mutated stgRNA metric increases with the magnitude and duration of input signals, thus resulting in long-lasting analog memory stored in the genomic DNA of human cell populations.

Because the stgRNA loci can be multiplexed for memory storage and function in vivo, this approach for analog memory in human cells could be used to map dynamic and combinatorial sets of gene regulatory events without the need for continuous cell imaging or destructive sampling. For example, cellular recorders could be used to monitor the spatiotemporal heterogeneity of molecular stimuli that cancer cells are exposed to within tumor microenvironments (22), such as exposure to hypoxia, pro-inflammatory cytokines, and other soluble factors. One could also track the extent to which specific signaling pathways are activated during disease progression or development, such as the mitogen-activated protein kinase (MAPK), Wnt, Sonic Hedgehog (SHH), and TGF-β–regulated signaling pathways (23–26).

One limitation of our approach is that the NHEJ DNA repair mechanism is error-prone, so it is not easy to precisely control how each stgRNA cleavage event translates into a defined mutation, which could result in errors and noise in interpreting a given memory register. Ideally, each stgRNA cleavage event would result in a defined mutation, rather than a range of mutations. Among NHEJ repair mechanisms, recent studies have identified a more error-prone repair pathway, termed alternative NHEJ (aNHEJ). To enhance the controllability of mutations that arise over time, small-molecule inhibitors of aNHEJ components, including ligase III and PARP1, could be used (27, 28). The systematic engineering and characterization of a larger library of stgRNA sequences could also help to identify memory registers that are more efficient than the ones tested here.

Moreover, because our system generates a diverse set of stgRNA variants during the self-mutagenesis process, it is difficult to predict and eliminate potential off-target effects that may arise even if the original stgRNA can be designed for minimal off-target effects. As an alternative, we could fuse deactivated Cas9 (dCas9) to DNA cleavage domains such as single-chain FokI nucleases (29) so that dCas9 could be targeted to a specific DNA locus, with cleavage occurring away from the dCas9 binding site. This way, one can avoid generating variants of stgRNAs that might target other sites in the genome while repeated targeting of the DNA locus can occur at locations distal to the dCas9 binding site, hence serving as a continuous memory register. Alternatively, adopting the recently described “base-editing” strategy that uses cytidine deaminase (30) activity could help to avoid issues with using mutagenesis via DNA double-strand breaks for memory storage. Epigenetic strategies—for example, by fusing methyltransferases (31) or demethylases (32) to dCas9—could also be leveraged for continuous memory storage. Last, in addition to recording information, this technology could be used for lineage tracing in the context of organogenesis. Embryonic stem cells containing stgRNAs could be allowed to develop into a whole organism, and the resulting lineage relationships between multiple cell types could be delineated via in situ RNA sequencing (33). We show that mSCRIBE, enabled by self-targeting CRISPR-Cas, is useful for analog memory in mammalian cells. We anticipate that mSCRIBE will be applicable to a broad range of biological settings and should provide insights into signaling dynamics and regulatory events in cell populations within living animals.

Materials and methods

Vector construction

The vectors used in this study (table S2, construct 12) were constructed using standard molecular cloning techniques, including restriction enzyme digestion, ligation, PCR, and Gibson assembly. Custom oligonucleotides were purchased from Integrated DNA Technologies. The vector constructs were transformed into E. coli strain DH5α, and 50 μg/mL of carbenicillin (Teknova) was used to isolate colonies harboring the constructs. DNA was extracted and purified using Plasmid Mini or Midi Kits (Qiagen). Sequences of the vector constructs were verified using the DNA sequencing services of Genewiz and Quintara Bio. Sequences of all of the DNA constructs used in this work are listed in table S2, and their plasmid maps are available at

T7 Endonuclease I (T7 E1) assay and Sanger sequencing

Unless otherwise stated, cells used for T7 E1 assays were grown in 24-well plates with 200,000 cells per well. Genomic DNA from the respective cell lines containing the stgRNA or sgRNA loci was extracted using the QuickExtract DNA extraction solution (Epicentre). Genomic PCR was performed using the KAPA-HiFi polymerase (KAPA Biosystems) with the primers:

JP1710 – GCAGAGATCCAGTTTGGGGGGTTCCGCGCAC and JP1711 – CCCGGTAGAATTCCTCGACGTCTAATGCCAAC at 65°C for 30 s and 25 s/cycle extension at 72°C for 29 cycles. Purified PCR DNA was then used in the T7 Endonuclease I (T7 E1) assays. Specifically, 400 ng of PCR DNA was used per 20 μL T7 E1 reaction mixture (NEB Protocols, M0302). For Sanger sequencing, PCR amplicons from mutated genomic DNA were cloned into the KpnI/NheI sites of Construct 13 from previous work (34) and transformed into E. coli (DH5α, NEB). Single colonies of bacteria were Sanger sequenced using the Rolling Circle Amplification method (Genewiz, Inc).

Cell culture, transfections and lentiviral infections

Cell culture and transfections were performed as described earlier (34). HEK 293T cells (ATCC CRL-11268) were purchased from and authenticated by ATCC. Our cell lines tested negative for mycoplasma contamination at the Diagnostic Laboratory of the Division of Comparative Medicine at MIT. Lentiviruses were packaged using the FUGw backbone (2) (Addgene #25870) in HEK 293T cells. Filtered lentiviruses were used to infect the respective cell lines in the presence of polybrene (8 μg/mL). Successful lentiviral integration was confirmed by using lentiviral plasmid constructs constitutively expressing fluorescent proteins or antibiotic resistance genes to serve as infection markers.

Clonal cell lines and DNA constructs

A lentiviral plasmid construct expressing spCas9, codon optimized for expression in human cells and fused to the puromycin resistance gene with a P2A linker, was built from the taCas9 plasmid (34) (table S2, construct 12). The UBCp-Cas9 cell line was constructed by infecting early passage HEK 293T cells with high titer lentiviral particles encoding Construct 12 and selecting for clonal populations grown in the presence of puromycin (7 μg/mL). The NF-κBp-Cas9 cell line was built by infecting HEK 293T cells with high titer lentiviral particles encoding a NF-κB-responsive Cas9 expressing construct (table S2, construct 33). Transduced cells were induced with 1 ng/mL TNFα for three days followed by selection with 3 μg/mL puromycin. NF-κBp-Cas9 cells were then clonally isolated in the absence of TNFα. NF-κBp-Cas9 cells were infected with lentivirus particles encoding the 30nt-1 stgRNA locus at 0.3 multiplicity of infection (MOI) to build inflammation-recording cells. Cell lines used to test stgRNA activity were built by infecting HEK 293T cells with lentiviral particles encoding constructs 1 through 6 (table S2) and selecting for successfully transduced cells with 300 μg/mL hygromycin. The cell line used to test inducible and multiplexed recording with doxycycline and IPTG was built by infecting UBCp-Cas9 cells with lentiviral particles encoding a DNA construct that expresses TetR and LacI constitutively (table S2, construct 28) followed by selection with 200 μg/mL zeocin for seven days.

Design of longer stgRNAs

Longer stgRNAs were designed using the ViennaRNA package (36). Specifically, the RNAfold software was used to generate SDSs that retain the native structure of the guide RNA handle and no secondary structures in the SDS encoding region in the minimum free energy structure.
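The selection criterion can be expressed as a simple check on the dot-bracket string that RNAfold reports for the minimum free energy structure. The sketch below is a hypothetical filter, not the design script used in this work: it assumes that the structure string of the candidate stgRNA and the structure string of the handle folded on its own have already been obtained from RNAfold, and the example strings are toy placeholders rather than real folds.

```python
def passes_design_filter(mfe_structure, sds_len, native_handle_structure):
    """Return True if the SDS is unpaired and the handle fold is unchanged.

    mfe_structure: dot-bracket string for the full stgRNA (SDS then handle).
    sds_len: number of nucleotides in the SDS.
    native_handle_structure: dot-bracket string of the handle folded alone.
    """
    sds_part = mfe_structure[:sds_len]
    handle_part = mfe_structure[sds_len:]
    sds_unstructured = set(sds_part) <= {"."}          # no paired bases in the SDS
    handle_native = handle_part == native_handle_structure
    return sds_unstructured and handle_native


# Toy dot-bracket strings (placeholders, not real RNAfold output).
native_handle = "((((....))))..((...))"
good = "." * 30 + native_handle
hairpin_in_sds = "(((....)))" + "." * 20 + native_handle

print(passes_design_filter(good, 30, native_handle))            # True
print(passes_design_filter(hairpin_in_sds, 30, native_handle))  # False
```

A candidate SDS that base-pairs with the handle would also fail this check, because the handle portion of the structure string would no longer match the native fold.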

FACS and microscopy

Before analysis and sorting, cells were suspended in PBS with 2% fetal bovine serum. Cells were sorted using a Beckman Coulter MoFlo cell sorter. Flow cytometry analysis was performed with a Becton Dickinson LSRFortessa and FlowJo. Fluorescence microscopy images of cells were obtained by using Thermo Scientific’s EVOS cell imager. The cells were directly imaged from tissue culture plates.

Mutation-based toggling reporter (MBTR)-based cell sorting experiment

HEK 293T cells stably expressing Cas9 (UBCp-Cas9 cells) were infected with MBTR constructs at low titer (MOI = 0.3) so that most of the infected cells had a single copy of the construct. In the self-targeting scenario, a U6 promoter driven stgRNA with a 27 nt SDS is embedded between a constitutive human CMV promoter and modified GFP and RFP reporters. RNAP II mediated transcription starts upstream of the U6 promoter. Different sizes of indel formation at the stgRNA locus should result in different peptide sequences being translated. When translated in-frame, two “self-cleaving” 2A peptides, P2A and T2A, are designed to cause co-translational “cleavage” of the peptides and release functional fluorescent protein from the nonsense peptides, thus resulting in the appropriate fluorescent signal. The non-self-targeting construct consists of a U6 promoter driving expression of a regular sgRNA, which targets a sequence corresponding to the sgRNA embedded in the MBTR system as the MDR. Five days after the initial infection, generation 1 (Gen1) cells were sorted into RFP or GFP positive populations (Gen1:RFP and Gen1:GFP). The genomic DNA was extracted from a portion of the sorted cells. The rest of the sorted cells were allowed to grow to acquire further mutations at the stgRNA loci. The cells initially sorted for RFP or GFP fluorescence (Gen2R and Gen2G) were sorted again seven days after the first sort. The genomic DNA of the sorted cells (Gen2R:RFP, Gen2R:GFP, Gen2G:RFP and Gen2G:GFP) was collected, PCR amplified and Sanger sequenced after bacterial cloning. See Fig. 2 and fig. S3.
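The rationale for the low MOI can be illustrated with a short calculation. Assuming that the number of integrated constructs per cell follows a Poisson distribution with mean equal to the MOI (a common simplification that is an assumption here, not something measured in this experiment), the fraction of infected cells that carry exactly one copy can be estimated as follows.

```python
import math


def single_copy_fraction(moi):
    """Fraction of infected cells with exactly one integration (Poisson model)."""
    p_zero = math.exp(-moi)          # probability that a cell is uninfected
    p_one = moi * math.exp(-moi)     # probability of exactly one copy
    return p_one / (1.0 - p_zero)    # condition on at least one copy


print(round(single_copy_fraction(0.3), 3))  # 0.857
```

Under this model, roughly 86% of infected cells would carry a single copy at MOI = 0.3, and the fraction approaches 100% as the MOI is lowered further.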

Next-generation sequencing and alignment

Genomic DNA from respective cell lines was extracted using QuickExtract (Epicentre) and amplified using sequence-specific primers containing Illumina adapter sequences P5 – AATGATACGGCGACCACCGAGATCTACAC and P7 – CAAGCAGAAGACGGCATACGAGAT as primer overhangs. Multiple PCR samples were multiplexed together and sequenced on a single flow cell using 8 bp multiplexing barcodes incorporated via reverse primers. The barcode library stgRNA samples in Fig. 3 were split into two groups and sequenced on the NextSeq platform (resulting in 154 and 178 million reads), whereas the 20nt-1 stgRNA samples in Fig. 1, the regular sgRNA samples in fig. S7, the TNFα dosage and time course characterization samples in Fig. 4E, and the mouse tumor PCR samples in Fig. 4G were sequenced on the MiSeq platform (resulting in ~13 million reads per experiment). Paired end reads were assembled using the PEAR package (37). Optimal sequence alignment was performed by custom-written C++ code implementing the SS-2 algorithm (38) using affine gap costs with a gap opening penalty of 2.5 and a gap continuation penalty of 0.5 (see Code availability). The aligned sequences were represented using a four-letter alphabet in the “MIXD” format, where M represents a match, I represents an insertion, X represents a mismatch, and D represents a deletion. Each base-pair position of the aligned sequence is represented by one of these letters (fig. S4). Words of 27 letters were used to represent the 20nt stgRNA sequence variants, wherein the 27 letters correspond to the first 20 bp of the SDS-encoding region, followed by 3 bp of PAM and 4 bp representing the immediately adjacent 4 bp region encoding the stgRNA handle. Similarly, 37- and 47-letter words were used to represent the 30nt and 40nt stgRNA sequence variants.
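To illustrate alignment scoring with affine gap costs, the sketch below computes an optimal global alignment score with the standard three-matrix dynamic-programming recurrence. It is not the custom C++ implementation of SS-2 used for the analysis: the match and mismatch scores are placeholders, and charging the opening penalty for the first gapped base and the continuation penalty for each additional gapped base is an assumed convention.

```python
NEG_INF = float("-inf")


def affine_align_score(a, b, match=1.0, mismatch=-1.0,
                       gap_open=2.5, gap_cont=0.5):
    """Best global alignment score of a and b under affine gap costs."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap;
    # Y: b[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):            # leading gap in b
        X[i][0] = -(gap_open + (i - 1) * gap_cont)
    for j in range(1, m + 1):            # leading gap in a
        Y[0][j] = -(gap_open + (j - 1) * gap_cont)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] - gap_open,   # open a new gap
                          X[i - 1][j] - gap_cont,   # extend the current gap
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_cont,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


# Toy example: the read carries a 2-bp insertion relative to the reference.
reference = "ACGTACGT"
read = "ACGTTTACGT"
print(affine_align_score(read, reference))  # 5.0
```

With these placeholder scores, the toy read aligns with eight matches and a single 2-bp gap (8 - 2.5 - 0.5 = 5.0), which scores higher than two separate 1-bp gaps. This is the behavior that affine costs are intended to favor for indels produced by a single repair event.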

Barcoded stgRNA sequence evolution and transition probabilities

After sequence alignment, 16 bp barcodes and the stgRNA sequence variants (in the MIXD format) were extracted. Only the 16 bp barcodes that were represented in all of the time points were considered for further analysis. We employed the well-established Discrete Time Markov Chain (DTMC) analysis to model stgRNA sequence evolution. Each unique stgRNA sequence variant was considered to represent a “state,” and the list of stgRNA sequence variants belonging to the same 16 bp barcode at consecutive time points was considered to comprise a DTMC. A maximum likelihood estimate of the transition probabilities was then computed. Specifically, all possible pairwise combinations of sequence variants associated with the same barcode at consecutive time points were evaluated for a “parent-daughter” association. For every sequence variant in a future time point (a daughter), the sequence variant with the same barcode in the immediately preceding time point that had the minimum Hamming distance to the daughter sequence variant was assigned as the parent. Since the presence of an intact PAM is an absolute requirement for the self-targeting capability of stgRNAs, only the sequence variants that contained an intact PAM were considered as potential parents. Many parent-daughter associations were computed across all the barcodes and time points, resulting in an overall count for each specific parent-daughter association. Finally, the counts were normalized such that the total likelihood of transitioning from each parent to all possible daughters would sum to one. The Hamming distance metric between two sequence variants in the MIXD format was calculated by assigning a distance score for each base pair position. Specifically, if only one of the sequence variants being compared had an insertion at a particular base pair position, then that position was assigned a score of 2. In all other cases, the score at a base pair position was assigned 0 if the sequence variant letters were identical and 1 if they were not identical. The scores for each base pair position were summed and used as the Hamming distance metric between the two sequence variants. Finally, while assigning parent-daughter associations, unless the parent and the daughter sequence variants were exactly identical, sequence variants that contain mutations in the PAM were not considered as potential parents. The implementation of the above algorithm using a specific barcoded locus is presented in fig. S4. See Fig. 3E.
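The procedure described above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the C++ routines used for the analysis: the function names and the toy barcodes are hypothetical, each time point is represented simply as a list of the variants observed for that barcode, and ties in distance are broken by taking the first candidate encountered.

```python
from collections import Counter, defaultdict


def mixd_distance(a, b):
    """Position-wise distance between two equal-length MIXD words."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    total = 0
    for x, y in zip(a, b):
        if (x == "I") != (y == "I"):
            total += 2          # insertion present in only one variant
        elif x != y:
            total += 1          # any other disagreement
    return total


def has_intact_pam(word, sds_len, pam_len=3):
    return word[sds_len:sds_len + pam_len] == "M" * pam_len


def assign_parent(daughter, candidates, sds_len):
    """Closest eligible variant from the immediately preceding time point."""
    eligible = [c for c in candidates
                if c == daughter or has_intact_pam(c, sds_len)]
    if not eligible:
        return None
    return min(eligible, key=lambda c: mixd_distance(c, daughter))


def transition_probabilities(series_by_barcode, sds_len):
    """Count parent->daughter pairs, then normalize each parent's counts to sum to 1."""
    counts = defaultdict(Counter)
    for series in series_by_barcode.values():
        for earlier, later in zip(series, series[1:]):
            for daughter in later:
                parent = assign_parent(daughter, earlier, sds_len)
                if parent is not None:
                    counts[parent][daughter] += 1
    return {p: {d: n / sum(row.values()) for d, n in row.items()}
            for p, row in counts.items()}


# Toy data: two hypothetical barcodes, three time points, 20nt stgRNA words.
wt = "M" * 27
del1 = "M" * 15 + "D" * 3 + "M" * 9
del2 = "M" * 13 + "D" * 5 + "M" * 9
toy = {
    "barcode_1": [[wt], [wt, del1], [del1, del2]],
    "barcode_2": [[wt], [wt], [wt, del1]],
}
probs = transition_probabilities(toy, sds_len=20)
print(probs[wt][wt], probs[wt][del1])
```

In the toy data, the larger deletion (del2) is assigned to the smaller deletion (del1) as its parent rather than to the unmutated sequence, because the two deletion variants differ at only two positions.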

While designing an mSCRIBE memory device, it is important to keep in mind that stgRNA sequence evolution in its current implementation relies on an undirected phenomenon that can involve potential sources of bias. Over time, newly generated stgRNAs could become inactive owing to severe shortening of their SDS, acquisition of mutations that modify the downstream S. pyogenes scaffold required for recognition by Cas9, introduction of runs of ‘T’ residues that terminate RNA Pol III transcription, or homologous repair from the sister chromatid that results in complete loss of the stgRNA locus. There could also be unanticipated off-target effects because of newly formed stgRNAs targeting sites elsewhere in the genome. However, as we have observed with the stgRNA sequences used in this work, stgRNAs tend to progressively gain deletions, and hence we believe one can minimize such unanticipated effects by designing stgRNAs that are maximally orthogonal to genomic DNA.

Small-molecule-inducible and multiplexed memory storage

We first built a cell line expressing TetR and LacI by infecting UBCp-Cas9 cells with construct 28 (table S2). This cell line was then infected with lentiviral particles encoding the inducible stgRNA cassettes (table S2, constructs 29 to 31), and the cells were grown either in the presence or absence of 500 ng/mL doxycycline and/or 2 mM IPTG. The cells were harvested 96 hours post induction, and PCR-amplified genomic DNA was subjected to T7 E1 assays. See Fig. 4, A and B.

In vivo inflammation model

Four- to six-week-old female athymic nude mice (strain nu/nu) were obtained from the rodent breeding colony at Charles River Laboratory. They were specific pathogen free and maintained on sterilized water and animal food. All animals were maintained and used in accordance with the guidelines of the Institutional Animal Care and Use Committee. Sample sizes were estimated from in vivo pilot studies and in vitro studies of the expected variance between animals and assay sensitivity (32). Inflammation-recording cells were suspended in Matrigel (Corning, NY) in a 1:1 ratio with cell growth media. 2 × 10^6 cells were implanted subcutaneously in the flank regions of mice. Animals were randomly assigned into experimental groups after tumor implantation with matched tumor sizes. Where indicated, mice were injected intraperitoneally with lipopolysaccharide (LPS) (from Escherichia coli serotype O111:B4, prepared from a sterile ready-made solution from Sigma Chemical Co., St. Louis, MO) dissolved in 0.1 mL saline solution. Animal studies were conducted without blinding. The exclusion and inclusion criteria of the animal study were pre-established. Animals with tumors that grew to more than 10 mm in their largest diameter during the experimental period were sacrificed and excluded from the study. See Fig. 4, F and G.

Code availability

Relevant C++ routines used for data analysis can be found at

Supplementary Materials

Figs. S1 to S10

Tables S1 and S2


References and Notes

Acknowledgments: The plasmid constructs mentioned in table S2 are available from Addgene via their standard materials transfer agreement. T.K.L., S.D.P., and C.H.C. are inventors on a U.S. patent application (PCT/US2016/032348) submitted by MIT that covers the self-targeting genome editing system. We thank members of the Lu laboratory for helpful discussions. We thank the MIT MicroBioCenter for technical support with next-generation sequencing and the MIT Koch Institute flow cytometry core facility for their technical assistance in cell sorting. This work was supported by the National Institutes of Health (grants DP2 OD008435 and P50 GM098792), the Office of Naval Research (grant N00014-13-1-0424), the National Science Foundation (grant MCB-1350625), the Defense Advanced Research Projects Agency, The Center for Microbiome Informatics and Therapeutics, and NSF Expeditions in Computing Program Award 1522074. C.H.C. was supported by a Natural Sciences and Engineering Research Council of Canada postgraduate fellowship. S.D.P., C.H.C., and T.K.L. conceived the work. S.D.P. and C.H.C. designed and performed experiments. S.D.P. performed computational analyses on next-generation sequencing data. C.H.C. conducted in vivo animal studies. S.D.P., C.H.C., and T.K.L. designed the experiments and interpreted and analyzed the data. S.D.P., C.H.C., and T.K.L. wrote the paper. Sequences of all of the DNA constructs used in this work are listed in table S2, and their plasmid maps and C++ routines are available at