Genetic Screens in Human Cells Using the CRISPR-Cas9 System

Science  03 Jan 2014:
Vol. 343, Issue 6166, pp. 80-84
DOI: 10.1126/science.1246981

Improving Whole-Genome Screens

Improved methods are needed for the knockout of individual genes in genome-scale functional screens. Wang et al. (p. 80, published online 12 December) and Shalem et al. (p. 84, published online 12 December) used the bacterial CRISPR/Cas9 system to power-screen protocols that avoid several of the pitfalls associated with small interfering RNA (siRNA) screens. Genome editing by these methods completely disrupts target genes, thus avoiding weak signals that can occur when transcript abundance is partially decreased by siRNA. Furthermore, gene targeting by the CRISPR system is more precise and appears to produce substantially fewer off-target effects than existing methods.


The bacterial clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 system for genome editing has greatly expanded the toolbox for mammalian genetics, enabling the rapid generation of isogenic cell lines and mice with modified alleles. Here, we describe a pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single-guide RNA (sgRNA) library. sgRNA expression cassettes were stably integrated into the genome, which enabled a complex mutant pool to be tracked by massively parallel sequencing. We used a library containing 73,000 sgRNAs to generate knockout collections and performed screens in two human cell lines. A screen for resistance to the nucleotide analog 6-thioguanine identified all expected members of the DNA mismatch repair pathway, whereas another for the DNA topoisomerase II (TOP2A) poison etoposide identified TOP2A, as expected, and also cyclin-dependent kinase 6, CDK6. A negative selection screen for essential genes identified numerous gene sets corresponding to fundamental processes. Last, we show that sgRNA efficiency is associated with specific sequence motifs, enabling the prediction of more effective sgRNAs. Collectively, these results establish Cas9/sgRNA screens as a powerful tool for systematic genetic analysis in mammalian cells.

A critical need in biology is the ability to efficiently identify the set of genes underlying a cellular process. In microorganisms, powerful methods allow systematic loss-of-function genetic screening (1, 2). In mammalian cells, however, current screening methods fall short, primarily because of the difficulty of inactivating both copies of a gene in a diploid mammalian cell. Insertional mutagenesis screens in cell lines that are near-haploid or carry Blm mutations, which cause frequent somatic crossing-over, have proven powerful but are not applicable to most cell lines and suffer from integration biases of the insertion vectors (3, 4). The primary solution has been to target mRNAs with RNA interference (RNAi) (5–9). However, this approach is also imperfect because it only partially suppresses target gene levels and can have off-target effects on other mRNAs, resulting in false negative and false positive results (10–12). Thus, there remains an unmet need for an efficient, large-scale, loss-of-function screening method in mammalian cells.

Recently, the clustered regularly interspaced short palindromic repeats (CRISPR) pathway, which functions as an adaptive immune system in bacteria (13), has been co-opted to engineer mammalian genomes in an efficient manner (14–16). In this two-component system, a single-guide RNA (sgRNA) directs the Cas9 nuclease to cause double-stranded cleavage of matching target DNA sequences (17). In contrast to previous genome-editing techniques, such as zinc-finger nucleases and transcription activator-like effector nucleases (TALENs), the target specificity of CRISPR-Cas9 is dictated by a 20–base pair (bp) sequence at the 5′ end of the sgRNA, allowing for much greater ease of construction of knockout reagents. Mutant cell lines and mice bearing multiple modified alleles can be generated with this technology (18, 19).

We set out to explore the feasibility of using the CRISPR-Cas9 system to perform large-scale, loss-of-function screens in mammalian cells. The idea was to use a pool of sgRNA-expressing lentivirus to generate a library of knockout cells that could be screened under both positive and negative selection. Each sgRNA would serve as a distinct DNA barcode that can be used to count the number of cells carrying it by using high-throughput sequencing (Fig. 1A). Pooled screening requires that single-copy sgRNA integrants are sufficient to induce efficient cleavage of both copies of a targeted locus. This contrasts with the high expression of sgRNAs achieved through transfection that is typically used to engineer a specific genomic change by using the CRISPR-Cas9 system.
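
The barcode-counting idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: each read is assumed to have been trimmed to the 20-nt spacer already, and the function name `count_sgrna_reads`, the sgRNA identifiers, and the sequences are all hypothetical. In keeping with the definition given for Fig. 2C, only reads that perfectly match a library sequence are counted.

```python
from collections import Counter

def count_sgrna_reads(reads, library):
    """Count reads that perfectly match a library spacer sequence.

    reads   -- iterable of spacer-length read strings (assumed pre-trimmed)
    library -- dict mapping spacer sequence -> sgRNA identifier
    """
    # Start every sgRNA at zero so undetected constructs stay visible.
    counts = Counter({sgrna_id: 0 for sgrna_id in library.values()})
    for read in reads:
        sgrna_id = library.get(read.upper())
        if sgrna_id is not None:  # reads with any mismatch are ignored
            counts[sgrna_id] += 1
    return counts

# Hypothetical two-member library and a handful of reads.
library = {
    "ACGTACGTACGTACGTACGT": "sgGENE1_1",
    "TTGCATTGCATTGCATTGCA": "sgGENE2_1",
}
reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGT",
    "TTGCATTGCATTGCATTGCA",
    "ACGTACGTACGTACGTACGA",  # one mismatch: not counted
]
counts = count_sgrna_reads(reads, library)
```

A dictionary lookup keeps the cost per read constant, which matters when a pool of tens of thousands of sgRNAs is sequenced deeply.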

Fig. 1 A pooled approach for genetic screening in mammalian cells by using a lentiviral CRISPR-Cas9 system.

(A) Outline of sgRNA library construction and genetic screening strategy. (B) Immunoblot analysis of WT KBM7 cells and KBM7 cells transduced with a doxycycline-inducible FLAG-Cas9 construct upon doxycycline induction. S6K1 was used as a loading control. (C) Sufficiency of single-copy sgRNAs to induce genomic cleavage. Cas9-expressing KBM7 cells were transduced with AAVS1-targeting sgRNA lentivirus at low MOI. The SURVEYOR mutation detection assay was performed on cells at the indicated days post-infection (dpi). Briefly, mutations resulting from cleavage of the AAVS1 locus were detected through polymerase chain reaction (PCR) amplification of a 500-bp amplicon flanking the target sequence, re-annealing of the PCR product, and selective digestion of mismatched heteroduplex fragments. (D) Characterization of mutations induced by CRISPR-Cas9 as analyzed with high-throughput sequencing. (E) sgRNA library design pipeline. (F) Example of sgRNAs designed for PSMA4. sgRNAs targeting constitutive exonic coding sequences nearest to the start codon were chosen for construction. (G) Composition of genome-scale sgRNA library.

We first tested the concept in the near-haploid, human KBM7 CML cell line by creating a clonal derivative expressing the Cas9 nuclease (with a FLAG-tag at its N terminus) under a doxycycline-inducible promoter (Fig. 1B). Transduction of these cells at low multiplicity of infection (MOI) with a lentivirus expressing a sgRNA targeting the endogenous AAVS1 locus revealed substantial cleavage at the AAVS1 locus 48 hours after infection (Fig. 1C). Moreover, because the sgRNA was stably expressed, genomic cleavage continued to increase over the course of the experiment. Deep sequencing of the locus revealed that repair of Cas9-induced double-strand breaks resulted in small deletions (<20 bp) in the target sequence, with tiny insertions or substitutions (<3 bp) occurring at a lower frequency (Fig. 1D). The vast majority of the lesions, occurring in a protein-coding region, would be predicted to give rise to a nonfunctional protein product, indicating that CRISPR-Cas9 is an efficient means of generating loss-of-function alleles.

We also analyzed off-target activity of CRISPR-Cas9. Although the specificity of CRISPR-Cas9 has been extensively characterized in transfection-based settings (20–22), we wanted to examine its off-target behavior in our system, in which Cas9 and a sgRNA targeting AAVS1 (sgAAVS1) were stably expressed for 2 weeks. We compared the level of cleavage observed at the target locus (97%) with levels at 13 potential off-target cleavage sites in the genome (defined as sites differing by up to 3 bp from sgAAVS1) (fig. S1A). Minimal cleavage (<2.5%) was observed at all sites, with one exception, which was the only site that had perfect complementarity in the “seed” region (terminal 8 bp) (fig. S1B). On average, sgRNAs have ~2.2 such sites in the genome, almost always (as in this case) occurring in noncoding DNA and thus less likely to affect gene function (supplementary text S1).
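
The two criteria used here (up to 3 mismatches overall, and perfect complementarity in the 8-bp seed) can be expressed as a short check. The sketch below is illustrative only: the function names and sequences are hypothetical, and it assumes that the "terminal 8 bp" refers to the 3′ end of the 20-nt target sequence, which the text does not state explicitly.

```python
def mismatch_positions(spacer, site):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have the same length")
    return [i for i, (a, b) in enumerate(zip(spacer.upper(), site.upper())) if a != b]

def classify_site(spacer, site, max_mismatches=3, seed_length=8):
    """Describe a genomic site relative to an sgRNA target sequence.

    Assumption: the seed is the last `seed_length` bases (3' end) of the spacer.
    """
    mismatches = mismatch_positions(spacer, site)
    seed_start = len(spacer) - seed_length
    return {
        "mismatches": len(mismatches),
        # A potential off-target site differs by 1 to max_mismatches bases.
        "candidate": 0 < len(mismatches) <= max_mismatches,
        # The seed is intact when every mismatch lies outside it.
        "seed_intact": all(pos < seed_start for pos in mismatches),
    }

# Hypothetical target and two hypothetical genomic sites.
spacer = "ACGTACGTACGTACGTACGT"
site_a = classify_site(spacer, "TCGAACGTACGTACGTACGT")  # mismatches at positions 0 and 3
site_b = classify_site(spacer, "ACGTACGTACGTACGTACCT")  # mismatch at position 18
```

Under these assumptions, `site_a` would be the kind of site flagged as higher risk (seed intact), whereas `site_b` carries its mismatch inside the seed.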

To test the ability to simultaneously screen tens of thousands of sgRNAs, we designed a sgRNA library with 73,151 members, consisting of multiple sgRNAs targeting 7114 genes and 100 nontargeting controls (Fig. 1E, table S1, and supplementary materials, materials and methods). sgRNAs were designed against constitutive coding exons near the beginning of each gene and filtered for potential off-target effects based on sequence similarity to the rest of the human genome (Fig. 1, F and G). The library included 10 sgRNAs for each of 7031 genes and all possible sgRNAs for each of the 83 genes encoding ribosomal proteins (Fig. 1G). To assess the effective representation of our microarray-synthesized library, we sequenced sgRNA barcodes from KBM7 cells 24 hours after infection with the entire lentiviral pool and were able to detect the overwhelming majority (>99%) of our sgRNAs, with high uniformity across constructs (only a sixfold increase in abundance between the 10th and 90th percentiles) (fig. S2A).
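
The two representation metrics quoted above (fraction of sgRNAs detected, and the ratio of abundances at the 90th and 10th percentiles) are simple to compute from a table of read counts. The following is a minimal sketch with made-up counts; the function names and the nearest-rank percentile convention are assumptions, since the text does not specify how the percentiles were calculated.

```python
def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer 1-100."""
    n = len(sorted_values)
    rank = max(1, (pct * n + 99) // 100)  # integer ceiling of pct * n / 100
    return sorted_values[rank - 1]

def library_representation(counts):
    """Return (fraction of sgRNAs detected, 90th/10th percentile count ratio)."""
    values = sorted(counts)
    detected = sum(1 for v in values if v > 0) / len(values)
    p10 = nearest_rank_percentile(values, 10)
    p90 = nearest_rank_percentile(values, 90)
    skew = p90 / p10 if p10 > 0 else float("inf")
    return detected, skew

# Hypothetical read counts for a ten-member pool.
example_counts = [40, 50, 80, 100, 120, 150, 200, 250, 300, 400]
detected, skew = library_representation(example_counts)
```

A ratio close to 1 indicates a uniform pool; the sixfold figure reported above corresponds to `skew` being about 6.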

As an initial test of our approach, we screened the library for genes that function in DNA mismatch repair (MMR). In the presence of the nucleotide analog 6-thioguanine (6-TG), MMR-proficient cells are unable to repair 6-TG–induced lesions and arrest at the G2-M cell-cycle checkpoint, whereas MMR-defective cells do not recognize the lesions and continue to divide (23). We infected Cas9-KBM7 cells with the entire sgRNA library, cultured the cells in a concentration of 6-TG that is lethal to wild-type (WT) KBM7 cells, and sequenced the sgRNA barcodes in the final population. sgRNAs targeting the genes encoding the four components of the MMR pathway (MSH2, MSH6, MLH1, and PMS2) (24) were dramatically enriched in the 6-TG–treated cells. At least four independent sgRNAs for each gene showed very strong enrichment, and barcodes corresponding to these genes made up >30% of all barcodes (Fig. 2, A and B). Each of the 20 most abundant sgRNAs targeted one of these four genes. The fact that few of the other 73,000 sgRNAs scored highly in this assay suggests a low frequency of off-target effects.

Fig. 2 Resistance screens using CRISPR-Cas9.

(A) Raw abundance (percentage) of sgRNA barcodes after 12 days of selection with 6-TG. (B) MMR deficiency confers resistance to 6-TG. Diagram depicts cellular DNA repair processes. Only sgRNAs targeting components of the DNA MMR pathway were enriched. The diagram was modified and adapted from (32). (C) Primary etoposide screening data. The count for a sgRNA is defined as the number of reads that perfectly match the sgRNA target sequence. (D) sgRNAs from both screens were ranked by their differential abundance between the treated versus untreated populations. For clarity, sgRNAs with no change in abundance are omitted. (E) Gene hit identification by comparing differential abundances of all sgRNAs targeting a gene with differential abundances of nontargeting sgRNAs in a one-sided Kolmogorov-Smirnov test. P values are corrected for multiple hypothesis testing. (F) Immunoblot analysis of WT and sgRNA-modified HL60 cells 1 week after infection. S6K1 was used as a loading control. (G) Viability, as measured by cellular ATP concentration, of WT and sgRNA-modified HL60 cells at indicated etoposide concentrations. Error bars denote SD (n = 3 experiments per group).

We next addressed the challenge of loss-of-function screening in diploid cells, which require biallelic inactivation of a target gene. We therefore generated an inducible Cas9 derivative of the HL60 pseudo-diploid human leukemic cell line. In both HL60 and KBM7 cells, we screened for genes whose loss conferred resistance to etoposide, a chemotherapeutic agent that poisons DNA topoisomerase IIA (TOP2A). To identify hit genes, we calculated the difference in abundance between the treated and untreated populations for each sgRNA, calculated a score for each gene using a Kolmogorov-Smirnov test to compare the sgRNAs targeting the gene against the nontargeting control sgRNAs, and corrected for multiple hypothesis testing (Fig. 2, C to E, and table S2). Identical genes were detected in both screens, with significance levels exceeding all other genes by more than 100-fold. As expected, loss of TOP2A itself conferred strong protection to etoposide (25). The screen also revealed a role for CDK6, a G1 cyclin-dependent kinase, in mediating etoposide-induced cytotoxicity. Every one of the 20 sgRNAs in the library targeting TOP2A or CDK6 was strongly enriched (>90th percentile) in both screens, indicating that the effective coverage of our libraries is very high. We generated isogenic HL60 cell lines with individual sgRNAs against TOP2A and CDK6 and, consistent with the screen results, these lines were much more resistant to etoposide than parental or sgAAVS1-modified HL60 cells (Fig. 2, F and G). Thus, our Cas9/sgRNA system enables large-scale positive selection loss-of-function screens.
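
The hit-calling procedure described above (per-sgRNA differential abundance, a one-sided Kolmogorov-Smirnov comparison against nontargeting controls, then multiple-testing correction) can be sketched with the Python standard library. This is not the authors' code. Two choices are assumptions made for illustration: the P value uses the large-sample approximation exp(-2·D²·nm/(n+m)), which is rough for only 10 sgRNAs per gene, and the correction uses the Benjamini-Hochberg procedure, whereas the text does not say which correction was applied. All scores are invented.

```python
import math
from bisect import bisect_right

def ks_one_sided(gene_scores, control_scores):
    """One-sided two-sample KS test that gene scores exceed control scores.

    Returns (D+, approximate P value), where D+ = max(F_control(x) - F_gene(x)).
    """
    gene, ctrl = sorted(gene_scores), sorted(control_scores)
    n, m = len(gene), len(ctrl)
    d_plus = 0.0
    for x in gene + ctrl:  # the maximum is attained at an observed value
        f_gene = bisect_right(gene, x) / n
        f_ctrl = bisect_right(ctrl, x) / m
        d_plus = max(d_plus, f_ctrl - f_gene)
    # Asymptotic approximation (assumption; exact tables suit small samples better).
    p_value = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, min(1.0, p_value)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical differential abundances (treated minus untreated, log2 scale).
controls = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.5]
genes = {
    "GENE_A": [3.1, 2.8, 4.0, 3.5, 2.9],    # strongly enriched
    "GENE_B": [0.1, -0.3, 0.2, 0.0, -0.1],  # indistinguishable from controls
}
results = {name: ks_one_sided(scores, controls) for name, scores in genes.items()}
names = list(results)
adjusted = dict(zip(names, benjamini_hochberg([results[g][1] for g in names])))
```

Comparing against nontargeting sgRNAs, rather than against zero, lets the control distribution absorb sampling noise that affects every construct in the pool.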

To identify genes required for cellular proliferation, we screened for genes whose loss conferred a selective disadvantage on cells. Such a screen requires accurate identification of sgRNAs that are depleted from the final cell population. A sgRNA will show depletion only if cleavage of the target gene occurs in the majority of cells carrying the construct.

As an initial test, we screened KBM7 cells with a small library containing sgRNAs targeting the BCR and ABL1 genes (table S3). The survival of KBM7 cells depends on the fusion protein produced by the BCR-ABL translocation (26). As expected, depletion was seen only for sgRNAs targeting the exons of BCR and ABL1 that encode the fusion protein, but not for those targeting the other exons of BCR and ABL1 (Fig. 3A).

Fig. 3 Negative selection screens using CRISPR-Cas9 reveal rules governing sgRNA efficacy.

(A) Selective depletion of sgRNAs targeting exons of BCR and ABL1 present in the fusion protein. Individual sgRNAs are plotted according to their target sequence position along each gene, and the height of each bar indicates the level of depletion observed. Boxes indicate individual exons. (B) Cas9-dependent depletion of sgRNAs targeting ribosomal proteins. Cumulative distribution function plots of log2 fold changes in sgRNA abundance before and after 12 cell doublings in Cas9-KBM7, Cas9-HL60, and WT-KBM7 cells. (C) Requirement of similar sets of ribosomal protein genes for proliferation in the HL60 and KBM7 cells. Gene scores are defined as the median log2 fold change of all sgRNAs targeting a gene. (D) Depleted sgRNAs target genes involved in fundamental biological processes. Gene set enrichment analysis was performed on genes ranked by their combined depletion scores from screens in HL60 and KBM7 cells. Vertical lines underneath the x axis denote members of the gene set analyzed. (E) Features influencing sgRNA efficacy. Depletion (log2 fold change) of sgRNAs targeting ribosomal protein genes was used as an indicator of sgRNA efficacy. Correlation between log2 fold changes and spacer %GC content (left), exon position targeted (middle), and strand targeted (right) are depicted (*P < 0.05). (F) sgRNA target sequence preferences for Cas9 loading and cleavage efficiency. Position-specific nucleotide preferences for Cas9 loading are determined by counting sgRNAs bound to Cas9 normalized to the number of corresponding genomic integrations. Heatmaps depict sequence-dependent variation in Cas9 loading (top) and ribosomal protein gene-targeting sgRNA depletion (bottom). The color scale represents the median value (of Cas9 affinity or log2 fold-change) for all sgRNAs with the specified nucleotide at the specified position. (G) sgRNA efficacy prediction. Ribosomal protein gene-targeting sgRNAs were designated as “weak” or “strong” on the basis of their log2 fold change and used to train a support-vector-machine (SVM) classifier. As an independent test, the SVM was used to predict the efficacy of sgRNAs targeting 400 essential nonribosomal genes (*P < 0.05).

We then infected Cas9-HL60, Cas9-KBM7, and WT KBM7 cells with the entire 73,000-member sgRNA library and used deep sequencing of the sgRNA barcodes to monitor the change in abundance of each sgRNA between the initial seeding and a final population obtained after 12 cell doublings (fig. S2, A and B).

We began by analyzing ribosomal protein genes, for which the library contained all possible sgRNAs. We observed strong Cas9-dependent depletion of sgRNAs targeting genes encoding ribosomal proteins, with good concordance between the sets of ribosomal protein genes essential for cell proliferation in the HL60 and KBM7 screens (the median sgRNA fold-change in abundance was used as a measure of gene essentiality) (Fig. 3, B and C). A few ribosomal protein genes were not found to be essential. These were two genes encoded on chromosome Y [RPS4Y2, which is testes-specific (27), and RPS4Y1, which is expressed at low levels as compared with its homolog RPS4X on chromosome X (28)] and “ribosome-like” proteins, which may be required only in select tissues (27) and generally are lowly expressed in KBM7 cells (fig. S3A).
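
The essentiality measure used here, the median log2 fold change of all sgRNAs targeting a gene, can be sketched as follows. The normalization to total read count and the pseudocount are assumptions added so that the example handles unequal sequencing depth and zero counts; the text does not describe these details. Gene and sgRNA names are hypothetical.

```python
import math
from statistics import median

def log2_fold_changes(initial, final, pseudocount=1.0):
    """Log2 fold change in relative abundance for each sgRNA.

    initial, final -- dicts mapping sgRNA id -> read count
    pseudocount    -- added to each count to avoid log of zero (assumption)
    """
    total_initial = sum(initial.values())
    total_final = sum(final.values())
    changes = {}
    for sgrna, count in initial.items():
        before = (count + pseudocount) / total_initial
        after = (final.get(sgrna, 0) + pseudocount) / total_final
        changes[sgrna] = math.log2(after / before)
    return changes

def gene_scores(changes, sgrna_to_gene):
    """Median log2 fold change across the sgRNAs targeting each gene."""
    by_gene = {}
    for sgrna, value in changes.items():
        by_gene.setdefault(sgrna_to_gene[sgrna], []).append(value)
    return {gene: median(values) for gene, values in by_gene.items()}

# Hypothetical counts: GENE_A sgRNAs drop out, GENE_B sgRNAs do not.
initial = {"a1": 100, "a2": 100, "a3": 100, "b1": 100, "b2": 100, "b3": 100}
final = {"a1": 10, "a2": 20, "a3": 5, "b1": 150, "b2": 160, "b3": 155}
sgrna_to_gene = {"a1": "GENE_A", "a2": "GENE_A", "a3": "GENE_A",
                 "b1": "GENE_B", "b2": "GENE_B", "b3": "GENE_B"}
scores = gene_scores(log2_fold_changes(initial, final), sgrna_to_gene)
```

Using the median rather than the mean makes the gene score less sensitive to a single ineffective or off-target sgRNA.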

We then turned our attention to other genes within our data set, for which 10 sgRNAs were designed. As for the ribosomal genes, the essentiality scores of these genes were also strongly correlated between the two cell lines (fig. S3B and table S4). For the 20 highest scoring genes, we found independent evidence for essentiality, based primarily on data from large-scale functional studies in model organisms (table S5).

To evaluate the results at a global level, we tested 4722 gene sets to see whether they showed strong signatures of essentiality by using gene set enrichment analysis (29). Gene sets related to fundamental biological processes—including DNA replication, gene transcription, and protein degradation—showed strong depletion, which is consistent with their essentiality (Fig. 3D and table S6).

Last, we sought to understand the features underlying sgRNA efficacy. Although the vast majority of sgRNAs against ribosomal protein genes showed depletion, detailed comparison of sgRNAs targeting the same gene revealed substantial variation in the precise amounts of depletion. These differences are unlikely to be caused by local accessibility to the Cas9/sgRNA complex inasmuch as comparable variability was observed even among sgRNAs targeting neighboring target sites of a given gene (fig. S4A). Given that our library includes all possible sgRNAs against each of the 83 ribosomal genes, the data allowed us to search for factors that might explain the differential efficacy of sgRNAs. Because the majority of ribosomal protein genes are essential, we reasoned that the level of depletion of a given ribosomal protein-targeting sgRNA could serve as a proxy for its cleavage efficiency. Applying this approach, we found several trends related to sgRNA efficacy: (i) Single-guide sequences with very high or low GC content were less effective against their targets. (ii) sgRNAs targeting the last coding exon were less effective than those targeting earlier exons, which is consistent with the notion that disruption of the terminal exon would be expected to have less impact on gene function. (iii) sgRNAs targeting the transcribed strand were less effective than those targeting the nontranscribed strand (Fig. 3E). Although these trends were statistically significant, they explained only a small proportion of differences in sgRNA efficacy (table S7).

We hypothesized that differences in sgRNA efficacy might also result from sequence features governing interactions with Cas9. To test this, we developed a method to profile the sgRNAs directly bound to Cas9 in a highly parallel manner (supplementary materials, materials and methods). By comparing the abundance of sgRNAs bound to Cas9 relative to the abundance of their corresponding genomic integrants, we found that the nucleotide composition near the 3′ end of the spacer sequence was the most important determinant of Cas9 loading (Fig. 3F). Specifically, Cas9 preferentially bound sgRNAs containing purines in the last four nucleotides of the spacer sequence, whereas pyrimidines were disfavored. A similar pattern emerged when we examined depletion of ribosomal protein-targeting sgRNAs [correlation coefficient (r) = 0.81], suggesting that, in significant part, the cleavage efficiency of a sgRNA was determined by its affinity for Cas9 (table S7).
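
The sequence features discussed in this and the preceding paragraph (spacer GC content, purines in the last four nucleotides, and the median depletion for each nucleotide at each position, as plotted in Fig. 3F) can be computed as in the sketch below. The function names, sequences, and log2 fold-change values are hypothetical and serve only to illustrate the calculation.

```python
from statistics import median

def gc_fraction(spacer):
    """Fraction of G or C bases in a spacer sequence."""
    spacer = spacer.upper()
    return sum(base in "GC" for base in spacer) / len(spacer)

def purines_in_tail(spacer, tail_length=4):
    """Number of purines (A or G) among the last `tail_length` bases."""
    return sum(base in "AG" for base in spacer.upper()[-tail_length:])

def position_nucleotide_medians(spacers, values):
    """Median value for all sgRNAs sharing a nucleotide at a given position.

    Returns a dict keyed by (0-based position, nucleotide).
    """
    groups = {}
    for spacer, value in zip(spacers, values):
        for position, base in enumerate(spacer.upper()):
            groups.setdefault((position, base), []).append(value)
    return {key: median(group) for key, group in groups.items()}

# Hypothetical spacers differing only in their last four bases,
# paired with invented log2 fold changes (more negative = more depleted).
spacers = [
    "ACGTACGTACGTACGTGAGA",
    "ACGTACGTACGTACGTCTCT",
    "ACGTACGTACGTACGTGGAG",
]
depletion = [-3.0, -0.5, -2.5]
medians = position_nucleotide_medians(spacers, depletion)
```

Applying the same grouping to Cas9-loading measurements instead of depletion values would give the second heatmap, and the two sets of medians could then be compared position by position.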

We then sought to build an algorithm to discriminate between strong and weak sgRNAs (Fig. 3G). We trained a support-vector-machine classifier based on the target sequences and depletion scores of ribosomal protein-targeting sgRNAs. As an independent test, we used the classifier to predict the efficacy of sgRNAs targeting the 400 top scoring (essential) nonribosomal genes. The top two thirds of our predictions exhibited threefold higher efficacy than that of the remaining fraction, confirming the accuracy of the algorithm.

Using this algorithm, we designed a whole-genome sgRNA library consisting of sequences predicted to have higher efficacy (table S8). As with the sgRNA pool used in our screens, this new collection was also filtered for potential off-target matches. This reference set of sgRNAs may be useful both for targeting single genes as well as large-scale sgRNA screening.

Taken together, these results demonstrate the utility of CRISPR-Cas9 for conducting large-scale genetic screens in mammalian cells. On the basis of our initial experiments, this system appears to offer several powerful features that together provide substantial advantages over current functional screening methods.

First, CRISPR-Cas9 inactivates genes at the DNA level, making it possible to study phenotypes that require a complete loss of gene function to be elicited. In addition, the system should also enable functional interrogation of nontranscribed elements, which are inaccessible by means of RNAi.

Second, a large proportion of sgRNAs successfully generate mutations at their target sites. Although this parameter is difficult to directly assess in pooled screens, we can obtain an estimate by examining the “hit rate” at known genes. Applying a z score analysis of our positive selection screens, we found that over 75% (46 of 60) of sgRNAs score at a significance threshold that perfectly separates true and false positives on a gene level (fig. S5, A to D). Together, these results show that the effective coverage of our library is very high and that the rate of false negatives should be low, even in a large-scale screen.

Third, off-target effects do not appear to seriously hamper our screens, according to several lines of evidence. Direct sequencing of potential off-target loci detected minimal cleavage at secondary sites, which typically reside in noncoding regions and do not affect gene function. Moreover, in the 6-TG screens the 20 most abundant sgRNAs all targeted one of the four members of the MMR pathway. In total, they represented over 30% of the final pool, which is a fraction greater than that of the next 400 sgRNAs combined. In the etoposide screen, the two top genes scored far above background levels (P values 100-fold smaller than that of the next best gene), enabling clear discrimination between true and false-positive hits. Last, new versions of the CRISPR-Cas9 system have recently been developed that substantially decrease off-target activity (30, 31).

Although we limited our investigation to proliferation-based phenotypes, our approach can be applied to a much wider range of biological phenomena. With appropriate sgRNA libraries, the method should enable genetic analyses of mammalian cells to be conducted with a degree of rigor and completeness currently possible only in the study of microorganisms.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S5

Tables S1 to S8

References (33–43)

References and Notes

  1. Acknowledgments: We thank all members of the Sabatini and Lander labs, especially J. Engreitz, S. Schwartz, A. Shishkin, and Z. Tsun for protocols, reagents, and advice; T. Mikkelsen for assistance with oligonucleotide synthesis; and L. Gaffney for assistance with figures. This work was supported by the U.S. National Institutes of Health (CA103866) (D.M.S.), National Human Genome Research Institute (2U54HG003067-10) (E.S.L.), the Broad Institute of MIT and Harvard (E.S.L.), and an award from the U.S. National Science Foundation (T.W.). The composition of the sgRNA pools and screening data can be found in the supplementary materials. A patent application has been filed by the Broad Institute relating to aspects of the work described in this manuscript. Inducible Cas9 and sgRNA backbone lentiviral vectors and the genome-scale sgRNA plasmid pool are deposited in Addgene.