CRISPR Explorer: A fast and intuitive tool for designing guide RNA for genome editing

The RNA-guided CRISPR-Cas9 (clustered, regularly interspaced, short palindromic repeat-CRISPR-associated 9) system has become a revolutionary technology for targeted genome engineering. The critical step of this technology requires the design of a highly specific and efficient guide RNA (gRNA) that will guide the Cas9 nuclease to the complementary DNA target sequence. CRISPR-Explorer is a new and user-friendly web server for selecting optimal CRISPR sites. It implements the latest scoring schemes of gRNA specificity and efficiency based on published empirical studies. The gRNA design results are generated instantly, thus removing wait times. The user can visualize the high-quality gRNAs with detailed design information through an interactive genome browser. Furthermore, the user can define and specify the parameters for gRNA selection in the Batch Design mode, which recognizes various input formats. CRISPR Explorer is freely accessible at: http://crisprexplorer.org.


INTRODUCTION
CRISPR-Cas is an RNA-guided nuclease system that allows efficient perturbation of gene functions and is increasingly being used for genome-wide functional screens [1][2][3]. The type II CRISPR-Cas system from Streptococcus pyogenes (SpCas9) has been adapted to target a specific DNA sequence for Cas9 nuclease cleavage using a programmable guide RNA (gRNA) [4][5][6][7]. Targeting is mediated by the first 20 nucleotides at the 5'-end of the gRNA, which are complementary to the target DNA sequence; in the genome, the target sequence must be followed by a 3 bp protospacer adjacent motif (PAM). Directed in this way, the Cas9 nuclease induces DNA double-stranded breaks (DSBs) at specific locations in the genome. In mammalian cells, genome editing typically occurs through the repair of the Cas9-induced DSB by the error-prone non-homologous end-joining (NHEJ) mechanism, which introduces variable-length insertion/deletion (indel) mutations, or through homology-directed repair (HDR) in the presence of an exogenous DNA template. An indel mutation in a spliced coding exon of the target gene frequently results in a coding frameshift and nonsense-mediated decay of the gene transcript, which causes gene inactivation. Beyond genome editing, catalytically inactive Cas9 (dCas9) proteins, in which the nuclease activity has been removed, can be fused to a co-activation or co-repression domain and guided by a gRNA to a specific DNA sequence to activate or repress gene transcription, respectively [8].
Numerous studies have shown that CRISPR-Cas9 targets not only its intended on-target sites but also certain off-target sites in the genome that share sequence similarity with the on-target sites [9][10][11][12]. This off-target effect is attributed to the ability of Cas9 to recognize non-canonical PAM sequences and to tolerate nucleotide mismatches between the gRNA and its target sequence. To achieve productive genome editing, the efficiency of Cas9-mediated modification is also critical. A sequence model predicts that the editing efficiency is influenced by the DNA target sequence as well as its flanking sequences [13]. This model has been experimentally validated, and gRNA efficiency can be calculated from it [13]. Therefore, to generate a highly specific and efficient gRNA design, we believe that it is important to consider both the specificity and the efficiency of the gRNA.
To date, a number of web-based applications have been developed to select highly specific genome-editing sites [9,12,14,15]. Typically, these applications require the target sequence or gRNA as input. The potential CRISPR sites on the target sequence are identified, and the reference genome is then searched for potential off-target sites. Because this computation is carried out for each target site, the wait time grows with the number of input sequences and can become long. The output is often presented on a plain web page that provides a basic display of the positions of the CRISPR sites that match the input sequence. A few applications display their results in the UCSC Genome Browser, but these displays are not interactive. Most of these applications consider only the specificity of the gRNA and not its efficiency. One application, CRISPR-ERA, does consider both specificity and efficiency for gRNA design but uses an ad hoc scoring scheme [14]. Finally, most of the existing web applications lack an intuitive and convenient way for browsing and batch design.
Here, our goal was to develop a guide RNA design web server that eliminates the above issues and offers the most up-to-date and proven scoring schemes for gRNA specificity and efficiency. CRISPR-Explorer is fast, intuitive, and flexible. It is particularly useful for genome-wide functional screening and high-throughput screening using the CRISPR-Cas9 technology. The CRISPR-Cas system is a rapidly moving technology. We will continue to update the scoring schemes as new knowledge and data become available, and we expect to incorporate the scoring schemes for new Cas nucleases that are identified in the future [16].
Briefly, the whole reference genomes of human (hg19) and mouse (mm10) were scanned for every possible CRISPR site (e.g. 5'-N20-NGG, 5'-N20-NAG sites, etc.). To identify potential off-targets for each possible gRNA, we used the "all-mapper" of the Genome Multitool (GEM) mapper, which reports all alignments of a given short sequence within a user-defined number of mismatches [17]. The number of off-target alignments in the reference genome for a specific CRISPR site grows rapidly as the number of mismatches increases; therefore, up to 4 mismatches in the 20 bp gRNA were allowed when searching for off-target sites. Up to 3 mismatches were allowed for the truncated (18 bp) gRNA, since previous studies showed that a truncated gRNA with more than 3 mismatches usually does not have a detectable off-target effect [18]. To calculate the specificity score (or aggregate score of single hits), we adopted the algorithm that was developed by the F. Zhang group [12]. For the truncated (18 bp) gRNA, the first two nucleotide positions and their experimentally determined effects were removed from the calculation of the specificity score. The efficiency score was calculated using the SSC program [13].
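The site enumeration step can be illustrated with a minimal sketch. It is not the pipeline used to build the CRISPR-Explorer database; the function names, the regular-expression approach, and the 0-based coordinate convention are assumptions made for illustration only. The sketch reports every 20-nt protospacer followed by an NGG PAM on both strands of a short sequence and skips candidates that contain ambiguous bases.

```python
import re

# Watson-Crick complements, used to read the reverse strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_crispr_sites(seq, protospacer_len=20, pam_regex="[ACGT]GG"):
    """List candidate sites as (start, strand, protospacer, pam) tuples.

    `start` is the 0-based position of the leftmost base of the whole site
    (protospacer plus PAM) on the forward strand. Hypothetical helper, not
    part of CRISPR-Explorer.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile("(?=([ACGT]{%d})(%s))" % (protospacer_len, pam_regex))
    sites = []
    for m in pattern.finditer(seq):
        sites.append((m.start(), "+", m.group(1), m.group(2)))
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        site_len = len(m.group(1)) + len(m.group(2))
        # Convert the reverse-strand offset back to forward coordinates.
        start = len(seq) - (m.start() + site_len)
        sites.append((start, "-", m.group(1), m.group(2)))
    return sorted(sites)
```

Calling `find_crispr_sites(seq, pam_regex="[ACGT]AG")` would enumerate NAG sites in the same way, and `protospacer_len=18` would correspond to the truncated gRNA.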
To achieve fast query speed, the results were sorted and indexed using the Tabix program [19]. We adopted the WashU Epigenome Browser [20] as the presentation framework for display customization and linking to the gRNA database. The batch design exporter that we implemented is based on the Angular UI Grid (http://ui-grid.info/) JavaScript framework, which provides an interactive output.
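To illustrate why precomputing, sorting, and indexing the results allows near-instant retrieval, the following sketch keeps sites sorted by start position and answers region queries by binary search. It is only an in-memory analogy and does not reproduce Tabix or its file format; the class name, the record layout, and the example identifiers are hypothetical.

```python
from bisect import bisect_left


class SortedSiteIndex:
    """Toy stand-in for a position-sorted, indexed table of gRNA sites."""

    def __init__(self, sites):
        # sites: iterable of (chrom, start, grna_id) records.
        self._by_chrom = {}
        for chrom, start, grna_id in sites:
            self._by_chrom.setdefault(chrom, []).append((start, grna_id))
        for rows in self._by_chrom.values():
            rows.sort()

    def query(self, chrom, start, end):
        """Return the sites whose start position lies in [start, end)."""
        rows = self._by_chrom.get(chrom, [])
        # (start,) sorts before any (start, grna_id), so bisect_left lands
        # on the first record at or after the requested position.
        lo = bisect_left(rows, (start,))
        hi = bisect_left(rows, (end,))
        return rows[lo:hi]
```

Because each lookup costs two binary searches rather than a new genome-wide alignment, the query time in such a scheme does not depend on how expensive the original off-target search was.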

PROCEDURE
Input
The CRISPR-Explorer homepage contains 3 panels. Currently, only the Cas9 PAM (-NGG) is supported because it is the most widely used and has a more mature scoring scheme based on empirical studies. However, we will include the PAMs of other nucleases (such as Cpf1) in the future when possible. By default, repeat regions are excluded from off-target searching. However, the user has the option to include repeat regions during the gRNA design.

2. 'Browse gRNAs' panel. Browse gRNAs for your gene of interest by entering the gene name, the genome coordinate, or a DNA sequence from the genome. The input format for the gene name can be the official gene symbol, Ensembl gene name, or refGene name. If you know the genomic location of the region, you can directly query the browser with the genome coordinate (either in BED format or in the following format: chrX:NNNNN-NNNNN). When a target sequence (with a limit of 20000 bases) is supplied as input, the Blat program (an alignment tool) from the UCSC Genome Browser is used to map the sequence to its genome location. The speed of this step depends on the UCSC Blat server. A guide RNA ID that is generated by the Batch Design Exporter can be used to access the detailed information about the gRNA.

3. 'Batch Design Exporter' panel. Multiple genome coordinates or gene entries can be entered simultaneously. Currently, a maximum of 100 entries is allowed per run to prevent server overload. You can also reduce the output and get a faster response by applying pre-defined filters. The 'Exonic guide only' option selects gRNAs that fall within exonic regions; the gene structure from RefGene is used to define the exonic regions. The 'MIT score ≥ 50' option selects gRNAs that have specificity scores greater than or equal to 50 [12]. The 'SSC score > 0' option selects gRNAs that have efficiency scores greater than zero [13]. The 'Mismatch > 1' option selects gRNAs that have more than 1 mismatch with the most similar sequence in the whole genome.
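The accepted input forms can be illustrated with a minimal sketch that classifies each batch entry as a genome coordinate (either chrX:NNNNN-NNNNN or three whitespace-separated BED fields) or as a gene name, and that enforces the 100-entry limit. This is not the server's parser. The function names and the decision to treat any unrecognized entry as a gene name are assumptions, and the numbers are kept exactly as typed, with no conversion between BED and other coordinate conventions.

```python
import re

MAX_ENTRIES = 100  # per-run limit stated in the text

# Matches the chrX:NNNNN-NNNNN form.
_REGION = re.compile(r"^(\w+):(\d+)-(\d+)$")


def parse_entry(entry):
    """Classify one entry as ('region', chrom, start, end) or ('gene', name)."""
    entry = entry.strip()
    m = _REGION.match(entry)
    if m:
        chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    else:
        fields = entry.split()
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            # BED-like line: chromosome, start, end (extra columns ignored).
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        else:
            # Assumption: anything else is looked up as a gene name.
            return ("gene", entry)
    if start > end:
        raise ValueError("start is greater than end: %r" % entry)
    return ("region", chrom, start, end)


def parse_batch(text):
    """Parse one entry per non-empty line, enforcing the entry limit."""
    entries = [line for line in text.splitlines() if line.strip()]
    if len(entries) > MAX_ENTRIES:
        raise ValueError("at most %d entries are allowed per run" % MAX_ENTRIES)
    return [parse_entry(e) for e in entries]
```

For example, `parse_batch("TP53\nchrX:15000-15500")` returns one gene entry and one region entry; the gene symbol here is only a placeholder.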

Output
The two main display options in CRISPR-Explorer are the interactive browser (Figs. 2 and 3) and a table (Fig. 4).

4. After the 'Browse' button inside the 'Browse gRNAs' panel is clicked, the browser launches (Fig. 2). The gRNAs are color-coded based on different specificity and efficiency score cutoffs. The legend on the bottom left-hand side defines the color codes. Red gRNAs have the highest specificity and efficiency scores, and green gRNAs have the lowest specificity and efficiency scores. Rolling the cursor over a gRNA bar displays brief information about that gRNA. Clicking on a gRNA opens a pop-up window that shows the gRNA's genome coordinate, strand, length, closest mismatch, specificity and efficiency scores, sequence, and number of off-targets (#OT) (Fig. 2). Clicking on the 'Details' link in this pop-up leads to a page with detailed information about the off-target sequences of the selected gRNA. The table on this page is interactive and contains the following information: off-target sequences with the mismatches in upper case letters ('Off-target sequence'), genome coordinate with the strand indicated by "+" or "-" ('Location'), number of mismatches ('Mismatch'), off-target score ('Score'), and the name of the gene whose exon is hit by the off-target site ('Hit Exon').

5. To change the gRNA display on the browser, first right-click on the browser track and then click on 'Configure'. One can choose different filters ('Closest Mismatch', 'Specificity', or 'Efficiency') to limit the gRNAs that are displayed on the browser (Fig. 3). For example, in Figure 3, the selection of 'Specificity' with a scoring range between 50 and 100 restricts the display to gRNAs whose specificity scores fall within this range.

6. After the 'Submit' button inside the 'Batch Design Exporter' panel is clicked, an interactive table launches (Fig. 4). When a gene name is used as input, the output will contain information about the exon (ranked in the composite gene model) in which the target site falls ('HitExon' column). If a transcript name is used as input (e.g. an Ensembl transcript name), the exon in the 'HitExon' column is ranked according to that transcript. Genome coordinates are the ideal input format for gRNA design in unannotated genomic regions. When genome coordinates are supplied as input, the 'HitExon' information is not reported.
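The pre-defined filters of the Batch Design Exporter ('Exonic guide only', 'MIT score ≥ 50', 'SSC score > 0', and 'Mismatch > 1') act as simple predicates on the rows of this table. The sketch below applies them to a list of records; the field names (`hit_exon`, `mit_score`, `ssc_score`, `closest_mismatch`) and the function itself are hypothetical and do not describe the server's internal data model.

```python
def select_grnas(rows, exonic_only=False, mit_ge_50=False,
                 ssc_gt_0=False, mismatch_gt_1=False):
    """Keep the rows that pass every enabled filter.

    Each row is a dict with the hypothetical keys 'hit_exon' (exon label
    or None), 'mit_score', 'ssc_score', and 'closest_mismatch'.
    """
    selected = []
    for row in rows:
        if exonic_only and not row["hit_exon"]:
            continue  # target site does not fall within an exon
        if mit_ge_50 and row["mit_score"] < 50:
            continue  # specificity score below 50
        if ssc_gt_0 and row["ssc_score"] <= 0:
            continue  # efficiency score not greater than zero
        if mismatch_gt_1 and row["closest_mismatch"] <= 1:
            continue  # most similar genomic sequence is too close
        selected.append(row)
    return selected
```

Because the filters are independent, enabling several of them keeps only the gRNAs that satisfy all of the selected criteria.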