Hostname: page-component-78c5997874-j824f Total loading time: 0 Render date: 2024-11-10T23:14:10.180Z Has data issue: false hasContentIssue false

CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals

Published online by Cambridge University Press:  15 January 2021

Efthymia Symeonidi
Affiliation:
Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
Julian Regalado
Affiliation:
Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
Rebecca Schwab
Affiliation:
Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
Detlef Weigel*
Affiliation:
Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
*
Author for correspondence: Detlef Weigel, E-mail: weigel@weigelworld.org Current address: Efthymia Symeonidi, Department of Biological Sciences, University of Utah, Salt Lake City, Utah84112, USA; Julian Regalado, The Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark.

Abstract

Genome editing with the CRISPR/Cas (clustered regularly interspaced short palindromic repeats/CRISPR associated protein) system allows mutagenesis of a targeted region of the genome using a Cas endonuclease and an artificial guide RNA. Both because of variable efficiency with which such mutations arise and because the repair process produces a spectrum of mutations, one needs to ascertain the genome sequence at the targeted locus for many individuals that have been subjected to mutagenesis. We provide a complete protocol for the generation of amplicons up until the identification of the exact mutations in the targeted region. CRISPR-finder can be used to process thousands of individuals in a single sequencing run. We successfully identified an ISOCHORISMATE SYNTHASE 1 mutant line in which the production of salicylic acid was impaired compared to the wild type, as expected. These features establish CRISPR-finder as a high-throughput, cost-effective and efficient genotyping method of individuals whose genomes have been targeted using the CRISPR/Cas9 system.

Type
Original Research Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in anymedium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press in association with The John Innes Centre

1. Introduction

Genome editing has become a routine approach to investigate gene function in vivo. The recent development of CRISPR/Cas9-based systems has opened new doors for genome editing by simplifying the requirements for genome targeting, particularly in comparison to zinc finger nucleases and TALENs (Gaj et al., Reference Gaj, Gersbach and Barbas2013). The system requires a nuclease (Cas9), an artificial single guide RNA (sgRNA), and a short sequence upstream of the sgRNA binding site called a Protospacer Adjacent Motif (PAM), which has the sequence 5-NGG-3 (Gasiunas et al., Reference Gasiunas, Barrangou, Horvath and Siksnys2012; Jinek et al., Reference Jinek, Chylinski, Fonfara, Hauer, Doudna and Charpentier2012). Part of the sgRNA is complementary to 20 nucleotides in the targeted region of the genome, and the rest is responsible for the stabilization of the Cas9/sgRNA complex.

Interaction of the Cas9/sgRNA complex with the target site enables Cas9’s endonuclease domains to generate a double-stranded break. Such breaks can be repaired through either the non-homologous end joining (NHEJ) or the homology-directed repair (HDR) pathway. NHEJ is error-prone, and can introduce small insertions or deletions that can lead to the disruption of open reading frames (Ma et al., Reference Ma, Lu, Tippin, Goodman, Shimazaki, Koiwai, Hsieh, Schwarz and Lieber2004; Phillips & Morgan, Reference Phillips and Morgan1994). In the case of HDR, a donor template complementary to the target needs to be present to introduce a specific region to the genome of interest (Gratz et al., Reference Gratz, Ukken, Rubinstein, Thiede, Donohue, Cummings and O’Connor-Giles2014; Liang et al., Reference Liang, Han, Romanienko and Jasin1998). The CRISPR/Cas9 and related systems have been used to generate knock-outs (Chang et al., Reference Chang, Sun, Gao, Zhu, Xu, Zhu, Xiong and Xi2013; Li et al., Reference Li, Qiu, Shao, Chen, Guan, Liu, Li, Gao, Wang, Lu, Zhao and Liu2013), knock-ins (Auer et al., Reference Auer, Duroure, De Cian, Concordet and Del Bene2014; Platt et al., Reference Platt, Chen, Zhou, Yim, Swiech, Kempton, Dahlman, Parnas, Eisenhaure, Jovanovic, Graham, Jhunjhunwala, Heidenreich, Xavier, Langer, Anderson, Hacohen, Regev, Feng and Zhang2014) and to delete entire genes (Canver et al., Reference Canver, Bauer, Dass, Yien, Chung, Masuda, Maeda, Paw and Orkin2014) in several species including the plant Arabidopsis thaliana (Feng et al., Reference Feng, Zhang, Ding, Liu, Yang, Wei, Cao, Zhu, Zhang, Mao and Zhu2013; Reference Feng, Mao, Xu, Zhang, Wei, Yang, Wang, Zhang, Zheng, Yang, Zeng, Liu and Zhu2014; Hyun et al., Reference Hyun, Kim, Cho, Choi, Kim and Coupland2015; Peterson et al., Reference Peterson, Haak, Nishimura, Teixeira, James, Dangl and Nimchuk2016).

While the generation of mutants using CRISPR/Cas9 is relatively easy, identification of desired mutations often requires screening many events. Two common approaches to screen for induced mutations are Sanger sequencing (Fauser et al., Reference Fauser, Schiml and Puchta2014; Feng et al., Reference Feng, Mao, Xu, Zhang, Wei, Yang, Wang, Zhang, Zheng, Yang, Zeng, Liu and Zhu2014) or the T7 Endonuclease1 (T7E1) assay (Ablain et al., Reference Ablain, Durand, Yang, Zhou and Zon2015; Xie &Yang, Reference Xie and Yang2013) applied to individual PCR products. Unfortunately, neither method provides immediately a precise identification of mutations in the desired region. For example, in the case of Sanger sequencing, the final readout merges the most abundant products in the template into one chromatogram (Sanger & Coulson, Reference Sanger and Coulson1975; Strauss et al., Reference Strauss, Kobori, Siu and Hood1986). This can lead to secondary peaks and sometimes a mixed signal due to other amplified molecules in the mixture, and can make it very hard to detect desired but rare events that might have occurred during editing. Confirmation of successful editing through subsequent cloning of a mixed PCR product followed by retrieval of bacterial colonies that carry the rare variant is time-consuming and expensive. Use of T7E1 can also yield inconclusive results due to its reliance on the T7 Endonuclease 1 to digest only fragments carrying mismatches (Mashal et al., Reference Mashal, Koontz and Sklar1995), which would miss homozygous mutants, as there are no mismatched fragments available for digestion. In addition, both techniques can be expensive for screening a large number of samples (>100).

These limitations led us to develop a robust and cost-efficient way of efficiently screening large numbers of samples. Here we introduce a high-throughput screening approach for identifying mutations using Illumina sequencing, called CRISPR-finder. We describe both the library preparation of the samples and the analysis pipeline for identifying editing events. The method is compatible with sequencing on different Illumina instruments (MiSeq and HiSeq300), and the adapter sequences could be modified for use on other platforms.

Our approach is inspired by an amplicon sequencing method previously developed for pooling samples for the analysis of microbiomes (Lundberg et al., Reference Lundberg, Yourstone, Mieczkowski, Jones and Dangl2013). In our approach, the amplicon libraries are generated through a two-step PCR amplification using specific combinations of oligonucleotides for the first step. During the PCRs, frameshifting nucleotides and one of 96 unique indices are added. Based on the unique combination of the frameshifting nucleotides and the barcode we were able to sequence hundreds of samples, for example >900 samples, in a single Illumina MiSeq run. To illustrate the accuracy and the precision of the method we describe how we identified and characterized a Cas9-free line with a mutation in the ISOCHORISMATE SYNTHASE 1 (ICS1) gene.

2. Results

2.1. Target site identification

The aim of this study was to improve the speed of mutant identification with the CRISPR/Cas9 system. To demonstrate the efficacy of this new approach, the ISOCHORISMATE SYNTHASE 1 (ICS1) gene was targeted in different A. thaliana accessions (Supplementary Table S1). ICS1 encodes an enzyme involved in salicylic acid biosynthesis (Wildermuth et al., Reference Wildermuth, Dewdney, Wu and Ausubel2001).

The accessions of A. thaliana used in this study are from the first phase of the 1001 Genomes Project (Cao et al., Reference Cao, Schneeberger, Ossowski, Günther, Bender, Fitz, Koenig, Lanz, Stegle, Lippert, Wang, Ott, Müller, Alonso-Blanco, Borgwardt, Schmid and Weigel2011). The polymorph tool (http://polymorph.weigelworld.org) was used to align sequences of ICS1 from the different accessions. Target sites without sequence variation among the accessions were identified to select the guide RNAs (Figure 1a).

Figure 1. Amplicon preparation. (a) Diagram of the targeted gene, ISOCHORISMATE SYNTHASE 1 (ICS1). Black boxes indicate exons, and grey boxes untranslated regions. The arrow shows the direction of transcription. The numbers at the beginning and end of the gene correspond to the genomic coordinates. (b)–(d) Amplicon preparation. (b) The first PCR step to amplify a specific region of the genome. The oligonucleotide primers in this step fuse the first part of the TruSeq adapters (grey) and the frame shifting nucleotides (red). (c) The second PCR amplification adds the last part of the TruSeq adapters (purple) and one of the 96 barcodes (orange). (d) The final amplicon with frameshifting base pairs(s) (red), TruSeq adapters (grey and purple) and barcode(orange).

Plants were transformed separately with the ICS1 targeting construct (Supplementary Table S2). The primary transformants were found to have somatic editing events by using the CRISPR-finder genotyping pipeline. The selection of the transgene was based on glufosinate or the seed-specific expression of mCherry (Gao et al., Reference Gao, Chen, Dai, Zhang and Zhao2016; Kroj et al., Reference Kroj, Savino, Valon, Giraudat and Parcy2003).

2.2. Generation and sequencing of amplicons spanning CRISPR/Cas9 target sites

In order to quickly and unambiguously identify CRISPR/Cas9-induced mutations in a large number of plants, the targeted regions were amplified by PCR, attaching different barcodes for different individual plants, and then pools of barcoded PCR products were sequenced on an Illumina MiSeq (or HiSeq) instrument. The ICS1 locus was targeted in different accessions to determine the efficacy of the method at different genetic backgrounds. Two sites were targeted in the gene, 72 bp apart for ICS1. The amplified regions were 211 bp long.

The amplicons were prepared based on a two-step PCR amplification protocol (Figure 1bd). During the first round of amplification, the specific region of interest was amplified, and frameshifting nucleotides as well as part of the sequences required for the fragment to hybridize to the flowcell for sequencing, the TruSeq adapters, were added. This was achieved by using specific combinations of oligonucleotides (Figure 1c) (Supplementary Table S3). The cleaned PCR product was used as a template for the second round of amplification, where the remainder of the TruSeq adapters and one of 96 barcodes were added (Lundberg et al., Reference Lundberg, Yourstone, Mieczkowski, Jones and Dangl2013) (Figure 1d) (Supplementary Table S3). Each PCR amplification step was carried out for 15 cycles.

The PCR products were quantified using the Quant-iTTM PicoGreen® dsDNA assay, normalized (described in Methods) and pooled (Supplementary Figure S1). For the sequencing on the MiSeq platform, the MiSeq reagent kit v2 (300-cycles or 500-cycles) (MS-102-2002 or MS-102-2003) was used (150 or 250 bp paired-end reads). The adapters were designed and chosen in order to be compatible with both MiSeq and HiSeq3000 platforms (Illumina, San Diego, USA); successful runs were carried out on both platforms.

2.3. Demultiplexing process

After sequencing, the pooled reads were demultiplexed in a two-step process. Ninety-six batches of combined samples were first identified via the indices that were located at the TruSeq adapters incorporated in the 2nd PCR amplification. This process was carried out with bcl2fastq (1.8.4) software, provided by Illumina, which also trims the sequence of the barcodes (https://my.illumina.com) (Figure 2a). The length of the reads downstream was 150 or 250 bp according to the kit that was used.

Figure 2. Diagrams of the demultiplexing procedure and graphs representing the average number of reads/plate/run. (a) The primary demultiplexing step is carried out by the Illumina software and separates the samples based on the indices that are located within the adapter region into 96 pools. (b) The secondary demultiplexing script (PlexSeq) then assigns the reads to individual plants based on the frameshifting nucleotides. (c)–(f) Each graph shows the average number of reads per plate (≤96 samples) in each run. Each run consisted of different samples. For different runs, different numbers of plates were sequenced depending on the number of samples. (c) MiSeqrun010, (d) MiSeqrun024, (e) MiSeqrun046 and (f) MiSeqrun083. For exact number of samples per run see Supplementary Table S4.

Subsequently, sequencing reads from different samples were mixed under the same barcode. In order to assign each read to the individual from which it came, we took advantage of the frameshifting nucleotides incorporated during the first step of the two-step PCR amplification. The first nine nucleotides from each read were used as ‘secondary’ barcodes to determine from which sample each read in the sequencing run originated; nine bases are sufficient to capture the unique frameshifting nucleotides used during the amplicon generation (Figure 2b). The specific combination of oligonucleotides during the first amplification generated a unique combination of forward-reverse oligonucleotide and barcode sequence information for each sample.

For binning of reads, the PlexSeq Python script (https://github.com/7PintsOfCherryGarcia/plexseq) was developed, which successfully demultiplexes >98% of reads in each dataset (Figure 2b). Since PlexSeq was run without allowing any mismatches of the ‘secondary’ barcodes, around 2% of the data could not be separated because of errors in PCR primers or errors introduced during the sequencing process; a loss of 2% of reads was deemed acceptable (Figure 2c–f). These unassigned reads are ignored in downstream analyses. A file with the expected ‘secondary’ barcodes needs to be provided in order for the script to successfully proceed with demultiplexing (Supplementary Figure S2). After PlexSeq, the sequences that were used as barcodes for demultiplexing were not trimmed, since they are part of the amplicons.

2.4. Analysis pipeline

After the demultiplexing process, each sample was genotyped in order to detect single nucleotide polymorphisms, as well as small insertions and deletions in the region of interest.

For each sample, reads were mapped back to the reference sequence for the gene of interest (Gene ID:843810) using the MEM algorithm of the BWA read mapping tool (Li et al., Reference Li, Handsaker, Wysoker, Fennell, Ruan, Homer, Marth, Abecasis and Durbin2009) with standard parameters (Figure 3a). The resulting alignment files were genotyped with freebayes using standard parameters (Figure 3a) (Garrison & Marth, Reference Garrison and Marth2012). The resulting VCF file was then filtered with vcftools (Danecek et al., Reference Danecek, Auton, Abecasis, Albers, Banks, DePristo, Handsaker, Lunter, Marth, Sherry, McVean and Durbin2011) to only keep samples in which high-quality variants were detected at regions of interest.

Figure 3. The analysis pipeline and visualized alignments and SA levels of different genotypes. (a) Diagram of the analysis pipeline. BWA-MEM is used for the alignment and the Freebayes algorithm for variant calling. Finally, IGV is used for visualizing the alignments or the vcf files. (b) Alignment that shows a deletion visualized in IGV. On the top track in the coverage panel it is apparent how coverage is decreased at the location of the deletion. The black box indicates the location of the PAM site. (c) Alignment that shows a 1-bp insertion (purple ‘I’) in IGV. The black box indicates the location of the PAM site. (d) SA content of TüWa1-2 wild-type and the derivative TüWa1-2 ics1-1c mutant. (e) SA content of Col-0 reference wild type and isogenic sid2-2 mutant for comparison. SID2 is a synonym for ICS1. Note the very different scale from (d). FT, Fresh Tissue. The whiskers of each boxplot indicate the spread of the data, and the line within the box corresponds to the median value of the data points. The measurements from plants grown in 23°C short-day conditions (8 h light/16 h dark) for 43 days.

At the end, integrative genomics viewer (IGV) (Robinson et al., Reference Robinson, Thorvaldsdóttir, Winckler, Guttman, Lander, Getz and Mesirov2011; Thorvaldsdóttir et al., Reference Thorvaldsdóttir, Robinson and Mesirov2013) was used for visual inspection of read mapping and variant calls (Figure 3b,c). All software was used with standard parameters unless otherwise noted. The required memory for the analysis can be 5–20 Gb depending on the output of the run.

2.5. Identifying mutations

Using CRISPR-finder, plants either heterozygous or homozygous for targeting events were identified. As a proof of concept for our approach, we targeted the ICS1 gene in the TüWa1-2 background. The TüWa1-2 ics1-1c mutant was identified after screening more than 100 individuals. The parental genotype was originally collected in Germany and its phenotype shows extensive necrotic lesions on the leaves (Supplementary Figure S3), which can be attributed to extensive cell death. It was hypothesized that this is caused by elevated levels of SA. Using a biosensor assay, the SA content in plants was quantified (Defraia et al., Reference Defraia, Schmelz and Mou2008; Huang et al., Reference Huang, Huang, Preston, Naylor, Carr, Li, Singer, Whiteley and Wang2006). As expected, the levels of free SA in the TüWa1-2 ics1-1c mutant were decreased in comparison to the ones in the wild-type parental lines (Figure 3d,e). These results demonstrate that our approach of screening can easily and rapidly identify individuals with targeted mutations that have the desired effect.

3. Discussion

We describe a high-throughput screening approach, called CRISPR-finder, that increases the accuracy and reduces the time and cost required for identifying CRISPR/Cas induced mutations (Figure 4). We generate barcoded amplicons of the targeted region through a two-step PCR amplification (Figure 1b–d). For each individual, a unique combination of frameshifting nucleotides and index sequence is used, which greatly increases the number of barcodes. An important consideration is that pooling of amplicons for sequencing can lead to unbalanced representation of samples. However, if we aim for average coverage of 1,000×, and assume that 10% of individuals provide 10× as many reads as aimed for, and 10% of individuals provide only one-tenth of the reads aimed for, a single MiSeq run (~15–20 million reads) would still provide sufficient coverage to analyze over 10,000 samples in a single run. Of course, the coverage can be adjusted to the needs of different experimental set ups.

Figure 4. Schematic representation of the screening pipeline. Starting from hundreds of samples the amplicon generation takes place by preparing the individuals for sequencing. By the end of the sequencing run the demultiplexing and analysis can take place that can lead to the identification of the desired edited individuals.

For processing large numbers of samples, CRISPR-finder is a particularly cost-effective method. While with conventional assays such as Sanger sequencing and the T7E1 assay, costs scale linearly with the number of samples to screen, for CRISPR-finder in one sequencing pool, thousands of samples can be sequenced with high resolution and the cost per sample decreases as more samples are added to the pool. Additionally, ‘spiking in’ samples into another sequencing run to use only part of a flowcell’s capacity is possible, further increasing flexibility and reducing costs. We consider our method to be cost-efficient as soon as there are at least 100 individuals to be genotyped.

Finally, the first oligonucleotide set does not need HPLC purification, limiting the cost of the multiplexing procedure. This is because oligonucleotide primers are synthesized from 3 to 5, and truncations or errors will therefore be concentrated towards the ends of the amplicon during the first round of amplification. These ends serve as the binding sites for the oligonucleotides that are used for the second round of amplification, which will anneal despite minor errors and result in products with the correct adapter sequence.

While our method was developed for screening A. thaliana CRISPR/Cas9 mutagenized individuals, it can be easily adopted for any organism that has been genome edited using the CRISPR/Cas9 or related systems. The targeting of multiple sites can be accommodated with one amplicon, if the distance permits it (present study) or by generating one amplicon for each site. Note, however, that large deletions induced by CRISPR/Cas9 editing (Kosicki et al., Reference Kosicki, Tomberg and Bradley2018) would escape detection with our pipeline because the expected sequences for hybridization sites of the first set of oligonucleotides might not be present. In addition, while our method allows for efficient and precise genotyping and identification of individuals carrying the desired mutations, one still has to consider the downstream steps such as identification of transgene-free lines that faithfully inherit the mutation.

In conclusion, a full pipeline from DNA extraction to identification of individuals carrying mutations generated with the CRISPR/Cas9 system is described in detail – CRISPR-finder. Compared to more conventional methods (Sanger and T7E1 assay), large-scale amplicon sequencing is more robust and less expensive.

4. Material and methods

4.1. Plant growth

A. thaliana seeds were kept at −80°C overnight and then surface-sterilized with 70% ethanol and 0.05% (v/v) Triton X-100 for 5 min, followed by 100% ethanol for 5 min. Seeds were air-dried in a sterile hood until all residual ethanol had evaporated. Seeds were stratified in 0.1% (w/v) agar-agar for 7 days in the dark at 4°C prior to sowing on soil. Vernalization-requiring seedlings (highlighted with blue in Supplementary Table S1) were placed for seven weeks in 4°C short-day conditions (8 h light/16 h dark) and then transferred to 23°C long-day conditions (16 h light/8 h dark). For SA assays, plants were grown in 23°C short-day conditions (8 h light/16 h dark).

4.2. Plasmid generation

Constructs for plant transformation were generated using the GreenGate cloning system (Lampropoulos et al., Reference Lampropoulos, Sutikovic, Wenzl, Maegele, Lohmann and Forner2013). The five different constructs used are described in Supplementary Table S2. The sgRNA constructs were generated as described in Wu et al. (Reference Wu, Lucke, Jang, Zhu, Symeonidi, Wang, Fitz, Xi, Schwab and Weigel2018), pEF016 (5-AATCAATTGCTCCGATTTGC-3) and pEF017 (5-TTCTCTCGTCGCAGTGACGT-3).

4.3. Plant transformation

Plants were transformed using the flora dip method as described by Clough and Bent (Reference Clough and Bent1998).

4.4. Selection of Cas9 transgene-free plants

Two selection markers were used, resistance to glufosinate ammonium (BASTA SL, Bayer Crop Science, Leverkusen, Germany) and AT2S3::mCherry (Gao et al., Reference Gao, Chen, Dai, Zhang and Zhao2016). To select transgene-free plants that no longer carried BASTA resistance, leaves were brushed with a solution, diluted from the original stock (200 g/L) BASTA (1:1,000 or 1:2,000) (Bayer Crop Science, Leverkusen, Germany). The treatment caused leaves from plants without the transgene to become wrinkled and yellowish.

Seeds from plants that were carrying the AT2S3::mCherry (Kroj et al., Reference Kroj, Savino, Valon, Giraudat and Parcy2003) cassette were screened for fluorescence or absence thereof under a LEICA MZFLIII Fluorescence stereoscope (Wetzlar, Germany) with a SOLA 365 SM Light Engine© lamp (Lumencot, Beaverton, OR).

We consider as good practice that one confirms the absence of the transgene with a genotyping approach for any line that will be used for subsequent experiments.

4.5. DNA isolation

Genomic DNA was extracted following a published protocol (Edwards et al., Reference Edwards, Johnstone and Thompson1991), with an additional ethanol wash. DNA was resuspended in 100 μl of ddH2O.

4.6. Salicylic acid quantification

The protocol was adapted from Marek et al. (Reference Marek, Carver, Ding, Sathyanarayan, Zhang and Mou2010). Fresh tissue was collected and frozen at −80°C overnight. For every 175 mg of fresh tissue, 250 μl of 0.1 M pH 5.5 sodium acetate was added post grinding for further vortexing. Acinetobacter sp. ADPWH_lux strain was used (Huang et al., Reference Huang, Huang, Preston, Naylor, Carr, Li, Singer, Whiteley and Wang2006) for the quantification of salicylic acid. Overnight culture of Acinetobacter sp. ADPWH_lux at 37°C was diluted (1:20) and grown at 37°C while shaking at 200 rpm until it reached OD600 of 0.4. For measuring free and 2-O-β-D-glucoside (SAG) SA, plant crude extract from the samples was incubated at 37oC for 1.5 h with 0.4 U/μl of β-glucosidase prior to measurement.

Black Optiplates (96 wells, ref:655906; Greiner Bio-One, Kremsmünster, Austria) were used for the measurements. They were loaded with 50 μl of LB, 60 μl of the cell culture and 30 μl of the plant extract. Standards were prepared with 50 μl of LB, 60 μl of the cell culture, 10 μl of known SA concentrations and 20 μl of plant extract from 35S::NahG plants as control (Col-0 background) (prepared the same way as the samples). The plates were incubated at 37°C for 2 h without shaking and the luminescence was measured using the Iinfinite F200 instrument (TECAN, Männedorf, Switzerland) and the i-control 1.12 software.

4.7. Amplicon library preparation

The amplicon libraries were generated with a two-step PCR protocol. The first reaction consisted of 1 μl of genomic DNA as template, 0.5 μM forward oligonucleotide (G-40604/G-40605/G-40606/G-40606/G-42015), 0.5 μM reverse oligonucleotide (G-40607/G-40608/G-40609/G-42016), 1× Phusion HF buffer (1.5 mM MgCl2) (Thermo Fisher Scientific, Waltham, MA), 0.2 mM dNTPs (Thermo Fisher Scientific, #R0182, Waltham, MA) and 0.02 U/μl Phusion High-Fidelity DNA polymerase (Thermo Fisher scientific, #F530, Waltham, MA) to a final volume of 25 μl.

The second PCR amplification consisted of 2.5 μl of the cleaned PCR product of the previous reaction, 0.5 μM forward oligonucleotide (G-40610), 0.25 μM reverse oligonucleotide that had one of the 96 indices (Lundberg et al., Reference Lundberg, Yourstone, Mieczkowski, Jones and Dangl2013), 1× Phusion HF buffer (1.5 mM MgCl2) (Thermo Fisher Scientific, Waltham, MA), 0.2 mM dNTPs (Thermo Fisher Scientific, #R0182, Waltham, MA) and 0.02 U/μl Phusion High-Fidelity DNA polymerase (Thermo Fisher Scientific, #F530, Waltham, MA) to a final volume of 25 μl.

Sequencing libraries were prepared using Q5® High-Fidelity DNA polymerase (New England BioLabs, #M0491, Ipswich, MA) in a final concentration of 0.02 U/μl along with 1× Q5 reaction buffer (2 mM MgCl2). The rest of the reaction components (DNA template, dNTPs) remained the same.

The MJ Research PTC225 Peltier (Marshall Scientific, Hampton, NH) or the BIO-RAD C1000 Touch (Hercules, CA) thermal cyclers were used. The PCR programs had 15 cycles in which the denaturing temperature was 94°C for 30 s, followed by annealing at 60°C for 30 s, and extension at 72°C for 10 s for program 1, and 15 s for program 2. A final extension step was at 72°C for 2 min.

4.8. Bead clean up

For the generation of the amplicon libraries, two bead-based clean-up steps were carried out using SPRI beads (Magnetic SpeedBeadsTM, GE Healthcare No.:65152105050250, Chicago, IL). The first PCR product was cleaned using a ratio of 1:0.9 (reaction:beads v/v) and resuspended in 17 μl of ddH2O. The second PCR product was cleaned using the same ratio of beads and resuspended in 27 μl. The ratios of clean ups were chosen after optimization.

4.9. Quant-iTTM PicoGreen® dsDNA assay

Amplicons were quantified using the Quant-iTTM PicoGreen (Invitrogen, Carlsbad, CA) dsDNA assay. One microlitre of each amplicon was used according to the manufacturer’s instructions for the quantification. The samples were prepared in black 96 well, F-bottom, non-binding microplates (96 wells, ref:655906; Greiner Bio-one), and the TECAN Infinite M200 PRO plate reader was used for all the measurements using the Magellan 7.2 software.

4.10. Pooling (Supplementary Figure S1)

To roughly normalize samples when pooling, the DNA concentration of all samples in each 96 well plate was first measured fluorometrically (PicoGreen assay). First, all the 96 samples from each plate were pooled, creating subpools. From samples with concentrations less than half of the mean, 6 μl were taken. From samples with concentrations more than twice the mean, 1.5 μl was taken. For all other samples falling between these extremes, 3 μl was taken. After each plate was pooled in this way, the subpools representing entire plates were again measured fluorometrically (Qubit dsDNA-HS assay) (Thermo Fisher Scientific, Waltham, MA) and pooled in an equimolar manner to create a final pool containing all samples. The concentration of the subpools and the final pool were evaluated using the Qubit dsDNA-HS assay. Each pool was analyzed on the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA) according to the manufacturer’s instructions. DNA1000 chips were used for the amplicon libraries.

4.11. Illumina MiSeq sequencing

The libraries were diluted for Illumina sequencing following manufacturers’ protocols and sequenced on the MiSeq platform using MiSeq reagent kit v2 (300-cycles) (MS-102-2002) for the MiSeq 010, 024 and 046 and MiSeq reagent kit v2 (500-cycles) (MS-102-2003) for the MiSeq 083.

Acknowledgements

We thank Derek Lundberg for discussions, suggestions regarding PCR and library preparation and comments on the manuscript, Talia Karasov for comments on the manuscript, Christa Lanz and Julia Hildebrandt for technical help sequencing, Ilja Berzukov for Illumina demultiplexing.

Financial Support

Funding was provided by the Bayer Science and Education Foundation (to ES) and the Max Planck Society.

Conflict of Interest

The authors declare no competing interests.

Authorship Contributions

E.S., R.S. and D.W. planned the study. R.S. and E.S. prepared initial plasmids for the CRISPR/Cas9 system. E.S. performed all cloning, plant transformations, DNA extractions and prepared all the libraries. JR designed and wrote PlexSeq for sample demultiplexing. E.S. carried out data analysis. E.S. conducted Hpa in infections. E.S. wrote the manuscript. E.S., J.R., R.S. and D.W. revised the manuscript.

Data availability Statement

All data in this manuscript has been deposited in the European Nucleotide Archive. It can be accessed under the project number PRJEB39078.

Supplementary Materials

To view supplementary material for this article, please visit http://dx.doi.org/10.1017/qpb.2020.6.

References

Ablain, J., Durand, E. M., Yang, S., Zhou, Y., & Zon, L. I. (2015). A CRISPR/Cas9 vector system for tissue-specific gene disruption in zebrafish. Developmental Cell, 32(6), 756764.CrossRefGoogle ScholarPubMed
Auer, T. O., Duroure, K., De Cian, A., Concordet, J. P., & Del Bene, F. (2014). Highly efficient CRISPR/Cas9-mediated knock-in in zebrafish by homology-independent DNA repair. Genome Research, 24(1), 142153.CrossRefGoogle ScholarPubMed
Canver, M. C., Bauer, D. E., Dass, A., Yien, Y. Y., Chung, J., Masuda, T., Maeda, T., Paw, B. H., & Orkin, S. H. (2014). Characterization of genomic deletion efficiency mediated by clustered regularly interspaced palindromic repeats (CRISPR)/Cas9 nuclease system in mammalian cells. The Journal of Biological Chemistry, 289(31), 2131221324.CrossRefGoogle ScholarPubMed
Cao, J., Schneeberger, K., Ossowski, S., Günther, T., Bender, S., Fitz, J., Koenig, D., Lanz, C., Stegle, O., Lippert, C., Wang, X., Ott, F., Müller, J., Alonso-Blanco, C., Borgwardt, K., Schmid, K. J., & Weigel, D. (2011). Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nature Genetics, 43(10), 956963.CrossRefGoogle ScholarPubMed
Chang, N., Sun, C., Gao, L., Zhu, D., Xu, X., Zhu, X., Xiong, J.-W., & Xi, J. J. (2013). Genome editing with RNA-guided Cas9 nuclease in zebrafish embryos. Cell Research, 23(4), 465472.CrossRefGoogle ScholarPubMed
Clough, S. J., & Bent, A. F. (1998). Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana . The Plant Journal, 16(6), 735743.CrossRefGoogle ScholarPubMed
Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., Handsaker, R. E., Lunter, G., Marth, G. T., Sherry, S. T., McVean, G., Durbin, R., & 1000 Genomes Project Analysis Group. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 21562158.CrossRefGoogle ScholarPubMed
Defraia, C. T., Schmelz, E. A., & Mou, Z. (2008). A rapid biosensor-based method for quantification of free and glucose-conjugated salicylic acid. Plant Methods, 4, 28.CrossRefGoogle ScholarPubMed
Edwards, K., Johnstone, C., & Thompson, C. (1991). A simple and rapid method for the preparation of plant genomic DNA for PCR analysis. Nucleic Acids Research, 19(6), 1349.CrossRefGoogle ScholarPubMed
Fauser, F., Schiml, S., & Puchta, H. (2014). Both CRISPR/Cas-based nucleases and nickases can be used efficiently for genome engineering in Arabidopsis thaliana . The Plant Journal: For Cell and Molecular Biology, 79(2), 348359.CrossRefGoogle ScholarPubMed
Feng, Z., Mao, Y., Xu, N., Zhang, B., Wei, P., Yang, D.-L., Wang, Z., Zhang, Z., Zheng, R., Yang, L., Zeng, L., Liu, X., & Zhu, J.-K. (2014). Multigeneration analysis reveals the inheritance, specificity, and patterns of CRISPR/Cas-induced gene modifications in Arabidopsis. Proceedings of the National Academy of Sciences of the United States of America, 111(12), 46324637.CrossRefGoogle ScholarPubMed
Feng, Z., Zhang, B., Ding, W., Liu, X., Yang, D.-L., Wei, P., Cao, F., Zhu, S., Zhang, F., Mao, Y., & Zhu, J.-K. (2013). Efficient genome editing in plants using a CRISPR/Cas system. Cell Research, 23(10), 12291232.CrossRefGoogle ScholarPubMed
Gaj, T., Gersbach, C. A., & Barbas, C. F. III (2013). ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends in Biotechnology, 31(7), 397405.CrossRefGoogle ScholarPubMed
Gao, X., Chen, J., Dai, X., Zhang, D., & Zhao, Y. (2016). An effective strategy for reliably isolating heritable and Cas9-free Arabidopsis mutants generated by CRISPR/Cas9-mediated genome editing. Plant Physiology, 171(3), 17941800.CrossRefGoogle ScholarPubMed
Garrison, E., & Marth, G. (2012). Haplotype-based variant detection from short-read sequencing. In arXiv [q-bio.GN]. http://arxiv.org/abs/1207.3907 Google Scholar
Gasiunas, G., Barrangou, R., Horvath, P., & Siksnys, V. (2012). Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proceedings of the National Academy of Sciences of the United States of America, 109(39), E2579E2586.CrossRefGoogle Scholar
Gratz, S. J., Ukken, F. P., Rubinstein, C. D., Thiede, G., Donohue, L. K., Cummings, A. M., & O’Connor-Giles, K. M. (2014). Highly specific and efficient CRISPR/Cas9-catalyzed homology-directed repair in Drosophila. Genetics, 196(4), 961971.CrossRefGoogle ScholarPubMed
Huang, W. E., Huang, L., Preston, G. M., Naylor, M., Carr, J. P., Li, Y., Singer, A. C., Whiteley, A. S., & Wang, H. (2006). Quantitative in situ assay of salicylic acid in tobacco leaves using a genetically modified biosensor strain of Acinetobacter sp. ADP1. The Plant Journal: For Cell and Molecular Biology, 46(6), 10731083.CrossRefGoogle ScholarPubMed
Hyun, Y., Kim, J., Cho, S. W., Choi, Y., Kim, J.-S., & Coupland, G. (2015). Site-directed mutagenesis in Arabidopsis thaliana using dividing tissue-targeted RGEN of the CRISPR/Cas system to generate heritable null alleles. Planta, 241(1), 271284.CrossRefGoogle ScholarPubMed
Jinek, M., Chylinski, K., Fonfara, I., Hauer, M., Doudna, J. A., & Charpentier, E. (2012). A programmable dual-RNA – guided DNA endonuclease in adaptive bacterial immunity. Science, 337(6096), 816821.CrossRefGoogle ScholarPubMed
Kosicki, M., Tomberg, K., & Bradley, A. (2018). Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements. Nature Biotechnology, 36, 765771. https://doi.org/10.1038/nbt.4192 CrossRefGoogle ScholarPubMed
Kroj, T., Savino, G., Valon, C., Giraudat, J., & Parcy, F. (2003). Regulation of storage protein gene expression in Arabidopsis. Development, 130(24), 60656073.CrossRefGoogle ScholarPubMed
Lampropoulos, A., Sutikovic, Z., Wenzl, C., Maegele, I., Lohmann, J. U., & Forner, J. (2013). GreenGate – a novel, versatile, and efficient cloning system for plant transgenesis. PloS One, 8(12), e83043.CrossRefGoogle ScholarPubMed
Liang, F., Han, M., Romanienko, P. J., & Jasin, M. (1998). Homology-directed repair is a major double-strand break repair pathway in mammalian cells. Proceedings of the National Academy of Sciences of the United States of America, 95(9), 51725177.CrossRefGoogle ScholarPubMed
Li, D., Qiu, Z., Shao, Y., Chen, Y., Guan, Y., Liu, M., Li, Y., Gao, N., Wang, L., Lu, X., Zhao, Y., & Liu, M. (2013). Heritable gene targeting in the mouse and rat using a CRISPR-Cas system. Nature Biotechnology, 31(8), 681683.CrossRefGoogle ScholarPubMed
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., & 1000 Genome Project Data Processing Subgroup. (2009). The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), 20782079.CrossRefGoogle ScholarPubMed
Lundberg, D. S., Yourstone, S., Mieczkowski, P., Jones, C. D., & Dangl, J. L. (2013). Practical innovations for high-throughput amplicon sequencing. Nature Methods, 10(10), 9991002.CrossRefGoogle ScholarPubMed
Marek, G., Carver, R., Ding, Y., Sathyanarayan, D., Zhang, X., & Mou, Z. (2010). A high-throughput method for isolation of salicylic acid metabolic mutants. Plant Methods, 6, 21.CrossRefGoogle ScholarPubMed
Mashal, R. D., Koontz, J., & Sklar, J. (1995). Detection of mutations by cleavage of DNA heteroduplexes with bacteriophage resolvases. Nature Genetics, 9(2), 177183.CrossRefGoogle ScholarPubMed
Ma, Y., Lu, H., Tippin, B., Goodman, M. F., Shimazaki, N., Koiwai, O., Hsieh, C. L., Schwarz, K., & Lieber, M. R. (2004). A biochemically defined system for mammalian nonhomologous DNA end joining. Molecular Cell, 16(5), 701713.CrossRefGoogle ScholarPubMed
Peterson, B. A., Haak, D. C., Nishimura, M. T., Teixeira, P. J. P. L., James, S. R., Dangl, J. L., & Nimchuk, Z. L. (2016). Genome-wide assessment of efficiency and specificity in CRISPR/Cas9 mediated multiple site targeting in Arabidopsis. PloS One, 11(9), e0162169.CrossRefGoogle ScholarPubMed
Phillips, J. W., & Morgan, W. F. (1994). Illegitimate recombination induced by DNA double-strand breaks in a mammalian chromosome. Molecular and Cellular Biology, 14(9), 57945803.CrossRefGoogle Scholar
Platt, R. J., Chen, S., Zhou, Y., Yim, M. J., Swiech, L., Kempton, H. R., Dahlman, J. E., Parnas, O., Eisenhaure, T. M., Jovanovic, M., Graham, D. B., Jhunjhunwala, S., Heidenreich, M., Xavier, R. J., Langer, R., Anderson, D. G., Hacohen, N., Regev, A., Feng, G., … Zhang, F. (2014). CRISPR-Cas9 knockin mice for genome editing and cancer modeling. Cell, 159(2), 440455.CrossRefGoogle ScholarPubMed
Robinson, J. T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E. S., Getz, G., & Mesirov, J. P. (2011). Integrative genomics viewer. Nature Biotechnology, 29(1), 2426.CrossRefGoogle ScholarPubMed
Sanger, F., & Coulson, A. R. (1975). A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. Journal of Molecular Biology, 94(3), 441448.CrossRefGoogle ScholarPubMed
Strauss, E. C., Kobori, J. A., Siu, G., & Hood, L. E. (1986). Specific-primer-directed DNA sequencing. Analytical Biochemistry, 154(1), 353360.CrossRefGoogle ScholarPubMed
Thorvaldsdóttir, H., Robinson, J. T., & Mesirov, J. P. (2013). Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics, 14(2), 178192.CrossRefGoogle ScholarPubMed
Wildermuth, M. C., Dewdney, J., Wu, G., & Ausubel, F. M. (2001). Isochorismate synthase is required to synthesize salicylic acid for plant defence. Nature, 414(6863), 562565.CrossRefGoogle ScholarPubMed
Wu, R., Lucke, M., Jang, Y.-T., Zhu, W., Symeonidi, E., Wang, C., Fitz, J., Xi, W., Schwab, R., & Weigel, D. (2018). An efficient CRISPR vector toolbox for engineering large deletions in Arabidopsis thaliana . Plant Methods, 14(1), 65.CrossRefGoogle ScholarPubMed
Xie, K., & Yang, Y. (2013). RNA-guided genome editing in plants using a CRISPR–Cas system. Molecular Plant, 6(6), 19751983.CrossRefGoogle ScholarPubMed
Figure 0

Figure 1. Amplicon preparation. (a) Diagram of the targeted gene, ISOCHORISMATE SYNTHASE 1 (ICS1). Black boxes indicate exons, and grey boxes untranslated regions. The arrow shows the direction of transcription. The numbers at the beginning and end of the gene correspond to the genomic coordinates. (b)–(d) Amplicon preparation. (b) The first PCR step to amplify a specific region of the genome. The oligonucleotide primers in this step fuse the first part of the TruSeq adapters (grey) and the frame shifting nucleotides (red). (c) The second PCR amplification adds the last part of the TruSeq adapters (purple) and one of the 96 barcodes (orange). (d) The final amplicon with frameshifting base pairs(s) (red), TruSeq adapters (grey and purple) and barcode(orange).

Figure 1

Figure 2. Diagrams of the demultiplexing procedure and graphs representing the average number of reads/plate/run. (a) The primary demultiplexing step is carried out by the Illumina software and separates the samples based on the indices that are located within the adapter region into 96 pools. (b) The secondary demultiplexing script (PlexSeq) then assigns the reads to individual plants based on the frameshifting nucleotides. (c)–(f) Each graph shows the average number of reads per plate (≤96 samples) in each run. Each run consisted of different samples. For different runs, different numbers of plates were sequenced depending on the number of samples. (c) MiSeqrun010, (d) MiSeqrun024, (e) MiSeqrun046 and (f) MiSeqrun083. For exact number of samples per run see Supplementary Table S4.

Figure 2

Figure 3. The analysis pipeline and visualized alignments and SA levels of different genotypes. (a) Diagram of the analysis pipeline. BWA-MEM is used for the alignment and the Freebayes algorithm for variant calling. Finally, IGV is used for visualizing the alignments or the vcf files. (b) Alignment that shows a deletion visualized in IGV. On the top track in the coverage panel it is apparent how coverage is decreased at the location of the deletion. The black box indicates the location of the PAM site. (c) Alignment that shows a 1-bp insertion (purple ‘I’) in IGV. The black box indicates the location of the PAM site. (d) SA content of TüWa1-2 wild-type and the derivative TüWa1-2 ics1-1c mutant. (e) SA content of Col-0 reference wild type and isogenic sid2-2 mutant for comparison. SID2 is a synonym for ICS1. Note the very different scale from (d). FT, Fresh Tissue. The whiskers of each boxplot indicate the spread of the data, and the line within the box corresponds to the median value of the data points. The measurements from plants grown in 23°C short-day conditions (8 h light/16 h dark) for 43 days.

Figure 3

Figure 4. Schematic representation of the screening pipeline. Starting from hundreds of samples the amplicon generation takes place by preparing the individuals for sequencing. By the end of the sequencing run the demultiplexing and analysis can take place that can lead to the identification of the desired edited individuals.

Supplementary material: File

Symeonidi et al. supplementary material

Symeonidi et al. supplementary material 1

Download Symeonidi et al. supplementary material(File)
File 2.9 MB
Supplementary material: File

Symeonidi et al. supplementary material

Symeonidi et al. supplementary material 2

Download Symeonidi et al. supplementary material(File)
File 142.8 KB

Author comment: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R0/PR1

Comments

Dear Olivier,

We would like to submit the attached manuscript “CRISPR-finder: A high throughput and cost effective method for identifying successfully edited A. thaliana individuals” for publication in Quantitative Plant Biology.

As you know, CRISPR/Cas9 technology has become the most successful genome editing tools during the last years. Depending on the system, accurate identification of edited individuals is one of the bottlenecks in applying this method in practice, because a large number of individuals needs to be genotyped. Screening using Sanger sequencing or T7 Endonuclease I (T7E1) mismatch assays is expensive and lacks resolution, often adding significant cost and/or experimental burden.

We introduce a high throughput and cost-effective approach for identifying the exact mutations in edited individuals, based on a two-step PCR amplification of the targeted region. We demonstrate the precision and versatility of the method by using it in different A. thaliana accessions. We anticipate that the technique will be widely adopted because of its accuracy and its simplicity. A preprint of this study has been uploaded on bioRxiv (https://www.biorxiv.org/content/10.1101/2020.06.25.171538), and quite a few colleagues have already started to implement the method we describe.

We declare no competing interests. Knowledgeable reviewers for this work include Luca Pinello (Massachusetts General Hospital), Holger Puchta (Karlsruhe Institute of Technology), Astrid Cruaud (INRA), MItchell L Sogin (The Marine Biological Laboratory) and Frederic Bushman (U Penn). We are looking forward to hearing from you.

Yours truly,

Detlef Weigel

Executive Director, Max Planck Institute

Review: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: The manuscript describes an experimental and analysis pipeline for being able to screen large numbers of plants for desired gene editing events. The work addresses a very important issue, as it is currently much easier to induce gene edits using CRISPR/Cas9 than to screen for desired gene editing events in the resulting progeny. The authors also present a nice (and impressive) proof of concept for the potential of efficient gene editing. A well worked out pipeline such as described here is timely and will certainly be of high interests for many plant scientists. Moreover, the manuscript is written and illustrated very well and easy to follow.

Despite my enthusiasm for the manuscript, I am not sure if the manuscript is a good fit for Quantitative Plant Biology. In its present form it is a method paper, but the scope of the journal is described as publishing “ground-breaking discoveries and predictions in quantitative plant science”. One possibility to improve with regards to the journal match might be to actually use the data that should exist according to the manuscript: The same gene edits were performed in 48 different accessions and using two different Cas9 versions. Perhaps, we can learn something about the impact of the genetic background and/or these different Cas9 versions on gene edits. Moreover, I am sure scoring the phenotypes of all these mutants in different accessions would reveal exciting things. However, it is of course it is very likely that these findings will be part of another study.

In addition to this more significant issue, there are a couple of minor issues:

1. In the current form of the manuscript, it is somehow unclear why the different accessions/Cas9 versions are presented in the manuscript. Results for these accessions or Cas9 versions are not presented at all (maybe it is included in the database deposition, but I couldn’t access it).

2. It would be nice to have a bit more discussion on the limits of the approach: Would it work with multi-gRNA promoter editing (it seems that larger INDELs etc. can be frequently caused by that), what bottlenecks remain once the genotyping problem is solved?

3. It might be helpful to explain some of the abbreviations or jargon terms. For instance, L93 MiSeq is not explained, or Trueseq adapters are not.

4. Fig 1a: Are the numbers at the gene model the genomic coordinates?

5. Fig 2c-f: The sample names are the same; Are they repeats or different samples? Also, the y-axis should have scientific notation (20k might confuse some readers)

6. L189: Typo: “Figure”

7. Fig 3d,e: Spell out or define FT in y-axis label; Define properties of the box plot (what are the whiskers, box, vertical line indicating)

8. L233: “free SA in the TüWa1-2 c-ics1-1 mutant were significantly lower” specify stat. test and test result either in Figure 3 or in text;

9. L307ff ”Selection of Cas9-free plants”: How can one be sure that herbicide resistance gene or FP are not silenced (thus the plants would not be Cas9 free)?

10. L347: Typo “Thermo FIsher"

11. L375,L377: Unclear what the ratio refers to.

Review: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: Review Symeonidi et al., september 2020

This is a technical paper providing a method to accurately detect small mutations triggered by the CRISPR-Cas9 tool in Arabidopsis using a cost-efficient NGS method. The authors describe a 2-steps PCR-based multiplexing method in which they amplify the region of interest with a first set of unique frameshifting barcodes. Then after pooling, a second PCR with a set of Illumina barcodes is performed. The information is then deconvoluted in two steps to assign reads to individual plants.

The authors describe the bioinformatics method they set up to separate the reads in two steps. One step with the standard Illumina demultiplexing tool, and then their homemade method (Plexseq) which looks up the secondary (frameshift) barcode as an exact match in the 5’ end of the first PE read, and then, if applicable (PE data) looks at the exact match et the 5’ end of the secondary read.

For the moment, the bioinformatics tool does not allow errors on the secondary read 5’ end, which results in the loss of a few % of the reads. However, this minor amount of “lost “ reads does not affect the final characterization of the DNA sequence in transformed plants.

This is an interesting paper which may help plant groups save time to characterize their different CRISPR lines. As most groups are massively using this emerging tool, they realize that genotyping by Sanger becomes really costly (both in time and money). Hight thoughput methods such as the one decribed in here represent interesting alternatives. Double amplicfications and deconvolution have already been described for other species (Brocal et al., . BMC Genomics. 2016;17:259, for example), but this time the authors cleverly reused/recycled frameshifting barcodes, which were previously described in environmental sequencing for a totally different purpose. I liked this agile approach.

I thus recommend publication for this work, however I have one major comment on how the “wet” steps are described:

As it is presented, it is not easy to clearly understand the “how” of the two steps PCR protocol. This part should be more clearly explained so that the reader can understand the interest of the method. My misunderstanding came from the reading of the original paper on frameshifting nucleotides (Lundberg et al., 2013). In this study, frameshifting nucleotides have been used to capture molecular diversity in complex environmental DNA mixes to avoid biases generated during PCR amplification from complex mixtures. The use of frameshifting primers in this study has a totally different goal finally. I initially thought that these nucleotides were used to capture molecular diversity from each individual with contains a range of somatic mutations, and got confused. Then by looking at the suppl tables describing the 8 different frameshifting nucleotides I understood that these 8 different primers were used to amplify DNA from 8 different individuals. Then these 8 samples are pooled in 1 single batch for the second PCR round. Up to 96 batches of 8 individuals can then be sequenced in one run. Am I right? So maybe this part should be more carefully rephrased so as to make it more explicit and clear.

Are the sample tagged in unique combination (i.e. sample 1 is amplified with G-40604 and G-40607, allowing the same frameshift barcode on both ends) or in combinatorial multiplexing?

Overall, no description of the pooling design is provided, and no description of the actual number of individuals which could be genotyped in the 4 different runs is provided. I think figure 1 should be redesigned to incorporate a description of the exact pooling strategy so as to view the actual scale of the study.

I have a few more minor comments:

There is little information on whether single end reads or paired end are provided by the MiSeq (only one mention of the use of the MiSeq reagent kit V2 which comes rather late in the manuscript and remains elusive on the type of output that was chosen : SE or PE, read length?). What was the average “useful” read length after trimming the to consecutive barcodes? Which % of the reads actually mapped on the amplicon sequence? The authors should be more precise.

Figure 2 should be redesigned with colors matching figure 1 (first PCR with red and grey tags, then second PCR specific colors).

Finally, there are not enough details on sample numbers, sequencing results reads and finally with so little precision the reader cannot really estimate whether this strategy can be cost effective or not, and what is the approximate number of samples from which it becomes really interesting to use this NGS method.

Recommendation: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R0/PR4

Comments

Comments to Author: There were extenuating circumstances that led to a delay in the completion of this review cycle; I appreciate your patience. The result of this evaluation are overall positive, with reviewer 1 requesting minor revisions and reviewer 2 requesting revisions that should not require additional experimentation or analysis. The suggested revisions will make this work more accessible to a broader and interdisciplinary audience. I’d advise to make the required corrections, and add more context (e.g. in the discussion) on how how your protocol will be useful for quantitative plant biology papers (following the reviewer comments). It is my intention to evaluate your textual revisions myself before a final decision.

Decision: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R0/PR5

Comments

No accompanying comment.

Author comment: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R1/PR6

Comments

Dear Olivier,

Many thanks for having our manuscript “CRISPR-finder: A high throughput and cost effective method for identifying successfully edited A. thaliana individuals” considered for publication in Quantitative Plant Biology.

We were very pleased with the positive comments and have hopefully addressed all of them satisfactorily. We are looking forward to hearing from you.

Yours truly,

Detlef Weigel

Executive Director, Max Planck Institute

Review: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R1/PR7

Comments

Comments to Author: I commend the authors on this revised manuscript.

The authors have addressed all points that I had raised, with one minor exception that I suggest they revise:

While they have streamlined the manuscript and removed information about the approaches (accession, Cas9 versions) for which no data is presented, in line 234 (methods section) they still refer to two Cas9 versions. It is not clear which one is the Cas9 version for which the results are reported. Please clarify.

Review: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R1/PR8

Comments

Comments to Author: All the issues I pointed out in the first review have been adressed,all added figures and table improve the manuscript, which is clear and easily understandable .

Recommendation: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R1/PR9

Comments

Comments to Author: Thanks for the revised version. Please make sure you address the minor comment of Reviewer 1.

Decision: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R1/PR10

Comments

No accompanying comment.

Author comment: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R2/PR11

Comments

No accompanying comment.

Recommendation: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R2/PR12

Comments

No accompanying comment.

Decision: CRISPR-finder: A high throughput and cost-effective method to identify successfully edited Arabidopsis thaliana individuals — R2/PR13

Comments

No accompanying comment.