Introduction
Camellia fascicularis H. T. Chang is a rare plant species unique to Yunnan province, China. It has high ornamental and medicinal value, and offers new gene sources for the genetic improvement of Camellia crops through interspecific hybridization (Cheng et al., 1994; Zhao et al., 1998; Kell et al., 2015; Zhang et al., 2015). In particular, it is considered a source of the special golden flower colour in Camellia, and has been called the ‘queen of Camellia’ and the ‘panda of the plant world’. Wild C. fascicularis is mainly distributed in Gejiu, Hekou and Maguan counties, Yunnan province, which account for 93.8% of the total resources in Yunnan (Zhang et al., 2015). It is rich in tea polysaccharides, tea polyphenols, saponins, flavonoids, tea pigment, caffeine, protein, vitamin B1, vitamin B2, vitamin C, vitamin E, folic acid, fatty acid, β-carotene and other natural nutrients, which make it a valuable economic species. During the flowering season, local residents often pick its flowers for use in traditional Chinese medicine, which prevents the plants from setting seed and regenerating, so the wild C. fascicularis population is in danger of extinction (Ma et al., 2013). C. fascicularis was included in Yunnan province's 2010–2015 Minispecies Rescue and Protection Emergency Action Plan, and was also classified as critically endangered based on the International Union for Conservation of Nature assessment criteria (Jiang and Fan, 2003). Meanwhile, since the 1970s breeders have attempted to improve ornamental value through interspecific crosses between Camellia japonica and Camellia chrysantha, between C. fascicularis and Camellia oleifera, and between C. fascicularis and C. japonica, among others, but breeding Camellia has proved very difficult and few new cultivars derived from C. fascicularis have been bred anywhere in the world (Parks and Scogin, 1987; Parks, 1989; Uemoto et al., 1991; Cheng et al., 1994; Zhao et al., 1998). Understanding the genetic structure between and within species is a fundamental goal of germplasm collection, crop improvement and conservation (Mercati et al., 2015). Therefore, there is an urgent need to characterize the still unknown genetic background, structure, relationships and diversity of natural C. fascicularis populations in order to develop a strategy to prevent biodiversity loss, safeguard plant populations and support future breeding programmes through in vitro propagation.
Molecular markers are routinely used to analyse genetic variation in plants (Abbate et al., 2020). Among DNA marker types, simple sequence repeat (SSR) markers are relatively rich in polymorphism, co-dominantly inherited, distributed genome-wide and highly reliable, and are therefore preferred over other DNA markers (Ju et al., 2015; Si et al., 2019; Bhandawat et al., 2020; Joseph et al., 2020; Tirfessa et al., 2020; Wani et al., 2020). Previously, 10, 12, 21 and 9 SSR markers were identified in Camellia nitidissima (Chen et al., 2010), Camellia huana (Li et al., 2020), Camellia pingguoensis (Lu et al., 2014) and Camellia tunghinensis (Tang et al., 2014), respectively. Furthermore, genetic diversity and population structure were studied using SSR markers in three endangered Camellia species: Camellia chrysanthoides, Camellia micrantha and Camellia parvipetala (Chen et al., 2019). After several expressed sequence tag-derived simple sequence repeats (EST-SSRs) were identified from the 35,410 unigene sequences of Camellia chuongtsoensis, the genetic diversity of this species was analysed using 10 of the EST-SSRs (Shao et al., 2015). The genetic diversity of the C. nitidissima population and its variety microcarpa from the Nanning area was analysed using 10 SSR markers (Lu et al., 2019). 
Nevertheless, the development of SSR markers in C. fascicularis has seldom been reported, and much work remains to be done to identify SSR markers and analyse its population diversity for its protection. Here, we sequenced the transcriptome and characterized the EST-SSR markers in this species, and analysed the diversity of C. fascicularis. Our results will facilitate the strategizing of long-term conservation programmes for genetic resources, such as in situ and ex situ conservation, seed collection and tree in vitro propagation.
Materials and methods
Materials
A total of 40 C. fascicularis accessions were collected as experimental samples in Daweishan National Nature Reserve, Yunnan province, China: 10 from Shazhudi (SZD) in Gejiu city, 10 from Nanxi (NX) in Hekou county, 10 from Xiaonanxi (XNX) in Hekou county and 10 from Jianshanjiao in Maguan (MG) county. A further six Camellia pubipetala individuals from a single Longmen (LM) population (Longmen town, Daxin county, Guangxi province) were used to test the polymorphism of the SSR markers. For sample collection, fresh leaves were taken from each tree, quickly dried with self-indicating silica gel and stored in ziplock bags. The samples used for transcriptome sequencing came from trees from Gejiu and Hekou that we had collected and planted in the greenhouse of Southwest Forestry University in Kunming. Details of the samples are shown in Table 1.
N, number of individuals.
a Voucher specimens deposited at the Herbarium of Southwest Forestry University, Kunming, China.
Methods
Extraction of DNA and RNA, and transcriptome sequencing
Genomic DNA of C. fascicularis was extracted using the EZ-10 Spin Column Plant Genomic DNA Purification Kit (Sangon Biotech, Shanghai). Total RNA was extracted from leaf samples of the greenhouse-grown trees using the TRIzol method (Invitrogen TRIzol, ThermoFisher Scientific, Shanghai), and mRNA was enriched using magnetic oligo(dT) beads. The mRNA was then broken into short fragments, cDNA was synthesized by reverse transcription polymerase chain reaction (RT-PCR), and the resulting cDNA fragments were purified using the AMPure XP system (Beckman Coulter, Beverly, USA). Transcriptome sequencing was then performed on the Illumina HiSeq™ 2000 platform (Novogene, Beijing).
Identification of SSR markers
After sequencing, raw reads in FASTQ format were processed with in-house Perl scripts, and clean reads were obtained by removing reads containing adapters, reads containing poly-N and low-quality reads. The MISA Perl script (MIcroSAtellite identification tool, www.mybiosoftware.com/misa-microsatellite-identification-tool.html) was used to screen the EST sequences for microsatellite loci (Wang et al., 2012). Primer3 software was then used to design SSR primers from the screened EST sequences (Rozen and Skaletsky, 2000), with the following settings: primer length 18–25 bp, annealing temperature 55–65°C, PCR product length 100–500 bp and GC content 40–60%. PCR amplification was performed in a 20 μl reaction volume consisting of 5 ng of template DNA, 2× Taq Master Mix (0.1 U/μl Taq DNA polymerase, 3 mM MgCl2, 0.4 mM dNTPs, 2× PCR buffer) and 0.5 μM of each forward and reverse primer. Reactions were run on an S1000 thermal cycler (Bio-Rad, USA) with the following programme: 5 min at 95°C; 35 cycles of denaturation at 94°C for 45 s, annealing at 57–60°C (depending on the marker) for 45 s and extension at 72°C for 1 min; and a final extension at 72°C for 10 min. Finally, PCR products were checked on 1.5% agarose gels, and primer pairs that yielded clear bands were carried forward to capillary electrophoresis.
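The screening and primer-filtering steps described above can be illustrated with a minimal Python sketch. It is not the MISA or Primer3 implementation: it only finds perfect repeats with a regular expression and checks a primer against the stated length and GC ranges. The minimum repeat counts in `MIN_REPEATS` and the function names are assumptions made for illustration, since the thresholds used in the study are not given in the text.

```python
import re

# Minimum repeat counts per motif length. These thresholds are illustrative
# assumptions; the MISA settings used in the study are not stated in the text.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit, minimum in sorted(min_repeats.items()):
        # One captured unit followed by at least (minimum - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, minimum - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


def primer_passes_filters(primer, min_len=18, max_len=25, min_gc=40.0, max_gc=60.0):
    """Check a primer against the length and GC-content ranges given in the text."""
    primer = primer.upper()
    gc = 100.0 * sum(primer.count(base) for base in "GC") / len(primer)
    return min_len <= len(primer) <= max_len and min_gc <= gc <= max_gc
```

For example, `find_ssrs("GGC" + "AG" * 8 + "TTT" + "AAG" * 5 + "C")` reports one di-nucleotide and one tri-nucleotide locus. Annealing temperature and product size are left to the primer design software and are not modelled here.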
The forward primers of the 20 SSR markers retained after preliminary screening were labelled with one of two fluorescent dyes (FAM or HEX; Applied Biosystems; Table 2). Multiplex SSR genotyping was carried out using multiplex-ready PCR technology (Hayden et al., 2008), and the products were detected by capillary electrophoresis (ABI 3730, Applied Biosystems, Foster City, CA).
a Monomorphic.
Analysis of locus polymorphisms and genetic structure
Polymorphism at each locus was calculated using GenAlEx 6.5 (Peakall and Smouse, 2006), and Na (number of alleles at each locus), Ho (observed heterozygosity) and He (expected heterozygosity) were analysed using POPGENE 32 (Yeh et al., 1999). Based on Nei's genetic distance and genetic identity, a cluster map of the populations was constructed using NTSYS 2.10 software.
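The statistics named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GenAlEx, POPGENE or NTSYS implementation: He is taken as the uncorrected 1 − Σp², identity and distance follow Nei's standard definitions (I = Jxy / √(Jx·Jy), D = −ln I), and the function names and genotype layout (a list of allele-size pairs per locus) are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def allele_freqs(genotypes):
    """Allele frequencies from diploid genotypes given as (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def locus_stats(genotypes):
    """Return (Na, Ho, He) for one locus; He is the uncorrected 1 - sum(p^2)."""
    freqs = allele_freqs(genotypes)
    na = len(freqs)
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    he = 1.0 - sum(p * p for p in freqs.values())
    return na, ho, he


def nei_identity_distance(pop_x, pop_y):
    """Nei's standard identity I and distance D between two populations.

    pop_x and pop_y map locus name -> list of genotypes; both must share loci.
    D is undefined (log of zero) if the populations share no alleles at all.
    """
    jx = jy = jxy = 0.0
    for locus in pop_x:
        fx = allele_freqs(pop_x[locus])
        fy = allele_freqs(pop_y[locus])
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    identity = jxy / sqrt(jx * jy)
    return identity, -log(identity)
```

With four individuals genotyped as (155, 155), (155, 158), (158, 161) and (155, 161), `locus_stats` gives three alleles, Ho = 0.75 and He = 0.625. Programs that apply a small-sample correction to He will report slightly different values.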
Results
Transcriptome sequencing yielded 57,518,636 raw reads and 54,417,600 clean reads. Clean reads were assembled and clustered to obtain 155,011 unigenes, from which 95,979 microsatellite loci were identified. Among these SSR loci, a total of 153 types of nucleotide repeat motifs with 2–6 nucleotides were found, including 71.44% di-nucleotide repeats (four types of repeats), 25.48% tri-nucleotide repeats (10 types of repeats), 2.38% tetra-nucleotide repeats (26 types of repeats), 0.28% penta-nucleotide repeats (39 types of repeats) and 0.42% hexa-nucleotide repeats (74 types of repeats). Among the di-nucleotide motifs, AG/CT showed the highest frequency, accounting for 72.69%, followed by AT/AT (19.41%) and AC/GT (7.75%), with CG/CG the lowest at 0.15%. Among the tri-nucleotide motifs, the proportions were: 27.52% AAG/CTT, 15.15% ACC/GGT, 12.85% ATC/ATG, 11.74% AAT/ATT, 9.84% AGG/CCT, 7.82% AAC/GTT, 5.79% AGC/CTG, 5.0% CCG/CGG, 2.24% ACT/AGT and 2.05% ACG/CGT (online Supplementary Fig. S1). Detailed information about these SSRs is given in the Supplementary materials (online Supplementary Table S1).
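Motif classes such as AG/CT arise because a repeat read from a different starting position, or from the complementary strand, is the same locus. The Python sketch below groups motifs on that basis by taking the alphabetically smallest string among all rotations of a motif and of its reverse complement. The function names are illustrative, and this grouping rule is an assumption about how the classes were formed rather than a description of the software used.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rotations(motif):
    """All cyclic shifts of a motif, e.g. AAG -> AAG, AGA, GAA."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def reverse_complement(motif):
    return motif.translate(COMPLEMENT)[::-1]


def canonical(motif):
    """Smallest representative among rotations on both strands."""
    return min(rotations(motif) + rotations(reverse_complement(motif)))


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def motif_classes(unit):
    """Sorted canonical classes for all primitive motifs of a given length."""
    motifs = ("".join(p) for p in product("ACGT", repeat=unit))
    return sorted({canonical(m) for m in motifs if is_primitive(m)})
```

Under this rule there are exactly four possible di-nucleotide classes and 10 possible tri-nucleotide classes, which matches the number of types reported for those two categories; for example, `canonical("CTT")` returns `"AAG"` and `canonical("CCT")` returns `"AGG"`.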
From these SSR loci, we designed new SSR primers, and PCR products were initially validated on 1.5% agarose gels using C. fascicularis samples. Then, 20 EST-SSR markers with clear bands and stable amplification were selected for capillary electrophoresis. Of these 20 EST-SSR loci, six were monomorphic in the C. fascicularis population, while the remaining 14 (YJ01, YJ05, YJ10, YJ11, YJ12, YJ18, YJ19, YJ25, YJ27, YJ28, YJ35, YJ37, YJ38, YJ43) were highly polymorphic (Table 2). These 14 markers consisted of tandemly repeated di- or tri-nucleotide motifs. The number of repeats ranged from five to nine, and the size of the amplified fragments ranged from 155 to 254 bp (Table 2). The number of alleles (Na) ranged from 2 to 8, with a mean of 4.86, indicating relatively high polymorphism in the population. YJ43 had the highest number of effective alleles, and YJ01 and YJ11 had the lowest. Marker diversity was assessed using the number of alleles (Na) and expected heterozygosity (He), which are given in Table 2. Finally, the EST-SSR primer sequences were submitted to the NCBI database as public molecular information (Table 2).
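The effective number of alleles mentioned above is commonly defined as the reciprocal of the sum of squared allele frequencies, Ne = 1 / Σp². The short Python sketch below uses that definition; whether the software used in the study applies exactly this formula is an assumption, and the function name and genotype layout are illustrative.

```python
from collections import Counter


def effective_alleles(genotypes):
    """Effective number of alleles, 1 / sum(p^2), from diploid (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 / sum((count / total) ** 2 for count in counts.values())
```

When all alleles are equally frequent, Ne equals the observed number of alleles; it falls towards 1 as a single allele comes to dominate, which is why a locus with several rare alleles can have a high Na but a low Ne.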
The DNA sequences of the 14 candidate markers showing polymorphisms among the four populations of C. fascicularis were submitted to GenBank (Table 2). The number of alleles per locus ranged from 2 to 5, He ranged from 0.183 to 0.683, and Ho ranged from 0.201 to 0.700 (Table 3). Based on the genetic structure in C. fascicularis, the four natural populations from different growth regions were classified into three clusters: the population of MG in one group, the SZD population of Gejiu city in a second group, and the NX and XNX populations of Hekou county in the third group (Fig. 1). Cross-species amplification of the 14 newly developed polymorphic markers was tested in six C. pubipetala individuals from a single LM population (Longmen town, Daxin county, Guangxi province, Table 1), using the same procedures described above. Five loci (35.71%) were successfully amplified in all C. pubipetala individuals tested (Table 4).
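How a cluster map can be derived from pairwise genetic distances is sketched below with a plain Python implementation of UPGMA (average-linkage clustering). The text does not state which clustering method was used in NTSYS, so UPGMA is an assumption, and the distance values in the usage note are invented purely to show the mechanics; they are not results from this study.

```python
def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering.

    labels: population names.
    dist: dict mapping (name_a, name_b) pairs to a distance, one entry per pair.
    Returns a list of merges as (cluster_a, cluster_b, height), where merged
    clusters are represented as nested tuples and height is half the distance.
    """
    sizes = {label: 1 for label in labels}
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)          # closest pair of current clusters
        a, b = tuple(pair)
        new = (a, b)
        merges.append((a, b, d[pair] / 2))
        for c in sizes:
            if c not in pair:
                # Size-weighted average of the distances to the merged members.
                d[frozenset((new, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        sizes[new] = sizes.pop(a) + sizes.pop(b)
    return merges
```

Called with the four population codes and hypothetical distances such as `{("NX", "XNX"): 0.05, ("SZD", "NX"): 0.20, ("SZD", "XNX"): 0.22, ("MG", "SZD"): 0.30, ("MG", "NX"): 0.40, ("MG", "XNX"): 0.42}`, the function first joins NX and XNX, then adds SZD, and joins MG last.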
Na, number of alleles per locus; He, expected heterozygosity; Ho, observed heterozygosity; N, number of individuals sampled.
Na, number of alleles per locus; He, expected heterozygosity; Ho, observed heterozygosity; N, number of individuals sampled.
a Basic information provided in Table 1.
b GenBank accession number for the cross-amplified markers in C. pubipetala.
Discussion
Owing to the lack of a reference genome sequence and the difficulty of sampling wild populations of C. fascicularis, developing a credible molecular marker platform for genetic studies of this plant is challenging. To date, analyses of genetic diversity and structure in this species are lacking. In this study, we first performed transcriptome sequencing, contributing high-throughput EST sequence data to public databases. Such data are very useful for genetic studies, especially in species without genomic information (Zalapa et al., 2012; Wani et al., 2020). The EST-SSR markers developed and characterized in this study provide a strong basis for detecting genetic diversity and structure in C. fascicularis and related Camellia species. The markers were highly polymorphic, with a substantial number of alleles at each locus, and revealed high variation and diversity among the 40 accessions of C. fascicularis.
SSR markers have already been developed and used in genetic diversity analysis in several related Camellia species. For example, 10 SSR markers with different di-nucleotide repeat motifs were developed from genomic DNA fragments, using magnetic bead enrichment, in C. nitidissima, showing 5–12 alleles at each locus (Chen et al., 2010). In another study, 21 SSR markers were developed from genomic sequence, using 454 sequencing, in C. pingguoensis. The markers were characterized in 32 accessions of C. pingguoensis and 28 accessions of Camellia impressinervis. The number of alleles ranged from 3 to 9 at different loci, the observed heterozygosity (Ho) ranged from 0.20 to 0.96, and the expected heterozygosity (He) ranged from 0.25 to 0.87 (Lu et al., 2014). For C. huana, genetic diversity was characterized using 12 SSR markers selected from those previously developed in the related species C. pingguoensis and Camellia flavida (Li et al., 2020). Similarly, for C. tunghinensis Chang, SSR genotyping was optimized using 31 SSR markers developed from C. nitidissima, C. japonica and Camellia sinensis (Tang et al., 2014). SSR markers were also the basis for analysing genetic diversity and population structure in three endangered Camellia species: C. chrysanthoides, C. micrantha and C. parvipetala (Chen et al., 2019). SSR markers were likewise used to study the population of C. chuongtsoensis (Shao et al., 2015), as well as that of C. nitidissima and its variety microcarpa from Nanning (Lu et al., 2019). Overall, these SSR markers provide a good basis for investigating the population genetics and conservation biology of Camellia species.
Information on genetic diversity, population structure and gene flow is required to initiate successful conservation of rare and endangered species (Ren et al., 2014; Zaya et al., 2017). Of the 20 EST-SSR markers examined in this study, 14 were polymorphic and highly informative in differentiating the 40 accessions of C. fascicularis from different geographic origins and in grouping these accessions into three clusters. The genetic variation among clusters was lower than that within each population, which showed that there was almost no gene flow among the three clusters. We suggest that more attention should be paid to in situ and ex situ conservation of C. fascicularis in this narrow growth region. The polymorphic markers developed in this study will be helpful in conservation efforts to maintain the genetic diversity of this threatened species, and in the genetic improvement of new Camellia cultivars by interspecific hybridization.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1479262123000138.
Acknowledgements
This study was supported by grants from the Scientific Research Fund Project of Yunnan Education Department (2021Y248) and National Key Research and Development Project (2017YFD060120204).
Author contributions
Peiyao Xin and Luyao Ma conceived and designed the experiments; Luyao Ma analysed the data; Bin Li and Cheng Liu performed the experiments; Junrong Tang, Jing Xin, Yaxuan Xin and Peng Ye contributed to reagents/materials/analysis tools; Bin Bai and Luyao Ma summarized data and wrote the manuscript.
Conflict of interest
The authors declare that they have no conflict of interest.