Introduction
Neocinnamomum plants are evergreen shrubs or small trees belonging to the tribe Neocinnamomeae of the family Lauraceae. To date, five Neocinnamomum species have been recorded in the Chinese Flora (Editorial Committee of Flora of China, 1982), namely Neocinnamomum caudatum (N. caudatum), Neocinnamomum fargesii (N. fargesii), Neocinnamomum lecomtei (N. lecomtei), Neocinnamomum mekongense (N. mekongense) and Neocinnamomum delavayi (N. delavayi). Two additional species distributed outside China, Neocinnamomum atjehense (N. atjehense) and Neocinnamomum parvifolium (revised to a branch species of N. delavayi), are recorded in the World Flora. The International Plant Names Index website lists 14 Neocinnamomum species, of which seven remain after removing synonyms, including Neocinnamomum caudatum var. macrocarpum (Xu et al., 2017). Neocinnamomum plants are mainly distributed in the south-central, western and northwestern provinces of China, as well as in Nepal and Sikkim. They mainly grow as shrubs in sparse or dense forests or at forest margins, at an altitude of 1100–2300 m, along the banks of river valleys, beside ditches or on well-drained limestone (Zhang and Jia, 2013). Neocinnamomum plants are important oil trees and valuable medicinal plant resources in China, with multiple economic applications (Nanjing Medical University, 1977; Wang and Yang, 2013). Compared with other oil plants, only a few studies have so far examined the genomes of Neocinnamomum plants; moreover, the degree of development and utilization of their wild resources is extremely low (Gan et al., 2018).
Ribosomal DNA is biparentally inherited, and its phylogenetic analysis can comprehensively reveal parental pedigrees and reticulate evolutionary relationships (Kellogg and Bennetzen, 2004). Presently, the research and application of nuclear genomic sequences mainly focus on the internal transcribed spacer (ITS) sequences of ribosomal DNA (Liston et al., 1996; Mayol and Rosselló, 2001; Wei et al., 2003; Won and Renner, 2005; Kan et al., 2007; Zheng et al., 2008). ITS sequences are generally 565–700 base pairs (bp) long, whereas the ribosomal genome ranges from 6000 to 8000 bp and carries more genetic loci and information. Plant nuclear ribosomal DNA (nrDNA) occurs as highly repeated tandem arrays; however, these repeats are identical or nearly identical in most species because of concerted evolution, especially in ITS1 and ITS2. Owing to their fast evolution, relatively short sequences and many variable sites, ITS1 and ITS2 of the nuclear genome have been used in systematic studies (Schaal and Learn, 1988; Baldwin et al., 1999). ITS sequences are used in studies of evolutionary genetics, including species identification, polyploid origin, hybridization and evolution. Artyukova et al. (2005) established the phylogenetic relationships between Far Eastern species and other species of the family by analysing the ITS sequences of eight species of Araliaceae growing in the Far East of Russia. Luo et al. (2005) analysed the ITS sequences of 51 species and one subgenus of Aconitum from eastern Asia, northern America and Europe. Their results established the taxonomic status of the subgenus Aconitum of Ranunculaceae, were supported by morphological characteristics such as seeds and petals, and corrected errors in the traditional classification. Sudheer Pamidimarri et al. (2009) used ITS sequences to study the phylogenetic relationships and genetic differentiation of Jatropha curcas L. Many studies have shown that ITS sequence analysis is a reliable and well-established approach for studying the genetic relationships of medicinal plants, and it is likely to remain a focus of future research.
Owing to the small genetic differences among species, accurately resolving interspecific relationships from a small number of genetic markers is difficult. In recent years, many studies have preferred complete mitochondrial genomes, chloroplast genomes and nrDNA for exploring the phylogenetic relationships among species (Hsiao et al., 1994; Liu et al., 2014; Dong et al., 2020; Kang et al., 2021; Xin et al., 2023). Among the three types of genomes present in cells, the nuclear genome is the largest and most dominant, and is inherited from both parents (Li, 2006). Presently, publicly available nuclear genomic data for Neocinnamomum plants are relatively limited, which slows down systematic genome-based research on the genus. In this study, we assembled the nrDNA sequences of seven Neocinnamomum species using genome resequencing technologies, constructed high-resolution phylogenetic trees and studied the phylogenetic relationships among the Neocinnamomum species. The aims of this study were as follows: (1) to construct a phylogenetic framework with high support values; (2) to explore the phylogenetic differentiation process in the Neocinnamomum species; and (3) to provide a scientific basis for the development and utilization of germplasm resources, genetic improvement and germplasm resource construction of Neocinnamomum plants.
Materials and methods
Plant materials
In this study, a total of 51 leaf samples from seven Neocinnamomum species were collected from the healthy and mature branches of at least three plants per species (details shown in online Supplementary Table S1). These samples were collected mainly in Yunnan, Sichuan and Guangxi from an altitude ranging from 1100 to 2300 m above sea level. From each plant, two to three leaves were collected and packed in Ziplock bags and dried with silica gel for later use for DNA isolation. These leaves were photographed with colour cards and rulers as references. The specimens were flattened with dry newspaper and cardboard and dried in an electric blast drying oven.
Sequencing
Samples were sent to the Kunming Institute of Botany, Chinese Academy of Sciences (Kunming, China) for nrDNA sequencing. Illumina next-generation sequencing (Illumina, Inc., San Diego, California, USA) and third-generation sequencing technologies were used. These technologies sequence millions of DNA fragments in parallel, allowing an in-depth, detailed and comprehensive analysis of the sequenced genome and transcriptome. The core principle is sequencing by synthesis, in which each base is determined by capturing the signal of the label on the newly synthesized end (Liang et al., 2020).
Genome assembly and annotation
The obtained sequencing data of the nrDNA sequences were assembled using the GetOrganelle software package (Jin et al., 2020). The assembly workflow included (1) extracting all possible reads by extension based on the reference target genes or genomes in the database; (2) extracting the reads of the target genome; (3) removing non-target gene fragments and low-coverage contigs; and (4) final assembly of the genome and output results.
Bandage (Wick et al., 2015) software was used to view, edit and export complete nrDNA sequences. For the samples without a complete structure, the log files were inspected to adjust the corresponding parameters and re-analyse the data; moreover, the output scaffolds.fasta file was examined and further filtered, with closely related species as reference sequences, to generate a complete structure.
The online software GeSeq (Tillich et al., 2017) and Geneious (Kearse et al., 2012) were used to annotate the assembled nrDNA sequences, which were then manually corrected using the open reading frames of the corresponding sequences. Geneious was used to annotate the nrDNA sequences of the seven Neocinnamomum species using Eriobotrya fragrans as the reference.
Genome characterization and mutation analysis
Geneious was used to calculate the size, total GC content, gene content and pseudogenes of each region of the nrDNA. The nrDNA sequences were aligned using the online version of Multiple Alignment using Fast Fourier Transform (MAFFT) v7 (Katoh and Standley, 2013) and manually checked in MEGA X (Kumar et al., 2018). Insertions/deletions (indels) and single-nucleotide polymorphisms (SNPs) were counted and located manually. Additionally, nrDNA hypervariable regions were analysed using mVISTA and DnaSP, and the sliding window method was performed in DnaSP v6.0 (Rozas et al., 2017) to evaluate the nucleotide diversity (Pi) value of the nrDNA sequences.
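As an illustration of what a sliding-window Pi analysis computes, the following minimal Python sketch estimates nucleotide diversity as the average number of pairwise differences per site within successive windows of an alignment. It is not the DnaSP implementation: the function names, the window and step sizes, and the rule of skipping any column that contains a gap or ambiguous base are assumptions made for this example.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (Pi) for aligned, equal-length sequences.

    Assumption: columns with a gap or ambiguous base in any sequence are skipped.
    """
    if len(seqs) < 2:
        return 0.0
    seqs = [s.upper() for s in seqs]
    valid = [i for i in range(len(seqs[0])) if all(s[i] in "ACGT" for s in seqs)]
    if not valid:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a[i] != b[i] for i in valid) for a, b in pairs)
    return diffs / (len(pairs) * len(valid))


def sliding_window_pi(seqs, window, step):
    """Return (window midpoint, Pi) tuples along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment for illustration only; window and step values are hypothetical.
toy = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
for midpoint, pi in sliding_window_pi(toy, window=4, step=4):
    print(midpoint, round(pi, 5))
```

Windows whose Pi exceeds a chosen threshold can then be flagged as candidate hypervariable regions.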
Genome sequence alignment and phylogenetic analysis
The ribosomal gene matrix was first aligned using MAFFT and then manually adjusted and corrected in Geneious to obtain a reliable alignment matrix for subsequent phylogenetic analysis. A maximum likelihood (ML) phylogenetic tree was constructed using IQ-TREE software, with the general time-reversible model and 1000 fast bootstrap replicates. An additional Bayesian inference (BI) tree was constructed using the CIPRES online platform (http://www.phylo.org/). FigTree was used to view and edit the phylogenetic trees and to obtain their support values and branch lengths.
Results
Geographical distribution of the Neocinnamomum plants
ArcGIS software was used to create a heat map of the geographical distribution of the Neocinnamomum species (Fig. 1), which were mainly distributed in Yunnan, Sichuan, Tibet, Guizhou, Hubei, Chongqing and Guangxi of China, as well as in Myanmar, Laos and Vietnam. In China, Guangxi was most abundant in Neocinnamomum species (six species), followed by Yunnan (five species), Tibet (three species), Sichuan (three species), Guizhou (three species), Hubei (two species) and Chongqing (one species).
nrDNA sequence analysis
Genome size and gene content
Across the 51 Neocinnamomum samples, the nrDNA length ranged from 6716 to 6761 bp, and the GC content ranged from 56.3 to 56.9%. The lengths of the external transcribed spacer (ETS) region (including ETS1 and ETS2, namely 5′ ETS and 3′ ETS, respectively), 18S gene, 5.8S gene, 26S gene, ITS1 and ITS2 were 926–940, 1810, 159, 3384–3385, 228–264 and 203–208 bp, respectively, and the corresponding GC contents were 57.1–59.2, 50.4–50.5, 57.9, 57.9–58.3, 63.1–65.9 and 68.8–70.4%, respectively (online Supplementary Table S2).
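The per-region GC contents reported above were obtained in Geneious; the calculation itself can be sketched in a few lines of Python. The function names, the toy sequence and the region coordinates below are hypothetical and serve only to show the arithmetic; gaps and ambiguous bases are ignored, which is an assumption of this sketch.

```python
def gc_content(seq):
    """Return GC content (%) over unambiguous bases; gaps and N are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def region_gc(seq, regions):
    """Map each region name to its GC content.

    Coordinates are 0-based, half-open (start, end) pairs.
    """
    return {name: gc_content(seq[start:end]) for name, (start, end) in regions.items()}


# Toy sequence and hypothetical coordinates, for illustration only.
toy = "GGCCGGCCAATTAATT"
toy_regions = {"regionA": (0, 8), "regionB": (8, 16)}
print(region_gc(toy, toy_regions))
```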
nrDNA mutation analysis
A total of 27 indels were identified in all Neocinnamomum nrDNA sequences (Table 1), with the 26S gene having one, both the ETS regions having 12, ITS1 having 13, ITS2 having one and 18S and 5.8S genes having no indels. A total of 184 SNPs were detected in the nrDNA sequences (Fig. 2), including 51 transitions (TS) and 133 transversions (TV). The ratio of TS to TV was 1:0.38. A total of 20 SNPs were detected in the three RNA genes, with the 5.8S gene having none, the 18S gene having two and the 26S gene having 18 SNPs. Among the transcribed spacer regions, 118 SNPs were detected in both the ETS regions combined, 20 in ITS2 and 26 in ITS1.
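For readers who wish to reproduce this kind of tally, the short Python sketch below classifies a substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine or the reverse) and checks that the per-region counts reported above add up to the stated totals. The function name and the dictionary layout are assumptions of this example; the counts are those given in the text, with both ETS regions combined.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(base_a, base_b):
    """Classify a substitution between two different unambiguous bases."""
    pair = {base_a.upper(), base_b.upper()}
    if len(pair) != 2 or not (pair <= (PURINES | PYRIMIDINES)):
        raise ValueError("expected two different bases from A, C, G, T")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Per-region counts as reported in the text (both ETS regions combined).
indels = {"ETS": 12, "18S": 0, "ITS1": 13, "5.8S": 0, "ITS2": 1, "26S": 1}
snps = {"ETS": 118, "18S": 2, "ITS1": 26, "5.8S": 0, "ITS2": 20, "26S": 18}

print("total indels:", sum(indels.values()))
print("total SNPs:", sum(snps.values()))
print("SNPs in RNA genes:", snps["18S"] + snps["5.8S"] + snps["26S"])
```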
Analysis of hypervariable regions
The Pi values of the nrDNA sequences of the 51 Neocinnamomum samples ranged from 0 to 0.09111 (Fig. 3), with an average of 0.020188. Four hypervariable regions, with a Pi value greater than 0.04, were found in the nrDNA sequences, and all were located in the transcribed spacer regions. Among them, ETS2 had the highest Pi value (0.09111), followed by ETS1 (0.06039), whereas the Pi values of ITS1 and ITS2 were relatively low, with a maximum value of 0.04292.
Phylogenetic analysis
The length of the nrDNA sequences of the 51 Neocinnamomum samples ranged from 6716 to 6761 bp and the alignment matrix was 6790 bp in length after manual adjustment. In this study, two phylogenetic trees (i.e. ML and BI trees) were reconstructed based on the nrDNA sequences of the seven Neocinnamomum species (Fig. 4), with Caryodaphnopsis henryi as the outgroup. The ML tree divided the Neocinnamomum species into four clades, with clade I consisting of N. macrocarpum and N. caudatum, clade II with N. delavayi and N. mekongense, clade III with N. fargesii and a branch species of N. delavayi and clade IV with N. lecomtei constituting the most basal clade. All support values of the BI tree were one, except for one Neocinnamomum species, which had a support value of 0.98. The topological structure of the BI tree was consistent with that of the ML tree.
Discussion
In this study, the phylogenetic tree constructed from nrDNA divided the Neocinnamomum species into four clades: clade I consisted of N. macrocarpum and N. caudatum, clade II of N. delavayi and N. mekongense, clade III of N. fargesii and a branch species of N. delavayi, and clade IV of N. lecomtei, which formed the most basal clade. Our result contrasts with an earlier ML phylogenetic study based on the complete chloroplast genomes of N. delavayi, N. mekongense and N. lecomtei, which revealed, with high support values, that N. delavayi was a sister species to N. mekongense and N. lecomtei (Ren et al., 2019). One explanation is that, in most higher plants, the mitochondrial and chloroplast genomes are maternally inherited, whereas nrDNA is biparentally inherited and can therefore reflect the genetic relationships between species more faithfully (Zong, 2019). The ribosomal genome and ITS sequences are thus more suitable than the chloroplast and mitochondrial genomes for resolving interspecific relationships. Qu (2022) constructed three phylogenetic trees of 44 Caryodaphnopsis using nrDNA, chloroplast and mitochondrial genomes, and found serious conflicts among the three trees. Compared with the ribosomal gene tree, the chloroplast and mitochondrial genomes performed poorly in resolving interspecific relationships. These results not only verify the effectiveness of nuclear genome sequences in classifying interspecific relationships, but also show that chloroplast and mitochondrial genomes have limitations in defining subgenus relationships.
Hybridization has been reported in plant lineages and is considered to be a key mechanism of plant evolution, far more common in plants than in animals. Hybridization can result in gene tree conflicts, especially between uniparentally inherited chloroplast or mitochondrial fragments and biparentally inherited nuclear gene fragments (Soltis and Kuzoff, 1995). Sequences from different genomes of hybrid species often reflect different genetic systems (e.g. mitochondrial genes from fathers, plastid genes from maternal lines and nuclear genes from two parents), which may lead to inconsistencies between these data sources and thus to differences between the nrDNA phylogenetic tree and the trees constructed from chloroplast and mitochondrial genomes. For example, a serious conflict between nuclear and plastid gene trees was demonstrated in the construction of the Cymbidium phylogeny using ITS and chloroplast genes (Zhang et al., 2021).
Neocinnamomum has been confirmed to be monophyletic by many scholars, but many controversies remain in the systematic study of the genus. Li et al. (2016) performed a phylogenetic study on Caryodaphnopsis. Among the four Neocinnamomum species included, N. delavayi and N. mekongense were closely related sister groups, which then clustered in turn with N. fargesii and N. lecomtei, consistent with the results of this study. Wang et al. (2010) suggested that N. delavayi and N. mekongense, and N. lecomtei and N. caudatum, were sister groups, respectively, whereas N. fargesii was determined as a basal and monophyletic group. Our findings are not consistent with those results. The reason may be that Wang et al. (2010) used only the chloroplast intergenic spacer and three nrDNA fragment sequences to construct their phylogenetic tree, so its accuracy is lower than that of trees constructed from the chloroplast genome and the long nrDNA sequence.
Conclusions
In this study, we sequenced the genomes of 51 individual leaf samples of seven Neocinnamomum species by next-generation and third-generation sequencing technologies with a sequencing depth of 30×, and about 6 Gb raw data and 2 Gb clean data were obtained for each individual. All 51 nrDNA sequences were assembled using GetOrganelle software, and nrDNA mutations were analysed. Four hypervariable regions were found in the nrDNA sequences, all in the transcribed spacer regions, with ETS2 (the 3′ ETS, located downstream of the 26S gene) having the highest degree of variation. A total of 27 indels and 184 SNPs were detected in the nrDNA sequences, both of which mainly occurred in the ETS and ITS regions. Finally, ML and BI phylogenetic trees were constructed based on the nrDNA sequences, and both divided the Neocinnamomum species into four clades. Clade I included N. macrocarpum and N. caudatum, clade II included N. delavayi and N. mekongense, clade III included N. fargesii and a branch species of N. delavayi, and clade IV included N. lecomtei, which formed the most basal clade. In the BI tree, all the support values were one except for that of one Neocinnamomum species (PP = 0.98), and the topological structure of the BI tree was consistent with that of the ML tree.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1479262123000771.
Data
The plastid marker matK sequence data of the 163 Lauraceae species will be submitted to LCGDB (https://lcgdb.wordpress.com) through the revision process.
Acknowledgements
The authors are grateful for the financial support from the Yunnan Province Science and Technology Talents and Platform Project.
Author contributions
Sampling, Q. Y. and W. X.; data analysis, L. Y., Y. X. and Q. L.; thesis writing, Q. L.; conceived and designed the experiments, Y. S. and P. X.; project fund support provider, P. X. All authors have read and agreed to the published version of the manuscript.
Funding statement
This research was funded by the Yunnan Province Science and Technology Talents and Platform Project (No. 202205AF150022).
Competing interest
None.