Implications
Demand for global animal protein and pressures for cropping land will combine to drive increased feed efficiency in ruminants from low quality cellulosic feeds. However, livestock production is responsible for a large proportion of global agricultural greenhouse gas emissions. Observing and understanding the highly complex relationships that influence the rumen microbiota are essential for designing methods to successfully intervene and manipulate the system towards a desired phenotype.
Introduction
The complex relationship between the host ruminant and its inhabitant microbiota has been the focus of research for decades, initially centred around identification of the types of microbiota that reside within the rumen through to a greater understanding of their functional contribution to the host’s energy requirements. As new technology has become available, what originally involved the isolation and detailed studies of single strains in the laboratory has now moved to large scale sequencing of ‘total’ rumen microbiota nucleic acids (metagenomics and metatranscriptomics), proteomics and metabolomics. Notwithstanding the limitations of these new techniques, which will be discussed in more detail throughout this review, the adoption of these techniques has been rapid and applied to most ruminant production systems. Initially to define the variance in rumen bacterial populations with diet shifts, which can lead to digestive disorders such as acidosis. Then in the last decade, the emphasis has been focussed around the understanding of the rumen microbiota’s contribution to agricultural greenhouse gas emissions, predominantly methane. While now there is increased interest on defining the rumen microbiota of an efficient production animal (meat and dairy) and the influence of the host genetics on shaping the microbiota of the rumen (Jami et al., Reference Jami, White and Mizrahi2014).
Characterising the rumen microbiota using taxonomic marker genes
Considerations and limitations
Metataxonomics has been recommended as the term for defining high throughput sequencing analysis of amplified taxonomic marker genes, whereas metagenomics refers to the use of shotgun sequencing approaches to characterise the potential function of the microbiota based on their genomes (Marchesi and Ravel, Reference Marchesi and Ravel2015). Identifying the microbiota that resides within the rumen and variations to the composition in response to perturbations is critical for the development of our understanding of the complex dynamics that exist within the rumen environment. Metataxonomics is routinely employed in ruminant studies and is relatively inexpensive as it easily allows for the pooling of many samples and reasonable sequence depth per analysis compared with other metaomic techniques. General primer sets are used to amplify sequences representing the archaeal and bacterial members by targeting the 16S rRNA gene, whereas the 18S rRNA gene is used for eukaryotic targets, essentially protists. Due to the high degree of sequence similarity found within the fungal 18S rRNA, the anaerobic fungal populations require targeting of the length polymorphic internal transcribed spacer sequence that lies between the ribosomal genes (Dore and Stahl, Reference Dore and Stahl1991; Kittelmann et al., Reference Kittelmann, Naylor, Koolaard and Janssen2012). Although more recently, focus has shifted towards the D1/D2 region at the 5' end of the 28S rRNA gene (Edwards et al., Reference Edwards, Forster, Callaghan, Dollhofer, Dagar, Cheng, Chang, Kittelmann, Fliegerova, Puniya, Henske, Gilmore, O’Malley, Griffith and Smidt2017). The choice of DNA extraction method, amplification primer sets, sequencing platform and bioinformatics workflows will all influence the final analysis (Gantner et al., Reference Gantner, Andersson, Alonso-Saez and Bertilsson2011; Henderson et al., Reference Henderson, Cox, Kittelmann, Miri, Zethof, Noel, Waghorn and Janssen2013; Klindworth et al., Reference Klindworth, Pruesse, Schweer, Peplies, Quast, Horn and Glöckner2013). The variance in these methods and how they are applied makes reconciling results between research groups difficult, although using consistent methods and control samples can produce valid observations (Henderson et al., Reference Henderson, Cox, Kittelmann, Miri, Zethof, Noel, Waghorn and Janssen2013).
Phylogenetic gene analysis is based on the sequence similarity or, more correctly, evolutionary divergence of sequences between defined taxonomic units. Initially, and at least for the full length 16S rRNA gene, a value of 97% sequence similarity was used to define a species level rank (Stackebrandt and Goebel, Reference Stackebrandt and Goebel1994). However, this has now been redefined to a recommended value of 98.5% (Konstantinidis and Tiedje, Reference Konstantinidis and Tiedje2007). This does not hold true for the shorter sequences generated from the next generation sequencing technologies and values of 99% to 100% are more common for defining an operational taxonomic unit (OTU) rather than to a 97% species rank (Callahan et al., Reference Callahan, McMurdie, Rosen, Han, Johnson and Holmes2016). Using a value of 100% is also likely to generate multiple OTUs from the same species due to species which possess multiple copies of the 16S rRNA gene and that these copies are known to not be identical (Větrovský and Baldrian, Reference Větrovský and Baldrian2013). The choice of variable region targeted along with relaxing the identity threshold to 99% to allow for possible polymorphism effects may mitigate these. The most popular variable region currently being targeted is the V4 region, with primers that cover both bacterial and archaeal populations, producing an amplicon size amenable to the Illumina MiSeq sequencing platform (Kozich et al., Reference Kozich, Westcott, Baxter, Highlander and Schloss2013).
As OTUs are defined by sequence similarity, it is imperative that errors arising from sample processing and sequencing platforms are minimised and removed (Kunin et al., Reference Kunin, Engelbrektson, Ochman and Hugenholtz2009). Software methods to correct for sequencing platform errors are available and widely employed to improve the quality of sequence data. (Kunin et al., Reference Kunin, Engelbrektson, Ochman and Hugenholtz2009; Quince et al., Reference Quince, Lanzen, Curtis, Davenport, Hall, Head, Read and Sloan2009; Bragg et al., Reference Bragg, Stone, Imelfort, Hugenholtz and Tyson2012, Callahan et al., Reference Callahan, McMurdie, Rosen, Han, Johnson and Holmes2016). The inclusion of appropriate negative controls and standards across multiple experiments should be included to account for contaminants and variance in sample processing.
The most widely used analysis for rumen environments involves the 16S rRNA gene, for which curated databases such as Greengenes, RDP and SILVA exist (Cole et al., Reference Cole, Chai, Marsh, Farris, Wang, Kulam, Chandra, McGarrell, Schmidt, Garrity and Tiedje2003; DeSantis et al., Reference DeSantis, Hugenholtz, Larsen, Rojas, Brodie, Keller, Huber, Dalevi, Hu and Andersen2006; Pruesse et al., Reference Pruesse, Quast, Knittel, Fuchs, Ludwig, Peplies and Glockner2007). The databases are constantly improving by including measures that account and remove inaccuracies in the data sets, especially around chimeric sequences and taxonomic nomenclature. (Yilmaz et al., Reference Yilmaz, Parfrey, Yarza, Gerken, Pruesse, Quast, Schweer, Peplies, Ludwig and Glöckner2014; Balvočiūtė and Huson, Reference Balvočiūtė and Huson2017; Edgar, Reference Edgar2018). However, a large proportion of the databases are derived from environmental sequences and are generally not full length. Taxonomy for these sequences is therefore predicted using various methods such as Bayesian or sequence alignments to curated trees (Wang et al., Reference Wang, Garrity, Tiedje and Cole2007; McDonald et al., Reference McDonald, Price, Goodrich, Nawrocki, DeSantis, Probst, Andersen, Knight and Hugenholtz2012). Many of the characterised isolates are the sole representative for an identified genus, thus making genus the lowest level of rank available for many classifiers.
Due to the intense interest in methanogenic archaeal populations in relation to their contribution to greenhouse gas emissions from livestock, a highly resolved taxonomic database focussed on gut isolates has been generated (RIM-DB) (Seedorf et al., Reference Seedorf, Kittelmann, Henderson and Janssen2014).
Taxonomic inconsistencies and variance in accuracy of rank classification between reference sets poses a challenge for comparing results found within the literature. Although a common agreed workflow would minimise these issues, it is unlikely that this will eventuate. Agreed minimum descriptions of sample collection and data analysis have been requested MIMARKS (Yilmaz et al., Reference Yilmaz, Kottmann, Field, Knight, Cole, Amaral-Zettler, Gilbert, Karsch-Mizrachi, Johnston, Cochrane, Vaughan, Hunter, Park, Morrison, Rocca-Serra, Sterk, Arumugam, Bailey, Baumgartner, Birren, Blaser, Bonazzi, Booth, Bork, Bushman, Buttigieg, PSG, Charlson, Costello, Huot-Creasy, Dawyndt, DeSantis, Fierer, Fuhrman, Gallery, Gevers, Gibbs, Gil, Gonzalez, Gordon, Guralnick, Hankeln, Highlander, Hugenholtz, Jansson, Kau, Kelley, Kennedy, Knights, Koren, Kuczynski, Kyrpides, Larsen, Lauber, Legg, Ley, Lozupone, Ludwig, Lyons, Maguire, Methé, Meyer, Muegge, Nakielny, Nelson, Nemergut, Neufeld, Newbold, Oliver, Pace, Palanisamy, Peplies, Petrosino, Proctor, Pruesse, Quast, Raes, Ratnasingham, Ravel, Relman, Assunta-Sansone, Schloss, Schriml, Sinha, Smith, Sodergren, Spor, Stombaugh, Tiedje, Ward, Weinstock, Wendel, White, Whiteley, Wilke, Wortman, Yatsunenko and Glöckner2011).
The nature of these sequencing methods produces compositional data that restrict analysis to relative abundance methods and excludes standard Pearson and Spearman correlation analysis (Pearson, Reference Pearson1897; Lovell et al., Reference Lovell, Pawlowsky-Glahn, Egozcue, Marguerat and Bahler2015; Gloor et al., Reference Gloor, Macklaim, Pawlowsky-Glahn and Egozcue2017). Alternative methods such as Aitchison and PhILR for Beta diversity analysis and ϕ to describe the strength of proportionality between two variables for describing correlations are available (Aitchison et al., Reference Aitchison, Barceló-Vidal, Martín-Fernández and Pawlowsky-Glahn2000; Lovell et al., Reference Lovell, Pawlowsky-Glahn, Egozcue, Marguerat and Bahler2015; Silverman et al., Reference Silverman, Washburne, Mukherjee and David2017). Furthermore, issues around variance in 16S rRNA gene copy numbers between species have not been adequately addressed and cannot be accurately determined for OTUs that are not represented by characterised species (Louca et al., Reference Louca, Doebeli and Parfrey2018). Further confounding the analysis is sequencing depth between samples, abundance issues arise around the variation in the number of sequences obtained from a given sample. Rarefying the data to a defined level across all samples preferentially excludes lower abundant OTUs leading to a loss of precision, whereas those that use the entire data set must account for the magnitude of sequence depth between samples and usually employ a transformation or scaling method (Anders and Huber, Reference Anders and Huber2010; Robinson et al., Reference Robinson, McCarthy and Smyth2010; McMurdie and Holmes, Reference McMurdie and Holmes2014; Weiss et al., Reference Weiss, Xu, Peddada, Amir, Bittinger, Gonzalez, Lozupone, Zaneveld, Vázquez-Baeza, Birmingham, Hyde and Knight2017).
Methods for the identification of OTUs that are significantly associated with a given treatment or phenotype should not use models that apply a Poisson distribution, due to the sparsity of the data matrices. Researchers have suggested the use of a negative binomial distribution and log transformation of the data as an alternative to address the over dispersion problem arising in 16S rRNA gene data (Anders and Huber, Reference Anders and Huber2010). However, this tends to increase the false discovery rate due to the data being proportional (Lovell et al., Reference Lovell, Pawlowsky-Glahn, Egozcue, Marguerat and Bahler2015; Gloor et al., Reference Gloor, Macklaim, Pawlowsky-Glahn and Egozcue2017).
A recent review suggests alternatives to the standard approaches that have previously been undertaken for each of the major data transformation and analysis steps when dealing with proportional data (Gloor et al., Reference Gloor, Macklaim, Pawlowsky-Glahn and Egozcue2017). These include initial normalisation of the count data using a log ratio transformation (centred or isometric) rather than rarefaction. Substitution of beta diversity analysis using Aitchison calculations of distances for Bray Curtis and PhILR for unifrac that beta diversity variance is visualised based on compositional principal component biplots rather than principal co-ordinate. Finally, correlation of proportional data should be performed with an appropriate measure such as ϕ rather than Pearson or Spearman and identifying differential abundant OTUs with ALDEx2 or ANCOM. Methods that account for the compositional aspect of the data have been developed and incorporated into the popular QIIME software package (Caporaso et al., Reference Caporaso, Kuczynski, Stombaugh, Bittinger, Bushman, Costello, Fierer, Pena, Goodrich, Gordon, Huttley, Kelley, Knights, Koenig, Ley, Lozupone, McDonald, Muegge, Pirrung, Reeder, Sevinsky, Turnbaugh, Walters, Widmann, Yatsunenko, Zaneveld and Knight2010), MixMC and ALDEx2 packages (Fernandes et al., Reference Fernandes, Reid, Macklaim, McMurrough, Edgell and Gloor2014; Mandal et al., Reference Mandal, Van Treuren, White, Eggesbø, Knight and Peddada2015; Lê Cao et al., Reference Lê Cao, Costello, Lakis, Bartolo, Chua, Brazeilles and Rondeau2016).
Taxonomic identification of OTUs does not define a functional phenotype, as different strains can vary dramatically in their function, that is non-pathogenic v. pathogenic. Certain families are more clearly defined and separated, mainly due to the higher level of interest in those groups and therefore more representative sequences are available compared with the less studied families. Analysing data at the family level therefore will generally group many species carrying varied functional capacity together and shed little light on any functional capacity changes to the ecosystem. This also suggests that methods of functional inference such as PICRUSt and Piphillin (Langille et al., Reference Langille, Zaneveld, Caporaso, McDonald, Knights, Reyes, Clemente, Burkepile, Vega Thurber, Knight, Beiko and Huttenhower2013; Iwai et al., Reference Iwai, Weinmaier, Schmidt, Albertson, Poloso, Dabbagh and DeSantis2016) although useful, will only be accurate when confidence intervals for a match are high and closest to a reference genome. Efforts to improve this for a rumen microbiome focus have been undertaken through the development of a rumen reference data set, CowPi (Wilkinson et al., Reference Wilkinson, Huws, Edwards, Kingston-Smith, Siu-Ting, Hughes, Rubino, Friedersdorff and Creevey2018).
Applications of microbiota taxonomic profiling
Microbial profiling studies using high throughput sequencing have been performed on the major ruminants investigating changes due to diet, subacute acidosis, methane emissions, feed efficiency, variation along the digestive tract, maternal influence and seasonal changes (Fouts et al., Reference Fouts, Szpakowski, Purushe, Torralba, Waterman, MacNeil, Alexander and Nelson2012; Lee et al., Reference Lee, Jung, Oh, Lee, Madsen and Jeon2012; Li et al., Reference Li, Connor, Li, Baldwin and Sparks2012; Jami et al., Reference Jami, White and Mizrahi2014; Denman et al., Reference Denman, Martinez Fernandez, Shinkai, Mitsumori and McSweeney2015; Mao et al., Reference Mao, Zhang, Liu and Zhu2015; Myer et al., Reference Myer, Smith, Wells, Kuehn and Freetly2015; Huws et al., Reference Huws, Edwards, Creevey, Rees Stevens, Lin, Girdwood, Pachebat and Kingston-Smith2016; Martinez-Fernandez et al., Reference Martinez-Fernandez, Denman, Yang, Cheung, Mitsumori and McSweeney2016; Abecia et al., Reference Abecia, Jiménez, Martínez-Fernandez, Martín-García, Ramos-Morales, Pinloche, Denman, Newbold and Yáñez-Ruiz2017; Danielsson et al., Reference Danielsson, Dicksved, Sun, Gonda, Müller, Schnürer and Bertilsson2017; Noel et al., Reference Noel, Attwood, Rakonjac, Moon, Waghorn and Janssen2017; Tapio et al., Reference Tapio, Fischer, Blasco, Tapio, Wallace, Bayat, Ventto, Kahala, Negussie, Shingfield and Vilkki2017b; Wetzels et al., Reference Wetzels, Mann, Pourazad, Qumar, Pinior, Metzler-Zebeli, Wagner, Schmitz-Esser and Zebeli2017; Petri et al., Reference Petri, Vahmani, Yang, Dugan and McAllister2018). But the most comprehensive study performed on ruminates to identify the core microbiota and define what variance in the rumen microbiome was attributed to ruminant species, diet and geographical location was undertaken by the rumen microbial census collaboration (Henderson et al., Reference Henderson, Cox, Ganesh, Jonker, Young and Janssen2015). A unifying co-ordinated approach limited the large amount of inconsistency in sample preparation, gene amplification and analysis platforms used between different research groups. Diet was found to exhibit the largest influence on the rumen bacterial community, attributed to the changes in the physical and chemical characteristics of the feed allowing for specialised niches. The unclassified Bacteroidales and Ruminococcaceae were observed to be consistently relatively more abundant in animals fed forage, whereas Prevotella and unclassified Succinivibrionaceae were relatively more abundant in animals fed diets containing concentrate.
Methanogenic archaeal species were found to be ubiquitously distributed and not affected by host, diet or location with Methanobrevibacter gottschalkii and Methanobrevibacter ruminantium dominating nearly all samples and accounting for 74% of the archaeal data (Henderson et al., Reference Henderson, Cox, Ganesh, Jonker, Young and Janssen2015). This reflects the limited substrates used by methanogenic archaea, predominately CO2 and H2 or methylated substrates, such as methanol, methylamines and methyl sulphides, and that they exist within the rumen as specialised secondary feeders obtaining substrates for growth from the common fermentative end products of numerous bacterial species.
Several studies including those that have looked at the microbiota populations for natural low methane animals and those that have reduced the methanogen populations through the addition of specific inhibitors have seen common population shifts. Species closely related to the lactic acid producing Sharpea azabuensis has been consistently observed to be increased in low methane sheep rumen and likely indicate rapid heterofermentative pathways due to higher rumen turnover rates (Kittelmann et al., Reference Kittelmann, Pinares-Patiño, Seedorf, Kirk, Ganesh, McEwan and Janssen2014; Kamke et al., Reference Kamke, Kittelmann, Soni, Li, Tavendale, Ganesh, Janssen, Shi, Froula, Rubin and Attwood2016); whereas H2 producing Ruminococcaceae, Lachnospiraceae and Verrucomicrobia are more prevalent in high methane yielding animals. In cattle and goats, high methane animals were also associated with increases in Verrucomicrobia and Synergistetes bacteria and decreases in the relative abundance of methanogenic archaea (Denman et al., Reference Denman, Martinez Fernandez, Shinkai, Mitsumori and McSweeney2015; Wallace et al., Reference Wallace, Rooke, McKain, Duthie, Hyslop, Ross, Waterhouse, Watson and Roehe2015; Martinez-Fernandez et al., Reference Martinez-Fernandez, Denman, Yang, Cheung, Mitsumori and McSweeney2016). Cattle on high starch diets have also attributed changes in the Succinovibrioacce relative abundance with methane yields, but this was not evident in cattle on poor quality roughage diets (Martinez-Fernandez et al., Reference Martinez-Fernandez, Denman, Yang, Cheung, Mitsumori and McSweeney2016; Danielsson et al., Reference Danielsson, Dicksved, Sun, Gonda, Müller, Schnürer and Bertilsson2017).
Prevotella spp. are the most frequently observed OTUs in amplicon data sets and are often both negatively and positively associated with the phenotype being investigated. Yet as most analysis is based on de novo clustering, the ability to compare these uncultured diverse OTUs between studies is limited. In a recent review on the ruminal microbiome associated with methane emissions, the authors identified many Prevotella OTUs from 1000 cattle that clustered together phylogenetically and which fell into groups that were either positively or negatively correlated with methane emissions, suggesting a similar function for closely related species (Tapio et al., Reference Tapio, Snelling, Strozzi and Wallace2017a). Further investigation of the functional capacity of these differing but closely related species is needed to provide evidence for their contribution.
Initial reports on a small number of cattle found a positive correlation with methane yield and the bacteria:methanogen ratio (methanogen abundance) (Wallace et al., Reference Wallace, Rooke, McKain, Duthie, Hyslop, Ross, Waterhouse, Watson and Roehe2015; Roehe et al., Reference Roehe, Dewhurst, Duthie, Rooke, McKain, Ross, Hyslop, Waterhouse, Freeman, Watson and Wallace2016). However, larger groups of animals and other studies have not found this correlation (Danielsson et al., Reference Danielsson, Schnurer, Arthurson and Bertilsson2012; Wallace et al., Reference Wallace, Rooke, McKain, Duthie, Hyslop, Ross, Waterhouse, Watson and Roehe2015; Tapio et al., Reference Tapio, Snelling, Strozzi and Wallace2017a). Although the relative abundance of the methanogenic population may not consistently be associated with methane yield and is not informative of the level of functional activity, there is a consistent change in the dominant species with a relative increase in M. ruminantium and decrease in M. Gottschalkii in low methane animals (Wallace et al., Reference Wallace, Rooke, McKain, Duthie, Hyslop, Ross, Waterhouse, Watson and Roehe2015; Danielsson et al., Reference Danielsson, Dicksved, Sun, Gonda, Müller, Schnürer and Bertilsson2017; Martinez-Fernandez et al., Reference Martinez-Fernandez, Denman, Cheung and McSweeney2017).
Microbial profiling for feed efficiency traits in dairy cattle has revealed a reduction in the microbial richness and a strong correlation between the ratio of the phyla Firmicutes to Bacteroidetes and milk-fat yield, but no strong correlation with residual feed intake (RFI) (Jami et al., Reference Jami, White and Mizrahi2014). However in another study, a similar change in the Firmicutes:Bacteroidetes ratio was evident, but no reduction in the total microbial richness was observed (Myer et al., Reference Myer, Smith, Wells, Kuehn and Freetly2015). However, this may just reflect the different regions of the 16S rRNA gene that were used between the studies, confounding the comparison. Caution should be taken when using Phylum level values and ratios as a phenotype predictor due to the varied functional diversity possessed by the many bacterial species within a phyla. The use of larger numbers of animals and more refined analysis defining distinct microbial composition as an enterotype based on similar microbial clustering should be more accurate (Costea et al., Reference Costea, Hildebrand, Arumugam, Bäckhed, Blaser, Bushman, de Vos, Ehrlich, Fraser, Hattori, Huttenhower, Jeffery, Knights, Lewis, Ley, Ochman, O’Toole, Quince, Relman, Shanahan, Sunagawa, Wang, Weinstock, Wu, Zeller, Zhao, Raes, Knight and Bork2018).
Microbial profiling studies using RNA as a template have been used to infer active populations at time of sampling. In a temporal study to monitor fibre colonisation, Huws et al. (Reference Huws, Edwards, Creevey, Rees Stevens, Lin, Girdwood, Pachebat and Kingston-Smith2016) showed a biphasic colonisation of fibre with primary colonisers established within 1to 2 h and secondary colonisers not evident until 4 to 8 h after feeding. A constant signal from Butyrivibrio, Fibrobacter, Olsenella and Prevotella would indicate their role as core bacteria involved in fibre degradation regardless of the sampling time. The variance in microbes observed due to time after feeding highlights the importance of considering these changes when interpreting the results from a single collection.
Limited studies have focussed on the protozoal and fungal populations of the rumen using amplicon-based methods (Fouts et al., Reference Fouts, Szpakowski, Purushe, Torralba, Waterman, MacNeil, Alexander and Nelson2012; Kittelmann et al., Reference Kittelmann, Seedorf, Walters, Clemente, Knight, Gordon and Janssen2013; Mao et al., Reference Mao, Huo and Zhu2016; Cunha et al., Reference Cunha, Veloso, Marcondes, Mantovani, Tomich, Pereira, Ferreira, Dill-McFarland and Suen2017; Tapio et al., Reference Tapio, Fischer, Blasco, Tapio, Wallace, Bayat, Ventto, Kahala, Negussie, Shingfield and Vilkki2017b). Both Kittelmann et al. and Cunha et al. saw no difference in the fungal populations in relation to methane emissions, whereas Tapio et al. indicated that two fungal species were negatively correlated with methane yields (Kittelmann et al., Reference Kittelmann, Pinares-Patiño, Seedorf, Kirk, Ganesh, McEwan and Janssen2014; Cunha et al., Reference Cunha, Veloso, Marcondes, Mantovani, Tomich, Pereira, Ferreira, Dill-McFarland and Suen2017; Tapio et al., Reference Tapio, Snelling, Strozzi and Wallace2017a). The taxonomic databases are more poorly characterised for rumen fungi and protozoa, although a specific gut anaerobic fungal database has been created and the 18S rRNA SILVA database is used for protozoa taxonomy (Kittelmann et al., Reference Kittelmann, Naylor, Koolaard and Janssen2012; Yilmaz et al., Reference Yilmaz, Parfrey, Yarza, Gerken, Pruesse, Quast, Schweer, Peplies, Ludwig and Glöckner2014). A shift towards targeting the D1/D2 region of the 28S rRNA gene for anaerobic fungi should improve classification as an extensively curated taxonomic database exists for the 28S rRNA gene (Edwards et al., Reference Edwards, Forster, Callaghan, Dollhofer, Dagar, Cheng, Chang, Kittelmann, Fliegerova, Puniya, Henske, Gilmore, O’Malley, Griffith and Smidt2017).
Other phylogenetic markers targeting functional genes have been used to define specific microbial populations of the rumen, including the formyltetrahydrofolate synthase, methyl coenzyme reductase (mcrA) and urease (ureC) genes (Gagen et al., Reference Gagen, Denman, Padmanabha, Zadbuke, Al Jassim, Morrison and McSweeney2010; Henderson et al., Reference Henderson, Naylor, Leahy and Janssen2010; Mitsumori et al., Reference Mitsumori, Matsui, Tajima, Shinkai, Takenaka, Denman and McSweeney2014; Jin et al., Reference Jin, Zhao, Zheng, Bu, Beckers, Denman, McSweeney and Wang2017) . Only the ureC gene has been used in conjunction with high throughput sequencing to study the ureolytic populations of the rumen and rumen mucosal associated populations (Jin et al., Reference Jin, Zhao, Zheng, Bu, Beckers, Denman, McSweeney and Wang2017). Due to the limited availability of taxonomic data within the reference data set, more than 50% of the data could not be taxonomically identified. However, distinct populations were found on the rumen wall compared with the fibre and liquid associated fractions.
Profiling the rumen microbiota using a taxonomic marker gene is inexpensive, rapid and provides a broad low resolution catalogue of the identified microbiota. Development of robust analysis methods can now accurately correlate shifts in the community and define co-occurrence networks for OTUs, providing greater insight into the complex interactions of the rumen. Initially characterising samples using these methods can aid in deciding which samples are appropriate for more expensive techniques such as metagenomics for describing the functional capacity.
Defining the functional potential of the rumen microbiota through metagenomics analysis
Considerations and limitations
Defining the functional capacity within the rumen microbiota is achieved through sequencing of the combined genomes in a shotgun approach, with the aim of cataloguing the genes and the species to which they belong. Analysis can profile the taxonomy, catalogue the functional genes, attempt to assemble whole genomes and monitor changes in functional gene counts. Metagenomics allows for the study of the uncultivable members and has become an important tool for understanding the full genomic potential that resides within the rumen microbiome, while minimising biases observed with amplicon-based methodologies. Limitations still apply to metagenomics analysis, again such as sample collection and DNA extraction techniques which can bias the proportion and types of species detected. Assembly of sequences into contiguous genomic sequences (contigs) for metagenomics studies are similar to those techniques developed for individual genomes. Although there is a need to overcome issues around varying levels of genomic DNA for different species (which is exploited by ‘binning’ techniques, see below) and that closely related species or strains may become inadvertently co-assembled. Most metagenomic assemblers employ varying kmer sizes and coverage depth values to improve de novo assembly (IDBA-UD, Meta-IDBA, metaSPAdes and MEGAHIT) (Peng et al., Reference Peng, Leung, Yiu and Chin2011; Bankevich et al., Reference Bankevich, Nurk, Antipov, Gurevich, Dvorkin, Kulikov, Lesin, Nikolenko, Pham, Prjibelski, Pyshkin, Sirotkin, Vyahhi, Tesler, Alekseyev and Pevzner2012; Peng et al., Reference Peng, Leung, Yiu and Chin2012; Li et al., Reference Li, Liu, Luo, Sadakane and Lam2015). Assembled contigs can then be placed into common bins based on nucleotide frequency (most commonly tetramer) and coverage depth within the sample using approaches like PhyloPythiaS, GroopM and MetaBat (Patil et al., Reference Patil, Haider, Pope, Turnbaugh, Morrison, Scheffer and McHardy2011; Imelfort et al., Reference Imelfort, Parks, Woodcroft, Dennis, Hugenholtz and Tyson2014; Kang et al., Reference Kang, Froula, Egan and Wang2015). Sequencing multiple similar environmental samples as opposed to just deeper sequencing of a single sample improves binning based on the assumption that species have the same relative abundance among samples (Albertsen et al., Reference Albertsen, Hugenholtz, Skarshewski, Nielsen, Tyson and Nielsen2013). Completeness and contamination of metagenomic assembled genomes (MAGs) can be assessed based on the presence of multiple lineage-specific single copy marker genes using CheckM or similar approaches with PhyloSift (Darling et al., Reference Darling, Jospin, Lowe, Matsen, Bik and Eisen2014; Parks et al., Reference Parks, Imelfort, Skennerton, Hugenholtz and Tyson2015). These methodologies also allow for assigning of taxonomy to these MAGs based on these single copy markers being concatenated and placed in concatenated gene trees. Genome-based taxonomy trees are rapidly proving to be more accurate at phylogenetic placement than the single marker gene methods and have recently resulted in 73% of taxa being corrected with one or more changes to their existing taxonomy (Parks et al., Reference Parks, Chuvochina, Waite, Rinke, Skarshewski, Chaumeil and Hugenholtz2018).
Applications of metagenomics
Like most other studies of the rumen microbiome, these tools have been employed to gain a better understanding about fibre degradation, methane emissions and ruminant efficiency. Although only performed on three animals and not to the depth of today’s metagenomics studies, the first published data defining the fibre adherent population in cattle highlighted the differences obtained from full length PCR-amplified phylogenetic assessment and that from metagenomics data at both the 16S rRNA gene and genomic sequence level (Brulc et al., Reference Brulc, Antonopoulos, Miller, Wilson, Yannarell, Dinsdale, Edwards, Frank, Emerson, Wacklin, Coutinho, Henrissat, Nelson and White2009). Although biases were attributed to the PCR method, all methods exhibited the same power to discriminate between animals and the rumen fluid v. fibre adherent populations (Brulc et al., Reference Brulc, Antonopoulos, Miller, Wilson, Yannarell, Dinsdale, Edwards, Frank, Emerson, Wacklin, Coutinho, Henrissat, Nelson and White2009). Assignment of phylogeny for genomic sequences was limited by the relevant data matches in the SEED data set at the time of analysis (Overbeek et al., Reference Overbeek, Begley, Butler, Choudhuri, Chuang, Cohoon, de Crécy-Lagard, Diaz, Disz, Edwards, Fonstein, Frank, Gerdes, Glass, Goesmann, Hanson, Iwata-Reuyl, Jensen, Jamshidi, Krause, Kubal, Larsen, Linke, McHardy, Meyer, Neuweger, Olsen, Olson, Osterman, Portnoy, Pusch, Rodionov, Rückert, Steiner, Stevens, Thiele, Vassieva, Ye, Zagnitko and Vonstein2005). However, analysis of the functional capacity concluded that primary colonisers target the easily accessible side chains of complex plant polysaccharides, reflecting that samples were collected 1 h after feeding and that bacteria focussing on the main cellulosic and xylan backbones likely colonise later (Brulc et al., Reference Brulc, Antonopoulos, Miller, Wilson, Yannarell, Dinsdale, Edwards, Frank, Emerson, Wacklin, Coutinho, Henrissat, Nelson and White2009; Huws et al., Reference Huws, Edwards, Creevey, Rees Stevens, Lin, Girdwood, Pachebat and Kingston-Smith2016).
A deeper metagenomic study into the fibre adherent microbiome of switch grass also demonstrated the ability of the rumen microbiota to rapidly colonise and degrade biomass (Hess et al., Reference Hess, Sczyrba, Egan, Kim, Chokhawala, Schroth, Luo, Clark, Chen, Zhang, Mackie, Pennacchio, Tringe, Visel, Woyke, Wang and Rubin2011). Increased sequence data identified 2.5 million open-reading frames (ORFs) of which ~1% were classified as candidate carbohydrate active genes. The majority of these were novel and not closely aligned with those in the NCBI non-redundant database, highlighting the extensive repertoire of enzymes employed by the rumen microbiota to deconstruct plant material.
Due to the greater sequence data and availability of genome binning methods, the authors were able to group assembled sequence reads based on tetra nucleotide frequencies and read coverage, producing 446 distinct groups (bins). Due to their relatively higher abundance within the fibre adherent population, 15 near complete draft genomes from previously un-isolated species were generated (Hess et al., Reference Hess, Sczyrba, Egan, Kim, Chokhawala, Schroth, Luo, Clark, Chen, Zhang, Mackie, Pennacchio, Tringe, Visel, Woyke, Wang and Rubin2011). This allowed for the accurate assignment of functional capacity and potential role to these specific bacteria rather than just gene catalogue counts from the microbiome.
Other ruminants, including Yak and Reindeer have also been investigated, primarily focussing on carbohydrate active enzymes (Dai et al., Reference Dai, Zhu, Luo, Song, Liu, Liu, Chen, Wang, Li, Zeng, Dong, Hu, Li, Xu, Huang and Dong2012; Pope et al., Reference Pope, Mackenzie, Gregor, Smith, Sundset, McHardy, Morrison and Eijsink2012). A deep analysis of the Moose metagenome including genomic binning allowed for genomic reconstruction of representatives from the uncultured Bacteroidetes family BS11 and characterisation from genome reconstruction suggested a role in hemicellulosic fermentation (Solden et al., Reference Solden, Hoyt, Collins, Plank, Daly, Hildebrand, Beavers, Wolfe, Nicora, Purvine, Carstensen, Lipton, Spalinger, Firkins, Wolfe and Wrighton2017).
Limited sequence depth from a metagenomics study of the dairy rumen microbiome resulted in the assembly of only small contigs and only 20% of these could be functionally annotated (Pitta et al., Reference Pitta, Indugu, Kumar, Vecchiarelli, Sinha, Baker, Bhukya and Ferguson2016). However, shifts in the microbiota based on age where detected, along with a high proportion of functional genes assigned to starch degradation which comprised 20% to 30% of the offered diet (Pitta et al., Reference Pitta, Indugu, Kumar, Vecchiarelli, Sinha, Baker, Bhukya and Ferguson2016).
Initial studies investigating ruminants in relation to a low methane phenotype, either naturally or through chemical modification did not attempt to generate MAGs, but rather focussed on the phylogenetic and functional changes in the rumen microbiota. Decreases in the relative abundance of methanogenic species and methane generating pathways were evident in goats treated with bromochloromethane (BCM), reflecting the mode of action of BCM on these target species (Denman et al., Reference Denman, Martinez Fernandez, Shinkai, Mitsumori and McSweeney2015). Higher H2 levels in the rumen resulted in fermentative shifts to propionate, which was attributed in the metagenomic data to increases in Prevotella and Selenomonas spp. and supported by increased functional gene counts for the production of propionate through the randomising (succinate) pathway.
In a cattle trial that selected four pairs of cattle as natural low and high methane emitters, there was a 2.5-fold difference in the archaeal population using qPCR and 16S rRNA relative gene abundance data (Wallace et al., Reference Wallace, Rooke, McKain, Duthie, Hyslop, Ross, Waterhouse, Watson and Roehe2015; Roehe et al., Reference Roehe, Dewhurst, Duthie, Rooke, McKain, Ross, Hyslop, Waterhouse, Freeman, Watson and Wallace2016). KEGG analysis of archaeal genes associated directly or indirectly with methane production were also higher in the high methane-emitting animals, confirming the increased archaeal presence. A larger group of cattle from varying breeds and diets using a similar analysis also found the same hydrogenotrophic methane synthesis pathway correlated with higher methane emissions and a weak correlation to archaeal abundance (Auffret et al., Reference Auffret, Stewart, Dewhurst, Duthie, Rooke, Wallace, Freeman, Snelling, Watson and Roehe2017). However quantitative data for 1000 dairy cows showed only a weak correlation for the archaea:bacteria ratio with methane emissions (Tapio et al., Reference Tapio, Snelling, Strozzi and Wallace2017a), suggesting that the predictive power of this approach may be too low to identify high methane-producing phenotypes.
Similarly, low methane emitting sheep did not show a decrease in the relative abundance of methanogen species or methanogenic pathway genes using metagenomic data, rather only at the gene transcript level was a difference observed (Shi et al., Reference Shi, Moon, Leahy, Kang, Froula, Kittelmann, Fan, Deutsch, Gagic, Seedorf, Kelly, Atua, Sang, Soni, Li, Pinares-Patiño, McEwan, Janssen, Chen, Visel, Wang, Attwood and Rubin2014). The bacterial component of the low methane yield sheep rumen metagenome and transcriptome suggested a switch to hexose fermentation through to lactate and butyrate resulting in lower H2 yields available to drive methanogenesis. This was evident with increased observations of the lactate producing Sharpea spp. and subsequent conversion of this substrate by Megasphaera spp. to butyrate along with their respective pathways (Kamke et al., Reference Kamke, Kittelmann, Soni, Li, Tavendale, Ganesh, Janssen, Shi, Froula, Rubin and Attwood2016).
Metagenomics has also been used to identify functional shifts associated with ruminant efficiency or more commonly low RFI animals (Roehe et al., Reference Roehe, Dewhurst, Duthie, Rooke, McKain, Ross, Hyslop, Waterhouse, Freeman, Watson and Wallace2016; Shabat et al., Reference Shabat, Sasson, Doron-Faigenboim, Durman, Yaacoby, Berg Miller, White, Shterzer and Mizrahi2016). Efficient dairy cattle defined by RFI exhibited a lower richness of abundant microbial species for both 16S rRNA and at the microbial functional gene level (Shabat et al., Reference Shabat, Sasson, Doron-Faigenboim, Durman, Yaacoby, Berg Miller, White, Shterzer and Mizrahi2016). Both phylogenetically and functionally efficient animals were dominated by increases in Megasphaera elsdeniii and Coprococcus catus, hydrogen consuming lactate utilisers resulting in production of butyrate and propionate for the host. Coprococcus spp. have also previously been associated with the NADPH-dependent reduction of phloroglucinol and the redirection of H2 in the rumen to acetate in a methane inhibited rumen (Martinez-Fernandez et al., Reference Martinez-Fernandez, Denman, Cheung and McSweeney2017). In dairy cattle, efficient animals were also linked with decreases in Methanobrevobacter spp. and methanogenic pathway genes (Shabat et al., Reference Shabat, Sasson, Doron-Faigenboim, Durman, Yaacoby, Berg Miller, White, Shterzer and Mizrahi2016). Thus, the microbiology associated with H2 production and utilisation within the rumen seems to be tightly linked with methane and efficiency traits.
In cattle that were initially selected for their methane emission ranking, a correlation with RFI was also detected in which the authors demonstrated the abundance of 49 genes, explaining 86% of the variation observed in feed efficiency (Roehe et al., Reference Roehe, Dewhurst, Duthie, Rooke, McKain, Ross, Hyslop, Waterhouse, Freeman, Watson and Wallace2016). Of particular note were genes identified as ‘fucose sensing’ involved in cross talk between the host and the microbiota, possibly in response to the mucin content from bovine saliva.
Recently, results from the Hungate 1000 genome project were published, revealing genomic coverage of ~75% of the known genera from the rumen (Seshadri et al., Reference Seshadri, Leahy, Attwood, Teh, Lambie, Cookson, Eloe-Fadrosh, Pavlopoulos, Hadjithomas, Varghese, Paez-Espino, Hungate project, Perry, Henderson, Creevey, Terrapon, Lapebie, Drula, Lombard, Rubin, Kyrpides, Henrissat, Woyke, Ivanova and Kelly2018). But in spite of that the Hungate collection covers only a fraction of the diversity found within the rumen (Li et al., Reference Li, Zhong, Ramayo-Caldas, Terrapon, Lombard, Potocki-Veronese, Estelle-Fabrellas, Popova, Yang, Zhang, Li, Tang, Chen, Chen, Li, Guo, Martin, Maguin, Xu, Yang, Wang, Madsen, Kristiansen, Henrissat, Ehrlich and Morgavi2018; Stewart et al., Reference Stewart, Auffret, Warr, Wiser, Press, Langford, Liachko, Snelling, Dewhurst, Walker, Roehe and Watson2018). Close to 2.2% of the ORFs in the combined genomes were classified as carbohydrate-active enzymes and binding proteins reflecting the functional role of bacteria in degrading cellulose, hemicellulose and pectin. Furthermore, polysaccharide utilisation loci involved in the degradation of animal glycans were enriched in the Bacteroidetes genomes and may indicate the ability of these species to harvest energy from N-linked salivary glycoproteins (Seshadri et al., Reference Seshadri, Leahy, Attwood, Teh, Lambie, Cookson, Eloe-Fadrosh, Pavlopoulos, Hadjithomas, Varghese, Paez-Espino, Hungate project, Perry, Henderson, Creevey, Terrapon, Lapebie, Drula, Lombard, Rubin, Kyrpides, Henrissat, Woyke, Ivanova and Kelly2018). Metabolic fermentation pathway reconstruction for the sequenced species has now lead to the most complete reconstruction of the rumen microbiome metabolic scheme incorporating both species and functional capacity. However, the level of contribution that these species make to the complex interactions that take place within the rumen is still not fully understood. Although, recently the rumen microbiome gene catalogue clearly observed diet modulation in gene abundance counts even though 90% of coding genes were shared (Li et al., Reference Li, Zhong, Ramayo-Caldas, Terrapon, Lombard, Potocki-Veronese, Estelle-Fabrellas, Popova, Yang, Zhang, Li, Tang, Chen, Chen, Li, Guo, Martin, Maguin, Xu, Yang, Wang, Madsen, Kristiansen, Henrissat, Ehrlich and Morgavi2018).
Likewise, several studies focussing on the generation of MAGs from ruminants have discovered similar over representation of carbohydrate active enzymes for 99, 324 and 913 MAGs from moose and cattle (Svartstrom et al., Reference Svartstrom, Alneberg, Terrapon, Lombard, de Bruijn, Malmsten, Dalin, El Muller, Shah, Wilmes, Henrissat, Aspeborg and Andersson2017; Li et al., Reference Li, Zhong, Ramayo-Caldas, Terrapon, Lombard, Potocki-Veronese, Estelle-Fabrellas, Popova, Yang, Zhang, Li, Tang, Chen, Chen, Li, Guo, Martin, Maguin, Xu, Yang, Wang, Madsen, Kristiansen, Henrissat, Ehrlich and Morgavi2018; Stewart et al., Reference Stewart, Auffret, Warr, Wiser, Press, Langford, Liachko, Snelling, Dewhurst, Walker, Roehe and Watson2018).
With the results from the Hungate 1000 genome sequencing initiative and the various rumen focussed MAGs and rumen microbiota gene catalogue available, there is the future possibility of reducing the requirement to perform metagenomic assemblies and rather rapidly generate relative abundance counts and functional capacity linked to taxonomy through direct mapping to rumen relevant annotated genomic data sets (Hess et al., Reference Hess, Sczyrba, Egan, Kim, Chokhawala, Schroth, Luo, Clark, Chen, Zhang, Mackie, Pennacchio, Tringe, Visel, Woyke, Wang and Rubin2011; Parks et al., Reference Parks, Rinke, Chuvochina, Chaumeil, Woodcroft, Evans, Hugenholtz and Tyson2017; Svartstrom et al., Reference Svartstrom, Alneberg, Terrapon, Lombard, de Bruijn, Malmsten, Dalin, El Muller, Shah, Wilmes, Henrissat, Aspeborg and Andersson2017; Li et al., Reference Li, Zhong, Ramayo-Caldas, Terrapon, Lombard, Potocki-Veronese, Estelle-Fabrellas, Popova, Yang, Zhang, Li, Tang, Chen, Chen, Li, Guo, Martin, Maguin, Xu, Yang, Wang, Madsen, Kristiansen, Henrissat, Ehrlich and Morgavi2018; Seshadri et al., Reference Seshadri, Leahy, Attwood, Teh, Lambie, Cookson, Eloe-Fadrosh, Pavlopoulos, Hadjithomas, Varghese, Paez-Espino, Hungate project, Perry, Henderson, Creevey, Terrapon, Lapebie, Drula, Lombard, Rubin, Kyrpides, Henrissat, Woyke, Ivanova and Kelly2018; Stewart et al., Reference Stewart, Auffret, Warr, Wiser, Press, Langford, Liachko, Snelling, Dewhurst, Walker, Roehe and Watson2018).
As of yet very little genomic data representing protists and the anaerobic fungi are available, substantially limiting our ability to study these groups. However, improvements in long read sequencing platforms have overcome some of the difficulties in assembling the highly repetitive, AT base rich genomes of the anaerobic fungi (Solomon et al., Reference Solomon, Haitjema, Henske, Gilmore, Borges-Rivera, Lipzen, Brewer, Purvine, Wright, Theodorou, Grigoriev, Regev, Thompson and O’Malley2016; Haitjema et al., Reference Haitjema, Gilmore, Henske, Solomon, de Groot, Kuo, Mondo, Salamov, LaButti, Zhao, Chiniquy, Barry, Brewer, Purvine, Wright, Hainaut, Boxma, van Alen, Hackstein, Henrissat, Baker, Grigoriev and O’Malley2017). Comparative genomics and proteomics have catalogued an extensive array of plant depolymerisation and structural genes that form the anaerobic fungal cellulosome complex. Scaffoldin proteins, containing dokerin-binding cohesion sequences were highly conserved across the Neocallimastigomycota, allowing potentially for interspecies fungal enzyme complexes to form (Haitjema et al., Reference Haitjema, Gilmore, Henske, Solomon, de Groot, Kuo, Mondo, Salamov, LaButti, Zhao, Chiniquy, Barry, Brewer, Purvine, Wright, Hainaut, Boxma, van Alen, Hackstein, Henrissat, Baker, Grigoriev and O’Malley2017). This large diversity of degradative and substrate binding capacity incorporated into the fungal cellulosome structure provides a co-ordinated and synergistic mechanism for complete conversion of cellulosic material to fermentable sugars. Transcriptomics provided further insights into aspects of the co-ordination, identifying repressive regulation of cellulosic gene sets in response to glucose as an end product metabolite. Furthermore, the tailoring of hydrolytic gene transcripts was linked to the complexity of the substrate through an increase in the number and functional diversity of degradative genes transcribed (Solomon et al., Reference Solomon, Haitjema, Henske, Gilmore, Borges-Rivera, Lipzen, Brewer, Purvine, Wright, Theodorou, Grigoriev, Regev, Thompson and O’Malley2016).
Observing functional gene activity within the rumen using metatranscriptomics
Considerations and limitations
Metagenomic analysis performed on genomic DNA cannot distinguish whether the material is from viable cells or if the predicted genes are functionally expressed at the time of collection. However, limitations for using rRNA as an indicator of the species activity within a community exist and should be considered. Concentrations of rRNA are not consistently correlated with growth and can differ greatly between closely related taxa, whereas dormant cells can still contain high levels of rRNA (Blazewicz et al., Reference Blazewicz, Barnard, Daly and Firestone2013). Unlike eukaryotic mRNA, the majority of prokaryotic transcripts are not poly adenylated at the 3' end, making the commonly used mRNA polyA enrichment methods impractical (Sarkar, Reference Sarkar1997). Most current microbial targeted methods employ techniques to deplete the rRNA sequences in order to increase the number of non-rRNA reads in the data sets.
Computational methods to identify and remove the ribosomal sequences are common and easily employed. Likewise host transcripts should be removed if a reference data set is available in order to enrich the microbiome transcripts. Although, host transcript data should not be discarded as it may show some indication of the cross talk between the host and its microbiome. Due to the lack of relevant rumen microbial genomic data sets to map transcripts too, most analyses involve a de novo assembly and annotation step.
Application of metatranscriptomics
Considering that most RNASeq analysis workflows were established for eukaryotic mRNA, it was not surprising that the first large-scale rumen metatranscriptomic analysis focussed on the rumen eukaryotic species. Even though samples were collected before feeding, the eukaryotic fibre adherent population of the muskoxen rumen exhibited a high level of expression for genes involved in the degradation of crystalline plant cell wall polysaccharides, with large numbers of exo-acting glucanases and swollenin genes (Qi et al., Reference Qi, Wang, O’Toole, Barboza, Ungerfeld, Leigh, Selinger, Butler, Tsang, McAllister and Forster2011). Functional assignment of the transcripts showed that 3.4% of the data were more closely related to bacterial sequences, illustrating the poor coverage of rumen eukaryotic sequences in the reference database and the occurrence of horizontal gene transfer that occurs within the rumen microbiome.
Using methods to deplete the ribosomal RNA sequences from the total RNA, Dai and colleagues were able to collect in excess of 1 million non-rRNA reads of which ~1% were identified as carbohydrate active enzymes or binding modules (Dai et al., Reference Dai, Tian, Li, Su, Wang, Zhao, Liu, Luo, Liu, Zheng, Wang, Dong, Hu and Huang2015). A similar level of carbohydrate active genes were observed in the mRNA enriched metatranscriptomic data of dairy cows in Japan (Shinkai et al., Reference Shinkai, Mitsumori, Sofyan, Kanamori, Sasaki, Katayose and Takenaka2016). These studies confirmed that the major bacterial activity of fibre degradation was being performed by members of the genera Fibrobacter, Prevotella and Ruminococcus. Recently, a study on dairy cows in France used 18 newly designed ribosomal depletion capture probes covering a large number of the rumen archaeal, bacterial, fungal and protozoal genera to enrich for non-rRNA reads (Comtet-Marre et al., Reference Comtet-Marre, Parisot, Lepercq, Chaucheyras-Durand, Mosoni, Peyretaillade, Bayat, Shingfield, Peyret and Forano2017). Likewise, the data confirmed the majority of bacterial activity resides with these fibrolytic species, but also highlighted the large contribution from fungal and protozoal species. The lack of relevant genomes for mapping and annotation of RNA-Seq data is still considered the major limitation of these methods.
Metatranscriptomic analysis of low and high RFI beef cattle used total RNA sequencing rather than enriching for non-rRNA sequences (Li and Guan, Reference Li and Guan2017). There was no variance between the microbiomes based on the rRNA data at the taxonomic family level. Approximately 90% of all the read data were identified as rRNA, resulting on an average ~4.5 million non-ribosomal reads for transcriptomic analysis. Based on reads assigned to KEGG pathways, 30 pathways were relatively more abundant in H-RFI animals (inefficient), including pathways associated with amino acid metabolism, reinforcing the findings from H-RFI dairy cows using metagenomics (Shabat et al., Reference Shabat, Sasson, Doron-Faigenboim, Durman, Yaacoby, Berg Miller, White, Shterzer and Mizrahi2016).
Early studies of rumen methanogens using rRNA and mcrA libraries identified a group of archaea closely related to Thermoplasmatales (rumen cluster C) (Denman et al., Reference Denman, Tomkins and McSweeney2007; Janssen and Kirs, Reference Janssen and Kirs2008). Although it was hypothesised that they were likely to be methanogenic archaea, due the ubiquitous nature of this group within the rumen and supported by the presence of the mcrA gene. It was not until Poulsen and colleagues used metatranscriptomics that a definitive link could be ascertained between methanogenic capacity and the RCC group (Poulsen et al., Reference Poulsen, Schwab, Jensen, Engberg, Spang, Canibe, Hojberg, Milinovich, Fragner, Schleper, Weckwerth, Lund, Schramm and Urich2013). Based on these transcripts, it was concluded that this group of methanogens used methylamines for the production of methane rather than the usual hydrogenotropihic pathway used by Methanobrevibacter spp. The RCC and related methylotrophic methanogens were renamed and belong to the new order Methanomassiliicoccales (Iino et al., Reference Iino, Tamaki, Tamazawa, Ueno, Ohkuma, Suzuki, Igarashi and Haruta2013).
High methane-emitting sheep did not show any variance in the abundance of methanogenic species or pathways from metagenomic analysis, but did show a strong correlation between hydrogenotrophic methanogenesis transcripts and methane yields (Shi et al., Reference Shi, Moon, Leahy, Kang, Froula, Kittelmann, Fan, Deutsch, Gagic, Seedorf, Kelly, Atua, Sang, Soni, Li, Pinares-Patiño, McEwan, Janssen, Chen, Visel, Wang, Attwood and Rubin2014). It was concluded that this was likely a response to the supply of hydrogen in the rumen from the fermentative processes of other rumen microbes. Further to this study, the bacterial species were the focus of a second paper in which it was suggested that due most likely to a smaller rumen size and faster transit times, conditions were more favourable for rapid bacterial fermentation of hexose through to lactate and butyrate resulting in lower H2 yields for methanogenesis (Kamke et al., Reference Kamke, Kittelmann, Soni, Li, Tavendale, Ganesh, Janssen, Shi, Froula, Rubin and Attwood2016). Butyrate production was assigned to Sharpea spp. and Megasphaera spp. and involves a two-step process, calculated to produce 24% less methane, due to less net hydrogen generation, compared with the common one-step fermentative pathway performed by Ruminococcaceae (Kamke et al., Reference Kamke, Kittelmann, Soni, Li, Tavendale, Ganesh, Janssen, Shi, Froula, Rubin and Attwood2016).
Metaproteomics for the characterisation of the rumen microbioata proteome
Considerations and limitations
Advances in the throughput and accuracy of mass spectrometry tools coupled with peptide separation methods can now lead to the detection of over 100 000 tandem mass spectra per acquisition (Timmins-Schiffman et al., Reference Timmins-Schiffman, May, Mikan, Riffle, Frazar, Harvey, Noble and Nunn2017). But like all other omic approaches, the limitation for identification and taxonomic assignment is due to the poor coverage of sample relevant data in the reference sets. Workflows that combine relevant metagenomic and isolate sequence data will improve peptide identification (Petriz and Franco, Reference Petriz and Franco2017). Processing methods that reduce contaminating proteins from the diet or host are also essential to maximise the focus on the microbiome peptides. Likewise, rumen and faecal samples can contain polyphenolic compounds, such as tannins and humic acids that can co-precipitate and interfere with analysis through modifications that make it difficult to identify the peptides and supress ionisation (Makkar et al., Reference Makkar, Blummel and Becker1995; Qian and Hettich, Reference Qian and Hettich2017; Snelling and Wallace, Reference Snelling and Wallace2017).
Applications of metaproteomics
Limited metaproteomic studies have been published for the rumen, an early attempt from sheep rumen microorganisms focussed on identifying cellulose-binding proteins through an enrichment step (Toyoda et al., Reference Toyoda, Iio, Mitsumori and Minato2009). By using 1D-PAGE coupled to MS/MS, they were able to identify a small number of proteins and link them to microbial species using the limited databases available at the time. Endoglucanases of the cellulolytic bacterium Fibrobacter succinogenes and an exoglucanase from the fungi Piromyces equi were among the proteins identified.
Increased resolution and substantially more peptides were identified when 2D-PAGE separation was used before LC-MS/MS detection (Snelling and Wallace, Reference Snelling and Wallace2017). Issues associated with humic content of grass fed animals and between-sample replication limits the possibility to use 2D-PAGE as a sole tool to predict the function of rumen proteins. Humic compounds co-precipitate with proteins and alter gel mobility, resulting in unresolved smears rather than distinct protein bands. However, metaproteomics of the separated proteins identified protozoal structural proteins, prokaryotic central metabolic enzymes and archaeal methanogenesis proteins (Snelling and Wallace, Reference Snelling and Wallace2017).
Using a metaproteomic shotgun approach to discover peptides from plant adherent and rumen liquid fraction microbiota, it was possible to identify in excess of 2000 bacterial, 150 archaeal and 800 eukaryotic proteins in the fibre adherent fraction and similar ratios but lower numbers in the liquid fraction (Deusch and Seifert, Reference Deusch and Seifert2015). Bacterial and archaeal taxonomy was as expected with Prevotellaceae, Fibrobacteraceae, Ruminococcaceae, Clostridiaceae, Methanobacteriaceae and Methanomicrobiaceae dominating. Eukaryotic taxonomic assignment was hindered by the sparsity of rumen protozoal and fungal data sets and most proteins could only be identified to the Phyla level and were classified as originating from plant and host proteins.
A more detailed metaproteomic study including sample fractionation for fibre associated and liquid microbiota, along with diet shifts from forage and grain based diets produced over 8000 bacterial and 350 archaeal proteins (Deusch et al., Reference Deusch, Camarinha-Silva, Conrad, Beifuss, Rodehutscord and Seifert2017). Concurrent with the 16S rRNA amplicon-based analysis of the same samples, diet was confirmed as the largest driver of microbiota change. Succinivibrionaceae OTUs and proteins attributed to this group, particularly carbohydrate esterases were both more abundant in the corn-supplemented diet and confirmed its increase on starch-based diets as observed previously (Petri et al., Reference Petri, Schwaiger, Penner, Beauchemin, Forster, McKinnon and McAllister2013; Wallace et al., Reference Wallace, Rooke, McKain, Duthie, Hyslop, Ross, Waterhouse, Watson and Roehe2015; Martinez-Fernandez et al., Reference Martinez-Fernandez, Denman, Yang, Cheung, Mitsumori and McSweeney2016). The major fibre degrading bacteria Fibrobacter spp., Ruminococcus spp. and Lachnospiraceae were more prevalent in the solid fraction. Furthermore, a high proportion of proteins linked to butyrate formation were assigned to Lachnospiraceae bacteria in the fibre-rich diets. The metabolically diverse Prevotellaceae were abundant in all samples and linked with acetate and propionate pathway proteins (Deusch et al., Reference Deusch, Camarinha-Silva, Conrad, Beifuss, Rodehutscord and Seifert2017). Archaeal populations did not differ significantly between diets or collected fractions due to low abundance. However, proteomic analysis identified proteins from the Crenarchaeota and Thaumarchaeota Phyla that the amplicon sequencing did not, possibly because of primer biases for the OTU sequencing. The methyl-coenzyme M reductase involved in the final step of methanogenesis was detected at its lowest in the corn-based diet, corresponding with the general observation of others that methanogenesis is lower in cattle fed high concentrate diets (Rooke et al., Reference Rooke, Wallace, Duthie, McKain, de Souza, Hyslop, Ross, Waterhouse and Roehe2014; Martinez-Fernandez et al., Reference Martinez-Fernandez, Denman, Yang, Cheung, Mitsumori and McSweeney2016).
Microbial metabolite detection within the rumen using metabolomics
The majority of rumen nutrition studies have usually performed some limited metabolomics analysis, at least presenting data on short chain fatty acids (SCFA) products and some with methane and hydrogen levels. Mass spectrometry and nuclear magnetic resonance spectroscopy are the most popular high throughput methods being applied to rumen samples.
Analogous to most other rumen analysis, the metabolome of animals on different diets could easily be distinguished. Most metabolomics studies in ruminants have investigated the increased proportion of concentrate/grain in a diet compared with roughage. Saleem et al. (Reference Saleem, Bouatra, Guo, Psychogios, Mandal, Dunn, Ametaj and Wishart2013) summarised the results of studies performed on increasing levels of concentrate using various methods to detect 246 rumen metabolites, covering phospholipids, inorganic ions, gases, amino acids SCFA and carbohydrates (Ametaj et al., Reference Ametaj, Zebeli, Saleem, Psychogios, Lewis, Dunn, Xia and Wishart2010; Saleem et al., Reference Saleem, Ametaj, Bouatra, Mandal, Zebeli, Dunn and Wishart2012). Increased grain in the diet lead to rises in methylamines and a decrease in 3-phenylpropionate (hydrocinnamate), likely linked to the decreased plant phenolic component of the diet. Similar findings were also reported from other studies when altering the diet (Zhao et al., Reference Zhao, Zhao, Bu, Sun, Wang and Dong2014; Mao et al., Reference Mao, Huo and Zhu2016; Zhang et al., Reference Zhang, Shi, Wang, Li, Cao, Ji, He and Zhang2017a and Reference Zhang, Ye, Liu and Mao2017b; O’Callaghan et al., Reference O’Callaghan, Vazquez-Fresno, Serra-Cayuela, Dong, Mandal, Hennessy, McAuliffe, Dillon, Wishart, Stanton and Ross2018).
Profiling of feed efficient rumens could also be distinguished by metabolomics, ruminal biohydrogenation pathways, in particular down regulation of linoleic and alpha linoleic acid metabolism were associated with average daily gain (Artegoitia et al., Reference Artegoitia, Foote, Lewis and Freetly2017). However, feed efficiency in dairy cows was linked to increased production of SCFA, with increased propionate:acetate ratios and decreased methane and increased putrescine (Shabat et al., Reference Shabat, Sasson, Doron-Faigenboim, Durman, Yaacoby, Berg Miller, White, Shterzer and Mizrahi2016).
Unlike other omic technologies, metabolomics cannot directly link metabolites to a microbial species. Associations with changes in microbial relative abundance through microbial profiling, metatranscriptomic or metaproteomic is required. Correlations between changes in urine and plasma metabolites have been linked to relative abundance shifts for protozoa and Methanomassiliicoccus spp., particularly trimethylamine N-oxide (Morgavi et al., Reference Morgavi, Rathahao-Paris, Popova, Boccard, Nielsen and Boudra2015; Saro et al., Reference Saro, Hohenester, Bernard, Lagrée, Martin, Doreau, Boudra, Popova and Morgavi2018). This suggests that this compound may be used as a biomarker for monitoring methylamine utilising methanogens through urine metabolites.
Combining many of the omic platforms will often lead to more powerful observations; for determining the function of un-cultivated bacteria or confirming functional roles within different systems. The characterisation of multiple genomes from the un-cultivated Bacteroidetes BS11 family using metagenomic data and confirmation of function with metaproteomic and metabolomic data categorised them as hemicellulosic degraders producing SCFA as end products (Solden et al., Reference Solden, Hoyt, Collins, Plank, Daly, Hildebrand, Beavers, Wolfe, Nicora, Purvine, Carstensen, Lipton, Spalinger, Firkins, Wolfe and Wrighton2017). Likewise, microbial profiling partnered with metaproteomics and metabolomics reveals the Prevotellaceae as a metabolically versatile group dominating on both concentrate and fibre rich diets (Deusch et al., Reference Deusch, Camarinha-Silva, Conrad, Beifuss, Rodehutscord and Seifert2017). Corn-based diets promoted the activities of the Succinivibrionaceae family, leading to production of succinate for use by other species resulting in propionate as an end product. Fibre rich diets promoted Prevotellaceae, Ruminococcus spp. and Lachnospiraceae to drive butyrate and propionate formation leading to changes in the propionate:acetate ratio. Dairy cattle showed increased energy harvest in feed efficient animals that was linked to a restriction in microbiome diversity and richness. Efficient animals were highlighted with increases in hydrogen and lactate utilising Megasphaera elsdeniii and Coprococcus catus, resulting in production of butyrate and propionate (Shabat et al., Reference Shabat, Sasson, Doron-Faigenboim, Durman, Yaacoby, Berg Miller, White, Shterzer and Mizrahi2016).
Future directions
Accurate annotation of the functional gene products allows for the construction of genome-scale metabolic models (GEM) for individual isolates, which can then be applied to understand complex interactions within the system (van der Ark et al., Reference van der Ark, van Heck, Martins Dos Santos, Belzer and de Vos2017). Most microbes have the capacity to utilise a wide array of nutrients using varied metabolic pathways. Tools such as Minimal Environmental TOol (MENTO), RAVEN and ModelSEED can be used to predict the minimal nutrient requirements of as yet un-cultivated organisms based on its GEM (Henry et al., Reference Henry, DeJongh, Best, Frybarger, Linsay and Stevens2010; Agren et al., Reference Agren, Liu, Shoaie, Vongsangnak, Nookaew and Nielsen2013; Zarecki et al., Reference Zarecki, Oberhardt, Reshef, Gophna and Ruppin2014). Beyond models that predict an individual’s phenotype are ones that allow for modelling co-cultures and multispecies interactions to predict ways to drive a desired rumen phenotype. Hypothesis testing based on culturing studies can be performed on isolates and multiple species competing and interacting to build network models to better inform the GEM in an iterative manner. Predictive competitive and cooperative metabolic models already suggest that competition is generally dominated by versatile fast growing species (Freilich et al., Reference Freilich, Zarecki, Eilam, Segal, Henry, Kupiec, Gophna, Sharan and Ruppin2011). Likely explaining the high abundance of Prevotella spp. consistently observed in the rumen. Regardless of these techniques, accuracy and further advancement of these models will still require isolation and culturing of these new microorganisms to validate and strengthen predictions. New high throughput culturing methods and media are proving successful (Kenters et al., Reference Kenters, Henderson, Jeyanathan, Kittelmann and Janssen2011; Lagier et al., Reference Lagier, Khelaifia, Alou, Ndongo, Dione, Hugon, Caputo, Cadoret, Traore, Seck, Dubourg, Durand, Mourembou, Guilhot, Togo, Bellali, Bachar, Cassir, Bittar, Delerce, Mailhe, Ricaboni, Bilen, Dangui Nieko, Dia Badiane, Valles, Mouelhi, Diop, Million, Musso, Abrahão, Azhar, Bibi, Yasir, Diallo, Sokhna, Djossou, Vitton, Robert, Rolain, La Scola, Fournier, Levasseur and Raoult2016).
Monitoring changes to rumen function for research purposes is achievable, but scaling to the level that will drive changes in production systems will need the development of simple robust quantitative markers. A large number of microbial metabolites and microbial-host co-metabolites are present in plasma and other body fluids. There is potential for the discovery of biomarkers linked to microbiota function that could be applied in farms using less invasive and simpler techniques. Saliva and buccal swab samples have already been shown to reflect the rumen microbiome, whereas changes in plasma fatty acid profiles were suggested as biomarkers for weight gain and levels of trimethylamine N-oxide as a marker for methylamine utilising methanogens (Kittelmann et al., Reference Kittelmann, Kirk, Jonker, McCulloch and Janssen2015; Morgavi et al., Reference Morgavi, Rathahao-Paris, Popova, Boccard, Nielsen and Boudra2015; Tapio et al., Reference Tapio, Shingfield, McKain, Bonin, Fischer, Bayat, Vilkki, Taberlet, Snelling and Wallace2016; Artegoitia et al., Reference Artegoitia, Foote, Lewis and Freetly2017).
The omics technologies have given us an understanding of which species are present and their potential function. Now, why and how they are present will start to be addressed. Ultimately, these understandings should lead to the ability to design targeted interventions that direct rumen composition and activity towards improved production, health and benefits to the environment.
Conclusions
Considering the limitations and erroneous methods initially used, primarily due to the infancy of these technologies, the low numbers of animals investigated and lack of robust statistical tools for compositional data, omics based technologies have produced results that relate to and support our biological understanding of the rumen system. Continued and more precise use of these tools will lead to a detailed and comprehensive understanding of compositional and functional capacity. Relative abundance shifts in microbial populations are now being related to gene transcripts and proteins that explain changes in detected metabolites. However, our ability to rationally design and drive rumen microbial composition and function in this highly complex and dynamically changing environment is limited. Integrating omics data will allow for the construction of rumen-specific microbial metabolic models.
Acknowledgements
D. P. Morgavi was partially supported by the French National Research Agency (ANR) through the FACCE-JPI project RumenStability and FACCE ERA-GAS project RumenPredict. S. E. Denman and C. S. McSweeny were partially supported by Meat and Livestock Australia (MLA).
Declaration of interest
None.
Ethics statement
All ethical standards have been met.
Software and data repository resources
None.