The advent of microbiology in the 19th century is arguably the biggest factor shaping modern medicine. While an understanding of infectious disease is often touted as its biggest benefit, the discovery of microbial natural products has arguably been more impactful. Much as the industrial revolution brought about products that had never been imaginable, the “golden age of discovery” of microbial natural products provided access to hitherto untapped biosynthetic machinery and, consequently, a whole new era of natural products. However, the process of obtaining these products (using the Waksman platform) remained inefficient and laborious, and once it began repeatedly rediscovering the same products, a new discovery method was required. Metagenomics provided the answer. Biosynthetic genes exist in clusters, so a genome of interest can be scanned for known biosynthetic genes, and neighboring genes analyzed to assess their capacity for producing secondary metabolites (Myronovskyi and Luzhetskyy, 2016). If novel biosynthetic pathways are found, the genes can be cloned and production of the desired product induced. Functional metagenomics involves experimentally expressing and screening metagenomic DNA libraries from an environment, which also aids the annotation of sequence-based metagenomic data. Functional metagenomics poses more challenges but facilitates the discovery of products that cannot be predicted using sequence-based methods. This review covers a variety of primary literature, selected foremost for its potential impact on the progression of the field.
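The idea of scanning a genome for known biosynthetic genes and then inspecting their neighbors can be illustrated with a minimal sketch. It assumes a simplified, hypothetical gene table (name, coordinates, free-text annotation) and an arbitrary flanking window; the `Gene` class, the keyword matching, and the window size are illustrative assumptions, not the method of any cited study or tool.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int       # start coordinate on the contig (bp)
    end: int         # end coordinate on the contig (bp)
    annotation: str  # free-text functional annotation (hypothetical format)


def cluster_neighborhoods(genes, anchor_keywords, window=20000):
    """Return, for each anchor gene, the names of genes within `window` bp.

    An anchor is any gene whose annotation contains one of the keywords
    (e.g. "polyketide synthase"). The window size is an assumed value.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    neighborhoods = {}
    for anchor in ordered:
        text = anchor.annotation.lower()
        if not any(key.lower() in text for key in anchor_keywords):
            continue
        lo, hi = anchor.start - window, anchor.end + window
        # Keep every other gene that overlaps the flanking window.
        neighborhoods[anchor.name] = [
            g.name for g in ordered
            if g is not anchor and g.end >= lo and g.start <= hi
        ]
    return neighborhoods
```

The neighboring genes returned for each anchor are the candidates that would then be examined for a role in secondary metabolite production.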
There are two main metagenomic strategies involved in the discovery of microbial natural products: sequence-based strategies and phenotype-based strategies. Both have the advantage of not requiring cultivation of the organism responsible for producing the product. Phenotype-based strategies generally involve cloning eDNA libraries (Figure 1) and screening them for a desired phenotype. The most obvious example is screening for enzyme activity. Clones can be grown on a medium containing the substrate of a desired enzyme under the desired conditions; digestion of the substrate indicates the presence of the enzyme. The choice of habitat from which the eDNA is derived essentially directs the properties of the natural products encoded by the gene clusters. Phenotype-based approaches of this kind are collectively known as functional metagenomics.
Sequence-based strategies involve targeting highly conserved regions within known groups of biosynthetic pathways such as polyketide synthases (PKS) and nonribosomal peptide synthetases (NRPS). This is done using PCR primers or probes designed for homology with genes of interest found through DNA sequencing. This allows a much more targeted approach to finding biosynthetic gene clusters of interest, enabling the amplification and overexpression of rare genes of interest.
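As a rough illustration of how a degenerate primer targets a conserved region, the sketch below expands standard IUPAC ambiguity codes into a regular expression and reports where the primer could anneal on a sequence. The primer and sequence used with it are invented for illustration, only the forward strand is searched, and exact matching is assumed (real PCR tolerates some mismatches).

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_pattern(primer):
    """Translate a degenerate primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in primer.upper())


def find_primer_sites(primer, sequence):
    """Return 0-based start positions where the primer matches exactly.

    A lookahead is used so that overlapping sites are also reported.
    Only the forward strand is searched in this simplified sketch.
    """
    pattern = re.compile("(?=" + primer_to_pattern(primer) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical primer and sequence, for illustration only.
sites = find_primer_sites("GCNTAY", "TTGCATACGGGCGTATAA")
```

A single degenerate primer can therefore match several sequence variants of the same conserved motif, which is what allows one primer pair to recover related genes from many different organisms in an eDNA sample.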
This review highlights how functional metagenomics has allowed the directed discovery of novel natural products with desirable characteristics that, by their nature, make the microbes producing them difficult to culture. The discovery of novel thermostable cellulases and of lipolytic enzymes that function at lower temperatures are examples explored in this review. This review also explores how sequence-based metagenomics provides answers to two issues that are unaddressed by traditional genome mining: activating silent pathways in well-studied microbial biosynthetic systems and identifying rare novel biosynthetic pathways. These topic areas have the greatest impact on real-life applications, enabling the directed discovery of desired natural products.
Functional Metagenomics: Discovery of novel thermostable cellulases
Cellulases are a group of enzymes that catalyze the breakdown of cellulose, the most abundant polysaccharide on earth, into glucose. They have many industrial applications, such as the production of biofuels and the treatment of textiles. Both industries require thermostable cellulases that can work at higher temperatures. Functional metagenomics allows scientists to exploit the metabolisms of thermophilic prokaryotes, which are difficult to culture in the laboratory, to discover new thermostable cellulases. Screening of 60 fosmid eDNA libraries derived from hydrothermal vents was conducted at temperatures ranging from 8°C to 70°C. Most of the active fosmid clones were active in the range of 30-37°C, which is of little use for high-temperature industrial applications. One fosmid clone was active at 70°C, however, and when expressed on plates containing carboxymethyl cellulose at this temperature, produced a ring of hydrolysis, indicating the presence of a thermostable cellulase. Sequence analysis uncovered a novel family of cellulose-/β-glucan-specific endoglucanases, named Cel12E (Leis et al., 2015). Analysis of the open reading frames (ORFs) in the fosmid insert showed that it originates from the archaeon Thermococcus sp., extracted from deep-sea vents. Further experimentation demonstrated that Cel12E can hydrolyze cellulose at temperatures as high as 92°C.
Metagenomics clearly offers several advantages over traditional genome mining techniques in discovering thermostable cellulases such as this. Firstly, given that less than 2% of eDNA expressed an enzyme with the desired thermostable capability, it would be highly inefficient to culture and express each genome of interest individually. Furthermore, the extreme environment in which the prokaryotes encoding these biosynthetic pathways reside makes successful culturing and expression challenging to begin with. Even if these extremophile archaea were successfully cultured in the laboratory, it is unlikely that the conditions required to activate these pathways would be met, meaning most of them would be “silent”. Moreover, metagenomics as used in this context opens the door to further discovery of similar natural products. Once a species that produces thermostable cellulases has been identified, it is likely that the same species encodes biosynthetic pathways that produce other thermostable enzymes. Natural product discovery therefore has a cumulative effect. This contrasts with traditional genome mining efforts, which could be considered a zero-sum game: the more products discovered, the less likely it becomes to discover novel ones.
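The inefficiency argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each screened sample is an independent trial with the same hit rate, and it uses the 2% figure quoted above as an upper bound; neither assumption comes from the cited study.

```python
import math


def prob_at_least_one_hit(hit_rate, n_screened):
    """Probability of one or more hits among n independent screens."""
    return 1.0 - (1.0 - hit_rate) ** n_screened


def screens_needed(hit_rate, target_prob):
    """Smallest number of screens giving at least `target_prob` of a hit."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - hit_rate))


# Assumed values: 2% hit rate (upper bound from the text), 95% confidence.
n = screens_needed(0.02, 0.95)
expected_screens_per_hit = 1 / 0.02  # mean of a geometric distribution
```

Under these assumptions, on the order of a hundred or more samples must be processed before a hit becomes likely, which is manageable for pooled library screening but prohibitive if every organism first has to be isolated and cultured.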
Functional Metagenomics: Cold Active Lipases
The sediment at the bottom of the Baltic Sea has a high level of microbial diversity, making it a good candidate for the discovery of low-temperature lipases. Lipases are enzymes that hydrolyze water-insoluble triglycerides into glycerol and fatty acids. They have a wide variety of industrial uses, including the production of detergents, pharmaceuticals, textiles, and paper. A lipase that is active at low temperature would have considerable financial advantages, reducing industrial energy costs and emissions. A metagenomic library of Baltic Sea eDNA was constructed in an E. coli fosmid library and cloned before being size separated to identify fosmids of interest (Figure 2). The clones were then functionally screened for lipolytic activity (Hardeman and Sjoling, 2007). Only 1% yielded a positive result. A novel enzyme, H1Lip1, was identified as a lipase candidate after sequencing and annotation comparison, having 54% similarity to an esterase found in Pseudomonas putida. The enzyme was overexpressed and purified before being assayed with the ester substrate DGGR to confirm it as a lipase. H1Lip1 was shown to be stable between 25 and 35°C.
The literature points towards the eDNA vector and host as the reason that yields were so low, given that they were not adapted to low temperatures. However, other studies have obtained much higher yields from a deep-sea sponge, Stelletta normani, using the same vector (Borchert et al., 2017). In this case, a multitude of other variables were also adjusted when expressing the enzyme, including salinity and pH. It is feasible, therefore, that other silenced pathways could produce novel lipases if the correct expression conditions are met in addition to substrate screening.
Sequence-based metagenomics: Activating silenced pathways
Recent advancements in bioinformatics tools such as antiSMASH allow researchers to identify silenced biosynthetic genes in order to predict and discover natural products, without requiring the genes to be expressed. An example of this kind of sequence analysis comes from Salinispora tropica, where researchers predicted that a PKS gene cluster would incorporate a polyene unit into the final product, which could be detected experimentally by UV absorption (Udwary et al., 2007). Culture broth extracts were then produced and screened for UV absorption, and novel polyene macrolactams were isolated. A similar study analyzed silent NRPS and PKS-NRPS gene clusters found in the Pseudomonas fluorescens Pf-5 genome and allowed researchers to deduce the amino acid substrates used by these clusters (Garrido-Sanz et al., 2017). These substrates were then isotopically labeled so that they could be tracked. This led to the discovery that P. fluorescens produces orfamides, a novel group of lipopeptides, despite the fact that P. fluorescens is a well-studied bacterium. The availability of genomes and genome comparison tools such as antiSMASH means that natural products can be predicted from the substrates required, allowing scientists to identify and target silent gene clusters and induce their expression.
Another example is the induction of biosynthetic genes from the well-studied Streptomyces coelicolor, in which HDAC inhibitors were found to stimulate the expression of biosynthetic gene clusters under conditions in which they are not normally expressed (Moore et al., 2012). This is a major advantage of sequence-based metagenomic strategies in the production of natural products: not only are silenced pathways identified, but the conditions and substrates required for expression can also be deduced.
Sequence-based Metagenomics: Discovering rare pathways
A library consisting of over 10 million unique cosmid clones was constructed using DNA isolated from the Anza-Borrego Desert and screened by sequence. An oxy-tryptophan dimerization gene cluster was found that was distinct from known clusters and was later expressed in Streptomyces albus. It was found to produce several useful natural products, including an indolotryptoline antiproliferative agent (Chang and Brady, 2013). This sequence-based screening for unique biosynthetic gene clusters provides a much more directed approach to discovering natural products, since gene clusters determined to encode previously discovered natural products can be excluded prior to expression.
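The exclusion of already-known gene clusters before expression can be sketched as a simple similarity filter. The example below compares short marker sequences by k-mer overlap (Jaccard similarity) and keeps only candidates that do not closely resemble any known sequence. The k-mer size, the threshold, and the clone names are illustrative assumptions; this is a stand-in for the homology-based comparison described above, not the method used by Chang and Brady (2013).

```python
def kmers(seq, k=4):
    """Return the set of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novel_candidates(candidates, known, k=4, max_similarity=0.8):
    """Names of candidates whose best match to any known sequence is
    below `max_similarity` (an assumed, arbitrary threshold)."""
    known_profiles = [kmers(seq, k) for seq in known.values()]
    novel = []
    for name, seq in candidates.items():
        profile = kmers(seq, k)
        best = max((jaccard(profile, kp) for kp in known_profiles),
                   default=0.0)
        if best < max_similarity:
            novel.append(name)
    return novel
```

Only the clones returned by such a filter would be carried forward to heterologous expression, which is where the saving in effort over undirected screening arises.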
The production of many natural products of interest involves tryptophan dimerization. All biosynthetic gene clusters that encode tryptophan dimers have highly conserved genes responsible for dimerization. Indolotryptoline is a tryptophan dimer that has been elusive in culture-based experiments and is pharmacologically desirable, since it has been shown to have anti-tumor activity. Degenerate primers were designed to target the conserved dimerization sequences, and sequence screening of the eDNA sample with these primers resulted in the cloning and expression of an indolotryptoline-based biosynthetic gene cluster, which was termed bor. This cluster produced two borregomycin natural products, which have antibiotic as well as anticancer properties. This is one example of a rare variant of a biosynthetic gene cluster being overexpressed to produce bioactive natural products. Thus, it is entirely possible that a mutation in a single prokaryote that offers no evolutionary advantage could be captured and overexpressed from an eDNA sample to produce natural products that are pharmaceutically or industrially advantageous but are not found in any other environmental genome. This exemplifies the way in which sequence-directed genome mining uncovers rare biosynthetic pathways, whereas traditional methods are a proverbial shot in the dark, resulting in the rediscovery of the most common biosynthetic systems.
Previous genome mining methods, namely the Waksman platform for the discovery of antibiotics, have been hindered by silent biosynthetic genes that are not expressed in the laboratory, as well as by the scarcity of some natural product-producing genes. This resulted in the rediscovery of the same natural products. The advent of metagenomics revolutionized genome mining for genes producing useful natural products, not only allowing researchers to identify silent biosynthetic genes within well-studied genomes, but also giving insight into the substrates required for their activation. Additionally, unique pathways can be identified much more efficiently, meaning that novel natural products can be discovered from libraries containing many genes of no interest. Functional metagenomics has facilitated the discovery of natural products that have been selected for by their native environmental conditions, meaning they can be mined directly in correspondence with a desired industrial or pharmaceutical use. Future research could look towards constructing libraries of chimeric biosynthetic genes using eDNA libraries from generally conflicting conditions, producing industrial enzymes that do not exist naturally.
- Myronovskyi, M. and Luzhetskyy, A. (2016). Native and engineered promoters in natural product discovery. Natural Product Reports, 33(8), pp.1006-1019.
- Leis, B., Heinze, S., Angelov, A., Pham, V., Thürmer, A., Jebbar, M., Golyshin, P., Streit, W., Daniel, R. and Liebl, W. (2015). Functional Screening of Hydrolytic Activities Reveals an Extremely Thermostable Cellulase from a Deep-Sea Archaeon. Frontiers in Bioengineering and Biotechnology, 3.
- Hardeman, F. and Sjoling, S. (2007). Metagenomic approach for the isolation of a novel low-temperature-active lipase from uncultured bacteria of marine sediment. FEMS Microbiology Ecology, 59(2), pp.524-534.
- Borchert, E., Selvin, J., Kiran, S., Jackson, S., O'Gara, F. and Dobson, A. (2017). A Novel Cold Active Esterase from a Deep Sea Sponge Stelletta normani Metagenomic Library. Frontiers in Marine Science, 4.
- Udwary, D., Zeigler, L., Asolkar, R., Singan, V., Lapidus, A., Fenical, W., Jensen, P. and Moore, B. (2007). Genome sequencing reveals complex secondary metabolome in the marine actinomycete Salinispora tropica. Proceedings of the National Academy of Sciences, 104(25), pp.10376-10381.
- Garrido-Sanz, D., Arrebola, E., Martínez-Granero, F., García-Méndez, S., Muriel, C., Blanco-Romero, E., Martín, M., Rivilla, R. and Redondo-Nieto, M. (2017). Classification of Isolates from the Pseudomonas fluorescens Complex into Phylogenomic Groups Based on Group-Specific Markers. Frontiers in Microbiology, 8.
- Moore, J., Bradshaw, E., Seipke, R., Hutchings, M. and McArthur, M. (2012). Use and Discovery of Chemical Elicitors That Stimulate Biosynthetic Gene Clusters in Streptomyces Bacteria. Methods in Enzymology, pp.367-385.
- Chang, F. and Brady, S. (2013). Discovery of indolotryptoline antiproliferative agents by homology-guided metagenomic screening. Proceedings of the National Academy of Sciences, 110(7), pp.2478-2483.