Introduction

Historically, natural products (NPs) have played a key role in drug discovery, especially for cancer and infectious diseases1,2, but also in other therapeutic areas, including cardiovascular diseases (for example, statins) and multiple sclerosis (for example, fingolimod)3,4,5.

NPs offer special features in comparison with conventional synthetic molecules, which confer both advantages and challenges for the drug discovery process. NPs are characterized by enormous scaffold diversity and structural complexity. They typically have a higher molecular mass, a larger number of sp3 carbon atoms and oxygen atoms but fewer nitrogen and halogen atoms, higher numbers of H-bond acceptors and donors, lower calculated octanol–water partition coefficients (cLogP values, indicating higher hydrophilicity) and greater molecular rigidity compared with synthetic compound libraries1,6,7,8,9. These differences can be advantageous; for example, the higher rigidity of NPs can be valuable in drug discovery tackling protein–protein interactions10. Indeed, NPs are a major source of oral drugs ‘beyond Lipinski’s rule of five’11. The increasing significance of drugs not conforming to this rule is illustrated by the increase in molecular mass of approved oral drugs over the past 20 years12. NPs are structurally ‘optimized’ by evolution to serve particular biological functions1, including the regulation of endogenous defence mechanisms and the interaction (often competition) with other organisms, which explains their high relevance for infectious diseases and cancer. Furthermore, their use in traditional medicine may provide insights regarding efficacy and safety. Overall, the NP pool is enriched with ‘bioactive’ compounds covering a wider area of chemical space compared with typical synthetic small-molecule libraries13.
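
As a rough illustration of what ‘beyond the rule of five’ means in practice, the following minimal Python sketch counts violations of the four commonly cited Lipinski criteria (molecular mass ≤ 500 Da, cLogP ≤ 5, at most 5 H-bond donors and at most 10 H-bond acceptors). The class and function names are illustrative, the tolerance of one violation is an assumption that is often but not universally applied, and the descriptor values in the usage note are hypothetical placeholders rather than data for any real compound.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed physicochemical descriptors of one compound."""
    mol_mass: float    # molecular mass in Da
    clogp: float       # calculated octanol-water partition coefficient
    h_donors: int      # number of H-bond donors
    h_acceptors: int   # number of H-bond acceptors


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski criteria are violated."""
    checks = [
        d.mol_mass > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def beyond_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumption: a compound is 'beyond' the rule if it exceeds the tolerance."""
    return rule_of_five_violations(d) > max_violations
```

For example, a hypothetical macrocycle described by `Descriptors(1200.0, 3.0, 6, 12)` violates three criteria and would be flagged as beyond the rule of five, whereas `Descriptors(300.0, 2.0, 2, 4)` violates none.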

Despite these advantages and multiple successful drug discovery examples, several drawbacks of NPs have led pharmaceutical companies to reduce NP-based drug discovery programmes. NP screens typically involve a library of extracts from natural sources (Fig. 1), which may not be compatible with traditional target-based assays14. Identifying the bioactive compounds of interest can be challenging, and dereplication tools have to be applied to avoid rediscovery of known compounds. Accessing sufficient biological material to isolate and characterize a bioactive NP may also be challenging15. Furthermore, gaining intellectual property (IP) rights for (unmodified) NPs exhibiting relevant bioactivities can be a hurdle, since naturally occurring compounds in their original form may not always be patented (legal frameworks vary between countries and are evolving)16, although simple derivatives can be patent-protected (Box 1). An additional layer of complexity relates to the regulations defining the need for benefit sharing with countries of origin of the biological material, framed in the United Nations 1992 Convention on Biological Diversity and the Nagoya Protocol, which entered into force in 2014 (ref.17), as well as recent developments concerning benefit sharing linked to use of marine genetic resources18.

Fig. 1: Outline of traditional bioactivity-guided isolation steps in natural product drug discovery.

Steps in the process are shown in purple boxes, with associated key limitations shown in red boxes and advances that are helping to address these limitations in modern natural product (NP)-based drug discovery shown in green boxes. The process begins with extraction of NPs from organisms such as bacteria. The choice of extraction method determines which compound classes will be present in the extract (for example, the use of more polar solvents will result in a higher abundance of polar compounds in the crude extract). To maximize the diversity of the extracted NPs, the biological material can be subjected to extraction with several solvents of different polarity. Following the identification of a crude extract with promising pharmacological activity, the next step is its (often multiple) consecutive bioactivity-guided fractionation until the pure bioactive compounds are isolated. A key limitation for the potential of this approach to identify novel NPs is that many potential source organisms cannot be cultured or stop producing relevant NPs when taken out of their natural habitat. These limitations are being addressed through development of new methods for culturing, for in situ analysis, for NP synthesis induction and for heterologous expression of biosynthetic genes. At the crude extract step, challenges include the presence in the extracts of NPs that are already known, NPs that do not have drug-like properties or insufficient amounts of NPs for characterization. These challenges can be addressed through the development of methods for dereplication, extraction and pre-fractionation of extracts. Finally, at the last stage, when bioactive compounds are identified by phenotypic assays, significant time and effort are typically needed to identify the affected molecular targets. 
This challenge can be addressed by the development of methods for accelerated elucidation of molecular modes of action, such as the nematic protein organization technique (NPOT), drug affinity responsive target stability (DARTS), stable isotope labelling with amino acids in cell culture and pulse proteolysis (SILAC-PP), the cellular thermal shift assay (CETSA) and an extension known as thermal proteome profiling (TPP), stability of proteins from rates of oxidation (SPROX), the similarity ensemble approach (SEA) and bioinformatics-based analysis of connectivity (connectivity map, CMAP)23,189,190,191,192.

Although the complexity of NP structures can be advantageous, the generation of structural analogues to explore structure–activity relationships and to optimize NP leads can be challenging, particularly if synthetic routes are difficult. Also, NP-based drug leads are often identified by phenotypic assays, and deconvolution of their molecular mechanisms of action can be time-consuming19. Fortunately, there have been substantial advances20 both in the development of screening assays (for example, harnessing the potential of induced pluripotent stem cells and gene editing technologies) and in strategies to identify the modes of action of active compounds (reviewed previously21,22,23).

Here, we discuss recent technological and scientific advances that may help to overcome challenges in NP-based drug discovery, with an emphasis on three areas: analytical techniques, genome mining and engineering, and cultivation systems. In the concluding section, we highlight promising future directions for NP drug discovery.

Application of analytical techniques

Classical NP-based drug research starts with biological screening of ‘crude’ extracts to identify a bioactive ‘hit’ extract, which is further fractionated to isolate the active NPs. Bioactivity-guided isolation is a laborious process with a number of limitations, but various strategies and technologies can be used to address some of them (Fig. 2). For example, to create libraries that are compatible with high-throughput screening, crude extracts can be pre-fractionated into sub-fractions that are more suitable for automated liquid handling systems. In addition, fractionation methods can be adjusted so that sub-fractions preferentially contain compounds with drug-like properties (typically moderate hydrophilicity). Such approaches can increase the number of hits compared with using crude extracts, as well as enabling more efficient follow-up of promising hits24.

Fig. 2: Applications of advanced analytical technologies empowering modern natural product-based drug discovery.

a | An illustrative example of the application of liquid chromatography–high-resolution mass spectrometry (LC–HRMS) metabolomics in the screening of natural product (NP) extracts is the work of Kurita et al.58, in which 234 bacterial extracts were subjected to image-based phenotypic bioactivity screening and LC–HRMS metabolomics. Clustering of the resulting data allowed prioritization of promising extracts for further analysis, resulting in the discovery of the new NPs, quinocinnolinomycins A–D. b | Another illustrative example of LC–HRMS screening of NP extracts is the work of Clevenger et al.85, who obtained novel NP extracts through heterologous expression of fungal artificial chromosomes (FACs) containing uncharacterized biosynthetic gene clusters (BGCs) from diverse fungal species in Aspergillus nidulans. Analysis of the LC–HRMS metabolomics data with a FAC-Score algorithm directed the simultaneous discovery of 15 new NPs and the characterization of their BGCs.

Metabolomics was developed as an approach to simultaneously analyse multiple metabolites in biological samples. Enabled by technological developments in chromatography and spectrometry, metabolomics was historically applied first in other research fields, such as biomedical and agricultural sciences2. Advances in the analytical instrumentation used in NP research25,26, coupled with computational approaches that can generate plausible NP analogue structures and their respective simulated spectra27, have also enabled application of ‘omics’ approaches such as metabolomics in NP-based drug discovery. Metabolomics can provide accurate information on the metabolite composition in NP extracts, thus helping to prioritize NPs for isolation, to accelerate dereplication28,29 and to annotate unknown analogues and new NP scaffolds. Moreover, metabolomics can detect differences between metabolite compositions in various physiological states of producing organisms and enable the generation of hypotheses to explain them, and can also provide extensive metabolite profiles to underpin phenotypic characterization at the molecular level30. Both options are very useful in understanding the molecular mechanisms of action of NPs.

For metabolite profiling, NP extracts are analysed by NMR spectroscopy or high-resolution mass spectrometry (HRMS), or respective combined methods involving upstream liquid chromatography (LC)31,32, such as LC–HRMS, which can separate numerous isomers present in NP extracts33. Moreover, such combined methods might integrate HRMS and NMR, allowing the simultaneous use of the advantages of both techniques34,35. NMR analysis of NP extracts is simple and reproducible, and provides direct quantitative information and detailed structural information, although it has relatively low sensitivity, meaning that it generally enables profiling only of major constituents33. The applications of NMR in NP research are versatile36 and the technique is used both directly for metabolomics of unfractionated NP extracts and for structural characterization of compounds and fractions obtained with appropriate separation methods, most often LC. HRMS is the gold standard for qualitative and quantitative metabolite profiling33 and is most commonly applied in combination with LC. HRMS can also be used in the direct infusion mode (called DIMS)37, whereby samples are directly profiled by MS without a chromatography step, or in MS imaging (MSI)38, which enables determination of the spatial distribution of NPs within living organisms. HRMS enables routine acquisition of accurate molecular mass information, which together with appropriate heuristic filtering can provide unambiguous assignment of molecular formulae for hundreds to thousands of metabolites within a single extract over a dynamic range that may exceed five orders of magnitude31,39. However, challenges remain in data mining and in the unambiguous identification of the metabolites using various workflows relying on open web-based tools40.
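
To make the step from accurate mass to molecular formula concrete, the following is a minimal Python sketch that computes monoisotopic masses for CHNO formulae and keeps the candidates whose protonated ion, [M+H]+, lies within a parts-per-million tolerance of a measured m/z. It is a simplified illustration and not the workflow of any specific tool: the function names, the restriction to four elements, the single adduct and the 5 ppm default are all assumptions, and real workflows add the heuristic filtering mentioned above (for example, isotope patterns and element ratio rules).

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes; CHNO only in this sketch.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646  # mass of a proton in Da, used for the [M+H]+ adduct


def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def candidates_within(measured_mz, formulas, tol_ppm=5.0):
    """Keep formulae whose [M+H]+ m/z matches the measurement within tol_ppm."""
    hits = []
    for formula in formulas:
        theoretical_mz = monoisotopic_mass(formula) + PROTON
        if abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm:
            hits.append(formula)
    return hits
```

With a measured m/z of 181.0707, `candidates_within(181.0707, ["C6H12O6", "C7H16O5"])` retains only `C6H12O6`, because the second formula differs by roughly 200 ppm. This shows why high mass accuracy narrows the candidate list so effectively.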

Dereplication of secondary metabolites in bioactive extracts includes the determination of molecular mass and formula and cross-searching in the literature or structural NP databases with taxonomic information, which greatly assists the identification process. Such metadata, which are difficult to query in the literature, are often compiled in proprietary databases, such as the Dictionary of Natural Products, which encompasses all NP structures reported with links to their biological sources (see Related links). However, a comprehensive experimental tandem mass spectrometry (MS/MS) database of all NPs reported to date does not exist, and a search for experimental spectra across various platforms is hindered by the lack of standardized collision energy conditions for fragmentation in LC–MS/MS25.

In this respect, the Global Natural Products Social (GNPS) molecular networking platform developed in the Dorrestein laboratory is an important addition to the toolbox41. Molecular networking organizes thousands of sets of MS/MS data recorded from a given set of extracts and visualizes the relationship of the analytes as clusters of structurally related molecules. This improves the efficiency of dereplication by enabling annotation of isomers and analogues of a given metabolite in a cluster42. The recorded experimental spectra can be searched against putative structures and their corresponding predicted MS/MS spectra generated by tools such as competitive fragmentation modelling (CFM-ID)43. Based on such approaches, vast databases of theoretical NP spectra have been created and applied in dereplication44. The GNPS molecular networking approach has limitations, however, such as better applicability to some classes of NPs than others and the uncertainty of structural assignment among possible predicted candidates. Efforts to address such issues are ongoing45,46,47, including overlaying molecular networks of large NP extract libraries with taxonomic information to improve the confidence of annotation48. Overall, molecular networking mainly allows better prioritization of the isolation of unknown compounds by strengthening the dereplication process and elucidating relationships between NP analogues, and rigorous structure elucidation for NPs of interest should not be neglected.
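
The core idea of molecular networking can be sketched in a few lines of Python: MS/MS spectra are compared pairwise, spectra above a similarity threshold are connected, and the connected components are read as clusters of putatively related molecules. This sketch uses a plain cosine score on binned fragment peaks as a simplified stand-in; the scoring used by the GNPS platform is more elaborate, and the bin width, threshold, function names and example spectra here are all hypothetical.

```python
import math
from collections import defaultdict


def binned(spectrum, bin_width=0.1):
    """Turn a list of (m/z, intensity) peaks into a sparse binned vector."""
    vector = defaultdict(float)
    for mz, intensity in spectrum:
        vector[round(mz / bin_width)] += intensity
    return vector


def cosine(a, b):
    """Plain cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def molecular_network(spectra, threshold=0.7):
    """Cluster spectra: edges join pairs with cosine >= threshold."""
    names = list(spectra)
    vectors = {name: binned(spectra[name]) for name in names}
    neighbours = {name: set() for name in names}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine(vectors[first], vectors[second]) >= threshold:
                neighbours[first].add(second)
                neighbours[second].add(first)
    # Connected components via depth-first search.
    clusters, seen = [], set()
    for name in names:
        if name in seen:
            continue
        stack, cluster = [name], set()
        while stack:
            current = stack.pop()
            if current in cluster:
                continue
            cluster.add(current)
            stack.extend(neighbours[current] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters
```

Two hypothetical spectra sharing most fragment ions end up in one cluster, while a spectrum with unrelated fragments forms a singleton. Such clusters suggest which features may be analogues of an annotated compound, but they do not replace structure elucidation.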

Another useful platform for metabolite identification is METLIN49, which includes a high-resolution MS/MS database with a fragment similarity search function that is useful for identification of unknown compounds. Other databases and in silico tools such as Compound Structure Identification (CSI): FingerID and Input Output Kernel Regression (IOKR) can be used to search available fragment ion spectra, as well as to generate predicted spectra of fragment ions not present in current databases50. A novel computational platform for predicting the structural identity of metabolites derived from any identified compound has also been recently reported51, which should increase the searchable chemical space of NPs.

To accelerate the identification of bioactive NPs in extracts, metabolomics data can be matched to the biological activities of these extracts52. Various chemometric methods such as multivariate data analysis can correlate the measured activity with signals in the NMR and MS spectra, enabling the active compounds to be traced in complex mixtures with no need for further bioassays53,54,55. Furthermore, several analytical modules involving different bioassays and detection technologies can be linked to allow simultaneous bioactivity evaluation and identification of compounds present in small amounts (analytical scale) in complex compound mixtures34,35.
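
One simple way to picture this matching of metabolomics data to bioactivity is to correlate the intensity of each detected feature across a series of fractions with the activity measured for the same fractions. The Python sketch below ranks features by Pearson correlation. It is a deliberately minimal stand-in for the multivariate chemometric methods cited above, and the function names, feature labels and numbers in the usage note are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def rank_features(intensities, activity):
    """Rank features by how well their intensity profile tracks bioactivity.

    intensities: dict mapping a feature label to its intensity in each fraction
    activity:    measured activity of the same fractions, in the same order
    """
    scores = {label: pearson(values, activity) for label, values in intensities.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For instance, with activities `[5, 40, 90, 10]` across four fractions, a feature with intensities `[1, 8, 18, 2]` would rank first, marking it as a candidate for the active constituent. Correlation alone does not prove that the feature is responsible for the activity, so top-ranked features still need confirmation.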

Metabolomics data can be integrated with data obtained by other omics techniques such as transcriptomics and proteomics and/or with imaging-based screens. For example, Acharya et al. used this approach to characterize NP-mediated interactions between a Micromonospora species and a Rhodococcus species56. In another interesting example, Kurita et al. developed a compound activity mapping platform for the prediction of identities and mechanisms of action of constituents from complex NP extract libraries by integrating cytological profiling57 with untargeted metabolomics data from a library of extracts58, and identified quinocinnolinomycins as a new family of NPs causing endoplasmic reticulum stress58 (Fig. 2a).

Analytical advances that enable the profiling of responses to bioactive molecules at the single-cell level can also accelerate NP-based drug discovery. Irish, Bachmann, Earl and colleagues developed a high-throughput platform for metabolomic profiling of bioactivity by integrating phospho-specific flow cytometry, single-cell chemical biology and cellular barcoding with metabolomic arrays (characterized chromatographic microtitre arrays originating from biological extracts)59. Using this platform, the authors studied the single-cell responses of bone marrow biopsy samples from patients with acute myeloid leukaemia following exposure to microbial metabolomic arrays obtained from extracts of biosynthetically prolific bacteria, which enabled the identification of new bioactive polyketides59.

Finally, advances in analytical technologies continue to support the rigorous structure determination of NPs of interest. The progressive development of higher-field NMR instruments and probe technology60,61 has enabled NP structure determination from very small quantities (below 10 µg)62,63, which is important, as the available quantities of NPs are often limited. In addition, microcrystal electron diffraction (MicroED) has recently emerged as a cryo-electron microscopy-based technique for unambiguous structure determination of small molecules64 and is already finding important applications in NP research65. The increased resolution and sensitivity of analytical equipment can also help address problems associated with ‘residual complexity’ of isolated NPs; that is, when biologically potent but unidentified impurities in an isolated NP sample (which could include structurally related metabolites or conformers) lead to an incorrect assignment of structure and/or activity66,67. To avoid futile downstream development efforts, Pauli and colleagues recommended that lead NPs should undergo advanced purity analysis at an early stage using quantitative NMR and LC–MS67.

Genome mining and engineering

Advances in knowledge on biosynthetic pathways for NPs and in developing tools for analysing and manipulating genomes are further key drivers for modern NP-based drug discovery. Two key characteristics enable the identification of biosynthetic genes in the genomes of the producing organisms. First, these genes are clustered in the genomes of bacteria and filamentous fungi. Second, many NPs are based on polyketide or peptide cores, and their biosynthetic pathways involve enzymes — polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs), respectively — that are encoded by large genes with highly conserved modules68.

‘Genome mining’ is based on searches for genes that are likely to govern biosynthesis of scaffold structures, and can be used to identify NP biosynthetic gene clusters69,70,71. Prioritization of gene clusters for further work is facilitated by advances in biosynthetic knowledge and predictive bioinformatics tools, which can provide hints about whether the metabolic products of the clusters have chemical scaffolds that are new or known, thereby supporting dereplication72,73. Such predictive tools for gene cluster analysis can be applied in combination with spectroscopic techniques to accelerate the identification of NPs65 and determine the stereochemistry of metabolic products66. Furthermore, to extend genome mining from a single genome to entire genera, microbiomes or strain collections, computational tools have been developed, such as BiG-SCAPE, which enables sequence similarity analysis of biosynthetic gene clusters, and CORASON, which uses a phylogenomic approach to elucidate evolutionary relationships between gene clusters74.
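
The idea of grouping gene clusters by similarity can be illustrated with a minimal Python sketch that compares the sets of enzymatic domains annotated in each cluster using a Jaccard index and reports pairs above a cut-off. This is a simplified stand-in and not the scoring actually implemented in BiG-SCAPE; the function names, the 0.5 cut-off and the domain labels in the usage note are hypothetical.

```python
from itertools import combinations


def jaccard(domains_a, domains_b):
    """Jaccard index: shared domain types divided by all domain types."""
    a, b = set(domains_a), set(domains_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(clusters, cutoff=0.5):
    """Return (name_1, name_2, score) for cluster pairs scoring >= cutoff.

    clusters: dict mapping a cluster name to its annotated domain labels
    """
    pairs = []
    for (name_1, dom_1), (name_2, dom_2) in combinations(clusters.items(), 2):
        score = jaccard(dom_1, dom_2)
        if score >= cutoff:
            pairs.append((name_1, name_2, score))
    return pairs
```

For example, two hypothetical PKS-like clusters annotated as `{"KS", "AT", "ACP", "KR"}` and `{"KS", "AT", "ACP", "TE"}` share three of five domain types (score 0.6) and would be paired, whereas an NRPS-like cluster annotated as `{"C", "A", "PCP"}` shares none with either and would remain separate.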

Phylogenetic studies of known groups of talented secondary metabolite producers can also empower discovery of novel NPs. Recently, a study comparing secondary metabolite profiles and phylogenetic data in myxobacteria demonstrated a correlation between the taxonomic distance and the production of distinct secondary metabolite families75. In filamentous fungi, it was likewise shown that secondary metabolite profiles are closely correlated with their phylogeny76. These organisms are rich in secondary metabolites, as demonstrated by LC–MS studies of their extracts under laboratory conditions77. Concurrent genomic and phylogenomic analyses implied that even the genomes of well-studied organism groups harbour many gene clusters for secondary metabolite biosynthesis with as yet unknown functions78. The phylogeny of biosynthetic gene clusters, together with analysis of the absence of known resistance determinants, was recently used to prioritize members of the glycopeptide antibiotic family that could have novel activities. This led to the identification of the known antibiotic complestatin and the newly discovered corbomycin as compounds that act through a previously uncharacterized mechanism involving inhibition of peptidoglycan remodelling79.

Many microorganisms cannot be cultured, or tools for their genetic manipulation are not sufficiently developed, which makes it more challenging to access their NP-producing potential. However, biosynthetic gene clusters for NPs can be cloned and heterologously expressed in organisms that are well-characterized and easier to culture and to genetically manipulate (such as Streptomyces coelicolor, Escherichia coli and Saccharomyces cerevisiae)80. The aim is to achieve higher production titres in the heterologous hosts than in wild-type strains, improving the availability of lead compounds80,81,82. Vectors that can carry large DNA inserts are needed for the cloning of complete NP biosynthetic gene clusters. Cosmids (which can have inserts of 30–40 kb), fosmids (which can harbour 40–50 kb) and bacterial artificial chromosomes (BACs; which can have inserts of 100 kb to >300 kb) have been developed83. For fungal gene clusters, self-replicating fungal artificial chromosomes (FACs) have been developed, which can have inserts of >100 kb (ref.84). FACs in combination with metabolomic scoring were used to develop a scalable platform, FAC-MS, allowing the characterization of fungal biosynthetic gene clusters and their respective NPs at unprecedented scale85. The application of FAC-MS for the screening of 56 biosynthetic gene clusters from different fungal species yielded the discovery of 15 new metabolites, including a new macrolactone, valactamide A85 (Fig. 2b).

Even in culturable microorganisms, many biosynthetic gene clusters may not be expressed under conventional culture conditions, and these silent clusters could represent a large untapped source of NPs with drug-like properties86. Several approaches can be pursued to identify such NPs. One approach is sequencing, bioinformatic analysis and heterologous expression of silent biosynthetic gene clusters, which has already led to the discovery of several new NP scaffolds from cultivable strains87. Direct cloning and heterologous expression was also used to discover the new antibiotic taromycin A, which was identified upon the transfer of a silent 67 kb NRPS biosynthetic gene cluster from Saccharomonospora sp. CNQ-490 into S. coelicolor88. To transfer a biosynthetic gene cluster of such size, a platform based on transformation-associated recombination (TAR) cloning was developed. This platform enables direct cloning and manipulation of large biosynthetic gene clusters in S. cerevisiae, maintenance and manipulation of the vector in E. coli, and heterologous expression of the cloned gene clusters in Actinobacteria (such as S. coelicolor) following chromosomal integration88, and is an alternative to BACs for heterologous expression of large biosynthetic gene clusters.

Heterologous expression has limitations, such as the need to clone and manipulate very large genome regions occupied by biosynthetic gene clusters and the difficulty of identifying a suitable host that provides all conditions necessary for the production of the corresponding NPs. These limitations can be circumvented by activating biosynthetic gene clusters directly in the native microorganism through targeted genetic manipulations, generally involving the insertion of activating regulatory elements or deletion of inhibitory elements such as repressors or their binding sites. For example, a derepression strategy of deleting gbnR, a gene for a transcriptional repressor in Streptomyces venezuelae ATCC 10712, was used by Sidda et al. in the discovery of gaburedins, a family of γ-aminobutyrate-derived ureas89. An example of the activator-based strategy is the constitutive expression of the samR0484 gene in Streptomyces ambofaciens ATCC 23877, which led to the discovery of stambomycins A–D, 51-membered cytotoxic glycosylated macrolides72. Alternatively, silent biosynthetic gene clusters can be activated using repressor decoys90, which have the same DNA nucleotide sequence as the binding sites for the repressors that prevent the expression of the clusters. When these decoys are introduced into the bacteria, they sequester the respective repressors, and the ‘endogenous’ binding sites in the genome remain unoccupied, leading to derepression of the previously silent biosynthetic genes and production of the corresponding NPs. This approach has been applied to activate eight silent biosynthetic gene clusters in multiple streptomycetes and led to the characterization of a novel NP, oxazolepoxidomycin A90. The repressor decoy strategy is simpler, easier and faster to perform than the deletion of genes encoding regulatory factors.
However, it has the same limitation as other approaches that rely on the introduction of recombinant DNA molecules into cells: it is necessary to develop protocols for efficient introduction of DNA into the targeted host strain, and the decoy must be maintained on a high-copy plasmid to ensure efficient repressor sequestration.

Another approach focused on exchange of regulatory elements is based on the CRISPR–Cas9 technology. The promise of this technique is exemplified in recent work by Zhang et al., which demonstrated that CRISPR–Cas9-mediated targeted promoter introduction can efficiently activate diverse biosynthetic gene clusters in multiple Streptomyces species, leading to the production of unique metabolites, including a novel polyketide in Streptomyces viridochromogenes91. The CRISPR–Cas9 technology was also used to knock out the biosynthetic genes for two well-known and frequently rediscovered antibiotics in several actinomycete strains, which led to the production of different rare and previously unknown variants of antibiotics that were otherwise obscured, including amicetin, thiolactomycin, phenanthroviridin and 5-chloro-3-formylindole92.

Approaches that rely on sequencing, bioinformatics and heterologous expression can also enable the identification of novel NPs from bacterial strains that have not yet been cultivated (Fig. 3a). For example, Hover et al. searched the metagenomes of 2,000 soil samples for biosynthetic gene clusters for lipopeptides with calcium-binding motifs. This led to the discovery of malacidins, members of the calcium-dependent antibiotic family, via heterologous expression of a 72 kb biosynthetic gene cluster from a desert soil sample in a Streptomyces albus host strain93 (Fig. 3b). However, in comparison with some of the other above-discussed strategies72,89,90, this metagenome-based discovery approach is more suited to finding new members of known NP classes rather than discovery of entirely new classes. In another study, Chu et al. developed a human microbiome-based approach that identified nonribosomal linear heptapeptides called humimycins as novel antibiotics active against methicillin-resistant Staphylococcus aureus (MRSA)94 (Fig. 3c). The structure of the NPs was predicted via bioinformatics analysis of gene clusters found in human commensal bacteria, followed by their chemical synthesis. A major strength of this innovative approach is that it is entirely independent of microbial cultivation and heterologous gene expression. Nevertheless, there are limitations related to the accuracy of computational chemical structure predictions and the feasibility of total chemical synthesis if structures are complex.

Fig. 3: Strategies for genome mining-driven discovery of natural products and natural product-like compounds.

a | Genome mining-based approaches to explore the biosynthetic capacity of microorganisms rely on DNA extraction, sequencing and bioinformatics analysis. The vast majority of microbes from different environments and microbiota communities have not been cultured, and their capacity to produce natural products (NPs) was largely inaccessible until recently. In the case of unculturable microorganisms, the bioinformatics analysis step can be followed by either targeted heterologous expression of biosynthetic gene clusters (BGCs) prioritized as being likely to yield relevant new NPs or direct chemical synthesis of ‘synthetic–bioinformatic’ NP-like compounds. b,c | These two approaches are exemplified by the recent discoveries of malacidins (panel b) and humimycins (panel c), respectively93,94. A major strength of the ‘synthetic–bioinformatic’ approach is that it is entirely independent of microbial culture and gene expression. Its limitations are the accuracy of computational chemical structure predictions and the feasibility of total chemical synthesis. NRPS, nonribosomal peptide synthetase.

The genomes of plants or animals can also be mined for novel NPs. For example, mining of 116 plant genomes enabled by identification of a precursor gene for the biosynthesis of lyciumins, a class of branched cyclic ribosomal peptides with hypotensive action produced by Lycium barbarum (popularly known as goji), identified diverse novel lyciumin chemotypes in seven other plants, including crops such as soybean, beet, quinoa and eggplant95. Genome mining in the animal kingdom is exemplified by the work of Dutertre et al., which used an integrated transcriptomics and proteomics approach to discover thousands of novel venom peptides from Conus marmoreus snails96. Proteomics analysis revealed that the vast majority of the conopeptide diversity was derived from a set of ~100 genes through variable peptide processing96.

Some bioactive compounds initially isolated from marine organisms might be products of symbionts, and genome mining can facilitate the characterization of such NPs. For example, it has been shown that bioactive compounds from the sponge Theonella swinhoei are produced by bacterial symbionts97, and characterization of the symbiont ‘Candidatus Entotheonella serta’ using single-cell genomics led to the discovery of gene clusters for misakinolide and theonellamide biosynthesis98. Another example of a marine NP produced by a bacterial symbiont is ET-743 (trabectedin), originally isolated from the tunicate Ecteinascidia turbinata. A meta-omics approach developed by Rath et al. revealed that the producer of this clinically used anticancer agent is the bacterial symbiont ‘Candidatus Endoecteinascidia frumentensis’99.

Similarly, plant microbiomes also represent a large reservoir for the identification of novel bioactive NPs (such as the antitumour agents maytansine, paclitaxel and camptothecin, which were initially isolated from plants and later shown to be produced by microbial endophytes)100 that can be tapped by genome mining approaches. An illustrative example is a recent work by Helfrich et al. that identified hundreds of novel biosynthetic gene clusters by genome mining of 224 bacterial strains isolated from Arabidopsis thaliana leaves101. A combination of bioactivity screening and imaging mass spectrometry was used to select a single species for further genomic analysis and led to the isolation of a NP with an unprecedented structure, the trans-acyltransferase PKS-derived antibiotic macrobrevin101.

Targeted genetic engineering of NP biosynthetic gene clusters can be of high value if the producing organism is difficult to cultivate or the yield of a NP is too low to allow comprehensive NP characterization. Rational genetic engineering and heterologous expression helped to increase the production of vioprolides, a depsipeptide class of anticancer and antifungal NPs in the myxobacterium Cystobacter violaceus Cb vi35, by several orders of magnitude. In addition, non-natural vioprolide analogues were generated by this approach102. Similarly, promoter engineering and heterologous expression of biosynthetic gene clusters were reported to result in a 7-fold increase in the production of the cytotoxic NP disorazol103, and a 328-fold increase in the production of spinosad, an insecticidal macrolide produced by the bacterium Saccharopolyspora spinosa104.

Besides increasing NP yields, targeted gene manipulation can also be used to alter biosynthetic pathways in a predictable manner to produce new NP analogues with improved pharmacological properties, such as higher specific activity, lower toxicity and better pharmacokinetics. Such biosynthetic engineering approaches depend on a solid understanding of the biosynthetic pathway leading to a specific NP, access to the genes specifying this pathway and the ability to manipulate them in either the original or a heterologous host. Recent advances in biosynthetic engineering have enabled faster and more efficient production of NP analogues, including the development of methods for accelerated engineering and recombination of modules of PKS gene clusters105, NRPSs106,107 and NRPS–PKS assembly lines108, as well as elucidation of mechanisms for polyketide chain release that are contributing to NP structural diversification109,110. Examples of biosynthetic engineering applied to several important NPs include the generation of analogues of the immunosuppressant rapamycin111, the antitumour agents mithramycin112 and bleomycin113, and the antifungal agent nystatin114.

It should be noted that biosynthetic engineering has limitations regarding the parts of the NP molecule that can be targeted for modifications, and the chemical groups that can be introduced or removed. Considering the complexity of many NPs, however, total synthesis may be prohibitively costly, and a combined approach of biosynthetic engineering and chemical modification can provide a viable alternative for identifying improved drug candidates. For example, biosynthetic engineering may create a ‘handle’ for addition of a beneficial chemical group by synthetic chemistry, as demonstrated for the biosynthetically engineered analogues of nystatin mentioned above; further synthetic chemistry modifications resulted in compounds with improved in vivo pharmacotherapeutic characteristics compared with amphotericin B115,116.

Advances in microbial culturing systems

The complex regulation of NP biosynthesis in response to the environment means that the conditions under which producing organisms are cultivated can have a major impact on the chance of identifying novel NPs87. Several strategies have been developed to improve the likelihood of identifying novel NPs compared with monoculture under standard laboratory conditions and to make ‘uncultured’ microorganisms grow in a simulated natural environment117 (Fig. 4).

Fig. 4: Application of advanced microbial culturing approaches to identify new natural products.

New strategies for isolating previously uncultured microorganisms can enable access to new natural products (NPs) produced by them. a | To recapitulate the effect of complex signals coming from the native environment, microorganisms can be cultivated directly in the environment from which they were isolated. This concept is used with the iChip platform, in which diluted environmental samples are seeded in multiple small chambers separated from the native environment with a semipermeable membrane. The potential of this approach is illustrated by the recent discovery of teixobactin, a new antibiotic with activity against Gram-positive bacteria134,135. b | Another important recent development involves obtaining information from environmental samples using omics techniques such as metagenomics to identify and partially characterize microorganisms present in a specific environment before culturing. An approach relying on such preliminary information was recently used to engineer capture antibodies based on genetic information, which resulted in the successful cultivation of previously uncultured bacteria from the human mouth145. This reverse genomics workflow was validated by the isolation and cultivation of three species of Saccharibacteria (TM7) along with their interacting Actinobacteria hosts, as well as SR1 bacteria that are members of a candidate phylum with no previously cultured representatives.

One well-established approach to promote the identification of novel NPs is the modulation of culture conditions such as temperature, pH and nutrient sources. This strategy may lead to activation of silent gene clusters, thereby promoting production of different NPs. The term ‘One Strain Many Compounds’ (OSMAC) was coined for this approach about 20 years ago118, but the concept has a longer history119, with its use being routine in industrial microbiology since the 1960s120.

While OSMAC is still widely used for the identification of new bioactive compounds121,122, this approach has limited capacity to mimic the complexities of natural habitats. It is difficult to predict the combination of cues (which might also involve metabolites secreted by other members of the microbial community) to which the microorganism has evolved to respond by switching metabolic programmes. To account for such interactions, co-culturing using ‘helper’ strains can be applied123. This can enable the production and identification of new NPs, as illustrated by recent studies in which particular fungi were co-cultured with Streptomyces species124,125.

Study of the molecular mechanisms underlying the ability of helper strains to increase the cultivability of previously uncultured microbes can lead to the identification of specific growth factors, allowing expansion of the number of species that can be successfully cultured. This strategy was used by D’Onofrio et al. for the identification of new acyl-desferrioxamine siderophores (iron-chelating compounds) as growth factors produced by helper strains promoting the growth of previously uncultured isolates from marine sediment biofilm117,126. Siderophore-assisted growth relies on the ability of these compounds to supply iron to microbes that cannot produce siderophores themselves, and the application of this approach led to the isolation of previously uncultivated microorganisms126. The development of strategies to cultivate microbial symbionts that produce NPs only upon interaction with their hosts can promote access to new NPs. Microbial symbionts interacting with insects or other organisms are a highly promising reservoir for the discovery of novel bioactive NPs produced in a unique ecological context127,128,129,130. To stimulate NP production, culturing strategies can be developed that better mimic the native environment of microbial symbionts of insects, including the use of media containing either lyophilized dead insects131 or l-proline, a major constituent of insect haemolymph132.

Strategies to mimic the natural environment even more closely by harnessing in situ incubation in the environment from which the microorganism is sampled have been developed, dating back more than 20 years to the biotech companies OneCell and Diversa. They developed platforms that allowed the growth of some previously uncultivated microbes from various environments based on diluting out and suspension in a single drop of medium120,133. More recently, such strategies have been highlighted by the development and application of a platform dubbed the iChip, in which diluted soil samples are seeded in multiple small chambers separated from the environment with a semipermeable membrane134. After seeding, the iChip is placed back into the soil from which the sample was taken for an in situ incubation period, allowing the cultured microorganisms to be exposed to influences from their native environment. The power of this culturing approach was demonstrated by the discovery of a new antibiotic, teixobactin, produced by a previously uncultured soil bacterium135,136 (Fig. 4a). This platform may be of great significance for NP drug discovery, given that it has been estimated that only 1% of soil organisms have so far been successfully cultured using traditional culturing techniques137.

The omics strategies discussed in previous sections can complement efforts to explore NPs produced upon microbial interactions. The application of such a strategy is illustrated in the work of Derewacz et al., who analysed the metabolome of a genome-sequenced Nocardiopsis bacterium upon co-culture with bacteria of the genera Escherichia, Bacillus, Tsukamurella and Rhodococcus138. Around 14% of the metabolomic features found in co-cultures were undetectable in monocultures, with many of those being unique to specific co-culture genera, and the previously unreported polyketides ciromicin A and B, which possess an unusual pyrrolidinol substructure and displayed moderate and selective cytotoxicity, were identified138. Other examples include a ‘culturomics’ approach that combines multiple culture conditions with MS profiling and 16S rRNA-based taxonomy to identify prokaryotic species from the human gut139, and an ultrahigh-throughput screening platform based on microfluidic droplet single-cell encapsulation and cultivation followed by next-generation sequencing and LC–MS, which allows investigation of pairwise interactions between target microorganisms140. The latter approach enabled identification of a slow-growing oral microbiota species that inhibits the growth of S. aureus140.
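The comparison underlying the co-culture figure quoted above (the share of metabolomic features seen in co-culture but not in monoculture, and the subset specific to one partner genus) can be sketched as a set comparison. The following Python sketch is illustrative only: the function name, the feature identifiers and the detection data are hypothetical and are not taken from ref. 138, and real workflows would first align features by mass and retention time and apply intensity thresholds.

```python
# Toy illustration only: feature IDs and detections are invented,
# not data from the study discussed in the text.
from collections import defaultdict


def coculture_induced_features(mono, cocultures):
    """Compare detected features between a monoculture and co-cultures.

    mono: set of feature IDs detected in the monoculture.
    cocultures: dict mapping partner genus -> set of detected feature IDs.
    Returns (induced, partner_specific, fraction), where `induced` holds
    features detected in at least one co-culture but not in monoculture,
    `partner_specific` maps each induced feature seen with exactly one
    partner to that partner, and `fraction` is the share of all
    co-culture features that are induced.
    """
    detected_in = defaultdict(set)
    for partner, features in cocultures.items():
        for feature in features:
            detected_in[feature].add(partner)

    induced = {f for f in detected_in if f not in mono}
    partner_specific = {
        f: next(iter(detected_in[f]))
        for f in induced
        if len(detected_in[f]) == 1
    }
    fraction = len(induced) / len(detected_in) if detected_in else 0.0
    return induced, partner_specific, fraction


# Hypothetical detections for a producer grown alone and with two partners.
mono = {"f1", "f2", "f3"}
cocultures = {
    "Bacillus": {"f1", "f2", "f4"},
    "Rhodococcus": {"f1", "f3", "f4", "f5"},
}
induced, partner_specific, fraction = coculture_induced_features(mono, cocultures)
print(sorted(induced), partner_specific, fraction)
```

In this toy data set, two of the five co-culture features are absent from the monoculture, and one of them appears with a single partner only; features of that kind would be the natural candidates for follow-up isolation.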

Historically, the microbial culturing approaches adopted early on led to a bias, reflected in the predominant discovery of NPs from microorganisms that are easy to cultivate (such as streptomycetes and some common filamentous fungi). As a result, a vast number of NPs from such ‘easy to culture’ microbes have already been characterized, and conventional screening efforts tend to yield disappointing returns associated with frequent rediscovery of known NPs and their closely related congeners. Therefore, culturing strategies aimed at previously unexplored (or under-investigated) microbial groups (such as Burkholderia, Clostridium and Xenorhabdus), which have the potential to produce NPs with entirely new scaffolds and bioactivities, are of high interest141,142. Closthioamide, the first secondary metabolite from a strictly anaerobic bacterium, was discovered from Clostridium cellulolyticum by this approach143. Targeted isolation of such species is important, and a genome-guided approach to achieve this goal has recently been demonstrated for Burkholderia strains in environmental samples144. Another highly innovative approach to the isolation and cultivation of previously uncultured bacteria was recently reported by Cross et al.145, who used genomic information to engineer antibodies predicted to target selected microorganisms, and then used these antibodies to specifically capture the microorganisms from complex communities and isolate them in pure cultures. This approach was validated by isolation and cultivation of previously uncultured bacteria from the human oral cavity145 (Fig. 4b), and it could be applicable to a wide range of target organisms if suitable cultivation conditions can be identified for the isolated cells.

Despite these advances in culturing strategies, artificial conditions still do not fully represent the complex environment of natural habitats. To circumvent this problem, microbial and NP diversity can also be accessed via extraction of organisms and/or their NPs in situ. To directly gain compounds produced in the natural marine environment (which may be missed otherwise), resin capture technology can be used to capture compounds on inert sorbent supports ready to be desorbed, analysed and tested for biological activity146. Sustainable approaches for in situ extraction with green solvents, such as glycerol or natural deep eutectic and ionic solvents (NADES), could be used directly during field work147,148. To improve dereplication, analytical equipment miniaturization is also facilitating in situ analysis; examples include the introduction of devices for physicochemical data analysis, such as micro-MS and portable near infrared spectroscopy149,150.

Outlook for NPs in drug discovery

The technological advances discussed above have the potential to reinvigorate NP-based drug discovery in both established and emerging areas. NPs have long been the key source of new drugs against infectious diseases, especially antibiotics (reviewed elsewhere151,152). Selected NPs with antimicrobial properties discovered by leveraging advances discussed in the sections above, including strategies to exploit the human microbiome for novel NPs94,153, are highlighted in Figs 3,4. Along with the search for new NPs with antimicrobial activities, researchers are continuing to develop and optimize already known NP classes, making use of advances in biosynthetic engineering154, total synthesis155 or semi-synthetic strategies156,157. In addition, antivirulence strategies could represent an alternative approach to fighting infections158, for which NPs targeting bacterial quorum sensing could be of interest159.

NPs also have a successful history as cancer therapeutics, which has been well covered in other reviews160,161,162,163. An important new opportunity in this field is the capacity of some NPs to trigger a selective yet potent host immune reaction against cancer cells, particularly given the intense interest at present in strategies that could improve response rates to immune checkpoint inhibitors by turning ‘cold’ tumours ‘hot’164. For example, NPs such as cardiac glycosides165 can increase the immunogenicity of stressed and dying cancer cells by triggering immunogenic cell death, characterized by the release of damage-associated molecular patterns (DAMPs), which could open new avenues for drug discovery or repurposing166,167,168.

Botanical therapies containing complex mixtures of NPs have long attracted interest owing to the potential for synergistic therapeutic effects of components within the mixture169,170. However, the variability of the NP composition in the starting plant material owing to factors such as environmental variations in the location at which the plants were collected is a major challenge for the development of botanical drugs1. With the advances in technology for their characterization, such as metabolomics discussed above, as well as development of regulatory guidance for complex mixtures of NPs (see Related links), it is becoming more feasible to develop such mixtures as therapeutics, rather than to identify and purify a single active ingredient171.

Since gut microbiota are considered to play a major role in health and disease172,173,174, and NPs are known to affect the gut microbiome composition175,176,177,178, this area is an emerging opportunity for NP-based drug discovery. However, drug discovery efforts in this area are still in their infancy, with many open questions remaining179. A future direction may be the characterization of single microbiota-derived species for particular therapeutic applications, and the advances in culturing strategies, genome mining and analytics discussed above will be of great importance in this respect.

Many advances discussed above are supported by computational tools including databases (such as genomic, chemical or spectral analysis data; see ref.180 for a recent review on NP databases) and tools that enable the analysis of genetic information, the prediction of chemical structures and pharmacological activities181, the integration of data sets with diverse information (such as tools for multi-omics analysis)182 and machine learning applications183.

Although this Review focuses on technologies that enable the discovery of novel NPs, it is important to acknowledge that unmodified NPs may possess suboptimal efficacy or absorption, distribution, metabolism, excretion and toxicity (ADMET) properties. So, for development of NP hits into leads and ultimately into successful drugs, chemical modification may be required. In addition, bringing a compound into clinical development requires a sustainable and economically viable supply of sufficient quantities of the compound. Total chemical synthesis, semi-synthesis using a NP as a starting point for analogue generation and biosynthetic engineering modifying biosynthetic pathways of the producing organism will be of great importance in this context (Fig. 5). Recent advances in chemical synthesis and biosynthetic engineering technologies are strongly empowering NP-based drug discovery and development by enabling property optimization of complex NP scaffolds that were previously regarded as inaccessible. This allows the enrichment of screening libraries with NPs, NP hybrids, NP analogues and NP-inspired molecules, as well as superior structure functionalization approaches (including late-stage functionalization) for optimization of NP leads94,105,106,107,108,184,185,186,187,188.

Fig. 5: Strategies to obtain natural product analogues with superior properties.

Unmodified natural products (NPs) often possess suboptimal properties, and superior analogues need to be obtained in order to yield valuable new drugs. a | NP analogues can be accessed through the development of total chemical synthesis followed by chemical derivatization, through semisynthesis using a NP as a starting point for the introduction of chemical modifications, and through biosynthetic engineering using manipulations of biosynthetic pathways of the producing organism to generate NP analogues. b,c | Tetracyclines are an example of NP-derived antibiotics that have already yielded several generations of successfully marketed semisynthetic and synthetic derivatives. The first generation of tetracyclines (such as chlortetracycline and tetracycline) were unmodified NPs, while the two subsequent generations of analogues with optimized properties were semisynthetic (second-generation, doxycycline, minocycline; third-generation, tigecycline) and the most recently developed fourth-generation analogues (eravacycline) are entirely synthetic, accessed via total synthesis193,194. More recent examples of property optimization of other classes of NPs through total chemical synthesis followed by chemical derivatization or through semisynthesis are illustrated by studies focused on analogues of chrysomycin A (panel b)195 and arylomycins (panel c)157, respectively. d | The biosynthetic engineering approach has also shown potential; for example, in the generation of analogues of rapamycin111, bleomycin113 (panel d) and nystatin114. 6′-deoxy-BLM A2, 6′-deoxy-bleomycin A2; BLM A2, bleomycin A2.

Finally, although NP-based drug discovery offers a unique niche for diverse forms of academia–industry collaboration, a key challenge is that scientific and technological expertise is often scattered over many academic institutions and companies. Focused efforts are needed to support translational NP research in academia, which has become more difficult in recent years given the decline in the number of large companies actively engaged in NP research. A conventional solution to improve academia–industry interaction is to focus the relevant expertise under one umbrella and in close spatial proximity. For example, the Phytovalley Tirol, centred in Innsbruck, Austria, brings together several research institutions and companies (among others, the Austrian Drug Screening Institute (ADSI), the Michael Popp Research Institute for New Phyto-Entities, Bionorica Research and Biocrates Life Sciences AG) with the aim of accelerating NP-based drug discovery. Another solution could be virtual consortia, such as the International Natural Product Sciences Taskforce (INPST) that we have recently established (see Related links), which provides a platform for integration of expertise, technology and materials from the participating academic and industrial entities.

In conclusion, NPs remain a promising pool for the discovery of scaffolds with high structural diversity and various bioactivities that can be directly developed or used as starting points for optimization into novel drugs. While drug development overall continues to be challenged by high attrition rates, there are additional hurdles for NPs due to issues such as accessibility, sustainable supply and IP constraints. However, we believe that the scientific and technological advances discussed in this Review provide a strong basis for NP-based drug discovery to continue making major contributions to human health and longevity.