Machine learning identifies novel signatures of antifungal drug resistance in Saccharomycotina yeasts
M.-C. Harrison et al. "Machine learning identifies novel signatures of antifungal drug resistance in Saccharomycotina yeasts" PLoS Genetics (2026) 22:e1012091 [DOI: 10.1371/journal.pgen.1012091]
Antifungal drug resistance is a major challenge in fungal infection management. Numerous genomic changes are known to contribute to acquired drug resistance in clinical isolates of specific pathogens, but whether they broadly explain natural resistance across entire lineages is unknown. We leveraged genomic, ecological, and phenotypic trait data from naturally sampled strains from nearly all known species in subphylum Saccharomycotina to examine the evolution of resistance to eight antifungal drugs. The phylogenetic distribution of drug resistance varied by drug; fluconazole resistance was widespread, while 5-fluorocytosine resistance was rare, except in Lipomycetales. A random forest algorithm trained on genomic data predicted drug-resistant yeasts with 54-75% accuracy. Fluconazole resistance was consistently predicted with the highest accuracy (75.2%). Furthermore, fluconazole resistance prediction accuracy was similar between models trained on genome-wide variation in the presence and number of InterPro protein annotations across Saccharomycotina (75.2%) and those trained on amino acid sequence alignment data of Erg11, a protein known to be involved in fluconazole resistance (74.3-74.9%). Interestingly, the top Erg11 residues for predicting fluconazole resistance across Saccharomycotina do not overlap with, are not spatially close to, and are less conserved than those previously linked to resistance in clinical isolates of Candida albicans. In silico deep mutational scanning of the C. albicans Erg11 protein reveals that amino acid variants implicated in clinical cases of resistance are almost universally destabilizing while variants in our most informative residues are energetically more neutral, explaining why the latter are much more common than the former in natural populations. Importantly, previous experimental analyses of C. albicans Erg11 have shown that amino acid variation in our most informative residues, despite having never been directly implicated in clinical cases, can directly contribute to resistance. Our results suggest that studies of natural resistance in yeast species never encountered in the clinic will yield a fuller understanding of antifungal drug resistance.
The genomic, metabolic and environmental datasets are available at https://figshare.com/s/739a5de80d5ce89dbd10. The supplemental tables for this manuscript, which include drug resistance data, are available at https://figshare.com/s/93423df01686403cee4e. The different integer-encodings of Erg11, including the different alignments, unique k-mers with k=3 in each sequence, and the one-hot encoding of the MAFFT alignment, are available at https://figshare.com/s/2261a7989826892fe80c. Finally, an example of the random forest code used for this paper is available at https://github.com/mcharrison95/code_for_yeast_drug_res.
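The encodings mentioned above (unique k-mers with k=3, integer encoding, and one-hot encoding of an alignment) can be illustrated with a minimal Python sketch. This is not the authors' code; their actual encodings are in the figshare files and their random forest example is in the GitHub repository. The alphabet, the shared code for unknown symbols, the function names, and the toy sequences below are all assumptions made for illustration, and the sequences are not real Erg11 data.

```python
from typing import List

# Assumed alphabet: the 20 standard amino acids plus the alignment gap "-".
# Any other symbol (e.g. "X") gets one shared "unknown" code.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def unique_kmers(seq: str, k: int = 3) -> List[str]:
    """Return the sorted unique k-mers of a sequence, ignoring gap characters."""
    ungapped = seq.replace("-", "").upper()
    return sorted({ungapped[i:i + k] for i in range(len(ungapped) - k + 1)})


def integer_encode(aligned: str) -> List[int]:
    """Map each alignment column to an integer index into ALPHABET."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    unknown = len(ALPHABET)  # hypothetical code for symbols outside ALPHABET
    return [index.get(c, unknown) for c in aligned.upper()]


def one_hot_encode(aligned: str) -> List[int]:
    """Flatten an aligned sequence into a 0/1 vector, one block per column."""
    width = len(ALPHABET) + 1  # +1 slot for the unknown code
    flat: List[int] = []
    for code in integer_encode(aligned):
        block = [0] * width
        block[code] = 1
        flat.extend(block)
    return flat


if __name__ == "__main__":
    toy = "MSA-MSA"  # made-up aligned fragment, not an Erg11 sequence
    print(unique_kmers(toy))
    print(integer_encode(toy))
    print(len(one_hot_encode(toy)))
```

Each aligned sequence becomes one fixed-length feature row, so rows from different species can be stacked into a matrix and passed to a classifier such as a random forest, provided all sequences come from the same alignment and therefore have the same number of columns.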