Genomic factors shaping codon usage across the Saccharomycotina subphylum

Citation

B. Zavala et al. "Genomic factors shaping codon usage across the Saccharomycotina subphylum" G3 (2024) 14:jkae207 [DOI:10.1093/g3journal/jkae207]

Description

Codon usage bias, or the unequal use of synonymous codons, is observed among genes, across genomes, and between species. It has been implicated in many cellular functions, such as translation dynamics and transcript stability, but can also be shaped by neutral forces. We characterized codon usage across 1,154 strains from 1,051 species from the fungal subphylum Saccharomycotina to gain insight into the biases, molecular mechanisms, evolution, and genomic features contributing to codon usage patterns. We found a general preference for A/T-ending codons and correlations between codon usage bias, GC content, and tRNA-ome size. Codon usage bias differs among the 12 orders to such a degree that yeasts can be classified into their order with an accuracy greater than 90% using a machine-learning algorithm. We also characterized the degree to which codon usage bias is impacted by translational selection and found that this degree was influenced by a combination of features, including the number of coding sequences, BUSCO count, and genome length. Our analysis also revealed an extreme bias in codon usage in the Saccharomycodales associated with a lack of predicted arginine tRNAs that decode CGN codons, leaving only the AGN codons to encode arginine. Analysis of Saccharomycodales gene expression, tRNA sequences, and codon evolution suggests that avoidance of the CGN codons is associated with a decline in arginine tRNA function. Consistent with previous findings, codon usage bias within the Saccharomycotina is shaped by genomic features and GC bias. However, we find cases of extreme codon usage preference and avoidance along yeast lineages, suggesting that additional forces may be shaping the evolution of specific codons.
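One common way to quantify the unequal use of synonymous codons is relative synonymous codon usage (RSCU): for each codon, its observed count multiplied by the number of synonymous codons for that amino acid, divided by the total count for that amino acid. A value of 1 means no bias, values above 1 mean the codon is preferred, and 0 means it is never used. The sketch below is a minimal illustration, not the study's pipeline. It assumes the standard nuclear genetic code (NCBI translation table 1), although several Saccharomycotina lineages reassign CUG, so a real analysis would need the appropriate code for each species. The function name `rscu` and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); ASSUMPTION: alternative yeast codes
# (e.g. CUG reassignments) are not handled here.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode ("*" = stop).
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def rscu(cds):
    """Hypothetical helper: RSCU per codon for one in-frame coding sequence."""
    cds = cds.upper()
    # Split into codons, dropping any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Codons with ambiguous bases (e.g. "N") are not in CODE and are ignored.
    counts = Counter(c for c in codons if c in CODE)
    result = {}
    for aa, syn in SYNONYMS.items():
        if aa == "*" or len(syn) == 1:
            continue  # skip stop codons and single-codon amino acids (Met, Trp)
        total = sum(counts[c] for c in syn)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        for c in syn:
            # observed / (total / n_synonymous), written to avoid rounding
            result[c] = counts[c] * len(syn) / total
    return result


# Toy arginine-only example: AGA, AGA, AGG, CGT
values = rscu("AGAAGAAGGCGT")
print({c: values[c] for c in SYNONYMS["R"]})
```

On this toy input, AGA has an RSCU of 3.0, AGG and CGT have 1.5, and CGC, CGA, and CGG have 0. By this measure, a genome that avoids CGN arginine codons, as described above for the Saccharomycodales, would show RSCU values near 0 for the four CGN codons and above 1 for AGA and AGG.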

Data Access

The Y1000+ data can be obtained from the project website (http://y1000plus.org) or the associated Figshare repository https://doi.org/10.25452/figshare.plus.c.6714042. The Figshare project (https://figshare.com/projects/Genomic_factors_shaping_codon_usage_acros…) contains the raw random forest model data, the assembled Hanseniaspora transcriptomes, the relative synonymous codon usage (RSCU) values for all coding sequences in the subphylum, the conserved arginine analysis, and the mitochondrial tRNA analysis. The Hanseniaspora RNA-sequencing data have been deposited in BioProject PRJNA1144926 under accessions SAMN43045963, SAMN43045964, and SAMN43045965.
