Proc Natl Acad Sci U S A. 2021 Apr 6;118(14):e2025192118. doi: 10.1073/pnas.2025192118.
When addressing a genomic question, having a reliable and adequate reference genome is of utmost importance. This creates the need to refine and customize reference genomes (RGs). Our laboratory has recently developed a strategy, the Perfect Match Genomic Landscape (PMGL), to detect variation between genomes [K. Palacios-Flores et al., Genetics 208, 1631-1641 (2018)]. The PMGL is precise and sensitive and, in contrast to most currently used algorithms, is nonstatistical in nature. Here we demonstrate the power of PMGL to refine and customize RGs. As a proof of concept, we refined different versions of the Saccharomyces cerevisiae RG. We applied the automatic PMGL pipeline to refine the genomes of microorganisms belonging to the three domains of life: the archaea Methanococcus maripaludis and Pyrococcus furiosus; the bacteria Escherichia coli, Staphylococcus aureus, and Bacillus subtilis; and the eukarya Schizosaccharomyces pombe, Aspergillus oryzae, and several strains of Saccharomyces paradoxus. We analyzed the reference genome of the virus SARS-CoV-2 and previously published viral genomes from samples of patients with COVID-19. We performed a mutation-accumulation experiment in E. coli and showed that the PMGL strategy can detect specific mutations generated at any desired step of the whole procedure. We propose that PMGL can be used as a final step for the refinement and customization of any haploid genome, independently of the strategies and algorithms used in its assembly.