Important information about the gut microbiome may be lost before DNA sequencing data are even analysed, potentially affecting future efforts to diagnose disease, monitor antibiotic resistance and develop personalised medicine, according to a study by Polish and Estonian researchers.
The study, published in the journal mSystems, found that two widely used DNA sequencing platforms produced highly similar results when identifying which microorganisms were present in the gut, but differed significantly when analysing what those microorganisms are capable of doing, such as carrying genes linked to antibiotic resistance or specific metabolic functions.
The research was conducted by scientists from the Małopolska Centre of Biotechnology at the Jagiellonian University, the Sano – Centre for Computational Personalized Medicine in Kraków, and the University of Tartu in Estonia. The team analysed 1,351 gut microbiome samples from the Estonian Biobank, making the study one of the largest comparisons of its kind.
The gut microbiome is the community of trillions of bacteria, viruses and other microorganisms that inhabit the digestive tract. Growing evidence suggests it plays an important role not only in digestion but also in regulating the immune system, metabolism and the risk of a wide range of diseases.
To study the microbiome, scientists use DNA sequencing, a technique that reads the genetic material of microorganisms present in a biological sample. Sequencing allows researchers to determine both which microbes are present and, increasingly, what biological functions they may perform.
As large microbiome studies proliferate and biobanks around the world collect samples from thousands of people, researchers are increasingly combining data from different countries, laboratories and sequencing technologies. This has raised questions about whether results obtained using different platforms can be reliably compared.
To address this, the researchers compared data generated using two of the most widely used sequencing platforms, Illumina and MGI, by analysing the same 1,351 samples with both methods.
The comparison showed that the platforms agreed on more than 92% of microbial species detected in the samples, and measures of microbial diversity were also highly consistent. According to the researchers, this suggests that large datasets generated using different sequencing platforms can be combined for studies focusing on the taxonomic composition of the microbiome.
The picture changed, however, when the researchers examined the biological functions encoded by the microorganisms rather than simply identifying which species were present.
At that level, substantial differences emerged between the two datasets. The researchers found that the discrepancies were not caused by the sequencing platforms themselves but by the way samples had been prepared before sequencing.
In particular, the team identified multiplexing—the practice of sequencing multiple samples simultaneously to reduce costs—as a key factor. Although multiplexing makes large studies more economical, it also reduces the number of unique DNA fragments obtained from each sample, increasing the likelihood that rare genes will go undetected.
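The effect of sharing a sequencing run among more samples can be illustrated with a back-of-the-envelope calculation. The sketch below is not the method used in the study. It assumes, for simplicity, that every read is an independent random draw from the sample's DNA, so that a gene making up a fraction p of the fragments is seen at least once in n reads with probability 1 - (1 - p)^n. The function name, the gene fraction and the run capacity are all hypothetical values chosen only to show the trend.

```python
def detection_probability(gene_fraction: float, reads: int) -> float:
    """Chance that at least one of `reads` independent reads comes from the gene.

    Simplifying assumption: each read is an independent draw, and
    `gene_fraction` is the share of DNA fragments belonging to the gene.
    """
    return 1.0 - (1.0 - gene_fraction) ** reads


# Hypothetical numbers, for illustration only (not taken from the study).
GENE_FRACTION = 1e-7       # the rare gene is 1 in 10 million fragments
RUN_READS = 400_000_000    # total reads shared by all samples in one run

for samples_per_run in (8, 32, 128):
    # More samples per run means fewer reads for each individual sample.
    reads_per_sample = RUN_READS // samples_per_run
    p = detection_probability(GENE_FRACTION, reads_per_sample)
    print(f"{samples_per_run:>3} samples per run: "
          f"{reads_per_sample:>11,} reads each, P(detect) = {p:.2f}")
```

Under these assumptions, the probability of seeing the rare gene falls steadily as more samples share the same run, while abundant species would still be detected almost every time. That asymmetry is consistent with the pattern the researchers describe.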
"Our results confirm that we can trust taxonomic comparisons between platforms in large cohorts, but we urgently need to raise the bar for functional studies," says the first author of both publications, Kinga Zielińska, PhD.
The researchers say the findings have implications that extend beyond technical aspects of DNA sequencing. Many rare genes may be biologically important, including those associated with antibiotic resistance and other characteristics relevant to public health.
"As microbiome science moves toward clinical applications, from personalized medicine to global surveillance of antibiotic resistance, the ability to reliably detect what microbes are doing, not just which microbes are present, becomes crucial," the researchers emphasise.
According to the authors, the findings suggest that microbiome studies should place greater emphasis on data quality during sample preparation, even if doing so increases costs. Studies seeking to understand microbial functions may require different design priorities than those focused solely on identifying microbial species.
"Decisions made at the very beginning of the sequencing process, long before any data analysis, can create blind spots that no algorithm can fix," says Paweł P. Łabaj, PhD, professor at the Faculty of Medicine of the Jagiellonian University.
Katarzyna Czechowicz (PAP)