2025.03.31.646392


#1 Enhancing variant detection in complex genomes: leveraging linked reads for robust SNP, Indel, and structural variant analysis

Authors: Can Luo, Yichen Henry Liu, Han Liu, Zhenmiao Zhang, Lu Zhang, Brock A Peters, Xin Maizie Zhou

Accurate detection of genetic variants, including single nucleotide polymorphisms (SNPs), small insertions and deletions (INDELs), and structural variants (SVs), is critical for comprehensive genomic analysis. Traditional short-read sequencing performs well for SNP and INDEL detection but struggles to resolve SVs, especially in complex genomic regions, because of its limited read length. Linked-read sequencing technologies, such as single-tube Long Fragment Read sequencing (stLFR), address this limitation by using molecular barcodes that provide long-range information. This study investigates conventional paired-end linked reads alongside a conceptual extension of linked-read technology: barcoded single-end reads of 500 bp (SE500_stLFR) and 1,000 bp (SE1000_stLFR) on the stLFR platform. Compared with conventional paired-end linked reads (PE100_stLFR), these longer single-end reads could improve variant-detection resolution by providing longer reads within each barcode. We simulated a diverse set of datasets for the HG002 sample using realistic, T2T-based genome simulation and systematically assessed variant detection across three stLFR configurations: standard PE100_stLFR, SE500_stLFR, and SE1000_stLFR. Benchmarking against the Genome in a Bottle (GIAB) gold standard reveals distinct strengths for each configuration. The extended single-end reads (SE500_stLFR and SE1000_stLFR) significantly enhance SV detection, with SE1000_stLFR providing the best balance between precision and recall. In contrast, the shorter PE100_stLFR reads achieve higher precision for SNP and INDEL calling, particularly within high-confidence regions, but perform worse in low-mappability regions. To explore optimization strategies, we constructed hybrid libraries combining paired-end and single-end barcoded reads. These hybrid libraries integrate the complementary advantages of the different read types and consistently outperform single libraries across small-variant types and genomic contexts. Collectively, our findings offer a robust comparative framework for evaluating stLFR sequencing strategies, highlight the promise of barcoded single-end reads for improving SV detection, and provide practical guidance for tailoring sequencing designs to complex genomic regions.
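The "balance between precision and recall" mentioned above is commonly summarized with the F1 score. The sketch below is illustrative only: it assumes a single true-positive count per call set, whereas GIAB comparison tools may count truth-side and call-side matches separately. The class name `BenchmarkCounts` and the example counts are hypothetical and are not results from this study.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCounts:
    """Hypothetical variant-level counts from comparing a call set to a truth set."""
    tp: int  # calls that match a truth variant (simplified: one shared TP count)
    fp: int  # calls with no matching truth variant
    fn: int  # truth variants the caller missed


def precision(c: BenchmarkCounts) -> float:
    """Fraction of calls that are correct: TP / (TP + FP)."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: BenchmarkCounts) -> float:
    """Fraction of truth variants recovered: TP / (TP + FN)."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1(c: BenchmarkCounts) -> float:
    """Harmonic mean of precision and recall, one common 'balance' summary."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only, not data from the study.
example = BenchmarkCounts(tp=90, fp=10, fn=30)
print(precision(example), recall(example), round(f1(example), 4))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a configuration with very high precision but low recall (or the reverse) scores lower than one with moderately high values for both.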

Subject: Bioinformatics

Published: 2025-04-06