Poster Presentation 46th Lorne Genome Conference 2025

Benchmarking long-read RNA-sequencing technologies with LongBench2: a cross-platform reference dataset profiling cancer cell lines with bulk and single-cell approaches (#263)

Yupei You 1, Ashleigh Solano 1, James Lancaster 1, Margaux David 2, Changqing Wang 1, Kathleen Zeglinski 1, Shian Su 1, Reza Ghamsari 3, Jin Ng 1, Kate Sutherland 1, Manveer Chauhan 4, Sefi Prawer 2, Michael B. Clark 4, Quentin Gouil 1, Matthew E. Ritchie 1
  1. Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  2. Department of Neurosciences, Leuven Brain Institute, Leuven, Belgium
  3. Graduate School of Biomedical Engineering, University of New South Wales, Kensington, New South Wales, Australia
  4. Department of Anatomy and Neuroscience, The University of Melbourne, Parkville, VIC, Australia

Long-read RNA sequencing technologies offer unparalleled insights into transcriptomes by enabling full-length sequencing of RNA molecules, uncovering novel isoforms and alternative splicing events. While long-read sequencing platforms, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), have historically been associated with higher error rates, recent advancements in both platforms have significantly enhanced read accuracy, broadening their applicability for transcriptomic studies.

With the rapid evolution of sequencing protocols and bioinformatics tools, the trade-offs between sequencing throughput, read length, accuracy, and cost present significant challenges in selecting the optimal approach. Systematic benchmarking studies that compare these options are crucial to inform future research directions. However, many existing benchmarking datasets with matched data across multiple platforms have limitations, including: 1) a lack of realistic biological replicates, which may restrict the generalisability of differential analysis results to real-world scenarios, and 2) the use of earlier sequencing kits, which may not reflect the latest advancements in sequencing technology, limiting their relevance for future studies that typically use newer sequencing protocols.

To address these gaps, we present LongBench2, a comprehensive benchmarking dataset. Derived from eight lung cancer cell lines with synthetic RNA spike-ins, LongBench2 includes bulk, single-cell, and single-nucleus RNA-seq data from three state-of-the-art long-read sequencing platforms (ONT PCR-cDNA, ONT direct RNA sequencing, and PacBio Kinnex), alongside Illumina short-read data for robust cross-platform comparisons. The LongBench2 dataset is a valuable resource for benchmarking and improving sequencing protocols and bioinformatics tools. Using it, we present a systematic evaluation of transcript capture, quantification, and differential expression analyses, examining the strengths and limitations of each sequencing platform in various biological contexts and enabling researchers to make more informed decisions on platform and method selection.