Poster Presentation 46th Lorne Genome Conference 2025

A new compression strategy to reduce the size of nanopore sequencing data (#162)

Kavindu Jayasooriya (1,2,3,4), Sasha P Jenner (1), Pasindu Marasinghe (4), Udith Senanayake (4), Hassaan Saadat (3), David Taubman (3), Roshan Ragel (4), Hasindu Gamaarachchi (1,2,3), Ira W Deveson (1,2,3)
  1. Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, NSW, Australia
  2. Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children’s Research Institute, Sydney, NSW, Australia
  3. University of New South Wales, Sydney, NSW, Australia
  4. Department of Computer Engineering, University of Peradeniya, Peradeniya, Sri Lanka

Nanopore sequencing is an increasingly central tool for genomics. Despite rapid advances in the field, large data
volumes and computational bottlenecks continue to pose major challenges. Here we introduce ex-zd, a new data
compression strategy that helps address the large size of raw signal data generated during nanopore experiments. Ex-zd
encompasses both a lossless compression method, which modestly outperforms all current methods for nanopore
signal data compression, and a ‘lossy’ method, which can be used to achieve dramatic additional savings. The latter
component works by reducing the number of bits used to encode signal data. We show that the three least significant
bits in signal data generated on instruments from Oxford Nanopore Technologies (ONT) predominantly encode noise.
Their removal reduces file sizes by half without impacting downstream analyses, including basecalling and detection
of DNA methylation. Ex-zd compression saves hundreds of gigabytes on a single ONT sequencing experiment, thereby
increasing the scalability, portability and accessibility of nanopore sequencing.
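
The bit-reduction idea behind the lossy component can be illustrated with a minimal sketch. This is not the ex-zd implementation. It assumes raw signal samples are stored as 16-bit integers, uses a synthetic step-like signal with random noise in place of real nanopore data, and uses zlib as a stand-in codec. The function names (drop_lsbs, restore_scale, compressed_size) are hypothetical and chosen for illustration only.

```python
import random
import struct
import zlib


def drop_lsbs(samples, bits=3):
    """Discard the `bits` least significant bits of each integer sample."""
    return [s >> bits for s in samples]


def restore_scale(truncated, bits=3):
    """Shift samples back to the original scale; dropped bits return as zeros."""
    return [t << bits for t in truncated]


def compressed_size(samples):
    """Bytes used after packing as 16-bit integers and compressing with zlib.

    zlib is only a stand-in here; it is not the codec used by ex-zd.
    """
    packed = struct.pack(f"<{len(samples)}h", *samples)
    return len(zlib.compress(packed, 9))


# Synthetic signal (assumption): a level that changes every 50 samples,
# plus small random noise that occupies the lowest bits.
rng = random.Random(0)
signal = []
level = 500
for i in range(20000):
    if i % 50 == 0:
        level = rng.randint(300, 900)
    signal.append(level + rng.randint(-4, 3))

truncated = drop_lsbs(signal, bits=3)
restored = restore_scale(truncated, bits=3)

# Dropping 3 bits changes any sample by at most 2**3 - 1 = 7 units.
max_error = max(abs(a - b) for a, b in zip(signal, restored))

size_before = compressed_size(signal)
size_after = compressed_size(truncated)

print("max absolute error:", max_error)
print("compressed bytes before:", size_before)
print("compressed bytes after: ", size_after)
```

On this toy signal the truncated samples compress to fewer bytes because the discarded bits carried only noise. The savings on real data depend on the signal and the codec, so the ratio printed here should not be read as a result for ONT data.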