Poster Presentation, 46th Lorne Genome Conference 2025

Most pervasive transcription in mammalian genomes is a consequence of selection (#135)

Brett Adey¹, Danielle Maddock¹, Marcel Dinger², Paul Gardner³, Ant Poole¹, Austen Ganley¹
  1. University of Auckland, Auckland, New Zealand
  2. School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
  3. Department of Biochemistry, University of Otago, Dunedin, New Zealand

Mammalian genomes are pervasively transcribed, yet functional elements such as protein-coding genes occupy only a small fraction of the genome. Explaining this discrepancy has generated a heated debate that bears directly on how much of the genome is functional. At one extreme, pervasive transcription is viewed as background noise arising from the simple sequence requirements for transcription initiation. At the other, it is viewed as encoding an enormous suite of functional noncoding RNAs. Sean Eddy proposed a way to discriminate between these two hypotheses: insert large amounts of random DNA into a mammalian genome to establish a baseline of transcriptional noise, then ask whether transcription of the real genome exceeds this baseline. Here we implemented Eddy's proposal in silico using Puffin-D, a machine learning model that predicts transcription initiation. We first confirmed that Puffin-D accurately predicts transcription initiation in both transcribed and non-transcribed regions of the human genome. We then generated an in silico reversed copy of the entire human genome sequence and used Puffin-D to predict transcription initiation in this reversed sequence and in the forward genome sequence. The predicted pattern of transcription initiation in the forward sequence resembles experimentally observed transcription initiation data, further supporting Puffin-D's accuracy. Strikingly, transcription is predicted to initiate only 20-25% as frequently in the reversed genome as in the forward genome. This difference indicates that the large majority of transcription in mammalian genomes cannot be explained as background noise, and we therefore conclude that it is a consequence of selection. We suggest that this selected transcription may produce a large suite of functional noncoding RNAs, maintain open chromatin structure to facilitate ongoing access to DNA, and/or derive from selfish transposable elements.
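The logic of the reversed-genome baseline can be sketched as a small toy, offered under explicit assumptions and not as the authors' pipeline. `predict_initiation_sites` is a hypothetical placeholder that merely counts a TATA-like hexamer (`TATAAA`) on both strands; it stands in for Puffin-D, whose real interface is not shown here. Reversing a sequence without complementing it keeps base composition identical while scrambling direction-dependent motifs (`TATAAA` read backwards is `AAATAT`), which presumably is what makes a reversed genome a composition-matched noise baseline. The ratio computed at the end corresponds to the reversed-versus-forward comparison, which the abstract reports as roughly 0.20-0.25 for the real genome.

```python
"""Toy sketch of an Eddy-style transcriptional noise baseline via sequence reversal.

ASSUMPTION: predict_initiation_sites() is a hypothetical stand-in for a real
initiation predictor such as Puffin-D; here it only counts a TATA-like motif.
"""
from collections import Counter

# Hypothetical toy "initiation signal": a TATA-box-like hexamer.
TOY_MOTIF = "TATAAA"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def reversed_genome(seq: str) -> str:
    """Reverse the sequence WITHOUT complementing it.

    Base composition is unchanged, but directional motifs are scrambled.
    """
    return seq[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count possibly overlapping occurrences of motif in seq."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def predict_initiation_sites(seq: str) -> int:
    """HYPOTHETICAL placeholder for a model like Puffin-D.

    Scans both strands for the toy motif; a real model would instead return
    per-base initiation predictions that are then thresholded and counted.
    """
    return count_overlapping(seq, TOY_MOTIF) + count_overlapping(
        reverse_complement(seq), TOY_MOTIF
    )


def reversed_to_forward_ratio(seq: str) -> float:
    """Predicted initiation in the reversed sequence relative to the forward one."""
    forward = predict_initiation_sites(seq)
    if forward == 0:
        raise ValueError("no predicted initiation sites in the forward sequence")
    return predict_initiation_sites(reversed_genome(seq)) / forward


if __name__ == "__main__":
    toy = "GGGCTATAAAAGGCGCTATAAATTCG"  # made-up toy sequence, not genomic data
    # Reversal preserves nucleotide composition exactly.
    print(Counter(toy) == Counter(reversed_genome(toy)))  # True
    # Both TATAAA sites are destroyed by reversal, so this toy ratio is 0.0.
    print(reversed_to_forward_ratio(toy))
```

In practice a real predictor would be run in windows along each chromosome and the thresholded calls summed genome-wide; the toy keeps only the comparison step, so its numbers say nothing about the real genome.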