Oral Presentation 46th Lorne Genome Conference 2025

Constructing a pangenome reference of African foragers to improve a catalogue of early diverged contemporary human genome variation (114589)

Weerachai Jaratlerdsiri 1, Mehedi Hasan 1, Ksenia Skvortsova 2, Riana MS Bornman 3, Jue Jiang 1, Hagen EA Förtsch 4, Jeffrey Mphahlele 5, Vanessa M Hayes 1
  1. University of Sydney, Camperdown, NSW, Australia
  2. Garvan Institute of Medical Research, Darlinghurst
  3. University of Pretoria, Pretoria
  4. Windhoek Central Hospital, Windhoek Khomas
  5. National Health Laboratory Service and Sefako Makgatho Health Sciences University, Pretoria

Constructing pangenome references to improve the catalogue of genomic variation is an increasingly active area of human genetics. Its aim is to determine the true extent of human diversity and its medical impact. There is broad agreement that Homo sapiens originated in Africa and that Khoe-San African foragers show the oldest known split date among present-day humans. Even so, no such study has yet been performed for them. Here, we generate 33-55× whole-genome short-read sequencing data for 150 Khoe-San individuals. We also build three reference-grade assemblies of Khoe-San individuals with a per-base accuracy of 99.999%, using an integrative analysis of HiFi long-read and short-read technologies. Short-read data reveal ~30 million small-to-large variants, including >1.3 million novel single nucleotide variants. Pangenome inference across the three Khoe-San and 44 public pangenome drafts identifies ~900 thousand additional novel variants in the foragers. Khoe-San genomic variants resolve ‘San’ and ‘Damara’ as separate phylogenetic lineages, both of which share traditional forager lifestyles and click languages. The San lineage reflects the deep divergence of modern humans (~115 thousand years ago), whereas the Damara divergence is recent. Both show high effective population size estimates, suggesting global dominance between 45 and 150 thousand years ago. Using newly developed assembly-based selection tests, we report 1,376 genes under positive selection (dN/dS = 19.46). Of these, 479 are significantly associated with forager peoples and therefore retain ancestral alleles that differ from the derived genetic variation observed in non-Africans. We also ran a pathway analysis on 2,276 novel truncating variants identified in our pangenome drafts. It reveals three significant pathways (FDR < 0.04998) involving megabase regions of the forager-associated HLA, OR and KRT/KRTAP multigene families. Their phylogenies and long-read methylation profiles suggest gene copy gain or loss as the mode of evolution.
This work provides a first draft of African forager pangenomes and demonstrates that they accurately capture early-diverged genetic variation and associated epigenetic signals, helping to establish the full extent of global reference resources.
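The abstract reports results rather than methods. As a rough guide to three of the quoted figures, the following minimal Python sketch shows standard textbook calculations under stated assumptions:

- converting per-base accuracy to a Phred-scaled quality value (99.999% corresponds to QV50);
- an uncorrected dN/dS ratio;
- Benjamini-Hochberg FDR adjustment.

The authors' assembly-based selection tests and the multiple-testing procedure used in their pathway analysis are not described here. The function names, the absence of a multiple-substitution correction in `dn_ds`, the choice of Benjamini-Hochberg, and all input numbers are therefore hypothetical illustrations, not the study's method or data.

```python
import math


def phred_qv(per_base_accuracy: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(per-base error rate)."""
    if not 0.0 <= per_base_accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    error_rate = 1.0 - per_base_accuracy
    return -10.0 * math.log10(error_rate)


def dn_ds(nonsyn_diffs: int, nonsyn_sites: float,
          syn_diffs: int, syn_sites: float) -> float:
    """Uncorrected dN/dS (assumption: simple proportions, no Jukes-Cantor
    correction). Values > 1 are conventionally read as positive selection."""
    p_n = nonsyn_diffs / nonsyn_sites  # nonsynonymous differences per nonsynonymous site
    p_s = syn_diffs / syn_sites        # synonymous differences per synonymous site
    return p_n / p_s


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min  # enforce monotonicity across ranks
    return adjusted


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(round(phred_qv(0.99999), 1))            # 50.0
    print(round(dn_ds(10, 300, 5, 100), 3))       # 0.667
    qs = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
    print([round(q, 4) for q in qs])              # [0.04, 0.0533, 0.0533, 0.2]
```

Under these assumptions, a pathway would be called significant at FDR < 0.05 when its adjusted value falls below that threshold.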