Evolutionary conservation scores quantify the degree to which sequences have been constrained by selective pressure, and so provide a means of drawing biological inferences about the importance of proteins and their constituent residues. These scores have therefore historically underpinned the prediction of missense mutation pathogenicity. Despite their important informational contribution, the robustness of tools relying primarily on conservation has proven to be somewhat limited. To address this shortfall, the missense tolerance ratio (MTR) was developed to capture purifying selection within a population. Drawing on the ever-expanding body of human sequencing data, MTR scores use population variant counts, informed by structural information, to evaluate the extent of missense mutation accumulation at protein residue sites. Encouragingly, regions identified as intolerant using these scores were significantly enriched for pathogenic variants. Moreover, integrating MTR scores into a machine learning model yielded an accurate pathogenicity predictor.
While traditional conservation measures observe evolutionary constraints over substantial spans of time and across a diverse range of organisms, MTR scores focus specifically on the modern human population. To assess how much information the two scores share, we calculated the Spearman correlation between them, which was only 0.174 across the human proteome. Integrating the two scores into a logistic regression model significantly improved predictive performance compared to either score individually. These results suggest that relying solely on conservation scores for pathogenicity prediction is likely to overlook crucial insights into human-specific biology. Additionally, comparing the two scores at the protein level reveals discrepancies that have the potential to highlight where human biology deviates from that of the average organism. Using this methodology, we found that three of the top four proteins under substantially more selective pressure within humans are important in the control of synaptic plasticity.
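As an illustration of the overlap analysis described above, the following minimal sketch computes a Spearman rank correlation between two per-residue score vectors. It is not the authors' pipeline: the function names (`average_ranks`, `pearson`, `spearman`) and the example values are hypothetical and chosen only to show the calculation, which ranks each vector (averaging ties) and then takes the Pearson correlation of the ranks.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-residue scores, invented purely for illustration.
conservation = [0.91, 0.15, 0.78, 0.40, 0.66, 0.22]
mtr = [0.55, 0.80, 1.02, 0.47, 0.93, 0.61]
rho = spearman(conservation, mtr)
print(f"Spearman correlation: {rho:.3f}")
```

A coefficient near zero, such as the 0.174 reported here, indicates that the rankings produced by the two scores agree only weakly, which is consistent with each score carrying information the other lacks.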