

The success of personalized genomic medicine depends on our ability to assess the pathogenicity of rare human variants, including the important class of missense variation. There are many challenges in training accurate computational systems, e.g., in finding the balance between quantity, quality, and bias in the variant sets used as training examples and avoiding predictive features that can accentuate the effects of bias. Here, we describe VARITY, which judiciously exploits a larger reservoir of training examples with uncertain accuracy and representativity. To limit circularity and bias, VARITY excludes features informed by variant annotation and protein identity. To provide a rationale for each prediction, we quantified the contribution of features and feature combinations to the pathogenicity inference of each variant.
