Small insertions/deletions (INDELs) of ≤21 bp comprise 18% of all recorded mutations causing human inherited disease and are evident in 24% of documented Mendelian diseases. INDELs affect gene function in multiple ways: for example, by introducing premature stop codons that either lead to the production of truncated proteins or affect transcriptional efficiency. However, the means by which they impact post-transcriptional regulation, including alternative splicing, have not been fully evaluated. In this study, we collate disease-causing INDELs from the Human Gene Mutation Database (HGMD) and neutral INDELs from the 1000 Genomes Project. The potential of these two types of INDELs to affect binding-site affinity of RNA-binding proteins (RBPs) was then evaluated. We identified several sequence features that can distinguish disease-causing INDELs from neutral INDELs. Moreover, we built a machine-learning predictor called PinPor (predicting pathogenic small insertions and deletions affecting post-transcriptional regulation, http://watson.compbio.iupui.edu/pinpor/) to ascertain which newly observed INDELs are likely to be pathogenic. Our results show that disease-causing INDELs are more likely to ablate RBP-binding sites and tend to affect more RBP-binding sites than neutral INDELs. Additionally, disease-causing INDELs give rise to greater deviations in binding affinity than neutral INDELs. We also demonstrated that disease-causing INDELs may be distinguished from neutral INDELs by several sequence features, such as their proximity to splice sites and their potential effects on RNA secondary structure. This predictor showed satisfactory performance in identifying numerous pathogenic INDELs, with a Matthews correlation coefficient (MCC) value of 0.51 and an accuracy of 0.75.
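For readers less familiar with the two reported metrics, the following is a minimal sketch of how accuracy and MCC are computed from a binary confusion matrix (pathogenic as the positive class, neutral as the negative class). The function name `accuracy_and_mcc` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study's data or from PinPor itself.

```python
import math


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, MCC) for a binary confusion matrix.

    tp/fn: pathogenic INDELs predicted pathogenic / neutral.
    tn/fp: neutral INDELs predicted neutral / pathogenic.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Hypothetical counts for illustration only (not from the paper).
acc, mcc = accuracy_and_mcc(tp=40, tn=35, fp=15, fn=10)
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 to 1, so it stays informative when the pathogenic and neutral classes are of unequal size.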
ASJC Scopus subject areas
- Molecular Biology