Study of Protein-Protein Interactions Using Probabilistic Graphical Models

Haidong Wang, Daphne Koller

Protein-protein interactions are central to all cellular processes. Uncovering the mechanisms that underlie protein interaction networks will allow meaningful predictions about the functions of cellular proteins, with possible applications to drug design. Large amounts of genomic data are now available, covering protein interactions, sequence, structure, localization, transcriptional regulation, expression, phosphorylation, and more.

In this project, we aim to extract patterns and learn relationships from noisy genomic data using probabilistic models, and we use these models to predict protein-protein interactions accurately. For inference we use the efficient mincut algorithm, because loopy belief propagation is slow on large, dense networks, which are typical of genome-wide biological networks. We also apply feature selection methods, such as L1 regularization, to identify significant correlations within the large set of potential correlations. Finally, we use amino-acid-level information to predict structural features, which further improves the prediction of protein-protein interactions and of their specific binding sites.
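To illustrate why mincut can replace loopy belief propagation, the following is a minimal sketch, not the authors' implementation. It assumes a binary pairwise Markov random field with one variable per candidate interaction (label 1 = interaction present) and an energy of the form E(x) = sum_i unary[i][x_i] + sum_(i,j) w_ij * [x_i != x_j] with w_ij >= 0. For this restricted (submodular) class, a minimum s-t cut yields the exact MAP assignment. The names `mincut_map` and `max_flow_min_cut`, the energy form, and the toy numbers are hypothetical. Edmonds-Karp is used only for simplicity; a faster max-flow solver would be preferable on genome-scale graphs.

```python
from collections import defaultdict, deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns (flow value, nodes on the source side of a min cut)."""
    # residual[u][v] holds the remaining capacity on edge u -> v
    residual = defaultdict(lambda: defaultdict(float))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0.0  # make sure the reverse edge exists
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path: the visited nodes form the source side of a min cut
            return flow, set(parent)
        # Find the bottleneck capacity along the path, then push that much flow
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def mincut_map(unary, pairwise):
    """Exact MAP labels for a binary MRF with attractive pairwise terms (hypothetical helper).

    unary[i] = (cost of x_i = 0, cost of x_i = 1)
    pairwise[(i, j)] = w_ij >= 0, paid when x_i != x_j
    """
    cap = defaultdict(dict)
    for i, (c0, c1) in enumerate(unary):
        base = min(c0, c1)  # constant shift keeps capacities nonnegative
        cap["s"][i] = c1 - base  # cut when i lands on the sink side (label 1)
        cap[i]["t"] = c0 - base  # cut when i stays on the source side (label 0)
    for (i, j), w in pairwise.items():
        if w < 0:
            raise ValueError("mincut is exact only for attractive (submodular) terms")
        cap[i][j] = cap[i].get(j, 0.0) + w
        cap[j][i] = cap[j].get(i, 0.0) + w
    _, source_side = max_flow_min_cut(cap, "s", "t")
    return [0 if i in source_side else 1 for i in range(len(unary))]


# Toy example: three candidate interactions; candidate 2 weakly favors "present"
# on its own but is strongly coupled to candidate 0, which favors "absent".
unary = [(0.2, 1.0), (1.0, 0.2), (0.6, 0.5)]
print(mincut_map(unary, {}))              # prints [0, 1, 1]
print(mincut_map(unary, {(0, 2): 1.0}))   # prints [0, 1, 0]
```

In the coupled case, labeling candidate 2 as present would cost 0.5 + 1.0 instead of 0.6, so the cut flips it to absent. A single max-flow computation labels every variable jointly, which is the property that avoids the slow message passing of loopy belief propagation on dense graphs.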