The increasing availability of large-scale protein-protein interaction (PPI) data has made it possible to understand the basic components and organization of the cell machinery at the network level. However, a significant number of proteins in protein interaction networks (PINs) remain uncharacterized, and predicting their function remains a major challenge. We propose a novel distance metric for PIN clustering. First, we augment the graph representing the PIN with weights derived from Gene Ontology (GO) semantic similarity; we then use this augmented representation in a random walk with restarts (RWR) process. The distance between a pair of proteins is calculated from the steady-state distribution of the RWR. We validate our approach by function prediction via clustering on a purified and reliable Saccharomyces cerevisiae PIN. We show that the novel distance metric yields a significant improvement in function prediction performance compared to traditional approaches.
Keywords: Distance metric; Graph clustering; Protein interaction network; Protein function prediction
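The random walk with restarts described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: the weight matrix `W` stands in for a PIN already augmented with GO-derived edge weights (the values here are made up), the restart probability is set to 0.5, and the distance between two proteins is taken to be the L1 distance between their steady-state distributions. The function names `rwr_steady_state` and `rwr_distance` are hypothetical, and the actual distance formula used in the paper may differ.

```python
import numpy as np


def rwr_steady_state(W, restart=0.5):
    """Return a matrix whose column j is the steady-state distribution
    of a random walk with restarts that always restarts at node j.

    W is a non-negative weight matrix with no all-zero columns
    (assumption: edge weights already include GO semantic similarity).
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    # Column-normalize so that each column of P sums to 1.
    P = W / W.sum(axis=0)
    # The steady state solves p = (1 - c) P p + c e_j, hence
    # p = c (I - (1 - c) P)^{-1} e_j; inverting once gives all columns.
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * P)


def rwr_distance(W, restart=0.5):
    """Pairwise L1 distance between steady-state distributions.

    Assumption: the L1 norm is used purely for illustration.
    """
    S = rwr_steady_state(W, restart)
    # D[i, j] = sum_k |S[k, i] - S[k, j]|
    return np.abs(S[:, :, None] - S[:, None, :]).sum(axis=0)


# Toy 4-protein path graph with made-up weights: two tightly linked
# pairs (0-1 and 2-3) joined by a weak edge (1-2).
W = np.array([
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.2, 0.0],
    [0.0, 0.2, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
])
D = rwr_distance(W)
print(np.round(D, 3))
```

In this toy graph, proteins 0 and 1 end up closer to each other than proteins 0 and 3. The resulting matrix `D` could then be passed to any distance-based clustering routine.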