Proteins are a basic component of every living organism. The more we know about them, the easier it is to understand the complex mechanisms they take part in and to influence their proper functioning. Several approaches to protein function prediction are known. This paper focuses on BLAST, a tool used for protein function prediction from the protein sequence. BLAST compares two protein sequences and assigns a score to their similarity. After running BLAST on all query sequences, we compute a threshold; similar proteins returned by BLAST that score above it are considered homologous. Under this approach, the query protein is assigned the functions of its homologous proteins. Many papers describe this kind of protein function prediction, but our methodology focuses on the correlation between the accuracy of the algorithm and protein families. The goal of this research is to find protein families for which a correlation between protein sequence and protein function exists, i.e. to recommend the protein families for which BLAST can be used to predict protein function. Although the secondary structure of a protein reveals protein similarities much more precisely, approaches based on it require much more memory and computing power. Our goal is therefore to use the primary structure, i.e. the sequence, which is the simplest representation of a protein. Algorithms that use the protein sequence are much faster than the others because the primary structure is simpler than the secondary and tertiary structures.
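The annotation-transfer step described above can be illustrated with a minimal sketch. It assumes that BLAST hits have already been parsed into hypothetical `(query_id, subject_id, score)` tuples, that known functions (for example GO terms) are available in a dictionary keyed by subject ID, and that the threshold is supplied as a parameter. The calculation the paper uses to derive the threshold is not reproduced here. All function names, identifiers and values are illustrative only.

```python
from collections import defaultdict

# Hypothetical parsed BLAST hit: (query_id, subject_id, score).
# The score is assumed to be a similarity score such as a bit score.
Hit = tuple[str, str, float]


def predict_functions(
    hits: list[Hit],
    annotations: dict[str, set[str]],
    threshold: float,
) -> dict[str, set[str]]:
    """Assign each query the functions of hits scoring at or above `threshold`.

    Hits at or above the threshold are treated as homologous, and the query
    inherits the union of their annotated functions. Queries without any
    such hit are absent from the result.
    """
    predicted: dict[str, set[str]] = defaultdict(set)
    for query, subject, score in hits:
        if query == subject:
            continue  # ignore self-hits, which would trivially match
        if score >= threshold:
            # Unannotated subjects contribute nothing (empty set).
            predicted[query] |= annotations.get(subject, set())
    return dict(predicted)


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    hits = [("Q1", "P1", 250.0), ("Q1", "P2", 40.0), ("Q2", "P3", 180.0)]
    annotations = {"P1": {"GO_term_A", "GO_term_B"}, "P2": {"GO_term_C"}, "P3": {"GO_term_A"}}
    print(predict_functions(hits, annotations, threshold=100.0))
```

Raising the threshold trades coverage for precision, because fewer hits qualify as homologous. The per-family analysis in this paper examines exactly this trade-off.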
Keywords: Protein sequence, Sequence similarity, BLAST, UniProt, Gene Ontology, Protein functions, Protein families.