(IJCSIS) International Journal of Computer Science and Information Security,Vol.
Besides the PDB, many other databases serve the 3D protein structure domain. The Structural Classification of Proteins (SCOP) is a protein structure database that describes the known evolutionary and structural relationships among protein structures. It is maintained at the Medical Research Council at Cambridge University [10, 11]. The Class, Architecture, Topology, and Homologous superfamily (CATH) database classifies the structures in the PDB hierarchically and is maintained at the University of London. Furthermore, the Families of Structurally Similar Proteins (FSSP) database, built at the European Bioinformatics Institute, is based on the DALI method [13, 14]. Finally, PROSITE [2, 15] is a database for the family classification of proteins, in which protein structures are classified into families that share the same functions.
Proteins are the basic component of human cells and also the largest one. Their importance is clear from the role they play in determining the function of cells. Proteins have several levels of structure, each of which helps in understanding the functions and chemical properties of living cells. The functions and chemical properties of a protein cannot be determined before its tertiary structure is formed. The four levels of protein structure start from the amino acid sequence and end with the quaternary structure.
The trusted methods for identifying the tertiary structure of proteins are X-ray crystallography and NMR. However, both methods are expensive, and determining a tertiary structure with them takes a great deal of time.
Searching a target database for similar protein structures involves several steps. First, the protein is represented in a form suitable for the comparison methods. This transformation has to be applied to both the query protein structure and the database. For the database, it is treated as a pre-processing step because of the size of the database and the time the step consumes. The remaining sub-processes measure similarity and search for the query protein structure.
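The steps above can be sketched as a small pipeline. This is only an illustrative sketch, not a method from the surveyed literature: the representation (a normalized histogram of pairwise distances), the similarity score (histogram intersection), and the names `represent`, `similarity`, `build_index`, and `search` are all assumptions chosen for brevity. Each structure is assumed to be given as a list of 3D points, one per residue.

```python
from math import dist


def represent(coords, bin_width=4.0, n_bins=10):
    """Toy representation (assumption): normalized histogram of pairwise distances."""
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            # Distances beyond the last bin edge are clamped into the last bin.
            b = min(int(dist(coords[i], coords[j]) / bin_width), n_bins - 1)
            counts[b] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def similarity(a, b):
    """Histogram intersection (assumption): 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(a, b))


def build_index(database):
    """Pre-processing step: represent every database structure once, offline."""
    return {pid: represent(coords) for pid, coords in database.items()}


def search(query_coords, index, top_k=3):
    """Represent the query the same way, score it against the index, and rank."""
    q = represent(query_coords)
    scored = sorted(((similarity(q, rep), pid) for pid, rep in index.items()), reverse=True)
    return [(pid, score) for score, pid in scored[:top_k]]
```

The point of the split between `build_index` and `search` is that the expensive transformation of the database is paid once, while each query only pays for its own representation and the scoring pass.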
Protein structure comparison and retrieval is one of the most important challenges in bioinformatics. Research results in this field are still unsatisfactory: performance falls short of expectations in both time and accuracy. An advantage of protein structure retrieval is that it helps in predicting the tertiary structure of proteins and thus plays an important role in understanding and identifying the functions of proteins.
The challenges in this domain are accuracy and time: methods are needed that are both fast and highly accurate, without trading one for the other. Many methods have been proposed in this research area in search of an optimal solution to this challenge.
II. Similarity Representation Methods
The similarity representation of protein structure is important because of its role in understanding the behavior of proteins. It supports matching a protein structure against other protein structures and measuring their similarity. Furthermore, it is the first step of protein structure comparison and retrieval: the protein structure is built and rearranged into a simple and efficient representation that prepares it for matching. Shaping the data in this way speeds up the comparison and retrieval of proteins and has a strong effect on accuracy.
Many methods have been proposed for protein 3D structure similarity representation in order to improve the performance and efficiency of comparison. The following sections present these methods.
Matrix representation methods
This group uses matrices to represent protein 3D structures. These methods are divided into two sub-groups: distance matrices and similarity matrices.
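As a minimal sketch of the distance-matrix idea, the following computes all pairwise Euclidean distances within a single structure. It assumes each residue has already been reduced to one 3D point (for example its C-alpha atom; that choice is an assumption here, not something the text specifies), and the function name `distance_matrix` is hypothetical.

```python
from math import dist


def distance_matrix(coords):
    """Return the n x n matrix of Euclidean distances between residue positions.

    coords: list of (x, y, z) tuples, one per residue (assumed representation).
    The result is symmetric with zeros on the diagonal.
    """
    n = len(coords)
    return [[dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
```

Because the matrix has one cell per residue pair, its size grows quadratically with the length of the structure, which is why methods built on it tend to work on sub-matrices or reduced forms.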
Two proteins are aligned in a matrix in order to represent them by calculating the distances between them. The values in the cells of the matrix represent the distances between the amino acids of the two proteins.
Holm L. and Sander C. proposed an algorithm for protein structure comparison called DALI. The protein structures are represented as distance matrices. The alignment between patterns and protein structures is done by a pairwise comparison of the patterns in the distance matrices, where similar patterns are kept in a list called the pair list. Then, the patterns in the pair list are assembled into a larger aligned set of pairs. The algorithm focuses on a subset of the patterns because the distance matrix grows with the length of the patterns or protein structures. The distance matrix is reduced and the similar patterns are limited in order to narrow the scope of the search process.
Aung Z. and Tan K.L. proposed a protein 3D structure retrieval system called PROTDEX2. The algorithm depends on index construction to represent the protein structure, which is divided into two sub-processes, feature vector extraction from
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: GLUTATHIONE SYNTHETASE;
COMPND   3 CHAIN: A;
..
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: AVIAN SARCOMA VIRUS;
..
REMARK   3 REFINEMENT.
REMARK   3 PROGRAM : X-PLOR 3.851
Figure 1: PDB File Format Example
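Records like those in Figure 1 can be grouped by record type with a few lines of code. The sketch below relies on the PDB convention that the record name occupies the first six columns of each line; the function name `group_records` is hypothetical, and the sketch deliberately leaves continuation numbers (the `2`, `3` after `COMPND`) in the text rather than interpreting them.

```python
from collections import defaultdict


def group_records(lines):
    """Group PDB header lines by record name (columns 1-6)."""
    records = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name = line[:6].strip()                # e.g. "COMPND", "SOURCE", "REMARK"
        records[name].append(line[6:].strip()) # rest of the line, untrimmed of numbering
    return dict(records)
```

For real work, an established parser is a better choice than hand-rolled column slicing; this sketch only illustrates how the fixed-column layout makes the file easy to split.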