An Extensive Survey on Gene Prediction Methodologies
Lecturer, P.G. Department of Information and Communication Technology, Fakir Mohan University, Orissa, India
In recent times, bioinformatics has played an increasingly important role in the study of modern biology. Bioinformatics deals with the management and analysis of biological information stored in databases. The field of genomics depends on bioinformatics, an emerging tool in biology for finding facts about gene sequences, the interaction of genomes, and the unified working of genes in the formation of the final syndrome or phenotype. The rising popularity of genome sequencing has led to the use of computational methods for gene finding in DNA sequences. Computer-assisted gene prediction has recently gained impetus, and a tremendous amount of work has been carried out on the subject. Researchers have proposed a wide range of noteworthy techniques for the prediction of genes. This paper presents an extensive review of the prevailing literature on gene prediction, classified by the techniques employed. In addition, a succinct introduction to gene prediction is provided to acquaint the reader with the essentials of the subject.
Keywords- Genomic Signal Processing (GSP), gene, exon, intron, gene prediction, DNA sequence, RNA, protein, sensitivity, specificity, mRNA.
INTRODUCTION
Biology and biotechnology are transforming research into an information-rich enterprise and are thereby driving a technological revolution. Bioinformatics is the application of computer technology to the management of biological information. It is a fast-growing area of computer science that deals with the collection, organization, and analysis of DNA and protein sequences. Today it encompasses the construction and development of databases, algorithms, computational and statistical methods, and hypotheses to address the recognized and practical problems that arise in the management and analysis of biological data. Arguably, the origin of bioinformatics can be traced back to Mendel's discovery of genetic inheritance in 1865. Bioinformatics research in a real sense, however, began in the late 1960s, marked by Dayhoff's atlas of protein sequences and the early modeling analyses of protein and RNA structures.
Dr. Ranjit Kumar Sahu
Assistant Surgeon, Post Doctoral Department of Plastic and Reconstructive Surgery, S.C.B. Medical College, Cuttack, Orissa, India
E-mail: firstname.lastname@example.org

With vast amounts of genomic and proteomic data available in the public domain, it is becoming increasingly important to process this information in ways that are valuable to humankind. One of the challenges in analyzing newly sequenced genomes is the computational recognition of genes, a fundamental step in understanding the genome. Precise and fast tools are needed to evaluate genomic sequences and annotate genes. In this context, both established and recent signal processing techniques have played a significant role. Genomic signal processing (GSP) is a comparatively new field in bioinformatics that deals with digital signal representations of genomic data and their analysis by means of conventional digital signal processing (DSP) techniques.

The genetic information of a living organism is stored in its DNA (deoxyribonucleic acid). DNA is a macromolecule in the form of a double helix, with pairs of bases between the two strands of the backbone. There are four bases, adenine, cytosine, guanine, and thymine, abbreviated with the letters A, C, G, and T respectively. A gene is a fragment of DNA that contains the formula for the chemical composition of one individual protein. Genes serve as the blueprints for proteins and a few additional products. mRNA is the initial intermediate in the production of any genetically encoded molecule. Genomic information is frequently represented by the sequences of nucleotide symbols in the strands of DNA molecules, by symbolic codons (triplets of nucleotides), or by the symbolic sequences of amino acids in the resulting polypeptide chains. Genes and the intergenic spaces are the two types of regions in a DNA sequence.
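The symbolic representations described above can be illustrated with a minimal Python sketch. It assumes one common numerical mapping, binary indicator sequences (one 0/1 sequence per base), as an example of a digital signal representation of the kind GSP works with. The text does not say which mapping is intended, so this choice is an assumption, as are the function names `indicator_sequences` and `codons`.

```python
BASES = "ACGT"  # adenine, cytosine, guanine, thymine


def indicator_sequences(dna):
    """Map a DNA string to four binary sequences, one per base.

    Position i of the sequence for base X is 1 if dna[i] == X, else 0.
    This is an assumed, illustrative mapping, not one prescribed by the text.
    """
    dna = dna.upper()
    return {base: [1 if symbol == base else 0 for symbol in dna]
            for base in BASES}


def codons(dna, frame=0):
    """Split a DNA string into non-overlapping triplets of nucleotides.

    `frame` (0, 1, or 2) is the offset of the first triplet; trailing
    nucleotides that do not fill a complete triplet are dropped.
    """
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
```

For example, `codons("ATGGCCTAAG")` returns `["ATG", "GCC", "TAA"]`, and `indicator_sequences("ACGTA")["A"]` returns `[1, 0, 0, 0, 1]`. Conventional DSP tools could then be applied to each numeric sequence separately.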
Proteins are the building blocks of every organism, and the information for producing them is stored in genes, each of which is responsible for the construction of a distinct protein. Although every cell in an organism contains identical DNA, and hence identical genes, only a subset is expressed in any particular family of cells. The exons and the introns are the two
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 7, October 2010, p. 88
http://sites.google.com/site/ijcsis/
ISSN 1947-5500