3.1 Introduction
Generating information with postgenomic and high-throughput techniques is no longer the bottleneck in understanding
and tackling biological processes. Many biological problems can now be approached by sequencing DNA and proteins and
by applying computational tools and informatics algorithms to assess the molecular data (Khan, 2018).
Bioinformatics plays a major role in molecular biology, in work ranging from human cancer studies to the study
of microbial pathogens (Katara, 2014). High-throughput techniques such as DNA microarrays, ChIP-on-chip, protein chips,
and, more recently, next-generation sequencers generate a vast amount of data, which must be analyzed with
bioinformatics tools before it can be understood from a global perspective. The Human Genome Project, the first major
genomic initiative, was launched in 1990 and completed in 2003. Bioinformatics aided in deciphering human genes and
provided information about their structure and organization, allowing researchers to learn more about the functions of
genes and proteins in similar and dissimilar organisms. The principal challenge was determining the base-by-base
order of the nucleotides that make up the human genome (Collins & Fink, 1995). Arabidopsis thaliana was the first
plant, and the third multicellular organism after Caenorhabditis elegans and Drosophila melanogaster, to be completely
sequenced (Tabata et al., 2000). Its sequence became a sound basis for further investigation, since its completion
indicated that high-throughput technologies would dramatically increase knowledge of complex biological networks
(Hidalgo, 2003). Bioinformatics is an interdisciplinary subject that combines biological and information science to
develop new methods and software tools for understanding biological data. It plays a key role in comprehensive
analysis and in understanding gene function at varying levels of protein expression. It is also used to compare
genetic and genomic data and helps to explain various evolutionary aspects of molecular biology. Various sequence
search tools are available: NCBI BLASTN and BLASTP for homology-based searches, OrthoMCL for orthologous sequence
searches, and MCScan and MCScanX for paralogous sequence searches. Biological databases such as the European Molecular
Biology Laboratory (EMBL) database and the DNA Data Bank of Japan (DDBJ) are used to store and distribute sequence
data. To speed up analysis, bioinformatics has accumulated many resources, facilities, and databases, which are
regularly updated with new information and knowledge. This review highlights various bioinformatics methods for
solving biological problems related to functional genomics.
contributes to the different biological processes. The main goal of functional genomics is to understand how the
different components of a biological system work together to produce a particular phenotype. Functional genomic
approaches operate mainly at the DNA level (genomics and epigenomics), the RNA level (transcriptomics), the protein
level (proteomics), and the metabolite level (metabolomics).
fragment, called a tag, released from a defined position of each cDNA by treatment of the linker-ligated cDNA with
BsmFI. The tags are concatenated and cloned into a plasmid vector, which is then sequenced after removal of this linker
fragment. Generally, around 10,000–100,000 tags may be analyzed for a given sample. The abundance of the transcript
corresponding to a tag is represented by the count of that tag in the total sample. The next main step is tag
annotation, that is, identifying the gene that corresponds to each tag. The 15-bp tag sequence is generally used as a
query in a BLAST search (Altschul, Gish, Miller, Myers, & Lipman, 1990) against expressed sequence tag (EST) or cDNA
databases of the organism of interest. The tag counts and tag annotations are then combined into a gene expression
profile. Comparing the gene expression profiles of two differently treated samples shows which genes are up- or
downregulated in response to a particular treatment. In short, the steps of the SAGE procedure are as follows:
• Isolate the mRNA of an input sample (e.g., a tumor).
• Extract a small portion of sequence (a tag) from each mRNA molecule for analysis.
• Link these small sequences together to form a longer chain, or concatamer.
• Clone these chains into a vector that can be taken up by bacteria.
• Sequence the chains with a high-throughput sequencer.
• Process the data computationally to count the small sequence tags.
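The final counting step, and the comparison of two expression profiles described above, can be illustrated with a minimal Python sketch. It assumes the tags have already been extracted from the sequenced concatamers; the function names, the pseudocount, the depth scaling, and the toy tag sequences are all hypothetical choices made for illustration and are not part of any published SAGE software.

```python
from collections import Counter
from math import log2

TAG_LENGTH = 15  # tag length quoted in the text; other protocols differ


def count_tags(tags):
    """Count each tag of the expected length, ignoring malformed entries."""
    return Counter(t.upper() for t in tags if len(t) == TAG_LENGTH)


def relative_abundance(counts):
    """Convert raw tag counts to fractions of all tags in the sample."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def compare_profiles(counts_a, counts_b, pseudocount=1.0):
    """Return log2(B/A) per tag after scaling sample B to the depth of A.

    The pseudocount is an assumption of this sketch; it avoids division
    by zero for tags that were observed in only one sample.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if not total_a or not total_b:
        raise ValueError("both samples must contain at least one tag")
    changes = {}
    for tag in set(counts_a) | set(counts_b):
        a = counts_a.get(tag, 0)
        b = counts_b.get(tag, 0) * total_a / total_b
        changes[tag] = log2((b + pseudocount) / (a + pseudocount))
    return changes


# Toy data: invented 15-bp tags, not real SAGE output.
control = ["ACGTACGTACGTACG"] * 4 + ["TTTTGGGGCCCCAAA"] * 4
treated = ["ACGTACGTACGTACG"] * 7 + ["TTTTGGGGCCCCAAA"] * 1

control_counts = count_tags(control)
treated_counts = count_tags(treated)
changes = compare_profiles(control_counts, treated_counts)

for tag, lfc in sorted(changes.items()):
    direction = "up" if lfc > 0 else "down" if lfc < 0 else "unchanged"
    print(tag, round(lfc, 2), direction)
```

In practice a statistical test would be needed to decide whether a difference in tag counts is significant; the log ratio here only indicates the direction of change.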
USAGE is a web-based application comprising a set of tools to compare and analyze SAGE data. USAGE is
accessible at http://www.cmbi.kun.nl/usage free of cost for academic institutions. It also adds functionality
and flexibility to the analysis of SAGE data (Van Kampen et al., 2000). Some of the SAGE databases are:
1. SAGEnet: This database (http://www.sagenet.org) is maintained by the Vogelstein/Kinzler Lab at Johns Hopkins.
It mainly covers colon cancer, pancreatic cancer, and some of the corresponding normal tissues.
2. SAGEmap: This database was developed by the National Center for Biotechnology Information (NCBI) and the Cancer
Genome Anatomy Project (CGAP) of the National Institutes of Health (NIH). It is a public gene expression
repository and is unique in many ways.
3. Genzyme's SAGE database: This database is used to create SAGE tag libraries for contracting parties. It is
also available through other agencies such as Celera Genomics and Compugen.
Besides these, a few other SAGE analysis tools are available, such as SAGE300. SAGE data are obtained by
sequencing short DNA tags, so the data may contain sequencing errors (Tuteja & Tuteja, 2004).
4. Assigning an mRNA transcript to each tag becomes even more difficult and uncertain once sequencing errors are
introduced into the process.
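To give a rough sense of how sequencing errors affect tag assignment, the following sketch estimates the fraction of tags expected to be read without any error. It assumes independent errors at a uniform per-base rate, which is a simplification, and the error rates shown are illustrative values chosen for this example rather than figures from the cited studies.

```python
def error_free_fraction(per_base_error, tag_length):
    """Probability that a tag contains no sequencing error.

    Assumes errors are independent and equally likely at every base,
    which is a simplifying assumption of this sketch.
    """
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be between 0 and 1")
    return (1.0 - per_base_error) ** tag_length


# Illustrative per-base error rates (hypothetical, not measured values).
for rate in (0.001, 0.01):
    fraction = error_free_fraction(rate, tag_length=15)
    print(f"error rate {rate}: {fraction:.3f} of 15-bp tags error-free")
```

Because a tag is short, a single miscalled base is enough to turn it into a different tag, so even a modest per-base error rate can misassign a noticeable share of tags.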
3.7 Conclusion
The genomic data resulting from sequencing have created major challenges as well as several opportunities for studying
the genomes of organisms. The bioinformatic tools mentioned in this review, including databases and software,
play an effective role in handling those challenges. Several functional genomic approaches and their databases
are described for tackling the biological problems that arise from the sheer size of the data. These functional genomic
databases are continuously updated with mined knowledge and new information in order to provide more reliable
information for genomics-related analysis.
46 SECTION | I Bioinformatics and next generation sequencing technologies
References
Abril, J. F., & Castellano Hereza, S. (2019). Genome annotation (pp. 195–209). Elsevier.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215,
403–410.
Bouchez, D., & Höfte, H. (1998). Functional genomics in plants. Plant Physiology, 118(3), 725–732.
Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., . . . Roch, K. G. (2013). An introduction to functional genomics
and systems biology. Advances in Wound Care, 2(9), 490–498.
Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C. A., Causton, H. C. &
Gaasterland, T. (2001). Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nature Genetics,
29(4), 365–371.
Bunnik, E. M., & Le Roch, K. G. (2013). An introduction to functional genomics and systems biology. Advances in Wound Care, 2(9), 490–498.
Chen, H., Centola, M., Altschul, S. F., & Metzger, H. (1998). Characterization of gene expression in resting and activated mast cells. The Journal of
Experimental Medicine, 188, 1657–1668.
Collins, F. S., & Fink, L. (1995). The Human Genome Project. Alcohol Health and Research World, 19(3), 190–195.
1000 Genomes Project Consortium. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422), 56–65.
de Sá, P. H., Guimarães, L. C., das Graças, D. A., de Oliveira Veras, A. A., Barh, D., Azevedo, V., . . . Ramos, R. T. (2018). Next-generation
sequencing and data analysis: Strategies, tools, pipelines and protocols. Omics Technologies and Bio-Engineering (pp. 191–207). Academic Press.
Edmonson, M. N., Zhang, J., Yan, C., et al. (2011). Bambino: A variant detector and alignment viewer for next-generation sequencing data in the
SAM/BAM format. Bioinformatics (Oxford, England), 27, 865–866.
Govindarajan, R., Duraiyan, J., Kaliyappan, K., & Palanisamy, M. (2012). Microarray and its applications. Journal of Pharmacy & Bioallied Sciences,
4(Suppl 2), S310.
Hayden, E. C. (2009). Genome sequencing: the third generation. Nature, 457(7231), 768–769.
Hidalgo, O. B. (2003). Functional genomics and bioinformatics: an overview. Biotecnología Aplicada, 20(3), 183.
Katara, P. (2014). Potential of bioinformatics as functional genomics tools: An overview. Network Modeling Analysis in Health Informatics and
Bioinformatics, 3, 52.
Khan, N. T. (2018). Structural and functional bioinformatics. Letters in Health and Biological Science, 3(1), 7–11.
Kremer, S., Stewart, J., Taylor, R., Vilo, J., & Vingron, M. (2001). Minimum information about a microarray experiment (MIAME) toward
standards for microarray data. Nature Genetics, 29, 365–371.
Lee, H. C., Lai, K., Lorenc, M. T., Imelfort, M., Duran, C., & Edwards, D. (2012). Bioinformatics tools and databases for analysis of next-generation
sequence data. Briefings in Functional Genomics, 11(1), 12–24.
Li, Y., Xiao, J., Chen, L., Huang, X., Cheng, Z., Han, B., & Wu, C. (2018). Rice functional genomics research: past decade and future. Molecular
Plant, 11(3), 359–380.
Mehta, J. P., & Rani, S. (2011). Software and tools for microarray data analysis in Gene Expression Profiling (784, pp. 41–53). Humana Press.
Patino, W. D., Mian, O. Y., & Hwang, P. M. (2002). Serial analysis of gene expression: technical considerations and applications to cardiovascular
biology. Circulation Research, 91(7), 565–569.
Stajich, J. E., Harris, T., Brunk, B. P., Brestelli, J., Fischer, S., Harb, O. S., & Stoeckert, C. J., Jr (2012). FungiDB: an integrated functional genomics
database for fungi. Nucleic Acids Research, 40(1), 675–681.
Tabata, S., Kaneko, T., Nakamura, Y., Kotani, H., Kato, T., Asamizu, E., Miyajima, N., Sasamoto, S., Kimura, T., Hosouchi, T. & Kawashima, K.
(2000). Sequence and analysis of chromosome 5 of the plant Arabidopsis thaliana. Nature, 408(6814), 823–826.
Tuteja, R., & Tuteja, N. (2004). Serial analysis of gene expression (SAGE): unraveling the bioinformatics tools. BioEssays, 26(8), 916–922.
Van Kampen, A. H., van Schaik, B. D., Pauws, E., Michiels, E. M. C., Ruijter, J. M., Caron, H. N., & van Der Mee, M. (2000). USAGE: A
web-based approach towards the analysis of SAGE data. Bioinformatics (Oxford, England), 16(10), 899–905.
Velculescu, V. E., Zhang, L., Vogelstein, B., & Kinzler, K. W. (1995). Serial analysis of gene expression. Science (New York, N.Y.), 270, 484–487.
Wang, D., Fan, W., Guo, X., Wu, K., Zhou, S., Chen, Z., . . . Zhou, Y. (2020). MaGenDB: a functional genomics hub for Malvaceae plants. Nucleic
Acids Research, 48(1), 1076–1084.
Wang, L., Xie, W., Chen, Y., Tang, W., Yang, J., Ye, R., Liu, L., Lin, Y., Xu, C., Xiao, J., et al. (2010). A dynamic gene expression atlas covering
the entire life cycle of rice. The Plant Journal: for Cell and Molecular Biology, 61, 752–766.
Wang, S. M. (2004). Understanding SAGE data. Trends in Genetics, 23(1), 42–50.
Wei, L., Gu, L., Song, X., Cui, X., Lu, Z., Zhou, M., Wang, L., Hu, F., Zhai, J., Meyers, B. C., et al. (2014). Dicer-like 3 produces transposable
element-associated 24-nt siRNAs that control agricultural traits in rice. Proceedings of the National Academy of Sciences of the United States of
America, 111, 3877–3882.
Yamamoto, M., Wakatsuki, T., Hada, A., & Ryo, A. (2001). Use of serial analysis of gene expression (SAGE) technology. Journal of Immunological
Methods, 250(1–2), 45–66.
Zou, D., Ma, L., Yu, J., & Zhang, Z. (2015). Biological databases for human research. Genomics, Proteomics & Bioinformatics, 13(1), 55–63.